Hashtable object problem in multithreading

  • Thread starter: Andrius B.
Andrius B.

Hi all.

I am developing a project, one part of which reads data from an MDB file
(Access database), performs some calculations on the data and then displays
the results in a ListView.

The problem is the time needed to fill the ListView control. There are more
than 5000 items (with 15 subitems each) to display, and filling is slow even
though I use the ListView.Items.AddRange method.

One possible solution is to use multithreading and a Hashtable object; this
Hashtable is declared as a Friend variable (Friend Queue As Hashtable).
So, one function runs in one thread, reading from the MDB and doing the
calculation (which takes a fair amount of time), putting the data into a
ListViewItem object, and adding that ListViewItem to the
Hashtable: Queue.Add(key, ListviewItemobj). Thus, the number of objects
stored in Queue keeps increasing.

Another function runs in another thread and reads the content of Queue at
the same time, getting the ListViewItem objects from it one by one and
adding them to the ListView with the ListView.Items.Add(item As
ListViewItem) method.
The problem for this second function is to find out which items have already
been taken from Queue and inserted into the ListView, and which have not, in
order to avoid inserting duplicates.
For this purpose, the code should remove from Queue the key/value
(key/ListViewItem in my case) pairs that are no longer needed. But the
objects in the Hashtable can be accessed only by key. So I must use something
like the enumerator of Hashtable.Keys, calling .MoveNext and .Current on it.
But that produces an exception ("Collection has been changed, enumeration
will not go on" or something like that), because both threads use the same
Hashtable object Queue simultaneously. Should I use some kind of lock on the
Queue object so that the first thread cannot write to it while the second is
reading? But in that case I would lose some time. Running a For Each loop
over Queue many times, while the number of items in it only increases,
cannot be a good solution either.

By the way, the principle here is like performing a task in real life.
E.g., we have a lot of books on a high shelf, and two workers whose task is
to bring all the books to a truck. The first worker takes one or several
books from the shelf and puts them on the floor, forming a heap.
Simultaneously, the second worker comes and takes one or several books from
the top of the heap and carries them to the truck. Sometimes the first worker
works faster than the second, and the heap grows. Sometimes the second is
faster, and the heap becomes smaller or even disappears, because the second
worker takes the last books from the floor while the first worker is still
taking books from the shelf.
So, the second worker should not wait until all the books are in the heap
and only then begin to carry them to the truck. That would be nonsense :)
I would like to do the same thing (working simultaneously) in my project.


Sorry for the long "story". I just wanted to explain my problem as clearly
as possible.

Thanks for any ideas.
 
Andrius said:
The problem for this second function is to find out which items have
already been taken from Queue and inserted into the ListView, and which
have not, in order to avoid inserting duplicates.

Why not use what the variable's name implies: a Queue object (FIFO)?
You only have to synchronize access for a short time (SyncLock in VB).
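
A minimal sketch of that hand-off, written in Python only to keep it short and
self-contained. In VB the same shape would presumably be a Queue object with
each Enqueue/Dequeue wrapped in SyncLock. The names SharedQueue, take_all and
run_demo are made up for this illustration, and the `displayed` list merely
stands in for the ListView. Because the reader removes what it takes, the
question of which items were already inserted no longer arises.

```python
import threading
import time
from collections import deque


class SharedQueue:
    """FIFO buffer shared by two threads (hypothetical helper).

    The lock is held only for the append or the drain itself, never
    while an item is being computed or displayed.
    """

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def put(self, item):
        with self._lock:
            self._items.append(item)

    def take_all(self):
        """Remove and return everything queued so far, oldest first."""
        with self._lock:
            batch = list(self._items)
            self._items.clear()
        return batch


def run_demo(total=5000):
    shared = SharedQueue()
    producer_done = threading.Event()
    displayed = []  # stand-in for the ListView contents

    def producer():
        for i in range(total):
            shared.put(("row", i))  # stand-in for a computed ListViewItem
        producer_done.set()

    def consumer():
        while True:
            # Read the flag BEFORE draining: if the producer had already
            # finished, this drain is guaranteed to pick up the last items.
            finished = producer_done.is_set()
            batch = shared.take_all()
            displayed.extend(batch)
            if finished:
                break
            if not batch:
                time.sleep(0.001)  # heap is empty, wait for the first worker

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return displayed
```

Calling run_demo() returns all rows exactly once and in the order they were
produced. In a real WinForms program the step that extends `displayed` would
still have to run on the UI thread.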

And one thing to think about: I don't know how much of the time is spent
with adding the listviewitems and how much for reading the data and doing
the calculations. If the former takes most of the time, it's of little use
to put the whole process in another thread because adding the items still
must be done in the UI thread. Though, it can be a means of unloading the UI
thread. And don't forget the Invoke/BeginInvoke overhead, if there is any.


Armin
 
Andrius B. said:
Hi all.

I am developing a project, one part of which reads data from an MDB file
(Access database), performs some calculations on the data and then displays
the results in a ListView.

The problem is the time needed to fill the ListView control. There are
more than 5000 items (with 15 subitems each) to display, and filling is
slow even though I use the ListView.Items.AddRange method.

The performance cost of filling listviews and treeviews completely and
immediately is a chronic problem that's been faced and solved many
times. The best approach I know of is to do what's often called "lazy
population". For a treeview, one extremely good strategy is to not populate
a subnode until the user clicks on a node's "expand" icon - it's a simple
trick, which I can expand on (pun intended). For a listview, a good strategy
is to page the contents in as the user navigates around the listview. So
when you first display it you'd fill only the rows that were visible, or
better some larger but not-too-expensive range of the listview. This might
be three or four or ten pages of rows, so that the user could scroll down
smoothly through some amount of the contents. As the user gets near the end
of what you've populated (your code would be watching as the user navigates
around), your code would then populate a few more pages, and so on. Because
the time to add on a few more pages is relatively small, the user may see
that as a reasonable or maybe even insignificant delay. Alternatively,
rather than doing this automatically, you could use some paging controls -
Next Page and Previous Page buttons, and perhaps a list of clickable page
numbers, like search results. Or, even better, provide a query interface, so
that your users can refine what they want to see before you start loading it
up.
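
The automatic paging described above might be sketched like this, in Python
for brevity. Everything named here is hypothetical: LazyList, ensure_visible
and the fetch_page(offset, count) callback, which is assumed to return up to
`count` rows starting at `offset` (a short page signals the end of the data).
The UI would call ensure_visible whenever the user scrolls.

```python
class LazyList:
    """Keeps a few pages populated ahead of what the user is looking at."""

    def __init__(self, fetch_page, page_size=50, lookahead_pages=2):
        self.fetch_page = fetch_page      # assumed callback: (offset, count) -> rows
        self.page_size = page_size
        self.lookahead_pages = lookahead_pages
        self.rows = []                    # stand-in for the populated ListView rows
        self.exhausted = False            # True once the source has no more rows

    def ensure_visible(self, last_visible_index):
        """Load whole pages until the lookahead beyond the view is filled."""
        wanted = last_visible_index + 1 + self.lookahead_pages * self.page_size
        while len(self.rows) < wanted and not self.exhausted:
            page = self.fetch_page(len(self.rows), self.page_size)
            if len(page) < self.page_size:
                self.exhausted = True     # short (or empty) page: end of data
            self.rows.extend(page)


if __name__ == "__main__":
    data = list(range(5000))              # stand-in for the database rows
    lazy = LazyList(lambda offset, count: data[offset:offset + count])
    lazy.ensure_visible(19)               # first screen shows rows 0..19
    print(len(lazy.rows))                 # only a few pages are loaded so far
```

The design choice is that each scroll event costs at most a few small
fetches, so the initial display never waits for all 5000 rows.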

A refinement on the lazy population approach would be initially to populate
some number of pages of contents and then start up a background worker
thread that would prepare arrays of ListViewItems and send them back in
the progress events, so that the event handler (on the UI thread) could do
an AddRange to contribute to full population. In the meantime the user is
merrily working away on what you've already provided and if they don't try
to get ahead of you it'll seem like it's all "immediately" available. If
they do try to get ahead of you (e.g., page to end), your UI code would have
to block until the necessary data was available, or bite the bullet and
institute a paging scheme.
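
A rough sketch of that refinement, again in Python. A worker thread prepares
fixed-size batches and the calling thread receives each batch through a
callback, where a WinForms program would do one AddRange per batch. The
names batches, load_in_background and on_batch are invented for this
example. One simplification to be aware of: here the calling thread blocks
while waiting for the next batch, whereas a BackgroundWorker's progress
event is raised on the UI thread without blocking it.

```python
import queue
import threading


def batches(rows, size):
    """Yield lists of at most `size` rows, preserving order."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


def load_in_background(rows, size, on_batch):
    """Prepare batches on a worker thread; call on_batch on the calling thread."""
    pending = queue.Queue()
    end_marker = object()  # unique sentinel telling the receiver to stop

    def worker():
        for batch in batches(rows, size):
            pending.put(batch)
        pending.put(end_marker)

    thread = threading.Thread(target=worker)
    thread.start()
    while True:
        batch = pending.get()  # waits until the worker has something ready
        if batch is end_marker:
            break
        on_batch(batch)        # e.g. one AddRange call per batch
    thread.join()


if __name__ == "__main__":
    shown = []  # stand-in for the ListView contents
    load_in_background(range(1050), 100, shown.extend)
    print(len(shown))
```

Handing over whole batches rather than single items keeps the number of
cross-thread hand-offs small, which matters if each one carries some
overhead.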

However you go about this, it's almost always a mistake to take the simplest
approach and try to fill the control completely, because in a very large
dataset the user is only going to be interacting with a small proportion of
the contents anyway, and most of the load time is wasted and prevents the
user from doing anything with the data until the unnecessary work is
complete. I've heard of naive programmers trying to load up controls with as
many as 100,000 rows, and just not quite understanding why the user
interface was so unusable.

Indexing for searches is a different issue, and here the indexes could be
generic collections, such as Dictionary(Of TKey, TValue). You could have indexes for
as many columns as you wanted to index, of course, and because you could
build those indexes during the initial load of the data from the database,
the full indexes would be available before you actually started to load any
of the rows. That, in conjunction with a paging scheme, would give you good
random access to the dataset. But, for this, everything depends on how your
users actually need to be able to access the data.
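
As an illustration of the indexing idea, here is a small Python sketch that
builds one dictionary per indexed column while the rows are being read. The
function name build_indexes and the column names "author" and "year" are
assumptions made up for the example; each index maps a column value to the
positions of the rows holding that value.

```python
from collections import defaultdict


def build_indexes(rows, columns):
    """Return {column: {value: [row positions]}} built in a single pass."""
    indexes = {column: defaultdict(list) for column in columns}
    for position, row in enumerate(rows):
        for column in columns:
            indexes[column][row[column]].append(position)
    return indexes


if __name__ == "__main__":
    # Hypothetical rows as they might come out of the database load.
    rows = [
        {"author": "Smith", "year": 1999},
        {"author": "Jones", "year": 2001},
        {"author": "Smith", "year": 2001},
    ]
    indexes = build_indexes(rows, ["author", "year"])
    print(indexes["author"]["Smith"])  # positions of the rows by Smith
    print(indexes["year"][2001])       # positions of the rows from 2001
```

Combined with a paging scheme, a lookup gives the row position, and the
position tells you which page to load.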

I rarely use data binding for anything more complicated than small dropdown
lists and so forth, so I can't comment authoritatively on what resources
that would bring to the table, but if data binding of, say, a DataTable to
the ListView could be made to handle paging for you that would simplify your
task considerably.

Tom Dacon
Dacon Software Consulting
 