Speed and memory usage of Dataset


dougd

I will start by saying I am somewhat new to ADO.NET; that said, here
goes.

I have a dBase file with 450,000+ records (400MB+ file size). I am
loading only the UniqueID field into a DataSet and I have two problems:
1. It takes about 45 seconds to load.
2. It consumes about 400MB of RAM at that point.

Is this kind of behavior to be expected?
Sample code:

string connString = "Provider=VFPOLEDB.1;Data Source=c:\\btest\\";
OleDbConnection connNav = new OleDbConnection(connString);
connNav.Open();
DataSet dsNav = new DataSet();
OleDbDataAdapter daNav = new OleDbDataAdapter("select id from test", connNav);
daNav.Fill(dsNav, "test"); // <-- takes about 45 seconds

The ID field is numeric. There are about 220 other fields in the table
(It was this way when I started here).

Any help would be appreciated.

Doug
 
Hello dougd,

I am not that surprised at the size (though a little). When you get the
resulting DataSet, does it have the ID field constrained as unique (e.g.
is it the primary key)? To test the speed, do the same with a DataReader
and see how fast that is. My guess is that the DataReader will be faster
(because there is overhead with a DataSet), but the times should be
similar. If they are similar, then it's the data access taking that time
(450K records is a lot of data). If not, there may be something nefarious
going on. BTW, ADO.NET 1.x or 2.0?

Thanks,
Shawn Wildermuth
Speaker, Author and C# MVP
http://adoguy.com
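
A minimal sketch of the timing test described above, reusing the
connection string from the original post (the Stopwatch timing and the
class wrapper are just for illustration):

using System;
using System.Data.OleDb;
using System.Diagnostics;

class ReaderTest
{
    static void Main()
    {
        string connString = "Provider=VFPOLEDB.1;Data Source=c:\\btest\\";
        using (OleDbConnection conn = new OleDbConnection(connString))
        {
            conn.Open();
            OleDbCommand cmd = new OleDbCommand("select id from test", conn);

            Stopwatch sw = Stopwatch.StartNew();
            int rows = 0;
            using (OleDbDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                    rows++;   // walk the rows without storing them
            }
            sw.Stop();
            Console.WriteLine("{0} rows in {1} ms", rows, sw.ElapsedMilliseconds);
        }
    }
}

If this run takes roughly the same 45 seconds, the time is going into
the data access itself rather than into the DataSet.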
 
ADO.NET is not really designed for browsing entire tables of very large
databases. Its disconnected model means the entire table has to be
loaded into memory, which takes a long time and lots of memory. Generally
the ADO.NET approach is to load only a filtered subset of the table into
the DataSet and operate on that.
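
A sketch of that filtered approach, assuming an id range is a sensible
filter for your data (the range values are illustrative):

// Load only a window of rows instead of the whole table.
OleDbDataAdapter da = new OleDbDataAdapter(
    "select id from test where id between ? and ?", connNav);
// OLE DB parameters are positional; the names are just labels.
da.SelectCommand.Parameters.AddWithValue("from", 1000);
da.SelectCommand.Parameters.AddWithValue("to", 2000);

DataSet ds = new DataSet();
da.Fill(ds, "test");   // only the requested range is loaded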

If you have a desktop application that needs to provide browsing of
very large tables, then ADO.NET is probably not the best solution.
You can use standard ADODB connected recordsets from .NET; however,
then you can't use .NET data binding. Infralution has a solution to
this that allows you to bind .NET controls to connected ADO recordsets,
providing almost instantaneous startup and minimal memory overhead.
You can get more information and download an evaluation version from:

www.infralution.com/virtualdata.html

Regards
Grant Frisken
Infralution
 
Doug,

This is certainly a situation where an ArrayList (or, probably better, a
Hashtable if you have to do lookups), filled from a DataReader, will most
probably save you a lot of time and memory.

In this case the DataSet gives you nothing extra (in fact, just the
DataRows).

Just my thought,

Cor
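
A sketch of the Hashtable idea in pre-generics 2.0 style (connection
string and column name follow the earlier posts):

using System.Collections;
using System.Data.OleDb;

Hashtable ids = new Hashtable();
using (OleDbConnection conn = new OleDbConnection(
    "Provider=VFPOLEDB.1;Data Source=c:\\btest\\"))
{
    conn.Open();
    OleDbCommand cmd = new OleDbCommand("select id from test", conn);
    using (OleDbDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
        {
            // Key on the id; the Hashtable gives O(1) lookups later,
            // without the per-row overhead of DataRows in a DataSet.
            ids[reader.GetValue(0)] = null;
        }
    }
}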
 
It is ADO.NET 2.0. The dbase table has a unique index; however, when it
builds the DataTable, the ID is not constrained as a PK. (I don't
think.)
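
For what it's worth, that is easy to check on the filled table; Fill
does not add key or unique constraints unless you ask for the schema:

DataTable t = dsNav.Tables["test"];
Console.WriteLine(t.PrimaryKey.Length);   // 0 means no primary-key constraint

// To have Fill bring the key/unique schema across:
daNav.MissingSchemaAction = MissingSchemaAction.AddWithKey;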

I am aware of the fact that browsing a large dataset is not an ideal
approach in ADO.NET; that is why I am loading only the UniqueID into a
DataSet and attaching that set to a BindingSource. On the
PositionChanged event of that BindingSource I use the current
UniqueID to pull a new DataSet containing only that record's information.
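
Roughly what that handler looks like (names simplified; connNav is the
connection from my first post):

void bindingSourceNav_PositionChanged(object sender, EventArgs e)
{
    // Current row of the id-only table the BindingSource is bound to.
    DataRowView current = (DataRowView)bindingSourceNav.Current;
    object id = current["id"];

    // Pull the full record for just that one id.
    OleDbDataAdapter da = new OleDbDataAdapter(
        "select * from test where id = ?", connNav);
    da.SelectCommand.Parameters.AddWithValue("id", id);

    DataSet dsRecord = new DataSet();
    da.Fill(dsRecord, "record");
    // ...bind dsRecord.Tables["record"] to the detail controls...
}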

I did try this with a DataReader, and upon loading it into a
BindingSource, memory usage was around 575MB. I think the
BindingSource actually loads the entire contents of the DataReader into
its List property.

Any other thoughts? I suppose I can just go back to using old-fashioned
ADO.

Doug
 
So I tried removing almost all of the fields from the test file, paring
it down to the UniqueID field plus a few text fields, and using a
DataReader, memory usage went down to around 100MB.

This leads me to believe that even though I was only selecting one
column, it was allocating enough space in memory for the entire file.
That can't be right.

Using the stripped-down table, it also ran in about 5 seconds with a
DataReader vs. about 12 seconds before I stripped it down.

Doug
 
Doug,
Keep in mind that 450,000 numbers is going to consume a fair amount of
memory... Loading that many ID numbers into a control is going to
consume even more memory...

You need to ask yourself: why am I loading so many ID numbers at a
time? Is it really necessary? Maybe there are one or more fields in
that database that can be used to filter the number of records that are
returned...

If there isn't an alternative, then you will have to live with the
memory requirements.

If you are going for memory conservation and general performance, and
don't need the extra capabilities of the DataSet, you will find that
the DataReader is superior to the DataSet.
 