Super-fast keyfile

  • Thread starter: Lee Gillie

Lee Gillie

New to ADO.NET so seeking comments on strategy.

I need to scan data and build a "keyfile" from data I extract during the
scan. The actual fields vary by job; I create them dynamically from a
description, and there are typically 5-30 fields of integers and
strings. Small jobs may be a few hundred records, but I could have up to
250,000. The machine this runs on is extremely fast, dual processor,
with great gobs of physical memory. I need to sort the data in alternate
orders and make passes over it, updating and filling in more fields as I
go.

My thought was to use a DataSet. I understand I can create tables
programmatically and populate them programmatically, a record at a time,
a field at a time. Then I can use a DataView to make passes in alternate
orderings, making changes and filling in empty fields.
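
That approach can be sketched roughly like this (the table name and columns below are invented for illustration; the real schema would come from the job description):

```csharp
using System;
using System.Data;

class KeyfileSketch
{
    static void Main()
    {
        // Build a table programmatically; this schema is made up --
        // in practice it would be generated from the job description.
        var table = new DataTable("Keyfile");
        table.Columns.Add("RecordNo", typeof(int));
        table.Columns.Add("Key1", typeof(string));
        table.Columns.Add("Key2", typeof(int));

        // Populate a record at a time, a field at a time.
        for (int i = 0; i < 5; i++)
        {
            DataRow row = table.NewRow();
            row["RecordNo"] = i;
            row["Key1"] = "item" + i;
            row["Key2"] = DBNull.Value;   // left empty, filled in on a later pass
            table.Rows.Add(row);
        }

        // A DataView presents an alternate ordering without copying the data.
        var view = new DataView(table) { Sort = "Key1 DESC" };
        foreach (DataRowView rv in view)
            rv.Row["Key2"] = (int)rv.Row["RecordNo"] * 10;  // fill in on this pass
    }
}
```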

After each pass I want to persist the major table to Jet, primarily for
problem analysis: as each pass completes, replace the table contents on
disk with what I have in my in-memory DataSet.
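
One way to do that pass-end snapshot is to let an OleDbDataAdapter generate the inserts against the Jet provider. This is a sketch under assumptions: the file name, table name, and matching schema are invented, and the .mdb must already contain the table:

```csharp
using System.Data;
using System.Data.OleDb;

class PersistToJet
{
    // Replace the on-disk table's contents with the in-memory rows.
    // Assumes keyfile.mdb already exists and already contains a table
    // named Keyfile whose columns match the DataTable.
    static void Save(DataTable table)
    {
        string conn = "Provider=Microsoft.Jet.OLEDB.4.0;" +
                      "Data Source=keyfile.mdb";
        using (var connection = new OleDbConnection(conn))
        {
            connection.Open();

            // Wipe the previous pass's snapshot.
            using (var clear = new OleDbCommand("DELETE FROM Keyfile", connection))
                clear.ExecuteNonQuery();

            // The command builder attaches itself to the adapter and
            // generates the INSERT statements from the SELECT's schema.
            var adapter = new OleDbDataAdapter("SELECT * FROM Keyfile", connection);
            var builder = new OleDbCommandBuilder(adapter);

            // Work on a copy and mark every row as Added so Update()
            // inserts them all, rather than mutating the live table's row states.
            DataTable copy = table.Copy();
            foreach (DataRow r in copy.Rows)
                r.SetAdded();
            adapter.Update(copy);
        }
    }
}
```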

I have already implemented this talking directly to Jet, but have been
disappointed in the performance, even after some amount of tweaking.

It would seem the new approach should be a real screamer. Seeking
comments from those more experienced with ADO.NET on this approach.
 

FWIW -

I started down this path and am getting good results. I found the
totally-in-memory DataSet was pretty fast, but also learned to avoid
DataView like the plague. It really bogged down, so I went to using the
Rows property of DataTable whenever I possibly can, and it screams.
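
For passes that need an alternate ordering without paying for a DataView, DataTable.Select can return the rows pre-sorted as a plain DataRow array. A small sketch (column names invented):

```csharp
using System;
using System.Data;

class RowsPass
{
    static void Main()
    {
        var table = new DataTable("Keyfile");
        table.Columns.Add("Key1", typeof(string));
        table.Columns.Add("Key2", typeof(int));
        table.Rows.Add("b", 2);
        table.Rows.Add("a", 1);

        // Straight pass over Rows -- no view, no index maintenance overhead.
        foreach (DataRow row in table.Rows)
            row["Key2"] = (int)row["Key2"] * 10;

        // When a pass needs a different ordering, Select returns a sorted
        // DataRow[] snapshot without the DataView machinery.
        DataRow[] byKey1 = table.Select("", "Key1 ASC");
        foreach (DataRow row in byKey1)
            Console.WriteLine("{0} {1}", row["Key1"], row["Key2"]);
    }
}
```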

Between phases and passes I would still like to be able to persist the
DataSet to disk for diagnostic purposes. I cannot find a way to create
an Access database populated with the tables from my programmatically
created DataSet. The reason for wanting to use Access is that it is a
self-contained file, I don't need a server, and the DB would mostly be
used for ad-hoc work after the fact. It is as if the Access DB must
already exist, and I don't see how to do this from ADO.NET. So for the
moment I am merely persisting to a tab-separated text file.
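
ADO.NET indeed has no call to create an .mdb file; the usual workaround is the COM ADOX library (add a COM reference to "Microsoft ADO Ext. for DDL and Security") to create the file, then issue CREATE TABLE statements derived from the DataSet's schema. A sketch under assumptions: the file name is invented and the type mapping below is deliberately minimal (integers and strings only):

```csharp
using System.Data;
using System.Data.OleDb;

class CreateJetFromDataSet
{
    // Create a new .mdb via ADOX, then build one Jet table per DataTable.
    static void CreateDatabase(DataSet ds, string path)
    {
        string conn = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + path;

        // ADOX.Catalog.Create makes the empty database file.
        var catalog = new ADOX.Catalog();
        catalog.Create(conn);

        using (var connection = new OleDbConnection(conn))
        {
            connection.Open();
            foreach (DataTable table in ds.Tables)
            {
                // Map each column to a Jet type; only the two types this
                // keyfile actually uses are handled here.
                var cols = new System.Collections.Generic.List<string>();
                foreach (DataColumn c in table.Columns)
                    cols.Add("[" + c.ColumnName + "] " +
                        (c.DataType == typeof(int) ? "INTEGER" : "TEXT(255)"));

                string sql = "CREATE TABLE [" + table.TableName + "] (" +
                             string.Join(", ", cols.ToArray()) + ")";
                using (var cmd = new OleDbCommand(sql, connection))
                    cmd.ExecuteNonQuery();
            }
        }
    }
}
```

If Access itself turns out not to be essential for the diagnostics, DataSet.WriteXml("keyfile.xml", XmlWriteMode.WriteSchema) is a one-line, built-in way to snapshot the whole DataSet, schema included.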

- Lee
 