Fastest way to import large fixed width file into a DataTable

BostonNole

Using Visual Studio 2005, .NET 2.0 and VB.NET:

I am looking for the fastest possible way to import a very large fixed
width file (over 6 million records and over 1.2 GB file size) into a
DataTable.

Any suggestions?
 
BostonNole,

What are you planning to do with the data once it is in a datatable?

Kerry Moorman
 
What is the goal ?

A datatable is supposed to hold only as much data as a user really needs (not a
whole db or table). If you want to import into SQL Server, I would rather use
the SqlBulkInsert capability (external in 1.1, included as part of ADO.NET
in 2.0) than use a datatable as an intermediate step...
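
For completeness, a minimal sketch of that bulk-load route in .NET 2.0 (the
class name is SqlBulkCopy, as corrected below); the connection string, table
name and column layout here are hypothetical:

Imports System.Data
Imports System.Data.SqlClient

Module BulkLoadSketch
    Sub Main()
        ' Hypothetical staging table with two columns.
        Dim table As New DataTable("Records")
        table.Columns.Add("Id", GetType(Integer))
        table.Columns.Add("Name", GetType(String))
        table.Rows.Add(1, "Sample row")

        Using connection As New SqlConnection("Data Source=.;Initial Catalog=Staging;Integrated Security=True")
            connection.Open()
            Using bulk As New SqlBulkCopy(connection)
                bulk.DestinationTableName = "dbo.Records" ' hypothetical target table
                bulk.BatchSize = 10000                    ' commit in batches, not one huge transaction
                bulk.WriteToServer(table)
            End Using
        End Using
    End Sub
End Module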
 
I need to run some validation rules, formatting rules and business
rules against the data, then dump it out to another text file.
 
SqlBulkInsert??? I think you mean the SqlBulkCopy class.

SQL Server or any other relational database is not an option.
 
I saw in the other thread that this is to check the file. What kind of checks?
Basically:

- Either you can check on a row-per-row basis, in which case you could perhaps
read the records using buffered reads and check from there (avoiding loading
the whole thing into memory as a single huge chunk). You could also do what you
can on a row-per-row basis and keep just what is needed for row-against-row
comparisons (for example, if you need to make sure you have unique values, you
only have to keep the pk field; keeping the other fields is not needed). Also,
do you need a datatable, or could it be a data structure of your own? (See the
sketch at the end of this post.)

- My personal preference for the second option would be to load this into a db
so that I can run whatever SQL statements would allow me to check it, in case
you need to check a row against another row (not an option in your case) or run
any other kind of queries.

- My last resort would be to load it into a datatable (and once loaded, you'll
still have to do the checks).

My first step would be to see what I can check while reading and what I really
need to keep for further checks...
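
As a rough illustration of the row-per-row route, here is a sketch under
assumed positions (key in the first 10 characters, an amount field in the next
12); the file names and field layout are made up:

Imports System.IO
Imports System.Text
Imports System.Collections.Generic

Module RowByRowSketch
    Sub Main()
        ' Keep only the key field for the row-against-row (uniqueness) check.
        Dim seenKeys As New Dictionary(Of String, Boolean)

        ' Large buffers, but only one line is held in memory at a time.
        Using reader As New StreamReader("input.txt", Encoding.ASCII, False, 65536)
            Using writer As New StreamWriter("output.txt", False, Encoding.ASCII, 65536)
                Dim line As String = reader.ReadLine()
                Do While line IsNot Nothing
                    Dim key As String = line.Substring(0, 10)
                    Dim amount As String = line.Substring(10, 12) ' example of a per-row check

                    If Not seenKeys.ContainsKey(key) AndAlso amount.Trim().Length > 0 Then
                        seenKeys.Add(key, True)
                        writer.WriteLine(line) ' or a reformatted version of the record
                    End If

                    line = reader.ReadLine()
                Loop
            End Using
        End Using
    End Sub
End Module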
 
Using Visual Studio 2005, .NET 2.0 and VB.NET:

I am looking for the fastest possible way to import a very large fixed
width file (over 6 million records and over 1.2 GB file size) into a
DataTable.

Any suggestions?

You could possibly write a SQL function that does the data massaging, call
the function from code, and let the SQL server do all the work.

While I don't work with files the size you're dealing with, generally when I
have to work with a flat file I convert each record to an object, and in the
object's constructor I massage my data; then I use the object to do the
insertion. Granted, you're running through the list twice, but the insert
goes really fast once it's out of the file.
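
A minimal sketch of that record-object idea, with made-up field positions and
rules (the class name and layout are hypothetical):

Public Class FixedWidthRecord
    Public ReadOnly Id As String
    Public ReadOnly Name As String
    Public ReadOnly Amount As Decimal

    Public Sub New(ByVal line As String)
        ' Massage the raw fixed-width data while the object is being built.
        Id = line.Substring(0, 10).Trim()
        Name = line.Substring(10, 30).Trim().ToUpper()
        Amount = Decimal.Parse(line.Substring(40, 12).Trim())
    End Sub

    Public Function ToOutputLine() As String
        ' Rebuild a fixed-width line for the output file.
        Return Id.PadRight(10) & Name.PadRight(30) & Amount.ToString("0.00").PadLeft(12)
    End Function
End Class

Each line read from the file becomes a New FixedWidthRecord(line), and the
validated objects can then be written back out or handed to an insert routine.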
 
Hello,
I need to run some validation rules, formatting rules and business
rules against the data, then dump it out to another text file.

Can you give me an idea of the point of this? It sounds to me like going
from here to Brussels via Tokyo. Probably not even a nice trip.

Cor
 
Can't do a SQL call. SQL Server and other relational databases are not an
option.

The goal is simple: find the fastest, most efficient way to import a
very large fixed-width text file into some type of object (preferably
a DataTable) in .NET.
 
I need to run some validation rules, formatting rules and business
rules against the data, then dump it out to another text file.

With the size of file you're talking about, moving it all into memory is
probably not going to be a good option, and that is exactly what you're
talking about by loading it into a DataTable.

So the question becomes: do your validation rules depend on data from
other records in the file, or can each record be treated individually?
Because if they can be treated individually, then I would consider reading
groups of records (you said it was a fixed-width file), processing them,
and writing them out to the new file, repeating until finished. Most
likely, the slow part of this is going to be your reads and writes... So,
by reading and writing them in chunks (say, building your output records
in a StringBuilder), you can minimize the number of disk hits.
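
A sketch of that chunked read/process/write idea, assuming each record can be
validated on its own; the chunk size, file names and ProcessRecord placeholder
are arbitrary:

Imports System.IO
Imports System.Text

Module ChunkedSketch
    Sub Main()
        Const ChunkSize As Integer = 50000
        Dim buffer As New StringBuilder(ChunkSize * 100)
        Dim count As Integer = 0

        Using reader As New StreamReader("input.txt")
            Using writer As New StreamWriter("output.txt")
                Dim line As String = reader.ReadLine()
                Do While line IsNot Nothing
                    buffer.AppendLine(ProcessRecord(line))
                    count += 1

                    ' Write the built-up output in big chunks to keep disk hits down.
                    If count Mod ChunkSize = 0 Then
                        writer.Write(buffer.ToString())
                        buffer.Length = 0
                    End If

                    line = reader.ReadLine()
                Loop
                writer.Write(buffer.ToString()) ' whatever is left over
            End Using
        End Using
    End Sub

    Private Function ProcessRecord(ByVal line As String) As String
        ' Placeholder for the validation, formatting and business rules.
        Return line.TrimEnd()
    End Function
End Module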
 