How to specify a schema when filling a DataTable from a .csv file

  • Thread starter Thread starter Ryan
  • Start date Start date
R

Ryan

Hi

I have many comma-delimited files which I'm performing data
comparisons on. I've decided to import that data via DataTables and
OleDbDataAdapters. The loading of the data works great, it is
controlling of the schema that is problematic. I need to have full
control of the schema since I do data comparisons between the
DataTables at a later time. So, right now, when I import a file, some
columns contain incorrect data types due to ADO.NET doing some
guesswork. Ideally, I would like to treat all the columns as strings.

What is the best approach to controlling the schema? Is it to:

1) Create a schema.ini file for each data file? This is less than
ideal since I have many data files, each with different names. I
couldn't even get this to work actually.

2) Construct each DataTable and set each column's datatype
accordingly before filling it? I tried this and the OleDbDataAdapters
ignored my columns when filling the DataTable. It actually added more
columns.

3) Create a typed DataSet and fill that?

4) Something I'm not thinking of?

Here is a sample of the code I'm using as well two text file examples
to show you my dilemma.

test1.csv test2.csv
10 10
20 20
123 14512431234

string conn = @"Provider=Microsoft.Jet.OLEDB.4.0;Data
Source=c:\simulation;Extended
Properties='text;HDR=no;FMT=delimited;MaxScanRows=100'";

string sql1 = "Select * From [test1.csv]";
string sql2 = "Select * From [test2.csv]";

DataTable dt1 = new DataTable("test1.csv");
DataTable dt2 = new DataTable("test2.csv");

OleDbDataAdapter da1 = new OleDbDataAdapter(sql1, conn);
OleDbDataAdapter da2 = new OleDbDataAdapter(sql2, conn);

da1.Fill(dt1);
da2.Fill(dt2);

Console.WriteLine("test1.csv column 1 DataType: [" +
dt1.Columns[0].DataType.ToString() + "]");
Console.WriteLine("test2.csv column 1 DataType: [" +
dt2.Columns[0].DataType.ToString() + "]");

Console.ReadLine();

Output is:
test1.csv column 1 DataType: [System.Int32]
test2.csv column 1 DataType: [System.Double]

Thansk in advance!!
 
I usually create the schema.ini file. This isn't really that hard.

There can only be one schema.ini file per directory and if you have 10 .csv
files in a single directory you just need to append the schema.ini with 10
sections, each with the name of the .csv file.

refer to
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/odbc/htm/odbcjetschema_ini_file.asp
for more details on the format. This way you can control the type to be
"text" always or if you know the data types beforehand, preset them.

Jiho
 
Ryan,

I was going to recommend using Typed DataSets, but if you've already got it
working using the schema.ini file, then you're probably all set.

~~Bonnie
 
Back
Top