DataTable indexes on ADO.Net 2.0

  • Thread starter Thread starter bryan.young
  • Start date Start date
B

bryan.young

How can I force the creation of an index that will be used with the
DataTable.Select method?

I need to use the DataTable.Select because I'm doing a range select
ex: select all rows where column x is between 0.001 and 0.006
If I create a DataView with a sort on column x (and I'm told an index
is created then), the DataTable.Select method doesn't use it (ie, no
change in execution time). However, if I specify
DataTable.Columns("ColumnX").Unique = true, an index is definitely
used (from 3.5 minutes execution time to 6 seconds).

But I can't guarantee that Column X will be unique. How can I force
the creation of that index where I can use it in a range selection.

ex:

mydrs = dtErrorData.Select("ColumnX > " & CurrentSample.ToString & "
AND ColumnX < " & (CurrentSample + 0.05).ToString) ', "ColumnX ASC")

For Each mydr In mydrs
fColumnX = mydr("ColumnX")
do some stuff
Next
 
How can I force the creation of an index that will be used with the
DataTable.Select method?

I need to use the DataTable.Select because I'm doing a range select
ex: select all rows where column x is between 0.001 and 0.006
If I create a DataView with a sort on column x (and I'm told an index
is created then), the DataTable.Select method doesn't use it (ie, no
change in execution time). However, if I specify
DataTable.Columns("ColumnX").Unique = true, an index is definitely
used (from 3.5 minutes execution time to 6 seconds).

But I can't guarantee that Column X will be unique. How can I force
the creation of that index where I can use it in a range selection.

ex:

mydrs = dtErrorData.Select("ColumnX > " & CurrentSample.ToString & "
AND ColumnX < " & (CurrentSample + 0.05).ToString) ', "ColumnX ASC")

For Each mydr In mydrs
fColumnX = mydr("ColumnX")
do some stuff
Next
Bryan,

There are a couple of ways select out a range of data from a DataTable or
DataView using indexes.

On my machine the best performing way to do it was to use DataTable.Select()
and the DataTable's PrimaryKey. In order to get the primaryKey to work, I
added an identity column to the DataTable and included it in the PK. This
method took about 3 seconds to create a DataTable with 100,000 rows of test
data an then about 0.06 seconds to select back about 7,000 rows. As long as
the select statement is covered by the PK, the performance stayed the same.

Something like this:

dt = new DataTable("TestData");
dt.Columns.Add("Value", typeof(double));

// Add an identity column in order to build a composite PK.
col = dt.Columns.Add("SeqNo", typeof(int));
col.AutoIncrement = true;
col.AutoIncrementSeed = 1;
col.AutoIncrementStep = 1;

dt.Columns.Add("Message", typeof(string));
dt.PrimaryKey = new DataColumn[] { dt.Columns[0], dt.Columns[1] };

.... populate the DataTable with random data ...

string filter = "";
DataTable dtRes;
DataRow[] rows;
DataRow newRow;

dtRes = dt.Clone();
filter = "[Value] >= " + txtMin.Text + " AND [Value] <= " + txtMax.Text;
rows = TestData.Select(filter, "Value ASC");

foreach (DataRow row in rows) {
newRow = dtRes.NewRow();
newRow[0] = row[0];
newRow[1] = row[1];
newRow[2] = row[2];
dtRes.Rows.Add(newRow);
}

Another way you can do it with DataView.RowFilter. This is a bit cleaner
code-wise but does not perform as well as the first method. This took about
3 seconds to populate a DataTable with 100,000 random rows, about 3.5 seconds
to sort the DataView, then about 0.5 seconds for each query after the initial
sort. The code for using a row filter looks like this:

DataView dv;
DataTable dtRes;

dv = dt.DefaultView;
dv.Sort = "Value ASC";
dv.RowFilter = "Value >= " + txtMin.Text + " AND Value <= " + txtMax.Text;
dtRes = dv.ToTable();

DataView.FindRows() looks promising if you are doing simple lookups (ie.
Value = 0.05). I dont think FindRows() works with greater than or less than
evaluations, though.

string filter = "";
DataView dv;
DataTable dtRes;
DataRowView[] rows;
DataRow newRow;

dtRes = dt.Clone();

dv = dt.DefaultView;
dv.Sort = "Value ASC";
rows = dv.FindRows(".005");

foreach (DataRowView row in rows)
{
newRow = dtRes.NewRow();
newRow[0] = row[0];
newRow[1] = row[1];
newRow[2] = row[2];
dtRes.Rows.Add(newRow);
}

Hope this helps,
Jason Vermillion
 
Thanks Jason. I appreciate the suggestion. Unfortunately, I'm trying
to select all records where a particular attribute is within a range
(a measurement is within 5 mm). And unless I force it to be so, I
can't guarantee the data to be unique. Presently, that's what I do...
force the data in that column to be unique and set the .unique=true on
that column... it creates an index and yes, it is faster... basically
its the same as your first suggestion.... it's just I had rather not
manipulate the actual data to be unique.

The second method would work, except I have to change the .RowFilter
in a loop to do what I want and that causes the index to be rebuilt.
Same result as with no index. So, I'm still stuck. I wish MS would
have incorporated the .Select method in the DataView and allowed me
access to create and use indexes.

How can I force the creation of an index that will be used with the
DataTable.Select method?
I need to use the DataTable.Select because I'm doing a range select
ex: select all rows where column x is between 0.001 and 0.006
If I create a DataView with a sort on column x (and I'm told an index
is created then), the DataTable.Select method doesn't use it (ie, no
change in execution time). However, if I specify
DataTable.Columns("ColumnX").Unique = true, an index is definitely
used (from 3.5 minutes execution time to 6 seconds).
But I can't guarantee that Column X will be unique. How can I force
the creation of that index where I can use it in a range selection.

mydrs = dtErrorData.Select("ColumnX > " & CurrentSample.ToString & "
AND ColumnX < " & (CurrentSample + 0.05).ToString) ', "ColumnX ASC")
For Each mydr In mydrs
fColumnX = mydr("ColumnX")
do some stuff
Next

Bryan,

There are a couple of ways select out a range of data from a DataTable or
DataView using indexes.

On my machine the best performing way to do it was to use DataTable.Select()
and the DataTable's PrimaryKey. In order to get the primaryKey to work, I
added an identity column to the DataTable and included it in the PK. This
method took about 3 seconds to create a DataTable with 100,000 rows of test
data an then about 0.06 seconds to select back about 7,000 rows. As long as
the select statement is covered by the PK, the performance stayed the same.

Something like this:

dt = new DataTable("TestData");
dt.Columns.Add("Value", typeof(double));

// Add an identity column in order to build a composite PK.
col = dt.Columns.Add("SeqNo", typeof(int));
col.AutoIncrement = true;
col.AutoIncrementSeed = 1;
col.AutoIncrementStep = 1;

dt.Columns.Add("Message", typeof(string));
dt.PrimaryKey = new DataColumn[] { dt.Columns[0], dt.Columns[1] };

... populate the DataTable with random data ...

string filter = "";
DataTable dtRes;
DataRow[] rows;
DataRow newRow;

dtRes = dt.Clone();
filter = "[Value] >= " + txtMin.Text + " AND [Value] <= " + txtMax.Text;
rows = TestData.Select(filter, "Value ASC");

foreach (DataRow row in rows) {
newRow = dtRes.NewRow();
newRow[0] = row[0];
newRow[1] = row[1];
newRow[2] = row[2];
dtRes.Rows.Add(newRow);

}

Another way you can do it with DataView.RowFilter. This is a bit cleaner
code-wise but does not perform as well as the first method. This took about
3 seconds to populate a DataTable with 100,000 random rows, about 3.5 seconds
to sort the DataView, then about 0.5 seconds for each query after the initial
sort. The code for using a row filter looks like this:

DataView dv;
DataTable dtRes;

dv = dt.DefaultView;
dv.Sort = "Value ASC";
dv.RowFilter = "Value >= " + txtMin.Text + " AND Value <= " + txtMax.Text;
dtRes = dv.ToTable();

DataView.FindRows() looks promising if you are doing simple lookups (ie.
Value = 0.05). I dont think FindRows() works with greater than or less than
evaluations, though.

string filter = "";
DataView dv;
DataTable dtRes;
DataRowView[] rows;
DataRow newRow;

dtRes = dt.Clone();

dv = dt.DefaultView;
dv.Sort = "Value ASC";
rows = dv.FindRows(".005");

foreach (DataRowView row in rows)
{
newRow = dtRes.NewRow();
newRow[0] = row[0];
newRow[1] = row[1];
newRow[2] = row[2];
dtRes.Rows.Add(newRow);

}

Hope this helps,
Jason Vermillion
 
Back
Top