<***VB.NET***> DataTable.Select's efficiency???

  • Thread starter Thread starter dotnet-MS
  • Start date Start date
D

dotnet-MS

Hello!!
You would be welcome to answer the Question:

I make a DataSet's instance(sfSalesPlanDataSet) of DB's Table,
Add new DataRow into sfSalesPlanDataSet, and seek a record,
But When I Creating the DataRow, seek the record, it becomes very slow.
the following code is the sample:
When running the <***Sample 1***>, it is very fast (3 seconds);
When running the <***Sample 2***>, it becomes very very slow (20 minutes);

<***Sample 1***>

Dim i As Integer = 0
Dim j As Random = New Random(10000)

'sfSalesPlanDataSet: DataSet's Instance of a DB's Table
For i = 0 To 10000 ':loop 1
'Create a new data row
Dim dataRow As DataRow = sfSalesPlanDataSet.Tables(0).NewRow()

'Set value to datarow
dataRow(0) = "A"
dataRow(1) = i.ToString()
dataRow(2) = "200405"
dataRow(3) = 100.0

'add dataRow to the dataTable
sfSalesPlanDataSet.Tables(0).Rows.Add(dataRow)
Next

'seek the record from the DataTable
For i = 0 To 10000 ':loop 2
Dim selectedRecords() As DataRow = Nothing
selectedRecords =
sfSalesPlanDataSet.Tables(0).Select(String.Format("ITEMNO = '{0}'",
i.ToString()))
Next

<***Sample 2***>
Dim i As Integer = 0
Dim j As Random = New Random(10000)

'sfSalesPlanDataSet: DataSet's Instance of a DB's Table
For i = 0 To 10000 ':loop 1
'Create a new data row
Dim dataRow As DataRow = sfSalesPlanDataSet.Tables(0).NewRow()

'Set value to datarow
dataRow(0) = "A"
dataRow(1) = i.ToString()
dataRow(2) = "200405"
dataRow(3) = 100.0

'add dataRow to the dataTable
sfSalesPlanDataSet.Tables(0).Rows.Add(dataRow)

'seek the record from the DataTable
Dim selectedRecords() As DataRow = Nothing
selectedRecords =
sfSalesPlanDataSet.Tables(0).Select(String.Format("ITEMNO = '{0}'",
i.ToString()))
Next

From Yejian ([email protected])
 
Hi Yejian

Did you try the dataview rowfilter with a sort before, that is faster,
however I do not know how much in your situation.

Cor
 
Hi Cor

Thanks for your help,
but it is still very slow.
what is the difference between <***Sample 1***> and <***Sample 2***>?
why the <***Sample 2***> is so much slower than the <***Sample 1***>?

thank you very much.

Yejian
 
Although I can't replicate the exact scenario that you have (at least I
can't be sure its the same b/c I don't have the table definitions), i"m not
seeign anywhere near this drastic performance degradation. However, the two
code blocks are dramatically different. In the first one you are doing all
of the additions in one swoop. You are doing the .Select 's on essentially
static data, it's not chaning. So the Select statement at the second pass
of the loop is being performed on the same set of data as the first, third,
nine thousandth etc. The same isn't the case in the second one. Each time
through you are querying a different datatable in that the number of rows is
different.

There could be some other stuff going on too b/c I can't see the column
definitions and if/what the primary key column is would have some bearing.
Also, what percentage of the time is being eaten by the addtion vs. the
Select on your machine? Again I'm not seeing the profound degradtion and
where I am seeing some, the addition of the row to the table is accounting
for a good part of it.

When i get to work I can run some more tests, but it'd help if I knew which
row your key was and/or if you have a sort specified

--
W.G. Ryan MVP Windows - Embedded

http://forums.devbuzz.com
http://www.knowdotnet.com/dataaccess.html
http://www.msmvps.com/williamryan/
 
Hi William:

This is the function of creating DataTable, Thanks for your help!
********Start *********
Public Function CreateDataTable() As Boolean
Try
'Create DataTable
Dim salesPlanTable As DataTable = New DataTable("SFSalesPlan")
salesPlanTable.CaseSensitive = False

'Column1
Dim shipmentFacilityColumn As DataColumn = New
DataColumn("SHIPMENTFACILITY", GetType(String))
salesPlanTable.Columns.Add(shipmentFacilityColumn)
shipmentFacilityColumn.AllowDBNull = False

'Column2
Dim itemNoColumn As DataColumn = New DataColumn("ITEMNO",
GetType(String))
salesPlanTable.Columns.Add(itemNoColumn)
itemNoColumn.AllowDBNull = False

'Column3
Dim jobMonthColumn As DataColumn = New DataColumn("JOBMONTH",
GetType(String))
salesPlanTable.Columns.Add(jobMonthColumn)
jobMonthColumn.AllowDBNull = False

'Column4
Dim salesQuantityColumn As DataColumn = New
DataColumn("SALESQTY", GetType(Integer))
salesPlanTable.Columns.Add(salesQuantityColumn)

'Add primary key
Dim primaryKey() As DataColumn = {shipmentFacilityColumn,
itemNoColumn, jobMonthColumn}
salesPlanTable.PrimaryKey = primaryKey

'Add DataTable to DataSet
Me.Tables.Add(salesPlanTable)
Catch ex As Exception
Return False
End Try

Return True
End Function
********End*********

Ye Jian
 
HI Cor:
I'm glad to hear from you. Thanks for your help!
Because I'm a Chinese, My English is a little poor.
Let's look for answers.
Thank you all the same.
Yejian
 
HI Yejian,

I see now it is a complete theoretical problem, however interesting.
I tested it and got the same strange results as you.
When we both are not overseeing something, the first thought should be that
the second option should be faster, because it is doing everytime a growing
table, however it is dramaticly slower.

(Bill when you see this, you should have been to bed, that early you cannot
stend up in the morning, I have added a complete testable set after this
when you are interested and maybe you see what we do not see.)

I as well tested if the select was adding something to the datatable object
when doing the select, however as far as I could see it did not. (I
serialized before and after and the where both the same)

So the only thing I can say is: I am curious as well.
It looks if there is some tagsort build, however where.

Sorry no answer

Cor

\\\
Dim ds As New DataSet
Dim dt As New DataTable
For x As Integer = x To 3
Dim dc As New DataColumn
dt.Columns.Add(dc)
Next
dt.Columns(1).ColumnName = "ITEMNO"
ds.Tables.Add(dt)
Dim i As Integer = 0
Dim starttimer As Integer = Environment.TickCount
For i = 0 To 1000
'Create a new data row
Dim dataRow As DataRow = ds.Tables(0).NewRow()
'Set value to datarow
dataRow(0) = "A"
dataRow(1) = i.ToString()
dataRow(2) = "200405"
dataRow(3) = 100.0
'add dataRow to the dataTable
ds.Tables(0).Rows.Add(dataRow)
Next
'seek the record from the DataTable
For i = 0 To 1000 ':loop 2
Dim selectedRecords() As DataRow
selectedRecords = ds.Tables(0).Select(String.Format _
("ITEMNO = '{0}'", i.ToString()))
Next
MessageBox.Show(CStr(Environment.TickCount - starttimer))
starttimer = Environment.TickCount
'ds: DataSet's Instance of a DB's Table
For i = 0 To 1000 ':loop 1
'Create a new data row
Dim dataRow As DataRow = ds.Tables(0).NewRow()
'Set value to datarow
dataRow(0) = "A"
dataRow(1) = i.ToString()
dataRow(2) = "200405"
dataRow(3) = 100.0
'add dataRow to the dataTable
ds.Tables(0).Rows.Add(dataRow)
'seek the record from the DataTable
Dim selectedRecords() As DataRow
selectedRecords = ds.Tables(0).Select _
(String.Format("ITEMNO = '{0}'", i.ToString()))
Next
MessageBox.Show((Environment.TickCount - starttimer).ToString)
End Sub
///
 
dotnet-MS said:
You would be welcome to answer the Question:

<snip>

For any C# fans out there, here's a short but complete program which
demonstrates the problem in C#. It's only doing 4000 iterations to keep
my sanity in check, but it shows the problem easily enough. On my box,
I got results of

Batched: 00:00:00.2187500
Interleaved: 00:01:07.5781250


I'll investigate why...


using System;
using System.Data;

class Test
{
const int Iterations = 4000;

static void Main()
{
TestBatched();
TestInterleaved();
}

static void TestBatched()
{
DataTable table = GetTable();
DateTime start = DateTime.Now;

for (int i=0; i < Iterations; i++)
{
DataRow row = table.NewRow();
row[0]=i.ToString();
row[1]="A";
row[2]="B";
row[3]=100.0;
table.Rows.Add(row);
}

for (int i=0; i < Iterations; i++)
{
DataRow[] selected = table.Select
(String.Format("ID='{0}'", i));
}
DateTime end = DateTime.Now;
Console.WriteLine ("Batched: {0}", end-start);
}

static void TestInterleaved()
{
DataTable table = GetTable();
DateTime start = DateTime.Now;

for (int i=0; i < Iterations; i++)
{
DataRow row = table.NewRow();
row[0]=i.ToString();
row[1]="A";
row[2]="B";
row[3]=100.0;
table.Rows.Add(row);
DataRow[] selected = table.Select
(String.Format("ID='{0}'", i));
}

DateTime end = DateTime.Now;
Console.WriteLine ("Interleaved: {0}", end-start);
}

static DataTable GetTable()
{
DataTable table = new DataTable();
table.CaseSensitive = false;

table.Columns.Add ("ID", typeof(string));
table.Columns.Add ("First", typeof(string));
table.Columns.Add ("Second", typeof(string));
table.Columns.Add ("Third", typeof(double));
return table;
}
}
 
Jon Skeet said:
<snip>

For any C# fans out there, here's a short but complete program which
demonstrates the problem in C#. It's only doing 4000 iterations to keep
my sanity in check, but it shows the problem easily enough. On my box,
I got results of

Batched: 00:00:00.2187500
Interleaved: 00:01:07.5781250

I'll investigate why...

Well, the first thing I tried was to make the data table case-
sensitive. This reduced the time for interleaved selects by half, while
barely touching the batched selects time.

This suggests to me that the DataTable (or something) is caching some
normalised (or otherwise processed) version of the table, but that the
cache is cleared each time a new row is added.

Another way of showing this behaviour is by adding *lots* of rows (eg a
million) to the table, and then in one test doing two selects, and in
another doing a select, adding a row, then doing another select. The
second test takes about twice as long as the first.
 
Hi Jon,

This was my first thought also (see my tagsort sentence).
This suggests to me that the DataTable (or something) is caching some
normalised (or otherwise processed) version of the table, but that the
cache is cleared each time a new row is added.
Using the dataview.rowfilter makes it a little bit faster however not that
fast as the first version.
(I keep the original reference from YeJian).

Cor
 
Ok, I ran through it again and I see the degradation now. I only used 100
originally b/c I was comparing proportions and stepping through it.. bad
choice on my part I realize now.

It's slower than hell.

Now a few observations. The addition of the rows appeared to take longer
when I just used 100 but it's exacerbated greatly when I use large numbers.

However, the addition of the rows took : .0468750. No big deal
With the separate loops, I'm looking at 2.68750000 which is what we'd
expect.

I was looking into it and originally I agreed with Cor's assessment about
the rowfitler. I expected that it would be faster than both methods in that
the array of rows being created wasn't happening and all that good stuff.
Absolutely not the case. I put it outside of the creation loop and ran it
in its own loop. It took 32.1718750 (for 4000 as Jon used) to iterate
through and set the filter. This is very weird. However, the same code
with Select took 2.68 as mentioned above. This appears to indicate a
dramatic performance difference with the RowFilter which I had always
beleived to be faster. I need to do some more research, but the RowFilter
seemed to perform about as bad as the select statement in the same loop. I
suspected that maybe adding a sort in there might help but it didn't. Well,
it did a little bit, it knocked it down to 30.4687500 but that's trivial in
comparison the variations in Select and Rowfilter. So anyway, my original
statements about no real degradation were based on not running the test as
you did which was wrong on my part. With 40% of what you ran I saw a huge
difference. In addition, I saw an even more profound difference when using
the rowfilter. Moreoever, using the RowFilter in it's own loop (which with
Select is the fastest fo the methods) was still slower by an huge margin.
However, putting the rowfilter inside the loop (the slowest method with
Select.) reduced the time to 18.7812500 (unsorted) By Sorting on ID I moved
it down to 16.5687500 (so sorting it beforehand shaves 2 seconds in both
cases - probably need some more rows to see if there's a proportional
difference.

---Still ran Slow roughly 32 seconds
int TestInterleaved()

{

DataTable table = GetTable();

// DateTime start = DateTime.Now;

DataView dv = table.DefaultView;

for (int i=0; i < Iterations; i++)

{

DataRow row = table.NewRow();

row[0]=i.ToString();

row[1]="A";

row[2]="B";

row[3]=100.0;

table.Rows.Add(row);


}

DateTime start = DateTime.Now;

// for (int i=0; i < Iterations; i++)

// {

// DataRow[] selected = table.Select

// (String.Format("ID='{0}'", i));

// }

for(int i = 0; i<Iterations; i++)

{

dv.RowFilter = "ID = '" + i + "'";

}


DateTime end = DateTime.Now;

res.Text = (end-start).ToString();

return 0;

}



--This ran at 18 seconds, the inverse of what we'd expect:

int TestInterleaved()

{

DataTable table = GetTable();

DateTime start = DateTime.Now;

DataView dv = table.DefaultView;

for (int i=0; i < Iterations; i++)

{

DataRow row = table.NewRow();

row[0]=i.ToString();

row[1]="A";

row[2]="B";

row[3]=100.0;

table.Rows.Add(row);

dv.RowFilter = "ID = '" + i + "'";

}


// for (int i=0; i < Iterations; i++)

// {

// DataRow[] selected = table.Select

// (String.Format("ID='{0}'", i));

// }

// dv.Sort = "ID";

// for(int i = 0; i<Iterations; i++)

// {

//

// }


DateTime end = DateTime.Now;

res.Text = (end-start).ToString();

return 0;

}



W.G. Ryan, eMVP

http://forums.devbuzz.com/
http://www.knowdotnet.com/williamryan.html
http://www.msmvps.com/WilliamRyan/
 
Cor:

Originally I thought the same thing and woudl have expected the rowfilter to
be faster. I've rerun my tests and found just the opposite, the rowfilter is
a lot slower in both cases. The even weirder part is that it's slower still
when you PUT IT IN ITS OWN LOOP. So putting it in the loop where the rows
are added is almost twice as fast. I can kind of think of a reason this may
be but its just a guess.... The rowfilter doesn't have as many rows to look
through when filtering if you use apply the filter in the addition loop. If
you do it outside of the loop, it's searching through 4,000 rows each time
(or 10k if you used it instead). If you do it in the loop, the first pass
only searches for 1 row, then 2, then 3 etc. Even more weird is that it
only shaves two seconds off if you sort it first. Definitely counter to
what I would have guessed.

--

W.G. Ryan, eMVP

http://forums.devbuzz.com/
http://www.knowdotnet.com/williamryan.html
http://www.msmvps.com/WilliamRyan/
 
Hi Bill,

I tested it as well, by the way, in the VB code is a small error, I did not
clear the datatable before the next test, and however the difference stays
high 1:22

With the dataview that is terrible slow when you set first a sort, when you
do not do that my experience is than, that the differences are aprox 1:15:22
and therefore faster than the select.


Cor
 
That's interesting, post your code if you would.. or if you don't mind,
check out my code in the other post (the one on 12:37), I've included all of
my code and no matter which method I chose, the DataView was slower by a
bunch in the instance where the loops are seperate w/ the select statement.
I tried slicing it every way I could and I still came up slower with the
RowFilter (which really suprises me). The sort didn't take any time at all
and If I understand you correctly, this is a difference between our two
scenarios. The only difference I saw is that sorting it took about 2 seconds
off the total time. I was also using 4,000 rows as in Jon S's example. i
posted the code I used though, maybe I'm doing something wrong but if so, I
thought i knew the dataview pretty well so I'd love to see what you are
doing different.

Cheers,

Bill

--

W.G. Ryan, eMVP

http://forums.devbuzz.com/
http://www.knowdotnet.com/williamryan.html
http://www.msmvps.com/WilliamRyan/
 
Hi Bill,

Or maybe I did something wrong, it is the same code as I posted yesterday,
however I added a simple partition for the dataview. This is the same code
as Yejian, however made directly testtable (You can paste it in a load event
of a form in VB (what I mostly do) or make your own module, whatever you
want)

Cor

Dim ds As New DataSet
Dim dt As New DataTable
For x As Integer = x To 3
Dim dc As New DataColumn
dt.Columns.Add(dc)
Next
dt.Columns(1).ColumnName = "ITEMNO"
ds.Tables.Add(dt)
Dim i As Integer = 0
ds.Tables(0).Clear()
Dim starttimer As Integer = Environment.TickCount
For i = 0 To 1000
'Create a new data row
Dim dataRow As DataRow = ds.Tables(0).NewRow()
'Set value to datarow
dataRow(0) = "A"
dataRow(1) = i.ToString()
dataRow(2) = "200405"
dataRow(3) = 100.0
'add dataRow to the dataTable
ds.Tables(0).Rows.Add(dataRow)
Next
'seek the record from the DataTable
For i = 0 To 1000 ':loop 2
Dim selectedRecords() As DataRow
selectedRecords = ds.Tables(0).Select(String.Format _
("ITEMNO = '{0}'", i.ToString()))
Next
MessageBox.Show(CStr(Environment.TickCount - starttimer))
ds.Tables(0).Clear()
starttimer = Environment.TickCount
'ds: DataSet's Instance of a DB's Table
For i = 0 To 1000 ':loop 1
'Create a new data row
Dim dataRow As DataRow = ds.Tables(0).NewRow()
'Set value to datarow
dataRow(0) = "A"
dataRow(1) = i.ToString()
dataRow(2) = "200405"
dataRow(3) = 100.0
'add dataRow to the dataTable
ds.Tables(0).Rows.Add(dataRow)
'seek the record from the DataTable
Dim selectedRecords() As DataRow
selectedRecords = ds.Tables(0).Select _
(String.Format("ITEMNO = '{0}'", i.ToString()))
Next
MessageBox.Show((Environment.TickCount - starttimer).ToString)
ds.Tables(0).Clear()
starttimer = Environment.TickCount
'ds: DataSet's Instance of a DB's Table
For i = 0 To 1000 ':loop 1
'Create a new data row
Dim dataRow As DataRow = ds.Tables(0).NewRow()
'Set value to datarow
dataRow(0) = "A"
dataRow(1) = i.ToString()
dataRow(2) = "200405"
dataRow(3) = 100.0
'add dataRow to the dataTable
ds.Tables(0).Rows.Add(dataRow)
'seek the record from the DataTable
Dim dv As New DataView(ds.Tables(0))
dv.RowFilter = String.Format("ITEMNO = '{0}'", i.ToString())
Next
MessageBox.Show((Environment.TickCount - starttimer).ToString)
 
I'd expect a case insensitive select would take longer, as the two arguments,
the row value and the select value, must be converted to a common case,
either upper or lower. This is just overhead.

I too am surprised by the result. A column which is a primary key has a hash
table built for it, which is why the Find method on the DataRowCollection
performs so well. Maybe there is a similar under the covers activity going
 
hi all:
Thanks for your help. But, Maybe there is no good way to do this whit
ADO.NET.
So, I have to use HashTable to do it.
Thank you very much.

yejian


Jon Skeet said:
dotnet-MS said:
You would be welcome to answer the Question:

<snip>

For any C# fans out there, here's a short but complete program which
demonstrates the problem in C#. It's only doing 4000 iterations to keep
my sanity in check, but it shows the problem easily enough. On my box,
I got results of

Batched: 00:00:00.2187500
Interleaved: 00:01:07.5781250


I'll investigate why...


using System;
using System.Data;

class Test
{
const int Iterations = 4000;

static void Main()
{
TestBatched();
TestInterleaved();
}

static void TestBatched()
{
DataTable table = GetTable();
DateTime start = DateTime.Now;

for (int i=0; i < Iterations; i++)
{
DataRow row = table.NewRow();
row[0]=i.ToString();
row[1]="A";
row[2]="B";
row[3]=100.0;
table.Rows.Add(row);
}

for (int i=0; i < Iterations; i++)
{
DataRow[] selected = table.Select
(String.Format("ID='{0}'", i));
}
DateTime end = DateTime.Now;
Console.WriteLine ("Batched: {0}", end-start);
}

static void TestInterleaved()
{
DataTable table = GetTable();
DateTime start = DateTime.Now;

for (int i=0; i < Iterations; i++)
{
DataRow row = table.NewRow();
row[0]=i.ToString();
row[1]="A";
row[2]="B";
row[3]=100.0;
table.Rows.Add(row);
DataRow[] selected = table.Select
(String.Format("ID='{0}'", i));
}

DateTime end = DateTime.Now;
Console.WriteLine ("Interleaved: {0}", end-start);
}

static DataTable GetTable()
{
DataTable table = new DataTable();
table.CaseSensitive = false;

table.Columns.Add ("ID", typeof(string));
table.Columns.Add ("First", typeof(string));
table.Columns.Add ("Second", typeof(string));
table.Columns.Add ("Third", typeof(double));
return table;
}
}
 
Hi Bill,

Funny I had tested that once and I was thinking you did knew that.
And because in this thread I stated that as well to YeJian I thought you did
deny that after testing.

However, now we know both.

:-)

Cor
 
Back
Top