Data Relations performance issue

  • Thread starter Thread starter Alex Ben-Ari
  • Start date Start date
A

Alex Ben-Ari

I am encountering a strange performance issue with DataRelation which
I am hoping someone can cast some light on.
I have two tables related to each other by a single field (of type
varchar, about 60 characters long on average).
Now I load a single row from the first table to a DataTable and about
40000 rows from the second table to another DataTable.
Now I add a relation between the two tables (the DataTable with the
single row is the parent). All goes well.
However: if I do exactly the same thing but instead of loading 40000
rows to the second table I load 80000 rows, the time it takes to add
the relation jumps from less than a second to close to an hour.
Why is this happening ? Any ideas anyone ?

(I tried adding the relation both before and after loading the data
into the tables - same result).

Alex.
 
Hi Alex,

One hour?
Wow, can you create a simple sample to repro the problem?
 
Miha - see end of this reply for a code snippet similar to the one
which causes the problem.
Rich - as for your two issues:
The long varchar key is a necessity in our case, I think. I don't want
to go into the design in too much detail, but it is a big problem for
me to use numerical ids because it later greatly complicates merges
between different proejcts and introduces great threats to the
integrity of the data. So let's assume for now that this is mandatory.
As for the large volumes I am loading into datasets - I don't see why
this should be a problem. The total RAM used for the dataset with the
80000 rows is about 100MB which is not a problem memory wise, and I
don't see why performance for placing the relation should rise by such
a factor when doubling the amount of rows from 40000 to 80000. I would
imagine it should be near linear. If it had I would have no problem
since placing the relation for 40000 takes a mere second or so.

Here is the "offending" code:
DataSet ds = new DataSet();
DataTable parent = new DataTable();
DataTable child = new DataTable();
....(create adaptor, set select command etc')...
adaptor.Fill(parent); //1 row
adaptor.Fill(child); //80000 rows
ds.Tables.Add(parent);
ds.Tables.Add(child);
//and now the performance intensive line
ds.Relations.Add("rel1",parent.Columns["someCol"],child.Columns["someCol"],false);

thanks for your replies,
Alex.
 
Hi Alex,

Unfortunatelly I am going off to the vacation.
I'll reply in a week if it will be still actual :-)

--
Miha Markic [MVP C#] - RightHand .NET consulting & development
miha at rthand com
www.rthand.com

Alex Ben-Ari said:
Miha - see end of this reply for a code snippet similar to the one
which causes the problem.
Rich - as for your two issues:
The long varchar key is a necessity in our case, I think. I don't want
to go into the design in too much detail, but it is a big problem for
me to use numerical ids because it later greatly complicates merges
between different proejcts and introduces great threats to the
integrity of the data. So let's assume for now that this is mandatory.
As for the large volumes I am loading into datasets - I don't see why
this should be a problem. The total RAM used for the dataset with the
80000 rows is about 100MB which is not a problem memory wise, and I
don't see why performance for placing the relation should rise by such
a factor when doubling the amount of rows from 40000 to 80000. I would
imagine it should be near linear. If it had I would have no problem
since placing the relation for 40000 takes a mere second or so.

Here is the "offending" code:
DataSet ds = new DataSet();
DataTable parent = new DataTable();
DataTable child = new DataTable();
...(create adaptor, set select command etc')...
adaptor.Fill(parent); //1 row
adaptor.Fill(child); //80000 rows
ds.Tables.Add(parent);
ds.Tables.Add(child);
//and now the performance intensive line
ds.Relations.Add("rel1",parent.Columns["someCol"],child.Columns["someCol"],false);

thanks for your replies,
Alex.

Rich said:
I'd be interested in different people's opinions on this but I would
raise
two questions re the overall design. First, I'm not sure what specific
kind
of relation is built into your database, or even what DBMS you are using,
but
I always thought that any form of key that averages 60 bytes is sort of
pushing the limits.

Secondly, I'm wondering why you're using such large tables in your client
app. Whether you need to set up a relation or not, I wouldn't even try
this
- unless it was TOTALLY avoidable. I think of Data Tables and Data Sets
as
temporary work areas. You get small amounts of data...make modifications
to
it...then send it back.

Like I said, I'd be interested in different opinions but I really thought
that putting tens of thousands of rows into a Data Table is just not what
Data Set and Data Table objects were designed for.
 
An update:
After playing around with this a while longer, trying to track down
the slowing factor (without success - but I have a long list of things
that DO NOT affect this :) ) I am left with the numebr of rows in the
child as the only parameter that affects performance. Whenever I have
more than 50000 rows in the child table the performance hit occurs.
Another VERY interesting thing: Adding the relation actually does not
succeed in this case. I have left it running for the night and it
creashed with stack overflow. Intersting...
Could it be an MS bug?
 
Miha - I have a feeling it will be still actual :) thanks.
In any case, I will post any developments in this issue here.
Have a nice vacation.

Alex.
Miha Markic said:
Hi Alex,

Unfortunatelly I am going off to the vacation.
I'll reply in a week if it will be still actual :-)

--
Miha Markic [MVP C#] - RightHand .NET consulting & development
miha at rthand com
www.rthand.com

Alex Ben-Ari said:
Miha - see end of this reply for a code snippet similar to the one
which causes the problem.
Rich - as for your two issues:
The long varchar key is a necessity in our case, I think. I don't want
to go into the design in too much detail, but it is a big problem for
me to use numerical ids because it later greatly complicates merges
between different proejcts and introduces great threats to the
integrity of the data. So let's assume for now that this is mandatory.
As for the large volumes I am loading into datasets - I don't see why
this should be a problem. The total RAM used for the dataset with the
80000 rows is about 100MB which is not a problem memory wise, and I
don't see why performance for placing the relation should rise by such
a factor when doubling the amount of rows from 40000 to 80000. I would
imagine it should be near linear. If it had I would have no problem
since placing the relation for 40000 takes a mere second or so.

Here is the "offending" code:
DataSet ds = new DataSet();
DataTable parent = new DataTable();
DataTable child = new DataTable();
...(create adaptor, set select command etc')...
adaptor.Fill(parent); //1 row
adaptor.Fill(child); //80000 rows
ds.Tables.Add(parent);
ds.Tables.Add(child);
//and now the performance intensive line
ds.Relations.Add("rel1",parent.Columns["someCol"],child.Columns["someCol"],false);

thanks for your replies,
Alex.

Rich said:
I'd be interested in different people's opinions on this but I would
raise
two questions re the overall design. First, I'm not sure what specific
kind
of relation is built into your database, or even what DBMS you are using,
but
I always thought that any form of key that averages 60 bytes is sort of
pushing the limits.

Secondly, I'm wondering why you're using such large tables in your client
app. Whether you need to set up a relation or not, I wouldn't even try
this
- unless it was TOTALLY avoidable. I think of Data Tables and Data Sets
as
temporary work areas. You get small amounts of data...make modifications
to
it...then send it back.

Like I said, I'd be interested in different opinions but I really thought
that putting tens of thousands of rows into a Data Table is just not what
Data Set and Data Table objects were designed for.

:

I am encountering a strange performance issue with DataRelation which
I am hoping someone can cast some light on.
I have two tables related to each other by a single field (of type
varchar, about 60 characters long on average).
Now I load a single row from the first table to a DataTable and about
40000 rows from the second table to another DataTable.
Now I add a relation between the two tables (the DataTable with the
single row is the parent). All goes well.
However: if I do exactly the same thing but instead of loading 40000
rows to the second table I load 80000 rows, the time it takes to add
the relation jumps from less than a second to close to an hour.
Why is this happening ? Any ideas anyone ?

(I tried adding the relation both before and after loading the data
into the tables - same result).

Alex.
 
Back
Top