Reading a lot of rows, how to get better performance without GetRows()?


Thomas Jensen

Hi,

I'm replacing a VB application with a .NET application written in C#.
Previously I have been using the GetRows() function of ADO recordsets
quite extensively, because in the old application it reduced data
retrieval time several-fold.

I think the problem is that my data is stored in tables with a
relatively small row size, for instance tables containing currency
values (date/value pairs).
I will typically be selecting 100-1000 rows at a time.
Using an OdbcConnection/OdbcCommand/OdbcDataReader combination,
retrieving 500 records takes as long as 1000 ms over a 100 Mbit/s
switched network.
I'm using OdbcDataReader.GetValues() to retrieve the values.
I have run the application with DevPartner Profiler, which confirms that
the time is spent in OdbcDataReader.GetValues().

I'm currently programming against a MySQL server (4.0.14-nt) using the
MySQL ODBC driver 3.51, but I have experienced the same with an MSSQL server.

Performance Monitor shows up to 2000 packets/sec over the network during
processing, so I guess the problem arises from retrieving the records
one at a time.

Is there a way to force ADO.NET to buffer/prefetch data to avoid this
problem?


Simple program showing the problem:

using System;
using System.Data.Odbc;

// ...
OdbcConnection conn = new OdbcConnection("DSN=mysql;DATABASE=idix_store;");
conn.Open();

string sql = "SELECT test_int, date_created FROM t_unit;";
OdbcCommand cmd = new OdbcCommand(sql, conn);
OdbcDataReader reader = cmd.ExecuteReader();

DateTime t1 = DateTime.Now;
while (reader.Read())
{
    // read the current row's column values into an object array
    object[] vals = new object[reader.FieldCount];
    reader.GetValues(vals); // <-- 400 ms (500 rows), with GetRows this would be <50 ms
}
DateTime t2 = DateTime.Now;

Console.WriteLine(t2 - t1);
reader.Close();
conn.Close();
 
Well, unfortunately a DataSet filled through a DataAdapter won't help, because the DataAdapter's Fill method really uses a DataReader underneath the hood to populate the DataSet, so a DataReader is about as fast as it gets. As for prefetching, I don't believe ADO.NET can do this - someone please correct me if I am wrong. If synchronization and row order are not an issue, maybe you can run the query on multiple async threads. Or you could try outputting the data as XML, at least with SQL Server; I'm not sure how fast it would be or whether MySQL even supports it, but it may be worth trying.
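For the XML route with SQL Server, I mean something along these lines (an untested sketch - the connection string is just a placeholder and I've reused the table from your example):

using System;
using System.Data.SqlClient;
using System.Xml;

// ...
SqlConnection conn = new SqlConnection("Server=myserver;Database=mydb;Integrated Security=SSPI;");
conn.Open();

// FOR XML AUTO makes SQL Server return the result set as a single XML stream
// instead of a row-by-row tabular result.
SqlCommand cmd = new SqlCommand("SELECT test_int, date_created FROM t_unit FOR XML AUTO", conn);
XmlReader xr = cmd.ExecuteXmlReader();
while (xr.Read())
{
    if (xr.NodeType == XmlNodeType.Element)
    {
        // each row arrives as an element with the columns as attributes
        Console.WriteLine(xr.GetAttribute("test_int"));
    }
}
xr.Close();
conn.Close();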

Keenan Newton
 
Keenan said:
Well, unfortunately a DataSet filled through a DataAdapter won't help,
because the DataAdapter's Fill method really uses a DataReader
underneath the hood to populate the DataSet, so a DataReader is about
as fast as it gets.

Ok, I see - I kind of suspected that.

Keenan said:
As for prefetching, I don't believe ADO.NET can do this - someone
please correct me if I am wrong. If synchronization and row order are
not an issue, maybe you can run the query on multiple async threads.
Or you could try outputting the data as XML, at least with SQL Server;
I'm not sure how fast it would be or whether MySQL even supports it,
but it may be worth trying.

I think I found another solution, but I will follow up here if it
turns out not to work.
 
Thomas said:
Hi,

I'm replacing a VB application with a .NET application written in C#.
Previously I have been using the GetRows() function of ADO recordsets
quite extensively, because in the old application it reduced data
retrieval time several-fold.

I think I may have found a solution: I downloaded a library called
ByteFX.Data, which gives much better throughput:

System.Data.Odbc (params in SQL)              : ~2200 ms
System.Data.Odbc (Command.Parameters)         : ~2000 ms
ByteFX.Data.MySqlClient (params in SQL)       :  ~100 ms
ByteFX.Data.MySqlClient (Command.Parameters)  :  ~650 ms

All figures include adding the parameters, Execute() and iterating over
the ___DataReader object.

A pity though - I had hoped to avoid using 3rd-party modules.
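For reference, the ByteFX provider follows the standard ADO.NET pattern, so the reading loop looks just like the ODBC one - roughly like this (a sketch from memory; the connection string details are placeholders):

using System;
using ByteFX.Data.MySqlClient;

// ...
MySqlConnection conn = new MySqlConnection("Server=myserver;Database=idix_store;User ID=me;Password=secret;");
conn.Open();

MySqlCommand cmd = new MySqlCommand("SELECT test_int, date_created FROM t_unit", conn);
MySqlDataReader reader = cmd.ExecuteReader();
while (reader.Read())
{
    // same GetValues() loop as before, only the provider changed
    object[] vals = new object[reader.FieldCount];
    reader.GetValues(vals);
}
reader.Close();
conn.Close();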
 
I would like the ability to dump the results of a query into a simple 2D
object array without the overhead of a DataSet (XML parsing, a huge
number of unused features). I don't even need startrow and rowcount -
the SQL can provide only the needed rows (I would use a DataSet if I
wanted to change the filter without requerying).

This application just takes SQL, pivots the results and outputs an HTML
table for display on a web page sent in an email.
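Roughly, what I have in mind is a small GetRows()-style helper that just pulls the reader straight into object arrays (a sketch only; the names are placeholders):

using System.Collections;
using System.Data;

// Pull every row of a data reader into an object[] and collect them in a list,
// i.e. a poor man's GetRows() without going through a DataSet.
public static ArrayList ReadAllRows(IDataReader reader)
{
    ArrayList rows = new ArrayList();
    while (reader.Read())
    {
        object[] vals = new object[reader.FieldCount];
        reader.GetValues(vals);
        rows.Add(vals);
    }
    return rows;
}

// usage:
// OdbcDataReader reader = cmd.ExecuteReader();
// ArrayList rows = ReadAllRows(reader);
// object[] firstRow = (object[])rows[0];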
 
 
I would like the ability to dump the results of a query into a simple 2D
object array without the overhead of a DataSet (XML parsing, a huge
number of unused features)

You are misinformed about the overhead of a DataSet. There is no XML
parsing, the data is stored in object arrays, and the unused features are
mostly free if you don't use them.
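For example, filling a DataTable and reading the rows back as plain object arrays is just this (a quick sketch reusing the table from the original post, assuming conn is an open OdbcConnection):

using System.Data;
using System.Data.Odbc;

// ...
OdbcDataAdapter da = new OdbcDataAdapter("SELECT test_int, date_created FROM t_unit", conn);
DataTable dt = new DataTable();
da.Fill(dt);                        // Fill drives a DataReader internally; no XML involved

foreach (DataRow row in dt.Rows)
{
    object[] vals = row.ItemArray;  // each row hands back its values as an object[]
}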

David
 
Overcoming DataTable overhead with Concat/Split

Hello chaps

I'm using MySQL 4.1 with MySqlDataAdapter.Fill and a DataTable.

I found there's a big overhead with each column returned, i.e. 3 MB spread over 10 columns took a lot longer to arrive over the network than 3 MB in 1 column.

If you're prepared to lose the db datatypes of the columns and just want a fast select, try merging them into one column with something like a '|' separator and then splitting them back into separate columns after they've arrived.

SELECT CONCAT(field_1,'|',field_2,'|',field_3) AS my_big_column FROM .....
...
DataTable dt_data = new DataTable();
objDataAdapter.Fill(dt_data);

for (int r = 0; r < dt_data.Rows.Count; r++)
{
    // Get the string to split
    string lstemp = dt_data.Rows[r][0].ToString(); // 0 because it's always the first and only column
    string[] lsarray = lstemp.Split('|');
    int myAge = Int32.Parse(lsarray[COLUMN_FOR_MYAGE]);
    string myName = ...
}

It's not for everyone, and it involves hardcoding the column positions, but if you know your data the speed gain is about 3x - on my data of about 5 columns and 18,000 rows it came down from 600 ms to less than 200 ms.

I'm interested in other thoughts on this, like the CORE and ByteFX drivers. Any good/bad feedback from using them?

Andy
 