Datasets and aggregate functions

  • Thread starter Thread starter John Dann
  • Start date Start date
J

John Dann

I need to perform some aggregate stats on an ado.net dataset.datatable
(which just contains numeric data), nothing too complicated, just eg:

** max and min values in a column

** row index of where the max and min occurred

** column mean

I'm not aware that there are any ado.net 'native' functions to do
this specifically on datasets (but tell me if I'm wrong on this!).
This is also a function that will be called iteratively and
frequently, so performance is not irrelevant.

Question is what the best way of approaching this might be? There are
obviously several options like:

1. Iterating through all the rows of each column as required and
writing the routines 'manually' to do the calculations required.

2. A variant on [1] of maybe assigning the column to an array (but
presumably I'd need to populate this iteratively from the dataset?)
before doing the calculations (in case this might improve
performance).

3. Populate some third party data object (eg the C1 DataExpress class,
which does offer aggregate functions 'built in') with the dataset
table.

4. As [3] but use eg a data grid that offers similar functions.

5. Anything else??

Anyone have any views please on what might be the best approach to
try?

JGD
 
Hi John,

John Dann said:
I need to perform some aggregate stats on an ado.net dataset.datatable
(which just contains numeric data), nothing too complicated, just eg:

** max and min values in a column

use DataTable.Compute
** row index of where the max and min occurred
Iteration


** column mean

See .net help on DataColumn.Expression for what can be Computed.
I'm not aware that there are any ado.net 'native' functions to do
this specifically on datasets (but tell me if I'm wrong on this!).
This is also a function that will be called iteratively and
frequently, so performance is not irrelevant.

Question is what the best way of approaching this might be? There are
obviously several options like:

1. Iterating through all the rows of each column as required and
writing the routines 'manually' to do the calculations required.

2. A variant on [1] of maybe assigning the column to an array (but
presumably I'd need to populate this iteratively from the dataset?)
before doing the calculations (in case this might improve
performance).

No, it won't improve the performance. At least not significatnly.
3. Populate some third party data object (eg the C1 DataExpress class,
which does offer aggregate functions 'built in') with the dataset
table.

4. As [3] but use eg a data grid that offers similar functions.

5. Anything else??

Anyone have any views please on what might be the best approach to
try?

I would go with 1. It gives you speed and control.
 
use DataTable.Compute

Iteration

Many thanks for this insight. One follow-up question:

I was trying to find the index of the row with the max value (actually
it's the value of another column at that same index that I really
need) by trying:

' Previously define dt as datatable and populate
dim tmax as double = dt.compute("MAX(colname)", Nothing)
dim dr() as datarow=dt.select("colname=" & tmax.ToString)

tmax is identified OK, but the dr array is not populated at all
(ubound=-1). Maybe I need different syntax in the Select parameter,
but I'm not sure what? I don't know why but I'm also a little
uncomfortable searching for a precise double value as a string.

MAybe this is why you suggested searching by 'manual' iteration?

JGD
 
Many thanks for this insight. One follow-up question:

I was trying to find the index of the row with the max value (actually
it's the value of another column at that same index that I really
need) by trying:

' Previously define dt as datatable and populate
dim tmax as double = dt.compute("MAX(colname)", Nothing)
dim dr() as datarow=dt.select("colname=" & tmax.ToString)

tmax is identified OK, but the dr array is not populated at all
(ubound=-1). Maybe I need different syntax in the Select parameter,
but I'm not sure what? I don't know why but I'm also a little
uncomfortable searching for a precise double value as a string.

MAybe this is why you suggested searching by 'manual' iteration?

Yes, do the iteration instead as it will be more reliable and faster (only
one loop).
 
Back
Top