Group Records on a date/time field?

  • Thread starter Thread starter Codemonkey
  • Start date Start date
C

Codemonkey

Hi,

I have a table laid out with the following structure:

SampleDate (DateTime),
Value (Long Integer)

Both fields are cumalitive (i.e. Each sucessive sample date is later than
the previous record and each value field is greater than the last).

The amount of time between each record is not constant (a sample may be
taken every 10 minutes or at different varying intervals).

What is the easiest and most efficient way of grouping these records by a
configurable time period? For example to:

1) Get the Max of Value for each hour in the day or each day in the month or
each month in the year?
2) Get the Change (Max-Min) of Value for each hour in the day or each day in
the month or each month in the year?

Obviously no record should be returned for a time period where there is no
sample. I also want to use SQL only (so it can be called from ADO.net).

At the minute, I'm grouping on the following Calculated Field:
Format(SampleDate, "dd/MM/yyyy HH")

This results in grouping the records for each hour in the day. I don't like
this method for the following reasons:

1) The performace could be better without the changing from Date to String.
2) I'm stuck with time periods defined by the different format strings that
Format() can accept - e.g. I can't break it down in to 6 hour periods.

Any ideas would be greatly appreciated, I'm open to suggentions on queries
and table design (or anything else that may help!)

Thanks,

Trev.
 
Hi,



SELECT int(SampleDate*24*6) As Starting,
Max(Value) As Maximum,
Max(Value)-Min(Value) As Change

FROM mytable

GROUP BY int(SampleDate*24*6)



you would have to divide Starting by 24.0* 6.0 to get it into a date_time
value ( there are 24 hour per days, and 6 (periods of 10 minutes) per
hour ). You can then format that date_time accordingly to whatever format
you wish. ( I haven't done it in the query because it is faster to GROUP BY
on an integer than on a string).



Hoping it may help
Vanderghast, Access MVP
 
Michel,

Thanks for the help. So far I've got it partially working the way I want.
I'll post the finished query here soon.

Cheers again.

Trev.
 
Michel,

Here's the solution I've come up with by using your suggestion. It turns out
that I don't need the "Change" field.

------------------------------------------
PARAMETERS [@DayMultiplier] IEEEDouble;

SELECT

CLng(Test.SampleDate*[@DayMultiplier]) AS Grouping,
CDate([Grouping]/[@DayMultiplier]) AS SampleDate,
Max(Test.Value) AS MaximumValue

FROM Test

GROUP BY CLng(Test.SampleDate*[@DayMultiplier]);
-------------------------------------------

@DayMultiplier is a parameter used to specify the period of time the samples
are grouped into (e.g. "24" gives hourly samples, "1" gives daily samples,
"0.5" gives 2 day samples, "1/7" gives weekly samples etc.)

Sample Data (dates are in UK format - dd/MM/yyyy)
-----------
SampleDate Value
01/01/2003 13:20:00 20
01/01/2003 13:29:59 25
01/01/2003 13:45:00 50
01/01/2003 14:04:00 75
01/01/2003 14:31:00 80
01/01/2003 15:00:00 100
01/01/2003 15:10:00 110
01/01/2003 15:25:00 125
01/01/2003 16:25:00 151
01/01/2003 16:45:00 152
01/01/2003 18:00:00 180
01/01/2003 18:10:00 185
01/01/2003 18:45:00 196
02/01/2003 10:13:00 200
02/01/2003 10:45:00 210
02/01/2003 16:45:00 230
03/01/2003 12:00:01 250
03/01/2003 12:05:00 265
03/01/2003 12:35:00 370

Results when grouped by hour (@DayMultiplier = 24)
--------------------------------------------------
SampleDate MaximumValue
01/01/2003 13:00:00 25
01/01/2003 14:00:00 75
01/01/2003 15:00:00 125
01/01/2003 16:00:00 151
01/01/2003 17:00:00 152
01/01/2003 18:00:00 185
01/01/2003 19:00:00 196
02/01/2003 10:00:00 200
02/01/2003 11:00:00 210
02/01/2003 17:00:00 230
03/01/2003 12:00:00 265
03/01/2003 13:00:00 370

So far it seems to work the way I want (see sample data above). I haven't
had a chance to test it in all situations or with a lot of data to test
performance. However, I did notice that the "SampleDate" is the mid point
for the maximum calculations (i.e. if grouped by hours then a "SampleDate"
of 14:00:00 will return the "Maximum" between 13:30:00 and 14:29:29). This
isn't a major problem for me.

Just one more performance related question though as I haven't a lot of data
to test with yet: Given the overhead of returning the unnecessecary
"Grouping" field to the client, would the following query be more efficient
than the one above? Is the Jet SQL engine smart enough not to calculate
"CLng(Test.SampleDate*[@DayMultiplier])" twice?

------------------------------------------
PARAMETERS [@DayMultiplier] IEEEDouble;

SELECT
CDate(CLng(Test.SampleDate*[@DayMultiplier])/[@DayMultiplier]) AS
SampleDate,
Max(Test.Value) AS MaximumValue

FROM Test

GROUP BY CLng(Test.SampleDate*[@DayMultiplier]);
------------------------------------------


Thanks again for your help.

Trev.
 
Disregard the stupid question about performance in the last post. I guess
even if Jet isn't smart enough not to calculate
"CLng(Test.SampleDate*[@DayMultiplier])" twice, then both queries will be
slower as both use the same expression twice. This leaves the second query
faster because it doesn't return the unused fields.

D'Oh.

Trev.


Codemonkey said:
Michel,

Here's the solution I've come up with by using your suggestion. It turns out
that I don't need the "Change" field.

------------------------------------------
PARAMETERS [@DayMultiplier] IEEEDouble;

SELECT

CLng(Test.SampleDate*[@DayMultiplier]) AS Grouping,
CDate([Grouping]/[@DayMultiplier]) AS SampleDate,
Max(Test.Value) AS MaximumValue

FROM Test

GROUP BY CLng(Test.SampleDate*[@DayMultiplier]);
-------------------------------------------

@DayMultiplier is a parameter used to specify the period of time the samples
are grouped into (e.g. "24" gives hourly samples, "1" gives daily samples,
"0.5" gives 2 day samples, "1/7" gives weekly samples etc.)

Sample Data (dates are in UK format - dd/MM/yyyy)
-----------
SampleDate Value
01/01/2003 13:20:00 20
01/01/2003 13:29:59 25
01/01/2003 13:45:00 50
01/01/2003 14:04:00 75
01/01/2003 14:31:00 80
01/01/2003 15:00:00 100
01/01/2003 15:10:00 110
01/01/2003 15:25:00 125
01/01/2003 16:25:00 151
01/01/2003 16:45:00 152
01/01/2003 18:00:00 180
01/01/2003 18:10:00 185
01/01/2003 18:45:00 196
02/01/2003 10:13:00 200
02/01/2003 10:45:00 210
02/01/2003 16:45:00 230
03/01/2003 12:00:01 250
03/01/2003 12:05:00 265
03/01/2003 12:35:00 370

Results when grouped by hour (@DayMultiplier = 24)
--------------------------------------------------
SampleDate MaximumValue
01/01/2003 13:00:00 25
01/01/2003 14:00:00 75
01/01/2003 15:00:00 125
01/01/2003 16:00:00 151
01/01/2003 17:00:00 152
01/01/2003 18:00:00 185
01/01/2003 19:00:00 196
02/01/2003 10:00:00 200
02/01/2003 11:00:00 210
02/01/2003 17:00:00 230
03/01/2003 12:00:00 265
03/01/2003 13:00:00 370

So far it seems to work the way I want (see sample data above). I haven't
had a chance to test it in all situations or with a lot of data to test
performance. However, I did notice that the "SampleDate" is the mid point
for the maximum calculations (i.e. if grouped by hours then a "SampleDate"
of 14:00:00 will return the "Maximum" between 13:30:00 and 14:29:29). This
isn't a major problem for me.

Just one more performance related question though as I haven't a lot of data
to test with yet: Given the overhead of returning the unnecessecary
"Grouping" field to the client, would the following query be more efficient
than the one above? Is the Jet SQL engine smart enough not to calculate
"CLng(Test.SampleDate*[@DayMultiplier])" twice?

------------------------------------------
PARAMETERS [@DayMultiplier] IEEEDouble;

SELECT
CDate(CLng(Test.SampleDate*[@DayMultiplier])/[@DayMultiplier]) AS
SampleDate,
Max(Test.Value) AS MaximumValue

FROM Test

GROUP BY CLng(Test.SampleDate*[@DayMultiplier]);
------------------------------------------


Thanks again for your help.

Trev.
 
Back
Top