Any suggestions for optimising this very slow (45 minute) query?

  • Thread starter: dochsm

This query processes 200,000 records each week. It calculates two
cumulative attendance totals by week number for students who are
identified by their admission number (Adno). Trouble is, it takes
around 45 minutes to run. Is there a better way to write it?

Howard



INSERT INTO Tbl_CumulativeAttendance
( Adno,
ExportWeekNumber,
Used_Surname,
CumulativePresentNumber,
CumulativePossibleNumber,
CumulativePercent )
SELECT DISTINCT
tmp_attends.Adno,
tmp_attends.ExportWeekNumber,
tmp_attends.Used_Surname,
(SELECT SUM(PresentMark)
FROM tbl_TEMP_all_attendances_with_week_number
WHERE AdNo = tmp_attends.Adno
AND Val(ExportWeekNumber) <= Val(tmp_attends.ExportWeekNumber)
) AS CumulativePresentNumber,
(SELECT SUM(PossibleMark)
FROM tbl_TEMP_all_attendances_with_week_number
WHERE AdNo = tmp_attends.Adno
AND Val(ExportWeekNumber) <= Val(tmp_attends.ExportWeekNumber)
) AS CumulativePossibleNumber,
IIf([CumulativePossibleNumber] > 0,
([CumulativePresentNumber] / [CumulativePossibleNumber]) * 100,
0) AS CumulativePercent
FROM tbl_TEMP_all_attendances_with_week_number AS tmp_attends
ORDER BY
tmp_attends.Adno,
tmp_attends.ExportWeekNumber;
 
That was indeed faster. I made a temp table first, using Val() to
convert the week number so that I could index it, and got rid of the
Val() in the query. At first I thought nothing was happening as the
status bar showed almost no progress for about 5 minutes. Then it shot
to the end in about two seconds. Interestingly, I did consider a self
join first of all but was put off doing it by comments in some groups
that subqueries are faster. Obviously not in this case.
Thanks for the help
Howard
 
Yes I did. I made a temp table with the same structure as my original
data (a linked CSV file) except that I changed the weeknum type to
integer and indexed it. Then I used a delete query to clear out all
the records, followed by an append with a Val() to convert my weeknum
to integer, and refill it each week. Then I used that temp table as the
source of the query you helped me with. I'd prefer to use a make table
instead of a delete followed by an append, as I suspect that bit would
be faster, but I don't know how to do a make table query with data
while at the same time defining a data type and an index (as opposed
to using DDL to just make an empty table).
Howard
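
One way to get that effect (a rough sketch only, not necessarily what was used in the
thread: tbl_raw_attendance and idxAdnoWeek are made-up names, and the field list is
trimmed to the essentials) is to let a make-table query set the field type via CInt()
and then add the index afterwards with DDL. Access SQL view has no comment syntax, so
the notes are in prose. First the make-table query - drop or rename any existing copy
of the table first, and give the new field a different name if Access complains about
a circular reference:

SELECT R.AdNo,
R.Used_Surname,
CInt(Val(R.ExportWeekNumber)) AS ExportWeekNumber,
R.PresentMark,
R.PossibleMark
INTO tbl_TEMP_all_attendances_with_week_number
FROM tbl_raw_attendance AS R;

Then, as a separate query (or CurrentDb.Execute call), index the new table:

CREATE INDEX idxAdnoWeek
ON tbl_TEMP_all_attendances_with_week_number (AdNo, ExportWeekNumber);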
 
A small question though.
Although your new code generates exactly the same percentage figures
as the original code, the cumulative present and possible figures are
very different.
I would expect the possible to be 10 more than it was last week each
time,
as they are in the original query but I cannot see where the new
figures are coming from.
Is there a way to retain the original sums?

e.g. original result:

WeekNumber  CumulativePresentNumber  CumulativePossibleNumber  CumulativePercent
1           3                        4                         75
2           13                       14                        92.8
3           23                       24                        95.8
4           33                       34                        97.0
5           42                       44                        95.4
6           48                       52                        92.3
7           54                       62                        87.0
8           64                       72                        88.8

result of faster query:

ExportWeekNumber  CumulativePresentNumber  CumulativePossibleNumber  CumulativePercent
1                 36                       48                        75
2                 182                      196                       92.8
3                 322                      336                       95.8
4                 462                      476                       97.0
5                 588                      616                       95.4
6                 672                      728                       92.3
7                 756                      868                       87.0
8                 1792                     2016                      88.8


Howard


Weeknum  CumPresent  CumPossible  CumPercent
1        3           4            75
2        13          14           92.8
3        23          24           95.8
4        33          34           97.0
5        42          44           95.4
6        48          52           92.3
7        54          62           87.0
8        64          72           88.8
9        74          82           90.2

ExportWeekNumber  CumulativePresentNumber  CumulativePossibleNumber  CumulativePercent
1                 36                       48                        75
2                 182                      196                       92.8
3                 322                      336                       95.8
4                 462                      476                       97.0
5                 588                      616                       95.4
6                 672                      728                       92.3
7                 756                      868                       87.0
8                 1792                     2016                      88.8
 
Here it is.
I altered your code slightly to do without the extra alias, but it
produced the same result as your code.
Thanks for your continued help with this.
Howard

INSERT INTO Tbl_CumulativeAttendance
( Adno,
ExportWeekNumber,
Used_Surname,
CumulativePresentNumber,
CumulativePossibleNumber,
CumulativePercent )
SELECT
tmp_attends.Adno,
tmp_attends.ExportWeekNumber,
tmp_attends.Used_Surname,
SUM(Q.presentMark) AS CumulativePresentNumber,
SUM(Q.PossibleMark) AS CumulativePossibleNumber,
IIf(SUM(Q.PossibleMark)>0,(SUM(Q.presentMark)/
SUM(Q.PossibleMark))*100,0) AS CumulativePercent
FROM
tbl_TEMP_all_attendances_with_week_number AS tmp_attends
INNER JOIN
tbl_TEMP_all_attendances_with_week_number AS Q ON
(Q.AdNo=tmp_attends.Adno) AND
(Q.ExportWeekNumber<=tmp_attends.ExportWeekNumber)
GROUP BY
tmp_attends.Adno,
tmp_attends.ExportWeekNumber,
tmp_attends.Used_Surname
ORDER BY tmp_attends.Adno, tmp_attends.ExportWeekNumber;
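
For what it's worth, the inflated sums look consistent with the outer table still
having one row per mark: every mark row for a given Adno and week joins to the whole
cumulative set, so the present and possible totals get multiplied by the number of
mark rows in that week (the percentages survive because numerator and denominator are
scaled by the same factor). A sketch of one way to keep the original sums, assuming
that is the cause, is to drive the query from a saved query with one row per student
per week (qry_StudentWeeks is a made-up name):

SELECT DISTINCT Adno, ExportWeekNumber, Used_Surname
FROM tbl_TEMP_all_attendances_with_week_number;

and then join that to the mark rows instead of joining the table to itself:

INSERT INTO Tbl_CumulativeAttendance
( Adno,
ExportWeekNumber,
Used_Surname,
CumulativePresentNumber,
CumulativePossibleNumber,
CumulativePercent )
SELECT
W.Adno,
W.ExportWeekNumber,
W.Used_Surname,
SUM(Q.PresentMark) AS CumulativePresentNumber,
SUM(Q.PossibleMark) AS CumulativePossibleNumber,
IIf(SUM(Q.PossibleMark)>0,(SUM(Q.PresentMark)/
SUM(Q.PossibleMark))*100,0) AS CumulativePercent
FROM
qry_StudentWeeks AS W
INNER JOIN
tbl_TEMP_all_attendances_with_week_number AS Q ON
(Q.AdNo=W.Adno) AND
(Q.ExportWeekNumber<=W.ExportWeekNumber)
GROUP BY
W.Adno,
W.ExportWeekNumber,
W.Used_Surname
ORDER BY W.Adno, W.ExportWeekNumber;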


The first few rows of the table it is processing look like this. It
represents the attendance marks for lots of students:
each row is one mark,
2 marks each day (these are often, but not always, the same, e.g. 3rd September),
usually 10 marks a week unless there are short weeks at the beginning of terms.
I pre-processed the raw data to get this table by translating the
various raw mark codes entered by the staff into a 1 or a 0 for both
PresentMark and PossibleMark so that it was easier to add up: 1 = it
was a present mark or a possible mark, 0 means it wasn't. The week
number is the third column from the right. (A sketch of one way to do
this translation follows the sample rows below.)

(BTW I realise this table is not normalised. The derivable columns are
only there while I debug it and check the maths. The finished thing
will only have the Adno and the rightmost 4 columns.)


Adno UPN Used_Surname Used_Forename Legal_Surname Legal_Forename Gender Reg Year Mark markdate ExportWeekNumber PresentMark PossibleMark
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 # 31-Aug-10 1 0 0
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 # 31-Aug-10 1 0 0
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 # 01-Sep-10 1 0 0
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 # 01-Sep-10 1 0 0
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 \ 02-Sep-10 1 1 1
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 / 02-Sep-10 1 1 1
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 M 03-Sep-10 1 0 1
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 \ 03-Sep-10 1 1 1
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 # 04-Sep-10 1 0 0
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 # 04-Sep-10 1 0 0
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 # 05-Sep-10 1 0 0
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 # 05-Sep-10 1 0 0
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 / 06-Sep-10 2 1 1   <---- this is where week 2 starts
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 \ 06-Sep-10 2 1 1
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 / 07-Sep-10 2 1 1
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 \ 07-Sep-10 2 1 1
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 \ 08-Sep-10 2 1 1
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 / 08-Sep-10 2 1 1
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 / 09-Sep-10 2 1 1
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 \ 09-Sep-10 2 1 1
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 \ 10-Sep-10 2 1 1
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 / 10-Sep-10 2 1 1
000001 E573222599033 Bloggs Sarah Bloggs Sarah F 11IC Year 11 # 11-Sep-10 2 0 0
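
For the mark-code translation, a rough sketch of the kind of append query that could
do it (tbl_raw_marks is a made-up name for the linked raw data, the field list is
trimmed, and the code list is only guessed from the sample above - '/' and '\' counted
as present and possible, 'M' as possible but not present, '#' as neither - the real
list of codes will be longer):

INSERT INTO tbl_TEMP_all_attendances_with_week_number
( AdNo, Used_Surname, Mark, markdate, ExportWeekNumber, PresentMark, PossibleMark )
SELECT
AdNo,
Used_Surname,
Mark,
markdate,
CInt(Val(ExportWeekNumber)),
IIf(Mark IN ('/','\'), 1, 0),
IIf(Mark IN ('/','\','M'), 1, 0)
FROM tbl_raw_marks;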
 
> Yes I did. I made a temp table with the same structure as my original
> data (a linked CSV file) except that I changed the weeknum type to
> integer and indexed it. Then I used a delete query to clear out all
> the records, followed by an append with a Val() to convert my weeknum
> to integer, and refill it each week. Then I used that temp table as
> the source of the query you helped me with.
>
> I'd prefer to use a make table instead of a delete followed by an
> append, as I suspect that bit would be faster

There is very little difference in performance between a make table
vs. a delete all followed by an append.

> while at the same time defining a data type and an index (as opposed
> to using DDL to just make an empty table)

When doing an insert of a lot of records, and it isn't a multi-user
table, it is much faster to remove all indexes, insert the records and
then create the indexes in code.
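
In DDL terms that pattern might look something like this (a sketch only - idxAdnoWeek
is a made-up index name, and DROP INDEX and CREATE INDEX each have to run as their own
saved query or Execute call):

DROP INDEX idxAdnoWeek ON tbl_TEMP_all_attendances_with_week_number;

(run the weekly delete and append queries here)

CREATE INDEX idxAdnoWeek
ON tbl_TEMP_all_attendances_with_week_number (AdNo, ExportWeekNumber);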

Also to avoid the bloating mentioned by Marshall see the
TempTables.MDB page at my website which illustrates how to use a
temporary MDB in your app.
http://www.granite.ab.ca/access/temptables.htm

Tony

--
Tony Toews, Microsoft Access MVP
Tony's Main MS Access pages - http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
For a convenient utility to keep your users FEs and other files
updated see http://www.autofeupdater.com/
 
Yes, that mod completely changed the performance gain.
It's now down to just 20 seconds! (and that's returning all the
derivable data and also doing a sort, neither of which is really
necessary at this stage)

Brilliant. I had been running my query overnight so that the data was
ready for use by later queries initiated by the user.
At this new speed though I could run it in real time and save myself a
lot of bother.

I must study the code, and that of Tony, so that I learn from it and
don't make the same mistake again.

Thanks for all your help.

Howard
 
> There is very little difference in performance between a make table
> vs. a delete all followed by an append.
>
> When doing an insert of a lot of records, and it isn't a multi-user
> table, it is much faster to remove all indexes, insert the records and
> then create the indexes in code.
>
> Also to avoid the bloating mentioned by Marshall see the
> TempTables.MDB page at my website which illustrates how to use a
> temporary MDB in your app. http://www.granite.ab.ca/access/temptables.htm


Thanks for the advice Tony, I'll give that a go.
Howard
 
> it is mandatory to avoid system caching
> from badly skewing the test compared to your real usage
> scenario,

I noticed many years ago that running a complex report took 40 seconds
the first time. But after flipping to design view, making some changes
and then rerunning it, the report only took 20 seconds - until it was
closed, whereupon opening it again took 40 seconds.

Tony
--
Tony Toews, Microsoft Access MVP
Tony's Main MS Access pages - http://www.granite.ab.ca/accsmstr.htm
Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
For a convenient utility to keep your users FEs and other files
updated see http://www.autofeupdater.com/
 