DSUM alternative for update query?

  • Thread starter Thread starter Helmut Glänzer
  • Start date Start date
H

Helmut Glänzer

DSUM is not suitable to build the running sum on large datasets. It appears
to always add all rows to the current so that the performance degenerates
more than linearly. I am wondering what alternative there might exist. In
particular the following statement seems to work with most SQL compatible
DBs:
update t set rs = s + (select max(rs) from t) where t.f <= t.f;
where t is the table, rs the field for the running sum, s the sums that go
into the running sum and f2 the field to sort by.

However access will not accept such a statement although max(rs) is a
scalar.
Does anyone know a remedy?
 
Hi,


Make a temporary table out of the total query; use that table in the
update statement.


Hoping it may help,
Vanderghast, Access MVP
 
First, yes you can use an UPDATE SQL statement similar to
the one you posted; you just need to fix the syntax.

Try something like:

update t set rs = s + (select max(rs) from t where t.f <=
t.f);

Notice I moved the the closing parenthesis.

---------------------------

However! Just because MAX is scalar (i.e., only returns
one value) does not mean it will be very quick on very
large recordsets. I would even argue the use of the
term "linear" to describe the increase in time
requirements.

The only way I know of to achieve "linear" time
requirements for this update is to use what I call an
inline function. Essentially, you will use a function
that stores the previous value, performs the simple
addition, and returns the result for updating. This type
of solution is truly linear.

I posted the following earlier. Using inline functions is
not always the easiest, and while you will be performing
an update, the same applies for views (but in fact,
updating makes your life easier in many ways; see 2a
below).
------------------------

I consider this authoritative, but not exclusive.

There are essentially two competing methods to accomplish
accumulations by performing line-by-line analysis, each
with their advantages and drawbacks. The two methods are
(a) queries, and (b) functions.

Questions that determine the appropriate answer:

-- How many records will be displayed as a result? 100 or
fewer? Thousands?

-- Will the query need to be updatable?

-- Will the user scroll through these records?

-- Will the user need to make changes and have the
calculations immediately update?

-- Will the user reorder the records, thus requiring the
recalculation of the cumulative totals?

-- Are there groupings? In other words, will you be
looking for the last cumulative value after Group A, then
after Group B, etc.?

------------

An overview of the competing methodologies:

tblOrderItems
--OrderItemID - AutoNumber
--OrderItem - Text 255
--OrderValue - Currency

Version 1: Queries Only

There are a few ways of implementing this as a subquery.
Here, I provide an example of using a subquery within the
select statement to return the cumulative values up to
that point (known as a correlated query).

SELECT MainOrderItems.OrderItemID,
MainOrderItems.OrderValue, (Select Sum([OrderValue]) From
tblOrderItems As SubQuery Where SubQuery.[OrderItemID] <=
MainOrderItems.OrderItemID) AS Cumulative
FROM tblOrderItems AS MainOrderItems
ORDER BY MainOrderItems.OrderItemID;

Advantages:
--This version is fully encapsulated within SQL
--SQL geeks love it because it makes them feel uber and
reinforces their opinion that code is for the uninitiated

Disadvantages:
--Doesn't allow for updates*
--The displayed sort order must be managed
--Don't try this with thousands of records if you want an
answer anytime soon

*OK, OK, so there IS a way to make the results updatable.
Use DSum in the select statement instead of a subquery.
For example:

SELECT tblOrderItems.*, nz(DSum
("[OrderValue]","tblOrderItems"," [OrderItemID]<" &
[tblOrderItems].[OrderItemID]))+[OrderValue] AS Cumulative
FROM tblOrderItems;

This combination allows for the cumulative action to work
while maintaining its updatability capabilities. AND, the
numbers will automatically recalculate if you make a
change. If you don't want the automatic recalculation,
you can use a DLookup against the first query that uses
the subquery.

Version 2: Functions

There are two distinct ways to utilize functions. I break
them down into (a) inline functions and (b) non-inline
functions, or in other words: (a) hard but blazingly fast
and (b) somewhat easy and flexible but slow.

2(a) Inline functions take the current value of the
record, perform a mathematic (or other) function, "stores"
the result of calculation, and return the answer for
display.

Inline functions use the "stored" value for next call.
Inline functions can be used in Queries, Forms, and
Reports.

Advantages:
--Absolute performance screamers. Inline functions are
hands-down the fastest executors.
--Able to easily handle the largest record counts; linear
time requirement growth

Disadvantages:
--Hardest to manage*
--Is best used in conjunction with Aggregate functions,
such as Last, with or without Group By.**
--Not designed for scrolling, but you can do it if you
really have to***

*Inline functions require the additional requirement of
resetting the "stored" value from execution to exection.
Within queries, this can be achieved by having one query
return the first record, and if the record being executed
is the first record, then it resets the stored value. For
reports, it can be reset with the Report_Open. For forms,
don't use inline functions -- use non-inline functions or
queries.

**Inline functions are superior at returning an aggregate
result based upon line-by-line analysis in an incredibly
fast manner (quite possibly unbeatable speed-wise, even
when compared against code using cursors). Essentially,
you create a query that uses the inline function. Then
you create another query that performs the aggregation
against the first query.

If function execution order is important, there's an
important caveat. The query that runs the inline function
must be based upon a query that includes the SELECT TOP
100 PERCENT ... keywords. "TOP 100 PERCENT" ensures that
your stated Order By is performed against the subquery.
This ensures the inline function is called in the exact
order you demand (such as forcing groups to execute in
order). Then an aggregate query can simply take
the "last" of the inline query to return the blitzkrieg
mathematic results.

Note: Cumulative multiplication in conjunction with
aggregate functions does not require a function. You can
use an expession similar to this one: IIf(Sum(IIf([Value]
<0,1,0)) Mod 2=1,-1,1)*Sgn(Min(Abs([Value])))*Exp(Sum(Log
(Abs(IIf([Value]=0,1,[Value])))))

***While technically the results can be set to be
updatable while being shown to the user, it just doesn't
work. This is because the function is called as the user
scrolls through the recordset for the row the user is on.
This wreaks absolute havok with "stored" variables, and
will return unexpected values. If you are going to scroll
the results, you *should* set the result type to
Snapshot. However, even this is unadvisable; update the
the field with an UPDATE statement before displaying to
the user.

If in-line functions must be used with a scrollable and
updatable recordset, then you must implement methods that
dramatically hamper performance. This is because you have
to forget about using stored values, and dynamically
calculate the proper result based upon the users position
within the recordset. (I suppose you could implement both
methods. The first for the initial run for speed, the
second kicks in after the last value is processed. But
jeez...)

Here's an example of an in-line function (in a standard
module). This function performs a cumulative multiply,
and resets the value when a new CoGroupID is passed. (In
the real world, this should be performed using the
expression I listed above; this is for illustration only.)

Function InlineMultiply(lngInlineValue As Long,
bInitialize As Boolean, lngCoGroupID As Long) As Long
On Error GoTo Err_Overflow

Static lngHoldValue As Long
Static lngHoldCoGroup As Long
Static lngHoldCompanyID As Long

' determine if to initialize
If bInitialize Then
lngHoldValue = 1
lngHoldCoGroup = lngCoGroupID
lngHoldCompanyID = 0
End If

' determine if a different co group passed (requires
resetting)
If lngHoldCoGroup <> lngCoGroupID Then
lngHoldValue = 1
lngHoldCoGroup = lngCoGroupID
End If

' return the hold value * inline value
lngHoldValue = lngHoldValue * lngInlineValue
InlineMultiply = lngHoldValue

Exit_InlineMultiply:
Exit Function

Err_Overflow:
InlineMultiply = 0
lngHoldValue = 1
lngHoldCoGroup = 0
Resume Exit_InlineMultiply

End Function

The first query involved utilizes the TOP 100 PERCENT
keywords to force the appropriate execution order of the
function. Otherwise, there would be no guarantee that all
CoGroupID 1's would execute before any CoGroupID 2's were
passed (thus resetting the value prematurely).

SELECT TOP 100 PERCENT tblCompanies.*
FROM tblCompanies
WHERE tblCompanies.Value Is Not Null
ORDER BY tblCompanies.CoGroupID, tblCompanies.CompanyName;
|| note this Order By CoGroupID
|| in conjunction with TOP 100 PERCENT

--This query forces the Order By to be performed, thanks
to the TOP 100 PERCENT keywords. This is then used as a
subquery in the next query that actually calls the inline
function:

SELECT qrySortedCompanyData.CompanyID,
qrySortedCompanyData.CoGroupID,
qrySortedCompanyData.CompanyName,
qrySortedCompanyData.Value, InlineMultiply([Value],
[qrySortedCompanyData].[CompanyID]=[qryFirstCompany].
[CompanyID],[qrySortedCompanyData].[CoGroupID]) AS
TheInline
FROM qrySortedCompanyData, qryFirstCompany
ORDER BY qrySortedCompanyData.CoGroupID,
qrySortedCompanyData.CompanyName;

--This query can then be used to make summaries:

SELECT qryRunningMultiply.CoGroupID, Last
(qryRunningMultiply.TheInline) AS LastValue
FROM qryRunningMultiply
GROUP BY qryRunningMultiply.CoGroupID;

-- This summary shows the cumulative value for each group.

2(b) Non-Inline Functions.

Non-inline functions are chiefly designed for use on
forms. They take the recordsetclone and a bookmark, and
perform their calculations against that recordset.

Advantages:
--Fully updatable recordsets
--Immediate reculations
--Easy to write and maintain
--The function uses the current sort, even if the user
changes it

Disadvantages:
--Not exactly speedy
--Not for thousands of records

The following in-line function example returns the row
number:

-TextBox controlsource:
=IIf(IsNull([TheValue]),Null,LineNumber([RecordsetClone],
[Bookmark]))

-Non-inline Function:
Function LineNumber(rsc As Recordset, strBookmark As
String) As Long

' align to the current bookmark and return its
absolute position
rsc.Bookmark = strBookmark
LineNumber = rsc.AbsolutePosition + 1

End Function

This next example returns the cumulative sum of values
showing on the datasheet.

-Textbox controlsource:
=IIf(IsNull([TheValue]),Null,CumulativeValues
([RecordsetClone],[Bookmark],"TheValue"))

-Non-inline function:
Function CumulativeValues(rsc As Recordset, strBookmark As
String, strFieldName As String) As Long

Dim lngTotal As Long

' align to the current bookmark
rsc.Bookmark = strBookmark

' move previous until bof encountered
While Not rsc.BOF
lngTotal = lngTotal + rsc.Fields(strFieldName)
rsc.MovePrevious
Wend

' return the value
CumulativeValues = lngTotal

End Function
 
update t set rs = s + (select max(rs) from t AS t2 where
t2.f <= t.f);

Sorry, even my initial fix didn't fix the correlated
subquery. Try something like the above (still untested
though).

David
 
Back
Top