Median

  • Thread starter Thread starter fl
  • Start date Start date
F

fl

I would like to know how to get a median from a query. I wonder why a
median function does not exist like the max, min or avg? Thanks.
 
Hi,


I don't know why, anyhow, there are many solutions proposed on the web,
here is one that works in Access 2003 (with MS SQL Server, you can change
the LAST aggregate with MIN, MAX, or SUM, in this case, here, since it
operates on records that just have all the same value, for the implied
group).


------------
SELECT x, AVG(value) As Median
FROM (SELECT a.x, a.value
FROM (MedianForX As a INNER JOIN MedianForX As b
ON a.x=b.x AND a.value>=b.value)
INNER JOIN ( SELECT x, COUNT(*) As s
FROM MedianForX
GROUP BY x) As c
ON a.x=c.x
GROUP BY a.x, a.value
HAVING 2*(COUNT(*)-1)=LAST(c.s)
OR COUNT(*) =INT(0.5+ LAST(c.s)/2)
)
GROUP BY x
------------

with the data:

MedianForX
x value
A 1
A 2
A 3
B 1
B 2
B 3
B 4





it returns

Query39
x Median
A 2
B 2.5





Note that I do not handle duplicated values with the previous formulation.



The having clause handle the even/odd case, returning two groups, or one,
respectively, and if two groups are returned, the average is taken as
median. Basically, the inner query makes so all the job:


SELECT a.x, a.value
FROM (MedianForX As a INNER JOIN MedianForX As b
ON a.x=b.x AND a.value>=b.value)
INNER JOIN ( SELECT x, COUNT(*) As s
FROM MedianForX
GROUP BY x) As c
ON a.x=c.x
GROUP BY a.x, a.value
HAVING 2*(COUNT(*)-1)=LAST(c.s)
OR COUNT(*) =INT(0.5+ LAST(c.s)/2)



To understand it, it may help to observe that COUNT(*) is the actual rank
for the actual group, among a total of c.s values sharing the value x. If
there is an odd number of records, for a given x value, say c.s=3, then, the
HAVING clause is true only if the rank, COUNT(*), has a value of 2. If there
is an even number of records, say c.s=4, as for x='B' here up, then the
having clause is true if the rank is either 2, either 3. So, basically,
that's all. I may end up with a nicer HAVING clause, some day, but that
ugly one works none the less... :-)



Hoping it may help,
Vanderghast, Access MVP
 
Probably because it is a lot more trouble to calculate and relatively
rarely used in database-type applications. Max, Min, Avg, and StDev
can all be calculated by a single sequential pass through a set of
data, maintaining no more than one (Min, Max) or two (Avg, StDev)
intermediate values during the pass. Calculating the median can't be
"short-circuited" in this way.

I would like to know how to get a median from a query. I wonder why a
median function does not exist like the max, min or avg? Thanks.

Please respond to the Newsgroup, so that others may benefit from the exchange.
Peter R. Fletcher
 
Thank you for your help. The median comes out correct. The only problem I have
is that one tenth of the rows are missing. Total rows return 29 instead of 33.
 
Hi,


It may be because there are duplicated values within a given group?
If so, the formulation I proposed won't work. If you have duplicated values,
within a given group, you may try:


ON a.x=b.x AND ( a.value>b.value OR (a.value=b.value AND a.pk>=b.pk)) )


instead of

ON a.x=b.x AND a.value>=b.value)



but then, you need a primary key field, pk. That ON segment is just used to
rank, uniquely, each value, within each given group. If a value is repeated,
within a group, two records may be "first", as example, and if there is only
three records at all in the group, no record is second, and the having
clause fails to pick anything, for that given group. Using a criteria
involving the primary key, as shown, is sure to break "ex-equo", and just
one record will be first, and just one will be second, etc.


Hoping it may help,
Vanderghast, Access MVP
 
Back
Top