A Challenge - Select the highest (or lowest) 25%

  • Thread starter Thread starter mcl
  • Start date Start date
M

mcl

I was just presented a problem. How would you do a query where you select
the highest (or lowest) 25% of the values of a parameter? The parameter
typically has anywhere from 10 to over 100 values ranging from 0 to maybe 40
or 50. So the criteria would be the higest (or lowest) 25% of the records.
Any ideas?
 
I saw the option to select how many records you want but that doesn't do
what I want. For example look at the SQL:

SELECT TOP 25 PERCENT Precip_Inches.wmo, Precip_Inches.[mod],
Precip_Inches.Jan
FROM Precip_Inches
ORDER BY Precip_Inches.wmo, Precip_Inches.Jan DESC;

I want the top 25 percent of Precip_Inches.Jan, not the first 25% of
records. This is weather data. It's monthly precip data for over 20000
stations. Some have only 10 years of data. Some have 400 years of data. For
each station I want the top 25% NOT just the first 25% of all records. In
fact for all years and all stations there are a total 1057239 records. If I
include parameters to select just one station then it works for that station
but I don't want that. I want to crunch all the stations at once. I just
picked on one month here. Is there anyway I can manipulate that "TOP 25
PERCENT " to have it work on Precip_Inches.Jan only?
 
You need change your order by to:

ORDER BY Precip_Inches.Jan DESC, Precip_Inches.wmo

or remove the sort on Precip_Inches.wmo altogether.

--
Paul Overway
Logico Solutions, LLC
www.logico-solutions.com


mcl said:
I saw the option to select how many records you want but that doesn't do
what I want. For example look at the SQL:

SELECT TOP 25 PERCENT Precip_Inches.wmo, Precip_Inches.[mod],
Precip_Inches.Jan
FROM Precip_Inches
ORDER BY Precip_Inches.wmo, Precip_Inches.Jan DESC;

I want the top 25 percent of Precip_Inches.Jan, not the first 25% of
records. This is weather data. It's monthly precip data for over 20000
stations. Some have only 10 years of data. Some have 400 years of data. For
each station I want the top 25% NOT just the first 25% of all records. In
fact for all years and all stations there are a total 1057239 records. If I
include parameters to select just one station then it works for that station
but I don't want that. I want to crunch all the stations at once. I just
picked on one month here. Is there anyway I can manipulate that "TOP 25
PERCENT " to have it work on Precip_Inches.Jan only?


mcl said:
I was just presented a problem. How would you do a query where you select
the highest (or lowest) 25% of the values of a parameter? The parameter
typically has anywhere from 10 to over 100 values ranging from 0 to
maybe
40
or 50. So the criteria would be the higest (or lowest) 25% of the records.
Any ideas?
 
I would suggest that you right click the blank area to the right in the query design and click Properties, and use the Top Values property simply select 25%. Also if you want the top 25% or bottom 25% sort the field in ascending or descending order.

Kind Regards Cathi
 
First, do you have a unique number (Primary Key) on the data?

If so, you can use that to get the results you want, by using a sub-query.

SELECT A.wmo, A.[Mod], A.Jan
FROM Precip_Inches as A
WHERE PrimaryKey in
(SELECT TOP 25 Percent B.PrimaryKey
FROM Precip_Inches As B
WHERE B.wmo = A.wmo
ORDER BY B.Jan Asc)

If you don't have a primary key this can be done, but it will be more complex.
I saw the option to select how many records you want but that doesn't do
what I want. For example look at the SQL:

SELECT TOP 25 PERCENT Precip_Inches.wmo, Precip_Inches.[mod],
Precip_Inches.Jan
FROM Precip_Inches
ORDER BY Precip_Inches.wmo, Precip_Inches.Jan DESC;

I want the top 25 percent of Precip_Inches.Jan, not the first 25% of
records. This is weather data. It's monthly precip data for over 20000
stations. Some have only 10 years of data. Some have 400 years of data. For
each station I want the top 25% NOT just the first 25% of all records. In
fact for all years and all stations there are a total 1057239 records. If I
include parameters to select just one station then it works for that station
but I don't want that. I want to crunch all the stations at once. I just
picked on one month here. Is there anyway I can manipulate that "TOP 25
PERCENT " to have it work on Precip_Inches.Jan only?

mcl said:
I was just presented a problem. How would you do a query where you select
the highest (or lowest) 25% of the values of a parameter? The parameter
typically has anywhere from 10 to over 100 values ranging from 0 to maybe 40
or 50. So the criteria would be the higest (or lowest) 25% of the records.
Any ideas?
 
That doesn't really do it. Each station in ther database has a WMO number
(World Meteorological Organization) plus additional digit (mod).
If I sort by precip first I get the top 25% of all January's no matter what
station. I need the top 25% of January's for EACH station.
Logically this is what I need:

SELECT Precip_Inches.wmo, Precip_Inches.mod, TOP 25 PERCENT
Precip_Inches.Jan
FROM Precip_Inches
ORDER B Precip_Inches.wmo, Precip_Inches.mod, Precip_Inches.Jan DESC;

Problem is that Access can't handle this logic.

If I put in the parameters to select a particular WMO and mod it works fine
for that one station. But I want to do them all at one shot.

BTW, if there are any weather geeks here this is the data I'm working with:

http://www.ncdc.noaa.gov/cgi-bin/res40.pl?page=ghcn.html

(You need to agree to the terms to get to it)


So I need to sort by WMO first, then display the top 25% for Jan for Each
WMO Number.
Paul Overway said:
You need change your order by to:

ORDER BY Precip_Inches.Jan DESC, Precip_Inches.wmo

or remove the sort on Precip_Inches.wmo altogether.

--
Paul Overway
Logico Solutions, LLC
www.logico-solutions.com


mcl said:
I saw the option to select how many records you want but that doesn't do
what I want. For example look at the SQL:

SELECT TOP 25 PERCENT Precip_Inches.wmo, Precip_Inches.[mod],
Precip_Inches.Jan
FROM Precip_Inches
ORDER BY Precip_Inches.wmo, Precip_Inches.Jan DESC;

I want the top 25 percent of Precip_Inches.Jan, not the first 25% of
records. This is weather data. It's monthly precip data for over 20000
stations. Some have only 10 years of data. Some have 400 years of data. For
each station I want the top 25% NOT just the first 25% of all records. In
fact for all years and all stations there are a total 1057239 records.
If
I
include parameters to select just one station then it works for that station
but I don't want that. I want to crunch all the stations at once. I just
picked on one month here. Is there anyway I can manipulate that "TOP 25
PERCENT " to have it work on Precip_Inches.Jan only?


maybe
 
Hi,


Isn't John Spencer's solution working?


Vanderghast, Access MVP


mcl said:
That doesn't really do it. Each station in ther database has a WMO number
(World Meteorological Organization) plus additional digit (mod).
If I sort by precip first I get the top 25% of all January's no matter what
station. I need the top 25% of January's for EACH station.
Logically this is what I need:

SELECT Precip_Inches.wmo, Precip_Inches.mod, TOP 25 PERCENT
Precip_Inches.Jan
FROM Precip_Inches
ORDER B Precip_Inches.wmo, Precip_Inches.mod, Precip_Inches.Jan DESC;

Problem is that Access can't handle this logic.

If I put in the parameters to select a particular WMO and mod it works fine
for that one station. But I want to do them all at one shot.

BTW, if there are any weather geeks here this is the data I'm working with:

http://www.ncdc.noaa.gov/cgi-bin/res40.pl?page=ghcn.html

(You need to agree to the terms to get to it)


So I need to sort by WMO first, then display the top 25% for Jan for Each
WMO Number.
Paul Overway said:
You need change your order by to:

ORDER BY Precip_Inches.Jan DESC, Precip_Inches.wmo

or remove the sort on Precip_Inches.wmo altogether.

--
Paul Overway
Logico Solutions, LLC
www.logico-solutions.com


mcl said:
I saw the option to select how many records you want but that doesn't do
what I want. For example look at the SQL:

SELECT TOP 25 PERCENT Precip_Inches.wmo, Precip_Inches.[mod],
Precip_Inches.Jan
FROM Precip_Inches
ORDER BY Precip_Inches.wmo, Precip_Inches.Jan DESC;

I want the top 25 percent of Precip_Inches.Jan, not the first 25% of
records. This is weather data. It's monthly precip data for over 20000
stations. Some have only 10 years of data. Some have 400 years of
data.
For
each station I want the top 25% NOT just the first 25% of all records. In
fact for all years and all stations there are a total 1057239 records.
If
I
include parameters to select just one station then it works for that station
but I don't want that. I want to crunch all the stations at once. I just
picked on one month here. Is there anyway I can manipulate that "TOP 25
PERCENT " to have it work on Precip_Inches.Jan only?


I was just presented a problem. How would you do a query where you select
the highest (or lowest) 25% of the values of a parameter? The parameter
typically has anywhere from 10 to over 100 values ranging from 0 to maybe
40
or 50. So the criteria would be the higest (or lowest) 25% of the records.
Any ideas?
 
Nope. As I said that gives me the top 25% of all records regardless of what
station it is. Basicallythe top 25% rainiest January's anywhere it the world
(20900 observing stations, 1057239 total January's). I need the top 25% for
EACH station.
If I specify ONE specific WMO & mod it works for that station. But I want to
do all 20900 at one shot.

Michel Walsh said:
Hi,


Isn't John Spencer's solution working?


Vanderghast, Access MVP


mcl said:
That doesn't really do it. Each station in ther database has a WMO number
(World Meteorological Organization) plus additional digit (mod).
If I sort by precip first I get the top 25% of all January's no matter what
station. I need the top 25% of January's for EACH station.
Logically this is what I need:

SELECT Precip_Inches.wmo, Precip_Inches.mod, TOP 25 PERCENT
Precip_Inches.Jan
FROM Precip_Inches
ORDER B Precip_Inches.wmo, Precip_Inches.mod, Precip_Inches.Jan DESC;

Problem is that Access can't handle this logic.

If I put in the parameters to select a particular WMO and mod it works fine
for that one station. But I want to do them all at one shot.

BTW, if there are any weather geeks here this is the data I'm working with:

http://www.ncdc.noaa.gov/cgi-bin/res40.pl?page=ghcn.html

(You need to agree to the terms to get to it)


So I need to sort by WMO first, then display the top 25% for Jan for Each
WMO Number.
doesn't
do
what I want. For example look at the SQL:

SELECT TOP 25 PERCENT Precip_Inches.wmo, Precip_Inches.[mod],
Precip_Inches.Jan
FROM Precip_Inches
ORDER BY Precip_Inches.wmo, Precip_Inches.Jan DESC;

I want the top 25 percent of Precip_Inches.Jan, not the first 25% of
records. This is weather data. It's monthly precip data for over 20000
stations. Some have only 10 years of data. Some have 400 years of data.
For
each station I want the top 25% NOT just the first 25% of all
records.
In
fact for all years and all stations there are a total 1057239
records.
If
I
include parameters to select just one station then it works for that
station
but I don't want that. I want to crunch all the stations at once. I just
picked on one month here. Is there anyway I can manipulate that "TOP 25
PERCENT " to have it work on Precip_Inches.Jan only?


I was just presented a problem. How would you do a query where you
select
the highest (or lowest) 25% of the values of a parameter? The parameter
typically has anywhere from 10 to over 100 values ranging from 0 to
maybe
40
or 50. So the criteria would be the higest (or lowest) 25% of the
records.
Any ideas?
 
Oops, wait a minute. I'm looking at it again (I'm working with a lot of
data. 25% of 1057239 records is a lot to play with.) Maybe it is working. Of
course if it is I'll be doing a lot more. I'll be doing the same for all
months, and also the low 25%. Taking the means of both and subtracting them.
It's someone's (not mine) idea for a precip variability index (other than st
dev) since precip isn't normally distributed.
Someone else came up with the idea and I said I would try to get Access to
do it. Kind of curious what it looks like. Might be an exercise in futility.

mcl said:
Nope. As I said that gives me the top 25% of all records regardless of what
station it is. Basicallythe top 25% rainiest January's anywhere it the world
(20900 observing stations, 1057239 total January's). I need the top 25% for
EACH station.
If I specify ONE specific WMO & mod it works for that station. But I want to
do all 20900 at one shot.

Michel Walsh said:
Hi,


Isn't John Spencer's solution working?


Vanderghast, Access MVP


mcl said:
That doesn't really do it. Each station in ther database has a WMO number
(World Meteorological Organization) plus additional digit (mod).
If I sort by precip first I get the top 25% of all January's no matter what
station. I need the top 25% of January's for EACH station.
Logically this is what I need:

SELECT Precip_Inches.wmo, Precip_Inches.mod, TOP 25 PERCENT
Precip_Inches.Jan
FROM Precip_Inches
ORDER B Precip_Inches.wmo, Precip_Inches.mod, Precip_Inches.Jan DESC;

Problem is that Access can't handle this logic.

If I put in the parameters to select a particular WMO and mod it works fine
for that one station. But I want to do them all at one shot.

BTW, if there are any weather geeks here this is the data I'm working with:

http://www.ncdc.noaa.gov/cgi-bin/res40.pl?page=ghcn.html

(You need to agree to the terms to get to it)


So I need to sort by WMO first, then display the top 25% for Jan for Each
WMO Number.
You need change your order by to:

ORDER BY Precip_Inches.Jan DESC, Precip_Inches.wmo

or remove the sort on Precip_Inches.wmo altogether.

--
Paul Overway
Logico Solutions, LLC
www.logico-solutions.com


I saw the option to select how many records you want but that
doesn't
do
what I want. For example look at the SQL:

SELECT TOP 25 PERCENT Precip_Inches.wmo, Precip_Inches.[mod],
Precip_Inches.Jan
FROM Precip_Inches
ORDER BY Precip_Inches.wmo, Precip_Inches.Jan DESC;

I want the top 25 percent of Precip_Inches.Jan, not the first 25% of
records. This is weather data. It's monthly precip data for over 20000
stations. Some have only 10 years of data. Some have 400 years of data.
For
each station I want the top 25% NOT just the first 25% of all records.
In
fact for all years and all stations there are a total 1057239 records.
If
I
include parameters to select just one station then it works for that
station
but I don't want that. I want to crunch all the stations at once.
I
just
picked on one month here. Is there anyway I can manipulate that
"TOP
25
PERCENT " to have it work on Precip_Inches.Jan only?


I was just presented a problem. How would you do a query where you
select
the highest (or lowest) 25% of the values of a parameter? The
parameter
typically has anywhere from 10 to over 100 values ranging from 0 to
maybe
40
or 50. So the criteria would be the higest (or lowest) 25% of the
records.
Any ideas?
 
Hi,


Are we speaking of the same SQL statement? In


SELECT A.wmo, A.[Mod], A.Jan
FROM Precip_Inches as A
WHERE PrimaryKey in
(SELECT TOP 25 Percent B.PrimaryKey
FROM Precip_Inches As B
WHERE B.wmo = A.wmo
ORDER BY B.Jan DESC)



clearly, the TOP 25 Percent occur on each wmo ( WHERE B.wmo=A.wmo ). May
be the ordering was confusing, add an order by (which does not change the
result, overall, just its presentation):

SELECT A.wmo, A.[Mod], A.Jan
FROM Precip_Inches as A
WHERE PrimaryKey in
(SELECT TOP 25 Percent B.PrimaryKey
FROM Precip_Inches As B
WHERE B.wmo = A.wmo
ORDER BY B.Jan DESC)
ORDER BY A.wmo, A.Jan DESC



That is the same kind of query as finding the best 10 salesmen per division,
or the top 10 exportations by countries, or the three most populous cities
per state, etc.



Hoping it may help,
Vanderghast, Access MVP
 
Hi,


On one million of records, that will take forever with Jet, and the
cubic of it in MS SQL Server. Better to try an alternative based on ranking
the data, indeed.

Append the data in a temp table that got an autonumber value (you don't
specify the autonumber field while appending the data), a little bit like:

INSERT INTO temp ( wmo, precipitation)
SELECT wmo, precipitation
FROM myTable
ORDER BY wmo, precipitation

I assume the autonumber field if called f1. That is involved, when you
compute the ranges of f1 to retrieve, in making a query, Q1:

SELECT wmo, MIN(f1) As Starting, MIN(f1) + 0.25*( MAX(f1)-MN(f1)) as
Ending
FROM temp
GROUP BY wmo

Indeed, in temp, the data is ordered, through precipitation, and f1, the
autonumber, is "ranking" (supplying the position) the data. So, for a wmo,
if f1_min is 1000 and f1_max is 2000, the top 25 is with values of f1 from
1000 to 1250. It is a given thing. Thanks to ranking... which had avoid the
time trap of the first technique ever proposed, by simply appending data in
a table, after having made a SORT on the data.


At last, in a final query, what is left is just about really picking the
data:

SELECT temp.wmo, temp.precipitation
FROM temp INNER JOIN Q1
ON temp.f1 BETWEEN Q1.Starting AND Q1.Ending


.... or Average it, as required...

That should be much faster... about execution time, than the first proposed
solution, but less elegant, in formulation... and you have to manage the
temp table (empty it when you start, etc)




Hoping it may help,
Vanderghast, Access MVP



mcl said:
Oops, wait a minute. I'm looking at it again (I'm working with a lot of
data. 25% of 1057239 records is a lot to play with.) Maybe it is working. Of
course if it is I'll be doing a lot more. I'll be doing the same for all
months, and also the low 25%. Taking the means of both and subtracting them.
It's someone's (not mine) idea for a precip variability index (other than st
dev) since precip isn't normally distributed.
Someone else came up with the idea and I said I would try to get Access to
do it. Kind of curious what it looks like. Might be an exercise in futility.

mcl said:
Nope. As I said that gives me the top 25% of all records regardless of what
station it is. Basicallythe top 25% rainiest January's anywhere it the world
(20900 observing stations, 1057239 total January's). I need the top 25% for
EACH station.
If I specify ONE specific WMO & mod it works for that station. But I
want
to
do all 20900 at one shot.

Michel Walsh said:
Hi,


Isn't John Spencer's solution working?


Vanderghast, Access MVP


That doesn't really do it. Each station in ther database has a WMO number
(World Meteorological Organization) plus additional digit (mod).
If I sort by precip first I get the top 25% of all January's no matter
what
station. I need the top 25% of January's for EACH station.
Logically this is what I need:

SELECT Precip_Inches.wmo, Precip_Inches.mod, TOP 25 PERCENT
Precip_Inches.Jan
FROM Precip_Inches
ORDER B Precip_Inches.wmo, Precip_Inches.mod, Precip_Inches.Jan DESC;

Problem is that Access can't handle this logic.

If I put in the parameters to select a particular WMO and mod it works
fine
for that one station. But I want to do them all at one shot.

BTW, if there are any weather geeks here this is the data I'm working
with:

http://www.ncdc.noaa.gov/cgi-bin/res40.pl?page=ghcn.html

(You need to agree to the terms to get to it)


So I need to sort by WMO first, then display the top 25% for Jan for Each
WMO Number.
You need change your order by to:

ORDER BY Precip_Inches.Jan DESC, Precip_Inches.wmo

or remove the sort on Precip_Inches.wmo altogether.

--
Paul Overway
Logico Solutions, LLC
www.logico-solutions.com


I saw the option to select how many records you want but that doesn't
do
what I want. For example look at the SQL:

SELECT TOP 25 PERCENT Precip_Inches.wmo, Precip_Inches.[mod],
Precip_Inches.Jan
FROM Precip_Inches
ORDER BY Precip_Inches.wmo, Precip_Inches.Jan DESC;

I want the top 25 percent of Precip_Inches.Jan, not the first
25%
once.
 
Back
Top