Calculating quartiles in a query

  • Thread starter Thread starter Paras Shah
  • Start date Start date
P

Paras Shah

Need assistance in automating the process of defining quartiles in a query.
What I mean by that is separating data into 4 groups based on the the value
of a single field. Please see example below:

Project No Score
13 8
478 10
563 2
576 19
213 32
431 1
143 23
245 14

I would like a query that will generate 4 new queries/ tables for each
quartile. So in this simple case of 8 records each new query/table should
have 2 records each, as such

Top Quartile
Project No Score
431 1
563 2

Middle-1 Quartile
Project No Score
13 8
478 10

Middle-2 Quartile
Project No Score
245 14
576 19

Bottom Quartile Score
143 23
213 32

Of course the source table will have records added to it on a regular basis
and hence I would like these quartiles to be dynamically generated - so 4
new queries is better than 4 new tables.

This one has really stumped me and all help is very much appreciated.
Thanks a ton.

-Paras
 
Hi,


SELECT a.ProjectNo, a.Score
FROM myTable As a INNER JOIN myTable As b
ON a.score >= b.score
GROUP BY a.ProjectNo, a.Score
HAVING COUNT(*) > x1* (SELECT COUNT(*) FROM myTable)
AND COUNT(*) <= x2 * (SELECT COUNT(*) FROM myTable)



where x1 and x2 are the breaking percentage (like x1=.25, x2=.5 ) . I assume
there is not two score ever equal. If so, change the ON clause
appropriately, like for something like:

ON a.score > b.score OR ( a.score=b.score AND a.pk>b.pk )

in order to necessary break all ties.


For the first and last quartile, you can always use

SELECT TOP 25 PERCENT *
FROM myTable
ORDER BY score [ ASC | DESC ]


(ie. use either ASC, either DESC ).




Hoping it may help
Vanderghast, Access MVP
 
Hello Michel-

Thank you for your reponse. Unfortunately I am not able to follow your
suggestion.

First off, there are definitely going to be "Scores" that are equal.
Secondly I am not sure where this code should inserted. Should I write this
up in the SQL view of a new query? Also, table "a" is supposed to be my
source table and then table "b" is the one of the created tables? Will this
just extend to create a table "c" and table "d" for the 3rd and 4th
quartiles.

Thanks,
Paras
Michel Walsh said:
Hi,


SELECT a.ProjectNo, a.Score
FROM myTable As a INNER JOIN myTable As b
ON a.score >= b.score
GROUP BY a.ProjectNo, a.Score
HAVING COUNT(*) > x1* (SELECT COUNT(*) FROM myTable)
AND COUNT(*) <= x2 * (SELECT COUNT(*) FROM myTable)



where x1 and x2 are the breaking percentage (like x1=.25, x2=.5 ) . I assume
there is not two score ever equal. If so, change the ON clause
appropriately, like for something like:

ON a.score > b.score OR ( a.score=b.score AND a.pk>b.pk )

in order to necessary break all ties.


For the first and last quartile, you can always use

SELECT TOP 25 PERCENT *
FROM myTable
ORDER BY score [ ASC | DESC ]


(ie. use either ASC, either DESC ).




Hoping it may help
Vanderghast, Access MVP



Paras Shah said:
Need assistance in automating the process of defining quartiles in a query.
What I mean by that is separating data into 4 groups based on the the value
of a single field. Please see example below:

Project No Score
13 8
478 10
563 2
576 19
213 32
431 1
143 23
245 14

I would like a query that will generate 4 new queries/ tables for each
quartile. So in this simple case of 8 records each new query/table should
have 2 records each, as such

Top Quartile
Project No Score
431 1
563 2

Middle-1 Quartile
Project No Score
13 8
478 10

Middle-2 Quartile
Project No Score
245 14
576 19

Bottom Quartile Score
143 23
213 32

Of course the source table will have records added to it on a regular basis
and hence I would like these quartiles to be dynamically generated - so 4
new queries is better than 4 new tables.

This one has really stumped me and all help is very much appreciated.
Thanks a ton.

-Paras
 
Hi,



a and b are alias for your table ( I use the name tableName,
use the real name in place of tableName, keep the alias a, and b, as they
are).

SELECT a.ProjectNo, a.Score
FROM myTable As a INNER JOIN myTable As b
ON (a.score > b.score) OR ( a.score=b.score AND a.ProjectNo >
b.ProjectNo )
GROUP BY a.ProjectNo, a.Score
HAVING COUNT(*) > x1* (SELECT COUNT(*) FROM myTable)
AND COUNT(*) <= x2 * (SELECT COUNT(*) FROM myTable)



You type that in the SQL view. Alternatively,

SELECT a.ProjectNo, a.Score
FROM myTable As a, myTable As b
WHERE (a.score > b.score) OR ( a.score=b.score AND a.ProjectNo >
b.ProjectNo )
GROUP BY a.ProjectNo, a.Score
HAVING COUNT(*) > x1* (SELECT COUNT(*) FROM myTable)
AND COUNT(*) <= x2 * (SELECT COUNT(*) FROM myTable)


and you would be able to continue to use the graphical part of the query
designer.


You use the query, with the appropriate value for x1 and x2, where a
""quartile table" is required. You don't have to create table, by
themselves, since the query can do it as well, and furthermore, is always up
to date... (while a computed table may not have the latest modifications).
The only reason you may got to make a table out of the query is if you need
to MODIFY the data in that table (because the actual query is NOT
updateable). If you need to create a table, in the designer, change the
select type to a create table query type.




Hoping it may help,
Vanderghast, Access MVP



Paras Shah said:
Hello Michel-

Thank you for your reponse. Unfortunately I am not able to follow your
suggestion.

First off, there are definitely going to be "Scores" that are equal.
Secondly I am not sure where this code should inserted. Should I write this
up in the SQL view of a new query? Also, table "a" is supposed to be my
source table and then table "b" is the one of the created tables? Will this
just extend to create a table "c" and table "d" for the 3rd and 4th
quartiles.

Thanks,
Paras
Michel Walsh said:
Hi,


SELECT a.ProjectNo, a.Score
FROM myTable As a INNER JOIN myTable As b
ON a.score >= b.score
GROUP BY a.ProjectNo, a.Score
HAVING COUNT(*) > x1* (SELECT COUNT(*) FROM myTable)
AND COUNT(*) <= x2 * (SELECT COUNT(*) FROM myTable)



where x1 and x2 are the breaking percentage (like x1=.25, x2=.5 ) . I assume
there is not two score ever equal. If so, change the ON clause
appropriately, like for something like:

ON a.score > b.score OR ( a.score=b.score AND a.pk>b.pk )

in order to necessary break all ties.


For the first and last quartile, you can always use

SELECT TOP 25 PERCENT *
FROM myTable
ORDER BY score [ ASC | DESC ]


(ie. use either ASC, either DESC ).




Hoping it may help
Vanderghast, Access MVP



Paras Shah said:
Need assistance in automating the process of defining quartiles in a query.
What I mean by that is separating data into 4 groups based on the the value
of a single field. Please see example below:

Project No Score
13 8
478 10
563 2
576 19
213 32
431 1
143 23
245 14

I would like a query that will generate 4 new queries/ tables for each
quartile. So in this simple case of 8 records each new query/table should
have 2 records each, as such

Top Quartile
Project No Score
431 1
563 2

Middle-1 Quartile
Project No Score
13 8
478 10

Middle-2 Quartile
Project No Score
245 14
576 19

Bottom Quartile Score
143 23
213 32

Of course the source table will have records added to it on a regular basis
and hence I would like these quartiles to be dynamically generated -
so
 
Hello Michel-

Really appreciate your continued help!!

The query does work except for one issue - it does not include the lowest
score, i.e. it does include twhat should be the first record of the 1st
quartile. Why does this happen?

Also, while the code works, I cannot figure out what function is actually
figuring out what should be in 4 quartiles, i.e. how is it sorting the data
based on Score?

-Paras
Michel Walsh said:
Hi,



a and b are alias for your table ( I use the name tableName,
use the real name in place of tableName, keep the alias a, and b, as they
are).

SELECT a.ProjectNo, a.Score
FROM myTable As a INNER JOIN myTable As b
ON (a.score > b.score) OR ( a.score=b.score AND a.ProjectNo >
b.ProjectNo )
GROUP BY a.ProjectNo, a.Score
HAVING COUNT(*) > x1* (SELECT COUNT(*) FROM myTable)
AND COUNT(*) <= x2 * (SELECT COUNT(*) FROM myTable)



You type that in the SQL view. Alternatively,

SELECT a.ProjectNo, a.Score
FROM myTable As a, myTable As b
WHERE (a.score > b.score) OR ( a.score=b.score AND a.ProjectNo >
b.ProjectNo )
GROUP BY a.ProjectNo, a.Score
HAVING COUNT(*) > x1* (SELECT COUNT(*) FROM myTable)
AND COUNT(*) <= x2 * (SELECT COUNT(*) FROM myTable)


and you would be able to continue to use the graphical part of the query
designer.


You use the query, with the appropriate value for x1 and x2, where a
""quartile table" is required. You don't have to create table, by
themselves, since the query can do it as well, and furthermore, is always up
to date... (while a computed table may not have the latest modifications).
The only reason you may got to make a table out of the query is if you need
to MODIFY the data in that table (because the actual query is NOT
updateable). If you need to create a table, in the designer, change the
select type to a create table query type.




Hoping it may help,
Vanderghast, Access MVP



Paras Shah said:
Hello Michel-

Thank you for your reponse. Unfortunately I am not able to follow your
suggestion.

First off, there are definitely going to be "Scores" that are equal.
Secondly I am not sure where this code should inserted. Should I write this
up in the SQL view of a new query? Also, table "a" is supposed to be my
source table and then table "b" is the one of the created tables? Will this
just extend to create a table "c" and table "d" for the 3rd and 4th
quartiles.

Thanks,
Paras
Michel Walsh said:
Hi,


SELECT a.ProjectNo, a.Score
FROM myTable As a INNER JOIN myTable As b
ON a.score >= b.score
GROUP BY a.ProjectNo, a.Score
HAVING COUNT(*) > x1* (SELECT COUNT(*) FROM myTable)
AND COUNT(*) <= x2 * (SELECT COUNT(*) FROM myTable)



where x1 and x2 are the breaking percentage (like x1=.25, x2=.5 ) . I assume
there is not two score ever equal. If so, change the ON clause
appropriately, like for something like:

ON a.score > b.score OR ( a.score=b.score AND a.pk>b.pk )

in order to necessary break all ties.


For the first and last quartile, you can always use

SELECT TOP 25 PERCENT *
FROM myTable
ORDER BY score [ ASC | DESC ]


(ie. use either ASC, either DESC ).




Hoping it may help
Vanderghast, Access MVP



Need assistance in automating the process of defining quartiles in a
query.
What I mean by that is separating data into 4 groups based on the the
value
of a single field. Please see example below:

Project No Score
13 8
478 10
563 2
576 19
213 32
431 1
143 23
245 14

I would like a query that will generate 4 new queries/ tables for each
quartile. So in this simple case of 8 records each new query/table should
have 2 records each, as such

Top Quartile
Project No Score
431 1
563 2

Middle-1 Quartile
Project No Score
13 8
478 10

Middle-2 Quartile
Project No Score
245 14
576 19

Bottom Quartile Score
143 23
213 32

Of course the source table will have records added to it on a regular
basis
and hence I would like these quartiles to be dynamically generated -
so
4
new queries is better than 4 new tables.

This one has really stumped me and all help is very much appreciated.
Thanks a ton.

-Paras
 
I just did an experiment and removed the AND condition of
a.ProjectNo>b.projectNo

Doing this allowed for the lowest Score to be included. And it did not seem
to have any adverse effect. Why was this condition included in the first
place?

-Paras
Michel Walsh said:
Hi,



a and b are alias for your table ( I use the name tableName,
use the real name in place of tableName, keep the alias a, and b, as they
are).

SELECT a.ProjectNo, a.Score
FROM myTable As a INNER JOIN myTable As b
ON (a.score > b.score) OR ( a.score=b.score AND a.ProjectNo >
b.ProjectNo )
GROUP BY a.ProjectNo, a.Score
HAVING COUNT(*) > x1* (SELECT COUNT(*) FROM myTable)
AND COUNT(*) <= x2 * (SELECT COUNT(*) FROM myTable)



You type that in the SQL view. Alternatively,

SELECT a.ProjectNo, a.Score
FROM myTable As a, myTable As b
WHERE (a.score > b.score) OR ( a.score=b.score AND a.ProjectNo >
b.ProjectNo )
GROUP BY a.ProjectNo, a.Score
HAVING COUNT(*) > x1* (SELECT COUNT(*) FROM myTable)
AND COUNT(*) <= x2 * (SELECT COUNT(*) FROM myTable)


and you would be able to continue to use the graphical part of the query
designer.


You use the query, with the appropriate value for x1 and x2, where a
""quartile table" is required. You don't have to create table, by
themselves, since the query can do it as well, and furthermore, is always up
to date... (while a computed table may not have the latest modifications).
The only reason you may got to make a table out of the query is if you need
to MODIFY the data in that table (because the actual query is NOT
updateable). If you need to create a table, in the designer, change the
select type to a create table query type.




Hoping it may help,
Vanderghast, Access MVP



Paras Shah said:
Hello Michel-

Thank you for your reponse. Unfortunately I am not able to follow your
suggestion.

First off, there are definitely going to be "Scores" that are equal.
Secondly I am not sure where this code should inserted. Should I write this
up in the SQL view of a new query? Also, table "a" is supposed to be my
source table and then table "b" is the one of the created tables? Will this
just extend to create a table "c" and table "d" for the 3rd and 4th
quartiles.

Thanks,
Paras
Michel Walsh said:
Hi,


SELECT a.ProjectNo, a.Score
FROM myTable As a INNER JOIN myTable As b
ON a.score >= b.score
GROUP BY a.ProjectNo, a.Score
HAVING COUNT(*) > x1* (SELECT COUNT(*) FROM myTable)
AND COUNT(*) <= x2 * (SELECT COUNT(*) FROM myTable)



where x1 and x2 are the breaking percentage (like x1=.25, x2=.5 ) . I assume
there is not two score ever equal. If so, change the ON clause
appropriately, like for something like:

ON a.score > b.score OR ( a.score=b.score AND a.pk>b.pk )

in order to necessary break all ties.


For the first and last quartile, you can always use

SELECT TOP 25 PERCENT *
FROM myTable
ORDER BY score [ ASC | DESC ]


(ie. use either ASC, either DESC ).




Hoping it may help
Vanderghast, Access MVP



Need assistance in automating the process of defining quartiles in a
query.
What I mean by that is separating data into 4 groups based on the the
value
of a single field. Please see example below:

Project No Score
13 8
478 10
563 2
576 19
213 32
431 1
143 23
245 14

I would like a query that will generate 4 new queries/ tables for each
quartile. So in this simple case of 8 records each new query/table should
have 2 records each, as such

Top Quartile
Project No Score
431 1
563 2

Middle-1 Quartile
Project No Score
13 8
478 10

Middle-2 Quartile
Project No Score
245 14
576 19

Bottom Quartile Score
143 23
213 32

Of course the source table will have records added to it on a regular
basis
and hence I would like these quartiles to be dynamically generated -
so
4
new queries is better than 4 new tables.

This one has really stumped me and all help is very much appreciated.
Thanks a ton.

-Paras
 
Hi,


Indeed, it should have been >= rather than > :

ON (a.score>b.score) OR ( a.score=b.score AND a.ProjectNo >= b.ProjectNo )


It is include to artificially break possible equality. Assume:

Score ProjectNo
1 1
1 2
1 3
1 4
1 5
6 6
7 7
8 8

then, the first quartile will be made of just the first two records, even
if, in theory, the fifth first ones should be able to made it (leaving no
one for the second quartile). If you don't want that behavior and accept the
(light) possibility that there is no record at all in one of the quartile,
just use, for the ON clause:

ON a.score >= b.score



Hoping it may help,
Vanderghast, Access MVP



Paras Shah said:
I just did an experiment and removed the AND condition of
a.ProjectNo>b.projectNo

Doing this allowed for the lowest Score to be included. And it did not seem
to have any adverse effect. Why was this condition included in the first
place?

-Paras
Michel Walsh said:
Hi,



a and b are alias for your table ( I use the name tableName,
use the real name in place of tableName, keep the alias a, and b, as they
are).

SELECT a.ProjectNo, a.Score
FROM myTable As a INNER JOIN myTable As b
ON (a.score > b.score) OR ( a.score=b.score AND a.ProjectNo >
b.ProjectNo )
GROUP BY a.ProjectNo, a.Score
HAVING COUNT(*) > x1* (SELECT COUNT(*) FROM myTable)
AND COUNT(*) <= x2 * (SELECT COUNT(*) FROM myTable)



You type that in the SQL view. Alternatively,

SELECT a.ProjectNo, a.Score
FROM myTable As a, myTable As b
WHERE (a.score > b.score) OR ( a.score=b.score AND a.ProjectNo >
b.ProjectNo )
GROUP BY a.ProjectNo, a.Score
HAVING COUNT(*) > x1* (SELECT COUNT(*) FROM myTable)
AND COUNT(*) <= x2 * (SELECT COUNT(*) FROM myTable)


and you would be able to continue to use the graphical part of the query
designer.


You use the query, with the appropriate value for x1 and x2, where a
""quartile table" is required. You don't have to create table, by
themselves, since the query can do it as well, and furthermore, is
always
up
to date... (while a computed table may not have the latest modifications).
The only reason you may got to make a table out of the query is if you need
to MODIFY the data in that table (because the actual query is NOT
updateable). If you need to create a table, in the designer, change the
select type to a create table query type.




Hoping it may help,
Vanderghast, Access MVP



Paras Shah said:
Hello Michel-

Thank you for your reponse. Unfortunately I am not able to follow your
suggestion.

First off, there are definitely going to be "Scores" that are equal.
Secondly I am not sure where this code should inserted. Should I
write
this
up in the SQL view of a new query? Also, table "a" is supposed to be my
source table and then table "b" is the one of the created tables?
Will
this
just extend to create a table "c" and table "d" for the 3rd and 4th
quartiles.

Thanks,
Paras
Hi,


SELECT a.ProjectNo, a.Score
FROM myTable As a INNER JOIN myTable As b
ON a.score >= b.score
GROUP BY a.ProjectNo, a.Score
HAVING COUNT(*) > x1* (SELECT COUNT(*) FROM myTable)
AND COUNT(*) <= x2 * (SELECT COUNT(*) FROM myTable)



where x1 and x2 are the breaking percentage (like x1=.25, x2=.5 ) . I
assume
there is not two score ever equal. If so, change the ON clause
appropriately, like for something like:

ON a.score > b.score OR ( a.score=b.score AND a.pk>b.pk )

in order to necessary break all ties.


For the first and last quartile, you can always use

SELECT TOP 25 PERCENT *
FROM myTable
ORDER BY score [ ASC | DESC ]


(ie. use either ASC, either DESC ).




Hoping it may help
Vanderghast, Access MVP



Need assistance in automating the process of defining quartiles in a
query.
What I mean by that is separating data into 4 groups based on the the
value
of a single field. Please see example below:

Project No Score
13 8
478 10
563 2
576 19
213 32
431 1
143 23
245 14

I would like a query that will generate 4 new queries/ tables for each
quartile. So in this simple case of 8 records each new query/table
should
have 2 records each, as such

Top Quartile
Project No Score
431 1
563 2

Middle-1 Quartile
Project No Score
13 8
478 10

Middle-2 Quartile
Project No Score
245 14
576 19

Bottom Quartile Score
143 23
213 32

Of course the source table will have records added to it on a regular
basis
and hence I would like these quartiles to be dynamically
generated -
so
4
new queries is better than 4 new tables.

This one has really stumped me and all help is very much appreciated.
Thanks a ton.

-Paras
 
Thank you for all your help Michel. It has really been useful!!

-Paras
Michel Walsh said:
Hi,


Indeed, it should have been >= rather than > :

ON (a.score>b.score) OR ( a.score=b.score AND a.ProjectNo >= b.ProjectNo )


It is include to artificially break possible equality. Assume:

Score ProjectNo
1 1
1 2
1 3
1 4
1 5
6 6
7 7
8 8

then, the first quartile will be made of just the first two records, even
if, in theory, the fifth first ones should be able to made it (leaving no
one for the second quartile). If you don't want that behavior and accept the
(light) possibility that there is no record at all in one of the quartile,
just use, for the ON clause:

ON a.score >= b.score



Hoping it may help,
Vanderghast, Access MVP



Paras Shah said:
I just did an experiment and removed the AND condition of
a.ProjectNo>b.projectNo

Doing this allowed for the lowest Score to be included. And it did not seem
to have any adverse effect. Why was this condition included in the first
place?

-Paras
always
..
I
assume
there is not two score ever equal. If so, change the ON clause
appropriately, like for something like:

ON a.score > b.score OR ( a.score=b.score AND a.pk>b.pk )

in order to necessary break all ties.


For the first and last quartile, you can always use

SELECT TOP 25 PERCENT *
FROM myTable
ORDER BY score [ ASC | DESC ]


(ie. use either ASC, either DESC ).




Hoping it may help
Vanderghast, Access MVP



Need assistance in automating the process of defining quartiles
in
a the
the for
each
 
Back
Top