Frequency distribution

  • Thread starter Thread starter Andreas
  • Start date Start date
A

Andreas

Hi all,

I have a query "qryP&L" which contains the column "AfterTaxPL". These
are profit and losses, i.e. positive and negative numbers. I would
like to create a frequency distribution. I already specified the
boundaries in a seperate table called
"tblFrequencyDistributionBounds". This table comprises of three fields
"ID","LowerBound", and "UpperBound".

I would like to create a new query that gives me the frequency
distribution for "AfterTaxPL" according to the boundaries specified in
"tblFrequencyDistributionBounds". I am just not that familiar with SQL
in order to combine this table with the particular column of my
original query.

Can anybody help?

Regards,
Andreas
 
Hi all,

I have a query "qryP&L" which contains the column "AfterTaxPL". These
are profit and losses, i.e. positive and negative numbers. I would
like to create a frequency distribution. I already specified the
boundaries in a seperate table called
"tblFrequencyDistributionBounds". This table comprises of three fields
"ID","LowerBound", and "UpperBound".

I would like to create a new query that gives me the frequency
distribution for "AfterTaxPL" according to the boundaries specified in
"tblFrequencyDistributionBounds". I am just not that familiar with SQL
in order to combine this table with the particular column of my
original query.

Can anybody help?

Regards,
Andreas

If your bounds are uniformly spaced, maybe the Partition() function
will give you the results you seek. If not, post some sample tables
and a sample of the results you want and perhaps someone here will
manufacture an appropriate query for you.

James A. Fortune
(e-mail address removed)
 
If your bounds are uniformly spaced, maybe the Partition() function
will give you the results you seek.  If not, post some sample tables
and a sample of the results you want and perhaps someone here will
manufacture an appropriate query for you.

The bounds are not uniformly spaced and as I have negative values for
50% of the bounds, I can't use the Partition() function. This function
is only good for boundaries with positive values, at least as far as I
know.

tblFrequencyDistributionBounds

ID LowerBound UpperBound
01 -1000
02 -999 -500
03 -499 -250
04 -249 0
05 1 250
06 251 500
07 500 1000
08 1001

qryP&L

AfterTaxPL
675
350
-800
-20
45
23
150
2500
5400

The frequency distribution should then look like this:

x < -1000: 0
-999 < x < -500: 1
-499 < x < -250: 0
-249 < x < 0: 1
1 < x < 250: 3
251 < x < 500 1
500 < x < 1000 1
x > 1000: 2

Hope that helps clarifying my problem.

Regards,
Andreas
 
Andreas said:
The bounds are not uniformly spaced and as I have negative values for
50% of the bounds, I can't use the Partition() function. This function
is only good for boundaries with positive values, at least as far as I
know.

tblFrequencyDistributionBounds

ID LowerBound UpperBound
01 -1000
02 -999 -500
03 -499 -250
04 -249 0
05 1 250
06 251 500
07 500 1000
08 1001

qryP&L

AfterTaxPL
675
350
-800
-20
45
23
150
2500
5400

The frequency distribution should then look like this:

x < -1000: 0
-999 < x < -500: 1
-499 < x < -250: 0
-249 < x < 0: 1
1 < x < 250: 3
251 < x < 500 1
500 < x < 1000 1
x > 1000: 2

Something like this should work (not tested):
Select
FDB.LowerBound, FDB.UpperBound, count(PL.AfterTaxPL) as TheCount
From tblFrequencyDistributionBounds as FDB
Left Outer Join qryP&L as PL
On (PL.AfterTaxPL>= FDB.LowerBound Or FDB.LowerBound Is Null)
And (PL.AfterTaxPL <= FDB.UpperBound Or FDB.UpperBound Is Null)
Group By FDB.LowerBound, FDB.UpperBound
Order By FDB.LowerBound

If Access does not support inequalities in the join condition, you can move
the conditions to a Where clause. It looks like you have small gaps between
your bands: upper limit = -1000 and the next band's lower limit = -999,
so -999.50 does not fall into either band. You could maybe use -999.99 as
the lower limit, or use -1000 and change the upper bound to just < instead
of <=.
 
Something like this should work (not tested):
Select
    FDB.LowerBound, FDB.UpperBound, count(PL.AfterTaxPL) as TheCount
From tblFrequencyDistributionBounds as FDB
    Left Outer Join qryP&L as PL
        On (PL.AfterTaxPL>= FDB.LowerBound Or FDB.LowerBound IsNull)
            And (PL.AfterTaxPL <= FDB.UpperBound Or FDB.UpperBound Is Null)
Group By FDB.LowerBound, FDB.UpperBound
Order By FDB.LowerBound

If Access does not support inequalities in the join condition, you can move
the conditions to a Where clause. It looks like you have small gaps between
your bands: upper limit = -1000 and the next band's lower limit = -999,
so -999.50 does not fall into either band. You could maybe use -999.99 as
the lower limit, or use -1000 and change the upper bound to just < instead
of <=.

Hi Paul,

First, your solution is working perfectly fine. Thanks for that!
Second, you are right with the boundaries, I only used the < for the
upper bound.

I am just wondering whether you are willing to explain the SQL code to
me in a more detailed fashion. I particularly don't understand the
Left Outer Join command as well as the "FDB.LowerBound Is Null"
criteria.

Also, "TheCount" now gives me absolute values. What if I need the
result in percentage values, i.e. I need to divide the result by the
sum of all counts. How am I supposed to incorporate this issue into
the SQL code?

Regards,
Andreas
 
The bounds are not uniformly spaced and as I have negative values for
50% of the bounds, I can't use the Partition() function. This function
is only good for boundaries with positive values, at least as far as I
know.

I see that Paul Shapiro has provided a solution. Uniform spacing is
the most important criteria for using the Partition() function, at
least easily :-). With a little creativity, negative values should
not have stopped you from using the Partition() function.

James A. Fortune
(e-mail address removed)
 
I see that Paul Shapiro has provided a solution. Uniform spacing is
the most important criteria for using the Partition() function, at

I should have said "criterion."

James A. Fortune
(e-mail address removed)
 
Andreas said:
Hi Paul,

First, your solution is working perfectly fine. Thanks for that!
Second, you are right with the boundaries, I only used the < for the
upper bound.

I am just wondering whether you are willing to explain the SQL code to
me in a more detailed fashion. I particularly don't understand the
Left Outer Join command as well as the "FDB.LowerBound Is Null"
criteria.

Also, "TheCount" now gives me absolute values. What if I need the
result in percentage values, i.e. I need to divide the result by the
sum of all counts. How am I supposed to incorporate this issue into
the SQL code?

Regards,
Andreas

Left Outer Join means that all records from the left-side table in the join
expression are included in the output, even if there are no matching records
in the right-side table. If you used an inner join you would not see any
band that had zero rows in the output, meaning no matching rows in qryP&L.

The "FDB.LowerBound Is Null" is required because your lowest band has a null
value for the lower limit, and similarly the highest band has a null value
for the upper limit. Any comparison to a null value returns null, not true.
So (100 > null) is false, and those two outer bands would not be included in
your output without the special null criteria.

For the percentages, this would work in SLQ Server, but maybe not in Access.
The only change is the scalar subquery used in the Select clause to compute
the total row count.
Select
FDB.LowerBound, FDB.UpperBound, count(PL.AfterTaxPL) as TheCount
, count(PL.AfterTaxPL)/(Select count(*) as TotalRows From qryP&L) as
ThePercentage
From tblFrequencyDistributionBounds as FDB
Left Outer Join qryP&L as PL
On (PL.AfterTaxPL>= FDB.LowerBound Or FDB.LowerBound Is Null)
And (PL.AfterTaxPL <= FDB.UpperBound Or FDB.UpperBound Is Null)

If that doesn't work, then maybe this would:
Select
FDB.LowerBound, FDB.UpperBound, count(PL.AfterTaxPL) as TheCount
, count(PL.AfterTaxPL)/RowCount.TotalRows as ThePercentage
From tblFrequencyDistributionBounds as FDB
Left Outer Join qryP&L as PL
On (PL.AfterTaxPL>= FDB.LowerBound Or FDB.LowerBound Is Null)
And (PL.AfterTaxPL <= FDB.UpperBound Or FDB.UpperBound Is Null)
Cross Join (Select count(*) as TotalRows From qryP&L) as RowCount

The cross join (syntax may not be supported in Access?) is the same as an
inner join without an On clause. It combines all rows from each table. In
this case the RowCount subquery will always have a single row, so the only
change is to make that count available.

If that still doesn't work you could use a saved query to compute the total
row count. Include that row counting saved query in your present query,
without any join lines to any other table.
 
Left Outer Join means that all records from the left-side table in the join
expression are included in the output, even if there are no matching records
in the right-side table. If you used an inner join you would not see any
band that had zero rows in the output, meaning no matching rows in qryP&L..

The "FDB.LowerBound Is Null" is required because your lowest band has a null
value for the lower limit, and similarly the highest band has a null value
for the upper limit. Any comparison to a null value returns null, not true.
So (100 > null) is false, and those two outer bands would not be includedin
your output without the special null criteria.

For the percentages, this would work in SLQ Server, but maybe not in Access.
The only change is the scalar subquery used in the Select clause to compute
the total row count.
Select
     FDB.LowerBound, FDB.UpperBound, count(PL.AfterTaxPL) as TheCount
    , count(PL.AfterTaxPL)/(Select count(*) as TotalRows From qryP&L)as
ThePercentage
From tblFrequencyDistributionBounds as FDB
     Left Outer Join qryP&L as PL
         On (PL.AfterTaxPL>= FDB.LowerBound Or FDB.LowerBoundIs Null)
             And (PL.AfterTaxPL <= FDB.UpperBound Or FDB.UpperBound Is Null)

If that doesn't work, then maybe this would:
Select
     FDB.LowerBound, FDB.UpperBound, count(PL.AfterTaxPL) as TheCount
    , count(PL.AfterTaxPL)/RowCount.TotalRows as ThePercentage
From tblFrequencyDistributionBounds as FDB
     Left Outer Join qryP&L as PL
         On (PL.AfterTaxPL>= FDB.LowerBound Or FDB.LowerBoundIs Null)
             And (PL.AfterTaxPL <= FDB.UpperBound Or FDB.UpperBound Is Null)
    Cross Join (Select count(*) as TotalRows From qryP&L) as RowCount

The cross join (syntax may not be supported in Access?) is the same as an
inner join without an On clause. It combines all rows from each table. In
this case the RowCount subquery will always have a single row, so the only
change is to make that count available.

If that still doesn't work you could use a saved query to compute the total
row count. Include that row counting saved query in your present query,
without any join lines to any other table.

Hi Paul,

Thanks for the explanations. Your first solution works perfectly.

Regards,
Andreas
 
fashionable

Andreas said:
The bounds are not uniformly spaced and as I have negative values for
50% of the bounds, I can't use the Partition() function. This function
is only good for boundaries with positive values, at least as far as I
know.

tblFrequencyDistributionBounds

ID LowerBound UpperBound
01 -1000
02 -999 -500
03 -499 -250
04 -249 0
05 1 250
06 251 500
07 500 1000
08 1001

qryP&L

AfterTaxPL
675
350
-800
-20
45
23
150
2500
5400

The frequency distribution should then look like this:

x < -1000: 0
-999 < x < -500: 1
-499 < x < -250: 0
-249 < x < 0: 1
1 < x < 250: 3
251 < x < 500 1
500 < x < 1000 1
x > 1000: 2

Hope that helps clarifying my problem.

Regards,
Andreas
 
Andreas said:
Hi all,

I have a query "qryP&L" which contains the column "AfterTaxPL". These
are profit and losses, i.e. positive and negative numbers. I would
like to create a frequency distribution. I already specified the
boundaries in a seperate table called
"tblFrequencyDistributionBounds". This table comprises of three fields
"ID","LowerBound", and "UpperBound".

I would like to create a new query that gives me the frequency
distribution for "AfterTaxPL" according to the boundaries specified in
"tblFrequencyDistributionBounds". I am just not that familiar with SQL
in order to combine this table with the particular column of my
original query.

Can anybody help?

Regards,
Andreas
 
Back
Top