Frequency distribution and descriptive statististics

  • Thread starter Thread starter Luciano Paulino da Silva
  • Start date Start date
L

Luciano Paulino da Silva

Dear All,
I have a worksheet like the following distribution in A and B columns
where the frequency is the number of occurrences for a given class
that could be repeated:
A B
Frequency Class
38 3
11 3
2 10
53 2
33 3
19 4
16 5
15 6
2 7
2 8
13 4
53 2
39 3
15 5

For some situations I can have thousands of rows.

In this way, I first need that the frequency distribution of these
data could be organized like the following:

Class Frequency
2 106
3 121
4 32
5 16
6 15
7 2
8 2
9 0
10 2

After that, I need that the descriptive statistics of this data could
be calculated including:

Mean
Median
Mode
Minimum
First Quartile (Q1)
Second Quartile (Q2)
Third Quartile (Q3)
Maximum
R

Variance
Standard deviation
Mean deviation
Sum of squares of deviation

Skewness
Kurtosis
Normal-inverse Gaussian distribution
Confidence interval 0.05
Confidence interval 0.50
Confidence interval 0.95

Somebody could help me with a macro or other solution to this?
Thanks in advance,
Luciano
 
Dear Joel,
Thank you very much for your prompt response. However, I have tested
the macro and it is running fine but the descriptive statistics is
being calculated only from frequency data instead class data taking
into account the number of ocurrences on the frequency.
The example given:
2 106
3 121
4 32
5 16
6 15
7 2
8 2
9 0
10 2

The column to be taken into account it would be the column 1 for which
the mean and all other parameters should be calculated considering the
number of occurrences of the... for example the number 2 is repeated
106 times. Did you understand the major question?
Thanks in advance,
Luciano
 
Dear Bernd,
Your suggestions are good options. However, I need a more complete
descriptive statistics including several other calculations.
Thank you,
Luciano
 
Dear Joel,
Thank you very much for your help. However I believe that I'm not
explaining correctly what should be the method. I need a statistics
for weighted values for which I will not have the descriptive
statistics of the frequency or class but the descriptive statistics of
the frequency distribution table. Like this in the case of the mean:
http://www.ltcconline.net/greenl/courses/201/descstat/meanSDGrouped.htm
Thanks in advance,
Luciano
 
I took all the data into an array TotalArray which has each class item
duplicated in the array acording to the frequency.  I then ran the
statics on this array and put it into column G

Sub GetStatistics()
Dim FrequencyArray() As Variant
Dim TotalArray() As Variant

Set Oldsht = ActiveSheet

RowCount = 2
NewRow = 2
With Oldsht
.Range("E1") = "Class"
.Range("F1") = "Frequency"
LastRow = .Range("A" & Rows.Count).End(xlUp).Row
Set ClassRange = .Range("B1:B" & LastRow)

'put a the data into a single array
TotalCount = 0
Do While .Range("A" & RowCount) <> ""
Class = .Range("B" & RowCount)
Frequency = .Range("A" & RowCount)
For i = 1 To Frequency
ReDim Preserve TotalArray(0 To TotalCount)
TotalArray(TotalCount) = Class
TotalCount = TotalCount + 1
Next i

'look up class in column E
Set c = .Columns("E").Find(what:=Class, _
LookIn:=xlValues, lookat:=xlWhole)
If c Is Nothing Then
.Range("E" & NewRow) = Class
.Range("F" & NewRow) = Frequency
NewRow = NewRow + 1
Else
.Range("F" & c.Row) = _
.Range("F" & c.Row) + Frequency
End If

RowCount = RowCount + 1
Loop

'sort the results
LastRow = .Range("E" & Rows.Count).End(xlUp).Row
.Range("E1:F" & LastRow).Sort _
header:=xlYes, _
Key1:=.Range("E1"), _
order1:=xlAscending

Set DataRange = .Range("F2:F" & LastRow)
SummaryRow = LastRow + 2
.Range("E" & SummaryRow) = "Statistics"

Mean = WorksheetFunction.Average(DataRange)
.Range("E" & (SummaryRow + 1)) = "Mean"
.Range("F" & (SummaryRow + 1)) = Mean
Mean = WorksheetFunction.Average(TotalArray)
.Range("G" & (SummaryRow + 1)) = Mean

Median = WorksheetFunction.Median(DataRange)
.Range("E" & (SummaryRow + 2)) = "Median"
.Range("F" & (SummaryRow + 2)) = Median
Median = WorksheetFunction.Median(TotalArray)
.Range("G" & (SummaryRow + 2)) = Median

Mode = WorksheetFunction.Mode(DataRange)
.Range("E" & (SummaryRow + 3)) = "Mode"
.Range("F" & (SummaryRow + 3)) = Mode
Mode = WorksheetFunction.Mode(TotalArray)
.Range("G" & (SummaryRow + 3)) = Mode

Minimum = WorksheetFunction.Min(DataRange)
.Range("E" & (SummaryRow + 4)) = "Minimum"
.Range("F" & (SummaryRow + 4)) = Minimum
Minimum = WorksheetFunction.Min(TotalArray)
.Range("G" & (SummaryRow + 4)) = Minimum

Quartile = WorksheetFunction.Quartile(DataRange, 1)
.Range("E" & (SummaryRow + 5)) = "First Quartile"
.Range("F" & (SummaryRow + 5)) = Quartile
Quartile = WorksheetFunction.Quartile(TotalArray, 1)
.Range("G" & (SummaryRow + 5)) = Quartile

Quartile = WorksheetFunction.Quartile(DataRange, 2)
.Range("E" & (SummaryRow + 6)) = "Second Quartile"
.Range("F" & (SummaryRow + 6)) = Quartile
Quartile = WorksheetFunction.Quartile(TotalArray, 2)
.Range("G" & (SummaryRow + 6)) = Quartile

Quartile = WorksheetFunction.Quartile(DataRange, 3)
.Range("E" & (SummaryRow + 7)) = "Third Quartile"
.Range("F" & (SummaryRow + 7)) = Quartile
Quartile = WorksheetFunction.Quartile(TotalArray, 3)
.Range("G" & (SummaryRow + 7)) = Quartile

Max = WorksheetFunction.Max(DataRange)
.Range("E" & (SummaryRow + 8)) = "Maximum"
.Range("F" & (SummaryRow + 8)) = Max
Max = WorksheetFunction.Max(TotalArray)
.Range("G" & (SummaryRow + 8)) = Max

.Range("E" & (SummaryRow + 9)) = "Pearson product-moment"
.Range("F" & (SummaryRow + 9)) = "What are the 2nd data Points?"

Variance = WorksheetFunction.Var(DataRange)
.Range("E" & (SummaryRow + 10)) = "Variance"
.Range("F" & (SummaryRow + 10)) = Variance
Variance = WorksheetFunction.Var(TotalArray)
.Range("G" & (SummaryRow + 10)) = Variance

StdDev = WorksheetFunction.StDev(DataRange)
.Range("E" & (SummaryRow + 8)) = "Standard Deviation"
.Range("F" & (SummaryRow + 8)) = StdDev
StdDev = WorksheetFunction.StDev(TotalArray)
.Range("G" & (SummaryRow + 8)) = StdDev

NewCol = 7 'ColG
.Cells(1, NewCol) = "Mean"
.Cells(1, (NewCol + 1)) = "Median"
.Cells(1, (NewCol + 2)) = "Mode"
.Cells(1, (NewCol + 3)) = "Minimum"
.Cells(1, (NewCol + 4)) = "First Quartile"
.Cells(1, (NewCol + 5)) = "Second Quartile"
.Cells(1, (NewCol + 6)) = "Third Quartile"
.Cells(1, (NewCol + 7)) = "Maximum"
.Cells(1, (NewCol + 8)) = "Pearson product-moment"
.Cells(1, (NewCol + 9)) = "Variance"
.Cells(1, (NewCol + 10)) = "Standard Deviation"

'second method of statistics

For RowCount = 2 To LastRow
Class = .Range("E" & RowCount)

'put original data for class into an array
Erase FrequencyArray
ClassCount = 0
Set c = ClassRange.Find(what:=Class, _
LookIn:=xlValues, lookat:=xlWhole)
If Not c Is Nothing Then
FirstAddr = c.Address
Do
ReDim Preserve FrequencyArray(0 To ClassCount)
FrequencyArray(ClassCount) = c.Offset(0, -1)
ClassCount = ClassCount + 1
Set c = ClassRange.FindNext(after:=c)
Loop While Not c Is Nothing And c.Address <> FirstAddr
End If

Mean = WorksheetFunction.Average(FrequencyArray)
.Cells(RowCount, NewCol) = Mean

Median = WorksheetFunction.Median(FrequencyArray)
.Cells(RowCount, (NewCol + 1)) = Median

'Mode = WorksheetFunction.Mode(FrequencyArray)
'.Cells(RowCount, (NewCol + 2)) = Mode

Minimum = WorksheetFunction.Min(FrequencyArray)
.Cells(RowCount, (NewCol + 3)) = Minimum

Quartile = WorksheetFunction.Quartile(FrequencyArray, 1)
.Cells(RowCount, (NewCol + 4)) = Quartile

Quartile = WorksheetFunction.Quartile(FrequencyArray, 2)
.Cells(RowCount, (NewCol + 5)) = Quartile

Quartile = WorksheetFunction.Quartile(FrequencyArray, 3)
.Cells(RowCount, (NewCol + 6)) = Quartile

Max = WorksheetFunction.Max(FrequencyArray)
.Cells(RowCount, (NewCol + 7)) = Max

'Pearson = WorksheetFunction _
' .Pearson(FrequencyArray, DataRange)
'.Cells(RowCount, (NewCol + 8)) = Pearson

If UBound(FrequencyArray) > 0 Then
Variance = WorksheetFunction.Var(FrequencyArray)
Else
Variance = 0
End If
.Cells(RowCount, (NewCol + 9)) = Variance

If UBound(FrequencyArray) > 0 Then
StdDev = WorksheetFunction.StDev(FrequencyArray)
Else
StdDev = 0
End If
.Cells(RowCount, (NewCol + 10)) = StdDev
Next RowCount
End With
End Sub

Dear Joel,
Thank you ver much for your code. However, I id happening some thing
that I do not understand with it since for a given data like bellow, I
have some solutions that I can not explain what didi they represent:

data:

Frequency Number
1 3
5 3
2 11
1 7
2 8
5 2
2 2
3 3
2 3
2 5
1 3
1 5
1 3
1 5
1 3
1 2
1 2
1 5
1 5
1 2
1 3
1 3
1 5
1 9
1 7
1 7
1 9
1 7
1 3

I believe that the results of the code should be:

Class Frequency
2 10
3 17
5 7
7 4
8 2
9 2
11 2

Statistics
Mean 6,285714286
Median 4
Mode 2
Minimum 2
First Quartile 2
Second Quartile 4
Third Quartile 8,5
Standard Deviation 5,61884584
Pearson product-moment What are the 2nd data Points?
Variance 31,57142857

But the results are:

Class Frequency Mean Median Mode Minimum First Quartile Second
Quartile Third Quartile Maximum Pearson product-moment Variance
Standard Deviation
2 10 2 1 1 1 1 2 5 3 1,732050808
3 17 1,7 1 1 1 1 1,75 5 1,788888889 1,33749351
5 7 1,166666667 1 1 1 1 1 2 0,166666667 0,40824829
7 4 1 1 1 1 1 1 1 0 0
8 2 2 2 2 2 2 2 2 0 0
9 2 1 1 1 1 1 1 1 0 0
11 2 2 2 2 2 2 2 2 0 0

Statistics
Mean 6,285714286 4,318181818
Median 4 3
Mode 2 3
Minimum 2 2
First Quartile 2 3
Second Quartile 4 3
Third Quartile 8,5 5
Standard Deviation 5,61884584 2,531409455
Pearson product-moment What are the 2nd data Points?
Variance 31,57142857 6,408033827

Do you know what is happening?

Thanks in advance,
Luciano
 
I left all three reuslts on the worksheet

Results 1  Column F : Performing the statistics on the frequency
numbers afer combining each class as a single group

Results 2  Column G to R : Getting the statistics on each class

Results 3  Column G : Getting the statistics on each class on the
combined number set of number using the frequency of each class

2 10
3 17
5 7
7 4
8 2
9 2
11 2

I made an array that looks lik this

{2,2,2,2,2,2,2,2,2,2,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,3,5,5,5,5,5,5,5,7,7,7,7,8,8,9,9,11,11}

Then ran the statistics of the above array.

Ok, I understood now, fantastic :)
Thank you,
Luciano
 
Back
Top