Normalizing/Standardizing Data

  • Thread starter: Stumped

Stumped

Hi,

I am comparing data from 2008 and 2009 but want to ensure I am not comparing
apples & oranges. I am hoping someone can help me.
2009:
10 projects total
8 projects done correctly
1 project done partially correct
1 project done wrong
80% done correctly
10% done partially correct
10% done wrong

2008:
13 projects total
12 projects done correctly
1 project done partially correct
1 project done wrong
92.3% done correctly
7.7% done partially correct
7.7% done wrong

Question: am I comparing apples to apples when I say 7.7% were done
incorrectly in 2008 and 10% were done incorrectly in 2009? I think the total
number of projects is skewing my data and somehow I should normalize/equate
the total before doing a percentage check. What do you think?
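
As a rough sketch of the percentage check described above, the shares can be computed directly from the category counts. The function name `shares` and the dictionary keys are made up for illustration. The 2008 total is taken as the sum of the three listed counts (12 + 1 + 1), which is an assumption, since the post gives 13 as the total.

```python
def shares(counts):
    """Return each category's share of the total, in percent (1 decimal)."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

# Counts as listed in the post (hypothetical key names).
counts_2009 = {"correct": 8, "partial": 1, "wrong": 1}
counts_2008 = {"correct": 12, "partial": 1, "wrong": 1}  # sums to 14, not 13

shares_2009 = shares(counts_2009)
shares_2008 = shares(counts_2008)

print("2009:", shares_2009)
print("2008:", shares_2008)
```

Dividing by each year's own total is already a form of normalization, so the two sets of percentages are on the same scale even though the project counts differ.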

I am also doing the same review dollar-wise and doing it dollar-wise shows
things were better in 2009 as compared to 2008. Do I need to do something
with the dollars also to make the dollars from one year to the next be equal,
or just the quantity needs to be equal? Or, not at all?

Thanks.
 
Decided to add the dollars for perspective. Also, the question I am trying to
answer is: was there an improvement from 2008 - 2009 in doing projects
correctly?

2009:
$100,000.00 projects total
$94,031.87 projects done correctly
$2,403.40 projects done partially correct
$3,564.73 projects done wrong
94.0% done correctly
2.4% done partially correct
3.6% done wrong

2008:
$227,158.78 projects total
$181,424.88 projects done correctly
$22,533.31 projects done partially correct
$23,200.59 projects done wrong
79.9% done correctly
9.9% done partially correct
10.2% done wrong
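
One possible way to look at the improvement question dollar-wise is to compute each category's share of the yearly dollar total and then take the change in percentage points from 2008 to 2009. This is only a sketch; the names `dollar_shares` and `change` are hypothetical, and the amounts are copied from the lists above.

```python
def dollar_shares(amounts):
    """Return each category's share of the yearly dollar total, in percent."""
    total = sum(amounts.values())
    return {name: 100 * value / total for name, value in amounts.items()}

dollars_2009 = {"correct": 94031.87, "partial": 2403.40, "wrong": 3564.73}
dollars_2008 = {"correct": 181424.88, "partial": 22533.31, "wrong": 23200.59}

s2009 = dollar_shares(dollars_2009)
s2008 = dollar_shares(dollars_2008)

# Change in percentage points, 2009 minus 2008 (positive = share grew).
change = {name: round(s2009[name] - s2008[name], 1) for name in s2009}

for name in s2009:
    print(f"{name}: 2008 {s2008[name]:.1f}%, 2009 {s2009[name]:.1f}%, "
          f"change {change[name]:+.1f} points")
```

Computed from the listed amounts, the "wrong" share works out to about 10.2% of the 2008 dollars and about 3.6% of the 2009 dollars. Because each year is divided by its own total, the raw dollar totals do not need to be made equal first.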
 
the question I am trying to answer is:
was there an improvement from 2008 - 2009
in doing projects correctly?

I'm not a stats guy, so I'm probably wrong here.
To 'normalize' a vector, try this:
In A1:A3, place your data 8,1,1
Place the following in B1, and copy down to B3.

=A1/(SQRT(SUMSQ($A$1:$A$3)))

Do the same with your other data (12,1,1) (Note: Total = 14, not 13)
The two vectors I get are:

{0.984732, 0.123091, 0.123091}


{0.993127, 0.082761, 0.082761}
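
For anyone who wants to check this outside the spreadsheet, here is a minimal Python sketch of the same calculation as the worksheet formula: each value divided by the square root of the sum of squares. The function name `l2_normalize` is just an illustrative choice.

```python
import math

def l2_normalize(values):
    """Divide each value by sqrt(sum of squares), like A1/SQRT(SUMSQ(range))."""
    norm = math.sqrt(sum(x * x for x in values))
    return [x / norm for x in values]

v2009 = l2_normalize([8, 1, 1])
v2008 = l2_normalize([12, 1, 1])

print([round(x, 6) for x in v2009])  # [0.984732, 0.123091, 0.123091]
print([round(x, 6) for x in v2008])  # [0.993127, 0.082761, 0.082761]
```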



If we divide the two vectors element by element (2009 over 2008), we get the following:

{0.991547, 1.48732, 1.48732}

The "success rate" change is very close to one. (meaning not much of a
difference) This is because the 'success rate' is the biggest component
of your 3 items.

The failure rate seems to have gotten worse, as you said.
It appears worse at 1.48 vs your data of

10/7.7 = 1.2987
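
To see where the two ratios come from, this sketch puts the two kinds of normalization side by side: dividing by the sum (plain proportions) and dividing by the square root of the sum of squares. The function names are made up for illustration, and the 2008 counts are taken as 12, 1, 1 (total 14).

```python
import math

def l1_normalize(values):
    """Plain proportions: each value divided by the sum."""
    total = sum(values)
    return [x / total for x in values]

def l2_normalize(values):
    """Each value divided by the square root of the sum of squares."""
    norm = math.sqrt(sum(x * x for x in values))
    return [x / norm for x in values]

def ratio(a, b):
    """Element-by-element a / b."""
    return [x / y for x, y in zip(a, b)]

y2009 = [8, 1, 1]   # correct, partial, wrong
y2008 = [12, 1, 1]

l1_ratio = ratio(l1_normalize(y2009), l1_normalize(y2008))
l2_ratio = ratio(l2_normalize(y2009), l2_normalize(y2008))

print("proportion ratio:", [round(r, 4) for r in l1_ratio])
print("unit-vector ratio:", [round(r, 4) for r in l2_ratio])
```

With a total of 14 for 2008, the proportion ratio for the "wrong" category comes out to 1.4 rather than 1.2987, since the 7.7% figure assumes a total of 13. The two methods give different ratios because they scale by different quantities, so they answer slightly different questions.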

Again, I may be wrong here.
= = = = = = = =
HTH
Dana DeLouis
 