Daniel
We are trying to create a separate average for a number of different data
sets, so that we can then compare those averages against each other to see
who stands out.
Our problem is that in each data set there are entries that are
significantly different from the bulk of the entries, so different that
they heavily skew the average for that data set. We could have a data set
that is (simplified) { 1 ... 99, 32456, 65234 }. Obviously, those last two
entries make an average meaningless: the mean of 1 through 99 is 50, but
with those two values included it jumps to over 1000.
What would be the best way to create a more accurate or more meaningful
average? Would data normalization or standard deviations help? If so, how
would I proceed? Is it possible to automatically drop the high-end and
low-end numbers and just average the "middle" 90 percent? In other words,
drop the top five percent and the bottom five percent and average what's
left?
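For what it's worth, this is the kind of thing I was picturing (a rough
Python sketch, purely illustrative; the function name, the sample data,
and the 5 percent cutoff are just my assumptions):

    def trimmed_mean(values, trim_fraction=0.05):
        """Average the middle portion of the data, dropping the
        lowest and highest trim_fraction of entries."""
        ordered = sorted(values)
        k = int(len(ordered) * trim_fraction)  # entries to drop at each end
        middle = ordered[k:len(ordered) - k] if k > 0 else ordered
        return sum(middle) / len(middle)

    # Example: the bulk of the data plus two extreme entries
    data = list(range(1, 100)) + [32456, 65234]
    print(trimmed_mean(data))     # roughly 51, close to the bulk's centre
    print(sum(data) / len(data))  # plain mean, pulled above 1000

I believe SciPy has a ready-made version of this (scipy.stats.trim_mean),
if a library route is preferred, but I may be misremembering the details.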
Any help or ideas would be greatly appreciated.
TIA