I have an application that generates hourly system performance
logfiles which I graph to look for long-term trends.
The metric I use gradually varies from 1% to about 15% depending on
various external factors - such as time of day and day of week.
My problem is that the logfiles sometimes hiccup and generate bad data,
resulting in huge spikes in my curve. I have trapped for the big ones
(over 20%) in my source data, but I need something smarter so I can
catch large deviations from the curve.
Unfortunately I do not have the option to fix the application that
generated the bad data.
Are there any statistical functions that I can use to look for such
deviations?
Something that would allow me to toss any data point more than 5% off
the trend line would be perfect.
e.g.
value
1.2
1.9
2.4
3.1
2.6
11.3 toss this one using NA() since it is way off the curve
3.4
4.6
6.3
7.5
9.3
11.3 keep this one since it is not too far off the curve
8.8
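For concreteness, a moving-median test in a helper column is roughly
what I have in mind. Assuming the raw values sit in column B starting
at B2 (and the 5-row window and 5-point cutoff are pure guesses on my
part), something like this in C4, filled down the column:

=IF(ABS(B4-MEDIAN(B2:B6))>5,NA(),B4)

i.e. blank out any value more than 5 points away from the median of
its neighbours. I have no idea whether that is statistically sound,
though, which is why I am asking.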
Any suggestions?
Thanks
Bill