We definitely will be moving to a better solution for handling these periodic
events in the next version.
I did, however, find the true root of the problem and I can now reproduce it.
The base problem is with the set method of the Timer.Interval property.
Background:
The Interval property is typed as double, but its setter throws an
ArgumentException if the value is less than or equal to zero. You can set
the interval all the way up to Double.MaxValue without error. However, when
you call the Timer.Start() method, it throws an
ArgumentOutOfRangeException if the Interval is greater than or equal to
Int32.MaxValue (2,147,483,647) or less than or equal to zero. The developers
are presumably trying to protect some internal variable from a value that is
too large. (Why this class is designed this way is another mystery. Perhaps
some backwards-compatibility requirement?)
The problem surfaces when you change the Timer’s Interval property after the
timer is running. Internally, some math is done which involves
Environment.TickCount (probably trying to account for the fact that the
interval value may be changed part way through the current interval).
Environment.TickCount is a 32-bit millisecond counter. The property itself is
typed Int32, but read as an unsigned value (UInt32) its maximum is
4,294,967,295, or 49.71 days.
If the current value of the TickCount, when added to the new Interval
value, exceeds UInt32.MaxValue, the timer will begin firing continuously until
the overflow ends. Ending it requires either changing the Interval again or
waiting for the TickCount to reset.
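
A minimal sketch of that condition, written in Python only so the arithmetic
can be run anywhere. It assumes the tick count is treated as an unsigned
32-bit millisecond value, as described above. The name would_overflow is made
up for illustration, and this models the observed behavior rather than the
Timer's actual internals.

```python
UINT32_MAX = 4_294_967_295  # largest unsigned 32-bit value, in milliseconds
MS_PER_DAY = 24 * 60 * 60 * 1000


def would_overflow(tick_count_ms: int, interval_ms: int) -> bool:
    """Return True if tick count plus the new interval passes the 32-bit limit.

    Hypothetical helper: it only models the condition described in the post.
    """
    return tick_count_ms + interval_ms > UINT32_MAX


# A daily interval set after 10 days of uptime stays well under the limit.
print(would_overflow(10 * MS_PER_DAY, MS_PER_DAY))
# The same interval set after 49 days of uptime crosses it.
print(would_overflow(49 * MS_PER_DAY, MS_PER_DAY))
```

The first call reports no overflow and the second reports one, which is the
case where the timer would be expected to start firing continuously.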
To summarize, if your system has been running for less than ~25 days, you
cannot hit the problem, because the largest interval Start() accepts (about
24.86 days) is too small to push the sum past the limit. Over the next ~25
days, the largest safe interval shrinks to zero. If you exceed the limit, the
timer will fire continuously until the end of the TickCount cycle.
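
The shrinking limit can be tabulated with a few lines of arithmetic. This is
a sketch under the same assumption (an unsigned 32-bit millisecond counter
that wraps every 2**32 ms). The function name max_safe_interval_ms is
hypothetical, and the Int32.MaxValue cap is taken from the Start() check
described in the background, so treat the top value as approximate.

```python
INT32_MAX = 2_147_483_647   # approximate cap on Interval enforced by Start()
UINT32_MAX = 4_294_967_295  # largest unsigned 32-bit tick value
MS_PER_DAY = 86_400_000


def max_safe_interval_ms(uptime_ms: int) -> int:
    """Largest interval that keeps tick count + interval within 32 bits."""
    tick = uptime_ms % (UINT32_MAX + 1)  # the counter wraps every 2**32 ms
    return min(INT32_MAX, UINT32_MAX - tick)


for days in (10, 24, 30, 40, 49):
    limit = max_safe_interval_ms(days * MS_PER_DAY)
    print(f"uptime {days:2d} d -> safe interval up to {limit / MS_PER_DAY:.2f} d")
```

For uptimes below about 24.86 days the cap from Start() is the only limit;
after that the safe value falls by one millisecond per millisecond of uptime.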
Because the misbehavior lasts from the moment the interval is set until the
TickCount wraps, the larger the interval, the longer the timer can
misbehave. The maximum time that it can misbehave is the interval you
specified.
If the timer’s event handler updates the interval (as was my case), the
misbehavior will often last less than the full interval, because the problem
does not appear until the event fires and the interval is actually set. (If
the timer’s start time is random with respect to the host computer’s
start time, the error time tends towards an average of Interval/2.)
This corresponds to my observation of field failures, which occurred, on
average, within 12 hours of the system uptime reaching 50 days.
(My hourly timer would also have failed, approximately 11.5 hours after the
daily timers, had the service not been restarted.)
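
One way to read those figures, as a sketch rather than a measurement: assume
the first bad Interval update lands uniformly at random within the final
Interval of the TickCount cycle, so on average it occurs Interval/2 before
the wrap. The function name below is hypothetical.

```python
MS_PER_HOUR = 3_600_000
WRAP_MS = 2**32  # length of one tick count cycle in milliseconds


def expected_first_failure_hours(interval_ms: int) -> float:
    """Average uptime in hours at the first bad Interval update.

    Assumes the update falls uniformly at random inside the last
    `interval_ms` milliseconds before the counter wraps.
    """
    return (WRAP_MS - interval_ms / 2) / MS_PER_HOUR


daily = expected_first_failure_hours(24 * MS_PER_HOUR)
hourly = expected_first_failure_hours(1 * MS_PER_HOUR)

print(f"daily timer:  ~{daily / 24:.2f} days of uptime")
print(f"hourly timer: ~{hourly / 24:.2f} days of uptime")
print(f"gap between them: {hourly - daily:.1f} hours")
```

Under that assumption the daily timer fails about 12 hours before the wrap
and the hourly timer about half an hour before it, which gives the 11.5 hour
gap mentioned above.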
Limitations:
If you are not updating the interval, and are instead using the AutoReset
feature, you will never see this problem. Likewise, if you set the Interval
to a value that, when combined with TickCount, would cause an overflow, and
only then call the timer’s Start() method, you will not experience a
problem.
I now know how to patch my code until I can redesign the next official
version. I do want to file this bug with Microsoft. Does anyone know the
best way to do that?
If you want to investigate this problem yourself, you will probably find the
“Adjust Tick Count” utility useful, as it avoids a lot of waiting.
http://www.ysgyfarnog.co.uk/utilities/adjusttickcount/