Rationale behind int to enum casts

Thread starter: Andreas Huber
Daniel,
The best I can think of: a project I worked on about a year ago was
processing data files produced by a horridly buggy program (an apparently
rather old one which the company was unwilling to replace, even though it no
longer worked in their current situation). It would commonly pump out
invalid file names in the command file, or generate incorrect command
values. Of course, they didn't tell me this ahead of time, so I went ahead
and wrote against the (correct) test data they gave me, and when I went to
test it live, I found that the source could cause somewhere between 10% and
20% of the entries to be incorrect. So I was left with the task of making the
processor capable of handling all the possible invalid options, which meant
taking away exception handling in a lot of cases where I could and using
manual checks; it wasn't acceptable to spend that much time processing
exceptions.
I don't know how often strange situations like that come up; I just know I
seem to find well more than my share of them.

How can a program obviously doing lots of disk I/O be slowed down
noticeably by something that happens entirely in the processor? Yes
the processing of one exception is slow when directly compared to a
single if but I can't really believe it is so slow that it overshadows
the time spent reading a single file. More specifically, you say that
about 20% of the files were non-existent, how can the time spent
processing the exceptions thrown for those 20% be in the same league
as the time spent reading/writing the existing 80% of the files?
In my experience the processing of a single exception takes
microseconds (I've just tested it) whereas I/O is always in the
millisecond range. The only thing that takes a lot of time is when you
query the stack trace of an exception, that is usually in the second
range.
BTW, I don't doubt your experience; it is just so contrary to
everything I've seen.
Do you have any before/after timings of your app, if possible with
stats that show how many exceptions were thrown?
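Andreas's microsecond figure is easy to sanity-check. The thread is about .NET, but a quick Python sketch (used here purely for illustration; the function names are invented) shows the same order-of-magnitude gap between raising/catching an exception and a plain conditional:

```python
import timeit

N = 100_000

def with_exception():
    # Simulate a failed operation reported via an exception.
    try:
        raise ValueError("missing file")
    except ValueError:
        pass

def with_check():
    # The same failure reported via a plain flag check.
    missing = True
    if missing:
        pass

exc_time = timeit.timeit(with_exception, number=N)
chk_time = timeit.timeit(with_check, number=N)

# Per-call costs: the exception path is far slower than the check,
# but still microsecond-scale, i.e. dwarfed by millisecond disk I/O.
print(f"exception: {exc_time / N * 1e6:.2f} us/call")
print(f"check:     {chk_time / N * 1e6:.2f} us/call")
```

On typical hardware both numbers come out at a microsecond or below per call, which supports the point that exception overhead should vanish next to millisecond-scale I/O unless stack traces are queried.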
Well, the situation comes up in batch processing. If you have a list of
1,000, 2,000, or 10,000 files, the chance that 10 or 100 of them may not exist
isn't small, and 10 or 100 exceptions isn't my kind of thing in that situation.
Also, the chance that the folder information is cached in memory is pretty
high, higher after the first call, assuming you're not scanning across a huge
possible directory set. The call to File.Exists() should remove any disk
access required in the open call; otherwise the call to open the file would
take the same period of time.
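The pre-check pattern described above can be sketched as follows. This is Python rather than C# (`os.path.exists` playing the role of `File.Exists()`), and `process_batch` is a made-up name for illustration:

```python
import os
import tempfile

def process_batch(paths):
    """Skip missing files with a cheap existence check instead of
    letting open() raise once per bad entry."""
    processed, skipped = [], []
    for path in paths:
        if not os.path.exists(path):   # analogous to File.Exists()
            skipped.append(path)       # just log the bad entry and move on
            continue
        with open(path) as f:
            f.read()                   # stand-in for the real per-file work
        processed.append(path)
    return processed, skipped

# Demo: one real file and two entries that don't exist.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("data")
    real = tmp.name

good, bad = process_batch([real, "/no/such/file1", "/no/such/file2"])
print(len(good), len(bad))  # 1 2
os.unlink(real)
```

Note the caveat that applies in .NET as well: the existence check is a heuristic, since a file can disappear between the check and the open, so a try/catch around the open is still needed for full correctness.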

Again, I assume you are going to read/write the files that you have
opened successfully. No matter how little data you read/write
from/to each file, it's always going to take milliseconds, as
there is no way that their content is already cached, unless you run
your batch processing app twice on the same data, which sort of defeats
the whole purpose of batch processing, doesn't it?
Moreover, the processing of those 100 exceptions will only take
milliseconds. How should that ever be noticeable?
Anyway, put enums in the same situation. Batch processing a huge collection
of basic entries, one member of which is an enum: if the exception from the
cast causes 1 second of delay, and there are many incorrect enums (that does
happen), what is the performance drain? This is a case where the value
matters and can be handled directly by the loader (just log an invalid entry and
move on). Granted, a simple Try-style method to load the enum and return false
if it fails would solve that problem as well.
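The "Try-style method" idea can be sketched like this. Python's `IntEnum` is used as a stand-in for a C# enum (note the semantics differ: a C# cast like `(Command)7` succeeds silently, while Python raises `ValueError` for undefined values), and `Command` and `try_parse` are invented names for illustration:

```python
from enum import IntEnum
from typing import Optional

class Command(IntEnum):   # hypothetical command codes read from a data file
    LOAD = 1
    SAVE = 2
    DELETE = 3

def try_parse(value: int) -> Optional[Command]:
    """Return the enum member for value, or None if it is undefined,
    so the loader can log the bad entry and move on."""
    try:
        return Command(value)
    except ValueError:
        return None

records = [1, 2, 7, 3, 99]          # 7 and 99 are corrupt entries
parsed = [try_parse(v) for v in records]
invalid = [v for v, p in zip(records, parsed) if p is None]
print(invalid)  # [7, 99]
```

The caller sees only a None result per bad record, so whether exceptions are used internally becomes an implementation detail rather than a per-record design question.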

I don't see how the difference between the processing of one exception
and the processing of one error-code can be anywhere near one second.
This can only happen if you query the stack trace of the exception but
that would not be fair as you don't have a stack-trace with
error-codes, right?

BTW, one other situation where exception processing is really slow is
when you run a program in the debugger, but I guess that was not the
case when you took those timings.

[snip]
Still, that generates unexpected functionality for the experienced developer
if set on by default, and would be ignored by the new programmer if it was
set off.

Not quite, you forget the situation where you have one project
architect setting up the VS project, working together with a number of
newbies. This is the most common situation in projects today.
I would therefore be happy if the setting were off by default; I could
then turn it on for new projects and leave it off for current
projects.
[snip]
The main advantage of this is that it would allow both Flags and non-Flags
decorated enums to be validly converted in either situation. It can be that
in one case a Flags enum needs to be precise, but in others it needs to be
loose. A specific rule based on Flags potentially causes a loss of flexibility.
Adding another attribute to the mix fixes that, but adds more complexity and
locks the enum's use to what the author of the enum wants. In some cases the
author cannot know how exactly the enum will be used outside of it being
Flags or a singular enum, or the enum will be of value in another
situation.
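The precise-vs-loose distinction for a Flags enum can be illustrated with a small sketch. `FileAccess` and `convert` are hypothetical names, and Python's `IntFlag` stands in for a `[Flags]`-decorated C# enum:

```python
from enum import IntFlag

class FileAccess(IntFlag):   # hypothetical [Flags]-style enum
    READ = 1
    WRITE = 2
    EXECUTE = 4

ALL_BITS = int(FileAccess.READ | FileAccess.WRITE | FileAccess.EXECUTE)

def convert(value: int, strict: bool) -> FileAccess:
    """Strict (precise): reject any bit outside the defined set.
    Loose: silently drop the undefined bits and keep the rest."""
    if strict:
        if value & ~ALL_BITS:
            raise ValueError(f"undefined flag bits in {value:#x}")
        return FileAccess(value)
    return FileAccess(value & ALL_BITS)

print(int(convert(3, strict=True)))    # 3: READ|WRITE, valid either way
print(int(convert(11, strict=False)))  # 3: undefined bit 8 dropped
```

A single attribute-driven rule would force one of these two policies on every caller; exposing the choice at the conversion site is what keeps both uses possible.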

Maybe, but this is rather uncommon isn't it? I tend to define new
enums for such cases.
It depends; for some things, replacing system enums is a bad idea, as it
seems to confuse people. I don't see any reason why there needs to be four
identical WindowState enums or whatever.

I meant that you only create a new enum if an existing one nearly but
not quite fits your needs. I never felt that urge for enums provided
by the framework.
For a third-party library I'll
rarely reuse one, but I don't like the confusion involved in providing an
identical replacement, both in contents and semantic usage, for something
defined in the mscorlib or System assemblies.

If you really want to create enum objects having more states than what
the enum definition in the .NET framework allows then you're doing
something that is rather uncommon. You could still do it but you'd
have to take the detour via a special function rather than the direct
route via the cast. I think that's perfectly reasonable.
Beyond that, you can't always
ignore the uncommon; it seems uncommon cases come up more often across the board
than it ever seems they could.

As I've said several times now: I don't want to ignore the uncommon!
Rather, I want the uncommon to be more complicated (or less obvious)
than the common. With the current implementation of enums it is
exactly the other way round, which is a burden for I guess at least
90% of the users.

Regards,

Andreas
 
Andreas Huber said:
Daniel,


How can a program obviously doing lots of disk I/O be slowed down
noticeably by something that happens entirely in the processor? Yes
the processing of one exception is slow when directly compared to a
single if but I can't really believe it is so slow that it overshadows
the time spent reading a single file. More specifically, you say that
about 20% of the files were non-existent, how can the time spent
processing the exceptions thrown for those 20% be in the same league
as the time spent reading/writing the existing 80% of the files?
In my experience the processing of a single exception takes
microseconds (I've just tested it) whereas I/O is always in the
millisecond range. The only thing that takes a lot of time is when you
query the stack trace of an exception, that is usually in the second
range.
BTW, I don't doubt your experience it is just so contrary to
everything I've seen.
Do you have any before/after timings of your app, if possible with
stats that show how many exceptions were thrown?
It wasn't significant; however, when it slowed down the response time the
users wanted, they got upset. As with many things, perception mattered more
than reality. Oftentimes it'd have large blocks of corrupt files in a row,
and the company got upset if they didn't see files changing. Thinking about
it now, if I had just displayed the next file name after each exception and
moved on, there would have been no problem; however, in the harried and
annoyed state I was in while fixing the problem, that particular solution didn't
come to me (never said I was a very good UI designer). I didn't keep any
timings, but the speed difference wasn't significant; the UI just didn't
stop for a few seconds at a time per batch (of course, most exceptions were
logged... I don't recall if I was calling the log method for file exceptions or
not). If I was, that would certainly have been a very nasty perf problem as
well, 20,000 or so stack trace queries in a row... performance wasn't that
bad.
As I said, it wasn't a very good example, a sadly embarrassing example, but it
was the best I could think of.
 