Compare string contents ignoring order

  • Thread starter: Ted Collins
Bob O`Bob said:
For example, Dirk's code *will* equate "AA" to "AB" (though not
vice-versa).
If you're okay with that kind of failure mode, then great.

Yikes! I could have sworn I'd thought of that, but obviously I
overlooked it. Good catch! *I'm* not okay with that kind of failure mode.
 
Ted Collins said:
Doh! Answered my own question. I trimmed at the same
time as we were fixing the nulls.

Great! But see Bob O`Bob's late post, commenting on my code. It
wouldn't take much to revise the code so that it catches the case he
mentioned.
 
Dirk Goldgar said:
Yikes! I could have sworn I'd thought of that, but obviously I
overlooked it. Good catch! *I'm* not okay with that kind of failure
mode.

Oh, yeah, now I remember: Ted did specifically state that no character
would appear twice in either string, and I predicated my code on that.
Still, if the reliability of that assurance is at all questionable,
checking it from both sides would seem a low-cost safeguard.
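
For readers following along, here is a minimal sketch of the failure mode
and the "both sides" safeguard. It is written in Python rather than the
language used in the thread, the function names are hypothetical, and the
one-sided version is only an assumption about the general shape of the
code under discussion, not a copy of it.

```python
def same_chars_one_sided(a: str, b: str) -> bool:
    """Hypothetical one-sided check.

    Assumes no character appears twice in either string.
    """
    if len(a) != len(b):
        return False
    # Only verifies that every character of a occurs somewhere in b.
    return all(ch in b for ch in a)


def same_chars_two_sided(a: str, b: str) -> bool:
    """Run the same check in both directions."""
    return same_chars_one_sided(a, b) and same_chars_one_sided(b, a)


print(same_chars_one_sided("AA", "AB"))  # True  (the case Bob pointed out)
print(same_chars_one_sided("AB", "AA"))  # False (not vice-versa)
print(same_chars_two_sided("AA", "AB"))  # False
```

The second pass only runs when the first one succeeds, so the extra cost
falls on pairs that pass the first check.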
 
Dirk said:
Oh, yeah, now I remember: Ted did specifically state that no character
would appear twice in either string, and I predicated my code on that.
Still, if the reliability of that assurance is at all questionable,
checking it from both sides would seem a low-cost safeguard.


Even in my own code, the stuff I do that is (at least initially) only for my own use,
I just bristle at any of those kinds of promises, like Ted started from.

That may be largely due to my history of having been bitten by similar but invalid
assumptions, but there's also a certain amount of what "professionalism" means to me
that says one should be aware of what one's code can do even with "unexpected" input.

Obviously a simple routine like we're discussing can have about three possible
outcomes: 'True', 'False', and 'Raise Error'.
Personally, I prefer to factor out the third case wherever that can be done.

In the current situation I recommend: if the error is duplicated characters,
then act the same as if they just weren't there.


Bob
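
One possible reading of the recommendation above, sketched in Python with
a hypothetical function name: compare the sets of characters, so that
repeated characters count once and the routine can only answer True or
False for any pair of strings. This is an illustration under that
assumption, not code posted in the thread.

```python
def same_char_set(a: str, b: str) -> bool:
    """Compare contents ignoring order, treating repeated characters as one."""
    # set() discards duplicates, so "AAB" and "AB" reduce to the same set.
    return set(a) == set(b)


print(same_char_set("AB", "BA"))    # True
print(same_char_set("AA", "AB"))    # False
print(same_char_set("AAB", "ABB"))  # True (duplicates are ignored)
```

Note the design consequence: strings of different lengths can compare
equal here, which is exactly what "act as if the duplicates weren't there"
implies.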
 
Bob O`Bob said:
Even in my own code, the stuff I do that is (at least initially) only
for my own use, I just bristle at any of those kinds of promises,
like Ted started from.

That may be largely due to my history of having been bitten by
similar but invalid assumptions, but there's also a certain amount of
what "professionalism" means to me that says one should be aware of
what one's code can do even with "unexpected" input.

Obviously a simple routine like we're discussing can have about three
possible outcomes: 'True', 'False', and 'Raise Error'.
Personally, I prefer to factor out the third case wherever that can
be done.

In the current situation I recommend: if the error is duplicated
characters, then act the same as if they just weren't there.

I agree in general with your viewpoint. In any given case, though, you
have to decide what assumptions you're going to make about your inputs.
If you choose to make no assumptions, that may maximize the independent
reliability of the routine -- possibly at the cost of performance, as in
this case -- without necessarily improving the reliability of the
application as a whole. That is, there are cases where you can say, "If
these assumptions are untrue, then the application is so broken that
this routine isn't even going to be called, or if it is called it
doesn't matter whether the routine gives the correct result." Because
of what Ted said about the source of his strings, I was taking this to
be the situation with him.

On the other hand, this certainly does have its drawbacks, in the form
of (a) decreased reusability of the code, and (b) danger of
misunderstanding the code's restrictions, so that it gets reused in
other applications in which the assumptions don't apply. So what to do?
It's all well and good to say a procedure should protect itself against
faulty inputs, but extreme zeal here can have a major impact on
performance. In our particular case, it doubles the looping time when
the strings match. That may be an insignificant difference in Ted's
application, or it may not. Assuming it is necessary for performance
reasons to make specific assumptions about the inputs -- assumptions
that you can't actually check without incurring undesired performance
costs -- one way to minimize the chance of future errors is to describe
the usage restriction in the procedure's name.

I think I was at fault here in naming my function as I did, since the
name is misleading as to the generality of its application. Probably I
should have named the function something like "ComparePermissions",
which is more specific to the particular application and makes it less
likely that the function will be reused unthinkingly. Also, I should
have had comments in the code to explicitly state the assumptions; here
I plead the fact that it was just quickie "demonstrator" code. It was
sloppy on my part, though.
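
As a rough illustration of that idea, here is a Python sketch in which
both the name and the comments carry the restriction. The name follows the
"ComparePermissions" suggestion above; the body and the sample permission
letters are assumptions made for the example, not the code from earlier in
the thread.

```python
def compare_permissions(a: str, b: str) -> bool:
    """Return True if a and b hold the same permission characters, in any order.

    ASSUMPTION (deliberately not checked, for speed): no character appears
    twice in either string. If that does not hold, the result is unreliable;
    for example ("AA", "AB") returns True.
    """
    if len(a) != len(b):
        return False
    # Single pass: with no duplicates and equal lengths, one direction suffices.
    return all(ch in b for ch in a)


# Hypothetical permission strings, chosen only for the example.
print(compare_permissions("RWX", "XWR"))  # True
print(compare_permissions("RW", "RX"))    # False
```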
 