[repost] Most efficient way to make sure that 2 byte[] are identical?

  • Thread starter Thread starter José Joye
  • Start date Start date
J

José Joye

I have to compare 2 byte[] and I must be sure that they are fully identic.
I have to perform this check about 1000 times per minute and on arrays that
are between 100-200K in size.
In that sense, what is the most efficient method?

Sincerely,
José
 
That depends. Can you spare some time when these byte arrays are initialy
loaded or created to compute hash values? If so, do that and compare the
hashes. If not, a simple loop comparing each byte and returning false on
the first mismatch will probably be your best bet.

However, look for ways to optimize this based on your data: if the array is
normally modified at the end only, start comparing at the end. If certain
offsets are more likely to be different, compare those. If your code is the
one modifying the byte array, and the arrays are changed infrequently, use
hashes and check them on subsequent tries.


As you can see, there's no one most efficient method, if you really want to
perform best you need to know your data and how it's modified.
 
Thanks for your thoughts!

Since I cannot predict the kind of information I will receive (However, the
probability that the arrays are identicals is really high), I will go for
something like MD5.ComputeHash().


José

Philip Rieck said:
That depends. Can you spare some time when these byte arrays are initialy
loaded or created to compute hash values? If so, do that and compare the
hashes. If not, a simple loop comparing each byte and returning false on
the first mismatch will probably be your best bet.

However, look for ways to optimize this based on your data: if the array is
normally modified at the end only, start comparing at the end. If certain
offsets are more likely to be different, compare those. If your code is the
one modifying the byte array, and the arrays are changed infrequently, use
hashes and check them on subsequent tries.


As you can see, there's no one most efficient method, if you really want to
perform best you need to know your data and how it's modified.

José Joye said:
I have to compare 2 byte[] and I must be sure that they are fully identic.
I have to perform this check about 1000 times per minute and on arrays that
are between 100-200K in size.
In that sense, what is the most efficient method?

Sincerely,
José
 
José Joye said:
Thanks for your thoughts!

Since I cannot predict the kind of information I will receive (However, the
probability that the arrays are identicals is really high), I will go for
something like MD5.ComputeHash().

That's not going to be any faster than doing it the brute force way.

If the arrays are different every time, the fastest thing to do is
almost certainly just check the lengths to start with and then just
compare each byte of the array in the obvious way:

if (array1.Length != array2.Length)
{
return false;
}
for (int i=0; i < array1.Length; i++)
{
if (array1 != array2)
{
return false;
}
}
return true;

The only problem I can see with that is in terms of caching, and I
don't think it'll actually make any odds.
 
You're right Jon.
However, I will stick to the MD5 hashing. This will allow me to attach the
hash to the message and also to calculate it in advance.

Thanks,
José
José Joye said:
Thanks for your thoughts!

Since I cannot predict the kind of information I will receive (However, the
probability that the arrays are identicals is really high), I will go for
something like MD5.ComputeHash().

That's not going to be any faster than doing it the brute force way.

If the arrays are different every time, the fastest thing to do is
almost certainly just check the lengths to start with and then just
compare each byte of the array in the obvious way:

if (array1.Length != array2.Length)
{
return false;
}
for (int i=0; i < array1.Length; i++)
{
if (array1 != array2)
{
return false;
}
}
return true;

The only problem I can see with that is in terms of caching, and I
don't think it'll actually make any odds.
 
José Joye said:
You're right Jon.
However, I will stick to the MD5 hashing. This will allow me to attach the
hash to the message and also to calculate it in advance.

Don't forget that although the hash will never give any false negatives
(arrays which are actually identical but are deemed unequal), it might
give false positives (arrays which are different but are deemed equal).
There's a tiny chance of it happening, but you need to weigh up what
the consequences would be.
 
Back
Top