Searching for a small byte array in a large binary file (Quickly!)

I'm trying to quickly grab the dimensions out of the headers of JPEG files.
I need to look for the byte sequence FF C0 00 11 08 in a file that will be from
1 to 40 MB in size. If someone has experience doing a similar type of search, what
is the most efficient method you have found for doing this? I can see doing
this by comparing the first byte of the string to each consecutive byte of
the file until it is found, and then checking for the rest of the string, but
that sounds a bit slow. Thanks in advance for suggestions!

Josh
 
Skwerl said:
I can see doing this by comparing the first byte of the string to each
consecutive byte of the file until it is found, and then checking for the
rest of the string, but that sounds a bit slow.

You can do slightly better than that - examine every 5th byte. If it's
one of the bytes in the string, examine the appropriate bytes around it
(so if it's 0xc0, examine the previous one to check whether or not it's
0xff, then the subsequent three to see whether they're 0x00, 0x11, 0x08).
Any occurrence of a 5-byte string must cover exactly one of the bytes you
sample, so if the byte you examine isn't one of the five you're looking
for, you know the string can't overlap that position, and you can move to
the position 5 bytes along.
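
As a rough sketch of that sampling idea in Python (the thread doesn't name a language, so the function name find_sampled and the assumption that the data is already in memory as bytes are mine, purely for illustration):

```python
PATTERN = bytes.fromhex("FFC0001108")

def find_sampled(data: bytes, pattern: bytes = PATTERN) -> int:
    """Return the index of the first occurrence of pattern in data, or -1.

    Only every len(pattern)-th byte is inspected directly; the bytes
    around it are checked only when the sampled byte occurs in the pattern.
    """
    m = len(pattern)
    if m == 0:
        return 0
    # Map each byte value to the offsets at which it occurs in the pattern.
    offsets = {}
    for i, b in enumerate(pattern):
        offsets.setdefault(b, []).append(i)

    n = len(data)
    # Sample positions m-1, 2m-1, ...; every occurrence covers exactly one.
    for pos in range(m - 1, n, m):
        candidates = offsets.get(data[pos])
        if not candidates:
            continue  # the pattern cannot overlap this position
        # Largest offset first, so the earliest possible start is tried first.
        for off in reversed(candidates):
            start = pos - off
            if start + m <= n and data[start:start + m] == pattern:
                return start
    return -1
```

In practice Python's built-in data.find(PATTERN) runs in C and will likely be faster than this loop; the sketch is only meant to show why skipping 5 bytes at a time cannot miss a match.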

It probably won't save *very* much time, as you'll still do the same
amount of IO, but there'll be less CPU time used examining the memory.
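
On the IO side, one option is to read the file in chunks and stop at the first hit, so only the part of the file up to the match gets read. A minimal sketch, assuming a binary stream opened with open(path, "rb"); the name find_in_stream and the 1 MB default chunk size are arbitrary choices of mine:

```python
import io

def find_in_stream(stream, pattern: bytes, chunk_size: int = 1 << 20) -> int:
    """Return the absolute offset of the first occurrence of pattern, or -1."""
    overlap = len(pattern) - 1
    tail = b""   # last few bytes of the previous buffer
    base = 0     # absolute file offset of the first byte of the buffer
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return -1
        buf = tail + chunk
        idx = buf.find(pattern)
        if idx != -1:
            return base + idx
        # Carry over enough bytes to catch a match straddling two chunks.
        keep = buf[-overlap:] if overlap > 0 else b""
        base += len(buf) - len(keep)
        tail = keep

# Example with an in-memory stream standing in for a real file.
demo = io.BytesIO(b"\x00" * 10 + bytes.fromhex("FFC0001108") + b"\x04\xB0\x06\x40")
offset = find_in_stream(demo, bytes.fromhex("FFC0001108"), chunk_size=4)
```

The overlap of len(pattern) - 1 bytes is what keeps a match from being lost when it starts in one chunk and ends in the next.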
 
Yes, Patrice, you are correct that I should not need to scan the entire
file. The problem is that I don't always know that a .jpg image file will
actually BE a JPEG. The other problem is that the various internal
thumbnails that Photoshop, ACDSee etc. put in the file come before the
actual image and are marked the same way, since they are in effect full JPEG
files inside the JPEG. I haven't found a really quick method to hop right to
the actual image and skip all the thumbnails yet. Hopping through the file
using the length bytes might be the best way. Thanks!
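
For what it's worth, here is a rough Python sketch of hopping by the length bytes. It assumes the usual JPEG layout: each segment is FF, a marker byte, then a 2-byte big-endian length that counts itself, and a start-of-frame segment holds precision (1 byte), height (2 bytes) and width (2 bytes) after the length. The name jpeg_dimensions is made up for illustration, and the sketch has not been tried against real Photoshop or ACDSee files.

```python
import struct

# Start-of-frame markers (SOF0..SOF15, excluding DHT 0xC4, JPG 0xC8, DAC 0xCC).
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}

def jpeg_dimensions(data: bytes):
    """Return (width, height) of the main image, or None if not found."""
    if data[:2] != b"\xFF\xD8":
        return None  # no SOI marker, so not a JPEG despite the extension
    pos, n = 2, len(data)
    while pos + 4 <= n:
        if data[pos] != 0xFF:
            return None  # lost sync with the segment structure
        marker = data[pos + 1]
        if marker == 0xFF:
            pos += 1  # fill byte before a marker
            continue
        if marker in (0xD9, 0xDA):
            return None  # EOI or start of scan reached without a frame header
        if marker in (0x01, 0xD8) or 0xD0 <= marker <= 0xD7:
            pos += 2  # standalone markers carry no length field
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if length < 2:
            return None  # corrupt length field
        if marker in SOF_MARKERS:
            if pos + 9 > n:
                return None  # truncated frame header
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        # Skip the whole segment, including any thumbnail stored inside it.
        pos += 2 + length
    return None
```

Since embedded thumbnails normally sit inside APPn segments, jumping over each segment by its length should skip them without their FF C0 markers ever being looked at, and the SOI check at the top handles the .jpg file that isn't really a JPEG.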
 