Partial file hashing

  • Thread starter: schneider

Hi all,

I want to calculate an MD5 hash of a file only up to a certain position
and skip the remaining bytes, for example the first 600KB of a
1MB file. Copying the relevant part of the file into a byte[] is not an
option, since the files in question can be very large. Any suggestions?

Sincerely,
Hannes
 
I have not tried this, but I think that it should work:

Inherit from FileStream and override the Read method so that it stops
reading after 600KB, or whatever size you want. Then call
ComputeHash(stream) passing your inherited stream.
 
For a small amount of memory like 600KB you may as well create a memory
stream and copy part of the file into it.
 
I am missing something.

Reading the first 600 KB into a byte[] and hashing that should
not be a problem no matter how big the file is.

Arne
 
   I have not tried this, but I think that it should work:

   Inherit from FileStream and override the Read method so that it stops
reading after 600KB, or whatever size you want. Then call
ComputeHash(stream) passing your inherited stream.

Nice proposal, I think I'll give it a try. Thanks a lot.
 
For a small amount of memory like 600KB you may as well create a memory
stream and copy part of the file into it.

I guess I didn't explain it clearly. 600KB was just an example; it
could also be 600MB or even 2GB. My goal is to calculate a hash
efficiently for an arbitrary part of a file without reading all of the
relevant bytes into memory at once.
 
I am missing something.

Reading the first 600 KB into a byte[] and hashing that should
not be a problem no matter how big the file is.

Arne

600KB is an arbitrary value; it could also be 600MB or more. It comes
down to being able to calculate a hash for an arbitrary part of a file
(of arbitrary size) without reading all of the relevant bytes into
memory. Sorry for leaving that out; I thought it was clear.

Hannes
 
   I have not tried this, but I think that it should work:
   Inherit from FileStream and override the Read method so that it stops
reading after 600KB, or whatever size you want. Then call
ComputeHash(stream) passing your inherited stream.

Nice proposal, I think I'll give it a try. Thanks a lot.

Works great. My new class PartialFileStream derives from FileStream.
Quite simple, but it works. Here's the "beta" code without laying
claim to perfection for those who face similar problems.

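using System.IO;

// Exposes only the bytes in [startPosition, endPosition) of a file.
class PartialFileStream : FileStream
{
    public PartialFileStream(string path, FileMode mode,
                             long startPosition, long endPosition)
        : base(path, mode, FileAccess.Read) // read-only access is enough for hashing
    {
        base.Seek(startPosition, SeekOrigin.Begin);
        ReadTillPosition = endPosition;
    }

    // Absolute file offset at which reading stops (exclusive).
    public long ReadTillPosition { get; set; }

    public override int Read(byte[] array, int offset, int count)
    {
        // Report end of stream once the limit has been reached.
        if (base.Position >= this.ReadTillPosition)
            return 0;

        // Shrink the request so that it never crosses the limit.
        if (base.Position + count > this.ReadTillPosition)
            count = (int)(this.ReadTillPosition - base.Position);

        return base.Read(array, offset, count);
    }
}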
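
For anyone who wants to try it, here is a rough sketch of how such a stream might be handed to ComputeHash. The class is repeated in condensed form only so that the snippet compiles on its own; PartialHashDemo and Md5OfRange are made-up names for illustration, and the sketch assumes that ComputeHash pulls its data through Read(byte[], int, int).

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Condensed copy of the class from the post, repeated so the snippet is self-contained.
class PartialFileStream : FileStream
{
    private readonly long end; // absolute offset where reading stops (exclusive)

    public PartialFileStream(string path, long start, long end)
        : base(path, FileMode.Open, FileAccess.Read)
    {
        Seek(start, SeekOrigin.Begin);
        this.end = end;
    }

    public override int Read(byte[] array, int offset, int count)
    {
        long left = end - Position;
        if (left <= 0)
            return 0; // pretend the file ends at the limit
        return base.Read(array, offset, (int)Math.Min(count, left));
    }
}

static class PartialHashDemo
{
    // Hypothetical helper: MD5 (as a hex string) of the bytes in [start, end) of the file.
    public static string Md5OfRange(string path, long start, long end)
    {
        using (var md5 = MD5.Create())
        using (var stream = new PartialFileStream(path, start, end))
        {
            // ComputeHash reads the stream in small blocks until Read returns 0.
            return BitConverter.ToString(md5.ComputeHash(stream));
        }
    }
}
```

Since ComputeHash reads block by block, memory use should stay small regardless of how large the hashed range is.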
 
I am missing something.

Reading the first 600 KB into a byte[] and hashing that should
not be a problem no matter how big the file is.

600KB is an arbitrary value; it could also be 600MB or more. It comes
down to being able to calculate a hash for an arbitrary part of a file
(of arbitrary size) without reading all of the relevant bytes into
memory. Sorry for leaving that out; I thought it was clear.

In that case open a stream, seek to where you want
to start, read in chunks of maybe 100 KB and update
the hash for each read.

Arne
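
The seek-and-read-in-chunks approach could look roughly like the sketch below. It relies on the TransformBlock and TransformFinalBlock methods of HashAlgorithm to update the hash incrementally; PartialHash and ComputeMd5 are made-up names, and the 100 KB buffer size is only the figure suggested above, not a tuned value.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class PartialHash
{
    // Hypothetical helper: MD5 over `length` bytes of the file, starting at offset `start`.
    public static byte[] ComputeMd5(string path, long start, long length)
    {
        using (var md5 = MD5.Create())
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            stream.Seek(start, SeekOrigin.Begin);

            var buffer = new byte[100 * 1024]; // only one chunk is held in memory
            long remaining = length;

            while (remaining > 0)
            {
                int toRead = (int)Math.Min(buffer.Length, remaining);
                int read = stream.Read(buffer, 0, toRead);
                if (read == 0)
                    break; // the file ended before `length` bytes were read

                // Feed this chunk into the running hash; no output buffer is needed.
                md5.TransformBlock(buffer, 0, read, null, 0);
                remaining -= read;
            }

            // Finish the hash with an empty final block.
            md5.TransformFinalBlock(buffer, 0, 0);
            return md5.Hash;
        }
    }
}
```

This avoids subclassing FileStream altogether, at the cost of writing the read loop by hand.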
 
Just use Reflector to read the String.GetHashCode method; then all you
need to do is run that code in a loop that reads X bytes into a buffer at
a time until you have read a total of Y bytes.
 