Partial file hashing

schneider · Dec 10, 2008

Hi all,

I want to calculate an MD5 hash of a file only up to a certain position
and skip the remaining bytes, for example the first 600KB of a
1MB file. Copying the significant part of the file into a byte[] is not
an option since the files concerned can be very large. Any suggestions?

Sincerely,
Hannes

Alberto Poblacion · Dec 10, 2008

I have not tried this, but I think that it should work:

Inherit from FileStream and override the Read method so that it stops
reading after 600KB, or whatever size you want. Then call
ComputeHash(stream) passing your inherited stream.

Peter Morris · Dec 10, 2008

For a small amount of memory like 600KB you may as well create a memory
stream and copy part of the file into it.

Arne Vajhøj · Dec 11, 2008

I am missing something.

Reading the first 600 KB into a byte[] and hashing that should
not be a problem no matter how big the file is.

Arne

schneider · Dec 11, 2008

Alberto Poblacion said:

I have not tried this, but I think that it should work:

Inherit from FileStream and override the Read method so that it stops
reading after 600KB, or whatever size you want. Then call
ComputeHash(stream) passing your inherited stream.

Nice proposal, I think I'll give it a try. Thanks a lot.

schneider · Dec 11, 2008

Peter Morris said:

For a small amount of memory like 600KB you may as well create a memory
stream and copy part of the file into it.

I guess I didn't make that clear. 600KB was just an example; it
could also be 600MB or even 2GB. My goal is to calculate a hash
efficiently for an arbitrary part of a file without reading all of the
relevant bytes into memory at once.

schneider · Dec 11, 2008

Arne Vajhøj said:

I am missing something.

Reading the first 600 KB into a byte[] and hashing that should
not be a problem no matter how big the file is.

Arne

600KB is an arbitrary value; it could also be 600MB or more. It comes down
to being able to calculate a hash for an arbitrary part of a file (of
arbitrary size) without reading all of the relevant bytes into memory.
Sorry for not mentioning that; I thought it was clear.

Hannes

schneider · Dec 11, 2008

Works great. My new class PartialFileStream derives from FileStream.
It is quite simple, but it works. Here's the "beta" code, with no
claim to perfection, for those who face similar problems.

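using System.IO;

// Exposes only the bytes in [startPosition, endPosition) of a file,
// then reports end of stream.
class PartialFileStream : FileStream
{
    public PartialFileStream(string path, FileMode mode,
                             long startPosition, long endPosition)
        : base(path, mode, FileAccess.Read) // read-only access is enough for hashing
    {
        base.Seek(startPosition, SeekOrigin.Begin);
        ReadTillPosition = endPosition;
    }

    // Absolute file offset at which reading stops (exclusive).
    public long ReadTillPosition { get; set; }

    public override int Read(byte[] array, int offset, int count)
    {
        long remaining = ReadTillPosition - Position;
        if (remaining <= 0)
            return 0; // tell the caller the stream has ended

        // Never read past the end position.
        if (count > remaining)
            count = (int)remaining;

        return base.Read(array, offset, count);
    }
}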
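
For anyone wanting to try this, here is a minimal usage sketch showing how such a stream might be passed to ComputeHash. The class is repeated in condensed form only so the sketch compiles on its own; opening the file with FileAccess.Read is an assumption on my part, and PartialHashDemo and Md5OfRange are hypothetical names, not part of the original post.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Condensed copy of the PartialFileStream idea from the post above.
class PartialFileStream : FileStream
{
    public PartialFileStream(string path, FileMode mode, long startPosition, long endPosition)
        : base(path, mode, FileAccess.Read) // assumption: read-only access
    {
        base.Seek(startPosition, SeekOrigin.Begin);
        ReadTillPosition = endPosition;
    }

    public long ReadTillPosition { get; set; }

    public override int Read(byte[] array, int offset, int count)
    {
        long remaining = ReadTillPosition - Position;
        if (remaining <= 0) return 0; // end of the requested range
        return base.Read(array, offset, (int)Math.Min(count, remaining));
    }
}

static class PartialHashDemo
{
    // Hypothetical helper: MD5 of the bytes in [start, end) of the file, as a hex string.
    public static string Md5OfRange(string path, long start, long end)
    {
        using (var md5 = MD5.Create())
        using (var stream = new PartialFileStream(path, FileMode.Open, start, end))
        {
            // ComputeHash reads the stream in small blocks until Read returns 0,
            // so the range is never held in memory as a whole.
            byte[] hash = md5.ComputeHash(stream);
            return BitConverter.ToString(hash).Replace("-", "");
        }
    }
}
```

A call such as PartialHashDemo.Md5OfRange(path, 0, 600 * 1024) would then hash only the first 600KB of the file.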

Arne Vajhøj · Dec 14, 2008

schneider said:

600KB is an arbitrary value; it could also be 600MB or more. It comes down
to being able to calculate a hash for an arbitrary part of a file (of
arbitrary size) without reading all of the relevant bytes into memory.
Sorry for not mentioning that; I thought it was clear.

In that case open a stream, seek to where you want
to start, read in chunks of maybe 100 KB and update
the hash for each read.

Arne
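
The seek-and-read-in-chunks approach described above could be sketched roughly as follows. This is only an illustration under some assumptions: it uses HashAlgorithm.TransformBlock and TransformFinalBlock to update the hash incrementally, takes the 100 KB chunk size from the suggestion, and the names PartialHash and ComputeMd5 are hypothetical.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

static class PartialHash
{
    // Hashes up to `length` bytes of the file, starting at offset `start`,
    // holding only one chunk in memory at a time.
    public static byte[] ComputeMd5(string path, long start, long length)
    {
        const int ChunkSize = 100 * 1024; // "chunks of maybe 100 KB"

        using (var md5 = MD5.Create())
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            stream.Seek(start, SeekOrigin.Begin);
            var buffer = new byte[ChunkSize];
            long remaining = length;

            while (remaining > 0)
            {
                int toRead = (int)Math.Min(buffer.Length, remaining);
                int read = stream.Read(buffer, 0, toRead);
                if (read == 0)
                    break; // file ended before `length` bytes were available

                // Feed this chunk into the running hash state.
                md5.TransformBlock(buffer, 0, read, null, 0);
                remaining -= read;
            }

            // Finalize with an empty block; the digest is then available in Hash.
            md5.TransformFinalBlock(buffer, 0, 0);
            return md5.Hash;
        }
    }
}
```

Compared with a derived stream, this keeps the range logic in one loop and works with any HashAlgorithm, at the cost of managing the buffer yourself.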

Peter Morris · Dec 15, 2008

Just use Reflector to read the String.GetHashCode method; then all you need
to do is run that code in a loop that reads X bytes into a buffer at a time
until you have read a total of Y bytes.
