MD5 for Large Files

Richard Lemay

I am new to VB.NET and I am trying to learn. So, your indulgence for the
triviality of my questions is kindly requested.

I would like to calculate an MD5 hash for very large files. The examples I
came across read the whole file into a byte array and apply the hash to that
array. The following code illustrates what I am doing. I would like to
perform the hash calculation on a stream instead. Is this possible? If so, an
example would be greatly appreciated.

Imports System.Text
Imports System.Security.Cryptography
Imports System.IO

Public Class Form1
    Inherits System.Windows.Forms.Form

    Dim fs As FileStream = New FileStream("c:\ConfDenise.PDF", FileMode.Open)
    Dim r As BinaryReader = New BinaryReader(fs)

    ' Hashes a byte array that is already fully in memory.
    Public Function GenerateHash(ByVal Buff() As Byte) As String
        Dim Md5 As New MD5CryptoServiceProvider()
        Dim ByteHash() As Byte = Md5.ComputeHash(Buff)
        Return Convert.ToBase64String(ByteHash)
    End Function

#Region " Windows Form Designer generated code "
    ' snip
#End Region

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        fs.Seek(0, SeekOrigin.Begin)
        ' Reads the entire file into memory before hashing it.
        ' ReadBytes takes an Integer, so the Long file length is converted.
        Label1.Text = GenerateHash(r.ReadBytes(CInt(fs.Length)))
    End Sub
End Class
 
If you are familiar with block cipher encryption in .NET, hashes are
supposed to work the same way.
In other words, hash algorithms (like MD5) implement the ICryptoTransform
interface, which allows them to be used by the CryptoStream object.
You could create a memory stream, wrap it with a CryptoStream, and pass
the MD5 instance to the CryptoStream constructor. Any time you "write" to the
CryptoStream, it should transform the data through the protected HashCore
function of MD5. Then, when you are done, you call FlushFinalBlock on the
CryptoStream, which in turn calls the MD5 HashFinal method. After that you can
read the Hash property of the MD5 object to get the final hash value.

Function GetHash() As String

    Dim cs As CryptoStream = Nothing
    Dim ms As MemoryStream = New MemoryStream()
    Dim md5Hash As MD5CryptoServiceProvider = New MD5CryptoServiceProvider()

    Try
        ' Sample data only; real code would fill this buffer from the file.
        Dim buffer As Byte() = New Byte() {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}

        ' Every write to the CryptoStream feeds the data into the hash.
        cs = New CryptoStream(ms, md5Hash, CryptoStreamMode.Write)
        cs.Write(buffer, 0, buffer.Length)

        ' Finalize the hash, then read the result from the Hash property.
        cs.FlushFinalBlock()
        Return Convert.ToBase64String(md5Hash.Hash)

    Catch ex As Exception
        MsgBox("Error during hash operation: " & ex.ToString())
        Return Nothing

    Finally
        If Not (cs Is Nothing) Then cs.Close()
        If Not (md5Hash Is Nothing) Then md5Hash.Clear()
    End Try
End Function

The only thing to note is that in my sample, I just filled the buffer once
with a bunch of numbers.
In your case, you need to wrap a loop around the cs.Write(...) line: read
from your file into the buffer one chunk at a time, and write each chunk
to the CryptoStream until you run out of file.
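
The loop described above might look something like the following. This is only a minimal sketch under a few assumptions: the GetFileHash name, the filePath parameter, the module name, and the 64 KB buffer size are all made up for illustration. It also writes to Stream.Null rather than a MemoryStream, because a hash transform passes the data through unchanged to the underlying stream, so a MemoryStream would end up holding a copy of the whole file.

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module StreamedMd5Sketch

    ' Hypothetical helper: hashes the file at filePath one chunk at a time.
    Function GetFileHash(ByVal filePath As String) As String
        Dim md5Hash As New MD5CryptoServiceProvider()
        Dim fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
        ' Stream.Null discards the pass-through copy of the data.
        Dim cs As New CryptoStream(Stream.Null, md5Hash, CryptoStreamMode.Write)
        Try
            ' 65536 bytes (64 KB); the size is an arbitrary assumption.
            Dim buffer(65535) As Byte
            Dim bytesRead As Integer = fs.Read(buffer, 0, buffer.Length)
            Do While bytesRead > 0
                ' Write only the bytes actually read in this pass.
                cs.Write(buffer, 0, bytesRead)
                bytesRead = fs.Read(buffer, 0, buffer.Length)
            Loop
            ' Finalize the hash before reading the Hash property.
            cs.FlushFinalBlock()
            Return Convert.ToBase64String(md5Hash.Hash)
        Finally
            cs.Close()
            fs.Close()
            md5Hash.Clear()
        End Try
    End Function

    Sub Main(ByVal args() As String)
        ' Usage: pass the path of the file to hash as the first argument.
        If args.Length > 0 Then Console.WriteLine(GetFileHash(args(0)))
    End Sub

End Module
```

Memory use stays at roughly the buffer size regardless of how large the file is, since only one chunk is held at a time.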

-Rob Teixeira [MVP]
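
For completeness, a different technique from the CryptoStream approach above: HashAlgorithm also has a ComputeHash overload that accepts a Stream, so the file stream can be handed to it directly. The sketch below assumes that overload is acceptable for your case; the GetFileHash name, the filePath parameter, and the module name are hypothetical. As far as I know, this overload reads the stream in blocks internally rather than loading the whole file, but check that against your framework version.

```vbnet
Imports System
Imports System.IO
Imports System.Security.Cryptography

Module ComputeHashFromStreamSketch

    ' Hypothetical helper: lets ComputeHash pull the data from the stream itself.
    Function GetFileHash(ByVal filePath As String) As String
        Dim md5Hash As New MD5CryptoServiceProvider()
        Dim fs As New FileStream(filePath, FileMode.Open, FileAccess.Read)
        Try
            ' ComputeHash(Stream) reads from the current position to the end.
            Dim byteHash() As Byte = md5Hash.ComputeHash(fs)
            Return Convert.ToBase64String(byteHash)
        Finally
            fs.Close()
            md5Hash.Clear()
        End Try
    End Function

    Sub Main(ByVal args() As String)
        ' Usage: pass the path of the file to hash as the first argument.
        If args.Length > 0 Then Console.WriteLine(GetFileHash(args(0)))
    End Sub

End Module
```

The CryptoStream version is still the one to use when the same data has to be hashed while it is being written somewhere else, since the data passes through to the wrapped stream.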
 
Thanks Rob,
I'll give it a try.
Richard

 