ForwardOnlyFileReader - mostly finished implementation...

  • Thread starter: Robert

Robert

There are some TODOs in there, which I will fix in 12-16 hours. Beer time here on the beach..
Also, I have only used this via the test case. I have not plugged it into a BinaryReader
yet. There may be some impedance mismatches...

145 lines for the class, 170 with the test case.
3 hours' work, so about 1 line per minute, including the blank ones.

Last, there is at least one bug..

Happy file reading to all!


Imports System.IO
Imports System.Collections.Generic

Friend Class ForwardOnlyFileReader : Inherits Stream
Friend Class BufferChunk
Private mChunk() As Byte
Private mStartingPosition As Long

Friend ReadOnly Property StartingPosition() As Long
Get
Return mStartingPosition
End Get
End Property

Friend ReadOnly Property LastPosition() As Long
Get
Return mStartingPosition + mChunk.Length
End Get
End Property

Friend Sub New(ByVal StartingPosition As Long, ByRef Chunk() As Byte)
If Chunk.Count = 0 Then Throw New System.ArgumentNullException("Bufferchunk.Read - Chunk is Null.")

mStartingPosition = StartingPosition
mChunk = Chunk
End Sub

Friend Function Read(ByVal ChunkOffset As Integer, ByVal Buffer() As Byte, ByVal BufferOffset As Integer, ByVal DesiredBytes As Integer) As Integer
DesiredBytes = Math.Min(DesiredBytes, mChunk.Count - ChunkOffset) ' take what we can get..
Array.ConstrainedCopy(mChunk, ChunkOffset, Buffer, BufferOffset, DesiredBytes)

Return DesiredBytes
End Function
End Class

Private mBufferChunks As New List(Of BufferChunk)
Private mChunkSize As Integer
Private mLength As Long
Private mPosition As Long

Public Sub New(ByVal FileStream As FileStream, ByVal ChunkSize As Integer)
If FileStream Is Nothing Then Throw New System.ArgumentNullException("FileStream is null.")
If Not FileStream.CanRead Then Throw New System.NotSupportedException("The stream does not support reading.")

mChunkSize = ChunkSize
mLength = FileStream.Length
Dim aBufferChunk As BufferChunk
Do Until FileStream.Position = mLength
Dim PositionBeforeRead As Long = FileStream.Position
Dim Buffer(CInt(Math.Min(ChunkSize - 1, mLength - FileStream.Position - 1))) As Byte
Dim Bytes As Integer = FileStream.Read(Buffer, 0, Buffer.Count)
aBufferChunk = New BufferChunk(PositionBeforeRead, Buffer)
mBufferChunks.Add(aBufferChunk)
Loop

FileStream.Close()
FileStream.Dispose()
End Sub
'TODO another constructor with AutoResetEvent
Public Overrides Function Read(ByVal Buffer() As Byte, ByVal BufferOffset As Integer, ByVal Count As Integer) As Integer
Dim BytesRead As Integer
Dim BytesRemaining As Integer = Count
Dim ChunkOffset As Integer

If BufferOffset < 0 Then Throw New System.ArgumentOutOfRangeException("Bufferchunk.Read - BufferOffset is < 0.")
If Count < 0 Then Throw New System.ArgumentOutOfRangeException("FOFR.Read - count is < 0.")

'TODO - implement System.ArgumentException: The sum of offset and count is larger than the buffer length.
'TODO - implement System.ArgumentNullException: buffer is null.
'cant happen with mem - System.IO.IOException: An I/O error occurs. cant happen with mem
'TODO - implement System.ObjectDisposedException: Methods were called after the stream was closed.

Do
Do While mBufferChunks.Count > 0
If mPosition >= mBufferChunks.Item(0).LastPosition Then
mBufferChunks.RemoveAt(0)
Else
Exit Do ' we have the right chunk.
End If
Loop

If mBufferChunks.Count = 0 Then Return BytesRead

ChunkOffset = CInt(mPosition - mBufferChunks(0).StartingPosition)
BytesRead = mBufferChunks(0).Read(ChunkOffset, Buffer, BufferOffset, BytesRemaining)
BufferOffset = BufferOffset + BytesRead
BytesRemaining = BytesRemaining - BytesRead
mPosition = mPosition + BytesRead
Loop Until (BytesRemaining = 0) Or (mPosition = mLength) ' filled the buffer, or eof

Return BytesRead
End Function
#Region "Properties"
Public Overrides ReadOnly Property Length() As Long
Get
Return mLength
End Get
End Property

Public Overrides Property Position() As Long
Get
Return mPosition
End Get
Set(ByVal value As Long)
mPosition = value ' BufferChunks will be lost..
End Set
End Property

#Region "Unused Properties"
Public Overrides ReadOnly Property CanRead() As Boolean
Get
Return True
End Get
End Property

Public Overrides ReadOnly Property CanSeek() As Boolean
Get
Return False
End Get
End Property

Public Overrides ReadOnly Property CanWrite() As Boolean
Get
Return False
End Get
End Property

Public Overrides Sub Flush()
Throw New NotSupportedException("ForwardOnlyMemoryStreamReader does not support Flush.")
End Sub

Public Overrides Function Seek(ByVal offset As Long, ByVal origin As System.IO.SeekOrigin) As Long
Throw New System.NotSupportedException("ForwardOnlyMemoryStreamReader does not support Seek.")
End Function

Public Overrides Sub SetLength(ByVal value As Long)
Throw New NotSupportedException("ForwardOnlyMemoryStreamReader does not support SetLength.")
End Sub

Public Overrides Sub Write(ByVal buffer() As Byte, ByVal offset As Integer, ByVal count As Integer)
Throw New NotSupportedException("ForwardOnlyMemoryStreamReader does not support Write.")
End Sub
#End Region
#End Region
End Class

Friend Class FOFiRTest
Friend Function Test() As Boolean
Dim i As Integer
Dim ActualText As String = File.ReadAllText("c:\ForwardOnlyFileReaderTest.txt")
' Create a file with notepad.. 'ActualText = "abcdefghijklmnopqrstuvwxyz0123456789"
Dim encoding As System.Text.Encoding = System.Text.Encoding.ASCII

For i = 1 To 64
Dim s As String = ""
Dim TestFileStream As New FileStream("c:\ForwardOnlyFileReaderTest.txt", FileMode.Open)
Dim FOFir As New ForwardOnlyFileReader(TestFileStream, i)
Dim Bytes(5) As Byte ' TODO - FIX BUG Put another for loop inside For i, and vary the buffer size. 3 works, 5 fails.
Do While FOFir.Read(Bytes, 0, 3) > 0 ' 3 bytes
s = s & encoding.GetString(Bytes)
Loop
If s.Length <> ActualText.Length Then
If s <> ActualText Then Return False
End If
Next

Return True
End Function
End Class
 
This is 6x slower than a memory stream. VS 2008 Performance Explorer shows a
lot of time being spent calling BufferChunk.Read. Setting up a stack frame and
passing all the parameters around adds up, I think. The compiler should inline this,
but it looks like I will have to do that myself.

Works when passed to a binary reader, and has a minimal unit test,
that intentionally uses tiny buffers to allow testing of reads that
span buffers..

Typical usage would be with a 64k buffer. Arrays of roughly 85,000 bytes or more
are allocated on the Large Object Heap, so above about 80k that may become a problem...
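
To make the chunk-size point concrete, here is a minimal sketch of a helper that keeps the chunk size under the Large Object Heap limit. The module name, the function name and the 80k cap are assumptions for illustration (the cap leaves headroom for the array header under the 85,000-byte threshold); none of this is part of the class itself.

```vbnet
Imports System

Module ChunkSizing
    ' Objects of about 85,000 bytes or more go on the Large Object Heap.
    Public Const LargeObjectThreshold As Integer = 85000
    ' Assumed cap: 80k leaves headroom for the array header.
    Public Const MaxChunkSize As Integer = 80 * 1024
    ' 64k default, as suggested in the post.
    Public Const DefaultChunkSize As Integer = 64 * 1024

    ' Falls back to the default for non-positive requests,
    ' otherwise caps the requested size at MaxChunkSize.
    Public Function ClampChunkSize(ByVal requested As Integer) As Integer
        If requested < 1 Then Return DefaultChunkSize
        Return Math.Min(requested, MaxChunkSize)
    End Function
End Module
```

A caller would pass ClampChunkSize(whatever) as the ChunkSize argument instead of a raw number.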

Stay tuned for version 1.1

============================================================
Imports System.IO
Imports System.Collections.Generic

Friend Class ForwardOnlyFileReader : Inherits Stream
Friend Class BufferChunk
Private mChunk() As Byte
Private mStartingPosition As Long

Friend ReadOnly Property StartingPosition() As Long
Get
Return mStartingPosition
End Get
End Property

Friend ReadOnly Property LastPosition() As Long
Get
Return mStartingPosition + mChunk.Length
End Get
End Property

Friend Sub New(ByVal StartingPosition As Long, ByVal Chunk() As Byte)
If Chunk Is Nothing OrElse Chunk.Length = 0 Then Throw New System.ArgumentException("BufferChunk.New - Chunk is null or empty.")

mStartingPosition = StartingPosition
mChunk = Chunk
End Sub

Friend Function Read(ByVal ChunkOffset As Integer, ByVal Buffer() As Byte, ByVal BufferOffset As Integer, ByVal DesiredBytes As Integer) As Integer
DesiredBytes = Math.Min(DesiredBytes, mChunk.Count - ChunkOffset) ' take what we can get..
Array.ConstrainedCopy(mChunk, ChunkOffset, Buffer, BufferOffset, DesiredBytes)

Return DesiredBytes
End Function
End Class

Private mBufferChunks As New List(Of BufferChunk)
Private mChunkSize As Integer
Private mLength As Long
Private mPosition As Long

Public Sub New(ByVal FileStream As FileStream, ByVal ChunkSize As Integer)
If FileStream Is Nothing Then Throw New System.ArgumentNullException("FileStream is null.")
If Not FileStream.CanRead Then Throw New System.NotSupportedException("The stream does not support reading.")

mChunkSize = ChunkSize
mLength = FileStream.Length
Dim aBufferChunk As BufferChunk
Do Until FileStream.Position = mLength
Dim PositionBeforeRead As Long = FileStream.Position
Dim Buffer(CInt(Math.Min(ChunkSize - 1, mLength - FileStream.Position - 1))) As Byte
Dim Bytes As Integer = FileStream.Read(Buffer, 0, Buffer.Count)
aBufferChunk = New BufferChunk(PositionBeforeRead, Buffer)
mBufferChunks.Add(aBufferChunk)
Loop

FileStream.Close()
FileStream.Dispose()
End Sub
'TODO another constructor with AutoResetEvent
Public Overrides Function Read(ByVal Buffer() As Byte, ByVal BufferOffset As Integer, ByVal Count As Integer) As Integer
Dim BytesRead As Integer
Dim BytesRemaining As Integer = Count
Dim ChunkOffset As Integer

' Check for a null buffer first; an empty buffer is legal when Count is 0.
If Buffer Is Nothing Then Throw New System.ArgumentNullException("Buffer", "ForwardOnlyFileReader.Read - Buffer is null.")
If BufferOffset < 0 Then Throw New System.ArgumentOutOfRangeException("BufferOffset", "ForwardOnlyFileReader.Read - BufferOffset is < 0.")
If Count < 0 Then Throw New System.ArgumentOutOfRangeException("Count", "ForwardOnlyFileReader.Read - Count is < 0.")
If Count > Buffer.Length - BufferOffset Then Throw New System.ArgumentException("ForwardOnlyFileReader.Read - The sum of offset and count is larger than the buffer length.")

Do
RemoveBuffers()
If mBufferChunks.Count = 0 Then Exit Do ' no chunks left: end of file

ChunkOffset = CInt(mPosition - mBufferChunks(0).StartingPosition)
BytesRead = mBufferChunks(0).Read(ChunkOffset, Buffer, BufferOffset, BytesRemaining)
BufferOffset = BufferOffset + BytesRead
BytesRemaining = BytesRemaining - BytesRead
mPosition = mPosition + BytesRead
Loop Until (BytesRemaining = 0) Or (mPosition = mLength) ' filled the buffer, or eof

' Total bytes copied by this call. Returning BufferOffset is only right when the caller's offset is 0.
Return Count - BytesRemaining
End Function

Private Sub RemoveBuffers()
Do While mBufferChunks.Count > 0
If mPosition >= mBufferChunks.Item(0).LastPosition Then
mBufferChunks.RemoveAt(0)
Else
Exit Do ' we have the right chunk.
End If
Loop
End Sub

'Private Function RemainingBufferedBytes() As Long ' Unneeded ?!?
' If mBufferChunks.Count = 0 Then Return 0

' Return mBufferChunks(mBufferChunks.Count - 1).LastPosition - mPosition
'End Function
#Region "Properties"
Public Overrides ReadOnly Property Length() As Long
Get
Return mLength
End Get
End Property

Public Overrides Property Position() As Long
Get
Return mPosition
End Get
Set(ByVal value As Long)
mPosition = value ' BufferChunks will be lost..
End Set
End Property

#Region "Unsupported Properties"
Public Overrides ReadOnly Property CanRead() As Boolean
Get
Return True
End Get
End Property

Public Overrides ReadOnly Property CanSeek() As Boolean
Get
Return False
End Get
End Property

Public Overrides ReadOnly Property CanWrite() As Boolean
Get
Return False
End Get
End Property

Public Overrides Sub Flush()
Throw New NotSupportedException("ForwardOnlyFileReader does not support Flush.")
End Sub

Public Overrides Function Seek(ByVal offset As Long, ByVal origin As System.IO.SeekOrigin) As Long
Throw New System.NotSupportedException("ForwardOnlyFileReader does not support Seek.")
End Function

Public Overrides Sub SetLength(ByVal value As Long)
Throw New NotSupportedException("ForwardOnlyFileReader does not support SetLength.")
End Sub

Public Overrides Sub Write(ByVal buffer() As Byte, ByVal offset As Integer, ByVal count As Integer)
Throw New NotSupportedException("ForwardOnlyFileReader does not support Write.")
End Sub
#End Region
#End Region
End Class

Friend Class FOFiRTest
Friend Function Test() As Boolean
Dim ActualText As String = File.ReadAllText("c:\ForwardOnlyFileReaderTest.txt")
' Create a file with notepad.. 'ActualText = "abcdefghijklmnopqrstuvwxyz0123456789"
Dim encoding As System.Text.Encoding = System.Text.Encoding.ASCII

For ChunkSize As Integer = 1 To 64
For ReadSize As Integer = 1 To 32
Dim s As String = ""
Dim TestFileStream As New FileStream("c:\ForwardOnlyFileReaderTest.txt", FileMode.Open)
Dim FOFir As New ForwardOnlyFileReader(TestFileStream, ChunkSize)
Dim Bytes(ReadSize - 1) As Byte
Dim BytesRead As Integer

Do While FOFir.Position < FOFir.Length
BytesRead = FOFir.Read(Bytes, 0, ReadSize)
s = s & encoding.GetString(Bytes, 0, BytesRead)
Loop

If s <> ActualText Then ' compare the content, not only the length
Debug.WriteLine("s = " & s)
Debug.WriteLine("Actual = " & ActualText)
Debug.WriteLine("Test failed with ChunkSize of " & ChunkSize & " and ReadSize of " & ReadSize)
Return False
End If
Next
Next

Return True
End Function
End Class
 
Robert said:
Public Sub New(ByVal FileStream As FileStream, ByVal ChunkSize As Integer)

FileStream.Close()
FileStream.Dispose()
End Sub


I would recommend not closing, nor disposing, of the stream inside your
constructor. It has nothing to do with performance, but there are reasons
that the caller may expect to be able to call these methods, especially
since you are taking an open stream in this constructor.
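
One way to act on this is to let the caller decide who owns the stream. The sketch below is only an illustration under assumptions: the class name ChunkedSource and the leaveOpen parameter are hypothetical, and it copies the data into a single array rather than into chunks to keep the example short.

```vbnet
Imports System
Imports System.IO

' Hypothetical wrapper: it reads what it needs in the constructor and
' closes the source stream only if the caller hands over ownership.
Friend Class ChunkedSource
    Private ReadOnly mData As Byte()

    Public Sub New(ByVal source As Stream, ByVal leaveOpen As Boolean)
        If source Is Nothing Then Throw New ArgumentNullException("source")
        If Not source.CanRead Then Throw New NotSupportedException("The stream does not support reading.")

        Using copy As New MemoryStream()
            Dim block(4095) As Byte
            Dim n As Integer = source.Read(block, 0, block.Length)
            Do While n > 0
                copy.Write(block, 0, n)
                n = source.Read(block, 0, block.Length)
            Loop
            mData = copy.ToArray()
        End Using

        ' The caller keeps ownership unless it explicitly gives it up.
        If Not leaveOpen Then source.Dispose()
    End Sub

    Public ReadOnly Property Length() As Integer
        Get
            Return mData.Length
        End Get
    End Property
End Class
```

With leaveOpen set to True, the caller can still use the stream afterwards and close it in its own Using block.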
 
What about big files? It consumes a lot of memory.
You could also change the constructor to take the more generic type, which
is Stream.

Best Regards
Kerem Küsmezer
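
A rough sketch of what a Stream-based loading loop could look like follows. The module and function names are hypothetical. Unlike the FileStream version, it never asks for Length and it allows for Read returning fewer bytes than requested, which a general Stream may do. It still loads everything up front, so the memory concern above is unchanged.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

Module StreamChunking
    ' Splits any readable Stream into chunks of at most chunkSize bytes.
    ' Each chunk is filled in a loop because Stream.Read may return short.
    Public Function ReadChunks(ByVal source As Stream, ByVal chunkSize As Integer) As List(Of Byte())
        If source Is Nothing Then Throw New ArgumentNullException("source")
        If chunkSize < 1 Then Throw New ArgumentOutOfRangeException("chunkSize")

        Dim chunks As New List(Of Byte())
        Do
            Dim chunk(chunkSize - 1) As Byte
            Dim filled As Integer = 0
            Do While filled < chunkSize
                Dim n As Integer = source.Read(chunk, filled, chunkSize - filled)
                If n = 0 Then Exit Do ' end of stream
                filled += n
            Loop

            If filled = 0 Then Exit Do ' nothing left to add
            If filled < chunkSize Then Array.Resize(chunk, filled) ' trim the final chunk
            chunks.Add(chunk)
            If filled < chunkSize Then Exit Do ' a short chunk means end of stream
        Loop
        Return chunks
    End Function
End Module
```

The starting position of each chunk can be rebuilt by summing the lengths of the chunks before it.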
 
Yes, the first hit uses a file-size amount of memory, but as the file is read, this is
released gradually, as you move into new chunks of the file.

It is not useful in all cases, only when constrained by memory.

I am not entirely happy with it, as it is 2x slower than FileStream and
MemoryStream. But when those use too much memory it helps a bit.

I have a wrapper that looks at the number of threads, and therefore the number of readers,
and when the usual .NET classes start using too much memory, it uses this stream type.
So maybe 10% of the files use this class. Small files just buffer everything into
a MemoryStream; the rest use this..
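
The wrapper itself is not shown in the thread, so the following is only a guess at its shape. The rule used here (split a memory budget evenly between active readers and buffer a file whole only if it fits in its share) and all of the names are assumptions for illustration.

```vbnet
Imports System

Module ReaderSelection
    Public Enum ReaderKind
        BufferedInMemory   ' read the whole file into a MemoryStream
        ForwardOnlyChunks  ' use the chunk-releasing reader from this thread
    End Enum

    ' Hypothetical rule: each active reader gets an equal share of the budget,
    ' and a file is buffered whole only if it fits within that share.
    Public Function ChooseReader(ByVal fileLength As Long, ByVal memoryBudget As Long, ByVal activeReaders As Integer) As ReaderKind
        Dim share As Long = memoryBudget \ Math.Max(1, activeReaders)
        If fileLength <= share Then Return ReaderKind.BufferedInMemory
        Return ReaderKind.ForwardOnlyChunks
    End Function
End Module
```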

Still a win: small files run fast, as before; big files run slower, but at least they run
instead of just waiting for memory... Something like x + (0.1 * (1-x))

Gives about a 10% increase in throughput. Do NOT use in all cases.

So, big reads for disk performance. Low memory, use this.
But still do big reads, just release the memory more often so that
avg memory used is filesize/2. More threads can then run..

My data is somewhat bursty. Some big files, some small.
All works well unless you get 5-10 big files in a row.
Then it runs out of memory.

Still looking at other ideas, but so far, this combination yields the fastest
run times on MY 20 gig data set, with 2-4 threads. Other data sets may
need other techniques... Mine is all sequential, read 20 gigs, write 10 megs.
Read speed high with as many working threads as possible is the goal.
 