How to read the LAST NON-BLANK line from a text file?


Tony Bansten

How can I read the last non-blank (=non whitespace) line from a text file?

OK, the most primitive way would be to start reading from the beginning and then loop
through all the lines until the end.

But that is rather inconvenient and cumbersome.

Is there a smarter way?

Tony
 
Hi Tony,

What I would probably do is loop from the end of the text file towards the
beginning until I reach a character that is not whitespace.

That is probably the most efficient way; I don't think any built-in method
does this behind the scenes.

Cor
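
A minimal sketch of the backward scan described above, not code from any poster in this thread. The names LastLineSketch, ReadLastNonBlankLine and ChunkSize are made up for illustration. It assumes the file is ASCII or UTF-8, that lines end in LF or CRLF, and that "blank" means only spaces, tabs, CR and LF. Trailing whitespace on the returned line is dropped.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

' Hypothetical helper for illustration only.
Public Module LastLineSketch

    ' Returns the last line that contains a non-whitespace byte,
    ' or an empty string if the file has no such line.
    Public Function ReadLastNonBlankLine(ByVal filePath As String) As String
        Const ChunkSize As Integer = 4096
        Dim reversed As New List(Of Byte)()   ' bytes of the line, last byte first
        Dim foundText As Boolean = False

        Using fs As New FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read)
            Dim buffer(ChunkSize - 1) As Byte
            Dim position As Long = fs.Length

            ' Walk from the end of the file towards the beginning, one chunk at a time.
            While position > 0
                Dim count As Integer = CInt(Math.Min(CLng(ChunkSize), position))
                position -= count
                fs.Seek(position, SeekOrigin.Begin)

                Dim filled As Integer = 0
                While filled < count
                    Dim got As Integer = fs.Read(buffer, filled, count - filled)
                    If got = 0 Then Exit While
                    filled += got
                End While

                For i As Integer = filled - 1 To 0 Step -1
                    Dim b As Byte = buffer(i)
                    If Not foundText Then
                        ' Skip trailing blanks: space, tab, CR, LF.
                        If b = 32 OrElse b = 9 OrElse b = 13 OrElse b = 10 Then Continue For
                        foundText = True
                    ElseIf b = 10 Then
                        ' Reached the line feed that ends the previous line.
                        Return Decode(reversed)
                    End If
                    reversed.Add(b)
                Next
            End While
        End Using

        Return Decode(reversed)
    End Function

    Private Function Decode(ByVal reversed As List(Of Byte)) As String
        reversed.Reverse()
        ' Assumption: UTF-8 text. Drop a byte order mark if the line is the first one.
        Return Encoding.UTF8.GetString(reversed.ToArray()).TrimStart(ChrW(&HFEFF))
    End Function

End Module
```

Only the tail of the file is read, so the cost depends on the length of the last line and the trailing blanks, not on the size of the file. Usage would be something like Dim last As String = LastLineSketch.ReadLastNonBlankLine("C:\logs\big.txt"), where the path is just an example.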
 
The most primitive way, which only coders in a luxury position can afford, is
to read the whole file, split it into lines and then loop through them in
reverse order. However, once you have processed files of gigabytes in size,
you understand that this isn't going to work in all situations: your app will
blow up (out of memory) or it will become very slow.

The following piece of code solves this problem, as it only reads a piece of
the end of the file. It doesn't loop through a large file; it reads a chunk at
the end, splits it into a few lines, determines the last one and returns it.

'---------------------------------------------------------------------------------------
'-- Michel Posseth [MCP]
'-- Created on : 15-03-2009
'-- http:\\www.vbdotnetcoder.com
'-- (e-mail address removed)
'---------------------------------------------------------------------------------------
Option Compare Binary
Option Explicit On
Option Strict On
Option Infer On
Imports System.IO
Imports System.Text

Public Class ClsReadTextFileReversed
    Implements IDisposable

    Private _FileToReadReverse As String
    Public Property FileToReadReverse() As String
        Get
            If Not My.Computer.FileSystem.FileExists(_FileToReadReverse) Then
                Throw New ArgumentException("Property doesn't contain file path")
            End If
            Return _FileToReadReverse
        End Get
        Private Set(ByVal value As String)
            If Not My.Computer.FileSystem.FileExists(value) Then
                Throw New ArgumentException("File does not exist")
            End If
            _FileToReadReverse = value
        End Set
    End Property

    Public Sub New(ByVal FullFilePath As String)
        Me.FileToReadReverse = FullFilePath
    End Sub

    Private _Sr As StreamReader
    Public Property Sr() As StreamReader
        Get
            Return _Sr
        End Get
        Private Set(ByVal value As StreamReader)
            _Sr = value
        End Set
    End Property

    Private _Fs As FileStream
    Public Property Fs() As FileStream
        Get
            Return _Fs
        End Get
        Private Set(ByVal value As FileStream)
            _Fs = value
        End Set
    End Property

    Private _filesize As Long
    Public Property Filesize() As Long
        Get
            Return _filesize
        End Get
        Private Set(ByVal value As Long)
            _filesize = value
        End Set
    End Property

    ' Opens the file once; errors are reported through the eException event.
    Private Function Init() As Boolean
        Dim ret As Boolean
        If Not Initialized Then
            Try
                Fs = New FileStream(FileToReadReverse, FileMode.Open, FileAccess.Read, FileShare.Read)
                Sr = New StreamReader(Fs, True)
                Filesize = Sr.BaseStream.Length
                Initialized = True
                ret = True
            Catch ex As Exception
                RaiseEvent eException(ex)
                ret = False
            End Try
        Else
            ret = True
        End If
        Return ret
    End Function

    Public Event eException(ByVal ex As Exception)

    Private _Initialized As Boolean
    Public Property Initialized() As Boolean
        Get
            Return _Initialized
        End Get
        Private Set(ByVal value As Boolean)
            _Initialized = value
        End Set
    End Property

    Private _newline As String = Environment.NewLine
    Public Property Newline() As String
        Get
            Return _newline
        End Get
        Set(ByVal value As String)
            _newline = value
        End Set
    End Property

    Private _SplitOption As StringSplitOptions = StringSplitOptions.RemoveEmptyEntries
    Public Property SplitOption() As StringSplitOptions
        Get
            Return _SplitOption
        End Get
        Set(ByVal value As StringSplitOptions)
            _SplitOption = value
        End Set
    End Property

    ' Returns the last non-blank line found in the final chunk of the file,
    ' or an empty string if the chunk holds no such line.
    Public Function ReadLastLine() As String
        Dim ret As String = String.Empty
        If Init() Then
            ' Read at most the last 1024 bytes of the file.
            Dim buffersize As Long = Math.Min(Filesize, 1024L)
            Sr.BaseStream.Seek(-buffersize, SeekOrigin.End)
            ' The reader must forget anything it buffered before the seek.
            Sr.DiscardBufferedData()

            Dim text As String = Sr.ReadToEnd()
            Dim lines As String() = text.Split(New String() {Newline}, SplitOption)

            ' Walk backwards to the last line that is not only whitespace.
            ' Note: if that line is longer than the chunk, only its tail is returned.
            For i As Integer = lines.Length - 1 To 0 Step -1
                If lines(i).Trim().Length > 0 Then
                    ' Drop a stray line feed left over when the chunk starts inside a CR/LF pair.
                    ret = lines(i).TrimStart(ControlChars.Lf)
                    Exit For
                End If
            Next
            ' The reader stays open so the method can be called again;
            ' call Close or Dispose when done.
        End If
        Return ret
    End Function

    Public Sub Close()
        If Sr IsNot Nothing Then
            Sr.Close()
            Sr.Dispose()
            Sr = Nothing
        End If
        If Fs IsNot Nothing Then
            Fs.Close()
            Fs.Dispose()
            Fs = Nothing
        End If
        Initialized = False
    End Sub

#Region " IDisposable Support "
    Private disposedValue As Boolean = False ' To detect redundant calls

    ' IDisposable
    Protected Overridable Sub Dispose(ByVal disposing As Boolean)
        If Not Me.disposedValue Then
            If disposing Then
                ' Release the reader and the underlying file stream.
                Close()
            End If
        End If
        Me.disposedValue = True
    End Sub

    ' This code added by Visual Basic to correctly implement the disposable pattern.
    Public Sub Dispose() Implements IDisposable.Dispose
        ' Do not change this code. Put cleanup code in Dispose(ByVal disposing As Boolean) above.
        Dispose(True)
        GC.SuppressFinalize(Me)
    End Sub
#End Region
End Class
 
Michel,

Correct. But are you sure this is about gigabytes? For a string, that is
probably more than any company can produce.

(As soon as it becomes an array of whatever, the sentence above no longer
holds.)

So I am interested in which company is able to do that; for sure not a
simple Dutch energy company or something like that.

Do you have a job now at City Bank?

:-)

Cor
 
Hello Cor,

In my previous job I worked with flat files of more than 12 GB in size. This
was for automotive catalogue software used throughout Europe. An average
Tecdoc source data update is approximately 4 to 6 DVDs of data and pictures
(http://www.tecdoc.de). I worked for Newco Nohau, who used the Tecdoc data and
combined it with their own linked data.

That is why I laughed so hard when I started my job at my current employer
and they said I had probably never seen such huge data sets :-). Remember
that cataloguing is all about linking data. As you might know, in the
Netherlands alone we have 8 million license plate numbers, and all of them
refer to hundreds of parts in our master. And we generated data for all
western and eastern European countries.

How big do you estimate the parts catalogue of VAG (ETKA) is? Well, multiply
that by every make there is and you have an estimate of how big the data of
my past employer would be in an ideal situation (it is much smaller, I can
assure you, for multiple reasons I will not bother you with).

In my current job the files are in fact not that big, but we can reach
gigabytes easily :-)

> for sure not a simple Dutch energy company or something like that.

No, we are not Dutch. The head office is in Germany, but we were founded in
1902 in Denmark, and in fact we are not just an energy company: we are the
global market leader in the consumption-dependent billing of energy, water
and ancillary costs. The services offered to property managers, house owners
and energy utilities range from the supply and installation of meters to
metering and consumption-dependent billing. With its products and services,
we make an important contribution to the responsible use of water, heat and
energy. You can find our company in China, in the States and in the United
Arab Emirates; in fact in 26 countries worldwide, and we are still growing
even now, in a time of recession :-).

> Do you have a job now at City Bank?

Our average heat cost allocator user has about 6 of our meters in his home.
In the Netherlands we meter 350,000 homes, and our radio-controlled meters
send their values on a daily basis. That means 2,100,000 transaction records
per day for just the Netherlands, for the heat cost allocators alone (and we
have lots more services). I wonder what the transaction rate of the Citibank
Netherlands branch would be :-).

That is why I stated in my answer that a developer in a luxury position
(IMHO, a developer who only uses small amounts of data and deals with
common-practice situations) can just split the file and read it in reverse.
However, I often need to pioneer in my job, as I deal a lot with uncommon
situations :-(.

Regards

Michel