Problem with embedded carriage returns

  • Thread starter Thread starter Harry
  • Start date Start date
Harry said:
Hi All

I have a .csv file that contains newline chars embedded between quotes in a
line of text eg

BSPADV1,John.public,9413,"Sharrock Ashley
TEST STREET 1
TEST NSW 2200",Address Insufficient,,,Mbase Print
Report,R7TDKPFMDBCLKE07CGJMFKKW6VVB/21,Sharrock Ashley

There are actually some 19 columns of data but when a StreamReader.ReadLine
method tries to read line by line, it only returns data up to the first
embedded newline chars. Interestingly, the data does display correctly in
Notepad, so I guess Notepad must ignore newline chars inside quotation
marks.

Is there anyway to read the above line and get the full line of data? My
only thought so far is to use the ReadToEnd method and then try and remove
the newline chars between quotes programatically before Spltting on the
"real" newline chars.

Thanks for any help

Harry, have you tried the builtin FileIO.TextFieldParser class?

It handles embedded eol characters, including embedded eol or commas
within quoted fields broken into two lines. Its handled for you. <g>
plus I presumed that it is cached as it reads, and Windows RTL already
caches text files, so it should be pretty efficient.

Here is an example console application with a ReaderCVS class wrapper
and example usage:

'--------------------------------------------------------------
' File : readcsf.vb
'--------------------------------------------------------------
Option Strict on
Option Explicit On

imports system
imports system.console
imports system.collections.generic

Module module1

public Class ReaderCVS

public shared data as new List(Of List(Of String))

'
' Read cvs file with max_fields, optional eolfilter
'
public shared function ReadCSF( _
byval fn as string, _
byval max_fields as integer, _
optional byval eolfilter as boolean = true) as boolean
try
data.clear
Dim rdr As FileIO.TextFieldParser
rdr = My.Computer.FileSystem.OpenTextFieldParser(fn)
rdr.SetDelimiters(",")

dim flds as new List(Of String)
While Not rdr.EndOfData()
dim lines as string() = rdr.ReadFields()
For Each fld as string In lines
if eolfilter then
'fld = fld.Replace(vbCr,"\r").Replace(vbLf,"\n")
fld = fld.Replace(vbCr," ").Replace(vbLf,"")
end if
flds.Add(fld)
if flds.count = max_fields then
data.Add(flds)
flds = new List(Of String)
end if
Next
End While
if flds.count > 0 then data.Add(flds)
return true

catch ex as exception
writeline(ex.message)
return false
end try
end function

sub Dump()
writeline("------- DUMP ")
dim nrow as integer = 0
for each flds as List(Of String) in data
nrow += 1
dim nfld as integer = 0
for each f as string in flds
nfld += 1
writeline("- r{0,-3} f{1,-3} => [{2}]",nrow, nfld, f)
next
next
end sub

end class

sub main(ByVal args() As String)
dim csf as new ReaderCvs
csf.readcsf("harry1.csv", 19) ' 19 fields
csf.dump()
readkey(true)
end sub

end Module

I used your example text as the harry1.csv test file.

I provided a eolfilter boolean to filter the <CR><LF> and played with
\r\n and space/blank replacements.

--
 
Back
Top