I need help in reading a “messy†CSV file

  • Thread starter Thread starter Brad
  • Start date Start date
B

Brad

I am trying to read a CSV file that contains some messy data.

Here is an example.

John Olsen,London,A1000
Lars Larson,Oslo,B2000
Sean Penn
,Toronto,A1000
Sally Smith,Paris,B3000

Note how the third record should contain a comma after the word Penn, but
does not. Instead the word Penn is followed with a number of spaces and then
the first character in the next line is a comma.

This file is coming from an outside source, so I cannot control how it is
created. I would like to use VBA to read in this file and create a new file
with the records looking like this.

John Olsen,London,A1000
Lars Larson,Oslo,B2000
Sean Penn,Toronto,A1000
Sally Smith,Bonn,B3000

I could then import this “clean†file into Access.

Does anyone have an example on how to accomplish this?

Thanks in advance for your help.

Brad
 
hi Brad,
I am trying to read a CSV file that contains some messy data.
Note how the third record should contain a comma after the word Penn, but
does not. Instead the word Penn is followed with a number of spaces and then
the first character in the next line is a comma.
A line feed (vbLf, &HA) is a valid character in a CSV file, when

a) the text is enclosed in qutoation marks

or

b) you are using a carriage return and a line feed (vbCrLf, &HD &HA) as
a row delimiter.
I could then import this “clean†file into Access.
Import the file using an import specification should work. Use Trim()
and Replace() to clean up the content.


mfG
--> stefan <--
 
Stefan,

Thanks for your thoughts on this.

The messy CSV file comes from an outside source. It does not have quotation
marks around the text.

I am not sure how I can tell what is being used for a line delimiter. I can
view the CSV file with Notepad. I tried to view the file in Hex with another
editor, but no luck.

Brad
 
Using the data you provided, I modified the code I use to import data from
CSV files to my database. There is very little error checking. You will need
to save the GetOpenFileName API to a standard module. (Do NOT name the
module the same as the the code) See:

http://www.mvps.org/access/api/api0001.htm

'----------------------------------------------------
Option Compare Database
Option Explicit

Private Sub Command0_Click() 'my button name
On Error GoTo Err_Handler
Const SearchChar = ","

'-------------------
' declare variables
'-------------------
Dim linesRead As Long
Dim linesWritten As Long

Dim inpStr As String
Dim tmp As String
Dim msgString As String

Dim pos_slash As Integer
Dim pos_dot As Integer
Dim CommaCount As Integer
Dim i As Integer

'RF= Read file
'WF = Write file
Dim RF As Integer
Dim WF As Integer
Dim FileToOpen As String
Dim FileToWrite As String

Dim strFilter As String

'-----------------------------------------------------
'Open the OpenFile Dialog box to get the .csv filename
'-----------------------------------------------------
strFilter = ahtAddFilterItem(strFilter, "CSV Text Files (*.csv)", "*.CSV")
strFilter = ahtAddFilterItem(strFilter, "Excel Files (*.xls)", "*.XLS")
strFilter = ahtAddFilterItem(strFilter, "All Files (*.*)", "*.*")
FileToOpen = ahtCommonFileOpenSave( _
Filter:=strFilter, OpenFile:=True, _
DialogTitle:="Please select an input file...", _
Flags:=ahtOFN_HIDEREADONLY)

If FileToOpen = "" Then
Exit Sub
End If

' Open file for input.
RF = FreeFile
Open FileToOpen For Input As #RF

If (Err.Number <> 0) Then
MsgBox "Unable to open file. Aborting!!!"
Close #RF
Exit Sub
End If

'check to be sure file is not RTF
Line Input #RF, inpStr
If (Left(inpStr, 1) = "{") Then
MsgBox "This is a Rich Text Format file, but I need a text only file
to" & _
vbCrLf & "work properly. Please reopen this in word and save
it" & _
vbCrLf & "as text only. You might also have to enclose the
name in" & _
vbCrLf & "double quotes to get it to save without the .txt
extension."
Close #1
Exit Sub
End If

'close file and reopen
Close #RF
inpStr = ""

RF = FreeFile
Open FileToOpen For Input As #RF


linesRead = 0
linesWritten = 0

'create the output file
WF = FreeFile
pos_slash = InStrRev(FileToOpen, "\")
pos_dot = InStrRev(FileToOpen, ".")
FileToWrite = Left(FileToOpen, pos_slash)
'create name
FileToWrite = FileToWrite & Mid(FileToOpen, pos_slash + 1, (pos_dot - 1)
- pos_slash) & "_Cleaned.csv"

Open FileToWrite For Output As #WF

'-----------------------
'start of routine
'-----------------------
Do While Not EOF(RF)
' clear variables
inpStr = ""
tmp = ""
CommaCount = 0

Line Input #1, inpStr
linesRead = linesRead + 1

'count number of commas in inputline
For i = 1 To Len(inpStr)
If Mid(inpStr, i, 1) = SearchChar Then
CommaCount = CommaCount + 1
End If
Next

If CommaCount <> 2 Then
'save the current line
tmp = inpStr

'read next line
Line Input #1, inpStr
linesRead = linesRead + 1
'join lines
inpStr = tmp & inpStr
End If

'write line to file
Print #WF, inpStr
linesWritten = linesWritten + 1

Loop

Close #RF
Close #WF

msgString = "Done!! " & vbCrLf
msgString = msgString & linesRead & " lines read"
msgString = msgString & vbCrLf & linesWritten & " lines written"

MsgBox msgString

Exit_Err_Handler:
Exit Sub

Err_Handler:
On Error Resume Next
Close #RF
Close #WF
MsgBox Err.Description
Resume Exit_Err_Handler

End Sub
'----------------------------------------------------

HTH
 
Hit the wrong button....

In the lines

If CommaCount <> 2 Then
'save the current line
tmp = inpStr


change

tmp = inpStr

to

tmp = Trim(inpStr)
 
Back
Top