Find/Replace in csv file - using VB

  • Thread starter Thread starter RJF
  • Start date Start date
R

RJF

I’ve been trying to figure this out for over a week and now I am desperate
for help! I don’t know VB very well.

I am importing a .csv file into Access. I have that working fine except
when there are carriage returns within the body of the text. Before I import
the file, I need Access to do a find and replace within the text file to
eliminate the carriage returns. BUT, I need the carriage returns at the end
of each line in the file to stay there.

Here’s a sample of the text in the .csv file.

This:

"1315841","milli
meters","1â€,"1315841","Default"
"1316985","False","

200","False","Default"

Should be this:

"1315841","millimeters","1","1315841","Default"
"1316985","False","200","False",â€Default"

This is what I’ve got to start off with:

strfilter = ahtAddFilterItem(strfilter, _
"All Files (*.*)", _
"*.*")
strinputfilename = ahtCommonFileOpenSave( _
Filter:=strfilter, OpenFile:=True, _
DialogTitle:="Please select an input file...", _
Flags:=ahtOFN_HIDEREADONLY)

Open strinputfilename For Random As #1

I know the part to select the file works because I use that to import the
file into Access. I got the Open statement from this forum but got really
confused after that.

Any help would be so greatly appreciated.

Thanks,
 
Actually, I wouldn't suggest opening this as a Random file at all.

If the file is relatively small (under about 100k, let's say), I would just
open the file, convert the whole thing then write it back out; if the file
is large, I would suggest opening two files, one incoming, one outgoing,
then deleting the original and copying the new one over it once you're done.
I'll go with the one-file version for now, since it's a bit easier.

The method I'm using below is probably not THE most optimized method, but it
should be fairly quick, and it's very light in terms of the actual coding
required.

Public Sub FixCSV(ByVal strFileName As String)
Dim strLine As String
Dim strLineBreak As String
Dim strWholeFile As String
Dim varLines As Variant

Dim i As Long
Dim intFileNum As Integer


'Always use FreeFile() instead of a fixed file number.
intFileNum = FreeFile()

'Get entire file in one shot.
Open strFileName For Binary Access Read As #intFileNum
strWholeFile = String$(LOF(intFileNum), 0)
Get #intFileNum, , strWholeFile
Close intFileNum

'Some CSV's use LF only, others use CR/LF; figure out which.
If InStr(1, strWholeFile, vbCrLf, vbBinaryCompare) > 0 Then
strLineBreak = vbCrLf
Else
strLineBreak = vbLf
End If

'Split it into lines.
varLines = Split(strWholeFile, strLineBreak)

'For each physical line, count number of quotes and figure out if it's
inside a field or not.
strLine = vbNullString
For i = LBound(varLines) To UBound(varLines)
strLine = strLine & varLines(i)
If UBound(Split(strLine, Chr$(34))) Mod 2 = 0 Then
strWholeFile = strWholeFile & strLine & strLineBreak
strLine = vbNullString
End If
Next

'Add line (whether partial or complete) to file.
strWholeFile = Join(varLines, vbNullString)

Open strFileName For Binary Access Write As #intFileNum
Put #intFileNum, , strWholeFile
Close intFileNum
End Sub



Rob
 
Woops...I really buggered up that code. Please ignore my previous code
sample (which may appear to work if you don't look too closely, but it
really really doesn't!) and use this one instead:

Public Sub FixCSV(ByVal strFileName As String)
Dim strLine As String
Dim strLineBreak As String
Dim strWholeFile As String
Dim varLines As Variant

Dim i As Long
Dim intFileNum As Integer


'Always use FreeFile() instead of a fixed file number.
intFileNum = FreeFile()

'Get entire file in one shot.
Open strFileName For Binary Access Read As #intFileNum
strWholeFile = String$(LOF(intFileNum), 0)
Get #intFileNum, , strWholeFile
Close intFileNum

'Some CSV's use LF only, others use CR/LF; figure out which.
If InStr(1, strWholeFile, vbCrLf, vbBinaryCompare) > 0 Then
strLineBreak = vbCrLf
Else
strLineBreak = vbLf
End If

'Split it into lines.
varLines = Split(strWholeFile, strLineBreak)

'For each physical line, count number of quotes and figure out if it's
inside a field or not.
strWholeFile = vbNullString
strLine = vbNullString
For i = LBound(varLines) To UBound(varLines)
strLine = strLine & varLines(i)
If UBound(Split(strLine, Chr$(34))) Mod 2 = 0 Then
strWholeFile = strWholeFile & strLine & strLineBreak
strLine = vbNullString
End If
Next

Open strFileName For Binary Access Write As #intFileNum
Put #intFileNum, , strWholeFile
Close intFileNum
End Sub
 
Note that this code *could* chop off the last line if it ends with a break
inside a field, but I'm sort of assuming that won't happen. If you want to
be sure, then after the For...Next loop, add the following line:

If strLine <> vbNullString Then strWholeFile = strWholeFile & strLine


Rob
 
Hi Robert,

Thank you so much for replying so quickly. I'm going out of my mind.

Where would I put the code you gave me? Before I was putting the code I was
trying to use in the "On Click" event of a button I added to one of my forms.

I feel like an idiot. Please be patient with me.
 
You have two choices:

a) put the code exactly as is into your form or into any module, then after
you get the filename, simply call my code with FixCSV strinputfilename

b) strip off the first and last lines and just replace "strFileName" with
"strinputfilename" throughout everything I gave you. Then you could drop it
right into the "On Click" code.


Rob
 
Hi Rob,

I tried it both ways. Something appears to be happening but it just keeps
running. I waited about 15 minutes before I stopped it. Am I being too
impatient?

The .csv file is 22mb. When I run the code it's using between 110 and 140
mb ram.

This is the exactly code that I'm using now:

Private Sub Command13_Click()

Dim strLine As String
Dim strLineBreak As String
Dim strWholeFile As String
Dim varLines As Variant

Dim i As Long
Dim intFileNum As Integer

Dim strfilter As String
Dim strinputfilename As String

strfilter = ahtAddFilterItem(strfilter, _
"All Files (*.*)", _
"*.*")
strinputfilename = ahtCommonFileOpenSave( _
Filter:=strfilter, OpenFile:=True, _
DialogTitle:="Please select an input file...", _
Flags:=ahtOFN_HIDEREADONLY)

'Always use FreeFile() instead of a fixed file number.
intFileNum = FreeFile()

'Get entire file in one shot.
Open strinputfilename For Binary Access Read As #intFileNum
strWholeFile = String$(LOF(intFileNum), 0)
Get #intFileNum, , strWholeFile
Close intFileNum

'Some CSV's use LF only, others use CR/LF; figure out which.
If InStr(1, strWholeFile, vbCrLf, vbBinaryCompare) > 0 Then
strLineBreak = vbCrLf
Else
strLineBreak = vbLf
End If

'Split it into lines.
varLines = Split(strWholeFile, strLineBreak)

'For each physical line, count number of quotes and figure out if it's
inside a field or not.
strWholeFile = vbNullString
strLine = vbNullString
For i = LBound(varLines) To UBound(varLines)
strLine = strLine & varLines(i)
If UBound(Split(strLine, Chr$(34))) Mod 2 = 0 Then
strWholeFile = strWholeFile & strLine & strLineBreak
strLine = vbNullString
End If
Next

Open strinputfilename For Binary Access Write As #intFileNum
Put #intFileNum, , strWholeFile
Close intFileNum

End Sub

Did I miss something?

I really appreciate you helping me.
 
Okay, if it's that large, we should probably take a different approach. It
SHOULD still work as-written...eventually...but at that size, VB/VBA would
definitely be running into memory-management issues, undoubtedly slowing it
down immensely. Try the following code instead. It sends the output to a
temp file, then when it's done, it deletes the original and puts the temp
file in its place. Be forewarned, it will create and delete the temp file
without regard to whether a file with the same name already exists or not.
The temp file uses the same name as the original, adding ".tmp" to the end,
which I don't imagine would be a huge problem.

For increased speed, you can try playing with the clngChunkSize value. I
found that around 2000 seemed to be the "sweet spot", though there was very
little difference between about 500 and 20000 for me. Numbers significantly
lower or higher than that began to slow it down, though. At 2048 (I was
working in powers of 2 originally), it did a 22 Meg file in about 8 seconds
on my machine.

Public Sub Command13_Click()
Const clngChunkSize As Long = 2000
Dim strLine As String
Dim strLineBreak As String
Dim strChunk As String
Dim strChunkOut As String
Dim varLines As Variant

Dim i As Long
Dim intFileNumIn As Integer
Dim intFileNumOut As Integer

Dim strfilter As String
Dim strinputfilename As String

Dim blnFirstChunk As Boolean

strfilter = ahtAddFilterItem(strfilter, _
"All Files (*.*)", _
"*.*")
strinputfilename = ahtCommonFileOpenSave( _
Filter:=strfilter, OpenFile:=True, _
DialogTitle:="Please select an input file...", _
Flags:=ahtOFN_HIDEREADONLY)

intFileNumIn = FreeFile()
Open strinputfilename For Binary Access Read As #intFileNumIn

intFileNumOut = FreeFile()
On Error Resume Next
Kill strinputfilename & ".tmp"
On Error GoTo 0
Open strinputfilename & ".tmp" For Binary Access Write As #intFileNumOut

strChunk = String$(clngChunkSize, 0)
strLine = vbNullString
blnFirstChunk = True
Do Until EOF(intFileNumIn)
Get #intFileNumIn, , strChunk
strChunk = strLine & strChunk

If blnFirstChunk Then
If InStr(1, strChunk, vbLf, vbBinaryCompare) > 0 Then
If InStr(1, strChunk, vbCrLf, vbBinaryCompare) > 0 Then
strLineBreak = vbCrLf
Else
strLineBreak = vbLf
End If
blnFirstChunk = False
End If
End If

varLines = Split(strChunk, strLineBreak)

If UBound(varLines) > LBound(varLines) Then
strChunkOut = vbNullString
strLine = vbNullString
For i = LBound(varLines) To UBound(varLines) - 1
strLine = strLine & varLines(i)
If UBound(Split(strLine, Chr$(34))) Mod 2 = 0 Then
strChunkOut = strChunkOut & strLine & strLineBreak
strLine = vbNullString
End If
Next
Put #intFileNumOut, , strChunkOut
End If
strLine = varLines(UBound(varLines))
Loop
Put #intFileNumOut, , strLine
Close intFileNumOut
Close intFileNumIn

Kill strinputfilename
Name strinputfilename & ".tmp" As strinputfilename
End Sub


Rob
 
Woops...I changed my Sub to Public for testing from within a module instead
of a form; you should switch it back to Private.


Rob
 
Hi Rob,

First I'd like to thank you so much for helping me. You have no idea how
much I appreciate it.

If I manually go into the file before I run the code and enter a hard return
in the middle of the line, it works beautifully and eliminates the return.
(It doesn't create a .tmp file, the changes are in the .csv file which is
fine.)

But as far as the "characters" for the hard return, they are still there.
They look like small boxes within the text. When I copy and paste this text
(as I did in the earlier posting), it shows them as hard returns. And when I
import the file into Access it treats them as hard returns.

Don't get me wrong, the code you gave me will definitely help us out with
other problems we've been having. My boss will be ecstatic when I tell him
about what you sent us.

I know I'm asking a lot, but is it possible to email you the .csv file that
I'm working with so you can see the "characters" that I'm seeing?

Thank you so much for everything.

Rachel
 
I suspect what's happening is that when YOU enter a carriage return, it's
putting in a CR/LF combination; in the original CSV file, it's probably only
a LF (or vice versa). You could override that by commenting out or removing
all the lines immediately surrounding the "strLineBreak = vbLf" line (i.e.,
from the "If blnFirstChunk Then" to the matching "End If". If I'm right,
then that should leave you only an incorrect character at the one that you
put in by-hand.

If that doesn't fix it, then it's the "vice versa" situation I mentioned
above, and you should change vbLf to vbCrLf.

And if THAT doesn't fix it, then yes, by all means, e-mail me your file at
the address I'm sending from (removing the "no freakin spam" thing, of
course). Don't forget to ZIP or RAR it first, or it'll probably get
rejected by my ISP as being too big.


Rob
 
Rob,

I'm posting the code that you wrote for me so that others can benefit from
it as well.

Once again, you are awesome! Thank you so much for your help.

Const clngChunkSize As Long = 2000
Dim strLine As String
Dim strLineBreak As String
Dim strChunk As String
Dim strChunkOut As String
Dim varLines As Variant

Dim i As Long
Dim intFileNumIn As Integer
Dim intFileNumOut As Integer

Dim strfilter As String
Dim strinputfilename As String

' Dim blnFirstChunk As Boolean
strfilter = ahtAddFilterItem(strfilter, _
"All Files (*.*)", _
"*.*")
strinputfilename = ahtCommonFileOpenSave( _
Filter:=strfilter, OpenFile:=True, _
DialogTitle:="Please select an input file...", _
Flags:=ahtOFN_HIDEREADONLY)

intFileNumIn = FreeFile()
Open strinputfilename For Binary Access Read As #intFileNumIn

intFileNumOut = FreeFile()
On Error Resume Next
Kill strinputfilename & ".tmp"
On Error GoTo 0
Open strinputfilename & ".tmp" For Binary Access Write As #intFileNumOut

strChunk = String$(clngChunkSize, 0)
strLine = vbNullString
strLineBreak = vbLf

Do Until EOF(intFileNumIn)
Get #intFileNumIn, , strChunk
strChunk = Replace(strLine & strChunk, vbCrLf, vbLf)

varLines = Split(strChunk, strLineBreak)
strLine = vbNullString
If UBound(varLines) > LBound(varLines) Then
strChunkOut = vbNullString
For i = LBound(varLines) To UBound(varLines) - 1
strLine = strLine & varLines(i)

If UBound(Split(strLine, Chr$(34))) Mod 2 = 0 Then
strChunkOut = strChunkOut & strLine & strLineBreak
strLine = vbNullString
End If
Next
Put #intFileNumOut, , strChunkOut
End If
strLine = strLine & varLines(UBound(varLines))
Loop
Put #intFileNumOut, , Left$(strLine, InStr(strLine & Chr$(0), Chr$(0)) - 1)
Close intFileNumOut
Close intFileNumIn

Kill strinputfilename
Name strinputfilename & ".tmp" As strinputfilename

Thank you so much for your help.
 
Back
Top