importing text file, removing data and outputting new text file

  • Thread starter Thread starter Pal
  • Start date Start date
P

Pal

I have a 800 mb text file that is TAB DELIMITED. It has 25 columns of data.
I only want columns 4,5,13 & 24.

How could I read this file in line by line, remove the other columns
and then write out the included columns to a different text file name
with only 4 columns of data?

Thanks in advance
Pal
 
Pal,

Is it really 800 megabytes or really just 800 kilobytes? If is 800 kb,
simply open the file, delete the extra columns, and save the file as a text
file.

If it is the larger size, then you can use a macro, which I will help you
write if you post back.

HTH,
Bernie
MS Excel MVP
 
yes, 800 MB
This is some code I have from a previous project
But I am not sure how to write to tab delimited file
and how to find the tabs separaterd in the input file.
Thanks
Pal


Private Sub opentextfile()
Dim LineofText As String, page As String
Dim I As Integer
I = 0
page = "c:\ca.txt"
FileNum = FreeFile ' next free filenumber
Open page For Input As #FileNum
Do While Not EOF(1)
Line Input #FileNum, LineofText
Cells(1 + I, 1).Value = LineofText ' Not needed if outputting to text
file
I = I + 1
Loop
Close #1

End Sub
 
Pal,

Give the macro below a try. It should open your file line by line, and
write out just columns 4,5,13, and 24 into the tab-delimited file
"Output.txt".
It may take a while, so watch your atatus bar for updates on what line it is
processing.

HTH,
Bernie
MS Excel MVP

Option Explicit
Sub GetFourColumns()
Dim ResultStr As String
Dim FileName As String
Dim FileNumIn As Integer
Dim FileNumOut As Integer
Dim ResultsArray As Variant
Dim WholeLine As String
Dim Counter As Double

Counter = 0

FileName = Application.GetOpenFilename
If FileName = "" Then End

FileNumIn = FreeFile()
Open FileName For Input As #FileNumIn

FileNumOut = FreeFile()
Open "Output.txt" For Output Access Write As #FileNumOut

Do While Seek(FileNumIn) <= LOF(FileNumIn)
Counter = Counter + 1
Application.StatusBar = "Processing line " & Counter
Line Input #FileNumIn, ResultStr
ResultsArray = Split(ResultStr, vbTab)
'ResultsArray is a 0 based array of the tab-delimited values
WholeLine = ResultsArray(3) & vbTab & _
ResultsArray(4) & vbTab & _
ResultsArray(12) & vbTab & _
ResultsArray(23)
Print #FileNumOut, WholeLine
Loop

Close FileNumIn
Close #FileNumOut

Application.StatusBar = False

End Sub
 
Thanks Bernie
The program is only processing the first line of data
I ran it line by line and after the first line it ends.
So I don't know what LOf is - Length of File?

Do While Seek(FileNumIn) <= LOF(FileNumIn)

Pal
 
It only pulled in the 4 elements of the Header line of the file if that
makes a difference.
I think there may not be an end of line marker in the text file. (but the
pasted in data below looks like it does)
I ran a small 3 KB sample file and LOF(filenumln) = 2210
and seek(fileNumln) = 2211 were the watch values after the FIRST pass.
Then it broke out of the loop because 2211>2210

Maybe it needs some break in the loop after the 25th element.
here is a small sample of the text file I ran:

RC UFI UNI LAT LONG DMS_LAT DMS_LONG UTM JOG FC DSG PC CC1 ADM1 ADM2 DIM CC2
NT LC SHORT_FORM GENERIC SORT_NAME FULL_NAME FULL_NAME_ND MODIFY_DATE
1 -559843 -791985 75.45 -94.1666667 752700 -941000 VD67 NS15-01 H BAY CA
14 N ABANDONBAY Abandon Bay Abandon Bay 1993-12-14
1 -559844 -791986 50.7333333 -108.75 504400 -1084500 XB52 NM12-06 P PPL CA
11 N ABBEY Abbey Abbey 1993-12-14
1 -559845 -791988 49.05 -122.3 490300 -1221800 EV53 NM10-08 P PPL CA 02
N ABBOTSFORD Abbotsford Abbotsford 1993-12-14
1 -573004 -791987 45.4333333 -72.8833333 452600 -725300 XR63 NL18-09 P PPL
CA 10 V ABBOTSFORD Abbotsford Abbotsford 1993-12-14
1 -559845 -791989 49.05 -122.3 490300 -1221800 EV53 NM10-08 P PPL CA 02
V ABBOTTSFORD Abbottsford Abbottsford 1993-12-14
1 -559846 -791990 54.2333333 -113.0166667 541400 -1130100 UF61 NN12-04 P PPL
CA 01 N ABEE Abee Abee 1993-12-14

Thanks again
Pal
 
Pal,

Try this version, which will end when it reads a line of less than 24
elements. If you can have blanks lines in your data (which you don't appear
to have) then we'll have to try something else - other than you sending me
an 800 MB file by email ;-)

Otherwise, it is possible that you only _think_ you have a tab delimited
file. You could also try changing

ResultsArray = Split(ResultStr, vbTab)

to

ResultsArray = Split(ResultStr, " ")

in case your file is actually space delimited.

HTH,
Bernie
MS Excel MVP

Sub GetFourColumns2()
Dim ResultStr As String
Dim FileName As String
Dim FileNumIn As Integer
Dim FileNumOut As Integer
Dim ResultsArray As Variant
Dim WholeLine As String
Dim Counter As Double
Dim KeepGoing As Boolean

Counter = 0

FileName = Application.GetOpenFilename
If FileName = "" Then End

FileNumIn = FreeFile()
Open FileName For Input As #FileNumIn

FileNumOut = FreeFile()
Open "Output.txt" For Output Access Write As #FileNumOut

On Error GoTo Done:

KeepGoing = True

Do While KeepGoing
Counter = Counter + 1
Application.StatusBar = "Processing line " & Counter
Line Input #FileNumIn, ResultStr
ResultsArray = Split(ResultStr, vbTab)

'ResultsArray is a 0 based array of the tab-delimited values
WholeLine = ResultsArray(3) & vbTab & _
ResultsArray(4) & vbTab & _
ResultsArray(12) & vbTab & _
ResultsArray(23)
Print #FileNumOut, WholeLine
Loop

Done:
Close FileNumIn
Close #FileNumOut

Application.StatusBar = False

End Sub
 
Pal,

You may also need to use TRIM:

ResultsArray = Split(Application.Trim(ResultStr), " ")

in case you have multiple spaces.

HTH,
Bernie
MS Excel MVP
 
I don't know if you got the PM but I found that it is importing the entire
file in on the first line. That is why it stops after 1 loop.

IS there a way to count the number of tabs(ascii 9) as it is importing the
data
then break the line after 25 tabs? I am not sure how to import based on tab
counts. Maybe that will work.
 
Back
Top