VBA Excel: Opening Very Large Text Files

  • Thread starter: d.winkler

d.winkler

I am trying to open very large text files (200 MB+) from an Excel VBA
program.

I am first using the "Open text_filename For Input As #1" command to
open the file.

I then use a "Do While Not EOF(1)" loop and, within that, the "Line
Input #1" command to look at one line at a time and parse for a
specific text string.

(If line has the desired text string, I am then printing the entire
line to a new text output file; the new output file will be much
smaller)


I tested my program on smaller text files and it works, but the 200 MB
huge files are causing Excel to lock up.

Questions:
- I thought the use of "Line Input #1" would allow me to read one
line of text at a time? So even though the text file is very large, I
only look at one line at a time.

- Or is the "Open filename For Input" command still causing Excel to
have to open the entire 200 MB file?

- Is there some other way I can parse out the text lines I want from
such a huge text file?
 
Thanks.

Unfortunately it is not at all a column structure. It's basically a
big ugly logfile off a SUN server.

The only consistent structure is that every line begins with a date
and time stamp, but after the stamp it's just text (no consistent
commas or other delimiters).

I know what I really need to do: Learn Unix shell or perl scripting so
I can do the text parsing in Unix, and then bring the output file into
Excel.
 
There are text editors that can extract data.

I use one called UltraEdit-32. (http://www.ultraedit.com)

You can do Edit|find and one of the options is to list the lines that match the
search string. Click a button to copy to the clipboard and then paste to a new
file. Then save that file.

It might not be as automatic as you want, but it's really neat for the one-time
shots.

IIRC, it has a 30 day evaluation period. (Or about $30 USD).
 
I know what I really need to do: Learn Unix shell or perl scripting so
I can do the text parsing in Unix, and then bring the output file into
Excel.

Ask, and you shall receive:

grep search_string source_file > output_file

(obviously, replace search_string, source_file, and output_file
accordingly.)
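
A minimal sketch of that command on a tiny stand-in file, for anyone who wants to try it before pointing it at the real log. The file names sample.log and matches.log are placeholders, not names from this thread. The -F option tells grep to treat the search string as literal text rather than a regular expression, and -c prints only the number of matching lines.

```shell
# Hypothetical sample data; sample.log and matches.log are placeholder names.
printf '%s\n' \
  'look at that dog1' \
  'look at that cat' \
  'look at that dog1' \
  'look at that mouse' \
  'look at that dog2' > sample.log

# Write every line containing the literal string "dog" to a new file.
grep -F 'dog' sample.log > matches.log

# Print only the count of matching lines, without writing them anywhere.
grep -c -F 'dog' sample.log
```

grep reads its input as a stream, so the size of the source file should not matter much beyond the time it takes to read it.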
--
HTH -

-Frank
Microsoft Excel MVP
Dolphin Technology Corp.
http://vbapro.com
 
And there are versions of grep that have been ported to the Windows platform.

Take a look at www.shareware.com for grep.

or maybe a DOS equivalent:

find /i "searchstring" source_file > output_file
(that looks kind of familiar???)

But I think I'd check to see how big a file DOS's Find will work with. (I don't
recall if there's a limit.)
 
Thanks for all the suggestions, I appreciate everyone taking the time.

Guess I have to open the can of worms further :-)

I was actually starting with a 600MB+ file and using grep to pull out
lines with desired text string.

But there are many, many lines with duplicate text strings (see
example below) and the resulting grep output was still 200MB+.

For example, say that I have the following lines in the text file
(this is my 600MB file):

look at that dog1
look at that cat
look at that dog1
look at that mouse
look at that dog1
look at that dog2
look at that dog2
look at that bird
look at that dog3
look at that dog4
look at that dog4


I can easily grep out only those lines with "dog" in them and I get
this output (this is my 200MB file):

look at that dog1
look at that dog1
look at that dog1
look at that dog2
look at that dog2
look at that dog3
look at that dog4
look at that dog4


But I don't know of a way to use grep that also eliminates
duplicates, so that the output looks like this:

look at that dog1
look at that dog2
look at that dog3
look at that dog4


Any ideas (short of a Perl script)?

Thanks again.
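
One possible way to get the deduplicated output described above without Perl is to pipe grep into either sort -u or a one-line awk filter. This is only a sketch: animals.txt and the two output names are placeholders, and the sample data is the example from this post.

```shell
# Sample data copied from the example above; animals.txt is a placeholder name.
printf '%s\n' \
  'look at that dog1' 'look at that cat' 'look at that dog1' \
  'look at that mouse' 'look at that dog1' 'look at that dog2' \
  'look at that dog2' 'look at that bird' 'look at that dog3' \
  'look at that dog4' 'look at that dog4' > animals.txt

# Option 1: sort the matching lines and keep one copy of each.
grep -F 'dog' animals.txt | sort -u > unique_sorted.txt

# Option 2: print a line only the first time it appears, keeping the original order.
grep -F 'dog' animals.txt | awk '!seen[$0]++' > unique_in_order.txt

cat unique_in_order.txt
```

The awk version keeps every distinct line in memory, so it suits cases where the number of distinct lines is modest. sort -u changes the line order, but sort implementations typically spill to temporary files and so tend to cope with very large inputs.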
 
Well, if Windows is your primary platform, I'd say Rob's ADO idea is going
to be your best bet.

If you can play around on the *nix side of things, in addition to grep,
you've got awk and sed. Since the data is starting there, you may want to do
as much of that raw processing as possible before moving it over to Excel,
and let Excel do what it's good at.
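
As a rough illustration of the awk route: the real log lines start with a date and time stamp, so two lines with the same text would still differ as whole lines. The sketch below assumes, purely for illustration, that the stamp occupies the first two space-separated fields (the actual stamp format is not given in this thread). It keeps the first line seen for each distinct message text. The file names are placeholders.

```shell
# Hypothetical log lines; the "date time" stamp layout is an assumption.
printf '%s\n' \
  '2004-01-05 10:00:01 look at that dog1' \
  '2004-01-05 10:00:02 look at that cat' \
  '2004-01-05 10:00:03 look at that dog1' \
  '2004-01-05 10:00:04 look at that dog2' > stamped.log

# For lines containing "dog": strip the first two fields to build a key,
# then print the full line only the first time that key is seen.
awk '/dog/ {
  key = $0
  sub(/^[^ ]+ +[^ ]+ +/, "", key)
  if (!(key in seen)) { seen[key] = 1; print }
}' stamped.log > first_seen.log

cat first_seen.log
```

If the stamp spans a different number of fields, the pattern passed to sub would need adjusting to match.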
--
HTH -

-Frank
Microsoft Excel MVP
Dolphin Technology Corp.
http://vbapro.com
 