Fastest way to search for a string in a large text file (75 to 100 MB)

  • Thread starter: Clinto

Clinto

Hi,
I am trying to find the fastest way to search a txt file for a
particular string and return the line that contains the string. I have
so far just used the most basic method: I initialized a variable as
IO.StreamReader, read each line, and performed an if-then to see if
var.Contains(mystring) is true or false. If true, I get my string; if
false, it reads the next line. This takes forever. Is there anything I
can do to speed this up?
Thanks.
 
If the file is only 100 MB, then, seriously, I would just read
the entire file at once and process it in memory. If you read line by
line, then you are going to hit the disk a lot, and that will really
slow you down...

If you don't want to do it that way, then you might want to read the
file in chunks as binary data, then convert your bytes to strings and
do your compares... Of course, that is going to make it a little
tricky because a chunk might end in the middle of a line....
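
As a rough sketch of the chunked idea and the "middle of a line" problem (in Python purely for illustration, since the thread is about .NET; the name `find_line_chunked`, the 1 MB chunk size, and the UTF-8 assumption are all made up for the example): keep whatever follows the last newline in a chunk and glue it onto the front of the next chunk, so a line is only searched once it is complete.

```python
def find_line_chunked(path, needle, chunk_size=1 << 20, encoding="utf-8"):
    """Return the first line containing needle, or None.

    Assumes an ASCII-compatible encoding such as UTF-8 and a needle
    that does not itself contain a line break.
    """
    target = needle.encode(encoding)
    leftover = b""  # incomplete line carried over from the previous chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = leftover + chunk
            cut = buf.rfind(b"\n")
            if cut == -1:
                # No complete line yet; keep accumulating.
                leftover = buf
                continue
            # Search only complete lines; save the tail for the next round.
            complete, leftover = buf[:cut + 1], buf[cut + 1:]
            pos = complete.find(target)
            if pos != -1:
                start = complete.rfind(b"\n", 0, pos) + 1
                end = complete.find(b"\n", pos)
                return complete[start:end].decode(encoding).rstrip("\r")
    # The file may end with a final line that has no trailing newline.
    if target in leftover:
        return leftover.decode(encoding).rstrip("\r")
    return None
```

Calling `find_line_chunked("big.txt", "mystring")` (file name hypothetical) returns the matching line without ever holding more than roughly one chunk plus one partial line in memory.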
 
I agree, Tom: 100 MB is a huge size for a text file. Reading it as raw
bytes and then converting them to strings is a good idea, but the key
point is how to do it programmatically :-)
 
kimiraikkonen said:
I agree, Tom: 100 MB is a huge size for a text file. Reading it as raw
bytes and then converting them to strings is a good idea, but the key
point is how to do it programmatically :-)

The most efficient way would presumably be to read the entire file into a
single string using IO.File.ReadAllText and see whether your search string
is contained within the file at all (which you can then do using a single
call to .Contains). If it's not there then there's no point trying to work
out which line it's on, and you can stop looking any further straight away.

If you do find the search string, you can count the line breaks that appear
before the search string to work out which line it's on.
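
A minimal sketch of that idea, written in Python only for illustration (the thread itself is about IO.File.ReadAllText in .NET; `find_line_in_text` is a made-up name): search the whole text once, and only if there is a hit count the newlines in front of it.

```python
def find_line_in_text(text, needle):
    """Return (line_number, line) for the first match, or None if absent."""
    pos = text.find(needle)
    if pos == -1:
        # Not in the file at all: no need to look at individual lines.
        return None
    # Line number = number of line breaks before the match, plus one.
    line_number = text.count("\n", 0, pos) + 1
    # Expand from the match position out to the surrounding line breaks.
    start = text.rfind("\n", 0, pos) + 1
    end = text.find("\n", pos)
    if end == -1:
        end = len(text)
    return line_number, text[start:end].rstrip("\r")
```

The text would come from a single read of the whole file, for example `open(path, newline="").read()`; counting "\n" works for both "\r\n" and "\n" line endings.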

HTH,
 
(O)enone said:
The most efficient way would presumably be to read the entire file into a
single string using IO.File.ReadAllText and see whether your search string
is contained within the file at all (which you can then do using a single
call to .Contains). If it's not there then there's no point trying to work
out which line it's on, and you can stop looking any further straight away.

If you do find the search string, you can count the line breaks that appear
before the search string to work out which line it's on.

HTH,

I would use System.IO.File.ReadAllLines(Filename), because this returns the
lines split out for you. You just loop through the array of individual
lines.
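
For comparison, the same approach sketched in Python (illustration only; `find_line_in_lines` is a hypothetical name, and Python's `splitlines()` stands in for ReadAllLines, although it recognises a few more line separators):

```python
def find_line_in_lines(path, needle, encoding="utf-8"):
    """Return (line_number, line) for the first matching line, or None."""
    with open(path, encoding=encoding) as f:
        lines = f.read().splitlines()  # whole file, already split into lines
    for number, line in enumerate(lines, start=1):
        if needle in line:
            return number, line
    return None
```

The line number falls out of the loop for free, at the cost of building one string object per line before the search starts.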
 
Family said:
I would use System.IO.File.ReadAllLines(Filename), because this
returns the lines split out for you. You just loop through the array
of individual lines.

I did originally write the same thing in my message but then chose to remove
it before I posted it. I think the ReadAllText approach may be quicker
because you can check whether the string exists at all without having to
loop... You could then possibly determine the line by calling
Replace() on the part of the string before the search result position,
changing the two-character line break to a one-character replacement
string, and then seeing how much smaller the string has got; the number of
characters it shrinks by is the number of line breaks before the match
(add one to get the line number).

Maybe needs someone to try it to see which is more efficient.
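
The Replace() trick can be sketched like this (Python for illustration only; `line_number_by_replace` is a made-up name, and the sketch assumes every line break in the file is the two-character "\r\n"):

```python
def line_number_by_replace(text, pos):
    """Line number (1-based) of position pos, assuming CRLF line breaks."""
    prefix = text[:pos]                    # everything before the match
    shrunk = prefix.replace("\r\n", "\n")  # each break loses one character
    breaks = len(prefix) - len(shrunk)     # so the shrinkage counts the breaks
    return breaks + 1
```

Note that this copies the text in front of the match twice (once for the slice, once for the replacement), so simply counting the line breaks in place may well turn out cheaper; only a measurement would settle it.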
 
Thanks everyone, I appreciate the responses. I tried several methods,
ReadAllText, IO.FileStream, ReadAllLines, and all seem about the same.
It became apparent that I am also fighting a slow server connection,
which increases the time to open the files.
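
One way to check how much of the time is the connection rather than the search is to time the two steps separately. A small sketch, in Python for illustration (`time_read_and_search` is a hypothetical helper, not something from the thread):

```python
import time

def time_read_and_search(path, needle, encoding="utf-8"):
    """Time the file read and the in-memory search separately."""
    t0 = time.perf_counter()
    with open(path, "rb") as f:
        data = f.read()                      # dominated by disk/network speed
    t1 = time.perf_counter()
    found = needle.encode(encoding) in data  # pure in-memory search
    t2 = time.perf_counter()
    return {"read_seconds": t1 - t0,
            "search_seconds": t2 - t1,
            "found": found}
```

If the read time dwarfs the search time, switching between search methods will not change much, which would be consistent with all the methods seeming about the same.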
 
Clinto,

Use the Visual Basic Find as that is optimized for strings, any other method
will go slower, just because those are optimized for characters.

Cor
 