Find the position of a string in a large text file

  • Thread starter: John

John

Hi, I have just started to look at C#, so sorry if this is
a FAQ.
I have several large text files (100MB). I need to find
the start position of a string in these files. The reason
is that I just want to display a specific section of the
file to the user.

The file looks something like this:

texttext texttexttext text text
texttext texttexttext text text
texttext texttexttext text text
texttext texttexttext text text
texttext texttexttext text text
texttext texttexttext text text

<START_OF_SECTION:1>
texttext texttexttext text text
texttext texttexttext text text
texttext texttexttext text text
texttext texttexttext text text
<END_OF_SECTION:1>

The file is not XML; it just uses tags like this to separate
the sections.
 
Posting to my own thread. I found that the following works,
but I wonder if I should do it in a different way or if
this is the fastest.

FileStream fs = new FileStream(st, FileMode.Open,
    FileAccess.ReadWrite);

StreamReader r = new StreamReader(fs);

Response.Write("<html>");
Response.Write("<body>");

// 1 while inside the section, 0 otherwise
int i = 0;

String result = r.ReadLine();
while (result != null)
{
    if (result.StartsWith("<START_OF_SECTION:1>"))
        i = 1;
    if (result.StartsWith("<END_OF_SECTION:1>"))
        i = 0;
    if (i == 1)
    {
        Response.Write(result);
        Response.Write("<br/>");
    }
    result = r.ReadLine();
}

Response.Write("</body>");
Response.Write("</html>");
 
John said:
Posting to my own thread. I found that the following works,
but I wonder if I should do it in a different way or if
this is the fastest.

Assuming the start/end tags are always at the start of the line, this
is a reasonable approach. A few suggestions though:

Do you really need to open the stream for read/write access? I can't
see why you'd need to do this immediately. If not, I'd suggest just
using:

Response.Write ("<html>");
Response.Write ("<body>");
using (StreamReader reader = new StreamReader (st))
{
    String line;
    // true while we are between the start and end tags
    bool writing = false;
    while ( (line=reader.ReadLine()) != null)
    {
        if (line.StartsWith ("<START_OF_SECTION:1>"))
            writing=true;

        if (writing)
        {
            Response.Write (line);
            Response.Write ("<br/>");
        }

        if (line.StartsWith ("<END_OF_SECTION:1>"))
            writing=false;
    }
}
Response.Write ("</body>");
Response.Write ("</html>");

Note that the above means you get the lines with both tags in - your
own code would only include the start tag. It's only got the call to
ReadLine once, which helps (IMO) at the slight cost of having a minorly
complicated idiom (which is very commonly used though - you get used to
it very quickly) for the while condition.
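
If the tag lines should not be shown at all, and each section appears only once in the file, the same loop can skip both tags and stop as soon as it sees the end tag, so the rest of a 100MB file is never read. This is only a sketch under those assumptions: SectionReader, ReadSection and includeTags are names made up for the example, and it takes a TextReader so it works with a StreamReader or a StringReader.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SectionReader
{
    // Returns the lines between startTag and endTag.
    // Assumes both tags sit at the start of a line and the section occurs once.
    public static List<string> ReadSection(TextReader reader, string startTag,
                                           string endTag, bool includeTags)
    {
        var lines = new List<string>();
        bool writing = false;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (!writing)
            {
                // Still before the section: look only for the start tag.
                if (line.StartsWith(startTag, StringComparison.Ordinal))
                {
                    writing = true;
                    if (includeTags) lines.Add(line);
                }
                continue;
            }

            if (line.StartsWith(endTag, StringComparison.Ordinal))
            {
                if (includeTags) lines.Add(line);
                break; // nothing after the end tag is needed
            }

            lines.Add(line);
        }
        return lines;
    }
}
```

With a file you would pass in a StreamReader opened on the path and write each returned line to the response.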

Having the boolean variable with a meaningful name just helps
readability in general.

The using(...) construct means that your reader gets closed even if an
exception occurs.

You might also consider explicitly stating what encoding you expect the
file to be in, when you construct the StreamReader. If it's UTF-8 then
it won't change behaviour, but it'll make it more obvious when reading
the code.
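
To illustrate the encoding point, here is a small sketch that passes the encoding to the StreamReader constructor. EncodingSketch and ReadFirstLine are hypothetical names for the example, and UTF-8 is only an assumption about the file.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingSketch
{
    // Reads the first line of the file, stating the expected encoding explicitly.
    public static string ReadFirstLine(string path)
    {
        // UTF-8 is what StreamReader assumes by default anyway;
        // naming it here just documents the expectation.
        using (StreamReader reader = new StreamReader(path, Encoding.UTF8))
        {
            return reader.ReadLine();
        }
    }
}
```

If the files turn out to be in some other encoding, a different Encoding instance goes in the same place.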
 
If you have control of the file format, I recommend that you dedicate
a block at the beginning of the file to a structure pointing to the
different locations within the file, so that you don't have to read
through the whole file to find the start of a section. Of course, if you
don't have a choice, you probably need to do what you proposed in the
other post.
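
As a rough sketch of the index idea, assume a made-up layout where the first line of the file is a 20-digit, zero-padded byte offset of the section's start tag. The reader then parses that header and seeks straight to the section. IndexedFileSketch, the header width and the single-section layout are all assumptions for the example, not an existing format, and the stream has to be seekable.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Text;

public static class IndexedFileSketch
{
    // Hypothetical header: 20 ASCII digits followed by '\n'.
    private const int HeaderDigits = 20;

    // Writes header + preamble + section, recording where the section starts.
    public static void Write(Stream output, string preamble, string section)
    {
        byte[] preambleBytes = Encoding.UTF8.GetBytes(preamble);
        byte[] sectionBytes = Encoding.UTF8.GetBytes(section);
        long offset = HeaderDigits + 1 + preambleBytes.Length;
        string headerText =
            offset.ToString("D" + HeaderDigits, CultureInfo.InvariantCulture) + "\n";
        byte[] header = Encoding.ASCII.GetBytes(headerText);
        output.Write(header, 0, header.Length);
        output.Write(preambleBytes, 0, preambleBytes.Length);
        output.Write(sectionBytes, 0, sectionBytes.Length);
    }

    // Reads the offset from the header, seeks there, and returns the lines
    // up to and including the line that starts with endTag.
    public static List<string> ReadSection(Stream input, string endTag)
    {
        byte[] header = new byte[HeaderDigits];
        int read = 0;
        while (read < HeaderDigits)
        {
            int n = input.Read(header, read, HeaderDigits - read);
            if (n == 0) throw new EndOfStreamException("Header is truncated.");
            read += n;
        }
        long offset = long.Parse(Encoding.ASCII.GetString(header),
                                 CultureInfo.InvariantCulture);
        input.Seek(offset, SeekOrigin.Begin);

        var lines = new List<string>();
        using (var reader = new StreamReader(input, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
                if (line.StartsWith(endTag, StringComparison.Ordinal)) break;
            }
        }
        return lines;
    }
}
```

A real file with several sections would need one offset per section in the header, but the seek-then-read step stays the same.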
 