Search in a file

  • Thread starter: Chris R.

Chris R.

I'm trying to do something relatively simple: find the offset of the
first (or next) occurrence of a string in a file, ideally in a
case-insensitive way. I've seen a few solutions that involve
loading the whole file as a string, but that limits the
size of the file that can be handled.

Is there a way to do the search with just the stream? Or do I need to
start loading the file into string buffers of limited size and
searching those? I'd rather not, since handling possible overlap would
be a pain, but...

Anyway, hope to see help here soon.

Thanks.
 
Simply read a buffer from the file into a string. Try to find your string with
buffer.IndexOf(searchString). If it is not found, seek searchString.Length-1
backwards in your file, so that a match straddling two buffers is not missed,
and repeat the procedure.
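
For illustration, the buffer-and-overlap idea could look roughly like this. It is only a sketch, written in Python rather than .NET; the names find_in_stream and chunk_size are made up for the example. It assumes an ASCII search pattern and a stream opened in binary mode, and instead of seeking backwards it carries the last len(pattern)-1 bytes over into the next buffer, which has the same effect.

```python
import io

def find_in_stream(stream, pattern, start=0, chunk_size=64 * 1024):
    """Return the byte offset of the first case-insensitive match of
    `pattern` (ASCII bytes) at or after `start`, or -1 if there is none."""
    if not pattern:
        raise ValueError("pattern must not be empty")
    needle = pattern.lower()          # bytes.lower() only folds ASCII letters
    overlap = len(needle) - 1         # bytes that must be re-examined
    stream.seek(start)
    base = start                      # file offset of window[0]
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return -1                 # end of stream, nothing found
        window = tail + chunk
        hit = window.lower().find(needle)
        if hit != -1:
            return base + hit
        # Keep the last `overlap` bytes so a match split across two
        # reads is still seen in the next window.
        tail = window[-overlap:] if overlap else b""
        base += len(window) - len(tail)
```

To find the next occurrence, call it again with start set to the previous result plus one (or plus the pattern length, if overlapping matches are not wanted).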
 
If you are working with text files, you should use "readers" (TextReader,
StreamReader) rather than bare "streams". Then, you can read line by line,
which should solve your problem if the patterns that you are looking for
don't span lines.

Readers/Writers are designed for text I/O.
Streams are designed for binary I/O.
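
As a rough illustration of the line-by-line approach, here is a minimal sketch in Python (not .NET). The function name find_in_lines is made up for the example, and it assumes the pattern never spans a line break. It reports a line number and a character offset within that line, not a byte offset.

```python
import io
import re

def find_in_lines(reader, pattern):
    """Return (line_number, char_offset) of the first case-insensitive
    match of `pattern`, or None. Line numbers start at 1."""
    # re.escape makes the pattern a literal string, not a regex.
    regex = re.compile(re.escape(pattern), re.IGNORECASE)
    for line_no, line in enumerate(reader, start=1):
        match = regex.search(line)
        if match:
            return line_no, match.start()
    return None
```

Only one line is held in memory at a time, so file size is not a concern as long as individual lines are reasonably short.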

Bruno.
 
I am. So, basically, what I want to do is read the file in line by
line looking for the string. Then, to get to the string in the stream, I
just keep track of the lengths of the lines and the offset into the
matching line, seek that many bytes into the stream, and that should put
me at the start of the substring that I'm searching for?

I ask because I need to find this string in order to pass its position as
an offset into another method that will be working on the stream. Or is it
possible to mark the stream?

Heck, even better, if I have found the substring in the stream, can I
just push the line back onto it and then work with another reference to
the same stream?
 
Well, you have to be careful if you want to use byte offsets to identify
substrings in a file:

* you have to know how lines are terminated (CRLF or LF alone).
* you have to know the encoding. If your text only contains ASCII you are on
the safe side, but if it contains accented chars and you save it in
UTF-8 (the default for .NET), some chars will take 2 bytes (even 3 if you
have Chinese chars).
* you will have to mix binary (stream) and text (reader) I/O because the
reader API won't let you seek to a given byte offset.
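
To make the difference between char offsets and byte offsets concrete, here is a small sketch in Python (not .NET). The sample text and the name find_byte_offset are made up for the example. It assumes UTF-8 (or another ASCII-compatible encoding), LF or CRLF line endings, and a case-sensitive match that does not span lines.

```python
import io

# One accented char and CRLF line endings are enough to make the
# character offset and the byte offset disagree under UTF-8.
text = "café\r\nBEGIN:VCARD\r\n"
data = text.encode("utf-8")
char_offset = text.find("BEGIN:VCARD")    # counted in characters
byte_offset = data.find(b"BEGIN:VCARD")   # counted in bytes

def find_byte_offset(stream, pattern, encoding="utf-8"):
    """Scan a binary stream line by line and return the byte offset of
    the first match of `pattern`, or -1."""
    needle = pattern.encode(encoding)
    offset = stream.tell()                # byte offset of the current line
    for raw_line in stream:               # binary lines keep their terminator
        col = raw_line.find(needle)
        if col != -1:
            return offset + col
        offset += len(raw_line)           # counts CR, LF and multi-byte chars
    return -1
```

Because the lines are read as raw bytes, the line terminators and multi-byte characters are counted exactly as they are stored in the file.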

I don't know what you are really trying to achieve, but it may be easier for
you to keep track of locations by line number + char offset in line than by
byte offset, or to isolate the reader I/O somewhere and pass strings (rather
than the reader itself) to your other methods. Mixing text reader and byte
offsets is just awkward.

Text readers won't let you "push the line back". If you want to do this, you
have to open the file in binary mode (as a stream), save byte offsets and
seek back to them later, but then you have to deal with line separators
and encoding yourself.
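
The "save a byte offset and seek back" idea might be sketched like this, in Python rather than .NET. The name rewind_to_line is hypothetical. It assumes a seekable binary stream and an ASCII prefix, compared case-insensitively.

```python
import io

def rewind_to_line(stream, prefix):
    """Read lines until one starts with `prefix` (ASCII bytes), then seek
    back to the start of that line. Return its byte offset, or -1."""
    wanted = prefix.upper()
    while True:
        mark = stream.tell()              # remember where this line begins
        raw = stream.readline()
        if not raw:
            return -1                     # end of stream
        if raw.upper().startswith(wanted):
            stream.seek(mark)             # "push the line back"
            return mark
```

After a successful call the stream is positioned at the start of the matching line, so it can be handed to another method that reads from there.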

Bruno.
 
What I'm trying to do is look for the first occurrence of a string,
"BEGIN:VCARD" in this case, in a stream. Then I want to parse the
vCard at that point, and then search for every subsequent occurrence of
a vCard in the file after the end of this one, as per the official spec.

I'll admit, I'm at a bit of a loss with this one.

Any suggestions?
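
One possible shape for this, as a sketch only and in Python rather than .NET: walk the stream line by line in binary mode and collect everything between a BEGIN:VCARD line and the matching END:VCARD line. The name iter_vcards is made up for the example. It assumes an ASCII-compatible encoding, that both markers sit alone on their own lines, and that vCards are not nested. It does no actual vCard parsing.

```python
import io

BEGIN = b"BEGIN:VCARD"
END = b"END:VCARD"

def iter_vcards(stream):
    """Yield (start_offset, raw_bytes) for each BEGIN:VCARD ... END:VCARD
    block found in a seekable binary stream."""
    start = None                          # byte offset of the current card
    lines = []
    while True:
        mark = stream.tell()
        raw = stream.readline()
        if not raw:
            return                        # end of stream
        tag = raw.strip().upper()         # ignore case and the line ending
        if start is None:
            if tag == BEGIN:
                start, lines = mark, [raw]
        else:
            lines.append(raw)
            if tag == END:
                yield start, b"".join(lines)
                start = None
```

Each yielded block could then be decoded and passed to whatever parses a single vCard, while the offset tells you where it started in the file.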
 