RegularExpressions

Hello,

I have a log file that looks something like this

ABCDEF ddfs adasd
A BRE asdd asd dfddf
EROI DFIOU eeroo
B BRE errt ssdrr
AAA eIR DFDF
C BRE AAA asdd

All lines are separated by a newline (\r\n). I want to extract lines that start
with BRE all the way to the end of the line and put them into a collection or
an array. So in this case I want lines 2, 4, and 6.

Do any of you regular expression gurus have an idea?

Thank you
 
In your example, you said the line starts with BRE, but at the end you said you
want lines 2, 4, and 6. So I am assuming you mean lines containing BRE.
In that case, after you have opened the log file, read a line and do a string
match to see whether that line contains BRE; if it does, store that
line in an array or collection.

Regards
Sanjib
 
That's the thing... I'm asking for a regular expression pattern. I know I can
just loop through all lines and use IndexOf and Substring, but I have a huge
file and it will take forever. That is why I'm asking if anyone has more
experience with regular expressions, because they are new to me.

Also yes, I said I want to extract all lines that start with BRE. So in my
example I want lines

BRE asdd asd dfddf
BRE errt ssdrr
BRE AAA asdd

Notice how I don't want the first character (or it could be more than one
character). I just want the text from BRE to the end of the
line.

Thank you very much for your time

Serge
 
Hi Serge,

I'm a little confused by your first and second example. You mentioned
something about not wanting the "first letter," and in your first example,
the lines with "BRE" in them all started with a single letter, but the other
lines did not, and in your second example, the lines did *not* start with a
first letter.

Be that as it may, I know you're chomping at the bit to use regular
expressions here, but in this case you don't want to use a Regular
Expression, even though it would be easy enough to write. Why? Because you
said "I have a huge file." Regular Expressions work with strings, and I
don't think you want to (1) read a "huge file" into a single string
and (2) run a regular expression on a string that large.

In fact, from what you've described about the size of the file, and wanting
to parse by line, your best bet (IMHO) would be to use a TextReader to read
the file one line at a time, and use String.IndexOf to evaluate whether or
not to include that line in your results. You could, for example, reuse a
single character array as the buffer that each line is read into.

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

A lifetime is made up of
Lots of short moments.
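
A minimal sketch of the line-at-a-time approach suggested above, written in Python rather than .NET purely for illustration. The function name extract_from_marker and the sample lines are assumptions, not part of the thread; str.find stands in for String.IndexOf.

```python
def extract_from_marker(lines, marker="BRE"):
    """Return the text from `marker` to the end of each line that contains it."""
    results = []
    for line in lines:
        idx = line.find(marker)  # -1 when the marker is absent
        if idx != -1:
            # Keep everything from the marker onward, minus the line break.
            results.append(line[idx:].rstrip("\r\n"))
    return results


# Hypothetical sample modeled on the log lines quoted in the thread.
sample_lines = [
    "ABCDEF ddfs adasd\r\n",
    "A BRE asdd asd dfddf\r\n",
    "EROI DFIOU eeroo\r\n",
    "B BRE errt ssdrr\r\n",
]
extracted = extract_from_marker(sample_lines)
```

Any iterable of lines works here, so an open file object could be passed in directly and the file would never need to be held in memory as one string.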
 
Thank you for your response. I already tried using it on a per-line basis. It
just takes too long. By HUGE file I mean about 10-50 MB. The example I
showed is not the actual file. I was just trying to get a sense of what
pattern to use, as RegEx is a lot faster for string manipulation than the
IndexOf/Substring approach. Basically, here are a few lines from the actual log file.


3148:48 BRE: 17 13 51: penelope baraz King-Ice1 12:21AM
3148:48 ALD: 17 13 51: penelope baraz King-Ice1 12:26AM
3148:48 BRE: 17 13 51: penelope baraz King-Ice1 12:34AM
3148:48 LLD: 17 13 51: penelope baraz King-Ice1 12:45AM

As you can see, I have 4 lines in this example, but I want to extract these lines:

BRE: 17 13 51: penelope baraz King-Ice1 12:21AM
BRE: 17 13 51: penelope baraz King-Ice1 12:34AM

Notice that before the word BRE there is some other info that I don't want.

Looping through all lines takes too much time. However, it takes only a second
or two to read the file into a string variable, so I don't think that is a problem.

Thank you very much.
 
Hi Serge,

I see. Well, parsing it is likely to add some memory to the equation, but
you could read it in blocks if necessary. I still think a Regular Expression
would not be the way to go, though. Regular Expressions do some
backtracking, and I think that wouldn't be necessary. How about if you read
a block (or the whole) into an array of characters? You could then move
through the array one character at a time. The sequence would be a loop (in
pseudo-code):

Start at the beginning of the string, or at the first line break character
or sequence ("\r\n" or '\r' - depending on the document type).
Read one character at a time.
Find the character 'B'.
See if it is followed by 'R'.
If so, see if it is followed by an 'E'.
If all 3 are found in a row, read to the next line break.

Basically, that is what a regular expression does, but in a more roundabout
fashion, with backtracking, etc., because it is not looking for literal
characters, but for patterns. Since you're looking for literal characters in
a specific sequence, this solution would be faster, especially if you used a
pointer.

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

A lifetime is made up of
Lots of short moments.
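
The character-by-character loop described above can be sketched as follows. This is an illustration in Python, not the .NET code the thread has in mind; the name scan_for_bre and the sample text are assumptions. A per-character loop like this is slow in Python itself, so the sketch shows the logic only and says nothing about relative speed.

```python
def scan_for_bre(text):
    """Walk the text one character at a time and collect each run that
    starts at a literal 'BRE' and ends just before the next line break."""
    results = []
    i = 0
    n = len(text)
    while i < n:
        # Find 'B', then check that 'R' and 'E' follow immediately.
        if text[i] == "B" and text.startswith("RE", i + 1):
            end = i
            # Read to the next line break (or the end of the text).
            while end < n and text[end] not in "\r\n":
                end += 1
            results.append(text[i:end])
            i = end
        else:
            i += 1
    return results


# Hypothetical sample modeled on the log lines quoted in the thread.
sample_text = (
    "3148:48 BRE: 17 13 51: penelope baraz King-Ice1 12:21AM\r\n"
    "3148:48 ALD: 17 13 51: penelope baraz King-Ice1 12:26AM\r\n"
    "3148:48 BRE: 17 13 51: penelope baraz King-Ice1 12:34AM"
)
found = scan_for_bre(sample_text)
```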
 
Thank you for your response. I will try to do it by using an array of
characters. Meanwhile, could you show me the RegEx pattern that I can use? I
just want to compare the two approaches and use the faster of the two.

Thank you very much
 
Hi Serge,

Here you go:

(?m)BRE.*$

Explanation:

"(?m)" means caret (^) and dollar sign ($) match at line breaks.
Match the letters "BRE" followed by zero or more characters that are not a
line break (.*), up to a line break or the end of the string ($).

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

A lifetime is made up of
Lots of short moments.
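
A small sketch of how the pattern above behaves on text with \r\n line endings, using Python's re module as a stand-in (an assumption; the thread is about .NET, whose documentation describes the same treatment of "." and "$" in multiline mode). In this engine "." excludes only \n and "$" matches just before \n, so each match keeps a trailing \r. The second pattern, which excludes both \r and \n explicitly, is my own variant and not something proposed in the thread.

```python
import re

# Hypothetical sample modeled on the log lines quoted in the thread.
sample = (
    "3148:48 BRE: 17 13 51: penelope baraz King-Ice1 12:21AM\r\n"
    "3148:48 ALD: 17 13 51: penelope baraz King-Ice1 12:26AM\r\n"
    "3148:48 BRE: 17 13 51: penelope baraz King-Ice1 12:34AM\r\n"
    "3148:48 LLD: 17 13 51: penelope baraz King-Ice1 12:45AM\r\n"
)

# The pattern from the post: matches stop before "\n" but include "\r".
with_cr = re.findall(r"(?m)BRE.*$", sample)

# Variant (assumption): stop at the first "\r" or "\n" instead.
without_cr = re.findall(r"BRE[^\r\n]*", sample)
```

If the stray \r is a problem, trimming each match afterwards is an alternative to changing the pattern.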
 
Thank you very much for your help. I've tried using your pattern, but for some
reason it didn't return any matches. After playing around with it, I finally
came up with a pattern that works.

BRE.*?\n

It also works about 10 times faster than looping through each line. Thank
you very much, Kevin, for inspiring me :)

Serge
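
Two properties of the pattern above are worth checking before relying on it. The sketch below uses Python's re module as a stand-in for .NET (an assumption), with made-up sample text: each match includes the line break itself, and a final line with no trailing newline is not matched at all. The lookahead variant at the end is my own suggestion, not something from the thread.

```python
import re

# Hypothetical sample: note the last line has no trailing line break.
sample = "3148:48 BRE: first\r\n3148:48 ALD: skip\r\n3148:48 BRE: last"

# Pattern from the post: the match runs through "\n", so it keeps "\r\n",
# and the unterminated last line is skipped.
raw = re.findall(r"BRE.*?\n", sample)

# Trimming the line break from each match after the fact.
trimmed = [m.rstrip("\r\n") for m in raw]

# Variant (assumption): stop before an optional "\r" plus "\n", or at the
# end of the string, so the last line is also picked up.
tolerant = re.findall(r"BRE.*?(?=\r?\n|$)", sample)
```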
 
:-D

--

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

I recycle.
I send everything back to the planet it came from.
 