Regular Expression Help

  • Thread starter: NvrBst

NvrBst

I want to match sections in a multi-line string. For example:

-----TEXT File-----
SECTION
hello this is my section
it can use SECTION or ENDSECTION if I want it to
and can be multi lined
ENDSECTION
SECTION
Another Section, can be any amount of sections in a file
ENDSECTION
-----EOF-----

That is, I want a pattern that matches "^SECTION$", then anything except
"^ENDSECTION$", and then "^ENDSECTION$". Here is what I have so far. Note:
my "$" are actually "\r?$", but I wrote them as "$" so they are easier
to read.

--Attempt One---
@"^SECTION$.*^ENDSECTION$", RegexOptions.Multiline |
RegexOptions.Singleline
--Problem--
It matches from the first "^SECTION$" to the very last "^ENDSECTION$". I
want it to stop at the next "^ENDSECTION$", not the last.

--Attempt Two---
@"^SECTION$[^(ENDSECTION)]*^ENDSECTION$", RegexOptions.Multiline
--Problem--
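[^(ENDSECTION)] behaves like [^ENDSECTION()]: inside a character class
the parentheses are literal characters, so the class excludes single
characters rather than the word. The "^" and "$" anchors would also have
to be part of whatever is excluded.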
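
A small sketch of why Attempt Two fails, assuming the class is tested on its own against single lines. The names CharClassDemo and AllowedByClass are made up for illustration; the point is only that the negated class rejects any line containing one of the listed characters, not the word itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharClassDemo
{
    // Hypothetical helper: true if every character of the input is accepted
    // by the negated class from Attempt Two. The class excludes the single
    // characters ( E N D S C T I O ), case-sensitively.
    public static bool AllowedByClass(string input)
    {
        return Regex.IsMatch(input, @"^[^(ENDSECTION)]*$");
    }

    public static void Main()
    {
        // Only lowercase letters and spaces, so nothing is excluded.
        Console.WriteLine(AllowedByClass("hello this is my section")); // True
        // Contains an uppercase 'S', which the class excludes.
        Console.WriteLine(AllowedByClass("Another Section"));          // False
        // Parentheses are ordinary excluded characters here, not grouping.
        Console.WriteLine(AllowedByClass("f(x)"));                     // False
    }
}
```

So a section body is cut short by the first capital S, N, or parenthesis it contains, which is why the second sample section never matches with this pattern.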


How would I match any character/word/etc except the pattern
"^ENDSECTION$", or maybe the word "\nENDSECTION\n" if it is easier?
Thanks.
 
I found a solution:

@"^SECTION$.*?^ENDSECTION$", RegexOptions.Multiline |
RegexOptions.Singleline

Using ".*?" instead of ".*" makes it match as few characters as
possible. I could probably also do something like (^ENDSECTION$){1} at
the end instead. Thanks. If someone has a better or more efficient way
to do it, or other comments, feel free.
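
A minimal sketch of the difference, assuming plain "\n" line endings and a two-section input. GreedyVsLazyDemo and CountMatches are illustrative names, not part of any library.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazyDemo
{
    // Same options as in the post: ^ and $ match at line boundaries,
    // and "." also matches newlines.
    private const RegexOptions Opts =
        RegexOptions.Multiline | RegexOptions.Singleline;

    // Hypothetical helper: number of separate matches found in the input.
    public static int CountMatches(string pattern, string input)
    {
        return Regex.Matches(input, pattern, Opts).Count;
    }

    public static void Main()
    {
        // Explicit \n so the result does not depend on the source file.
        string text =
            "SECTION\nfirst body\nENDSECTION\n" +
            "SECTION\nsecond body\nENDSECTION\n";

        // Greedy: ".*" runs to the end of the input and backtracks to the
        // last ENDSECTION line, so both sections end up in one match.
        Console.WriteLine(CountMatches(@"^SECTION$.*^ENDSECTION$", text));      // 1

        // {1} only means "exactly once", so it does not change anything.
        Console.WriteLine(CountMatches(@"^SECTION$.*(^ENDSECTION$){1}", text)); // 1

        // Lazy: ".*?" grows one character at a time and stops at the first
        // ENDSECTION line it reaches.
        Console.WriteLine(CountMatches(@"^SECTION$.*?^ENDSECTION$", text));     // 2
    }
}
```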
 
NvrBst said:
I want to match sections in a multi-line string. [...]

See example below.

Arne

=============================

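using System;
using System.Text.RegularExpressions;

namespace E
{
    public class Program
    {
        // Multiline makes ^ and $ match at line boundaries, so SECTION and
        // ENDSECTION only count when they are a whole line; Singleline lets
        // "." cross newlines. \r? accepts both CRLF and LF line endings,
        // which matters because a verbatim string takes its newlines from
        // the source file.
        private static readonly Regex re = new Regex(
            @"^SECTION\r?\n(.*?)^ENDSECTION\r?$",
            RegexOptions.Multiline | RegexOptions.Singleline | RegexOptions.Compiled);

        // Prints the body of every section found in s.
        public static void Parse(string s)
        {
            foreach (Match m in re.Matches(s))
            {
                Console.WriteLine("section=" + m.Groups[1].Value);
            }
        }

        public static void Main(string[] args)
        {
            string s = @"SECTION
hello this is my section
it can use SECTION or ENDSECTION if I want it to
and can be multi lined
ENDSECTION
SECTION
Another Section, can be any amount of sections in a file
ENDSECTION
";
            Parse(s);
            Console.ReadKey();
        }
    }
}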
 
Yup, the ".*?" solution has been working great for the problem I had,
thanks. For future reference though, is there an easy way to match
anything but a word? Something like [^ ], except for words or patterns
instead of characters? I remember using something like "~(WORD)" in a
regular expression in the past to match anything except "WORD", but I
forgot which tool that was in; it doesn't seem to work with the .NET
Regex class.

The problem I had is already solved, but not matching something tends
to be how I think about problems initially, and I haven't been able to
express it as a regular expression other than by expanding it; i.e.,
anything but "IN" could be something like
"(([^I]N) | (I[^N]) | ([^IN]))*", which would be tedious for larger
words.

Also, to correct myself: after thinking about it a little more,
"(^ENDSECTION$){1}" wouldn't work for the above ;)
 
NvrBst said:
On Dec 29, 6:20 pm, Arne Vajhøj wrote:

Yup, the ".*?" solution has been working great for the current problem
I had, thanks. For future reference though, is there an easy way to
match anything but a word? [...]

If you have any control over the data format at all, you may want to
consider storing the data in a database. The overhead of SQLite is very
low considering the advantages it offers.
 
Hello NvrBst,

You can use negative lookarounds for that:

(?:(?!word).)* matches any run of characters that does not contain the
specified word (add RegexOptions.Singleline if the run may span lines).
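
A rough sketch of that construct applied to the SECTION/ENDSECTION case, assuming the excluded "word" is a whole ENDSECTION line. NotWordDemo and Bodies are hypothetical names. The lookahead is tested before each character is consumed, so the repeated group cannot run past the first ENDSECTION line even though the quantifier is greedy.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NotWordDemo
{
    // A SECTION line, then any characters as long as an ENDSECTION line does
    // not start at the current position, then the ENDSECTION line itself.
    private static readonly Regex SectionRe = new Regex(
        @"^SECTION\r?\n((?:(?!^ENDSECTION\r?$).)*)^ENDSECTION\r?$",
        RegexOptions.Multiline | RegexOptions.Singleline);

    // Hypothetical helper: returns the body text of every section found.
    public static List<string> Bodies(string input)
    {
        var bodies = new List<string>();
        foreach (Match m in SectionRe.Matches(input))
        {
            bodies.Add(m.Groups[1].Value);
        }
        return bodies;
    }

    public static void Main()
    {
        // The first body mentions SECTION and ENDSECTION in the middle of a
        // line; those do not end the section because ^ fails there.
        string text =
            "SECTION\nuses SECTION or ENDSECTION inline\nENDSECTION\n" +
            "SECTION\nsecond\nENDSECTION\n";

        foreach (string body in Bodies(text))
        {
            Console.WriteLine("section=" + body);
        }
    }
}
```

On this input the sketch finds two sections, the same result the ".*?" version gives; the lookahead form is mainly useful when the "anything but this" part has to be embedded in a larger pattern.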
 
NvrBst said:
Yup, the ".*?" solution has been working great for the current problem
I had, thanks. For future reference though, is there an easy way to
match anything but a word? [...]

I think the reluctant quantifier is the way to do it.

Arne
 
Jesse said:
You can use negative lookarounds for that:

(?:(?!word).)* matches any run of characters that does not contain the
specified word.

I don't think that will work in this case.

A negative lookahead refuses to match when something is followed by
the word, but what is needed here is to refuse to match when something
is the word.

Arne
 