Regex: Split string where line starts with known value?

Michael Lang

original file contents:

;; any alpha string at least 1 char long
[NAMESPACE], "Namespace", "[a-zA-Z].";
;;
;; must start with alpha, can contain numeric or alpha chars only
[CLASSNAME], "Class Name", "[a-zA-Z][a-zA-Z0-9]."
;;
;; must start with alpha, can contain numeric or alpha chars only
[KEYPROPERTY], "Key Property", "[a-zA-Z][a-zA-Z0-9]."
;;
#CODE_TEMPLATE_START
... more lines here ...

I read this file in with:
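// Dispose the reader once the whole file has been read.
using (System.IO.StreamReader sr = new System.IO.StreamReader(file))
{
    _classTemplate = sr.ReadToEnd();
}
int iSplitStart = _classTemplate.IndexOf("#CODE_TEMPLATE_START");
// A length of iSplitStart - 1 stops one character before the marker, so the final '\n' is excluded.
_classHeader = _classTemplate.Substring(0, iSplitStart - 1);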

from locals window in debug mode:

_classHeader = ";; any alpha string at least 1 char long\r\n[NAMESPACE], \"Namespace\", \"[a-zA-Z].\";\r\n;;\r\n;; must start with alpha, can contain numeric or alpha chars only\r\n[CLASSNAME], \"Class Name\", \"[a-zA-Z][a-zA-Z0-9].\"\r\n;;\r\n;; must start with alpha, can contain numeric or alpha chars only\r\n[KEYPROPERTY], \"Key Property\", \"[a-zA-Z][a-zA-Z0-9].\"\r\n;;\r"

Now I need to split this header by any line that starts with a comment
marked as ";;". This should give me lines such as:
[NAMESPACE], "Namespace", "[a-zA-Z].";

[CLASSNAME], "Class Name", "[a-zA-Z][a-zA-Z0-9]."

[KEYPROPERTY], "Key Property", "[a-zA-Z][a-zA-Z0-9]."

My regex to do this is:

_rgxHeaderParser = new Regex("^;;", RegexOptions.Multiline
& RegexOptions.Compiled);

I call it like this:

string[] props = _rgxHeaderParser.Split(_classHeader);

I get 2 strings, one is empty, and the other contains the entire contents
of _classHeader. How do I fix this Regex?

Doesn't "\r\n" mean a newline? If not, how do I read the file so that
newlines in the file mean newline for a regex? I thought the Multiline
RegexOption would treat "^" as matching at the start of every line.
So why is it not splitting each time it finds ";;" at the beginning of a
line?
 
string[] lines = _classTemplate.Replace(";;", "~").Split('~');

This will have the effect you want from your Regex statement and is
much simpler, but you can do much better using Regex. For your regex
statement, try something like:

"^\[\w+\].*?$"

And process any line that matches that expression, but if you just
want to process any line that does not start with ";;" you can use
something like this:

"^[^;]{2}.*?$"

Josh
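
If the simpler, non-regex route is attractive, a line-by-line check is one option. Unlike Replace(";;", "~"), which replaces every ";;" wherever it occurs, it only treats ";;" as a comment marker at the start of a line. A rough sketch, assuming blank lines should be dropped as well (HeaderLineFilter and NonCommentLines are made-up names, not part of the project):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class HeaderLineFilter
{
    // Returns every non-empty line that does not start with ";;".
    public static List<string> NonCommentLines(string header)
    {
        List<string> lines = new List<string>();
        using (StringReader reader = new StringReader(header))
        {
            string line;
            // ReadLine treats "\r\n", "\n" and a lone "\r" as line terminators
            // and returns the line without them.
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Length == 0) continue;                               // skip blank lines
                if (line.StartsWith(";;", StringComparison.Ordinal)) continue; // skip comment lines
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

A ";;" that appears later in a line is left alone, because only the first two characters of each line are inspected.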

Thanks for the help!

";;" could be somewhere else in a line. I only want to match when it is
the beginning of a line. It is basically a comment marker like "//" in
C#. So I need the Regex.
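
One likely reason the original Split call returned only two strings is the way the options were combined. RegexOptions is a flags enum, so options are combined with "|". RegexOptions.Multiline & RegexOptions.Compiled has no bits in common and evaluates to RegexOptions.None, which leaves "^" matching only at the very start of the input. The sketch below compares the two on a shortened, made-up header (the HeaderSplitDemo class and its sample text are illustrative only):

```csharp
using System.Text.RegularExpressions;

public static class HeaderSplitDemo
{
    // Shortened, hypothetical stand-in for _classHeader, with \r\n line endings.
    public const string SampleHeader =
        ";; c1\r\n[A], \"a\"\r\n;;\r\n;; c2\r\n[B], \"b\"\r\n;;\r";

    // Options as written in the question: the bitwise AND of two distinct
    // flags is 0 (RegexOptions.None), so Multiline is not in effect.
    public static string[] SplitWithAnd(string header)
    {
        Regex rgx = new Regex("^;;", RegexOptions.Multiline & RegexOptions.Compiled);
        return rgx.Split(header);
    }

    // Options combined with bitwise OR, so "^" matches at every line start.
    public static string[] SplitWithOr(string header)
    {
        Regex rgx = new Regex("^;;", RegexOptions.Multiline | RegexOptions.Compiled);
        return rgx.Split(header);
    }
}
```

On this sample, SplitWithAnd returns 2 pieces (an empty string plus the rest of the header) and SplitWithOr returns 5. Even with the flags fixed, the pieces still contain the comment text and line breaks, since Split removes only the ";;" marker itself, so matching the wanted lines may be the more convenient route.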

updated code...
======================================================================
Regex _rgxHeaderParser = new Regex("^[^;]{2}.*?$", RegexOptions.Multiline);


// Dispose the reader once the whole file has been read.
using (System.IO.StreamReader sr = new System.IO.StreamReader(file))
{
    _classTemplate = sr.ReadToEnd();
}
int iSplitStart = _classTemplate.IndexOf("#CODE_TEMPLATE_START");
_classHeader = _classTemplate.Substring(0, iSplitStart - 1);


// Each match is one header line that does not start with ';'.
MatchCollection props = _rgxHeaderParser.Matches(_classHeader);
======================================================================

You'll be able to see this in action at:

https://sourceforge.net/projects/colcodegen/

(The project was just set up today, so it may not be available from
everywhere yet.)
 
The expression:

^[^;]{2}.*?$

will reject lines like ";some text" and "1" because it requires that there
be at least 2 characters in the line and neither of the first 2 characters
can be ";". To make sure that these cases are included (even though they
are probably not likely to occur), you could use zero-width negative
look-ahead:

^(?!;;).*?$
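
One detail worth checking with either pattern: the header uses "\r\n" line endings, and in .NET "$" under RegexOptions.Multiline matches just before "\n", while "." matches "\r". Each match of the look-ahead pattern would therefore be expected to carry a trailing "\r". A minimal sketch of one way around that, assuming empty lines should be skipped too (the HeaderLineMatcher name and the character-class variant are illustrative, not from this thread):

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HeaderLineMatcher
{
    // Look-ahead pattern as posted; "." also consumes the '\r' before '\n'.
    private static readonly Regex Lookahead =
        new Regex(@"^(?!;;).*?$", RegexOptions.Multiline);

    // Variant that stops before any '\r' or '\n' and requires at least one character.
    private static readonly Regex LookaheadNoCr =
        new Regex(@"^(?!;;)[^\r\n]+", RegexOptions.Multiline);

    // Matches of the posted pattern, returned exactly as the engine reports them.
    public static List<string> RawMatches(string header)
    {
        return Collect(Lookahead, header);
    }

    // Matches of the variant: non-empty lines not starting with ";;", without line endings.
    public static List<string> PropertyLines(string header)
    {
        return Collect(LookaheadNoCr, header);
    }

    private static List<string> Collect(Regex rgx, string header)
    {
        List<string> result = new List<string>();
        foreach (Match m in rgx.Matches(header))
        {
            result.Add(m.Value);
        }
        return result;
    }
}
```

Calling Trim() or TrimEnd('\r') on each Match.Value from the posted pattern would be another way to get the same strings. Lines such as ";some text" and "1" are kept by both variants.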



Brian Davis

