regexp - need help

  • Thread starter: Guest

Hi,

I want to parse strings like "<number><trailing-char><title>", where <number>
is a string containing digits and dots, <trailing-char> is a whitespace
character or a colon, and <title> is any sequence of characters.

I use a regexp (.NET Framework 1.1) to parse the strings. Here is the code:

static Regex reNumSepTitle = new
    Regex(@"^(?<number>[\d-\.]+)([:\s])+(?<title>.*)$");

public static string ExtractTitle(string str) {
    string title = str.Trim();
    Match m = reNumSepTitle.Match(title);
    if ( m.Success ) {
        return m.Result("${title}");
    }
    return title;
}

When I call the method with "1.1:\tHeading", it returns "Heading".

With "1.1.1.1:\tHeading", I get "\tHeading" (and the number part is '1.').

What's wrong?

Thanks.
 
Try either naming all of your groups or leaving all of them unnamed.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
To a tea you esteem
a hurting back as a wallet.
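
For context on the advice above, here is a minimal sketch of how .NET numbers groups when named and unnamed groups are mixed: unnamed groups are numbered first, then the named ones. The class name GroupNumberingDemo and the simplified character class [\d.] are illustrative assumptions, not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class (not from the original post).
public static class GroupNumberingDemo
{
    // Same shape as the pattern in the question:
    // two named groups around one unnamed, repeated group.
    private static readonly Regex Re =
        new Regex(@"^(?<number>[\d.]+)([:\s])+(?<title>.*)$");

    // Returns the group names in numbering order.
    public static string[] GroupNames()
    {
        return Re.GetGroupNames();
    }

    // Returns the value of the group with the given number,
    // or an empty string when the input does not match.
    public static string GroupValue(string input, int index)
    {
        Match m = Re.Match(input);
        return m.Success ? m.Groups[index].Value : string.Empty;
    }
}
```

With this pattern, GroupNames() should list "0", "1", "number", "title", so index 1 is the unnamed separator group and Groups.Count is 4. Accessing groups by name, as in m.Groups["title"], sidesteps the numbering question entirely.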
 
Thank you for your post.
I have tried both naming all the groups and leaving them all anonymous, but
it still doesn't work.


 
I am sorry.
That was not the exact behaviour; I hadn't posted the exact code snippet (see
below).

Actually, before trying to parse the string using the regexp
@"^(?<number>[\d-\.]+)([:\s])+(?<title>.*)$", the method tries to parse the
string using a regexp where the groups are separated by a control char (\x31).

So the actual behaviour is the following:
With the string "1.1.\tHeading", the _first_ regexp matches the string and
the method returns ".\tHeading".

If I use a printable ASCII char (like '|', for example), everything works
fine.

Thanks.

--- code ---
private const string reUS = "\\u0031";

// RE used for parsing a numbered title where the Unit Separator is used
static Regex reNumTitle = new Regex(@"^(?<number>.+)" + reUS +
    @"(?<trailing>.)*" + reUS + @"(?<title>.*)$");

// if not found with the previous RE, try '<num><sep><heading>' where sep is
// a tab or a colon
static Regex reNumSepTitle = new
    Regex(@"^(?<number>[\d-\.]+)([:\s])+(?<title>.*)$");

public static string ExtractTitle(string str) {
    string title = str.Trim();
    Match m = reNumTitle.Match(title);
    if ( !m.Success ) {
        m = reNumSepTitle.Match(title);
    }
    if ( m.Success ) {
        Debug.WriteLine("number:" + m.Result("${number}"));
        Debug.WriteLine("Title:" + m.Result("${title}"));
        if ( m.Groups.Count == 4 ) {
            title = m.Groups["title"].Value;
        }
    }
    return title;
}
 
Sorry again.

The mistake was a conversion error! The char I used is not a control char:
31 is the decimal code of the Unit Separator, but \u0031 takes a hexadecimal
value, and hex 31 is the code of the char '1'. The correct escape is \u001F.

Finally, everything works fine.

Thanks.
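
A minimal sketch of the corrected version, assuming the separator is meant to be the ASCII Unit Separator (U+001F, decimal 31). The class name TitleParser, the simplified fallback pattern, and the by-name group access are illustrative choices, not the original code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical class name (not from the original post).
public static class TitleParser
{
    // U+001F (decimal 31) is the Unit Separator; "\\u0031" would be the digit '1'.
    private const string UnitSep = "\\u001F";

    // '<number><US><trailing><US><title>'
    private static readonly Regex NumTitle = new Regex(
        @"^(?<number>.+)" + UnitSep + @"(?<trailing>.)*" + UnitSep + @"(?<title>.*)$");

    // Fallback: '<number><sep><title>' where sep is whitespace or a colon.
    // Assumption: the character class is simplified to digits and dots only.
    private static readonly Regex NumSepTitle = new Regex(
        @"^(?<number>[\d.]+)[:\s]+(?<title>.*)$");

    public static string ExtractTitle(string str)
    {
        string title = str.Trim();
        Match m = NumTitle.Match(title);
        if (!m.Success)
        {
            m = NumSepTitle.Match(title);
        }
        // Reading the group by name avoids depending on group numbering.
        return m.Success ? m.Groups["title"].Value : title;
    }
}
```

Under these assumptions, ExtractTitle("1.1.1.1:\tHeading") should return "Heading": the string contains no Unit Separator, so the first pattern fails and the fallback pattern applies.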

 
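Try using '\x1F' in your regular expression for control character 31
(decimal 31 is hexadecimal 1F).
Example:

^(?<number>.+)\x1F(?<trailing>.)*\x1F(?<title>.*)$

The character sequence you were using ('\\u0031') is a Unicode escape whose
digits are hexadecimal, so it matches the character '1' rather than a control
character. '\x31' has the same problem.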

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
To a tea you esteem
a hurting back as a wallet.
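
To check which character a regex escape actually denotes, a small helper along these lines may be useful. It is only a sketch: the class name EscapeCheck and the method Denotes are hypothetical, and the only library call it relies on is Regex.IsMatch.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper (not from the original thread).
public static class EscapeCheck
{
    // True when the given escape, used as a whole pattern,
    // matches exactly the single character c.
    public static bool Denotes(string escape, char c)
    {
        return Regex.IsMatch(c.ToString(), "^" + escape + "$");
    }
}
```

For example, Denotes(@"\x31", '1') and Denotes(@"\u0031", '1') should both be true, while Denotes(@"\x1F", (char)31) is the check that corresponds to the Unit Separator.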


 