Regular expressions

  • Thread starter: Marc Scheuner [MVP ADSI]

Marc Scheuner [MVP ADSI]

Folks,

I've started using regular expressions for parsing some data strings
that come along. It works quite nicely; however, as a newbie to RE, I'm
still struggling with some special cases:

1) I get strings representing decimal values, and they have the
following format:
* first, a plus or minus sign
* then, a number of numeric chars
* if the precision is > 0, then a dot and "precision" number of
numerical characters will follow

This last thing is what's killing me - how can I put the "conditional"
thing ("if precision > 0, then a dot and digits") into a single regular
expression?

What I have now is this:

[+-]\d{x,x}.\d{p,p}

which works fine, as long as I have a DECIMAL number with x digits
before and p digits after the decimal point (e.g. +120.45 if x=3 and p=2).

However, if p=0, a string such as +120 won't be matched, since it
doesn't contain the dot.

Any ideas??

2) For a date, which comes along as DD.MM.YYYY, I'd like to be able to
match both the cases where DD is either "01" or " 1" (leading 0 or
space). I tried various things, but nothing seems to work - it seems
if I allow it to have a leading whitespace, it'll also match
" 01.12.2003" (leading whitespace and then two digits for the day) -
this is *not* what I want!

I've tried \d{2,2}.\d{2,2}.\d{4,4}, which works fine if the day is
specified with a leading zero - but will fail if I pass it
" 1.12.2003". I also tried (\s\d|\d{2,2}).\d{2,2}.\d{4,4}, but as I
mentioned - in this case, all these dates are being matched:
"01.01.2003" - okay
" 1.01.2003" - okay
" 01.01.2003" - NOT okay (leading space + 2 digits)

Any takers?

Thanks!
Marc

================================================================
Marc Scheuner May The Source Be With You!
Bern, Switzerland m.scheuner(at)inova.ch
 
Got another riddle:

How can I make a regular expression that will match all of the
following:

DECIMAL
DECIMAL(5)
DECIMAL(15, 5) (fifteen-comma-space-five)

Anyone? I've tried numerous expressions - either I get too much
(everything), or I get only the last two (with the parentheses) - but
I can't seem to make it work for all three cases in just one
expression.

Thanks!
Marc

 
For #1, I don't know of a way to match based on
arithmetical comparisons (like p>0), so you would probably
have to use two different expressions:

if p>0 -> [+-]\d{x}\.\d{p}
if p=0 -> [+-]\d{x}
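
A minimal sketch of this two-expression approach, written in Python rather than .NET (these particular patterns should behave the same way in both engines). The function name `decimal_pattern` is made up for illustration, the dot is escaped so that it matches only a literal ".", and `fullmatch` is used so that the whole string has to match:

```python
import re

def decimal_pattern(x: int, p: int) -> re.Pattern:
    """Sign, exactly x digits and, only if p > 0, a dot plus exactly p digits."""
    if p > 0:
        return re.compile(rf"[+-]\d{{{x}}}\.\d{{{p}}}")
    return re.compile(rf"[+-]\d{{{x}}}")

print(bool(decimal_pattern(3, 2).fullmatch("+120.45")))  # True
print(bool(decimal_pattern(3, 0).fullmatch("+120")))     # True
print(bool(decimal_pattern(3, 2).fullmatch("+120")))     # False: fraction required
print(bool(decimal_pattern(3, 0).fullmatch("+120.45")))  # False: no fraction allowed
```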

This expression will make sure that a result does not end
with only a decimal point, but it will accept -123 even if
p>0:

[+-]\d{x}(?(\.\d)\.\d{p})$


Additionally, if you just wanted to make sure that no more
than p digits followed the decimal point, you could use
this:

[+-]\d{x}(?(\.\d)\.\d{0,p})$
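
For readers outside .NET: Python's `re` module has no lookahead conditional of the `(?(expression)yes)` kind, but as far as I can tell an optional group accepts the same strings as the two expressions above. A rough sketch, with x=3 and p=2 hard-coded purely as example values:

```python
import re

# Sign, exactly 3 digits, then optionally a dot followed by exactly 2 digits.
exact = re.compile(r"[+-]\d{3}(?:\.\d{2})?$")
# Same, but the optional fraction may have 1 or 2 digits (never a bare dot).
at_most = re.compile(r"[+-]\d{3}(?:\.\d{1,2})?$")

for s in ["+120.45", "-123", "+120.", "+120.4", "+120.456"]:
    print(repr(s), bool(exact.match(s)), bool(at_most.match(s)))
```

Both patterns accept "-123" and reject "+120."; only the second one accepts "+120.4".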


----------------------------------------
For #2, The last expression you listed:

(\s\d|\d{2,2}).\d{2,2}.\d{4,4}

matches correctly. If you check the Match object's value,
it will not include the leading space if a leading zero
exists. If you want to make sure that the input string
contains only what you are trying to match, place ^
(beginning of string) at the beginning and $ (end of
string) at the end (the dots are also escaped here so that
they match only a literal "."):

^(\s\d|\d{2})\.\d{2}\.\d{4}$
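
To see the difference between the two, here is a small sketch in Python (not .NET, though this pattern should behave the same in both). It runs the three test dates from the question through the unanchored and the anchored version; the variable names are just for illustration:

```python
import re

date = r"(\s\d|\d{2})\.\d{2}\.\d{4}"
unanchored = re.compile(date)
anchored = re.compile("^" + date + "$")

for s in ["01.01.2003", " 1.01.2003", " 01.01.2003"]:
    m = unanchored.search(s)
    # Show the input, what the unanchored pattern actually matched,
    # and whether the anchored pattern accepts the whole string.
    print(repr(s), repr(m.group(0)), bool(anchored.search(s)))
```

For " 01.01.2003" the unanchored pattern still finds a match, but the matched value is "01.01.2003" without the leading space; the anchored pattern rejects that input.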


Hope this helps,

Brian Davis
www.knowdotnet.com

 
Marc,

The usual way to do what you want is to use the '?' quantifier, which means
"match 0 or 1 time". So, to add the plus or minus, you get:

(\+|-)?\d+

plus or minus one or zero times followed by one or more digits.

Add in the decimal part:

(\+|-)?\d+(\.\d+)?

adds in "." followed by one or more digits, match the whole thing zero or
one time.
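
A quick way to try this out, sketched in Python rather than .NET. Note that the plus sign is escaped inside the group, since a bare + there would be read as a quantifier. The sample strings are made up, and `fullmatch` is used so that the whole string has to be a number:

```python
import re

number = re.compile(r"(\+|-)?\d+(\.\d+)?")

for s in ["+120.45", "-7", "120", "+120.", "abc"]:
    print(repr(s), bool(number.fullmatch(s)))
```

The first three strings match; "+120." does not, because the optional group needs at least one digit after the dot.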


Lastly, you might want to download my regular expression workbench at the
csharp site below. It will make playing with regex much easier.

--
Eric Gunnerson

Visit the C# product team at http://www.csharp.net
Eric's blog is at http://blogs.gotdotnet.com/ericgu/

This posting is provided "AS IS" with no warranties, and confers no rights.
 
> The usual way to do what you want is to use the '?' quantifier, which means
> "match 0 or 1 time". So, to add the plus or minus, you get:
> Add in the decimal part:
> (\+|-)?\d+(\.\d+)?

Yeah, I stumbled across that after a while of *not* looking at my
regex :-) Thanks.
> Lastly, you might want to download my regular expression workbench at the
> csharp site below. It will make playing with regex much easier.

Excellent, thanks so much!

Marc

 
> Try this one:
> ^DECIMAL(\(\d+(, \d+)?\))?$

Thanks - it would work, trouble is, I also need to recognize other
types such as DATE, INTEGER, CHAR(x), VARCHAR(y) and so forth, and
they're not on a line of their own (so I can't use the ^ and $
delimiters).

I think I got it figured out by now - thanks for your input! Highly
appreciated.
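
Marc doesn't say what he ended up with. One possible approach, sketched here in Python with a made-up sample string, is to swap the ^ and $ anchors for word boundaries (\b) and keep the parenthesised part optional. The list of type names is taken from the post above; the rest is an assumption:

```python
import re

# Type name as a whole word, optionally followed by (n) or (n, m).
type_re = re.compile(
    r"\b(DECIMAL|INTEGER|DATE|VARCHAR|CHAR)\b(?:\((\d+)(?:, (\d+))?\))?"
)

# Hypothetical input: type names embedded in a longer line.
text = ("ID INTEGER, PRICE DECIMAL(15, 5), QTY DECIMAL(5), "
        "TOTAL DECIMAL, NAME VARCHAR(40), BORN DATE")

for m in type_re.finditer(text):
    print(m.group(0))
```

On this sample it finds INTEGER, DECIMAL(15, 5), DECIMAL(5), DECIMAL, VARCHAR(40) and DATE, in that order.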

Marc

 