Regular Expression Pattern Help!

  • Thread starter: Drew

Drew

We have a string like: 2004\05\05T00:00:00

We would like to remove all "\", any "-" and the T00:00:00 part of the
string.

The end result would be 20040505.

I know that if I use [\\-] as the pattern it will match every backslash and
hyphen, but not the final T00:00:00 part.
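A quick C# check of the attempt described above (a sketch; the class name `SeparatorDemo` is made up for illustration). The character class `[\\-]` matches exactly one backslash or one hyphen, so `Regex.Replace` removes every separator, but nothing in the pattern can match the time suffix.

```csharp
using System.Text.RegularExpressions;

public static class SeparatorDemo
{
    // [\\-] is a character class: one backslash or one hyphen.
    // Regex.Replace removes every such match, but the "T00:00:00"
    // suffix contains neither character, so it survives untouched.
    public static string StripSeparators(string input)
    {
        return Regex.Replace(input, @"[\\-]", "");
    }
}
```

Calling `SeparatorDemo.StripSeparators(@"2004\05\05T00:00:00")` returns `"20040505T00:00:00"`, i.e. the leftover `T00:00:00` part from the question.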

Could anyone please help us with this pattern.

TIA

Drew Mace
 
William Ewart Gladstone

I would just use the .Replace and .Substring methods.

So,

myString.Replace("\\", "").Replace("-", "").Substring(0, 8)
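The one-liner above, wrapped in a small C# method as a sketch (`ReplaceApproach` and `ToCompactDate` are hypothetical names). It assumes the date part is always zero-padded, as in `yyyy\MM\dd` or `yyyy-MM-dd`, because `Substring(0, 8)` simply keeps the first eight characters of whatever is left.

```csharp
using System;

public static class ReplaceApproach
{
    // Removes backslashes and hyphens, then keeps the first eight
    // characters (assumed to be yyyyMMdd). The time part is dropped
    // only because it comes after those eight characters.
    public static string ToCompactDate(string input)
    {
        string stripped = input.Replace("\\", "").Replace("-", "");
        if (stripped.Length < 8)
        {
            // Substring(0, 8) would throw anyway; this gives a clearer message.
            throw new ArgumentException("Input is too short to contain a yyyyMMdd date.", nameof(input));
        }
        return stripped.Substring(0, 8);
    }
}
```

The length check only makes a short input fail more clearly; a layout such as `2004\5\05T00:00:00` would still yield a wrong eight-character result.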

 
Hi,

The following VBScript shows how you might make it work.

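' Sample input; backslashes are not escape characters in VBScript strings
StringToSearch = "2004\05\05T00:00:00"
Set RegularExpressionObject = New RegExp
With RegularExpressionObject
    ' Match a backslash, a hyphen, or a time suffix shaped like T..:..:..
    .Pattern = "\\|-|T..:..:.."
    .IgnoreCase = True
    ' Replace every match, not just the first one
    .Global = True
End With
' Shows 20040505
MsgBox RegularExpressionObject.Replace( StringToSearch, "" )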

Of course whether this will work for all your cases depends on how fixed
the structure of the input is.
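For readers working in .NET rather than VBScript, a rough C# equivalent of Bruce's approach, with `-` in the alternation so hyphens are dropped too, as the question asks (the class name `BrucePatternPort` is hypothetical). `Regex.Replace` replaces every match by default, which plays the role of `.Global = True`, and `RegexOptions.IgnoreCase` mirrors `.IgnoreCase = True`.

```csharp
using System.Text.RegularExpressions;

public static class BrucePatternPort
{
    // A backslash, a hyphen, or a time suffix of the fixed shape T..:..:..
    // where each "." must match exactly one character.
    private const string Pattern = @"\\|-|T..:..:..";

    public static string Strip(string input)
    {
        // Static Regex.Replace replaces all matches, like VBScript's Global = True.
        return Regex.Replace(input, Pattern, "", RegexOptions.IgnoreCase);
    }
}
```

The fixed-width `T..:..:..` is what Bruce's caveat is about: an input such as `2004\05\05T0:00:00` keeps its `T0:00:00` suffix, because the hour field has only one character.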

Regards,

- Bruce.
 
To match the backslashes, hyphens and the T... part, you can use "\\|-|(T.*)".
However, I'd suggest replacing "(\d\d\d\d)\\(\d\d)\\(\d\d)T\d\d:\d\d:\d\d"
with "$1$2$3".
Using this expression you can be sure you won't mess up data that isn't
formatted as expected: text that doesn't match the pattern is left unchanged.
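A C# sketch of the capture-group suggestion (the class and method names are hypothetical). Two changes to Niki's pattern are assumptions, not part of the thread: `^`/`$` anchors, so a date embedded in other text is not spliced into it, and `[\\-]` in place of `\\`, so hyphen-separated dates are accepted as well.

```csharp
using System.Text.RegularExpressions;

public static class CaptureGroupApproach
{
    // Groups 1-3 capture year, month and day; the time part must be present
    // but is not captured. Anchors and [\\-] are assumed additions.
    private const string Pattern =
        @"^(\d\d\d\d)[\\-](\d\d)[\\-](\d\d)T\d\d:\d\d:\d\d$";

    // Returns yyyyMMdd, or the input unchanged if it has a different layout.
    public static string ToCompactDate(string input)
    {
        return Regex.Replace(input, Pattern, "$1$2$3");
    }
}
```

Input that doesn't fit the layout, for example `"not a date"`, comes back unchanged, which is the property described above.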

Niki
 
Nick Malik

It would be much easier to read your code if you didn't use regular
expressions. How about this idea:

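string sourcestring = @"2004\05\05T00:00:00";   // verbatim string, so "\0" is not read as an escape

string[] listbits = sourcestring.Split('T');                 // "2004\05\05", "00:00:00"
string[] datebits = listbits[0].Split(new[] { '\\', '-' });  // "2004", "05", "05"
string newdate = String.Concat(datebits);                    // "20040505"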

*caveat: air code... I didn't compile this in VS first.

That's pretty easy code, and it will work as long as the original data HAS a
"T" in it. (Fails otherwise).

Hope this helps,
--- N
 
I have to disagree with you: If you think 3 lines are more readable than 1
line, that's probably because you never learned regular expressions.

A few advantages of regexes:
- if you do understand them, they are easier to read than C# code, because:
- they have no branches, exceptions, loops, etc, everything's
"straightforward" (quite literally)
- instead of specifying the transformation rules (which are pretty hard to
understand if you don't know what the source string will look like) you
specify the input pattern and the target output - that's high-level
programming;
- they can be compiled, and are often faster than a C# equivalent (make your
own timings if you don't trust me)
- they don't produce exceptions or endless loops due to bad/unexpected data
- they can easily be separated from the main program and extensively tested
(e.g. in Expresso)
- they don't have side-effects, i.e. the pattern can't mess around with any
global/local variables (yes, this is a BIG source for errors!)
- using regexes it's far easier to create "picky" code that doesn't
produce bad results from bad input; Your code e.g. would happily take
"Hello\\how\\are\\you\\Today" and transform it to "Hellohowareyou" - I
wouldn't want that in the "Date" column of my database...
(plus a few more good reasons I can't think of right now)
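To make the "picky code" and "compiled" points concrete, a C# sketch contrasting a split-based conversion like Nick's with a regex that only accepts the expected layout. The names are hypothetical, and returning `null` for non-matching input is just one possible design choice. `RegexOptions.Compiled` is a real .NET option; whether it is actually faster for a given workload is something to measure, as suggested above.

```csharp
using System.Text.RegularExpressions;

public static class PickyDateParser
{
    // Built once and reused; Compiled trades start-up cost for faster matching.
    private static readonly Regex DatePattern = new Regex(
        @"^(\d{4})\\(\d{2})\\(\d{2})T\d{2}:\d{2}:\d{2}$",
        RegexOptions.Compiled);

    // Split-based conversion: transforms any input, date or not.
    public static string SplitVersion(string input)
    {
        string[] listbits = input.Split('T');
        string[] datebits = listbits[0].Split(new[] { '\\', '-' });
        return string.Concat(datebits);
    }

    // Regex-based conversion: returns null unless the whole input
    // has the expected yyyy\MM\ddThh:mm:ss layout.
    public static string RegexVersion(string input)
    {
        Match m = DatePattern.Match(input);
        return m.Success
            ? m.Groups[1].Value + m.Groups[2].Value + m.Groups[3].Value
            : null;
    }
}
```

With the input `Hello\how\are\you\Today`, `SplitVersion` returns `"Hellohowareyou"` while `RegexVersion` returns `null`; for `2004\05\05T00:00:00` both return `"20040505"`.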

I agree with you that regexes are often misused when "string.Split" would
have done the same, but THIS kind of problem is what they have actually been
MADE for.

Niki