Character Escapes Don't Work in VB Regex Replace?

Anyone know of a fix (ideally) or an easy workaround to the problem of escape characters not working in regex replacement text? They just come out as literal text.

For example, you'd think that this

Regex.Replace("<stuff>text</stuff>", "<stuff>", "<stuff>\n")

would give you

<stuff>
text</stuff>

But it doesn't. Instead, you get

<stuff>\ntext</stuff>

That's a totally arbitrary example. Escape characters just don't work at all in VB regex replacements, any of them. They work in C#, but that doesn't do me much good unless I start over.

Thanks
Chris
 
You can just append the actual characters using Environment.NewLine or
vbLf. VB does not use character escapes in strings the way C# does; instead
you concatenate the string with the character, i.e.


Regex.Replace("<stuff>text</stuff>", "<stuff>", "<stuff>" & Environment.NewLine)
-or-
Regex.Replace("<stuff>text</stuff>", "<stuff>", "<stuff>" & vbLf)


Note that this is not specific to Regex.Replace; it is just that C# supports
character escapes in strings while VB does not.


Brian Davis
www.knowdotnet.com



 
I guess I should have pointed that out in the original posting...

I could certainly do it like that or by a few other means in the code. But the problem is that I'm loading replace expressions dynamically from a text file and I need the ability (specifically) to allow Unicode characters such as \u200e in the replace expression, and also the whole range of general character escapes. I can't hard code those. I could come up with a little parser to replace character escapes from the script before sending them to the regex replace, but that's a pain.

There's no reason that VB shouldn't support this. The regex engine is part of the .NET Framework and should conform to common functionality, as far as I'm concerned. Anyway, the fact that VB doesn't generally support escape characters doesn't mean that they shouldn't work in regex. After all, $1 means nothing special in VB, but does in a regex replace string.

Now I'm just ranting... Point being, I need to do this the way it should work. I'm hoping there's either some sort of update available or some crazy secret syntax such as $#\\u002e to do what I need.

Thanks.
 
Chris,
As Brian suggested:

VB.NET just does not support C# escape sequences, nor does VB.NET define its
own escape sequences!


The "\n" is only supported in regular expressions and replacement patterns,
not in the replace string, according to the Character Escapes section of
Regular Expression Language Elements:

http://msdn.microsoft.com/library/d...l/cpconRegularExpressionsLanguageElements.asp

This is not a VB.NET problem per se; C# and every other .NET language will
have the same problem, as the Regex class itself is defining this behavior.
(Read as: I hope your rant is against the Regex class and not VB.NET! ;-))

Unfortunately I do not know of a predefined routine that will replace the C#
escape sequences with their respective characters. If you build one, I would
recommend using a StringBuilder in the implementation.

Hope this helps
Jay
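
For anyone wanting to try the routine Jay describes, here is a minimal sketch of what it might look like. It is not from any poster in this thread. ExpandEscapes is a hypothetical name, it handles only \n, \r, \t, \\ and \uXXXX, and it copies anything else through unchanged. Integer.TryParse needs .NET 2.0 or later.

```vbnet
Imports System.Globalization
Imports System.Text

Module EscapeSketch
    ' Hypothetical helper: expands a small subset of C#-style escapes
    ' (\n, \r, \t, \\ and \uXXXX). Unknown sequences are left as-is.
    Function ExpandEscapes(ByVal input As String) As String
        Dim sb As New StringBuilder(input.Length)
        Dim i As Integer = 0
        While i < input.Length
            Dim c As Char = input.Chars(i)
            If c <> "\"c OrElse i = input.Length - 1 Then
                ' Ordinary character, or a trailing backslash: copy it.
                sb.Append(c)
                i += 1
            Else
                Select Case input.Chars(i + 1)
                    Case "n"c
                        sb.Append(vbLf)
                        i += 2
                    Case "r"c
                        sb.Append(vbCr)
                        i += 2
                    Case "t"c
                        sb.Append(vbTab)
                        i += 2
                    Case "\"c
                        sb.Append("\"c)
                        i += 2
                    Case "u"c
                        ' Expect exactly four hex digits after \u.
                        Dim code As Integer
                        If i + 6 <= input.Length AndAlso _
                           Integer.TryParse(input.Substring(i + 2, 4), _
                                            NumberStyles.AllowHexSpecifier, _
                                            CultureInfo.InvariantCulture, code) Then
                            sb.Append(ChrW(code))
                            i += 6
                        Else
                            sb.Append(c)
                            i += 1
                        End If
                    Case Else
                        ' Not an escape this sketch knows: keep the backslash.
                        sb.Append(c)
                        i += 1
                End Select
            End If
        End While
        Return sb.ToString()
    End Function
End Module
```

You would call it on the replacement text before handing it to Regex.Replace, e.g. Regex.Replace(input, pattern, ExpandEscapes(replacementFromFile)). Substitutions such as $1 contain no backslash, so they pass through untouched.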

 
Well, I figured I was hoping too much. I'd still call it a bug, though. It actually says in the documentation for regex that all character escapes are supported both in regular expressions and replacement patterns.

http://msdn.microsoft.com/library/d.../en-us/cpgenref/html/cpconcharacterescapes.asp

Unfortunately, it's only half true in VB. I only mention C# because I wanted to see if this was a VB-specific issue, or the .NET regex engine. Using escape characters in C# replacement patterns does work as it says in the documentation, and as in every other language or application that implements regex. I've been using VB for 6 years, and think .NET is great, but it's very disappointing and confusing that VB doesn't implement the regex behavior in a standard way. I'll probably end up writing my own fix, and share it with anyone else who is frustrated by this.
 
Chris,
Correct, the pattern to match, which is the second argument to
Regex.Replace (in the sample you gave), supports \n. That is how I read
"replacement pattern" on that page.

However! The "replacement", which is the third argument to Regex.Replace,
which is your argument with \n in it, is not listed on the page you gave.

http://msdn.microsoft.com/library/d...RegularExpressionsRegexClassReplaceTopic6.asp
As for escape characters working in C# replacement patterns "as it says in
the documentation": No! They do not work!! :-| What documentation? (not the
page you gave!)

They do not work in that Regex will not honor them; however, C# itself may.
Try the following (in C#):

string s =
System.Text.RegularExpressions.Regex.Replace(@"<stuff>text</stuff>",
@"<stuff>", @"<stuff>\n");
System.Diagnostics.Debug.WriteLine(s);

Here the @ prefix (a verbatim string) tells C# not to process its own escape
sequences, so Regex receives the backslash and the n as two characters.

Notice that the result still contains the \n, as Regex does not modify the
\n in the replacement text.

Hope this helps
Jay


 
Chris,
If you are still following this thread...

While searching for something else, I just came across Regex.Escape and
Regex.Unescape, which will escape and unescape strings for you (including
whitespace). It may help in your efforts.

http://msdn.microsoft.com/library/d...xtRegularExpressionsRegexClassEscapeTopic.asp

http://msdn.microsoft.com/library/d...RegularExpressionsRegexClassUnescapeTopic.asp

Hope this helps
Jay
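
A minimal sketch of how Regex.Unescape might be applied to the case discussed in this thread, where the replacement text is loaded from a file. ReplaceWithEscapes and replacementFromFile are hypothetical names used only for illustration.

```vbnet
Imports System.Text.RegularExpressions

Module UnescapeSketch
    ' replacementFromFile stands in for a line read from the script file
    ' (hypothetical); it may contain escapes such as \n or \u200e.
    Function ReplaceWithEscapes(ByVal input As String, _
                                ByVal pattern As String, _
                                ByVal replacementFromFile As String) As String
        ' Convert the escape sequences to real characters first...
        Dim replacement As String = Regex.Unescape(replacementFromFile)
        ' ...then let Regex.Replace handle substitutions such as $1.
        Return Regex.Replace(input, pattern, replacement)
    End Function
End Module
```

With this, ReplaceWithEscapes("<stuff>text</stuff>", "<stuff>", "<stuff>\n") should return "<stuff>" & vbLf & "text</stuff>". Two caveats, as far as I can tell: Regex.Unescape removes the backslash from any escaped character, so \$ becomes a plain $ that Regex.Replace may then treat as the start of a substitution, and it throws an ArgumentException on sequences it does not recognize, so the script file has to stick to valid escapes.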


 