Regular Expression

  • Thread starter: Toby

Toby

Could someone tell me how I could create a search-and-replace regular expression in .NET
that would

match

MY_STRING_TO_BE_CONVERTED

and replace with

MyStringToBeConverted

Thanks

Toby.
 
Toby,

For that you do not need a regular expression.

MY_STRING_TO_BE_CONVERTED= MyStringToBeConverted

I think you mean something else; can you make that clearer?

Cor
 
Sorry,

I mean more of a search-and-replace pattern. It could be any string, but
in the format

UPPERCASE_DELIMITED_STRING

converted to

CapitalizedUndelimitedString.

Regards

Toby.
 
Using a regular expression to replace the text you are looking for might be
overkill. Why not just use the .Replace function, or an assignment as Cor
suggested? In any case, to use a regex you need to either import the
System.Text.RegularExpressions namespace and use one of the shared
functions (like Regex.Replace), create an instance of
System.Text.RegularExpressions.Regex, or fully qualify one of the shared
functions.

Imports System.Text.RegularExpressions ' must be at the top of the code file

' Where MyTextBoxControl.Text is your input source
MyTextBoxControl.Text = Regex.Replace(MyTextBoxControl.Text, _
    "MY_STRING_TO_BE_CONVERTED", "MyStringToBeConverted")
 
I'm afraid this is not possible: it's possible to match the characters
you want to replace, but a regex can only replace with characters from the
match (which in your case are uppercase) or with fixed characters (not what you
want either). There is no way to replace a character with the lowercase
version of the match. In theory you could build 26 different regexes (one
for each letter) and apply them one after another...
Maybe you could use Regex.Match("[A-Z0-9_]+") to find the whole capitalized
identifiers, and do the actual manipulation in a for loop.

Niki
 
Sorry, that should of course be Regex.Match("\b[A-Z0-9_]+\b").
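For comparison: .NET's Regex.Replace also has an overload that takes a MatchEvaluator delegate. The regex engine calls your own function once per match and uses whatever string it returns as the replacement, which is essentially the "for loop" suggested above, built into the replace. The following is a minimal sketch, assuming VB 2005 / .NET 2.0 or later (for Char.ToUpperInvariant); the names IdentifierConverter, ToPascalCase, ConvertMatch and ConvertAll are illustrative only. Note that Niki's pattern also matches plain upper-case words without underscores, so "SELECT" would become "Select".

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module IdentifierConverter

    ' Turns one UPPERCASE_DELIMITED identifier into PascalCase,
    ' e.g. "MY_STRING_TO_BE_CONVERTED" -> "MyStringToBeConverted".
    Function ToPascalCase(ByVal identifier As String) As String
        Dim sb As New StringBuilder(identifier.Length)
        For Each part As String In identifier.Split("_"c)
            ' Skip empty parts produced by "__" or a leading/trailing "_".
            If part.Length > 0 Then
                sb.Append(Char.ToUpperInvariant(part(0)))
                sb.Append(part.Substring(1).ToLowerInvariant())
            End If
        Next
        Return sb.ToString()
    End Function

    ' MatchEvaluator callback: Regex.Replace calls it once per match and
    ' substitutes the returned string for the matched text.
    Function ConvertMatch(ByVal m As Match) As String
        Return ToPascalCase(m.Value)
    End Function

    ' Niki's corrected pattern: whole words made of A-Z, 0-9 and "_".
    Function ConvertAll(ByVal text As String) As String
        Return Regex.Replace(text, "\b[A-Z0-9_]+\b", AddressOf ConvertMatch)
    End Function

    Sub Main()
        ' Prints "MyStringToBeConverted"
        Console.WriteLine(ConvertAll("MY_STRING_TO_BE_CONVERTED"))
    End Sub

End Module
```

Because the conversion lives in an ordinary function, it can be tested on its own, and the regex is only responsible for finding the words.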

 
Toby,

I do not like the regex; I find it complicated, and it is also very slow.

You got an answer from Jared; however, as an alternative I give you this.

\\\
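Dim myStr As String = "MY_STRING_TO_BE_CONVERTED"
' Split on the underscores: "MY", "STRING", "TO", "BE", "CONVERTED"
Dim myArr() As String = myStr.Split("_"c)
For i As Integer = 0 To myArr.Length - 1
    ' "MY" -> "My", "STRING" -> "String", ...
    myArr(i) = StrConv(myArr(i), VbStrConv.ProperCase)
Next
' Join the parts without a separator: "MyStringToBeConverted"
MessageBox.Show(String.Join("", myArr))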
///

I am not saying you should not use the regex; however, most situations
can be handled with simpler commands, which show more clearly what you are
doing in the program.

That does not mean I am saying you should never use them.

I hope this helps

Cor
 
Cor Ligthert said:
Toby,

I do not like the regex, I find it complicated and thereby it is very
slow.

Actually it's pretty fast, especially if the search and match strings are long.
You got an answer from Jared, however as alternative I give you this.

\\\
Dim myStr As String = "MY_STRING_TO_BE_CONVERTED"
Dim myArr() As String = myStr.Split("_"c)
For i As Integer = 0 To myArr.Length - 1
myArr(i) = StrConv(myArr(i), VbStrConv.ProperCase)
Next
MessageBox.Show(String.Join("", myArr))
///

This is probably the slowest alternative...
Use a StringBuilder if you want to do complex string manipulations.
With that I do not say you should not use the regex, however most situations
can be done with easier commands, which show more what you are doing in the
program.

Only for people who don't know regexes, that is...

Niki
 
Niki,
slow.
Actually it's pretty fast, especially if search and match string are long.

True, I forgot to say: slow in non-complex situations (more than
100 times slower than String.Replace, which in turn performs a little better
than StringBuilder.Replace).
This is probably the slowest alternative...
Use a StringBuilder if you want to do complex string maniplations.

In my opinion this is not a complex string manipulation; it is a very
simple array manipulation. So can you show an example where the StringBuilder
is faster for this sample?

Cor
 
Thanks for all your answers. My problem is that I have a database schema which
was coded using field names and table names in uppercase with underscores, and
I wanted a quick way to update the formatting so that the underscores are
removed and the fields / tables are capitalized.

I then have the same problem in my stored procedures, business objects,
datasets and code. I was looking for a quick solution without writing any
code, but maybe I have to go down that route.

Thanks

Toby.
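For the bulk renaming described here, a single Regex.Replace with a MatchEvaluator can rewrite a whole script in one pass. The sketch below is only one way to do it, assuming that every name to convert contains at least one underscore (so upper-case SQL keywords such as SELECT and FROM are left alone) and assuming .NET 2.0 or later. SchemaRenamer, ConvertName and RenameIdentifiers are hypothetical names. Anything else that fits the pattern, such as text inside comments or string literals, would be rewritten as well, so the output needs reviewing.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SchemaRenamer

    ' Whole words of upper-case letters/digits with at least one "_",
    ' e.g. CUSTOMER_ID. Plain words like SELECT or ID do not match.
    Private ReadOnly IdentifierPattern As New Regex("\b[A-Z0-9]+(?:_[A-Z0-9]+)+\b")

    ' MatchEvaluator: "CUSTOMER_ID" -> "CustomerId".
    ' The pattern guarantees every part between underscores is non-empty.
    Function ConvertName(ByVal m As Match) As String
        Dim result As String = ""
        For Each part As String In m.Value.Split("_"c)
            result &= Char.ToUpperInvariant(part(0)) & part.Substring(1).ToLowerInvariant()
        Next
        Return result
    End Function

    Function RenameIdentifiers(ByVal script As String) As String
        Return IdentifierPattern.Replace(script, AddressOf ConvertName)
    End Function

    Sub Main()
        Dim sql As String = "SELECT CUSTOMER_ID, ORDER_DATE FROM SALES_ORDER WHERE ID = 1"
        ' Prints "SELECT CustomerId, OrderDate FROM SalesOrder WHERE ID = 1"
        Console.WriteLine(RenameIdentifiers(sql))
    End Sub

End Module
```

To convert a file rather than a string, the input could be read with System.IO.File.ReadAllText and the result written back with File.WriteAllText, keeping a copy of the original.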
 
<flame on>
Honestly, Niki,
The world isn't made up of just two kinds of people:
people who do things your way, and people who are wrong.

Cor's solution was simple and easy to follow. It works just fine for the
specific purpose, and it is far easier to debug than a regular expression,
which is difficult for most folks to compose (to the point where there are a
number of non-trivial tools running around to allow folks to debug their
regular expressions).

On technical points, you are right. However, your tone is condescending and
your contribution to the thread did not add any useful content. If you feel
that regular expressions are so powerful, how come Cor was able to whip up a
code example in the two minutes it took him to answer the post, but you
didn't whip up a regex example to show how much faster and cooler regex is?

I'll venture a guess to my own question: Perhaps this is because Cor's
example is so much simpler than a regex, which would require embedded
expressions, would be fairly difficult for a novice to code, and nearly
impossible to explain in a newsgroup post.

At least Cor answered the question.
<flame off>

Respectfully and in the spirit of constructive criticism,
--- Nick Malik
Application Architect
 
<flaming reply>
The point is that the OP specifically asked if his problem could be solved
using regular expressions. Do you really assume he doesn't know how to do it
in plain C#/VB? Honestly, coding a for loop over a string is quite a common
task; I don't think the OP has to ask a newsgroup how to do that.
So no, I don't think Cor answered the question. I think he stated a personal
opinion (that he doesn't like regexes). And I chose to comment on that
opinion.
For the "technical information" part: you probably didn't read my other
post, so I'll ignore that.
<flame off>

Criticism acknowledged, I'll try to change my tone in the future.
Or put it into "flame"-tags ;-)

Niki
 
Nick,

Thanks; however, Niki's answer is technically not right. We did more
tests in the dotnet.language.vb newsgroup, every time with the regex
involved. In simple operations the regex is extremely slow. (Not as slow as
the split I showed in situations with very large documents; however, because
I needed that StrConv and the elimination of the "_", I did it this way.)

The fastest way to do a simple replace is with String.Replace (that was
to my surprise as well, but it is logical: internally there is no need to use
an immutable string in a replace operation, so you even avoid the overhead
of the StringBuilder).

However, when a replace requires very many instructions, I think the
throughput of the regex will be better. Until I have tested that, I assume
the break-even point is around 100 instructions, because in what I saw the
regex was at least that many times slower.

Cor
 
Cor Ligthert said:
...
This is in my opinion not a complex string manipulation this is a very
simple array manipulation, so show an example where the stringbuilder is
faster in this sample?

Your main argument against regexes was that they're complex and slow: using
String.Split is nice, but it's known to be slow. And functions like StrConv
are nice too, but they are complex.

This should show the time difference:

Module Module1
    Sub Main()
        Dim d0 As DateTime, d1 As DateTime, d2 As DateTime
        d0 = DateTime.Now
        For i As Integer = 0 To 100000
            Test1()
        Next
        d1 = DateTime.Now
        For i As Integer = 0 To 100000
            Test2()
        Next
        d2 = DateTime.Now
        Console.WriteLine("Using String.Split: {0}", _
            New TimeSpan(d1.Ticks - d0.Ticks))
        Console.WriteLine("Using StringBuilder: {0}", _
            New TimeSpan(d2.Ticks - d1.Ticks))
    End Sub

    ' Cor's version: split on "_", proper-case each part, join the parts.
    Function Test1() As String
        Dim myStr As String = "MY_STRING_TO_BE_CONVERTED"
        Dim myArr() As String = myStr.Split("_"c)
        For i As Integer = 0 To myArr.Length - 1
            myArr(i) = StrConv(myArr(i), VbStrConv.ProperCase)
        Next
        Test1 = String.Join("", myArr)
    End Function

    ' Char-by-char version: drop each "_", keep the first char of every
    ' word unchanged and lower-case the rest.
    Function Test2() As String
        Dim myStr As String = "MY_STRING_TO_BE_CONVERTED"
        Dim builder As New System.Text.StringBuilder
        Dim nextCharLower As Boolean = False

        For i As Integer = 0 To myStr.Length - 1
            If myStr.Chars(i) = "_"c Then
                nextCharLower = False
            Else
                If nextCharLower Then
                    builder.Append(Char.ToLower(myStr.Chars(i)))
                Else
                    builder.Append(myStr.Chars(i))
                End If
                nextCharLower = True
            End If
        Next
        Test2 = builder.ToString
    End Function
End Module

Sorry if the code's not that pretty. VB isn't my "mother language".

Niki
 
Niki,

You are right: with 100,000 loops your procedure is 3 seconds faster on my
computer than the one I built.

However, I was expecting you to write a routine with Replace, not a
character-by-character loop.

So I have added, as Test3, a routine which is twice as fast as yours
because it does not use the StringBuilder. I hope you do not mind that I did
not optimize it, and I can assure you that I will never use it in real practice.

I also changed the way the ticks are measured, just to make it easier; maybe
you can use that as well (your choice).

Maybe you can now show it with a Regex as a fourth test.

Cor

\\\
Module Module1
    Sub Main()
        Dim d0 As Integer = Environment.TickCount
        For i As Integer = 0 To 100000
            Test1()
        Next
        Console.WriteLine("Using String.Split: {0}", _
            Environment.TickCount - d0)
        d0 = Environment.TickCount
        For i As Integer = 0 To 100000
            Test2()
        Next
        Console.WriteLine("Using StringBuilder: {0}", _
            Environment.TickCount - d0)
        d0 = Environment.TickCount
        For i As Integer = 0 To 100000
            Test3()
        Next
        Console.WriteLine("Using Encoding: {0}", _
            Environment.TickCount - d0)
    End Sub

    Function Test1() As String
        Dim myStr As String = "MY_STRING_TO_BE_CONVERTED"
        Dim myArr() As String = myStr.Split("_"c)
        For i As Integer = 0 To myArr.Length - 1
            myArr(i) = StrConv(myArr(i), VbStrConv.ProperCase)
        Next
        Test1 = String.Join("", myArr)
    End Function

    Function Test2() As String
        Dim myStr As String = "MY_STRING_TO_BE_CONVERTED"
        Dim builder As New System.Text.StringBuilder
        Dim nextCharLower As Boolean = False
        For i As Integer = 0 To myStr.Length - 1
            If myStr.Chars(i) = "_"c Then
                nextCharLower = False
            Else
                If nextCharLower Then
                    builder.Append(Char.ToLower(myStr.Chars(i)))
                Else
                    builder.Append(myStr.Chars(i))
                End If
                nextCharLower = True
            End If
        Next
        Test2 = builder.ToString
    End Function

    ' Works on ASCII bytes instead of chars.
    Function Test3() As String
        Dim myCoder As System.Text.Encoding = System.Text.Encoding.ASCII
        Dim nextCharLower As Boolean = False
        Dim myStr() As Byte = myCoder.GetBytes("MY_STRING_TO_BE_CONVERTED")
        Dim myStrOut(myStr.Length) As Byte
        Dim y As Integer = 0 ' number of output bytes kept so far
        For i As Integer = 0 To myStr.Length - 1
            If nextCharLower = False Then
                myStrOut(y) = myStr(i)
            Else
                ' Adding 32 turns ASCII "A"-"Z" into "a"-"z" (only valid for A-Z)
                myStrOut(y) = myStr(i) + CByte(32)
            End If
            If myStr(i) = 95 Then ' 95 = "_": y is not advanced, so it gets overwritten
                nextCharLower = False
            Else
                nextCharLower = True
                y += 1
            End If
        Next
        ' Only the first y bytes belong to the result
        Test3 = myCoder.GetString(myStrOut, 0, y)
    End Function
End Module
///
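Cor asks above for a fourth test using a Regex. One possible Test4 is sketched here as a separate module so it can run on its own. It is an assumption, not something posted in the thread: the pattern is written only for this fixed sample input and would also rewrite any other upper-case word in arbitrary text. It times with System.Diagnostics.Stopwatch (.NET 2.0 and later) rather than Environment.TickCount, since TickCount typically only advances every 10 to 16 ms. No timing result is claimed; numbers depend on the machine and runtime. RegexTiming, WordPattern and ConvertWord are hypothetical names.

```vbnet
Imports System
Imports System.Diagnostics
Imports System.Text.RegularExpressions

Module RegexTiming

    ' Each match is one word plus its optional trailing "_":
    ' group 1 = first character, group 2 = rest of the word.
    ' Built once and reused; RegexOptions.Compiled makes construction
    ' slower but later matching faster.
    Private ReadOnly WordPattern As New Regex("([A-Z0-9])([A-Z0-9]*)_?", RegexOptions.Compiled)

    ' Keep the first character, lower-case the rest, drop the "_".
    Function ConvertWord(ByVal m As Match) As String
        Return m.Groups(1).Value & m.Groups(2).Value.ToLowerInvariant()
    End Function

    Function Test4() As String
        Return WordPattern.Replace("MY_STRING_TO_BE_CONVERTED", AddressOf ConvertWord)
    End Function

    Sub Main()
        ' Stopwatch uses a high-resolution timer where the hardware has one.
        Dim sw As Stopwatch = Stopwatch.StartNew()
        For i As Integer = 0 To 100000
            Test4()
        Next
        sw.Stop()
        Console.WriteLine("Using Regex: {0}", sw.ElapsedMilliseconds)
    End Sub

End Module
```

To compare it with Test1 to Test3, the same Stopwatch pattern can wrap each of the other loops.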
 
Hi Cor,

If you take a look at your code (or at mine, that doesn't matter much) you
should see my point: this code *isn't* simple. If you didn't know what it
does, or what its input looks like, it would take quite a while to figure out
what it does (it surely would for me). Regexes do provide a good
balance of clarity vs. performance.

And as for your last algorithm: it's faster, but it doesn't do the same thing
as the other two. It can only handle characters in the A-Z range, no French,
German or Cyrillic characters. Both of the original algorithms would have
treated these correctly, and that's the main reason for the speed
difference.

Niki
 
Niki,

I was expecting this answer and do not disagree with it; that is why I wrote
that I would never use it.

Yesterday I was able to help someone who had a problem and asked for a regular
expression or another solution. I could help him with 4 lines of code.

The purpose of my sample was only to show that you should not immediately do
everything with a regex. Use it when it is needed, and not for simple
problems like the one asked here. That was the only purpose of my answer,
because the rest was already answered by Jared.

Cor
 
Cor Ligthert said:
...
Yesterday I could help someone who had a problem and asked for a Regular
expresion or other solution. I could help him with 4 rows of code.

Applying a regex usually takes only one line of code ;-)
The purpose of my sample was only to show that you should not directly do
everything using the the regex. Take it when it is needed and not for those
simple problems as now is asked, that was the only purpose of my answer
because the rest was already answered by Jared.

Well, in this case the OP explicitly asked for a regex. Telling him that his
problem can be solved without a regex is both obvious and (IMO) a bit
offensive: It's as if I asked you for driving lessons and you told me I
should rather walk, because driving a car is so complex.

Just curious: If I understood the OP, he wants to find names in uppercase
and underscores. How would you match for those? IMO that's a perfect
application for a regex, although it's a rather simple problem.

Niki
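Niki's question about matching such names can be illustrated with Regex.Matches. A minimal sketch, assuming a "name" is a whole word of upper-case letters and digits that contains at least one underscore (so keywords like ORDER or BY are not reported) and assuming .NET 2.0 or later for List(Of String). FindNames and FindUpperSnakeNames are hypothetical; the helper returns each distinct name once, in order of first appearance, which could serve as a checklist before renaming anything.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module FindNames

    ' Each distinct UPPER_CASE_WITH_UNDERSCORES word, in first-seen order.
    Function FindUpperSnakeNames(ByVal text As String) As List(Of String)
        Dim names As New List(Of String)
        For Each m As Match In Regex.Matches(text, "\b[A-Z0-9]+(?:_[A-Z0-9]+)+\b")
            If Not names.Contains(m.Value) Then names.Add(m.Value)
        Next
        Return names
    End Function

    Sub Main()
        Dim found As List(Of String) = _
            FindUpperSnakeNames("SELECT ORDER_ID, ORDER_DATE FROM SALES_ORDER ORDER BY ORDER_DATE")
        ' Prints "ORDER_ID, ORDER_DATE, SALES_ORDER"
        Console.WriteLine(String.Join(", ", found.ToArray()))
    End Sub

End Module
```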
 
Niki,
Applying a regex usually takes only one line of code ;-)

That can be done in C# as well, and some people find that more readable (I don't).

:-)

I'll make this EOT; I hope you do not mind?

Cor
 