...newbie reg expression for this replacement...

Daniel Bass

I want to convert the following example

string query = "Type in ('1','hello','3')";

into

newquery = ( Type = '1' OR Type = 'hello' OR Type = '3' )



a more general syntax would be

{ID} in ( '{value1}', '{value2}', ... '{valueN}' )

to be converted into

( {ID} = '{value1}' OR {ID} = '{value2}' OR ... OR {ID} =
'{valueN}' )

where N is any positive integer


I figure the best way to do this is to use regular expressions, but I've
just started looking at them and haven't a clue where to start.

Thanks for your time.
Dan.
 
Hi Daniel,

You can use regular expressions to match the appropriate snippets from your
query string. Then you can reassemble the matches any way you like, e.g.
with a StringBuilder. Here is an example of how you might do this...

// Requires: using System.Text; using System.Text.RegularExpressions;
string query = "Type in ('1','hello','3')";
Match m = Regex.Match(query,
    "([\\w\\d]+)\\s+in\\s+\\((?:\\'([\\w\\d]+)\\',?)*\\)");

StringBuilder sb = new StringBuilder("( ");
CaptureCollection values = m.Groups[2].Captures;
for (int i = 0; i < values.Count; ++i)
{
    // Groups[1] is the identifier; values[i] is the i-th quoted value.
    sb.Append(m.Groups[1].Value + " = '" + values[i].Value + "'");
    if (i < values.Count - 1)
        sb.Append(" OR ");
}
sb.Append(" )");
string newquery = sb.ToString();
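
For reference, here is one way the snippet above might be wrapped in a reusable method. This is only a sketch: the class name InClauseConverter and the method name Rewrite are made up for illustration, and returning the input unchanged when nothing matches is my assumption about what you would want.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class InClauseConverter
{
    private const string Pattern =
        "([\\w\\d]+)\\s+in\\s+\\((?:\\'([\\w\\d]+)\\',?)*\\)";

    // Hypothetical helper: returns the rewritten clause, or the input
    // unchanged when the pattern does not match or the list is empty.
    public static string Rewrite(string query)
    {
        Match m = Regex.Match(query, Pattern);
        if (!m.Success || m.Groups[2].Captures.Count == 0)
            return query;

        StringBuilder sb = new StringBuilder("( ");
        CaptureCollection values = m.Groups[2].Captures;
        for (int i = 0; i < values.Count; ++i)
        {
            if (i > 0)
                sb.Append(" OR ");
            sb.Append(m.Groups[1].Value)
              .Append(" = '")
              .Append(values[i].Value)
              .Append("'");
        }
        return sb.Append(" )").ToString();
    }

    public static void Main()
    {
        // Prints: ( Type = '1' OR Type = 'hello' OR Type = '3' )
        Console.WriteLine(Rewrite("Type in ('1','hello','3')"));
        // Prints the input back, since there is no "in (...)" list.
        Console.WriteLine(Rewrite("no list here"));
    }
}
```

Checking m.Success first matters because a failed match still hands back Group objects, just empty ones.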

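The tricky part is, as you mentioned, the regex string
"([\\w\\d]+)\\s+in\\s+\\((?:\\'([\\w\\d]+)\\',?)*\\)" so let me try to
explain that a bit. (The backslashes are doubled because this is a C#
string literal; the regex engine itself sees a single \.)
First of all, \w matches a 'word' character (a letter, digit or
underscore), \d matches a digit, and \s matches whitespace.
[ ] means match any one of the characters listed within the [ ].
+ means match one or more of the preceding item.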

So the snippet [\\w\\d]+ means I would like to match a sequence of 1 or
more word characters. Since \w already includes digits, the \d is redundant
but harmless. The characters can come in any order, so something like 3Type
would match too. If you want your identifiers to start with a letter, that
would require a different regex string.
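
As a rough sketch of what such a "different regex string" could look like: the rule below (start with a letter or underscore, then any word characters) is my assumption, not something from your post, and the names IdentifierCheck and IsIdentifier are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IdentifierCheck
{
    // Assumed rule: a letter or underscore first, then zero or more
    // word characters. \A and \z anchor the pattern to the whole string.
    private static readonly Regex Identifier =
        new Regex(@"\A[A-Za-z_]\w*\z");

    public static bool IsIdentifier(string text)
    {
        return Identifier.IsMatch(text);
    }

    public static void Main()
    {
        Console.WriteLine(IsIdentifier("Type"));   // True
        Console.WriteLine(IsIdentifier("3Type"));  // False
        Console.WriteLine(IsIdentifier("col_2"));  // True
    }
}
```

Inside the larger pattern you would drop the \A and \z anchors and use [A-Za-z_]\w* in place of [\w\d]+ for the identifier.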

The next idea in the regex string is that of 'captures'. When I call
Regex.Match, the system will record certain matches for me, so I need to
tell it what to record. You do this by putting parentheses around the part
of the pattern you want recorded. So ([\\w\\d]+) means find a sequence of
1 or more word characters and record it. This will match your 'Type' in
the example query. If you want to match a literal ( or any other special
character in your string then you can escape it: \\( and finally, to
disable capturing for a group (it still has to match, it just isn't
recorded), you can do (?: ... ).
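
To make the difference between capturing and non-capturing groups concrete, here is a small sketch that counts the groups a match exposes. The class and method names (GroupDemo, GroupCount) are just for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Number of groups on the match, counting group 0 (the whole match).
    public static int GroupCount(string pattern, string input)
    {
        return Regex.Match(input, pattern).Groups.Count;
    }

    public static void Main()
    {
        // Both sets of parentheses capture: groups 0, 1 and 2. Prints 3.
        Console.WriteLine(GroupCount(@"(\w+)\s+(in)", "Type in"));
        // (?:in) must still match but is not recorded. Prints 2.
        Console.WriteLine(GroupCount(@"(\w+)\s+(?:in)", "Type in"));
    }
}
```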

The string (?:\\'([\\w\\d]+)\\',?) is what matches your values. It looks
for a "'" followed by our friend [\\w\\d]+ and then another "'". Finally
there may or may not be a comma, so I have ,? to represent that. I need
to logically group this entire structure so that the * after it can match
any number of them. However, I don't want this group to record captures,
since they would include the "'" and the ",". This is the reason for "(?:"
to disable the capture of this group. The inner group ([\\w\\d]+) will
still record captures though, and that's where you can extract the values.
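
One thing to watch: your general syntax has spaces after the commas and inside the parentheses, and the pattern above does not allow for those. A possible variation, sketched here with \s* sprinkled in where whitespace might appear, is shown below. The pattern and the names LooseInClause and ExtractValues are my own and untested against your real queries.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LooseInClause
{
    // Same idea as before, with optional whitespace (\s*) allowed around
    // the parentheses, the quoted values and the commas.
    private const string Pattern =
        @"(\w+)\s+in\s*\(\s*(?:'(\w+)'\s*,?\s*)*\)";

    // Returns the quoted values in order, or an empty array on no match.
    public static string[] ExtractValues(string query)
    {
        Match m = Regex.Match(query, Pattern);
        if (!m.Success)
            return new string[0];

        CaptureCollection captures = m.Groups[2].Captures;
        string[] values = new string[captures.Count];
        for (int i = 0; i < captures.Count; ++i)
            values[i] = captures[i].Value;
        return values;
    }

    public static void Main()
    {
        // Prints: 1|hello|3
        Console.WriteLine(string.Join("|",
            ExtractValues("Type in ( '1', 'hello', '3' )")));
    }
}
```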

For extracting the matches, group 0 is always the entire match, followed
in order by the groups within the expression (sets of parentheses without
"(?:"), numbered by their opening parenthesis. Each group has a capture
collection, which contains every instance where that group matched the
given string.
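
Here is a short sketch of how the groups and their capture collections line up for your example query. The class and method names (CaptureDemo, MatchQuery) are made up. Note that a group's own Value only gives you its last capture, which is why the values have to come from Captures.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    private const string Pattern =
        "([\\w\\d]+)\\s+in\\s+\\((?:\\'([\\w\\d]+)\\',?)*\\)";

    public static Match MatchQuery(string query)
    {
        return Regex.Match(query, Pattern);
    }

    public static void Main()
    {
        Match m = MatchQuery("Type in ('1','hello','3')");

        Console.WriteLine(m.Groups[0].Value); // the whole matched text
        Console.WriteLine(m.Groups[1].Value); // Type
        Console.WriteLine(m.Groups[2].Value); // 3 (last capture only)

        // Every time group 2 matched, in order: 1, hello, 3
        foreach (Capture c in m.Groups[2].Captures)
            Console.WriteLine(c.Value);
    }
}
```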

I hope this is a helpful starting point.

-DoRon

--------------------
 
DoRon,

Mate, that was great. I wasn't expecting such a detailed and helpful reply.

Much appreciated!

Dan.



