regex -- substitute chars outside quoted strings


Gary McCullough

What I want to do sounds simple, but it's defeating me. I want to
replace every occurrence of the colon (:) character in a string with an @
character, unless the colon occurs within a single- or double-quoted
substring. Surely this can be done with regular expressions? Do any regex
gurus know how?
 

Preprocess the string: split it into the quoted parts and the unquoted parts.

A lexer would work well in this case.
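A minimal C# sketch of the lexer idea, assuming quotes are never escaped and that a quote of one kind inside a run of the other kind is just ordinary text. The helper name ReplaceColonsOutsideQuotes is hypothetical. It walks the string once, remembers which quote character (if any) opened the current run, and swaps : for @ only while no run is open. One caveat of treating ' as a quote: an apostrophe in unquoted text (as in "don't") would open a run.

```csharp
using System;
using System.Text;

// Hypothetical helper: single pass over the input, tracking whether we are
// inside a '...' or "..." run. Assumes no escaped quotes.
static string ReplaceColonsOutsideQuotes(string input)
{
    var sb = new StringBuilder(input.Length);
    char openQuote = '\0'; // '\0' means "not inside a quoted run"

    foreach (char c in input)
    {
        if (openQuote == '\0')
        {
            if (c == '"' || c == '\'')
                openQuote = c;              // entering a quoted run
            sb.Append(c == ':' ? '@' : c);  // only unquoted colons change
        }
        else
        {
            if (c == openQuote)
                openQuote = '\0';           // leaving the quoted run
            sb.Append(c);                   // quoted text is copied unchanged
        }
    }
    return sb.ToString();
}

Console.WriteLine(ReplaceColonsOutsideQuotes("Meeting tomorrow : 11AM \"example: 2\" 'x:y'"));
// prints: Meeting tomorrow @ 11AM "example: 2" 'x:y'
```

A character-by-character scan like this costs one pass over the string and makes the quoting rules explicit, which is easier to extend (for example, to escaped quotes) than a single regex.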
 
To a human this seems like a very simple problem, but getting a
computer to understand what you want is another story. Let's look at an
example of why this problem is more complex than it looks, using the following string:

Meeting today : 10AM
This is an "example: 1"
Meeting tomorrow : 11AM
This is another "example: 2"

If I understand the requirements, the desired output should be:

Meeting today @ 10AM
This is an "example: 1"
Meeting tomorrow @ 11AM
This is another "example: 2"

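Unfortunately, if we wrote a regular expression that treats a colon as
"inside quotes" whenever there is a quote somewhere before it and a quote
somewhere after it, the colon before 11AM would not be changed: it is preceded
by the closing quote of "example: 1" and followed by the opening quote of
"example: 2".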

I've had a similar problem before as well, and the best solution I could
think of was to extract all of the quoted strings and replace them with an
escape sequence, then do the replacement, then re-inflate the escape
sequences with the extracted values.

An example would look something like:

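using System;
using System.Text.RegularExpressions;

string s = @"
Meeting today : 10AM
This is an ""example: 1""
Meeting tomorrow : 11AM
This is another ""example: 2""
";

// Extract the quoted strings, replacing each with a {n} placeholder.
// Work from the last match to the first so the indexes of the earlier
// matches stay valid while the string is being edited.
MatchCollection matches = Regex.Matches(s, @"""[^""]*""");
for (int x = matches.Count - 1; x > -1; x--)
{
    Match match = matches[x];
    s = s.Remove(match.Index, match.Length);
    s = s.Insert(match.Index, "{" + x + "}");
}

// Replace the remaining : with @ (the placeholders contain no colons)
s = s.Replace(':', '@');

// Reinflate the placeholders with the original quoted text. Going from the
// first match to the last, everything before placeholder x has already been
// restored to its original length, so placeholder x starts at match.Index.
for (int x = 0; x < matches.Count; x++)
{
    Match match = matches[x];
    s = s.Remove(match.Index, x.ToString().Length + 2);
    s = s.Insert(match.Index, match.Value);
}

Console.WriteLine(s);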


If anyone else has a better solution, I'd love to hear it.

Hope this helps.
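Another regex-only approach, sketched here in C# and not taken from this thread: match either a complete quoted run or a bare colon, and let a MatchEvaluator return quoted runs unchanged while turning bare colons into @. Because each quoted run is consumed as one match, the engine never stops on a colon inside it, so the 11AM line comes out right. The helper name ReplaceColonsWithEvaluator is hypothetical, and the pattern assumes quotes are never escaped.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: the named group "q" captures a whole '...' or "..." run;
// anything else the pattern matches is a bare colon outside quotes.
static string ReplaceColonsWithEvaluator(string input) =>
    Regex.Replace(
        input,
        @"(?<q>""[^""]*""|'[^']*')|:",
        m => m.Groups["q"].Success ? m.Value : "@");

string s = "Meeting today : 10AM\nThis is an \"example: 1\"\n" +
           "Meeting tomorrow : 11AM\nThis is another \"example: 2\"";
Console.WriteLine(ReplaceColonsWithEvaluator(s));
```

This avoids the placeholder bookkeeping entirely: the regex engine's left-to-right scan does the splitting into quoted and unquoted parts.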
 
Jason,

I was afraid you'd say that.

Your analysis of the problem is dead on. In reality I'm converting
parameterized SQL statements from SQL Server format to Oracle format and
vice versa (hence swapping the @ and : parameter prefixes), but your example
works just as well. I'm surprised this is such a hard problem.

Since I can't figure out how to do it with a single regex, I'm just using
regexes to extract the literals and doing the replacements on the other bits.
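For the SQL case, the same match-and-skip idea can be aimed at parameter markers. This is a rough C# sketch, not a tested converter: the name SqlServerToOracle is hypothetical, and it assumes only single-quoted string literals need protecting (with a quote inside a literal written as ''), ignoring comments, double-quoted identifiers, and other dialect details. The (?<!@) lookbehind is there so system variables such as @@ROWCOUNT are left alone. Going from Oracle to SQL Server would roughly mean swapping the two characters in the pattern and the replacement.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: rewrite SQL Server-style @name parameters as
// Oracle-style :name, skipping '...' literals (where '' is an escaped quote).
static string SqlServerToOracle(string sql) =>
    Regex.Replace(
        sql,
        @"(?<lit>'(?:[^']|'')*')|(?<!@)@(?=\w)",
        m => m.Groups["lit"].Success ? m.Value : ":");

Console.WriteLine(SqlServerToOracle(
    "SELECT * FROM Users WHERE Name = @name AND Note <> 'mail me @home'"));
// prints: SELECT * FROM Users WHERE Name = :name AND Note <> 'mail me @home'
```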
 
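This pattern finds every occurrence of a particular character except
where it occurs between quotation marks: the character matches only if the
rest of the string after it contains an even number of quotation marks.

This example finds the occurrences of the colon character except where it
appears within double quotation marks. To handle single quotes instead,
change each \x22 (the double quote) to \x27 (the single quote).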

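' Requires Imports System.Text.RegularExpressions at the top of the file
Dim sPattern As String = ":(?=([^\x22]*\x22[^\x22]*\x22)*(?![^\x22]*\x22))"
Dim sInput As String = "Meeting tomorrow : 11AM ""Example: 1"""

' MatchCollection has no public constructor; use the one Regex.Matches returns
Dim mc As MatchCollection = Regex.Matches(sInput, sPattern, RegexOptions.ExplicitCapture)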

The match collection here should contain exactly one match: the location of
the colon after the word "tomorrow".

If you use the replace function:

Dim sResult As String = Regex.Replace(sInput, sPattern, "@")
' sResult now holds: Meeting tomorrow @ 11AM "Example: 1"
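As a check against Jason's counterexample, here is the same pattern ported to C# (the port and the test string are illustrations, assuming .NET's Regex class) and run on his four-line sample. A colon matches only when the text after it contains an even number of double quotes, so the colon before 11AM is replaced even though quotes appear on both sides of it. Two limits worth knowing: the lookahead rescans the rest of the string for every colon, so the cost grows roughly quadratically on long inputs, and one pattern handles only one quote character at a time.

```csharp
using System;
using System.Text.RegularExpressions;

// Chris's pattern as a C# verbatim string; \x22 is the double quote.
// The lookahead succeeds only if an even number of quotes follows the colon.
const string Pattern = @":(?=([^\x22]*\x22[^\x22]*\x22)*(?![^\x22]*\x22))";

string s = "Meeting today : 10AM\nThis is an \"example: 1\"\n" +
           "Meeting tomorrow : 11AM\nThis is another \"example: 2\"";

string result = Regex.Replace(s, Pattern, "@", RegexOptions.ExplicitCapture);
Console.WriteLine(result);
```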

Perhaps this will help you.

--
Chris

dunawayc[AT]sbcglobal_lunchmeat_[DOT]net

To send me an e-mail, remove the "[", "]", underscores, and "lunchmeat", and
replace certain words in my e-mail address.
 