Regex with quotes

  • Thread starter: Flomo Togba Kwele

Flomo Togba Kwele

I am having difficulty writing a Regex constructor.

A line has a quote (") at its beginning and its end. I need to strip both characters off. If the
line looks like "1", I need the result to be 1.

If it were sed, I could just do s/^"// and s/"$//, but I'm confused by the quote escape
characters. Also, can I do both replacements at once?

I tried:

Regex regex = new Regex("[^""", """$], """""")

You can see I'm lost.

Thanks, Flomo
--
 
Hello Flomo,

From your post, my understanding on this issue is: you want to use Regex to
remove the quotes (") at the beginning and the end of a string. If I'm off
base, please feel free to let me know.

Here are two ways to strip both quotes off at once:

1. Match the quotes and replace them with an empty string:

// ^\" matches a quote at the start, \"$ matches a quote at the end.
Regex regex = new Regex("(^\")|(\"$)");
string test = "\"1\"";
string result = regex.Replace(test, "");
Console.WriteLine(result);   // prints 1

2. Capture the inner part of the string with a named group:

Regex regex = new Regex("^\"(?<result>.*)\"$");
string test = "\"1\"";
// Value is an empty string if the input is not wrapped in quotes.
string result = regex.Match(test).Groups["result"].Value;
Console.WriteLine(result);   // prints 1

Also, if the input string has multiple lines and each line may be
enclosed in quotes, you can use RegexOptions.Multiline to remove the
quotes on every line in one pass.
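
As a rough illustration of the Multiline suggestion, here is a minimal sketch. The class and method names (MultilineQuoteDemo, StripLineQuotes) are made up for this example, and it assumes the lines are separated by "\n" only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MultilineQuoteDemo
{
    // Hypothetical helper. With RegexOptions.Multiline, ^ and $ match at the
    // start and end of every line instead of only at the ends of the input.
    public static string StripLineQuotes(string text)
    {
        return Regex.Replace(text, "^\"|\"$", "", RegexOptions.Multiline);
    }
}
```

Calling StripLineQuotes("\"1\"\n\"two\"\n3") should give the three lines 1, two and 3. One thing to watch for: in .NET, $ matches just before "\n", so with "\r\n" line endings the trailing quote is followed by "\r" and would not be removed by this pattern as written.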

As for the quote escape character: a quote is written as \" in a regular
C# string literal, and as "" in a C# verbatim string literal (@"...") or in VB.NET.
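
To make the escaping concrete, the sketch below writes the same pattern twice, once as a regular C# string literal and once as a verbatim literal. The names QuoteEscapeDemo and Strip are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuoteEscapeDemo
{
    // Regular string literal: a quote inside the string is written as \"
    public const string RegularPattern = "^\"|\"$";

    // Verbatim string literal (@"..."): a quote inside the string is written as ""
    public const string VerbatimPattern = @"^""|""$";

    // Hypothetical helper: removes one leading and one trailing quote, if present.
    public static string Strip(string line)
    {
        return Regex.Replace(line, VerbatimPattern, "");
    }
}
```

Both constants hold the same five characters, ^"|"$, so either spelling can be passed to the Regex constructor.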

Please let me know if you have any other concerns, or need anything else.

Sincerely,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

==================================================
For MSDN subscribers whose posts are left unanswered, please check this
document: http://blogs.msdn.com/msdnts/pages/postingAlias.aspx

Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notifications.
If you are using Outlook Express/Windows Mail, please make sure
you clear the check box "Tools/Options/Read: Get 300 headers at a time" to
see your reply promptly.

Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
Jialiang,

Thanks for the reply. I will try the code.

Why do you place the search string ^\" in parentheses? Could it not simply be:

Regex regex = new Regex("^\"|\"$");
--


 
Hello Flomo

The parentheses are not required. I added them to make the logic clearer.
You may remove them:
Regex regex = new Regex("^\"|\"$");
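
The reason the parentheses can go is that alternation (|) has the lowest precedence in a regex, so ^"|"$ already reads as "quote at the start, or quote at the end". A small sketch comparing the two spellings follows; the names AlternationDemo, StripGrouped and StripUngrouped are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Grouped form: the parentheses also create two numbered capture
    // groups, which Replace does not need here.
    public static string StripGrouped(string s)
    {
        return Regex.Replace(s, "(^\")|(\"$)", "");
    }

    // Ungrouped form: same alternation, no capture groups.
    public static string StripUngrouped(string s)
    {
        return Regex.Replace(s, "^\"|\"$", "");
    }
}
```

Both methods should return the same result for any input. A quote in the middle of the string is left alone by either form.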

Please feel free to let me know if you have any other concerns.

Sincerely,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

=================================================
When responding to posts, please "Reply to Group" via your newsreader
so that others may learn and benefit from your issue.
=================================================
This posting is provided "AS IS" with no warranties, and confers no rights.
 
Hello Jialiang Ge [MSFT],

Usually, when you combine two regexes that have nothing in common (e.g. they
do not logically follow each other), it's faster to run them separately, though
these regexes are so short that this advantage might be lost.

It might be even faster to just load the string into a StringBuilder and use
Remove on the first and the last character if needed. Even a simple Substring
might be faster...

I tested all options with the code below. The end result was that with
reasonably short strings the StringBuilder won and with larger strings the
Substring function won, followed after a very large gap (almost 4x slower) by
the regexes.

Results in milliseconds for 10000 passes:

substring     : 123
stringbuilder : 135
regexboth     : 308
regextwopart  : 443

I used compiled regexes and made sure the compile time was outside the stopwatch.

I've attached the code below, including the Substring and StringBuilder functions
I used.

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

namespace ConsoleApplication1
{
    public delegate string TestStringDelegate(string input);

    class Program
    {
        // Elapsed milliseconds per test name, summed over all inputs.
        static Dictionary<string, long> results = new Dictionary<string, long>();

        static Program()
        {
            // Warm up each regex once so one-time costs are not measured.
            replaceBoth.Match("");
            replaceLeft.Match("");
            replaceRight.Match("");
        }

        static void Main(string[] args)
        {
string[] inputs = new string[]{
@"""........""",
@"""............................................................................""",
@""".....................................................................................................................................................................................................................................................................................................................................................................................""",
@""".....................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................
......................................................................................................................................................................................................................................................................................................................................................................................
"""
};

            foreach (string input in inputs)
            {
                Test("substring", input, new TestStringDelegate(TestSubstring));
                Test("stringbuilder", input, new TestStringDelegate(TestStringBuilder));
                Test("regexboth", input, new TestStringDelegate(TestRegexBoth));
                Test("regextwopart", input, new TestStringDelegate(TestRegexTwoPart));
            }

            WriteResults();
            Console.ReadLine();
        }

        static void WriteResults()
        {
            foreach (string key in results.Keys)
            {
                Console.WriteLine("{0, -20}: {1}", key, results[key]);
            }
        }

        // Runs the given function 10000 times on the input and records the elapsed time.
        static void Test(string name, string input, TestStringDelegate function)
        {
            string output = "";
            Stopwatch sw = new Stopwatch();
            Console.WriteLine("Start test " + name);
            sw.Start();
            for (int i = 0; i < 10000; i++)
            {
                output = function(input);
            }
            sw.Stop();

            if (results.ContainsKey(name))
            {
                long millis = results[name];
                results[name] = millis + sw.ElapsedMilliseconds;
            }
            else
            {
                results.Add(name, sw.ElapsedMilliseconds);
            }
            Console.WriteLine(input);
            Console.WriteLine(output);
            Console.WriteLine(sw.ElapsedMilliseconds);
            Console.WriteLine("End test " + name);
        }

        // Removes a leading and a trailing quote with StringBuilder.Remove.
        static string TestStringBuilder(string input)
        {
            StringBuilder sb = new StringBuilder(input);
            if (sb.Length > 0 && sb[0] == '\"')
            {
                sb.Remove(0, 1);
            }
            if (sb.Length > 1 && sb[sb.Length - 1] == '\"')
            {
                sb.Remove(sb.Length - 1, 1);
            }
            string output = sb.ToString();
            return output;
        }

        // Removes a leading and a trailing quote with a single Substring call.
        static string TestSubstring(string input)
        {
            int start = 0;
            int end = input.Length;

            if (input.StartsWith("\""))
            {
                start++;
            }
            if (input.EndsWith("\"") && input.Length > 1)
            {
                end--;
            }
            string output = input.Substring(start, end - start);
            return output;
        }

        // One regex that handles both ends.
        static string TestRegexBoth(string input)
        {
            return replaceBoth.Replace(input, "");
        }

        // Two regexes, one per end, applied one after the other.
        static string TestRegexTwoPart(string input)
        {
            return replaceRight.Replace(replaceLeft.Replace(input, ""), "");
        }

        private static Regex replaceBoth = new Regex(@"^""|""$", RegexOptions.Compiled);
        private static Regex replaceLeft = new Regex(@"^""", RegexOptions.Compiled);
        private static Regex replaceRight = new Regex(@"""$", RegexOptions.Compiled);
    }
}
 
Jesse,

I really appreciate your analysis. I will change to StringBuilder.

Thanks again, Flomo
--



 
Hello Flomo

As Jesse suggested, StringBuilder is faster than Regex in this case because
the regular expression ^"|"$ has to be matched against the whole
string.
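
For comparison, here is a minimal sketch of a helper that never scans the string and only looks at its first and last characters. The names QuoteStripper and Unquote are hypothetical. Unlike the regex ^"|"$, it removes the quotes only when both are present, which is an assumption based on the original question rather than something stated in the replies.

```csharp
using System;

public static class QuoteStripper
{
    // Hypothetical helper: strips one pair of enclosing quotes, if present.
    public static string Unquote(string line)
    {
        if (line != null && line.Length >= 2
            && line[0] == '"' && line[line.Length - 1] == '"')
        {
            // Copy everything between the two quotes.
            return line.Substring(1, line.Length - 2);
        }
        // Not wrapped in quotes: return the input unchanged.
        return line;
    }
}
```

With this sketch, Unquote("\"1\"") gives 1, while a string with a quote at only one end is returned as it is.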

Please feel free to let me know if you have any other concerns.

Sincerely,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support

 