[Slightly OT] Is this possible with a regex?

  • Thread starter: Cool Guy

Is it possible, with a regular expression, to split a string by spaces,
*not* splitting substrings between quotation characters?

e.g.:

one two "three three" four "five five" six

-->

1: one
2: two
3: three three
4: four
5: five five
6: six
 
// Test
string line = @"one two ""three three"" four ""five five"" six";
// Delimit on space; ignore any delimiters inside quotes.
string[] fields = SplitQuoted(line, " ");
foreach (string s in fields)
{
    Console.WriteLine(s);
}


/// <summary>
/// Splits a string on any of the given delimiter characters. Unlike
/// string.Split, delimiters inside double quotes are ignored, and
/// multiple delimiters in a row are treated as one (e.g. [one   two]
/// splits into two fields if space is one of the delimiters).
/// Example:
///   Delims: " \t," (space, tab, comma)
///   Input:  "one two" three four,five
///   Returns (4 strings):
///     one two
///     three
///     four
///     five
/// </summary>
/// <param name="text">The string to split.</param>
/// <param name="delimiters">The characters to split on. Defaults to
/// space and tab if null or empty.</param>
/// <returns>The fields found, with the surrounding quotes removed from
/// quoted fields.</returns>
public static string[] SplitQuoted(string text, string delimiters)
{
    // Needs: using System; using System.Collections.Generic;
    //        using System.Text.RegularExpressions;

    // Delimiters inside a quote pair are ignored.
    // Quoted fields are enclosed in double quotes (").
    if (text == null)
        throw new ArgumentNullException("text", "text is null.");
    if (delimiters == null || delimiters.Length < 1)
        delimiters = " \t"; // Default is a space and tab.

    List<string> res = new List<string>();

    // Escape the delimiters so they are taken literally inside [^...].
    // Regex.Escape leaves ']' and '-' alone, so escape those by hand.
    string delimClass = Regex.Escape(delimiters)
        .Replace("]", @"\]")
        .Replace("-", @"\-");

    // Build the pattern that searches for both quoted and unquoted elements.
    // The quoted element (without its quotes) is captured by group 1 (g1)
    // and the unquoted element by group 2 (g2).
    string pattern =
        @"""([^""\\]*(?:\\.[^""\\]*)*)""" + // "...", allowing \" escapes inside
        "|" +
        @"([^" + delimClass + @"]+)";       // run of non-delimiter characters

    // Search the string.
    foreach (Match m in Regex.Matches(text, pattern))
    {
        if (m.Groups[2].Success)
        {
            // Unquoted field.
            res.Add(m.Groups[2].Value);
        }
        else
        {
            // Quoted field, without the surrounding quotes.
            res.Add(m.Groups[1].Value);
        }
    }
    return res.ToArray();
}
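
If you only need the case from the original question (split on whitespace, keep quoted substrings together), the core idea fits in a much smaller pattern. What follows is a minimal sketch, not a replacement for the method above: it assumes whitespace is the only delimiter and that there are no backslash escapes inside the quotes. The names QuotedSplitDemo and Split are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedSplitDemo
{
    // Alternative 1 captures the text between a pair of double quotes (group 1).
    // Alternative 2 captures a run of non-whitespace characters (group 2).
    // The quoted alternative is listed first, so it is tried first whenever
    // a token starts with a quote character.
    private static readonly Regex Token = new Regex("\"([^\"]*)\"|(\\S+)");

    public static string[] Split(string text)
    {
        if (text == null) throw new ArgumentNullException("text");

        var fields = new List<string>();
        foreach (Match m in Token.Matches(text))
        {
            // Exactly one of the two groups takes part in each match.
            fields.Add(m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value);
        }
        return fields.ToArray();
    }
}
```

Calling QuotedSplitDemo.Split on the example line from the question should give the six fields listed there. One limitation to be aware of: a quote that appears in the middle of an unquoted token (e.g. ab"cd ef") is not treated as opening a quoted section.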
 