Well, you've made the usual mistake of not defining your rules. An example
may imply some rules, but not others. For example, your example does not
state whether or not an odd number of double-quotes might be found in the
string. You have not specifically said whether or not double-quotes
surrounding a phrase must be included in the match, nor whether spaces
surrounding a phrase must be included in the match. There are a number of
other rules which are not specified as well, such as handling line breaks.
A regular expression is an expression of a set of rules which must be
absolutely specific.
However, I will give you a few examples that should cover the various
possibilities.
First, we are looking at 2 specific sets of rules:
1. A phrase surrounded by double-quotes.
2. A phrase *not* surrounded by double-quotes.
Therefore, in order to match them, we must either create 2 groups, or use
one group to split the total string into matches of the other. If we use 2
groups, we can get both, but we will have to sort out which is which. If we
only use one, we will need to perform 2 sets of operations:
1. Match all matches.
2. Split and get all remaining elements.
So, the rule for the phrases surrounded by quotes is fairly simple:
"[^"]*"
Translated, this says that a match is defined by a double-quote, followed by
zero or more non-double-quotes (any character except a double-quote),
followed by a double-quote. This will capture, in your example:
"I love"
"of you"
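The single-pattern route mentioned earlier (match everything, then split to get the rest) might look like the C# sketch below. The class and method names are made up for illustration, and trimming the pieces and dropping empty ones is an assumption about what you want, since the rules don't say.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original post.
public static class TwoPassExtractor
{
    // In a verbatim string, "" stands for one literal double-quote,
    // so this is the pattern "[^"]*"
    private static readonly Regex Quoted = new Regex(@"""[^""]*""");

    // Operation 1: collect every quoted phrase, quotes included.
    public static List<string> QuotedPhrases(string input)
    {
        var result = new List<string>();
        foreach (Match m in Quoted.Matches(input))
        {
            result.Add(m.Value);
        }
        return result;
    }

    // Operation 2: split on the quoted phrases and keep what is left.
    // Assumption: surrounding spaces and empty pieces are unwanted.
    public static List<string> UnquotedPhrases(string input)
    {
        var result = new List<string>();
        foreach (string piece in Quoted.Split(input))
        {
            string trimmed = piece.Trim();
            if (trimmed.Length > 0)
            {
                result.Add(trimmed);
            }
        }
        return result;
    }
}
```

The catch with this approach is that you end up with two separate lists, so the original order of quoted and non-quoted phrases is lost unless you track positions yourself.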
Now, if you create a rule that is the opposite of that, you get:
[^"]*
Translated, this says that a match is any phrase *not* containing a
double-quote. Because of the '*', that includes an empty phrase, so expect
some empty matches.
These 2 can be used together with grouping and an "or" ('|') operator, as
in:
("[^"]*")|([^"]*)
It is important to order them in this way. The first alternative gets the
first try at every position, so with the quoted rule first, group 1 captures
the quoted phrases, double-quotes included, and group 2 captures anything
*except* double-quotes. If the non-quoted rule is used first, it will capture
the contents of the quoted phrases without the double-quotes, and the quoted
rule will never match, as that text has already been consumed.
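To see the ordering effect for yourself, a small C# sketch like this one counts how often a given group takes part in a match. The helper name and its parameters are hypothetical; only the two patterns come from the discussion above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original post.
public static class AlternationOrder
{
    // Counts the matches in which the group with the given number participated.
    public static int CountGroupHits(string pattern, int groupNumber, string input)
    {
        int hits = 0;
        foreach (Match m in Regex.Matches(input, pattern))
        {
            if (m.Groups[groupNumber].Success)
            {
                hits++;
            }
        }
        return hits;
    }
}
```

With the test string from the question, `CountGroupHits(@"(""[^""]*"")|([^""]*)", 1, input)` finds the quoted group twice. With the alternatives swapped, `CountGroupHits(@"([^""]*)|(""[^""]*"")", 2, input)` finds it zero times, because `[^"]*` can always succeed (if only with an empty match) before the quoted rule is tried.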
When using this version, both groups are captured, effectively capturing the
entire string into 2 groups of matches, and you use the groups to identify
which regular expression was matched (quoted in group 1 and non-quoted in
group 2). You should also note that the second group will capture spaces
between the quoted phrases and the non-quoted phrases, as part of the
non-quoted phrase. I know of no way to trim this in the regular expression
itself, so you would have to trim the values from the matches themselves.
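Put together, reading the two groups and trimming afterwards could be sketched in C# as follows. The class and method names are invented for the example, and skipping empty matches is an assumption: `[^"]*` is allowed to match nothing, so the match list can contain empty entries you probably don't want.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original post.
public static class PhraseTokenizer
{
    // Verbatim-string form of ("[^"]*")|([^"]*)
    private static readonly Regex Phrase = new Regex(@"(""[^""]*"")|([^""]*)");

    // Returns the phrases in their original order. Quoted phrases keep
    // their double-quotes; non-quoted phrases are trimmed.
    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        foreach (Match m in Phrase.Matches(input))
        {
            if (m.Groups[1].Success)
            {
                // Group 1: a quoted phrase, quotes included.
                tokens.Add(m.Groups[1].Value);
            }
            else
            {
                // Group 2: non-quoted text; trim it and skip empty matches.
                string text = m.Groups[2].Value.Trim();
                if (text.Length > 0)
                {
                    tokens.Add(text);
                }
            }
        }
        return tokens;
    }
}
```

Note that several non-quoted words in a row come back as one phrase here. If you want them as separate words, you would still have to split group 2 on spaces.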
--
HTH,
Kevin Spencer
Microsoft MVP
Professional Development Numbskull
Abnormality is anything but average.
Göran Andersson said:
I would make a pattern that matches spaces with an optional quoted phrase,
and split on that.
Some untested code, but it should get you started:
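// Split on a space or a quoted phrase; the capturing group keeps the phrase.
Regex re = new Regex(@" |(?: ?(""[^""]*"") ?)");
// Split also returns empty strings between adjacent delimiters, so drop them.
string[] parts = Array.FindAll(re.Split(input), s => s.Length > 0);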
I am going to write a function that works the way a search engine does.
In a search engine, we may use double quotation marks to specify a phrase,
like "I love you".
How can I use a regular expression to separate each phrase?
Test case:
"I love" all "of you"
I would like it to return: "I love", all, "of you". Thank you!