Bill Cohagan
I'm looking for help with a regular expression question, so my first
question is which newsgroup is the best one to post to? Just in case *this*
is the best choice, here's the problem:
I'm trying to "parse" something that looks like a command line; e.g.,
op arg1, arg2, ..., argn
The individual parts (op, arg1, ...) can be matched with a \w+ pattern --
except that the args *might* be quoted to cover the case where they contain
non \w characters. I've figured out what I think is the "right" pattern, but
what I'm unsure of is how to interpret the results.
What I'm trying to accomplish is to perform a Match operation, then walk
through the resulting match groups and their respective captures, producing
an array containing the arguments. The problem I have is determining the
*order* of things in the captures. Here's the detail...
The pattern for matching the args is an alternation: ((\w+)|"([^"]*)"). This
pattern occurs within a repeating outer group, so for *this* group
there will be a capture collection with multiple elements. The problem
is that its captures will include the quote chars for those cases
where the right alternative was matched. What I want is just what's inside
the quotes. The inner groups (\w+) and ([^"]*) of course have their own
capture collections and these contain exactly what I want, but I don't see
how to determine the order.
To see the problem, consider the case: op a, "b", c. The capture collection
for the group containing the alternation will contain a, "b", c, while the
capture collection for the left subgroup will contain a, c and the one for
the right subgroup will contain b. How can I tell the relative order so I
can "merge" these into a, b, c, which is the desired result?
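To make the example concrete, here is a rough sketch in Python rather than .NET. Python's standard re module has no capture collections, so the repeating outer group is simulated with finditer, and the names ARG and collect are made up for illustration. Each piece of text is recorded together with its start offset. If the .NET Capture objects expose a comparable position (I believe Capture.Index does this, but treat that as an assumption), sorting on it would be one way to merge the two inner collections.

```python
import re

# Same alternation as in the post: a bare word, or a quoted string.
# Group 1 is the (\w+) alternative, group 2 is the inside of the quotes.
ARG = re.compile(r'(\w+)|"([^"]*)"')

def collect(arg_text):
    """Return (outer, left, right) lists of (start_offset, text) pairs."""
    outer, left, right = [], [], []
    for m in ARG.finditer(arg_text):
        outer.append((m.start(), m.group(0)))          # may include quotes
        if m.group(1) is not None:
            left.append((m.start(1), m.group(1)))      # unquoted argument
        else:
            right.append((m.start(2), m.group(2)))     # text inside quotes
    return outer, left, right

outer, left, right = collect('a, "b", c')

# Merge the two inner lists by start offset to recover the original order.
merged = [text for _, text in sorted(left + right)]
print(merged)
```

The offsets are what carry the ordering here; the text of the captures alone is not enough to interleave the two inner lists.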
Of course I can just use the outer group and reparse the elements of the
capture collection, stripping out the quote chars where necessary, but I'd
like to know how this could be done just using regular expressions. Is it
possible?
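For comparison, the fallback I mean (use only the outer match and strip the quotes afterwards) would look roughly like this. Again this is a Python sketch, not .NET code, and parse_args is a hypothetical helper name.

```python
import re

# Outer pattern only: a bare word or a quoted string, quotes included.
RAW_ARG = re.compile(r'\w+|"[^"]*"')

def parse_args(arg_text):
    """Return the arguments in order, with surrounding quotes removed."""
    args = []
    for raw in RAW_ARG.findall(arg_text):
        if raw.startswith('"'):
            raw = raw[1:-1]   # drop the opening and closing quote
        args.append(raw)
    return args

print(parse_args('a, "b c", d'))
```

This works, but the quote stripping happens outside the regular expression, which is the part I'd like to avoid.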
Thanks in advance.
Bill