Regex help?

Bill Cohagan

I'm looking for help with a regular expression question, so my first
question is which newsgroup is the best one to post to? Just in case *this*
is the best choice, here's the problem:

I'm trying to "parse" something that looks like a command line; e.g.,

op arg1, arg2, ..., argn

The individual parts (op, arg1, ...) can be matched with a \w+ pattern --
except that the args *might* be quoted to cover the case where they contain
non-\w characters. I've figured out what I think is the "right" pattern, but
I'm unsure how to interpret the results.

What I'm trying to accomplish is to perform a Match operation, then walk
through the resulting match groups and their respective captures, producing
an array containing the arguments. The problem I have is determining the
*order* of things in the captures. Here's the detail...

The pattern for matching the args is an alternation: ((\w+)|"([^"]*)"). This
pattern occurs within a repeating outer group, so for *this* group
there will be a capture collection with multiple elements. The problem
is that the captures will include the quote chars for those cases
where the right-hand alternative was matched. What I want is just what's inside
the quotes. The inner groups (\w+) and ([^"]*) have their own
capture collections and these contain exactly what I want, but I don't see
how to determine the order.

To see the problem consider the case: op a, "b", c. The capture collection for
the group containing the alternation will contain a, "b", c while the
capture collection for the left subgroup will contain a, c and the one for the
right subgroup will contain b. How can I tell the relative order so I can
"merge" these into a, b, c which is the desired result?

Of course I can just use the outer group and reparse the elements of the
capture collection, stripping out the quote chars where necessary, but I'd
like to know how this could be done just using regular expressions. Is it
possible?

Thanks in advance.

Bill
 
Cor,

Thanks for the response. These look like pretty good sites for finding
patterns, but unfortunately that's not my problem -- at least I don't think
so. What I need is some help in interpreting the results, and in a
.NET-specific way, since I believe that capture collections are unique to .NET
regex.

Regards,
Bill
 

You could use the "Index" property of the capture objects to get the
captures into order, but I'm not sure if that's the kind of "elegant"
solution you're looking for.

Can't you use the "Matches" method, and let it return one Match item per
"parameter"? That would certainly make things easier. I guess the problem
here is the "op" part; maybe you could strip that away first: use one regex
that splits your string into "operation" and "parameter list", and a second
one to split the parameters. (I think that's the "standard regex way", as
many regex engines don't track capture objects.)

Hope this helps,

Niki
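
For readers who want to see the two-regex suggestion spelled out, here is a minimal sketch in Python rather than .NET. It is an illustration under stated assumptions, not code from the thread: `re.finditer` stands in for the .NET "Matches" method, and the names COMMAND, ARGUMENT and parse_command are hypothetical. The argument pattern is the inner alternation quoted in the question.

```python
import re

# Hypothetical names; only the argument alternation comes from the thread.
COMMAND = re.compile(r'\s*(\w+)\s*(.*)', re.DOTALL)   # op, then the rest
ARGUMENT = re.compile(r'(\w+)|"([^"]*)"')             # bare word or quoted

def parse_command(line):
    """Return (op, [args]) with the quotes stripped from quoted arguments."""
    head = COMMAND.match(line)
    if head is None:
        raise ValueError("no operation found in %r" % line)
    op, rest = head.group(1), head.group(2)
    args = []
    for m in ARGUMENT.finditer(rest):
        # Exactly one alternative takes part in each match, so the other
        # group is None. Test for None (not truthiness) so "" is kept.
        args.append(m.group(1) if m.group(1) is not None else m.group(2))
    return op, args

print(parse_command('op a, "b c", d'))
```

Because each match corresponds to one argument, the matches arrive in input order and no merging is needed. One limitation of this sketch: text that matches neither alternative (stray punctuation, for example) is skipped silently rather than reported.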
 
Niki,

I hadn't thought of looking at the Index property -- that might be what I
need. As for the Matches method, I don't think it'd solve the problem since I
have groups whose matches would be included in the collection, but whose
values I don't want to see. (Actually it might be possible to use
"non-capturing" groups in conjunction with Matches to do something ...)

Thanks for the suggestions. I'll post back with what I find.

Bill

 
Niki,

Thanks again for the suggestions. I was able to accomplish the goal using
the Index property as the key in a merge of the two capture collections. Works
like a charm.

I'd hoped to do this all with regular expressions, but at least the code I
had to write was fun. I finally got to use a Queue!

Bill
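
Bill's code was not posted, so the following is only a rough sketch of what an Index-keyed merge of two capture lists could look like, written in Python. Python's standard `re` module has no counterpart to the .NET capture collections, so captures_by_group (a hypothetical helper) builds two lists of (index, value) pairs by hand to stand in for them; merge_by_index is likewise a hypothetical name, and the use of queues simply follows Bill's remark.

```python
import re
from collections import deque

ARGUMENT = re.compile(r'(\w+)|"([^"]*)"')

def captures_by_group(text):
    """Build two (index, value) lists standing in for the capture
    collections of the bare-word group and the quoted group."""
    bare, quoted = [], []
    for m in ARGUMENT.finditer(text):
        if m.group(1) is not None:
            bare.append((m.start(1), m.group(1)))
        else:
            quoted.append((m.start(2), m.group(2)))
    return bare, quoted

def merge_by_index(bare, quoted):
    """Merge two index-sorted capture lists into one list of values."""
    left, right = deque(bare), deque(quoted)
    merged = []
    while left and right:
        # Take whichever capture starts earlier in the input.
        source = left if left[0][0] < right[0][0] else right
        merged.append(source.popleft()[1])
    # One queue is now empty; the other holds the rest, already in order.
    merged.extend(value for _, value in left)
    merged.extend(value for _, value in right)
    return merged

bare, quoted = captures_by_group('a, "b", c')
print(bare)                          # [(0, 'a'), (8, 'c')]
print(quoted)                        # [(4, 'b')]
print(merge_by_index(bare, quoted))  # ['a', 'b', 'c']
```

The merge relies on each list already being sorted by position, which holds here because the captures are recorded left to right as the input is scanned.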
