Regular expression match

Jed Ozone

New to Regex and I'm having a hard time figuring this one out.

I need a regular expression that will match based on balanced square brackets.
For example:
[$AA[123]], [BB[bb],CC], [a[b[c]]]

I'm trying to write a regex that will parse the above into 3 pieces:
1) [$AA[123]] (or $AA[123] would be fine)
2) [BB[bb],CC]
3) [a[b[c]]]

Basically, any time the square brackets balance, I want to be able to pluck
out that value.

I wrote: (?<column>\[([^\[\]]*(\[.*\])*)*\])

Basically, look for a [, then repeatedly take either a non-"[" and non-"]"
character or find an internal set of [...] and take everything between them,
until you hit a ]. It's the latter part that doesn't work, as I can't seem to
figure out how to get the brackets to balance properly. Is this just
something regular expressions are not meant to do? Thanks for any help.
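
For reference, here is a minimal sketch of what the pattern above does on the sample input. The class and method names (GreedyDemo, Columns) are made up for illustration. The inner \[.*\] is greedy, so it runs from the first nested [ to the last ] on the line, and the whole input comes back as one match instead of three:

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyDemo
{
    // The pattern from the post above, unchanged.
    public const string Pattern = @"(?<column>\[([^\[\]]*(\[.*\])*)*\])";

    // Returns the value of the "column" group for every match.
    public static string[] Columns(string input)
    {
        MatchCollection mc = Regex.Matches(input, Pattern);
        string[] result = new string[mc.Count];
        for (int i = 0; i < mc.Count; i++)
        {
            result[i] = mc[i].Groups["column"].Value;
        }
        return result;
    }

    static void Main()
    {
        // Prints the entire input on one line: a single match, not three.
        foreach (string c in Columns("[$AA[123]], [BB[bb],CC], [a[b[c]]]"))
        {
            Console.WriteLine(c);
        }
    }
}
```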
 
Hi Jed,

Why not write a small state machine instead? Iterate through the string and
count [ (+1) and ] (-1); when the count is 0 and you are on a comma, you
know that comma is a delimiter.

--
Miha Markic - RightHand .NET consulting & software development
miha at rthand com
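
The counting approach described above could be sketched roughly as follows. This is only an illustration: BracketSplitter and SplitTopLevel are hypothetical names, and the sketch assumes the input is well formed (it does not check for unbalanced brackets).

```csharp
using System;
using System.Collections.Generic;

static class BracketSplitter
{
    // Splits the input on commas that sit outside all square brackets.
    // Assumes balanced brackets; no validation is done.
    public static List<string> SplitTopLevel(string input)
    {
        var parts = new List<string>();
        int depth = 0;   // number of '[' currently open
        int start = 0;   // index where the current piece begins

        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c == '[')
            {
                depth++;
            }
            else if (c == ']')
            {
                depth--;
            }
            else if (c == ',' && depth == 0)
            {
                // A comma at depth 0 is a delimiter between sets.
                parts.Add(input.Substring(start, i - start).Trim());
                start = i + 1;
            }
        }
        parts.Add(input.Substring(start).Trim());   // the last piece
        return parts;
    }

    static void Main()
    {
        foreach (string p in SplitTopLevel("[$AA[123]], [BB[bb],CC], [a[b[c]]]"))
        {
            Console.WriteLine(p);
        }
    }
}
```

On the sample string this yields the three outer sets, and the comma inside [BB[bb],CC] is left alone because the depth is 1 at that point.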
 
I think String.Split(',') would help, since you have a comma as the delimiter.
e.g.
string s = "[a][c[d]], [a], c[d][e]";
string[] sa = s.Split(',');

This gives me an array whose length I can check and which I can traverse with a for loop to inspect the individual elements.

:-)
Kalpesh
 
Jed,

try this:

(?:(?:^|\s*,\s*)(?<column>\[(([^,]*\[[^,]*\][^,]*)*|[^,]*)\]))

This will not work for the string you wrote, since there is a comma in this
expression: [BB[bb],CC]. Is this a mistake?

If it's not a mistake, get back to me and we will figure it out.



Picho
 
Thanks for the help, Picho. Unfortunately, the comma is not a mistake.
The only rules for the syntax are that the square brackets will be
balanced and that a comma (outside all brackets) will separate each set.
Between the brackets, all characters are valid, along with additional sets
of square brackets. I am trying to get only the outermost set in each case
(as shown in my original post), as the inner brackets have a different
meaning than the outer brackets (I did not make up this syntax!).

Thanks for any additional ideas you might have.


 
Thanks Miha. I was hoping to break apart my whole file using regular
expressions. I've broken the file down to this point and was trying to
stick with regular expressions. I have actually written a procedure that does
exactly what you describe, which I had planned as a temporary solution. When
I started, I thought a regular expression would handle something like this
fairly easily, but that just shows my complete lack of experience with them.

Now it's become more of a crusade just to see whether it is in fact possible
to balance brackets using regular expressions.

Thanks for the feedback.

 
Hi hemol,

Yes, I understand you. However, regular expressions are more about pattern
matching than parameter matching, IMO.
The kind of problem you are describing is just not meant to be solved with
regex.
Anyway, I am curious too whether anybody comes up with a regex solution.
--
Miha Markic - DXSquad/RightHand .NET consulting & software development
miha at rthand com

 
This should work:

\[(?>[^\[\]]+|\[(?<DEPTH>)|\](?<-DEPTH>))*(?(DEPTH)(?!))\]

This is based on the method described in the book "Mastering Regular
Expressions" by Jeffrey E. F. Friedl (O'Reilly). It is an excellent book
that covers regular expressions in many different languages.

Basically, the .NET flavor of Regex allows for matching nested constructs
like this. It is a very powerful feature, but it can be a little tricky.


Brian Davis
www.knowdotnet.com
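
For readers unfamiliar with the syntax, here is the same pattern laid out with RegexOptions.IgnorePatternWhitespace so each part can carry a comment. Treat it as an annotated sketch rather than a reference implementation; the names BalancedBrackets, Outermost and Extract are invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class BalancedBrackets
{
    // Same pattern as above, spread over several lines. With
    // IgnorePatternWhitespace, whitespace outside character classes is
    // ignored and '#' starts a comment that runs to the end of the line.
    public static readonly Regex Outermost = new Regex(@"
        \[                    # opening bracket of the outermost set
        (?>                   # atomic group, no backtracking into it
            [^\[\]]+          # a run of non-bracket characters
          | \[ (?<DEPTH>)     # nested opener, push onto the DEPTH stack
          | \] (?<-DEPTH>)    # nested closer, pop, fails if stack is empty
        )*
        (?(DEPTH)(?!))        # fail if the DEPTH stack is not empty
        \]                    # closing bracket of the outermost set
        ", RegexOptions.IgnorePatternWhitespace);

    // Returns every outermost balanced bracket set found in the input.
    public static string[] Extract(string input)
    {
        MatchCollection mc = Outermost.Matches(input);
        string[] result = new string[mc.Count];
        for (int i = 0; i < mc.Count; i++)
        {
            result[i] = mc[i].Value;
        }
        return result;
    }

    static void Main()
    {
        foreach (string s in Extract("[$AA[123]], [BB[bb],CC], [a[b[c]]]"))
        {
            Console.WriteLine(s);
        }
    }
}
```

The key step is the third alternative: a ] is only consumed inside the loop when there is a pending [ to pop, so the first ] that finds the stack empty ends the loop and is matched by the final \] as the outer closing bracket.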
 
Thanks for the expression. Unfortunately, this didn't seem to pick up the
right pieces. It missed the first set of square brackets. So when parsing
something like:

[AA[aa]BB],[CC]

It returned [aa] and [CC] (and a bunch of empty results).

There is some syntax in your expression that I don't understand (a fair
amount), so I'll have to study it. In any case, if nothing else, I learned
something about regular expressions, and that I'm much better at writing
state machines than regular expressions! Thanks for the help.


 
When I test it, it seems to work as expected. Here is the code snippet and
the output:

Code:
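using System;
using System.Text.RegularExpressions;

// Keep the pattern on a single line: a line break inside the verbatim
// string would become part of the pattern.
MatchCollection mc = Regex.Matches(
    "[AA[aa]BB],[CC]",
    @"\[(?>[^\[\]]+|\[(?<DEPTH>)|\](?<-DEPTH>))*(?(DEPTH)(?!))\]");
foreach (Match m in mc)
{
    Console.WriteLine(m.Value);  // print each outermost bracketed set
}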

Output:
[AA[aa]BB]
[CC]


Brian Davis
www.knowdotnet.com


