Regex.Matches Problem

I am currently working on a project and need the call to return, even if that
return is a failure. I should add that I have no control over either the
regular expression that will be used or the text file that will be parsed,
and while my example text is XML, there is no guarantee that the text file
will be XML.

This is a portion of the file I am testing with:
<Stock ticker="ACEGX" type="EQUITY" title="VAN KAMPEN STRATEGIC GROWTH FUND CLASS A" price="44.47" net="-0.01" volume="0" />
<Stock ticker="ACSRX" type="EQUITY" title="VAN KAMPEN COMSTOCK FUND CL R" price="19.85" net="0.02" volume="0" />
<Stock ticker="ADP" type="EQUITY" title="Automatic Data Processing Inc." price="50.58" net="-0.11" volume="422600" />
<Stock ticker="AEE" type="EQUITY" title="Ameren Corp." price="53.26" net="-0.41" volume="129800" />
<Stock ticker="AEP" type="EQUITY" title="American Electric Power Company Inc." price="45.56" net="-0.39" volume="753100" />

Everything works as it should if I use the Regular Expression:
Stock ticker="(?<ticker>.*?)" type="(?<type>.*?)" title="(?<title>.*?)" price="(?<price>.*?)" net="(?<net>.*?)" volume="(?<volume>.*?)"

But if I miss 1 space (“net=” instead of “ net=”) there is no return, it
locks up (I let it run all night, it’s not just slow)… (The bad expression):
Stock ticker="(?<ticker>.*?)" type="(?<type>.*?)" title="(?<title>.*?)" price="(?<price>.*?)"net="(?<net>.*?)" volume="(?<volume>.*?)"

I need a way to set up the Regex that will not lock up…

Any ideas?

(If this is not the right place to post this question, please let me know
which forum would be better)
 
Hello JAul,

I'm wondering what exactly your question is. As I understand you, you have
one expression that works and one that doesn't. The one that doesn't work
also locks up the system for an unknown reason - why do you care, as it
doesn't work anyway?
But if I miss 1 space (“net=” instead of “ net=”) there is no return, it
locks up (I let it run all night, it’s not just slow)… (The bad expression):
Stock ticker="(?<ticker>.*?)" type="(?<type>.*?)" title="(?<title>.*?)" price="(?<price>.*?)"net="(?<net>.*?)" volume="(?<volume>.*?)"

I need a way to set up the Regex that will not lock up…

Seems to me as if the answer was "use the one with the right spacing (as
you're likely to do anyway, as the other doesn't even work) and you'll be
fine".

I'm sure I'm somehow missing the point.


Oliver Sturm
 
Thank you for the reply and sorry I was not clear, but I have no control over
the expression or the document. A user will enter the parsing expression and
point at the document, and my application needs to report the matches.

If the expression is way off it works, it reports no matches, which is
correct. The problem is if the expression is close but not quite right.

My question is about setting up the regex call...

Regex m_RegEx = new Regex(m_strExpression);
MatchCollection m_Matches = m_RegEx.Matches(m_strTextToParse);
return m_Matches.Count;

The problem is that the m_Matches = m_RegEx.Matches(m_strTextToParse); call
never returns if the expression (m_strExpression) is even a tad off.

What I need is advice on how to set up the call (C# code) so that it will not
lock up and will always return (even if the return is just a failure).
 
Hello JAul,
Thank you for the reply and sorry I was not clear, but I have no control over
the expression or the document. A user will enter the parsing expression and
point at the document, and my application needs to report the matches.

Right, I understand that.
If the expression is way off it works, it reports no matches, which is
correct. The problem is if the expression is close but not quite right.

Okay - but you say you're not the one who writes that expression, right?
My question is about setting up the regex call...

<snip>

You seem to be under the impression that there's something you can do
about the call ("setting it up" - what's that supposed to mean?) that
would influence whether or not the expression works. I don't understand
what you imagine you could do. If the expression is wrong, it's wrong - it
won't work under any circumstances.

Actually I would say that if you have an expression that makes the call to
the Matches() method never return, you've found a bug in the regex
implementation. It would probably be good if you'd report that to
Microsoft. But that won't help you now - there's no trick you can use from
the outside to make that method return if it hangs due to that bug.


Oliver Sturm
 
Thank you Oliver, that is what I was afraid of... I will report the bug and
see what I get back from Microsoft.

I was hoping that there was something that could be done with the m_Matches
= m_RegEx.Matches(m_strTextToParse) call that would fix the lack of a return.
 
It may not be a bug - some regexes can take a very long time to process even
though they look simple.
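
To illustrate why a near-miss can be so slow: each lazy .*? is free to run
past a closing quote, so when the literal "net= is never found the engine
retries every combination of later type=, title= and price= positions before
giving up on a starting point. The sketch below is only a demonstration under
assumptions, not the original poster's code. BuildInput is a hypothetical
helper that puts every element on one line (by default . does not cross a
newline, so the effect depends on long lines or on RegexOptions.Singleline),
and the element counts are arbitrary. The elapsed time should be expected to
climb steeply as the count grows.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

static class BacktrackingDemo
{
    // The near-miss expression from the thread: the space before net= is missing.
    const string NearMiss =
        "Stock ticker=\"(?<ticker>.*?)\" type=\"(?<type>.*?)\" title=\"(?<title>.*?)\" " +
        "price=\"(?<price>.*?)\"net=\"(?<net>.*?)\" volume=\"(?<volume>.*?)\"";

    // Hypothetical helper: builds one long line holding `count` Stock elements.
    static string BuildInput(int count)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append("<Stock ticker=\"T" + i + "\" type=\"EQUITY\" title=\"X\" ");
            sb.Append("price=\"1.00\" net=\"0.01\" volume=\"0\" />");
        }
        return sb.ToString();
    }

    // Counts matches of the near-miss expression; Count forces the full scan.
    public static int CountNearMisses(int count)
    {
        return Regex.Matches(BuildInput(count), NearMiss).Count;
    }

    static void Main()
    {
        // Small, arbitrary sizes so the demonstration itself still finishes.
        foreach (int count in new[] { 4, 8, 16, 32 })
        {
            var sw = Stopwatch.StartNew();
            int matches = CountNearMisses(count);
            sw.Stop();
            Console.WriteLine($"{count} elements: {matches} matches, {sw.ElapsedMilliseconds} ms");
        }
    }
}
```

No element ever matches, because the input always has a space before net=, so
all of the time is spent on attempts that fail.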

Perhaps you could do a test when the user has entered the regex: start the
regex on some test data in a separate thread and if that thread takes too
long, kill the thread and tell the user to try again.

I have no knowledge of creating/killing threads.

Andrew
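
As an alternative to creating and killing a thread by hand: .NET Framework 4.5
and later let the regex engine enforce a time limit itself, through the Regex
constructor overload that takes a match timeout. The following is a minimal
sketch under that assumption, not a definitive implementation. SafeMatcher,
CountMatches and the -1 failure code are hypothetical names and conventions
chosen for illustration. Note that the MatchCollection is evaluated lazily, so
the work (and any timeout) happens when Count is read.

```csharp
using System;
using System.Text.RegularExpressions;

static class SafeMatcher
{
    // Returns the number of matches, or -1 (a hypothetical failure code) if
    // matching exceeded the timeout or the pattern could not be parsed.
    public static int CountMatches(string expression, string text, TimeSpan timeout)
    {
        try
        {
            var regex = new Regex(expression, RegexOptions.None, timeout);
            // Count forces every match to be found, so a runaway expression
            // raises RegexMatchTimeoutException here instead of hanging.
            return regex.Matches(text).Count;
        }
        catch (RegexMatchTimeoutException)
        {
            return -1; // took longer than the allowed time
        }
        catch (ArgumentException)
        {
            return -1; // the user-supplied pattern is not a valid expression
        }
    }
}
```

With the variables from the earlier snippet, a call might look like
SafeMatcher.CountMatches(m_strExpression, m_strTextToParse, TimeSpan.FromSeconds(5)),
where the five-second limit is an arbitrary choice.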
 