Text Parsing with Qualifiers


Lucas Tam

Hi all,

Does anyone know of a GOOD example on parsing text with text qualifiers?

I am hoping to parse text with variable-length delimiters/qualifiers. Also,
qualified text could run onto multiple lines and contain characters like
vbCrLf (thus the multiple lines).

Anyhow, any help would be appreciated. Thanks!
 
Lucas said:
Does anyone know of a GOOD example on parsing text with text qualifiers?

What exactly do you mean by text qualifiers? Characters?

Parsing strings has become even easier in .NET than it was in VB6, and it's
certainly easier than in C++. Have you taken a look at the String class? It
contains many methods for manipulating strings:

http://msdn.microsoft.com/library/d...us/cpref/html/frlrfsystemstringclasstopic.asp

You could also look into regular expressions for examining strings for
patterns. They are a bit fiddly to get the hang of at first, but they are
very, very useful. I recently replaced an HTML parsing routine of mine with a
regular expression alternative, and the code size has been dramatically
reduced.
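For the common single-character case, a regex along these lines can pick out each field. This is a minimal sketch in Python rather than VB.NET, assuming a comma delimiter, a double-quote qualifier, and `""` as an escaped quote inside a qualified field; `FIELD` and `split_csv_line` are hypothetical names. Note that Python spells named groups `(?P<name>...)` where .NET uses `(?<name>...)`.

```python
import re

# Assumed field pattern: a "..." run in which "" stands for a literal quote,
# or else an unquoted run containing no comma or quote character.
FIELD = re.compile(r'"(?P<quoted>(?:[^"]|"")*)"|(?P<unquoted>[^,"]*)')

def split_csv_line(line: str) -> list[str]:
    """Split one comma-separated line, honouring double-quote qualifiers."""
    fields = []
    pos = 0
    while True:
        m = FIELD.match(line, pos)  # always matches: unquoted may be empty
        if m.group("quoted") is not None:
            fields.append(m.group("quoted").replace('""', '"'))
        else:
            fields.append(m.group("unquoted"))
        pos = m.end()
        if pos == len(line):
            return fields
        if line[pos] != ",":
            raise ValueError(f"unexpected character at position {pos}")
        pos += 1  # skip the delimiter and read the next field
```

On the sample line `"""This is a Quote""",01/01/2003,"Some Interesting Text, Here"` this returns `['"This is a Quote"', '01/01/2003', 'Some Interesting Text, Here']`. A stray quote after a closed field raises `ValueError` instead of being silently absorbed.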

http://msdn.microsoft.com/library/d...stemtextregularexpressionsregexclasstopic.asp

Anyway, I hope this information helps :-) If you let me know a little more
about what kind of strings you want to manipulate, I might be able to give
you some more tips.

Nick.

--
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
"No matter. Whatever the outcome, you are changed."

Fergus - September 5th 2003
/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
 
Nak said:
What exactly do you mean by text qualifiers? Characters?

Ya, I am hoping to parse strings like:


"""This is a Quote""",01/01/2003,"Some Interesting Text, Here"

etc etc.

I've seen sample code that only handles single-character delimiters/text
qualifiers, but I am hoping to find code that can handle text
qualifiers/delimiters of any length.

It's not TOO hard to parse such text, but if someone else has already
written some good code, might as well use it.
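Since the request is for qualifiers and delimiters of any length, with qualified fields allowed to span line breaks, here is a minimal non-regex sketch in Python. It rests on assumptions that do not come from the thread: a doubled qualifier inside a qualified field stands for one literal qualifier (as in the `"""This is a Quote"""` sample), records end with a separate record separator defaulting to `\r\n` (vbCrLf), and the delimiter, qualifier and record separator are non-empty and not prefixes of one another. `parse_records` is a hypothetical name.

```python
def parse_records(text: str, delimiter: str = ",", qualifier: str = '"',
                  newline: str = "\r\n") -> list[list[str]]:
    """Split text into records of fields using arbitrary-length tokens.

    Inside a qualified field, a doubled qualifier means one literal
    qualifier; delimiters and newlines there are kept as data.
    """
    if not (delimiter and qualifier and newline):
        raise ValueError("delimiter, qualifier and newline must be non-empty")
    records: list[list[str]] = []
    fields: list[str] = []
    pos, n = 0, len(text)
    while True:
        if text.startswith(qualifier, pos):
            # Qualified field: collect text up to the closing qualifier.
            pos += len(qualifier)
            parts = []
            while True:
                end = text.find(qualifier, pos)
                if end == -1:
                    raise ValueError("unterminated qualified field")
                parts.append(text[pos:end])
                pos = end + len(qualifier)
                if not text.startswith(qualifier, pos):
                    break  # a single qualifier closes the field
                parts.append(qualifier)  # doubled qualifier -> literal text
                pos += len(qualifier)
            fields.append("".join(parts))
        else:
            # Unqualified field: runs to the next delimiter, newline or end.
            hits = [i for i in (text.find(delimiter, pos),
                                text.find(newline, pos)) if i != -1]
            end = min(hits, default=n)
            fields.append(text[pos:end])
            pos = end
        # After each field expect end of text, a delimiter or a newline.
        if pos >= n:
            records.append(fields)
            return records
        if text.startswith(delimiter, pos):
            pos += len(delimiter)
        elif text.startswith(newline, pos):
            records.append(fields)
            fields = []
            pos += len(newline)
            if pos >= n:
                return records  # trailing newline: no empty final record
        else:
            raise ValueError(f"unexpected text at position {pos}")
```

Because the qualifier and delimiter are plain strings, calling it with `qualifier="quote"` and `delimiter="comma"` also handles the spelled-out variant suggested in the next reply. Scanning with `str.find` keeps the logic readable; a regex version is possible but harder to generate for arbitrary tokens.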
 
Lucas said:
Ya, I am hoping to parse strings like:


"""This is a Quote""",01/01/2003,"Some Interesting Text, Here"

etc etc.

I've seen sample code that only handles single-character delimiters/text
qualifiers, but I am hoping to find code that can handle text
qualifiers/delimiters of any length.

So you really mean something like:

quotequotequoteThis is a Quotequotequotequotecomma01/01/2003commaquoteSome
Interesting Textcomma Herequote

( :-) )
It's not TOO hard to parse such text, but if someone else has already
written some good code, might as well use it.

If anyone has some public VB (.NET or otherwise) code for generic handling of
this sort of thing, I'd like to see it too, but in .NET the best option is a
custom Regex, probably with extra code to handle context.
 
Mark said:
So you really mean something like:

quotequotequoteThis is a Quotequotequotequotecomma01/01/2003commaquoteSome
Interesting Textcomma Herequote

( :-) )


If anyone has some public VB (.NET or otherwise) code for generic handling
of this sort of thing, I'd like to see it too, but in .NET the best option
is a custom Regex, probably with extra code to handle context.

I should add: if you're talking about parsing anything more complex, you
should look at .NET versions of lex and yacc, etc.
 
Mark said:
So you really mean something like:

quotequotequoteThis is a Quotequotequotequotecomma01/01/2003commaquoteSome
Interesting Textcomma Herequote

Exactly! I'm trying to build an import routine that is as flexible as
possible. Who knows, maybe someone does use odd delimiters like that : )
 
Mark said:
I should add: if you're talking about parsing anything more complex, you
should look at .NET versions of lex and yacc, etc.

Ah, I used Yacc briefly with Java. I didn't know it existed for .NET.
Thanks for the tip!
 
Lucas said:
Exactly! I'm trying to build an import routine that is as flexible as
possible. Who knows, maybe someone does use odd delimiters like that : )

When I posted the "comma"-separated values example, I was going to provide a
Regex for it, but at the time I didn't have enough time...

Here it is:

((((quote)(?<quoted>(([^q])|(q[^u])|(qu[^o])|(quo[^t])|(quot[^e])|((quote)(quote)))*)(quote))|(?<unquoted>(([^c])|(c[^o])|(co[^m])|(com[^m])|(comm[^a]))*))((comma)|$)

I've put in a couple of pairs of brackets to highlight how this could be
produced by an automated generator...

The intended use of the above regex is to loop through all matches, checking
that there are no unmatched gaps (syntax errors) and ignoring the null match
at the end of the string. The <quoted> group needs to have quotequote reduced
to quote (.Replace("quotequote", "quote")), and only one of <quoted> or
<unquoted> should have any content.
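The generator idea and the intended match loop can be sketched roughly as follows. This is a Python sketch under stated assumptions, not a transcription of the regex above: it swaps the hand-expanded `[^q]|q[^u]|...` alternatives for a negative lookahead `(?!...)` built from `re.escape` of the qualifier and delimiter, uses `\Z` for end of text, and (a deliberate change) forbids the qualifier inside unqualified fields so that a stray qualifier surfaces as a syntax error. Anchoring each `match` at the end of the previous one plays the role of the gap check, and a named `sep` group tells a genuinely empty last field apart from the null match at the end. `build_field_regex` and `split_fields` are hypothetical names, and the lookahead form assumes a qualifier that cannot overlap itself (as with `"` or `quote`).

```python
import re

def build_field_regex(qualifier: str, delimiter: str) -> re.Pattern:
    """Generate a one-field regex for arbitrary qualifier/delimiter strings."""
    q, d = re.escape(qualifier), re.escape(delimiter)
    return re.compile(
        rf"(?:{q}(?P<quoted>(?:(?!{q}).|{q}{q})*){q}"   # qualified field
        rf"|(?P<unquoted>(?:(?!{d}|{q}).)*))"           # plain field
        rf"(?:(?P<sep>{d})|\Z)",                        # delimiter or end of text
        re.DOTALL,  # let qualified fields span line breaks
    )

def split_fields(text: str, qualifier: str = '"', delimiter: str = ",") -> list[str]:
    """Loop over field matches with no gaps, unescaping doubled qualifiers."""
    if not qualifier or not delimiter:
        raise ValueError("qualifier and delimiter must be non-empty")
    rx = build_field_regex(qualifier, delimiter)
    fields = []
    pos = 0
    while True:
        m = rx.match(text, pos)  # anchored at pos, so nothing can be skipped
        if m is None:
            raise ValueError(f"syntax error at position {pos}")
        if m.group("quoted") is not None:
            fields.append(m.group("quoted").replace(qualifier * 2, qualifier))
        else:
            fields.append(m.group("unquoted"))
        if m.group("sep") is None:  # matched end of text: last field done
            return fields
        pos = m.end()
```

With `qualifier="quote"` and `delimiter="comma"`, the spelled-out example from earlier in the thread yields `['quoteThis is a Quotequote', '01/01/2003', 'Some Interesting Textcomma Here']`. Stopping on the end-of-text branch, rather than collecting every match and dropping the last, keeps an input such as `a,` returning two fields.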

Can someone confirm whether there's an optimisation for this Regex using the
extended grouping features?
 