Jonathan said:
To my way of thinking, parsing means constructing meaning from the raw
text.
http://en.wikipedia.org/wiki/Parsing
"In computer science and linguistics, parsing, or, more formally,
syntactic analysis, is the process of analyzing a text, made of a
sequence of tokens (for example, words), to determine its grammatical
structure with respect to a given (more or less) formal grammar."
The above is NOT consistent with your "way of thinking". If you think
it is, then you have misunderstood something in the Wikipedia
definition. The Wikipedia definition clearly describes parsing as an
interpretive step taken _after_ the "raw text" has already been
processed into a more abstract form (i.e. tokenized).
In that definition, "parsing" is synonymous with "syntactic analysis",
which is quite distinct from the "lexical analysis" that takes place at
the "raw text" stage.
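To make the lexical/syntactic split concrete, here is a minimal sketch in Python. The toy grammar `expr := NUM (OP NUM)*` and the names `tokenize` and `parse` are hypothetical and chosen only for illustration. `tokenize` turns raw characters into tokens. `parse` never looks at a character, only at tokens, and checks them against the grammar while it builds a tree.

```python
# Toy grammar (hypothetical, for illustration only):
#   expr := NUM (OP NUM)*    where OP is "+" or "-"

DIGITS = "0123456789"


def tokenize(text):
    """Lexical analysis: raw characters -> list of (kind, value) tokens."""
    tokens, i = [], 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1  # whitespace only separates tokens
        elif ch in DIGITS:
            start = i
            while i < len(text) and text[i] in DIGITS:
                i += 1
            tokens.append(("NUM", text[start:i]))
        elif ch in "+-":
            tokens.append(("OP", ch))
            i += 1
        else:
            raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens


def parse(tokens):
    """Syntactic analysis: check tokens against the grammar and build a
    left-associative tree, e.g. ("-", ("+", 12, 3), 4)."""
    if not tokens or tokens[0][0] != "NUM":
        raise SyntaxError("expected a number")
    tree = int(tokens[0][1])
    pos = 1
    while pos < len(tokens):
        kind, op = tokens[pos]
        if kind != "OP" or pos + 1 >= len(tokens) or tokens[pos + 1][0] != "NUM":
            raise SyntaxError(f"unexpected token sequence at token {pos}")
        tree = (op, tree, int(tokens[pos + 1][1]))
        pos += 2
    return tree


print(parse(tokenize("12 + 3 - 4")))  # ('-', ('+', 12, 3), 4)
```

Note that an input like "1 +" is lexically fine (two valid tokens) but is rejected by `parse`, which is exactly the difference between the two stages.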
FWIW, I've written a language interpreter before. For me, there was
never any question about whether I'd need to parse the raw text, and I
can't imagine that ever being in question for such a task. The text needs
parsing, I'm a programmer, so that's what should be done.
I'm sorry. I don't understand the above paragraph, or what you're
trying to express.
Is this why
we had so much trouble communicating about what I'm trying to do?
You never said up front what you were trying to do. _That_ is the
reason we had trouble communicating about it. Your only question
initially was about a very specific implementation detail,
without any reference to the larger picture.
I looked at this code for a little while. The code is hard to follow
because it's largely table-driven and not just direct processing of the
input.
Which code? On the tokenize page? Or did you follow a link from the
lexer generator page and look at code there?
The C code on the tokenize page is not table-driven at all. The grammar
is embedded in the code itself. (In fact, some might consider that a
fundamental design limitation, as it makes any extension of the grammar
a code maintenance problem instead of a data maintenance problem.)
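For contrast, a table-driven lexer keeps the token rules in data rather than in control flow. Below is a minimal Python sketch, assuming a hypothetical rule table `RULES` of (token kind, regular expression) pairs. Extending the grammar means editing the table, not the loop.

```python
import re

# Hypothetical rule table: each entry is (token kind, regular expression).
# Order matters: at each position the first rule that matches wins.
RULES = [
    ("SKIP", r"\s+"),
    ("NUM", r"[0-9]+"),
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("OP", r"[-+*/=]"),
]

# Combine the table into one pattern of named alternatives.
MASTER = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in RULES))


def tokenize_with_table(text, master=MASTER):
    """Generic driver loop: it knows nothing about the grammar itself."""
    tokens, pos = [], 0
    while pos < len(text):
        m = master.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":  # whitespace is matched but not emitted
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize_with_table("x = 42 + y"))
```

Here `tokenize_with_table("x = 42 + y")` yields IDENT, OP, NUM, OP, IDENT tokens. Adding, say, a STRING token would be one new row in `RULES`; the driver loop stays untouched.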
But I suppose the result is basically that it *is* case
sensitive. So maybe my best approach is to just be case sensitive and
this whole issue will go away. If code using the class wants to be
case-insensitive, then it can simply specify both the upper and lower
case versions of any characters.
That is certainly a valid approach IMHO. Alternatively, you can require
the client code to provide a Comparer<char> for your own code to use.
That way, it has complete control over every aspect of the per-character
comparison as needed and as possible.
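A sketch of the Comparer<char> idea, written in Python for brevity: the matching code never decides what "equal" means; it just calls whatever per-character comparison the client passed in. The names `match_literal`, `ordinal` and `ignore_case` are hypothetical. In .NET the client would pass something like an `IEqualityComparer<char>` instead of a function.

```python
from typing import Callable

# A per-character equality test supplied by the client code
# (a stand-in for the Comparer<char> idea; names are hypothetical).
CharEq = Callable[[str, str], bool]


def ordinal(a: str, b: str) -> bool:
    """Exact, case-sensitive comparison (the default)."""
    return a == b


def ignore_case(a: str, b: str) -> bool:
    """Case-insensitive comparison of two single characters."""
    return a.casefold() == b.casefold()


def match_literal(text: str, pos: int, literal: str, char_eq: CharEq = ordinal) -> bool:
    """True if `literal` occurs in `text` starting at `pos`, comparing one
    character at a time with the caller-supplied `char_eq`."""
    if pos + len(literal) > len(text):
        return False
    return all(char_eq(t, c) for t, c in zip(text[pos:], literal))


print(match_literal("SELECT *", 0, "select"))               # False
print(match_literal("SELECT *", 0, "select", ignore_case))  # True
```

The "as possible" caveat is real: a character whose case mapping spans several characters (German "ß" folds to "ss") cannot be matched correctly one character at a time.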
Yet another alternative would be to require the client code to provide a
Comparer<string> implementation, and handle the comparison _after_ the
lexical analysis has been done (i.e. after the input has been tokenized).
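And a sketch of that last alternative: find token boundaries first, without regard to case, then compare whole tokens using a client-supplied comparison. Again this is Python with hypothetical names (`split_tokens`, `classify`, and a `key` parameter standing in for the equality part of a `Comparer<string>`). In .NET, `StringComparer.OrdinalIgnoreCase` is one ready-made case-insensitive string comparer.

```python
import re
from typing import Callable, Iterable

# Token boundaries: identifiers/keywords, runs of digits, or any other
# single non-space character. Case plays no part in finding them.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|\S")


def split_tokens(text: str) -> list[str]:
    return TOKEN.findall(text)


def classify(tokens: Iterable[str], keywords: Iterable[str],
             key: Callable[[str], str] = lambda s: s) -> list[tuple[str, str]]:
    """Tag whole tokens as KEYWORD or OTHER. `key` plays the role of the
    client-supplied string comparer: tokens match when their keys are equal."""
    wanted = {key(k) for k in keywords}
    return [("KEYWORD" if key(t) in wanted else "OTHER", t) for t in tokens]


tokens = split_tokens("Begin x := 1; END")
print(classify(tokens, ["begin", "end"]))                    # case-sensitive
print(classify(tokens, ["begin", "end"], key=str.casefold))  # case-insensitive
```

The lexer is identical in both runs; only the comparison applied to finished tokens changes, so "Begin" and "END" become keywords only when the client asks for case-insensitive matching.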
Personally, I'd go for that last approach. I can't think of a single
programming language where tokens would be delimited in a way that
character case would matter. At worst, there's a transition between
alphanumeric and non-alphanumeric characters, and often there will be
whitespace, commas, semi-colons, etc. between tokens. Only the
and of course in said:
Dude?!? Still with the "state what you're trying to do?" Between the two
threads, I've done that repeatedly now.
That's not true at all. The first – and ONLY – time you mentioned the
fact that you're trying to write an HTML parser was in the _other_
thread, and of course when I wrote the above, I hadn't even seen that
post in the other thread.
If you think that your statements about trying to write a "generic
parser" added anything informative with respect to my question about
"exactly what you're trying to parse", unfortunately you would be
mistaken about that. As I alluded to in my previous reply, the entire
idea of a globally "generic parser" is very abstract, and more a topic
for a graduate student's research, or a doctoral thesis. Such a
statement provides no practical insight into the actual design
requirements of what you're trying to accomplish.
If you have a question about what I'm doing, you can ask.
I have asked. And if you think it's fine for me to ask, why are you
giving me grief about asking? I would not ask a question that I already
know the answer to, so if I ask a question you _think_ you've answered,
then it should be clear to you that somehow, the answer was not
conveyed. I either misunderstood your reply, or your reply did not
actually answer the question you thought it did.
Either way, reiterating the question is the only way to make forward
progress.
I have no idea what part of my repeated
descriptions are unclear to you.
Again, there have been no "repeated descriptions" of the specific
application. You have mentioned the HTML parser application only once,
and that was in a post I had not had a chance to read when I wrote the
text (quoted above) that you are apparently so upset about.
Pete