Big D
Hi all,
I'm working on a little app that will go through a text file (right now a
"rich text" document) and parse it into a pseudo-HTML that our Flash
programmers can use in their presentation.
I'm having a lot of trouble, because the RTF format is quite complicated.
At first it seemed that there was no "nesting" of formatting, but
every once in a while it seems like there is. Also, depending on the
complexity of the original document, we may end up with lots of
undecipherable syntax. In other words, it's not as simple as:
{\b this is bold text}{\b\i this is bold and italicised text}
because every once in a while you'll have something like:
{\b bold text}\d\adsfaa\adsagd\aeaqwewe\a\\\\asdf\\{\b\i this is bold and
italicised text}{{/das/dd /d More text} /d/as///jh/}
So there's no way to easily break the content into just the {\format text}
definitions.
It all means something, I'm sure, but rather than work through the whole
RTF spec just to get from RTF to my format, I was hoping there was a simpler
format that the text could be saved as before parsing. The originals are Word
documents. The target pre-parsing format simply needs to include line
breaks, bolding, italicising, and underlining. All other formatting can go
out the window.
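
To make the goal concrete, here is a rough sketch of the kind of
stripped-down parse I have in mind, under some assumptions: formatting is
scoped to the enclosing { } group (which would explain the nesting), only
\b, \i, \ul, \ulnone, \plain, \par and \line matter, and everything else is
dropped. The function name, the output tags, the list of skipped
destinations and the cp1252 decoding are all placeholders of mine, not
anything from a library, and \uN Unicode escapes, tables and so on are not
handled.

```python
import html
import re

# Simplified token pattern (an assumption, not the full RTF grammar).
TOKEN = re.compile(
    r"\\'([0-9A-Fa-f]{2})"        # hex-escaped byte, e.g. \'e9
    r"|\\([A-Za-z]+)(-?\d+)? ?"   # control word, optional parameter, optional space
    r"|\\(.)"                     # control symbol, e.g. \\ \{ \} \*
    r"|([{}])"                    # group open / close
    r"|([^\\{}]+)",               # plain text
    re.DOTALL,
)

TAGS = {"b": "b", "i": "i", "ul": "u"}  # RTF control word -> hypothetical output tag
SKIPPED = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}  # groups to drop
BREAKS = {"par", "line"}


def rtf_to_pseudo_html(rtf):
    """Convert a small subset of RTF to pseudo-HTML with <b>, <i>, <u>, <br>."""
    state = {"b": False, "i": False, "ul": False, "skip": False}
    stack = []  # saved formatting states, one per open group
    runs = []   # [tags, text] pairs; adjacent runs with equal tags are merged

    def emit(text, tags=None):
        if tags is None:
            tags = tuple(TAGS[k] for k in TAGS if state[k])
        if runs and runs[-1][0] == tags:
            runs[-1][1] += text
        else:
            runs.append([tags, text])

    for m in TOKEN.finditer(rtf):
        hexbyte, word, param, symbol, brace, text = m.groups()
        if brace == "{":
            stack.append(dict(state))      # formatting is local to the group
        elif brace == "}":
            if stack:
                state = stack.pop()        # restore the enclosing group's state
        elif state["skip"]:
            continue                       # inside a dropped destination
        elif word is not None:
            if word in SKIPPED:
                state["skip"] = True
            elif word in TAGS:
                state[word] = param != "0"  # \b0, \i0, \ul0 switch the style off
            elif word == "ulnone":
                state["ul"] = False
            elif word == "plain":
                state.update(b=False, i=False, ul=False)
            elif word in BREAKS:
                emit("<br>", ())
            # every other control word is ignored on purpose
        elif symbol is not None:
            if symbol == "*":
                state["skip"] = True       # \* marks an ignorable destination
            elif symbol in "\\{}":
                emit(symbol)               # escaped literal character
        elif hexbyte is not None:
            char = bytes.fromhex(hexbyte).decode("cp1252", errors="replace")
            emit(html.escape(char, quote=False))
        elif text is not None:
            cleaned = text.replace("\r", "").replace("\n", "")  # RTF ignores raw newlines
            if cleaned:
                emit(html.escape(cleaned, quote=False))

    parts = []
    for tags, text in runs:
        opening = "".join(f"<{t}>" for t in tags)
        closing = "".join(f"</{t}>" for t in reversed(tags))
        parts.append(opening + text + closing)
    return "".join(parts)
```

With the simple example above, rtf_to_pseudo_html(r"{\b this is bold
text}{\b\i this is bold and italicised text}") gives
<b>this is bold text</b><b><i>this is bold and italicised text</i></b>.
Each text run is wrapped in its own tags so the output stays well-formed
even when groups nest.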
There are commercial components that handle RTF -> HTML, but that's not
really what I need, and I would have to re-parse their output anyway.
Is there a format that does this? Or does anyone have any good ideas?
Thanks for any input,
MCD