Dave said:
Greetings,
Is anybody aware of any code that will allow me to read .rtf or .doc or
.pdf or .htm as plain text (so I can use a StreamReader on them)? Thanks,
Each format would require a different tool. Microsoft Word can open .rtf and,
of course, .doc files, and save them as plain text.
For PDF, check out pdftotext.exe from the Xpdf package:
http://www.foolabs.com/xpdf/download.html
From their web site:
"Xpdf is an open source viewer for Portable Document Format (PDF) files.
(These are sometimes also called 'Acrobat' files, from the name of
Adobe's PDF software.) The Xpdf project also includes a PDF text extractor,
PDF-to-PostScript converter, and various other utilities.
Xpdf runs under the X Window System on UNIX, VMS, and OS/2. The non-X
components (pdftops, pdftotext, etc.) also run on Win32 systems and should
run on pretty much any system with a decent C++ compiler."
It's a command-line tool, so you would need to shell out to it and then open
a StreamReader on the output file.
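
Here is a rough sketch of that shell-out-then-read pattern. It is written in Python only to keep it short; in .NET the same two steps would be starting a process and then opening a StreamReader on the file it wrote. The function names (build_pdftotext_command, pdf_to_text) are made up for illustration, and the sketch assumes that pdftotext is on the PATH and that its output can be read as UTF-8. The only thing taken from the tool itself is the basic invocation, pdftotext <PDF-file> <text-file>.

```python
import subprocess
import tempfile
from pathlib import Path


def build_pdftotext_command(pdf_path, txt_path, exe="pdftotext"):
    """Return the argument list for: pdftotext <PDF-file> <text-file>."""
    # 'exe' defaults to a name looked up on PATH (an assumption); pass a
    # full path such as r"C:\tools\pdftotext.exe" if it lives elsewhere.
    return [exe, str(pdf_path), str(txt_path)]


def pdf_to_text(pdf_path, exe="pdftotext"):
    """Shell out to pdftotext, then read back the text file it wrote."""
    with tempfile.TemporaryDirectory() as tmp:
        txt_path = Path(tmp) / "out.txt"
        # check=True raises CalledProcessError if the tool exits non-zero.
        subprocess.run(build_pdftotext_command(pdf_path, txt_path, exe), check=True)
        # UTF-8 is an assumption about the output encoding; errors="replace"
        # keeps unexpected bytes from raising a decode error.
        return txt_path.read_text(encoding="utf-8", errors="replace")
```

Calling pdf_to_text("report.pdf") (a hypothetical file name) would return the extracted text as one string. The temporary directory is removed once the text has been read, so no output file is left behind.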
David