Search inside PDF and CHM files

Ya Ya

I have a folder with a lot of PDF and CHM files.
I would like to develop an ASP.net application that enables the user to
search inside the content of those files.
How do I search inside these types of files?

Thanks for your time
 
Adobe provides an add-on for Indexing Services, in the form of a DLL (the PDF
IFilter), that enables Indexing Services to search PDF files. That would let
you use Indexing Services to search those files easily. CHM files are another
matter, though, and I'm not sure what the best way to search them is.

Hope this helps,
Mark Fitzpatrick
Microsoft MVP - FrontPage
 
For .CHM files, there are several pages that describe the file format. The
format is officially undocumented, so rely on this information at your own
risk:

http://bonedaddy.net/pabs3/hhm/
http://www.speakeasy.org/~russotto/chm/

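As a starting point with those notes, here is a minimal Python sketch (the
function names `read_chm_header` and `is_chm` are my own, not from the linked
pages). It checks the "ITSF" signature a .chm file begins with and reads the
two little-endian 32-bit fields after it, assumed here to be the format
version and header length. It does not locate or decompress any content: the
pages inside a CHM are stored LZX-compressed behind a directory structure,
and that is where most of the work is.

```python
import struct

# Assumed layout from the unofficial CHM notes: the file starts with the
# 4-byte signature "ITSF", followed by two little-endian uint32 values
# (format version and header length).
ITSF_SIGNATURE = b"ITSF"
ITSF_PREFIX = struct.Struct("<4sII")  # 12 bytes in total


def read_chm_header(data: bytes) -> dict:
    """Parse the signature, version and header length from the start of a CHM file."""
    if len(data) < ITSF_PREFIX.size:
        raise ValueError("too short to be a CHM file")
    signature, version, header_length = ITSF_PREFIX.unpack_from(data, 0)
    if signature != ITSF_SIGNATURE:
        raise ValueError("missing ITSF signature; not a CHM file")
    return {"version": version, "header_length": header_length}


def is_chm(path: str) -> bool:
    """Cheap check for picking CHM files out of a folder before parsing them."""
    with open(path, "rb") as f:
        head = f.read(ITSF_PREFIX.size)
    try:
        read_chm_header(head)
    except ValueError:
        return False
    return True
```

Running `is_chm` over each file in the folder tells you which ones to hand to
a real CHM parser; walking the directory and doing the LZX decompression
described on those pages would come next.
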
As for PDF files, Adobe publishes the format specification. I'm not sure
exactly where it is, but it's out there. You can also take a look at the open
source project XPDF, a PDF viewer for the X Window System on Unix; you should
be able to learn quite a bit from its source.
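
To show why going this route gets complicated, here is a deliberately naive
Python sketch (the function names are hypothetical). It checks the `%PDF-`
header, inflates any `stream ... endstream` body that decompresses with zlib
(FlateDecode), and searches the literal strings passed to the `Tj` text
operator. It ignores the `/Length` entry, other filters, `TJ` arrays, escaped
parentheses, hex strings, font encodings and encryption, any of which a real
document may use, so treat it as an illustration, not a working extractor.

```python
import re
import zlib

# Naive sketch: assumes text is drawn with the simple "(literal) Tj" form
# and that compressed streams use FlateDecode (zlib).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.S)
TJ_RE = re.compile(rb"\((.*?)\)\s*Tj", re.S)


def inflate_if_possible(body: bytes) -> bytes:
    """Return the zlib-inflated stream body, or the raw body if it isn't zlib data."""
    try:
        # decompressobj tolerates the end-of-line bytes left before "endstream"
        return zlib.decompressobj().decompress(body)
    except zlib.error:
        return body


def pdf_text_fragments(data: bytes) -> list[str]:
    """Collect the literal strings shown with Tj in every stream of the file."""
    fragments = []
    for match in STREAM_RE.finditer(data):
        content = inflate_if_possible(match.group(1))
        for literal in TJ_RE.findall(content):
            fragments.append(literal.decode("latin-1"))
    return fragments


def pdf_contains(data: bytes, term: str) -> bool:
    """Case-insensitive search for term in the Tj text of a PDF given as bytes."""
    if not data.startswith(b"%PDF-"):
        raise ValueError("missing %PDF- header; not a PDF file")
    needle = term.lower()
    return any(needle in fragment.lower() for fragment in pdf_text_fragments(data))
```

To search the folder you would read each file's bytes and call
`pdf_contains(data, term)`; on real-world PDFs this will miss a lot of text,
which is the point of the warning below.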

Both of these formats are fairly complex, so parsing them yourself is not a
simple way to get at the text you want.

Pete Davis