Hello wizards,
I need to accomplish the following task:
- Iterate through a large directory structure of files
- For each file found that is an image-only PDF (no text),
  OCR the file and save the result as a text PDF named
  origfilename_OCRed in the same folder where it was found.
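A minimal Python sketch of the walk-and-rename part of the task, assuming Python 3.8+. It leaves out the OCR step itself. The helper names `ocr_output_path` and `find_ocr_jobs` are hypothetical. Skipping files that already end in _OCRed, or that already have an _OCRed output, is an added assumption so that a re-run neither OCRs its own output nor redoes finished work:

```python
from pathlib import Path
from typing import Iterator, Tuple

SUFFIX = "_OCRed"  # naming convention taken from the task description


def ocr_output_path(pdf: Path) -> Path:
    """Return <same folder>/<origfilename>_OCRed.<ext> for a given PDF."""
    return pdf.with_name(pdf.stem + SUFFIX + pdf.suffix)


def find_ocr_jobs(root: Path) -> Iterator[Tuple[Path, Path]]:
    """Recursively yield (input, output) pairs for every PDF under root.

    Assumption: files that already carry the _OCRed suffix, and PDFs whose
    _OCRed output already exists, are skipped so a second run is harmless.
    """
    for pdf in sorted(root.rglob("*")):
        # Match .pdf case-insensitively and ignore directories.
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue
        if pdf.stem.endswith(SUFFIX):
            continue
        out = ocr_output_path(pdf)
        if out.exists():
            continue
        yield pdf, out
```

Each (input, output) pair can then be handed to whatever command-line OCR tool turns out to be available.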
Despite a lot of searching and trying several OCR programs,
I have not been able to find a solution for OCRing from the
command line and converting multiple image PDFs into
multiple text PDF documents. I'd be happy with a solution
on either Windows or Linux that doesn't cost huge $ and is
reasonably accurate. Neither OmniPage nor FineReader, for
instance, appears to have command-line options.
As a bonus, I'd love any ideas on how to recognize from the
command line whether a PDF file is image-only or text, since
I only want to OCR the image PDF files.
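One common heuristic for the bonus question, sketched in Python: extract the text layer with `pdftotext` from poppler-utils and treat the file as image-only if nothing but whitespace comes back. This assumes pdftotext is installed and on PATH. The function names and the `min_chars` threshold are hypothetical. A scanned file with a small stray text stamp would count as "text" unless `min_chars` is raised:

```python
import shutil
import subprocess
from pathlib import Path


def looks_image_only(extracted_text: str, min_chars: int = 1) -> bool:
    """Heuristic: a PDF is image-only if its text layer is (nearly) empty."""
    # pdftotext may emit form feeds (\f) between pages even with no text,
    # so count only non-whitespace characters.
    visible = sum(1 for ch in extracted_text if not ch.isspace())
    return visible < min_chars


def extract_text(pdf: Path) -> str:
    """Run poppler's pdftotext; an output name of '-' sends text to stdout."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (poppler-utils) is not on PATH")
    result = subprocess.run(
        ["pdftotext", str(pdf), "-"],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # tolerate odd bytes rather than crash mid-walk
        check=True,
    )
    return result.stdout


def is_image_only_pdf(pdf: Path, min_chars: int = 1) -> bool:
    """Combine extraction and the emptiness test for one file."""
    return looks_image_only(extract_text(pdf), min_chars)
```

Keeping the emptiness test separate from the subprocess call makes it easy to tune the threshold on a few sample files before running it over the whole tree.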
Thanks in advance!
--Ed Rozenberg