combining images for OCR conversion

dmehling · Jul 31, 2007

This might sound like a pretty far-out idea, but is it possible to
combine separate images of a single page of text and convert them by
means of OCR into a single page in a PDF file or any other type of
text-based document? For example, if I were scanning a rather large
book where I needed two scans to capture a single page, would it be
difficult to convert those images into a single page of text?

Charlie Hoffpauir · Aug 1, 2007

No, with an image editor you could easily combine two graphic images
into one, then run OCR software on the combined image and produce one
page of text. However, with decent OCR software that isn't necessary.
It's easy to scan individual pages as graphics, then OCR them all into
one continuous document, say, an MS Word document.

dmehling · Aug 1, 2007

That's not quite what I meant. What I had in mind is this: if I
scanned part of one page, then made another scan to capture the rest
of the page, both scans would share overlapping portions of the page.
I suppose I could take care of that with a program like Photoshop, but
that would take some degree of manual work from the user. I was
thinking of something that could do this automatically with a single
click, or even as a batch process over several images.

Don · Aug 1, 2007

In scanning and OCR, such practices (an ADF or automated manipulation)
trade quality for convenience.

Best practice is to make multiple scans and OCR each scan
individually, then combine the results into one complete end file.

The Library of Congress and Google's Books Project are prime examples
of automation errors.
Pages that remain legible to the naked eye and human brain despite
bleed-through are unusable by scanners and OCR software.
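
The "OCR each scan individually, then combine" approach can be
sketched in a few lines. This is only a rough illustration, not any
particular product's method: it assumes the OCR step has already
produced a list of text lines for each partial scan, that the second
scan starts somewhere inside the first, and that repeated lines may
differ slightly because of OCR errors. The function names and the 0.85
similarity threshold are made up for the example.

```python
from difflib import SequenceMatcher


def lines_match(a, b, threshold=0.85):
    """Treat two OCR'd lines as the same text if they are nearly identical."""
    return SequenceMatcher(None, a.strip(), b.strip()).ratio() >= threshold


def merge_ocr_text(top_lines, bottom_lines, threshold=0.85):
    """Join the text of two overlapping partial scans, dropping repeated lines.

    Tries the largest possible overlap first: the last `size` lines of the
    top scan are compared with the first `size` lines of the bottom scan.
    """
    max_overlap = min(len(top_lines), len(bottom_lines))
    for size in range(max_overlap, 0, -1):
        tail = top_lines[-size:]
        head = bottom_lines[:size]
        if all(lines_match(a, b, threshold) for a, b in zip(tail, head)):
            # Keep the top scan's copy of the shared lines.
            return top_lines + bottom_lines[size:]
    # No overlap found: fall back to plain concatenation.
    return top_lines + bottom_lines


top = ["Chapter One", "It was a dark night.", "The wind howled."]
bottom = ["It was a dark nlght.", "The wind howled.", "Nobody slept."]
merged = merge_ocr_text(top, bottom)
```

Here the second scan misreads "night" as "nlght", but the fuzzy
comparison still recognizes the two shared lines, so each appears only
once in the merged text.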

Charlie Hoffpauir · Aug 2, 2007

Again, that's rather trivial. Lots of software will do that matching
automatically; results vary a lot with what you pay. PhotoStitch, the
software that came free with my old Canon G3 camera, does a reasonable
job. I'm sure if you bought a similar program you could get better
results. I just did a Google search for "photo stitch software" and
got LOTS of hits. Some appear to be free, or at least offer a free
trial.
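
For a rough idea of what "matching automatically" involves, here is a
much simplified sketch. It is not how PhotoStitch or any other real
stitching program works internally. It assumes both scans are
grayscale, have the same width, are not rotated or shifted sideways,
and are held as plain lists of pixel rows. The function names and the
tolerance value are invented for the example.

```python
def row_difference(row_a, row_b):
    """Mean absolute difference between two equal-length rows of pixels."""
    return sum(abs(a - b) for a, b in zip(row_a, row_b)) / len(row_a)


def find_overlap(top, bottom, tolerance=8.0):
    """Return how many rows at the end of `top` repeat at the start of `bottom`.

    Candidate overlaps are tried from largest to smallest, and the first one
    whose average pixel difference is within `tolerance` is accepted.
    """
    for size in range(min(len(top), len(bottom)), 0, -1):
        pairs = zip(top[-size:], bottom[:size])
        score = sum(row_difference(a, b) for a, b in pairs) / size
        if score <= tolerance:
            return size
    return 0


def stitch_vertical(top, bottom, tolerance=8.0):
    """Join two scans into one image, keeping the shared rows only once."""
    overlap = find_overlap(top, bottom, tolerance)
    return top + bottom[overlap:]


# A fake 10-row "page"; each row is 4 pixels of one gray level.
page = [[i * 10] * 4 for i in range(10)]
top_scan = page[:7]
# The second scan starts 3 rows before the first one ends and is
# slightly brighter, as two real scans of the same area would differ.
bottom_scan = [[v + 1 for v in row] for row in page[4:]]

overlap_rows = find_overlap(top_scan, bottom_scan)
stitched = stitch_vertical(top_scan, bottom_scan)
```

Real scans are rarely aligned this neatly, which is one reason results
vary so much between programs.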

orink · Aug 4, 2007

I use the full version of Adobe Acrobat. It lets you work with
different page sizes and can combine pages, delete pages individually
or in a group, and rearrange them. You might check the computer
section of a bookstore, near the photography books, for the books
written for Acrobat version 7 or 8. You would get a free read on the
types of things that are possible. I have used it mainly for capturing
web pages and sites, but it might work with scanners as well. I
haven't been too impressed with any company's OCR up to this point.
As Ever,
Orin

Barry Watzman · Aug 5, 2007

You would have to "stitch" the two partial scans of the page together
into a single image of the entire page with a stitching or photo
editing program. The full version of Adobe Acrobat could then import
the resulting file (TIFF, JPEG, whatever). Full-version Acrobat is a
very good program and will let you do almost anything, but one thing
it won't do is combine multiple scans of parts of a single page into a
single image. [However, if you were willing to settle for having each
of those partial page scans be a separate page in the PDF file, then
it could do that all by itself.]
