combining images for OCR conversion

  • Thread starter: dmehling

dmehling

This might sound like a pretty far-out idea, but is it possible to
combine separate images of a single page of text and convert them by
means of OCR into a single page in a PDF file or any other type of
text-based document? For example, if I were scanning a rather large
book and needed two scans to capture a single page, would it be
difficult to convert those images into a single page of text?
 
No. With an image editor you could easily combine two graphic images
into one, then run OCR software on the combined image to
produce one page of text. However, with decent OCR software that isn't
necessary. It's easy to scan individual pages as graphics, then OCR
them all into one continuous document, say an MS Word document.
 
That's not quite what I meant. I was thinking of the case where I scan
part of one page and then make another scan to capture the rest of the
page, so that both scans share overlapping portions of the scanned
page. I suppose I could take care of that with a program like
Photoshop, but that would take some degree of manual work from the
user. I was thinking of something that could do this automatically
with a single click, or even as a batch process over several
images.
 
In scanning and OCR, such practices (ADF or automated manipulation) are a
trade-off against quality.

Best practice is to make multiple scans and OCR each scan
individually into one complete end file.

The Library of Congress and Google's Books Project are prime examples of
automation errors.
Pages that the naked eye and human brain can still make out despite
bleed-through can be unusable to scanners and OCR software.
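The "OCR each scan individually, then combine" approach can be sketched roughly as follows. This is only an illustration, not how any particular OCR package works: it assumes each partial scan has already been turned into a list of text lines by whatever OCR tool you use, and that the second scan begins with some lines that repeat the end of the first. The names `similar` and `merge_ocr_lines` and the 0.85 threshold are made up for this example. Fuzzy matching is used because OCR rarely reads the same line identically twice.

```python
from difflib import SequenceMatcher
from typing import List


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """True if two OCR'd lines are close enough to count as the same line."""
    return SequenceMatcher(None, a.strip(), b.strip()).ratio() >= threshold


def merge_ocr_lines(first: List[str], second: List[str],
                    threshold: float = 0.85) -> List[str]:
    """Join two line lists, dropping leading lines of `second` that
    repeat the trailing lines of `first` (the overlapping strip)."""
    max_k = min(len(first), len(second))
    # Try the largest possible overlap first, then shrink.
    for k in range(max_k, 0, -1):
        tail = first[len(first) - k:]
        head = second[:k]
        if all(similar(t, h, threshold) for t, h in zip(tail, head)):
            return first + second[k:]
    # No overlap detected: simply concatenate.
    return first + second


# Hypothetical OCR output from the upper and lower scans of one page.
upper = ["The quick brown fox",
         "jumps over the lazy dog",
         "and runs into the woods"]
lower = ["jumps over the 1azy dog",   # OCR misread "l" as "1"
         "and runs into the woods",
         "before night falls."]

page_text = merge_ocr_lines(upper, lower)
```

A line that is cut in half at the edge of a scan will not match its full counterpart, so in practice the overlap should be generous enough that at least one or two whole lines appear in both scans.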
 
Again, that's rather trivial. Lots of software will do that matching
automatically, though results vary a lot with what you pay. The software
that came with my old Canon G3 camera, called PhotoStitch, does a
reasonable job, and it came free with the camera. I'm sure if you
bought a similar program you could get better results. I just did a
Google search for "photo stitch software" and got LOTS of hits. Some
appear to be free, or at least offer a free trial.
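For anyone curious what "matching automatically" means, here is a very simplified sketch of the idea. It is not how PhotoStitch or any other product works internally. It assumes the two scans are grayscale images of the same width, already straight, and offset only vertically, with each image held as a plain list of pixel rows. Real stitching programs also have to cope with rotation, skew, scaling and sideways shifts. The function names and the tolerance value are invented for illustration.

```python
from typing import List, Optional

Row = List[int]      # one scanline of grayscale values, 0-255
Image = List[Row]    # rows from top to bottom


def row_difference(a: Row, b: Row) -> float:
    """Mean absolute pixel difference between two equally wide rows."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def find_overlap(top: Image, bottom: Image, min_rows: int = 2,
                 tolerance: float = 8.0) -> Optional[int]:
    """Return how many trailing rows of `top` match the leading rows of
    `bottom`, trying the largest overlap first. None if nothing matches."""
    max_rows = min(len(top), len(bottom))
    for k in range(max_rows, min_rows - 1, -1):
        start = len(top) - k
        total = sum(row_difference(top[start + i], bottom[i]) for i in range(k))
        if total / k <= tolerance:
            return k
    return None


def stitch(top: Image, bottom: Image) -> Image:
    """Combine two vertically overlapping scans into one image."""
    k = find_overlap(top, bottom)
    if k is None:
        raise ValueError("no overlapping region found")
    return top + bottom[k:]


# Synthetic 10-row "page"; the second scan is slightly brighter (+2),
# as two passes on a real scanner would never be pixel-identical.
page = [[(r * 37 + c * 11) % 256 for c in range(8)] for r in range(10)]
top_scan = page[:7]
bottom_scan = [[min(255, v + 2) for v in row] for row in page[4:]]

stitched = stitch(top_scan, bottom_scan)
```

The stitched image could then be saved and handed to the OCR program as a single page. The tolerance is what lets the overlap be found even though the two scans differ slightly in brightness.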
 
I use the full version of Adobe Acrobat. It lets you work with
different page sizes and has the ability to combine pages, delete
pages individually or in a group, and rearrange them. You might check the
book stores in the computer section near the photography books for the
books written for Acrobat ver 7 or 8. You would get a free read on the
types of things that are possible. I have used it mainly for capturing
web pages and sites, but it might work with scanners as well. I
haven't been too impressed with any company's OCR up to this point.
As Ever,
Orin
 
You would have to "stitch" the two scans of parts of the page together
into a single image of the entire page with a stitching or photo editing
program. The full version of Adobe Acrobat could then import the resulting
file (TIFF, JPEG, whatever). The full version of Acrobat is a very good
program and will let you do almost anything, but one thing it won't do
is combine multiple scans of parts of a single page into a single image.
[However, if you were willing to settle for having each of those
partial page scans be a separate page in the PDF file, then it could do
that all by itself.]
 