REQ: Scanner recommendation for decent scans

andy

I have a Visioneer OneTouch 8900 USB scanner and am very tired of the
poor scan results. Scans are blurry, and OCR is mostly pointless at
anything below 300 dpi (and scans at 300 dpi are slow).

Are there any better scanners priced less than $200 USA? Especially
with a much better OCR package (not TextBridge Pro 9.0).

I use the scanner for:
1. Scanning old printed journals (see below)
2. Scanning old photographs for CDR archival / retouching /
reprinting

The issues I have:
1. Slow scanning times (45 seconds a scan or longer)
(2.4 GHz P4, USB 1, 128 MB or more free memory, gigabytes
of free disk space)

2. Poor raw OCR (the reported accuracy at 300 dpi is 91%, but in practice
about 25% of the words are mis-OCRed, a big issue). I've tried
a 600 dpi scan but that takes about 4-5 minutes per scan.
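
The gap between "91%" and "one word in four wrong" is roughly what one would expect if the 91% is a per-character figure. That is an assumption; how TextBridge computes its accuracy number is not stated in this thread. A rough sketch, assuming character errors are independent and using hypothetical word lengths:

```python
# Rough model: a word is correct only if every character in it is correct.
# Assumes independent character errors, which real OCR does not strictly obey.

def word_accuracy(char_accuracy: float, word_length: int) -> float:
    """Probability that all characters of a word are recognised correctly."""
    return char_accuracy ** word_length

for length in (3, 5, 7):  # hypothetical word lengths
    wrong = 1 - word_accuracy(0.91, length)
    print(f"{length}-letter words: about {wrong:.0%} wrong at 91% per character")
```

Under this model even short words fail often at 91% per character, which is why small gains in character accuracy pay off so much at the word level.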

Here are the details for the printed journals:
1. Each page is 10 inches by 12 inches (I cannot cut them
and sheet-feed them)
2. Each page is slightly yellowed (about tan coloured)
3. The text is 8 or 9 point with serifs
4. The newspaper is usually in a 4 column format
5. There is some mixing of text point sizes (7 point is the smallest,
with 8 point the usual font)
6. There are a significant number of single-character fractions such
as '1/4', '1/2', '2/3', '7/8' (i.e., each digit of the fraction is 4
or 5 point)
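
To put those point sizes in scanner terms, here is a small sketch that converts a type size to pixels at a given resolution. It uses the nominal point size (1 point = 1/72 inch); the actual ink of a digit is somewhat smaller than its nominal size, so treat the numbers as upper bounds.

```python
POINTS_PER_INCH = 72  # 1 typographic point = 1/72 inch

def glyph_height_px(point_size: float, dpi: int) -> float:
    """Nominal height in pixels of type of the given point size at a scan resolution."""
    return point_size / POINTS_PER_INCH * dpi

for dpi in (200, 300, 600):
    sizes = ", ".join(f"{pt} pt = {glyph_height_px(pt, dpi):.0f} px" for pt in (4, 8))
    print(f"{dpi} dpi: {sizes}")
```

A 4 point fraction digit scanned at 600 dpi gets about as many pixels as 8 point body text at 300 dpi, so the small fractions are the part of the page that benefits most from the higher resolution.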

My workflow:

1. Scan test page at 300 dpi (requires 2 scans since the page is
larger than the scanning area)
2. Set scanner to scan in black and white (set threshold level to
compensate for the yellow paper)
3. Scan
4. Fix any page skew
5. Crop if needed
6. Save as Windows monochrome bitmap file

Repeat 1-6 for an entire journal issue (about 20 pages)

7. Bring the pages into TextBridge Pro 9.0 as poor quality newspaper
scans.
8. Enable OCR training
9. Process each page
10. Send text to Notepad (removes all garbage formatting)
11. Reorder paragraphs if necessary to be in the correct order
12. Copy and paste into MS Word
13. Spell check in MS Word - correct misspellings due to OCR process -
leave in misspellings in original source material
 
----- Original Message -----
From: <>
Newsgroups: comp.periphs.scanners
Sent: Thursday, July 21, 2005 12:05 AM
Subject: REQ: Scanner recommendation for decent scans

Andy,

This inquiry might be better served in:
comp.doc.management
or
comp.ai.doc-analysis.ocr

I do both daily and extensive scanning (for over five years) with a
bottom-of-the-line scanner ($50 Canon) and Omnipage 9.0. Many folks say that 9.0
is terrible quality; however, I'm most pleased with the results as compared to
my previous scanner. (My machine is an Athlon 2.0 with 512k and USB 2.0; none of
this increases the actual scan speed, only the scanner can do that.)

The first place I suggest you start is cleaning both sides of the scanner
glass. It's a careful and tedious process to remove all the streaks,
smudges and accumulated plastic. I often repeat the cleaning 3-4 times in
each cleaning session for quality. (Caution: no paper towels, no Windex.)

Any DPI less than 300 will not offer good quality for OCR.

If you're working with either yellowed or aged documents, color is preferable
over black and white.
You'll be surprised how much improvement this will make.
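
One reason a colour or greyscale scan helps is that it keeps the tonal information, so the black/white threshold can be chosen afterwards in software instead of guessed at scan time. A minimal sketch using Otsu's method (a standard automatic threshold, not something proposed in this thread) over a flat list of 8-bit grey values; the sample pixel values are made up to stand in for ink on yellowed paper.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that best separates dark from light pixels.

    Pixels with value <= t are treated as ink, the rest as paper.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_dark = 0  # number of pixels at or below the candidate threshold
    sum_dark = 0     # sum of their grey levels
    for t in range(256):
        weight_dark += hist[t]
        sum_dark += t * hist[t]
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue
        if weight_light == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance (up to a constant factor); larger is better.
        between_var = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t

# Made-up sample: dark ink around 40-60, tan paper around 170-190.
sample = [40, 50, 60] * 10 + [170, 180, 190] * 30
t = otsu_threshold(sample)
black_and_white = [0 if p <= t else 255 for p in sample]
print("threshold:", t, "ink pixels:", black_and_white.count(0))
```

With a real scan the grey values would come from the image file; reading them is left out here to keep the sketch free of third-party libraries.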

Small fonts and fractions will be a problem that you'll never resolve.
Flatbed scanners just don't have enough depth-of-field or magnification
options for small fonts. Many of the docs that I work with use fifths of
seconds repeatedly, and those rarely scan correctly.

The OCR corrections are best made in the OCR software rather than a spell
checker. OCR packages do offer a "change all".
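
The "change all" idea, fixing a recurring misread once and applying it everywhere, can also be sketched outside the OCR package. This is a minimal illustration only; the correction table below is hypothetical and not a list of errors reported in this thread.

```python
import re

# Hypothetical recurring misreads -> intended text; build the real table from
# errors actually seen in your own OCR output.
CORRECTIONS = {
    "tbe": "the",
    "wbich": "which",
    "1/z": "1/2",
}

def change_all(text: str, corrections: dict) -> str:
    """Replace every standalone occurrence of each misread with its correction."""
    for wrong, right in corrections.items():
        # Lookarounds keep "tbe" from matching inside a longer word.
        pattern = r"(?<!\w)" + re.escape(wrong) + r"(?!\w)"
        text = re.sub(pattern, lambda _m, right=right: right, text)
    return text

print(change_all("Add 1/z cup to tbe pot, wbich is hot.", CORRECTIONS))
```

A blanket replacement like this cannot tell an OCR misread from a misspelling in the original source, so anything that should be preserved as printed has to stay out of the table.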

As for multi-column newspaper OCR:
Your best results will come from using a quality copy machine to enlarge the
pages.
I've had good results doing so with some 100+ year old newspapers (four column 11 x
17) and the machines at Kinko's.

In summary, I believe you're looking for an automated solution that just
doesn't exist.
You either accept crappy results and move on, or do the manual editing to
reach the desired goal; there doesn't seem to be any in-between.
 
The only suggestion I have is:
If you are happy with the two scans per page, then there are newer and
faster A4 and Letter size scanners for about $100. For OCR, use
Omnipage 14 or ABBYY FineReader for much better results.

The best price for Omnipage 14 is found at:
http://www.scantips.com


Abbyy FineReader 7.0 Professional Upgrade for any OCR software you currently
own:
http://www.digitalriver.com/dr/v2/e...id=19652&CID=0&DSP=&CUR=840&PGRP=0&CACHE_ID=0
 
I did some more experimenting and found out that:

1. Scanning at 600 dpi black and white with a higher threshold (so that
more black dots are produced) yields a much higher OCR accuracy.
OCR accuracy may be 99% or more.

2. I created a special OCR training bitmap file with the special
single-character fractions ('1/4', '1/2') far apart so that there is a
lot of white space around each character. Using a large font such as
24 point and repeatedly OCR'ing each character allowed me to train the
OCR software to recognize those fractions.

The extra 4 or 5 minutes per scan actually saves grunt work in
correcting OCR problems and text ordering problems (i.e., layout that
was OCR'ed incorrectly).
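
Part of the extra time at 600 dpi is simply more data: doubling the resolution quadruples the pixel count. A rough sketch of the uncompressed size of one scan, assuming a letter-size scan area (8.5 x 11 inches); which bit depth this scanner actually sends over the wire in black-and-white mode is not known here, so three depths are shown.

```python
def raw_scan_megabytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of one scan in megabytes (10**6 bytes)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel / 8 / 1e6

# Assumed scan area: a letter-size flatbed, 8.5 x 11 inches.
for dpi in (300, 600):
    for label, bpp in (("1-bit B&W", 1), ("8-bit grey", 8), ("24-bit colour", 24)):
        mb = raw_scan_megabytes(8.5, 11, dpi, bpp)
        print(f"{dpi} dpi, {label}: {mb:.1f} MB")
```

For scale, full-speed USB 1.1 tops out at 12 Mbit/s, which is 1.5 MB/s before any protocol overhead, so the larger of these transfers take a noticeable share of a minute or more on their own. How much of the observed scan time is transfer and how much is the scan head moving more slowly at 600 dpi cannot be told from these numbers.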

I do like the idea of photo-copying using this workflow:

1. Photocopy each 10 by 12 inch page, adjust the contrast and output on
8.5 by 11 inch paper

2. Scan at 600 dpi or higher (this is a single scan, which saves the time
previously spent trying to join two scans together)

3. OCR

4. Correct, reformat text, etc.

Advantages:

1. Handling the odd-sized, aged paper journals is easier since each
page is 'scanned' by the copier 1 time instead of 2 scans with an 8.5
by 11 scanner.

2. Copier can do a better job of contrast, threshold, etc. than the
scanner.

3. Copies could be sheet-fed into the scanner (requires me buying a new
scanner)

4. Full scan, OCR, output cycle could be automated with sheet feeder.
 
The first place I suggest you start is cleaning both sides of the scanner
glass. It's a careful and tedious process to remove all the streaks,
smudges and accumulated plastic. I often repeat the cleaning 3-4 times in
each cleaning session for quality. (Caution: no paper towels, no Windex.)

In my experience *dry* microfiber cloth works the best! However, not
all microfiber cloths are made the same! Many that call themselves
microfiber, ain't! A true microfiber cloth has an almost rubbery
feeling to it when used on glass. It's worthwhile getting two so when
one is in the wash the other one can be used.

Before that, I tried all sorts of liquids from various lens cleaning
liquids to alcohol and, yes, even Windex! Nevertheless, whatever the
liquid there always seems to be a residue. I always used lens paper
because all other paper can cause scratches.

I still managed to scratch the glass, though, but it was due to a
grain of sand which got caught in the paper. So before any cleaning
it's worthwhile using a blow brush to get rid of big particles first.

Finally, the best way to check if the scanner glass is clean is to
open the lid and scan "nothing" in a darkened room. As the light
passes under the empty glass, look at it at a very shallow angle.
Actually, I get down so my eyes are level with the glass. It's
amazing what can be seen like that.

After the scan is done, examine it under maximum magnification (e.g.
in Photoshop) after increasing the brightness until black turns to
light gray. Any glass imperfections, scratches, debris or streaks will
just jump at you!
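
The brighten-and-inspect check can also be approximated numerically: in an empty scan every pixel should be nearly the same value, so anything that stands out from the typical level is a candidate speck or streak. A small sketch on a made-up grid of grey values; the tolerance of 10 levels is a hypothetical starting point, not a recommended setting.

```python
from statistics import median

def find_defects(rows, tolerance=10):
    """Return (row, col, value) for pixels that differ from the median level
    of the scan by more than `tolerance` grey levels."""
    typical = median(v for row in rows for v in row)
    return [
        (r, c, v)
        for r, row in enumerate(rows)
        for c, v in enumerate(row)
        if abs(v - typical) > tolerance
    ]

# Made-up 4x6 "empty" scan: mostly near-black, one dust speck, one faint streak.
empty_scan = [
    [3, 4, 3, 5, 4, 3],
    [4, 3, 60, 4, 3, 4],
    [3, 4, 3, 4, 18, 3],
    [4, 3, 4, 3, 17, 4],
]
for r, c, v in find_defects(empty_scan):
    print(f"row {r}, col {c}: value {v}")
```

Defects that line up in the same column across many rows suggest a streak along the scan direction rather than a single piece of dust.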

For compulsive scanner cleaners it's truly a horror to look at! ;o)

Don.
 
That would be different for each scanner as they are all constructed
differently.

What's your scanner model? Maybe someone has specific instructions.

In general, you have to find the screws and/or latches
holding the glass enclosure (usually together with the lid) and
affixing it to the scanner base unit.

In my case (BearPaw 4800TA Pro) there were two screws near the lid
hinge (at either end) hiding under two rubber "plugs". I remove the
plugs with my nails, remove the screws and then lift the glass and the
lid together at the back (hinge) end first because the front is still
latched to the buttons' enclosure, and quite finicky to get off.

BTW, before you do this make sure there is as little dust floating
around in the room as possible! *Very* important!!! If you get dust
particles in the base enclosure, you may very well make matters worse.
Or, at the very least, replace one problem (streaks under the glass)
with another (dust on the CCD array)!

I usually vacuum the room first, then let the dust settle. Next I
vacuum again around the scanner, then let the dust settle again. Only
then do I open the scanner with the vacuum tube next to it. That way
any remaining "rogue" dust particles floating around are "lured" into
the vacuum cleaner during the period the glass enclosure is off.

Don.
 