renethx
This article is a continuation of my previous article, "DocuMate 252/262
or fi-4120C2 for archiving documents?", which I posted to this group on
Feb 9, 2005. I would like to add several tips on digitizing and archiving
paper documents that I picked up while doing so for my personal library.
Compression methods for color images
Saving color images in a lossless compression format takes a lot of disk
space and is not practical. The old JPEG compression method (at high
quality, say, Quality 90 and Sub-sampling 1:1 in ThumbsPlus 7) is still
very good and reduces the file size by a factor of 5 to 10. The
degradation in image quality is hardly noticeable unless you magnify the
image and examine it carefully.
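To put the 5x to 10x reduction in concrete terms, the sketch below estimates the size of one letter-size color page at 600dpi. It assumes uncompressed 24-bit RGB (3 bytes per pixel) as the baseline. A real lossless file such as an LZW-compressed TIFF would be somewhat smaller than this raw figure. The function names are illustrative only.

```python
# Rough storage estimate for one scanned color page, applying the
# 5x-10x JPEG reduction reported above to an uncompressed 24-bit RGB baseline.

def raw_rgb_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Bytes for an uncompressed 24-bit (3 bytes/pixel) RGB scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * 3


def jpeg_range_bytes(raw: int, low_ratio: float = 5, high_ratio: float = 10) -> tuple[int, int]:
    """Estimated JPEG size range (smallest, largest) for the given reduction ratios."""
    return round(raw / high_ratio), round(raw / low_ratio)


if __name__ == "__main__":
    raw = raw_rgb_bytes(8.5, 11, 600)  # letter size at 600dpi
    small, large = jpeg_range_bytes(raw)
    print(f"raw: {raw / 1e6:.1f} MB, JPEG: {small / 1e6:.1f}-{large / 1e6:.1f} MB")
```

You might script the conversion instead of doing it in ThumbsPlus. In that case, Pillow's `Image.save(path, "JPEG", quality=90, subsampling=0)` should be a close analogue. This assumes that ThumbsPlus's "Sub-sampling 1:1" means no chroma subsampling. In Pillow, `subsampling=0` selects 4:4:4.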
OCR programs
Right now the best OCR program is OmniPage 15 Professional from
ScanSoft. I have been using it for a month and am very impressed. OCR
results are very accurate. It offers full customization of output PDF
files. It comes with a modest set of image enhancement tools, and a list
of enhancement steps can be saved as a template and reused in any other
workflow later. Images never deform in the deskew process. It allows
highly customizable workflows and jobs. It also allows multiple
instances and can run two or more tasks simultaneously on the same
system (you will need a dual-core processor to use multiple instances
effectively, however). There are a few bugs. In particular, the program
is not stable and often shuts down abruptly for no apparent reason.
Hopefully these will be fixed in future service packs.
I also tried FineReader 8 Professional, but it fell well short of my
expectations. As far as I tested, there is no big improvement over
Version 6. It does not allow batch jobs (to be fair, OP Professional is
roughly equivalent to the much pricier FR Corporate edition), images
deform in the deskew process, and support for output PDF formats is
poor (a step back from FR6).
Flatbed scanners
I often scan bound books because many old academic books I need are out
of print. The only way to keep such a book on hand is to borrow it from
a library and scan the entire book with a flatbed scanner. For this
purpose I have been using a Canon CanoScan LiDE 80 and have been very
satisfied with its performance. Recently I tested a bunch of new flatbed
scanner models to determine the best scanner for scanning books.
The most important factor is scan speed. (A 10-second difference in
scanning a single page results in a difference of more than an hour
when scanning a 500-page book.) Image quality also matters if you scan
color documents and pictures as well. Ergonomics is another important
factor in scanning a book quickly. I tested the Canon CanoScan 8400F,
4200F, LiDE 80 and LiDE 60, the Epson Perfection 3490, and the Visioneer
OneTouch 9420.
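The "more than an hour" figure is simple arithmetic: 10 seconds × 500 pages = 5000 seconds, or about 83 minutes. The sketch below makes that calculation reusable for other page counts and speed gaps. The function name is illustrative, and the calculation ignores page-turning time.

```python
# Extra scanning time over a whole book caused by a per-page speed difference.
# Page-turning and repositioning time are ignored.

def extra_minutes(seconds_per_page_diff: float, pages: int) -> float:
    """Total added scanning time, in minutes, over the whole book."""
    return seconds_per_page_diff * pages / 60


if __name__ == "__main__":
    # 10 s/page slower on a 500-page book
    print(f"{extra_minutes(10, 500):.1f} min")
```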
Speeds
To measure scan speeds, I used OP15's TWAIN interface (with
"Automatically scan pages" ON and "Time between scans" set to 0 sec),
scanned a letter-size document ten times in a row, and took the average
scan time per page. Scan time depends heavily on CPU speed and memory
bandwidth. For example, scanning a letter-size document with the 4200F
in Color at 600dpi takes
31 sec with an Athlon 64 3200+ @2630MHz (overclocked),
35 sec with an Athlon 64 3200+ @2000MHz (default clock speed),
48 sec with an Athlon XP 2400+ @2163MHz (overclocked; default is 2000MHz).
The processor used in the tests below is the Athlon 64 3200+ @2630MHz.
The following are the results at 400dpi and 600dpi in each mode, in
seconds per page.
Scanner            B&W (400/600)   Grayscale (400/600)   Color (400/600)
CanoScan 8400F     17 / 29         18 / 33               19 / 35
CanoScan 4200F     15 / 26         16 / 30               16 / 31
CanoScan LiDE 80   22 / 22         22 / 22               46 / 46
CanoScan LiDE 60   34 / 34         34 / 34               53 / 53
Perfection 3490    29 / 38         29 / 42               29 / 47
OneTouch 9420      28 / 28         31 / 44               32 / 34
At 600dpi the LiDE 80 is the fastest in both B&W and Grayscale (22 sec,
versus 26 sec and 30 sec for the runner-up 4200F). For color documents
at 600dpi, the 4200F is the fastest.
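The comparison above can be checked mechanically. The sketch below encodes the measured times from the table as data and picks the fastest scanner for a given mode and resolution. The dictionary layout and function name are illustrative. The numbers are exactly those reported above, with letter-size pages on the Athlon 64 @2630MHz system.

```python
# Measured scan times (seconds per page, letter size) from the table above.
# Each mode maps to a (400dpi, 600dpi) pair.
TIMES = {
    "CanoScan 8400F":   {"B&W": (17, 29), "Grayscale": (18, 33), "Color": (19, 35)},
    "CanoScan 4200F":   {"B&W": (15, 26), "Grayscale": (16, 30), "Color": (16, 31)},
    "CanoScan LiDE 80": {"B&W": (22, 22), "Grayscale": (22, 22), "Color": (46, 46)},
    "CanoScan LiDE 60": {"B&W": (34, 34), "Grayscale": (34, 34), "Color": (53, 53)},
    "Perfection 3490":  {"B&W": (29, 38), "Grayscale": (29, 42), "Color": (29, 47)},
    "OneTouch 9420":    {"B&W": (28, 28), "Grayscale": (31, 44), "Color": (32, 34)},
}
DPI_INDEX = {400: 0, 600: 1}


def fastest(mode: str, dpi: int) -> tuple[str, int]:
    """Return (scanner, seconds per page) with the lowest time for a mode and resolution."""
    i = DPI_INDEX[dpi]
    return min(((name, t[mode][i]) for name, t in TIMES.items()), key=lambda p: p[1])


if __name__ == "__main__":
    for dpi in (400, 600):
        for mode in ("B&W", "Grayscale", "Color"):
            name, secs = fastest(mode, dpi)
            print(f"{mode} @{dpi}dpi: {name} ({secs} s/page)")
```

At 400dpi the 4200F wins in every mode. At 600dpi the ranking changes, because the LiDE 80 takes the same time at both resolutions.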
Image Quality
In B&W mode, none of the scanners has any problem. Color images are
very good with all of the Canon and Epson scanners, though some of them
may need color correction. For example, images from the LiDE 80 are
reddish at all ranges (with either "Recommended" or
"CanoScan LiDE 80 Reflective (W) - sRGB IEC61966-2.1" color
management). Like other Visioneer scanners, the OneTouch 9420 sets the
white point at around 230 by default, and there is no way to change it.
As a consequence, information in the lighter parts of the image is lost
permanently, which makes this scanner unsuitable for pictures. Even in
color documents, light colors look washed out with this scanner.
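The sketch below shows why a fixed white point of 230 throws information away. It assumes a simple linear "levels" adjustment on 8-bit values, where level 230 is stretched to 255 and anything brighter is clipped. This is an illustrative model only, not Visioneer's documented processing. The function names are hypothetical.

```python
# Assumed model of a fixed white point: input levels are scaled so that
# `white_point` maps to 255, and anything brighter is clipped to 255.
# This is a simplified linear "levels" model, not the scanner's actual pipeline.

def apply_white_point(level: int, white_point: int = 230) -> int:
    """Map an 8-bit input level (0-255) to an 8-bit output level."""
    return min(255, round(level * 255 / white_point))


def lost_levels(white_point: int = 230) -> int:
    """Count the distinct input levels that all collapse into pure white (255)."""
    return sum(1 for v in range(256) if apply_white_point(v, white_point) == 255)


if __name__ == "__main__":
    print("levels merged into white:", lost_levels(230))
    print("235 ->", apply_white_point(235), "and 250 ->", apply_white_point(250))
```

Under this model, the 26 input levels from 230 to 255 all become 255. Light tones such as 235 and 250 can then no longer be told apart, and no later color correction can recover them.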
Ergonomics
The reference point of the platen on both the CanoScan 8400F and 4200F
is located at the right back corner, and it is very difficult to align a
page to it correctly. These scanners should be avoided for scanning
bound books. The CanoScan 8400F is also very bulky (nearly three times
the volume of the LiDE 80). Both the LiDE 80 and LiDE 60 are very
compact, easy to handle, and suitable for scanning bound books, though
their scan buttons are poorly positioned. (Cosmetically, the aluminum
LiDE 80 is much nicer than the plastic LiDE 60.) The Epson is also good.
As for the OneTouch 9420, you need to remove the lid to scan a book, and
the scan buttons cannot be used because they are covered by the other
half of the open book.
Conclusion
The best way to scan a bound book in B&W is as follows. Use the Canon
LiDE 80 with OP15's TWAIN interface (Mode Grayscale, Size B5 in most
cases, "Automatically scan pages" ON, and "Time between scans" 0 sec)
and save the pages as TIFF files. Then change Color Depth to Bi-level
(1 bit) in ThumbsPlus, with Threshold 450 for darker documents, 500 for
normal ones, and 550 for lighter ones. These roughly correspond to
Threshold 84, 96 and 108 respectively in CanoScan ScanGear. The reason
for scanning in Grayscale first is that OP15's deskew process works
better with Grayscale images than with B&W. This way I can scan a
500-page book in less than 3 hours at 600 dpi in B&W, and the result is
a near-perfect copy of the entire book.
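Here is a sketch of the Grayscale-to-Bi-level step. It rests on several assumptions that do not come from the ThumbsPlus or ScanGear documentation:

- Pixels are 8-bit gray levels, with 0 as black and 255 as white.
- Any pixel darker than the threshold becomes black.
- The ThumbsPlus-to-ScanGear conversion is a straight-line fit through the three pairs quoted above (450/84, 500/96, 550/108).

The function names are hypothetical.

```python
# Assumptions: 8-bit grayscale input (0 = black, 255 = white); a pixel darker
# than the threshold becomes black. The threshold conversion is a straight-line
# fit through the three ThumbsPlus/ScanGear pairs quoted above, so it should
# only be trusted near that range.

def scangear_threshold(thumbsplus_threshold: int) -> float:
    """Fit through (450, 84), (500, 96), (550, 108): 0.24 * (t - 100)."""
    return (thumbsplus_threshold - 100) * 6 / 25


def to_bilevel(pixels: list[list[int]], threshold: float) -> list[list[int]]:
    """Convert rows of gray levels to 1-bit rows: 0 = black, 1 = white."""
    return [[0 if p < threshold else 1 for p in row] for row in pixels]


if __name__ == "__main__":
    page = [
        [250, 240, 90, 30],    # paper, paper, faint stroke, dark ink
        [245, 100, 110, 235],
    ]
    for tp in (450, 500, 550):
        t = scangear_threshold(tp)
        print(tp, t, to_bilevel(page, t))
```

Under this model, a higher threshold turns more mid-gray pixels black. That keeps faint strokes on lighter documents, and a lower threshold keeps a dark background from filling in.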
BTW, the Canon LiDE 80 has been discontinued. Fortunately, it appears on
eBay frequently. If you search Froogle, you may find the LiDE 80 at
several retailers, but I am pretty sure none of them actually has it in
stock. To attract customers, these retailers never state availability
explicitly online, and they sometimes try to sell a much cheaper model
at the same price as the LiDE 80. You had better stay away from them to
save time and money.