scanning large documents

  • Thread starter: Mark

Mark

We are in the process of trying to scan large amounts of documentation
at work. We haven't bought a scanner or a software package yet. I have
used scanners in the past, the most recent is the Brother MLC (?

The problem is we want to keep the file size fairly small and the
quality reasonable (maybe slightly better than fax quality). We will be
scanning 100s of pages at a time. We want to be able to email these
documents to others. I thought that if we scanned them in and saved them
as a PDF, it would at least be easy for the recipient to view them.

Does anyone have a recommendation for what to do to minimize the size
and make the information available broadly to others?

How can we keep the file size small? It has always amazed me how small
eFax documents are. I know they aren't the best quality, but they are
tiny. If we could get slightly better quality than that, we would be in
a good position.

Lastly, does anyone know anything about "digital senders"? Would this
be good to use for a project like this?

Thanks
Mark
 
I can't help with that (since I posted a similar question a while ago).

But I wanted to ask whether you have any recommendations on scanners--it
sounds like you already have one in mind.
 
On 31 Jan 2005 10:46:18 -0800, kyu staggered into the Black Sun and said:

Please include context when you post to Usenet. That POS "G2" excuse
for a Usenet client doesn't do it by default, so get a Real Newsreader
if you can. Context restored:

The smallest scanned images result from low DPI scans (8.5x11 at 300 DPI
as a raw bitmap is 4x larger than 8.5x11 at 150 DPI) in black and white
compressed with CCITT Group 4 (also known as G4 compression). Faxes use
G3, which is black-and-white and slightly less efficient at compression
than G4. Both G3 and G4 are lossless.

Yes. The problem is that it's a PITA to edit a PDF, so if the
recipients need to edit/change the documents, you should just send them
the TIFFs. All modern OSes include some sort of TIFF viewer.

Scan at low DPI (300 max, 150 if you can get away with it--run some
tests!) and send the raw TIFFs if recipients need to edit them, PDFs if
they don't.

I don't know what "eFax" is, but it's highly probable that its images
are compressed with G3. G4 is more efficient than G3 IME.

EXPN "digital senders". That's a pretty ambiguous term--after all, my
computer is digitally sending this Usenet article to the NNTP server.
Did you mean something like "a scanner that scans paper, then sends the
TIFF over Ethernet/802.11b/carrier pigeon to a workstation using
SMB/NFS/Appletalk, where the image is saved in a directory"?
I can't help with that (since I posted a similar question a while
ago.) But I wanted to ask you if you have any recommendations on
scanners--it sounds like you already have one in mind.

? Mark's first paragraph said that he hadn't bought a scanner yet.
Anyway, practically any modern scanner can scan black-and-white
high-contrast stuff at 300 DPI, which is all you need for documents that
have mostly text on them. You'll probably want an automatic document
feeder, which raises the price.
 
Most consumer-level scanners aren't well suited to this kind of
project. A couple of years back, a buddy had a B+W-only unit (a Ricoh,
I think) that just seemed to swallow large documents whole and spit
out PDF files. Yes, quality could be adjusted to suit. To prove it,
he pulled out a roughly 100-page, perfect-bound manual for an oddball
AutoCAD plugin, removed the binding, dropped the document in the paper
feeder, and hit go. About 2 minutes later, he picked up the stack of
papers, flipped them over (since they were double-sided), dropped them
back in the paper feeder, and hit go again. Less than five minutes
later, the finished 100-page PDF, properly collated, was written to
disk. If I remember, he paid about $900 for the unit on closeout, but
it was a true industrial-duty workhorse. If you have a lot to do,
this would be the kind of unit to look for.

If I had to do a large number on the cheap, I would probably use my
Nikon D100 with a cable release and 60mm macro on a proper copy stand.
Once exposure is determined, the speed at which documents can be
acquired is limited only by how fast you can remove and replace pages.
Really, it is much quicker and less painful than using a typical flatbed.
Quality isn't anything to sneer at either. Of course, I already have
the camera, macro lens, and copy stand.

A buddy of mine uses a similar setup (D1x with a 60mm macro), with a
lightbox, to digitize medium-format TEM (transmission electron
microscopy) films and medical x-rays. It is idiot-proof and painless
compared to the contortions we were going through with the
lightlid-equipped scanner.

Just some random food for thought.

David Glos
 
How can we keep the file size small?


There are really only four factors affecting file size of scanned
images.

1. Size of area being scanned. 8x10 inches is 4x larger than 4x5
inches.

2. Scanning resolution. 400 dpi is 4x larger than 200 dpi (square of
resolution difference, 2 squared in this example)

3. Scan mode. RGB color is 3x larger than grayscale, and grayscale is
8x larger than line art mode.

4. File compression. This one is vaguer and the numbers are not very
precise, but compression works very well for line art.
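To put numbers on the first three factors, here is a minimal sketch that computes the uncompressed size of a scan. The function name `raw_scan_bytes` and the mode labels are made up for illustration, and the bit depths assume the usual 1-bit line art, 8-bit grayscale, and 24-bit RGB. Row padding and file headers are ignored, so real files will differ slightly.

```python
# Uncompressed scan size from area, resolution, and scan mode.
# Assumed bit depths: 1-bit line art, 8-bit grayscale, 24-bit RGB.
BITS_PER_PIXEL = {"lineart": 1, "grayscale": 8, "rgb": 24}


def raw_scan_bytes(width_in, height_in, dpi, mode):
    """Return the uncompressed image size in bytes (no padding, no headers)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * BITS_PER_PIXEL[mode] // 8


if __name__ == "__main__":
    # Letter-size page in each mode at two common resolutions.
    for mode in ("lineart", "grayscale", "rgb"):
        for dpi in (150, 300):
            size = raw_scan_bytes(8.5, 11, dpi, mode)
            print(f"{mode:9s} {dpi} dpi: {size / 1_000_000:.2f} MB")
```

Doubling the resolution quadruples every figure, and each step from line art to grayscale to RGB multiplies it by 8 and then by 3, which matches the ratios in the list above.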

Documents are surely scanned in line art mode, and for line art, LZW,
G3, and G4 compression all work very well, giving increasingly smaller
files in that order. G4 might be half the size of LZW, with G3 in
between. Even LZW will be less than 1/5 the size of the raw data, and
files slightly smaller than 1/10 the raw size are routinely possible
with G4. It can be even better than that, because blank paper space in
line art compresses amazingly well; blank space often has near zero
effect on the file size.
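A rough way to see why blank space costs almost nothing is to count runs of identical pixels in a scan line. The sketch below is plain run-length counting only, not a real G3 or G4 encoder (those assign variable-length codes to the run lengths, and G4 also codes each line relative to the one above it). The row contents are hypothetical.

```python
from itertools import groupby


def run_lengths(row):
    """Collapse a row of 0/1 pixels into (pixel_value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(row)]


# One 8.5-inch scan line at 300 dpi is 2550 pixels wide.
blank_row = [0] * 2550
# Hypothetical row crossing 50 pen strokes, followed by a white margin.
text_row = ([0] * 40 + [1] * 6) * 50 + [0] * 250

print(len(run_lengths(blank_row)))  # 1 run covers the whole blank line
print(len(run_lengths(text_row)))   # 101 runs for the line with strokes
```

A blank line collapses to a single run no matter how wide the page is, while a line full of detail needs one entry per stroke edge, which is why file size tracks the amount of content rather than the amount of paper.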

JPG is normally very poor for documents, first because JPG must be
grayscale instead of line art (8x larger before compression), and second
because JPG compression is lossy: to reach 1/10 file size, the JPG
artifacts will make the quality look terrible. Line art with compression
runs circles around JPG in both file size and quality, but it is only
applicable to documents, not to photos.

The high quality fax mode setting is 200x200 dpi line art. Normal
quality fax is 200x100 dpi line art (it omits every other line). 300 dpi
line art will be noticeably better than fax, basically very near
original quality. Conventional analog fax machines use G3 compression
(fax software typically stores the result in a TIF file), and G3 is only
a little larger than G4, which gives the smallest line art file.
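For a sense of scale, this small sketch counts the pixels on a letter-size page at the fax resolutions quoted above and at 300 dpi. The helper name `page_pixels` is made up, and the resolutions are the rounded figures from this post.

```python
def page_pixels(x_dpi, y_dpi, width_in=8.5, height_in=11):
    """Return the pixel count of a page scanned at x_dpi by y_dpi."""
    return round(width_in * x_dpi) * round(height_in * y_dpi)


# Rounded resolutions as quoted in the post: (horizontal dpi, vertical dpi).
settings = {
    "normal fax": (200, 100),
    "high quality fax": (200, 200),
    "300 dpi scan": (300, 300),
}

baseline = page_pixels(200, 200)
for name, (x_dpi, y_dpi) in settings.items():
    count = page_pixels(x_dpi, y_dpi)
    print(f"{name:17s} {count:>9,d} pixels ({count / baseline:.2f}x high quality fax)")
```

A 300 dpi scan carries 2.25 times as many pixels as a high quality fax page, which is where the visible improvement comes from.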

Adobe Acrobat will use G4 for line art images in PDF, and PDF is likely
the best choice for sending document files to others, especially if
they are multipage. Depending very much on document content (how much
blank space vs. fine detail), a full 8.5x11 inch page of content scanned
at 300 dpi line art will likely come to roughly 30 KB per page in PDF.
This 300 dpi file will be much better quality than the best fax, and
will print near original quality. A 200 dpi scan (good fax grade) would
be slightly less than half that size, but prints much less well.
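To judge whether a batch will fit in an email, a back-of-the-envelope estimate can be built from the rough 30 KB per page figure. This is only a sketch: `estimate_pdf_kb` is a hypothetical helper, the 30 KB default is the ballpark from this post rather than a measurement, and scaling by the square of the resolution is an assumption, since compressed size does not track pixel count exactly.

```python
def estimate_pdf_kb(pages, dpi=300, kb_per_page_at_300=30.0):
    """Rough total size in KB for a line art PDF of the given page count.

    Assumes compressed size scales with pixel count, i.e. with (dpi / 300) ** 2.
    """
    scale = (dpi / 300) ** 2
    return pages * kb_per_page_at_300 * scale


if __name__ == "__main__":
    for dpi in (200, 300):
        total_kb = estimate_pdf_kb(100, dpi=dpi)
        print(f"100 pages at {dpi} dpi: about {total_kb / 1024:.1f} MB")
```

Under these assumptions a 100-page batch lands at a few megabytes, so it is worth running a test scan of real pages and replacing the 30 KB default with a measured value.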
 