need help scanning documents


thedarkman

I'm engaged in a long term project scanning and annotating an archive.
It contains hundreds of photographs and thousands of documents, the
latter mostly A4 but including a lot of newspaper articles.

The press reports are no problem by and large; if they take up a few
columns the size doesn't matter, but I'd like the A4s to come out more
or less as seen. When I scan the photos I use 96dpi and they are
okay, ditto the small reports, but scanning a document at that
resolution leaves the result rather poor quality, and increasing the
resolution makes them come out BIG.

Any help getting them to come out as seen would be greatly
appreciated.

I'm using exclusively jpgs but if pdfs are the way forward I'd do
that albeit reluctantly.

Thanks
 
thedarkman said:
I'm engaged in a long term project scanning and annotating an archive.
It contains hundreds of photographs and thousands of documents, the
latter mostly A4 but including a lot of newspaper articles.

[...]
Thanks
Are you scanning them in greyscale or lineart mode? If you're scanning for the
screen then 96dpi is probably about right as your monitor is usually around
that. If you're printing you'll need to scan at something much higher, say
300dpi (a typical resolution for a laser printer) which will then print at the
same size as the original, if printed at 300dpi. You'll need a
resized/resampled version to view on screen, as 10 inches at 300dpi will
take up 31.25 inches on a 96dpi display.

You have to accept the fact that the screen is a different medium to print.
Scan at a quality at least as good as your ultimate target.
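
The arithmetic behind "increasing the resolution makes them come out BIG" can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and it assumes the image is shown at 100% zoom on a display of roughly 96 pixels per inch.

```python
# Sketch: how scan resolution maps to pixel size and on-screen size.
# Function names are hypothetical; 96 ppi is an assumed typical display.

MM_PER_INCH = 25.4
A4_MM = (210, 297)  # A4 paper, width x height in millimetres


def scan_pixels(width_mm, height_mm, dpi):
    """Pixel dimensions of a scan of the given physical size."""
    return (round(width_mm / MM_PER_INCH * dpi),
            round(height_mm / MM_PER_INCH * dpi))


def on_screen_inches(pixels, screen_ppi=96):
    """Physical length on a display showing the image at 100% zoom."""
    return pixels / screen_ppi


if __name__ == "__main__":
    for dpi in (96, 300):
        w, h = scan_pixels(*A4_MM, dpi)
        print(f"A4 at {dpi} dpi: {w} x {h} px, about "
              f"{on_screen_inches(h):.1f} in tall on a 96 ppi screen")
```

At 96dpi the page appears at roughly its real size on such a screen; at 300dpi the same page needs downsampling (or zooming out) to be viewed as seen.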
 
thedarkman said:
I'm engaged in a long term project scanning and annotating an archive.
It contains hundreds of photographs and thousands of documents, the
latter mostly A4 but including a lot of newspaper articles.
[...]
Thanks

See CS2's "File->Automate->Image Processing" option. It's super! You can
rip out different size/format files all at once. It creates the folders
necessary. Highly recommended. You can use its check-box features and
add more batch processing besides.

I had a similar project to do - 3,100 scans of monochrome images (B&W)
for a historical effort. Here's how we worked: first, every print was
scanned at what would be 360ppi to TIF files in B&W if the prints had
not stained, otherwise in color. (Stained B&W prints are more easily
fixed in color, then saved as monochrome.) Big external drives are
cheap enough to do that. That was the tedious part.

Then we batched them all at once to small (500 pixels on the long side)
JPG files at high quality (8 on a scale of 1-10) for quick review on
screen. The "Automate->Fit Image" option is very good for that.
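
For anyone scripting the same step outside Photoshop, the size calculation for "500 pixels on the long side" might look like the sketch below. The function name is hypothetical, and the choice not to enlarge images that are already small is an assumption of this example, not a statement about how Fit Image behaves.

```python
# Sketch: compute review-copy dimensions with the long side capped at 500 px.
# fit_long_side is a made-up helper; not upscaling is this sketch's choice.

def fit_long_side(width, height, long_side=500):
    """Return (w, h) scaled so the longer edge equals long_side.

    The aspect ratio is preserved. Images whose longer edge is already
    at or below long_side are returned unchanged.
    """
    longest = max(width, height)
    if longest <= long_side:
        return width, height
    scale = long_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # A portrait A4 page scanned at 300 dpi is about 2480 x 3508 px.
    print(fit_long_side(2480, 3508))
```

The resulting size can then be handed to whatever imaging tool does the actual resampling; Pillow's `Image.thumbnail`, for instance, performs a similar aspect-preserving fit.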

When the review is finished, and the finals are selected, I will go back
and do the color/stain corrections, and so forth, resaving in TIF for
archiving and JPEG for web use.
 
thedarkman said:
I'm engaged in a long term project scanning and annotating an archive.
It contains hundreds of photographs and thousands of documents, the
latter mostly A4 but including a lot of newspaper articles.
[snip]
I'm using exclusively jpgs but if pdfs are the way forward I'd do
that albeit reluctantly.

You're already walking down the wrong path. If this is for archival
purposes, don't worry about big files; storage is cheap and will only
get cheaper.

Never use JPEG for archival projects. JPG uses "lossy" compression; it
degrades the quality of your image. You want to avoid this degradation
in an archive.
 
One more comment on something that I overlooked earlier.

You say "I'm using exclusively jpgs but if pdfs are the way forward I'd
do that albeit reluctantly."

There is no issue of JPEGs (JPGs) vs. PDFs.

When you save a file in PDF format, it's an Adobe Acrobat file for
viewing (and this IS what I'd recommend), but the document is stored
INTERNALLY within the PDF in some other graphics format. This can be
either JPG or TIFF (or any of many other formats), and if there is a
reason to do so, the individual pages of the document within the PDF
file can be "exported" out of the PDF file back to their native file
format (or to other formats which Acrobat supports for export, e.g. you
can export a TIFF file even if the internal format is JPEG).
Effectively, the PDF file becomes a "wrapper" for the graphics formats
of the individual pages.

That said, for a lot of reasons, JPEG is the most commonly used format
for internal storage. And in my view (I know that many will disagree),
JPEG is fine if you don't compress excessively.
 
http://www.sendspace.com/file/g5r9wz


Hi All,

I posted here recently; as I said, I'm working on a major archive
which involves a lot of scanning but I'm having trouble especially
with documents. I received some suggestions, which were helpful. One
guy said not to scan in jpg format. I gave that some consideration but
decided to use them.

When I've scanned A4 documents before I've had some problems but the
documents on two of my sites

http://www.ismichaelstoneguilty.org/

and http://www.geocities.com/satpalramisguilty/

have come out very well.

I've just uploaded the following files in the above archive to
SendSpace

m_s_daley_statement_1.jpg
jessie-wey-valley-chess-grading-list-page-1.jpg
jessie-house-of-commons-1.jpg
jessie-lloyds-bank-1996-1.jpg
jessie-surrey-girls-chess-league-january-1996-page-1.jpg

The file m_s_daley_statement_1.jpg displays perfectly on a website, i.e.
when it is linked from an html file. I'd like the others to look the
same way. Some of the photos here are of a high resolution. They also
need lightening, but I'm most concerned about the documents: I want
them to display as near-perfect A4 reads.

Any help appreciated.
 
In my opinion, the best way to do this is with Adobe Acrobat as PDF
files; the internal format (which you can export to any desired graphics
format) is likely to be JPEG (there may be a way to change that, but I
don't know what it is if so). Unless there is very, very fine print and
detail, scan at 300 dpi in 256-level (8-bit) grayscale. (I am presuming
that these are B&W documents; obviously, if there are color documents,
that changes things.) I've done tens of thousands of pages this way, and
they are indistinguishable from the originals on screen, and on paper if
printed.

Acrobat uses your scanner software to do the actual scanning; in my
case it's HP PrecisionScan Pro, and I go to a lot of effort to get the
exposure controls optimized for each document (time consuming, but it
assures a perfect output). Acrobat can properly scan and interleave
both sides of a double sided document even when the scanner has a
non-duplexing (single sided) document feeder (I am using an HP 5490C).
 
Well, I found the source of the problem, although I can't believe how
simple/stupid it is.

There is nothing wrong with the scanner, but the scanner assumes that
the image at the end of the film strip starts more or less AT the very
edge of the strip (EXACTLY at the edge). I was working with a strip from
the end of the film, it had an extra 4 to 8 mm of blank "film" beyond
the edge of the image, and that threw off every image on that strip by 4
to 8 mm (quite a bit). Trimming the film to way under 1mm within the
edge of the image fixed the issue.
 
Hi,

Most are colour documents, A4 size, but I always scan in colour
regardless. I have no trouble with small pieces, i.e. newspaper
articles, but scanning an entire A4 document causes problems.
 
I would advise against scanning B&W documents in color. The file is 3x
larger, scanning takes longer, and there is actually some loss of
quality relative to an original non-color document.
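
The "3x larger" figure follows from the raw pixel data: 8-bit grayscale stores one byte per pixel, while 24-bit color stores three. A rough sketch, with a made-up helper name and an assumed A4 page at 300 dpi (about 2480 x 3508 pixels):

```python
# Sketch: uncompressed size of one scanned page, grayscale vs. color.
# uncompressed_bytes is a hypothetical helper; sizes are for raw pixel
# data only, before any TIFF/JPEG compression is applied.

def uncompressed_bytes(width_px, height_px, channels, bits_per_channel=8):
    """Raw image size in bytes for the given dimensions and depth."""
    return width_px * height_px * channels * bits_per_channel // 8


A4_AT_300_DPI = (2480, 3508)  # assumed pixel size of an A4 scan at 300 dpi

gray = uncompressed_bytes(*A4_AT_300_DPI, channels=1)
color = uncompressed_bytes(*A4_AT_300_DPI, channels=3)

if __name__ == "__main__":
    print(f"grayscale: {gray / 1e6:.1f} MB, color: {color / 1e6:.1f} MB, "
          f"ratio: {color / gray:.0f}x")
```

Once the files are compressed the ratio will usually not be exactly 3x, since it depends on the format and on the page content.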
 