Trying to get a "perfect" scan of original documents...

  • Thread starter: James Rowe, Sr.

James Rowe, Sr.

Hello...

I am using an Epson Perfection 2580 and my goal is to make PDF files
of many documents I have accumulated over the years. An example of
the text in these documents would be at:

http://www.drf.com/row/pps/04hol_futurity.pdf

I want to preserve the quality as much as possible. So far, my best
bet has been to use grayscale at 600 dpi, scan as a TIFF, and then
import it into Acrobat to convert to a PDF. I have found the
histogram hints at www.scantips.com very useful, because the biggest
issue has been the faint gray background that results from the scan.
The quality is very good, although I wonder if I can do better.

Also, I do have a few issues. Each page is a couple of MB, which
isn't the end of the world, and I'd take the size hit if I'm gaining
the quality. I have tried to use B&W mode and the resulting scans are
just not as good. Same with 300 vs. 600. There is definitely a
difference.

Anyway, before I get underway with the project, I am wondering if what
I've found makes sense, or if someone has a suggestion on how to do
this either better (for quality) or more efficiently (either file
size, a software suggestion, or maybe the TIF to PDF process).

Any thoughts appreciated... thanks, and enjoy the holidays.
 
Your PDF looks very good. I do see some cropping on the right edge. Maybe
change to landscape orientation to get the full width of the document in the
PDF.
The size of the PDF is not bad; it is 141KB.

You seem to have found a method that works.

If Acrobat can accept a .GIF file, you would really reduce the size of the
scanned file. If you have to use a TIFF, you will just have to take the
couple of MB hit.

The only suggestion I would make on scanning is to try adjusting the
threshold setting in B&W Text mode, to see if you can eliminate the
background of the paper and get blacker blacks.

If that does not give the results you want, try adjusting the white point in
the grayscale histogram. At some point you can white out the gray
background and keep the black text. Increasing the contrast will also help.
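
For anyone who wants to see what those two adjustments do to the pixel values, here is a rough Python sketch. It is only an illustration: the function names, the sample gray values, and the cutoffs are made up, and real scanner software will not work exactly this way (it usually has a gamma control as well).

```python
def threshold(pixels, cutoff=128):
    """B&W text mode: every gray value becomes pure black (0) or pure white (255)."""
    return [255 if v >= cutoff else 0 for v in pixels]


def set_levels(pixels, black_point=0, white_point=255):
    """Grayscale levels: stretch black_point..white_point to 0..255 and clip the rest."""
    if white_point <= black_point:
        raise ValueError("white_point must be greater than black_point")
    span = white_point - black_point
    out = []
    for v in pixels:
        scaled = round((v - black_point) * 255 / span)
        out.append(max(0, min(255, scaled)))
    return out


# Hypothetical gray values (0 = black, 255 = white): dark text, the soft
# edge of a letter, and two samples of faint gray paper background.
sample = [30, 128, 225, 240]

print(threshold(sample, cutoff=128))
print(set_levels(sample, black_point=30, white_point=225))
```

With the white point pulled down to 225, the paper background goes to pure white while the letter edge stays a middle gray, which is one reason grayscale with a histogram tweak tends to look smoother than a hard threshold.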

There is nothing wrong with scanning at 600 dpi to get the needed detail.
From the looks of your PDF, you needed 600 dpi; there is a lot of detail in
that document.
 
Your PDF looks very good.

I'm sorry for the confusion. The link I had posted was an example of
the originals I am working with. I only wish that my scans were that
perfect! Is it normal that my grayscale/600 scans are turning into
2MB PDFs? Neither B&W mode nor 300 dpi grayscale has ever looked as
good for me, and I know that the file size would be dramatically less
with either of those options.

Thanks,
Jim
 
Why don't you just scan directly into a .pdf? Go to File > Import in Acrobat
to access this feature. You can also adjust the parameters within Acrobat to
meet your quality needs while still trying to minimize file size. See the
user's manual for more info on this.

Doug
 
Well, a little calculation: if you are scanning an 8.5 x 11 page at 600 dpi, you
are getting:
8.5 x 600 = 5100 pixels
11 x 600 = 6600 pixels
Since gray scale uses 8 bits per pixel, or one byte per pixel, you have:
5100 x 6600 = 33,660,000 bytes, which is a 32 MB uncompressed TIFF.

A B&W scan uses 1 bit per pixel, so divide the above by 8 (8 bits per
byte) = 4,207,500 bytes for a B&W file.
My scanner software confirms the calculation: gray scale at 600 dpi,
8.5 x 11 = 32,761 KB and B&W = 4,117 KB.
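
Here is the same arithmetic as a small Python sketch, in case you want to try other page sizes or resolutions. The function name is just made up for illustration, and it only counts raw pixel data: it ignores file headers, row padding, and any compression the TIFF or PDF might apply.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw pixel data size in bytes for a scan of the given page size."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


for label, dpi, bits in [
    ("grayscale 600 dpi", 600, 8),
    ("B&W 600 dpi", 600, 1),
    ("grayscale 300 dpi", 300, 8),
]:
    size = uncompressed_bytes(8.5, 11, dpi, bits)
    print(f"{label}: {size:,} bytes ({size / 1024 ** 2:.1f} MB)")
```

Halving the resolution to 300 dpi cuts the raw grayscale data to a quarter, since both the width and the height in pixels are halved.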

In the USA we use a comma between thousands and a period (dot) for decimals.

If Acrobat is reducing that file down to 2 MB, I would say, Yeah!

OCR or PDF quality varies. If the originals are very clean, with no smudges
and perfect type, then the PDF or OCR results are very near perfect. The
best accuracy claimed is 99%.

There is a saying, "You can't make a silk purse out of a sow's ear," which
means that you need silk to make a silk purse, not pig skin.

If you start with less than perfect documents, then you will have trouble
with the OCR and PDF.

The best a person can hope for is the best scan possible and a little clean
up of the scan in a Photo Editor before giving it to Acrobat.

You have a big job to do.

OT:
Have you seen the PDFs made by the US government? I have never seen so many
PDFs that are almost perfect. Any .gov web site will have many PDFs.
fcc.gov is loaded with PDFs.

I guess when you have unlimited funds, you can get that kind of result.
 
Actually, if you own Acrobat Professional, it installs device drivers
that allow you to print directly to pdf format. You can do this with a
variety of software packages like Word, PowerPoint, etc. It isn't THAT
expensive.

Brian
 
Good to know.

Acrobat Pro is too expensive at $449 from Adobe for me.

I do not use PDF enough to justify buying it.
 
I use Acrobat 5 at work. It cost around 150UKP a couple of years ago. At
home I use WordPerfect Office 2002 for all my word processing. It cost me
25UKP on eBay and will produce excellent PDFs.

Mike
 
RE/
Acrobat Pro is too expensive at $449 from Adobe for me.

The last installation I worked at was getting it for about $200 per seat.
Might be worth shopping around.
 
Another nice feature is that the process creates searchable text, so
you can query lengthy documents for desired words and phrases. It
seems to be EXTREMELY efficient in data compression. I am working with
a 14+ meg WordPerfect document (lots of figures) that reduces down to
a perfect-looking pdf document that is only about 800K. I'm still
trying to figure that one out.

Brian
 
----- Original Message -----
From: "James Rowe, Sr." <>
Newsgroups: comp.periphs.scanners
Sent: Tuesday, December 21, 2004 9:13 AM
Subject: Re: Trying to get a "perfect" scan of original documents...

I'm sorry for the confusion. The link I had posted was an example of
the originals I am working with. I only wish that my scans were that
perfect! Is it normal that my grayscale/600 scans are turning into
2MB PDFs? Neither B&W mode nor 300 dpi grayscale has ever looked as
good for me, and I know that the file size would be dramatically less
with either of those options.

Thanks,
Jim

Hello Jim,
You're NOT going to be able to duplicate the quality of the
example you provided from DRF.
The Daily Racing Form (ever since I've seen it, late 60's) has used
recycled and very, very thin newspaper. The pages will yellow rapidly, and even
deteriorate over time if not stored properly.

Recently I've been working with some old harness racing newspapers that were
identical in page size to the DRF and the same quality of paper or even
less. The ones that I've done are called "The Horse Review" and were published
until 1930.

The best solution that I've found is to go to an office supply store that has a
quality copy machine with somebody working there who can instruct you on
using it.
Kinko's works for me on a Saturday or Sunday morning at 7 AM.

Take one form with you for your test.
Copy your pages at 11 X 17 with a black and white copier.
You may want to trim and straighten the edges and angle of layout of the
pages at Kinko's rather than at home with scissors.
Go home and scan.
If successful, return next time to Kinko's with your pile of forms and a
pocketful of money. Save some for the track though ;-))))
 
Hello...

Thanks to everyone for the great replies. I've been tinkering with
this for a few days, and I have determined that I'm never going to
achieve perfection. I think even if I had the FCC's money that would
be the case...nothing is going to be like an originally generated PDF.

The "sexiest" looking scan I got was a full blown 32MB TIF or BMP, and
I think that 128 pages of them would fit on a DVD. It's not a
far-fetched scenario, because I really am archiving, and not looking to
share the files. However, my goal is to be able to print them for my
enjoyment. When printed, there is still a bit of a quality hit even
at 32MB, and that's where the conclusion of never achieving perfection
came from.

Black and white was definitely out as an option, and tinkering with
the histogram in grayscale was the best bet. 600 dpi made enough of a
difference to me that I wouldn't be happy at 300. I have no complaint
about my Epson Perfection 2580...the scans seemed really amazing to my
untrained eyes.

As far as some of the ideas, scanning them directly as PDFs in Acrobat
is a huge, and I mean an incredibly huge, space savings in file size,
but I thought the quality suffered even when I jacked up the settings
to highest quality. Also, I found that converting TIF files in
Acrobat to PDF was different than converting them to Photoshop PDFs.
I assume an expert out there would know why that is. The file size
was bigger in Photoshop, but both files still opened as plain ol'
PDFs.

To my standardbred friend, I was fortunate to have cut up a lot of
Forms many years ago and photocopied the races I wanted, plus I have
past performances that were generated on normal paper. I do have a
lot of yellowing editions that are untouched, and the Kinko's idea is
a terrific one.

I wish you all the best this holiday season, and if someone can talk
me out of the idea that perfection is impossible, I'm all ears.

James
 
----- Original Message -----
From: "lostinspace" <>
Newsgroups: comp.periphs.scanners
Sent: Thursday, December 23, 2004 10:47 PM
Subject: Re: Trying to get a "perfect" scan of original documents...

Jim,
I've had some additional thoughts.
Some 2 1/2 years ago I obtained some photocopies of these same DRF-type
Horse Reviews from the 1890's.

Central Michigan University had some specific articles in these papers in
which I was interested. I was offered the chance to copy them myself; however, I
thought the papers were so fragile that I preferred not to do so. Rather, CMU had a
staff student do the copying and I picked it up a few days later.
The student downsized the original 11 X 17 pages to 8 1/2 X 14. Initially I
wasn't happy with this, as it made the OCR that I wished to do more
difficult.
(These articles are online at one of my websites. Just do a search on "Horse
Review".)

For your purpose of scanning images rather than OCR, it would prove
effective.

However, not being aware if the Epson 2580 has an 11 x 17 surface (I doubt
it), you may need to scan these downsized copies (8 1/2 X 11) at a larger DPI
in your scanner and then resize them to 8 1/2 X 11 in Acrobat.

Scanning directly, using Acrobat and the line-art (black and white) option,
will be effective from these photocopies. It will not be the same quality
as in the DRF example you provided; however, it will be effective. Some
scanners offer a DPI setting change for LineArt/Black and White while others
do not.

I've scanned some harness racing programs (non 11 X 17) into Acrobat PDFs
as LineArt/Black and White which are quite useful, even at the scanner
option of 150 DPI.

I've also sent you a private email. I may be able to assist you more, so do
not hesitate to contact me.
 
Actually, if you own Acrobat Professional, it installs device drivers
that allow you to print directly to pdf format. You can do this with a
variety of software packages like Word, PowerPoint, etc. It isn't THAT
expensive.


There is no difference in the Standard and Professional version of Acrobat in
that regard, as both offer the same "print to pdf" driver (called Distiller).
There is a feature comparison chart on the Adobe site that shows version
differences, and the differences are in other areas, mostly Forms and CAD
options. But printing to PDF is basic in both, this is simply what Acrobat
does, how it works.

Instead, it is PaperPort 10 where it is only the Professional 10 version
(and NOT the Standard 10 version) that also provides an extremely similar
"print to pdf" driver.

These allow you to print anything to PDF that can be printed from any program
that can print anything. You simply select that "pdf printer" at the
File-Print menu, and you get PDF, regardless of what you print. The idea is
that the pdf formatted page from that original program is the same as you
would see on printed paper. For example, your tax software can print copies
of your tax forms to PDF. Quicken can print its reports to PDF, etc.

Both programs will also scan pages into PDF, but the main idea of PDF is to
print the original source TEXT document to pdf, from the original's word
processor or page layout program. The original source document is printed.
This printed PDF then stores the printed text as text characters, instead of
as a huge image of a page (searchable, and the text characters are vastly
smaller than full page images). This is the "normal" way PDF files are
created, and the way Acrobat is designed to work. The case of scanning full
page images into PDF is just a subset of this, one very large image on the
otherwise blank pdf page, no text.

The huge advantage of printed original text over scanned pages is that the
text form is very clear, and clear at any enlargement, and the files are
extremely small, at least relative to full size scanned pages, which can
quickly become astronomical.

The main factor is of course that you need the original source file.

When you print to PDF, that PDF driver software has options of how to resample
any embedded images on the original source text page, for example to 75 dpi
size for video purposes or 300 dpi size for purposes for high quality
printing. It's pretty important you get this choice correct for your purpose.
If you only have full page images, this would be the main difference of
printing full page scanned images as opposed to simply inserting or scanning
them into Acrobat. Printing will resample to the declared goal size,
inserting them will leave them as is (at least this seems to be true).
PaperPort seems more eager to resample all of them, but it is configurable.
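
To put rough numbers on that resampling choice, here is a small Python sketch of how many pixels a full-page image ends up with at a given target resolution. The function name and the 8.5 x 11 inch, 600 dpi starting image are assumptions for illustration only; this is not a description of how Acrobat or PaperPort actually implement it.

```python
def resampled_size(width_px, height_px, source_dpi, target_dpi):
    """Pixel dimensions after resampling to target_dpi at the same printed size."""
    def scale(n):
        # Integer arithmetic with round-half-up, to avoid float surprises.
        return (n * target_dpi + source_dpi // 2) // source_dpi
    return scale(width_px), scale(height_px)


source = (5100, 6600)  # an 8.5 x 11 inch page scanned at 600 dpi
for target in (300, 75):
    w, h = resampled_size(*source, source_dpi=600, target_dpi=target)
    print(f"{target} dpi: {w} x {h} pixels, {w * h:,} total")
```

At 75 dpi the page keeps under 2% of its original pixels, which may be fine for the screen but is far too coarse for small print, so the setting is worth checking before converting a whole batch.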

I believe this Racing Form original is a similar case (printed to pdf from the
original layout source), and is NOT a scanned page. It is obviously not an
image, and as proof of that, just scale it, down to 25% size, up to 800% size,
and watch the features, which all scale perfectly at any size, even including
the oval race track logo in upper right corner. Also as proof, if you
"extract images" in Acrobat, you get very little to show for it. This
"printed" logo and all text characters and lines are created from scaled
PostScript line drawings instead of scanned bit map images. Conversely, for an
example, find a PDF with a scanned page image, and scale it up and down
similarly, and the difference will be immediately apparent. One scales, one
doesn't, or rather the quality suffers very greatly.

My point is that we should not assume all those PDF hardware/software manuals
we see are scanned pages, as they certainly are not. The original source file
that prints the manual to paper was simply also printed to pdf file. And we
should not assume scanned pages can have the same perfect quality either.
When we see a PDF containing a full size scanned page image, we recognize it
immediately. But the normal form of pdf file is printed from the original
source file.
 