Beginner's questions on scanning


Larry

I have a large job to do, turning several hundreds of pages of 8.5 x 11
xeroxed pages into a single .pdf document. It is straight text. The
computer I'm using has the Lexmark 1100 all-in-one with a flat bed
scanner. I've never done a large scan job and have never created a pdf
document.

What is the best and fastest way to do this? Should I scan into Word,
and then change the Word documents to .pdf? The problem with that is
that when I pick Word as the application for the scanning, after a
single page is scanned, Word opens with that document in it. If I then
scan a second page, a _second_ instance of Word opens with that second
page in it, instead of adding the second page to the first page in a
single document. I don't see how to create a multiple-page document if
I'm using Word as the application.

Alternatively, I can scan into OCR and then change the OCR to pdf, and
that allows the creation of a multiple-page document, which I can then
save as a pdf. That seems the way to go. However, if I do scan into
OCR, the application I must use by default is the Lexmark Photo Editor,
and that doesn't sound like the right kind of application for pages that
are all text.

Another question. Let's say I scan 50 pages into the OCR and then turn
that into a pdf file. Then I go away and come back to do more pages
which get turned into a second pdf file. How do I combine the two pdf
files into a single file, keeping the correct order of the pages?

BTW, the Help files for the Lexmark are really poor. Everything is
broken into separate little steps; nothing gives you an overview of how
to proceed.

I appreciate any tips on this. Thanks much.

Larry
 
----- Original Message -----
From: "Larry" <>
Newsgroups: comp.periphs.scanners
Sent: Saturday, July 10, 2004 12:34 AM
Subject: Beginner's questions on scanning


Hello Larry,
Is your all-in-one scanner a flatbed or a sheet-fed, fax-like unit?
I've never used the fax-like kind, except as a photocopier.

You haven't said what your end use or purpose will be for
the entire package.
Does it need to be searchable?
Or will it just be printed?

Nor do you specify the quality of the photostat copies you're working with,
or even the source the photocopies were taken from.

Every scanning job calls for its own method when OCRing text. The quality of the paper,
ink and print in the original source ALL affect your method and
capabilities.

FAST?
Then skip OCR!
Scan directly into Acrobat with your scanner set to "line-art" at 150 DPI.

The end result will be what you desire, FAST (the scan file sizes will also
be compact), but what you're able to use this line-art scan for will be
very limited (NO search option). The print quality of a line-art
scanned PDF is nowhere near that of a job OCR'd with fonts; however,
in most instances the line-art pages are perfectly readable.
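
To put rough numbers on the file-size point above, here is a minimal sketch that computes the uncompressed size of one scanned 8.5 x 11 page at a given resolution and bit depth. The function name is made up for illustration, and the figures are raw bitmap sizes only: the size of the saved PDF depends on whatever compression the scanning software applies, which this does not model.

```python
def raw_scan_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Uncompressed size, in bytes, of one scanned page."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    # Total bits divided by 8, rounded down to whole bytes.
    return width_px * height_px * bits_per_pixel // 8


# Line art is 1 bit per pixel; 8-bit grayscale is shown for comparison.
line_art_150 = raw_scan_bytes(8.5, 11, 150, 1)
line_art_300 = raw_scan_bytes(8.5, 11, 300, 1)
gray_300 = raw_scan_bytes(8.5, 11, 300, 8)

for label, size in [("line art, 150 ppi", line_art_150),
                    ("line art, 300 ppi", line_art_300),
                    ("grayscale, 300 ppi", gray_300)]:
    print(f"{label}: {size:,} bytes per page before compression")
```

Doubling the resolution quadruples the raw size, and going from 1-bit line art to 8-bit grayscale multiplies it by eight, which is why the scan mode matters so much for a job of several hundred pages.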

If you require searchable pages and quality print?
Then there is NOT any FAST method.
You OCR the pages individually and make the corrections.

There are ways to improve your OCR work.
Most scanners, when used for OCRing text, are nearly as sensitive as what I've
seen discussed in this forum for scanning slides.
My current scanner was cleaned when it was NEW and FRESH out of the box
because the manufacturer had left some type of haze on the glass. IMO it was not
a sign of effective quality control; however, with regular cleaning the
bottom-of-the-line scanner has worked superbly for my purposes.

I do way more scanning than the average or even above-average person. In
most instances, what I'm working with are magazines going back from 1940 to
the present. However, I've even scanned some books from as early as 1903.
In scanning the magazines, I scan the text and images separately. The text
is OCR'd and saved as a Word RTF (or 6.0) doc. In rare instances where the
printed font is very small I may either scan as line-art or OCR; it depends,
as previously mentioned, on what my intended end use is.

Word does NOT open Acrobat PDFs.
Word does NOT edit Acrobat PDFs.
The two are entirely separate programs. The Acrobat Reader may open from
within Word; however, that transition and/or difference should be obvious to
the user.

In Acrobat (the full version, NOT the free Reader), you're able to insert
and/or rearrange your pages both before and after the current page.

When adding a new page to any type of scanned document, your active cursor
should be at the position where you want the scan inserted.

Sounds to me like you just need to read up on SCANNING.
Here's an excellent source and every SECOND you spend reading these pages
will save you days and weeks in time later.
http://www.scantips.com/
 
One thing to add to what "lostinspace" said: you can download a
module for the full version of Acrobat 5.x that adds OCR capability to
Acrobat (it may be standard in the newer 6.x versions). The OCR is not
perfect, but as OCR goes it is fairly decent. I would definitely scan at
225-300 dpi if you plan to use OCR in Acrobat, just so that it does a
better job of resolving the text.
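
As a rough illustration of why resolution helps OCR, the sketch below converts a type size in points to a height in scanned pixels. The 10-point size is an assumption chosen for illustration (the thread does not say what size the type is), and a point size measures the full font body, so individual lowercase letters cover fewer pixels than the figure shown.

```python
def type_height_px(point_size, dpi):
    """Height in pixels of a font body of the given point size.

    One point is 1/72 of an inch, so the height in inches is
    point_size / 72, and the scan resolution converts that to pixels.
    """
    return point_size / 72 * dpi


# Hypothetical 10-point body text at the resolutions mentioned in the thread.
for dpi in (150, 225, 300):
    print(f"{dpi} dpi: about {type_height_px(10, dpi):.1f} pixels tall")
```

At 300 dpi each line of type gets twice as many pixels in each direction as at 150 dpi, which gives the OCR software more detail to work with.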

Doug
 
As I said, it is a flat bed scanner. The originals are good quality
xeroxes copied from books and articles, on 8.5 x 11 sheets. It is a
collection of articles and excerpts from books on politics and political
philosophy. The end use will be a pdf document posted at a web site
that people can access.

Why do you capitalize FAST? Is that an acronym?

I will check out the link you provided.

Thanks,
Larry
 
Why do you capitalize FAST? Is that an acronym?

It was merely for emphasis. Quotes or asterisks would have served just as
well.
Sorry if I confused you.
 
I have a large job to do, turning several hundreds of pages of 8.5 x 11
xeroxed pages into a single .pdf document. It is straight text. The
computer I'm using has the Lexmark 1100 all-in-one with a flat bed
scanner. I've never done a large scan job and have never created a pdf
document.

I've been scanning documents for several years both at home and work.
Several hundred pages is a daunting task! I'd definitely look into a
sheet-fed scanner for that! Yesterday I had to scan about thirty
pages of an old manual on an HP flatbed scanner for a customer and it
took more time than I care to spend.

Software that you should consider is PaperPort 9.0. You can scan
pages here and there and convert them to or create them as PDF files
and stack separate batches into one single file and email it or
whatever. The text is searchable when scanned as text and run through
the PaperPort OCR, which does a great job if the text is clear and the scan
resolution is set to 300 dpi.
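
Whatever tool does the stacking, batches saved in separate sessions have to be combined in page order. A minimal sketch of one way to keep that order, assuming hypothetical file names such as batch_1.pdf and batch_10.pdf (not names produced by PaperPort or any other program in this thread): a plain alphabetical sort puts batch_10 ahead of batch_2, so the numbers are compared as numbers instead.

```python
import re


def natural_key(name):
    """Sort key that compares runs of digits as integers.

    re.split with a capturing group returns alternating text and digit
    chunks, e.g. "batch_10.pdf" -> ["batch_", "10", ".pdf"].
    """
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


# Hypothetical batch files from three scanning sessions.
batches = ["batch_10.pdf", "batch_2.pdf", "batch_1.pdf"]

alphabetical = sorted(batches)
ordered = sorted(batches, key=natural_key)

print("alphabetical:", alphabetical)
print("page order:  ", ordered)
```

Naming the files with zero-padded numbers (batch_001.pdf, batch_002.pdf) avoids the problem without any code, since alphabetical and numeric order then agree.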
 
Larry,
I've had a sinus migraine for two days, which was quite evident in
my last brief reply.

If all you desire for the end result is reading and printing, then line-art
scans will do OK.

However, your initial inquiry said the following: "a large job to do,
turning several hundreds of pages of 8.5 x 11 xeroxed pages into a single
.pdf document." End of quote.

No mention there of website use, as in your reply?
Have you seen the server-side logs for a website when visitors begin viewing
multiple-page PDFs online?

There are numerous repeat downloads of the entire PDF when the visitor
merely goes from page to page. Let's say you have a 100-page PDF which the
visitor has opened online.
When they view page one the full file loads; then when they change to page
two, the full file loads again. It's possible that you could have as many as
50-100 loads of your entire large file if the visitor continues browsing to
the end. (I find such visitors annoying and short-sighted.) Add to that the
possibility of numerous bots spidering your PDF, both good and bad bots,
and your bandwidth use could increase unnecessarily, very fast.

My suggestion is some caution in the placement of the file (a
disallowed folder in your robots.txt solves the bot problem).
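
A minimal sketch of what such a rule looks like and how a compliant crawler would read it, using Python's standard urllib.robotparser module. The folder name /scans/, the example.com URLs and the bot name are all assumptions made up for illustration. Note that robots.txt is only a request: well-behaved crawlers honor it, while badly behaved ones may ignore it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all crawlers out of one folder.
rules = """\
User-agent: *
Disallow: /scans/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A crawler that follows the rules asks before fetching each URL.
pdf_allowed = parser.can_fetch("ExampleBot", "http://example.com/scans/readings.pdf")
home_allowed = parser.can_fetch("ExampleBot", "http://example.com/index.html")

print("PDF folder open to crawlers:", pdf_allowed)
print("Home page open to crawlers: ", home_allowed)
```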

However, I would also consider breaking up the large document into smaller
chapters or modules. Six to eight pages online are plenty from a server-side
point of view.
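
To see what that split would mean in practice, here is a minimal sketch that works out the page ranges for each smaller file. The function name is hypothetical, and the 100-page total and 8-page limit are just the figures mentioned above used as an example; the actual splitting would still be done in whatever PDF software is used.

```python
def page_ranges(total_pages, pages_per_file):
    """Return (first_page, last_page) pairs, numbered from 1."""
    return [(start, min(start + pages_per_file - 1, total_pages))
            for start in range(1, total_pages + 1, pages_per_file)]


ranges = page_ranges(100, 8)
print(len(ranges), "files")
for first, last in ranges:
    print(f"pages {first}-{last}")
```

The last file simply holds whatever pages are left over, so it may be shorter than the rest.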
 
Unfortunately, your replies are hard to follow. For one thing, instead
of replying to my questions you go into a lot of other side issues.
Even when you are addressing my questions it's hard to understand what
you're saying. It would help if you would read over your posts
before you post them and ask yourself, "Is the other person going to be
able to understand anything useful out of this?"

I'm not saying this to put you down. I'm telling you something
important for your and other people's benefit. The whole point of these
groups is to exchange useful information, and while I appreciate your
replies, you need to try to write your replies in a way so that the
other person will be able to make some sense out of them. Thank you.

Larry
 
----- Original Message -----
From: "Larry" <>
Newsgroups: comp.periphs.scanners
Sent: Monday, July 12, 2004 5:43 AM
Subject: Re: Beginner's questions on scanning


My replies do not make sense to you because you have not done any extensive
scanning, which is obvious today.

Did you view the link I provided in the closing line of my previous reply?
I really don't need to ask and can affirm without your answer that you DID
NOT!

Hopefully someone more adept at communication than I am will come
along and assist you.


 
The web site you referred me to was enormously technical, way beyond
anything I need to know. I just need some basic practical help on how
to scan pages and create a usable pdf file.

Larry
 
Very basics:

Open Adobe Acrobat (you own that?)
File/Import/Scan.
100ppi for screen, 300ppi for good printing.
Scan.
Save PDF file.
Makes image based PDF.

There are also other PDF generating programs out there, from free to
still cheaper than Acrobat.

Or scan into Adobe Elements or Photoshop.
Save as Adobe PDF.
Also makes image based PDF.

Or open your OCR program (know what that is yet?).
Maybe you have one. If so, scan at 300ppi.
Maybe it outputs PDF directly.
If not, scan to a text or word processing file and print that with the Acrobat
printer driver into a PDF file.
Makes text based PDF.

If the above message is too technical, better farm your work out.

Mac
 
At least read Wayne's "Scanning Line art" to get an idea of the process.
http://www.scantips.com/basics04.html

OCR = Optical Character Recognition

For a how-to:
1. I will assume that you do not have a document feeder. (That
is a device that attaches to the scanner and scans multiple sheets
automatically.)

2. Place the document face down on the scanner glass.
3. Start the PDF and OCR conversion software: Omnipage Pro 14, PaperPort
9.0, Adobe Acrobat 6.0 Professional or Adobe Acrobat 6.0 Standard.
4. Scan one or more pages.
5. OCR and edit.
6. Save as a PDF file.

Omnipage Pro 14 and PaperPort Deluxe 9 cost the least if purchased via
www.scantips.com.

Links to the manufacturers of the software:
http://www.scansoft.com/omnipage/
http://www.scansoft.com/paperport/pro/
http://www.adobe.com/products/acrobat/matrix.html


You must have one of these software packages to scan and convert document(s)
to a multi-page PDF file.

Scan the document with the interface that is built into each of the above
applications.
Follow the instructions in the application for each step in the process.

I am not familiar with any of the software except Omnipage Pro (I own
Omnipage Pro 12).

There are three steps in Omnipage Pro: scan, OCR, then save the PDF file.

Step 1 is to scan one or more pages. To have a multi-page PDF, you must scan
all the pages at once. (You can save the scans in a file for later recall.)
Step 2 is to OCR and edit. (You will have to edit! OCR is not 100% accurate; it is about
99.?% if you have very clean documents with normal-size type.)
Step 3 is to save the file as PDF.

Done.
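
To give a feel for how much editing Step 2 can involve, here is a minimal sketch that estimates the number of misread characters for a whole job. Every number in it is an assumption for illustration: 600 pages, 3,000 characters per page, and two sample accuracy rates. The post above leaves the exact accuracy figure open, and real results depend on the scans and the OCR software.

```python
def expected_ocr_errors(pages, chars_per_page, accuracy):
    """Expected number of misread characters for a whole job.

    accuracy is the fraction of characters recognized correctly,
    e.g. 0.99 for 99%.
    """
    return pages * chars_per_page * (1 - accuracy)


# Hypothetical job size and sample accuracy rates.
PAGES = 600
CHARS_PER_PAGE = 3000

for accuracy in (0.99, 0.999):
    errors = expected_ocr_errors(PAGES, CHARS_PER_PAGE, accuracy)
    print(f"{accuracy:.1%} accurate: roughly {errors:,.0f} characters to fix")
```

Even a small gain in accuracy removes a large share of the corrections on a job this size, which is the practical reason for starting from clean scans at a suitable resolution.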

There is one problem with OCR: it does not convert handwriting to
editable text. Handwriting can be included in a PDF file as a graphic, as can
pictures.
 
It would save time if you would read my previous messages before
answering. I already said (1) that I'm using a flat bed scanner, (2)
that I have Adobe Reader, not the full program, (3) that I had checked
"Scan as text" (OCR), (4) that I scanned into adobe, and (5) that the
result was these huge files with almost a megabyte per page.

Now, with the instructions below, are you saying that I initiate the
scan from within Adobe?
Open Adobe Acrobat (you own that?)
File/Import/Scan.
100ppi for screen, 300ppi for good printing.
Scan.
Save PDF file.
Makes image based PDF.
If the above message is too technical, better farm your work out.

I'm looking for step-by-step instructions that a beginner like myself
can follow. The problem is not that it's too technical, but that
you're writing the instructions in tech shorthand that leaves out crucial
information. For example, the above instructions don't lay out what the
starting point is. Are these instructions for a fresh scan from paper?
Or are they instructions on how to convert my too-large pdf files to
smaller pdf files?
 

It would save more time if no one answered at all, which is beginning to
look like what you deserve.
Sorry for trying to help without doing the detailed homework you require.


Last hint: if you are "scanning into Adobe", I assume you mean Photoshop
or Elements? If so, you are only going to get an image file, which is larger.
You need to OCR to either straight unformatted text or into a word
processing program, if you want formatting.

bye,

Mac
 
I'm active in other newsgroups and have given useful advice to hundreds
of people asking questions about MS Word. I try to write my replies in
a way that will be clear and helpful to people. I do not write in
techie shorthand and I do not tell people to jump off a cliff when they
tell me they didn't understand my previous post.

There must be someone reading this group who has used the Lexmark
all-in-one scanner or something similar to it and can tell me the simple
steps by which I can scan pages into a pdf file at an acceptable size,
or alternatively, if there is a way to turn the already existing pdf
files I've created (which are way too big) into smaller files, and who
does not insist that I must become an expert on the subject of scanning
before I do so.
 

I think you've had some very good advice from people who really are trying
to help. The type of scanner you have is irrelevant, it's only an input
device. If all you want to do is create PDFs from paper then here is the
easiest solution. It's not the cheapest, because that requires more
experience than you appear to have (or wish to acquire).

1) Purchase a full copy of Adobe Acrobat.
2) Install Acrobat.
3) Read instructions.
4) Scan documents.
5) Save file.

This will give you a single PDF file using Acrobat's default settings for
compression, etc.

Mike
 
Mike Dunstan said:
I think you've had some very good advice from people who really are trying
to help. The type of scanner you have is irrelevant, it's only an input
device. If all you want to do is create PDFs from paper then here is the
easiest solution. It's not the cheapest, because that requires more
experience than you appear to have (or wish to acquire).

1) Purchase a full copy of Adobe Acrobat.

Mike means this one! NOT just the free Reader. Get out the credit card and
order!
http://www.adobe.com/products/acrobatpro/main.html
2) Install Acrobat.
3) Read instructions.
4) Scan documents.
5) Save file.

This will give you a single PDF file using Acrobat's default settings for
compression, etc.

Mike
I have told you about Omnipage Pro 14 for $100, but the full Adobe Acrobat
is a one stop solution for $450.

Adobe is the originator of the PDF format. Adobe knows the best way and you
pay for that expertise.
 
Thanks to both Mike and CSM1 for your replies.

It has now become clear, both in this thread and from what Wayne told me
in the other thread "Disaster with documents scanned into pdf," that the
simple answer I was looking for was that I needed
the full version of Adobe Acrobat, or Omnipage Pro 14, to do this job
properly. I was simply using the wrong equipment. I had thought the
Lexmark 1150 all-in-one was sufficient because it gave me the option to
scan from paper into Adobe format, along with the option to scan as text
using OCR. But evidently a cheapo device like the Lexmark (barely over
$100) is not going to produce satisfactory results with a big job like
this.

I just looked up Omnipage on the web; it is a software program that does a
variety of tasks, including scanning paper into pdf. But one still needs
a physical scanner. I think this question has already been addressed,
but just to make sure, would Omnipage work well in conjunction with the
Lexmark scanner?

Thanks again,
Larry
 
I have told you about Omnipage Pro 14 for $100, but the full Adobe Acrobat
is a one stop solution for $450.

Abbyy Finereader is better and produces much better PDFs.
 
but just to make sure, would Omnipage work well in conjunction with the
Lexmark scanner?

Wayne made a good suggestion in "Disaster with documents scanned into pdf",
which was to get the free Irfanview and test whether the Lexmark scanner has a TWAIN
interface.
http://www.irfanview.com/
Get the plug-ins also. Irfanview is very limited without the plug-ins.

If Irfanview can scan from the Lexmark scanner, then Omnipage will probably
work with the Lexmark also.

On the question of rescanning those 600-plus pages, the answer is
probably yes.
For Omnipage to do the job, it needs good clean scans at 300 DPI or better.

--
CSM1
http://www.carlmcmillan.com
 