Bulk Scanning - Hardware and Software

Jim

Hello,

I am trying to archive a huge amount of paper material for a school.
There are 10,000+ pages, and I want to group them into .pdf files
automatically. So I put 50 pages in the scanner and press scan, and
those go into one .pdf; then the next 25 go into another file, and so
on. It would be great if the scanner could accept two-sided pages,
since there are a lot of those. This needs to be pretty much automated
so that I do not have to babysit the scanner all day: just put in the
next section and go. If there is a way to automatically index the
files, or make them searchable PDFs, that would be fantastic. What
would the best hardware/software recommendations be for this? Thanks.

-Jim
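
The batch workflow described above (one stack of pages in, one PDF out) can be sketched as a small planning step. This is only an illustration under stated assumptions: the page file names, the `batch_NNN.pdf` naming scheme, and the `plan_batches` helper are all hypothetical, and the sketch only decides which scanned pages belong to which output file. It does not drive a scanner or write PDFs.

```python
def plan_batches(page_files, batch_sizes, prefix="batch"):
    """Assign scanned page files, in scan order, to one output PDF per batch.

    page_files  -- page image names in scan order (hypothetical names)
    batch_sizes -- number of pages fed into the scanner for each batch
    Returns a list of (pdf_name, pages) tuples.
    """
    if sum(batch_sizes) != len(page_files):
        raise ValueError("batch sizes do not add up to the number of pages")
    plan = []
    start = 0
    for number, size in enumerate(batch_sizes, start=1):
        # Each batch takes the next `size` pages, so documents never overlap.
        pages = page_files[start:start + size]
        plan.append((f"{prefix}_{number:03d}.pdf", pages))
        start += size
    return plan


# Example from the post: a stack of 50 pages, then a stack of 25.
pages = [f"page_{i:04d}.tif" for i in range(1, 76)]
plan = plan_batches(pages, [50, 25])
for pdf_name, batch in plan:
    print(pdf_name, len(batch), batch[0], "...", batch[-1])
```

Keeping the batch sizes as an explicit list makes it easy to check afterwards that no page was skipped or double-fed: the sizes must add up to the number of page images the scanner produced.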
 
The short answer to your question is that it can't be done.

You are asking too much of present technology.

There are scanners with an ADF (automatic document feeder) that are fast
and will scan 50 pages at a time and create multi-page image files.
Unattended, maybe not: paper jams have to be cleared by somebody, and the
multi-page image files have to be seen to (filenames, and where each
document starts and stops).

As for creating error-free PDFs, it is not possible; the best-known
software for creating PDFs from scanned documents is barely 99% accurate.

You can create multi-page documents with the right software, but not
without operator attendance.

Another question: how much money do you have for this job? The better
equipment and software are very costly.
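
To put the 99% figure above in perspective, here is a rough back-of-the-envelope calculation. It is a sketch under assumptions, not a measurement: it reads "99% accurate" as per-character accuracy and assumes about 2,000 characters on a typical page. Only the page count comes from the original question.

```python
def expected_ocr_errors(pages, chars_per_page, accuracy):
    """Expected number of misread characters at a given per-character accuracy."""
    return pages * chars_per_page * (1.0 - accuracy)


PAGES = 10_000          # from the original question ("10000+ pages")
CHARS_PER_PAGE = 2_000  # assumption: a typical page of typed text
ACCURACY = 0.99         # assumption: "99%" means per-character accuracy

total = expected_ocr_errors(PAGES, CHARS_PER_PAGE, ACCURACY)
per_page = total / PAGES
print(f"about {per_page:.0f} errors per page, {total:,.0f} overall")
```

Under those assumptions the searchable text would contain roughly 20 wrong characters per page. That is usually tolerable for keyword search, but it does rule out an error-free text layer without proofreading.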
 
If you have an ADF, you can try a demo of ScanOut at http://www.autumna.com
It might do a job close to the one you need for a dime, and support
says that more automation features are on the way in the next release.
 
Well, I am not sure about the money we have for the project, but I
would assume it is significantly more limited than what it sounds like
you are talking about. It seems that there must be some solution behind
all these huge reports, documents, government hearings and the like that
are in .pdf all over the web. It is hard to believe that someone is
scanning them in one page at a time by hand.
 
Well, PDF documents don't really have to be scanned to be converted
and saved. For example, if a Word document is produced as a PDF file
and put on the web (or otherwise transferred to you), it's a fairly
simple matter to convert that file from PDF to DOC without ever
printing it to paper or having to scan it.

Several applications are available for doing this conversion,
including the latest versions of Word itself.

Note, however, that this is true of PDF files produced from actual
documents, not PDF files produced by scanning a piece of paper. The
two kinds of PDF are quite different. A PDF produced by scanning is a
graphical file, so to convert it to Word or another document type
you need to perform OCR on the scanned image, whereas a PDF produced
electronically from the original file is a close relative of an EPS
(Encapsulated PostScript) file, with the text stored as text.
Charlie Hoffpauir
http://freepages.genealogy.rootsweb.com/~charlieh/
 
No... they choose "Print to PDF" as the option.
 
I think perhaps I misstated what I meant. Currently, the federal
government is in the process of digitizing its entire (back to 1789)
document cache, including Congressional hearings, court cases and the
like. I was saying that I doubt that the people scanning in these old
documents are doing it page by page, and that there must be some way to
automate the process.
 
Ask somebody in the federal program. It is our government; most information
is online and free.

Google search:
http://www.google.com/search?hl=en&...izing+their+entire+(back+to+1789)&btnG=Search
http://library.stmarytx.edu/acadlib/doc/guides/congress/billlaws.htm

For older laws, GPO Access provides searchable access to the Statutes back
to 1995. The Library of Congress' American Memory provides free digital
access to Statutes for 1789-1875. The Law Library Microform Consortium is
also digitizing older Statutes and they are available through the Blume
Library's Online Catalog.

Online Catalog: Email somebody and ask.

http://regina.stmarytx.edu/
 
There is. I remember working in an office (an oil company, not government)
where they had a desktop unit with a paper feeder that scanned page after
page by itself. An operator spent all day just organizing the resulting
computer files, which were just graphical images; however, the software
built a searchable, indexed database from them. Unfortunately, I never got
involved in this area, so I wouldn't know what they used, but the year was
1990, and there must be much better solutions these days.
 
How large are the documents? Letter/legal, or could there also be a
need for B size (11x17)?
What time frame do you have for capturing all the documents?

I've got quite a bit of experience in document imaging and know of
several solutions that may fit your needs.

Based on the information in your post, I'd recommend
(initially) the Fujitsu ScanSnap fi-5110EOX2.

Take a look at www.FCPA.com

This scanner is ideal for environments that don't need TWAIN- or
ISIS-compliant imaging software or documents wider than 8.5" (a limit
of the scanner's dimensions). It has dual 600 dpi true optical
resolution, which allows for optimum image quality when needed, plus
true duplex capability, a USB 2 interface, and a small footprint.

The ScanSnap would be ideal for anyone... literally, I even got one
for my parents (no joke).

It's bundled with Adobe Acrobat 7.0 Standard, which is the first line of
Adobe products to come integrated with the ABBYY FineReader OCR engine
(regarded in the industry as one of the best available).

In fact, the scanner is proprietary and tied to Adobe Acrobat: no TWAIN,
no ISIS, just the "ScanSnap Manager".
Support is free even after the warranty ends.
Well, I could go on and on...

But if the documents are wider than 8.5", or you have only a short
amount of time to do this volume, you're going to need a different
solution.

Call me at work if you have any questions... 408-894-3682

Danny
 