Software for document archiving

martin_pentreath

Following on from my earlier message about a suitable scanner, does
anyone have any recommendations for software to manage simple
digitising and archiving of home documents (bills, bank statements,
correspondence, etc).

Ideally for each document I'd like to store an image together with an
OCR of the text embedded in the same file so that it can be located by
a keyword search. The OCR wouldn't have to be fantastically accurate.
It would also be useful if the software actually managed and indexed
the archive as well, rather than just dumping .pdf files onto your hard
drive and leaving you to sort them out.

This seems to me like a fairly obviously useful application which there
would be a market for and which someone should be producing, but
looking through the archives for this group I'm not sure there's
anything very good out there.
 
Check out PaperPort software.
http://www.nuance.com/paperport/

There is a trial version for download, click Try it Now.
 
In response to both posts:

1. You have to be careful with ADF scanners with things like bank
statements. They do get jammed....
2. The company I work for sells a pretty powerful product, Art-Copy
Enterprise.
http://www.scanhelp.com/288int/artcopy/enterpriseversion/detailedinfoenterprise.html

It does scan to a "hidden text PDF", with the image laid over the OCR'd
text. The OCR is OK, and we are still working on making it much better, but
it's a piece of software to consider. It can turn any basic scanner into
a pretty powerful one. It has support to turn an ADF scanner into a
duplex scanner. Plus the auto-naming features and such are pretty useful.
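
For anyone wondering how a single-sided ADF can produce duplex output: the post doesn't say how Art-Copy does it, but a common approach (assumed here) is to scan all the fronts, flip the stack, scan all the backs, and interleave the two passes. A minimal Python sketch of that reordering, with the function name and page labels made up for illustration:

```python
def interleave_duplex(fronts, backs):
    """Merge two single-sided scan passes into one double-sided page order.

    fronts: front sides in scan order (pages 1, 3, 5, ...).
    backs:  back sides scanned after flipping the whole stack, so they
            arrive last-sheet-first (..., 6, 4, 2). This ordering is an
            assumption about how the stack is re-fed.
    """
    if len(fronts) != len(backs):
        raise ValueError("front and back passes must have the same page count")
    merged = []
    # Reverse the back pass so each back lines up with its own front.
    for front, back in zip(fronts, reversed(backs)):
        merged.append(front)
        merged.append(back)
    return merged


pages = interleave_duplex(["p1", "p3", "p5"], ["p6", "p4", "p2"])
print(pages)  # ['p1', 'p2', 'p3', 'p4', 'p5', 'p6']
```

If the back pass is fed in the same order as the fronts instead, drop the `reversed()` call.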

ck
http://www.scanhelp.com
http://www.jetsoftdev.com
 
I have an HP7310 All-in-One on my Ethernet network behind a NAT router.
Its supplied software has a feature to create a PDF file which can be
opened and printed by Adobe Acrobat Reader. It also works well with the
included OCR software, but I prefer the PDF format as it requires less
work on my part. The one disadvantage is that the PDF file can be
slightly or quite a bit larger than the OCR version, depending on format,
graphics, etc. Perhaps similar offerings from other scanner manufacturers
have the same features.
 
Hi Martin,

You may already have the tools needed for the job you mentioned. If
you're creating PDF images, that means that you already have an
application capable of making PDFs (not the free Acrobat Reader).

If you have Adobe Acrobat Std or Pro, or if your scanner comes with
that application, then you'll have the ability to do exactly what you
state in your post.

In Acrobat 7.0 Std (while viewing an image-only document that has
already been captured):
1. Click on Document
2. Click on Recognize Text Using OCR | Start
3. If there are multiple pages, click the radio button for all pages
4. Click Edit

NOTE: the default setting is "Searchable Image (Exact)". This leaves the
appearance untouched but places the OCR data as hidden text behind the
image. The text is invisible, but allows for instant retrieval of
content within larger PDF files. For example, you could run a search on
a word and have all the matches in that file appear highlighted,
without changing the original image / appearance.
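
To show why hidden OCR text makes a whole archive searchable, and not just one file, here is a minimal Python sketch of a keyword index. It assumes the OCR text has already been pulled out of each PDF by some other tool; the file names and sample text below are invented, and this is not how Acrobat itself indexes anything.

```python
import re
from collections import defaultdict


def build_index(ocr_text_by_file):
    """Map each lowercase word to the set of files whose OCR text contains it."""
    index = defaultdict(set)
    for filename, text in ocr_text_by_file.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(filename)
    return index


def search(index, query):
    """Return the files containing every word in the query, sorted by name."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)


# Hypothetical OCR output; real text would come from the scanning software.
archive = {
    "2006-03-bank.pdf": "Statement of account, closing balance 1,204.55",
    "2006-03-gas.pdf": "Gas bill, account number 44871, amount due 62.10",
}
index = build_index(archive)
print(search(index, "account"))   # both files
print(search(index, "gas bill"))  # only the gas bill
```

Because matching is on whole words, a misread character in the OCR only hides that one word; the rest of the document stays findable, which is why the OCR doesn't need to be perfect for this use.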

If you want to change the content, or edit the PDF, select the
"Formatted Text & Graphics" setting instead. The ABBYY OCR engine
(Acrobat v7.0 Std & Pro) will OCR the image to the best of its
ability. As with human eyes, though, if ABBYY doesn't recognize a
character in the image, the results of the OCR may be incorrect. This
can be dangerous for very important documents, because the OCR output
replaces the original image, so any incorrect data becomes part of the
document along with the correct data.

Adobe Acrobat 5.0 and 6.0 also have the same capabilities, with slightly
different menu navigation.

Acrobat Reader is free to anyone, but can only view a PDF, not create
one. Adobe's in the business of making money, so if you want to CREATE a
PDF, royalties would have to be paid to Adobe. Applications such as
PaperPort pay royalties and then embed the pdfmaker plugin within
their application. But it's still the exact same engine, just with a
different skin for appearance...

hope this helps...

Danny
 
I'm not sure if you've had a chance to see the Fujitsu S500 yet, but
if not, it's exactly what you're looking for.

S500:
50-page ADF chute

2x 600dpi Optical CCD cameras for single-pass duplex scanning

18ppm/ 36ipm

auto cropping, content-based deskew

auto color detection

always scans to pdf (unless you have pictures or illustrations which
you want scanned to jpg)

auto file encryption (128-bit strength) with either a preset key you
choose or a prompt that pops up after the batch is completed,
allowing you to set the password

saves automatically with the naming scheme you decide on: auto prefix
and counter combination, date/time stamp, etc.

saves anywhere your windows pc has rights to save, including mapped
drives

auto blank page deletion

tiny footprint

and the list goes on...

check out:
http://www.fujitsu.com/us/services/computing/peripherals/scanners/workgroup/s500.html
for more details
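
As an illustration of the prefix, counter and date/time naming idea from the feature list above, here is a minimal Python sketch. The separator, counter width and timestamp layout are my own assumptions, not the actual format the Fujitsu software uses.

```python
from datetime import datetime


def scan_filename(prefix, counter, when=None, ext="pdf"):
    """Build a name like 'bills_20060314-091500_0007.pdf'.

    The layout (underscores, 4-digit counter, YYYYMMDD-HHMMSS stamp) is an
    illustrative choice, not the scanner software's real naming format.
    """
    when = when or datetime.now()
    stamp = when.strftime("%Y%m%d-%H%M%S")
    return f"{prefix}_{stamp}_{counter:04d}.{ext}"


print(scan_filename("bills", 7, datetime(2006, 3, 14, 9, 15, 0)))
# prints: bills_20060314-091500_0007.pdf
```

Putting the date first in year-month-day order means a plain alphabetical sort of the folder is also a chronological one.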

the best part is that all these features are automatic... you just
place the stack of documents, and press a button directly on the
scanner. This is the perfect scanner for daily scanning of up to 250-500
pages.

The general opinion of MFPs is that they're decent at doing multiple
tasks but don't tend to excel at any of them: mediocre scanning,
printing and copying. They're better suited to the person who might scan
once or twice a week than to someone scanning a couple of hundred or
more pages a day. Try it out and see for yourself...

Hope this helps...

Danny
 