optimal DPI for paperless office?

  • Thread starter: Matthew Mucklo

Matthew Mucklo

Hello,

Looking to buy a scanner to scan in old documents (hopefully one that
is in the range of 10PPM).

From people's experience/wisdom, is there a minimum DPI (and/or a
maximum) that the scanner should have? Is there a DPI that's optimal
for OCR? What size differences are we talking about?

--Matt
 
For OCR you probably won't need more than about 300 dpi. The printed
originals were not created at more than that. Some would suggest 600 dpi
just to be on the safe side. Just about any modern scanner can scan at a
higher resolution than this.
 

Yeah, 300 DPI should be fine. For English text, I don't think 600 DPI
would buy you much unless you have lots of text that's 8-point or
smaller. I've seen hundreds of thousands of pages scanned at 300 DPI
and OCR from them is pretty good provided that the original images
weren't splotched/wrinkled/coffee-stained. A 600 DPI image will be
roughly 4 times larger than an equivalent-sized 300 DPI image.
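
To put rough numbers on that "4 times" figure, here is a minimal sketch of the arithmetic. It assumes a US Letter page (8.5 x 11 inches) and uncompressed 1-bit black-and-white pixels; the function name and defaults are made up for illustration. Real files are compressed, so actual sizes on disk will be smaller and the ratio will not be exactly 4.

```python
def raw_scan_bytes(dpi, width_in=8.5, height_in=11.0, bits_per_pixel=1):
    """Uncompressed size in bytes of one scanned page.

    Assumes a US Letter page and 1-bit pixels by default; both are
    illustrative assumptions, not something a given scanner guarantees.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel / 8


for dpi in (300, 600):
    size_mb = raw_scan_bytes(dpi) / 1_000_000
    print(f"{dpi} DPI, 1-bit: {size_mb:.2f} MB uncompressed")

# Doubling the DPI doubles both width and height in pixels,
# so the pixel count (and the raw size) goes up by 2 * 2 = 4.
ratio = raw_scan_bytes(600) / raw_scan_bytes(300)
print(f"600 DPI / 300 DPI size ratio: {ratio:.1f}")
```

Passing bits_per_pixel=8 (grayscale) or 24 (color) scales both sizes by the same factor, so the ratio between the two resolutions stays the same.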

Empirical evidence suggests that a paperless office has about as much of
a chance as a paperless bathroom. I'm sure you've heard stories about
people who print out all the e-mail they receive. These stories are
*true*. Proofreading/copy-editing documents can be faster and easier
with pen and paper than it is on a computer, depending on the
change-control system you're using. HTH,
 
Dances With Crows said:
: Empirical evidence suggests that a paperless office has about as much of
: a chance as a paperless bathroom. I'm sure you've heard stories about
: people who print out all the e-mail they receive. These stories are
: *true*.

I think that there is one company that has the know-how to
take us very close to the ideal paperless office. I am talking
about Adobe and their PDF format, of course. The office forms,
especially, are just like the original paper forms, only better.

No other company is even close. Microsoft, of course, has tried
to kill PDF, but it seems that they gave up.

-Ramon
 
: I think that there is one company that has the know-how to
: take us very close to the ideal paperless office. I am talking
: about Adobe and their PDF format, of course.

The problem with "paperless office" is not the technology. It's the
people, as what I wrote earlier implies. Most people only want to learn
the absolute minimum necessary to do their jobs, and computer illiteracy
is usually excused/covered over. Also, it's easier/faster to scribble
intermediate results or rough out graphics with a whiteboard, or a pen
and scrap paper, than it is with a graphics tablet and Gimp.

PDF is a lousy format for many things. Acrobat (full) is full of
useless features and incapable of performing many basic editing tasks;
you need an add-on package like PitStop to do basic editing. PDF and
Acrobat are insanely complex; the Acrobat plugin development
documentation runs to about 2700 pages! PDF, like PostScript, is fine
for a document that doesn't need to be changed--but you'd better
*create* that document in LaTeX or OpenOffice or something.
: No other company is even close. Microsoft, of course, has tried to
: kill PDF, but it seems that they gave up.

I think that in the long run, open standards will prevail. PDF *is*
open, but it's not suitable for distributing editable documents.
 
: The problem with "paperless office" is not the technology. It's the
: people, as what I wrote earlier implies. Most people only want to learn

It's more than that. It's regulations as well. Many, if not all, legal
documents and engineering drawings that require a "wet signature"
also require that the signed original be kept. A digitized version is
often not considered legally valid.

That adds another layer of decision making as to what must be kept on
paper and what can be digitized and kept only as a digital document.

Things such as SOPs and other corporate procedures can often be kept
online, rather than existing as paper documents. OTOH, I was a
project manager for the implementation of a Laboratory Information
Management System that had to be "FDA validated". Every conceivable
input to the system had to be tested. Take the login as an example.
You started with the login screen with nothing entered (screen dump to
printer). Then: valid name and PW entered (screen dump), press <enter>,
wait for results (screen dump). Then valid name and PW where the PW has
one extra letter (screen dump), press <enter> (screen dump). Same thing
minus one character, same thing with an invalid PW, and then the whole
thing again with the name. The same had to be done for each and every
test and result entry.

The guide we started with was about an inch thick. When we finished we
had a document that was over 3 feet tall, and it all had to be hard
copy and signed by the proper individual(s).

The project charter was only two pages. There were three copies and
all had to be signed and kept as hard copies, and not by our choice.
I wrote the charter and had to make sure all the proper people signed
the thing. Yes, we kept digital copies of the thing, but specific
product tests on specific batches had to be kept in hard copy that
again had multiple signatures. Overall, even in an aggressively
computerized environment, we were still stuck with tons of paper per
year.

: the absolute minimum necessary to do their jobs, and computer illiteracy
: is usually excused/covered over. Also, it's easier/faster to scribble
: intermediate results or rough out graphics with a whiteboard, or a pen
: and scrap paper, than it is with a graphics tablet and Gimp.

Computer literacy was a condition of employment at that company. They
are a very large and modern company, but due to the nature of the
business, going paperless meant we still used truckloads of the stuff.
We did lots of imaging of documents, but many originals still
had to be kept.
: PDF is a lousy format for many things. Acrobat (full) is full of
: useless features and incapable of performing many basic editing tasks;
: you need an add-on package like PitStop to do basic editing. PDF and
: Acrobat are insanely complex; the Acrobat plugin development
: documentation runs to about 2700 pages! PDF, like PostScript, is fine
: for a document that doesn't need to be changed--but you'd better
: *create* that document in LaTeX or OpenOffice or something.
That is the idea. Many documents need to be in a form that cannot be
edited or changed.

Our computer and imaging centers were quite large and modern. They
were heavily used, but still there were those truck loads of paper.
My office alone probably used 30 to 40 cases of paper per year and
there were only 4 of us in there. Then on top of that the project had
many other departments involved. The project probably accounted for a
small truck load of paper and we were just a drop in the bucket
compared to the rest of the corporation.

Roger Halstead (K8RI & ARRL life member)
(N833R, S# CD-2 Worlds oldest Debonair)
www.rogerhalstead.com
 
[...]
: The problem with "paperless office" is not the technology. It's the
: people, as what I wrote earlier implies.

Hmmmmm.

: Most people only want to learn
: the absolute minimum necessary to do their jobs, and computer illiteracy
: is usually excused/covered over.

IMHO, if the right technology were available, people wouldn't have to
learn anything more than the minimum to become paperless.

: Also, it's easier/faster to scribble
: intermediate results or rough out graphics with a whiteboard, or a pen
: and scrap paper, than it is with a graphics tablet and Gimp.

That's a fault in the currently available technology.

IMHO, the new ScanSnap 5110 is a step in the right direction: paper to
PDF (duplex, color, 600dpi, 15ppm) with the push of a button for $300.
In the other direction, Brother makes a $300 21ppm duplex laser
printer. So now we have a low-cost two-way gateway between cyberspace
and the world of paper. Lower-cost, higher-resolution LCD panels will
make it possible to read most everything directly from cyberspace.
Tablet PC technology can make it possible to annotate intermediate
drafts or correct quizzes on-line as conveniently as on paper.

In other words, the technology is available but the cost (especially
the cost of the software) has to come down.

Tom Payne
 
: IMHO, if the right technology were available, people wouldn't have to
: learn anything more than the minimum to become paperless.

You're posting from an .edu address. If you're a student, you know how
dumb your fellow students can be. If you're a faculty member, you know
how dumb your students are. Most of the people you find in the real
world are dumber than that. Scary, eh? There are many stories posted
in ATSR and ASR where techs and sysadmins struggle with users who can't
use the mouse, users who don't grasp the idea of subdirectories, and
users who call the helldesk because they dragged their Microsloth Word
icon 2" to the right on their desktop and now they can't find it. I've
worked in tech support, and I know most of these stories are true.
: That's a fault in the currently available technology.

It's also a matter of economics. Big pad of paper + 10-pack of
ballpoint pens = $3. Whiteboard + markers = already there in most
offices, but say $300 or so. Tablet computer + networking gear + all
the software to run it = $2000 or more.
: IMHO, the new ScanSnap 5110 is a step in the right direction: paper to
: PDF (duplex, color, 600dpi, 15ppm) with the push of a button for $300.

The PDFs that are produced that way are almost certainly nothing but
TIFF or JPEG images with a PDF wrapper around them. They may *look* OK,
but the file sizes are huge and you can't grep them. You'd need to OCR
the images produced, which means more time and money. You'd also need
to store the images produced somewhere in a rational way, which means a
document management system (expensive) or a well-organized
filesystem (see above; users who don't understand subdirectories.)
: In the other direction, Brother makes a $300 21ppm duplex laser
: printer. So now we have a low-cost two-way gateway between cyberspace
: and the world of paper.

Huh? Good, fast laser printers have been around for over ten years.
This is hardly revolutionary. It's nice that this one's cheap though.
: Lower-cost, higher-resolution LCD panels will make it possible to read
: most everything directly from cyberspace.

? CRTs are cheap, can be made to run at high resolution, and there are
metric tons of them in every office and computer lab I've seen. People
still print out their e-mail. This is a cultural problem and a training
problem, not a technology problem.
: Tablet PC technology can make it possible to annotate intermediate
: drafts or correct quizzes on-line as conveniently as on paper.

Online tests can already be done with Apache+PHP/CGI-Perl/Java and a
test-taking suite of pages. (See freshmeat.net for some Free tools that
allow you to create and give online quizzes/tests.) This is also not
new. You can do the same thing with a bunch of desktop machines in a
computer lab, or let the students take the test from their home
machines, or set up an 802.11b access point and have the students all
bring in their laptops, or....

Adding tablet computers to this only makes things more expensive, except
for certain fields like math where it's bloody difficult to type an
integral sign or a big sigma on a keyboard. (Well, you could make the
Calc 101 kids learn basic LaTeX, but that'd probably break most of their
brains.)
: In other words, the technology is available but the cost (especially
: the cost of the software) has to come down.

Creating a "paperless office" is a complex, multifaceted problem. Tech
is only a small part of it. Users have to be trained to e-mail documents
they've created to other users, or post them on the company's web
server, or whatever instead of printing them out. Document formats need
to change--PDF is OK for non-editable documents, but lots of documents
need to be edited by multiple people. Markup languages won't work;
they're portable, open, and can be edited by any text editor, but they
scare people. MS Office formats won't work; they're proprietary and
undocumented, and Office costs money. OpenOffice has more promise since
its formats are fully documented open standards, but people don't like
OpenOffice since it isn't an exact MS Office clone yet (cultural
problems again!).

Not just that, but people outside the office have to be willing and able
to accept documents that aren't delivered on paper. Various government
agencies *love* paper, and people will have a hard time changing their
minds.

ISTR hearing a guy talking about "the paperless office" back in 1989.
It hasn't shown up yet. I'm not holding my breath. Later,
 
: On Wed, 21 Jul 2004 01:56:30 +0000 (UTC), (e-mail address removed) staggered into
[...]
:> IMHO, if the right technology were available, people wouldn't have to
:> learn anything more than the minimum to become paperless.
:
: You're posting from an .edu address. If you're a student, you know how
: dumb your fellow students can be. If you're a faculty member, you know
: how dumb your students are. Most of the people you find in the real
: world are dumber than that. Scary, eh? There are many stories posted
: in ATSR and ASR where techs and sysadmins struggle with users who can't
: use the mouse, users who don't grasp the idea of subdirectories, and
: users who call the helldesk because they dragged their Microsloth Word
: icon 2" to the right on their desktop and now they can't find it. I've
: worked in tech support, and I know most of these stories are true.

Hmmmmm. Modern students have no problem downloading pirated music,
conversing via hideous IM technology, or finding material to
plagiarize on the internet. They certainly could learn to use a
tablet PC to simulate a pad of paper, a PC with a keyboard to
simulate a typewriter, and the Windows file system to simulate a file
cabinet.

:>> Also, it's easier/faster to scribble intermediate results or rough
:>> out graphics with a whiteboard, or a pen and scrap paper, than it is
:>> with a graphics tablet and Gimp.
:> That's a fault in the currently available technology.
:
: It's also a matter of economics. Big pad of paper + 10-pack of
: ballpoint pens = $3. Whiteboard + markers = already there in most
: offices, but say $300 or so. Tablet computer + networking gear + all
: the software to run it = $2000 or more.

That's the current cost of technology. But that cost is coming down
at an inevitable and predictable rate.

:> IMHO, the new ScanSnap 5110 is a step in the right direction: paper to
:> PDF (duplex, color, 600dpi, 15ppm) with the push of a button for $300.
:
: The PDFs that are produced that way are almost certainly nothing but
: TIFF or JPEG images with a PDF wrapper around them. They may *look* OK,
: but the file sizes are huge and you can't grep them. You'd need to OCR
: the images produced, which means more time and money. You'd also need
: to store the images produced somewhere in a rational way, which means a
: document management system (expensive) or a well-organized
: filesystem (see above; users who don't understand subdirectories.)

Paper isn't OCR'd. To compete with paper, we only need to store
images.

:> In the other direction, Brother makes a $300 21ppm duplex laser
:> printer. So now we have a low-cost two-way gateway between cyberspace
:> and the world of paper.
:
: Huh? Good, fast laser printers have been around for over ten years.
: This is hardly revolutionary. It's nice that this one's cheap though.

Costs are an attribute of the technology, as you complained above, but
those costs are governed by Moore's Law and its various analogs and
corollaries.

:> Lower-cost, higher-resolution LCD panels will make it possible to read
:> most everything directly from cyberspace.
:
: ? CRTs are cheap, can be made to run at high resolution, and there are
: metric tons of them in every office and computer lab I've seen. People
: still print out their e-mail. This is a cultural problem and a training
: problem, not a technology problem.

CRTs lack portability, another serious limitation of current technology
but one that is rapidly improving.

:> Tablet PC technology can make it possible to annotate intermediate
:> drafts or correct quizzes on-line as conveniently as on paper.
:
: Online tests can already be done with Apache+PHP/CGI-Perl/Java and a
: test-taking suite of pages. (See freshmeat.net for some Free tools that
: allow you to create and give online quizzes/tests.) This is also not
: new. You can do the same thing with a bunch of desktop machines in a
: computer lab, or let the students take the test from their home
: machines, or set up an 802.11b access point and have the students all
: bring in their laptops, or....
:
: Adding tablet computers to this only makes things more expensive, except
: for certain fields like math where it's bloody difficult to type an
: integral sign or a big sigma on a keyboard. (Well, you could make the
: Calc 101 kids learn basic LaTeX, but that'd probably break most of their
: brains.)

I was responding to your comment regarding the ease of annotating
paper documents compared to that of annotating on-line documents. I
agree with your comment, but I'm convinced that improved technology
will eliminate that difference. Acrobat already provides mechanisms
for annotating PDF documents, but it's expensive and might not yet
allow freehand annotations on tablet PCs -- both are momentary
limitations of the current technology.

:> In other words, the technology is available but the cost (especially
:> the cost of the software) has to come down.
:
: Creating a "paperless office" is a complex, multifaceted problem.

Agreed.

: Tech is only a small part of it.

IMHO, it's 99% of the battle provided that the technology is done
right.

: Users have to be trained to e-mail documents
: they've created to other users, or post them on the company's web
: server, or whatever instead of printing them out.

Rather, they have to be motivated or incentivized. Nobody was born
knowing how to use paper. We had to learn penmanship and typing in
school. In fact, more people have learned on their own to use Word
than ever learned to type on a typewriter.

: Document formats need
: to change--PDF is OK for non-editable documents, but lots of documents
: need to be edited by multiple people.

PDF is no worse than paper in that regard.

: Markup languages won't work;
: they're portable, open, and can be edited by any text editor, but they
: scare people.

With good technology, nobody should have to learn markup languages any
more than they had to learn the chemistry of paper.

: MS Office formats won't work; they're proprietary and
: undocumented, and Office costs money.

Agreed.

: OpenOffice has more promise since
: its formats are fully documented open standards, but people don't like
: OpenOffice since it isn't an exact MS Office clone yet (cultural
: problems again!).

A minor technology lag. (Gnumeric is now so close to Excel that I'm
having no compatibility problems.)

: Not just that, but people outside the office have to be willing and able
: to accept documents that aren't delivered on paper.

So, give it to them on paper. They are the ones who have to put up with
the inconvenience of filing, storing, and retrieving it.

: Various government agencies *love* paper, and people will have a hard
: time changing their minds.

Government agencies have been in the forefront of electronic business,
e.g., the IRS.

: ISTR hearing a guy talking about "the paperless office" back in 1989.
: It hasn't shown up yet.

Da Vinci talked about flying machines in the 1400s and they didn't show
up for 500 years.

: I'm not holding my breath.

It's a good thing Da Vinci didn't.

Regards,
Tom Payne
 
<snip>

As for the knowledge level of users, I taught the intro to computer
science as a grad assistant. I don't think I need to say more on that
topic, except I found the general knowledge level of college students
to be scary.
: Creating a "paperless office" is a complex, multifaceted problem. Tech

Creating a "paperless office" is not possible. It doesn't matter what
you have for equipment, or even whether your employees are computer
literate. Regulations and document retention make it virtually
impossible except for a very small business, and in many cases not even
then.
: is only a small part of it. Users have to be trained to e-mail documents
: they've created to other users, or post them on the company's web
: server, or whatever instead of printing them out. Document formats need
: to change--PDF is OK for non-editable documents, but lots of documents
: need to be edited by multiple people. Markup languages won't work;
: they're portable, open, and can be edited by any text editor, but they
: scare people. MS Office formats won't work; they're proprietary and
: undocumented, and Office costs money. OpenOffice has more promise since
: its formats are fully documented open standards, but people don't like

Nowadays, if you have a business of any size you almost have to have
Office, and in its latest incarnation, as do those with whom you do
business. It may be full of holes and use lots of memory and storage,
but as the majority of businesses use it, most are stuck with it.
: OpenOffice since it isn't an exact MS Office clone yet (cultural
: problems again!).
:
: Not just that, but people outside the office have to be willing and able
: to accept documents that aren't delivered on paper. Various government
: agencies *love* paper, and people will have a hard time changing their
: minds.

I worked for a very large organization. They went through some legal
proceedings. They had to provide truck loads and it was many truck
loads of documents. They had to hire a relatively large staff just to
sort and organize those documents.

In many areas and particularly if you deal with the medical profession
and regulations you not only have to have electronic backups but paper
ones as well.
: ISTR hearing a guy talking about "the paperless office" back in 1989.
: It hasn't shown up yet. I'm not holding my breath. Later,

And as we tie ourselves up with regulations I don't think it's any
closer.

Roger Halstead (K8RI & ARRL life member)
(N833R, S# CD-2 Worlds oldest Debonair)
www.rogerhalstead.com
 
[...]
: Creating a "paperless office" is not possible. It doesn't matter what
: you have for equipment, or even computer literate employees.
: Regulations and document retention make it virtually impossible except
: for a very small business and in many cases not even then.
[...]
: Nowadays, if you have a business of any size you almost have to have
: Office and the latest incarnations as do those with whom you do
: business. It may be full of holes, uses lots of memory and storage,
: but as the majority of businesses use it, most are stuck with it.
[...]
: I worked for a very large organization. They went through some legal
: proceedings. They had to provide truck loads and it was many truck
: loads of documents. They had to hire a relatively large staff just to
: sort and organize those documents.
:
: In many areas and particularly if you deal with the medical profession
: and regulations you not only have to have electronic backups but paper
: ones as well.
[...]

Paper images of electronic backups are accepted in ever more arenas.
So, we seem to be moving, not to the paperless office, but to "paperless
storage of images". The enabling technologies are cheap-but-good:
- high-resolution duplex ADF scanners (e.g., Fujitsu's ScanSnap 5110EOX),
- OCR software,
- search-engine technology for local files.
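
As a rough illustration of how the last two items fit together, here is a minimal sketch of a word index over OCR output. It assumes the OCR step has already produced plain text for each scan; the file names, sample text, and function names are all hypothetical, and a real local search engine would do far more (ranking, stemming, incremental updates).

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents that contain every query word."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Hypothetical OCR output, keyed by the name of the scanned file.
ocr_text = {
    "scan_0001.pdf": "Project charter signed by all department heads",
    "scan_0002.pdf": "Batch test results for product validation",
    "scan_0003.pdf": "Validation guide for the project login screen",
}

index = build_index(ocr_text)
print(search(index, "project validation"))
```

The point of the sketch is only that once the scanned images have text attached, finding a page becomes a lookup rather than a trip to the filing cabinet.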

Tom Payne
 