Reading Text on Graphics


Guest

Friends:

I scanned a page of a document, so now I have this page in a Microsoft
Word file.
The contents of the Word file are not available as text, only as a single
graphic.

* Is there audio reading software out there that can still read the text
in the graphic, even though that text is visible but not available for
processing?



cheers
 
JoJo said:
* Is there audio reading software out there that can still read the text
in the graphic?

That is called OCR (optical character recognition) software.

http://en.wikipedia.org/wiki/Optical_character_recognition

Read the reviews for those products carefully. It doesn't
take a very high character-recognition error rate to make
you stop using the product. Personally, I won't be buying
any more of this kind of software.

Paul
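To compare OCR products on something firmer than reviews, one common measure is the character error rate (CER): the number of character insertions, deletions and substitutions needed to turn the OCR output into a correct transcription, divided by the length of that transcription. Below is a minimal pure-Python sketch, assuming you have typed a correct reference for a sample of the page yourself. The function names and the sample strings are made up for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Fewest single-character insertions, deletions and substitutions
    needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep a match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, ocr_output: str) -> float:
    """Edit distance from the reference to the OCR output, per reference character."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return levenshtein(reference, ocr_output) / len(reference)


if __name__ == "__main__":
    reference = "The quick brown fox"
    ocr_output = "Tne quick hrown f0x"  # made-up OCR result with 3 misread characters
    print(f"CER: {character_error_rate(reference, ocr_output):.1%}")  # CER: 15.8%
```

The running time grows with the product of the two text lengths, so score a sample paragraph or page rather than a whole book.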
 
LVTravel said this on 1/10/2009 9:58 PM:
Depending on the version of Office you have on your computer (you say you
have Word, so I assume it is part of an Office suite), you may already
have an optical character recognition program available to you. If it
is Office 2003, you have Microsoft Office Document Imaging, located under
All Programs, Microsoft Office, Microsoft Office Tools. That program will
convert a scanned image into readable text (with some degree of accuracy,
but it is not perfect). I also believe Office 2007 has a similar program,
but if so, it wasn't installed by default when I installed Office 2007.

LVTravel: I didn't get anything in 2007 either, but I did a custom
install myself. It makes me want to go back and look at the install CD.

JoJo: If you can rescan the page and IF you want to OCR it, we all
agree that the error rate depends on the quality of the scan. I find it
helps to raise the DPI of the scan a bit, play with contrast and
brightness to get sharper, clearer characters, and try black-and-white
vs. greyscale. But after an hour of scanning, maybe a page is easier to
type in manually. Of course, I type pretty fast (65 wpm), so I'd skip
all the hooey and just type it in. I did once scan a long, 35-page
memoir and got pretty good results: 10 errors per page. Not too bad.
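The contrast, brightness and black-and-white vs. greyscale tweaks described above come down to simple per-pixel arithmetic. Below is a minimal pure-Python sketch, assuming the scan has already been loaded as rows of 8-bit greyscale values (0 = black, 255 = white). `adjust` and `to_black_and_white` are hypothetical helpers, and real scanner drivers or image editors may use different formulas for their contrast and brightness sliders.

```python
def adjust(pixels, contrast=1.0, brightness=0):
    """Scale each pixel's distance from mid-grey (128) by `contrast`, add
    `brightness`, and clamp the result to the 0..255 range."""
    return [
        [max(0, min(255, round((p - 128) * contrast + 128 + brightness))) for p in row]
        for row in pixels
    ]


def to_black_and_white(pixels, threshold=128):
    """Map every pixel at or above `threshold` to white (255), the rest to black (0)."""
    return [[255 if p >= threshold else 0 for p in row] for row in pixels]


if __name__ == "__main__":
    # Made-up 2x4 patch of a faded scan: light-grey ink (~130) on light paper (~170).
    patch = [[130, 132, 170, 168],
             [134, 170, 172, 128]]
    print(to_black_and_white(patch))
    # [[255, 255, 255, 255], [255, 255, 255, 255]]  -> the ink vanishes
    print(to_black_and_white(adjust(patch, contrast=1.5, brightness=-20)))
    # [[0, 0, 255, 255], [0, 255, 255, 0]]  -> the ink stays black
```

On this faded patch, thresholding the raw values turns everything white and the ink disappears. Darkening and stretching first keeps the ink black, which is the same effect you get by hand with the scanner's or editor's controls.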

I've also loaded the picture (and that's really what it is, a picture;
it's nowhere near text) into Photoshop and removed major flaws like
hole punches, folds, etc. But that was mostly an experiment to see
what I could do. Then again, sometimes it's easier to change contrast
and clarity in Photoshop than in the scanner. (Photoshop happens to be
my editor of choice; use whatever you like.)
 
shawn said this on 1/13/2009 3:59 PM:
I use OCR software with varying degrees of success. You're right that it
depends on the quality of the original; mine is usually a fax, which has
many imperfections. I'm not sure whether it's quicker for me to scan my
documents in and look for errors or just start typing away, but I had
50 or 60 pages to convert once. It was ridiculous.
Yep, it's not all it's cracked up to be.

I did find a good, clean document the other day and, per the earlier
part of this thread, loaded the Office document scanner (OCR) program.
It worked great. Of course, the original was in almost perfect condition.

I did notice that the Office program retained less formatting (a good
thing, really) than the first OCR program I ran. The first program
brought in each line of text with a hard return at the end, so I had to
remove all the returns; the Office program just brought in the text.
On second thought, that might have been down to the settings; I'm not
sure. Oh well, onwards and upwards.
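Removing the hard returns an OCR program leaves at the end of every line is easy to automate. Below is a minimal Python sketch, assuming blank lines in the OCR output mark real paragraph breaks and a line ending in a hyphen is a word split across lines. `unwrap` is a hypothetical helper, not part of any Office or OCR product.

```python
def unwrap(text: str) -> str:
    """Join lines broken at hard returns, keeping blank lines as paragraph
    breaks and re-joining words hyphenated across a line break."""
    paragraphs = []
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:                  # blank line: the paragraph ends here
            if current:
                paragraphs.append(current)
                current = ""
        elif not current:             # first line of a new paragraph
            current = line
        elif current.endswith("-"):   # e.g. "para-" + "graph" -> "paragraph"
            current = current[:-1] + line
        else:                         # ordinary wrapped line: join with a space
            current += " " + line
    if current:
        paragraphs.append(current)
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    ocr_text = (
        "The first OCR program brought\n"
        "in lines of text with\n"
        "returns.\n"
        "\n"
        "A second para-\n"
        "graph here.\n"
    )
    print(unwrap(ocr_text))
```

The hyphen rule will wrongly join a genuine compound that happens to break at its hyphen ("well-" followed by "known" becomes "wellknown"), so give the result a quick read.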
 
I also use Cogniview's PDF2XL. We convert a huge list of customer names
every year; it's usually 140+ pages! We don't use all the information on
it, but I need to get it into a format I can edit so I can remove the
names we don't want to send to. It's a list for a fashion show, and the
company that provides the list does it in a crappy format on purpose, for
free. If you want it in a nice Excel format, they charge you $0.25 per
name.

The worst thing about the list is its format:

Company Address City State Zipcode Phone

all on one line, but then the buyer names on a separate line, along with
codes for what type of apparel they buy.

So converting it takes me forever, because the company is too cheap to
pay $0.25 per address, yet they'll pay me more to peck away for a few
days. I have to convert it into Excel and then edit it to get the buyer
names on the same line as the company name.

It would be easier if every record were the same length, i.e. two lines
long, but some are three lines long, so I can't just write some sort of
macro.
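Records of varying length can still be merged automatically if each record begins with a line you can recognize, instead of relying on a fixed number of lines per record. Below is a minimal Python sketch, assuming the list has already been extracted to plain text and that every company line ends in a 5-digit ZIP code followed by a phone number in 217-555-0100 form. That pattern, the sample lines and the helper names are all hypothetical, so adjust the regular expression to match the real list.

```python
import csv
import io
import re

# Hypothetical rule: a company line ends with a 5-digit ZIP code followed by a
# phone number such as 217-555-0100. Adjust this to match the real list.
COMPANY_LINE = re.compile(r"\b\d{5}\s+\d{3}-\d{3}-\d{4}\s*$")


def group_records(lines):
    """Start a new record at every company line and attach the lines that
    follow it (buyer names, apparel codes) to it, however many there are."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # A stray line before the first company line still gets its own record.
        if COMPANY_LINE.search(line) or not records:
            records.append([line])
        else:
            records[-1].append(line)
    return records


def to_csv(records):
    """One CSV row per record: the company line first, then its extra lines."""
    out = io.StringIO()
    writer = csv.writer(out)
    for record in records:
        writer.writerow(record)
    return out.getvalue()


if __name__ == "__main__":
    sample = [
        "Acme Apparel 12 Main St Springfield IL 62701 217-555-0100",
        "Jane Doe, John Smith",
        "W M",
        "Bright Threads 9 Elm Ave Dayton OH 45402 937-555-0199",
        "Mary Jones W",
    ]
    print(to_csv(group_records(sample)), end="")
```

Each record becomes one CSV row, which Excel can open directly. The pattern is the fragile part: a company line missing its phone number would be glued onto the previous record, so spot-check the rows it produces.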
 