Converting a JPEG file to text

  • Thread starter: Ajay

Ajay

Hello,

I am working on a project for which I need software that will take
in a .jpg file (basically a scanned sheet of paper filled with written
material) and separate out only the words in that document for me.
Do you guys know of any commercial or open source software that
does this?

I am sure there is stuff around. I do not want to reinvent the wheel.

Thanks
Ajay
 

Search for OCR - Optical Character Recognition software. It does
that.
 
Charles said:
Search for OCR - Optical Character Recognition software. It does
that.

On the open source front, let me suggest GOcr (or JOcr, which is actually
the same thing, it merely has two names... don't ask me).
It's not too good, but as far as I know, it's about the best you can get
open source. OCR is really a field where open source is currently a bit
lacking.


by LjL
(e-mail address removed)
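
For anyone wanting to try the GOcr route for the original question, here is a minimal sketch rather than a tested recipe. It assumes the gocr command-line tool is installed and on PATH, and that the image is in a format your gocr build can read (PNM is always supported; other formats rely on helper programs being present). The function names ocr_with_gocr and extract_words are made up for this example.

```python
import re
import shutil
import subprocess


def extract_words(text):
    """Return the alphabetic words found in OCR output, in order."""
    # Letters with optional internal apostrophes or hyphens; digits and
    # other punctuation act as separators.
    return re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)


def ocr_with_gocr(image_path):
    """Run gocr on an image file and return the recognized text.

    Assumption: gocr is installed and on PATH. It prints its result to
    standard output when no output file is given.
    """
    if shutil.which("gocr") is None:
        raise RuntimeError("gocr was not found on PATH")
    result = subprocess.run(
        ["gocr", "-i", image_path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Something like extract_words(ocr_with_gocr("page.pnm")) would then give the list of words. Expect misread characters to split or mangle some of them.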
 
Lorenzo J. Lucchini said:
On the open source front, let me suggest GOcr (or JOcr, which is actually
the same thing, it merely has two names... don't ask me).
It's not too good, but as far as I know, it's about the best you can get
open source. OCR is really a field where open source is currently a bit
lacking.


by LjL
(e-mail address removed)

SimpleOCR is a royalty-free application. Source code is available for a
fee.
http://www.simpleocr.com/
 
OCR (Optical Character Recognition) programs have been around for
years. They all work better on TIFF or other file formats that do not
introduce artifacts in the scanned image. To work well with JPEG, you
have to ensure that compression is kept low, to reduce the
artifacts.
Charlie Hoffpauir
http://freepages.genealogy.rootsweb.com/~charlieh/
 
Ajay said:
I am working on a project for which I need software that will take
in a .jpg file (basically a scanned sheet of paper filled with written
material) and separate out only the words in that document for me.
Do you guys know of any commercial or open source software that
does this?

If you have MS Office, there is a program under MS Office Tools called
Document Imaging. I think you usually need 300 dpi to get decent
results with OCR. And you don't want to use JPEG if possible because it
is lossy and introduces artifacts. Use TIFF or PNG or GIF.
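
To put a number on the 300 dpi rule of thumb, here is a small sketch that works out how many pixels a scan needs, and what resolution an existing image implies. The function names and the US Letter page size in the usage note are just assumptions for illustration.

```python
def pixels_needed(width_in, height_in, dpi=300):
    """Pixel dimensions required to scan a page at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def effective_dpi(width_px, width_in):
    """Resolution implied by an image width and the physical page width."""
    return width_px / width_in


def looks_sufficient(width_px, width_in, min_dpi=300):
    """True if the image meets the minimum resolution for the page width."""
    return effective_dpi(width_px, width_in) >= min_dpi
```

For an 8.5 x 11 inch page, pixels_needed(8.5, 11) comes to 2550 x 3300 pixels. A scan of the same page that is only 1275 pixels wide works out to 150 dpi, which is likely too coarse for decent OCR.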
 
Use any of the suggested OCR packages. Did one come with your scanner? Try
that.

Re JPEG and artifacts, test a few by converting to TIFF or BMP and then
OCR-ing. If that works, batch convert the JPEGs and then OCR them.
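
The batch conversion step could be scripted roughly like this. It is a sketch under assumptions: it relies on ImageMagick being installed (the command is "convert" in version 6 and "magick" in version 7), and the function names are made up here. Keep in mind that converting does not undo artifacts already baked into the JPEG; it only hands the OCR program a format it may handle better.

```python
import subprocess
from pathlib import Path


def plan_conversions(folder, target_ext=".tif"):
    """Pair each .jpg/.jpeg file in folder with a target path of the new type."""
    pairs = []
    for src in sorted(Path(folder).iterdir()):
        if src.is_file() and src.suffix.lower() in {".jpg", ".jpeg"}:
            pairs.append((src, src.with_suffix(target_ext)))
    return pairs


def convert_all(folder, tool="convert"):
    """Convert every JPEG in folder by calling an external converter.

    Assumption: `tool` is the ImageMagick command available on this
    machine and accepts "<tool> input output".
    """
    for src, dst in plan_conversions(folder):
        subprocess.run([tool, str(src), str(dst)], check=True)
```

Running plan_conversions on a couple of files first, as suggested above, is a cheap way to check the pairing before converting a whole folder.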
 