I'm now working to "clean up" the 182 page images from a recent scan
of a very rare and noteworthy public domain book. The cleaned-up scans
will be released to the public (such as given to the Internet Archive)
for free access. [For those interested, the book is the 1885 second
printing of the second edition of Sir Richard F. Burton's "Kama Sutra
of Vatsyayana".]
The scans were done at 600 dpi (optical) in 256-level greyscale (there's
no color in the book), to capture sufficient fine detail to aid in the
cleanup process. Of course, the book was chopped (the binding was
falling apart anyway) and each page scanned on a flat-bed, so there's
no page distortion caused by trying to scan a bound book. There are no
illustrations -- it's all black and white text.
I've already deskewed, cropped, centered and size-normalized all 182
pages. (For those interested, links to two sample partially-cleaned
pages are given below.)
In the cleanup process, I'd like to convert what I now have into
600-dpi *bitonal* (black and white) with uniform and nicely readable
character density, removal of "pepper", cleanup of larger blotches,
etc. I recognize there will be some handwork required, particularly to
remove larger "pepper" and blotches, and repair a few characters,
etc., but of course want to minimize handwork.
[Note that the cleanup is intended for direct human use of the
scans, and not solely for OCR, which doesn't require the
planned level of cleanup. For example, I plan to produce a DjVu
version for direct reading. For those who will probably ask, the raw
page scans have already been uploaded to Distributed Proofreaders for
conversion to structured digital text.]
Unfortunately, what complicates the clean-up process is that the
original book is in poor and variable condition. The paper is quite
yellowed and darkened, and many pages are quite faded. Were the
original in mint condition with good, uniform ink-to-paper contrast, I
wouldn't be posting this request for advice. But the overall poor
quality and page-to-page variation are taxing my graphics abilities to
produce a clean finished product with reasonably readable and uniform
character density (at 600-dpi bitonal).
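[For illustration only: one common starting point for coping with
page-to-page variation is to choose a separate black/white threshold
for each page, for instance with Otsu's method, which picks the grey
level that best separates the dark and light pixel populations. The
sketch below is a minimal pure-Python version. It assumes a page has
already been loaded as rows of 0-255 grey values; the function names
otsu_threshold and to_bitonal are hypothetical, and none of this is a
PSP9 feature. On pages that are faded unevenly, a single threshold per
page may well not be enough.]

```python
def otsu_threshold(pixels):
    """Return a grey level t (0-255); values <= t are treated as ink.

    `pixels` is a flat sequence of integer grey values in 0..255.
    """
    hist = [0] * 256
    for v in pixels:
        hist[v] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_dark = 0      # number of pixels at or below the candidate level
    sum_dark = 0    # sum of grey values of those pixels
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        if w_dark == 0:
            continue            # no dark pixels yet
        w_light = total - w_dark
        if w_light == 0:
            break               # everything is already "dark"
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        # Between-class variance (up to a constant factor).
        var_between = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def to_bitonal(gray_rows, threshold):
    """Map rows of grey values to rows of 1 (black) / 0 (white)."""
    return [[1 if v <= threshold else 0 for v in row] for row in gray_rows]


# Tiny made-up "pages": one with strong contrast, one faded.
good_page = [[200, 40, 200], [200, 40, 200]]
faded_page = [[220, 120, 220], [220, 120, 220]]
for page in (good_page, faded_page):
    flat = [v for row in page for v in row]
    t = otsu_threshold(flat)
    print(t, to_bitonal(page, t))
```

[Because the threshold is recomputed for every page, the faded sample
gets a higher cut-off than the good one, which is the point of the
exercise. Whether that yields uniform character density on the real
scans is untested.]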
Here are two sample pages, each about 4.5 megs in size (2550x3900
greyscale):
http://www.openreader.org/kamasutra/page031.png (good condition)
http://www.openreader.org/kamasutra/page106.png (poor condition)
I would assume that others have had similar needs and have come up
with various processing tricks and even built special tools to aid in
the clean-up process (e.g., how to auto-remove small "pepper", the
one-to-few-pixel-wide black spots on the white background?). I look
forward to your advice and even help if you are interested (I will
upload all the partially-cleaned images somewhere if you want to help
with the actual clean-up process -- the whole set of images totals 680
megs.)
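[For illustration only: the usual way to auto-remove small "pepper"
from a bitonal image is to find each connected group of black pixels
and erase the groups below some size. The sketch below is a minimal
pure-Python version of that idea. It assumes the page is already
bitonal and held as rows of 0/1 values with 1 meaning black; the
function name remove_pepper and the max_size parameter are
hypothetical, and the right size limit for 600-dpi text is an open
question, since too large a value would also eat periods and the dots
on i's.]

```python
from collections import deque


def remove_pepper(bitmap, max_size=4):
    """Return a copy of `bitmap` (rows of 0/1, 1 = black) in which every
    8-connected black component of at most `max_size` pixels is erased."""
    height = len(bitmap)
    width = len(bitmap[0]) if height else 0
    out = [row[:] for row in bitmap]
    seen = [[False] * width for _ in range(height)]

    for y0 in range(height):
        for x0 in range(width):
            if out[y0][x0] != 1 or seen[y0][x0]:
                continue
            # Breadth-first flood fill collects one whole component.
            component = [(y0, x0)]
            seen[y0][x0] = True
            queue = deque(component)
            while queue:
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and out[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                            component.append((ny, nx))
            # Small components are treated as pepper and turned white.
            if len(component) <= max_size:
                for y, x in component:
                    out[y][x] = 0
    return out


# Made-up example: a 3x3 "character" blob, a lone speck, a 2-pixel speck.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 1],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 1],
]
cleaned = remove_pepper(page, max_size=4)
for row in cleaned:
    print("".join("#" if v else "." for v in row))
```

[On a full 2550x3900 page a pure-Python loop like this would be slow;
it is meant to show the logic, not to be the tool for all 182 pages.
Larger blotches would still need handwork or a larger, riskier size
limit.]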
[As a final note, I use Paint Shop Pro 9, but do not have Photoshop.
But since PSP9 is fairly powerful, I assume that many, if not all,
recommended Photoshop processes will map over to PSP9.]
Thanks!
Jon Noring