Redact Phone Numbers on 60,000+ Scanned TIFs

  • Thread starter: Bob

Bob

I have been given the task of figuring out a way to redact (i.e.
black out) phone numbers on 60,000+ scanned documents.

I can do this one image at a time in most image editors, but I would
like to find a way to make the process more efficient (scripted). Most
of the numbers are in the same general position on the scans, but
because the docs were scanned over a period of 3 years by different
people, there are several "batches" at different resolutions, etc.

Anyway...

The 60,000+ TIFs are in 84 folders. I need to somehow:

1. Open the first TIF
2. Ask the user to select the area they would like to redact (i.e.
phone number)
3. Black-out the user selected area
4. Save with same filename (maybe overwrite), close and load the next
TIF

Ideally, #2 would be the only user-click.

Does anyone know how to script this and what program would work?

Thanks!

Bob
 
You can automate "actions" in Photoshop, and an action can prompt the
user to confirm changes.

Example: open the file and create a blacked-out selection in a certain
part of the image. Pause so the user can reposition it if needed. The
user then manually restarts the action, which saves and closes the
file. You can batch this to run on an entire folder. I'd suggest
sorting the "batches" into separate folders and creating a distinct
action for each. You don't need a new version of Photoshop to use
this. You should be able to find Photoshop 7 for a reasonable price.
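
Since the batches differ in resolution, one way to reuse a single
redaction area across them is to store it as fractions of the page and
convert it to pixels for each scan. Here is a minimal Python sketch of
that idea, assuming the number sits at the same relative position on
every page. FractionalBox, to_pixel_box and PHONE_AREA are hypothetical
names made up for illustration, not part of Photoshop or any library:

```python
import math
from typing import NamedTuple, Tuple


class FractionalBox(NamedTuple):
    """Redaction area as fractions (0.0 to 1.0) of page width and height."""
    left: float
    top: float
    right: float
    bottom: float


def to_pixel_box(box: FractionalBox, width: int, height: int) -> Tuple[int, int, int, int]:
    """Convert a fractional box to pixel coordinates for one scan size.

    Left/top round down and right/bottom round up, so the pixel box
    never ends up smaller than the fractional area it came from.
    """
    left = max(0, math.floor(box.left * width))
    top = max(0, math.floor(box.top * height))
    right = min(width, math.ceil(box.right * width))
    bottom = min(height, math.ceil(box.bottom * height))
    return left, top, right, bottom


# Hypothetical area; the real values would come from measuring a sample scan.
PHONE_AREA = FractionalBox(left=0.5, top=0.125, right=0.75, bottom=0.25)

print(to_pixel_box(PHONE_AREA, 1600, 2000))  # (800, 250, 1200, 500)
print(to_pixel_box(PHONE_AREA, 800, 1000))   # (400, 125, 600, 250)
```

The resulting pixel box could then be handed to whatever tool does the
actual fill. If the numbers drift between batches, a separate box per
batch folder would probably still be needed.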

60,000 documents is a lot, though. If all the files lined up
perfectly you could batch the whole thing, but manually clicking on
something 60,000 times does NOT sound like fun to me.

Good luck.
 
How about a shameless plug :-) The place I work for could write
something that would do this, but it would be somewhat costly
(compared to $700 software), though not so costly that doing it
manually would make more sense.

http://www.jetsoftdev.com
http://www.scanhelp.com

Thanks,
Clarence
 
Photoshop CS has something called a "droplet". It's an automated action
you create, and if you drag and drop a file onto it, the droplet will
run the action on the file and can save the file with a new name in a
new location. You could create a droplet that would mask off an area
large enough to hit all the numbers (it shouldn't matter if you have a
few digits showing at the beginning or end of the number), drag and
drop your 60,000 files onto the droplet and let it chug away overnight
or however long it takes.
 
If your typeface/copy is clean enough, there are many OCR (optical
character recognition) applications that would produce identifiable
telephone numbers in ASCII (with area code parens and hyphen), if you
don't mind occasional glitches in the rest of the text.

There are off-the-shelf applications that are designed to identify
telephone numbers in text, used regularly by Internet hackers, but I
don't know what they are.
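
Picking phone numbers out of OCR text can be sketched with a regular
expression. This is only a rough Python illustration, assuming North
American style numbers and reasonably clean OCR output. PHONE_RE and
find_phone_numbers are hypothetical names, and the pattern will miss
numbers where the OCR misread a digit or dropped a separator:

```python
import re

# Matches forms such as (555) 123-4567, 555-123-4567 or 555.123.4567.
# The lookarounds keep the pattern from matching inside longer digit runs.
PHONE_RE = re.compile(
    r"""
    (?<!\d)                          # not preceded by another digit
    (?:\(\d{3}\)\s*|\d{3}[-.\s])     # area code, with or without parens
    \d{3}[-.\s]\d{4}                 # exchange and line number
    (?!\d)                           # not followed by another digit
    """,
    re.VERBOSE,
)


def find_phone_numbers(text: str) -> list:
    """Return every substring of `text` that looks like a phone number."""
    return [match.group(0) for match in PHONE_RE.finditer(text)]


sample = "Call (555) 123-4567 or 555-987-6543. Invoice 1234567890."
print(find_phone_numbers(sample))  # ['(555) 123-4567', '555-987-6543']
```

Note that this only finds the text. To black out the number on the
image you would also need the position of each match on the page, which
depends on what the OCR application can report.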
 
By the way, Bob, with the Photoshop droplet, you can put a pause in the
action to allow user input, so you could have the droplet open each
file, resize the screen view or position the document so that the user
can then draw a box around the area to be redacted and then click to
resume the rest of the action and the file save. Then the droplet
should open the next file. If you're interested in this approach, you
might want to take it over to a Photoshop forum where you can get more
detailed information.
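
Outside Photoshop, the same open / select / black out / save / next
loop over all the folders could be sketched in Python roughly as
follows. This is only an outline under stated assumptions: select_box
and redact are hypothetical callbacks you would have to supply (for
example a small window where the user drags a rectangle, and an image
library call that fills it and saves), and the log file is an
illustrative way to resume after an interruption:

```python
from pathlib import Path
from typing import Callable, Iterator, Optional, Tuple

Box = Tuple[int, int, int, int]  # left, top, right, bottom in pixels


def iter_tifs(root: Path) -> Iterator[Path]:
    """Yield every .tif/.tiff file under root, in a stable sorted order."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in {".tif", ".tiff"}:
            yield path


def run_batch(
    root: Path,
    select_box: Callable[[Path], Optional[Box]],  # hypothetical: asks the user
    redact: Callable[[Path, Box], None],          # hypothetical: fills and saves
    done_log: Path,
) -> int:
    """Process each TIF once; return how many files were handled this run."""
    done = set(done_log.read_text().splitlines()) if done_log.exists() else set()
    handled = 0
    with done_log.open("a") as log:
        for path in iter_tifs(root):
            key = str(path)
            if key in done:
                continue  # already processed in an earlier session
            box = select_box(path)
            if box is not None:  # None means nothing to redact on this page
                redact(path, box)
            log.write(key + "\n")
            log.flush()  # keep the log current in case the run is interrupted
            handled += 1
    return handled
```

With 60,000 files the job will almost certainly span several sessions,
which is why the sketch records finished files. Working on a copy of
the folders rather than overwriting the originals would also be
prudent.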
 