Jon Noring
[This is a proposed project. If you are interested in being involved
as a founder/leader, let me know. Whether or not I put in the effort
to get this launched depends on whether I can assemble a core group of
movers/leaders.]
Google's recent announcement that it will scan millions of books over
the next couple of decades (in association with a few major libraries)
has drawn
both applause and concern. Applause that this is long overdue -- let's
get our printed heritage online. Concern that the scanned page images
from the public domain books will not be completely and freely
accessible.
Fortunately, the Internet Archive is planning a parallel project (also
in association with a few major libraries) to likewise scan a large
number of books, and is currently shaking down their system with a
test project in Canada -- this test project appears to be going
smoothly. The IA *will* make the public domain books (and those under
copyright where they secure the necessary rights) freely and
completely available to the world. Bravo!
However, both projects use a centralized approach (with paid/trained
staff to run the process), and both will likely use very expensive
($100,000) robotic page-turning scanners so as not to damage the book
bindings, and to get high throughput.
Thus, I'm considering an alternative approach (intended to run in
parallel with, and to augment, the Internet Archive project) in which
a volunteer network of both institutions and individuals is set up to
scan donated books using relatively inexpensive, high-throughput
sheet-feed scanners. I call this project "Distributed Scanners",
inspired by "Distributed Proofreaders", an innovative online interface
that helps volunteers proof older texts, and with which this project,
if launched, will associate.
We will rely on the fact that there are a lot of books out there whose
bindings are pretty much gone -- the books are falling apart -- so,
except for rare "collectors" books, there's no issue in removing the
broken bindings (maybe by chopping) and running the pages through
high-speed sheet feeding scanners. After each book is scanned, the
pages are then appropriately packaged/boxed, indexed and archived
(they will NOT be thrown away).
The books can be given as a charitable donation to the project (in the
U.S. through a 501(c)(3) organization -- IA has been asked to be the
umbrella for this project, and to archive the original books, but I
have not yet heard back from them), so private donors might be able
to claim a tax deduction. Used book stores, private book collectors,
public libraries, etc., probably all have, in aggregate, a lot of
such books and would be happy to donate them to the cause. (Some
people may go through the used books at the Salvation Army, Goodwill,
DI, etc., and for a dime find some great books to donate to the
cause.)
In parallel with Distributed Scanners, we could set up "Distributed
Catalogers": a network of volunteer librarians, with an online
interface that lets them enter the cataloging metadata associated
with each scanned book. They could also take the lead in copyright
clearance (discussed further in the note at the end of this message).
Of course, the final page scans will be donated to the Internet
Archive, plus to any other online archive out there willing to host
the scans (including Google.)
Note, too, that Distributed Scanners, over time, could diversify into
scanning other types of older documents, such as newspapers,
historical records, etc. The sky's the limit as to how we might
mobilize volunteers to digitize our printed heritage.
*****
So, with that introduction, I'm making three requests:
1) I'm looking for individuals reading this who see the potential and
would like to be involved, especially as co-founders and
co-movers/leaders of this project. If you are interested, contact me
by email.
2) I'm looking for someone in the Salt Lake City area of Utah, who
owns (or has free access to) a high-quality sheet feed scanner that
we can begin experimenting with, to understand the various
real-world issues associated with this project.
3) I'm also looking for a few older books (pre-1963) that can be
donated for demo'ing this project. (Don't send any now; just let me
know if you have such books that are literally falling apart.)
And of course, your thoughts, ideas and criticisms are welcome.
Jon Noring
[Note on copyright aspects: We will definitely need to go through a
process to determine the copyright status of each scanned book. We can
draw upon the expertise of Project Gutenberg to assist with this (they
are experts at it). Nearly all books published in the U.S. before 1923
are public domain -- and interestingly, a good number of the books
published between 1923 and 1963 are also public domain, since
copyright renewal was required for those years and many books were
never renewed (some say up to 90% of all the books published between
1923 and 1963 are public domain -- Distributed Proofreaders has been
working on the Copyright Office renewal files to assist with renewal
searching). Public domain books will become freely available; the
books whose status is either copyrighted or indeterminate will go
into a separate, non-public part of the archive, and efforts can be
made in the future to find and approach the copyright holders for
permission to make the scans of those books freely available under a
Creative Commons license.]
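
To make the triage in the note above concrete, here is a minimal
sketch in Python. It is only an illustration of the rule of thumb as
stated in this message, not legal advice and not an actual Project
Gutenberg or Distributed Proofreaders procedure. The names `Status`,
`triage` and `archive_section` are hypothetical, and the sketch assumes
that the year of first U.S. publication and the result of a renewal
search are already known for each book.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    PUBLIC_DOMAIN = "public domain"
    RESTRICTED = "copyrighted or indeterminate"


def triage(pub_year: Optional[int], renewed: Optional[bool] = None) -> Status:
    """Classify a U.S.-published book by the rule of thumb in the note.

    pub_year: year of first U.S. publication, or None if unknown.
    renewed:  True/False once a renewal search has given a clear answer,
              None if no search has been done yet.
    """
    if pub_year is None:
        # Unknown publication date: treat as indeterminate.
        return Status.RESTRICTED
    if pub_year < 1923:
        # The note says "nearly all" such books are public domain, so a
        # real process would still want a human to confirm edge cases.
        return Status.PUBLIC_DOMAIN
    if 1923 <= pub_year <= 1963 and renewed is False:
        # Renewal was required for these years and none was found.
        return Status.PUBLIC_DOMAIN
    # Renewed, not yet searched, or published after 1963.
    return Status.RESTRICTED


def archive_section(status: Status) -> str:
    """Pick the part of the archive the scans should go into."""
    return "public" if status is Status.PUBLIC_DOMAIN else "non-public"
```

Anything the sketch cannot place with confidence falls through to the
non-public part of the archive, which matches the cautious default
described in the note: a book only becomes freely available once its
public domain status is clear.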