Archiving every book ever published



It sounds like something out of a Borges story, but Brewster Kahle is trying to build an archive that includes a copy -- a print copy -- of every book ever published. Kahle is perhaps both the best and oddest person to take on the task.

Kahle founded the Internet Archive, which has been taking and storing snapshots of the entire Internet since 1996. It now has a digital library that includes video and audio files, live music and the Open Library, which is building a single Web page for every book ever published.


In other words, he’s kind of a digital guy.

But he’s working on a huge -- really huge -- analog project. The Associated Press caught up with Kahle in Richmond, Calif., where he has stacks and stacks of shipping containers stored in a warehouse.

‘There is always going to be a role for books,’ said Kahle as he perched on the edge of a shipping container. Each container can hold about 40,000 volumes, the size of a branch library. ‘We want to see books live forever.’

So far, Kahle has gathered about 500,000 books. He thinks the warehouse itself is large enough to hold about a million titles, each one given a barcode that identifies the cardboard box, pallet and shipping container in which it sits. That’s far fewer than the roughly 130 million different books Google engineers involved in that company’s book scanning project estimate to exist worldwide. But Kahle says the ease with which they’ve acquired the first half-million donated texts makes him optimistic about reaching what he sees as a realistic goal of 10 million, the equivalent of a major university library.

A technology pioneer, Kahle was a co-founder of the Web ranking system Alexa, which Amazon purchased in 1999. His hard-copy book collection, he hopes, will eventually also become a digital book collection. ‘The dedicated idea is to have the physical safety for these physical materials for the long haul and then have the digital versions accessible to the world,’ he told the Associated Press.

Kahle gave a TED Talk in 2007 about the Internet Archive and how he sees it working as a vast, freely accessible library.

As for his half-million printed books? They’re going to stay in storage, he explained, as an authoritative backup. He compares his warehouse full of books to the Svalbard Global Seed Vault -- even if it evokes images of the end of ‘Raiders of the Lost Ark.’

-- Carolyn Kellogg