Patt Morrison Asks: The Internet Archive’s Brewster Kahle
Brewster Kahle has the gleeful air of a man who has just found something wonderful and wants to tell his friends all about it. And his friends are the 2 billion people, and counting, who are on the Internet every day.
What he has found -- or more accurately, crafted -- are the means and the mechanisms to preserve the human record, the whole human record, in its many media, so other humans can get to it with a tap or a mouse click, on www.internetarchive.org and www.openlibrary.org.
For a geek who made his fortune in cutting-edge search engines, Kahle sure does love books and print. He taught his kids geometry out of a 19th century volume of Euclid and does hand-set letterpress printing in his basement.
Thanks to Kahle’s Wayback Machine -- a search engine named in homage to a cartoon on “The Rocky and Bullwinkle Show” -- you can follow the history of vanished Web pages. At the archive’s website, download a book that’s in the public domain or borrow one -- electronically -- that’s not.
Kahle’s home base is a onetime Christian Science church in San Francisco. Where the week’s hymn numbers were once posted, there are now two canonical tech-world numbers: the golden ratio, and pi. Everybody sing!
You love libraries, Web pages, pretty much all forms of information, but you worry about preserving it all.
What happens to libraries is that they burn. And they get burned by governments. The Library of Congress was burned once; it was burned by the British.
So let’s design for it. If the folks at the [ancient] Library of Alexandria had made a copy and put it in China or India, we would have the works of Aristotle, the other plays by Euripides.
Wouldn’t it be great if you could put all the published works online? The Internet Archive is trying to become useful as a modern-day digital library. We’re trying with [today’s] Library of Alexandria [among others].
The Alexandria, in Egypt, right?
Yes. They have this gorgeous building; you walk in, turn to your right and [there is] the running Internet Archive. They’re scanning their books for it. We have [such] agreements with five or six [libraries] around the world.
Let’s not have the Library of Alexandria, version two, burn this time. [Let’s be prepared for] when Iron Curtains go up or down, when governments say, “We’re not really interested in this library thing anymore.”
The Internet Archive started [by] collecting all the Web pages, a copy of every page from every website every two months. We collected, collected, collected. Then we made the Wayback Machine.
Then we started collecting television -- 20 channels worldwide since the year 2000, mostly news.
The book collection -- we’re digitizing 1,000 books every day [in] 29 scanning centers in six countries. There’s a room in the Library of Congress and they keep bringing us maybe 100 or 200 books a day [to scan].
We get a couple of million people a day to see these collections.
One of the ways your collection of current books differs from Google’s quest to record all books is that you’ve structured yours like a lending library -- people check out a book virtually and return it virtually.
We started by scanning public domain books and now have about 2 million available [to download] free. [We get books] from around 500 great libraries. The California state library participates. The libraries, or some foundation, [pay] to have them scanned. The national library of Spain [has] us collect all Spanish websites.
It costs 10 cents a page; about $30 a book. We can do it all in about one hour.
But we wanted to get more modern books [too], so we came up with the lending library system at openlibrary.org.
[He pulls up the site on his laptop and demonstrates.] “Mr. Popper’s Penguins” -- it probably has some rights issues, so I can take this book for two weeks. Anyone else who wants to borrow it, they’ll have to wait until I return it. OK, so now I’m going to return the book -- ta da!
How do writers get paid? And will all libraries be online affairs?
All this is in transition. We’re starting to see a few companies really suck the air out of the room [with] central points of control: Google, Apple, Amazon. Let’s find an [open] alternative.
We see libraries [online], buying ebooks as they buy books today: Buy them and lend them out. [Some] publishers are not selling ebooks to libraries, but if the $3 billion to $4 billion that libraries currently spend on publishers’ products [still goes] to publishers and authors, then there is a future for all concerned.
Slate called you an evangelical librarian. Do librarians like you?
Yes; we’re doing things they wish they could be doing.
You sound like a liberal arts major!
Nah, I just read all these books. My background really comes from geekdom and the idea of building a smart machine. If we’re going to build a smart machine, let’s have it read good books.
So when you went to the library as a kid, you thought, we can do better than this?
Oh yeah -- [the library] is all romantic, but it’s super-slow. Answering questions in a physical library with books -- that’s the sort of thing we expect to do like that [he snaps his fingers] on the Net now.
The problem is the Net doesn’t have [enough of] the good stuff yet. It’s shallow. The way most people are learning these days is through screens, so let’s make sure they have as good a [screen] library as [the kind] we grew up with.
My kids are 14 and 17; the books of the 20th century are not at the fingertips of my children, and the 20th century was pretty impactful. If we don’t [change] that, we’re going to end up with a generation that’s going to learn only [from] corporate stuff or Wikipedia.
I read you have 40 billion Web pages from 50 million websites. Do you lie awake at night and think, “There are millions more being created at this very moment. How do we catch up?”
Yes, absolutely. And the Web is changing. It’s more difficult to [keep up], but that’s our challenge.
With the early websites from the 1990s, a lot of [things] didn’t work out but at least we have copies of them. And next time, let’s go back and make sure our technology can support those dreams better.
What’s [the Web] going to become? I’m hoping [it] isn’t just the next glorified television.
You do ephemera like seed catalogs and political brochures too?
A lot of ephemera, old computer magazines, people love that stuff.
And you’ve got a “book ark.”
We don’t want to destroy the books we’re scanning. We love books! So we said let’s get good at storing books. Libraries spend a lot of money storing books. We [do it for about] one-tenth of what libraries spend.
We do it much more densely. We put them in boxes, then on pallets, then in modified shipping containers. We know where everything is. It’s not meant to be a circulating library. It’s collection-oriented.
If you’re wondering [if] “1984” by George Orwell has been changed [in a new edition], can we check the original? We’re a place to do that. It’s the original testimony of the artifact. Is this level of protection the ultimate? I don’t know. But it’s another shot at it.
Do you read on a Kindle?
No, I like books.
How close are you to getting it all digitized?
When we started, we were thought of as crazy; it was impossible. Or if you could do it, you wouldn’t want to. We don’t hear that anymore. People are saying, glad you’re there; I’ve used it; it’s helped me out. So in 15 years -- somewhat because of us doing it and showing it’s valuable -- we’ll use the Net as the library. By being a library, we’re able to remember and live a civic role that existed before the Internet.
This interview was edited and excerpted from a longer taped transcript. Interview archive: latimes.com/pattasks.