COLUMN ONE : Fragile Virtual Libraries : Digitizing books and papers opens up a trove of culture to anyone with a modem. But archivists worry about what happens when the power fails.

TIMES SCIENCE WRITER

The electronic archive at Washington University is a library without walls for books without pages--a wonder of the nether world called the Internet.

Operated from the school’s St. Louis campus, the archive may be the world’s largest public computerized information source--so indispensable that an average of 45,000 people reference it every day; so besieged with requests that another 60,000 people daily are turned away.

None of them actually has to set foot in St. Louis.

The archive, housed in a small computer on a folding table, is a fledgling example of what proponents call a virtual library, in which international computer networks and automated databases replace traditional book repositories. A modem serves in lieu of a library card.

The advent of the virtual library may be the most significant change in the nature of the public library in centuries, experts say.

With several hundred thousand files of text, software and images available instantaneously worldwide, the 60-gigabyte Washington University archive is a tentative step toward a time when electronic libraries will make books seem as archaic as clay tablets.

So the consternation was understandable one morning not so long ago when the archive vanished--vaporized when the computer’s memory failed.

In their digital bindings, the books of that virtual library were as vulnerable to a flipped bit or a power surge as monastic scrolls were to the barbarian’s torch.

For library specialists and some computer scientists, the fragility of the St. Louis archive, which has since been painstakingly restored, is a cautionary tale. Dozens of even more comprehensive electronic libraries are being planned.

Some experts worry that reliance on electronic archives may make humanity’s hard-won knowledge more vulnerable and expose it to unexpected risks of technological obsolescence.

Computer equipment becomes obsolete so quickly that it may be impossible for historians of the next generation to study today’s electronic records, documents or databases. As computers make it easier to store, catalogue and retrieve information, the information itself is becoming more fragile. Conventional type can withstand all but destruction of the page on which it is printed, but it only takes a stray magnetic field to kill an electronic file forever.

And when material of history and culture is electronic, what happens when the power fails? “We are very scared about the electronic media,” said Peter Hirter, coordinator of the electronic public access initiative at the National Archives, which is responsible for preserving valuable government records in perpetuity.

“The problems that are associated with long-term preservation of scanned, electronic material are immense,” he said.

After all, almost no one requires special equipment to read a book.

Computer scientists, however, are used to thinking of memory in terms of nanoseconds, not decades or centuries. “Very few people in the computer science world have really thought much about the problem of longevity,” said Jeff Rothenberg, an expert on computer storage and longevity at the RAND Corp.

The Lure of Digital Immortality

For researchers and archivists awash in hard-copy information, the immediate promise of electronic archives and libraries is a liberating one.

In the Washington area alone, the National Archives houses about 6 billion documents, 7 million pictures, 118,000 movie reels and 200,000 recordings. The 65-acre Library of Congress, the world’s largest library, houses more than 107 million items, ranging from the papers of 23 U.S. Presidents to one of only three existing perfect copies of the Gutenberg Bible. Panicked over how to preserve what they have, federal archivists are watching their collections grow by more than 5 million items a year.

Confronted with such perishable mountains of material, the Library of Congress, the National Archives and the University of California and other major universities are investigating ways to transform their collections into digital computerized records that can be distributed as widely as possible.

The aim is to drastically lower the cost of warehousing books, manuscripts, photographs, maps, motion pictures and sound recordings, while dramatically broadening public access around the world to even the most obscure historical collections. With computerized indexes, scholars might even discover everything that is stored on the hundreds of miles of shelves and file drawers in the National Archives.

“There is no area in our library or any other library that will remain untouched by digitization and computers,” said Suzanne Thorin, chief of staff at the Library of Congress.

This spring, the Library of Congress joined with 14 other major research libraries to begin a national digital library. As a start, they hope to have 5 million digital documents available to the public through the Internet and on CD-ROM by decade’s end.

The National Science Foundation, the Advanced Research Projects Agency and NASA together are spending almost $25 million at UC Santa Barbara, UC Berkeley and several other schools to construct and manage such mammoth interactive, electronic collections.

Law libraries and scientific publishers are hustling to get their reference works online. Art scholars are urging the creation of a 10-million-volume digital library. In France, the national library is committing the country’s literary heritage to disk.

Even the 500-year-old Vatican Library is going digital.

In the short run, research libraries are embracing the Internet and electronic storage as a way to make their collections more accessible to scholars and students. But some experts hope that electronic storage can solve the preservation problems that make all present-day archival efforts a losing race with time.

The best-tended paper will crumble. Film will dissolve. Recording tape will quickly lose its voice. Even museum-quality photos eventually will fade away.

Digital is forever--at least in theory, experts say.

It offers the ability to make perfect copies of any document, image, audio recording or film and it means that librarians can consolidate the different elements of their collections into one form, such as a CD-ROM disk.

The advantages of digital storage--even at the $2 to $6 it costs to scan a page of text into a computer--seem so compelling that some expect that one day it will replace microfilm as the major means of preserving texts and images.

But even if libraries go completely digital, others worry they will not escape all the problems that plague them now. Experts say digital archives will still last only as long as the physical material on which they are stored. Computer tape, floppy disks and hard drives last only a few years. Even sturdy CD-ROMs barely last a generation.

Writing in the American Archivist, Rothenberg and information storage expert Avra Michelson noted wryly, “The preferred media on which this digital information is stored--disk, tape and even CD-ROM--have far shorter shelf lives than acid-free paper or microfilm. Moreover these media tend to become unusable long before they reach their ultimate storage limits.

“It is only somewhat facetious to express this irony by saying that digital data lasts forever--or five years, whichever comes first,” they said.

A Computer Age Tower of Babel

While that may not be so different from the preservation problems archivists already face, digital storage adds a new, and unsettling, wrinkle.

A conventional text can be translated from stone tablet to vellum to handset hardcover to mass-produced paperback to microfiche and still be recognizable as written text--even when the language itself is unknown, as with the ancient Rosetta Stone.

But once a document is converted to silicon storage, its meaning is submerged in a so-called “bit-stream” of electronic digital zeros and ones. The resulting digital file is meaningful only to the software that created it.

The stored bits in a digital file could represent a letter of the alphabet, a pixel dot in an image, an audio signal or a number. There is no way to retrieve that content, or to be certain it even exists, except by reading the file with the proper software and computer hardware.

“It is not a document anymore,” said Rothenberg. “It is just a bunch of gibberish until you run the software that interprets it and puts the document up in front of you.”
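
The ambiguity Rothenberg describes is easy to reproduce. The sketch below, written in Python purely for illustration, reads one arbitrary four-byte sequence under four assumed conventions. The byte values and the function name `interpret` are hypothetical and do not come from any archive's files.

```python
import struct


def interpret(raw: bytes) -> dict:
    """Read the same four bytes under several assumed formats."""
    return {
        # Treat the bytes as one big-endian unsigned integer.
        "unsigned integer": int.from_bytes(raw, "big"),
        # Treat them as a big-endian 32-bit floating-point number.
        "real number": struct.unpack(">f", raw)[0],
        # Treat each byte as one Latin-1 text character.
        "text": raw.decode("latin-1"),
        # Treat each bit as one black-or-white pixel in an image row.
        "bitmap row": "".join(f"{byte:08b}" for byte in raw),
    }


# Hypothetical sample bytes, chosen only for illustration.
sample = bytes([0x3F, 0xA8, 0x00, 0x00])

for label, value in interpret(sample).items():
    print(f"{label}: {value!r}")
```

Read as a real number, these bytes give 1.3125; read as text, they begin with a question mark. Nothing in the bytes themselves says which reading is right. That knowledge lives only in the software.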

Even if the computer program that created the files is preserved, there may be no surviving compatible computer that can run it.

In just two decades, mainframe computers have been replaced by mini-computers, which have given way to networks of personal computers. Punch cards were overtaken by computer tape, which was in turn made obsolete by magnetic floppy disks. Floppies have been superseded by hard disks, which are giving way to optical storage and flash memory.

“Just what good will these records be down the line?” said Judy Moline, an expert on electronic information storage at the National Institute of Standards and Technology. “The machines will go away. The software will go away. We have to have a way to recover this material in years to come.”

Already the National Archives has to contend with an electronic Babel of nine-track magnetic tapes, computer tape cartridges, analog videodiscs and audio compact discs. Last year, the archives started accepting government documents recorded on optical CD-ROM computer disks.

The Census Bureau alone manages 40,000 to 50,000 computer tapes.

“The thing that is really troubling about a computer tape is that you can’t really tell something is wrong until you try to read it,” said Fynette Eaton, acting director of the Center for Electronic Records at the National Archives. She tends digital files that date to World War II.

For now, conservators at the Library of Congress and the Archives are taking a judicious approach toward storing their collections on computer. The materials they want most to save are some of their most irreplaceable items. So there are no plans to throw away a manuscript or a photograph once it has been copied electronically.

“I am just enjoying the fact we have lots of possibilities,” said Diane Kresh, director for preservation at the Library of Congress. “For years there was microfilming and nothing else. I don’t even like to use microfilm.”

The E-Mail Onslaught

Whatever they decide to do with their paper files, archivists have little choice about the increasing amount of information that originates inside a computer, whether as a word processing file, digital image, spreadsheet or database. There is little archivists can do with such files except to catalogue and store them.

At least 15 publishers already file their manuscripts directly to the Library of Congress by computer. It is only a matter of time--and copyright law--before all authors can submit electronic manuscripts and the library makes them generally available as a full text database.

While some archivists are most concerned about the fragility of records composed of magnetic data, others are worried about the way multimedia computers are transforming the nature of documents themselves, said Hirter at the National Archives.

Consider the archival conundrum of electronic mail, which has become an indispensable but intangible record of the public’s business.

If Abraham Lincoln had composed the Gettysburg Address on the back of an envelope, as a popular story goes, an astute aide could have saved the scrap of paper for posterity. But how does someone save a message such as e-mail that has no physical form?

Federal archivists decided in August that the millions of e-mail messages generated daily by federal employees must be preserved in the proper archival computer format or, better still, printed out.

They were spurred by a lawsuit over the White House e-mail generated by former National Security Council aide Oliver L. North and others involved in the Iran-Contra scandal. About 6,000 computer hard drives, computer tapes and other backup files were impounded from the White House and turned over to the Archives to be deciphered.

Not surprisingly, they were in an unreadable format.

“Because they are in a backup format, we can’t read them. It can only be read on the equipment that created it,” Eaton said.

So far, the effort to re-create the equipment needed to read the files is costing about $16 million. Just to make accurate copies of the files cost the archives $1.5 million. The effort took so much time that there now is a “huge backlog” of other electronic records that the center has not yet properly archived, Eaton said.

The Search for a Standard Format

Electronic mail is simple compared to multimedia texts and pages on the World Wide Web arm of the Internet, which can include graphics, audio and video.

Archivists today blanch at the prospect of preserving electronic memos embellished with spoken comments or images contained within the written text, as is increasingly common with multimedia documents. Spreadsheets pose similar problems.

To stock the shelves of new online “cybraries,” electronic publishers are creating documents beyond the dreams of any conventional bookbinder--colorful, animated tomes that incorporate into their text electronic links to hundreds of other electronic volumes. A touch on a highlighted word leads the reader down a chain of cross-referenced texts, footnotes, illustrations and additional documents that themselves are linked to an expanding web of multimedia images.

Preserving these interactive texts and the image-rich, multimedia home pages of the World Wide Web is simply out of the question--even though the government is increasingly in the business of creating them.

“We don’t accept image files,” Eaton said with an ill-concealed shudder. There are just too many incompatible formats, too many technical issues, and too many copyright problems, she said.

Federal archivists today try to sidestep the problem of changing technology and incompatible computer files by accepting only the simplest standardized text files.

To set the standard for a universal electronic library, archivists long ago embraced a simple, internationally accepted form of the digital alphabet called the ASCII standard. Now that rudimentary standard is being eroded by four or five more complex multimedia formats, which all allow information about a graphic in a document to be stored along with its text.

In the resulting confusion, archivists feel they are just one step ahead of national electronic amnesia.

So, to keep from being overtaken by decaying disks or evolving computer standards, the National Archives has committed itself to translating its electronic libraries into new standard formats as they are invented, to ensure that future readers can read records of the past.

Its oldest electronic files--surveys of returning war veterans--have been copied every 10 years since World War II, migrating in the process from the original punch cards to computer tape.

As computer documents become more complex, even electronic copying could become perilous.

“Unfortunately, copying without distorting or losing information is not as trivial as it sounds,” Rothenberg said. “Every time you translate a file [into a new format], it is going to introduce some distortion.”
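
A small, hypothetical case shows the kind of loss Rothenberg means. Assume a record containing accented letters is migrated to a format that allows only plain ASCII characters. The function name `migrate_to_ascii` and the sample text are invented for illustration, and Python's `errors="replace"` option is just one of several ways a converter might handle characters it cannot represent.

```python
def migrate_to_ascii(text: str) -> str:
    """Convert text to ASCII, substituting '?' for anything ASCII lacks."""
    return text.encode("ascii", errors="replace").decode("ascii")


# Hypothetical record; the accented letter has no ASCII equivalent.
original = "Bibliothèque nationale"

migrated = migrate_to_ascii(original)
print(migrated)

# The copy is the same length, but it no longer matches the original,
# and nothing in the copy records which letter the '?' replaced.
print(migrated == original)
```

The migrated copy reads "Biblioth?que nationale". Copying it again, however carefully, cannot bring the missing letter back.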

Imagine how the Iliad would read if the only existing text of the 2,400-year-old epic had survived the centuries by being translated into every intermediate language that existed between ancient Greek and modern American English. How much of the original poetic descriptions of the Trojan War would be lost in the process?

“You might have something that might be recognizable but it would not be what we think of as the Iliad,” Rothenberg said. “And it certainly would not retain the one thing that makes it worth preserving, which is the spark and the style and the flavor of whoever actually created it.

“You would lose exactly what you were really interested in as literature. You might never know that its poetical quality existed in the first place.”

(BEGIN TEXT OF INFOBOX / INFOGRAPHIC)

Building the Virtual Library

The Library of Congress, the National Archives, university research libraries and museums are creating a foothold in cyberspace. To reduce costs, improve preservation and increase public access, they are turning their collections into digital files that can be shared easily on computers worldwide. But archivists worry about how long these records can survive.

IT JUST FADES AWAY

Digital data is only as permanent as the material it is stored on. Computer tapes and storage disks don’t last as long as books and microfilm. Stray magnetic fields can erase them easily. Computer technology changes so quickly that the tapes or disks usually become obsolete long before they physically wear out.

*--*

Medium           Time until obsolete    Physical lifetime
Magnetic tape    5 years                1 year
Videotape        5 years                12 years
Magnetic disk    5 years                5-10 years
Optical disk     10 years               30 years

*--*

****

RISKING ELECTRONIC AMNESIA

Unlike regular written records, computer files store their meaning in a binary electronic code that often can be read only by the computer program that created it. Without the right equipment, a reader has no way of knowing whether the code represents a letter, a number, part of an image, an audio note or a computer instruction.

This character could stand for a number, note or computer instruction.

INTEGER: 21

REAL NUMBER: 1.3125

CHARACTER: U

Source: The National Archives, Scientific American, Library of Congress
