Computer Firms Aim for a Bug-Free Future

From Associated Press

It is common to all computers, regardless of make, operating system or age. It lurks everywhere from the humble home PC to spacecraft that navigate the Martian atmosphere.

It’s the bug, glitch or bomb--the system crash that brings down the whole works, frustrating users, raising costs, reinforcing doubts about the reliability of technology in the 21st century.

After years of working alone on the problem, high-powered minds from academia, government and the private sector are putting aside competitive concerns and banding together to work on creating bulletproof computer systems.

“We’re willing to put a lot of information on the table and share it on the theory that high tide raises all ships,” said Richard DeMillo, Hewlett-Packard Co.’s chief technology officer. “If we can bring up the level of dependability in the industry, then we all win.”

The High Dependability Computing Consortium, announced in December, held its first workshop this month to chart a course for stomping out bugs that affect everything from air traffic control to office networks.

Initially funded with a $500,000 NASA grant and led by Carnegie Mellon University, the group includes industry heavyweights such as HP, IBM Corp. and Microsoft Corp. After 2 1/2 days of discussions, participants agreed that reliable computing must be a priority.

“High dependability is something that cuts across a lot of what companies do,” said David Garlan, a computer science professor at Carnegie Mellon. “It’s not their main line of business, but they need it. In some sense, sharing that is less threatening than sharing some proprietary feature.”

The dollar cost of computer crashes and downtime is staggering, and companies aren’t eager to reveal the scope of the problem.

“Most view reliability issues as dirty laundry,” said Dale Way, who led Y2K research for the Institute of Electrical and Electronics Engineers. “A lot of organizations have methods of reacting when things fail and keeping it in-house . . . It’s serious and expensive, but it’s buried inside the cost structure of organizations.”

In June 1999, a 22-hour outage on the Internet auction site eBay cost the company $3.9 million.

Sometimes, more than dollars are at stake.

In December, officials at San Francisco International Airport halted installation of a new ground radar system after tests showed it was tracking phantom airplanes.

And national pride took a hit in 1999 when human and software glitches doomed both of NASA’s Mars spacecraft--just as they were about to begin their missions. One cost $125 million, the other $165 million.

“Both people in government and industry know we could do much better,” said Microsoft researcher Jim Gray, who leads the company’s San Francisco lab.

The biggest problem is that high quality is not always designed into software from the start.

Garlan believes software developers should work more like engineers--who incorporate the lessons of failures into future designs.

“All the things that you would find in a mechanical system would apply to software systems,” DeMillo said. “The appeal here is to build that set of engineering principles and build that culture of learning from failure and making sure it doesn’t happen again.”

Beyond learning from spilled milk, computer system developers should try to understand the environment in which their programs operate--and how humans can thwart the software, DeMillo said.

Consortium partners also will develop ways to simulate how software works before it is deployed, much as airplanes now are tested by computers before they fly. New techniques also are being explored that will give computers the ability to heal themselves.

Sometimes, it’s better to rewrite code that has been recycled over the years, Garlan said.

Next-generation operating systems from both Apple and Microsoft, scheduled for release this year, abandon old code in favor of programming that has already been proven in the business world.

On the Net:

High Dependability Computing Consortium: https://www.hdcc.cs.cmu.edu

21st Century Project: https://www.utexas.edu/lbj/21cp/
