Massive glitches that impaired phone systems in Los Angeles and elsewhere in recent weeks have been fixed, telecommunications engineers said Friday. But analysts said there could be additional and greater phone hang-ups ahead.
After working feverishly for the past week, engineers from several phone firms said they finally identified and fixed software problems that caused phone networks in four major metropolitan areas nationwide to overload half a dozen times in the past month.
In California, Pacific Bell representatives said new software was installed early Friday "that will end the series of network impairments" that hit Los Angeles and San Francisco in June and July. Separate repairs were also installed in the Bell Atlantic network that failed twice earlier this week, frustrating callers in Washington and Pittsburgh for hours.
But despite the apparent end to the latest rash of telephone outages--the most wide-ranging and persistent in recent memory--experts said the worst may be yet to come as the nation becomes increasingly dependent on SS7, a complex and sophisticated telecommunications traffic routing system that has been beset with repeated electronic flaws.
The real problem, analysts said, is that there is no certainty that the root cause of the problem has been uncovered and fixed. Further, they worried that no assurances can be given that similar--or even far greater--collapses won't occur as telephone companies continue to install the new SS7 system and link their individual networks together through it.
"The new SS7 network is vulnerable to problems that could bring down the entire network of a phone company--and even spread to other phone companies," said Berge Ayvazian, a telecommunications analyst with the Yankee Group in Boston. "This is a major concern. These problems aren't easily isolated and they spread."
Ayvazian said phone companies must realize the importance of installing adequate backup systems even as they continue to outfit their networks with the latest technology.
"The regional phone companies have to develop more rigorous fallback solutions," he argued. "They should be able to patch these problems in 20 minutes, not six to eight hours. . . . We are particularly vulnerable now. We are operating without a safety net."
Meanwhile, the Federal Communications Commission on Friday detailed its plans to conduct a closed-door hearing Tuesday on its own internal investigation of the latest wave of phone glitches.
James Spurlock, assistant director of the FCC's common carrier division, said the agency is concerned about the wide-ranging effects from the system's installation and wants to ensure that the phone companies are taking every possible precaution as they shift to SS7.
"Maintaining a dial tone for the nation's phone users is a very important priority for the commission," Spurlock said. "We can't allow problems of this magnitude to persist."
Engineers and investigators said their worst fears--that the phone systems had been the victim of sabotage or a computer virus--have been all but completely ruled out.
Instead, the problems have been traced to glitches in the signal transfer points manufactured by DSC Communications Corp. in Plano, Tex. The STPs, a part of the SS7 network being installed by Bell Atlantic and Pacific Bell, electronically route traffic along a phone company's network using a series of special computers and software.
DSC said Friday that the latest problems were caused when an individual STP started sending distress signals to the rest of the network. Instead of routing calls around the trouble spot, the network became inundated by the distress signals and was unable to handle its regular traffic. This caused the congestion that prevented callers from using the phone system in California, Pittsburgh and Washington.
DSC said its engineers fixed the problem by creating a set of computer instructions that will act as a circuit breaker, preventing distress signals from an STP from congesting the entire network.
However, the company said, engineers from its labs and other phone firms will continue to explore the root cause of the problem.
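For readers curious how a software "circuit breaker" of the kind DSC described might work, the following is a minimal sketch in Python, not DSC's actual fix, whose details were not disclosed. It assumes a simple rule: forward only a few distress signals from any one node within a time window and drop the rest, so that a single failing STP cannot flood the network. The class name, the limit of three signals and the 10-second window are all hypothetical.

```python
from collections import deque


class DistressBreaker:
    """Hypothetical per-source limiter (an illustration, not DSC's code).

    Forwards at most `limit` distress signals from one node per `window`
    seconds, then drops further signals until older ones age out.
    """

    def __init__(self, limit=3, window=10.0):
        self.limit = limit
        self.window = window
        self.recent = {}  # source id -> timestamps of forwarded signals

    def allow(self, source, now):
        times = self.recent.setdefault(source, deque())
        # Forget forwarded signals that have aged out of the window.
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.limit:
            return False  # breaker tripped: drop rather than flood the network
        times.append(now)
        return True


def forwarded_count(breaker, signals):
    """Count how many (source, timestamp) signals the breaker lets through."""
    return sum(1 for source, now in signals if breaker.allow(source, now))
```

In this sketch a node that sends ten distress signals in ten seconds would have only the first three forwarded, while signals from other nodes would still pass, leaving capacity for regular call traffic.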