[LINK] The Undersea Network

Stephen Loosley stephenloosley at zoho.com
Sun Mar 29 22:36:40 AEDT 2015


Excerpt: "The Undersea Network" by Nicole Starosielski. Duke University Press.
 
Gateway: From Cable Colony to Network Operations Center

Entering the network operations center of a globe-spanning undersea cable system, I find what you might expect: a room dominated by computer screens, endless information feeds of network activity, and men carefully monitoring the links that carry Internet traffic in and out of the country. 

At first glance, it seems to be a place of mere supervision, where the humans sit around and watch machines do the work of international connection, waiting only for a moment of crisis, such as when a local fishing boat drops an anchor on the cable or a tsunami sweeps the system down into a trench.
 
This vision of autonomous networks is shaped more by Hollywood cinema than by actual cable operations. In reality, our global cable network is always in a sort of crisis and, at the same time, highly dependent on humans to power the steady flow of information transmissions.
 
It would perhaps be more precise to say that cables are always in a state of “alarm.” 

An “alarm,” in network-speak, is anything from an indication that the cable has been severed to a reminder about a needed computer update. Undersea systems are not so different from our personal computers. They need regular updates and upgrades. They are susceptible to bugs and environmental fluctuations. Sometimes things just don’t work as planned. 

The men in a network operations center work daily to resolve a continually updated batch of alarms, which at this particular location number around 120–150 per week. The vast majority of these are only warning alarms, which notify them of some approaching threshold, a problem with a backup system, or a source of potential interference. Even if our signals continue to pass through cable systems without delay, the undersea network never quite functions perfectly on its own, that is, without alarm and without human assistance.
 
System errors can be produced by even the smallest events. The stations where undersea links terminate house immense cooling systems, and with all of the air conditioners blowing dust around, regular cleaning is required. Yet even when companies employ specialized cleaning crews, there is often an increased number of alarms during the process. By contrast, during Christmas the number drops dramatically. An operations manager explains what might seem obvious: “when you haven’t got people touching stuff it tends not to break.” 

The inside of his station testifies to the danger of human hands. The primary fibers running in from the sea are labeled with bright tape reading “Danger Optical Fiber,” to warn anyone who enters the station not to touch them. During Super Bowl weekend, another company planned not to have any activity in their station at all, just to ensure that nothing went wrong. The circulation of human bodies, necessary for network operation, inevitably bumps, jostles, and sets equipment into an alarm state.
 
Alarms can also be generated by the machines themselves. Although network equipment is supposed to be identical and thus predictable, in reality each device displays remarkably individual behavior and can produce errors without anyone even coming into contact with it. 

One manager gripes to me that their station just hadn’t gotten the right piece of transmission equipment, and once it had started to have bugs, it required repeated maintenance for most of its life—a kind of problem child. Another cable engineer explains that each machine has been manufactured using different batches of raw materials and assembled at different times. Two circuit packs might be technically identical but might function differently over the course of their lifetime, in part because different computers contain materially different components. The glass or the solder wire may have been of a different quality or come from a different origin. This can result in “batch faults” which occur in a series of equipment manufactured at the same time. 

The engineer uses an analogy to explain the process: “It’s a bit like making a fruit cake. I can make a fruit cake on Monday and I can make one on Wednesday, but they can be different even if I followed the same recipe. In the one on Monday I might have used 198 grams of sugar and the one on Wednesday I might have had 202 grams of sugar. Very, very minor differences could have an unknown impact sometime in the future.”
 
The men at this network operations center are tasked with reading the incessant feed of alarms, determining what needs to be fixed, and conducting the necessary maintenance, all without a drop in signal transmission. 

One technician lets me follow him to a cable station on a routine follow-up to a warning alarm. He explains that there is not a one-to-one correspondence between each alarm and an actual problem with the system. Rather, an alarm is a symptom that something is wrong—an indication of a failed connection. It could be compared to a fever or a rash on the human body: a manifestation of a problem, but not an indication of cause. A full cable break might generate many alarms. In turn, multiple problems might contribute to a single alarm.
 
As a result, there is a significant amount of human interpretation required to deduce the origin of a problem from an array of alarms. Cable engineers might be thought of as the doctors of the global cable network. Pointing to one rack, which has a light on, this technician says, “See… that machine is in a state of alarm.” He plugs in his computer to figure out what is wrong, but it remains unclear. He then turns to a rack from which several cords extend, plugging into another machine. He looks at the loose cords. “I think that this one here,” he says, picking up a cord, “is supposed to be in here”—he points to a jack— “but I’m not sure.” He’s not ready to risk it. This alarm is only for a backup machine, so it can wait. We leave the station, still not quite sure what the cause is, and head back to the network operations center to consult with the other technicians.
 
While in some ways the computers that support global networks are not so different from our personal laptops, the stakes are dramatically higher for this kind of maintenance work. The technicians aim to make every backup system, and backup-for-the-backup system, run perfectly. 

Much of the equipment is designed to function for 25 years, the expected life of an undersea cable, including the repeaters that sit on the bottom of the seafloor. These are some of the most durable computers out there. And yet some parts will develop bugs, and others won’t. Technicians keep detailed records on individual pieces of equipment so they know what each part’s history is. Tracking “what each one’s been through” is critical to maintaining a reliable network.
 
Even the smallest discordances in the network need to be addressed. One cable worker describes a problem he had with a piece of equipment that was displaying an alarm state when he looked at it in the landing station, but the alarm was not detected back at the network operations center. As a result, he could not determine where the bug was: in the piece of equipment or in the computers at the center. Even though it was at great cost, the engineer decided to send the equipment out to have its code rewritten, just in case.

Even though the alarms are constant, because of this thorough labor, actual failures are few and far between.
 
Operating undersea networks requires this kind of careful interpretive work and a detailed knowledge of the history of cable equipment, skills that cannot be outsourced to computers. Although we might think of digital networks as purely technical, engineers and technicians are the human components in a system carrying 99 percent of transoceanic Internet traffic. If these workers were to disappear, the system would ultimately collapse. We owe the smooth operation of global communications in part to their ability to act quickly and minimize disruptions.
 
The level of secrecy of this job, the specialized nature of cabling, and the small number of systems, however, have kept this a fairly insular group of men. Many have been in the cable industry for decades. Even with all of this experience, though, no single person has an understanding of the entire network. 

In the station that I visited, new servers and stacks have been added, and the technician I interviewed was not familiar with the history of every single one. As a result, engineers depend heavily on each other to solve problems: they must know who to call for what information and how to coordinate system fixes across platforms. 

The insularity of the cable community supports this interpretive work.
 
When I ask operators about the vulnerabilities of today’s undersea network, many express concerns about downsizing and retirements. They fear that carefully sustained industry knowledge will be lost and that there will be nobody to take their place who will adhere to the same standards of reliability. Recruiting the next generation of workers is difficult. There is no direct path into the industry, and it remains largely invisible to the public. 

One engineer describes the situation: “Nobody goes to school and says I want to be in the undersea cable business.” 

In many ways, the operation of the undersea cable system stands in opposition to everyday tech culture: it is built on an ethos of durability rather than disposability. Many ask: who will ensure the continuity of the cable networks if their industry starts down a path toward quicker turnover, devalued labor, or planned obsolescence? Who will ensure that the bodies maintaining our undersea networks are as reliable as the cable technology?

http://www.scientificamerican.com/article/undersea-cable-network-operates-in-a-state-of-alarm-excerpt/

Cheers,
Stephen





