[LINK] The Optus Sub to the Parltry Ctee

Roger Clarke Roger.Clarke at xamax.com.au
Fri Nov 17 08:07:38 AEDT 2023


The Optus statement:
https://www.aph.gov.au/DocumentStore.ashx?id=2ed95079-023d-49d5-87fd-d9029740629b&subId=750333

p.5:

19.   It is now understood that the outage occurred due to approximately 
90 PE routers automatically self-isolating in order to protect 
themselves from an overload of IP routing information. These 
self-protection limits are default settings provided by the relevant
global equipment vendor (Cisco).
20.   This unexpected overload of IP routing information occurred after 
a software upgrade at one of the Singtel internet exchanges (known as 
STiX) in North America, one of Optus’ international networks. During the 
upgrade, the Optus network received changes in routing information from 
an alternate Singtel peering router. These routing changes were 
propagated through multiple layers of our IP Core network. As a result, 
at around 4:05am (AEDT), the pre-set safety limits on a significant 
number of Optus network routers were exceeded. Although the software 
upgrade resulted in the change in routing information, it was not the 
cause of the incident.
21.   Restoration required a large-scale effort across more than 100 
devices in 14 sites nationwide to facilitate the recovery (site by 
site). This recovery was performed remotely and also required physical 
access to several sites.

___________

Optus outage blamed on edge router default settings
Fullest technical explanation so far is provided.
Ry Crozier
itNews
Nov 16 2023 11:01PM
https://www.itnews.com.au/news/optus-outage-blamed-on-edge-router-default-settings-602442

Optus has given its fullest account of what it thinks caused the 
November 8 outage: default settings in its Cisco provider edge (PE) 
routers that led to around 90 shutting down nationwide.

The attribution is an evolution of its previous explanation that an 
“international peering network” had fed it bad data.

News reports this week identified that peer to be the Singtel internet 
exchange (STiX), and partially identified the cause as a software 
upgrade on Singtel’s end.

Singtel disputed that account on Thursday, instead - more correctly, it 
seems - identifying “preset failsafe” mechanisms in Optus’ routers as 
the cause - an account Optus confirmed in a submission filed late on 
Thursday, ahead of a senate appearance on Friday.

“It is now understood that the outage occurred due to approximately 90 
PE [provider edge] routers automatically self-isolating in order to 
protect themselves from an overload of IP routing information,” Optus 
said.

“These self-protection limits are default settings provided by the 
relevant global equipment vendor (Cisco).”

Optus said the “unexpected overload” of routing information came via “an 
alternate Singtel peering router”, because the primary or usual router 
hardware that Optus took route information from was under planned 
maintenance.

The telco said an unspecified software upgrade was being performed at 
one STiX location in North America - which Singtel confirms.

Optus suggests the upgrade led to the bad route information being 
propagated - why, it is unclear - but now says this “was not the cause 
of the incident” in Australia.

Instead, it puts the blame on the edge router “safety” defaults. It does 
not say why the default settings were used, to what extent it had the 
ability to tweak the settings, or how long the routers had operated with 
these defaults in place.

Optus said a team of 150 engineers and technicians were directly 
involved in the investigation and restoration, supported by another 250 
staff and five vendors.

Six theories

For the first six hours or so, the engineers pursued six different 
possible explanations for the large-scale outage.

These included whether works overnight by Optus itself were the cause; 
it rolled back those changes but found no resolution.

Other options simultaneously explored included whether it was a DDoS 
attack, a network authentication issue, or problems with other vendors 
such as its content delivery network provider.

One explanation, however, became the “leading hypothesis for network 
restoration”: equipment logs and alerts that “showed multiple Border 
Gateway Protocol (BGP) IPv6 prefixes exceeding threshold alerts.”
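
The behaviour described here (prefix counts crossing a threshold, 
then routers disconnecting themselves) can be illustrated with a toy 
model. This is a minimal sketch only, assuming the "pre-set safety 
limits" act like a per-session cap on the number of received prefixes; 
neither the Optus submission nor the article gives the actual limit 
values or router configuration. The class name BgpSession, the fields 
max_prefixes and warn_fraction, and the example prefixes are all 
hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BgpSession:
    """Toy model of one peering session with a prefix-count safety limit."""

    max_prefixes: int             # hypothetical hard limit (assumption)
    warn_fraction: float = 0.75   # hypothetical warning threshold (assumption)
    prefixes: set = field(default_factory=set)
    state: str = "established"
    alerts: list = field(default_factory=list)

    def receive(self, prefix: str) -> None:
        """Accept one advertised prefix, enforcing the safety limit."""
        if self.state != "established":
            return  # session is down; updates are ignored until a reset
        self.prefixes.add(prefix)
        count = len(self.prefixes)
        if count > self.max_prefixes:
            # Self-protection: drop the session rather than risk overload.
            self.state = "shutdown"
            self.prefixes.clear()
            self.alerts.append("limit exceeded: session shut down")
        elif count > self.max_prefixes * self.warn_fraction and not self.alerts:
            self.alerts.append("warning: prefix count above threshold")

    def reset(self) -> None:
        """Manual intervention: clear state and re-establish the session."""
        self.prefixes.clear()
        self.alerts.clear()
        self.state = "established"


# Feed six distinct IPv6 prefixes into a session capped at four.
session = BgpSession(max_prefixes=4)
for i in range(6):
    session.receive(f"2001:db8:{i:x}::/48")

state_after_flood = session.state   # session has isolated itself
alerts_after_flood = list(session.alerts)

session.reset()                     # operator clears and restores the session
state_after_reset = session.state
```

In this model the session stays down until someone calls reset(), 
which is consistent with the article's account that service returned 
only after engineers reset routing connectivity. Whether the real 
routers could have recovered automatically is not stated in the source.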

“We identified that resetting routing connectivity addressed the loss of 
network services. This occurred at 10:21am,” Optus said.

Engineers then set about “resetting and clearing routing connectivity on 
network elements which had disconnected themselves from the network, 
physically rebooting and reconnecting some network elements to restore 
connectivity, [and] carefully and methodically re-introducing traffic 
onto the mobile data and voice core to avoid a signalling surge on the 
network,” it said.

Engineers performed unspecified “resiliency” works on the network 
between resolution on November 8 and the following Monday, November 13.

Optus foreshadowed more work to come.

“We are committed to learning from this event and continue to invest 
heavily, working with our international vendors and partners, to 
increase the resilience of our network,” it said.

“We will also support and will fully cooperate with the reviews being 
undertaken by the government and the senate.”

Defends customer comms

Optus used other parts of its submission to defend its customer 
communications on the outage day.

Its position is that as consumer and some enterprise services were out, 
media - traditional and social - was considered the best way to get the 
word out.

That is likely to be challenged in the senate inquiry.

The other issue the senate is likely to raise is financial compensation 
for customers.

So far, Optus has offered users extra data quota, which has been 
criticised in some circles.

While there is an argument that businesses, in particular, lost money 
while the network was down, there is a counterargument that businesses 
should have their own backup connectivity in the event their primary 
service is down.

To what extent the senate can resolve that is unclear.

Financial compensation unprecedented

Optus, however, in its submission argues that making a telco pay 
financial compensation for “consequential losses” isn’t a precedent that 
should be set.

“There is no precedent for compensation being paid by telecommunications 
providers to all business customers who suffer a loss of business as a 
result of an outage of the kind that occurred on November 8, either here 
or overseas,” Optus said.

“We understand that this would create a new precedent that would extend 
far beyond Optus and apply to all other telecommunications providers, as 
well as other providers of essential services, critical infrastructure 
and public services.

“This makes it a much broader policy question for government that would 
have far reaching implications across many sectors of the economy and 
the cost of these services for Australian consumers.”

Optus said that it isn’t the first to suffer a sizeable outage in 
Australia, nor would the November 8 outage be the last incident of its type.

“It is an unfortunate reality in our reliant digital age that no 
communications network can completely protect against, nor prevent, 
these types of occurrences from ever happening – despite the investments 
made or resiliency efforts undertaken,” it said.

“Reflecting this, communications services are not provided with a 
guarantee of continuous service.

“Given continuity of service is not guaranteed, consumers are not given 
an automatic right of compensation whenever an outage occurs.”

-- 
Roger Clarke                            mailto:Roger.Clarke at xamax.com.au
T: +61 2 6288 6916   http://www.xamax.com.au  http://www.rogerclarke.com

Xamax Consultancy Pty Ltd      78 Sidaway St, Chapman ACT 2611 AUSTRALIA 

Visiting Professor in the Faculty of Law            University of N.S.W.
Visiting Professor in Computer Science    Australian National University
