[LINK] The Optus Sub to the Parltry Ctee
Roger Clarke
Roger.Clarke at xamax.com.au
Fri Nov 17 08:07:38 AEDT 2023
The Optus statement:
https://www.aph.gov.au/DocumentStore.ashx?id=2ed95079-023d-49d5-87fd-d9029740629b&subId=750333
p.5:
19. It is now understood that the outage occurred due to approximately
90 PE routers automatically self-isolating in order to protect
themselves from an overload of IP routing information. These
self-protection limits are default settings provided by the relevant
global equipment vendor (Cisco).
20. This unexpected overload of IP routing information occurred after
a software upgrade at one of the Singtel internet exchanges (known as
STiX) in North America, one of Optus’ international networks. During the
upgrade, the Optus network received changes in routing information from
an alternate Singtel peering router. These routing changes were
propagated through multiple layers of our IP Core network. As a result,
at around 4:05am (AEDT), the pre-set safety limits on a significant
number of Optus network routers were exceeded. Although the software
upgrade resulted in the change in routing information, it was not the
cause of the incident.
21. Restoration required a large-scale effort across more than 100
devices in 14 sites nationwide to facilitate the recovery (site by
site). This recovery was performed remotely and also required physical
access to several sites.
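
The self-isolation behaviour described in paragraphs 19 and 20 can be sketched in a few lines. This is only a toy model, not Optus' or Cisco's actual configuration or code: the class name, the limit, the warning fraction and the documentation-range IPv6 prefixes are all hypothetical, and it assumes the "pre-set safety limits" behave like a per-session cap on the number of routes accepted.

```python
from dataclasses import dataclass, field


@dataclass
class PeerSession:
    """Toy model of one routing session guarded by a prefix limit."""
    max_prefixes: int                 # hypothetical safety limit
    warn_fraction: float = 0.75       # hypothetical warning threshold
    prefixes: set = field(default_factory=set)
    isolated: bool = False
    warned: bool = False

    def receive(self, prefix: str) -> None:
        """Accept one advertised prefix unless the session is already down."""
        if self.isolated:
            return
        self.prefixes.add(prefix)
        count = len(self.prefixes)
        if count > self.max_prefixes:
            # Limit exceeded: drop everything learned and stay down until reset.
            self.prefixes.clear()
            self.isolated = True
        elif count >= self.warn_fraction * self.max_prefixes:
            self.warned = True

    def reset(self) -> None:
        """Operator action: clear state and bring the session back up."""
        self.prefixes.clear()
        self.isolated = False
        self.warned = False


def simulate(advertised: int, limit: int) -> PeerSession:
    """Feed `advertised` distinct example prefixes into a fresh session."""
    session = PeerSession(max_prefixes=limit)
    for i in range(advertised):
        session.receive(f"2001:db8:{i:x}::/48")
    return session
```

In this model a session that is sent more routes than its limit discards what it has learned and stays down until someone calls reset(), which is roughly the shape of the site-by-site recovery described in paragraph 21.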
___________
Optus outage blamed on edge router default settings
Fullest technical explanation so far is provided.
Ry Crozier
itNews
Nov 16 2023 11:01PM
https://www.itnews.com.au/news/optus-outage-blamed-on-edge-router-default-settings-602442
Optus has given its fullest account of what it thinks caused the
November 8 outage: default settings in its Cisco provider edge (PE)
routers that led to around 90 shutting down nationwide.
The attribution is an evolution of its previous explanation that an
“international peering network” had fed it bad data.
News reports this week identified that peer to be the Singtel internet
exchange (STiX), and partially identified the cause as a software
upgrade on Singtel’s end.
Singtel disputed that account on Thursday, instead (more correctly, it
seems) identifying “preset failsafe” mechanisms in Optus’ routers as
the cause, an account Optus confirmed in a submission filed late on
Thursday, ahead of a senate appearance on Friday.
“It is now understood that the outage occurred due to approximately 90
PE [provider edge] routers automatically self-isolating in order to
protect themselves from an overload of IP routing information,” Optus
said.
“These self-protection limits are default settings provided by the
relevant global equipment vendor (Cisco).”
Optus said the “unexpected overload” of routing information came via “an
alternate Singtel peering router”, because the primary or usual router
hardware that Optus took route information from was under planned
maintenance.
The telco said an unspecified software upgrade was being performed at
one STiX location in North America, which Singtel confirms.
Optus suggests the upgrade led to the bad route information being
propagated - why, it is unclear - but now says this “was not the cause
of the incident” in Australia.
Instead, it puts the blame on the edge router “safety” defaults. It does
not say why the default settings were used, to what extent it had the
ability to tweak the settings, or how long the routers had operated with
these defaults in place.
Optus said a team of 150 engineers and technicians was directly
involved in the investigation and restoration, supported by another 250
staff and five vendors.
Six theories
For the first six hours or so, the engineers pursued six different
possible explanations for the large-scale outage.
These included whether works overnight by Optus itself were the cause;
it rolled back those changes but found no resolution.
Other options simultaneously explored included whether it was a DDoS
attack, a network authentication issue, or problems with other vendors
such as its content delivery network provider.
One explanation, however, became the “leading hypothesis for network
restoration”: equipment logs and alerts that “showed multiple Border
Gateway Protocol (BGP) IPv6 prefixes exceeding threshold alerts.”
“We identified that resetting routing connectivity addressed the loss of
network services. This occurred at 10:21am,” Optus said.
Engineers then set about “resetting and clearing routing connectivity on
network elements which had disconnected themselves from the network,
physically rebooting and reconnecting some network elements to restore
connectivity, [and] carefully and methodically re-introducing traffic
onto the mobile data and voice core to avoid a signalling surge on the
network,” it said.
Engineers performed unspecified “resiliency” works on the network
between resolution on November 8 and the following Monday, November 13.
Optus foreshadowed more work to come.
“We are committed to learning from this event and continue to invest
heavily, working with our international vendors and partners, to
increase the resilience of our network,” it said.
“We will also support and will fully cooperate with the reviews being
undertaken by the government and the senate.”
Defends customer comms
Optus used other parts of its submission to defend its customer
communications on the outage day.
Its position is that as consumer and some enterprise services were out,
media - traditional and social - was considered the best way to get the
word out.
That is likely to be challenged in the senate inquiry.
The other issue the senate is likely to raise is financial compensation
for customers.
So far, Optus has offered users extra data quota, which has been
criticised in some circles.
While there is an argument that businesses, in particular, lost money
while the network was down, there is a counterargument that businesses
should have their own backup connectivity in the event their primary
service is down.
To what extent the senate can resolve that is unclear.
Financial compensation unprecedented
Optus, however, in its submission argues that making a telco pay
financial compensation for “consequential losses” isn’t a precedent that
should be set.
“There is no precedent for compensation being paid by telecommunications
providers to all business customers who suffer a loss of business as a
result of an outage of the kind that occurred on November 8, either here
or overseas,” Optus said.
“We understand that this would create a new precedent that would extend
far beyond Optus and apply to all other telecommunications providers, as
well as other providers of essential services, critical infrastructure
and public services.
“This makes it a much broader policy question for government that would
have far reaching implications across many sectors of the economy and
the cost of these services for Australian consumers.”
Optus said that it isn’t the first to suffer a sizeable outage in
Australia, nor would the November 8 outage be the last incident of its type.
“It is an unfortunate reality in our reliant digital age that no
communications network can completely protect against, nor prevent,
these types of occurrences from ever happening – despite the investments
made or resiliency efforts undertaken,” it said.
“Reflecting this, communications services are not provided with a
guarantee of continuous service.
“Given continuity of service is not guaranteed, consumers are not given
an automatic right of compensation whenever an outage occurs.”
--
Roger Clarke mailto:Roger.Clarke at xamax.com.au
T: +61 2 6288 6916 http://www.xamax.com.au http://www.rogerclarke.com
Xamax Consultancy Pty Ltd 78 Sidaway St, Chapman ACT 2611 AUSTRALIA
Visiting Professor in the Faculty of Law University of N.S.W.
Visiting Professor in Computer Science Australian National University