[LINK] Optus Outage

Kate Lance kate at lancewood.net
Thu Nov 9 11:33:02 AEDT 2023


Hi Narelle,

An interesting post on Mastodon from Rob Thomas, supporting the idea it was
a route reflector overload -
https://mastodon.au/@xrobau/111376847362633903

The problem yesterday started at about 4am, when Optus told the world 'I no
longer have any internet connectivity', and 'Do not send any internet traffic
to me, at all'. The technical description is that they withdrew ALL of their
routes from the #DFZ (which is "The Internet", as seen by all the core routers
that ACTUALLY control the internet).

However, as a precursor at about 3am there was a hint that things weren't
perfect, as there was a flurry of changes from Optus to the outside world
saying, roughly, 'Something has changed inside my network, but you can still
keep sending me stuff'.

Now, as two final bits of possibly relevant information, the default for
maximum-prefix on #Cisco #ASR9000 is 1048576 (this number is 'the number of
routes that can be accepted by this router'), and MOST IMPORTANTLY the DFZ
("the internet") has about 980,000 routes in it at the moment. That's only 68k
odd routes LESS than the default maximum.

I'd be amazed if Optus has fewer than 100k internal routes that aren't visible
to the internet, but are visible internally.
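Working those figures through as a quick check (a sketch only: the 980,000
and 100,000 values are the post's rough estimates, and the variable names are
made up for illustration):

```python
# Figures quoted in the post; all approximate, none measured here.
MAX_PREFIX_DEFAULT = 1_048_576   # default maximum-prefix cited for the ASR9000
DFZ_ROUTES = 980_000             # approximate size of the DFZ table
INTERNAL_ROUTES = 100_000        # the post's guess at internal-only routes (assumption)

headroom = MAX_PREFIX_DEFAULT - DFZ_ROUTES
total_visible = DFZ_ROUTES + INTERNAL_ROUTES
overshoot = total_visible - MAX_PREFIX_DEFAULT

print(f"headroom below the default limit: {headroom:,}")
print(f"routes visible with internal routes added: {total_visible:,}")
print(f"over the default limit by: {overshoot:,}")
```

On these numbers the DFZ alone leaves roughly 68k routes of headroom, so about
100k internal routes on top would put a router past the default limit.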

So here's what I think happened. At 3am, the first core #router was
upgraded, and a new config was put in place. This did not join the network
correctly, and things were half broken. What SHOULD have happened is that all
the changes should have stopped, and either been rolled back or put on hold
for further investigation (the cause being that more than 1mil routes were
visible to the router, causing it to shut down).
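
A rough illustration of the mechanism described above, where a session is
dropped once the number of accepted routes exceeds the limit. The Session
class is a hypothetical simplification, not Cisco's implementation; real
routers apply the limit per neighbour and address family, and can be
configured to warn rather than drop the session.

```python
from dataclasses import dataclass


@dataclass
class Session:
    """Hypothetical stand-in for one BGP session with a prefix limit."""
    max_prefix: int
    received: int = 0
    up: bool = True

    def receive(self, count: int) -> bool:
        """Accept `count` more routes; drop the session if the limit is exceeded."""
        if not self.up:
            return False
        self.received += count
        if self.received > self.max_prefix:
            self.up = False      # limit exceeded: session is shut down
            self.received = 0    # routes learned over it are discarded
        return self.up


session = Session(max_prefix=1_048_576)
print(session.receive(980_000))   # DFZ-sized table alone fits under the limit
print(session.receive(100_000))   # assumed internal routes push it over
```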

However, someone decided 'Well, maybe if we upgrade the SECOND one, that'll fix
the first one' at 4am. That broke the SECOND one, and took Optus completely off
the internet.
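
The "stop and roll back" process argued for above can be sketched as a staged
rollout that halts at the first unhealthy router. This is a toy model under
assumed names (staged_upgrade, the health check, the router labels), not a
description of Optus's actual change process.

```python
from typing import Callable, List


def staged_upgrade(routers: List[str],
                   upgrade: Callable[[str], None],
                   healthy: Callable[[str], bool],
                   rollback: Callable[[str], None]) -> List[str]:
    """Upgrade routers one at a time; stop and roll back on the first failure.

    Returns the routers that were upgraded and passed their health check.
    """
    done: List[str] = []
    for name in routers:
        upgrade(name)
        if not healthy(name):
            rollback(name)   # undo the failed change
            break            # leave the remaining routers untouched
        done.append(name)
    return done


# Toy run: the new config is bad, so the first router fails its health check.
state = {"core1": "old", "core2": "old"}
result = staged_upgrade(
    ["core1", "core2"],
    upgrade=lambda r: state.__setitem__(r, "new"),
    healthy=lambda r: state[r] == "old",   # hypothetical check: "new" is broken
    rollback=lambda r: state.__setitem__(r, "old"),
)
print(result, state)
```

In the toy run the second router is never touched, which is the outcome the
post says should have happened at 3am.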

(Continued, see next for why this is far worse than it should have been)
.....


Regards, Kate


On Wed, Nov 08, 2023 at 05:33:43PM +1100, Narelle Clark wrote:
> Rumour has it that it was a BGP update from an external source that wasn't
> filtered properly, which the BGP route reflectors then passed on, overloading
> the internal routers. Persistently.
> 
> It was clearly an internal transport problem arising from an underlying IP
> protocol. BGP fits that bill completely as it would be redistributed, and
> clearly their management network isn't sufficiently out of band. Once a
> network of that scale goes down like that, you can't just turn it back on
> and expect it to all work fine - millions of devices all want to
> re-register at once, and all those state changes across the network have to
> converge...
> 
> Narelle
> 
> On Wed, 8 Nov 2023 at 10:30, Alex (Maxious) Sadleir <maxious at gmail.com>
> wrote:
> 
> > Around 4am, Optus networks re-announced all their BGP routes at once
> > https://radar.cloudflare.com/routing/as7474
> > https://radar.cloudflare.com/routing/as4804
> > This is indicative of a change management malfunction, akin to IBM's
> > 2016 eCensus routers restarting with no routes
> >
> > https://www.itnews.com.au/news/ibm-treasury-in-settlement-talks-over-census-failure-440066
> >
> > The VoWifi infrastructure seems to be online but unable to connect any
> > calls
> > https://goughlui.com/2023/11/08/breaking-optus-nationwide-outage-08-11-2023/
> > More alarmingly 000 doesn't work on landlines
> > https://twitter.com/lucethoughts/status/1722029287727825124 contrary
> > to advice from emergency services
> > https://twitter.com/nswpolice/status/1722028862161449151
> >
> > On Wed, Nov 8, 2023 at 10:08 AM Tom Worthington
> > <tom.worthington at tomw.net.au> wrote:
> > >
> > > Any more news on what caused the Optus network outage? On ABC Canberra
> > > Radio this morning I suggested it was most likely a software upgrade
> > > which went wrong, and would be fixed by 6pm.
> > >
> > > Is VoWiFi working?
> > >
> > > I use Telstra, but when COVID-19 struck, I purchased an Optus 4G modem,
> > > with an Optus SIM. This was in case Telstra went down.
> > >
> > >
> > > --
> > > Tom Worthington http://www.tomw.net.au
> > > _______________________________________________
> > > Link mailing list
> > > Link at anu.edu.au
> > > https://mailman.anu.edu.au/mailman/listinfo/link
> >
> 
> 
> -- 
> 
> 
> Narelle
> narellec at gmail.com

