[LINK] Dealing with Your IAP
Roger Clarke
Roger.Clarke at xamax.com.au
Wed Apr 18 16:20:45 AEST 2012
The last 36 hours have given me some unwanted experience in the
complexities of dealing with an Internet Access Provider (IAP).
TPG provide me with ADSL2+ (but I use other ISPs for email and
web-site hosting).
An intermediary device in the bowels of TPG Canberra needed
re-booting, because it was blocking traffic from my IP-address to one
or more target IP-addresses - including my primary ISP.
If TPG has any automated mechanisms in place to detect dead or
malfunctioning nodes, they failed to discover the problem.
My incident report was not assigned an identifier.
It took a long time before it was available to the helpdesk (hours).
Subsequent emails also took time to become available to appropriate
people.
The frontline helpdesk (in Manila) generally answered quickly, and
was populated by people whose English was okay, whose politeness was
exemplary, but whose grasp of the technology was minimal.
The supervisor (once I managed to break through after 6 hours and 5
long phone-calls) was much more competent, and listened, and (given
time) appears to have eventually got the message through to the
engineers.
But I'd sent them full details in the first email, which were
sufficient to identify the problem. Yet it still took 26 hours from
the time that I notified the problem before service was restored (and
32 hours from the time the malfunction occurred).
I dread to think what would have happened to a mug punter.
They would still be following instructions on how to (needlessly)
re-boot email-clients, browsers, PCs, ADSL modems and routers, and
(needfully) run ping and traceroute, copy the results into an email,
and log into their router to change their username to one nominated
by the engineers (and remember to change it back afterwards).
Full gory details below.
______________________________________________________________________
Tue 17 April, my auto-email downloads stopped about 03:45.
When I got to my desktop about 07:30, I found that not only my emails
but also a couple of web-sites, all hosted at Infinite in Canberra,
were inaccessible.
I raised it with Infinite, who said they could see the sites fine.
On further checking, their own site was fine (119.15.96.8).
But I was getting no response from the device that runs my web-sites,
POP server, and the CPanel server (119.15.105.241).
Traceroutes established that the blockage was at the 4th step of 16:
cbr-trn-nor-crt2-port-channel-3 (202.7.162.73)
A whois at apnic.net confirmed that this is a TPG IP-address.
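For anyone wanting to repeat that ownership check without a whois client to hand, here is a minimal sketch in Python. It assumes the address range that whois returned for netname TPG-AU (202.7.160.0 - 202.7.191.255, as quoted in the email copied further down). The helper name is_in_range is made up for illustration.

```python
import ipaddress

def is_in_range(addr: str, first: str, last: str) -> bool:
    """Return True if addr lies within the inclusive range first..last."""
    ip = ipaddress.ip_address(addr)
    return ipaddress.ip_address(first) <= ip <= ipaddress.ip_address(last)

# Range reported by whois for netname TPG-AU (taken from the email below)
TPG_FIRST, TPG_LAST = "202.7.160.0", "202.7.191.255"

print(is_in_range("202.7.162.73", TPG_FIRST, TPG_LAST))    # the hop where the trace stops
print(is_in_range("119.15.105.241", TPG_FIRST, TPG_LAST))  # the unreachable server
```

The first call should report that the hop is inside the TPG range, the second that the target server is not.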
Like any well-trained customer, I checked:
http://www.tpg.com.au/servicestatus/
http://www.tpg.com.au/servicestatus/?timeframe=last24
It said only that there had been a problem 04:00-06:00: "Due to
carrier planned maintenance, some customers may experience slow
connection and browsing speeds within the maintenance window".
I emailed the email-address provided on the TPG web-site
(helpdesk at tpg.com.au) at 09:32. I included the explanation, and the
traceroutes.
A copy is down the bottom.
I left it 20 minutes to give them a chance, and then called the
support line at 09:52.
The help-desk answered very promptly. Alan plodded a bit, but he was
well-trained, polite, and worked hard at it.
He was in The Philippines. (It's not all that hot there today,
although they did have a 35 a couple of days ago. He assumed we used
Fahrenheit, and was looking for a converter for me until I assured
him that we use both F and C here. Pleasant chap).
But he couldn't get access to my email at any stage, and of course
(as a first-line help-desk'er) had only a hazy idea of how things
work. (I had to spell apnic.net out letter-by-letter, as part of the
exercise of convincing his tech supervisor that I knew the blockage
was a device within TPG).
After 20 minutes, and five pauses off-line while he checked with his
local tech supervisor, he'd done all he could and said he would pass
it up to the next level.
After 5 minutes, Alan called back and asked for a re-send of the
email, because they still didn't have access to it. By that stage it
was 10:19.
At 11:42, with the same problem arising, and having heard nothing, I
re-sent the email, again to the address copied from the web-site.
By 14:25, google.com.au was suffering the same problem. (Whereas
google.com wasn't). I called the support number, but TPG's IVR
refused to even queue me, due to an excess of callers waiting. There
were still no issues on notice at the service status page.
Trying again at 14:40, the call was queued, briefly. They could
quickly find the current status - but it was merely what it had been
at 10:15. I spent 20 minutes, mainly waiting while the frontline
desk'er tried to make progress with his local tech person.
At that stage, I (metaphorically of course) thumped the desk and said
'it's been 12 hours, and all it needs is for someone in TPG's
Canberra centre to re-boot the device that I identified to you 5
hours ago'.
He said he couldn't escalate until they'd done all the tests. I said
'I've done them all, and they made no difference. You should have
done all of this 4 hours ago anyway. Escalate it *now*'.
He put me back on hold to talk with his tech support. After 30
minutes, the batteries on my hand-set ran out and disconnected me ...
I left it another 1-1/2 hours, to give them a decent chance to get
organised. No change, and no callback. At 16:45, I called again, 13
hours after the fault started and 7 hours after notification.
I got through to the helpdesk quickly. This third person had no
technical clue at all. He couldn't grasp the idea that this was a
problem with ADSL traffic, and kept trying to solve non-existent
problems with my TPG email account.
He left me on hold for long periods, the last time for more than 10
minutes. But after resorting to strong demands, I did manage to
escape the local dunce, and get it escalated to his supervisor, Don.
Just before my handset batteries ran out, I arranged for Don to call
me on the other line (because of course I had no number to call him
directly on - they prohibit such things, don't they).
It took 5 minutes, but eventually Don understood that he needed to
search out the email that none of his predecessors appear to have
read. (Maybe the ability to read is an optional extra for frontline
help-deskers).
After a further 5 minutes, Don appeared to have actually understood
the problem - that a device within TPG was blocking outbound traffic
from me to various services that I depend upon, which are run by
another ISP.
He asked for 15 minutes to talk to an engineer. (An engineer! A
second-line help-desker can actually talk to engineers! What a novel
idea!). That was at 17:40.
There were then two 30-minute sessions with Don, who had a good
understanding of what the engineer wanted (and what the engineer
wanted made sense).
We did multiple tests using both my own account and an additional one
Don nominated. That involved multiple logins to my local router to
change the username. By the end of it they had copies of traceroutes
and pings for various combinations. It wasn't helped by there being
some instability in the service. But it was clear that traffic from
my location to one particular IP-address was being blocked by a
device inside TPG's Canberra premises. We finished about 18:40 (with
Don 1-1/2 hours into overtime).
At 06:00 the next morning, there was no change. But support isn't
available until 08:00 ...
At 08:15, yet another clueless frontline helpdesker started reading
the erroneous notes made by his predecessors. Clearly the incident
management system:
- fails to allocate an incident identifier
- fails to allocate an incident back to anyone who has previously
handled it
- fails to make the associated emails available on the screen
- fails to display the current status at the top
- forces every new handler to start at the very beginning
What period of the Dark Ages does the incident management system come from??
He was repeatedly unable to reach the 'system administrator', and
kept asking for more time to try again. After 10 minutes, I asked him
to call me as soon as he had some information - to save both my
handset-batteries and my blood-pressure.
The sole progress made was that he repeated back to me that the
problem was a device in the Canberra site that needed to be re-booted.
He called back at 08:30 to say that this was being handled by Don,
who would be in at 10am. (Don is the only one who has a clue, so I
accepted that gracefully, despite the ongoing loss of time).
Don called at 10:35. He said the technicians are working on it, and
there was concern that there may be other customers affected as well.
The problem was at that stage assigned to a senior technician. He
said he would call when he had new information, at latest 3pm.
(Okay, we're over 25 hours outage now, but at least it's finally
where it should have been 22 hours ago).
My email-client is set to download email every 15 minutes, and at
11:41, mail came tumbling down the line. Using traceroute, it was
clear that the path was no longer passing through the device that had
been blocking my traffic.
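As a cross-check on the outage figures quoted at the top (26 hours from notification, 32 hours from the malfunction), the arithmetic can be sketched in Python. The three timestamps come from this account. Treating 11:41 on Wed 18 April as the moment of restoration is an assumption, since with a 15-minute polling interval the fix could have landed any time after the previous poll.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M"
fault = datetime.strptime("2012-04-17 03:45", FMT)     # auto-downloads stopped
reported = datetime.strptime("2012-04-17 09:32", FMT)  # first email to the helpdesk
restored = datetime.strptime("2012-04-18 11:41", FMT)  # mail flowing again (assumed)

def hours_minutes(delta: timedelta) -> tuple:
    """Split a timedelta into whole hours and leftover minutes."""
    total_minutes = int(delta.total_seconds()) // 60
    return divmod(total_minutes, 60)

print(hours_minutes(restored - fault))     # (31, 56)
print(hours_minutes(restored - reported))  # (26, 9)
```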
I sent an email to the helpdesk at 11:56, explaining what had
happened, including new traceroutes, and asking whether they had
changed something to fix it (e.g. taken down the faulty server,
re-booted the device, or similar), or whether the problem had fixed
itself (e.g. because the internal routing tables changed).
At 4pm, yet another new person called me, asking me to do the
traceroutes that Don had taken me through the previous evening. He
clearly had not seen the email from 4 hours earlier, and could not
find it while we talked. I explained, as patiently as I could, that
he needed to read that before we talked again.
You really do have to wonder about the design of helpdesk systems ...
_________________________________________________________________________
Date: Tue, 17 Apr 2012 09:32:47 +1000
To: helpdesk at tpg.com.au
From: Roger Clarke <Roger.Clarke at xamax.com.au>
Subject: Blockages at cbr-trn-nor-crt2-port-channel-3
Dear TPG Support
I have TPG ADSL2+ installed.
I have access to the Web.
But some IP-addresses are blocked.
I've checked your support site, but there's nothing there of relevance, e.g.
http://www.tpg.com.au/servicestatus/
http://www.tpg.com.au/support/problems_connect_internet_standalone.php
The blocked IP-addresses include sites where I store web-pages and
access email.
Specifically, I can access 119.15.96.8.
But I cannot access 119.15.105.241.
According to Traceroute, the blockage is at
cbr-trn-nor-crt2-port-channel-3 (202.7.162.73)
Whois @ APNIC says:
inetnum: 202.7.160.0 - 202.7.191.255
netname: TPG-AU
descr: TPG Internet Pty Ltd.
Traceroutes are below.
Would you please get this fixed urgently, and advise.
I cannot receive any email at present, so if there's a delay, please
call me on (02) 6288 6916.
Thanks! ... Roger Clarke
Account cheeper at tpg.com.au [2966669]
78 Sidaway St
Chapman ACT 2611
_________________________________________________________________________
08:47 Successful traceroute to 119.15.96.8
traceroute to infinite.net.au (119.15.96.8), 64 hops max, 40 byte packets
1 192-168-1-1 (192.168.1.1) 2.617 ms 0.512 ms 0.257 ms
2 cbr-trn-nor-bras3-lo-0 (10.20.20.221) 49.236 ms 21.247 ms 20.948 ms
3 cbr-trn-nor-csw2-port-channel-11 (202.7.173.29) 30.569 ms 23.341 ms 20.888 ms
4 cbr-trn-nor-crt2-port-channel-1 (202.7.162.49) 31.616 ms 29.791 ms 23.001 ms
5 syd-nxg-men-crt2-ge-2-1-2 (202.7.171.25) 42.338 ms 41.223 ms 40.513 ms
6 119.225.9.25 (119.225.9.25) 43.377 ms 35.388 ms 35.936 ms
7 * * *
8 119.225.3.82 (119.225.3.82) 38.945 ms 36.070 ms 36.157 ms
9 ip-214.192.31.114.vocus.net.au (114.31.192.214) 41.190 ms 40.765 ms 40.479 ms
10 ip-203.192.31.114.vocus.net.au (114.31.192.203) 53.257 ms 42.548 ms 39.834 ms
11 ten-0-0-0-0.cor01.syd01.nsw.vocus.net.au (114.31.192.38) 40.931 ms 40.781 ms 45.352 ms
12 ip-45.192.31.114.vocus.net.au (114.31.192.45) 44.522 ms
ip-193.192.31.114.vocus.net.au (114.31.192.193) 154.718 ms
ip-45.192.31.114.vocus.net.au (114.31.192.45) 40.302 ms
13 ge-0-2-7.bdr02.cbr01.act.vocus.net.au (114.31.204.66) 80.139 ms * 40.540 ms
14 * ip-46.205.31.114.vocus.net.au (114.31.205.46) 92.857 ms *
15 ge0-0.sw1.cbr.infinite.net.au (119.15.111.62) 49.981 ms 34.589 ms 35.103 ms
16 www.infinite.net.au (119.15.96.8) 34.414 ms 34.552 ms 34.592 ms
08:49 Unsuccessful traceroutes to 119.15.105.241
traceroute to 119.15.105.241 (119.15.105.241), 64 hops max, 40 byte packets
1 192-168-1-1 (192.168.1.1) 1.194 ms 0.403 ms 3.353 ms
2 cbr-trn-nor-bras3-lo-0 (10.20.20.221) 44.323 ms 21.881 ms 176.546 ms
3 cbr-trn-nor-csw1-port-channel-11 (202.7.173.25) 21.094 ms 20.793 ms 22.326 ms
4 cbr-trn-nor-crt2-port-channel-3 (202.7.162.73) 20.404 ms 20.791 ms 22.825 ms
5 * * *
6 * * *
...
09:06
traceroute to cpanel01.infinite.net.au (119.15.105.241), 64 hops max, 40 byte packets
1 192-168-1-1 (192.168.1.1) 1.247 ms 0.521 ms 0.810 ms
2 cbr-trn-nor-bras3-lo-0 (10.20.20.221) 34.794 ms 32.807 ms 20.816 ms
3 cbr-trn-nor-csw1-port-channel-11 (202.7.173.25) 43.884 ms 24.730 ms 39.888 ms
4 cbr-trn-nor-crt2-port-channel-3 (202.7.162.73) 21.045 ms 20.670 ms 20.940 ms
5 * * *
6 * * *
...
_________________________________________________________________________
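Picking the last responding hop out of traces like those above can be automated. The following is a rough sketch in Python, not a tool used in this incident. The function name last_responding_hop and the two regular expressions are illustrative assumptions, written to cope with the line-wrapped output as it appears in an email. Strictly, the result is only the last hop that answered, so it shows where the trail goes cold rather than proving which device is at fault.

```python
import re

# A hop line starts with a hop number followed by "*" or by "name (ip...".
HOP_RE = re.compile(r"^\s*(\d+)\s+(\*|\S+\s+\(\d)")
# A reply is reported as "hostname (a.b.c.d)".
ADDR_RE = re.compile(r"(\S+)\s+\((\d{1,3}(?:\.\d{1,3}){3})\)")

def last_responding_hop(output: str):
    """Return (hop, name, ip) for the last hop that sent a reply, else None."""
    last = None
    for line in output.splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue  # header, wrapped continuation, or "..." line
        reply = ADDR_RE.search(line)
        if reply:  # hops showing only "* * *" have no address and are skipped
            last = (int(m.group(1)), reply.group(1), reply.group(2))
    return last

# The 08:49 trace from above, wrapped as it was in the email
SAMPLE = """\
traceroute to 119.15.105.241 (119.15.105.241), 64 hops max, 40 byte packets
1 192-168-1-1 (192.168.1.1) 1.194 ms 0.403 ms 3.353 ms
2 cbr-trn-nor-bras3-lo-0 (10.20.20.221) 44.323 ms 21.881 ms 176.546 ms
3 cbr-trn-nor-csw1-port-channel-11 (202.7.173.25) 21.094 ms
20.793 ms 22.326 ms
4 cbr-trn-nor-crt2-port-channel-3 (202.7.162.73) 20.404 ms 20.791
ms 22.825 ms
5 * * *
6 * * *
"""

print(last_responding_hop(SAMPLE))
# (4, 'cbr-trn-nor-crt2-port-channel-3', '202.7.162.73')
```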
--
Roger Clarke http://www.rogerclarke.com/
Xamax Consultancy Pty Ltd 78 Sidaway St, Chapman ACT 2611 AUSTRALIA
Tel: +61 2 6288 1472, and 6288 6916
mailto:Roger.Clarke at xamax.com.au http://www.xamax.com.au/
Visiting Professor in the Faculty of Law University of NSW
Visiting Professor in Computer Science Australian National University