[LINK] Dealing with Your IAP

Roger Clarke Roger.Clarke at xamax.com.au
Thu Apr 19 09:39:49 AEST 2012


At 22:18 +1000 18/4/12, rene wrote:
>Probably more than that, given the number of complaints about the same TPG
>problem here:
>http://forums.whirlpool.net.au/forum-replies.cfm?t=1901706
>( a thread that includes posts by several TPG staff since 3 pm yesterday -
>along the lines of "we're investigating").

A good clear indication of where the problem lay was in jardeath's 
post, at 19:04 on Tue evening.

It was acknowledged by a TPG rep at 20:10, but it still took another 
15 hours, until 11:41, to get a temporary fix done.

Am I being too bold when I suggest that a re-boot of the identified 
device would have had a, say, 80% chance of fixing the problem?  The 
collateral damage should have been minor pain for the other things 
that the device was doing at the time - which should all be 
self-healing thanks to the error-correction features of all the 
relevant protocols.


The 'take-away' from the event is that the company not only failed to 
use its incident reporting system to detect a major problem (and its 
solution) - which it could have discovered at 09:32 Tue - but it 
also failed to exploit meaningful content on the grizzle-boards - 
which appeared at 19:04 the same day.


Update:
(1)  the same thing happened between 06:00 and 07:45 this morning
      (I was only at the desk some of the time, so I can't nail down
      the actual window)
(2)  I would have emailed the engineer, but had no email-address,
      so I emailed the CEO again
(3)  in fact, that interruption was the intended final fix being done.
      (The temp fix on Wed was to fiddle with the routing tables in
      order to - partly? - isolate the device, and the final fix was
      replacement of the device, done c. 06:45)
(4)  as soon as I saw it was back, I emailed the CEO again
(5)  the engineer has subsequently called and given me an inside
      email-address for the engineering group  (:-)}

Summary:
-   the internal engineering at TPG may well be quite okay
     (The engineer explained, without me prompting, why this was a
     difficult problem to detect automatically, whereas a device or
     server *outage* would have been auto-reported to engineering)
-   the arrangements for detecting dissatisfaction in user-land, and
     monitoring sources for useful diagnostic information, are a serious
     weak-point, which has cost the company an amount of cred and goodwill
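
To illustrate the second point, here is a minimal sketch of one way
user-land complaints could be turned into an alert even when no device
reports an outage.  It is not anything TPG is known to run.  The function
name, the threshold and the sample timestamps are all invented for the
illustration; a real feed would come from the incident reporting system
or from forum threads.

```python
from collections import Counter
from datetime import datetime


def flag_complaint_spikes(timestamps, threshold):
    """Return the hours in which complaint volume reached `threshold`.

    `timestamps` is any iterable of datetime objects, one per complaint.
    Each hour is represented by a datetime truncated to the hour.
    """
    per_hour = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    return sorted(hour for hour, count in per_hour.items() if count >= threshold)


# Invented sample data: three complaints in one hour, one in a later hour.
posts = [
    datetime(2012, 4, 17, 19, 4),
    datetime(2012, 4, 17, 19, 20),
    datetime(2012, 4, 17, 19, 45),
    datetime(2012, 4, 17, 21, 5),
]

# Hypothetical threshold: three complaints in an hour triggers a flag.
spikes = flag_complaint_spikes(posts, threshold=3)
for hour in spikes:
    print("complaint spike in the hour starting", hour.isoformat())
```

A count per hour is deliberately crude.  The point is only that a
degradation which no device flags as an outage can still show up as a
cluster of reports from users, and that cluster is cheap to detect.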


-- 
Roger Clarke                                 http://www.rogerclarke.com/

Xamax Consultancy Pty Ltd      78 Sidaway St, Chapman ACT 2611 AUSTRALIA
                    Tel: +61 2 6288 1472, and 6288 6916
mailto:Roger.Clarke at xamax.com.au                http://www.xamax.com.au/

Visiting Professor in the Faculty of Law               University of NSW
Visiting Professor in Computer Science    Australian National University


