[LINK] virgin blue outage II
rik at kawaja.net
Mon Feb 21 23:29:43 AEDT 2011
I'm not aware specifically whether SSDs are particularly susceptible to power outages, but they haven't been mainstream for that long. There's no particular inherent reason why it should be a problem for enterprise-class equipment. Was the Virgin Blue outage linked to power problems?
More generally though, IT systems continue to get more complex, particularly at enterprise level. This means there are lots of difficult-to-predict border cases, which increases overall risk. This firmware mismatch is a good example - whatever the storage vendors tell you (any storage vendor), storage is complex and is too easy to get wrong. That's why so much time and money is spent on testing and validation. And some of this still can't be effectively tested, so you hope that your restoration/back-out/recovery procedures work smoothly. The decision to deploy bleeding edge is a risk/reward equation (whatever the vendor says). The organization in question may be reevaluating its current risk/reward threshold now...
If the scenario below is correct (it seems pretty plausible to me) and they really were using SSDs in a NetApp array, then I understand these are really new - NetApp has historically used solid state storage in caches (PAM, renamed FlashCache) not as drives. It doesn't surprise me that they might have had issues with them. It is a surprise that the issues caused such an outage, though - may have been coincidental?
Disclaimer - I'm not a vendor (I'm sure you guessed that) but I do work for an enterprise - Telstra. These words are my opinion, not Telstra's.
On 21/02/2011, at 4:06 AM, "Philip Argy" <pargy at argystar.com> wrote:
> Could the use of solid state disk arrays make them abnormally vulnerable to
> a power outage? It seems pretty amazing to me that something as simple as a
> power outage would be an issue in this day and age!
> -----Original Message-----
> From: link-bounces at mailman.anu.edu.au
> [mailto:link-bounces at mailman.anu.edu.au] On Behalf Of Rachel Polanskis
> Sent: Friday, October 08, 2010 5:29 PM
> To: Link list
> Subject: [LINK] virgin blue outage
> I was in a meeting today, with some product vendors whose name starts with
> the 15th letter of the alphabet. We briefly discussed the virgin blue
> airline checkout crash. Apparently, those in the know told us that the
> problem was caused by a netapp data server that uses solid state (ssd) disk
> drives in the array. According to the guy that I spoke to, this was a new
> system that is arguably using bleeding edge hardware and the issue was
> caused by firmware mismatches on the drives themselves, vs the netapp RAID
> layer. How true this is I do not know, but the people concerned did seem to
> have some knowledge of the event...
> rachel polanskis
> <r.polanskis at uws.edu.au>
> <grove at zeta.org.au>