[LINK] IBM Builds Biggest Data Drive Ever

stephen at melbpc.org.au stephen at melbpc.org.au
Mon Aug 29 21:07:31 AEST 2011


IBM Builds Biggest Data Drive Ever

Thursday, August 25, 2011 By Tom Simonite
<http://www.technologyreview.com/computing/38440/>


A data repository almost 10 times bigger than any made before is being 
built by researchers at IBM's Almaden, California, research lab. 

The 120 petabyte "drive" — that's 120 million gigabytes — is made up of 
200,000 conventional hard disk drives working together. 

The giant data container is expected to store around one trillion files 
and should provide the space needed to allow more powerful simulations of 
complex systems, like those used to model weather and climate.

A 120 petabyte drive could hold 24 billion typical five-megabyte MP3 
files or comfortably swallow 60 copies of the biggest backup of the Web, 
the 150 billion pages that make up the Internet Archive's WayBack Machine.
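
The figures quoted above can be cross-checked with a little arithmetic. 
The sketch below assumes decimal units (1 petabyte = 10^15 bytes), which 
the article does not state explicitly. The variable names are 
illustrative, and the per-drive figure is only a raw average that ignores 
any space used for redundant copies.

```python
# Decimal (SI) units are an assumption; the article does not specify them.
MB = 10**6
GB = 10**9
PB = 10**15

capacity_bytes = 120 * PB      # total size quoted in the article
drive_count = 200_000          # number of hard disk drives quoted
mp3_bytes = 5 * MB             # "typical five-megabyte MP3"
archive_copies = 60            # copies of the WayBack Machine backup

# 120 petabytes expressed in gigabytes
capacity_gb = capacity_bytes // GB

# Raw average capacity per drive, in gigabytes (ignores redundancy)
avg_drive_gb = capacity_bytes / drive_count / GB

# Number of five-megabyte MP3 files that fit
mp3_count = capacity_bytes // mp3_bytes

# Size of one WayBack Machine copy implied by "60 copies", in petabytes
implied_archive_pb = capacity_bytes / archive_copies / PB

print(f"{capacity_gb:,} GB in total")
print(f"{avg_drive_gb:.0f} GB per drive on average")
print(f"{mp3_count:,} MP3 files")
print(f"{implied_archive_pb:.0f} PB per implied archive copy")
```

Under these assumptions the totals agree with the article: 120 million 
gigabytes and 24 billion MP3 files, with an average of about 600 
gigabytes per drive and an implied archive size of about 2 petabytes.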

The data storage group at IBM Almaden is developing the record-breaking 
storage system for an unnamed client that needs a new supercomputer for 
detailed simulations of real-world phenomena. However, the new 
technologies developed to build such a large repository could enable 
similar systems for more conventional commercial computing, says Bruce 
Hillsberg, director of storage research at IBM and leader of the project.

"This 120 petabyte system is on the lunatic fringe now, but in a few 
years it may be that all cloud computing systems are like it," Hillsberg 
says. Just keeping track of the names, types, and other attributes of the 
files stored in the system will consume around two petabytes of its 
capacity.

Steve Conway, a vice president of research with the analyst firm IDC who 
specializes in high-performance computing (HPC), says IBM's repository is 
significantly bigger than previous storage systems. "A 120-petabyte 
storage array would easily be the largest I've encountered," he says. 

The largest arrays available today are about 15 petabytes in size. 

Supercomputing problems that could benefit from more data storage include 
weather forecasts, seismic processing in the petroleum industry, and 
molecular studies of genomes or proteins, says Conway.

IBM's engineers developed a series of new hardware and software 
techniques to enable such a large hike in data-storage capacity. Finding 
a way to efficiently combine the thousands of hard drives that the system 
is built from was one challenge. 

As in most data centers, the drives sit in horizontal drawers stacked 
inside tall racks. Yet IBM's researchers had to make those drawers 
significantly wider than usual to fit more disks into a smaller area. 

The disks must be cooled with circulating water rather than standard fans.

The inevitable failures that occur regularly in such a large collection 
of disks present another major challenge, says Hillsberg. IBM uses the 
standard tactic of storing multiple copies of data on different disks, 
but it employs new refinements that allow a supercomputer to keep working 
at almost full speed even when a drive breaks down.
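
The "standard tactic" of keeping multiple copies on different disks can 
be illustrated with a minimal sketch. This is not IBM's method, whose 
refinements the article does not describe; the function names, the 
three-copy default, and the hash-based placement are all assumptions 
made for illustration.

```python
import hashlib


def place_replicas(block_id: str, num_disks: int, copies: int = 3) -> list[int]:
    """Pick `copies` distinct disk indices for a block (hypothetical scheme)."""
    if copies > num_disks:
        raise ValueError("cannot place more copies than there are disks")
    # Hash the block name to get a stable starting disk.
    digest = hashlib.sha256(block_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % num_disks
    # Consecutive indices modulo num_disks are always distinct
    # as long as copies <= num_disks.
    return [(start + i) % num_disks for i in range(copies)]


def readable_replicas(block_id: str, num_disks: int,
                      failed: set[int], copies: int = 3) -> list[int]:
    """Return the disks that still hold a usable copy of the block."""
    return [d for d in place_replicas(block_id, num_disks, copies)
            if d not in failed]


# Usage: one of the three disks holding a block fails.
disks = place_replicas("file-0001", num_disks=200_000)
survivors = readable_replicas("file-0001", 200_000, failed={disks[0]})
print(len(disks), "copies placed;", len(survivors), "still readable")
```

Because each block lives on several disks, a single drive failure leaves 
other copies readable. A real system would also spread copies across 
drawers and racks and rebuild the lost copy in the background, which 
this sketch does not attempt.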

--

Cheers,
Stephen


