[LINK] FW: Australia's online history 'facing extinction'

Paul Koerbin pkoerbin at nla.gov.au
Thu May 7 17:01:49 AEST 2009

The 2.9 TB figure is for the PANDORA Archive only, which is a selective, permissions-based (there is no legal deposit to support this activity) and quality-checked collection. Even the best crawl robots in the world do not do a thorough job of collecting the content, so we check it, patch it up and make it function in the archive context.

You can of course just send out robots to trawl whatever they can collect, as the Internet Archive does and as we have done in collaboration with them for the past four years too. The domain harvest collections amount to nearly 80 TB of data. Last year we collected 1 billion files from the .au domain, amounting to around 35 TB of data if memory serves me correctly (I'm on leave at present). But don't ask to look at it, because you can't.

And yes, collecting web material for preservation purposes is a costly exercise for legal, technical and procedural reasons. We have received no funding for it, so it has to come out of operational budgets. Consequently what we can do is much more limited than is desirable. This was part of the rationale for my opinion piece on the ABC online opinion site today.

Paul Koerbin
Manager Web Archiving

From: link-bounces at mailman1.anu.edu.au [link-bounces at mailman1.anu.edu.au] On Behalf Of Michael Still [mikal at stillhq.com]
Sent: Thursday, 7 May 2009 3:02 PM
To: brd at iimetro.com.au
Cc: link at anu.edu.au
Subject: Re: [LINK] Australia's online history 'facing extinction'

Bernard Robertson-Dunn wrote:

> Australia's online history 'facing extinction'
> By Brigid Andersen
> ABC News
> http://www.abc.net.au/news/stories/2009/05/07/2563251.htm


> "We've been going for quite a while and we have at the moment about
> 2.9 terabytes of information, but that is a drop in the ocean to what
> has been produced since 1996," he said.

I agree... That's a pathetic amount of data to have collected. How much
have Australians paid for that? What's their engineering approach? How
do they decide what to archive?

Link mailing list
Link at mailman.anu.edu.au
