[LINK] Pandora?

Sandra Henderson SHENDERS at nla.gov.au
Wed Jan 3 10:28:06 AEDT 2007

PANDORA doesn't ignore sites hosted outside Australia, and doesn't
ignore Australian sites which aren't in the .au domain.  There are
examples of both in PANDORA.  PANDORA is a labour-intensive operation,
which is why it's impossible to capture ALL Australian sites - all files
are checked for completeness and functionality, to ensure the harvest
has been accurate. This level of hand-crafting is not possible on a much
larger scale. PANDORA is consistently lauded by the web archiving
community for its high standards and achievements.

Conscious of the very selective nature of the PANDORA Archive, the NLA
also contracted the Internet Archive to undertake two comprehensive
harvests of Australian sites.  These harvests, done in 2005 and 2006,
are not available to the public - the copyright situation, for example,
is tricky, which is why PANDORA files are only made available with
permission of the publishers. The files from the two web harvests are
installed here at the NLA - the last one resulted in over 19 terabytes
of data, from 1.2 million sites (over 500 million files), and the
Internet Archive has also prepared a full text index - this index across
the two years of files was the largest the Internet Archive has ever
undertaken.  It does mean we have, for the future, two large
snapshots of the Australian web (and I understand the lists of seed URLs
we provided also ensure we get relevant non-.au sites).

Sandra Henderson
Manager, Research, Coordination Support Branch
National Library of Australia
Phone: +61 2 6262 1481
Fax: +61 2 6273 2545
Email: shenders at nla.gov.au

-----Original Message-----
From: link-bounces at anumail0.anu.edu.au
[mailto:link-bounces at anumail0.anu.edu.au] On Behalf Of Michael Still
Posted At: Thursday, 28 December 2006 12:38 PM
Posted To: Link List
Conversation: [LINK] Pandora?
Subject: Re: [LINK] Pandora?

Adam Todd wrote:
> At 08:07 AM 28/12/2006, Michael Still wrote:
>> Eric Scheid wrote:
>>> no idea if the sitemaps protocol is supported by Pandora yet.
>> Isn't it irrelevant? Pandora claim to be a "selective index" (the 
>> usefulness of such a thing seems low to me, but anyway), and probably
>> wouldn't include anything new.
>> http://pandora.nla.gov.au/selectionguidelinesallpartners.html
> Pandora is a great resource.  Not every web site on the Internet that 
> holds Australian Public interest gets fully archived elsewhere.

Sure. In that case though they should do it properly and do all sites. 
Or partner with someone who can if they can't.

> The Internet Archive Project is probably one of the few resources that
> tries hard with limited resources to do such.
> If you have a web site that has a level of public interest, or future 
> historic  value or even a site that has information that may one day 
> vanish - say when you move house and turn off your server, then 
> Pandora is a place you can approach if they haven't already approached
> you, to at least ensure that something of the fastest vanishing 
> records in history might be preserved.

I asked to be included. I was. Then they dropped me without warning. 
They're also flawed in that their definition of "Australian content" 
ignores (last time I checked):

  - content hosted outside of Australia
  - content on a non-.au domain

There are _lots_ of Australians who host overseas and who don't use a
.au domain, in both cases because the costs of doing it locally are
disproportionally high.

Why not use whois data for such an index? Why not let people opt in?

> I'm sure as time passes, the NLA will find ways to add more resources 
> and storage to the archive facility and increase and broaden the

Until Pandora is a complete archive of all "Australian" content (for
some workable definition), it will be largely useless. Worse, it
encourages others to make assumptions based on bad input data...

> I'd like to think that Pandora archives all the Political web sites 
> for each of those Running Candidates that flings so much crude on 
> their websites that never gets printed in a newspaper.  Those things 
> can come back to bite later :)
> Be nice if Pandora could also archive all School based web sites.  
> They change so often that history and commendations vanish from the 
> eyes of the world as fast as they are created.
> Everyone deserves to be noted.  Time Magazine proved that!

Ok, so then they should do that.

Link mailing list
Link at mailman.anu.edu.au
