[LINK] Pandora?

Adam Todd link at todd.inoz.com
Wed Jan 3 11:32:10 AEDT 2007

At 10:28 AM 3/01/2007, Sandra Henderson wrote:
>PANDORA doesn't ignore sites hosted outside Australia, and doesn't
>ignore Australian sites which aren't in the .au domain.  There are
>examples of both in PANDORA.  PANDORA is a labour-intensive operation,
>which is why it's impossible to capture ALL Australian sites - all files
>are checked for completeness and functionality, to ensure the harvest
>has been accurate.

Ergh, I dunna envy you (and the Pandora team) Sandra!  There is a lot of 
work that goes into archiving properly.

>This level of hand-crafting is not possible on a much
>larger scale. PANDORA is consistently lauded by the web archiving
>community for its high standards and achievements.
>Conscious of the very selective nature of the PANDORA Archive, the NLA
>also contracted the Internet Archive to undertake two comprehensive
>harvests of Australian sites.  These harvests, done in 2005 and 2006,
>are not available to the public - the copyright situation, for example,
>is tricky,

Tell me where to sign!

Seriously.  Although I ensure archiving of our web sites and public access 
information here, if my resources vanish, so does the information.

Internet Archive and Pandora are VITAL as library storehouses of history 
that is there one moment and gone the next.

OK I've had my web sites and public access information online since the 
early 1990s and I'm still here.  Not bad when you think about it.

But with upcoming relocations and new offices being set up around the 
world, a change in my career path and new goals and aspirations, much of my 
Internet History is going to vanish and along with that, much information 
collected and gathered in the public interest and published.

I'll try to "house it" all in one location, but it's not always going 
to be easy for me to do.

I've also used Internet Archive to recover our oldest web sites, which we 
updated without giving a thought to the fact that they really should have 
been burned to CD for archival purposes.

I used to see the Internet as an evolutionary process, not a historic process.

Now I realise that history is the discovery of the evolutionary process, 
and I keep more accurate archives.

I don't want the world in 2 million years trying to speculate about what I 
did, how and when!  I'd rather give it to them in some form!

>which is why PANDORA files are only made available with
>permission of the publishers. The files from the two web harvests are
>installed here at the NLA - the last one resulted in over 19 terabytes
>of data, from 1.2 million sites (over 500 million files),


>and the Internet Archive has also prepared a full text index - this index 
>across the two years of files was the largest the Internet Archive has ever

It's quite scary when you think about it.

>It does mean we do have, for the future, two large
>snapshots of the Australian web (and I understand the lists of seed URLs
>we provided also ensure we get relevant non-.au sites)

As I said - where do we sign?

Especially as this year AJ's web site is going to take on a whole new look 
and form!  We'll still have the old version running online.  (Pandora as 
far as I know already has an older snapshot of AJ's web site!)

And we've got two new "Drama" sites coming soon.
