[LINK] Open Source at Digital Preservation Meeting in Canberra

Tom Worthington Tom.Worthington at tomw.net.au
Fri Sep 17 17:56:39 EST 2004


Australian, New Zealand and international archivists and records managers 
are in Canberra this week for several conferences and meetings. I was 
invited along to the one-day "Advances in Digital Preservation: an 
International Working Meeting" on 15 September at the National Archives of 
Australia (NAA) 
<http://www.naa.gov.au/publications/media_releases/international_working.html>. 
The presentations will be on the NAA web site next week. Below are my 
notes. The most interesting part of the day was at the end, when NAA 
described how they built an ELECTRONIC ARCHIVE 
<http://www.naa.gov.au/recordkeeping/preservation/digital/summary.html> and 
released OPEN SOURCE SOFTWARE TO PREPARE CONTENT 
<http://sourceforge.net/projects/xena/>.

GETTING THERE: rather than ride my bicycle 
<http://www.tomw.net.au/2003/bb/> I caught the number 34 bus 
<http://www.action.act.gov.au/timetable.cfm?route=34>. This was an 
enjoyable 30 minutes through the ANU campus down to the National Museum of 
Australia on a peninsula in Lake Burley Griffin, over the lake, past Old 
Parliament House to what used to be known as "West Block", where NAA have 
offices and a museum. This is relevant to the day's events, as the NAA is 
very much part of the political and physical history of Canberra.

INTRODUCTION (Ross Gibbs, NAA): I felt a little out of place as everyone 
else had "Archives" or "Library" on their name tag. The introduction was a 
little daunting with international initiatives and meetings mentioned. But 
AGLS got a mention and so then I felt comfortable 
<http://www.tomw.net.au/2004/dm/mpub.html#agls>. Some quips from the 
introduction quoted from previous debates: "PDF not the way to go", "XML 
not the way to go", "my archive is bigger than yours". ;-)

ARCHIVES INTERNATIONALLY (Andrew McDonald, UK & International Council on 
Archives): "Electronic records: a workbook for archivists" builds on ICA 
Guide and ISO standard. The ISO standard was based on the Australian 
standard so should be of use here. The workbook is planned for release in 
early 2005 
on paper and on the web. I asked Andrew if the document could be in a 
format easily accessible on-line (accessible to the disabled in particular) 
and he said he would pass this on to ICA. <http://www.ica.org>

USA ELECTRONIC RECORDS ARCHIVES (Fynette Eaton, US National Archives and 
Records Administration): Computer based records are the most rapidly 
developing part of the collection. As an example, 9/11 Commission records 
will be transferred before the end of the year and include extensive 
electronic documents. NARA's new system is "Electronic Records Archives" 
(ERA) <http://www.archives.gov/electronic_records_archives/index.html>.

NARA takes the view that it is the translated human readable reproduction 
of binary data which is the electronic record, not the stored binary data 
itself. To me this sounds like nonsense (by this logic microfilm is not a 
record, as you need a machine to read it). However, this might match the 
legal argument which the High Court used to decide that e-documents are 
documents <http://www.tomw.net.au/2004/dm/>.

NARA is experimenting with Data Grids for record access and DoD 5015.2 STD 
was mentioned. NARA's approach is to have a core set of standards and then 
leave implementation details to industry. Two companies were selected: 
Lockheed Martin ($9.5M) and Harris Corporation ($10.6M) to each separately 
work with NARA staff to produce designs and architecture 
<http://www.archives.gov/electronic_records_archives/acquisition/design_contract_award.html>. 
One company will then be selected for implementation.

It is a shame NARA didn't select an open source approach, so that NARA and 
the companies could share their knowledge with each other and the wider 
community. The NARA approach appears similar to the US DoD weapons 
acquisition approach, which is not known for efficiency or success. The 
probability of 
NARA being successful with their exercise seems to me to be about 5%. But 
it will be interesting to see in a year's time how it went.

AUSTRALASIAN INITIATIVES IN DIGITAL RECORD KEEPING (Stephen Ellis, NAA): 
Newer digital media tend to be less stable than older (writable CDs less 
stable than magnetic media). The spatial data community in Australia does 
good work on e-records. The Australasian Digital RecordKeeping Initiative 
(ADRI) 
includes the federal Australian Government, all states and NZ 
<http://www.naa.gov.au/publications/media_releases/digital.html>. A 
framework is being prepared and will cover making and managing records. 
They will then provide guidelines and standards. It appears to me that this 
will be much more specific than the USA approach (and is much more likely 
to succeed). It will likely use XML and be based on NAA's work. Australian 
states and NZ already look to NAA for leadership. For example states use 
AGLS, as does NZ (with the Maori language added 
<http://www.tomw.net.au/2001/nzmmf.html>).

Guidelines will include document formats and transfer methods. I think it 
would be useful if guidelines could include document templates but 
archivists may be reluctant to be so prescriptive 
<http://www.tomw.net.au/2004/dm/acsepub.html>. Transfer systems are likely 
to look like what is used for Web Services 
<http://www.tomw.net.au/2002/ws/> and archive access like DSPACE 
<http://www.tomw.net.au/2004/dm/masticate.html>. Standards to be used are 
ISO 14721:2003 (OAIS Blue Book), ISO 15489, XML. NAA's approach is that 
"records are data" which seems to be slightly different to the USA approach 
(and more realistic). I asked Stephen to make the working documents 
available on-line so the research community can contribute.

* PRESERVATION TECHNOLOGIES AND PROTOTYPES AT SDSC (Richard Marciano): SDSC 
is a supercomputer grid shop <http://www.sdsc.edu/>. Richard had an 
inspiring approach to making information more open and accessible. He 
emphasized that the technology is constantly changing and that archiving 
has to cope with that. What was impressive was that he went from a 
philosophical discussion to building real systems with Linux PC based "data 
bricks" for hundreds of terabytes of storage, accessed using "Storage 
Resource Broker" (SRB) software and via Web Services.

* PROPRIETARY FORMAT TO OPEN SOURCE (Liz Reuben, Department of Family and 
Community Services): This was an excellent down to earth overview of a 
project to convert a very large complex proprietary electronic document 
"Guide to Social Security Law" to an open format. Although this is only 
from 2003 it now sounds like an age ago, with conversion from the 
proprietary format via RTF to HTML and then on to XML. They used the 
Victorian Government VERS standard, but this has now aligned more closely 
with NAA. Those who do not learn from this history are likely to be 
condemned to convert a lot of old electronic documents, so read the 
excellent report of this work "Migrating Records from Proprietary Software 
to RTF, HTML, and XML" by Elizabeth Reuben in Vol. 23 No. 6 (June 2003) 
of Computers in Libraries magazine 
<http://www.infotoday.com/cilmag/jun03/reuben.shtml>.

At that point in the day (just after lunch) the battery went flat in my 
laptop, so I stopped taking notes. From memory, what came next was:

* VICTORIAN ELECTRONIC RECORDS STRATEGY UPDATE (Howard Quenault, Public 
Record Office Victoria): VERS has moved from research to development to 
production with a significant resource input from the Victorian Government 
<http://www.prov.vic.gov.au/vers/vers/default.htm>. As I see it, this 
provides some practical details which have been missing from the NAA's 
approach. It is not complex and it looks like Australian ERM product vendors 
at least will add VERS interfaces to their systems.

* NAA PRESERVATION APPROACH (Cornel Platzer, Andrew Wilson, NAA): A written 
paper "Preservation of digital records the national archives approach" (15 
September 2004) was provided. I couldn't find it on the web but it appears 
to be an update of "National Archives Green Paper: An Approach to the 
Preservation of Digital Records" 
<http://www.naa.gov.au/recordkeeping/er/digital_preservation/Green_Paper.pdf> 
(December 2002). This contains statements such as "When a source is 
combined with a process, a performance is created and it is this 
performance that provides meaning to a researcher." This sounded like 
nonsense to me and a variation on NARA's idea that the binary data is 
not the electronic record. With talk of such ideas as "performance" archivists 
seem to be dancing around (pun intended) the issue that they cannot 
preserve the electronic record in its original form.

Because the software used to create a record will not be available in the 
future, archivists need to transform electronic records into a format 
suitable for long term preservation. Some of the content of the record will 
be lost in transformation and it will not look the same when rendered with 
other than the original software. My view is that archivists need to stop 
confusing this issue with terms such as "performance" and "essence" and get 
on with deciding which parts of the record need to be preserved and what can 
be thrown away. Some of the tools developed by computer scientists to 
formally prove that critical software does what it is supposed to might be 
applied to this task. Some of the techniques from accessible web design, 
which allow for the same documents to be rendered in different ways for 
different people, could also be applied 
<http://www.tomw.net.au/2004/wd/testing.html>.

Leaving the rhetoric to one side, the NAA have built the first version of 
an electronic archive which is about ready to go into production 
<http://www.naa.gov.au/recordkeeping/preservation/digital/summary.html>. 
This takes a cautious approach with three separate systems (using what in 
my Defence days was called an "air gap") with documents to be archived 
checked and staged between the systems. This is perhaps an overly cautious 
approach and some sort of more on-line approach might be introduced later. 
However, the significance of NAA's step should not be lost in the details: 
they have built an electronic archive. But how do you get the electronic 
records into a good format for the archive? That came as the biggest 
surprise of the day:

XENA XML ELECTRONIC NORMALIZING OF ARCHIVES: The NAA then surprised me by 
demonstrating their Xena software. I had been vaguely aware that they were 
doing something and there had been some press items ("Australia's history 
archived in OpenOffice.org" by Steven Deare, LinuxWorld 14/10/2003 
<http://www.linuxworld.com.au/index.php?id=1991153367&fp=2&fpid=1>), but 
that they had actually produced some software, and how they were releasing 
it, came as a complete surprise.

Xena is designed to prepare documents for the electronic archive. What the 
software does is not that exciting: it uses readily available tools such as 
OpenOffice.org to package documents in XML. However, what is exciting is 
that the tool is portable across Microsoft Windows, Apple Mac and Linux 
systems and that it is AVAILABLE NOW AS OPEN SOURCE: 
<http://sourceforge.net/projects/xena/>.
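
To give a feel for what "packaging a document in XML" can mean, here is a 
minimal sketch in Python. It is NOT Xena's code and does not reproduce 
Xena's output format: the element names (package, meta, content) and the 
function names wrap_in_xml and unwrap_from_xml are made up for 
illustration. It only assumes that an archive wants the original bytes, a 
little metadata and a checksum together in one XML file.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET
from typing import Tuple


def wrap_in_xml(name: str, data: bytes) -> str:
    """Wrap raw bytes in a small XML envelope (hypothetical layout)."""
    root = ET.Element("package")
    meta = ET.SubElement(root, "meta")
    ET.SubElement(meta, "name").text = name
    ET.SubElement(meta, "size").text = str(len(data))
    # The checksum lets a later stage confirm the content is unchanged.
    ET.SubElement(meta, "sha256").text = hashlib.sha256(data).hexdigest()
    content = ET.SubElement(root, "content", encoding="base64")
    # Base64 keeps arbitrary binary data safe inside an XML text node.
    content.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(root, encoding="unicode")


def unwrap_from_xml(xml_text: str) -> Tuple[str, bytes]:
    """Recover the name and bytes, checking the stored checksum."""
    root = ET.fromstring(xml_text)
    data = base64.b64decode(root.findtext("content"))
    expected = root.findtext("meta/sha256")
    if hashlib.sha256(data).hexdigest() != expected:
        raise ValueError("checksum mismatch: content altered or damaged")
    return root.findtext("meta/name"), data
```

Calling unwrap_from_xml(wrap_in_xml("minute.txt", b"some bytes")) should 
give back the same name and bytes. A real tool would go further and convert 
the content itself to an open format, rather than just wrapping it.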

It was interesting to hear people earlier in the day talk about guidelines 
they are going to release and software they are going to write, but this had 
none of the impact of NAA doing a LIVE DEMONSTRATION of software which is 
available FREE TO ANYONE.

There is much which could be improved in Xena. OOO's XML format is not 
ideal for efficient, flexible, long term document storage. Given that 
content from Commonwealth agencies will be converted to an XML format for 
long term storage, it would make sense to create the documents in that 
format, or a compatible one, in the first place. This works for creating 
scientific papers in Microsoft Word, so should work for public service 
documents just as well <http://www.tomw.net.au/2004/dm/acsepub.html>. Tim 
Wilson-Brown's work on restructuring documents would be of use 
<http://xml.anu.edu.au/index.php3?config=xpub&dummy=index>. But even if 
OOO's XML format is retained, further optimizations can be performed on it.

What NAA has done is to provide researchers and companies with something to 
work from which is more than just an abstract specification. As an example 
of what could be done, it should be possible for one good undergraduate IT 
student to implement an on-line version of Xena with about a day or two's 
work and add VERS functionality with a couple of days' effort.

However, it will not be the electronic conversion or archiving software 
which makes a success of electronic archives. The software for these is 
relatively simple and, with XML tools becoming available, has become 
something an IT student can do as an undergraduate project. The difficult 
bit is integrating this with the business function of government 
organizations. This might suit the architectural evaluation techniques 
based on empirical research methods used by Nordic Technology 
<http://www.nordic.com.au>.

ps: The workshop was followed by the launch of "Looking Back to the Future: 
30 Years of Keeping Electronic Records in the National Archives of 
Australia" <http://ourhistory.naa.gov.au/>. It made me feel a bit old to 
get a mention in the history of an archive 
<http://ourhistory.naa.gov.au/library/playing_for_keeps.html>. Ironically 
the copy of my paper recommending agencies use HTML is stored as a scanned 
bit map image in PDF. ;-)




Tom Worthington FACS     tom.worthington at tomw.net.au  Ph: 0419 496150
Director, Tomw Communications Pty Ltd             ABN: 17 088 714 309
http://www.tomw.net.au          PO Box 13, Belconnen ACT 2617
Visiting Fellow, Computer Science,  Australian National University
Publications Director,  Australian Computer Society  



