[LINK] Guidelines for Digital Repositories, Canberra, 27 July

Karl Auer kauer at biplane.com.au
Fri Jul 21 11:24:17 AEST 2006

On Fri, 2006-07-21 at 10:29 +1000, Kim Holburn wrote:
> As I understand it in general, digital archiving usually involves  
> uncompressed data with all the bits present.  While you might include  
> compressed versions the base data is uncompressed.

The problem is that compression ain't compression.

Compression to us programmers means reducing the amount of data using a
completely reversible encoding, so that the result after decompression
is identical to the original uncompressed data.
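
A minimal sketch of what "completely reversible" means in practice. Python's standard zlib module is used here only as a stand-in for any lossless codec, and the sample payload is made up for illustration:

```python
import zlib

# Made-up sample payload: repetitive data compresses well.
original = b"archival data " * 1000

compressed = zlib.compress(original, 9)   # 9 = strongest compression level
restored = zlib.decompress(compressed)

print(len(original), "->", len(compressed), "bytes")
print("bit-identical after round trip:", restored == original)
```

The size shrinks, yet the decompressed bytes compare equal to the input. That equality is the whole definition of lossless.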

Compression to video people means *throwing data away* to reduce the
size of the data. It is one of the huge sadnesses of early television,
and it continues to this day, that vast amounts of irreplaceable video
are being discarded during "archiving". Putting film on video also loses
huge amounts of data, as the resolution of film is vastly higher than
that of video.

In the music world the term is totally ambiguous, which is why in the
world of sound the word "compression" is almost invariably preceded
either by the actual method used or by "lossy" or "lossless".

Digital data - including digitised analog data like older video or music
- is almost invariably reversibly compressed for storage, unless it is
unimaginably precious. The problem is that a single corrupted bit in a
compressed data file can render the whole thing unusable. The better the
compression, the worse the effect of corruption. Uncompressed data can
often be used even when corrupted, or techniques can be applied to
recover the corrupted parts. So truly precious data will be stored
uncompressed, but that makes it hugely more expensive. The usual
solution is multiple copies or even multiple copies across different
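
The fragility described above can be sketched by flipping a single bit in a raw copy and in a compressed copy of the same data. This is an illustration under assumptions: zlib stands in for whatever codec an archive might use, the payload is invented, and `flip_bit` is a hypothetical helper written for this example:

```python
import zlib

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)

# Made-up sample payload: 2000 short, slightly varying text records.
original = b"".join(b"frame %06d: scanned ok\n" % i for i in range(2000))
compressed = zlib.compress(original, 9)

# Damage the raw copy: one bit flipped roughly in the middle.
raw_damaged = flip_bit(original, len(original) * 4)
bytes_changed = sum(a != b for a, b in zip(original, raw_damaged))

# Damage the compressed copy the same way, then try to read it back.
try:
    recovered = zlib.decompress(flip_bit(compressed, len(compressed) * 4))
except zlib.error as exc:
    recovered = None
    print("compressed copy unreadable:", exc)

print("raw copy: bytes changed =", bytes_changed)
print("compressed copy recovered intact:", recovered == original)
```

In the raw copy the damage stays confined to one byte. In the compressed copy the same single-bit flip is expected to make the stream either fail to decode or fail its integrity check, so nothing usable comes back without extra recovery effort.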

For a great treatise (but I'm biased) on some of the problems involved
in archiving film footage, check out the introduction of this paper:


One colour feature film, scanned from end to end at a resolution high
enough to capture the film granularity (about 5000 dpi) at a 16-bit
colour depth, will take about 19 terabytes of raw data. Compression is
our friend :-)
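
A rough back-of-envelope check of that figure. Only the 5000 dpi and the 16-bit depth come from the text above; the film gauge, frame pitch, frame rate, running time and the reading of "16 bit" as 16 bits per colour channel are all assumptions made for this sketch:

```python
MM_PER_INCH = 25.4

# Assumptions, not figures from the post: 35 mm stock scanned across its
# full width, 4-perforation frames (19 mm of film per frame), 24 frames
# per second, a 90-minute feature, 3 colour channels at 16 bits each.
film_width_mm = 35.0
frame_pitch_mm = 19.0
fps = 24
minutes = 90
bytes_per_pixel = 3 * 16 // 8

dpi = 5000  # from the post

# Total length of film passing through the scanner, in millimetres.
film_length_mm = minutes * 60 * fps * frame_pitch_mm

# Pixel counts across the strip and along its whole length.
pixels_across = film_width_mm / MM_PER_INCH * dpi
pixels_along = film_length_mm / MM_PER_INCH * dpi
total_bytes = pixels_across * pixels_along * bytes_per_pixel

print(f"film length: {film_length_mm / 1000:.0f} m")
print(f"raw size: {total_bytes / 1e12:.1f} TB ({total_bytes / 2**40:.1f} TiB)")
```

Under these assumptions the total lands at roughly 20 terabytes, the same ballpark as the figure quoted above. A different running time or gauge would shift it proportionally.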

For archival purposes, the highest possible resolution should always be
chosen for storage. We can always reduce it.

BTW, the technique mentioned in the above document has the interesting
effect of being able to archive any optical soundtrack in the same pass,
with *no decode phase*.

Regards, K.

Karl Auer (kauer at biplane.com.au)                   +61-2-64957160 (h)
http://www.biplane.com.au/~kauer/                  +61-428-957160 (mob)
