[LINK] Fwd: [padiforum-l] New tool available: File Information Tool Set (FITS)

Antony Barry tony at tony-barry.emu.id.au
Fri Aug 7 20:59:22 AEST 2009



Begin forwarded message:

> From: Andrea Goethals <andrea_goethals at harvard.edu>
> Date: 7 August 2009 4:40:18 AM
> To: digipres at ala.org, diglib at infoserv.inist.fr, padiforum-l at nla.gov.au
> Cc: Spencer McEwen <Spencer_McEwen at harvard.edu>, Vitaly Zakuta <vitaly_zakuta at harvard.edu>
> Subject: [padiforum-l] New tool available: File Information Tool Set (FITS)
> Reply-To: padiforum-l at nla.gov.au
>
> File Information Tool Set (FITS):  http://fits.googlecode.com
>
> With the increase in web archiving and other born-digital projects
> that introduce new formats and genres to our digital preservation
> repositories, it is becoming more important that our tools support a
> wide range of file formats. In particular, our file format
> identification, validation and metadata extraction tools should work
> with a broad range of formats and genres. A number of such tools
> exist, but no single one of them can both
> support a wide range of formats and extract the technical metadata
> necessary to fully characterize digital content.
>
> In the fall of 2008, Harvard University Library began development of
> the File Information Tool Set (FITS) in response to this need. FITS
> acts as a wrapper around multiple open source file format
> identification, validation and metadata extraction tools. FITS invokes
> and manages the output of these tools. The native output from these
> tools is converted into a common format, "FITS XML", compared to one
> another and consolidated into a single XML output file. The tools
> currently wrapped by FITS are:
>
> * JHOVE
> * Exiftool from Phil Harvey
> * National Library of New Zealand Metadata Extractor
> * DROID from the UK National Archives
> * Ffident from Marco Schmidt
> * File Utility
>
> In addition, FITS includes two original tools: FileInfo and
> XmlMetadata. There are a number of tools that will be evaluated for
> incorporation into FITS in the future, including:
>
> * Apache Tika
> * JHOVE 2
> * Aduna Aperture
> * MediaInfo
>
> FITS is written in Java and is compatible with Java 1.5 or higher.
> FITS can be invoked by its command-line interface or through its Java
> API.
>
> FITS produces a “status” value for each format identification it
> makes. When the status is SINGLE_RESULT, all tools that were able to
> identify the format agree on the file’s format. When the status is
> CONFLICT, there is more than one purported format identified for the
> file. Because FITS combines the output of multiple tools, it has to be
> able to handle conflicts when the tools’ outputs don’t agree. It
> handles such conflicts in several ways:
>
> * Tool output is normalized before it is compared for conflicts. For
> example, one tool might report a file’s format as “PNG”, while another
> reports it as “Portable Network Graphics”. Similarly, one tool might
> output the resolution unit as “2” and another as “inches”. These values
> are normalized in the XSLT file that converts each tool’s native output
> to FITS XML, before the FITS XML from the different tools is compared.
> * Users configure a tool ordering preference. In cases of format
> identification conflicts, the format identified by the preferred tools
> will determine the format FITS reports.
> * Tools can be excluded from reporting on particular formats and/or
> particular metadata elements if their output is found in testing to be
> incorrect or buggy. This makes it possible to incorporate a tool into
> FITS for the things it does well without having to accept information
> from it that is known to be unreliable.
> * FITS consults a configurable “format tree” to know when two reported
> formats for a file are not really conflicts because one of the formats
> is a more specific form of the other format. For example, the format
> tree records that the OpenDocument Text format is a more specific form
> of the Zip format. If FITS tools identify a file as being in both of
> these formats, this is not reported as a conflict, because technically
> both are correct. Instead, the more specific format, OpenDocument
> Text, is reported as the format.
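The conflict-handling rules listed above (normalization, per-tool exclusions, the format tree and tool preference ordering) can be combined into one small identification routine. This Python sketch is not FITS code. The alias table, the format-tree entry, the tool names and the `UNKNOWN` status for the no-result case are hypothetical. Treating a parent/child pair such as Zip and OpenDocument Text as `SINGLE_RESULT` is an assumption, since the announcement only says such a pair is not reported as a conflict.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical normalization table: raw tool label (lower-cased) -> common name.
FORMAT_ALIASES = {
    "png": "Portable Network Graphics",
    "portable network graphics": "Portable Network Graphics",
    "zip": "ZIP Format",
    "zip format": "ZIP Format",
}

# Hypothetical "format tree": specific format -> the more general format it is a form of.
FORMAT_PARENT = {"OpenDocument Text": "ZIP Format"}


def normalize(raw: str) -> str:
    """Map a tool's raw format label onto the common vocabulary."""
    cleaned = raw.strip()
    return FORMAT_ALIASES.get(cleaned.lower(), cleaned)


def ancestors(fmt: str) -> Set[str]:
    """All more general formats of fmt, walking up the format tree."""
    found: Set[str] = set()
    parent = FORMAT_PARENT.get(fmt)
    while parent is not None and parent not in found:  # guard against cycles
        found.add(parent)
        parent = FORMAT_PARENT.get(parent)
    return found


def identify(
    reports: List[Tuple[str, str]],  # (tool name, raw format label)
    preference: List[str],           # tools, most preferred first
    excluded: Dict[str, Set[str]],   # tool -> formats it must not report on
) -> Tuple[str, Optional[str]]:
    votes: Dict[str, List[str]] = {}
    for tool, raw in reports:
        fmt = normalize(raw)
        if fmt in excluded.get(tool, set()):
            continue  # known to be unreliable for this format: ignore it
        votes.setdefault(fmt, []).append(tool)
    if not votes:
        return "UNKNOWN", None  # hypothetical status, not named in the announcement

    # A format that is only a more general form of another reported format
    # is not a competing answer, so drop it (e.g. ZIP vs OpenDocument Text).
    candidates = {
        fmt: tools
        for fmt, tools in votes.items()
        if not any(fmt in ancestors(other) for other in votes)
    }
    if not candidates:
        candidates = votes  # only possible with a cyclic (malformed) tree
    if len(candidates) == 1:
        return "SINGLE_RESULT", next(iter(candidates))

    # Genuine conflict: report the format backed by the most preferred tool.
    rank = {tool: i for i, tool in enumerate(preference)}

    def best_rank(fmt: str) -> int:
        return min(rank.get(tool, len(rank)) for tool in candidates[fmt])

    return "CONFLICT", min(candidates, key=best_rank)


print(identify([("toolA", "PNG"), ("toolB", "Portable Network Graphics")], [], {}))
# ('SINGLE_RESULT', 'Portable Network Graphics')
print(identify([("toolA", "OpenDocument Text"), ("toolB", "Zip")], [], {}))
# ('SINGLE_RESULT', 'OpenDocument Text')
print(identify([("toolA", "PNG"), ("toolB", "GIF")], ["toolB", "toolA"], {}))
# ('CONFLICT', 'GIF')
```

Keeping the aliases, the tree and the exclusions as plain data mirrors the announcement's point that these are configurable rather than hard-coded into the comparison logic.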
>
> FITS is available to the public under the LGPL license. Harvard
> University Library (HUL) plans to use FITS in production in 2010
> within its ingest service, but is making an early release of it
> available now for testing at http://fits.googlecode.com. Additional
> tools are being written at HUL to convert FITS XML into MIX, textMD,
> documentMD and other technical metadata schemas.
>
> We invite you to download and try FITS. Any problems you encounter can
> be reported on the Issues page of the FITS website
> (http://code.google.com/p/fits/issues/list). For more information,
> please see the FITS website (http://fits.googlecode.com) or contact me
> directly.
>
> -- 
> Andrea Goethals
> Digital Preservation and Repository Services Manager
> HUL - Office for Information Systems
> 90 Mt. Auburn Street
> Cambridge, MA 02138
> phone: (617) 495-3724
> andrea_goethals at harvard.edu
>

Phone:02 6241 7659, Mobile:04 3365 2400, Skype:antonybbarry
Email:tony at Tony-Barry.emu.id.au, antonybbarry at gmail.com
http://www.facebook.com/people/Antony-Barry/1386242004
http://tony-barry.emu.id.au