Dating pages
Tony Barry
tony@ningaui.anu.edu.au
Fri, 9 May 1997 12:12:29 +1000
At 10:01 PM 8/5/97, Stewart Fist wrote:
>Can I make a plea to those who devise Internet and Web protocols to provide
>some inbuilt mechanism in HTML which records the date at which data was
>placed on-line, and when it was last modified.
Most servers now support Server Side Includes. Ningaui is a Mac running
Quid Pro Quo, which supports them, hence the SHTML suffixes on the files.
Server Side Includes provide a variable "LAST_MODIFIED" for the last
modified date of the current file and a command "flastmod" which lets you
get the date for an arbitrary file.
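As a rough illustration of what those two facilities do, here is a small
Python sketch that expands them the way a server might. The directive
syntax shown follows the common NCSA-style convention; whether Quid Pro Quo
accepts exactly this form is an assumption, and the helper names
`format_mtime` and `expand_ssi` and the default date format are made up for
the example.

```python
import os
import re
import time

# NCSA-style directives (assumed syntax):
#   <!--#echo var="LAST_MODIFIED" -->    date of the page itself
#   <!--#flastmod file="other.html" -->  date of an arbitrary file
DIRECTIVE = re.compile(r'<!--#(echo|flastmod)\s+(var|file)="([^"]*)"\s*-->')


def format_mtime(path, fmt="%d %B %Y"):
    """Return the last-modified date of path as text."""
    return time.strftime(fmt, time.localtime(os.path.getmtime(path)))


def expand_ssi(page_path, fmt="%d %B %Y"):
    """Hypothetical helper: read a page and fill in the two date directives."""
    base = os.path.dirname(os.path.abspath(page_path))

    def replace(match):
        command, attr, value = match.groups()
        if command == "echo" and attr == "var" and value == "LAST_MODIFIED":
            return format_mtime(page_path, fmt)
        if command == "flastmod" and attr == "file":
            # Paths are resolved relative to the page's own directory.
            return format_mtime(os.path.join(base, value), fmt)
        return match.group(0)  # leave any other directive untouched

    with open(page_path, encoding="utf-8") as handle:
        return DIRECTIVE.sub(replace, handle.read())
```

Calling `expand_ssi("index.shtml")` on a page containing either directive
returns the page text with the dates filled in; a real server does this on
every request, so the date never goes stale.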
Dublin Core meta data developments support a date element, and W3C is
looking at an expanded META scheme to take in ALL the Dublin Core elements.
This was mentioned at the 4th DC workshop earlier this year.
<http://www.ukoln.ac.uk/metadata/resources/dc4-notes.html>
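To make the date element concrete, here is a hedged Python sketch that
builds a META tag carrying a date. The dotted "DC.date" name follows the
"DC.element" convention discussed for embedding Dublin Core in HTML; the
exact tag form is still under discussion, so treat the output format and
the helper names `dc_meta` and `date_meta` as assumptions.

```python
from datetime import date
from html import escape


def dc_meta(element, value):
    """Build one META tag for a Dublin Core element (assumed DC.<element> naming)."""
    return '<META NAME="DC.%s" CONTENT="%s">' % (
        escape(element, quote=True),
        escape(value, quote=True),
    )


def date_meta(modified):
    """Hypothetical helper: a date element in ISO 8601 form (YYYY-MM-DD)."""
    return dc_meta("date", modified.isoformat())


print(date_meta(date(1997, 5, 9)))
# <META NAME="DC.date" CONTENT="1997-05-09">
```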
> We actually need this
>information automatically in a <META Created date= xxx Modified date=xxx>
>form in the background, as well as becoming standard practice in Text at
>the top of each page. We need the coders to add this as a matter of
>course.
Most house styles now recommend this sort of information as a matter of
course. I put it into the ANU styles back in the days of gopher.
>2. Search engines don't seem to update pages already checked in the past.
>I still find US stuff that has been up on the net for at least a year,
>which doesn't have some of the key words and scientist's names recorded in
>AltaVista and Excite. I really suspect that the engines reject anything
>they've done before with the same HTML Title.
Different search engines follow different algorithms.
See <http://calafia.com/webmasters/chart.htm>
> The need for a basic thesaurus of terms, and encouragement of keyword
>coding in the META Contents=vvv. We have some capacity for indexing terms
>here, but these terms really need to be separated into Descriptors and
>Identifiers, and there needs to be a central, standard thesaurus for
>reference -- and a standardised way of listing names.
The Dublin Core work recognises this, BUT there are many thesauri and few
people skilled in their use. Much of this work is being undertaken in
Australia.
<http://metadata.net/>
>
>Joe Bloggs is found under Joe Bloggs, Joe F. Bloggs, J.F.Bloggs, JF Bloggs,
>Bloggs, JF and probably another thirty variations of the initials with
>'Blogs' and 'Blogg'. He may also be widely known as J. Fred Bloggs, with
>all its variations.
This is not so much a thesaurus problem as one of authorised headings. In
the English speaking world this relies on the Anglo-American Cataloguing
Rules, 2nd edition (AACR2), which are horrifically complex. There is a
conference shortly to look at revising them. The complexity of these rules
accounts for much of the ~$50 labour cost to catalogue a book.
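The scale of the problem shows up even in a toy example. The Python sketch
below collapses the variants quoted above to a crude key of surname plus
first initial. It is nothing like AACR2; the function name `heading_key`
and the matching rule are assumptions for illustration only, and it
deliberately does not try to catch misspellings such as 'Blogs' or 'Blogg'.

```python
import re


def heading_key(name):
    """Reduce a personal name to (surname, first initial), a crude stand-in
    for an authorised heading."""
    name = name.strip()
    if "," in name:
        # Inverted form: "Bloggs, JF"
        surname, given = [part.strip() for part in name.split(",", 1)]
    else:
        # Direct form: the last word is taken as the surname. Dots count
        # as separators so that "J.F.Bloggs" works too.
        parts = [p for p in re.split(r"[.\s]+", name) if p]
        surname, given = parts[-1], " ".join(parts[:-1])
    return (surname.lower(), given[:1].upper())


variants = ["Joe Bloggs", "Joe F. Bloggs", "J.F.Bloggs", "JF Bloggs",
            "Bloggs, JF", "J. Fred Bloggs"]
keys = {heading_key(v) for v in variants}
print(keys)  # {('bloggs', 'J')}
```

A key this coarse would also merge Joe Bloggs with Jane Bloggs, which is
exactly the sort of case the real rules exist to sort out.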
>OK. Why I'm bringing this up, is that this could be some sort of project
>that Australia mounts, and maintains. A couple of decent thesaurii (I
>guess they are like hippopotomii) which we collectively build, based on the
>old ones already in use in library science. When someone is building a
>page, they could then check with the thesaurus site for the standard terms
>to use. And maybe even the old Dewey-decimal codes most relevant.
The Resource Discovery Unit at DSTC <http://www.dstc.edu.au/RDU/> and the
National Library have been looking at this, and there is a lot of metadata
activity in the mapping and environmental areas. The hope for universal or
even widespread use of a single thesaurus is an illusion. In the world of
paper, each of the major indexing services runs its own, for instance. All
you can hope for is islands of organisation applied at the publisher level
or remotely. The work at W3C to integrate the delivery of meta data with
the PICS system provides the prospect of a distributed cataloguing system,
with terms supplied remotely, independent of the variations between
publishers.
See <http://www.dstc.edu.au/RDU/PICS/proposal02.html>
PICS Extension Proposal to support text-based Metadata
>A couple of decent thesaurii (I
>guess they are like hippopotomii) which we collectively build, based on the
>old ones already in use in library science. When someone is building a
>page, they could then check with the thesaurus site for the standard terms
>to use. And maybe even the old Dewey-decimal codes most relevant.
Dewey codes are useful for shelf browsing but are not regarded as a great
information retrieval tool. Dewey is just one of many classifications.
Many major libraries use Library of Congress classification, for instance.
>If we don't do this sort of stuff now, the Web is going to become impossible.
I can't help feeling that perhaps we are approaching the whole question
with the blinkers provided by our print experience where we organise
documents in anticipation of questions and allocate terms to the documents
centrally rather than allocating documents to the questions in a
decentralised way by the users. In part what I am trying to do with the
ningaui link pages is a step in this direction.
Tony
______________________________________________________
mailto:tony@ningaui.anu.edu.au |+61 6 249 5688
http://www.anu.edu.au/People/TonyB.html|+61 6 288 0959
Ningaui Pty Ltd, GPO Box 1680, Canberra City, ACT 2601
Visiting Fellow, Department of Computer Science, FEIT
Australian National University, ACT 0200 AUSTRALIA