[LINK] A new metadata tag proposal

Tony Barry me@Tony-Barry.emu.id.au
Tue, 27 Nov 2001 22:39:52 +1100


At 4:18 PM +1100 27/11/01, Dr. Bob Jansen wrote:
>Whilst working on information retrieval systems, such as Status, 
>whilst at ICL and later doing R&D at CSIRO into retrieval 
>technologies, I learned very quickly that stop words, such as the 
>proposal mentioned, only work effectively in well defined contexts. 
>For example, we did a prototype of the Hansard system for the Fed 
>Parliamentary Library and concluded that we could only have one stop 
>word, the word 'a', since all other potential stop words could be 
>used as acronyms etc.

We librarians have some experience in stop words. They tend to start 
to fall apart when you have multilingual databases.

For instance, "the" is a family name in Vietnamese! In addition, 
retrieval systems do not allow for the use of diacritic marks, 
which may be applied twice or even three times to the same letter in 
Vietnamese. For this reason we always regarded Vietnamese as an 
honorary non-Roman script!
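
A rough sketch of how the two problems compound, assuming a naive 
indexing pipeline that folds diacritics and then applies an English 
stop list. The stop list, the function names and the sample input are 
all made up for illustration and are not taken from any real system:

```python
import unicodedata

# Hypothetical English stop list, for illustration only.
STOP_WORDS = {"a", "an", "the", "of"}


def count_marks(letter):
    """Count the combining marks on a letter after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", letter)
    return sum(1 for ch in decomposed if unicodedata.combining(ch))


def fold_diacritics(text):
    """Strip combining marks, as a system without diacritic support might."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_terms(text):
    """Fold diacritics, lowercase, then drop anything on the stop list."""
    terms = [fold_diacritics(word).lower() for word in text.split()]
    return [term for term in terms if term not in STOP_WORDS]


if __name__ == "__main__":
    sample = "Th\u1ebf Anh"  # illustrative input; first word carries two marks on one letter
    print(count_marks("\u1ebf"))
    print(fold_diacritics(sample))
    print(index_terms(sample))
```

Once the marks are folded away, the first word of the sample is 
indistinguishable from the English article, so the stop list silently 
drops it from the index.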

Tony
-- 
phone  +61 2 6241 7659
mailto:me@Tony-Barry.emu.id.au
http://tony-barry.emu.id.au/people/tony/index.html