[LINK] A new metadata tag proposal
Tue, 27 Nov 2001 22:39:52 +1100
At 4:18 PM +1100 27/11/01, Dr. Bob Jansen wrote:
>Whilst working on information retrieval systems, such as Status,
>at ICL, and later doing R&D at CSIRO into retrieval
>technologies, I learned very quickly that stop words, such as the
>proposal mentioned, only work effectively in well defined contexts.
>For example, we did a prototype of the Hansard system for the Fed
>Parliamentary Library and concluded that we could only have one stop
>word, the word 'a', since all other potential stop words could be
>used as acronyms etc.
We librarians have some experience with stop words. They tend to start
to fall apart when you have multilingual databases.
For instance, "the" is a family name in Vietnamese! In addition,
retrieval systems do not allow for the use of diacritic marks,
which may be applied twice or even three times to the same letter in
Vietnamese. For this reason we always regarded Vietnamese as an
honorary non-Roman script!
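
To make both points concrete, here is a rough Python sketch. It is not taken
from any real retrieval system: the stop-word list, the made-up name
"The Van Minh" and the helper names index_terms, combining_marks and fold are
all assumptions for illustration only. It shows an English stop list silently
dropping a name, a single Vietnamese letter carrying two stacked marks, and
what happens when a system simply strips the marks.

```python
import unicodedata

# Hypothetical English stop-word list, for illustration only.
ENGLISH_STOP_WORDS = {"a", "an", "the", "of", "and"}


def index_terms(text, stop_words=ENGLISH_STOP_WORDS):
    """Lower-case the text, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in stop_words]


def combining_marks(ch):
    """Names of the combining marks in the canonical decomposition of ch."""
    decomposed = unicodedata.normalize("NFD", ch)
    return [unicodedata.name(c) for c in decomposed if unicodedata.combining(c)]


def fold(text):
    """Strip all combining marks, as a diacritic-unaware system effectively does."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# An English title loses only noise words.
print(index_terms("The history of Vietnam"))

# A made-up name typed without diacritics loses its first element entirely.
print(index_terms("The Van Minh"))

# U+1EC7 is a single letter: e with a circumflex and a dot below.
print(combining_marks("\u1ec7"))

# Stripping the marks from "Th\u1ebf" leaves the English article.
print(fold("Th\u1ebf"))
```

The last call is where the two problems meet: once the marks are gone, the
name is indistinguishable from the stop word.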
phone +61 2 6241 7659