[LINK] Diacritics and Search Engines
Roger Clarke
Roger.Clarke at xamax.com.au
Wed Jan 16 13:25:34 AEDT 2008
At 12:43 +1100 16/1/08, Alastair Rankine wrote:
>http://www.google.com.au/search?q=%C3%9Cberveillance&cr=countryAU&sourceid=mozilla-search&start=0
>Your paper appears as the second result.
I accept that, thanks, but my point wasn't that the paper couldn't be
found, nor that the u-umlaut isn't supported.
The point I'm making is that a search on <uberveillance> doesn't
locate documents that contain the string <Xberveillance>, where X =
u-umlaut / Ü / %C3%9C (depending on the character set and encoding
that's used).
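Those forms are one character seen at different layers: %C3%9C is the UTF-8 encoding of the capital letter Ü (U+00DC), percent-encoded for use in a URL. The minimal Python sketch below uses only the standard library, and its variable name is illustrative. It shows the correspondence, including the single-byte Latin-1 form. It also shows that a naive substring test treats U and Ü as unrelated, even after case-folding:

```python
from urllib.parse import quote, unquote

# U+00DC LATIN CAPITAL LETTER U WITH DIAERESIS
u_umlaut = "\u00dc"

print(u_umlaut)                    # Ü
print(u_umlaut.encode("latin-1"))  # b'\xdc'      (one byte in ISO-8859-1)
print(u_umlaut.encode("utf-8"))    # b'\xc3\x9c'  (two bytes in UTF-8)
print(quote(u_umlaut))             # %C3%9C       (the UTF-8 bytes, percent-encoded)
assert unquote("%C3%9C") == u_umlaut

# A naive substring test, even after case-folding, treats U and Ü as unrelated.
print("uberveillance" in "Überveillance".casefold())  # False
```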
>Note that Ü is a different character to U, no surprise that it
>should change the search term.
I'm arguing that, for a number of purposes, the set 'u-umlaut' is a
subset of the set 'u', and that searches need to deal with that
relationship in some way.
That's not a character-set or encoding issue, but a service issue,
and hence an application issue.
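One common application-level technique is "accent folding". Each string is decomposed with Unicode normalization form NFD, which splits Ü into U plus U+0308 COMBINING DIAERESIS. The combining marks are then dropped and the remainders compared. Below is a minimal Python sketch. It assumes a plain substring test stands in for a real search index, and `fold_diacritics` and `matches` are hypothetical names. It is not a claim about how any particular search engine works:

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Return a diacritic- and case-insensitive form of text.

    NFD splits a precomposed letter such as 'Ü' (U+00DC) into 'U' followed by
    U+0308 COMBINING DIAERESIS; characters in category Mn (nonspacing marks)
    are then dropped, and the result is case-folded.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.casefold()


def matches(query: str, document: str) -> bool:
    # Hypothetical stand-in for a search index: substring test on folded forms.
    return fold_diacritics(query) in fold_diacritics(document)


print(matches("uberveillance", "a paper on Überveillance"))  # True
print(matches("Überveillance", "uberveillance and society"))  # True
```

Folding is deliberately lossy. It suits cases where 'ü' should count as a kind of 'u', but not languages such as Swedish, where 'ä' and 'ö' are distinct letters of the alphabet. It also does nothing to relate 'ue' to 'ü'.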
Addendum: A search on <Ueberveillance> also finds nothing.
The character pair (digraph) 'ue' is both the origin of the umlaut
and, in German, its conventional substitute in a variety of contexts
(i.e. not just when restricted to 7-bit ASCII).
It's particularly common to see 'Ue' when the u-umlaut occurs at the
beginning of a sentence or of a noun (because in German the first
letter of every noun is capitalised, not just the first letters of
names and of words in titles).
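A service might also honour the 'ue' convention by indexing each term under more than one key: a diacritic-stripped form and a German transliterated form (ä→ae, ö→oe, ü→ue, ß→ss). A query then matches when any of its keys coincides with a key of the indexed term. Below is a minimal Python sketch. The expansion table assumes German-language rules, and `search_keys` and `term_matches` are hypothetical names:

```python
import unicodedata

# Assumption: German-language transliteration rules for umlauts and eszett.
GERMAN_EXPANSIONS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def strip_marks(text: str) -> str:
    """Remove nonspacing marks after NFD decomposition ('Ü' -> 'U')."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def search_keys(term: str) -> set[str]:
    """Return every key under which a term is indexed or looked up."""
    # NFC first, so that a decomposed 'U' + U+0308 is seen as 'Ü' by translate().
    nfc = unicodedata.normalize("NFC", term)
    return {
        strip_marks(nfc).casefold(),                  # 'Überveillance' -> 'uberveillance'
        nfc.translate(GERMAN_EXPANSIONS).casefold(),  # 'Überveillance' -> 'ueberveillance'
    }


def term_matches(query: str, indexed_term: str) -> bool:
    # A match needs at least one key in common.
    return not search_keys(query).isdisjoint(search_keys(indexed_term))


print(term_matches("Ueberveillance", "Überveillance"))  # True
print(term_matches("uberveillance", "Überveillance"))   # True
print(term_matches("uberveillance", "surveillance"))    # False
```

The sketch only expands umlauts. It does not turn 'ue' back into 'ü', because that reverse mapping is ambiguous: German words such as 'neue' and 'Feuer' contain 'ue' with no umlaut involved. As a result, a query on 'uberveillance' still does not match a document that spells the word 'Ueberveillance'.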
--
Roger Clarke http://www.anu.edu.au/people/Roger.Clarke/
Xamax Consultancy Pty Ltd 78 Sidaway St, Chapman ACT 2611 AUSTRALIA
Tel: +61 2 6288 1472, and 6288 6916
mailto:Roger.Clarke at xamax.com.au http://www.xamax.com.au/
Visiting Professor in Info Science & Eng Australian National University
Visiting Professor in the eCommerce Program University of Hong Kong
Visiting Professor in the Cyberspace Law & Policy Centre Uni of NSW