[LINK] Diacritics and Search Engines

Roger Clarke Roger.Clarke at xamax.com.au
Wed Jan 16 13:25:34 AEDT 2008


At 12:43 +1100 16/1/08, Alastair Rankine wrote:
>http://www.google.com.au/search?q=%C3%9Cberveillance&cr=countryAU&sourceid=mozilla-search&start=0
>Your paper appears as the second result.

I accept that, thanks, but my point wasn't that the paper couldn't be 
found, nor that the u-umlaut isn't supported.

The point I'm making is that a search on <uberveillance> doesn't 
locate documents that contain the string <Xberveillance>, where X = 
u-umlaut / Ü / %C3%9C (depending on the character-set and encoding 
that's used).
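
To illustrate how the same character ends up as different strings 
depending on the encoding, here is a minimal Python sketch. It uses only 
the standard library; the variable names are made up for the example, 
and it says nothing about how any particular search engine handles the 
query internally.

```python
from urllib.parse import quote, unquote

# The same character produces different percent-encoded forms
# depending on which encoding is applied before escaping.
upper_utf8 = quote("Ü")                         # UTF-8 is quote()'s default
upper_latin1 = quote("Ü", encoding="latin-1")   # single-byte ISO 8859-1
lower_utf8 = quote("ü")                         # lower-case u-umlaut, UTF-8

print(upper_utf8, upper_latin1, lower_utf8)

# Decoding the query string from the URL quoted above.
decoded = unquote("%C3%9Cberveillance")
print(decoded)
```

Note that %C3%9C is specifically the UTF-8 form of the upper-case 
character; the lower-case one encodes differently.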


>Note that Ü is a different character to U, no surprise that it 
>should change the search term.

I'm arguing that, for a number of purposes, the set 'u-umlaut' is a 
subset of the set 'u', and that searches need to deal with that 
relationship in some way.

That's not a character-set or encoding issue, but a service and hence 
application issue.
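
One way an application could deal with that relationship is to fold 
diacritics on both the query and the indexed text before comparing them. 
The following is only a sketch under that assumption: fold_diacritics 
and matches are hypothetical names, and this is not a description of 
what any search service actually does.

```python
import unicodedata

def fold_diacritics(text: str) -> str:
    """Reduce text to a base-letter, case-insensitive comparison key."""
    # NFKD splits a precomposed letter such as U-umlaut into the base
    # letter followed by a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return stripped.casefold()

def matches(query: str, document: str) -> bool:
    """Hypothetical helper: substring match on the folded forms."""
    return fold_diacritics(query) in fold_diacritics(document)

print(fold_diacritics("Überveillance"))
print(matches("uberveillance", "A paper on Überveillance"))
```

The folding is applied at the application level, on top of whatever 
character-set and encoding the documents happen to use.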


Addendum:  A search on <Ueberveillance> also finds nothing.

The character-pair / digraph 'ue' is both the origin of the umlaut 
and conventional usage in German in a variety of contexts (i.e. not 
just when using 7-bit ASCII).

It's particularly common to see 'Ue' when the u-umlaut occurs at the 
beginning of a sentence or the beginning of a proper noun (because in 
German the first letter of all nouns is capitalised, not just the 
first letters of names and of words in titles).
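
To cover the 'ue' convention as well, a search application could 
generate several comparison keys per term: the term as written, the 
form with umlauts expanded to vowel-plus-e, and the form with the 
diacritic simply dropped. A minimal sketch, assuming that approach (the 
table and function names are hypothetical, and only the three German 
umlaut vowels are handled):

```python
import unicodedata

# Conventional German replacements for umlaut vowels.
UMLAUT_DIGRAPHS = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
}

def search_keys(term: str) -> set[str]:
    """Return the case-folded variants under which a term could be indexed."""
    # NFC ensures each umlaut is a single precomposed character,
    # so the lookup table above applies.
    nfc = unicodedata.normalize("NFC", term)
    expanded = "".join(UMLAUT_DIGRAPHS.get(ch, ch) for ch in nfc)
    # NFKD plus removal of combining marks drops the diacritic entirely.
    decomposed = unicodedata.normalize("NFKD", nfc)
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return {nfc.casefold(), expanded.casefold(), stripped.casefold()}

def same_word(a: str, b: str) -> bool:
    """Two terms match if they share at least one key."""
    return bool(search_keys(a) & search_keys(b))

print(sorted(search_keys("Überveillance")))
print(same_word("Ueberveillance", "Überveillance"))
print(same_word("uberveillance", "Überveillance"))
```

With this scheme <Ueberveillance> and <uberveillance> each match the 
umlauted form, but not each other, since neither contains an umlaut to 
expand or strip.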


-- 
Roger Clarke                  http://www.anu.edu.au/people/Roger.Clarke/

Xamax Consultancy Pty Ltd      78 Sidaway St, Chapman ACT 2611 AUSTRALIA
                    Tel: +61 2 6288 1472, and 6288 6916
mailto:Roger.Clarke at xamax.com.au                http://www.xamax.com.au/

Visiting Professor in Info Science & Eng  Australian National University
Visiting Professor in the eCommerce Program      University of Hong Kong
Visiting Professor in the Cyberspace Law & Policy Centre      Uni of NSW


