[LINK] Diacritics and Search Engines

Roger Clarke Roger.Clarke at xamax.com.au
Wed Jan 16 12:36:16 AEDT 2008


It's embarrassing to have to admit it (because I've done some work in 
this area), but I've just twigged to the obvious: diacritics such as 
umlauts, acutes and cedillas are not handled well by search engines.

In the few languages that I'm familiar with, a letter with a 
diacritic is appropriately treated as a variant of the letter, e.g. 
u-umlaut is still a u (although in some languages the unadorned 
letter may not exist, or the two may be treated as different letters).
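
As a rough illustration of treating an accented letter as a variant of 
its base letter, here is a minimal Python sketch that folds diacritics 
away before comparing a query with a title. The function names 
(fold_diacritics, matches) are hypothetical, and nothing here is meant 
to describe how any particular search engine actually works; it only 
assumes the standard-library unicodedata module.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Return text with combining marks removed and case folded."""
    # NFKD splits a precomposed letter such as U-umlaut into the base
    # letter followed by a separate combining diaeresis.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # casefold() gives a caseless form suitable for comparison.
    return stripped.casefold()


def matches(query: str, title: str) -> bool:
    """Hypothetical helper: substring match on the folded forms."""
    return fold_diacritics(query) in fold_diacritics(title)


if __name__ == "__main__":
    title = "What '\u00dcberveillance' Is, and What To Do About It"
    print(fold_diacritics(title))
    print(matches("uberveillance", title))
```

This folding is deliberately crude. It would not match the German 
transliteration "ue" for u-umlaut, and in languages where the accented 
and unadorned forms are distinct letters it may conflate words that 
should stay separate.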

I tripped over the problem because people have reported that they're 
unable to find my paper from last September:

     What 'Überveillance' Is, and What To Do About It
     [Heaven knows what your email-client did with the u-umlaut ...]
     http://www.anu.edu.au/people/Roger.Clarke/DV/RNSA07.html

If linkers can point to sources that explain this to dubbos like me, 
and what to do about it, I'd greatly appreciate the assistance.

-- 
Roger Clarke                  http://www.anu.edu.au/people/Roger.Clarke/
			            
Xamax Consultancy Pty Ltd      78 Sidaway St, Chapman ACT 2611 AUSTRALIA
                    Tel: +61 2 6288 1472, and 6288 6916
mailto:Roger.Clarke at xamax.com.au                http://www.xamax.com.au/

Visiting Professor in Info Science & Eng  Australian National University
Visiting Professor in the eCommerce Program      University of Hong Kong
Visiting Professor in the Cyberspace Law & Policy Centre      Uni of NSW


