[LINK] Remove Rupert from Results Campaign
Tom Worthington
tom.worthington at tomw.net.au
Wed Nov 25 08:37:18 AEDT 2009
LINKGRAM
LINK INSTITUTE
CANBERRA, 24 November 2009: The Link Institute today called on search
engine providers to honour News Corporation Chairman Rupert Murdoch's
wishes and remove all Newscorp publications, and associated media
content, from all web search results. The Link Institute's experts have
offered advice to News Corporation on how to configure their system to
block search engines.
Link Institute Director Professor Klerphell said:
"Mr. Murdoch is clearly frustrated by Google and other search engines
indexing Newscorp content. Research shows that around 25% of traffic to
Newscorp comes from Google (from all search engines the total is
estimated at 50%). Clearly Newscorp is getting twice as much web traffic
as they can handle and they want the search engines to stop sending them
business".
Industry commentators have questioned Murdoch's comments, saying that
Newscorp just has to ask Google to stop indexing it for this to stop.
But Klerphell questions how easy this is:
"The Link Institute carried out a test to see how easy it was to stop
Google indexing content. Our researchers asked the Google search engine
'how to stop Google indexing my content'. The response came back one
tenth of a second later: "Removing my own content from Google -
Webmasters/Site owners Help". However this page was headed 'This Help
Centre is not currently available in your language.'"
The problem apparently is that Mr. Murdoch's staff speak British
English (indicated by "en-uk" in the web address) and Google supplies
its instructions in American English. They were therefore unable to
understand the instructions:
To prevent robots from crawling your site, add the following
directive to your robots.txt file:
User-agent: *
Disallow: /
From:
<http://www.google.com/support/webmasters/bin/answer.py?hl=en-uk&answer=156412>.
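For readers who want to see what that two-line directive actually does, here
is a minimal sketch using Python's standard-library robots.txt parser (the
sample URLs are illustrative, not taken from any real crawl):

```python
from urllib import robotparser

# The two-line robots.txt quoted from the Google help page.
rules = [
    "User-agent: *",
    "Disallow: /",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Any well-behaved robot, Googlebot included, is now barred from every URL.
print(rp.can_fetch("Googlebot", "http://www.theaustralian.com.au/"))       # False
print(rp.can_fetch("Googlebot", "http://www.theaustralian.com.au/news/"))  # False
```

Note that robots.txt is purely advisory: it only works because the major
search engines choose to honour it.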
The Link Institute employed a team of linguists to translate the
instructions into UK English and discovered that it said:
To prevent robots from crawling your site, add the following
directive to your robots.txt file:
User-agent: *
Disallow: /
Following this discovery, a team of web experts obtained a copy of the
current robots.txt file from a Newscorp site
<http://www.theaustralian.com.au/robots.txt>:
User-agent: *
Disallow: /*comments-*
Disallow: /*print/*
Disallow: /*email/*
Disallow: /*SIT*
Disallow: /*.swf
Disallow: /printpage/
Disallow: */404*
Sitemap: http://www.theaustralian.com.au/sitemap.xml
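The patterns in this file use Google's extensions to the original robots.txt
convention, where '*' matches any run of characters. Python's standard-library
parser predates those extensions, so a hand-rolled matcher is sketched below
to show how the quoted rules apply (the sample paths are assumptions for
illustration, not real Newscorp URLs):

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Match a URL path against one robots.txt rule, Google-style:
    '*' matches any run of characters and '$' anchors the end of the
    path. A hand-rolled sketch, not an official implementation."""
    regex = "".join(
        ".*" if ch == "*" else "$" if ch == "$" else re.escape(ch)
        for ch in pattern
    )
    return re.match(regex, path) is not None

# The Disallow patterns quoted above, tried against sample paths.
RULES = ["/*comments-*", "/*print/*", "/*email/*", "/*SIT*",
         "/*.swf", "/printpage/", "*/404*"]

def blocked(path: str) -> bool:
    return any(rule_matches(rule, path) for rule in RULES)

print(blocked("/news/print/story-one"))  # True: caught by /*print/*
print(blocked("/business/markets"))      # False: no rule applies
```

So before the "extensive work" described below, the file only kept crawlers
out of printer-friendly pages, email links and the like, while the sitemap
line actively invited them in.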
Extensive work was then required to incorporate the additional Google
blocking instructions into the Newscorp file, using specialised software
tools ("copy" and "paste"), resulting in:
User-agent: *
Disallow: /
The Link Institute has offered to run a training course for Newscorp
employees to explain how the additional code works and the maintenance
procedures required.
*** END ***
;-)
--
Tom Worthington FACS HLM, TomW Communications Pty Ltd. t: 0419496150
PO Box 13, Belconnen ACT 2617, Australia http://www.tomw.net.au
Adjunct Lecturer, The Australian National University t: 02 61255694
Computer Science http://cs.anu.edu.au/people.php?StaffID=140274