[LINK] Fixing Broken *External* Links
georgebray at gmail.com
Sun Mar 1 18:39:06 EST 2009
Over the years (11 now!) running LinkAlarm we've had many thoughtful
suggestions on how to solve some of these problems.
As Ivan said, it's not just that links break; the content behind them
may change too. This was a particular problem for many of our school
sites: the sites they pointed to would suddenly go away, and
unscrupulous domain name harvesters would put a porn site at the same
domain. To LinkAlarm it would be a link that tested OK. So we
researched how to discern a change in the owner of the domain, and
while this approach worked, it didn't scale at all well (because the
time taken for a whois lookup on each link to be checked was too long).
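
For anyone curious what such an ownership check might look like, here
is a minimal Python sketch. It is not LinkAlarm's actual code. It
assumes a raw WHOIS query over port 43 (RFC 3912), the Verisign server
for .com/.net domains, and that a changed "Creation Date" or
"Registrar" field hints at a re-registration. The function names and
the choice of fields are hypothetical.

```python
import socket

# Assumption: these two fields are a reasonable proxy for "new owner".
WATCHED_FIELDS = ("creation date", "registrar")


def whois_query(domain, server="whois.verisign-grs.com", port=43, timeout=10):
    """Send one raw WHOIS query (RFC 3912) and return the response text."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def extract_fields(whois_text):
    """Pull the first occurrence of each watched field from a WHOIS reply."""
    fields = {}
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        if sep and key in WATCHED_FIELDS and key not in fields:
            fields[key] = value.strip()
    return fields


def ownership_changed(old_text, new_text):
    """Return True if any watched field differs between two WHOIS replies."""
    old, new = extract_fields(old_text), extract_fields(new_text)
    return any(old.get(k) != new.get(k) for k in WATCHED_FIELDS)
```

In use, you would store the reply from whois_query("example.com") when
a link is first checked and compare it with a fresh reply later. Each
call is a full network round trip, which is where the scaling trouble
comes from; caching one reply per domain rather than per link would
reduce the number of lookups, though I can't say by how much.
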
I often think back to the days before the web when Ted Nelson's Xanadu
system had "two-way" links, and the idea of some mechanism to make
sure a document and all its linkages were kept intact. Sadly, the web
we all know came along in 1994 and one-way links are the norm.
Around 2001 we had a competitor whose approach was to scan the
entire web and tally the broken links, so that they had some
intelligence on what might be a good replacement link. But that too
was a problem of scale, and they went away in the dotcom crash.
Just for fun, we check the old Netscape site (1994) every month. As
you can imagine, it's pretty broken by now!