[LINK] Fixing Broken *External* Links

Roger Clarke Roger.Clarke at xamax.com.au
Sun Mar 1 09:01:21 AEDT 2009


<idle thoughts on Sunday morning>

Many links die, but many others merely move.

Wouldn't it be nice if someone maintained a table that contained:
-   dead URL
-   replacement URL

Because then you could write a routine to run through your site and 
replace dead URLs with live equivalents.

(Of course, you'd want a 'report, don't replace' option, and maybe 
even a tick-the-boxes option for the ones that you do want replaced. 
But there's no point writing the spec if the underlying resource 
doesn't exist).

On the face of it, hoping for such a service is impossible day-dreaming.

It would require a comparison of every-file against every-file, 
character-by-character.  Neither the CIA nor Google has that kind of 
capacity.

But wait.

P2P networks use hashes of files as identifiers.

A hash condenses a file of any size into a short, fixed-length 
digest, and good hashing algorithms do that (perfectly?  very 
highly reliably?  highly reliably?) without generating hashes with 
collisions / duplicates.
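As a concrete illustration (a Python sketch, nothing more): a cryptographic digest such as SHA-256 reduces content of any length to a short fixed-size identifier, and identical content yields an identical identifier no matter where it was fetched from.

```python
import hashlib

def content_hash(data: bytes) -> str:
    """Return a short, fixed-length identifier for arbitrary content.

    SHA-256 collisions are so improbable in practice that the digest
    can serve as a near-unique fingerprint for the file's bytes.
    """
    return hashlib.sha256(data).hexdigest()

# The same bytes, notionally served from two different URLs,
# produce the same fingerprint:
a = content_hash(b"the same page, served from two different URLs")
b = content_hash(b"the same page, served from two different URLs")
assert a == b
```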

So  ...  if holders of large volumes of cache and archive (e.g. all 
search-engine operators, archive.org) ran the hash algorithm on every 
file in their holdings, they could contribute to a common database 
large numbers of entries, each of which contained:
-   hash
-   URL
-   date-time-stamp
-   signed source and reference (for audit, to protect against cowboys)
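One way such an entry might look, as a Python sketch (all field names and values here are my own invention, not an existing schema):

```python
from dataclasses import dataclass

@dataclass
class HashEntry:
    # Illustrative field names only -- no such database exists (yet).
    hash: str        # content digest, e.g. a SHA-256 hex string
    url: str         # where the content was observed
    timestamp: str   # when it was observed (ISO 8601)
    source: str      # signed contributor identity, for audit

# A hypothetical contribution from an archive operator:
entry = HashEntry(
    hash="9f86d081884c7d65...",  # illustrative, truncated digest
    url="http://example.org/old/page.html",
    timestamp="2009-03-01T09:01:21+11:00",
    source="archive.example (signature elided)",
)
```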

People who are trying to fix broken external links would then need a 
utility that:
-   searches that database for the old URL
-   uses the hash of the file that was at the old URL to search for a 
    replacement
-   if there are duplicates, uses the date-time-stamp to pick the newest
-   invokes the new URL to see if it's still there
-   updates the page that linked to the old URL (or provides a report)
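In pseudo-Python, the utility's five steps might look like the sketch below. Everything here is hypothetical: `db` stands in for the shared database (with invented `lookup_by_url` / `lookup_by_hash` methods), and `is_live` for a check that a URL still responds.

```python
def find_replacement(old_url, db, is_live):
    """Sketch of the link-repair steps above.

    Returns a live replacement URL, or None if no candidate survives.
    """
    # 1. Search the database for the old URL, to learn its content hash.
    old_entries = db.lookup_by_url(old_url)
    if not old_entries:
        return None
    content = old_entries[0]["hash"]

    # 2. Use that hash to find other URLs that served the same content.
    candidates = [e for e in db.lookup_by_hash(content)
                  if e["url"] != old_url]

    # 3. If there are duplicates, prefer the newest observation.
    candidates.sort(key=lambda e: e["timestamp"], reverse=True)

    # 4. Invoke each candidate URL to see whether it's still there.
    for entry in candidates:
        if is_live(entry["url"]):
            # 5. The caller can now update the linking page, or just report.
            return entry["url"]
    return None
```

The 'report, don't replace' option falls out naturally: the caller decides whether to rewrite the page or merely log the suggested replacement.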

Of course, the database won't contain the hashes of the files that 
had disappeared before the scheme started.  But we'd all be happy to 
put up with that boot-strapping problem if the world ran better in 
future.

Downside:  lazy webmasters who break lots of links every time they do 
a refresh of the colour-scheme and/or navigation on their site would 
be rewarded for their laziness and sub-professionalism.

Upside:  lots of fodder for PhD candidates in post-industrial 
archaeology.  (Don't laugh.  If linguistics has been complemented by 
computational linguistics, why not computational digital archaeology? 
Alright, do laugh then).

No, I haven't looked to see if anyone's already created such a resource.
(Actually, I'm not sure how to construct a search for such a thing).

And anyway I really do need my morning cup of coffee.

</idle thoughts>

-- 
Roger Clarke                                 http://www.rogerclarke.com/

Xamax Consultancy Pty Ltd      78 Sidaway St, Chapman ACT 2611 AUSTRALIA
                    Tel: +61 2 6288 1472, and 6288 6916
mailto:Roger.Clarke at xamax.com.au                http://www.xamax.com.au/

Visiting Professor in Info Science & Eng  Australian National University
Visiting Professor in the eCommerce Program      University of Hong Kong
Visiting Professor in the Cyberspace Law & Policy Centre      Uni of NSW
