[LINK] Fixing Broken *External* Links
Roger Clarke
Roger.Clarke at xamax.com.au
Sun Mar 1 09:01:21 AEDT 2009
<idle thoughts on Sunday morning>
Many links die, but many others merely move.
Wouldn't it be nice if someone maintained a table that contained:
- dead URL
- replacement URL
Because then you could write a routine to run through your site and
replace dead URLs with live equivalents.
(Of course, you'd want a 'report, don't replace' option, and maybe
even a tick-the-boxes option for the ones that you do want replaced.
But there's no point writing the spec if the underlying resource
doesn't exist).
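The routine described above might sketch out like this, assuming the dead-URL/replacement-URL table exists as a simple mapping (all names here are hypothetical, and the 'report, don't replace' option is the default):

```python
# Sketch of a link-fixing routine.  'replacements' is the hoped-for
# table mapping dead URLs to live equivalents; 'report_only' is the
# "report, don't replace" option.
import re

def fix_links(html, replacements, report_only=True):
    """Return (new_html, report), where report lists each dead URL
    found in the page and its proposed replacement."""
    report = []

    def substitute(match):
        url = match.group(1)
        if url in replacements:
            report.append((url, replacements[url]))
            if not report_only:
                return 'href="%s"' % replacements[url]
        return match.group(0)  # leave the link unchanged

    new_html = re.sub(r'href="([^"]+)"', substitute, html)
    return new_html, report
```

The tick-the-boxes variant would just filter `report` before a second, non-report pass.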
On the face of it, hoping for such a service is impossible day-dreaming.
It would require a comparison of every file against every file,
character-by-character. Neither the CIA nor Google has that kind of
capacity.
But wait.
P2P networks use hashes of files as identifiers.
Good hashing algorithms (perfectly? very highly reliably? highly
reliably?) generate distinct hashes for distinct files, i.e. without
collisions / duplicates. So a short hash can stand in for the whole
file, and the every-file-against-every-file comparison collapses into
a comparison of hashes.
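For illustration, a content hash of the kind P2P networks use can be computed with any standard cryptographic digest (this is a generic sketch, not any particular network's scheme):

```python
import hashlib

def file_hash(data, algorithm="sha256"):
    """Return a hex digest identifying the file's content,
    independent of which URL the file happens to live at.
    Collisions between distinct files are astronomically unlikely."""
    h = hashlib.new(algorithm)
    h.update(data)
    return h.hexdigest()
```

Identical content yields an identical hash wherever it is hosted, which is what makes the hash usable as a cross-site identifier.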
So ... if the holders of large volumes of cache and archive content
(e.g. all search-engine operators, archive.org) ran the hash algorithm
over every file in their holdings, they could contribute to a common
database large numbers of entries, each containing:
- hash
- URL
- date-time-stamp
- signed source and reference (for audit, to protect against cowboys)
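A contribution to that common database might look like this. The field names are my own guesses at the schema above, and keying the database by hash means all sightings of the same content end up grouped together:

```python
def contribute(database, content_hash, url, timestamp, source):
    """Record one sighting of a file in the shared database.
    'database' maps content-hash -> list of sightings; the field
    names below are a hypothetical rendering of the entry above."""
    database.setdefault(content_hash, []).append({
        "url": url,              # where the file was found
        "timestamp": timestamp,  # date-time-stamp of the crawl
        "source": source,        # signed contributor, for audit
    })
```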
People who are trying to fix broken external links would then need a
utility that:
- searches that database for the old URL
- uses the hash of the file that was at the old URL to search for a replacement URL
- if there are duplicates, uses the date-time-stamp to pick the newest
- invokes the new URL to see if it's still there
- updates the page that linked to the old URL (or provides a report)
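Put together, the first three steps of that utility sketch out like this, assuming the hash database is a mapping from content-hash to a list of sightings (dicts with 'url' and 'timestamp' keys, a hypothetical schema). The live-check and page-update steps are omitted as site-specific:

```python
def find_replacement(database, old_url):
    """Return the newest alternative URL for the content that used
    to live at old_url, or None if nothing is recorded."""
    # Step 1: find the hash(es) recorded against the old URL.
    hashes = [h for h, sightings in database.items()
              if any(s["url"] == old_url for s in sightings)]
    # Step 2: collect every other URL recorded for the same content.
    candidates = [s for h in hashes for s in database[h]
                  if s["url"] != old_url]
    if not candidates:
        return None
    # Step 3: among duplicates, prefer the newest date-time-stamp.
    return max(candidates, key=lambda s: s["timestamp"])["url"]
```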
Of course, the database won't contain the hashes of the files that
had disappeared before the scheme started. But we'd all be happy to
put up with that boot-strapping problem if the world ran better in
future.
Downside: lazy webmasters who break lots of links every time they do
a refresh of the colour-scheme and/or navigation on their site would
be rewarded for their laziness and sub-professionalism.
Upside: lots of fodder for PhD candidates in post-industrial
archaeology. (Don't laugh. If linguistics has been complemented by
computational linguistics, why not computational digital archaeology?
Alright, do laugh then).
No, I haven't looked to see if anyone's already created such a resource.
(Actually, I'm not sure how to construct a search for such a thing).
And anyway I really do need my morning cup of coffee.
</idle thoughts>
--
Roger Clarke http://www.rogerclarke.com/
Xamax Consultancy Pty Ltd 78 Sidaway St, Chapman ACT 2611 AUSTRALIA
Tel: +61 2 6288 1472, and 6288 6916
mailto:Roger.Clarke at xamax.com.au http://www.xamax.com.au/
Visiting Professor in Info Science & Eng Australian National University
Visiting Professor in the eCommerce Program University of Hong Kong
Visiting Professor in the Cyberspace Law & Policy Centre Uni of NSW