[LINK] RFI: Mirroring Complete Web-Sites?
Steven Clark
steven.clark at internode.on.net
Wed Aug 31 17:43:45 AEST 2011
I've used HTTrack and WebHTTrack on large sites. Among its many options, you can set the User-Agent and ignore robots.txt.
It can also mirror either the logical or the physical architecture of a website, and tag each HTML page with a comment of your choice: date, original URL, etc.
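For anyone curious what those options amount to, here is a rough Python sketch of the three ideas: sending a custom User-Agent, laying files out to follow the site's URL structure, and stamping each saved page with a comment. It is only an illustration and not how HTTrack itself works; the function names, the "mirror" directory and the comment wording are all made up for the example, and nothing here touches the network.

```python
from datetime import date
from pathlib import PurePosixPath
from urllib.parse import urlsplit
from urllib.request import Request


def build_request(url, user_agent):
    """Prepare (but do not send) a request carrying a custom User-Agent."""
    return Request(url, headers={"User-Agent": user_agent})


def local_path(url, root="mirror"):
    """Map a URL to a local file path that follows the site's own layout.

    "mirror" is a hypothetical output directory chosen for this sketch.
    """
    parts = urlsplit(url)
    path = parts.path
    # Directory-style URLs need a file name to be saved under.
    if path == "" or path.endswith("/"):
        path += "index.html"
    return str(PurePosixPath(root) / parts.netloc / path.lstrip("/"))


def tag_page(html, url, when):
    """Append an HTML comment recording the original URL and mirror date."""
    return html + f"\n<!-- Mirrored from {url} on {when.isoformat()} -->\n"


if __name__ == "__main__":
    url = "http://www.example.com/docs/"
    req = build_request(url, "MyMirrorBot/0.1")
    print(req.get_header("User-agent"))
    print(local_path(url))
    print(tag_page("<html></html>", url, date(2011, 8, 31)))
```

A real mirroring tool also has to parse each page for links, rewrite them to point at the local copies, and throttle its requests, which is why I'd reach for HTTrack rather than roll my own.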
I hope one of the suggestions here helps :)
---
Steven R Clark, MACS CP
On 31/08/2011, at 2:44 PM, Peter Johnson <peter at itchymind.com> wrote:
> You can try this free product.
> I have used it on small sites.
>
> http://www.httrack.com/
>
> Pete
>
> -----Original Message-----
> From: link-bounces at mailman.anu.edu.au [mailto:link-bounces at mailman.anu.edu.au] On Behalf Of Roger Clarke
> Sent: Wednesday, 31 August 2011 2:29 PM
> To: link at anu.edu.au
> Subject: [LINK] RFI: Mirroring Complete Web-Sites?
>
> I need to urgently mirror a web-site, which is about to disappear (don't ask, but assume bumbling governmental incompetencies).
>
> I ought to know about things like this, but I don't (don't criticise, but assume bumbling Rogerish incompetencies).
>
> I can't get to a database that's in behind the site, but there's a great deal of HTML that's well worth rescuing.
>
> I'm a mere user / member of the public, and have no ftp privileges, and my first attempts led nowhere (i.e. timed out).
>
> In any case, can an anonymous ftp user do a recursive download?
>
> Is there an easy way to do a bulk download within a browser?
>
> I frequently mirror individual pages in Firefox 3.0.19, using File / Save Page As ... / Web Page Complete
>
> But I don't see a way to get it to follow links or directory-structures.
>
> --
> Roger Clarke http://www.rogerclarke.com/
>
> Xamax Consultancy Pty Ltd 78 Sidaway St, Chapman ACT 2611 AUSTRALIA
> Tel: +61 2 6288 1472, and 6288 6916
> mailto:Roger.Clarke at xamax.com.au http://www.xamax.com.au/
>
> Visiting Professor in the Cyberspace Law & Policy Centre Uni of NSW
> Visiting Professor in Computer Science Australian National University
> _______________________________________________
> Link mailing list
> Link at mailman.anu.edu.au
> http://mailman.anu.edu.au/mailman/listinfo/link