Archive URLs

Overview

This is a small script I whipped up to archive the hyperlinks found in HTML and Markdown pages on the Internet Archive, using standard Linux utilities.

It's not perfect; I suspect the regex used to extract links will need more work, but it seems to work fine.
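As a rough illustration of the extraction step, here is a minimal sketch of pulling links out of HTML and Markdown with grep. The function name extract_links and the regex are assumptions for illustration, not the pattern archiveurl.sh actually uses. It treats a URL as ending at whitespace, a quote, an angle bracket, a parenthesis or a square bracket, which covers href="..." attributes and Markdown [text](url) links.

```sh
# Hypothetical sketch: this is NOT the regex archiveurl.sh actually uses.
# Print every distinct http(s) URL found in the given HTML/Markdown files,
# or in stdin when no files are given.
extract_links() {
    # -o: print only the matched text, -h: omit file names, -E: extended regex.
    # A URL stops at whitespace, a quote, <, >, (, ), [ or ].
    # LC_ALL=C keeps the sort order stable regardless of the user's locale.
    grep -ohE "https?://[^][()<>\"'[:space:]]+" "$@" | LC_ALL=C sort -u
}
```

For example, extract_links README.md index.html prints each link once. Like any regex this simple, it will still keep trailing punctuation, such as a full stop written directly after a URL.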

The script can be a tad slow, as I have configured it to sleep for 2 seconds after each request to the Internet Archive. You can remove this delay if you want, but I'd much rather not send too many requests there and use up their valuable bandwidth.
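A minimal sketch of a request loop with the 2-second pause, assuming the Wayback Machine's Save Page Now endpoint (https://web.archive.org/save/ followed by the target URL). The endpoint and the names SAVE_ENDPOINT, save_url and archive_each are hypothetical, not taken from archiveurl.sh. Passing 0 as the delay is the equivalent of removing the sleep.

```sh
# Assumption: the Save Page Now endpoint; archiveurl.sh may request a different URL.
SAVE_ENDPOINT="https://web.archive.org/save/"

# Request a capture of one URL and print only the response headers.
# -s hides the progress meter, -D - dumps the headers to stdout,
# -o /dev/null discards the body. Redirects are not followed, so a 302
# and its Location header stay visible in the output.
save_url() {
    curl -s -D - -o /dev/null "${SAVE_ENDPOINT}$1"
}

# Run the command named in $1 on each non-empty URL read from stdin,
# sleeping $2 seconds (default 2) after every call to go easy on the Archive.
archive_each() {
    cmd=$1
    delay=${2:-2}
    while IFS= read -r url; do
        [ -n "$url" ] || continue
        "$cmd" "$url"
        sleep "$delay"
    done
}
```

For example, printf '%s\n' https://example.com/ | archive_each save_url submits one URL and prints the headers that come back.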

Usage

Just run archiveurl.sh.

It requires curl to be installed.
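If you wrap the script or reuse parts of it, a small check like the one below fails early when curl is missing. require_cmd is a hypothetical helper, not something archiveurl.sh is known to contain.

```sh
# Hypothetical helper: return non-zero (with a message on stderr)
# if the named command cannot be found in PATH.
require_cmd() {
    if ! command -v "$1" >/dev/null 2>&1; then
        echo "error: '$1' is required but was not found in PATH" >&2
        return 1
    fi
}
```

Calling require_cmd curl || exit 1 near the top of a script stops it before any request is attempted.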

The script outputs the headers of the response it gets back from the Archive. If the status is 302 Found and there is a Location header pointing to a web.archive.org URL, the page has most likely been archived. If you get any other error, check the URL that was submitted for archiving, as the regex may not have extracted it correctly. If that URL looks right, the response could also be a 429 Too Many Requests, which is fairly self-explanatory: just wait a little while and try again. And of course, for anything else, just create an issue.
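A minimal sketch of the same reading of the headers, written as a hypothetical classify_response function (not part of archiveurl.sh) that takes one response's headers on stdin and prints archived, rate-limited or other. It assumes the status line is the first line of the input and that the Location header holds an absolute URL.

```sh
# Hypothetical sketch: classify a single HTTP response from its headers.
classify_response() {
    awk '
        # The status code is the second field of the first line, e.g. "HTTP/1.1 302 Found".
        NR == 1 { status = $2 }
        # Header names are case-insensitive (HTTP/2 sends them in lowercase).
        tolower($1) == "location:" && $2 ~ /^https?:\/\/web\.archive\.org\// { archived = 1 }
        END {
            if (status == "302" && archived) print "archived"
            else if (status == "429") print "rate-limited"
            else print "other"
        }
    '
}
```

For example, classify_response < headers.txt, where headers.txt holds the headers of a single response. An "other" result is the cue to check the submitted URL.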

License

I hereby release this code into the Public Domain, so that it may be used as freely as possible, and that more content is archived for future generations. May not a single bit be lost to the void.

Contributors

gotlougit
