404 Checker

This tool sniffs out the HTTP status of links on a page. If a URL returns a 404 (or returns no headers at all), it queries the Wayback Machine's API to see whether a snapshot is available. If one is, we can choose whether to send users to the Wayback snapshot instead of the 404 page.
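The README doesn't say which Wayback endpoint the script calls. The sketch below assumes the public Wayback Machine Availability API (`https://archive.org/wayback/available`), which takes a `url` and an optional `timestamp` and returns JSON with an `archived_snapshots.closest` entry when a snapshot exists. The function names are hypothetical, and `findSnapshot` assumes a global `fetch` (browsers, Node 18+).

```javascript
// Wayback Machine Availability API endpoint (assumed; not confirmed by the README).
const AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available";

// Build the lookup URL. `timestamp` is optional, e.g. a YYYYMMDD pubdate.
function buildAvailabilityUrl(target, timestamp) {
  const u = new URL(AVAILABILITY_ENDPOINT);
  u.searchParams.set("url", target); // URLSearchParams handles the encoding
  if (timestamp) u.searchParams.set("timestamp", timestamp);
  return u.toString();
}

// Pull the closest snapshot URL out of the API's JSON response, or null if there is none.
function closestSnapshotUrl(json) {
  const closest = json && json.archived_snapshots && json.archived_snapshots.closest;
  return closest && closest.available ? closest.url : null;
}

// Hypothetical helper tying the two together.
async function findSnapshot(target, timestamp) {
  const res = await fetch(buildAvailabilityUrl(target, timestamp));
  if (!res.ok) return null;
  return closestSnapshotUrl(await res.json());
}
```

Keeping URL building and response parsing as pure functions means they can be tested without touching the network.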

Best approach?

At the moment, the script goes through three steps:

  1. On `document.ready`, the links are scanned and external links are flagged.
  2. When the user hovers over a link, we use AJAX to call a PHP script that grabs the headers for the URL in question.
  3. If the page returns a 404 header, or doesn't return a header at all, we query the Wayback Machine to see if it has a snapshot.
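Here is a minimal sketch of the decision logic behind these three steps. It is not the project's actual code. `fetchStatus` and `findSnapshot` are hypothetical injected functions standing in for the PHP header check and the Wayback query, and a failed request is represented as a `null` status ("no headers at all").

```javascript
// Step 1: is this link external to the current page's host?
function isExternalLink(href, pageHref) {
  try {
    const link = new URL(href, pageHref); // resolves relative hrefs against the page
    const page = new URL(pageHref);
    const isHttp = link.protocol === "http:" || link.protocol === "https:";
    return isHttp && link.host !== page.host; // note: www.example.com vs example.com counts as external
  } catch {
    return false; // unparseable href: don't flag it
  }
}

// Step 3 trigger: a 404, or no response at all (represented here as null).
function shouldQueryWayback(status) {
  return status === null || status === 404;
}

// Steps 2 and 3 wired together. fetchStatus and findSnapshot are hypothetical:
// fetchStatus might call the PHP header-check script over AJAX,
// findSnapshot might query the Wayback Machine.
async function checkLink(url, fetchStatus, findSnapshot) {
  let status;
  try {
    status = await fetchStatus(url);
  } catch {
    status = null; // treat a failed request like "no headers at all"
  }
  if (!shouldQueryWayback(status)) {
    return { url, status, snapshot: null };
  }
  return { url, status, snapshot: await findSnapshot(url) };
}
```

Injecting the two network calls keeps the flow testable and lets the transport (jQuery AJAX, `fetch`, a server proxy) change without touching the logic.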

A few things:

  1. It would be better to limit the initial link scan to areas of the page known to contain links worth checking, like the main content area. There's no sense in scanning links in areas we already know contain good links.
  2. In theory we could preemptively scan all the links, instead of on hover. This is certainly easier from a programming standpoint, but possibly not so good from a UX and resources standpoint, as we'll be making a bunch of (possibly unneeded) HTTP requests.
  3. Right now the script only checks for 404s and pages that don't resolve at all. There are a lot of other HTTP statuses we could be checking for.
  4. At the request of @waxpancake, the demo page has a fake pubdate of `20060303`, which the script is using to ask for a Wayback snapshot as close to this date as it will give us. If no pubdate is present (or if it's not in Wayback's preferred format: YYYYMMDD), Wayback will default to returning the most recent snapshot.
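Points 3 and 4 could be sketched as follows. This assumes, hypothetically, that 410 (Gone) and 5xx responses are also worth treating as dead links; 5xx errors may be transient, so that choice is debatable. It also assumes the pubdate arrives as a string, and validates it as a real YYYYMMDD calendar date so that a malformed value is dropped and Wayback falls back to its most recent snapshot. Both function names are illustrative.

```javascript
// Statuses treated as "dead" in this sketch. The README's script only handles 404
// and no response; 410 and 5xx are hypothetical additions.
const DEAD_STATUSES = new Set([404, 410]);

function isDeadStatus(status) {
  if (status === null) return true;           // no response at all
  if (DEAD_STATUSES.has(status)) return true; // Not Found / Gone
  return status >= 500 && status <= 599;      // server errors; these may be transient
}

// Return the pubdate if it is a real calendar date in YYYYMMDD form, else null
// (in which case the Wayback lookup would simply omit the timestamp).
function normalizePubdate(raw) {
  if (typeof raw !== "string") return null;
  const s = raw.trim();
  if (!/^\d{8}$/.test(s)) return null;
  const y = Number(s.slice(0, 4));
  const m = Number(s.slice(4, 6));
  const d = Number(s.slice(6, 8));
  // Date.UTC rolls invalid dates over (Feb 30 -> Mar 2), so compare the parts back.
  const date = new Date(Date.UTC(y, m - 1, d));
  const valid =
    date.getUTCFullYear() === y && date.getUTCMonth() === m - 1 && date.getUTCDate() === d;
  return valid ? s : null;
}
```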

If we go with the on-demand approach, we need to decide what to do:

  1. Do we try to replace the URL before the user clicks? Depending on how fast the HTTP check comes in, the user may click before we get a response.
  2. It's possible we've flagged the link as a 404 but don't have a result from the Wayback API yet. Do we capture the click, make a note of it, then push the user to the snapshot URL when it arrives, assuming it arrives promptly? If it doesn't, what then?
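One possible answer to the second question is to capture the click, wait a bounded amount of time for the pending lookup, and fall back to the original link if no snapshot arrives. The sketch below is a hypothetical design, not the project's code. It keys in-flight lookups by URL and assumes each lookup resolves to a snapshot URL or `null`. The time budget is an arbitrary parameter.

```javascript
// In-flight Wayback lookups keyed by the original URL (hypothetical design).
// Each entry resolves to a snapshot URL, or null if there is none or the lookup failed.
const pending = new Map();

function startLookup(url, lookup) {
  if (!pending.has(url)) {
    // Wrap so synchronous throws and rejections both become null.
    pending.set(url, Promise.resolve().then(() => lookup(url)).catch(() => null));
  }
  return pending.get(url);
}

// Resolve to the URL the user should be sent to: the snapshot if it arrives within
// `ms` milliseconds, otherwise the original link.
function destinationFor(url, ms) {
  const lookup = pending.get(url);
  if (!lookup) return Promise.resolve(url); // never flagged: follow the link as normal
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(resolve, ms, null);
  });
  return Promise.race([lookup, timeout]).then((snapshot) => {
    clearTimeout(timer);
    return snapshot || url;
  });
}

// Possible browser wiring (not run here); 1500 ms is an arbitrary budget:
// link.addEventListener("click", (event) => {
//   if (!pending.has(link.href)) return;
//   event.preventDefault();
//   destinationFor(link.href, 1500).then((dest) => { window.location.href = dest; });
// });
```

The cost of this approach is a visible delay of up to `ms` on flagged links. Swapping the `href` as soon as the snapshot arrives (the first question) would avoid that delay whenever the lookup wins the race against the user.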

Please note: I haven't yet checked how this works on touch devices, but I intend to. Since touch devices have no real hover state, we'll likely need a different approach to the link events.

There's a demo over here: http://git.monkeydo.biz/404-checker/.

Contributors

murtaugh