check-pages

Checks various aspects of a web page for correctness.

Install

npm install check-pages --save-dev

If you're using Grunt, the grunt-check-pages package wraps this functionality in a Grunt task.

If you're using Gulp or another framework, the example below shows how to integrate check-pages into your workflow.

Overview

An important aspect of creating a web site is validating the structure, content, and configuration of the site's pages. The checkPages task provides an easy way to integrate this testing into your workflow.

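By providing a list of pages to scan, the task can:

  • Validate that each link points to accessible content (checkLinks)
  • Validate that each page's structure is well-formed XHTML (checkXhtml)
  • Validate caching and compression response headers (checkCaching, checkCompression)
  • Validate that each page responds within a time limit (maxResponseTime)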

Usage

To use check-pages with Gulp, create a task and invoke checkPages, passing the task's callback function. The following example includes all supported options:

var gulp = require("gulp");
var checkPages = require("check-pages");

gulp.task("checkDev", [ "start-development-server" ], function(callback) {
  var options = {
    pageUrls: [
      'http://localhost:8080/',
      'http://localhost:8080/blog',
      'http://localhost:8080/about.html'
    ],
    checkLinks: true,
    onlySameDomain: true,
    queryHashes: true,
    noRedirects: true,
    noLocalLinks: true,
    noEmptyFragments: true,
    linksToIgnore: [
      'http://localhost:8080/broken.html'
    ],
    checkXhtml: true,
    checkCaching: true,
    checkCompression: true,
    maxResponseTime: 200,
    userAgent: 'custom-user-agent/1.2.3',
    summary: true
  };
  checkPages(console, options, callback);
});

gulp.task("checkProd", function(callback) {
  var options = {
    pageUrls: [
      'http://example.com/',
      'http://example.com/blog',
      'http://example.com/about.html'
    ],
    checkLinks: true,
    maxResponseTime: 500
  };
  checkPages(console, options, callback);
});

API

/**
 * Checks various aspects of a web page for correctness.
 *
 * @param {object} host Specifies the environment.
 * @param {object} options Configures the task.
 * @param {function} done Callback function.
 * @returns {void}
 */
module.exports = function(host, options, done) { ... }

Host

Type: Object
Required

Specifies the task environment.

For convenience, console can be passed directly (as in the example above).

log

Type: Function (parameters: String)
Required

Function used to log informational messages.

error

Type: Function (parameters: String)
Required

Function used to log error messages.
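
Any object with log and error functions can serve as the host. As a minimal sketch, the following host records messages in arrays rather than printing them, which can be convenient in tests. The helper name createCollectingHost and the logs/errors properties are hypothetical and not part of check-pages.

```javascript
// Hypothetical helper (not part of check-pages): builds a host object
// that stores messages instead of printing them.
function createCollectingHost() {
  var host = {
    logs: [],
    errors: [],
    // Receives informational messages.
    log: function(message) {
      host.logs.push(String(message));
    },
    // Receives error messages.
    error: function(message) {
      host.errors.push(String(message));
    }
  };
  return host;
}

var host = createCollectingHost();
host.log("Checking page...");
host.error("Bad link");
console.log(host.logs.length, host.errors.length);
```

An object like this would be passed in place of console, as in checkPages(host, options, callback).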

Options

Type: Object
Required

Specifies the task configuration.

pageUrls

Type: Array of String
Default value: undefined
Required

pageUrls specifies a list of URLs for web pages the task will check. The list can be empty, but must be present.

URLs can point to local or remote content via the http, https, and file protocols. http and https URLs must be absolute; file URLs can be relative. Some features (for example, HTTP header checks) are not available with the file protocol.

checkLinks

Type: Boolean
Default value: false

Enabling checkLinks causes each link in a page to be checked for validity (i.e., an HTTP HEAD or GET request returns success).

For efficiency, a HEAD request is made first and a successful result validates the link. Because some web servers misbehave, a failed HEAD request is followed by a GET request to definitively validate the link.
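
The HEAD-then-GET strategy described above can be sketched as follows. This is an illustration only, not the library's implementation: checkLink is a hypothetical name, request is a hypothetical transport function of the form (method, url, callback), and treating only 2xx status codes as success is an assumption.

```javascript
// Sketch of the HEAD-then-GET strategy. `request` is a hypothetical
// function (method, url, callback) that calls back with (err, statusCode).
function checkLink(url, request, done) {
  // Assumption: any 2xx status code counts as success.
  function isSuccess(err, statusCode) {
    return !err && statusCode >= 200 && statusCode < 300;
  }
  request("HEAD", url, function(headErr, headStatus) {
    if (isSuccess(headErr, headStatus)) {
      // A successful HEAD validates the link; no GET is needed.
      return done(true);
    }
    // Some servers mishandle HEAD, so retry with GET before failing.
    request("GET", url, function(getErr, getStatus) {
      done(isSuccess(getErr, getStatus));
    });
  });
}

// Fake transport for demonstration: rejects HEAD, accepts GET.
function fakeRequest(method, url, callback) {
  callback(null, method === "HEAD" ? 405 : 200);
}

checkLink("http://localhost:8080/blog", fakeRequest, function(ok) {
  console.log(ok ? "Link OK" : "Bad link");
});
```

Passing the transport as a parameter keeps the sketch free of network access; a real transport would wrap Node's http or https modules.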

The following element/attribute pairs are used to identify links:

  • a/href
  • area/href
  • audio/src
  • embed/src
  • iframe/src
  • img/src
  • input/src
  • link/href
  • object/data
  • script/src
  • source/src
  • track/src
  • video/src

onlySameDomain

Type: Boolean
Default value: false
Used by: checkLinks

Set this option to true to skip checking links that point to a different domain than the referring page.

This can be useful during development when external sites aren't changing and don't need to be checked.
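
As a rough illustration of the comparison involved, the following sketch resolves a link against its referring page and compares host names using Node's built-in URL class. The helper name isSameDomain is hypothetical, and comparing hostname (ignoring port and protocol) is an assumption; the library may compare differently.

```javascript
// Hypothetical helper: true when `link` (possibly relative) resolves to
// the same host name as the page that refers to it.
function isSameDomain(pageUrl, link) {
  var page = new URL(pageUrl);
  var target = new URL(link, pageUrl); // resolves relative links
  // Assumption: "domain" means the host name, without port or protocol.
  return page.hostname === target.hostname;
}

console.log(isSameDomain("http://localhost:8080/blog", "/about.html"));
console.log(isSameDomain("http://localhost:8080/blog", "http://example.com/"));
```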

queryHashes

Type: Boolean
Default value: false
Used by: checkLinks

Set this option to true to verify that links with file hashes in the query string point to content that hashes to the expected value.

Query hashes can be used to invalidate cached responses when leveraging browser caching via long cache lifetimes.

Supported hash functions are crc32, md5, and sha1, as in these examples:

  • image.png?crc32=e4f013b5
  • styles.css?md5=4f47458e34bc855a46349c1335f58cc3
  • archive.zip?sha1=9511fa1a787d021bdf3aa9538029a44209fb5c4c
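
To make the idea concrete, here is a minimal sketch of verifying a query hash with Node's crypto module. The helper name verifyQueryHash is hypothetical and this is not the library's code; crc32 is left out to keep the sketch dependency-free, and the base URL used to parse relative links is a placeholder.

```javascript
var crypto = require("crypto");

// Hypothetical helper: compares a link's md5 or sha1 query parameter
// with the hash of the downloaded content. Returns null when the link
// carries no supported hash parameter.
function verifyQueryHash(link, content) {
  // The base URL is a placeholder so that relative links can be parsed.
  var params = new URL(link, "http://localhost/").searchParams;
  var algorithms = ["md5", "sha1"];
  for (var i = 0; i < algorithms.length; i++) {
    var expected = params.get(algorithms[i]);
    if (expected !== null) {
      var actual = crypto.createHash(algorithms[i]).update(content).digest("hex");
      return actual === expected.toLowerCase();
    }
  }
  return null;
}

console.log(verifyQueryHash("hello.txt?md5=5d41402abc4b2a76b9719d911017c592", "hello")); // true
```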

noRedirects

Type: Boolean
Default value: false
Used by: checkLinks

Set this option to true to fail the task if any HTTP redirects are encountered.

This can be useful to ensure outgoing links are to the content's canonical location.

noLocalLinks

Type: Boolean
Default value: false
Used by: checkLinks

Set this option to true to fail the task if any links to localhost are encountered.

This is useful to detect temporary links that may work during development but would fail when deployed.

The host names recognized as localhost are:

  • localhost
  • 127.0.0.1 (and the rest of the 127.0.0.0/8 address block)
  • ::1 (and its expanded forms)
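
A check against the host names listed above might look like the following sketch. The helper name isLocalLink is hypothetical and the library's own matching may differ. The sketch relies on Node's URL class, which reports IPv6 host names in brackets and in compressed form.

```javascript
// Hypothetical helper: true when the link's host is one of the
// localhost names listed above.
function isLocalLink(link) {
  var hostname = new URL(link).hostname;
  return hostname === "localhost" ||
    // Any address in the 127.0.0.0/8 block.
    /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/.test(hostname) ||
    // URL compresses expanded IPv6 forms, so one comparison suffices.
    hostname === "[::1]";
}

console.log(isLocalLink("http://localhost:8080/blog"));
console.log(isLocalLink("http://example.com/"));
```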

noEmptyFragments

Type: Boolean
Default value: false
Used by: checkLinks

Set this option to true to fail the task if any links contain an empty fragment identifier (hash) such as <a href="#">.

This is useful to identify placeholder links that haven't been updated.
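
For illustration, an empty fragment can be detected by checking whether the first "#" is also the last character of the link. The helper name hasEmptyFragment is hypothetical; this is a sketch of the idea rather than the library's own test.

```javascript
// Hypothetical helper: true when the link contains a "#" with nothing
// after it, as in href="#" or href="page.html#".
function hasEmptyFragment(href) {
  var index = href.indexOf("#");
  return index !== -1 && index === href.length - 1;
}

console.log(hasEmptyFragment("#"));
console.log(hasEmptyFragment("#top"));
```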

linksToIgnore

Type: Array of String
Default value: undefined
Used by: checkLinks

linksToIgnore specifies a list of URLs that should be ignored by the link checker.

This is useful for links that are not accessible during development or known to be unreliable.

checkXhtml

Type: Boolean
Default value: false

Enabling checkXhtml attempts to parse each URL's content as XHTML and fails if there are any structural errors.

This can be useful to ensure a page's structure is well-formed and unambiguous for browsers.

checkCaching

Type: Boolean
Default value: false

Enabling checkCaching verifies the HTTP Cache-Control and ETag response headers are present and valid.

This is useful to ensure a page makes use of browser caching for better performance.
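
The sketch below shows one way such a header check could be written. The exact rules check-pages applies are not stated here, so the validity tests are assumptions: Cache-Control must contain at least one common directive, and ETag must be a quoted string with an optional W/ prefix. The helper name findCachingIssues is hypothetical.

```javascript
// Hypothetical helper: returns a list of problems found in a response's
// caching headers. `headers` maps lowercase header names to values,
// as Node's http module provides them.
function findCachingIssues(headers) {
  var issues = [];
  var cacheControl = headers["cache-control"];
  var etag = headers["etag"];
  if (!cacheControl) {
    issues.push("Missing Cache-Control header");
  } else if (!/\b(max-age=\d+|no-cache|no-store|public|private)\b/i.test(cacheControl)) {
    // Assumption: at least one common directive must be present.
    issues.push("Invalid Cache-Control header: " + cacheControl);
  }
  if (!etag) {
    issues.push("Missing ETag header");
  } else if (!/^(W\/)?"[^"]*"$/.test(etag)) {
    // Assumption: a quoted string, optionally marked weak with W/.
    issues.push("Invalid ETag header: " + etag);
  }
  return issues;
}

console.log(findCachingIssues({ "cache-control": "public, max-age=3600", "etag": '"abc123"' }));
```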

checkCompression

Type: Boolean
Default value: false

Enabling checkCompression verifies the HTTP Content-Encoding response header is present and valid.

This is useful to ensure a page makes use of compression for better performance.
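
A comparable sketch for the compression check follows. The helper name findCompressionIssue is hypothetical, and the list of accepted encodings is an assumption rather than the library's actual rule.

```javascript
// Hypothetical helper: returns a problem description, or null when the
// Content-Encoding header looks acceptable. `headers` maps lowercase
// header names to values.
function findCompressionIssue(headers) {
  var encoding = headers["content-encoding"];
  if (!encoding) {
    return "Missing Content-Encoding header";
  }
  // Assumption: these are the encodings treated as valid.
  var known = ["gzip", "deflate", "br"];
  if (known.indexOf(encoding.trim().toLowerCase()) === -1) {
    return "Unrecognized Content-Encoding: " + encoding;
  }
  return null;
}

console.log(findCompressionIssue({ "content-encoding": "gzip" }));
```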

maxResponseTime

Type: Number
Default value: undefined

maxResponseTime specifies the maximum amount of time (in milliseconds) a page request can take to finish downloading.

Requests that take more time will trigger a failure (but are still checked for other issues).

userAgent

Type: String
Default value: check-pages/x.y.z

userAgent specifies the value of the HTTP User-Agent header sent with all page/link requests.

This is useful for pages that alter their behavior based on the user agent. Setting the value to null omits the User-Agent header entirely.
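
The three cases (unset, a custom string, and null) can be sketched as follows. The helper name buildRequestHeaders and its defaultUserAgent parameter are hypothetical; the sketch only illustrates the documented behavior and is not the library's code.

```javascript
// Hypothetical helper: builds request headers from the userAgent option.
function buildRequestHeaders(options, defaultUserAgent) {
  var headers = {};
  if (options.userAgent === null) {
    // null: send no User-Agent header at all.
    return headers;
  }
  // undefined: fall back to the default; otherwise use the custom value.
  headers["User-Agent"] = options.userAgent === undefined ?
    defaultUserAgent :
    options.userAgent;
  return headers;
}

console.log(buildRequestHeaders({ userAgent: "custom-user-agent/1.2.3" }, "check-pages/x.y.z"));
console.log(buildRequestHeaders({ userAgent: null }, "check-pages/x.y.z"));
```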

summary

Type: Boolean
Default value: false

Enabling the summary option logs a summary of each issue found after all checks have completed.

This makes it easy to pick out failures when running tests against many pages.

Release History

  • 0.7.0 - Initial release, extract functionality from grunt-check-pages for use with Gulp.
  • 0.7.1 - Fix misreporting of "Bad link" for redirected links when noRedirects enabled.
  • 0.8.0 - Suppress redundant link checks, support noEmptyFragments option, update dependencies.
  • 0.9.0 - Add support for checking local content via the 'file:' protocol, update dependencies.
