
hyperlink's Introduction

Hyperlink


Detect invalid and inefficient links on your webpages. Works with local files or websites, on the command line and as a node library.

Because web performance is not only about making your own page run smoothly, but also about giving people a quick navigation out of your page.

Read some more of the thoughts behind hyperlink in Check your link rot.

Hyperlink is known to:

  • Detect broken links to internal assets
  • Detect broken links to external assets
  • Detect broken links to fragment identifiers
  • Detect missing DNS records on external links
  • Detect inefficient external links that result in a redirect chain
  • Detect miscellaneous syntax errors in your web assets
  • Detect mixed content warnings on TLS secured pages

Todo:

  • Detect inefficient redirects to internal assets
  • Autocorrect inefficient redirects in local files

Installation

$ npm install -g hyperlink

Hyperlink exposes an executable hyperlink in your npm binaries folder.
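
If you prefer a project-local install, something like the following should also work (a sketch; the exact setup is up to you):

$ npm install --save-dev hyperlink
$ npx hyperlink --help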

Usage

Command line usage and options:

hyperlink [options] <htmlFile(s) | url(s)>

Options:
  -h, --help         Show this help                     [default: false]
  --root             Path to your web root (will be deduced from your
                     input files if not specified)
  --canonicalroot    URI root where the project being built will be
                     deployed. Canonical URLs in local sources will be
                     resolved to local URLs
  --verbose, -v      Log all added assets and relations. VERY verbose.
  --recursive, -r    Crawl all HTML-pages linked with relative and root
                     relative links. This stays inside your domain.
  --internal, -i     Only check links to assets within your own web root

  --pretty, -p       Resolve "pretty" urls without .html extension to
                     the .html file on disk             [default: false]
  --source-maps      Verify the correctness of links to source map
                     files and sources.                 [default: false]
  --skip             Avoid running a test where the report matches the
                     given pattern
  --todo             Mark failed tests as todo where the report
                     matches the given pattern
  --concurrency, -c  The maximum number of assets that can be loading
                     at once                               [default: 25]

Hyperlink takes any number of input files or urls. It is recommended that these urls are on the same domain or part of the same web site.

The --root option is only needed for resolving root relative urls in case you are not sending in pages located in the web root.
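
For example, if the page you check lives below the web root but you pass only that page as input, pointing --root at the web root lets root relative urls like /assets/app.css resolve on disk (the paths here are hypothetical):

$ hyperlink --root path/to/webroot path/to/webroot/news/article.html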

Common Use Cases

Checking internal URLs only

Running hyperlink path/to/index.html --canonicalroot https://deployed.website.com/ -r --internal will recursively explore the internal links of your website to ensure internal integrity. It is recommended to make this check part of your build pipeline and block on errors, since any error is very likely to be user facing once your page is deployed.

Running hyperlink path/to/index.html --canonicalroot https://deployed.website.com/ -r will recursively explore all links of your website, internal and external, to ensure that you aren't linking to external resources that have been removed or are otherwise failing. It is not recommended to block your build pipeline on failures of external links, since they are out of your control. Run this mode in a non-blocking way and fix the errors in the report at your leisure. It is recommended to do this regularly, since external assets can move or disappear without warning.
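
A minimal sketch of how the two runs could be wired into a CI script, assuming a POSIX shell and that only the internal check should be allowed to fail the build:

# Blocking: internal links must be intact
hyperlink path/to/index.html --canonicalroot https://deployed.website.com/ -r --internal

# Non-blocking: report external link rot without failing the build
hyperlink path/to/index.html --canonicalroot https://deployed.website.com/ -r || true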

Using a sitemap

Hyperlink understands sitemaps, and if you have one, it is recommended to start hyperlink from it. You might have multiple sitemaps, annotated with Sitemap:-directives in your robots.txt, in which case you can start hyperlink from your robots.txt instead. Run hyperlink path/to/robots.txt or hyperlink path/to/sitemap.xml.

The following sitemap formats are supported:

Reporters

Hyperlink uses the TAP output format, which is somewhat human readable and very machine readable. Use the TAP output in your CI setup, or pipe it through one of these reporters to get improved human readability or an output Jenkins likes.

These reporters are known to work well with hyperlink:

  • tap-spot: Minimal output for non-errors and human readable reports for errors marked as TODO or ERROR

Example:

$ hyperlink https://mntr.dk/ | tap-spot

tee is a very useful program when you want to save and replay TAP output. To save the output to a file while still seeing the logs on stdout, you might run a command line like this:

hyperlink https://mntr.dk -r | tee mntr.dk.tap | tap-spot

License

The MIT License (MIT)

Copyright (c) 2014 Peter Müller [email protected]

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

hyperlink's People

Contributors

akx, albertfdp, bebraw, chrisguttandin, gchuf, greenkeeper[bot], greenkeeperio-bot, munter, papandreou, passy, simon04


hyperlink's Issues

Handle 502 responses to HTTP HEAD request

Hyperlink doesn't recover by attempting an HTTP GET when it encounters a 502 response to an HTTP HEAD request:

$ curl https://jspm.io -I
HTTP/1.1 502 Bad Gateway
Date: Wed, 12 Jul 2017 07:46:38 GMT
Content-Type: text/html; charset=UTF-8
Connection: keep-alive
Set-Cookie: __cfduid=d65c6b17eead63406cc24a080503b3b8d1499845598; expires=Thu, 12-Jul-18 07:46:38 GMT; path=/; domain=.jspm.io; HttpOnly
Expires: Thu, 01 Jan 1970 00:00:01 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
X-Frame-Options: SAMEORIGIN
Server: cloudflare-nginx
CF-RAY: 37d263cbf89e3cef-CPH

See webpack/webpack.js.org#1413
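
A minimal sketch of the desired fallback behaviour, using the global fetch available in Node 18+; this is an illustration, not hyperlink's actual implementation:

async function checkUrl(url) {
  // Try the cheap HEAD request first.
  const head = await fetch(url, { method: 'HEAD' });
  if (head.status < 500) return head.status;
  // Some servers or proxies answer HEAD with 502 even though the resource
  // exists; retry with GET before reporting the link as broken.
  const get = await fetch(url);
  return get.status;
}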

Content-type inference of http assets by extension triggers false negatives

The inference of content-type based on file extension in a URL seems to often cause false negatives in the real world. We should probably delay content-type inference until after the http response has informed us of what the server thinks.

Examples:

HTML-page with .js file extension, referenced from an anchor:

not ok 6716 content-type-mismatch https://github.com/zeit/next.js
  ---
    operator: content-type-mismatch
    expected: "application/javascript"
    actual:   "text/html; charset=utf-8"
    at: build/starter-kits/index.html:21:100926 <a href="https://github.com/zeit/next.js" target="_blank">...</a>
  ...

A PNG image sent through an image resizing service. The original url responds 302 with image/png and the redirect target responds 200 with image/jpeg:

not ok 6493 content-type-mismatch https://github.com/d3viant0ne.png?size=90
  ---
    operator: content-type-mismatch
    expected: "image/png"
    actual:   "image/jpeg"
    at: build/contribute/release-process/index.html:31:165 <img src="https://github.com/d3viant0ne.png?size=90">
  ...
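
A minimal sketch of the proposed order of operations, trusting the Content-Type the final response reports rather than the file extension; an illustration only, not hyperlink's code:

async function reportedContentType(url) {
  // fetch follows redirects by default, so this reflects the final response,
  // e.g. image/jpeg for the resized avatar above rather than image/png.
  const response = await fetch(url, { method: 'HEAD' });
  return response.headers.get('content-type');
}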

Timeout Issues on Travis

Just to be clear this isn't necessarily an issue with this package. It's just that after a conversation with @Munter we decided this would be the most logical place for the discussion.

We've been having some issues when running hyperlink on Travis for the webpack.js.org repository. Essentially, our build and other tests run, but when hyperlink (the lint:links npm script) runs, it takes longer than 10 minutes and Travis terminates the build. This issue has been somewhat inconsistent, but in the last few days it has gotten worse. The odd thing is that locally it runs much faster (~5 min).

One of our contributors (@pierreneter) dug into it a bit and came to the conclusion that it's probably Travis' network that's slowing things down. I guess my questions are just…

  • Has anyone run into this before and is the Travis network theory correct?
  • Does anyone have ideas for a workaround or solution to this?

I tried travis_wait by the way, which I guess might be one solution, but I think we’d have to rework our .travis.yaml file a bit to get that to work.

iconv issue?

$ hyperlink http://h3manth.com/new
TAP version 13
# Crawling internal assets
ok 1 loading http://h3manth.com/new
not ok 2 should not have any errors loading asset
  ---
    operator: error
    expected:

    actual:
      "node-iconv not found. Cannot decode [Html/1 http://h3manth.com/new] (encoding is iso-8859-1). Please run `npm install iconv` and try again"
    at: http://h3manth.com/new
  ...

1..2
# tests 2
# pass  1
# fail  1

Same results even after npm install iconv. Am I missing something here?

SVG parse of http redirects empty body

When loading an SVG over http where the response is a redirect to the actual location, it seems that assetgraph attempts to parse the redirect's empty response body as SVG instead of the redirect target.

Relevant errors from hyperlink:

TAP version 13
# Crawling internal assets
ok 1 load build-status.html
not ok 2 content-type-mismatch https://travis-ci.org/peerigon/extract-loader.svg?branch=master
  ---
    operator: content-type-mismatch
    expected: "image/svg+xml"
    actual:   "Asset is used as both Html and Svg"
    at: build-status.html:1:11 <img src="https://travis-ci.org/peerigon/extract-loader.svg?branch=master" alt="Build Status">
  ...
not ok 3 Failed loading https://travis-ci.org/peerigon/extract-loader.svg?branch=master
  ---
    operator: error
    expected:

    actual:
      "https://travis-ci.org/peerigon/extract-loader.svg?branch=master: Parse error in https://travis-ci.org/peerigon/extract-loader.svg?branch=master\n[xmldom error]\tinvalid doc source\n@#[line:undefined,col:undefined]"
    at: build-status.html:1:11 <img src="https://travis-ci.org/peerigon/extract-loader.svg?branch=master" alt="Build Status">
  ...
not ok 4 load https://travis-ci.org/peerigon/extract-loader.svg?branch=master
  ---
    operator: load
    expected:
      "200 https://travis-ci.org/peerigon/extract-loader.svg?branch=master"
    actual:
      "Cannot read property 'childNodes' of undefined"
    at: build-status.html:1:11 <img src="https://travis-ci.org/peerigon/extract-loader.svg?branch=master" alt="Build Status">
  ...
# Crawling 0 outgoing urls
# Connecting to 0 hosts (checking <link rel="preconnect" href="...">
# Looking up 0 host names (checking <link rel="dns-prefetch" href="...">

1..4
# tests 4
# pass  1
# fail  3

The above is created by crawling this html:

<img src="https://travis-ci.org/peerigon/extract-loader.svg?branch=master" alt="Build Status">

Feature request: garbage collection / dead file detection

I recently wanted to detect and clear out all the old files in my static site. I did this by

  1. running hyperlink
  2. munging its text output to get a list of files accessible from the roots
  3. using find to list all files in my static site
  4. using diff to list those files in my static site that are not accessible from the roots
  5. deleting those files

Hyperlink was very useful here, but I think it would be cool if the feature was built in, or if there was a good example in the README for how to do this with hyperlink. Ultimately I'd like to have this test in my CI to ensure I keep the site clean.

(Not posting my script here because it's awfully messy, mostly due to scraping hyperlink's stdout!)
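
A rough sketch of the manual pipeline described above, assuming a static site in build/ and GNU coreutils; the grep pattern over hyperlink's TAP output is a guess and would need tuning to your report:

$ hyperlink -r build/index.html > report.tap
$ grep -o 'build/[^ "]*' report.tap | sort -u > reachable.txt
$ find build -type f | sort > all-files.txt
$ comm -13 reachable.txt all-files.txt   # files never reached from the roots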

Some error

Good day!

Sorry for just throwing this at you, but I don't have experience with js and node. I've tried to use the tool against our website and got the following:

/usr/local/lib/node_modules/hyperlink/node_modules/assetgraph/node_modules/jsdom/lib/jsdom/level1/core.js:662
Array.prototype.splice.call(this.childNodes, oldChildIndex, 1);
^
TypeError: Cannot set property length of [object Object] which has only a getter
at HTMLDocument.core.Node.removeChild (/usr/local/lib/node_modules/hyperlink/node_modules/assetgraph/node_modules/jsdom/lib/jsdom/level1/core.js:662:28)
at HTMLDocument. (/usr/local/lib/node_modules/hyperlink/node_modules/assetgraph/node_modules/jsdom/lib/jsdom/level2/events.js:370:17)
at HTMLDocument.proto.(anonymous function) (/usr/local/lib/node_modules/hyperlink/node_modules/assetgraph/node_modules/jsdom/lib/jsdom/utils.js:23:26)
at HTMLDocument.inheritFrom.removeChild (/usr/local/lib/node_modules/hyperlink/node_modules/assetgraph/node_modules/jsdom/lib/jsdom/level1/core.js:1667:47)
at Html.parseTree (/usr/local/lib/node_modules/hyperlink/node_modules/assetgraph/lib/assets/Html.js:217:33)
at Html.extendWithGettersAndSetters.findOutgoingRelationsInParseTree (/usr/local/lib/node_modules/hyperlink/node_modules/assetgraph/lib/assets/Html.js:252:17)
at Html.outgoingRelations (/usr/local/lib/node_modules/hyperlink/node_modules/assetgraph/lib/assets/Asset.js:576:44)
at Html.extendWithGettersAndSetters.populate (/usr/local/lib/node_modules/hyperlink/node_modules/assetgraph/lib/assets/Asset.js:635:17)
at AssetGraph.extend.addAsset (/usr/local/lib/node_modules/hyperlink/node_modules/assetgraph/lib/index.js:161:15)
at /usr/local/lib/node_modules/hyperlink/node_modules/assetgraph/lib/transforms/populate.js:106:48

I will try to investigate and fix it on weekend.

Install failed due to issue with dependency

I tried to install your tool but got an error (see screenshot). It looks like one of your dependencies requires Python, which isn't pre-installed on Windows (not sure about OS X). Can you make sense of this?

[screenshot of the npm install error]

Avoid spamming with CommonJsRequire errors when encountering browserify output

Browserify exposes a require function as an argument, so the wrapped source code can use require(<String>). Assetgraph sees this, correctly, as an attempt to load CommonJS modules, but doesn't figure out that this is a browserify-packaged, probably self-encapsulated, module.

$ hyperlink http://sarasoueidan.com/ | tap-colorize 

TAP version 13
# Crawling internal assets
ok 1 loading http://sarasoueidan.com/
not ok 2
  ---
    operator: error
    expected:

    actual:
      "Skipping JavaScriptCommonJsRequire (only supported from file: urls): require(\"./SearchStrategies/fuzzy\")"
    at: inline JavaScript in http://sarasoueidan.com/
  ...

False negatives

Running hyperlink with -p -v -r dist/index.html and getting false negatives like this:

not ok 21 load dist/set-designer-1

Which is strange since set-designer-1 is never referenced in any of the files in dist/.
Any idea what's going on?
I could send the dist/ folder if it would help, or if you could give me some pointers where I should start to debug :)

Resolve directories to `index.html`

Say I have a reference like <p><a href="writers-guide">Writer&#39;s guide</a></p>. That should try writers-guide/index.html in addition to writers-guide.html as a fallback.
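
A minimal sketch of the requested resolution order (this is what the issue asks for, not hyperlink's current behaviour; root and href are hypothetical parameters):

const fs = require('fs');
const path = require('path');

function resolvePretty(root, href) {
  // Try the href as-is, then href.html, then href/index.html.
  for (const candidate of [href, `${href}.html`, path.join(href, 'index.html')]) {
    const file = path.join(root, candidate);
    if (fs.existsSync(file)) return file;
  }
  return null; // no matching file on disk: report a broken link
}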

CLI --skip option

TAP has the option to mark a test as skipped. When a test is skipped the work is not executed, but the report returns with a thumbs up and marked as skipped.

The current --exclude option abuses TAP skip semantics by also possibly returning an executed and failed test as skipped.

It's very likely that this should actually be the behavior of the updated --exclude option: #113. Consider collapsing these two issues into the same commit

Test failures for Preconnect link types

Preconnect links cause test failures like so:

<!-- browser hint to open a connection to fonts.gstatic.com -->
<link rel="preconnect" href="https://fonts.gstatic.com">
not ok 54 should respond with HTTP status 200
operator: error
expected: "200 https://fonts.gstatic.com"
actual:   "404 https://fonts.gstatic.com"
at: https://www.cars.com (6:30) <link rel="preconnect" href="https://fonts.gstatic.com">
    https://www.cars.com/ (8:30) <link rel="preconnect" href="https://fonts.gstatic.com">

I think this library should handle preconnect links differently (or at least ignore them). Perhaps it should do a dns lookup for them. Let me know your thoughts. I'm happy to send a PR.
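
A minimal sketch of the suggested handling, resolving the host name instead of issuing an HTTP request; an illustration of the idea, not hyperlink's implementation:

const dns = require('dns').promises;

async function preconnectCheck(href) {
  const { hostname } = new URL(href);
  // A preconnect target only needs to resolve; it does not have to answer
  // GET / with a 200, which is what https://fonts.gstatic.com fails to do.
  await dns.lookup(hostname); // throws ENOTFOUND if the host does not resolve
}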

Non-recursive mode fails on checking file urls

When recursion is disabled while checking a local filesystem graph, the locally linked pages get marked for http checking, which obviously fails.

Suggestion: Mark the pages on a file: url as a stopAsset and keep them inside the assetgraph population pipeline. When processing an asset that's marked as stopAsset, skip all checks and abort further recursion. This way we can use the Assetgraph pipeline instead of having to reinvent http semantics with fs.stat or similar

Can I mark a file/node to be ignored?

Here's an example page that causes an error, Parse error in inline JavaScript in test_case.html\nUnknown node type ChainExpression.

<!doctype html>
<html lang="en">
  <body>
    <script>
      foo = bar?.baz;
    </script>
  </body>
</html>

The error here is, I believe, because hyperlink doesn't know about optional chaining. That's OK: I can't expect it to understand every bleeding-edge JS experiment.

But I do expect the freedom to use bleeding-edge browser features without this causing all CI to fail. I would like to be able to mark a file/node/section as "trust me, I know what I'm doing, don't test this", e.g. like this

<!doctype html>
<html lang="en">
  <body>
    <script data-hyperlink-ignore>
      foo = bar?.baz;
    </script>
  </body>
</html>

Is this possible today? If so, could I see an example? Thanks!

CLI --todo option

TAP specifies that a test can be marked as todo. This is used for marking failing tests as non-blocking.

Implement a CLI --todo option that takes a matching pattern similar to --exclude. Any URL that matches a todo pattern should still be executed, but upon failure be marked as todo with an explanation that this categorization comes from a specific todo-match pattern.

Spurious fail caused by `fetch("https://some.url.that/only/accepts/post", { method: "POST" })`

Steps to reproduce

<!doctype html>
<html>
  <script>
    fetch("https://some.url.that/only/accepts/post", { method: "POST" });
  </script>
</html>

Expected behavior

All tests pass, because no requests made by this page will result in a 404.

Actual behavior

not ok 2 load https://some.url.that/only/accepts/post
  ---
    operator: load
    expected: "200 https://some.url.that/only/accepts/post"
    actual:   "getaddrinfo ENOTFOUND some.url.that"
    at: _site/test_case.html:4:12 (inlined JavaScript)

It seems that when hyperlink finds fetch("somestringliteral", { ... }), it will try to GET somestringliteral, and fail if it gets a 404.

Workaround

I can currently outsmart hyperlink by writing

<!doctype html>
<html>
  <script>
    const theUrl = "https://some.url.that/only/accepts/post";
    fetch(theUrl, { method: "POST" });
  </script>
</html>

But this is undesirable, because I have to obfuscate my code to get around the testing tool.

Desired change

  • Option 1: hyperlink should only follow a fetch() URL if it knows for sure that it's a GET request (although I think this is going down a deep rabbithole)
  • Option 2: some kind of escape hatch like #187

mark whole run as `--todo`

Hi,

Thank you for this very useful library. I wanted to incorporate your tool into a snowpack build on top of an 11ty build. I was able to check all internal links successfully with your tool. Following your advice that all external links could also be checked without failing the build process, I'm struggling to mark all failed or skipped external links as ok for the npm build process. Would you know how I can achieve this? Thank you very much!

Here is a portion of my package.json

...
"scripts": {
    "clean": "del _site",
    "start": "npm run clean && snowpack dev",
    "build": "snowpack build",
    "postbuild": "npm run links-internal && npm run links-external",
    "links-internal": "hyperlink -pri --root _build --canonicalroot https://example.com/ --todo 'fonts.gstatic' --todo 'file.myfontastic' _build/*.html | tap-spot",
    "links-external": "hyperlink -pr --root _build --canonicalroot https://example.com/ --todo '//' _build/*.html | tap-spot"
  },
...

Right now, my trick is to use // to mark all external links so the hyperlink check does not fail my npm run. However, I am just wondering if there is a better way?

Thank you very much!

tap output

Even though it's hard to define what a test is, and thus hard to show how many failures there are compared to successes, TAP output is quite useful for working with machines and CI setups.

So the result might be depressing, since there will likely never be any tests that pass, just the absence of failing ones. However, my current assumption is that people will run this tool to become aware of problems with their setup, so a list of failures might be fine.

Follows links to canonicalRoot instead of resolving as local

The following HTML

<a href="https://mntr.dk">waat</a>

Checked with this command line:

hyperlink -ri --canonicalroot https://mntr.dk --root . index.html

Results in this TAP output:

TAP version 13
# Crawling internal assets
ok 1 load index.html
ok 2 load https://mntr.dk
not ok 3 load static/bundle.e4f5761693.css
  ---
    operator: load
    expected:
      "200 static/bundle.e4f5761693.css"
    actual:
      "ENOENT: no such file or directory, open '/Users/munter/git/mocha/static/bundle.e4f5761693.css'"
    at: https://mntr.dk (1:2819) <link rel="stylesheet" href="/static/bundle.e4f5761693.css">
  ...
not ok 4 load static/bundle-1.74ee7145ce.css
  ---
    operator: load
    expected:
      "200 static/bundle-1.74ee7145ce.css"
    actual:
      "ENOENT: no such file or directory, open '/Users/munter/git/mocha/static/bundle-1.74ee7145ce.css'"
    at: https://mntr.dk (1:2914) <link rel="stylesheet" href="/static/bundle-1.74ee7145ce.css" integrity="sha256-RaWVNaKNpPwo3fei7Cy7ZVOJbyKdZZOze5mWdWJildU=">
  ...
not ok 5 load feed.xml
  ---
    operator: load
    expected:
      "200 feed.xml"
    actual:
      "ENOENT: no such file or directory, open '/Users/munter/git/mocha/feed.xml'"
    at: https://mntr.dk (1:4894) <link rel="alternate" type="application/rss+xml" title="RSS" href="/feed.xml">
  ...
not ok 6 load static/logo-white.0b1467f089.svg
  ---
    operator: load
    expected:
      "200 static/logo-white.0b1467f089.svg"
    actual:
      "ENOENT: no such file or directory, open '/Users/munter/git/mocha/static/logo-white.0b1467f089.svg'"
    at: https://mntr.dk (1:5053) <img src="/static/logo-white.0b1467f089.svg">
  ...
not ok 7 load static/web-share.0d5ae2348f.js
  ---
    operator: load
    expected:
      "200 static/web-share.0d5ae2348f.js"
    actual:
      "ENOENT: no such file or directory, open '/Users/munter/git/mocha/static/web-share.0d5ae2348f.js'"
    at: https://mntr.dk (1:15127) <script src="/static/web-share.0d5ae2348f.js" async="" integrity="sha256-M+JrvP+ihAv2Lm9ojTdA2j03E34+HhQSHkHWuILaYPE=">...</script>
  ...
ok 8 load 
ok 9 load https://fonts.googleapis.com/css?family=Noto+Serif:400,700,400i|Open+Sans:700,400
# Connecting to 2 hosts (checking <link rel="preconnect" href="...">
ok 10 preconnect-check https://fonts.googleapis.com
ok 11 preconnect-check https://fonts.gstatic.com
# Looking up 0 host names (checking <link rel="dns-prefetch" href="...">

1..11
# tests 11
# pass  6
# fail  5

It looks like hyperlink follows the link because it is a match for the canonical root, but instead of resolving it to index.html on the local disk, it loads the content from https://mntr.dk and keeps running the checks from that page. Only the online deployed index page has the links to those hashed file names. The canonical link resolution then makes hyperlink look for the hashed files on the local disk.

Any ideas on this one @papandreou ?

Memory leak issue during checking of all links

Hi Munter, sorry it took me a while to post the issue here.

As a summary, I am having an issue where hyperlink reports a memory leak warning and does not check all links (external and internal).

This is the run for internal links only, which works fine:

...
> hyperlink -pri --root _build --canonicalroot https://mydomain.com/ --todo '_build/solvents' --todo 'fonts.gstatic' --todo 'file.myfontastic' _build/*.html | tap-spot

..!..........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................!!  

...
  
  1375 tests
  1372 passed
     3 todo

However, switching to testing all links, I got this issue, and I think it stopped short of checking all links (sometimes I saw the total test number come out smaller than the one reported by checking only internal links).

> hyperlink -pr --root _build --canonicalroot https://mydomain.com/ --todo '_build/solvents' --todo 'fonts.gstatic' --todo 'file.myfontastic' --todo '//' _build/*.html | tap-spot

..!.................................................................!.......!!..................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................,............................................................................................................!...............................................!.................!........................................(node:36296) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 pipe listeners added to [Request]. Use emitter.setMaxListeners() to increase limit
(Use `node --trace-warnings ...` to show where the warning was created)
.....!..........................................!.................!!  

  ! TODO load _build/solvents
// more TODO


  1676 tests
     1 skipped
  1665 passed
    11 todo

Would you mind having a look at this issue please? Thank you very much :)

tap output fails in tap-dot

Some errors from tap-render seem to be incompatible with some tap consumers, specifically tap-dot and am-tap-dot

$ hyperlink build/api/plugins/index.html --root build --canonicalroot https://webpack.js.org | tee plugins.tap | tap-dot

  ...................x/Users/munter/.nvm/versions/node/v8.9.3/lib/node_modules/tap-dot/node_modules/trim/index.js:5
  return str.replace(/^\s*|\s*$/g, '');
             ^

TypeError: Cannot read property 'replace' of undefined
    at trim (/Users/munter/.nvm/versions/node/v8.9.3/lib/node_modules/tap-dot/node_modules/trim/index.js:5:14)
    at Parser._handleError (/Users/munter/.nvm/versions/node/v8.9.3/lib/node_modules/tap-dot/node_modules/tap-out/index.js:184:14)
    at Parser.handleLine (/Users/munter/.nvm/versions/node/v8.9.3/lib/node_modules/tap-dot/node_modules/tap-out/index.js:46:8)
    at Stream.<anonymous> (/Users/munter/.nvm/versions/node/v8.9.3/lib/node_modules/tap-dot/node_modules/tap-out/index.js:212:14)
    at emitOne (events.js:116:13)
    at Stream.emit (events.js:211:7)
    at drain (/Users/munter/.nvm/versions/node/v8.9.3/lib/node_modules/tap-dot/node_modules/through/index.js:36:16)
    at Stream.stream.queue.stream.push (/Users/munter/.nvm/versions/node/v8.9.3/lib/node_modules/tap-dot/node_modules/through/index.js:45:5)
    at emit (/Users/munter/.nvm/versions/node/v8.9.3/lib/node_modules/tap-dot/node_modules/split/index.js:37:14)
    at next (/Users/munter/.nvm/versions/node/v8.9.3/lib/node_modules/tap-dot/node_modules/split/index.js:49:7)
internal/streams/legacy.js:59
      throw er; // Unhandled stream error in pipe.
      ^

Error: write EPIPE
    at _errnoException (util.js:1024:11)
    at WriteWrap.afterWrite [as oncomplete] (net.js:867:14)

Minimal reproduction that causes this failure:

TAP version 13
not ok 1 external-check file:///Users/munter/git/webpack.js.org/build/assets/favicon.ico
  ---
    operator: external-check
    expected:
      "200 file:///Users/munter/git/webpack.js.org/build/assets/favicon.ico"
    actual:
      "Unknown error"
    at: build/api/plugins/index.html:9:37 <link rel="shortcut icon" href="/assets/favicon.ico">
  ...

CLI --exclude option behaves wrong

Current behavior of --exclude: do exactly the same work as if there were no --exclude, but mark tests on excluded URLs as skipped in the TAP report.

This is a confusing behavior.

--exclude should actually prevent hyperlink from even making a request to any excluded URLs.

This could be done by filtering the hrefs by what is currently shouldSkip(url). shouldSkip() has to be renamed in the process, since skip and todo have special semantics in TAP, which other command line parameters should be able to target.

Doesn't recognize fragment links to name attributes

Although it's deprecated in HTML5 to use fragments that link to an element's name attribute, the practice is still in wide use and browsers still support it.

The user interaction of clicking a link and having it work is more important than sticking to the word of the standard, so if browsers give a successful experience here, so should we.

Example:

<a href="#namefragment">goto</a>
<a name="namefragment">Welcome</a>
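
A minimal sketch of the fallback check this issue asks for, treating a fragment as satisfied if it matches either an id or a deprecated name attribute; an illustration, not hyperlink's actual code, and it assumes fragment names need no CSS escaping:

function fragmentExists(document, fragment) {
  const name = fragment.replace(/^#/, '');
  // Matches both <h2 id="namefragment"> and the deprecated <a name="namefragment">.
  return Boolean(
    document.getElementById(name) ||
    document.querySelector(`[name="${name}"]`)
  );
}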

BUG: Checks pages outside its boundaries

If any Html-asset that wasn't supposed to be a part of the graph expansion query somehow gets into the graph, hyperlink will treat it as a valid page that needs all its non-page dependencies checked.

Case in point:

<meta property="og:video" content="https://vimeo.com/49026951">

The open graph link responds with an Html asset, which then becomes a first-level citizen in the graph.

When hyperlink is running with the --internal switch, https://vimeo.com/49026951 gets checked as part of the run as if you had called hyperlink directly on that URL.

If hyperlink is running with the --recursive switch activated, having https://vimeo.com/49026951 in the graph will result in a complete traversal of the entirety of https://vimeo.com

This case is equally relevant where no cross-domain jump is happening. Running hyperlink non-recursively should of course also never result in another page being expanded upon, even if it is local.

Allow checking anchors

Let's say I have <p><a href="writers-guide#foo">Writer&#39;s guide</a></p>. It would be handy to check that the anchors exist.

To quote @Munter, findRelations({ to: { type: 'Html' }, href: /#/ }).every(relation => relation.to.parseTree.document.querySelector(relation.href.match(/#.*/))) or so might do the trick.

Fails when a directory-name has an @ symbol

Hi,

Locally, I've tried to mirror GitHub orgs so I've named my directories something like /home/foo/Projects/@unexpected/unexpected which is a local clone of this repo https://github.com/unexpectedjs/unexpected.

With that, this command:

hyperlink -ri --canonicalroot https://unexpected.js.org --skip content-type-mismatch --skip unexpected.js.org/unexpected- site-build/index.html | tap-spot

Fails with this error:

✖ FAIL load ../../%40unexpected/unexpected/site-build
  | operator: load
  | expected: 200 ../../%40unexpected/unexpected/site-build
  |   actual: ENOENT: no such file or directory, open '/home/foo/Projects/%40unexpected/unexpected/site-build/'
  |       at: site-build/assertions/any/to-be/index.html:7:36 <a href="/">...</a>
  
  257 tests
   15 skipped
  256 passed
    1 failed

But as you can see a bunch of other tests also pass. It works if I rename the directory to something without the @ symbol.

I'm running on an Ubuntu 18.04 machine with [email protected]. Let me know if I should provide more info.

Best,
Joel.

Local file-urls get passed to external-check with http request

When running hyperlink on local files, some urls to local files get sent to an external http check instead of being handled internally by assetgraph's file-url support.

Examples

not ok 3621 external-check file:///Users/munter/git/webpack.js.org/build/assets/favicon.ico
  ---
    operator: external-check
    expected:
      "200 file:///Users/munter/git/webpack.js.org/build/assets/favicon.ico"
    actual:
      "Unknown error"
    at: build/index.html:9:37 <link rel="shortcut icon" href="/assets/favicon.ico">
  ...

not ok 4704 external-check file:///Users/munter/git/webpack.js.org/build/1ebd0482aadade65f20ec178219fe012.woff2
  ---
    operator: external-check
    expected:
      "200 file:///Users/munter/git/webpack.js.org/build/1ebd0482aadade65f20ec178219fe012.woff2"
    actual:
      "Unknown error"
    at: build/0faedee72dede3679c76.css:108:12 url(/1ebd0482aadade65f20ec178219fe012.woff2) format("woff2"), url(/314bbcd238d458622bbf32427346774f.woff) format("woff")
  ...
not ok 4705 external-check file:///Users/munter/git/webpack.js.org/build/314bbcd238d458622bbf32427346774f.woff
  ---
    operator: external-check
    expected:
      "200 file:///Users/munter/git/webpack.js.org/build/314bbcd238d458622bbf32427346774f.woff"
    actual:
      "Unknown error"
    at: build/0faedee72dede3679c76.css:108:74 url(/1ebd0482aadade65f20ec178219fe012.woff2) format("woff2"), url(/314bbcd238d458622bbf32427346774f.woff) format("woff")
  ...
not ok 4706 external-check file:///Users/munter/git/webpack.js.org/build/bf176a25b4f8227fea804854c98dc5e2.png
  ---
    operator: external-check
    expected:
      "200 file:///Users/munter/git/webpack.js.org/build/bf176a25b4f8227fea804854c98dc5e2.png"
    actual:
      "Unknown error"
    at: build/branding/index.html:21:28646 <img src="/bf176a25b4f8227fea804854c98dc5e2.png">
  ...

not ok 4730 external-check file:///Users/munter/git/webpack.js.org/build/assets/icon-square-small-slack.png
  ---
    operator: external-check
    expected:
      "200 file:///Users/munter/git/webpack.js.org/build/assets/icon-square-small-slack.png"
    actual:
      "Unknown error"
    at: build/branding/index.html:43:268 <img src="/assets/icon-square-small-slack.png" width="50" alt="icon square small example">
  ...

not ok 6140 external-check file:///Users/munter/git/webpack.js.org/build/loaders/eslint-loader/CHANGELOG.md
  ---
    operator: external-check
    expected:
      "200 file:///Users/munter/git/webpack.js.org/build/loaders/eslint-loader/CHANGELOG.md"
    actual:
      "Unknown error"
    at: build/loaders/eslint-loader/index.html:274:103 <a href="CHANGELOG.md">...</a>
  ...

embed src causes internal traversal to extend to foreign domain

This code causes hyperlink to recursively traverse http://www.cc.com despite being called with the -i flag that should ensure that only site internal pages are traversed:

<embed src='http://www.thedailyshow.com/sitewide/video_player/view/default/swf.jhtml'></embed>

Output:

$ hyperlink -ri BUG.html
Guessing --root from input files: file:///Users/pbm/
TAP version 13
# Crawling internal assets
ok 1 load BUG.html
ok 2 load http://www.thedailyshow.com/sitewide/video_player/view/default/swf.jhtml
ok 3 load http://www.cc.com/shows/the-daily-show-with-trevor-noahsitewide/video_player/view/default/swf.jhtml
ok 4 load http://www.cc.com/shows
ok 5 load http://www.cc.com/shows/hart-of-the-city
ok 6 load http://www.cc.com/shows/crank-yankers
^C

Looks like a redirect chain from this embed src ends up on a html page, which hyperlink doesn't correctly identify as cross domain.

$ curl -I http://www.thedailyshow.com/sitewide/video_player/view/default/swf.jhtml
HTTP/1.1 301 Moved Permanently
Date: Tue, 12 May 2020 08:12:53 GMT
Content-Type: text/html
Content-Length: 166
Connection: keep-alive
Cache-Control: no-store, no-cache, must-revalidate
Expires: Tue, 12 May 2020 08:12:53 GMT
Location: http://www.cc.com/shows/the-daily-show-with-trevor-noahsitewide/video_player/view/default/swf.jhtml
Server: EasyRedir

$ curl -I http://www.cc.com/shows/the-daily-show-with-trevor-noahsitewide/video_player/view/default/swf.jhtml
HTTP/1.1 301 Moved Permanently
Content-Type: text/html; charset=UTF-8
Content-Length: 0
Server: Apache/2.4.29 (Unix)
X-Powered-By: PHP/7.1.1
Location: /shows
Cache-Control: max-age=60
Date: Tue, 12 May 2020 08:13:22 GMT
Connection: keep-alive

Fragment links are not checked correctly when pointing to folder having index.html

While trying to understand why links on webpack.js.org got broken without the linter having caught them (e.g. webpack/webpack.js.org#2929), I found out that fragment links are not checked correctly when they point to a folder that has an index.html inside.

To illustrate the issue, I created testdata/fragmentIdentifier/test/index.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
</head>
<body>
</body>
</html>

and appended

diff --git a/testdata/fragmentIdentifier/index.html b/testdata/fragmentIdentifier/index.html
index 5ec7aff..2e12ebd 100644
--- a/testdata/fragmentIdentifier/index.html
+++ b/testdata/fragmentIdentifier/index.html
@@ -26,5 +26,7 @@
     <a href="page.html#broken-two"></a>
     <a href="page.html#broken-three"></a>
 
+    <a href="/test#definitely-broken"></a>
+
 </body>
 </html>

Running

./lib/cli.js testdata/fragmentIdentifier/index.html

doesn't report test#definitely-broken as failed. It only does so if index.html is used in the link, like

diff --git a/testdata/fragmentIdentifier/index.html b/testdata/fragmentIdentifier/index.html
index 5ec7aff..fb497aa 100644
--- a/testdata/fragmentIdentifier/index.html
+++ b/testdata/fragmentIdentifier/index.html
@@ -26,5 +26,7 @@
     <a href="page.html#broken-two"></a>
     <a href="page.html#broken-three"></a>
 
+    <a href="/test/index.html#definitely-broken"></a>
+
 </body>
 </html>

I'd be happy to come up with a PR, but I wasn't able to identify the root cause. I ended up finding that the condition relation.to.type === 'Html' is not fulfilled in this case, so relation.to.incomingFragments is not populated.

Docs: More examples

Could you please provide some examples for more advanced usages?

For example how to use --skip to ignore redirect chains like:

not ok 700 external-redirect http://blog.dilbert.com/2016/12/27/the-kristina-talent-stack/
  ---
    operator: external-redirect
    expected:
      "302 http://blog.dilbert.com/2016/12/27/the-kristina-talent-stack/ --> 200 https://blog.dilbert.com/2016/12/27/the-kristina-talent-stack/"
    actual:
      "301 http://blog.dilbert.com/2016/12/27/the-kristina-talent-stack/ --> 200 https://blog.dilbert.com/2016/12/27/the-kristina-talent-stack/"
    at: https://stephanschubert.com/10-things-i-did-not-know-before-25/ (210:101) <a href="http://blog.dilbert.com/2016/12/27/the-kristina-talent-stack/" title="There's also one post about Trumps' talent stack." target="_blank" rel="nofollow noopener noreferrer" data-reactid="281">...</a>
  ...

Related: Why does it expect a 302 instead of a 301?

Crashes against webpack.js.org

I'm getting

.../webpack.js.org/node_modules/hyperlink/lib/index.js:193
        if (error.message.indexOf('AssetGraph.ensureAssetConfigHasType: Couldn\'t load') === 0) {
                         ^

TypeError: Cannot read property 'indexOf' of undefined

if running against webpack.js.org. Hash: da4a57f29512b9b645431cbb7854b82a1ecb7ad0 .

The strange thing is that Travis doesn't crash, but it's on a newer Node (v6.5.0 vs. v6.9.1). I tried a fresh npm install, but it's still the same.

The link errors are something for me to fix. Just thought to point out this potential crash. For a reason I don't understand, error.message can be undefined, and adding a check like if (error.message && error.message...) fixes the problem.

Allow Cookie injection

It might be useful to run this tool in a logged-in state. Allow passing one or more cookies to the tool so it can penetrate these walled gardens.

403 false negatives

Figure out how to avoid making servers angry. 200 responses are much nicer than 403s.

Potential exit code issue

Hyperlink exits with an error code equal to the number of errors encountered, making it a useful CI tool.

Looking at your project, I just stumbled across this in the readme, and there is likely a bug here (without having looked at the code). Most systems use an unsigned integer for the exit code and silently wrap around to 0 at 256 and above.

On Linux this is the case, and I believe OSX/BSD also use a uint8 and have the same issue.

#!/bin/sh
exit 256

$ ./test.sh; echo $?   # oops - now zero!

If you are not already doing so, you should apply Math.min(code, 255) to prevent this bug.

I scratched my head over this for a while, so I thought I would give you a heads up: 256 errors might equal success!
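
A one-line sketch of the suggested fix in Node, assuming errorCount holds the number of failed checks:

// Cap the exit code so that 256 or more errors don't wrap around to 0.
process.exitCode = Math.min(errorCount, 255);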

Netlify build fails early with no info

I have a failing link according to the link checker, but I have no insight into what it is! It doesn't show up locally, and the Netlify build quits early, so it never gives the final report.

failed during stage 'building site': Build script returned non-zero exit code: 1

While this is partially Netlify's fault (https://community.netlify.com/t/common-issue-deploy-logs-truncated-or-showing-shutting-down-logging-number-messages-pending/747), I did play around with verbose mode locally, but it doesn't seem to output the links while they're being checked.

Is there any way to improve the output here so that it shows the failing link when it finds it?

An in-range update of eslint-config-standard is breaking the build 🚨

The devDependency eslint-config-standard was updated from 14.0.1 to 14.1.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

eslint-config-standard is a devDependency of this project. It might not break your production code or affect downstream projects, but probably breaks your build or test tools, which may prevent deploying or publishing.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build failed (Details).

Commits

The new version differs by 3 commits.

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴
