
linkcheck's Introduction

linkcheck


Very fast link-checking.

linkcheck versus the popular blc tool

Philosophy:

A good utility is custom-made for a job. There are many link checkers out there, but none of them seems to be striving for the following set of goals.

Crawls fast

  • You want to run the link-checker at least before every deploy (on CI or manually). When it takes ages, you're less likely to do so.

  • linkcheck is currently several times faster than blc and all other link checkers that go to at least comparable depth. It is 40 times faster than the only tool that goes to the same depth (linkchecker).

Finds all relevant problems

  • No link-checker can guarantee correct results: the web is too flaky for that. But at least the tool should correctly parse the HTML (not just try to guess what's a URL and what isn't) and the CSS (for url(...) links).

    • PENDING: srcset support
  • linkcheck finds more than linklint and blc. It finds at least as many problems as the best alternative, linkchecker.

Leaves out irrelevant problems

  • linkcheck doesn't attempt to render JavaScript. It would make it at least an order of magnitude slower and way more complex. (For example, what links and buttons should the tool attempt to click, and how many times? Should we only click visible links? How exactly do we detect broken links?) Validating SPAs is a very different problem than checking static links, and should be approached by dedicated tools.

  • linkcheck only supports http: and https:. It won't try to check FTP or telnet or nntp links.

    • Note: linkcheck will currently completely ignore unsupported schemes like ftp: or mailto: or data:. This may change in the future to at least show info-level warning.
  • linkcheck doesn't validate file system directories. Servers often behave very differently than file systems, so validating links on the file system often leads to both false positives and false negatives. Links should be checked in their natural habitat, and as close to the production environment as possible. You can (and should) run linkcheck on your localhost server, of course.

Good UX

  • Yes, a command line utility can have good or bad UX. It has mostly to do with giving sane defaults, not forcing users to learn new constructs, not making them type more than needed, and showing concise output.

  • The most frequent use cases should need only a few arguments.

  • linkcheck doesn't throttle itself on localhost.

  • linkcheck follows POSIX CLI standards (no @input and similar constructs like in linklint).

Brief and meaningful output

  • When everything works, you don't want to see a huge list of links.

    • In this scenario, linkcheck just outputs 'Perfect' and some stats on a single line.
  • When things are broken, you want to see exactly where the problem is, and you want the output sorted in a sane way.

    • linkcheck lists broken links by their source URL first so that you can fix many links at once. It also sorts the URLs alphabetically, and shows both the exact location of the link (line:column) and the anchor text (or the tag if it wasn't an anchor).
  • For CI builds, you want non-zero exit code whenever there is a problem.

    • linkcheck returns status code 1 if there are warnings, and status code 2 if there are errors.
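
As a rough illustration of how a CI script might act on these status codes, here is a minimal Python sketch. The helper names (`describe_exit_code`, `should_fail_build`, `run_linkcheck`) are hypothetical and not part of linkcheck, and treating 0 as "no problems" is an assumption based on the usual CLI convention.

```python
import subprocess

# Meanings of 1 and 2 come from the text above. Treating 0 as "no problems"
# is an assumption based on the usual exit-code convention.
EXIT_MEANINGS = {
    0: "no problems found",
    1: "warnings found",
    2: "errors found",
}


def describe_exit_code(code):
    """Return a human-readable description of a linkcheck exit code."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def should_fail_build(code, fail_on_warnings=False):
    """Decide whether a CI step should fail for the given exit code."""
    if code == 0:
        return False
    if code == 1:
        return fail_on_warnings
    # Errors (2) and anything unexpected fail the build.
    return True


def run_linkcheck(args):
    """Run linkcheck (assumed to be on PATH) and return its exit code."""
    return subprocess.run(["linkcheck", *args]).returncode
```

A CI step could then call `run_linkcheck([":4000"])` and pass the result to `should_fail_build`, choosing per project whether warnings alone should stop a deploy.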

It goes without saying that linkcheck fully respects definitions in robots.txt and throttles itself when accessing websites.
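
To make concrete what respecting robots.txt means for a crawl, the following Python sketch uses the standard library's `urllib.robotparser`. It illustrates the robots.txt convention in general, not linkcheck's own implementation; the `linkcheck` user-agent token and the sample rules are assumptions chosen for the example.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A typical staging setup: every path is disallowed for every crawler.
STAGING_ROBOTS = "User-agent: *\nDisallow: /\n"

# A site that only hides one directory.
PARTIAL_ROBOTS = "User-agent: *\nDisallow: /private/\n"

# Hypothetical user-agent token, used only for this illustration.
AGENT = "linkcheck"

for robots in (STAGING_ROBOTS, PARTIAL_ROBOTS):
    for url in ("https://example.com/docs/", "https://example.com/private/a.html"):
        print(url, is_allowed(robots, AGENT, url))
```

With a blanket `Disallow: /`, a crawler that honors robots.txt has nothing it is permitted to fetch, which is worth keeping in mind when checking staging sites.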

Installation

Direct download

  • Download the latest executable from the Releases page on GitHub. Pick the executable for your system (for example, linkcheck-win-x64.exe for a 64-bit machine running Microsoft Windows).

You should be able to immediately run this executable -- it has no external dependencies. For example, assuming you are on macOS and downloaded the file to the default downloads directory, you can go to your Terminal (or iTerm, or SSH) and run ./Downloads/linkcheck-mac-x64.

You can rename the file and move it to any directory. For example, on a Linux box, you might want to rename the executable to simply linkcheck, and move it to /usr/local/bin, $HOME/bin or another directory in your $PATH.

Docker image

Latest executable in a docker image:

docker run --rm tennox/linkcheck --help

(built from a repo mirror by @tennox)

From Source

Step 1. Install Dart

Follow the installation instructions for your platform from the Get the Dart SDK documentation.

For example, on a Mac, assuming you have homebrew, you just run:

$ brew tap dart-lang/dart
$ brew install dart

Step 2. Install linkcheck

Once Dart is installed, run:

$ dart pub global activate linkcheck

Pub installs executables into ~/.pub-cache/bin, which may not be on your path. You can fix that by adding the following to your shell's config file (.bashrc, .bash_profile, etc.):

export PATH="$PATH:$HOME/.pub-cache/bin"

Then either restart the terminal or run source ~/.bash_profile (assuming ~/.bash_profile is where you put the PATH export above).

Docker

If you have Docker installed, you can build the image and use the container, which avoids a local Dart installation.

Build

In the project directory, for x86 and x64 architectures, run

docker build -t filiph/linkcheck .

On ARM architectures (Raspberry Pi, M1 Mac), run

docker build --platform linux/arm64 -t filiph/linkcheck .

Usage (container mode)

docker run filiph/linkcheck <URL>

All usage guidelines below also apply when running in a container.

Usage (github action)

- uses: filiph/linkcheck@<version>
  with:
    arguments: <URL>

All usage guidelines below also apply when running as a GitHub action.

Usage

If in doubt, run linkcheck -h. Here are some examples to get you started.

Localhost

Running linkcheck without arguments will try to crawl http://localhost:8080/ (which is the most common local server URL).

  • linkcheck to crawl the site and ignore external links
  • linkcheck -e to try external links

If you run your local server on http://localhost:4000/, for example, you can do:

  • linkcheck :4000 to crawl the site and ignore external links
  • linkcheck :4000 -e to try external links

linkcheck will not throttle itself when accessing localhost. It will go as fast as possible.

Deployed sites

  • linkcheck www.example.com to crawl www.example.com and ignore external links
  • linkcheck https://www.example.com to start directly on https
  • linkcheck www.example.com www.other.com to crawl both sites and check links between the two (but ignore external links outside those two sites)
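
The shorthand forms above can be summarized with a small Python sketch. This is only an illustration of the behavior described in this section, not linkcheck's actual argument parsing. The function name `expand_start_argument` is hypothetical, and defaulting bare host names to `http://` is an assumption inferred from the note about starting directly on https.

```python
# Default start URL when no argument is given, as stated in this section.
DEFAULT_URL = "http://localhost:8080/"


def expand_start_argument(arg=None):
    """Expand a command-line start argument into a full URL (illustrative only)."""
    if arg is None:
        return DEFAULT_URL
    if arg.startswith(":"):
        # ":4000" is shorthand for a localhost server on that port.
        return f"http://localhost{arg}/"
    if "://" in arg:
        # An explicit scheme such as https:// is kept as-is.
        return arg
    # Assumption: bare host names start on plain http.
    return f"http://{arg}"


for example in (None, ":4000", "www.example.com", "https://www.example.com"):
    print(repr(example), "->", expand_start_argument(example))
```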

Many entry points

Assuming you have a text file mysites.txt like this:

http://egamebook.com/
http://filiph.net/
https://alojz.cz/

You can run linkcheck -i mysites.txt and it will crawl all of them and also check links between them. This is useful for:

  1. Link-checking projects spanning many domains (or subdomains).
  2. Checking all your public websites / blogs / etc.

There's another use for this, and that is when you have a list of inbound links, like this:

https://www.dart.dev/
https://www.dart.dev/tools/
https://www.dart.dev/guides/

You probably want to make sure you never break your inbound links. For example, if a page changes URL, the previous URL should still work (redirecting to the new page when appropriate).

Where do you get a list of inbound links? Try your site's sitemap.xml as a starting point. Additionally, try something like the Google Webmaster Tools’ crawl error page.
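
As a starting point for building such a list, the Python sketch below extracts the `<loc>` entries from a sitemap and prints them one URL per line, the same shape as the input files shown above. It assumes a plain `urlset` sitemap using the standard sitemaps.org namespace; the sample XML and the function name `urls_from_sitemap` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap.xml files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(xml_text):
    """Return the <loc> URLs of a urlset sitemap, in document order."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Made-up sample sitemap, reusing the example URLs from this section.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.dart.dev/</loc></url>
  <url><loc>https://www.dart.dev/tools/</loc></url>
  <url><loc>https://www.dart.dev/guides/</loc></url>
</urlset>"""

# One URL per line, suitable for saving to a file passed via -i.
print("\n".join(urls_from_sitemap(SAMPLE_SITEMAP)))
```

Sitemap index files, which point to other sitemaps, are not handled by this sketch.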

Skipping URLs

Sometimes, it is legitimate to ignore some failing URLs. This is done via the --skip-file option.

Let's say you're working on a site and a significant portion of it is currently under construction. You can create a file called my_skip_file.txt, for example, and fill it with regular expressions like so:

# Lines starting with a hash are comments.

admin/
\.s?css$
\#info

The file above includes a comment on line 1, which will be ignored. Line 2 is blank and will be ignored as well. Line 3 contains a broad regular expression that will make linkcheck ignore any link to a URL containing admin/ anywhere in it. Line 4 shows that there is full support for regular expressions: it will ignore URLs ending with .css and .scss. Line 5 shows the only special escape sequence. If you need to start your regular expression with a # (which linkcheck would normally parse as a comment), you can precede the # with a backslash (\). This will force linkcheck not to ignore the line. In this case, the regular expression on line 5 will match #info anywhere in the URL.

To use this file, you run linkcheck like this:

linkcheck example.com --skip-file my_skip_file.txt

Regular expressions are hard. If unsure, use the -d option to see what URLs your skip file is ignoring, exactly.
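
If you want to experiment with patterns outside of linkcheck, the skip-file rules described above can be approximated in a few lines of Python. This is a sketch under assumptions, not the tool's implementation: linkcheck uses Dart regular expressions, which can differ from Python's `re` in edge cases, and whether surrounding whitespace is trimmed is also an assumption. The function names are hypothetical.

```python
import re


def load_skip_patterns(text):
    """Compile skip-file lines, ignoring blank lines and # comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()  # assumption: surrounding whitespace is irrelevant
        if not line or line.startswith("#"):
            continue
        # A line such as \#info does not start with '#', so it is kept,
        # and the regex \# then matches a literal '#'.
        patterns.append(re.compile(line))
    return patterns


def is_skipped(url, patterns):
    """Return True if any pattern matches anywhere in the URL."""
    return any(pattern.search(url) for pattern in patterns)


# The example skip file from this section.
SKIP_FILE = r"""# Lines starting with a hash are comments.

admin/
\.s?css$
\#info
"""

patterns = load_skip_patterns(SKIP_FILE)
for url in (
    "http://example.com/admin/users",
    "http://example.com/static/site.scss",
    "http://example.com/page#info",
    "http://example.com/about",
):
    print(url, is_skipped(url, patterns))
```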

To use a skip file while running linkcheck through Docker, create a directory to use as a Docker volume and put your skip file in it. Then use a command similar to the following (assuming the directory is named skipfiles and the file skipfile.txt):

docker run -v "$(pwd)/skipfiles/:/skipfiles/" filiph/linkcheck http://example.com/ --skip-file /skipfiles/skipfile.txt

User agent

The tool identifies itself to servers with the following user agent string:

linkcheck tool (https://github.com/filiph/linkcheck)

Releasing a new version

  1. Commit all your changes, including updates to CHANGELOG, and including updating the version number in pubspec.yaml and lib/linkcheck.dart. Let's say your new version number is 3.4.56. That number should be reflected in all three files.
  2. Tag the last commit with the same version number. In our case, it would be 3.4.56.
  3. Push to master.

This will run the GitHub Actions script in .github/workflows/release.yml, building binaries and placing a new release into github.com/filiph/linkcheck/releases.

To publish the release to the GitHub Actions Marketplace as well, you currently have to open the release page once, click Edit, and hit Update release. No changes are needed. (Source: GitHub Community)

linkcheck's People

Contributors

chalin, danklassen, emielbeinema, errnesto, filiph, ggrossetie, hugo-sid, kastnerp, kevmoo, mooreds, nfagerlund, parlough, tennox, tooomm, tvolkert, willgibson

linkcheck's Issues

Handle preconnect links correctly

<link rel="preconnect" href="https://www.googletagmanager.com" /> is a hint to start DNS lookup and so on, for the domain. The actual page can be 404.

Current status:

  • Preconnect links are considered like any other link, being fetched / HEAD-ed when -e.
  • For the specific example of <link rel="preconnect" href="https://www.googletagmanager.com" />, this means an HTTP 400 error is reported (because https://www.googletagmanager.com is not a valid page).

Ideal solution:

  • Preconnect is recognized and linkcheck then merely verifies that the domain exists and connects.
  • linkcheck might show a warning when there's a preconnect link and then no actual link to that domain. (Problem: many times the actual link is constructed in JavaScript.)

Realistic solution:

  • Preconnect links are ignored. After all, it is just a hint, and sooner or later an actual link is coming.

Repeatedly getting INTERNAL ERROR

I'm getting this error when running linkcheck on my site. After the error, it seems that the crawl is still continuing though so sometimes the error occurs again later in the crawl. What does it mean and how can I fix it?

INTERNAL ERROR: Sorry! Please open https://github.com/filiph/linkcheck/issues/new in your favorite browser and copy paste the following output there:

Bad state: No element
#0      __CompactLinkedHashSet&_HashFieldBase&_HashBase&_OperatorEqualsAndHashCode&SetMixin.singleWhere (dart:collection/set.dart:271:5)
#1      crawl.<anonymous closure> (package:linkcheck/src/crawl.dart:255:20)
#2      _rootRunUnary (dart:async/zone.dart:1132:38)
#3      _CustomZone.runUnary (dart:async/zone.dart:1029:19)
#4      _CustomZone.runUnaryGuarded (dart:async/zone.dart:931:7)
#5      _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#6      _DelayedData.perform (dart:async/stream_impl.dart:591:14)
#7      _StreamImplEvents.handleNext (dart:async/stream_impl.dart:707:11)
#8      _PendingEvents.schedule.<anonymous closure> (dart:async/stream_impl.dart:667:7)
#9      _rootRun (dart:async/zone.dart:1120:38)
#10     _CustomZone.run (dart:async/zone.dart:1021:19)
#11     _CustomZone.runGuarded (dart:async/zone.dart:923:7)
#12     _CustomZone.bindCallbackGuarded.<anonymous closure> (dart:async/zone.dart:963:23)
#13     _rootRun (dart:async/zone.dart:1124:13)
#14     _CustomZone.run (dart:async/zone.dart:1021:19)
#15     _CustomZone.runGuarded (dart:async/zone.dart:923:7)
#16     _CustomZone.bindCallbackGuarded.<anonymous closure> (dart:async/zone.dart:963:23)
#17     _microtaskLoop (dart:async/schedule_microtask.dart:41:21)
#18     _startMicrotaskLoop (dart:async/schedule_microtask.dart:50:5)
#19     _runPendingImmediateCallback (dart:isolate-patch/isolate_patch.dart:116:13)
#20     _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:173:5)

avoid repeating invalid links

Run the command linkcheck https://webdev.dartlang.org. Part of the output generated will be as shown below. Note that the two 404s are repeated 5 times. It would be nice to list the erroneous links only once.

https://webdev.dartlang.org/angular/guide
- (534:7) 'Cookbook' => https://webdev.dartlang.org/angular/cookbook/ (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/cookbook/ (301)
    - /angular/cookbook (404)
- (556:21) 'Change Log' => https://webdev.dartlang.org/angular/guide/change-log.html (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/guide/change-log.html (301)
    - /angular/guide/change-log (404)
- (534:7) 'Cookbook' => https://webdev.dartlang.org/angular/cookbook/ (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/cookbook/ (301)
    - /angular/cookbook (404)
- (556:21) 'Change Log' => https://webdev.dartlang.org/angular/guide/change-log.html (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/guide/change-log.html (301)
    - /angular/guide/change-log (404)
- (534:7) 'Cookbook' => https://webdev.dartlang.org/angular/cookbook/ (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/cookbook/ (301)
    - /angular/cookbook (404)
- (556:21) 'Change Log' => https://webdev.dartlang.org/angular/guide/change-log.html (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/guide/change-log.html (301)
    - /angular/guide/change-log (404)
- (534:7) 'Cookbook' => https://webdev.dartlang.org/angular/cookbook/ (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/cookbook/ (301)
    - /angular/cookbook (404)
- (556:21) 'Change Log' => https://webdev.dartlang.org/angular/guide/change-log.html (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/guide/change-log.html (301)
    - /angular/guide/change-log (404)
- (534:7) 'Cookbook' => https://webdev.dartlang.org/angular/cookbook/ (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/cookbook/ (301)
    - /angular/cookbook (404)
- (556:21) 'Change Log' => https://webdev.dartlang.org/angular/guide/change-log.html (HTTP 301 => 404)
  - redirect path:
    - https://webdev.dartlang.org/angular/guide/change-log.html (301)
    - /angular/guide/change-log (404)
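
One way to get the requested behavior is to drop repeated findings while keeping their first-seen order. The Python sketch below post-processes a list of findings; it is not part of linkcheck, and the `(location, anchor text, destination)` tuple layout is an assumption made for this example.

```python
def unique_in_order(findings):
    """Remove duplicate findings while preserving first-seen order."""
    # dict keys keep insertion order, so this drops later repeats only.
    return list(dict.fromkeys(findings))


# The two broken links from the output above, repeated five times each.
cookbook = ("534:7", "Cookbook", "https://webdev.dartlang.org/angular/cookbook/")
change_log = (
    "556:21",
    "Change Log",
    "https://webdev.dartlang.org/angular/guide/change-log.html",
)
reported = [cookbook, change_log] * 5

for location, text, destination in unique_in_order(reported):
    print(f"- ({location}) '{text}' => {destination}")
```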

Bad state: No element

This is the output I got and it asked me to create an issue with this.
What I am trying to do is take a list of URLs from a file as input and check them for 404s. The command is: linkcheck.bat -e -i 1.csv > sanitize1.txt

Kills Linux System Memory Runaway

Tried linkcheck out on my system:
Operating System: Linux Mint 19
Kernel: Linux 4.15.0-33-generic
Architecture: x86-64
4 GHz i5-8300H
15GB RAM
Dart VM version: 2.0.0
Pub 2.0.0
linkcheck version 2.0.4

Both times I tried it, it crashed my system. Both times, I was trying to find all dead links on a company website. When I ran linkcheck <privateURL>, system resources were at 3GB RAM with a load of less than 0.5. After about 700 links, my CPU fans were going nuts and RAM was at 10GB in use. By 1200 links, RAM was maxed out at 15GB with an additional 2.5GB of swap. By 2193 links, RAM was still maxed out, swap was at 5.25GB, and then the whole system locked up. The UI stopped responding and I could not do anything. The last time, I waited 8 minutes after the UI locked up before doing a hard shutdown.

Has anybody tested this on a Linux system?

Add 'max depth' argument

When walking the package site (for instance) there are a lot of random walks that can yield a lot of URLs.

It'd be nice to say "walk at most 5 deep from the source URL"
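
To illustrate the idea, here is a minimal Python sketch of a depth-limited breadth-first walk over an in-memory link graph. It is not linkcheck code and no such option is documented above; the graph, the function name, and the `max_depth` parameter are all hypothetical.

```python
from collections import deque


def crawl_with_max_depth(start, links, max_depth):
    """Return {page: depth} for pages reachable within max_depth hops."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if depths[page] == max_depth:
            # Pages at the depth limit are recorded but not expanded.
            continue
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site structure: each page maps to the pages it links to.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/a/1"],
    "/a/1": ["/a/1/x"],
}

print(crawl_with_max_depth("/", SITE, max_depth=2))
```

Because the walk is breadth-first, each page is recorded at its shortest distance from the start URL, so the limit behaves predictably even when pages link back to each other.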

unable to connect to https://localhost

The only message is "connection failed". The site uses a self-signed certificate; with other similar utilities, I have to turn off certificate checking in some fashion. I can crawl the live public version of the same site, which has a valid cert, so I'd guess that's some part of the problem.

Install doc - finding pub

Hi,

I installed dart following the instructions from your linux link https://www.dartlang.org/install/linux, and then:

$ pub global activate linkcheck
Command 'pub' not found

Checking the dart .deb package, it puts its bins under /usr/lib/dart/bin/, which of course is not in my PATH, so it might be useful to add a note helping the user find pub.

Thanks.

Unhandled exception for charset

I get this error when running with external link checking enabled.

Crawling: 568Unhandled exception:
NoSuchMethodError: The getter 'charset' was called on null.
Receiver: null
Tried calling: charset
#0      Object.noSuchMethod (dart:core-patch/object_patch.dart:51:5)
#1      checkServer (package:linkcheck/src/worker/worker.dart:75:38)
<asynchronous suspension>
#2      worker.<anonymous closure> (package:linkcheck/src/worker/worker.dart:199:42)
<asynchronous suspension>
#3      _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#4      _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#5      _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#6      _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:764:19)
#7      _StreamController._add (dart:async/stream_controller.dart:640:7)
#8      _StreamController.add (dart:async/stream_controller.dart:586:5)
#9      _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#10     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#11     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#12     _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:764:19)
#13     _StreamController._add (dart:async/stream_controller.dart:640:7)
#14     _StreamController.add (dart:async/stream_controller.dart:586:5)
#15     _StreamSinkWrapper.add (dart:async/stream_controller.dart:858:13)
#16     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#17     CastStreamSubscription._onData (dart:_internal/async_cast.dart:81:11)
#18     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#19     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#20     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#21     _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:764:19)
#22     _StreamController._add (dart:async/stream_controller.dart:640:7)
#23     _StreamController.add (dart:async/stream_controller.dart:586:5)
#24     _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:172:12)
575Unhandled exception:
NoSuchMethodError: The getter 'charset' was called on null.
Receiver: null
Tried calling: charset
#0      Object.noSuchMethod (dart:core-patch/object_patch.dart:51:5)
#1      checkServer (package:linkcheck/src/worker/worker.dart:75:38)
<asynchronous suspension>
#2      worker.<anonymous closure> (package:linkcheck/src/worker/worker.dart:199:42)
<asynchronous suspension>
#3      _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#4      _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#5      _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#6      _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:764:19)
#7      _StreamController._add (dart:async/stream_controller.dart:640:7)
#8      _StreamController.add (dart:async/stream_controller.dart:586:5)
#9      _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#10     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#11     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#12     _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:764:19)
#13     _StreamController._add (dart:async/stream_controller.dart:640:7)
#14     _StreamController.add (dart:async/stream_controller.dart:586:5)
#15     _StreamSinkWrapper.add (dart:async/stream_controller.dart:858:13)
#16     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#17     CastStreamSubscription._onData (dart:_internal/async_cast.dart:81:11)
#18     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#19     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#20     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#21     _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:764:19)
#22     _StreamController._add (dart:async/stream_controller.dart:640:7)
#23     _StreamController.add (dart:async/stream_controller.dart:586:5)
#24     _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:172:12)

My command is:

linkcheck -e --skip-file linkcheck-skip-file.txt https://site-local.fusionauth.io

Docs: Does `linkcheck` process / respect `robots.txt`?

Hi, I'm not sure if linkcheck respects robots.txt. The sentence in the README.md isn't clear to me.

It goes without saying that linkcheck honors robots.txt and throttles itself when accessing websites.

If it does, it would be nice to have a parameter to disable this behaviour.
Example: Some pages are disabled in robots.txt, but should be checked.

Unhandled exception

Hi and thanks for this tool, it seems useful!

Env

  • MacOs Mojave
  • Dart and Linkcheck installed as recommended on the repo's readme.
  • Webserver started with npm http-server (if that matters)

Problem

I'm running a local webserver with 4 folders at the root, and my index file. I expect the whole server to contain around 10 broken links (tested with other tools).

When I check with linkcheck, I get the following output:

Crawling: 2Unhandled exception:
NoSuchMethodError: The getter 'primaryType' was called on null.
Receiver: null
Tried calling: primaryType
#0      Object.noSuchMethod (dart:core/runtime/libobject_patch.dart:50:5)
#1      DestinationResult.updateFromResponse (package:linkcheck/src/destination.dart:327:48)
#2      checkPage (package:linkcheck/src/worker/worker.dart:127:11)
<asynchronous suspension>
#3      worker.<anonymous closure> (package:linkcheck/src/worker/worker.dart:192:29)
<asynchronous suspension>
#4      _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#5      _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#6      _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#7      _SyncStreamController._sendData (dart:async/stream_controller.dart:763:19)
#8      _StreamController._add (dart:async/stream_controller.dart:639:7)
#9      _StreamController.add (dart:async/stream_controller.dart:585:5)
#10     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#11     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#12     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#13     _SyncStreamController._sendData (dart:async/stream_controller.dart:763:19)
#14     _StreamController._add (dart:async/stream_controller.dart:639:7)
#15     _StreamController.add (dart:async/stream_controller.dart:585:5)
#16     _StreamSinkWrapper.add (dart:async/stream_controller.dart:858:13)
#17     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#18     CastStreamSubscription._onData (dart:_internal/async_cast.dart:81:11)
#19     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#20     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#21     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#22     _SyncStreamController._sendData (dart:async/stream_controller.dart:763:19)
#23     _StreamController._add (dart:async/stream_controller.dart:639:7)
#24     _StreamController.add (dart:async/stream_controller.dart:585:5)
#25     _RawReceivePortImpl._handleMessage (dart:isolate/runtime/libisolate_patch.dart:171:12)
Unhandled exception:
NoSuchMethodError: The getter 'primaryType' was called on null.
Receiver: null
Tried calling: primaryType
#0      Object.noSuchMethod (dart:core/runtime/libobject_patch.dart:50:5)
#1      DestinationResult.updateFromResponse (package:linkcheck/src/destination.dart:327:48)
#2      checkPage (package:linkcheck/src/worker/worker.dart:127:11)
<asynchronous suspension>
#3      worker.<anonymous closure> (package:linkcheck/src/worker/worker.dart:192:29)
<asynchronous suspension>
#4      _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#5      _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#6      _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#7      _SyncStreamController._sendData (dart:async/stream_controller.dart:763:19)
#8      _StreamController._add (dart:async/stream_controller.dart:639:7)
#9      _StreamController.add (dart:async/stream_controller.dart:585:5)
#10     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#11     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#12     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#13     _SyncStreamController._sendData (dart:async/stream_controller.dart:763:19)
#14     _StreamController._add (dart:async/stream_controller.dart:639:7)
#15     _StreamController.add (dart:async/stream_controller.dart:585:5)
#16     _StreamSinkWrapper.add (dart:async/stream_controller.dart:858:13)
#17     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#18     CastStreamSubscription._onData (dart:_internal/async_cast.dart:81:11)
#19     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#20     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#21     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#22     _SyncStreamController._sendData (dart:async/stream_controller.dart:763:19)
#23     _StreamController._add (dart:async/stream_controller.dart:639:7)
#24     _StreamController.add (dart:async/stream_controller.dart:585:5)
#25     _RawReceivePortImpl._handleMessage (dart:isolate/runtime/libisolate_patch.dart:171:12)
(repeat the same exception)

But the CLI still runs! After waiting a couple of minutes, I get:

Errors. Checked 4892 links, 161 destination URLs (82 ignored), 93 have errors, 0 have warnings.

Which is not correct, but... it runs.

Any idea?

processing ng docs site results in FormatException: Expecting '='

Running linkcheck over the ng docs dev site:

linkcheck https://angulardart-org-dev.firebaseapp.com

results in

.../static-assets/styles.css
Unhandled exception:
FormatException: Expecting '=' (at character 24)
data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg' width='16' ...
                       ^

Context: we're using the dartdoc generated pages for https://github.com/dart-lang/angular2 via https://github.com/dart-lang/site-webdev. Running linkcheck over the resulting webdev site resulted in the error above.

cc @kwalrath @ericjim

Retries for potentially transient errors

First of all, thank you for this great tool, it's saved me a heap of time.

Occasionally, linkcheck will fail because an external site returned an HTTP 503 or something like that. It would be great if linkcheck could be configured to retry in this case rather than failing the entire invocation because one site is down.
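
The kind of retry behavior requested here could look roughly like the Python sketch below. It is not an existing linkcheck feature: the set of status codes treated as transient, the exponential backoff, and all names are assumptions, and `fetch` is a stand-in for whatever function performs the request and returns a status code.

```python
import time

# Assumption: these statuses are treated as potentially transient.
TRANSIENT_STATUSES = {502, 503, 504}


def fetch_with_retries(fetch, url, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) until it returns a non-transient status or attempts run out."""
    status = None
    for attempt in range(attempts):
        status = fetch(url)
        if status not in TRANSIENT_STATUSES:
            return status
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return status


# Demo with a fake server that fails twice before succeeding.
fake_responses = iter([503, 503, 200])
recorded_delays = []
result = fetch_with_retries(
    lambda url: next(fake_responses),
    "https://example.com/",
    sleep=recorded_delays.append,  # record delays instead of really sleeping
)
print(result, recorded_delays)
```

Passing `sleep` as a parameter keeps the sketch easy to test, since the waiting can be replaced with a recorder as in the demo.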

Skip patterns ending with # do not seem to work.

I've been testing the skip pattern

/angular/guide/server-communication#

over site-webdev.

Here is part of the debug output:

Crawl will start on the following URLs: [http://localhost:4001/]
Crawl will check pages only on URLs satisfying: {http://localhost:4001/**}
Crawl will skip links that match patterns: UrlSkipper</angular/api/.*apiFilter, data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg', /angular/api/, /angular/guide/router(\.html)?($|#), /angular/guide/change-log.html$, /angular/cookbook/, /angular/guide/appmodule.html$, /angular/guide/server-communication#, /angular/api/static-assets/fonts, /angular/api/(docs|examples)/, /angular/api/.*/index/>
Crawl will check the following servers (and their robots.txt) first: {localhost:4001}
...

http://localhost:4001/angular/guide/server-communication
- (533:18) 'RxJS Obs..' => http://localhost:4001/angular/guide/server-communication#rxjs (HTTP 200 but missing anchor)
- (535:18) 'Enabling..' => http://localhost:4001/angular/guide/server-communication#enable-rxjs-operators (HTTP 200 but missing anchor)
...

Stats:
   14465 links
     331 destination URLs
     347 URLs ignored
      12 warnings
       0 errors

It should be skipping .../server-communication#rxjs.

Parameter to ignore robots.txt

Congratulations on the project, this is awesome.

I have a specific use case with this project: I have a staging web project that I don't want to be indexed by Google. In other words, my robots.txt has a single rule that disallows all links, and in this case I can't use linkcheck to check the site.

It would be great to have a parameter that tells linkcheck not to honor robots.txt.

docker skip-file

Is there a way to utilize a skip file while running linkcheck through docker?

If I do something like
docker run filiph/linkcheck http://example.com/ --skip-file ./skipfile.txt

I get:
Can't read skip file './skipfile.txt': FileSystemException: Cannot open file, path = './skipfile.txt' (OS Error: No such file or directory, errno = 2)

which makes sense as the skip file does not exist in the docker environment. Is there a way to work around this that I'm missing?

Thanks!

Exception during running crawling "NoSuchMethodError: The getter 'charset' was called on null."

Complete Error Message:

Crawling: 134Unhandled exception:
NoSuchMethodError: The getter 'charset' was called on null.
Receiver: null
Tried calling: charset
#0 checkPage (package:linkcheck/src/worker/worker.dart:149)
#1 _RootZone.runUnary (dart:async/zone.dart:1379)
#2 _FutureListener.handleValue (dart:async/future_impl.dart:137)
#3 Future._propagateToListeners.handleValueCallback (dart:async/future_impl.dart:678)
#4 Future._propagateToListeners (dart:async/future_impl.dart:707)
#5 Future._completeWithValue (dart:async/future_impl.dart:522)
#6 _AsyncAwaitCompleter.complete (dart:async-patch/async_patch.dart:30)
#7 _completeOnAsyncReturn (dart:async-patch/async_patch.dart:288)
#8 _fetchHead (package:linkcheck/src/worker/worker.dart:0)
#9 _RootZone.runUnary (dart:async/zone.dart:1379)
#10 _FutureListener.handleValue (dart:async/future_impl.dart:137)
#11 Future._propagateToListeners.handleValueCallback (dart:async/future_impl.dart:678)
#12 Future._propagateToListeners (dart:async/future_impl.dart:707)
#13 Future._completeWithValue (dart:async/future_impl.dart:522)
#14 Future.timeout.<anonymous closure> (dart:async/future_impl.dart:776)
#15 _RootZone.runUnary (dart:async/zone.dart:1379)
#16 _FutureListener.handleValue (dart:async/future_impl.dart:137)
#17 Future._propagateToListeners.handleValueCallback (dart:async/future_impl.dart:678)
#18 Future._propagateToListeners (dart:async/future_impl.dart:707)
#19 Future._completeWithValue (dart:async/future_impl.dart:522)
#20 Future.wait.<anonymous closure> (dart:async/future.dart:400)
#21 _RootZone.runUnary (dart:async/zone.dart:1379)
#22 _FutureListener.handleValue (dart:async/future_impl.dart:137)
#23 Future._propagateToListeners.handleValueCallback (dart:async/future_impl.dart:678)
#24 Future._propagateToListeners (dart:async/future_impl.dart:707)
#25 Future._completeWithValue (dart:async/future_impl.dart:522)
#26 Future._asyncComplete.<anonymous closure> (dart:async/future_impl.dart:552)
#27 _microtaskLoop (dart:async/schedule_microtask.dart:41)
#28 _startMicrotaskLoop (dart:async/schedule_microtask.dart:50)
#29 _Timer._runTimers (dart:isolate-patch/timer_impl.dart:391)
#30 _Timer._handleMessage (dart:isolate-patch/timer_impl.dart:416)
#31 _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:172)
135Unhandled exception:
NoSuchMethodError: The getter 'charset' was called on null.
Receiver: null
Tried calling: charset
#0 checkPage (package:linkcheck/src/worker/worker.dart:149)
#1 _RootZone.runUnary (dart:async/zone.dart:1379)
#2 _FutureListener.handleValue (dart:async/future_impl.dart:137)
#3 Future._propagateToListeners.handleValueCallback (dart:async/future_impl.dart:678)
#4 Future._propagateToListeners (dart:async/future_impl.dart:707)
#5 Future._completeWithValue (dart:async/future_impl.dart:522)
#6 _AsyncAwaitCompleter.complete (dart:async-patch/async_patch.dart:30)
#7 _completeOnAsyncReturn (dart:async-patch/async_patch.dart:288)
#8 _fetchHead (package:linkcheck/src/worker/worker.dart:0)
#9 _RootZone.runUnary (dart:async/zone.dart:1379)
#10 _FutureListener.handleValue (dart:async/future_impl.dart:137)
#11 Future._propagateToListeners.handleValueCallback (dart:async/future_impl.dart:678)
#12 Future._propagateToListeners (dart:async/future_impl.dart:707)
#13 Future._completeWithValue (dart:async/future_impl.dart:522)
#14 Future.timeout.<anonymous closure> (dart:async/future_impl.dart:776)
#15 _RootZone.runUnary (dart:async/zone.dart:1379)
#16 _FutureListener.handleValue (dart:async/future_impl.dart:137)
#17 Future._propagateToListeners.handleValueCallback (dart:async/future_impl.dart:678)
#18 Future._propagateToListeners (dart:async/future_impl.dart:707)
#19 Future._completeWithValue (dart:async/future_impl.dart:522)
#20 Future.wait.<anonymous closure> (dart:async/future.dart:400)
#21 _RootZone.runUnary (dart:async/zone.dart:1379)
#22 _FutureListener.handleValue (dart:async/future_impl.dart:137)
#23 Future._propagateToListeners.handleValueCallback (dart:async/future_impl.dart:678)
#24 Future._propagateToListeners (dart:async/future_impl.dart:707)
#25 Future._completeWithValue (dart:async/future_impl.dart:522)
#26 Future._asyncComplete.<anonymous closure> (dart:async/future_impl.dart:552)
#27 _microtaskLoop (dart:async/schedule_microtask.dart:41)
#28 _startMicrotaskLoop (dart:async/schedule_microtask.dart:50)
#29 _Timer._runTimers (dart:isolate-patch/timer_impl.dart:391)
#30 _Timer._handleMessage (dart:isolate-patch/timer_impl.dart:416)
#31 _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:172)
Done crawling.

[QUESTION] "HTTP 200 but missing anchor" and "connection failed"?

I am getting a lot of "HTTP 200 but missing anchor" and "connection failed" errors when scanning my site, but when I try the links manually they seem fine. Unfortunately, the site can only be reached from within my company's network, so I can't post the link. What do these errors mean, and is there a way to ignore them?
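Going by its wording, "HTTP 200 but missing anchor" suggests that the page was fetched successfully but nothing in the returned HTML matches the `#fragment` of the link. The sketch below shows one way such a check can be done with Python's standard library. It is an assumption about the kind of test involved, not linkcheck's actual implementation, and the `intranet.example` URLs and sample HTML are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class AnchorCollector(HTMLParser):
    """Collect every value that can serve as a fragment target."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Any element's `id`, or the legacy `name` on <a>, is a target.
            if value and (name == "id" or (tag == "a" and name == "name")):
                self.anchors.add(value)


def has_anchor(html: str, url: str) -> bool:
    """True if the URL has no fragment or the fragment exists in `html`."""
    fragment = urldefrag(url).fragment
    if not fragment:
        return True
    collector = AnchorCollector()
    collector.feed(html)
    return fragment in collector.anchors


PAGE = '<h2 id="setup">Setup</h2><a name="faq"></a>'
print(has_anchor(PAGE, "http://intranet.example/guide#setup"))  # True
print(has_anchor(PAGE, "http://intranet.example/guide#rxjs"))   # False
```

A check of this kind only sees the HTML as served. Since linkcheck does not render JavaScript, anchors that a script adds after load would look fine in a browser yet be absent from the served HTML.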

FileSystemException: writeFrom failed, path = '' (OS Error: Broken pipe, errno = 32)

FileSystemException: writeFrom failed, path = '' (OS Error: Broken pipe, errno = 32)
#0 _RandomAccessFile.writeFromSync (dart:io/file_impl.dart:879)
#1 _StdConsumer.addStream.<anonymous closure> (dart:io/stdio.dart:344)
#2 _rootRunUnary (dart:async/zone.dart:1132)
#3 _CustomZone.runUnary (dart:async/zone.dart:1029)
#4 _CustomZone.runUnaryGuarded (dart:async/zone.dart:931)
#5 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336)
#6 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263)
#7 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:764)
#8 _StreamController._add (dart:async/stream_controller.dart:640)
#9 _StreamController.add (dart:async/stream_controller.dart:586)
#10 _StreamSinkImpl.add (dart:io/io_sink.dart:156)
#11 _IOSinkImpl.write (dart:io/io_sink.dart:289)
#12 _IOSinkImpl.writeln (dart:io/io_sink.dart:309)
#13 _StdSink.writeln (dart:io/stdio.dart:341)
#14 crawl.print (package:linkcheck/src/crawl.dart:37)
#15 crawl.<anonymous closure> (package:linkcheck/src/crawl.dart:133)
#16 _AsyncAwaitCompleter.start (dart:async-patch/async_patch.dart:43)
#17 crawl.<anonymous closure> (package:linkcheck/src/crawl.dart:123)
#18 _rootRunUnary (dart:async/zone.dart:1132)
#19 _CustomZone.runUnary (dart:async/zone.dart:1029)
#20 _CustomZone.runUnaryGuarded (dart:async/zone.dart:931)
#21 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336)
#22 _DelayedData.perform (dart:async/stream_impl.dart:591)
#23 _StreamImplEvents.handleNext (dart:async/stream_impl.dart:707)
#24 _PendingEvents.schedule.<anonymous closure> (dart:async/stream_impl.dart:667)
#25 _rootRun (dart:async/zone.dart:1120)
#26 _CustomZone.run (dart:async/zone.dart:1021)
#27 _CustomZone.runGuarded (dart:async/zone.dart:923)
#28 _CustomZone.bindCallbackGuarded.<anonymous closure> (dart:async/zone.dart:963)
#29 _rootRun (dart:async/zone.dart:1124)
#30 _CustomZone.run (dart:async/zone.dart:1021)
#31 _CustomZone.runGuarded (dart:async/zone.dart:923)
#32 _CustomZone.bindCallbackGuarded.<anonymous closure> (dart:async/zone.dart:963)
#33 _microtaskLoop (dart:async/schedule_microtask.dart:41)
#34 _startMicrotaskLoop (dart:async/schedule_microtask.dart:50)
#35 _runPendingImmediateCallback (dart:isolate-patch/isolate_patch.dart:116)
#36 _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:173)

Unhandled exception:
HttpException: Connection closed while receiving data, uri = https://hidden_website.com
#0 _HttpIncoming.listen.<anonymous closure> (dart:_http/http_impl.dart:161)
#1 _invokeErrorHandler (dart:async/async_error.dart:17)
#2 _HandleErrorStream._handleError (dart:async/stream_pipe.dart:286)
#3 _ForwardingStreamSubscription._handleError (dart:async/stream_pipe.dart:168)
#4 _RootZone.runBinaryGuarded (dart:async/zone.dart:1326)
#5 _BufferingStreamSubscription._sendError.sendError (dart:async/stream_impl.dart:355)
#6 _BufferingStreamSubscription._sendError (dart:async/stream_impl.dart:373)
#7 _BufferingStreamSubscription._addError (dart:async/stream_impl.dart:272)
#8 _SyncStreamControllerDispatch._sendError (dart:async/stream_controller.dart:768)
#9 _StreamController._addError (dart:async/stream_controller.dart:648)
#10 _StreamController.addError (dart:async/stream_controller.dart:600)
#11 _HttpParser._onDone (dart:_http/http_parser.dart:822)
#12 _RootZone.runGuarded (dart:async/zone.dart:1302)
#13 _BufferingStreamSubscription._sendDone.sendDone (dart:async/stream_impl.dart:389)
#14 _BufferingStreamSubscription._sendDone (dart:async/stream_impl.dart:399)
#15 _BufferingStreamSubscription._close (dart:async/stream_impl.dart:283)
#16 _SyncStreamControllerDispatch._sendDone (dart:async/stream_controller.dart:772)
#17 _StreamController._closeUnchecked (dart:async/stream_controller.dart:629)
#18 _StreamController.close (dart:async/stream_controller.dart:622)
#19 _Socket._onDone (dart:io-patch/socket_patch.dart:1844)
#20 _RootZone.runGuarded (dart:async/zone.dart:1302)
#21 _BufferingStreamSubscription._sendDone.sendDone (dart:async/stream_impl.dart:389)
#22 _BufferingStreamSubscription._sendDone (dart:async/stream_impl.dart:399)
#23 _BufferingStreamSubscription._close (dart:async/stream_impl.dart:283)
#24 _SyncStreamControllerDispatch._sendDone (dart:async/stream_controller.dart:772)
#25 _StreamController._closeUnchecked (dart:async/stream_controller.dart:629)
#26 _StreamController.close (dart:async/stream_controller.dart:622)
#27 _RawSecureSocket._close (dart:io/secure_socket.dart:648)
#28 _RawSecureSocket.shutdown (dart:io/secure_socket.dart:670)
#29 _RawSecureSocket.close (dart:io/secure_socket.dart:623)
#30 _Socket._closeRawSocket (dart:io-patch/socket_patch.dart:1800)
#31 _Socket.destroy (dart:io-patch/socket_patch.dart:1732)
#32 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1817)
#33 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#34 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#35 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#36 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#37 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#38 _ConnectionTarget.close (dart:_http/http_impl.dart:1955)
#39 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#40 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#41 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#42 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#43 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#44 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#45 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#46 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#47 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#48 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#49 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#50 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#51 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#52 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#53 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#54 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#55 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#56 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#57 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#58 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#59 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#60 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#61 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#62 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#63 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#64 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#65 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#66 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#67 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#68 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#69 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#70 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#71 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#72 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#73 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#74 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#75 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#76 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#77 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#78 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#79 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#80 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#81 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#82 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#83 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#84 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#85 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#86 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#87 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#88 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#89 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#90 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#91 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#92 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#93 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#94 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#95 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#96 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#97 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#98 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#99 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#100 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#101 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#102 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#103 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#104 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#105 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#106 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#107 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#108 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#109 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#110 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#111 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#112 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#113 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#114 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#115 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#116 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#117 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#118 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#119 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#120 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#121 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#122 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#123 _ConnectionTarget.close (dart:_http/http_impl.dart:1955)
#124 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#125 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#126 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#127 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#128 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#129 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#130 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#131 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#132 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#133 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#134 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#135 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#136 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#137 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#138 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#139 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#140 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#141 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#142 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#143 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#144 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#145 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#146 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#147 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#148 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#149 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#150 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#151 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#152 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#153 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#154 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#155 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#156 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#157 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#158 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#159 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#160 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#161 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#162 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#163 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#164 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#165 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#166 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#167 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#168 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#169 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#170 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#171 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#172 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#173 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#174 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#175 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#176 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#177 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#178 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#179 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#180 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#181 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#182 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#183 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#184 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#185 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#186 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#187 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#188 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#189 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#190 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#191 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#192 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#193 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#194 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#195 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#196 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#197 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#198 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#199 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#200 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#201 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#202 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#203 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#204 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#205 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#206 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#207 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#208 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#209 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#210 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#211 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#212 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#213 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#214 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#215 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#216 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#217 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#218 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#219 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#220 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#221 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#222 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#223 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#224 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#225 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#226 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#227 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#228 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#229 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#230 _HttpClient._connectionsChanged (dart:_http/http_impl.dart:2275)
#231 _HttpClient._connectionClosed (dart:_http/http_impl.dart:2269)
#232 _HttpClientConnection.destroy (dart:_http/http_impl.dart:1816)
#233 _ConnectionTarget.close (dart:_http/http_impl.dart:1958)
#234 _HttpClient._closeConnections (dart:_http/http_impl.dart:2281)
#235 _HttpClient.close (dart:_http/http_impl.dart:2152)
#236 worker.<anonymous closure> (package:linkcheck/src/worker/worker.dart:186)
#237 _AsyncAwaitCompleter.start (dart:async-patch/async_patch.dart:43)
#238 worker.<anonymous closure> (package:linkcheck/src/worker/worker.dart:183)
#239 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314)
#240 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336)
#241 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263)
#242 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:764)
#243 _StreamController._add (dart:async/stream_controller.dart:640)
#244 _StreamController.add (dart:async/stream_controller.dart:586)
#245 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314)
#246 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336)
#247 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263)
#248 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:764)
#249 _StreamController._add (dart:async/stream_controller.dart:640)
#250 _StreamController.add (dart:async/stream_controller.dart:586)
#251 _StreamSinkWrapper.add (dart:async/stream_controller.dart:858)
#252 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314)
#253 CastStreamSubscription._onData (dart:_internal/async_cast.dart:81)
#254 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314)
#255 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336)
#256 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263)
#257 _SyncStreamControllerDispatch._sendData (dart:async/stream_controller.dart:764)
#258 _StreamController._add (dart:async/stream_controller.dart:640)
#259 _StreamController.add (dart:async/stream_controller.dart:586)
#260 _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:172)

Can't pub activate as of 2.0.0-dev.61 w/ --preview-dart-2

> export DART_VM_OPTIONS=--preview-dart-2
> pub global activate linkcheck
Package linkcheck is currently active at version 1.0.6.
Resolving dependencies... (2.0s)
+ args 0.13.7 (1.4.3 available)
+ async 2.0.7
+ charcode 1.1.1
+ collection 1.14.9
+ console 2.2.4
...
+ vector_math 1.4.7 (2.0.7 available)
Precompiling executables... (1.1s)
Failed to precompile linkcheck:linkcheck:
file:///Users/chalin/.pub-cache/hosted/pub.dartlang.org/console-2.2.4/lib/src/base.dart:216:40: Error: A value of type 'dart.core::List<dynamic>' can't be assigned to a variable of type 'dart.core::Iterable<dart.core::int>'.
Try changing the type of the left hand side, or casting the right hand side to 'dart.core::Iterable<dart.core::int>'.
    var str = new String.fromCharCodes(bytes);
                                       ^
file:///Users/chalin/.pub-cache/hosted/pub.dartlang.org/console-2.2.4/lib/src/canvas.dart:27:18: Error: A value of type 'console::PixelSpec' can't be assigned to a variable of type 'dart.core::int'.
Try changing the type of the left hand side, or casting the right hand side to 'dart.core::int'.
      spec = new PixelSpec(color: spec);
                 ^
file:///Users/chalin/.pub-cache/hosted/pub.dartlang.org/console-2.2.4/lib/src/canvas.dart:33:20: Error: A value of type 'dart.core::int' can't be assigned to a variable of type 'console::PixelSpec'.
Try changing the type of the left hand side, or casting the right hand side to 'console::PixelSpec'.
    pixels[x][y] = spec;
                   ^
file:///Users/chalin/.pub-cache/hosted/pub.dartlang.org/console-2.2.4/lib/clut.dart:4:4: Error: A value of type 'dart.core::int' can't be assigned to a variable of type 'dart.core::String'.
Try changing the type of the left hand side, or casting the right hand side to 'dart.core::String'.
  [0, '000000'],
   ^
file:///Users/chalin/.pub-cache/hosted/pub.dartlang.org/console-2.2.4/lib/clut.dart:5:4: Error: A value of type 'dart.core::int' can't be assigned to a variable of type 'dart.core::String'.
Try changing the type of the left hand side, or casting the right hand side to 'dart.core::String'.
  [1, '800000'],
   ^
file:///Users/chalin/.pub-cache/hosted/pub.dartlang.org/console-2.2.4/lib/clut.dart:6:4: Error: A value of type 'dart.core::int' can't be assigned to a variable of type 'dart.core::String'.
Try changing the type of the left hand side, or casting the right hand side to 'dart.core::String'.
  [2, '008000'],
   ^
file:///Users/chalin/.pub-cache/hosted/pub.dartlang.org/console-2.2.4/lib/clut.dart:7:4: Error: A value of type 'dart.core::int' can't be assigned to a variable of type 'dart.core::String'.
Try changing the type of the left hand side, or casting the right hand side to 'dart.core::String'.
  [3, '808000'],
   ^
file:///Users/chalin/.pub-cache/hosted/pub.dartlang.org/console-2.2.4/lib/clut.dart:8:4: Error: A value of type 'dart.core::int' can't be assigned to a variable of type 'dart.core::String'.
Try changing the type of the left hand side, or casting the right hand side to 'dart.core::String'.
  [4, '000080'],
   ^
file:///Users/chalin/.pub-cache/hosted/pub.dartlang.org/console-2.2.4/lib/clut.dart:9:4: Error: A value of type 'dart.core::int' can't be assigned to a variable of type 'dart.core::String'.
Try changing the type of the left hand side, or casting the right hand side to 'dart.core::String'.
  [5, '800080'],
   ^
file:///Users/chalin/.pub-cache/hosted/pub.dartlang.org/console-2.2.4/lib/clut.dart:10:4: Error: A value of type 'dart.core::int' can't be assigned to a variable of type 'dart.core::String'.
Try changing the type of the left hand side, or casting the right hand side to 'dart.core::String'.
  [6, '008080'],
   ^

I realize that the issue is with https://github.com/DirectMyFile/console.dart, but if that package can't be updated, maybe linkcheck should use another package?

cc @kwalrath @kevmoo

Can't save output to text file.

First off, thanks for the tool. Trying to use it for the first time, but cannot get past the problem below...

If I run:
linkcheck www.nobleprog.co.uk
...it runs fine.

If I run:
linkcheck www.nobleprog.co.uk > list.log
...it gets stuck on:
"Crawling..."

I've tried many times on different days, different servers, using different domain names, different log file names, switches such as 2>&1, all to no avail.

Any ideas?

--Daniel

skipped data url reported as invalid

When linkcheck is run over https://webdev.dartlang.org/, the following is reported:

http://localhost:4001/angular/api/static-assets/styles.css
- (391:24) url(...) => data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg' width='16' height='16' viewBox='0 0 16 16'><path fill=' (invalid URL)

even if this skip pattern is used:

^data

Note that the skip pattern is being used:

Done checking: http://localhost:4001/angular/api/static-assets/styles.css (HTTP 200) => 3 links
- will not be checking: data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg' width='16' height='16' viewBox='0 0 16 16'><path fill=' - URL 'data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg' width='16' height='16' viewBox='0 0 16 16'><path fill='' skipped because it was matched by the following regular expressions of skip file './scripts/config/linkcheck-skip-list.txt': ^data (line 12)

But the data URL is nonetheless being reported as invalid. Here is an actual sample entry:

background-image: url("data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg' width='16' height='16' viewBox='0 0 16 16'><path fill='#DDDDDD' d='M6.7,4L5.7,4.9L8.8,8l-3.1,3.1L6.7,12l4-4L6.7,4z'/></svg>");

Skipped links probably shouldn't have validity checks performed on them. (On the other hand, I'd also be curious to know why the data URL is being considered invalid.)
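One observation that may help narrow this down: the URL printed in the report stops at `<path fill='`, which is exactly where the first `#` appears in the sample entry. A possible explanation, offered as an assumption and not confirmed against linkcheck's source, is that the unescaped `#` of the colour value is read as the start of a URL fragment. The sketch below shows that generic URL splitting behaves this way, using Python's `urllib.parse`.

```python
from urllib.parse import urlsplit

# The CSS value from the sample entry above, without the url("...") wrapper.
DATA_URL = (
    "data:image/svg+xml;utf8,<svg xmlns='http://www.w3.org/2000/svg' "
    "width='16' height='16' viewBox='0 0 16 16'>"
    "<path fill='#DDDDDD' d='M6.7,4L5.7,4.9L8.8,8l-3.1,3.1L6.7,12l4-4L6.7,4z'/></svg>"
)

parts = urlsplit(DATA_URL)
before_fragment = DATA_URL.split("#", 1)[0]

print(parts.scheme)           # data
print(before_fragment[-12:])  # <path fill='
print(parts.fragment[:6])     # DDDDDD

# Percent-encoding the '#' keeps the colour inside the URL proper.
escaped = DATA_URL.replace("#", "%23")
print(urlsplit(escaped).fragment == "")  # True
```

If this is what happens, writing the colour as `%23DDDDDD` in the stylesheet would keep the whole SVG in the data URL, although whether linkcheck would then accept it is untested here.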

Unhandled exception always on the same external site

When I run linkcheck on a local site build with external links included in the check,
I always get an unhandled exception on the same external page:

linkcheck -e --skip-file ../my_skip_file.txt --no-connection-failures-as-warnings > ../linkchecker.log 

Unhandled exception:
HttpException: Connection closed while receiving data, uri = https://jira.mariadb.org/robots.txt
#0      checkServer (package:linkcheck/src/worker/worker.dart:81:15)
<asynchronous suspension>
#1      worker.<anonymous closure> (package:linkcheck/src/worker/worker.dart:199:42)
<asynchronous suspension>
#2      _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#3      _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#4      _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#5      _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#6      _StreamController._add (dart:async/stream_controller.dart:640:7)
#7      _StreamController.add (dart:async/stream_controller.dart:586:5)
#8      _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#9      _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#10     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#11     _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#12     _StreamController._add (dart:async/stream_controller.dart:640:7)
#13     _StreamController.add (dart:async/stream_controller.dart:586:5)
#14     _StreamSinkWrapper.add (dart:async/stream_controller.dart:858:13)
#15     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#16     CastStreamSubscription._onData (dart:_internal/async_cast.dart:81:11)
#17     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#18     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#19     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#20     _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#21     _StreamController._add (dart:async/stream_controller.dart:640:7)
#22     _StreamController.add (dart:async/stream_controller.dart:586:5)
#23     _RawReceivePortImpl._handleMessage (dart:isolate/runtime/libisolate_patch.dart:171:12)
INTERNAL ERROR: Sorry! Please open https://github.com/filiph/linkcheck/issues/new in your favorite browser and copy paste the following output there:

Bad state: No element

INTERNAL ERROR

Doing as instructed:

Sorry! Please open https://github.com/filiph/linkcheck/issues/new in your favorite browser and copy paste the following output there:

Bad state: No element
#0      SetMixin.singleWhere (dart:collection/set.dart:271)
#1      crawl.<anonymous closure> (package:linkcheck/src/crawl.dart:255)
#2      _rootRunUnary (dart:async/zone.dart:1132)
#3      _CustomZone.runUnary (dart:async/zone.dart:1029)
#4      _CustomZone.runUnaryGuarded (dart:async/zone.dart:931)
#5      _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336)
#6      _DelayedData.perform (dart:async/stream_impl.dart:591)
#7      _StreamImplEvents.handleNext (dart:async/stream_impl.dart:707)
#8      _PendingEvents.schedule.<anonymous closure> (dart:async/stream_impl.dart:667)
#9      _rootRun (dart:async/zone.dart:1120)
#10     _CustomZone.run (dart:async/zone.dart:1021)
#11     _CustomZone.runGuarded (dart:async/zone.dart:923)
#12     _CustomZone.bindCallbackGuarded.<anonymous closure> (dart:async/zone.dart:963)
#13     _rootRun (dart:async/zone.dart:1124)
#14     _CustomZone.run (dart:async/zone.dart:1021)
#15     _CustomZone.runGuarded (dart:async/zone.dart:923)
#16     _CustomZone.bindCallbackGuarded.<anonymous closure> (dart:async/zone.dart:963)
#17     _microtaskLoop (dart:async/schedule_microtask.dart:41)
#18     _startMicrotaskLoop (dart:async/schedule_microtask.dart:50)
#19     _runPendingImmediateCallback (dart:isolate-patch/isolate_patch.dart:116)
#20     _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:173)

Otherwise the output looks good to me. For instance I do get back valid warnings like:

http://www-vbox.transloadit.com/blog/2011/05/fixing-amazon-s3/
- (704:0) '#s3-expo..' => http://www-vbox.transloadit.com/blog/2011/05/fixing-amazon-s3/#blog-posts (HTTP 200 but missing anchor)

http://www-vbox.transloadit.com/blog/2011/05/support-for-leaving-smaller-images-untouched/
- (629:24) 'full doc..' => http://www-vbox.transloadit.com/docs/transcoding/#image-manipulation-and-resizing (HTTP 200 but missing anchor)

http://www-vbox.transloadit.com/blog/2013/01/improvements-for-how-assembly-crashes-are-handled/
- (630:49) 'the new ..' => http://www-vbox.transloadit.com/accounts/api_settings (HTTP 500)

HTTP 308 Reported as Error

Hello again, and thank you again for this amazing tool! I discovered today that any 308 status code is characterized as an error by the link checker, where I think it probably shouldn't be.

I tried to fix this locally but wasn't able to get anything working 😢 - if someone was able to provide a little guidance I'd be happy to take a stab at a PR!
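For context, 308 (Permanent Redirect) is a standard redirect status alongside 301, 302, 303 and 307. A sketch of a classification that treats it as such, written in Python for illustration. The function name and labels are hypothetical and do not reflect linkcheck's internals:

```python
# Redirect status codes with a Location target: 301, 302, 303, 307 and 308.
REDIRECT_CODES = frozenset({301, 302, 303, 307, 308})


def classify_status(code: int) -> str:
    """Classify an HTTP status code as 'ok', 'redirect' or 'error'."""
    if 200 <= code < 300:
        return "ok"
    if code in REDIRECT_CODES:
        return "redirect"
    return "error"
```

A checker built this way would follow a 308 to its target and judge the link by the final response instead of flagging the redirect itself.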

Servers often behave very differently than filesystems -- which and how?

In your README you state:

Servers often behave very differently than file systems, so validating links on the file system often leads to both false positives and false negatives.

Surely linkchecker could make the same assumptions that static site generators already need to make to produce correct output? If that's not true, or if it would need to make more assumptions than that, could you elaborate on which changes in behavior you saw that make this impossible?

valid links reported as (connection failed) consistently

Any idea why these valid links fail to validate consistently?

linkcheck -e docs-dev.fast.ai/style.html

http://docs-dev.fast.ai/style.html
- (336:189) 'APL' => https://en.wikipedia.org/wiki/APL_(programming_language) (connection failed)
- (347:63) 'Iverson’s' => https://en.wikipedia.org/wiki/Kenneth_E._Iverson (connection failed)

running in debug mode I get:

Killing unresponsive Worker<0>
Done checking: https://en.wikipedia.org/wiki/APL_(programming_language) (connection failed) => 0 links
- BROKEN
Killing unresponsive Worker<2>
Done checking: https://en.wikipedia.org/wiki/Kenneth_E._Iverson (connection failed) => 0 links
- BROKEN

There are a handful of other links to Wikipedia on the same page and they all work. I double-checked manually that they work if I click on them.

Strangely there are 3 almost identical links:

https://en.wikipedia.org/wiki/APL_(programming_language)
https://en.wikipedia.org/wiki/J_(programming_language)
https://en.wikipedia.org/wiki/K_(programming_language)

and only the first out of 3 doesn't validate.

External link checking results in exception: "Invalid argument(s): Truncated URI"

For the full log, see https://travis-ci.org/flutter/website/jobs/455734598. Here is an excerpt:

pub run linkcheck --external --skip-file ./tool/config/linkcheck-skip-list.txt :4002 (logging to /home/travis/tmp/linkcheck-log.txt)
Crawling...
Unhandled exception:
Invalid argument(s): Truncated URI
#0      _Uri._uriDecode (dart:core/uri.dart:2850:13)
#1      Uri.decodeComponent (dart:core/uri.dart:1115:17)
#2      parseHtml.<anonymous closure> (package:linkcheck/src/parsers/html.dart:82:30)
#3      MappedListIterable.elementAt (dart:_internal/iterable.dart:414:29)
#4      ListIterable.toList (dart:_internal/iterable.dart:219:19)
#5      parseHtml (package:linkcheck/src/parsers/html.dart:83:8)
#6      checkPage (package:linkcheck/src/worker/worker.dart:168:10)
<asynchronous suspension>
#7      worker.<anonymous closure> (package:linkcheck/src/worker/worker.dart:192:29)
<asynchronous suspension>
#8      _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#9      _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#10     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#11     _SyncStreamController._sendData (dart:async/stream_controller.dart:763:19)
#12     _StreamController._add (dart:async/stream_controller.dart:639:7)
#13     _StreamController.add (dart:async/stream_controller.dart:585:5)
#14     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#15     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#16     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#17     _SyncStreamController._sendData (dart:async/stream_controller.dart:763:19)
#18     _StreamController._add (dart:async/stream_controller.dart:639:7)
#19     _StreamController.add (dart:async/stream_controller.dart:585:5)
#20     _StreamSinkWrapper.add (dart:async/stream_controller.dart:858:13)
#21     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#22     CastStreamSubscription._onData (dart:_internal/async_cast.dart:81:11)
#23     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#24     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#25     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#26     _SyncStreamController._sendData (dart:async/stream_controller.dart:763:19)
#27     _StreamController._add (dart:async/stream_controller.dart:639:7)
#28     _StreamController.add (dart:async/stream_controller.dart:585:5)
#29     _RawReceivePortImpl._handleMessage (dart:isolate/runtime/libisolate_patch.dart:171:12)
...

cc @kwalrath @sfshaza2
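The trace ends in Uri.decodeComponent, called from parseHtml. In Dart, decoding a string in which a % is not followed by two more characters raises this "Truncated URI" error, so a link containing a stray percent sign is one plausible trigger (an assumption, since the offending link is not shown in the excerpt). A Python sketch of guarding the decode step; both helper names are hypothetical:

```python
import re
from urllib.parse import unquote

# A '%' that is not followed by two hex digits is a malformed escape.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def has_malformed_escape(text: str) -> bool:
    """Report whether text contains a truncated or invalid percent-escape."""
    return _BAD_ESCAPE.search(text) is not None


def safe_decode(text: str) -> str:
    """Decode percent-escapes, returning the input unchanged if any is malformed."""
    if has_malformed_escape(text):
        return text
    return unquote(text)
```

The design choice here is to fall back to the raw string instead of aborting, so one malformed link cannot stop the whole crawl.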

Bad state: No element

#0 SetMixin.singleWhere (dart:collection/set.dart:267:5)
#1 crawl.<anonymous closure> (package:linkcheck/src/crawl.dart:255:20)
#2 _rootRunUnary (dart:async/zone.dart:1198:47)
#3 _CustomZone.runUnary (dart:async/zone.dart:1100:19)
#4 _CustomZone.runUnaryGuarded (dart:async/zone.dart:1005:7)
#5 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:357:11)
#6 _DelayedData.perform (dart:async/stream_impl.dart:611:14)
#7 _StreamImplEvents.handleNext (dart:async/stream_impl.dart:730:11)
#8 _PendingEvents.schedule.<anonymous closure> (dart:async/stream_impl.dart:687:7)
#9 _rootRun (dart:async/zone.dart:1182:47)
#10 _CustomZone.run (dart:async/zone.dart:1093:19)
#11 _CustomZone.runGuarded (dart:async/zone.dart:997:7)
#12 _CustomZone.bindCallbackGuarded.<anonymous closure> (dart:async/zone.dart:1037:23)
#13 _rootRun (dart:async/zone.dart:1190:13)
#14 _CustomZone.run (dart:async/zone.dart:1093:19)
#15 _CustomZone.runGuarded (dart:async/zone.dart:997:7)
#16 _CustomZone.bindCallbackGuarded.<anonymous closure> (dart:async/zone.dart:1037:23)
#17 _microtaskLoop (dart:async/schedule_microtask.dart:41:21)
#18 _startMicrotaskLoop (dart:async/schedule_microtask.dart:50:5)
#19 _runPendingImmediateCallback (dart:isolate-patch/isolate_patch.dart:118:13)
#20 _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:169:5)

Feature Request: Enhancement

Hello,
Can you please look into this?
Screenshot 2019-06-21 at 4 18 56 PM

buildTag has a maxlength property for link text. It is limited to 10 characters, which truncates longer text. Would it be possible to increase the limit (to 100 or more), or to add a command-line parameter that sets the length?
If you don't have time, can I raise a pull request for it?

Thank you,
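The reports quoted on this page show link text cut to 10 characters, with `..` standing in for the remainder (for example 'flutter-..'). A sketch of making that limit a parameter, written in Python for illustration. The function name and default are assumptions inferred from that output, not linkcheck's actual buildTag code:

```python
def truncate(text: str, max_length: int = 10) -> str:
    """Shorten text to at most max_length characters, ending in '..' when cut."""
    if max_length < 2:
        raise ValueError("max_length must be at least 2")
    if len(text) <= max_length:
        return text
    # Reserve two characters for the '..' marker.
    return text[: max_length - 2] + ".."
```

A command-line option would then only need to pass its value through as `max_length`.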

A lot of connection failed

Hi
It seems that when running linkcheck against https://daemons.it, some link fails to validate even after trying more than once, and which link fails changes from one run to the next.

I'm using the dockerfile, if it helps. If I can add more information, just ask.

Thanks for your work, this program is awesome!

[Feature Request] - BasicAuth

Adding BasicAuth, either via direct input (https://user:password@fqdn) or via an extra header, would be a nice option for secure environments.
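The two forms in the request are equivalent on the wire: credentials embedded in the URL are sent as an `Authorization: Basic` header. A Python sketch of that conversion, offered as an illustration of the idea and not as linkcheck behavior. The function name is hypothetical, and the sketch ignores percent-encoded credentials and IPv6 hosts:

```python
import base64
from urllib.parse import urlsplit, urlunsplit


def split_basic_auth(url: str) -> tuple[str, dict[str, str]]:
    """Return the URL without credentials, plus headers carrying them."""
    parts = urlsplit(url)
    if parts.username is None:
        return url, {}
    credentials = f"{parts.username}:{parts.password or ''}"
    token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
    # Rebuild the network location without the user:password@ prefix.
    host = parts.hostname or ""
    netloc = f"{host}:{parts.port}" if parts.port else host
    clean = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return clean, {"Authorization": f"Basic {token}"}
```

Stripping the credentials from the URL also keeps them out of any report that prints checked links.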

Unhandled exception

Unhandled exception:
NoSuchMethodError: The getter 'primaryType' was called on null.
Receiver: null
Tried calling: primaryType
#0      Object.noSuchMethod (dart:core-patch/object_patch.dart:50:5)
#1      DestinationResult.updateFromResponse (package:linkcheck/src/destination.dart:327:48)
#2      checkPage (package:linkcheck/src/worker/worker.dart:127:11)
<asynchronous suspension>
#3      worker.<anonymous closure> (package:linkcheck/src/worker/worker.dart:192:29)
<asynchronous suspension>
#4      _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#5      _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#6      _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#7      _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#8      _StreamController._add (dart:async/stream_controller.dart:640:7)
#9      _StreamController.add (dart:async/stream_controller.dart:586:5)
#10     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#11     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#12     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#13     _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#14     _StreamController._add (dart:async/stream_controller.dart:640:7)
#15     _StreamController.add (dart:async/stream_controller.dart:586:5)
#16     _StreamSinkWrapper.add (dart:async/stream_controller.dart:858:13)
#17     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#18     CastStreamSubscription._onData (dart:_internal/async_cast.dart:81:11)
#19     _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#20     _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#21     _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#22     _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#23     _StreamController._add (dart:async/stream_controller.dart:640:7)
#24     _StreamController.add (dart:async/stream_controller.dart:586:5)
#25     _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:171:12)

VM initialization failed: Invalid vm isolate snapshot seen

I added the native binary linkcheck-linux-x64 version 2.0.11 to a local ~/bin directory which is added to the PATH.

When I try to call linkcheck from another directory I get the following error.

VM initialization failed: Invalid vm isolate snapshot seen

When calling it directly inside the ~/bin directory, everything works as expected. Really nice tool!

internal error in _RawReceivePortImpl

Hey.. thanks for your great tool which we're using in an internal pipeline to check our documentation.. Today we noticed the following failure in a docker container built off google/dart with RUN pub global activate linkcheck crawling an adjacent container web server.

This error seems to be intermittent, and I can't give more info than what is provided below:

+ docker-compose -f docker-compose-utils.yml run linkcheck bash -c linkcheck --no-nice --skip-file jenkins/linkcheck-ignore http://cxta_docs_ext:8080

Creating cxta_docs_ext ... 
Creating cxta_docs_ext ... done
Crawling...
INTERNAL ERROR: Sorry! Please open https://github.com/filiph/linkcheck/issues/new in your favorite browser and copy paste the following output there:

Bad state: No element

#0      SetMixin.singleWhere (dart:collection/set.dart:267:5)
#1      crawl.<anonymous closure> (package:linkcheck/src/crawl.dart:255:20)
#2      _rootRunUnary (dart:async/zone.dart:1198:47)
#3      _CustomZone.runUnary (dart:async/zone.dart:1100:19)
#4      _CustomZone.runUnaryGuarded (dart:async/zone.dart:1005:7)
#5      _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:357:11)
#6      _DelayedData.perform (dart:async/stream_impl.dart:611:14)
#7      _StreamImplEvents.handleNext (dart:async/stream_impl.dart:730:11)
#8      _PendingEvents.schedule.<anonymous closure> (dart:async/stream_impl.dart:687:7)
#9      _rootRun (dart:async/zone.dart:1182:47)
#10     _CustomZone.run (dart:async/zone.dart:1093:19)
#11     _CustomZone.runGuarded (dart:async/zone.dart:997:7)
#12     _CustomZone.bindCallbackGuarded.<anonymous closure> (dart:async/zone.dart:1037:23)
#13     _rootRun (dart:async/zone.dart:1190:13)
#14     _CustomZone.run (dart:async/zone.dart:1093:19)
#15     _CustomZone.runGuarded (dart:async/zone.dart:997:7)
#16     _CustomZone.bindCallbackGuarded.<anonymous closure> (dart:async/zone.dart:1037:23)
#17     _microtaskLoop (dart:async/schedule_microtask.dart:41:21)
#18     _startMicrotaskLoop (dart:async/schedule_microtask.dart:50:5)
#19     _runPendingImmediateCallback (dart:isolate-patch/isolate_patch.dart:118:13)
#20     _RawReceivePortImpl._handleMessage (dart:isolate-patch/isolate_patch.dart:169:5)
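The "Bad state: No element" traces on this page end in SetMixin.singleWhere, called from crawl.dart:255. In Dart, singleWhere throws exactly this error when no element satisfies the test. The semantics can be sketched in Python. What the crawler is looking up at that line is not shown in the reports, so the tolerant variant below only illustrates one possible way of handling a missing element:

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def single_where(items: Iterable[T], test: Callable[[T], bool]) -> T:
    """Return the only matching item; raise if there are zero or several."""
    matches = [item for item in items if test(item)]
    if not matches:
        raise LookupError("No element")
    if len(matches) > 1:
        raise LookupError("Too many elements")
    return matches[0]


def single_where_or_none(items: Iterable[T], test: Callable[[T], bool]) -> Optional[T]:
    """Tolerant variant: return None unless exactly one item matches."""
    matches = [item for item in items if test(item)]
    return matches[0] if len(matches) == 1 else None
```

The intermittent nature of the reports would be consistent with a lookup that occasionally runs before or after the expected element is in the set, though that is speculation without the source at hand.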

Include cookies or headers

I'd like a way to include cookies or custom headers in requests. I sometimes need to linkcheck authenticated URLs and it doesn't look like linkcheck supports this yet.
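A sketch of what such an option could look like, in Python for illustration: repeated `Name: value` strings (a hypothetical command-line format, not an existing linkcheck flag) are parsed into headers and attached to each request. Both function names are assumptions:

```python
import urllib.request


def parse_header_args(values: list[str]) -> dict[str, str]:
    """Turn 'Name: value' strings into a header dictionary."""
    headers = {}
    for raw in values:
        name, sep, value = raw.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"expected 'Name: value', got {raw!r}")
        headers[name.strip()] = value.strip()
    return headers


def build_request(url: str, header_args: list[str]) -> urllib.request.Request:
    # The request is only constructed here; nothing is sent over the network.
    return urllib.request.Request(url, headers=parse_header_args(header_args))
```

Cookies need no separate mechanism in this sketch, since a cookie is just a `Cookie` header.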

FR: Output list of links found

Not sure if this is out of scope - it would be useful for me to get a list of the links which were checked so I can do some post processing.
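For post-processing, the problem lines quoted elsewhere on this page follow a regular shape: position, truncated link text, target URL, and a status in parentheses. A Python sketch of parsing that shape, assuming the format stays as quoted. It covers only reported problems, not the full list of checked links that this request asks for, and the `Problem` type is hypothetical:

```python
import re
from typing import NamedTuple, Optional


class Problem(NamedTuple):
    line: int
    column: int
    anchor_text: str
    url: str
    status: str


# Matches report lines such as:
# - (630:49) 'the new ..' => http://www-vbox.transloadit.com/accounts/api_settings (HTTP 500)
_PROBLEM = re.compile(r"^- \((\d+):(\d+)\) '(.*)' => (\S+) \((.*)\)$")


def parse_problem(line: str) -> Optional[Problem]:
    """Parse one report line; return None for lines in any other shape."""
    match = _PROBLEM.match(line.strip())
    if match is None:
        return None
    row, col, text, url, status = match.groups()
    return Problem(int(row), int(col), text, url, status)
```

Lines that are page URLs or blank simply yield `None`, so the function can be mapped over a whole log file.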

Linkcheck Internal errors and Unhandled exception

Hi Filip,

Again compliments. Linkcheck is the fastest linkchecker I'm aware of. Here are some issues I've found.

Best regards,
Hans

$ linkcheck wordpress.org
Crawling: 459INTERNAL ERROR: Sorry! Please open https://github.com/filiph/linkcheck/issues/new in your favorite browser and copy paste the following output there:

Invalid argument(s): Text "" must be 73 characters long.

736INTERNAL ERROR: Sorry! Please open https://github.com/filiph/linkcheck/issues/new in your favorite browser and copy paste the following output there:

Invalid argument(s): Text "" must be 128 characters long.

$ linkcheck https://autorijschoolokido.nl/
Crawling: 96Unhandled exception:
NoSuchMethodError: The getter 'primaryType' was called on null.
Receiver: null
Tried calling: primaryType
#0 Object.noSuchMethod (dart:core/runtime/libobject_patch.dart:50:5)
#1 DestinationResult.updateFromResponse (package:linkcheck/src/destination.dart:327:48)
#2 checkPage (package:linkcheck/src/worker/worker.dart:127:11)
<asynchronous suspension>
#3 worker.<anonymous closure> (package:linkcheck/src/worker/worker.dart:192:29)
<asynchronous suspension>
#4 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#5 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#6 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#7 _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#8 _StreamController._add (dart:async/stream_controller.dart:640:7)
#9 _StreamController.add (dart:async/stream_controller.dart:586:5)
#10 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#11 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#12 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#13 _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#14 _StreamController._add (dart:async/stream_controller.dart:640:7)
#15 _StreamController.add (dart:async/stream_controller.dart:586:5)
#16 _StreamSinkWrapper.add (dart:async/stream_controller.dart:858:13)
#17 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#18 CastStreamSubscription._onData (dart:_internal/async_cast.dart:81:11)
#19 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#20 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#21 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#22 _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#23 _StreamController._add (dart:async/stream_controller.dart:640:7)
#24 _StreamController.add (dart:async/stream_controller.dart:586:5)
#25 _RawReceivePortImpl._handleMessage (dart:isolate/runtime/libisolate_patch.dart:171:12)
97Unhandled exception:
NoSuchMethodError: The getter 'primaryType' was called on null.
Receiver: null
Tried calling: primaryType
#0 Object.noSuchMethod (dart:core/runtime/libobject_patch.dart:50:5)
#1 DestinationResult.updateFromResponse (package:linkcheck/src/destination.dart:327:48)
#2 checkPage (package:linkcheck/src/worker/worker.dart:127:11)
<asynchronous suspension>
#3 worker.<anonymous closure> (package:linkcheck/src/worker/worker.dart:192:29)
<asynchronous suspension>
#4 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#5 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#6 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#7 _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#8 _StreamController._add (dart:async/stream_controller.dart:640:7)
#9 _StreamController.add (dart:async/stream_controller.dart:586:5)
#10 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#11 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#12 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#13 _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#14 _StreamController._add (dart:async/stream_controller.dart:640:7)
#15 _StreamController.add (dart:async/stream_controller.dart:586:5)
#16 _StreamSinkWrapper.add (dart:async/stream_controller.dart:858:13)
#17 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#18 CastStreamSubscription._onData (dart:_internal/async_cast.dart:81:11)
#19 _RootZone.runUnaryGuarded (dart:async/zone.dart:1314:10)
#20 _BufferingStreamSubscription._sendData (dart:async/stream_impl.dart:336:11)
#21 _BufferingStreamSubscription._add (dart:async/stream_impl.dart:263:7)
#22 _SyncStreamController._sendData (dart:async/stream_controller.dart:764:19)
#23 _StreamController._add (dart:async/stream_controller.dart:640:7)
#24 _StreamController.add (dart:async/stream_controller.dart:586:5)
#25 _RawReceivePortImpl._handleMessage (dart:isolate/runtime/libisolate_patch.dart:171:12)
102
^C
Ctrl-C Terminating crawl.

support link whitelisting

One example of where this would be useful is running the checker over https://webdev.dart-lang.org. We do not yet have an Angular guide for the Router, but some Angular pages already link to the (soon to be created) Router page. It would be great if we could whitelist links to the Router page.

For comparison, broken-link-checker has an excludedKeywords option. We use it like this under angular.io (note the value of the exclude array variable):

gulp.task('link-checker', () => {
  var method = 'get'; // the default 'head' fails for some sites
  var exclude = [
    // Dart API docs aren't working yet; ignore them
    '*/dart/latest/api/*',
    // Somehow the link checker sees ng1 {{...}} in the resource page; ignore it
    'resources/%7B%7Bresource.url%7D%7D',
    // API docs have links directly into GitHub repo sources; these can
    // quickly become invalid, so ignore them for now:
    '*/angular/tree/*',
    // harp.json "bios" for "Ryan Schmukler", URL isn't valid:
    'http://slingingcode.com'
  ];
  var blcOptions = { requestMethod: method, excludedKeywords: exclude};
  return linkChecker({ blcOptions: blcOptions });
});

cc @kwalrath @kevmoo

Skip patterns are ignored for external links

For example, the following skip pattern:

forum/flutter-dev

seems to be ignored for the external link https://groups.google.com/forum/#!forum/flutter-dev:

http://localhost:4002/tos
- (807:12) 'flutter-..' => https://groups.google.com/forum/#!forum/flutter-dev (HTTP 200 but missing anchor)

cc @Sfshaza
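The expected behavior can be stated as a small sketch in Python: every pattern is searched against the complete URL string, fragment included, regardless of whether the link is internal or external. It assumes that each non-empty line of the skip file is a regular expression and that lines starting with `#` are comments. Both are assumptions here, and the function names are hypothetical:

```python
import re


def load_patterns(lines: list[str]) -> list[re.Pattern]:
    """Compile non-empty, non-comment lines as regular expressions."""
    return [
        re.compile(line.strip())
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    ]


def should_skip(url: str, patterns: list[re.Pattern]) -> bool:
    # Search the complete URL string, fragment included, for any pattern.
    return any(p.search(url) for p in patterns)
```

Under these assumptions the pattern `forum/flutter-dev` matches the Google Groups URL above, because the text after `#!` contains it.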
