Git Product home page Git Product logo

linky's Introduction

Linky

Crates.io Changelog

Extract links from Markdown files and check links for brokenness.

Motivation

Maintaining Markdown documentation you often end up with lots of links to tend to. Linky extracts all links and checks them, pointing out the broken ones so you can fix them. Specifically, linky was created to ease maintenance of Markdown documentation on Github.

Installation

Install stable Rust and Cargo:

$ curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

Compile and install linky using Cargo:

$ cargo install linky

Usage

Extracting and checking links

The simplest thing you can do with linky is to extract links from a Markdown file:

$ linky example_site/path/to/example.md
example_site/path/to/example.md:3:  https://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md
example_site/path/to/example.md:4:  https://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md#existing
example_site/path/to/example.md:5:  other.md
example_site/path/to/example.md:6:  non-existing.md
example_site/path/to/example.md:7:  other.md#existing
example_site/path/to/example.md:8:  other.md#non-existing
example_site/path/to/example.md:9:  #heading
example_site/path/to/example.md:10:  #non-existing
example_site/path/to/example.md:11:  #heading-with-code
example_site/path/to/example.md:12:  #HEADING

The output lists all the extracted links along with their respective input files and line numbers.

Enable the --check option to resolve those links:

$ linky --check example_site/path/to/example.md
example_site/path/to/example.md:3: OK https://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md
example_site/path/to/example.md:4: NO_FRAG https://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md#existing
example_site/path/to/example.md:5: OK other.md
example_site/path/to/example.md:6: NO_DOC non-existing.md
example_site/path/to/example.md:7: OK other.md#existing
example_site/path/to/example.md:8: NO_FRAG other.md#non-existing
example_site/path/to/example.md:9: OK #heading
example_site/path/to/example.md:10: NO_FRAG #non-existing
example_site/path/to/example.md:11: OK #heading-with-code
example_site/path/to/example.md:12: CASE_FRAG #HEADING

A status token is now added to each line indicating the outcome of the resolution. An OK token indicates that the resolution succeeded without remarks. For details on how links are resolved see the link resolution section.

Recursive directory traversal

Linky doesn't do directory traversal on its own. Instead it integrates well with find and xargs:

$ find example_site -type f -print0 | xargs -0 linky
example_site/path/to/absolute.md:2:  /path/to/other.md
example_site/path/to/absolute.md:3:  /path/to/non-existing.md
example_site/path/to/absolute.md:4:  /path/to/other.md#existing
example_site/path/to/absolute.md:5:  /path/to/other.md#non-existing
example_site/path/to/example.md:3:  https://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md
example_site/path/to/example.md:4:  https://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md#existing
example_site/path/to/example.md:5:  other.md
example_site/path/to/example.md:6:  non-existing.md
example_site/path/to/example.md:7:  other.md#existing
example_site/path/to/example.md:8:  other.md#non-existing
example_site/path/to/example.md:9:  #heading
example_site/path/to/example.md:10:  #non-existing
example_site/path/to/example.md:11:  #heading-with-code
example_site/path/to/example.md:12:  #HEADING
example_site/path/to/follow.md:2:  http://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md
example_site/path/to/follow.md:3:  http://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md#non-existing
example_site/path/to/fragment.md:2:  https://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md#existing
example_site/path/to/fragment.md:3:  https://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md#non-existing
example_site/path/to/other.md:2:  example.md
example_site/path/to/transform.md:2:  https://github.com/mattias-p/linky/blob/master/example_site/path/to/non-existing.md
example_site/path/to/transform.md:3:  https://github.com/mattias-p/linky/blob/master/example_site/path/to/only-on-example-branch.md

Note: In case your paths contain spaces you may need the find -print0 and xargs -0 options.

Absolute local links

By default linky doesn't resolve absolute local links. This way you're able to know which links do redirect. Also, linky doesn't know the location of the document root.

$ linky --check example_site/path/to/absolute.md
example_site/path/to/absolute.md:2: ABSOLUTE /path/to/other.md
example_site/path/to/absolute.md:3: ABSOLUTE /path/to/non-existing.md
example_site/path/to/absolute.md:4: ABSOLUTE /path/to/other.md#existing
example_site/path/to/absolute.md:5: ABSOLUTE /path/to/other.md#non-existing

If you specify the document root using the --root option linky proceeds with the resolution relative to that directory:

$ linky --check --root=example_site example_site/path/to/absolute.md
example_site/path/to/absolute.md:2: OK /path/to/other.md
example_site/path/to/absolute.md:3: NO_DOC /path/to/non-existing.md
example_site/path/to/absolute.md:4: OK /path/to/other.md#existing
example_site/path/to/absolute.md:5: NO_FRAG /path/to/other.md#non-existing

HTTP redirects

By default linky doesn't follow HTTP redirects. This way you're able to know which links do redirect.

$ linky --check example_site/path/to/follow.md
example_site/path/to/follow.md:2: HTTP_301 http://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md
example_site/path/to/follow.md:3: HTTP_301 http://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md#non-existing

Enable the --follow option to make linky proceed with the resolution across HTTP redirects:

$ linky --check --follow example_site/path/to/follow.md
example_site/path/to/follow.md:2: OK http://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md
example_site/path/to/follow.md:3: NO_FRAG http://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md#non-existing

URI fragment identifiers

Sometimes when Markdown headings are converted into HTML id attributes a prefix is added to the id attribute. E.g. Github adds a "user-content-" prefix.

First, let's just try out the example file without specifying a prefix:

$ linky --check example_site/path/to/fragment.md
example_site/path/to/fragment.md:2: NO_FRAG https://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md#existing
example_site/path/to/fragment.md:3: NO_FRAG https://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md#non-existing

Now, let's try adding that prefix:

$ linky --check --prefix='user-content-' example_site/path/to/fragment.md
example_site/path/to/fragment.md:2: PREFIXED https://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md#existing
example_site/path/to/fragment.md:3: NO_FRAG https://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md#non-existing

Transforming links before resolution

If you, for example, want to check links against a development version of a sister site you can pipe your links through sed to transform the base URL.

First, let's just extract all links from the example file:

$ linky example_site/path/to/transform.md
example_site/path/to/transform.md:2:  https://github.com/mattias-p/linky/blob/master/example_site/path/to/non-existing.md
example_site/path/to/transform.md:3:  https://github.com/mattias-p/linky/blob/master/example_site/path/to/only-on-example-branch.md

Use sed to edit the links so they point to the sister site:

$ linky example_site/path/to/transform.md | sed 's,/master/,/example/,'
example_site/path/to/transform.md:2:  https://github.com/mattias-p/linky/blob/example/example_site/path/to/non-existing.md
example_site/path/to/transform.md:3:  https://github.com/mattias-p/linky/blob/example/example_site/path/to/only-on-example-branch.md

Note: You may need to be careful with your sed expressions so you don't inadvertently transform the path prefixes.

Finally, pipe the edited linky output into another linky process that actually checks the links:

$ linky example_site/path/to/transform.md | sed 's,/master/,/example/,' | linky --check
example_site/path/to/transform.md:2: HTTP_404 https://github.com/mattias-p/linky/blob/example/example_site/path/to/non-existing.md
example_site/path/to/transform.md:3: OK https://github.com/mattias-p/linky/blob/example/example_site/path/to/only-on-example-branch.md

Resolution speed

Linky uses a thread pool for link resolution. The size of the thread pool affects throughput, but the output itself remains unaffected.

You set the size of the thread pool using the RAYON_NUM_THREADS environment variable:

$ env RAYON_NUM_THREADS=16 linky --check example_site/path/to/example.md

The default size of the thead pool is equal to the number of logical CPU cores available to the process. You may be able to achieve much higher throughputs with a larger than default thread pool. Use the time command to benchmark a suitable size.

Resolution details

In case you ever wonder why a certain link resolved to whatever status token it got, set the RUST_LOG environment variable to warn.

$ env RUST_LOG=warn linky --check example_site/path/to/example.md
example_site/path/to/example.md:3: OK https://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md
 WARN  linky > Fragment not found
 WARN  linky >   context: link = https://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md
 WARN  linky >   context: fragment = #existing
example_site/path/to/example.md:4: NO_FRAG https://github.com/mattias-p/linky/blob/master/example_site/path/to/other.md#existing
example_site/path/to/example.md:5: OK other.md
 WARN  linky > Document not found
 WARN  linky >   context: link = /tmp/linky/example_site/path/to/non-existing.md
 WARN  linky >   caused by: No such file or directory (os error 2)
example_site/path/to/example.md:6: NO_DOC non-existing.md
example_site/path/to/example.md:7: OK other.md#existing
 WARN  linky > Fragment not found
 WARN  linky >   context: link = /tmp/linky/example_site/path/to/other.md
 WARN  linky >   context: fragment = #non-existing
example_site/path/to/example.md:8: NO_FRAG other.md#non-existing
example_site/path/to/example.md:9: OK #heading
 WARN  linky > Fragment not found
 WARN  linky >   context: link = /tmp/linky/example_site/path/to/example.md
 WARN  linky >   context: fragment = #non-existing
example_site/path/to/example.md:10: NO_FRAG #non-existing
example_site/path/to/example.md:11: OK #heading-with-code
 WARN  linky > Fragment not found case-sensitively
 WARN  linky >   context: link = /tmp/linky/example_site/path/to/example.md
 WARN  linky >   context: fragment = #HEADING
 WARN  linky >   context: anchor = #heading
example_site/path/to/example.md:12: CASE_FRAG #HEADING

Link resolution

Local links are resolved to readable ordinary files and directories in the local filesystem. HTTP(S) links are resolved using GET requests to 200-responses, optionally following redirects. Target documents are read and decoded into character strings.

For HTTP(S) links fragments are resolved to HTML anchors. For local links fragments are resolved to Markdown headings. Fragment resolution is attempted first without any prefix and then, if that fails, with each of the prefixes, if any were provided.

Issues

In case you experience an issue using linky, it would be most helpful if you provide a test Markdown document and the verbose output of the execution.

For example:

$ env RUST_LOG=debug RUST_BACKTRACE=1 linky --check  test.md 2&> linky_err.log

Note: RUST_LOG controls the logging verbosity. RUST_BACKTRACE controls the printing of the stack trace on panic.

Simply drag-and-drop the resulting linky_err.log file into the issue editor of Github. Unfortunately, as of February 2018, Github does not allow to drag-and-drop of Markdown files (*.md). You can either:

  • rename your test.md file to test.md.txt

  • use a third party paste service (e.g. Hastebin)

  • if the file is not large, inline it into the issue inside code blocks:

    ´´´Markdown

    your test markdown here ...

    ´´´

License

Copyright 2017-2019 Mattias Päivärinta

Licensed under the Apache License, Version 2.0 (the "License"); you may not use any of the files in this distribution except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

linky's People

Contributors

bugabinga avatar egrieco avatar llogiq avatar mattias-p avatar sanxiyn avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

linky's Issues

Logging

Linky reports result statuses for links as single words. This hides details of what actually went wrong. E.g. is it a domain lookup problem, or do we get a 404? It must be possible to obtain more verbose error messages.

Error messages should include:

  • Original link
  • File system path, if applicable
  • Error chain

URL-decode local links

Today link targets like "filename%20with%20spaces.md" resolve to NO_DOC. Local links should be URL decoded before the file name is looked up in the file system.

Links to pdf files

Allow links to pdf files. Today they're resolved to status DEC_ERR even though they aren't even text files.

Detect link labels without matching link reference definition

According to the latest Commonmark spec:

A shortcut reference link consists of a link label that matches a link reference definition elsewhere in the document and is not followed by [] or a link label.

Consequently syntax meant to be a shortcut reference link but whose label that doesn't match a link reference definition is not recognized as a link at all. There are a number of easy mistakes you could make to end up in this situation. You could misspell a link label, forget to add a matching definition, remove a definition that's still in use, or rename a definition but forget to update a reference to it.

It would be great if linky could emit a special tag for syntax that would be interpreted as a shortcut reference link if there was a matching link reference definition.

Allow mailto: links

Today mailto: links resolve to the status PROTOCOL. That is not reasonable.

Output examples

  • Update user documentation with outputs from the example commands.
  • Update example files to match the user documentation.
  • Update implementation to match the user documentation.

Improve the purpose statement

Nobody is interesting in yet another tool unless it solves a problem the actually have. At the time of writing Linky doesn't make much effort to convey its intended use case. The primary use case should be clearly stated right at the top of the README.

Log Location headers

When the HTTP Location header is present it should be logged. This is helpful when updating links that redirect.

Improve user documentation

Some amount of thought has been put into the UI of Linky. The user documentation presently consists of a list of example inputs. The user documentation should be structured like a tutorial geared towards the primary use case, showing how to best make use of the various UI features along the way.

Links to xml files

Allow links to document with mime type text/xml. Today they're resolved to status MIME and the log complains about an unrecognized mime type.

IO_ERR for lua.org

Hi!

Thanks for making ´linky´. I use it to lint my little website!

´linky´ returns an IO_ERR for https://www.lua.org/.
But the URL resolves fine in the browser.

test.md:

[SUCCESS](https://bugabinga.net)
[FAIL](https://www.lua.org/)
$ env RUST_LOG=debug RUST_BACKTRACE=1 linky --check  test.md 2&> linky_err.log

linky_err.log

Check image links

It'd be super handy if ![image](links) were also supported -- for several of my projects, screenshots are extremely important and much to my embarrassment I've broken them before :P.

Fix fragment resolution for github

Github prefixes heading fragments with "user-content-" in the HTML and uses Javascript to scroll. I'm not aware of a robust way to know beforehand if the target document should be searched for a raw fragment or a prefixed one. Linky must be able to verify Github links with fragments in a meaningful way.

Local links to directories

Allow local links to directories. Today such links resolve to status IO_ERR which doesn't really make sense.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.