lycheeverse / lychee

⚡ Fast, async, stream-based link checker written in Rust. Finds broken URLs and mail addresses inside Markdown, HTML, reStructuredText, websites and more!

Home Page: https://lychee.cli.rs

License: Apache License 2.0

Rust 98.62% Dockerfile 0.87% Makefile 0.32% Shell 0.19%
link-checker link-checking link-checkers validator broken-links link check

lychee's People

Contributors

abordage dblock dependabot-preview[bot] dependabot[bot] dscho elkiwa fabianbg fauust github-actions[bot] hu90m jbampton kemingy kxxt lebensterben manish0kuniyal matttimms michaing mre mre-trv orhun pawroman pwnwriter stefankreutz szepeviktor techassi thomas-zahner untitaker vpereira01 walterbm xiaochuanyu

lychee's Issues

Extended summary

Hi!

I am trying to upgrade from link-checker to lychee-action; however, I am running into a small issue. link-checker output is plain text and very easy to process (grep) to list only broken links. lychee's output is much prettier, and I actually really like the summary it produces. However, it would be great if lychee provided a way to list only the links that produced errors.

Is this at all possible?

Consider retrying on certain HTTP codes only

The purpose of a link checker is to ensure that the tested links are all still valid.

However, some HTTP codes indicate permanent issues and resources that are gone. In my experience, 404 is usually not worth retrying.

You can argue that retries should only happen for a certain subset of the HTTP codes, to speed up the link checking process. This might be a configurable option.
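
A minimal sketch of such a filter, assuming the status code is available as a plain u16. The function name is_retryable and the exact set of codes are assumptions for illustration, not lychee's current behavior:

```rust
/// Decide whether a failed request is worth retrying, based on its HTTP status.
/// The chosen codes are an assumption for illustration only.
fn is_retryable(status: u16) -> bool {
    match status {
        // Request timeouts and rate limiting are often temporary.
        408 | 429 => true,
        // Server-side errors may resolve on their own.
        500..=599 => true,
        // Everything else (e.g. 404 Not Found, 410 Gone) is treated as permanent.
        _ => false,
    }
}

fn main() {
    for status in [404u16, 410, 429, 503] {
        println!("{}: retry = {}", status, is_retryable(status));
    }
}
```

A configurable option could replace the hard-coded match with a user-supplied set of codes.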

Introduce extractor pool

Introduction

As of now we send each URI we'd like to check to a client pool in main.rs. Code is here:

lychee/src/main.rs

Lines 128 to 135 in d2e349c

tokio::spawn(async move {
    for link in links {
        if let Some(pb) = &bar {
            pb.set_message(&link.to_string());
        };
        send_req.send(link).await.unwrap();
    }
});

This is not ideal for a few reasons:

  • All links get extracted on startup. This is a slow process that can take up to a few seconds for long link lists.
    There is no need to block the client during this step, though, as we could lazy-load the links on demand from the inputs.
  • There is no clear separation of concerns between main and the link extraction. Ideally the responsibilities should be split up to make testing and refactoring easier.

We already use a channel for sending the links to check to the client pool. We could use the same abstraction for extracting the links, too, in the form of an extractor pool.

In the future this would allow implementing some advanced features in an extensible way:

  • Recursively check links: Push newly discovered websites into the input channel of the extractor pool
  • Skip duplicate URLs: Filter input links with a HashSet or even a Bloom filter (for constant memory-usage) that is maintained by the extractor pool before sending it to the client pool.
  • Request throttling: Group requests per website and apply some throttling to not overload the server.

How to contribute

  1. Create an extractor pool similar to our client pool
  2. Spawn the pool inside main on startup, pass the channel to the pool and start processing the inputs.

(The other end of the channel is already passed to the client pool.)
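
To make the idea more concrete, here is a rough sketch of an extractor stage feeding a checker stage through channels. It uses std threads and std::sync::mpsc instead of tokio so that it runs without extra crates; extract_links and run_pipeline are hypothetical names, and the "checker" only records what it receives:

```rust
use std::collections::HashSet;
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for link extraction: every whitespace-separated
/// token that starts with "http" counts as a link.
fn extract_links(input: &str) -> Vec<String> {
    input
        .split_whitespace()
        .filter(|token| token.starts_with("http"))
        .map(str::to_string)
        .collect()
}

/// Runs an extractor stage that feeds a checker stage through a channel.
/// Returns the links in the order the checker stage received them.
fn run_pipeline(inputs: Vec<String>) -> Vec<String> {
    let (input_tx, input_rx) = mpsc::channel::<String>();
    let (link_tx, link_rx) = mpsc::channel::<String>();

    // Extractor stage: turns inputs into links on demand and skips duplicates.
    let extractor = thread::spawn(move || {
        let mut seen = HashSet::new();
        for input in input_rx {
            for link in extract_links(&input) {
                if seen.insert(link.clone()) {
                    link_tx.send(link).expect("checker stage hung up");
                }
            }
        }
        // link_tx is dropped here, which closes the link channel.
    });

    // Checker stage placeholder: only collects what it would check.
    let checker = thread::spawn(move || link_rx.into_iter().collect::<Vec<_>>());

    for input in inputs {
        input_tx.send(input).expect("extractor stage hung up");
    }
    drop(input_tx); // Signal that no further inputs will arrive.

    extractor.join().expect("extractor panicked");
    checker.join().expect("checker panicked")
}

fn main() {
    let inputs = vec![
        "see https://example.com and https://example.org".to_string(),
        "again https://example.com".to_string(),
    ];
    println!("{:?}", run_pipeline(inputs));
}
```

Because the extractor owns the HashSet, duplicate filtering stays in one place, and recursion could later be added by sending newly discovered pages back into the input channel.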

Builder pattern for Checker

At the moment we construct the Checker object (which takes care of actually checking the links) with a basic constructor:
https://github.com/hello-rust/lychee/blob/812c5972a3aa6004273a9fe512b95c7f9a3da91a/src/checker.rs#L293-L310

The readability and maintainability could be greatly improved e.g. by using a builder pattern.
This way, the individual fields would be more understandable and we could omit the ones where the defaults work.

I had good experiences with derive_builder in the past.
Here is an example from their docs:

#[macro_use]
extern crate derive_builder;

#[derive(Default, Builder, Debug)]
#[builder(setter(into))]
struct Channel {
    token: i32,
    special_info: i32,
    // .. a whole bunch of other fields ..
}

fn main() {
    // builder pattern, go, go, go!...
    let ch = ChannelBuilder::default()
        .special_info(42u8)
        .token(19124)
        .build()
        .unwrap();
    println!("{:?}", ch);
}

We could use something similar for our Checker:

 let checker = CheckerBuilder::default()
         .token("DUMMY_GITHUB_TOKEN".to_string())
         .max_redirects(5)
         .user_agent("curl/7.71.1".to_string())
         .build()?;

This would clean up a lot of lines in our test code and throughout the application.
If anyone wants to tackle this, please add a comment here.
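
For anyone who wants to see the shape without pulling in derive_builder, here is a hand-written sketch of the same pattern. The Checker fields shown and the defaults (10 redirects, "lychee" as user agent) are assumptions for illustration; the real struct has more fields:

```rust
/// Hypothetical subset of the Checker configuration; the field names are
/// assumptions based on the setters proposed above.
#[derive(Debug, PartialEq)]
struct Checker {
    token: Option<String>,
    max_redirects: usize,
    user_agent: String,
}

#[derive(Debug, Default)]
struct CheckerBuilder {
    token: Option<String>,
    max_redirects: Option<usize>,
    user_agent: Option<String>,
}

impl CheckerBuilder {
    fn token(mut self, token: impl Into<String>) -> Self {
        self.token = Some(token.into());
        self
    }

    fn max_redirects(mut self, max_redirects: usize) -> Self {
        self.max_redirects = Some(max_redirects);
        self
    }

    fn user_agent(mut self, user_agent: impl Into<String>) -> Self {
        self.user_agent = Some(user_agent.into());
        self
    }

    /// Fills in defaults for everything that was not set explicitly.
    fn build(self) -> Result<Checker, String> {
        let user_agent = self.user_agent.unwrap_or_else(|| "lychee".to_string());
        if user_agent.is_empty() {
            return Err("user agent must not be empty".to_string());
        }
        Ok(Checker {
            token: self.token,
            max_redirects: self.max_redirects.unwrap_or(10),
            user_agent,
        })
    }
}

fn main() {
    let checker = CheckerBuilder::default()
        .max_redirects(5)
        .user_agent("curl/7.71.1")
        .build()
        .expect("valid configuration");
    println!("{:?}", checker);
}
```

Tests would then only set the fields they care about and rely on the defaults for the rest.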

Separate crate for test utilities

Currently, the test_utils module is exported as part of the public API.

This is not by design, but rather a simplified version of what it should be: we should extract all shared test/benchmarking functionality into a separate crate. This crate could then be used by unit tests as well as integration tests in the tests directory.

The crate could live in the main repo, since cargo supports workspaces natively.

Consider treating codes other than 200 as success

Currently, we only consider the 200 status code as success. But in some cases codes like 206 Partial Content might be treated as success, e.g. when the entire content is not needed.

This should probably be a configurable option.
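
A possible shape for such an option, sketched under the assumption that the user passes a comma-separated list like "200,206". The function names and the list format are made up for illustration:

```rust
use std::collections::HashSet;
use std::num::ParseIntError;

/// Parses a comma-separated list such as "200,206" into a set of accepted codes.
/// The list format is an assumption for illustration.
fn parse_accepted(list: &str) -> Result<HashSet<u16>, ParseIntError> {
    list.split(',')
        .map(|code| code.trim().parse::<u16>())
        .collect()
}

/// A response counts as successful if its status is in the accepted set.
fn is_success(status: u16, accepted: &HashSet<u16>) -> bool {
    accepted.contains(&status)
}

fn main() {
    let accepted = parse_accepted("200,206").expect("valid status list");
    println!("206 accepted: {}", is_success(206, &accepted));
    println!("404 accepted: {}", is_success(404, &accepted));
}
```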

Add support for basic auth

Sometimes websites require basic auth to be retrieved.
We should add a flag for basic auth support to lychee.

Maybe something like --basic-auth "username:password".

Hyper (our HTTP client) recently moved typed headers into a separate crate.

Here is the header we could use (docs):

use headers::Authorization;

let basic = Authorization::basic("username", "password");

We could pass that to hyper when we create the client instance.
Pull requests greatly appreciated. Add a comment here if you'd like to work on this.
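
As a starting point, the flag value could be split roughly like this before it is handed to the header. parse_basic_auth is a hypothetical helper; splitting at the first colon is an assumption that allows colons in the password but not in the username:

```rust
/// Splits a "username:password" argument at the first colon.
/// Returns None if there is no colon or the username is empty.
fn parse_basic_auth(arg: &str) -> Option<(&str, &str)> {
    let (username, password) = arg.split_once(':')?;
    if username.is_empty() {
        return None;
    }
    Some((username, password))
}

fn main() {
    match parse_basic_auth("username:password") {
        Some((user, pass)) => println!("user = {}, password length = {}", user, pass.len()),
        None => eprintln!("expected the form username:password"),
    }
}
```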

Report output in JSON format

Introduction

We only print a summary to the console at the end of the process at the moment.
Having machine-readable output would allow other tools and CI pipelines to consume
lychee's output.

How to contribute

Link checking statistics end up in a stats object that is defined here:
https://github.com/lycheeverse/lychee/blob/master/src/stats.rs

You'd have to implement Serialize and Deserialize from serde for this struct:

lychee/src/stats.rs

Lines 10 to 18 in d2e349c

pub struct ResponseStats {
    total: usize,
    successful: usize,
    failures: HashSet<Uri>,
    timeouts: HashSet<Uri>,
    redirects: HashSet<Uri>,
    excludes: HashSet<Uri>,
    errors: HashSet<Uri>,
}

After that the output can be automatically converted to JSON format (as well as HTML, SQL, CSV, XML, YAML, TOML and others if you want).

Here is a good example of the full process:
https://github.com/serde-rs/json#creating-json-by-serializing-data-structures

Mentorship

If you need more guidance don't be afraid to ask here. We will do our best to help you get started.

Improve library support

In #64 @drahnr mentioned a few improvement areas for library support:

  • Reduce the exposed module structure - it's better to have fewer modules when each of them only contains a single or two entities
  • Provide true errors (ones that implement Send + Sync + 'static + std::error::Error) rather than using anyhow. So use thiserror instead of anyhow.
  • Currently one example is provided for the simple case. It would be nice if that was shown as the entry one for docs.rs and expanded in order to showcase the usage of InputContent with collect_links. We should create an examples folder which contains this and other use-cases.
  • Provide the byte- and character ranges of the link that was being checked, would be very helpful in order to craft meaningful error messages. Needs support by html5gum.
  • At the moment the library depends on tokio. Make the library executor-agnostic and not rely on any particular runtime. Probably not worth it. Quite a bit of work for very little gain.

Contributing

Contributions are very welcome.
Add a comment here if you'd like to tackle one of the points above or simply send a pull request right away. 😃

Implement Retry/backoff

The challenge

Currently lychee does not retry failed requests.
For this to work, a mechanism for rescheduling requests has to be in place.
One option is using channels for passing requests and responses and rescheduling the failed ones.

As a bonus it would be nice to support exponential backoff when retrying requests. There are crates for this, like backoff.

Help welcome.
I'm totally open to alternative suggestions and brainstorming here.

What you will do

  • Come up with your own solution
  • Have some fun with Rust (I hope)
  • Work with Rust's async toolchain
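
To get the brainstorming going, here is a small synchronous sketch of exponential backoff using only the standard library. It does not use the backoff crate and is not async; backoff_delay and retry_with_backoff are hypothetical names:

```rust
use std::thread;
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.checked_pow(attempt).unwrap_or(u32::MAX);
    base.checked_mul(factor).map_or(max, |delay| delay.min(max))
}

/// Calls `op` until it succeeds or `max_retries` retries are used up,
/// sleeping with exponential backoff between attempts.
fn retry_with_backoff<T, E>(
    max_retries: u32,
    base: Duration,
    max: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of retries: hand the last error back to the caller.
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                thread::sleep(backoff_delay(attempt, base, max));
                attempt += 1;
            }
        }
    }
}

fn main() {
    let base = Duration::from_millis(100);
    let max = Duration::from_secs(1);
    for attempt in 0..6 {
        println!("attempt {}: wait {:?}", attempt, backoff_delay(attempt, base, max));
    }
}
```

In lychee itself the sleep would have to be an async one and the failed request would go back onto a channel, but the delay calculation stays the same.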

Medium.com links parsed as mail links

E.g.:

Errors in new-docs/content/any/project/performance/n_plus_one.md
↯ medium.com/@bretdoucette/n-1-queries-and-guides-avoid-them-a12f02345be5 (Invalid mail address: medium.com/@bretdoucette/n-1-queries-and-guides-avoid-them-a12f02345be5)

Presumably this is a bit too eager:

lychee/src/uri.rs

Lines 59 to 61 in cefe38e

if s.contains('@') & !is_link_internal {
    return Ok(Uri::Mail(s.to_string()));
}
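
A slightly stricter heuristic could look like the sketch below. looks_like_mail is a hypothetical helper, not the actual fix; it assumes that anything containing a slash is a URL or path and that a mail address has exactly one '@' with text on both sides:

```rust
/// Heuristic check whether `s` looks like a bare mail address rather than a URL or path.
fn looks_like_mail(s: &str) -> bool {
    let s = s.strip_prefix("mailto:").unwrap_or(s);
    // A slash suggests a URL or a file path, e.g. "medium.com/@user/post".
    if s.contains('/') {
        return false;
    }
    match s.split_once('@') {
        Some((local, domain)) => {
            !local.is_empty() && !domain.is_empty() && !domain.contains('@')
        }
        None => false,
    }
}

fn main() {
    for candidate in [
        "user@example.com",
        "medium.com/@bretdoucette/n-1-queries-and-guides-avoid-them-a12f02345be5",
    ] {
        println!("{} -> mail: {}", candidate, looks_like_mail(candidate));
    }
}
```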

Not all URLs are found depending on HTML structure

It seems that parsing a HTML file for links aborts unexpectedly at some point, depending on the format/structure of that HTML file:

./lychee -v https://dietpi.com/index.html
./lychee -v https://dietpi.com/docs/index.html
# each
📝 Summary
-------------------
🔍 Total: 0
✅ Successful: 0
⏳ Timeouts: 0
🔀 Redirected: 0
👻 Excluded: 0
🚫 Errors: 0

I added some random newlines to the document, which led to two URLs being found.

The first HTML is in non-minified (otherwise exact match) form available here, where at least 6 URLs are found and checked:

./lychee -v https://raw.githubusercontent.com/MichaIng/DietPi-Website/master/index.html
📝 Summary
-------------------
🔍 Total: 6
✅ Successful: 6
⏳ Timeouts: 0
🔀 Redirected: 0
👻 Excluded: 0
🚫 Errors: 0

But when you check the HTML, there are a lot more links of different kinds. There seems to be something, probably overly long lines (?), that makes lychee stop parsing for further URLs, or at least it does not find any more, even though they are there.

Publish images to Docker Hub

To make the project more user friendly, and easier to integrate into existing workflows, we should add Docker image builds for releases/tags and publish them to Docker Hub.

Look into tower as a replacement for deadpool + channels

While working on #33 I looked into existing libraries for network handling.
I even asked around on Twitter.

Turns out there is a neat little library called tower that was recommended by @shanesveller.
It handles many things that we currently do manually or not at all:

  • network timeouts
  • retries
  • batching

This part is particularly interesting:

This module defines a load-balanced pool of services that adds new services when load is high.

The pool uses poll_ready as a signal indicating whether additional services should be spawned
to handle the current level of load. Specifically, every time poll_ready on the inner service
returns Ready, [Pool] considers that a 0, and every time it returns Pending, [Pool]
considers it a 1. [Pool] then maintains an exponential moving average over those
samples, which gives an estimate of how often the underlying service has been ready when it was
needed "recently" (see [Builder::urgency]). If the service is loaded (see
[Builder::loaded_above]), a new service is created and added to the underlying [Balance].
If the service is underutilized (see [Builder::underutilized_below]) and there are two or
more services, then the latest added service is removed. In either case, the load estimate is
reset to its initial value (see [Builder::initial]) to prevent services from being rapidly
added or removed.

(Sorry for quoting the entire thing, but I think it's quite rad.)
AFAICT with that we can have some pool that optimizes network throughput.

If anyone has comments on the pros and cons of tower I'd love to hear them.
Perhaps somebody even wants to investigate and create a PR for it?

Help and feedback definitely wanted!

Support checking local file links

#15 implemented relative URLs, however a simple test shows that it is not working:

$ echo '<a href="./foo.html">Broken</a>' > index.html
$ GITHUB_TOKEN= lychee index.html -pv

📝Summary
-------------------
🔍Found: 0
👻Excluded: 0
✅Successful: 0
🚫Errors: 0

Expected to find a 404 because file foo.html does not exist.

Support relative URLs

I'd like to support checking relative URLs.
Maybe we could add a base-url parameter that gets used for relative URLs.
Help wanted.

Error: thread 'tokio-runtime-worker' panicked at 'not currently running on the Tokio runtime.'

I encountered this panic initially in GitHub Actions (someone else has run into this and has filed lycheeverse/lychee-action#4), but I have been able to reproduce this error in my local environment myself (lychee 0.5.0), so I'm opening this issue here.

Here is the backtrace:

thread 'tokio-runtime-worker' panicked at 'not currently running on the Tokio runtime.', /Users/caleb/Library/Caches/Homebrew/cargo_cache/registry/src/github.com-1ecc6299db9ec823/tokio-0.2.22/src/runtime/handle.rs:118:28
stack backtrace:
   0:        0x10ebd047e - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h3b6ed74a60c4de30
   1:        0x10ec06c0e - core::fmt::write::h72dd6ddbc116ef3c
   2:        0x10ebcf86a - std::io::Write::write_fmt::h033803ce14d847cc
   3:        0x10ebe4639 - std::panicking::default_hook::{{closure}}::h040276b51a1a4749
   4:        0x10ebe435e - std::panicking::default_hook::h2a43ed83163cecb9
   5:        0x10ebe4bca - std::panicking::rust_panic_with_hook::h15f3dba6c099e04e
   6:        0x10ebd0b35 - std::panicking::begin_panic_handler::{{closure}}::h33fb39231ad9a88d
   7:        0x10ebd05f8 - std::sys_common::backtrace::__rust_end_short_backtrace::hd5ec6f84e4df1d34
   8:        0x10ebe4743 - _rust_begin_unwind
   9:        0x10ec2518f - core::panicking::panic_fmt::h7889f3b8e7c118f7
  10:        0x10ec24c5a - core::option::expect_failed::h5fe3576924a3bde2
  11:        0x10ea80ed4 - tokio::runtime::handle::Handle::current::h89cdd643080c74fa
  12:        0x10e969927 - tokio::runtime::blocking::pool::spawn_blocking::h1a37cd83de30f186
  13:        0x10e979d28 - <hyper::client::connect::dns::GaiResolver as tower_service::Service<hyper::client::connect::dns::Name>>::call::h61e571d51b6de036
  14:        0x10e8bd8c3 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h69dc60019f0f2088
  15:        0x10e8c49d1 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hdd28a6be8b7b5042
  16:        0x10e8b89c1 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h29531a238642b647
  17:        0x10e8c5d5b - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::hf9e83dde27a4801c
  18:        0x10e8b7195 - <hyper::service::oneshot::Oneshot<S,Req> as core::future::future::Future>::poll::h6f7b6ee34c74f84a
  19:        0x10e9345e9 - <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll::h70474202a53b3651
  20:        0x10e934941 - <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll::ha14d54d9ca61ad28
  21:        0x10e8ff195 - <futures_util::future::try_future::try_flatten::TryFlatten<Fut,<Fut as futures_core::future::TryFuture>::Ok> as core::future::future::Future>::poll::h587492a7f2201d74
  22:        0x10e8931e8 - <hyper::common::lazy::Lazy<F,R> as core::future::future::Future>::poll::hb1a677b7f21c70e0
  23:        0x10e932f9c - <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll::h1776ff857ff61130
  24:        0x10e923beb - <futures_util::future::future::flatten::Flatten<Fut,<Fut as core::future::future::Future>::Output> as core::future::future::Future>::poll::hde9bd31e7767b2a1
  25:        0x10e936163 - <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll::he6181a9c8bf8decf
  26:        0x10e8ff4fa - <futures_util::future::try_future::try_flatten::TryFlatten<Fut,<Fut as futures_core::future::TryFuture>::Ok> as core::future::future::Future>::poll::h59a74d989a93a7f6
  27:        0x10e912dfb - <futures_util::future::poll_fn::PollFn<F> as core::future::future::Future>::poll::h4e7335b26450df9d
  28:        0x10e96c656 - <hyper::client::ResponseFuture as core::future::future::Future>::poll::h92cb2c7ab9f8ecd9
  29:        0x10e916d74 - <reqwest::async_impl::client::PendingRequest as core::future::future::Future>::poll::h5ecc6d3415cf3506
  30:        0x10e916c87 - <reqwest::async_impl::client::Pending as core::future::future::Future>::poll::hb8d12fd76e64dbc8
  31:        0x10e819afb - <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll::hf64ca53c51c6012c
  32:        0x10e82742d - <futures_util::future::try_future::try_flatten::TryFlatten<Fut,<Fut as futures_core::future::TryFuture>::Ok> as core::future::future::Future>::poll::h06472600e0d834db
  33:        0x10e8181c9 - <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll::h588738c9b92b7f0f
  34:        0x10e827f34 - <futures_util::future::try_future::try_flatten::TryFlatten<Fut,<Fut as futures_core::future::TryFuture>::Ok> as core::future::future::Future>::poll::h8c9611ad63cec342
  35:        0x10e82af0e - <futures_util::future::try_future::AndThen<Fut1,Fut2,F> as core::future::future::Future>::poll::h618b0c2ebb4e0f8a
  36:        0x10e818704 - <futures_util::future::future::map::Map<Fut,F> as core::future::future::Future>::poll::h71d1b9106303bf37
  37:        0x10e82aece - <futures_util::future::try_future::MapOk<Fut,F> as core::future::future::Future>::poll::h8686e11037584f66
  38:        0x10e53cdd8 - <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll::h4159be52d6d7d02a
  39:        0x10e5a6806 - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::h9e44a18f075958ea
  40:        0x10e5fa9f2 - tokio::runtime::task::harness::Harness<T,S>::poll::hb87397d6c8ae81a1
  41:        0x10ea8876f - std::thread::local::LocalKey<T>::with::hc7b3d1d25d224e3a
  42:        0x10eaa46cf - tokio::runtime::thread_pool::worker::Context::run_task::h29d701a9af81405d
  43:        0x10eaa3c03 - tokio::runtime::thread_pool::worker::Context::run::h88f812f18928c39b
  44:        0x10ea8ac05 - tokio::macros::scoped_tls::ScopedKey<T>::set::ha079f29468eb40d4
  45:        0x10eaa336d - tokio::runtime::thread_pool::worker::run::hb4225ff6226cfa1a
  46:        0x10eaa0f2f - tokio::runtime::task::core::Core<T,S>::poll::habdfb8e2a0dc1545
  47:        0x10eaa8916 - <std::panic::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once::hc521d0629d24b295
  48:        0x10ea864a5 - tokio::runtime::task::harness::Harness<T,S>::poll::h1255d0013f67039a
  49:        0x10ea9bc08 - tokio::runtime::blocking::pool::Inner::run::h48447d92da9b7811
  50:        0x10ea88119 - std::sys_common::backtrace::__rust_begin_short_backtrace::hd03a52f4a358aca2
  51:        0x10eaad455 - core::ops::function::FnOnce::call_once{{vtable.shim}}::h42e03a1e69101f0f
  52:        0x10ebebf4b - std::sys::unix::thread::Thread::new::thread_start::hed7aa7efa61a9b35
  53:     0x7fff2033e950 - __pthread_start

(This particular trace repeats over and over again, presumably once per problematic link check attempt.)

Lychee was built with Rust 1.49.0 and is running on macOS 11.1 Big Sur (Intel). I was using a GITHUB_TOKEN. Please let me know if you need any further information about my environment, etc.

Print scanned sources when using "--verbose" option

Currently the --verbose option prints all checked links, but not the files/sources in which those URLs were found. It would be great if lychee would print the currently processed source as well before the checked URLs found in each source.

But I guess the concurrency of processed sources will be an issue to implement this, right?

Better handle input files arguments

I have two suggestions for improvements in terms of handling the input files arguments:

  1. Allow passing in links from stdin, e.g. follow the standard-ish Unix convention of specifying - as input file, instructing the program to read from STDIN instead. E.g. this should work fine:
# - means read from stdin instead of from a file
$ echo "https://google.com" | lychee -
  2. Issue some kind of warning or error for non-existing files. Perhaps erroring out instead of issuing a warning can be an optional flag. This should help the user experience in the face of typos and other mishaps.
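
Both points could be covered by something along these lines. This is only a sketch with hypothetical function names (read_links, read_input); it treats every non-empty line as one input and surfaces a missing file as an error instead of skipping it:

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Collects non-empty, trimmed lines from any buffered reader.
fn read_links<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    let mut links = Vec::new();
    for line in reader.lines() {
        let line = line?;
        let trimmed = line.trim();
        if !trimmed.is_empty() {
            links.push(trimmed.to_string());
        }
    }
    Ok(links)
}

/// Reads from stdin if `arg` is "-", otherwise from the named file.
/// A non-existing file is reported as an error.
fn read_input(arg: &str) -> io::Result<Vec<String>> {
    if arg == "-" {
        let stdin = io::stdin();
        let links = read_links(stdin.lock());
        links
    } else {
        read_links(BufReader::new(File::open(arg)?))
    }
}

fn main() -> io::Result<()> {
    // With an argument, read from that file (or stdin for "-");
    // without one, fall back to a small in-memory demo.
    let links = match std::env::args().nth(1) {
        Some(arg) => read_input(&arg)?,
        None => read_links("https://google.com\n".as_bytes())?,
    };
    for link in links {
        println!("{}", link);
    }
    Ok(())
}
```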

Expose the link check as library

Hey, awesome tool and as a matter of fact cargo-spellcheck has a bit of an overlap (I just wasn't aware of lychee's existence!).

It would be awesome if lychee could be used as a library as well, so that arbitrary cmark content can be fed into it. This would be particularly interesting for not re-implementing the logic as part of drahnr/cargo-spellcheck#113

Thanks!

Report extended statistics: request latency

Introduction

Some requests take longer than others — sometimes by a significant amount.
As of now we have no visibility into how long each website took to respond. Providing this information to the user would help with troubleshooting problems and optimizing testing times.

It would be great to add some extended statistics to our final output (in case the user wants to see those).

How to contribute

We could instrument lychee with metrics and time each request.
The data could be part of our response stats here: https://github.com/lycheeverse/lychee/blob/master/src/stats.rs

Ideally we should start making the stats output machine-readable before tackling this. See #53.
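
A minimal sketch of the timing part, using only std::time. The Timing struct and the helpers timed and slowest are hypothetical and not part of the current stats module:

```rust
use std::time::{Duration, Instant};

/// Runs `op` and returns its result together with the elapsed wall-clock time.
fn timed<T>(op: impl FnOnce() -> T) -> (T, Duration) {
    let start = Instant::now();
    let result = op();
    (result, start.elapsed())
}

/// Hypothetical per-request record that could sit next to the existing stats.
#[derive(Debug)]
struct Timing {
    uri: String,
    latency: Duration,
}

/// Returns the `n` slowest entries, slowest first.
fn slowest(mut timings: Vec<Timing>, n: usize) -> Vec<Timing> {
    timings.sort_by(|a, b| b.latency.cmp(&a.latency));
    timings.truncate(n);
    timings
}

fn main() {
    // The sleep stands in for an actual request.
    let (_, latency) = timed(|| std::thread::sleep(Duration::from_millis(10)));
    let timings = vec![Timing {
        uri: "https://example.com".to_string(),
        latency,
    }];
    for timing in slowest(timings, 5) {
        println!("{} took {:?}", timing.uri, timing.latency);
    }
}
```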

Skip/Exclude Unallowed URL Schemes

@mre - I get the below output when I run the link checker. All the 6 links marked as errors are correct and working. Therefore, how can I skip/exclude them from being checked?

Screenshot 2021-01-12 at 08 45 09

My config is as follows:

on: [pull_request]
name: Link Checker 
jobs:
  linkChecker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Build the site
        uses: shafreenAnfar/jekyll-build-action@v5
      - name: Lychee link checker action
        id: lc
        uses: lycheeverse/lychee-action@v1
        with:
          args: --exclude-link-local
      - name: Archive production artifacts
        uses: actions/upload-artifact@v2
        with:
          name: Link Checker Report
          path:  ./lychee/out.md  
      - name: Fail if there were link errors
        run: exit ${{ steps.lc.outputs.exit_code }}

Add youtube and vimeo link check support

For youtube links, it's necessary to scrape the content of the website to work out if the video still exists. For vimeo, they don't really like fake user agents.

The https://www.youtube.com/oembed API will return 404 for missing videos and can be used to check youtube links.

We special case GitHub already, we could special case YouTube and Vimeo too. The mechanism to specialize these kinds of links should be made extensible though.

Ensure cookies are not re-used between requests

Certain websites use cookies to throttle/rate-limit requests. Not storing cookies between requests is one strategy to avoid being rate-limited.

I'm not sure what the current behavior is, but it would be good to ensure this.

URL reference to other Markdown treated as E-Mail check

I have a Markdown file in which I refer to other Markdown files using the @ notation. For example, to link to another Markdown file, I use

[please have a look here](@/posts/some/path/mk.md)

When I run it through the action, the above gets checked as E-Mail. Is there a way to ignore such links or to interpret it as a link instead of an E-Mail?

Refactor: use channel for in-flight requests

In the future we'd like to support recursive link checking.

lychee --recursive endler.dev

would check all links on my blog until a configurable max-depth.
At the moment we iterate over all links of a page in a for-loop and print a
status report at the end. Recursive execution would require modifying
that for-loop during iteration, which is not ideal.

A channel for in-flight requests might allow for a simpler implementation of recursion
while also making the architecture more maintainable.
I'm keeping an eye on https://github.com/stjepang/async-channel and I'd love to try
it for lychee. If someone wants to tackle this, here is a list of points to modify:

  • Replace this code in main with a channel.
  • Wrap a worker around the checker, which reads from the channel and passes the request to the checker.
  • Adjust the unit tests.

My thinking is that wrapping the checker would allow for separation of concerns:
the checker would not know about the channel and we could test the checker in isolation from the rest.
Also, we might want to have a worker pool of checkers in the future.

There are many design possibilities for this feature so I'm happy to discuss our options here.

Behavior when specifying directory

Open question: what should be the default behavior if a directory is given?
At the moment we throw an error and stop. Instead, the user has to use a glob pattern for recursion, e.g. **/*.md.
That doesn't seem to be very intuitive right now. See lycheeverse/lychee-action#3.

Here's my proposal:
If the user specifies a single directory, analyze all supported files (currently html/md/txt) within that directory.
This also holds if the user specifies a glob that implies a directory.
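
Roughly, the directory case could be handled like this. It is a sketch with hypothetical helper names (is_supported, collect_files) that walks the directory recursively with std::fs and matches the extension case-insensitively, which is an assumption:

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// File extensions that are currently supported as inputs.
const SUPPORTED: [&str; 3] = ["html", "md", "txt"];

/// Checks the file extension case-insensitively against the supported list.
fn is_supported(path: &Path) -> bool {
    path.extension()
        .and_then(|ext| ext.to_str())
        .map_or(false, |ext| {
            let ext = ext.to_ascii_lowercase();
            SUPPORTED.contains(&ext.as_str())
        })
}

/// Recursively collects all supported files below `dir` into `found`.
fn collect_files(dir: &Path, found: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_files(&path, found)?;
        } else if is_supported(&path) {
            found.push(path);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    let mut found = Vec::new();
    collect_files(Path::new(&dir), &mut found)?;
    for path in found {
        println!("{}", path.display());
    }
    Ok(())
}
```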

Any objections or concerns?

Allow to configure method other than GET for certain links

Checking links in the wild, I have learned that doing a GET on .pdf, .png and other big files might be excessive. Usually a HEAD request for such files is sufficient.

Users should be able to specify which method to use for links (as a config option), perhaps using a regex->method mapping, so the checker doesn't have to be run multiple times.
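
As an illustration of the mapping idea, here is a simplified sketch that matches on file suffixes instead of regular expressions, so it needs no extra crates. The Method enum, method_for and the rule format are all hypothetical:

```rust
/// The HTTP methods this sketch distinguishes.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Method {
    Get,
    Head,
}

/// Picks the method of the first rule whose suffix matches the URL path.
/// Falls back to GET if no rule matches.
fn method_for(url: &str, rules: &[(&str, Method)]) -> Method {
    // Ignore the query string and fragment when looking at the suffix.
    let path = url.split(|c| c == '?' || c == '#').next().unwrap_or(url);
    let path = path.to_ascii_lowercase();
    rules
        .iter()
        .find(|(suffix, _)| path.ends_with(*suffix))
        .map_or(Method::Get, |(_, method)| *method)
}

fn main() {
    let rules = [(".pdf", Method::Head), (".png", Method::Head)];
    for url in ["https://example.com/paper.pdf", "https://example.com/index.html"] {
        println!("{} -> {:?}", url, method_for(url, &rules));
    }
}
```

A real implementation would probably compile the user's regexes once at startup and keep the same first-match-wins lookup.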

How to reduce duplicate dependencies?

I did run cargo +nightly udeps before and we don't have any unused dependencies.
However @pizzamig mentioned that there is still room for improvement.

Looking at our dependency tree it's quite large:

lychee v0.3.1 (/Users/mendler/Code/private/lychee)
├── anyhow v1.0.34
├── check-if-email-exists v0.8.15
│   ├── async-smtp v0.3.4
│   │   ├── async-native-tls v0.3.3
│   │   │   ├── async-std v1.7.0
│   │   │   │   ├── async-global-executor v1.4.3
│   │   │   │   │   ├── async-executor v1.4.0
│   │   │   │   │   │   ├── async-task v4.0.3
│   │   │   │   │   │   ├── concurrent-queue v1.2.2
│   │   │   │   │   │   │   └── cache-padded v1.1.1
│   │   │   │   │   │   ├── fastrand v1.3.4
│   │   │   │   │   │   ├── futures-lite v1.11.2
│   │   │   │   │   │   │   ├── fastrand v1.3.4
│   │   │   │   │   │   │   ├── futures-core v0.3.8
│   │   │   │   │   │   │   ├── futures-io v0.3.8
│   │   │   │   │   │   │   ├── memchr v2.3.3
│   │   │   │   │   │   │   ├── parking v2.0.0
│   │   │   │   │   │   │   ├── pin-project-lite v0.1.7
│   │   │   │   │   │   │   └── waker-fn v1.0.0
│   │   │   │   │   │   ├── once_cell v1.4.1
│   │   │   │   │   │   └── vec-arena v1.0.0
│   │   │   │   │   ├── async-io v1.1.0
│   │   │   │   │   │   ├── cfg-if v0.1.10
│   │   │   │   │   │   ├── concurrent-queue v1.2.2 (*)
│   │   │   │   │   │   ├── fastrand v1.3.4
│   │   │   │   │   │   ├── futures-lite v1.11.2 (*)
│   │   │   │   │   │   ├── libc v0.2.76
│   │   │   │   │   │   ├── log v0.4.11
│   │   │   │   │   │   │   └── cfg-if v0.1.10
│   │   │   │   │   │   ├── once_cell v1.4.1
│   │   │   │   │   │   ├── parking v2.0.0
│   │   │   │   │   │   ├── polling v1.0.2
│   │   │   │   │   │   │   ├── cfg-if v0.1.10
│   │   │   │   │   │   │   ├── libc v0.2.76
│   │   │   │   │   │   │   └── log v0.4.11 (*)
│   │   │   │   │   │   ├── socket2 v0.3.12
│   │   │   │   │   │   │   ├── cfg-if v0.1.10
│   │   │   │   │   │   │   └── libc v0.2.76
│   │   │   │   │   │   ├── vec-arena v1.0.0
│   │   │   │   │   │   └── waker-fn v1.0.0
│   │   │   │   │   ├── futures-lite v1.11.2 (*)
│   │   │   │   │   ├── num_cpus v1.13.0
│   │   │   │   │   │   └── libc v0.2.76
│   │   │   │   │   └── once_cell v1.4.1
│   │   │   │   ├── async-io v1.1.0 (*)
│   │   │   │   ├── async-mutex v1.1.5
│   │   │   │   │   └── event-listener v2.4.0
│   │   │   │   ├── blocking v1.0.2
│   │   │   │   │   ├── async-channel v1.5.1
│   │   │   │   │   │   ├── concurrent-queue v1.2.2 (*)
│   │   │   │   │   │   ├── event-listener v2.4.0
│   │   │   │   │   │   └── futures-core v0.3.8
│   │   │   │   │   ├── async-task v4.0.3
│   │   │   │   │   ├── atomic-waker v1.0.0
│   │   │   │   │   ├── fastrand v1.3.4
│   │   │   │   │   ├── futures-lite v1.11.2 (*)
│   │   │   │   │   └── once_cell v1.4.1
│   │   │   │   ├── crossbeam-utils v0.8.0
│   │   │   │   │   ├── cfg-if v1.0.0
│   │   │   │   │   ├── const_fn v0.4.2
│   │   │   │   │   └── lazy_static v1.4.0
│   │   │   │   │   [build-dependencies]
│   │   │   │   │   └── autocfg v1.0.0
│   │   │   │   ├── futures-core v0.3.8
│   │   │   │   ├── futures-io v0.3.8
│   │   │   │   ├── futures-lite v1.11.2 (*)
│   │   │   │   ├── kv-log-macro v1.0.7
│   │   │   │   │   └── log v0.4.11 (*)
│   │   │   │   ├── log v0.4.11 (*)
│   │   │   │   ├── memchr v2.3.3
│   │   │   │   ├── num_cpus v1.13.0 (*)
│   │   │   │   ├── once_cell v1.4.1
│   │   │   │   ├── pin-project-lite v0.1.7
│   │   │   │   ├── pin-utils v0.1.0
│   │   │   │   └── slab v0.4.2
│   │   │   ├── native-tls v0.2.4
│   │   │   │   ├── lazy_static v1.4.0
│   │   │   │   ├── libc v0.2.76
│   │   │   │   ├── security-framework v0.4.4
│   │   │   │   │   ├── bitflags v1.2.1
│   │   │   │   │   ├── core-foundation v0.7.0
│   │   │   │   │   │   ├── core-foundation-sys v0.7.0
│   │   │   │   │   │   └── libc v0.2.76
│   │   │   │   │   ├── core-foundation-sys v0.7.0
│   │   │   │   │   ├── libc v0.2.76
│   │   │   │   │   └── security-framework-sys v0.4.3
│   │   │   │   │       ├── core-foundation-sys v0.7.0
│   │   │   │   │       └── libc v0.2.76
│   │   │   │   ├── security-framework-sys v0.4.3 (*)
│   │   │   │   └── tempfile v3.1.0
│   │   │   │       ├── cfg-if v0.1.10
│   │   │   │       ├── libc v0.2.76
│   │   │   │       ├── rand v0.7.3
│   │   │   │       │   ├── getrandom v0.1.14
│   │   │   │       │   │   ├── cfg-if v0.1.10
│   │   │   │       │   │   └── libc v0.2.76
│   │   │   │       │   ├── libc v0.2.76
│   │   │   │       │   ├── rand_chacha v0.2.2
│   │   │   │       │   │   ├── ppv-lite86 v0.2.9
│   │   │   │       │   │   └── rand_core v0.5.1
│   │   │   │       │   │       └── getrandom v0.1.14 (*)
│   │   │   │       │   └── rand_core v0.5.1 (*)
│   │   │   │       └── remove_dir_all v0.5.3
│   │   │   ├── thiserror v1.0.20
│   │   │   │   └── thiserror-impl v1.0.20
│   │   │   │       ├── proc-macro2 v1.0.24
│   │   │   │       │   └── unicode-xid v0.2.1
│   │   │   │       ├── quote v1.0.7
│   │   │   │       │   └── proc-macro2 v1.0.24 (*)
│   │   │   │       └── syn v1.0.53
│   │   │   │           ├── proc-macro2 v1.0.24 (*)
│   │   │   │           ├── quote v1.0.7 (*)
│   │   │   │           └── unicode-xid v0.2.1
│   │   │   └── url v2.2.0
│   │   │       ├── form_urlencoded v1.0.0
│   │   │       │   ├── matches v0.1.8
│   │   │       │   └── percent-encoding v2.1.0
│   │   │       ├── idna v0.2.0
│   │   │       │   ├── matches v0.1.8
│   │   │       │   ├── unicode-bidi v0.3.4
│   │   │       │   │   └── matches v0.1.8
│   │   │       │   └── unicode-normalization v0.1.13
│   │   │       │       └── tinyvec v0.3.4
│   │   │       ├── matches v0.1.8
│   │   │       ├── percent-encoding v2.1.0
│   │   │       └── serde v1.0.117
│   │   │           └── serde_derive v1.0.117
│   │   │               ├── proc-macro2 v1.0.24 (*)
│   │   │               ├── quote v1.0.7 (*)
│   │   │               └── syn v1.0.53 (*)
│   │   ├── async-std v1.7.0 (*)
│   │   ├── async-trait v0.1.38
│   │   │   ├── proc-macro2 v1.0.24 (*)
│   │   │   ├── quote v1.0.7 (*)
│   │   │   └── syn v1.0.53 (*)
│   │   ├── base64 v0.12.3
│   │   ├── bufstream v0.1.4
│   │   ├── fast-socks5 v0.3.1
│   │   │   ├── anyhow v1.0.34
│   │   │   ├── async-std v1.7.0 (*)
│   │   │   ├── futures v0.3.8
│   │   │   │   ├── futures-channel v0.3.8
│   │   │   │   │   ├── futures-core v0.3.8
│   │   │   │   │   └── futures-sink v0.3.8
│   │   │   │   ├── futures-core v0.3.8
│   │   │   │   ├── futures-executor v0.3.8
│   │   │   │   │   ├── futures-core v0.3.8
│   │   │   │   │   ├── futures-task v0.3.8
│   │   │   │   │   │   └── once_cell v1.4.1
│   │   │   │   │   └── futures-util v0.3.8
│   │   │   │   │       ├── futures-channel v0.3.8 (*)
│   │   │   │   │       ├── futures-core v0.3.8
│   │   │   │   │       ├── futures-io v0.3.8
│   │   │   │   │       ├── futures-macro v0.3.8
│   │   │   │   │       │   ├── proc-macro-hack v0.5.19
│   │   │   │   │       │   ├── proc-macro2 v1.0.24 (*)
│   │   │   │   │       │   ├── quote v1.0.7 (*)
│   │   │   │   │       │   └── syn v1.0.53 (*)
│   │   │   │   │       ├── futures-sink v0.3.8
│   │   │   │   │       ├── futures-task v0.3.8 (*)
│   │   │   │   │       ├── memchr v2.3.3
│   │   │   │   │       ├── pin-project v1.0.2
│   │   │   │   │       │   └── pin-project-internal v1.0.2
│   │   │   │   │       │       ├── proc-macro2 v1.0.24 (*)
│   │   │   │   │       │       ├── quote v1.0.7 (*)
│   │   │   │   │       │       └── syn v1.0.53 (*)
│   │   │   │   │       ├── pin-utils v0.1.0
│   │   │   │   │       ├── proc-macro-hack v0.5.19
│   │   │   │   │       ├── proc-macro-nested v0.1.6
│   │   │   │   │       └── slab v0.4.2
│   │   │   │   ├── futures-io v0.3.8
│   │   │   │   ├── futures-sink v0.3.8
│   │   │   │   ├── futures-task v0.3.8 (*)
│   │   │   │   └── futures-util v0.3.8 (*)
│   │   │   ├── log v0.4.11 (*)
│   │   │   └── thiserror v1.0.20 (*)
│   │   ├── fast_chemail v0.9.6
│   │   │   └── ascii_utils v0.9.3
│   │   ├── hostname v0.1.5
│   │   │   └── libc v0.2.76
│   │   ├── log v0.4.11 (*)
│   │   ├── nom v5.1.2
│   │   │   ├── lexical-core v0.7.4
│   │   │   │   ├── arrayvec v0.5.1
│   │   │   │   ├── bitflags v1.2.1
│   │   │   │   ├── cfg-if v0.1.10
│   │   │   │   ├── ryu v1.0.5
│   │   │   │   └── static_assertions v1.1.0
│   │   │   └── memchr v2.3.3
│   │   │   [build-dependencies]
│   │   │   └── version_check v0.9.2
│   │   ├── pin-project v0.4.23
│   │   │   └── pin-project-internal v0.4.23
│   │   │       ├── proc-macro2 v1.0.24 (*)
│   │   │       ├── quote v1.0.7 (*)
│   │   │       └── syn v1.0.53 (*)
│   │   ├── pin-utils v0.1.0
│   │   ├── serde v1.0.117 (*)
│   │   ├── serde_derive v1.0.117 (*)
│   │   ├── serde_json v1.0.57
│   │   │   ├── itoa v0.4.6
│   │   │   ├── ryu v1.0.5
│   │   │   └── serde v1.0.117 (*)
│   │   └── thiserror v1.0.20 (*)
│   ├── async-std v1.7.0 (*)
│   ├── async-std-resolver v0.19.5
│   │   ├── async-std v1.7.0 (*)
│   │   ├── async-trait v0.1.38 (*)
│   │   ├── futures v0.3.8 (*)
│   │   └── trust-dns-resolver v0.19.5
│   │       ├── backtrace v0.3.50
│   │       │   ├── addr2line v0.13.0
│   │       │   │   └── gimli v0.22.0
│   │       │   ├── cfg-if v0.1.10
│   │       │   ├── libc v0.2.76
│   │       │   ├── miniz_oxide v0.4.0
│   │       │   │   └── adler v0.2.3
│   │       │   ├── object v0.20.0
│   │       │   └── rustc-demangle v0.1.16
│   │       ├── cfg-if v0.1.10
│   │       ├── futures v0.3.8 (*)
│   │       ├── lazy_static v1.4.0
│   │       ├── log v0.4.11 (*)
│   │       ├── lru-cache v0.1.2
│   │       │   └── linked-hash-map v0.5.3
│   │       ├── resolv-conf v0.6.3
│   │       │   ├── hostname v0.3.1
│   │       │   │   ├── libc v0.2.76
│   │       │   │   └── match_cfg v0.1.0
│   │       │   └── quick-error v1.2.3
│   │       ├── smallvec v1.4.2
│   │       ├── thiserror v1.0.20 (*)
│   │       └── trust-dns-proto v0.19.5
│   │           ├── async-trait v0.1.38 (*)
│   │           ├── backtrace v0.3.50 (*)
│   │           ├── enum-as-inner v0.3.3
│   │           │   ├── heck v0.3.1
│   │           │   │   └── unicode-segmentation v1.6.0
│   │           │   ├── proc-macro2 v1.0.24 (*)
│   │           │   ├── quote v1.0.7 (*)
│   │           │   └── syn v1.0.53 (*)
│   │           ├── futures v0.3.8 (*)
│   │           ├── idna v0.2.0 (*)
│   │           ├── lazy_static v1.4.0
│   │           ├── log v0.4.11 (*)
│   │           ├── rand v0.7.3 (*)
│   │           ├── smallvec v1.4.2
│   │           ├── thiserror v1.0.20 (*)
│   │           ├── tokio v0.2.22
│   │           │   ├── bytes v0.5.6
│   │           │   ├── fnv v1.0.7
│   │           │   ├── futures-core v0.3.8
│   │           │   ├── iovec v0.1.4
│   │           │   │   └── libc v0.2.76
│   │           │   ├── lazy_static v1.4.0
│   │           │   ├── libc v0.2.76
│   │           │   ├── memchr v2.3.3
│   │           │   ├── mio v0.6.22
│   │           │   │   ├── cfg-if v0.1.10
│   │           │   │   ├── iovec v0.1.4 (*)
│   │           │   │   ├── libc v0.2.76
│   │           │   │   ├── log v0.4.11 (*)
│   │           │   │   ├── net2 v0.2.34
│   │           │   │   │   ├── cfg-if v0.1.10
│   │           │   │   │   └── libc v0.2.76
│   │           │   │   └── slab v0.4.2
│   │           │   ├── mio-uds v0.6.8
│   │           │   │   ├── iovec v0.1.4 (*)
│   │           │   │   ├── libc v0.2.76
│   │           │   │   └── mio v0.6.22 (*)
│   │           │   ├── num_cpus v1.13.0 (*)
│   │           │   ├── pin-project-lite v0.1.7
│   │           │   ├── signal-hook-registry v1.2.1
│   │           │   │   ├── arc-swap v0.4.7
│   │           │   │   └── libc v0.2.76
│   │           │   ├── slab v0.4.2
│   │           │   └── tokio-macros v0.2.5
│   │           │       ├── proc-macro2 v1.0.24 (*)
│   │           │       ├── quote v1.0.7 (*)
│   │           │       └── syn v1.0.53 (*)
│   │           └── url v2.2.0 (*)
│   ├── fast-socks5 v0.3.1 (*)
│   ├── futures v0.3.8 (*)
│   ├── log v0.4.11 (*)
│   ├── mailchecker v3.3.12
│   │   └── fast_chemail v0.9.6 (*)
│   ├── rand v0.7.3 (*)
│   ├── regex v1.4.2
│   │   ├── aho-corasick v0.7.13
│   │   │   └── memchr v2.3.3
│   │   ├── memchr v2.3.3
│   │   ├── regex-syntax v0.6.21
│   │   └── thread_local v1.0.1
│   │       └── lazy_static v1.4.0
│   ├── reqwest v0.10.9
│   │   ├── async-compression v0.3.5
│   │   │   ├── bytes v0.5.6
│   │   │   ├── flate2 v1.0.17
│   │   │   │   ├── cfg-if v0.1.10
│   │   │   │   ├── crc32fast v1.2.0
│   │   │   │   │   └── cfg-if v0.1.10
│   │   │   │   ├── libc v0.2.76
│   │   │   │   └── miniz_oxide v0.4.0 (*)
│   │   │   ├── futures-core v0.3.8
│   │   │   ├── memchr v2.3.3
│   │   │   └── pin-project-lite v0.1.7
│   │   ├── base64 v0.13.0
│   │   ├── bytes v0.5.6
│   │   ├── encoding_rs v0.8.23
│   │   │   └── cfg-if v0.1.10
│   │   ├── futures-core v0.3.8
│   │   ├── futures-util v0.3.8 (*)
│   │   ├── http v0.2.1
│   │   │   ├── bytes v0.5.6
│   │   │   ├── fnv v1.0.7
│   │   │   └── itoa v0.4.6
│   │   ├── http-body v0.3.1
│   │   │   ├── bytes v0.5.6
│   │   │   └── http v0.2.1 (*)
│   │   ├── hyper v0.13.7
│   │   │   ├── bytes v0.5.6
│   │   │   ├── futures-channel v0.3.8 (*)
│   │   │   ├── futures-core v0.3.8
│   │   │   ├── futures-util v0.3.8 (*)
│   │   │   ├── h2 v0.2.6
│   │   │   │   ├── bytes v0.5.6
│   │   │   │   ├── fnv v1.0.7
│   │   │   │   ├── futures-core v0.3.8
│   │   │   │   ├── futures-sink v0.3.8
│   │   │   │   ├── futures-util v0.3.8 (*)
│   │   │   │   ├── http v0.2.1 (*)
│   │   │   │   ├── indexmap v1.5.1
│   │   │   │   │   └── hashbrown v0.8.2
│   │   │   │   │       [build-dependencies]
│   │   │   │   │       └── autocfg v1.0.0
│   │   │   │   │   [build-dependencies]
│   │   │   │   │   └── autocfg v1.0.0
│   │   │   │   ├── slab v0.4.2
│   │   │   │   ├── tokio v0.2.22 (*)
│   │   │   │   ├── tokio-util v0.3.1
│   │   │   │   │   ├── bytes v0.5.6
│   │   │   │   │   ├── futures-core v0.3.8
│   │   │   │   │   ├── futures-sink v0.3.8
│   │   │   │   │   ├── log v0.4.11 (*)
│   │   │   │   │   ├── pin-project-lite v0.1.7
│   │   │   │   │   └── tokio v0.2.22 (*)
│   │   │   │   └── tracing v0.1.19
│   │   │   │       ├── cfg-if v0.1.10
│   │   │   │       ├── log v0.4.11 (*)
│   │   │   │       ├── tracing-attributes v0.1.11
│   │   │   │       │   ├── proc-macro2 v1.0.24 (*)
│   │   │   │       │   ├── quote v1.0.7 (*)
│   │   │   │       │   └── syn v1.0.53 (*)
│   │   │   │       └── tracing-core v0.1.14
│   │   │   │           └── lazy_static v1.4.0
│   │   │   ├── http v0.2.1 (*)
│   │   │   ├── http-body v0.3.1 (*)
│   │   │   ├── httparse v1.3.4
│   │   │   ├── itoa v0.4.6
│   │   │   ├── pin-project v0.4.23 (*)
│   │   │   ├── socket2 v0.3.12 (*)
│   │   │   ├── time v0.1.43
│   │   │   │   └── libc v0.2.76
│   │   │   ├── tokio v0.2.22 (*)
│   │   │   ├── tower-service v0.3.0
│   │   │   ├── tracing v0.1.19 (*)
│   │   │   └── want v0.3.0
│   │   │       ├── log v0.4.11 (*)
│   │   │       └── try-lock v0.2.3
│   │   ├── hyper-tls v0.4.3
│   │   │   ├── bytes v0.5.6
│   │   │   ├── hyper v0.13.7 (*)
│   │   │   ├── native-tls v0.2.4 (*)
│   │   │   ├── tokio v0.2.22 (*)
│   │   │   └── tokio-tls v0.3.1
│   │   │       ├── native-tls v0.2.4 (*)
│   │   │       └── tokio v0.2.22 (*)
│   │   ├── ipnet v2.3.0
│   │   ├── lazy_static v1.4.0
│   │   ├── log v0.4.11 (*)
│   │   ├── mime v0.3.16
│   │   ├── mime_guess v2.0.3
│   │   │   ├── mime v0.3.16
│   │   │   └── unicase v2.6.0
│   │   │       [build-dependencies]
│   │   │       └── version_check v0.9.2
│   │   │   [build-dependencies]
│   │   │   └── unicase v2.6.0 (*)
│   │   ├── native-tls v0.2.4 (*)
│   │   ├── percent-encoding v2.1.0
│   │   ├── pin-project-lite v0.2.0
│   │   ├── serde v1.0.117 (*)
│   │   ├── serde_json v1.0.57 (*)
│   │   ├── serde_urlencoded v0.7.0
│   │   │   ├── form_urlencoded v1.0.0 (*)
│   │   │   ├── itoa v0.4.6
│   │   │   ├── ryu v1.0.5
│   │   │   └── serde v1.0.117 (*)
│   │   ├── tokio v0.2.22 (*)
│   │   ├── tokio-socks v0.3.0
│   │   │   ├── bytes v0.4.12
│   │   │   │   ├── byteorder v1.3.4
│   │   │   │   └── iovec v0.1.4 (*)
│   │   │   ├── either v1.6.0
│   │   │   ├── futures v0.3.8 (*)
│   │   │   ├── thiserror v1.0.20 (*)
│   │   │   └── tokio v0.2.22 (*)
│   │   ├── tokio-tls v0.3.1 (*)
│   │   └── url v2.2.0 (*)
│   ├── serde v1.0.117 (*)
│   ├── serde_json v1.0.57 (*)
│   └── trust-dns-proto v0.19.5 (*)
├── deadpool v0.6.0
│   ├── async-trait v0.1.38 (*)
│   ├── config v0.10.1
│   │   ├── lazy_static v1.4.0
│   │   ├── nom v5.1.2 (*)
│   │   └── serde v1.0.117 (*)
│   ├── crossbeam-queue v0.3.0
│   │   ├── cfg-if v1.0.0
│   │   └── crossbeam-utils v0.8.0 (*)
│   ├── num_cpus v1.13.0 (*)
│   ├── serde v1.0.117 (*)
│   └── tokio v0.3.4
│       └── pin-project-lite v0.2.0
│       [build-dependencies]
│       └── autocfg v1.0.0
├── derive_builder v0.9.0
│   ├── darling v0.10.2
│   │   ├── darling_core v0.10.2
│   │   │   ├── fnv v1.0.7
│   │   │   ├── ident_case v1.0.1
│   │   │   ├── proc-macro2 v1.0.24 (*)
│   │   │   ├── quote v1.0.7 (*)
│   │   │   ├── strsim v0.9.3
│   │   │   └── syn v1.0.53 (*)
│   │   └── darling_macro v0.10.2
│   │       ├── darling_core v0.10.2 (*)
│   │       ├── quote v1.0.7 (*)
│   │       └── syn v1.0.53 (*)
│   ├── derive_builder_core v0.9.0
│   │   ├── darling v0.10.2 (*)
│   │   ├── proc-macro2 v1.0.24 (*)
│   │   ├── quote v1.0.7 (*)
│   │   └── syn v1.0.53 (*)
│   ├── proc-macro2 v1.0.24 (*)
│   ├── quote v1.0.7 (*)
│   └── syn v1.0.53 (*)
├── futures v0.3.8 (*)
├── glob v0.3.0
├── headers v0.3.2
│   ├── base64 v0.12.3
│   ├── bitflags v1.2.1
│   ├── bytes v0.5.6
│   ├── headers-core v0.2.0
│   │   └── http v0.2.1 (*)
│   ├── http v0.2.1 (*)
│   ├── mime v0.3.16
│   ├── sha-1 v0.8.2
│   │   ├── block-buffer v0.7.3
│   │   │   ├── block-padding v0.1.5
│   │   │   │   └── byte-tools v0.3.1
│   │   │   ├── byte-tools v0.3.1
│   │   │   ├── byteorder v1.3.4
│   │   │   └── generic-array v0.12.3
│   │   │       └── typenum v1.12.0
│   │   ├── digest v0.8.1
│   │   │   └── generic-array v0.12.3 (*)
│   │   ├── fake-simd v0.1.2
│   │   └── opaque-debug v0.2.3
│   └── time v0.1.43 (*)
├── http v0.2.1 (*)
├── hubcaps v0.6.2
│   ├── base64 v0.12.3
│   ├── data-encoding v2.3.0
│   ├── futures v0.3.8 (*)
│   ├── http v0.2.1 (*)
│   ├── hyperx v1.0.0
│   │   ├── base64 v0.11.0
│   │   ├── bytes v0.5.6
│   │   ├── http v0.2.1 (*)
│   │   ├── httparse v1.3.4
│   │   ├── language-tags v0.2.2
│   │   ├── log v0.4.11 (*)
│   │   ├── mime v0.3.16
│   │   ├── percent-encoding v2.1.0
│   │   ├── time v0.1.43 (*)
│   │   └── unicase v2.6.0 (*)
│   ├── jsonwebtoken v7.2.0
│   │   ├── base64 v0.12.3
│   │   ├── pem v0.8.1
│   │   │   ├── base64 v0.12.3
│   │   │   ├── once_cell v1.4.1
│   │   │   └── regex v1.4.2 (*)
│   │   ├── ring v0.16.15
│   │   │   ├── spin v0.5.2
│   │   │   └── untrusted v0.7.1
│   │   │   [build-dependencies]
│   │   │   └── cc v1.0.59
│   │   ├── serde v1.0.117 (*)
│   │   ├── serde_json v1.0.57 (*)
│   │   └── simple_asn1 v0.4.1
│   │       ├── chrono v0.4.15
│   │       │   ├── num-integer v0.1.43
│   │       │   │   └── num-traits v0.2.12
│   │       │   │       [build-dependencies]
│   │       │   │       └── autocfg v1.0.0
│   │       │   │   [build-dependencies]
│   │       │   │   └── autocfg v1.0.0
│   │       │   ├── num-traits v0.2.12 (*)
│   │       │   └── time v0.1.43 (*)
│   │       ├── num-bigint v0.2.6
│   │       │   ├── num-integer v0.1.43 (*)
│   │       │   └── num-traits v0.2.12 (*)
│   │       │   [build-dependencies]
│   │       │   └── autocfg v1.0.0
│   │       └── num-traits v0.2.12 (*)
│   ├── log v0.4.11 (*)
│   ├── mime v0.3.16
│   ├── percent-encoding v2.1.0
│   ├── reqwest v0.10.9 (*)
│   ├── serde v1.0.117 (*)
│   ├── serde_derive v1.0.117 (*)
│   ├── serde_json v1.0.57 (*)
│   └── url v2.2.0 (*)
├── indicatif v0.15.0
│   ├── console v0.12.0
│   │   ├── lazy_static v1.4.0
│   │   ├── libc v0.2.76
│   │   ├── regex v1.4.2 (*)
│   │   ├── terminal_size v0.1.13
│   │   │   └── libc v0.2.76
│   │   ├── termios v0.3.2
│   │   │   └── libc v0.2.76
│   │   └── unicode-width v0.1.8
│   ├── lazy_static v1.4.0
│   ├── number_prefix v0.3.0
│   └── regex v1.4.2 (*)
├── lazy_static v1.4.0
├── linkify v0.4.0
│   └── memchr v2.3.3
├── log v0.4.11 (*)
├── pretty_env_logger v0.4.0
│   ├── env_logger v0.7.1
│   │   ├── atty v0.2.14
│   │   │   └── libc v0.2.76
│   │   ├── humantime v1.3.0
│   │   │   └── quick-error v1.2.3
│   │   ├── log v0.4.11 (*)
│   │   ├── regex v1.4.2 (*)
│   │   └── termcolor v1.1.0
│   └── log v0.4.11 (*)
├── pulldown-cmark v0.8.0
│   ├── bitflags v1.2.1
│   ├── getopts v0.2.21
│   │   └── unicode-width v0.1.8
│   ├── memchr v2.3.3
│   └── unicase v2.6.0 (*)
├── quick-xml v0.20.0
│   └── memchr v2.3.3
├── regex v1.4.2 (*)
├── reqwest v0.10.9 (*)
├── serde v1.0.117 (*)
├── shellexpand v2.0.0
│   └── dirs v2.0.2
│       ├── cfg-if v0.1.10
│       └── dirs-sys v0.3.5
│           └── libc v0.2.76
├── structopt v0.3.21
│   ├── clap v2.33.3
│   │   ├── ansi_term v0.11.0
│   │   ├── atty v0.2.14 (*)
│   │   ├── bitflags v1.2.1
│   │   ├── strsim v0.8.0
│   │   ├── textwrap v0.11.0
│   │   │   └── unicode-width v0.1.8
│   │   ├── unicode-width v0.1.8
│   │   └── vec_map v0.8.2
│   ├── lazy_static v1.4.0
│   └── structopt-derive v0.4.14
│       ├── heck v0.3.1 (*)
│       ├── proc-macro-error v1.0.4
│       │   ├── proc-macro-error-attr v1.0.4
│       │   │   ├── proc-macro2 v1.0.24 (*)
│       │   │   └── quote v1.0.7 (*)
│       │   │   [build-dependencies]
│       │   │   └── version_check v0.9.2
│       │   ├── proc-macro2 v1.0.24 (*)
│       │   ├── quote v1.0.7 (*)
│       │   └── syn v1.0.53 (*)
│       │   [build-dependencies]
│       │   └── version_check v0.9.2
│       ├── proc-macro2 v1.0.24 (*)
│       ├── quote v1.0.7 (*)
│       └── syn v1.0.53 (*)
├── tokio v0.2.22 (*)
├── toml v0.5.7
│   └── serde v1.0.117 (*)
└── url v2.2.0 (*)
[dev-dependencies]
├── assert_cmd v1.0.2
│   ├── doc-comment v0.3.3
│   ├── predicates v1.0.5
│   │   ├── difference v2.0.0
│   │   ├── float-cmp v0.8.0
│   │   │   └── num-traits v0.2.12 (*)
│   │   ├── normalize-line-endings v0.3.0
│   │   ├── predicates-core v1.0.0
│   │   └── regex v1.4.2 (*)
│   ├── predicates-core v1.0.0
│   ├── predicates-tree v1.0.0
│   │   ├── predicates-core v1.0.0
│   │   └── treeline v0.1.0
│   └── wait-timeout v0.2.0
│       └── libc v0.2.76
├── predicates v1.0.5 (*)
├── tempfile v3.1.0 (*)
├── uuid v0.8.1
│   └── rand v0.7.3 (*)
└── wiremock v0.3.0
    ├── async-h1 v2.1.2
    │   ├── async-std v1.7.0 (*)
    │   ├── byte-pool v0.2.2
    │   │   ├── crossbeam-queue v0.2.3
    │   │   │   ├── cfg-if v0.1.10
    │   │   │   ├── crossbeam-utils v0.7.2
    │   │   │   │   ├── cfg-if v0.1.10
    │   │   │   │   └── lazy_static v1.4.0
    │   │   │   │   [build-dependencies]
    │   │   │   │   └── autocfg v1.0.0
    │   │   │   └── maybe-uninit v2.0.0
    │   │   └── stable_deref_trait v1.2.0
    │   ├── futures-core v0.3.8
    │   ├── http-types v2.5.0
    │   │   ├── anyhow v1.0.34
    │   │   ├── async-std v1.7.0 (*)
    │   │   ├── cookie v0.14.2
    │   │   │   ├── aes-gcm v0.6.0
    │   │   │   │   ├── aead v0.3.2
    │   │   │   │   │   └── generic-array v0.14.4
    │   │   │   │   │       └── typenum v1.12.0
    │   │   │   │   │       [build-dependencies]
    │   │   │   │   │       └── version_check v0.9.2
    │   │   │   │   ├── aes v0.4.0
    │   │   │   │   │   ├── aes-soft v0.4.0
    │   │   │   │   │   │   ├── block-cipher v0.7.1
    │   │   │   │   │   │   │   └── generic-array v0.14.4 (*)
    │   │   │   │   │   │   ├── byteorder v1.3.4
    │   │   │   │   │   │   └── opaque-debug v0.2.3
    │   │   │   │   │   └── block-cipher v0.7.1 (*)
    │   │   │   │   ├── block-cipher v0.7.1 (*)
    │   │   │   │   ├── ghash v0.3.0
    │   │   │   │   │   └── polyval v0.4.1
    │   │   │   │   │       ├── cfg-if v0.1.10
    │   │   │   │   │       └── universal-hash v0.4.0
    │   │   │   │   │           ├── generic-array v0.14.4 (*)
    │   │   │   │   │           └── subtle v2.3.0
    │   │   │   │   └── subtle v2.3.0
    │   │   │   ├── base64 v0.12.3
    │   │   │   ├── hkdf v0.9.0
    │   │   │   │   ├── digest v0.9.0
    │   │   │   │   │   └── generic-array v0.14.4 (*)
    │   │   │   │   └── hmac v0.8.1
    │   │   │   │       ├── crypto-mac v0.8.0
    │   │   │   │       │   ├── generic-array v0.14.4 (*)
    │   │   │   │       │   └── subtle v2.3.0
    │   │   │   │       └── digest v0.9.0 (*)
    │   │   │   ├── hmac v0.8.1 (*)
    │   │   │   ├── percent-encoding v2.1.0
    │   │   │   ├── rand v0.7.3 (*)
    │   │   │   ├── sha2 v0.9.1
    │   │   │   │   ├── block-buffer v0.9.0
    │   │   │   │   │   └── generic-array v0.14.4 (*)
    │   │   │   │   ├── cfg-if v0.1.10
    │   │   │   │   ├── cpuid-bool v0.1.2
    │   │   │   │   ├── digest v0.9.0 (*)
    │   │   │   │   └── opaque-debug v0.3.0
    │   │   │   └── time v0.2.22
    │   │   │       ├── const_fn v0.4.2
    │   │   │       ├── libc v0.2.76
    │   │   │       ├── standback v0.2.11
    │   │   │       │   [build-dependencies]
    │   │   │       │   └── version_check v0.9.2
    │   │   │       └── time-macros v0.1.1
    │   │   │           ├── proc-macro-hack v0.5.19
    │   │   │           └── time-macros-impl v0.1.1
    │   │   │               ├── proc-macro-hack v0.5.19
    │   │   │               ├── proc-macro2 v1.0.24 (*)
    │   │   │               ├── quote v1.0.7 (*)
    │   │   │               ├── standback v0.2.11 (*)
    │   │   │               └── syn v1.0.53 (*)
    │   │   │       [build-dependencies]
    │   │   │       └── version_check v0.9.2
    │   │   │   [build-dependencies]
    │   │   │   └── version_check v0.9.2
    │   │   ├── http v0.2.1 (*)
    │   │   ├── infer v0.2.3
    │   │   ├── pin-project-lite v0.1.7
    │   │   ├── rand v0.7.3 (*)
    │   │   ├── serde v1.0.117 (*)
    │   │   ├── serde_json v1.0.57 (*)
    │   │   ├── serde_qs v0.7.0
    │   │   │   ├── data-encoding v2.3.0
    │   │   │   ├── percent-encoding v2.1.0
    │   │   │   ├── serde v1.0.117 (*)
    │   │   │   └── thiserror v1.0.20 (*)
    │   │   ├── serde_urlencoded v0.7.0 (*)
    │   │   └── url v2.2.0 (*)
    │   ├── httparse v1.3.4
    │   ├── lazy_static v1.4.0
    │   ├── log v0.4.11 (*)
    │   └── pin-project-lite v0.1.7
    ├── async-std v1.7.0 (*)
    ├── bastion v0.4.3
    │   ├── anyhow v1.0.34
    │   ├── async-mutex v1.1.5 (*)
    │   ├── bastion-executor v0.4.0
    │   │   ├── arrayvec v0.5.1
    │   │   ├── bastion-utils v0.3.2
    │   │   ├── crossbeam-channel v0.4.3
    │   │   │   ├── cfg-if v0.1.10
    │   │   │   └── crossbeam-utils v0.7.2 (*)
    │   │   ├── crossbeam-epoch v0.8.2
    │   │   │   ├── cfg-if v0.1.10
    │   │   │   ├── crossbeam-utils v0.7.2 (*)
    │   │   │   ├── lazy_static v1.4.0
    │   │   │   ├── maybe-uninit v2.0.0
    │   │   │   ├── memoffset v0.5.5
    │   │   │   │   [build-dependencies]
    │   │   │   │   └── autocfg v1.0.0
    │   │   │   └── scopeguard v1.1.0
    │   │   │   [build-dependencies]
    │   │   │   └── autocfg v1.0.0
    │   │   ├── crossbeam-queue v0.2.3 (*)
    │   │   ├── crossbeam-utils v0.7.2 (*)
    │   │   ├── futures-timer v3.0.2
    │   │   ├── lazy_static v1.4.0
    │   │   ├── lever v0.1.1-alpha.11
    │   │   │   ├── anyhow v1.0.34
    │   │   │   ├── crossbeam-epoch v0.8.2 (*)
    │   │   │   ├── lazy_static v1.4.0
    │   │   │   ├── log v0.4.11 (*)
    │   │   │   ├── parking_lot v0.11.0
    │   │   │   │   ├── instant v0.1.6
    │   │   │   │   ├── lock_api v0.4.1
    │   │   │   │   │   └── scopeguard v1.1.0
    │   │   │   │   └── parking_lot_core v0.8.0
    │   │   │   │       ├── cfg-if v0.1.10
    │   │   │   │       ├── instant v0.1.6
    │   │   │   │       ├── libc v0.2.76
    │   │   │   │       └── smallvec v1.4.2
    │   │   │   └── thiserror v1.0.20 (*)
    │   │   ├── libc v0.2.76
    │   │   ├── lightproc v0.3.5
    │   │   │   ├── crossbeam-utils v0.7.2 (*)
    │   │   │   └── pin-utils v0.1.0
    │   │   ├── num_cpus v1.13.0 (*)
    │   │   ├── once_cell v1.4.1
    │   │   ├── pin-utils v0.1.0
    │   │   └── tracing v0.1.19 (*)
    │   ├── crossbeam-queue v0.2.3 (*)
    │   ├── futures v0.3.8 (*)
    │   ├── futures-timer v3.0.2
    │   ├── fxhash v0.2.1
    │   │   └── byteorder v1.3.4
    │   ├── lazy_static v1.4.0
    │   ├── lever v0.1.1-alpha.11 (*)
    │   ├── lightproc v0.3.5 (*)
    │   ├── nuclei v0.1.2-alpha.1
    │   │   ├── agnostik v0.1.5
    │   │   │   ├── bastion-executor v0.3.6
    │   │   │   │   ├── arrayvec v0.5.1
    │   │   │   │   ├── bastion-utils v0.3.2
    │   │   │   │   ├── crossbeam-channel v0.4.3 (*)
    │   │   │   │   ├── crossbeam-epoch v0.8.2 (*)
    │   │   │   │   ├── crossbeam-utils v0.7.2 (*)
    │   │   │   │   ├── futures-timer v3.0.2
    │   │   │   │   ├── lazy_static v1.4.0
    │   │   │   │   ├── libc v0.2.76
    │   │   │   │   ├── lightproc v0.3.5 (*)
    │   │   │   │   ├── num_cpus v1.13.0 (*)
    │   │   │   │   └── pin-utils v0.1.0
    │   │   │   ├── lightproc v0.3.5 (*)
    │   │   │   └── once_cell v1.4.1
    │   │   ├── futures v0.3.8 (*)
    │   │   ├── futures-io v0.3.8
    │   │   ├── futures-util v0.3.8 (*)
    │   │   ├── lever v0.1.1-alpha.11 (*)
    │   │   ├── libc v0.2.76
    │   │   ├── once_cell v1.4.1
    │   │   ├── pin-utils v0.1.0
    │   │   └── socket2 v0.3.12 (*)
    │   ├── pin-utils v0.1.0
    │   ├── serde v1.0.117 (*)
    │   ├── serde_json v1.0.57 (*)
    │   ├── tracing v0.1.19 (*)
    │   ├── tracing-subscriber v0.2.11
    │   │   ├── ansi_term v0.12.1
    │   │   ├── chrono v0.4.15 (*)
    │   │   ├── lazy_static v1.4.0
    │   │   ├── matchers v0.0.1
    │   │   │   └── regex-automata v0.1.9
    │   │   │       ├── byteorder v1.3.4
    │   │   │       └── regex-syntax v0.6.21
    │   │   ├── regex v1.4.2 (*)
    │   │   ├── serde v1.0.117 (*)
    │   │   ├── serde_json v1.0.57 (*)
    │   │   ├── sharded-slab v0.0.9
    │   │   │   └── lazy_static v1.4.0
    │   │   ├── smallvec v1.4.2
    │   │   ├── thread_local v1.0.1 (*)
    │   │   ├── tracing-core v0.1.14 (*)
    │   │   ├── tracing-log v0.1.1
    │   │   │   ├── lazy_static v1.4.0
    │   │   │   ├── log v0.4.11 (*)
    │   │   │   └── tracing-core v0.1.14 (*)
    │   │   └── tracing-serde v0.1.1
    │   │       ├── serde v1.0.117 (*)
    │   │       └── tracing-core v0.1.14 (*)
    │   └── uuid v0.8.1 (*)
    ├── futures-timer v3.0.2
    ├── http-types v2.5.0 (*)
    ├── log v0.4.11 (*)
    ├── regex v1.4.2 (*)
    ├── serde v1.0.117 (*)
    └── serde_json v1.0.57 (*)

The entries marked with (*) are duplicates.
I cleaned up the output with my magic shell skills. (Yes, the command could be improved, but you get the idea.)

cargo tree | grep -F "(*)" | tr -dc '[:alnum:]\.\n\r' | sort | uniq -c | sort -rn

Result for more than one occurrence:

  22 logv0.4.11
  17 serdev1.0.117
  17 quotev1.0.7
  17 procmacro2v1.0.24
  15 synv1.0.53
   9 httpv0.2.1
   9 futuresv0.3.8
   8 tokiov0.2.22
   8 serdejsonv1.0.57
   8 regexv1.4.2
   7 thiserrorv1.0.20
   7 asyncstdv1.7.0
   5 urlv2.2.0
   5 randv0.7.3
   5 numcpusv1.13.0
   5 genericarrayv0.14.4
   5 futuresutilv0.3.8
   5 crossbeamutilsv0.7.2
   4 numtraitsv0.2.12
   4 futureslitev1.11.2
   3 unicasev2.6.0
   3 tracingv0.1.19
   3 tracingcorev0.1.14
   3 timev0.1.43
   3 nativetlsv0.2.4
   3 lightprocv0.3.5
   3 iovecv0.1.4
   3 asynctraitv0.1.38
   2 socket2v0.3.12
   2 serdederivev1.0.117
   2 reqwestv0.10.9
   2 leverv0.1.1alpha.11
   2 futurestaskv0.3.8
   2 futureschannelv0.3.8
   2 digestv0.9.0
   2 crossbeamqueuev0.2.3
   2 crossbeamepochv0.8.2
   2 concurrentqueuev1.2.2
   2 blockcipherv0.7.1

I don't know yet what to make of this, but I wanted to note it down for later inspection.
One idea is to not pin the dependencies to a patch version but rather to a minor version for pre-1.0 crates and a major version for post-1.0 crates.
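
Side note on the counting itself: the `tr` step in the pipeline removes hyphens and spaces along with the tree characters, which is why `proc-macro2 v1.0.24` shows up as `procmacro2v1.0.24`. Below is a rough Rust sketch of the same count that keeps the crate names intact. The function name `count_repeated` and the sample tree are made up for illustration; the sketch only assumes that repeated entries end in `(*)`, as in the output above.

```rust
use std::collections::HashMap;

/// Count how often each `name version` pair is marked `(*)` in `cargo tree` output.
fn count_repeated(tree: &str) -> Vec<(String, usize)> {
    let mut counts: HashMap<String, usize> = HashMap::new();
    for line in tree.lines() {
        // Only look at lines that cargo marked as already shown elsewhere.
        if let Some(entry) = line.trim_end().strip_suffix("(*)") {
            // Drop the tree-drawing prefix but keep hyphens inside crate names.
            let entry = entry
                .trim_start_matches(|c: char| c.is_whitespace() || "│├└─".contains(c))
                .trim_end();
            *counts.entry(entry.to_string()).or_insert(0) += 1;
        }
    }
    let mut sorted: Vec<(String, usize)> = counts.into_iter().collect();
    // Highest count first; ties are broken alphabetically for stable output.
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    sorted
}

fn main() {
    // Made-up sample, not real lychee output.
    let sample = "\
lychee v0.3.0
├── log v0.4.11
├── proc-macro2 v1.0.24
├── quote v1.0.7
│   └── proc-macro2 v1.0.24 (*)
└── serde v1.0.117
    ├── proc-macro2 v1.0.24 (*)
    └── log v0.4.11 (*)";
    for (name, n) in count_repeated(sample) {
        println!("{:>4} {}", n, name);
    }
}
```

Feeding it the real output would be a matter of reading stdin instead of the hard-coded sample.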

Add recursive option

It would be nice to pass a URL and have it crawl the entire website recursively looking for dead links.

In order to avoid crawling the entire internet, it should stop recursing once a request no longer matches the original domain.
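
To make the stopping rule concrete, here is a minimal sketch under some loud assumptions: `host` is a naive hand-written host extractor (a real implementation would use a proper URL parser, and this one does not handle IPv6 literals), and `links_on` is a hypothetical in-memory map standing in for "fetch the page and extract its links". Links to other hosts are still checked, but they are never expanded further.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Naive host extraction for `scheme://host[:port]/path` URLs (illustration only).
fn host(url: &str) -> Option<&str> {
    let rest = url.split_once("://")?.1;
    let authority = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    // Strip optional `user:password@` and `:port` parts.
    let host = authority.rsplit_once('@').map_or(authority, |(_, h)| h);
    let host = host.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// Breadth-first crawl that only follows links found on the start URL's host.
/// Returns every URL that would be checked, in visiting order.
fn crawl(start: &str, links_on: &HashMap<&str, Vec<&str>>) -> Vec<String> {
    let origin = host(start);
    let mut seen: HashSet<String> = HashSet::new();
    let mut queue: VecDeque<String> = VecDeque::new();
    let mut checked = Vec::new();

    seen.insert(start.to_string());
    queue.push_back(start.to_string());

    while let Some(url) = queue.pop_front() {
        checked.push(url.clone());
        // Stop recursing as soon as the URL leaves the original host.
        if host(&url) != origin {
            continue;
        }
        for link in links_on.get(url.as_str()).into_iter().flatten() {
            // `insert` returns false for URLs that were already queued.
            if seen.insert(link.to_string()) {
                queue.push_back(link.to_string());
            }
        }
    }
    checked
}

fn main() {
    let mut site = HashMap::new();
    site.insert(
        "https://example.com/",
        vec!["https://example.com/a", "https://other.org/x"],
    );
    site.insert("https://example.com/a", vec!["https://example.com/"]);
    site.insert("https://other.org/x", vec!["https://other.org/y"]);

    for url in crawl("https://example.com/", &site) {
        println!("{}", url);
    }
}
```

Whether "same domain" should mean the exact host or also subdomains is an open design choice that the sketch does not settle.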

Question: Should `Accept: text/html` be a default header?

Crates.io returns 404 when not specifying a custom Accept: text/html header.

I wonder if we should make that header a default for all requests.
I'm not sure how many other websites have the same issue, but I'm wondering if it can hurt to specify the accepted content type just in case.
On the other hand, there might be scenarios where we expect JSON or another content type as a response,
so it might be a bad idea. I still wanted to get it out there.

Does anyone have any experience with this?

lychee.toml

Providing a lot of command-line flags can get tedious and repetitive quite quickly.
Consider the following example:

lychee --verbose --progress --max-redirects 2 --user-agent "lychee" --insecure --exclude "github.com" --exclude-all-private --timeout 10

As an alternative I'd love to have a lychee.toml file, which can be used instead of manually specifying the flags:

verbose = true
progress = true
max-redirects = 2
user-agent = "lychee"
insecure = true
exclude = [
  "github.com"
]
exclude-all-private = true
timeout = 10

Then whenever lychee is called, it would look for the lychee.toml file and read the parameters. Parameters could still be overridden from the command line. (E.g. lychee --timeout=11 would override the timeout from the config.)
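
The precedence described above (command line first, then lychee.toml, then a built-in default) could be sketched roughly like this. Everything here is hypothetical: the `Partial` and `Config` types, the `resolve` function, and the default values are placeholders for illustration, and the parsing of the TOML file and of the flags is left out.

```rust
/// Values that a single source (config file or command line) may or may not set.
#[derive(Default)]
struct Partial {
    verbose: Option<bool>,
    max_redirects: Option<usize>,
    timeout: Option<u64>,
}

/// Fully resolved settings.
#[derive(Debug, PartialEq)]
struct Config {
    verbose: bool,
    max_redirects: usize,
    timeout: u64,
}

/// Resolve each setting as: command line, else config file, else built-in default.
/// The defaults (false, 10, 20) are placeholders, not lychee's actual defaults.
fn resolve(cli: &Partial, file: &Partial) -> Config {
    Config {
        verbose: cli.verbose.or(file.verbose).unwrap_or(false),
        max_redirects: cli.max_redirects.or(file.max_redirects).unwrap_or(10),
        timeout: cli.timeout.or(file.timeout).unwrap_or(20),
    }
}

fn main() {
    // What lychee.toml from the example above would provide.
    let file = Partial {
        verbose: Some(true),
        max_redirects: Some(2),
        timeout: Some(10),
    };
    // What `lychee --timeout=11` would provide.
    let cli = Partial {
        timeout: Some(11),
        ..Partial::default()
    };
    println!("{:?}", resolve(&cli, &file));
}
```

Keeping every field optional until the final merge is what lets a flag win over the file without either source having to know the defaults.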

If you're interested in working on this, add a comment and I'll assign it to you.

Make lychee usable as a library

Realized that some other projects might profit from a good link checker.
One such project that comes to mind is zola, which uses a custom but basic link checker right now.

TODO:

  • Split code into lib.rs and main.rs
  • Use lib.rs from main.rs
  • Bonus: split lib into separate crate that can be published on crates.io

If anyone is interested in giving this a shot, feel free to add a comment below so I can assign it to you. Thanks!

missing features in comparison table

Hi!

By default, some crawlers (like linkchecker and the w3c link checker) respect robots.txt (because they are, after all, bots). One of the things that got me involved in linkchecker is the ability to disable that for some sites, actually.

Is that a feature that lychee supports? Either way, it should probably be listed in the table.

Same with GUI support: linkchecker has a GUI, not sure if that's the case for lychee or the others. Oh and plugins, we have plugins too. :)

I suspect there might be other such features missing here... heck, just looking at the linkchecker readme, I find:

  • recurses
  • GUI
  • web interface
  • robots.txt
  • plugins
  • multiple output (HTML, SQL, CSV, XML, Sitemap, text)
  • regex filters (or are those included in the "exclude filters" bit?)
  • proxy support
  • telnet, FTP, news:, nntp support
  • cookie support
  • html5 support (although to be honest I have no idea what that actually means)

Those are provided through plugins:

  • anchor checks
  • PDF parsing
  • word document parsing
  • HTTPS expiration check
  • virus checks
  • content search for regex
  • w3c syntax check

Use structopt-toml as a config loader?

I just found structopt-toml and it looks like it covers all of what we need for our config loader. The code is quite similar and also uses serde in the background. The advantage is that we could outsource some code to this crate and avoid duplication of default values in our config.rs.

Here is an example from their docs:

use serde_derive::Deserialize;
use structopt::StructOpt;
use structopt_toml::StructOptToml;

#[derive(Debug, Deserialize, StructOpt, StructOptToml)]
#[serde(default)]
struct Opt {
    #[structopt(default_value = "0", short = "a")]
    a: i32,
    #[structopt(default_value = "0", short = "b")]
    b: i32,
}

fn main() {
    let toml_str = r#"
        a = 10
    "#;
    let opt = Opt::from_args_with_toml(toml_str).expect("toml parse failed");
    println!("a:{}", opt.a);
    println!("b:{}", opt.b);
}

@pawroman, @akrantz01 what do you think? Am I missing a feature here that we need?
If anyone wants to work on a PR, feel free to comment here.

Can't compile due to "error: reached the type-length limit"

❯ cargo build
   Compiling lychee v0.3.0 (/Users/xiaochuanyu/stuff/lychee)
error: reached the type-length limit while instantiating `<std::vec::IntoIter<std::future:...]>>, ()}]>, ()}]>}]>>::Future}]>`
    --> /Users/xiaochuanyu/.rustup/toolchains/stable-x86_64-apple-darwin/lib/rustlib/src/rust/src/libcore/iter/traits/iterator.rs:2015:5
     |
2015 | /     fn fold<B, F>(mut self, init: B, mut f: F) -> B
2016 | |     where
2017 | |         Self: Sized,
2018 | |         F: FnMut(B, Self::Item) -> B,
...    |
2024 | |         accum
2025 | |     }
     | |_____^
     |
     = note: consider adding a `#![type_length_limit="2332107"]` attribute to your crate

I tried doing what's suggested but got a similar error again with a higher suggested limit.
Eventually adding #![type_length_limit="7912782"] to top of main.rs worked.

Link Checker Report

Errors were reported while checking the availability of links.


📝 Summary
---------------------
🔍 Total...........30
✅ Successful......29
⏳ Timeouts.........0
🔀 Redirected.......0
👻 Excluded.........0
🚫 Errors...........1

Input: README.md
   🚫 https://example.org/README.md
      Failed (404 Not Found)


Full Github Actions output

Use colored output

Introduction

We add some emojis to lychee's output, but to make failures easy to tell apart from mere status updates we could also add some color.

How to contribute

Add the colored crate as a project dependency and adjust the println! statement accordingly. Errors should be red and successes green.
Feel free to experiment with other crates or outputs.
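
As a starting point, here is a minimal sketch of what red/green result lines could look like. It deliberately uses raw ANSI escape codes instead of the colored crate so that it compiles with the standard library alone; with colored, the same effect would come from its `Colorize` methods such as `.red()` and `.green()`. The `Outcome` enum and the `colorize` function are hypothetical names for illustration, not part of lychee.

```rust
/// Outcome of a single link check.
/// Hypothetical, simplified stand-in for lychee's real status type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Outcome {
    Ok,
    Failed,
}

// Standard ANSI escape sequences for foreground colors.
const RED: &str = "\x1b[31m";
const GREEN: &str = "\x1b[32m";
const RESET: &str = "\x1b[0m";

/// Format one result line: green for successes, red for failures.
pub fn colorize(outcome: Outcome, url: &str) -> String {
    let (color, icon) = match outcome {
        Outcome::Ok => (GREEN, "✅"),
        Outcome::Failed => (RED, "🚫"),
    };
    // Always reset at the end so the color does not leak into later output.
    format!("{}{} {}{}", color, icon, url, RESET)
}
```

A real implementation would probably also want to turn colors off when the output is not a terminal (for example when piping into grep), which a dedicated crate may make easier than hand-written escape codes.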

Simplified `check` method that accepts &str

Introduction

The current check method of our client only accepts a URI, so users have to do this slightly awkward dance to get a link checked:

let url = Url::parse("https://github.com/lycheeverse/lychee")?;
let response = client.check(Website(url)).await;

We've been discussing whether we should support a simplified interface as well that simply accepts strings:

let response = client.check("https://github.com/lycheeverse/lychee").await;

How to contribute

If you want to tackle that, please propose a signature for the simplified checker here.
We'd be particularly interested in the trade-offs of each approach before starting to implement it.
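
To get the discussion going, one possible shape for such a signature is a `check` that is generic over `TryInto<Uri>`, so it accepts both an already-parsed URI and a plain string. This is only a sketch under assumptions: the `Uri`, `ParseError` and `Client` types below are simplified stand-ins (the real types wrap a parsed `Url`), the prefix-based parsing is a placeholder, and `check` is synchronous here to keep the example free of external crates.

```rust
use std::convert::{TryFrom, TryInto};

/// Simplified stand-in for lychee's URI type.
#[derive(Debug, PartialEq)]
pub enum Uri {
    Website(String),
    Mail(String),
}

/// Hypothetical error returned when a string cannot be turned into a `Uri`.
#[derive(Debug, PartialEq)]
pub struct ParseError(pub String);

impl TryFrom<&str> for Uri {
    type Error = ParseError;

    fn try_from(s: &str) -> Result<Self, Self::Error> {
        // Placeholder parsing; a real version would use a proper URL parser.
        if s.starts_with("http://") || s.starts_with("https://") {
            Ok(Uri::Website(s.to_string()))
        } else if let Some(addr) = s.strip_prefix("mailto:") {
            Ok(Uri::Mail(addr.to_string()))
        } else {
            Err(ParseError(format!("unsupported input: {}", s)))
        }
    }
}

/// Hypothetical client used only for this sketch.
pub struct Client;

impl Client {
    /// Accepts anything convertible into a `Uri`, including `Uri` itself.
    pub fn check<T>(&self, input: T) -> Result<Uri, T::Error>
    where
        T: TryInto<Uri>,
    {
        let uri = input.try_into()?;
        // A real implementation would send the request here and return a response.
        Ok(uri)
    }
}
```

The trade-off with this approach is that the error type becomes generic (`T::Error`), which makes the signature harder to read and to document. A separate, non-generic method that takes `&str` would be simpler but adds a second entry point to the API.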

❤️ Help publish lychee for common platforms

Introduction

lychee is getting more mature and we'd like to share the future v1 with a broader audience.
For that we need an easy way to install the binary.

Package Managers

Preferably, all of our packages should have versions for these CPU architectures:
x86_64, arm and aarch64 (64-bit ARM)

(This list is probably incomplete and should be extended.)

Language Bindings

  • NPM (JavaScript bindings using Neon)
  • pip (Python Bindings using pyo3)

How to contribute

Please pick one of the above targets and create a pull request.

Inspiration:

Footnotes

  1. Maybe using cargo-deb

Include patterns

We have ways to exclude URLs via regex, but no ways to include them.
There should be support for a --include parameter which takes a regex similar to --exclude.
Preferably the includes and excludes should be handled in order of appearance on the command line.
This way we could allow excluding a range of URLs while still allowing for exceptions, e.g.
--exclude "github.com" --include "github.com/hello-rust/lychee".
(Most users will probably only use a single exclude or include.)

Help is very welcome. Add a comment here if you'd like to work on this and I'll assign it to you.
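
One way to read "handled in order of appearance" is that the last matching rule wins. The sketch below shows that interpretation; it is an assumption about the intended semantics, not a settled design. To stay dependency-free it uses substring matching where the real option would take a regex, and `Rule` and `should_check` are hypothetical names.

```rust
/// One filter rule, in the order it appeared on the command line.
/// Patterns are plain substrings here; the real option would use a regex.
#[derive(Debug, Clone, PartialEq)]
pub enum Rule {
    Include(String),
    Exclude(String),
}

/// Decide whether a URL should be checked.
/// Rules are applied in order, so a later matching rule overrides an earlier one.
/// URLs that match no rule are checked by default.
pub fn should_check(rules: &[Rule], url: &str) -> bool {
    let mut checked = true;
    for rule in rules {
        match rule {
            Rule::Include(p) if url.contains(p.as_str()) => checked = true,
            Rule::Exclude(p) if url.contains(p.as_str()) => checked = false,
            _ => {}
        }
    }
    checked
}
```

With the rules from the example above (exclude "github.com", then include "github.com/hello-rust/lychee"), every GitHub URL is skipped except the one repository that was explicitly included again, and unrelated URLs are still checked.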

--exclude option excludes input files

I just realised that the --exclude option takes all following arguments as exclude expressions. It took me a while to understand why Error: Failed to read file: README.md shows up even though I passed a lot of other files and globs as input 😄. Only -- or another option (an argument starting with a dash) ends the list of exclude arguments.

If this is expected, a note in the README and the help text would probably be good, saying that -- should be used to end the list of exclude arguments before the input file arguments start.

Otherwise, probably --exclude could use a single argument only. It should be enough since (a|b|c) regex syntax can be used to concatenate multiple expressions anyway? Or --exclude could be allowed multiple times.

Not sure what is best here and it is not really a bug, but since it took me 30 minutes to find out what is wrong, I thought it's a good idea to prevent other users from running into the same 😄.
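
To illustrate the behaviour described in this report, here is a toy model of a multi-value option that keeps consuming arguments until it sees -- or another dash-prefixed option. It is not lychee's actual argument parser, only a sketch of the effect; `split_args` is a hypothetical helper and every option other than --exclude is ignored.

```rust
/// Split raw arguments into (exclude patterns, input files).
/// Toy model: after `--exclude`, every argument counts as a pattern
/// until `--` or another option starting with a dash is seen.
pub fn split_args(args: &[&str]) -> (Vec<String>, Vec<String>) {
    let mut excludes = Vec::new();
    let mut inputs = Vec::new();
    let mut in_exclude = false;

    for arg in args {
        match *arg {
            "--exclude" => in_exclude = true,
            // An explicit separator ends the list of exclude patterns.
            "--" => in_exclude = false,
            // Any other option also ends the list (and is ignored in this sketch).
            a if a.starts_with('-') => in_exclude = false,
            a if in_exclude => excludes.push(a.to_string()),
            a => inputs.push(a.to_string()),
        }
    }
    (excludes, inputs)
}
```

In this model, passing `--exclude example.org docs/index.md` leaves the input list empty because the file name is swallowed as a second pattern, while `--exclude example.org -- docs/index.md` yields one pattern and one input file.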

Wrong error on emails in Gitlab-CI

Hi,
lychee in GitLab CI wrongly reports errors on email addresses, see:
https://salsa.debian.org/faust/go-team.pages.debian.net/-/jobs/1427125#L150-L151


Here is the yaml part:

check:
  stage: check
  script:
    - apt-get update -qq && apt-get install -y -qq wget
    - wget -qO- "https://github.com/lycheeverse/lychee/releases/download/v0.5.0/lychee-v0.5.0-x86_64-unknown-linux-gnu.tar.gz" | tar -xz
    - ./lychee --verbose --exclude="irc://irc.debian.org:6667" --exclude="https://anonscm.debian.org" build/*.html

When running it locally, no error is detected:

./lychee --verbose --exclude="irc://irc.debian.org:6667" --exclude="https://anonscm.debian.org" build/*.html | grep @
[email protected] [200 OK]
[email protected] [200 OK]

Am I missing something?
