deadlinks's Introduction

deadlinks

Health checks for your documentation links.

Features

  • Concurrent and recursive checks
  • Respect for robots.txt restrictions (content only)
  • External link checks
  • Checks limited to links within the base URL path
  • Retries on HTTP 502, 503 and 504 errors
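The retry behaviour in the last item can be pictured with a small sketch. It assumes a hypothetical `fetch` callable that returns an HTTP status code, plus an exponential backoff schedule chosen for illustration; deadlinks' actual retry count and delays may differ.

```python
import time
from typing import Callable

# Status codes the feature list names as retryable.
RETRY_STATUSES = frozenset({502, 503, 504})


def fetch_with_retries(
    fetch: Callable[[str], int],  # hypothetical: returns an HTTP status code
    url: str,
    attempts: int = 3,
    backoff: float = 0.5,
) -> int:
    """Call fetch(url) until it returns a non-retryable status or attempts run out."""
    status = fetch(url)
    for attempt in range(1, attempts):
        if status not in RETRY_STATUSES:
            break
        # Assumed exponential backoff: backoff, 2 * backoff, 4 * backoff, ...
        time.sleep(backoff * 2 ** (attempt - 1))
        status = fetch(url)
    return status
```

Any other status (200, 404, ...) is returned after the first call, so only transient gateway errors pay the extra delay.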

Installing

Using package installer for Python

# using pip - package installer for Python
pip install deadlinks

Mac

# we use a custom tap to install deadlinks
brew install butuzov/deadlinks/deadlinks

Using a forked repo for development purposes

# activate a virtual environment to keep your local site-packages clean.
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip

# if you are developing within a forked repository
cd /home/path/to/deadlinks
pip install -r requirements.txt
pip install -e .

Usage

See more examples in the docs.

# Check links (including external ones) at http://gobyexample.com/ in 10 threads,
# but skip those leading to the domains play.golang.org or github.com
deadlinks gobyexample.com -n 10 -e -d play.golang.org -d github.com

# Limit the check to links found within the /docs path.
deadlinks http://localhost:1313/docs

# Check all local links that belong to the domain.
deadlinks http://localhost:1313/docs/ -n 10 --full-site-check

# Check local HTML files
deadlinks internal -n 10 --root=/var/html

# Help yourself
deadlinks --help

Contributing

Here is a quick start guide to contributing to deadlinks:

  • Fork the deadlinks repository.
  • Create a feature branch based on develop.
  • Install the package using the development instructions.
  • Implement your feature and test it with make tests and make lints.
  • Create a pull request back to the develop branch.

All your contributions are welcome!

deadlinks's People

Contributors

butuzov, dependabot[bot], pyup-bot

deadlinks's Issues

Mapping URL

Short Description

Add a way to check the external URL if the local check fails.

Scenario

  1. CI checks the local website on localhost:8080 after generating static artefacts.
  2. (dev) localhost:8080/old/path fails.
  3. (prod) on domain.com/docs, /old/path is redirected to domain.com/docs/new/feature.
  4. So when a local check fails, we should (somehow) ask prod to check the same URL.
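The mapping step of this scenario can be sketched as a pure function. The name `map_to_production` and its parameters are hypothetical; the fallback request to production (and following its redirect) is left out.

```python
from typing import Optional


def map_to_production(local_url: str, local_base: str, prod_base: str) -> Optional[str]:
    """Rewrite a URL under local_base to the same relative path under prod_base.

    Hypothetical helper; returns None when local_url is not under local_base.
    """
    local_root = local_base.rstrip("/") + "/"
    if not local_url.startswith(local_root):
        return None  # not a local URL, nothing to map
    relative = local_url[len(local_root):]
    return prod_base.rstrip("/") + "/" + relative
```

For the scenario above, `map_to_production("http://localhost:8080/old/path", "http://localhost:8080", "https://domain.com/docs")` yields the production URL that prod would then redirect.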

Parsing Links

Research the possibility of using an HTML parser instead of regular expressions for link extraction.
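One candidate is the standard library's `html.parser`. The sketch below is an illustration, not deadlinks' code: the set of tags and attributes in `LINK_ATTRS` is an assumed minimal subset. Unlike a regular expression, the parser ignores `href=` inside plain text, handles upper-case tags, and unescapes entities such as `&amp;` in attribute values.

```python
from html.parser import HTMLParser
from typing import List


class LinkExtractor(HTMLParser):
    """Collect link-carrying attribute values from an HTML document."""

    # Assumed subset: which attribute holds the link for each tag.
    LINK_ATTRS = {"a": "href", "link": "href", "img": "src", "script": "src", "iframe": "src"}

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; values are unescaped.
        wanted = self.LINK_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


def extract_links(html: str) -> List[str]:
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links
```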

Usage example

If you type

deadlinks --help

the usage example in the output has no line breaks.

link changed and not found.

Scenario

  • One of the links on the page has a trailing slash, but deadlinks removes it.
  • The web server returns 404 when the trailing slash is missing.
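A normalization step that avoids this bug can leave the path untouched. The sketch below (hypothetical `normalize_url`) lower-cases the scheme and host and drops the fragment, but keeps `/docs/` and `/docs` as distinct URLs, since a server may answer them differently. Lower-casing the whole netloc would also lower-case any user info, which is an accepted simplification here.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Normalize for de-duplication without touching the path.

    Hypothetical helper: keeps the path exactly as written, trailing slash included.
    """
    parts = urlsplit(url)
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, ""))
```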

Set changed size during iteration

Test case

deadlinks https://devdocs.magento.com/ -n 10 -s all -p "blob/master" -p "issues/new"  -d "jira"  | tee devdocs.txt

RuntimeError: Set changed size during iteration

Using sets wasn't such a great idea.
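The error comes from growing a set while a `for` loop is iterating over it. A common fix, sketched here with hypothetical names and a hypothetical `get_links` callable, is a separate work queue: the `seen` set is only mutated, never iterated.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def crawl_with_set_bug(pending: Set[str], get_links: Callable[[str], Iterable[str]]) -> None:
    """The problematic pattern: adding to a set while looping over it."""
    for url in pending:
        pending.update(get_links(url))  # next loop step raises RuntimeError


def crawl(start: str, get_links: Callable[[str], Iterable[str]]) -> List[str]:
    """Visit every reachable URL once, breadth-first."""
    seen: Set[str] = {start}
    queue = deque([start])  # work queue, safe to grow while consuming
    order: List[str] = []
    while queue:
        url = queue.popleft()
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

With worker threads, `queue.Queue` would replace `deque`, and access to `seen` would need a lock.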

Distribution - Brew

Create a formula for Brew.

Check out ansible and youtube-dl as examples of Python packages distributed via Homebrew.

url decoding

Test for different possible URL decoding bugs.
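A starting point for such tests is a canonical form of the path used only as a comparison key. The sketch below (hypothetical `canonical_path`) decodes and re-encodes each segment separately, so `%2F` is never turned into a path separator and double encoding such as `%2520` is decoded only once. It is meant for de-duplication, not for building the request URL.

```python
from urllib.parse import quote, unquote


def canonical_path(path: str) -> str:
    """Comparison key: equivalent percent-encodings of a path compare equal.

    Hypothetical helper; each segment is decoded once and fully re-encoded.
    """
    return "/".join(quote(unquote(segment), safe="") for segment in path.split("/"))
```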

Redirection tracking

Case

  1. Get the page at https://domain.com/path.
  2. A redirect happens: https://domain.com/path -> https://domain.com/path/
  3. Get the links at /path (link1 and link2).
  4. deadlinks creates the links https://domain.com/link1 and https://domain.com/link2, while the real URLs are https://domain.com/path/link1 and https://domain.com/path/link2.
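The fix is to resolve relative links against the URL the page was finally served from, not the one that was requested. In the `requests` library that is `response.url`; the helper name below is hypothetical, and a `<base href>` element in the page (which would override the base) is not handled.

```python
from typing import Iterable, List
from urllib.parse import urljoin


def resolve_links(final_url: str, hrefs: Iterable[str]) -> List[str]:
    """Resolve relative hrefs against the post-redirect URL of the page."""
    return [urljoin(final_url, href) for href in hrefs]
```

Resolving against `https://domain.com/path` reproduces the bug (`https://domain.com/link1`); resolving against the redirect target `https://domain.com/path/` gives `https://domain.com/path/link1`.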

running cli app without scheme

> deadlinks localhost:1313 -s all
Usage:
  deadlinks <URL> [OPTIONS]
Try "deadlinks --help" for help.

Error: URL localhost:1313 is not valid

It's better for users if we catch this error and silently prepend "http://".
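A minimal sketch of the silent fix, using a hypothetical `ensure_scheme` helper. It checks for an explicit `scheme://` prefix with a regular expression instead of `urlsplit`, because recent Python versions (3.9+) parse `localhost:1313` as scheme `localhost` with path `1313`.

```python
import re

# RFC 3986 scheme syntax: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), then "://".
_HAS_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def ensure_scheme(url: str, default: str = "http") -> str:
    """Prepend `default://` when the URL has no explicit scheme (hypothetical helper)."""
    return url if _HAS_SCHEME.match(url) else f"{default}://{url}"
```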

Python Version Upgrade

A few blockers before moving to the next supported Python version:

  1. reppy - seomoz/reppy#122
  2. new flow (golden master)
  3. CI update (switch to GitHub Actions or another service)
  4. time to do work on this

stay within path

I need to limit the app to stay within the documentation path by default, and maybe allow indexing the whole website with some knob.

Scenario

  1. Ask to index localhost/docs/projectname/ (which simply serves static files from the $root/docs/projectname folder).
  2. All links found that point to the localhost domain are valid, but only if they also start with /docs/projectname/.
  3. Otherwise, the links are treated as external.
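The scope rule from this scenario can be sketched as a predicate (hypothetical `in_scope`). The prefix test is segment-aware, so `/docs/projectname-old/` does not count as being inside `/docs/projectname/`; schemes are deliberately not compared, which is an assumption.

```python
from urllib.parse import urlsplit


def in_scope(url: str, base: str) -> bool:
    """True if url is on base's host and at or under base's path."""
    u, b = urlsplit(url), urlsplit(base)
    if u.netloc.lower() != b.netloc.lower():
        return False  # other host: external
    root = b.path.rstrip("/")  # e.g. "/docs/projectname"
    return u.path == root or u.path.startswith(root + "/")
```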

TODO

  • Implement
  • Introduce cli args
  • Tests
  • Documentation

Would you mind adding a `timeout` argument?

This program is great! But it runs very slowly when there are many dead links.
All the threads just get stuck for a very long time, and it takes hours to finish.
So I wonder if we could set a timeout argument (say, 10s); that would greatly improve the experience!
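A sketch of what such an argument could control, using only the standard library (deadlinks' own HTTP stack may differ; `check_url` is hypothetical). Note that `timeout` bounds each blocking socket operation, not the total request time, which is usually enough to stop a silent server from pinning a worker thread.

```python
import http.client
import urllib.error
import urllib.request
from typing import Optional

# Opener without proxy handling, so the sketch behaves the same everywhere.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def check_url(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status for url, or None on failure or timeout."""
    try:
        with _OPENER.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx is still an answer
    except (OSError, http.client.HTTPException):
        return None  # DNS failure, refused connection, or timeout
```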

Release v0.3.3

  • docs
  • release pipeline
  • changelog
  • PyPI package
  • brew tap
  • docker image

default `user_agent`

The app needs a default user_agent and a way to redefine it.

Pros:

  • robots.txt compliance - #8

Cons:

  • None
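One way to get a default with an override is to attach the header when each request is built. Both `DEFAULT_USER_AGENT` (its value is an invented placeholder) and `build_request` are hypothetical; the same string would also be the one matched against robots.txt rules.

```python
import urllib.request
from typing import Optional

# Hypothetical default; the real project may choose a different string.
DEFAULT_USER_AGENT = "deadlinks (+https://pypi.org/project/deadlinks)"


def build_request(url: str, user_agent: Optional[str] = None) -> urllib.request.Request:
    """Create a request that always carries a User-Agent, overridable per call."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent or DEFAULT_USER_AGENT})
```

`urllib.request.Request` stores header names capitalized, so the value is read back with `get_header("User-agent")`.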

Continuous Integration

Maybe it would be useful to add a CI option to run commands before running the crawler.

For example, deadlinks would then be able to run a web server to serve HTML from a directory.
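The "serve a directory" part needs nothing beyond the standard library. This is a sketch with a hypothetical `serve_directory` helper, not an existing deadlinks option: it serves a folder on a free localhost port in a background thread and returns the base URL to crawl.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer
from typing import Tuple


def serve_directory(root: str) -> Tuple[ThreadingHTTPServer, str]:
    """Serve `root` over HTTP in a background thread.

    Returns the server (call .shutdown() and .server_close() when done)
    and its base URL.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=root)
    server = ThreadingHTTPServer(("127.0.0.1", 0), handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address[:2]
    return server, f"http://{host}:{port}/"
```

The crawler would then run against the returned base URL and shut the server down afterwards.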

termination

The program should still return a report if it receives SIGINT.
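By default Python turns SIGINT into `KeyboardInterrupt` in the main thread, so one approach is to catch it around the checking loop and keep the partial results. The sketch below is sequential and uses a hypothetical `check` callable. With worker threads, only the main thread sees the interrupt, so it must not sit in an uninterruptible wait.

```python
from typing import Callable, Dict, Iterable, Tuple


def check_all(urls: Iterable[str], check: Callable[[str], int]) -> Tuple[Dict[str, int], bool]:
    """Check URLs in order; on Ctrl+C keep what was collected so far.

    Returns (results, interrupted); the caller prints the report either way.
    """
    results: Dict[str, int] = {}
    try:
        for url in urls:
            results[url] = check(url)
    except KeyboardInterrupt:  # raised on SIGINT by Python's default handler
        return results, True
    return results, False
```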

GitHub Action

Maybe it's worth the time to create a GitHub Action for deadlinks?

GitHub pipelines - docs & brew

  • The release pipeline doesn't react to new tags that come from GitHub.
  • Automate docs publishing from docs to @docs.
  • Automate the brew formula bump.

Pre-release testing

We need more end-to-end pre-release tests against the dev brew formula and the dev package.

Redirections

In addition to #30, it would be great to serve Netlify redirects.
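Serving Netlify redirects would start with reading the site's `_redirects` file. This sketch (hypothetical `parse_redirects`) handles only plain `from to [status]` lines, assuming Netlify's conventions that `#` starts a comment line, the status defaults to 301, a trailing `!` forces the rule, and the first matching rule wins. Splats (`*`), `:placeholders`, full-URL sources and query or country conditions are skipped.

```python
from typing import Dict, Tuple


def parse_redirects(text: str) -> Dict[str, Tuple[str, int]]:
    """Map source path -> (target, status) for simple `_redirects` rules."""
    rules: Dict[str, Tuple[str, int]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()
        if len(fields) < 2 or "*" in fields[0] or ":" in fields[0]:
            continue  # unsupported rule shape in this sketch
        status = 301
        if len(fields) >= 3 and fields[2].rstrip("!").isdigit():
            status = int(fields[2].rstrip("!"))
        rules.setdefault(fields[0], (fields[1], status))  # first match wins
    return rules
```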

pypi.org and package name

Goal

  • Obtain deadlinks package name.
  • Build a Travis CI deploy pipeline for the development branch
  • Build a Travis CI deploy pipeline for the master branch

Initial Update

The bot created this issue to inform you that pyup.io has been set up on this repo.
Once you have closed it, the bot will open pull requests for updates as soon as they are available.

robots.txt

We need to respect robots.txt.

  1. It's not expected to exist for local dev.
  2. It can prevent DoS'ing the website.

It's a complex question which domains we should respect (ideally, all):

  • (at least) documentation domain.
  • (at most) all others.

Pros:

  • It's the right thing to do.

Cons:

  • Slower indexation/crawling.
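The per-domain check itself is small. The project lists reppy as a dependency; this sketch swaps in the standard library's `urllib.robotparser` instead, and assumes the robots.txt text has already been fetched for one host (`robots_checker` is hypothetical). An empty or missing file allows everything, which fits local dev.

```python
from typing import Callable
from urllib.robotparser import RobotFileParser


def robots_checker(robots_txt: str, user_agent: str) -> Callable[[str], bool]:
    """Build a url -> allowed? predicate from one host's robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # also marks the rules as loaded
    return lambda url: parser.can_fetch(user_agent, url)
```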

Use Case: Localization (Originals Checks)

This is a draft idea for checking localizations. The idea came up after findings at kubernetes.io.

deadlinks kubernetes.io/ko -n10 -s ignored | \
        grep kubernetes.io/docs | \
        awk '{print $4}' | \
        sed 's/\.io\/docs/.io\/ko\/docs/g' | \
        xargs -I {} sh -c "deadlinks {} -single -short"

Tagline?

deadlinks deserves a better tagline.

Current variations:

  • (github/header) CLI app checker for dead links in the generated html documentation https://pypi.org/project/deadlinks
  • (github/readme) deadlinks is a simple cli tool to check your documentation/website for deadlinks.
  • (pypi) CLI/API for links liveness checking.
  • (brew) CLI/API for links liveness checking
