umbrelladocs / linkspector
Uncover broken links in your content.
License: Apache License 2.0
The example below uses docker #23
git clone https://github.com/UmbrellaDocs/linkspector
cd linkspector
docker build --build-arg LINKSPECTOR_PACKAGE= -t umbrelladocs/linkspector .
linkspector check on a test directory
mkdir -p test/nested
cp .gitignore test # due to https://github.com/UmbrellaDocs/linkspector/issues/24
echo '[test-relative](../test.md)' > test/nested/nested.md
echo '[nested-absolute](/nested/nested.md)' > test/test.md
echo '[nested-relative](./nested/nested.md)' >> test/test.md
docker run --rm -it -v $PWD/test:/app --name linkspector umbrelladocs/linkspector bash -c 'linkspector check'
Output
⠋ Configuration file not found. Using default configuration.
🚫 nested/nested.md, ../test.md, 404, 1, Cannot find: ../test.md.
❌ Error: Some links in the specified files are not valid.
It is expected that 'linkspector check' passes. It is a reproduction example for #22
Line 49 in 71b8ebd
typo: Connot -> Cannot
During the merge of #23, the following points were displaced. It may be good to move them to their original location.
Linkspector starts checking the hyperlinks in your files based on the configuration provided in the configuration file or using the default configuration. It then displays the results in your terminal.
After the check is complete, Linkspector provides a summary of the results. If any dead links are found, they are listed in the terminal, along with their status codes and error messages.
If no dead links are found, Linkspector displays a success message, indicating that all links are working.
Due to heavy dependencies (puppeteer + google chrome), linkspector will most likely be used via a pre-built docker image, to minimize the risk of component installation failures and to speed up installation. Pre-commit hooks will also probably be either repository-local hooks (calling docker run) or of the docker_image type.
Publish docker image as umbrelladocs/linkspector. See publishing-docker-images as a starting point. Note the artifact attestation step included in the above doc. For convenience, the latest explicit image tag of umbrelladocs/linkspector, e.g. v0.3.6, will most likely also be pushed as the latest docker image tag. This will allow people who prefer to use the latest version to refer to umbrelladocs/linkspector:latest from CI systems.
Another observation: currently the docker build uses a non-standard default (it requires --build-arg LINKSPECTOR_PACKAGE=) to build the image based on the local contents. This default could be changed in the Dockerfile, so that the local contents are used by default during the build
docker build --no-cache --pull -t umbrelladocs/linkspector .
while still supporting builds of the published npm package
docker build --build-arg LINKSPECTOR_VERSION=@umbrelladocs/linkspector@0.2.7 -t umbrelladocs/linkspector:0.2.7 .
This is also an opportunity to reduce the confusion between the potentially different 0.2.7 and v0.2.7 versions used by various services: npm, github, dockerhub.
Hello,
I am trying to update my CI/CD from using Markdown link check action to Linkspector.
I set my baseURL in the configuration file, but only the absolute links get correctly evaluated, while all the markdown links that use relative paths fail.
For example, if my root directory is structured like this:
-- docs
|-- dir1
| |-- file1.md
|
|-- dir2
|-- file2.md
in file1.md, a link formatted as:
../dir2/file2 will fail ❌
/dir2/file2 will succeed ✅
However, they are both valid links.
Am I missing something?
Is there a way to make links with a relative path not being falsely detected as failing?
Thank you!
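A minimal sketch of how the two link forms from the question above could be mapped to files, assuming relative links resolve against the directory of the containing file and root-absolute links resolve against the documentation root. The function name `resolve_link` and the `docs` root are hypothetical, and this is not taken from Linkspector's code.

```python
import posixpath


def resolve_link(source_file: str, link: str, root: str = "docs") -> str:
    """Return the repository path a Markdown link points to (hypothetical helper)."""
    if link.startswith("/"):
        # Root-absolute link: resolve against the documentation root.
        return posixpath.normpath(posixpath.join(root, link.lstrip("/")))
    # Relative link: resolve against the directory that contains the source file.
    base_dir = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base_dir, link))


# Both link styles from the example end up at the same target.
print(resolve_link("docs/dir1/file1.md", "../dir2/file2"))
print(resolve_link("docs/dir1/file1.md", "/dir2/file2"))
```

Under these assumptions both calls return the same path, which is the behavior the reporter expects from the checker.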
Describe the bug
The README says:
If you don't want to install using npm you can download the binary from GitHub releases.
But I can only see the binary in the first three releases and not in the more recent ones.
Describe the bug
First, thank you for making this.
As for the issue, while running linkspector check on a json file, I am getting the following error:
💥 Error: Cannot read properties of undefined (reading 'start')
To Reproduce
Running linkspector check with the following minimal json file causes it for me:
{
"images": [
"https://cdn.britannica.com/75/178475-050-E9212E3D/Pyramid-of-Khufu-Giza-Egypt.jpg"
]
}
The issue does not happen when I remove the array and just have "images": "..." directly.
Expected behavior
I would expect this to work, but a better error message to give a better idea of what the actual issue is, or at least a line number pointing to the offending link, would be nice too.
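For illustration only, here is a sketch of a JSON walk that treats arrays, objects, and plain strings uniformly when collecting URLs. The helper name `collect_urls` is hypothetical; the root cause of the error above is not confirmed, and this is not how Linkspector parses JSON.

```python
import json


def collect_urls(node):
    """Recursively yield every http(s) string found in parsed JSON (hypothetical helper)."""
    if isinstance(node, str):
        if node.startswith(("http://", "https://")):
            yield node
    elif isinstance(node, list):
        # Arrays are walked item by item, so nested links are not lost.
        for item in node:
            yield from collect_urls(item)
    elif isinstance(node, dict):
        for value in node.values():
            yield from collect_urls(value)


doc = json.loads(
    '{"images": ["https://cdn.britannica.com/75/178475-050-E9212E3D/Pyramid-of-Khufu-Giza-Egypt.jpg"]}'
)
print(list(collect_urls(doc)))
```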
Describe the bug
A clear and concise description of what the bug is.
To Reproduce
Steps to reproduce the behavior:
💥 Error: Cannot read properties of undefined (reading 'start')
Expected behavior
Get a list of any broken links in the repo
Additional context
$ node -v
v22.2.0
$ yarn -v # used to install
1.22.22
I moved all the markdown containing files/directories to a standalone directory to avoid any interactions with other files resulting in this many files:
$ ls
LICENSE.md README.md about-civicactions about-this-guidebook common-practices-tools company-policies employee-benefits practice-areas
$ find . -name '*.md' | wc -l
139
That resulted in the error above:
../node_modules/.bin/linkspector check
⠋ Configuration file not found. Using default configuration.
# Snip broken links
💥 Error: Cannot read properties of undefined (reading 'start')
I deleted most of the files and it started running fine. I then added them back and removed some directories until it started running:
rm -rf practice-areas/project-management/
rm -rf practice-areas/help-desk/
find . -name '*.md' | wc -l
112
After that the test completed successfully.
../node_modules/.bin/linkspector check
⠋ Configuration file not found. Using default configuration.
# Snip broken links
💥 Error: Some hyperlinks in the specified files are invalid.
Given this, my hypothesis is that it is probably something to do with the number of files. I couldn't identify a specific file that caused it to fail repeatedly, although it's possible that is a cause also. Let me know if I can help test anything or if you need more info :)
Finally - thanks for working on this project! We have been using markdown-link-check for a long time, but have had to keep increasing the number of exceptions due to sites adding CDNs. This looks like it should help a lot with this approach (as well as filtering and reporting checks!).
For reference see #1 which added support for AsciiDoc files.
According to pnpm/pnpm#6648 (comment) pnpm is maintained by
... a person who maintains an open source project in their free time
This library is exactly what I was looking for, thanks so far!
I'm having trouble using baseUrl in the config.
Docs: baseUrl
https://github.com/UmbrellaDocs/linkspector?tab=readme-ov-file#base-url
Config Validator: baseUrl
https://github.com/UmbrellaDocs/linkspector/blob/main/lib/validate-config.js#L25
Usage: baseURL
destructured off of config -
I think this mismatch is making it never be applied, right?
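The suspected failure mode can be sketched in a few lines: when one part of a program validates the key `baseUrl` and another part reads `baseURL`, the lookup silently yields nothing. The dictionary below is a stand-in for the parsed configuration, not Linkspector's actual data structure.

```python
# Stand-in for a parsed config that passed validation with the documented key.
config = {"baseUrl": "https://example.com"}

# Reading a differently cased key returns None instead of raising an error,
# so the setting is never applied and nothing reports the mismatch.
base_from_wrong_key = config.get("baseURL")
base_from_right_key = config.get("baseUrl")

print(base_from_wrong_key)
print(base_from_right_key)
```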
Create a Dockerfile to run Linkspector in a Docker image.
Currently, Linkspector processes each file separately. There could also be an option to collect the links from all files first and process them in batches, and then study whether that performs better.
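One possible shape for such an option, as a rough sketch: collect links from all files, deduplicate them while remembering which files use each link, and then check them in fixed-size batches. The names `unique_links` and `batches` are hypothetical, and whether this performs better would still need to be measured.

```python
from itertools import islice


def unique_links(links_by_file):
    """Map each distinct link to the list of files that reference it."""
    seen = {}
    for path, links in links_by_file.items():
        for link in links:
            seen.setdefault(link, []).append(path)
    return seen


def batches(items, size):
    """Yield consecutive chunks of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


links = unique_links({"a.md": ["x", "y"], "b.md": ["y", "z"]})
for batch in batches(list(links), 2):
    print(batch)
```

Deduplicating first means a link that appears in many files is checked once, and the file list per link lets the result be reported against every file that uses it.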
We have a relative link to a json file in our md, and linkspector reports it as invalid.
$ linkspector check
🚫 apps/github-cascading-app/README.md, ./schemas/config.schema.json, 404, 34, Cannot find: ./schemas/config.schema.json.
❌ Error: Some links in the specified files are not valid.
The link works well, you can find it here: https://github.com/AmadeusITGroup/otter/blob/main/apps/github-cascading-app/README.md
markdown-link-check finds it valid.
FILE: apps/github-cascading-app/README.md
[✓] https://github.com/apps/otter-cascading
[✓] ./schemas/config.schema.json
2 links checked.
Describe the bug
If a section heading contains inline code (e.g. inline code), then links to it are reported as not valid even though they do work.
To Reproduce
Steps to reproduce the behavior:
Create a section heading that contains inline code
Create a link to that section
Test that this link does in fact work in vscode or some other markdown viewer
Run linkspector against this markdown file
Observe that linkspector will report this link as broken
Expected behavior
Links to sections that have headings that contain code should be marked as valid when they are.
Additional context
Example actions run - This is an actions run of my repo. It reports that there are 4 broken links but they are all false failures.
First issue - This doc is the first issue being flagged in my action run. Contents 3b is the link that is being reported as broken, but if you click it, it works just fine. This (https://github.com/KStocky/ShaderTestFramework/blob/main/docs/TTL/TypeTraits/VoidT.md#2-detecting-if-an-expression-is-valid-using-ttldeclval) is the URL to that specific section which matches the link used in the document.
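A sketch of one way a checker could reduce a heading to its visible text before comparing it with a link's fragment: inline code delimiters and link syntax are stripped, and the text inside them is kept. The helper `heading_plain_text` and the sample headings are illustrative assumptions, not Linkspector's implementation.

```python
import re


def heading_plain_text(heading: str) -> str:
    """Strip Markdown heading markers, link syntax, and backticks (hypothetical helper)."""
    text = re.sub(r"^#+\s*", "", heading)                  # leading '#' markers
    text = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", text)   # [text](url) -> text
    text = re.sub(r"`([^`]*)`", r"\1", text)               # `code` -> code
    return text.strip()


print(heading_plain_text("## Detecting validity using `ttl::declval`"))
print(heading_plain_text("### 1. [Karakteralkotás](010_karakteralkotas.md)"))
```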
Perhaps it would be useful to allow linkspector to be run without a configuration file.
There are two ways to do this:
dirs: . to mimic the behavior of markdown-link-checker
We currently added the following line to our github action for a smooth transition from markdown-link-checker:
# Generate default configuration file if it doesn't exist.
if [ ! -f .linkspector.yml ] ; then printf "dirs:\n - ./\n" > .linkspector.yml ; fi
While this works, it looks a bit ugly.
FWIW, we cannot use markdown-link-checker anymore, as SPIE uses a new redirection scheme in which working urls like https://doi.org/10.1117/12.2559784 are marked as broken by markdown-link-checker. linkspector correctly identifies them as working. So this seemed like a good moment to switch to linkspector.
And thanks for your work!
Allow users to run Linkspector as a GitHub action
Describe the bug
Links to sections of the current page are reported as broken when they are valid.
To Reproduce
This is a link to a failing report of Linkspector that is run alongside markdown link checker.
It claims that the link 4c of my tutorial is broken when it is demonstrably not.
markdown link checker succeeds in validating that it is correct.
Expected behavior
Expected a valid link to a page section to report as valid
The example below uses docker #23
git clone https://github.com/UmbrellaDocs/linkspector
cd linkspector
docker build --build-arg LINKSPECTOR_PACKAGE= -t umbrelladocs/linkspector .
linkspector check on a test directory
mkdir -p test/nested
cp .gitignore test # due to https://github.com/UmbrellaDocs/linkspector/issues/24
echo 'footnote:[linkspector-mising[https://github.com/UmbrellaDocs/linkspector-missing]]' > test/nested/nested.asciidoc # 1
echo 'link:nested/nested-missing.asciidoc[nested-missing]' > test/test.asciidoc # 2
echo '[[test-missing.png]]' >> test/test.asciidoc
echo 'image::test-missing.png[test-missing]' >> test/test.asciidoc # 3
echo '<<test-missing-missing.png>>' >> test/test.asciidoc # 4
docker run --rm -it -v $PWD/test:/app --name linkspector umbrelladocs/linkspector bash -c 'linkspector check'
Output
⠋ Configuration file not found. Using default configuration.
✨ Success: All hyperlinks in the specified files are valid.
⠋ Configuration file not found. Using default configuration.
It is expected that 'linkspector check' fails for the above 4 types of missing links.
There are more types of links, but I think these are the most common ones.
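To make the four link types above concrete, here is a rough regular-expression sketch that extracts them from AsciiDoc text. The patterns are simplified assumptions that cover only the forms used in the reproduction; a real AsciiDoc parser handles many more cases, and this is not Linkspector's extraction logic.

```python
import re

# Simplified patterns for the four forms in the reproduction (assumptions).
PATTERNS = {
    "url": re.compile(r"https?://[^\s\[\]]+"),       # bare or bracketed URL
    "link": re.compile(r"\blink:([^\[\s]+)\["),      # link:target[text]
    "image": re.compile(r"\bimage::?([^\[\s]+)\["),  # image::target[alt]
    "xref": re.compile(r"<<([^,>]+)"),               # <<anchor>>
}


def extract_asciidoc_links(text):
    """Return (kind, target) pairs, grouped by kind in PATTERNS order."""
    found = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Patterns with a capture group expose the target as group 1.
            target = match.group(1) if pattern.groups else match.group(0)
            found.append((kind, target))
    return found


sample = "\n".join([
    "footnote:[linkspector-mising[https://github.com/UmbrellaDocs/linkspector-missing]]",
    "link:nested/nested-missing.asciidoc[nested-missing]",
    "[[test-missing.png]]",
    "image::test-missing.png[test-missing]",
    "<<test-missing-missing.png>>",
])
for kind, target in extract_asciidoc_links(sample):
    print(kind, target)
```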
Check if we can use Linkspector with puppeteer config support. For example, to use proxy support to connect puppeteer to a remote service.
The example below uses docker #23
git clone https://github.com/UmbrellaDocs/linkspector
cd linkspector
docker build --build-arg LINKSPECTOR_PACKAGE= -t umbrelladocs/linkspector .
linkspector check on a test directory
mkdir test
cp .gitignore test # due to https://github.com/UmbrellaDocs/linkspector/issues/24
echo '![test](test-missing.png "test")' > test/test.md
docker run --rm -it -v $PWD/test:/app --name linkspector umbrelladocs/linkspector bash -c 'linkspector check'
Output
⠋ Configuration file not found. Using default configuration.
✨ Success: All hyperlinks in the specified files are valid.
An error due to a missing image link is expected.
Hi, thanks for this very useful utility. I'm in the process of building it into the GitHub workflow for the Nephio open source project.
For some of the links I got a 304, which I think is still okay and should be considered alive. I've added a config file and added 304 to the aliveStatusCodes section, but those links are still treated as errors:
Hi,
I have a Github repo and I'd like to check its wiki page repo for broken links as well.
https://github.com/kaktusztea/km100/wiki
Created a workflow for it:
https://github.com/kaktusztea/km100/blob/master/.github/workflows/markdown_links_wiki.yml
But it fails and I don't know why.
https://github.com/kaktusztea/km100/actions/runs/9646569412
Run umbrelladocs/action-linkspector@v1
Run reviewdog/action-setup@v1
Run set -eu
🐶 Installing reviewdog ... https://github.com/reviewdog/reviewdog
Run set -eu
📖 reviewdog -h
Run $GITHUB_ACTION_PATH/script.sh
🔗💀 Installing linkspector ... https://github.com/UmbrellaDocs/linkspector
Running linkspector with reviewdog 🐶 ...
reviewdog: failed to unmarshal rdjson (DiagnosticResult): proto: unexpected EOF
Error: Process completed with exit code 1.
Additionally, on wiki pages you don't write links like "Something.md", only "Something", which points to the selected page.
Describe the bug
Checking locally works fine, but running in GitHub Actions leads to a 403.
To Reproduce
Example README.md:
Redditor [u/Boojum](https://old.reddit.com/user/Boojum) has crafted a nice
[surprise input](https://old.reddit.com/r/adventofcode/comments/18firip/2023_day_10_an_alternate_input_to_visualize/).
Relevant output from run in GitHub Action:
🚫 2023/day-10-python/README.md, https://old.reddit.com/user/Boojum , 403, 24, null
🚫 2023/day-10-python/README.md, https://old.reddit.com/r/adventofcode/comments/18firip/2023_day_10_an_alternate_input_to_visualize/ , 403, 25, null
I thought it might have to do with old.reddit.com, but this rewrite rule in .linkspector.yml did not help:
replacementPatterns:
- pattern: "https?://old.reddit.com"
replacement: 'https://www.reddit.com'
I'm not sure how I can get more details (e.g. headers) to see what is different locally and in GitHub Actions. I'll gladly provide more details if I can get some hints on how to get them.
Describe the bug
Consider this simple test.md file:
Here is a tag <a name="ref1"></a>
Here is a link to that tag: [link](#ref1)
linkspector check throws:
🚫 testmd.md, #ref1 , 404, 3, Cannot find section: #ref1 in file: testmd.md.
💥 Error: Some hyperlinks in the specified files are invalid.
Links like this work when published, not sure if this is expected behavior.
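A sketch of how explicit HTML anchors could be collected alongside heading anchors, using Python's standard `html.parser`. The class `AnchorCollector` and the function `html_anchors` are hypothetical names; treating `id` on any tag and `name` on `<a>` tags as valid targets is an assumption about the desired behavior.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collect values usable as fragment targets from inline HTML."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # `id` on any element, or the legacy `name` attribute on <a>.
            if value and (name == "id" or (tag == "a" and name == "name")):
                self.anchors.add(value)


def html_anchors(markdown_text: str) -> set:
    collector = AnchorCollector()
    collector.feed(markdown_text)
    collector.close()
    return collector.anchors


text = 'Here is a tag <a name="ref1"></a>\nHere is a link to that tag: [link](#ref1)'
print("ref1" in html_anchors(text))
```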
The example below uses docker #23
git clone https://github.com/UmbrellaDocs/linkspector
cd linkspector
docker build --build-arg LINKSPECTOR_PACKAGE= -t umbrelladocs/linkspector .
linkspector check on an empty directory
mkdir empty
echo '[linkspector](https://github.com/UmbrellaDocs/linkspector-missing)' > empty/test.md
docker run --rm -it -v $PWD/empty:/app --name linkspector umbrelladocs/linkspector bash -c 'linkspector check'
Output
⠋ Configuration file not found. Using default configuration.
ENOENT: no such file or directory, open '.gitignore'
✨ Success: All hyperlinks in the specified files are valid.
A failure would rather be expected
⠋ Configuration file not found. Using default configuration.
🚫 test.md, https://github.com/UmbrellaDocs/linkspector-missing, 404, 1, null
❌ Error: Some links in the specified files are not valid.
The example below uses docker #23
git clone https://github.com/UmbrellaDocs/linkspector
cd linkspector
docker build --build-arg LINKSPECTOR_PACKAGE= -t umbrelladocs/linkspector .
linkspector check on a test directory
mkdir test
cp .gitignore test # due to https://github.com/UmbrellaDocs/linkspector/issues/24
echo '[my-section](#my-section-missing)' > test/test.md
echo '## My section' >> test/test.md
docker run --rm -it -v $PWD/test:/app --name linkspector umbrelladocs/linkspector bash -c 'linkspector check'
Output
⠋ Configuration file not found. Using default configuration.
✨ Success: All hyperlinks in the specified files are valid.
An error due to a missing section link is expected.
Describe the bug
For example, if a section is named ## 📖 Documentation, the valid link to this section, #-documentation, is reported as Cannot find section: #-documentation in file.
To Reproduce
Steps to reproduce the behavior:
check [the README Documentation Section](#-documentation)
## 📖 Documentation
linkspector check
Expected behavior
No broken links
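The leading hyphen in `#-documentation` follows from how the anchor appears to be derived: the emoji is dropped but the space after it still becomes a hyphen. Below is an approximation of that behavior, inferred from the example above rather than from an official specification; the function name `github_style_slug` is hypothetical.

```python
import re


def github_style_slug(heading_text: str) -> str:
    """Approximate a GitHub-style heading anchor (assumed rules, not a spec)."""
    text = heading_text.strip().lower()
    # Drop everything except word characters, hyphens, and spaces.
    # Emoji and punctuation are removed, but the spaces around them stay.
    text = re.sub(r"[^\w\- ]", "", text)
    return text.replace(" ", "-")


print(github_style_slug("📖 Documentation"))  # -documentation
```

The approximation ignores details such as duplicate-heading suffixes and characters that GitHub keeps but Python's `\w` does not match.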
linkspector/.github/workflows/npm-publish.yml
Lines 19 to 20 in 5643ed7
Since the package-lock.json is missing, the installation outcome of the github action workflow may be non-repeatable.
Is your feature request related to a problem? Please describe.
For big repositories with 1000s of links, it can be a good idea to implement some sort of caching mechanism.
Describe the solution you'd like
Linkspector can keep a cache of checked hyperlinks and if they are alive or dead. That way we can skip checking the hyperlinks again.
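A minimal in-memory sketch of such a cache, assuming each result should expire after a time-to-live so that links are eventually rechecked. The class `LinkCache`, its parameters, and the one-hour default are illustrative choices; a real implementation would also need to persist entries between runs.

```python
import time


class LinkCache:
    """Remember whether a URL was alive, and forget the answer after `ttl_seconds`."""

    def __init__(self, ttl_seconds=3600, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without waiting
        self.entries = {}

    def get(self, url):
        """Return True/False for a fresh entry, or None if the URL must be checked."""
        entry = self.entries.get(url)
        if entry is None:
            return None
        alive, checked_at = entry
        if self.clock() - checked_at > self.ttl:
            del self.entries[url]  # stale result: force a recheck
            return None
        return alive

    def put(self, url, alive):
        self.entries[url] = (alive, self.clock())


cache = LinkCache(ttl_seconds=60)
cache.put("https://example.com", True)
print(cache.get("https://example.com"))
```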
This line
Line 38 in 52a3475
produces output like
🚫 README.md, https://doi.org/10.1117/12.2559784, null, 23, Navigation timeout of 30000 ms exceeded
which github will parse with the trailing comma as part of the url when the warning appears in a github action log, e.g. https://github.com/AstarVienna/DevOps/actions/runs/7968772750/job/21753566613
Arguably this is a github bug, and not a linkspector bug, because github does not include the comma in the URL in other places, like in this issue.
Nevertheless, linkspector will probably be running in github actions quite often, so it would be good if the links are clickable from there without having to manually remove the comma.
So perhaps there can be a space added after the url?
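The suggestion amounts to a one-character change in the output format. A sketch with a hypothetical `format_result` function, using the field order from the output above:

```python
def format_result(file, url, status, line, message):
    """Format one result line with a space between the URL and the next comma."""
    # The space keeps log viewers from treating the comma as part of the URL.
    return f"🚫 {file}, {url} , {status}, {line}, {message}"


print(format_result(
    "README.md",
    "https://doi.org/10.1117/12.2559784",
    None,
    23,
    "Navigation timeout of 30000 ms exceeded",
))
```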
Currently, Linkspector downloads the file if the link points to a PDF or another binary. It should instead only check whether the link is valid, without downloading the file.
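One common way to check a link without fetching its body is an HTTP HEAD request. The sketch below uses Python's standard `urllib` purely to illustrate the idea; the function names are hypothetical, Linkspector itself is built on puppeteer, and some servers reject or mishandle HEAD, so a fallback would be needed in practice.

```python
import urllib.request


def build_head_request(url: str) -> urllib.request.Request:
    """Create a request that asks for headers only, not the response body."""
    return urllib.request.Request(url, method="HEAD")


def is_reachable(url: str, timeout: float = 10.0) -> bool:
    """Return True if the server answers the HEAD request with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(build_head_request(url), timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError and HTTPError are subclasses of OSError.
        return False


request = build_head_request("https://example.com/manual.pdf")
print(request.get_method())
```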
Line 10 in 52a3475
I believe the expected version is to be taken from package.json
Describe the bug
I set up link checking in CI in my project. I have a broken link in a markdown file to a file on github that doesn't exist anymore. However, linkspector says everything is ok.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Linkspector shows an error in markdown file.
Hi!
I use - for now - linkspector locally by running linkspector check in my "md" directory to verify my whole documentation, which is written in markdown and uses many links between pages and sections (with anchor links):
https://github.com/kaktusztea/km100/tree/master/md
I got lots and lots of false errors.
Example: I have this markdown
https://github.com/kaktusztea/km100/blob/master/md/070_tavolsagi_harc.md
It has this link:
Lásd még: [Szándékos kitérés lövés elől](070_tavolsagi_harc.md#sz%C3%A1nd%C3%A9kos-kit%C3%A9r%C3%A9s-l%C3%B6v%C3%A9s-el%C5%91l) fejezetet.
... which lands here well:
https://github.com/kaktusztea/km100/blob/master/md/070_tavolsagi_harc.md#sz%C3%A1nd%C3%A9kos-kit%C3%A9r%C3%A9s-l%C3%B6v%C3%A9s-el%C5%91l
... but linkspector throws this error:
🚫 070_tavolsagi_harc.md, 070_tavolsagi_harc.md#sz%C3%A1nd%C3%A9kos-kit%C3%A9r%C3%A9s-l%C3%B6v%C3%A9s-el%C5%91l , 404, 7, Cannot find section: #sz%C3%A1nd%C3%A9kos-kit%C3%A9r%C3%A9s-l%C3%B6v%C3%A9s-el%C5%91l in file: /Users/kaktusz/repo/km100.code/md/070_tavolsagi_harc.md.
Actually I have hundreds of false positive errors with accented anchor links. I also use unicode characters in links, like 🔵.
best regards.
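A sketch of the comparison that would make percent-encoded fragments like the one above match: decode the fragment and normalize both sides before comparing it with the heading's anchor. The function `anchors_match` is a hypothetical helper, and the expected slug is written by hand from the heading in the report.

```python
import unicodedata
from urllib.parse import unquote


def anchors_match(link_fragment: str, heading_slug: str) -> bool:
    """Compare a possibly percent-encoded fragment with a heading anchor."""
    decoded = unquote(link_fragment.lstrip("#"))
    # Normalize so precomposed and decomposed accents compare equal.
    return unicodedata.normalize("NFC", decoded) == unicodedata.normalize("NFC", heading_slug)


fragment = "#sz%C3%A1nd%C3%A9kos-kit%C3%A9r%C3%A9s-l%C3%B6v%C3%A9s-el%C5%91l"
print(anchors_match(fragment, "szándékos-kitérés-lövés-elől"))
```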
Linkspector action fails right at the beginning:
https://github.com/kaktusztea/km100/actions/runs/9268174606/job/25496239289
Problematic page:
https://github.com/kaktusztea/km100/blob/master/md/start.md
Running linkspector with reviewdog 🐶 ...
Error in checking if file #0-bevezet%C5%91-jelz%C5%91k exist! TypeError: Cannot read properties of undefined (reading 'includes')
Error in checking if file #1-karakteralkot%C3%A1s exist! TypeError: Cannot read properties of undefined (reading 'includes')
Error in checking if file #2-h%C3%A1tterek exist! TypeError: Cannot read properties of undefined (reading 'includes')
Error in checking if file #3-k%C3%A9pzetts%C3%A9grendszer exist! TypeError: Cannot read properties of undefined (reading 'includes')
Error in checking if file #4-fort%C3%A9lyok exist! TypeError: Cannot read properties of undefined (reading 'includes')
Error in checking if file #5-trad%C3%ADci%C3%B3k exist! TypeError: Cannot read properties of undefined (reading 'includes')
Error in checking if file #6-harcrendszer-%EF%B8%8F exist! TypeError: Cannot read properties of undefined (reading 'includes')
Error in checking if file #7-t%C3%A1vols%C3%A1gi-harcrendszer- exist! TypeError: Cannot read properties of undefined (reading 'includes')
Error in checking if file #8-psz%C3%AD exist! TypeError: Cannot read properties of undefined (reading 'includes')
Error in checking if file #9-m%C3%A1giarendszer exist! TypeError: Cannot read properties of undefined (reading 'includes')
Error in checking if file #10-papi-m%C3%A1gia-10- exist! TypeError: Cannot read properties of undefined (reading 'includes')
Error in checking if file #11-var%C3%A1zst%C3%A1rgyak--10- exist! TypeError: Cannot read properties of undefined (reading 'includes')
Error in checking if file #12-gy%C3%B3gy%C3%ADt%C3%A1s-gy%C3%B3gyul%C3%A1s exist! TypeError: Cannot read properties of undefined (reading 'includes')
Error in checking if file #13-m%C3%A9regrendszer-m%C3%A9rgek exist! TypeError: Cannot read properties of undefined (reading 'includes')
Error in checking if file #14-%C3%A9rz%C3%A9kel%C3%A9s-%C3%A9szlel%C3%A9s-90 exist! TypeError: Cannot read properties of undefined (reading 'includes')
Error in checking if file #15-szitu%C3%A1ci%C3%B3k-20 exist! TypeError: Cannot read properties of undefined (reading 'includes')
Error in checking if file start.md#karakteralkot%C3%B3k exist! TypeError: Cannot read properties of undefined (reading 'includes')
reviewdog: failed to unmarshal rdjson (DiagnosticResult): proto: unexpected EOF
Affected section in start.md. These are all in-page anchor links.
Bevezető - Karakteralkotás
Hátterek - Képzettségrendszer - Fortélyok - Tradíciók
Harcrendszer - Távolsági Harcrendszer
Pszí - Mágiarendszer - Papi mágia - Varázstárgyak
Gyógyítás, gyógyulás - Méregrendszer, Mérgek - Érzékelés, Észlelés - Szituációk
What is unique is that the target sections also have a link in the section title.
Example:
### 1. [Karakteralkotás](010_karakteralkotas.md)