
linkcheckerhtml's Introduction

HTML / XML / RSS link checker

VSCode extension that checks for broken links in an HTML, XML, RSS, PHP, or Markdown file.

Functionality

Checks for broken links in anchor-href, link-href, img-src, and script-src tags in the currently open HTML or PHP file. HTTP/HTTPS links are checked by trying to access them on the internet; relative links (../folder/file.html) are checked by testing whether the file exists on the local file system.

Checks both clearnet and onion (Tor) links.

Also checks for badly-formatted mailto links, and duplicate local anchors (anchor-name, anchor-id).

Also checks for working HTTPS equivalents of HTTP links.

Also checks for broken links in the currently open XML, RSS, or Markdown file.

Optionally checks for invalid characters and common mistakes (missing tag content, empty attribute value, more).

Also checks for errors in a small subset of semantic HTML tags (in HTML and PHP files): checks that each page has header, main, footer; checks that each heading is inside a section, article, or aside; checks that each section/article/aside has exactly one heading in it; checks that heading values are nested properly.
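The checks described above depend on what kind of target a link has. The following is a minimal sketch of that decision, not the extension's actual code: the `LinkKind` type and the `classifyLink` function are hypothetical names introduced only for illustration.

```typescript
// Hypothetical categories; the extension's real internals may differ.
type LinkKind = "web" | "mailto" | "internal-anchor" | "other-scheme" | "local-file";

// Decide how a link target would be checked.
function classifyLink(target: string): LinkKind {
  const t = target.replace(/^\s+|\s+$/g, ""); // trim surrounding whitespace
  if (/^https?:\/\//i.test(t)) return "web";        // access over the internet
  if (/^mailto:/i.test(t)) return "mailto";         // format check only
  if (t.charAt(0) === "#") return "internal-anchor"; // look for id/name in current file
  // Any other "scheme:" prefix (ftp:, telnet:, tel:, ...) is not a file path.
  // Requiring two or more characters keeps "C:\..." from looking like a scheme.
  if (/^[a-z][a-z0-9+.-]+:/i.test(t)) return "other-scheme";
  return "local-file";                              // existence test on local disk
}
```

With this sketch, `classifyLink("../folder/file.html")` is classified as a local file and `classifyLink("ftp://example.com/x")` as a non-handled scheme.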

Use

Open an editor window on an HTML, XML, RSS, PHP, or Markdown file, and then press Alt+H.

Broken links are reported via the standard error/warning/information diagnostic icons in the lower-left of the UI.

Click on the diagnostic icons and numbers to open the diagnostics pane.

Click on a diagnostic line to see that link highlighted in the source file, then press Alt+T to open that URL in your browser.

If it's an HTTP link, press Alt+M to try to open the HTTPS equivalent of that URL in your browser.

Press Alt+L to clear all diagnostic messages generated by this extension.


Tip: After you do Alt+H and get diagnostics, work on the problems from bottom (last diagnostic) to top (first diagnostic). That way the line numbers in the diagnostics don't change as you delete or add lines in the source.

To see/change settings for this extension, open Settings (Ctrl+,) / Extensions / "HTML / XML / RSS link checker".

To change the key-combinations for this extension, open File / Preferences / Keyboard Shortcuts and search for Alt+H or Alt+T or Alt+M or Alt+L.

Onion (Tor) links

Onion URLs look like https://1234567890123456.onion/something (16 chars before '.onion') or https://12345678901234567890123456789012345678901234567890123456.onion/something (56 chars before '.onion'). They are usually used to access dark-web sites through Tor Browser.
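As a rough illustration of the two lengths described above, here is a small sketch that extracts the label just before ".onion" and tests its length. The function names `onionLabel` and `hasLegalOnionLength` are hypothetical, and the sketch only accepts hosts that actually end in ".onion", which may be stricter than what the extension does.

```typescript
// Return the host label immediately before ".onion", or null if the URL's
// host does not end in ".onion". Hypothetical helper for illustration only.
function onionLabel(url: string): string | null {
  const m = /^https?:\/\/(?:[^\/?#:]+\.)?([^.\/?#:]+)\.onion(?:[:\/?#]|$)/i.exec(url);
  return m ? (m[1] ?? null) : null;
}

// True when the label has one of the two lengths mentioned above (16 or 56).
function hasLegalOnionLength(url: string): boolean {
  const label = onionLabel(url);
  return label !== null && (label.length === 16 || label.length === 56);
}
```

Both example URLs from the paragraph above pass this test, while a URL such as https://example.onion/ (7 chars before '.onion') does not.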

Checking validity of Onion (Tor) links

To use Alt+H to check onion links, you must have a Tor/socks proxy listening on 127.0.0.1:9050. On Linux:

sudo systemctl status tor   # should show an active Tor service
# if it's not active, try:
sudo systemctl start tor

sudo ss -lptu | grep :9050  # should show an active Tor listener

For more information see https://github.com/talmobi/tor-request#requirements

If you don't have a Tor/socks proxy listening, each onion link will give an error "Can't check onion URLs: no Tor/socks service listening on 127.0.0.1:9050".

While links are being checked, it doesn't matter whether Tor Browser is running or not. Only the proxy is used.

Opening Onion (Tor) links in Tor Browser

[THIS FEATURE SEEMS TO BE BROKEN]

To use Alt+T or Alt+M to open onion links in the Tor Browser, you must have Tor Browser installed and running already. You have to launch it yourself; this extension won't launch it.

Also, on Linux, you must install "xdotool":

sudo apt install xdotool
xdotool --version

Then for any bad onion link reported in the diagnostics, press Alt+T on it. If it's an "http://" onion link (illegal, I think), you can also press Alt+M on it. You should see focus switch to Tor Browser, and the URL will be typed into the address bar and then accessed.

The connection to Tor Browser is not 100% reliable. The extension uses xdotool to send key-presses to Tor Browser, and this is fairly timing-dependent and one-way. If your system is busy, or Tor Browser is busy, or something else goes wrong, you may see the wrong things happen in Tor Browser (chars missing from the URL, or some dialogs popping open).

Semantic HTML

The body of the HTML page is expected to be structured like:

<body>
<header>STUFF</header>
<main>

<section>
<h1>HEADING</h1>
CONTENT

<section>
<h2>HEADING</h2>
MORE CONTENT
</section>

...

</section>

</main>
<footer>STUFF</footer>
</body>

This structure should improve the SEO and accessibility of your web pages.

If your pages are not structured like this, or you just don't want to bother checking Semantic HTML, change the setting "reportSemanticErrors" to "Don't report".

If a heading outside of any section and outside of main is found, it is assumed that your page is not using Semantic HTML at all, and no further checking of Semantic HTML is done.

Settings

  • addExtensionToLocalURLsWithNone: If a local file URL has no extension, add this extension to the filename before checking (default is ""; don't include "." in the setting).

  • checkInternalLinks: Check #name links to targets inside current file (default is true).

  • checkMailtoDestFormat: Check format of email addresses in mailto links (default is true).

  • dontCheckURLsThatStartWith: Don't check URLs that start with any sequence in this comma-separated list (default is "127.,192.,localhost,[::1],[FC00:,[FD00:").

  • localRoot: String prepended to links that start with "/" (default is ".").

  • maxParallelThreads: Maximum number of links to check in parallel (range is 1 to 20; default is 20).

  • processIdAttributeInAnyTag: #name link can be to any tag with ID attribute inside current file (default is true).

  • reportBadChars: Report possible bad characters ? (default is [check and report] "as Information")

  • patternBadChars: RegEx pattern to match possible bad characters (default is "[^\\\\x09-\\\\x7E]"; if you use lots of non-English characters, maybe use "[\\x7F-\\x9F]" instead; thought "[\\x00-\\x08,\\x0E-\\x1F,\\x7F-\\xFF]" would be good but it fails).

  • reportHTTPSAvailable: Report if HTTP links have HTTPS equivalents that work ? (default is [check and report] "as Information")

  • reportNonHandledSchemes: Report links with URI schemes not checked by the checker, such as FTP and Telnet (default is "as Information").

  • reportPossibleMistakes: Report possible mistakes such as empty tags or attributes ? (default is [check and report] "as Warning")

  • patternsPossibleMistakes: Comma-separated list of RegEx patterns to match possible mistakes (default is " href=\"\", src=\"\", hreef=\",\"></a>,<h1></h1>,<h2></h2>,<h3></h3>,<h4></h4>,<b></b>,<i></i>,<u></u>").

  • reportRedirect: Report links that get redirected (default is "as Warning").

  • reportSemanticErrors: Report errors in semantic HTML tags such as main, section, article, aside, h1, etc (default is "as Information").

  • timeout: Timeout (seconds) for accessing a link (range is 5 to 30; default is 15).

  • torOpenURLCmd1: Command (1) to open a URL in Tor Browser ('URL' will be appended; default is "xdotool search --onlyvisible --name 'Tor Browser' windowactivate --sync key --clearmodifiers --window 0 ctrl+t type --delay 100 ").

  • torOpenURLCmd2: Command (2) to open a URL in Tor Browser (default is "xdotool search --onlyvisible --name 'Tor Browser' windowactivate --sync key --clearmodifiers --window 0 Return").

  • userAgent: User-Agent value used in GET requests (default is "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0").
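To make the localRoot and addExtensionToLocalURLsWithNone settings more concrete, here is a minimal sketch of how a local link might be turned into a path before the existence test. The function `resolveLocalLink` and its parameters are hypothetical, and the exact rules (for example, stripping "#fragment" as well as "?args") are assumptions rather than the extension's confirmed behavior.

```typescript
// Hypothetical helper: build the local path that would be tested for existence.
// fileDir is the directory of the file being checked (an assumed input).
function resolveLocalLink(
  link: string,
  fileDir: string,
  localRoot: string,     // setting: prepended to links that start with "/"
  addExtension: string   // setting: extension without ".", "" means none
): string {
  // Drop any "?args" or "#fragment" suffix (fragment handling is an assumption).
  const pathOnly = link.split(/[?#]/)[0] ?? "";
  // Links starting with "/" get localRoot prepended; others are relative to fileDir.
  let resolved = pathOnly.charAt(0) === "/" ? localRoot + pathOnly : fileDir + "/" + pathOnly;
  // If the last path segment has no ".", append the configured extension.
  const lastSegment = resolved.slice(resolved.lastIndexOf("/") + 1);
  if (addExtension !== "" && lastSegment !== "" && lastSegment.indexOf(".") === -1) {
    resolved += "." + addExtension;
  }
  return resolved;
}
```

Under these assumptions, a link "/podiatrist" with localRoot "." and addExtensionToLocalURLsWithNone "html" would be tested as "./podiatrist.html".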

Limitations

  • HTML and PHP: Tag name and href/src/id attribute must be on the same line.

  • XML and RSS: Entire tag (for link, guid, and url tags) must be on the same line.

  • Doesn't know about comments; will find and check tags inside comments.

  • Checks "#name" links to targets in current file, but not in other local or remote files.

  • Doesn't check EVERY detail of the email address spec in mailto links. Just a cursory check.

  • XML: There are no standard tag and attribute names, so some links may not be checked.
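The cursory mailto check mentioned in the list above could look something like the following sketch. It is not the extension's real code: `looksLikeValidMailto` is a hypothetical name, and the rules (exactly one "@", at least one "." in the domain, nothing after "?" examined) are a simplified reading of this README.

```typescript
// Hypothetical cursory check of a mailto link; deliberately far from the full spec.
function looksLikeValidMailto(href: string): boolean {
  if (!/^mailto:/i.test(href)) return false;
  // Ignore everything from "?" onward (subject, body, and so on).
  const address = href.slice("mailto:".length).split("?")[0] ?? "";
  if (/\s/.test(address)) return false;
  // Require exactly one "@" with something before it.
  const at = address.indexOf("@");
  if (at <= 0 || at !== address.lastIndexOf("@")) return false;
  // Require at least one "." in the domain, with text on both sides of each dot.
  const domain = address.slice(at + 1);
  return /^[^.]+(\.[^.]+)+$/.test(domain);
}
```

For example, this sketch accepts "mailto:someone@example.com?subject=xyz" and rejects "mailto:someone@localhost".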

Note that checking for broken links is more of an art than a science. Some sites don't actually return 404, but send you to a landing page. For example, Azure.com works this way. You can go to https://Azure.com/foo/bar and it will happily redirect you to a sub-page of https://azure.microsoft.com/, with no 404 status returned. So take a status of "OK" with a grain of salt - you may not be arriving at the page you intend.

Also, browsers seem to be more tolerant than the library used by this extension. This extension will report a lot of certificate-errors and such that browsers mostly ignore.

And checking is getting harder, with more URLs redirecting through GDPR-consent or cookie pages and such, or redirecting to same URL with a tracking parameter added, causing false positives.
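One way to reduce the tracking-parameter false positives described above is to compare the original and redirected URLs after dropping known tracking parameters. The sketch below illustrates the idea only: the parameter list and the names `stripTracking` and `isTrivialRedirect` are hypothetical, and the sketch also ignores "#fragment" parts, which is an assumption.

```typescript
// Hypothetical list of query parameters treated as tracking-only.
const TRACKING_PARAMS = ["utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"];

// Remove the fragment and any tracking parameters from a URL string.
function stripTracking(url: string): string {
  const hashIndex = url.indexOf("#");
  const withoutHash = hashIndex === -1 ? url : url.slice(0, hashIndex);
  const qIndex = withoutHash.indexOf("?");
  if (qIndex === -1) return withoutHash;
  const base = withoutHash.slice(0, qIndex);
  const kept = withoutHash.slice(qIndex + 1).split("&").filter((pair) => {
    const name = (pair.split("=")[0] ?? "").toLowerCase();
    return pair !== "" && TRACKING_PARAMS.indexOf(name) === -1;
  });
  return kept.length > 0 ? base + "?" + kept.join("&") : base;
}

// True when a redirect only added or removed tracking parameters.
function isTrivialRedirect(from: string, to: string): boolean {
  return stripTracking(from) === stripTracking(to);
}
```

A redirect from https://example.com/a to https://example.com/a?utm_source=x would count as trivial here, while a redirect to https://example.com/b would not.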

Quirks

  • If there are multiple identical tags with identical link-targets on the same line (for example two Anchor tags with identical href targets), clicking on the diagnostic for any of them takes you to the first one in the source line.

  • Doesn't check ANY of the email address format after "?", as in "mailto:[email protected]?subject=xyz".

  • "://" is prepended to items in dontCheckURLsThatStartWith before matching; e.g. if you specify "localhost" the code searches for "://localhost" in URLs.

  • The checking in XML and RSS files is permissive, allowing known stuff from RSS, and likely stuff that could be in XML. Any attribute of the form *url="something" or *href="something" will be checked, as well as the standard RSS tags: link, guid, url.

  • Onion: a URL is considered "onion" if it starts with "https://" and contains ".onion" ANYWHERE in it.

  • Onion: if a URL starts with "http://", it will be treated as non-Onion, and the "https://" form of it will be checked as non-Onion too.

  • Semantic HTML: assumes that a section/article/aside will have a heading in it before it has any sub-section/article/aside.

  • Semantic HTML: a section/article/aside without a heading will be flagged (correctly) but may screw up the tracking of headings from that point on. Fix the first such error and then scan again.

  • PHP: links are found if they look like links in HTML. For example, the PHP code is expected to look something like:

    echo '<a href="https://example.com">test1</a><br />';

    and not like:

    echo '<a href="https://' . $theDomainName . '">test1</a><br />';

    But the following would be okay:

    echo '<a href="https://example.com">' . $theTextOfTheLink . '</a><br />';

  • Markdown: in reference-style links, the URL will be checked, but there will be no check that both halves of a reference-style link exist. A reference-style link looks like:

    [hobbit-hole] [1]
    
    [1]: <https://example.com/hobbithole.html> "Hobbit hole"
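The "://" quirk for dontCheckURLsThatStartWith listed above can be sketched as follows. This is an illustration under assumptions, not the extension's code: `isSkipped` is a hypothetical name, and the case-insensitive comparison is a guess.

```typescript
// Hypothetical sketch: decide whether a URL is skipped, given the
// comma-separated dontCheckURLsThatStartWith setting value.
function isSkipped(url: string, dontCheckList: string): boolean {
  const items = dontCheckList
    .split(",")
    .map((s) => s.replace(/^\s+|\s+$/g, ""))
    .filter((s) => s !== "");
  const lower = url.toLowerCase();
  // "://" is prepended to each item, and the result is searched for anywhere
  // in the URL (case-insensitivity here is an assumption).
  return items.some((item) => lower.indexOf("://" + item.toLowerCase()) !== -1);
}
```

With the default setting value, http://localhost:8080/x is skipped and https://example.com/ is not. Because the search is not anchored to the start of the URL, a URL such as https://example.com/?next=http://localhost would also be skipped in this sketch.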
    

Install

From the Marketplace

Open Visual Studio Code and press F1; a field will appear at the top of the window. Type ext install linkcheckerhtml, press Enter, and reload the window to enable the extension.

From VSIX file

Either:

  • In CLI, do
code --install-extension linkcheckerhtml-n.n.n.vsix

or

  • In VSCode GUI, in the Extensions view "..." drop-down, select the "Install from VSIX" command.

From source code

  • Do a git clone to copy the source code to "linkcheckerhtml" in your home directory.
  • In CLI, cd linkcheckerhtml and then ./CopyToHomeToRunInNormal.sh

Releases

0.2.0

  • Copied from "Microsoft / linkcheckermd" and then greatly modified.
  • Extension works, but probably has memory leaks, not much testing.

0.3.0

0.4.0

  • Finally nailed that hang bug.
  • Added setting for timeout.
  • Fixed timeout and redirect settings.

0.5.0

  • Added Alt+T to open an URL in a browser.
  • First release with a VSIX file.

0.6.0

  • Got rid of: "href" or "src" has to be first attribute in the tag.
  • Require at least one "." in mailto address's domain.
  • Try to dispose memory properly to avoid leaks.
  • Handle local files with "?args" on the end.

0.7.0

  • Added localRoot setting.
  • Fixed mailto that ends with "?".
  • Added userAgent setting, and it definitely makes some sites happier.

1.0.0

  • Increased default timeout to 12.
  • Check local anchors (#name) in current file.
  • Support anchor-id (HTML5) as well as anchor-name.

1.1.0

  • Added settings about checking local anchors (#name) and ID attributes in current file.

1.2.0

  • Moved repeated add-diagnostic code into a function.

1.3.0

  • Added setting and code to check if HTTPS equivalent exists for HTTP address.
  • Added Alt+M to open current HTTP URL as an HTTPS URL in browser.

1.4.0

  • Briefly tested IPv6 addresses to see that at least they don't cause anything to blow up.
  • Set default user-agent string to latest Firefox.
  • Added dontCheckURLsThatStartWith setting and code.
  • Increased default timeout to 15.

1.5.0

  • Added "Using the extension" image.
  • Better message when 0 files left to do.
  • Added addExtensionToLocalURLsWithNone setting and code.

1.6.0

  • Added Alt+L to clear all diagnostics belonging to this extension.
  • Changed my email address.

1.7.0

  • Fixed README.

2.0.0

  • Added support for XML and RSS files.

2.1.0

  • Changed to Axios 0.19.0.
  • On redirected link, give new URL.

2.2.0

  • Updated package dependencies because of security warnings.
  • Don't report link that redirects to same link (but rare, usually something is different).
  • Don't report link that redirects to same link with a tracking parameter added (but rare, usually something else is different too).
  • Fix status when file contains zero links.

3.0.0

  • Added support for checking onion links. Simple pass or fail, consider redirect as pass, no way to control timeout or user-agent.
  • Onto new versions of VSCode and npm and node.
  • Updated default user-agent string to Firefox 76.

3.1.0

  • Made Alt+T or Alt+M on onion link open it in Tor Browser, using xdotool.

3.2.0

  • Flag onion links where domain name is illegal length.
  • Moved xdotool command line strings into settings. (Wayland will use ydotool ?)
  • Treat onion links that start with "http:" as clearnet links.

3.3.0

  • Somehow using xdotool to send onion links to Tor Browser has stopped working.

4.0.0

  • Added checking for bad characters and possible mistakes.
  • Onto new versions of VSCode and npm and node.
  • Updated default user-agent string to Firefox 83.
  • Various code cleanup.
  • Made regexes case-insensitive.

5.0.0

  • Added checking for semantic HTML errors.
  • Updated default user-agent string to Firefox 84.

5.1.0

  • Added support of PHP files as if they were HTML files.
  • Updated default user-agent string to Firefox 86.

6.0.0

  • Added support of Markdown files.

6.1.0

  • Added support for local Markdown links to heading IDs as in [link1](#heading1).
  • Added support for Markdown headings automatically becoming local IDs, with spaces converted to dashes.
  • Updated default user-agent string to Firefox 89.

6.2.0

  • Tweaked support for Markdown headings automatically becoming local IDs: take a heading, remove any leading spaces, change it to lowercase, remove everything not letter digit hyphen space, then change spaces to hyphens.
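The heading-to-ID steps described for 6.2.0 can be sketched as a short chain of string operations. The function name `headingToId` is hypothetical, the input is assumed to be the heading text without its leading "#" markers, and treating "letter" as ASCII a-z only is an assumption.

```typescript
// Hypothetical sketch of the steps listed above: remove leading spaces,
// lowercase, drop everything except letters, digits, hyphens, and spaces,
// then turn spaces into hyphens.
function headingToId(headingText: string): string {
  return headingText
    .replace(/^\s+/, "")
    .toLowerCase()
    .replace(/[^a-z0-9\- ]/g, "")
    .replace(/ /g, "-");
}
```

Under these assumptions, the heading text "header with space" becomes "header-with-space" and "GUI" becomes "gui".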

6.3.0

  • In Markdown, added requirement for [identifier at start of link.
  • Updated npm and modules.

Development

To-Do list

  • In Markdown, prevent collisions when generating implicit header IDs ? "# H" twice should generate IDs "h" and "h-1" ?
  • Add tasks to open and close all HTML files in directory, so linter reports any errors.
  • Maybe new axios has broken timeout ?
  • Somehow using xdotool to open onion link in Tor Browser has gotten broken.
  • Test onion links a lot more, maybe indicate redirects, any way to control timeout, set user-agent.
  • Better way to open onion link in Tor Browser ?
  • Way to open onion link in Tor Browser on Windows ?
  • Add setting "do/don't check onion links".
  • Snap version of VSCode uses Alt+H for Help menu.
  • Create automated tests.
  • Extension really is supposed to remove each diagnostic line after the corresponding source line is edited.
  • Bundle extension to make it smaller/faster ? https://code.visualstudio.com/api/working-with-extensions/bundling-extension https://webpack.js.org/guides/getting-started/
  • Can't really test IPv6 because my system and ISP have it turned off.
  • Allow single-quotes on attributes ? I thought HTMLHint didn't allow them, so I didn't support them.
  • Don't check a link if it has rel="nofollow" ? Probably should leave it as-is: check it.
  • Any way to do retries inside axios ? Apparently not.
  • Memory leaks ? Doesn't seem to be any tool to check an extension for leaking. Maybe not possible, since extensions are running inside a huge framework of Electron or Node or something.
  • Display a "busy" cursor ? Can't. Window.withProgress could put up a dialog, but then user would have to close the dialog manually every time, don't want that. Doesn't seem to be a way to close that dialog programmatically.
  • Click on diagnostic, do Alt+T or Alt+M to browser, come back to VSCode, cursor is in filter field of diagnostics pane instead of in source file. More convenient if in source file. But seems to be no way to do it.
  • Multi-line tag (tag name and href/src attribute on different lines) silently ignored. Would be a lot of work to deal with, given the simple way the code does parsing.

Development Environment

I'm no expert on this stuff, maybe I'm doing some things stupidly.

Now using:

  • Fedora 34 KDE with X.
  • VSCode deb 1.58.2 (which says Node.js: 14.16.0)
  • node --version # v14.17.0
  • npm --version # 6.14.13
  • axios
  • path
  • fs
  • tor-request

I did:

  • sudo apt install npm

  • sudo npm install -g vsce

In project directory:

  • npm install

  • npm audit

  • npm audit fix

  • sudo npm -g install --save axios

GitHub repo for this extension

Visual Studio Marketplace page for this extension

My web site


Privacy Policy

This extension doesn't collect, store or transmit your identity or personal information in any way. All it does is read the current editor window, do existence-tests on local files, open links to internet sites, and send internet links to your browser.

linkcheckerhtml's People

Contributors

billdietrich avatar


linkcheckerhtml's Issues

command 'extension.generateLinkReport' not found

Hi there
I've installed code just to run your extension.
I can see the extension preferences and can change the shortcut

however pressing it I get "command 'extension.generateLinkReport' not found"

thanks

Feature Request: Add extensions to search

Love this extension. I would like the ability to search for extensions. For example, my files are .html, but my links are extensionless and it's not finding them.

I'd be happier if I'm using the extension wrong though, please let me know. :)

Adding domain to local root settings field adds domain after local file system address

I'm assuming the local root setting is to allow me to check relative URLs in a file by defining the domain of those URLs separately. Instead, it's taking that domain and adding it after the local root and before the href {{c:\users\me...}}{{domain}}{{href}}.

I added this to the Local Root settings
https://www.domain.com.au

This is my link
<a href="/podiatrist">podiatrist</a>

The error comes through as
Local file 'c:\Users\userName\Documents\Content Documents\https:\www.healthdirect.gov.au\podiatrist' does not exist.

tel: links marked as problematic

For privacy, I changed my phone number and repository name/path.

Full error code is below.

[{
	"resource": "/Users/path/Documents/GitHub/path/contact/index.html",
	"owner": "linkcheckerhtml",
	"severity": 8,
	"message": "Local file '/Users/path/Documents/GitHub/path/contact/tel:2124441100'  does not exist.",
	"startLineNumber": 56,
	"startColumn": 18,
	"endLineNumber": 56,
	"endColumn": 32
}]

case mismatch not detected in file/folder names (in URLs)

apologies if i'm offbase, but i 'think' when you check URLs in html, the extension is not detecting case mismatches.
For example, if the html specifies:
<a href="Aquarium.jpg">
and the actual file is:
AQUARIUM.jpg or aquarium.jpg or Aquarium.JPG etc
no 'broken link' is highlighted.
(same applies for folder names within the URL)

granted... if the page is hosted on IIS, no problem. But... if it's hosted on a UNIX based web server (eg. GitHub pages backend), then invoking that href will generate a 404 error.

Would be (P3) 'Nice to have' the ability to check a repo full of pages for this sort of breakage.

as additional background see this VSCode related thread/comment.

Support local markdown links

I tried to use your extension for markdown files, to check local markdown links. I am not sure if this is supported to work, so probably its a feature request.

If I have this markdown file:

# GUI

[Test link](#GUI)

The link works perfectly, but the extension complains

"Id or name 'GUI' not found in current file."

It would be great if you could make this work (also for more complicated links, where spaces in the header have to be substituted with - in the link, like this:

# header with space

[Test link](#header-with-space)

All links marked as Error in markdown with Hugo > Docsy

  • I am using markdown with Hugo > Docsy theme
  • I installed the extension on vscode.
  • In Hugo > Docsy, all links work without any .md extension, for example:
    [Details](../app_integration_intro/google_cloud)
  • Such links are reported as in error by the linkcheckerhtml extension
  • If I add .md at the end of the link, the link is no longer reported in error. But the link no longer works in localhost:1313

Maybe it would be possible to provide some option to add .md to the end of all local links without file extension (or some other string, to be chosen by the user?)

relative links starting with ~/ are shown as broken

Hi Bill, thanks for the absolutely brill extension. Loving it.

I use VS2019 for the dev and the file is in index.cshtml in my dev folder.

I use VS Code for awesome extensions like this one for checking broken links etc. But the links that are perfectly valid in Visual Studio for asp.net are shown as broken in vs code using this extension.

so when I hover over the "~/assets/vendor/bootstrap-select/css/bootstrap-select.min.css"
it doesn't go to the application root dir to check for it and marks it as broken.

obv, I could change ~/ links to just start with / but that would not be advisable in case there are more than one applications in the same root dir.

could you please advise.

Other file extensions

Can this be made to work with php files? Current notification: "command 'extension.generateLinkReport' not found"
