tcort / markdown-link-extractor


extracts links from markdown texts

License: ISC License

JavaScript 100.00%

hyperlinks extract-links

markdown-link-extractor's Introduction

markdown-link-extractor

Extracts links from markdown texts.

Installation

$ npm install --save markdown-link-extractor

API

markdownLinkExtractor(markdown)

Parameters:

markdown: text in markdown format.

Returns:

an array containing the URLs from the links found.

Examples

const { readFileSync } = require('fs');
const markdownLinkExtractor = require('markdown-link-extractor');

const markdown = readFileSync('README.md', {encoding: 'utf8'});

const links = markdownLinkExtractor(markdown);
links.forEach(link => console.log(link));

Upgrading to v4.0.0

anchor link extraction no longer supported

Code that looked like this:

const { links } = markdownLinkExtractor(str);

Should change to this:

const links = markdownLinkExtractor(str);
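
For code that has to run against both pre-v4 and v4 releases during a migration, a small adapter can accept either return shape. This is only a sketch: normalizeLinks is a hypothetical helper, not part of this package, and it assumes the older releases return an object with a links property, as the destructuring above implies.

```javascript
// Hypothetical helper, not part of markdown-link-extractor.
// Accepts either the v4 return value (an array of URLs) or the
// older return value (assumed to be an object with a `links` array).
function normalizeLinks(result) {
    return Array.isArray(result) ? result : result.links;
}
```

It would be called as normalizeLinks(markdownLinkExtractor(str)) and can be dropped once every caller is on v4.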

Upgrading to v3.0.0

extended mode no longer supported
embedded image size parameters in ![]() no longer supported

Testing

npm test

License

See LICENSE.md

markdown-link-extractor's People


nschonni angiecortez ticze kalugn tatianacastrolizama smellems blgm th3n00bc0d3r ceciliageraldo doc22940 gaurav-nelson nicolasmassart dianacato leidytapias autoantwort oshliaer aepfli tashian

markdown-link-extractor's Issues

Does not extract URLs from HTML

HTML hyperlinks can be part of markdown.

It would be nice to handle cases such as:

<a href='http://foo.bar'>foo</a>

Expected:

[
  "http://foo.bar"
]
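
One possible way to get this behaviour outside the package is a separate pass over the text that collects href values from inline anchor tags. The sketch below is an assumption about how that could look, not how markdown-link-extractor works: extractHtmlHrefs is a hypothetical helper, and it uses a regular expression rather than a real HTML parser, so unquoted attributes and tags split across lines are missed.

```javascript
// Hypothetical helper, not part of markdown-link-extractor's API.
// Collects href values from inline <a> tags using a regular expression.
function extractHtmlHrefs(markdown) {
    // Matches <a ... href="..."> or <a ... href='...'>.
    // The optional group requires whitespace right before `href`,
    // so attributes such as data-href are not mistaken for it.
    const anchorPattern = /<a\s(?:[^>]*?\s)?href\s*=\s*(["'])(.*?)\1/gi;
    const hrefs = [];
    for (const match of markdown.matchAll(anchorPattern)) {
        hrefs.push(match[2]); // group 2 is the URL between the quotes
    }
    return hrefs;
}
```

The result could be concatenated with the array returned by markdownLinkExtractor(markdown) to cover both markdown and HTML links.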

RangeError: Maximum call stack size exceeded - for content with emojis

Marked version:
"marked": "^2.0.5"

Used as a dependency by this package: https://github.com/tcort/markdown-link-extractor

Describe the bug
Getting an error for content with emojis: RangeError: Maximum call stack size exceeded

To Reproduce
Steps to reproduce the behavior:

We use the markdown-link-extractor package to extract links from the following sample content, and marked throws the RangeError:

Hi, Patrick! 👋
Did you hear that group activities like posting a discussion in a group can earn you 10x credits than the regular offer? Learn more about the special offers to look forward to this 10.10 Promo, from October 10 to 13 only!

Check the topic: **[📣 Promo Catalog & Many Ways to Earn Bigger Credits this 10.10 Promo!](https://1pt.ee/c/ea2c93)**

Stack trace

RangeError: Maximum call stack size exceeded
Please report this to https://github.com/markedjs/marked.
    at String.match (<anonymous>)
    at Tokenizer.link (/home/bountee/bundle/programs/server/npm/node_modules/markdown-link-extractor/index.js:14:31)
    at Tokenizer.tokenizer.<computed> (/home/bountee/bundle/programs/server/npm/node_modules/marked/src/marked.js:165:45)
    at Tokenizer.tokenizer.<computed> (/home/bountee/bundle/programs/server/npm/node_modules/marked/src/marked.js:167:31)
    at Tokenizer.tokenizer.<computed>

Expected behavior
No error

Originally posted here:
markedjs/marked#2220

Why remove anchor link extraction?

It causes a bug in markdown-link-check.

Marked.js doesn't parse links in front matter headers correctly

Description of the issue

As indicated in tcort/markdown-link-check#128, the parsing of links in front matter YAML is buggy: it returns the characters that follow the end of the link as well, so the result includes quotes (quotes are valid in YAML for delimiting string values).
This seems to be a choice on the Marked.js side not to support this: markedjs/marked#485

Solving leads

We first need to check if latest Marked.js behaves in a better way.

Then there are two options:

exclude the front matter header parsing from Marked.js parsing and parse it separately for links
switch to a parser that handles front matter and would provide the correct result

The first option is clearly the easiest in my opinion, as we don't know how switching to a new parser would affect existing user projects.
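
A rough sketch of the first option, under stated assumptions: the front matter is a block delimited by --- lines at the very top of the file, and only http(s) URLs matter. Both splitFrontMatter and extractFrontMatterUrls are hypothetical names, not existing functions of this package or of Marked.js.

```javascript
// Hypothetical helpers sketching the first option; neither is part of
// markdown-link-extractor or Marked.js.

// Splits a document into its YAML front matter (if any) and the markdown body.
function splitFrontMatter(text) {
    // Assumes front matter starts on the first line with '---'
    // and ends at the next line consisting of '---'.
    const match = /^---\r?\n([\s\S]*?)\r?\n---[ \t]*(?:\r?\n|$)/.exec(text);
    if (!match) {
        return { frontMatter: '', body: text };
    }
    return { frontMatter: match[1], body: text.slice(match[0].length) };
}

// Collects http(s) URLs from the front matter, stopping at whitespace,
// quotes and angle brackets so YAML string delimiters are not included.
function extractFrontMatterUrls(frontMatter) {
    return frontMatter.match(/https?:\/\/[^\s"'<>]+/g) || [];
}
```

The body would then be passed to markdownLinkExtractor as usual, and the two result arrays merged. A plain regular expression is used here instead of a YAML parser to keep the sketch dependency-free; it will miss URLs without a scheme.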

Expectations

Markdown-link-extractor is expected to extract all the links in markdown files, including those in a front matter header.

Linked issue

#7 also asks for links to be extracted from html code included in markdown. This is the same kind of request. Maybe both could be handled at the same time?

Update to marked > 0.6.2 for vulnerability

There was a vulnerability reported in the version of marked this package relies on here: https://www.npmjs.com/advisories/812

Would it be possible to bump the marked dependency to a version that does not have this advisory notice?

Need to update dependency

Running npm audit with this package installed suggests that the marked dependency needs upgrading to >=4.0.10