Comments (7)
@bboydflo not sure if it's helpful, but my original reason for using this library was to "pipe" its output to Preact, which requires using DOMParser to convert the generated HTML to Virtual DOM. Since this is then rendered using the imperative DOM API, it's relatively easy to implement XSS mitigation, though the same concept can be applied as a string-to-string transform. It won't be the fastest, but it avoids building in an HTML parser just for sanitization:
function safeMarkdown(markdown) {
  const html = snarkdown(markdown);
  // Parse into an inert document so nothing executes while we inspect it.
  const doc = new DOMParser().parseFromString(`<!DOCTYPE html><html><body>${html}`, 'text/html');
  doc.normalize();
  _sanitize(doc.body);
  return doc.body.innerHTML;
}
function _sanitize(node) {
  // Text nodes are kept as-is.
  if (node.nodeType === 3) return;
  // Drop anything that is not an element (e.g. comments) and any blocked tag.
  if (node.nodeType !== 1 || /^(script|iframe|object|embed|svg)$/i.test(node.tagName)) {
    return node.remove();
  }
  // Iterate backwards because removing an attribute shifts the indices.
  for (let i = node.attributes.length; i--; ) {
    const name = node.attributes[i].name;
    if (!/^(class|id|name|href|src|alt|align|valign)$/i.test(name)) {
      node.attributes.removeNamedItem(name);
    }
  }
  // Recurse backwards for the same reason: children may remove themselves.
  for (let i = node.childNodes.length; i--; ) _sanitize(node.childNodes[i]);
}
Here's the above running on JSFiddle: https://jsfiddle.net/developit/vrn16fsg/
from snarkdown.
@retog your question made me curious, so I investigated it a bit.
Unfortunately, I just managed to XSS snarkdown without using HTML:
https://codesandbox.io/s/immutable-cloud-b66z48?file=/src/index.ts
In general, XSS prevention isn’t that easy. Here’s a somewhat complete list of prevention techniques: https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
Since snarkdown has no plans of being XSS-proof, I’d strongly recommend not using snarkdown for any user-provided data, and only for trusted markdown files.
In case the CodeSandbox link 404s, here’s the source code:
import md from "snarkdown";
import e from "lodash.escape";
const t = "[XSS Me](javascript:alert`hello from xss`)";
document.getElementById("xss")!.innerHTML = md(t);
document.getElementById("txt")!.innerHTML = e(md(t));
<div id="txt"></div>
<div id="xss"></div>
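One possible mitigation for this kind of payload, sketched here as an assumption rather than anything snarkdown provides: the `_sanitize` function from the first comment checks attribute names but not their values, so an `href` containing `javascript:` would pass through it. A scheme allowlist like the hypothetical `isSafeUrl` helper below could be applied to `href` and `src` values read from the parsed DOM. The helper name and the list of allowed schemes are illustrative choices, not part of any library.

```javascript
// Hypothetical helper, not part of snarkdown: accepts a URL only if it has
// no scheme (relative path, fragment) or its scheme is on a small allowlist.
const SAFE_SCHEMES = new Set(['http', 'https', 'mailto']);

function isSafeUrl(value) {
  // Browsers tolerate leading whitespace and embedded tabs/newlines in URLs,
  // so strip control characters and spaces before inspecting the scheme.
  // This is deliberately stricter than the browser: it is only used for the check.
  const cleaned = String(value).replace(/[\u0000-\u0020\u007f]+/g, '');
  const match = /^([a-z][a-z0-9+.-]*):/i.exec(cleaned);
  if (!match) return true; // no scheme: relative URL, path, or fragment
  return SAFE_SCHEMES.has(match[1].toLowerCase());
}

console.log(isSafeUrl('javascript:alert`hello from xss`')); // false
console.log(isSafeUrl('https://example.com/'));             // true
```

Inside an attribute loop such as the one in `_sanitize`, a value that fails this check would be removed along with the attribute. This only covers URL schemes; it is a sketch under those assumptions and not a complete XSS defense.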
from snarkdown.
Hey, could you explain this in a little more detail? I don't see what this has to do with this markdown parser. It is parsing some input, and this <img src onerror="alert(1)"/> is valid markdown. Maybe one could add this to the readme.md.
from snarkdown.
@najtin sure. Basically, my point is that very often, if markdown is used in real-world apps, it’s used to parse user-generated content (like these comments we’re writing here). Most developers don’t explicitly go through OWASP’s list of potential security pitfalls every single time they implement anything.
So, what will most likely happen is that someone will use this library without expecting that its output can execute JavaScript when rendered. If this library doesn’t want to implement an explicit cross-site scripting prevention mechanism, it should at least warn that such a mechanism is always necessary when parsing and rendering user-generated markdown content.
Otherwise, developers will find a way to mess this up and risk their users’ and company’s security and image.
Other markdown parsers mention these issues in their readme or have (sometimes too simple) ways of mitigating them themselves, like disabling all HTML.
from snarkdown.
Agree with you. This is really important.
The library should either explicitly state that the parser does NOT protect against XSS or implement XSS protection itself.
Referring to other libraries in the README that help with XSS would be a nice addition too. There are a number of client side and server side solutions out there that fit the tiny & fast mantra of Snarkdown.
from snarkdown.
There are a number of client side and server side solutions out there that fit the tiny & fast mantra of Snarkdown.
Can you give some examples of smaller client side libs that help with xss?
from snarkdown.
If all HTML is removed from the markdown before passing it to snarkdown, would that make the output safe? Or could other valid markdown produce similarly unsafe output?
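A small sketch may help frame this question. It assumes "removing all HTML" means escaping HTML-significant characters in the raw markdown before parsing; `escapeHtml` is a hypothetical helper written for illustration, and snarkdown itself is not called here. The two payloads are the ones quoted earlier in this thread.

```javascript
// Hypothetical pre-processing step: escape characters that could start
// HTML markup in the raw markdown before handing it to a parser.
function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Payloads quoted earlier in this thread.
const htmlPayload = '<img src onerror="alert(1)"/>';
const linkPayload = '[XSS Me](javascript:alert`hello from xss`)';

// The inline HTML payload is neutralized by escaping.
console.log(escapeHtml(htmlPayload)); // &lt;img src onerror=&quot;alert(1)&quot;/&gt;

// The link payload contains no HTML characters, so it is left untouched.
console.log(escapeHtml(linkPayload) === linkPayload); // true
```

So escaping or stripping HTML in the input would not change the link payload at all, which matches the earlier report that snarkdown could be XSSed without using HTML. Note also that escaping `>` up front would likely interfere with markdown syntax such as blockquotes, so this is a sketch of the limitation rather than a recommended pipeline.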
from snarkdown.