Ruby 2.3.0, Rails 4.2.5

HTML string containing underscores gets escaped and shown in output markdown about reverse_markdown HOT 11 OPEN

xijo commented on August 31, 2024 1

HTML string containing underscores gets escaped and shown in output markdown

from reverse_markdown.

Comments (11)

xijo commented on August 31, 2024

Hello @jiggneshhgohel

I'm not quite sure how to help you there. We could introduce an option to disable the escaping, but I think the use cases for that are quite rare.

On the other hand, could you replace the escaped key characters with their non-escaped versions wherever you need them? Wouldn't that be the easier solution?

Let me know what you think!
Jo

jiggneshhgohel commented on August 31, 2024

@xijo

"On the other hand, could you replace the escaped key characters with their non-escaped versions wherever you need them? Wouldn't that be the easier solution?"

That's the most obvious option when the library has no provision for the desired feature. But my expectation when using a library like yours is to get the raw, original Markdown from the HTML I need to convert. My HTML didn't contain any escape characters, so I wasn't expecting any in the converted Markdown.

Thanks.

anujbiyani commented on August 31, 2024

I think the issue is that md = ReverseMarkdown.convert(html_str) is double-escaping the underscore, no? I would expect

2.3.0 :006 > md = ReverseMarkdown.convert(html_str)
 => " **Username** : %{user\\_name}\n\n"

to read as

2.3.0 :006 > md = ReverseMarkdown.convert(html_str)
 => " **Username** : %{user\_name}\n\n" # only \_ here instead of \\_

The issue I'm having is with a string like

"https://github.com/xijo/reverse_markdown"

ReverseMarkdown converts it to

"https://github.com/xijo/reverse\\_markdown\n\n"

and then the marked library converts it to

"<p><a href="https://github.com/xijo/reverse\_markdown">https://github.com/xijo/reverse\_markdown</a></p>\n"

which is correct from marked's perspective because it's processing only one of the escapes, but incorrect from the user's perspective because the original string had no escapes and the outputted markdown has escapes.

(I'm using ReverseMarkdown because sometimes the input string contains actual markup, sometimes it's just plain text.)
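One thing that may be worth ruling out when reading the console output above: the `=>` line in irb is the result of `String#inspect`, which prints every literal backslash as `\\`. The plain-Ruby check below does not call reverse_markdown at all; it assumes the value quoted above was copied verbatim from irb and simply counts the backslashes that such a string contains.

```ruby
# Hand-typed copy of the inspected value quoted above (an assumption: it
# mirrors the irb output exactly). Inside double quotes, "\\" in Ruby source
# denotes ONE backslash character.
md = " **Username** : %{user\\_name}\n\n"

# Count the backslash characters actually stored in the string.
backslash_count = md.each_char.count("\\")

puts md.inspect        # the form irb shows after "=>"
puts md                # the characters a Markdown renderer would receive
puts backslash_count
```

If the count is 1, the doubled backslash is only an artifact of how irb displays strings, and any extra escaping would have to come from a later step.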

anujbiyani commented on August 31, 2024

https://www.markdownguide.org/basic-syntax/#italic

To italicize text, add one asterisk or underscore before and after a word or phrase. To italicize the middle of a word for emphasis, add one asterisk without spaces around the letters.

So I think the actual problem is that reverse markdown should never escape underscores that are in the middle of a word.
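As a rough sketch of that idea (not the library's actual code), the helper below escapes an underscore only when it is not surrounded by letters or digits on both sides. The name `escape_keychars_sketch` is hypothetical, and escaping every `*` is an assumption based on the guide's note that asterisks can still mark emphasis mid-word.

```ruby
# Hypothetical helper for illustration; not part of reverse_markdown's API.
# Escapes "*" everywhere, but escapes "_" only when it does NOT have a letter
# or digit on both sides (i.e. when it could open or close emphasis).
def escape_keychars_sketch(text)
  text
    .gsub("*") { "\\*" }
    .gsub(/(?<![[:alnum:]])_|_(?![[:alnum:]])/) { "\\_" }
end

puts escape_keychars_sketch("https://github.com/xijo/reverse_markdown")
# https://github.com/xijo/reverse_markdown
puts escape_keychars_sketch("I _emphasize_ that")
# I \_emphasize\_ that
```

Text that already contains backslashes is not handled here; a real implementation would need to decide what to do with those.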

xijo commented on August 31, 2024

@anujbiyani Thanks for your input on that issue.

I like the idea of leaving underscores inside words unmodified. It won't solve every issue, though, because something like foo, _bar = something will still cause problems, but it's a good solution for most cases.

Would you like to open a PR?

anujbiyani commented on August 31, 2024

Sure, happy to try and tackle this!

One question: why are * and _ escaped at all? (Referring to this function.) Aren't unmatched * and _ valid without the escaping, and won't most markdown->HTML parsers simply ignore them?

xijo commented on August 31, 2024

@anujbiyani Good question! I did some digging in the past, and I think the initial description mentions this case: https://daringfireball.net/projects/markdown/syntax#backslash

The example sounds pretty reasonable to me so I wouldn't ditch the whole escaping. What about:

1. don't escape in the middle of a word
2. add a configuration flag that disables escaping

This way it should be safe for the majority and doesn't change the default behavior all too drastically. But everyone is free to opt out of escaping if it fits their case.

What do you think?
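To make the two points concrete, here is a minimal sketch of how they might combine. `SketchConfig`, its `escape_keychars` field, and `treat_text` are all hypothetical names chosen for this example; none of them is an existing reverse_markdown option or method.

```ruby
# Hypothetical settings object; `escape_keychars` is an assumed option name.
SketchConfig = Struct.new(:escape_keychars)

# Escapes "*" and word-boundary "_" unless the flag turns escaping off.
def treat_text(text, config)
  return text unless config.escape_keychars   # opt-out: leave text untouched

  # Match every "*", and every "_" that lacks a letter/digit on either side.
  text.gsub(/(?<![[:alnum:]])_|_(?![[:alnum:]])|\*/) { |char| "\\#{char}" }
end

default_config = SketchConfig.new(true)    # escaping stays on by default
opt_out_config = SketchConfig.new(false)   # callers may switch it off

puts treat_text("I _emphasize_ that", default_config)
puts treat_text("I _emphasize_ that", opt_out_config)
```

Keeping the flag's default at `true` is what preserves the current behavior for everyone who does not opt out.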

anujbiyani commented on August 31, 2024

Happy to do it via configuration flag to maintain backwards compatibility! Follow-up question, though: I believe the link you provided is referring to user-provided backslashes whereas the code I linked in ReverseMarkdown is escaping underscores and asterisks that don't have specifically two backslashes.

I think that method should either:

- be removed (if it doesn't have to exist)
- be a no-op (if it does have to exist)
- add a second backslash if there already is one backslash (if we need the double-backslash instead of just the single)

I'm trying to understand what the right behavior should be for when the flag is enabled 😄 let me know what you think!

xijo commented on August 31, 2024

You're right, the backslash escape is used for user-provided escapes. But if we complete a full life cycle (HTML -> MD -> HTML) without escaping the initial * or _, it would be treated as Markdown syntax and the original information would be lost. It's a little confusing, so let me give you an example:

With escaping

<div>I _emphasize_ that</div>   # Original HTML (user input)
I \_emphasize\_ that            # MD after conversion
<p>I _emphasize_ that</p>       # After using random MD interpreter

Without escaping (as proposed)

<div>I _emphasize_ that</div>     # Original HTML (user input)
I _emphasize_ that                # MD after conversion
<p>I <em>emphasize</em> that</p>  # Now there is an additional em tag

Does this example clarify my problem with just skipping the whole escaping?

In my opinion it is correct to treat the original HTML entities as user input and therefore escape them.
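The round trip above can be reproduced with a toy renderer. The `toy_render` function below is a hypothetical stand-in for "a random MD interpreter": it understands only `_emphasis_` and backslash escapes, so treat it as an illustration under that assumption rather than as how any real Markdown library behaves.

```ruby
# Toy stand-in for a Markdown renderer (hypothetical, heavily simplified).
def toy_render(md)
  # Turn unescaped _..._ pairs into <em> tags.
  html = md.gsub(/(?<!\\)_(.+?)(?<!\\)_/, '<em>\1</em>')
  # Remove the backslash from escaped key characters.
  html = html.gsub(/\\([_*])/, '\1')
  "<p>#{html}</p>"
end

escaped   = "I \\_emphasize\\_ that"   # MD after conversion, with escaping
unescaped = "I _emphasize_ that"       # MD after conversion, without escaping

puts toy_render(escaped)     # <p>I _emphasize_ that</p>
puts toy_render(unescaped)   # <p>I <em>emphasize</em> that</p>
```

Only the escaped variant gets the literal underscores of the original HTML back.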

anujbiyani commented on August 31, 2024

Ohh that helps a ton, thanks! So now I agree that removing the escaping entirely isn't actually a good idea.

What if we just go with option 1 from your post above and skip disabling escaping via config?

1. don't escape in the middle of a word
2. add a configuration flag that disables escaping

Or did you mean that the configuration flag should apply to whether or not we escape in the middle of the word?

Sorry for all the back-and-forth, I find escaping/not-escaping incredibly confusing from a keeping-it-straight-in-my-head perspective 😖 .

bsbodden commented on August 31, 2024

Also, don't convert/parse tag attributes, for example:

one_tag = "<img src=\"https://someimageserver.com/img__1fu9uz__.png\" alt=\"foo\" />"
ReverseMarkdown.convert(one_tag)

results in:

=> " ![foo](https://someimageserver.com/img __1fu9uz__.png)"

Notice the space inserted before the double underscore in the URL.
