
Comments (11)

xijo commented on August 31, 2024

Hello @jiggneshhgohel

I'm not quite sure how to help you there. We could introduce an option to disable the escaping, but I think the use cases for that are quite rare.

On the other hand, could you replace the escaped key characters with their unescaped versions whenever you need them? Wouldn't that be the easier solution?
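A minimal sketch of that post-processing step, assuming only `_` and `*` were escaped during conversion. The helper name `unescape_keychars` is hypothetical and not part of reverse_markdown; note that it would also strip escapes that were intentional.

```ruby
# Hypothetical helper (not part of reverse_markdown): removes the backslash
# in front of escaped Markdown key characters after conversion.
def unescape_keychars(markdown)
  # Match a literal backslash followed by `_` or `*`; keep only the character.
  markdown.gsub(/\\([_*])/, '\1')
end

# Sample input mirroring the converted string discussed in this thread.
escaped = " **Username** : %{user\\_name}\n\n"
puts unescape_keychars(escaped)
```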

Let me know what you think!
Jo

from reverse_markdown.

jiggneshhgohel commented on August 31, 2024

@xijo

On the other hand, could you replace the escaped key characters with their unescaped versions whenever you need them? Wouldn't that be the easier solution?

That's the most obvious option when the library has no provision for the desired feature. But what I expect from a library like yours is the raw, original Markdown for the HTML I need to convert. My HTML didn't contain any escape characters, so I wasn't expecting any in the converted Markdown.

Thanks.


anujbiyani commented on August 31, 2024

I think the issue is that ReverseMarkdown.convert(html_str) is double-escaping the underscore, no? I would expect

2.3.0 :006 > md = ReverseMarkdown.convert(html_str)
 => " **Username** : %{user\\_name}\n\n" 

to read as

2.3.0 :006 > md = ReverseMarkdown.convert(html_str)
 => " **Username** : %{user\_name}\n\n" # only \_ here instead of \\_

The issue I'm having is with a string like

"https://github.com/xijo/reverse_markdown"

ReverseMarkdown converts it to

"https://github.com/xijo/reverse\\_markdown\n\n"

and then the marked library converts it to

"<p><a href="https://github.com/xijo/reverse\_markdown">https://github.com/xijo/reverse\_markdown</a></p>\n"

which is correct from marked's perspective because it's processing only one of the escapes, but incorrect from the user's perspective because the original string had no escapes and the outputted markdown has escapes.

(I'm using ReverseMarkdown because sometimes the input string contains actual markup, sometimes it's just plain text.)
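One thing worth checking when reading the irb output quoted above: irb prints return values with `String#inspect`, which shows every backslash doubled. The sketch below uses a hand-written sample string (an assumption, not actual library output) to show how to count the backslashes a string really contains.

```ruby
# Sample string written as a Ruby literal; `\\` in the source is ONE backslash.
md = " **Username** : %{user\\_name}\n\n"

puts md.inspect            # inspect form, as irb would echo it (backslash doubled)
puts md                    # raw form, as it would be written to a file
puts md.chars.count("\\")  # number of backslash characters actually stored
```

If the count is 1, the string holds a single `\_` escape and the doubled backslash is only a display effect of `inspect`.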


anujbiyani commented on August 31, 2024

https://www.markdownguide.org/basic-syntax/#italic

To italicize text, add one asterisk or underscore before and after a word or phrase. To italicize the middle of a word for emphasis, add one asterisk without spaces around the letters.

So I think the actual problem is that reverse markdown should never escape underscores that are in the middle of a word.
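A rough sketch of what "never escape underscores in the middle of a word" could look like. The helper name `escape_outside_words` and its regular expression are assumptions for illustration, not the library's actual implementation: `*` is always escaped, while `_` is escaped only when it does not have an alphanumeric character on both sides.

```ruby
# Hypothetical helper, not reverse_markdown's real escaping code.
def escape_outside_words(text)
  # Alternatives: any `*`, a `_` not preceded by an alphanumeric character,
  # or a `_` not followed by one. Intraword underscores match none of these.
  pattern = /\*|(?<![[:alnum:]])_|_(?![[:alnum:]])/
  # Block form avoids backreference handling in the replacement string.
  text.gsub(pattern) { |char| "\\#{char}" }
end

puts escape_outside_words("https://github.com/xijo/reverse_markdown")
puts escape_outside_words("I _emphasize_ that")
puts escape_outside_words("foo, _bar = something")
```

With this rule the URL stays untouched, while underscores at word boundaries (including a leading one such as `_bar`) are still escaped.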


xijo commented on August 31, 2024

@anujbiyani Thanks for your input on that issue.

I like the idea of leaving underscores within words unmodified. It won't solve all issues though, because something like foo, _bar = something will still cause problems, but it's a good solution for most cases.

Would you like to open a PR?


anujbiyani commented on August 31, 2024

Sure, happy to try and tackle this!

One question: why are * and _ escaped at all? (Referring to this function.) Aren't unmatched * and _ valid without the escaping, and won't most markdown->HTML parsers simply ignore them?


xijo commented on August 31, 2024

@anujbiyani Good question! I did some digging in the past, and I think the original Markdown description mentions this case: https://daringfireball.net/projects/markdown/syntax#backslash

The example sounds pretty reasonable to me so I wouldn't ditch the whole escaping. What about:

  1. don't escape in the middle of a word
  2. add a configuration flag that disables escaping

This way it should be safe for the majority and doesn't change the default behavior all too drastically. But everyone is free to opt out of escaping if it fits their case.

What do you think?


anujbiyani commented on August 31, 2024

Happy to do it via configuration flag to maintain backwards compatibility! Follow-up question, though: I believe the link you provided is referring to user-provided backslashes whereas the code I linked in ReverseMarkdown is escaping underscores and asterisks that don't have specifically two backslashes.

I think that method should either:

  1. be removed (if it doesn't have to exist)
  2. be a no-op (if it does have to exist)
  3. add a second backslash if there already is one backslash (if we need the double-backslash instead of just the single)

I'm trying to understand what the right behavior should be for when the flag is enabled 😄 let me know what you think!


xijo commented on August 31, 2024

You're right, the backslash escape is meant for user-provided escapes. But if we complete a full life cycle (HTML → MD → HTML) and do not escape the initial * or _, a Markdown interpreter will treat it as markup, and the original information is lost. It's a little confusing, so let me give you an example:

With escaping

<div>I _emphasize_ that</div>   # Original HTML (user input)
I \_emphasize\_ that            # MD after conversion
<p>I _emphasize_ that</p>       # After using random MD interpreter

Without escaping (as proposed)

<div>I _emphasize_ that</div>     # Original HTML (user input)
I _emphasize_ that                # MD after conversion
<p>I <em>emphasize</em> that</p>  # Now there is an additional em tag

Does this example clarify my problem with just skipping the whole escaping?

In my opinion it is correct to treat the original HTML entities as user input and therefore escape them.
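The round trip above can be reproduced with a toy renderer. This is only an illustrative sketch: `toy_render` is a hypothetical stand-in for "a random MD interpreter" that understands `_text_` emphasis and backslash-escaped underscores and nothing else.

```ruby
# Toy Markdown renderer for illustration only (not a real Markdown parser).
def toy_render(markdown)
  placeholder = "\u0000"
  html = markdown.gsub('\_', placeholder)      # protect escaped underscores
  html = html.gsub(/_(.+?)_/, '<em>\1</em>')   # turn _text_ into emphasis
  html = html.gsub(placeholder, '_')           # restore literal underscores
  "<p>#{html}</p>"
end

puts toy_render('I \_emphasize\_ that')  # escaped: underscores survive as text
puts toy_render('I _emphasize_ that')    # unescaped: an <em> tag appears
```

Only the escaped variant returns the literal underscores that were present in the original HTML.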


anujbiyani commented on August 31, 2024

Ohh that helps a ton, thanks! So now I agree that removing the escaping entirely isn't actually a good idea.

What if we just go with option 1 from your post above and skip the disabling escaping via config?

  1. don't escape in the middle of a word
  2. add a configuration flag that disables escaping

Or did you mean that the configuration flag should apply to whether or not we escape in the middle of the word?

Sorry for all the back-and-forth, I find escaping/not-escaping incredibly confusing from a keeping-it-straight-in-my-head perspective 😖 .


bsbodden commented on August 31, 2024

Also, don't convert/parse tag attributes, for example:

one_tag = "<img src=\"https://someimageserver.com/img__1fu9uz__.png\" alt=\"foo\" />"
ReverseMarkdown.convert(one_tag)

results in:

=> " ![foo](https://someimageserver.com/img __1fu9uz__.png)"

Notice the space inserted before __1fu9uz__ in the URL.
