reverse_markdown's Introduction

Summary

Transform html into markdown. Useful for example if you want to import html into your markdown based application.

Changelog

See Change Log

Requirements

  1. Nokogiri
  2. Ruby 2.0.0 or higher

Installation

Install the gem

[sudo] gem install reverse_markdown

or add it to your Gemfile

gem 'reverse_markdown'

Features

  • Supports all the established html tags like h1, h2, h3, h4, h5, h6, p, em, strong, i, b, blockquote, code, img, a, hr, li, ol, ul, table, tr, th, td, br, figure
  • Module based - if you miss a tag, just add it
  • Can deal with nested lists
  • Inline and block code is supported
  • Supports blockquote

Usage

Ruby

You can convert HTML content given as a string or a Nokogiri document:

input  = '<strong>feelings</strong>'
result = ReverseMarkdown.convert input
result.inspect # " **feelings** "

Commandline

It's also possible to convert html files to markdown using the binary:

$ reverse_markdown file.html > file.md
$ cat file.html | reverse_markdown > file.md

Configuration

The following options are available:

  • unknown_tags (default pass_through) - how to handle unknown tags. Valid options are:
    • pass_through - Include the unknown tag completely into the result
    • drop - Drop the unknown tag and its content
    • bypass - Ignore the unknown tag but try to convert its content
    • raise - Raise an error to let you know
  • github_flavored (default false) - use GitHub-flavored Markdown (so far only code blocks are supported)
  • tag_border (default ' ') - how to handle tag borders. valid options are:
    • ' ' - Add whitespace if there is none at tag borders.
    • '' - Do not add whitespace.

As options

Just pass your chosen configuration options in after the input. The given options will last for this operation only.

ReverseMarkdown.convert(input, unknown_tags: :raise, github_flavored: true)

Preconfigure

Or configure it block-style at the initializer level. These settings last for all conversions until they are set to something different.

ReverseMarkdown.config do |config|
  config.unknown_tags     = :bypass
  config.github_flavored  = true
  config.tag_border  = ''
end

Related stuff

Thanks

Thanks to all contributors and all other helpers:

reverse_markdown's People

Contributors

anshul78, aried3r, craig-day, danschultzer, diogoosorio, ehsandarroudi, grddev, gregoryjscott, grmartin, gstamp, harlantwood, henrypoydar, jeanmartin, jesperronn, kerrick, livathinos, mauidude, michaelglass, mu-is-too-short, niall3rs, olleolleolle, pocke, rgould, shivabhusal, staugaard, stephencroberts, sunaku, visoft, willglynn, xijo

reverse_markdown's Issues

Are you willing to license this under the MIT license or similar?

Hi, thanks for your work on this project! I am interested in contributing, and wondering if you would be willing to license this under an open source license. If yes, I am happy to create a pull request if helpful with the standard MIT (or other) license verbiage. Thanks for considering!

Ruby verbose mode warnings

If run with "ruby -w" (or "$VERBOSE = true"), reverse_markdown causes the following warnings:

reverse_markdown/cleaner.rb:13: warning: ambiguous first argument; put parentheses or even spaces
reverse_markdown/cleaner.rb:29: warning: ambiguous first argument; put parentheses or even spaces
reverse_markdown/cleaner.rb:35: warning: ambiguous first argument; put parentheses or even spaces
reverse_markdown/cleaner.rb:41: warning: ambiguous first argument; put parentheses or even spaces
reverse_markdown/cleaner.rb:47: warning: ambiguous first argument; put parentheses or even spaces

Maybe those could be eliminated for cleanliness? These warnings pop up if you are developing a script that requires reverse_markdown and run it with "ruby -w".
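This warning typically appears when a regular expression literal (or a negative number) follows a method name without parentheses. A minimal sketch of the usual fix, assuming the flagged lines resemble the call shown in the comment; the actual contents of cleaner.rb are not quoted in this report, and `squeeze_blank_lines` is a hypothetical name:

```ruby
# Written as `text.gsub /\n{3,}/, "\n\n"`, this call would be flagged under
# `ruby -w`, because the `/` after the method name could also be read as a
# division. Putting the arguments in parentheses removes the ambiguity.
def squeeze_blank_lines(text)
  text.gsub(/\n{3,}/, "\n\n")
end

puts squeeze_blank_lines("a\n\n\n\nb").inspect
```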

Should the .config settings be "sticky"?

require 'reverse_markdown'

ReverseMarkdown.config do |config|
  config.unknown_tags     = :bypass
  config.github_flavored  = true
end
p ReverseMarkdown.convert("<pre>a = 5</pre>")
p ReverseMarkdown.convert("<pre>a = 5</pre>")

Produces this output:

"```\na = 5\n```\n"
"    a = 5\n\n"

Only the first one is fenced.

It strikes me that the .config settings should be sticky until changed.
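One way to get both behaviours, sticky defaults plus per-call overrides, is to keep one settings hash and swap in a merged copy only for the duration of a call. The sketch below is not reverse_markdown's implementation; `StickyConfig`, `set` and `with` are hypothetical names used only to illustrate the idea:

```ruby
# Hypothetical illustration only; not reverse_markdown's actual Config class.
class StickyConfig
  DEFAULTS = { unknown_tags: :pass_through, github_flavored: false, tag_border: ' ' }.freeze

  def initialize
    @settings = DEFAULTS.dup
  end

  # Sticky: values set here stay in place until they are changed again.
  def set(options)
    @settings.merge!(options)
    self
  end

  # Temporary: the overrides apply only while the block runs.
  def with(overrides = {})
    previous = @settings
    @settings = previous.merge(overrides)
    yield self
  ensure
    @settings = previous
  end

  def [](key)
    @settings.fetch(key)
  end
end

config = StickyConfig.new
config.set(github_flavored: true)

during = config.with(tag_border: '') { |c| [c[:github_flavored], c[:tag_border]] }
after  = [config[:github_flavored], config[:tag_border]]
```

Here `during` sees both the sticky setting and the temporary override, while `after` sees only the sticky setting.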

[Question] How to change \n by </br> ?

The Markdown output contains \n; however, not all Markdown renderers understand \n. So, is it possible to replace \n with something else? Is there a config option for this?

uninitialized constant ReverseMarkdown::Mapper::Digest (NameError)

As a follow-up to #19, I did a little regexing to convert my <code> tags into GitHub-style code blocks and then ran into the following error:

.rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/reverse_markdown-0.4.6/lib/reverse_markdown/mapper.rb:23:in `block in process_root': uninitialized constant ReverseMarkdown::Mapper::Digest (NameError)
    from .rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/reverse_markdown-0.4.6/lib/reverse_markdown/mapper.rb:22:in `gsub!'
    from .rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/reverse_markdown-0.4.6/lib/reverse_markdown/mapper.rb:22:in `process_root'
    from .rbenv/versions/1.9.3-p125/lib/ruby/gems/1.9.1/gems/reverse_markdown-0.4.6/lib/reverse_markdown.rb:15:in `parse'
    from ../fixmarkdown.rb:12:in `block in <main>'
    from ../fixmarkdown.rb:5:in `glob'
    from ../fixmarkdown.rb:5:in `<main>'

Seems like a require 'digest' is missing some place.
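For context, `Digest` is part of Ruby's standard library but is not loaded automatically, so the constant only exists after an explicit require. A minimal sketch (the input string is just an example, not what mapper.rb hashes):

```ruby
# Without this require, referencing Digest raises NameError
# (uninitialized constant), as in the backtrace above.
require 'digest'

checksum = Digest::MD5.hexdigest('<code>puts 1</code>')
puts checksum.length # an MD5 hex digest is 32 characters long
```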

Consider changing the license to something listed on Open Source Initiative list of approved licenses

As per https://opensource.org/minutes20090304 this was rejected as redundant with Fair license https://opensource.org/licenses/Fair

Would you consider using Fair license or CC0 or X11 or Apache 2.0 instead?

https://www.gnu.org/licenses/license-list.html#WTFPL FSF recommends using X11 license or Apache 2.0

This is stopping GitLab from updating licensee, which now includes a dependency on reverse_markdown.

https://gitlab.com/gitlab-org/gitaly/-/issues/2856#note_438114315

Nested indentation - confirm bug?

Before I dive in and try and fix it, can you confirm that this behavior is undesirable?

Given a nested list between adjacent list items like this:

<ul>
  <li>alpha</li>
  <li>bravo
    <ul>
      <li>bravo alpha</li>
      <li>bravo bravo
        <ul>
          <li>bravo bravo alpha</li>
        </ul>
      </li>
    </ul>
  </li>
  <li>charlie</li>
  <li>delta</li>
</ul>

An extra newline seems to be inserted. So instead of getting this:

- alpha
- bravo
  - bravo alpha
  - bravo bravo
    - bravo bravo alpha
- charlie
- delta

Reverse markdown produces this:

- alpha
- bravo
  - bravo alpha
  - bravo bravo
    - bravo bravo alpha

- charlie
- delta

Seems wrong to me, but just wanted to check before I dug around for a fix. Markdown has surprised me before.

Pass unsupported tags instead of dropping them

What's the logic behind dropping all unsupported tags, rather than passing them through?

HTML is valid within markdown. If I have a non-markdownable entity (e.g. a script tag) reverse_markdown should pass it through to the output (or at least give me the option to do so).

Indenting seems off around code blocks

Input:

If you don't already have a <code>php.ini</code> file you'll need to create one by copying the default:
<code>
sh-3.2# if ( ! test -e /private/etc/php.ini ) ; then cp /private/etc/php.ini.default /private/etc/php.ini; fi
</code>

Restart Apache:

Output:

If you don't already have a  `php.ini`  file you'll need to create one by copying the default:  `sh-3.2# if ( ! test -e /private/etc/php.ini ) ; then cp /private/etc/php.ini.default /private/etc/php.ini; fi`  Restart Apache:  `sh-3.2# apachectl restart`

Expected either:

If you don't already have a `php.ini` file you'll need to create one by copying the default:  

    sh-3.2# if ( ! test -e /private/etc/php.ini ) ; then cp /private/etc/php.ini.default /private/etc/php.ini; fi

Restart Apache:

    sh-3.2# apachectl restart

or one using GitHub style comment blocks that I can't seem to figure out how to format using markdown.

HTML string containing underscores gets escaped and shown in output markdown

Ruby 2.3.0

Rails 4.2.5

Rails Console output:

2.3.0 :005 > html_str = "<p><strong>Username</strong> : %{user_name}</p>"
 => "<p><strong>Username</strong> : %{user_name}</p>" 
2.3.0 :006 > md = ReverseMarkdown.convert(html_str)
 => " **Username** : %{user\\_name}\n\n" 
2.3.0 :007 > puts md
 **Username** : %{user\_name}
 => nil 

As can be seen when my HTML string contains words separated by underscore(s), the underscore(s) gets escaped which is correct. But is there any way we can hide those in the output string? I have a use-case wherein I want to convert the HTML string to its Markdown version and render the markdown version as it is.

Process footnotes

I need to process footnotes and came up with this extension, if you think it could be a generic way to process footnotes, I'll be happy to open a PR.

# frozen_string_literal: true

require 'reverse_markdown'

# Sent as a patch to reverse_markdown
# https://github.com/xijo/reverse_markdown/issues/101
module ReverseMarkdown
  module Converters
    class Footnote < A
      def convert(node, state = {})
        # If the link has a circular reference, we need to check if it's
        # inside a paragraph or it's the first element of a paragraph or
        # list item.
        if node['id'] && node['href']&.start_with?('#')
          parent = node.parent

          # The link could be contained in a <sup>
          until %w[p li].include?(parent.name) do
            parent = parent.parent

            # Don't go further than this
            break if parent.name == 'body'
          end

          first_child = parent.first_element_child

          # If it's the first link on the parent, it's the footnote
          # itself, otherwise it's the reference.
          if first_child == node || first_child.children.include?(node)
            "[^#{node['href'].tr('#', '')}]:"
          else
            "[^#{node['id']}]"
          end
        # Just process the link.
        else
          super
        end
      end
    end

    register :a, Footnote.new
  end
end

Edit: The footnote was pointing to its id instead of the href for the reference. Also the markdown syntax was incorrect.

Improper nested ol/ul parsing

The Gem (although awesome) seems to struggle a bit with nested lists, both numbered and unnumbered.

Take this example HTML:

  <ol>
    <li>One
      <ol>
        <li>Sub one
          <ol>
            <li>Sub sub one</li>
            <li>Sub sub two</li>
          </ol>
        </li>
      </ol>
    </li>
  </ol>

Line breaks add whitespace

The above is outputted as:

2. Two
  1. Sub one
    1. Sub sub one
     2. Sub sub two

Note the 5 spaces before the "sub sub two" list item. It boggled my mind how it could get a number of spaces that wasn't a multiple of 2 (the indent function), but it turns out the line break after the preceding </li> inserts an extra space before the subsequent <li> at some point in the pipeline. Completely stripping newlines before passing the HTML to reverse_markdown solves the problem.

Numbering

The numbering of numbered lists properly resets when a subsequent ol is nested deeper than the preceding ol, but seems to continue numbering if it is shallower.

1. One
  1. Sub one
  2. Sub two

3. Two
  1. Sub one
    1. Sub sub one
    2. Sub sub two

  3. Sub two

4. Three

I tried to implement the same logic in my own Gem before realizing that reverse markdown should handle it. Perhaps it'd make more sense to store the current_li value in an array, where each element represented one level of nesting (the value being the current number). Upon discovering a less indented list item, you'd reset all elements in the array after the current nesting level. Not a show stopper, because still valid markdown, but still an odd behavior.
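The counter-per-level idea described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: items arrive as flat `[depth, text]` pairs rather than Nokogiri nodes, and `number_items` is a hypothetical helper, not part of the gem:

```ruby
# Hypothetical helper: numbers list items given as [depth, text] pairs,
# keeping one counter per nesting level.
def number_items(items, indent: 2)
  counters = []
  items.map do |depth, text|
    # When moving back to a shallower level, forget the counters of all
    # deeper levels so that a later nested list starts again at 1.
    counters = counters.first(depth + 1)
    counters[depth] = (counters[depth] || 0) + 1
    "#{' ' * (indent * depth)}#{counters[depth]}. #{text}"
  end
end

items = [
  [0, 'One'], [1, 'Sub one'], [1, 'Sub two'],
  [0, 'Two'], [1, 'Sub one'], [2, 'Sub sub one'], [2, 'Sub sub two'],
  [1, 'Sub two'], [0, 'Three']
]
puts number_items(items)
```

With this approach "Two" is numbered 2, the second "Sub two" is numbered 2, and "Three" is numbered 3.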

Using the Gem to convert Word to Markdown. Awesome stuff!

Slack flavored Markdown

Hi, I was recently working with slack integration and had to convert HTML into Slack Flavored Markdown, which is a bit different than GitHub Flavored.

For the urgent need, I forked this project and made the required changes to make it work with Slack. I thought this might be good starting point for me to contribute to open source. If you guys think that there is a use for Slack Flavored Markdown, then I'll be more than happy to open a PR for it.

Slack only allows a small subset of Markdown features. It includes things like Bold, Italic, Ordered List, Unordered List, Quote, Code Block.

Hyphen(-) issue

In Markdown a hyphen (-) can be rendered as a list marker. I think we should escape the hyphen (-) to "\-"; otherwise, after conversion, the hyphen will be shown as a list, whereas I expected only a normal hyphen, not a list.
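A minimal sketch of such escaping, assuming only a hyphen at the start of a line followed by whitespace needs protection (hyphens inside words are left alone). `escape_leading_hyphen` is a hypothetical helper, not an existing reverse_markdown method:

```ruby
# Hypothetical helper: escape a hyphen that starts a line and is followed by
# whitespace, the position where Markdown reads it as a list marker.
def escape_leading_hyphen(text)
  text.gsub(/^([ \t]*)-(?=\s)/) { "#{Regexp.last_match(1)}\\-" }
end

puts escape_leading_hyphen("- not a list\nwell-known term")
```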

Unwanted new line characters within lists with paragraphs

The library is adding what I believe to be unintended newline characters, when parsing a document with li > p structure with newlines in between the 2 nodes:

[6] pry(main)> ReverseMarkdown.convert("<ul> <li><p>a</p></li></ul>")
=> "- a\n\n"
[7] pry(main)> ReverseMarkdown.convert("<ul><li><p>a</p></li></ul>")
=> "- a\n\n"
[8] pry(main)> ReverseMarkdown.convert("<ul><li> <p>a</p></li></ul>")
=> "-  \n\na\n\n"
[9] pry(main)> ReverseMarkdown.convert("<ul><li>\n<p>a</p>\n</li></ul>")
=> "- \n\na\n\n"

The 2 first examples work as intended, but if you add a space or newline character between the <li> and the <p> the library changes its behaviour and introduces the problem.

Funnily enough this scenario was accounted for in the list specs here, but the corresponding assertion is being skipped here - and it has been this way since 2012 (cd24cc3).

Anyway I'll open a PR shortly with a fix proposal. 👍

Too much blankspace gets stripped

Coming from forem/forem#8457, I think that ReverseMarkdown strips too much blankspace in some scenarios. Here's a failing test:

  it 'keeps whitespace surrounding links' do
    result = ReverseMarkdown.convert("a\n<a href='1'>link</a>\nis good\nbut blankspace is better")
    expect(result).to eq "a [link](1) is good but blankspace is better\n\n"
  end

The output is a[link](1)is good but blankspace is better\n\n. This happens because the text converter calls remove_border_newlines, and because the middle line is a link, it becomes its own Nokogiri node, so the three nodes are joined with no whitespace. I tried changing remove_border_newlines to squeeze instead of removing everything, but this doesn't work: it keeps whitespace in scenarios where it shouldn't:

first<p>second</p>third becomes first\n\nsecond\n\n third\n\n.

I still want to investigate this further, but I decided to post this now to share my findings.

Links to ids don't produce Markdown links

When the href points to an HTML id, no link is generated:

link_to_external_site = '<a href="https://example.com#hallo">Hallo!</a>'
link_to_id = '<a href="#hallo">Hallo!</a>'

ReverseMarkdown.convert(link_to_external_site, github_flavored: true, tag_border: '')
=> "[Hallo!](https://example.com#hallo)"

ReverseMarkdown.convert(link_to_id, github_flavored: true, tag_border: '')
=> "Hallo!"

In the second case, I expected [Hallo!](#hallo), not Hallo!.

img tag is not implemented correctly?

>> require 'reverse_markdown'
=> true

test 1

>> s = '<img src="./images/1.jpg">'
=> "<img src=\"./images/1.jpg\">"
>> ReverseMarkdown.parse s
=> "![./images/1.jpg] "

test 2

>> ss = '<img src="./images/1.jpg" alt="some pic">'
=> "<img src=\"./images/1.jpg\" alt=\"some pic\">"
>> ReverseMarkdown.parse ss
=> "![some pic][./images/1.jpg] "

test 3, copied from the test spec that comes with reverse_markdown

>> a = '<p><img src="http://foo.bar/dog.png" alt="My Dog" title="Ralph"></p>'
=> "<p><img src=\"http://foo.bar/dog.png\" alt=\"My Dog\" title=\"Ralph\"></p>"
>> ReverseMarkdown.parse a
=> "\n\n![My Dog][http://foo.bar/dog.png] "

Untidy tags produce invalid Markdown

I may work on a patch when I have the time, but I'll leave this here for discussion :)

I'm converting a large site with user-edited HTML content produced by a WYSIWYG editor, so I'm finding many cases where stuff like this <em>word</em> is actually <em>wo</em><em>rd</em>, which this gem converts into _wo__rd_. I think it'd be good to add an option to sanitize HTML by removing empty tags and tidying a bit beforehand, but I'm not sure if that would be a task for reverse_markdown. Maybe pass it something that can do the cleanup, like Loofah?
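As a rough illustration of the kind of tidying meant here, the sketch below merges directly adjacent inline tags and drops empty ones with plain string substitution. It assumes tags without attributes and is not a replacement for a real HTML sanitizer; `tidy_inline_tags` and `INLINE_TAGS` are hypothetical names:

```ruby
# Hypothetical pre-processing step, run before the HTML is converted.
INLINE_TAGS = %w[em strong i b].freeze

def tidy_inline_tags(html)
  INLINE_TAGS.reduce(html) do |result, tag|
    result
      .gsub(%r{</#{tag}><#{tag}>}, '') # <em>wo</em><em>rd</em> -> <em>word</em>
      .gsub(%r{<#{tag}></#{tag}>}, '') # <em></em> -> removed
  end
end

puts tidy_inline_tags('like this <em>wo</em><em>rd</em>')
```

A parser-based cleanup would be more robust for tags with attributes or unusual nesting.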

Unsupported tags: h5, h6, i, b

The README says you support h5 and h6, but they don't work:

ReverseMarkdown.parse("<h5>hi</h5>")
#=> "hi"
ReverseMarkdown.parse("<h6>hi</h6>")
#=> "hi"

Additionally, b and i tags aren't supported, although strong and em are.

Command-line HTML to Markdown now available!

I've added markdown support to my html_massage gem, of course using reverse_markdown.

So you can do:

gem install html_massage
html_massage markdown http://en.wikipedia.org/wiki/Hylos

And you will get back the markdown version of the given URL. It works pretty well, better on some sites than others. The results of the command above, for example, are here: https://gist.github.com/31baaa60d7354a72dab1

Just thought people in this project would want to know!

I'm also happy to add this info to this project's README (or wiki) -- what do you think @xijo?

Improper space parsing within codeblock

The use of String#squeeze in the file below is messing with codeblock's indentation.

text.tr("\n\t", ' ').squeeze(' ')

I.E.

# Original code block
def original_value
  if assigned?
    original_attribute.original_value
  else
    type_cast(value_before_type_cast)
  end
end

# After ReverseMarkdown
def original_value
 if assigned?
 original_attribute.original_value
 else
 type_cast(value_before_type_cast)
 end
end
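The effect can be reproduced outside the gem by applying the quoted expression to each line of the example. The second variable shows one possible alternative for preformatted text, which is an assumption for illustration and not the gem's behaviour:

```ruby
lines = [
  'def original_value',
  '  if assigned?',
  '    original_attribute.original_value',
  '  end',
  'end'
]

# The normalisation quoted above: every run of spaces collapses to a single
# space, so all indentation levels end up as one space.
squeezed = lines.map { |line| line.tr("\n\t", ' ').squeeze(' ') }

# Possible alternative for code: leave whitespace untouched and only prefix
# each line with four spaces to form an indented Markdown code block.
preserved = lines.map { |line| "    #{line}" }

puts squeezed
puts preserved
```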

When encountering a formatting tag, spaces are lost

When passing in text with formatting, such as <strong> or <em>, any spaces around the tags are lost, causing the words to run together.

Example:

<strong>Elephants</strong> are large land mammals in two extant genera of the family Elephantidae: <em>Elephas</em> and <em>Loxodonta</em>, with the third genus <em>Mammuthus</em> extinct.

produces the following output:

**Elephants**are large land mammals in two extant genera of the family Elephantidae:*Elephas*and*Loxodonta*, with the third genus*Mammuthus*extinct.

Strongs add whitespace

Loving the new 5.x features. One thing I noticed in word-to-markdown's unit tests after bumping:

Converting the following:

<p class="P1">This word is <strong class="T1">bold</strong>.</p></body>

Resulted in (4.x top, 5.x bottom):

<"This word is **bold**."> expected but was
<"This word is **bold** .">.

(Note the extra space before the period)

Raise on character encoding errors

I've been using Reverse Markdown and it works great most of the time. I've run into one issue that I thought I'd get your opinion on.

Sometimes the HTML documents I'm converting have character encoding problems, leading to the dreaded ArgumentError: invalid byte sequence in UTF-8.

In other places I'm fixing this by coercing the lines of a file to UTF8 as I read them. I've discovered that when you parse a line you can generally just force_encoding on it, and that will convert typographic marks and whatnot pretty well, but occasionally you'll run into issues where it's not enough and you have to be more aggressive, ie. the following:

def clean_line(line)
  # encoding must be utf8, if non-utf8 characters are encountered we remove them.
  # Weirdly though, this can fail, but then doesn't blow up until you call something else on the string...
  line.force_encoding("UTF-8").strip # strip will make this raise if it didn't work
rescue
  # ... in that case we want to selectively remove the offending characters.
  line.encode('UTF-8', 'binary', invalid: :replace, undef: :replace, replace: '')
end

I end up using this same code to scrub HTML before I enter it into ReverseMarkdown, but it would probably be more efficient to handle it inside the gem - and would save other people from this same headache.
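On Ruby 2.1 and later, a similar cleanup can be sketched with `String#valid_encoding?` and `String#scrub`, which avoids relying on an exception. This is an alternative to the snippet above rather than what the gem does, and `clean_html` is a hypothetical name:

```ruby
# Hypothetical helper: treat the input as UTF-8 and drop invalid bytes.
def clean_html(html)
  text = html.dup.force_encoding('UTF-8')
  text.valid_encoding? ? text : text.scrub('')
end

# \xC3 starts a two-byte UTF-8 sequence that is never completed here.
broken = "caf\xC3 au lait".b
puts clean_html(broken)
```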

Are you interested in handling encoding errors inside the gem? If yes, you can use that code, or I can try to circle back with a PR. If not, no worries, just thought it might be worth considering.

Thanks for a great gem!

tag_border option is being ignored.

The tag_border option can theoretically be used to specify whether tags have a whitespace border or not:

#default:
ReverseMarkdown.convert("markdownFoo")
#no whitespace:
ReverseMarkdown.convert("markdownFoo", tag_border:'')

This option doesn't work at the moment, because:

  • tag_border is only used from cleaner.rb=>tidy
  • The call cleaner.tidy(result) in reverse_markdown.rb happens outside of the config block, so...
  • Inside of cleaner we revert to the default config, ignoring the input tag_border.

The bug can be fixed straightforwardly by moving the cleaner.tidy call into the config block. I'll add a PR in a bit demonstrating the fix.


Justification: Using ReverseMarkdown to parse tricky mixed tags of this form

<a href=\"http://Google.de\"><em><b>Google</b></em><b>.de</b></a>

results in the following output

[_ **Google** _ **.de**](http://Google.de)

The underscore tags aren't parsed correctly by markdown (or by redcarpet) because they're separated from the bold tag by the whitespaces added by the cleaner.

ReverseMarkdown::Cleaner#clean_tag_borders aggressive with cleanup

Found a bug caused by the clean_tag_borders method which does this:

>> require 'reverse_markdown'
=> true
>> ReverseMarkdown.convert('<html><body><a href="http://blog.99.co/wp-content/uploads/2014/04/suomayamuseum_mexico_city.jpg__1072x0_q85_upscale.jpg">test</a></body></html>')
=> " [test](http://blog.99.co/wp-content/uploads/2014/04/suomayamuseum_mexico_city.jpg__1072x0_q85_upscale.jpg)"
>> ReverseMarkdown.convert('<a href="http://blog.99.co/wp-content/uploads/2014/04/tohoku_earthquake_tsunami_japan.jpg__1072x0_q85_upscale.jpg"><img class="alignnone size-full wp-image-160" alt="tohoku_earthquake_tsunami_japan.jpg__1072x0_q85_upscale" src="http://blog.99.co/wp-content/uploads/2014/04/tohoku_earthquake_tsunami_japan.jpg__1072x0_q85_upscale.jpg" width="1072" height="603" /></a>')
=> " [![tohoku_earthquake_tsunami_japan.jpg __1072x0_q85_upscale](http://blog.99.co/wp-content/uploads/2014/04/tohoku_earthquake_tsunami_japan.jpg__ 1072x0_q85_upscale.jpg)](http://blog.99.co/wp-content/uploads/2014/04/tohoku_earthquake_tsunami_japan.jpg__1072x0_q85_upscale.jpg)"

Notice the extra space added between the two __ in the links and hence the link is broken. Looking at a fix now, will submit a pull request if I find something quick.

Github tables without a header row

I have some tables that are coded in HTML without a header row, like this...

<table>
<tr>
<td><b> header 1 </b></td> <td> <b> header 2 </b> </td>
<td> item 3 </td> <td> item 4 </td>
</tr>
</table>
| header 1 | header 2 |
| item 3 | item 4 |

Suggestion: github_flavored should treat the first row as the header row to match the Github markdown spec for tables. The markdown shown above is not being converted back to HTML properly by Redcarpet or the github_markup gem (Ruby). Thanks!
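To make the suggestion concrete: a GitHub-flavored table needs a delimiter row after the header. A minimal sketch that treats the first row as the header, assuming the cells have already been extracted as strings (`gfm_table` is a hypothetical helper, not the gem's table converter):

```ruby
# Hypothetical helper: render rows of cell strings as a GitHub-flavored
# table, using the first row as the header and adding the delimiter row.
def gfm_table(rows)
  header, *body = rows
  lines = [header, header.map { '---' }, *body].map do |cells|
    "| #{cells.join(' | ')} |"
  end
  lines.join("\n")
end

puts gfm_table([['header 1', 'header 2'], ['item 3', 'item 4']])
```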

Text inside backticks is escaped

When I try to convert a <p> element containing text inside backticks, underscores (and asterisks)
are escaped:

ReverseMarkdown.convert("<p>`foo_bar`</p>")
=> "`foo\\_bar`"

Then, when I try to convert back the result with RedCarpet then I get this:

=> "<p><code>foo\\_bar</code></p>\n"

which is inconsistent with the reversed string due to the escaping.
Shouldn't reverse markdown recognise the backticks and skip escaping inside them?
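A minimal sketch of escaping that skips backtick-delimited spans, assuming simple single-backtick code spans without nesting. `escape_outside_code` is a hypothetical helper, not the gem's escaping routine:

```ruby
# Hypothetical helper: escape underscores and asterisks, but leave text
# inside backtick-delimited code spans untouched.
def escape_outside_code(text)
  # With a capture group, split keeps the code spans at the odd indexes.
  text.split(/(`[^`]*`)/).each_with_index.map do |part, index|
    index.odd? ? part : part.gsub(/[_*]/) { |char| "\\#{char}" }
  end.join
end

puts escape_outside_code('`foo_bar` and foo_bar')
```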

Emphasis trailing/leading whitespace

In relation to #37 I found an issue. It seems there is only proper clearing of whitespace with double emphasis:

2.1.1 :025 > ReverseMarkdown.convert '<strong> test </strong>'
 => " **test** " 
2.1.1 :026 > ReverseMarkdown.convert '<em> test </em>'
 => "_ test _"

From reading the cleaner.rb file I see that in most places emphasis is only matched as double asterisks, not single or triple. I suggest that the RegExp be changed to handle this more carefully.

no new line char in code inside <pre> tag

Input

<p>I ran into a weird situation today. Active Record objects stored in vars are removed when I switched from one tenant
    to another on the fly. This will create a weird test-failing scenario and you never know why its happening.</p>
<pre>
  def switch(name)
    yield(name)
  end
  
  @customers = [1,2,3]
  switch('shiva') do |name|
    puts name
    puts @customers
  end
  
  # Gives output
  # shiva
  # 1
  # 2
  # 3
</pre>

<p>&nbsp;</p>
<p>but when I do this</p>

<pre>
  @roles     = Role.all
  @customers = org.branches
  @list      = [1, 2]
  
  puts 'Role count' + @roles.count.to_s
  puts 'Customer count' + @customers.count.to_s
  puts 'list count' + @list.count.to_s
  
  Apartment::Tenant.switch!(org.database_name)
  puts '-------'
  puts 'Role count' + @roles.count.to_s
  puts 'Customer count' + @customers.count.to_s
  puts 'list count' + @list.count.to_s
</pre>

<p>output is</p>

<pre>
  Role count2
  Customer count2
  list count2
  -------
  Role count0
  Customer count0
  list count2
</pre>

<p>&nbsp;</p>
<h2>Reason</h2>

<pre>
  # apartment-2.2.0/lib/apartment/adapters/abstract_adapter.rb
  #   Switch to a new tenant
  #
  #   @param {String} tenant name
  #
  def switch!(tenant = nil)
    run_callbacks :switch do
      return reset if tenant.nil?
  
      connect_to_new(tenant).tap do
        <strong>Apartment.connection.clear_query_cache</strong>
      end
    end
  end
</pre>

is rendered into

I ran into a weird situation today. Active Record objects stored in vars are removed when I switched from one tenant to another on the fly. This will create a weird test-failing scenario and you never know why its happening.

    def switch(name) yield(name) end @customers = [1,2,3] switch('shiva') do |name| puts name puts @customers end # Gives output # shiva # 1 # 2 # 3

&nbsp;

but when I do this

    @roles = Role.all @customers = org.branches @list = [1, 2] puts 'Role count' + @roles.count.to\_s puts 'Customer count' + @customers.count.to\_s puts 'list count' + @list.count.to\_s Apartment::Tenant.switch!(org.database\_name) puts '-------' puts 'Role count' + @roles.count.to\_s puts 'Customer count' + @customers.count.to\_s puts 'list count' + @list.count.to\_s

output is

    Role count2 Customer count2 list count2 ------- Role count0 Customer count0 list count2

&nbsp;

## Reason

    # apartment-2.2.0/lib/apartment/adapters/abstract\_adapter.rb # Switch to a new tenant # # @param {String} tenant name # def switch!(tenant = nil) run\_callbacks :switch do return reset if tenant.nil? connect\_to\_new(tenant).tap do **Apartment.connection.clear\_query\_cache** end end end


Expected

It should have rendered newline chars \n inside <pre> tag where there is no <code> tag inside.
The HTML is extracted from my blog at https://cbabhusal.wordpress.com . WordPress does not put a <code> tag inside the <pre> tag.

Code

ReverseMarkdown.convert(post['content'])

convert error

I get this error when converting HTML to Markdown. Is it an issue?
ArgumentError: negative argument
from /home/case18/.rvm/gems/ruby-2.1.0/gems/reverse_markdown-0.5.0/lib/reverse_markdown/converters/li.rb:22:in `*'
from /home/case18/.rvm/gems/ruby-2.1.0/gems/reverse_markdown-0.5.0/lib/reverse_markdown/converters/li.rb:22:in `indentation_for'
from /home/case18/.rvm/gems/ruby-2.1.0/gems/reverse_markdown-0.5.0/lib/reverse_markdown/converters/li.rb:6:in `convert'
from /home/case18/.rvm/gems/ruby-2.1.0/gems/reverse_markdown-0.5.0/lib/reverse_markdown/converters/base.rb:11:in `treat'
from /home/case18/.rvm/gems/ruby-2.1.0/gems/reverse_markdown-0.5.0/lib/reverse_markdown/converters/base.rb:6:in `block in treat_children'
from /home/case18/.rvm/gems/ruby-2.1.0/gems/nokogiri-1.6.1/lib/nokogiri/xml/node_set.rb:237:in `block in each'
from /home/case18/.rvm/gems/ruby-2.1.0/gems/nokogiri-1.6.1/lib/nokogiri/xml/node_set.rb:236:in `upto'
from /home/case18/.rvm/gems/ruby-2.1.0/gems/nokogiri-1.6.1/lib/nokogiri/xml/node_set.rb:236:in `each'

Improve handling of new lines

Using this HTML:

<p><strong>Some text<br><br>other text</strong></p>

I end up with:

**Some text

other text**

New lines inside emphasis tags should be handled specifically to avoid this problem.

Whitespace before links

When links are directly preceded by non-whitespace characters, a space is added before the link. Hence, a link wrapped in quotes, parens, etc, will come out appearing poorly formatted. Since whitespace is not required before a link in markdown, it shouldn't be added when reversing.

irb(main):004:0> ReverseMarkdown.convert '<p>I like this "<a href="http://daringfireball.net/projects/markdown/">markdown</a>" stuff!</p>'
=> "I like this \" [markdown](http://daringfireball.net/projects/markdown/)\" stuff!\n\n"

Expected:
=> "I like this \"[markdown](http://daringfireball.net/projects/markdown/)\" stuff!\n\n"

An exception may be where a ! precedes the link in the HTML, making the converted markdown an inline image instead of a link.

Blockquotes are assumed to be free of any HTML elements

It looks like <blockquote>s are assumed to be free of any HTML elements. If input like <blockquote><p>...</p></blockquote> is given, the output is broken:

ReverseMarkdown.parse("<blockquote><p>foo</p></blockquote>")
#=> ">\n\n> foo"

i.e.,:

>

> foo

The desired output here is simply

> foo

Likewise, if a <blockquote> contains, say, a list:

ReverseMarkdown.parse("<blockquote><ul><li>foo</li></ul></blockquote>")
#=> ">\n- foo"

i.e.,

>
- foo

The desired output is

> - foo

<blockquote> should probably be unwrapped, then its contents parsed to Markdown, then the final result prepended with >, so that input like:

<blockquote>
<p>Some text.</p>
<p>Some more text.</p>
</blockquote>

Results in:

> Some text.
>
> Some more text.
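The last step of that proposal, prefixing already-converted Markdown with the quote marker, can be sketched as follows. It assumes the blockquote's children have been converted to Markdown beforehand; `quote_markdown` is a hypothetical helper, not the gem's blockquote converter:

```ruby
# Hypothetical helper: prefix converted Markdown with "> " so that nested
# paragraphs and lists stay inside the quote. Blank lines become a bare ">".
def quote_markdown(markdown)
  markdown.strip.lines.map do |line|
    line.strip.empty? ? ">\n" : "> #{line.chomp}\n"
  end.join
end

puts quote_markdown("Some text.\n\nSome more text.\n")
puts quote_markdown("- foo\n")
```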

Blockquote not correctly closed

Hi I have the following HTML:

<blockquote>  This is a quote </blockquote>

<img alt="alt" src="https://path/to/file.jpg" />

The generated markdown for this is

> This is a quote ![alt](https://path/to/file.jpg)

Somehow the newline before the image is missing after a blockquote. It happens for all blockquote elements I tested that were followed by an image. If there is text after the quote everything works fine.

I'm using reverse_markdown version 1.0.3 and ruby 2.3.1p112
