springload / draftjs_exporter

Convert Draft.js ContentState to HTML

Home Page: https://www.draftail.org/blog/2018/03/13/rethinking-rich-text-pipelines-with-draft-js

License: MIT License

Python 97.73% Makefile 0.84% Shell 1.30% JavaScript 0.13%

draft-js python exporter draftjs-exporter draftail rich-text

draftjs_exporter's Issues

Improve dependencies compatibility definition, and testing

Raised by @loicteixeira in https://github.com/springload/draftjs_exporter/pull/74/files#r135967185. Opening as a separate issue because it's worth discussing and there is no point in holding that PR for this.

At the moment, draftjs_exporter defines its dependencies as:

From draftjs_exporter/setup.py (lines 15 to 22 at commit c963514):

dependencies = [
    'beautifulsoup4>=4.4.1,<5',
    'html5lib>=0.999,<=1.0b10',
]

lxml_dependencies = [
    'lxml>=3.6.0',
]

Those ranges are purposefully broad (we want to support as many versions as possible for something as fundamental to people's tech stacks), which is good, but as @loicteixeira puts it, it would then make sense to test accordingly, at least with the lower and upper bounds.

More info to help with the decision:

lxml 3.6.0 is the version I used locally. There now seems to be a 3.8.0 (https://pypi.python.org/pypi/lxml), so it might make sense to add an upper bound in setup.py to limit installed versions to below 4.0.0?
The BeautifulSoup4 / html5lib versions are loosely aligned with requirements from Wagtail: https://github.com/wagtail/wagtail/blob/a7820a2e903a560d6ceb73062ab9ba922f23b0dc/setup.py#L30-L31

    "beautifulsoup4>=4.5.1",
    "html5lib>=0.999,<1",

Finally, bear in mind that our usage of the APIs of those dependencies is very small (HTML string -> DOM nodes conversion, DOM nodes -> HTML string conversion, create nodes, append child to node), which means that the potential breakage would only likely be in how those engines handle specific content, which is hard to test for. We do however have a small test suite of "potential engine quirks": https://github.com/springload/draftjs_exporter/blob/master/tests/engines/test_engines_differences.py
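One way to test the lower bounds is a dedicated CI environment that installs the oldest allowed version of each dependency. The sketch below derives those pins from the requirement strings in setup.py. `lower_bound_pins` is a hypothetical helper, and it assumes every requirement has a plain `>=` lower bound (no extras or environment markers).

```python
import re


def lower_bound_pins(requirements):
    """Turn specifiers like 'name>=X,<Y' into exact 'name==X' pins.

    Hypothetical helper for a "lowest versions" test environment.
    Assumes each requirement has a '>=' lower bound and no extras or markers.
    """
    pins = []
    for requirement in requirements:
        # Split the project name from the comma-separated version clauses.
        match = re.match(r'^([A-Za-z0-9_.\-]+)\s*(.*)$', requirement)
        name, specifier = match.group(1), match.group(2)
        lower = None
        for clause in specifier.split(','):
            clause = clause.strip()
            if clause.startswith('>='):
                lower = clause[2:].strip()
        if lower is None:
            raise ValueError('No lower bound in %r' % requirement)
        pins.append('%s==%s' % (name, lower))
    return pins


dependencies = ['beautifulsoup4>=4.4.1,<5', 'html5lib>=0.999,<=1.0b10']
lxml_dependencies = ['lxml>=3.6.0']

# These lines could be written to a pip constraints file (pip install -c ...).
print('\n'.join(lower_bound_pins(dependencies + lxml_dependencies)))
```

The same approach could produce an "upper bounds" environment, although open-ended ranges like `lxml>=3.6.0` would then simply resolve to the latest release.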

Entity renderers should be given more data on render

Follow-up to #87. At the moment, entity components are only given a small amount of data about an entity for rendering:

From draftjs_exporter/draftjs_exporter/entity_state.py (lines 48 to 58 at commit 764d377):

props = entity_details['data'].copy()
props['entity'] = {
    'type': entity_details['type'],
}

nodes = DOM.create_element()
for n in self.element_stack:
    DOM.append_child(nodes, n)

elt = DOM.create_element(opts.element, props, nodes)

There are a few more problems here:

The shape of the props is different from how the entity is stored in the Draft.js entityMap.
The entity data is given as the top-level props, which makes it impossible to add extra props without risking overwriting the entity data (say if the entity has a field called type that does something different from the entity type).

We should refactor this to something like:

props = {}
props['entity'] = entity_details
props['entity']['key'] = key
props['block'] = block
props['blocks'] = blocks
# (Potentially the `entity_range` as well?)

This way, it's easy to add arbitrary props without risking overwrites. The components would get full access to the entity data, mutability, type, and key, as well as to the current block and the list of blocks (like styles, decorators, and blocks).

Compared to the changes introduced in #90, this would be a breaking change, so should be considered carefully. Users of the exporter will have to rewrite their renderers slightly to use the new props shape.
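A minimal sketch of the proposed props shape, using plain dictionaries rather than the exporter's internals. `build_entity_props` is hypothetical. Unlike the refactoring snippet above, it copies `entity_details` before adding the key, so the entry in the `entityMap` is not mutated as a side effect.

```python
def build_entity_props(entity_details, key, block, blocks):
    """Hypothetical helper: nest the entity under 'entity' instead of spreading its data."""
    # Shallow copy so adding 'key' does not mutate the entityMap entry.
    entity = dict(entity_details)
    entity['key'] = key
    return {
        'entity': entity,
        'block': block,
        'blocks': blocks,
    }


entity_map = {
    '0': {
        'type': 'LINK',
        'mutability': 'MUTABLE',
        # A data field named 'type' no longer collides with the entity type.
        'data': {'url': 'https://example.com', 'type': 'external'},
    },
}
block = {
    'key': 'a1b2c',
    'text': 'Example',
    'type': 'unstyled',
    'entityRanges': [{'offset': 0, 'length': 7, 'key': 0}],
}

props = build_entity_props(entity_map['0'], '0', block, [block])
# Extra props can be added without overwriting any entity data.
props['children'] = 'Example'

print(props['entity']['type'], props['entity']['data']['type'])
```

A renderer would then read `props['entity']['data']['url']` instead of `props['url']`, which is the breaking part of the change.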

Block-level data support

I started using Draft.js and discovered a feature called 'block-level data'. Via Modifier#setBlockData, you can set additional metadata on a specific block.

In that case the generated JSON looks like this:

{  
   "entityMap": {},
   "blocks": [  
      ...
      {  
         "key": "a4dpo",
         "text": "asdfasdfsadfatsetsd",
         "type": "direct-speech",
         "depth": 0,
         "inlineStyleRanges": [],
         "entityRanges": [],
         "data": {  
            "name": "test"
         }
      },
      ...
   ]
}

Is there a way to render a block differently depending on these attribute values?
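A sketch of how a block component could use that data, assuming block components receive `props['block']['data']` the same way the block_map fallback does (see the fallback props quoted in the style_map issue further down). `direct_speech_attributes` and the `data-speaker` attribute are hypothetical; the helper only computes the attributes so it can be shown without the exporter installed.

```python
def direct_speech_attributes(props):
    """Hypothetical helper: derive element attributes from block-level data.

    Assumes props['block']['data'] holds the block's "data" field. In a
    block_map component, the result would be used with something like:
        DOM.create_element('blockquote', direct_speech_attributes(props), props['children'])
    """
    data = props['block'].get('data') or {}
    attributes = {'class': 'direct-speech'}
    if data.get('name'):
        # Hypothetical attribute exposing the speaker stored in block data.
        attributes['data-speaker'] = data['name']
    return attributes


props = {
    'block': {'type': 'direct-speech', 'depth': 0, 'data': {'name': 'test'}},
    'children': 'asdfasdfsadfatsetsd',
}
print(direct_speech_attributes(props))
```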

Build a demo with a live editor to facilitate testing & showcase

For example, using create-react-app and Flask.

https://github.com/facebookincubator/create-react-app/blob/master/packages/react-scripts/template/README.md#integrating-with-a-node-backend

Deployed on Heroku

Nested items are not inserted in the right wrapper when lowering depth and increasing it again

Failing test case, from draftjs_exporter/tests/test_output.py (line 497 at commit d84f436):

def test_render_with_backtracking_nested_wrapping(self):

[
    {
        'text': 'Backtracking, two at once... (2)',
        'depth': 2,
    },
    {
        'text': 'Uh oh (1)',
        'depth': 1,
    },
    {
        'text': 'Up, up, and away! (2)',
        'depth': 2,
    },
    {
        'text': 'Arh! (1)',
        'depth': 1,
    },
    {
        'text': 'Did this work? (0)',
        'depth': 0,
    },
]

Should be:

[...]
- [...]
  - 2 A
- 1 B
  - 2 C
- 1 D
0 E

Currently renders as (C is not in the right spot):

[...]
- [...]
  - 2 A
  - 2 C
- 1 B
- 1 D
0 E
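One way to get the expected nesting is to keep a stack of the currently open items and, for each block, close every item at the same depth or deeper before attaching it. The sketch below models blocks as `(text, depth)` pairs and is independent of the exporter's `wrapper_state` code. `nest_by_depth` and `outline` are hypothetical helpers, and the two leading items stand in for the ones elided as `[...]` above.

```python
def nest_by_depth(items):
    """Build a tree from (text, depth) pairs using a stack of open nodes."""
    root = {'text': None, 'children': []}
    stack = [(-1, root)]  # (depth, node); the root sits above depth 0
    for text, depth in items:
        node = {'text': text, 'children': []}
        # Close every open node at the same depth or deeper.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1]['children'].append(node)
        stack.append((depth, node))
    return root['children']


def outline(nodes, level=0):
    """Render the tree as indented lines, two spaces per level."""
    lines = []
    for node in nodes:
        lines.append('  ' * level + node['text'])
        lines.extend(outline(node['children'], level + 1))
    return lines


blocks = [
    ('Root (0)', 0),
    ('Parent (1)', 1),
    ('A (2)', 2),
    ('B (1)', 1),
    ('C (2)', 2),
    ('D (1)', 1),
    ('E (0)', 0),
]
print('\n'.join(outline(nest_by_depth(blocks))))
```

C ends up under B because B closes A's subtree when it is attached. As a side effect, the stack also tolerates depth jumps (0 to 2) by attaching the deeper item to the nearest shallower one.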

Support the "decorator" feature?

Hi there,

I'm working on integrating this library into my own project. It works really nicely, great work, thanks a lot.

Is there a plan to support decorator rendering? It would be quite useful.

Publish as a wheel – and hash issues with newly-published distributions on existing releases v2.1.7, v2.1.6, v2.1.5

Edit: 🚧 For hash issues since the package got published as a wheel – see comments below.

It looks like the exporter isn’t using wheels as its published format. We should switch over to wheels, which have a number of advantages as described on https://pythonwheels.com/.

It’s not clear to me whether switching from eggs to wheels is a breaking change or not, so the first step would be to research this and decide what to do.

Nested blocks at depth 0 should use the same wrapper

List item
- Nested list item
De-nested list item

becomes:

<ul class="bullet-list"><li>List item<ul class="bullet-list"><li>Nested list item</li></ul></li></ul>
<ul class="bullet-list"><li>De-nested list item</li></ul>

It should be a single ul (same wrapper) with two depth-0 items.

Gut feel code to investigate: https://github.com/springload/draftjs_exporter/blob/master/draftjs_exporter/wrapper_state.py#L114
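A minimal sketch of the expected behaviour, rendering `(text, depth)` pairs as nested lists and keeping a single outer wrapper when the depth goes back to 0. `render_list` is hypothetical and does not mirror how `wrapper_state` works internally; it rejects depth jumps of more than one level and inserts text without escaping to stay short.

```python
def render_list(items, css_class='bullet-list'):
    """Render (text, depth) pairs as nested <ul> lists with one outer wrapper."""
    html = []
    open_depth = -1  # depth of the innermost open <ul>
    for text, depth in items:
        if depth > open_depth + 1:
            raise ValueError('Depth jumps of more than one level are not supported')
        if depth > open_depth:
            # One level deeper: open a nested list inside the current <li>.
            html.append('<ul class="%s">' % css_class)
            open_depth = depth
        else:
            # Same depth or shallower: close deeper lists, then the previous item.
            while open_depth > depth:
                html.append('</li></ul>')
                open_depth -= 1
            html.append('</li>')
        html.append('<li>%s' % text)
    # Close every list that is still open.
    while open_depth >= 0:
        html.append('</li></ul>')
        open_depth -= 1
    return ''.join(html)


print(render_list([
    ('List item', 0),
    ('Nested list item', 1),
    ('De-nested list item', 0),
]))
```

Returning to depth 0 only closes the nested list and the previous `<li>`, so "De-nested list item" lands in the same outer `<ul>`.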

Drop support for Python 2.7

There are no plans to do this immediately; I'm just opening this early so people interested in this project have a fair heads-up.

I'm looking into the Python version support of the exporter as part of #101. Although that issue isn't anything like a pressing feature / bug to address, I think it's important to be clear that the exporter isn't going to support old versions of Python forever.

Python 2.7 will officially stop being supported as a language on January 1st 2020 (PEP-373), 8 months from now (https://pythonclock.org/). The exporter should also drop Python 2.7 support then, if not before, like many other projects (https://python3statement.org/). The exporter is relatively stable, so there is no need to do this sooner than needed, either in 2020 to align with other projects in the Python ecosystem, or earlier if there is important work done on the exporter and it feels like Python 2.7 support is a hindrance.

There are many people relying on this project and using Python 2.7 at the moment, here are the pip installs over the last 30 days:

| python_version | download_count |
| -------------- | -------------- |
| 3.6            |         18,249 |
| 2.7            |         11,815 |
| 3.7            |          8,106 |
| 3.5            |          3,112 |
| 3.4            |            384 |
| 3.3            |              4 |
| 3.8            |              3 |
| 2.6            |              1 |
| Total          |         41,674 |

If you're one of them, it would be good to hear from you! Please be assured that if upgrading Python versions isn't an option for you, older versions of the exporter will still be available and keep on working. Considering how stable this project is, they'll also most likely keep on being relevant.

Add benchmarking tests with real-world content

At the moment the exports test suite is used to measure performance. While the content in it is significant, it's not really representative of real-world workloads.

Once someone has an open-source site using this, I would be keen to dump all of its content in this repository so we have more relevant numbers to look at.
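Once real-world content is available, a harness along these lines could time the exporter across the whole dump. It's a sketch: `benchmark` is a hypothetical helper, and in practice `render` would be something like `HTML(config).render`, with the content states loaded from JSON files.

```python
import time


def benchmark(render, content_states, repeat=5):
    """Return the best total time (in seconds) to render every content state.

    `render` is any callable taking a content state. Taking the minimum of
    several runs reduces noise from other processes.
    """
    best = float('inf')
    for _ in range(repeat):
        start = time.perf_counter()
        for content_state in content_states:
            render(content_state)
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in renderer so the sketch runs without the exporter installed.
content_states = [{'entityMap': {}, 'blocks': [{'text': 'Test'}]}] * 100
elapsed = benchmark(lambda cs: ''.join(b['text'] for b in cs['blocks']), content_states)
print('%.4f s' % elapsed)
```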

DOM class variable issue with multiple engines (draftjs_exporter_markdown)

First of all, I would like to thank you; I'm a happy user of this library.
I'm using it with the Markdown engine (draftjs_exporter_markdown).
Because dom on draftjs_exporter.dom.DOM is a class-level variable, creating a second HTML exporter with a different engine changes the engine used by the first one, depending on the order in which they are created.
Furthermore, since I'm using this on a web server, this can cause bigger problems in a multithreaded environment.
This is not specific to the Markdown engine: the 3 engines built into draftjs_exporter have the same problem.

Could you solve this problem?
The test code and results are attached below.

from draftjs_exporter.dom import DOM
from draftjs_exporter.html import HTML
from draftjs_exporter.constants import ENTITY_TYPES
from draftjs_exporter_markdown import ENGINE as MARKDOWN_ENGINE

template = {
    'blocks': [
        {'key': 'rrwx',
         'type': 'unstyled',
         'text': '<a href="https://www.google.com">google.com</a>',
         'depth': 0,
         'inlineStyleRanges': [],
         'entityRanges': [{'offset': 9, 'length': 22, 'key': 0}],
         'data': {}}],
    'entityMap': {'0': {'type': 'LINK',
                        'mutability': 'MUTABLE',
                        'data': {'url': 'https://www.google.com'}}}}

html_exporter = HTML({
    'entity_decorators': {
        ENTITY_TYPES.LINK: lambda props: DOM.create_element('a', {
            'href': props['url']
        }, props['children']),
    },
    'engine': DOM.LXML
})
html1 = html_exporter.render(template)

markdown_exporter = HTML({
    'engine': MARKDOWN_ENGINE
})
# Render with html_exporter again, after markdown_exporter was created.
html2 = html_exporter.render(template)

print(html1)
print(html2)

output:

<p>&lt;a href="<a href="https://www.google.com">https://www.google.com</a>"&gt;google.com&lt;/a&gt;</p>
<p><a href="<a href="https://www.google.com">https://www.google.com</a>">google.com</a></p>
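The root cause can be reproduced without the exporter. The sketch below is a simplified model, not the library's code: `ClassLevelExporter` stores the engine on the class the way `DOM.dom` is stored, while `InstanceLevelExporter` shows the hypothetical fix of keeping the engine on each instance, so that creating one exporter cannot change another's engine.

```python
class ClassLevelExporter(object):
    """Simplified model of the reported bug: the engine lives on the class."""
    engine = None

    def __init__(self, engine):
        # Assigning on the class changes the engine for every instance.
        type(self).engine = engine

    def render(self, text):
        return '%s:%s' % (self.engine, text)


class InstanceLevelExporter(object):
    """Hypothetical fix: each exporter keeps its own engine."""

    def __init__(self, engine):
        self.engine = engine

    def render(self, text):
        return '%s:%s' % (self.engine, text)


html_exporter = ClassLevelExporter('lxml')
markdown_exporter = ClassLevelExporter('markdown')
print(html_exporter.render('x'))  # markdown:x, the HTML exporter lost its engine

html_exporter = InstanceLevelExporter('lxml')
markdown_exporter = InstanceLevelExporter('markdown')
print(html_exporter.render('x'))  # lxml:x
```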

Add configuration options to determine handling of missing components

Copy/pasting @loicteixeira's point about the current exceptions:

Something to consider as well is whether the renderer should crash when it encounters a component it doesn't know about, or if it should ignore it. Maybe a setting in draftjs_exporter like RAISE_MISSING_BLOCK_ERRORS, which we will probably set to the same value as DEBUG?
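A sketch of how such a setting could be applied when looking up a component. Everything here is hypothetical: `resolve_component`, `MissingComponentError` and the `raise_missing` flag (which would be fed from something like `RAISE_MISSING_BLOCK_ERRORS`) are not part of the exporter's API.

```python
class MissingComponentError(Exception):
    """Hypothetical stand-in for the exporter's own exception types."""


def resolve_component(component_map, type_, fallback=None, raise_missing=False):
    """Look up a component, raising or falling back when the type is unknown.

    - Known types return their component.
    - Unknown types raise when raise_missing is True (e.g. tied to DEBUG),
      otherwise return the fallback, or None to skip rendering.
    """
    if type_ in component_map:
        return component_map[type_]
    if raise_missing:
        raise MissingComponentError('No component for type %r' % type_)
    return fallback


block_map = {'unstyled': 'p', 'header-two': 'h2'}
print(resolve_component(block_map, 'header-two'))
print(resolve_component(block_map, 'pull-quote', fallback='div'))
```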

style_map components should be given data on render

At the moment, style_map components do not receive any data beyond the text to style (as props['children']).

From draftjs_exporter/draftjs_exporter/style_state.py (lines 26 to 32 at commit 209631a):

def render_styles(self, text_node):
    node = text_node
    if not self.is_empty():
        # Nest the tags.
        for s in sorted(self.styles, reverse=True):
            opt = Options.for_style(self.style_map, s)
            node = DOM.create_element(opt.element, opt.props, node)

This is ok for common use cases (BOLD, ITALIC, etc), but it makes the style_map fallback rather useless – there is no way to know what style needs the fallback, or have any other information about the context to adjust the fallback behavior.

Here's what the block_map fallback has access to for comparison:

props['block'] = {
        'type': type_,
        'depth': depth,
        'data': data,
}

In retrospect I think this could've been all of the block's attributes, not just a cherry-picked shortlist, so for inline styles we could pass the following exhaustive props:

{
    # The style range to render.
    "range": {
        "offset": 10,
        "length": 17,
        "style": "BOLD"
    },
    # The full block data, e.g.:
    "block": {
        "key": "t7k7",
        "text": "Unstyled test test test test test",
        "type": "unstyled",
        "depth": 0,
        "inlineStyleRanges": [
            {
                "offset": 10,
                "length": 17,
                "style": "BOLD"
            }
        ],
        "entityRanges": [
            {
                "offset": 0,
                "length": 4,
                "key": 6
            }
        ],
        "data": {}
    },
}

Here's the approximate change:

-    def render_styles(self, text_node):
+    def render_styles(self, text_node, block):
         node = text_node
         if not self.is_empty():
             # Nest the tags.
             for s in sorted(self.styles, reverse=True):
                 opt = Options.for_style(self.style_map, s)
+                props = dict(opt.props or {})
+                props['block'] = block
+                props['range'] = s
-                node = DOM.create_element(opt.element, opt.props, node)
+                node = DOM.create_element(opt.element, props, node)

         return node

Ideally I'd like entities and blocks to also be given more data (enough data to recreate the whole ContentState, thus making the exporter usable to create content migrations), but that's a separate issue.
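To show what a fallback could do with those props, here is a sketch of a hypothetical style_map fallback helper. It assumes `props['range']` and `props['block']` have the proposed shape (the range as an offset/length/style dict, the full block dict); the `missing-style` class and `data-*` attribute names are made up for illustration. A real fallback would pass the result to `DOM.create_element` along with `props['children']`.

```python
def style_fallback_attributes(props):
    """Hypothetical style_map fallback helper using the proposed props shape."""
    style_range = props['range']
    start = style_range['offset']
    end = start + style_range['length']
    return {
        'class': 'missing-style',
        # The fallback can now tell which style it is standing in for...
        'data-style': style_range['style'].lower(),
        # ...and where it sits in the block.
        'data-block-key': props['block']['key'],
        'data-range': '%d-%d' % (start, end),
    }


props = {
    'range': {'offset': 10, 'length': 17, 'style': 'BOLD'},
    'block': {'key': 't7k7', 'text': 'Unstyled test test test test test', 'type': 'unstyled'},
    'children': 'est test test tes',
}
print(style_fallback_attributes(props))
```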

Create a html5lib DOM backing engine without using BeautifulSoup

I do not think there is a reason for using BeautifulSoup on this project, apart from the API being documented.

Presumably, using html5lib's tree builder directly would result in performance gains. I'm not sure if there are tradeoffs.

How to interlink with mathjax to export math equations HTML

"entityMap": {
        "0": {
            "type": "INLINE-EQUATION",
            "mutability": "IMMUTABLE",
            "data": {"text": "u=1/5, and v=3/5"}
        }
    }

I have tried something like this, but it's not working:

"entity_decorators": {
        "INLINE-EQUATION": lambda props: DOM.create_element(
            "span", {"class": "mjx-process", 'data-inline-math': props['text']}
        ),
 }

I need to generate math HTML! Can anyone help me with this?

How to apply text align inline style inside header tag?

Example block:

 {
            "key": "cu1e2",
            "data": {},
            "text": "test",
            "type": "header-three",
            "depth": 0,
            "entityRanges": [],
            "inlineStyleRanges": [
                {
                    "style": "right",
                    "length": 4,
                    "offset": 0
                }
            ]
        },

Use Django-style loading via path for custom backing engines

It would be handy to allow custom engines to be loaded via path (and not only by passing the class) with django.utils.module_loading.import_string.

So that we don't tie this project to Django, here is the import_string source code, which only relies on six.
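On Python 3, the same idea only needs the standard library. A minimal sketch in the spirit of Django's `import_string`, not a copy of it. How the exporter would tell a path from a built-in engine name (for example, by checking for a `.` in the string, since 'html5lib', 'lxml' and 'string' contain none) is an assumption left open here.

```python
from importlib import import_module


def import_string(dotted_path):
    """Import an attribute or class from a dotted path like 'package.module.Name'."""
    try:
        module_path, attribute_name = dotted_path.rsplit('.', 1)
    except ValueError:
        raise ImportError('%s does not look like a module path' % dotted_path)
    module = import_module(module_path)
    try:
        return getattr(module, attribute_name)
    except AttributeError:
        raise ImportError('Module %r has no attribute %r' % (module_path, attribute_name))


# With a hypothetical custom engine, the config could read:
# HTML({'engine': 'myproject.engines.MyEngine'})
print(import_string('collections.OrderedDict'))
```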

Create a new dependency-free DOM backing engine

Make a dependency-free implementation of DOMEngine based on xml.etree.ElementTree or xml.etree.cElementTree. This might have a positive performance impact, as well as facilitating the use of the exporter.

For reference, here is my tentative implementation of an engine using ElementTree. For some reason it outputs wrappers twice, I would suspect a bug in wrapper_state that this particular implementation surfaces.

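import re
from xml.etree import ElementTree as etree

# Assumed location of the DOMEngine base class.
from draftjs_exporter.engines.base import DOMEngine


class DOM_ETREE(DOMEngine):
    """
    xml.etree.ElementTree implementation of the DOM API.
    """
    @staticmethod
    def create_tag(type_, attr=None):
        if not attr:
            attr = {}

        return etree.Element(type_, attrib=attr)

    @staticmethod
    def parse_html(markup):
        # Not implemented yet.
        pass

    @staticmethod
    def append_child(elt, child):
        if hasattr(child, 'tag'):
            elt.append(child)
        else:
            # Wrap raw text in a temporary <fragment> element, stripped on render.
            c = etree.Element('fragment')
            c.text = child
            elt.append(c)

    @staticmethod
    def render(elt):
        # encoding='unicode' returns str rather than bytes on Python 3.
        return re.sub(r'(</?fragment>)', '', etree.tostring(elt, encoding='unicode', method='html'))

    @staticmethod
    def render_debug(elt):
        return etree.tostring(elt, encoding='unicode', method='html')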

Edit: once implemented, this could become the default engine so people can more easily "choose their own adventure" with any other engine, but still have a working default when doing pip install draftjs_exporter.

Update build environments to latest Python version

At the moment the project is tested on Python 2.7, 3.4, 3.5. We need to update this to test Python 3.6 (or whatever the latest version is when we get to this task).

Also, I'm very keen on limiting the number of environments we test on. Would it be reasonable to:

Only build on Python 2.7, and the oldest supported Python 3 version (3.4) and the latest, not all intermediary versions?
Only build on Python 2.7, and the latest Python 3 version?

Experiment with PEP-484 type hints

Since Pyre got released, I've been thinking the exporter would be a good project to have type annotations in, either with Pyre or Mypy.

Corresponding PEPs:

Also worth checking out:

As far as I understand, in order to release anything useful, PEP-3107 is the bare minimum, and PEP-484 is a good baseline, so Python 3.5+ only. This means this package would need to drop support for Python 2.7 and 3.4. Wagtail, the main project relying on this package, has already dropped Python 2.7 compatibility, and the last version of Django to support Python 3.4 was v2.0 (https://docs.djangoproject.com/en/dev/faq/install/)

Edit: ^ I might be wrong, since the annotation syntax is supported starting with Python 3 the package should work in versions below 3.5. But type checking will only be doable starting in v3.5+?

This means that starting when Wagtail makes a new release without Django 2.0 support (or without Python 3.4 support, if that comes first), it will be possible to release the exporter with type annotations included (some time in 2019, see https://docs.wagtail.io/en/latest/releases/upgrading.html).

If anyone wants to experiment with this in the meantime, I would be interested to see what bugs this would surface. In my opinion the first step would be to use https://github.com/dropbox/pyannotate. I think there is a similar project from Google that does annotations based on instrumentation of running code.
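To give an idea of what annotations could look like, here is a typed re-implementation of a small helper in the spirit of `DOM.camel_to_dash` (a sketch, not the exporter's current code), plus a hypothetical `Props` alias for component props. Running a checker such as mypy over a file like this would verify call sites against the annotations; the `typing` module requires Python 3.5+.

```python
import re
from typing import Any, Dict

# Hypothetical alias for the props dictionaries passed to components.
Props = Dict[str, Any]


def camel_to_dash(camel_cased_str: str) -> str:
    """Convert camelCase names to dash-case, e.g. dataEntityType -> data-entity-type."""
    # Dash before capitalised words, then before capitals following a lowercase letter or digit.
    with_dashes = re.sub(r'(.)([A-Z][a-z]+)', r'\1-\2', camel_cased_str)
    dashed_case_str = re.sub(r'([a-z0-9])([A-Z])', r'\1-\2', with_dashes).lower()
    return dashed_case_str.replace('--', '-')


def dash_case_attributes(props: Props) -> Dict[str, Any]:
    """Rename every camelCase key of a props dict to dash-case."""
    return {camel_to_dash(name): value for name, value in props.items()}


print(dash_case_attributes({'dataEntityType': 'LINK', 'ariaLabel': 'Link'}))
```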

Update example.py to use full range of content features from Draftail

Provide readme in reStructuredText format for pypi

The PyPI project page https://pypi.org/project/draftjs_exporter/ doesn't look nice, because PyPI requires the package README in reStructuredText format, not Markdown.

Exporter loads the html5lib engine even if it isn't used

At the moment, the exporter loads the html5lib engine even if it is configured to use another one. The impact is 11.6MB of memory taken for nothing:

Line #    Mem usage    Increment   Line Contents
================================================
     8     11.8 MiB      0.0 MiB   @profile
     9                             def run():
    10     23.4 MiB     11.6 MiB       from draftjs_exporter.html import HTML
    11
    12     23.4 MiB      0.0 MiB       HTML({'engine': 'string'})
[...]

Small demo script:

from memory_profiler import profile


@profile
def run():
    from draftjs_exporter.html import HTML

    HTML({'engine': 'string'}).render({
        'entityMap': {},
        'blocks': [
            {
                "key": "2nols",
                "text": "Test",
                "type": "unstyled",
                "depth": 0,
                "inlineStyleRanges": [],
                "entityRanges": [],
                "data": {}
            },
        ]
    })


run()

This could be a relatively simple fix (see patch below) if DOM.use was used consistently, but there are many tests that aren't calling it explicitly (and thus implicitly rely on the html5lib default). I would suggest to wait for #79 to be taken care of, so we don't have to update all those tests and they can then rely on the new default.

--- a/draftjs_exporter/dom.py
+++ b/draftjs_exporter/dom.py
@@ -3,7 +3,6 @@ from __future__ import absolute_import, unicode_literals
 import inspect
 import re

-from draftjs_exporter.engines.html5lib import DOM_HTML5LIB
 from draftjs_exporter.error import ConfigException

 # Python 2/3 unicode compatibility hack.
@@ -28,7 +27,7 @@ class DOM(object):
     LXML = 'lxml'
     STRING = 'string'

-    dom = DOM_HTML5LIB
+    dom = HTML5LIB

     @staticmethod
     def camel_to_dash(camel_cased_str):
@@ -37,7 +36,7 @@ class DOM(object):
         return dashed_case_str.replace('--', '-')

     @classmethod
-    def use(cls, engine=DOM_HTML5LIB):
+    def use(cls, engine=HTML5LIB):
         """
         Choose which DOM implementation to use.
         """
@@ -45,6 +44,7 @@ class DOM(object):
             if inspect.isclass(engine):
                 cls.dom = engine
             elif engine.lower() == cls.HTML5LIB:
+                from draftjs_exporter.engines.html5lib import DOM_HTML5LIB
                 cls.dom = DOM_HTML5LIB
             elif engine.lower() == cls.LXML:
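A more general version of the same idea is to register loader functions and only call them on first use. `LazyEngineRegistry` is a hypothetical sketch, not part of the exporter; the import path in `load_html5lib` is the one from the patch above.

```python
class LazyEngineRegistry(object):
    """Hypothetical registry that only imports an engine when it is first used."""

    def __init__(self):
        self._loaders = {}
        self._engines = {}

    def register(self, name, loader):
        # `loader` is a zero-argument function that performs the import.
        self._loaders[name.lower()] = loader

    def get(self, name):
        name = name.lower()
        if name not in self._engines:
            # First use: run the loader (and its import), then cache the result.
            self._engines[name] = self._loaders[name]()
        return self._engines[name]


def load_html5lib():
    # The import is deferred until the html5lib engine is actually requested.
    from draftjs_exporter.engines.html5lib import DOM_HTML5LIB
    return DOM_HTML5LIB


engines = LazyEngineRegistry()
engines.register('html5lib', load_html5lib)  # nothing is imported yet
```

With this pattern, `HTML({'engine': 'string'})` would never pay the 11.6 MiB cost of importing html5lib.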

Unicode decode error installing on Ubuntu 16.04

pip 10.0.1 from /usr/local/lib/python3.5/dist-packages/pip (python 3.5)

  Downloading https://files.pythonhosted.org/packages/6d/e5/fcf88a5dab82ca619ff6824a062b46d9315ba91e64204275213a1a712125/draftjs_exporter-2.1.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-rmsz5l7l/draftjs-exporter/setup.py", line 91, in <module>
        long_description=md2pypi('README.md'),
      File "/tmp/pip-install-rmsz5l7l/draftjs-exporter/setup.py", line 59, in md2pypi
        content = io.open(filename).read()
      File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1727: ordinal not in range(128)

Any chance for a version update to master that doesn't run the markdown conversion in the setup.py script?
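The traceback shows `io.open(filename)` falling back to the system's default encoding, which is ASCII here; byte 0xe2 starts a multi-byte UTF-8 character such as a curly quote or an arrow. A minimal sketch of reading the README with an explicit encoding instead. `read_text` is a hypothetical name; `io.open` accepts `encoding` on both Python 2.7 and Python 3.

```python
import io


def read_text(filename):
    """Read a text file as UTF-8 regardless of the system locale."""
    with io.open(filename, encoding='utf-8') as f:
        return f.read()


# In setup.py, e.g.: long_description=read_text('README.md')
```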

Update HTML.render to handle None content state

The value for an empty field will change to None (instead of an empty dictionary). The renderer has to be updated accordingly.

Note: Corresponding changes will need to happen in springload/draftail and springload/wagtaildraftail.
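A minimal sketch of the guard, assuming it runs at the start of `HTML.render`. `normalize_content_state` is a hypothetical name; the renderer could equally return an empty string straight away.

```python
def normalize_content_state(content_state):
    """Treat a None content state (empty field) like an empty one."""
    if content_state is None:
        # A fresh dict each time, so callers can't mutate a shared default.
        return {'blocks': [], 'entityMap': {}}
    return content_state


print(normalize_content_state(None))
```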

Add API to facilitate the creation of custom decorators

Is it possible to export Draft.js state as plain text (i.e., without any HTML tags)?

Is there a built-in function to do it?

If not, is it possible to set up the configuration to do it?

Thanks!
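Independently of the exporter, the raw content state already stores each block's text without markup, so plain text can be obtained by joining the `text` of every block. A minimal sketch; `to_plain_text` is hypothetical, not a built-in of the exporter, and it simply drops list nesting, block types and entity data.

```python
def to_plain_text(content_state, separator='\n'):
    """Join the text of every block, ignoring styles, entities and block types."""
    return separator.join(block['text'] for block in content_state.get('blocks', []))


content_state = {
    'entityMap': {},
    'blocks': [
        {'key': 'a1', 'text': 'Hello', 'type': 'header-two', 'depth': 0},
        {'key': 'b2', 'text': 'world', 'type': 'unstyled', 'depth': 0},
    ],
}
print(to_plain_text(content_state))
```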

Write blog post about Draft.js content model and reference it in docs

Could be an announcement blog post for 1.0.0 as well.

Nested inline style configuration question

In the https://www.draftail.org/ documentation I searched for: nested inline style.
In the issues / pull requests, I searched for: nested inline style
In Stack Overflow, I searched for: draftjs exporter nested inline style

I have this block:

            {
                "key": "",
                "text": "Bold Italic Underline",
                "type": "unstyled",
                "depth": 0,
                "inlineStyleRanges": [
                    {"offset": 0, "length": 21, "style": "BOLD"},
                    {"offset": 5, "length": 16, "style": "ITALIC"},
                    {"offset": 12, "length": 9, "style": "UNDERLINE"},
                ],
                "entityRanges": [],
                "data": {},
            },

I expect it to render as:

<p><strong>Bold <em>Italic <u>Underline</u></em></strong></p>

but it renders as:

<p><strong>Bold </strong><strong><em>Italic </em></strong><strong><em><u>Underline</u></em></strong></p>

Is this a bug, or is there a configuration option for it?
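The current output is consistent with a segment-based approach: the text is split at every range boundary, and each segment is wrapped in all the styles active over it. A simplified model of that idea (`style_segments` is hypothetical, not the exporter's code):

```python
def style_segments(text, style_ranges):
    """Split text at every range boundary and list the styles active on each piece."""
    boundaries = {0, len(text)}
    for style_range in style_ranges:
        boundaries.add(style_range['offset'])
        boundaries.add(style_range['offset'] + style_range['length'])
    points = sorted(b for b in boundaries if 0 <= b <= len(text))
    segments = []
    for start, end in zip(points, points[1:]):
        # A style applies if its range fully covers this segment.
        active = sorted(
            r['style'] for r in style_ranges
            if r['offset'] <= start and end <= r['offset'] + r['length']
        )
        segments.append((text[start:end], active))
    return segments


ranges = [
    {'offset': 0, 'length': 21, 'style': 'BOLD'},
    {'offset': 5, 'length': 16, 'style': 'ITALIC'},
    {'offset': 12, 'length': 9, 'style': 'UNDERLINE'},
]
for piece, styles in style_segments('Bold Italic Underline', ranges):
    print(repr(piece), styles)
```

Wrapping each segment separately yields one `<strong>` per segment, as in the actual output. Producing the expected markup would instead require merging adjacent segments that share an outer style into a single element.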

Measure memory footprint of the exporter

To make sure we do not cause any memory leaks, and that the exporter is ready to be used on production servers.

Missing 0.4 tag

v0.4 is up on pypi but the corresponding tag is missing on GitHub :(

QUESTION - Can I create JSON from HTML with this code?

My use case needs to migrate rich text content into the JSON used by Draft.js. Is it possible to do this with this code?

Implement default config supporting a lot of HTML elements

Use http://www.poormansstyleguide.com/

On Windows, setup.py can cause an encoding error.

This can be fixed with:

with open('README.md', encoding='utf8') as f:
    long_description = f.read()

Entities with adjacent offset are rendered incorrectly

Test version: v2.1.4

Test with the following code:

from draftjs_exporter.constants import BLOCK_TYPES
from draftjs_exporter.constants import ENTITY_TYPES
from draftjs_exporter.defaults import BLOCK_MAP
from draftjs_exporter.dom import DOM
from draftjs_exporter.html import HTML
import json

a = '''{
    "blocks": [
        {
            "key": "bh6r4",
            "text": "🙄😖",
            "type": "unstyled",
            "depth": 0,
            "inlineStyleRanges": [],
            "entityRanges": [
                {
                "offset": 0,
                "length": 1,
                "key": 7
                },
                {
                "offset": 1,
                "length": 1,
                "key": 8
                }
            ],
            "data": {}
        }
    ],
    "entityMap": {
        "7": {
            "type": "emoji",
            "mutability": "IMMUTABLE",
            "data": {
                "emojiUnicode": "🙄"
            }
        },
        "8": {
            "type": "emoji",
            "mutability": "IMMUTABLE",
            "data": {
                "emojiUnicode": "😖"
            }
        }
    }
}'''


def emoji(props):
    emoji_encode = []
    for c in props.get('emojiUnicode'):
        code = '%04x' % ord(c)
        if code != '200d':
            emoji_encode.append('%04x' % ord(c))

    return DOM.create_element('span', {
        'data-emoji': '-'.join(emoji_encode),
        'class': 'emoji',
    }, props['children'])


def entity_fallback(props):
    return DOM.create_element('span', {'class': 'missing-entity'},
                              props['children'])


def style_fallback(props):
    return props['children']


def block_fallback(props):
    return DOM.create_element('div', {}, props['children'])


DRAFTJS_EXPORTER_CONFIG = {
    'entity_decorators': {
        'emoji': emoji,
        ENTITY_TYPES.FALLBACK: entity_fallback,
    },
    'block_map': dict(BLOCK_MAP, **{
        BLOCK_TYPES.FALLBACK: block_fallback,
    })
}

exporter = HTML(DRAFTJS_EXPORTER_CONFIG)

if __name__ == '__main__':
    print(exporter.render(json.loads(a)))

Actual output:

<p><span class="emoji" data-emoji="1f644">🙄</span>😖<span class="emoji" data-emoji="1f616"></span></p>

Expected output:

<p><span class="emoji" data-emoji="1f644">🙄</span><span class="emoji" data-emoji="1f616">😖</span></p>
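For reference, a minimal sketch of splitting a block's text at non-overlapping entity ranges, which pairs each emoji with its own entity as in the expected output. It assumes offsets and lengths count Unicode code points, which matches both the ranges in this example and how Python 3 indexes `str`. `entity_segments` is hypothetical and ignores inline styles.

```python
def entity_segments(text, entity_ranges):
    """Split text into (substring, entity_key) pairs; the key is None outside entities.

    Assumes ranges do not overlap and that offsets/lengths count code points.
    """
    segments = []
    cursor = 0
    for entity_range in sorted(entity_ranges, key=lambda r: r['offset']):
        start = entity_range['offset']
        end = start + entity_range['length']
        if start > cursor:
            # Text between the previous entity and this one.
            segments.append((text[cursor:start], None))
        segments.append((text[start:end], entity_range['key']))
        cursor = end
    if cursor < len(text):
        segments.append((text[cursor:], None))
    return segments


ranges = [{'offset': 0, 'length': 1, 'key': 7}, {'offset': 1, 'length': 1, 'key': 8}]
print(entity_segments('\U0001F644\U0001F616', ranges))
```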

[WIP] HTML building libraries to try out

https://github.com/html5lib/html5lib-python for its nicer API, but no documentation
http://pythonhosted.org/pyquery/ for its nicer API (?), documented, built on top of lxml

Fix code relying on mutable default values

Pointed out by @loicteixeira, this happens a couple of times within the codebase.
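For reference, the pitfall and the usual fix, with hypothetical function names: a default like `{}` is evaluated once, when the function is defined, so every call that relies on it shares and mutates the same object.

```python
def make_attributes_buggy(type_, attrs={}):
    # The {} default is created once, so every call without attrs shares it.
    attrs['data-type'] = type_
    return attrs


def make_attributes(type_, attrs=None):
    # Create a fresh dict on each call instead.
    if attrs is None:
        attrs = {}
    attrs['data-type'] = type_
    return attrs


first = make_attributes_buggy('p')
second = make_attributes_buggy('h1')
print(first is second, first)  # True {'data-type': 'h1'}

first = make_attributes('p')
second = make_attributes('h1')
print(first is second, first)  # False {'data-type': 'p'}
```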

[WIP] Corner cases to address

Discuss the tricky problems the exporter needs to address. They can be addressed by "doing the right thing", documenting the shortcomings, and/or letting the user configure the output they want.

Custom attributes (not data-* attributes but invalid ones like *ngFor) – ok with html5lib
Order in which ranges are applied (em in strong or strong in em)
- Stable (alphabetical) order for now, but not configurable.
Order in which attributes are rendered can be different between Python versions - use something like from collections import OrderedDict?
- html5lib inserts attributes in alphabetical order
Nested blocks where nesting jumps one level of depth (depth = 1, then depth = 3)
- Not supported for now
unstyled blocks without text, should they render as empty p tags? br?
- Empty elements for now.
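For the attribute-order question, one option is to sort attribute names when rendering, which gives the same output on every Python version. A sketch with a hypothetical `render_attributes` helper, which also escapes values with the standard library's `html.escape` (Python 3 only):

```python
from html import escape


def render_attributes(attrs):
    """Render attributes in alphabetical order, so output doesn't depend on dict ordering."""
    return ''.join(
        ' %s="%s"' % (name, escape(str(value), quote=True))
        for name, value in sorted(attrs.items())
    )


print('<a%s>link</a>' % render_attributes({'href': '/?a=1&b=2', 'class': 'link'}))
```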

Missing git tags for 2.1.1

The git tag for 2.1.1 is missing for commit 220f2b9.

Please add it

Regards,

SVG namespaces are not properly supported

From draftjs_exporter/draftjs_exporter/dom.py (line 37 at commit ce7bff6):

# TODO One-off fix ATM, even though the problem is everywhere.

Safari in particular will likely complain if namespaces are not handled properly.

The same issue would arise with other namespaces (og, microdata, mathjax, etc)

About the performance of bs4 + html5lib

Hi Thibaud Colas,

I've run into another problem. In my project, rendering a real-world note that isn't very long, but has a lot of entities, takes more than 5 seconds. That's unacceptable for an online service, so I tried to speed it up by reducing the number of temporary wrapper elements. This brought it down to a little over 3 seconds, which is the best I could do.

But when I tried to use lxml instead of html5lib, the rendering time decreased to less than 1 second!

WTH? Then I found someone's benchmark, which explains the huge difference (with Python 2).

And here's my simple test case with a few images and "subjects". With lxml, the rendering takes 0.17 seconds:

from bs4 import BeautifulSoup


def Soup(raw_str):
    """
    Wrapper around BeautifulSoup to keep the code DRY.
    """
    return BeautifulSoup(raw_str, 'lxml')

         47459 function calls (46117 primitive calls) in 0.164 seconds

   Ordered by: cumulative time
   List reduced from 257 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.165    0.165 html.py:24(render)
       16    0.001    0.000    0.115    0.007 html.py:39(render_block)
      133    0.002    0.000    0.091    0.001 dom.py:35(create_element)
       38    0.000    0.000    0.091    0.002 entity_state.py:55(render_entitities)
        6    0.001    0.000    0.073    0.012 benchmark.py:173(render)
      174    0.001    0.000    0.072    0.000 dom.py:17(Soup)
      174    0.008    0.000    0.071    0.000 __init__.py:87(__init__)
      172    0.001    0.000    0.059    0.000 dom.py:28(create_tag)
        1    0.000    0.000    0.049    0.049 wrapper_state.py:111(to_string)
        1    0.000    0.000    0.049    0.049 dom.py:124(render)
      174    0.001    0.000    0.042    0.000 __init__.py:285(_feed)
      174    0.001    0.000    0.041    0.000 _lxml.py:246(feed)
 1166/515    0.002    0.000    0.037    0.000 {hasattr}
      107    0.001    0.000    0.036    0.000 element.py:1029(__getattr__)
      107    0.000    0.000    0.034    0.000 element.py:1273(find)
      107    0.001    0.000    0.034    0.000 element.py:1284(find_all)
      174    0.007    0.000    0.034    0.000 {method 'feed' of 'lxml.etree._FeedParser' objects}
      107    0.003    0.000    0.033    0.000 element.py:518(_find_all)
        2    0.000    0.000    0.028    0.014 element.py:1077(__unicode__)
    336/2    0.008    0.000    0.028    0.014 element.py:1105(decode)

But with html5lib it takes about 0.65 seconds.

         178504 function calls (177142 primitive calls) in 0.663 seconds

   Ordered by: cumulative time
   List reduced from 455 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.663    0.663 html.py:24(render)
      174    0.001    0.000    0.572    0.003 dom.py:17(Soup)
      174    0.008    0.000    0.571    0.003 __init__.py:87(__init__)
       16    0.001    0.000    0.551    0.034 html.py:39(render_block)
      174    0.001    0.000    0.543    0.003 __init__.py:285(_feed)
      174    0.003    0.000    0.542    0.003 _html5lib.py:57(feed)
      172    0.001    0.000    0.488    0.003 dom.py:28(create_tag)
      133    0.002    0.000    0.412    0.003 dom.py:35(create_element)
       38    0.000    0.000    0.382    0.010 entity_state.py:55(render_entitities)
      174    0.011    0.000    0.350    0.002 html5parser.py:55(__init__)
        6    0.001    0.000    0.256    0.043 benchmark.py:173(render)
      174    0.001    0.000    0.188    0.001 html5parser.py:225(parse)
      174    0.002    0.000    0.187    0.001 html5parser.py:81(_parse)
     5916    0.104    0.000    0.176    0.000 utils.py:49(__init__)
      174    0.007    0.000    0.145    0.001 html5parser.py:157(mainLoop)
        1    0.000    0.000    0.104    0.104 wrapper_state.py:111(to_string)
        1    0.000    0.000    0.104    0.104 dom.py:124(render)
      174    0.019    0.000    0.090    0.001 html5parser.py:874(__init__)
       16    0.000    0.000    0.089    0.006 wrapper_state.py:125(element_for)
      174    0.054    0.000    0.086    0.000 html5parser.py:422(getPhases)

So, any suggestions for optimizing this? And I'm not sure whether html5lib is good enough for us to ignore the performance issue. What do you think? Thank you~
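The tables above look like output from Python's cProfile and pstats modules, sorted by cumulative time. To compare engines fairly, it helps to profile each one the same way. The sketch below is a minimal version of that. render_fn is a hypothetical stand-in for whatever exporter call is being measured, and profile_call is likewise a hypothetical helper name.

```python
import cProfile
import io
import pstats


def profile_call(fn, *args, limit=20, **kwargs):
    """Run fn(*args, **kwargs) under cProfile.

    Returns (result, report). The report is the text printed by pstats,
    sorted by cumulative time and cut to the top `limit` rows, which is the
    same layout as the tables above.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(fn, *args, **kwargs)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats('cumulative').print_stats(limit)
    return result, stream.getvalue()


def render_fn(n):
    """Hypothetical stand-in for an exporter render call; swap in the real one."""
    return ''.join('<p>{}</p>'.format(i) for i in range(n))


if __name__ == '__main__':
    html, report = profile_call(render_fn, 1000)
    print(report)
```

Run it once per engine on the same ContentState, and the two reports become directly comparable. In the html5lib table above, about 0.57 of the 0.66 seconds sits under dom.py Soup / _feed. That means most of the time goes to parsing, not to the exporter's own code.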

Allow ability to customise where entity children are output

Right now, custom entity renderers like Link do not define where their children go: the text or further elements just get appended at the end.

This is a major problem for entities that render text as a prefix or suffix.
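The requested behaviour can be sketched with renderers that receive their already-rendered children in props['children'] and place them explicitly. This is a minimal illustration under several assumptions. The names create_element, mention and footnote_ref are hypothetical, and so are the url / children prop names. The string-building factory is a stand-in for the exporter's DOM layer.

```python
from html import escape


def create_element(tag, attrs, *children):
    """Hypothetical element factory that returns an HTML string.

    Children are assumed to be already-rendered HTML strings.
    """
    attr_str = ''.join(
        ' {}="{}"'.format(name, escape(str(value)))
        for name, value in sorted(attrs.items())
    )
    return '<{0}{1}>{2}</{0}>'.format(tag, attr_str, ''.join(children))


def mention(props):
    """Entity renderer that places its children after an '@' prefix."""
    return create_element('a', {'href': props['url']}, '@', props['children'])


def footnote_ref(props):
    """Entity renderer that puts its children between a prefix and a suffix."""
    return create_element('sup', {}, '[', props['children'], ']')
```

Because each renderer controls the position of its children, prefixes and suffixes no longer end up after the text.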

Support arbitrary nesting levels

The exporter should support blocks going from depth 0 to depth 3, or any depth jump, as this frequently happens with real-world HTML.
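Here is one way to handle depth jumps for list items, sketched under stated assumptions. Blocks are reduced to hypothetical (depth, text) pairs and rendered as ul lists. Missing levels are filled with empty li placeholders, so every nested list still sits inside a list item. This illustrates the approach only; it is not the exporter's wrapper logic.

```python
from html import escape


def nest(blocks):
    """Build a list tree from (depth, text) pairs, tolerating any depth jump."""
    root = []
    levels = [root]  # levels[k] is the list currently receiving depth-k items
    for depth, text in blocks:
        # Going up: drop the lists deeper than this block.
        del levels[depth + 1:]
        # Going down: open one list per missing level, adding an empty
        # placeholder item when there is no item to nest under.
        while len(levels) < depth + 1:
            parent = levels[-1]
            if not parent:
                parent.append({'text': '', 'children': []})
            levels.append(parent[-1]['children'])
        levels[-1].append({'text': text, 'children': []})
    return root


def render(items):
    """Render the tree as nested <ul>/<li> HTML."""
    if not items:
        return ''
    return '<ul>{}</ul>'.format(''.join(
        '<li>{}{}</li>'.format(escape(item['text']), render(item['children']))
        for item in items
    ))
```

For example, render(nest([(0, 'a'), (3, 'b'), (1, 'c')])) produces four levels of ul, with two empty placeholder items between "a" and "b".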

UnicodeDecodeError reading markdown file

Ubuntu 16.04

python 3.5

Collecting draftjs_exporter (from -r requirements.txt (line 4))
  Downloading https://files.pythonhosted.org/packages/c3/98/2ae0db16e3841d9d0623b1a2248987e1edd037ab7eaa04e45e4fdf18873b/draftjs_exporter-2.1.1.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-f_emtego/draftjs-exporter/setup.py", line 38, in <module>
        long_description = f.read()
      File "/usr/lib/python3.5/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1956: ordinal not in range(128)
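The traceback shows setup.py reading the README with the locale's default encoding. Under a C/POSIX locale that encoding is ASCII, as the encodings/ascii.py frame confirms. Byte 0xe2 is the first byte of UTF-8 encoded characters such as a curly quote or an ellipsis. A common fix is to decode explicitly as UTF-8. The sketch below assumes the README is UTF-8, and the helper name read_long_description is hypothetical.

```python
import io


def read_long_description(path):
    """Return the README contents as text, decoded explicitly as UTF-8.

    io.open accepts an encoding argument on both Python 2 and 3, unlike the
    built-in open on Python 2, so the result does not depend on the locale.
    """
    with io.open(path, encoding='utf-8') as f:
        return f.read()


# Hypothetical use in setup.py:
# long_description = read_long_description('README.md')
```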

Remove support for class based decorator

Capturing offline discussions for a public discussion.

Currently a decorator can be written as a function (accepting a single positional argument props) or a class (with a single render method accepting a single positional argument props) which has a few issues:

Nothing is passed to the class's __init__; instead, everything is passed to render via props, which makes the class quite useless. Moving the props to __init__ would either create boilerplate or force users to inherit from a custom class that does this for them. In short, it's not clear what benefit there is in using a class.
To keep naming consistent, function names are camel cased (so when you update your config file, you don't have to think about whether it's a function or a class), which makes the linter complain. The library should not encourage non-PEP 8 compliant code.

We were therefore thinking of removing support for class based decorators. Any thoughts?
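To illustrate the point, here are the two forms side by side. The names create_element, Hashtag / hashtag and the children prop are hypothetical. The factory only builds an HTML string. The class version carries no state of its own, so it produces exactly what the plain function does.

```python
def create_element(tag, attrs, *children):
    """Hypothetical stand-in for the exporter's element factory (returns HTML)."""
    attr_str = ''.join(' {}="{}"'.format(k, v) for k, v in sorted(attrs.items()))
    return '<{0}{1}>{2}</{0}>'.format(tag, attr_str, ''.join(children))


class Hashtag:
    """Class form: __init__ receives nothing, so all input arrives via props."""

    def render(self, props):
        return create_element('span', {'class': 'hashtag'}, props['children'])


def hashtag(props):
    """Function form: same output, no instance, PEP 8 snake_case name."""
    return create_element('span', {'class': 'hashtag'}, props['children'])
```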

Custom components switch places

I have two custom components: image and direct speech. There's nothing very fancy in their code, e.g. image is rendered like this:

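from draftjs_exporter.dom import DOM
# tags: the reporter's own element-helper module (not shown in the issue).

class Image: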
    def render(self, props):
        data = props['block']['data']
        return self.wrapper(data)

    def wrapper(self, data):
        caption = data.get('caption', '')

        return tags.div(
            {'class': 'contentImage'},
            self.image(data),
            DOM.create_text_node(caption))

    def image(self, data):
        src = data.get('image', '')
        caption = data.get('caption', '')
        return tags.img({
            'src': src,
            'class': 'contentImage-image',
            'alt': caption})

Then I generate the content for the blocks (unimportant fields omitted):

{  
   "blocks":[  
      {  
         "text":"Start text",
         "type":"unstyled",
         ...
      },
      {  
         "type":"image",
         "data":{  
            "caption":"Some image caption",
            "image":"http://placekitten.com/401/402"
         },
         ...
      },
      {  
         "text":"Middle text",
         "type":"unstyled",
         ...
      },
      {  
         "type":"direct-speech",
         "data":{  
            "name":"Steve Jobs",
            "image":"http://placekitten.com/500/501",
            "text":"Hello, world!!!!111"
         },
         ...
      },
      {  
         "text":"Last text",
         "type":"unstyled",
         ...
      }
   ]
   ...
}

I expect the "Middle text" block to appear between the image and the direct speech block, but for some reason these two blocks are rendered together inside a single div and the text appears before them:

<p>Start text</p>
<p>Middle text</p>
<div>
   <div class="contentImage">
      <img class="contentImage-image" src="http://placekitten.com/401/402">Some image caption
   </div>
   <blockquote class="directSpeech">
      <div class="directSpeech-image" style="background-image: url(http://placekitten.com/500/501);"></div>
      <div class="directSpeech-wrapper">
         <div class="directSpeech-title">
            Steve Jobs
         </div>
         <p class="directSpeech-text">Hello, world!!!!111</p>
      </div>
   </blockquote>
</div>
<p>Last text</p>

I hope the bug is reproducible; if you have any questions or need more details, I'll provide any information you need.
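The output suggests that both custom block types were configured with the same div wrapper, and that the wrapper was kept open across the unwrapped "Middle text" block. The expected rule is that only adjacent blocks sharing a wrapper end up in the same wrapper element. The sketch below shows that rule using itertools.groupby. WRAPPERS is a hypothetical stand-in for the block map configuration, and wrapper_runs is a hypothetical helper.

```python
from itertools import groupby

# Hypothetical configuration: which block types get which wrapper element.
WRAPPERS = {'image': 'div', 'direct-speech': 'div'}


def wrapper_runs(blocks, wrappers=WRAPPERS):
    """Split blocks into consecutive runs that share the same wrapper.

    groupby only merges adjacent items with equal keys. An unwrapped block
    between two wrapped ones therefore ends the first run, and the second
    wrapped block starts a new one. Document order is preserved.
    """
    return [
        (wrapper, list(run))
        for wrapper, run in groupby(blocks, key=lambda b: wrappers.get(b['type']))
    ]
```

With the block sequence from this issue, the runs come out as no wrapper, div, no wrapper, div, no wrapper. "Middle text" stays between the image and the direct speech block.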

Does not support nested blocks

use case

Forgive my naivete, but why is this package needed? What is its purpose / use case for Python devs? I just found out about Draft.js and don't know a lot about JavaScript. Thanks.
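Draft.js stores rich text as a JSON ContentState (the raw form produced by convertToRaw), not as HTML. When a Python backend saves that JSON, something on the server has to turn it back into HTML before it can appear in a page. That conversion is what this package does. The sketch below is a deliberately tiny illustration of the idea. It handles block types only, with no inline styles or entities, and BLOCK_MAP and to_html are hypothetical names.

```python
from html import escape

# Hypothetical mapping from Draft.js block types to HTML tags.
BLOCK_MAP = {
    'unstyled': 'p',
    'header-two': 'h2',
    'blockquote': 'blockquote',
}


def to_html(raw_content_state):
    """Convert a raw ContentState dict to an HTML string (block types only)."""
    parts = []
    for block in raw_content_state['blocks']:
        tag = BLOCK_MAP.get(block['type'], 'p')  # unknown types fall back to <p>
        parts.append('<{0}>{1}</{0}>'.format(tag, escape(block['text'])))
    return ''.join(parts)
```

The actual exporter goes much further, covering inline styles, entities such as links, wrappers for list items and custom components.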

Change default engine to the new dependency-free one introduced in #77

The exporter now has an engine that doesn't have any dependencies. It should probably be the one activated by default, to make the whole package dependency-free unless another engine is configured. It also happens to be faster, and less memory-hungry.

This is a breaking change though, however small the difference from the output of the current default (html5lib + BeautifulSoup), so it should be part of the 2.0.0 release.

As part of this change, it will also be necessary to move the html5lib / BS4 dependencies to a separate extra like for lxml (pip install draftjs_exporter[html5lib]), as well as update the documentation.
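Here is what the proposed packaging change might look like, sketched under stated assumptions. The core install has no third-party dependencies, and each optional engine's dependencies move into an extra. The version ranges are the ones quoted from setup.py in this page. The exact extra names are assumptions, apart from html5lib, which the issue itself proposes.

```python
# Sketch only: the default (dependency-free) engine needs nothing extra.
install_requires = []

# Optional engines, installed with e.g. pip install draftjs_exporter[html5lib]
extras_require = {
    'html5lib': [
        'beautifulsoup4>=4.4.1,<5',
        'html5lib>=0.999,<=1.0b10',
    ],
    'lxml': [
        'lxml>=3.6.0',
    ],
}

# These would be passed to setuptools.setup(install_requires=install_requires,
#                                           extras_require=extras_require, ...)
```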

Allow content on element even if not configured

Before upgrading, we set unstyled to an empty string ''.

Since 0.8 (if I'm not mistaken), draftjs_exporter creates empty text instead of the original text content.
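The requested behaviour could work like this. When a block's element is an empty string, or not configured at all, the exporter outputs the text on its own instead of dropping it. In the sketch below, render_block and its arguments are hypothetical, and the string output stands in for the exporter's DOM nodes.

```python
from html import escape


def render_block(text, element=None):
    """Render one block; a falsy element means "no wrapping tag", not "no text"."""
    if not element:
        return escape(text)
    return '<{0}>{1}</{0}>'.format(element, escape(text))
```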

dependencies = [
    'beautifulsoup4>=4.4.1,<5',
    'html5lib>=0.999,<=1.0b10',
]

lxml_dependencies = [
    'lxml>=3.6.0',
]

props = entity_details['data'].copy()
props['entity'] = {
    'type': entity_details['type'],
}

nodes = DOM.create_element()

# Collect the nodes accumulated in element_stack as children of the container.
for n in self.element_stack:
    DOM.append_child(nodes, n)

elt = DOM.create_element(opts.element, props, nodes)

def render_styles(self, text_node):
    node = text_node
    if not self.is_empty():
        # Nest the tags.
        for s in sorted(self.styles, reverse=True):
            opt = Options.for_style(self.style_map, s)
            node = DOM.create_element(opt.element, opt.props, node)

    return node
