sergiocorreia / panflute

A Pythonic alternative to John MacFarlane's pandocfilters, with extra helper functions

Home Page: http://scorreia.com/software/panflute/

License: BSD 3-Clause "New" or "Revised" License

Python 99.87% TeX 0.13%

pandoc markdown filter python

panflute's Issues

Building writers.

Is there a way to write a writer with panflute?
Has anyone thought about it?
I've been reading the pandoc documentation, and the Custom Writers section talks about being able to write a writer in Lua and have pandoc use it.
With panflute we have a Pythonic representation of the same AST, so I guess we could work on enabling a "standard" way of writing a writer...

Does this sound doable?

Add testing framework

Allow run_filters, load, dump to work on strings

Currently they work on streams which makes it messy if we just have a string created with e.g. pypandoc.

Instead, if the input is a string instance, call json.loads() and so on.

However, we should not break backward compat.
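
A minimal sketch of the idea, using only the standard library. The name load_json is hypothetical and is not part of panflute; it only illustrates dispatching on the input type so that existing stream-based callers keep working.

```python
import io
import json


def load_json(source):
    """Decode a JSON document from a str, bytes, or a text stream."""
    if isinstance(source, bytes):
        source = source.decode("utf-8")
    if isinstance(source, str):
        return json.loads(source)
    # Anything else is treated as a file-like object, as before,
    # so stream-based callers are not affected.
    return json.load(source)


sample = '{"blocks": [], "meta": {}}'
from_string = load_json(sample)
from_stream = load_json(io.StringIO(sample))
```

Because strings are only an additional accepted type, code that already passes a stream behaves the same way.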

YAML syntax in Fenced code block?

As mentioned in #12 about an enhanced version of csv2table in pandoc-table-csv-test/panflute-csv2table.ipynb at master · ickc/pandoc-table-csv-test, I have some questions on the YAML syntax required in the fenced code block.

First, it doesn't allow an opening --- row, e.g.

~~~csv
---
table-width: 1.2
...
**_Fruit_**,~~Price~~,_Number_,`Advantages`
*Bananas~1~*,$1.34,12~units~,"Benefits of eating bananas 
(**Note the appropriately
rendered block markdown**):    

- _built-in wrapper_        
- ~~**bright color**~~

"
*Oranges~2~*,$2.10,5^10^~units~,"Benefits of eating oranges:

- **cures** scurvy
- `tasty`"
~~~

will treat the table-width: 1.2 after the first --- as the actual table (i.e. not YAML), not the content below.

Second, if there's no YAML like this:

~~~csv
**_Fruit_**,~~Price~~,_Number_,`Advantages`
*Bananas~1~*,$1.34,12~units~,"Benefits of eating bananas 
(**Note the appropriately
rendered block markdown**):    

- _built-in wrapper_        
- ~~**bright color**~~

"
*Oranges~2~*,$2.10,5^10^~units~,"Benefits of eating oranges:

- **cures** scurvy
- `tasty`"
~~~

It will have an error:

yaml.scanner.ScannerError: while scanning an alias
  in "<unicode string>", line 1, column 1:
    **_Fruit_**,~~Price~~,_Number_,` ...

i.e. empty YAML still requires a ---/... row to start with.
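
One possible rule that would cover both cases is sketched below. This is an assumption about the desired behavior, not how panflute's YAML filter works: split_yaml_header is a hypothetical helper that only treats the block as having YAML when its first line is --- and a closing ... or --- follows.

```python
def split_yaml_header(text):
    """Return (yaml_text, body) for the contents of a fenced code block.

    Hypothetical rule: a header exists only if the first line is '---'
    and a later line is '...' or '---'; otherwise everything is body.
    """
    lines = text.split("\n")
    if lines and lines[0].strip() == "---":
        for i, line in enumerate(lines[1:], start=1):
            if line.strip() in ("...", "---"):
                return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", text


with_header = "---\ntable-width: 1.2\n...\na,b\n1,2"
without_header = "a,b\n1,2"
```

With this rule a block without any delimiter is passed through untouched, so the CSV is never handed to the YAML parser.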

Old filter stopped working recently

Dear Sergio,

Thank you for your great tool. I have been using a filter that creates a list of acronyms for almost two years, but a recent update caused it to stop working. The last time I used it was around five months ago.

The filter is:

"""
Panflute filter that allows for acronyms in latex using the acro package

Usage:

- In markdown, use it as links: [SO](acro "Stack Overflow")
- Then, this filter will add \DeclareAcronym{LRU}{long=Least Recently Used} 
for the definition of LRU and finally \ac{LRU} to every time the term is used in the text.

(Development of this tool can be followed here https://groups.google.com/forum/#!topic/pandoc-discuss/Bz1cG55BKjM)
"""

from string import Template  # using .format() is hard because of {} in tex
import panflute as pf

TEMPLATE_GLS = Template(r"\ac{$acronym}")
TEMPLATE_NEWACRONYM = Template(r"\DeclareAcronym{$acronym}{long=$definition}")


def prepare(doc):
    doc.acronyms = {}


def action(e, doc):
    if isinstance(e, pf.Link) and e.url == 'acro':
        acronym = pf.stringify(e)
        definition = e.title
        # Only update dictionary if definition is not empty
        if definition:
            doc.acronyms[acronym] = definition
        
        if doc.format == 'latex':
            tex = '\ac{{}}'.format(acronym)
            tex = TEMPLATE_GLS.safe_substitute(acronym=acronym)
            return pf.RawInline(tex, format='latex')


def finalize(doc):
    if doc.format == 'latex':
        tex = [r'\usepackage[]{acro}']
        for acronym, definition in doc.acronyms.items():
            tex_acronym = TEMPLATE_NEWACRONYM.safe_substitute(acronym=acronym, definition=definition)
            tex.append(tex_acronym)

        tex = [pf.MetaInlines(pf.RawInline(line, format='latex')) for line in tex]
        tex = pf.MetaList(*tex)
        if 'header-includes' in doc.metadata:
            doc.metadata['header-includes'].content.extend(tex)
        else:
            doc.metadata['header-includes'] = tex


def main(doc=None):
    return pf.run_filter(action, prepare=prepare, finalize=finalize, doc=doc) 


if __name__ == '__main__':
    main()

It is called with -F acronyms.py, but recently (as of yesterday) I have been greeted with this error:

Traceback (most recent call last):
  File "style/acronyms.py", line 58, in <module>
    main()
  File "style/acronyms.py", line 54, in main
    return pf.run_filter(action, prepare=prepare, finalize=finalize, doc=doc) 
  File "/home/luis/.local/lib/python2.7/site-packages/panflute/io.py", line 265, in run_filter
    return run_filters([action], *args, **kwargs)
  File "/home/luis/.local/lib/python2.7/site-packages/panflute/io.py", line 249, in run_filters
    finalize(doc)
  File "style/acronyms.py", line 48, in finalize
    doc.metadata['header-includes'].content.extend(tex)
  File "/usr/lib/python2.7/_abcoll.py", line 675, in extend
    self.append(v)
  File "/usr/lib/python2.7/_abcoll.py", line 664, in append
    self.insert(len(self), value)
  File "/home/luis/.local/lib/python2.7/site-packages/panflute/containers.py", line 87, in insert
    v = check_type(v, self.oktypes)
  File "/home/luis/.local/lib/python2.7/site-packages/panflute/utils.py", line 29, in check_type
    raise TypeError(msg)
TypeError: received MetaInlines but expected <class 'panflute.base.Block'>
Error running filter style/acronyms.py:
Filter returned error status 1

I can't wrap my head around what might be causing the problem. What do you suggest I look into?

P.S.: Tested in Archlinux with python3 as well, the error is exactly the same.

Attributes to Arbitrary Elements, (Definition List)

I've been reading the source code, and I can't find a way to attach attributes to DefinitionLists.
Is this a limitation of pandoc itself, or could I modify the code and open a pull request?

`convert_text` from json: AttributeError

MWE:

from panflute import *
md = 'Some *markdown* **text** ~xyz~'
md2json = convert_text(md, input_format='markdown', output_format='json')
print(md2json)
md2json2md = convert_text(md2json, input_format='json', output_format='markdown')
print(md2json2md)

results in this error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-34-befb94598183> in <module>()
      3 md2json = convert_text(md, input_format='markdown', output_format='json')
      4 print(md2json)
----> 5 md2json2md = convert_text(md2json, input_format='json', output_format='markdown')
      6 print(md2json2md)

/usr/local/lib/python3.5/site-packages/panflute/tools.py in convert_text(text, input_format, output_format, extra_args)
    147     args = [from_arg, to_arg] + extra_args
    148 
--> 149     out = run_pandoc(text, args)
    150 
    151     if output_format == 'json':

/usr/local/lib/python3.5/site-packages/panflute/tools.py in run_pandoc(text, args)
     96 
     97     proc = Popen([pandoc_path] + args, stdin=PIPE, stdout=PIPE, stderr=PIPE)
---> 98     out, err = proc.communicate(input=text.encode('utf-8'))
     99     exitcode = proc.returncode
    100     if exitcode != 0:

AttributeError: 'list' object has no attribute 'encode'

Allow panflute filters to be autodownloaded from github.

Note: I think this is outside the scope of panflute, and there should be a separate repo that deals with this (we can add it to panflute's dependencies)

Implementation:

The package should fetch a filters.yaml file from the repo of an organization. This file should have a list of items, each with two fields (name and url).
A filter consists of a Python file (for now let's limit ourselves to one file) and optionally a YAML file (e.g. headers.py and headers.csv).
The YAML file contains metadata about the filter. This includes: name, description, author, date, version (!), etc.
If at some point way in the future we want to extend this, we can do it easily through the YAML (e.g. allow for more than one Python file, more than just Python files, versioning, example input and output markdown files).
The url should point to either the .py file or the .yaml file.
Both files should be copied to datadir/filters (or to whatever path the user asks).

Finally, this utility should allow for a few suboptions:

??? install somefilter
??? update somefilter
??? install filter1 filter2 filter3
??? uninstall filter1

Also see:

Bower: https://bower.herokuapp.com/packages
Conda forge: https://github.com/conda-forge/feedstocks
ST Package control: https://github.com/wbond/package_control_channel/

autorun filters with dependencies?

Is there a way to implement autorun filters with dependencies?

The use case is that I have a set of filters in separate files, but they all depend on another module (which I wrote) that contains a series of utility functions.

Travis deploy to PyPI; git tag to GitHub Release

Do you want me to make a pull request so that Travis can release to PyPI for you automatically?

If Travis is set up this way:

You need to add your password on travis-ci.org as an environment variable, say pypi_password. (It will not be shown to the public unless you activate "Display value in build log".)
There are multiple ways to set this up. One way is to use git tag, which has the added bonus of "properly" publishing a GitHub release at the same time. When you release a version, run git tag -a v$(python setup.py --version) -m 'Deploy to PyPI' && git push origin v$(python setup.py --version).

TableCell admits Para but not Span Block elements

I am new to panflute, so sorry if this is a stupid error.
If I replace the variable aaa with tbl_el, everything works fine. I can't figure out why! Thanks!

script replaces

::: {.v_t}
col1

col2

col3
:::

with table

#!/usr/bin/python3
import panflute as pf
import sys

def action(element, doc):

    if not isinstance(element, pf.Div):
        return None
    if "v_t" not in element.classes:
        return None

    rows = []
    headers = []
    header_found = False
    widths = []
    aligns = []
    caption = [] 
    col = []
    for tbl_el in element.content:
        if isinstance(tbl_el, pf.Para):
            aaa=pf.Span(*tbl_el.content)
            print(aaa,file=sys.stderr)
            print(tbl_el, file=sys.stderr)
            ############################################################
            col.append(pf.TableCell(aaa)) ##############################
            ############################################################
            # Here's the problem
            # Everything works fine with tbl_el instead of aaa
            # But WHY ??? Span is also a Block element ?
        else:
            pass
    rows.append(pf.TableRow(*col))

    kwargs = {}
    if header_found:
        kwargs["header"] = pf.TableRow(*headers)
    if widths:
        kwargs["width"] = widths
    if aligns:
        kwargs["alignment"] = aligns
    if caption is not None:
        kwargs["caption"] = caption
    #return pf.Table(*rows, **kwargs)
    a = pf.Table(*rows, **kwargs)
    return pf.Div(a,classes=['ee'])

def prepare(doc):
    pass


def finalize(doc):
    pass


def main(doc=None):
    return pf.run_filter(action, doc=doc, prepare=prepare, finalize=finalize)


if __name__ == "__main__":
   main()

Wrapper around pandoc should also be GPL

The Pandoc license makes no exceptions for this, and the way of linking doesn't matter (though it would have mattered if Pandoc were LGPL).

Panflute includes some helper functions that are wrappers around Pandoc and that are not really needed by most filters. So I suggest removing them and keeping the current license.

Python 2 Support

First of all, thank you for a great library.

I saw the discussion going on here: https://groups.google.com/forum/#!msg/pandoc-discuss/MitGRIUwEGo/C4eviaS6BQAJ and I was wondering what was the final say regarding Python 2 support.

TravisCI using older versions of code

Hi @ickc

For some reason the TravisCI builds keep failing, and when I inspect the results they show failures in parts of the code that I have already fixed or removed, as if Travis were using a stale version of panflute.

E.g. here it uses the stringify function within autofilters.py, but the latest builds make no use of that function within the autofilter module.

Any ideas of what might be going on?

Exception: unknown tag: LineBlock

Somewhat MWE:

printf "%s\n" '| abc' '| def' | pandoc -f markdown -t native -F pantable2csv
Traceback (most recent call last):
  File "/usr/local/bin/pantable2csv", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.5/site-packages/pantable/pantable2csv.py", line 127, in main
    panflute.run_filter(table2csv)
  File "/usr/local/lib/python3.5/site-packages/panflute/io.py", line 257, in run_filter
    return run_filters([action], *args, **kwargs)
  File "/usr/local/lib/python3.5/site-packages/panflute/io.py", line 230, in run_filters
    doc = load(input_stream=input_stream)
  File "/usr/local/lib/python3.5/site-packages/panflute/io.py", line 53, in load
    doc = json.load(input_stream, object_pairs_hook=from_json)
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/__init__.py", line 268, in load
    parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/__init__.py", line 332, in loads
    return cls(**kw).decode(s)
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/Cellar/python3/3.5.2_3/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/decoder.py", line 355, in raw_decode
    obj, end = self.scan_once(s, idx)
  File "/usr/local/lib/python3.5/site-packages/panflute/elements.py", line 1459, in from_json
    raise Exception('unknown tag: ' + tag)
Exception: unknown tag: LineBlock
pandoc: Error running filter pantable2csv
Filter returned error status 1

Looking at the source in panflute/elements.py, LineBlock is not defined.

`IndexError` when `Table(..., header=None)`

Hi,

From Table(*args, *, header=None, caption=None, alignment=None, width=None), the default value of header is None. However, using a modified version of the docstring:

x = [Para(Str('Something')), Para(Space, Str('else'))]
c1 = TableCell(*x)
c2 = TableCell(Header(Str('Title')))
rows = [TableRow(c1, c2)]
table = Table(*rows)

will result in

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-11-f81c7d6eb71b> in <module>()
      3 c2 = TableCell(Header(Str('Title')))
      4 rows = [TableRow(c1, c2)]
----> 5 table = Table(*rows)

/usr/local/lib/python3.5/site-packages/panflute/elements.py in __init__(self, header, caption, alignment, width, *args)
   1051         self.rows = len(self.content)
   1052         self.cols = len(self.content[0].content)
-> 1053         self.header = header if header else []
   1054         self.caption = caption if caption else []
   1055 

/usr/local/lib/python3.5/site-packages/panflute/elements.py in header(self, value)
   1082             msg = 'table header has an incorrect number of cols:'
   1083             msg += ' {} rows but expected {}'.format(len(value), self.cols)
-> 1084             raise IndexError(msg)
   1085 
   1086     @property

IndexError: table header has an incorrect number of cols: 0 rows but expected 2

The result is the same if the last line becomes table = Table(*rows, header=None).

So it always expects a list as long as the number of columns.

Thanks.
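
A sketch of the behavior one might expect instead, written as a standalone function rather than as panflute's actual setter. The name normalize_header is hypothetical; the only point is that None (or an empty list) is accepted as "no header" before the column count is checked.

```python
def normalize_header(header, cols):
    """Return the header as a list, treating None or [] as 'no header'."""
    if not header:
        return []
    header = list(header)
    if len(header) != cols:
        # Only a non-empty header has to match the number of columns.
        raise IndexError(
            "table header has {} cols but expected {}".format(len(header), cols)
        )
    return header


no_header = normalize_header(None, 2)
two_cells = normalize_header(["left", "right"], 2)
```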

get_metadata throws AssertionError : assert isinstance(key, str)

Hello,

Thanks for this piece of software, it's pretty cool but I'm struggling with the metadata function :

from panflute import *
meta = {'author': MetaString('John Doe')}
content = [Header(Str('Title')), Para(Str('Hello!'))]
doc = Doc(*content, metadata=meta, format='pdf')
a=doc.get_metadata('author')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/panflute/tools.py", line 239, in _get_metadata
    assert isinstance(key, str)
AssertionError

Am I doing something wrong ?

autofilter.py: autorun_filters: KeyError: 'main'

In an attempt to add more tests, I set up Travis to run all .md files through panflute, which applies the necessary filters according to autofilter.py and the YAML.

I then encountered KeyError: 'main' in Job #59.2 - ickc/panflute - Travis CI, when processing an .md file using the filter include.

panflute errors on python 2.7.12

Hi! First, let me say how useful this has been, thank you very much for putting this together!

On an Ubuntu 16.04.1 machine running python 2.7.12 and pip 9.0.1, I ran:

$ sudo -H pip install panflute
Collecting panflute
  Downloading panflute-1.9.2.tar.gz
Requirement already satisfied: pyyaml in /usr/local/lib/python2.7/dist-packages (from panflute)
Requirement already satisfied: future in /usr/local/lib/python2.7/dist-packages (from panflute)
Requirement already satisfied: shutilwhich in /usr/local/lib/python2.7/dist-packages (from panflute)
Building wheels for collected packages: panflute
  Running setup.py bdist_wheel for panflute ... done
  Stored in directory: /root/.cache/pip/wheels/d1/f0/ac/8313c959de9b43243cd3c9a5e5296bd4369bacbc7f40d7b7e6
Successfully built panflute
Installing collected packages: panflute
Successfully installed panflute-1.9.2

Note, this was not the GH address you mention in your documentation, it was the pip record you said should work with python 2.7.

Perhaps the easiest use case error I can provide is copy/pasting directly from your tutorial:

>>> e1 = Emph(Str('Hello'), Space, Str('world!'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/panflute/elements.py", line 695, in __init__
    self.text = check_type(text, str)
  File "/usr/local/lib/python2.7/dist-packages/panflute/utils.py", line 29, in check_type
    raise TypeError(msg)
TypeError: received str but expected <class 'future.types.newstr.newstr'>

Trying this same command on python 3.6.0, it works fine:

>>> e1 = Emph(Str('Hello'), Space, Str('world!'))
>>>

Is it possible that panflute hasn't been fully ported to 2.7? Did I do something wrong in my install? Any help you can offer would be much appreciated, thank you!

TODO: update installation docs (py2 now allowed)

Travis CI Bug

One of the TravisCI tests is failing on some versions of Python; investigate further.

Update documentation

Add e.g. this to the sphinx docs:

https://github.com/kiwi0fruit/pandoctools/blob/master/docs/panfl.md

[feature request] A better repr for panflute AST

For example,

>>> from panflute import *
>>> table_referenced = Table(TableRow(TableCell(Plain(Str(''))), TableCell(Plain(Str('')))), TableRow(TableCell(Plain(Str('1'))), TableCell(Plain(Str('2')))), TableRow(TableCell(Plain(Str('3'))), TableCell(Plain(Str('4')))), alignment=['AlignDefault', 'AlignDefault'], width=[0.5, 0.5])
>>> repr(table_referenced)
"Table(TableRow(TableCell(Plain(Str())) TableCell(Plain(Str()))) TableRow(TableCell(Plain(Str(1))) TableCell(Plain(Str(2)))) TableRow(TableCell(Plain(Str(3))) TableCell(Plain(Str(4)))); alignment=['AlignDefault', 'AlignDefault'], width=[0.5, 0.5], rows=3, cols=2)"

Writing tests would be easier if repr(table_referenced) were just Table(TableRow(TableCell(Plain(Str(''))), TableCell(Plain(Str('')))), TableRow(TableCell(Plain(Str('1'))), TableCell(Plain(Str('2')))), TableRow(TableCell(Plain(Str('3'))), TableCell(Plain(Str('4')))), alignment=['AlignDefault', 'AlignDefault'], width=[0.5, 0.5]), i.e. if I copy the repr output and paste it, it is a valid expression that generates the same panflute AST.

Currently, __str__ points to the __repr__ method. Maybe the current repr could be moved to __str__, with a new __repr__ that has the behavior described above?
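
To illustrate the requested property, here is a toy sketch with stand-in classes. Node, Str, Plain and TableCell below are simplified placeholders defined for this example, not panflute's classes; the sketch only shows a repr that can be evaluated back into an equal object.

```python
class Node:
    """Toy stand-in for an AST element: positional children plus options."""

    def __init__(self, *children, **options):
        self.children = children
        self.options = options

    def __repr__(self):
        # Children are comma-separated and strings keep their quotes,
        # so the output is a valid Python expression.
        parts = [repr(child) for child in self.children]
        parts += ["{}={!r}".format(k, v) for k, v in sorted(self.options.items())]
        return "{}({})".format(type(self).__name__, ", ".join(parts))

    def __eq__(self, other):
        return (type(self) is type(other)
                and self.children == other.children
                and self.options == other.options)


class Str(Node):
    pass


class Plain(Node):
    pass


class TableCell(Node):
    pass


cell = TableCell(Plain(Str("1")), width=0.5)
text = repr(cell)
```

A test can then compare eval(text) with the original object, or simply compare the repr strings.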

convert_text does not work with tags

convert_text('<https://www.google.com>')

does not work and gives this error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.4/dist-packages/panflute/tools.py", line 176, in convert_text
  out = json.loads(out, object_pairs_hook=from_json)
File "/usr/lib/python3.4/json/__init__.py", line 331, in loads
  return cls(**kw).decode(s)
File "/usr/lib/python3.4/json/decoder.py", line 343, in decode
  obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.4/json/decoder.py", line 359, in raw_decode
  obj, end = self.scan_once(s, idx)
File "/usr/local/lib/python3.4/dist-packages/panflute/elements.py", line 1393, in from_json
  return Link(*c[1], url=c[2][0], title=c[2][1], **_decode_ica(c[0]))
IndexError: list index out of range

Pandoc itself, however, converts it to:

<p><a href="https://www.google.com">https://www.google.com</a></p>

Add autofilter capabilities

panflute can be run as a filter by itself, in turn calling all filters listed in the panflute-filters metadata field

Changes required:

All filters must be callable from Python, so they need to follow a main() convention (they should have a main function that gets called). This breaks compatibility with old filters.
Script hooks in setup.py.
Code should be self-contained in autofilter.py.

TypeError when using `convert_text`

I'm not sure what went wrong. I used the example provided in the docstring, but it resulted in an error:

from panflute import *

md = 'Some *markdown* **text** ~xyz~'

tex = 'Some $x^y$ or $x_n = \sqrt{a + b}$ \textit{a}'

convert_text(md)

convert_text(tex)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-b45a79b257df> in <module>()
      2 md = 'Some *markdown* **text** ~xyz~'
      3 tex = 'Some $x^y$ or $x_n = \sqrt{a + b}$ \textit{a}'
----> 4 convert_text(md)
      5 convert_text(tex)

/usr/local/lib/python3.5/site-packages/panflute/tools.py in convert_text(text, input_format, output_format, extra_args)
    143 
    144     if output_format == 'json':
--> 145         out = json.loads(out, object_pairs_hook=from_json)[1]
    146 
    147     elif output_format == 'doc':  # Entire document including metadata

TypeError: 'Doc' object does not support indexing

Travis deploy to PyPI only depends on passing build of py36

This is a known problem: travis-ci/travis-ci#929 (comment). Travis plans to provide a setting that fixes it in Q1/Q2 this year.

So this can be considered an upstream "bug".

In the meantime, a temporary fix is one of the following:

- Do not increment the version until you know the other builds (especially py27) are passing too.
- Do not commit to the master branch directly; make it read-only instead. This way, the pull request Travis build guarantees that all builds pass.
  - References: "Continuous Integration is Dead" by Yegor Bugayenko, and "Protected branches and required status checks".
  - Personally, I have only implemented this for one private project. Always branching and merging is somewhat troublesome (although it should be the practice).

panflute.Table: empty Table body resulted in `IndexError: list index out of range`

From this section of code:

# finalize table according to metadata
    header_row = table_body.pop(0) if options['header'] else None
    table = panflute.Table(
        *table_body,
        caption=options['caption'],
        alignment=options['alignment'],
        width=options['width'],
        header=header_row
    )

If there is only one row while header is True, table_body becomes [] and panflute.Table(...) raises IndexError: list index out of range. How should an empty table_body be handled? I tried returning an empty string or panflute.Str(), but neither works.

Thanks.
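
One way to guard against this on the caller's side is sketched below. Both helper names are hypothetical, and make_table stands in for whatever builds the real table; the sketch only shows checking for an empty body before construction.

```python
def split_header(rows, has_header):
    """Return (header_row, body_rows); header_row is None when absent."""
    rows = list(rows)
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows


def build_table_or_nothing(rows, has_header, make_table):
    """Call make_table only when at least one body row remains."""
    header, body = split_header(rows, has_header)
    if not body:
        # Assumed convention for this sketch: signal "no table" with None
        # and let the caller decide what to emit instead.
        return None
    return make_table(body, header)


only_header = build_table_or_nothing([["h1", "h2"]], True, lambda b, h: (h, b))
full = build_table_or_nothing([["h1", "h2"], ["1", "2"]], True, lambda b, h: (h, b))
```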

Encoding problem with French characters in headers

Hi !

Thanks for this software. It's quite useful.

I think I found an encoding problem when the markdown source has headers with French characters.

I'm using this filter as an example
https://github.com/sergiocorreia/panflute/blob/master/examples/panflute/headers.py

The following works fine:
pandoc --filter /tmp/headers.py tests/english_title.md -o _build/5011.pdf

where tests/english_title.md contains just this line:

# Normal Title

Now let's create a file called tests/french_title.md with one line:

# Titre en français avec plein de caractères bizarres ïéèàù

And let's run:
pandoc --filter /tmp/headers.py tests/french_title.md -o _build/5011.pdf

Here comes trouble:

Traceback (most recent call last):
  File "/tmp/headers.py", line 17, in <module>
    main()
  File "/tmp/headers.py", line 13, in main
    return run_filter(increase_header_level, doc=doc)
  File "/usr/local/lib/python2.7/dist-packages/panflute/io.py", line 265, in run_filter
    return run_filters([action], *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/panflute/io.py", line 238, in run_filters
    doc = load(input_stream=input_stream)
  File "/usr/local/lib/python2.7/dist-packages/panflute/io.py", line 61, in load
    doc = json.load(input_stream, object_pairs_hook=from_json)
  File "/usr/lib/python2.7/json/__init__.py", line 286, in load
    return loads(fp.read(),
  File "/usr/lib/python2.7/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 47: ordinal not in range(128)
pandoc: Error running filter /tmp/headers.py
Filter returned error status 1
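
The traceback shows json.load reading the input through the ASCII codec. A minimal sketch of decoding the bytes explicitly as UTF-8 instead, so the result does not depend on the locale; load_utf8 is a hypothetical helper, and this is not necessarily how panflute addressed the issue.

```python
import io
import json


def load_utf8(byte_stream):
    """Decode JSON from a binary stream, forcing UTF-8 regardless of locale."""
    text_stream = io.TextIOWrapper(byte_stream, encoding="utf-8")
    return json.load(text_stream)


# Simulate the bytes a filter would receive on stdin.
raw = json.dumps({"t": "Str", "c": "français ïéèàù"},
                 ensure_ascii=False).encode("utf-8")
decoded = load_utf8(io.BytesIO(raw))
```

In a real filter the binary stream would be the process's standard input opened in binary mode rather than an in-memory buffer.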

PDF link in README.md is broken

http://scorreia.com/software/panflute/Panflute.pdf

Converting a Para element to BlockQuote

Hi @sergiocorreia, thanks for making panflute! It's really helped with a couple filters I've wanted to make.

I'm really struggling with a template and hoping you could lend a hand. I want to create a pandoc filter to have a similar effect as with this python-markdown extension that I wrote: https://github.com/nickwynja/mdx_poetic

Essentially, I want to take a markdown file like below and have it convert SoftBreaks to LineBreaks and set the entire "poem" (the text in between |||: and closing |||) in a BlockQuote.

This is just some normal text that is prose. It might have some _markdown_.

|||:

this is a
line of a poem

and this is _another_
stanza of a poem that has a really long line
and more **markdown** in it

|||

And then this is some **more** markdown that follows.

Here's what I have so far. It seems quite wrong, but it does convert the line breaks and then removes the |||: and ||| "syntax" from the file.

#!/usr/bin/env python3

"""
Converts poetics
"""

import panflute as pf

def action(e, doc):
    if isinstance(e, pf.Para):
        s = pf.stringify(e).strip()
        if s == '|||:':
            e = e.next
            while s != '|||':
                e.walk(parse_lines)
                e.walk(blockquote)
                e = e.next
                s = pf.stringify(e).strip()

def blockquote(e, doc):
    if isinstance(e, pf.Para):
        bq = pf.BlockQuote()
        bq.content = [e]
        return bq

def parse_lines(e, doc):
    if isinstance(e, pf.SoftBreak):
        return pf.LineBreak()  # return an instance, not the class

def finalize(doc):
    open_syntax = "|||:"
    close_syntax = "|||"
    doc.replace_keyword(open_syntax, pf.Null())
    doc.replace_keyword(close_syntax, pf.Null())

def main(doc=None):
    pf.toJSONFilter(action, finalize=finalize)


if __name__ == '__main__':
    main()

But I can't for the life of me figure out how to get the Paras from inside the |||: "poetry" block into BlockQuotes. You can see with my blockquote function, I'm taking a stab at it but with no effect.

I'd appreciate any advice you might have.

Thanks!

Improve path discovery of scripts installed through pip

When e.g. pantable gets installed, pantable.exe is placed within $PATH, but panflute can't find it

However, panflute could just do import pantable; doc = pantable.main(doc).

On Python 3.4+, we can check this with:

import importlib.util  # the util submodule must be imported explicitly
spam_spec = importlib.util.find_spec("spam")
found = spam_spec is not None

(or we could just try to import it..)
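
Putting the two steps together, a sketch could look like this. call_filter_module is a hypothetical name, and the sketch assumes every installed filter module exposes a main(doc) function as described above.

```python
import importlib
import importlib.util


def call_filter_module(name, doc):
    """Import module `name` and run its main(doc).

    Returns None when the module cannot be found, so the caller can
    fall back to another discovery mechanism (e.g. searching $PATH).
    """
    if importlib.util.find_spec(name) is None:
        return None
    module = importlib.import_module(name)
    return module.main(doc)


missing = call_filter_module("no_such_filter_module_xyz", {"blocks": []})
```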

404 error following http://scorreia.com/software/panflute/

I’m afraid that following http://scorreia.com/software/panflute/ points to a non-existent resource (I get a 404 error).

RawBlock RAW_FORMATS too restrictive

Currently this line is present in elements.py

RAW_FORMATS = {'html', 'tex', 'latex', 'context'}

Which means a lot of filters are not possible for other output formats. For example, the filter I am currently writing is for docx output and uses raw xml.

For my present situation, changing the line to

RAW_FORMATS = {'html', 'tex', 'latex', 'context', 'openxml'}

would be sufficient, but I also suggest that this check should not exist at all.

Readme.md is missing from the pip package

The readme.md file is missing from the pip package, as can be seen when opening the zip file at https://pypi.python.org/pypi/panflute.

This causes the installation to fail with this error

Collecting panflute
  Downloading panflute-1.0.1.zip
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "c:\users\fred~1.sun\appdata\local\temp\pip-build-8ombhn\panflute\setup.py", line 15, in <module>
        with open(path.join(here, 'README.md'), encoding='utf-8') as f:
      File "C:\Anaconda2\lib\codecs.py", line 896, in open
        file = __builtin__.open(filename, mode, buffering)
    IOError: [Errno 2] No such file or directory: 'c:\\users\\fred~1.sun\\appdata\\local\\temp\\pip-build-8ombhn\\panflute\\README.md'
----------------------------------------

some reference code for the help web page seems out of date

The following page, or the reference code it shows, seems to be out of date:

http://scorreia.com/software/panflute/_modules/panflute/elements.htm

The page(code) says RAW_FORMATS = {'html', 'tex', 'latex'} while in recent code this variable has
openxml etc. also.

line 1313~

TravisCI failing

convert_text() is failing, might be related to the from_json() function

In particular, line 1358 might make it exit prematurely, perhaps adding OrderedDicts at the beginning of that function might help

Wrapping Links with Divs

Hi, I've attempted to iterate over Links and tried Div.content.extend(Link) but I can't seem to get a link wrapped in a div outputted....

Any ideas?

Tables with pipe rendering

See chdemko/pandoc-numbering#8

Feature Request: generalize Doc.get_metadata

Basically it is very similar to what the current YAML filter is doing, except it is not for CodeBlock but for the actual YAML used in the Markdown metadata.

I see that you have an example manipulating the metadata in panflute-filters/unformat_abstract.py at master · sergiocorreia/panflute-filters. But in cases where we don't need to manipulate the metadata and only want to extract some information from it, getting the metadata as a dictionary is more natural. And similar to the current YAML filter, rather than reinventing the wheel in each filter, it may be more useful to add this to panflute as part of the "batteries".

On the other hand, I understand that such a filter is quite different from the current YAML filter, since the original YAML metadata is lost from pandoc.

Something panflute assumes about pandoc tables can be false while pandoc is OK with it

I don't have much, and it might actually be a bug in pandoc, see bug: docx (containing table) to native and docx to markdown then to native is hugely different - Google Groups.

So far what I've got in some kind of MWE is this:

[Table [] [AlignDefault,AlignDefault,AlignDefault] [0.0,0.0,0.0]
 [[Para [Str "x",Space,Str "y"]]
 ,[Para [Strong [Emph [Str "a",Space,Str "b"]]]]
 ,[Para [Strong [Emph [Str "Math"]]]]]
 []]

If you try pandoc -t markdown -f native on this file, pandoc accepts it. But when it is passed through panflute, it complains: IndexError: list index out of range.

It could well be a bug in pandoc, though.

Misc. ideas

1. The .toJSONFilter() and .toJSONFilters() method names are not Pythonic at all (and hard to understand unless you were a previous user of pandocfilters or know the internals of Pandoc). Maybe change them to something like .run_filter() and .run_filters() (but keep the old names as wrappers for compat!)
2. Running several filters one after another is slow, because each filter has to decode JSON from stdin and then encode it back, and that's where most of the time is spent.
   Maybe we can list the filters we want to run in the YAML metadata (eg: panflute-filters: onefilter, another).
   Then, we can do pandoc somedoc.md -F panflute, and panflute itself can be used as a filter that calls the filters listed in the metadata. This fixes problem 2.

TLDR:

Rename .toJSONFilter() while keeping the old name as wrapper for compatibility
Allow panflute to be run as a filter, where it calls the list of filters listed in the metadata.
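Both ideas can be sketched on plain JSON. The functions below are simplified stand-ins written for this illustration, not panflute's actual implementation or signatures: `run_filters` decodes the document once, applies every action in order, and encodes once, and the old names are kept as aliases.

```python
import io
import json


def walk(node, action):
    """Apply ``action`` bottom-up to every dict node in the JSON tree."""
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        node = {key: walk(value, action) for key, value in node.items()}
        result = action(node)
        # An action that returns None leaves the node as it is.
        return node if result is None else result
    return node


def run_filters(actions, input_stream, output_stream):
    """Decode the JSON once, run all actions in order, encode once."""
    doc = json.load(input_stream)
    for action in actions:
        doc = walk(doc, action)
    json.dump(doc, output_stream)


def run_filter(action, input_stream, output_stream):
    """Convenience wrapper for the single-filter case."""
    run_filters([action], input_stream, output_stream)


# Old names kept as thin aliases for backward compatibility.
toJSONFilter = run_filter
toJSONFilters = run_filters
```

With this shape, the list of actions could come from the panflute-filters metadata field, so the JSON round trip is paid only once regardless of how many filters are listed.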

panflute.yaml_filter: error handling when YAML is invalid

I'm writing a test on invalid YAML:

``` {.table}
---
caption: *unquoted*
...
1,2
3,4
```

which gives this error:

yaml.scanner.ScannerError: while scanning an alias
  in "<unicode string>", line 1, column 10:
    caption: *unquoted*
             ^
expected alphabetic or numeric character, but found '*'
  in "<unicode string>", line 1, column 19:
    caption: *unquoted*
                      ^

I'm thinking about how to handle the error. Maybe an option could be given to panflute.yaml_filter so that it does nothing (leaves the code block as is) rather than raising an error?
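One possible shape for such an option, as a rough sketch only. The helper `load_options` and its `strict` flag are hypothetical and not part of panflute. To keep the example self-contained it uses `json.loads` and `ValueError` as stand-ins; with PyYAML the loader would be `yaml.safe_load` and the error type `yaml.YAMLError`.

```python
import json


def load_options(raw, loader, error_types, strict=True):
    """Parse ``raw`` with ``loader``; on failure re-raise or return None."""
    try:
        return loader(raw)
    except error_types:
        if strict:
            raise
        # Non-strict mode: signal "leave the code block untouched".
        return None


# Stand-in usage with JSON instead of YAML.
options = load_options("caption: *unquoted*", json.loads, ValueError, strict=False)
if options is None:
    pass  # the filter would return None here so the element is kept as is
```

Keeping `strict=True` as the default would preserve the current behaviour, so only filters that opt in would silently skip invalid blocks.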

How to write a filter that runs all filters that are written in code chunks inside markdown?

How to write a filter (meta-filter) that runs all filters that are written in code chunks inside markdown?

Some code chunk can have attributes like panflute-name=filter_x. And metadata can have:

panflute-filters: [meta-filter, filter_x, filter_y]
panflute-path: 'C:/meta-filter'

I know how to find all such chunks. The meta-filter then writes the code from each chunk to its own file, like C:/meta-filter/filter_x.py, before its main() function.

The question: would filter_x run in this scenario, or would reading that file (or checking whether it exists) occur before the first meta-filter run?

plantuml example does not work

The plantuml example seems to be missing a lot of code. It uses variables that were never initialized; for example, it uses filename, which is never defined or assigned a value.

Make Python2 version of `panflute` pip installable

Documentation discrepancies: pandoc-filters or panflute-filters meta-data field.

The general description of panflute (here) indicates that you should use the meta-data field pandoc-filters; however, this link states in more detail that you should use panflute-filters.

I assume that the first link is a typo or legacy?

Support Pandoc's new JSON API

Pandoc will introduce several "breaking changes" in its JSON format (changelog).

Pandoc changes

Elements with no children (Space, SoftBreak, LineBreak, HorizontalRule, Null) previously had an empty array for their "c" value. We remove that here.

New toplevel JSON format with api-version. The toplevel format was previously:

[{"unMeta": META}, [BLOCKS]]

It is now:

{
 "pandoc-api-version" : [MAJ, MIN, REV],
 "meta" : META,
 "blocks": BLOCKS
}

Add a LineBlock block element
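A loader that has to cope with both layouts could tell them apart roughly as follows. This is a sketch on plain JSON with made-up helper names (`normalize_document`, `element_content`), not the code panflute ended up with.

```python
def normalize_document(doc):
    """Return (api_version, meta, blocks) for either top-level layout."""
    if isinstance(doc, dict) and "pandoc-api-version" in doc:
        # New layout: a dict with an explicit API version.
        return tuple(doc["pandoc-api-version"]), doc["meta"], doc["blocks"]
    # Old layout: [{"unMeta": META}, [BLOCKS]]
    meta_wrapper, blocks = doc
    return None, meta_wrapper["unMeta"], blocks


def element_content(element):
    """Return the "c" value, treating a missing key as an empty list."""
    return element.get("c", [])
```

Reading "c" through a helper like `element_content` means Space, SoftBreak, LineBreak, HorizontalRule and Null can be handled the same way whether or not the empty array is present.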

UNIX: error as panflute.shell() replaces slash to backslash

`panflute.shell()` internally replaces each forward slash `/` in the given arguments with a backslash `\`, but this causes an error in a Unix shell.

import panflute as pf

pf.shell("ls") # works
pf.shell("/bin/ls") # fails

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/site-packages/panflute/tools.py", line 288, in shell
    proc = Popen(args, stdin=PIPE, stdout=PIPE, stderr=PIPE)
  File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 947, in __init__
    restore_signals, start_new_session)
  File "/usr/local/Cellar/python3/3.5.2_1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg)
FileNotFoundError: [Errno 2] No such file or directory: '\\bin\\ls'

Is this intended? The other way round would make sense to me, because the Windows shell understands forward slashes.

It looks like the cause of the error is in tools.py, line 285:

...

    args = [arg.replace('/', '\\') for arg in args]

...

Confirmed on Ubuntu 16.04 and macOS.
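One way the replacement could be limited to Windows is sketched below. The helper name `normalize_args` and its `os_name` parameter are assumptions made for this example, not the fix adopted in panflute; dropping the replacement altogether would be another option.

```python
import os


def normalize_args(args, os_name=os.name):
    """Swap forward slashes for backslashes only when running on Windows."""
    if os_name == "nt":
        return [arg.replace("/", "\\") for arg in args]
    # On POSIX systems the arguments are passed through unchanged.
    return list(args)
```

Passing `os_name` explicitly is only there to make the behaviour easy to check on any platform; in normal use the default `os.name` decides.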

Install via conda

It would be nice to have panflute installable in conda (Anaconda/Miniconda). Please consider putting it to conda-forge.

By the way, I've seen your ideas about Pandoc filters distribution and came to the conclusion that it would be cleaner to distribute filters as separate pip or conda packages. Conda is well suited to distributing compiled binaries, so I'm going to put even pandoc-crossref there some day.
