
Comments (11)

sergiocorreia avatar sergiocorreia commented on July 21, 2024
  • Fetch filters from $DATADIR/filters first (on Windows and *nix), and then from the PATH
  • The natural extension is to fetch filters from an external repo and dump them into $DATADIR/filters, but that would probably be another package
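
A minimal sketch of that lookup order, assuming the data directory is passed in explicitly. The helper name find_filter is hypothetical and not part of panflute:

```python
import os
import shutil


def find_filter(name, datadir):
    """Return the path of a filter, checking datadir/filters before PATH.

    `find_filter` is a hypothetical helper, not a panflute function.
    """
    candidate = os.path.join(datadir, "filters", name)
    if os.path.isfile(candidate):
        return candidate
    # Fall back to the executable search path; None if nothing is found.
    return shutil.which(name)
```

Checking the data directory first means a filter installed there shadows one with the same name on the PATH.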

A few useful filters should be more easily available:

  • debug_json --> pretty print the resulting JSON
  • frequency --> print freq tables
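
As a rough illustration of what these two helpers could do (the function names pretty and tag_frequency are made up for this sketch, and it works on the raw pandoc JSON rather than on panflute objects):

```python
import json
from collections import Counter


def pretty(doc_json):
    """Re-serialize a pandoc JSON string with indentation for debugging."""
    return json.dumps(json.loads(doc_json), indent=2)


def tag_frequency(node, counts=None):
    """Count element types (the "t" field) anywhere in the parsed AST."""
    if counts is None:
        counts = Counter()
    if isinstance(node, dict):
        if "t" in node:
            counts[node["t"]] += 1
        for value in node.values():
            tag_frequency(value, counts)
    elif isinstance(node, list):
        for item in node:
            tag_frequency(item, counts)
    return counts
```

The input would typically come from pandoc -t json, and tag_frequency(...).most_common() gives the frequency table.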

from panflute.

sergiocorreia avatar sergiocorreia commented on July 21, 2024


sergiocorreia avatar sergiocorreia commented on July 21, 2024
  • Add an OrderedDict when parsing every element, to avoid depending on a specific order of the "t" and "c" fields
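
One way this could look, as a sketch only (parse_element is a hypothetical name, and panflute's actual parser may differ):

```python
import json
from collections import OrderedDict


def parse_element(text):
    """Parse one JSON element, keeping key order but not relying on it."""
    elem = json.loads(text, object_pairs_hook=OrderedDict)
    # Look fields up by name, so {"t": ..., "c": ...} and
    # {"c": ..., "t": ...} are handled identically.
    return elem["t"], elem.get("c")
```

Using elem.get("c") also covers elements such as Space that carry no "c" field at all.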


ickc avatar ickc commented on July 21, 2024

Allow panflute to be run as a filter, where it calls the list of filters listed in the metadata.

Is there any interest in turning this repo into a centralized panflute filters gallery? I'm building an extended version of the panflute csv2table filter, based on yours, in ickc/pandoc-table-csv-test/panflute-csv2table.ipynb. I have almost finished it (I still need to think about the exact metadata keys to use, cleanup, etc.) and am thinking about how to distribute it.

On pandoc-discuss we discussed the need for a centralized pandoc filters library that is also easy to install. I'm thinking maybe we can start from panflute? Say everyone made pull requests of their scripts into panflute (with some minimum requirements, e.g. naming scheme, version numbering, etc.); the scripts would then be bundled with panflute, with the said metadata controlling which filters are used. All people would need is to add, say, --filter=panflute to the pandoc args.

I am considering porting my pandoc-amsthm to panflute too. And I need a variant of pandoc-includes (the one on panflute seems great). I considered writing Haskell filters, but it is a pain to make sure colleagues can install them. pip is much easier (because Python is almost ubiquitous), but @jgm specifically said his pandocfilters isn't a centralized repository. So you are my last hope to streamline the use of pandoc filters. No pressure though. 😄


kdheepak avatar kdheepak commented on July 21, 2024

I agree that having a centralized repository will be the best path forward. Would this fit an organizational structure better?


sergiocorreia avatar sergiocorreia commented on July 21, 2024

This would be a cool thing to have. Now, how would this work exactly?

  • Should we keep all filters in one repo? If so, we could use https://github.com/sergiocorreia/panflute-filters , and I can give you commit access.
  • Else, maybe, as @kdheepak suggested, an organization works better?
  • Borrowing from other projects, we could also have one repo with a CSV file that lists all the filters, and then the repo only gets updated when a filter is added, and each of us can work on our own filters without much worry. This is perhaps the saner alternative in the long term, as less trust is required when other contributors add their own filters.
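
To make the CSV option concrete, here is a sketch of reading such an index. The column names name and url are assumptions for illustration, since no format has been agreed on:

```python
import csv
import io


def load_index(csv_text):
    """Map filter names to repository URLs from a CSV index.

    The "name" and "url" columns are hypothetical; the real index
    could use any layout.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["name"]: row["url"] for row in reader}
```

With this layout, adding a filter to the index is a one-line change to the CSV file.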

About the role of panflute: maybe we can list the filters used as metadata, and then have panflute auto-install them from this repo?


ickc avatar ickc commented on July 21, 2024

Organization

I have been thinking about setting up a GitHub organization for pandoc. It would actually be nice to have pandoc, panflute, etc. all fall under one umbrella organization. I haven't asked @jgm, but I think he probably wouldn't want to do that.

[Sidenote about the organization: sadly, the name pandoc has already been taken by someone who has one repo with no active development, whose contents are pirated Chinese fiction in pandoc markdown. I already filed a complaint (that the content violates copyright and hence the GitHub terms and conditions), but GitHub refuses to take it down and requires the copyright owner to file the complaint. However, while I "know" the copyright owners (among the best Chinese fiction writers), they don't know me.]

Anyway, I suggest that if an organization is set up, its name should be more generic and allow the inclusion of projects other than panflute. This will become the "centralized gallery" I've been talking about. Possible names are:

  • lapandoc: a word play on LaTeX from TeX, but la probably stands for Lamport, the creator of LaTeX.
  • pandocx: x stands for extra, but people might think pan-docx rather than pan-doc-x
  • pandoc-extras: boring but clear

all filters in one repo

I agree: wherever that repo is (say, if a GitHub organization is used), it would be excellent if panflute could auto-install filters from it behind the scenes. With the latest version of pandoc, that just means putting them in data-dir/filters, which seems more secure. With earlier versions of pandoc, panflute would need to either put those filters somewhere in the PATH, or add the directory it installs to onto the PATH. Either way, it is insecure. I guess if this feature is implemented, we should say it is for pandoc >= 1.18 only.
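
A sketch of how the version gate could be checked, assuming the version string has already been obtained (for example from the first line of pandoc --version). The function name is hypothetical:

```python
def supports_datadir_filters(version):
    """True if a pandoc version string such as "1.18" or "1.17.2" is >= 1.18.

    Hypothetical helper; assumes a purely numeric dotted version.
    """
    parts = tuple(int(p) for p in version.split("."))
    return parts >= (1, 18)
```

Comparing tuples of integers avoids the string-comparison trap where "1.9" would sort after "1.18".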

Centralized Gallery

I kind of did the CSV thing in ickc/pandoc-filters/pandoc-filters.csv for currently available filters (but far from finished).

I think the list of all (panflute-)filters should live in the same repo that contains those filters (say, panflute-filters), for easier organization. We can ask whoever makes a pull request to the repo to also add their entry to the list (with a link to the documentation, perhaps).

However, there can be another, separate repo that contains references to filters not in panflute-filters (maybe I just transfer mine to the organization).

I think if we could auto-generate a website gallery from it, it would be great for filter discovery. I have some vague ideas about it, but don't know the best way to do it. (gh-pages has more limitations but integrates seamlessly with GitHub. Travis is needed for tests anyway, but requires more setup to customize a website build. And then there is the question of which tool to use: jekyll, yst, makefile+pandoc, etc.)


ickc avatar ickc commented on July 21, 2024

Just to mention another bonus of having a centralized repo for panflute filters: the naming scheme for filters can be shorter. Currently, people give filters names like pandoc-includes, pandoc-csv2tables, pandoc-placetables, pandoc-amsthm, etc. because they are submitted to cabal, pip, etc., and the pandoc prefix identifies them among the sea of packages. If the panflute filters live in one repo, the prefix won't be necessary, which allows cleaner, shorter names.


sergiocorreia avatar sergiocorreia commented on July 21, 2024

I like your proposal, but my main concern is that complexity can explode. I think that there are several interlinked issues that we will benefit from treating separately:

  1. Filter hosting: let's not host all filters in one repo, it's not really needed and would create barriers to adoption. Something that might be useful is to allow for yaml files with the description of each filter (e.g. if you have pandoc-csv.py, then also have pandoc-csv.yaml, that has labels, description, sample usage, etc.)
  2. Instead, let's have a repo that lists the filters. So whenever you add a filter and want it indexed, just submit a PR with a one-line change. This repo can be in the pandoc-extras org (something like pandoc-extras/panflute-filters)
  3. Independently of that, someone can scrape the repos listed in (2) and create a nice gallery, also using the metadata discussed in (1).
  4. Finally, installing can be done from panflute.

About step 4, do you know how to use setup.py to include executable files? (I think they're called entry points.) It would be cool if we allowed panflute to be a filter, so if you do pandoc -F panflute .. then panflute checks the metadata and downloads and runs the required filters.
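
A sketch of the "checks the metadata" part, working on the parsed JSON that pandoc passes to a filter. It assumes the pandoc >= 1.18 layout with a top-level "meta" object, and the metadata key filters is a placeholder, since no key name has been settled here:

```python
def _meta_text(item):
    """Flatten a MetaString or MetaInlines value to plain text."""
    if item["t"] == "MetaString":
        return item["c"]
    parts = []
    for inline in item["c"]:
        if inline["t"] == "Str":
            parts.append(inline["c"])
        elif inline["t"] == "Space":
            parts.append(" ")
    return "".join(parts)


def filters_from_meta(doc, key="filters"):
    """List the filter names requested in the document metadata.

    `key` is a hypothetical metadata field name.
    """
    entry = doc.get("meta", {}).get(key)
    if entry is None or entry.get("t") != "MetaList":
        return []
    return [
        _meta_text(item)
        for item in entry["c"]
        if item.get("t") in ("MetaString", "MetaInlines")
    ]
```

The returned names could then be looked up, installed if missing, and applied to the document in order.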


ickc avatar ickc commented on July 21, 2024

panflute as filter

I think it is something like

entry_points={
    'console_scripts': [
        'panflute = panflute:main',
    ],
},

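(And you need to provide a main function that the entry point can import from the panflute package; a __main__.py is only needed if you also want python -m panflute to work.) If you want CLI options, getopt would work.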

Centralized repo or not?

One of the complexities that needs to be balanced is security. Let's say panflute chooses the safest approach: it only copies filters to $DATA-DIR/filters and supports this feature (auto-downloading filters) for pandoc >= 1.18 only. Even in this case there might be security implications, since a user might have previously added $DATA-DIR/filters to their PATH (when they were working with an earlier version of pandoc). Anything copied to that folder would then be in the PATH and executable without sudo (probably, depending on the user's setup). So panflute would open a point of attack for installing arbitrary code.

And even if $DATA-DIR/filters is not in the PATH, panflute running the filters automatically still means it's an opening for attack.

[Sidenote: I'm considering writing a filter that can execute code in the markdown source, say, through exec or ! in IPython. This also has security implications. Hypothetically, if such a filter were submitted as a pull request to the said centralized repository, I'm not sure whether it should be accepted, given that the point of the review is security.]

That's the reason behind having the filters hosted in the same centralized repository. This way, the core-developers can verify the code is not malicious, and any change to the code requires a separate pull request for sanitization.

[Another sidenote: I think the closest thing to our idea is \usepackage in LaTeX. Arbitrary \usepackage commands can be specified in the document, so the packages are centralized in CTAN for sanitization and distribution.]

If we really do not want centralized hosting, then we might need to learn from how, say, brew handles it. For each additional unknown repository, you need to brew tap into it (manually). brew also calculates the SHA-256 sum to check that the source hasn't been modified (meaning that if the source is modified, a separate pull request is required to update the SHA-256 sum, hence in principle it is sanitized). This approach, however, will take away the "seamless" part of our (at least my) dream.
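
A checksum check of that kind could be sketched as follows. This is only an illustration of pinning a SHA-256 sum, not how brew itself is implemented, and verify_sha256 is a made-up name:

```python
import hashlib


def verify_sha256(data, expected_hex):
    """Return True if the SHA-256 digest of `data` (bytes) matches the pinned value."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()
```

A downloaded filter would only be installed when the check passes; any change to the filter source then requires a reviewed update of the pinned sum.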

But I understand the concern about complexity. For example, we can define rules for submitting a filter (including a clear standard for specifying the author). Every issue submitted has to mention the author by name, and the author deals with the bug (this is how Travis CI supports third-party/community-based languages). In addition, tests and docs might also be required.

There's potentially a problem of resistance to adoption, and it might consume too much time (who knows how much busier we will become). But I think the security issue is more important. panflute would be given too much power (by downloading arbitrary executable code, either directly or indirectly), and hence we should guard the filters it can download more carefully.

On the other hand, the added barrier might mean higher-quality filters being submitted and fewer pull requests to deal with. Given that the pandoc community is relatively small and (probably) not many people are writing pandoc filters (although I'm sure one of the goals of panflute is to change this!), we probably won't be too busy. (One data point to back up this argument: after a decade, the list in Pandoc Filters · jgm/pandoc Wiki is not very long. I'm sure only the people motivated enough to put a link to their filter in the pandoc wiki will be motivated to try a centralized filter repo.)

By the way, I don't think having their filters submitted to a centralized repo means they can't have their own repo. Just like with CTAN, some of the sources can live elsewhere (say, on GitHub). They can even write a script to prepare their code for submission to our centralized repo.


sergiocorreia avatar sergiocorreia commented on July 21, 2024

Closing this as all the ideas are now either in separate issues or have been implemented.

Also see: https://github.com/sergiocorreia/panflute/projects/1
