
parsy's Introduction

parsy


Parsy is an easy and elegant way to parse text in Python by combining small parsers into complex, larger parsers. If it means anything to you, it's a monadic parser combinator library for LL(infinity) grammars in the spirit of Parsec, Parsnip, and Parsimmon. But don't worry, it has really good documentation and it doesn't say things like that!

Parsy requires Python 3.7 or greater.

For a good example of the kind of clear, declarative code you can create using parsy, see the SQL SELECT statement example or JSON parser.


To contribute, please create a fork and submit a pull request on GitHub, after checking the contributing section of the docs. Thanks!

If you like parsy and think it should be better known, you could:

  • Star this project on GitHub.
  • Vote for it being included on awesome-python.

Parsy was originally written by Jeanine Adkisson, with contributions by other people as can be found in the git commit history.

parsy's People

Contributors

bugaevc, cabiad, camerondm9, elricbk, isomorpheme, jneen, lisael, mcdeoliveira, snowdrop4, spookylukey, unode, zefr0x


parsy's Issues

Bug with backtracking and generate?

It appears that parsy has issues with, e.g.,

import parsy
@parsy.generate
def parser():
  a = yield parsy.regex('.*')
  b = yield parsy.string('za')
  return [a, b]
parser.parse('hiza')

Which yields

ParseError: expected 'za' at 0:4

This looks like a backtracking issue to me, but I did not investigate parsy's internals too much.
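
For reference, a sketch of a workaround under that assumption: constrain the first parser so it cannot swallow the delimiter, instead of relying on backtracking (the lookahead regex below is just one way to do that).

import parsy

@parsy.generate
def parser():
    # Non-greedy match with a lookahead, so 'za' is left for the next parser.
    a = yield parsy.regex(r'.*?(?=za)')
    b = yield parsy.string('za')
    return [a, b]

parser.parse('hiza')  # ['hi', 'za']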

Add `one_of` parser

Should accept either a string (which will be split into characters), or a list of strings.

def one_of(strings):
    """
    Returns a parser that matches any of the passed in strings
    """
    # Sort longest first, so that backtracking works correctly
    return alt(*map(parsy.string, sorted(strings, key=lambda s: -len(s))))
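
A self-contained usage sketch of the proposed helper (one_of is the proposal above, not an existing parsy API):

from parsy import alt, string

def one_of(strings):
    """Return a parser that matches any of the passed-in strings."""
    # Sort longest first, so that "abc" is tried before "ab".
    return alt(*map(string, sorted(strings, key=lambda s: -len(s))))

one_of(["ab", "abc"]).parse("abc")  # 'abc' (longest alternative wins)
one_of("xyz").parse("y")            # 'y' (a plain string is split into characters)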

async/await support in generate()

Hi there,

I was wondering if you would accept a PR that adds support for await syntax in generate() as an alternative to yield. The reason is mainly that await syntax contains fixes that were never "backported" to yield, such as the ability to use it inside a list/dict/generator comprehension.

For example I wanted to create a simple combinator:

def as_tuple(*parsers):
    return tuple(
        yield p
        for p in parsers
    )

But that is not valid Python syntax. OTOH, this is:

async def as_tuple(*parsers):
    return tuple([
        await p
        for p in parsers
    ])

I would be happy to make a PR as the support is trivial to add:


import functools


class Parser:
    def __init__(self, x):
        self.x = x

    def __repr__(self):
        return f'Parser({self.x})'

    def __await__(self):
        return (yield self)


def coroutine_function_to_generator_function(f):
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        coro = f(*args, **kwargs)
        x = None
        excep = None
        while True:
            try:
                if excep is None:
                    future = coro.send(x)
                else:
                    future = coro.throw(excep)
            except StopIteration as e:
                return e.value
            else:
                try:
                    x = yield future
                except BaseException as e:
                    excep = e
                else:
                    excep = None

    return wrapper


@coroutine_function_to_generator_function
async def as_tuple(*parser):
    return tuple(
        [
            await p
            for p in parser
        ]
    )


print(list(
    as_tuple(Parser(1), Parser(2))
))

generate() could use inspect.iscoroutine() to check if the value returned by the wrapped function is a coroutine or a generator (it's usually better to do that rather than test the function itself with inspect.iscoroutinefunction() since a transparent wrapper might not be a coroutine function itself, e.g.:

# This is a coroutine function
async def f():
    ...

# This is not a coroutine function (not `async def`), and yet it returns a future all the same.
def wrapper(*args, **kwargs):
    return f(*args, **kwargs)

)

alt doesn't use fallback parsers when initial parser has .many()/.sep_by()

Steps to reproduce

>>> p = alt(string("a").many(), string("abc"))
>>> p.parse("abc")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../parsy/__init__.py", line 98, in parse
    (result, _) = (self << eof).parse_partial(stream)
  File "/.../parsy/__init__.py", line 112, in parse_partial
    raise ParseError(result.expected, stream, result.furthest)
parsy.ParseError: expected one of 'EOF', 'a' at 0:1

The expected output should be "abc".

I'm not sure whether this is intended behaviour or a bug. Intuitively, I would expect the parser to fall back to another parser when the first one fails, without special cases that break this rule.
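
For what it's worth, reordering the alternatives so the more specific parser comes first does give the output I expected; this is the usual workaround for ordered choice, though it doesn't answer whether the original behaviour is intended:

from parsy import alt, string

p = alt(string("abc"), string("a").many())
p.parse("abc")  # 'abc'
p.parse("aa")   # ['a', 'a']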

Recompute line number for ParseError passed up from .bind

ParseErrors in a ".bind" contain stream and index information relative to their own context.
If this isn't handled in the bind, but rather passed up through the bind to the parser with the larger context, a new ParseError should be created containing information from the nested parse context and the current parse context (perhaps with an attribute containing a nested structure similar to the error graph - or maybe it's just linear?).

My naive, messy implementation just adds arguments to the .bind function and has a user-supplied function recompute the index information by adding the index of the subparser to the index from the outer context, which I'm not sure is 100% correct.

Strange peek behaviour/generator exception handling?

I've been staring at this for two days and haven't been able to figure it out: I'm pretty sure at this point the problem is as described in the later paragraphs, and I rubber-ducked myself.

Disregarding my weird function naming, why does block_lex_transformer keep trying to parse past the end of the input? I figured peek() would yield a ParseError if I tried to do that, but it doesn't seem to be the case?

peek(any_char).parse_partial("") gives a ParseError.

I don't know much about how coroutines/generators work, my suspicions are that this isn't working because a GeneratorExit exception seems to come from yield peek(seq(sideToken, line(id))) , but I don't know how to deal with it.

I think that happens because, from what I found online, an exception thrown into a generator will stop it; I didn't immediately see anything explaining how to have it continue and catch the exception.

# some context for the following snippets
from dataclasses import dataclass
from parsy import *

@dataclass
class Neutral:
    depth: int

@generate
def neutral():
    return (yield test_item(lambda i: isinstance(i, Neutral), "neutral"))

sideToken = neutral

take = test_item(lambda x: True, "any nested")

def wrap(fn):
    def thing(res):
        try:
            result = fn.parse(res)
            return success(result)
        except ParseError as e:
            return fail("idk, failure '%s'" % e)
    return thing


def line(fn):
    return take.bind(wrap(fn))

@generate
def id():
    return (yield any_char.many().concat())

@generate
def block_lex_transformer():
    token = yield sideToken  # TODO ok?
    curDepth = token.depth

    res = [Neutral(0), (yield line(id))]
    while True:
        try:
            side, ln = yield peek(seq(sideToken, line(id)))  # go to except branch if fail
        except ParseError as e:  # todo havent actually tested this explicitly
            if not e.stream:
                return res
            else:
                raise e
        if side.depth >= curDepth or ln == "":  # as deep or deeper or empty line
            (yield sideToken), (yield line(id))  # actually consume the two tokens we just peeked
            res += [side.__class__(max(side.depth - curDepth, 0)), ln]
        else:
            return res
    return res

block_lex_transformer.parse([Neutral(depth=0), 'test'])

New release?

There have been some new capabilities added to the project since the November 2017 release. Are there any plans to cut a new release that is more recent?

[bug] Parser.desc() causes loss of error information, simple fix

Parser.desc causes loss of error information on failure of the original parser. So when a component parser (say in a generate) with a desc fails, the super-parser will not show the correct position or expected message in the result. I can give an example if you like, but it's easier just to look at the code because the problem is fixed with a small change: aggregate the original parser's failed result into the return value.

In Parser.desc, the original wrapped function looks like

        @Parser
        def desc_parser(stream, index):
            result = self(stream, index)
            if result.status:
                return result
            else:
                return Result.failure(index, description)

In the case of failure, the error information in result is lost because it is not used, giving the unexpected behavior. Instead, just change the last line as follows:

        @Parser
        def desc_parser(stream, index):
            result = self(stream, index)
            if result.status:
                return result
            else:
                return Result.failure(index, description).aggregate(result)

and the error information is correct.

Thanks. I'm enjoying using parsy.

Weird bind implementation?

Hi, thanks for making parsy, I like it!

I'm currently succeeding at shooting myself in the foot with heterogeneous inputs, because I want to be able to process a list of "symbols" and strings:

#some context for the following snippets
from dataclasses import dataclass

@dataclass
class Indent:
  depth: int

@generate
def indent():
  return (yield test_item(lambda i: isinstance(i, Indent), "not indent"))

sideToken = indent

newline = string("\r\n")
preline_ = (newline.should_fail("not newline") >> any_char).many().concat()
line = preline_ << newline
take = test_item(lambda x: True, "any nested")

I'm probably not understanding something, why do I need to use .bind like this?:

>>> wrap = lambda fn: lambda res: success(fn.parse(res))
>>> ( sideToken >> take.bind(wrap(line)) ).many() \
        .parse([Indent(0), "asd\r\n", Indent(1), "asdf\r\n"])
['asd', 'asdf']

instead of: (the following is pseudocode)

>>> ( sideToken >> take.bind(line) ).many() \
        .parse([Indent(0), "asd\r\n", Indent(1), "asdf\r\n"])
['asd', 'asdf']

Which is to say, why do I have to add an extra layer of wrapping and calling .parse?

parser.times(None) does not work

The definition of times reads:

def times(self, min, max=None):
    # max=None means exactly min
    # min=max=None means from 0 to infinity

But setting min=max=None doesn't actually work. many calls self.times(0, float('inf')) to get the zero-to-infinity behavior -- the comment is probably out of date and should be removed if that's the case.

Help with parsing that "hangs"

Dear community, thanks for the nice library. I am not very familiar with parser combinator libraries yet, but I sometimes encounter expressions that get stuck, such as this one:

>>> spaces = regex(r'[ \t]*')
>>> word = regex('[a-zA-Z0-9\-._:%]*')
>>> words = word.sep_by(spaces)
>>> words.parse('ak kjd l  lksdjf')

Any guidance on what I'm doing wrong here? Thank you.
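
For reference, the following version does terminate for me, presumably because in the original both word and spaces can match the empty string, so sep_by never advances:

from parsy import regex

spaces = regex(r'[ \t]+')            # at least one space/tab as the separator
word = regex(r'[a-zA-Z0-9\-._:%]+')  # at least one character per word
words = word.sep_by(spaces)

words.parse('ak kjd l  lksdjf')      # ['ak', 'kjd', 'l', 'lksdjf']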

Infinite recursion when parsing expression grammar

I am trying to write a parser program that will parse a very minimal expression language; for now, all I want are variable names, string or numeric values, basic comparators (=, !=, <, >, <=, >=), boolean operators (AND, OR, and NOT), symbols for true, false, and null, and parentheses. I want to break these expressions up into parts that I can then assign to a syntax tree.

I've been able to make a parser which can do the simple comparison case, of "variable comparator value" form: x = 1, y = 2, etc. I can make it strip off the parens, though I don't yet know how to implement precedence. And I do have basic handling of an isolated boolean operator: I can parse a single AND expression, or OR expression, of the form "foo = 1 AND bar != 2" or the like.

But when I try to use the generalized form, the system overflows the maximum recursion stack. I'm not sure what I'm doing wrong here?

I've attached my code file. I cribbed some of it from the JSON example, such as the lexeme trick for dealing with whitespace.

parsertest.py.txt

RFC: Automagical left recursion elimination

Sometimes, it would be very convenient to implement a parser using left recursion, like in this example:

number = digit.at_least(1).map(''.join).map(int)

@generate
def expr():
    lhs = yield expr
    op = yield string('+') | string('-')
    rhs = yield number
    if op == '+':
        return lhs + rhs
    else:
        return lhs - rhs

expr |= number

Note that this cannot be easily rewritten to use number +/- expr, because we need to parse 1 - 2 - 3 as (1 - 2) - 3 and not 1 - (2 - 3).

This currently doesn't work because calling expr inside of expr causes infinite recursion. The proper way to implement this is to unwrap the recursion manually, i.e. to use an explicit loop that consumes and adds/subtracts a sequence of numbers. However, this is much less convenient than implementing 'the mental model' directly.
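
For comparison, a sketch of the manual unwrapping mentioned above: a loop that folds the operators left to right, so 1 - 2 - 3 still groups as (1 - 2) - 3.

from parsy import digit, generate, string

number = digit.at_least(1).map(''.join).map(int)

@generate
def expr():
    # Parse "number (op number)*" and fold left, instead of recursing on expr.
    result = yield number
    while True:
        op = yield (string('+') | string('-')).optional()
        if op is None:
            return result
        rhs = yield number
        result = result + rhs if op == '+' else result - rhs

expr.parse('1-2-3')  # -4, i.e. (1 - 2) - 3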

This RFC proposes for parsy to be able to do such a transformation automatically.

A proof-of-concept (barely readable) implementation is here, and here's an explanation in pseudocode (that skips over many small details):

recursion_detected = False
parser_is_running = False
saved_result = None

def invoke_parser():
    global parser_is_running, recursion_detected, saved_result

    if not parser_is_running:
        # we are the first invocation of this parser at this index
        parser_is_running = True
        res = normal_logic()
    else:
        # we are the second left-recursive invocation
        recursion_detected = True
        return saved_result or fail('recursion detected')

    # we are the first invocation again
    if not recursion_detected or res.is_fail():
        parser_is_running = False  # and other cleanup
        return res

    while True:
        saved_result = res
        res = normal_logic()
        if res.is_fail():
            parser_is_running = False  # and other cleanup
            return saved_result

This works as if the recursion magically failed at just the right depth, giving the alternative sub-parsers (number in the example above) a chance to try their logic.

Unresolved questions:

  • Should this be implemented for every parser, or should it be an explicit wrapper? If the latter, do we call it parsy.recursion_catcher(), Parser.left_recursive(), or something else?
  • Does this work for cases that are more complex than a simple left recursion in one of the parsers?
  • How do we make it thread- (and multiple independent invocation-) safe?

maintaining state while parsing

Sometimes while parsing you need to keep some global state, e.g. a footnote number while parsing a Markdown document.
What would be the preferred way to do so in parsy? The only thing that comes to my mind is global variables.
I tried encapsulating the state in a class, but the generator function doesn't accept arguments:

@generate
def function(arg1, arg2):  # does not work

I am new to decorators in Python and could not get my head around this.
Maybe someone could provide an example?
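
A minimal sketch of the closure idiom, assuming the goal is to pass parameters (or a mutable state object) into a @generate parser instead of using globals (the footnote example is made up):

from parsy import generate, regex

def footnote(counter):
    # counter is any mutable state shared with the caller, e.g. a one-element list.
    @generate
    def parser():
        text = yield regex(r'\[\^[^\]]+\]')
        counter[0] += 1
        return (counter[0], text)
    return parser

state = [0]
footnote(state).parse('[^note]')  # (1, '[^note]')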

Interested in a version of parsy with type annotations?

I had a look at the code base, it shouldn't be too difficult to add type annotations. I'd be happy to make a PR in that direction.

  • Is there interest in that direction?
  • If typing the code requires breaking changes or bumping the minimal Python version (I don't see why, but you never know), would you prefer releasing a new major version, or that I fork the library?

Processing list of tokens

Thanks for the library; this is more of a question than an issue.

What is the idiomatic way of processing a list of tokens, discarding tokens that don't pass some test?

I see the test_item parser in the documentation for the testing part, but how do I go about skipping an item? The skip parser requires me to tell it what to skip; I just want to discard the item altogether and move on to the next one.
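
Something like the sketch below is what I'm after, assuming a predicate-based filter built from test_item and alt (the keep_if name is just illustrative, not an existing parsy API):

from parsy import alt, test_item

def keep_if(predicate):
    # Keep tokens that pass the test; map everything else to None and drop it afterwards.
    wanted = test_item(predicate, "wanted token")
    unwanted = test_item(lambda _: True, "any token").result(None)
    return alt(wanted, unwanted).many().map(lambda xs: [x for x in xs if x is not None])

keep_if(lambda t: isinstance(t, int)).parse([1, "skip", 2, "skip", 3])  # [1, 2, 3]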

Thanks

Parsy 1.3.0 fails to support 'group' keyword of `regex` function

The following code fails:

hms_alt = parsy.regex(r'(\d{2}):', group=(1,))
list(toolz.map(lambda t: hms_alt.parse(t), ['01:', '02:']))

The error message is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-79-eaea1867982d> in <module>
----> 1 hms_alt = parsy.regex(r'(\d{2}):', group=(1,))
      2 list(toolz.map(lambda t: hms_alt.parse(t), ['01:', '02:']))

TypeError: regex() got an unexpected keyword argument 'group'

When I query the signature of the regex method, it reports flags as a keyword but not group:

parsy.regex?

Signature: parsy.regex(exp, flags=0)
Docstring: <no docstring>
File:      c:\users\larry.jones\appdata\local\pypoetry\cache\virtualenvs\orchid-python-api-_tsnd6qt-py3.8\lib\site-packages\parsy\__init__.py
Type:      function

When I look at the source code, I see the following code for parsy.__init__:

def regex(exp, flags=0):
    if isinstance(exp, str):
        exp = re.compile(exp, flags)

    @Parser
    def regex_parser(stream, index):
        match = exp.match(stream, index)
        if match:
            return Result.success(match.end(), match.group(0))
        else:
            return Result.failure(index, exp.pattern)

    return regex_parser

I may not fully understand how everything was implemented but this definition only has the flags keyword and does not have the group keyword.

I am using version 1.3.0 of the package on Windows 10.

Finally, the latest source code in GitHub (src/parsy/__init__.py) does include the group keyword so the cause of the error appears to be in the build process.

Inline (explicit) and implicit tracing

Proposal:
Variant 1: Explicit inline tracing
There should be a debug() "parser" and a .debug() method. These should both pass through the values going through them and yield information about the parser state out of band. The goal is to allow tracing parser state at any interface between parsers.

What immediately comes to mind for yielding information is print(), but it's probably better to have a default implementation and allow the user to override it with a callback which gets passed all available state.

Perhaps the initial implementation should yield some manner of structure (e.g. a (parsername, stream, index) namedtuple, or something, don't ask me how to implement things like parsername yet :P) and have a default callback implementation, which .debug() defaults to.

#46 (comment) is a partial example of this.
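
For concreteness, a rough sketch of what such a wrapper could look like on top of the public Parser constructor (the name, the tuple handed to the callback, and the default print are all placeholders):

from parsy import Parser, string

def debug(wrapped, label="", callback=print):
    # Run the wrapped parser, report its state out of band, and pass the result through unchanged.
    @Parser
    def debug_parser(stream, index):
        result = wrapped(stream, index)
        callback((label, index, stream[index:index + 10], result.status))
        return result
    return debug_parser

debug(string("foo"), label="foo?").parse("foo")  # prints ('foo?', 0, 'foo', True), returns 'foo'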

Variant 2: Implicit tracing with filtering
It shouldn't be necessary to change your code to be able to debug it.
parsy doesn't really have an "engine" (afaik?) so I assume all parsers will need to be modified to support this mode;
Since the parsers and combinators are pure functions, it should be possible to wrap them all in a debugger function that traces their I/O when the user requests it. Since you aren't specifying where to trace by inserting a .debug() snippet, there should be a way to filter trace output a bit. One way to do this might be based on the stream index, so one can specify which section of the text being parsed to trace.

This would need a little extra integration to pass the outer parser context into .bind(), so the parser can decide if it needs to trace? #47

Perhaps for this variant, it is mostly sufficient to augment Parser?

A drawback here is that it increases overall complexity and might add some global (albeit orthogonal to parsing) state, namely the callback.

Misc:
Variant 1 seems easier to do, but it may be ideal to have both variants.

Variant 2 filtering may also be useful for variant 1.

alt operator : alt(A, B) != alt(B, A) ?

Hello,
Maybe I missed something:

from parsy import *
parse_level_docstring= '''
    Level-numbers: 01-49,66,77,88.
    Each word must be a 1-digit or 2-digit integer.
    '''
@generate(parse_level_docstring)
def level_specials():
    return (yield regex(r'(66|77|88)'))

@generate(parse_level_docstring)
def level_normal():
    level = ( yield regex(r'[0-4]?[0-9]') )
    return level

with :

@generate(parse_level_docstring)
def level_num():
    level =  yield ( alt (level_normal , level_specials )   )
    return level

level_num.parse('88') : KO 88, should be OK
level_num.parse('10') : OK
level_num.parse('55') : KO

with :

@generate(parse_level_docstring)
def level_num():
    level =  yield ( alt (level_specials, level_normal  )   )
    return level

level_num.parse('88') : OK
level_num.parse('10') : OK
level_num.parse('55') : KO

combine fails when nothing is produced by many

Given the grammar in PCRE notation:

(?(DEFINE)
  (?<prop_id>[A-Z]+)
  (?<prop_val>\[.+\])
  (?<prop>(?&prop_id)(?&prop_val))
  (?<node>;(?&prop)*)
  (?<tree>\((?&node)+(?&tree)*\))
)
^(?&tree)$

My parsing code is as follows:

def parse(input_string: str):
    prop_id = regex(r'[A-Z]+')
    prop_val = string('[') >> regex(r'[^]]+') << string(']')
    prop = seq(prop_id, prop_val.at_least(1)).combine(
        lambda x, y: {x: y}
    )
    node = string(';') >> prop.many().combine(
        lambda x: SgfTree(properties=x)
    )
    tree = forward_declaration()
    tree.become(string('(') >> seq(nodes=node.at_least(1), forest=tree.many()) << string(')'))
    return tree.parse(input_string)

But this blows up for the input (;).

>   return self.bind(lambda res: success(combine_fn(*res)))
E   TypeError: parse.<locals>.<lambda>() missing 1 required positional argument: 'x'

The input is parsed properly, but the conversion fails. Perhaps combine should have a default value or x should be None so that the function can handle defaults?
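
For reference, a sketch of the distinction as I understand it: combine unpacks the parsed list into positional arguments (so an empty list supplies none), while map passes the list through as a single value. Using map for the node rule avoids the error (a plain dict stands in for SgfTree here):

from parsy import regex, seq, string

prop_id = regex(r'[A-Z]+')
prop_val = string('[') >> regex(r'[^]]+') << string(']')
prop = seq(prop_id, prop_val.at_least(1)).combine(lambda k, v: {k: v})

# .map receives the whole (possibly empty) list of props as one argument.
node = string(';') >> prop.many().map(lambda props: dict(properties=props))

node.parse(';')       # {'properties': []}
node.parse(';AB[x]')  # {'properties': [{'AB': ['x']}]}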

Enhancement: Convenience method to emit something without consuming input

A couple of times I've found myself wanting a builtin method that produces some output without consuming input when none of a set of other possibilities apply; along the lines of:

some_parser.or_else(my_default_value)

(Following this thinking, the recently added optional() parser method could be thought of as some_parser.or_else(None))

In the absence of a compact notation for this, I have used a function like

def emit(content):
    return seq().result(content)

which can be used as some_parser | emit(my_default_value). However, abusing seq() like that seems a little inelegant/non-obvious, though I don't mind the interface as an alternative for the same thing.

Alternatives include using e.g. a lambda or functools.partial() together with combine() etc., but when I try this it always seems to end up less legible overall.

Could this be worth building into parsy?

Tagged unions/product types for results

seq provides a convenient way to return a nicely structured result (and close enough, if not exactly what you want most of the time), namely by using it with kwargs.
e.g.

>>> seq(a=any_char, b=any_char).parse("ab")
{'a': 'a', 'b': 'b'}

This is basically an encoding of a product type.

It would be nice if there was a similarly convenient way to return something analogous for alternatives (i.e. alt or the | operator), the result of which is encodable as a tagged union/is an encoding of a sum type.

Example pseudocode:

>>> alt(a=any_char, b=any_char).parse("a")
{'a': 'a'}
>>> alt(a=any_char, b=any_char).parse("b")
{'b': 'b'}

I don't see a way to use this together with an operator without hacks like passing it tuples or something (like (name, parser) | (name, parser)), but I think this would still be nice to have for the alt() form.
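
A possible encoding with today's API, as a sketch: tag each alternative by mapping its result into a one-key dict (the tagged_alt name is made up):

from parsy import alt, regex

def tagged_alt(**parsers):
    # Label each alternative's result with the keyword it came from.
    return alt(*(p.map(lambda v, name=name: {name: v}) for name, p in parsers.items()))

number = regex(r'[0-9]+').map(int)
word = regex(r'[a-z]+')

tagged_alt(num=number, word=word).parse("42")   # {'num': 42}
tagged_alt(num=number, word=word).parse("abc")  # {'word': 'abc'}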

Operator for seq()

I don't know if it was already discussed, but it'd be nice to have a shorthand operator (i.e. &) for seq(), which happens to show up quite often in grammars.

class Parser(object):
    ...
    def __and__(self, other):
        return seq(self, other)

I guess something like this would be enough to implement it.
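
A sketch of how it would read in practice (monkey-patching here only to illustrate; note that chaining the operator nests the results, unlike a single seq(a, b, c)):

from parsy import Parser, seq, string

Parser.__and__ = lambda self, other: seq(self, other)

(string("a") & string("b") & string("c")).parse("abc")  # [['a', 'b'], 'c']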

It seems like other parser combinator libraries often have one already.

Seeking Guidance on Implementing Parser Autocomplete

Hello,

I'm working on an open-source infrastructure resource tool called resoto. Part of its functionality involves a query language that we've implemented using parsy. You can find the related parsing code here.

I'm looking to enhance the user experience by offering autocomplete suggestions as they write their queries. For instance, in a query like is(account) and name, the probable next characters could be ==, in, =~, or !=.

I have a couple of questions:

  1. Autocomplete Suggestions from Grammar: Given our current grammar, is there a straightforward way to infer the possible next characters? The example above shows possible next tokens that can be derived directly from the grammar.
  2. Context Information from Parsy: Our query language has tokens for values encoded as literals, like in the filter name == "test", where "test" is treated as a literal. To offer meaningful autocomplete suggestions, I need to understand which specific parsy Parser element is currently in use. Is there a way to retrieve such context from parsy?

Any advice or recommendations would be greatly appreciated. Thank you for your time!

Allow providing a default to optional()

It would be convenient to have optional(default_value) which would return default_value instead of None.
This can be seen as complementary to result(default_value), which always returns a fixed value.
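
Until then, a one-line sketch of the same behaviour using existing primitives (success returns a fixed value without consuming input):

from parsy import string, success

with_default = string("x") | success("fallback")

with_default.parse_partial("xyz")  # ('x', 'yz')
with_default.parse_partial("abc")  # ('fallback', 'abc')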

Release

Any chance we could bump the version and get the latest changes into a release on PyPI?

Missing seq import statement in tutorial

I apologize if this is a nit, but in working through the tutorial I noticed the code snippet below left out an import/from statement for the seq function before using it. Explicit imports of regex and string were included earlier in the tutorial, so I assume the absence of a similar explicit statement for seq qualifies as a documentation bug.


Improve debugging ergonomics

I've been wondering what it would take to improve the debugging experience for combinators in general, but here are some specific ideas (TODO: break these down based on use cases: which part of my code is failing? what data does it see?):

  • Parse errors show where they failed and what symbol they were expecting; this is somewhat redundant with the "line number:character" location data, but it would still be helpful to show the next few symbols the parser is failing on. #46

  • a .debug() method which can be added inline wherever you need to see the output of the previous parser method / the input to the next parser method #49

  • It would be good to also have something for things like seq(a, seq(b, c)) where you can insert a debug on the left of b, to see what input it's getting. I.e. you can't use a method at the beginning of a sequence of parsers so it needs to be a function different from .debug() #49

  • It would also be cool if you didn't need to annotate with .debug wherever you wanted to debug something, but could get the parser to give information about the relevant things without changing your code. The only way to do this that I've been able to think of is to write an extra wrapper for every function. And for filtering, to be able to say "return trace information between characters x and y". #49

  • I haven't found trying to use the debugger particularly illuminating (IIRC?) because it never jumps back into user code where I can see which part of my grammar is causing problems. It would be cool if there were ways to make using a debugger more ergonomic but I have no ideas for that...

  • It would be helpful if .bind could handle ParseError (stream, index) properly and return the appropriate error information when a ParseError gets passed up to the larger context. #47

Missing documentation for eof parser?

First off, I want to say I am blown away by how thorough and well-written the docs for this library are. :)

That said, it seems the API reference is missing an entry for eof, even though it is used in this how-to. I'm thinking it should have an entry in the "Primitive parsers" section.

I'd be happy to write a PR for this.

Examples in tutorial don't quite work exactly as given

http://parsy.readthedocs.io/en/latest/tutorial.html

optional_dash = dash.at_most(1)

@generate
def full_or_partial_date():
    d = None
    m = None
    y = yield year
    dash1 = yield optional_dash
    if len(dash1) > 0:
        m = yield month
        dash2 = yield optional_dash
        if len(dash2) > 0:
             d = yield day
    if m is not None:
        if m < 1 or m > 12:
            return fail("month must be in 1..12")
    if d is not None:
        try:
            datetime.date(y, m, d)
        except ValueError as e:
            return fail(e.args[0])

    return (y, m, d)

This now works as expected:

>>> full_or_partial_date('2017-02')
(2017, 2, None)
>>> full_or_partial_date('2017-02-29')
ParseError: expected 'day is out of range for month' at 0:10

But if you copy and paste this and run it you get:

TypeError: __call__() missing 1 required positional argument: 'index'

I was a bit worried at first but I thought I'd just guess that maybe when they're called from a parse method they receive the current index:

 >>> full_or_partial_date('2017-02', 0)
 (2017, 2, None)

works, as does:

>>> full_or_partial_date.parse('2017-02')
 (2017, 2, None)

A slight speed bump for beginners :)

Missing utilities

There are probably a few missing pieces in terms of building parsers. Some of the following from Parsec/Parsimmon:

It would be good to be able to easily express "discard everything until ..." or "collect everything until ...", which doesn't seem straightforward at the moment, but would be easier with the utilities above.
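
A sketch of "collect everything until ..." with the current primitives (the collect_until name is illustrative, not part of parsy):

from parsy import any_char, string

def collect_until(stop):
    # Consume characters one at a time, as long as `stop` does not match at the current position.
    return (stop.should_fail("end marker") >> any_char).many().concat()

(collect_until(string("-->")) << string("-->")).parse("hello world-->")  # 'hello world'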
