pyga / parsley Goto Github PK


287.0 287.0 53.0 870 KB

License: Other

Python 99.97% Shell 0.03%


parsley's Issues

Input matches rule A, but A|B fails to match

I'm trying to parse mathematical expressions with operator precedence, but I'm running into an odd problem where a rule of the form "A | B" fails to match, despite the fact that rule A matches the input string.
A stripped-down version of the grammar that exhibits this behavior:

spec = """
integer_literal = < digit+ >:x -> ("integer_literal",x)

oplevel_unary_left = ('+'|'-'):op unit:X -> ("op",1,op,X)
oplevel_unary_and_higher = oplevel_unary_left | integer_literal
oplevel_exponent = oplevel_exponent_and_higher:L ws ('^'):op ws oplevel_unary_and_higher:R -> ("op",2,op,L,R)
oplevel_exponent_and_higher = oplevel_exponent | oplevel_unary_and_higher
oplevel_mult = oplevel_mult_and_higher:L ws ('*'|'/'|'%'):op ws oplevel_exponent_and_higher:R -> ("op",2,op,L,R)
oplevel_mult_and_higher = oplevel_mult | oplevel_exponent_and_higher
"""

parser = parsley.makeGrammar(spec, {})

Then I get

>>> parser("2*3").oplevel_mult()
('op', 2, '*', ('integer_literal', '2'), ('integer_literal', '3'))
but
>>> parser("2*3").oplevel_mult_and_higher()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files\Anaconda3\lib\site-packages\parsley.py", line 98, in invokeRule
    raise err
ometa.runtime.ParseError:
2*3
  ^
Parse error at line 1, column 2: expected EOF. trail: [digit]
(or similarly with exponent_and_higher vs. just exponent)

I don't see how it's even possible for oplevel_mult_and_higher to fail when the first arm is oplevel_mult, which passes.

Documentation/example for "!(pythonExpression): Invoke a Python expression as an action." missing

Hi!
I found this in the reference:

!(pythonExpression): Invoke a Python expression as an action.

but I don't understand how it's used.
Any help appreciated!

There doesn't appear to be any way to generate from a `.parsley` file.

bin/generate_parser (which isn't installed) can generate one from a string in a python module, but not from a file.

Also, that program only appears to work from the parsley source, since ometa.vm_builder uses open with relative paths.

expr1 | expr2 not working as expected

I am experiencing some oddities with using or expressions. I have replicated it with a trivial example:

import parsley

x = parsley.makeGrammar(
    """
    away_we_go = ('away we go' | 'off and running') -> "away_we_go"
    back_underway = ('back underway' | 'back under way') -> "back_underway"
    all = (anything*) -> "all"
    foo = away_we_go | back_underway | all
    """,
    {}
)

print x("giorenagoirhger").foo()            # all
print x("giorena goirhger").foo()           # all
print x("away we go").foo()                 # away_we_go
print x("back underway").foo()              # back_underway
print x("go").foo()                         # all
print x("lets go").foo()                    # all
print x("we are back under way go").foo()   # all
print x("we are back under way").foo()      # all
print x("back under way go").foo()          # ERROR

Traceback (most recent call last):
  File "process/tests/parsley_test.py", line 21, in <module>
    print x("back under way go").foo()          # ERROR
  File "/Users/reuben/.virtualenvs/processor/lib/python2.7/site-packages/parsley.py", line 98, in invokeRule
    raise err
ometa.runtime.ParseError:
back under way go
 ^
Parse error at line 1, column 1: expected EOF. trail: []

unless I am mistaken "back under way go" should fail to match the away_we_go pattern and the back_underway pattern and then match the all pattern.

Support args for ParserProtocol.setNextRule

Currently, we can only do setNextRule("ruleName"). But sometimes the next rule may take arguments, in the form:
ruleName :arg1 :arg2 = expr

Since the argument is set by the caller, it needn't be in the form of a term-based Action(arg). It should be the data itself.
To accomplish this, I used the codeName to tell the two cases apart. (I don't know whether that's a good way to go.) When the codeName is None, the arg is already plain data and doesn't need to go through self._eval again.
A sample link:
https://github.com/introom/parsley/blob/master/ometa/interp.py#L154

To support rule invocation with arguments, some other parts would of course have to be modified too.
I will discuss that later, once this issue gets some attention.

@washort @habnabit

Python 3 support?

Are there any plans for Python 3 support? That's the one thing that keeps me from adopting this for use in matplotlib or astropy...

Empty grammar results in cryptic exception

    def moduleFromGrammar(source, className, modname, filename):
        mod = module(modname)
        mod.__name__ = modname
        mod.__loader__ = GeneratedCodeLoader(source)
>       code = compile(source, filename, "exec")
E         File "/pymeta_generated_code/pymeta_grammar__Grammar.py", line 5
E           if Grammar.globals is not None:
E            ^
E       IndentationError: expected an indented block

Bad example

The exceptions.py example is actually a copy (with errors in it) of the minml.py example.

pass current line and column to action

Dear developers,
Is there a way to pass the current line and column of the file being parsed to the action of a rule?

For example:

the grammar:

parameter = (word:w -> AnAction('parameter',str(w),line,column) )

the python file:

def AnAction(name, value, line, column):
   print(value +' ' + str(line) + ' ' + str(column))

the input text:

hello this is
a test

the output:

hello 1 1
this 1 7
is 1 12
a 2 1
test 2 3

I'm trying to build a compiler and I would like to check if a variable is already defined. So to make an error message it would be nice if Parsley would provide me with the location of the variable, even when the file has the correct syntax.

Thanks in advance!!!
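
A minimal sketch of the position arithmetic the request above relies on, in plain Python and independent of Parsley. It assumes the parser can hand an action the 0-based character offset at which a match started; whether and how Parsley exposes that offset is not settled in this issue. The helper names line_and_column and word_positions are hypothetical.

```python
import re


def line_and_column(text, offset):
    """Return the 1-based (line, column) of a 0-based character offset."""
    line = text.count("\n", 0, offset) + 1
    last_newline = text.rfind("\n", 0, offset)  # -1 when on the first line
    column = offset - last_newline
    return line, column


def word_positions(text):
    """Yield (word, line, column) for each whitespace-separated word."""
    for match in re.finditer(r"\S+", text):
        line, column = line_and_column(text, match.start())
        yield match.group(), line, column


source = "hello this is\na test"
for word, line, column in word_positions(source):
    print(word, line, column)
```

On the input text from the issue this prints the same five lines as the requested output, so the same conversion could be applied to whatever offset the parser reports.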

given bindings with inputs

some_grammar = "s = 'something' "
bindings = { 'a': 'hello'}

instead of
g = parsley.makeGrammar(some_grammar, bindings)
g("blabla").s()
we want:
g = parsley.makeGrammar(some_grammar, {})
g("blabla", bindings).s()

It would be even better if overriding were supported (a name given later overrides the earlier one).
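
The override semantics asked for here can be sketched with the standard library alone. This is only an illustration of "later bindings win"; it does not show how Parsley would accept per-parse bindings, and effective_bindings is a hypothetical helper.

```python
from collections import ChainMap

grammar_bindings = {"a": "hello", "b": "grammar-level"}
parse_bindings = {"b": "per-parse"}


def effective_bindings(*layers):
    """Combine binding dicts so that later layers override earlier ones."""
    # ChainMap looks names up left to right, so the layers are reversed
    # to give the most recently supplied dict the highest priority.
    return ChainMap(*reversed(layers))


bindings = effective_bindings(grammar_bindings, parse_bindings)
print(bindings["a"])  # only defined at grammar level
print(bindings["b"])  # the per-parse value shadows the grammar-level one
```

A ChainMap avoids copying the dicts, so the grammar-level bindings stay untouched between parses.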

stack is compose

The docstring for stack ought to mention that it's function composition, rather than anything specific to parsley.
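
For readers unfamiliar with the term, here is a generic sketch of function composition in plain Python. It assumes, as the issue states, that stack amounts to composition; compose is a hypothetical name and this is not Parsley's implementation.

```python
from functools import reduce


def compose(*functions):
    """Return a function such that compose(x, y, z)(arg) == x(y(z(arg)))."""
    def composed(argument):
        # Apply the rightmost function first, then work outwards.
        return reduce(lambda value, fn: fn(value), reversed(functions), argument)
    return composed


add_one_to_text = compose(str, lambda n: n + 1, int)
print(add_one_to_text("41"))
```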

onFormError() callback

When a Parsley field is invalid, there is an onFieldError() callback, but no onFormError() method exists. Why can't we add this method, like onFieldValidate and onFormValidate? It would be more consistent.

ability to emit warnings and halt compilation

parsley.ometa.runtime.ParseError has a very nice way of formatting / printing error messages

It would be nice if there was some way to access this from the grammar rules. For example, if there was a type of exception that would be picked up and have the line and character added.

Also, it would be nice if there was a way to emit a warning with a given message formatted in the same way as the ParseError.

(This can kinda-sorta be achieved via print str(grammar.input.head()[1].withMessage("whatever warning")), but there doesn't seem to be a way to get access to the grammar object itself from inside a rule.)

docs could use more examples

As I was learning parsley, I made a bunch of little notes / idioms for doing things:

https://gist.github.com/kurtbrose/05f2dd879eba6a88a3dc7c13e36ce772

There are also some larger grammars I defined:

TLS RFC parsing:
https://gist.github.com/kurtbrose/bb98bdf42dc709cbc5a1b94b058b703c

Thrift IDL Parsing:
https://github.com/kurtbrose/thriftpy/blob/new_parser/thriftpy/parser/parser.py#L395-L497

Freeze

Is it possible to freeze code using parsley?
I couldn't with pyinstaller or py2app.
I'm on OSX High Sierra.

Getting return value of ?() expression

I've found myself wanting to get the return value from a ?() expression, for example, when I have a lookup dictionary, and I want the rule to match if the word is present in the dictionary, and I also want to use the returned value. Currently you have to do something like:
thing = word:word !(things.get(word)):thing ?(thing) -> thing

Would be nice if it was possible to do something like:
thing = word:word ?(things.get(word))

Or is there a better way of achieving this functionality? Would it be possible for ?() to just return the result rather than True or False?

Compile to python example

Watching your pycon talk and looking through the code base, I gather that it's possible to compile from peg into python using parsley, but after a couple hours of searching I have yet to figure out the right structure to pass to the writePython function. Could you put together an example in the documentation?

Support custom rule functions

I just filed #60 where I outline some Unicode-related features Parsley is missing. I notice however I could have easily added these features in my own code, without modifying Parsley, if I could somehow define new rules. Parsley rules appear to all be functions that accept or reject a character. If I could define my own such functions I could define, for example, a function that brings in the Regex module and matches on the regex \p{XID_Continue}.

A friend mentioned they had hacked additional rules into Parsley in the past by calling undocumented functions, and it actually looks like I could possibly get what I want by extending OMetaBase. However, I would not trust such things unless they are to some extent documented because if they are undocumented I expect they could stop working suddenly in a future version.

possible changes to documentation

note: you have to read this in plaintext because there's some markup going on that is eating special characters

In learning parsley, I had trouble understanding what was going on. And after some help from dash on IRC, a light bulb went on and I think I have an explanation of the various components in parsley which may help others come up to speed.

In a variant of BNF, a parsley rule is of the form:

<rule name> = <pattern> [:data_name] [<pattern> [:data_name]]* [-> (<Python expression>)]

The first simple magic is every pattern, when matched, yields a value. That value is either discarded, or if followed by a colon and a data name (i.e. a Python legal symbol), the returned value is placed into the data name.

The second simple magic is that after the arrow ('->') you can have a Python expression surrounded by parentheses. Each data name defined prior to this expression can be used as part of the expression. It's the way you transfer the values determined by the pattern matching into expression so you can do whatever calculation you want.

The third simple magic is that the result of the expression at the end of the statement is assigned to the name at the start of the statement. You can retrieve this value by treating the name as a method call (explanation TBD)

Now the next part gave me that "ah ha" moment. Do you remember the tale about the Hindu woman in a remote village who was asked about her view of a creation myth? Well, in this case, it's Python expressions all the way down.

Every pattern, if it's not a terminal pattern (i.e. constant strings, characters, or predefined patterns), refers to another pattern. And that pattern either returns a value or refers to another pattern. And so on… All the way down.

To be quite frank, if I had had it explained to me like this, I'm not sure I would've understood it right at first, but I don't think I would've gotten quite as far off into the weeds as I had.

---- sketchy rewrite of the tutorial beginning ----
Forget everything you know about parsing and regular expressions. Parsley does many things similar to regular expressions and other parsers, but you will learn faster if you start with a clean slate.

Let's start with a simple parsley grammar:

X = 'a' ('b' | 'c') 'd'+ 'e'

X is defined as any string that contains the letter a followed by 'b' or 'c' followed by at least one 'd' which is followed by a single 'e'. Yes, very similar to regular expressions but get those regex out of your mind please.

Each one of the elements of this expression is considered a terminal node. This means that once you match, there are no other patterns to search for. The string that matches this pattern is returned as the value of 'X'. An example of an expression with a non-terminal node is a rewrite of the above:

X = 'a' Y 'd'+ 'e'
Y = ('b' | 'c')

In this case, Y is a non-terminal node because it contains additional information that can match against the input data.

(clarification request: if we get a match, it yields the string we match?)
All of this pattern matching is pretty common, and I just know you're thinking about regular expressions. Here's where we start to deviate into something really nice. Going back to the original expression, let's say we want to do something based on whether we match the 'b' or the 'c'. In that case, we want to preserve the state of that match, and we do that by assigning the match to a variable.

X = 'a' ('b' | 'c'):v1 'd'+ 'e'

In this case, we assigned the result of the match of 'b' or 'c' to the variable 'v1'. Doing something with that result is just as easy.

X = 'a' ('b' | 'c'):v1 'd'+ 'e' -> (print v1)

I think of the '->' arrow as 'yields' as in matching the pattern on the left yields the result of evaluating the right.

.... and it continues on. At this point I'd explain how you get either a single element or lists.

Anyway, that's my idea of how to make this more accessible. Corrections and feedback are welcome, and if you're willing to wait halfway to the death of the universe, I'll be glad to contribute documentation.

whitespace consumed silently

This parser thinks "foo 34" is an identifier, because the space is silently consumed.

def printit(t, x):
    print t + ':', repr(x)
    return x

parser = parsley.makeGrammar(
'''
DIGIT = ("0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"):x -> printit('digit', x)
LETTER = anything:x ?(x in "ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz") -> printit('letter', x)
WHITESPACE = (' '|'\t'|'\n'|'\r')+:x -> printit('whitespace', ''.join(x))

IDENT = <LETTER ( DIGIT | LETTER )*>:x -> printit('ident', x)

UNSIGNED_INTEGER = <DIGIT+>:digits -> printit('digits', digits)
EXPONENT = <( "e" | "E" ) ( "+" | "-" )? UNSIGNED_INTEGER>:x -> printit('exponent', x)
UNSIGNED_NUMBER = <UNSIGNED_NUMBER ("." UNSIGNED_NUMBER)? EXPONENT?>:x -> printit('unsigned number', x)

TEXPR = (WHITESPACE | IDENT | UNSIGNED_NUMBER):x -> printit('texpr', x)
TEXPRS = <TEXPR+>
''', {'printit': printit})

print parser('bar foo 34 a34 34e-10').TEXPRS()

mutual left recursion not supported

The OMeta paper describes mutual left recursion as a supported feature, but it appears broken in parsley.

Consider a simple left recursive grammar (this works):

exp = """\
num = <digit+>:n -> int(n)
expr = expr:e "-" num:n -> e - n
     | num
"""
compiled = makeGrammar(exp, {})
print(compiled("4-3").expr())

Now let's modify the grammar precisely as done in the paper: replacing expr in the expr recursion with another rule x that calls into expr:

exp = """\
num = <digit+>:n -> int(n)
x = expr
expr = x:e "-" num:n -> e - n
     | num
"""
compiled = makeGrammar(exp, {})
print(compiled("4-3").x())

Traceback (most recent call last):
  File "./t.py", line 36, in <module>
    print(compiled("4-3").x())
  File "/home/robertc/.virtualenvs/scratch/local/lib/python2.7/site-packages/parsley.py", line 98, in invokeRule
    raise err
ometa.runtime.ParseError: 
4-3
  ^
Parse error at line 1, column 2: expected EOF. trail: [digit]

If run with expr rather than x, the result is the same.

qnodes._substitute, unused and questionably named map arg

a) seems like map is unused
b) probably desirable to avoid overriding the map builtin

def _substitute(self, map):
    return [Term(self.tag, self.data, None, self.span)]

PyPI urls point at launchpad

On PyPI, the Home Page and Bug Tracker links point at Launchpad, which was rather confusing since that page was last modified in 2013.

Nor do the docs mention any information about contact, development, issues or contributing for the project. I had to guess that it was now hosted on GitHub.

It would be great if those links could be updated, and maybe a page in the docs could mention a small amount of info for contacting / helping the project, etc.

Maybe even the launchpad page could point a link/some links to the new development location as well.

Parsley fails to parse

Have you abandoned Launchpad? I think I managed to install parsley-master over Parsley 1.1 and ran the test with the same result as before:

https://bugs.launchpad.net/parsley/+bug/1183224

Order-dependent or statements

Using or statements sometimes fails depending on the ordering. Shown below is an example of such a failure mode.

testCaseGrammar = """
name = (<letter+>:n ws name:rn -> n + ' ' + rn) | <letter+>
nameRev = <letter+> | (<letter+>:n ws name:rn -> n + ' ' + rn)
"""

parser = makeGrammar(testCaseGrammar,{})
print parser("name with spaces").name()
print parser("name with spaces").nameRev()

The first form using name returns "name with spaces", while the second form gives a ParseError. As I understand the grammar, these two rules should function identically, with both forms returning "name with spaces".
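
One likely explanation is PEG-style ordered choice: alternatives are tried left to right, and the first one that succeeds wins, even if it consumes less input than a later alternative would. The sketch below models the two rules with hand-written plain-Python functions; it is not Parsley's implementation, and every helper name in it is hypothetical. The top-level call requires the whole input to be consumed, mirroring the "expected EOF" errors reported on this page.

```python
import re

LETTERS = re.compile(r"[A-Za-z]+")
SPACES = re.compile(r"\s*")


def letters(text, pos):
    """<letter+>: return (matched_text, new_pos), or None on failure."""
    match = LETTERS.match(text, pos)
    return (match.group(), match.end()) if match else None


def long_form(text, pos):
    """<letter+>:n ws name:rn -> n + ' ' + rn"""
    first = letters(text, pos)
    if first is None:
        return None
    n, pos = first
    pos = SPACES.match(text, pos).end()
    rest = name(text, pos)
    if rest is None:
        return None
    rn, pos = rest
    return n + " " + rn, pos


def choice(text, pos, *alternatives):
    """Ordered choice: commit to the first alternative that succeeds."""
    for alternative in alternatives:
        result = alternative(text, pos)
        if result is not None:
            return result
    return None


def name(text, pos):
    return choice(text, pos, long_form, letters)


def name_rev(text, pos):
    return choice(text, pos, letters, long_form)


def parse_all(rule, text):
    """Apply a rule at offset 0 and insist that it reaches the end of input."""
    result = rule(text, 0)
    if result is None:
        raise ValueError("no match")
    value, pos = result
    if pos != len(text):
        raise ValueError("expected EOF at offset %d" % pos)
    return value


print(parse_all(name, "name with spaces"))
try:
    parse_all(name_rev, "name with spaces")
except ValueError as error:
    print("name_rev failed:", error)
```

In this model name_rev stops after the first word because its first alternative already succeeded, so the longer alternative is never tried. Putting the longer alternative first, as in name, avoids the problem.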

Support Unicode categories and properties

Hello, parsley looks really cool and I have an application I am interested in switching from pyparsing to parsley. However, I am blocked in doing so because my application needs to parse Unicode text (Python 3 source code, actually). The built in rules are not Unicode aware, so I cannot possibly match, for example, "a unicode alphanumeric character"-- I would have to inline all possible alphanumeric characters in my grammar (which might not be possible anyway, see issue #1). Pyparsing avoids this problem for my purposes by allowing the use of standard Python regex, which has three simple unicode "classes".

The ideal here would be for Parsley to allow matching on arbitrary Unicode categories and properties. Every codepoint in Unicode has one category, which is a two-letter code (like "Ll" for Letter, Lowercase), and an arbitrary number of "properties", which are key-value pairs (like "Script=Cyrillic"). Properties are very important because if you are using Parsley you are probably parsing something like a programming language, and the current best practice, as I understand it, is that programming languages follow the rules of UAX 31. This defines two Unicode properties, XID_Start and XID_Continue, which when set to true match the Unicode body's recommendations for what constitutes an identifier (like a variable name). Supporting properties unfortunately might be kind of difficult, as there is nothing in the standard library for this. So to support properties, either Parsley would have to embed some form of the Unicode character database, which is physically large, or introduce a module dependency (the third-party "Regex" module can do this using \p{}).

A good next-best-effort would be if Parsley could support just the Unicode categories. This is easier because the standard library has the unicodedata module which lets you query a character's category.

I feel like the minimum version of this feature would be to at least have feature parity with the Python regex module, which has coarse \s, \w, and \d (whitespace, alphanumeric, numeric) classes. It might be hard to literally match the same strings as the re module because the re documentation is quite vague as to how \s\w\d are defined. But as far as I know you could get a reasonable approximation of these categories by using the categories in unicodedata.
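
A small sketch of the category-based approach using only the standard library. The predicates are hypothetical helpers, not Parsley rules. The last two lean on str.isidentifier(), which in Python 3 follows the language's identifier rules; those are based on XID_Start and XID_Continue but also accept the underscore as a start character, so they are an approximation rather than exact property tests.

```python
import unicodedata


def is_letter(ch):
    """True for any character whose general category starts with 'L'."""
    return unicodedata.category(ch).startswith("L")


def is_decimal_digit(ch):
    """True for decimal digits in any script (general category 'Nd')."""
    return unicodedata.category(ch) == "Nd"


def is_python_identifier_start(ch):
    """Approximates XID_Start (also accepts '_')."""
    return ch.isidentifier()


def is_python_identifier_part(ch):
    """Approximates XID_Continue by testing ch after a known-good start."""
    return ("a" + ch).isidentifier()


for ch in ["a", "\u044f", "\u0663", "9", " "]:
    print(repr(ch), unicodedata.category(ch), is_letter(ch), is_decimal_digit(ch))
```

Predicates like these could back a character-level test such as the anything:x ?(...) pattern used in the LETTER rule elsewhere on this page, provided they are passed in through the grammar's bindings.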

Different behavior for single and double quotes

I came across a bug that was rooted in what seems to be a curious feature of Parsley: changing from single to double quotes leads to a parsing error:

import sys
print(sys.version_info)
# sys.version_info(major=3, minor=5, micro=1, releaselevel='final', serial=0)

import parsley
print(parsley.__version__)
#1.3

grammar = parsley.makeGrammar("""
add = <digit+>:left sp "+" sp add:right -> ('add', left, right)
      | <digit+>:child                  -> child

sp = ' '*
""", {})

grammar('4 + 3').add()
# ('add', '4', '3')

But if we change sp = ' '* to sp = " "* and keep everything else constant, we get an error:

grammar = parsley.makeGrammar("""
add = <digit+>:left sp "+" sp add:right -> ('add', left, right)
      | <digit+>:child                  -> child

sp = " "*
""", {})

grammar('4 + 3').add()

---------------------------------------------------------------------------
ParseError                                Traceback (most recent call last)
<ipython-input-20-5d4b7ce9eeba> in <module>()
      6 """, {})
      7 
----> 8 grammar('4 + 3').add()

/Users/jakevdp/anaconda/envs/python3.5/lib/python3.5/site-packages/parsley.py in invokeRule(*args, **kwargs)
     96                     err = ParseError(err.input, err.position + 1,
     97                                      [["message", "expected EOF"]], err.trail)
---> 98             raise err
     99         return invokeRule
    100 

ParseError: 
4 + 3
^
Parse error at line 2, column 0: expected EOF. trail: []

Is the semantic difference between " " and ' ' intended? I couldn't find any discussion of this in the documentation. Thank you!

Should test what happens when arguments to rules are malformed.

From parsley.parsley, the following

args = ('(' !(self.applicationArgs(finalChar=')')):args ')'
            -> args
          | -> [])

Looks like it could return an empty list instead of throwing an exception if the argument list is malformed.

parsley.makeGrammar("death = ('a' | ws)*", {})('a ').death() # never returns?

parsley.makeGrammar("death = ('a' | ws)*", {})('a ').death()  # never returns
parsley.makeGrammar("death = ('a' | (' ' | '\t' | '\n')*)*", {})('a ').death()  # never returns
parsley.makeGrammar("death = ('a' | ' '*)*", {})('a ').death()  # never returns
parsley.makeGrammar("death = ('a' | 'b'*)*", {})('ab').death()  # never returns
parsley.makeGrammar("never_returns = ('a'*)*", {})('a').never_returns()

Here are some nearby constructions that work okay:

parsley.makeGrammar("ok = ('a' | ' ')*", {})('a ').ok()  # returns ['a', ' ']
parsley.makeGrammar("ok = 'a' ws", {})('a ').ok()  # returns True
parsley.makeGrammar("ok = ('a' ws)*", {})('a ').ok()  # returns [True]
parsley.makeGrammar("ok = ('a' | ' ' | '\t' | '\n')*", {})('a ').ok()  # returns ['a', ' ']

Here's the stack trace when I manually terminate the never returning grammar:

KeyboardInterruptTraceback (most recent call last)
<ipython-input-65-b38e7b74e28c> in <module>()
----> 1 parsley.makeGrammar("death = ('a' | ws)*", {})('a ').death()

/home/notebook/anaconda2/lib/python2.7/site-packages/parsley.pyc in invokeRule(*args, **kwargs)
     83             """
     84             try:
---> 85                 ret, err = self._grammar.apply(name, *args)
     86             except ParseError as e:
     87                 self._grammar.considerError(e)

/home/notebook/anaconda2/lib/python2.7/site-packages/ometa/runtime.pyc in apply(self, ruleName, *args)
    460         r = getattr(self, "rule_"+ruleName, None)
    461         if r is not None:
--> 462             val, err = self._apply(r, ruleName, args)
    463             return val, err
    464 

/home/notebook/anaconda2/lib/python2.7/site-packages/ometa/runtime.pyc in _apply(self, rule, ruleName, args)
    493             try:
    494                 memoRec = self.input.setMemo(ruleName,
--> 495                                          [rule(), self.input])
    496             except ParseError as e:
    497                 e.trail.append(ruleName)

/pymeta_generated_code/pymeta_grammar__Grammar.py in rule_death(self)
     20                             self._trace("'\n'", (28, 31), self.input.position)
     21                             _G_exactly_8, lastError = self.exactly('\n')
---> 22                             self.considerError(lastError, None)
     23                             return (_G_exactly_8, self.currentError)
     24                         _G_not_9, lastError = self._not(_G_not_7)

/home/notebook/anaconda2/lib/python2.7/site-packages/ometa/runtime.pyc in many(self, fn, *initial)
    552             try:
    553                 m = self.input
--> 554                 v, _ = fn()
    555                 ans.append(v)
    556             except ParseError as err:

/pymeta_generated_code/pymeta_grammar__Grammar.py in _G_many_1()
     17                         return (_G_exactly_5, self.currentError)
     18                     def _G_or_6():
---> 19                         def _G_not_7():
     20                             self._trace("'\n'", (28, 31), self.input.position)
     21                             _G_exactly_8, lastError = self.exactly('\n')

/home/notebook/anaconda2/lib/python2.7/site-packages/ometa/runtime.pyc in _or(self, fns)
    596             try:
    597                 m = self.input
--> 598                 ret, err = f()
    599                 errors.append(err)
    600                 return ret, joinErrors(errors)

/pymeta_generated_code/pymeta_grammar__Grammar.py in _G_or_2()
      9             _G_exactly_1, lastError = self.exactly('//')
     10             self.considerError(lastError, 'c_comment')
---> 11             def _G_consumedby_2():
     12                 def _G_many_3():
     13                     def _G_or_4():

/home/notebook/anaconda2/lib/python2.7/site-packages/ometa/runtime.pyc in exactly(self, wanted)
    529             val, p, self.input = self.input.slice(len(wanted))
    530         else:
--> 531             val, p = self.input.head()
    532             self.input = self.input.tail()
    533         if wanted == val:

/home/notebook/anaconda2/lib/python2.7/site-packages/ometa/runtime.pyc in head(self)
    235             else:
    236                 data = self.data
--> 237             raise EOFError(data, self.position + 1)
    238         return self.data[self.position], self.error
    239 

/home/notebook/anaconda2/lib/python2.7/site-packages/ometa/runtime.pyc in __init__(self, input, position)
    113     """
    114     def __init__(self, input, position):
--> 115         ParseError.__init__(self, input, position, eof())
    116 
    117 

/home/notebook/anaconda2/lib/python2.7/site-packages/ometa/runtime.pyc in eof()
    125 
    126 
--> 127 def eof():
    128     """
    129     Return an indication that the end of the input was reached.

KeyboardInterrupt:

ometa.runtime.joinErrors fails when errors = [None]

The traceback can be viewed here: https://gist.github.com/introom/6101943

The reason the problem occurs can be seen from:
https://gist.github.com/introom/6101943#file-gistfile1-txt-L8
\x00\x12_error_description**\x00\x00***, we match anything{int(\x00\x00)}, and there is some remaining data in the input, so the errors become None.

And typically, a test like this fails: https://github.com/twisted/parsley-protocols/blob/amp/parseproto/test/test_amp.py#L2181

Parser execution trace

Parsley should be able to produce a detailed trace of which expression was executed, whether it succeeded or failed, and at what input location, for an entire parse.

unicode literals

At the very least we need \uXXXX escapes.

Infinite loop (or similar) in parser

The following code will make parsley take 100% CPU and slowly grow in memory for a long, long time. This is an incorrect grammar because the rule "output" is referencing "one" directly AND indirectly via the rule "number". It would be better if parsley detected this and threw a helpful error.

from parsley import makeGrammar, termMaker
from itertools import chain

x = makeGrammar("""
one = '1'
two = '2'
number = (one | two)*
output = (number | one)*
""", {})
print x("1122121").output()
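
A plausible cause, consistent with the ('a'*)* report elsewhere on this page, is that "number" can succeed without consuming anything, so the outer * keeps re-matching the empty string at the same position. The sketch below is a generic plain-Python combinator model, not Parsley's runtime; it shows a repetition operator that stops as soon as an iteration makes no progress. All names are hypothetical.

```python
def literal(expected):
    """Match an exact string; return (value, new_pos) or None."""
    def parse(text, pos):
        if text.startswith(expected, pos):
            return expected, pos + len(expected)
        return None
    return parse


def choice(*alternatives):
    """Ordered choice over the given parsers."""
    def parse(text, pos):
        for alternative in alternatives:
            result = alternative(text, pos)
            if result is not None:
                return result
        return None
    return parse


def many(parser):
    """Zero-or-more repetition that refuses to loop on empty matches."""
    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            if result is None:
                break
            value, new_pos = result
            if new_pos == pos:
                # Zero-width match: without this guard the loop would
                # repeat forever at the same position.
                break
            values.append(value)
            pos = new_pos
        return values, pos
    return parse


one = literal("1")
two = literal("2")
number = many(choice(one, two))
output = many(choice(number, one))

print(output("1122121", 0))
```

With the guard in place the model terminates after one pass of "number"; a parser could equally raise an error at that point to flag the grammar as suspicious.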

Consider moving NEWS -> CHANGES

As per convention, you may consider moving NEWS->CHANGES http://guide.python-distribute.org/creation.html

Rule arguments are wrapped in tuples

When a rule has multiple arguments and passes those arguments to another rule, all but the last argument are wrapped in single-valued tuples. This is a concern because the tuples are unexpected and the multiple arguments are not handled consistently.

Code example:

from parsley import makeGrammar

GRAMMAR = """
    a :x :y :z = b(x, y, z) -> print("a: x={} y={} z={}".format(x, y, z))
    b :x :y :z = c(x, y, z) -> print("b: x={} y={} z={}".format(x, y, z))
    c :x :y :z =            -> print("c: x={} y={} z={}".format(x, y, z))
"""

ParserClass = makeGrammar(GRAMMAR, {})
parser = ParserClass('')
parser.a(1, 2, 3)

Output from Python 3.5.3:

c: x=((1,),) y=((2,),) z=3
b: x=(1,) y=(2,) z=3
a: x=1 y=2 z=3

Collect tokens from stream as well as rule return values

I'm parsing a script into tokens, and I'd like to store both the parsed results of the child rules and the slice of consumed input (like the angle-brackets shortcut). In the grammar that receives these tokens, the context will determine whether the parsed results can be used, or if the original input is needed. I'm not sure if there's a way to store both of these things at once with Parsley.

Parse error reported in the wrong place

Consider the following Python code:

import parsley

single_digit = parsley.makeGrammar("integer = digit", {})
many_digits = parsley.makeGrammar("integer = digit+", {})

for grammar in (single_digit, many_digits):
    try:
        grammar("1x").integer()
    except Exception as e:
        print e

The single_digit grammar matches 1, then expects EOF but finds x instead.

The many_digits grammar matches 1, then expects another digit or EOF, but finds x instead.

I would expect both grammars to have the same, or at least similar parse errors. Instead, with Parsley 1.3 on Python 2.7, I get:


1x
 ^
Parse error at line 1, column 1: expected EOF. trail: []


1x
^
Parse error at line 2, column 0: expected EOF. trail: [digit]

...that is, single_digit reports a sensible error at a sensible location, but many_digits draws the caret under a character that it should have accepted, while reporting an error on line 2 of an input that contains no newline characters.

doesn't work

It looks like there's a lot going on here lately so I guess I shouldn't be surprised that the example program doesn't work. Using python 2.7 installed by ubuntu and pip install Parsley the example program gives:

Traceback (most recent call last):
  File "parsley_stil.py", line 14, in <module>
    result, error = g.stuff()
ValueError: too many values to unpack

It looks like instead of a (result,error) tuple, g.stuff() just returns result which is a 4 element list.

Tried cloning the repo and installing into a virtualenv -- same result

Tried digging into the code but too scary. Sorry.

Update homepage in setup.py

The setup.py file still lists http://launchpad.net/parsley as the project's homepage, which means PyPI does as well. I think it should point to this github page, since everything on the launchpad.net page is for 1.1 and earlier.

The launchpad.net page should also be updated to indicate that the project has moved. I nearly fell under the impression that parsley might be abandoned because PyPI only contains a link to the old page and the old page's repository hasn't been updated in ages.

Bug calling python function from grammar

Modifying the example in the readme to call a function:

from parsley import makeGrammar

def plusone(d):
    return d + 1

exampleGrammar = """
ones = '1' '1' -> plusone(1)
twos = '2' '2' -> plusone(2)
stuff = (ones | twos)+
"""
Example = makeGrammar(exampleGrammar, {})
g = Example("11221111")
result = g.stuff()
print result

Fails on master (e58c0c6) as well as tag 1.2 (7ddaa41) with

NameError: name 'plusone' is not defined

It seems like this must have worked at one time for the calculator example in the tutorial to have worked.

Full error is:

Traceback (most recent call last):
  File "parsley_example.py", line 13, in <module>
    result = g.stuff()
  File "/home/m/q/topaz/patterns/stil/parsley/v/local/lib/python2.7/site-packages/parsley.py", line 85, in invokeRule
    ret, err = self._grammar.apply(name, *args)
  File "/home/m/q/topaz/patterns/stil/parsley/v/local/lib/python2.7/site-packages/ometa/runtime.py", line 454, in apply
    val, err = self._apply(r, ruleName, args)
  File "/home/m/q/topaz/patterns/stil/parsley/v/local/lib/python2.7/site-packages/ometa/runtime.py", line 483, in _apply
    [rule(), self.input])
  File "/pymeta_generated_code/pymeta_grammar__Grammar.py", line 50, in rule_stuff
  File "/pymeta_generated_code/pymeta_grammar__Grammar.py", line 47, in _G_many1_9
  File "/home/m/q/topaz/patterns/stil/parsley/v/local/lib/python2.7/site-packages/ometa/runtime.py", line 584, in _or
    ret, err = f()
  File "/pymeta_generated_code/pymeta_grammar__Grammar.py", line 39, in _G_or_10
  File "/home/m/q/topaz/patterns/stil/parsley/v/local/lib/python2.7/site-packages/ometa/runtime.py", line 483, in _apply
    [rule(), self.input])
  File "/pymeta_generated_code/pymeta_grammar__Grammar.py", line 14, in rule_ones
  File "<string>", line 1, in <module>
NameError: name 'plusone' is not defined
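
One likely explanation, offered as a hedged sketch rather than a confirmed diagnosis: the second argument to makeGrammar is the dictionary of names that the grammar's Python expressions can see, and the example above passes an empty one. Putting plusone in that dictionary should make the name resolvable. The helper parse_stuff and the name bindings are hypothetical, introduced only for illustration.

```python
def plusone(d):
    return d + 1


example_grammar = """
ones = '1' '1' -> plusone(1)
twos = '2' '2' -> plusone(2)
stuff = (ones | twos)+
"""

# Names that grammar actions may reference. The failing example passed {} here,
# so 'plusone' was not visible when the action was evaluated.
bindings = {"plusone": plusone}


def parse_stuff(text):
    """Parse `text` with the `stuff` rule (hypothetical helper; needs Parsley)."""
    # Imported lazily so Parsley is only required when a parse is requested.
    from parsley import makeGrammar

    Example = makeGrammar(example_grammar, bindings)
    return Example(text).stuff()
```

With Parsley installed, parse_stuff("11221111") would exercise the same input as the report above.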

Terminals as regular expressions

I need to build a grammar from user input, and I was trying to do something like this:

parsley.makeGrammar("d = '{test}'".format(test="TeSt"), {})("TeSt").d()

But I want {test} to be matched case-insensitively, so I tried the following:

parsley.makeGrammar("d = '(?i){test}'".format(test="TeSt"), {})("TeSt").d()
parsley.makeGrammar("d = (?i)'{test}'".format(test="TeSt"), {})("TeSt").d()

Neither of them worked. Is there any way to achieve that?
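
One possible workaround, sketched under the assumption that quoted terminals are matched literally rather than as regular expressions: generate a rule in which every letter is an alternation of its lower-case and upper-case forms. The functions case_insensitive_rule and _quote are hypothetical helpers, not part of Parsley, and the escaping is an assumption about how the grammar reads quoted characters.

```python
def _quote(ch):
    """Quote one character as a grammar literal, escaping backslash and quote."""
    return "'" + ch.replace("\\", "\\\\").replace("'", "\\'") + "'"


def case_insensitive_rule(name, text):
    """Build the source of a rule that matches `text` in any letter case."""
    if not text:
        raise ValueError("text must not be empty")
    parts = []
    for ch in text:
        variants = sorted({ch.lower(), ch.upper()})
        # Some characters change length when re-cased ('ß'.upper() is 'SS');
        # keep only single-character variants, falling back to the original.
        variants = [v for v in variants if len(v) == 1] or [ch]
        literals = [_quote(v) for v in variants]
        if len(literals) == 1:
            parts.append(literals[0])
        else:
            parts.append("(" + "|".join(literals) + ")")
    # <...> asks the grammar to return the text that was actually consumed.
    return "{} = <{}>".format(name, " ".join(parts))


print(case_insensitive_rule("d", "TeSt"))
```

The resulting string could then be passed to parsley.makeGrammar in place of the formatted literal, so that "TeSt", "test", and "TEST" would all be candidates for a match.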

Overriding builtin func globals as kwarg in inter.py TrampolinedGrammarInterpreter

Documentation inaccurately describes "letter"

In general, Parsley does not seem to be very aware of Unicode. As one specific manifestation of this problem, the documentation says:

letter:
    Matches a single ASCII letter.

However, looking at the source code, I think it is doing something different. In runtime.py, the letter() method of the OMetaBase class is implemented by calling str.isalpha(). The Python documentation describes isalpha as:
Python 2.7: Depends on locale
Python 3.5: Union of several Unicode categories

In other words, unless I am misunderstanding the source, the characters matched by letter will depend on version of Python and in version 2.7 will depend on locale. This should be documented.

ws and digit have similar problems, btw. The ws documentation says it "matches zero or more spaces, tabs, or newlines" but the implementation in the source appears to use str.isspace(), which has similar behavior to isalpha.
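
The difference is easy to see on Python 3 without involving Parsley at all. The sketch below compares str.isalpha() and str.isspace() with a strict ASCII check; is_ascii_letter is a hypothetical helper written for this comparison, not something Parsley provides.

```python
import string


def is_ascii_letter(ch):
    """True only for a single character in a-z or A-Z."""
    return len(ch) == 1 and ch in string.ascii_letters


# Columns: character, str.isalpha(), strict ASCII check.
for ch in ["a", "Z", "é", "ß", "中", "1", "_"]:
    print(repr(ch), ch.isalpha(), is_ascii_letter(ch))

# str.isspace() is similarly broad: it accepts the no-break space U+00A0,
# which is not a space, tab, or newline.
print("\u00a0".isspace())
```

On Python 3, the non-ASCII letters in the list are accepted by isalpha() but rejected by the ASCII check, which is the mismatch between the documentation and the implementation described above.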

chunked input stream

Parsley should be able to consume input from files in chunks. Each ChunkedInputStream object should refer to a buffer, and if .tail() falls off the end of the buffer, the new ChunkedInputStream object should load a new chunk.
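
A minimal sketch of the idea, assuming an interface with head() and tail() and nothing else. The class below is an illustration only and does not mirror Parsley's real InputStream; _SharedBuffer, from_file, and at_end are hypothetical names. As a simplification it keeps one growing buffer shared by all positions instead of a separate buffer per chunk.

```python
import io


class _SharedBuffer:
    """Text read so far, shared by every stream position over one file."""

    def __init__(self, fileobj, chunk_size):
        self.fileobj = fileobj
        self.chunk_size = chunk_size
        self.data = ""
        self.exhausted = False

    def ensure(self, index):
        """Read chunks until `index` is available or the file ends."""
        while index >= len(self.data) and not self.exhausted:
            chunk = self.fileobj.read(self.chunk_size)
            if chunk:
                self.data += chunk
            else:
                self.exhausted = True
        return index < len(self.data)


class ChunkedInputStream:
    """One position in the input; tail() yields the next position."""

    def __init__(self, buffer, position=0):
        self._buffer = buffer
        self.position = position

    @classmethod
    def from_file(cls, fileobj, chunk_size=4096):
        return cls(_SharedBuffer(fileobj, chunk_size))

    def at_end(self):
        # Loads a new chunk if this position falls off the end of the buffer.
        return not self._buffer.ensure(self.position)

    def head(self):
        if self.at_end():
            raise EOFError("end of input")
        return self._buffer.data[self.position]

    def tail(self):
        return ChunkedInputStream(self._buffer, self.position + 1)


stream = ChunkedInputStream.from_file(io.StringIO("abc"), chunk_size=2)
print(stream.head(), stream.tail().head(), stream.tail().tail().head())
```

Because earlier positions stay valid, a backtracking parser can return to an old stream object without rereading the file; the cost is that the buffer is never trimmed in this sketch.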

It would be better if TrampolinedGrammarInterpreter accepted generated grammars.

By generated grammars I mean translated ones like those inside ometa/_generated/*.py.
I guess translating the grammar every time might cause some extra overhead.

Ranges

Parsley should support ranges. Think we said

'A'..'Z'

was a reasonable syntax.
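
To make the intended semantics concrete, here is a small sketch of what a range such as 'A'..'Z' could mean at match time: accept a single character whose code point lies between the two endpoints, inclusive. The function char_range is hypothetical and says nothing about how Parsley would implement the syntax.

```python
def char_range(low, high):
    """Return a predicate accepting one character between low and high, inclusive."""
    if len(low) != 1 or len(high) != 1:
        raise ValueError("range endpoints must be single characters")
    if low > high:
        raise ValueError("range start must not come after range end")

    def match(ch):
        # Single characters compare by code point in Python.
        return len(ch) == 1 and low <= ch <= high

    return match


upper = char_range("A", "Z")
print(upper("M"), upper("a"), upper("Z"))
```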

Project abandoned?

Hi, I would like to know what's happening with this project. The last tag (1.2) is from April last year and doesn't have Python 3 support; since then, no proper release has been made.

OMetaBase.apply should take kwargs, and _GrammarWrapper should pass them along

Maybe if I create some tickets it will persuade me to actually send you pull requests.

Right now _GrammarWrapper eats kwargs, so you can't actually get anything to the rules without icky positional args.

ParseError should be hashable

ParseError defines a __eq__ method, but no __hash__ method. In Python 3, a class that defines __eq__ without also defining __hash__ has its __hash__ set to None, so its instances are unhashable. This breaks PyTest and Python's traceback.format_exception_only function.
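
The behaviour can be reproduced without Parsley. The sketch below uses stand-in classes, FakeParseError and HashableParseError, which are hypothetical and only imitate the shape described in this report (a position plus a list-valued error attribute). It relies on the documented Python 3 rule that a class defining __eq__ without __hash__ gets __hash__ set to None.

```python
class FakeParseError(Exception):
    """Stand-in with __eq__ but no __hash__, as described in this report."""

    def __init__(self, position, error):
        super().__init__(position, error)
        self.position = position
        self.error = error  # a list in this sketch

    def __eq__(self, other):
        return (
            isinstance(other, FakeParseError)
            and self.position == other.position
            and self.error == other.error
        )


class HashableParseError(FakeParseError):
    """Same equality, plus a hash built from hashable views of the fields."""

    def __hash__(self):
        return hash((self.position, repr(self.error)))


try:
    hash(FakeParseError(0, ["expected 'a'"]))
except TypeError as exc:
    print("unhashable:", exc)

print(len({HashableParseError(0, ["x"]), HashableParseError(0, ["x"])}))
```

The subclass keeps the inherited __eq__, so two errors that compare equal also hash equally, which is what sets and dictionaries require.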

Discovery

During testing of a new grammar, PyTest reported an internal error when Parsley raised a ParseError.

INTERNALERROR>   File "/home/joel/.virtualenvs/meetup2xibo/lib/python3.5/site-packages/_pytest/_code/code.py", line 481, in exconly
INTERNALERROR>     lines = format_exception_only(self.type, self.value)
INTERNALERROR>   File "/usr/lib/python3.5/traceback.py", line 136, in format_exception_only
INTERNALERROR>     return list(TracebackException(etype, value, None).format_exception_only())
INTERNALERROR>   File "/usr/lib/python3.5/traceback.py", line 439, in __init__
INTERNALERROR>     _seen.add(exc_value)
INTERNALERROR> TypeError: unhashable type: 'ParseError'

The TypeError is raised by Python standard library function traceback.format_exception_only.

A Simple Test

The following code demonstrates the use of traceback.format_exception_only and raises the TypeError without involving PyTest.

from parsley import makeGrammar, ParseError
import sys
import traceback

def format_exception():
    (last_type, last_value, last_traceback) = sys.exc_info()
    return traceback.format_exception_only(last_type, last_value)

def parse(text):
    parser = makeGrammar("foo = 'a'", {})
    try:
        return parser(text).foo()
    except ParseError:
        return format_exception()

def divide(x, y):
    try:
        return x / y
    except ZeroDivisionError:
        return format_exception()

def test():
    print(divide(6, 2))
    print(divide(6, 0))
    print(parse('a'))
    print(parse('b'))

test()

Running the test code with Python 3.5 gives the following results:

The quotient is printed.
The ZeroDivisionError is formatted.
The parser recognizes 'a'.
The parser raises a ParseError when parsing 'b', but traceback.format_exception_only fails to format the error.

$ python foo.py
3.0
['ZeroDivisionError: division by zero\n']
a
Traceback (most recent call last):
  File "foo.py", line 28, in <module>
    test()
  File "foo.py", line 26, in test
    print(parse('b'))
  File "foo.py", line 14, in parse
    return format_exception()
  File "foo.py", line 7, in format_exception
    return traceback.format_exception_only(last_type, last_value)
  File "/usr/lib/python3.5/traceback.py", line 136, in format_exception_only
    return list(TracebackException(etype, value, None).format_exception_only())
  File "/usr/lib/python3.5/traceback.py", line 439, in __init__
    _seen.add(exc_value)
TypeError: unhashable type: 'ParseError'

Workaround

The following code monkey patches ParseError to add a __hash__ method.

def parse_error_hash(self):
    """Define missing ParseError.__hash__()."""
    return hash((self.position, self.formatReason()))

ParseError.__hash__ = parse_error_hash

Rerunning the test code with the monkey patch gives successful results.

$ python foo.py
3.0
['ZeroDivisionError: division by zero\n']
a
["ometa.runtime.ParseError: \nb\n^\nParse error at line 1, column 0: expected the character 'a'. trail: [foo]\n\n"]

camelCase makes Pythonista sad

Function names should be lowercase, with words separated by underscores as necessary to improve readability.
PEP 8.
