pyga / parsley
License: Other
I'm trying to parse mathematical expressions with operator precedence, but I'm running into an odd problem where a rule of the form "A | B" fails to match, despite the fact that rule A matches the input string.
A stripped-down version of the grammar that exhibits this behavior:
spec = """
integer_literal = < digit+ >:x -> ("integer_literal",x)
oplevel_unary_left = ('+'|'-'):op unit:X -> ("op",1,op,X)
oplevel_unary_and_higher = oplevel_unary_left | integer_literal
oplevel_exponent = oplevel_exponent_and_higher:L ws ('^'):op ws oplevel_unary_and_higher:R -> ("op",2,op,L,R)
oplevel_exponent_and_higher = oplevel_exponent | oplevel_unary_and_higher
oplevel_mult = oplevel_mult_and_higher:L ws ('*'|'/'|'%'):op ws oplevel_exponent_and_higher:R -> ("op",2,op,L,R)
oplevel_mult_and_higher = oplevel_mult | oplevel_exponent_and_higher
"""
parser = parsley.makeGrammar(spec, {})
Then I get
>>> parser("2*3").oplevel_mult()
('op', 2, '*', ('integer_literal', '2'), ('integer_literal', '3'))
but
>>> parser("2*3").oplevel_mult_and_higher()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files\Anaconda3\lib\site-packages\parsley.py", line 98, in invokeRule
raise err
ometa.runtime.ParseError:
2*3
^
Parse error at line 1, column 2: expected EOF. trail: [digit]
(or similarly with exponent_and_higher vs. just exponent)
I don't see how it's even possible for oplevel_mult_and_higher to fail when the first arm is oplevel_mult, which passes.
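For comparison, here is a minimal sketch in plain Python (not Parsley) of a common way to avoid left-recursive rules altogether: each precedence level is parsed as "operand (op operand)*" and folded to the left in a loop. The function names and the tokenizer are illustrative assumptions, and only the tuple shapes are borrowed from the grammar above. Unary operators and whitespace handling are left out.

```python
import re

TOKEN = re.compile(r"\d+|[-+*/%^]")

def tokenize(text):
    # Split the input into integer literals and single-character operators.
    # Characters that match neither are silently skipped in this sketch.
    return TOKEN.findall(text)

def parse_integer(tokens, pos):
    if pos >= len(tokens) or not tokens[pos].isdigit():
        raise ValueError("expected an integer at token %d" % pos)
    return ("integer_literal", tokens[pos]), pos + 1

def parse_level(tokens, pos, operators, parse_operand):
    # Parse `operand (op operand)*`, folding the results to the left.
    left, pos = parse_operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] in operators:
        op = tokens[pos]
        right, pos = parse_operand(tokens, pos + 1)
        left = ("op", 2, op, left, right)
    return left, pos

def parse_exponent(tokens, pos):
    return parse_level(tokens, pos, {"^"}, parse_integer)

def parse_mult(tokens, pos):
    return parse_level(tokens, pos, {"*", "/", "%"}, parse_exponent)

def parse(text):
    tokens = tokenize(text)
    tree, pos = parse_mult(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected token at index %d" % pos)
    return tree

print(parse("2*3"))
```

The same shape can usually be written in a PEG grammar as a repetition, which sidesteps the question of how the parser handles indirect left recursion.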
Hi!
I found this in the reference:
!(pythonExpression): Invoke a Python expression as an action.
but I don't understand how it's used.
Any help appreciated!
bin/generate_parser (which isn't installed) can generate one from a string in a Python module, but not from a file.
Also, that program only appears to work from the parsley source, since ometa.vm_builder uses open with relative paths.
I am experiencing some oddities when using or expressions. I have replicated it with a trivial example:
import parsley
x = parsley.makeGrammar(
"""
away_we_go = ('away we go' | 'off and running') -> "away_we_go"
back_underway = ('back underway' | 'back under way') -> "back_underway"
all = (anything*) -> "all"
foo = away_we_go | back_underway | all
""",
{}
)
print x("giorenagoirhger").foo() # all
print x("giorena goirhger").foo() # all
print x("away we go").foo() # away_we_go
print x("back underway").foo() # back_underway
print x("go").foo() # all
print x("lets go").foo() # all
print x("we are back under way go").foo() # all
print x("we are back under way").foo() # all
print x("back under way go").foo() # ERROR
Traceback (most recent call last):
File "process/tests/parsley_test.py", line 21, in <module>
print x("back under way go").foo() # ERROR
File "/Users/reuben/.virtualenvs/processor/lib/python2.7/site-packages/parsley.py", line 98, in invokeRule
raise err
ometa.runtime.ParseError:
back under way go
^
Parse error at line 1, column 1: expected EOF. trail: []
Unless I am mistaken, "back under way go" should fail to match the away_we_go pattern and the back_underway pattern, and then match the all pattern.
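One possible reading of this result is that the choice is ordered, as in PEG-style parsers: the first alternative that succeeds is kept, and the later ones are not retried when the rest of the input fails to reach EOF. The sketch below is plain Python, not Parsley's implementation. All helper names are hypothetical and it only models that one assumption.

```python
def literal(text):
    # Return a matcher that succeeds only if the input at `pos` starts with `text`.
    def match(source, pos):
        if source.startswith(text, pos):
            return text, pos + len(text)
        return None
    return match

def ordered_choice(*alternatives):
    # Try each alternative in order and commit to the first one that succeeds.
    def match(source, pos):
        for alternative in alternatives:
            result = alternative(source, pos)
            if result is not None:
                return result
        return None
    return match

def match_all(source, pos):
    # Stand-in for `anything*`: consume the rest of the input.
    return source[pos:], len(source)

def parse_to_end(rule, source):
    # Mimic a top-level rule call that also requires end of input.
    result = rule(source, 0)
    if result is None or result[1] != len(source):
        raise ValueError("expected EOF")
    return result[0]

back_underway = ordered_choice(literal("back underway"), literal("back under way"))
foo = ordered_choice(literal("away we go"), back_underway, match_all)

print(parse_to_end(foo, "we are back under way"))
try:
    parse_to_end(foo, "back under way go")
except ValueError as error:
    print("error:", error)
```

Under that assumption, "back under way go" fails because back_underway succeeds on the prefix and the trailing " go" is left over, so the all alternative is never reached.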
Currently, we can only do setNextRule("ruleName"). But sometimes the next rule may be of the form:
ruleName :arg1 :arg2 = expr
Since the arguments are set by the caller, the current rule needn't be in the form of a term-based Action(arg); it should be the data directly.
To accomplish this, I used the codeName to tell the two cases apart (I don't know whether that's a good way to go). When the codeName is None, it means the arg is already pure data and doesn't need to go through self._eval again.
A sample link:
https://github.com/introom/parsley/blob/master/ometa/interp.py#L154
To support rule invocation with arguments, some other parts would of course have to be modified too.
I will discuss that later, once this issue gets some attention.
Are there any plans for Python 3 support? That's the one thing that keeps me from adopting this for use in matplotlib or astropy...
def moduleFromGrammar(source, className, modname, filename):
mod = module(modname)
mod.__name__ = modname
mod.__loader__ = GeneratedCodeLoader(source)
> code = compile(source, filename, "exec")
E File "/pymeta_generated_code/pymeta_grammar__Grammar.py", line 5
E if Grammar.globals is not None:
E ^
E IndentationError: expected an indented block
The exceptions.py example is actually a copy (with errors in it) of the minml.py example.
Dear developers,
Is there a way to pass the current line and column of the file being parsed to the action of a rule?
For example:
the grammar:
parameter = (word:w -> AnAction('parameter',str(w),line,column) )
the python file:
def AnAction(name, value, line, column):
print(value +' ' + str(line) + ' ' + str(column))
the input text:
hello this is
a test
the output:
hello 1 1
this 1 7
is 1 12
a 2 1
test 2 3
I'm trying to build a compiler and I would like to check if a variable is already defined. So to make an error message it would be nice if Parsley would provide me with the location of the variable, even when the file has the correct syntax.
Thanks in advance!!!
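As a rough illustration of the requested output, the helper below derives a 1-based line and column from a character offset in plain Python. How an action would obtain that offset from Parsley is left open here. The function name line_and_column is hypothetical, and the word positions are found with a regular expression only for the sake of the demo.

```python
import re

def line_and_column(text, offset):
    # Lines and columns are 1-based, matching the expected output above.
    line = text.count("\n", 0, offset) + 1
    # rfind returns -1 when there is no earlier newline, which makes
    # the first character of the first line come out as column 1.
    last_newline = text.rfind("\n", 0, offset)
    column = offset - last_newline
    return line, column

source = "hello this is\na test"
positions = [(m.group(), ) + line_and_column(source, m.start())
             for m in re.finditer(r"\w+", source)]
for word, line, column in positions:
    print(word, line, column)
```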
some_grammar = "s = 'something' "
bindings = { 'a': 'hello'}
instead of
g = parsley.makeGrammar(some_grammar, bindings)
g("blabla").s()
we want:
g = parsley.makeGrammar(some_grammar, {})
g("blabla", bindings).s()
It would be even better if overriding were supported (when the same name is given again later, it overrides the previous one).
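The override behaviour described here can be sketched in plain Python with collections.ChainMap. This is only an illustration of the lookup order being asked for, not a Parsley feature. The helper merge_bindings is a hypothetical name.

```python
from collections import ChainMap

def merge_bindings(*layers):
    # ChainMap returns the first match it finds, so the layers are
    # reversed to let later ones override earlier ones.
    return ChainMap(*reversed(layers))

grammar_bindings = {"a": "hello", "b": "world"}
call_bindings = {"a": "goodbye"}
merged = merge_bindings(grammar_bindings, call_bindings)

print(merged["a"], merged["b"])
```

A ChainMap leaves the original dictionaries untouched, so bindings given at grammar creation time would stay unchanged between calls.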
The docstring for stack ought to mention that it's function composition, rather than anything specific to parsley.
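For readers unfamiliar with the term, a generic function-composition helper might look like the sketch below. It does not reproduce stack itself. The name compose and the right-to-left application order are assumptions made for illustration.

```python
from functools import reduce

def compose(*functions):
    # compose(f, g, h)(x) is f(g(h(x))): the last function runs first.
    def composed(value):
        return reduce(lambda acc, fn: fn(acc), reversed(functions), value)
    return composed

increment_text = compose(str, lambda n: n + 1, int)
print(increment_text("41"))
```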
When a Parsley field is wrong, you have an onFieldError() callback, but there is no onFormError() method. Why can't we create this method, like onFieldValidate and onFormValidate? It would be more consistent.
parsley.ometa.runtime.ParseError has a very nice way of formatting / printing error messages
It would be nice if there was some way to access this from the grammar rules. For example, if there was a type of exception that would be picked up and have the line and character added.
Also, it would be nice if there was a way to emit a warning with a given message formatted in the same way as the ParseError.
(This can kinda-sorta be achieved via print str(grammar.input.head()[1].withMessage("whatever warning")), but there doesn't seem to be a way to get access to the grammar object itself from inside a rule.)
As I was learning parsley, I made a bunch of little notes / idioms for doing things:
https://gist.github.com/kurtbrose/05f2dd879eba6a88a3dc7c13e36ce772
There are also some larger grammars I defined:
TLS RFC parsing:
https://gist.github.com/kurtbrose/bb98bdf42dc709cbc5a1b94b058b703c
Thrift IDL Parsing:
https://github.com/kurtbrose/thriftpy/blob/new_parser/thriftpy/parser/parser.py#L395-L497
Is it possible to freeze code that uses parsley?
I couldn't with pyinstaller or py2app.
I'm on OSX High Sierra.
I've found myself wanting to get the return value from a ?() expression, for example, when I have a lookup dictionary, and I want the rule to match if the word is present in the dictionary, and I also want to use the returned value. Currently you have to do something like:
thing = word:word !(things.get(word)):thing ?(thing) -> thing
Would be nice if it was possible to do something like:
thing = word:word ?(things.get(word))
Or is there a better way of achieving this functionality? Would it be possible for ?() to just return the result rather than True or False?
Watching your pycon talk and looking through the code base, I gather that it's possible to compile from peg into python using parsley, but after a couple hours of searching I have yet to figure out the right structure to pass to the writePython function. Could you put together an example in the documentation?
I just filed #60 where I outline some Unicode-related features Parsley is missing. I notice however I could have easily added these features in my own code, without modifying Parsley, if I could somehow define new rules. Parsley rules appear to all be functions that accept or reject a character. If I could define my own such functions I could define, for example, a function that brings in the Regex module and matches on the regex \p{XID_Continue}.
A friend mentioned they had hacked additional rules into Parsley in the past by calling undocumented functions, and it actually looks like I could possibly get what I want by extending OMetaBase. However, I would not trust such things unless they are to some extent documented, because if they are undocumented I expect they could stop working suddenly in a future version.
Note: you have to read this in plaintext because there's some markup going on that is eating special characters.
In learning parsley, I had trouble understanding what was going on. And after some help from dash on IRC, a light bulb went on and I think I have an explanation of the various components in parsley which may help others come up to speed.
In a variant of BNF, a parsley rule is of the form:
<rule_name> = <pattern>[:data_name] [<pattern>[:data_name]]* [-> (<Python expression>)]
The first simple magic is every pattern, when matched, yields a value. That value is either discarded, or if followed by a colon and a data name (i.e. a Python legal symbol), the returned value is placed into the data name.
The second simple magic is that after the arrow ('->') you can have a Python expression surrounded by parentheses. Each data name defined prior to this expression can be used as part of the expression. It's the way you transfer the values determined by the pattern matching into expression so you can do whatever calculation you want.
The third simple magic is that the result of the expression at the end of the statement is assigned to the name at the start of the statement. You can retrieve this value by treating the name as a method call (explanation TBD)
Now the next part gave me that "ah ha" moment. Do you remember the tale about the Hindu woman in a remote village who was asked about her view of a creation myth? Well, in this case, it's Python expressions all the way down.
Every pattern, if it's not a terminal pattern (i.e. constant strings, characters, or predefined patterns), refers to another pattern. And that pattern either returns a value or refers to another pattern. And so on… All the way down.
To be quite frank, if I had had it explained to me like this, I'm not sure I would have understood it right at first, but I don't think I would have gotten as far into the weeds as I did.
---- sketchy rewrite of the tutorial beginning ----
Forget everything you know about parsing and regular expressions. Parsley does many things similar to regular expressions and other parsers, but you will learn faster if you start with a clean slate.
Let's start with a simple parsley grammar:
X = 'a' ('b' | 'c') 'd'+ 'e'
X is defined as any string that contains the letter 'a' followed by 'b' or 'c', followed by at least one 'd', which is followed by a single 'e'. Yes, very similar to regular expressions, but get those regexes out of your mind please.
Each one of the elements of this expression is considered a terminal node. This means that once you match, there are no other patterns to search for. The string that matches this pattern is returned as the value of 'X'. An example of an expression with a non-terminal node is a rewrite of the above:
X = 'a' Y 'd'+ 'e'
Y = ('b' | 'c')
In this case, Y is a non-terminal node because it contains an additional pattern that can match against the input data.
(Clarification request: if we get a match, does it yield the string we matched?)
All of this pattern matching is pretty common and I just know you're thinking about regular expressions. Here's where we start to deviate into something really nice. Going back to the original expression, let's say we want to do something based on whether we match the 'b' or the 'c'. In that case, we want to preserve the state of that match, and we do that by assigning that match to a variable.
X = 'a' ('b' | 'c'):v1 'd'+ 'e'
In this case, we assigned the result of the match of 'b' or 'c' to the variable 'v1'. Doing something with that result is just as easy.
X = 'a' ('b' | 'c'):v1 'd'+ 'e' -> (print v1)
I think of the '->' arrow as 'yields' as in matching the pattern on the left yields the result of evaluating the right.
... and it continues on. At this point I'd explain how you get either a single element or lists.
Anyway, that's my idea of how to make this more accessible. Corrections and feedback are welcome, and if you're willing to wait halfway to the death of the universe, I'll be glad to contribute documentation.
This parser thinks "foo 34" is an identifier, because the space is silently consumed.
def printit(t, x):
print t + ':', repr(x)
return x
parser = parsley.makeGrammar(
'''
DIGIT = ("0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"):x -> printit('digit', x)
LETTER = anything:x ?(x in "ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz") -> printit('letter', x)
WHITESPACE = (' '|'\t'|'\n'|'\r')+:x -> printit('whitespace', ''.join(x))
IDENT = <LETTER ( DIGIT | LETTER )*>:x -> printit('ident', x)
UNSIGNED_INTEGER = <DIGIT+>:digits -> printit('digits', digits)
EXPONENT = <( "e" | "E" ) ( "+" | "-" )? UNSIGNED_INTEGER>:x -> printit('exponent', x)
UNSIGNED_NUMBER = <UNSIGNED_NUMBER ("." UNSIGNED_NUMBER)? EXPONENT?>:x -> printit('unsigned number', x)
TEXPR = (WHITESPACE | IDENT | UNSIGNED_NUMBER):x -> printit('texpr', x)
TEXPRS = <TEXPR+>
''', {'printit': printit})
print parser('bar foo 34 a34 34e-10').TEXPRS()
The OMeta paper describes mutual left recursion as a supported feature, but it appears broken in parsley.
Consider a simple left recursive grammar (this works):
exp = """\
num = <digit+>:n -> int(n)
expr = expr:e "-" num:n -> e - n
| num
"""
compiled = makeGrammar(exp, {})
print(compiled("4-3").expr())
->
1
Now let's modify the grammar precisely as done in the paper: replacing expr in the expr recursion with another rule x that calls into expr:
exp = """\
num = <digit+>:n -> int(n)
x = expr
expr = x:e "-" num:n -> e - n
| num
"""
compiled = makeGrammar(exp, {})
print(compiled("4-3").x())
->
Traceback (most recent call last):
File "./t.py", line 36, in <module>
print(compiled("4-3").x())
File "/home/robertc/.virtualenvs/scratch/local/lib/python2.7/site-packages/parsley.py", line 98, in invokeRule
raise err
ometa.runtime.ParseError:
4-3
^
Parse error at line 1, column 2: expected EOF. trail: [digit]
If run with expr rather than x, the result is the same.
a) seems like map is unused
b) probably desirable to avoid overriding the map builtin
def _substitute(self, map):
return [Term(self.tag, self.data, None, self.span)]
On PyPI, the Home Page and Bug Tracker references point at Launchpad, which was rather confusing since it was last modified in 2013.
Nor do the docs mention any information about contact, development, issues or contributing for the project. I had to guess that it was now hosted on GitHub.
It would be great if those links could be updated, and maybe a page in the docs could mention a small amount of info for contacting / helping the project, etc.
Maybe even the Launchpad page could link to the new development location as well.
Have you abandoned Launchpad? I think I managed to install parsley-master over Parsley 1.1 and ran the test with the same result as before:
Using or statements sometimes fails depending on the ordering. Shown below is an example of such a failure mode.
testCaseGrammar = """
name = (<letter+>:n ws name:rn -> n + ' ' + rn) | <letter+>
nameRev = <letter+> | (<letter+>:n ws name:rn -> n + ' ' + rn)
"""
parser = makeGrammar(testCaseGrammar,{})
print parser("name with spaces").name()
print parser("name with spaces").nameRev()
The first form, using name, returns "name with spaces", while the second form gives a ParseError. As I understand the grammar, these two rules should function identically, with both forms returning "name with spaces".
Hello, parsley looks really cool and I have an application I am interested in switching from pyparsing to parsley. However, I am blocked in doing so because my application needs to parse Unicode text (Python 3 source code, actually). The built in rules are not Unicode aware, so I cannot possibly match, for example, "a unicode alphanumeric character"-- I would have to inline all possible alphanumeric characters in my grammar (which might not be possible anyway, see issue #1). Pyparsing avoids this problem for my purposes by allowing the use of standard Python regex, which has three simple unicode "classes".
The ideal here would be for Parsley to allow matching on arbitrary Unicode categories and properties. Every codepoint in Unicode has one category, which is a two-letter code (like "Ll" for Letter, Lowercase), and an arbitrary number of "properties", which are key-value pairs (like "Script=Cyrillic"). Properties are very important because if you are using Parsley you are probably parsing something like a programming language, and the current best practice, as I understand it, is that programming languages follow the rules of UAX 31. This defines two Unicode properties, XID_Start and XID_Continue, which when set to true match the Unicode body's recommendations for what constitutes an identifier (like a variable name). Supporting properties unfortunately might be kind of difficult, as there is nothing in the standard library for this. So to support properties, either Parsley would have to embed some form of the Unicode character database, which is physically large, or introduce a module dependency (the third-party "Regex" module can do this using \p{}).
A good next-best-effort would be if Parsley could support just the Unicode categories. This is easier because the standard library has the unicodedata module which lets you query a character's category.
I feel like the minimum version of this feature would be to at least have feature parity with the Python re module, which has coarse \s, \w, and \d (whitespace, alphanumeric, numeric) classes. It might be hard to literally match the same strings as the re module because the re documentation is quite vague as to how \s\w\d are defined. But as far as I know you could get a reasonable approximation of these categories by using the categories in unicodedata.
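A minimal sketch of what category-based predicates could look like, using only the standard library. The function names are hypothetical, the category choices are approximations rather than the re module's exact definitions, and nothing here is wired into Parsley. The last helper leans on str.isidentifier, which follows Python's own identifier rules (themselves based on UAX 31), as a rough stand-in for an XID_Continue test.

```python
import unicodedata

def is_letter(ch):
    # Any general category starting with "L" (Lu, Ll, Lt, Lm, Lo).
    return unicodedata.category(ch).startswith("L")

def is_decimal_digit(ch):
    # Category Nd covers decimal digits in every script.
    return unicodedata.category(ch) == "Nd"

def is_space(ch):
    # Space separators (Zs) plus the usual ASCII control whitespace.
    return unicodedata.category(ch) == "Zs" or ch in "\t\n\r\f\v"

def is_identifier_continue(ch):
    # Approximation: a character may continue an identifier if
    # appending it to a valid identifier keeps the identifier valid.
    return ("a" + ch).isidentifier()

for ch in ["a", "\u00e9", "\u0663", "\u00a0", "-"]:
    print(repr(ch), unicodedata.category(ch), is_letter(ch),
          is_decimal_digit(ch), is_space(ch), is_identifier_continue(ch))
```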
I came across a bug that was rooted in what seems to be a curious feature of Parsley: changing from single to double quotes leads to a parsing error:
import sys
print(sys.version_info)
# sys.version_info(major=3, minor=5, micro=1, releaselevel='final', serial=0)
import parsley
print(parsley.__version__)
#1.3
grammar = parsley.makeGrammar("""
add = <digit+>:left sp "+" sp add:right -> ('add', left, right)
| <digit+>:child -> child
sp = ' '*
""", {})
grammar('4 + 3').add()
# ('add', '4', '3')
But if we change sp = ' '* to sp = " "* and keep everything else constant, we get an error:
grammar = parsley.makeGrammar("""
add = <digit+>:left sp "+" sp add:right -> ('add', left, right)
| <digit+>:child -> child
sp = " "*
""", {})
grammar('4 + 3').add()
---------------------------------------------------------------------------
ParseError Traceback (most recent call last)
<ipython-input-20-5d4b7ce9eeba> in <module>()
6 """, {})
7
----> 8 grammar('4 + 3').add()
/Users/jakevdp/anaconda/envs/python3.5/lib/python3.5/site-packages/parsley.py in invokeRule(*args, **kwargs)
96 err = ParseError(err.input, err.position + 1,
97 [["message", "expected EOF"]], err.trail)
---> 98 raise err
99 return invokeRule
100
ParseError:
4 + 3
^
Parse error at line 2, column 0: expected EOF. trail: []
Is the semantic difference between " " and ' ' intended? I couldn't find any discussion of this in the documentation. Thank you!
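One hypothesis that would explain the difference, offered here as an assumption and not confirmed in this thread: single-quoted literals match exactly, while double-quoted literals are matched as tokens that skip leading whitespace first. The plain-Python sketch below models just that hypothesis with two hypothetical helpers. It shows why a whitespace-skipping match for " " could never succeed.

```python
def match_exact(source, pos, text):
    # Match `text` exactly at `pos`; return the new position or None.
    if source.startswith(text, pos):
        return pos + len(text)
    return None

def match_token(source, pos, text):
    # Assumed token behaviour: skip leading whitespace, then match exactly.
    while pos < len(source) and source[pos].isspace():
        pos += 1
    return match_exact(source, pos, text)

source = "4 + 3"
print(match_exact(source, 1, " "))   # the space right after the 4
print(match_token(source, 1, " "))   # whitespace is skipped before matching
print(match_token(source, 1, "+"))   # a token for + still matches
```

Under this model the rule sp = " "* always matches zero times, so the space before the 3 is never consumed and <digit+> fails there.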
From parsley.parsley, the following:
args = ('(' !(self.applicationArgs(finalChar=')')):args ')'
-> args
| -> [])
Looks like it could return an empty list instead of throwing an exception if the argument list is malformed.
parsley.makeGrammar("death = ('a' | ws)*", {})('a ').death() # never returns
parsley.makeGrammar("death = ('a' | (' ' | '\t' | '\n')*)*", {})('a ').death() # never returns
parsley.makeGrammar("death = ('a' | ' '*)*", {})('a ').death() # never returns
parsley.makeGrammar("death = ('a' | 'b'*)*", {})('ab').death() # never returns
parsley.makeGrammar("never_returns = ('a'*)*", {})('a').never_returns()
Here are some nearby constructions that work okay:
parsley.makeGrammar("ok = ('a' | ' ')*", {})('a ').ok() # returns ['a', ' ']
parsley.makeGrammar("ok = 'a' ws", {})('a ').ok() # returns True
parsley.makeGrammar("ok = ('a' ws)*", {})('a ').ok() # returns [True]
parsley.makeGrammar("ok = ('a' | ' ' | '\t' | '\n')*", {})('a ').ok() # returns ['a', ' ']
Here's the stack trace when I manually terminate the never returning grammar:
KeyboardInterruptTraceback (most recent call last)
<ipython-input-65-b38e7b74e28c> in <module>()
----> 1 parsley.makeGrammar("death = ('a' | ws)*", {})('a ').death()
/home/notebook/anaconda2/lib/python2.7/site-packages/parsley.pyc in invokeRule(*args, **kwargs)
83 """
84 try:
---> 85 ret, err = self._grammar.apply(name, *args)
86 except ParseError as e:
87 self._grammar.considerError(e)
/home/notebook/anaconda2/lib/python2.7/site-packages/ometa/runtime.pyc in apply(self, ruleName, *args)
460 r = getattr(self, "rule_"+ruleName, None)
461 if r is not None:
--> 462 val, err = self._apply(r, ruleName, args)
463 return val, err
464
/home/notebook/anaconda2/lib/python2.7/site-packages/ometa/runtime.pyc in _apply(self, rule, ruleName, args)
493 try:
494 memoRec = self.input.setMemo(ruleName,
--> 495 [rule(), self.input])
496 except ParseError as e:
497 e.trail.append(ruleName)
/pymeta_generated_code/pymeta_grammar__Grammar.py in rule_death(self)
20 self._trace("'\n'", (28, 31), self.input.position)
21 _G_exactly_8, lastError = self.exactly('\n')
---> 22 self.considerError(lastError, None)
23 return (_G_exactly_8, self.currentError)
24 _G_not_9, lastError = self._not(_G_not_7)
/home/notebook/anaconda2/lib/python2.7/site-packages/ometa/runtime.pyc in many(self, fn, *initial)
552 try:
553 m = self.input
--> 554 v, _ = fn()
555 ans.append(v)
556 except ParseError as err:
/pymeta_generated_code/pymeta_grammar__Grammar.py in _G_many_1()
17 return (_G_exactly_5, self.currentError)
18 def _G_or_6():
---> 19 def _G_not_7():
20 self._trace("'\n'", (28, 31), self.input.position)
21 _G_exactly_8, lastError = self.exactly('\n')
/home/notebook/anaconda2/lib/python2.7/site-packages/ometa/runtime.pyc in _or(self, fns)
596 try:
597 m = self.input
--> 598 ret, err = f()
599 errors.append(err)
600 return ret, joinErrors(errors)
/pymeta_generated_code/pymeta_grammar__Grammar.py in _G_or_2()
9 _G_exactly_1, lastError = self.exactly('//')
10 self.considerError(lastError, 'c_comment')
---> 11 def _G_consumedby_2():
12 def _G_many_3():
13 def _G_or_4():
/home/notebook/anaconda2/lib/python2.7/site-packages/ometa/runtime.pyc in exactly(self, wanted)
529 val, p, self.input = self.input.slice(len(wanted))
530 else:
--> 531 val, p = self.input.head()
532 self.input = self.input.tail()
533 if wanted == val:
/home/notebook/anaconda2/lib/python2.7/site-packages/ometa/runtime.pyc in head(self)
235 else:
236 data = self.data
--> 237 raise EOFError(data, self.position + 1)
238 return self.data[self.position], self.error
239
/home/notebook/anaconda2/lib/python2.7/site-packages/ometa/runtime.pyc in __init__(self, input, position)
113 """
114 def __init__(self, input, position):
--> 115 ParseError.__init__(self, input, position, eof())
116
117
/home/notebook/anaconda2/lib/python2.7/site-packages/ometa/runtime.pyc in eof()
125
126
--> 127 def eof():
128 """
129 Return an indication that the end of the input was reached.
KeyboardInterrupt:
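The never-returning grammars above all appear to repeat an expression that can succeed without consuming any input, so the outer repetition has no reason to stop. A hypothetical guard is sketched below in plain Python. It is not the code in ometa/runtime.py, and the helper names and return shapes are assumptions chosen for the demo.

```python
def many(match, source, pos):
    # Apply `match` repeatedly; stop on failure or on a zero-width success.
    results = []
    while True:
        outcome = match(source, pos)
        if outcome is None:
            break
        value, new_pos = outcome
        if new_pos == pos:
            # Nothing was consumed, so repeating would loop forever.
            break
        results.append(value)
        pos = new_pos
    return results, pos

def star_a(source, pos):
    # Stand-in for 'a'*: always succeeds, possibly consuming nothing.
    end = pos
    while end < len(source) and source[end] == "a":
        end += 1
    return source[pos:end], end

print(many(star_a, "a", 0))
```

Whether stopping silently or raising a grammar error is the better response to a zero-width repetition is a design choice this sketch does not settle.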
The traceback can be viewed here: https://gist.github.com/introom/6101943
The reason the problem occurs can be seen from:
https://gist.github.com/introom/6101943#file-gistfile1-txt-L8
\x00\x12_error_description**\x00\x00***, we match anything{int(\x00\x00)}, and there is some remaining data in the input, so the errors become None.
Typically, a test like this fails: https://github.com/twisted/parsley-protocols/blob/amp/parseproto/test/test_amp.py#L2181
Parsley should be able to produce a detailed trace of which expression was executed, whether it succeeded or failed, and at what input location, for an entire parse.
At the very least we need \uXXXX escapes.
The following code makes parsley take 100% CPU and slowly grow in memory for a very long time. This is an incorrect grammar because the rule "output" is referencing "one" directly AND indirectly via the rule "number". It would be better if parsley detected this and threw a helpful error.
from parsley import makeGrammar, termMaker
from itertools import chain
x = makeGrammar("""
one = '1'
two = '2'
number = (one | two)*
output = (number | one)*
""", {})
print x("1122121").output()
As per convention, you may consider moving NEWS->CHANGES http://guide.python-distribute.org/creation.html
When a rule has multiple arguments and passes those arguments to another rule, all but the last argument are wrapped in single-valued tuples. This is a concern because the tuples are unexpected and the multiple arguments are not handled consistently.
Code example:
from parsley import makeGrammar
GRAMMAR = """
a :x :y :z = b(x, y, z) -> print("a: x={} y={} z={}".format(x, y, z))
b :x :y :z = c(x, y, z) -> print("b: x={} y={} z={}".format(x, y, z))
c :x :y :z = -> print("c: x={} y={} z={}".format(x, y, z))
"""
ParserClass = makeGrammar(GRAMMAR, {})
parser = ParserClass('')
parser.a(1, 2, 3)
Output from Python 3.5.3:
c: x=((1,),) y=((2,),) z=3
b: x=(1,) y=(2,) z=3
a: x=1 y=2 z=3
I'm parsing a script into tokens, and I'd like to store the parsed results of the child rules and the slice of consumed input (like the angle-brackets shortcut). In the grammar that receives these tokens, the context will determine whether the parsed results can be used, or if the original input is needed. I'm not sure if there's a way to store both of these things at once with Parsley.
Consider the following Python code:
import parsley
single_digit = parsley.makeGrammar("integer = digit", {})
many_digits = parsley.makeGrammar("integer = digit+", {})
for grammar in (single_digit, many_digits):
try:
grammar("1x").integer()
except Exception as e:
print e
The single_digit grammar matches 1, then expects EOF but finds x instead.
The many_digits grammar matches 1, then expects another digit or EOF, but finds x instead.
I would expect both grammars to have the same, or at least similar parse errors. Instead, with Parsley 1.3 on Python 2.7, I get:
1x
^
Parse error at line 1, column 1: expected EOF. trail: []
1x
^
Parse error at line 2, column 0: expected EOF. trail: [digit]
...that is, single_digit reports a sensible error at a sensible location, but many_digits draws the caret under a character that it should have accepted, while reporting an error on line 2 of an input that contains no newline characters.
It looks like there's a lot going on here lately so I guess I shouldn't be surprised that the example program doesn't work. Using python 2.7 installed by ubuntu and pip install Parsley the example program gives:
Traceback (most recent call last):
File "parsley_stil.py", line 14, in <module>
result, error = g.stuff()
ValueError: too many values to unpack
It looks like instead of a (result,error) tuple, g.stuff() just returns result which is a 4 element list.
Tried cloning the repo and installing into a virtualenv -- same result
Tried digging into the code but too scary. Sorry.
The setup.py file still lists http://launchpad.net/parsley as the project's homepage, which means PyPI does as well. I think it should point to this GitHub page, since everything on the launchpad.net page is for 1.1 and earlier.
The launchpad.net page should also be updated to indicate that the project has moved. I nearly came away with the impression that parsley might be abandoned, because PyPI only contains a link to the old page and the old page's repository hasn't been updated in ages.
Modifying the example in the readme to call a function:
from parsley import makeGrammar
def plusone(d):
return d + 1
exampleGrammar = """
ones = '1' '1' -> plusone(1)
twos = '2' '2' -> plusone(2)
stuff = (ones | twos)+
"""
Example = makeGrammar(exampleGrammar, {})
g = Example("11221111")
result = g.stuff()
print result
Fails on master (e58c0c6) as well as tag 1.2 (7ddaa41) with
NameError: name 'plusone' is not defined
It seems like this must have worked at one time for the calculator example in the tutorial to have worked.
Full error is:
Traceback (most recent call last):
File "parsley_example.py", line 13, in <module>
result = g.stuff()
File "/home/m/q/topaz/patterns/stil/parsley/v/local/lib/python2.7/site-packages/parsley.py", line 85, in invokeRule
ret, err = self._grammar.apply(name, *args)
File "/home/m/q/topaz/patterns/stil/parsley/v/local/lib/python2.7/site-packages/ometa/runtime.py", line 454, in apply
val, err = self._apply(r, ruleName, args)
File "/home/m/q/topaz/patterns/stil/parsley/v/local/lib/python2.7/site-packages/ometa/runtime.py", line 483, in _apply
[rule(), self.input])
File "/pymeta_generated_code/pymeta_grammar__Grammar.py", line 50, in rule_stuff
File "/pymeta_generated_code/pymeta_grammar__Grammar.py", line 47, in _G_many1_9
File "/home/m/q/topaz/patterns/stil/parsley/v/local/lib/python2.7/site-packages/ometa/runtime.py", line 584, in _or
ret, err = f()
File "/pymeta_generated_code/pymeta_grammar__Grammar.py", line 39, in _G_or_10
File "/home/m/q/topaz/patterns/stil/parsley/v/local/lib/python2.7/site-packages/ometa/runtime.py", line 483, in _apply
[rule(), self.input])
File "/pymeta_generated_code/pymeta_grammar__Grammar.py", line 14, in rule_ones
File "<string>", line 1, in <module>
NameError: name 'plusone' is not defined
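The last frame of the traceback (File "<string>") suggests the action expression is evaluated on its own, outside the calling module, so plusone would presumably have to be supplied through the bindings dictionary given to makeGrammar, much as another report in this collection passes {'printit': printit}. That is an inference, not a verified fix. The plain-Python sketch below only illustrates the namespace effect, and run_action is a hypothetical helper, not part of Parsley.

```python
def plusone(d):
    return d + 1

def run_action(expression, bindings):
    # Evaluate an expression with `bindings` as its only global names.
    # A copy is used because eval adds __builtins__ to the dictionary.
    return eval(expression, dict(bindings))

try:
    run_action("plusone(1)", {})
except NameError as error:
    print("without bindings:", error)

print("with bindings:", run_action("plusone(1)", {"plusone": plusone}))
```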
I need to build a grammar from user inputs, and I was trying to do something like this
parsley.makeGrammar("d = '{test}'".format(test="TeSt"), {})("TeSt").d()
But I need {test} to be matched case-insensitively, so I tried the following:
parsley.makeGrammar("d = '(?i){test}'".format(test="TeSt"), {})("TeSt").d()
parsley.makeGrammar("d = (?i)'{test}'".format(test="TeSt"), {})("TeSt").d()
But none of them worked. Is there any way to achieve that?
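One workaround that needs no special syntax is to generate the alternatives per character before building the grammar. The helper below is a hypothetical sketch that only produces the grammar text. It assumes the word is plain alphanumeric (quotes, backslashes, and characters whose upper-case form is longer than one character are not handled), and the resulting grammar has not been run against Parsley here.

```python
def case_insensitive_pattern(word):
    # Build a sequence such as ('t' | 'T') ('e' | 'E') for each letter.
    parts = []
    for ch in word:
        lower, upper = ch.lower(), ch.upper()
        if lower == upper:
            # Digits and other caseless characters need no alternative.
            parts.append(repr(ch))
        else:
            parts.append("(%r | %r)" % (lower, upper))
    return " ".join(parts)

grammar_source = "d = <%s>" % case_insensitive_pattern("TeSt")
print(grammar_source)
```

The angle brackets are there so that the rule would yield the matched text rather than only the last character. Drop them if that is not wanted.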
In general, Parsley seems to not be very aware of Unicode. As a specific manifestation of this problem, the documentation says:
letter:
Matches a single ASCII letter.
However, looking at the source code, I think it is doing something different. In runtime.py, in the class OMetaBase, the method letter() is implemented by calling str.isalpha(). The Python documentation describes isalpha as:
Python 2.7: Depends on locale
Python 3.5: Union of several Unicode categories
In other words, unless I am misunderstanding the source, the characters matched by letter will depend on the version of Python, and in version 2.7 will depend on locale. This should be documented.
ws and digit have similar problems, btw. The ws documentation says it "matches zero or more spaces, tabs, or newlines" but the implementation in the source appears to use str.isspace(), which has similar behavior to isalpha.
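The gap between "ASCII letter" and str.isalpha() is easy to see on Python 3 with a few sample characters. The sketch below is independent of Parsley, and is_ascii_letter is a hypothetical helper showing what a strictly ASCII check could look like.

```python
import string

def is_ascii_letter(ch):
    # True only for a single character in A-Z or a-z.
    return len(ch) == 1 and ch in string.ascii_letters

samples = ["a", "Z", "\u00e9", "\u0436", "5"]
report = [(ch, ch.isalpha(), is_ascii_letter(ch)) for ch in samples]
for ch, alpha, ascii_only in report:
    print(repr(ch), "isalpha:", alpha, "ascii letter:", ascii_only)
```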
Parsley should be able to consume input from files in chunks. Each ChunkedInputStream object should refer to a buffer, and if .tail() falls off the end of the buffer, the new ChunkedInputStream object should load a new chunk.
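A rough sketch of the idea, under several assumptions: the class layout, the chunk_size parameter and the ChunkBuffer helper are all hypothetical, and head() returns only the character here, whereas the head() visible in the traceback elsewhere in this collection returns a pair. Every stream object shares one buffer, and more data is read only when a position past the end of the buffer is requested.

```python
import io

class ChunkBuffer(object):
    # Shared buffer that reads from the file only when more data is needed.
    def __init__(self, stream, chunk_size=4096):
        self.stream = stream
        self.chunk_size = chunk_size
        self.data = ""
        self.exhausted = False

    def ensure(self, position):
        # Load chunks until `position` is available or the file ends.
        while position >= len(self.data) and not self.exhausted:
            chunk = self.stream.read(self.chunk_size)
            if chunk:
                self.data += chunk
            else:
                self.exhausted = True
        return position < len(self.data)

class ChunkedInputStream(object):
    def __init__(self, buffer, position=0):
        self.buffer = buffer
        self.position = position

    def head(self):
        if not self.buffer.ensure(self.position):
            raise EOFError(self.position)
        return self.buffer.data[self.position]

    def tail(self):
        # Falling off the end of the buffer triggers the next read.
        self.buffer.ensure(self.position + 1)
        return ChunkedInputStream(self.buffer, self.position + 1)

def read_all(stream):
    # Walk the stream one character at a time until the input ends.
    characters = []
    try:
        while True:
            characters.append(stream.head())
            stream = stream.tail()
    except EOFError:
        pass
    return "".join(characters)

demo_buffer = ChunkBuffer(io.StringIO("abcdef"), chunk_size=2)
print(read_all(ChunkedInputStream(demo_buffer)))
```

This version never discards old chunks, so backtracking stays simple at the cost of keeping everything read so far in memory.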
The translated ones are like those inside ometa/_generated/*.py.
I guess translating the grammar every time might cause some extra overhead.
Parsley should support ranges. I think we said 'A'..'Z' was a reasonable syntax.
Hi, I would like to know what's happening with this project. The last tag (1.2) is from April last year and doesn't have Python 3 support, since then no proper release has been done.
Maybe if I create some tickets it will persuade me to actually send you pull requests.
Right now _GrammarWrapper eats kwargs, so you can't actually get anything to the rules without icky positional args.
ParseError defines a __eq__ method, but no __hash__ method. The default __hash__ method fails because the error attribute is a list. This breaks PyTest and Python's traceback.format_exception_only function.
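The underlying Python 3 behavior can be reproduced without Parsley: a class that defines __eq__ but not __hash__ gets its __hash__ set to None, so instances cannot be added to a set. The classes below (`FakeParseError`, `HashableParseError`) are hypothetical stand-ins for illustration, not Parsley's real ParseError.

```python
class FakeParseError(Exception):
    """Stand-in exception that defines __eq__ only."""

    def __init__(self, position, error):
        super().__init__(position, error)
        self.position = position
        self.error = error  # a list, as described in the report

    def __eq__(self, other):
        return (
            isinstance(other, FakeParseError)
            and self.position == other.position
            and self.error == other.error
        )


class HashableParseError(FakeParseError):
    """Same equality, plus a hash built from immutable copies of the fields."""

    def __hash__(self):
        return hash((self.position, tuple(self.error)))


try:
    {FakeParseError(0, ["expected 'a'"])}
except TypeError as exc:
    print("unhashable:", exc)

# With __hash__ defined, equal errors collapse to one set member.
print(len({HashableParseError(0, ["x"]), HashableParseError(0, ["x"])}))
```

Equal objects must hash equally, which is why the sketch hashes the same fields that __eq__ compares.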
Discovery
During testing of a new grammar, PyTest reported an internal error when Parsley raised a ParseError.
INTERNALERROR> File "/home/joel/.virtualenvs/meetup2xibo/lib/python3.5/site-packages/_pytest/_code/code.py", line 481, in exconly
INTERNALERROR> lines = format_exception_only(self.type, self.value)
INTERNALERROR> File "/usr/lib/python3.5/traceback.py", line 136, in format_exception_only
INTERNALERROR> return list(TracebackException(etype, value, None).format_exception_only())
INTERNALERROR> File "/usr/lib/python3.5/traceback.py", line 439, in __init__
INTERNALERROR> _seen.add(exc_value)
INTERNALERROR> TypeError: unhashable type: 'ParseError'
The TypeError is raised by the Python standard library function traceback.format_exception_only.
A Simple Test
The following code demonstrates the use of traceback.format_exception_only and raises the TypeError without involving PyTest.
from parsley import makeGrammar, ParseError
import sys
import traceback

def format_exception():
    (last_type, last_value, last_traceback) = sys.exc_info()
    return traceback.format_exception_only(last_type, last_value)

def parse(text):
    parser = makeGrammar("foo = 'a'", {})
    try:
        return parser(text).foo()
    except ParseError:
        return format_exception()

def divide(x, y):
    try:
        return x / y
    except ZeroDivisionError:
        return format_exception()

def test():
    print(divide(6, 2))
    print(divide(6, 0))
    print(parse('a'))
    print(parse('b'))

test()
Running the test code with Python 3.5 gives the following results:
traceback.format_exception_only fails to format the error.
$ python foo.py
3.0
['ZeroDivisionError: division by zero\n']
a
Traceback (most recent call last):
File "foo.py", line 28, in <module>
test()
File "foo.py", line 26, in test
print(parse('b'))
File "foo.py", line 14, in parse
return format_exception()
File "foo.py", line 7, in format_exception
return traceback.format_exception_only(last_type, last_value)
File "/usr/lib/python3.5/traceback.py", line 136, in format_exception_only
return list(TracebackException(etype, value, None).format_exception_only())
File "/usr/lib/python3.5/traceback.py", line 439, in __init__
_seen.add(exc_value)
TypeError: unhashable type: 'ParseError'
Workaround
The following code monkey patches ParseError to add a __hash__ method.
def parse_error_hash(self):
    """Define missing ParseError.__hash__()."""
    return hash((self.position, self.formatReason()))

ParseError.__hash__ = parse_error_hash
Rerunning the test code with the monkey patch gives successful results.
$ python foo.py
3.0
['ZeroDivisionError: division by zero\n']
a
["ometa.runtime.ParseError: \nb\n^\nParse error at line 1, column 0: expected the character 'a'. trail: [foo]\n\n"]
Function names should be lowercase, with words separated by underscores as necessary to improve readability.
PEP 8.