vlasovskikh / funcparserlib
Recursive descent parsing library for Python based on functional combinators
Home Page: https://funcparserlib.pirx.ru
License: MIT License
LL(k) parsing is O(N) while LL(*) is O(2^N).
Original issue reported on code.google.com by andrey.vlasovskikh
on 26 May 2011 at 10:53
Usually text positions are 1-based. A text starts with line 1, position 1.
The lexer from the current trunk uses 1-based lines, but 0-based positions.
This fact creates confusion in error messages.
Original issue reported on code.google.com by andrey.vlasovskikh
on 14 Mar 2010 at 10:11
One may want to just see their parse tree before writing the AST-building
"semantic" functions (for `>>`, a.k.a. `fmap`).
A parse tree is a list structure (in the Lisp sense) of tokens.
Original issue reported on code.google.com by andrey.vlasovskikh
on 6 Oct 2009 at 9:01
What steps will reproduce the problem?
>>> from funcparserlib.lexer import *
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
AttributeError: 'module' object has no attribute 'tokenize'
`tokenize` is commented out in `lexer.py`, but it is still present in its `__all__`.
Funcparserlib 0.3.3
Original issue reported on code.google.com by s.astanin
on 9 Sep 2009 at 7:37
I'm not sure how difficult this would be
but it would be nice to be able to use
funcparserlib along with RPython.
Original issue reported on code.google.com by [email protected]
on 25 Jul 2013 at 12:25
Such as:
/-- []
|-- 0
| `-- 1
|-- 1
| `-- None
|-- 2
| `-- {}
| |-- a
| | `-- True
| `-- c
| `-- []
|-- 3
| `-- foo
`-- 4
`-- []
|-- 0
| `-- 4
`-- 1
`-- 5
Original issue reported on code.google.com by andrey.vlasovskikh
on 25 Jul 2009 at 11:27
Maybe it will run faster.
Original issue reported on code.google.com by andrey.vlasovskikh
on 26 May 2011 at 10:45
What steps will reproduce the problem?
1. Try to use with Python 3
What is the expected output? What do you see instead?
Lots of syntax errors -- u'...' is now a syntax error, as is 'except Foo, e'.
OK, so I'm actually willing to do the work on this I think, but what I want to
know is how concerned you are with earlier Python versions. For instance, it's
possible to 'from __future__ import unicode_literals' and then change u'...' to
'...', but only starting with Python 2.6. Are you concerned with supporting
versions of Python earlier than that?
Original issue reported on code.google.com by [email protected]
on 20 Nov 2012 at 4:29
Several people have reported writing parsers that entered an infinite loop
and ran forever. The problem is documented in the FAQ [1].
Find some way to prevent this at runtime (by raising an
exception?), i.e. prevent passing a (standard?) universally successful
parser to the `many` combinator.
[1]: http://archlinux.folding-maps.org/2009/funcparserlib/FAQ
Original issue reported on code.google.com by andrey.vlasovskikh
on 6 Oct 2009 at 8:45
[deleted issue]
Consider the code
from funcparserlib.parser import a
p = a('a') + (a('b') | a('c'))
p.parse("ad")
The parse failure produces a stack of exceptions that ends with "funcparserlib.parser.NoParseError: got unexpected token: d". This message shows what was unexpected, but not what was expected. A better message would be something like "got unexpected token d; expected b or c". Such messages could be built using the name
attribute of parsers.
From the tutorial:
See how composition works. We compose a parser some(...) of type Parser(Token, Token) with the function tokval and we get a value of type Parser again, but this time it is Parser(Token, str). Let's put it this way: the set of parsers is closed under the application of >> to a parser and a function of type a -> b.
As a functional programmer, I find this confusing, since this is an instance of fmapping, not function composition. The tutorial also uses "composition" to refer to composing two parsers:
We should be careful and compose parsers using | so that they don't conflict with each other:
To me, the first quote would make more sense if it read:
See how fmapping works. We fmap the function tokval with a parser some(...) of type Parser(Token, Token) and we get a value of type Parser again, but this time it is Parser(Token, str). Let's put it this way: the set of parsers is closed under the application of >> to a parser and a function of type a -> b.
Of course, if someone does not know about functors, this might be confusing.
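For readers unfamiliar with functors, the distinction can be shown with a minimal sketch (not funcparserlib itself): model a parser as a function from an input string to a (value, rest-of-input) pair, and fmap as applying a plain function to the parsed value only.

```python
# A minimal sketch of fmap for parsers modeled as plain functions;
# `item` and `fmap` here are illustrations, not funcparserlib API.
def fmap(f, p):
    def q(s):
        value, rest = p(s)
        return f(value), rest  # transform only the value, keep the rest
    return q

def item(s):
    # Parses exactly one character.
    if not s:
        raise ValueError("no input")
    return s[0], s[1:]

upper_item = fmap(str.upper, item)  # still a parser, now of upper-cased chars
```

Composition of two parsers (sequencing or alternation) combines parsers into parsers; fmapping combines a parser with an ordinary function, which is what `>>` does.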
The doc folder contains multiple files with the license "Creative Commons Attribution-Noncommercial-Share Alike 3.0", which makes those files non-free from a distribution point of view and therefore makes shipping them with a Linux distribution harder.
Would it be possible to relicense those files or drop them altogether to avoid these licensing issues?
Consider the following simple and fictive example:
from funcparserlib.parser import many, some, finished
code = "aa"
p1 = some(lambda x: x == 'a')
p2 = many(p1)
p_or = p1 | p2
parsed_tokens = (p_or + finished).parse("aa")
# Error, since p_or short-circuits to p1 and leaves input unconsumed!
I encounter the case when writing a toy parser for a modeling language.
I'm not really familiar with the parsing theory, so I was wondering if this is the intended behavior for a parser combinator?
If it is not, how may I fix the `__or__` method of the `Parser` class (maybe even more)? I thought about propagating the exceptions back to the `__or__` method, or maybe you would have a better idea?
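The behavior described above is consistent with PEG-style ordered choice. A minimal model (an assumption about how `|` behaves, not funcparserlib's actual code) makes the short-circuit visible: a parser is a function `(tokens, pos) -> (value, new_pos)` that raises on failure.

```python
# Minimal model of ordered choice; these combinators are illustrative
# stand-ins, not funcparserlib's implementation.
def char(c):
    def p(tokens, pos):
        if pos < len(tokens) and tokens[pos] == c:
            return c, pos + 1
        raise ValueError("expected %r at %d" % (c, pos))
    return p

def many(p):
    def q(tokens, pos):
        results = []
        while True:
            try:
                value, pos = p(tokens, pos)
            except ValueError:
                return results, pos  # stop on first failure
            results.append(value)
    return q

def alt(p1, p2):
    # Ordered choice: commit to p1 as soon as it succeeds, even if
    # choosing p2 would have let the overall parse consume more input.
    def q(tokens, pos):
        try:
            return p1(tokens, pos)
        except ValueError:
            return p2(tokens, pos)
    return q

p1 = char('a')
p_or = alt(p1, many(p1))
```

Here `p_or("aa", 0)` stops after one `'a'` at position 1, which is why demanding end-of-input afterwards fails; reordering the alternatives as `alt(many(p1), p1)` consumes the whole input. Under ordered-choice semantics, putting the longer alternative first is the usual fix.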
nose is faster than my scripts and it gives nice overall stats, etc.
Original issue reported on code.google.com by andrey.vlasovskikh
on 26 May 2011 at 11:28
What steps will reproduce the problem?
1. Give invalid data to parsers
2. Catch exceptions from funcparserlib
3. The caught exception does not include parsing error information
What is the expected output? What do you see instead?
a) An instance of LexerError does not have an exception message,
so I cannot get the error message via str(e).
b) An instance of NoParseError does not have a 'place' attribute;
the error position is included only in the exception message,
so I cannot get the error position directly.
(LexerError has a 'place' attribute.)
Original issue reported on code.google.com by i.tkomiya
on 15 May 2011 at 6:10
funcparserlib should support both Python 2 and 3.
Original issue reported on code.google.com by andrey.vlasovskikh
on 22 Nov 2011 at 8:22
from funcparserlib.lexer import make_tokenizer
from funcparserlib.parser import some
tokenize = make_tokenizer([
(u'x', (ur'x',)),
])
some(lambda t: t.type == "x").parse(tokenize("x"))
results in
Traceback (most recent call last):
File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/tests/test_parsing.py", line 76, in test_tokenize
some(lambda t: t.type == "x").parse(tokenize("x"))
File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/parser.py", line 121, in parse
(tree, _) = self.run(tokens, State())
File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/parser.py", line 309, in _some
if s.pos >= len(tokens):
TypeError: object of type 'generator' has no len()
`tokenize("x")` is a generator, and you can't call `len` on a generator.
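A workaround on the caller's side (and arguably what `parse` could do internally) is to materialize the generator before indexing into it; a stand-in sketch, where `gen_tokens` plays the role of a tokenizer returned by `make_tokenizer`:

```python
# `gen_tokens` is a stand-in for a tokenizer that yields tokens lazily.
def gen_tokens(s):
    for ch in s:
        yield ch

gen = gen_tokens("x")
# len(gen) would raise TypeError: object of type 'generator' has no len()

tokens = list(gen_tokens("x"))  # materialize into a list first
# len(tokens) and tokens[pos] now both work as parse() expects
```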
tokenize = make_tokenizer([
(u'x', (br'\xff\n',)),
])
tokens = list(tokenize(b"\xff\n"))
throws
File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/tests/test_parsing.py", line 76, in test_tokenize_bytes
tokens = list(tokenize(b"\xff\n"))
File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/lexer.py", line 107, in f
t = match_specs(compiled, str, i, (line, pos))
File "/Users/gsnedders/Documents/other-projects/funcparserlib/funcparserlib/funcparserlib/lexer.py", line 91, in match_specs
nls = value.count(u'\n')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)
`match_specs` needs to handle both unicode and bytes line feed characters.
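A hedged sketch of the kind of fix this suggests: choose a newline literal of the same type as the matched value, so a bytes value is never implicitly decoded with the ascii codec (`count_newlines` is a hypothetical helper, not funcparserlib API):

```python
def count_newlines(value):
    # Use a newline literal of the same type as the input, so that
    # bytes input never triggers an implicit ascii decode.
    nl = b'\n' if isinstance(value, bytes) else '\n'
    return value.count(nl)
```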
What steps will reproduce the problem?
1. Try to evaluate "3**3**3" in the tutorial calculator parser in
https://bitbucket.org/vlasovskikh/funcparserlib/src/16ed98522a11620d10f5c3a9d36382e2e0931c59/doc/Tutorial.md?at=0.3.x
2. Try to evaluate "3**3**3" in Python.
What is the expected output? What do you see instead?
3**3**3 should be 3**(3**3) == 7625597484987, but when treated as
left-associative, it is (3**3)**3 == 3**(3*3) == 3**9 == 19683
What version of the product are you using? On what operating system?
0.3.6
Please provide any additional information below.
The following code will make the ** operator right-associative.
def eval_expr_r(lst, z):
return reduce(lambda s, (x, f): f(x, s), reversed(lst), z)
eval_r = unarg(eval_expr_r)
factor = many(primary + pow) + primary >> eval_r
Or simply design the grammar like this:
@with_forward_decls
def factor():
return (
primary + pow + factor >> (lambda (x,f,y): f(x,y))
| primary
)
Original issue reported on code.google.com by [email protected]
on 19 Aug 2013 at 4:55
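The workaround above relies on Python 2 tuple parameters in lambdas, which are gone in Python 3. A Python 3-compatible sketch of the same right fold, with `operator.pow` standing in for the parsed operator function:

```python
from functools import reduce
import operator

def eval_expr_r(lst, z):
    # Fold from the right: [(x1, f1), (x2, f2)], z -> f1(x1, f2(x2, z))
    return reduce(lambda s, xf: xf[1](xf[0], s), reversed(lst), z)

# 3 ** (3 ** 3): right-associative, as ** should be
eval_expr_r([(3, operator.pow), (3, operator.pow)], 3)  # 7625597484987
```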
In the documentation you introduce an example of a JSON parser; this format is intrinsically recursive at the grammar level. What if my grammar is recursive just like JSON's, but its parsers are spread across several modules, so that I have circular module imports? Is there any built-in way to handle this?
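funcparserlib's forward declarations handle recursion within one module; across modules, one hedged approach (a hand-rolled sketch of the same idea, not a built-in feature) is to create a placeholder in a common module, let the other modules refer to it, and define it only after all imports have run:

```python
# Sketch of the forward-declaration idea: a placeholder "parser" that
# is defined after the mutually recursive rules have been set up.
class Forward:
    def __init__(self):
        self._p = None

    def define(self, p):
        self._p = p

    def __call__(self, s):
        return self._p(s)

value = Forward()                      # shared placeholder, e.g. in a common module
array = lambda s: ('array', value(s))  # another module can use value before it's defined
value.define(lambda s: ('value', s))   # tie the knot once everything is imported
```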
It is needed.
Original issue reported on code.google.com by andrey.vlasovskikh
on 24 Jul 2009 at 7:49
from funcparserlib.parser import a, many
A = a('A')
B = a('B')
x = many(A + B) + A + A
print x.parse('ABABAA')
This raises a NoParseError("no tokens left in stream") despite being passed a
valid string.
Looking into it a bit, I found that many() is consuming the penultimate A in
the string even though it wasn't matched. This is because many returns the
state from the exception raised when (A + B) failed to parse "AA" - however the
first "A" has been consumed at this point as it was a valid first token for
the failed parse.
Returning the state after the last successful parse fixed it for me:
diff -r 82b1066c6c18 src/funcparserlib/parser.py
--- a/src/funcparserlib/parser.py Sat Aug 06 15:03:56 2011 +0400
+++ b/src/funcparserlib/parser.py Thu Nov 17 13:01:21 2011 +0000
@@ -323,7 +323,7 @@
v, s = self.p(tokens, s)
res.append(v)
except _NoParseError, e:
- return res, e.state
+ return res, s
def ebnf(self):
return u'{ %s }' % self.p
Original issue reported on code.google.com by [email protected]
on 17 Nov 2011 at 1:06
What steps will reproduce the problem?
1. "make unittest" fails.
$ LANG=C make test
/usr/bin/python -m unittest discover tests
FE
======================================================================
ERROR: test_many_backtracking (test_parsing.ParsingTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/kohei/devel/debpkg/funcparserlib/funcparserlib-0.3.5+hg~153/tests/test_parsing.py", line 15, in test_many_backtracking
self.assertEqual(expr.parse(u'xyxyxx'),
File "/usr/lib/python2.7/dist-packages/funcparserlib/parser.py", line 124, in parse
raise NoParseError(u'%s: %s' % (e.msg, tok))
NoParseError: no tokens left in the stream: <EOF>
======================================================================
FAIL: test_error_info (test_parsing.ParsingTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/kohei/devel/debpkg/funcparserlib/funcparserlib-0.3.5+hg~153/tests/test_parsing.py", line 30, in test_error_info
u'cannot tokenize data: 1,6: "f is \u0444"')
AssertionError: u'cannot tokenize data: 1,5: "f is \u0444"' != u'cannot
tokenize data: 1,6: "f is \u0444"'
- cannot tokenize data: 1,5: "f is \u0444"
? ^
+ cannot tokenize data: 1,6: "f is \u0444"
? ^
----------------------------------------------------------------------
Ran 2 tests in 0.002s
FAILED (failures=1, errors=1)
make: *** [unittest] Error 1
What is the expected output? What do you see instead?
What version of the product are you using? On what operating system?
funcparserlib branch 0.3.x revision 153
Debian GNU/Linux Sid
Python 2.7.2+
Please provide any additional information below.
make examples are successful.
$ LANG=C make examples
/usr/bin/python -m unittest discover examples
......................
----------------------------------------------------------------------
Ran 22 tests in 0.026s
OK
Original issue reported on code.google.com by [email protected]
on 5 Dec 2011 at 3:15
Sphinx is nice.
Original issue reported on code.google.com by andrey.vlasovskikh
on 26 May 2011 at 11:22
Over in hylang/hy#2026, we're hitting a bug that was fixed 6 years ago, because the last release of funcparserlib on PyPI is nearly 8 years old.
The Nathan Sanders blog is gone. I found it in the Wayback Machine, though. Could you replace the link in your README?
https://web.archive.org/web/20120507001413/http://sandersn.com/blog/?tag=/monads
It could be something like `Maybe a` or `Either a Error` in Haskell. Maybe
analyzing return values will perform better than exceptions while keeping
the code clean.
Original issue reported on code.google.com by andrey.vlasovskikh
on 6 Oct 2009 at 8:54
Try to write an LL(k) grammar detection function (a *non*-LL(k) detection
function with counter-examples would be even nicer to have).
A warning should be user-friendly and helpful for optimizing the grammar.
Original issue reported on code.google.com by andrey.vlasovskikh
on 26 May 2011 at 10:59
Setup.py install
* does not create necessary installation directories in prefix path
* requires write access to system-wide /usr/lib/python2.5/site-packages
even when installing to a user-writable prefix
* is likely to interfere with package manager (due to writing to /usr/lib)
Prefix directory is user-writable (/usr/local/stow). Usually I install
manually built packages there and deploy them with GNU stow. It works for
most of the Python packages I have built this way. It does not work for
funcparserlib-0.3.3.
Installation log:
funcparserlib-0.3.3$ python setup.py build
...
funcparserlib-0.3.3$ python setup.py install
--prefix=/usr/local/stow/funcparserlib-0.3.3
running install
Checking .pth file support in
/usr/local/stow/funcparserlib-0.3.3/lib/python2.5/site-packages/
error: can't create or remove files in install directory
... [ long explanations follow and suggest to create target directory ]
If I create
/usr/local/stow/funcparserlib-0.3.3/lib/python2.5/site-packages/ manually,
install terminates with an error reporting that prefix is not on PYTHONPATH.
If I define PYTHONPATH, install still terminates with:
error: could not create 'build/bdist.linux-i686/egg': Permission denied
Then I install it like (I don't like to do it under sudo, but it seems it
requires write access to /usr/lib anyway):
funcparserlib-0.3.3$ sudo
PYTHONPATH=/usr/local/stow/funcparserlib-0.3.3/lib/python2.5/site-packages/
python setup.py install --prefix=/usr/local/stow/funcparserlib-0.3.3
running install
... [ skipping ]
creating build/bdist.linux-i686/egg
I think `setup.py install --prefix=anyprefix` should not write anywhere
outside of `anyprefix`. I think it should create all necessary directories
automatically.
Debian/Lenny, Python 2.5.2, Setuptools 0.6c8-4
Original issue reported on code.google.com by s.astanin
on 8 Sep 2009 at 12:09
What steps will reproduce the problem?
1. Run program using funcparserlib-0.3.4 under python 2.4
What is the expected output? What do you see instead?
raises SyntaxError in funcparserlib/util.py (line 38)
What version of the product are you using? On what operating system?
funcparserlib-0.3.4
python 2.4
Please provide any additional information below.
I made patch for this problem.
This error is caused by the conditional expression (ternary operator), which is only supported from Python 2.5 onwards.
Original issue reported on code.google.com by i.tkomiya
on 8 Jan 2011 at 3:13
Possible techniques: predictive parsing, memoization, parallel computations
(only linear performance improvements though).
Original issue reported on code.google.com by andrey.vlasovskikh
on 26 May 2011 at 11:03
I'd like to have a parser similar to `maybe()`, but one that does not return anything (like `skip()`) if no match is found.
Here is an example:
def optional(p):
return p | (pure(None) >> _Ignored)
p = a('x') + optional(a('y')) + a('z')
print(p.parse('xyz')) # --> ('x', 'y', 'z')
print(p.parse('xz')) # --> ('x', 'z')
The issue here is that `_Ignored` is not in the public interface.
Is it possible to write `optional()` by using only the public interface of funcparserlib?
Hi!
I'm getting an error at line 47: `return reduce(lambda s, (f, x): f(s, x), list, z)`.
It complains about the `(` before `f`.
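This is Python 3 removing tuple parameters in function definitions (PEP 3113), so `lambda s, (f, x): ...` is now a SyntaxError. The lambda can be rewritten with explicit indexing; a sketch with `operator` functions standing in for the parsed operators:

```python
from functools import reduce
import operator

# `lambda s, (f, x): f(s, x)` is a SyntaxError on Python 3 (PEP 3113);
# index into the (f, x) pair instead:
def eval_expr(z, rest):
    return reduce(lambda s, fx: fx[0](s, fx[1]), rest, z)

eval_expr(1, [(operator.add, 2), (operator.mul, 3)])  # (1 + 2) * 3 == 9
```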
E. g. introduce a class for it in order to set options via named arguments
of the constructor.
Original issue reported on code.google.com by andrey.vlasovskikh
on 6 Oct 2009 at 8:38
As of now, they're making people think hard.
Original issue reported on code.google.com by andrey.vlasovskikh
on 26 May 2011 at 11:24
The documentation should be written in Sphinx and should look like all other
Python docs.
Original issue reported on code.google.com by andrey.vlasovskikh
on 26 May 2011 at 11:32
The new API should be clean and convenient for builtin token types and should
allow using custom token types.
Original issue reported on code.google.com by andrey.vlasovskikh
on 26 May 2011 at 11:08
Tests for JSON parser fail with false left recursion `GrammarError`.
Original issue reported on code.google.com by andrey.vlasovskikh
on 27 May 2011 at 8:19
I've ended up with something like:
header = some(lambda tok: tok.type == "HEADER")
data = some(lambda tok: tok.type == "DATA")
empty_line = some(lambda tok: tok.type == "EMPTY")
body = many(data | empty_line)
segment = header + body
segments = segment + many(skip(empty_line) + segment)
This ends up with an unexpected token error for the second HEADER token with a token stream like HEADER BODY EMPTY HEADER BODY, as the EMPTY gets consumed by `body` and hence cannot be consumed by `segments`.
In Haskell I'd solve this with something like `body = data <|> (try (do { empty_line; notFollowedBy header }))`. As far as I can tell, there's nothing comparable to `try` or `notFollowedBy`. Is there any sensible way to define such a grammar?
`some` is potentially slower than `tok` and not acceptable for the grammar
class analysis.
Original issue reported on code.google.com by andrey.vlasovskikh
on 26 May 2011 at 10:49
This optimization possibly has bugs (i.e. `_Alt` is incorrectly optimized).
Tests are needed.
Original issue reported on code.google.com by andrey.vlasovskikh
on 26 May 2011 at 11:17
I tried to make funcparserlib compatible with both Python 2 and 3:
https://code.google.com/r/aodagx-funcparserlib/source/detail?r=632313a710c3eb89831478ae717dae5f3d576375&name=py3
Please merge that.
Original issue reported on code.google.com by [email protected]
on 20 Apr 2013 at 8:08
The funcparserlib Tutorial is a large introduction to the library. One may
want to just start using things, not reading docs. So a much shorter howto
for such a person is needed.
Parsing and counting matched brackets could be used as an example.
Original issue reported on code.google.com by andrey.vlasovskikh
on 23 Jul 2009 at 8:21
Current tutorials are too big and outdated.
To use as examples: nested brackets, Lisp S-exprs, and JSON.
Format: rst.
Original issue reported on code.google.com by andrey.vlasovskikh
on 26 May 2011 at 11:30
For example, instead of:
from funcparserlib.lexer import make_tokenizer, Token
from funcparserlib.parser import tok, many, fwd
allow this:
from funcparserlib import make_tokenizer, Token, tok, many, fwd
Original issue reported on code.google.com by andrey.vlasovskikh
on 26 May 2011 at 11:19
What steps will reproduce the problem?
1. read README, setup.py
Original issue reported on code.google.com by [email protected]
on 10 Oct 2009 at 7:23
As of now, all links pointing to
http://archlinux.folding-maps.org/2009/funcparserlib/ are unreachable - and
that's most of docs. Can you please consider hosting all docs on the main
project site?
Original issue reported on code.google.com by [email protected]
on 13 Feb 2013 at 7:10
It would be nice to have a library of typical parsers, such as a parser of
int or float literals, escaped strings etc.
Typical tokenizer specs could be also useful.
Original issue reported on code.google.com by andrey.vlasovskikh
on 6 Oct 2009 at 9:03
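Such reusable tokenizer specs might look like the following sketch, using the `(name, (regexp,))` shape that `make_tokenizer` specs take; the spec names and regexps here are assumptions for illustration, not shipped funcparserlib definitions:

```python
import re

# Hypothetical reusable number-literal specs in make_tokenizer's
# (name, (regexp,)) shape; floats come first so "3.14" isn't split.
NUMBER_SPECS = [
    ('float', (r'-?\d+\.\d+([eE][+-]?\d+)?',)),
    ('int', (r'-?\d+',)),
]

def classify(text):
    # Return the name of the first spec whose regexp matches all of text.
    for name, (pattern,) in NUMBER_SPECS:
        m = re.match(pattern, text)
        if m and m.group(0) == text:
            return name
    return None
```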
Example:
x = a('x')
nonhalting = fwd()
nonhalting.define(maybe(x) + nonhalting + x)
assert non_halting(nonhalting)
Original issue reported on code.google.com by andrey.vlasovskikh
on 26 May 2011 at 11:13