swaroopch / edn_format
EDN reader and writer implementation in Python, using PLY (lex, yacc)
Home Page: https://swaroopch.com/2012/12/24/edn-format-python/
License: Other
edn_format.loads("()")
should parse to ()
but I get "Syntax error! LexToken(LIST_END,')',1,1)"
Would you consider a PR to load an edn form as a standard Python dict with string keys?
Could you push the latest up to PyPI? Thanks!
It'd be nice if there were a way to pretty-print EDN, in the same way that the json module supports pretty-printing.
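For reference, the json behaviour alluded to here is the indent parameter of json.dumps. The snippet below only shows the standard-library side; an equivalent option on edn_format.dumps is what this issue asks for and is not assumed to exist.

```python
import json

data = {"name": "Amber", "tags": ["a", "b"]}

# Compact output: everything on a single line.
compact = json.dumps(data)

# Pretty-printed output: one key or element per line, nested by two spaces.
pretty = json.dumps(data, indent=2, sort_keys=True)
print(pretty)
```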
edn allows * in symbols, but this lib skips them. Note I didn't say that it raises an error, which is a different problem with this lib. ;)
Example:
>>> edn_format.loads('^:test []')
Traceback (most recent call last):
File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-4-f8dc7704d4e7>", line 1, in <module>
edn.loads('^:test []')
File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/edn_format/edn_parse.py", line 160, in parse
return p.parse(text, lexer=lex())
File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/ply/yacc.py", line 333, in parse
return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/ply/yacc.py", line 1063, in parseopt_notrack
lookahead = get_token() # Get the next token
File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/ply/lex.py", line 386, in token
newtok = self.lexerrorf(tok)
File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/edn_format/edn_lex.py", line 254, in t_error
c=t.value[0], p=t.lexpos, a=t.value[0:100]))
File "<string>", line unknown
SyntaxError: Illegal character '^' with lexpos 0 in the area of ...^:test []...
The character values (like \space) are parsed as strings, which is how they are normally represented in Python code anyway. However, there are two issues (in a broad sense) with your current implementation (PyPI version).
The first issue is that a lot of character literals are simply unsupported; more specifically (compared with the default edn implementation in Clojure, clojure.edn/read-string):
In general, the Clojure edn parser seems to behave as follows upon encountering a backslash:
The second issue is that there is no way to emit character values with dumps().
A proposed solution would be to provide a corresponding class for that specific purpose (named, for example, Char); alternatively, it might be used for reading as well, to keep type information (in this case, the class should probably extend str to keep backward compatibility).
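A minimal sketch of what such a class could look like, assuming it extends str as suggested above. Char and dump_char are hypothetical names used for illustration and are not part of edn_format; only the four named characters from the edn spec are handled.

```python
class Char(str):
    """Hypothetical marker type: a one-character string that is an EDN character."""

    def __new__(cls, value):
        if len(value) != 1:
            raise ValueError("Char expects exactly one character")
        return super().__new__(cls, value)

    def __repr__(self):
        return "Char({})".format(str.__repr__(self))


# Characters that EDN spells out by name instead of writing them raw.
_NAMED_CHARS = {"\n": "newline", "\r": "return", " ": "space", "\t": "tab"}


def dump_char(char):
    # Emit a backslash followed by either the name or the character itself.
    return "\\" + _NAMED_CHARS.get(char, str(char))


print(dump_char(Char("a")))
print(dump_char(Char(" ")))
```

Because Char is still a str, code that compares parsed characters with plain strings would keep working under this sketch.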
/, *, ?, and % do not parse as Symbol
Relatively minor bug, but things like #inst "2014" (turns into Jan 1, 2014, 00:00:00.00) are valid edn.
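A rough sketch of how truncated timestamps could be read, assuming missing fields default to the start of the period as in the example above. parse_partial_inst is a hypothetical helper, not an edn_format function, and it deliberately ignores fractional seconds and UTC offsets.

```python
import re
from datetime import datetime, timezone

# Year is mandatory; every later field is optional but only in order.
_PARTIAL_INST = re.compile(
    r"^(\d{4})(?:-(\d{2})(?:-(\d{2})"
    r"(?:T(\d{2})(?::(\d{2})(?::(\d{2}))?)?)?)?)?$"
)


def parse_partial_inst(text):
    match = _PARTIAL_INST.match(text)
    if match is None:
        raise ValueError("Unsupported #inst value: {!r}".format(text))
    year, month, day, hour, minute, second = match.groups()
    # Missing date fields default to 1, missing time fields default to 0.
    return datetime(
        int(year), int(month or 1), int(day or 1),
        int(hour or 0), int(minute or 0), int(second or 0),
        tzinfo=timezone.utc,
    )


print(parse_partial_inst("2014"))
```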
>>> import edn_format
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.3/site-packages/edn_format/__init__.py", line 2, in <module>
from .edn_lex import Keyword, Symbol
File "/usr/local/lib/python3.3/site-packages/edn_format/edn_lex.py", line 157
print "Illegal character '%s'" % t.value[0]
^
SyntaxError: invalid syntax
Hex notation, e.g. 0xFFDC73, causes a traceback.
Sample EDN:
{"AMBER" {:id "AMBER" :name "Amber 80%" :color 0xFFDC73}}
pip details:
"edn-format": {
"hashes": [
"sha256:aa3cc3041497b7e22386f96c44658e589597d9d26df39c4a77bbac0c8091ea88"
],
"index": "pypi",
"version": "==0.6.3"
}
return edn_format.loads(edn_str)
File "/usr/local/lib/python3.6/site-packages/edn_format/edn_parse.py", line 161, in parse
return p.parse(text, lexer=lex())
File "/usr/local/lib/python3.6/site-packages/ply/yacc.py", line 333, in parse
return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
File "/usr/local/lib/python3.6/site-packages/ply/yacc.py", line 1120, in parseopt_notrack
p.callable(pslice)
File "/usr/local/lib/python3.6/site-packages/edn_format/edn_parse.py", line 78, in p_map
raise EDNDecodeError('Even number of terms required for map')
edn_format.exceptions.EDNDecodeError: Even number of terms required for map
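Whether the library should accept hex literals at all is a decision for the maintainers. If it did, the conversion itself is small. The sketch below uses a hypothetical parse_hex_int helper and assumes the lexer has already isolated the token.

```python
import re

# Optional sign, 0x or 0X prefix, then at least one hex digit.
_HEX_INT = re.compile(r"^[+-]?0[xX][0-9a-fA-F]+$")


def parse_hex_int(token):
    if not _HEX_INT.match(token):
        raise ValueError("Not a hex integer: {!r}".format(token))
    # int() with base 16 accepts the sign and the 0x prefix.
    return int(token, 16)


print(parse_hex_int("0xFFDC73"))
```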
Now, emitting mutable values may or may not be an issue depending on the point of view, but since you went to the trouble of implementing immutable dicts, you probably want them to be completely immutable.
data = edn_format.loads('{5 [1 2 3]}')
print( type(data) ) # <class 'edn_format.immutable_dict.ImmutableDict'>
data[5].append(8)
print(data) # {5: [1, 2, 3, 8]}
...Also, Python lists are unhashable, so both "{[1 2 3] 5}" and "#{[1 2 3] 5}" generate TypeError.
A proposed solution is to add a corresponding class (named, for example, ImmutableVector), implemented in a similar fashion to ImmutableDict; however, whether it extends list or not (it would make sense to implement it using tuples), it would break code that mutates parsed vectors.
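A minimal sketch of the tuple-based variant, using the ImmutableVector name suggested above. This is an illustration of the idea, not the class that the library ships.

```python
class ImmutableVector(tuple):
    """Hypothetical tuple-backed stand-in for a parsed EDN vector."""

    def __repr__(self):
        return "ImmutableVector({})".format(list(self))


vector = ImmutableVector([1, 2, 3])

# Tuples are hashable, so the vector can be a dict key or a set member.
lookup = {vector: 5}
members = {vector, 5}

# There is no append(), so the mutation shown earlier is no longer possible.
print(hasattr(vector, "append"))
```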
set([1,2,3,edn_format.Symbol("a")]) == set([1,2,3,edn_format.Symbol("a")]) yields False.
Yet edn_format.Symbol("a") == edn_format.Symbol("a") yields True.
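These two results are what one would expect from a class that defines equality but hashes by identity, which is likely the cause here. The sketch below uses a stand-in Symbol class (not the library's own) to show equality and hashing kept consistent; the _name attribute is an assumption.

```python
class Symbol:
    """Stand-in for illustration only; not edn_format.Symbol."""

    def __init__(self, name):
        self._name = name

    def __eq__(self, other):
        return isinstance(other, Symbol) and self._name == other._name

    def __hash__(self):
        # Equal symbols must hash equally, or sets and dicts treat them as distinct.
        return hash(self._name)

    def __repr__(self):
        return "Symbol({})".format(self._name)


left = {1, 2, 3, Symbol("a")}
right = {1, 2, 3, Symbol("a")}
print(left == right)
```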
@bfontaine It appears python setup.py bdist_wheel is not a valid command any more. I uploaded v0.7.0, but I'm not certain if the uploaded package is general or Linux-only. Can you please help me with any guidance here?
https://pypi.org/project/edn-format/#files
Thanks!
specifically 45e+43 and 45.4e+43M
When warnings are enabled, warnings like these are printed on importing edn_format with Python 3.7:
/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_dict.py:7: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
class ImmutableDict(collections.Mapping):
/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_list.py:8: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
class ImmutableList(collections.Sequence, collections.Hashable):
/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_list.py:8: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
class ImmutableList(collections.Sequence, collections.Hashable):
It would be good if edn_format were warning-free. In our specific case, the warnings pollute the test suite output.
$ python3 --version
Python 3.7.0
$ python3 -m venv .venv
$ . .venv/bin/activate
$ pip3 install edn_format
[...]
$ python3 -Wall -c 'import edn_format'
/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_dict.py:7: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
class ImmutableDict(collections.Mapping):
/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_list.py:8: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
class ImmutableList(collections.Sequence, collections.Hashable):
/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_list.py:8: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
class ImmutableList(collections.Sequence, collections.Hashable):
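The warning text itself names the remedy: import the ABCs from collections.abc. A small sketch with a fallback for interpreters that predate collections.abc follows; ExampleList is a hypothetical class used only to exercise the imports.

```python
try:
    from collections.abc import Hashable, Sequence  # Python 3.3 and later
except ImportError:
    from collections import Hashable, Sequence  # older interpreters


class ExampleList(Sequence, Hashable):
    """Hypothetical read-only sequence, used only to exercise the imports."""

    def __init__(self, items=()):
        self._items = tuple(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

    def __hash__(self):
        return hash(self._items)


print(list(ExampleList([1, 2, 3])))
```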
import edn_format
edn_format.dumps({1,2,3})
'#{1 2 3}'
edn_format.loads("[1 true nil]")
WARNING: Couldn't open 'parser.out'. [Errno 30] Read-only file system: 'python3-3.7.5-env/lib/python3.7/site-packages/edn_format/parser.out'
Generating LALR tables
WARNING: Couldn't create 'edn_format.parsetab'. [Errno 30] Read-only file system: 'python3-3.7.5-env/lib/python3.7/site-packages/edn_format/parsetab.py'
[1, True, None]
edn_format.loads_all("1 2 3 4")
WARNING: Couldn't open 'parser.out'. [Errno 30] Read-only file system: 'python3-3.7.5-env/lib/python3.7/site-packages/edn_format/parser.out'
Generating LALR tables
WARNING: Couldn't create 'edn_format.parsetab'. [Errno 30] Read-only file system: 'python3-3.7.5-env/lib/python3.7/site-packages/edn_format/parsetab.py'
[1, 2, 3, 4]
Support for edn_format.loads("#tag {:a :b}").
Thanks
Values are mapped inconsistently, i.e.
loads('{ :a false, :b false }')
gives
{Keyword(b): False, Keyword(a): Symbol(false)}
Disregard :)
Since we're targeting the clojure data format, why not support tag handlers through simple functions instead of requiring a TaggedElement class?
The nice thing is that if you only want to go in one direction, =edn_format.add_tag(myfunc, 'mytag')= already works.
It would be relatively simple to just take one or two functions instead of a class for handlers.
It would also be nice if the module functions like =edn_format.dumps= and =edn_format.loads= took optional functions as parameters for handlers.
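A rough sketch of the function-based approach, where handlers are passed as a plain mapping from tag name to callable. apply_tag, the readers parameter and the myapp/date tag are all hypothetical and do not describe the existing edn_format API.

```python
from datetime import date


def apply_tag(tag, value, readers=None):
    """Look up the reader function for a tag and apply it to the parsed value."""
    readers = readers or {}
    if tag not in readers:
        raise NotImplementedError("Don't know how to handle tag {}".format(tag))
    return readers[tag](value)


# Handlers are ordinary functions, so a lambda is enough for simple cases.
readers = {"myapp/date": lambda parts: date(*parts)}

print(apply_tag("myapp/date", [2014, 1, 1], readers))
```

Passing the mapping per call, rather than registering handlers globally, keeps two callers with different tag sets from interfering with each other.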
Downloading/unpacking edn-format
Downloading edn_format-0.2.tar.gz
Running setup.py egg_info for package edn-format
Traceback (most recent call last):
File "", line 14, in
File "/home/mariano/build/edn-format/setup.py", line 13, in
long_description=open('README.md').read(),
IOError: [Errno 2] No such file or directory: 'README.md'
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 14, in
File "/home/mariano/build/edn-format/setup.py", line 13, in
long_description=open('README.md').read(),
IOError: [Errno 2] No such file or directory: 'README.md'
The library produces wrong EDN output for escaped strings. For example, edn_format.dumps('\"') produces the invalid EDN representation """, where the correct representation would be "\"".
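The escaping that would produce the expected output can be sketched as follows. dump_string is a hypothetical helper; it only covers backslashes and double quotes and leaves other escapes, such as newlines, aside.

```python
def dump_string(text):
    # Escape backslashes first so the backslashes added for quotes are not doubled.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


print(dump_string('"'))
```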
Is it necessary for the requirement pytz==2016.10 to be so strict? It seems to run the tests fine with any other version I've tried (between 2011d and 2017.2), so having such a strict requirements.txt seems to me to be just risking conflicts.
Per Datomic's edn requirements (made by the same people that worked on Clojure and edn).
Patterns that are actually accepted by Clojure's edn reader can be found here: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/EdnReader.java#L27
This must be followed in order for any edn library to work with Datomic, which is often the point of edn support to begin with.
I discovered this library looking for a tool to parse Roam EDN exports. (Roam is a Clojure project, so EDN is their most native export format.)
Trying to parse a standard export with this library, I immediately run into:
NotImplementedError: Don't know how to handle tag ImmutableDict(datascript/DB)
Indeed, the Roam export begins with the tag #datascript/DB, but I'm unclear how this is meant to be handled. Am I supposed to register custom types or handlers or something?
The source for BaseEdnType in the version on PyPI doesn't contain the implementation of __hash__(). It should probably be updated.
Currently I'm working around this with the ghetto-tastic
import edn_format.edn_lex
edn_format.edn_lex.BaseEdnType.__hash__ = lambda s: hash(s._name)
But it would be nice not to have to.
[x #_ y z] should yield [Symbol(x), Symbol(z)], but instead it fails with: "Don't know how to handle tag _"
edn_format.loads('#:foo{:bar 1}')
fails with:
edn_format.exceptions.EDNDecodeError: Illegal character '#' with lexpos 0 in the area of ...#:foo{:bar 1}...
See Map namespace syntax in the Clojure Reader reference.
Ditto for auto-resolving namespaces (that is, #::{:foo 1}), although I'm not sure whether that sort of thing is in the scope of this library.
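As a sketch of what the reader would have to do for the non-auto-resolving form: unqualified keys take the map's namespace, keys in the _ namespace lose it, and keys with any other namespace are kept. This is my reading of the Clojure Reader reference. expand_namespaced_map is a hypothetical helper, and keyword names are modelled as plain strings without the leading colon.

```python
def expand_namespaced_map(namespace, entries):
    expanded = {}
    for key, value in entries.items():
        if "/" not in key:
            # Unqualified key: prepend the map's namespace.
            key = namespace + "/" + key
        elif key.startswith("_/"):
            # Explicit "no namespace" marker: strip it.
            key = key[2:]
        expanded[key] = value
    return expanded


print(expand_namespaced_map("foo", {"bar": 1}))
```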
{[1 2 3] "some numbers"}
fails to parse because a native dict cannot use an unhashable list as a key.
File "/Library/Python/2.7/site-packages/edn_format/edn_parse.py", line 66, in p_map
p[0] = dict([terms[i:i+2] for i in range(0, len(terms), 2)]) # partition terms in pairs
TypeError: unhashable type: 'list'
One of the round-trip tests failed the first time I ran it:
rmunn@laptop:~/code/python/edn_format (master)$ python tests.py
......F.
======================================================================
FAIL: test_round_trip_conversion (__main__.EdnTest)
----------------------------------------------------------------------
Traceback (most recent call last):
File "tests.py", line 140, in test_round_trip_conversion
self.assertIn(step3, literal[1])
AssertionError: '#{:b (1 2 3) :a}' not found in ['#{:a (1 2 3) :b}', '#{(1 2 3) :a :b}', '#{:a :b (1 2 3)}', '#{:b :a (1 2 3)}']
----------------------------------------------------------------------
Ran 8 tests in 0.447s
FAILED (failures=1)
There are six possible permutations of :a, :b, and (1 2 3), but only four of those permutations are checked for in the test results. The following patch will check for the other two possible permutations:
diff --git a/tests.py b/tests.py
index df26228..6a272ab 100644
--- a/tests.py
+++ b/tests.py
@@ -126,7 +126,9 @@ class EdnTest(unittest.TestCase):
["+32.23M", "32.23M"],
["3.23e10", "32300000000.0"],
['#{:a (1 2 3) :b}', ['#{:a (1 2 3) :b}',
+ '#{:b (1 2 3) :a}',
'#{(1 2 3) :a :b}',
+ '#{(1 2 3) :b :a}',
'#{:a :b (1 2 3)}',
'#{:b :a (1 2 3)}']]
]
The patch is simple enough that I'm not going to bother with setting up a pull request, though I can do so if you'd prefer.
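An alternative to listing the orderings by hand would be to generate them, which cannot miss a case. This is a sketch of that idea and not part of the patch above.

```python
from itertools import permutations

elements = [":a", ":b", "(1 2 3)"]

# Every ordering in which a three-element set could be written out.
expected = ["#{" + " ".join(order) + "}" for order in permutations(elements)]

for candidate in expected:
    print(candidate)
```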
I'd expect this to return a map:
edn_format.loads('{"foo" "bar"}')
But it triggers an exception in PLY. I think I've tracked this down to a problem with the regular expression for matching strings: it seems to be too greedy (and hence fails to create a dict, because it sees an uneven number of items). This can be confirmed by trying to parse a list of strings:
edn_format.loads('["foo" "bar"]')
This returns ['foo" "bar'] (a list with one item), but I'd expect it to return ['foo', 'bar'].
I've tried playing around with the definition for t_STRING without much luck. I was wondering whether the regular expression from ANSI C could be used (L?\"(\\.|[^\\"])*\"), but I'm not sure how to translate this to Python/PLY. Any ideas?
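The ANSI C pattern carries over to Python's re module almost unchanged once the C-specific L? wide-string prefix is dropped. The sketch below tests it on its own, outside PLY; how it would be wired into t_STRING is left open.

```python
import re

# A quote, then any run of escape pairs or non-quote, non-backslash characters, then a quote.
STRING = re.compile(r'"(?:\\.|[^\\"])*"')

tokens = STRING.findall('["foo" "bar"]')
print(tokens)

# An escaped quote inside a string does not end the match early.
escaped = STRING.findall(r'"a \" b"')
print(escaped)
```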
When you attempt to e.g. insert a key in an ImmutableDict object, nothing happens - no error is thrown or anything. This means that bugs can pass undetected, and is unfortunate. For example, I was parsing an EDN file and forgot that the map within it would become immutable, treating it as a regular Python dict by attempting to insert keys, then ran into strange behavior because my insertions were being silently ignored. See below for an example:
import edn_format
d = edn_format.immutable_dict.ImmutableDict({"apple": 1, "pear": 2})
d["banana"] = 3 # does not throw an error, just is silently ignored
print(d["banana"]) # oops, KeyError
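For comparison, a sketch of the behaviour one might expect instead, where assignment fails loudly. StrictImmutableDict is a hypothetical class built on collections.abc.Mapping, not the library's ImmutableDict.

```python
from collections.abc import Mapping


class StrictImmutableDict(Mapping):
    """Hypothetical read-only mapping that rejects item assignment."""

    def __init__(self, items=()):
        self._items = dict(items)

    def __getitem__(self, key):
        return self._items[key]

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)

    def __setitem__(self, key, value):
        # Fail immediately instead of silently dropping the write.
        raise TypeError("StrictImmutableDict does not support item assignment")


d = StrictImmutableDict({"apple": 1, "pear": 2})
try:
    d["banana"] = 3
except TypeError as error:
    print(error)
```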
With pip 20.1 installing fails due to the following error:
$ pip install edn_format
Collecting edn_format
Downloading edn_format-0.7.1.tar.gz (18 kB)
ERROR: Command errored out with exit status 1:
command: /home/runner/ardupilot/python3-env/bin/python3.6 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-o3m17uee/edn-format/setup.py'"'"'; __file__='"'"'/tmp/pip-install-o3m17uee/edn-format/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-z0lgf504
cwd: /tmp/pip-install-o3m17uee/edn-format/
Complete output (7 lines):
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-o3m17uee/edn-format/setup.py", line 11, in <module>
requirements = [str(ir.req) for ir in parse_requirements('requirements.txt', session=False)]
File "/tmp/pip-install-o3m17uee/edn-format/setup.py", line 11, in <listcomp>
requirements = [str(ir.req) for ir in parse_requirements('requirements.txt', session=False)]
AttributeError: 'ParsedRequirement' object has no attribute 'req'
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
It appears that parse_requirements changed such that the setup is now broken. I don't use Python enough to be sure, but it appears that using this directly is generally not considered a good idea.
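One way to avoid depending on pip internals is to read the file directly in setup.py. The sketch below uses a hypothetical read_requirements helper; it assumes a simple requirements.txt and skips option lines such as -r or -e rather than interpreting them.

```python
def read_requirements(path):
    """Return the requirement specifiers listed in a simple requirements file."""
    requirements = []
    with open(path) as handle:
        for line in handle:
            # Drop trailing comments and surrounding whitespace.
            line = line.split("#", 1)[0].strip()
            # Skip blank lines and pip options, which are not requirement specifiers.
            if line and not line.startswith("-"):
                requirements.append(line)
    return requirements
```

The result could then be passed to setup() as install_requires. Splitting on # is a simplification and would mangle URL requirements that carry a fragment.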
When parsing exact precision floats that do not contain the dot (e.g. 12M, 1M or 0M), the parser considers only the integer part and the output is an int instead of a Decimal.
Example of this behaviour:
>>> from edn_format import loads
>>> loads('1M')
1
>>> loads('{:a 1M}')
---------------------------------------------------------------------------
EDNDecodeError Traceback (most recent call last)
<ipython-input-5-94ea5fa6da57> in <module>
----> 1 loads('{:a 1M}')
~/miniconda3/envs/nucli-edn_format/lib/python3.6/site-packages/edn_format/edn_parse.py in parse(text, input_encoding)
201 Parse one object from the text. Return None if the text is empty.
202 """
--> 203 expressions = parse_all(text, input_encoding=input_encoding)
204 return expressions[0] if expressions else None
~/miniconda3/envs/nucli-edn_format/lib/python3.6/site-packages/edn_format/edn_parse.py in parse_all(text, input_encoding)
193 kwargs = dict(debug=True)
194 p = ply.yacc.yacc(**kwargs)
--> 195 expressions = p.parse(text, lexer=lex())
196 return list(expressions)
197
~/miniconda3/envs/nucli-edn_format/lib/python3.6/site-packages/ply/yacc.py in parse(self, input, lexer, debug, tracking, tokenfunc)
331 return self.parseopt(input, lexer, debug, tracking, tokenfunc)
332 else:
--> 333 return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
334
335
~/miniconda3/envs/nucli-edn_format/lib/python3.6/site-packages/ply/yacc.py in parseopt_notrack(self, input, lexer, debug, tracking, tokenfunc)
1118 del symstack[-plen:]
1119 self.state = state
-> 1120 p.callable(pslice)
1121 del statestack[-plen:]
1122 symstack.append(sym)
~/miniconda3/envs/nucli-edn_format/lib/python3.6/site-packages/edn_format/edn_parse.py in p_map(p)
107 terms = p[2]
108 if len(terms) % 2 != 0:
--> 109 raise EDNDecodeError('Even number of terms required for map')
110 # partition terms in pairs
111 p[0] = ImmutableDict((terms[i], terms[i+1]) for i in range(0, len(terms), 2))
EDNDecodeError: Even number of terms required for map
Another wrong behavior (it apparently interprets 1M as 1 M):
>>> loads('{:a 1M :b}')
{Keyword(a): 1, Symbol(M): Keyword(b)}
Expected behaviour:
>>> from edn_format import loads
>>> loads('1M')
Decimal('1')
>>> loads('{:a 1M}')
{Keyword(a): Decimal('1')}
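A sketch of number parsing that gives the expected results, treating the M suffix as part of the token regardless of whether a dot is present. parse_number is a hypothetical helper that works on an already isolated token; it does not handle the N suffix.

```python
import re
from decimal import Decimal

# Integer part, optional fraction, optional exponent, optional exact-precision suffix.
_NUMBER = re.compile(r"^([+-]?\d+)(\.\d*)?([eE][+-]?\d+)?(M)?$")


def parse_number(token):
    match = _NUMBER.match(token)
    if match is None:
        raise ValueError("Not a number: {!r}".format(token))
    integer, fraction, exponent, exact = match.groups()
    literal = integer + (fraction or "") + (exponent or "")
    if exact:
        # The M suffix always means Decimal, even without a dot.
        return Decimal(literal)
    if fraction or exponent:
        return float(literal)
    return int(literal)


print(repr(parse_number("1M")))
```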