swaroopch / edn_format

EDN reader and writer implementation in Python, using PLY (lex, yacc)

Home Page: https://swaroopch.com/2012/12/24/edn-format-python/

License: Other

Makefile 0.28% Python 99.52% Dockerfile 0.20%

clojure deserialization edn edn-format python serialization

edn_format's Issues

cannot parse empty list

edn_format.loads("()") should parse to () but I get "Syntax error! LexToken(LIST_END,')',1,1)"

Load edn as a Python dict with string keys

Would you consider a PR that loads EDN as a standard Python dict with string keys?

Push 0.5.1 to PyPI

Could you push the latest up to PyPI? Thanks!

Feature request: Pretty printing

It'd be nice if there were a way to pretty-print EDN, in the same way that the json module supports pretty-printing.

symbol with *

edn allows * in symbols, but this lib skips them. Note I didn't say that it raises an error, which is a different problem with this lib. ;)

SyntaxError: Illegal character '^' when parsing files with metadata

Example:

>>> edn_format.loads('^:test []')
Traceback (most recent call last):

  File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-4-f8dc7704d4e7>", line 1, in <module>
    edn.loads('^:test []')

  File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/edn_format/edn_parse.py", line 160, in parse
    return p.parse(text, lexer=lex())

  File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/ply/yacc.py", line 333, in parse
    return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)

  File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/ply/yacc.py", line 1063, in parseopt_notrack
    lookahead = get_token()     # Get the next token

  File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/ply/lex.py", line 386, in token
    newtok = self.lexerrorf(tok)

  File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/edn_format/edn_lex.py", line 254, in t_error
    c=t.value[0], p=t.lexpos, a=t.value[0:100]))

  File "<string>", line unknown
SyntaxError: Illegal character '^' with lexpos 0 in the area of ...^:test []...

insufficient support for character values

The character values (like \space) are parsed as strings, which is how they are normally represented in Python code anyway. However, there are two (in a broad sense) issues with your current implementation (PyPI version).

The first issue is that a lot of character literals are simply unsupported; more specifically (comparing with default edn implementation in Clojure, clojure.edn/read-string):

for characters in ASCII printable range (codes 33-126, they're the same in Unicode), all of them should be representable the default way (", !, \, etc.), but a quick check gives SyntaxError on 31 of them: !"#$%&'()*+,-./:;<=>?@[]^`{|}~
since edn specification doesn't seem to require reading unicode characters normally (though Clojure edn reader accepts them anyway... as well as whitespace), I've tried using allowed hex-code format (\uXXXX); however, when parsing "[\u007E]" (for example), loads() generated this output: [u'u', 7, Symbol(E)]

In general, Clojure edn parser seems to behave thus, upon encountering backslash:

keep reading&appending until a whitespace or a paren is encountered (by peeking in the stream)
if token is empty (""), read&append exactly once ("[\ 1 \]x]" is parsed as [\space 1 ] x])
discard backslash, process cases (hex-code, single Unicode char, known char name)
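
The three steps above can be sketched in Python roughly as follows. This is only an illustration of the described behaviour, not the Clojure reader or this library's lexer: read_char, NAMED_CHARS and DELIMITERS are hypothetical names, and the delimiter set (whitespace, comma and all bracket types) is an assumption based on the "[\ 1 \]x]" example.

```python
# Hypothetical sketch of the character-reading steps described above.
NAMED_CHARS = {"newline": "\n", "return": "\r", "space": " ", "tab": "\t"}
DELIMITERS = set(" \t\r\n,()[]{}")  # assumption: what ends a character token


def read_char(text, pos):
    """Read one character literal starting at the backslash at text[pos].

    Returns (character, position just after the literal).
    """
    if text[pos] != "\\":
        raise ValueError("expected a backslash at position %d" % pos)
    # Step 1: keep reading until whitespace or a bracket is encountered.
    end = pos + 1
    while end < len(text) and text[end] not in DELIMITERS:
        end += 1
    # Step 2: if the token is empty, read exactly one character.
    if end == pos + 1:
        if end >= len(text):
            raise ValueError("backslash at end of input")
        end += 1
    # Step 3: discard the backslash and process the cases.
    token = text[pos + 1:end]
    if len(token) == 1:
        return token, end  # single character
    if token in NAMED_CHARS:
        return NAMED_CHARS[token], end  # known character name
    if len(token) == 5 and token[0] == "u":
        return chr(int(token[1:], 16)), end  # \uXXXX hex code
    raise ValueError("unsupported character literal: \\" + token)
```

With this sketch, read_char applied to the text \u007E] at position 0 yields the character ~ and stops before the closing bracket, instead of splitting the token into u, 7 and E.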

The second issue is that there is no way to emit character values with dumps().
Proposed solution would be providing a corresponding class for that specific purpose (named, for example, Char); alternatively, it might be used for reading as well, to keep type information (in this case, class probably should be extending str to keep backward-compatibility).

missing some support for symbols

/, *, ?, and % do not parse as Symbol

Throws error on non-RFC3339 timestamp, although the full RFC 3339 form isn't actually obligatory.

Relatively minor bug, but things like #inst "2014" (turns into Jan 1, 2014, 00:00:00.00) are valid edn.
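
As a rough illustration of the requested behaviour, a truncated date could be completed with default values before building a datetime. This is a sketch under assumptions: parse_partial_inst is a hypothetical helper, it only handles the date part (year, optional month, optional day), and it says nothing about how the library itself parses #inst values.

```python
import datetime
import re

# Hypothetical helper: the year is required, month and day are optional.
_PARTIAL_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_partial_inst(text):
    """Turn '2014', '2014-05' or '2014-05-17' into a datetime at midnight."""
    match = _PARTIAL_DATE.fullmatch(text)
    if match is None:
        raise ValueError("unsupported timestamp: %r" % text)
    year, month, day = match.groups()
    # Missing fields default to the first month / first day.
    return datetime.datetime(int(year), int(month or 1), int(day or 1))
```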

import edn_format -> SyntaxError: invalid syntax

>>> import edn_format
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.3/site-packages/edn_format/__init__.py", line 2, in <module>
    from .edn_lex import Keyword, Symbol
  File "/usr/local/lib/python3.3/site-packages/edn_format/edn_lex.py", line 157
    print "Illegal character '%s'" % t.value[0]
                                 ^
SyntaxError: invalid syntax

HEX Numeric notation not supported

Hex notation e.g. 0xFFDC73 causes traceback:

Sample EDN:

{"AMBER" {:id "AMBER" :name "Amber 80%" :color 0xFFDC73}}

pip details:

"edn-format": {
            "hashes": [
                "sha256:aa3cc3041497b7e22386f96c44658e589597d9d26df39c4a77bbac0c8091ea88"
            ],
            "index": "pypi",
            "version": "==0.6.3"
        }

  return edn_format.loads(edn_str)
File "/usr/local/lib/python3.6/site-packages/edn_format/edn_parse.py", line 161, in parse
  return p.parse(text, lexer=lex())
File "/usr/local/lib/python3.6/site-packages/ply/yacc.py", line 333, in parse
   return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
File "/usr/local/lib/python3.6/site-packages/ply/yacc.py", line 1120, in parseopt_notrack
   p.callable(pslice)
File "/usr/local/lib/python3.6/site-packages/edn_format/edn_parse.py", line 78, in p_map
   raise EDNDecodeError('Even number of terms required for map')
edn_format.exceptions.EDNDecodeError: Even number of terms required for map

vectors are parsed as mutable lists

Now, emitting mutable values may or may not be an issue depending on the point of view, but since you went to the trouble of implementing immutable dicts, you probably want them to be completely immutable.

data = edn_format.loads('{5 [1 2 3]}')
print( type(data) ) # <class 'edn_format.immutable_dict.ImmutableDict'>
data[5].append(8)
print(data) # {5: [1, 2, 3, 8]}

...Also, Python lists are unhashable, so both "{[1 2 3] 5}" and "#{[1 2 3] 5}" generate TypeError.

Proposed solution is to add a corresponding class (named, for example, ImmutableVector), implemented in similar fashion to ImmutableDict; however, whether it extends list or not (it would make sense to implement it using tuples), it would break code that mutates parsed vectors.
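
A minimal sketch of what such a class could look like, backed by a tuple as suggested. The name ImmutableVector comes from the proposal above; everything else here is an assumption and not the library's actual implementation.

```python
from collections.abc import Sequence


class ImmutableVector(Sequence):
    """Read-only, hashable sequence backed by a tuple (illustrative only)."""

    def __init__(self, items=()):
        self._items = tuple(items)

    def __getitem__(self, index):
        result = self._items[index]
        # Slices stay immutable; single indexes return the element itself.
        return ImmutableVector(result) if isinstance(index, slice) else result

    def __len__(self):
        return len(self._items)

    def __eq__(self, other):
        if isinstance(other, ImmutableVector):
            return self._items == other._items
        return NotImplemented

    def __hash__(self):
        # Hashable as long as every element is hashable.
        return hash(self._items)

    def __repr__(self):
        return "ImmutableVector(%r)" % (list(self._items),)
```

Because the class defines no append or __setitem__, code that mutates parsed vectors would fail loudly, and instances can be used as map keys or set members.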

set equality not working with symbols

set([1,2,3,edn_format.Symbol("a")]) == set([1,2,3,edn_format.Symbol("a")]) yields False

Yet: edn_format.Symbol("a") == edn_format.Symbol("a") yields True
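
Set comparison relies on hashes as well as on ==, so one plausible cause (an assumption, the report does not confirm it) is a class that defines __eq__ without a matching __hash__. On Python 2 such a class keeps the identity-based hash, so equal objects land in different buckets; on Python 3 it becomes unhashable. The sketch below uses a hypothetical Sym class, not the library's Symbol, to show a consistent pair.

```python
class Sym(object):
    """Hypothetical stand-in for a symbol type with consistent eq/hash."""

    def __init__(self, name):
        self._name = name

    def __eq__(self, other):
        return isinstance(other, Sym) and self._name == other._name

    def __ne__(self, other):
        # Needed on Python 2, where != is not derived from ==.
        return not self == other

    def __hash__(self):
        # Equal objects must produce equal hashes for sets and dicts to work.
        return hash(self._name)


same = set([1, 2, 3, Sym("a")]) == set([1, 2, 3, Sym("a")])
```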

Is v0.7.0 uploaded properly?

@bfontaine It appears python setup.py bdist_wheel is not a valid command any more, I uploaded v0.7.0 but I'm not certain if the uploaded package is general or Linux-only. Can you please help me with any guidance here?

https://pypi.org/project/edn-format/#files

Thanks!

issue parsing big numbers

specifically 45e+43 and 45.4e+43M

DeprecationWarnings with Python 3.7

When warnings are enabled, warnings like these are printed on importing edn_format with Python 3.7:

/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_dict.py:7: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  class ImmutableDict(collections.Mapping):
/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_list.py:8: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  class ImmutableList(collections.Sequence, collections.Hashable):
/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_list.py:8: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  class ImmutableList(collections.Sequence, collections.Hashable):

It would be good if edn_format was warning-free. In our specific case, the warnings pollute the test suite output.

Steps to reproduce

$ python3 --version
Python 3.7.0
$ python3 -m venv .venv
$ . .venv/bin/activate
$ pip3 install edn_format
[...]
$ python3 -Wall -c 'import edn_format'
/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_dict.py:7: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  class ImmutableDict(collections.Mapping):
/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_list.py:8: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  class ImmutableList(collections.Sequence, collections.Hashable):
/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_list.py:8: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  class ImmutableList(collections.Sequence, collections.Hashable):
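
The warning text itself names the remedy: import the ABCs from collections.abc. A small sketch of an import that also keeps Python 2 working (whether the project still needs Python 2 is an assumption):

```python
try:
    # Python 3.3+: the ABCs live in collections.abc.
    from collections.abc import Hashable, Mapping, Sequence
except ImportError:
    # Python 2 fallback.
    from collections import Hashable, Mapping, Sequence

# The built-in containers are registered against these ABCs.
dict_is_mapping = issubclass(dict, Mapping)
tuple_is_sequence = issubclass(tuple, Sequence)
tuple_is_hashable = issubclass(tuple, Hashable)
```

Classes would then subclass Mapping or Sequence directly rather than collections.Mapping or collections.Sequence.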

Couldn't open 'parser.out' & 'edn_format.parsetab'

import edn_format
edn_format.dumps({1,2,3})
'#{1 2 3}'

edn_format.loads("[1 true nil]")
WARNING: Couldn't open 'parser.out'. [Errno 30] Read-only file system: 'python3-3.7.5-env/lib/python3.7/site-packages/edn_format/parser.out'
Generating LALR tables
WARNING: Couldn't create 'edn_format.parsetab'. [Errno 30] Read-only file system: 'python3-3.7.5-env/lib/python3.7/site-packages/edn_format/parsetab.py'
[1, True, None]

edn_format.loads_all("1 2 3 4")
WARNING: Couldn't open 'parser.out'. [Errno 30] Read-only file system: 'python3-3.7.5-env/lib/python3.7/site-packages/edn_format/parser.out'
Generating LALR tables
WARNING: Couldn't create 'edn_format.parsetab'. [Errno 30] Read-only file system: 'python3-3.7.5-env/lib/python3.7/site-packages/edn_format/parsetab.py'
[1, 2, 3, 4]

Support for edn_format.loads("#tag {:a :b}")

Support for edn_format.loads("#tag {:a :b}").
Thanks

Inconsistent mapping of values

Values are mapped inconsistently, for example

loads('{ :a false, :b false }')

gives

{Keyword(b): False, Keyword(a): Symbol(false)}

.

Disregard :)

tag_class has to be a class.

Since we're targeting the clojure data format, why not support tag handlers through simple functions instead of requiring a TaggedElement class?

The nice thing is that if you only want to go in one direction, edn_format.add_tag(myfunc, 'mytag') already works.

It would be relatively simple to just take one or two functions instead of a class for handlers.

It would also be nice if module functions like edn_format.dumps and edn_format.loads took optional functions as parameters for handlers.

pip install doesn't work

Downloading/unpacking edn-format
Downloading edn_format-0.2.tar.gz
Running setup.py egg_info for package edn-format
Traceback (most recent call last):
File "", line 14, in
File "/home/mariano/build/edn-format/setup.py", line 13, in
long_description=open('README.md').read(),
IOError: [Errno 2] No such file or directory: 'README.md'
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "", line 14, in
File "/home/mariano/build/edn-format/setup.py", line 13, in
long_description=open('README.md').read(),
IOError: [Errno 2] No such file or directory: 'README.md'

Error in serialization of escape sequences

The library produces wrong EDN output for escaped strings. For example

edn_format.dumps('\"')

produces the invalid EDN representation

""",

where the correct representation would be

"\"".

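For illustration, escaping for the case above can be sketched as follows. dump_edn_string is a hypothetical helper, not the library's dumps, and it only handles backslashes and double quotes.

```python
def dump_edn_string(value):
    """Wrap a Python string in double quotes, escaping \\ and " (sketch)."""
    # Escape backslashes first so the backslashes added for quotes
    # are not escaped a second time.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'
```

For a string holding a single double quote, this returns the four characters "\"" rather than three double quotes.
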
Less strict versions in requirements.txt

Is it necessary for the requirement pytz==2016.10 to be so strict? The tests seem to run fine with every other version I've tried (between 2011d and 2017.2), so such a strict requirements.txt seems to just risk causing conflicts.

Symbol parsing is incorrect and does not follow Clojure's behavior (the edn spec is wrong)

Per Datomic's edn requirements (made by the same people that worked on Clojure and edn).

Patterns that are actually accepted by Clojure's edn reader can be found here: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/EdnReader.java#L27

This must be followed in order for any edn library to work with Datomic, which is often the point of edn support to begin with.

Cannot parse Roam EDN exports

I discovered this library looking for a tool to parse Roam EDN exports. (Roam is a Clojure project, so EDN is their most native export format.)

Trying to parse a standard export with this library, I immediately run into:

NotImplementedError: Don't know how to handle tag ImmutableDict(datascript/DB)

Indeed, the Roam export begins with the tag #datascript/DB, but I'm unclear how this is meant to be handled. Am I supposed to register custom types or handlers or something?

pypi packaged version is out of date

The source for BaseEdnType in the version on PyPI doesn't contain the implementation of __hash__(). It should probably be updated.

Currently I'm working around this with the ghetto-tastic

import edn_format.edn_lex
edn_format.edn_lex.BaseEdnType.__hash__ = lambda s: hash(s._name)

But it would be nice not to have to.

not handling discard as expected

[x #_ y z] should yield [Symbol(x), Symbol(z)] but instead it is failing saying: "Don't know how to handle tag _"

Missing support for map namespace syntax

edn_format.loads('#:foo{:bar 1}')

fails with:

edn_format.exceptions.EDNDecodeError: Illegal character '#' with lexpos 0 in the area of ...#:foo{:bar 1}...

See Map namespace syntax in the Clojure Reader reference.

Ditto for auto-resolving namespaces (that is, #::{:foo 1}), although I'm not sure whether that sort of thing is in the scope of this library.

complex keys not supported in Map

{[1 2 3] "some numbers"} fails to parse because a native dict cannot use a list as a key (lists are unhashable).

File "/Library/Python/2.7/site-packages/edn_format/edn_parse.py", line 66, in p_map
p[0] = dict([terms[i:i+2] for i in range(0, len(terms), 2)]) # partition terms in pairs
TypeError: unhashable type: 'list'

[PATCH] Failing test for set round-tripping

One of the round-trip tests failed the first time I ran it:

rmunn@laptop:~/code/python/edn_format (master)$ python tests.py 
......F.
======================================================================
FAIL: test_round_trip_conversion (__main__.EdnTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests.py", line 140, in test_round_trip_conversion
    self.assertIn(step3, literal[1])
AssertionError: '#{:b (1 2 3) :a}' not found in ['#{:a (1 2 3) :b}', '#{(1 2 3) :a :b}', '#{:a :b (1 2 3)}', '#{:b :a (1 2 3)}']

----------------------------------------------------------------------
Ran 8 tests in 0.447s

FAILED (failures=1)

There are six possible permutations of :a, :b, and (1 2 3), but only four of those permutations are checked for in the test results. The following patch will check for the other two possible permutations:

diff --git a/tests.py b/tests.py
index df26228..6a272ab 100644
--- a/tests.py
+++ b/tests.py
@@ -126,7 +126,9 @@ class EdnTest(unittest.TestCase):
             ["+32.23M", "32.23M"],
             ["3.23e10", "32300000000.0"],
             ['#{:a (1 2 3) :b}', ['#{:a (1 2 3) :b}',
+                                  '#{:b (1 2 3) :a}',
                                   '#{(1 2 3) :a :b}',
+                                  '#{(1 2 3) :b :a}',
                                   '#{:a :b (1 2 3)}',
                                   '#{:b :a (1 2 3)}']]
         ]

The patch is simple enough that I'm not going to bother with setting up a pull request, though I can do so if you'd prefer.
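
An alternative to listing the orderings by hand would be to generate them. This is only a sketch of the idea, assuming the test keeps comparing against a list of accepted strings; the variable names are made up for the example.

```python
from itertools import permutations

# The three elements of the set literal used in the test.
elements = [":a", ":b", "(1 2 3)"]

# Every ordering a set with three elements can be serialized in (3! = 6).
accepted = ["#{" + " ".join(order) + "}" for order in permutations(elements)]
```

This covers all six orderings, including the two that the patch above adds manually.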

Can't parse map with string key and value.

I'd expect this to return a map:

edn_format.loads('{"foo" "bar"}')

But it triggers some exception in PLY. I think I've tracked this down to a problem with the regular expression for matching strings: it seems to be too greedy, so the whole input is lexed as a single string and the map ends up with an odd number of items. This can be confirmed by trying to parse a list of strings:

edn_format.loads('["foo" "bar"]')

This returns ['foo" "bar'] (a list with one item), but I'd expect it to return ['foo', 'bar'].

I've tried playing around with the definition for t_STRING without much luck. I was wondering whether the regular expression from ANSI C could be used (L?\"(\\.|[^\\"])*\"), but I'm not sure how to translate this to Python/PLY. Any ideas?
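
For what it's worth, the ANSI C pattern carries over to Python's re syntax almost unchanged. The sketch below drops the L? prefix (a C wide-string marker) and uses a non-capturing group; STRING_RE is a name made up for this example, and how it would be wired into t_STRING is left open.

```python
import re

# A string is a quote, then any run of escaped characters (\\.) or
# characters that are neither a backslash nor a quote, then a closing quote.
STRING_RE = re.compile(r'"(?:\\.|[^\\"])*"')

tokens = STRING_RE.findall('["foo" "bar"]')
escaped = STRING_RE.findall(r'"a\"b" "c"')
```

Because the character class excludes the double quote, the match stops at the first unescaped closing quote instead of running on to the last one in the input.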

ImmutableDicts should throw errors on attempted mutation

When you attempt to e.g. insert a key in an ImmutableDict object, nothing happens - no error is thrown or anything. This means that bugs can pass undetected, and is unfortunate. For example, I was parsing an EDN file and forgot that the map within it would become immutable, treating it as a regular Python dict by attempting to insert keys, then ran into strange behavior because my insertions were being silently ignored. See below for an example:

import edn_format
d = edn_format.immutable_dict.ImmutableDict({"apple": 1, "pear": 2})
d["banana"] = 3                 # does not throw an error, just is silently ignored
print(d["banana"])              # oops, KeyError
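
A sketch of the requested behaviour, using a hypothetical StrictImmutableDict class rather than the library's ImmutableDict: attempted mutation raises TypeError instead of being ignored.

```python
from collections.abc import Mapping


class StrictImmutableDict(Mapping):
    """Read-only mapping that fails loudly on mutation (illustrative only)."""

    def __init__(self, data=()):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __setitem__(self, key, value):
        raise TypeError("StrictImmutableDict does not support item assignment")

    def __delitem__(self, key):
        raise TypeError("StrictImmutableDict does not support item deletion")
```

With this class, d["banana"] = 3 fails at the assignment itself, so the mistake surfaces where it is made rather than later as a KeyError.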

Can't install with pip 20

With pip 20.1 installing fails due to the following error:

$ pip install edn_format
Collecting edn_format
  Downloading edn_format-0.7.1.tar.gz (18 kB)
    ERROR: Command errored out with exit status 1:
     command: /home/runner/ardupilot/python3-env/bin/python3.6 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-o3m17uee/edn-format/setup.py'"'"'; __file__='"'"'/tmp/pip-install-o3m17uee/edn-format/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-z0lgf504
         cwd: /tmp/pip-install-o3m17uee/edn-format/
    Complete output (7 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-o3m17uee/edn-format/setup.py", line 11, in <module>
        requirements = [str(ir.req) for ir in parse_requirements('requirements.txt', session=False)]
      File "/tmp/pip-install-o3m17uee/edn-format/setup.py", line 11, in <listcomp>
        requirements = [str(ir.req) for ir in parse_requirements('requirements.txt', session=False)]
    AttributeError: 'ParsedRequirement' object has no attribute 'req'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

It appears that parse_requirements changed such that the setup is now broken. I don't use python enough to be sure, but it appears that using this directly is generally not considered a good idea.
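
One way to avoid pip's internal parse_requirements is to read the file directly. This is a sketch under assumptions: read_requirements is a hypothetical helper, and it expects requirements.txt to contain only plain specifiers, comments and blank lines.

```python
def read_requirements(path):
    """Return the requirement specifiers listed in a requirements file."""
    requirements = []
    with open(path) as handle:
        for line in handle:
            # Drop trailing comments and surrounding whitespace.
            line = line.split("#", 1)[0].strip()
            # Skip blank lines and pip options such as "-r other.txt".
            if line and not line.startswith("-"):
                requirements.append(line)
    return requirements


# In setup.py this could then be passed along, for example:
#   setup(..., install_requires=read_requirements("requirements.txt"))
```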

Cannot correctly parse exact precision floats

When parsing exact precision floats that do not contain the dot (e.g. 12M, 1M or 0M), the parser considers only the integer part and the output is an int instead of a Decimal.

Example of this behaviour:

>>> from edn_format import loads

>>> loads('1M')
1

>>> loads('{:a 1M}')
---------------------------------------------------------------------------
EDNDecodeError                            Traceback (most recent call last)
<ipython-input-5-94ea5fa6da57> in <module>
----> 1 loads('{:a 1M}')

~/miniconda3/envs/nucli-edn_format/lib/python3.6/site-packages/edn_format/edn_parse.py in parse(text, input_encoding)
    201     Parse one object from the text. Return None if the text is empty.
    202     """
--> 203     expressions = parse_all(text, input_encoding=input_encoding)
    204     return expressions[0] if expressions else None

~/miniconda3/envs/nucli-edn_format/lib/python3.6/site-packages/edn_format/edn_parse.py in parse_all(text, input_encoding)
    193         kwargs = dict(debug=True)
    194     p = ply.yacc.yacc(**kwargs)
--> 195     expressions = p.parse(text, lexer=lex())
    196     return list(expressions)
    197

~/miniconda3/envs/nucli-edn_format/lib/python3.6/site-packages/ply/yacc.py in parse(self, input, lexer, debug, tracking, tokenfunc)
    331             return self.parseopt(input, lexer, debug, tracking, tokenfunc)
    332         else:
--> 333             return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
    334
    335

~/miniconda3/envs/nucli-edn_format/lib/python3.6/site-packages/ply/yacc.py in parseopt_notrack(self, input, lexer, debug, tracking, tokenfunc)
   1118                             del symstack[-plen:]
   1119                             self.state = state
-> 1120                             p.callable(pslice)
   1121                             del statestack[-plen:]
   1122                             symstack.append(sym)

~/miniconda3/envs/nucli-edn_format/lib/python3.6/site-packages/edn_format/edn_parse.py in p_map(p)
    107     terms = p[2]
    108     if len(terms) % 2 != 0:
--> 109         raise EDNDecodeError('Even number of terms required for map')
    110     # partition terms in pairs
    111     p[0] = ImmutableDict((terms[i], terms[i+1]) for i in range(0, len(terms), 2))

EDNDecodeError: Even number of terms required for map

Another wrong behavior (it apparently interprets 1M as 1 M):

>>> loads('{:a 1M :b}')

{Keyword(a): 1, Symbol(M): Keyword(b)}

Expected behaviour:

>>> from edn_format import loads

>>> loads('1M')
Decimal('1')

>>> loads('{:a 1M}')
{Keyword(a): Decimal('1')}
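
The expected behaviour can be sketched with a pattern in which the fractional part is optional, so that 1M is recognized as a single token. parse_exact_decimal and the pattern are assumptions made for this example, not the library's lexer rules.

```python
import re
from decimal import Decimal

# Optional sign, digits, optional fraction, optional exponent, then the M suffix.
_EXACT = re.compile(r"[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?M")


def parse_exact_decimal(token):
    """Convert an exact-precision token such as '1M' or '45.4e+43M' to Decimal."""
    if _EXACT.fullmatch(token) is None:
        raise ValueError("not an exact-precision number: %r" % token)
    # Strip the trailing M and let Decimal parse the rest.
    return Decimal(token[:-1])
```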
