edn_format's People

Contributors

ashinohara, bfontaine, bitemyapp, czan, gabrielferreira95, hcarvalhoalves, ivan, konr, lexofleviafan, marianoguerra, olivergeorge, petrounias, swaroopch, thiagokokada

edn_format's Issues

Error in serialization of escape sequences

The library produces wrong EDN output for escaped strings. For example

edn_format.dumps('\"')

produces the invalid EDN representation

""",

where the correct representation would be

"\"".
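A minimal sketch of the escaping this report expects, assuming only backslash, double quote, and the common whitespace escapes need handling. `escape_edn_string` is a hypothetical helper for illustration, not part of edn_format.

```python
def escape_edn_string(value):
    """Return value as a double-quoted EDN string literal (illustrative helper)."""
    replacements = [
        ("\\", "\\\\"),  # backslash first, so later escapes are not doubled
        ('"', '\\"'),
        ("\n", "\\n"),
        ("\t", "\\t"),
        ("\r", "\\r"),
    ]
    for raw, escaped in replacements:
        value = value.replace(raw, escaped)
    return '"' + value + '"'


print(escape_edn_string('"'))  # prints "\""
```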

[PATCH] Failing test for set round-tripping

One of the round-trip tests failed the first time I ran it:

rmunn@laptop:~/code/python/edn_format (master)$ python tests.py 
......F.
======================================================================
FAIL: test_round_trip_conversion (__main__.EdnTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "tests.py", line 140, in test_round_trip_conversion
    self.assertIn(step3, literal[1])
AssertionError: '#{:b (1 2 3) :a}' not found in ['#{:a (1 2 3) :b}', '#{(1 2 3) :a :b}', '#{:a :b (1 2 3)}', '#{:b :a (1 2 3)}']

----------------------------------------------------------------------
Ran 8 tests in 0.447s

FAILED (failures=1)

There are six possible permutations of :a, :b, and (1 2 3), but only four of those permutations are checked for in the test results. The following patch will check for the other two possible permutations:

diff --git a/tests.py b/tests.py
index df26228..6a272ab 100644
--- a/tests.py
+++ b/tests.py
@@ -126,7 +126,9 @@ class EdnTest(unittest.TestCase):
             ["+32.23M", "32.23M"],
             ["3.23e10", "32300000000.0"],
             ['#{:a (1 2 3) :b}', ['#{:a (1 2 3) :b}',
+                                  '#{:b (1 2 3) :a}',
                                   '#{(1 2 3) :a :b}',
+                                  '#{(1 2 3) :b :a}',
                                   '#{:a :b (1 2 3)}',
                                   '#{:b :a (1 2 3)}']]
         ]

The patch is simple enough that I'm not going to bother with setting up a pull request, though I can do so if you'd prefer.
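As an alternative to listing the orderings by hand, the accepted outputs could be generated. This is only a sketch of the idea using `itertools.permutations`; the variable names are illustrative and not taken from tests.py.

```python
from itertools import permutations

# The three elements of the set literal used in the round-trip test.
elements = [":a", ":b", "(1 2 3)"]

# Every ordering a set serializer could legitimately emit (3! = 6 of them).
expected = ["#{" + " ".join(order) + "}" for order in permutations(elements)]

for literal in expected:
    print(literal)
```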

Cannot parse Roam EDN exports

I discovered this library looking for a tool to parse Roam EDN exports. (Roam is a Clojure project, so EDN is their most native export format.)

Trying to parse a standard export with this library, I immediately run into:

NotImplementedError: Don't know how to handle tag ImmutableDict(datascript/DB)

Indeed, the Roam export begins with the tag #datascript/DB, but I'm unclear how this is meant to be handled. Am I supposed to register custom types or handlers or something?

Can't parse map with string key and value.

I'd expect this to return a map:

edn_format.loads('{"foo" "bar"}')

But it triggers an exception in PLY. I think I've tracked this down to the regular expression for matching strings: it seems to be too greedy, so the whole map body is lexed as a single string and the dict cannot be built from an odd number of items. This can be confirmed by trying to parse a list of strings:

edn_format.loads('["foo" "bar"]')

This returns ['foo" "bar'] (a list with one item), but I'd expect it to return ['foo', 'bar'].

I've tried playing around with the definition for t_STRING without much luck. I was wondering whether the regular expression from ANSI C could be used (L?\"(\\.|[^\\"])*\"), but I'm not sure how to translate this to Python/PLY. Any ideas?
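The difference can be reproduced with Python's `re` module alone. In this sketch the greedy pattern is an assumption standing in for whatever `t_STRING` used at the time; the second pattern is the ANSI C one quoted above, minus the `L?` prefix and with a non-capturing group.

```python
import re

# Assumed stand-in for the problematic rule: ".*" runs to the LAST quote.
GREEDY = re.compile(r'".*"')

# ANSI C style: an escaped character, or anything except backslash or quote.
ANSI_C_STYLE = re.compile(r'"(?:\\.|[^\\"])*"')

text = '["foo" "bar"]'
greedy_matches = GREEDY.findall(text)        # one match spanning both strings
strict_matches = ANSI_C_STYLE.findall(text)  # two separate string tokens

print(greedy_matches)
print(strict_matches)
```

The second pattern also stops correctly after an escaped quote, since `\\.` consumes the backslash together with the character that follows it.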

insufficient support for character values

The character values (like \space) are parsed as strings, which is how they are normally represented in Python code anyway. However, there are two broad issues with the current implementation (PyPI version).

The first issue is that a lot of character literals are simply unsupported; more specifically (compared with the default edn implementation in Clojure, clojure.edn/read-string):

  • for characters in ASCII printable range (codes 33-126, they're the same in Unicode), all of them should be representable the default way (", !, \, etc.), but a quick check gives SyntaxError on 31 of them: !"#$%&'()*+,-./:;<=>?@[]^`{|}~
  • since edn specification doesn't seem to require reading unicode characters normally (though Clojure edn reader accepts them anyway... as well as whitespace), I've tried using allowed hex-code format (\uXXXX); however, when parsing "[\u007E]" (for example), loads() generated this output: [u'u', 7, Symbol(E)]

In general, Clojure edn parser seems to behave thus, upon encountering backslash:

  1. keep reading&appending until a whitespace or a paren is encountered (by peeking in the stream)
  2. if token is empty (""), read&append exactly once ("[\ 1 \]x]" is parsed as [\space 1 ] x])
  3. discard backslash, process cases (hex-code, single Unicode char, known char name)
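Step 3 above could be sketched roughly as follows. `decode_char_token` and `NAMED_CHARS` are hypothetical names, and the function assumes the token (the text after the backslash) has already been isolated by steps 1 and 2.

```python
# Character names defined by the edn specification.
NAMED_CHARS = {"newline": "\n", "return": "\r", "space": " ", "tab": "\t"}


def decode_char_token(token):
    """Map the text following the backslash to a one-character string."""
    if token in NAMED_CHARS:
        return NAMED_CHARS[token]
    if len(token) == 1:
        # A single character stands for itself (including a lone "u").
        return token
    if len(token) == 5 and token[0] == "u":
        # \uXXXX hex-code form; int() raises ValueError on non-hex digits.
        return chr(int(token[1:], 16))
    raise ValueError("unsupported character literal: \\" + token)


print(decode_char_token("u007E"))  # prints ~
```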

The second issue is that there is no way to emit character values with dumps().
A possible solution is to provide a dedicated class for that purpose (named, for example, Char). Alternatively, the same class could be used when reading as well, to keep the type information; in that case it should probably extend str to stay backward-compatible.
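One way such a class might look, as a sketch only: `Char` and `dump_char` are hypothetical and do not exist in edn_format. Extending `str` keeps existing comparisons against plain strings working.

```python
class Char(str):
    """A one-character string that remembers it came from an edn character."""

    def __new__(cls, value):
        if len(value) != 1:
            raise ValueError("Char requires exactly one character")
        return super().__new__(cls, value)


# Characters the edn specification gives a name to.
_CHAR_NAMES = {" ": "space", "\n": "newline", "\t": "tab", "\r": "return"}


def dump_char(char):
    """Serialize a Char as an edn character literal (illustrative only)."""
    return "\\" + _CHAR_NAMES.get(char, char)


print(dump_char(Char(" ")))  # prints \space
print(Char("a") == "a")      # prints True
```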

tag_class has to be a class.

Since we're targeting the clojure data format, why not support tag handlers through simple functions instead of requiring a TaggedElement class?

The nice thing is that if you only want to go in one direction, =edn_format.add_tag(myfunc, 'mytag')= already works.

It would be relatively simple to just take one or two functions instead of a class for handlers.

It would also be nice if the module functions like =edn_format.dumps= and =edn_format.loads= took optional functions as parameters for handlers.
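A function-based handler registry might look something like the sketch below. Every name here (`register_reader`, `read_tagged`, the `point` tag) is hypothetical and only illustrates the request; it is not the library's API.

```python
# Hypothetical registry mapping tag names to plain reader functions.
_readers = {}


def register_reader(tag_name, reader):
    """Associate a tag such as 'point' with a one-argument function."""
    _readers[tag_name] = reader


def read_tagged(tag_name, value):
    """Apply the registered reader to an already-parsed value."""
    try:
        reader = _readers[tag_name]
    except KeyError:
        raise NotImplementedError("Don't know how to handle tag " + tag_name)
    return reader(value)


# Reading direction only: no class is needed, just a function.
register_reader("point", lambda coords: tuple(coords))
print(read_tagged("point", [1, 2]))  # prints (1, 2)
```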

symbol with *

edn allows * in symbols, but this lib skips them. Note I didn't say that it raises an error, which is a different problem with this lib. ;)

Inconsistent mapping of values

Values are mapped inconsistently. For example,

loads('{ :a false, :b false }')

gives

{Keyword(b): False, Keyword(a): Symbol(false)}

Cannot correctly parse exact precision floats

When parsing exact precision floats that do not contain the dot (e.g. 12M, 1M or 0M), the parser considers only the integer part and the output is an int instead of a Decimal.

Example of this behaviour:

>>> from edn_format import loads

>>> loads('1M')
1

>>> loads('{:a 1M}')
---------------------------------------------------------------------------
EDNDecodeError                            Traceback (most recent call last)
<ipython-input-5-94ea5fa6da57> in <module>
----> 1 loads('{:a 1M}')

~/miniconda3/envs/nucli-edn_format/lib/python3.6/site-packages/edn_format/edn_parse.py in parse(text, input_encoding)
    201     Parse one object from the text. Return None if the text is empty.
    202     """
--> 203     expressions = parse_all(text, input_encoding=input_encoding)
    204     return expressions[0] if expressions else None

~/miniconda3/envs/nucli-edn_format/lib/python3.6/site-packages/edn_format/edn_parse.py in parse_all(text, input_encoding)
    193         kwargs = dict(debug=True)
    194     p = ply.yacc.yacc(**kwargs)
--> 195     expressions = p.parse(text, lexer=lex())
    196     return list(expressions)
    197

~/miniconda3/envs/nucli-edn_format/lib/python3.6/site-packages/ply/yacc.py in parse(self, input, lexer, debug, tracking, tokenfunc)
    331             return self.parseopt(input, lexer, debug, tracking, tokenfunc)
    332         else:
--> 333             return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
    334
    335

~/miniconda3/envs/nucli-edn_format/lib/python3.6/site-packages/ply/yacc.py in parseopt_notrack(self, input, lexer, debug, tracking, tokenfunc)
   1118                             del symstack[-plen:]
   1119                             self.state = state
-> 1120                             p.callable(pslice)
   1121                             del statestack[-plen:]
   1122                             symstack.append(sym)

~/miniconda3/envs/nucli-edn_format/lib/python3.6/site-packages/edn_format/edn_parse.py in p_map(p)
    107     terms = p[2]
    108     if len(terms) % 2 != 0:
--> 109         raise EDNDecodeError('Even number of terms required for map')
    110     # partition terms in pairs
    111     p[0] = ImmutableDict((terms[i], terms[i+1]) for i in range(0, len(terms), 2))

EDNDecodeError: Even number of terms required for map

Another wrong behavior (it apparently interprets 1M as 1 M):

>>> loads('{:a 1M :b}')

{Keyword(a): 1, Symbol(M): Keyword(b)}

Expected behaviour:

>>> from edn_format import loads

>>> loads('1M')
Decimal('1')

>>> loads('{:a 1M}')
{Keyword(a): Decimal('1')}
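The expected conversion can be sketched with a regular expression that makes the fractional part optional. `EXACT_NUMBER` and `parse_exact` are hypothetical names, and this is not the lexer rule edn_format actually uses.

```python
import re
from decimal import Decimal

# Optional sign, digits, optional fraction, optional exponent, then the M suffix.
EXACT_NUMBER = re.compile(r"[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?M")


def parse_exact(token):
    """Convert an exact-precision token such as '1M' or '32.23M' to Decimal."""
    if EXACT_NUMBER.fullmatch(token) is None:
        raise ValueError("not an exact-precision number: " + token)
    return Decimal(token[:-1])  # drop the trailing M


print(repr(parse_exact("1M")))  # prints Decimal('1')
```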

.

Disregard :)

SyntaxError: Illegal character '^' when parsing files with metadata

Example:

>>> edn_format.loads('^:test []')
Traceback (most recent call last):

  File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)

  File "<ipython-input-4-f8dc7704d4e7>", line 1, in <module>
    edn.loads('^:test []')

  File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/edn_format/edn_parse.py", line 160, in parse
    return p.parse(text, lexer=lex())

  File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/ply/yacc.py", line 333, in parse
    return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)

  File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/ply/yacc.py", line 1063, in parseopt_notrack
    lookahead = get_token()     # Get the next token

  File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/ply/lex.py", line 386, in token
    newtok = self.lexerrorf(tok)

  File "/home/p-himik/.local/share/virtualenvs/strayhorn-test-HPuom5qM/lib/python3.6/site-packages/edn_format/edn_lex.py", line 254, in t_error
    c=t.value[0], p=t.lexpos, a=t.value[0:100]))

  File "<string>", line unknown
SyntaxError: Illegal character '^' with lexpos 0 in the area of ...^:test []...

cannot parse empty list

edn_format.loads("()") should parse to () but I get "Syntax error! LexToken(LIST_END,')',1,1)"

Couldn't open 'parser.out' & 'edn_format.parsetab'

import edn_format
edn_format.dumps({1,2,3})
'#{1 2 3}'

edn_format.loads("[1 true nil]")
WARNING: Couldn't open 'parser.out'. [Errno 30] Read-only file system: 'python3-3.7.5-env/lib/python3.7/site-packages/edn_format/parser.out'
Generating LALR tables
WARNING: Couldn't create 'edn_format.parsetab'. [Errno 30] Read-only file system: 'python3-3.7.5-env/lib/python3.7/site-packages/edn_format/parsetab.py'
[1, True, None]

edn_format.loads_all("1 2 3 4")
WARNING: Couldn't open 'parser.out'. [Errno 30] Read-only file system: 'python3-3.7.5-env/lib/python3.7/site-packages/edn_format/parser.out'
Generating LALR tables
WARNING: Couldn't create 'edn_format.parsetab'. [Errno 30] Read-only file system: 'python3-3.7.5-env/lib/python3.7/site-packages/edn_format/parsetab.py'
[1, 2, 3, 4]

HEX Numeric notation not supported

Hex notation (e.g. 0xFFDC73) causes a traceback:

Sample EDN:

{"AMBER" {:id "AMBER" :name "Amber 80%" :color 0xFFDC73}}

pip details:

"edn-format": {
            "hashes": [
                "sha256:aa3cc3041497b7e22386f96c44658e589597d9d26df39c4a77bbac0c8091ea88"
            ],
            "index": "pypi",
            "version": "==0.6.3"
        }

  return edn_format.loads(edn_str)
File "/usr/local/lib/python3.6/site-packages/edn_format/edn_parse.py", line 161, in parse
  return p.parse(text, lexer=lex())
File "/usr/local/lib/python3.6/site-packages/ply/yacc.py", line 333, in parse
   return self.parseopt_notrack(input, lexer, debug, tracking, tokenfunc)
File "/usr/local/lib/python3.6/site-packages/ply/yacc.py", line 1120, in parseopt_notrack
   p.callable(pslice)
File "/usr/local/lib/python3.6/site-packages/edn_format/edn_parse.py", line 78, in p_map
   raise EDNDecodeError('Even number of terms required for map')
edn_format.exceptions.EDNDecodeError: Even number of terms required for map

Can't install with pip 20

With pip 20.1 installing fails due to the following error:

$ pip install edn_format
Collecting edn_format
  Downloading edn_format-0.7.1.tar.gz (18 kB)
    ERROR: Command errored out with exit status 1:
     command: /home/runner/ardupilot/python3-env/bin/python3.6 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-o3m17uee/edn-format/setup.py'"'"'; __file__='"'"'/tmp/pip-install-o3m17uee/edn-format/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-z0lgf504
         cwd: /tmp/pip-install-o3m17uee/edn-format/
    Complete output (7 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-o3m17uee/edn-format/setup.py", line 11, in <module>
        requirements = [str(ir.req) for ir in parse_requirements('requirements.txt', session=False)]
      File "/tmp/pip-install-o3m17uee/edn-format/setup.py", line 11, in <listcomp>
        requirements = [str(ir.req) for ir in parse_requirements('requirements.txt', session=False)]
    AttributeError: 'ParsedRequirement' object has no attribute 'req'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

It appears that pip's parse_requirements changed in a way that breaks setup.py. I don't use Python enough to be sure, but calling this pip-internal function directly is generally not considered a good idea.
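One way to avoid depending on pip internals is to read the file directly. The sketch below assumes requirements.txt contains only simple `name==version` lines and comments; `read_requirements` is a hypothetical helper, not the fix that was applied to the project.

```python
def read_requirements(path):
    """Return requirement strings from a simple requirements file."""
    requirements = []
    with open(path) as handle:
        for line in handle:
            # Drop trailing comments and surrounding whitespace.
            line = line.split("#", 1)[0].strip()
            # Skip blank lines and pip options such as "-r other.txt".
            if line and not line.startswith("-"):
                requirements.append(line)
    return requirements
```

In setup.py the result could then be passed as `install_requires=read_requirements('requirements.txt')`.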

pip install doesn't work

Downloading/unpacking edn-format
Downloading edn_format-0.2.tar.gz
Running setup.py egg_info for package edn-format
Traceback (most recent call last):
File "<string>", line 14, in <module>
File "/home/mariano/build/edn-format/setup.py", line 13, in <module>
long_description=open('README.md').read(),
IOError: [Errno 2] No such file or directory: 'README.md'
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 14, in <module>
File "/home/mariano/build/edn-format/setup.py", line 13, in <module>
long_description=open('README.md').read(),
IOError: [Errno 2] No such file or directory: 'README.md'

import edn_format -> SyntaxError: invalid syntax

>>> import edn_format
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.3/site-packages/edn_format/__init__.py", line 2, in <module>
    from .edn_lex import Keyword, Symbol
  File "/usr/local/lib/python3.3/site-packages/edn_format/edn_lex.py", line 157
    print "Illegal character '%s'" % t.value[0]
                                 ^
SyntaxError: invalid syntax

DeprecationWarnings with Python 3.7

When warnings are enabled, warnings like these are printed on importing edn_format with Python 3.7:

/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_dict.py:7: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  class ImmutableDict(collections.Mapping):
/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_list.py:8: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  class ImmutableList(collections.Sequence, collections.Hashable):
/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_list.py:8: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  class ImmutableList(collections.Sequence, collections.Hashable):

It would be good if edn_format was warning-free. In our specific case, the warnings pollute the test suite output.

Steps to reproduce

$ python3 --version
Python 3.7.0
$ python3 -m venv .venv
$ . .venv/bin/activate
$ pip3 install edn_format
[...]
$ python3 -Wall -c 'import edn_format'
/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_dict.py:7: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  class ImmutableDict(collections.Mapping):
/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_list.py:8: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  class ImmutableList(collections.Sequence, collections.Hashable):
/Users/dmajda/tmp/warnings/edn_format/.venv/lib/python3.7/site-packages/edn_format/immutable_list.py:8: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  class ImmutableList(collections.Sequence, collections.Hashable):
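The warning goes away when the abstract base classes are imported from `collections.abc`. A minimal sketch, with a fallback for Python 2; `FrozenMapping` is an illustrative class, not the library's ImmutableDict.

```python
try:
    from collections.abc import Mapping  # Python 3.3 and later
except ImportError:  # Python 2 has no collections.abc module
    from collections import Mapping


class FrozenMapping(Mapping):
    """Read-only mapping built on the ABC imported above."""

    def __init__(self, items=()):
        self._data = dict(items)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
```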

set equality not working with symbols

set([1,2,3,edn_format.Symbol("a")]) == set([1,2,3,edn_format.Symbol("a")]) yields False

Yet: edn_format.Symbol("a") == edn_format.Symbol("a") yields True

pypi packaged version is out of date

The source for BaseEdnType in the version on PyPI doesn't contain the implementation of __hash__(). It should probably be updated.

Currently I'm working around this with the ghetto-tastic

import edn_format.edn_lex
edn_format.edn_lex.BaseEdnType.__hash__ = lambda s: hash(s._name)

But it would be nice not to have to.
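For context, sets compare elements by hash first and equality second, so `__eq__` and `__hash__` have to agree. The sketch below uses a hypothetical `Sym` class (not the library's Symbol) that hashes the same attribute it compares.

```python
class Sym(object):
    """Name-like value whose hash is consistent with its equality."""

    def __init__(self, name):
        self._name = name

    def __eq__(self, other):
        return isinstance(other, Sym) and self._name == other._name

    def __ne__(self, other):
        return not self == other

    def __hash__(self):
        # Equal objects must hash equally, so hash the compared attribute.
        return hash(self._name)


print({1, 2, 3, Sym("a")} == {1, 2, 3, Sym("a")})  # prints True
```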

Missing support for map namespace syntax

edn_format.loads('#:foo{:bar 1}')

fails with:

edn_format.exceptions.EDNDecodeError: Illegal character '#' with lexpos 0 in the area of ...#:foo{:bar 1}...

See Map namespace syntax in the Clojure Reader reference.

Ditto for auto-resolving namespaces (that is, #::{:foo 1}), although I'm not sure whether that sort of thing is in the scope of this library.
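The semantics described in the Clojure Reader reference could be sketched as a post-processing step. To stay self-contained, this example represents keywords as plain strings starting with `:` and ignores symbols; `qualify_keys` is a hypothetical helper and not part of edn_format.

```python
def qualify_keys(namespace, mapping):
    """Apply #:namespace{...} semantics to string-encoded keyword keys."""
    result = {}
    for key, value in mapping.items():
        if isinstance(key, str) and key.startswith(":"):
            name = key[1:]
            if name.startswith("_/"):
                # The "_" namespace means "no namespace".
                key = ":" + name[2:]
            elif "/" not in name:
                # Unqualified keywords take the map's namespace.
                key = ":" + namespace + "/" + name
            # Keywords that already carry a namespace are left alone.
        result[key] = value
    return result


print(qualify_keys("foo", {":bar": 1}))  # prints {':foo/bar': 1}
```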

Less strict versions in requirements.txt

Is it necessary for the requirement pytz==2016.10 to be so strict? The tests seem to run fine with every other version I've tried (between 2011d and 2017.2), so such a strict requirements.txt seems to me to just risk causing conflicts.

complex keys not supported in Map

{[1 2 3] "some numbers"} fails to parse because vectors are read as Python lists, which are unhashable and so cannot be used as keys of a native dict.

File "/Library/Python/2.7/site-packages/edn_format/edn_parse.py", line 66, in p_map
p[0] = dict([terms[i:i+2] for i in range(0, len(terms), 2)]) # partition terms in pairs
TypeError: unhashable type: 'list'

vectors are parsed as mutable lists

Now, emitting mutable values may or may not be an issue depending on the point of view, but since you went to the trouble of implementing immutable dicts, you probably want them to be completely immutable.

data = edn_format.loads('{5 [1 2 3]}')
print( type(data) ) # <class 'edn_format.immutable_dict.ImmutableDict'>
data[5].append(8)
print(data) # {5: [1, 2, 3, 8]}

...Also, Python lists are unhashable, so both "{[1 2 3] 5}" and "#{[1 2 3] 5}" generate TypeError.

Proposed solution is to add a corresponding class (named, for example, ImmutableVector), implemented in similar fashion to ImmutableDict; however, whether it extends list or not (it would make sense to implement it using tuples), it would break code that mutates parsed vectors.
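A tuple-based version of the proposed class might look like this. It is a sketch of the idea, not the class the library ended up shipping; subclassing `tuple` gives immutability and hashing for free, provided the elements are themselves hashable.

```python
class ImmutableVector(tuple):
    """Tuple-backed vector: hashable, and without append() or item assignment."""

    __slots__ = ()  # no per-instance __dict__, so no stray attributes

    def __repr__(self):
        return "ImmutableVector(" + repr(list(self)) + ")"


vector = ImmutableVector([1, 2, 3])
lookup = {vector: 5}                     # usable as a dict key
members = frozenset([vector, 5])         # and as a set element
print(lookup[ImmutableVector([1, 2, 3])])  # prints 5
```

As the report notes, code that calls `append` on parsed vectors would break with this design, since tuples have no mutating methods.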
