amazon-ion / ion-python



A Python implementation of Amazon Ion.

Home Page: https://amazon-ion.github.io/ion-docs/

License: Apache License 2.0

Python 92.05% C 7.95%


ion-python's Issues

Managed Reader

Implement a raw reader decorator that processes system values.

Add license to sdist

For packaging, would you mind including the license, tests, etc. in the sdist uploaded to PyPI?

See .spec at https://build.opensuse.org/package/show/home:jayvdb:moban/python-amazon.ion

Implement load and dump style APIs

Provide APIs compatible or similar to simplejson and the built-in json module over the non-blocking event APIs.

We often have scenarios where input arrives as JSON, is converted to Ion to apply business rules, append data, etc., and then needs to be converted back to JSON. IonJava and other implementations support translating Ion to JSON; please add this capability to Python.

Is it possible to use Enum from standard library instead of creating a custom class called Enum?

https://github.com/amzn/ion-python/blob/master/amazon/ion/util.py#L59

For Python 2.7 support, you can declare a dependency on the enum backport, which is available under the Python license.
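A minimal sketch of what an Ion type enumeration built on the standard library's `enum.IntEnum` could look like. Only a subset of members is shown, and apart from `BOOL: 1` (which matches the event dump quoted in another issue on this page) the numeric values are assumptions for illustration, not ion-python's actual definitions.

```python
from enum import IntEnum


class IonType(IntEnum):
    # Illustrative subset; values other than BOOL are assumed for this sketch.
    NULL = 0
    BOOL = 1
    INT = 2
    LIST = 10
    SEXP = 11
    STRUCT = 12

    @property
    def is_container(self):
        # Properties are descriptors, so Enum does not turn them into members.
        return self in (IonType.LIST, IonType.SEXP, IonType.STRUCT)
```

Because `IntEnum` members are also `int`s, comparisons and lookups such as `IonType(1)` keep working for code that previously relied on integer values.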

Add `str()`/`repr()` Implementation to `IonPyDict`

Issue

IonPyDict does not have a readable string representation when printed to the console; for example, I currently get the following output:

<amazon.ion.simple_types._ion_type_for.<locals>.IonPyValueType object at 0x100947cc0>

when running this sample code:

import io
import amazon.ion.simpleion as ion

myobj = {'StrKey': 'StrValue', 'IntKey': 123}
fp = io.BytesIO()
ion.dump(myobj, fp)
ion_dict = ion.load(io.BytesIO(fp.getvalue()))
print(ion_dict)

Expected Behavior

Similar output to printing the native Python dict directly, e.g.:

{'StrKey': 'StrValue', 'IntKey': 123}
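The `object at 0x...` output is what `object.__repr__` produces, which would happen if the struct type is built on `collections.abc.MutableMapping` (which defines no `__repr__`) rather than on `dict`. That is an assumption about the cause. The sketch below uses a hypothetical `IonPyDictSketch` class to show a `__repr__` that mirrors the built-in dict rendering.

```python
from collections.abc import MutableMapping


class IonPyDictSketch(MutableMapping):
    """Hypothetical mapping-based struct; not ion-python's IonPyDict."""

    def __init__(self, *args, **kwargs):
        self._store = {}
        self.update(*args, **kwargs)

    def __getitem__(self, key):
        return self._store[key]

    def __setitem__(self, key, value):
        self._store[key] = value

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)

    def __repr__(self):
        # Render like a plain dict; object.__str__ delegates here, so print() matches too.
        return repr(dict(self.items()))
```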

Add simpleion support for structs with multiple values for the same key

Simpleion maps Ion structs to python dicts, whose keys are de-duped. IonPyDict needs to support duplicate mappings when iterated and compared.

Successful resolution of this issue will involve removal of the following files from the test_vectors.py skip list:
good/equivs/structsFieldsRepeatedNames.ion
good/nonequivs/structs.ion
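A minimal sketch of a struct that keeps repeated field names, assuming struct equivalence means "same multiset of (field, value) pairs, in any order". `MultiStruct` and its methods are hypothetical names, and the `Counter`-based comparison assumes hashable values.

```python
from collections import Counter


class MultiStruct:
    """Hypothetical struct keeping every (field, value) pair in insertion order."""

    def __init__(self, pairs=()):
        self._pairs = list(pairs)

    def add(self, field, value):
        self._pairs.append((field, value))

    def __getitem__(self, field):
        # Dict-like lookup: return the last value stored under this field name.
        for name, value in reversed(self._pairs):
            if name == field:
                return value
        raise KeyError(field)

    def get_all(self, field):
        return [value for name, value in self._pairs if name == field]

    def __iter__(self):
        # Yields each field name once per occurrence, duplicates included.
        return (name for name, _ in self._pairs)

    def __eq__(self, other):
        # Order-insensitive, but duplicate pairs are counted (assumes hashable values).
        if not isinstance(other, MultiStruct):
            return NotImplemented
        return Counter(self._pairs) == Counter(other._pairs)
```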

Fix Python 3.3/3.4 Compatibility

Python 3.3/3.4 is disabled in our Travis CI configuration. The main reason is that some of our code applies the % operator to bytes, which is only supported starting in Python 3.5.

We need to fix this and re-enable the Travis CI configuration.
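One portable pattern, sketched with a hypothetical helper name `ascii_bytes`, is to format a `str` and then encode it. This assumes the formatted output is pure ASCII; interpolating bytes operands would instead need concatenation or `b"".join(...)`.

```python
def ascii_bytes(fmt, *args):
    """Format with str %, then encode; runs on Python 3.3/3.4 as well as 3.5+."""
    return (fmt % args).encode("ascii")

# Python 3.5+ only (PEP 461):   b"%d.%d" % (1, 0)
# Portable equivalent:          ascii_bytes("%d.%d", 1, 0)
```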

Binary Ion Writer

Implement non-blocking Ion binary writer.

Mapping of tuple type

In 7651c7a I see that a python tuple is mapped to IonType.LIST. My expectation is that it would be to IonType.SEXP. At least that's what I have in my workspace and it works as I would expect.

Isn't mapping it to a list effectively losing data?

Shorten Package Name

Currently the package is nested as amazon.ion, this is somewhat verbose and requires the use of namespace packages which have idiomatic support in Python 2, but are non-standard. I propose we shorten the name to a single level of depth. Some possibilities:

amzion - my preferred choice.
ion - potentially conflicting with other modules with a sufficiently general name.
amazonion - too long in my opinion.
anion - a pun on negatively charged Ion and "an Ion" implementation. Negative connotation may make this not a good choice.

Helper to maintain 'Noneness' for `None` values converted to `IonPyNull`

Issue

simpleion does not provide a way for None python values to be converted back into None from IonPyNull.

For example, the following code:

import io
import amazon.ion.simpleion as ion

def print_is_none(obj):
    if obj['NoneKey'] is None:
        print("Value is None")
    else:
        print("Value is not None")

myobj = {'NoneKey': None}
print_is_none(myobj)

fp = io.BytesIO()
ion.dump(myobj, fp)
ion_dict = ion.load(io.BytesIO(fp.getvalue()))

print_is_none(ion_dict)

prints:

Value is None
Value is not None

because ion_dict['NoneKey'] returns an instance of IonPyNull.

Expected Behavior

simpleion provides some mechanism by which customers can return to the original Python domain types, e.g., so that Noneness is maintained across conversions to/from Ion.
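A minimal sketch of such a mechanism: a hypothetical `to_native` helper that walks a loaded value and replaces nulls with `None`. The null test is a caller-supplied predicate; with ion-python one would presumably pass something like `lambda v: isinstance(v, IonPyNull)`, which is an assumed usage. Note that rebuilding containers as plain dicts, lists, and tuples drops any Ion annotations they carried.

```python
from collections.abc import Mapping


def to_native(value, is_null):
    """Recursively replace Ion nulls (per is_null) with None (hypothetical helper)."""
    if is_null(value):
        return None
    if isinstance(value, Mapping):
        return {key: to_native(item, is_null) for key, item in value.items()}
    if isinstance(value, list):
        return [to_native(item, is_null) for item in value]
    if isinstance(value, tuple):
        return tuple(to_native(item, is_null) for item in value)
    return value
```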

Integration with PyPI

Distribute Ion Python to Python Package Index.

Investigate poor performance with large symbol table import max IDs

The text reader's performance degrades or hangs when faced with imports with huge max IDs. For example (taken from good/subfieldVarUInt32bit.ion):

$ion_1_0

// 5 bytes
// 31 bits
// hex: 0x7fffffff
// dec: 2147483647
$ion_1_0
$ion_symbol_table::{
    imports:[ { name: "com.amazon.blah.blah.blah",
                version: 1,
                max_id: 2147483636 // 2147483646 - 1 - 9 (system symbols)
              } ],
    symbols:[ "boundary-1",
              "boundary",
              "boundary+1"
            ]
}
'boundary-1'::1         // $2147483646
'boundary'  ::1         // $2147483647 2^31 - 1
'boundary+1'::1         // $2147483648

This should be fixed. Successful resolution of this issue will involve removal of the following files from the test_vectors.py skip list:
subfieldVarUInt.ion
subfieldVarUInt32bit.ion
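One way to avoid the slowdown (an assumption about the fix, not a diagnosis of the current reader) is to resolve SIDs arithmetically instead of allocating a slot per imported symbol. In the sketch below, `LazySymbolTable` is hypothetical. With 9 system symbols and `max_id: 2147483636`, the import covers SIDs 10 to 2147483645, so the first local symbol lands on $2147483646, matching the comments in the vector.

```python
# Ion 1.0 system symbols occupy SIDs 1..9.
SYSTEM_SYMBOLS = ("$ion", "$ion_1_0", "$ion_symbol_table", "name", "version",
                  "imports", "symbols", "max_id", "$ion_shared_symbol_table")


class LazySymbolTable:
    """Hypothetical symbol table that never materializes imported slots."""

    def __init__(self, imports, local_symbols):
        # imports: iterable of (name, max_id, known_texts); known_texts is None
        # when the shared table is unavailable.
        self._ranges = []
        next_sid = len(SYSTEM_SYMBOLS) + 1
        for _name, max_id, texts in imports:
            self._ranges.append((next_sid, max_id, list(texts or ())))
            next_sid += max_id
        self._local_start = next_sid
        self._local = list(local_symbols)

    def text_for(self, sid):
        """Return the text for sid, or None for an import slot with unknown text."""
        if 1 <= sid <= len(SYSTEM_SYMBOLS):
            return SYSTEM_SYMBOLS[sid - 1]
        if sid >= self._local_start:
            index = sid - self._local_start
            if index < len(self._local):
                return self._local[index]
            raise KeyError(sid)
        for start, max_id, texts in self._ranges:
            if start <= sid < start + max_id:
                offset = sid - start
                return texts[offset] if offset < len(texts) else None
        raise KeyError(sid)
```

Lookups are O(number of imports) rather than O(max_id), so a huge `max_id` costs nothing up front.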

Add roundtrip testing to test_vectors.py

The harness that uses the ion-tests vectors (test_vectors.py) is currently a test of the text and binary reader implementations. An additional step should be added for good/ vectors to verify that equality is maintained when written and re-read (in both text and binary). This will add coverage to the writer implementations and further verify that the reader and writer implementations are compatible and correct.

Tox Configuration

Support testing across all the versions of Python we support with Tox.

This would've been helpful for finding #19 and #20 without running in CI first.

boolean values not converted to the correct scalar type

In the simpleion _load method, the scalar evaluation of boolean values results in an integer being returned. This seems to be linked to the simple_types mapping IonPyBool = IonPyInt, with no patching to provide the correct type (True or False) based on the IonType.

In the _load reader loop the event correctly identifies the boolean value e.g.

_IonManagedThunkEvent(event_type=<IonEventType.SCALAR: 1>, ion_type=<IonType.BOOL: 1>, value=True, field_name=SymbolToken(text='key', sid=97, location=ImportLocation(name='namespace', position=88)), annotations=(), depth=4)

However, in the returned dict I get:

            "properties": {
                "key": 1
            }

Adding this to the _load event loop achieved the effect but it's not ideal.

    while event.event_type is not end_type:
        ion_type = event.ion_type
        if event.event_type is IonEventType.CONTAINER_START:
            container = _FROM_ION_TYPE[ion_type].from_event(event)
            _load(container, reader, IonEventType.CONTAINER_END, ion_type is IonType.STRUCT)
            add(container)
        elif event.event_type is IonEventType.SCALAR:
            if event.value is None or ion_type is IonType.NULL or event.ion_type.is_container:
                scalar = IonPyNull.from_event(event)
            elif ion_type is IonType.BOOL:
                scalar = event.value
            else:
                scalar = _FROM_ION_TYPE[ion_type].from_event(event)
            add(scalar)
        event = reader.send(NEXT_EVENT)

Binary Ion Reader

Implement non-blocking Ion binary parser.

Fix validation of binary annotation wrapper length subfields

The binary reader needs improved validation of annotation wrappers' various length subfields.

The following is from bad/container_length_mismatch.10n:

\xe0\x01\x00\xea\xe6\x81\x84\x71\x04\x71\x04

This contains a version marker, followed by an annotation wrapper of total specified length 6 octets (type descriptor \xe6), followed by a one-octet annot_length field (\x81) which specifies that the annotation SIDs (VarUInts) total one octet, followed by octet \x84 which corresponds to SID 4, followed by two symbol values representing SID 4. This passes, returning two unannotated symbol 4 values. However, it should fail because the annotation wrapper's value ended before reaching the 6-octet length specified in the type descriptor.

Annotation wrappers are invalid unless their type descriptor length == (the length of the annot_length subfield) + (the UInt value of the annot_length subfield) + (the total length of the wrapped value (the first value following the annot subfield(s))). In this case, 6 != (1 + 1 + 2).

Similarly, annotation wrappers with no annot subfields should be rejected. The following is from bad/emptyAnnotatedInt.10n:
\xe0\x01\x00\xea\xe3\x80\x21\x01

This contains a version marker, followed by an annotation wrapper of total specified length 3 octets (type descriptor \xe3), followed by a one-octet annot_length field (\x80) which specifies that the annotation SIDs total zero octets (i.e. there aren't any annotation SIDs), followed by the binary representation of the int value 1. This passes, returning an unannotated int 1. However, it should fail, as annotation wrappers must have at least one annotation SID.

Successful resolution of this issue will involve removal of the following files from the test_vectors.py skip list:
bad/container_length_mismatch.10n
bad/emptyAnnotatedInt.10n
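Both checks can be expressed directly from the length rule above. This is a minimal sketch, not ion-python's reader: it assumes the data is fully buffered in a `bytes` object, the helper names are hypothetical, and `value_length` only handles the encodings needed here (inline lengths, L=14 VarUInt lengths, typed nulls, bools, and the L=1 ordered struct).

```python
def read_varuint(data, pos):
    """Decode an Ion VarUInt at pos; return (value, next_pos)."""
    value = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated VarUInt")
        octet = data[pos]
        pos += 1
        value = (value << 7) | (octet & 0x7F)
        if octet & 0x80:  # high bit marks the final octet
            return value, pos


def value_length(data, pos):
    """Total octets (descriptor + length field + body) of the value at pos."""
    if pos >= len(data):
        raise ValueError("wrapped value is missing")
    type_code, length_code = data[pos] >> 4, data[pos] & 0x0F
    if length_code == 15 or type_code == 1:  # typed nulls and bools: descriptor only
        return 1
    if length_code == 14 or (type_code == 0xD and length_code == 1):
        length, after = read_varuint(data, pos + 1)
        return (after - pos) + length
    return 1 + length_code


def check_annotation_wrapper(data, pos):
    """Validate the annotation wrapper at pos; return the offset just past it."""
    type_code, length = data[pos] >> 4, data[pos] & 0x0F
    if type_code != 0xE:
        raise ValueError("not an annotation wrapper")
    pos += 1
    if length == 14:
        length, pos = read_varuint(data, pos)
    start = pos
    annot_length, annots_start = read_varuint(data, pos)
    if annot_length == 0:
        raise ValueError("annotation wrapper has no annotation SIDs")
    value_pos = annots_start + annot_length
    # Rule: wrapper length == len(annot_length field) + annot_length + len(wrapped value)
    consumed = (annots_start - start) + annot_length + value_length(data, value_pos)
    if consumed != length:
        raise ValueError("wrapper length %d != %d octets used" % (length, consumed))
    return start + length
```

For container_length_mismatch.10n this computes 1 + 1 + 2 = 4 against a declared 6 and raises; for emptyAnnotatedInt.10n it raises on the zero annot_length.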

Documentation Generation

Integrate with Sphinx for generating code documentation that can be published.

Support text timestamps with more than 6 digits of fractional precision

The text reader does not support timestamps with more than 6 digits of fractional precision because Python's datetime only supports microsecond precision. The current logic raises an error instead of silently rounding or truncating.

Consider adding an extension to ion-python's Timestamp type to allow roundtripping of arbitrary fractional precision with full fidelity.

Successful resolution of this issue will involve removal of the following file from the test_vectors.py skip list:
good/equivs/timestampsLargeFractionalPrecision.ion
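A minimal sketch of one way such an extension could keep full fidelity: carry the exact fraction as a `Decimal` alongside a `datetime` that holds the truncated microseconds. The helper `split_fraction` and the tuple pairing are hypothetical, not ion-python's Timestamp API.

```python
from datetime import datetime
from decimal import Decimal, ROUND_DOWN


def split_fraction(digits):
    """Return (microseconds for datetime, exact fractional seconds as Decimal)."""
    fraction = Decimal("0." + digits)  # exact: keeps every digit, including trailing zeros
    micros = int(fraction.scaleb(6).to_integral_value(rounding=ROUND_DOWN))
    return micros, fraction


# Hypothetical pairing: datetime for arithmetic, Decimal for full-fidelity output.
micros, fraction = split_fraction("123456789")
stamp = (datetime(2007, 2, 23, 12, 14, 33, micros), fraction)
```

A writer would then emit the fractional digits from the `Decimal` rather than from `datetime.microsecond`, so nothing is lost on roundtrip.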

consider dropping the "amazon" package namespace

The "amazon" package namespace is effectively closed due to behavior around __init__.py (see the-init-py-trap). This prevents definition of other subpackages in other directory structures (or repositories), e.g. amazon.ionhash.

That said, perhaps the "amazon" namespace isn't adding value; we should consider dropping it.

Fix Python 2.6 Compatibility

Python 2.6 is disabled in Travis CI. The principal reason appears to be that the tests make assumptions about float-to-string encoding, which is not as normalized in Python 2.6 as in later versions.

We need to fix the tests and re-enable the Travis CI config.

Publish Documentation to Read the Docs

Publish generated documentation to Read the Docs.

Raw Binary Writer Should Support VERSION_MARKER Events

Writing a core.ION_VERSION_MARKER_EVENT fails on the raw writer:

>>> from amazon.ion import *
>>> import io
>>> w = writer.blocking_writer(writer_binary_raw._raw_binary_writer(writer_buffer.BufferTree()), io.BytesIO())
>>> w.send(core.ION_VERSION_MARKER_EVENT)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    w.send(core.ION_VERSION_MARKER_EVENT)
  File "/usr/local/lib/python3.6/site-packages/amazon/ion/writer.py", line 150, in blocking_writer
    for result_event in _drain(writer, ion_event):
  File "/usr/local/lib/python3.6/site-packages/amazon/ion/writer.py", line 129, in _drain
    result_event = writer.send(ion_event)
  File "/usr/local/lib/python3.6/site-packages/amazon/ion/writer.py", line 111, in writer_trampoline
    trans = trans.delegate.send(Transition(ion_event, trans.delegate))
  File "/usr/local/lib/python3.6/site-packages/amazon/ion/writer_binary_raw.py", line 393, in _raw_writer_coroutine
    fail()
  File "/usr/local/lib/python3.6/site-packages/amazon/ion/writer_binary_raw.py", line 351, in fail
    raise TypeError('Invalid event: %s at depth %d' % (ion_event, depth))
TypeError: Invalid event: IonEvent(event_type=<IonEventType.VERSION_MARKER: 0>, ion_type=None, value=(1, 0), field_name=None, annotations=(), depth=0) at depth 0

Looking more closely, it appears that the managed writer explicitly writes the IVM, which is incorrect: the managed writer should propagate the IVM event to the raw writer, and the raw writer should emit the byte sequence.
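A minimal sketch of the raw-writer side of that behavior. The `Event` namedtuple is a simplified hypothetical stand-in for `IonEvent` (string event types instead of `IonEventType`), and `raw_write_version_marker` is a hypothetical name. The four bytes are the Ion 1.0 binary version marker that also appears at the start of the test vectors quoted on this page.

```python
import io
from collections import namedtuple

IVM_BYTES = b"\xe0\x01\x00\xea"  # Ion 1.0 binary version marker

# Simplified stand-in for IonEvent (hypothetical, for illustration only).
Event = namedtuple("Event", "event_type value depth")


def raw_write_version_marker(event, out):
    """Hypothetical raw-writer step: emit the IVM bytes for a VERSION_MARKER event."""
    if event.event_type != "VERSION_MARKER":
        raise TypeError("not a version marker event")
    if event.depth != 0:
        raise TypeError("version marker is only valid at depth 0")
    if tuple(event.value) != (1, 0):
        raise ValueError("unsupported Ion version %r" % (event.value,))
    out.write(IVM_BYTES)
```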

Deep Copy fails / Save to DynamoDB fails

I can't save an Ion payload to DynamoDB with Boto3. It looks like Boto3 makes a deep copy prior to saving, which throws.

import boto3
import copy
from decimal import Decimal
from amazon.ion import simpleion


d = {"value": Decimal('1.1')}
payload = simpleion.dumps(d)  # binary Ion by default, so this is bytes, not str
payloadValue = simpleion.loads(payload)  # loads() accepts the bytes directly
c = copy.deepcopy(payloadValue) # This fails
table = boto3.resource('dynamodb').Table('MyTable')
table.put_item(Item=payloadValue) # This fails because it makes a deep copy

Should an Ion struct be able to be deep copied? Stack below.

TypeError: __new__() missing 1 required positional argument: 'value'

Traceback (most recent call last):
  File "c:\Projects\\lambda\lambda_function.py", line 32, in <module>
    bug()
  File "c:\Projects\\lambda\lambda_function.py", line 29, in bug
    c = copy.deepcopy(payloadValue)
  File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 280, in _reconstruct
    state = deepcopy(state, memo)
  File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 150, in deepcopy
    y = copier(x, memo)
  File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 240, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 274, in _reconstruct
    y = func(*args)
  File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 273, in <genexpr>
    args = (deepcopy(arg, memo) for arg in args)
  File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 180, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 274, in _reconstruct
    y = func(*args)
  File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copyreg.py", line 88, in __newobj__
    return cls.__new__(cls, *args)
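The last frame (`copyreg.__newobj__` calling `cls.__new__(cls, *args)`) points at one plausible cause: `copy` rebuilds the object from the arguments returned by `__getnewargs__`, and a built-in base such as `int` returns only the bare value, so a subclass whose `__new__` also requires metadata receives too few arguments. Whether ion-python's IonPy* types fail for exactly this reason is an assumption. The sketch uses a hypothetical `IonPyIntSketch` class to show the failure mode and a `__getnewargs__` fix.

```python
import copy


class IonPyIntSketch(int):
    """Hypothetical stand-in for an IonPy* scalar whose __new__ needs extra args."""

    def __new__(cls, ion_type, value, annotations=()):
        obj = super().__new__(cls, value)
        obj.ion_type = ion_type
        obj.annotations = tuple(annotations)
        return obj

    def __getnewargs__(self):
        # copy.deepcopy (and pickle) rebuild via cls.__new__(cls, *self.__getnewargs__()).
        # int's default returns only (int(self),), which reproduces
        # "__new__() missing 1 required positional argument: 'value'".
        return (self.ion_type, int(self), self.annotations)


original = {"count": IonPyIntSketch("INT", 12, annotations=("a",))}
clone = copy.deepcopy(original)
```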

IonEvent.field_name is now a string in cases where it used to be a SymbolToken

#97 caused an unexpected change in behavior that broke ion-hash-python (see amazon-ion/ion-hash-python#10). Prior to that change, IonEvent's field_name property (for events derived from an IonNature instance) was a SymbolToken instance. After the change, field_name became a string, as the _dump logic is now used to iterate over the keys (field names) of a dict to generate IonEvents, instead of relying on IonEvents previously held by IonNature.

Omit Ion identifier for pretty printing text

When pretty-printing elements from an array of Ion values, and the context makes it clear that each value is Ion, allow the $ion_1_0 prefix to be omitted as it is redundant and adds clutter to the output.

Implement dumps and loads in simpleion module.

Now that dump and load are implemented with text and binary support, this should be simple.
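A generic sketch of that idea: `dumps`/`loads` as thin wrappers over an in-memory buffer. `make_dumps` and `make_loads` are hypothetical names, and the sketch assumes `dump`/`load` write and read binary file objects (as binary Ion does in the examples on this page); text output might need a `str` buffer instead.

```python
import io


def make_dumps(dump):
    """Build dumps(obj, **kw) -> bytes from a dump(obj, fp, **kw) function."""
    def dumps(obj, **kwargs):
        fp = io.BytesIO()
        dump(obj, fp, **kwargs)
        return fp.getvalue()
    return dumps


def make_loads(load):
    """Build loads(data, **kw) from a load(fp, **kw) function."""
    def loads(data, **kwargs):
        return load(io.BytesIO(data), **kwargs)
    return loads
```

Any pair with the same file-object contract works; `pickle.dump`/`pickle.load` make a convenient stand-in for testing.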

Travis CI Default Xenial Breaks Python 2.6/3.3

This change in Travis CI breaks Python 2.6 and Python 3.3 compatibility. We should do two things:

Make the dist explicit so we don't break in a non-deterministic way when Travis CI changes. Specifically we should do one of:
- Remove support for Python 3.3 and 2.6 (both are EOL) and make dist: xenial.
- Make dist: trusty to support the existing 2.6/3.3 dependency.

Cookbook Guide

Add good surrounding documentation covering setup, development, contribution, and getting started. Also integrate it into our existing cookbook.

Text Ion Reader

Implement a non-blocking Ion text parser.

Fix binary read support for ordered structs

According to the spec for the struct type, there is a special case for the type descriptor \xD1: "When L is 1, the struct has at least one symbol/value pair, the length field exists, and the field name integers are sorted in increasing order."

The binary reader currently doesn't support this case and fails on structs encoded this way, overrunning by one byte. Support for the special case of a struct with L=1 should be added.

Successful resolution of this issue will involve removal of the following files from the test_vectors.py skip list:
good/structAnnotatedOrdered.10n
good/structOrdered.10n
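A minimal sketch of the length decoding the L=1 case requires, with a hypothetical `struct_body` helper: like L=14, an ordered struct carries an explicit VarUInt length after the descriptor, and per the quoted spec it must contain at least one field. Treating the byte immediately after the descriptor as an inline length is exactly what causes the one-byte overrun.

```python
def read_varuint(data, pos):
    """Decode an Ion VarUInt at pos; return (value, next_pos)."""
    value = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated VarUInt")
        octet = data[pos]
        pos += 1
        value = (value << 7) | (octet & 0x7F)
        if octet & 0x80:  # high bit marks the final octet
            return value, pos


def struct_body(data, pos):
    """Return (body_start, body_length) for the struct descriptor at pos."""
    type_code, length_code = data[pos] >> 4, data[pos] & 0x0F
    if type_code != 0xD:
        raise ValueError("not a struct type descriptor")
    if length_code == 15:
        raise ValueError("null.struct has no body")
    if length_code in (1, 14):
        # L=1 (ordered struct) and L=14 both carry an explicit VarUInt length.
        length, body_start = read_varuint(data, pos + 1)
        if length_code == 1 and length == 0:
            raise ValueError("ordered struct must contain at least one field")
        return body_start, length
    return pos + 1, length_code
```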

Travis CI Support

Enable Travis CI builds for the unit tests for the Python versions we support (2.6+, 3.3+)

Tests fail on i586 for Python 3.7

[  254s] ==================================== ERRORS ====================================
[  254s] ____________________ ERROR collecting tests/test_vectors.py ____________________
[  254s] tests/test_vectors.py:72: in <module>
[  254s]     getcontext().Emax = 100000000000000000
[  254s] E   OverflowError: Python int too large to convert to C ssize_t
[  254s] !!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!
[  254s] =========================== 1 error in 13.40 seconds ===========================

https://build.opensuse.org/package/live_build_log/home:jayvdb:moban/python-amazon.ion/openSUSE_Tumbleweed/i586
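The `decimal` module exposes `MAX_EMAX`, whose value depends on the platform (999999999999999999 on 64-bit builds, 425000000 on 32-bit builds), so a hard-coded `Emax` of 10^17 cannot be stored on i586. A minimal sketch of one possible fix, using a hypothetical helper name: clamp the requested value instead of assigning it unconditionally, which keeps the current setting on 64-bit platforms.

```python
import decimal


def widen_decimal_exponent(ctx=None, wanted_emax=100000000000000000):
    """Set ctx.Emax as high as requested, capped at the platform's MAX_EMAX."""
    ctx = ctx if ctx is not None else decimal.getcontext()
    ctx.Emax = min(wanted_emax, decimal.MAX_EMAX)
    return ctx.Emax
```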

Add simpleion support for roundtripping streams with unavailable symbol table imports

When a shared symbol table is not found, a stream can still be read; the symbols referenced from the unavailable imports simply have "unknown" text (e.g. "$128"). When this happens, it should be possible to re-write the stream, referencing the not-found imports, so that in the future (or down the pipeline) other readers can process the full-fidelity data if they have access to that import.

This is not currently possible with simpleion. All keys (field names) are strings, so field name tokens with unknown text will all collapse to None. To support this feature, all symbol tokens with unknown text need to be preserved as SymbolTokens (with None text, a non-None SID, and a valid import location). A tailored implementation of dict is likely necessary for full support. See also: #36.

Successful resolution of this issue will involve removal of the following files from the test_vectors.py skip list:
good/item1.10n
good/nonequivs/symbolTablesUnknownText.ion

AttributeError: 'tuple' object has no attribute 'ion_type'

Issue

Upon using the simpleion.dump method to encode a tuple in Ion binary, I encountered the following error:


Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/amazon/ion/simpleion.py", line 184, in _dump
    ion_type = obj.ion_type
AttributeError: 'tuple' object has no attribute 'ion_type'

The string representation of the dict I am attempting to encode:

{'TupleKey': ('a', 'tuple')}

Expected Behavior

The docs for simpleion.dump list a possible tuple -> Ion list conversion, so I expected to receive an Ion list as part of the encoded bytes.

Consolidate Binary Writer/Reader Constants

We have some duplication in the binary reader/writer code around things like type constants; we should consolidate them.

Consider allowing callers of simpleion.dump to provide their own writer

ion-hash-python currently relies on the private-by-convention _dump method instead of the public dump method because the former allows you to provide your own writer.

We should investigate whether there are enough other use cases for providing a writer that we can add that to dump, allowing ion-hash-python to use the public version. (Alternatively, if there aren't other use cases, we could at least add a comment to _dump indicating that it is relied upon by a sibling project.)

Ion Python on the Web

Consider using something like PyPy.js to build an interactive shell for playing with Ion python and provide a client-side Ion playground.

Behavior change in bit_length implementation for python 2.6

https://github.com/amzn/ion-python/blob/26cc0023cb4bfd82c1f2aec6d0b82205ffe92d6f/amazon/ion/util.py#L288

Implementations differ for 0 and negative numbers.

>>> (0).bit_length()
0
>>> len(bin(0)) - 2
1
>>> (-1).bit_length()
1
>>> len(bin(-1)) - 2
2

Not sure if this is important or not for the usages of bit_length, just thought I'd point it out.
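A fallback that matches `int.bit_length()` for every integer only needs to special-case zero and use the magnitude for negatives (`bin(-1)` is `'-0b1'`, which is where the extra character comes from). A minimal sketch:

```python
def bit_length(n):
    """Same result as int.bit_length(): 0 -> 0, negatives use their magnitude."""
    if n == 0:
        return 0
    return len(bin(abs(n))) - 2  # strip the '0b' prefix
```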

SimpleIon load performance: 7 minutes on a 30 MB Ion text file

Hi, it looks like there's a big performance issue with the simpleion load/loads functions. I did a little experiment:

30 MB Ion text: 7 minutes to load into memory
The same data converted to binary: ~30 seconds to load into memory

The JSON load performance on the same data is ~550 ms. IonCPP utils is also pretty fast for both text and binary. Are there any plans to speed this up?

Thanks!

Allow unicode code points to be written in native UTF-8 representation within quoted text

Currently, the text writer escapes any code point outside of ASCII range. This is required when the output encoding is restricted to ASCII, but causes unnecessary bloating when UTF-8 output is acceptable. This could be made configurable.
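A minimal sketch of what such a switch could look like, using a hypothetical `escape_ion_string` function with an `ascii_only` flag (not an existing ion-python option). Ion text accepts the `\xHH`, `\uHHHH`, and `\UHHHHHHHH` escapes used here; when `ascii_only` is false, non-ASCII characters pass through and are encoded as UTF-8 by whatever writes the output.

```python
def escape_ion_string(text, ascii_only=True):
    """Quote text as an Ion string; escape non-ASCII only when ascii_only is True."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in ('"', '\\'):
            out.append('\\' + ch)
        elif cp < 0x20:  # control characters are always escaped
            out.append('\\x%02x' % cp)
        elif cp > 0x7F and ascii_only:
            out.append('\\u%04x' % cp if cp <= 0xFFFF else '\\U%08x' % cp)
        else:
            out.append(ch)
    return '"' + ''.join(out) + '"'
```

For example, `café` costs 6 extra characters as `\u00e9` in ASCII mode, but only one extra byte as raw UTF-8.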

Coveralls.io Support

Integrate code coverage with coveralls.io

Run ion-hash-python's tests in ion-python's Travis builds.

ion-hash-python is a companion project that uses simpleion's internal _dump method. When breaking changes are made to _dump, ion-hash-python breaks. Running ion-hash-python's tests within this repository will detect the breakage before it is pushed.

Embedding IonPyValues in dict

I'm seeing some inexplicable behavior when I embed IonPyValues in a standard dict and then round-trip that dict through Ion.

from amazon.ion.simpleion import loads, dumps
my_struct = {'spam': "eggs", 'count': 12}
i_struct = loads(dumps(my_struct))
other_struct = {'other': 24, 'embedded': i_struct["count"]}
io_struct = loads(dumps(other_struct))
dumps(io_struct, binary=False)

prints:

'$ion_1_0 {other:24,count:12}'

My expectation is that it would print:

'$ion_1_0 {other:24,embedded:12}'

Put the simpleion module docs front-and-center

Python developers using this library for the first time have reported difficulty getting started. We currently provide a link to the complete docs, but we should point to the simpleion module specifically and call it out early in the README file. It would be nice to have an example of load, loads, dump, and dumps in the README.

Keeping order of insertion

It would be very useful to have an option to preserve the insertion order of keys. This option would be different from sort_keys or item_sort_keys, where the user needs to specify how the sorting is done.

I think this has been also requested for simplejson, but it was discarded because it wouldn't be valid JSON as JSON is unordered.

Would such a feature be desirable in simpleion?

Python 3.7 compatibility

test_simpleion fails on Python 3.7, as does vectors test data.

See https://travis-ci.org/jayvdb/ion-python/jobs/495377901
