amazon-ion / ion-python
A Python implementation of Amazon Ion.
Home Page: https://amazon-ion.github.io/ion-docs/
License: Apache License 2.0
Implement a raw reader decorator that processes system values.
For packaging, would you mind including the license, tests, etc. in the sdist uploaded to PyPI?
See .spec at https://build.opensuse.org/package/show/home:jayvdb:moban/python-amazon.ion
Provide APIs compatible with or similar to simplejson and the built-in json module over the non-blocking event APIs.
Often we have scenarios where input arrives as JSON, is converted to Ion to meet some business requirement (appending data, etc.), and then needs to be converted back to JSON. IonJava and others support translating Ion to JSON. Please add this extension for Python.
https://github.com/amzn/ion-python/blob/master/amazon/ion/util.py#L59
For Python 2.7 support, you can declare a dependency on enum, which is available under the Python license.
IonPyDict does not have a nice string representation when printed to the console, e.g. I currently get the following output:
<amazon.ion.simple_types._ion_type_for.<locals>.IonPyValueType object at 0x100947cc0>
when running this sample code:
import io
import amazon.ion.simpleion as ion
myobj = {'StrKey': 'StrValue', 'IntKey': 123}
fp = io.BytesIO()
ion.dump(myobj, fp)
ion_dict = ion.load(io.BytesIO(fp.getvalue()))
print(ion_dict)
The expected output is similar to printing the native Python dict directly, e.g.:
{'StrKey': 'StrValue', 'IntKey': 123}
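One possible shape for such a representation, sketched without ion-python itself: AnnotatedMapping below is a hypothetical stand-in for IonPyDict (assumed here to be a mapping wrapper around an internal dict), and its __repr__ simply delegates to the wrapped dict. The class name and its internals are assumptions for illustration, not the library's implementation.

```python
from collections.abc import MutableMapping


class AnnotatedMapping(MutableMapping):
    """Hypothetical stand-in for IonPyDict: a mapping that wraps a plain dict."""

    def __init__(self, items=None, annotations=()):
        self._store = dict(items or {})
        self.ion_annotations = annotations

    def __getitem__(self, key):
        return self._store[key]

    def __setitem__(self, key, value):
        self._store[key] = value

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)

    def __repr__(self):
        # Delegate to the wrapped dict so print() shows the contents.
        return repr(self._store)


ion_dict = AnnotatedMapping({'StrKey': 'StrValue', 'IntKey': 123})
print(ion_dict)
```

Because str() falls back to __repr__ when no __str__ is defined, defining __repr__ alone is enough for print().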
Simpleion maps Ion structs to Python dicts, whose keys are de-duplicated. IonPyDict needs to support duplicate mappings when iterated and compared.
Successful resolution of this issue will involve removal of the following files from the test_vectors.py skip list:
good/equivs/structsFieldsRepeatedNames.ion
good/nonequivs/structs.ion
Python 3.3/3.4 is disabled in our Travis CI configuration. The main reason is that we have some code that uses the % operator with bytes (which works in Python 3.5).
We need to fix this and re-enable the Travis CI configuration.
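A minimal sketch of one way to avoid the bytes % operator, which Python 3.3 and 3.4 do not support. The helper name join_bytes and the ':' separator are hypothetical; the point is only that b''.join works across versions where %-formatting of bytes does not.

```python
def join_bytes(prefix, payload):
    """Build prefix + b':' + payload without using the bytes % operator."""
    # b''.join accepts any iterable of bytes objects on Python 2 and 3 alike.
    return b''.join((prefix, b':', payload))


combined = join_bytes(b'head', b'body')
```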
Implement non-blocking Ion binary writer.
In 7651c7a I see that a Python tuple is mapped to IonType.LIST. My expectation is that it would be mapped to IonType.SEXP. At least that's what I have in my workspace, and it works as I would expect.
Isn't mapping it to a list effectively losing data?
Currently the package is nested as amazon.ion, which is somewhat verbose and requires the use of namespace packages, which have idiomatic support in Python 2 but are non-standard. I propose we shorten the name to a single level of depth. Some possibilities:
amzion - my preferred choice.
ion - potentially conflicting with other modules with a sufficiently general name.
amazonion - too long in my opinion.
anion - a pun on negatively charged Ion and "an Ion" implementation. Negative connotation may make this not a good choice.
simpleion does not provide a way for None Python values to be converted back into None from IonPyNull.
The following example code:
import io
import amazon.ion.simpleion as ion
def print_is_none(obj):
    if obj['NoneKey'] is None:
        print("Value is None")
    else:
        print("Value is not None")
myobj = {'NoneKey': None}
print_is_none(myobj)
fp = io.BytesIO()
ion.dump(myobj, fp)
ion_dict = ion.load(io.BytesIO(fp.getvalue()))
print_is_none(ion_dict)
prints:
Value is None
Value is not None
because ion_dict['NoneKey'] returns an instance of IonPyNull.
simpleion should provide some mechanism by which customers can return to the original Python domain types, so that, e.g., None-ness is maintained across conversions to / from Ion.
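Until such a mechanism exists, a post-processing pass is one workaround. The sketch below does not import ion-python: NullStandIn is a hypothetical placeholder for IonPyNull, and to_plain is a hypothetical helper that assumes the loaded structure behaves like nested dicts and lists. With the real library, the is_null predicate would need to recognize IonPyNull instances instead.

```python
class NullStandIn:
    """Hypothetical stand-in for IonPyNull: a wrapper object that is not None."""


def to_plain(value, is_null=lambda v: isinstance(v, NullStandIn)):
    """Recursively replace null wrappers with None inside dicts and lists."""
    if is_null(value):
        return None
    if isinstance(value, dict):
        return {key: to_plain(item, is_null) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item, is_null) for item in value]
    return value


loaded = {'NoneKey': NullStandIn(), 'Nested': [1, NullStandIn()]}
plain = to_plain(loaded)
```

The trade-off is that any Ion annotations or type information carried by the wrappers is dropped along the way.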
Distribute Ion Python to Python Package Index.
The text reader's performance degrades or hangs when faced with imports with huge max IDs. For example (taken from good/subfieldVarUInt32bit.ion):
$ion_1_0
// 5 bytes
// 31 bits
// hex: 0x7fffffff
// dec: 2147483647
$ion_1_0
$ion_symbol_table::{
imports:[ { name: "com.amazon.blah.blah.blah",
version: 1,
max_id: 2147483636 // 2147483646 - 1 - 9 (system symbols)
} ],
symbols:[ "boundary-1",
"boundary",
"boundary+1"
]
}
'boundary-1'::1 // $2147483646
'boundary' ::1 // $2147483647 2^31 - 1
'boundary+1'::1 // $2147483648
This should be fixed. Successful resolution of this issue will involve removal of the following files from the test_vectors.py skip list:
subfieldVarUInt.ion
subfieldVarUInt32bit.ion
The harness that uses the ion-tests vectors (test_vectors.py) is currently a test of the text and binary reader implementations. An additional step should be added for good/ vectors to verify that equality is maintained when written and re-read (in both text and binary). This will add coverage to the writer implementations and further verify that the reader and writer implementations are compatible and correct.
In the simpleion _load method, the scalar evaluation of boolean values results in an integer being returned. This seems to be linked to the simple_types mapping of IonPyBool = IonPyInt, with no patching to provide the correct type (True or False) based on the IonType.
In the _load reader loop the event correctly identifies the boolean value e.g.
_IonManagedThunkEvent(event_type=<IonEventType.SCALAR: 1>, ion_type=<IonType.BOOL: 1>, value=True, field_name=SymbolToken(text='key', sid=97, location=ImportLocation(name='namespace', position=88)), annotations=(), depth=4)
However, in the returned dict I get:
"properties": {
"key": 1
}
Adding this to the _load event loop achieved the effect but it's not ideal.
while event.event_type is not end_type:
    ion_type = event.ion_type
    if event.event_type is IonEventType.CONTAINER_START:
        container = _FROM_ION_TYPE[ion_type].from_event(event)
        _load(container, reader, IonEventType.CONTAINER_END, ion_type is IonType.STRUCT)
        add(container)
    elif event.event_type is IonEventType.SCALAR:
        if event.value is None or ion_type is IonType.NULL or event.ion_type.is_container:
            scalar = IonPyNull.from_event(event)
        elif ion_type is IonType.BOOL:
            scalar = event.value
        else:
            scalar = _FROM_ION_TYPE[ion_type].from_event(event)
        add(scalar)
    event = reader.send(NEXT_EVENT)
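The underlying constraint can be shown in isolation: Python does not allow bool to be subclassed, so an int-backed wrapper prints as 1 or 0. In the sketch below, IntBackedBool and scalar_for are hypothetical names standing in for the IonPyInt-based mapping and the special case in the loop above; neither is part of ion-python.

```python
class IntBackedBool(int):
    """Hypothetical stand-in for an int-backed wrapper used for Ion bools."""


def scalar_for(is_bool, value):
    """Return a real bool for boolean events, an int wrapper otherwise."""
    return bool(value) if is_bool else IntBackedBool(value)


# bool is a final type: attempting to subclass it raises TypeError.
try:
    type('SubBool', (bool,), {})
    bool_is_subclassable = True
except TypeError:
    bool_is_subclassable = False

wrapped = IntBackedBool(True)      # what an int-backed mapping produces
restored = scalar_for(True, True)  # what the special case produces
```

Returning a plain bool keeps True/False semantics but, as noted above, gives up whatever metadata the wrapper type would have carried.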
Implement non-blocking Ion binary parser.
The binary reader needs improved validation of annotation wrappers' various length subfields.
The following is from bad/container_length_mismatch.10n:
\xe0\x01\x00\xea\xe6\x81\x84\x71\x04\x71\x04
This contains a version marker, followed by an annotation wrapper of total specified length 6 octets (type descriptor \xe6), followed by a one-octet annot_length field (\x81) which specifies that the annotation SIDs (VarUInts) total one octet, followed by octet \x84 which corresponds to SID 4, followed by two symbol values representing SID 4. This passes, returning two unannotated symbol 4 values. However, it should fail because the annotation wrapper's value ended before reaching the 6-octet length specified in the type descriptor.
Annotation wrappers are invalid unless their type descriptor length == (the length of the annot_length subfield) + (the UInt value of the annot_length subfield) + (the total length of the wrapped value (the first value following the annot subfield(s))). In this case, 6 != (1 + 1 + 2).
Similarly, annotation wrappers with no annot subfields should be rejected. The following is from bad/emptyAnnotatedInt.10n:
\xe0\x01\x00\xea\xe3\x80\x21\x01
This contains a version marker, followed by an annotation wrapper of total specified length 3 octets (type descriptor \xe3), followed by a one-octet annot_length field (\x80) which specifies that the annotation SIDs total zero octets (i.e. there aren't any annotation SIDs), followed by the binary representation of the int value 1. This passes, returning an unannotated int 1. However, it should fail, as annotation wrappers must have at least one annotation SID.
Successful resolution of this issue will involve removal of the following files from the test_vectors.py skip list:
bad/container_length_mismatch.10n
bad/emptyAnnotatedInt.10n
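The two checks described above can be sketched as a standalone validator over raw bytes. This is an illustration under stated assumptions, not the reader's actual code: the function names are hypothetical, the wrapped value's length is computed only from its type descriptor and length field, and truncated input is not handled.

```python
def read_var_uint(data, pos):
    """Decode an Ion VarUInt starting at pos; return (value, next_pos)."""
    value = 0
    while True:
        octet = data[pos]
        pos += 1
        value = (value << 7) | (octet & 0x7F)
        if octet & 0x80:  # the high bit marks the final octet
            return value, pos


def value_total_length(data, pos):
    """Total encoded length (descriptor + length field + body) of the value at pos."""
    start = pos
    descriptor = data[pos]
    pos += 1
    type_code, low = descriptor >> 4, descriptor & 0x0F
    if low == 15 or type_code == 1:  # typed null, or bool (L carries the value)
        body = 0
    elif low == 14 or (type_code == 13 and low == 1):  # explicit VarUInt length
        body, pos = read_var_uint(data, pos)
    else:
        body = low
    return (pos - start) + body


def check_annotation_wrapper(data, pos=0):
    """Raise ValueError unless the wrapper at pos satisfies both rules."""
    descriptor = data[pos]
    if descriptor >> 4 != 0xE:
        raise ValueError('not an annotation wrapper')
    pos += 1
    declared = descriptor & 0x0F
    if declared == 14:
        declared, pos = read_var_uint(data, pos)
    field_start = pos
    annot_length, pos = read_var_uint(data, pos)
    if annot_length == 0:
        raise ValueError('annotation wrapper has no annotation SIDs')
    annot_field_len = pos - field_start
    wrapped = value_total_length(data, pos + annot_length)
    expected = annot_field_len + annot_length + wrapped
    if declared != expected:
        raise ValueError('declared length %d != %d' % (declared, expected))
    return declared


def is_valid_wrapper(data, pos=0):
    try:
        check_annotation_wrapper(data, pos)
    except ValueError:
        return False
    return True


# The two bad vectors quoted above; the wrapper starts after the 4-octet version marker.
length_mismatch = b'\xe0\x01\x00\xea\xe6\x81\x84\x71\x04\x71\x04'
empty_annotations = b'\xe0\x01\x00\xea\xe3\x80\x21\x01'
```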
Integrate with Sphinx for generating code documentation that can be published.
The text reader does not support timestamps with more than 6 digits of fractional precision because Python's datetime only supports microsecond precision. The current logic raises an error instead of silently rounding or truncating.
Consider adding an extension to ion-python's Timestamp type to allow roundtripping of arbitrary fractional precision with full fidelity.
Successful resolution of this issue will involve removal of the following file from the test_vectors.py skip list:
good/equivs/timestampsLargeFractionalPrecision.ion
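One way such an extension could look, sketched independently of ion-python: keep a native datetime (truncated to microseconds) alongside the exact fractional seconds as a Decimal. PreciseTimestamp and its methods are hypothetical names, and UTC is assumed when rendering text.

```python
from datetime import datetime
from decimal import Decimal


class PreciseTimestamp:
    """Hypothetical wrapper: a datetime plus the exact fractional seconds."""

    def __init__(self, year, month, day, hour, minute, second, fraction):
        self.fraction = Decimal(fraction)  # e.g. Decimal('0.123456789')
        if not 0 <= self.fraction < 1:
            raise ValueError('fraction must be in [0, 1)')
        # datetime keeps only microseconds, so truncate for the native part.
        micros = int(self.fraction * 1000000)
        self.value = datetime(year, month, day, hour, minute, second, micros)

    def to_text(self):
        """Render with the full fractional precision (UTC assumed)."""
        whole = self.value.replace(microsecond=0).isoformat()
        # format(..., 'f') avoids scientific notation and keeps trailing zeros.
        return whole + format(self.fraction, 'f')[1:] + 'Z'


ts = PreciseTimestamp(2000, 1, 1, 0, 0, 0, '0.123456789')
```

Storing the fraction as a Decimal also preserves trailing zeros, which matter for Ion timestamp precision.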
The "amazon" package namespace is effectively closed due to behavior around __init__.py (see the-init-py-trap). This prevents definition of other subpackages in other directory structures (or repositories), e.g. amazon.ionhash.
That said, perhaps the "amazon" namespace isn't adding value; we should consider dropping it.
Python 2.6 is disabled in Travis CI. The principal reason seems to be that the tests make assumptions about float-to-string encoding, which is not as normalized in 2.6 as in later versions.
We need to fix the tests and re-enable the Travis CI config.
Publish generated documentation to Read the Docs.
Writing a core.ION_VERSION_MARKER_EVENT fails on the raw writer:
>>> from amazon.ion import *
>>> import io
>>> w = writer.blocking_writer(writer_binary_raw._raw_binary_writer(writer_buffer.BufferTree()), io.BytesIO())
>>> w.send(core.ION_VERSION_MARKER_EVENT)
Traceback (most recent call last):
File "<input>", line 1, in <module>
w.send(core.ION_VERSION_MARKER_EVENT)
File "/usr/local/lib/python3.6/site-packages/amazon/ion/writer.py", line 150, in blocking_writer
for result_event in _drain(writer, ion_event):
File "/usr/local/lib/python3.6/site-packages/amazon/ion/writer.py", line 129, in _drain
result_event = writer.send(ion_event)
File "/usr/local/lib/python3.6/site-packages/amazon/ion/writer.py", line 111, in writer_trampoline
trans = trans.delegate.send(Transition(ion_event, trans.delegate))
File "/usr/local/lib/python3.6/site-packages/amazon/ion/writer_binary_raw.py", line 393, in _raw_writer_coroutine
fail()
File "/usr/local/lib/python3.6/site-packages/amazon/ion/writer_binary_raw.py", line 351, in fail
raise TypeError('Invalid event: %s at depth %d' % (ion_event, depth))
TypeError: Invalid event: IonEvent(event_type=<IonEventType.VERSION_MARKER: 0>, ion_type=None, value=(1, 0), field_name=None, annotations=(), depth=0) at depth 0
Looking more closely, it appears that the managed writer explicitly writes the IVM, which is incorrect. The managed writer should propagate the IVM event to the raw writer, and the raw writer should emit the byte sequence.
I can't save an Ion payload to DynamoDB with Boto3. It looks like Boto3 makes a deep copy prior to saving, which throws.
import boto3
import copy
from decimal import Decimal
from amazon.ion import simpleion
from io import StringIO
d = { "value": Decimal('1.1') }
payload = simpleion.dumps(d)
payloadValue = simpleion.load(StringIO(payload))
c = copy.deepcopy(payloadValue) # This fails
table = boto3.resource('dynamodb').Table('MyTable')
table.put_item(Item=payloadValue) # This fails because it makes a deep copy
Should an Ion struct be able to be deep copied? Stack below.
TypeError: __new__() missing 1 required positional argument: 'value'
Traceback (most recent call last):
File "c:\Projects\\lambda\lambda_function.py", line 32, in <module>
bug()
File "c:\Projects\\lambda\lambda_function.py", line 29, in bug
c = copy.deepcopy(payloadValue)
File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 280, in _reconstruct
state = deepcopy(state, memo)
File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 150, in deepcopy
y = copier(x, memo)
File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 240, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 274, in _reconstruct
y = func(*args)
File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 273, in <genexpr>
args = (deepcopy(arg, memo) for arg in args)
File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 180, in deepcopy
y = _reconstruct(x, memo, *rv)
File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copy.py", line 274, in _reconstruct
y = func(*args)
File "C:\Users\\AppData\Local\Programs\Python\Python36\lib\copyreg.py", line 88, in __newobj__
return cls.__new__(cls, *args)
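The final frames of this trace match a general Python behavior that can be reproduced without ion-python or Boto3: copy.deepcopy rebuilds an object by calling cls.__new__ with the arguments from __getnewargs__, so a subclass whose __new__ takes an extra required argument fails unless it supplies them. Whether this is the exact cause here is an assumption; TaggedInt and CopyableTaggedInt are hypothetical classes for illustration.

```python
import copy


class TaggedInt(int):
    """Hypothetical int subclass whose __new__ needs an extra argument."""

    def __new__(cls, tag, value):
        self = super().__new__(cls, value)
        self.tag = tag
        return self


class CopyableTaggedInt(TaggedInt):
    def __getnewargs__(self):
        # Tell copy/pickle which arguments __new__ requires.
        return (self.tag, int(self))


# int's default __getnewargs__ supplies only the numeric value, so this fails.
try:
    copy.deepcopy(TaggedInt('INT', 5))
    failed = False
except TypeError:
    failed = True

clone = copy.deepcopy(CopyableTaggedInt('INT', 5))
```

Defining __deepcopy__ or __reduce__ on the wrapper type would be alternative fixes with the same effect.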
#97 caused an unexpected change in behavior that broke ion-hash-python (see amazon-ion/ion-hash-python#10). Prior to that change, IonEvent's field_name property (for events derived from an IonNature instance) was a SymbolToken instance. After the change, field_name became a string, as the _dump logic is now used to iterate over the keys (field names) of a dict to generate IonEvents, instead of relying on IonEvents previously held by IonNature.
When pretty-printing elements from an array of Ion values, and the context makes it clear that each value is Ion, allow the $ion_1_0 prefix to be omitted as it is redundant and adds clutter to the output.
Now that dump and load are implemented with text and binary support, this should be simple.
This change in Travis CI breaks Python 2.6 and Python 3.3 compatibility. We should do two things:
Make dist explicit so we don't break in a non-deterministic way when Travis CI changes. Specifically we should do one of:
dist: xenial.
dist: trusty to support the existing 2.6/3.3 dependency.
Add good surrounding documentation around setup/development/contribution/getting started. Also integrate into our existing cookbook.
Implement a non-blocking Ion text parser.
According to the spec for the struct type, there is a special case for the type descriptor \xD1: "When L is 1, the struct has at least one symbol/value pair, the length field exists, and the field name integers are sorted in increasing order."
The binary reader currently doesn't support this case, failing on structs defined this way for overrunning one byte. Support for the special case of a struct with L=1 should be added.
Successful resolution of this issue will involve removal of the following files from the test_vectors.py skip list:
good/structAnnotatedOrdered.10n
good/structOrdered.10n
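The special case can be illustrated with a small header parser. This is a sketch of the rule quoted above, not the binary reader's code; read_struct_header is a hypothetical name, and the example bytes are hand-built rather than taken from the test vectors.

```python
def read_var_uint(data, pos):
    """Decode an Ion VarUInt starting at pos; return (value, next_pos)."""
    value = 0
    while True:
        octet = data[pos]
        pos += 1
        value = (value << 7) | (octet & 0x7F)
        if octet & 0x80:  # the high bit marks the final octet
            return value, pos


def read_struct_header(data, pos=0):
    """Return (body_length, body_start, ordered) for the struct at pos."""
    descriptor = data[pos]
    if descriptor >> 4 != 0xD:
        raise ValueError('not a struct')
    low = descriptor & 0x0F
    pos += 1
    if low == 15:
        raise ValueError('null.struct has no body')
    if low == 1 or low == 14:
        # L=1 (sorted field names) and L=14 both carry an explicit VarUInt length.
        length, pos = read_var_uint(data, pos)
        if low == 1 and length == 0:
            raise ValueError('an ordered struct needs at least one field')
        return length, pos, low == 1
    return low, pos, False


# \xd1 + VarUInt length 3 + field SID 4 + int 1
ordered = read_struct_header(b'\xd1\x83\x84\x21\x01')
# \xd3 (L is the length itself) + field SID 4 + int 1
plain = read_struct_header(b'\xd3\x84\x21\x01')
```

Treating L=1 as a literal one-octet length is what would make a reader overrun by one byte, since the octet after the descriptor is the length field rather than content.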
Enable Travis CI builds for the unit tests for the Python versions we support (2.6+, 3.3+)
[ 254s] ==================================== ERRORS ====================================
[ 254s] ____________________ ERROR collecting tests/test_vectors.py ____________________
[ 254s] tests/test_vectors.py:72: in <module>
[ 254s] getcontext().Emax = 100000000000000000
[ 254s] E OverflowError: Python int too large to convert to C ssize_t
[ 254s] !!!!!!!!!!!!!!!!!!! Interrupted: 1 errors during collection !!!!!!!!!!!!!!!!!!!!
[ 254s] =========================== 1 error in 13.40 seconds ===========================
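The OverflowError suggests a build where C ssize_t is too small for the requested exponent limit, for example a 32-bit platform; the log does not say, so this is an assumption. A minimal sketch of a portable setting clamps the value to decimal.MAX_EMAX, the largest Emax the running interpreter accepts. The sketch uses a copy of the context so the global one is left untouched.

```python
import decimal

# The value requested at tests/test_vectors.py:72.
wanted_emax = 100000000000000000

# Clamp to what this interpreter supports before assigning.
ctx = decimal.getcontext().copy()
ctx.Emax = min(wanted_emax, decimal.MAX_EMAX)
```

Tests that genuinely need exponents beyond MAX_EMAX would still have to be skipped on such platforms.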
When a shared symbol table is not found, a stream can still be read; the symbols referenced from the unavailable imports simply have "unknown" text (e.g. "$128"). When this happens, it should be possible to re-write the stream, referencing the not-found imports, so that in the future (or down the pipeline) other readers can process the full-fidelity data if they have access to that import.
This is not currently possible with simpleion. All keys (field names) are strings, so field name tokens with unknown text will all collapse to None. To support this feature, all symbol tokens with unknown text need to be preserved as SymbolTokens (with None text, a non-None SID, and a valid import location). A tailored implementation of dict is likely necessary for full support. See also: #36.
Successful resolution of this issue will involve removal of the following files from the test_vectors.py skip list:
good/item1.10n
good/nonequivs/symbolTablesUnknownText.ion
Upon using the simpleion.dump method to encode a tuple in Ion binary, I encountered the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/amazon/ion/simpleion.py", line 184, in _dump
ion_type = obj.ion_type
AttributeError: 'tuple' object has no attribute 'ion_type'
The string representation of the dict I am attempting to encode:
{'TupleKey': ('a', 'tuple')}
The docs for simpleion.dump list a possible tuple -> Ion list conversion, so I expected to receive an Ion list as part of the encoded bytes.
We have some duplication in the binary reader/writer code around things like type constants. We should consolidate them.
ion-hash-python currently relies on the private-by-convention _dump method instead of the public dump method because the former allows you to provide your own writer.
We should investigate whether there are enough other use cases for providing a writer that we can add that to dump, allowing ion-hash-python to use the public version. (Alternatively, if there aren't other use cases, we could at least add a comment to _dump indicating that it is relied upon by a sibling project.)
Consider using something like PyPy.js to build an interactive shell for playing with Ion python and provide a client-side Ion playground.
Implementations differ for 0 and negative numbers.
>>> (0).bit_length()
0
>>> len(bin(0)) - 2
1
>>> (-1).bit_length()
1
>>> len(bin(-1)) - 2
2
Not sure if this is important or not for the usages of bit_length, just thought I'd point it out.
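If the two are meant to be interchangeable, a fallback based on bin() can be made to agree with int.bit_length() by special-casing zero and ignoring the sign. The function name below is hypothetical; the sketch only shows the adjustment.

```python
def bit_length(value):
    """bin()-based bit length that matches int.bit_length() for all ints."""
    if value == 0:
        return 0  # bin(0) is '0b0', which would otherwise count one digit
    # abs() drops the leading '-' that bin() emits for negative numbers.
    return len(bin(abs(value))) - 2


mismatches = [n for n in range(-1024, 1025) if bit_length(n) != n.bit_length()]
```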
Hi, it looks like there's a big performance issue with the simpleion load/loads function. I did a little experiment:
The JSON load performance of the same data is ~550ms. IonCPP utils is also pretty fast for both text / binary. Any plan to speed up the dumper?
Thanks!
Currently, the text writer escapes any code point outside of ASCII range. This is required when the output encoding is restricted to ASCII, but causes unnecessary bloating when UTF-8 output is acceptable. This could be made configurable.
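A sketch of what a configurable policy could look like, using the \x, \u and \U escape forms of Ion text. The function name and the ascii_only flag are hypothetical, and the sketch deliberately ignores quotes, backslashes and control characters, which a real writer must also escape.

```python
def escape_non_ascii(text, ascii_only=True):
    """Escape code points above ASCII only when ASCII-only output is required."""
    if not ascii_only:
        return text  # the caller will encode the output as UTF-8
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFF:
            out.append('\\x%02x' % cp)
        elif cp <= 0xFFFF:
            out.append('\\u%04x' % cp)
        else:
            out.append('\\U%08x' % cp)
    return ''.join(out)


escaped = escape_non_ascii('caf\u00e9 \u20ac')
unescaped = escape_non_ascii('caf\u00e9 \u20ac', ascii_only=False)
```

Keeping ASCII-only output as the default would preserve the current behavior for existing callers.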
Integrate code coverage with coveralls.io
ion-hash-python is a companion project that uses simpleion's internal _dump method. When breaking changes are made to _dump, ion-hash-python breaks. Running ion-hash-python's tests within this repository will detect the breakage before it is pushed.
I'm seeing some inexplicable behavior when I embed IonPyValues in a standard dict and then load that to Ion.
from amazon.ion.simpleion import loads, dumps
my_struct = {'spam': "eggs", 'count': 12}
i_struct = loads(dumps(my_struct))
other_struct = {'other': 24, 'embedded': i_struct["count"]}
io_struct = loads(dumps(other_struct))
dumps(io_struct, binary=False)
prints:
'$ion_1_0 {other:24,count:12}'
My expectation is that it would print:
'$ion_1_0 {other:24,embedded:12}'
Python developers using this library for the first time have reported difficulty getting started. We currently provide a link to the complete docs, but we should point to the simpleion module specifically and call it out early in the README file. It would be nice to have an example of load, loads, dump, and dumps in the README.
It would be very useful to have an option to keep the insertion order of the keys. This option would be different from sort_keys or item_sort_keys, where the user needs to specify how the sorting will be done.
I think this has also been requested for simplejson, but it was rejected because the result wouldn't be valid JSON, as JSON is unordered.
Would such a feature be desirable in simpleion?
test_simpleion fails on Python 3.7, as does vectors test data.