
msgpack-python's Introduction

MessagePack

MessagePack is an efficient binary serialization format. It's like JSON, but fast and small.

This repository manages the specification of the MessagePack format. See Spec for the specification.

Each implementation project has its own repository. See the msgpack.org website to find implementations and their documentation.

If you'd like your msgpack implementation listed on the msgpack.org website, please follow the website's documentation.

msgpack-python's People

Contributors

alex, antocuni, brmmm3, bwesterb, devendor, drednout, faerot, frsyuki, hauntsaninja, hexagonrecursion, hugovk, jfolz, jnothman, kjim, mayeut, mbr0wn, methane, moreati, msabramo, palaviv, paulmelis, pramukta, seliopou, shadchin, steeve, thedrow, thomaswaldmann, wbolster, yamt, yanlend


msgpack-python's Issues

Segmentation fault when exceptions of any kind happen within an object_hook in some situations

Hi, I ran across this quite nasty bug today. In certain situations (I have been unable to determine exactly when), an exception of any kind thrown in a custom object_hook while decoding will make the entire Python interpreter segfault. I have included test cases for both a non-segfaulting exception and a segfaulting exception.

Test case:

import msgpack

class TestObject:
    def _unpack(self, data):
        return msgpack.unpackb(data, object_hook=self._decode_unpack)

    def _decode_unpack(self, obj):
        raise Exception("test")

    def testrun_segfault(self):
        # msgpack-encoded version of {"test": "just sending some test data...", "number": 41, "file": open("test.py", "r")}
        # (see the custom encoding function at the bottom of this file)
        encoded = b"\x83\xa4test\xbejust sending some test data...\xa6number)\xa4file\x83\xa8__type__\xa4file\xa6__id__\x0b\xa8__size__\xcd\x02`"

        try:
            decoded = self._unpack(encoded)
        except Exception:
            print("Exception (with segfault) happened successfully")

    def testrun_nosegfault(self):
        # msgpack-encoded version of {"test": "just sending some test data...", "number": 41}
        encoded = b"\x82\xa4test\xbejust sending some test data...\xa6number)"

        try:
            decoded = self._unpack(encoded)
        except Exception:
            print("Exception (without segfault) happened successfully")


test = TestObject()
print("Attempting testrun without segfault...")
test.testrun_nosegfault()
print("Attempting testrun with segfault...")
test.testrun_segfault()


''' Custom encoding code used to encode the data:

    def _pack(self, data):
        return msgpack.packb(data, default=self._encode_pack)

    def _encode_pack(self, obj):
        if hasattr(obj, "read"):
            datastream_id = self._create_datastream(obj)  # <- Unrelated project code

            # Determine the total size of the file
            current_pos = obj.tell()
            obj.seek(0, os.SEEK_END)
            total_size = obj.tell()
            obj.seek(current_pos)

            obj = {"__type__": "file", "__id__": datastream_id, "__size__": total_size}

        return obj
'''

Output:

[occupy@edge13 pyreactor]$ python ~/segfault.py 
Attempting testrun without segfault...
Exception (without segfault) happened successfully
Attempting testrun with segfault...
Segmentation fault (core dumped)

Allow custom encoding (not character encoding but default function encoding) of unicode strings

As of now, I could not find a way to support both unicode and byte strings such that a round trip (encode, then decode) is transparent.

However, this could easily be implemented if the unicode type could be custom-encoded by the "default" function hook. In the latest version, if no character encoding is specified for the "pack" function, an exception is raised when a unicode instance is serialized.

If this exception were removed (line 38 of _packer.pyx) and the unicode case were skipped when no character encoding is provided, a custom handler could be implemented in the default function.

use_list=1 adds to unpacking time

I have done some quick benchmarking, which showed a large increase in the time taken to unpack a nested list structure (500 levels deep) between two repository branches. It turns out this 5-fold increase was caused by the changed default argument use_list=1.

This may be a non-issue, but I thought you should be aware.

Stream processing requires knowledge of the data

I was just trying to use the recently updated msgpack library for stream processing, but it still requires knowledge of the incoming data, which I don't always have. What I want is a function that works roughly like this:

>>> u = msgpack.Unpacker(StringIO('\x94\x01\x02\x03\x04'))
>>> u.next_marker()
('array_start', 4)
>>> u.next_marker()
('value', 1)
>>> u.next_marker()
('value', 2)
>>> u.next_marker()
('value', 3)
>>> u.next_marker()
('value', 4)

E.g., next_marker() returns a tuple of the form (marker, value), where marker is one of map_start, array_start, or value. For value, the second item is the actual value; for a container marker, it is the container's size. This would allow trivial stream processing. (A value marker would never contain a map or array, only scalar values.)
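The proposed behaviour can be prototyped in pure Python without touching msgpack itself. The sketch below is a hypothetical tokenizer (iter_markers is not part of msgpack-python) that handles only a small subset of the format (fixint, nil, booleans, uint 8/16/32, fixstr, fixarray, fixmap) and assumes the whole input is already in memory; a real streaming version would also have to cope with partial reads.

```python
import struct

# Hypothetical helper, not part of msgpack-python. It covers only: positive and
# negative fixint, nil, false, true, uint 8/16/32, fixstr, fixarray and fixmap.
UINT_FORMATS = {0xCC: ">B", 0xCD: ">H", 0xCE: ">I"}  # big-endian unsigned ints


def iter_markers(data: bytes):
    """Yield (marker, value) pairs; containers yield their size, not contents."""
    pos = 0
    while pos < len(data):
        b = data[pos]
        pos += 1
        if b <= 0x7F:                          # positive fixint
            yield ("value", b)
        elif b >= 0xE0:                        # negative fixint
            yield ("value", b - 0x100)
        elif 0x80 <= b <= 0x8F:                # fixmap: entry count in low 4 bits
            yield ("map_start", b & 0x0F)
        elif 0x90 <= b <= 0x9F:                # fixarray: length in low 4 bits
            yield ("array_start", b & 0x0F)
        elif 0xA0 <= b <= 0xBF:                # fixstr: byte length in low 5 bits
            n = b & 0x1F
            if pos + n > len(data):
                raise ValueError("truncated fixstr")
            yield ("value", data[pos:pos + n].decode("utf-8"))
            pos += n
        elif b == 0xC0:                        # nil
            yield ("value", None)
        elif b in (0xC2, 0xC3):                # false / true
            yield ("value", b == 0xC3)
        elif b in UINT_FORMATS:                # uint 8 / 16 / 32
            fmt = UINT_FORMATS[b]
            (v,) = struct.unpack_from(fmt, data, pos)
            yield ("value", v)
            pos += struct.calcsize(fmt)
        else:
            raise ValueError(f"unsupported type byte 0x{b:02x} at offset {pos - 1}")
```

Note that 0x94 is a fixarray header (0x90 | 4), so the sample stream starts with an array of four elements.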

unpack should decode string types by default

msgpack has native text and binary types, but the default behavior of msgpack.unpackb for the text type is to return the still-encoded bytes rather than the decoded text (unicode on Python 2, str on Python 3). The msgpack spec specifies that strings are UTF-8, so decoding them would be a logical default.

Changing the default value of Unpacker.encoding from None to utf-8 would introduce the desired behavior, but it would be problematic for anyone relying on the current default. So I thought I would ask before opening a PR: would a pull request with this change be accepted?

Incremental parsing from a file-like object results in random failures (probably a memory management issue)

The C implementation of the Unpacker has issues that look like out-of-bound memory reads when it parses from a file-like object. The bug manifests itself both for StringIO/BytesIO and real files.

Let's create a simple file-like object with an encoded [1, 2, 3, 4] array inside.

>>> import io
>>> import msgpack
>>> fp = io.BytesIO(msgpack.packb([1,2,3,4]))
>>> fp.seek(0)
0L
>>> fp.read()
'\x94\x01\x02\x03\x04'

Now let's parse it step by step using an Unpacker instance:

>>> fp.seek(0)
0L
>>> unpacker = msgpack.Unpacker(fp)
>>> unpacker.read_array_header()
Traceback (most recent call last):
  File "<ipython-input-9-8b0ba0929a83>", line 1, in <module>
    unpacker.read_array_header()
  File "_unpacker.pyx", line 388, in msgpack._unpacker.Unpacker.read_array_header (msgpack/_unpacker.cpp:388)
  File "_unpacker.pyx", line 331, in msgpack._unpacker.Unpacker._unpack (msgpack/_unpacker.cpp:331)
ValueError: Unexpected type header on stream

The pure Python fallback implementation works as expected, returning the number of elements in the array, followed by the actual items:

>>> fp.seek(0)
0L
>>> import msgpack.fallback
>>> unpacker = msgpack.fallback.Unpacker(fp)
>>> unpacker.read_array_header()
4
>>> unpacker.unpack()
1
>>> unpacker.unpack()
2
>>> unpacker.unpack()
3
>>> unpacker.unpack()
4
>>> unpacker.unpack()
Traceback (most recent call last):
  ...
OutOfData

(Note that the OutOfData exception is actually expected in this case.)

After fiddling a bit more inside an interactive IPython shell, I've seen various other corruptions as well, e.g.:

>>> up.next()
1
>>> up.next()
2
>>> up.next()
3
>>> up.next()
4
>>> up.next()
127
>>> up.next()
0
>>> up.next()
0
>>> up.next()
8
>>> up.next()
-18
>>> up.next()
-30
>>> up.next()
[63, 127, 0, 0, 0, 60, 95, 3]
>>> up.next()
0
>>> up.next()
0
>>> up.next()
0
>>> up.next()
0
>>> up.next()
0
>>> up.next()
60
>>> up.next()
95
>>> up.next()
3
>>> up.next()
0
>>> up.next()
0
>>> up.next()
0
>>> up.next()
0
>>> list(up)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-46-f4ac41e0b6c8> in <module>()
----> 1 list(up)
/home/uws/.virtualenvs/72b41e48ba5dcfd/local/lib/python2.7/site-packages/msgpack/_unpacker.so in msgpack._unpacker.Unpacker.__next__ (msgpack/_unpacker.cpp:402)()

/home/uws/.virtualenvs/72b41e48ba5dcfd/local/lib/python2.7/site-packages/msgpack/_unpacker.so in msgpack._unpacker.Unpacker._unpack (msgpack/_unpacker.cpp:348)()

ValueError: Unpack failed: error = -1

As you can see, incremental parsing is completely broken here. Due to the strange non-deterministic errors, I suspect the parser reads parts of memory that it should not read, probably caused by incorrect buffer management and resulting out-of-bounds reads.

AIX build problem

I'm having a problem building the C++ extensions on AIX. It might be completely unrelated to msgpack-python itself, although I have built other Python libraries that use C++ extensions.

This is the output I get:

# python setup.py build
running build
running build_py
running build_ext
building 'msgpack._packer' extension
gcc -pthread -DNDEBUG -O -maix32 -g -mminimal-toc -DSYSV -D_AIX -D_AIX32 -D_AIX41 -D_AIX43 -D_AIX51 -D_ALL_SOURCE -DFUNCPROTO=15 -O2 -I/opt/freeware/include -D__BIG_ENDIAN__=1 -I. -I/opt/freeware/include/python2.7 -c msgpack/_packer.cpp -o build/temp.aix-7.1-2.7/msgpack/_packer.o
g++ gcc -pthread -bI:/opt/freeware/lib/python2.7/config/python.exp -L/opt/freeware/lib -L/opt/freeware/lib -lncurses -Wl,-blibpath:/opt/freeware/lib:/usr/lib:/lib -Wl,-bmaxdata:0x80000000 -L/opt/freeware/lib -Wl,-blibpath:/opt/freeware/lib:/usr/lib:/lib -Wl,-bmaxdata:0x80000000 -maix32 -g -mminimal-toc -DSYSV -D_AIX -D_AIX32 -D_AIX41 -D_AIX43 -D_AIX51 -D_ALL_SOURCE -DFUNCPROTO=15 -O2 -I/opt/freeware/include build/temp.aix-7.1-2.7/msgpack/_packer.o -o build/lib.aix-7.1-2.7/msgpack/_packer.so
g++: error: gcc: No such file or directory
g++: error: unrecognized command line option '-bI:/opt/freeware/lib/python2.7/config/python.exp'
WARNING: Failed to compile extensiom modules.
msgpack uses fallback pure python implementation.
command 'g++' failed with exit status 1
building 'msgpack._unpacker' extension
gcc -pthread -DNDEBUG -O -maix32 -g -mminimal-toc -DSYSV -D_AIX -D_AIX32 -D_AIX41 -D_AIX43 -D_AIX51 -D_ALL_SOURCE -DFUNCPROTO=15 -O2 -I/opt/freeware/include -D__BIG_ENDIAN__=1 -I. -I/opt/freeware/include/python2.7 -c msgpack/_unpacker.cpp -o build/temp.aix-7.1-2.7/msgpack/_unpacker.o
In file included from msgpack/_unpacker.cpp:316:0:
msgpack/unpack.h: In function 'int unpack_callback_ext(unpack_user*, const char*, const char*, unsigned int, PyObject**)':
msgpack/unpack.h:250:77: warning: deprecated conversion from string constant to 'char*' [-Wwrite-strings]
     py = PyObject_CallFunction(u->ext_hook, "(is#)", typecode, pos, lenght-1);
                                                                             ^
g++ gcc -pthread -bI:/opt/freeware/lib/python2.7/config/python.exp -L/opt/freeware/lib -L/opt/freeware/lib -lncurses -Wl,-blibpath:/opt/freeware/lib:/usr/lib:/lib -Wl,-bmaxdata:0x80000000 -L/opt/freeware/lib -Wl,-blibpath:/opt/freeware/lib:/usr/lib:/lib -Wl,-bmaxdata:0x80000000 -maix32 -g -mminimal-toc -DSYSV -D_AIX -D_AIX32 -D_AIX41 -D_AIX43 -D_AIX51 -D_ALL_SOURCE -DFUNCPROTO=15 -O2 -I/opt/freeware/include build/temp.aix-7.1-2.7/msgpack/_unpacker.o -o build/lib.aix-7.1-2.7/msgpack/_unpacker.so
g++: error: gcc: No such file or directory
g++: error: unrecognized command line option '-bI:/opt/freeware/lib/python2.7/config/python.exp'
WARNING: Failed to compile extensiom modules.
msgpack uses fallback pure python implementation.
command 'g++' failed with exit status 1

These are my compiler flags:
OBJECT_MODE=32
LDFLAGS="-L/opt/freeware/lib -Wl,-blibpath:/opt/freeware/lib:/usr/lib:/lib -Wl,-bmaxdata:0x80000000"
CFLAGS="-maix32 -g -mminimal-toc -DSYSV -D_AIX -D_AIX32 -D_AIX41 -D_AIX43 -D_AIX51 -D_ALL_SOURCE -DFUNCPROTO=15 -O2 -I/opt/freeware/include"
CXXFLAGS=$CFLAGS

Improve handling of numpy integers

Numpy integers operate almost exactly like Python integers in all circumstances, but msgpack doesn't handle them. That's fine, but the error message that gets thrown is fairly confusing.

>>> import numpy
>>> five = numpy.int32(5)
>>> five
5
>>> type(five)
<type 'numpy.int32'>
>>> msgpack.dumps(five)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "_msgpack.pyx", line 151, in msgpack._msgpack.packb (msgpack/_msgpack.c:1607)
  File "_msgpack.pyx", line 135, in msgpack._msgpack.Packer.pack (msgpack/_msgpack.c:1363)
  File "_msgpack.pyx", line 130, in msgpack._msgpack.Packer._pack (msgpack/_msgpack.c:1305)
TypeError: can't serialize 5

It's not really msgpack's problem per se, but I think handling it more gracefully would be nice, since numpy is in very widespread use. I'd suggest adding type information to the TypeError, so the error message would read something like:

TypeError: can't serialize 5 (<type 'numpy.int32'>)

That would make debugging this and other similar problems easier.

Actually treating numpy integers (and other numpy scalar types) as integers would be a bonus.
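Until the library does this itself, a caller can get both behaviours through the default hook that packb and Packer accept. The sketch below is an assumption, not msgpack's own code: it converts any object implementing __index__ (numpy integer scalars do) to a plain int, and otherwise raises a TypeError that includes the type, as suggested above.

```python
import operator


def default(obj):
    """Fallback hook for objects the packer cannot serialize (sketch)."""
    try:
        # numpy.int32 and friends implement __index__, so this yields a plain int.
        return operator.index(obj)
    except TypeError:
        # Include the type so the message explains *why* the value was rejected.
        raise TypeError(f"can't serialize {obj!r} ({type(obj)})") from None
```

Usage would look like msgpack.packb(numpy.int32(5), default=default); floats and other non-integer types still fail, but with a message naming their type.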

Please note:
I wasn't the initial reporter of this bug/request. I am simply reposting it here from msgpack/msgpack#36 (comment).

tuple and list are indistinguishable

I am trying to build a universal serialization system on top of msgpack, similar to the standard pickle module but more secure. The system needs to save and restore data accurately.

The problem is that the current implementation stores both lists and tuples as arrays, so the type information is lost when loading the data back. Unfortunately, the current version of the library makes it impossible to treat tuples and lists differently, and while this is trivial to fix, it requires altering msgpack.

What is needed is an extra argument to Packer, something like tuple_autoconvert=False. If this argument is set, msgpack would not serialize tuples as lists but would treat them as an unknown type and pass their serialization to default. This would make it possible to implement custom serialization of tuples, e.g. via an extension type.
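Without changes to the packer, one workaround is to tag tuples before packing and restore them in an object_hook after unpacking. The sketch below assumes a made-up marker key "__tuple__"; tag_tuples and untag are hypothetical helpers, and the marker would collide with any real map that happens to use that key as its only key, a problem the extension-type approach proposed above avoids.

```python
TUPLE_KEY = "__tuple__"  # hypothetical marker; must not clash with real data


def tag_tuples(obj):
    """Recursively replace tuples with {"__tuple__": [...]} before packing."""
    if isinstance(obj, tuple):
        return {TUPLE_KEY: [tag_tuples(x) for x in obj]}
    if isinstance(obj, list):
        return [tag_tuples(x) for x in obj]
    if isinstance(obj, dict):
        # Only values are walked; tuple keys are left as-is.
        return {k: tag_tuples(v) for k, v in obj.items()}
    return obj


def untag(d):
    """object_hook for unpacking: turn tagged maps back into tuples."""
    if len(d) == 1 and TUPLE_KEY in d:
        return tuple(d[TUPLE_KEY])
    return d
```

Since object_hook is applied to inner maps before outer ones, nested tuples are restored bottom-up. This assumes map keys come back as str (e.g. unpackb(..., raw=False, object_hook=untag)).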

use PyMapping_Check rather than PyDict_Check

Be more relaxed about the mapping protocol.
We probably only require len() and items(), so PyMapping_Check should be enough.
For example, rbtree (https://bitbucket.org/bcsaller/rbtree/) does not pass PyDict_Check, but it does pass PyMapping_Check.

In Japanese (translated):

I am Japanese, and since my awkward English alone could cause misunderstandings, I am also writing this in Japanese.

For example, I would like to serialize an rbtree (https://bitbucket.org/bcsaller/rbtree/) as a MAP type. When serializing for a C++ counterpart, following the STL model (ordered by key) is ideal (for example, keys can then be searched with upper_bound), but PyDict_Check is false for an rbtree. With PyMapping_Check, it can be serialized without any problem. I think all that is needed is len and items(), so PyMapping_Check is probably sufficient.

SystemError thrown during unpacking invalid values

Trying to unpack invalid values causes Python to raise a SystemError:

>>> import msgpack
>>> msgpack.unpackb('\xd9\x97#DL_')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
SystemError: error return without exception set

`StopIteration` should only be raised in `__next__`

At present, Unpacker.unpack and Unpacker.skip raise StopIteration, despite the fact that they are not necessarily used in iteration.

Raising StopIteration outside of __next__ methods can wreak havoc on debugging. If one uses Unpacker.unpack somewhere within a generator (either a generator expression or a generator function) and does not handle StopIteration, the surrounding generator will catch the StopIteration and end silently.

I propose that Unpacker.unpack and Unpacker.skip (and read_array/map_header when they are merged) raise msgpack.OutOfData, while Unpacker.__next__ raises StopIteration alone.

This will make working with msgpack within iterating contexts more explicit, and debugging with generators much simpler!
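The silent truncation described above can be reproduced with the standard library alone. In this sketch, the two read_one_* functions are hypothetical stand-ins for Unpacker.unpack. Since Python 3.7 (PEP 479), a StopIteration escaping a generator body becomes RuntimeError, but consumers such as map() still treat it as normal exhaustion, so the caller just gets a shorter list. A dedicated exception (named OutOfData here, mirroring the proposal) propagates instead.

```python
class OutOfData(Exception):
    """Raised when a buffer holds no complete object (hypothetical)."""


def read_one_stopiteration(buf):
    # Hypothetical unpack(): pops the first item, or signals "empty" badly.
    if not buf:
        raise StopIteration
    return buf.pop(0)


def read_one_outofdata(buf):
    # Same helper, but signals "empty" with a dedicated exception.
    if not buf:
        raise OutOfData("no more data")
    return buf.pop(0)


buffers = [[1], [], [3]]  # the middle buffer is unexpectedly empty

# map() sees StopIteration from the callback and stops early, silently.
truncated = list(map(read_one_stopiteration, [list(b) for b in buffers]))

# With OutOfData the problem surfaces as an exception instead.
try:
    list(map(read_one_outofdata, [list(b) for b in buffers]))
    surfaced = False
except OutOfData:
    surfaced = True
```

Here truncated contains only the first item, and the third buffer's item is lost without any error.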

unpacker.tell() like in file.tell()

When reading a large file of MessagePack-encoded objects, it is important to know each object's offset.

f.tell() returns the offset of the read buffer, not the actual offset of the object.

It would be great to have something like this:

unpacker = msgpack.Unpacker(stream)
for obj in unpacker:
    offset = unpacker.tell()

Or even this:

unpacker = msgpack.Unpacker(stream, tell=True)
for offset, obj in unpacker:
    # ...
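Until such an API exists, offsets can be computed by walking the encoded bytes. The sketch below is hypothetical (object_end and iter_offsets are not msgpack-python functions), supports only a small subset of the format (fixint, nil, booleans, uint 8/16/32, fixstr, fixarray, fixmap), and reads from an in-memory bytes object rather than a stream.

```python
# Extra bytes after the type byte for fixed-size uint formats.
FIXED_SIZES = {0xCC: 1, 0xCD: 2, 0xCE: 4}


def object_end(data: bytes, pos: int) -> int:
    """Return the offset just past the object that starts at pos."""
    if pos >= len(data):
        raise ValueError("truncated data")
    b = data[pos]
    pos += 1
    if b <= 0x7F or b >= 0xE0 or b in (0xC0, 0xC2, 0xC3):
        return pos                              # fixint, nil, false, true
    if 0xA0 <= b <= 0xBF:
        return pos + (b & 0x1F)                 # fixstr: skip its bytes
    if b in FIXED_SIZES:
        return pos + FIXED_SIZES[b]             # uint 8 / 16 / 32
    if 0x80 <= b <= 0x9F:
        # fixmap (0x8N) holds N key/value pairs; fixarray (0x9N) holds N items.
        children = (b & 0x0F) * (2 if b <= 0x8F else 1)
        for _ in range(children):
            pos = object_end(data, pos)
        return pos
    raise ValueError(f"unsupported type byte 0x{b:02x}")


def iter_offsets(data: bytes):
    """Yield (offset, raw_bytes) for each top-level object in data."""
    pos = 0
    while pos < len(data):
        end = object_end(data, pos)
        if end > len(data):
            raise ValueError(f"truncated object at offset {pos}")
        yield pos, data[pos:end]
        pos = end
```

Each raw slice can then be passed to msgpack.unpackb(), giving both the offset and the decoded object.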

OverflowError on Unpacker.unpack in 32bit system

I'm having an issue unpacking even a small message (on OSX 10.5; 32-bit).

Under the recent changes to Unpacker, it seems read_size is more-or-less ignored: the maximum quantity of bytes read is now:

max(self.read_size, self.max_buffer_size - (self.buf_tail - self.buf_head))

where max_buffer_size is INT_MAX by default. With initial buf_head = buf_tail = 0, calling file_like_read(max_buffer_size) triggers an 'OverflowError: string is too large'.

My system seems happy with max_buffer_size == INT_MAX - 21.

But I can't understand the purpose of read_size, especially since its use is inconsistent with its description in Unpacker's docstring. The docstring's description of max_buffer_size is also unclear.
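With an empty buffer, the quoted expression evaluates to max(read_size, max_buffer_size), i.e. INT_MAX bytes by default, which is where the oversized read comes from. Assuming the intent is that read_size is the per-read amount, capped by the space left in the buffer, the computation would use min() instead. The helper below is hypothetical and only contrasts the two readings; it is not the library's code.

```python
INT_MAX = 2**31 - 1  # default max_buffer_size, per the report


def bytes_to_request(read_size, max_buffer_size, buf_head, buf_tail):
    """Return (reported, capped) read sizes for file_like.read() (sketch)."""
    free = max_buffer_size - (buf_tail - buf_head)
    # Behaviour quoted in the report: max() ignores read_size when free is large.
    reported = max(read_size, free)
    # Assumed intent: read read_size bytes, but never overflow the buffer.
    capped = min(read_size, free)
    return reported, capped
```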

Make fallback API match Cython's for attributes

Currently fallback.Unpacker has a list_hook attribute publicly available. It would not be surprising if a user decided to change the value of list_hook after the Unpacker was constructed, but such functionality is not compatible with the Cython version.

Hence, we should probably prefix the attributes of Unpacker and Packer with _ to make it clear that they are not to be used without caution.
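Besides the underscore prefix, the pure-Python class could expose the value through a read-only property, so a late assignment fails loudly instead of silently behaving differently from the Cython build. UnpackerSketch below is a hypothetical, trimmed-down stand-in for fallback.Unpacker, not its actual code.

```python
class UnpackerSketch:
    """Trimmed-down stand-in for fallback.Unpacker (hypothetical)."""

    def __init__(self, list_hook=None):
        self._list_hook = list_hook  # set once, at construction time

    @property
    def list_hook(self):
        # Read-only: there is no setter, so assignment raises AttributeError.
        return self._list_hook
```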

segfault when using hooks with Unpacker

Unlike the one-shot unpacker, when you're using the streaming Unpacker class, the reference count on callback functions (e.g., object_hook) needs to be incremented.

As it stands in 0.3.0, passing a callback with limited lifetime (e.g., an instance method) results in segfaults or SystemErrors.

msgpack.loads hangs for a long time on invalid input

Minimal reproducible example:

from msgpack import loads

# \xdd              array 32 header
# \xff\x00\x00\x00  big-endian element count: 0xff000000 = 4278190080
# (nothing)         no element data follows
s = b"\xdd\xff\x00\x00\x00"
loads(s)

In this example, loads consumes a lot of memory and would only fail years later.
It looks like str 32 and map 32 are not affected.
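The header above declares 0xff000000 = 4,278,190,080 elements while no element bytes follow. Since every MessagePack object occupies at least one byte, a decoder could reject such a header up front by comparing the declared count with the bytes remaining. The sketch below shows that sanity check for the array 32 header only; check_array32 is a hypothetical helper, not part of msgpack-python.

```python
import struct


def check_array32(data: bytes, pos: int = 0) -> int:
    """Validate an array 32 header at pos and return its element count."""
    if data[pos] != 0xDD:
        raise ValueError("not an array 32 header")
    if len(data) < pos + 5:
        raise ValueError("truncated array 32 header")
    (count,) = struct.unpack_from(">I", data, pos + 1)  # big-endian uint32
    remaining = len(data) - (pos + 5)
    # Each element needs at least one byte, so more elements than bytes is invalid.
    if count > remaining:
        raise ValueError(
            f"array 32 declares {count} elements but only {remaining} bytes follow"
        )
    return count
```

This check only works when the whole message is in memory; a streaming decoder would instead need an explicit limit on container sizes.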

pip installation fails on OS X

If installed via pip install msgpack-python, it fails:

> pip install -U msgpack-python
Collecting msgpack-python
  Using cached msgpack-python-0.4.3.tar.gz
Installing collected packages: msgpack-python
  Running setup.py install for msgpack-python
    building 'msgpack._packer' extension
    clang -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -D__LITTLE_ENDIAN__=1 -I. -I/Users/ivansmirnov/.pyenv/versions/2.7.9/include/python2.7 -c msgpack/_packer.cpp -o build/temp.macosx-10.10-x86_64-2.7/msgpack/_packer.o
    msgpack/_packer.pyx:204:21: error: call to 'msgpack_pack_ext' is ambiguous
          __pyx_v_ret = msgpack_pack_ext((&__pyx_v_self->pk), __pyx_v_longval, __pyx_v_L);
                        ^~~~~~~~~~~~~~~~
    msgpack/pack.h:74:19: note: candidate function
    static inline int msgpack_pack_ext(msgpack_packer* pk, char typecode, size_t l);
                      ^
    msgpack/pack_template.h:715:19: note: candidate function
    static inline int msgpack_pack_ext(msgpack_packer* x, int8_t typecode, size_t l)
                      ^
    1 error generated.
    WARNING: Failed to compile extensiom modules.
    msgpack uses fallback pure python implementation.
    command 'clang' failed with exit status 1
    building 'msgpack._unpacker' extension
    clang -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -D__LITTLE_ENDIAN__=1 -I. -I/Users/ivansmirnov/.pyenv/versions/2.7.9/include/python2.7 -c msgpack/_unpacker.cpp -o build/temp.macosx-10.10-x86_64-2.7/msgpack/_unpacker.o
    In file included from msgpack/_unpacker.cpp:241:
    msgpack/unpack.h:245:45: warning: conversion from string literal to 'char *' is deprecated [-Wc++11-compat-deprecated-writable-strings]
        py = PyObject_CallFunction(u->ext_hook, "(is#)", typecode, pos, lenght-1);
                                                ^
    cython_utility:1285:28: warning: unused function '__Pyx_PyObject_AsString' [-Wunused-function]
    static CYTHON_INLINE char* __Pyx_PyObject_AsString(PyObject* o) {
                               ^
    cython_utility:1282:32: warning: unused function '__Pyx_PyUnicode_FromString' [-Wunused-function]
    static CYTHON_INLINE PyObject* __Pyx_PyUnicode_FromString(const char* c_str) {
                                   ^
    msgpack/_unpacker.cpp:303:29: warning: unused function '__Pyx_Py_UNICODE_strlen' [-Wunused-function]
    static CYTHON_INLINE size_t __Pyx_Py_UNICODE_strlen(const Py_UNICODE *u)
                                ^
    cython_utility:600:27: warning: unused function '__Pyx_ExceptionSave' [-Wunused-function]
    static CYTHON_INLINE void __Pyx_ExceptionSave(PyObject **type, PyObject **value, PyObject **tb) {
                              ^
    cython_utility:1035:32: warning: unused function '__Pyx_PyInt_From_long' [-Wunused-function]
    static CYTHON_INLINE PyObject* __Pyx_PyInt_From_long(long value) {
                                   ^
    cython_utility:1061:27: warning: function '__Pyx_PyInt_As_long' is not needed and will not be emitted [-Wunneeded-internal-declaration]
    static CYTHON_INLINE long __Pyx_PyInt_As_long(PyObject *x) {
                              ^
    7 warnings generated.
    c++ -bundle -undefined dynamic_lookup -L/usr/local/opt/readline/lib -L/usr/local/opt/readline/lib -L/Users/ivansmirnov/.pyenv/versions/2.7.9/lib build/temp.macosx-10.10-x86_64-2.7/msgpack/_unpacker.o -o build/lib.macosx-10.10-x86_64-2.7/msgpack/_unpacker.so
Successfully installed msgpack-python-0.4.3

However, it seems to work when installed directly from source with python setup.py build_ext followed by python setup.py install:

> python setup.py build_ext
running build_ext
cythonize: 'msgpack/_packer.pyx'
building 'msgpack._packer' extension
creating build
creating build/temp.macosx-10.10-x86_64-2.7
creating build/temp.macosx-10.10-x86_64-2.7/msgpack
clang -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -D__LITTLE_ENDIAN__=1 -I. -I/Users/ivansmirnov/.pyenv/versions/2.7.9/include/python2.7 -c msgpack/_packer.cpp -o build/temp.macosx-10.10-x86_64-2.7/msgpack/_packer.o
cython_utility:1746:32: warning: unused function '__Pyx_PyUnicode_FromString' [-Wunused-function]
static CYTHON_INLINE PyObject* __Pyx_PyUnicode_FromString(const char* c_str) {
                               ^
msgpack/_packer.cpp:303:29: warning: unused function '__Pyx_Py_UNICODE_strlen' [-Wunused-function]
static CYTHON_INLINE size_t __Pyx_Py_UNICODE_strlen(const Py_UNICODE *u)
                            ^
cython_utility:1861:33: warning: unused function '__Pyx_PyIndex_AsSsize_t' [-Wunused-function]
static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject* b) {
                                ^
cython_utility:132:27: warning: unused function '__Pyx_ErrFetch' [-Wunused-function]
static CYTHON_INLINE void __Pyx_ErrFetch(PyObject **type, PyObject **value, PyObject **tb) {
                          ^
cython_utility:1499:32: warning: unused function '__Pyx_PyInt_From_long' [-Wunused-function]
static CYTHON_INLINE PyObject* __Pyx_PyInt_From_long(long value) {
                               ^
cython_utility:1525:26: warning: function '__Pyx_PyInt_As_int' is not needed and will not be emitted [-Wunneeded-internal-declaration]
static CYTHON_INLINE int __Pyx_PyInt_As_int(PyObject *x) {
                         ^
6 warnings generated.
creating build/lib.macosx-10.10-x86_64-2.7
creating build/lib.macosx-10.10-x86_64-2.7/msgpack
c++ -bundle -undefined dynamic_lookup -L/usr/local/opt/readline/lib -L/usr/local/opt/readline/lib -L/Users/ivansmirnov/.pyenv/versions/2.7.9/lib build/temp.macosx-10.10-x86_64-2.7/msgpack/_packer.o -o build/lib.macosx-10.10-x86_64-2.7/msgpack/_packer.so
cythonize: 'msgpack/_unpacker.pyx'
building 'msgpack._unpacker' extension
clang -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -D__LITTLE_ENDIAN__=1 -I. -I/Users/ivansmirnov/.pyenv/versions/2.7.9/include/python2.7 -c msgpack/_unpacker.cpp -o build/temp.macosx-10.10-x86_64-2.7/msgpack/_unpacker.o
In file included from msgpack/_unpacker.cpp:241:
msgpack/unpack.h:245:45: warning: conversion from string literal to 'char *' is deprecated [-Wc++11-compat-deprecated-writable-strings]
    py = PyObject_CallFunction(u->ext_hook, "(is#)", typecode, pos, lenght-1);
                                            ^
cython_utility:1285:28: warning: unused function '__Pyx_PyObject_AsString' [-Wunused-function]
static CYTHON_INLINE char* __Pyx_PyObject_AsString(PyObject* o) {
                           ^
cython_utility:1282:32: warning: unused function '__Pyx_PyUnicode_FromString' [-Wunused-function]
static CYTHON_INLINE PyObject* __Pyx_PyUnicode_FromString(const char* c_str) {
                               ^
msgpack/_unpacker.cpp:303:29: warning: unused function '__Pyx_Py_UNICODE_strlen' [-Wunused-function]
static CYTHON_INLINE size_t __Pyx_Py_UNICODE_strlen(const Py_UNICODE *u)
                            ^
cython_utility:600:27: warning: unused function '__Pyx_ExceptionSave' [-Wunused-function]
static CYTHON_INLINE void __Pyx_ExceptionSave(PyObject **type, PyObject **value, PyObject **tb) {
                          ^
cython_utility:1035:32: warning: unused function '__Pyx_PyInt_From_long' [-Wunused-function]
static CYTHON_INLINE PyObject* __Pyx_PyInt_From_long(long value) {
                               ^
cython_utility:1061:27: warning: function '__Pyx_PyInt_As_long' is not needed and will not be emitted [-Wunneeded-internal-declaration]
static CYTHON_INLINE long __Pyx_PyInt_As_long(PyObject *x) {
                          ^
7 warnings generated.
c++ -bundle -undefined dynamic_lookup -L/usr/local/opt/readline/lib -L/usr/local/opt/readline/lib -L/Users/ivansmirnov/.pyenv/versions/2.7.9/lib build/temp.macosx-10.10-x86_64-2.7/msgpack/_unpacker.o -o build/lib.macosx-10.10-x86_64-2.7/msgpack/_unpacker.so
> python setup.py install
running install
running bdist_egg
running egg_info
creating msgpack_python.egg-info
writing msgpack_python.egg-info/PKG-INFO
writing top-level names to msgpack_python.egg-info/top_level.txt
writing dependency_links to msgpack_python.egg-info/dependency_links.txt
writing pbr to msgpack_python.egg-info/pbr.json
writing manifest file 'msgpack_python.egg-info/SOURCES.txt'
reading manifest file 'msgpack_python.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching '*.c' under directory 'msgpack'
writing manifest file 'msgpack_python.egg-info/SOURCES.txt'
installing library code to build/bdist.macosx-10.10-x86_64/egg
running install_lib
running build_py
copying msgpack/__init__.py -> build/lib.macosx-10.10-x86_64-2.7/msgpack
copying msgpack/_version.py -> build/lib.macosx-10.10-x86_64-2.7/msgpack
copying msgpack/exceptions.py -> build/lib.macosx-10.10-x86_64-2.7/msgpack
copying msgpack/fallback.py -> build/lib.macosx-10.10-x86_64-2.7/msgpack
running build_ext
creating build/bdist.macosx-10.10-x86_64
creating build/bdist.macosx-10.10-x86_64/egg
creating build/bdist.macosx-10.10-x86_64/egg/msgpack
copying build/lib.macosx-10.10-x86_64-2.7/msgpack/__init__.py -> build/bdist.macosx-10.10-x86_64/egg/msgpack
copying build/lib.macosx-10.10-x86_64-2.7/msgpack/_packer.so -> build/bdist.macosx-10.10-x86_64/egg/msgpack
copying build/lib.macosx-10.10-x86_64-2.7/msgpack/_unpacker.so -> build/bdist.macosx-10.10-x86_64/egg/msgpack
copying build/lib.macosx-10.10-x86_64-2.7/msgpack/_version.py -> build/bdist.macosx-10.10-x86_64/egg/msgpack
copying build/lib.macosx-10.10-x86_64-2.7/msgpack/exceptions.py -> build/bdist.macosx-10.10-x86_64/egg/msgpack
copying build/lib.macosx-10.10-x86_64-2.7/msgpack/fallback.py -> build/bdist.macosx-10.10-x86_64/egg/msgpack
byte-compiling build/bdist.macosx-10.10-x86_64/egg/msgpack/__init__.py to __init__.pyc
byte-compiling build/bdist.macosx-10.10-x86_64/egg/msgpack/_version.py to _version.pyc
byte-compiling build/bdist.macosx-10.10-x86_64/egg/msgpack/exceptions.py to exceptions.pyc
byte-compiling build/bdist.macosx-10.10-x86_64/egg/msgpack/fallback.py to fallback.pyc
creating stub loader for msgpack/_packer.so
creating stub loader for msgpack/_unpacker.so
byte-compiling build/bdist.macosx-10.10-x86_64/egg/msgpack/_packer.py to _packer.pyc
byte-compiling build/bdist.macosx-10.10-x86_64/egg/msgpack/_unpacker.py to _unpacker.pyc
creating build/bdist.macosx-10.10-x86_64/egg/EGG-INFO
copying msgpack_python.egg-info/PKG-INFO -> build/bdist.macosx-10.10-x86_64/egg/EGG-INFO
copying msgpack_python.egg-info/SOURCES.txt -> build/bdist.macosx-10.10-x86_64/egg/EGG-INFO
copying msgpack_python.egg-info/dependency_links.txt -> build/bdist.macosx-10.10-x86_64/egg/EGG-INFO
copying msgpack_python.egg-info/pbr.json -> build/bdist.macosx-10.10-x86_64/egg/EGG-INFO
copying msgpack_python.egg-info/top_level.txt -> build/bdist.macosx-10.10-x86_64/egg/EGG-INFO
writing build/bdist.macosx-10.10-x86_64/egg/EGG-INFO/native_libs.txt
zip_safe flag not set; analyzing archive contents...
creating dist
creating 'dist/msgpack_python-0.4.3-py2.7-macosx-10.10-x86_64.egg' and adding 'build/bdist.macosx-10.10-x86_64/egg' to it
removing 'build/bdist.macosx-10.10-x86_64/egg' (and everything under it)
Processing msgpack_python-0.4.3-py2.7-macosx-10.10-x86_64.egg
Copying msgpack_python-0.4.3-py2.7-macosx-10.10-x86_64.egg to /Users/ivansmirnov/.pyenv/versions/2.7.9/lib/python2.7/site-packages
Adding msgpack-python 0.4.3 to easy-install.pth file

Installed /Users/ivansmirnov/.pyenv/versions/2.7.9/lib/python2.7/site-packages/msgpack_python-0.4.3-py2.7-macosx-10.10-x86_64.egg
Processing dependencies for msgpack-python==0.4.3
Finished processing dependencies for msgpack-python==0.4.3

msgpack debugging symbols

It would be nice if there was documentation explaining how to build and install a version with debugging symbols since one is not available either through pip or apt-get

Currently I'm trying to use msgpack with python2.7-dbg, and I get the following:

Traceback (most recent call last):
  File "node.py", line 4, in <module>
    import msgpack
  File "/usr/local/lib/python2.7/dist-packages/msgpack/__init__.py", line 3, in <module>
    from msgpack._msgpack import *
ImportError: /usr/local/lib/python2.7/dist-packages/msgpack/_msgpack.so: undefined symbol: Py_InitModule4

Is it easy to rebuild the package with the debug symbols needed?
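A likely explanation, offered as an assumption rather than a confirmed diagnosis: in a `--with-pydebug` build of Python 2, the headers rename module-initialization symbols such as `Py_InitModule4` to a `TraceRefs` variant, so an extension compiled for the release interpreter cannot be loaded by `python2.7-dbg` and has to be rebuilt by the debug interpreter itself. The sketch below only tells you which kind of interpreter is running; `is_debug_interpreter` is a hypothetical helper built on the common idiom that `sys.gettotalrefcount` exists only in reference-debugging builds.

```python
import sys


def is_debug_interpreter():
    """Return True when running under a debug (--with-pydebug) CPython build.

    Hypothetical helper: relies on sys.gettotalrefcount(), which is only
    compiled into interpreters built with reference-count debugging.
    """
    return hasattr(sys, "gettotalrefcount")


if __name__ == "__main__":
    print("debug interpreter:", is_debug_interpreter())
```

If this prints `True`, building msgpack from source with that same interpreter (e.g. `python2.7-dbg setup.py install`) should produce an extension compiled against the matching symbols; this is a suggestion, not a procedure documented by msgpack.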

PyArg_ParseTuple fails with "s" if string contains \x00.

msgpack.packb('a'*40) creates '\xda\x00(aaaaaa'.

If a Python API uses PyArg_ParseTuple with the "s" format to get the input string, it raises TypeError.
(If it uses "s#", it works fine.)

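Example of C binding code:

static PyObject *some_func(PyObject *self, PyObject *args)
{
    char *data;
    /* "s" rejects strings that contain an embedded NUL byte */
    if (!PyArg_ParseTuple(args, "s", &data))
        return NULL;
    ...
}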

>>> test_str = msgpack.packb('a'*40)
>>> some_func(test_str)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "some_func.py", line xx, in xxx
    some_func(test_str)
TypeError: argument 1 must be string without null bytes, not str

I think this is by design.
Is there a good workaround for this issue?
It's difficult to guarantee that every API uses "s#".
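The NUL byte comes from the length field, not from the data: a 40-byte raw string gets a two-byte big-endian length, and 40 (0x0028) has a zero high byte, so packed output can contain `\x00` even when the input does not. The stdlib-only sketch below rebuilds that old-spec header with `struct` and shows one possible workaround, base64-encoding the packed bytes before handing them to an API that only accepts NUL-free strings. The workaround is an assumption about your setup, not something msgpack provides, and it grows the payload by about a third.

```python
import base64
import struct

# Old-spec "raw 16" header for a 40-byte string: 0xda + big-endian uint16 length.
header = struct.pack(">BH", 0xda, 40)
packed = header + b"a" * 40  # the bytes the report shows for packb('a'*40)

# The length 0x0028 has a zero high byte, so the payload contains a NUL.
has_nul = b"\x00" in packed

# Hypothetical workaround: base64 output never contains NUL bytes, so it can
# pass through C APIs that parse their argument with the "s" format.
safe = base64.b64encode(packed)
restored = base64.b64decode(safe)
```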

Format limitations unchecked

It seems that, at the moment, if you violate the spec's limits on what data can be packed, there is not enough error checking: you only find out that what you've stored is garbage once you try to deserialize it (if that is detected at all).

https://github.com/msgpack/msgpack/blob/master/spec.md#types-limitation

# This violates the spec; should fail here
>>> packed = msgpack.packb('\x00'*4294967296)

# No exception; packed is a str
>>> type(packed)
<type 'str'>

# And then unpacking does fail
>>> msgpack.unpackb(packed)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "_unpacker.pyx", line 123, in msgpack._unpacker.unpackb (msgpack/_unpacker.cpp:123)
msgpack.exceptions.ExtraData: unpack(b) received extra data.

In this case, Unpacker does not even raise the same exception as unpackb, so not only can you fail to detect that what you've tried to serialize is garbage, but you can also fail to detect what you're trying to deserialize is garbage.
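The linked spec caps the byte length of str/bin data at 2**32 - 1 because the length is stored in a 32-bit field. One plausible reading of the report, not confirmed by it: an unchecked packer writes only the low 32 bits of the length, so a 4294967296-byte string gets a length of 0, `unpackb` decodes an empty string, and everything after it is reported as extra data. The hypothetical `raw32_header` below performs the check the reporter asks for; it is a stdlib-only illustration, not part of msgpack.

```python
import struct

MAX_LENGTH = 2**32 - 1  # spec limit: lengths are stored in a 32-bit field


def raw32_header(length):
    """Build a 0xdb (raw 32 / str 32) header, rejecting lengths the spec forbids."""
    if not 0 <= length <= MAX_LENGTH:
        raise ValueError("length %d exceeds the MessagePack limit of 2**32 - 1" % length)
    return struct.pack(">BI", 0xdb, length)


# What an unchecked packer would write for a 4294967296-byte string:
# the length silently wraps around to zero.
wrapped = struct.pack(">BI", 0xdb, 4294967296 & 0xFFFFFFFF)
```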

MsgPack incorrectly serializes longs

msgpack-python seems to get confused when serializing longs just below 2**32:

>>> msgpack.unpackb(msgpack.packb(4294967295L))
-1

Found in msgpack-python 0.4.2
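4294967295 is the largest uint32, so it is packed as the 0xce (uint 32) marker followed by four 0xff bytes; `-1` is exactly what those four bytes give when read back as a signed 32-bit integer. A stdlib-only illustration with `struct` (not msgpack's own code):

```python
import struct

packed = struct.pack(">BI", 0xce, 4294967295)  # uint 32 marker + big-endian payload
payload = packed[1:]

as_unsigned = struct.unpack(">I", payload)[0]  # correct reading
as_signed = struct.unpack(">i", payload)[0]    # the reading the bug produces
```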

Surface size of unpacker's internal buffer

Could the size of an Unpacker's internal buffer be surfaced somewhere (either the size of the buffer's allocation or, more usefully, the amount of data buffered but not yet deserialized)? I'm trying to write an app where msgpack buffers are accepted from untrusted users, and I want to prevent a memory blowup.

len(unpacker) would be a particularly nice way to do this, but really anything will do.
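Until something like `len(unpacker)` exists, a crude guard is to count bytes before they reach the unpacker. `BoundedFeed` below is a hypothetical wrapper, not part of msgpack: it refuses input once a fixed byte budget is exhausted. It tracks the total bytes fed rather than the bytes still buffered, so it is stricter than the measurement requested here. msgpack-python's `Unpacker` also takes a `max_buffer_size` argument, which is worth checking in the version you use.

```python
class BoundedFeed(object):
    """Hypothetical guard: cap the total bytes fed to an unpacker-like object."""

    def __init__(self, unpacker, limit):
        self.unpacker = unpacker  # anything with .feed(bytes) that is iterable
        self.limit = limit        # maximum total bytes accepted
        self.fed = 0              # bytes accepted so far

    def feed(self, data):
        # Reject the whole chunk if it would push us past the budget.
        if self.fed + len(data) > self.limit:
            raise ValueError("refusing %d more bytes: limit of %d reached"
                             % (len(data), self.limit))
        self.fed += len(data)
        self.unpacker.feed(data)

    def __iter__(self):
        return iter(self.unpacker)
```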

Segmentation fault occurred when data is unpacked using `object_hook`

A segmentation fault occurs when I execute the following script:

import datetime

import msgpack

useful_dict = {
    "id": 1,
    "nickname": "foo",
    "created": datetime.datetime.now(),
    "updated": datetime.datetime.now(),
}
def decode_datetime(obj):
    if b'__datetime__' in obj:
        obj = datetime.datetime.strptime("%Y%m%dT%H:%M:%S.%f")
    return obj

def encode_datetime(obj):
    if isinstance(obj, datetime.datetime):
        return {'__datetime__': True, 'as_str': obj.strftime("%Y%m%dT%H:%M:%S.%f")}
    return obj

print("Dict before packing: %s" % str(useful_dict))
packed_dict = msgpack.packb(useful_dict, default=encode_datetime)
this_dict_again = msgpack.unpackb(packed_dict, object_hook=decode_datetime)
print("Dict after packing/unpacking: %s" % str(this_dict_again))
python test_default.py  
Dict before packing: {'updated': datetime.datetime(2012, 10, 12, 10, 47, 34, 405448), 'nickname': 'foo', 'id': 1, 'created': datetime.datetime(2012, 10, 12, 10, 47, 34, 405413)}
[1]    10601 segmentation fault (core dumped)  python test_default.py

After a short investigation with gdb and a look at the source code, the reason became clear.
It happens because of insufficient error checking during the unpacking process. The problem occurs when a user data structure has been packed into a dictionary.
I've solved the problem in my forked repo and will try to create a pull request.
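Separately from the crash, `decode_datetime` in the script calls `strptime` with only the format string, so it raises `TypeError` as soon as it sees a marker dict; an exception escaping `object_hook` is plausibly what the unchecked C path tripped over. For comparison, a corrected pair of hooks might look like this. It assumes the marker keys can arrive either as `bytes` (e.g. msgpack-python 0.4 on Python 2) or as `str`, and it raises `TypeError` from the `default` hook for unsupported types instead of returning the object unchanged.

```python
import datetime

FMT = "%Y%m%dT%H:%M:%S.%f"


def encode_datetime(obj):
    """default= hook: replace datetimes with a tagged dict msgpack can pack."""
    if isinstance(obj, datetime.datetime):
        return {"__datetime__": True, "as_str": obj.strftime(FMT)}
    raise TypeError("cannot serialize %r" % (obj,))


def decode_datetime(obj):
    """object_hook= hook: turn tagged dicts back into datetimes."""
    # Keys may be bytes or str depending on msgpack version/settings.
    for marker, key in ((b"__datetime__", b"as_str"), ("__datetime__", "as_str")):
        if marker in obj:
            text = obj[key]
            if isinstance(text, bytes):
                text = text.decode("ascii")
            return datetime.datetime.strptime(text, FMT)
    return obj  # not a tagged datetime: pass through unchanged
```

These would be passed exactly as in the report: `msgpack.packb(useful_dict, default=encode_datetime)` and `msgpack.unpackb(packed_dict, object_hook=decode_datetime)`.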

PIP Build Error

Both pip and easy_install fail when trying to install msgpack.

Here are the outputs:

$ sudo pip install msgpack-python
Downloading/unpacking msgpack-python
  Running setup.py egg_info for package msgpack-python

Installing collected packages: msgpack-python
  Running setup.py install for msgpack-python
    building 'msgpack._msgpack' extension
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -D__LITTLE_ENDIAN__=1 -I. -I/usr/include/python2.7 -c msgpack/_msgpack.c -o build/temp.linux-armv6l-2.7/msgpack/_msgpack.o
    msgpack/_msgpack.c:4:20: fatal error: Python.h: No such file or directory
    compilation terminated.
    error: command 'gcc' failed with exit status 1
    Complete output from command /usr/bin/python -c "import setuptools;__file__='/tmp/pip-build/msgpack-python/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-8gbdyQ-record/install-record.txt --single-version-externally-managed:
    running install

running build

running build_py

running build_ext

building 'msgpack._msgpack' extension

gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -D__LITTLE_ENDIAN__=1 -I. -I/usr/include/python2.7 -c msgpack/_msgpack.c -o build/temp.linux-armv6l-2.7/msgpack/_msgpack.o

msgpack/_msgpack.c:4:20: fatal error: Python.h: No such file or directory

compilation terminated.

error: command 'gcc' failed with exit status 1

----------------------------------------
Command /usr/bin/python -c "import setuptools;__file__='/tmp/pip-build/msgpack-python/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-8gbdyQ-record/install-record.txt --single-version-externally-managed failed with error code 1 in /tmp/pip-build/msgpack-python
Storing complete log in /root/.pip/pip.log



$ sudo easy_install msgpack-python
Searching for msgpack-python
Reading http://pypi.python.org/simple/msgpack-python/
Reading http://msgpack.org/
Reading http://msgpack.sourceforge.net/
Reading http://msgpack.sourceforge.jp/
Best match: msgpack-python 0.2.4
Downloading http://pypi.python.org/packages/source/m/msgpack-python/msgpack-python-0.2.4.tar.gz#md5=c4bb313cd35b57319f588491b1614289
Processing msgpack-python-0.2.4.tar.gz
Running msgpack-python-0.2.4/setup.py -q bdist_egg --dist-dir /tmp/easy_install-QD613R/msgpack-python-0.2.4/egg-dist-tmp-6Uv4J9
msgpack/_msgpack.c:4:20: fatal error: Python.h: No such file or directory
compilation terminated.
error: Setup script exited with error: command 'gcc' failed with exit status 1
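`fatal error: Python.h: No such file or directory` means the C compiler cannot find CPython's development headers, which many Linux distributions ship as a separate package (for Python 2.7 on Debian-like systems, typically `python-dev` or `python2.7-dev`). A quick stdlib check of whether the running interpreter's headers are installed, using `sysconfig`; `python_headers_present` is a hypothetical helper:

```python
import os
import sysconfig


def python_headers_present():
    """Return True if Python.h exists in this interpreter's include directory."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


if __name__ == "__main__":
    print(sysconfig.get_paths()["include"], python_headers_present())
```

Run it with the same interpreter pip uses; `False` suggests the development headers package is missing.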

error: invalid command 'egg_info'

➜ sudo pip install msgpack-python
Password:
Downloading/unpacking msgpack-python
  Downloading msgpack-python-0.4.0.tar.gz (111kB): 111kB downloaded
  Running setup.py egg_info for package msgpack-python
    usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
       or: -c --help [cmd1 cmd2 ...]
       or: -c --help-commands
       or: -c cmd --help

    error: invalid command 'egg_info'
    Complete output from command python setup.py egg_info:
    usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]

   or: -c --help [cmd1 cmd2 ...]

   or: -c --help-commands

   or: -c cmd --help



error: invalid command 'egg_info'

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /private/tmp/pip_build_root/msgpack-python
➜ python --version
Python 3.3.0

➜ pip --version
pip 1.4.1 from /opt/local/Library/Frameworks/Python.framework/Versions/3.3/lib/python3.3/site-packages (python 3.3)

Windows test failure with large integers

When compiled with MinGW on Python 2.7, three tests fail, all seemingly related to large integers:

___________________________________ test_5 ____________________________________

    def test_5():
        for o in [1 << 16, (1 << 32) - 1,
                  -((1<<15)+1), -(1<<31)]:
>           check(5, o)

test\test_case.py:32:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

length = 5, obj = 4294967295L

    def check(length, obj):
        v = packb(obj)
        assert len(v) == length, \
            "%r length should be %r but get %r" % (obj, length, len(v))
>       assert unpackb(v, use_list=0) == obj
E       assert -1 == 4294967295L
E        +  where -1 = unpackb('\xce\xff\xff\xff\xff', use_list=0)

test\test_case.py:11: AssertionError
_______________________________ testUnsignedInt _______________________________

    def testUnsignedInt():
        check(
              b"\x99\xcc\x00\xcc\x80\xcc\xff\xcd\x00\x00\xcd\x80\x00"
              b"\xcd\xff\xff\xce\x00\x00\x00\x00\xce\x80\x00\x00\x00"
              b"\xce\xff\xff\xff\xff",
>             (0, 128, 255, 0, 32768, 65535, 0, 2147483648, 4294967295,),
              )

test\test_format.py:39:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

src = '\x99\xcc\x00\xcc\x80\xcc\xff\xcd\x00\x00\xcd\x80\x00\xcd\xff\xff\xce\x00\x00\x00\x00\xce\x80\x00\x00\x00\xce\xff\xff\xff\xff'
should = (0, 128, 255, 0, 32768, 65535, ...), use_list = 0

    def check(src, should, use_list=0):
>       assert unpackb(src, use_list=use_list) == should
E       assert (0, 128, 255,...8, 65535, ...) == (0, 128, 255, ...8, 65535, ...)
E         At index 7 diff: -2147483648 != 2147483648L

test\test_format.py:7: AssertionError
__________________________________ testPack ___________________________________

    def testPack():
        test_data = [
                0, 1, 127, 128, 255, 256, 65535, 65536, 4294967295, 4294967296,
                -1, -32, -33, -128, -129, -32768, -32769, -4294967296, -4294967297,
                1.0,
            b"", b"a", b"a"*31, b"a"*32,
            None, True, False,
            (), ((),), ((), None,),
            {None: 0},
            (1<<23),
            ]
        for td in test_data:
>           check(td)

test\test_pack.py:28:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

data = 4294967295L, use_list = False

    def check(data, use_list=False):
        re = unpackb(packb(data), use_list=use_list)
>       assert re == data
E       assert -1 == 4294967295L

test\test_pack.py:14: AssertionError
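`test_5` checks the size msgpack picks for each integer: every value in its list needs the 5-byte uint 32 or int 32 form. A hypothetical helper that reproduces the spec's size rules (1 byte for fixints, then 2, 3, 5 or 9 bytes) makes those expected lengths easy to verify; it only computes sizes and is not msgpack's packer.

```python
def packed_int_size(n):
    """Bytes used by the smallest MessagePack integer encoding of n."""
    if 0 <= n <= 0x7f or -32 <= n < 0:
        return 1  # positive / negative fixint
    if n > 0:
        for size, limit in ((2, 0xff), (3, 0xffff), (5, 0xffffffff),
                            (9, 0xffffffffffffffff)):
            if n <= limit:
                return size  # uint 8 / 16 / 32 / 64
    else:
        for size, bits in ((2, 8), (3, 16), (5, 32), (9, 64)):
            if n >= -(1 << (bits - 1)):
                return size  # int 8 / 16 / 32 / 64
    raise OverflowError("%d does not fit in 64 bits" % n)
```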

How to make a class be able to get packed by msgpack

Hi,

I'm trying to use zerorpc, which uses msgpack. One of my custom classes can't be packed by msgpack. Here is the traceback:

msgpack.dumps(CcaAsyncMotorTask("test"))
*** TypeError: can't serialize <__main__.CcaAsyncMotorTask object at 0xf05890>

The same instance can be pickled:

pickle.dumps(CcaAsyncMotorTask("test"))
"ccopy_reg\n_reconstructor\np0\n(c__main__\nCcaAsyncMotorTask\np1\nc__builtin__\nobject\np2\nNtp3\nRp4\n(dp5\nS'_allow_exec_signals'\np6\n(dp7\nsS'current_state'\np8\nS'stopped'\np9\nsS'name'\np10\nS'test'\np11\nsS'_duration'\np12\nI0\nsS'_allow_exec_signal'\np13\ng0\n(ccca_signal\nCcaSignal\np14\ng2\nNtp15\nRp16\n(dp17\nS'_CcaSignal__val0'\np18\nI00\nsS'_CcaSignal__readers_got_last_change'\np19\n(dp20\nsS'_CcaSignal__val'\np21\nI00\nsbsb."

I don't know why msgpack cannot pack this instance. What should I change in the class?
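msgpack only packs basic types (None, booleans, numbers, strings/bytes, lists/tuples, dicts), and unlike pickle it does not walk an arbitrary object's attributes. One common approach, sketched here under stated assumptions, is to give `packb` a `default=` hook that converts the object to a dict and give `unpackb` an `object_hook=` that rebuilds it. `Task`, `encode_task` and `decode_task` are hypothetical stand-ins for `CcaAsyncMotorTask`; the decoder assumes `str` keys on unpacking, and whether zerorpc lets you pass these hooks through is not covered here.

```python
class Task(object):
    """Hypothetical stand-in for CcaAsyncMotorTask."""

    def __init__(self, name, current_state="stopped"):
        self.name = name
        self.current_state = current_state


def encode_task(obj):
    """default= hook: turn a Task into a plain dict that msgpack can pack."""
    if isinstance(obj, Task):
        return {"__task__": True, "name": obj.name,
                "current_state": obj.current_state}
    raise TypeError("cannot serialize %r" % (obj,))


def decode_task(d):
    """object_hook= hook: rebuild a Task from the tagged dict."""
    if d.get("__task__"):
        return Task(d["name"], d["current_state"])
    return d  # ordinary dicts pass through unchanged
```

With msgpack this would be used as `msgpack.packb(task, default=encode_task)` and `msgpack.unpackb(data, object_hook=decode_task, raw=False)` on recent releases. The pickle output shows nested `CcaSignal` objects as well; each such attribute type would need its own branch in the hooks.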

UINT32 IS BORKED

In 0.4.2 (and some prior versions), the C implementation of uint32 decoding is broken.

In unpack_callback_uint32 (from unpack.h), the line:
p = PyInt_FromLong((long)d);

causes unsigned 32-bit integers to unpack as negative values in Python,
i.e. the hex "ce f0 00 00 00" unpacks to -268435456 rather than 4026531840.

This can lead to catastrophic data corruption at the application layer when dealing with integers between 2 and 4 billion.

I would recommend changing it to:
p = PyLong_FromUnsignedLong((unsigned long)d);

with a possible check against INT32_MAX (not UINT32_MAX) to take the slightly smaller/faster PyInt_FromLong() path, but only when the value cannot be interpreted as a negative PyInt.
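The behaviour proposed above, reading the 0xce payload as unsigned, can be sketched in pure Python with `struct`'s unsigned formats. The hypothetical `decode_uint`/`decode_uints` pair handles only positive fixints, the four uint markers (0xcc, 0xcd, 0xce, 0xcf) and a fixarray of them, which is enough to decode the `ce f0 00 00 00` example and the `testUnsignedInt` bytes from the Windows report; it is not msgpack's decoder.

```python
import struct

# marker byte -> struct format for the big-endian unsigned payload
UINT_FORMATS = {0xcc: ">B", 0xcd: ">H", 0xce: ">I", 0xcf: ">Q"}


def decode_uint(buf, pos):
    """Decode one unsigned integer at buf[pos]; return (value, next_pos)."""
    marker = buf[pos]
    if marker <= 0x7f:  # positive fixint: the value is the marker itself
        return marker, pos + 1
    fmt = UINT_FORMATS[marker]  # KeyError for any other type
    (value,) = struct.unpack_from(fmt, buf, pos + 1)
    return value, pos + 1 + struct.calcsize(fmt)


def decode_uints(buf):
    """Decode a fixarray (0x90-0x9f) of unsigned integers into a tuple."""
    count = buf[0] - 0x90
    if not 0 <= count <= 15:
        raise ValueError("not a fixarray")
    values, pos = [], 1
    for _ in range(count):
        value, pos = decode_uint(buf, pos)
        values.append(value)
    return tuple(values)
```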

Fails to build on Mountain Lion

Mountain Lion has removed /usr/bin/gcc-4.2 (/usr/bin/gcc is now 4 by default) and hence "pip install msgpack-python" fails to build with the following error at "building 'msgpack._msgpack' extension":

unable to execute /usr/bin/gcc-4.2: No such file or directory

Creating a symlink to /usr/bin/gcc fixes this and completes the build:

sudo ln -s /usr/bin/gcc /usr/bin/gcc-4.2

It's not a nice fix, though; it would be better if the installer could detect the correct gcc version to use on Mountain Lion.

Reading corruption in Pure Python Unpacker implementation

After spending several hours debugging an archive corruption in Attic (jborg/attic#185), I found the following:

The pure-Python and C-optimized implementations of msgpack.Unpacker behave differently when you feed data several times. I have a small (100k) and a large (1.7M) reproducer. Unfortunately, both may contain private data. I can send them to a developer (collaborator) if you keep the files on your machine and do not share them publicly.

To illustrate what we are dealing with, the small reproducer looks like this:

import msgpack
unpacker = msgpack.Unpacker(use_list=False)
data = b'\xefM<SNIPPED>'

unpacker.feed(data)
print(len(list(unpacker)))
print(len(list(unpacker)))

unpacker.feed(data)
print(len(list(unpacker)))
print(len(list(unpacker)))

In Pure-Python, this yields:

5
5
345
5

In C, this yields:

5
0
340
0

The large reproducer has this structure:

from msgpack import Unpacker
unpacker = Unpacker()

# store has 14 bytestrings
store = [b'\x87\xa3gid....', ...]

for index, data in enumerate(store):
    print("Feed #%s" % index)
    unpacker.feed(data)
    print([type(item) for item in unpacker])

In Pure-Python, this yields:

Feed #0
[<class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>]
Feed #1
[<class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>]
Feed #2
[<class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>]
Feed #3
[<class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>]
Feed #4
[<class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>]
Feed #5
[<class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>]
Feed #6
[<class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>]
Feed #7
[<class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>]
Feed #8
[<class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>]
Feed #9
[<class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>]
Feed #10
[<class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>]
Feed #11
[<class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>]
Feed #12
[<class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>]
Feed #13
[<class 'int'>, <class 'int'>, <class 'dict'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, 
<class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, 
<class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, 
<class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, 
<class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, 
<class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, 
<class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, 
<class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'list'>, <class 'bytes'>, <class 'int'>, <class 'bytes'>, <class 'bytes'>, <class 'bytes'>, <class 'int'>, <class 'bytes'>, <class 'int'>, <class 'bytes'>, <class 'bytes'>, <class 'bytes'>, <class 'int'>, <class 'bytes'>, <class 'bytes'>, <class 'dict'>, <class 'dict'>, <class 'dict'>]

In C, this yields:

Feed #0
[<class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>]
Feed #1
[]
Feed #2
[]
Feed #3
[]
Feed #4
[]
Feed #5
[]
Feed #6
[]
Feed #7
[]
Feed #8
[]
Feed #9
[]
Feed #10
[]
Feed #11
[]
Feed #12
[<class 'dict'>, <class 'dict'>, <class 'dict'>, <class 'dict'>]
Feed #13
[]

Note that in the large reproducer, not only does it repeat, but in the last iteration it also starts returning corrupt values.
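The reproducer is checking an invariant: however a stream is split into chunks, a streaming unpacker should yield each complete object exactly once and never re-emit earlier ones. The sketch below illustrates that invariant with a hypothetical ToyStreamUnpacker, which is not msgpack's Unpacker and only understands positive fixint, fixstr and fixarray. The design point it shows is that decoding restarts from the first byte of an incomplete object, and bytes are consumed only after a whole object has been decoded.

```python
# Hypothetical toy streaming unpacker (NOT msgpack's Unpacker). It supports only
# positive fixint (0x00-0x7f), fixstr (0xa0-0xbf) and fixarray (0x90-0x9f).


class _Incomplete(Exception):
    """Raised when the buffer ends in the middle of an object."""


def _decode(buf, off):
    """Decode one object starting at buf[off]; return (value, new_offset)."""
    if off >= len(buf):
        raise _Incomplete
    b = buf[off]
    off += 1
    if b <= 0x7F:                      # positive fixint
        return b, off
    if 0xA0 <= b <= 0xBF:              # fixstr: length in the low 5 bits
        n = b & 0x1F
        if off + n > len(buf):
            raise _Incomplete
        return buf[off:off + n].decode("utf-8"), off + n
    if 0x90 <= b <= 0x9F:              # fixarray: element count in the low 4 bits
        items = []
        for _ in range(b & 0x0F):
            item, off = _decode(buf, off)
            items.append(item)
        return items, off
    raise ValueError("unsupported type byte 0x%02x" % b)


class ToyStreamUnpacker:
    def __init__(self):
        self._buf = b""

    def feed(self, data):
        self._buf += data

    def __iter__(self):
        # Yield complete objects only. A partial object stays in the buffer and
        # is decoded again from its first byte after the next feed().
        while True:
            try:
                obj, end = _decode(self._buf, 0)
            except _Incomplete:
                return
            self._buf = self._buf[end:]   # consume exactly the decoded bytes
            yield obj


# [1, "hi"], 7, ["x"] packed back to back, then fed in 3-byte chunks.
stream = b"\x92\x01\xa2hi" + b"\x07" + b"\x91\xa1x"
unpacker = ToyStreamUnpacker()
results = []
for i in range(0, len(stream), 3):
    unpacker.feed(stream[i:i + 3])
    results.extend(unpacker)
print(results)  # [[1, 'hi'], 7, ['x']]
```

Feeding the same stream one byte at a time should produce the same list; any chunking that yields duplicates or extra objects would indicate the kind of bug reported above.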

SystemError: error return without exception set

I've been getting errors with no exception set, where a call to msgpack.unpackb is the last line in the stack trace. Could some code path in the C implementation of the msgpack bindings be returning an error code without setting an exception (e.g. with PyErr_SetString)?

To reproduce:

msgpack.unpackb('\xd7\xa3p=\n\xb7|@')
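Under the current MessagePack specification, the first byte of this input, 0xd7, marks a fixext 8 object: one signed type byte plus 8 data bytes, 10 bytes in total including the marker. The reproducer supplies only 8 bytes, so the input is truncated, and a decoder should report that with a real exception. The sketch below is a hypothetical standalone check (read_fixext is not part of msgpack-python) showing that length check.

```python
# Hypothetical helper, based on the current MessagePack spec: 0xd4-0xd8 are the
# fixext markers, each followed by a signed 8-bit type code and 1/2/4/8/16 data bytes.
FIXEXT_SIZES = {0xD4: 1, 0xD5: 2, 0xD6: 4, 0xD7: 8, 0xD8: 16}


def read_fixext(buf):
    """Return (type_code, data) for a fixext object, raising ValueError
    (rather than failing without an exception) when the input is truncated."""
    if not buf:
        raise ValueError("empty input")
    marker = buf[0]
    if marker not in FIXEXT_SIZES:
        raise ValueError("not a fixext marker: 0x%02x" % marker)
    needed = 2 + FIXEXT_SIZES[marker]  # marker byte + type byte + payload
    if len(buf) < needed:
        raise ValueError("truncated fixext: need %d bytes, got %d" % (needed, len(buf)))
    type_code = int.from_bytes(buf[1:2], "big", signed=True)
    return type_code, bytes(buf[2:needed])


try:
    read_fixext(b'\xd7\xa3p=\n\xb7|@')
except ValueError as exc:
    print(exc)  # truncated fixext: need 10 bytes, got 8
```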

OverflowError: long too big to convert

In [3]: import msgpack

In [4]: msgpack.unpackb(msgpack.packb(45234928034723904723906L))
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-4-b73643f9a196> in <module>()
----> 1 msgpack.unpackb(msgpack.packb(45234928034723904723906L))

/Users/guy/.python-eggs/msgpack_python-0.3.0-py2.7-macosx-10.8-x86_64.egg-tmp/msgpack/_packer.so in msgpack._packer.packb (msgpack/_packer.cpp:259)()

/Users/guy/.python-eggs/msgpack_python-0.3.0-py2.7-macosx-10.8-x86_64.egg-tmp/msgpack/_packer.so in msgpack._packer.Packer.pack (msgpack/_packer.cpp:184)()

/Users/guy/.python-eggs/msgpack_python-0.3.0-py2.7-macosx-10.8-x86_64.egg-tmp/msgpack/_packer.so in msgpack._packer.Packer._pack (msgpack/_packer.cpp:124)()

OverflowError: long too big to convert
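MessagePack's widest integer formats are int 64 and uint 64, so a Python int outside the range [-2**63, 2**64 - 1] cannot be encoded as a MessagePack integer at all. The value in this report, 45234928034723904723906 (about 4.5e22), is far above 2**64 (about 1.8e19). The hypothetical helpers below (not part of msgpack-python) sketch a pre-flight range check that fails with a descriptive message before packing is attempted.

```python
# MessagePack integer range: int 64 covers the negative end, uint 64 the positive end.
MSGPACK_INT_MIN = -(2 ** 63)
MSGPACK_INT_MAX = 2 ** 64 - 1


def fits_msgpack_int(value):
    """Return True if `value` can be encoded as a MessagePack integer."""
    return MSGPACK_INT_MIN <= value <= MSGPACK_INT_MAX


def check_packable_int(value):
    """Hypothetical guard: raise a descriptive error instead of a late OverflowError."""
    if not fits_msgpack_int(value):
        raise OverflowError(
            "%d is outside the MessagePack integer range [%d, %d]"
            % (value, MSGPACK_INT_MIN, MSGPACK_INT_MAX)
        )
    return value


print(fits_msgpack_int(2 ** 64 - 1))              # True
print(fits_msgpack_int(45234928034723904723906))  # False
```

Values beyond this range have to be carried some other way (for example as a decimal string), which is an application-level choice rather than something the format provides.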

use original unpack template.

Unpacking a concatenated stream, as in the README example, makes msgpack-python use fallback.py, which is quite slow in my case. While trying to support concatenated streams with the Cython-based unpacker (unpack_template.h), I ran into the following odd issue:

In the original version, it seems unpack is meant to be used like this:

init_ctx(&ctx)
off = 0
while True:
    unpack_construct(&ctx, buf, buf_len, &off)
    data = unpack_data()

ctx.cs and ctx.trail become wrong after a few iterations.
It works after I changed it to this:

init_ctx(&ctx)
noff = 0
off = 0
while True:
    unpack_init(&ctx)
    unpack_construct(&ctx, buf + off, buf_len-off, &noff)
    off += noff
    data = unpack_data()

Using the original unpack_template.h this way, instead of fallback.py, may speed things up.
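The working loop (reset the context for each object, decode from buf + off, then advance off by the number of bytes consumed) can be mirrored in pure Python to make the offset bookkeeping explicit. The sketch below uses a hypothetical decode_one that only handles nil, booleans, positive fixint and fixstr; it is not unpack_template.h or fallback.py, just an illustration of the same pattern.

```python
def decode_one(buf, off):
    """Hypothetical decoder: decode one object at buf[off], return (value, bytes_consumed).

    It keeps no state between calls, which plays the role of calling
    unpack_init() before each unpack_construct(). Supports only nil,
    false/true, positive fixint and fixstr.
    """
    b = buf[off]
    if b == 0xC0:                      # nil
        return None, 1
    if b in (0xC2, 0xC3):              # false / true
        return b == 0xC3, 1
    if b <= 0x7F:                      # positive fixint
        return b, 1
    if 0xA0 <= b <= 0xBF:              # fixstr: length in the low 5 bits
        n = b & 0x1F
        if off + 1 + n > len(buf):
            raise ValueError("truncated fixstr at offset %d" % off)
        return buf[off + 1:off + 1 + n].decode("utf-8"), 1 + n
    raise ValueError("unsupported type byte 0x%02x at offset %d" % (b, off))


def unpack_concatenated(buf):
    """Yield every object from a buffer of back-to-back MessagePack values."""
    off = 0
    while off < len(buf):
        value, consumed = decode_one(buf, off)
        off += consumed                # advance past exactly one object
        yield value


print(list(unpack_concatenated(b"\x01\xa3abc\xc0\xc3")))  # [1, 'abc', None, True]
```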

Tests failure

Hello,

I used to run py.test test while packaging this package for Arch Linux. Since the last version (0.4.2), two tests fail with OverflowError. Is that expected? Can we expect a test suite that passes in future versions?

============================= test session starts ==============================
platform linux -- Python 3.4.0 -- py-1.4.20 -- pytest-2.5.2
collected 81 items

test/test_buffer.py ..
test/test_case.py .............
test/test_except.py ...
test/test_extension.py ...
test/test_format.py ..........
test/test_limits.py .FF
test/test_newspec.py .....
test/test_obj.py ........
test/test_pack.py ................
test/test_read_size.py ......
test/test_seq.py .
test/test_sequnpack.py .....
test/test_subtype.py .
test/test_unpack.py ...
test/test_unpack_raw.py ..

=================================== FAILURES ===================================
______________________________ test_array_header _______________________________

    def test_array_header():
        packer = Packer()
        packer.pack_array_header(2**32-1)
        with pytest.raises(ValueError):
>           packer.pack_array_header(2**32)

test/test_limits.py:25:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OverflowError: Python int too large to convert to C unsigned long

_packer.pyx:240: OverflowError
_______________________________ test_map_header ________________________________

    def test_map_header():
        packer = Packer()
        packer.pack_map_header(2**32-1)
        with pytest.raises(ValueError):
>           packer.pack_array_header(2**32)

test/test_limits.py:32:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   OverflowError: Python int too large to convert to C unsigned long

_packer.pyx:240: OverflowError
===================== 2 failed, 79 passed in 0.36 seconds ======================
==> ERROR: A failure occurred in check().
    Aborting...

==> ERROR: Build failed, check /var/lib/archbuild/extra-i686/seblu/build
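Both failing tests expect ValueError for 2**32, because the largest MessagePack array or map header (array 32 / map 32) holds at most 2**32 - 1 elements. The build path suggests an i686 system, where C unsigned long is 32 bits wide; that would explain why converting 2**32 overflows before the ValueError check is reached. The sketch below is a hypothetical pure-Python header encoder (not msgpack-python's Packer) that range-checks the Python int before any fixed-width conversion.

```python
import struct


def pack_array_header(n):
    """Hypothetical encoder for a MessagePack array header with n elements."""
    if not 0 <= n <= 0xFFFFFFFF:
        # Range-check the Python int first, so no C-level overflow can occur.
        raise ValueError("invalid MessagePack array length: %d" % n)
    if n <= 0x0F:
        return bytes([0x90 | n])               # fixarray
    if n <= 0xFFFF:
        return b"\xdc" + struct.pack(">H", n)  # array 16
    return b"\xdd" + struct.pack(">I", n)      # array 32


print(pack_array_header(3))           # b'\x93'
print(pack_array_header(2 ** 32 - 1)) # b'\xdd\xff\xff\xff\xff'
```

A map header follows the same layout with 0x80 (fixmap), 0xde (map 16) and 0xdf (map 32).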

python 3 support - proper symmetric encoding

When packing a dict with .packb in Python 3 and unpacking it again, all strings come back as bytes (binary arrays), even though the original dict's keys and values were str, not bytes. Example:

b'\x81\xb2__testInnerClass__\x84\xa1d\xa8istring2\xa1a\x01\xa1c\xa8istring1\xa1b\x02'

.unpackb returns:

{b'__testInnerClass__': {b'd': b'istring2', b'a': 1, b'c': b'istring1', b'b': 2}}

I had to write a function to work around this:

def convertMsgPack(obj):
    """
    This function accepts a single dict() containing a mix of binary arrays and other objects, returning them as standard strings and objects.
    The purpose of this function is to process MessagePack strings in Python 3 (otherwise unpacking them doesn't work properly).
    """
    def _convert(value):
        if isinstance(value, bytes):
            return value.decode()
        elif isinstance(value, dict):
            return convertMsgPack(value)
        else:
            return value

    # Keys go through _convert too, so non-bytes keys (e.g. ints) are left unchanged.
    return {_convert(key): _convert(value) for key, value in obj.items()}
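The workaround only recurses into dicts, but MessagePack arrays come back as lists, so bytes nested inside lists would be left undecoded. A slightly more general sketch is below; decode_bytes is a hypothetical name, and it assumes every bytes object is UTF-8 text. It cannot tell strings apart from genuine binary data, which is why decoding at unpack time is preferable where available: newer msgpack-python releases accept raw=False in unpackb (the default since 1.0), which returns str for MessagePack str types.

```python
def decode_bytes(obj, encoding="utf-8"):
    """Hypothetical helper: recursively convert bytes to str in dicts, lists and tuples.

    Assumes all bytes values are text in `encoding`; real binary payloads
    would fail to decode or be corrupted.
    """
    if isinstance(obj, bytes):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {decode_bytes(k, encoding): decode_bytes(v, encoding) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return type(obj)(decode_bytes(item, encoding) for item in obj)
    return obj  # ints, floats, None and bools pass through unchanged


unpacked = {b'__testInnerClass__': {b'd': b'istring2', b'a': 1, b'c': b'istring1', b'b': 2}}
print(decode_bytes(unpacked))
# {'__testInnerClass__': {'d': 'istring2', 'a': 1, 'c': 'istring1', 'b': 2}}
```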
