py-bson / bson

Independent BSON codec for Python that doesn't depend on MongoDB.

License: Other

Python 100.00%

bson's Issues

copy.deepcopy on DBRef error

Hi! In dbref.py instead of

    def __getattr__(self, key):
        return self.__kwargs[key]

please do:

    def __getattr__(self, key):
        if key.startswith('__'):
            raise AttributeError
        return self.__kwargs[key]

because until then it is impossible to copy.deepcopy a DBRef object. Please fix it as soon as you can, because my MongoEngine is broken: https://github.com/hmarr/mongoengine/issues/issue/119/ :-)
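A minimal sketch of why the guard matters, using hypothetical BrokenRef and FixedRef classes rather than the real DBRef: copy.deepcopy probes the object for `__deepcopy__` with getattr, and a `__getattr__` that lets KeyError escape (instead of AttributeError) makes that probe fail.

```python
import copy


class BrokenRef(object):
    """Hypothetical stand-in for DBRef without the guard."""

    def __init__(self, **kwargs):
        self.__kwargs = kwargs

    def __getattr__(self, key):
        # A missing key raises KeyError, which getattr(obj, name, default)
        # does not treat as "attribute not found".
        return self.__kwargs[key]


class FixedRef(object):
    """Hypothetical stand-in for DBRef with the proposed guard."""

    def __init__(self, **kwargs):
        self.__kwargs = kwargs

    def __getattr__(self, key):
        # Dunder probes made by copy/pickle get a proper AttributeError.
        if key.startswith('__'):
            raise AttributeError(key)
        return self.__kwargs[key]


clone = copy.deepcopy(FixedRef(a=1))
```

With the guard in place the clone exposes the same keys as the original; without it, deepcopy raises before any copying happens.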

thank you

decode_cstring is slow when decoding from the middle of a huge buffer

decode_cstring creates a slice of the input buffer every time it's called. Because the input buffer is large, the slice is large as well, even though only a few bytes need to be read.

For reasonable performance, I believe the slice should not be created.
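One way to avoid the slice, as a sketch: search for the terminator in place with bytes.index and copy only the string itself. The (data, base) signature and the (next_offset, text) return shape are assumptions here, not taken from the library.

```python
def decode_cstring(data, base):
    """Return (next_offset, text) for the NUL-terminated string at base."""
    # bytes.index scans in place starting at `base`; the tail of the
    # buffer is never copied.
    end = data.index(b"\x00", base)
    # Only the bytes of the string itself are sliced and decoded.
    return end + 1, data[base:end].decode("utf-8")
```

The cost then depends on the length of the string, not on how much of the buffer remains after it.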

Use of md5 for object IDs makes this module fail on FIPS-enabled machines

    pulp2_dev:   File "/usr/lib/python2.7/site-packages/mongoengine/document.py", line 2, in <module>
    pulp2_dev:     import pymongo
    pulp2_dev:   File "/usr/lib64/python2.7/site-packages/pymongo/__init__.py", line 83, in <module>
    pulp2_dev:     from pymongo.collection import ReturnDocument
    pulp2_dev:   File "/usr/lib64/python2.7/site-packages/pymongo/collection.py", line 21, in <module>
    pulp2_dev:     from bson.code import Code
    pulp2_dev:   File "/usr/lib64/python2.7/site-packages/bson/__init__.py", line 43, in <module>
    pulp2_dev:     from bson.objectid import ObjectId
    pulp2_dev:   File "/usr/lib64/python2.7/site-packages/bson/objectid.py", line 55, in <module>
    pulp2_dev:     class ObjectId(object):
    pulp2_dev:   File "/usr/lib64/python2.7/site-packages/bson/objectid.py", line 62, in ObjectId
    pulp2_dev:     _machine_bytes = _machine_bytes()
    pulp2_dev:   File "/usr/lib64/python2.7/site-packages/bson/objectid.py", line 38, in _machine_bytes
    pulp2_dev:     machine_hash = hashlib.md5()
    pulp2_dev: ValueError: error:060800A3:digital envelope routines:EVP_DigestInit_ex:disabled for fips

version 0.4.4 regression of issue #32

bson seems to have a regression from 0.4.3 to 0.4.4 which was originally reported in issue #32. Dumping/loading binary data doesn't work anymore. I'm able to reproduce this in python 3.4.3 (seen below) and python 3.5.1.

(venv343) kguilbert@kg-ubuntu:~$ pip install -q bson==0.4.3 && echo ok
ok
(venv343) kguilbert@kg-ubuntu:~$ python
Python 3.4.3 (default, Nov 17 2016, 01:08:31) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bson
>>> bson.loads(bson.dumps({'d':b'qwerty'}))
{'d': b'qwerty'}
>>> 
(venv343) kguilbert@kg-ubuntu:~$ pip install -q bson==0.4.4 && echo ok
ok
(venv343) kguilbert@kg-ubuntu:~$ python
Python 3.4.3 (default, Nov 17 2016, 01:08:31) 
[GCC 4.8.4] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import bson
>>> bson.loads(bson.dumps({'d':b'qwerty'}))
{'d': b''}
>>>

`str` method of ObjectId should return <type 'str'>

Sorry I wrote to the wrong place, please delete this issue.

encode_cstring accepts invalid input.

We found this after bug report https://bugs.launchpad.net/python-oops-datedir-repo/+bug/896959 was filed: some bson documents would not decode, even though they had encoded without error.

def encode_cstring(value):
    if isinstance(value, unicode):
        value = value.encode("utf8")
    elif isinstance(value, str):
        # check value is utf8.
        value.decode('utf8')
    else:
        raise TypeError('Invalid type for cstring %r' % value)
    return value + "\x00"

Used as a replacement, the encode_cstring above would avoid this failure mode, and a related one with non-string types that implement __add__.
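For reference, a Python 3 rendering of the same idea might look like the sketch below. The embedded-NUL check is an addition that is not part of the proposal above; it is included because a BSON cstring cannot contain a zero byte.

```python
def encode_cstring(value):
    """Encode a BSON cstring: UTF-8 bytes followed by a single NUL."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    elif isinstance(value, bytes):
        # Validation only: raises UnicodeDecodeError if not UTF-8.
        value.decode("utf-8")
    else:
        raise TypeError("Invalid type for cstring %r" % (value,))
    if b"\x00" in value:
        # Assumption beyond the original proposal: reject embedded NULs.
        raise ValueError("cstring must not contain NUL bytes")
    return value + b"\x00"
```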

decode_document fails silently on truncated document

TEST CASE

import bson

a = bson.dumps({'a': 1, 'c':'ab'})
print bson.loads(a)
print bson.loads(a[:20])

OUTPUT

{u'a': 1, u'c': 'ab'}
{u'a': 1, u'c': 'a'}

EXPECTED OUTPUT

{u'a': 1, u'c': 'ab'}

raise some exception
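A BSON document starts with its total size as a little-endian int32 and ends with a zero byte, so truncation can be detected before decoding. A minimal sketch, with a hypothetical helper name:

```python
import struct


def check_document_framing(data):
    """Raise ValueError if data is not exactly one framed BSON document."""
    if len(data) < 5:
        raise ValueError("document shorter than the 5-byte minimum")
    # The first four bytes hold the total size, prefix included.
    (declared,) = struct.unpack_from("<i", data, 0)
    if declared != len(data):
        raise ValueError("declared size %d, got %d bytes" % (declared, len(data)))
    if data[-1:] != b"\x00":
        raise ValueError("missing trailing NUL terminator")
```

Calling this on a[:20] from the test case would raise instead of returning a shortened value.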

Downgrade from 1.1.0 to 0.4.3?

Something weird happened to me. I don't know whether it is related to bson or to PyPI. I had bson 1.1.0 installed, and I am not dreaming: my requirements.txt shows bson==1.1.0, and I even checked with a colleague who did not upgrade, and he gets bson==1.1.0 from pip freeze | grep bson.
When I ran pip install --upgrade bson because pip install -r requirements.txt was failing, it installed bson 0.4.3, which is the "current" version.
So, what happened? Is this "downgrade" going to break many things in my code? Is it just a renaming? Is it related to you or to PyPI? What should I do?

Thank you in advance!

PS: Again, I am not dreaming: https://libraries.io/pypi/bson/1.1.0

Cannot import module Binary

Hi Community,

I'm pretty new to bson.
I installed bson using

sudo pip install bson

rosbridge_suite depends on bson, specifically bson.Binary.
After installing the latest version of bson, which is 0.5.1, it complains that

'module' object has no attribute 'Binary'
I worked around it myself, but it would be good to have an official fix for this problem.
I also tried to follow the Python API page
https://api.mongodb.com/python/2.7.2/api/bson/binary.html
to import Binary using

from bson.binary import Binary

It complains

from bson.binary import Binary
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named binary

I have fixed it with an ugly modification, but it would be great if anyone could tell me what happened!

bson.dumps() cannot encode a list of dicts

I have a JSON object which is a list of dicts (each dict describes an event and contains the event's parameters). bson.dumps(jsonListOfDicts) raises an AttributeError:

In [1]: import bson
In [2]: bson.dumps([{"a": [1, 2, 3]}, {"a": [2, 3, 4]}])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-23-ed4dc1257349> in <module>()
----> 1 bson.dumps([{"a": [1, 2, 3]}, {"a": [2, 3, 4]}])

/home/fabian/mount/data/LinuxDaten/software/anaconda/lib/python2.7/site-packages/bson/__init__.pyc in dumps(obj, generator)
     36     if isinstance(obj, BSONCoding):
     37         return encode_object(obj, [], generator_func=generator)
---> 38     return encode_document(obj, [], generator_func=generator)
     39 
     40 

/home/fabian/mount/data/LinuxDaten/software/anaconda/lib/python2.7/site-packages/bson/codec.pyc in encode_document(obj, traversal_stack, traversal_parent, generator_func)
    223 def encode_document(obj, traversal_stack, traversal_parent=None, generator_func=None):
    224     buf = StringIO()
--> 225     key_iter = iterkeys(obj)
    226     if generator_func is not None:
    227         key_iter = generator_func(obj, traversal_stack)

/home/fabian/mount/data/LinuxDaten/software/anaconda/lib/python2.7/site-packages/six.pyc in iterkeys(d, **kw)
    568 else:
    569     def iterkeys(d, **kw):
--> 570         return iter(d.iterkeys(**kw))
    571 
    572     def itervalues(d, **kw):

AttributeError: 'list' object has no attribute 'iterkeys'

It would be nice if bson could also support such JSON objects.
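In the BSON specification the top-level value is always a document, so a list has to be wrapped under some key before encoding. A small sketch of such a wrapper; the helper name and the default key "items" are hypothetical:

```python
from collections.abc import Mapping


def as_document(value, key="items"):
    """Wrap a non-mapping root value so it can be encoded as a document."""
    if isinstance(value, Mapping):
        return value
    # `key` is an arbitrary field name chosen by the caller.
    return {key: value}
```

Usage would then be bson.dumps(as_document(events)), with the reader unwrapping the same key after bson.loads.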

python 3.4.3 dumps fails to serialize binary data

python 2.7.10 - dumps and loads are OK
python 3.4.3 - loads OK, dumps fails to serialize binary data

import bson
message = b'qwerty'
dumped = bson.dumps({'d':message})
des_message = bson.loads(dumped)
des_message
{}

Boolean not decoded anymore since 0.4.4

I expected bson.loads(bson.dumps({"test": False})) to result in {"test": False}, but since 0.4.4 it results in {"test": 0}
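In Python, bool is a subclass of int, so type dispatch that tests for int first will treat False as 0. Whether that is the cause of this regression is not confirmed by the report; the sketch below only illustrates the ordering pitfall, using a hypothetical bson_type_code helper.

```python
def bson_type_code(value):
    """Return the BSON element type byte for a bool or int value."""
    # bool must be tested first: isinstance(False, int) is True.
    if isinstance(value, bool):
        return b"\x08"  # boolean
    if isinstance(value, int):
        # int32 if it fits, otherwise int64
        return b"\x10" if -0x80000000 <= value <= 0x7fffffff else b"\x12"
    raise TypeError("unsupported type %r" % type(value))
```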

Feature request: generator loading

I've got a 200GB BSON file from Mongo, and I want to parse it as a stream: process each element in the list independently. It would be great if there was a version of decode_document which yield-ed parsed objects instead of putting them into an array.
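Assuming the file is a plain concatenation of BSON documents (as mongodump output is), each document can be framed by its 4-byte length prefix and yielded one at a time. A sketch with a hypothetical function name; it yields raw bytes, which can then be passed to bson.loads individually:

```python
import io
import struct


def iter_raw_documents(fp):
    """Yield each BSON document in a binary file object as raw bytes."""
    while True:
        header = fp.read(4)
        if not header:
            return  # clean end of file
        if len(header) < 4:
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack("<i", header)
        if size < 5:
            raise ValueError("invalid document size %d" % size)
        body = fp.read(size - 4)
        if len(body) != size - 4:
            raise ValueError("truncated document")
        yield header + body


# Demo with two copies of a hand-built {"a": 1} document.
doc = b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"
docs = list(iter_raw_documents(io.BytesIO(doc + doc)))
```

Only one document is held in memory at a time, which is the point for a 200GB file.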

Exception: 'KeyError: 7'

>>> import bson
>>> bson.loads(open('/Users/patrickbassut/Documents/dump/myfc_id/account.bson', 'r').read())
  File "<stdin>", line 1, in <module>
  File "/Users/patrickbassut/Programming/paymant/lib/python2.7/site-packages/bson/__init__.py", line 45, in loads
    return decode_document(data, 0)[1]
  File "/Users/patrickbassut/Programming/paymant/lib/python2.7/site-packages/bson/codec.py", line 279, in decode_document
    base, name, value = decode_element(data, base)
  File "/Users/patrickbassut/Programming/paymant/lib/python2.7/site-packages/bson/codec.py", line 265, in decode_element
    element_description = ELEMENT_TYPES[element_type]
KeyError: 7

This might be very specific to what I'm trying to read. Unfortunately, the data is very sensitive, so I can't provide it.

Support ObjectId element type

Refer to #38.
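In the BSON specification, element type 0x07 is ObjectId, a fixed 12-byte value, which matches the "KeyError: 7" report. A minimal decoding sketch; the function name is hypothetical and it returns a hex string rather than an ObjectId class:

```python
import binascii


def decode_objectid_value(data, base):
    """Return (next_offset, hex_string) for the 12-byte ObjectId at base."""
    raw = data[base:base + 12]
    if len(raw) != 12:
        raise ValueError("truncated ObjectId")
    return base + 12, binascii.hexlify(raw).decode("ascii")
```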

This package has been removed from PyPI.

https://pypi.python.org/pypi/bson ➡️ returns 404

ObjectId from scratch

This project should provide its own ObjectId, rewritten from scratch.
(In #39 and #66, ObjectId was added from mongodb.)

Installing the package messes up pymongo

Pymongo installs its own bson package directly to site-packages (not into pymongo/bson for some reason). Accidentally installing this package causes pymongo imports like this to fail:
from bson.errors import *

module 'bson' has no attribute 'BSON'

     41 # Grab the image from the product.
---> 42 item = bson.BSON(item_data).decode()
     43 img_idx = image_row["img_idx"]
     44 bson_img = item["imgs"][img_idx]["picture"]

AttributeError: module 'bson' has no attribute 'BSON'

Array elements are produced in the wrong order

The BSON spec says that array elements must occur in numeric order: 0, 1, 2, … but the encode_array_element() function ruins the order by passing the key-value pairs through a Python dictionary, which randomizes the order. This makes BSON messages produced by this library incompatible with other libraries, such as the standard BSON implementation for C#.
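A sketch of an order-preserving array encoder, limited to int32 values for brevity: keys are generated from enumerate() and written immediately, so no dictionary is involved. The function name is hypothetical.

```python
import struct


def encode_int32_array(values):
    """Encode a sequence of int32 values as a BSON array document."""
    body = b""
    # enumerate() yields indices in order, so keys are written "0", "1", "2", ...
    for index, value in enumerate(values):
        key = str(index).encode("ascii") + b"\x00"
        body += b"\x10" + key + struct.pack("<i", value)
    # Total size counts the 4-byte prefix and the trailing NUL.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"
```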

python3 support

When doing a basic install on Python 3, it reports success despite errors:

$ sudo easy_install3 bson
Searching for bson
Reading https://pypi.python.org/simple/bson/
Best match: bson 0.3.3
Downloading https://pypi.python.org/packages/source/b/bson/bson-0.3.3.tar.gz#md5=46bce086741b651afaba0ea118fc5f8d
Processing bson-0.3.3.tar.gz
Writing /tmp/easy_install-76jcw__x/bson-0.3.3/setup.cfg
Running bson-0.3.3/setup.py -q bdist_egg --dist-dir /tmp/easy_install-76jcw__x/bson-0.3.3/egg-dist-tmp-bjatn5jv
File "build/bdist.linux-x86_64/egg/bson/network.py", line 6
except ImportError, e:
^
SyntaxError: invalid syntax

File "build/bdist.linux-x86_64/egg/bson/codec.py", line 87
  except KeyError, e:
                 ^

SyntaxError: invalid syntax

zip_safe flag not set; analyzing archive contents...
Adding bson 0.3.3 to easy-install.pth file

Installed /usr/local/lib/python3.4/dist-packages/bson-0.3.3-py3.4.egg
Processing dependencies for bson
Finished processing dependencies for bson

However, when you import bson it fails:
$ ipython3
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
Type "copyright", "credits" or "license" for more information.

IPython 1.2.1 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.

In [1]: import bson

ImportError Traceback (most recent call last)
in ()
----> 1 import bson

/usr/local/lib/python3.4/dist-packages/bson-0.3.3-py3.4.egg/bson/__init__.py in ()
52 """
53
---> 54 from codec import *
55 import network
56 __all__ = ["loads", "dumps"]

ImportError: No module named 'codec'

importing json_utils issues ImportError

In [2]: from bson import json_util

ImportError Traceback (most recent call last)
in ()
----> 1 from bson import json_util

/home/lhonda/.virtualenvs/test/local/lib/python2.7/site-packages/bson/json_util.py in ()
84
85 import bson
---> 86 from bson import EPOCH_AWARE
87 from bson.binary import Binary
88 from bson.code import Code

ImportError: cannot import name EPOCH_AWARE

In [3]:

What does the AttributeError about iterkeys mean when dumping a unicode string?

While comparing dumps() between the json module and bson 0.4.7, I encountered this:

>>> s = u'你我他'
>>> s
u'\u4f60\u6211\u4ed6'
>>> json.dumps(s)
'"\\u4f60\\u6211\\u4ed6"'
>>> bson.dumps(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/bson/__init__.py", line 40, in dumps
    generator_func=generator, on_unknown=on_unknown)
  File "/usr/local/lib/python2.7/dist-packages/bson/codec.py", line 223, in encode_document
    key_iter = iterkeys(obj)
  File "/usr/local/lib/python2.7/dist-packages/six.py", line 593, in iterkeys
    return d.iterkeys(**kw)
AttributeError: 'unicode' object has no attribute 'iterkeys'

No dumps since 0.5.0 version

My code uses the dumps function, and since the last update (0.5.0) this function no longer exists.
Is it possible to update the documented sample?

Thank you 😄

Consider renaming?

I have a project which, among others, depends on two libraries: one of them depends on the standalone bson package, and the other depends on pymongo.

Because pymongo, for reasons unknown to me, also exposes a top-level bson module with a different API, this creates a shadowing problem that depends on the order in which dependencies are installed :/

Currently, to get around it, I have to monkey-patch in my top-level package like this:

import bson
bson.loads = bson.BSON.decode
bson.dumps = bson.BSON.encode

Could we consider renaming the top-level package, with an alias for BC?

pip install bson failing

import bson
  File "bson/__init__.py", line 66, in <module>
    from . import codec
  File "bson/codec.py", line 28, in <module>
    from .objects import *
  File "bson/objects.py", line 36
    class BSONObject(object, metaclass=ABCMeta):
SyntaxError: invalid syntax

UnicodeDecodeError

Why do you use the built-in unicode function to decode UTF-8 strings on Python 2.x? Why not call the decode method with 'utf-8' as the argument?
Here is one case:

import bson
doc = {u'саша': u'маша'}
bson.loads(bson.dumps(doc))

As far as I know from http://bsonspec.org/spec.html, a "string" (element type '\x02') holds UTF-8 encoded characters, and a cstring also holds UTF-8 encoded characters, with the restriction that it cannot contain '\x00'.

Can you explain this case to me? Maybe I'm wrong.

Remove pip dependency

Arrays are mis-sorted upon decode

The BSON library only works on arrays of up to ten items long. If an eleventh item is supplied, then the decoded Python list has what should have been the 11th item in slot #2, any 12th item in slot #3, and so forth, because after decoding the BSON document that represents the array the library sorts the keys by their string value instead of their numeric value. To fix this, change the line "keys.sort()" in "decode_array_element" so that it reads instead: "keys.sort(key=int)".
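The difference between the two sort orders can be seen with plain Python, independent of the library:

```python
# Array keys as they appear in a 12-element BSON array document.
keys = [str(i) for i in range(12)]

by_string = sorted(keys)           # lexicographic: "10" sorts before "2"
by_number = sorted(keys, key=int)  # numeric: the intended array order
```

Sorting by string puts "10" and "11" right after "1", which is exactly the misplacement described above.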

objectid: After 2038, object id generation will fail.

The code is as follows:
def __generate(self):
    """Generate a new value for this ObjectId.
    """

    # 4 bytes current time
    oid = struct.pack(">i", int(time.time()))

The error message is as follows:
packages/bson/objectid.py", line 170, in __generate
oid = struct.pack(">i", int(time.time()))
error: 'i' format requires -2147483648 <= number <= 2147483647
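The signed ">i" format tops out at 2147483647, which corresponds to 2038-01-19 03:14:07 UTC. A possible fix, sketched here with a hypothetical helper, is to pack the timestamp as unsigned with ">I"; that readers of the ObjectId also treat the field as unsigned is an assumption to verify.

```python
import struct


def pack_objectid_timestamp(seconds):
    """Pack a Unix timestamp as 4 big-endian bytes, unsigned."""
    # ">I" accepts 0..4294967295, so it keeps working past 2038-01-19.
    return struct.pack(">I", int(seconds))
```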

tag and release 0.4.3 on github

Dear Ayun Park,

I maintain a bitbake recipe of your python package at:

https://github.com/bmwcarit/meta-ros/blob/master/recipes-devtools/python/python-bson_0.4.1.bb

This file contains some meta data that allows a cross-compile tool chain to compile and package your python package.

Now, I wanted to update to the most recent version 0.4.3 and noticed that the repository has not been tagged and hence github does not provide an archive of version 0.4.3 for downloading, which my recipe relies on.

Could you please tag the version 0.4.3 in the github repository? Thanks for your work and your effort.

Best regards,

Lukas Bulwahn

bson can't dump python array as root element

AttributeError: 'list' object has no attribute 'keys'

UUID are silently ignored

import bson
import uuid
bson.loads(bson.dumps({'uuid': uuid.uuid4()}))
{}

Pymongo's version of BSON does not do this.

File object (de)serialization is not supported (bson.load, bson.dump)

When you serialize a document to a file, the serialized bson-string is allocated three times.

Into "buf" (https://github.com/martinkou/bson/blob/master/bson/codec.py#L200)
Into "e_list" (https://github.com/martinkou/bson/blob/master/bson/codec.py#L209)
Into the return variable (https://github.com/martinkou/bson/blob/master/bson/codec.py#L211)

Functions that serialize directly to a file object would not need such allocations.

Are there plans to implement bson.load or bson.dump? The latter is much easier to implement and I can volunteer to do it.

No sdist available in PyPI

$ pip install bson
Downloading/unpacking bson
  Could not find any downloads that satisfy the requirement bson
No distributions at all found for bson

bson loads a list as a dict

if you do the following:
bsonjs.loads(json.dumps([]))

The result is {}

This caused some minor issues in a program I'm working on. Since json is able to load lists, I think there is no reason bson shouldn't behave similarly.

issue in datetime

>>> import bson
>>> from bson import json_util
>>> from datetime import datetime
>>> k = datetime.now()
>>> k
datetime.datetime(2017, 7, 12, 20, 7, 44, 694669)
>>> json_util.dumps(k)
'{"$date": 1499890064694}'
>>> json_util.loads(json_util.dumps(k))
datetime.datetime(2017, 7, 12, 20, 7, 44, 694000, tzinfo=<bson.tz_util.FixedOffset object at 0x237d750>)
>>> j = json_util.loads(json_util.dumps(k))
>>> k ==j 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't compare offset-naive and offset-aware datetimes
>>>

I think we should respect the datetime as is whether it has timezone or it is naive.
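The round trip above changes two things: the value comes back timezone-aware, and the microseconds are truncated to milliseconds (BSON datetimes have millisecond precision). Until the library behaves differently, one workaround is to normalize both sides before comparing. A sketch, assuming naive values are meant as UTC; the helper name is hypothetical:

```python
from datetime import datetime, timezone


def to_naive_utc_ms(dt):
    """Return dt as a naive UTC datetime truncated to milliseconds."""
    if dt.tzinfo is not None:
        # Convert aware values to UTC, then drop the tzinfo.
        dt = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt.replace(microsecond=dt.microsecond // 1000 * 1000)
```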

int with value > 7FFFFFFFFFFFFFFF not supported

The bug is in codec.py file:
if value < -0x80000000 or value > 0x7fffffff: buf.write(encode_int64_element(name, value))

def encode_int64_element(name, value): return b"\x12" + encode_cstring(name) + struct.pack("<q", value)

struct.pack("<q", 0x8fffffff) raises "{error}required argument is not an integer"
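For reference, the "<q" format is a signed 64-bit integer, so anything above 0x7FFFFFFFFFFFFFFF cannot be packed with it. A sketch of an explicit range check that fails with a clearer message; the helper name is hypothetical. Note that the BSON spec lists the 0x11 uint64 type cited in this report as Timestamp, so using it for general integers is an assumption that other decoders may not share.

```python
import struct

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def pack_int64(value):
    """Pack value as a little-endian signed 64-bit integer."""
    if not INT64_MIN <= value <= INT64_MAX:
        raise OverflowError("%d does not fit in a BSON int64" % value)
    return struct.pack("<q", value)
```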

In (BSON specs) Uint64 is indeed supported:

"\x11" e_name uint64

Support of the encoding of tuples

I'm wondering if you could add support for tuples, so that they are encoded like lists, much as cjson does, e.g.

    >>> bson.loads( bson.dumps({'a':(1,2,0)}))
    {}

would instead return

    {"a": [1, 2, 0]}
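Until tuples are supported, a caller-side workaround is to convert them before encoding. A sketch with a hypothetical helper:

```python
def tuples_to_lists(value):
    """Recursively replace tuples with lists inside dicts and lists."""
    if isinstance(value, (list, tuple)):
        return [tuples_to_lists(item) for item in value]
    if isinstance(value, dict):
        return {key: tuples_to_lists(item) for key, item in value.items()}
    return value
```

The result can be passed to bson.dumps; the tuple type itself is not recoverable on load, which matches the requested list-like behavior.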

pip install bson error, pip version is 10.0.0

[root@localhost home]# pip -V
pip 10.0.0 from /usr/lib/python2.7/site-packages/pip (python 2.7)
[root@localhost home]# pip install bason
Collecting bason
Could not find a version that satisfies the requirement bason (from versions: )
No matching distribution found for bason
[root@localhost home]# pip install bson
Collecting bson
Using cached bson-0.5.2.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-H29FRs/bson/setup.py", line 8, in <module>
from pip import get_installed_distributions
ImportError: cannot import name get_installed_distributions

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-install-H29FRs/bson/
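pip 10 moved its internals under pip._internal, which is why the import in setup.py fails. What setup.py uses the list for is not shown here; if it only needs the names of installed distributions, a sketch that avoids pip entirely (Python 3.8+, hypothetical helper name) could be:

```python
from importlib import metadata


def installed_distribution_names():
    """Return the lower-cased names of all installed distributions."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with broken metadata
            names.add(name.lower())
    return names
```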

Allow serializing and deserializing custom classes

I was not able to find a way (not involving changing pybson) to serialize custom objects. It would be nice to have an object_hook in a similar way to json.

ImportError: cannot import name 'abc'

from bson.py3compat import abc, string_type, PY3, text_type
ImportError: cannot import name 'abc'

numeric keys are not serialized properly

Given that bson can serialize floats, I was expecting float keys to be valid, but it seems they are converted to strings on serialization. Is that expected behavior or a bug?

>>> from bson import loads, dumps
>>> loads(dumps({1.2: 3}))
{'1.2': 3}

pytz.utc vs datetime.timezone.utc

I want to integrate bson into a custom Yocto image and I'm having trouble installing the pytz dependency. I searched the repo and found that it is used here

I'm wondering if we could replace it with datetime.timezone.utc.
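On Python 3 the standard library already provides a UTC tzinfo, so for plain UTC handling no third-party package is needed. A short sketch of the stdlib equivalent (whether every pytz use in the repo is limited to UTC is an assumption):

```python
from datetime import datetime, timezone

# Aware datetime at the Unix epoch, using only the standard library.
epoch = datetime.fromtimestamp(0, tz=timezone.utc)
```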

Unicode characters not supported?

I encountered a problem with unicode characters. A UTF-8 encoded byte string works fine, but if I convert the string to unicode, it reports an error like this:

>>> a="\xe2\x97\x8f"
>>> bson.loads(bson.dumps({"key":a}))
{u'key': '\xe2\x97\x8f'}
>>> a.decode("utf-8")
u'\u25cf'
>>> a=a.decode("utf-8")
>>> bson.loads(bson.dumps({"key":a}))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/bson/__init__.py", line 47, in loads
    return decode_document(data, 0)[1]
  File "/usr/local/lib/python2.7/dist-packages/bson/codec.py", line 297, in decode_document
    value = unicode(value)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

Python Version: 2.7.9
bson version: 0.4.7

For now, I have to encode my unicode string first to use bson :P
Thanks for the help.
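The traceback is consistent with Python 2's unicode(value) falling back to the ASCII codec when given a byte string. Decoding with an explicit codec avoids that. The same contrast, written in Python 3 syntax:

```python
raw = b"\xe2\x97\x8f"  # the UTF-8 bytes from the report

text = raw.decode("utf-8")  # explicit codec succeeds

try:
    raw.decode("ascii")  # what an implicit ASCII conversion amounts to
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False
```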

Don't use md5 to calculate objectid

md5 has a lot of limitations. See this commit in pymongo on a possible replacement:

mongodb/mongo-python-driver@6185035
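One non-cryptographic option for deriving the three machine bytes is the 32-bit FNV-1a hash, XOR-folded to 24 bits. This is a sketch of that technique only; it is not confirmed here that the linked commit uses the same approach, and the function names are hypothetical.

```python
def fnv1a_32(data):
    """Return the 32-bit FNV-1a hash of a bytes object."""
    h = 0x811c9dc5  # FNV offset basis
    for byte in bytearray(data):
        h ^= byte
        h = (h * 0x01000193) & 0xffffffff  # FNV prime, wrapped to 32 bits
    return h


def machine_bytes(hostname):
    """Derive 3 bytes from a hostname without using hashlib."""
    h = fnv1a_32(hostname.encode("utf-8"))
    folded = (h >> 24) ^ (h & 0xffffff)  # XOR-fold 32 bits down to 24
    return folded.to_bytes(3, "big")
```

Since no hashlib digest is constructed, this path is unaffected by FIPS restrictions on md5.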

bson is slow

from: http://stackoverflow.com/questions/9884080/fastest-packing-of-data-in-python-and-java

test_struct encoding took 0.6742, decoding 0.5187
test_json_1 encoding took 0.6256, decoding 0.9847
test_bson encoding took 56.1082, decoding 30.1189
test_json_2 encoding took 0.7145, decoding 0.8737
test_pickle encoding took 8.8269, decoding 9.9774
test_cPickle encoding took 1.1699, decoding 1.4753
test_marshal encoding took 0.1216, decoding 0.5035

Deserialization of UUID mismatches that of other bson library

I've just found a problem when passing messages to other languages.
Look at this binary file:

8a00 0000 0578 636f 6d63 6c73 6964 0010
0000 0004 6cce a4f1 319a 7944 b7c0 5e1e
7ab8 15b3 1069 6400 0000 0000 056d 6574
686f 6400 1000 0000 0462 9057 b338 1afe
4389 4f08 a734 8131 1504 7061 7261 6d73
0038 0000 0003 3000 3000 0000 0574 7970
6500 1000 0000 04b2 66f4 bb31 b187 4b90
a5f6 a426 b05c bf02 6461 7461 0006 0000
0078 203d 2031 0000 0000

py-bson deserializes it to the following:

{'xcomclsid': UUID('6ccea4f1-319a-7944-b7c0-5e1e7ab815b3'),
 'id': 0,
 'method': UUID('629057b3-381a-fe43-894f-08a734813115'),
 'params': [{'type': UUID('b266f4bb-31b1-874b-90a5-f6a426b05cbf'),
   'data': 'x = 1'}]}

However, Newtonsoft.Json gets

{
  "xcomclsid": "f1a4ce6c-9a31-4479-b7c0-5e1e7ab815b3",
  "id": 0,
  "method": "b3579062-1a38-43fe-894f-08a734813115",
  "params": [
    {
      "type": "bbf466b2-b131-4b87-90a5-f6a426b05cbf",
      "data": "x = 1"
    }
  ]
}

What's wrong in this case?
Thanks for your help!
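The two results differ only in the byte order of the first three UUID fields. Python's uuid module can build a UUID from the same 16 bytes either way, which reproduces both outputs. The dump marks these values with binary subtype 0x04; which layout a .NET consumer expects for that subtype is something to verify in its settings.

```python
import uuid

# The 16 payload bytes of "xcomclsid" from the hex dump.
raw = bytes.fromhex("6ccea4f1319a7944b7c05e1e7ab815b3")

big_endian = uuid.UUID(bytes=raw)       # RFC 4122 byte order
mixed_endian = uuid.UUID(bytes_le=raw)  # first three fields little-endian
```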

Non-encodables are silently ignored

In [20]: def foo():
   ....:     print('testing')
   ....:     

In [21]: bson.dumps({'test': foo})
Out[21]: b'\x05\x00\x00\x00\x00'

As the above example shows, the dumps function silently ignores content which is not encodable. This behaviour is somewhat unexpected, since e.g. json.dumps does raise a TypeError in this situation. Perhaps provide an option to set silent filtering but by default raise errors.
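Until such an option exists, a caller can validate the document before handing it to dumps. A sketch with a hypothetical helper; the list of allowed scalar types is an assumption and should be adjusted to what the installed version actually encodes.

```python
from datetime import datetime

# Assumed set of encodable leaf types; not taken from the library.
ALLOWED_SCALARS = (type(None), bool, int, float, str, bytes, datetime)


def assert_encodable(value, path="$"):
    """Raise TypeError naming the first value that is not plain data."""
    if isinstance(value, dict):
        for key, item in value.items():
            assert_encodable(item, "%s.%s" % (path, key))
    elif isinstance(value, (list, tuple)):
        for index, item in enumerate(value):
            assert_encodable(item, "%s[%d]" % (path, index))
    elif not isinstance(value, ALLOWED_SCALARS):
        raise TypeError("%s: cannot encode %s" % (path, type(value).__name__))
```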

Inconsistent use of shebang

Not sure if it's really an issue, but I thought I'd bring up that several files in this package use inconsistent shebang lines:

E.g.
bson/__init__.py uses #!/usr/bin/python -OOOO
bson/codec.py uses #!/usr/bin/python -OOOO
bson/network.py uses #!/usr/bin/env python

Perhaps all files should use the highly adaptable #!/usr/bin/env python? Is there some reason for this that's not immediately obvious to me?