The bencodepy from eweast

bencodepy's Introduction

#BencodePy

A small Python 3 library for encoding and decoding Bencode data, licensed under the GPLv2.

##Overview

Although Bencoding is mainly, if not exclusively, used for BitTorrent metadata (.torrent) files, this library seeks to provide a generic means of encoding/decoding Bencode from/to Python data structures, independent of torrent files.

##Docs

Installation

pip install bencodepy

Encode

from bencodepy import encode

mydata = {'keyA': 'valueA'}  # example data
bencoded_data = encode(mydata)
print(bencoded_data)
# prints: b'd4:keyA6:valueAe'

######Encode Mapping

| Python Type* | Bencode Type |
|--------------|--------------|
| dict         | Dictionary   |
| list         | List         |
| tuple        | List         |
| int          | Integer      |
| str          | String       |
| bytes        | String       |

*Includes subtypes; thus both dict and OrderedDict are represented as a Bencode dictionary.

Decode

From bytes...

from bencodepy import decode

mydata = b'd4:KeyA6:valueAe'
my_ordered_dict = decode(mydata)
print(my_ordered_dict)
# prints: OrderedDict([(b'KeyA', b'valueA')])

Alternatively from a file...

from bencodepy import decode_from_file

my_file_path = r'c:\whatever'  # raw string, so the backslash is not read as an escape
my_ordered_dict = decode_from_file(my_file_path)

######Decode Mapping

| Bencode Type | Python Type |
|--------------|-------------|
| Dictionary   | OrderedDict |
| List         | list        |
| Integer      | int         |
| String       | bytes       |

Bencode dictionaries are decoded as Python OrderedDict to preserve the order of elements. This is necessary to correctly calculate certain hash values such as that of a torrent file's Info Dictionary.
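As a rough illustration of why element order matters: BitTorrent's info hash is the SHA-1 digest of the bencoded info dictionary, so re-encoding the decoded dictionary must reproduce the original bytes. The helper below is a minimal sketch, not part of bencodepy; the name `info_hash_sketch` and its `encode` parameter are hypothetical, and it assumes the decoded torrent exposes the info dictionary under the bytes key `b'info'`.

```python
import hashlib


def info_hash_sketch(torrent, encode):
    """Return the hex SHA-1 of the re-encoded info dictionary.

    torrent: a decoded .torrent mapping whose keys are bytes (assumption).
    encode:  any bencode encoder that returns bytes, passed in by the caller
             (hypothetical parameter, so the sketch has no hard dependency).
    """
    info = torrent[b"info"]
    # The digest only matches the original file if the encoder writes the
    # keys back in the same order in which they were decoded.
    return hashlib.sha1(encode(info)).hexdigest()
```

With bencodepy installed, the intended usage would be to pass the result of `decode_from_file(...)` as `torrent` and `bencodepy.encode` as `encode`.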

Decode methods will always return an iterable. If the root element of the bencode data is not a dictionary or list, decode() will wrap all of the bencode elements in a tuple. Thus input data of b'5:ItemA5:ItemB' would yield a Python tuple of (b'ItemA', b'ItemB'), since Bencode strings are decoded as bytes.
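The decode mapping and the root-wrapping rule described above can be sketched in a few lines of plain Python. This is an illustrative decoder written for this section only, not bencodepy's actual implementation; the name `decode_sketch` is hypothetical and error handling is kept to a minimum.

```python
from collections import OrderedDict


def decode_sketch(data):
    """Decode bencoded bytes following the mapping in the table above."""
    pos = 0  # current read offset into data

    def parse():
        nonlocal pos
        lead = data[pos:pos + 1]
        if lead == b"i":  # Integer -> int
            end = data.index(b"e", pos)
            value = int(data[pos + 1:end])
            pos = end + 1
            return value
        if lead == b"l":  # List -> list
            pos += 1
            items = []
            while data[pos:pos + 1] != b"e":
                items.append(parse())
            pos += 1
            return items
        if lead == b"d":  # Dictionary -> OrderedDict, order preserved
            pos += 1
            result = OrderedDict()
            while data[pos:pos + 1] != b"e":
                key = parse()
                result[key] = parse()
            pos += 1
            return result
        if lead.isdigit():  # String -> bytes, written as <length>:<payload>
            colon = data.index(b":", pos)
            length = int(data[pos:colon])
            start = colon + 1
            pos = start + length
            return data[start:pos]
        raise ValueError("unexpected byte at offset %d" % pos)

    # A dictionary or list at the root is returned as-is.
    if data[:1] in (b"d", b"l"):
        return parse()
    # Anything else: collect every top-level element into a tuple.
    items = []
    while pos < len(data):
        items.append(parse())
    return tuple(items)
```

Calling `decode_sketch(b'5:ItemA5:ItemB')` walks two top-level strings and returns them as a tuple of bytes objects, while input that starts with `d` or `l` is returned as a single OrderedDict or list.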

##Performance

Hardware: Xeon 1270v3 w/ 16GB RAM
OS: Windows 7 Pro
Python: CPython 3.4

Method: The benchmarks measure the time taken to encode/decode objects in memory; thus disk IO is excluded. The sample data used were 5 torrent files that were multiplied (in memory) to generate a sufficient number of elements. The source code is available under tests/benchmarks in this repository.

The new encoder benchmarks have not been graphed yet. The encoder can process 10,000 in about 1 second and 30,000 in about 3. Again, this is with multiple small torrent files; just a handful of large files (i.e. in the MB range) can take well over a second.

These benchmarks are neither scientific nor rigorous. As always, YMMV.

##Roadmap

Determine method of distributing the optimized (cythonized) version of bencodepy.
Consider async file read; I may hold off until someone creates a request on the GitHub issue tracker.

Licensed under the GPLv2

bencodepy's Issues

Missing info_hash item

I believe there's a missing item when decoding a .torrent file. There should be an info_hash of the whole torrent.

DOAP record has incorrect home page URL

The package listing and DOAP record at (pypi.python.org/pypi/bencodepy/0.9.4) appear to have a broken home page URL:

It lists the URL as (https://github.com/eweast/bencodepy_opti), which returns a 404. Looks like it should be (https://github.com/eweast/BencodePy), yes?

Encoder is seriously flawed in performance due to bytestring immutability

The encode program completely chokes on anything > 1MB or so.
ByteStrings are immutable, so every time you do coded_bytes += b'whatever', a new copy is created and the old one is discarded. Even if they were mutable, it would run into cases where it runs out of space to resize the big bytestring and would have to reallocate it. It can't read your mind and know that you're going to add to it, so when you create the bytestring, nothing stops it from allocating other objects just past the end of the bytestring.

My proposed solution is to drop the repeated concatenation and instead store each bytestring in a list (with append), so that you can return b''.join(iterableList) at the end.
I tested this approach on a 500-entry .dat file (~1.6MB): encoding went from 245 seconds your way to 5 seconds my way.
This was also how the original coders did it on: https://pypi.python.org/pypi/BitTorrent-bencode
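
For reference, the list-and-join approach proposed in this issue might look roughly like the sketch below. It is not the code from bencodepy or from BitTorrent-bencode; the name `encode_sketch` is hypothetical, dictionary keys are written in the mapping's own order rather than sorted, and bool values are rejected for simplicity.

```python
def encode_sketch(obj):
    """Bencode obj by appending chunks to one list and joining once."""
    chunks = []  # every piece of output is appended here, never concatenated

    def emit(value):
        if isinstance(value, bool):
            # bool is a subclass of int; this sketch refuses it explicitly.
            raise TypeError("bool is not supported in this sketch")
        if isinstance(value, int):
            chunks.append(b"i" + str(value).encode("ascii") + b"e")
        elif isinstance(value, str):
            emit(value.encode("utf-8"))  # str is written as a Bencode string
        elif isinstance(value, (bytes, bytearray)):
            chunks.append(str(len(value)).encode("ascii") + b":")
            chunks.append(bytes(value))
        elif isinstance(value, (list, tuple)):
            chunks.append(b"l")
            for item in value:
                emit(item)
            chunks.append(b"e")
        elif isinstance(value, dict):
            chunks.append(b"d")
            # Assumption: keys keep the mapping's order (no sorting here).
            for key, item in value.items():
                emit(key)
                emit(item)
            chunks.append(b"e")
        else:
            raise TypeError("cannot bencode %r" % type(value))

    emit(obj)
    # A single join copies each chunk once, instead of re-copying the whole
    # growing result on every += as repeated bytes concatenation does.
    return b"".join(chunks)
```

Only small, fixed-size pieces are concatenated with `+` here; the large payloads are appended to the list untouched, so the total copying stays proportional to the size of the output.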
