bencodepy's Introduction

#BencodePy

A small Python 3 library for encoding and decoding Bencode data, licensed under the GPLv2.

##Overview

Although Bencoding is mainly, if not exclusively, used for BitTorrent metadata (.torrent) files, this library seeks to provide a generic means of encoding/decoding Bencode from/to Python data structures independent of torrent files.

##Docs

Installation

pip install bencodepy

Encode

    from bencodepy import encode
    mydata = {'keyA': 'valueA'}  # example data
    bencoded_data = encode(mydata)
    print(bencoded_data)
    # Output: b'd4:keyA6:valueAe'

######Encode Mapping

| Python Type* | Bencode Type |
| --- | --- |
| dict | Dictionary |
| list | List |
| tuple | List |
| int | Integer |
| str | String |
| bytes | String |

*Includes subtypes; thus both dict and OrderedDict are represented as a Bencode dictionary.
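To make the mapping above concrete, here is a minimal, self-contained sketch of an encoder that follows the same type table. It is not bencodepy's implementation: the names `sketch_encode` and `_encode_into` are hypothetical, encoding `str` as UTF-8 is an assumption, and sorting dictionary keys by their raw bytes follows the BitTorrent specification rather than anything stated in this README.

```python
def _encode_into(obj, chunks):
    """Append the bencoded pieces of obj to the list `chunks` (hypothetical helper)."""
    if isinstance(obj, int):
        # Integer: i<digits>e
        chunks.append(('i%de' % obj).encode('ascii'))
    elif isinstance(obj, str):
        # Assumption: text is encoded as UTF-8 before being written as a String.
        _encode_into(obj.encode('utf-8'), chunks)
    elif isinstance(obj, bytes):
        # String: <length>:<raw bytes>
        chunks.append(str(len(obj)).encode('ascii') + b':' + obj)
    elif isinstance(obj, (list, tuple)):
        # List: l<items>e (tuples are written as lists, as in the table above)
        chunks.append(b'l')
        for item in obj:
            _encode_into(item, chunks)
        chunks.append(b'e')
    elif isinstance(obj, dict):
        # Dictionary: d<key><value>...e, keys must be strings.
        pairs = []
        for key, value in obj.items():
            if isinstance(key, str):
                key = key.encode('utf-8')
            if not isinstance(key, bytes):
                raise TypeError('dictionary keys must be str or bytes')
            pairs.append((key, value))
        chunks.append(b'd')
        # Assumption: keys are emitted in sorted byte order.
        for key, value in sorted(pairs, key=lambda kv: kv[0]):
            _encode_into(key, chunks)
            _encode_into(value, chunks)
        chunks.append(b'e')
    else:
        raise TypeError('cannot bencode %r' % type(obj))


def sketch_encode(obj):
    """Return the bencoded form of obj as bytes."""
    chunks = []
    _encode_into(obj, chunks)
    return b''.join(chunks)


print(sketch_encode({'keyA': 'valueA'}))  # b'd4:keyA6:valueAe'
```

Because the checks use isinstance, subclasses such as OrderedDict take the same branch as their base type.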

Decode

From bytes...

    from bencodepy import decode
    mydata = b'd4:KeyA6:valueAe'
    my_ordered_dict = decode(mydata)
    print(my_ordered_dict)
    # Output: OrderedDict([(b'KeyA', b'valueA')])

Alternatively from a file...

    from bencodepy import decode_from_file
    my_file_path = r'c:\whatever'  # raw string so the backslash is not treated as an escape
    my_ordered_dict = decode_from_file(my_file_path)

######Decode Mapping

| Bencode Type | Python Type |
| --- | --- |
| Dictionary | OrderedDict |
| List | list |
| Integer | int |
| String | bytes |

Bencode dictionaries are decoded as Python OrderedDict to preserve the order of elements. This is necessary to correctly calculate certain hash values such as that of a torrent file's Info Dictionary.
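As a small illustration of why order matters, the sketch below hashes two bencoded dictionaries that hold the same key/value pairs in a different order. The byte strings are made-up sample data, not taken from a real torrent; SHA-1 is used because a torrent's info hash is the SHA-1 digest of the bencoded Info Dictionary.

```python
import hashlib

# Made-up sample data: the same two entries, written in two different orders.
info_original = b'd6:lengthi1e4:name1:ae'
info_reordered = b'd4:name1:a6:lengthi1ee'

digest_original = hashlib.sha1(info_original).hexdigest()
digest_reordered = hashlib.sha1(info_reordered).hexdigest()

print(digest_original)
print(digest_reordered)
print('same hash:', digest_original == digest_reordered)
```

The two digests differ, so a decoder that loses the original element order cannot reproduce the exact bytes needed to recompute the hash.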

Decode methods always return an iterable. If the root element of the bencode data is not a dictionary or list, decode() wraps all of the bencode elements in a tuple. Thus input data of b'5:ItemA5:ItemB' yields a Python tuple of (b'ItemA', b'ItemB'), since Bencode strings decode to bytes.
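The root-wrapping rule can be sketched with a minimal, self-contained decoder. This is an illustration of the behavior described above, not bencodepy's code: `sketch_decode` and `_parse` are hypothetical names, error handling is omitted, and returning a tuple whenever there is more than one root element is an assumption.

```python
from collections import OrderedDict


def _parse(data, i):
    """Parse one bencode value starting at index i; return (value, next_index)."""
    marker = data[i:i + 1]
    if marker == b'i':
        # Integer: i<digits>e
        end = data.index(b'e', i)
        return int(data[i + 1:end]), end + 1
    if marker == b'l':
        # List: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b'e':
            item, i = _parse(data, i)
            items.append(item)
        return items, i + 1
    if marker == b'd':
        # Dictionary: d<key><value>...e, kept in the order it was read
        i += 1
        result = OrderedDict()
        while data[i:i + 1] != b'e':
            key, i = _parse(data, i)
            value, i = _parse(data, i)
            result[key] = value
        return result, i + 1
    # String: <length>:<raw bytes>
    colon = data.index(b':', i)
    length = int(data[i:colon])
    start = colon + 1
    return data[start:start + length], start + length


def sketch_decode(data):
    """Decode every root-level element; wrap them in a tuple unless the
    input is a single dictionary or list."""
    items = []
    i = 0
    while i < len(data):
        item, i = _parse(data, i)
        items.append(item)
    if len(items) == 1 and isinstance(items[0], (dict, list)):
        return items[0]
    return tuple(items)


print(sketch_decode(b'5:ItemA5:ItemB'))  # (b'ItemA', b'ItemB')
```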

##Performance

Hardware: Xeon 1270v3 w/ 16GB RAM

OS: Windows 7 Pro

Python: CPython 3.4

Method: The benchmarks measure the time taken to encode/decode objects in memory; thus disk IO is excluded. The sample data used were 5 torrent files that were multiplied (in memory) to generate a sufficient number of elements. The source code is available under tests/benchmarks in this repository.

The new encoder benchmarks have not been graphed yet. The encoder can process 10,000 in about 1 second and 30,000 in about 3 seconds. Again, this is using multiple small torrent files. Just a handful of large files (i.e., in the MB range) can take well over a second.

These benchmarks are neither scientific nor rigorous. As always, YMMV.

##Roadmap

  1. Determine method of distributing the optimized (cythonized) version of bencodepy.
  2. Consider async file read; I may hold off until someone creates a request on the GitHub issue tracker.

##License

Copyright © 2014, 2015 by Eric Weast

Licensed under the GPLv2

bencodepy's People

Contributors

eweast, jamesan

bencodepy's Issues

Missing info_hash item

I believe there's a missing item when decoding a .torrent file. There should be an info_hash for the whole torrent.

Encoder is seriously flawed in performance due to bytestring immutability

The encoder completely chokes on anything larger than about 1 MB.
Byte strings are immutable, so every time you do coded_bytes += b'whatever', a new copy is created and the old one is discarded. Even if they were mutable, there would be cases where there is no room to grow the big byte string in place, and it would have to be reallocated. The runtime can't read your mind and know that you're going to add to it, so when you create the byte string, nothing stops it from allocating other objects just past the end of it.

My proposed solution is to drop the += concatenation and store all the byte strings in a list instead (with append). That way you can return b''.join(iterableList).
I confirmed and tested this list-based method on a 500-entry .dat file (~1.6 MB): encoding went from 245 seconds your way to 5 seconds my way.
This is also how the original coders did it in https://pypi.python.org/pypi/BitTorrent-bencode
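A minimal sketch of the two approaches being compared, for anyone who wants to try this locally. The function names and the sample data are hypothetical and this is not the library's encoder; timings depend heavily on input size and interpreter, so none are quoted here.

```python
import time


def concat_with_plus_equals(parts):
    """Build the result with +=, which creates a new bytes object each time."""
    out = b''
    for part in parts:
        out += part
    return out


def concat_with_join(parts):
    """Collect the pieces in a list and join them once at the end."""
    chunks = []
    for part in parts:
        chunks.append(part)
    return b''.join(chunks)


# Made-up sample data: many small bencoded strings.
parts = [b'4:spam'] * 5000

for fn in (concat_with_plus_equals, concat_with_join):
    start = time.perf_counter()
    result = fn(parts)
    elapsed = time.perf_counter() - start
    print(fn.__name__, len(result), 'bytes in', round(elapsed, 6), 's')
```

Both functions return identical bytes; the difference is only in how many intermediate copies are made, which grows with the total output size in the += version.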
