picklable-itertools's Introduction

picklable-itertools

A reimplementation of the Python standard library's itertools, in Python, using picklable iterator objects. Intended to be Python 2.7 and 3.4+ compatible. Also includes picklable, Python {2, 3}-compatible implementations of some related utilities, including some functions from the Toolz library, in picklable_itertools.extras.

Why?

  • Because neither the standard library pickle module nor the excellent dill package can serialize all of the itertools iterators, at least on Python 2 (some appear to be serializable on Python 3).
  • Because there are many cases where the tools in itertools would simplify code but can't be used, since serializability must be maintained across both Python 2 and Python 3. The in-development framework Blocks is our first consumer. We'd like to be able to serialize the entire state of a long-running program for later resumption, and we can't do this with non-picklable objects.

Philosophy

  • This should be a drop-in replacement. Pretty self-explanatory: we test against the standard library itertools or built-in implementation to verify that behaviour matches. Where Python 2 and Python 3 differ in their naming (filterfalse vs. ifilterfalse, zip_longest vs. izip_longest), we provide both. For convenience, we also provide the names that exist only in the Python 2 incarnation of itertools (ifilter, izip), whose Python 3 equivalents are the built-ins filter and zip. As new objects are added to the Python 3 itertools module, we intend to add them; accumulate, for example, appears only in Python 3, and a picklable implementation is contained in this package.
  • Handle built-in types gracefully if possible. List iterators, etc. are not picklable on Python 2.x, so we provide an alternative implementation. File iterators are handled transparently as well. dict iterators and set iterators are currently not supported. picklable_itertools.xrange can be used as a drop-in replacement for Python 2 xrange/Python 3 range, with the benefit that the iterators produced by it will be picklable on both Python 2 and 3.
  • Premature optimization is the root of all evil. These things are implemented in Python, so speed is obviously not our primary concern. Several of the more advanced iterators are constructed by chaining simpler iterators together, which is not the most efficient thing to do but simplifies the code a lot. If it turns out that speed (or a shallower object graph) is necessary or desirable, these can always be reimplemented. Pull requests to this effect are welcome.
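To make the first two points concrete, here is a minimal sketch of a picklable stand-in for a list iterator, checked against the built-in one. The class name SequenceIterator is hypothetical and chosen for illustration; it is not the package's actual implementation.

```python
import pickle


class SequenceIterator(object):
    """Hypothetical picklable replacement for iter(list) or iter(tuple)."""

    def __init__(self, sequence):
        self._sequence = sequence
        self._position = 0  # index of the next item to return

    def __iter__(self):
        return self

    def __next__(self):
        if self._position >= len(self._sequence):
            raise StopIteration
        value = self._sequence[self._position]
        self._position += 1
        return value

    next = __next__  # Python 2 spelling of the iterator protocol


data = ['a', 'b', 'c', 'd']

# Behaviour check against the built-in list iterator.
matches_builtin = list(SequenceIterator(data)) == list(iter(data))

# State check: the position survives a pickle round trip.
it = SequenceIterator(data)
next(it)  # consume 'a'
remaining = list(pickle.loads(pickle.dumps(it)))  # ['b', 'c', 'd']
```

Because the whole state is a sequence plus an integer, the default pickling of the instance dictionary is enough and no custom __getstate__ is needed.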

picklable-itertools's People

Contributors

bartvm, dwf, rizar

picklable-itertools's Issues

itertoolz?

It would be great to extend this package to contain itertoolz implementations as well. The ones I can think of that are particularly handy are partition and partition_all, so that returning batches becomes: partition(batch_size, xrange(num_examples)).
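The batching idea could look roughly like the sketch below. The class name Partition and its keep_remainder flag are hypothetical: the flag folds the toolz-style partition (drop an incomplete final batch) and partition_all (keep it) into one class for brevity. The demo assumes Python 3, where the built-in range iterator is itself picklable; on Python 2 a picklable range replacement would be needed underneath.

```python
import pickle


class Partition(object):
    """Hypothetical picklable batcher in the spirit of partition/partition_all."""

    def __init__(self, n, iterable, keep_remainder=False):
        self._n = n
        self._iterator = iter(iterable)  # must itself be picklable
        self._keep_remainder = keep_remainder

    def __iter__(self):
        return self

    def __next__(self):
        batch = []
        for item in self._iterator:
            batch.append(item)
            if len(batch) == self._n:
                return tuple(batch)
        # The underlying iterator ran dry before the batch was full.
        if batch and self._keep_remainder:
            return tuple(batch)
        raise StopIteration

    next = __next__  # Python 2 spelling of the iterator protocol


batches = Partition(3, range(8))
first = next(batches)  # (0, 1, 2)

# Snapshot mid-stream and continue from the restored copy.
restored = pickle.loads(pickle.dumps(batches))
rest = list(restored)  # [(3, 4, 5)]; the incomplete (6, 7) is dropped

all_batches = list(Partition(3, range(8), keep_remainder=True))
```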

Make sure every function has at least one verify_pickle

  • count
  • cycle
  • repeat
  • chain
  • compress
  • dropwhile
  • groupby
  • ifilter
  • ifilterfalse
  • islice
  • imap
  • starmap
  • tee
  • takewhile
  • izip
  • izip_longest
  • product
  • permutations
  • combinations
  • combinations_with_replacement
  • accumulate
  • file_iterator
  • range_iterator
  • dict_iterator
  • ordered_sequence_iterator
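A serialization test of the kind listed above might be shaped like the following sketch. The helper name check_pickle_roundtrip is hypothetical, and the project's own verify_pickle may have a different signature and may check more. The demo uses a built-in list iterator, which assumes Python 3.

```python
import pickle


def check_pickle_roundtrip(make_iterator, n_consumed):
    """Return True if an iterator pickled after n_consumed items
    yields the same remaining items as an unpickled reference."""
    reference = make_iterator()
    candidate = make_iterator()
    for _ in range(n_consumed):
        next(reference)
        next(candidate)
    restored = pickle.loads(pickle.dumps(candidate))
    return list(restored) == list(reference)


# Assumes Python 3, where list iterators can be pickled.
roundtrip_ok = check_pickle_roundtrip(lambda: iter([10, 20, 30, 40]), 2)
```

Taking a factory rather than an iterator lets the helper build an independent reference copy, so the comparison does not depend on hard-coded expected values.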

Equizip

It'd be nice to have a picklable version of the equizip operator I wrote in mila-iqia/blocks#458. For example, the Merge transformer from mila-iqia/fuel#31 could use this (optionally, but by default) so that we can raise errors when a user is trying to zip two datastreams of different length.
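As a rough illustration of the requested behaviour, here is a class-based zip that raises when its inputs run out at different times. The name EquiZip and the choice of ValueError are assumptions made for this sketch; the operator in the referenced Blocks pull request may differ.

```python
class EquiZip(object):
    """Hypothetical zip variant that rejects inputs of unequal length."""

    def __init__(self, *iterables):
        self._iterators = [iter(iterable) for iterable in iterables]

    def __iter__(self):
        return self

    def __next__(self):
        sentinel = object()  # marks an exhausted input
        values = [next(iterator, sentinel) for iterator in self._iterators]
        exhausted = [value is sentinel for value in values]
        if all(exhausted):
            raise StopIteration
        if any(exhausted):
            raise ValueError('iterables have different lengths')
        return tuple(values)

    next = __next__  # Python 2 spelling of the iterator protocol


pairs = list(EquiZip([1, 2], 'ab'))  # [(1, 'a'), (2, 'b')]

try:
    list(EquiZip([1, 2, 3], 'ab'))
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

The object is only as picklable as the iterators it wraps. On Python 3.10 and later, the built-in zip(..., strict=True) performs a similar length check.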

Set up buildbot

Travis, maybe scrutinizer. It'll run pretty infrequently but this package is important enough for Blocks that we'd like it not to break.

PyPI checklist

  • Reorganize files to clean up namespace
  • Docstrings
  • Naive implementations of combinations and permutations (based on filtering product, permutations, etc. -- see itertools docs)
  • At least one serialization test per public function
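The naive approach mentioned in the checklist follows the equivalent recipes in the itertools documentation: permutations by filtering product, and combinations by filtering permutations. The sketch below shows only that filtering logic and uses generators for brevity; generators are not picklable, so a picklable version would need class-based iterators instead. The naive_ function names are hypothetical.

```python
import itertools


def naive_permutations(iterable, r=None):
    """Permutations as the index tuples from product with no repeats."""
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    for indices in itertools.product(range(n), repeat=r):
        if len(set(indices)) == r:
            yield tuple(pool[i] for i in indices)


def naive_combinations(iterable, r):
    """Combinations as the permutations whose indices are sorted."""
    pool = tuple(iterable)
    n = len(pool)
    for indices in naive_permutations(range(n), r):
        if sorted(indices) == list(indices):
            yield tuple(pool[i] for i in indices)


# Compare against the standard library to confirm matching behaviour.
perms_match = (list(naive_permutations('abc', 2)) ==
               list(itertools.permutations('abc', 2)))
combs_match = (list(naive_combinations('abcd', 2)) ==
               list(itertools.combinations('abcd', 2)))
```

This trades speed for simplicity: most candidate tuples are generated only to be discarded.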

xrange iterator

rangeiterator objects aren't picklable either, but a custom iterator should be simple enough.
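A custom iterator along these lines could be as small as the sketch below. The class name RangeIterator is hypothetical and this is not the package's actual implementation; it keeps three integers as state, so default pickling works on both Python 2 and 3.

```python
import pickle


class RangeIterator(object):
    """Hypothetical picklable iterator over start, start + step, ..."""

    def __init__(self, start, stop, step=1):
        if step == 0:
            raise ValueError('step must not be zero')
        self._current = start
        self._stop = stop
        self._step = step

    def __iter__(self):
        return self

    def __next__(self):
        if self._step > 0:
            exhausted = self._current >= self._stop
        else:
            exhausted = self._current <= self._stop
        if exhausted:
            raise StopIteration
        value = self._current
        self._current += self._step
        return value

    next = __next__  # Python 2 spelling of the iterator protocol


forward_ok = list(RangeIterator(0, 10, 3)) == list(range(0, 10, 3))
backward_ok = list(RangeIterator(5, -1, -2)) == list(range(5, -1, -2))

it = RangeIterator(0, 5)
next(it)
next(it)
tail = list(pickle.loads(pickle.dumps(it)))  # [2, 3, 4]
```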

Picklable file iterator?

@dwf Wondering if you consider this within the scope of the framework.

File handles are iterators, and they can't be pickled.

>>> import cPickle
>>> f = open('README.md')
>>> f
<open file 'README.md', mode 'r' at 0x7f13b1360d20>
>>> it = iter(f)
>>> it
<open file 'README.md', mode 'r' at 0x7f13b1360d20>
>>> next(it)
'picklable_itertools\n'
>>> cPickle.dumps(it)
TypeError: can't pickle file objects

Analogously to ordered_sequence_iterator, can I add a file_iterator? It will be a bit more involved, because it needs to handle a variety of things through custom __getstate__ and __setstate__ methods:

  • Upon pickling, close and remove file handle
  • When unpickling, re-open the file, re-iterate over the lines until we get back to where we were
  • Have sensible defaults for when the file is not available in the unpickling environment, is no longer writable, etc. Dill implements all of this, so we can steal a lot of the behaviour from there.
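The first two steps above can be sketched with custom __getstate__ and __setstate__ methods. The class name FileIterator is hypothetical, and the sketch makes simplifying assumptions: it leaves the handle out of the pickled state rather than closing it, it opens files in read-only text mode, and it has no fallback if the file is missing at unpickling time (open simply raises).

```python
import os
import pickle
import tempfile


class FileIterator(object):
    """Hypothetical picklable line iterator over a file path."""

    def __init__(self, path):
        self._path = path
        self._lines_read = 0
        self._handle = open(path, 'r')

    def __iter__(self):
        return self

    def __next__(self):
        line = self._handle.readline()
        if not line:
            raise StopIteration
        self._lines_read += 1
        return line

    next = __next__  # Python 2 spelling of the iterator protocol

    def close(self):
        self._handle.close()

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['_handle']  # file handles cannot be pickled
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Re-open the file and skip the lines that were already consumed.
        self._handle = open(self._path, 'r')
        for _ in range(self._lines_read):
            self._handle.readline()


# Demo on a temporary file so the example is self-contained.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('one\ntwo\nthree\n')
    path = tmp.name

it = FileIterator(path)
first = next(it)  # 'one\n'
restored = pickle.loads(pickle.dumps(it))
rest = list(restored)  # ['two\n', 'three\n']

it.close()
restored.close()
os.remove(path)
```

Counting lines rather than storing a byte offset mirrors the "re-iterate over the lines" step described above; it is slower on large files but avoids relying on tell() while iterating.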

Badges?

Don't know how to get the badges in README.rst back to normal.
