
python-zipstream's Introduction

python-zipstream


zipstream.py is a zip archive generator based on Python 3.3's zipfile.py. It was created to produce zip archives as a stream, for example from a web application. This is useful when you want to offer a downloadable archive of a large collection of regular files that would be infeasible to build before the download starts, or of a very large file that you do not want to hold entirely on disk or in memory.

The archive is generated as an iterator of byte strings, which, when joined, form the zip archive. For example, the following snippet writes a zip archive containing files from 'path' to a regular file:

import zipstream

z = zipstream.ZipFile()
z.write('path/to/files')

with open('zipfile.zip', 'wb') as f:
    for data in z:
        f.write(data)

zipstream can also take an iterable of byte strings as input and generate the archive as an iterator. This avoids storing large files on disk or in memory. For example:

def iterable():
    for _ in range(10):  # range works on both Python 2 and Python 3
        yield b'this is a byte string\x01\n'

z = zipstream.ZipFile()
z.write_iter('my_archive_iter', iterable())

with open('zipfile.zip', 'wb') as f:
    for data in z:
        f.write(data)

Of course, both approaches can be combined:

def iterable():
    for _ in range(10):  # range works on both Python 2 and Python 3
        yield b'this is a byte string\x01\n'

z = zipstream.ZipFile()
z.write('path/to/files', 'my_archive_files')
z.write_iter('my_archive_iter', iterable())

with open('zipfile.zip', 'wb') as f:
    for data in z:
        f.write(data)

Recent versions of web.py support returning an iterator of strings to be sent to the browser, so to serve a dynamically generated archive for download you could use something like this snippet:

def GET(self):
    path = '/path/to/dir/of/files'
    zip_filename = 'files.zip'
    web.header('Content-Type', 'application/zip')
    web.header('Content-Disposition', 'attachment; filename="%s"' % (
        zip_filename,))
    return zipstream.ZipFile(path)

If the zlib module is available, zipstream.ZipFile can generate compressed zip archives.
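
A minimal sketch of picking the compression method depending on whether zlib can be imported. It uses the constants from the standard-library zipfile module and assumes they are interchangeable with zipstream.ZIP_DEFLATED and zipstream.ZIP_STORED; the helper name pick_compression is hypothetical.

```python
import zipfile


def pick_compression():
    """Return ZIP_DEFLATED when zlib is importable, else ZIP_STORED."""
    try:
        import zlib  # only checking that the module is available
    except ImportError:
        return zipfile.ZIP_STORED
    return zipfile.ZIP_DEFLATED


compression = pick_compression()
# Assumed usage: zipstream.ZipFile(mode='w', compression=compression)
```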

Installation

pip install zipstream

Requirements

  • Python 2.6, 2.7, 3.2, 3.3, PyPy

Examples

flask

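from flask import Flask, Response
import zipstream

app = Flask(__name__)

@app.route('/package.zip', methods=['GET'], endpoint='zipball')
def zipball():
    def generator():
        # Build the archive lazily; chunks are produced as the response is sent.
        z = zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED)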

        z.write('/path/to/file')

        for chunk in z:
            yield chunk

    response = Response(generator(), mimetype='application/zip')
    response.headers['Content-Disposition'] = 'attachment; filename={}'.format('files.zip')
    return response

# or

@app.route('/package.zip', methods=['GET'], endpoint='zipball')
def zipball():
    z = zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED)
    z.write('/path/to/file')

    response = Response(z, mimetype='application/zip')
    response.headers['Content-Disposition'] = 'attachment; filename={}'.format('files.zip')
    return response

django 1.5+

from django.http import StreamingHttpResponse
import zipstream

def zipball(request):
    z = zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED)
    z.write('/path/to/file')

    response = StreamingHttpResponse(z, content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename={}'.format('files.zip')
    return response

webpy

def GET(self):
    path = '/path/to/dir/of/files'
    zip_filename = 'files.zip'
    web.header('Content-Type', 'application/zip')
    web.header('Content-Disposition', 'attachment; filename="%s"' % (
        zip_filename,))
    return zipstream.ZipFile(path)

Running tests

With Python 2.7 or later, run the following command: python -m unittest discover

Alternatively, you can use nose.

python-zipstream's People

Contributors

allanlei, hecvd, kouk, lesthaeghet, marcosdiez, peter-juritz, vstoykov


python-zipstream's Issues

Attempting to compress file larger than ZIP64_LIMIT fails

On line 265:

zinfo.file_size = 0

This is never initialized to the correct file size before this code is called on line 291:

zip64 = self._allowZip64 and zinfo.file_size * 1.05 > ZIP64_LIMIT

As a result, zip64 is always set to False. Later, when zinfo.file_size is set on line 323, after the compression has been performed and the file size counted, the exception on line 326 is raised.

I will submit a pull request with a possible quick fix. Thanks!
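
The arithmetic behind this report can be sketched as follows. The check is copied from the line quoted above; needs_zip64 is a hypothetical stand-in rather than zipstream's own code, and ZIP64_LIMIT is taken from the standard-library zipfile module on the assumption that zipstream uses the same value.

```python
import zipfile

ZIP64_LIMIT = zipfile.ZIP64_LIMIT  # (1 << 31) - 1 in the standard library


def needs_zip64(file_size, allow_zip64=True):
    """Mirror the check quoted in the issue for a given file_size."""
    return allow_zip64 and file_size * 1.05 > ZIP64_LIMIT


# With the placeholder size of 0 the check can never succeed.
print(needs_zip64(0))          # False
# With the real size known up front (for example from os.stat) it can.
three_gib = 3 * 1024 ** 3
print(needs_zip64(three_gib))  # True
```

Reading the size from os.stat(filename).st_size before the check is one possible fix; whether it matches the submitted pull request is not stated here.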

Archiving a ZIP Archive

I seem to be having issues when archiving files, where a few of the files are ZIP archives.

This will put me into an endless loop of uncompressing a ZIP into a CPGZ, then a CPGZ into the original ZIP, and so on...

Is this a known issue? Is there a workaround?

Add file objects rather than paths?

Is there a way to add file-like objects rather than file paths? Something like:

handle = open('foobar', 'rb')
z = zipstream.ZipFile(mode='w', compression=ZIP_DEFLATED)
z.write(handle)
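
One possible workaround, assuming write_iter accepts any iterable of byte strings as the README describes, is to wrap the file object in a generator that yields fixed-size chunks. The name iter_chunks and the 64 KiB chunk size are illustrative choices, not part of zipstream.

```python
import io


def iter_chunks(fileobj, chunk_size=64 * 1024):
    """Yield successive byte chunks from a binary file-like object."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Demonstration with an in-memory file; any object opened in binary mode works.
handle = io.BytesIO(b'x' * 150000)
sizes = [len(chunk) for chunk in iter_chunks(handle)]
print(sizes)  # [65536, 65536, 18928]

# Assumed usage with zipstream:
#   z.write_iter('foobar', iter_chunks(open('foobar', 'rb')))
```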

A clear example for archiving whole folders and subfolders would be nice

The example in README seems to imply that simply doing z.write('/path/to/files/') is sufficient but that generates empty archives. Something like:

    z = zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED)
    for root, dirs, files in os.walk(path):
        for filename in files:
            file_path = os.path.join(root, filename)
            arcpath = os.path.join(path, os.path.relpath(file_path, path))
            z.write(file_path, arcpath)
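
A self-contained variant of the same idea, written as a sketch. The helper add_tree is hypothetical and only relies on the archive object having a write(filename, arcname) method, as used in the README examples. It stores names relative to the root directory, which is a design choice; the snippet above keeps the root prefix instead. The demonstration uses the standard-library zipfile.ZipFile so that it runs without zipstream installed.

```python
import io
import os
import tempfile
import zipfile


def add_tree(z, root):
    """Add every regular file under root to z, with arcnames relative to root."""
    added = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # deterministic traversal order
        for filename in sorted(filenames):
            file_path = os.path.join(dirpath, filename)
            arcname = os.path.relpath(file_path, root)
            z.write(file_path, arcname)
            added.append(arcname)
    return added


# Demonstration on a temporary tree with one nested directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'sub'))
    for rel in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(root, rel), 'w') as f:
            f.write('hello')
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as z:
        names = add_tree(z, root)
```

With zipstream, the assumed call would be add_tree(zipstream.ZipFile(mode='w'), path), followed by iterating over the archive object.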

Install tests directory as a package

Hi and thank you very much for this package!

It seems that python setup.py install also creates a package corresponding to the tests directory.

On my system (Debian wheezy), it creates the following two directories:

  • /usr/local/lib/python2.7/dist-packages/zipstream , which is fine
  • /usr/local/lib/python2.7/dist-packages/tests, which causes issues for the test suites of other local packages because it puts a 'tests' directory on the Python import path.

Deleting __init__.py from tests directory seems to fix it.
Excluding tests directory in setup.py might be another solution.

handling exceptions while streaming

Here is the scenario:

  • create a zipstream.ZipFile
  • add some files through the ZipFile.write function
  • delete some of the added files
  • stream the content using ZipFile.__iter__

While streaming, opening one of the deleted files raises an exception (OSError or IOError). You want to skip that specific file and still stream all the other valid entries.

I did not find any way to achieve this behaviour. Maybe some option like skip_on_error could help.
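
Until such an option exists, one partial workaround might be to add files through write_iter with a generator that opens the file lazily and yields nothing when the file cannot be opened. This is only a sketch: read_or_skip is a hypothetical helper, it assumes write_iter consumes its iterable during streaming rather than when it is called, and the missing file would still appear in the archive as an empty entry instead of being skipped.

```python
import os
import tempfile


def read_or_skip(path, chunk_size=64 * 1024):
    """Yield the file's bytes in chunks; yield nothing if it cannot be opened."""
    try:
        f = open(path, 'rb')
    except (IOError, OSError):
        return  # file vanished or is unreadable: produce an empty stream
    with f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


# Demonstration: one file that exists and one that is missing at read time.
with tempfile.TemporaryDirectory() as tmp:
    kept = os.path.join(tmp, 'kept.txt')
    with open(kept, 'wb') as f:
        f.write(b'data')
    gone = os.path.join(tmp, 'gone.txt')  # never created
    kept_bytes = b''.join(read_or_skip(kept))
    gone_bytes = b''.join(read_or_skip(gone))

# Assumed usage with zipstream:
#   z.write_iter('kept.txt', read_or_skip('/path/to/kept.txt'))
```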

Support password protected archive generation

The original encryption algorithm supported by zipfile (for decryption only) is known to be insecure but is still needed in specific cases.

There is a pure Python implementation of this feature in zipencrypt, but you lose the streaming feature of your library.

The goal of this issue is to merge both features, making it possible to generate password-protected zip files on the fly.

Error in the examples

The ZipStream class does not exist; the ZipFile class should be used instead.

Trying auto-completion for "zipstream." in an IPython shell, this is what I get:
zipstream.BZIP2_VERSION zipstream.str
zipstream.LZMA_VERSION zipstream.stringCentralDir
zipstream.PointerIO zipstream.stringDataDescriptor
zipstream.ZIP64_LIMIT zipstream.stringEndArchive
zipstream.ZIP64_VERSION zipstream.stringEndArchive64
zipstream.ZIP_BZIP2 zipstream.stringEndArchive64Locator
zipstream.ZIP_DEFLATED zipstream.stringFileHeader
zipstream.ZIP_FILECOUNT_LIMIT zipstream.struct
zipstream.ZIP_LZMA zipstream.structCentralDir
zipstream.ZIP_MAX_COMMENT zipstream.structEndArchive
zipstream.ZIP_STORED zipstream.structEndArchive64
zipstream.ZipFile zipstream.structEndArchive64Locator
zipstream.ZipInfo zipstream.structFileHeader
zipstream.bytes zipstream.sys
zipstream.compat zipstream.time
zipstream.crc32 zipstream.unicode_literals
zipstream.os zipstream.with_statement
zipstream.print_function zipstream.zipfile
zipstream.stat zipstream.zlib

So, in every example, it should be "z = ZipFile(mode='w', compression=ZIP_DEFLATED)" and not "ZipStream".

Interested in a new maintainer? (update: forked package)

Dear @allanlei,
It seems you no longer have time to work on this package. Since there are people using it, would you be willing to add another owner/maintainer to the PyPI package and possibly this repo? That way we can ensure PRs are handled and new versions are released.
I would be willing to help out.
Kind regards

broken zip

Hi! This test code creates a zip, but the files inside are corrupted. Python 3.8.

stream = ZipFile(mode='w', compression=ZIP_DEFLATED)
for file in path2files:
    stream.write(file, arcname=osp.basename(file))

for b in stream:
    handler.wfile.write(b)

handler is a BaseHTTPRequestHandler object.
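
One way to narrow down a report like this is to check whether the streamed bytes already form a valid archive before they reach the HTTP layer. The sketch below joins the chunks in memory and runs the standard library's ZipFile.testzip() over the result; first_bad_member is a hypothetical helper, and the demonstration builds its chunks with the standard-library zipfile module rather than zipstream. Joining everything in memory defeats the purpose of streaming, so this is meant for small test archives only.

```python
import io
import zipfile


def first_bad_member(chunks):
    """Join streamed chunks and return the first corrupt member name, or None."""
    data = b''.join(chunks)
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.testzip()


# Demonstration with an archive built by the standard library and split into chunks.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr('hello.txt', b'hello world')
raw = buf.getvalue()
chunks = [raw[i:i + 16] for i in range(0, len(raw), 16)]
result = first_bad_member(chunks)
```

If the check passes for the bytes produced by the stream object, the corruption is more likely introduced when the response is written, for example by the headers sent before the body.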
