allanlei / python-zipstream

This project forked from spideroak/zipstream

Like Python's ZipFile module, except it works as a generator that provides the file in many small chunks.

License: GNU General Public License v3.0

Python 100.00%

zip python stream

python-zipstream

zipstream.py is a zip archive generator based on Python 3.3's zipfile.py. It was created to produce zip archives as a stream, for example in web apps. This is useful when you want to offer a downloadable archive of a large collection of regular files that would be infeasible to build before the download starts, or an archive of a very large file that you do not want to store entirely on disk or in memory.

The archive is generated as an iterator of strings, which, when joined, form the zip archive. For example, the following code snippet would write a zip archive containing files from 'path' to a normal file:

import zipstream

z = zipstream.ZipFile()
z.write('path/to/files')

with open('zipfile.zip', 'wb') as f:
    for data in z:
        f.write(data)

zipstream can also take an iterable of byte strings as input and generate the archive as an iterator. This avoids storing large files on disk or in memory. To do so, you could use something like this snippet:

def iterable():
    for _ in range(10):  # range works on both Python 2 and 3; xrange is Python 2 only
        yield b'this is a byte string\x01\n'

z = zipstream.ZipFile()
z.write_iter('my_archive_iter', iterable())

with open('zipfile.zip', 'wb') as f:
    for data in z:
        f.write(data)

Of course, both approaches can be combined:

def iterable():
    for _ in range(10):  # range works on both Python 2 and 3; xrange is Python 2 only
        yield b'this is a byte string\x01\n'

z = zipstream.ZipFile()
z.write('path/to/files', 'my_archive_files')
z.write_iter('my_archive_iter', iterable())

with open('zipfile.zip', 'wb') as f:
    for data in z:
        f.write(data)

Recent versions of web.py support returning iterators of strings to be sent to the browser. To serve a dynamically generated archive as a download, you could use something like this snippet:

def GET(self):
    path = '/path/to/dir/of/files'
    zip_filename = 'files.zip'
    web.header('Content-type' , 'application/zip')
    web.header('Content-Disposition', 'attachment; filename="%s"' % (
        zip_filename,))
    return zipstream.ZipFile(path)

If the zlib module is available, zipstream.ZipFile can generate compressed zip archives.

Installation

pip install zipstream

Requirements

Python 2.6, 2.7, 3.2, 3.3, pypy

Examples

flask

import zipstream
from flask import Flask, Response

app = Flask(__name__)

@app.route('/package.zip', methods=['GET'], endpoint='zipball')
def zipball():
    def generator():
        # Build the archive lazily; nothing is read until iteration starts.
        z = zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED)

        z.write('/path/to/file')

        for chunk in z:
            yield chunk

    response = Response(generator(), mimetype='application/zip')
    response.headers['Content-Disposition'] = 'attachment; filename={}'.format('files.zip')
    return response

# or pass the ZipFile object directly (use one of the two definitions, not both)

@app.route('/package.zip', methods=['GET'], endpoint='zipball')
def zipball():
    z = zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED)
    z.write('/path/to/file')

    response = Response(z, mimetype='application/zip')
    response.headers['Content-Disposition'] = 'attachment; filename={}'.format('files.zip')
    return response

django 1.5+

import zipstream
from django.http import StreamingHttpResponse

def zipball(request):
    z = zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED)
    z.write('/path/to/file')

    # StreamingHttpResponse iterates over z and sends each chunk as it is produced.
    response = StreamingHttpResponse(z, content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename={}'.format('files.zip')
    return response

webpy

def GET(self):
    path = '/path/to/dir/of/files'
    zip_filename = 'files.zip'
    web.header('Content-type' , 'application/zip')
    web.header('Content-Disposition', 'attachment; filename="%s"' % (
        zip_filename,))
    return zipstream.ZipFile(path)

Running tests

With Python 2.7 or later, just run the following command: python -m unittest discover

Alternatively, you can use nose.

python-zipstream's Issues

Attempting to compress file larger than ZIP64_LIMIT fails

On line 265:

zinfo.file_size = 0

This is never initialized to the correct file size before this code is called on line 291:

zip64 = self._allowZip64 and zinfo.file_size * 1.05 > ZIP64_LIMIT

As a result, zip64 is always set to false. Later, when zinfo.file_size is set on line 323 (after the compression is performed and the file size is counted), the exception on line 326 is raised.

I will submit a pull request with a possible quick fix. Thanks!
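
To make the arithmetic in this report concrete, here is a minimal sketch using only the standard library. The helper name needs_zip64 is hypothetical and is not part of zipstream; it simply mirrors the expression quoted from line 291, with zipfile.ZIP64_LIMIT standing in for the library's constant (an assumption that the two values match).

```python
import zipfile

def needs_zip64(file_size, allow_zip64=True):
    """Mirror the quoted check: pad the size by 5% and compare it to the limit."""
    return allow_zip64 and file_size * 1.05 > zipfile.ZIP64_LIMIT

three_gib = 3 * 1024 ** 3

# With the placeholder size of 0 the check can never request ZIP64.
print(needs_zip64(0))
# With the real size of a 3 GiB file it would.
print(needs_zip64(three_gib))
```

Because the size is only known after the data has been read, a check made with file_size still at 0 cannot detect a large file, which is consistent with the failure described above.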

Archiving a ZIP Archive

I seem to be having issues when archiving files, where a few of the files are ZIP archives.

This will put me into an endless loop of uncompressing a ZIP into a CPGZ, then a CPGZ into the original ZIP, and so on...

Is this a known issue? Is there a workaround?

Add file objects rather than paths?

Is there a way to add file-like objects rather than file paths? Something like:

handle = open('foobar', 'r')
z = zipstream.ZipFile(mode='w', compression=ZIP_DEFLATED)
z.write(handle)
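
One possible workaround, sketched under the assumption that write_iter (shown in the README) accepts any iterable of byte strings: wrap the file-like object in a generator that reads it in chunks. The helper name iter_chunks and the chunk size are illustrative choices, not part of zipstream.

```python
import io

def iter_chunks(fileobj, chunk_size=8192):
    """Yield successive byte chunks from a binary file-like object."""
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk

# Intended use with zipstream (not executed here):
#   z = zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED)
#   z.write_iter('foobar', iter_chunks(open('foobar', 'rb')))

# Quick demonstration with an in-memory file object.
demo = list(iter_chunks(io.BytesIO(b'abcdef'), chunk_size=4))
```

The file object should be opened in binary mode ('rb'), since the archive is built from byte strings.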

A clear example for archiving whole folders and subfolders would be nice

The example in README seems to imply that simply doing z.write('/path/to/files/') is sufficient but that generates empty archives. Something like:

    z = zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED)
    for root, dirs, files in os.walk(path):
        for filename in files:
            file_path = os.path.join(root, filename)
            arcpath = os.path.join(path, os.path.relpath(file_path, path))
            z.write(file_path, arcpath)
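
A variant of the same idea as a reusable helper, offered as a sketch rather than as the library's recommended approach. The name add_tree is hypothetical. It relies only on the archive object having a write(filename, arcname) method, which the README shows for zipstream.ZipFile and which the standard zipfile.ZipFile also provides. Unlike the snippet above, it stores names relative to the root directory.

```python
import os

def add_tree(archive, root_dir):
    """Add every regular file under root_dir, named relative to root_dir."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            file_path = os.path.join(dirpath, filename)
            arcname = os.path.relpath(file_path, root_dir)
            archive.write(file_path, arcname)

# Intended use with zipstream (not executed here):
#   z = zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED)
#   add_tree(z, '/path/to/files')
```

Only files are added, so empty directories do not appear in the archive.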

Allow to zip empty directories

At the moment the library doesn't allow adding an empty directory to the zipped stream.

There is also a TODO comment in the code:
https://github.com/allanlei/python-zipstream/blob/master/zipstream/__init__.py#L214

@allanlei do you have any plan on this? I really would like to help and I've read the code, but I think that I should know the zip structure details in order to implement this feature.

Install tests directory as a package

Hi and thank you very much for this package !

It seems that python setup.py install also creates a package corresponding to the tests directory.

On my system (Debian wheezy), it creates the following two directories:

/usr/local/lib/python2.7/dist-packages/zipstream, which is fine
/usr/local/lib/python2.7/dist-packages/tests, which causes issues for the test suites of other local packages because it puts a 'tests' package on the Python import path.

Deleting __init__.py from tests directory seems to fix it.
Excluding tests directory in setup.py might be another solution.

handling exceptions while streaming

This is the case scenario:

create a zipstream.ZipFile
add some files through the ZipFile.write function
delete some of the added files
stream the content using ZipFile.__iter__

So while streaming, you try to open a file but an exception (OSError or IOError) is raised. You want to skip that specific file and stream all other valid elements.

I did not find any way to achieve this behaviour. Maybe some option like skip_on_error could help.
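
Until such an option exists, one partial workaround can be sketched with write_iter from the README: feed each file through a generator that swallows the error. The helper name chunks_or_nothing is hypothetical. Note the assumptions: this does not skip the entry, it produces an entry with no data (or truncated data if the error occurs mid-read), and it assumes write_iter copes with an iterable that yields nothing.

```python
def chunks_or_nothing(path, chunk_size=8192):
    """Yield the file's bytes in chunks, or stop quietly if it cannot be read."""
    try:
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    return
                yield chunk
    except (OSError, IOError):
        # The file was deleted or became unreadable; yield nothing further.
        return

# Intended use with zipstream (not executed here):
#   z = zipstream.ZipFile(mode='w')
#   z.write_iter('report.txt', chunks_or_nothing('/path/to/report.txt'))
```

Because the file is opened only when the generator is consumed, the error is handled at streaming time, which is when the failure in this scenario occurs.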

Python 3.5 support

Hi, a new bug came in to Ubuntu recently - https://bugs.launchpad.net/ubuntu/+source/python-zipstream/+bug/1480548

It seems to me that Python 3.5 slightly changed the expected semantics of file-like objects. It looks to me that simply removing the seek() function from the zipstream.PointerIO class could do the trick. I have not investigated whether there are any other problems yet, as I don't have Python 3.5 yet.

Support password protected archive generation

The original encryption algorithm supported by zipfile (for decryption only) is known to be insecure, but it is still needed in specific cases.

There is a pure Python implementation of this feature in zipencrypt, but you lose the streaming feature of your library.

The goal of this issue is to merge both features and make it possible to generate password-protected zip files on the fly.

The documentation of write_iter is wrong.

z.write_iter(iterable(), 'my_archive_iter') should be z.write_iter('my_archive_iter', iterable())

Error in the examples

The ZipStream class does not exist; we should use the ZipFile class instead.

Trying auto-completion in an IPython shell for "zipstream.", this is what I get:
zipstream.BZIP2_VERSION zipstream.str
zipstream.LZMA_VERSION zipstream.stringCentralDir
zipstream.PointerIO zipstream.stringDataDescriptor
zipstream.ZIP64_LIMIT zipstream.stringEndArchive
zipstream.ZIP64_VERSION zipstream.stringEndArchive64
zipstream.ZIP_BZIP2 zipstream.stringEndArchive64Locator
zipstream.ZIP_DEFLATED zipstream.stringFileHeader
zipstream.ZIP_FILECOUNT_LIMIT zipstream.struct
zipstream.ZIP_LZMA zipstream.structCentralDir
zipstream.ZIP_MAX_COMMENT zipstream.structEndArchive
zipstream.ZIP_STORED zipstream.structEndArchive64
zipstream.ZipFile zipstream.structEndArchive64Locator
zipstream.ZipInfo zipstream.structFileHeader
zipstream.bytes zipstream.sys
zipstream.compat zipstream.time
zipstream.crc32 zipstream.unicode_literals
zipstream.os zipstream.with_statement
zipstream.print_function zipstream.zipfile
zipstream.stat zipstream.zlib

So, in every example, it should be : "z = ZipFile(mode='w', compression=ZIP_DEFLATED)"
And not "ZipStream"

ZipFile.__iter__ seems to ignore data added with writestr()

Title says it all. Is this a bug, or was writestr() never meant to be supported?

Interested in a new maintainer? (update: forked package)

Dear @allanlei,
It seems you have no time available anymore to work on this package. Since there are people using it, would you be willing to add another owner/maintainer to the PyPI package and possibly this repo? That way we can ensure PR's are handled and new versions are released.
I would be willing to help out.
Kind regards

broken zip

Hi! This test code creates a zip, but the files are corrupted! Python 3.8

stream = ZipFile(mode='w', compression=ZIP_DEFLATED)
for file in path2files:
    stream.write(file, arcname=osp.basename(file))

for b in stream:
    handler.wfile.write(b)

handler is a BaseHTTPRequestHandler object.
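
The report does not say how the files are corrupted. One way to narrow it down, sketched here with the standard library only, is to join the streamed chunks into a single byte string and let zipfile check every member's CRC. The helper name first_corrupt_member is hypothetical. This is a diagnostic aid, not a fix, and it does not identify the cause.

```python
import io
import zipfile

def first_corrupt_member(data):
    """Return the name of the first member that fails its CRC check, or None."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.testzip()

# Intended use (not executed here):
#   data = b''.join(stream)          # the same chunks that go to handler.wfile
#   print(first_corrupt_member(data))
```

If the joined bytes pass this check but the downloaded file does not, the problem is more likely in how the response is sent than in the archive generation. If the bytes are not a readable archive at all, zipfile.ZipFile raises zipfile.BadZipFile.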