zipstreamer's Introduction

ZipStreamer is a golang microservice for streaming zip files from a series of web links, on the fly. For example, if you have 200 files on S3, and you want to download a zip file of them, you can do so in 1 request to this server.

Highlights include:

  • Low memory: the files are streamed out to the client immediately
  • Low CPU: the default server doesn't compress files, only packages them into a zip, so there's minimal CPU load (configurable)
  • High concurrency: the two properties above allow a single small server to stream hundreds of large zips simultaneously
  • Easy to host: several deployment options, including Docker images and two one-click deployers
  • It includes an HTTP server, but can be used as a library (see zip_streamer.go)

JSON Zip File Descriptor

Each HTTP endpoint requires a JSON description of the desired zip file. It includes a root object with the following structure:

  • suggestedFilename [Optional, string]: The filename to suggest in the "Save As" UI in browsers. Defaults to archive.zip if not provided or invalid. Limited to US-ASCII.
  • files [Required, array]: an array describing the files to include in the zip file. Each array entry requires 2 properties:
    • url [Required, string]: the public URL of the file to include in the zip. Zipstreamer will fetch this via a GET request. The file must be publicly accessible via this URL; if your files are private, most file hosts provide query string authentication options which work well with Zipstreamer (example AWS S3 Docs).
    • zipPath [Required, string]: the path and filename where this entry should appear in the resulting zip file. This is a relative path to the root of the zip file.

Example JSON description with 2 files:

{
  "suggestedFilename": "tps_reports.zip",
  "files": [
    {
      "url":"https://server.com/image1.jpg",
      "zipPath":"image1.jpg"
    },
    {
      "url":"https://server.com/image2.jpg",
      "zipPath":"in-a-sub-folder/image2.jpg"
    }
  ]
}
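
If you generate descriptors from a Go program, the structure above maps onto two small structs. This is a minimal sketch: the type names ZipDescriptor and FileEntry and the encodeDescriptor helper are hypothetical, and only the JSON field names come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// FileEntry mirrors one element of the "files" array.
type FileEntry struct {
	URL     string `json:"url"`
	ZipPath string `json:"zipPath"`
}

// ZipDescriptor mirrors the root object. suggestedFilename is optional,
// so it is omitted from the output when empty.
type ZipDescriptor struct {
	SuggestedFilename string      `json:"suggestedFilename,omitempty"`
	Files             []FileEntry `json:"files"`
}

// encodeDescriptor serializes a descriptor to a JSON string.
func encodeDescriptor(d ZipDescriptor) (string, error) {
	b, err := json.Marshal(d)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	d := ZipDescriptor{
		SuggestedFilename: "tps_reports.zip",
		Files: []FileEntry{
			{URL: "https://server.com/image1.jpg", ZipPath: "image1.jpg"},
			{URL: "https://server.com/image2.jpg", ZipPath: "in-a-sub-folder/image2.jpg"},
		},
	}
	s, err := encodeDescriptor(d)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s)
}
```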

HTTP Endpoints

POST /download

This endpoint takes an HTTP POST body containing the JSON zip file descriptor, and returns a zip file.

Example usage with curl

Example curl usage of POST /download endpoint

# download a sample json descriptor
curl https://gist.githubusercontent.com/scosman/f57a3561fed98caab2d0ae285a0d7251/raw/4a9630951373e50f467f41d8c7b9d440c13a14d2/zipJsonDescriptor.json > zipJsonDescriptor.json
# call POST /download endpoint, passing json descriptor in body
curl --data-binary "@./zipJsonDescriptor.json" http://localhost:4008/download > archive.zip
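
The same call can be made from Go. The sketch below assumes a server on localhost:4008 and a zipJsonDescriptor.json file in the working directory, as in the curl example. The newDownloadRequest helper is a hypothetical name, and the application/json Content-Type header is an assumption; the curl example does not set one.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"os"
)

// newDownloadRequest builds a POST /download request carrying the descriptor JSON.
func newDownloadRequest(baseURL string, descriptor []byte) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPost, baseURL+"/download", bytes.NewReader(descriptor))
	if err != nil {
		return nil, err
	}
	// Assumption: the header is not documented as required.
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	descriptor, err := os.ReadFile("zipJsonDescriptor.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read descriptor:", err)
		return
	}
	req, err := newDownloadRequest("http://localhost:4008", descriptor)
	if err != nil {
		fmt.Fprintln(os.Stderr, "build request:", err)
		return
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		return
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		fmt.Fprintln(os.Stderr, "unexpected status:", resp.Status)
		return
	}
	out, err := os.Create("archive.zip")
	if err != nil {
		fmt.Fprintln(os.Stderr, "create file:", err)
		return
	}
	defer out.Close()
	// Stream the response to disk rather than buffering the whole zip in memory.
	n, err := io.Copy(out, resp.Body)
	if err != nil {
		fmt.Fprintln(os.Stderr, "download interrupted:", err)
		return
	}
	fmt.Printf("wrote %d bytes to archive.zip\n", n)
}
```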

GET /download

This endpoint fetches a JSON zip file descriptor hosted on another server, and returns a zip file. It is preferable to the POST /download endpoint in a few cases:

  • You want to hide from the client where the original files are hosted (see zsid parameter)
  • Use cases where POST requests aren't easy to adopt (traditional static webpages)
  • You want to trigger a browser's "Save File" UI, which isn't shown for POST requests. See POST /create_download_link for a client side alternative to achieve this.

This endpoint requires one of two query parameters describing where to find the JSON zip file descriptor:

  • zsurl: the full URL to the JSON file describing the zip. Example: /download?zsurl=https://yourserver.com/path_to_descriptors/82a1b54cd20ab44a916bd76a5
  • zsid: must be used with the ZS_LISTFILE_URL_PREFIX environment variable. The JSON file will be fetched from ZS_LISTFILE_URL_PREFIX + zsid. This allows you to hide the full URL path from clients, revealing only the end of the URL. Example: ZS_LISTFILE_URL_PREFIX = "https://yourserver.com/path_to_descriptors/" and /download?zsid=82a1b54cd20ab44a916bd76a5

Example usage with curl

Example curl usage of GET /download endpoint with zsurl parameter

curl -X GET "http://localhost:4008/download?zsurl=https://gist.githubusercontent.com/scosman/f57a3561fed98caab2d0ae285a0d7251/raw/4a9630951373e50f467f41d8c7b9d440c13a14d2/zipJsonDescriptor.json" > archive.zip

Example curl usage of GET /download endpoint with zsid parameter

# start server with ZS_LISTFILE_URL_PREFIX
ZS_LISTFILE_URL_PREFIX="https://gist.githubusercontent.com/scosman/" ./zipstreamer
# call `GET /download` endpoint with zsid
curl -X GET "http://localhost:4008/download?zsid=f57a3561fed98caab2d0ae285a0d7251/raw/4a9630951373e50f467f41d8c7b9d440c13a14d2/zipJsonDescriptor.json" > archive.zip
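
When building these links in Go, net/url can assemble the query string. This is a minimal sketch with hypothetical helper names. It percent-encodes the parameter value, on the assumption that the server decodes query parameters in the standard way. resolveListfile only restates the documented rule that the descriptor is fetched from ZS_LISTFILE_URL_PREFIX + zsid; it is not the server's code.

```go
package main

import (
	"fmt"
	"net/url"
)

// downloadURL builds a GET /download link with a single query parameter,
// either "zsurl" or "zsid".
func downloadURL(base, param, value string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/download"
	q := url.Values{}
	q.Set(param, value)
	// Encode percent-escapes the value so it is safe inside a query string.
	u.RawQuery = q.Encode()
	return u.String(), nil
}

// resolveListfile shows where a zsid points, per the rule described above.
func resolveListfile(prefix, zsid string) string {
	return prefix + zsid
}

func main() {
	link, err := downloadURL("http://localhost:4008", "zsid", "82a1b54cd20ab44a916bd76a5")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(link)
	fmt.Println(resolveListfile("https://yourserver.com/path_to_descriptors/", "82a1b54cd20ab44a916bd76a5"))
}
```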

POST /create_download_link

This endpoint takes an HTTP POST body containing the JSON zip file descriptor, stores it in a local cache, and returns a link ID which allows the caller to fetch the zip file via an additional call to GET /download_link/{link_id}.

This is useful if you want to trigger a browser "Save File" UI, which isn't shown for POST requests. See GET /download for a server side alternative to achieve this.

Important:

  • These links only live for 60 seconds. They are expected to be used immediately.
  • This stores the link in an in-memory cache, so it's not suitable for deploying to a multi-server cluster without extra configuration. If you are hosting on a multi-server cluster, see the deployment section for configuration advice.

Here is an example response body containing the link ID. See docs for GET /download_link/{link_id} below for how to fetch this zip file:

{
  "status":"ok",
  "link_id":"b4ecfdb7-e0fa-4aca-ad87-cb2e4245c8dd"
}

Example usage: see GET /download_link/{link_id} documentation below.

GET /download_link/{link_id}

Call this endpoint with a link_id generated with /create_download_link to download that zip file.

Example usage with curl

Example curl usage of POST /create_download_link and GET /download_link/{link_id} endpoints working together

# download a sample json descriptor
curl https://gist.githubusercontent.com/scosman/f57a3561fed98caab2d0ae285a0d7251/raw/4a9630951373e50f467f41d8c7b9d440c13a14d2/zipJsonDescriptor.json > zipJsonDescriptor.json
# call POST endpoint to create link
curl --data-binary "@./zipJsonDescriptor.json" http://localhost:4008/create_download_link
# Call GET endpoint to download zip. Note: must copy UUID from output of above POST command into this URL
curl -X GET "http://localhost:4008/download_link/UUID_FROM_ABOVE" > archive.zip
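
In a program, the copy-and-paste step can be replaced by parsing the response body. The sketch below assumes the response has the shape shown earlier; the linkResponse type and linkURL helper are hypothetical names, and treating any status other than "ok" as an error is an assumption, since no other status values are documented here.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
)

// linkResponse mirrors the example response body of POST /create_download_link.
type linkResponse struct {
	Status string `json:"status"`
	LinkID string `json:"link_id"`
}

// linkURL parses a create_download_link response and returns the matching
// GET /download_link/{link_id} URL.
func linkURL(base string, body []byte) (string, error) {
	var r linkResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return "", err
	}
	if r.Status != "ok" || r.LinkID == "" {
		return "", fmt.Errorf("link not created: status %q", r.Status)
	}
	return base + "/download_link/" + url.PathEscape(r.LinkID), nil
}

func main() {
	// Sample body copied from the example response above.
	body := []byte(`{"status":"ok","link_id":"b4ecfdb7-e0fa-4aca-ad87-cb2e4245c8dd"}`)
	link, err := linkURL("http://localhost:4008", body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(link)
}
```

Because links live for only 60 seconds, the GET request should follow the POST immediately.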

Deploy

Heroku - One Click Deploy

Be sure to enable session affinity if you're using multiple servers and using /create_download_link.

Google Cloud Run - One Click Deploy, Serverless

Cloud Run is an ideal serverless environment for ZipStreamer, as it routes many requests to a single container instance. ZipStreamer is designed to handle many concurrent requests, and will be cheaper to run on this serverless architecture than an instance-per-request architecture like AWS Lambda or Google Cloud Functions.

Important

  • The one-click deploy button has a bug and may force you to set the optional environment variables. If the server isn't working, check ZS_URL_PREFIX is blank in the Cloud Run console.
  • Be sure to enable session affinity if you're using /create_download_link. Cloud Run may scale up to multiple containers automatically.

Docker

This repo contains a Dockerfile, and an image is published on GitHub Packages.

Build Your Own Image

To build your own image, clone the repo and run:

docker build --tag docker-zipstreamer .
# Start on port 8080
docker run --env PORT=8080 -p 8080:8080 docker-zipstreamer

Run Official Package from GitHub Packages

Official packages are published on GitHub Packages. To pull the latest stable release:

docker pull ghcr.io/scosman/packages/zipstreamer:stable
# Start on port 8080
docker run --env PORT=8080 -p 8080:8080 ghcr.io/scosman/packages/zipstreamer:stable

Note: stable pulls the latest GitHub release. Use ghcr.io/scosman/packages/zipstreamer:latest for top of tree.

Configuration Options

These environment variables can be used to configure the server:

  • PORT - Defaults to 4008. Sets which port the HTTP server binds to.
  • ZS_URL_PREFIX - If set, the server will verify that the url property of each file in the JSON zip file descriptor starts with this prefix. Useful for preventing others from using your server to serve their files.
  • ZS_COMPRESSION - Defaults to no compression. It's not universally known, but zip files can be uncompressed, and used only to combine many files into one file. Set to DEFLATE to use zip deflate compression. WARNING - enabling compression uses CPU, and will reduce the throughput of the server. Note: for files with internal compression (JPEGs, MP4s, etc), zip DEFLATE compression will often increase the total zip file size.
  • ZS_LISTFILE_URL_PREFIX - See documentation for GET /download
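
As an illustration of how two of these options behave, the sketch below reads PORT with its documented default and applies a ZS_URL_PREFIX-style check. The helper names portFromEnv and urlAllowed are hypothetical, and this is a reading of the descriptions above, not the server's actual code.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// portFromEnv returns the PORT environment variable, or the documented
// default of 4008 when it is unset or empty.
func portFromEnv() string {
	if p := os.Getenv("PORT"); p != "" {
		return p
	}
	return "4008"
}

// urlAllowed reports whether fileURL passes a prefix check. An empty prefix
// means the check is disabled and every URL is allowed.
func urlAllowed(prefix, fileURL string) bool {
	return prefix == "" || strings.HasPrefix(fileURL, prefix)
}

func main() {
	prefix := os.Getenv("ZS_URL_PREFIX")
	fmt.Println("port:", portFromEnv())
	fmt.Println("allowed:", urlAllowed(prefix, "https://server.com/image1.jpg"))
}
```

A plain prefix match is strict about exact text, so a prefix that ends in a slash (for example a full bucket path) avoids matching look-alike hosts that merely begin with the same characters.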

Why

I was mentoring at a "Teens Learning Code" class, but we had too many mentors, so I had some downtime.

Logo

Zipper portion of logo by Kokota from Noun Project (Creative Commons CCBY)

zipstreamer's Issues

Consider supporting streaming zip file descriptors

For larger zip files it can be prohibitive to provide the entire descriptor in one shot. It would be nice if the descriptor could be paginated in some way. The use case here is running a traditional prefork app on Heroku where 30 second time limits apply.

Stuck on Starting Server

First off, just wanted to say thanks this is such a useful repo! I've managed to deploy using Google Cloud run for testing and works great.

Only issue I'm having is trying to run it elsewhere. I want to run on AWS as that's where the rest of my infrastructure is, but when I try to run on an EC2 instance, on App Runner or on my MacBook locally, it just gets stuck on Server starting on port 8080.

Any ideas what might be causing that?

CONGRATULATIONS!

I have not tested the project yet... but it looks amazingly AWESOME.

One question though, I have no experience with Golang at all, even though there is even a Framework under my name... LOL

Do you suggest I should take a course on that, so I can catch up and understand this new environment?

Best regards

Hugo Barbosa
MyCloudVIP
