Git Product home page Git Product logo

client-server-flask-big-file-uploads's Introduction

Quick test of large file uploads from Python to Flask app

Problem described in reanahub/reana-client#302.

Prerequisites

Create a/some big file(s):

$ dd if=/dev/random of=2048MB-file.txt count=2048 bs=1048576
$ ls -lh 2048MB-file.txt
-rw-r--r--  1 username  staff   2.0G Jul  5 11:49 2048MB-file.txt

Local test

Run the server locally with Flask

Run the Flask application:

$ FLASK_ENV=development FLASK_APP=app.py flask run
...

Upload files of different sizes

In a different window upload big files:

$ ls -lah *-file.txt
-rw-r--r--  1 rodrigdi  staff   256M Oct 10 13:50 256MB-file.txt
-rw-r--r--  1 rodrigdi  staff   512M Oct 10 13:53 512MB-file.txt
-rw-r--r--  1 rodrigdi  staff   1.0G Oct 10 13:55 1GB-file.txt
-rw-r--r--  1 rodrigdi  staff   2.0G Oct 10 13:59 2GB-file.txt
$ python uploader.py [256MB|512MB|1024MB|2048MB]-file.txt \
                      -t [request.stream|request.files] \
                      --profile
Uploading 512MB-file.txt using request.stream ...
File 512MB-file.txt uploaded using request.stream.
$ # Profiling output is written to ./benchmark
$ ls ./benchmark/
application-octet-stream_2147483648bytes.txt
application-octet-stream_1073741824-bytes.txt
application-octet-stream_536870912-bytes.txt
application-octet-stream_268435456-bytes.txt
multipart-form-data_1073741972-bytes.txt
multipart-form-data_536871062-bytes.txt
multipart-form-data_268435606-bytes.txt

Passing file between Flask applications

Start two applications (you will need to terminal sessions):

$ FLASK_ENV=development FLASK_APP=app flask run -p 5001
...
$ FLASK_ENV=development FLASK_APP=app flask run -p 5000
...

In a another session stream upload a big file:

$ ipython
with open('2GB-file.txt', 'rb') as f:
    requests.post('http://localhost:5000/stream-pass-to-next',
                  headers={
                      'Content-Type': 'application/octet-stream'},
                  params={'profile': True},
                  data=f)

Closer to production scenario: using Docker and uwsgi

Build the server side application and run it in a different terminal session:

$ docker build . -t big-file-upload
$ docker run -ti --rm -p 5000:5000 \
             -v `pwd`/docker-benchmark:/code/benchmark \
             big-file-upload
...

Upload the file from your host machine:

$ python uploader.py 2048MB-file.txt -t request.stream --profile
Uploading 2048MB-file.txt using request.stream ...
File 2048MB-file.txt uploaded using request.stream.
$ python uploader.py 2048MB-file.txt -t request.files --profile
Uploading 2048MB-file.txt using request.files ...
File 2048MB-file.txt uploaded using request.files.

Conclusions

Disclaimer: All this tests do not take into account network overhead.

File size Method Deployment Time
256 MB command dd bash 6.280 seconds
256 MB request.stream Flask 5.196 seconds
256 MB request.files Flask 20.431 seconds
256 MB request.stream uwsgi + Docker 77.104 seconds
256 MB request.files uwsgi + Docker 457.781 seconds
512 MB command dd bash 12.307 seconds
512 MB request.stream Flask 10.323 seconds
512 MB request.files Flask 41.110 seconds
1 GB command dd bash 24.582 seconds
1 GB request.stream Flask 21.085 seconds
1 GB request.files Flask 84.105 seconds
2 GB command dd bash 49.426 seconds
2 GB request.stream Flask 41.558 seconds
2 GB request.files Flask crashed

Now, choosing request.stream, let's compare differences between:

  • 1 Hop: sending the file to a Flask app and writing it directly to disk
  • 2 Hops: sending the file to a Flask app, which streams it to another Flask app and writes it to disk
File size Hops Time
256MB 1 5.196 seconds
256MB 2 1.611 seconds
512MB 1 10.323 seconds
512MB 2 3.237 seconds
1GB 1 21.085 seconds
1GB 2 9.472 seconds
2GB 1 41.558 seconds
2GB 2 20.078 seconds
  • The request.files upload type is taking 4 times the time request.stream takes

  • We can also observe that in the case of request.files for a 1GB file most of the time is spent in

            2    9.804    4.902   78.880   39.440 /Users/rodrigdi/.virtualenvs/big-file-upload-server/lib/python3.6/site-packages/werkzeug/formparser.py:531(parse_parts)
    

    This is because werkzeug writes to a tempfile any file bigger that 500KB.

  • There is clearly something that needs to be improved regarding uwsgi configuration.

  • Looks like the 2 hops setup is roughly 2x faster. Since this is suspicious, you can check the file integrity (original file versus uploaded file) like follows:

    $ bash check-files-integrity.sh
    Removing existing files and downloads ...
    Re-creating new files of sizes: 256MB 512MB 1024MB 2048MB
    256+0 records in
    256+0 records out
    268435456 bytes transferred in 6.280994 secs (42737735 bytes/sec)
    Original file 256MB-file.txt created.
    512+0 records in
    512+0 records out
    536870912 bytes transferred in 12.307863 secs (43620157 bytes/sec)
    Original file 512MB-file.txt created.
    1024+0 records in
    1024+0 records out
    1073741824 bytes transferred in 24.582782 secs (43678613 bytes/sec)
    Original file 1024MB-file.txt created.
    2048+0 records in
    2048+0 records out
    2147483648 bytes transferred in 49.426168 secs (43448314 bytes/sec)
    Original file 2048MB-file.txt created.
    Uploading 256MB file ...
    Uploading 512MB file ...
    Uploading 1024MB file ...
    Uploading 2048MB file ...
    md5(downloads/256MB_0de0d949-6acd-46d2-b02c-aaf86112ba37) -> 76b422e5e525096c2e9c3844e6ae49c0
    md5(256MB-file.txt) -> 76b422e5e525096c2e9c3844e6ae49c0
    256MB - 0
    md5(downloads/512MB_3d6d1acf-6e0c-4799-ad42-9f04f2bde831) -> cf47f19ab9d2e7073741aa4d474e58c2
    md5(512MB-file.txt) -> cf47f19ab9d2e7073741aa4d474e58c2
    512MB - 0
    md5(downloads/1024MB_4242840a-7289-442b-bb55-978a261b9ba1) -> d3d36b9aba3bbd812955983159deb2ed
    md5(1024MB-file.txt) -> d3d36b9aba3bbd812955983159deb2ed
    1024MB - 0
    md5(downloads/2048MB_207b0f69-5971-493e-b52a-0bcc9b076393) -> 2d16f591080ba0fb397743ef8cdbb399
    md5(2048MB-file.txt) -> 2d16f591080ba0fb397743ef8cdbb399
    2048MB - 0

Next

On the client side: Use streaming upload with Requests as stated in Requests docs with application/octet-stream content type:

with open(big_file, 'r') as f:
    request.post('http://localhost/upload-endpoint', data=f,
                 headers={'Content-Type': 'application/octet-stream'})

On the server side: Use request.stream object:

with open(path_to_new_file, 'w') as f:
    f.writelines(request.stream)

client-server-flask-big-file-uploads's People

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

client-server-flask-big-file-uploads's Issues

decode to get origin file uploaded

I thank you for this amazing project. Now i facing with file uploaded is big image (by upload_request_stream func), and when i have file binary stream save in downloads folder, how to decode to get origin image uploaded from binary stream
I appreciate your help. Thanks in advance.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.