
Comments (6)

robertmeta commented on July 21, 2024

Does your Python implementation also download them one by one like this? Are you certain "boto" isn't doing any work behind the scenes to download files concurrently? At the very least, I would make the Go version concurrent, as Amazon recommends: "When executing many requests from a single client, use multi-threading to enable concurrent request execution." ( https://aws.amazon.com/articles/1904 )

from goamz.

grotos commented on July 21, 2024

Yes, the Python implementation is quite similar: I iterate through the list of keys.

I was thinking about multi-threading, but I am surprised that the simple approach is so much slower than the Python code.


robertmeta commented on July 21, 2024

You could profile it; profiling in Go is exceptionally simple. I assume your CPUs are not pegged while doing this, so it isn't a CPU load issue. My guess is that it will turn out to be something simple that shows up as wait time (wall-clock time) rather than CPU time. .Get reads everything into memory, and then you copy everything again into the file. Moving small groups of bytes from the network straight to the file would make a lot more sense.

What call are you using on the Python side? Something like ..._download_to_file..., where it might do smarter internal buffering?

It would be interesting to see your profile output, as I am curious which part you are getting stuck on. That said, S3 is built for concurrency and so is Go. I would run all of those downloads concurrently across Cores * 2 threads... the naive "do it all at once" approach will probably scale well up to thousands or tens of thousands of connections.


grotos commented on July 21, 2024

I did profiling, here is the result: https://gist.github.com/grotos/f001cd2149630067426f

Stats: 163 files / 1.6MB / 25.5469062s ("manual", simple fmt.Println timing)

Why does timing the program via console output show that the code took 25s, whereas the profile shows only 1.8s?


glycerine commented on July 21, 2024

The profile seems to say that most time is being spent doing crypto stuff.

If Python is using OpenSSL for crypto, that would explain why it is 3x faster than Go. The default Go crypto libraries don't use Gueron and Krasnov's P256 assembly routines for speed the way OpenSSL does. However, those performance improvements have been ported to Go and are available as a library (although not yet in the standard library, because Intel has refused to re-license them in a form acceptable to the Go team). You can get the fast Go crypto libraries here:

https://github.com/glycerine/fast-elliptic-curve-p256

and read more about the improvements (or use the CloudFlare Go release linked there) here:

https://blog.cloudflare.com/go-crypto-bridging-the-performance-gap/

Let me know if that solves your performance problem.


ianmacartney commented on July 21, 2024

You can see big performance improvements by reusing connections. It looks like you spend a lot of time in handshaking, and it looks like #117 can fix that.

