
Comments (8)

mtrmac avatar mtrmac commented on July 21, 2024 1

If I understand correctly, the idea is that a layer accounting for, say, 50% of the total image size should be started first, so that all the other layers can be pulled in parallel with it. The total time is then about the same as the time to pull the big layer, instead of first spending time copying 10 smaller layers and starting the big layer only after most of that is done.

That only makes a difference for moderately unbalanced images, where the largest layer is probably > 1/6 of the total image size, but not something like 99%.
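
To make the effect concrete, here is a rough simulation, not c/image code. It assumes a fixed number of parallel download slots (6 is an assumption chosen to match the "1/6" figure above), download time proportional to layer size, and a hypothetical image of ten small layers followed by one large one.

```python
import heapq

def makespan(layer_sizes, workers):
    """Finish time when layers are started in list order on `workers` parallel slots."""
    free_at = [0] * workers  # time at which each download slot becomes free
    heapq.heapify(free_at)
    finish = 0
    for size in layer_sizes:
        start = heapq.heappop(free_at)  # the earliest free slot takes the next layer
        end = start + size              # assumption: time is proportional to size
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish

layers = [10] * 10 + [100]  # hypothetical image: ten small layers, big one last
print(makespan(layers, workers=6))                        # 110
print(makespan(sorted(layers, reverse=True), workers=6))  # 100
```

In this toy case, starting the largest layer first brings the total down to the time of the big layer alone. If the big layer were already first in the list, both orders would give the same result.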


I think it’s an interesting optimization worth exploring. We can’t/shouldn’t do that for c/storage, and due to compatibility we’d need an opt-in anyway, but that’s not too bad.

We might want to think about the UI impact — e.g. should we list the progress bars in the original image order, to show the user what’s going on? Currently we create the progress bars only when we start pulling, in order. That might end up being the most complex part of the feature.

from image.

vrothberg avatar vrothberg commented on July 21, 2024

Thanks for reaching out.

Can you elaborate on why the order matters?

As for pulls: the order must be preserved as the layers must be applied to the local storage in the exact order.


mirekphd avatar mirekphd commented on July 21, 2024

A sufficient imbalance is practically guaranteed in many containerized Python applications, for example, which have to be based on Ubuntu, so the base image layer is much larger than the application layers (all the way up to the NVIDIA CUDA images with their astoundingly heavy 3.5... GB base images). The problem is that such unbalanced images are likely pre-sorted already, because the base layer comes first, so sorting by size might not make much of a difference in practice.

On the other hand, the forking has to be done anyway, and altering its sequence does not add any extra overhead. So unless there is some noticeable overhead in gathering the layer sizes and sorting them, or in accessing server-side layers "out of order", this new method should always outperform the current one, however small or unnoticeable the gain (and the performance gains should be double, because they should also be achievable during the push phase). I suspect the main reason this has not been done already is the way the legacy system from which skopeo inherited operates. The docker pull, however, has a very different use case: to run the container after the pull is complete, rather than to immediately push it somewhere else.


mtrmac avatar mtrmac commented on July 21, 2024

The way c/storage is set up, pulls must create layers from base to the last child, in order (they have parent links).

Now, whether that’s a 100% hard requirement, where we just can’t create the child before the parent, or more of an implementation choice, depends on the graph driver (it’s 100% hard for device-mapper-snapshots, and it might be a choice for overlay, but I’m not quite sure). Even if it were 100% an implementation choice, that would be a pretty large implementation effort (we would need to have a concept of an extracted diff that is not yet a layer, a mechanism to turn that into a layer quickly, and a cleanup mechanism to delete that extracted diff on unexpected aborts).
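
The ordering constraint can be illustrated with a small sketch. This is not c/storage code: `commit_in_order`, the layer indices, and the sizes are hypothetical. It assumes downloads may finish in any order, while a layer can only be committed once its parent (the previous index) has been committed, so finished downloads wait in a staging area until then.

```python
def commit_in_order(completion_order, sizes):
    """Return (commit order, peak staged bytes) for downloads finishing out of order.

    Layer 0 is the base; layer i can only be committed after layer i - 1.
    """
    staged = {}          # downloaded but not yet committed: index -> size
    next_to_commit = 0   # lowest layer index that has not been committed yet
    committed = []
    peak_staged = 0
    for idx in completion_order:
        staged[idx] = sizes[idx]
        peak_staged = max(peak_staged, sum(staged.values()))
        # Commit every layer whose parent chain is now complete.
        while next_to_commit in staged:
            del staged[next_to_commit]
            committed.append(next_to_commit)
            next_to_commit += 1
    return committed, peak_staged

sizes = [100, 10, 10, 10]  # hypothetical image with a large base layer
print(commit_in_order([0, 1, 2, 3], sizes))  # ([0, 1, 2, 3], 100)
print(commit_in_order([1, 2, 3, 0], sizes))  # ([0, 1, 2, 3], 130)
```

The commit order is the same in both cases; what changes is how much data sits in the staging area while children wait for their parent.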


For direct registry-to-registry copies, this should be quite easy to do; the progress UI is the hardest part, the rest is just mechanical work. (But note that such copies are not pulls+pushes with a disk intermediary; they are direct streaming copies, so there are no “double” gains.)

For pushes, I think it’s the same as registry-to-registry copies, but there’s a small chance I’m missing something.


github-actions avatar github-actions commented on July 21, 2024

A friendly reminder that this issue had no activity for 30 days.


rhatdan avatar rhatdan commented on July 21, 2024

You would also take up more temporary space, as the blobs would exist on disk for a longer period of time. Currently, once a blob is completely downloaded, that layer is applied to storage and the downloaded blob is removed.

But if this is a minor change, I think we should do it.


github-actions avatar github-actions commented on July 21, 2024

A friendly reminder that this issue had no activity for 30 days.


mtrmac avatar mtrmac commented on July 21, 2024

Moving to c/image; this would be transparent to Skopeo itself.

