
Comments (7)

cyphar commented on July 20, 2024

Ah, sorry, this is actually caused by the automatic compression/decompression done during copying. It looks like ShouldCompressLayers isn't being handled properly, so a copy from docker-daemon -> docker-archive works, but docker -> docker-archive doesn't.


cyphar commented on July 20, 2024

Ah, it's because copy/copy.go doesn't actually appear to decompress layers if the target layer doesn't accept compression. Which is a bit ... odd.


mtrmac commented on July 20, 2024

So, to be explicit, am I correct that the issue is:

  • skopeo copy docker://… docker-archive:$x creates a valid (i.e. consumable by docker load) tarball, which contains compressed layers and DiffIDs of uncompressed data [but no record of the digests of the compressed data]
  • skopeo copy docker-archive:$x … then fails, because it uses the DiffIDs to identify layers but the actual files are compressed

?

Yeah, that’s pretty ugly. (For the record, it is not an issue for docker-daemon:, because the daemon transparently unpacks the compressed layers and verifies DiffIDs on docker load, and on docker save it always creates uncompressed layers.)

Now, whether we should be creating and/or accepting tarballs which contain compressed layers is really a matter of definition; AFAIK the tarball format is not formally documented, so strictly speaking we can do anything, e.g. one (or more) of the following:

  1. Always uncompress layers in archiveImageDestination so that the DiffID matches in the tarball. May waste space.
  2. Always uncompress layers in archiveImageSource so that the DiffID matches during verification. Fairly likely wastes time.
  3. Teach the copy.go digest verification code about some digests being of the compressed data / of the uncompressed data (we already have ugly code to compute DiffID values for compressed tarballs).
  4. Teach copy.go that some sources have expected digest verification failures. (That’s pretty risky and ugly.)

I guess my weak preference would be 2. or 3., because that makes any tarball acceptable to docker load also acceptable to docker-archive:. OTOH 1. is probably a bit easier to implement.


mtrmac commented on July 20, 2024

A particular concern, of course, is signatures: whether we verify, what we verify, and who decides what we verify, all matters. Right now, in the case of the docker-daemon: format, this is academic because the format can’t record or provide signatures, but we do intend to integrate that $somehow fairly soon. I’m afraid right now I don’t have a completely clear idea what that would look like and what that would mean for the whole problem space. (It does seem attractive to be able to authenticate the config.json from the signature $somehow, and then use the DiffID values from there to authenticate layers; rebuilding compressed tarballs to match expected digests of compressed data is pretty much a non-starter, see #157 .)


mtrmac commented on July 20, 2024

Thinking about this a bit more, the abstraction we impose on an ImageSource forces us to create an artificial manifest for GetManifest; and that manifest must refer to blobs using the same digests which we return in PutBlob. So, unless we invent a new manifest schema, options 3 and 4 are not viable.

(OTOH we also now have option 5: rework the abstraction, perhaps to expose the tar manifest.json as a new kind of manifest supported by containers/image/image and using the UpdatedImage mechanism for manifest conversion from/to schema2 and others. But that is a manifest of the tar file which may contain several images, it is a poor fit for an “image manifest”, and defining a single manifestItem to be the tar image manifest would be inventing a completely new format.)

So right now I am leaning towards either 2 (silently uncompress blobs in archiveImageSource.GetBlob) or 2a (when collecting the data for archiveImageSource.GetManifest, use the available DiffID if we detect the blob as uncompressed, and compute a new digest of the compressed data if the blob is compressed or unrecognizable).


cyphar commented on July 20, 2024

@mtrmac Alright I'm back. Sorry for the extended silence.

I would prefer that we just decompress blobs when we're an ImageSource -- which means that there's no messing around with manifests or other structures -- we are just providing a different blob to the one we were originally given.

Handling signatures is a bit of an issue, but ultimately, because of the hacks done with DiffIDs, we will have to modify something. And that something will always be either the manifest or the actual blobs. I'd prefer that we maintain the invariant that layer identifiers are content-addressable...

Currently I'm reworking #193 to make my current solution better.


mtrmac commented on July 20, 2024

I would prefer that we just decompress blobs when we're an ImageSource

ACK.

