Comments (7)

shuaichang avatar shuaichang commented on September 6, 2024

One feasible solution is to verify the blob with CRC32. I tried a 4.5 GB blob and verification took 6 s, which is not bad. When we do parallel and range reads, we could also compute the checksum in parallel to make it faster.

The difficulty is that once the URL is presigned, passing an extra header on the GET to the presigned URL results in access denied (403). Could the checksum (CRC32 or another algorithm) be part of the annotations, e.g. containerd.io/snapshot/overlaybd/blob-checksum-crc32:{value}? During conversion we could add a flag to opt in to a checksum algorithm; if the flag is not passed, the image spec will not contain the annotation.

This is just one idea, and I'm definitely open to alternatives.

         "annotations": {
            "containerd.io/snapshot/overlaybd/blob-digest": "sha256:6452dbeda269615d2eadec32a783d863d7600cd4c9cbdd7cebfcc4b9ee61f1e7",
            "containerd.io/snapshot/overlaybd/blob-fs-type": "ext4",
            "containerd.io/snapshot/overlaybd/blob-size": "4734097920"
         }
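
A minimal Python sketch of the check proposed above, assuming the blob is already on local disk. The annotation key is the one suggested in this comment and is not part of any released image spec; encoding the value as a hex string is also an assumption, as are the helper names `file_crc32` and `verify_blob`.

```python
import zlib

# Hypothetical annotation key from the proposal above (not an existing spec field).
CRC_ANNOTATION = "containerd.io/snapshot/overlaybd/blob-checksum-crc32"


def file_crc32(path, chunk_size=4 * 1024 * 1024):
    """Stream the file through zlib.crc32 so a multi-GB blob is never held in memory."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def verify_blob(path, annotations):
    """Return True/False for a match, or None when the opt-in annotation is absent."""
    expected = annotations.get(CRC_ANNOTATION)
    if expected is None:
        return None
    # Assumption: the annotation value is the CRC32 written as hexadecimal.
    return file_crc32(path) == int(expected, 16)
```

Because the expected value travels in the manifest rather than in a request header, this check would not interfere with the presigned GET.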

from accelerated-container-image.

liulanzheng avatar liulanzheng commented on September 6, 2024

The best way is to use the download cache type: you can download the blob and write it into the corresponding sparse file. When a chunk is read on demand, overlaybd asks the kernel whether that chunk of the sparse file is a hole or data. If it is a hole, overlaybd reads from the remote and fills the chunk; otherwise, overlaybd reads the chunk directly from the sparse file. The zfile format contains CRC checksums, so if a mismatched block is loaded, zfile evicts the corresponding chunk from the sparse file and refills the data.
If the file cache type is used, you must be careful about GC. I don't know what will happen when data is written from outside during GC.
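
To illustrate the hole-or-data question described above, here is a small Python sketch that uses `lseek` with `SEEK_DATA` and `SEEK_HOLE`. It is not overlaybd's actual code; the function name `chunk_is_data` is made up for illustration, and it assumes a platform and filesystem that support these seek modes (on filesystems without hole support, everything is reported as data).

```python
import errno
import os


def chunk_is_data(fd, offset, length):
    """Return True if [offset, offset + length) is fully backed by data.

    Note: os.lseek moves the file position of fd as a side effect.
    """
    try:
        # SEEK_DATA returns the first offset >= `offset` that holds data.
        if os.lseek(fd, offset, os.SEEK_DATA) != offset:
            return False  # the chunk starts inside a hole
    except OSError as e:
        if e.errno == errno.ENXIO:
            return False  # no data at or after `offset`
        raise
    # SEEK_HOLE returns the next hole (or end of file) at or after `offset`.
    return os.lseek(fd, offset, os.SEEK_HOLE) >= offset + length
```

A reader that gets `False` would fetch the chunk remotely and write it into the sparse file; a reader that gets `True` would read it locally.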

shuaichang avatar shuaichang commented on September 6, 2024

@liulanzheng @BigVan do you think upstream could provide a CRC validation CLI for blob integrity checks? We want to download the blobs in parallel on the client side, check their integrity, and then move each blob to the snapshot dir as the overlaybd.commit file.

Happy to contribute the client-side parallel download code if that's something the overlaybd community is interested in.
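
The flow described here (parallel range download, integrity check, then move into place) could look roughly like the Python sketch below. Everything in it is an assumption for illustration: `fetch_range` is a hypothetical callable that returns the bytes of a half-open range (for example, a wrapper around a ranged GET on the presigned URL), the expected CRC32 is assumed to be known in advance, and the function names are invented.

```python
import os
import zlib
from concurrent.futures import ThreadPoolExecutor


def parallel_fetch(fetch_range, size, tmp_path, part_size=8 * 1024 * 1024, workers=8):
    """Download [0, size) in parts; fetch_range(start, end) -> bytes is hypothetical."""
    fd = os.open(tmp_path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o600)
    try:
        os.ftruncate(fd, size)

        def work(start):
            end = min(start + part_size, size)
            data = memoryview(fetch_range(start, end))
            if len(data) != end - start:
                raise IOError(f"short read for range {start}-{end}")
            written = 0
            while written < len(data):  # pwrite may write fewer bytes than asked
                written += os.pwrite(fd, data[written:], start + written)

        with ThreadPoolExecutor(max_workers=workers) as pool:
            list(pool.map(work, range(0, size, part_size)))  # re-raises worker errors
    finally:
        os.close(fd)


def crc32_of(path, chunk_size=4 * 1024 * 1024):
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def download_verify_commit(fetch_range, size, expected_crc, tmp_path, final_path, **kw):
    """Fetch to tmp_path, verify CRC32, then atomically rename to final_path."""
    parallel_fetch(fetch_range, size, tmp_path, **kw)
    if crc32_of(tmp_path) != expected_crc:
        os.remove(tmp_path)
        raise ValueError("CRC32 mismatch, blob discarded")
    os.replace(tmp_path, final_path)  # atomic when both paths share a filesystem
```

Writing to a temporary path and renaming only after verification means a corrupt or partial blob never appears under the final name.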

simha-db avatar simha-db commented on September 6, 2024

Does overlaybd perform the validation done by this tool before consuming the contents of the registry cache?

simha-db avatar simha-db commented on September 6, 2024

Does this work on the layer file, or do I need to untar it first? When I ran it on the downloaded layer file (the sha256: file), it said invalid file. If I untar it, it works fine on the overlaybd.commit file.

I tried

overlaybd-zfile --verify -t -x layerfile

as well, but I got an error:

format error! <source_file> should be a zfile.
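
Until the verify option handles the tar wrapper, one workaround is to pull only the overlaybd.commit member out of the layer tar and run the tool on that. The Python sketch below assumes the member is named overlaybd.commit, as reported in this comment; the function name `extract_commit` is invented for illustration.

```python
import os
import shutil
import tarfile


def extract_commit(layer_path, out_path, member_name="overlaybd.commit"):
    """Copy the named member out of the layer tar; return its size in bytes."""
    with tarfile.open(layer_path, "r:*") as tar:  # "r:*" auto-detects compression
        for member in tar:
            if member.isfile() and os.path.basename(member.name) == member_name:
                src = tar.extractfile(member)
                with open(out_path, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                return member.size
    raise FileNotFoundError(f"{member_name} not found in {layer_path}")
```

The extracted file can then be passed to the verification tool in place of the raw layer file.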

BigVan avatar BigVan commented on September 6, 2024

The ZFile '--verify' option should skip the tar header, but it seems I forgot to handle that.

I will fix it.

simha-db avatar simha-db commented on September 6, 2024

Thanks. Is this check already done before reading registry_cache layer files? And if it fails, is the data evicted?
