
go-containerregistry's Introduction

go-containerregistry


Introduction

This is a golang library for working with container registries. It's largely based on the Python library of the same name.

The following diagram shows the main types that this library handles:

[Diagram: OCI image representation]

Philosophy

The overarching design philosophy of this library is to define interfaces that present an immutable view of resources (e.g. Image, Layer, ImageIndex), which can be backed by a variety of media (e.g. registry, tarball, daemon, ...).

To complement these immutable views, we support functional mutations that produce new immutable views of the resulting resource (e.g. mutate). The end goal is to provide a set of versatile primitives that can compose to do extraordinarily powerful things efficiently and easily.

Both the resource views and mutations may be lazy, eager, memoizing, etc, and most are optimized for common paths based on the tooling we have seen in the wild (e.g. writing new images from disk to the registry as a compressed tarball).

Experiments

Over time, we will add new functionality under experimental environment variables listed here.

Env Var: GGCR_EXPERIMENT_ESTARGZ
Value(s): "1"
What it does: ⚠️DEPRECATED⚠️: When enabled, this experiment directs tarball.LayerFromOpener to emit estargz-compatible layers, which can then be lazily loaded by an appropriately configured containerd.

v1.Image

Sources

Sinks

v1.ImageIndex

Sources

Sinks

v1.Layer

Sources

Sinks

Overview

mutate

The simplest use for these libraries is to read from one source and write to another.

For example,

  • crane pull is remote.Image -> tarball.Write (sketched in Go after this list),
  • crane push is tarball.Image -> remote.Write,
  • crane cp is remote.Image -> remote.Write.
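
In code, the first of those paths looks roughly like this. This is a sketch, not the crane implementation itself; it uses the pkg/... import paths of current releases, and the exact option and WriteToFile signatures have shifted between versions.

package main

import (
	"log"

	"github.com/google/go-containerregistry/pkg/authn"
	"github.com/google/go-containerregistry/pkg/name"
	"github.com/google/go-containerregistry/pkg/v1/remote"
	"github.com/google/go-containerregistry/pkg/v1/tarball"
)

func main() {
	// Parse the source reference (a tag or digest).
	ref, err := name.ParseReference("ubuntu")
	if err != nil {
		log.Fatal(err)
	}

	// Fetch the image from the registry, using local credentials if any.
	img, err := remote.Image(ref, remote.WithAuthFromKeychain(authn.DefaultKeychain))
	if err != nil {
		log.Fatal(err)
	}

	// Write it to disk as a docker-loadable tarball.
	if err := tarball.WriteToFile("ubuntu.tar", ref, img); err != nil {
		log.Fatal(err)
	}
}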

However, often you actually want to change something about an image. This is the purpose of the mutate package, which exposes some commonly useful things to change about an image.
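
A sketch of what that can look like, appending a layer from a local tarball and changing the entrypoint. The file name is hypothetical and exact signatures vary slightly by release.

package example

import (
	v1 "github.com/google/go-containerregistry/pkg/v1"
	"github.com/google/go-containerregistry/pkg/v1/mutate"
	"github.com/google/go-containerregistry/pkg/v1/tarball"
)

// addLayerAndEntrypoint appends a layer tarball to a base image and sets a
// new entrypoint, returning a new immutable v1.Image.
func addLayerAndEntrypoint(base v1.Image) (v1.Image, error) {
	// "extra-files.tar.gz" is a hypothetical layer tarball on disk.
	layer, err := tarball.LayerFromFile("extra-files.tar.gz")
	if err != nil {
		return nil, err
	}
	img, err := mutate.AppendLayers(base, layer)
	if err != nil {
		return nil, err
	}
	cfg, err := img.ConfigFile()
	if err != nil {
		return nil, err
	}
	cfg = cfg.DeepCopy()
	cfg.Config.Entrypoint = []string{"/app"}
	return mutate.ConfigFile(img, cfg)
}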

partial

If you're trying to use this library with a different source or sink than it already supports, it can be somewhat cumbersome. The Image and Layer interfaces are pretty wide, with a lot of redundant information. This is somewhat by design, because we want to expose this information as efficiently as possible where we can, but again it is a pain to implement yourself.

The purpose of the partial package is to make implementing a v1.Image much easier, by filling in all the derived accessors for you if you implement a minimal subset of v1.Image.

transport

You might think our abstractions are bad and you just want to authenticate and send requests to a registry.

This is the purpose of the transport and authn packages.
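
A sketch of using those two packages directly. The repository name is hypothetical, and this assumes the transport.NewWithContext helper available in recent releases.

package main

import (
	"context"
	"fmt"
	"log"
	"net/http"

	"github.com/google/go-containerregistry/pkg/authn"
	"github.com/google/go-containerregistry/pkg/name"
	"github.com/google/go-containerregistry/pkg/v1/remote/transport"
)

func main() {
	// A hypothetical repository; replace with your own.
	repo, err := name.NewRepository("gcr.io/my-project/my-image")
	if err != nil {
		log.Fatal(err)
	}

	// Resolve credentials from the default keychain (docker config, etc.).
	auth, err := authn.DefaultKeychain.Resolve(repo)
	if err != nil {
		log.Fatal(err)
	}

	// Build an authenticated RoundTripper scoped for pulls from this repository.
	rt, err := transport.NewWithContext(context.Background(), repo.Registry, auth, http.DefaultTransport,
		[]string{repo.Scope(transport.PullScope)})
	if err != nil {
		log.Fatal(err)
	}

	// Use it like any other http.RoundTripper against the registry API.
	client := &http.Client{Transport: rt}
	url := fmt.Sprintf("https://%s/v2/%s/tags/list", repo.RegistryStr(), repo.RepositoryStr())
	resp, err := client.Get(url)
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}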

Tools

This repo hosts some tools built on top of the library.

crane

crane is a tool for interacting with remote images and registries.
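
The same verbs are also exposed as a Go package (pkg/crane). A small sketch; the destination repository below is hypothetical.

package main

import (
	"fmt"
	"log"

	"github.com/google/go-containerregistry/pkg/crane"
)

func main() {
	// Resolve a tag to its digest.
	digest, err := crane.Digest("ubuntu")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(digest)

	// Copy an image to another registry (destination is hypothetical).
	if err := crane.Copy("ubuntu", "registry.example.com/mirror/ubuntu"); err != nil {
		log.Fatal(err)
	}
}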

gcrane

gcrane is a GCR-specific variant of crane that has richer output for the ls subcommand and some basic garbage collection support.

krane

krane is a drop-in replacement for crane that supports common Kubernetes-based workload identity mechanisms using k8schain as a fallback to traditional authentication mechanisms.

k8schain

k8schain implements the authentication semantics used by kubelets in a way that is easily consumable by this library.

k8schain is not a standalone tool, but it is linked here for visibility.
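
A sketch of consuming it from this library: build a keychain from in-cluster credentials and pass it to remote. The image reference is hypothetical and the k8schain Options are left at their defaults.

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/google/go-containerregistry/pkg/authn/k8schain"
	"github.com/google/go-containerregistry/pkg/name"
	"github.com/google/go-containerregistry/pkg/v1/remote"
)

func main() {
	ctx := context.Background()

	// Build a keychain from in-cluster credentials (image pull secrets,
	// node credential helpers, etc.).
	kc, err := k8schain.NewInCluster(ctx, k8schain.Options{})
	if err != nil {
		log.Fatal(err)
	}

	// A hypothetical private image.
	ref, err := name.ParseReference("registry.example.com/private/app:latest")
	if err != nil {
		log.Fatal(err)
	}

	img, err := remote.Image(ref, remote.WithAuthFromKeychain(kc), remote.WithContext(ctx))
	if err != nil {
		log.Fatal(err)
	}
	d, err := img.Digest()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(d)
}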

Emeritus: ko

This tool was originally developed in this repo but has since been moved to its own repo.


go-containerregistry's Issues

Rename d8s

d8s is a placeholder name, we should find a better name before this one sticks.

imgctl and regctl seem to be the current leaders. Your ideas welcome.

d8s pull && d8s push doesn't work

$ bazel run cmd/d8s:d8s -- pull ubuntu $PWD/ubuntu.tar.gz
2018/04/23 12:50:07 Pulling index.docker.io/library/ubuntu:latest
$ ls -lh ubuntu.tar.gz 
-rw-r--r-- 1 redacted redacted 42M Apr 23 12:50 ubuntu.tar.gz
$ bazel run cmd/d8s:d8s -- push $PWD/ubuntu.tar.gz gcr.io/some-project/ubuntu:d8s
2018/04/23 12:52:07 Pushing gcr.io/some-project/ubuntu:d8s
2018/04/23 12:52:10 mounted sha256:817da545be2ba4bac8f6b4da584bca0fb4844938ecc462b9feab1001b5df7405
2018/04/23 12:52:10 mounted sha256:a9b30c108bda615dc10e402f62d712f413214ea92c7ec4354cd1cc0f3450bc58
2018/04/23 12:52:10 mounted sha256:67de21feec183fcd009a5eddc4de8c346ee0f4369a20047f1a302a90716fc741
2018/04/23 12:52:10 mounted sha256:d967c497ce230b63996a7b1fc6ec95b741aea9348118d3328c676f13be789fa7
2018/04/23 12:52:10 mounted sha256:d3938036b19cfa369e1081a6776b07b54be9612bc4c8fed7f139370c8142b79f
2018/04/23 12:52:10 blob sha256:739f09eca6e43d4e99d63fa442d2daeefc3fc1dc46b13ec5a1414de74a2b7c86 not found

This happens for images:

  • pulled from gcr.io/aaa/foo and pushed to gcr.io/bbb/foo, and
  • pulled from gcr.io/aaa/foo and pushed to gcr.io/aaa/bar

...so it doesn't seem to be correlated to cross-registry or cross-repository pushes.

crane append: cross repo mount causes "UNAUTHORIZED" error on Dockerhub

I've identified a potential bug with crane append on both macOS 10.13.4 and Linux (docker4mac).

Crane version: 48dcc10b3aad9a760316aba5c88138bcc60dbcd1
Go version (mac): go version go1.10 darwin/amd64
Go version (linux): go version go1.10 darwin/amd64 (GOOS=linux)

When the source is a Dockerhub repo and the destination is a different Dockerhub repo, I get the following error:

 opal:test stephen$ crane append packs/cflinuxfs2:run sclevine/some-app:latest ./layer.tgz
2018/05/06 16:58:34 UNAUTHORIZED: "authentication required"

I get the same error if I have write access to both repos:

 opal:test stephen$ crane append sclevine/cflinuxfs2:run sclevine/some-app:latest ./layer.tgz
2018/05/06 16:59:45 UNAUTHORIZED: "authentication required"

Using the same repo for both the image source and image destination succeeds.

The authorization error is coming from this call:

if err := remote.Write(dstTag, image, dstAuth, http.DefaultTransport, opts); err != nil {

I was able to fix this by removing these lines (which create a cross-repo mount):

if srcRef.Context().RegistryStr() == dstTag.Context().RegistryStr() {
	opts.MountPaths = append(opts.MountPaths, srcRef.Context())
}

This is very slow, presumably because the base layers are copied to the host and back to the registry.

Consider adding a utility to validate Images

This could be used in testing or by derived tools to validate the constraints imposed by Docker on an image. It could cover things like (the last of these is sketched in code after the list):

  • relationship between config file history and layers
  • layer sizes in the manifest match actual layer sizes
  • config digest in the manifest matches the actual config digest

etc.
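
A sketch of that last check, expressed against the v1.Image interface using the modern pkg/v1 import path. The helper name is ours, not an existing API.

package example

import (
	"bytes"
	"fmt"

	v1 "github.com/google/go-containerregistry/pkg/v1"
)

// checkConfigDigest verifies that the config digest recorded in the manifest
// matches the digest of the raw config blob.
func checkConfigDigest(img v1.Image) error {
	m, err := img.Manifest()
	if err != nil {
		return err
	}
	raw, err := img.RawConfigFile()
	if err != nil {
		return err
	}
	got, _, err := v1.SHA256(bytes.NewReader(raw))
	if err != nil {
		return err
	}
	if got != m.Config.Digest {
		return fmt.Errorf("config digest mismatch: manifest says %v, computed %v", m.Config.Digest, got)
	}
	return nil
}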

Docker can't parse our created times

Not sure where exactly the issue is, but Docker can't parse an empty creation time. Trying to docker load a saved tarball gives:

parsing time "" as "2006-01-02T15:04:05Z07:00": cannot parse "" as "2006"

Implement a way to create an image from scratch.

Thoughts on the name: parts.Image?

The idea would be a constructor that takes an ordered list of layers and a config file, and assembles them into an Image. This would be useful for modifying the config of an existing image, modifying layers, or other mutation operations.
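
For reference, this is roughly how such a constructor became expressible later with the empty and mutate packages. A sketch under that assumption, not the parts.Image constructor proposed here.

package example

import (
	v1 "github.com/google/go-containerregistry/pkg/v1"
	"github.com/google/go-containerregistry/pkg/v1/empty"
	"github.com/google/go-containerregistry/pkg/v1/mutate"
)

// imageFromScratch assembles an image from an ordered list of layers and a
// runtime config, starting from the empty image.
func imageFromScratch(layers []v1.Layer, cfg v1.Config) (v1.Image, error) {
	// AppendLayers keeps the config file's rootfs.diff_ids in sync with the
	// appended layers.
	img, err := mutate.AppendLayers(empty.Image, layers...)
	if err != nil {
		return nil, err
	}
	// Set the runtime config (entrypoint, env, ...) on the result.
	return mutate.Config(img, cfg)
}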

Semantic difference between v1.Image.Layers & serialization format forces people to think

v1.Image.Layers() returns the layers with the most recent being first.

In contrast, the serialization format (OCI Image Spec) orders layers with the base being first. Similarly in the config spec rootfs.diff_ids are ordered to match the manifest layers.

This difference led to some bugs: pull/63/commits/bc06df7c5dc16c72b9f1968a3e46a8aedfd4a878

I think we should be consistent with the OCI Image Spec and have v1.Image.Layers() return the base layer first.

github.com/google/subcommands isn't very user friendly

Should consider using another library or patching it.

The biggest issue is that if your subcommand returns ExitUsageError during Execute, it doesn't actually print the usage.

$ d8s append
$ # Nothing was printed!

Additionally, when performing d8s [subcommand] -h or d8s [subcommand] -non-existent-flag the output is confusing. See the weird -o string ... output below

$ d8s append -h
append [-o output-file] <src-reference> <dest-tag> <tarball>  -o string
    	output the resulting image to a new tarball

Support Manifest Lists

We should add support for manifest lists, if only to avoid needing this non-Fatal workaround for DockerHub:

2018/04/29 07:27:03 manifest digest: "sha256:90f24abe180424046a5d53f6fc6f9fdb8f79b835cb2fd7d1a782e4c30dfb5dcc" does not match Docker-Content-Digest: "sha256:d2518289e66fd3892c2dae5003218117abeeed2edbb470cba544aef480fb6b3a" for "index.docker.io/library/ubuntu:latest"

Jon, you're easily the best person for the job :)

Unified CLI surface

The cmd directory currently has four separate binaries a user can use to interact with a remote registry: puller, pusher, poke (which gets only metadata), and deleter. It looks like we'll be adding appender soon in #63.

I think we should head off the proliferation of separate binaries and move them into a single command with subcommands for each operation. This will allow users to install a single binary into their path, and could allow some code reuse and help standardize conventions for the commands.

This would also make it easier to package as a builder image for GCB.

Subcommands

  • get (instead of poke)
  • pull
  • push
  • delete
  • append
  • tag (list|add|remove)
  • label (list|add|remove)
  • flatten
  • rebase?

Conventions

Here are some potentially controversial CLI conventions for us to bikeshed over:

  • Use the default keychain for all authorization.

  • If the command produces a new image, print that new image's reference by digest to stdout.

  • Positional args: If a command operates on an image by reference, it should be a positional arg. Required flags should be positional args as much as makes sense. E.g., the-command pull ubuntu ubuntu.tar.gz instead of the-command pull --tarball=ubuntu.tar.gz --tag=ubuntu. If the command takes more than a couple args, or if both args are image references, they should be named flags to avoid confusion.

  • No er suffix: Subcommands should be pull, push, delete, append, etc., and not the-command puller foo bar.tar.gz.

  • Use google/subcommands: I don't have a strong preference, but that one seems good.

What to name it

As with any Serious Engineering Endeavor, naming is the only hard part. Here are some naming suggestions for us to bikeshed over:

WDYT?

Split v1.Image

In the Python client, we provided base implementations (in our ABC) for things like UncompressedBlob that simply decompressed the result of its sibling.

To achieve the same for Go, my thinking was that we'd split apart the v1.Image interface into:

  1. Common accessors,
  2. Accessors for natively uncompressed image forms (e.g. tarball.Image),
  3. Accessors for natively compressed image forms (e.g. remote.Image),

... and then provide functions for turning v1.{Foo,Bar}Image => v1.Image, which would implement the remaining methods in terms of its available sibling methods.

WDYT?

Refactor v1.Image to use a v1.Layer abstraction

I am considering refactoring v1.Image to use a second interface type v1.Layer to cut down on the cross-product of methods currently present.

// Layer is an interface for accessing the properties of a particular layer of a v1.Image
type Layer interface {
	// Digest returns the Hash of the compressed layer.
	Digest() (Hash, error)

	// DiffID returns the Hash of the uncompressed layer.
	DiffID() (Hash, error)

	// Compressed returns an io.ReadCloser for the compressed layer contents.
	Compressed() (io.ReadCloser, error)

	// Uncompressed returns an io.ReadCloser for the uncompressed layer contents.
	Uncompressed() (io.ReadCloser, error)

	// Size returns the compressed size of the Layer.
	Size() (int64, error)
}

// Image defines the interface for interacting with an OCI v1 image.
type Image interface {
	// Layers returns the ordered collection of filesystem layers that comprise this image.
	// The order of the list is most-recent first, and oldest base layer last.
	Layers() ([]Layer, error)

	// BlobSet returns an unordered collection of all the blobs in the image.
	BlobSet() (map[Hash]struct{}, error)

	// MediaType of this image's manifest.
	MediaType() (types.MediaType, error)

	// ConfigName returns the hash of the image's config file.
	ConfigName() (Hash, error)

	// ConfigFile returns this image's config file.
	ConfigFile() (*ConfigFile, error)

	// RawConfigFile returns the serialized bytes of ConfigFile()
	RawConfigFile() ([]byte, error)

	// Digest returns the sha256 of this image's manifest.
	Digest() (Hash, error)

	// Manifest returns this image's Manifest object.
	Manifest() (*Manifest, error)

	// RawManifest returns the serialized bytes of Manifest()
	RawManifest() ([]byte, error)

	// LayerByDigest returns a Layer for interacting with a particular layer of
	// the image, looking it up by "digest" (the compressed hash).
	LayerByDigest(Hash) (Layer, error)

	// LayerByDiffID is an analog to LayerByDigest, looking up by "diff id"
	// (the uncompressed hash).
	LayerByDiffID(Hash) (Layer, error)
}

I'm unsure what this will do to v1/partial or the other implementations, but this feels a bit cleaner to me. WDYT?

Experiment: //cmd/ko

tl;dr I want a more purpose-built variation of what you get with bazel + rules_go + rules_docker + rules_k8s that simply wraps the Go toolchain directly.

Background: State of Bazel

Today in Bazel, my build is described in BUILD files (largely generated by Gazelle) and extended to handle containerization and kubernetesification via:

# BEGIN generated by Gazelle
go_library(
    name = "go_default_library",
    ...
)

go_test(
    name = "go_default_test",
    ...
)

go_binary(
    name = "name-of-directory",
    ...
)
# END generated by Gazelle

# Wrap the Go binary into a minimal container image.
go_image(
    name = "image",
    binary = ":name-of-directory",
)

# Helper for interacting with my K8s "Deployment"
k8s_object(
    name = "deployment",
    template = "deployment.yaml",
    images = {
        # Associate the image reference in deployment.yaml with the associated binary target.
        "gcr.io/foo/bar:baz": ":image",
    },
)

With this, I can quickly iterate and redeploy via bazel run :deployment.apply.

Unfortunately, it's painful to deal with Go in Bazel because it forks the source of truth for dependencies: the import paths in *.go. This often leaves you running Gazelle before build commands, which breaks the promise of rules_k8s: a single command for rapid iteration.

You also end up checking in a lot of redundant configuration.

Background: Go importpaths

As alluded to above, Go declares dependencies via a block like this at the top of each file:

import (
    "github.com/google/go-containerregistry/authn"
    "github.com/google/go-containerregistry/name"
)

Binary targets are determined by the declaration of package main within a particular one of these paths. For example, we have a binary that can be referenced via the Go importpath: github.com/google/go-containerregistry/cmd/d8s

Foreshadowing

Gee, Go importpaths look a LOT like Docker image references.

Convention over Configuration

This idea builds around an extension of the Go importpath convention to tie Go binaries all the way through to one's Kubernetes configuration. Instead of writing an image reference in your Kubernetes configuration, you would write a Go importpath:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: foo
spec:
  replicas: 1
  template:
    spec:
      containers:
      - name: bar
        image: github.com/path/to/my/thing

In Bazel, we had to manually associate our BUILD target for go_image with the image reference in the template. In this example, we would implicitly associate embedded Go importpath references to a binary, which we'd wrap in a container in a similar fashion to go_image.

This set of conventions should enable us to have something like rules_k8s that's driven purely by conventions without forking any sources of truth.

Hold your horses

... unless GitHub is launching a Docker registry, this is still a fantasy. We can't publish images to those embedded references!

Luckily we've seen and solved this same problem with Bazel already, via: image_chroot. The value of image_chroot is used to prefix image references within the templates (for things we plan to publish). So image_chroot = "gcr.io/foo" with an image: github.com/bar/baz would become the image reference: image: gcr.io/foo/github.com/bar/baz. This looks funny, but it is a legitimate Docker image URI.

Unfortunately, this will not work with certain registries (including DockerHub), which disallow multi-level image names. I'm calling this an acceptable loss and not going to lose a lot of sleep over it, at least for now.

Ok, Some Configuration...

So we will still have at least a little configuration for the typical flow (to configure where to publish images), but there are also some pretty obvious knobs that folks will look for pretty quickly (e.g. what base image?).

I think for the purposes of prototyping, I'm going to just use $KO_DOCKER_REPO for the base repository with which to chroot image references.

Over time, I think the most idiomatic solution for configuration will be to adopt something like Kopkg.toml (and Kopkg.lock). This would enable us to reference a base image (tag) and lock it (digest) for reproducible builds.

Extra Credit

Given the incredible portability of Go, I can easily see us extending this to produce "fat" images (aka "manifest lists") pretty trivially. I think that this should motivate us to structure our configuration in a certain way:

# TODO(mattmoor): We want this to be something individual developers can
# override (e.g. stamp variables), so perhaps KO_DOCKER_REPO could still override this?
repository = "gcr.io/my-project"

[[linux/amd64]]
  base = "gcr.io/base-images/blah:amd64"

[[linux/arm]]
  base = "gcr.io/base-images/blah:arm"

[[windows/amd64]]
  base = "gcr.io/base-images/blah:win64"

The ko CLI Surface

The goal of ko is to effectively blend:

  1. The CLI surface of go (e.g. build, test, ...)
  2. An enhanced subset of kubectl (e.g. apply, create, update, delete)

The "enhancement" of the latter is that we will build, publish, and resolve image references within templates in a similar manner to rules_k8s.

Tying it all together

With this surface, I should be able to have a fast-path development experience similar to bazel run :deployment.apply by simply running: ko apply -f deployment.yaml

name.NewTag bug

name.NewTag has a WeakValidation bug. When the host part contains a port (e.g. foo.io:8383), and the :latest tag is omitted, we incorrectly treat the port and repository portion as the tag, and then reject it due to illegal characters.
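
A minimal reproduction sketch of the failing parse described above. The host and repository are illustrative, and the pkg/name import path is the modern one.

package main

import (
	"fmt"

	"github.com/google/go-containerregistry/pkg/name"
)

func main() {
	// Host with a port, and no explicit :latest tag.
	tag, err := name.NewTag("foo.io:8383/path/to/repo", name.WeakValidation)
	// With the bug, err complains about illegal tag characters instead of
	// defaulting the tag to :latest.
	fmt.Println(tag, err)
}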

Can't push v1.random.Image

When I try to remote.Write a random.Image(3, 1024) image to GCR I am seeing:

2018/04/20 13:42:00 pushed sha256:b64cd7971bfe7ea90fbdfa19ca0f87b32f947f88ae2d843fc982f95d4fe88acb
2018/04/20 13:42:19 DIGEST_INVALID: "'digest' parameter 'sha256:39ebc7139b5962acd73c597590937476080b432619632930e17ae87b411bb124' does not match computed digest 'sha256:7c31563558f89064288cadafe17818b596f302bd95a070e80d1b185814776788'."

However, when I publish a tarball.Image() it works just fine. Something's broken in random.Image...

Manifest digest mismatch against Dockerhub

package main

import (
        "log"
        "net/http"

        "github.com/google/go-containerregistry/authn"
        "github.com/google/go-containerregistry/name"
        "github.com/google/go-containerregistry/v1/remote"
)

func main() {
        ref, err := name.NewTag("ubuntu", name.WeakValidation)
        if err != nil {
                log.Fatalf("NewTag: %v", err)
        }
        img, err := remote.Image(ref, authn.Anonymous, http.DefaultTransport)
        if err != nil {
                log.Fatalf("Image: %v", err)
        }
        dig, err := img.Digest()
        if err != nil {
                log.Fatalf("Digest: %v", err)
        }
        log.Println(dig)
}

Running this program I get:

2018/04/11 22:40:24 Digest: manifest digest: sha256:52286464db54577a128fa1b1aa3c115bd86721b490ff4cbd0cd14d190b66c570 does not match Docker-Content-Digest: sha256:e348fbbea0e0a0e73ab0370de151e7800684445c509d46195aef73e090a49bd6
exit status 1

Changing the reference to mirror.gcr.io/library/ubuntu prints the digest, 52286464db... without any error.

We should disallow accessing the config through LayerBy*

Especially now that we have v1.Layer, it is awkward to reason about ConfigFile as a layer.

This has been the source of multiple bugs; we should eliminate it and require that LayerBy* be used exclusively to access something from Layers().

The history here is that since the config is stored as a blob in the registry, we had incorporated it into concepts like blob_set() (but not fs_layers()!)...

IsGzipped consumes the io.Reader it's reading

func TestIsGzipped(t *testing.T) {
  var buf bytes.Buffer
  w := gzip.NewWriter(&buf)
  w.Write([]byte("hello"))

  if is, _ := IsGzipped(&buf); !is {
    t.Errorf("IsGzipped identified contents as not gzipped")
  }

  if is, _ := IsGzipped(&buf); !is {
    t.Errorf("IsGzipped identified contents as not gzipped on retry") // <-- boom
  }
}

IsGzipped determines whether the given reader represents gzipped data by consuming the first two bytes of the reader and looking for a magic header. This consumes the data, so subsequent IsGzipped checks will not identify the contents as gzipped.

This isn't a problem in practice while Readers are only checked once while reading tarballs, but this could become a problem in the future, or if a client decides to call IsGzipped themselves on a Reader before passing it to tarball.Image (naively wrapped in an Opener).
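
One non-consuming alternative, sketched here as a general technique rather than the library's eventual fix: peek at the magic bytes through a bufio.Reader and hand that same reader onward.

package example

import (
	"bufio"
	"io"
)

// isGzipped reports whether r starts with the gzip magic header without
// consuming it; callers should keep reading from the same *bufio.Reader.
func isGzipped(r *bufio.Reader) (bool, error) {
	magic, err := r.Peek(2)
	if err != nil {
		if err == io.EOF {
			return false, nil
		}
		return false, err
	}
	return magic[0] == 0x1f && magic[1] == 0x8b, nil
}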
