
estaleiro's Introduction

estaleiro [istaˈlejru]





masculine noun - shipyard

estaleiro allows you to ship container images with confidence - a declarative approach to dealing with the last mile in building container images, so you can have more control (through transparency) over what you ship.




HIGHLY EXPERIMENTAL - DO NOT USE THIS



problem set

Keeping track of what has been added to a container image that one is about to ship is hard.

With the versatility of Dockerfiles, it's quite easy to shoot yourself in the foot - installing dependencies that are never actually consumed, or that it wouldn't be wise to ship at all.

While it's great to talk about best practices, it's hard to enforce them.

estaleiro

estaleiro sits at the very final portion of your image building process, gating what gets into the final container image that is supposed to be shipped to your customers.


                                   .-----------.
    somehow you build stuff  --->  | estaleiro | --> final container image
                                   *-----------*              +
                                                       bill of materials
                                                              
    with the Dockerfiles and                                  |
     build process that you                                   |
         already have                                         .

                                                      anything that gets into the
                                                      final container image *MUST*
                                                      declare where it comes from.


It leverages buildkit to implement a convention for the last stage of building a container image (i.e., gathering binaries built in previous steps), putting guard-rails in place and enforcing a set of rules where needed.

Here's an example of how that looks in practice:

  1. bring your Dockerfile that you've already been using to build your binary
FROM golang AS base

	ENV CGO_ENABLED=0
	RUN apt update && apt install -y git

	ADD . /src
	WORKDIR /src

	RUN go mod download


FROM base AS build

	RUN go build \
		-tags netgo -v -a \
		-o /usr/local/bin/estaleiro \
		-ldflags "-X main.version=$(cat ./VERSION) -extldflags \"-static\""
  2. bring an estaleiro file that describes how to package the binary produced
# syntax = cirocosta/estaleiro-frontend

# the final image to produce
#
image "cirocosta/estaleiro" {
  base_image = "ubuntu:bionic"
  
  apt {
    package "ca-certificates" {}
  }

  file "/usr/local/bin/estaleiro" {
    from_step "build" {
      path = "/bin/estaleiro"
    }
  }
}


# performs the build of `estaleiro`.
#
step "build" {
  dockerfile = "./Dockerfile"
  target     = "build"

  source_file "/bin/estaleiro" {
    vcs "git" {
      ref        = "${estaleiro-commit}"
      repository = "https://github.com/cirocosta/estaleiro"
    }
  }
}

Having those pieces in place, estaleiro creates the intermediate representation to be used by buildkitd to build the final container image - one that starts from ubuntu:bionic, has the ca-certificates package installed, and contains the file that the Dockerfile built - all while keeping track of versions and sources along the way in the form of a bill of materials:

base_image:
  name: docker.io/library/ubuntu
  digest: sha256:c303f19cfe9ee92badbbbd7567bc1ca47789f79303ddcef56f77687d4744cd7a
  packages:
    - name: fdisk
      version: 2.31.1-0.4ubuntu3.3
      source_package: util-linux
      architecture: amd64
    - name: libpam-runtime
      version: 1.1.8-3.6ubuntu2.18.04.1
      source_package: pam
      architecture: all
    # ...

changeset:
  files:
    - name: "/usr/local/bin/seataleiro"
      digest: "sha256:89f687d4744cd779303ddc7ef56f77c303f19cfe9ee92badbbbd7567bc1ca47a"
      source:
        - url: https://github.com/cirocosta/estaleiro
          type: git
          ref: 6a4d0b73673a1863a62b7ac6cbde4ae7597c56d7
      from_step:
        name: "build"
        dockerfile_digest: "sha256:9303ddc7ef56f77c303f19cfe9ee92badbbbd7567bc189f687d4744cd77ca47a"
  packages:
    - name: ca-certificates
      version: "20180409"
      source_package: ""
      architecture: all
      location:
          uri: http://archive.ubuntu.com/ubuntu/pool/main/c/ca-certificates/ca-certificates_20180409_all.deb
          name: ca-certificates_20180409_all.deb
          size: "150932"
          md5sum: eae40792673dcb994af86284d0a01f36
      source:
        - uri: http://archive.ubuntu.com/ubuntu/pool/main/c/ca-certificates/ca-certificates_20180409.dsc
          name: ca-certificates_20180409.dsc
          size: "1420"
          md5sum: cd1f6540d0dab28f897e0e0cb2191130cdbf897f8ce3f52c8e483b2ed1555d30
        - uri: http://archive.ubuntu.com/ubuntu/pool/main/c/ca-certificates/ca-certificates_20180409.tar.xz
          name: ca-certificates_20180409.tar.xz
          size: "246908"
          md5sum: 7af6f5bfc619fd29cbf0258c1d95107c38ce840ad6274e343e1e0d971fc72b51
    # and all of its dependencies too ...

how to use

THIS IS STILL HIGHLY EXPERIMENTAL

All that you need is:

  • Docker 18.09+

Having an estaleiro file (like the estaleiro.hcl that you find in this repo), point docker build at it via the regular --file (-f) flag, with DOCKER_BUILDKIT=1 set as an environment variable:

# create an `estaleiro.hcl` file.
# 
# note.: the first line (with syntax ...) is important - it's
#        what tells the docker engine to fetch our implementation
#        of `estaleiro`, responsible for creating the build
#        definition.
#
$ echo "# syntax=cirocosta/estaleiro
image "cirocosta/sample" {
  base_image = "ubuntu:bionic"
}
" > ./estaleiro.hcl


# instruct `docker` to build our image
#
$ docker build -t test -f ./estaleiro.hcl .
[+] Building 9.4s (4/4) FINISHED


# retrieve the bill of materials from the filesystem
#
$ docker create --name tmp test
$ docker cp tmp:/bom/merged.yml ./bom.yml

references

license

See ./LICENSE.


estaleiro's Issues

sha256sum computation

Before finishing writing the bom, we could also compute the digests of the files that we're adding to the final image.

Would it be useful? 🤷‍♂
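
A minimal sketch of how that could look in Go - streaming the file through crypto/sha256 and producing the same `sha256:<hex>` form used elsewhere in the bom (the package and function names here are made up):

package bom

import (
	"crypto/sha256"
	"fmt"
	"io"
	"os"
)

// fileDigest streams the contents of filename through sha256, returning
// the digest in the same `sha256:<hex>` form used elsewhere in the bom.
func fileDigest(filename string) (string, error) {
	f, err := os.Open(filename)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}

	return fmt.Sprintf("sha256:%x", h.Sum(nil)), nil
}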

frontend: do not run `packages`-related code on the absence of packages

 => => sha256:7413c47ba209e555018c4be91101d017737f24b0c9d1f65339b97a4da98acb2a 26.69MB / 26.69MB                                                                                                                                                         0.0s
 => => unpacking docker.io/concourse/concourse:latest@sha256:38cb311fac63c09dee250c888c7488b1aba964404bdcc268e02ad916af509bc9                                                                                                                            5.2s
 => local://bin                                                                                                                                                                                                                                          0.8s
 => => transferring bin: 29.21MB                                                                                                                                                                                                                         0.8s
 => /usr/local/bin/estaleiro apt-repositories --output=/keys.yml                                                                                                                                                                                        10.3s
 => /usr/local/bin/estaleiro collect --input=/var/lib/dpkg/status --output=/bom/initial-packages.yml                                                                                                                                                     0.6s
 => /usr/local/bin/estaleiro base --output=/bom/base.yml                                                                                                                                                                                                 0.5s
 => ERROR /usr/local/bin/estaleiro apt-packages --output=/pkgs.yml --debs=/var/lib/estaleiro/debs                                                                                                                                                        0.5s
 => copy /keys.yml /keys.yml                                                                                                                                                                                                                             0.1s
------
 > /usr/local/bin/estaleiro apt-packages --output=/pkgs.yml --debs=/var/lib/estaleiro/debs:
#4 0.327 the required flag `-p' was not specified
------
error: failed to solve: rpc error: code = Unknown desc = failed to build LLB: executor failed running [/usr/local/bin/estaleiro apt-packages --output=/pkgs.yml --debs=/var/lib/estaleiro/debs]: exit code: 1

sample:

# syntax=cirocosta/estaleiro

# an example of what leveraging `estaleiro` as just a final step would look
# like.
#
image "concourse" {
  base_image {
    name = "concourse/concourse"
  }
}
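
A minimal sketch of the guard on the frontend side, assuming (hypothetically) that the parsed `apt` block ends up in a struct along these lines - if no packages were declared, none of the apt-related states should be emitted at all:

package frontend

// Package and Apt loosely mirror the parsed `apt { ... }` block - these
// names are assumptions, not the real estaleiro structs.
type Package struct{ Name string }

type Apt struct{ Packages []Package }

// skipApt reports whether every apt-related build state (key retrieval,
// package download, `estaleiro apt-packages`, install) can be skipped.
func skipApt(apt Apt) bool {
	return len(apt.Packages) == 0
}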

feature: external files

for instance:

FROM ubuntu:bionic AS base

	ARG HELM_RELEASE_URL=https://get.helm.sh/helm-v2.14.1-linux-amd64.tar.gz
	ARG HELM_RELEASE_SHA256SUM=804f745e6884435ef1343f4de8940f9db64f935cd9a55ad3d9153d064b7f5896

	ADD $HELM_RELEASE_URL /tmp/helm.tgz
	RUN echo "$HELM_RELEASE_SHA256SUM /tmp/helm.tgz" | sha256sum -c -

	RUN tar xvzf /tmp/helm.tgz \
		--strip-components=1 \
		-C /usr/local/bin \
		linux-amd64/helm linux-amd64/tiller


FROM ubuntu:bionic AS release

	COPY --from=base /usr/local/bin/ /usr/local/bin/
	ENTRYPOINT [ "/usr/local/bin/tiller" ]

tarball

Hey,

In order to get tarballs working as intended, given a configuration like:

image "concourse/concourse" {
  file "/usr/local/concourse/bin/gdn" {
    from_tarball "linux-rc" {
      path = "concourse/bin/gdn"
    }
  }
}


tarball "linux-rc" {
  file "gdn" {
    paths = ["concourse/bin/gdn"]

    vcs "git" {
      ref        = "master"
      repository = "https://github.com/cloudfoundry/guardian"
    }
  }
}

We could:

  1. for each tarball definition - extract that to a scratch-based layer where
    all the files can be found in a directory

  2. for each file reference, we copy the file from the corresponding layer into
    the final image layer (using fileOp too)

Thanks!
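
A sketch of the lookup part of step 1 in plain Go (stdlib archive/tar + compress/gzip), just to illustrate finding a declared path inside the tarball - in practice this would happen inside an LLB state rather than on the host:

package tarball

import (
	"archive/tar"
	"compress/gzip"
	"fmt"
	"io"
	"os"
)

// extractFile pulls a single path (e.g. "concourse/bin/gdn") out of a
// gzipped tarball, writing it to dest - conceptually what the
// scratch-based extraction layer would make available for the later copy.
func extractFile(tarballPath, pathInTarball, dest string) error {
	f, err := os.Open(tarballPath)
	if err != nil {
		return err
	}
	defer f.Close()

	gz, err := gzip.NewReader(f)
	if err != nil {
		return err
	}
	defer gz.Close()

	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return err
		}

		if hdr.Name != pathInTarball {
			continue
		}

		out, err := os.OpenFile(dest, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, os.FileMode(hdr.Mode))
		if err != nil {
			return err
		}
		defer out.Close()

		_, err = io.Copy(out, tr)
		return err
	}

	return fmt.Errorf("path %s not found in %s", pathInTarball, tarballPath)
}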

frontend/config: use of globs

# syntax=cirocosta/estaleiro

image "concourse/s3-resource" {
  base_image {
    name = "ubuntu"
  }

  file "/opt/resource/check" {
    from_step "build" {
      path = "/opt/resource/check"
    }

  file "/opt/resource/in" {
    from_step "build" {
      path = "/opt/resource/in"
    }

  file "/opt/resource/out" {
    from_step "build" {
      path = "/opt/resource/out"
    }
  }
}

step "build" {
  dockerfile = "./s3-resource/dockerfiles/ubuntu/Dockerfile"
  context    = "./s3-resource"
  target     = "builder"

  source_file "/opt/resource/check" {
    vcs "git" {
      repository = "https://github.com/concourse/s3-resource"
      ref        = "master"
    }
  }

  source_file "/opt/resource/in" {
    vcs "git" {
      repository = "https://github.com/concourse/s3-resource"
      ref        = "master"
    }
  }

  source_file "/opt/resource/out" {
    vcs "git" {
      repository = "https://github.com/concourse/s3-resource"
      ref        = "master"
    }
  }
}

that whole repetition could just be:

# syntax=cirocosta/estaleiro

image "concourse/s3-resource" {
  base_image {
    name = "ubuntu"
  }

  file "/opt/resource/*" {
    from_step "build" {
      path = "/opt/resource/*"
    }
  }
}

step "build" {
  dockerfile = "./s3-resource/dockerfiles/ubuntu/Dockerfile"
  context    = "./s3-resource"
  target     = "builder"

  source_file "/opt/resource/*" {
    vcs "git" {
      repository = "https://github.com/concourse/s3-resource"
      ref        = "master"
    }
  }
}
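
The expansion itself could be as simple as matching the `file` glob against the paths that the step declares - a sketch, with made-up names, using Go's path.Match:

package frontend

import "path"

// expandGlob returns the step-declared paths that match a glob used in a
// `file` block (e.g. "/opt/resource/*"), so that a single declaration can
// fan out into one copy per matching source_file.
func expandGlob(pattern string, declaredPaths []string) ([]string, error) {
	var matches []string

	for _, p := range declaredPaths {
		ok, err := path.Match(pattern, p)
		if err != nil {
			return nil, err
		}
		if ok {
			matches = append(matches, p)
		}
	}

	return matches, nil
}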

frontend: split `bom` into multiple states

As each modification to the bom just adds content, we could generate N bom states and then, at the end (when performing the merge), mount that list of states all together.

This would allow us to better parallelize the work that we do, and avoid all of the copying.
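
The final merge could then be a plain fold over the partial files - a sketch assuming the partial boms are YAML files with a compatible shape (the Bom struct below is a stand-in, and gopkg.in/yaml.v2 may not be what estaleiro actually uses):

package bom

import (
	"io/ioutil"
	"path/filepath"

	yaml "gopkg.in/yaml.v2"
)

// Bom is a stand-in for the real bill-of-materials struct.
type Bom struct {
	Files    []map[string]interface{} `yaml:"files"`
	Packages []map[string]interface{} `yaml:"packages"`
}

// merge reads every partial bom found under dir and appends their entries
// into a single, final bom.
func merge(dir string) (Bom, error) {
	var final Bom

	paths, err := filepath.Glob(filepath.Join(dir, "*.yml"))
	if err != nil {
		return final, err
	}

	for _, p := range paths {
		content, err := ioutil.ReadFile(p)
		if err != nil {
			return final, err
		}

		var partial Bom
		if err := yaml.Unmarshal(content, &partial); err != nil {
			return final, err
		}

		final.Files = append(final.Files, partial.Files...)
		final.Packages = append(final.Packages, partial.Packages...)
	}

	return final, nil
}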

frontend: better image resolution error

The error `failed to convert to llb: failed to resolve digest for concourse when preparing llb: couldn't resolve image for docker.io/library/concourse:latest: pull access denied, repository does not exist or may require authorization: server message: insufficient_scope: authorization failed` is kinda hard to understand - e.g., when the base image simply doesn't exist:

image "dasdas" {
  base_image {
    name = "oasejoidj"
  }
}

custom `op` interpretation

While leveraging `debug dump-llb` has been useful, it's been showing its limitations (e.g., it can't properly describe the fileop operations).

Having a custom interpreter would allow us to show not only a dot or json output, but also a "dry-run"-like pretty output that demonstrates beforehand all of the steps that will take place.

pretty errors

Leverage file location information to display some pretty diagnostics.

BOM

generation of bill of materials (bom)

Having most of the "source to container image" functionality figured out, now
we have to get better at generating the bill of materials.

Here are some steps to make that better:

  • refactor packages

    • get away from apt install
      • retrieve deb uris (apt-get --print-uris $LIST_OF_PACKAGES) - see the sketch after this list
      • gather the packages (wget -i $uris_file) and inspect their control file (for deb in *.deb; do dpkg-deb -I $deb control; done)
      • install with dpkg
        • dpkg -i *.deb
      • for each package, verify if source packages can be found for that version
        • for pkg in $list_of_packages; do apt-get source --print-uris $pkg ; done (so that it can fail properly)
          • take note of packages without sources
      • produce BOM
  • refactor file addition

    • compute the file digest (estaleiro digest --filename=<> --algorithm=sha256sum)
      • when copying, copy both file (to regular dest) and digest (to /var/lib/estaleiro/something.digest)
  • metadata injection

    • add a final bom.yml to the labelset
  • refactor step addition

    • should provide a digest too
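
As a sketch of the apt part of that plan, retrieving the deb URIs could look roughly like this in Go - shelling out to `apt-get install --print-uris` and parsing its output; the struct and function names are made up, and the checksum prefix depends on the apt version:

package apt

import (
	"bufio"
	"os/exec"
	"strings"
)

// DebLocation mirrors the `location` entries that end up in the bom.
type DebLocation struct {
	URI, Name, Size, Checksum string
}

// printURIs shells out to `apt-get install --print-uris` (which resolves
// dependencies but installs nothing) and parses lines of the form
//
//	'http://.../ca-certificates_20180409_all.deb' ca-certificates_20180409_all.deb 150932 MD5Sum:eae4...
//
// note that the checksum prefix (MD5Sum vs SHA256) depends on the apt version.
func printURIs(packages ...string) ([]DebLocation, error) {
	args := append([]string{"install", "--print-uris", "-qq"}, packages...)

	out, err := exec.Command("apt-get", args...).Output()
	if err != nil {
		return nil, err
	}

	var locations []DebLocation

	scanner := bufio.NewScanner(strings.NewReader(string(out)))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) < 4 {
			continue
		}

		locations = append(locations, DebLocation{
			URI:      strings.Trim(fields[0], "'"),
			Name:     fields[1],
			Size:     fields[2],
			Checksum: fields[3],
		})
	}

	return locations, scanner.Err()
}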

Below, some context.

the problem space

For each source component that brings artifacts into the final image, we must
be able to keep a record of such additions.

At the moment, there are four ways of getting stuff in:

  1. base image
  2. files from steps
  3. files from tarballs
  4. debian packages

While the first three have no "build-time" dynamic aspect to them, the last one
does - a package might have many other dependencies that we don't know
beforehand.

That means that we have two types of BOM generation:

  1. static
  2. dynamic

For 1, we can populate the BOM struct from within the LLB generation code.

For 2, we must do it from steps at runtime, persisting their results into
files that can be accessed later.

format

base image

image "blabla" {
  base_image {
    name = "this"
    ref  = "that"
  }
}

produces

base_image:
  name: "docker.io/library/this"
  digest: "sha256@ahushaui"
  packages:
    - name: vim-runtime
      version: 1.29b-2ubuntu0.1
      source_package: vim

packages

image "bla" {
  package "this" {}
}

produces

changeset:
  packages:
    - name: zstd
      version: '1.3.3+dfsg-2ubuntu1'
      url: http://archive.ubuntu.com/ubuntu/pool/universe/libz/libzstd/zstd_1.3.3+dfsg-2ubuntu1_amd64.deb
      digest: 'SHA256:c85b2abcddbd7abc07fb06bc3a1b3fb6b80c2316e787abe05bb4d6909dc831f2'
      source_package: higher-level-package
      source:
        - url: http://archive.ubuntu.com/ubuntu/pool/main/libz/libzstd/libzstd_1.3.3+dfsg-2ubuntu1.dsc
          name: libzstd_1.3.3+dfsg-2ubuntu1.dsc
          type: deb-src
          digest: SHA256:c28c88103e3b8eecd5361bf38b185d1ac4a02712e153786ea4d01d26fea6eeb0

files

from steps

image "blabla" {
  file "/usr/local/bin/estaleiro" {
    from_step "estaleiro" {
      path = "/usr/bin/estaleiro"
    }
  }
}

step "estaleiro" {
  dockerfile = "./Dockerfile"
  target = "build"

  source_file "/usr/bin/estaleiro" {
    vcs {
      repository = "https://github.com/cirocosta/estaleiro"
      ref        = "master"
    }
  }
}

produces

changeset:
  files:
    - name: "/usr/local/concourse/bin/concourse"
      digest: "sha256:huidashiu"
      from_step:
        name: "estaleiro"
      source:
        - url: https://github.com/concourse/concourse
          type: git
          ref: master

from tarballs

image "blabla" {
  file "/usr/local/concourse/bin/concourse" {
    from_tarball "linux-rc" {
      path = "/concourse/bin/concourse"
    }
  }
}

tarball "linux-rc" {
  source_file "concourse/bin/concourse" {
    vcs {
      repository = "https://github.com/concourse/concourse"
      ref        = "master"
    }
  }
}

produces

changeset:
  files:
    - name: "/usr/local/concourse/bin/concourse"
      digest: "sha256:huidashiu"
      source:
        - url: https://github.com/concourse/concourse
          type: git
          ref: master
      from_tarball:
        name: "linux-rc"
        digest: "sha256:ahuhsui"

bom: repository that brought the package

Hey,

In order to retrieve the repository that brought a particular package, we need to run apt-cache policy $pkg_name and parse its output.

apt-cache policy search vim
vim:
  Installed: 2:8.0.1453-1ubuntu1.1
  Candidate: 2:8.0.1453-1ubuntu1.1
  Version table:
 *** 2:8.0.1453-1ubuntu1.1 500
        500 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
        100 /var/lib/dpkg/status
     2:8.0.1453-1ubuntu1 500
        500 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages
N: Unable to locate package search

That, however, needs to run from within a container within the buildstep. To prepare the bom.yml then, we'd need to get access to the output of such execution, which seems to be only available through mounts 🤔
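
A hedged sketch of that parsing in Go, meant to run from within the build step - it keeps only the repository lines listed under the installed (`***`) version and assumes the default 500 pin priority shown above:

package apt

import (
	"bufio"
	"os/exec"
	"strings"
)

// installedRepositories runs `apt-cache policy <pkg>` and returns the
// repository lines listed under the installed (`***`) version, e.g.
// "http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages".
func installedRepositories(pkg string) ([]string, error) {
	out, err := exec.Command("apt-cache", "policy", pkg).Output()
	if err != nil {
		return nil, err
	}

	var (
		repos       []string
		inInstalled bool
	)

	scanner := bufio.NewScanner(strings.NewReader(string(out)))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())

		switch {
		case strings.HasPrefix(line, "***"):
			inInstalled = true
		case inInstalled && strings.HasPrefix(line, "500 "):
			repos = append(repos, strings.TrimPrefix(line, "500 "))
		case inInstalled && !strings.HasPrefix(line, "100 "):
			// reached the next version entry - stop collecting
			inInstalled = false
		}
	}

	return repos, scanner.Err()
}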

frontend: multiple files not working

Having multiple files declared is generating an LLB that copies over just a single file:

# syntax=cirocosta/estaleiro

image "concourse/s3-resource" {
  base_image {
    name = "ubuntu"
  }

  apt {
    package "tzdata" {}
    package "ca-certificates" {}
    package "unzip" {}
    package "zip" {}
  }

  file "/opt/resource/check" {
    from_step "build" {
      path = "/assets/check"
    }
  }

  file "/opt/resource/in" {
    from_step "build" {
      path = "/assets/in"
    }
  }

  file "/opt/resource/out" {
    from_step "build" {
      path = "/assets/out"
    }
  }

}

step "build" {
  dockerfile = "./s3-resource/dockerfiles/ubuntu/Dockerfile"
  context    = "./s3-resource"
  target     = "builder"

  source_file "/assets/check" {
    vcs "git" {
      repository = "https://github.com/concourse/s3-resource"
      ref        = "master"
    }
  }

  source_file "/assets/in" {
    vcs "git" {
      repository = "https://github.com/concourse/s3-resource"
      ref        = "master"
    }
  }

  source_file "/assets/out" {
    vcs "git" {
      repository = "https://github.com/concourse/s3-resource"
      ref        = "master"
    }
  }
}


cmd: detect unused tarballs / files / steps

e.g.:

image "a" {
  base_image {}
}

tarball "unused" {}

should let the user know that unused is "dead code".

The same should be true for specific files too:

image "a" {
  base_image {}
  file "dsd" {
    from_tarball "unused" { path = "a"}
  }
}

tarball "unused" {
  source_file "a" {}
  source_file "b" {}
}

should let the user know that b is never consumed.
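
The detection itself boils down to a set difference between what was declared and what got referenced - a minimal sketch with made-up names:

package frontend

// unusedTarballs returns the names of tarballs declared in the config but
// never referenced by any `from_tarball` block - the same idea applies to
// steps and to individual source_files.
func unusedTarballs(declared, referenced []string) []string {
	used := map[string]bool{}
	for _, name := range referenced {
		used[name] = true
	}

	var unused []string
	for _, name := range declared {
		if !used[name] {
			unused = append(unused, name)
		}
	}

	return unused
}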

frontend: semantic validation step

Hey,

We've been performing a bunch of validation during the construction of the LLB,
but that could be better done through the use of a semantic validation step that
we could perform right after the parsing.


	config parsing
		==> semantic check 
			==> llb generation

Ideally, it'd be nice to be able to reference back in the config file where a
given problem occurs, but I highly doubt that this can be trivially achieved.


Validations to perform (a sketch of such a pass follows the list):

  • empty apt blocks
  • scratch-based using apt
  • dangling files / steps / tarball references
  • unused files / steps / tarballs
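
A sketch of what such a pass could look like - one function that runs right after parsing and accumulates every violation instead of failing on the first one; the config shape here is an assumption:

package frontend

import "fmt"

// parsedConfig is a stand-in for the real parsed estaleiro file - the
// field names here are assumptions.
type parsedConfig struct {
	BaseImage      string
	AptPackages    []string
	UnusedTarballs []string
}

// validateConfig runs right after parsing and before LLB generation,
// accumulating every violation instead of failing on the first one.
func validateConfig(cfg parsedConfig) []error {
	var errs []error

	if cfg.BaseImage == "scratch" && len(cfg.AptPackages) > 0 {
		errs = append(errs, fmt.Errorf("cannot install apt packages on a scratch-based image"))
	}

	for _, name := range cfg.UnusedTarballs {
		errs = append(errs, fmt.Errorf("tarball %q declared but never used", name))
	}

	// empty apt blocks and dangling file / step / tarball references
	// would be checked here in the same fashion.

	return errs
}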

frontend: compute digest of files coming from steps

changeset:
    files:
      - path: /usr/local/bin/estaleiro
        digest: ""
        source:
            git:
                repository_uri: https://github.com/cirocosta/estaleiro
                ref: 49b6936ef130793fff677038891e517718c2baf8

(see - missing digest there)

?: packages without source

There are some cases where repositories simply don't have a source counterpart that is able to give us the source code for the contents that are being packaged.

How should we deal with that?

frontend/cmd: https-based repositories

As ubuntu:bionic doesn't have ca-certificates by default, we end up having trouble performing any retrievals for packages that are https-based.

I've been wondering about the possibility of splitting the retrieval of the initial repository listing from the rest, so that just for that part we could have ca-certificates available, without polluting the rest of the process afterwards 🤔
