meltwater / docker-cleanup

DEPRECATED Automatic Docker image, container and volume cleanup

License: MIT License

Shell 90.92% Makefile 6.17% Dockerfile 2.91%

docker archived deprecated obsolete

docker-cleanup's Introduction

DEPRECATED

This repository is no longer actively maintained.

Docker Cleanup

This image will periodically clean up exited containers and remove images and volumes that aren't in use by a running container. Based on tutumcloud/image-cleanup and chadoe/docker-cleanup-volumes with some small fixes.

WARNING: This script will remove all exited containers, data-only containers and unused images unless you carefully exclude them. Take care if you mount /var/lib/docker into the container since that will clean up all unused data volumes. If it's not compatible with your system or Docker version it may delete all your volumes, even from under running containers.

Normally any Docker containers that exit are still kept on disk until docker rm -v is used to clean them up. Similarly, any images that aren't used any more are kept around. On a cluster node that sees lots of containers start and stop, large numbers of exited containers and old image versions can fill up the disk. A Jenkins build slave has the same issues, but can also suffer from SNAPSHOT images being continuously rebuilt, which leaves untagged images behind.

Environment Variables

The default parameters can be overridden by setting environment variables on the container using the docker run -e flag.

CLEAN_PERIOD=1800 - Interval in seconds to sleep after completing a cleaning run. Defaults to 1800 seconds = 30 minutes.
DELAY_TIME=1800 - Seconds to wait before removing exited containers and unused images. Defaults to 1800 seconds = 30 minutes.
KEEP_IMAGES - List of images to avoid cleaning, e.g. "ubuntu:trusty, ubuntu:latest". Defaults to clean all unused images.
KEEP_CONTAINERS - List of images for exited or dead containers to avoid cleaning, e.g. "ubuntu:trusty, ubuntu:latest".
KEEP_CONTAINERS_NAMED - List of names for exited or dead containers to avoid cleaning, e.g. "my-container1, persistent-data".
LOOP - Whether to keep running in a loop. Set to false to run a single cleanup and exit. Options are true, false. Defaults to true, which runs forever in loops.
DEBUG - Set to 1 to enable more debugging output on pattern matches.
DOCKER_API_VERSION - The docker API version to use. This defaults to 1.20, but you can override it here in case the docker version on your host differs from the one that is installed in this container. You can find this on your host system by running docker version --format '{{.Client.APIVersion}}'.
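As a sketch of how several of these variables combine on one command line, the snippet below builds a docker run invocation as a bash array and prints it instead of executing it. The values chosen here (a one-hour period and delay, LOOP=false) are illustrative assumptions, not recommendations from the project, and the /var/lib/docker mount is left out on purpose.

```shell
#!/usr/bin/env bash
# Dry run: assemble the command as an array so values containing spaces stay intact.
cmd=(docker run
  -e CLEAN_PERIOD=3600
  -e DELAY_TIME=3600
  -e KEEP_IMAGES="ubuntu:trusty, ubuntu:latest"
  -e LOOP=false
  -v /var/run/docker.sock:/var/run/docker.sock:rw
  meltwater/docker-cleanup:latest)

# Print a shell-quoted form of the command for inspection.
printf '%q ' "${cmd[@]}"
echo

# To actually start the container, run: "${cmd[@]}"
```

Keeping the arguments in an array avoids the quoting problems that appear when a list such as KEEP_IMAGES is pasted into a single string.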

Note that KEEP_IMAGES, KEEP_CONTAINERS, and KEEP_CONTAINERS_NAMED are left-anchored bash shell pattern matching lists (NOT regexps). Therefore, the image foo/bar:tag will be matched by ANY of the following:

foo/bar:tag
foo/bar
foo/b
[[:alpha:]]/bar
*/*:tag
*:tag
foo/*:tag

However it will not match

foo/baz
bar:tag
/bar
:tag
[[:alpha:]]:tag

By default, these are set to **None**, which is the same as the blank string. If you want to keep ALL images or containers, effectively disabling this part of the cleanup, then you should use *:* to match all images. Do not use a bare *, as this will be taken as a filename match.
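The left-anchored matching described above can be sketched in bash as a prefix glob test. This is an assumption about the behaviour, not code taken from run.sh; the helper name matches_left_anchored is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from run.sh: succeeds when NAME matches
# PATTERN as a left-anchored bash glob (the pattern must match from the start
# of the name, but may stop before the end).
matches_left_anchored() {
  local name=$1 pattern=$2
  # Inside [[ ]], an unquoted right-hand side of == is treated as a glob;
  # the trailing * turns it into a prefix match.
  [[ $name == $pattern* ]]
}

for pattern in 'foo/bar:tag' 'foo/b' '*:tag' 'bar:tag' ':tag'; do
  if matches_left_anchored 'foo/bar:tag' "$pattern"; then
    echo "match:    $pattern"
  else
    echo "no match: $pattern"
  fi
done
```

Because the pattern is anchored at the left, bar:tag and :tag do not match foo/bar:tag unless they are prefixed with a wildcard such as *.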

Deployment

The image uses the Docker client to list and remove containers and images. For this reason the Docker client and socket are mapped into the container.

If the /var/lib/docker directory is mapped into the container this script will also clean up orphaned Docker volumes.

Systemd and CoreOS/Fleet

Create a Systemd unit file in /etc/systemd/system/docker-cleanup.service with contents like those below. If you use CoreOS and Fleet, add the X-Fleet section to schedule the unit on all cluster nodes.

[Unit]
Description=Cleanup of exited containers and unused images/volumes
After=docker.service
Requires=docker.service

[Install]
WantedBy=multi-user.target

[Service]
Environment=IMAGE=meltwater/docker-cleanup:latest NAME=docker-cleanup

# Allow docker pull to take some time
TimeoutStartSec=600

# Restart on failures
KillMode=none
Restart=always
RestartSec=15

ExecStartPre=-/usr/bin/docker kill $NAME
ExecStartPre=-/usr/bin/docker rm $NAME
ExecStartPre=-/bin/sh -c 'if ! docker images | tr -s " " : | grep "^${IMAGE}:"; then docker pull "${IMAGE}"; fi'
ExecStart=/usr/bin/docker run \
    -v /var/run/docker.sock:/var/run/docker.sock:rw \
    -v /var/lib/docker:/var/lib/docker:rw \
    --name=${NAME} \
    $IMAGE

ExecStop=/usr/bin/docker stop $NAME

[X-Fleet]
Global=true

Puppet Hiera

Using the garethr-docker module

classes:
  - docker::run_instance

docker::run_instance:
  'cleanup':
    image: 'meltwater/docker-cleanup:latest'
    volumes:
      - "/var/run/docker.sock:/var/run/docker.sock:rw"
      - "/var/lib/docker:/var/lib/docker:rw"

Command Line

docker run \
  -v /var/run/docker.sock:/var/run/docker.sock:rw \
  -v /var/lib/docker:/var/lib/docker:rw \
  meltwater/docker-cleanup:latest

Kubernetes

The repository includes a Kubernetes DaemonSet configuration that lets you run the meltwater/docker-cleanup container on every node of your cluster.

kubectl create -f contrib/k8s-daemonset.yml

Development

A Makefile is included to help with repetitive commands during development.

make help

docker-cleanup's Issues

Unable to clean same unused images from multiple repositories

REPOSITORY                                           TAG                                        IMAGE ID            CREATED             SIZE
autodeploy                                           6ebe3bbe6cbed36f0c69cf87cbfb68aad14c62ba   9ffd283dcfcb        5 minutes ago       190.5 MB
x.x.io:443/ci/autodeploy                    6ebe3bbe6cbed36f0c69cf87cbfb68aad14c62ba   9ffd283dcfcb        5 minutes ago       190.5 MB

Neither image is in use, but docker rmi 9ffd283dcfcb fails because both names share the same image ID. In this case the images should be deleted by name.

2016-03-10T17:40:28.659207597Z => Start to clean 3 images
2016-03-10T17:40:28.679801771Z Error response from daemon: conflict: unable to delete 9ffd283dcfcb (must be forced) - image is referenced in one or more repositories
2016-03-10T17:40:28.680382797Z Error response from daemon: conflict: unable to delete 9ffd283dcfcb (must be forced) - image is referenced in one or more repositories
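One way to approach deletion by name is to map an image ID back to every repository:tag that points at it and remove each name in turn. The sketch below works on a copy of the listing from this report and only prints the commands; names_for_image_id is a hypothetical helper, not part of docker-cleanup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `docker images` style output on stdin and prints
# repository:tag for every row whose IMAGE ID column equals the given ID.
names_for_image_id() {
  local id=$1
  awk -v id="$id" 'NR > 1 && $3 == id { print $1 ":" $2 }'
}

# Sample listing copied from the report above.
listing='REPOSITORY TAG IMAGE ID CREATED SIZE
autodeploy 6ebe3bbe6cbed36f0c69cf87cbfb68aad14c62ba 9ffd283dcfcb 5 minutes ago 190.5 MB
x.x.io:443/ci/autodeploy 6ebe3bbe6cbed36f0c69cf87cbfb68aad14c62ba 9ffd283dcfcb 5 minutes ago 190.5 MB'

# Print the docker rmi commands instead of running them.
names_for_image_id 9ffd283dcfcb <<<"$listing" | while read -r name; do
  echo "docker rmi $name"
done
```

Removing an image by name only drops that tag; the underlying image is deleted once its last tag is gone, which avoids forcing the removal with -f.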

image cleanup based on created time

Need the ability to delete images based on age, after applying the exclusion list.

Allow wildcards in exclude list

It would be very helpful if KEEP_IMAGES could take wildcards, or if specifying a container without a tag implied ALL tags. IE:

steve/foo : Keep steve/foo:latest, steve/foo:v1, steve/foo:v2 etc (same as steve/foo:* )
steve/* : Keep steve/foo:latest, steve/bar:v1, etc (same as steve/*:* )
steve/foo:latest : Only keep steve/foo:latest
steve/*:latest : Keep latest versions of all steve containers; steve/foo:latest, steve/bar:latest, etc

Doing this would allow us to clean up only unused containers that are not created in-house, or all versions of one particular container.

Wrong Docker version on host

Hi, I wanted to use this useful tool on an EC2 instance running the latest CoreOS stable AMI (version 835.11.0, ami-7e72c70d).

The thing is that it comes with Docker 1.8.3:

 ~ $ docker version
Client:
 Version:      1.8.3
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   cedd534-dirty
 Built:        Fri Jan 22 06:07:01 UTC 2016
 OS/Arch:      linux/amd64

Server:
 Version:      1.8.3
 API version:  1.20
 Go version:   go1.4.2
 Git commit:   cedd534-dirty
 Built:        Fri Jan 22 06:07:01 UTC 2016
 OS/Arch:      linux/amd64

When I try to run docker-cleanup with docker run -v /var/run/docker.sock:/var/run/docker.sock:rw -v /var/lib/docker:/var/lib/docker:rw meltwater/docker-cleanup:latest, I get:

Error response from daemon: client is newer than server (client API version: 1.21, server API version: 1.20)
Cannot run docker binary at /usr/bin/docker
Please check if the docker binary is mounted correctly

Is there a way to use an older version of docker-cleanup that matches the Docker version on my host? Do you have any suggestions?

Isn't this a common issue? I don't think everyone is using Docker 1.9.
Thanks a lot!
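The error in this report comes from the client inside the container speaking a newer API version than the host daemon. A minimal sketch of that comparison follows, assuming GNU sort with the -V (version sort) option is available; client_not_newer is a hypothetical helper, and whether setting DOCKER_API_VERSION resolves this particular case is not confirmed by the report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the client API version is not newer than
# the server's. Relies on `sort -V` (version sort) from GNU coreutils.
client_not_newer() {
  local client=$1 server=$2
  # The version that sorts last is the newer one; the client is acceptable
  # when that is the server's version (or both are equal).
  [ "$(printf '%s\n' "$client" "$server" | sort -V | tail -n 1)" = "$server" ]
}

# Versions taken from the error message above.
if client_not_newer 1.21 1.20; then
  echo "compatible"
else
  echo "client is newer than server; consider -e DOCKER_API_VERSION=1.20"
fi
```

Version sort is used rather than a plain numeric comparison so that 1.9 is ordered before 1.20.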

Throwing error 'docker: invalid restart policy -e.'

Docker version 1.9.1, build a34a1d5
docker@default:~$ docker run \
>   --restart \
>   -e KEEP_IMAGES="ubuntu:14.04 corp/important-image:tag" \
>   -v /var/run/docker.sock:/var/run/docker.sock:rw \
>   -v /var/lib/docker:/var/lib/docker:rw \
>   meltwater/docker-cleanup:latest
> docker: invalid restart policy -e.
> See 'docker run --help'.

Permission denied?

With the new release (1.7.0) I'm getting the following error:

$ docker run \
   -v /var/run/docker.sock:/var/run/docker.sock:rw \
   -v /var/lib/docker:/var/lib/docker:rw \
   meltwater/docker-cleanup

exec: "/run.sh": permission denied
docker: Error response from daemon: Container command could not be invoked..

There's no problem with 1.6.0. I'm not sure what changed in run.sh to require extra permissions.

should use `docker volume rm`

Since docker 1.9 introduced the docker volume subcommand, removing volumes with docker-cleanup-volumes.sh only partially removes them. The data is removed, but docker volume ls still lists them.

This has the unfortunate side effect that creating a named volume with the same name as a previously removed one will block, which makes container startup stall when volumes are created implicitly with the -v switch.

Support for overlay

Right now this tool is aimed at vfs. CoreOS uses overlay, which doesn't really work in the same way. At least, changing vfs/dir to overlay and running the script removes all sorts of things, which breaks running containers.

I'm wondering if you investigated that subject already.

Janitor deletes data only containers and images

I had to re-create my entire environment

cannot unmarshal object into Go value of type []string

I'm experiencing a strange issue ATM, where I get the following logs from the containers:

docker is running properly
DEBUG ENABLED
=> Run the clean script every 60 seconds and delay 1800 seconds to clean.
DEBUG: Starting loop
=> Removing unused volumes using native 'docker volume' command
Error response from daemon: json: cannot unmarshal object into Go value of type []string
=> Removing exited/dead containers
Error response from daemon: json: cannot unmarshal object into Go value of type []string
=> Removing unused images
Error response from daemon: json: cannot unmarshal object into Go value of type []string

Here is the Docker version information:

Client:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   a34a1d5
 Built:        Fri Nov 20 17:56:04 UTC 2015
 OS/Arch:      linux/amd64
Server:
 Version:      1.9.1
 API version:  1.21
 Go version:   go1.4.3
 Git commit:   a34a1d5
 Built:        Fri Nov 20 17:56:04 UTC 2015
 OS/Arch:      linux/amd64

The Docker daemon log (/var/log/docker.log) also has these entries:

time="2016-06-13T11:06:38.415267697Z" level=error msg="Handler for GET /v1.21/volumes returned error: json: cannot unmarshal object into Go value of type []string" 
time="2016-06-13T11:06:38.415364606Z" level=error msg="HTTP Error" err="json: cannot unmarshal object into Go value of type []string" statusCode=500 
time="2016-06-13T11:06:38.434171019Z" level=error msg="Handler for GET /v1.21/containers/json returned error: json: cannot unmarshal object into Go value of type []string" 
time="2016-06-13T11:06:38.434265178Z" level=error msg="HTTP Error" err="json: cannot unmarshal object into Go value of type []string" statusCode=500 
time="2016-06-13T11:06:38.451053964Z" level=error msg="Handler for GET /v1.21/containers/json returned error: json: cannot unmarshal object into Go value of type []string" 
time="2016-06-13T11:06:38.451140106Z" level=error msg="HTTP Error" err="json: cannot unmarshal object into Go value of type []string" statusCode=500

Has anybody experienced this issue already?

Thank you very much.

delay on start

It seems that if you use --restart=always, it starts removing other containers that haven't restarted yet. Some extra info in the logs would be nice too; right now there's just the ID, so it's hard to see what happened.

KEEP_VOLUMES

Hi,

I have some cuda (nvidia volumes) that should always stay around. Is there a possibility to keep them?

Avoid mounting /var/lib/docker directory from host

As stated in this article, only the docker daemon should have exclusive access to the configuration directory, /var/lib/docker. Quoting from the article:

The Docker daemon was explicitly designed to have exclusive access to /var/lib/docker. Nothing else should touch, poke, or tickle any of the Docker files hidden there.

Why is that? It’s one of the hard learned lessons from the dotCloud days. The dotCloud container engine worked by having multiple processes accessing /var/lib/dotcloud simultaneously. Clever tricks like atomic file replacement (instead of in-place editing), peppering the code with advisory and mandatory locking, and other experiments with safe-ish systems like SQLite and BDB only got us so far; and when we refactored our container engine (which eventually became Docker) one of the big design decisions was to gather all the container operations under a single daemon and be done with all that concurrent access nonsense.

(Don’t get me wrong: it’s totally possible to do something nice and reliable and fast involving multiple processes and state-of-the-art concurrency management; but we think that it’s simpler, as well as easier to write and to maintain, to go with the single actor model of Docker.)

This means that if you share your /var/lib/docker directory between multiple Docker instances, you’re gonna have a bad time. Of course, it might work, especially during early testing. “Look ma, I can docker run ubuntu!” But try to do something more involved (pull the same image from two different instances…) and watch the world burn.

I think binding the unix socket should be enough, and the right way of doing it - by only executing docker commands and not inspecting the configuration files.

Script attempts to remove non-local docker volume (noop, but still)

Take volume output like the following:

$ docker volume ls
DRIVER              VOLUME NAME
rancher-secrets     5e41a869acdb6a76c0cbf3212a358f72f96254b94bb3eebeb28d04b1171f3708
rancher-secrets     5ffeab5ad23e1f15705ade81b41677d6875f9cca5b33efde8149a43f6904853b
rancher-secrets     8c0744a2e9c3b15fcd42225b140716d6848bbfb01754de17f08d373cbd0f3aed
rancher-secrets     9774cc2270aad64b81cfa52db62d76ceebed4d2b183fa0c0ade34d2f3bcc6cc7
rancher-secrets     c7b8416d75b3baa24cb500e28e3e1a29ce7df3ef530496b5d087ec9c0bef6739
rancher-nfs         dev-mysql-etc
local               f258cd3613f345121bafeb1643dd0849d698cb9058fecf25506a2dcf2998da77
rancher-secrets     f3ac3d0bf7d9484b2483d7eafba72c4be204feeb648bb40b760855301e1e92ba
rancher-nfs         openvpn-etc
rancher-nfs         postfix-log
rancher-nfs         postfix-spool
local               rancher-agent-state
local               rancher-cni
local               rancher-cni-driver
rancher-nfs         runner01-etc
rancher-nfs         runner02-etc
rancher-nfs         sq-content
rancher-nfs         sq-content-dev

Every one of the non-local volumes is managed by a volume driver and cannot be managed by docker volume, yet the script still identifies them as unused volumes and attempts to remove them.

This can be fixed by adding the filter of driver=local to the command on line 85 as follows:

docker volume ls -f driver=local -f dangling=true -q
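The effect of restricting the cleanup to the local driver can be sketched offline against a shortened copy of the listing above. The helper local_volume_names is hypothetical and only parses text; it does not talk to Docker.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `docker volume ls` style output on stdin and
# prints only the names of volumes whose driver is "local".
local_volume_names() {
  awk 'NR > 1 && $1 == "local" { print $2 }'
}

# A few rows copied from the listing in this report.
sample='DRIVER              VOLUME NAME
rancher-nfs         dev-mysql-etc
local               rancher-agent-state
local               rancher-cni
rancher-nfs         openvpn-etc'

local_volume_names <<<"$sample"
```

Volumes owned by the rancher-nfs driver are skipped, so a cleanup pass built on this list would never try to remove them.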

Tag suggestion for Docker Hub project

Hi,
I am Kang Yin, a graduate student of Institute of Software, Chinese Academy of Sciences. Now we are doing a research on how to recommend tags for Docker Hub’s projects. We applied text mining and natural language processing to build a tag recommendation system for Docker projects.
We notice that you have created a repository on Docker Hub, which is named meltwater/docker-cleanup and the project address is https://hub.docker.com/r/meltwater/docker-cleanup/
Since developers know their projects best, we want to evaluate our recommendation results with your help as the project developer. We want to know if the recommended tags are reasonable for your project.
The following tags (ranked by order) are generated from our model automatically. Would you like to do me a favor and reply with what tags are reasonable (Good) and what are not (Bad), in form of, “Good tags: ***, ***; Bad tags: ***, ***”.

The recommended tags for your project meltwater/docker-cleanup are listed as follows:

exited, cleanup, unused, docker-cleanup, clean, docker-cleanup-volumes, defaults, seconds, volumes, ubuntu

It would be a great help if you could give us some feedback.
Thank you so much for your precious time.

Containers getting incorrectly cleaned up?

I'm a bit confused as to how KEEP_CONTAINERS and KEEP_CONTAINERS_NAMED are supposed to work together. We were running this docker container (using the Rancher catalog stack) with the following env vars set:

KEEP_CONTAINERS = "*:*"
KEEP_CONTAINERS_NAMED = "**None**"
KEEP_IMAGES = "/rancher"

Based on that config and the docs I would expect no containers to ever get cleaned up (each container's image should match the KEEP_CONTAINERS rule). However, we were seeing some start-once, data-only containers get cleaned up.

Limit disk usage with LRU

Feature request: Instead of deleting all unused images, have a pool size that evicts based on LRU.

This would be really useful when nodes are running all sorts of images but we would still like fast starts when.

Context: We run periodic jobs through a set of nodes. We have a lot of images and the nodes don't have a large attached disk. I would like to keep the last 16GB of recently used images, but remove all others to be able to run new jobs.
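The eviction policy requested here can be sketched as a size budget applied to images ordered by last use. Docker does not record a last-used time for images, so the input format below ("name size_mb last_used_epoch", one image per line) is purely an assumption for illustration, as is the helper name lru_evict.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads "<name> <size_mb> <last_used_epoch>" lines on
# stdin and prints the names that fall outside the budget, keeping the most
# recently used images first.
lru_evict() {
  local budget_mb=$1
  # Newest first, then accumulate sizes; anything past the budget is evicted.
  sort -k3,3nr | awk -v budget="$budget_mb" '
    { used += $2 }
    used > budget { print $1 }
  '
}

# Three 10 MB images with a 20 MB budget: the least recently used one is evicted.
printf '%s\n' 'a 10 100' 'b 10 300' 'c 10 200' | lru_evict 20
```

Once the running total exceeds the budget, every older image is listed for removal, so the kept set is always the most recently used prefix.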

allow non-looped cleanups

hey there!
Is it possible to run this script "one-shot" ?
I'd like to run it, and once finished, exit.

Why not use docker commands with a clean-up Jenkins job? Not an issue - just a thought

I like this idea. We initially thought of something like this for our CI box, but we would need a shell script to clean up this container, which might be dangling.

We ended up creating a new Jenkins job which runs weekly with the following container commands:
docker rm -f $(docker ps -aq)
docker rmi -f $(docker images -q)

We don't have any requirement to keep any specific containers running, but thanks for providing this code!

License

Please specifically include the MIT License from chadoe/docker-cleanup-volumes (https://github.com/chadoe/docker-cleanup-volumes/blob/master/LICENSE).

sleep: invalid number ''3600''

Hi, when I change the default value of DELAY_TIME I get the following error in my logs. What could be causing this?

This is my unit file:

[Unit]
Description=Cleanup of exited containers and unused images/volumes
After=docker.service
Requires=docker.service

[Install]
WantedBy=multi-user.target

[Service]
Environment=IMAGE=meltwater/docker-cleanup:latest NAME=docker-cleanup

# Allow docker pull to take some time
TimeoutStartSec=600

# Restart on failures
KillMode=none
Restart=always
RestartSec=15

ExecStartPre=-/usr/bin/docker kill $NAME
ExecStartPre=-/usr/bin/docker rm $NAME
ExecStartPre=-/bin/sh -c 'if ! docker images | tr -s " " : | grep "^${IMAGE}:"; then docker pull "${IMAGE}"; fi'
ExecStart=/usr/bin/docker run \
    -e CLEAN_PERIOD='3600' \
    -e DELAY_TIME='3600' \
    -v /var/run/docker.sock:/var/run/docker.sock:rw \
    -v /var/lib/docker:/var/lib/docker:rw \
    --name=${NAME} \
    $IMAGE

ExecStop=/usr/bin/docker stop $NAME
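The error message in the title suggests that the quote characters reached the container as part of the value, so sleep received '3600' rather than 3600. A small check of that kind of input is sketched below; is_seconds is a hypothetical helper and not part of run.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only when the value is a non-negative integer.
is_seconds() {
  [[ $1 =~ ^[0-9]+$ ]]
}

for value in "3600" "'3600'"; do
  if is_seconds "$value"; then
    echo "ok:       $value"
  else
    echo "rejected: $value (the quotes are part of the value)"
  fi
done
```

If this is what happened, writing the options without quotes in the unit file, for example -e DELAY_TIME=3600, would pass a plain number.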

Does not remove images without -f parameter for docker rmi command

I found that it does not remove any images. If -f is added to the docker rmi command, the images are removed correctly. The reason given was something like "because of tags try -f".

Use labels instead of container names

https://docs.docker.com/compose/compose-file/#labels

i.e.:

labels:
  com.janitor.ephemeral: true
  com.janitor.dependsOn: [ 'container-name', 'container-name']