Comments (17)
/committee steering
as this has budget / cncf related items
from registry.k8s.io.
Jeefy lizard brain policy idea:
Non-Prod images: Age out after 9 months
Prod images: Age out after release-EOL + 1y (so if 1.27 aged out Jan 2024, it would get culled Jan 2025)
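The proposed policy above can be sketched in a few lines. This is purely illustrative: the constants and the example date mirror the comment, and `prod_cull_date` is a hypothetical helper, not actual project tooling.

```python
from datetime import date

NON_PROD_MAX_AGE_MONTHS = 9  # non-prod images age out after 9 months
PROD_GRACE_YEARS = 1         # prod images age out at release EOL + 1 year

def prod_cull_date(eol: date) -> date:
    """Date a production release's images would be culled under this idea."""
    return eol.replace(year=eol.year + PROD_GRACE_YEARS)

# Per the example above: a release that aged out in Jan 2024
# would get culled in Jan 2025.
print(prod_cull_date(date(2024, 1, 31)))  # 2025-01-31
```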
I am broadly in support of this. I don't think it's a reasonable ask for the project to host all releases for all time.
Do we have any data on which versions are being pulled?
I know there are 3rd party reports available (e.g. datadog report) that show 1.21 is still the most common version deployed right now, I would want to make sure we take that into account.
Right now I'm leaning towards EOL-9 (3 years), but would want some data before making any decision.
EDIT: Even without data I think we should remove the google-containers images...oof
/sig testing
/kind cleanup
Some additional points:
- We have a lot of images that are not related to Kubernetes releases but may have their own release timelines. I think we should maybe just set an N-year policy, where N is relatively generous but still gives us room to stop perma-hosting.
- The mechanism to remove images needs some thinking:
  - The source of truth for which images are available is in the https://github.com/kubernetes/k8s.io repo, and https://github.com/kubernetes-sigs/promo-tools copies them to the backing registries (so to be clear, no changes will happen in this repo; it's a multi-repo problem, and I figured visibility might be best here, but we need to forward this to more people).
  - We might have to cull them from the image promoter manifests somehow, and then start to permit auto-pruning things that are in production storage but not in the manifests. Automating the policy to drop things from the source manifests sounds tricky, but I'm not sure there's a more reasonable approach. The implementation details need input from promo-tools maintainers and will probably at least somewhat drive the viable policy. cc @puerco @kubernetes/release-engineering (EDIT: see kubernetes-sigs/promo-tools#719)
- IMHO, despite needing to consider the mechanics, we should decide whether we're doing this, settle on a reasonable policy, and start communicating ahead of actually implementing it. It may take time to staff these changes, but communicating sooner, alongside the new registry, would benefit users.
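The "prune what's in storage but not in the manifests" idea above boils down to a set difference. A minimal sketch, with made-up tag names and no real promo-tools APIs assumed:

```python
def images_to_prune(registry_tags, manifest_tags):
    """Anything live in the backing registry but absent from the
    promoter manifests becomes a pruning candidate."""
    return sorted(set(registry_tags) - set(manifest_tags))

# Hypothetical example: 1.20 has been dropped from the manifests.
registry = ["kube-apiserver:v1.20.0", "kube-apiserver:v1.27.0"]
manifests = ["kube-apiserver:v1.27.0"]
print(images_to_prune(registry, manifests))  # ['kube-apiserver:v1.20.0']
```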
EDIT: Even without data I think we should remove the google-containers images...oof
Yeah. That also just hasn't happened, due to lack of policy. We did the flip to k8s.gcr.io on July 24th, 2020. kubernetes/release#270 (comment)
/sig testing
While I'm sure sig-testing is happy to support this effort, I'd suggest that the policy is a combination of k8s-infra (what k8s-infra is willing to fund resources for generally) and release (particularly around promo-tools support for this and the Kubernetes release timeframe).
/sig k8s-infra
/sig release
/remove-sig testing
Technical aside: when we serve layer redirects, we then get an option to serve a Warning
header alongside the redirect.
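A sketch of what that could look like: a redirect response carrying a `Warning` header (the `299` code is "miscellaneous persistent warning" from RFC 7234 §5.5). The handler name and warning text are illustrative, not the actual registry.k8s.io code.

```python
from http import HTTPStatus

def layer_redirect_headers(blob_url: str) -> tuple[int, dict]:
    """Build a redirect to the backing blob store, with a deprecation
    warning the client can surface to the user."""
    headers = {
        "Location": blob_url,
        # 299 = miscellaneous persistent warning (RFC 7234 section 5.5)
        "Warning": '299 - "this image is scheduled for removal"',
    }
    return HTTPStatus.TEMPORARY_REDIRECT, headers
```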
One additional complication for people to consider: We more commonly host our own base images at this point, building old commits from source will become more challenging (but not necessarily impossible*) if we age out those images.
* non-reproducible builds may be an issue, e.g. the "debian-base" image.
Posted this on the linked issue, but maybe it's better here:
The cost reduction would primarily be because we would break people, and encourage them to upgrade (?)
I think other OSS projects follow a similar strategy; e.g. the "normal" Debian APT repos don't work with old releases, but there is a public archive.
I don't know the actual strategy for when Debian releases move to the archive. For Kubernetes, if we support 4 versions (?), we probably want to keep at least 5 versions "live" so that people aren't forced to upgrade all their EOL clusters at once, but no more than 8 "live" so that people are nudged to upgrade. So I come up with a range of 5-8 releases if we wanted to split off old releases, and I can imagine a case for any of those values.
For kubernetes, if we support 4 versions (?), we probably want to keep at least 5 versions "live" so that people aren't forced to upgrade all their EOL clusters at once, but we probably want to keep no more than 8 versions "live" so that people are nudged to upgrade.
This is bringing up a very good point. We have to keep in mind that you can't skip Kubernetes versions when upgrading, otherwise you'll go against the skew policy and eventually break the cluster. Let's say that we remove images for everything up to 1.22, but someone is using 1.20. They don't have a way to upgrade their cluster to a newer version, because they have to start by upgrading to 1.21, which wouldn't exist any longer. This is a very problematic scenario because the only way forward is more-or-less to start from scratch, and that's unacceptable for many.
We need to be a bit generous here. I agree with @justinsb that we should target 5-8 releases. I'd probably go closer to 8.
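The skew constraint above means the required upgrade path is one minor at a time, so removing any minor on that path strands clusters behind it. A toy sketch (minor numbers only, helper name made up):

```python
def upgrade_path(current_minor: int, target_minor: int):
    """Minors you must pass through, one at a time, per the skew policy."""
    return [f"1.{m}" for m in range(current_minor + 1, target_minor + 1)]

# A 1.20 cluster heading to 1.23 must pull 1.21 and 1.22 images first;
# if those are gone, the path is broken at the first step.
print(upgrade_path(20, 23))  # ['1.21', '1.22', '1.23']
```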
These are good points.
I think we should probably be implementing policy in terms of time though, both to be more manageable to implement and because we have many images that are not part of Kubernetes releases.
If we consider the lifespan of releases but put it in terms of time, we could say e.g. "3 years after publication", which would be 5-8 releases (since there are three releases per year, each supported for one year).
I think we should set a more aggressive timeline going forward, say starting with 1.27 we'll host artifacts for 1 year after they hit end of support. I'm not sure how we best handle older things, but things like 1.21 are still being heavily used. If we said "5" releases, that would still fall into that window pretty soon.
If we have to choose between the health of the project overall (CI infra, etc.) and impacting people with those older versions, I think we unfortunately have to choose the health of the project :( Can we provide some extraordinary mechanism for people to pull tarballs of older stuff, plus instructions on how to put those into a registry, like some sort of cold storage for folks?
💯 to "starting with 1.27 we'll host artifacts for 1 year after they hit end of support".
Can we provide some extraordinary mechanisms for people to pull maybe tarballs of older stuff and some instructions on how to put those into a registry, like some sort of cold storage for folks?
What if we keep the latest patch release for each minor? For example, 1 year after reaching the EOL date, we keep images for the latest patch release, but delete all other images. Eventually, we can remove the latest patch release, let's say, 3-4 years after the EOL date. That will reduce storage and (hopefully) bandwidth costs, but at the same time, it shouldn't break any clusters or workflows.
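The "keep the latest patch release for each minor" idea above is easy to sketch: group tags by minor version and retain only the highest patch. Tag values here are made up for illustration.

```python
def latest_patch_per_minor(tags):
    """Keep only the newest patch release of each minor version,
    e.g. for culling everything else once a minor is past EOL."""
    best = {}
    for tag in tags:
        major, minor, patch = (int(x) for x in tag.lstrip("v").split("."))
        key = (major, minor)
        if key not in best or patch > best[key]:
            best[key] = patch
    return sorted(f"v{M}.{m}.{p}" for (M, m), p in best.items())

print(latest_patch_per_minor(["v1.21.0", "v1.21.14", "v1.22.3"]))
# ['v1.21.14', 'v1.22.3']
```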
The source code will remain at https://github.com/kubernetes/kubernetes/releases , so it's not as if people who need older versions can't run make release themselves ;)
Technical capabilities aside, the ideal set would be, IMHO:
- Everything for the last 4 releases
- Just the final patch release for the 4 releases before that, just to enable upgrading
If we had the ability, I expect there are probably lots of other images from subprojects, etc., that we could purge for older releases much more aggressively.
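The two-tier retention scheme proposed above can be stated as a tiny classifier. The tier names and helper are hypothetical, purely to make the proposal concrete:

```python
def retention_class(minor: int, current_minor: int) -> str:
    """Tier a minor release per the proposal: keep everything for the
    last 4 releases, only the final patch for the 4 before that."""
    age = current_minor - minor
    if age < 4:
        return "keep-everything"
    if age < 8:
        return "keep-final-patch-only"
    return "purge-candidate"

# With 1.27 current (illustrative):
print(retention_class(25, 27))  # keep-everything
print(retention_class(21, 27))  # keep-final-patch-only
print(retention_class(18, 27))  # purge-candidate
```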
Thanks all for all the input and suggestions!
The source code will remain at https://github.com/kubernetes/kubernetes/releases , so it's not as if people who need it can't run make release themselves ;)
That's not sufficient, because the base images typically cannot just be re-built from source, and the source code references base images. Those base images are a snapshot of various packages at a point in time.
We also sometimes run binary builds in containers (see kubernetes/kubernetes), where the build-environment image is also an important point-in-time snapshot for reproducing a source build.
We've been discussing this further in various forums, and on further review I think the cost angle is going to be nearly irrelevant once we have moved traffic off k8s.gcr.io and onto registry.k8s.io.
Storage costs are not large enough to matter. Image storage de-dupes pretty well and is compressed; in real numbers we're actually looking at less than 2 TB currently, even after ~8 years. 2 TB costs something like $50/mo to store in GCS standard tier, for example.
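Back-of-the-envelope check of that "$50/mo" figure. The per-GB price here is an assumed GCS standard-tier list price (roughly $0.026/GB-month for multi-region at the time of writing), not an official quote:

```python
GCS_STANDARD_USD_PER_GB_MONTH = 0.026  # assumed multi-region list price

def monthly_cost_usd(terabytes: float) -> float:
    """Rough monthly storage cost for the given volume in GCS standard tier."""
    return terabytes * 1000 * GCS_STANDARD_USD_PER_GB_MONTH

print(f"${monthly_cost_usd(2):.0f}/mo")  # in the ballpark of the $50/mo cited
```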
Bandwidth costs matter deeply, but only for content people actually use, and they're going to be a lot more manageable on registry.k8s.io (e.g. >50% of requests to k8s.gcr.io came from AWS, and we're now serving that content out of AWS, so no egress).
For the tooling "needs to iterate all of this" angle: I've been working on a new tool that needs to scan all the images over the past few days, and it's actually quite doable to optimize this even as the set of images grows. AR (Artifact Registry) provides a cheap API to list all digests in a repo. I think we can teach the promoter tools to avoid repeatedly fetching content-addressable data we've already processed.
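The optimization described above leans on content-addressability: a digest, once processed, never changes, so each scan only needs to fetch digests it hasn't seen. A minimal sketch with made-up digest values:

```python
def digests_to_fetch(all_digests, processed):
    """Content-addressable data never changes under a given digest,
    so only digests not yet processed need fetching on each scan."""
    return [d for d in all_digests if d not in processed]

seen = {"sha256:aaa"}
print(digests_to_fetch(["sha256:aaa", "sha256:bbb"], seen))  # ['sha256:bbb']
```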
I think disappearing base images in particular is going to cause more harm than benefit ...
We're also really ramping up user migration, so I think we've missed the window for "let's get a policy in place before people use it", and we've already forced users to adapt to a new registry twice in the past few years, so I think we can just introduce a policy later if we wind up needing it.
@mrbobbytables' outreach to subprojects and other CNCF projects to migrate, plus other pending cost-optimization efforts, are probably a better use of time at the moment.
There are other issues tracking things like potentially sunsetting k8s.gcr.io, they're tracked in the general k8s-infra repo at https://github.com/kubernetes/k8s.io