kubernetes-failure-stories's Introduction

kubernetes-failure-stories's People

Contributors

aasiutin, charlieegan3, cruschke, dmitri-lerko, elpicador, ereli-cb, fayizk1, gjtempleton, hjacobs, hrzbrg, johnlunney, jtolio, kjgorman, kubaj, medyagh, philipithomas, pieterlange, povilasv, raffo, salilgupta1, srvaroa, tomwilkie, yashmehrotra

kubernetes-failure-stories's Issues

Failure story length and language

Hi there,

Is there a lower limit on how long a failure story needs to be? For example, would an Ingress traffic outage caused by a misconfigured Load Balancer Service already count?

And what are the language requirements? English only? I published the failure story mentioned above on my German blog, but would still like to contribute it.

Regards
Marcel

Failure story: hub.docker.com slow (5-10kB/s)

Short, easy, small story, but it got on my nerves, as I had to cancel a dinner.
In the end, it changed a major decision for my company. All looked good until this failure... I decided to delay our k8s-based hosting offerings for now :|

I used k8s cluster(s) on AWS, provisioned with kops.

Friday, 5 PM. Last task remaining for the week: change the instance size(s) of the k8s cluster. All services are correctly distributed over N nodes, so what could possibly go wrong?

The culprit: hub.docker.com's CDN. I'm not sure where it's hosted, but for some reason it became extremely slow from my AWS region, with downloads of ca. 5-10 kB/s. It worked like a charm from other AWS regions and from non-AWS datacenters; it just did not work for me.

So... I had to cancel my evening plans (and the cluster became unavailable), because:

  • each node tries to start for >30 minutes, then fails and is re-added. Basic k8s services can't start, as container images can't be downloaded in a reasonable time. There's no quick failure: each time a node tries to start, pulling images is extremely slow and eventually times out
  • I had a "kops update" running in a terminal window on my local workstation. I found no information on whether I can safely interrupt this operation, and it would disconnect if I unplugged the network cable from my laptop.

Solution:

  • cancel your dinner
  • wait a few hours until the CDN bandwidth stabilizes
  • rethink many, many times whether our company should offer production k8s services...

Group by k8s release, etcd release, etc.

Group all reports by the versions/environment they happened on, similar to bug reports. Also attach links to the related bug reports and the release each bug was fixed in (perhaps in the form of a table).

These reports go stale very quickly with new releases, so it would be very helpful if I could trace which ones apply to me based on component versions.
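A minimal sketch of how this could work, assuming each story were tagged with hypothetical metadata fields (k8s_version, etcd_version, fixed_in, bug_links). None of these fields exist in the repository today, and the story titles and versions in the example are made-up placeholders:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


def parse_version(v: str) -> tuple[int, ...]:
    """Turn "v1.12.3" or "1.12.3" into (1, 12, 3) so versions compare numerically."""
    return tuple(int(part) for part in v.lstrip("v").split("."))


@dataclass
class StoryMeta:
    # Hypothetical per-story metadata; not an existing format in the repository.
    title: str
    k8s_version: str                  # Kubernetes version the incident happened on
    etcd_version: Optional[str] = None
    fixed_in: Optional[str] = None    # release that fixed the underlying bug, if known
    bug_links: list[str] = field(default_factory=list)


def may_apply(story: StoryMeta, my_k8s_version: str) -> bool:
    """A story may still apply if no fix is known or my version predates the fix."""
    if story.fixed_in is None:
        return True
    return parse_version(my_k8s_version) < parse_version(story.fixed_in)


def as_table(stories: list[StoryMeta]) -> str:
    """Render the metadata as a Markdown table, one row per story."""
    rows = ["| Story | K8s version | etcd version | Fixed in |", "|---|---|---|---|"]
    for s in stories:
        rows.append(
            f"| {s.title} | {s.k8s_version} | {s.etcd_version or '?'} | {s.fixed_in or 'unknown'} |"
        )
    return "\n".join(rows)


if __name__ == "__main__":
    # Placeholder stories with made-up versions, for illustration only.
    stories = [
        StoryMeta("Example story A", "1.10.2", fixed_in="1.11.0"),
        StoryMeta("Example story B", "1.12.1"),
    ]
    print([s.title for s in stories if may_apply(s, "1.11.4")])
    print(as_table(stories))
```

On a hypothetical 1.11.4 cluster, only "Example story B" is listed. Story A's bug was fixed in 1.11.0, and story B has no known fix. Comparing versions as integer tuples avoids the string-comparison trap where "1.9" would sort after "1.10".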

Idea: annotate keywords/topics and contributing factors?

The list of failure stories is still pretty short, but it might already make sense to add more information, such as keywords hinting at possible contributing factors. This would let readers find relevant information more easily, e.g.:

  • "I saw problems with the kubelet connecting to the API server, let's look at the kubelet / dynamic ELB IPs outage post"
  • "I saw DNS issues in our cluster, let's see what the incident reports tagged with the keyword DNS have to say"
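One possible shape for this is sketched below under loud assumptions. Each story is tagged by hand with a set of free-form keywords, and a small inverted index maps each keyword back to the stories carrying it. The titles and tags are hypothetical placeholders, not existing repository metadata:

```python
from __future__ import annotations

from collections import defaultdict


def build_index(stories: dict[str, set[str]]) -> dict[str, list[str]]:
    """Invert {story title: keywords} into {lower-cased keyword: sorted story titles}."""
    index: defaultdict[str, list[str]] = defaultdict(list)
    for title, keywords in stories.items():
        for kw in keywords:
            index[kw.lower()].append(title)
    return {kw: sorted(titles) for kw, titles in index.items()}


def find(index: dict[str, list[str]], keyword: str) -> list[str]:
    """Case-insensitive lookup; unknown keywords yield an empty list."""
    return index.get(keyword.lower(), [])


if __name__ == "__main__":
    # Hypothetical titles and tags, loosely modeled on the two examples above.
    stories = {
        "kubelet / dynamic ELB IPs outage": {"kubelet", "API server", "ELB"},
        "Cluster DNS incident": {"DNS"},
    }
    index = build_index(stories)
    print(find(index, "kubelet"))
    print(find(index, "dns"))
```

Lower-casing keywords at both index and lookup time means "DNS" and "dns" find the same reports. A real list would probably also want a curated keyword vocabulary so that near-duplicates like "ELB" and "load balancer" don't split the results.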
