Comments (10)
@solsson I've tried to rephrase the reason for having pzoo and zoo below. Let me know what you think:
AFAICT, there are at least two types of failures for which there should be some protection.
- Software errors: This is where something goes wrong with a Zookeeper pod that results in it going down. There is nothing wrong with the underlying infrastructure.
- Infra errors: The underlying AWS/cloud infrastructure went down.
If there are 3 AZs, the 5 ZK pods are spread across these 3 AZs. If an AZ goes down, there is little benefit in having 5 ZK pods, since the AZ that went down could take 2 ZK pods with it. The ZK cluster is then 1 more failure away from being unavailable. The situation would be the same if there were only 3 ZK pods and 1 AZ went down.
However, for software errors, each pod could go down by itself and having 5 ZK nodes helps because it can tolerate 2 individual pod failures (instead of 1 in the 3ZK case).
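The arithmetic behind the 3-versus-5 comparison can be sketched as follows. This is a minimal illustration that assumes a plain majority quorum; the function names are made up for the example.

```sh
#!/bin/sh
# Majority quorum size for an ensemble of n voting members.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# Number of members that can fail while a majority is still available.
tolerated_failures() {
  q=$(quorum "$1")
  echo $(( $1 - q ))
}

for n in 3 5; do
  echo "ensemble=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With 3 pods the sketch reports 1 tolerated failure, and with 5 pods it reports 2, matching the numbers above.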
While having only 3 EBS volumes instead of 5 does keep costs low, to avoid confusion, it would be better to have a single statefulset of pzoo with 5 nodes.
from kubernetes-kafka.
It was introduced in #34 and discussed in #26 (comment).
The case for this has weakened though, with increased support for dynamic volume provisioning across different kubernetes setups, and with this setup being used for heavier workloads. I'd prefer if the two statefulsets could simply be scaled up and down individually. For example, if you're on a single zone you don't have the volume portability issue. In a setup like #118 with local volumes, however, it's quite difficult to ensure quorum capabilities on single node failure.
Unfortunately Zookeeper's configuration is static prior to 3.5, which is in development. Adapting to initial scale would be doable, I think. For example, the init script could use the Kubernetes API to read the desired number of replicas for both StatefulSets and generate the server.X strings accordingly.
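A rough sketch of what generating the server.X strings from two replica counts could look like. The function name, the `<set>-<ordinal>.<set>` host pattern and the 2888/3888 ports are assumptions for illustration, not taken from the repo's init script.

```sh
#!/bin/sh
# Print one server.X line per replica, pzoo pods first, then zoo pods.
# Usage: gen_servers <pzoo_replicas> <zoo_replicas>
gen_servers() {
  id=1
  for spec in "pzoo:$1" "zoo:$2"; do
    name=${spec%%:*}    # StatefulSet name, also assumed to be the headless service name
    count=${spec##*:}   # number of replicas for that StatefulSet
    i=0
    while [ "$i" -lt "$count" ]; do
      # Assumed host pattern and conventional ZooKeeper peer/election ports.
      echo "server.$id=$name-$i.$name:2888:3888"
      id=$((id + 1))
      i=$((i + 1))
    done
  done
}

gen_servers 3 2
```

In an actual init script the two counts would come from the Kubernetes API instead of being passed as literals.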
Hi @solsson. I have the same confusion. I've read your comment on the other issue, but it didn't help me understand why two nodes are using emptyDir rather than a persistent volume. Could you elaborate a little more on the scenarios in which they would be useful? How does it compare to using persistent volumes for all 5 nodes? I'm running my Kubernetes cluster on AWS, with 6 worker nodes spread across 3 availability zones. Thanks.
Good that you question this. The complexity should be removed if it can't be motivated. I'm certainly prepared to switch to all-persistent Zookeeper.
The design goal was to make the persistent layer as robust as the services layer. Probably not as robust as bucket stores or 3rd party hosted databases, but same uptime as your frontend is good enough.
Thus workloads will have to migrate in the face of lost availability zones, like non-stateful apps will certainly do with Kubernetes. I recall https://medium.com/spire-labs/mitigating-an-aws-instance-failure-with-the-magic-of-kubernetes-128a44d44c14 "a sense of awe watching the automatic mitigation".
Unless you have a volume type that can migrate, the problem is that stateful pods will only start in the zone where the volume was provisioned. With both 5 and 7 node zk across 3 zones, if a zone with 2 or 3 zk pods respectively goes out, you're one pod away from losing a majority of your zk. My assumption is that lost majority means your service goes down. Zone outages can be extensive, as in the AWS case above, and due to zk's static configuration you can't reconfigure to adapt to the situation, as that would itself cause the loss of one more pod.
With kafka brokers you can throw money at the problem: increase your replication factor. With zk you can't. Or maybe you can, with scale=9?
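To put numbers on the zone-outage case, here is a small sketch. It assumes a plain majority quorum and that pods are spread as evenly as possible over the zones; the function name is hypothetical.

```sh
#!/bin/sh
# Spare failures left after the fullest zone is lost.
# Usage: slack_after_zone_loss <ensemble_size> <zones>
slack_after_zone_loss() {
  n=$1
  zones=$2
  largest=$(( (n + zones - 1) / zones ))  # pods in the fullest zone (ceiling division)
  quorum=$(( n / 2 + 1 ))                 # majority of the configured ensemble
  echo $(( n - largest - quorum ))
}

for n in 5 7 9; do
  echo "ensemble=$n zones=3 spare_failures_after_zone_loss=$(slack_after_zone_loss "$n" 3)"
done
```

Under these assumptions, 5 and 7 nodes are left with no spare failures after a zone outage, while 9 nodes spread 3/3/3 would keep one.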
> While having only 3 EBS volumes instead of 5 does keep costs low, to avoid confusion, it would be better to have a single statefulset of pzoo with 5 nodes.
@shrinandj I think I agree at this stage. What would be even better, in particular now (unlike in the k8s 1.2 days) that support for automatic volume provisioning can be expected, would be to support scaling of the zookeeper statefulset(s). That way everyone can decide for themselves, and we can default to 5 persistent pods. It should be quite doable in the init script, by retrieving the desired number of replicas with kubectl. I'd be happy to accept PRs for such things.
Can you elaborate a bit on that?
- The default will be a statefulset with 5 pods.
- Users can scale this up if needed by simply increasing the number from 5 to whatever using kubectl scale statefulsets pzoo --replicas=<new-replicas>, which should create the new PVCs and then run the pods.
What changes are required in the init script?
Sounds like a good summary, and my ideas for how are sketchy at best. Sadly(?) this repo has come of age already and needs to consider backwards compatibility. Hence we might want a multi-step solution:
- Add volume claims to the zoo statefulset, keep the init script as is.
- Add an ezoo (ephemeral) statefulset as a copy of the "old" zoo, for the multi-zone frugal use case, but with replicas=0.
- Include the above in a kubernetes-kafka release.
- Add a branch (for evaluation by those who dare) that generates the server entries based on kubectl -n kafka get statefulset zoo -o=jsonpath='{.status.replicas}' (and equivalent for pzoo - deprecated - and ezoo).
- If this is looking good, change defaults to replicas=5 for zoo and replicas=0 for pzoo+ezoo, with a documented migration procedure in release notes.
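As a sketch of how such a branch might read the replica counts, the snippet below wraps the kubectl query quoted in the steps above. The `replicas_of` helper and the `KUBECTL` override variable are hypothetical, and falling back to 0 when the query fails is an assumption. Note that `.status.replicas` reports pods already created by the controller, while `.spec.replicas` holds the desired count, so the latter may be the better fit for initial scale.

```sh
#!/bin/sh
# Command used to query the API; overridable so the helper can be tried without a cluster.
KUBECTL=${KUBECTL:-kubectl}

# Print the replica count of a StatefulSet in the kafka namespace,
# or 0 if it cannot be read (for example, the StatefulSet does not exist).
replicas_of() {
  n=$("$KUBECTL" -n kafka get statefulset "$1" -o=jsonpath='{.status.replicas}' 2>/dev/null) || n=0
  case "$n" in
    ''|*[!0-9]*) n=0 ;;   # treat empty or non-numeric output as zero replicas
  esac
  echo "$n"
}

total=0
for sts in zoo pzoo ezoo; do
  count=$(replicas_of "$sts")
  echo "$sts replicas=$count"
  total=$((total + count))
done
echo "server entries to generate: $total"
```

Treating a missing statefulset as zero replicas would let the same script work before and after pzoo or ezoo are added or removed.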
@solsson I understand that the steps mentioned above are needed due to backwards compatibility, but in case I want 5 pzoos I just need to change the replicas to 5 and remove the zoo statefulset, right?
@AndresPineros You'll also need to change the server.4 and server.5 lines in 10zookeeper-config.yml and prepend the p.
See #191 (comment) for the suggested way forward.