Comments (18)
Well, @jmccormick2001, I was asking because I'm looking for an alternative solution and want to understand its benefits compared to mine. Please let me know whenever you have something ready... I'm all into this topic! Thanks :)
from postgres-operator.
right now there is only a full backup (using pg_basebackup)...the other forms of backup and a scheduling capability are in the works too but they will likely lag behind the failover and policy releases.
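For readers unfamiliar with pg_basebackup, here is a minimal sketch of how a full base backup command line could be assembled. The host, user, and target directory are hypothetical placeholders, and this is not necessarily how the operator invokes the tool; only standard pg_basebackup flags are used.

```python
import shlex
from typing import List


def basebackup_command(host: str, user: str, target_dir: str,
                       port: int = 5432, compress: bool = True) -> List[str]:
    """Build an argv list for a full base backup with pg_basebackup."""
    cmd = [
        "pg_basebackup",
        "-h", host,          # server to back up
        "-p", str(port),
        "-U", user,          # role with replication privileges
        "-D", target_dir,    # where the backup is written
        "-F", "t",           # tar output format
        "-X", "stream",      # stream WAL while the backup runs
    ]
    if compress:
        cmd.append("-z")     # gzip the tar output
    return cmd


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    argv = basebackup_command("mycluster", "postgres", "/backups/mycluster")
    print(shlex.join(argv))
```

The list can be passed to subprocess.run as-is; building an argv list rather than a shell string avoids quoting problems.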
from postgres-operator.
it's something we are definitely going to add. The containers we use under the hood support a watch/failover capability, and we will leverage some of that in what the operator will allow...
current thinking is that something like pgo create watch mycluster would let a user set up a failover watch on a cluster. Ideally we would like to support different failover strategies, similar to the way we support different cluster strategies. But for sure, the operator would let you trigger a recovery. So stay tuned, it will happen in an upcoming release of the operator.
from postgres-operator.
Assuming that the master went down:
Isn't failover supposed to be automatic, based on which replica has most of the master's data, promoting that replica to master?
(or) automatically mounting the PVC of the (previous) master on one of the replicas and promoting it as the new master?
My assumption was that the operator is supposed to watch over the instances and take care of master selection (on bootstrap and on master failure) and coordination among the participating pods.
Please correct me in case I am missing anything.
from postgres-operator.
this is a good topic for sure and one I'm thinking about...some design ideas in my head include:
- allow cluster 'watching', but don't mandate it
- allow for a user initiated failover, where the user can trigger the failover to whatever replica they choose
- allow for a failover to a 'certain' replica using a configurable/programmable selection algorithm (based on metadata of a replica, or replication status, other)
- allow for a pre-hook and post-hook script to be executed when a failover is triggered
- allow for continuous cluster watching after a failover has occurred (updating labels as required to enable proper client routing)
- allow for killing off stale replicas after a failover
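To make the ideas above more concrete, here is a rough sketch of a pluggable target selection: a user-chosen replica wins when it is healthy, otherwise a strategy function picks one based on replication status. All names here (Replica, replay_lsn, choose_target) are hypothetical and do not reflect the operator's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Replica:
    name: str
    replay_lsn: int                  # WAL position replayed; higher = more caught up
    healthy: bool = True
    labels: Dict[str, str] = field(default_factory=dict)


Strategy = Callable[[List[Replica]], Optional[Replica]]


def most_caught_up(replicas: List[Replica]) -> Optional[Replica]:
    """Pick the healthy replica that has replayed the most WAL."""
    candidates = [r for r in replicas if r.healthy]
    if not candidates:
        return None
    # Ties on replay_lsn are broken by name so the choice is deterministic.
    return max(candidates, key=lambda r: (r.replay_lsn, r.name))


def choose_target(replicas: Iterable[Replica],
                  strategy: Strategy = most_caught_up,
                  preferred: Optional[str] = None) -> Optional[Replica]:
    """Return the failover target: the user's choice if given, else the strategy's."""
    pool = list(replicas)
    if preferred is not None:
        for r in pool:
            if r.name == preferred and r.healthy:
                return r
        raise ValueError(f"replica {preferred!r} is unknown or unhealthy")
    return strategy(pool)


if __name__ == "__main__":
    pool = [Replica("replica-a", 100), Replica("replica-b", 250)]
    print(choose_target(pool))
    print(choose_target(pool, preferred="replica-a"))
```

Other strategies (for example, one that filters on labels) can be passed in as the strategy argument without changing the caller.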
from postgres-operator.
- allow cluster 'watching', but don't mandate it
Usually you don't want a cluster without failover logic.
- allow for a user initiated failover, where the user can trigger the failover to whatever replica they choose
Switchover is the correct term here :P as we are not failing... and yeah, it's something I have not developed.
- allow for a failover to a 'certain' replica using a configurable/programmable selection algorithm (based on metadata of a replica, or replication status, other)
That might be solved by replica priority, plus quorum election of course, to avoid split-brain... so your priority should not mean much in case of network issues, but it will in case of bad master health.
- allow for a pre-hook and post-hook script to be executed when a failover is triggered
Any practical use for this one?
- allow for continuous cluster watching after a failover has occurred (updating labels as required to enable proper client routing)
For that one I use pgpool ;) It does its job, excluding nodes from the list of backends based on different conditions and health checks against the postgres servers.
- allow for killing off stale replicas after a failover
That I don't understand at all :(
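On the priority and quorum point above, a minimal sketch of the idea: a failover only proceeds when a strict majority of voting members agree that the master is down, and priority is applied only after that check. The function and parameter names are hypothetical and not taken from pgpool or the operator.

```python
from typing import Dict, Optional, Set


def has_quorum(votes: int, cluster_size: int) -> bool:
    """A strict majority of all voting members is required."""
    return votes > cluster_size // 2


def elect(priorities: Dict[str, int],
          reachable: Set[str],
          master_down_votes: Set[str],
          cluster_size: int) -> Optional[str]:
    """Return the replica to promote, or None if no promotion should happen.

    priorities        -- replica name -> priority (higher is preferred)
    reachable         -- replicas this side of the network can currently see
    master_down_votes -- voting members reporting the master as down
    cluster_size      -- total number of voting members
    """
    # Without a majority we may be a partitioned minority: do nothing,
    # whatever the priorities say. This is what avoids split-brain.
    if not has_quorum(len(master_down_votes), cluster_size):
        return None
    candidates = [name for name in reachable if name in priorities]
    if not candidates:
        return None
    # Highest priority wins; the name breaks ties deterministically.
    return max(candidates, key=lambda name: (priorities[name], name))


if __name__ == "__main__":
    prio = {"replica-1": 10, "replica-2": 20}
    print(elect(prio, {"replica-1", "replica-2"}, {"replica-1"}, 3))
    print(elect(prio, {"replica-1", "replica-2"}, {"replica-1", "replica-2"}, 3))
```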
Sorry for going through the list of your ideas, but it looks like I was in that state half a year ago, and I can help you with some of them if you need help, of course :)
I like the idea of implementing a special API for DB objects in k8s, but I also think that you might want to segregate responsibilities:
- reliable dockerized postgres cluster
- management of the cluster using k8s stuff and facilities
PS
I would invest some time in the first one (build or adapt my images) while you could focus on the wrapper you are developing right now. Let me know if you are interested and would like to build something a bit cooler than what we've each done separately. Cheers!
from postgres-operator.
Hi everyone!
@jmccormick2001 the postgres-operator is looking very exciting, nice work and overall concept.
I was looking into failover/HA capabilities too, but couldn't really find out what the current status is.
Currently, if the cluster master dies, is the cluster simply down?
Keep up the good work!
from postgres-operator.
Was also looking at an alternative using stolon / etcd:
https://medium.com/@SergeyNuzhdin/how-to-deploy-ha-postgresql-cluster-on-kubernetes-3bf9ed60c64f
from postgres-operator.
thanks. Currently the master runs in a Deployment, which 'should' get rescheduled by Kube if the Kube node dies; Kube 'should' also restart the pod if it dies, to keep the Deployment consistent. However, the gotcha here is: what if the master's data is somehow corrupted? Kube in that case will just restart the bad database over and over. What I'm considering is a more formal way to specify that "I want to fail over to this replica". I have some ideas on how this will work but want to give it some extra thought before I do the implementation. I also want a means of specifying a 'sync' replica that a user could specifically choose as the failover target. This is definitely a high priority for an upcoming release, so stay tuned. There is also the case where a user might want an 'automated failover'; I'm thinking about that use case as well.
from postgres-operator.
Thanks for the immediate reply!
Simple use case: HA on AWS...
e.g. you're running a Multi-AZ k8s cluster on eu-west-1a, eu-west-1b, eu-west-1c and you want your pgcluster running on k8s to be HA in case one AZ goes down. If the node in the AZ with the pgcluster master goes down, the EBS volume, being AZ-restricted, isn't available to mount on any other node/pod (AZ down..), so one of the replicas running on other pods on other nodes in another AZ would immediately and automatically take over.
from postgres-operator.
That's not AWS specific though... just in general, if the db is the critical point of failure (which it is for most applications in some way) and you are aiming towards 100% uptime...
So in case of a sudden node death or downtime, 1-2 min for k8s to free the PVC, mount it on another node and restart the pod somewhere else is quite a long downtime when there are replicas available that could take over...
from postgres-operator.
understood, there is definitely the case where users will want to orchestrate a failover onto a specific replica regardless of what Kube might do
from postgres-operator.
Many thanks for your input! I don't want to rush you or anything, but do you have a rough timescale / idea of when some sort of automatic failover may get realised? May I help in any way?
I need to choose a way forward for how to build/manage a pg cluster, and this operator seems like a pretty good candidate... :-)
from postgres-operator.
no worries, the current roadmap/schedule is to release the 'policy' mechanism in the next week or so, this is a new feature that lets you apply SQL policies against clusters....right after that is the failover work...so I'm shooting to have some form of failover feature in about the 4-5 week time frame, hopefully earlier. Once I have some early work done on this, I'll reach out and see if you could do a sanity check on it.
from postgres-operator.
👍 will try to keep track, happy to help with sanity check.
One more question - is there any way to have automated backups? It's always full backups, right? What about incremental backups?
from postgres-operator.
That all sounds highly promising 👏 !
from postgres-operator.
manual failover is coded and will land in upcoming 2.6 release.
from postgres-operator.
auto failover (first cut) will land in operator 3.1 soon to be released.
from postgres-operator.