
vip-manager's Introduction


vip-manager

Manages a virtual IP based on state kept in etcd or Consul


Prerequisites

  • go >= 1.19
  • make (optional)
  • goreleaser (optional)

Building

  1. Clone this repo:
git clone https://github.com/cybertec-postgresql/vip-manager.git
  2. Build the binary using make or go build.
  3. To build your own packages (.deb, .rpm, .zip, etc.), run
make package

or

goreleaser release --snapshot --skip-publish --rm-dist

Installing from package

You can download .rpm or .deb packages here, on the Releases page. On Debian and Ubuntu, the universe repositories should provide you with vip-manager, though the version may not be as recent.

Warning
Our packages are probably not compatible with the ones from those repositories; do not try to install them side by side.

Installing from source

  • Follow the steps to build vip-manager.
  • Run DESTDIR=/tmp make install to copy the binary, service files and config file into the destination of your choice.
  • Edit the config to your needs, then run systemctl daemon-reload, followed by systemctl start vip-manager.

Note
systemd will only pick up the service files if you choose a DESTDIR where it can find them. Usually DESTDIR='' should work.
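For example, a minimal end-to-end installation on a systemd machine could look like this (a sketch assuming you install directly to the root filesystem; adjust DESTDIR and the config path to taste):

git clone https://github.com/cybertec-postgresql/vip-manager.git
cd vip-manager
make
# install binary, service files and default config to /
sudo DESTDIR='' make install
# adjust the config before starting
sudo $EDITOR /etc/default/vip-manager.yml
sudo systemctl daemon-reload
sudo systemctl start vip-manager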

Environment prerequisites

When vip-manager is in charge of registering and deregistering the VIP locally, it needs superuser privileges to do so. This is not required when vip-manager is used to manage a VIP through some API, e.g. Hetzner Robot API or Hetzner Cloud API.

Note
At some point it would be great to reduce this requirement to only the CAP_NET_RAW and CAP_NET_ADMIN capabilities, which a superuser could add to the vip-manager binary once. Right now this is not possible, since vip-manager launches plain shell commands to register and deregister virtual IP addresses locally (at least on Linux), so the whole user would need these privileges. Once vip-manager is taught to use a library that talks directly to the Linux kernel's API to register/deregister the VIP, the capabilities set on the binary will suffice.

PostgreSQL prerequisites

For any virtual-IP-based solution to work with Postgres, you need to make sure that Postgres is configured to automatically scan and bind to all found network interfaces: something like * or 0.0.0.0 (IPv4 only) is needed for the listen_addresses parameter to activate the automatic binding. This might not be suitable for all use cases, for example where security is paramount.
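For example, in postgresql.conf (or the corresponding Patroni parameter), the wildcard variant is simply:

# bind to every interface found at startup, so the VIP is covered automatically
listen_addresses = '*'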

nonlocal bind

If you can't set listen_addresses to a wildcard address, you can explicitly list only those addresses that you want to listen on. However, if you add the virtual IP to that list, PostgreSQL will fail to start when that address is not yet registered on one of the machine's interfaces. You need to configure the kernel to allow "nonlocal bind" of IPv4 addresses (a combined example follows the commands below):

  • temporarily:
sysctl -w net.ipv4.ip_nonlocal_bind=1
  • permanently:
echo "net.ipv4.ip_nonlocal_bind = 1"  >> /etc/sysctl.conf
sysctl -p
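Putting both pieces together, a sketch (the VIP 10.10.10.123 is the example address from the configuration table below; 192.168.1.10 stands in for the machine's own address): persist the sysctl via a drop-in file and list the VIP explicitly in postgresql.conf:

# persist the setting via /etc/sysctl.d and apply it
echo "net.ipv4.ip_nonlocal_bind = 1" > /etc/sysctl.d/99-nonlocal-bind.conf
sysctl --system

# postgresql.conf: explicit addresses instead of a wildcard
listen_addresses = 'localhost, 192.168.1.10, 10.10.10.123'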

Configuration

The configuration can be passed to the executable through argument flags, environment variables or through a YAML config file. Run vip-manager --help to see the available flags.

Note
The location of the YAML config file can be specified with the --config flag. An example config file is installed to /etc/default/vip-manager.yml and is also available in the vipconfig directory in the repository.

Configuration is now (from release v1.0 on) handled using the viper library. This means that environment variables, command line flags, and config files can be used to configure vip-manager. When using different configuration sources simultaneously, this is the precedence order:

  • flag
  • env
  • config

Note
So flags always override env variables and entries from the config file, and env variables override config file entries.

All flags and file entries are written in lower case. To make longer multi-word flags and entries readable, they are separated by dashes, e.g. retry-num.

If you put a flag or file entry into uppercase and replace dashes with underscores, you end up with the format of the environment variables. To avoid clashing with the configuration of other applications, the env variables are additionally prefixed with VIP_, e.g. VIP_RETRY_NUM.
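For example, all three of the following set the same item, with the flag winning if all are given at once:

vip-manager --retry-num=5        # command line flag (highest precedence)
VIP_RETRY_NUM=5 vip-manager      # environment variable
# in the YAML config file (lowest precedence):
retry-num: 5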

This is a list of all available configuration items:

| flag / YAML key | env notation | required | example | description |
| --- | --- | --- | --- | --- |
| ip | VIP_IP | yes | 10.10.10.123 | The virtual IP address that will be managed. |
| netmask | VIP_NETMASK | yes | 24 | The netmask associated with the subnet that the virtual IP is part of. |
| interface | VIP_INTERFACE | yes | eth0 | A local network interface on the machine that runs vip-manager. Required when using manager-type=basic. The VIP will be added to and removed from this interface. |
| trigger-key | VIP_TRIGGER_KEY | yes | /service/pgcluster/leader | The key in the DCS that will be monitored by vip-manager. Must match <namespace>/<scope>/leader from the Patroni config. When the value returned by the DCS equals trigger-value, vip-manager makes sure that the virtual IP is registered on this machine; if it does not match, vip-manager makes sure that it is not. |
| trigger-value | VIP_TRIGGER_VALUE | no | pgcluster_member_1 | The value that the DCS's answer for trigger-key is matched against. Must match <name> from the Patroni config. Usually set to the name of the Patroni cluster member that this vip-manager instance is associated with. Defaults to the machine's hostname. |
| manager-type | VIP_MANAGER_TYPE | no | basic | Either basic or hetzner. Describes the mechanism used to manage the virtual IP. Defaults to basic. |
| dcs-type | VIP_DCS_TYPE | no | etcd | The type of DCS that vip-manager will use to monitor the trigger-key. Defaults to etcd. |
| dcs-endpoints | VIP_DCS_ENDPOINTS | no | http://10.10.11.1:2379 | A URL that defines where to reach the DCS. Multiple endpoints can be passed to the flag or env variable as a comma-separated list. In the config file, a list can be specified; see the sample config for an example. Defaults to http://127.0.0.1:2379 for dcs-type=etcd and http://127.0.0.1:8500 for dcs-type=consul. |
| etcd-user | VIP_ETCD_USER | no | patroni | A username that is allowed to read the trigger-key in an etcd DCS. Optional when using dcs-type=etcd. |
| etcd-password | VIP_ETCD_PASSWORD | no | snakeoil | The password for etcd-user. Optional when using dcs-type=etcd. Requires that etcd-user is also set. |
| consul-token | VIP_CONSUL_TOKEN | no | snakeoil | A token that can be used to authenticate against the Consul API. Optional when using dcs-type=consul. |
| interval | VIP_INTERVAL | no | 1000 | The time the vip-manager main loop sleeps before checking for changes, in ms. Defaults to 1000. Doesn't affect the etcd checker since v2.3.0. |
| retry-after | VIP_RETRY_AFTER | no | 250 | The time to wait before retrying interactions with components outside of vip-manager, in ms. Defaults to 250. |
| retry-num | VIP_RETRY_NUM | no | 3 | The number of times interactions with components outside of vip-manager are retried. Defaults to 3. |
| etcd-ca-file | VIP_ETCD_CA_FILE | no | /etc/etcd/ca.cert.pem | A certificate authority file that can be used to verify the certificates presented by etcd endpoints. Make sure to change dcs-endpoints to use https. |
| etcd-cert-file | VIP_ETCD_CERT_FILE | no | /etc/etcd/client.cert.pem | A client certificate that is used to authenticate against etcd endpoints. Requires etcd-ca-file to be set as well. |
| etcd-key-file | VIP_ETCD_KEY_FILE | no | /etc/etcd/client.key.pem | The private key for the client certificate. Required when etcd-cert-file is specified. |
| verbose | VIP_VERBOSE | no | true | Enable more verbose logging. Currently only manager-type=hetzner provides additional logs. |
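Putting the required items together, a minimal YAML config might look like the following sketch (all values are illustrative; the shipped /etc/default/vip-manager.yml is the authoritative sample):

ip: 10.10.10.123
netmask: 24
interface: eth0
trigger-key: "/service/pgcluster/leader"
trigger-value: "pgcluster_member_1"
dcs-type: etcd
# a list may be given in the config file:
dcs-endpoints:
  - http://10.10.11.1:2379
  - http://10.10.11.2:2379
interval: 1000
retry-after: 250
retry-num: 3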

Configuration - Hetzner

To use vip-manager with the Hetzner Robot API, you need a credential file, you must set manager-type to hetzner in /etc/default/vip-manager.yml, and your floating IP must be added on all servers. The floating IP (VIP) will not be added to or removed from the network interface of the current primary node; Hetzner routes it to the current one.

Credential File - Hetzner

Add the file /etc/hetzner with your username and password:

user="myUsername"
pass="myPassword"

Debugging

Either:

  • run vip-manager with the --verbose flag, or
  • set verbose to true in /etc/default/vip-manager.yml, or
  • set VIP_VERBOSE=true in the environment.

Note
Currently only supported for manager-type=hetzner.

Author

Cybertec Schönig & Schönig GmbH, https://www.cybertec-postgresql.com


vip-manager's Issues

new raw package incompatible with golang < 1.12

Hi,

after merging the new arp code along with the updated arp and raw packages, vip-manager no longer compiles under golang v1.11. It seems that the code needs at least golang v1.12. See mdlayher/raw#63

Current Debian stable (buster) has the golang v1.11 package, but there is golang v1.14 in buster-backports - see here. I'm not sure about the status of golang v1.14 in backports, but I'd think it's not supported at the same level as v1.11.

Similarly for Ubuntu: bionic has golang v1.10, later releases have golang > v1.12 - see here.

Please do not take my analysis above at face value, since I am not at home in Go development and so the above might not be 100% correct.

If my analysis above is correct, however, then please consider whether you want to depend on Go > v1.12. If so, I'd suggest documenting that dependency explicitly in the README.

Undelayed retry on Consul GET error

I am not totally sure if this is a bug in vip-manager, but I would assume it is.

After the Consul cluster was in a degraded state, I noticed that the Consul log was flooded with hundreds of these messages, which appeared at millisecond intervals:

Apr 21 10:02:33 db-demo3 consul[589]:     2020-04-21T10:02:33.650+0200 [ERROR] agent.http: Request error: method=GET url=/v1/kv/postgresql-common/9.6-demo/leader?index=549685&wait=1.5085322856903076s from=127.0.0.1:55614 error="rpc error getting client: failed to get conn: dial tcp <nil>->10.17.4.12:8300: connect: connection refused"
Apr 21 10:02:33 db-demo3 consul[589]:     2020-04-21T10:02:33.651+0200 [ERROR] agent.http: Request error: method=GET url=/v1/kv/postgresql-common/9.6-demo/leader?index=549685&wait=1.5074844360351562s from=127.0.0.1:55614 error="rpc error getting client: failed to get conn: dial tcp <nil>->10.17.4.12:8300: connect: connection refused"
Apr 21 10:02:33 db-demo3 consul[589]:     2020-04-21T10:02:33.652+0200 [ERROR] agent.http: Request error: method=GET url=/v1/kv/postgresql-common/9.6-demo/leader?index=549685&wait=1.5064284801483154s from=127.0.0.1:55614 error="rpc error getting client: failed to get conn: dial tcp <nil>->10.17.4.12:8300: connect: connection refused"
Apr 21 10:02:33 db-demo3 consul[589]:     2020-04-21T10:02:33.653+0200 [ERROR] agent.http: Request error: method=GET url=/v1/kv/postgresql-common/9.6-demo/leader?index=549685&wait=1.5053811073303223s from=127.0.0.1:55614 error="rpc error getting client: failed to get conn: dial tcp <nil>->10.17.4.12:8300: connect: connection refused"

I think this is caused by vip-manager repeatedly requesting the leader key without any delay under certain circumstances.

make doesn't work

Download

git clone https://github.com/cybertec-postgresql/vip-manager.git
Cloning into 'vip-manager'...
remote: Enumerating objects: 4735, done.
remote: Total 4735 (delta 0), reused 0 (delta 0), pack-reused 4735
Receiving objects: 100% (4735/4735), 8.70 MiB | 6.10 MiB/s, done.
Resolving deltas: 100% (1267/1267), done.

cd vip-manager/

[root@vip-manager-etcd-apatsev-1 ~]# cd vip-manager/

make all

[root@vip-manager-etcd-apatsev-1 vip-manager]# make all
go build -ldflags="-s -w" .
main.go:16:2: cannot find package "github.com/cybertec-postgresql/vip-manager/checker" in any of:
	/usr/lib/golang/src/github.com/cybertec-postgresql/vip-manager/checker (from $GOROOT)
	/root/go/src/github.com/cybertec-postgresql/vip-manager/checker (from $GOPATH)
main.go:17:2: cannot find package "github.com/cybertec-postgresql/vip-manager/vipconfig" in any of:
	/usr/lib/golang/src/github.com/cybertec-postgresql/vip-manager/vipconfig (from $GOROOT)
	/root/go/src/github.com/cybertec-postgresql/vip-manager/vipconfig (from $GOPATH)
basicConfigurer.go:13:2: cannot find package "github.com/mdlayher/arp" in any of:
	/usr/lib/golang/src/github.com/mdlayher/arp (from $GOROOT)
	/root/go/src/github.com/mdlayher/arp (from $GOPATH)
main.go:14:2: cannot find package "gopkg.in/yaml.v2" in any of:
	/usr/lib/golang/src/gopkg.in/yaml.v2 (from $GOROOT)
	/root/go/src/gopkg.in/yaml.v2 (from $GOPATH)
make: *** [vip-manager] Error 1

make install

[root@vip-manager-etcd-apatsev-1 vip-manager]# make install
install -d tmp/usr/bin
install vip-manager tmp/usr/bin/vip-manager
install: cannot stat ‘vip-manager’: No such file or directory
make: *** [install] Error 1

make package-rpm

[root@vip-manager-etcd-apatsev-1 vip-manager]# make package-rpm
go build -ldflags="-s -w" .
main.go:16:2: cannot find package "github.com/cybertec-postgresql/vip-manager/checker" in any of:
	/usr/lib/golang/src/github.com/cybertec-postgresql/vip-manager/checker (from $GOROOT)
	/root/go/src/github.com/cybertec-postgresql/vip-manager/checker (from $GOPATH)
main.go:17:2: cannot find package "github.com/cybertec-postgresql/vip-manager/vipconfig" in any of:
	/usr/lib/golang/src/github.com/cybertec-postgresql/vip-manager/vipconfig (from $GOROOT)
	/root/go/src/github.com/cybertec-postgresql/vip-manager/vipconfig (from $GOPATH)
basicConfigurer.go:13:2: cannot find package "github.com/mdlayher/arp" in any of:
	/usr/lib/golang/src/github.com/mdlayher/arp (from $GOROOT)
	/root/go/src/github.com/mdlayher/arp (from $GOPATH)
main.go:14:2: cannot find package "gopkg.in/yaml.v2" in any of:
	/usr/lib/golang/src/gopkg.in/yaml.v2 (from $GOROOT)
	/root/go/src/gopkg.in/yaml.v2 (from $GOPATH)
make: *** [vip-manager] Error 1

vip-manager does not see the etcd keys written by patroni using etcd3 config

Hello.

Does vip-manager support etcd v3 api?

When using etcd.hosts with patroni, vip-manager can see the etcd keys and works properly.

But when using etcd3.hosts instead, vip-manager throws etcd error: 100: Key not found (/service/pgcluster/leader), even though the keys are there:

# ETCDCTL_API=3 etcdctl --endpoints http://192.168.4.4:2379 get --prefix / | grep leader
/service/pgcluster/leader
/service/pgcluster/optime/leader

Related issue: zalando/patroni#1822

Clarify "interactions with components outside of vip-manager"

Both the retry-num and retry-after configuration parameters mention "interactions with components outside of vip-manager". I initially assumed this had something to do with communication with the DCS, but looking at the code, it seems to be about the arp client?

I would assume other "interactions with [other] components outside of vip-manager" would probably require other timeouts/number of retries, so being very unspecific here might not be helpful to the user.

Provide guidance on listen_addresses / other Postgres setup

AFAICT, the documentation/README only describes the vip-manager part of the equation. However, for Postgres to actually listen to the vip, isn't there some setup needed? I guess you either need to set listen_addresses to *, or set net.ipv4.ip_nonlocal_bind to 1 or something? Otherwise, Postgres might be started before the vip is set and won't be able to bind to it.

The problem with * is that if you have multiple instances running concurrently, they will all bind to all VIPs, which might not be what you want if VIPs are managed per instance. It might not be a problem in practice though, as usually the port will be different anyway.

Am I missing something obvious here that makes it just-work on the Postgres side?

Improve config handling

In the beginning, only flags were available to configure vip-manager.
ENV variables were used in the service file to define the values that would be inserted after the flags in the service's exec call.

In the meantime, I have added config file handling, where the file was parsed by our own code and precedence had to be handled explicitly (i.e. overriding values from the config file with CLI settings etc.).

There has been an effort to enable vip-manager to talk to etcd Clusters using certificate authentication #28 .
That implementation itself introduces logic to retrieve ENV variables.

To try and clean up this mess, I have decided to use Viper ( https://github.com/spf13/viper ) to handle config stuff for us.
At the same time, I tried to clean up the mess of different naming schemes for various variables and added appropriate deprecation notices (so far only for flags, since there was previously no way to "Get" ENV vars from within vip-manager - it was always through the service file, so the deprecation notes would show up there as well).

Please see branch https://github.com/cybertec-postgresql/vip-manager/tree/viper-config for the changes in main.go .
I have not yet updated the documentation or the reference .yml config file to reflect the changes.

vip-manager complains virtual ip address's state is false

Hi Team,

I am using Consul as my DCS and the version of vip-manager is 1.0.2. Once I tried to bring it up via "vip-manager -config=./vip-manager.yml", it printed:

IP address <ip>/<netmask> state is false, desired false
IP address <ip>/<netmask> state is false, desired true
Problems with producing the arp client: listen package <part of mac I think>: socket: operation not permitted
Problems with producing the arp client: listen package <part of mac I think>: socket: operation not permitted
too many retries
Couldn't create an Arp client: listen package <part of mac I think>: socket: operation not permitted

Does this mean I do not have permission to add the VIP to the ens interface?

[Feature proposal] Support a "standby IP" in case of 3+ nodes

Currently we can only "float around" a single primary IP... but in many cases secondary standby IPs are also needed, for example to offload some traffic. With 2 nodes in total this can somehow be achieved with some ugly "negate" hacking, but with 3 nodes it becomes impossible, as we need some kind of quorum.

Implementation idea: piggyback on top of etcd and introduce a new key like /service/batman/standbyleader that all standby nodes would try to grab and respond to.

Handle etcd hiccups more gracefully

(maybe consul has the same behaviour)

Right now, if vip-manager cannot get a positive answer from etcd within the interval of its main loop, it deconfigures the VIP, resulting in application downtime. This also happens during leader election, maybe because one etcd node was too slow to respond (while the PostgreSQL leader is perfectly fine and available).

The admin could increase the interval timeout to make vip-manager more resilient against etcd issues, but that would also increase the time it takes for vip-manager to pick up Patroni failovers/switchovers.

As a practical (contrived) example, if I have one external etcd server, and a 3-node vip-manager-1.0.1/patroni-2.0.1 cluster with interval, loop_wait and retry_timeout all set to 10, and I SIGSTOP the etcd server, then I see the following:

Dez 07 21:41:07 pg1 vip-manager[27203]: 2020/12/07 21:41:07 IP address 10.0.0.2/24 state is true, desired true
Dez 07 21:41:08 pg1 patroni@13-test3[26334]: 2020-12-07 21:41:08,088 INFO: Lock owner: pg1; I am pg1
Dez 07 21:41:08 pg1 patroni@13-test3[26334]: 2020-12-07 21:41:08,099 INFO: no action.  i am the leader with the lock
Dez 07 21:41:27 pg1 vip-manager[27203]: 2020/12/07 21:41:27 IP address 10.0.0.2/24 state is true, desired true
Dez 07 21:41:28 pg1 patroni@13-test3[26334]: 2020-12-07 21:41:28,088 INFO: Lock owner: pg1; I am pg1
Dez 07 21:41:28 pg1 patroni@13-test3[26334]: 2020-12-07 21:41:28,099 INFO: no action.  i am the leader with the lock

Everything cool, now I SIGSTOP etcd.

Dez 07 21:41:34 pg1 vip-manager[27203]: 2020/12/07 21:41:34 etcd error: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://192.168.122.189:2379 exceeded header timeout
Dez 07 21:41:34 pg1 vip-manager[27203]: 2020/12/07 21:41:34 IP address 10.0.0.2/24 state is true, desired false
Dez 07 21:41:34 pg1 vip-manager[27203]: 2020/12/07 21:41:34 Removing address 10.0.0.2/24 on eth0

6 seconds later, vip-manager's loop wakes up, sees that etcd is down and immediately deconfigures the VIP. (EDIT: I wonder why that did not happen at the scheduled interval timeout of Dez 07 21:41:37?)

Dez 07 21:41:34 pg1 vip-manager[27203]: 2020/12/07 21:41:34 IP address 10.0.0.2/24 state is false, desired false
Dez 07 21:41:37 pg1 vip-manager[27203]: 2020/12/07 21:41:37 IP address 10.0.0.2/24 state is false, desired false
Dez 07 21:41:41 pg1 patroni@13-test3[26334]: 2020-12-07 21:41:41,422 WARNING: Retrying (Retry(total=1, connect=None, read=None, redirect=0, status=None)) after connection broken by 'ReadTimeoutError("HTTPConnectionPool(host='192.168.122.189', port=2379): Read timed out. (read timeout=3.333155213321637)")': /v2/keys/postgresql-common/13-test3/?recursive=true

Now Patroni realized something is wrong (though I'm not sure why it took another 3 seconds from the scheduled ping timestamp of Dez 07 21:41:38 - maybe Patroni has much longer timeouts to wait for an etcd response than vip-manager?).

Dez 07 21:41:44 pg1 patroni@13-test3[26334]: 2020-12-07 21:41:44,759 WARNING: Retrying (Retry(total=0, connect=None, read=None, redirect=0, status=None)) after connection broken by 'ReadTimeoutError("HTTPConnectionPool(host='192.168.122.189', port=2379): Read timed out. (read timeout=3.3322158453423376)")': /v2/keys/postgresql-common/13-test3/?recursive=true
Dez 07 21:41:45 pg1 vip-manager[27203]: 2020/12/07 21:41:45 etcd error: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://192.168.122.189:2379 exceeded header timeout
Dez 07 21:41:47 pg1 vip-manager[27203]: 2020/12/07 21:41:47 IP address 10.0.0.2/24 state is false, desired false
Dez 07 21:41:48 pg1 patroni@13-test3[26334]: 2020-12-07 21:41:48,096 ERROR: Request to server http://192.168.122.189:2379 failed: MaxRetryError('HTTPConnectionPool(host=\'192.168.122.189\', port=2379): Max retries exceeded with url: /v2/keys/postgresql-common/13-test3/?recursive=true (Caused by ReadTimeoutError("HTTPConnectionPool(host=\'192.168.122.189\', port=2379): Read timed out. (read timeout=3.3330249423355176)"))')
Dez 07 21:41:48 pg1 patroni@13-test3[26334]: 2020-12-07 21:41:48,096 INFO: Reconnection allowed, looking for another server.
Dez 07 21:41:48 pg1 patroni@13-test3[26334]: 2020-12-07 21:41:48,097 ERROR: get_cluster
Dez 07 21:41:48 pg1 patroni@13-test3[26334]: Traceback (most recent call last):
Dez 07 21:41:48 pg1 patroni@13-test3[26334]:   File "/usr/lib/python3/dist-packages/patroni/dcs/etcd.py", line 590, in _load_cluster
Dez 07 21:41:48 pg1 patroni@13-test3[26334]:     result = self.retry(self._client.read, self.client_path(''), recursive=True)
Dez 07 21:41:48 pg1 patroni@13-test3[26334]:   File "/usr/lib/python3/dist-packages/patroni/dcs/etcd.py", line 443, in retry
Dez 07 21:41:48 pg1 patroni@13-test3[26334]:     return retry(*args, **kwargs)
Dez 07 21:41:48 pg1 patroni@13-test3[26334]:   File "/usr/lib/python3/dist-packages/patroni/utils.py", line 333, in __call__
Dez 07 21:41:48 pg1 patroni@13-test3[26334]:     return func(*args, **kwargs)
Dez 07 21:41:48 pg1 patroni@13-test3[26334]:   File "/usr/lib/python3/dist-packages/etcd/client.py", line 595, in read
Dez 07 21:41:48 pg1 patroni@13-test3[26334]:     response = self.api_execute(
Dez 07 21:41:48 pg1 patroni@13-test3[26334]:   File "/usr/lib/python3/dist-packages/patroni/dcs/etcd.py", line 271, in api_execute
Dez 07 21:41:48 pg1 patroni@13-test3[26334]:     raise ex
Dez 07 21:41:48 pg1 patroni@13-test3[26334]:   File "/usr/lib/python3/dist-packages/patroni/dcs/etcd.py", line 255, in api_execute
Dez 07 21:41:48 pg1 patroni@13-test3[26334]:     response = self._do_http_request(retry, machines_cache, request_executor, method, path, **kwargs)
Dez 07 21:41:48 pg1 patroni@13-test3[26334]:   File "/usr/lib/python3/dist-packages/patroni/dcs/etcd.py", line 232, in _do_http_request
Dez 07 21:41:48 pg1 patroni@13-test3[26334]:     raise etcd.EtcdConnectionFailed('No more machines in the cluster')
Dez 07 21:41:48 pg1 patroni@13-test3[26334]: etcd.EtcdConnectionFailed: No more machines in the cluster
Dez 07 21:41:48 pg1 patroni@13-test3[26334]: 2020-12-07 21:41:48,098 ERROR: Error communicating with DCS
Dez 07 21:41:48 pg1 patroni@13-test3[26334]: 2020-12-07 21:41:48,237 INFO: closed patroni connection to the postgresql cluster
Dez 07 21:41:48 pg1 patroni@13-test3[26334]: 2020-12-07 21:41:48,501 INFO: postmaster pid=27452
Dez 07 21:41:48 pg1 patroni@13-test3[27453]: /var/run/postgresql/:5435 - no response
Dez 07 21:41:48 pg1 patroni@13-test3[27452]: 2020-12-07 21:41:48.510 UTC [27452] LOG:  redirecting log output to logging collector process
Dez 07 21:41:48 pg1 patroni@13-test3[27452]: 2020-12-07 21:41:48.510 UTC [27452] HINT:  Future log output will appear in directory "/var/log/postgresql".
Dez 07 21:41:49 pg1 patroni@13-test3[27459]: /var/run/postgresql/:5435 - accepting connections
Dez 07 21:41:49 pg1 patroni@13-test3[27461]: /var/run/postgresql/:5435 - accepting connections
Dez 07 21:41:49 pg1 patroni@13-test3[26334]: 2020-12-07 21:41:49,575 INFO: demoted self because DCS is not accessible and i was a leader

Only now (around 10 seconds after vip-manager) does Patroni deconfigure itself. If I SIGCONT etcd between 10-20 seconds after the SIGSTOP, Patroni marches on without user-visible changes, while vip-manager takes down the VIP.

So I think having a tight main loop interval in order to notice leader key changes, but a second retry interval in order to decide whether the DCS is down or not would be good. I assumed the retry-after and retry-num configuration parameters were that, but that does not seem to be the case (see #68).

Tags/Releases

There seem to be several versions in the Debian packaging or mentioned in commit messages (like adc9404), but there are no tags or releases - is that intentional? Is the actual current version on the master branch 0.4, or is that some Debian package version?

delete Dockerfile

Right now the Docker image contains only the vip-manager binary and is outdated. There is no reason to keep a Docker image, because a container cannot apply network changes to the host machine.

vip on a sync standby

Hi,

In some cases, we need a VIP on a sync standby. If we use only 2 Patroni nodes, everything is OK with the trigger-key "/service/<scope>/sync":

  • Server A : trigger-value: "{\"leader\":\"server_b\",\"sync_standby\":\"server_a\"}"
  • Server B : trigger-value: "{\"leader\":\"server_a\",\"sync_standby\":\"server_b\"}"

But when we use 4 nodes (1 leader, 1 sync_standby and 2 async_standby), we can't set a correct trigger-value, because that trigger-key returns both the leader and the sync_standby as JSON and we can't predict which node is the leader.

Would it be possible to parse the JSON value to get only the sync_standby?
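For illustration, this is roughly what such parsing would have to do, here done manually with etcdctl and jq (key path and field names taken from the report above):

ETCDCTL_API=3 etcdctl get /service/<scope>/sync --print-value-only | jq -r '.sync_standby'
# prints e.g.: server_a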

broken Debian version number vip-manager_1.0_beta3-1_amd64.deb

So to me this is a minor problem, but maybe you can fix it up somewhere in your process:

dpkg -i vip-manager_1.0_beta3-1_amd64.deb 
dpkg: error processing archive vip-manager_1.0_beta3-1_amd64.deb (--install):
 parsing file '/var/lib/dpkg/tmp.ci/control' near line 2 package 'vip-manager':
 'Version' field value '1.0_beta3-1': invalid character in version number
Errors were encountered while processing:
 vip-manager_1.0_beta3-1_amd64.deb

A valid version number/package name would be, for example,

vip-manager_1.0~beta3-1_amd64.deb

https://readme.phys.ethz.ch/documentation/debian_version_numbers/ has a nice overview of package versioning possibilities and logic.

workaround:

dpkg -i --force-bad-version vip-manager_1.0_beta3-1_amd64.deb

Support multiple instances

The current Debian package sets up a systemd service that uses /etc/default/vip-manager.yml. This doesn't allow for the use case of having more than one VIP allocated to a node.

I utilize a separate VIP for each database, and host some databases on the same clusters. So, in addition to creating an /etc/vip-manager directory, I would propose adding the following to support this:

/lib/systemd/system/vip-manager@.service:

[Unit]
Description=VIP-Manager instance %i
Before=patroni.service
ConditionPathExists=/etc/vip-manager/%i.yml

[Service]
Type=simple
ExecStart=/usr/bin/vip-manager -config=/etc/vip-manager/%i.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target
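With such a template unit in place, each VIP would get its own config file and instance, e.g. (hypothetical instance name mydb):

install -m 640 mydb.yml /etc/vip-manager/mydb.yml
systemctl daemon-reload
systemctl enable --now vip-manager@mydb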

retry-num description in README wrong

The retry-num config parameter has the following in README.md:

`retry-num`         | `VIP_RETRY_NUM`       | no        | 3                         | The number of times interactions with components outside of vip-manager are retried. Measured in ms. Defaults to `250`.

The "Measured in ms. Defaults to 250" appears to be a copy-pasto.

install rule broken

The install rule in the Makefile appears to be broken: the files package/scripts/init-systemd.service and package/scripts/init-systemv.sh mentioned there no longer exist, and the file vip-manager.default is in package/config, not package/scripts.

Do nothing when vip-manager is gracefully shutdown

Currently, vip-manager unassigns the virtual IP from the node when vip-manager is gracefully shut down.

In my opinion, vip-manager should do nothing when shut down, to allow a clean upgrade of vip-manager or basic maintenance.

What do you think @ants ?

Error with secured etcd

Hi,
after having successfully tested etcd, Patroni and vip-manager on a 3-node cluster, I ran into an issue with vip-manager when etcd is secured:

etcd error: client: etcd cluster is unavailable or misconfigured; error #0: x509: certificate signed by unknown authority

Patroni is configured to use the certificates and is working well.

Could you tell me if vip-manager needs some additional configuration to support certificates with etcd?

And thank you for the tool, very useful.

Regards,
Henri Chapelle

create Dockerfile for building binary and possibly packages using golang image instead of golang:alpine

To facilitate building for other platforms (that don't support a new enough golang version, see #42), we should consider creating a Dockerfile that uses golang, not golang:alpine (due to musl libc vs. glibc incompatibilities), and builds the binary.
That Dockerfile could be used to create packages as well; then we could create a GitHub action that automatically builds and publishes packages of new builds.

Feature for certificate authentication

Hi,

For the etcd connection, does vip-manager support certificate authentication?
The aim is to avoid storing the user's password in the configuration file, because of security concerns.

Thank you

trigger condition

Hi,

I want to use vip-manager to set an IP on a replica.
Could you add a trigger condition flag with operators 'eq' and 'ne'?

Use with pure raft install

Hi to all,

I'm planning to install, for a POC, a 2-node Patroni cluster in pure Raft mode, as offered by recent versions.

Can vip-manager be used to provide a virtual IP in this kind of setup?

Regards

Traffic not routed to new host of IP address

When failing over to a new Postgres host, the IP is assigned to the new host and removed from the old one, but traffic fails to be routed to the new host until roughly 15 minutes later.

The workaround is to run arping -c 4 -A -I <interface> <ip address>.
Ubuntu 18.04

unknown shorthand flag: 'c' in -config with vip-manager v1.0 deb package

FYI
vip-manager_1.0-1_amd64.deb -> /lib/systemd/system/vip-manager.service

/usr/bin/vip-manager -config=/etc/default/vip-manager.yml

Failed to start Manages Virtual IP for Patroni.

Nov 04 16:01:07 pgnode01 systemd[1]: Started Manages Virtual IP for Patroni.
Nov 04 16:01:07 pgnode01 vip-manager[10730]: unknown shorthand flag: 'c' in -config=/etc/default/vip-manager.yml

Because the old keys in the YAML config files are remapped to new keys, all that needs to be done is to add a second dash:

ExecStart=/usr/bin/vip-manager --config=/etc/default/vip-manager.yml

upgrade from 0.6.4 to 1.0-beta3 breaks because "one-dash" option -config doesn't exist any more

From syslog:

Oct 12 15:07:43 testmachine vip-manager[15756]: unknown shorthand flag: 'c' in -config=/etc/patroni/11-test.vip.yml

This is actually coming from the systemd config file:

# cat /etc/systemd/system/multi-user.target.wants/vip-manager@.service
[...]
ExecStart=/usr/bin/vip-manager -config=/etc/patroni/%i.vip.yml

Other than that, the new vip-manager seems to work with the old /etc/patroni/%i.vip.yml config file just fine. 👍 !

etcd error: 100: Key not found

Hello,
I'm trying to configure this tool with my Patroni installation, but it looks like it's not compatible with the etcd v3 API :/

When I launch it, I just get this error:

etcd error: 100: Key not found (/service/postgresql_sys)

On the etcd cluster this is the result:

[16:39]root@etcd-02:~# etcdctl get /service/postgresql/leader
/service/postgresql/leader
postgresql-sys-01

How can I make it work?
Thanks in advance

Change vip-manager repo description and add consul

This one is very, very minor but can be confusing. Please change the repo's description from

"Manages a virtual IP based on state kept in etcd"
to
"Manages a virtual IP based on state kept in etcd or consul"
