Comments (9)
from vip-manager.
Aren't those two separate things?
When somebody else takes over the vip this is called:
Lines 62 to 70 in ecdf8ea
But when the process receives an external interrupt, which is not necessarily caused by a vip change, it calls this:
Lines 78 to 83 in ecdf8ea
I totally agree with you that the first one has to stay the same but I would just get rid of the second one.
It is most likely that if an external interrupt is issued, it originates from the operating system or a user interaction.
If that interrupt occurred due to a vip change, it shouldn't be a problem to keep the same IP either, because with the gratuitous ARP the former vip owner is ignored anyway?
I see a bigger problem with unassigning the IP address, as it could lead to a situation where no server has the vip.
from vip-manager.
If two hosts have the same vip, then it isn't a case of "latest one wins": new hosts will send ARP queries, and old hosts will have their ARP cache entry age out if the connection isn't used for 60 seconds. The two hosts with the same vip will then race to respond to the request, and it's much less clear who will win that race. That's why I'm much more worried about two servers having the same vip.
Looking at the code, it looks like we don't catch SIGTERM right now anyway. So it actually already works like I proposed.
If you shut down vip-manager with kill -SIGINT $pid, it will release the vip; if you shut it down with kill -SIGTERM $pid, the vip is not released. The only nasty part is that on startup we deconfigure the IP and then immediately reconfigure it once the first response from etcd comes back.
I think we should try to get the initial state from cluster consensus and initialize IPManager.current_state with that value. If getting the initial state times out (a new timeout parameter, defaulting to 30s), just start up IPManager with initial state false.
from vip-manager.
The new timeout should also apply for the recheck loop in leader checker:
vip-manager/checker/etcd_leader_checker.go
Lines 47 to 54 in ecdf8ea
If we don't get a response within the specified time, we issue a false state.
from vip-manager.
Hm... That's something we should test with another setup. In our cluster setup, "latest one wins" does apply.
Only if the new vip is unassigned during the 60-second cache entry lifetime does it fall back to the old server, which still has the vip assigned... (maybe the switch broadcasts an ARP as soon as the new one unassigns the IP again?)
That's what we actually observed in our infrastructure, but I absolutely agree with what you said if you only go by the ARP documentation.
Could it be that our switch is handling this, which would mean the behaviour actually differs between switch/server/cluster setups?
So coming back to the actual problem with vip deconfiguring.
Should we handle the deconfigure/configure problem when starting vip-manager?
Should we add the SIGTERM handling, or should we leave it out because it already behaves the way we want? (Adding it would be more explicit for someone else reading the code.)
from vip-manager.
My understanding of the ARP protocol is that the first response wins. So when a client needs to know which host is serving the IP, it will send out a broadcast request; if the network path to one host is shorter, that one will get to respond first and its response will arrive earlier, unless there is a delay from network congestion or the packet is dropped. So for a given setup some clients will see server A almost always win, some will see server B almost always win, and in the very unlikely case of equal network delay a client could randomly see A or B.
I think for SIGTERM it would be enough to just document this behavior. A prerequisite for that would be that we actually have documentation, which seems like a nice idea. I'll try to get to writing some within the next couple of days, unless you want to volunteer.
from vip-manager.
I absolutely agree with your understanding of ARP, but it actually behaved differently when we tested it... Let's wait and test more! We will be doing a lot more testing over the next few days anyway.
Nevertheless, I absolutely agree on the documentation, though I would be happy if we could at least split the documentation effort!
Should we also take care of the start-deconfigure-configure issue?
from vip-manager.
It would be interesting to see a tcpdump of the ARP traffic from all servers to see what is going on. I will try it out once I get my hands back on our hardware-based test setup.
I'll try to write down a first cut of the docs tomorrow. If you have the time, you are welcome to tackle the startup issue; looking at my schedule, I don't think I will get enough time to implement it in the next 2 weeks.
from vip-manager.
I actually captured all the traffic with tcpdump, but maybe it is also possible to get a live update of the switch's ARP table...
If something really interesting happens, we will most likely write a blog post to publish the test results as well.
I will tackle the startup issue as soon as I can!
from vip-manager.