Comments (9)

ants commented on June 14, 2024

from vip-manager.

Faffnir commented on June 14, 2024

Isn't that two separate things?
When somebody else takes over the vip this is called:

    if desiredState {
        m.ConfigureAddress()
        // For now it is safe to say that this also works even if a
        // gratuitous ARP message could not be sent; logging an
        // error should be enough.
        m.ARPSendGratuitous()
    } else {
        m.DeconfigureAddress()
    }

But when the process receives a not necessarily vip-change-based interrupt externally, it calls this:
    select {
    case <-ctx.Done():
        m.DeconfigureAddress()
        return
    default:
    }

I totally agree with you that the first one has to stay the same, but I would just get rid of the second one.
If an external interrupt is issued, it most likely originates from the operating system or a user interaction.
If that interrupt occurred due to a vip change, it shouldn't be a problem to keep the same IP either, because after the gratuitous ARP the former vip owner is ignored anyway?

I see a bigger problem with unassigning the IP address, as it could lead to a situation where no server has the vip.

ants commented on June 14, 2024

If two hosts have the same vip, then it isn't a case of "latest one wins": new hosts will send ARP queries, and old hosts will have their ARP cache entry age out if the connection isn't used for 60 seconds. The two hosts with the same vip will then race to respond to the request, and it's much less clear who will win that race. That's why I'm much more worried about two servers having the same vip.

Looking at the code, it looks like we don't catch SIGTERM right now anyway. So it actually already works like I proposed.

If you shut down vip-manager with kill -SIGINT $pid, it will release the vip; if you shut down with kill -SIGTERM $pid, the vip is not released. The only nasty part is that on startup we deconfigure the IP and then immediately reconfigure it once the first response from etcd comes back.

I think we should try to get the initial state from cluster consensus and initialize IPManager.current_state with that value. If getting the initial state times out (a new timeout parameter, defaulting to 30s), just start up IPManager with initial state false.

ants commented on June 14, 2024

The new timeout should also apply to the recheck loop in the leader checker:

    if err != nil {
        if ctx.Err() != nil {
            break checkLoop
        }
        log.Printf("etcd error: %s", err)
        time.Sleep(1 * time.Second)
        continue
    }

If we don't get a response within the specified time, we issue a false state.

Faffnir commented on June 14, 2024

Hmm... That's something we should test with another setup. In our cluster setup the "latest one wins".
Only if the new vip is unassigned during the 60-second cache entry lifetime does it fall back to the old server, which still has the vip assigned... (maybe an ARP is broadcast by the switch as soon as the new one unassigns the IP again?)
That's what we actually tested in our infrastructure, but I absolutely agree with what you said if you only focus on the ARP documentation.
Could it be that our switch is handling this, which would mean the behaviour actually differs between switch-server-cluster setups?

So, coming back to the actual problem with vip deconfiguring:
Should we handle the deconfigure/configure problem when starting vip-manager?
Should we add the SIGTERM handling, or should we just not handle it because it already behaves the way we want? (Would it be more explicit to someone else reading the code if we added it?)

ants commented on June 14, 2024

My understanding of the ARP protocol is that the first response wins. So when a client needs to know which host is serving the IP, it will send out a broadcast request; if the network path to one host is shorter, that one will get to respond first and its response will arrive earlier, unless there is a delay from network congestion or the packet is dropped. So for a given setup some clients will see server A almost always win, some will see server B almost always win, and in the very unlikely case of equal network delay a client could randomly see A or B.

I think for SIGTERM it would be enough to just document this behavior. A prerequisite for that would be that we actually have documentation, which seems like a nice idea. I'll try to get to writing some within the next couple of days, unless you want to volunteer.

Faffnir commented on June 14, 2024

I absolutely agree with your understanding of ARP, but it actually behaved differently when we tested it... Let's wait and test more! We will be doing a lot more testing over the next few days anyway.

Nevertheless, I absolutely agree on the documentation, but I would be happy if we could at least split the documentation effort!

Should we also take care of the start-deconfigure-configure issue?

ants commented on June 14, 2024

It would be interesting to see a tcpdump of the ARP traffic from all servers to see what is going on. I will try it out once I get my hands back on our hardware-based test setup.

I'll try to write down a first cut of the docs tomorrow. If you have the time, you are welcome to tackle the startup issue; looking at my schedule, I don't think I will get enough time to implement it in the next 2 weeks.

Faffnir commented on June 14, 2024

I actually captured all the traffic with tcpdump, but maybe it is also possible to get a live update of the switch's ARP table...
If something really interesting happens, we will most likely write a blog post to publish the test results as well.
I will tackle the startup issue as soon as I can!
