
Comments (7)

timoreimann commented on May 30, 2024

Have you evaluated whether the latency you describe actually makes a difference in your environment? I'm not convinced that the optimizations you suggest have a significant impact. Forwarding requests from a slave to the master should be very fast in regular setups where all Marathon nodes live in the same network. So I'd urge caution about increasing code complexity here unless the benefit is truly measurable.

My personal experience is that there are other factors in Marathon operations that play a much bigger role and make the initial one or two round-trips to the endpoint look negligible (such as running health checks, downloading Docker images, and waiting for subsequent network elements like Bamboo or HAProxy to react in time). FWIW, the Marathon clients in our organization always hit the same Marathon node, and we execute up to a few hundred deployments each day.

from go-marathon.

mattes commented on May 30, 2024

Unfortunately, I have no reliable data right now. Proxying always comes at a cost: twice as many TCP connections, burnt CPU cycles, more points of failure, and so on. I think we can do better. In fact, I don't understand how we ended up with the current approach. I agree that complexity should be avoided; this is especially true for my "resending request" idea.

I can confirm all the other points you mentioned in the second paragraph. BTW, we started investigating plain iptables rules and IPVS to get rid of HAProxy. That being said, I think it's out of scope for this particular issue.

FWIW from my side, we use Marathon a little differently. We have a lot of deployments going on and rely somewhat on fast Marathon response times. We might end up writing our own framework.


timoreimann commented on May 30, 2024

I actually think it was smart not to build in any optimization logic in the first place. Marathon is supposed to do a good job at handling fail-over, including making sure there's always an endpoint responding. Redoing part of this logic inside the library also duplicates part of the (operational) responsibility. I'm not saying that we shouldn't do it, just that it doesn't come for free and that we should consider it carefully.

Can you elaborate further on how your usage of Marathon differs from the "standard" case? I'd like to better understand how shaving off a few milliseconds or so of network delay will help you.


mattes commented on May 30, 2024

We essentially try to guarantee that a Docker container we start is reachable within less than a second. The fluctuation is high.


msabramo commented on May 30, 2024

I wonder why the library supports passing in multiple endpoints, as opposed to passing in a single endpoint which points to a load balancer?

Was the multiple endpoints feature added because having the library handle this was thought to be superior, or was it just for folks who don’t have or don’t want to deal with a separate load balancer?

I would think that throwing an nginx, HAProxy, ELB, etc. in front of 3 Marathon servers and passing the LB address to go-marathon would give pretty good availability and latency. Are there benefits to be had from letting go-marathon be aware of the individual servers?


timoreimann commented on May 30, 2024

@msabramo the LB approach seems totally valid, and nothing should stop you from running go-marathon in that setup.

What go-marathon does under the hood, though, when you specify multiple endpoints is automatically fail over to other hosts if the initial one is not available, marking it as down and periodically trying to get it back into rotation if it proves to come alive again. I suppose you'd need, or at least want, to replicate some of the described behavior with your load balancer of choice as well. This would come on top of the general effort of maintaining a load balancer per se.

So each approach comes with different trade-offs; neither is necessarily better than the other, to my understanding.


msabramo commented on May 30, 2024

Thanks, @timoreimann!

We currently use the LB approach (and the LB is doing similar things to what go-marathon does), and I was wondering if we were losing out on resiliency benefits by not giving all of the endpoints to go-marathon. It sounds like we're not losing anything, so that's great.

Thanks for responding!

