Comments (6)

LincolnBryant commented on July 18, 2024

So one of the problems we had with scaling up Squids is that the monitoring port is impossible(?) to change in the WLCG monitoring. As far as I understand, they have everything hard-coded to port 3401 for Squid SNMP monitoring. Unfortunately, that makes a nodePort-based Squid more challenging in a number of ways. If you don't care whether the WLCG can see your SNMP ports, then it's definitely doable to switch things over to a StatefulSet.

I have some more thoughts but I'll have to follow up later :)

from slate-catalog.

rptaylor commented on July 18, 2024

True: if the number of members is > 1 (regardless of whether a Deployment or StatefulSet is used), they could not each have their own service on the same nodePort, since nodePort numbers are allocated cluster-wide. What is the use case for a nodePort-based Squid monitoring service? A nodePort is just a ClusterIP with an extra kube-proxy (iptables/IPVS) forwarding rule on top; it allows service discovery by an arbitrary local port number instead of a service name. For access from outside the cluster, it requires public IPs on the kubelet nodes, so IIUC you would either have a special (SPOF) node with the right public IP, or you would still need a LB on top across all of the nodes' public IPs. That being said, it can be a bit easier in some environments than setting up a k8s-native LBaaS.

The "standard" mechanism for exposing cluster services externally is ingress (and nearly all the ingress providers have TCP and UDP extensions). For our use case we would not encounter the nodePort issue; we just need ClusterIP services, and we would create Traefik IngressRoutes to expose them externally. (It did occur to me that it might be useful to be able to configure the Squid client and monitoring services separately instead of having them together.) In any case, figuring out a way to monitor each of multiple Squids individually is an issue, but to me it seems orthogonal to the network access method and to the Deployment vs. StatefulSet question.

In principle, for a StatefulSet of size 3, it would be possible to have 3 ClusterIP services and 3 ingresses to access the monitoring of each Squid independently. It might involve a for loop in Helm and some tricks, but I think it should be doable.
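
For illustration, a minimal sketch of the per-pod idea, written in Python rather than as a Helm template (in Helm the loop would be a `range` over the replica count). It assumes a StatefulSet named `squid` and relies on the `statefulset.kubernetes.io/pod-name` label that the StatefulSet controller puts on each pod, so every Service selects exactly one Squid. The helper name and the `-snmp` Service names are hypothetical, and the matching per-Squid ingresses are not shown.

```python
import json


def per_pod_monitoring_services(sts_name, replicas, namespace="default", snmp_port=3401):
    """Hypothetical helper: one ClusterIP Service per StatefulSet pod, so each
    Squid's SNMP endpoint can be reached individually."""
    services = []
    for ordinal in range(replicas):
        pod_name = f"{sts_name}-{ordinal}"  # StatefulSet pods are named <name>-<ordinal>
        services.append({
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {"name": f"{pod_name}-snmp", "namespace": namespace},
            "spec": {
                "type": "ClusterIP",
                # Label set by the StatefulSet controller on each pod it creates
                "selector": {"statefulset.kubernetes.io/pod-name": pod_name},
                "ports": [{"name": "snmp", "protocol": "UDP",
                           "port": snmp_port, "targetPort": snmp_port}],
            },
        })
    return services


if __name__ == "__main__":
    # Wrap the Services in a v1 List so the JSON can be piped to `kubectl apply -f -`
    manifest = {"apiVersion": "v1", "kind": "List",
                "items": per_pod_monitoring_services("squid", 3)}
    print(json.dumps(manifest, indent=2))
```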

rptaylor commented on July 18, 2024

Using ingress also provides a way to avoid issues with WLCG monitoring using the hard-coded port 3401: you can expose any arbitrary port externally and map it to any cluster service (details depend on the ingress provider).
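
As a sketch of what that could look like with Traefik, an `IngressRouteUDP` forwarding an entry point to a ClusterIP Service might be generated as below. This assumes the Traefik CRDs are installed and that an entry point named `snmp` is defined in Traefik's static configuration (e.g. `--entryPoints.snmp.address=:3401/udp`). The route and Service names are hypothetical, and older Traefik v2 releases use the `traefik.containo.us/v1alpha1` API group instead of `traefik.io/v1alpha1`.

```python
import json


def snmp_ingress_route_udp(name, service_name, entry_point="snmp",
                           service_port=3401, namespace="default"):
    """Hypothetical helper: a Traefik IngressRouteUDP that sends all UDP traffic
    arriving on `entry_point` to one ClusterIP Service."""
    return {
        # Older Traefik v2 releases use traefik.containo.us/v1alpha1
        "apiVersion": "traefik.io/v1alpha1",
        "kind": "IngressRouteUDP",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # The entry point (and its external port) is defined in Traefik's static config
            "entryPoints": [entry_point],
            "routes": [{"services": [{"name": service_name, "port": service_port}]}],
        },
    }


if __name__ == "__main__":
    print(json.dumps(snmp_ingress_route_udp("squid-0-snmp", "squid-0-snmp"), indent=2))
```

Because UDP routes carry no host or SNI rule, monitoring several Squids individually this way would presumably need a separate entry point (port) or external address per Squid.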

LincolnBryant commented on July 18, 2024

Right, as currently configured, the Squid is essentially bound to a single host, which is indeed a SPOF. We're running a bunch of tiny clusters distributed around the US that are mostly 1 node anyhow at the moment! For our use cases, we have clusters that use Squid in K8S as a replacement for Squid in e.g. a VM or on bare metal. So a public IP is required, because the workers accessing the Squid are not all in K8S.

Ingress is certainly possible too, although some folks expressed concerns that the ingress wouldn't be able to handle the high number of packets per second that a heavily utilized Squid would require. I haven't tested it, but I'm willing to see how it goes. We're largely using NGINX as our ingress controller for SLATE. I haven't had much experience routing general TCP/UDP traffic through ingresses; my impression (from several years ago now) was that it didn't work all that well.

What I would prefer of course is to just have the WLCG monitoring be a bit more amenable to cloud native ways of deploying things :)

So anyhow, for your use case, do I understand correctly that using a StatefulSet with volumeClaimTemplates would cover your needs? I have experience setting that up for other software and am happy to try it here. Most of our users are actually just using hostPath (I know, not desirable), so in that case we'd want to switch them over to something like local persistent volumes instead.
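
A minimal sketch of such a StatefulSet, assuming a placeholder image name, the conventional Squid ports (3128 for the proxy, 3401/UDP for SNMP), a cache directory of `/var/cache/squid`, and a hypothetical `local-storage` StorageClass backed by local persistent volumes. Each pod gets its own PersistentVolumeClaim from `volumeClaimTemplates`; the headless Service named in `serviceName` is assumed to exist and is not shown.

```python
import json


def squid_statefulset(name="squid", replicas=3, image="frontier-squid:latest",
                      storage_class="local-storage", cache_size="50Gi"):
    """Hypothetical StatefulSet manifest with one persistent cache volume per pod."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": name},
        "spec": {
            "serviceName": name,  # must name an existing headless Service (not shown)
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "squid",
                        "image": image,  # placeholder image name
                        "ports": [
                            {"name": "proxy", "containerPort": 3128, "protocol": "TCP"},
                            {"name": "snmp", "containerPort": 3401, "protocol": "UDP"},
                        ],
                        "volumeMounts": [{"name": "cache", "mountPath": "/var/cache/squid"}],
                    }],
                },
            },
            # The controller creates one PVC per pod from this template
            "volumeClaimTemplates": [{
                "metadata": {"name": "cache"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": storage_class,  # hypothetical StorageClass
                    "resources": {"requests": {"storage": cache_size}},
                },
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(squid_statefulset(), indent=2))
```

The controller names the claims `cache-squid-0`, `cache-squid-1`, and so on, and a rescheduled pod reattaches to the same claim.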

rptaylor commented on July 18, 2024

Certainly some ingress controllers are more performant than others under high load, but a lot of massive web-scale apps run on k8s behind ingress or service meshes. A single ingress pod should typically be able to handle ~10K HTTP RPS without much trouble, and you can scale up as many as needed, though TCP performance may be different (in principle I would expect it to be roughly comparable to anything else that involves routing to another node, like nodePort or NAT). We moved away from the NGINX community controller due to poor performance and security issues.

Anyway, our Squid clients (compute jobs) are all inside the cluster, so it won't be an issue for us, nor for the other users you mention as long as they continue to use a nodePort with a single Squid pod.

> do I understand correctly that using a StatefulSet with VolumeClaimTemplates would cover your needs?

Yep I think so!

For the record I'm also looking at https://github.com/sciencebox/charts/tree/master/frontier-squid

rptaylor commented on July 18, 2024

Actually, it seems that frontier-squid still deletes the cache every time it is restarted (because apparently Squid can corrupt the cache on restart), so persistent storage wouldn't be useful anyway. :/
So a Deployment with more than one replica and ephemeral storage would work. The sciencebox chart already does that, so I'm going to give it a try.
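
For comparison, a minimal sketch of that Deployment variant, again with a placeholder image and an assumed `/var/cache/squid` cache path. The cache is an `emptyDir` volume, so it is discarded along with the pod, which is consistent with frontier-squid clearing its cache on restart anyway. The 20Gi `sizeLimit` is an arbitrary example value.

```python
import json


def squid_deployment(name="squid", replicas=2, image="frontier-squid:latest",
                     cache_limit="20Gi"):
    """Hypothetical Deployment manifest: interchangeable Squid replicas, each with
    a scratch cache that does not outlive its pod."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "squid",
                        "image": image,  # placeholder image name
                        "ports": [{"name": "proxy", "containerPort": 3128}],
                        "volumeMounts": [{"name": "cache",
                                          "mountPath": "/var/cache/squid"}],
                    }],
                    # emptyDir is deleted with the pod; sizeLimit caps its disk usage
                    "volumes": [{"name": "cache",
                                 "emptyDir": {"sizeLimit": cache_limit}}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(squid_deployment(replicas=3), indent=2))
```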
