The deployment <a href="https://github.com/slateci/slate-catalog/blob/master/stable/os

frontier-squid is a deployment of size 1 and can not be scaled up

slateci commented on July 18, 2024

frontier-squid is a deployment of size 1 and can not be scaled up

from slate-catalog.

Comments (6)

LincolnBryant commented on July 18, 2024

So one of the problems we had with scaling up squids is that the monitoring port is impossible(?) to change in the WLCG monitoring. As far as I understand, they have everything hard-coded to port 3401 for Squid SNMP monitoring. That makes a nodePort-based Squid a bit more challenging in a number of ways, unfortunately. If you don't care about whether the WLCG can see your SNMP ports, then it's definitely doable to change things over to a StatefulSet.

I have some more thoughts but I'll have to follow up later :)

rptaylor commented on July 18, 2024

True, if the number of members is > 1 (regardless of whether a deployment or statefulset is used) they could not all use the same nodePort. What is the use case for a nodePort-based squid monitoring service? A nodePort is just a ClusterIP with an extra kube-proxy/IPVS forwarding rule on top; it allows service discovery by an arbitrary local port number instead of a service name. For access from outside the cluster, it requires public IPs on the kubelet nodes, so IIUC you would either have a special (SPOF) node with the right public IP, or you would still need a LB on top across all of the public IPs of the nodes. That being said, it can be a bit easier in some environments than setting up a k8s-native LBaaS.

The "standard" mechanism for exposing external access to cluster services is ingress (and nearly all the ingress providers have TCP and UDP extensions). Anyway for our use case we would not encounter the nodePort issue; we just need ClusterIP services and we would create Traefik IngressRoutes to expose them externally. (The thought did occur to me that it might be useful to be able to configure the squid client and monitoring services differently instead of having them together.) Anyway figuring out a way to monitor each one of multiple squids individually is an issue, but to me it seems orthogonal to the network access method and deployment vs statefulset question.

In principle, for a statefulset of size 3, it would be possible to have 3 ClusterIP services and 3 ingresses, to access the monitoring of each squid independently. It might involve a for loop in Helm and some tricks, but it should be doable, I think.
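To make the per-replica idea concrete, here is a minimal sketch in Python (standing in for the Helm loop) that builds one ClusterIP Service per StatefulSet pod. It relies on the `statefulset.kubernetes.io/pod-name` label that Kubernetes sets on StatefulSet pods. The function name, the `-snmp` Service name suffix, and the StatefulSet name `frontier-squid` are assumptions for illustration, not taken from the SLATE chart.

```python
import json


def per_replica_services(statefulset_name, replicas, snmp_port=3401):
    """Build one ClusterIP Service manifest per StatefulSet pod.

    Hypothetical helper: the naming scheme is an assumption, not the chart's.
    """
    services = []
    for ordinal in range(replicas):
        # StatefulSet pods are named <statefulset>-<ordinal>.
        pod_name = f"{statefulset_name}-{ordinal}"
        services.append({
            "apiVersion": "v1",
            "kind": "Service",
            "metadata": {"name": f"{pod_name}-snmp"},
            "spec": {
                "type": "ClusterIP",
                # This label is added to each StatefulSet pod by Kubernetes,
                # so the selector matches exactly one pod.
                "selector": {"statefulset.kubernetes.io/pod-name": pod_name},
                "ports": [{
                    "name": "snmp",
                    "protocol": "UDP",  # Squid SNMP monitoring uses UDP
                    "port": snmp_port,
                    "targetPort": snmp_port,
                }],
            },
        })
    return services


if __name__ == "__main__":
    for svc in per_replica_services("frontier-squid", 3):
        print(json.dumps(svc, indent=2))
```

Each generated Service could then be the target of its own ingress route; the ingress side depends on the provider and is not sketched here.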

rptaylor commented on July 18, 2024

Using ingress also provides a way to avoid issues with WLCG monitoring using the hardcoded port 3401; you can externally expose any arbitrary port and map it to any cluster service (details depend on ingress provider).

LincolnBryant commented on July 18, 2024

Right, as currently configured the squid is essentially bound to a single host, which is indeed a SPOF. We're running a bunch of tiny clusters distributed around the US that are mostly 1 node anyhow at the moment! For our use cases we have clusters that are using Squid in K8S as a replacement for Squid e.g. in a VM or on bare metal. So a public IP is required because the workers accessing the squid aren't in K8S.

Ingress is certainly possible too although some folks expressed concerns that the ingress wouldn't be able to handle the high number of packets per second that a heavily utilized Squid would require. I haven't tested it, but willing to see how it goes. We're largely using NGINX as our Ingress controller for SLATE - I haven't had much experience with trying to route general TCP/UDP packets through ingresses - my impression (from several years ago now) was that it didn't work all that well.

What I would prefer of course is to just have the WLCG monitoring be a bit more amenable to cloud native ways of deploying things :)

So anyhow, for your use case, do I understand correctly that using a StatefulSet with VolumeClaimTemplates would cover your needs? I have experience setting that up for other software, happy to try it here. Most of our users are actually just using hostPath (I know, not desirable) so we'd want to switch them over to using something like the local persistent volume provider instead in that case.

rptaylor commented on July 18, 2024

Certainly some ingress controllers are more performant than others under high load, but a lot of massive web-scale apps run on k8s behind ingress or service meshes. A single ingress pod should typically be able to handle ~10K HTTP RPS without much trouble and you can scale up as many as needed, though the TCP performance may be different (in principle I would think it should be approximately comparable to anything else that involves routing to another node, like nodePort or NAT). We moved away from the NGINX community controller due to poor performance and security issues.

Anyway our squid clients (compute jobs) are all inside the cluster so it won't be an issue for us, nor for the other users you mention if they continue to use nodeport and number of squid pods = 1.

> do I understand correctly that using a StatefulSet with VolumeClaimTemplates would cover your needs?

Yep I think so!

For the record I'm also looking at https://github.com/sciencebox/charts/tree/master/frontier-squid

rptaylor commented on July 18, 2024

Actually it seems that frontier-squid still deletes the cache every time it is restarted (because apparently squid can corrupt the cache on restart), so persistent storage wouldn't be useful anyway. :/
So a deployment of size > 1 with ephemeral storage would work. The sciencebox chart already does that so I'm going to give it a try.
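For illustration, a deployment of size > 1 with ephemeral storage might look roughly like the manifest built below, written as a Python dict. This is only a sketch: the image reference, the cache mount path, and the `app` label are placeholders I am assuming, not values taken from the sciencebox or SLATE charts. Port 3128 is the stock Squid HTTP port.

```python
import json


def squid_deployment(name="frontier-squid", replicas=3,
                     image="example.org/frontier-squid:latest",  # placeholder image
                     cache_path="/var/cache/squid"):             # assumed cache path
    """Build a Deployment manifest whose cache lives on an emptyDir volume."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "squid",
                        "image": image,
                        "ports": [{"containerPort": 3128, "protocol": "TCP"}],
                        "volumeMounts": [
                            {"name": "cache", "mountPath": cache_path},
                        ],
                    }],
                    # emptyDir is discarded when the pod goes away, which is
                    # fine here because the cache is wiped on restart anyway.
                    "volumes": [{"name": "cache", "emptyDir": {}}],
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(squid_deployment(), indent=2))
```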
