
stunner-gateway-operator's Introduction


l7mp: A L7 Multiprotocol Proxy and Service Mesh

[L7mp is currently under construction, with many advertised features untested, not working as promised, or completely missing.]

L7mp is an experimental Layer-7, multiprotocol service proxy and a service mesh framework. The emphasis is on multiprotocol support, which lets l7mp handle a wide range of transport- and application-layer network protocols natively, not just the usual TCP/HTTP, and transparently convert between different protocol encapsulations. The intention is for l7mp to serve as an incubator project to prototype the main service mesh features that are indispensable for supporting network-intensive legacy/non-HTTP applications seamlessly in Kubernetes.

The distribution contains an l7mp proxy component and a service mesh operator for Kubernetes.

The l7mp proxy is a programmable proxy very similar in nature to Envoy. The difference is that the l7mp proxy is purpose-built from the ground up to support multiprotocol operations, in that it can stitch an arbitrary number of application-level traffic streams together into an end-to-end stream in a protocol-agnostic manner; e.g., you can pipe a UNIX domain socket to a WebSocket stream and vice versa and it should just work as expected. The proxy is written in a high-level language framework, Node.js, which makes it particularly easy to extend: adding a new protocol to l7mp is a matter of implementing a custom listener and a cluster, usually about a hundred lines of JavaScript code. Meanwhile, a tc/ebpf-based kernel acceleration service is in development to mitigate the JavaScript performance tax.

The l7mp service mesh operator can be used to manage a legion of l7mp gateway and sidecar proxy instances seamlessly. It makes it possible to enforce a rich set of high-level traffic management and observability policies throughout an entire cluster, enjoying the convenience of a high-level Kubernetes API, much like the Istio or Service Mesh Interface APIs.

The l7mp framework is work-in-progress. This means that at any point in time some features may not work as advertised or may not work at all, and some critical features, including the security API, are left for further development. Yet, l7mp is already capable enough to serve as a demonstrator to get a glimpse into the multiprotocol future of the service mesh concept.

The l7mp data plane

The data plane of the l7mp framework comprises a set of l7mp proxy instances. The l7mp proxy supports multiple deployment models; e.g., it can be deployed as an ingress gateway to feed traffic with exotic protocol encapsulations into a Kubernetes cluster, or as a sidecar proxy to expose a legacy UDP/SCTP application to a Kubernetes cluster using a cloud-native protocol.

The l7mp proxy is modeled after Envoy, in that it uses similar abstractions (Listeners, Clusters, etc.), but in contrast to Envoy, which is mostly HTTP/TCP-centric, l7mp is optimized for persistent, long-lived UDP-based media and tunneling protocol streams. The l7mp proxy features an extended routing API, which makes it possible to transparently pipe application streams across diverse protocol encapsulations, with automatic and transparent protocol transformation, native support for datagram- and byte-streams, stream multiplexing and demultiplexing, encapsulation/decapsulation, etc.

Considering the strong emphasis on multiprotocol support, the l7mp proxy may actually be closer in nature to socat(1) than to Envoy, but it is dynamically configurable via a REST API, in contrast to socat(1), which is a static CLI tool (in turn, socat is much more feature-complete).

The l7mp proxy is written in JavaScript/Node.js. This makes it much simpler and easier to extend than Envoy or socat, but at the same time also much slower. It does not have to stay that way, though; a tc/ebpf-based proxy-acceleration framework is under construction that would enable l7mp to run at hundreds of thousands of packets per second.

The l7mp control plane

The l7mp distribution contains a Kubernetes operator that makes it possible to deploy and configure multiple instances of l7mp as sidecar proxies and service/API gateways, in a framework that can be best described as a multiprotocol service mesh. The operator uses the same high-level concepts as most service mesh frameworks (i.e., VirtualServices), but it contains a number of extensions (the Route and the Target custom resources) that allow the user to precisely control the way traffic is routed across the cluster.

Deployment models

Currently there are two ways to deploy l7mp: either the l7mp proxy is deployed in standalone mode (e.g., as a gateway or a sidecar proxy), in which case each distinct l7mp proxy instance needs to be configured separately (using a static config file or via the l7mp proxy REST API), or it is used in conjunction with the l7mp service mesh operator for Kubernetes, which makes it possible to manage a possibly large number of l7mp proxy instances with the convenience of a high-level Kubernetes API.

The l7mp service mesh

In this short introduction we use Minikube to demonstrate the installation of the l7mp service mesh. Of course, the below Helm charts can be used to deploy l7mp into any Kubernetes cluster.

Set up l7mp inside a Minikube cluster

First, install kubectl, minikube, and helm:

  • For installing kubectl and minikube please follow this guide: Install Tools
  • For installing helm please follow this guide: Installing Helm. Note that with Helm 2 the below commands may take a slightly different form.

Then, bootstrap your minikube cluster and deploy the l7mp-ingress helm chart.

minikube start
helm repo add l7mp https://l7mp.io/charts
helm repo update
helm install l7mp l7mp/l7mp-ingress

WARNING: the l7mp-ingress chart will automatically (1) deploy the l7mp proxy in the host network namespace of all your Kubernetes nodes and (2) open up two HTTP ports (the controller port 1234 and the Prometheus scraping port 8080) for unrestricted external access on each of your nodes. If your nodes are reachable externally on these ports, this will allow unauthorized access to the ingress gateways of your cluster. Before installing this Helm chart, make sure that you filter ports 1234 and 8080 on your cloud load-balancer. Use this chart only for testing, and never deploy it in production unless you understand the potential security implications.

This configuration will deploy the following components into the default namespace:

  • l7mp-ingress: an l7mp proxy pod at each node (a DaemonSet) sharing the network namespace of the host (hostNetwork=true), plus a Kubernetes service called l7mp-ingress. The proxies make up the data-plane of the l7mp service mesh.
  • l7mp-operator: a control plane pod that takes a high-level mesh configuration as a set of Kubernetes Custom Resource objects (i.e., VirtualServices, Targets, etc.) as input and creates the appropriate data-plane configuration, i.e., a series of REST calls to the l7mp proxies, to map the high-level intent to the data plane.

To add the l7mp Prometheus toolchain to the monitoring namespace and automatically surface data-plane metrics from the l7mp proxies, install the l7mp-prometheus chart:

helm install l7mp-prometheus l7mp/l7mp-prometheus

After the installation finishes, your Prometheus instance will be available on the NodePort 30900.
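To quickly verify that Prometheus is reachable on that NodePort (the /-/healthy endpoint used below is standard Prometheus, not something l7mp-specific):

curl -s http://$(minikube ip):30900/-/healthy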

You can check the status of your l7mp deployment as usual:

kubectl get pod,svc,vsvc,target,rule -o wide -n default
kubectl get pod,svc -o wide -n monitoring

You should see an output like:

NAME                                      READY   STATUS    RESTARTS   AGE     IP              NODE       NOMINATED NODE   READINESS GATES
pod/alertmanager-alertmanager-0           2/2     Running   0          2m34s   172.17.0.8      minikube   <none>           <none>
pod/grafana-86b84774bb-7s7kq              1/1     Running   0          3m10s   172.17.0.5      minikube   <none>           <none>
pod/kube-state-metrics-7df77cbbd6-x27x5   3/3     Running   0          3m10s   172.17.0.4      minikube   <none>           <none>
pod/node-exporter-j59fj                   2/2     Running   0          3m10s   192.168.39.45   minikube   <none>           <none>
pod/prometheus-operator-9db5cb44b-hf7cq   1/1     Running   0          3m10s   172.17.0.6      minikube   <none>           <none>
pod/prometheus-prometheus-0               2/2     Running   1          2m33s   172.17.0.9      minikube   <none>           <none>
pod/prometheus-prometheus-1               2/2     Running   1          2m33s   172.17.0.10     minikube   <none>           <none>

NAME                            TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE     SELECTOR
service/alertmanager            NodePort    10.102.201.47    <none>        9093:30903/TCP               3m10s   alertmanager=alertmanager
service/alertmanager-operated   ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP   2m34s   app=alertmanager
service/grafana                 NodePort    10.104.212.103   <none>        80:30901/TCP                 3m10s   app=grafana
service/kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP            3m10s   app.kubernetes.io/name=kube-state-metrics
service/node-exporter           ClusterIP   None             <none>        9100/TCP                     3m10s   app.kubernetes.io/name=node-exporter
service/prometheus              NodePort    10.104.58.199    <none>        9090:30900/TCP               3m10s   app=prometheus
service/prometheus-operated     ClusterIP   None             <none>        9090/TCP                     2m34s   app=prometheus
service/prometheus-operator     ClusterIP   None             <none>        8080/TCP                     3m10s   app.kubernetes.io/component=controller,app.kubernetes.io/name=prometheus-operator

You are ready to go! Enjoy using l7mp.

Query configuration and manage sessions

At any point in time you can directly read the configuration of the l7mp proxies using the l7mp REST API. By default, the l7mp proxy HTTP REST API is opened at port 1234 on all proxy pods. This is extremely useful for checking your mesh configuration for debugging purposes, but as mentioned above it also opens a considerable security hole if the port is reachable from outside your cluster.

The below call returns the whole configuration of the ingress gateway l7mp proxy:

curl http://$(minikube ip):1234/api/v1/config

To query the list of active connections through the data plane and delete the session named session-name, you can use the below REST API calls:

curl http://$(minikube ip):1234/api/v1/sessions
curl -iX DELETE http://$(minikube ip):1234/api/v1/sessions/<session-name>

Usage example:

Applying the below configuration will expose the kube-dns Kubernetes system DNS service through the l7mp ingress gateway on port 5053. Note that, depending on the type of DNS service deployed, the below may or may not work in your own cluster.

kubectl apply -f - <<EOF
apiVersion: l7mp.io/v1
kind: VirtualService
metadata:
  name: kube-dns-vsvc
spec:
  selector:
    matchLabels:
      app: l7mp-ingress
  listener:
    spec:
      UDP:
        port: 5053
    rules:
      - action:
          route:
            destination:
              spec:
                UDP:
                  port: 53
              endpoints:
                - spec: { address:  "kube-dns.kube-system.svc.cluster.local" }
EOF

In and of itself, this configuration does nothing fancier than exposing the kube-dns service over a NodePort. The additional features provided by l7mp, including routing, timeouts/retries, load-balancing and monitoring, can be enabled by customizing this VirtualService spec. For more information on the use of the l7mp service mesh, consult the Tasks section in the documentation.
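For instance, the following sketch adds automatic retries to the same route; this is an assumption, namely that the VirtualService route spec accepts the same retry block as the l7mp proxy REST API shown later in this README:

kubectl apply -f - <<EOF
apiVersion: l7mp.io/v1
kind: VirtualService
metadata:
  name: kube-dns-vsvc-retry
spec:
  selector:
    matchLabels:
      app: l7mp-ingress
  listener:
    spec:
      UDP:
        port: 5054
    rules:
      - action:
          route:
            destination:
              spec:
                UDP:
                  port: 53
              endpoints:
                - spec: { address: "kube-dns.kube-system.svc.cluster.local" }
            retry: { retry_on: always, num_retries: 3, timeout: 2000 }
EOF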

Test

Issue a DNS query to your Kubernetes cluster:

dig @$(minikube ip) +timeout=1 +notcp +short kube-dns.kube-system.svc.cluster.local -p 5053
10.96.0.10

The above call will send a DNS query to the minikube cluster, which the l7mp ingress gateway will properly route to the kube-dns service (after querying the same DNS service for the ClusterIP corresponding to kube-dns) and deliver the result back to the sender.

Clean up

Delete the VirtualService we created above:

kubectl delete virtualservice kube-dns-vsvc

To delete the entire l7mp service mesh, simply delete the Helm release. Note that this will not remove the Custom Resource Definitions installed by the l7mp Helm chart; you will need to delete those manually:

helm delete l7mp

The l7mp proxy

Installation

Standalone installation

Use the below to install the l7mp proxy from the official l7mp distribution at npm.js.

npm install l7mp
npm test

At least Node.js v14 is required.

Docker installation

Pull the official image with docker pull l7mp/l7mp:latest or use the enclosed Dockerfile to build and deploy the l7mp proxy.
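For a quick local try-out, a minimal sketch might look like the following; this assumes you have a config file (e.g., the l7mp-minimal.yaml shipped with the npm package) in ./config, and the command, arguments, and /app/config mount point simply mirror the Kubernetes manifest in the next section:

docker run -d --name l7mp -p 1234:1234 \
  -v $(pwd)/config:/app/config \
  l7mp/l7mp:latest \
  node l7mp-proxy.js -c config/l7mp-minimal.yaml -s -l info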

Deploy into Kubernetes

Use the below configuration to deploy l7mp as an ingress gateway in your Kubernetes cluster.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: l7mp-ingress-gw
  labels:
    app: l7mp-ingress-gw
spec:
  selector:
    matchLabels:
      app: l7mp-ingress-gw
  template:
    metadata:
      labels:
        app: l7mp-ingress-gw
    spec:
      volumes:
        - name: l7mp-ingress-gw-config
          configMap:
            name: l7mp-ingress-gw
      containers:
      - name: l7mp
        image: l7mp/l7mp:latest
        imagePullPolicy: IfNotPresent
        command: [ "node" ]
        args: [ "l7mp-proxy.js", "-c", "config/l7mp-ingress-gw.yaml", "-s", "-l", "info" ]
        ports:
        - containerPort: 1234
        volumeMounts:
          - name: l7mp-ingress-gw-config
            mountPath: /app/config
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet

---

# Controller listening on 1234
apiVersion: v1
kind: ConfigMap
metadata:
  name: l7mp-ingress-gw
data:
  l7mp-ingress-gw.yaml: |
    admin:
      log_level: info
      log_file: stdout
      access_log_path: /tmp/admin_access.log
    listeners:
      - name: controller-listener
        spec: { protocol: HTTP, port: 1234 }
        rules:
          - action:
              route:
                cluster:
                  spec: { protocol: L7mpController }

Usage example

Run

The below usage examples assume that the l7mp proxy is deployed in standalone mode and is available on localhost.

Run l7mp locally with a sample static configuration.

cd node_modules/l7mp
node l7mp-proxy.js -c config/l7mp-minimal.yaml -l warn -s

Configuration is accepted either in YAML format (if the extension is .yaml) or JSON (otherwise). Command line arguments override static configuration parameters.
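The minimal sample config is not reproduced here, but based on the ingress-gateway ConfigMap shown earlier it presumably looks roughly as follows (a sketch, not the actual file): an admin block plus an HTTP controller listener on port 1234 routed to the L7mpController cluster.

admin:
  log_level: warn
  log_file: stdout
  access_log_path: /tmp/admin_access.log
listeners:
  - name: controller-listener
    spec: { protocol: HTTP, port: 1234 }
    rules:
      - action:
          route:
            cluster:
              spec: { protocol: L7mpController }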

Query configuration

The sample configuration will fire up an HTTP listener on port 1234 and route it to the l7mp controller that serves the l7mp REST API. This API can be used to query or configure the proxy on the fly; e.g., the below will dump the full configuration in JSON format:

curl http://localhost:1234/api/v1/config

For a list of all REST API endpoints, see the l7mp OpenAPI specs.

Manage sessions

On top of the static configuration, the response contains a list of sessions, enumerating the set of active (connected) streams inside l7mp. You can list the live sessions explicitly as follows:

curl http://localhost:1234/api/v1/sessions

You should see only a single HTTP session: this session was created by the l7mp proxy to route the REST API query from the HTTP listener to the controller endpoint and this session happens to be active when the session list request is issued.

You can also delete any session (suppose its name is session-name) via the below REST API call.

curl -iX DELETE http://localhost:1234/api/v1/sessions/<session-name>

Add a new cluster

Add a new WebSocket cluster named ws-cluster that will connect to an upstream WebSocket service with a single endpoint at localhost:16000.

curl -iX POST --header 'Content-Type:text/x-yaml' --data-binary @- <<EOF  http://localhost:1234/api/v1/clusters
cluster:
  name: ws-cluster
  spec: { protocol: "WebSocket", port: 16000 }
  endpoints:
    - spec: { address:  "127.0.0.1" }
EOF

Note that the REST API accepts both JSON and YAML configs (YAML will be converted to JSON internally). If multiple endpoints are added, l7mp will load-balance among these; e.g., the below will distribute connections across 3 upstream endpoints in proportion 3:1:1 and also implement sticky sessions, by applying consistent hashing on the source IP address of each connection.

curl -iX POST --header 'Content-Type:text/x-yaml' --data-binary @- <<EOF  http://localhost:1234/api/v1/clusters
cluster:
  name: ws-cluster-with-sticky-sessions
  spec: { protocol: "WebSocket", port: 16000 }
  endpoints:
    - spec: { address:  "127.0.0.1" }
      weight: 3
    - spec: { address:  "127.0.0.2" }
    - spec: { address:  "127.0.0.3" }
  loadbalancer:
    policy: "ConsistentHash"
    key: "IP/src_addr"
EOF

Add a new listener and a route

Now add a new UDP listener called udp-listener at port 15000 that will accept connections from any IP address, but only with source port 15001, and route the received connections to the above cluster (which, recall, we named ws-cluster).

curl -iX POST --header 'Content-Type:text/x-yaml' --data-binary @- <<EOF  http://localhost:1234/api/v1/listeners
listener:
  name: udp-listener
  spec: { protocol: UDP, port: 15000, connect: {port: 15001} }
  rules:
    - action:
        route:
          destination: ws-cluster
          ingress:
            - spec: { protocol: Logger }
          retry: {retry_on: always, num_retries: 3, timeout: 2000}
EOF

There is an important quirk here. The route spec in the above REST API call specifies a new cluster (the one with the protocol Logger), but this specification is embedded into the route definition. Here, Logger is a special transform cluster that will instruct l7mp to log all traffic arriving from the stream's source (the UDP listener) to the destination (the WebSocket cluster) to the standard output. Of course, we could have added this cluster in a separate REST API call as well:

curl -iX POST --header 'Content-Type:text/x-yaml' --data-binary @- <<EOF  http://localhost:1234/api/v1/clusters
cluster:
  name: logger-cluster
  spec: { protocol: "Logger" }
EOF

And then we could let the route simply refer to this cluster by name:

curl -iX POST --header 'Content-Type:text/x-yaml' --data-binary @- <<EOF  http://localhost:1234/api/v1/listeners
listener:
  name: udp-listener-with-no-embedded-cluster-def
  spec: { protocol: UDP, port: 15000, connect: {port: 15001} }
  rules:
    - action:
        route:
          destination: ws-cluster
          ingress:
            - logger-cluster
          retry: {retry_on: always, num_retries: 3, timeout: 2000}
EOF

This flexibility of l7mp to accept explicit and implicit (embedded) configurations is available in essentially all REST API calls, and it greatly simplifies the use of the API.

Routing

On session creation, l7mp will demultiplex the bidirectional stream received at the listener into two uni-directional streams: the ingress stream (in the direction from the source/listener to the destination/cluster) will be routed through the Logger transform cluster. Theoretically, a transform cluster is free to apply any modification it wants to the traffic passing through it; it can be local (built into the l7mp datapath, like Logger) or remote (e.g., another WebSocket cluster), the only requirement being that the cluster endpoint listen at the specified address and port and send the modified traffic back to l7mp. For now, the Logger cluster just dumps the content of the stream without transforming it in any way, but you get the point. The returned stream is then piped to the cluster ws-cluster. In the egress direction (from the destination/cluster back to the source/listener), no transformation occurs, as the egress chain spec is missing.

The ingress and the egress routes are specified and handled separately. Both routes can contain a list of any number of transform clusters that will be chained sequentially, automatically performing transparent protocol and payload conversion along the way. Note that datagram boundaries are preserved during transformation whenever possible; when they cannot be (e.g., piping a UDP stream to a TCP cluster loses segmentation), l7mp issues a warning.

The above should yield the routes:

ingress: udp-listener -> logger-cluster -> ws-cluster
egress:  ws-cluster -> udp-listener
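For completeness, here is a hedged sketch of a similar listener that adds the Logger to the egress route as well; the egress field is assumed to accept the same cluster list format as ingress, and the ports are changed to avoid clashing with the listener above:

curl -iX POST --header 'Content-Type:text/x-yaml' --data-binary @- <<EOF  http://localhost:1234/api/v1/listeners
listener:
  name: udp-listener-logged-both-ways
  spec: { protocol: UDP, port: 15002, connect: {port: 15003} }
  rules:
    - action:
        route:
          destination: ws-cluster
          ingress:
            - logger-cluster
          egress:
            - logger-cluster
          retry: {retry_on: always, num_retries: 3, timeout: 2000}
EOF

This would yield a symmetric pair of routes, with the Logger on both the ingress and the egress chain.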

Retries and timeouts

Route specifications may contain a retry spec, in order to describe what to do when one of the connected endpoints fails. By the above spec, l7mp will automatically retry the connection at most 3 times, both on connection setup errors and on disconnect events on already established connections, waiting each time 2000 ms for the stream to be successfully re-established.

Test the connection

To complete the connection, fire up a socat(1) sender (don't forget to bind the sender to 15001, otherwise l7mp, which connects back to this port, will not accept the connection):

socat - udp:localhost:15000,sourceport=15001

Then start a websocat receiver:

websocat -Eb ws-l:127.0.0.1:16000 -

What you type in the sender should now appear at the receiver verbatim, and the l7mp proxy should report everything that passes from the sender to the receiver on the standard output. Note that in the reverse direction, i.e., from the receiver to the sender, nothing will be logged, since the Logger was added to the ingress route only but not to the egress route.

Clean up

Provided that the new session is named session-name (l7mp automatically assigns a unique name to each session, you can check this by issuing a GET request to the API endpoint /api/v1/sessions), you can delete this session as follows:

curl -iX DELETE http://localhost:1234/api/v1/sessions/<session-name>

In addition, use the below to remove the udp-listener and ws-cluster:

curl -iX DELETE http://localhost:1234/api/v1/listeners/udp-listener
curl -iX DELETE http://localhost:1234/api/v1/clusters/ws-cluster

Note, however, that this will delete only the named listener and cluster even though, as mentioned above, these objects may contain several embedded objects; e.g., udp-listener contains an implicit rulelist (a match-action table) with a single match-all rule, plus a route and an embedded cluster spec ("Logger"), and these will not be removed by the above call.

You can use the below recursive version of the delete operations to delete all the embedded sub-objects of an object, but bear in mind that this will remove everything that was implicitly defined by udp-listener and ws-cluster, including all the sessions emitted by the listener and all the sessions routed via the cluster.

curl -iX DELETE http://localhost:1234/api/v1/listeners/udp-listener?recursive=true
curl -iX DELETE http://localhost:1234/api/v1/clusters/ws-cluster?recursive=true

You can avoid this by not using embedded defs or, if this is too inconvenient, explicitly naming all embedded objects and then using the specific APIs (the RuleList API, Rule API, etc.) to clean up each object selectively.

Multiprotocol Support

The main feature l7mp intends to get right is multiprotocol support. While l7mp is optimized for persistent, long-lived UDP-based media and tunneling protocol streams, and hence support for the usual HTTP protocol suite is incomplete as of now, it should already be pretty capable as a general-purpose multiprotocol proxy and service mesh, supporting a number of built-in transport and application-layer protocols. Below is a summary of the protocols supported by l7mp and the current status of the implementations.

Type       Protocol     Session ID                Stream type       Role  Mode               Re/Lb    Status
Remote     UDP          IP 5-tuple                datagram-stream   l/c   singleton/server   yes/yes  Full
           TCP          IP 5-tuple                byte-stream       l/c   server             yes/yes  Full
           HTTP         IP 5-tuple                byte-stream       l     server             yes/yes  Partial
           WebSocket    IP 5-tuple + HTTP         datagram-stream   l/c   server             yes/yes  Full
           JSONSocket   IP 5-tuple + JSON header  datagram-stream   l/c   server             yes/yes  Full
           SCTP         IP 5-tuple                datagram-stream   l/c   server             yes/yes  TODO
           AF_PACKET    file desc                 datagram-stream   l/c   singleton          no/no    TODO
Local      STDIO-fork   N/A                       byte-stream       c     singleton          no/no    Full
           UNIX/stream  file desc/path            byte-stream       l/c   server             yes/yes  Full
           UNIX/dgram   file desc/path            datagram-stream   l/c   singleton          no/no    TODO
           PIPE         file desc/path            byte-stream       l/c   singleton          no/no    TODO
Transform  Stdio        N/A                       byte-stream       c     singleton          yes/no   Full
           Echo         N/A                       datagram-stream   c     singleton          yes/no   Full
           Discard      N/A                       datagram-stream   c     singleton          yes/no   Full
           Logger       N/A                       datagram-stream   c     singleton          yes/no   Full
           JSONEncap    N/A                       datagram-stream   c     singleton          yes/no   Full
           JSONDecap    N/A                       datagram-stream   c     singleton          yes/no   Full

The standard protocols, like TCP, HTTP/1.1 and HTTP/2 (although only on the listener/server side at the moment), WebSocket, and Unix Domain Socket (of the byte-stream type, see below), are fully supported, and for plain UDP there are two modes available: in "UDP singleton mode" l7mp acts as a "connected" UDP server that is statically tied/connected to a downstream remote IP/port pair, while in "UDP server mode" l7mp emits a new "connected" UDP session for each packet received with a new IP 5-tuple. In addition, JSONSocket is a very simple "UDP equivalent of WebSocket" that makes it possible to enrich a plain UDP stream with arbitrary JSON-encoded metadata; see the spec here. Finally, SCTP is a reliable message transport protocol widely used in telco applications, and AF_PACKET would allow sending and receiving raw L2/Ethernet or L3/IP packets on a stream; adding proper support for these protocols is currently a TODO.

Furthermore, there is a set of custom pseudo-protocols included in the l7mp proxy to simplify debugging and troubleshooting: the "Stdio" protocol makes it possible to pipe a stream to the l7mp proxy's stdin/stdout, the "Echo" protocol implements a simple echo server that writes back everything it reads to the input stream, "Discard" simply blackholes everything it receives, and "Logger" is like the Echo protocol but it also writes everything that goes through it to a file or to the standard output. Finally, there are a couple of additional protocols (currently unimplemented) to further improve the usability of l7mp (see the equivalents in socat(1)): "STDIO-fork" is a protocol for communicating with a forked process through STDIN/STDOUT, and PIPE uses standard UNIX pipes to do the same.

There are two types of streams supported by l7mp: a "byte-stream" (like TCP or Unix Domain Sockets in SOCK_STREAM mode) is a bidirectional stream that ignores segmentation/message boundaries, while a "datagram-stream" is the same but it preserves segmentation/message boundaries whenever possible (e.g., UDP or WebSocket). The l7mp proxy warns if a datagram-stream is routed to a byte-stream protocol, because this would lead to a loss of message segmentation. In addition, protocols may support either or both of the following two modes: a "singleton"-mode protocol accepts only a single connection (e.g., a fully connected UDP listener will emit only a single session), while a "server"-mode listener may accept multiple client connections, emitting a separate session for each connection received (e.g., a TCP or an HTTP listener).
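To make the singleton/server distinction concrete, here is a hedged sketch of two UDP listener specs; the connected variant follows the listener used earlier in this README, while the server-mode variant rests on the assumption that server mode is obtained simply by omitting the connect block:

# connected ("singleton"-mode) UDP listener: statically tied to one remote address/port
listener:
  name: udp-singleton
  spec: { protocol: UDP, port: 15000, connect: { address: "127.0.0.1", port: 15001 } }

# "server"-mode UDP listener: emits a new session per new IP 5-tuple (assumption: omit 'connect')
listener:
  name: udp-server
  spec: { protocol: UDP, port: 15010 }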

A protocol is marked with the flag l if it has a listener implementation in l7mp, acting as a server-side protocol "plug" that listens to incoming connections from downstream peers and emits new sessions, and with the flag c if it implements the cluster side, i.e., the client side of the protocol that can route a connection to an upstream service and load-balance across a set of remote endpoints. Re means that the protocol supports retries, and Lb indicates that load-balancing support is also available for the protocol.

Kernel offload

To enhance performance, l7mp provides an experimental kernel offload feature. The offload uses the tc-bpf Linux kernel mechanism and supports UDP traffic. The usage and details of the kernel offload are described in its dedicated documentation.

The l7mp service mesh

The l7mp service mesh operator for Kubernetes is currently under construction, more details to follow soon.

License

Copyright 2019-2020 by its authors. Some rights reserved. See AUTHORS.

MIT License

stunner-gateway-operator's People

Contributors: ciarams87, davidkornel, dependabot[bot], f5yacobucci, iterion, kate-osborn, levaitamas, pleshakov, rg0now, vidarhun


stunner-gateway-operator's Issues

Rename the authentication types `plaintext` and `longterm`

STUNner's current terminology is doubly confusing:

  • what we call plaintext is not plain-text at all
  • we reuse the name longterm from the STUN spec to mean the time-windowed authentication mode, but in fact both what we call plaintext and longterm use the STUN "long-term" authentication mechanism.

This issue addresses the introduction of the following aliases in the GatewayConfig authType field:

  • static as an alias for plaintext,
  • timewindowed and ephemeral as aliases for longterm.

For backward compatibility, the dataplane API would not change for now.
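A hedged sketch of what a GatewayConfig using the proposed alias might look like (the authType field comes from the issue above; the remaining field names and values are illustrative assumptions):

apiVersion: stunner.l7mp.io/v1alpha1
kind: GatewayConfig
metadata:
  name: stunner-gatewayconfig
  namespace: stunner
spec:
  authType: static        # proposed alias for the current "plaintext"
  userName: "my-user"     # illustrative value
  password: "my-password" # illustrative value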

The stunner-gateway-operator-controller-manager is failing to start due to the Dataplane CRD not being installed before the controller starts.

I installed the dev version of stunner-gateway-operator using helm (chart version stunner-gateway-operator-dev-0.15.0), and now the stunner-gateway-operator exits abnormally.
Here are my logs:
2023-08-17T10:29:57.997819374Z ERROR ctrl-runtime.controller-runtime.source.EventHandler if kind is a CRD, it should be installed before calling Start {"kind": "Dataplane.stunner.l7mp.io", "error": "no matches for kind \"Dataplane\" in version \"stunner.l7mp.io/v1alpha1\""}
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1.1
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:63
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext.func1
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:62
k8s.io/apimachinery/pkg/util/wait.loopConditionUntilContext
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/loop.go:63
k8s.io/apimachinery/pkg/util/wait.PollUntilContextCancel
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/poll.go:33
sigs.k8s.io/controller-runtime/pkg/internal/source.(*Kind).Start.func1
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/source/kind.go:56
2023-08-17T10:30:07.988854671Z ERROR ctrl-runtime Could not wait for Cache to sync {"controller": "dataplane", "error": "failed to wait for dataplane caches to sync: timed out waiting for cache to be synced for Kind *v1alpha1.Dataplane"}
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.1
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:202
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:207
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:233
sigs.k8s.io/controller-runtime/pkg/manager.(*runnableGroup).reconcile.func1
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/manager/runnable_group.go:219
2023-08-17T10:30:07.988926582Z INFO ctrl-runtime Stopping and waiting for non leader election runnables
2023-08-17T10:30:07.988983143Z INFO ctrl-runtime shutting down server {"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8080"}
2023-08-17T10:30:07.989002223Z INFO ctrl-runtime Stopping and waiting for leader election runnables
2023-08-17T10:30:07.989015373Z INFO ctrl-runtime Shutdown signal received, waiting for all workers to finish {"controller": "udproute"}
2023-08-17T10:30:07.989020444Z INFO ctrl-runtime Shutdown signal received, waiting for all workers to finish {"controller": "gateway"}
2023-08-17T10:30:07.989024744Z INFO ctrl-runtime Shutdown signal received, waiting for all workers to finish {"controller": "node"}
2023-08-17T10:30:07.989028694Z INFO ctrl-runtime Shutdown signal received, waiting for all workers to finish {"controller": "gatewayconfig"}
2023-08-17T10:30:07.989053724Z INFO ctrl-runtime All workers finished {"controller": "gateway"}
2023-08-17T10:30:07.989058654Z INFO ctrl-runtime All workers finished {"controller": "udproute"}
2023-08-17T10:30:07.989062284Z INFO ctrl-runtime All workers finished {"controller": "gatewayconfig"}
2023-08-17T10:30:07.989066294Z INFO ctrl-runtime All workers finished {"controller": "node"}
2023-08-17T10:30:07.989070174Z INFO ctrl-runtime Stopping and waiting for caches
2023-08-17T10:30:07.989671422Z INFO ctrl-runtime Stopping and waiting for webhooks
2023-08-17T10:30:07.989691322Z INFO ctrl-runtime Wait completed, proceeding to shutdown the manager
2023-08-17T10:30:07.989718533Z ERROR setup problem running manager {"error": "failed to wait for dataplane caches to sync: timed out waiting for cache to be synced for Kind *v1alpha1.Dataplane"}
main.main
        /workspace/main.go:179
runtime.main
        /usr/local/go/src/runtime/proc.go:250

So what is the Dataplane CRD? 🤔
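One way to check whether the CRD is installed before the controller starts (the CRD name below is inferred from the kind Dataplane.stunner.l7mp.io in the error message; the plural form is an assumption):

kubectl get crd dataplanes.stunner.l7mp.io
kubectl api-resources | grep -i dataplane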

ERROR updater cannot update service ... Invalid value: \"null\": may not change once set"

Followup issue first raised in #19.

In AWS EKS, services created with type: LoadBalancer will have their loadBalancerClass set to service.k8s.aws/nlb. The added key is immutable and must be provided when the operator tries to synchronize and update existing services.

Example of the created service:

apiVersion: v1
kind: Service
metadata:
  name: stunner-gateway
  namespace: stunner
  labels:
    stunner.l7mp.io/owned-by: stunner
    stunner.l7mp.io/related-gateway-name: stunner-gateway
    stunner.l7mp.io/related-gateway-namespace: stunner
  annotations:
    meta.helm.sh/release-name: stunner-chart
    meta.helm.sh/release-namespace: stunner
    stunner.l7mp.io/related-gateway-name: stunner/stunner-gateway
spec:
  ports:
    - name: udp-listener
      protocol: UDP
      port: 3478
      targetPort: 3478
      nodePort: 31753
    - name: tls-listener
      protocol: UDP
      port: 443
      targetPort: 443
      nodePort: 32394
  selector:
    app: stunner
    stunner.l7mp.io/related-gateway-name: stunner-gateway
    stunner.l7mp.io/related-gateway-namespace: stunner
  clusterIP: A.B.C.D # filtered out
  clusterIPs:
    - A.B.C.D # filtered out
  type: LoadBalancer
  sessionAffinity: None
  externalTrafficPolicy: Cluster
  ipFamilies:
    - IPv4
  ipFamilyPolicy: SingleStack
  allocateLoadBalancerNodePorts: true
  loadBalancerClass: service.k8s.aws/nlb
  internalTrafficPolicy: Cluster

Stack Trace of errors:

ERROR updater cannot update service {"operation": "unchanged", "service": "{"metadata":{"name":"stunner-gateway","namespace":"stunner","creationTimestamp":null,"labels":{"stunner.l7mp.io/owned-by":"stunner","stunner.l7mp.io/related-gateway-name":"stunner-gateway","stunner.l7mp.io/related-gateway-namespace":"stunner"},"annotations":{"kots.io/app-slug":"labforward","meta.helm.sh/release-name":"stunner-chart","meta.helm.sh/release-namespace":"stunner","stunner.l7mp.io/related-gateway-name":"stunner/stunner-gateway"},"ownerReferences":[{"apiVersion":"gateway.networking.k8s.io/v1beta1","kind":"Gateway","name":"stunner-gateway","uid":"0a2e959e-da0a-4aa6-97db-42bc7f065c52"}]},"spec":{"ports":[{"name":"udp-listener","protocol":"UDP","port":3478,"targetPort":0},{"name":"tls-listener","protocol":"UDP","port":443,"targetPort":0}],"selector":{"app":"stunner","stunner.l7mp.io/related-gateway-name":"stunner-gateway","stunner.l7mp.io/related-gateway-namespace":"stunner"},"type":"LoadBalancer"},"status":{"loadBalancer":{}}}", "error": "cannot upsert service "stunner/stunner-gateway": Service "stunner-gateway" is invalid: spec.loadBalancerClass: Invalid value: "null": may not change once set"}
github.com/l7mp/stunner-gateway-operator/internal/updater.(*Updater).ProcessUpdate
/workspace/internal/updater/updater.go:115
github.com/l7mp/stunner-gateway-operator/internal/updater.(*Updater).Start.func1
/workspace/internal/updater/updater.go:62

Upgrade to Gateway API v1.0

In the latest versions of Kubernetes the deprecated v1alpha2 API has been removed definitively. I cannot provide any docs but I hope to find some exact details. I gathered this information from the Kubernetes Slack channel.

This error is raised only when the gateway class and gateway are being updated. Not sure @rg0now if you know about this but it doesn't matter what API version we create the resources with. It will be overwritten on the first upgrade event from the operator. In this line the API version will be ignored and will be set to v1alpha2 and used further on during the whole process.

The main problem that this raises is that the operator won't be able to run on the latest versions of k8s.

Proposal: Deploy dataplane pods into same namespace as the gateway operator

Right now the operator deploys the stunner dataplane pods into the namespace where the Kubernetes Gateway is located, and this seems to be hardcoded in dataplane_util.go:

ObjectMeta: metav1.ObjectMeta{
	Name:      gw.GetName(),
	Namespace: gw.GetNamespace(),
	Labels: map[string]string{
		opdefault.OwnedByLabelKey:         opdefault.OwnedByLabelValue,
		opdefault.RelatedGatewayKey:       gw.GetName(),
		opdefault.RelatedGatewayNamespace: gw.GetNamespace(),
	},
	Annotations: map[string]string{
		opdefault.RelatedGatewayKey: types.NamespacedName{
			Namespace: gw.GetNamespace(),
			Name:      gw.GetName(),
		}.String(),
	},
},

The Gateway namespace is likely also the namespace where the media applications that STUNner relays traffic to are deployed.
I would like to hear the maintainers' opinion about deploying the dataplane pods into the operator namespace (e.g. stunner-system) instead. I personally find this the cleaner approach because it separates stunner-managed resources from the actual application resources. This is also how other gateway operators (for example the Envoy gateway operator) do this.

Name listeners/clusters using the namespace/name of the corresponding Gateway/UDPRoute

There is a problematic corner case due to the fact that we name listeners/clusters using only the name of the corresponding Gateway/UDPRoute objects, but not the namespace. This means that in certain unfortunate situations we end up with two identically named listeners/clusters, which confuses the route lookup code: it does not know which cluster to use for which listener.

This issue aims at changing the config renderer as follows:

  • use the "namespace/name" of a Gateway to name the corresponding listener, and
  • use the "namespace/name" of a *Route to name the corresponding cluster.

This will disambiguate the names of listeners/clusters, which will remove the above corner case and make debugging simpler.

How to configure imagePullSecrets for the Dataplane?

I noticed that the imagePullSecrets Deployment config is not exposed in the Dataplane. This prohibits the use of authenticated pulls from Docker Hub and/or an alternative registry, which are necessary to bypass the pull rate limits enforced on unauthenticated pulls.

Bug: Gateway listener names are not used properly

The gateway-operator does not use the configured names of the listeners in the Gateway resource.
When applying the following manifest:

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
  name: udp-gateway
  namespace: stunner
spec:
  gatewayClassName: stunner-gatewayclass
  listeners:
    - name: udp-listener
      port: 3478
      protocol: UDP
    - name: another-udp-listener
      port: 3479
      protocol: UDP

The following error shows up in the gateway-operator's logs:

2023-03-02T16:51:31.076679312Z    ERROR    updater    cannot update service    {"operation": "unchanged", "error": "cannot upsert service \"stunner/udp-gateway\": Service \"udp-gateway\" is invalid: spec.ports[1].name: Duplicate value: \"udp-gateway-udp\""}
github.com/l7mp/stunner-gateway-operator/internal/updater.(*Updater).Start.func1
    /workspace/internal/updater/updater.go:63

It seems the Services spec.ports.name field is filled with the gateway's name + the port's protocol type, instead of the name configured in the Gateway's spec.listeners.name.

This does not cause any problems if only one listener is configured. As soon as two or more listeners are configured, their names will collide, causing an error.
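With the listener names propagated into the Service, the generated service-ports would presumably look something like the following (a sketch of the expected behavior, not the current output):

spec:
  ports:
    - name: udp-listener
      protocol: UDP
      port: 3478
    - name: another-udp-listener
      protocol: UDP
      port: 3479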

Feature request: Take auth credentials from a Secret

Currently the only way to set the authentication credentials is in plain text in the GatewayConfig. This is not optimal for security reasons and it makes automatic deployment difficult.

Plan: let the GatewayConfig refer to a Secret for the auth credentials and generate the stunner config from there.

Note that the stunnerd config will still include the credentials in plain text.
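A hedged sketch of how this might look (the authRef field name is borrowed from the "Cluster-internal auth secret disclosure" issue below; the Secret key names are assumptions):

apiVersion: v1
kind: Secret
metadata:
  name: stunner-auth-secret
  namespace: stunner
type: Opaque
stringData:
  username: "my-user"
  password: "my-password"
---
apiVersion: stunner.l7mp.io/v1alpha1
kind: GatewayConfig
metadata:
  name: stunner-gatewayconfig
  namespace: stunner
spec:
  authRef:
    name: stunner-auth-secret
    namespace: stunner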

Support setting the `externalTrafficPolicy` in LB Services

It should be possible to set the ExternalTrafficPolicy on the automatically created LB Services, as it allows the stunnerd pods to see the original IP address of clients and may facilitate running STUNner as a STUN server. The plan is to support a new stunner.l7mp.io/external-traffic-policy annotation on STUNner Gateways that can be set to local (retain the client source IP) or cluster (masquerade the client source IP).

This issue is to track progress on implementing this feature.
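With the proposed annotation, a Gateway that preserves client source IPs might be declared as follows (a sketch of the planned behavior, not a currently supported feature):

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
  name: udp-gateway
  namespace: stunner
  annotations:
    stunner.l7mp.io/external-traffic-policy: local
spec:
  gatewayClassName: stunner-gatewayclass
  listeners:
    - name: udp-listener
      port: 3478
      protocol: UDP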

Implement cross-namespace Gateway-Route bindings

Reported here.

Currently STUNner does not implement cross-namespace Gateway-UDPRoute bindings for simplicity: only UDPRoutes from the same namespace are accepted by the Gateway. This limitation is documented here and here.

This issue tracks the progress on implementing cross-namespace Gateway-Route bindings.
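For reference, with standard Gateway API mechanics the desired behavior would be expressed roughly as below: the Gateway listener opts in to routes from other namespaces via allowedRoutes, and the UDPRoute in another namespace references the Gateway with a namespaced parentRef (a sketch only; per the above, STUNner does not support this yet):

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
  name: udp-gateway
  namespace: stunner
spec:
  gatewayClassName: stunner-gatewayclass
  listeners:
    - name: udp-listener
      port: 3478
      protocol: UDP
      allowedRoutes:
        namespaces:
          from: All
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: UDPRoute
metadata:
  name: media-plane-route
  namespace: media
spec:
  parentRefs:
    - name: udp-gateway
      namespace: stunner
  rules:
    - backendRefs:
        - name: media-server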

Support static routes

This issue tracks the progress on implementing StaticRoute CRs in STUNner.

A main security feature of STUNner is that it allows clients to access only a selected set of peers through TURN. This makes sure that, even possessing a valid TURN credential, clients may access only the peers explicitly exposed via a UDPRoute.

Currently the only way to specify the allowed peer IPs is via a standard Kubernetes Service: adding a Service to the backendRefs of a UDPRoute will allow clients to access all the Service's pods via STUNner. In addition, this mechanism supports specifying the permitted peer IPs as DNS domains (via Type ExternalName services) or fully specified IP addresses (via selectorless Services).

Unfortunately, currently there is no way to define specific lists of IP prefixes as backends. This will create problems if someone wishes to deploy STUNner as a public TURN server, since in such cases there is no Kubernetes Service that would be usable as a backendRef.

The goal is to implement a StaticRoute custom resource that would allow users to specify a static IP prefix list to which STUNner should permit client access. For instance, the below StaticRoute would allow access to any backend via the udp-gateway and tcp-gateway Gateways.

apiVersion: stunner.l7mp.io/v1alpha1
kind: StaticRoute
metadata:
  name: open-cluster
spec:
  parentRefs:
    - name: udp-gateway
    - name: tcp-gateway
  rules:
    - backends:
        - "0.0.0.0/0"

Note: static routes are fully supported in stunnerd via STATIC type clusters, the only missing piece is exposing this feature via the control plane.

Automatically expose health-check ports on LBs

Related issue: #21

Many cloud providers support the assignment of health-check ports to publicly exposed Gateways/services via LB Service annotations. The below example will let the provider's LB health-check our stunnerd pods over HTTP on port TCP:8086.

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
  name: health-check-gateway
  namespace: stunner
  annotations:
    stunner.l7mp.io/service-type: NodePort
    stunner.l7mp.io/expose-health-check-port: true
    service.beta.kubernetes.io/do-loadbalancer-healthcheck-port: "8086"
    service.beta.kubernetes.io/do-loadbalancer-healthcheck-protocol: "http"
    service.beta.kubernetes.io/do-loadbalancer-healthcheck-path: "/live"
spec:
  ...

The way this works is that the gateway operator will copy the annotations from the Gateway verbatim into the LB service that exposes it, and the provider's LB will then pick up the annotations from this LB service and bootstrap the health-checker.

Problem: Many providers require the health-check port to be explicitly exposed in LB services, which currently the operator does not support. In particular, the automatically created LB service should contain a service-port that covers the health-check port:

apiVersion: v1
kind: Service
metadata:
  name: health-check-gateway
spec:
  ports:
...
    - name: health-check-port
      port: 8086
      protocol: TCP
...
  type: LoadBalancer

Since the operator does not create the service-port automatically, users have to create it manually, which is error-prone.

Solution: Automatically create the health-checker service-ports based on the service.beta.kubernetes.io/do-loadbalancer-healthcheck-* annotations. Since exposing health-check ports is fundamentally insecure, this feature should be explicitly enabled by the user by setting stunner.l7mp.io/expose-health-check-port: true. The port should come from service.beta.kubernetes.io/do-loadbalancer-healthcheck-port, the protocol should be TCP if service.beta.kubernetes.io/do-loadbalancer-healthcheck-protocol is TCP or HTTP (other protocols may be supported later if the need arises), and the name should be something like health-check-<protocol>-<port> (hopefully, this will be unique enough for now).

Implementation: see here. PR must come with unit tests and an integration test plus docs here.

Edited: Added feature gate.

AWS support: Fall back to LBStatus.Hostname when LBStatus.IP is not available

The gateway operator is hardwired to use the .Status.loadBalancer.ingress[*].IP field for finding the public IP of a listener/gateway, but rather than assigning a public IP, AWS creates a DNS FQDN and writes that into .Status.loadBalancer.ingress[*].Hostname. Correspondingly, the public IP is never written into the running config, which breaks stuff all over the place (e.g., turncat k8s://...). This issue aims at fixing this.

The plan is to add a heuristic here: when .IP is not available but .Hostname is, we fall back to the latter (see https://pkg.go.dev/k8s.io/api/core/v1#LoadBalancerIngress).

Expose `throttleTimeout` on the command line

The constant throttleTimeout controls the time window during which config generation is suppressed. The idea is that we do not want to re-generate the config and update the API server every time one of our watchers updates a resource in the local storage, since the config rendering pipeline is expensive. Rather, every time a watcher triggers an update we start a timer to wait throttleTimeout msecs and completely suppress all config rendering triggers until the timer ticks. This decreases the CPU footprint of the operator at the cost of artificially delaying all updates by at most throttleTimeout msecs.

In large clusters with massive control plane churn, the hardcoded 250 msec timeout would be too little. This enhancement request covers:

  • make throttling the default: remove DefaultEnableRenderThrottling and EnableRenderThrottling and change the code to behave as if EnableRenderThrottling=true were in effect,
  • expose throttleTimeout on the command line and as an ENV variable.

Support mixed-protocol (UDP/TCP) LBs for exposing Gateways

Currently, a Gateway created with both a TCP listener and a UDP listener will not be correctly exposed in an LB service, since many cloud providers do not support this (currently we expose the first listener, and then all the listeners with the same protocol as the first one). This is suboptimal, since many use cases will want to couple, in a single Gateway, a UDP TURN listener with a TCP/TLS:443 listener that can be used as a fallback, and this is currently blocked by the operator. Support for mixed protocols in Services with type=LoadBalancer (KEP-1435) goes GA in Kubernetes v1.26, but we do not expect general provider support for this in the near future.

Plan: Some Gateway implementations allow users to add an annotation to each Gateway to control whether mixed-protocol LBs should be enforced for this Gateway. The idea is to adopt this solution.

Say, one could disable STUNner's blocking of mixed-protocol LBs for specific Gateways by adding an annotation stunner.l7mp.io/enable-mixed-protocol-lb: true to it:

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
  name: mixed-protocol-gateway
  annotations:
    stunner.l7mp.io/enable-mixed-protocol-lb: true
spec:
  gatewayClassName: stunner-gatewayclass
  listeners:
    - name: udp-listener
      port: 3478
      protocol: UDP
    - name: tcp-listener
      port: 3478
      protocol: TCP

Implementation: see here. PR must come with unit tests and an integration test plus docs here.

Provide a means to add labels and annotations on `stunnerd` pods

Currently there is no configurable way to add user-defined labels and annotations to the stunnerd pods launched by STUNner to run the data plane for a Gateway. This would be useful, for instance, to enable/disable automatic sidecar proxy injection (e.g., for linkerd) or specify the sidecar proxy parameters.

One possible workflow to support this feature would be to create the Gateway from a separate Dataplane CRD and then simply copy all labels and annotations from the Dataplane into the pods created for the Gateway. If you want different labels/annotations for different Gateways, launch them from a different Dataplane via the GatewayConfig.

This issue is to initiate the discussion on this feature and track implementation.

Use EndpointSlices for endpoint discovery

One of STUNner's main security features is filtering client connections based on the requested peer address: STUNner permits clients to reach all the pods that belong to the backend Services in the target UDPRoute and blocks access to all other pod IPs. This feature relies on STUNner's endpoint discovery mechanism, which makes it possible for the operator to enumerate the pod IPs that belong to a backend Service.

Currently the operator uses Kubernetes Endpoints objects for endpoint discovery. Since Kubernetes v1.21 there is a new stable API, called EndpointSlices, that fixes some of the scalability issues with Endpoints. New applications are encouraged to use the new API, which currently the operator does not understand.

This issue is tracking the progress on converting the operator from using Endpoints to EndpointSlices.

StaticService error on `master`

This happens on 905c993, maybe I should have used a release instead of master.

The controller frequently fails/restarts with this error message:

2023-07-25T23:20:11.098595965Z  ERROR   ctrl-runtime.controller-runtime.source  if kind is a CRD, it should be installed before calling Start   {"kind": "StaticService.stunner.l7mp.io", "error": "no matches for kind \"StaticService\" in version \"stunner.l7mp.io/v1alpha1\""}
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/source/source.go:143
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:235
k8s.io/apimachinery/pkg/util/wait.WaitForWithContext
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:662
k8s.io/apimachinery/pkg/util/wait.poll
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:596
k8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext
        /go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:547
sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1
        /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/source/source.go:136

Support per-Gateway LoadBalancer annotations

Currently the recommended way to customize the LoadBalancers created by STUNner is through adding annotations to the GatewayConfig spec.loadBalancerServiceAnnotations field. This, however, is global in that it affects all Gateways; it does not make it possible to stop STUNner from exposing a Gateway, and it does not make it possible to add labels, only annotations.

This issue is a placeholder to add another way for controlling LoadBalancers and to discuss and implement this behavior. In particular, Istio implements a policy where annotations/labels can be added to each Gateway and these will be automatically copied into the Kubernetes services created for that Gateway.

The particular behavior (converted to STUNner) would be as follows:

  • Annotations and labels on the Gateway will be copied into the Service. This allows configuring things such as Internal load balancers that read from these fields.
  • STUNner offers an additional annotation to configure the generated resources: adding the special annotation stunner.l7mp.io/service-type controls the Service.spec.type field of the Service. For example, set it to ClusterIP to not expose the service externally. The default is LoadBalancer (see the sketch below).
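For illustration, a hedged sketch of a Gateway using this mechanism (the service-type annotation is the one proposed above; the extra label and namespace are arbitrary examples to be copied into the generated Service):

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
  name: internal-gateway
  namespace: stunner
  labels:
    environment: production
  annotations:
    stunner.l7mp.io/service-type: ClusterIP
spec:
  gatewayClassName: stunner-gatewayclass
  listeners:
    - name: udp-listener
      port: 3478
      protocol: UDP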

Gateway CRD deployment

I noticed this operator deploys some CRDs for the Gateway API but not all of them (for example HTTPRoute). Is there a reason for that?

This can affect other workloads that also install the CRDs or that require all the CRDs when some feature flag is activated.

I propose to have a feature flag to trigger the CRDs installation OR add all the CRDs found at https://github.com/kubernetes-sigs/gateway-api/releases/download/v0.6.2/standard-install.yaml.

I am happy to do the PR if you agree with one of those.

Add manual public IP setting to Gateway

Currently the stunner-gateway-operator sets the public_address field as follows:

  1. if there is a LoadBalancer created, we use its public IP
  2. if not, we use the NodePort
    (in both cases the port comes from Gateway.Spec.Listeners[0].Port)

Unfortunately, this does not allow setting the public IP manually, even though this is supported by the Gateway API (the Addresses[] field). In some cases the user may want to set the public IP of the media plane manually; one such case is when STUNner is behind multiple NATs and has no way to figure out the public IP through which it is going to receive media.

Accordingly, the desired selection of public IP should go in the following order:

  1. Gateway.Spec.Addresses[0] + Gateway.Spec.Listeners[0].Port
  2. If Address is not set, we use the LoadBalancer IP and the above listener port
  3. If Address is not set and there is no LoadBalancer IP, we use the first node's IP and NodePort

Feature proposal: Gateway Annotation to include all kind of Service.spec fields

Currently, we do not support many Service fields, thus limiting the possible configurations that might be needed in less well-documented/known use cases. The service that is generated by the operator based on the Gateway resource has limited functionality without any extra fields.
A Discord issue stated that the service.spec.externalIPs field is needed in order to set up external DNS for the service and to have control over the publicly accessible address of the service.
A JSON-formatted Gateway annotation would look like the following:

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
  name: complex-gateway
  namespace: stunner
  annotations:
    stunner.l7mp.io/extra-spec-fields: {\r\n  \"spec\": {\r\n    \"externalIPs\": [\r\n      \"198.51.100.32\",\r\n      \"1.2.3.4\"\r\n    ]\r\n  }\r\n}
spec:
  gatewayClassName: stunner-gatewayclass
  listeners:
    - name: udp-listener
      port: 3478
      protocol: UDP
      allowedRoutes:
        namespaces:
          from: All

I can think of three different approaches on how to set the annotation field:

# without special characters
{\"spec\":{\"externalIPs\":[\"198.51.100.32\",\"1.2.3.4\"]}}
# without the explicit top-level spec key
{\"externalIPs\":[\"198.51.100.32\",\"1.2.3.4\"]}
# with the indenting special characters (probably not the best idea)
{\r\n  \"spec\": {\r\n    \"externalIPs\": [\r\n      \"198.51.100.32\",\r\n      \"1.2.3.4\"\r\n    ]\r\n  }\r\n}

What do you think?
What would be a good annotation key instead of stunner.l7mp.io/extra-spec-fields?

Introduce aliases for specifying the protocol in Gateway listeners

Currently L4 and L7 protocols are mapped to a single Protocol field in Gateway listeners (this is a limitation of the Gateway API that does not have an AppProtocol field). This causes all sorts of problems in STUNner where different L7 protocols can be combined (or will be combined in a later release) with different L4 protocols.

Currently, setting the listener protocol to UDP actually means to use "TURN over a UDP transport", TCP means "TURN over TCP", etc. Later, we may want to introduce pure-UDP and pure-TCP listeners (i.e., without TURN), but given the current state of the art there will be no way to distinguish between the two since, for instance, Protocol=UDP may mean either "TURN over UDP" or "pure UDP".

The plan is to introduce more descriptive names for specifying TURN listener protocols as follows:

  • TURN-UDP to mean "TURN over UDP" (equivalent with the current UDP setting),
  • TURN-TCP to mean "TURN over TCP" (equivalent with the current TCP setting),
  • TURN-TLS to mean "TURN over TLS" (equivalent with the current TLS setting),
  • TURN-DTLS to mean "TURN over DTLS" (equivalent with the current DTLS setting).

Currently these "long protocol names" would be simple aliases to the existing "short names", but in the long run we will deprecate the short names and mandate the long names instead (i.e., we would require TURN-UDP to start a "TURN over UDP" listener, UDP would fall back to a pure-UDP listener).

This issue is tracking the progress on implementing this change.

Cluster-internal auth secret disclosure

The secret which the stunner auth service uses in ephemeral mode to create stunner credentials is stored in a kubernetes Secret and referenced under authRef: in the stunner GatewayConfig.

While processing the Gateway listeners the stunner gateway operator creates a stunnerd.conf configuration file and stores it in a kubernetes ConfigMap. This ConfigMap contains the aforementioned secret, effectively disclosing something which was previously stored as a Secret in a ConfigMap (which can be accessed with fewer privileges).

Possible solution: Store the stunnerd.conf in a Secret instead of a ConfigMap.

Allow ServiceImports as UDPRoute backends

In order to decide whether to allow a client to reach a particular peer IP, STUNner consults the list of pod IP addresses belonging to the backends assigned to the target UDPRoute. Currently only Service resources are allowed as backends. However, in a multi-cluster setting services globally exported across the clusterset will appear as ServiceImport objects locally (and not as regular Services), with the assumption that ServiceImports and Services are essentially equivalent in that wherever one can use a Service one could also use a ServiceImport instead. Currently, however, STUNner accepts only Services as backends.

This issue is tracking the progress on extending the operator to accept ServiceImports as backends.
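Assuming the standard Multi-Cluster Services API group, a UDPRoute referencing an imported service would look roughly like the following (a sketch only; per the above, this is not yet supported):

apiVersion: gateway.networking.k8s.io/v1alpha2
kind: UDPRoute
metadata:
  name: imported-media-plane-route
  namespace: stunner
spec:
  parentRefs:
    - name: udp-gateway
  rules:
    - backendRefs:
        - group: multicluster.x-k8s.io
          kind: ServiceImport
          name: media-server-global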
