k8s-source's Introduction

Kubernetes Source

Installation

Create an API Key in Overmind under Account settings > API Keys

Install the source into your Kubernetes cluster using Helm:

helm repo add overmind https://overmindtech.github.io/k8s-source
helm install overmind-kube-source overmind/overmind-kube-source --set source.apiKey=ovm_api_YOURKEY_HERE

To upgrade:

helm upgrade overmind-kube-source overmind/overmind-kube-source

To uninstall:

helm uninstall overmind-kube-source

NOTE: Currently the source won't appear in your "Sources" list in Overmind since it's running on your infrastructure, not ours. We'll improve this soon.

Support

This source supports all Kubernetes versions that are currently maintained by the Kubernetes project. The list can be found here: https://kubernetes.io/releases/

Search

The backends in this package implement the Search() method. The query they expect is a JSON object containing one or both of the labelSelector and fieldSelector keys, each a string in the corresponding Kubernetes selector format.

An example would be:

{
    "labelSelector": "app=wordpress"
}

or

{
    "labelSelector": "environment=production,tier!=frontend",
    "fieldSelector": "metadata.namespace!=default"
}

Other fields can also be set if advanced querying is required. These fields must match the JSON schema for ListOptions: https://pkg.go.dev/k8s.io/apimachinery/pkg/apis/meta/v1#ListOptions
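
For illustration, a query like the above could be unmarshalled straight into metav1.ListOptions (which already uses the labelSelector/fieldSelector JSON keys) and passed to a client-go list call. This is a minimal sketch, not the source's actual Search() implementation:

package main

import (
	"context"
	"encoding/json"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	query := `{"labelSelector": "environment=production,tier!=frontend", "fieldSelector": "metadata.namespace!=default"}`

	// metav1.ListOptions already uses the labelSelector/fieldSelector JSON keys,
	// so the query can be unmarshalled directly into it
	var opts metav1.ListOptions
	if err := json.Unmarshal([]byte(query), &opts); err != nil {
		log.Fatalf("invalid search query: %v", err)
	}

	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatalf("failed to load config: %v", err)
	}
	clientset, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatalf("failed to create clientset: %v", err)
	}

	// List pods across all namespaces using the selectors from the query
	pods, err := clientset.CoreV1().Pods(metav1.NamespaceAll).List(context.Background(), opts)
	if err != nil {
		log.Fatalf("failed to list pods: %v", err)
	}
	for _, p := range pods.Items {
		fmt.Println(p.Namespace + "/" + p.Name)
	}
}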

Development

Testing

The tests for this package rely on having a Kubernetes cluster to interact with. This is handled using kind when the tests are started. Please make sure that you have the required software installed; at a minimum you will need kind.

IMPORTANT: If you already have kubectl configured and are connected to a cluster, that cluster is what will be used for testing. Resources will be cleaned up, with the exception of the testing namespace. If a cluster is not configured, or not available, one will be created (and destroyed) using kind. This behavior may change in the future, as it's a bit risky: it could accidentally run the tests against a production cluster, though that would also be a good way to validate real-world use-cases.
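
As a rough sketch of that behaviour, assuming the sigs.k8s.io/kind cluster provider API and a hypothetical cluster name (the real test setup may differ):

package main

import (
	"log"

	"k8s.io/client-go/tools/clientcmd"
	"sigs.k8s.io/kind/pkg/cluster"
)

func main() {
	// Load whatever kubeconfig the default rules find (~/.kube/config, $KUBECONFIG, ...)
	rules := clientcmd.NewDefaultClientConfigLoadingRules()
	kubeconfig, err := rules.Load()

	if err == nil && kubeconfig.CurrentContext != "" {
		// An existing context is configured: the tests run against it
		log.Printf("using existing context %q for tests", kubeconfig.CurrentContext)
		return
	}

	// Otherwise create a throwaway kind cluster and delete it afterwards
	provider := cluster.NewProvider()
	if err := provider.Create("k8s-source-tests"); err != nil {
		log.Fatalf("failed to create kind cluster: %v", err)
	}
	defer func() {
		if err := provider.Delete("k8s-source-tests", ""); err != nil {
			log.Printf("failed to delete kind cluster: %v", err)
		}
	}()

	// ... run the tests here ...
}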

k8s-source's People

Contributors

dylanratcliffe, renovate[bot], davids-ovm, dependabot[bot], tphoney

Watchers

Lucian

k8s-source's Issues

Investigate performance

Performance doesn't seem to change much as max-parallel is increased. I suspect this is a bug and need to look into why.

After some testing it doesn't seem that the issue is related to the discovery library; new tests show that it is performing as expected, so the issue is likely something in this source. Interestingly, increasing --max-parallel actually slows the source down rather than speeding it up, though not in a linear fashion, e.g. a --max-parallel of 999 takes about the same amount of time as 99,999,999.

Create terraform & docs mappings

We need to map the existing k8s sources to Terraform, and create the mappings so that things can be documented with DocGPT:

  • clusterrolebinding
  • clusterrole
  • configmap
  • cronjob
  • daemonset
  • deployment
  • endpoints
  • endpointslice
  • generic_source
  • horizontalpodautoscaler
  • ingress
  • job
  • limitrange
  • networkpolicy
  • node
  • persistentvolumeclaim
  • persistentvolume
  • poddisruptionbudget
  • pods
  • priorityclass
  • replicaset
  • replicationcontroller
  • resourcequota
  • rolebinding
  • role
  • secret
  • serviceaccount
  • service
  • statefulset
  • storageclass
  • volumeattachment

Pods not linking to config maps

I'm finding that pods aren't linking to the config maps they mount. For example, a pod has the following attributes:

spec:
  dnsPolicy: ClusterFirst
  schedulerName: default-scheduler
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-neo4j-0
  - name: neo4j-conf
    projected:
      sources:
      - configMap:
          name: neo4j-default-config
      - configMap:
          name: neo4j-user-config
      - configMap:
          name: neo4j-k8s-config

However, the ConfigMap isn't connected to anything.
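
For context, a link extractor that handles both plain and projected ConfigMap volumes might look roughly like this (an illustrative sketch, not the source's actual link-extraction code; the package and function names are hypothetical):

package sources

import (
	corev1 "k8s.io/api/core/v1"
)

// configMapNames collects the names of ConfigMaps referenced by a pod's volumes
func configMapNames(pod *corev1.Pod) []string {
	names := []string{}
	for _, vol := range pod.Spec.Volumes {
		// Plain configMap volumes
		if vol.ConfigMap != nil {
			names = append(names, vol.ConfigMap.Name)
		}
		// Projected volumes can also reference ConfigMaps, which appears to be
		// the case that is currently missed
		if vol.Projected != nil {
			for _, src := range vol.Projected.Sources {
				if src.ConfigMap != nil {
					names = append(names, src.ConfigMap.Name)
				}
			}
		}
	}
	// env/envFrom references on containers are omitted here for brevity
	return names
}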

Work out how this source will be run/deployed

In Cluster

The simplest way for us to allow customers to run this would be to have them run it in their own cluster. We could create a helm chart that included:

  • A role that confers the required Get and Watch permissions
  • A serviceaccount that the pod/s would use
  • A rolebinding probably to bind the role to the serviceaccount
  • A deployment to run the source within the cluster. We would need resource limits and to ensure that we always have one running

The most difficult part about this is the fact that the source will need to communicate directly with NATS, meaning that it will need to get a token from the API. In order to do that it'll need an OAuth token. This might bring back into contention the complexity of having sources run as specific OAuth apps. It's not the end of the world, but it'll be a bit of work. We'll need some way of giving the source a token that it can refresh, without exposing the client secret, which is doable but once again a bit harder.

Config

Required config:

  • Create/delete (does this count?)
  • Auth

TODO:

  • Look to see if there is anything good that we could use like auto-updating helm charts or something

Since the source is running in the customer's cluster, it would be hard to manage in the GUI. Currently we rely on srcman to manage the entire lifecycle of our sources, and I would really like to keep the experience as close to SaaS as possible, so forcing the user to make manual changes to their cluster is strongly discouraged.

Implementation

One way we could implement this is to have the source run exactly the same as a regular source, but launched in a "watcher" mode whose entire job is to watch the source that the customer starts and report its health through srcman like everything else. This will still require some changes to srcman, as there will need to be a way to show the user the auth details that they would need to provide to helm.

Hosted

If we could host these ourselves we'd be in a much better position since we could manage the lifecycle in the same way that we do for everything else. The problem with this is that there is a good chance that the API for the kube clusters is going to be pretty locked down because of how important it is.

There are a few options for EKS cluster endpoints. As expected, many of them are private, so they really won't be much help for us. I think that running in the cluster is probably the best way for the time being.

Next Steps

  • Create helm charts since we'll need that no matter what
  • Once we know exactly what config the helm charts will need and how that config gets provided, think about the changes required for srcman to be able to tell users which options they need to pass to helm

Support Helm

Helm is built on top of Kubernetes, but Helm releases aren't Kubernetes objects in their own right. In order to read Helm release info we will need to use the Helm Go libraries: https://pkg.go.dev/helm.sh/helm

GPT-4 suggests that we can query helm releases as follows:

package main

import (
	"fmt"
	"log"
	"os"

	"helm.sh/helm/v3/pkg/action"
	"helm.sh/helm/v3/pkg/cli"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	helmDriver := os.Getenv("HELM_DRIVER")
	settings := cli.New()

	actionConfig := new(action.Configuration)
	kubeconfig := clientcmd.NewNonInteractiveDeferredLoadingClientConfig(
		clientcmd.NewDefaultClientConfigLoadingRules(),
		&clientcmd.ConfigOverrides{},
	)
	namespace, _, _ := kubeconfig.Namespace()
	err := actionConfig.Init(settings.RESTClientGetter(), namespace, helmDriver, func(format string, v ...interface{}) {
		// debug log callback for the Helm action configuration
		log.Printf(format, v...)
	})
	if err != nil {
		log.Fatalf("Failed to connect to Kubernetes: %v", err)
	}

	// create a new List action
	listAction := action.NewList(actionConfig)

	// list releases across all namespaces (by default only the current namespace is listed)
	listAction.AllNamespaces = true

	// retrieve releases
	releases, err := listAction.Run()
	if err != nil {
		log.Fatalf("Failed to retrieve releases: %v", err)
	}

	// print release names
	for _, rls := range releases {
		fmt.Println(rls.Name)
	}
}

Decide what to do about different versions of the same resource

It's possible in Kubernetes to have different APIs that serve the same resource, like when something moves from a beta API to a stable one. Our sources are only able to find data for one type, and one API version each, so a source that works for v1 Pods wouldn't work for v2 Pods (just an example). In order to support people that are using older resources, we really should be looking at the possible versions for things and creating additional sources. This then raises the question of which version we should use. Kube does have APIs that report which versions of things are available, but more research is needed to determine how this works exactly and how we should be using it.
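
For reference, the client-go discovery client can report which groups and versions the server actually serves. A minimal sketch (it only lists the options; deciding which version to use is still the open question):

package main

import (
	"fmt"
	"log"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatalf("failed to load config: %v", err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		log.Fatalf("failed to create discovery client: %v", err)
	}

	groups, err := dc.ServerGroups()
	if err != nil {
		log.Fatalf("failed to list API groups: %v", err)
	}
	for _, g := range groups.Groups {
		// PreferredVersion is the server's recommendation; all served
		// versions are listed in g.Versions
		fmt.Printf("%s (preferred: %s, %d version(s) served)\n",
			g.Name, g.PreferredVersion.GroupVersion, len(g.Versions))
	}
}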

End-to-end test

I need to do a full end to end test once https://github.com/overmindtech/deploy/issues/435 is complete. This will involve:

  • Deleting all sources and API keys
  • Adding a source for prod & dogfood to the production Overmind
  • Making sure we can get data from both of them
  • Ensure that the defaults are correct, e.g. the NATS URL
  • Configure mappings as per documentation
  • Make sure blast radius is accurate
  • Move management to terraform

Create tests for all sources

Currently only some sources have tests, but there is a good test framework in place. It shouldn't be too hard to create tests for everything.

Expose rate limiting config

These events are common:

I0720 11:21:47.684093       1 request.go:696] Waited for 1.033262583s due to client-side throttling, not priority and fairness, request: GET:https://[fd16:6e37:bf5d::1]:443/api/v1/namespaces/ku
I0720 11:21:57.684261       1 request.go:696] Waited for 3.990879119s due to client-side throttling, not priority and fairness, request: GET:https://[fd16:6e37:bf5d::1]:443/api/v1/namespaces/ku
I0720 11:23:43.378810       1 request.go:696] Waited for 1.147997843s due to client-side throttling, not priority and fairness, request: GET:https://[fd16:6e37:bf5d::1]:443/api/v1/namespaces/de
I0720 11:23:53.578883       1 request.go:696] Waited for 3.995475924s due to client-side throttling, not priority and fairness, request: GET:https://[fd16:6e37:bf5d::1]:443/api/v1/namespaces/de
I0720 11:24:03.578965       1 request.go:696] Waited for 3.988223247s due to client-side throttling, not priority and fairness, request: GET:https://[fd16:6e37:bf5d::1]:443/api/v1/namespaces/de
I0720 11:24:13.778828       1 request.go:696] Waited for 3.99516626s due to client-side throttling, not priority and fairness, request: GET:https://[fd16:6e37:bf5d::1]:443/api/v1/namespaces/def
I0720 11:24:57.409218       1 request.go:696] Waited for 1.143622363s due to client-side throttling, not priority and fairness, request: GET:https://[fd16:6e37:bf5d::1]:443/api/v1/namespaces/de

We should make sure these client-side rate limits are exposed as configuration.
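
For illustration, the limits in question live on rest.Config. A minimal sketch of exposing them, where the package name and the kube-client-qps/kube-client-burst config keys are hypothetical:

package config // hypothetical package

import (
	"log"

	"github.com/spf13/viper"
	"k8s.io/client-go/rest"
)

// newRestConfig shows where the client-side limits would be set
func newRestConfig() *rest.Config {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatalf("failed to load config: %v", err)
	}
	// client-go defaults to QPS=5 and Burst=10, which is what produces the
	// "Waited for ... due to client-side throttling" messages above
	cfg.QPS = float32(viper.GetFloat64("kube-client-qps")) // hypothetical flag
	cfg.Burst = viper.GetInt("kube-client-burst")          // hypothetical flag
	return cfg
}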

Can't grant request:recieve scope

When creating an API token, I can't grant the request:recieve scope. I think this is because interactive users don't have this scope themselves. Need to test in dogfood...

leaking nats connections

This source is seemingly leaking NATS connections, leading to unnecessary resource usage in the source process as well as on NATS:

time="2023-07-05T09:21:22Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"                                                                      
time="2023-07-05T09:21:32Z" level=info msg="NATS connected" ServerID=NA3WG7EBM4TOVT67EWZ4Z6HUMSOAQE7REZKOSWZR5JZ2HNIMKY4BHWHU URL:="nats://nats:4222"                         
time="2023-07-05T09:21:32Z" level=info msg="Listing namespaces"                                                                                                               
time="2023-07-05T09:21:32Z" level=info msg="got 6 namespaces"                                                                                                                 
time="2023-07-05T09:21:32Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"                                                                      
time="2023-07-05T09:21:32Z" level=info msg="NATS connected" ServerID=NA3WG7EBM4TOVT67EWZ4Z6HUMSOAQE7REZKOSWZR5JZ2HNIMKY4BHWHU URL:="nats://nats:4222"                         
time="2023-07-05T09:21:32Z" level=info msg="Listing namespaces"                                                                                                               
time="2023-07-05T09:21:32Z" level=info msg="got 6 namespaces"                                                                                                                 


Create fallback source

It would be possible to create a source that is completely generic. This would mean that we could scan the server for the available resource types, and for all the ones we don't have real sources for, create a generic one on the fly. This wouldn't have as many links but would give us a lot of coverage for relatively little work.
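
A minimal sketch of the idea using the discovery and dynamic clients (link extraction and the wiring into our source interfaces are omitted):

package main

import (
	"context"
	"fmt"
	"log"
	"strings"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/discovery"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// One entry per preferred group/version, listing the resources it serves
	lists, err := dc.ServerPreferredResources()
	if err != nil && lists == nil {
		log.Fatal(err)
	}
	for _, list := range lists {
		gv, err := schema.ParseGroupVersion(list.GroupVersion)
		if err != nil {
			continue
		}
		for _, res := range list.APIResources {
			// Skip subresources like pods/status; a real implementation would
			// also skip types that already have dedicated sources
			if strings.Contains(res.Name, "/") {
				continue
			}
			gvr := gv.WithResource(res.Name)
			items, err := dyn.Resource(gvr).List(context.Background(), metav1.ListOptions{Limit: 1})
			if err != nil {
				continue
			}
			fmt.Printf("%v: sampled %d item(s)\n", gvr, len(items.Items))
		}
	}
}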

I think we should try this before we make a start on #18 and #17

Low priority k8s source backlog

  • componentstatuses
  • apiservices
  • csistoragecapacities
  • customresourcedefinitions
  • eniconfigs
  • fieldexports
  • flowschemas
  • globalclusters
  • leases
  • limitranges
  • priorityclasses
  • prioritylevelconfigurations
  • validatingwebhookconfigurations
  • mutatingwebhookconfigurations
  • runtimeclasses

Locking up after long usage

The kube source has been hovering around 1.5% CPU usage through the night. Oddly, the source does not respond to queries even though it is still happily connected to both the kube API and NATS:

~ $ netstat -atp
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 :::http-alt             :::*                    LISTEN      1/source
tcp        0      0 demo-overmind-kube-source-5b868897d-vqxdn:54732 nats.default.svc.cluster.local:4222 ESTABLISHED 1/source
tcp        0      0 demo-overmind-kube-source-5b868897d-vqxdn:50986 kubernetes.default.svc.cluster.local:https ESTABLISHED 1/source
~ $ 

Nats-top:

NATS server version 2.9.20 (uptime: 20h13m50s)
Server:
  ID:   NB2LR3QJDH6IYAJ5D4OXBEPVYGDQBT6YDJAYTUVXZRVBKIXQPZPP57J5
  Load: CPU:  0.0%  Memory: 19.8M  Slow Consumers: 0
  In:   Msgs: 17.0M  Bytes: 460.8M  Msgs/Sec: 327.2  Bytes/Sec: 7.7K 
  Out:  Msgs: 86.2K  Bytes: 38.5M  Msgs/Sec: 0.0  Bytes/Sec: 0

Connections Polled: 5
  HOST                                CID    NAME                                                                   SUBS    PENDING     MSGS_TO     MSGS_FROM   BYTES_TO    BYTES_FROM  LANG     VERSION  UPTIME   LAST_ACTIVITY
  2a05:d01c:997:3200:9a2f::8:56112    16     source.default.f63e5fff-fd97-4723-984a-eb65e8a35b6f-58d4f7cdf5-2d8jw   10      0           7.3K        37.6K       987.2K      34.3M       go       1.27.1   20h13m17s  2023-07-20 06:35:22.741581908 +0000 UTC
  2a05:d01c:997:3200:e9b4::c:49632    17     source.default.2919c5eb-ba0d-4495-beed-fbd80404e924-57489b896b-4vdbn   4       0           7.4K        1.6K        1015.1K     672.2K      go       1.27.1   20h13m7s  2023-07-20 06:35:22.735401345 +0000 UTC 
  2a05:d01c:997:3200:e9b4::d:33344    18     source.default.313a0824-416b-4a9a-b455-369a38e62245-6587ff8c8b-rglj5   14      0           6.8K        32.2K       894.7K      9.7M        go       1.27.1   20h13m6s  2023-07-20 06:35:22.750481085 +0000 UTC 
  2a05:d01c:997:3200:e9b4::7:44966    19     revlink-6ffbd7567f-qnt79                                               2       0           6.3K        842         4.4M        152.1K      go       1.27.1   20h12m54s  2023-07-20 06:35:22.631274491 +0000 UTC
  2a05:d01c:997:3200:9a2f::4:54732    34     .demo-overmind-kube-source-5b868897d-vqxdn                             20      0           6.8K        16.9M       889.9K      410.6M      go       1.27.1   15h40m11s  2023-07-20 07:39:48.900815265 +0000 UTC

It's constantly sending a large number of messages.

Initial steps

  • Add tracing: basically copy over the various tracing.go / tracing/ files and hook the startup/shutdown methods into the root cmd
  • Fix that godforsaken tracing log message in everything
  • Restart the pod with the required tracing so that we can catch this tomorrow at least
  • Audit the response sender logic in discovery, see if there is any way for it to not realise a query is done
  • Audit what we're doing when we get new namespaces

Engine restarting due to namespace event every 10 seconds

This only seems to happen after the source has been running for a while. Example of the output:

time="2023-05-31T12:00:29Z" level=info msg="Restarting engine due to namespace event: "
time="2023-05-31T12:00:29Z" level=info msg="Listing namespaces"
time="2023-05-31T12:00:29Z" level=info msg="NATS disconnected" address= error="<nil>"
time="2023-05-31T12:00:29Z" level=info msg="NATS connection closed" error="<nil>"
time="2023-05-31T12:00:29Z" level=info msg="got 7 namespaces"
time="2023-05-31T12:00:29Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"
time="2023-05-31T12:00:39Z" level=info msg="NATS connected" ServerID=NDKSO4EZTKGJ5MCON5FZ4UPUGQHBNIM3F6V37OJTGPBSL4IOLZ2LB
time="2023-05-31T12:00:39Z" level=info msg="Restarting engine due to namespace event: "
time="2023-05-31T12:00:39Z" level=info msg="Listing namespaces"
time="2023-05-31T12:00:39Z" level=info msg="NATS disconnected" address= error="<nil>"
time="2023-05-31T12:00:39Z" level=info msg="NATS connection closed" error="<nil>"
time="2023-05-31T12:00:39Z" level=info msg="got 7 namespaces"
time="2023-05-31T12:00:40Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"
time="2023-05-31T12:00:40Z" level=info msg="NATS connected" ServerID=NDKSO4EZTKGJ5MCON5FZ4UPUGQHBNIM3F6V37OJTGPBSL4IOLZ2LB
time="2023-05-31T12:00:40Z" level=info msg="Restarting engine due to namespace event: "
time="2023-05-31T12:00:40Z" level=info msg="NATS disconnected" address= error="<nil>"
time="2023-05-31T12:00:40Z" level=info msg="NATS connection closed" error="<nil>"
time="2023-05-31T12:00:40Z" level=info msg="Listing namespaces"
time="2023-05-31T12:00:40Z" level=info msg="got 7 namespaces"
time="2023-05-31T12:00:40Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"
time="2023-05-31T12:00:40Z" level=info msg="NATS connected" ServerID=NDKSO4EZTKGJ5MCON5FZ4UPUGQHBNIM3F6V37OJTGPBSL4IOLZ2LB
time="2023-05-31T12:00:40Z" level=info msg="Restarting engine due to namespace event: "
time="2023-05-31T12:00:40Z" level=info msg="Listing namespaces"
time="2023-05-31T12:00:40Z" level=info msg="NATS disconnected" address= error="<nil>"
time="2023-05-31T12:00:40Z" level=info msg="NATS connection closed" error="<nil>"
time="2023-05-31T12:00:40Z" level=info msg="got 7 namespaces"
time="2023-05-31T12:00:40Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"
time="2023-05-31T12:00:50Z" level=info msg="NATS connected" ServerID=NDKSO4EZTKGJ5MCON5FZ4UPUGQHBNIM3F6V37OJTGPBSL4IOLZ2LB
time="2023-05-31T12:00:50Z" level=info msg="Restarting engine due to namespace event: "
time="2023-05-31T12:00:50Z" level=info msg="NATS disconnected" address= error="<nil>"
time="2023-05-31T12:00:50Z" level=info msg="NATS connection closed" error="<nil>"
time="2023-05-31T12:00:50Z" level=info msg="Listing namespaces"
time="2023-05-31T12:00:51Z" level=info msg="got 7 namespaces"
time="2023-05-31T12:00:51Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"
time="2023-05-31T12:00:51Z" level=info msg="NATS connected" ServerID=NDKSO4EZTKGJ5MCON5FZ4UPUGQHBNIM3F6V37OJTGPBSL4IOLZ2LB
time="2023-05-31T12:00:51Z" level=info msg="Restarting engine due to namespace event: "
time="2023-05-31T12:00:51Z" level=info msg="Listing namespaces"
time="2023-05-31T12:00:51Z" level=info msg="NATS disconnected" address= error="<nil>"
time="2023-05-31T12:00:51Z" level=info msg="NATS connection closed" error="<nil>"
time="2023-05-31T12:00:51Z" level=info msg="got 7 namespaces"
time="2023-05-31T12:00:51Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"
time="2023-05-31T12:01:01Z" level=info msg="NATS connected" ServerID=NDKSO4EZTKGJ5MCON5FZ4UPUGQHBNIM3F6V37OJTGPBSL4IOLZ2LB
time="2023-05-31T12:01:01Z" level=info msg="Restarting engine due to namespace event: "
time="2023-05-31T12:01:01Z" level=info msg="Listing namespaces"
time="2023-05-31T12:01:01Z" level=info msg="NATS disconnected" address= error="<nil>"
time="2023-05-31T12:01:01Z" level=info msg="NATS connection closed" error="<nil>"
time="2023-05-31T12:01:02Z" level=info msg="got 7 namespaces"
time="2023-05-31T12:01:02Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"

Create helm chart to deploy

If users are to run this on their clusters they will need a helm chart to deploy it. This chart will need to contain:

  • A deployment to run the pod
  • A configmap for the NATS and other config
  • A secret for the NKey seed & the token

Source is looping

For some reason the source is still looping, like what happened previously in #53

time="2023-06-09T14:28:41Z" level=info msg="NATS connected" ServerID=NDKSO4EZTKGJ5MCON5FZ4UPUGQHBNIM3F6V37OJTGPBSL4IOLZ2LBQM7 URL:="nats://nats:4222"                                                                                             │
│ time="2023-06-09T14:28:41Z" level=info msg="Listing namespaces"                                                                                                                                                                                   │
│ time="2023-06-09T14:28:42Z" level=info msg="got 7 namespaces"                                                                                                                                                                                     │
│ time="2023-06-09T14:28:42Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"                                                                                                                                          │
│ time="2023-06-09T14:28:42Z" level=info msg="NATS connected" ServerID=NDKSO4EZTKGJ5MCON5FZ4UPUGQHBNIM3F6V37OJTGPBSL4IOLZ2LBQM7 URL:="nats://nats:4222"                                                                                             │
│ time="2023-06-09T14:28:42Z" level=info msg="Listing namespaces"                                                                                                                                                                                   │
│ time="2023-06-09T14:28:43Z" level=info msg="got 7 namespaces"                                                                                                                                                                                     │
│ time="2023-06-09T14:28:43Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"                                                                                                                                          │
│ time="2023-06-09T14:28:53Z" level=info msg="NATS connected" ServerID=NDKSO4EZTKGJ5MCON5FZ4UPUGQHBNIM3F6V37OJTGPBSL4IOLZ2LBQM7 URL:="nats://nats:4222"                                                                                             │
│ time="2023-06-09T14:28:53Z" level=info msg="Listing namespaces"                                                                                                                                                                                   │
│ I0609 14:28:55.086434       1 request.go:696] Waited for 1.151082353s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/api/v1/namespaces                                                             │
│ time="2023-06-09T14:28:55Z" level=info msg="got 7 namespaces"                                                                                                                                                                                     │
│ time="2023-06-09T14:28:55Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"                                                                                                                                          │
│ time="2023-06-09T14:28:55Z" level=info msg="NATS connected" ServerID=NDKSO4EZTKGJ5MCON5FZ4UPUGQHBNIM3F6V37OJTGPBSL4IOLZ2LBQM7 URL:="nats://nats:4222"                                                                                             │
│ time="2023-06-09T14:28:55Z" level=info msg="Listing namespaces"                                                                                                                                                                                   │
│ time="2023-06-09T14:28:56Z" level=info msg="got 7 namespaces"                                                                                                                                                                                     │
│ time="2023-06-09T14:28:56Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"                                                                                                                                          │
│ time="2023-06-09T14:28:56Z" level=info msg="NATS connected" ServerID=NDKSO4EZTKGJ5MCON5FZ4UPUGQHBNIM3F6V37OJTGPBSL4IOLZ2LBQM7 URL:="nats://nats:4222"                                                                                             │
│ time="2023-06-09T14:28:56Z" level=info msg="Listing namespaces"                                                                                                                                                                                   │
│ time="2023-06-09T14:28:57Z" level=info msg="got 7 namespaces"                                                                                                                                                                                     │
│ time="2023-06-09T14:28:57Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"                                                                                                                                          │
│ I0609 14:29:07.489119       1 request.go:696] Waited for 1.002376188s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/api/v1/namespaces/kube-system/replicationcontrollers                          │
│ time="2023-06-09T14:29:07Z" level=info msg="NATS connected" ServerID=NDKSO4EZTKGJ5MCON5FZ4UPUGQHBNIM3F6V37OJTGPBSL4IOLZ2LBQM7 URL:="nats://nats:4222"                                                                                             │
│ time="2023-06-09T14:29:07Z" level=info msg="Listing namespaces"                                                                                                                                                                                   │
│ time="2023-06-09T14:29:08Z" level=info msg="got 7 namespaces"                                                                                                                                                                                     │
│ time="2023-06-09T14:29:08Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"                                                                                                                                          │
│ time="2023-06-09T14:29:08Z" level=info msg="NATS connected" ServerID=NDKSO4EZTKGJ5MCON5FZ4UPUGQHBNIM3F6V37OJTGPBSL4IOLZ2LBQM7 URL:="nats://nats:4222"                                                                                             │
│ time="2023-06-09T14:29:08Z" level=info msg="Listing namespaces"                                                                                                                                                                                   │
│ time="2023-06-09T14:29:09Z" level=info msg="got 7 namespaces"                                                                                                                                                                                     │
│ time="2023-06-09T14:29:09Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"                                                                                                                                          │
│ time="2023-06-09T14:29:09Z" level=info msg="NATS connected" ServerID=NDKSO4EZTKGJ5MCON5FZ4UPUGQHBNIM3F6V37OJTGPBSL4IOLZ2LBQM7 URL:="nats://nats:4222"                                                                                             │
│ time="2023-06-09T14:29:09Z" level=info msg="Listing namespaces"                                                                                                                                                                                   │
│ time="2023-06-09T14:29:11Z" level=info msg="got 7 namespaces"                                                                                                                                                                                     │
│ time="2023-06-09T14:29:11Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"                                                                                                                                          │
│ time="2023-06-09T14:29:11Z" level=info msg="NATS connected" ServerID=NDKSO4EZTKGJ5MCON5FZ4UPUGQHBNIM3F6V37OJTGPBSL4IOLZ2LBQM7 URL:="nats://nats:4222"                                                                                             │
│ time="2023-06-09T14:29:11Z" level=info msg="Listing namespaces"                                                                                                                                                                                   │
│ time="2023-06-09T14:29:12Z" level=info msg="got 7 namespaces"                                                                                                                                                                                     │
│ time="2023-06-09T14:29:12Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"                                                                                                                                          │
│ I0609 14:29:18.083232       1 request.go:696] Waited for 1.045901276s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/rbac.authorization.k8s.io/v1/clusterrolebindings                         │
│ time="2023-06-09T14:29:22Z" level=info msg="NATS connected" ServerID=NDKSO4EZTKGJ5MCON5FZ4UPUGQHBNIM3F6V37OJTGPBSL4IOLZ2LBQM7 URL:="nats://nats:4222"                                                                                             │
│ time="2023-06-09T14:29:22Z" level=info msg="Listing namespaces"                                                                                                                                                                                   │
│ time="2023-06-09T14:29:24Z" level=info msg="got 7 namespaces"                                                                                                                                                                                     │
│ time="2023-06-09T14:29:24Z" level=info msg="NATS connecting" servers="nats://nats:4222,nats://nats:4223"                                                                                                                                          │
│ I0609 14:29:28.084857       1 request.go:696] Waited for 1.198547805s due to client-side throttling, not priority and fairness, request: GET:https://172.20.0.1:443/apis/networking.k8s.io/v1/namespaces/buildkit/networkpolicies                 │
│ time="2023-06-09T14:29:34Z" level=info msg="NATS connected" ServerID=NDKSO4EZTKGJ5MCON5FZ4UPUGQHBNIM3F6V37OJTGPBSL4IOLZ2LBQM7 URL:="nats://nats:4222"                                                                                             │
│ time="2023-06-09T14:29:34Z" level=info msg="Listing namespaces"  

Implement medium priority k8s sources

  • securitygrouppolicies
  • resourcequotas
  • networkpolicies
  • ingressclasses
  • ingressclassparams
  • certificatesigningrequests
  • csidrivers
  • csinodes
  • dbclusterparametergroups
  • dbclusters
  • dbinstances
  • dbparametergroups
  • dbproxies
  • dbsubnetgroups
  • podtemplates
  • targetgroupbindings

Decide how auth/lifecycle should work for self-hosted sources

Auth

At the moment the source is using the same auth method, which assumes that we're able to get NKeys and a token to the source securely. This is okay in srcman as it's all hosted by us, but for self-hosted sources we need to be a bit smarter. The things I don't like about the current approach are:

  • There is no need for the NKey seed to be sent over the wire, it could be generated locally and then just the public key sent back to get a token
  • The token has a really long lifetime and can't be revoked

It would be cool if we could have the user install the helm chart without having to pass any parameters. I'm thinking maybe it generates an NKey seed, and then shows the user something that they can provide to us to prove the source is theirs, like a URL that they can click which adds the source to their account. The thing is, once they have clicked this, we need to get the JWT to the source somehow. Maybe it would need to make an unauthenticated request to the API which is "approved" by the user clicking the link.
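
As a sketch of the first point, assuming the github.com/nats-io/nkeys library (which underpins NATS auth), the seed could be generated locally and only the public key ever sent to us:

package main

import (
	"fmt"
	"log"

	"github.com/nats-io/nkeys"
)

func main() {
	// Generate a user key pair locally, inside the cluster
	kp, err := nkeys.CreateUser()
	if err != nil {
		log.Fatal(err)
	}

	// The seed stays with the source (e.g. in a Kubernetes secret) and is never transmitted
	seed, err := kp.Seed()
	if err != nil {
		log.Fatal(err)
	}
	_ = seed

	// Only the public key is sent to the API in exchange for a (revocable, short-lived) token
	pub, err := kp.PublicKey()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("public key:", pub)
}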

Lifecycle

The next problem is: how do we tell what sources you have? Currently we use Kubernetes as a database for sources and all data is stored there. We could customise this so that source data can be stored without it actually starting a source, maybe? It does however raise some questions:

  • How would you delete it?
  • How do we track the state?

Handle temporary connection issues better

The alert in

sentry.CaptureException(err)

is a bit overzealous and should wait for a bit before raising an alert.

Sentry Issue: BACKEND-15

syscall.Errno: connection refused
*os.SyscallError: connect: connection refused
*net.OpError: dial tcp [fd16:6e37:bf5d::1]:443: connect: connection refused
*url.Error: Get "https://[fd16:6e37:bf5d::1]:443/api/v1/namespaces": dial tcp [fd16:6e37:bf5d::1]:443: connect: connection refused
*rest.wrapPreviousError: Get "https://[fd16:6e37:bf5d::1]:443/api/v1/namespaces": dial tcp [fd16:6e37:bf5d::1]:443: connect: connection refused - error from a previous attempt: read tcp [2a05:d01c:997:3200:9a2f::9]:43548->[fd16:6e37:bf5d::1]:443: read: connection reset by peer
  File "/workspace/cmd/root.go", line 275, in run.func3

Review current state of k8s integration

David has already done a bunch of work to get the old source compiling. I need to review it and create tickets for what else needs to be done

We are going to need this no matter what so I need to get cracking on creating/resurrecting the k8s integration. Since we already have an example source the work should start by reviewing that: https://github.com/overmindtech/k8s-source

We should then go and create the source using the newest template and the knowledge that we've gained from the AWS source. I'm thinking that we'll almost certainly be able to use generics for the sources themselves

Memory leak

In kube we're seeing this pod being eventually evicted with the following errors:

  • The node was low on resource: memory. Container overmind-kube-source was using 1822764Ki, which exceeds its request of 0
  • The node was low on resource: memory. Container overmind-kube-source was using 2515040Ki, which exceeds its request of 0
  • The node was low on resource: memory. Container overmind-kube-source was using 3094324Ki, which exceeds its request of 0

This is a super massive amount of memory so it must be leaking somewhere.
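
To track this down, the standard net/http/pprof handlers could be exposed on a debug port so heap profiles can be compared over time (a sketch; not currently wired into the source, and the port is arbitrary):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// `go tool pprof http://<pod>:6060/debug/pprof/heap` can then be used to
	// take and diff heap snapshots
	log.Fatal(http.ListenAndServe(":6060", nil))
}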

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

This repository currently has no open or pending branches.

Detected dependencies

dockerfile
build/package/Dockerfile
  • golang 1.22-alpine
  • alpine 3.20
github-actions
.github/workflows/docker-cleanup.yml
  • actions/delete-package-versions v5
.github/workflows/release-charts.yml
  • actions/checkout v4
  • azure/setup-helm v4
  • helm/chart-releaser-action v1.6.0
.github/workflows/test-build.yml
  • actions/checkout v4
  • actions/setup-go v5
  • actions/checkout v4
  • docker/metadata-action v5
  • docker/login-action v3
  • docker/login-action v3
  • depot/setup-action v1
  • depot/build-push-action v1
  • actions/upload-artifact v4
  • actions/checkout v4
  • actions/upload-artifact v4
gomod
go.mod
  • go 1.22.4
  • github.com/MrAlias/otel-schema-utils v0.2.1-alpha
  • github.com/getsentry/sentry-go v0.28.1
  • github.com/google/uuid v1.6.0
  • github.com/overmindtech/discovery v0.27.6
  • github.com/overmindtech/sdp-go v0.79.0
  • github.com/overmindtech/sdpcache v1.6.4
  • github.com/sirupsen/logrus v1.9.3
  • github.com/spf13/cobra v1.8.1
  • github.com/spf13/pflag v1.0.5
  • github.com/spf13/viper v1.19.0
  • github.com/uptrace/opentelemetry-go-extra/otellogrus v0.3.1
  • go.opentelemetry.io/contrib/detectors/aws/ec2 v1.28.0
  • go.opentelemetry.io/otel v1.28.0
  • go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.28.0
  • go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.28.0
  • go.opentelemetry.io/otel/sdk v1.28.0
  • go.uber.org/automaxprocs v1.5.3
  • k8s.io/api v0.30.2
  • k8s.io/apimachinery v0.30.2
  • k8s.io/client-go v0.30.2
  • sigs.k8s.io/kind v0.23.0
  • go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.53.0
  • k8s.io/client-go v1.4.0
  • k8s.io/client-go v1.5.0
  • k8s.io/client-go v1.5.1
  • k8s.io/client-go v2.0.0+incompatible
  • k8s.io/client-go v3.0.0+incompatible
  • k8s.io/client-go v4.0.0+incompatible
  • k8s.io/client-go v5.0.0+incompatible
  • k8s.io/client-go v5.0.1+incompatible
  • k8s.io/client-go v6.0.0+incompatible
  • k8s.io/client-go v7.0.0+incompatible
  • k8s.io/client-go v8.0.0+incompatible
  • k8s.io/client-go v9.0.0-invalid+incompatible
  • k8s.io/client-go v9.0.0+incompatible
  • k8s.io/client-go v10.0.0+incompatible
  • k8s.io/client-go v11.0.0+incompatible
helm-values
deployments/overmind-kube-source/values.yaml


Create generic kube source

Using the existing reflection implementation as an example, I need to create a k8s source using generics (a minimal sketch follows the checklist below).

  • Implement a source using generics that satisfies the discovery.Source interface
  • Create a pod & node source as an example of namespaced and non-namespaced
  • Review developer experience and move duplication into the source
  • Create comprehensive tests
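
A minimal sketch of the generics idea; the real discovery.Source interface from overmindtech/discovery is not reproduced here, and all names below are hypothetical:

package main

import (
	"context"
	"fmt"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// KubeSource wraps a typed client-go lister so that per-resource sources only
// need to provide a List function and an item extractor
type KubeSource[Object any, ObjectList any] struct {
	TypeName string
	List     func(ctx context.Context, opts metav1.ListOptions) (ObjectList, error)
	Extract  func(list ObjectList) []Object
}

// ListAll lists every item the wrapped client can see
func (s *KubeSource[Object, ObjectList]) ListAll(ctx context.Context) ([]Object, error) {
	list, err := s.List(ctx, metav1.ListOptions{})
	if err != nil {
		return nil, err
	}
	return s.Extract(list), nil
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}

	// Namespaced example: pods in the default namespace
	podSource := &KubeSource[corev1.Pod, *corev1.PodList]{
		TypeName: "Pod",
		List: func(ctx context.Context, opts metav1.ListOptions) (*corev1.PodList, error) {
			return cs.CoreV1().Pods("default").List(ctx, opts)
		},
		Extract: func(list *corev1.PodList) []corev1.Pod { return list.Items },
	}

	pods, err := podSource.ListAll(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s: %d items\n", podSource.TypeName, len(pods))
}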

Auto-determine cluster name in EKS

This is possible and would save us relying on the user configuring the source correctly, which is always a good thing for everyone involved. A rough sketch of the lookup follows below.

  • We can use the IMDSv2 API to determine the details of the instance such as the region, ID etc.
  • We can then use the EC2 API to get the tags from the instance
  • From the tags we can determine the name of the cluster; this can be validated by getting the cluster details

TODO: Work out how this would work in Fargate
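
A rough sketch using the AWS SDK for Go v2; the aws:eks:cluster-name tag key (set on managed node group instances) is an assumption that should be confirmed, and the Fargate case is still open:

package main

import (
	"context"
	"fmt"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/feature/ec2/imds"
	"github.com/aws/aws-sdk-go-v2/service/ec2"
	ec2types "github.com/aws/aws-sdk-go-v2/service/ec2/types"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// 1. Ask IMDSv2 which instance we are running on
	doc, err := imds.NewFromConfig(cfg).GetInstanceIdentityDocument(ctx, &imds.GetInstanceIdentityDocumentInput{})
	if err != nil {
		log.Fatal(err)
	}
	cfg.Region = doc.Region

	// 2. Read the tags of that instance via the EC2 API
	out, err := ec2.NewFromConfig(cfg).DescribeTags(ctx, &ec2.DescribeTagsInput{
		Filters: []ec2types.Filter{
			{Name: aws.String("resource-id"), Values: []string{doc.InstanceID}},
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	// 3. Look for the cluster name tag (tag key is an assumption to confirm),
	//    then validate it by describing the cluster
	for _, tag := range out.Tags {
		if aws.ToString(tag.Key) == "aws:eks:cluster-name" {
			fmt.Printf("cluster %q in region %s\n", aws.ToString(tag.Value), doc.Region)
		}
	}
}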

Implement high priority k8s sources

Implement the following sources:

  • refactor clusterrolebindings
  • refactor clusterroles
  • refactor configmaps
  • refactor cronjobs
  • refactor daemonsets
  • refactor deployments
  • refactor endpoints
  • refactor endpointslices
  • refactor horizontalpodautoscalers
  • refactor ingresses
  • refactor jobs
  • refactor limitranges
  • refactor namespaces (I'm going to skip this one)
  • refactor networkpolicy
  • refactor nodes
  • refactor persistentvolumeclaims
  • refactor persistentvolumes
  • refactor poddisruptionbudgets
  • refactor pods
  • create podsecuritypolicies (doesn't work)
  • create priorityclass
  • refactor replicasets
  • refactor replicationcontrollers
  • create resourcequota
  • refactor rolebindings
  • refactor roles
  • refactor secrets
  • refactor serviceaccounts
  • refactor services
  • refactor statefulsets
  • refactor storageclasses
  • create volumeattachments

Once this is complete, go through all the shared stuff and delete anything that isn't required.

Then go through the start command and make sure all of these get loaded properly, including reload on changed namespaces

Final k8s integration work

Work out the final bits we need in order for people to use the Kubernetes source. This means:

  • Test and document helm charts
  • #85
  • Create a modal or whatever for adding a kube source. This should allow the user to add an API key for that source that already has the correct scopes, and once that is done it gives you output to copy that creates the source from helm
  • Allow srcman to support external sources (ones that it doesn't need to run)
