
kube-thanos's People

Contributors

ahysing, brancz, bwplotka, clyang82, craigfurman, dgrisonnet, douglascamata, dschaaff, jacobbaungard, jpugliesi, kakkoyun, lilic, marcolan018, maxbrunet, metalmatze, michael-burt, morvencao, mwasilew2, onprem, paulfantom, philipgough, qianchenglong, s-urbaniak, saswatamcode, squat, srueg, thibaultmg, vprashar2929, wujie1993, yeya24


kube-thanos's Issues

Regenerate Manifests

The manifests aren't up to date with the latest code on master. Would it be possible to re-generate them? Another option would be to remove the folder and generate the manifests on tag creation as a release artifact.

add extraEnv to all containers to make injection of custom env vars easier

I need to inject some environment variables into all Thanos pods (to configure tracing). Since this is currently not supported by the library, I will probably do it by patching multiple components in the generated JSON.

Would adding support for it be something you'd be interested in? I'm happy to prepare a PR.

For example, for kube-thanos-store it could be something like:

(...)
      env: [
        (existing env config)
      ] + (
        // extraEnv is already a list of env vars, so concatenate it directly
        // rather than nesting it inside another list
        if std.length(ts.config.extraEnv) > 0
        then ts.config.extraEnv
        else []
      ),

(...)
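Usage could then look something like this (a sketch; extraEnv is the proposed field name, and the env var shown is just an illustration):

local s = t.store(commonConfig {
  extraEnv: [
    // example: point a tracing library at the node's Jaeger agent
    { name: 'JAEGER_AGENT_HOST', valueFrom: { fieldRef: { fieldPath: 'status.hostIP' } } },
  ],
});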

Not sure who the maintainers are; pinging @yeya24 @brancz @metalmatze for some feedback.

Full example of kube-thanos and kube-prometheus

Hey! I've spent a good part of today combining kube-thanos and kube-prometheus into one jsonnet file. The process could have been a lot smoother with examples (like a full combination of the two) and if some defaults (such as the namespace) were set to mesh more nicely with kube-prometheus.

I could potentially share my config with you, the developers, once I've had the opportunity to remove confidential information. Would that be of use?

"thanos-objectstorage" not found

  1. kubectl apply -f manifests/
  2. kubectl describe po -n thanos thanos-store-0

I got the error Error: secret "thanos-objectstorage" not found. I checked the YAML files and there is no Secret manifest among them:

# ls manifests/
thanos-query-deployment.yaml      thanos-query-serviceMonitor.yaml  thanos-store-serviceAccount.yaml  thanos-store-service.yaml
thanos-query-serviceAccount.yaml  thanos-query-service.yaml         thanos-store-serviceMonitor.yaml  thanos-store-statefulSet.yaml
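For anyone else hitting this: the manifests only reference the Secret; you have to create thanos-objectstorage (with a thanos.yaml key) yourself. A minimal sketch, assuming an S3-compatible bucket and the thanos namespace (placeholders in angle brackets):

apiVersion: v1
kind: Secret
metadata:
  name: thanos-objectstorage
  namespace: thanos
stringData:
  thanos.yaml: |
    type: s3
    config:
      bucket: <BUCKET>
      endpoint: <S3_ENDPOINT>
      access_key: <ACCESS_KEY>
      secret_key: <SECRET_KEY>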

Add kustomization.yaml

I was wondering if I could add a kustomization.yaml that lists the manifest files. kube-prometheus has one, and it would make it easier for teams using Kustomize to pull upstream changes into their kustomization when they are building Thanos.
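A minimal sketch of what it could look like, using the manifest file names from the generated output above:

apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- thanos-query-deployment.yaml
- thanos-query-service.yaml
- thanos-query-serviceAccount.yaml
- thanos-query-serviceMonitor.yaml
- thanos-store-service.yaml
- thanos-store-serviceAccount.yaml
- thanos-store-serviceMonitor.yaml
- thanos-store-statefulSet.yaml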

Not all VMs are equal. Support for kubernetes tolerations and nodeAffinity

I work in an organisation that is a heavy user of Kubernetes running on Microsoft Azure AKS.
Thanos and kube-thanos have worked out great for us. However, Thanos requires more memory than what we have on ordinary application servers. The solution is to schedule Thanos on a different node pool with more memory than normal applications. To achieve this, one can combine two Kubernetes features: Taints and Tolerations, and Node Affinity.

In the current version of kube-thanos these two fields are not configurable. I hope to contribute a pull request where these two sections can be configured via the kube-thanos jsonnet library.

The end result should add tolerations to all objects of kind: Deployment:

apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: query-layer
      app.kubernetes.io/instance: thanos-query
      app.kubernetes.io/name: thanos-query
  template:
    metadata:
      labels:
        app.kubernetes.io/component: query-layer
        app.kubernetes.io/instance: thanos-query
        app.kubernetes.io/name: thanos-query
        app.kubernetes.io/version: v0.19.0
    spec:
      tolerations: 
        - effect: NoSchedule
          key: CriticalAddonsOnly
          operator: Equal
          value: "true"
...

The end result should also add nodeAffinity to all objects of kind: Deployment:

apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: query-layer
      app.kubernetes.io/instance: thanos-query
      app.kubernetes.io/name: thanos-query
  template:
    metadata:
      labels:
        app.kubernetes.io/component: query-layer
        app.kubernetes.io/instance: thanos-query
        app.kubernetes.io/name: thanos-query
        app.kubernetes.io/version: v0.19.0
    spec:
      ...
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: agentpool
                operator: In
                values:
                - systempool
        podAntiAffinity:
        ...

A working solution should build on standard Kubernetes configuration and be generic enough to fit a similar setup on all major cloud providers.

There might be other ways to run Thanos on dedicated hardware on Azure Kubernetes Service; my proposal might not be the only good solution.
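I imagine the configuration could look something like this (a sketch only; the field names are illustrative, not an existing API):

local sts = t.store(commonConfig {
  // hypothetical fields, mirroring the Kubernetes pod spec
  tolerations: [
    { effect: 'NoSchedule', key: 'CriticalAddonsOnly', operator: 'Equal', value: 'true' },
  ],
  nodeAffinity: {
    requiredDuringSchedulingIgnoredDuringExecution: {
      nodeSelectorTerms: [
        { matchExpressions: [{ key: 'agentpool', operator: 'In', values: ['systempool'] }] },
      ],
    },
  },
});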

Unify naming conventions

We use command names for the file names, with only two exceptions: querier and compactor. Should we rename them?

What do you think?

Run components as non-root

Thanos components run as root by default:

$ docker run --rm -ti --entrypoint= quay.io/thanos/thanos:v0.14.0 id
uid=0(root) gid=0(root) groups=10(wheel)

Pods should probably have a restricted security context. I currently run them with the following (except for receive and rule, which I do not run):

securityContext:
  fsGroup: 2000
  runAsNonRoot: true
  runAsUser: 1000

Do you have an opinion on the uid/gid to use? 65534 (nobody/nogroup) seems popular too, but not everyone thinks that's what it should be used for.

PodSecurityPolicy integration

I think there could be various ways to do this, but my first hunch is to allow passing a PodSecurityPolicy name to bind each component to.

add envoy proxy for remote sidecar

see thanos-io/thanos#977

As Thanos Query does not currently support mTLS per store, the recommended pattern is to add an Envoy proxy for each remote store, which can terminate the TLS connection. It would be nice to add this to kube-thanos to make it easier to deploy with Thanos.

[Question] how to use the mixin.libsonnet?

Are there any recommendations on how to import the mixins?

I tried just importing mixin.libsonnet, since it imports the other configs, but now I'm getting an error saying there are duplicated rules, and Prometheus goes into a crash loop.

level=info ts=2019-09-24T17:44:32.693Z caller=main.go:670 msg="TSDB started"
level=info ts=2019-09-24T17:44:32.693Z caller=main.go:740 msg="Loading configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=info ts=2019-09-24T17:44:32.697Z caller=kubernetes.go:192 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-09-24T17:44:32.698Z caller=kubernetes.go:192 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-09-24T17:44:32.699Z caller=kubernetes.go:192 component="discovery manager scrape" discovery=k8s msg="Using pod service account via in-cluster config"
level=info ts=2019-09-24T17:44:32.700Z caller=kubernetes.go:192 component="discovery manager notify" discovery=k8s msg="Using pod service account via in-cluster config"
level=error ts=2019-09-24T17:44:32.749Z caller=manager.go:833 component="rule manager" msg="loading groups failed" err="groupname: \"thanos-querier.rules\" is repeated in the same file"
level=error ts=2019-09-24T17:44:32.749Z caller=manager.go:833 component="rule manager" msg="loading groups failed" err="groupname: \"thanos-receive.rules\" is repeated in the same file"
level=error ts=2019-09-24T17:44:32.749Z caller=manager.go:833 component="rule manager" msg="loading groups failed" err="groupname: \"thanos-store.rules\" is repeated in the same file"
level=error ts=2019-09-24T17:44:32.749Z caller=main.go:759 msg="Failed to apply configuration" err="error loading rules, previous rule set restored"
level=info ts=2019-09-24T17:44:32.749Z caller=main.go:523 msg="Stopping scrape discovery manager..."
level=info ts=2019-09-24T17:44:32.749Z caller=main.go:537 msg="Stopping notify discovery manager..."
level=info ts=2019-09-24T17:44:32.750Z caller=main.go:559 msg="Stopping scrape manager..."
level=info ts=2019-09-24T17:44:32.750Z caller=main.go:519 msg="Scrape discovery manager stopped"
level=error ts=2019-09-24T17:44:32.750Z caller=endpoints.go:131 component="discovery manager scrape" discovery=k8s role=endpoint msg="endpoints informer unable to sync cache"
level=error ts=2019-09-24T17:44:32.750Z caller=endpoints.go:131 component="discovery manager scrape" discovery=k8s role=endpoint msg="endpoints informer unable to sync cache"
level=error ts=2019-09-24T17:44:32.750Z caller=endpoints.go:131 component="discovery manager scrape" discovery=k8s role=endpoint msg="endpoints informer unable to sync cache"
level=info ts=2019-09-24T17:44:32.751Z caller=main.go:533 msg="Notify discovery manager stopped"
level=error ts=2019-09-24T17:44:32.751Z caller=endpoints.go:131 component="discovery manager notify" discovery=k8s role=endpoint msg="endpoints informer unable to sync cache"
level=info ts=2019-09-24T17:44:32.751Z caller=manager.go:815 component="rule manager" msg="Stopping rule manager..."
level=info ts=2019-09-24T17:44:32.751Z caller=manager.go:821 component="rule manager" msg="Rule manager stopped"
level=info ts=2019-09-24T17:44:32.753Z caller=notifier.go:602 component=notifier msg="Stopping notification manager..."
level=info ts=2019-09-24T17:44:32.753Z caller=main.go:724 msg="Notifier manager stopped"
level=info ts=2019-09-24T17:44:32.753Z caller=main.go:553 msg="Scrape manager stopped"
level=error ts=2019-09-24T17:44:32.753Z caller=main.go:733 err="error loading config from \"/etc/prometheus/config_out/prometheus.env.yaml\": one or more errors occurred while applying the new configuration (--config.file=\"/etc/prometheus/config_out/prometheus.env.yaml\")"

[Question/Proposal] How to track compatible Thanos

Is it good practice to create version tags matching Thanos itself, and cut releases accordingly?

There might be several versions out there, and depending on the version Thanos uses different flags, for example. It would be really good to know which Thanos versions are compatible with the current state of this repository.

What do you think?

Add functionality to render individual components

kube-thanos has shaped up into a nicely modular library. A concern that users still face is that they must render all components manually, even though they just pass a single config. See here for an example: not only is this leaky configuration with layering violations, it's inconvenient and error-prone (it took multiple attempts to get certain parts right in the linked example).

It would be great if each individual component offered a .manifests field that could be used recursively to build the above without leaking configuration into the final rendering step.
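A sketch of what I have in mind (hypothetical; nothing here exists yet, and import paths are abbreviated):

local t = import 'kube-thanos/thanos.libsonnet';

local store = t.store(commonConfig);
local query = t.query(commonConfig);

// Rendering would then be a plain merge of per-component manifest maps,
// with no per-object plumbing in the final act:
store.manifests + query.manifests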

@kakkoyun @metalmatze

I keep getting an unbound immediate PersistentVolumeClaim on thanos-store

I am following this article: https://programmer.group/how-to-use-thanos-to-implement-prometheus-multi-cluster-monitoring.html

Everything is fine until I use the kube-thanos build script. I can build the manifests for store and query, but when I apply them I keep getting an unbound immediate PersistentVolumeClaim on thanos-store. The OBJSTORE_CONFIG to access MinIO works for the Prometheus sidecar StatefulSet, but not for the store.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/component: object-store-gateway
    app.kubernetes.io/instance: thanos-store
    app.kubernetes.io/name: thanos-store
    app.kubernetes.io/version: v0.17.0
  name: thanos-store
  namespace: monit
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/component: object-store-gateway
      app.kubernetes.io/instance: thanos-store
      app.kubernetes.io/name: thanos-store
  serviceName: thanos-store
  template:
    metadata:
      labels:
        app.kubernetes.io/component: object-store-gateway
        app.kubernetes.io/instance: thanos-store
        app.kubernetes.io/name: thanos-store
        app.kubernetes.io/version: v0.17.0
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app.kubernetes.io/name
                  operator: In
                  values:
                  - thanos-store
                - key: app.kubernetes.io/instance
                  operator: In
                  values:
                  - thanos-store
              namespaces:
              - monit
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - store
        - --log.level=info
        - --log.format=logfmt
        - --data-dir=/var/thanos/store
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --objstore.config=$(OBJSTORE_CONFIG)
        - --ignore-deletion-marks-delay=24h
        env:
        - name: OBJSTORE_CONFIG
          valueFrom:
            secretKeyRef:
              key: thanos.yaml
              name: thanos-objectstorage
        image: quay.io/thanos/thanos:v0.17.0
        livenessProbe:
          failureThreshold: 8
          httpGet:
            path: /-/healthy
            port: 10902
            scheme: HTTP
          periodSeconds: 30
        name: thanos-store
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        readinessProbe:
          failureThreshold: 20
          httpGet:
            path: /-/ready
            port: 10902
            scheme: HTTP
          periodSeconds: 5
        resources: {}
        terminationMessagePolicy: FallbackToLogsOnError
        volumeMounts:
        - mountPath: /var/thanos/store
          name: data
          readOnly: false
      terminationGracePeriodSeconds: 120
      volumes: []
  volumeClaimTemplates:
  - metadata:
      labels:
        app.kubernetes.io/component: object-store-gateway
        app.kubernetes.io/instance: thanos-store
        app.kubernetes.io/name: thanos-store
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 3Gi

When I add a storageClass to the volumeClaimTemplates spec section it is basically ignored; no matter what, I keep getting this error. Can you please shed some light into what's missing here?
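For reference, the library's config does accept a volumeClaimTemplate; a sketch of pinning an explicit storage class through it, assuming the cluster actually has a provisioner for that class:

local s = t.store(commonConfig {
  volumeClaimTemplate: {
    spec: {
      accessModes: ['ReadWriteOnce'],
      storageClassName: '<STORAGE_CLASS>',  // must name an existing StorageClass
      resources: { requests: { storage: '3Gi' } },
    },
  },
});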

301 response missing Location header

I'm getting the following error. Please let me know what is missing.

level=info ts=2020-02-25T12:27:04.082433095Z caller=main.go:149 msg="Tracing will be disabled"
level=info ts=2020-02-25T12:27:04.082519571Z caller=factory.go:43 msg="loading bucket configuration"
level=info ts=2020-02-25T12:27:04.099206846Z caller=inmemory.go:167 msg="created in-memory index cache" maxItemSizeBytes=131072000 maxSizeBytes=262144000 maxItems=math.MaxInt64
level=info ts=2020-02-25T12:27:04.099521724Z caller=options.go:20 protocol=gRPC msg="disabled TLS, key and cert must be set to enable"
level=info ts=2020-02-25T12:27:04.099760677Z caller=store.go:288 msg="starting store node"
level=info ts=2020-02-25T12:27:04.099850267Z caller=prober.go:127 msg="changing probe status" status=healthy
level=info ts=2020-02-25T12:27:04.099887538Z caller=http.go:53 service=http/server component=store msg="listening for requests and metrics" address=0.0.0.0:10902
level=info ts=2020-02-25T12:27:04.099942146Z caller=store.go:243 msg="initializing bucket store"
level=info ts=2020-02-25T12:27:04.131914347Z caller=prober.go:107 msg="changing probe status" status=ready
level=info ts=2020-02-25T12:27:04.131935092Z caller=http.go:78 service=http/server component=store msg="internal server shutdown" err="bucket store initial sync: sync block: MetaFetcher: iter bucket: Get https://amjad-thanos.s3.dualstack.eu-west-1.amazonaws.com/?delimiter=%2F&encoding-type=url&prefix=: 301 response missing Location header"
level=info ts=2020-02-25T12:27:04.131976289Z caller=prober.go:137 msg="changing probe status" status=not-healthy reason="bucket store initial sync: sync block: MetaFetcher: iter bucket: Get https://amjad-thanos.s3.dualstack.eu-west-1.amazonaws.com/?delimiter=%2F&encoding-type=url&prefix=: 301 response missing Location header"
level=info ts=2020-02-25T12:27:04.131983955Z caller=grpc.go:98 service=gRPC/server component=store msg="listening for StoreAPI gRPC" address=0.0.0.0:10901
level=warn ts=2020-02-25T12:27:04.131999302Z caller=prober.go:117 msg="changing probe status" status=not-ready reason="bucket store initial sync: sync block: MetaFetcher: iter bucket: Get https://amjad-thanos.s3.dualstack.eu-west-1.amazonaws.com/?delimiter=%2F&encoding-type=url&prefix=: 301 response missing Location header"
level=info ts=2020-02-25T12:27:04.13202616Z caller=grpc.go:117 service=gRPC/server component=store msg="gracefully stopping internal server"
level=info ts=2020-02-25T12:27:04.132126286Z caller=grpc.go:129 service=gRPC/server component=store msg="internal server shutdown" err="bucket store initial sync: sync block: MetaFetcher: iter bucket: Get https://amjad-thanos.s3.dualstack.eu-west-1.amazonaws.com/?delimiter=%2F&encoding-type=url&prefix=: 301 response missing Location header"
level=error ts=2020-02-25T12:27:04.13216824Z caller=main.go:194 msg="running command failed" err="bucket store initial sync: sync block: MetaFetcher: iter bucket: Get https://amjad-thanos.s3.dualstack.eu-west-1.amazonaws.com/?delimiter=%2F&encoding-type=url&prefix=: 301 response missing Location header"

following is my configuration:

- args:
    - store
    - --data-dir=/var/thanos/store
    - --grpc-address=0.0.0.0:10901
    - --http-address=0.0.0.0:10902
    - |
      --objstore.config=type: s3
      config:
        bucket: "amjad-thanos"
        endpoint: "s3.eu-west-1.amazonaws.com"
        access_key: "kajshdajkshd87098098"
        secret_key: "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxas"

all.jsonnet causes the store statefulset manifest to be duplicated

Currently the example jsonnet at all.jsonnet causes the StatefulSet for thanos-store to be duplicated, once with memcache and once without. This is probably due to

{ 'thanos-store-statefulSet-with-memcached': swm.statefulSet }
and this can be seen in https://github.com/thanos-io/kube-thanos/tree/a6a0027a3cd3da380479642debb202f2722710ee/examples/all/manifests

I suggest the example be updated with a boolean variable that checks whether memcached is required and updates the StatefulSet in place, as sketched below. Let me know how that sounds and I can create a PR for the same.
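Something along these lines (a sketch; withMemcached is a proposed name, and manifests stands for the existing map of rendered objects in all.jsonnet):

local withMemcached = true;  // proposed toggle

// Override the store StatefulSet key in place instead of emitting a second copy:
manifests + (
  if withMemcached
  then { 'thanos-store-statefulSet': swm.statefulSet }
  else {}
)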

kube-thanos + istio = broken GRPC

Hey! So I have istio installed in my cluster and am attempting to install the Thanos components as well. With istio installed, none of my gRPC connections succeed, and querier receives the following for all services:

rpc error: code = Unavailable desc = upstream connect error or disconnect/reset before headers. reset reason: connection failure"

I've renamed the grpc service ports to http2 to no avail. I'm using the dnssrv+ approach for local service discovery: --store=dnssrv+_grpc._tcp.thanos-store.monitoring.svc.cluster.local (--store=dnssrv+_http2._tcp.thanos-store.monitoring.svc.cluster.local with the name swap).

Istio mutual TLS is disabled and the istio-proxy logs don't give me any clues; without istio, everything discovers one another and works as intended.

Any ideas?

Eliminate job regex matcher on Alerts

Although I'm the one who introduced them, I realized they are error-prone and could do more harm than good. Better to remove those selectors for more reliable alerting rules.

Add healthcheck probes

The containers are missing liveness/readiness probes.

As it's a Kubernetes best practice, we should get them added.

I can work on it once I figure out why jb is crashing locally :)
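For reference, a sketch of the probes against the /-/healthy and /-/ready endpoints Thanos serves on its HTTP port:

livenessProbe:
  httpGet:
    path: /-/healthy
    port: 10902
    scheme: HTTP
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /-/ready
    port: 10902
    scheme: HTTP
  periodSeconds: 5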

Sidecar is not able to clean up block (unsupported type of io.Reader)

Hi
I am using the Thanos sidecar with a Prometheus deployed by the operator. After a block is generated, the upload does not complete successfully and some irregular files are generated. These are the logs:

level=warn ts=2020-11-26T03:03:51.020333618Z caller=s3.go:399 msg="could not guess file size for multipart upload; upload might be not optimized" name=debug/metas/01ER17WJWGR8Z44CHR42DQW3SY.json err="unsupported type of io.Reader"
level=error ts=2020-11-26T03:04:12.102790195Z caller=shipper.go:349 msg="failed to clean upload directory" err="unlinkat /prometheus/thanos/upload/01ER17WJWGR8Z44CHR42DQW3SY/chunks: directory not empty"

How to add a nodeSelector restriction to components?

Hi there,

Before the recent refactor, I was able to add my nodeSelector restriction, like in the following example:

local s = 
  t.store + 
  t.store.withVolumeClaimTemplate + 
  t.store.withServiceMonitor + 
  commonConfig + {
    config+:: {
      name: 'thanos-store',
      replicas: 1,
    },
    statefulSet+: {
      spec+: {
        template+: {
          spec+: {
            nodeSelector+: {
              'k8s.scaleway.com/pool-name': 'kubernetes-infra',
            },
          },
        },
      },
    },
  };

With the new format, how can I achieve the same behavior?

I tried the following:

local s = t.store(commonConfig {
  replicas: 1,
  serviceMonitor: true,
  statefulSet+: {
    spec+: {
      template+: {
        spec+: {
          nodeSelector+: {
            'k8s.scaleway.com/pool-name': 'kubernetes-infra',
          },
        },
      },
    },
  },
});
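I also wonder whether patching the constructed object, rather than the config, is the intended route; a rough sketch (untested):

local s = t.store(commonConfig { replicas: 1, serviceMonitor: true }) + {
  // compose the patch onto the rendered StatefulSet after construction
  statefulSet+: {
    spec+: {
      template+: {
        spec+: {
          nodeSelector+: {
            'k8s.scaleway.com/pool-name': 'kubernetes-infra',
          },
        },
      },
    },
  },
};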

Thanks in advance,

support jaeger tracing

It would be convenient to add basic tracing configuration to the jsonnet files. I am not sure whether this is in the scope of this repo?
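For reference, Thanos accepts tracing configuration as YAML via its --tracing.config flag; a minimal Jaeger example might look like this (a sketch; values are placeholders and field names follow the Thanos tracing docs):

type: JAEGER
config:
  service_name: thanos-store
  sampler_type: const
  sampler_param: 1
  agent_host: <JAEGER_AGENT_HOST>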

What can I replace here: "dnssrv+_client._tcp.<MEMCACHED_SERVICE>.thanos.svc.cluster.local"?

I tried thanos-store, thanos-store-0, thanos-store-1, and thanos-store-2, but the pod does not start. Could you advise?

level=info ts=2022-03-24T07:39:58.779840562Z caller=caching_bucket_factory.go:71 msg="loading caching bucket configuration"
level=error ts=2022-03-24T07:39:58.781341992Z caller=resolver.go:99 msg="failed to lookup SRV records" host=_client._tcp.thanos-store.thanos.svc.cluster.local err="no such host"
level=error ts=2022-03-24T07:39:58.781506043Z caller=main.go:132 err="no server address resolved for \nfailed to create memcached client\ngithub.com/thanos-io/thanos/pkg/store/cache.NewCachingBucketFromYaml\n\t/home/circleci/project/pkg/store/cache/caching_bucket_factory.go:92\nmain.runStore\n\t/home/circleci/project/cmd/thanos/store.go:260\nmain.registerStore.func1\n\t/home/circleci/project/cmd/thanos/store.go:195\nmain.main\n\t/home/circleci/project/cmd/thanos/main.go:130\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371\ncreate caching bucket\nmain.runStore\n\t/home/circleci/project/cmd/thanos/store.go:262\nmain.registerStore.func1\n\t/home/circleci/project/cmd/thanos/store.go:195\nmain.main\n\t/home/circleci/project/cmd/thanos/main.go:130\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371\npreparing store command failed\nmain.main\n\t/home/circleci/project/cmd/thanos/main.go:132\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1371"

Sidecar service name is incorrect

The sidecar.libsonnet adds a store pointing at prometheus-k8s.monitoring, which is the regular Prometheus service, instead of prometheus-operated.monitoring, which is the headless service.

Thanos v0.17.0 and `--experimental.enable-index-cache-postings-compression`

The flag --experimental.enable-index-cache-postings-compression was removed from Thanos v0.17.0, and its behaviour is now the default.

At the time of writing, kube-thanos sets this flag when an index cache is configured: https://github.com/thanos-io/kube-thanos/blob/master/jsonnet/kube-thanos/kube-thanos-store.libsonnet#L64. This is itself a little strange to me, because an in-memory index cache is used by default, even if it's not explicitly configured.

I'm new to this project and am not sure yet what the best way forward is. At the very least I think kube-thanos will have to pull the version from config and not set this flag when v0.17.0+ is used. That also avoids a breaking change.

I'm happy to PR this, but figured it'd be best to get maintainer opinions first.
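For illustration, the version gate could look something like this (a sketch only; versionOlderThan is a hypothetical helper that would need to be written):

args+: (
  // only emit the flag for images older than v0.17.0, where it still exists
  if versionOlderThan(ts.config.version, 'v0.17.0')
  then ['--experimental.enable-index-cache-postings-compression']
  else []
),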

Add HOST_IP_ADDRESS env to all containers

It is not uncommon to deploy tracing agents as daemonsets on every node. The tracing libraries are then typically configured by setting an environment variable via the Kubernetes downward API. It seems paradoxical to force this onto downstream users when the tracing config is already available.

Since it is not harmful to users that don't use this environment variable, I propose we automatically add the HOST_IP_ADDRESS environment variable to each Thanos component container, so that for example for tracing purposes the environment variable can be used directly.
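Concretely, this would add the standard downward-API wiring to each container, along these lines:

env:
- name: HOST_IP_ADDRESS
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP  # resolves to the node's IP at pod startup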

cc @metalmatze @kakkoyun

Support full stack in kustomization.yaml

The current kustomization.yaml file doesn't support all the components of the Thanos stack. Can it be updated to use the manifests from the examples/all-manifests folder, or can we add a kustomization.yaml file to that folder?
I would be happy to create a PR if it can be decided where the best location is.
Thanks

Support specifying annotations for ServiceAccounts

We have a case to support AWS STS, which is supported by Thanos:

STS Endpoint

If you want to use IAM credentials retrieved from an instance profile, Thanos needs to authenticate through AWS STS. For this purpose you can specify your own STS endpoint.

By default Thanos will use the endpoint https://sts.amazonaws.com/ and the corresponding AWS regional endpoints.

In order to support STS/ROSA clusters, I need to add annotations to the ServiceAccounts "thanos-store-shard", "thanos-compact", "thanos-receive", and "thanos-receive-controller" to provide the ARN of the permission policy.
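For example, an IRSA-style annotation on one of the ServiceAccounts could look like this (the ARN is a placeholder):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: thanos-store
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/<ROLE_NAME>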

thanos-receive ingress

Hello everyone, can somebody help me with the ingress configuration for thanos-receive?
I have a pretty well-working Thanos stack running on minikube, but when I deploy it to the testing k8s environment, I get a lot of errors in the logs. Here is my Ingress:


apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/auth-realm: Authentication Required
    # nginx.ingress.kubernetes.io/backend-protocol: GRPC
    nginx.ingress.kubernetes.io/proxy-buffer-size: "128k"
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "40"
    nginx.ingress.kubernetes.io/proxy-buffers-number: "8"
    nginx.ingress.kubernetes.io/proxy-buffering: "on"
    nginx.ingress.kubernetes.io/proxy-max-temp-file-size: "1024m"
    nginx.ingress.kubernetes.io/auth-secret: prometheus-auth
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "20m"
  name: thanos-receive
  namespace: monitoring
spec:
  rules:
  - host: receive-thanos.sandbox.k8s.sandbox.example.com
    http:
      paths:
      - backend:
          serviceName: thanos-receive
          servicePort: 19291
        path: /
  tls:
  - hosts:
    - receive-thanos.sandbox.k8s.sandbox.example.com
    secretName: thanos-tls-certs

logs of ingress pod:
2021/02/24 09:19:32 [error] 89#89: *1227 upstream timed out (110: Connection timed out) while connecting to upstream, client: 1.2.3.4, server: receive-thanos.sandbox.k8s.sandbox.example.com, request: "POST /api/v1/receive HTTP/1.1", upstream: "http://100.96.24.19:19291/api/v1/receive", host: "receive-thanos.sandbox.k8s.sandbox.example.com"

logs of receive:
level=error ts=2021-02-24T09:37:28.552040714Z caller=handler.go:330 component=receive component=receive-handler err="context deadline exceeded" msg="internal server error"
level=debug ts=2021-02-24T09:37:20.829019499Z caller=handler.go:315 component=receive component=receive-handler msg="failed to handle request" err="context deadline exceeded"

logs of prometheus
ts=2021-02-24T10:06:20.725Z caller=dedupe.go:112 component=remote level=warn remote_name=b29d86 url=https://receive-thanos.sandbox.k8s.sandbox.example.com/api/v1/receive msg="Failed to send batch, retrying" err="server returned HTTP status 502 Bad Gateway: <html>"

Non-default ServiceAccounts

All components currently use the default ServiceAccount, which is problematic from a security standpoint: on GCP, for example, object storage bucket permissions are granted through the service account via workload identity, so even components that don't need object storage access currently get it.

I'll prepare a PR to create a ServiceAccount per component.

@kakkoyun @metalmatze

Breaking update to store shard statefulsets

Store shards deployed before #199 was merged cannot easily be updated to the latest revision of kube-thanos, because of two changes made to StatefulSet.spec: volumeClaimTemplates and selector. The error I'm seeing from Kubernetes is: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden.

This isn't a huge problem, because it's possible to jsonnet-patch around it, but it was still a little time-consuming to diagnose and fix, and I wonder if there's a reasonable place to document this to save future users the time.

Docs/Version question

I was reading through the README and it says to use another repo; is this correct?

jb install github.com/metalmatze/kube-thanos/jsonnet/kube-thanos@master

thanos-store without volumeClaimTemplate not working

According to the docs, backing thanos-store with a PVC is optional:

It [thanos-store] acts primarily as an API gateway and therefore does not need significant amounts of local disk space. It joins a Thanos cluster on startup and advertises the data it can access. It keeps a small amount of information about all remote blocks on local disk and keeps it in sync with the bucket. This data is generally safe to delete across restarts at the cost of increased startup times.

When compiling a jsonnet configuration that does not contain config.volumeClaimTemplate, the StatefulSet can never create its pods because the "data" volume cannot be found:

$ kubectl -n monitoring describe statefulset/thanos-store
Name:               thanos-store
Namespace:          monitoring
CreationTimestamp:  Mon, 22 Nov 2021 15:10:46 +0000
Selector:           app.kubernetes.io/component=object-store-gateway,app.kubernetes.io/instance=thanos-store,app.kubernetes.io/name=thanos-store
Labels:             app.kubernetes.io/component=object-store-gateway
                    app.kubernetes.io/instance=thanos-store
                    app.kubernetes.io/name=thanos-store
                    app.kubernetes.io/version=v0.22.0
                    kustomize.toolkit.fluxcd.io/name=kube-prometheus-thanos
                    kustomize.toolkit.fluxcd.io/namespace=monitoring
Annotations:        <none>
Replicas:           1 desired | 0 total
Update Strategy:    RollingUpdate
  Partition:        0
Pods Status:        0 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:           app.kubernetes.io/component=object-store-gateway
                    app.kubernetes.io/instance=thanos-store
                    app.kubernetes.io/name=thanos-store
                    app.kubernetes.io/version=v0.22.0
  Service Account:  thanos-store
  Containers:
   thanos-store:
    Image:       quay.io/thanos/thanos:v0.22.0
    Ports:       10901/TCP, 10902/TCP
    Host Ports:  0/TCP, 0/TCP
    Args:
      store
      --log.level=info
      --log.format=logfmt
      --data-dir=/var/thanos/store
      --grpc-address=0.0.0.0:10901
      --http-address=0.0.0.0:10902
      --objstore.config=$(OBJSTORE_CONFIG)
      --ignore-deletion-marks-delay=24h
    Liveness:   http-get http://:10902/-/healthy delay=0s timeout=1s period=30s #success=1 #failure=8
    Readiness:  http-get http://:10902/-/ready delay=0s timeout=1s period=5s #success=1 #failure=20
    Environment:
      OBJSTORE_CONFIG:  <set to the key 'thanos.yaml' in secret 'thanos-objectstorage'>  Optional: false
      HOST_IP_ADDRESS:   (v1:status.hostIP)
    Mounts:
      /var/thanos/store from data (rw)
  Volumes:      <none>
Volume Claims:  <none>
Events:
  Type     Reason        Age                   From                    Message
  ----     ------        ----                  ----                    -------
  Warning  FailedCreate  8m41s (x19 over 30m)  statefulset-controller  create Pod thanos-store-0 in StatefulSet thanos-store failed error: Pod "thanos-store-0" is invalid: spec.containers[0].volumeMounts[0].name: Not found: "data"

Logic exists to validate any passed volumeClaimTemplate for use with thanos-store: https://github.com/thanos-io/kube-thanos/blob/main/jsonnet/kube-thanos/kube-thanos-store.libsonnet#L28

This logic should be extended to add an emptyDir volume definition for "data" to the StatefulSet when no volumeClaimTemplate is passed.
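A sketch of that fallback (assuming the config continues to expose volumeClaimTemplate, defaulting to an empty object):

statefulSet+: {
  spec+: {
    template+: {
      spec+: {
        // Fall back to a throwaway emptyDir when no PVC template is given;
        // per the docs quoted above, store data is safe to lose across
        // restarts at the cost of increased startup times.
        volumes: if ts.config.volumeClaimTemplate == {} then [
          { name: 'data', emptyDir: {} },
        ] else [],
      },
    },
  },
},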
