
opentelemetry-helm-charts's Introduction

OpenTelemetry Helm Charts

License Artifact Hub

This repository contains Helm charts for the OpenTelemetry project.

Usage

Helm must be installed to use the charts. Please refer to Helm's documentation to get started.

Once Helm is set up properly, add the repo as follows:

$ helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts

Helm Charts

You can then run helm search repo open-telemetry to see the charts.

OpenTelemetry Collector

The chart can be used to install the OpenTelemetry Collector in a Kubernetes cluster. More detailed documentation can be found in the OpenTelemetry Collector chart directory.

OpenTelemetry Demo

The chart can be used to install the OpenTelemetry Demo in a Kubernetes cluster. More detailed documentation can be found in the OpenTelemetry Demo chart directory.

OpenTelemetry Operator

The chart can be used to install the OpenTelemetry Operator in a Kubernetes cluster. More detailed documentation can be found in the OpenTelemetry Operator chart directory.

Contributing

See CONTRIBUTING.md.

Approvers (@open-telemetry/helm-approvers):

Emeritus Approvers:

Maintainers (@open-telemetry/helm-maintainers):

Emeritus Maintainers:

Learn more about roles in the community repository.

License

Apache 2.0 License.

opentelemetry-helm-charts's People

Contributors

2start, allex1, ancostas, austinlparker, awxfeix, bogdandrutu, chrsmark, chunter0, dependabot[bot], dmitryax, eislm0203, ekarlso, erichsueh3, jaredtan95, jaronoff97, jinja2, joshleecreates, julianocosta89, nicolastakashi, povilasv, puckpuck, rabunkosar-dd, saber-w, secustor, sherifkayad, swiatekm-sumo, tylerhelmuth, yamagai, zehenrique, zrochler


opentelemetry-helm-charts's Issues

failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: tls: first record does not look like a TLS handshake"

using chart open-telemetry/opentelemetry-collector: 0.6.0
using chart grafana/tempo: 0.7.7 (for traces backend)

The problem is that when I try to push traces from Python code into the otel collector, I get errors.

2021-10-09T03:01:33.665Z	info	service/service.go:411	Starting OpenTelemetry Collector...	{"Version": "v0.22.0", "GitHash": "0345f0d2", "NumCPU": 2}
2021-10-09T03:01:33.669Z	info	service/service.go:593	Using memory ballast	{"MiBs": 204}
2021-10-09T03:01:33.669Z	info	service/service.go:255	Setting up own telemetry...
2021-10-09T03:01:33.670Z	info	service/telemetry.go:102	Serving Prometheus metrics	{"address": "0.0.0.0:8888", "level": 0, "service.instance.id": "be91d61f-ee6e-445a-9275-d9c4ac5343c6"}
2021-10-09T03:01:33.670Z	info	service/service.go:292	Loading configuration...
2021-10-09T03:01:33.672Z	info	service/service.go:303	Applying configuration...
2021-10-09T03:01:33.749Z	info	service/service.go:324	Starting extensions...
2021-10-09T03:01:33.749Z	info	builder/extensions_builder.go:53	Extension is starting...	{"component_kind": "extension", "component_type": "health_check", "component_name": "health_check"}
2021-10-09T03:01:33.749Z	info	healthcheckextension/healthcheckextension.go:40	Starting health_check extension	{"component_kind": "extension", "component_type": "health_check", "component_name": "health_check", "config": {"TypeVal":"health_check","NameVal":"health_check","Port":13133}}
2021-10-09T03:01:33.750Z	info	builder/extensions_builder.go:59	Extension started.	{"component_kind": "extension", "component_type": "health_check", "component_name": "health_check"}
2021-10-09T03:01:33.750Z	info	builder/exporters_builder.go:302	Exporter is enabled.	{"component_kind": "exporter", "exporter": "otlp"}
2021-10-09T03:01:33.750Z	info	builder/exporters_builder.go:302	Exporter is enabled.	{"component_kind": "exporter", "exporter": "logging"}
2021-10-09T03:01:33.750Z	info	service/service.go:339	Starting exporters...
2021-10-09T03:01:33.750Z	info	builder/exporters_builder.go:92	Exporter is starting...	{"component_kind": "exporter", "component_type": "otlp", "component_name": "otlp"}
2021-10-09T03:01:33.751Z	info	builder/exporters_builder.go:97	Exporter started.	{"component_kind": "exporter", "component_type": "otlp", "component_name": "otlp"}
2021-10-09T03:01:33.752Z	info	builder/exporters_builder.go:92	Exporter is starting...	{"component_kind": "exporter", "component_type": "logging", "component_name": "logging"}
2021-10-09T03:01:33.752Z	info	builder/exporters_builder.go:97	Exporter started.	{"component_kind": "exporter", "component_type": "logging", "component_name": "logging"}
2021-10-09T03:01:33.752Z	info	memorylimiter/memorylimiter.go:108	Memory limiter configured	{"component_kind": "processor", "component_type": "memory_limiter", "component_name": "memory_limiter", "limit_mib": 428867584, "spike_limit_mib": 134217728, "check_interval": 5}
2021-10-09T03:01:33.752Z	info	builder/pipelines_builder.go:203	Pipeline is enabled.	{"pipeline_name": "logs", "pipeline_datatype": "logs"}
2021-10-09T03:01:33.752Z	info	memorylimiter/memorylimiter.go:108	Memory limiter configured	{"component_kind": "processor", "component_type": "memory_limiter", "component_name": "memory_limiter", "limit_mib": 428867584, "spike_limit_mib": 134217728, "check_interval": 5}
2021-10-09T03:01:33.752Z	info	builder/pipelines_builder.go:203	Pipeline is enabled.	{"pipeline_name": "traces", "pipeline_datatype": "traces"}
2021-10-09T03:01:33.752Z	info	service/service.go:352	Starting processors...
2021-10-09T03:01:33.752Z	info	builder/pipelines_builder.go:51	Pipeline is starting...	{"pipeline_name": "logs", "pipeline_datatype": "logs"}
2021-10-09T03:01:33.752Z	info	builder/pipelines_builder.go:61	Pipeline is started.	{"pipeline_name": "logs", "pipeline_datatype": "logs"}
2021-10-09T03:01:33.752Z	info	builder/pipelines_builder.go:51	Pipeline is starting...	{"pipeline_name": "traces", "pipeline_datatype": "traces"}
2021-10-09T03:01:33.752Z	info	builder/pipelines_builder.go:61	Pipeline is started.	{"pipeline_name": "traces", "pipeline_datatype": "traces"}
2021-10-09T03:01:33.752Z	info	builder/receivers_builder.go:230	Receiver is enabled.	{"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp", "datatype": "traces"}
2021-10-09T03:01:33.752Z	info	builder/receivers_builder.go:230	Receiver is enabled.	{"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp", "datatype": "logs"}
2021-10-09T03:01:33.752Z	info	builder/receivers_builder.go:105	Ignoring receiver as it is not used by any pipeline	{"component_kind": "receiver", "component_type": "prometheus", "component_name": "prometheus", "receiver": "prometheus"}
2021-10-09T03:01:33.752Z	info	builder/receivers_builder.go:230	Receiver is enabled.	{"component_kind": "receiver", "component_type": "zipkin", "component_name": "zipkin", "datatype": "traces"}
2021-10-09T03:01:33.752Z	info	builder/receivers_builder.go:230	Receiver is enabled.	{"component_kind": "receiver", "component_type": "jaeger", "component_name": "jaeger", "datatype": "traces"}
2021-10-09T03:01:33.752Z	info	service/service.go:364	Starting receivers...
2021-10-09T03:01:33.752Z	info	builder/receivers_builder.go:70	Receiver is starting...	{"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"}
2021-10-09T03:01:33.752Z	info	otlpreceiver/otlp.go:93	Starting GRPC server on endpoint 0.0.0.0:4317	{"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"}
2021-10-09T03:01:33.752Z	info	otlpreceiver/otlp.go:130	Setting up a second GRPC listener on legacy endpoint 0.0.0.0:55680	{"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"}
2021-10-09T03:01:33.752Z	info	otlpreceiver/otlp.go:93	Starting GRPC server on endpoint 0.0.0.0:55680	{"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"}
2021-10-09T03:01:33.752Z	info	otlpreceiver/otlp.go:108	Starting HTTP server on endpoint 0.0.0.0:55681	{"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"}
2021-10-09T03:01:33.752Z	info	builder/receivers_builder.go:75	Receiver started.	{"component_kind": "receiver", "component_type": "otlp", "component_name": "otlp"}
2021-10-09T03:01:33.752Z	info	builder/receivers_builder.go:70	Receiver is starting...	{"component_kind": "receiver", "component_type": "zipkin", "component_name": "zipkin"}
2021-10-09T03:01:33.752Z	info	builder/receivers_builder.go:75	Receiver started.	{"component_kind": "receiver", "component_type": "zipkin", "component_name": "zipkin"}
2021-10-09T03:01:33.752Z	info	builder/receivers_builder.go:70	Receiver is starting...	{"component_kind": "receiver", "component_type": "jaeger", "component_name": "jaeger"}
2021-10-09T03:01:33.752Z	info	static/strategy_store.go:201	No sampling strategies provided or URL is unavailable, using defaults	{"component_kind": "receiver", "component_type": "jaeger", "component_name": "jaeger"}
2021-10-09T03:01:33.752Z	info	builder/receivers_builder.go:75	Receiver started.	{"component_kind": "receiver", "component_type": "jaeger", "component_name": "jaeger"}
2021-10-09T03:01:33.752Z	info	healthcheck/handler.go:128	Health Check state change	{"component_kind": "extension", "component_type": "health_check", "component_name": "health_check", "status": "ready"}
2021-10-09T03:01:33.752Z	info	service/service.go:267	Everything is ready. Begin running and processing data.

Then I submit a sample trace with the Python app and get this error in the pod logs:

2021-10-09T03:05:52.417Z info exporterhelper/queued_retry.go:276 Exporting failed. Will retry the request after interval. {"component_kind": "exporter", "component_type": "otlp", "component_name": "otlp", "error": "failed to push trace data via OTLP exporter: rpc error: code = Unavailable desc = connection error: desc = "transport: authentication handshake failed: tls: first record does not look like a TLS handshake"", "interval": "4.615761008s"}

helm chart values:

config:
  exporters:
    otlp:
      endpoint: tempo.monitoring:4317
  service:
    pipelines:
      metrics: null
      traces:
        exporters:
          - otlp
  receivers:
    jaeger:
      protocols:
        grpc:
          endpoint: 0.0.0.0:14250
        thrift_http:
          endpoint: 0.0.0.0:14268
    otlp:
      protocols:
        grpc: 
          endpoint: 0.0.0.0:4317
        http: 
          endpoint: 0.0.0.0:55681

nodeSelector:
  apps: "true"            
standaloneCollector:
  enabled: false

test.py

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (
    OTLPSpanExporter,
)
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import (
    BatchSpanProcessor,
    ConsoleSpanExporter,
)

span_exporter = OTLPSpanExporter(
   # endpoint="tempo.monitoring:4317",
   endpoint="10.120.4.103:4317",
   insecure=True
)
provider = TracerProvider()
processor = BatchSpanProcessor(ConsoleSpanExporter())
span_processor = BatchSpanProcessor(span_exporter)
provider.add_span_processor(processor)
provider.add_span_processor(span_processor)
trace.set_tracer_provider(provider)


tracer = trace.get_tracer(__name__)

with tracer.start_as_current_span("foo"):
    with tracer.start_as_current_span("bar"):
        with tracer.start_as_current_span("baz"):
            print("Hello world from OpenTelemetry Python!")

If, however, I change the OTLPSpanExporter to

span_exporter = OTLPSpanExporter(
   endpoint="tempo.monitoring:4317",   
   insecure=True
)

then everything works and I can see the trace in Grafana Tempo:

curl tempo.monitoring:3100/api/traces/695df1582064293a3be4fbac4285c779
{"batches":[{"resource":{"attributes":[{"key":"telemetry.sdk.language","value":{"stringValue":"python"}},{"key":"telemetry.sdk.name","value":{"stringValue":"opentelemetry"}},{"key":"telemetry.sdk.version","value":{"stringValue":"1.0.0"}},{"key":"service.name","value":{"stringValue":"unknown_service"}}]},"instrumentationLibrarySpans":[{"instrumentationLibrary":{"name":"__main__"},"spans":[{"traceId":"aV3xWCBkKTo75PusQoXHeQ==","spanId":"heVf3cr2TEk=","parentSpanId":"2JXZSaNcNVk=","name":"baz","kind":"SPAN_KIND_INTERNAL","startTimeUnixNano":"1633749090560979667","endTimeUnixNano":"1633749090561020135","status":{}},{"traceId":"aV3xWCBkKTo75PusQoXHeQ==","spanId":"2JXZSaNcNVk=","parentSpanId":"VyXEDPYNvOc=","name":"bar","kind":"SPAN_KIND_INTERNAL","startTimeUnixNano":"1633749090560949791","endTimeUnixNano":"1633749090561049979","status":{}},{"traceId":"aV3xWCBkKTo75PusQoXHeQ==","spanId":"VyXEDPYNvOc=","name":"foo","kind":"SPAN_KIND_INTERNAL","startTimeUnixNano":"1633749090560726098","endTimeUnixNano

Just in case, here is the opentelemetry-collector-agent configmap:

kgcm opentelemetry-collector-agent -o yaml     
apiVersion: v1
data:
  relay: |
    exporters:
      logging: {}
      otlp:
        endpoint: tempo.monitoring:4317
    extensions:
      health_check: {}
    processors:
      batch: {}
      memory_limiter:
        ballast_size_mib: 204
        check_interval: 5s
        limit_mib: 409
        spike_limit_mib: 128
    receivers:
      jaeger:
        protocols:
          grpc:
            endpoint: 0.0.0.0:14250
          thrift_http:
            endpoint: 0.0.0.0:14268
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:55681
      prometheus:
        config:
          scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 10s
            static_configs:
            - targets:
              - ${MY_POD_IP}:8888
      zipkin:
        endpoint: 0.0.0.0:9411
    service:
      extensions:
      - health_check
      pipelines:
        logs:
          exporters:
          - logging
          processors:
          - memory_limiter
          - batch
          receivers:
          - otlp
        traces:
          exporters:
          - otlp
          processors:
          - memory_limiter
          - batch
          receivers:
          - otlp
          - jaeger
          - zipkin
kind: ConfigMap
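
For anyone hitting the same error: this is not stated in the issue, but a likely cause is that the collector's otlp exporter uses TLS by default, while Tempo in this setup accepts plain gRPC (the Python exporter only works against it with insecure=True). A hedged sketch of disabling TLS on the exporter in the chart values follows; the exact key depends on the collector version, so verify against the exporter docs for the version you run:

config:
  exporters:
    otlp:
      endpoint: tempo.monitoring:4317
      tls:
        insecure: true
      # on older collector releases (such as the v0.22.0 shown above) the
      # equivalent setting was a flat flag instead:
      # insecure: true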

Regex for extract_metadata_from_filepath added by #36 is too strict

The regex is currently:
^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[a-f0-9\-]{36})\/(?P<container_name>[^\._]+)\/(?P<run_id>\d+)\.log$

The uid check for 36 characters is too strict. I had to change the capture group to (?P<uid>[^\/]+), which also matches what is currently in PR #38.
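
For reference, substituting the relaxed uid group into the original regex (exactly as described above) gives:

^.*\/(?P<namespace>[^_]+)_(?P<pod_name>[^_]+)_(?P<uid>[^\/]+)\/(?P<container_name>[^\._]+)\/(?P<run_id>\d+)\.log$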

otlp grpc exporter failing in k8s deployment using helm charts

Hi Team,

I have deployed the otel-collector as both a daemonset and a standalone deployment, as per the README docs at the link below:

https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector

agentCollector:
  enabled: true
standaloneCollector:
  enabled: true

The otlp grpc exporter in the agent pod is failing with the error below:

2022-01-25T05:53:25.907Z	info	service/telemetry.go:116	Serving Prometheus metrics	{"address": "0.0.0.0:8888", "level": "basic", "service.instance.id": "ebb82220-7804-4da1-af63-1bd1055120e8", "service.version": "latest"}
2022-01-25T05:53:25.907Z	info	service/collector.go:230	Starting otelcontribcol...	{"Version": "v0.37.1", "NumCPU": 4}
2022-01-25T05:53:25.907Z	info	service/collector.go:132	Everything is ready. Begin running and processing data.
2022-01-25T05:53:37.500Z	info	exporterhelper/queued_retry.go:231	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "name": "otlp", "error": "failed to push metrics data via OTLP exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "5.810290538s"}
2022-01-25T05:53:56.408Z	info	exporterhelper/queued_retry.go:231	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "name": "otlp", "error": "failed to push metrics data via OTLP exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "7.534661199s"}
2022-01-25T05:54:07.502Z	info	exporterhelper/queued_retry.go:231	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "name": "otlp", "error": "failed to push metrics data via OTLP exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "4.816822688s"}
2022-01-25T05:54:26.217Z	info	exporterhelper/queued_retry.go:231	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "name": "otlp", "error": "failed to push metrics data via OTLP exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "16.060087817s"}
2022-01-25T05:54:37.525Z	info	exporterhelper/queued_retry.go:231	Exporting failed. Will retry the request after interval.	{"kind": "exporter", "name": "otlp", "error": "failed to push metrics data via OTLP exporter: rpc error: code = DeadlineExceeded desc = context deadline exceeded", "interval": "4.261162066s"}

Steps followed:
helm install --set standaloneCollector.enabled=true my-opentelemetry-collector open-telemetry/opentelemetry-collector
kubectl logs -f daemonset/my-opentelemetry-collector-agent

Could you please help resolve this issue?

Default chart configuration not fully overridable

Hello,

I understand that you wanted to provide a "base configuration" (config) and an overridable one (standaloneCollector.configOverride or agentCollector.configOverride), and it's very useful. But there is a problem: installing the chart will always bring in some elements of your default config, even when I have my own defaults in the config value.

So when I use configOverride (for my clients), it's now a three-way merge between your config, my config, and the configOverride, which makes it quite complicated to understand.

I think there are multiple ways to correct this (if you feel it needs to be corrected), but I'm not sure which would be best, since I'm unable to fully understand all of your _config.tpl syntax.
The easiest solution for me would be to either comment out, or pass as a multiline string, the content of the service, receivers, collectors and exporters blocks in your values.yaml. That way the deep merge is still performed up to this depth, but it's easily overridable. Example:

service: |
    extensions:
      - health_check
    pipelines:
      logs:
        exporters:
          - logging
        processors:
          - memory_limiter
          - batch
        receivers:
          - otlp
      metrics:
        exporters:
          - logging
        processors:
          - memory_limiter
          - batch
        receivers:
          - otlp
          - prometheus
      traces:
        exporters:
          - logging
        processors:
          - memory_limiter
          - batch
        receivers:
          - otlp
          - jaeger
          - zipkin

This doesn't work as-is in the otel-collector chart because of the toYaml (I think), but it still gets templatized the way I need it to be.

Thanks

Headless service for the agent

How do you usually configure the tracing SDK exporter to point to the agent? Wouldn't it be a good idea to include a headless service for the agent?
My stack is: SDK -> Agent -> Standalone Collector
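
For illustration, a minimal sketch of what such a headless Service could look like; the name, selector labels, and port are hypothetical and would need to match whatever the chart actually renders for the agent DaemonSet:

apiVersion: v1
kind: Service
metadata:
  name: opentelemetry-collector-agent        # hypothetical name
spec:
  clusterIP: None                            # headless: DNS resolves to the pod IPs directly
  selector:
    app.kubernetes.io/name: opentelemetry-collector   # assumed label; check the chart's rendered labels
  ports:
    - name: otlp-grpc
      port: 4317
      targetPort: 4317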

Release opentelemetry-collector Helm chart for latest app version (0.42.0)

As of writing, the latest opentelemetry-collector chart (0.8.1) supports appVersion 0.37.1. Attempts to use this Helm chart version with the image for app version 0.42.0 produce this error in the pod:

Error: failed to start container "opentelemetry-collector": Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: exec: "/otelcontribcol": stat /otelcontribcol: no such file or directory: unknown
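
A possible workaround until the chart catches up, sketched under the assumption that newer contrib images renamed the binary from /otelcontribcol to /otelcol-contrib (verify against the image you actually use), is to override the image and command in the values:

image:
  repository: otel/opentelemetry-collector-contrib
  tag: 0.42.0
command:
  name: otelcol-contrib   # assumption: binary name in newer contrib images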

Default configuration not deployable due to misconfigured otlp protocol

I am new to OpenTelemetry and was starting to set up an Operator and Collector via the helm charts. It is not really a bug, but the default helm chart configuration for the Collector specifies the protocol "otlp" while providing no configuration for it (here), resulting in the following error when running the container:

receiver "otlp" has invalid configuration: must specify at least one protocol when using the OTLP receiver

The error is easy to understand, and by adding the endpoint values to the "otlp" protocol, everything launches smoothly.

  config:
    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317 # <= Missing
          http:
            endpoint: 0.0.0.0:4318 # <= Missing

Adding these default values, or making the "otlp" receiver not require them, would improve the onboarding and learning process for newbies like me, without having to debug the helm charts first.

operator Helm chart should update cert-manager apiVersion: cert-manager.io/v1

Currently, the operator Helm chart template uses cert-manager with an older apiVersion.
Trying to deploy following the instructions, we fail when installing the Helm chart as instructed here.
Looking inside, we can see that cert-manager is used with the old apiVersion (see certmanager.yaml):

apiVersion: cert-manager.io/v1alpha2
kind: Certificate

and

apiVersion: cert-manager.io/v1alpha2
kind: Issuer

After changing that to the new apiVersion, cert-manager.io/v1, the deployment succeeds and seems to be working.
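
For reference, the corrected resource headers described above would be:

apiVersion: cert-manager.io/v1
kind: Certificate
---
apiVersion: cert-manager.io/v1
kind: Issuer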

Support installing to user-defined namespace

I create my own namespace, for example monitoring, and pass it to the helm command when applying the chart, but this chart seems to have a requirement on its name, as I still get the error namespaces "opentelemetry-operator-system" not found. It should be possible to install into any namespace, respecting the helm namespace argument.

Add functions to current operator Helm chart release script to validate which template files need to be updated

Is your feature request related to a problem? Please describe.

Currently, in the operator Helm chart, we provide a release helper (main.go) for maintainers to update the CRDs, values.yaml and Chart.yaml automatically during every release. We want to add one more function to this, which would compare all the template files under the templates directory with the latest OTel Operator manifest and tell maintainers which files need to be updated.

Describe the solution you'd like

  • Retrieve the latest operator manifest
  • Unmarshal it and compare each YAML file with the template file under the templates directory

cc @alolita

Add functional end-to-end tests for the chart

We need to test the Helm chart's functionality end-to-end:

  • Install the chart on a test Kubernetes cluster.
  • Run a test mock backend (can possibly reuse mock backends from Collector testbed).
  • Configure the chart to send collected traces, metrics and logs to the mock backend.
  • Run some test workload on the cluster. The workload should generate telemetry: application traces, metrics and logs and cluster metrics and logs.
  • Let the mock backend receive the generated telemetry and compare it against expected data. It would be best if the expected data were defined in a set of golden files.

Make sure the end-to-end test runs as a GitHub Action as part of every PR and main branch build.

Error creating processor "memory_limiter" on clean install

Helm chart: opentelemetry-collector-0.4.3 (latest at the time of writing)

When installing the chart with:

helm install otc open-telemetry/opentelemetry-collector

the installation succeeds, but the OTC pod crashes with the following error:

Error: cannot setup pipelines: cannot build pipelines: error creating processor "memory_limiter" in pipeline "metrics": checkInterval must be greater than zero
See full OTC pod logs
2021-03-09T12:37:54.144Z	info	service/service.go:411	Starting OpenTelemetry Collector...	{"Version": "v0.20.0", "GitHash": "a10a1a7a", "NumCPU": 8}
2021-03-09T12:37:54.147Z	info	service/service.go:592	Using memory ballast	{"MiBs": 204}
2021-03-09T12:37:54.147Z	info	service/service.go:255	Setting up own telemetry...
2021-03-09T12:37:54.148Z	info	service/telemetry.go:102	Serving Prometheus metrics	{"address": "0.0.0.0:8888", "level": 0, "service.instance.id": "49b6c5c0-07a7-41c2-9a92-b5e01a14b86b"}
2021-03-09T12:37:54.149Z	info	service/service.go:292	Loading configuration...
2021-03-09T12:37:54.149Z	info	service/service.go:303	Applying configuration...
2021-03-09T12:37:54.149Z	info	service/service.go:324	Starting extensions...
2021-03-09T12:37:54.149Z	info	builder/extensions_builder.go:53	Extension is starting...	{"component_kind": "extension", "component_type": "health_check", "component_name": "health_check"}
2021-03-09T12:37:54.149Z	info	healthcheckextension/healthcheckextension.go:40	Starting health_check extension	{"component_kind": "extension", "component_type": "health_check", "component_name": "health_check", "config": {"TypeVal":"health_check","NameVal":"health_check","Port":13133}}
2021-03-09T12:37:54.149Z	info	builder/extensions_builder.go:59	Extension started.	{"component_kind": "extension", "component_type": "health_check", "component_name": "health_check"}
2021-03-09T12:37:54.149Z	info	builder/exporters_builder.go:306	Exporter is enabled.	{"component_kind": "exporter", "exporter": "logging"}
2021-03-09T12:37:54.150Z	info	service/service.go:339	Starting exporters...
2021-03-09T12:37:54.150Z	info	builder/exporters_builder.go:92	Exporter is starting...	{"component_kind": "exporter", "component_type": "logging", "component_name": "logging"}
2021-03-09T12:37:54.150Z	info	builder/exporters_builder.go:97	Exporter started.	{"component_kind": "exporter", "component_type": "logging", "component_name": "logging"}
Error: cannot setup pipelines: cannot build pipelines: error creating processor "memory_limiter" in pipeline "metrics": checkInterval must be greater than zero
2021/03/09 12:37:54 application run finished with error: cannot setup pipelines: cannot build pipelines: error creating processor "memory_limiter" in pipeline "metrics": checkInterval must be greater than zero

Here's the relevant part of the configmap, as installed by Helm:

kubectl describe configmap otc-opentelemetry-collector-agent
# ...
processors:
    batch: {}
    memory_limiter: null
# ...
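
A possible workaround, sketched from the memory_limiter fields that appear elsewhere on this page (the numbers are illustrative, not recommendations), is to configure the processor explicitly in the chart values so it is no longer rendered as null:

config:
  processors:
    memory_limiter:
      check_interval: 5s
      limit_mib: 409
      spike_limit_mib: 128
      ballast_size_mib: 204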

Add Helm chart to deploy OpenTelemetry Operator in an Kubernetes cluster

Is your feature request related to a problem? Please describe.

The Collector is deployed today using the OpenTelemetry Operator. We would like to add an OpenTelemetry Operator Helm chart to give users another way to deploy the Operator. With this Helm chart, users will have the flexibility to tune the configuration passed into the Operator and manage the OpenTelemetry Operator efficiently.

Describe the solution you'd like

We are designing and building a Helm chart with the OpenTelemetry Operator as the package content. We plan to add this Helm chart to the existing OpenTelemetry Helm repository (https://github.com/open-telemetry/opentelemetry-helm-charts).

Describe alternatives you've considered

We reviewed the Prometheus Helm chart (https://github.com/prometheus-community/helm-charts) and it did not meet our needs.

Additional context

We are using the existing OpenTelemetry Collector Helm chart (https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-collector).

cc: @Saber-W @Raul9595 @alexperez52

Latest published helm chart version is 0.2.0 for the operator

I am trying to deploy the newest version of the helm chart, 0.3.2, but it is not available in the open-telemetry helm chart repository. The latest version available is 0.2.0.

Also, I am unable to even use version 0.2.0, since the CRDs are incorrect.

This is related to issue #72 and PR #75 which looks like an attempt to solve it, but I am still seeing the issue. Looking at the build for that PR, it seems to have failed https://github.com/open-telemetry/opentelemetry-helm-charts/runs/3580447145.

Permission management for prometheus receivers

I am currently testing the prometheus receiver.
The chart creates a service account with no specific permissions, so I ran into the classic Failed to watch *v1.Service error.

What is the recommended way to set up RBAC for the service account this chart creates?
Should this chart provide something similar to what the prometheus chart does, or is that not feasible due to the number of receivers?
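
Until the chart offers something built in, here is a minimal sketch of the kind of RBAC the prometheus receiver's Kubernetes service discovery typically needs; the names are hypothetical and the resource list is an assumption that should be trimmed to the SD configs you actually use:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector-prometheus-sd        # hypothetical name
rules:
  - apiGroups: [""]
    resources: ["nodes", "services", "endpoints", "pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector-prometheus-sd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-collector-prometheus-sd
subjects:
  - kind: ServiceAccount
    name: opentelemetry-collector            # assumed: the service account the chart creates
    namespace: default                       # adjust to the release namespace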

Enable extensions to be string

How it behaves

Extensions can only be specified as objects.

How it should behave

Extensions can be passed as strings. If so, the default configuration for the extension should be applied.

Reproduction

Configuration

config:
  extensions: [health_check, pprof, zpages]

Error

➜  capa-viewer git:(ops/tracing) ✗ (☸ |kind-kind:operations) kubectl get pod                                                               
NAME                                                        READY   STATUS             RESTARTS   AGE
opentelemetry-collector-agent-hdxb6                         1/2     CrashLoopBackOff   6          9m40s

➜  capa-viewer git:(ops/tracing) ✗ (☸ |kind-kind:operations) kubectl logs opentelemetry-collector-agent-hdxb6 opentelemetry-collector
{"level":"info","ts":1608966467.3456182,"caller":"service/service.go:396","msg":"Starting OpenTelemetry Collector...","Version":"latest","GitHash":"a85c7a2d","NumCPU":4}
{"level":"info","ts":1608966467.3494048,"caller":"service/service.go:577","msg":"Using memory ballast","MiBs":204}
{"level":"info","ts":1608966467.3494508,"caller":"service/service.go:240","msg":"Setting up own telemetry..."}
{"level":"info","ts":1608966467.4299529,"caller":"service/telemetry.go:108","msg":"Serving Prometheus metrics","address":"0.0.0.0:8888","legacy_metrics":false,"new_metrics":true,"level":0,"service.instance.id":"5ecd691e-56c9-4c93-a5a1-d0716cad798d"}
{"level":"info","ts":1608966467.430466,"caller":"service/service.go:277","msg":"Loading configuration..."}
Error: cannot load configuration: error reading top level configuration sections: 1 error(s) decoding:

* 'extensions[0]' expected a map, got 'string'
2020/12/26 07:07:47 application run finished with error: cannot load configuration: error reading top level configuration sections: 1 error(s) decoding:

* 'extensions[0]' expected a map, got 'string'
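
As a workaround until string entries are supported, the map form that the config loader accepts looks like the sketch below (whether the chart then wires these into service.extensions is a separate step):

config:
  extensions:
    health_check: {}
    pprof: {}
    zpages: {}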

Migrate the KUTTL smoke tests of the operator Helm chart to Github CI workflow

Is your feature request related to a problem? Please describe.

Currently, during every release of the operator Helm chart, maintainers need to clone the operator repo and run the KUTTL tests locally. We want to migrate all the tests to a GitHub CI workflow so that the tests run automatically when a new PR is submitted.

Describe the solution you'd like

Add KUTTL tests automation to current Github actions.

cc @alolita

Make collector chart have a single responsibility

Bringing up a dedicated issue from the comment #3 (comment) and PR #86.

The proposal is to remove the option to deploy both agentCollector and standaloneCollector with data automatically forwarded from the agent to the standalone collector. This will significantly reduce the complexity of the chart and resolve some configuration issues.

Action items:

Unable to disable pipelines

According to the documentation, to disable a pipeline we need to define metrics as null:

config:
  exporters:
    zipkin:
      endpoint: zipkin-all-in-one:14250
  service:
    pipelines:
      metrics: null
      traces:
        exporters:
          - zipkin

but when I do this, I get a message in the logs that I need at least one metrics receiver:

2021-07-15T15:08:26.040Z	info	service/collector.go:262	Starting otelcol...	{"Version": "v0.29.0", "NumCPU": 4}
2021-07-15T15:08:26.056Z	info	service/collector.go:322	Using memory ballast	{"MiBs": 819}
2021-07-15T15:08:26.056Z	info	service/collector.go:170	Setting up own telemetry...
2021-07-15T15:08:26.058Z	info	service/telemetry.go:99	Serving Prometheus metrics	{"address": "0.0.0.0:8888", "level": 0, "service.instance.id": "37d7dabe-6838-41ee-a5b6-254efb27bb6b"}
2021-07-15T15:08:26.058Z	info	service/collector.go:205	Loading configuration...
Error: invalid configuration: pipeline "metrics" must have at least one receiver
2021/07/15 15:08:26 collector server run finished with error: invalid configuration: pipeline "metrics" must have at least one receiver

How can I disable the metrics and logs pipelines?

CHANGELOG.md

Hey guys, I'm implementing OpenTelemetry, and every time I access this repo a new version exists (at least a bug-fix version).
It is really difficult to know what is going on when a simple changelog is absent. It would be really nice to have one.

Headless Service for Standalone Collector

Hey, I've been using the helm chart for work, and one thing that came up was that I needed a headless service for the standalone collector when used in conjunction with the load-balancing exporter from the contrib repo.

Specifically, this line of the exporter has a service discovery mechanism via DNS. In Kubernetes, one can use a headless service to have the A record resolve to a list of IPs for the backing pods.

The hostname property inside a dns node specifies the hostname to query in order to obtain the list of IP addresses.

So I forked the chart to test this change within my environment, and so far so good. I wanted to open up an issue first, before submitting a PR, to check whether the maintainers are cool with accepting this feature. Before forking the chart, my team had to use a static resolver with the exporter, using n services/deployments with 1 pod backing each, vs using n replicas within the same deployment and 1 headless service (it's easier to scale replicas within 1 deployment than to add n+1 deployments).
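
For context, a sketch of how the load-balancing exporter's DNS resolver would point at such a headless service; the hostname is hypothetical, and the exporter's config keys (including the TLS setting, which changed across versions) should be checked against the contrib docs for your version:

exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        hostname: opentelemetry-collector-headless.monitoring.svc.cluster.local   # hypothetical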

Opentelemetry Gateway Configuration

According to the OpenTelemetry documentation, we can implement a gateway deployment in addition to the collector. But I cannot find any implementation details in these chart settings or in the documentation. Any hint on how to configure an OpenTelemetry gateway for a large cluster? Thanks.

Allow option to create opentelemetry-collector CR along with otel-operator helm installation

I am currently evaluating the opentelemetry-operator helm chart. I am stuck on an issue: I wanted to add the opentelemetry-collector custom resource along with the otel-operator helm chart installation.

The otel-col CR creation fails during installation with the below error:

Internal error occurred: conversion webhook for cert-manager.io/v1alpha2, Kind=Certificate failed: Post "https://cert-manager-webhook.cert-manager.svc:443/convert?timeout=30s": dial tcp 10.102.92.197:443: connect: connection refused

The CR creation triggers a mutating webhook request to the operator, and when the otel-col CR is created during installation the otel-operator is still in the container-creating phase, so the CR creation fails. Is there any solution to this issue? I want to create the otelcol CR along with the otel operator installation.

As a workaround, I am installing the otel-col CR alongside the otel-operator installation by adding a helm pre-install hook to create the otel-col CR prior to the creation of the webhooks, and I am myself setting the otel-col default mode to deployment and the labels that would otherwise be configured by the mutating webhook, as below:

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: opentelemetry-collector
  labels:
    "app.kubernetes.io/managed-by": "opentelemetry-operator"
  annotations:
    "helm.sh/hook": "pre-install"
spec:
  mode: deployment

Are there any alternative solutions to do this? Is my workaround an appropriate fix?
cc @Saber-W @jpkrohling

/var/log/pods/default_nginx_57d......a31f/nginx/0.log: permission denied

AWS EKS 1.21:

$ aws eks describe-cluster --name ...
{
    "cluster": {
        ...
        "version": "1.21",

Following your published docs here, I created the following values.yaml, except that I took the liberty of pointing to the latest image:

$ cat values.yaml
agentCollector:
  containerLogs:
    enabled: true

image:
  repository: otel/opentelemetry-collector-contrib
  tag: 0.33.0

command:
  name: otelcontribcol

Deployed helm:

$ helm upgrade --install my-opentelemetry-collector open-telemetry/opentelemetry-collector -f values.yaml
Release "my-opentelemetry-collector" has been upgraded. Happy Helming!
NAME: my-opentelemetry-collector
LAST DEPLOYED: Thu Aug 26 23:51:14 2021
NAMESPACE: default
STATUS: deployed
REVISION: 6
TEST SUITE: None


  ## Double check the image tag:
$ kubectl get ds my-opentelemetry-collector-agent -o yaml | yq e '.spec.template.spec.containers[0].image' -
otel/opentelemetry-collector-contrib:0.33.0

Deploy some app pod:

$ kubectl run nginx --image=nginx

I get the following error:

$ kubectl logs -f --tail=10 ds/my-opentelemetry-collector-agent
...
2021-08-27T03:51:58.232Z	error	Failed to open file	{"kind": "receiver", "name": "filelog", "operator_id": "$.file_input", "operator_type": "file_input", "error": "open /var/log/pods/default_nginx_57d7fc93-dfd8-42e6-8028-d2ea2c9aa31f/nginx/0.log: permission denied"}
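
Not a confirmed fix, but the usual suspect here is that the collector container's user cannot read the root-owned files under /var/log/pods. Assuming the chart exposes a securityContext value (check the chart's values.yaml), a sketch of running the log-collecting agent as root would be:

agentCollector:
  containerLogs:
    enabled: true

securityContext:
  runAsUser: 0    # assumption: reading /var/log/pods requires root on this node image
  runAsGroup: 0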

agentCollector doesn't expose ports despite config file defining ports

https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/values.yaml#L111 speaks of ports exposed for both the agentCollector and the standaloneCollector. But https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/templates/service.yaml defines a service for only the standaloneCollector.

Unclear if this is an intentional omission that I don't understand, or if this is a known gap that needs a pull request.

Update for opentelemetry-operator v0.43.0

The opentelemetry-operator v0.43.0 has been released!

However, this chart is still based on v0.41.1.

There are some new features we'd like to try.

Please consider updating the chart for opentelemetry-operator v0.43.0!

Thank you so much!

Config override issues

When running the agent and collector I've been seeing config issues.
It may be related to null values when rendering the config to the configmap. Output from just enabling the agent and collector without touching any config:

receivers:
  ...
  otlp:
    protocols: {}

It seems to be missing grpc:

Add hostAliases to opentelemetry-collector helm chart

I am using the opentelemetry-collector helm chart to deploy my instances of the collector, which serve as a batch processor for Jaeger tracing and export traces to a remote Jaeger collector. During this process, I need to update the hostAliases field of the otel collector deployment to add a DNS entry for routing purposes. However, this field is not available in the helm chart, so I have to customize the helm chart to achieve this goal.

I wonder if it would be possible to include this field in the otel collector helm chart? I am happy to submit a PR for this.
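
For reference, the Kubernetes field in question has this shape on a pod spec; a values key exposing it (its name would be decided in the PR) could simply pass this list through:

hostAliases:
  - ip: "10.0.0.10"                            # illustrative IP
    hostnames:
      - "jaeger-collector.example.internal"    # illustrative hostname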

Datadog exporter does not run

I'm posting this here because I think the issue is specific to running the helm chart, not a problem with the Datadog exporter in general. Please let me know if this is not the case and I can post this in the collector contrib repo.

When I run the image directly:

docker run -p 4317:4317 \
    -v $PWD/otel-collector-config.yaml:/etc/otel-collector-config.yaml \
    otel/opentelemetry-collector-contrib:0.30.0 \
    --config=/etc/otel-collector-config.yaml

Where otel-collector-config.yaml looks like:

receivers:
    otlp:
        protocols:
            http:
exporters:
    datadog:
        service: test
        env: test
        version: 0.0.0
        api:
            key: "..."
processors:
    batch:
        timeout: 10s
service:
    pipelines:
        traces:
            receivers: [otlp]
            exporters: [datadog]
            processors: [batch]

I see:

info    utils/api.go:32 Validating API key.     {"kind": "exporter", "name": "datadog"}
info    utils/api.go:39 API key validation successful.  {"kind": "exporter", "name": "datadog"}
...
info    builder/exporters_builder.go:98 Exporter started.       {"kind": "exporter", "name": "datadog"}

And everything works as expected.

However if I use this helm chart:

helm install opentelemetry-collector open-telemetry/opentelemetry-collector --values opentelemetry-collector.yaml

Where opentelemetry-collector.yaml looks like:

agentCollector:
  enabled: false

standaloneCollector:
  enabled: true

image:
  repository: otel/opentelemetry-collector-contrib
  pullPolicy: IfNotPresent
  tag: "0.30.0"

command:
  name: otelcontribcol

config:
  receivers:
    otlp:
      protocols:
        http:

  processors:
    batch:
      timeout: 10s

  exporters:
    datadog:
      service: test
      env: test
      version: 0.0.0
      api:
        key: "..."

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [datadog]

I see:

info    builder/exporters_builder.go:98    Exporter started.    {"kind": "exporter", "name": "datadog"}

But I don't see the other messages about verifying the API key.

And if I send over traces, I see them being logged (if I enable the logging exporter at debug level):

Span #57                                                                                                                                                                                                                
     Trace ID       : 12055027ec737720c79be4a66b1e8672
     Parent ID      : 1363f0dfbd1f4d41
     ...

But nothing is being exported to Datadog.
So it seems like the Datadog exporter is not starting up correctly.
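
One detail worth double-checking when comparing with the other values examples on this page (this is an observation, not a confirmed root cause): in the values above, service: sits at the top level of the file rather than under config:, so the chart may render its default pipelines instead of the one referencing datadog. Nesting it might behave differently:

config:
  receivers:
    otlp:
      protocols:
        http:
  processors:
    batch:
      timeout: 10s
  exporters:
    datadog:
      api:
        key: "..."
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [batch]
        exporters: [datadog]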

revamp ports

k8s ports (pod, service) should be determined by which receivers are set up, rather than managed independently via .Values.ports, unless there is a reason that receiver ports and k8s ports should differ.

This would be a breaking change.

Latest version in helm repo is 0.2.0

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
helm search open-telemetry

Latest version shown is 0.2.0

Release workflow seems to not be detecting chart changes for some reason

https://github.com/open-telemetry/opentelemetry-helm-charts/runs/3490399927?check_suite_focus=true

Looking up latest tag...
Discovering changed charts since 'opentelemetry-operator-0.3.1'...
Nothing to do. No chart changes detected.

Consider other names for agentCollector and standaloneCollector

To continue the discussion about naming of the collector components supported by the chart from open-telemetry/opentelemetry-collector-contrib#1026 (review)

This chart can use one of two (or both) forms of installation: the collector running as a daemonset vs the collector running as a deployment. The current names can sound confusing, so we need to consider other options (or keep the existing ones). Currently they are called:

  • agentCollector for collector running as daemonset - other components like configmap get suffix -agent.
  • standaloneCollector for collector running as deployment - other components don't get any suffix.

pipeline "traces" references processor batch which does not exist

How it behaves

Processor is not defined.

How it should behave

Processor should be defined.

Reproduction

Configuration

config:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
  processors:
    batch:
    memory_limiter:
      ballast_size_mib: 70
      check_interval: 5s
      limit_mib: 170
      spike_limit_mib: 20
  exporters:
    jaeger:
      endpoint: jaeger-collector:14250
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [memory_limiter, batch]
        exporters: [jaeger]

Error

➜  capa-viewer git:(ops/tracing) ✗ (☸ |kind-kind:operations) kubectl get pod               
NAME                                                        READY   STATUS             RESTARTS   AGE
opentelemetry-collector-agent-8kzwp                         1/2     CrashLoopBackOff   6          9m27s

➜  capa-viewer git:(ops/tracing) ✗ (☸ |kind-kind:operations) kubectl logs opentelemetry-collector-agent-8kzwp opentelemetry-collector 
{"level":"info","ts":1608967307.530805,"caller":"service/service.go:396","msg":"Starting OpenTelemetry Collector...","Version":"latest","GitHash":"a85c7a2d","NumCPU":4}
{"level":"info","ts":1608967307.5512323,"caller":"service/service.go:577","msg":"Using memory ballast","MiBs":204}
{"level":"info","ts":1608967307.551373,"caller":"service/service.go:240","msg":"Setting up own telemetry..."}
{"level":"info","ts":1608967307.6330154,"caller":"service/telemetry.go:108","msg":"Serving Prometheus metrics","address":"0.0.0.0:8888","legacy_metrics":false,"new_metrics":true,"level":0,"service.instance.id":"0cc8ea06-3312-48d1-9328-677ea81065c9"}
{"level":"info","ts":1608967307.635118,"caller":"service/service.go:277","msg":"Loading configuration..."}
Error: cannot load configuration: pipeline "traces" references processor batch which does not exist
2020/12/26 07:21:47 application run finished with error: cannot load configuration: pipeline "traces" references processor batch which does not exist
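
A likely explanation (hedged, but consistent with the rendered configmap shown earlier on this page, which uses batch: {}): a bare batch: key is parsed as null and then dropped during the chart's config merge, so the pipeline ends up referencing a processor that no longer exists. Writing it as an empty map avoids that:

config:
  processors:
    batch: {}
    memory_limiter:
      ballast_size_mib: 70
      check_interval: 5s
      limit_mib: 170
      spike_limit_mib: 20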

Logs support using FluentBit

I have researched the possibility of using a common Fluent Bit Helm Chart for log collection on K8s (as a DaemonSet), passing the logs through the OpenTelemetry Collector and getting K8s metadata tagged.

The details are available in this document: https://docs.google.com/document/d/1QlFbXz0eQUaKXK1WrnEs3VqRkcPD2RAdXJdJMjnOSd8/edit#

There are several issues that need to be solved first, before metadata tagging will be available: open-telemetry/opentelemetry-collector#1884, open-telemetry/opentelemetry-collector-contrib#1146 and open-telemetry/opentelemetry-collector#1652

Also, fields such as pod_uid need to be separately converted into semantic-conventions-compatible ones, like k8s.pod.uid (it's not possible to use such a capture group name, but another processor can change the attribute name).

Having said all that, it's plausible to set up an optional pipeline for log collection as part of the Helm Chart, either taking the Fluent Bit Helm Chart as a dependency or via a similar approach.

Fluent Bit config for retrieving logs using the CRI log path format (as suggested by @dashpole) and exporting to OpenTelemetry Collector:

    [INPUT]
        Name tail
        Path /var/log/pods/*/*/*.log
        Parser docker
        Tag kube.*
        Mem_Buf_Limit 5MB
        Skip_Long_Lines On

    [OUTPUT]
        Name forward
        Match kube.*
        Host <OPENTELEMETRY_COLLECTOR_HOST>
        Port <OPENTELEMETRY_COLLECTOR_FLUENT_FORWARDER_PORT>

The attribute extraction from the Fluent Bit tag can be done using the following processor:

    processors:
      attributes:
        actions:
          - key: fluent.tag
            action: extract      
            pattern: ^kube\.var\.log\.pods\.(?P<namespace_name>[^_]+)_(?P<pod_name>[a-z0-9](?:[-a-z0-9]*[a-z0-9])?(?:[a-z0-9](?:[-a-z0-9]*[a-z0-9])?)*)_(?P<pod_uid>[-a-z0-9]+).(?P<container_name>.+)\.(?P<restarts>[0-9]+)\.log$

Add `--otelcol-image` cli flag value to Operator Helm Chart

The --otelcol-image cli flag should be added to the Operator helm chart so users can have a way to set the default Collector image without setting it in the Collector configuration. The current default image will remain the same, and will only be changed if a user explicitly configures it otherwise.
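
A sketch of what exposing this through the operator chart's values might look like; the manager.extraArgs key and its handling are assumptions about the chart, not existing behaviour:

manager:
  extraArgs:                                                   # hypothetical values key
    - --otelcol-image=otel/opentelemetry-collector-contrib:0.43.0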
