otel-schema's Introduction

otel-schema

Playground to prototype and investigate configuration schema proposals for OpenTelemetry

Schema Languages

jsonschema

CUE

Protobuf

otel-schema's People

Contributors

codeboten, jack-berg, martinkuba, mralias, tsloughter

otel-schema's Issues

How to handle pull based metric exporters like prometheus

Discussed here.

The spec allows pull-based metric exporters to be implemented either as "just another exporter", which needs to be paired with a metric reader, or as metric readers themselves. This flexibility means that there's no obvious way to configure pull-based metric exporters, since what is intuitive for one language ecosystem will not be so for another.

If pull-based exporters are paired with a reader, you'd expect something like:

sdk:
  meter_provider:
    exporters:
      prometheus:
        port: 5555
    metric_readers:
      - name: periodic
        args:
          exporter: prometheus

This is nice because prometheus is configured in the sdk.meter_provider.exporters block with the other exporters. But it's confusing because it's paired with a reader (in this case periodic, for lack of a better option) and it clearly does not read periodically.

But if pull-based exporters are readers, you'd expect something like:

sdk:
  meter_provider:
    metric_readers:
      - name: prometheus
        args:
          port: 5555

This is nice because prometheus doesn't need to be paired with a periodic reader like in the example above. But it may be confusing to see prometheus configuration among the readers instead of in sdk.meter_provider.exporters with the other exporters.

A compromise will have to be made at some level.
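
One possible compromise (purely a sketch, not from the repo's proposals) is a dedicated pull reader type, which keeps prometheus in the exporters block without mislabeling it as periodic:

sdk:
  meter_provider:
    exporters:
      prometheus:
        port: 5555
    metric_readers:
      # Hypothetical "pull" reader: delegates collection to the paired
      # pull-based exporter instead of exporting on a timer.
      - name: pull
        args:
          exporter: prometheus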

Should exporter configuration be nested or at top level?

Related to #10, but specifically for exporters. Exporters could be defined at the top level and referenced by name, or could be expressed as nested arguments to the processors / metric readers which ultimately use them.

Let's talk about this!

Option 1: Exporters live at top level and are referenced by name

Advantages:

  • Built-in reusability of components. I.e. no need to use YAML anchors to avoid repeating yourself.
  • Less nesting
  • Similar to collector configuration

Disadvantages:

  • Exporters are not consistent across signals. Validation is needed to make sure a referenced exporter is compatible with the signal.
  • The OTLP exporter is different across signals: the OTLP metric exporter has options for configuring aggregation temporality and the default histogram aggregation. These options don't apply to spans and logs, so it's a bit confusing / awkward to configure an exporter with these options for a signal that will ignore them.

Example:

exporters:
  otlp/exporterA:
    endpoint: http://localhost:4317
    temporality_preference: delta
  otlp/exporterB:
    endpoint: http://remote-host:4317
    headers:
      api-key: 12345

span_processors:
  - name: batch
    exporter: otlp/exporterA
metric_readers:
  - name: periodic
    exporter: otlp/exporterA
log_processors:
  - name: batch
    exporter: otlp/exporterB
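
To make the first disadvantage concrete, nothing in this flat layout stops a trace-only exporter from being referenced by a metric reader, so a validator would have to reject something like the following (hypothetical, intentionally invalid):

exporters:
  zipkin/exporterC:
    endpoint: http://localhost:9411/api/v2/spans

metric_readers:
  # Invalid by intent: zipkin only exports spans, but the flat layout
  # allows a metric reader to reference it by name.
  - name: periodic
    exporter: zipkin/exporterC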

Option 2: Exporters are nested under the components that use them

Advantages:

  • More aligned with programmatic SDK configuration, where exporters aren't directly configurable. I.e. you don't configure TracerProvider with a SpanExporter, you configure TracerProvider with a BatchSpanProcessor which has a SpanExporter.
  • Enables strict typing of signal-specific exporters. I.e. the schema can express that the zipkin and jaeger exporters can only be associated with spans, and NOT with metrics and logs.

Disadvantages:

  • More nesting
  • Divergent from collector config
  • Need to rely on YAML anchors to avoid repetition, which may be unfamiliar to some people.

Example:

otlp_exporterA_args: &otlpExporterAArgs
  endpoint: https://localhost:4317

span_processors:
  - name: batch
    exporter:
      name: otlp
      args: *otlpExporterAArgs
metric_readers:
  - name: periodic
    exporter:
      name: otlp
      args:
        <<: *otlpExporterAArgs
        temporality_preference: delta
log_processors:
  - name: batch
    exporter:
      name: otlp
      args:
        endpoint: http://remote-host:4317
        headers:
          api-key: 12345
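
Note that the metric reader above extends the anchored mapping with the YAML merge key (<<:); appending keys directly under args: *otlpExporterAArgs is not valid YAML. Merge keys are widely supported but are a YAML 1.1 feature, which only adds to the unfamiliarity disadvantage. A self-contained sketch:

defaults: &args
  endpoint: https://localhost:4317

reader_args:
  # Merge the anchored mapping, then add a reader-specific key.
  <<: *args
  temporality_preference: delta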

Document known limitations in otep

It's important to call out some of the limitations of configuration. A couple of limitations we may want to document:

  1. multiple tracer/meter/logger providers are not supported (it would be really difficult to specify the correct provider in the config and in the code)
  2. custom context propagators are not supported (there's no way to provide custom context extraction code via configuration)

Should config validation ensure configured components are available?

One question that came to mind as I was reviewing #26 is whether configuration validation should be responsible for checking that configured components are available. For example, if I configure the xray propagator but the propagator is not available in my system, will configuration validation tell me this, or will it load the config (assuming it's configured properly) and let the SDK fail?
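
For illustration (a hypothetical config, assuming a top-level propagators list), a file like the following could pass schema validation even when no xray implementation is installed in the running process:

sdk:
  # Schema-valid, but "xray" may not be present at runtime.
  propagators:
    - tracecontext
    - xray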

Include config schema version

We should include some sort of version specification in configuration files to help with parsing and schema evolution.

Example:

version: 0.1
sdk: ...
instrumentation: ...

The idea of a document self-describing the version it adheres to is pretty common. It shows up in Docker Compose files, Kubernetes YAML, and many other places.
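
For comparison, both of those formats lead with a version declaration:

# Docker Compose file
version: "3.8"
---
# Kubernetes manifest
apiVersion: apps/v1
kind: Deployment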

Decide what configuration lives in "root" of schema

Let's try to get consensus on a small piece of the initial target configuration conversation by talking about whether configuration should lean towards being more nested or flat.

Option 1: Everything in root
Put everything in the root and use key prefixes to disambiguate where similar concepts exist (i.e. both traces and logs have processors, so call them span_processors and log_record_processors). Results in flatter configuration that is less organized, because configuration for a signal isn't necessarily grouped together (i.e. I can configure span processors, then metric views, then span limits). It's also not clear where configuration of the SDK starts / stops versus instrumentation.

Example:

resource: ...
limits: ...
propagators: ...
logging:
  level: info
exporters: # Array of named exporters to be referenced in processors / readers
  - ...
  - ...
span_processors:
  - ...
  - ...
span_limits: ...
metric_readers:
  - ...
  - ...
metric_views:
  - ...
  - ...
log_record_processors:
  - ...
  - ...
log_limits: ...
http_client_request_headers: ...
http_client_response_headers: ...
...

Option 2: Nest everything

The opposite of option 1. Nest everything under its respective section. Results in more nesting, but clear organization and boundaries (i.e. all trace configuration will always be grouped together, all metric configuration will be grouped together, etc.).

Example:

sdk:
  resource: ...
  logging:
    level: info
  limits: ...
  exporters: # Array of named exporters to be referenced in processors / readers
    - ...
    - ...
  tracer_provider:
    span_processors:
      - ...
      - ...
    span_limits: ...  
  meter_provider:
    metric_readers:
      - ...
      - ...
    views:
      - ...
      - ...
  logger_provider:
    log_record_processors:
      - ...
      - ...
    log_limits: ...
instrumentation:
  http:
     client:
       request_headers: ...
       response_headers: ...

Option 3: Something in between

Separate SDK configuration from instrumentation, but minimize nesting within those sections. Try to strike a balance between organizing related concepts without excessive nesting.

Example:

sdk:
  resource: ...
  logging:
    level: info
  limits: ...
  exporters: # Array of named exporters to be referenced in processors / readers
    - ...
    - ...
  span_processors:
    - ...
    - ...
  span_limits: ...  
  metric_readers:
    - ...
    - ...
  metric_views:
    - ...
    - ...
  log_record_processors:
    - ...
    - ...
  log_limits: ...
instrumentation:
  http_client_request_headers: ...
  http_client_response_headers: ...

Define versioning

What would be the use case to bump the "patch" number?

I am not used to having a "patch" number when versioning config files. Usually it is only "major" or "major.minor", depending on how strict the parser is.

Originally posted by @pellared in #13 (comment)

Support log appender/handler configuration

Follow-up from #20: users must have a way to hook log appenders/handlers up to the logger provider.

The wiring up of appenders to the SDK seems more like an instrumentation concern than an SDK concern. I suggest we include that under the instrumentation section. I.e. something like:

sdk:
  ...
instrumentation:
  log_appenders:
    - name: log4j
    - name: logback

Originally posted by @jack-berg in #20 (comment)

Inconsistencies in use of key or `name` to declare type

In the current config.yaml, for Trace exporters the key of the exporter also declares the type, e.g.:

zipkin:
  endpoint: http://localhost:9411/api/v2/spans

is a Zipkin exporter.

But for Span Processors, Log Record Processors and Metric Readers the type is based on the name field:

span_processors:     
  - name: batch

It feels like an inconsistency, but maybe there is a reason behind it? My best guess is that the key form is used where the name serves as a reference elsewhere in the config, e.g. zipkin from above is referenced in a Span Processor:

      - name: batch
        args:
          exporter: zipkin

Even if this is the reason I think it looks inconsistent.

Additionally, if name is only really used as a type, I'd argue it should be renamed to something like type.
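
A sketch of what that rename might look like (hypothetical, not in the current config.yaml):

span_processors:
  # "type" declares the component type; "name" stays free for
  # user-chosen identifiers and references.
  - type: batch
    args:
      exporter: zipkin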

Configuration for semantic attributes collection

This is related to #8 (comment). I was thinking about this for a little bit and wanted to propose the following:

One of the key features of the OpenTelemetry spec is the semantic conventions. Semconv comes down to a bunch of attributes for resources, traces, metrics, logs, etc. Some of them already have a note on their configurability, e.g. the general identity attributes, the http headers, database statement sanitization, etc.

For the otel configuration file, I was thinking about a language-agnostic way of potentially(!) making all attributes configurable, similar to what the example config.yml already has for resource:

sdk:
  metric:

  # Copied from the existing example
  resource:
    attributes:
      service:
        name: !!str "unknown_service"
  trace:
    attributes:
      db:
        statement:
          sanitized: true
      enduser: disabled
      http:
        request:
          header:
            - content_type
            - x_forwarded_for
            - ...
        response:
          header:
            - my_custom_header

I am not yet 100% satisfied with the layout, since it is extremely complex if attributes have a long namespace, and it is also not yet consistent. But I wanted to start with this as a proposal and see what everybody else thinks.
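
One alternative that might reduce the nesting for long namespaces (purely a sketch) is to use the full dotted semconv attribute names as keys:

sdk:
  trace:
    attributes:
      # Hypothetical flattened form: one key per semconv attribute.
      db.statement.sanitized: true
      enduser: disabled
      http.request.header: [content_type, x_forwarded_for]
      http.response.header: [my_custom_header]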
