External labels were discussed in the 4/14 Prometheus-OTel-WG SIG meeting.
The Prometheus documentation describes external labels:
    # The labels to add to any time series or alerts when communicating with
    # external systems (federation, remote storage, Alertmanager).
    external_labels:
      [ <labelname>: <labelvalue> ... ]
Saying that labels "are added" gives them no semantic interpretation. In practice, external labels serve at least two different purposes:
- some external labels describe the process being monitored (e.g., datacenter name)
- some external labels describe how the process was monitored (e.g., replica name)
External labels are therefore a mix of descriptive and non-identifying attributes. OpenTelemetry has no formal mechanism for distinguishing kinds of attribute, and it appears increasingly important that we add one. In today's Prometheus/Cortex environment, the backend has to be configured to recognize duplicate streams of information, for example the same series reported by two replicas. I would like OTLP to include a formal way to encode duplicate streams, which means distinguishing identifying and descriptive attributes from those that are non-identifying.
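To make the duplicate-stream problem concrete, here is a minimal sketch of what a backend effectively does when it is told which labels to ignore. The function names and the `replica` and `datacenter` label names are hypothetical, chosen only for illustration; this is not how Cortex or any particular backend implements it.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

Labels = Dict[str, str]


def stream_identity(labels: Labels, non_identifying: Iterable[str]) -> FrozenSet[Tuple[str, str]]:
    """Return the label set with the non-identifying labels removed."""
    ignored = set(non_identifying)
    return frozenset((k, v) for k, v in labels.items() if k not in ignored)


def deduplicate(streams: List[Labels], non_identifying: Iterable[str]) -> List[Labels]:
    """Keep the first stream seen for each identity (an arbitrary policy for this sketch)."""
    ignored = set(non_identifying)
    seen = set()
    kept = []
    for labels in streams:
        key = stream_identity(labels, ignored)
        if key not in seen:
            seen.add(key)
            kept.append(labels)
    return kept


# Two Prometheus replicas scrape the same target; a third stream is a different target.
streams = [
    {"job": "api", "instance": "10.0.0.1:9090", "datacenter": "us-east", "replica": "prom-a"},
    {"job": "api", "instance": "10.0.0.1:9090", "datacenter": "us-east", "replica": "prom-b"},
    {"job": "api", "instance": "10.0.0.2:9090", "datacenter": "us-east", "replica": "prom-a"},
]

# Today the list of ignorable labels is backend configuration, not part of the data.
unique = deduplicate(streams, non_identifying=["replica"])
```

The point of the sketch is that `non_identifying` is passed in from outside. If OTLP carried that distinction with the data, the backend would not need the extra configuration.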
The terminology used here is developed in open-telemetry/opentelemetry-specification#1298, where it seems we have three kinds of attribute: identifying (e.g., "job", "instance"), descriptive (e.g., data center, k8s node), and non-identifying (e.g., replica name).
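As a rough illustration of the three kinds, the sketch below partitions a label set by kind. The `AttributeKind` enum, the `KINDS` table, the label names, and the choice to treat unknown labels as descriptive are all assumptions made for this example; none of them is defined by OpenTelemetry or Prometheus.

```python
from enum import Enum
from typing import Dict


class AttributeKind(Enum):
    IDENTIFYING = "identifying"
    DESCRIPTIVE = "descriptive"
    NON_IDENTIFYING = "non-identifying"


# Hypothetical classification, following the examples given above.
KINDS: Dict[str, AttributeKind] = {
    "job": AttributeKind.IDENTIFYING,
    "instance": AttributeKind.IDENTIFYING,
    "datacenter": AttributeKind.DESCRIPTIVE,
    "k8s_node": AttributeKind.DESCRIPTIVE,
    "replica": AttributeKind.NON_IDENTIFYING,
}


def partition(
    attributes: Dict[str, str],
    kinds: Dict[str, AttributeKind],
    default: AttributeKind = AttributeKind.DESCRIPTIVE,  # assumption of this sketch
) -> Dict[AttributeKind, Dict[str, str]]:
    """Group attributes by kind; unclassified names fall back to `default`."""
    groups: Dict[AttributeKind, Dict[str, str]] = {kind: {} for kind in AttributeKind}
    for name, value in attributes.items():
        groups[kinds.get(name, default)][name] = value
    return groups


groups = partition(
    {
        "job": "api",
        "instance": "10.0.0.1:9090",
        "datacenter": "us-east",
        "replica": "prom-a",
    },
    KINDS,
)
```

Here `groups[AttributeKind.IDENTIFYING]` holds `job` and `instance`, while `replica` lands in the non-identifying group, which is the part a receiver could safely ignore when matching duplicate streams.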
One way to expose this information in OTLP that appears promising to me is through schemas; see open-telemetry/oteps#152.