Comments (17)
A reproduction using the latest contrib image would be very helpful. If you can share a collector configuration that demonstrates the issue, that would also be helpful. Otherwise, I'll still try and reproduce it myself.
from opentelemetry-operations-go.
> Otherwise, I'll still try and reproduce it myself.
Sounds good, I'll get a cluster deployed and use the contrib collector. I will also report back with my exact config. Thanks!
from opentelemetry-operations-go.
@dashpole thank you for the fast response. I was able to reproduce the issue with a contrib build (from main branch, commit 2816252149).
I deployed the following to a new GKE cluster.
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: otelcol
  namespace: default
data:
  config.yaml: |
    receivers:
      k8s_cluster:
        allocatable_types_to_report:
          - cpu
          - memory
          - ephemeral-storage
          - storage
        auth_type: serviceAccount
        collection_interval: 60s
        distribution: kubernetes
        node_conditions_to_report:
          - Ready
          - DiskPressure
          - MemoryPressure
          - PIDPressure
          - NetworkUnavailable
    processors:
      batch:
      resource/clustername:
        attributes:
          - action: insert
            key: k8s.cluster.name
            value: minikube
      transform/cleanup:
        error_mode: ignore
        metric_statements:
          - context: datapoint
            statements:
              - delete_key(resource.attributes, "k8s.cluster.name") where true
              - delete_key(resource.attributes, "k8s.pod.name") where true
              - delete_key(resource.attributes, "k8s.node.name") where true
              - delete_key(resource.attributes, "k8s.container.name") where true
              - delete_key(resource.attributes, "k8s.namespace.name") where true
              - delete_key(resource.attributes, "k8s.node.uid") where true
              - delete_key(resource.attributes, "opencensus.resourcetype") where true
      transform/hostname:
        error_mode: ignore
        metric_statements:
          - context: datapoint
            statements:
              - set(resource.attributes["host.name"], "otel-cluster-agent") where true
    exporters:
      googlecloud:
      logging:
    service:
      pipelines:
        metrics:
          receivers:
            - k8s_cluster
          processors:
            - resource/clustername
            # - transform/cleanup
            # - transform/hostname
            - batch
          exporters:
            - googlecloud
            - logging
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app.kubernetes.io/name: otelcol
  name: otelcolcontrib
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otelcolcontrib
  labels:
    app.kubernetes.io/name: otelcol
  namespace: default
rules:
  - apiGroups:
      - ""
    resources:
      - events
      - namespaces
      - namespaces/status
      - nodes
      - nodes/spec
      - nodes/stats
      - nodes/proxy
      - pods
      - pods/status
      - replicationcontrollers
      - replicationcontrollers/status
      - resourcequotas
      - services
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - apps
    resources:
      - daemonsets
      - deployments
      - replicasets
      - statefulsets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - daemonsets
      - deployments
      - replicasets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - batch
    resources:
      - jobs
      - cronjobs
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - autoscaling
    resources:
      - horizontalpodautoscalers
    verbs:
      - get
      - list
      - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otelcolcontrib
  labels:
    app.kubernetes.io/name: otelcol
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otelcolcontrib
subjects:
  - kind: ServiceAccount
    name: otelcolcontrib
    namespace: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-cluster-agent
  labels:
    app.kubernetes.io/name: otelcol
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: otelcol
  template:
    metadata:
      labels:
        app.kubernetes.io/name: otelcol
    spec:
      serviceAccount: otelcolcontrib
      containers:
        - name: opentelemetry-container
          image: bmedora/otelcolcontrib:2816252149.0
          imagePullPolicy: IfNotPresent
          securityContext:
            readOnlyRootFilesystem: true
          resources:
            requests:
              memory: 200Mi
              cpu: 100m
            limits:
              cpu: 100m
              memory: 200Mi
          env:
            - name: AGENT_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: KUBE_NODE_NAME
              valueFrom:
                fieldRef:
                  fieldPath: spec.nodeName
          volumeMounts:
            - mountPath: /etc/otel
              name: config
      volumes:
        - name: config
          configMap:
            name: otelcol
Once deployed, you can uncomment the processors in the pipeline to observe the workaround:
processors:
  - resource/clustername
  # - transform/cleanup
  # - transform/hostname
  - batch
After applying again, restart the deployment: kubectl rollout restart deploy otel-cluster-agent
Because I was running on GKE, I could have used resource detection. I left the resource processor to set k8s.cluster.name to minikube, as that is where I observed the issue initially. On GKE, we get automatic authentication, as you know.
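For reference, the resource detection mentioned above could look roughly like the following sketch. It assumes the contrib collector's resourcedetection processor with its gcp detector; the timeout value and the pipeline layout are illustrative and were not part of the reproduction.

```yaml
processors:
  batch:
  resourcedetection:
    # On GKE the gcp detector is expected to populate attributes such as
    # cloud.platform, cloud.availability_zone and k8s.cluster.name.
    detectors: [gcp]
    timeout: 10s  # illustrative value

service:
  pipelines:
    metrics:
      receivers:
        - k8s_cluster
      processors:
        - resourcedetection
        - batch
      exporters:
        - googlecloud
        - logging
```

This only helps on Google Cloud; on minikube or other non-GCP clusters the attributes still have to be set by hand.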
from opentelemetry-operations-go.
I think I figured out why it silently fails. I removed the retry_on_failure helper because we aren't using the retry mechanism. However, that is what ultimately logs the error message. Downgrading to v0.83.0 will give you the error message back.
from opentelemetry-operations-go.
I get a bunch of errors:
Field timeSeries[57] had an invalid value: Duplicate TimeSeries encountered. Only one point can be written per TimeSeries per request.\nerror details: name = Unknown desc = total_point_count:101 success_point_count:16 errors:{status:{code:3} point_count:85}"}]}
from opentelemetry-operations-go.
The following made some of the metrics work:
- action: insert
  key: k8s.cluster.name
  value: minikube
- action: insert
  key: cloud.availability_zone
  value: us-east1-b
- action: insert
  key: cloud.platform
  value: gcp_kubernetes_engine
I think the remaining issue is that we need to map the deployment/daemonset/statefulset, etc name to an attribute.
from opentelemetry-operations-go.
@dashpole it sounds like I need cloud.platform and cloud.availability_zone in order to map to k8s monitored resource types?
I would have expected my metrics to be unique with or without the additional two resource attributes. Things work fine if I use host.name and "trick" cloud monitoring / exporter into using generic_node.
Even without host.name or k8s.cluster.name, my project had a single collector sending metrics from a single cluster. Usually the duplicate time series errors show up if we have a uniqueness issue on our end (multiple collectors sending the same metrics).
from opentelemetry-operations-go.
> I would have expected my metrics to be unique with or without the additional two resources. Things work fine if I use host.name and "trick" cloud monitoring / exporter into using generic_node.
I'm actually surprised this worked. I would have expected metrics to still collide, as multiple metrics would have the same host.name... I suspect most metrics still failed to send, but some succeeded. The failures just weren't logged because open-telemetry/opentelemetry-collector-contrib#25900 removed all logging of errors.
> Even without host.name or k8s.cluster.name, my project had a single collector sending metrics from a single cluster. Usually the duplicate time series errors show up if we have a uniqueness issue on our end (multiple collectors sending the same metrics).
One thing to keep in mind is that we don't preserve all resource attributes, since we need to map to Google Cloud Monitoring resources. Any resource attributes we don't use for the monitored resource are discarded, unless you set metric.resource_filters in the exporter config.
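As a rough sketch of that setting (not copied from this thread): the exporter's README documents metric.resource_filters as a list of filters, each of which can match resource attribute keys by prefix. The "k8s." prefix below is only an example choice.

```yaml
exporters:
  googlecloud:
    metric:
      resource_filters:
        # Copy every resource attribute whose key starts with "k8s."
        # (for example k8s.deployment.name) onto the metric as a label
        # instead of discarding it.
        - prefix: "k8s."
```

A narrower prefix keeps label cardinality lower; an empty prefix matches every resource attribute.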
> it sounds like I need cloud.platform and cloud.availability_zone in order to map to k8s monitored resource types?
You can see the full mapping logic here: https://github.com/GoogleCloudPlatform/opentelemetry-operations-go/blob/main/internal/resourcemapping/resourcemapping.go#L65. For k8s_cluster, you need cloud.availability_zone and k8s.cluster.name. For k8s_pod, you additionally need k8s.namespace.name and k8s.pod.name. For k8s_container, you additionally need k8s.container.
One omission to note is that we don't have mappings for k8s_deployment, k8s_daemonset, etc. For example, for deployment metrics, the best mapping would be to k8s_cluster. You would need to use metric.resource_filters to add k8s.deployment.name as a metric attribute to make those metrics work.
Filed #761 for the collector error logging issue.
from opentelemetry-operations-go.
I've also filed GoogleCloudPlatform/opentelemetry-operator-sample#56 to try and document this usage better.
from opentelemetry-operations-go.
Let me know if using metric.resource_filters for k8s.deployment.name (and other k8s.*.name) attributes fixes the remaining issues you are having.
from opentelemetry-operations-go.
> Let me know if using metric.resource_filters for k8s.deployment.name (and other k8s.*.name) attributes fixes the remaining issues you are having.
Our distribution (bindplane-agent) configures the exporter's resource_filters with prefix: "", which matches all resource attributes. We have found this to be necessary for many different receivers whose resource attributes would otherwise be dropped.
I re-ran my test with the contrib collector, with the following config. No luck.
receivers:
  k8s_cluster:
    allocatable_types_to_report:
      - cpu
      - memory
      - ephemeral-storage
      - storage
    auth_type: serviceAccount
    collection_interval: 60s
    distribution: kubernetes
    node_conditions_to_report:
      - Ready
      - DiskPressure
      - MemoryPressure
      - PIDPressure
      - NetworkUnavailable
processors:
  batch:
  resource/clustername:
    attributes:
      - action: insert
        key: k8s.cluster.name
        value: minikube
exporters:
  googlecloud:
    metric:
      resource_filters:
        prefix: "k8s."
  logging:
service:
  pipelines:
    metrics:
      receivers:
        - k8s_cluster
      processors:
        - resource/clustername
        - batch
      exporters:
        - googlecloud
        - logging
If I set host.name instead of k8s.cluster.name, the metrics show up just fine. If there are time series issues, they would be resolved by the resource_filters settings that we normally use, copying deployment name (and other resource attributes) to datapoint attributes / Google labels.
This screenshot shows host.name being turned into node_id like usual, and the datapoints show up. If I switch back to k8s.cluster.name, the datapoints stop appearing.
from opentelemetry-operations-go.
You need to add this as well in the resource processor:
- action: insert
  key: cloud.availability_zone
  value: us-east1-b
- action: insert
  key: cloud.platform
  value: gcp_kubernetes_engine
(the requirement for cloud.platform was removed in recent versions, but could possibly still be needed)
from opentelemetry-operations-go.
With platform and location missing, shouldn't I expect to see the metrics show up for generic node? Or is host.name / node_id a hard requirement?
With resource_filters configured, all resource attributes are copied over to metric labels. Each datapoint for each metric is unique, but missing from Cloud Monitoring.
from opentelemetry-operations-go.
> With platform and location missing, shouldn't I expect to see the metrics show up for generic node? Or is host.name / node_id a hard requirement?
That is what I would expect. host.name / node_id is not a hard requirement (I think). It very well may be a bug. As mentioned above, you will need to downgrade to v0.83.0 to see the error message so we can figure out what is actually happening and fix it.
from opentelemetry-operations-go.
If you update to v0.89.0, the error logging will be fixed.
from opentelemetry-operations-go.
I added a sample that works for either the googlecloud or googlemanagedprometheus exporters: GoogleCloudPlatform/opentelemetry-operator-sample#57. Just make sure you also set cloud.availability_zone, cloud.platform and k8s.cluster.name if you aren't on GKE.
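Putting the suggestions from this thread together, a non-GKE setup might look something like the sketch below. The attribute values (minikube, us-east1-b, gcp_kubernetes_engine) are the ones used earlier in this thread and should be replaced with your own; the resource/gcp processor name is made up for illustration, and the linked sample remains the authoritative reference.

```yaml
processors:
  batch:
  resource/gcp:  # hypothetical name; any resource/<suffix> works
    attributes:
      # Attributes the exporter uses to map to the k8s_cluster
      # monitored resource when they are not detected automatically.
      - action: insert
        key: k8s.cluster.name
        value: minikube
      - action: insert
        key: cloud.availability_zone
        value: us-east1-b
      - action: insert
        key: cloud.platform
        value: gcp_kubernetes_engine

exporters:
  googlecloud:
    metric:
      resource_filters:
        # Keep workload names such as k8s.deployment.name as metric
        # labels so time series from different workloads stay distinct.
        - prefix: "k8s."

service:
  pipelines:
    metrics:
      receivers:
        - k8s_cluster
      processors:
        - resource/gcp
        - batch
      exporters:
        - googlecloud
```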
from opentelemetry-operations-go.
Optimistically closing. Feel free to reopen if you have any more questions.
from opentelemetry-operations-go.