Comments (24)
I had reproduced the error locally yesterday (or at least something that looked the same), but had to switch focus before I could find the root cause. Now I can't reproduce it :(
One thing I did notice was in the collector logs there were errors about not being able to connect to kind-control-plane. Perhaps the e2e workflow should capture the pod logs before tearing down, to make debugging easier.
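For illustration, a minimal sketch of such a log-capture hook, assuming the test harness has a client-go clientset available (the DumpPodLogs helper name is hypothetical, not existing repo API):

```go
package k8stest

import (
	"context"
	"fmt"
	"io"
	"os"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// DumpPodLogs (hypothetical helper) prints the logs of every pod in the
// namespace, so a test could call it right before tearing the cluster down.
func DumpPodLogs(ctx context.Context, clientset kubernetes.Interface, namespace string) {
	pods, err := clientset.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{})
	if err != nil {
		fmt.Fprintf(os.Stderr, "listing pods in %q: %v\n", namespace, err)
		return
	}
	for _, pod := range pods.Items {
		req := clientset.CoreV1().Pods(namespace).GetLogs(pod.Name, &corev1.PodLogOptions{})
		stream, err := req.Stream(ctx)
		if err != nil {
			fmt.Fprintf(os.Stderr, "streaming logs for %s: %v\n", pod.Name, err)
			continue
		}
		fmt.Printf("--- logs for pod %s ---\n", pod.Name)
		_, _ = io.Copy(os.Stdout, stream)
		stream.Close()
	}
}
```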
I also could not reproduce the same error as the GitHub Action, it's really weird. But your advice to capture the pod logs (both collector and telemetrygen) in the workflow to help debugging is a good one. @axw
Not sure if there is another way to get access to the Pods' logs, but I tried something dirty to capture them: #33538.
Let's see if this can provide us some insights here.
@ChrsMark In your latest PR, the hostEndpoint is empty; I think this is the root cause.
https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9498361987/job/26177138997?pr=33538
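If that is the case, a cheap guard would make the failure obvious at its source instead of surfacing later as a connection error. A minimal sketch, assuming the test calls the k8stest.HostEndpoint helper and uses testify:

```go
package e2etest

import (
	"testing"

	"github.com/open-telemetry/opentelemetry-collector-contrib/internal/k8stest"
	"github.com/stretchr/testify/require"
)

func TestE2E(t *testing.T) {
	// Fail fast on an empty endpoint instead of letting it surface later as
	// a cryptic "connection refused" against localhost.
	hostEndpoint := k8stest.HostEndpoint(t)
	require.NotEmpty(t, hostEndpoint, "host endpoint could not be determined")
	// ... the rest of the test templates the manifests with hostEndpoint ...
}
```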
Sounds possible @fatsheep9146, I will try to upgrade docker on my machine to 26.x.x as well and see if I can reproduce it.
Update: I was able to reproduce this locally with docker 26.1.4 (Ubuntu machine).
Collector Pod logs:
2024-06-13T12:42:55.052Z warn zapgrpc/zapgrpc.go:193 [core] [Channel #2 SubChannel #8]grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:4317", ServerName: "localhost:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:4317: connect: connection refused" {"grpc_log": true}
2024-06-13T12:42:55.052Z warn zapgrpc/zapgrpc.go:193 [core] [Channel #2 SubChannel #8]grpc: addrConn.createTransport failed to connect to {Addr: "[::1]:4317", ServerName: "localhost:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp [::1]:4317: connect: connection refused" {"grpc_log": true}
2024-06-13T12:42:55.316Z warn zapgrpc/zapgrpc.go:193 [core] [Channel #1 SubChannel #9]grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:4317", ServerName: "localhost:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:4317: connect: connection refused" {"grpc_log": true}
2024-06-13T12:42:55.316Z warn zapgrpc/zapgrpc.go:193 [core] [Channel #1 SubChannel #9]grpc: addrConn.createTransport failed to connect to {Addr: "[::1]:4317", ServerName: "localhost:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp [::1]:4317: connect: connection refused" {"grpc_log": true}
2024-06-13T12:42:57.265Z warn zapgrpc/zapgrpc.go:193 [core] [Channel #4 SubChannel #11]grpc: addrConn.createTransport failed to connect to {Addr: "127.0.0.1:4317", ServerName: "localhost:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp 127.0.0.1:4317: connect: connection refused" {"grpc_log": true}
2024-06-13T12:42:57.265Z warn zapgrpc/zapgrpc.go:193 [core] [Channel #4 SubChannel #11]grpc: addrConn.createTransport failed to connect to {Addr: "[::1]:4317", ServerName: "localhost:4317", }. Err: connection error: desc = "transport: Error while dialing: dial tcp [::1]:4317: connect: connection refused" {"grpc_log": true}
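Note how the dials target 127.0.0.1 and [::1] with ServerName localhost:4317, which would be consistent with an empty hostEndpoint being templated into the exporter endpoint. A hypothetical illustration (not the actual test code), assuming the manifests render something like "{{ .HostEndpoint }}:4317":

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

func main() {
	// With an empty HostEndpoint the rendered endpoint is ":4317", which
	// gRPC treats as localhost -- matching the dials in the logs above.
	tmpl := template.Must(template.New("ep").Parse("{{ .HostEndpoint }}:4317"))
	var buf bytes.Buffer
	_ = tmpl.Execute(&buf, map[string]string{"HostEndpoint": ""})
	fmt.Printf("rendered endpoint: %q\n", buf.String()) // ":4317"
}
```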
@ChrsMark I'm trying to update the docker SDK version to see if it can fix the problem.
@fatsheep9146 thanks! FYI, while debugging this I spotted a call failing with context deadline exceeded, but the weird thing is that this error is for some reason "muted".
Hopefully the lib upgrade can solve this.
I had a successful run at #33548. I'm going to enable the rest of the tests and check again.
@ChrsMark Yes, I found that updating the docker SDK library is blocked for some reasons:
#32614
#31989
So I am also trying another way to get the right host endpoint: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9501492668/job/26187569925?pr=33542
I think we can try both ways and get more opinions from others.
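If the SDK pin can't move, one possible mitigation (whether it applies here depends on how the helper builds its client) is to enable API version negotiation, which lets an older SDK downgrade its requested API version to what a newer daemon supports. A minimal sketch:

```go
package main

import (
	"context"
	"fmt"

	"github.com/docker/docker/client"
)

func main() {
	// WithAPIVersionNegotiation makes the client ping the daemon and fall
	// back to the daemon's API version instead of failing on a mismatch.
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ping, err := cli.Ping(context.Background())
	if err != nil {
		panic(err)
	}
	fmt.Println("negotiated API version:", ping.APIVersion)
}
```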
I hit an additional error in the k8scluster receiver. It seems that some image names have also changed: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9501435003/job/26187307997?pr=33548#step:11:35
Potential fix: c87a639
I think this may be due to the newer version of kind.
I have been unable to reproduce the issues locally, and reverting #33415 did not help (according to the CI jobs on main, that was the first commit where things started to flake).
Looking at the workflow, all versions appear to be pinned, so I don't think we suddenly started using some new action, kind version, etc.
Pinging code owners for internal/k8stest: @crobert-1. See Adding Labels via Comments if you do not have permissions to add labels yourself.
@jinja2 @fatsheep9146 any guesses?
Pinging code owners for receiver/k8sobjects: @dmitryax @hvaghani221 @TylerHelmuth. See Adding Labels via Comments if you do not have permissions to add labels yourself.
Pinging code owners for processor/k8sattributes: @dmitryax @rmfitzpatrick @fatsheep9146 @TylerHelmuth. See Adding Labels via Comments if you do not have permissions to add labels yourself.
Pinging code owners for receiver/k8scluster: @dmitryax @TylerHelmuth @povilasv. See Adding Labels via Comments if you do not have permissions to add labels yourself.
Pinging code owners for receiver/kubeletstats: @dmitryax @TylerHelmuth. See Adding Labels via Comments if you do not have permissions to add labels yourself.
Got some interesting "connection refused" errors: https://github.com/open-telemetry/opentelemetry-collector-contrib/actions/runs/9497224255/job/26173693278?pr=33538#step:11:225
2024-06-13T09:44:56.953Z info exporterhelper/retry_sender.go:118 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp", "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:4317: connect: connection refused\"", "interval": "7.546970563s"}
2024-06-13T09:44:57.064Z info exporterhelper/retry_sender.go:118 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "logs", "name": "otlp", "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:4317: connect: connection refused\"", "interval": "7.612004411s"}
2024-06-13T09:44:57.486Z info exporterhelper/retry_sender.go:118 Exporting failed. Will retry the request after interval. {"kind": "exporter", "data_type": "traces", "name": "otlp", "error": "rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 127.0.0.1:4317: connect: connection refused\"", "interval": "6.403460654s"}
@ChrsMark It seems that the logic for getting the hostEndpoint is the root cause, and this logic is different between Mac and Linux.
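For context, a rough sketch of that platform split, assuming the helper follows the common kind pattern (on macOS Docker runs in a VM, so the host is reachable as host.docker.internal; on Linux the helper asks the Docker daemon for the gateway IP of the "kind" network):

```go
package k8stest

import (
	"context"
	"errors"
	"runtime"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func hostEndpoint() (string, error) {
	if runtime.GOOS == "darwin" {
		// Docker Desktop: containers reach the host via this special name.
		return "host.docker.internal", nil
	}
	cli, err := client.NewClientWithOpts(client.FromEnv)
	if err != nil {
		return "", err
	}
	defer cli.Close()
	network, err := cli.NetworkInspect(context.Background(), "kind", types.NetworkInspectOptions{})
	if err != nil {
		return "", err
	}
	for _, cfg := range network.IPAM.Config {
		if cfg.Gateway != "" {
			return cfg.Gateway, nil
		}
	}
	// The failure mode seen above: nothing errors, but the endpoint is empty.
	return "", errors.New("no gateway found for the kind network")
}
```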
I suspect the reason is https://github.com/actions/runner-images/pull/10039/files: the ubuntu-latest image we use in GitHub Actions was updated with a new version of docker.
@fatsheep9146 e2e tests passed at #33548. I'm opening that one for review since it offers a fix anyway. I'll be out tomorrow (Friday), so feel free to pick the gateway check and proceed with yours if people find that approach more suitable. I'm fine either way as long as we solve the issue :).
Resolved by #33548
Thanks for addressing and fixing so quickly @ChrsMark and @fatsheep9146!