jirwin / burrow_exporter
Prometheus exporter for burrow
License: Apache License 2.0
Hi
We have a situation where a Kafka consumer was stopped for an upgrade, and we expected the lag to increase while the upgrade was taking place. However, the metrics reported in Prometheus didn't show the lag increasing. After looking at the Burrow API and burrow_exporter, it seems that burrow_exporter reports the lag at the last offset commit (end.lag) instead of the current lag (current_lag) of the partition.
$ curl -s http://burrow.service.docker:8000/v3/kafka/local/consumer/GROUP/lag | jq '.'
{
  "error": false,
  "message": "consumer status returned",
  "status": {
    "cluster": "local",
    "group": "GROUP",
    "status": "ERR",
    "complete": 1,
    "partitions": [
      {
        "topic": "short",
        "partition": 0,
        "owner": "",
        "status": "STOP",
        "start": {
          "offset": 12490020,
          "timestamp": 1547163289948,
          "lag": 0
        },
        "end": {
          "offset": 12490085,
          "timestamp": 1547163309431,
          "lag": 0
        },
        "current_lag": 70131,
        "complete": 1
      },
      ...
The above is from the Burrow v3 API and shows end.lag as zero while current_lag is positive.
I'm relatively new to Kafka and Burrow, so this may be by design, but it seems to me that the current lag would be more useful than the lag at the last commit.
Thanks
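Under this reading of the v3 payload, a fix would be to prefer current_lag when it is available. A minimal sketch of the idea, assuming hypothetical struct and function names (these are not the exporter's actual types; field tags follow the JSON shown above):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Offset mirrors the "start"/"end" objects in the v3 lag response.
type Offset struct {
	Offset    int64 `json:"offset"`
	Timestamp int64 `json:"timestamp"`
	Lag       int64 `json:"lag"`
}

// Partition mirrors one entry of "partitions", including current_lag.
type Partition struct {
	Topic      string `json:"topic"`
	Partition  int32  `json:"partition"`
	Status     string `json:"status"`
	End        Offset `json:"end"`
	CurrentLag int64  `json:"current_lag"`
}

// reportedLag prefers the broker-derived current_lag over the lag recorded
// at the last offset commit, so a stopped consumer still shows growing lag.
func reportedLag(p Partition) int64 {
	if p.CurrentLag > 0 {
		return p.CurrentLag
	}
	return p.End.Lag
}

func main() {
	raw := `{"topic":"short","partition":0,"status":"STOP",
	         "end":{"offset":12490085,"timestamp":1547163309431,"lag":0},
	         "current_lag":70131}`
	var p Partition
	if err := json.Unmarshal([]byte(raw), &p); err != nil {
		panic(err)
	}
	fmt.Println(reportedLag(p)) // prints 70131
}
```

With the sample payload above, this would report 70131 for the stopped partition instead of the committed-offset lag of 0.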
Hi,
it would be nice to have a Docker build for this package. Something like
docker run -p 80:3000 -e BURROW_HOME="http://{burrow_host}/v2/kafka" -e PROMETHEUS_ENDPOINT="/metrics" -d burrow-exporter
At least a README.md explaining how to get started would help.
Any chance you could have a look at this? This is currently the only project converting Burrow metrics to Prometheus.
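A sketch of what such a build could look like, as a multi-stage Dockerfile; the Go base image, paths, and exposed port are assumptions, not a tested recipe for this repository:

```dockerfile
# Build stage: compile the exporter from source (Go version is an assumption).
FROM golang:1.12 AS build
WORKDIR /go/src/github.com/jirwin/burrow_exporter
COPY . .
RUN go build -o /burrow-exporter .

# Runtime stage: small image with just the binary.
FROM debian:stretch-slim
COPY --from=build /burrow-exporter /burrow-exporter
EXPOSE 8080
ENTRYPOINT ["/burrow-exporter"]
```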
Hi,
I was testing this project on our cluster, but we are facing an issue with the current version.
Our cluster has more than 2k topics and even more consumer groups. Apparently this exporter launches a goroutine per cluster, then per topic, and then per consumer group to scrape Burrow's REST API in parallel. The problem is that too many simultaneous calls to the REST API were made, causing the exporter to crash with "unable to open socket: too many open files".
For now we simply disabled all the goroutine calls (exporter.go, lines 139, 148, and 191) and everything works fine; the exporter is able to scrape the whole API in less than 10 seconds.
Hi,
I have a working Grafana with a Prometheus server (on one node). On a different node I am running Burrow and burrow_exporter (as two Docker services). For burrow_exporter I have the configuration below. In the burrow_exporter logs I don't see any errors (docker logs -f shows "Scraping burrow..." and "Finished scraping burrow..."), but I don't see any entries in my Prometheus server with the substring "burrow", so clearly the scraped data is not getting across.
If you can give me a way to get data from burrow_exporter into an external Prometheus server at an arbitrary IP, that would be a great help.
services:
  burrow:
    build: .
    environment:
      BURROW_ADDR: http://172.31.18.137:8000
      METRICS_ADDR: 172.31.18.137:9090
      INTERVAL: 5
      API_VERSION: 3
    volumes:
      - ../burrow-master/docker-config:/etc/burrow/
      - ${PWD}/tmp:/var/tmp/burrow
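One thing worth noting: burrow_exporter does not push data anywhere; Prometheus pulls from the exporter's /metrics endpoint. So the external Prometheus server needs a scrape job pointing at the address in METRICS_ADDR. A minimal sketch, reusing the IP and port from the compose file above (assumed reachable from the Prometheus node):

```yaml
scrape_configs:
  - job_name: 'burrow'
    static_configs:
      - targets: ['172.31.18.137:9090']  # METRICS_ADDR from the compose file
```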
Hi,
I'm trying to identify how a consumer group's status maps to the exported metric. I receive a number for the status of a consumer group, and I want to know what that number means relative to the list of status strings in Burrow.
The numbers I have so far are assumptions based on tests I did; I'm not completely sure about them. I looked through the documentation for Burrow and for this project and couldn't find anything.
Does this information exist? Can someone here provide it?
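The exporter presumably converts Burrow's status strings to a number before exporting. The shape of such a mapping might look like the sketch below, but the values here are assumptions for illustration only; the authoritative mapping is whatever the exporter's source defines and should be verified there:

```go
package main

import "fmt"

// statusNumber is an ASSUMED mapping from Burrow status strings to metric
// values, in the order Burrow lists them. Verify the real values against
// the exporter's source before relying on them in alerts.
var statusNumber = map[string]float64{
	"NOTFOUND": 1,
	"OK":       2,
	"WARN":     3,
	"ERR":      4,
	"STOP":     5,
	"STALL":    6,
}

func main() {
	fmt.Println(statusNumber["ERR"]) // prints 4 under this assumed mapping
}
```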
Hi,
I am using the latest code built as a Docker image
docker run --name burrow-exporter -d -p 8090:8080 --link burrow -e BURROW_ADDR="http://burrow:8000/" -e METRICS_ADDR="0.0.0.0:8080" -e API_VERSION="3" -e INTERVAL="30" jirwin/burrow_exporter
The only metric showing up at the /metrics endpoint is kafka_burrow_topic_partition_offset. Why are the other metrics absent, such as the lag?
Many thanks
It would be nice to have a metric that indicates whether burrow_exporter has been able to scrape Burrow and whether the health check succeeded.
I have developed Helm charts which use your Docker image for the Burrow exporter.
Helm chart reference: https://github.com/Yolean/kubernetes-kafka/blob/master/linkedin-burrow/burrow.yml
However, when things are not in good shape with the Kafka cluster, things go wrong with the Burrow exporter as well. Following is the log:
time="2019-07-24T02:16:09Z" level=error msg="error listing clusters. Continuing." err="Get http://localhost:8000/v3/kafka: dial tcp 127.0.0.1:8000: connect: connection refused"
time="2019-07-24T02:16:39Z" level=info msg="Scraping burrow..." timestamp=1563934599815846635
time="2019-07-24T02:16:39Z" level=error msg="error making request" endpoint="http://localhost:8000/v3/kafka" err="Get http://localhost:8000/v3/kafka: dial tcp 127.0.0.1:8000: connect: connection refused"
time="2019-07-24T02:16:39Z" level=error msg="error retrieving cluster details" err="Get http://localhost:8000/v3/kafka: dial tcp 127.0.0.1:8000: connect: connection refused"
time="2019-07-24T02:16:39Z" level=error msg="error listing clusters. Continuing." err="Get http://localhost:8000/v3/kafka: dial tcp 127.0.0.1:8000: connect: connection refused"
We do not have liveness and readiness probes defined for the Burrow exporter; how do we check it?
My concern is that an issue with the Kafka cluster should not put burrow-exporter into a CrashLoopBackOff state.
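One option, assuming the exporter serves /metrics on port 8080 (as METRICS_ADDR suggests), is a plain HTTP probe against that endpoint. Note the caveat: this only verifies that the exporter process is serving, not that Burrow or Kafka is reachable, which is arguably what you want here, since a Kafka outage should not restart the exporter. A sketch:

```yaml
livenessProbe:
  httpGet:
    path: /metrics
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /metrics
    port: 8080
  periodSeconds: 10
```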
Has anyone tried running the Docker image in Kubernetes and had issues passing either command-line arguments or environment variables to the Go binary? It seems to drop a -
in my args, although it works in Docker. I have tried everything I can think of; I have Burrow running in the same pod just fine, passing flags. See the shortened config below:
image: 715666668144.dkr.ecr.us-east-1.amazonaws.com/burrow_exporter:latest
command:
  - ./burrow-exporter
env:
  - name: BURROW_ADDR
    value: http://localhost:8000
  - name: METRICS_ADDR
    value: 0.0.0.0:8080
ports:
  - name: web
    containerPort: 8080
Or as the following, where the -- turns into -, visible in the container logs:
image: 715666668144.dkr.ecr.us-east-1.amazonaws.com/burrow_exporter:latest
command:
  - ./burrow-exporter
args:
  - --burrow-addr http://localhost:8000
ports:
  - name: web
    containerPort: 8080
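A likely cause: in Kubernetes, each element of args is passed to the binary as a single argv entry, so `--burrow-addr http://localhost:8000` arrives as one argument containing a space, which the flag parser cannot interpret. Splitting the flag and its value into separate list items, or joining them with `=`, usually resolves this:

```yaml
args:
  - --burrow-addr=http://localhost:8000
# or, equivalently, as two separate argv entries:
#  - --burrow-addr
#  - http://localhost:8000
```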
Hi,
I'm running burrow and burrow_exporter, and seeing this a lot in the logs:
time="2019-03-21T17:15:12Z" level=error msg="error retrieving consumer group topic details" cluster=staging err="Get http://localhost:31363/v3/kafka/staging/topic/topic_name dial tcp 127.0.0.1:31363: connect: cannot assign requested address" topic=topic_name
Any idea what could be happening?
Hello,
Is there a way to use this tool without Docker, i.e. to compile it into a standalone executable?
This would be very useful, as we do not have Docker on our Linux machines.
Thanks for your help!
Julien
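Since this is a plain Go program, a standard `go build` should produce a static binary. A sketch, assuming a working Go toolchain; the flag names are inferred from the environment variables used elsewhere in this tracker (BURROW_ADDR, METRICS_ADDR, API_VERSION), so check `--help` for the real ones:

```shell
git clone https://github.com/jirwin/burrow_exporter
cd burrow_exporter
go build -o burrow-exporter .

# Run the resulting standalone binary (flag names are an assumption):
./burrow-exporter --burrow-addr http://localhost:8000 \
                  --metrics-addr 0.0.0.0:8080 \
                  --api-version 3
```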
This may be a gap in my usage knowledge, but I am only getting Go stats from /metrics.
Is there a sub-URL that I am missing?
Burrow is working, and I am not seeing any errors connecting via Docker.
Burrow output:
{
  "error": false,
  "message": "cluster list returned",
  "clusters": ["local"],
  "request": {
    "url": "/v3/kafka",
    "host": "70fb55e70e02"
  }
}
/metrics output:
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.0003077
go_gc_duration_seconds{quantile="0.25"} 0.0003077
go_gc_duration_seconds{quantile="0.5"} 0.0003139
go_gc_duration_seconds{quantile="0.75"} 0.0003139
go_gc_duration_seconds{quantile="1"} 0.0003139
go_gc_duration_seconds_sum 0.0006216
go_gc_duration_seconds_count 2
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 15
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 437784
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 5.018448e+06
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.444015e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 15182
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 9.598504609245178e-07
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 2.371584e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 437784
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 6.4847872e+07
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.572864e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 2168
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 0
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 6.6420736e+07
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 1.5433399375312576e+09
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 17350
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 6912
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 30248
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 49152
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 4.194304e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.294409e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 688128
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 688128
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 7.2284408e+07
# HELP go_threads Number of OS threads created
# TYPE go_threads gauge
go_threads 13
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.31
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1.048576e+06
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 7
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 1.1558912e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.54333909606e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 1.15638272e+08
I'd like to export the status of each partition too.
We could always write some logic on the Prometheus end, but Burrow already does this well.
https://github.com/linkedin/Burrow/wiki/http-request-consumer-group-status
These are the valid status strings: NOTFOUND, OK, WARN, ERR, STOP, STALL
Edit:
We could model them as separate time series:
kafka_burrow_partition_state{cluster="MY_CLUSTER",group="MY_GROUP",partition="13",topic="MY_TOPIC",state="OK"} 1
kafka_burrow_partition_state{cluster="MY_CLUSTER",group="MY_GROUP",partition="13",topic="MY_TOPIC",state="STOP"} 1
something like https://www.robustperception.io/exposing-the-software-version-to-prometheus/, which I hope to visualize in Grafana.
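The proposal above can be sketched as a plain text-format renderer: the partition's current state gets value 1 and every other known state gets 0, so a PromQL selector like kafka_burrow_partition_state{state="STOP"} works directly. The function and metric names follow the issue text, but this is an illustration, not the exporter's current output:

```go
package main

import "fmt"

// The valid Burrow status strings, per the wiki page linked above.
var states = []string{"NOTFOUND", "OK", "WARN", "ERR", "STOP", "STALL"}

// renderPartitionState emits one exposition-format line per known state,
// with 1 for the partition's current state and 0 for all others.
func renderPartitionState(cluster, group, topic string, partition int, current string) []string {
	var lines []string
	for _, s := range states {
		v := 0
		if s == current {
			v = 1
		}
		lines = append(lines, fmt.Sprintf(
			`kafka_burrow_partition_state{cluster=%q,group=%q,partition="%d",state=%q,topic=%q} %d`,
			cluster, group, partition, s, topic, v))
	}
	return lines
}

func main() {
	for _, l := range renderPartitionState("MY_CLUSTER", "MY_GROUP", "MY_TOPIC", 13, "STOP") {
		fmt.Println(l)
	}
}
```

Emitting explicit zeroes (rather than only the active state) avoids stale series lingering in Prometheus when a partition transitions between states.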
The Burrow API has been upgraded to v3, while the client.go module still references v2 endpoints.
New API
The tag is at 0.0.6, but the exporter still claims it's 0.0.4: https://github.com/jirwin/burrow_exporter/blob/master/burrow-exporter.go#L17
It states
"The status of a partition as reported by burrow."
but should likely say
"The status of a consumer group as reported by burrow."
Source: https://github.com/jirwin/burrow_exporter/blob/master/burrow_exporter/metrics.go#L55
Prometheus support was added to Burrow itself in linkedin/Burrow#628 in 2020.
Is this project still relevant? May I suggest that you update your documentation, as found here.