digitalocean / ceph_exporter
Prometheus exporter that scrapes meta information about a ceph cluster.
License: Apache License 2.0
Hey,
We have issues with ceph_cluster_objects and ceph_osd_pgs: neither is displayed correctly.
First of all, ceph_cluster_objects: if it shows the total number of objects in Ceph, why do our graphs show ceph_misplaced_objects > ceph_cluster_objects?
ceph_osd_pgs also does not appear to give correct values. In our DEV cluster the number of placement groups is always the same, 10560, but the graph shows this number varying.
Moreover, the current value of sum(ceph_osd_pgs) is 30233, and even dividing by the number of replicas (2) does not equal what ceph -s reports, 10560.
ceph_exporter is running in a Docker container and we're using the https://grafana.net/dashboards/917 dashboard.
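On the ceph_osd_pgs question: each placement group is counted once on every OSD that holds a replica of it, so sum(ceph_osd_pgs) approximates the sum of pg_num × replica size over all pools, not the raw PG count from ceph -s. A minimal sketch with a hypothetical pool layout (the pg_num and size values below are assumptions; real ones come from ceph osd pool ls detail):

```go
package main

import "fmt"

// pool holds the PG count and replica size of one pool. The concrete
// numbers used below are hypothetical, chosen only to sum to 10560 PGs.
type pool struct {
	pgNum, size int
}

// pgTotals returns the cluster-wide PG count (what `ceph -s` reports) and
// the sum of per-OSD PG counts (what sum(ceph_osd_pgs) approximates):
// each PG appears once on every OSD holding a replica of it.
func pgTotals(pools []pool) (total, sumAcrossOSDs int) {
	for _, p := range pools {
		total += p.pgNum
		sumAcrossOSDs += p.pgNum * p.size
	}
	return
}

func main() {
	// 10560 PGs split across a size-3 pool and a size-2 pool lands the
	// summed metric near the observed 30233 even though `ceph -s` says 10560.
	total, sum := pgTotals([]pool{{8192, 3}, {2368, 2}})
	fmt.Println(total, sum) // 10560 29312
}
```

A mix of size-2 and size-3 pools would explain why dividing the sum by any single replica factor never lands exactly on 10560.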
Hello there,
I am trying to use the Docker image and have Grafana display the data via the Ceph dashboard here:
https://grafana.com/dashboards/917
However, I believe the Docker image does not include a Prometheus server. Is Grafana supported out of the box with this as a data source?
Or do I need to set up a full Prometheus server to perform the scrapes?
I ask because when going to http://$dockerhostIP:9128/metrics the metrics are clearly available.
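For what it's worth, Grafana itself never scrapes exporters; a Prometheus server sits in between and the dashboard reads from it as a data source. A minimal scrape-config sketch (the target host is a placeholder):

```yaml
# Hypothetical prometheus.yml fragment: scrape the exporter on port 9128,
# then point Grafana at this Prometheus server as its data source.
scrape_configs:
  - job_name: ceph
    static_configs:
      - targets: ['dockerhostIP:9128']
```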
Guys, in my work environment I have multiple Ceph clusters. Because I'm new to Ceph, I didn't modify their cluster names when I installed them. As a result they all have the same cluster name, so I cannot use "cluster" to distinguish them. So far there seems to be no easy way to change a Ceph cluster's name either.
The solution that came to mind is adding "fsid" as a new label. It should be unique. The shortcoming is also obvious: it will be harder to tell which cluster is which, but at least it should work.
What do you think about it?
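For reference, a sketch of how the fsid could be fetched with the same mon-command pattern the exporter already uses. The {"fsid": ...} response shape is an assumption based on ceph fsid -f json; in the real exporter, conn.MonCommand from go-ceph would carry the payload:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fsidCommand builds the JSON payload for the `fsid` mon command, the
// same way the exporter builds its other mon commands.
func fsidCommand() []byte {
	cmd, err := json.Marshal(map[string]interface{}{
		"prefix": "fsid",
		"format": "json",
	})
	if err != nil {
		panic(err)
	}
	return cmd
}

// parseFSID decodes the assumed {"fsid": "..."} response; the value
// could then be attached to every metric as a constant label.
func parseFSID(buf []byte) (string, error) {
	var out struct {
		FSID string `json:"fsid"`
	}
	err := json.Unmarshal(buf, &out)
	return out.FSID, err
}

func main() {
	fmt.Println(string(fsidCommand()))
	id, _ := parseFSID([]byte(`{"fsid":"abc-123"}`)) // hypothetical response
	fmt.Println(id)
}
```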
The latest version is listed as supporting Luminous. Will it work with Ceph v13.2.2 (Mimic)?
On OSX the Ceph packages are not available to build the binary and create a container. We use Ceph in our production systems, but installing golang on the nodes to build the package is something to be avoided.
This could be mitigated by using a CI system (Travis?) to do the builds and publish a public container for the exporter (which is somewhat implied by the digitalocean/ceph_exporter tag used to build the image).
Trying to build and test the ceph_exporter, but make fails:
make
Go version 1.5.3 required but not found in PATH.
About to download and install go1.5.3 to /home/jan/work/code/go/src/github.com/digitalocean/ceph_exporter/.build/go1.5.3
Abort now if you want to manually install it system-wide instead.
mkdir -p .build
# The archive contains a single directory called 'go/'.
curl -L https://golang.org/dl/go1.5.3.linux-amd64.tar.gz | tar -C .build -xzf -
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 87 100 87 0 0 285 0 --:--:-- --:--:-- --:--:-- 286
100 76.4M 100 76.4M 0 0 2218k 0 0:00:35 0:00:35 --:--:-- 2672k
rm -rf /home/jan/work/code/go/src/github.com/digitalocean/ceph_exporter/.build/go1.5.3
mv .build/go /home/jan/work/code/go/src/github.com/digitalocean/ceph_exporter/.build/go1.5.3
GO15VENDOREXPERIMENT=1 GOROOT=/home/jan/work/code/go/src/github.com/digitalocean/ceph_exporter/.build/go1.5.3 /home/jan/work/code/go/src/github.com/digitalocean/ceph_exporter/.build/go1.5.3/bin/go build -o ceph_exporter
# github.com/digitalocean/ceph_exporter/vendor/github.com/ceph/go-ceph/rados
cannot load DWARF output from $WORK/github.com/digitalocean/ceph_exporter/vendor/github.com/ceph/go-ceph/rados/_obj//_cgo_.o: decoding dwarf section info at offset 0x4: unsupported version 0
make: *** [Makefile.COMMON:86: ceph_exporter] Error 2
make 8.05s user 2.03s system 23% cpu 42.795 total
System-wide go version is go-2:1.8.3-1 and is ignored.
ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90): the metric ceph_osd_crush_weight returns the OSD WEIGHT value.
ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777) luminous (stable): the metric ceph_osd_weight returns the OSD REWEIGHT value.
I expect the WEIGHT value to be returned, so this seems to be a bug.
Metrics that carry pool names or daemon IDs as labels persist after the respective entity has been deleted from the cluster.
To reproduce: create a pool, wait until ceph_exporter exports it, then delete it. The exporter will keep exporting the last metrics for that pool until it is restarted.
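The underlying pattern, sketched generically below (this is not the exporter's actual code): if per-entity series live in a persistent map or metric vector and are only ever updated, deleted entities linger until restart. Pruning anything absent from the latest scrape fixes it:

```go
package main

import "fmt"

// prune drops entries for entities (pools, OSDs, ...) that no longer
// appear in the current scrape, so deleted entities stop being exported.
func prune(metrics map[string]float64, current []string) {
	seen := make(map[string]bool, len(current))
	for _, name := range current {
		seen[name] = true
	}
	for name := range metrics {
		if !seen[name] {
			delete(metrics, name)
		}
	}
}

func main() {
	metrics := map[string]float64{"rbd": 42, "old-pool": 7}
	prune(metrics, []string{"rbd"}) // "old-pool" was deleted from the cluster
	fmt.Println(len(metrics))       // 1
}
```

With client_golang, the equivalent is calling Reset() on the vector at the start of each scrape, or emitting const metrics from Collect so nothing persists between scrapes.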
When trying to run "go install" inside ceph_exporter, I get the following error message:
can't load package: /usr/local/go/src/ceph_exporter/exporter.go:26:2: non-standard import "github.com/ceph/go-ceph/rados" in standard package "ceph_exporter"
root@cm03:/usr/local/go/src/github.com/ceph/go-ceph/rados# ls -ltr
total 40
-rw-r--r-- 1 root root 1989 May 18 09:15 rados.go
-rw-r--r-- 1 root root 23737 May 18 09:15 ioctx.go
-rw-r--r-- 1 root root 57 May 18 09:15 doc.go
-rw-r--r-- 1 root root 8174 May 18 09:15 conn.go
Any ideas? Thanks in advance,
Hello!
root@ceph-osd-mon02:~# ceph --version
ceph version 10.2.10 (5dc1e4c05cb68dbf62ae6fce3f0700e4654fdbbe)
Using the exporter from the Docker image:
docker run -v /etc/ceph:/etc/ceph -d --net=host digitalocean/ceph_exporter
OSD metrics are collected normally, but there are no monitor metrics. In the logs:
root@ceph-osd-mon02:~# docker logs -f a70ac49a79d3
2018/07/19 16:29:19 Starting ceph exporter on ":9128"
2018/07/19 16:29:29 failed collecting monitor metrics: rados: Invalid argument
2018/07/19 16:41:56 failed collecting monitor metrics: rados: Invalid argument
ceph.conf - https://pastebin.com/sYPgqPSR
ceph_osds_down is returning zero (0) when in fact a number of OSDs are down. We are running Jewel. We can use ceph_osds - ceph_osds_up as a workaround, because ceph_osds_up seems to report correctly.
I just compiled the most recent version and I'm not getting any ceph metrics.
Using ceph version 12.2.0 (32ce2a3ae5239ee33d6150705cdb24d43bab910c) luminous (rc)
The log just shows:
2017/09/17 22:45:27 Starting ceph exporter on ":9128"
These are the only metrics I'm getting from ceph_exporter:
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 6
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.9"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 837360
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 837360
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 1.443282e+06
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 424
# HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
# TYPE go_memstats_gc_cpu_fraction gauge
go_memstats_gc_cpu_fraction 0
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 169984
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 837360
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 933888
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 1.851392e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 8410
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 0
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 2.78528e+06
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 0
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 21
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 8834
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 6944
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 16384
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 29184
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 32768
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 4.473924e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 797478
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 360448
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 360448
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 5.605624e+06
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 7
# HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 0.07
# HELP process_max_fds Maximum number of open file descriptors.
# TYPE process_max_fds gauge
process_max_fds 1024
# HELP process_open_fds Number of open file descriptors.
# TYPE process_open_fds gauge
process_open_fds 18
# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 2.7762688e+07
# HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
# TYPE process_start_time_seconds gauge
process_start_time_seconds 1.50568112768e+09
# HELP process_virtual_memory_bytes Virtual memory size in bytes.
# TYPE process_virtual_memory_bytes gauge
process_virtual_memory_bytes 7.87976192e+08
In Nautilus the structure of the output of ceph osd perf has changed slightly, and as a result collecting OSD latency metrics appears to fail.
The following two metrics are not collected.
As far as I can see from the commit log, there was a fix for this, but it was somehow reverted.
1f2df8d#diff-1536b81b5897f95267830a7c215ad5ab
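A version-tolerant parse could accept both shapes of the `ceph osd perf` JSON. The field names below follow the upstream output as far as I can tell (pre-Nautilus: a top-level "osd_perf_infos" array; Nautilus: the same array nested under "osdstats"); verify against your release before relying on them:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// perfInfo mirrors one entry of the `ceph osd perf` JSON output.
type perfInfo struct {
	ID    int64 `json:"id"`
	Stats struct {
		CommitLatencyMs float64 `json:"commit_latency_ms"`
		ApplyLatencyMs  float64 `json:"apply_latency_ms"`
	} `json:"perf_stats"`
}

// parseOSDPerf tries the Nautilus shape first, then falls back to the
// pre-Nautilus shape, so one collector handles both releases.
func parseOSDPerf(buf []byte) ([]perfInfo, error) {
	var nautilus struct {
		OSDStats struct {
			Infos []perfInfo `json:"osd_perf_infos"`
		} `json:"osdstats"`
	}
	if err := json.Unmarshal(buf, &nautilus); err == nil && len(nautilus.OSDStats.Infos) > 0 {
		return nautilus.OSDStats.Infos, nil
	}
	var legacy struct {
		Infos []perfInfo `json:"osd_perf_infos"`
	}
	err := json.Unmarshal(buf, &legacy)
	return legacy.Infos, err
}

func main() {
	legacy := []byte(`{"osd_perf_infos":[{"id":0,"perf_stats":{"commit_latency_ms":2,"apply_latency_ms":3}}]}`)
	nautilus := []byte(`{"osdstats":{"osd_perf_infos":[{"id":1,"perf_stats":{"commit_latency_ms":4,"apply_latency_ms":5}}]}}`)
	for _, buf := range [][]byte{legacy, nautilus} {
		infos, err := parseOSDPerf(buf)
		fmt.Println(len(infos), err)
	}
}
```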
Hey!
We just updated our ceph_exporter to the latest image (our previous image was 3 weeks old) and noticed that while it works with Ceph cluster version 10.2.2, it gets connection timeouts with newer ones, i.e. 10.2.3:
# docker logs -f ceph-env-region1
2016/12/29 11:06:35 cannot connect to ceph cluster: rados: Connection timed out
2016/12/29 11:11:35 cannot connect to ceph cluster: rados: Connection timed out
2016/12/29 11:16:35 cannot connect to ceph cluster: rados: Connection timed out
2016/12/29 11:21:35 cannot connect to ceph cluster: rados: Connection timed out
2016/12/29 11:26:36 cannot connect to ceph cluster: rados: Connection timed out
2016/12/29 11:31:36 cannot connect to ceph cluster: rados: Connection timed out
Have you tried it with a version newer than 10.2.2? Also, it would be great if you tagged exporter releases, so we won't have any headaches reverting back :)
Hi,
Any news regarding full Luminous compatibility?
I want to collect metrics using the ceph_exporter collectors and use those values for another purpose, but I do not want to expose them on localhost:9128. How can this be done, using the collector functions or methods defined here as a dependency in another project?
I have deployed ceph_exporter on k8s in two versions, Jewel and Luminous.
For Ceph version 12.2.5 (Luminous), the IO and throughput values are always 0.
Please tell me how to troubleshoot this problem.
Thanks.
After building the ceph_exporter Go project, I try to run the ceph_exporter binary and get the following error messages:
root@cm03:/# /usr/local/go/bin/ceph_exporter
2017/05/18 14:33:48 Starting ceph exporter on ":9128"
2017/05/18 14:33:55 [ERROR] cannot extract total bytes: strconv.ParseFloat: parsing "": invalid syntax
2017/05/18 14:33:55 [ERROR] cannot extract used bytes: strconv.ParseFloat: parsing "": invalid syntax
2017/05/18 14:33:55 [ERROR] cannot extract available bytes: strconv.ParseFloat: parsing "": invalid syntax
2017/05/18 14:33:55 failed collecting cluster health metrics: strconv.ParseFloat: parsing "": invalid syntax
2017/05/18 14:33:55 [ERROR] Unable to collect data from ceph osd df rados: Invalid argument
2017/05/18 14:33:55 failed collecting osd metrics: rados: Invalid argument
Any ideas?
After upgrading jewel -> luminous, the ceph_health_status value is stuck at 1 and will not change?
Hi,
With Ceph Hammer I get incomplete values for the IO ops:
# HELP ceph_cache_promote_io_ops Total cache promote operations measured per second
# TYPE ceph_cache_promote_io_ops gauge
ceph_cache_promote_io_ops 0
# HELP ceph_client_io_ops Total client ops on the cluster measured per second
# TYPE ceph_client_io_ops gauge
ceph_client_io_ops 111
# HELP ceph_client_io_read_ops Total client read I/O ops on the cluster measured per second
# TYPE ceph_client_io_read_ops gauge
ceph_client_io_read_ops 0
# HELP ceph_client_io_write_ops Total client write I/O ops on the cluster measured per second
# TYPE ceph_client_io_write_ops gauge
ceph_client_io_write_ops 0
Is this a problem with the Hammer release?
Regards, Eckebrecht
Hi,
Unfortunately, building from source is not possible anymore.
After some research I guess this is caused by changes in the Ceph libs, but I'm just a noob here.
The Docker build gives me:
# github.com/digitalocean/ceph_exporter/vendor/github.com/ceph/go-ceph/rados
cgo-gcc-prolog: In function '_cgo_c6f595483c63_Cfunc_rados_objects_list_close':
cgo-gcc-prolog:348:2: warning: 'rados_objects_list_close' is deprecated [-Wdeprecated-declarations]
In file included from vendor/github.com/ceph/go-ceph/rados/ioctx.go:6:0:
/usr/include/rados/librados.h:3845:21: note: declared here
CEPH_RADOS_API void rados_objects_list_close(
^
cgo-gcc-prolog: In function '_cgo_c6f595483c63_Cfunc_rados_objects_list_get_pg_hash_position':
cgo-gcc-prolog:364:2: warning: 'rados_objects_list_get_pg_hash_position' is deprecated [-Wdeprecated-declarations]
In file included from vendor/github.com/ceph/go-ceph/rados/ioctx.go:6:0:
/usr/include/rados/librados.h:3836:25: note: declared here
CEPH_RADOS_API uint32_t rados_objects_list_get_pg_hash_position(
^
cgo-gcc-prolog: In function '_cgo_c6f595483c63_Cfunc_rados_objects_list_next':
cgo-gcc-prolog:384:2: warning: 'rados_objects_list_next' is deprecated [-Wdeprecated-declarations]
In file included from vendor/github.com/ceph/go-ceph/rados/ioctx.go:6:0:
/usr/include/rados/librados.h:3841:20: note: declared here
CEPH_RADOS_API int rados_objects_list_next(
^
cgo-gcc-prolog: In function '_cgo_c6f595483c63_Cfunc_rados_objects_list_open':
cgo-gcc-prolog:403:2: warning: 'rados_objects_list_open' is deprecated [-Wdeprecated-declarations]
In file included from vendor/github.com/ceph/go-ceph/rados/ioctx.go:6:0:
/usr/include/rados/librados.h:3833:20: note: declared here
CEPH_RADOS_API int rados_objects_list_open(
^
cgo-gcc-prolog: In function '_cgo_c6f595483c63_Cfunc_rados_objects_list_seek':
cgo-gcc-prolog:423:2: warning: 'rados_objects_list_seek' is deprecated [-Wdeprecated-declarations]
In file included from vendor/github.com/ceph/go-ceph/rados/ioctx.go:6:0:
/usr/include/rados/librados.h:3838:25: note: declared here
CEPH_RADOS_API uint32_t rados_objects_list_seek(
^
cgo-gcc-prolog: In function '_cgo_c6f595483c63_Cfunc_rados_read_op_omap_get_vals':
cgo-gcc-prolog:497:2: warning: 'rados_read_op_omap_get_vals' is deprecated [-Wdeprecated-declarations]
In file included from vendor/github.com/ceph/go-ceph/rados/ioctx.go:6:0:
/usr/include/rados/librados.h:3272:21: note: declared here
CEPH_RADOS_API void rados_read_op_omap_get_vals(rados_read_op_t read_op,
^
Despite being only "warning" messages, this causes the build to fail. The new Docker image does not get a new ceph_exporter binary but uses a binary that is obviously already included.
An attempt to build via "go build" directly on localhost gives me:
# github.com/digitalocean/ceph_exporter/collectors
collectors/conn.go:32:15: undefined: rados.Conn
But rados.Conn can be found and resolved in /vendor. The root cause might be recent changes in the Ceph libs, as mentioned above.
Steps to reproduce:
Does anyone know a workaround?
Thanks in advance.
Metrics coming from ceph daemon osd.X perf dump are very important for evaluating performance. The only problem I see in the code is that, so far, there is no other case where an API call is needed for every OSD, as this case requires. Is there any plan to support them?
The ceph_exporter from the luminous-2.0.0 branch fails when some OSDs are not online. It seems there are duplicate values. The Ceph release is 12.2.7. We are using the official Docker container with the luminous-2.0.0 tag...
An error has occurred during metrics gathering:
4 error(s) occurred:
When all OSDs are online, everything works as expected.
Both ceph_misplaced_objects and ceph_degraded_objects show 0, but my ceph -s output looks like this:
data:
pools: 3 pools, 364 pgs
objects: 1103M objects, 21756 GB
usage: 47129 GB used, 26597 GB / 73727 GB avail
pgs: 64413611/2314108426 objects degraded (2.784%)
466174533/2314108426 objects misplaced (20.145%)
237 active+clean
106 active+remapped+backfill_wait
15 active+undersized+degraded+remapped+backfill_wait
5 active+undersized+degraded+remapped+backfilling
1 active+remapped+backfilling
Is this normal?
Does ceph_exporter need to be installed on each node, or is it enough to install it on one node?
When removing an OSD from the cluster it does not vanish from the metrics.
Version: 1.0.0
Example:
We removed OSD.230 from the cluster after a complete node failure, so the OSD never returned. We removed the OSD, its keys, and the crush rules. When we reload the exporter, the metrics of the removed OSD are no longer exported. But the moment we query a node where we did not reload the exporter, we keep getting these results:
ceph_osd_avail_bytes{osd="osd.230"} 4.252156604e+12
ceph_osd_bytes{osd="osd.230"} 5.858434628e+12
ceph_osd_crush_weight{osd="osd.230"} 5.456085
ceph_osd_depth{osd="osd.230"} 2
ceph_osd_in{osd="osd.230"} 0
ceph_osd_perf_apply_latency_seconds{osd="osd.230"} 0
ceph_osd_perf_commit_latency_seconds{osd="osd.230"} 0
ceph_osd_pgs{osd="osd.230"} 131
ceph_osd_reweight{osd="osd.230"} 1
ceph osd tree|grep 230 returns nothing.
ceph auth list |grep 230 returns nothing.
According to our audit logs the osd was removed from the cluster on 2017-05-22, more than 9 days ago.
We removed the osd by:
ceph osd crush remove osd.{osd-num}
ceph auth del osd.{osd-num}
ceph osd rm {osd-num}
ceph osd crush remove {host}
Ceph version:
10.2.5
Restarting the exporter process clears the removed OSDs from the result.
We have 10 OSDs per node; all 10 removed OSDs from this node are still reported.
Currently we run one exporter per Ceph monitor node, so we have 5 exporters running.
This seems to be the same issue as #48, but we do not see the OSDs removed from the metrics even after a day.
When one of the OSDs is out, ceph osd df -f json displays -nan for utilization and variance, like the following:
{
"id": 4,
"name": "osd.4",
"type": "osd",
"type_id": 0,
"crush_weight": 0.047791,
"depth": 2,
"reweight": 0.000000,
"kb": 0,
"kb_used": 0,
"kb_avail": 0,
"utilization": -nan,
"var": -nan,
"pgs": 50
}
And the exporter says
2017/04/10 19:51:07 Starting ceph exporter on ":9128"
2017/04/10 19:51:08 failed collecting osd metrics: invalid character 'n' in numeric literal
The ceph_osd_average_utilization metric also becomes wrong in this case.
Radosgw exposes nice stats. I'm mostly interested in per-bucket utilization:
root@dev:/# radosgw-admin bucket stats --bucket complainer
{
"bucket": "complainer",
"pool": ".rgw.buckets",
"index_pool": ".rgw.buckets.index",
"id": "default.1710998.19",
"marker": "default.1710998.19",
"owner": "complainer",
"ver": "0#8858394",
"master_ver": "0#0",
"mtime": "2016-05-23 15:21:12.000000",
"max_marker": "0#",
"usage": {
"rgw.none": {
"size_kb": 0,
"size_kb_actual": 0,
"num_objects": 0
},
"rgw.main": {
"size_kb": 200505199,
"size_kb_actual": 203796736,
"num_objects": 1342459
}
},
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
}
}
Is it something that ceph_exporter should provide, or is it a job for a separate radosgw_exporter?
Hi,
When debugging OSD latency issues it would be very helpful to have a label connecting the OSD to its host (like the output of ceph osd tree).
It would also disambiguate things if the OSD ever moves to another host.
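Such a label could be derived from ceph osd tree -f json by walking the CRUSH nodes. A sketch, assuming the upstream field names ("nodes", "type", "children") and a flat host-over-osd hierarchy:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// treeNode mirrors one entry of `ceph osd tree -f json`'s "nodes" array.
type treeNode struct {
	ID       int64   `json:"id"`
	Name     string  `json:"name"`
	Type     string  `json:"type"`
	Children []int64 `json:"children"`
}

// osdHosts maps each OSD name to the host node that contains it, which
// could then back a `host` label on OSD metrics.
func osdHosts(buf []byte) (map[string]string, error) {
	var tree struct {
		Nodes []treeNode `json:"nodes"`
	}
	if err := json.Unmarshal(buf, &tree); err != nil {
		return nil, err
	}
	byID := make(map[int64]treeNode, len(tree.Nodes))
	for _, n := range tree.Nodes {
		byID[n.ID] = n
	}
	out := make(map[string]string)
	for _, n := range tree.Nodes {
		if n.Type != "host" {
			continue
		}
		for _, child := range n.Children {
			if osd, ok := byID[child]; ok && osd.Type == "osd" {
				out[osd.Name] = n.Name
			}
		}
	}
	return out, nil
}

func main() {
	raw := []byte(`{"nodes":[{"id":-2,"name":"node1","type":"host","children":[0]},{"id":0,"name":"osd.0","type":"osd"}]}`)
	m, _ := osdHosts(raw)
	fmt.Println(m["osd.0"]) // node1
}
```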
[root@VM_0_13_centos ceph_exporter]# go install
vendor/github.com/ceph/go-ceph/rados/ioctx.go:453:2: could not determine kind of name for C.rados_read_op_omap_get_vals2
[root@VM_0_13_centos ceph_exporter]# rpm -qa |grep rbd
librbd1-10.2.5-4.el7.x86_64
librbd1-devel-10.2.5-4.el7.x86_64
[root@VM_0_13_centos ceph_exporter]# rpm -qa |grep rados
librados2-devel-10.2.5-4.el7.x86_64
librados2-10.2.5-4.el7.x86_64
When running docker run -v /etc/ceph:/etc/ceph --net=host -it digitalocean/ceph_exporter
I get the following error:
cannot connect to ceph cluster: rados: No such file or directory
Am I doing something wrong?
When the cluster is unavailable (because e.g. 2/3 MONs are down), ceph_exporter seems to never return. Since ceph_exporter does not actually depend on a running cluster, it would be nicer if it could return an appropriate status. Right now a monitoring solution has to rely on the Prometheus scrape timeout to detect this.
Hi,
What could be wrong here ?
Thanks
You are using prometheus.Handler(), which has been deprecated (promhttp.Handler() is its replacement).
We've noticed an issue with ceph_exporter: it still displays the ceph_osd_up metric with value 0, even after that OSD has been removed from the cluster.
Does the exporter have counter resets or something like that?
We're using the Docker container with ceph_exporter version 0.1.0.
Thanks!
I downloaded this project and followed the steps below.
Here is the error:
./exporter.go:49: cannot use collectors.NewClusterUsageCollector(conn, cluster) (type *collectors.ClusterUsageCollector) as type "github.com/prometheus/client_golang/prometheus".Collector in array or slice literal:
*collectors.ClusterUsageCollector does not implement "github.com/prometheus/client_golang/prometheus".Collector (wrong type for Collect method)
have Collect(chan<- "github.com/digitalocean/ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
want Collect(chan<- "github.com/prometheus/client_golang/prometheus".Metric)
./exporter.go:50: cannot use collectors.NewPoolUsageCollector(conn, cluster) (type *collectors.PoolUsageCollector) as type "github.com/prometheus/client_golang/prometheus".Collector in array or slice literal:
*collectors.PoolUsageCollector does not implement "github.com/prometheus/client_golang/prometheus".Collector (wrong type for Collect method)
have Collect(chan<- "github.com/digitalocean/ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
want Collect(chan<- "github.com/prometheus/client_golang/prometheus".Metric)
./exporter.go:51: cannot use collectors.NewClusterHealthCollector(conn, cluster) (type *collectors.ClusterHealthCollector) as type "github.com/prometheus/client_golang/prometheus".Collector in array or slice literal:
*collectors.ClusterHealthCollector does not implement "github.com/prometheus/client_golang/prometheus".Collector (wrong type for Collect method)
have Collect(chan<- "github.com/digitalocean/ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
want Collect(chan<- "github.com/prometheus/client_golang/prometheus".Metric)
./exporter.go:52: cannot use collectors.NewMonitorCollector(conn, cluster) (type *collectors.MonitorCollector) as type "github.com/prometheus/client_golang/prometheus".Collector in array or slice literal:
*collectors.MonitorCollector does not implement "github.com/prometheus/client_golang/prometheus".Collector (wrong type for Collect method)
have Collect(chan<- "github.com/digitalocean/ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
want Collect(chan<- "github.com/prometheus/client_golang/prometheus".Metric)
./exporter.go:53: cannot use collectors.NewOSDCollector(conn, cluster) (type *collectors.OSDCollector) as type "github.com/prometheus/client_golang/prometheus".Collector in array or slice literal:
*collectors.OSDCollector does not implement "github.com/prometheus/client_golang/prometheus".Collector (wrong type for Collect method)
have Collect(chan<- "github.com/digitalocean/ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
want Collect(chan<- "github.com/prometheus/client_golang/prometheus".Metric)
How can I fix it? Thanks so much.
It would be nice to have metrics for the CephFS (POSIX mode) metadata server.
ceph_health_status_interp
In the source code I cannot find a soft_warn definition. Is ceph_health_status_interp useful for Ceph monitoring?
Can someone explain this metric? Thank you.
@neurodrone @nickvanw Jewel client IOPS still isn't reported correctly after building with 8af54c1 #16 (re: original report #15)
2016/05/17 21:17:03 failed collecting cluster recovery/client io: can't parse units "op"
2016/05/17 21:17:03 failed collecting cluster recovery/client io: can't parse units "op"
2016/05/17 21:17:06 failed collecting cluster recovery/client io: can't parse units "op"
2016/05/17 21:17:06 failed collecting cluster recovery/client io: can't parse units "op"
I see some complaints when querying /metrics when connected to a ceph jewel cluster:
2016/05/17 16:40:01 failed collecting cluster recovery/client io: can't parse units "op"
It is probably balking at this line in the ceph -s output: client io 20166 kB/s wr, 0 op/s rd, 42 op/s wr. Older versions only reported a single op/s figure, but jewel reports read ops and write ops separately.
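A unit-aware parse of that line could handle both styles. This is a guess at a fix, sketched against the sample line, not the code the exporter later shipped:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// opsRe matches "N op/s" optionally suffixed with " rd" or " wr", so it
// covers both the pre-jewel and jewel status-line formats.
var opsRe = regexp.MustCompile(`(\d+) op/s( rd| wr)?`)

// parseOps extracts per-direction op rates; older releases emit a single
// unlabeled "N op/s" figure, which lands in the unlabeled return value.
func parseOps(line string) (read, write, unlabeled int) {
	for _, m := range opsRe.FindAllStringSubmatch(line, -1) {
		n, _ := strconv.Atoi(m[1])
		switch m[2] {
		case " rd":
			read += n
		case " wr":
			write += n
		default:
			unlabeled += n
		}
	}
	return
}

func main() {
	r, w, _ := parseOps("client io 20166 kB/s wr, 0 op/s rd, 42 op/s wr")
	fmt.Println(r, w) // 0 42
}
```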
[root@client4ha ceph_exporter-1.0.0]# go build
./exporter.go:49: not enough arguments in call to collectors.NewClusterUsageCollector
have (*rados.Conn)
want (collectors.Conn, string)
./exporter.go:49: cannot use collectors.NewClusterUsageCollector(conn) (type *collectors.ClusterUsageCollector) as type "github.com/prometheus/client_golang/prometheus".Collector in array or slice literal:
*collectors.ClusterUsageCollector does not implement "github.com/prometheus/client_golang/prometheus".Collector (wrong type for Collect method)
have Collect(chan<- "github.com/digitalocean/ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
want Collect(chan<- "github.com/prometheus/client_golang/prometheus".Metric)
./exporter.go:50: cannot use collectors.NewPoolUsageCollector(conn) (type *collectors.PoolUsageCollector) as type "github.com/prometheus/client_golang/prometheus".Collector in array or slice literal:
*collectors.PoolUsageCollector does not implement "github.com/prometheus/client_golang/prometheus".Collector (wrong type for Collect method)
have Collect(chan<- "github.com/digitalocean/ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
want Collect(chan<- "github.com/prometheus/client_golang/prometheus".Metric)
./exporter.go:51: cannot use collectors.NewClusterHealthCollector(conn) (type *collectors.ClusterHealthCollector) as type "github.com/prometheus/client_golang/prometheus".Collector in array or slice literal:
*collectors.ClusterHealthCollector does not implement "github.com/prometheus/client_golang/prometheus".Collector (wrong type for Collect method)
have Collect(chan<- "github.com/digitalocean/ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
want Collect(chan<- "github.com/prometheus/client_golang/prometheus".Metric)
./exporter.go:52: cannot use collectors.NewMonitorCollector(conn) (type *collectors.MonitorCollector) as type "github.com/prometheus/client_golang/prometheus".Collector in array or slice literal:
*collectors.MonitorCollector does not implement "github.com/prometheus/client_golang/prometheus".Collector (wrong type for Collect method)
have Collect(chan<- "github.com/digitalocean/ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
want Collect(chan<- "github.com/prometheus/client_golang/prometheus".Metric)
./exporter.go:53: cannot use collectors.NewOSDCollector(conn) (type *collectors.OSDCollector) as type "github.com/prometheus/client_golang/prometheus".Collector in array or slice literal:
*collectors.OSDCollector does not implement "github.com/prometheus/client_golang/prometheus".Collector (wrong type for Collect method)
have Collect(chan<- "github.com/digitalocean/ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
Hi,
I want to get RBD image info using ceph_exporter, but after reading the source code I found it is not supported.
There is also a question that confuses me: as you know, the commands we use to get RBD info start with "rbd", while in the ceph_exporter code it seems the commands start with "ceph"?
// cephOSDDFCommand builds the JSON payload sent to the monitors; it is
// equivalent to running `ceph osd df -f json` on the command line.
func (o *OSDCollector) cephOSDDFCommand() []byte {
	cmd, err := json.Marshal(map[string]interface{}{
		"prefix": "osd df", // the mon command to execute
		"format": "json",   // request machine-readable output
	})
	if err != nil {
		panic(err)
	}
	return cmd
}
If this needs to go somewhere else, I apologize; please point me in the right direction.
I have two Ceph clusters. Both config files are in /etc/ceph and they are named ceph.conf and ceph2.conf. However, I am unable to find documentation about what else I may have to do to send Prometheus information on both clusters. Both config files do appear in the container. Any help would be appreciated.
Does 2.0.7-luminous support Ceph Jewel?
[root@rg1-ceph01 ~/go/src/ceph_exporter]# go build
./exporter.go:85:39: cannot use collectors.NewClusterUsageCollector(conn, cluster) (type *collectors.ClusterUsageCollector) as type "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Collector in array or slice literal:
*collectors.ClusterUsageCollector does not implement "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Collector (wrong type for Collect method)
have Collect(chan<- "github.com/digitalocean/ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
want Collect(chan<- "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
./exporter.go:86:36: cannot use collectors.NewPoolUsageCollector(conn, cluster) (type *collectors.PoolUsageCollector) as type "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Collector in array or slice literal:
*collectors.PoolUsageCollector does not implement "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Collector (wrong type for Collect method)
have Collect(chan<- "github.com/digitalocean/ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
want Collect(chan<- "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
./exporter.go:87:40: cannot use collectors.NewClusterHealthCollector(conn, cluster) (type *collectors.ClusterHealthCollector) as type "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Collector in array or slice literal:
*collectors.ClusterHealthCollector does not implement "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Collector (wrong type for Collect method)
have Collect(chan<- "github.com/digitalocean/ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
want Collect(chan<- "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
./exporter.go:88:34: cannot use collectors.NewMonitorCollector(conn, cluster) (type *collectors.MonitorCollector) as type "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Collector in array or slice literal:
*collectors.MonitorCollector does not implement "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Collector (wrong type for Collect method)
have Collect(chan<- "github.com/digitalocean/ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
want Collect(chan<- "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
./exporter.go:89:30: cannot use collectors.NewOSDCollector(conn, cluster) (type *collectors.OSDCollector) as type "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Collector in array or slice literal:
*collectors.OSDCollector does not implement "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Collector (wrong type for Collect method)
have Collect(chan<- "github.com/digitalocean/ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
want Collect(chan<- "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
./exporter.go:95:24: cannot use collectors.NewRGWCollector(cluster, config, false) (type *collectors.RGWCollector) as type "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Collector in append:
*collectors.RGWCollector does not implement "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Collector (wrong type for Collect method)
have Collect(chan<- "github.com/digitalocean/ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
want Collect(chan<- "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
./exporter.go:100:24: cannot use collectors.NewRGWCollector(cluster, config, true) (type *collectors.RGWCollector) as type "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Collector in append:
*collectors.RGWCollector does not implement "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Collector (wrong type for Collect method)
have Collect(chan<- "github.com/digitalocean/ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
want Collect(chan<- "ceph_exporter/vendor/github.com/prometheus/client_golang/prometheus".Metric)
We have a Ceph Luminous cluster running on Ubuntu 16.04.3 LTS. We also run ceph_exporter via Docker (image is 6 days old), but the IO metrics always report zero values.
Any advice on how to solve this?
Thank you
ceph_cache_evict_io_bytes{cluster="ceph"} 0
ceph_cache_flush_io_bytes{cluster="ceph"} 0
ceph_cache_promote_io_ops{cluster="ceph"} 0
ceph_client_io_ops{cluster="ceph"} 0
ceph_client_io_read_bytes{cluster="ceph"} 0
ceph_client_io_read_ops{cluster="ceph"} 0
ceph_client_io_write_bytes{cluster="ceph"} 0
ceph_client_io_write_ops{cluster="ceph"} 0
ceph_recovery_io_bytes{cluster="ceph"} 0
ceph_recovery_io_keys{cluster="ceph"} 0
ceph_recovery_io_objects{cluster="ceph"} 0
ceph11 ~ # ceph --version
ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)
ceph11 ~ # ceph -s
cluster:
id: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph11,ceph12,ceph13
mgr: ceph12(active), standbys: ceph13, ceph11
osd: 28 osds: 28 up, 28 in
rgw: 3 daemons active
data:
pools: 6 pools, 1064 pgs
objects: 60452k objects, 6684 GB
usage: 23480 GB used, 16565 GB / 40045 GB avail
pgs: 1062 active+clean
2 active+clean+scrubbing+deep
io:
client: 231 kB/s rd, 4014 kB/s wr, 243 op/s rd, 393 op/s wr
ceph11 ~ #
Would it be possible to export counts for scrubs, deep scrubs, blocked requests, and OSDs with blocked requests? These metrics are very useful to graph against latency and I/O bandwidth when evaluating cluster performance.
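If such metrics were added, the scrub counts are recoverable from the pgs_by_state list in `ceph status --format json`, where compound states like "active+clean+scrubbing+deep" need to be split apart. A minimal sketch assuming that JSON shape; countState is a hypothetical helper, not exporter code:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// pgState is one entry of pgmap.pgs_by_state in `ceph status --format json`.
type pgState struct {
	StateName string `json:"state_name"`
	Count     int    `json:"count"`
}

// countState sums PGs whose (possibly compound) state contains substate;
// e.g. "active+clean+scrubbing+deep" matches both "scrubbing" and "deep".
func countState(states []pgState, substate string) int {
	total := 0
	for _, s := range states {
		for _, part := range strings.Split(s.StateName, "+") {
			if part == substate {
				total += s.Count
				break
			}
		}
	}
	return total
}

func main() {
	// Mirrors the `ceph -s` output above: 1062 clean, 2 deep-scrubbing.
	raw := []byte(`[{"state_name":"active+clean","count":1062},{"state_name":"active+clean+scrubbing+deep","count":2}]`)
	var states []pgState
	if err := json.Unmarshal(raw, &states); err != nil {
		panic(err)
	}
	fmt.Println("scrubbing:", countState(states, "scrubbing"))
	fmt.Println("deep:", countState(states, "deep"))
}
```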
When connected to a Firefly cluster, the IO and health collectors fail to parse the cluster's output. Here is an extract of the exporter's log:
2016/08/31 11:32:11 Starting ceph exporter on ":9128"
2016/08/31 11:32:17 [ERROR] cannot extract total bytes: strconv.ParseFloat: parsing "": invalid syntax
2016/08/31 11:32:17 [ERROR] cannot extract used bytes: strconv.ParseFloat: parsing "": invalid syntax
2016/08/31 11:32:17 [ERROR] cannot extract available bytes: strconv.ParseFloat: parsing "": invalid syntax
2016/08/31 11:32:17 failed collecting cluster health metrics: strconv.ParseFloat: parsing "": invalid syntax
2016/08/31 11:32:17 [ERROR] Unable to collect data from ceph osd df rados: Invalid argument
2016/08/31 11:32:17 failed collecting osd metrics: rados: Invalid argument
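The ParseFloat errors on "" suggest Firefly emits some of these fields as empty strings, or omits them so the collector reads an empty value. A hedged sketch of a tolerant conversion that would avoid the error (toFloat is a hypothetical helper for illustration, not the exporter's code):

```go
package main

import (
	"fmt"
	"strconv"
)

// toFloat converts a value pulled out of a generic map[string]interface{}
// JSON decode into a float64, tolerating the shapes older Ceph releases
// may emit: plain numbers, numeric strings, or missing/empty values.
// An empty or absent value yields 0 instead of a ParseFloat error.
func toFloat(v interface{}) (float64, error) {
	switch t := v.(type) {
	case nil:
		return 0, nil
	case float64:
		return t, nil
	case string:
		if t == "" {
			return 0, nil
		}
		return strconv.ParseFloat(t, 64)
	default:
		return 0, fmt.Errorf("unexpected type %T", v)
	}
}

func main() {
	// The first two cases are exactly what trips the Firefly parse above.
	for _, v := range []interface{}{nil, "", "23480", float64(42)} {
		f, err := toFloat(v)
		fmt.Println(f, err)
	}
}
```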
What version of Ceph has this collector been developed and tested against?