Comments (15)
Hi @lordk911, can you show your metrics-name-capture-regex and metrics-name-replacement settings?
from spark-metrics.
@stoader Thanks for your reply.
*.sink.prometheus.metrics-name-capture-regex=(application_\\d+_\\d+_.{1,6}_)(.+)
*.sink.prometheus.metrics-name-replacement=$2
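As a quick illustration, the effect of that capture regex and replacement on a sample metric name can be sketched in Python (the sample name is hypothetical; the doubled backslashes in the properties file are the escaped form of `\d`, and the Prometheus-style `$2` becomes Python's `\2`):

```python
import re

# Capture regex from the properties file, with properties-file escaping
# ("\\d") unescaped to a normal regex, and "$2" mapped to Python's r"\2".
pattern = r"(application_\d+_\d+_.{1,6}_)(.+)"

# Hypothetical metric name (dots already sanitized to underscores).
name = "application_1539228068007_5477_driver_CodeGenerator_compilationTime"

# Group 1 captures the "application_<id>_<attempt>_driver_" prefix;
# the replacement keeps only group 2, the bare metric name.
renamed = re.sub(pattern, r"\2", name)
print(renamed)  # -> CodeGenerator_compilationTime
```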
@lordk911 this error is thrown by Prometheus when there are two metrics with the same name but with different help messages.
In this case:
name:"CodeGenerator_compilationTime" help:"Generated from Dropwizard metric import (metric=application_1539228068007_5477.driver.CodeGenerator.compilationTime, type=com.codahale.metrics.Histogram)"
name:"CodeGenerator_compilationTime" help:"Generated from Dropwizard metric import (metric=application_1539228068007_5478.driver.CodeGenerator.compilationTime, type=com.codahale.metrics.Histogram)"
This is caused by the regex replacement being applied only on the name of the metrics but not on the help string. Since the help string contains the metric name this results in the above issue.
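To make the failure mode concrete, here is a small sketch (not the actual Prometheus/pushgateway code) of the consistency rule being violated: after the regex rename, both jobs report the same metric name, but their help strings still embed the original, job-specific names:

```python
def check_help_consistency(families):
    """Sketch (in spirit only) of the rule Prometheus enforces:
    one help string per metric-family name."""
    seen = {}
    for name, help_text in families:
        if name in seen and seen[name] != help_text:
            raise ValueError(f"inconsistent help strings for {name!r}")
        seen[name] = help_text

# Two jobs push the same renamed metric, but the help strings differ
# because they still contain the original, application-specific names.
families = [
    ("CodeGenerator_compilationTime",
     "Generated from Dropwizard metric import "
     "(metric=application_1539228068007_5477.driver.CodeGenerator.compilationTime, ...)"),
    ("CodeGenerator_compilationTime",
     "Generated from Dropwizard metric import "
     "(metric=application_1539228068007_5478.driver.CodeGenerator.compilationTime, ...)"),
]

try:
    check_help_consistency(families)
except ValueError as e:
    print(e)  # -> inconsistent help strings for 'CodeGenerator_compilationTime'
```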
We'll fix this issue later this week.
@stoader Thanks for your reply. It is very exciting news.
But there is another problem; maybe it's about the pushgateway rather than the spark-metrics sink.
Because there are a lot of Spark batch jobs, and even after a job dies its connections to the pushgateway are not closed, after running for some time the pushgateway cannot work very well, and I found a lot of connections in the CLOSE_WAIT state:
CLOSE_WAIT 28417
ESTABLISHED 30
TIME_WAIT 22
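For reference, counts like the above can be produced with a small pipeline (shown here on sample input; in practice you would feed it the state column of `netstat -ant` or `ss -ant` output instead of the `printf` sample):

```shell
# Tally TCP connection states by frequency. In practice, replace the
# printf sample with e.g.:  netstat -ant | awk 'NR > 2 {print $6}'
printf 'CLOSE_WAIT\nCLOSE_WAIT\nESTABLISHED\nTIME_WAIT\n' \
  | sort | uniq -c | sort -rn
```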
Do you have any experience with how to resolve such a problem?
@lordk911 there is a new version, 2.3-2.0.1, out there which addresses the inconsistent help strings issue. Please give it a try and let me know if it works for you. Note that you'll need to erase all metrics from the pushgateway prior to publishing new metrics with version 2.3-2.0.1.
Regarding the connections in CLOSE_WAIT state, did you run your batch jobs against a single Spark driver, or each job with its own Spark driver?
OK, I will give it a try.
We use Apache Livy to start some long-running Spark sessions, with dynamically allocated executors. Then we post query statements through Livy; all these statements use one session, and one session means one driver. Now I'm not sure whether the CLOSE_WAIT state was caused by a dead Spark session or by executor dynamic allocation. I will keep an eye on it.
@stoader with the new version 2.3-2.0.1 the inconsistent help strings issue is fixed, thanks.
Closing this issue. Please re-open it if it surfaces again.
level=info ts=2019-08-20T08:32:30.746Z caller=diskmetricstore.go:130 msg="metric families inconsistent help strings" err="Metric families have inconsistent help strings. The latter will have priority. This is bad. Fix your pushed metrics!" new="name:\"java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_init\" help:\"java.lang.management.MemoryUsage (java.lang<type=GarbageCollector, name=PS MarkSweep, key=Compressed Class Space><LastGcInfo, memoryUsageBeforeGc>init)\" type:UNTYPED metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" 
value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:2.555904e+06 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > 
label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:2.555904e+06 > > " old="name:\"java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_init\" help:\"java.lang.management.MemoryUsage (java.lang<type=GarbageCollector, name=PS Scavenge, key=Compressed Class Space><LastGcInfo, memoryUsageBeforeGc>init)\" type:UNTYPED metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" 
value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.555904e+06 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" 
value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.555904e+06 > > 
metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > 
label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:2.555904e+06 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" 
value:\"driver\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:2.555904e+06 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" 
value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.555904e+06 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" 
value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.555904e+06 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS 
Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.555904e+06 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Compressed Class Space\" > 
label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" 
value:\"Code Cache\" > label:<name:\"nam
I am getting a similar issue.
What spark-metrics version are you using?
Are these log lines from the driver or executor?
Can you share your metrics.properties file, and also the log lines that show the metrics system initialization (these should be among the first couple of log lines)?
Version used is 2.3-2.1.0
`# Enable Prometheus for all instances by class name
*.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink
# Prometheus pushgateway address
*.sink.prometheus.pushgateway-address=eaa-platform-pushgateway:9091
*.sink.prometheus.period=60
*.sink.prometheus.pushgateway-enable-timestamp=false
*.sink.prometheus.enable-dropwizard-collector=false
*.sink.prometheus.enable-jmx-collector=true
*.sink.prometheus.jmx-collector-config=/opt/mycomosi/spark/jars/resources/jmxCollector.yaml
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
*.sink.prometheus.enable-hostname-in-instance=true
# Enable JVM metrics source for all instances by class name
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource`
The errors above are coming in the pushgateway logs.
Also, can you help me understand why we need to set the *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink property?
First occurrence in the log:
oc logs -f eaa-platform-pushgateway-5-hb016
level=info ts=2019-08-20T18:58:30.326Z caller=main.go:78 msg="starting pushgateway" version="(version=0.9.0, branch=HEAD, revision=44d7ae6d9fb05dfe141eefb4bdf1bed7b89dee31)"
level=info ts=2019-08-20T18:58:30.326Z caller=main.go:79 build_context="(go=go1.12.7, user=root@45c30774a08c, date=20190723-15:31:55)"
level=info ts=2019-08-20T18:58:30.349Z caller=main.go:128 listen_address=:9091
level=info ts=2019-08-20T19:00:00.580Z caller=diskmetricstore.go:130 msg="metric families inconsistent help strings" err="Metric families have inconsistent help strings. The latter will have priority. This is bad. Fix your pushed metrics!" new="name:"kafka_consumer_consumer_fetch_manager_metrics_records_per_request_avg" help:"The average number of records in each request (kafka.consumer<type=consumer-fetch-manager-metrics, client-id=consumer-5><>records-per-request-avg)" type:UNTYPED metric:<label:<name:"app_name" value:"service-impact-spark-2.8.9+3259c38" > label:<name:"client_id" value:"consumer-5" > label:<name:"instance" value:"eaa-platform-spark-worker-0" > label:<name:"job" value:"app-20190820205332-0019" > label:<name:"number" value:"0" > label:<name:"role" value:"executor" > untyped:<value:0 > > metric:<label:<name:"app_name" value:"service-impact-spark-2.8.9+3259c38" > label:<name:"client_id" value:"consumer-5" > label:<name:"instance" value:"eaa-platform-spark-worker-0" > label:<name:"job" value:"app-20190820205332-0019" > label:<name:"number" value:"0" > label:<name:"role" value:"executor" > label:<name:"topic" value:"event-alarm" > untyped:<value:0 > > metric:<label:<name:"app_name" value:"service-impact-spark-2.8.9+3259c38" > label:<name:"client_id" value:"consumer-1" > label:<name:"instance" value:"eaa-platform-spark-worker-0" > label:<name:"job" value:"app-20190820205332-0019" > label:<name:"number" value:"0" > label:<name:"role" value:"executor" > untyped:<value:0 > > metric:<label:<name:"app_name" value:"service-impact-spark-2.8.9+3259c38" > label:<name:"client_id" value:"consumer-2" > label:<name:"instance" value:"eaa-platform-spark-worker-0" > label:<name:"job" value:"app-20190820205332-0019" > label:<name:"number" value:"0" > label:<name:"role" value:"executor" > untyped:<value:0 > > " old="name:"kafka_consumer_consumer_fetch_manager_metrics_records_per_request_avg" help:"The average number of records in each request 
(kafka.consumer<type=consumer-fetch-manager-metrics, client-id=consumer-1><>records-per-request-avg)" type:UNTYPED metric:<label:<name:"app_name" value:"service-impact-spark-2.8.9+3259c38" > label:<name:"client_id" value:"consumer-1" > label:<name:"instance" value:"eaa-platform-spark-worker-2" > label:<name:"job" value:"app-20190820205332-0019" > label:<name:"role" value:"driver" > untyped:<value:0 > > "
...
The metrics system initialization is in the driver and executor logs. Can you provide driver and executor logs?
There are two different metrics systems in Spark. One provides metrics through the Dropwizard library, the other through JMX MBeans. The metrics published from Spark by these are different (though there is some overlap).
You can choose which one to use through enable-dropwizard-collector or enable-jmx-collector.
If you choose Dropwizard then the *.sink.jmx.class setting is ignored.
Have you tried setting a metrics namespace for your Spark jobs? I'm asking because multiple Spark jobs may publish some metrics with the same name. This is not correct from a Prometheus standpoint. Thus, if you have multiple Spark jobs sending metrics to the same Prometheus instance, you should differentiate them by using different metrics namespaces.
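As a sketch, a per-job namespace can be set via Spark's `spark.metrics.namespace` property (the value below is illustrative; variable expansion such as `${spark.app.name}` is also supported):

```
# spark-defaults.conf, or --conf on spark-submit (illustrative value)
spark.metrics.namespace=my_unique_job
```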
Looking at this more, the issue relates to the way metrics and their help messages are constructed by the JmxCollector when gathering metrics from MBeans inside Spark.
In this particular case there are two metrics with the same name but different help strings:
"name:\"java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_init\" help:\"java.lang.management.MemoryUsage (java.lang<type=GarbageCollector, name=PS MarkSweep, key=Compressed Class Space><LastGcInfo, memoryUsageBeforeGc>init)\"
"name:\"java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_init\" help:\"java.lang.management.MemoryUsage (java.lang<type=GarbageCollector, name=PS Scavenge, key=Compressed Class Space><LastGcInfo, memoryUsageBeforeGc>init)\"
This is not allowed by Prometheus, so Prometheus drops one of the two metrics. (Prometheus requires that two metrics with the same name have the same help string as well.)
The help message for a metric is constructed as follows:
https://github.com/prometheus/jmx_exporter/blob/master/collector/src/main/java/io/prometheus/jmx/JmxCollector.java#L363
String help = attrDescription + " (" + beanName + attrName + ")";
The MBean attribute description, MBean name, and attribute name are used to construct the help string for a metric. The metric name is constructed as follows: https://github.com/prometheus/jmx_exporter/blob/master/collector/src/main/java/io/prometheus/jmx/JmxCollector.java#L313-L329
This may lead to cases when two metrics with same name have different help string.
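To see why this happens, here is a small self-contained sketch. The bean names are the ones quoted above, and safeName is a simplified stand-in for jmx_exporter's actual name mangling, not its real implementation:

```java
// Hedged sketch: mimics how JmxCollector derives a metric name and help
// string; shows two GC beans collapsing to one name with different helps.
public class HelpStringDemo {
    // Simplified stand-in for jmx_exporter's name sanitization.
    static String safeName(String s) {
        return s.replaceAll("[^a-zA-Z0-9_]", "_");
    }

    // Mirrors JmxCollector: help = attrDescription + " (" + beanName + attrName + ")"
    static String help(String attrDescription, String beanName, String attrName) {
        return attrDescription + " (" + beanName + attrName + ")";
    }

    public static void main(String[] args) {
        // Both GC beans yield the same sanitized metric name...
        String name = safeName("java.lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_init");
        String help1 = help("java.lang.management.MemoryUsage",
                "java.lang<type=GarbageCollector, name=PS MarkSweep><LastGcInfo, memoryUsageBeforeGc>", "init");
        String help2 = help("java.lang.management.MemoryUsage",
                "java.lang<type=GarbageCollector, name=PS Scavenge><LastGcInfo, memoryUsageBeforeGc>", "init");
        // ...but different help strings, which Prometheus rejects.
        System.out.println(name);
        System.out.println(help1.equals(help2)); // prints "false"
    }
}
```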
I'd suggest using jmx exporter rules (https://github.com/prometheus/jmx_exporter#configuration) in your jmx exporter config file to suppress the help string.
In case you have multiple Spark jobs running in the same cluster and passing metrics to the same Prometheus instance, the chance that there will be two metrics with the same name and different help strings is higher. To avoid this, either strip all help strings or ensure that the fields that make up the Spark job metrics key (instance, app_name, role) are unique.
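As an illustrative sketch of that suggestion (the pattern, metric name, and label here are my own guesses, not taken from a working config), a jmx_exporter rule can pin the help string and pull the GC name into a label so the two beans no longer conflict:

```yaml
# Hypothetical jmx_exporter rule: both GC beans map to one metric name,
# with a fixed help string and the GC name moved into a label.
rules:
  - pattern: 'java.lang<type=GarbageCollector, name=(.+)><LastGcInfo, memoryUsageBeforeGc>(\w+)'
    name: java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_$2
    labels:
      gc_name: "$1"
    help: "Memory usage before GC"
```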
from spark-metrics.
@dahiyahimanshu if you're using the JMX collector to collect Spark metrics (*.sink.prometheus.enable-jmx-collector=true), then you should be able to change metric names as in this example:
https://github.com/prometheus/jmx_exporter/blob/master/example_configs/spark.yml
If you're using Spark metrics exposed via the Dropwizard library (*.sink.prometheus.enable-dropwizard-collector=true), there is some limited capability to change metric names using metrics-name-replacement
(see https://github.com/banzaicloud/spark-metrics/blob/master/PrometheusSink.md#how-to-enable-prometheussink-in-spark).
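For reference, a minimal sketch of the Dropwizard path (the regex is the one discussed earlier in this thread; treat it as illustrative):

```properties
# Illustrative: strip the per-application prefix so metric names
# stay stable across application runs (regex from this thread).
*.sink.prometheus.enable-dropwizard-collector=true
*.sink.prometheus.metrics-name-capture-regex=(application_\\d+_\\d+_.{1,6}_)(.+)
*.sink.prometheus.metrics-name-replacement=$2
```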
from spark-metrics.
Hi @stoader ,
(*.sink.prometheus.enable-jmx-collector=true), then you should be able to change metric names as in this example:
https://github.com/prometheus/jmx_exporter/blob/master/example_configs/spark.yml
Thanks for the input. I'll try that. I found that the metric names are in fact correct; it is the help strings that mention different garbage collectors. So, as you mentioned earlier, I need to change only the help string.
- There are limitations to using the Dropwizard collector, i.e. JMX metrics will not be available with the Dropwizard export method.
from spark-metrics.