Comments (15)

stoader commented on June 1, 2024

Hi @lordk911, can you show your metrics-name-capture-regex and metrics-name-replacement settings?

from spark-metrics.

lordk911 commented on June 1, 2024

@stoader Thanks for your reply.

*.sink.prometheus.metrics-name-capture-regex=(application_\\d+_\\d+_.{1,6}_)(.+)
*.sink.prometheus.metrics-name-replacement=$2

stoader commented on June 1, 2024

@lordk911 this error is thrown by Prometheus when there are two metrics with the same name but with different help messages.

In this case:

name:"CodeGenerator_compilationTime" help:"Generated from Dropwizard metric import (metric=application_1539228068007_5477.driver.CodeGenerator.compilationTime, type=com.codahale.metrics.Histogram)"

name:"CodeGenerator_compilationTime" help:"Generated from Dropwizard metric import (metric=application_1539228068007_5478.driver.CodeGenerator.compilationTime, type=com.codahale.metrics.Histogram)"

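This is caused by the regex replacement being applied only to the metric name and not to the help string. Since the help string contains the original metric name, including the application ID, two applications end up with the same metric name but different help strings, which results in the above issue.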
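
To make the collision concrete, here is a minimal Python sketch of the behaviour described above. It is not the spark-metrics code: the `sanitize` step (dots turned into underscores before the regex runs) and the helper names are assumptions for illustration, and Java's `$2` replacement is written as `\2` for Python's `re` module.

```python
import re

# Capture regex and replacement from the settings posted earlier in this thread.
# Java's "$2" back-reference is spelled r"\2" in Python.
CAPTURE = re.compile(r"(application_\d+_\d+_.{1,6}_)(.+)")
REPLACEMENT = r"\2"


def sanitize(name):
    # Assumption: invalid characters such as "." become "_" before the regex runs.
    return re.sub(r"[^a-zA-Z0-9_:]", "_", name)


def to_family(dropwizard_name, metric_type="com.codahale.metrics.Histogram"):
    # The metric name goes through the regex replacement ...
    name = CAPTURE.sub(REPLACEMENT, sanitize(dropwizard_name))
    # ... but the help string still embeds the original, per-application name.
    help_text = "Generated from Dropwizard metric import (metric=%s, type=%s)" % (
        dropwizard_name,
        metric_type,
    )
    return name, help_text


a = to_family("application_1539228068007_5477.driver.CodeGenerator.compilationTime")
b = to_family("application_1539228068007_5478.driver.CodeGenerator.compilationTime")
print(a[0] == b[0])  # compares the metric names after replacement
print(a[1] == b[1])  # compares the help strings
```

Both applications map to the name `CodeGenerator_compilationTime`, while each help string keeps its own application ID, which is the combination the error message complains about.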

We'll fix this issue later this week.

lordk911 commented on June 1, 2024

@stoader Thanks for your reply. That is very exciting news.
But there is another problem, which may be about the Pushgateway rather than spark-metrics-sink.
We run a lot of Spark batch jobs, and even after a job dies its connection to the Pushgateway is not closed. After running for some time the Pushgateway stops working well, and I found a lot of connections in the CLOSE_WAIT state:

CLOSE_WAIT 28417
ESTABLISHED 30
TIME_WAIT 22

Do you have any experience resolving this kind of problem?

stoader commented on June 1, 2024

@lordk911 there is a new version, 2.3-2.0.1, which addresses the inconsistent help strings issue. Please give it a try and let me know if it works for you. Note that you'll need to erase all metrics from the Pushgateway before publishing new metrics with version 2.3-2.0.1.
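
Erasing a metric group can be done over the Pushgateway's HTTP API with a DELETE request on the group's path. The Python sketch below only builds such a request. The address, job name and label values are hypothetical placeholders, and which labels spark-metrics uses as the grouping key is not stated in this thread, so treat the labels here as an assumption; the grouping key has to match whatever the metrics were originally pushed with.

```python
from urllib.parse import quote
from urllib.request import Request


def delete_group_request(pushgateway, job, labels=None):
    # Pushgateway groups live under /metrics/job/<job>{/<label>/<value>}.
    path = "/metrics/job/" + quote(job, safe="")
    for name, value in (labels or {}).items():
        # Note: values containing "/" need the Pushgateway's base64 path form,
        # which this sketch does not handle.
        path += "/" + quote(name, safe="") + "/" + quote(value, safe="")
    return Request("http://" + pushgateway + path, method="DELETE")


# Hypothetical address and grouping key, for illustration only.
req = delete_group_request("localhost:9091", "app-1", {"instance": "worker-0"})
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. One request is needed per group, so erasing everything means repeating this for each grouping key currently stored.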

Regarding the connections in CLOSE_WAIT state, did you run your batch jobs against a single Spark driver, or does each job have its own Spark driver?

lordk911 commented on June 1, 2024

OK, I will give it a try.
We use Apache Livy to start some long-running Spark sessions, and the executors use dynamic allocation. We then post query statements through Livy; all of these statements use one session, and one session means one driver. For now I'm not sure whether the CLOSE_WAIT connections are caused by Spark sessions dying or by the dynamic allocation of executors. I will keep an eye on it.

lordk911 commented on June 1, 2024

@stoader with the new version 2.3-2.0.1 the inconsistent help strings issue is fixed, thanks.

stoader commented on June 1, 2024

Closing this issue. Please re-open it if it surfaces again.

dahiyahimanshu commented on June 1, 2024

level=info ts=2019-08-20T08:32:30.746Z caller=diskmetricstore.go:130 msg="metric families inconsistent help strings" err="Metric families have inconsistent help strings. The latter will have priority. This is bad. Fix your pushed metrics!" new="name:\"java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_init\" help:\"java.lang.management.MemoryUsage (java.lang<type=GarbageCollector, name=PS MarkSweep, key=Compressed Class Space><LastGcInfo, memoryUsageBeforeGc>init)\" type:UNTYPED metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" 
value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:2.555904e+06 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > 
label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:2.555904e+06 > > " old="name:\"java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_init\" help:\"java.lang.management.MemoryUsage (java.lang<type=GarbageCollector, name=PS Scavenge, key=Compressed Class Space><LastGcInfo, memoryUsageBeforeGc>init)\" type:UNTYPED metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" 
value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.555904e+06 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" 
value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"service-impact-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820085836-0000\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.555904e+06 > > 
metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > 
label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"role\" value:\"driver\" > untyped:<value:2.555904e+06 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" 
value:\"driver\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-1\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"role\" value:\"driver\" > untyped:<value:2.555904e+06 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" 
value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.555904e+06 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" 
value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-2\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"1\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.555904e+06 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Compressed Class Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS 
Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Code Cache\" > label:<name:\"name\" value:\"PS Scavenge\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.555904e+06 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Compressed Class Space\" > 
label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Survivor Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:2.359296e+07 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Old Gen\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:3.84827392e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"Metaspace\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:0 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" value:\"PS Eden Space\" > label:<name:\"name\" value:\"PS MarkSweep\" > label:<name:\"number\" value:\"0\" > label:<name:\"role\" value:\"executor\" > untyped:<value:1.44703488e+08 > > metric:<label:<name:\"app_name\" value:\"problem-management-spark-2.8.9+3259c38\" > label:<name:\"instance\" value:\"eaa-platform-spark-worker-0\" > label:<name:\"job\" value:\"app-20190820090041-0001\" > label:<name:\"key\" 
value:\"Code Cache\" > label:<name:\"nam

I am getting a similar issue.

stoader commented on June 1, 2024

What spark-metrics version are you using?

Are these log lines from the driver or executor?

Can you share your metrics.properties file and also the log lines that show the metrics system initialization (they should be among the first few log lines)?

dahiyahimanshu commented on June 1, 2024

Version used is 2.3-2.1.0

# Enable Prometheus for all instances by class name
*.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink
# Prometheus pushgateway address
*.sink.prometheus.pushgateway-address=eaa-platform-pushgateway:9091
*.sink.prometheus.period=60
*.sink.prometheus.pushgateway-enable-timestamp=false
*.sink.prometheus.enable-dropwizard-collector=false
*.sink.prometheus.enable-jmx-collector=true
*.sink.prometheus.jmx-collector-config=/opt/mycomosi/spark/jars/resources/jmxCollector.yaml
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
*.sink.prometheus.enable-hostname-in-instance=true
# Enable JVM metrics source for all instances by class name
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource

These log lines come from the Pushgateway logs.

Also, can you help me understand why we need to set the property *.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink?

First occurrence in the log:

oc logs -f eaa-platform-pushgateway-5-hb016

level=info ts=2019-08-20T18:58:30.326Z caller=main.go:78 msg="starting pushgateway" version="(version=0.9.0, branch=HEAD, revision=44d7ae6d9fb05dfe141eefb4bdf1bed7b89dee31)"
level=info ts=2019-08-20T18:58:30.326Z caller=main.go:79 build_context="(go=go1.12.7, user=root@45c30774a08c, date=20190723-15:31:55)"
level=info ts=2019-08-20T18:58:30.349Z caller=main.go:128 listen_address=:9091
level=info ts=2019-08-20T19:00:00.580Z caller=diskmetricstore.go:130 msg="metric families inconsistent help strings" err="Metric families have inconsistent help strings. The latter will have priority. This is bad. Fix your pushed metrics!" new="name:"kafka_consumer_consumer_fetch_manager_metrics_records_per_request_avg" help:"The average number of records in each request (kafka.consumer<type=consumer-fetch-manager-metrics, client-id=consumer-5><>records-per-request-avg)" type:UNTYPED metric:<label:<name:"app_name" value:"service-impact-spark-2.8.9+3259c38" > label:<name:"client_id" value:"consumer-5" > label:<name:"instance" value:"eaa-platform-spark-worker-0" > label:<name:"job" value:"app-20190820205332-0019" > label:<name:"number" value:"0" > label:<name:"role" value:"executor" > untyped:<value:0 > > metric:<label:<name:"app_name" value:"service-impact-spark-2.8.9+3259c38" > label:<name:"client_id" value:"consumer-5" > label:<name:"instance" value:"eaa-platform-spark-worker-0" > label:<name:"job" value:"app-20190820205332-0019" > label:<name:"number" value:"0" > label:<name:"role" value:"executor" > label:<name:"topic" value:"event-alarm" > untyped:<value:0 > > metric:<label:<name:"app_name" value:"service-impact-spark-2.8.9+3259c38" > label:<name:"client_id" value:"consumer-1" > label:<name:"instance" value:"eaa-platform-spark-worker-0" > label:<name:"job" value:"app-20190820205332-0019" > label:<name:"number" value:"0" > label:<name:"role" value:"executor" > untyped:<value:0 > > metric:<label:<name:"app_name" value:"service-impact-spark-2.8.9+3259c38" > label:<name:"client_id" value:"consumer-2" > label:<name:"instance" value:"eaa-platform-spark-worker-0" > label:<name:"job" value:"app-20190820205332-0019" > label:<name:"number" value:"0" > label:<name:"role" value:"executor" > untyped:<value:0 > > " old="name:"kafka_consumer_consumer_fetch_manager_metrics_records_per_request_avg" help:"The average number of records in each request 
(kafka.consumer<type=consumer-fetch-manager-metrics, client-id=consumer-1><>records-per-request-avg)" type:UNTYPED metric:<label:<name:"app_name" value:"service-impact-spark-2.8.9+3259c38" > label:<name:"client_id" value:"consumer-1" > label:<name:"instance" value:"eaa-platform-spark-worker-2" > label:<name:"job" value:"app-20190820205332-0019" > label:<name:"role" value:"driver" > untyped:<value:0 > > "
...

stoader commented on June 1, 2024

The metrics system initialization is logged in the driver and executor logs. Can you provide the driver and executor logs?

There are two different metrics systems in Spark. One provides metrics through the Dropwizard library, the other through JMX MBeans. The metrics published from Spark by these are different (though there is some overlap).
You can choose which one to use through enable-dropwizard-collector or enable-jmx-collector.

If you choose Dropwizard, then the *.sink.jmx.class setting is ignored.

Have you tried setting a metrics namespace for your Spark jobs? I'm asking because multiple Spark jobs may publish some metrics with the same name, which is not correct from Prometheus' standpoint. If you have multiple Spark jobs sending metrics to the same Prometheus instance, you should differentiate them by using different metrics namespaces.


stoader avatar stoader commented on June 1, 2024

Looking at this more closely, the issue relates to how metrics and their help messages are constructed by JmxCollector when gathering metrics from MBeans inside Spark.

In this particular case there are two metrics with the same name but different help strings:

"name:\"java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_init\" help:\"java.lang.management.MemoryUsage (java.lang<type=GarbageCollector, name=PS MarkSweep, key=Compressed Class Space><LastGcInfo, memoryUsageBeforeGc>init)\"

"name:\"java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_init\" help:\"java.lang.management.MemoryUsage (java.lang<type=GarbageCollector, name=PS Scavenge, key=Compressed Class Space><LastGcInfo, memoryUsageBeforeGc>init)\" 

Prometheus does not allow this, so it drops one of the two metrics. (Prometheus requires two metrics with the same name to have the same help string as well.)

The help message for a metric is constructed as follows:
https://github.com/prometheus/jmx_exporter/blob/master/collector/src/main/java/io/prometheus/jmx/JmxCollector.java#L363

String help = attrDescription + " (" + beanName + attrName + ")";

The MBean attribute description, MBean name and attribute name are used to construct the help string for a metric. The metric name is constructed as follows: https://github.com/prometheus/jmx_exporter/blob/master/collector/src/main/java/io/prometheus/jmx/JmxCollector.java#L313-L329

This can lead to cases where two metrics with the same name have different help strings.
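
To make the collision concrete, here is a minimal sketch that rebuilds the two help strings quoted above using the same concatenation as the JmxCollector line. The class name HelpStringCollision and the inconsistentNames helper are hypothetical and only illustrate the check; they are not part of jmx_exporter or spark-metrics, and the metric name is passed in as-is rather than derived the way the exporter derives it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HelpStringCollision {

    // Same concatenation as the JmxCollector line quoted above.
    static String help(String attrDescription, String beanName, String attrName) {
        return attrDescription + " (" + beanName + attrName + ")";
    }

    // Hypothetical helper: takes {metricName, help} pairs and returns the
    // metric names that show up with more than one distinct help string.
    static List<String> inconsistentNames(String[][] samples) {
        Map<String, String> firstHelp = new LinkedHashMap<>();
        List<String> conflicts = new ArrayList<>();
        for (String[] sample : samples) {
            String previous = firstHelp.putIfAbsent(sample[0], sample[1]);
            boolean differs = previous != null && !previous.equals(sample[1]);
            if (differs && !conflicts.contains(sample[0])) {
                conflicts.add(sample[0]);
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        String name = "java_lang_GarbageCollector_LastGcInfo_memoryUsageBeforeGc_init";
        String description = "java.lang.management.MemoryUsage";
        // The two bean names differ only in the collector name, which is not part of the metric name.
        String markSweep = "java.lang<type=GarbageCollector, name=PS MarkSweep, "
                + "key=Compressed Class Space><LastGcInfo, memoryUsageBeforeGc>";
        String scavenge = "java.lang<type=GarbageCollector, name=PS Scavenge, "
                + "key=Compressed Class Space><LastGcInfo, memoryUsageBeforeGc>";

        String[][] samples = {
            {name, help(description, markSweep, "init")},
            {name, help(description, scavenge, "init")},
        };
        // Prints the single conflicting metric name.
        System.out.println(inconsistentNames(samples));
    }
}
```

Because the collector name (PS MarkSweep vs. PS Scavenge) ends up in the help string but not in the metric name, the two samples share a name and still disagree on help text.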

I'd suggest using jmx exporter rules (https://github.com/prometheus/jmx_exporter#configuration) in your jmx exporter config file to suppress the help string.

If you have multiple Spark jobs running in the same cluster and passing metrics to the same Prometheus instance, the chance of two metrics having the same name but different help strings is higher. To avoid this, either strip all help strings or ensure that the fields that make up the Spark job metrics key (instance, app_name, role) are unique.

@sancyx @baluchicken


stoader avatar stoader commented on June 1, 2024

@dahiyahimanshu if you're using the JMX Collector to collect Spark metrics (*.sink.prometheus.enable-jmx-collector=true), then you should be able to change metric names as in this example:
https://github.com/prometheus/jmx_exporter/blob/master/example_configs/spark.yml

If you're using Spark metrics exposed via the Dropwizard library (*.sink.prometheus.enable-dropwizard-collector=true), there is some limited capability to change metric names using metrics-name-replacement (see https://github.com/banzaicloud/spark-metrics/blob/master/PrometheusSink.md#how-to-enable-prometheussink-in-spark).
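
As a rough illustration of what the capture regex and replacement do, the sketch below applies the settings posted earlier in this thread to a sanitized metric name with plain String.replaceAll. How PrometheusSink applies them internally is an assumption here; the class MetricNameRewrite and the rewrite method are hypothetical names used only for this example.

```java
public class MetricNameRewrite {

    // Assumption: the sink rewrites the already-sanitized metric name
    // (dots replaced by underscores) roughly like String.replaceAll does.
    static String rewrite(String sanitizedName, String captureRegex, String replacement) {
        return sanitizedName.replaceAll(captureRegex, replacement);
    }

    public static void main(String[] args) {
        // Settings as posted earlier in the thread.
        String captureRegex = "(application_\\d+_\\d+_.{1,6}_)(.+)";
        String replacement = "$2";

        String[] names = {
            "application_1539228068007_5477_driver_CodeGenerator_compilationTime",
            "application_1539228068007_5478_driver_CodeGenerator_compilationTime",
        };
        for (String name : names) {
            // Both application ids collapse to CodeGenerator_compilationTime.
            System.out.println(rewrite(name, captureRegex, replacement));
        }
    }
}
```

Note that both application ids collapse to the same metric name, which is why metrics from different jobs can end up colliding once the application prefix is stripped.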


dahiyahimanshu avatar dahiyahimanshu commented on June 1, 2024

Hi @stoader ,

(*.sink.prometheus.enable-jmx-collector=true) than you should be able to change metrics name as in this example:
https://github.com/prometheus/jmx_exporter/blob/master/example_configs/spark.yml

Thanks for the input. I'll try that, but I found that the metric names are correct; in fact, it is the help strings that mention different garbage collectors. So, as you mentioned earlier, I only need to change the help string.

  1. There are limitations when using the Dropwizard collector, i.e. JMX metrics will not be available with the Dropwizard export method.

