
Comments (11)

Gnoale commented on June 1, 2024

I have looked at this issue,
but in my case even the metric name stays set to the app.id value, so each time I run the job new metrics are created...
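
For reference, the metrics name processing options quoted later in this thread (metrics-name-capture-regex / metrics-name-replacement) are presumably the knob intended for collapsing those per-run names; a minimal metrics.conf sketch mirroring the commented example below, not verified here:

# collapse per-run metric names (application_<id>...) into a stable name
*.sink.prometheus.metrics-name-capture-regex="application_.*"
*.sink.prometheus.metrics-name-replacement=${spark.app.name}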


stoader commented on June 1, 2024

Instead of --repositories https://raw.github.com/banzaicloud/spark-metrics/master/maven-repo/releases, use the Maven repository (https://search.maven.org/artifact/com.banzaicloud/spark-metrics_2.11) to ensure that you use the latest version of spark-metrics that matches your Spark version (e.g. 2.3-3.0.1).
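
For example, letting --packages resolve everything from Maven Central (a sketch; the package list, master, and paths simply mirror the command in the next comment):

spark-submit --master yarn --deploy-mode cluster \
  --conf spark.metrics.conf=/mnt/code/infra-hdp-test/metrics.conf \
  --packages com.banzaicloud:spark-metrics_2.11:2.3-3.0.1,io.prometheus:simpleclient:0.3.0,io.prometheus:simpleclient_dropwizard:0.3.0,io.prometheus:simpleclient_pushgateway:0.3.0,io.dropwizard.metrics:metrics-core:3.1.2 \
  /tmp/test.py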


Gnoale commented on June 1, 2024

I just tried spark-submit --master yarn --queue default --conf spark.metrics.conf=/mnt/code/infra-hdp-test/metrics.conf --deploy-mode cluster --packages org.apache.hadoop:hadoop-aws:2.7.7,org.elasticsearch:elasticsearch-spark-20_2.11:6.5.4,org.apache.hadoop:hadoop-aws:2.7.7,org.elasticsearch:elasticsearch-spark-20_2.11:6.5.4,com.banzaicloud:spark-metrics_2.11:2.3-3.0.1,io.prometheus:simpleclient:0.3.0,io.prometheus:simpleclient_dropwizard:0.3.0,io.prometheus:simpleclient_pushgateway:0.3.0,io.dropwizard.metrics:metrics-core:3.1.2 /tmp/test.py
It fails and no metrics are posted at all.
The stack traces give me no clue.


Gnoale commented on June 1, 2024

I also tried with the Prometheus 0.8.1 module versions.
I'm checking the libs; I didn't write this test.py script myself.
Logs from the node (same exception as with the Prometheus 0.3.0 modules, FYI):

Traceback (most recent call last):
  File "test.py", line 12, in <module>
    .config(conf=conf)
  File "/mnt/disk5/yarn/local/usercache/g.noale/appcache/application_1585733805129_0091/container_e77_1585733805129_0091_02_000001/pyspark.zip/pyspark/sql/session.py", line 173, in getOrCreate
  File "/mnt/disk5/yarn/local/usercache/g.noale/appcache/application_1585733805129_0091/container_e77_1585733805129_0091_02_000001/pyspark.zip/pyspark/context.py", line 353, in getOrCreate
  File "/mnt/disk5/yarn/local/usercache/g.noale/appcache/application_1585733805129_0091/container_e77_1585733805129_0091_02_000001/pyspark.zip/pyspark/context.py", line 119, in __init__
  File "/mnt/disk5/yarn/local/usercache/g.noale/appcache/application_1585733805129_0091/container_e77_1585733805129_0091_02_000001/pyspark.zip/pyspark/context.py", line 181, in _do_init
  File "/mnt/disk5/yarn/local/usercache/g.noale/appcache/application_1585733805129_0091/container_e77_1585733805129_0091_02_000001/pyspark.zip/pyspark/context.py", line 292, in _initialize_context
  File "/mnt/disk5/yarn/local/usercache/g.noale/appcache/application_1585733805129_0091/container_e77_1585733805129_0091_02_000001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1525, in __call__
  File "/mnt/disk5/yarn/local/usercache/g.noale/appcache/application_1585733805129_0091/container_e77_1585733805129_0091_02_000001/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NoSuchMethodException: com.banzaicloud.spark.metrics.sink.PrometheusSink.<init>(java.util.Properties, com.codahale.metrics.MetricRegistry, org.apache.spark.SecurityManager)


stoader commented on June 1, 2024

The java.lang.NoSuchMethodException: com.banzaicloud.spark.metrics.sink.PrometheusSink error message indicates that the spark-metrics jar is not available on the node. Please ensure that the jar is downloaded from the Maven repository and is available on the node.
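
One way to take dependency resolution out of the picture is to download the jar once to a path reachable from every node and pass it explicitly (a sketch; the /shared/jars path is a placeholder, and the Prometheus simpleclient and Dropwizard jars need the same treatment, as in the working command further down):

spark-submit \
  --jars /shared/jars/spark-metrics_2.11-2.3-3.0.1.jar \
  --conf spark.executor.extraClassPath=/shared/jars/spark-metrics_2.11-2.3-3.0.1.jar \
  --conf spark.metrics.conf=/mnt/code/infra-hdp-test/metrics.conf \
  /tmp/test.py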


stoader commented on June 1, 2024

Also, *.sink.prometheus.class in your spark-metrics config seems to be wrong; it should be *.sink.prometheus.class=org.apache.spark.banzaicloud.metrics.sink.PrometheusSink (see https://github.com/banzaicloud/spark-metrics/blob/2.3-3.0.1/PrometheusSink.md#how-to-enable-prometheussink-in-spark).


Gnoale commented on June 1, 2024

Thanks for noticing, I don't know why we ended up with something different than in the doc...
However, with the correct class path, without --repositories, and the latest package version com.banzaicloud:spark-metrics_2.11:2.3-2.1.0, it still fails, but I get different errors:

1. JMX collector enabled

*.sink.prometheus.class=org.apache.spark.banzaicloud.metrics.sink.PrometheusSink
# Prometheus pushgateway address
*.sink.prometheus.pushgateway-address-protocol=https
*.sink.prometheus.pushgateway-address=ourpushgateway
#*.sink.prometheus.period=<period> - defaults to 10
#*.sink.prometheus.unit=< unit> - defaults to seconds (TimeUnit.SECONDS)
#*.sink.prometheus.pushgateway-enable-timestamp=<enable/disable metrics timestamp> - defaults to false

# Metrics name processing (version 2.3-1.1.0 +)
#*.sink.prometheus.metrics-name-capture-regex="application_.*"
#*.sink.prometheus.metrics-name-replacement=${spark.app.name}
#*.sink.prometheus.labels=<labels in label=value format separated by comma>

# Support for JMX Collector (version 2.3-2.0.0 +)
*.sink.prometheus.enable-dropwizard-collector=true
*.sink.prometheus.enable-jmx-collector=true
*.sink.prometheus.jmx-collector-config=/mnt/code/infra-hdp-test/jmxCollector.yaml

# Enable HostName in Instance instead of Appid (Default value is false i.e. instance=${appid})
*.sink.prometheus.enable-hostname-in-instance=true

# Enable JVM metrics source for all instances by class name
*.sink.jmx.class=org.apache.spark.metrics.sink.JmxSink
*.source.jvm.class=org.apache.spark.metrics.source.JvmSource

  File "/mnt/disk3/yarn/local/usercache/g.noale/appcache/application_1585733805129_0560/container_e77_1585733805129_0560_02_000001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1067, in start
    self.socket.connect((self.address, self.port))
  File "/usr/lib64/python2.7/socket.py", line 224, in meth
    return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused

Also, some metrics are posted

2. JMX collector disabled

# Support for JMX Collector (version 2.3-2.0.0 +)
#*.sink.prometheus.enable-dropwizard-collector=true
#*.sink.prometheus.enable-jmx-collector=true
#*.sink.prometheus.jmx-collector-config=/mnt/code/infra-hdp-test/jmxCollector.yaml

py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext. : java.lang.ClassNotFoundException: org.apache.spark.banzaicloud.metrics.sink.PrometheusSink

No metrics posted


stoader commented on June 1, 2024

You cannot have both the Dropwizard and JMX collectors enabled at the same time:

*.sink.prometheus.enable-dropwizard-collector=true
*.sink.prometheus.enable-jmx-collector=true

Enable only one of them. Spark provides more metrics through Dropwizard than through JMX, though consuming JMX metrics might be easier.
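
For example, to keep the richer Dropwizard metrics, the relevant metrics.conf lines would be (a sketch):

*.sink.prometheus.enable-dropwizard-collector=true
*.sink.prometheus.enable-jmx-collector=false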


Gnoale commented on June 1, 2024

OK, now we no longer get any exception after setting driver.sink.prometheus.class=org.apache.spark.banzaicloud.metrics.sink.PrometheusSink
I think we have an issue with Maven, because it only works with --repositories

But I cannot get the metrics namespace to the right value, i.e. the job value in Prometheus.

20/04/02 12:27:36 INFO PrometheusSink: metricsNamespace=None, sparkAppName=Some(test.py), sparkAppId=Some(application_1585816535116_0110), executorId=Some(driver)
20/04/02 12:27:36 INFO PrometheusSink: role=driver, job=application_1585816535116_0110

I tried METRICS_NAMESPACE= and spark.metrics.namespace= in the config file.
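
As far as I can tell, spark.metrics.namespace is a Spark configuration property rather than a metrics.conf entry, so a sketch of setting it at submit time (which should be equivalent to the SparkConf approach used further down) would be:

spark-submit --master yarn --deploy-mode cluster \
  --conf spark.metrics.conf=/mnt/code/infra-hdp-test/metrics.conf \
  --conf spark.metrics.namespace=test_namespace \
  /tmp/test.py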


Gnoale commented on June 1, 2024

Hi! We finally worked it out by manually putting all the jars in a shared folder:

 spark-submit --master yarn --queue default --conf spark.metrics.conf=/mnt/code/infra-hdp-test/metrics.conf --deploy-mode client --jars /mnt/code/infra-hdp-test/spark-metrics_2.11-2.3-3.0.1.jar,/mnt/code/infra-hdp-test/simpleclient-0.3.0.jar,/mnt/code/infra-hdp-test/simpleclient_pushgateway-0.3.0.jar,/mnt/code/infra-hdp-test/metrics-core-3.1.2.jar,/mnt/code/infra-hdp-test/simpleclient_dropwizard-0.3.0.jar,/mnt/code/infra-hdp-test/simpleclient_common-0.3.0.jar  --conf spark.executor.extraClassPath=/mnt/code/infra-hdp-test/spark-metrics_2.11-2.3-3.0.1.jar:/mnt/code/infra-hdp-test/simpleclient-0.3.0.jar:/mnt/code/infra-hdp-test/simpleclient_pushgateway-0.3.0.jar:/mnt/code/infra-hdp-test/metrics-core-3.1.2.jar:/mnt/code/infra-hdp-test/simpleclient_dropwizard-0.3.0.jar:/mnt/code/infra-hdp-test/simpleclient_common-0.3.0.jar /tmp/test.py

And the only way we found to replace the metrics namespace is to set it directly in the app code:

from pyspark import SparkConf
from pyspark.sql import SparkSession

conf = (
    SparkConf()
    .set('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
    # overrides the default metrics namespace (the app id)
    .set('spark.metrics.namespace', 'test_namespace')
)
spark = SparkSession.builder.config(conf=conf).getOrCreate()

And now we need the app.id back as a metric label,
and we are back to the point where interpolation doesn't work in metrics.conf:
I set *.sink.prometheus.labels=appid=${spark.app.id}
and I get a label appid="${spark.app.id}", which is not really useful :-)

Any advice?


stoader commented on June 1, 2024

Substitution for *.sink.prometheus.labels is not supported. The labels provided through *.sink.prometheus.labels are passed to Prometheus in their original format. *.sink.prometheus.labels is meant to be used for a static list of labels that you want to have on all metrics, in addition to the ones published by Spark. The value of spark.app.id is published under the instance label.
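
So a static label list would look like this (the label names and values here are purely illustrative):

*.sink.prometheus.labels=team=data,environment=staging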

/cc @sancyx @baluchicken

