Comments (12)
@stoader I was able to figure it out. For anyone else who is struggling to get this to work on AWS EMR with Yarn, here are the steps I took:
I created the following pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>company</groupId>
    <artifactId>helloworld</artifactId>
    <version>1.0-SNAPSHOT</version>
    <repositories>
        <repository>
            <id>banzaicloud</id>
            <name>banzaicloud</name>
            <url>https://raw.github.com/banzaicloud/spark-metrics/master/maven-repo/releases</url>
        </repository>
    </repositories>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.7.0</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <version>3.0.0</version>
                <executions>
                    <execution>
                        <id>copy-dependencies</id>
                        <phase>package</phase>
                        <goals>
                            <goal>copy-dependencies</goal>
                        </goals>
                        <configuration>
                            <outputDirectory>${project.build.directory}/alternateLocation</outputDirectory>
                            <overWriteReleases>false</overWriteReleases>
                            <overWriteSnapshots>false</overWriteSnapshots>
                            <overWriteIfNewer>true</overWriteIfNewer>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
            <groupId>com.banzaicloud</groupId>
            <artifactId>spark-metrics_2.11</artifactId>
            <version>2.3-2.0.4</version>
        </dependency>
        <dependency>
            <groupId>io.prometheus</groupId>
            <artifactId>simpleclient</artifactId>
            <version>0.3.0</version>
        </dependency>
        <dependency>
            <groupId>io.prometheus</groupId>
            <artifactId>simpleclient_dropwizard</artifactId>
            <version>0.3.0</version>
        </dependency>
        <dependency>
            <groupId>io.prometheus</groupId>
            <artifactId>simpleclient_pushgateway</artifactId>
            <version>0.3.0</version>
        </dependency>
        <dependency>
            <groupId>io.dropwizard.metrics</groupId>
            <artifactId>metrics-core</artifactId>
            <version>3.1.2</version>
        </dependency>
    </dependencies>
</project>
Then I ran the following command to gather all of the dependencies:
mvn dependency:copy-dependencies -DoutputDirectory="./result"
The ./result directory now has all of the dependencies for the spark-metrics jar. Zipping those up, transferring the archive to all of the nodes of the EMR cluster, and unzipping it into /usr/lib/spark/jars prepared the EMR cluster to use the spark-metrics library.
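Roughly, the distribution step looks like this (a sketch, not the exact commands from my setup; the hostnames are placeholders, and hadoop is EMR's default SSH user):

# bundle the resolved dependencies, then push and unpack them on every node
zip -j spark-metrics-deps.zip ./result/*.jar
for host in node1 node2 node3; do   # hypothetical EMR worker hostnames
    scp spark-metrics-deps.zip hadoop@"$host":/tmp/
    ssh hadoop@"$host" 'sudo unzip -o /tmp/spark-metrics-deps.zip -d /usr/lib/spark/jars'
done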
Here is the example job that I was using to test:
spark-submit \
--class org.apache.spark.examples.SparkPi \
--repositories https://raw.github.com/banzaicloud/spark-metrics/master/maven-repo/releases \
--packages com.banzaicloud:spark-metrics_2.11:2.3-2.0.4,io.prometheus:simpleclient:0.3.0,io.prometheus:simpleclient_dropwizard:0.3.0,io.prometheus:simpleclient_pushgateway:0.3.0,io.dropwizard.metrics:metrics-core:3.1.2 \
--jars /usr/lib/spark/jars/metrics-core-3.1.2.jar,/usr/lib/spark/jars/simpleclient-0.3.0.jar,/usr/lib/spark/jars/simpleclient_dropwizard-0.3.0.jar,/usr/lib/spark/jars/simpleclient_pushgateway-0.3.0.jar,/usr/lib/spark/jars/spark-metrics_2.11-2.3-2.0.4.jar \
--conf spark.metrics.conf=/baseSinkConfig/sinkprops.conf \
/usr/lib/spark/examples/jars/spark-examples.jar 1000
The /baseSinkConfig/sinkprops.conf file has the configuration mentioned in the documentation.
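For reference, a minimal sinkprops.conf along those lines might look like this (a sketch; the property names are from the spark-metrics documentation, but the Pushgateway endpoint is a placeholder):

cat > /baseSinkConfig/sinkprops.conf <<'EOF'
# PrometheusSink from the banzaicloud spark-metrics library
*.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink
# placeholder Pushgateway endpoint; replace with a real host:port
*.sink.prometheus.pushgateway-address-protocol=http
*.sink.prometheus.pushgateway-address=pushgateway.example.com:9091
EOF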
(Note: this may not answer the question that prompted opening this issue, but it solved my problem, and I wanted to share it here so others on the Google train will find it.)
@amitrmishra can you check and confirm that the $PWD/com.banzaicloud_spark-metrics_2.11-2.3-2.0.1.jar is actually available on the host?
Can you share your metrics.properties config?
I also suggest using com.banzaicloud:spark-metrics_2.11:2.3-2.0.4 instead of com.banzaicloud:spark-metrics_2.11:2.3-2.0.1.
Yes, the jar is in my list of external libraries. After building my project assembly jar, I can see (using jar tf) that the class com.banzaicloud.spark.metrics.sink.PrometheusSink is contained in it.
When running my Spark application locally, it works fine. But when running the same application on a cluster, using --master yarn, I start to get ClassNotFoundException on the executors.
I tried following approaches, but none of them worked:
- Using --packages and --repositories for the banzaicloud artifact
- Downloading the jars and then passing them in --jars
- Using --conf spark.executor.extraJavaOptions to include the jars
- Using --conf spark.executor.userClassPathFirst=true
I can see that the jar containing the class com.banzaicloud.spark.metrics.sink.PrometheusSink is shipped to the executor machine, and this jar is included in the executor command as well. It is still strange to get a ClassNotFoundException.
This is my metrics.properties:
*.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink
*.sink.graphite.host=xxx.xxx.xxx.xxx
*.sink.graphite.port=2003
*.sink.graphite.period=5
*.sink.graphite.prefix=spark
master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
Meanwhile, I'll try com.banzaicloud:spark-metrics_2.11:2.3-2.0.4 as well.
@amitrmishra can you check if this #28 (comment) fixes the ClassNotFoundException for you?
Thanks @stoader for taking the time to reply.
But using
driver.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink
does not give me the executor metrics.
@amitrmishra according to this thread, http://apache-spark-user-list.1001560.n3.nabble.com/Custom-Metric-Sink-on-Executor-Always-ClassNotFound-td34205.html#a34206, on executors the jar that contains the sink must be in the system classpath:
First, it's really weird to use "org.apache.spark" for a class that is not in Spark. For executors, the jar file of the sink needs to be in the system classpath; the application jar is not in the system classpath, so that does not work. There are different ways for you to get it there, most of them manual (YARN is, I think, the only RM supported in Spark where the application itself can do it).
Can you check that on executors the PrometheusSink jar is placed into the system classpath?
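For example, something like this on a worker node while an executor is running (a sketch; the grep patterns are only illustrative):

# list the classpath entries of the running executor JVM and look for the sink jar
# ([C] keeps grep from matching its own process)
ps aux | grep '[C]oarseGrainedExecutorBackend' | tr ' :' '\n\n' | grep -i -e spark-metrics -e prometheus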
@stoader None of this seems to be working. What is the system classpath for a Spark cluster running with Yarn (AWS EMR)? I've tried adding --conf spark.executor.extraClassPath to the spark-submit call, using both a local path and an HDFS path, and neither worked.
Has anyone got this to work with Yarn, i.e. Spark on AWS EMR?
@mitchelldavis, unfortunately, I'm not familiar with Spark on AWS EMR, as we run Spark on Kubernetes.
Can you verify what path the PrometheusSink jar is copied to on the hosts running the executors? Also, can you check the timestamp of when the jar is copied, to see whether it happens before the org.apache.spark.metrics.MetricsSystem class is initialised by the Spark executor?
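Something along these lines could do that comparison (a sketch; the jar path and application ID are placeholders):

# when was the sink jar copied onto the node?
ls -l --full-time /usr/lib/spark/jars/spark-metrics_2.11-2.3-2.0.4.jar
# when did the executors' MetricsSystem start? (check the aggregated YARN logs)
yarn logs -applicationId application_1234567890000_0001 | grep -m1 MetricsSystem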
@stoader Thanks for the quick reply. I'm going to have to figure out how to do that on Yarn, but I'll start working on that right away.
(Any tips you can give would be great!)
Thank you @mitchelldavis for sharing this.
If I understand correctly, what you did is upload the spark-metrics jar and its dependencies to all the EMR hosts in advance, instead of relying on Yarn to download and distribute the jars to the EMR hosts.
With the jars already uploaded to the EMR hosts, I guess you can now omit the --repositories and --packages spark-submit command line options.
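That is, the earlier example should reduce to something like this (a sketch of that implication, untested):

spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.metrics.conf=/baseSinkConfig/sinkprops.conf \
  /usr/lib/spark/examples/jars/spark-examples.jar 1000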
@mitchelldavis @stoader I'm attempting the pom.xml process of downloading/building out the jars and distributing them manually. Once that's done, I provide my extraClassPath references when I submit to yarn, setting both spark.driver.extraClassPath and spark.executor.extraClassPath to:
/opt/prometheus/jars/collector-0.12.0.jar:/opt/prometheus/jars/metrics-core-4.1.2.jar:/opt/prometheus/jars/simpleclient_common-0.8.0.jar:/opt/prometheus/jars/simpleclient_pushgateway-0.8.0.jar:/opt/prometheus/jars/simpleclient-0.8.0.jar:/opt/prometheus/jars/simpleclient_dropwizard-0.8.0.jar:/opt/prometheus/jars/spark-metrics_2.11-2.3-3.0.1.jar
All of the nodes in my yarn cluster have that path with those jars, but I get the following:
2019-12-30 14:15:19 ERROR ApplicationMaster:91 - User class threw exception: java.lang.NoClassDefFoundError: org/yaml/snakeyaml/Yaml
java.lang.NoClassDefFoundError: org/yaml/snakeyaml/Yaml
at io.prometheus.jmx.JmxCollector.<init>(JmxCollector.java:74)
at com.banzaicloud.spark.metrics.sink.PrometheusSink.jmxMetrics$lzycompute(PrometheusSink.scala:206)
at com.banzaicloud.spark.metrics.sink.PrometheusSink.jmxMetrics(PrometheusSink.scala:206)
at com.banzaicloud.spark.metrics.sink.PrometheusSink.start(PrometheusSink.scala:217)
at org.apache.spark.metrics.MetricsSystem$$anonfun$start$3.apply(MetricsSystem.scala:103)
at org.apache.spark.metrics.MetricsSystem$$anonfun$start$3.apply(MetricsSystem.scala:103)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.metrics.MetricsSystem.start(MetricsSystem.scala:103)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:513)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
at com.smg.rosetta.streaming.application.configuration.SparkHiveSessionBuilder$.getSparkSession(SparkHiveSessionBuilder.scala:22)
at com.smg.rosetta.streaming.application.mediators.spark.SparkIngestionMediator.initialize(SparkIngestionMediator.scala:32)
at com.smg.rosetta.streaming.application.Application$$anonfun$main$2.apply(Application.scala:47)
at com.smg.rosetta.streaming.application.Application$$anonfun$main$2.apply(Application.scala:47)
at scala.collection.immutable.List.foreach(List.scala:381)
at com.smg.rosetta.streaming.application.Application$.main(Application.scala:47)
at com.smg.rosetta.streaming.application.Application.main(Application.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:721)
Caused by: java.lang.ClassNotFoundException: org.yaml.snakeyaml.Yaml
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 27 more
Any idea what's going on here?
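For what it's worth, the trace fails inside io.prometheus.jmx.JmxCollector, and none of the jars in the classpath above obviously contains snakeyaml. A check along these lines (a sketch; same paths as above) would confirm whether any of the deployed jars provides the missing class:

# look for org.yaml.snakeyaml.Yaml inside each deployed jar
for j in /opt/prometheus/jars/*.jar; do
  unzip -l "$j" | grep -q 'org/yaml/snakeyaml/Yaml.class' && echo "found in $j"
done

If none does, deploying a snakeyaml jar alongside the others would presumably be needed whenever the JMX collector path is exercised.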
Do you use
*.sink.prometheus.enable-dropwizard-collector=true
*.sink.prometheus.enable-jmx-collector=false
or
*.sink.prometheus.enable-dropwizard-collector=false
*.sink.prometheus.enable-jmx-collector=true
in your metrics.properties config file?