
Comments (12)

mitchelldavis commented on June 6, 2024

@stoader I was able to get it figured out. For anyone else who is struggling to get this to work on AWS EMR with Yarn, here are the steps I took:

I created the following pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>

	<groupId>company</groupId>
	<artifactId>helloworld</artifactId>
	<version>1.0-SNAPSHOT</version>

	<repositories>
		<repository>
			<id>banzaicloud</id>
			<name>banzaicloud</name>
			<url>https://raw.github.com/banzaicloud/spark-metrics/master/maven-repo/releases</url>
		</repository>
	</repositories>

	<build>
		<plugins>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<configuration>
					<source>1.8</source>
					<target>1.8</target>
				</configuration>
				<version>3.7.0</version>
			</plugin>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-dependency-plugin</artifactId>
				<version>3.0.0</version>
				<executions>
					<execution>
						<id>copy-dependencies</id>
						<phase>package</phase>
						<goals>
							<goal>copy-dependencies</goal>
						</goals>
						<configuration>
							<outputDirectory>${project.build.directory}/alternateLocation</outputDirectory>
							<overWriteReleases>false</overWriteReleases>
							<overWriteSnapshots>false</overWriteSnapshots>
							<overWriteIfNewer>true</overWriteIfNewer>
						</configuration>
					</execution>
				</executions>
			</plugin>
		</plugins>
	</build>

	<dependencies>
		<dependency>
			<groupId>com.banzaicloud</groupId>
			<artifactId>spark-metrics_2.11</artifactId>
			<version>2.3-2.0.4</version>
		</dependency>
		<dependency>
			<groupId>io.prometheus</groupId>
			<artifactId>simpleclient</artifactId>
			<version>0.3.0</version>
		</dependency>
		<dependency>
			<groupId>io.prometheus</groupId>
			<artifactId>simpleclient_dropwizard</artifactId>
			<version>0.3.0</version>
		</dependency>
		<dependency>
			<groupId>io.prometheus</groupId>
			<artifactId>simpleclient_pushgateway</artifactId>
			<version>0.3.0</version>
		</dependency>
		<dependency>
			<groupId>io.dropwizard.metrics</groupId>
			<artifactId>metrics-core</artifactId>
			<version>3.1.2</version>
		</dependency>
	</dependencies>
</project>

Then I ran the following command to gather up all the dependencies:

mvn dependency:copy-dependencies -DoutputDirectory="./result"

The ./result directory now has all of the dependencies for the spark-metrics jar. Zipping all of that up, transferring the archive to every node of the EMR cluster, and then unzipping it into /usr/lib/spark/jars prepared the EMR cluster to use the spark-metrics library.
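In case it helps, here is a sketch of that packaging and distribution step. The host names, the SSH key path, and the hadoop user are illustrative assumptions; adapt them to your cluster:

# Package the gathered dependencies (jars end up at the archive root).
cd result && zip -r ../spark-metrics-deps.zip . && cd ..

# Copy and unpack on every node of the EMR cluster (hosts are placeholders).
for host in node1 node2 node3; do
	scp -i ~/emr-key.pem spark-metrics-deps.zip hadoop@$host:/tmp/
	ssh -i ~/emr-key.pem hadoop@$host 'sudo unzip -o /tmp/spark-metrics-deps.zip -d /usr/lib/spark/jars'
done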

Here is the example job that I was using to test:

spark-submit \
	--class org.apache.spark.examples.SparkPi \
	--repositories https://raw.github.com/banzaicloud/spark-metrics/master/maven-repo/releases \
	--packages com.banzaicloud:spark-metrics_2.11:2.3-2.0.4,io.prometheus:simpleclient:0.3.0,io.prometheus:simpleclient_dropwizard:0.3.0,io.prometheus:simpleclient_pushgateway:0.3.0,io.dropwizard.metrics:metrics-core:3.1.2 \
	--jars /usr/lib/spark/jars/metrics-core-3.1.2.jar,/usr/lib/spark/jars/simpleclient-0.3.0.jar,/usr/lib/spark/jars/simpleclient_dropwizard-0.3.0.jar,/usr/lib/spark/jars/simpleclient_pushgateway-0.3.0.jar,/usr/lib/spark/jars/spark-metrics_2.11-2.3-2.0.4.jar \
	--conf spark.metrics.conf=/baseSinkConfig/sinkprops.conf \
	/usr/lib/spark/examples/jars/spark-examples.jar 1000

The /baseSinkConfig/sinkprops.conf file has the configuration mentioned in the documentation.
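For completeness, here is a minimal sinkprops.conf sketch, assembled from the configuration keys in the spark-metrics documentation and later in this thread; the Pushgateway address is a placeholder:

# PrometheusSink pushing to a Prometheus Pushgateway (address is a placeholder).
*.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink
*.sink.prometheus.pushgateway-address-protocol=http
*.sink.prometheus.pushgateway-address=<pushgateway-host>:9091
*.sink.prometheus.enable-dropwizard-collector=true
*.sink.prometheus.enable-jmx-collector=false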

(Note: this may not answer the question that prompted opening this issue, but it solved my problem, and I wanted to share it here so that others who find this thread via Google will benefit.)

stoader commented on June 6, 2024

@amitrmishra can you check and confirm that the $PWD/com.banzaicloud_spark-metrics_2.11-2.3-2.0.1.jar is actually available on the host?

Can you share your metrics.properties config?

I also suggest using com.banzaicloud:spark-metrics_2.11:2.3-2.0.4 instead of com.banzaicloud:spark-metrics_2.11:2.3-2.0.1.

amitrmishra commented on June 6, 2024

Yes, the jar is in my list of external libraries. After building my project assembly jar, I can see (using jar tf) that the class com.banzaicloud.spark.metrics.sink.PrometheusSink is contained in it.
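For reference, the check looks like this (the assembly jar name is a placeholder):

jar tf myapp-assembly.jar | grep PrometheusSink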

When running my spark application locally, it works fine. But when running the same application on the cluster with --master yarn, I start getting ClassNotFoundException on the executors.

I tried the following approaches, but none of them worked:

  1. Using --packages and --repositories for banzaicloud artifact
  2. Downloading the jars and then passing them in --jars
  3. Using --conf spark.executor.extraJavaOptions to include the jars
  4. Using --conf spark.executor.userClassPathFirst=true

I can see that the jar containing the class 'com.banzaicloud.spark.metrics.sink.PrometheusSink' makes it to the executor machine, and the jar is included in the executor command as well. Still, it is weird to get ClassNotFoundException.

This is my metrics.properties:

*.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink
*.sink.graphite.host=xxx.xxx.xxx.xxx
*.sink.graphite.port=2003
*.sink.graphite.period=5
*.sink.graphite.prefix=spark

master.source.jvm.class=org.apache.spark.metrics.source.JvmSource
worker.source.jvm.class=org.apache.spark.metrics.source.JvmSource
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource

Meanwhile, I'll try com.banzaicloud:spark-metrics_2.11:2.3-2.0.4 as well.

stoader commented on June 6, 2024

@amitrmishra can you check if this #28 (comment) fixes the ClassNotFoundException for you?

amitrmishra commented on June 6, 2024

Thanks @stoader for taking the time to reply.
However, using
driver.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink
does not give me the executor metrics.

stoader commented on June 6, 2024

@amitrmishra according to http://apache-spark-user-list.1001560.n3.nabble.com/Custom-Metric-Sink-on-Executor-Always-ClassNotFound-td34205.html#a34206, on executors the jar that contains the sink must be in the system classpath:

First, it's really weird to use "org.apache.spark" for a class that is not in Spark. For executors, the jar file of the sink needs to be in the system classpath; the application jar is not in the system classpath, so that does not work. There are different ways for you to get it there, most of them manual (YARN is, I think, the only RM supported in Spark where the application itself can do it).

Can you check that on executors the PrometheusSink jar is placed into the system classpath?
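For example (an untested sketch; the local jar path is illustrative), on YARN you can ship the jar with --jars and then reference it by bare file name in spark.executor.extraClassPath, since YARN localises distributed jars into each container's working directory:

spark-submit \
	--jars /local/path/spark-metrics_2.11-2.3-2.0.4.jar \
	--conf spark.executor.extraClassPath=spark-metrics_2.11-2.3-2.0.4.jar \
	...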

mitchelldavis commented on June 6, 2024

@stoader None of this seems to be working. What is the system classpath for a Spark cluster running with yarn (on AWS EMR)? I've tried adding --conf spark.executor.extraClassPath to the spark-submit call, using both a local path and an HDFS path, and neither worked.

Has anyone gotten this to work with yarn, i.e. Spark on AWS EMR?

stoader commented on June 6, 2024

@mitchelldavis, unfortunately, I'm not familiar with Spark on AWS EMR as we run Spark on Kubernetes.

Can you verify what path the PrometheusSink jar is copied to on the hosts running the executors? Also, can you check the timestamp of when the jar is copied, to see whether that happens before the org.apache.spark.metrics.MetricsSystem class is initialised by the Spark executor?
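A quick way to check both on an executor host (the jar path is an assumption based on the earlier comments):

# Is the jar there, and when was it copied?
ls -l --time-style=full-iso /usr/lib/spark/jars/spark-metrics_2.11-*.jar

# Compare against the executor start time and MetricsSystem messages in the YARN container logs.
yarn logs -applicationId <application-id> | grep -i metricssystem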

mitchelldavis commented on June 6, 2024

@stoader Thanks for the quick reply. I'm going to have to figure out how to do that on Yarn, but I'll start working on that right away.

(Any tips you can give would be great!)

stoader commented on June 6, 2024

Thank you @mitchelldavis for sharing this.
If I understand correctly, you uploaded the spark-metrics jar and its dependencies to all the EMR hosts in advance, instead of relying on Yarn to download and distribute the jars to the EMR hosts.

With the jars already uploaded to the EMR hosts, I guess you can now omit the --repositories and --packages spark-submit command-line options.
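Something like this, i.e. an untested simplification of the earlier command:

spark-submit \
	--class org.apache.spark.examples.SparkPi \
	--conf spark.metrics.conf=/baseSinkConfig/sinkprops.conf \
	/usr/lib/spark/examples/jars/spark-examples.jar 1000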

Drewster727 commented on June 6, 2024

@mitchelldavis @stoader I'm following the pom.xml process above to download the jars and distribute them manually. After doing that, I provide my extraClassPath references when I submit to yarn, setting both

spark.driver.extraClassPath
spark.executor.extraClassPath

to:

/opt/prometheus/jars/collector-0.12.0.jar:/opt/prometheus/jars/metrics-core-4.1.2.jar:/opt/prometheus/jars/simpleclient_common-0.8.0.jar:/opt/prometheus/jars/simpleclient_pushgateway-0.8.0.jar:/opt/prometheus/jars/simpleclient-0.8.0.jar:/opt/prometheus/jars/simpleclient_dropwizard-0.8.0.jar:/opt/prometheus/jars/spark-metrics_2.11-2.3-3.0.1.jar

All of the nodes in my yarn cluster have that path with those jars, but I get the following:

2019-12-30 14:15:19 ERROR ApplicationMaster:91 - User class threw exception: java.lang.NoClassDefFoundError: org/yaml/snakeyaml/Yaml
java.lang.NoClassDefFoundError: org/yaml/snakeyaml/Yaml
	at io.prometheus.jmx.JmxCollector.<init>(JmxCollector.java:74)
	at com.banzaicloud.spark.metrics.sink.PrometheusSink.jmxMetrics$lzycompute(PrometheusSink.scala:206)
	at com.banzaicloud.spark.metrics.sink.PrometheusSink.jmxMetrics(PrometheusSink.scala:206)
	at com.banzaicloud.spark.metrics.sink.PrometheusSink.start(PrometheusSink.scala:217)
	at org.apache.spark.metrics.MetricsSystem$$anonfun$start$3.apply(MetricsSystem.scala:103)
	at org.apache.spark.metrics.MetricsSystem$$anonfun$start$3.apply(MetricsSystem.scala:103)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.metrics.MetricsSystem.start(MetricsSystem.scala:103)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:513)
	at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:930)
	at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:921)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:921)
	at com.smg.rosetta.streaming.application.configuration.SparkHiveSessionBuilder$.getSparkSession(SparkHiveSessionBuilder.scala:22)
	at com.smg.rosetta.streaming.application.mediators.spark.SparkIngestionMediator.initialize(SparkIngestionMediator.scala:32)
	at com.smg.rosetta.streaming.application.Application$$anonfun$main$2.apply(Application.scala:47)
	at com.smg.rosetta.streaming.application.Application$$anonfun$main$2.apply(Application.scala:47)
	at scala.collection.immutable.List.foreach(List.scala:381)
	at com.smg.rosetta.streaming.application.Application$.main(Application.scala:47)
	at com.smg.rosetta.streaming.application.Application.main(Application.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$4.run(ApplicationMaster.scala:721)
Caused by: java.lang.ClassNotFoundException: org.yaml.snakeyaml.Yaml
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	... 27 more

Any idea what's going on here?

stoader commented on June 6, 2024

Do you use

*.sink.prometheus.enable-dropwizard-collector=true
*.sink.prometheus.enable-jmx-collector=false

or

*.sink.prometheus.enable-dropwizard-collector=false
*.sink.prometheus.enable-jmx-collector=true

in your metrics.properties config file?
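For what it's worth, the stack trace fails inside io.prometheus.jmx.JmxCollector, which is only constructed when the JMX collector is enabled. If you do need *.sink.prometheus.enable-jmx-collector=true, then org.yaml:snakeyaml has to be on the classpath as well; it is not in the jar list you posted. A sketch of the extra pom.xml dependency (the version is an assumption):

<dependency>
	<groupId>org.yaml</groupId>
	<artifactId>snakeyaml</artifactId>
	<!-- version is an assumption; align it with the io.prometheus.jmx collector version in use -->
	<version>1.25</version>
</dependency>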
