Comments (6)
@RyanZotti glad it worked!
Yes, it seems we can provide two images here: one for the pod that runs the operator itself, which has its own service account and permissions, and another for the SparkApplication, which is responsible for spawning the driver and executors and can use a different service account with possibly different permissions. During helm install we specify the operator pod's image, but when we write the SparkApplication manifest we can supply a different image. This is quite handy, as we'd often need to extend the driver/executor images with additional layers for various connectors, as described in the GCP guide, for example.
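To make the split concrete, here is a minimal sketch of the two places an image gets specified. The chart value names and the `v1beta2` API version are assumptions based on common spark-operator usage, not taken from this thread:

```yaml
# Operator pod image is set at install time, e.g. (value names assumed):
#   helm install spark-operator spark-operator/spark-operator \
#     --set image.repository=<operator-repo> --set image.tag=<operator-tag>
#
# Driver/executor image is set per application in the manifest:
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  image: "spark:3.5.0"        # used by driver and executor pods, not the operator
  driver:
    serviceAccount: spark     # can differ from the operator's service account
  executor:
    instances: 2
```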
from spark-operator.
I was able to run your examples on my cluster (not minikube) using image: "spark:3.5.0" for the SparkApplication.
@mereck Yes, that worked for me too. Thanks.
I'm new to Kubernetes and operators in general. Am I correct to assume that the driver and executors can use an image separate from the image of the operator? I assume that's the case, since that would explain why there are two places to specify images, one in the spark application yaml and another in the helm install command, but that wasn't clear to me from the docs.
@mereck That makes sense. As a quick follow-up, inspired by the external jars example you linked to, do you know if I'm doing anything wrong while adding third party jars?
For example, assume a driver/executor Dockerfile like so:
FROM spark:3.5.0
ADD https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.3.0/mysql-connector-j-8.3.0.jar /opt/spark/work-dir/
where I've added the following block under spec in my original spark-py-pi.yaml:
deps:
  jars:
    - local:///opt/spark/work-dir/mysql-connector-j-8.3.0.jar
I built the image like so (note the build context at the end):
eval $(minikube docker-env)
docker build -t spark-debug:latest -f Dockerfile .
and updated spark-py-pi.yaml to point to the spark-debug:latest image accordingly.
When I do all that I get a message like this:
Files local:///opt/spark/work-dir/mysql-connector-j-8.3.0.jar from /opt/spark/work-dir/mysql-connector-j-8.3.0.jar to /opt/spark/work-dir/mysql-connector-j-8.3.0.jar
Exception in thread "main" java.nio.file.NoSuchFileException: /opt/spark/work-dir/mysql-connector-j-8.3.0.jar
I can see the file at the specified location when I log into an interactive container. Do the jars need to be "local" to the Spark Operator image? I don't think that makes sense, but want to confirm.
It sounds like that jar only needs to be on the SparkApplication image. Have you tried adding a chmod 644 command to the Dockerfile as well? Also, I think it's better to place jars in $SPARK_HOME/jars.
Here's an example.
If fixing permissions doesn't help, I would spin up a container with bash and explore the file system to check that the file you added is actually there:
docker run --rm -it --entrypoint bash <image-name-or-id>
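Putting both suggestions together, a Dockerfile along these lines should work. This is a sketch, not a verified build: it assumes ADD leaves the file root-owned with restrictive permissions (a common cause of NoSuchFileException under Spark's non-root user) and that the base image's non-root user is named spark:

```dockerfile
FROM spark:3.5.0
# Put third-party jars on Spark's default classpath rather than the work dir
ADD https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.3.0/mysql-connector-j-8.3.0.jar /opt/spark/jars/
# ADD leaves the file owned by root without world-read permission; fix that
USER root
RUN chmod 644 /opt/spark/jars/mysql-connector-j-8.3.0.jar
# Switch back to the non-root user the spark image normally runs as (assumed: spark)
USER spark
```

With the jar in $SPARK_HOME/jars it is on the classpath automatically, so the deps.jars entry in the SparkApplication manifest may no longer be needed.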
@mereck Thanks! That did the trick, so I'm marking this issue as closed.