
Comments (6)

mereck commented on July 2, 2024

@RyanZotti glad it worked!

Yes, it looks like two images are involved. One runs the operator pod itself, which has its own service account and permissions. The other is the SparkApplication image, which is used to spawn the driver and executors and runs under a different service account with possibly different permissions. We specify the operator pod's image during helm install, and we can supply a different image in the SparkApplication manifest. This is quite handy, since we often need to extend the driver/executor image with additional layers for various connectors, as described in the GCP guide, for example.
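
To make the two-image split concrete, here is a minimal shell sketch that writes a SparkApplication manifest whose image is set independently of the operator's. The file name, the application name pyspark-pi-custom, the default namespace, and the spark service account are assumptions for illustration; the field layout follows the v1beta2 SparkApplication examples shipped with the operator. The operator's own image is chosen separately at helm install time through chart values, whose names vary by chart version and so are not shown here.

```shell
#!/bin/sh
# Write a SparkApplication manifest to a scratch directory.
manifest_dir=$(mktemp -d)
manifest="$manifest_dir/spark-py-pi-custom.yaml"   # hypothetical file name

cat > "$manifest" <<'EOF'
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: pyspark-pi-custom      # hypothetical name
  namespace: default           # assumption: adjust to your namespace
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  # Image for the driver and executors; independent of the operator pod's image.
  image: "spark:3.5.0"
  imagePullPolicy: IfNotPresent
  mainApplicationFile: local:///opt/spark/examples/src/main/python/pi.py
  sparkVersion: "3.5.0"
  driver:
    cores: 1
    memory: "512m"
    serviceAccount: spark      # assumption: the name depends on your install
  executor:
    cores: 1
    instances: 1
    memory: "512m"
EOF

echo "wrote $manifest"
# Submit it to the cluster with: kubectl apply -f "$manifest"
```

Swapping the image value for a custom build (for example one with extra connector jars) changes only the driver and executor pods; the operator pod keeps running whatever image the Helm release was installed with.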

from spark-operator.

mereck commented on July 2, 2024

I was able to run your examples on my cluster (not minikube) using image: "spark:3.5.0" for the SparkApplication


RyanZotti commented on July 2, 2024

@mereck Yes, that worked for me too. Thanks.

I'm new to Kubernetes and operators in general. Am I correct to assume that the driver and executors can use an image separate from the image of the operator? I assume that's the case, since that would explain why there are two places to specify images, one in the spark application yaml and another in the helm install command, but that wasn't clear to me from the docs.


RyanZotti commented on July 2, 2024

@mereck That makes sense. As a quick follow-up, inspired by the external jars example you linked to, do you know if I'm doing anything wrong while adding third-party jars?

For example, assume a driver/executor Dockerfile like so:

FROM spark:3.5.0

ADD https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.3.0/mysql-connector-j-8.3.0.jar /opt/spark/work-dir/

Where I've added the following line under spec to my original spark-py-pi.yaml:

  deps:
    jars:
      - local:///opt/spark/work-dir/mysql-connector-j-8.3.0.jar

Built the image like so:

eval $(minikube docker-env)
docker build -t spark-debug:latest -f Dockerfile .

and where I've updated spark-py-pi.yaml to point to the spark-debug:latest image accordingly.

When I do all that I get a message like this:

Files local:///opt/spark/work-dir/mysql-connector-j-8.3.0.jar from /opt/spark/work-dir/mysql-connector-j-8.3.0.jar to /opt/spark/work-dir/mysql-connector-j-8.3.0.jar
Exception in thread "main" java.nio.file.NoSuchFileException: /opt/spark/work-dir/mysql-connector-j-8.3.0.jar

I can see the file at the specified location when I log into an interactive container. Do the jars need to be "local" to the Spark Operator image? I don't think that makes sense, but want to confirm.


mereck commented on July 2, 2024

It sounds like that jar only needs to be on the SparkApplication image. Have you tried adding a chmod 644 command to the Dockerfile as well? Also, I think it's better to place jars in $SPARK_HOME/jars.

Here's an example.
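
As a minimal sketch of both suggestions (not the linked example itself), the script below writes a Dockerfile that downloads the jar into the Spark jars directory and makes it world-readable. It assumes that $SPARK_HOME is /opt/spark in the spark:3.5.0 image and that the image's default user is the non-root spark user, which is why it switches to root for the chmod and back afterwards. Docker documents that a file fetched by ADD from a URL gets 600 permissions, which would plausibly explain why a non-root Spark process cannot read the jar. The file name Dockerfile.spark-debug is hypothetical.

```shell
#!/bin/sh
# Write the Dockerfile into a scratch build directory.
build_dir=$(mktemp -d)
dockerfile="$build_dir/Dockerfile.spark-debug"   # hypothetical file name

cat > "$dockerfile" <<'EOF'
FROM spark:3.5.0

# Assumption: the base image ends with a non-root user, so become root to chmod.
USER root

# Assumption: SPARK_HOME is /opt/spark, so this is $SPARK_HOME/jars.
ADD https://repo1.maven.org/maven2/com/mysql/mysql-connector-j/8.3.0/mysql-connector-j-8.3.0.jar /opt/spark/jars/

# Files added from a URL default to mode 600; make the jar readable by all users.
RUN chmod 644 /opt/spark/jars/mysql-connector-j-8.3.0.jar

# Drop back to the unprivileged user (assumed to be named "spark").
USER spark
EOF

echo "wrote $dockerfile"
# Build it with: docker build -t spark-debug:latest -f "$dockerfile" "$build_dir"
```

With the jar under the jars directory, the deps entry in the manifest would point at local:///opt/spark/jars/mysql-connector-j-8.3.0.jar instead of the work-dir path.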

If permissions don't help, I would try spinning up a container with bash and exploring the file system to check whether the file you've added is there:

docker run --rm -it --entrypoint bash <image-name-or-id>
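
Inside that shell, a small helper can make the check explicit by distinguishing a missing jar from one that exists but is not readable by other users. This is only a sketch: check_jar is a hypothetical name, and it assumes the container's find supports -maxdepth and -perm (true for GNU and BSD find).

```shell
#!/bin/sh
# check_jar PATH: report whether PATH exists and is world-readable.
# Returns 0 if ok, 1 if the file is missing, 2 if the "other" read bit is unset.
check_jar() {
    jar_path="$1"
    if [ ! -f "$jar_path" ]; then
        echo "missing: $jar_path"
        return 1
    fi
    # find prints the path only when the "other" read bit (004) is set.
    if [ -z "$(find "$jar_path" -maxdepth 0 -perm -004)" ]; then
        echo "not world-readable: $jar_path"
        return 2
    fi
    echo "ok: $jar_path"
    return 0
}
```

For example, after pasting the function into the container shell, run check_jar /opt/spark/work-dir/mysql-connector-j-8.3.0.jar. A "not world-readable" result would point at file permissions rather than a wrong path.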


RyanZotti commented on July 2, 2024

@mereck Thanks! That did the trick, so I'm marking this issue as closed.

