
Comments (4)

pquentin commented on August 15, 2024

@nehalecky This sounds great! The current Jupyter init script leaves a lot to be desired, and needs to be fixed for each Spark release because it specifies the py4j path.

Thanks for working on this.

For a single jupyter instance, I think it's better to launch jupyter from pyspark (using PYSPARK_PYTHON=python and PYSPARK_DRIVER_PYTHON=jupyter) rather than launching spark from jupyter. It makes things much simpler. Not sure if this still applies with JupyterHub.
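To illustrate the py4j issue mentioned above: a setup that launches Spark from inside Jupyter usually has to put Spark's Python sources and the bundled py4j archive on PYTHONPATH itself. The sketch below assumes the usual layout `$SPARK_HOME/python/lib/py4j-<version>-src.zip`. The helper names `find_py4j_zip` and `spark_pythonpath` are made up for this example and are not part of the init script. Globbing for the archive avoids hard-coding a version that changes between Spark releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: locate the bundled py4j archive without hard-coding its version.
# Assumes the usual Spark layout: $SPARK_HOME/python/lib/py4j-<version>-src.zip
find_py4j_zip() {
    local spark_home="$1"
    local candidate
    for candidate in "$spark_home"/python/lib/py4j-*-src.zip; do
        # If the glob matches nothing, the pattern stays literal and this test fails.
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no py4j archive found under $spark_home/python/lib" >&2
    return 1
}

# Hypothetical helper: build the PYTHONPATH entries a "Spark from Jupyter"
# setup would otherwise have to spell out by hand.
spark_pythonpath() {
    local spark_home="$1"
    local py4j_zip
    py4j_zip="$(find_py4j_zip "$spark_home")" || return 1
    printf '%s\n' "$spark_home/python:$py4j_zip"
}
```

It could be used as `export PYTHONPATH="$(spark_pythonpath "$SPARK_HOME"):$PYTHONPATH"`. Launching Jupyter from pyspark sidesteps this step, since the pyspark launcher sets up the Python path on its own.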

from initialization-actions.

nehalecky commented on August 15, 2024

@pquentin thanks for the note!

> For a single jupyter instance, I think it's better to launch jupyter from pyspark (using PYSPARK_PYTHON=python and PYSPARK_DRIVER_PYTHON=jupyter) rather than launching spark from jupyter. It makes things much simpler.

This sounds like an interesting approach, and I'd be willing to implement it, but I am unfamiliar with these parameters. Could you explain how they simplify things? I'd like to understand this better.

> Not sure if this still applies with JupyterHub.

OK, good point. I think whatever mechanism we use to associate a Jupyter instance with PySpark should be consistent between Jupyter and JupyterHub. Do you agree? If so, how could we find out more about this?

Thanks!

pquentin commented on August 15, 2024

This is how we do it:

export PYSPARK_PYTHON=python
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook --no-browser --port 8123 --ip="*"'

pyspark \
    --conf spark.executorEnv.PYTHONHASHSEED=0 \
    ...

And sure, we need to use the same setup for both Jupyter and JupyterHub.
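For anyone wanting to try the settings above, here is a minimal sketch that wraps them in a function. The function name `launch_pyspark_notebook`, the port argument, and the `DRY_RUN` switch are assumptions added for illustration. Only the three environment variables and the `spark.executorEnv.PYTHONHASHSEED` setting come from the comment above, and the options elided there are left out. The idea is that pyspark starts the program named in PYSPARK_DRIVER_PYTHON as the driver, passing it PYSPARK_DRIVER_PYTHON_OPTS, so the notebook server comes up with Spark's Python path already configured.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the settings quoted above; the function name,
# the port argument, and the DRY_RUN switch are illustrative assumptions.
launch_pyspark_notebook() {
    local port="${1:-8123}"

    # Interpreter for PySpark (workers, and the driver unless overridden below).
    export PYSPARK_PYTHON=python
    # Program pyspark starts as the driver instead of a plain Python shell.
    export PYSPARK_DRIVER_PYTHON=jupyter
    # Arguments handed to that driver program.
    export PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port ${port} --ip=\"*\""

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of starting Spark.
        echo "pyspark --conf spark.executorEnv.PYTHONHASHSEED=0"
        return 0
    fi
    pyspark --conf spark.executorEnv.PYTHONHASHSEED=0
}
```

Running it with `DRY_RUN=1` set exports the variables and prints the command, which makes it easy to check the configuration before starting a cluster-backed notebook.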

nehalecky commented on August 15, 2024

closed with #31
