Comments (2)
Thanks for filing an excellent issue report.
It also creates a python source file (dask_test.py) containing a simple function on all nodes, and adds the file to the PYTHONPATH
What's happening here is that YARN manages the worker environments, so the PYTHONPATH you set manually beforehand isn't seen by the workers (YARN doesn't run users' .bashrc files). Mucking with PYTHONPATH is not the recommended way to install code in Python, as it relies on the environment variable being set appropriately in all environments. I recommend one of the following:
- Make your code a true package, and install it properly. Note that this doesn't require you to put it on pypi, pip can install fine from source or from a git url (e.g. pip install git+https://github.com/dask/dask-yarn.git).
- Copy your code manually into $HOME/miniconda/lib/python3.6/site-packages/ (depends on your python version/location).
- If your code is small enough you could define everything in the notebook/script/whatever and cloudpickle will handle serializing the code to all nodes. Anything in __main__ will be serialized automatically.
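As a rough illustration of the last option, here is a minimal sketch that assumes the function is defined directly in the script or notebook being run. The names inc and run_on_cluster are made up for this example, and cluster is assumed to be an already-created YarnCluster (or any other Dask cluster object).

```python
def inc(x):
    """Small function defined directly in the script, not in a separate module."""
    return x + 1


def run_on_cluster(cluster):
    """Submit inc to the given cluster and return the result (hypothetical helper)."""
    # Imported here so that inc can be defined and tried locally
    # even when no cluster is available.
    from dask.distributed import Client

    with Client(cluster) as client:
        # When this file is run as a script or notebook cell, inc lives in
        # __main__, so it is serialized along with the task and the workers
        # do not need dask_test.py on their import path.
        future = client.submit(inc, 41)
        return future.result()
```

Calling run_on_cluster(cluster) from the same script should return the value computed on a worker. Functions imported from a separate module are not shipped this way; for those, one of the install options above is needed.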
Using Client.upload_file will work if your cluster is fixed in size, but won't work if you add/remove nodes during operation (it only uploads things once, so new workers won't get the files). I don't recommend using it; it's around mostly for legacy reasons.
If you really want to set PYTHONPATH manually, you can pass worker_env={'PYTHONPATH': ...} to YarnCluster, and it will be set in your worker environments.
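A sketch of what passing worker_env might look like is below. The directory /home/hadoop/code and the helpers build_worker_env and start_cluster are assumptions made for illustration only; substitute whatever directory actually holds dask_test.py on every node.

```python
import os

# Hypothetical directory, present on every node, that contains dask_test.py.
CODE_DIR = "/home/hadoop/code"


def build_worker_env(code_dir, inherited=None):
    """Return worker environment variables with code_dir first on PYTHONPATH."""
    parts = [code_dir]
    if inherited:
        # Keep any existing entries after the new directory.
        parts.append(inherited)
    return {"PYTHONPATH": os.pathsep.join(parts)}


def start_cluster(environment):
    """Create a YarnCluster whose workers get the PYTHONPATH built above."""
    # Imported here so build_worker_env can be used without dask-yarn installed.
    from dask_yarn import YarnCluster

    # environment is the packaged Python environment for the workers;
    # worker_env is the keyword mentioned in the reply above.
    return YarnCluster(
        environment=environment,
        worker_env=build_worker_env(CODE_DIR),
    )
```

This only sets the variable in the worker containers; the directory still has to exist on each node, which is part of why installing the code as a package tends to be less fragile.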
Great. Both of these options worked for me:
- Make your code a true package, and install it properly. Note that this doesn't require you to put it on pypi, pip can install fine from source or from a git url (e.g. pip install git+https://github.com/dask/dask-yarn.git).
- If your code is small enough you could define everything in the notebook/script/whatever and cloudpickle will handle serializing the code to all nodes. Anything in __main__ will be serialized automatically.
Thanks for the quick response.