Java tracebacks are terrible; the important bit is hidden in the middle:
Suppressed: java.lang.UnsatisfiedLinkError: /tmp/libcom_anaconda_skein_shaded_netty_tcnative_linux_x86_64218435490097249322.so: /tmp/libcom_anaconda_skein_shaded_netty_tcnative_linux_x86_64218435490097249322.so: failed to map segment from shared object: Operation not permitted
This indicates that /tmp does not have exec privileges on your machine, which prevents the native libraries from being loaded properly. I've seen this before on worker nodes, but never on an edge node. Skein already deals with this when running in a container (we use the container local directory, which is always cleaned up), but we can't make this assumption when running on an edge node (as the driver does).
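A quick way to check this diagnosis, assuming a Linux edge node, is to look at the mount flags of the filesystem that holds /tmp. A filesystem mounted with noexec refuses to map files with execute permission, which typically surfaces as the "Operation not permitted" error above even when the directory's permission bits look normal. The sketch below uses os.statvfs; the helper name is made up for illustration, and os.ST_NOEXEC is only defined on Linux.

```python
import os


def is_mounted_noexec(path: str = "/tmp") -> bool:
    """Return True if the filesystem containing `path` is mounted noexec.

    Reads the mount flags reported by statvfs(). os.ST_NOEXEC is only
    available on Linux; on other platforms this returns False.
    """
    noexec_flag = getattr(os, "ST_NOEXEC", None)
    if noexec_flag is None:
        return False  # flag not exposed on this platform, so we cannot tell
    return bool(os.statvfs(path).f_flag & noexec_flag)


if __name__ == "__main__":
    target = "/tmp"
    if is_mounted_noexec(target):
        print(f"{target} is mounted noexec; native libraries cannot be loaded from it")
    else:
        print(f"{target} does not appear to be mounted noexec")
```

From a shell, findmnt --target /tmp lists the options of the mount that contains /tmp, including noexec if it is set.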
To get around this, setting the environment variable SKEIN_DRIVER_JAVA_OPTIONS="-Dio.netty.native.workdir=SOME_TEMPORARY_DIRECTORY" (where SOME_TEMPORARY_DIRECTORY is an existing directory that can be used as temporary storage) should work, but it doesn't seem to in my test setup. I'll debug this either tonight or on Monday.
from dask-yarn.
So, running export SKEIN_DRIVER_JAVA_OPTIONS="-Djava.io.tmpdir=SOME_TEMPORARY_DIRECTORY" does work. It's not clear to me why the difference matters (internally we set io.netty.native.workdir when patching on the worker nodes), but this should get you going.
Alternatively, you can set this programmatically:
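import skein
import dask_yarn

# Have the Skein driver's JVM use a temporary directory that allows execution,
# so its native libraries can be extracted and loaded there
client = skein.Client(java_options="-Djava.io.tmpdir=SOME_TEMPORARY_DIRECTORY")
# Pass that client to dask-yarn (remaining YarnCluster arguments elided)
cluster = dask_yarn.YarnCluster(skein_client=client, ...)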
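If you prefer the environment-variable route but want to set it from Python, a minimal sketch follows. It assumes the variable is read when the Skein driver is launched (so this should run before skein.Client() is created), that the chosen directory lives on a filesystem that allows execution, and that its path contains no spaces, since the options are joined with single spaces. The helper name use_java_tmpdir is hypothetical.

```python
import os


def use_java_tmpdir(path: str) -> str:
    """Point the Skein driver's java.io.tmpdir at `path` (hypothetical helper).

    Creates the directory if needed and adds the option to any existing
    SKEIN_DRIVER_JAVA_OPTIONS value without duplicating it.
    """
    path = os.path.abspath(path)
    os.makedirs(path, exist_ok=True)  # the directory must already exist for the JVM
    option = f"-Djava.io.tmpdir={path}"
    # Keep any options that were already set; assumes `path` has no spaces
    parts = os.environ.get("SKEIN_DRIVER_JAVA_OPTIONS", "").split()
    if option not in parts:
        parts.append(option)
    value = " ".join(parts)
    os.environ["SKEIN_DRIVER_JAVA_OPTIONS"] = value
    return value
```

For example, calling use_java_tmpdir(os.path.expanduser("~/skein-tmp")) before creating skein.Client() would apply the option; the ~/skein-tmp location is only an example.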
That works for me. I was able to get it to work without this Java option once, and another user was using it successfully. I don't know why it is intermittent. The permissions on /tmp are listed as drwxrwxrwt. 1300 root root 5128192 Feb 15 20:31, which is not exec, but Spark jobs use this directory when launching from the edge nodes.
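For reference, a mode string like the one in that listing can be decoded with Python's stat module. In ls output, a lowercase t in the last position means the sticky bit is set and others also have execute permission (an uppercase T would mean sticky without execute), so the mode bits alone may not be what blocks execution; a noexec mount option on the filesystem is another possible cause of the "Operation not permitted" error. A small sketch, with a helper name made up for illustration:

```python
import os
import stat


def describe_dir_mode(path: str) -> dict:
    """Report the permission bits relevant to loading files from `path`."""
    mode = os.stat(path).st_mode
    return {
        "mode": stat.filemode(mode),               # e.g. 'drwxrwxrwt' for a typical /tmp
        "sticky": bool(mode & stat.S_ISVTX),       # shown as 't' or 'T' in the last position
        "others_exec": bool(mode & stat.S_IXOTH),  # set when the last character is 't'
    }
```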
I can't comment on why things worked intermittently (perhaps an admin changing permissions?), but the reason things don't work for Skein is that it has native library dependencies which need to be decompressed into a temporary directory and loaded from there. Spark may not have this need, or the admins may have configured Spark to avoid this issue (I'm not sure whether this is an issue for Spark).
I am talking to the admins, but I think I found another application that ran into this intermittent /tmp permission issue. Happy to close.
If you figure out why it's intermittent, I'd be curious to know. If this is something we can easily work around, I wouldn't be opposed to a fix.
Related Issues (20)
- AWS EMR bootstrap script fails
- Conda environment does not activate
- Dask Scheduler host/port Not Written to Skein Key-Value Storage When YARN Application Restarts
- Move default branch from "master" -> "main"
- YarnCluster.shutdown() Won't Work on EMR, results in `concurrent.futures._base.CancelledError`
- Verify that Read the Docs is building after master -> main
- YarnCluster hangs
- wait_for_workers got stuck when to create cluster but application failed on yarn
- dask-yarn job fails with dumps_msgpack ImportError
- register workers of scheduler are less than workers in dashborad
- can't upload files
- EMR 6.3.0 Bootstrap Action BOOTSTRAP_FAILURE : Python 3.9 support?
- Application Failure When Submitting Dask-Yarn Model Inferencing Job Remotely
- FileNotFoundError: [Errno 2] No such file or directory: 'yarn'
- Jupyter Notebook Cell Hangs after submitting job to remote EMR cluster
- distributed 2022.3.0 no more compatible with dask-yarn because of missing "status" attribute in YarnCluster
- YarnCluster() does not initialize but runs indefinetly
- AttributeError while running dask on amazon EMR.
- .skein.sh: line 2: environment/bin/python: No such file or directory
- Bootstrapping for 40min, when use the script.