Comments (20)
Yes, I can help with some of this. To my mind, the most difficult thing will be managing the lifecycle of the clusters. Your basic description of the setup sounds reasonable. For communicating between the server side and the client side, I think that rather than a config JSON we should probably model it after the rest of the notebook REST API. In particular, the sessions API is probably a good place to start.
Major questions that I have:
- How do we poll for clusters in a backend-agnostic way? Does dask-distributed have all the abstractions we need?
- What are things that need to be configurable on the backend? How much of that should be exposed to the frontend vs configured at server launch time?
from dask-labextension.
So whenever someone starts up a `Client` in a Python session we would optionally hit some address to see if it will respond? Presumably it responds with the address that we should check? That could work.
I can imagine doing this either over HTTP similar to how Jupyter does things, or using Dask comms instead.
```python
>>> client = Client('http://localhost/api/dask')
>>> client.scheduler.address
'tcp://some-other-address:####'

>>> client = Client('tcp://localhost:8786')  # actually the address of our nbserver extension
>>> client.scheduler.address
'tcp://some-other-address:####'
```
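One way the HTTP variant could work, sketched with the standard library only: the server extension answers a GET with the scheduler's real address, and the client then connects to that address directly. The endpoint path `/api/dask` and the JSON shape here are assumptions for illustration, not the real extension API.

```python
# A standard-library sketch of the HTTP address-discovery idea: the
# notebook server extension answers a GET with the scheduler's real
# address, and the Client then connects to that address directly.
# The endpoint path and JSON field name are assumptions, not the real API.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

SCHEDULER_ADDRESS = "tcp://some-other-address:8786"  # placeholder address

class DaskDiscoveryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Any GET (e.g. /api/dask) returns the scheduler's address as JSON
        body = json.dumps({"scheduler_address": SCHEDULER_ADDRESS}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet

server = HTTPServer(("localhost", 0), DaskDiscoveryHandler)  # ephemeral port
threading.Thread(target=server.serve_forever, daemon=True).start()

port = server.server_address[1]
with urlopen(f"http://localhost:{port}/api/dask") as resp:
    info = json.loads(resp.read())
server.shutdown()

print(info["scheduler_address"])  # the address a real Client would use next
```

A real `Client` wrapper would then open a second connection to the returned `tcp://` address rather than stopping at the JSON response.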
> What are things that need to be configurable on the backend? How much of that should be exposed to the frontend vs configured at server launch time?
Well, to start we'll need to decide what library we're using to construct clusters. Common choices today include dask-kubernetes, dask-jobqueue, dask-yarn, and the LocalCluster in the core dask.distributed library. This should probably be determined by configuration, and not by the user directly.
At runtime we'll want users to be able to start, stop, and restart their cluster. We'll also want them to have numerical or text inputs for number of cores and memory. They'll also want to be able to hit "Adapt" and have Dask take over the decision about cores and memory.
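The configuration-driven choice of cluster library could look something like this sketch. The config key, the registry contents, and the lazy-import approach are illustrative assumptions, not the extension's actual mechanism:

```python
# Hypothetical sketch of config-driven backend selection: which cluster
# library to use (dask-kubernetes, dask-jobqueue, LocalCluster, ...) is
# decided by server configuration, not by the user directly. The config
# format and registry contents are assumptions for illustration.
import importlib

CLUSTER_CLASSES = {
    "local": ("dask.distributed", "LocalCluster"),
    "kubernetes": ("dask_kubernetes", "KubeCluster"),
    "pbs": ("dask_jobqueue", "PBSCluster"),
}

def resolve_cluster_class(config):
    """Look up and import the configured cluster class lazily."""
    module_name, class_name = CLUSTER_CLASSES[config["backend"]]
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# e.g., with {"backend": "local"} this would import dask.distributed and
# return LocalCluster; nothing is imported until the first cluster request.
```

Deferring the import until first use means the server extension does not pay for (or fail on) backends that are configured but never used.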
> So whenever someone starts up a `Client` in a Python session we would optionally hit some address to see if it will respond? Presumably it responds with the address that we should check? That could work.
Yes, we can have something like a `dask/clients` endpoint that returns a list of active clients, as well as ids for them. We can then hit `dask/clients/clientid` with `GET`, `DELETE`, etc. requests to manage them. We can poll the list every few seconds to keep up to date. This is pretty close to how we handle kernel sessions in the application. I am currently thinking that our server extension would just proxy the client dashboard URLs and comms and such, but you have a better idea of the networking requirements there than I do.
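The endpoint semantics described here can be sketched without a web framework. A real implementation would be Tornado handlers in the server extension; plain functions over an in-memory registry (an assumption for illustration) show the intended `GET`/`DELETE` behavior:

```python
# Illustrative sketch of the proposed dask/clients endpoint semantics,
# modeled loosely on the notebook sessions API. The registry and the
# function names are stand-ins, not the real extension implementation.
import uuid

registry = {}  # client id -> metadata, stand-in for real managed objects

def create():
    # POST dask/clients: register a new entry and return its id
    client_id = uuid.uuid4().hex
    registry[client_id] = {"id": client_id, "status": "running"}
    return client_id

def list_all():
    # GET dask/clients: the list the frontend polls every few seconds
    return list(registry.values())

def get_one(client_id):
    # GET dask/clients/<clientid>
    return registry[client_id]

def delete(client_id):
    # DELETE dask/clients/<clientid>: tear the entry down
    registry.pop(client_id, None)

cid = create()
assert get_one(cid)["status"] == "running"
delete(cid)
print(len(list_all()))  # 0 after deletion
```

Polling `list_all` every few seconds from the frontend mirrors how the application keeps its view of kernel sessions current.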
> Well, to start we'll need to decide what library we're using to construct clusters. Common choices today include dask-kubernetes, dask-jobqueue, dask-yarn, and the LocalCluster in the core dask.distributed library. This should probably be determined by configuration, and not by the user directly.
Are the abstractions here sufficient that we could hit multiple (or all) of these use cases with a single extension, and allow their selection via a config option?
> At runtime we'll want users to be able to start, stop, and restart their cluster. We'll also want them to have numerical or text inputs for number of cores and memory. They'll also want to be able to hit "Adapt" and have Dask take over the decision about cores and memory.
I think this should be doable via a REST API.
> Are the abstractions here sufficient that we could hit multiple (or all) of these use cases with a single extension, and allow their selection via a config option?
I think so, yes.
> Yes, we can have something like a `dask/clients` endpoint that returns a list of active clients
I think that we'll want to switch out the term `clients` for `clusters` or `schedulers`. The client is an object that the user will need to interact with in their notebook/script/whatever. That object will need the address of the scheduler to connect to.
@ian-r-rose perhaps we should chat about this real-time? We might be able to bounce back and forth and come up with a plan more quickly. I'm around most of today and tomorrow if you're free.
Sure, I am around and pretty flexible today. Feel free to ping me on Gitter and we can set up a room.
@ian-r-rose and I had a quick chat, and we agreed that:
- a server extension should probably manage a few clusters, not just one
- a user might attach a particular cluster to a notebook kernel by clicking and dragging something into a notebook
- on the server side this can probably be a fairly simple tornado web application
As an initial set of operations, the following probably work pretty well:

```python
from dask.distributed import LocalCluster

cluster = LocalCluster(threads_per_worker=2, memory_limit='4GB')  # configure workers and start
cluster.scale(10)  # scale cluster to ten workers
cluster.scale(2)   # scale cluster down to two workers
cluster.adapt(minimum=0, maximum=10)  # adapt cluster between 0 and 10 workers
cluster.close()    # shut down cluster
```
We may at some point want to start these running on the same event loop as the Jupyter web server, I'm not sure. This will probably affect some discussions that we're thinking about for deployment now upstream.
It looks like you were right to be concerned about the tornado event loop @mrocklin. In my initial explorations, just importing `dask.distributed` in the same process as the one running the notebook server seems to cause problems. Specifically, the notebook server no longer responds to HTTP requests. Any ideas about what might be causing the problem or how we could get around it? Since it seems to be happening at import time, I don't really know where to start looking.
https://github.com/ian-r-rose/dask-labextension/tree/serverextension
@ian-r-rose I'm happy to investigate. This may sound dumb, but what's the right way to install and test this?
Thanks @mrocklin. You can install it with

```shell
pip install -e .
jupyter serverextension enable --sys-prefix dask_labextension
```

This attempts to add an additional REST endpoint to the web server. However, I was able to reproduce the problem with a do-nothing extension that just imported `dask.distributed`.
My suspicion is that both `dask.distributed` and the notebook server are trying to wrest control of the default tornado `IOLoop` and stepping on each other's toes, but I don't know that for sure.
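For what it's worth, one generic way to sidestep this kind of contention (illustrative only; not necessarily what either library should do) is to run the background machinery on its own event loop in a dedicated thread, leaving the main thread's default loop untouched:

```python
# A generic asyncio pattern for avoiding contention over the default
# event loop: background work runs on a separate loop in its own thread,
# so nothing fights over the main thread's loop. This is a sketch of the
# pattern, not a description of what dask.distributed actually does.
import asyncio
import threading

background_loop = asyncio.new_event_loop()

def run_background_loop():
    # set_event_loop only affects this thread; the main thread's default
    # loop is left alone
    asyncio.set_event_loop(background_loop)
    background_loop.run_forever()

thread = threading.Thread(target=run_background_loop, daemon=True)
thread.start()

async def background_task():
    await asyncio.sleep(0.01)
    return "done on background loop"

# Schedule work on the background loop from the main thread
future = asyncio.run_coroutine_threadsafe(background_task(), background_loop)
print(future.result(timeout=5))

background_loop.call_soon_threadsafe(background_loop.stop)
thread.join(timeout=5)
```

The trade-off is that any object bound to the background loop must be touched only via `call_soon_threadsafe` or `run_coroutine_threadsafe`, never directly from the main thread.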
Some binary search of imports and code led to this diff on the Dask side, which seems to solve the immediate problem:
```diff
diff --git a/distributed/utils.py b/distributed/utils.py
index df7561aa..dcdd7f5e 100644
--- a/distributed/utils.py
+++ b/distributed/utils.py
@@ -1394,8 +1394,8 @@ def reset_logger_locks():
 # Only bother if asyncio has been loaded by Tornado
-if 'asyncio' in sys.modules:
-    fix_asyncio_event_loop_policy(sys.modules['asyncio'])
+# if 'asyncio' in sys.modules:
+#     fix_asyncio_event_loop_policy(sys.modules['asyncio'])

 def has_keyword(func, keyword):
```
I'll look into why we did this in the first place. In the meantime, though, applying this diff directly may allow us to move forward.
Seems to be a workaround for tornadoweb/tornado#2183
OK, after looking more at this I'm not sure that Dask is doing something wrong here. I've standardized things on the Dask side at dask/distributed#2326.
If possible I think we should ask someone on the Jupyter side about why this might cause issues. Who is the right contact for this today?
@minrk do you have thoughts on why adding the following lines might break the Jupyter server?
```python
import asyncio
import tornado.platform.asyncio

asyncio.set_event_loop_policy(tornado.platform.asyncio.AnyThreadEventLoopPolicy())
```
Thanks for looking into this @mrocklin, I'll apply your fix while we work out a more permanent solution. @Carreau recently did a lot of work on the IPython event loop and may have some insights as well.
I can have a look. I've been poking at async stuff, and in some places we are still doing things the old way, calling `ensure_future` instead of yielding anything that is not None. This has led, for me, to some prototypes just not running coroutines.
So far I'm working on deploying a JupyterHub on the merced cluster, and once this is done I'll likely start integrating dask, so I'm happy to be a guinea pig and debug these things.
I just need to get things to work first :-)
Thanks for the info @Carreau.
Fixed by #31