
pangeo-stacks's Introduction

!! This repo is no longer being monitored !!

active development now happening here: https://github.com/pangeo-data/pangeo-docker-images

Pangeo Stacks

Curated Docker images for use with Jupyter and Pangeo


This repository contains a few curated Docker images that can be used with deployments of the Pangeo Helm Chart. Each image in this repository is configured and built using repo2docker and continuously deployed to DockerHub. Importantly, each image built in this repo includes the minimum required libraries to do scalable computations with Pangeo (via dask-kubernetes).

Current Notebook Images:

Image           | Description                                 | Link
----------------|---------------------------------------------|----------
base-notebook   | A bare-bones image with Jupyter and Dask    | DockerHub
pangeo-notebook | A complete image with many Python packages  | DockerHub
pangeo-esip     | An image customized for ESIP use            | DockerHub

Customize images with the -onbuild variants

You can customize the images here in common ways by using the image variants that have the -onbuild suffix. If your Dockerfile inherits from an -onbuild Pangeo image, you automatically get the following features:

  1. The contents of your directory are copied, with appropriate permissions, into the image. The files will be present under the directory pointed to by the ${REPO_DIR} Docker environment variable.

  2. If you have any of the following files in your repository, they are used to automatically customize the image similar to what repo2docker (used by mybinder.org) does:

    a. requirements.txt installs Python packages with pip
    b. environment.yml installs conda packages
    c. apt.txt lists Ubuntu packages to be installed
    d. postBuild is a script (in any language) that is run automatically after the other customization steps, letting you execute arbitrary code.

    These files can also live inside a binder/ directory rather than at the top level of your repository.

For example, if you want to start from the base pangeo-notebook image but add the django Python package, you would do the following.

  1. Create a Dockerfile in your repo with just the following content:

    FROM pangeo/pangeo-notebook-onbuild:<version>
    
  2. Add a requirements.txt file with the following contents

    django
    

And that's it! Now you can build the image any way you wish (on a binder instance, with repo2docker, or just with docker build), and it'll do the customizations for you automatically.
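
If you would rather manage packages with conda, an environment.yml works the same way. A minimal sketch (the package choice is just illustrative):

    # environment.yml -- a conda-based alternative to the requirements.txt above
    channels:
      - conda-forge
    dependencies:
      - django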

Adding new images

It is easy to add additional images. The basic steps involved are:

  1. Open an Issue to discuss adding your image.
  2. Copy the base-notebook directory and name it something informative.
  3. Modify the contents of the binder directory, adding any configuration you need according to the repo2docker documentation.
  4. Edit the TravisCI configuration file to include the new image.
  5. Push your changes to GitHub and open a Pull Request.

CI/CD

The images in Pangeo-stacks are built and deployed continuously using TravisCI. Images are versioned using the CalVer (calendar versioning) scheme.

Build locally

The images here can be built locally using repo2docker. The following example demonstrates how to build the base-notebook image:

repo2docker --no-run --user-name=jovyan --user-id 1000 \
    --image-name=pangeo/base-notebook ./base-notebook
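
As a quick smoke test of the result (optional, and not part of any official workflow here), you can run a one-off command in the freshly built image:

    # confirm the image runs and that dask is importable
    docker run --rm pangeo/base-notebook python -c "import dask; print(dask.__version__)"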

Related projects

  • Jupyter/docker-stacks: Ready-to-run Docker images containing Jupyter applications
  • repo2docker: A tool to build, run, and push Docker images from source code repositories; the resulting images run a Jupyter server
  • Pangeo Helm Chart: The helm chart for installing Pangeo.

pangeo-stacks's People

Contributors

arokem, ocefpaf, rabernat, rsignell-usgs, scollis, scottyhq, shanicetbailey, tomaugspurger, yuvipanda


pangeo-stacks's Issues

Intermittent failure of jupyter labextension install

I have been working to build the Pangeo notebooks locally, and I have noticed that the base-notebook postBuild fails, but only intermittently, during the jupyter labextension install:

Building jupyterlab assets (build:prod:minimize)
An error occured.
RuntimeError: JupyterLab failed to build
See the log file for details:  /tmp/jupyterlab-debug-pxded7y6.log
Removing intermediate container 5152aa5d766e
The command '/bin/sh -c ./binder/postBuild' returned a non-zero code: 1
Traceback (most recent call last):
  File "build.py", line 183, in <module>
    main()
  File "build.py", line 168, in main
    r2d_build(
  File "build.py", line 85, in r2d_build
    r2d.build()
  File "/Users/tjc/miniconda3/envs/repo2docker/lib/python3.8/site-packages/repo2docker/app.py", line 700, in build
    raise docker.errors.BuildError(l["error"], build_log="")
docker.errors.BuildError: The command '/bin/sh -c ./binder/postBuild' returned a non-zero code: 1

But sometimes it works, sometimes it doesn't, with no changes to any of our code. It's pretty weird. I'm guessing some sort of network timeout from some server somewhere. Has anyone seen this?

need parquet support

I tried to load a dask dataframe from parquet and was told I needed to install either fastparquet or pyarrow for this to work.

I think one of these should be in our notebook base image. But which one?

How to use pangeo-stacks images with dask-labextension layout in binder repos?

So I'm trying to work on pangeo-data/pangeo-tutorial#14.

I decided to use the pangeo/pangeo-notebook-onbuild:2019.04.19 Docker image, as found in several recent Pangeo deployments. This seems to work; however, I've lost the dask-labextension layout, and I'm not sure what I should do.

Looking at https://github.com/pangeo-data/pangeo-cloud-federation/tree/staging/deployments/nasa/image/binder or https://github.com/pangeo-data/pangeo-cloud-federation/tree/staging/deployments/ocean/image/binder, there seem to be some post-build config files, but no dask-labextension layout.

So what is the correct configuration to use to have a basic Pangeo notebook image with a nice dask-labextension layout?

adding GPU-enabled tensorflow and pytorch

We would like to do deep learning on GPU nodes in Google Cloud. (I know, just use Colaboratory, right?)

There is this blog post from Anaconda which describes the performance benefits of using the conda build of tensorflow. They also provide a tensorflow-gpu package.

However, none of this is available on conda-forge. There is a long issue about why it is hard or impossible to build GPU-enabled packages on conda-forge.

So what should we do? Is it feasible to switch our whole notebook image to defaults rather than conda-forge?

stick to conda-forge packages?

Cross posting this issue from pangeo-cloud-federation since it is relevant here:
pangeo-data/pangeo-cloud-federation#254

In particular, the base image is mixing conda-forge and pip:
https://github.com/pangeo-data/pangeo-stacks/blob/master/base-notebook/binder/environment.yml

But then pangeo-notebook installs some of the same stuff via conda-forge:
https://github.com/pangeo-data/pangeo-stacks/blob/master/pangeo-notebook/binder/environment.yml

I think this is leading to 'inconsistent environment' messages in the hub. @rabernat and @jhamman , could you take a look at the linked issue? I'm wondering if it might be good to:

  1. use just conda-forge
  2. unpin version numbers (since the docker builds by date effectively save references to compatible package combinations)

Cannot conda

Once in a notebook, I've tried creating a new conda environment, but I get permission errors.

$ conda create -n testenv python
Collecting package metadata: failed

NotWritableError: The current user does not have write permissions to a required path.
  path: /srv/conda/pkgs/cache/3e39a7aa.json
  uid: 1000
  gid: 1000

If you feel that permissions on this path are set incorrectly, you can manually
change them by executing

  $ sudo chown 1000:1000 /srv/conda/pkgs/cache/3e39a7aa.json

In general, it's not advisable to use 'sudo conda'.

add git commit hash to docker tag?

Currently we are tagging our images based on calver: CALVER="$( date '+%Y.%m.%d' )". This could lead to problems if we need to push multiple changes per day.

What if the tags were instead like our helm charts: 19.03.09-a0475df etc?
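
A minimal sketch of what that could look like in the build scripts (variable and image names are illustrative):

    # tag images with calver plus the short git hash, e.g. 2019.03.09-a0475df
    TAG="$( date '+%Y.%m.%d' )-$( git rev-parse --short HEAD )"
    docker tag pangeo/base-notebook "pangeo/base-notebook:${TAG}"
    docker push "pangeo/base-notebook:${TAG}"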

trying to use onbuild with binder, error "unknown flag: chown"

I'm trying out the new onbuild stuff in this repo:
https://github.com/rabernat/local_stencil_ml_examples

My Dockerfile looks like this

FROM pangeo/pangeo-notebook-onbuild:2019.04.19

# https://mybinder.readthedocs.io/en/latest/tutorials/dockerfile.html#preparing-your-dockerfile
ENV NB_USER jovyan
ENV NB_UID 1000
ENV HOME /home/${NB_USER}

COPY . ${HOME}
USER root
RUN chown -R ${NB_UID} ${HOME}
USER ${NB_USER}

I got this error.

Waiting for build to start...
/usr/local/lib/python3.7/site-packages/repo2docker/utils.py:214: FutureWarning: Possible nested set at position 1815
  """, re.VERBOSE)
Picked Git content provider.
Cloning into '/tmp/repo2docker_8lb8bku'...
HEAD is now at fce01c6 first commit
Using DockerBuildPack builder
Step 1/8 : FROM pangeo/pangeo-notebook-onbuild:2019.04.19
# Executing 5 build triggers...
Step 1/1 : COPY --chown=1000:1000 . ${REPO_DIR}
Unknown flag: chown

I'm going to try removing all the extra docker stuff.

dask workers can be scheduled on hub nodes with default config

Our current setup allows for dask pods on hub nodes:
https://github.com/pangeo-data/pangeo-stacks/blob/master/base-notebook/binder/dask_config.yaml

This seems to be due to 'prefer' rather than 'require' when scheduling:
https://github.com/dask/dask-kubernetes/blob/ec4666a4af5acad03c24b84aca4fcf8ccd791b4f/dask_kubernetes/objects.py#L177

which results in the following for pods:

spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: k8s.dask.org/node-purpose
            operator: In
            values:
            - worker
        weight: 100

Not sure how we modify the config file to get the stricter 'require' condition like we have for notebook pods:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: k8s.dask.org/node-purpose
            operator: In
            values:
            - worker
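
One hedged guess at a fix (untested; whether dask-kubernetes merges this fragment into its generated pod spec, or instead needs a complete worker template, depends on the version in use) would be to set the stricter affinity in the worker template section of dask_config.yaml:

    kubernetes:
      worker-template:
        spec:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: k8s.dask.org/node-purpose
                    operator: In
                    values:
                    - worker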

@jhamman , @TomAugspurger

Provide ONBUILD variants for easier customization

Without having to understand Dockerfile semantics, it should be possible for end users of these images to:

  1. Specify which base image they want
  2. Customize it additionally (with environment.yml, postBuild, start, etc)

Based on jupyterhub/repo2docker#487 (comment) and some chat with @jhamman, I think it would be great to have automatically generated -onbuild variants of the images here that support easy customization with environment.yml, postBuild, etc files. They'll only support a subset of what repo2docker supports - for example, since there is no base image with Julia support, REQUIRE files won't work. But this is probably ok for now.

Cartopy in pangeo.pydata.org

It appears that cartopy is not available on pangeo.pydata.org:

ModuleNotFoundError: No module named 'cartopy'

Should I install it locally, or will it be included?

Addition of dask-gateway causes dependency panic

When using the latest base-notebook with this env:
https://github.com/scollis/binderhack/blob/master/binder/environment.yml
I get (after a glacial wait)

UnsatisfiableError: The following specifications were found to be incompatible with a past
explicit spec that is not an explicit spec in this operation (pip):

  - boto -> python[version='>=2.7,<2.8.0a0,>=3.5,<3.6.0a0,>=3.6,<3.7.0a0,>=3.7,<3.8.0a0'] -> pip
  - boto3 -> s3transfer[version='>=0.1.10,<0.2.0,>=0.2.0,<0.3.0'] -> botocore[version='>=1.12.36,<2.0.0,>=1.3.0,<2.0.0'] -> urllib3[version='>=1.20,<1.24,>=1.20,<1.25'] -> pyopenssl[version='>=0.14'] -> cryptography[version='>=1.9,>=2.2.1'] -> cffi[version='>=1.7'] -> pycparser -> python[version='3.6.*,>=3.6,<3.7.0a0,>=3.7,<3.8.0a0'] -> pip

I built a new Docker image without dask-gateway, and I have no issues building an image on top of that using repo2docker with it as the base Docker image.

Can I run one of these docker images in a standalone cloud VM?

I just tried and failed to run the pangeo-notebook image in a standalone, hand-made Google Compute Engine instance. I tried following the Google Cloud docs, but whenever my VM booted up, I was "ryan_abernathey" instead of "jovyan" and couldn't find any of the familiar environment or commands.

Is this possible? If so, could we provide some instructions for how to manually boot up an image and connect to the notebook server?
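
For reference, a bare-bones sketch of doing this by hand on any Docker host (the port mapping and jupyter command are typical defaults, not something this repo documents):

    # publish the notebook server on port 8888 of the VM
    docker run -d -p 8888:8888 pangeo/pangeo-notebook:latest \
        jupyter notebook --ip=0.0.0.0 --port=8888

You would then browse to port 8888 on the VM's external IP (after opening it in the firewall) and log in with the token printed in the container logs (docker logs <container>).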

Rasterio not working in latest notebook image

Running docker run pangeo/notebook:86665a6, followed by docker exec -it <name of container> /bin/bash, and finally python and import rasterio:

>>> import rasterio
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.6/site-packages/rasterio/__init__.py", line 22, in <module>
    from rasterio._base import gdal_version
ImportError: libncurses.so.6: cannot open shared object file: No such file or directory
>>> import cartopy
>>> 

Does anyone know of a way to configure the Dockerfile so that the build stops if a package doesn't install properly?
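
One guess at that last question (a sketch, assuming postBuild is a bash script): make the script fail fast, so any failing command aborts the docker build instead of being ignored:

    #!/bin/bash
    # abort on the first error, on unset variables, and on failures inside pipelines
    set -euo pipefail

    jupyter labextension install @jupyter-widgets/jupyterlab-manager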

Updating dask

Opening an issue to flag that we should update dask to its newest version, 2.4.0.

clean out package dir to reduce image size?

Our docker images are storing about 2.6 GB worth of conda packages in /srv/conda/pkgs

$ du -h -d1 /srv/conda
4.0K    /srv/conda/envs
2.7G    /srv/conda/pkgs
4.0K    /srv/conda/compiler_compat
31M     /srv/conda/bin
128K    /srv/conda/etc
4.0K    /srv/conda/conda-bld
25M     /srv/conda/conda-meta
539M    /srv/conda/lib
12K     /srv/conda/x86_64-conda_cos6-linux-gnu
7.6M    /srv/conda/include
8.0K    /srv/conda/ssl
8.0K    /srv/conda/man
314M    /srv/conda/share
12K     /srv/conda/shell
92K     /srv/conda/libexec
412K    /srv/conda/sbin
640K    /srv/conda/mkspecs
8.0K    /srv/conda/condabin
20K     /srv/conda/docs
12K     /srv/conda/translations
36K     /srv/conda/doc
332K    /srv/conda/plugins
4.0K    /srv/conda/phrasebooks
156K    /srv/conda/qml
12K     /srv/conda/var
3.6G    /srv/conda

Do we actually need this? Can we clean it out and drastically reduce the size of the images?
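
If nothing downstream needs the cached tarballs, the usual fix (a sketch; it would run as a final build step, e.g. at the end of postBuild) is:

    # remove index caches, lock files, tarballs, and packages not linked into any env
    conda clean --all --yes

Note this only deletes packages that are not hardlinked into an environment, so the savings may be less than the full 2.7 GB.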

Expose environment file from each image

As discussed in today's call, it would be great if we could download an environment.yml file for each of the docker images, to recreate the same environment locally.

Ideally this would appear on the fancy new website.
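
A low-tech sketch of how an image could expose this (the path is illustrative, though the images do use a conda env named notebook):

    # e.g. at the end of postBuild: snapshot the fully solved environment
    conda env export -n notebook > "${HOME}/environment.yml"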

No sudo

Not sure if this is an intentional thing but sudo has disappeared compared to the old images.

Why does conda have to solve an environment with onbuild images?

With @cgentemann, I'm trying to update our tutorial to use the latest onbuild image. We are using pangeo/pangeo-notebook-onbuild:2020.02.16-e0f17a8.

I'm confused because binder is taking a LONG time to build the image. It has been solving an environment for ~30 minutes.

Waiting for build to start...
# Executing 5 build triggers
 ---> Running in 0bed3634fef6
Removing intermediate container 0bed3634fef6
 ---> Running in 0655f8a0a683
Reading package lists...
Building dependency tree...
Reading state information...
vim is already the newest version (2:8.0.1453-1ubuntu1.1).
0 upgraded, 0 newly installed, 0 to remove and 29 not upgraded.
Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies.  Conda may not use the correct pip to install your packages, and they may end up in the wrong place.  Please add an explicit pip dependency.  I'm adding one for you, but still nagging you.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working...

There is no environment.yml file in the binder:
https://github.com/cgentemann/osm2020tutorial/tree/master/binder

So I'm confused why an environment needs to be solved. The older onbuild images didn't work this way.

new 2020 image builds with 'start' config failing to launch on binder or jupyterhub

The 2020.02.03-0853dbb and latest tags are not launching on binder or JupyterHubs; this seems related to recent changes enabling start scripts for onbuild images:
https://github.com/pangeo-data/pangeo-stacks/commits/master/onbuild/r2d_overlay.py

Here is the traceback from pod logs that get stuck in CrashLoopBackOff

Traceback (most recent call last):
  File "/usr/local/bin/r2d_overlay.py", line 167, in <module>
    main()
  File "/usr/local/bin/r2d_overlay.py", line 163, in main
    start()
  File "/usr/local/bin/r2d_overlay.py", line 135, in start
    ['/bin/bash', '-c', command], preexec_fn=applicator._pre_exec
  File "/srv/conda/envs/notebook/lib/python3.7/subprocess.py", line 358, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/srv/conda/envs/notebook/lib/python3.7/subprocess.py", line 339, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/srv/conda/envs/notebook/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/srv/conda/envs/notebook/lib/python3.7/subprocess.py", line 1552, in _execute_child
    raise child_exception_type(err_msg)
subprocess.SubprocessError: Exception occurred in preexec_fn.

pinging @rsignell-usgs who encountered this, also @jhamman, @yuvipanda, and @scollis for possible fixes

Don't build all images every time

I noticed that our travis config just builds all the images all the time.

We could replace this with the logic in hubploy that checks, based on the commit, whether a build is needed.
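
A minimal sketch of such a check (TRAVIS_COMMIT_RANGE is a standard Travis variable; the per-image directory layout matches this repo):

    # rebuild an image only when files under its directory changed in this push
    if git diff --name-only "${TRAVIS_COMMIT_RANGE}" | grep -q '^base-notebook/'; then
        echo "base-notebook changed; rebuilding"
    fi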

Noisy Dask JupyterLab Warnings

New images with Dask>2.5 have lots of warning messages being displayed in JupyterLab. We attempted to fix this with #87, but it still seems to be an issue with the latest image (2019-11-14):

from dask_kubernetes import KubeCluster
cluster = KubeCluster()
cluster.scale(2);
cluster

Results in:

distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO -   Scheduler at: tcp://192.168.140.221:42003
distributed.scheduler - INFO -   dashboard at:                     :8787
Then running:

cluster.adapt(minimum=2, maximum=10);

Results in an infinite output loop with these messages:

distributed.scheduler - INFO - Suggest closing workers: ['tcp://192.168.125.103:46797', 'tcp://192.168.103.24:34165']

@TomAugspurger and @jacobtomlinson - seems like these messages should not be appearing by default. Are we enabling them somehow in our dask configuration? https://github.com/pangeo-data/pangeo-stacks/blob/master/base-notebook/binder/dask_config.yaml

Unable to modify dask-kubernetes configuration

Derivative images seem to be unable to modify the dask-kubernetes configuration without rebuilding the base image from scratch.

This seems rather impractical, since it's only natural that one might want to modify the configuration of the dask workers. For example, we might want to taint the worker nodes so that core pods cannot be scheduled there (this is a problem for me at the moment). This means the dask pods need the corresponding toleration, which cannot presently be added without rebuilding base.

I wanted to go ahead and start a discussion on how to implement this.

set up CI to push to dockerhub

The current travis CI setup just builds the images. The next step is to add the bits necessary to authenticate with dockerhub and push built images.

Editor in docker images

Hi, I cannot find any editor installed in the docker images -- sometimes it is useful to be able to open a shell and modify some files while developing.

Could we add a minimal editor to the docker images?

cache_from string substitution

In the file build.py

    if cache_from:
        cmd +=' --cache-from {cache_from}'

should be

    if cache_from:
        cmd += f' --cache-from {cache_from}'

Adding dask-labextension

Admittedly I don't really know how things are set up here, but would it make sense to install dask-labextension and configure it in the image?

Support EXTRA_*_PACKAGES like dask images

From https://gitter.im/pangeo-data/Lobby?at=5dc6325beeb63e1a837e6f4a.

Dask does this in https://github.com/dask/dask-docker/blob/master/base/prepare.sh.

IIUC, we may need to add the EXTRA_PIP_PACKAGES stuff to either postBuild or start (https://mybinder.readthedocs.io/en/latest/config_files.html#start-run-code-before-the-user-sessions-starts). I'm not sure, but I think it would have to be start, since we really don't know the value of EXTRA_PIP_PACKAGES until the container is being created.
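
A rough sketch of what such a start script could look like (modeled loosely on dask-docker's prepare.sh; untested here):

    #!/bin/bash
    # install user-requested extras at container start, then hand off to the real command
    if [ -n "${EXTRA_PIP_PACKAGES:-}" ]; then
        pip install --no-cache-dir ${EXTRA_PIP_PACKAGES}
    fi
    if [ -n "${EXTRA_CONDA_PACKAGES:-}" ]; then
        conda install --yes ${EXTRA_CONDA_PACKAGES}
    fi
    # repo2docker start scripts must exec the command they are given
    exec "$@"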

Dask lab extension unable to launch new clusters (2019.05.19)

The most recent images did away with pinning most versions:
#42

Unfortunately, when running these new images, the dask labextension is no longer able to launch new KubeClusters (you can select the latest image to run on this hub: https://nasa.pangeo.io).

Seeing messages such as these:

Failed to load resource: the server responded with a status of 500 ()
clusters.js:146 Uncaught (in promise) Error: Failed to start Dask cluster
    at DaskClusterManager.<anonymous> (clusters.js:146)
    at Generator.next (<anonymous>)
    at fulfilled (clusters.js:4)
serverconnection.js:192 PUT https://nasa.pangeo.io/user/scottyhq/dask/clusters?1558478492967 500
handleRequest @ serverconnection.js:192
makeRequest @ serverconnection.js:75
(anonymous) @ clusters.js:144
(anonymous) @ clusters.js:7
push.eY2S.__awaiter @ clusters.js:3
_launchCluster @ clusters.js:143
onClick @ clusters.js:66
handleMouseDown @ toolbar.js:332
callCallback @ react-dom.development.js:100
invokeGuardedCallbackDev @ react-dom.development.js:138
invokeGuardedCallback @ react-dom.development.js:187
invokeGuardedCallbackAndCatchFirstError @ react-dom.development.js:201
executeDispatch @ react-dom.development.js:461
executeDispatchesInOrder @ react-dom.development.js:483
executeDispatchesAndRelease @ react-dom.development.js:581
executeDispatchesAndReleaseTopLevel @ react-dom.development.js:592
forEachAccumulated @ react-dom.development.js:562
runEventsInBatch @ react-dom.development.js:723
runExtractedEventsInBatch @ react-dom.development.js:732
handleTopLevel @ react-dom.development.js:4477
batchedUpdates$1 @ react-dom.development.js:16660
batchedUpdates @ react-dom.development.js:2131
dispatchEvent @ react-dom.development.js:4556
interactiveUpdates$1 @ react-dom.development.js:16715
interactiveUpdates @ react-dom.development.js:2150
dispatchInteractiveEvent @ react-dom.development.js:4533
clusters.js:146 Uncaught (in promise) Error: Failed to start Dask cluster
    at DaskClusterManager.<anonymous> (clusters.js:146)
    at Generator.next (<anonymous>)
    at fulfilled (clusters.js:4)

And here is a copy of the full conda environment installed:

# packages in environment at /srv/conda/envs/notebook:
#
# Name                    Version                   Build  Channel
adal                      1.2.1                      py_0    conda-forge
alembic                   1.0.9                    pypi_0    pypi
asn1crypto                0.24.0                py36_1003    conda-forge
async_generator           1.10                       py_0    conda-forge
attrs                     19.1.0                     py_0    conda-forge
backcall                  0.1.0                      py_0    conda-forge
bleach                    3.1.0                      py_0    conda-forge
blinker                   1.4                        py_1    conda-forge
bokeh                     1.1.0                    py36_0    conda-forge
ca-certificates           2019.3.9             hecc5488_0    conda-forge
cachetools                2.1.0                      py_0    conda-forge
certifi                   2019.3.9                 py36_0    conda-forge
certipy                   0.1.3                      py_0    conda-forge
cffi                      1.12.3           py36h8022711_0    conda-forge
chardet                   3.0.4                    pypi_0    pypi
click                     7.0                        py_0    conda-forge
cloudpickle               1.0.0                      py_0    conda-forge
configurable-http-proxy   1.3.0                         0    conda-forge
cryptography              2.6.1            py36h72c5cf5_0    conda-forge
cytoolz                   0.9.0.1         py36h14c3975_1001    conda-forge
dask-core                 1.2.2                      py_0    conda-forge
dask-kubernetes           0.8.0                      py_0    conda-forge
dask-labextension         0.3.3                    pypi_0    pypi
dbus                      1.13.6               he372182_0    conda-forge
decorator                 4.4.0                      py_0    conda-forge
defusedxml                0.5.0                      py_1    conda-forge
distributed               1.28.1                   py36_0    conda-forge
entrypoints               0.3                   py36_1000    conda-forge
expat                     2.2.5             hf484d3e_1002    conda-forge
fontconfig                2.13.1            he4413a7_1000    conda-forge
freetype                  2.10.0               he983fc9_0    conda-forge
gettext                   0.19.8.1          hc5be6a0_1002    conda-forge
glib                      2.58.3            hf63aee3_1001    conda-forge
google-auth               1.6.3                      py_0    conda-forge
gst-plugins-base          1.14.4            hdf3bae2_1001    conda-forge
gstreamer                 1.14.4            h66beb1c_1001    conda-forge
heapdict                  1.0.0                 py36_1000    conda-forge
icu                       58.2              hf484d3e_1000    conda-forge
idna                      2.8                      pypi_0    pypi
ipykernel                 5.1.0           py36h24bf2e0_1002    conda-forge
ipython                   7.5.0            py36h24bf2e0_0    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
ipywidgets                7.4.2                      py_0    conda-forge
jedi                      0.13.3                   py36_0    conda-forge
jinja2                    2.10.1                     py_0    conda-forge
jpeg                      9c                h14c3975_1001    conda-forge
jsonschema                3.0.1                    py36_0    conda-forge
jupyter                   1.0.0                      py_2    conda-forge
jupyter_client            5.2.4                      py_3    conda-forge
jupyter_console           6.0.0                      py_0    conda-forge
jupyter_core              4.4.0                      py_0    conda-forge
jupyterhub                0.9.4                    pypi_0    pypi
jupyterlab                0.35.6                   py36_0    conda-forge
jupyterlab_server         0.2.0                      py_0    conda-forge
krb5                      1.16.3            h05b26f9_1001    conda-forge
libblas                   3.8.0               10_openblas    conda-forge
libcblas                  3.8.0               10_openblas    conda-forge
libcurl                   7.64.1               hda55be3_0    conda-forge
libedit                   3.1.20170329      hf8c457e_1001    conda-forge
libffi                    3.2.1             he1b5a44_1006    conda-forge
libgcc-ng                 8.2.0                hdf63c60_1    defaults
libgfortran-ng            7.3.0                hdf63c60_0    defaults
libiconv                  1.15              h516909a_1005    conda-forge
liblapack                 3.8.0               10_openblas    conda-forge
libpng                    1.6.37               hed695b0_0    conda-forge
libsodium                 1.0.16            h14c3975_1001    conda-forge
libssh2                   1.8.2                h22169c7_2    conda-forge
libstdcxx-ng              8.2.0                hdf63c60_1    defaults
libtiff                   4.0.10            h648cc4a_1001    conda-forge
libuuid                   2.32.1            h14c3975_1000    conda-forge
libxcb                    1.13              h14c3975_1002    conda-forge
libxml2                   2.9.9                h13577e0_0    conda-forge
mako                      1.0.9                    pypi_0    pypi
markupsafe                1.1.1            py36h14c3975_0    conda-forge
mistune                   0.8.4           py36h14c3975_1000    conda-forge
msgpack-python            0.6.1            py36h6bb024c_0    conda-forge
nbconvert                 5.4.1                      py_2    conda-forge
nbformat                  4.4.0                      py_1    conda-forge
nbserverproxy             0.8.8                   py_1000    conda-forge
ncurses                   6.1               hf484d3e_1002    conda-forge
nodejs                    11.14.0              he1b5a44_1    conda-forge
notebook                  5.7.8                    py36_0    conda-forge
nteract-on-jupyter        2.0.12                   pypi_0    pypi
numpy                     1.16.3           py36he5ce36f_0    conda-forge
oauthlib                  3.0.1                      py_0    conda-forge
olefile                   0.46                       py_0    conda-forge
openblas                  0.3.6                h6e990d7_2    conda-forge
openssl                   1.1.1b               h14c3975_1    conda-forge
packaging                 19.0                       py_0    conda-forge
pamela                    1.0.0                      py_0    conda-forge
pandoc                    2.7.2                         0    conda-forge
pandocfilters             1.4.2                      py_1    conda-forge
parso                     0.4.0                      py_0    conda-forge
pcre                      8.41              hf484d3e_1003    conda-forge
pexpect                   4.7.0                    py36_0    conda-forge
pickleshare               0.7.5                 py36_1000    conda-forge
pillow                    6.0.0            py36he7afcd5_0    conda-forge
pip                       19.1                     py36_0    conda-forge
prometheus_client         0.6.0                      py_0    conda-forge
prompt_toolkit            2.0.9                      py_0    conda-forge
psutil                    5.6.2            py36h516909a_0    conda-forge
pthread-stubs             0.4               h14c3975_1001    conda-forge
ptyprocess                0.6.0                   py_1001    conda-forge
pyasn1                    0.4.4                      py_1    conda-forge
pyasn1-modules            0.2.4                      py_0    conda-forge
pycparser                 2.19                     py36_1    conda-forge
pycurl                    7.43.0.2         py36h16ce93b_0    conda-forge
pygments                  2.3.1                      py_0    conda-forge
pyjwt                     1.7.1                      py_0    conda-forge
pyopenssl                 19.0.0                   py36_0    conda-forge
pyparsing                 2.4.0                      py_0    conda-forge
pyqt                      5.9.2            py36hcca6a23_0    conda-forge
pyrsistent                0.15.1           py36h516909a_0    conda-forge
pysocks                   1.7.0                    py36_0    conda-forge
python                    3.6.7             h381d211_1004    conda-forge
python-dateutil           2.8.0                      py_0    conda-forge
python-editor             1.0.4                    pypi_0    pypi
python-graphviz           0.10.1                   pypi_0    pypi
python-kubernetes         9.0.0                    py36_0    conda-forge
python-oauth2             1.1.0                    pypi_0    pypi
pyyaml                    5.1              py36h14c3975_0    conda-forge
pyzmq                     18.0.1           py36hc4ba49a_1    conda-forge
qt                        5.9.7                h52cfd70_1    conda-forge
qtconsole                 4.4.4                      py_0    conda-forge
readline                  7.0               hf8c457e_1001    conda-forge
requests                  2.21.0                   pypi_0    pypi
requests-oauthlib         1.2.0                      py_0    conda-forge
rsa                       3.4.2                      py_1    conda-forge
send2trash                1.5.0                      py_0    conda-forge
setuptools                41.0.1                   py36_0    conda-forge
sip                       4.19.8          py36hf484d3e_1000    conda-forge
six                       1.12.0                py36_1000    conda-forge
sortedcontainers          2.1.0                      py_0    conda-forge
sqlalchemy                1.3.3            py36h516909a_0    conda-forge
sqlite                    3.26.0            h67949de_1001    conda-forge
tblib                     1.3.2                      py_1    conda-forge
terminado                 0.8.2                    py36_0    conda-forge
testpath                  0.4.2                   py_1001    conda-forge
tk                        8.6.9             h84994c4_1001    conda-forge
toolz                     0.9.0                      py_1    conda-forge
tornado                   5.1.1           py36h14c3975_1000    conda-forge
traitlets                 4.3.2                 py36_1000    conda-forge
urllib3                   1.24.2                   pypi_0    pypi
wcwidth                   0.1.7                      py_1    conda-forge
webencodings              0.5.1                      py_1    conda-forge
websocket-client          0.56.0                   py36_0    conda-forge
wheel                     0.33.1                   py36_0    conda-forge
widgetsnbextension        3.4.2                 py36_1000    conda-forge
xorg-libxau               1.0.9                h14c3975_0    conda-forge
xorg-libxdmcp             1.1.3                h516909a_0    conda-forge
xz                        5.2.4             h14c3975_1001    conda-forge
yaml                      0.1.7             h14c3975_1001    conda-forge
zeromq                    4.3.1             hf484d3e_1000    conda-forge
zict                      0.1.4                      py_0    conda-forge
zlib                      1.2.11            h14c3975_1004    conda-forge


JupyterLab v0.35.6
Known labextensions:
   app dir: /srv/conda/envs/notebook/share/jupyter/lab
        @jupyter-widgets/jupyterlab-manager v0.38.1  enabled  OK
        @jupyterlab/hub-extension v0.12.0  enabled  OK
        @pyviz/jupyterlab_pyviz v0.7.2  enabled  OK
        dask-labextension v0.3.0  enabled  OK
        jupyter-leaflet v0.10.2  enabled  OK

pinging @ian-r-rose @jhamman

loading data on HPC using intake catalog

Hello,

I have been experiencing an issue when trying to load data using an intake catalog. I've successfully loaded the catalog as shown below:

cat = intake.open_catalog('catalog.yaml')
list(cat)

['sea_surface_height',
 'ECCOv4r3',
 'SOSE',
 'LLC4320_grid',
 'LLC4320_SST',
 'LLC4320_SSS',
 'LLC4320_SSH',
 'LLC4320_SSU',
 'LLC4320_SSV',
 'CESM_POP_hires_control',
 'CESM_POP_hires_RCP8_5',
 'GFDL_CM2_6_control_ocean_surface',
 'GFDL_CM2_6_control_ocean_3D',
 'GFDL_CM2_6_one_percent_ocean_surface',
 'GFDL_CM2_6_one_percent_ocean_3D',
 'GFDL_CM2_6_grid']

However, when I go to load any of the datasets, I get the following error:

ds = cat.ECCOv4r3.to_dask()
ds

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-12-99add3ca0ed7> in <module>
----> 1 ds = cat.ECCOv4r3.to_dask()
      2 ds

~/miniconda3/envs/pangeo/lib/python3.7/site-packages/intake_xarray/base.py in to_dask(self)
     68     def to_dask(self):
     69         """Return xarray object where variables are dask arrays"""
---> 70         return self.read_chunked()
     71 
     72     def close(self):

~/miniconda3/envs/pangeo/lib/python3.7/site-packages/intake_xarray/base.py in read_chunked(self)
     43     def read_chunked(self):
     44         """Return xarray object (which will have chunks)"""
---> 45         self._load_metadata()
     46         return self._ds
     47 

~/miniconda3/envs/pangeo/lib/python3.7/site-packages/intake/source/base.py in _load_metadata(self)
    115         """load metadata only if needed"""
    116         if self._schema is None:
--> 117             self._schema = self._get_schema()
    118             self.datashape = self._schema.datashape
    119             self.dtype = self._schema.dtype

~/miniconda3/envs/pangeo/lib/python3.7/site-packages/intake_xarray/base.py in _get_schema(self)
     17 
     18         if self._ds is None:
---> 19             self._open_dataset()
     20 
     21             metadata = {

~/miniconda3/envs/pangeo/lib/python3.7/site-packages/intake_xarray/xzarr.py in _open_dataset(self)
     30         update_storage_options(options, self.storage_options)
     31 
---> 32         self._fs, _ = get_fs(protocol, options)
     33         if protocol != 'file':
     34             self._mapper = get_mapper(protocol, self._fs, urlpath)

~/miniconda3/envs/pangeo/lib/python3.7/site-packages/dask/bytes/core.py in get_fs(protocol, storage_options)
    569             "    pip install gcsfs",
    570         )
--> 571         cls = _filesystems[protocol]
    572 
    573     elif protocol in ["adl", "adlfs"]:

KeyError: 'gcs'

It appears to be an error related to Google Cloud Storage. For reference, I am using the base pangeo environment provided here: https://github.com/pangeo-data/pangeo-stacks/blob/master/pangeo-notebook/binder/environment.yml

consistent repo2docker versions?

Should we try to keep the repo2docker version the same across our various repos that use it for building pangeo images? It seems like doing so will help guarantee that a given repo builds to the same environment and is runnable on different hubs.

And also in sync with mybinder.org? (build_image: jupyter/repo2docker:bf29f66f)
which updates regularly - jupyterhub/mybinder.org-deploy#1091
https://github.com/jupyterhub/mybinder.org-deploy/blob/43ae77db9d3d7961964a6e1d383f3749a6d6dbe6/mybinder/values.yaml#L68

cc @jhamman

Add an ESIP container

I should have raised this issue before submitting #95. This is for creating a container with the packages that used to be at esip.pangeo.io to support that work.

Images should inherit from base-notebook

Currently the pangeo-notebook image isn't based on the base-notebook image. Ideally it would be - then we could put the things every image must have (what is currently in the appendix) into base-notebook. It'd also remove duplication in environment.yml, etc.

Discussion of Image Simplifications

After a bit of discussion with @tjcrone, plus some topics that have come up in pangeo community meetings, I wanted to post some ideas for simplifying this repository and pangeo images in general:

  1. @jcrist suggested creating a conda-metapackage for the 'minimal pangeo environment' to run on hubs with dask-kubernetes. Users could use this to create their own compatible environments with custom libraries. If anyone wants to take this on, it seems like a great idea!
    https://docs.conda.io/projects/conda-build/en/latest/resources/commands/conda-metapackage.html
    Once created we could just specify pangeo-core==1.0 in the base-environment configuration (https://github.com/pangeo-data/pangeo-stacks/blob/master/base-notebook/binder/environment.yml).

  2. The onbuild system combined with repo2docker layered conda environments is very confusing. I think maintaining compatibility with repo2docker is a good idea so the images work across different binderhub deployments. But can we just publish onbuild images and drop the -onbuild name? This would halve the number of options out there and reduce confusion over which to use
    (https://pangeo-data.github.io/pangeo-stacks/images.html#pangeo-pangeo-notebook)

  3. If we could figure out a way to get PR images into a public image repo, or stored as build artifacts, a less error-prone and faster approach to pushing master images to dockerhub would be just to relabel them (for example docker tag pangeo/base-notebook:PR129 pangeo/base-notebook:latest ):
    https://dille.name/blog/2018/09/20/how-to-tag-docker-images-without-pulling-them/

  4. Other thoughts?

Would welcome feedback from @yuvipanda, @jhamman, @rabernat, @rsignell-usgs, @ocefpaf

automate building of new images when conda dependencies are updated

It would be nice to have a service that keeps the pangeo-stacks fresh. A bot that does the following would be a fun project:

  • Track installed versions of a few key packages in the stacks
  • Compare installed versions to the latest available version in https://conda-static.anaconda.org/conda-forge/rss.xml
  • If a new version is available, update the environment.yaml file(s) and open a pull request

It seems like we could copy a lot of the infrastructure from the henchbot that mybinder.org uses.

@ocefpaf and I spoke about this at the recent pangeo meeting. Let me know if I'm missing any important details here.
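
A toy sketch of the version-check step (feedparser is a third-party library assumed here; the feed URL is the one above):

    import feedparser  # third-party: pip install feedparser

    # conda-forge publishes new package builds to this RSS feed
    feed = feedparser.parse("https://conda-static.anaconda.org/conda-forge/rss.xml")
    for entry in feed.entries:
        # entry titles look roughly like "<package> <version>"; compare against our pins
        print(entry.title)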
