corn's Introduction

corn 🌽

JupyterHub base image for the NASA Openscapes Hub

Overview

This project provisions a multi-kernel Docker base image for JupyterHub deployments.

In collaborative efforts like this NASA hackathon, multiple teams work on different stacks, and we often run into situations where Team A needs Python 3.8 with, say, xarray v0.14 while Team B needs Python 3.9 and xarray v0.17. A simple solution would be to reconcile the two environments so both teams can run their code. However, this is not always straightforward or even possible. A multi-kernel base image for JupyterHub deployments therefore makes a lot of sense.

corn uses the amazing Pangeo base image, installs every environment it finds under ci/environments, and makes each one available as a kernel in the base image, so users can select the kernel that fits their needs. The only requirement for adding a kernel is a conda environment.yml file (pip dependencies can be included in environment.yml) and a name file.

  • environment.yml: conda environment file
  • name.txt: the name for the environment; it can be the same as the one used in the environment file
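
The two-file convention can be checked before a build. The following is a minimal sketch, not corn's actual build logic: `find_kernel_environments` and `REQUIRED_FILES` are hypothetical names, and the only thing taken from this README is that each folder under ci/environments holds an environment.yml and a name.txt.

```python
from pathlib import Path

# Files every environment folder is expected to contain (per this README).
REQUIRED_FILES = ("environment.yml", "name.txt")


def find_kernel_environments(root):
    """Return {kernel_name: folder} for each environment folder under root.

    Hypothetical helper: raises FileNotFoundError if a folder lacks one of
    the required files, so an incomplete kernel is caught early.
    """
    kernels = {}
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue  # ignore stray files next to the environment folders
        missing = [name for name in REQUIRED_FILES if not (folder / name).is_file()]
        if missing:
            raise FileNotFoundError(f"{folder.name} is missing: {', '.join(missing)}")
        kernel_name = (folder / "name.txt").read_text().strip()
        kernels[kernel_name] = folder
    return kernels
```

Calling `find_kernel_environments("ci/environments")` from the repository root would return a mapping from kernel name to folder, or raise if a folder is incomplete.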

Adding a new kernel

To add a new kernel we need to create a new folder under ci/environments/ and add the 2 files described above. Say we want to run our amazing new notebook that uses pandas and python 3.10.

We will need a conda environment file environment.yml

name: amazing-env
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pandas>=1.3
  - pip

and our name.txt file

amazing-env

That's it!

Note: if you have pip-installable dependencies, they must be listed using a requirements.txt file.
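
The steps in this section can also be scripted. This is only a sketch under stated assumptions: `scaffold_kernel` and `ENVIRONMENT_YML` are hypothetical names, and the template simply mirrors the amazing-env example; adjust the dependencies for a real kernel.

```python
from pathlib import Path

# Template mirroring the example environment in this section (an assumption,
# not a file shipped with corn).
ENVIRONMENT_YML = """\
name: {name}
channels:
  - conda-forge
dependencies:
  - python={python}
  - pandas>=1.3
  - pip
"""


def scaffold_kernel(root, name, python="3.10"):
    """Create <root>/<name>/ containing environment.yml and name.txt."""
    folder = Path(root) / name
    # exist_ok=False: refuse to overwrite an existing environment folder.
    folder.mkdir(parents=True, exist_ok=False)
    (folder / "environment.yml").write_text(
        ENVIRONMENT_YML.format(name=name, python=python)
    )
    (folder / "name.txt").write_text(name + "\n")
    return folder
```

For example, `scaffold_kernel("ci/environments", "amazing-env")` would create the folder and both files in one step.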

Updating quarto

To update the quarto installation, you'll need to change the version number in two places in corn's Dockerfile. After committing the changes, the GitHub Action will begin (see the next section).
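
Because the version appears in two places, it is easy to change one and miss the other. The sketch below guards against that; `bump_version` is a hypothetical helper and `SAMPLE_DOCKERFILE` is made-up text for illustration, since the real Dockerfile lines are not reproduced here.

```python
# Made-up Dockerfile fragment for illustration only; the real file may differ.
SAMPLE_DOCKERFILE = (
    "RUN wget https://example.invalid/quarto-1.0.35.deb\n"
    "RUN dpkg -i quarto-1.0.35.deb\n"
)


def bump_version(dockerfile_text, old, new, expected_occurrences=2):
    """Replace every occurrence of `old` with `new`.

    Raises ValueError unless `old` appears exactly `expected_occurrences`
    times, so a partial or unexpected match is noticed before committing.
    """
    found = dockerfile_text.count(old)
    if found != expected_occurrences:
        raise ValueError(
            f"expected {expected_occurrences} occurrences of {old!r}, found {found}"
        )
    return dockerfile_text.replace(old, new)
```

In practice the text would be read from and written back to the Dockerfile, for instance with `pathlib.Path("Dockerfile").read_text()` and `write_text()`.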

Using a Kernel

After we commit our changes to this repo, we create a tagged release with the format vYYYY.MM.DD. This is important because the GitHub Action build is only triggered by a release tagged in that format. The GitHub Action then pushes the resulting Docker image to Docker Hub, which can take ~20 minutes. Next, we need to update the user image in our JupyterHub configuration (admin > Services > configurator); right now it is hard-coded to openscapes/corn:$TAG (previously it was betolink). For 2i2c deployments there is a GUI that allows administrators to do this.
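
Since a mistyped tag would silently skip the build, it can help to generate or check the tag before creating the release. A minimal sketch, assuming the format is exactly `v` followed by a zero-padded year, month, and day; the function names are hypothetical and the pattern used by the actual workflow file may differ.

```python
import re
from datetime import date

# Assumed shape of a release tag: vYYYY.MM.DD with zero-padded month and day.
TAG_PATTERN = re.compile(r"v\d{4}\.\d{2}\.\d{2}")


def release_tag(day):
    """Format a date as a release tag, e.g. date(2023, 5, 9) -> 'v2023.05.09'."""
    return day.strftime("v%Y.%m.%d")


def is_release_tag(tag):
    """Return True if the whole string matches the vYYYY.MM.DD shape."""
    return TAG_PATTERN.fullmatch(tag) is not None
```

`release_tag(date.today())` gives a tag for today's release, and `is_release_tag` can be used to double-check a hand-typed one.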

Then, you'll go to https://openscapes.2i2c.cloud/hub/home > Stop My Server (or File > Log Out) to stop your server and restart it. Then the Docker image should be updated.

For other JupyterHub deployments, we can change the image using the hub configurator object or even in a Kubernetes chart.

Note: It looks like 2i2c caches the user image, so tags like main won't be updated even if they have changes. Using the actual commit hash is a better practice for now.
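
The caching note above suggests always pointing the hub at an immutable tag. A small sketch of that rule follows; `user_image` and `MUTABLE_TAGS` are hypothetical names, and which tags count as mutable is an assumption. Only the openscapes/corn repository name comes from this README.

```python
# Tags assumed to move over time and therefore to be affected by image caching.
MUTABLE_TAGS = {"main", "latest"}


def user_image(tag, repository="openscapes/corn"):
    """Build the image reference for the hub, refusing mutable tags."""
    tag = tag.strip()
    if not tag or tag in MUTABLE_TAGS:
        raise ValueError(f"use a release tag or commit hash, not {tag!r}")
    return f"{repository}:{tag}"
```

The returned string is what would be entered in the configurator, or in the image setting of a Kubernetes chart for other deployments.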

What's next?

This is an effective but probably inefficient way of building environments. Exploring staged partial builds in Docker, or using conda-store to build each environment and then pulling them into a Docker image, may be more efficient.

The final size of the image depends on the dependencies for each environment, thus avoiding multiple Python versions is still recommended.

corn's People

Contributors

amfriesz, andypbarrett, asteiker, battistowx, betolink, eeholmes, erinmr, itcarroll, jules32

corn's Issues

netcdf-c 4.9.2 causing Access Failure

The recent environment.yml update has caused the netCDF4 and netcdf-c libraries to be updated to their most recent packages. After some troubleshooting with Xiaohua Pan, we discovered that there are issues with the netcdf-c (libnetcdf) >=4.9.1 library and Cloud OPeNDAP links, where an "access denied" error is being triggered when both https and dap4 URLs are passed. This appears to be true for multiple granules from various collections, and only works with the granule from this OPeNDAP notebook: https://github.com/OPENDAP/notebooks/blob/master/tutorials/netCDF4_tutorial.ipynb

This issue proposes a fix by setting netcdf4==1.6.2 in environment.yml, which should also install netcdf-c 4.9.0 and should fix this issue.
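
To confirm which netcdf-c build a kernel actually picked up, the version can be inspected from Python. This is a sketch, assuming the problem starts at 4.9.1 as described above; `parse_version` and `is_affected` are hypothetical helpers, while `netCDF4.__netcdf4libversion__` is the attribute the netCDF4 package exposes for the linked C library version.

```python
import re


def parse_version(text):
    """Turn a string such as '4.9.2' into a comparable tuple (4, 9, 2)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(group) for group in match.groups())


def is_affected(libnetcdf_version, first_bad="4.9.1"):
    """True if the netcdf-c version is at or above the first affected release."""
    return parse_version(libnetcdf_version) >= parse_version(first_bad)


if __name__ == "__main__":
    try:
        import netCDF4  # only available inside an environment that installs it
    except ImportError:
        print("netCDF4 is not installed in this environment")
    else:
        version = netCDF4.__netcdf4libversion__
        print(version, "affected" if is_affected(version) else "not affected")
```

Running this in each kernel after the pin is applied should report a netcdf-c version below 4.9.1.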

I have also opened an issue in the netcdf-c GitHub repository relating to this: Unidata/netcdf-c#2704

add Python image into Matlab and R base images

Noting this here for us in the next corn iteration - @betolink @erinmr

Could we please add the Python base image to both the Matlab and R images? Since Matlab and R will often act as wrappers, or we're otherwise testing things in Python, this is a sticking point. It also came up with the NASA Mentors when the Matlab image didn't have earthaccess installed.

Also: the JupyterHub looks the same in all cases, so it's confusing that Python isn't there.

Add jupytext to jupyter labextensions

Jupytext provides a jupyterlab extension that automates maintaining text versions of jupyter notebooks. Our Champions cohort group would benefit from having it installed.

I'm eager to understand how all our infrastructure works, and would like to submit the very simple PR this takes and follow it through to deployment. I'm sure I'll be pinging @betolink to better understand the process. Thanks in advance.

Also, thank you @battistowx for the advice today.

final steps for updating quarto in the Hub

Hi @betolink ,

I wanted to chat with you about the most up-to-date ways to update quarto in the Hub. I can update these instructions in the README after we finish (https://github.com/NASA-Openscapes/corn#updating-quarto).

Today, following those instructions (including stopping and restarting my server), asking quarto check in the Terminal still returns Version: 1.0.35 rather than 1.2.269 now in the Dockerfile.

When you are back and have time, could we cowork on this together? I'll detail them below. Thank you!


Purpose: I'm hoping this will solve some weird quarto behavior @asteiker and I have been seeing in the Hub, and I'll document them here:

1. R code

quarto preview fails when Amy has an empty R code cell; she gets this error with the following code:

ERROR: Error executing 'Rscript': No such file or directory (os error 2)

Unable to locate an installed version of R.
Install R from https://cloud.r-project.org/

2. formatting

In my local RStudio (quarto 1.2.269), I'm able to add code chunk headers to indicate the language.

But in the Hub it is very grumpy about it: it doesn't handle bash well and then ignores the ones I have for python.

add quarto jupyterlab extension

Would be great to add the Quarto extension for JupyterLab to corn:

https://quarto.org/docs/tools/jupyter-lab-extension.html

"The Quarto JupyterLab extension enables JupyterLab Notebooks which use Quarto markdown to properly display the contents of the markdown cells. For example, when the Quarto JupyterLab extension is installed, your Notebook will show rendered previews of elements like Callouts, Divs, Mermaid charts, as well as other Quarto elements (including the document front matter as a title block)."

Updating corn in 2i2c

Documenting progress here and where I'm stuck - to discuss with @battistowx tomorrow
Our best corn update instructions are here: https://github.com/NASA-Openscapes/corn#overview

(Screenshots of sleuthing) > I started with the Pull Requests results and then Issues
