Git Product home page Git Product logo

simple_jupyterhub's Introduction

Simple Jupyterhub Deployment

This is a simple jupyterhub deployment to be used on a single server to run the hub + another server to store user backups.

Current features:

  • https/tls using nginx as a reverse proxy with automatic certificates throuhg Let's Encrypt
  • custom hub image
  • user disk quotas (1G soft + 2G hard)
  • periodic backups of user data to external server over simple ssh
  • authenticate through GitHub

Usage

To use without backups, simply do docker-compose up --build. Make sure to setup .env file first!

Goals

  • setup a simple dockerized jupyterhub installation
  • share runnable/editable code in isolated environment with your collaborators
  • teach a class using runnable/editable code in isolated environments

Disclaimer

These instructions assume you are running Ubuntu. They will likely not be much different for any other GNU/Linux.

Setup jupyterhub proxy token

Use

openssl rand -hex 32

to generate a random token for jupyterhub's proxy, and add CONFIGPROXY_AUTH_TOKEN=\<result\> to .env.

Do the same for LAUNCHER_API_TOKEN.

Setup environment

Copy .env.dist to .env, and fill in the blanks. To get OAUTH id and secret, follow instructions here: https://docs.github.com/en/free-pro-team@latest/developers/apps/creating-an-oauth-app

  • <jupyter_address> is something of the form jupyterhub.example.com
  • <backup_server_address> is something of the form backup-server.example.com
  • To generate a secure random password, you can for example, use: dd if=/dev/urandom bs=1 count=32 2>/dev/null | base64 -w 0 | rev | cut -b 2- | rev
  • Follow instructions below to properly setup backup server.
  • Email for letsencrypt doesn't really matter, I never got any :(

Once you have your environment set up, everything should work fine.

Setup users

Copy jupyterhub/userlist.dist to jupyterhub/userlist. Delete user1, user2, user3, ... and replace by GitHub account names of people you want to give access to. Make sure to put a unique positive integer number before every user name. This number is used to setup disk quotas.

If you want to delete a user, make sure not to use their number ever again for any other user in the futre. If you do, new user would have less disk space available because some of their quotas would be used by the previous deleted user.

Alternatively, make sure to also delete data of deleted user (resides in /data/jupyterhub/users/{username}).

Setup disk quotas

To implement disk quotas, we use basic Linux filesystem disk quotas. In order for this to work, your root filesystem, and partitions where docker containers and volumes are mounted to, should be mounted with quotas on:

 sudo mount -o remount,usrquota,grpquota /

This will temporarily enable disk quotas until the next time you restart your server.

To make this permanent, edit /etc/fstab to something like:

UUID=some-long-string-of-random-symbols / ext4 errors=remount-ro,usrquota,grpquota 0 0

or

/dev/sda1 / ext4 errors=remount-ro,usrquota,grpquota 0 0

And then restart the server using sudo reboot. But first make sure you didn't make any typos since then your server wouldn't be able to boot and you wouldn't be able to ssh to it!

Next, initialize and turn quotas on:

sudo quotacheck -cum /
sudo quotaon -v /

First command initializes user quotas, and the second one turns quotas on for the disk mounte to /.

More info on disk quotas: https://linuxhint.com/disk_quota_ubuntu/

Setup database

Before being able to use database, you need to create needed tables/run migrations:

In one terminal, run:

docker-compose up database

Then, in another termial, run:

docker-compose run jupyterhub jupyterhub upgrade-db

Now you can stop the databse service in the first terminal. Database now should be functional!

Configure backups

For backups to work, you first need a second machine with a user jupyter-backup created on it. This user needs to be accessible with ssh, but does not need password (we will connect to it using ssh keys, which are more secure and easy to use):

# ssh to backup machine
localhost> ssh backup-machine
backup-machine> sudo adduser -q -gecos "" -disabled-password jupyter-backup

Now, for more advanced security, we should only allows this user to use sftp, and nothing else. I.e. we will only use them for doing backups through sftp protocol that restic can utilize.

You have to edit your sshd config on the backup machine for that:

backup-machine> sudo vim /etc/ssh/sshd_config

First, if you have Subsystem sftp /usr/lib/openssh/sftp-server, replace it by: Subsystem sftp internal-sftp. If you don't have it, just add it at the end of file. internal-sftp is just a newer version of sftp-server that works better with chroot.

Next, we need to setup a rule sot that all users sftponly group can only access the server through sftp:

Match group sftponly
     ChrootDirectory /home/%u
     X11Forwarding no
     AllowTcpForwarding no
     ForceCommand internal-sftp

This creates a chroot environment (a lighter analogue of docker, haha) for the user. It then prevents them from using X11 forwarding, TCP forwarding, and forces them to only be able to use the sftp.

Now restart your sshd server:

backup-machine> systemctl restart ssh

Next, we need to generate an ssh keypair to allow our restic container on jupyterhub machine to automatically login to backup machine through sftp and store data there.

You do it on the jupyterhub machine:

jupyterhub-machine> cd ~/simple_jupyterhub/restic/ssh/
jupyterhub-machine> ssh-keygen -f id_rsa -P ""

This generate an ssh keypair with no password protection on privite key. We don't set the password because we want restic to be able to connect to the backup server without prompting for the password.

Keep privite key secure. If someone gets access to it, they can potentially remove/steal/modify backups from the backup server.

In the directory we are currently in, there now should be two files, id_rsa (privite key), and id_rsa.pub (public key).

Do cat id_rsa.pub to see its contents. We would have to copy them to the backup server for it to recognize our jupyterhub server when it tries to connect.

Public key should look something like ssh-rsa AAA...6p username@localhost. Make sure you copy all of it.

Now, add it to the backup server:

backup-machine> sudo mkdir ~jupyter-backup/.ssh
backup-machine> sudo vim ~jupyter-backup/.ssh/authorized_keys

Setup proper permissons for ssh folder:

backup-machine> sudo chown -R jupyter-backup:jupyter-backup ~jupyter-backup/.ssh/
backup-machine> sudo chmod 600 ~jupyter-backup/.ssh/authorized_keys
backup-machine> sudo chmod 700 ~jupyter-backup/.ssh/

Note: make sure you copy the last two commands in the exact order shown since the last command makes .ssh directory unreadable by anyone except jupyter-backup.

As a check, right now you should be able to ssh as juputer-backup from your juputerhub server:

jupyterhub-machine> ssh -i id_rsa jupyter-backup@backup-machine

Now let's prevent that from happening:

backup-machine> sudo groupadd sftponly
backup-machine> sudo usermod -a -G sftponly jupyter-backup

Notice that we didn't really have to create this group before adding it to sshd config. Groups and users are superficial on unix systems. They are just numers that exist. Whe you create a user or a group, you simply assign a name to those numbers.

If you want to make it so that ssh logins work once again, do:

backup-machine> sudo gpasswd -d jupyter-backup sftponly

To test that sftp works, do:

jupyterhub-machine> echo "Hello, sftp!" > test
jupyterhub-machine> sftp -i id_rsa jupyter-backup@backup-machine
sftp> ls
sftp> lls
test
sftp> put test
Uploading test to /home/jupyter-backup/test
test                             100%   12    52.4KB/s   00:00
sftp> ls
test
sftp> lls

ls lists files on remote machine, and lls lists local files.

It now works!

Using restic to inspect backups

It's important to manually check that you are actually making backups.

To do so, install restic on any of your computers, or use it from the container: docker-compose exec restic bash.

Then, to chech snapshots: restic -r "sftp:${BACKUP_USER}@${BACKUP_SERVER}:${BACKUP_REPO_PATH}" snapshots. Snapshots are basically like commits in git, but opposite to commits, snapshots can be empty. I.e. we will always get a new snapshot every time we make a snaphsot, even if there is no new data added.

To restore a snapshot with id <snapshot-id>, use restic -r "sftp:${BACKUP_USER}@${BACKUP_SERVER}:${BACKUP_REPO_PATH}" restore <snapshot-id> --target /. It's fairly easy to restore to previous snapshot, and then get back to current version as well. Just like git.

Restic can also restore particular file/directory as opposed to rollbacking the whole system. See https://restic.readthedocs.io/en/latest/050_restore.html for more info.

For more info, read restic documentation: https://restic.readthedocs.io/en/latest/

simple_jupyterhub's People

Contributors

slemonide avatar

Stargazers

 avatar

Watchers

 avatar  avatar

Forkers

phaustin

simple_jupyterhub's Issues

Move user disk quota creation outside of jupyterhub container

There's really no need to do user disk quota creation inside jupyterhub container. We probably should use an additional container for that, that is disconnected from the network.

This will allow to remove privileged: true from jupyterhub, hence shrinking attack surface. Also would make it much harder to screw up host system root disk partition.

External services: configure access control

It should be possible to specify if external services can only be available to logged in users, or to anyone on the internet.

Note: second case is a potential security vulnerability. Need to configure network properly (make it impossible for external service to sniff packets from other containers).

For logged in users, it should then be possible to specify who can access it. I.e. only users who are in that project. Or all users.

Make a nice home screen/lobby for users accessing the hub

This lobby would have a list of courses. For each course, there will be a list of options to choose from. For example, textbook, jupyter notebook, dashboards, maybe something else too. Like a service for keeping notes or chatting with classmates... Probably makes sense to also include a proper ide (code-server, theia, jupyterlab, rstudio, ...)

Here's an example:
image

Also, instead of 404 page, forward users to this lobby. This way, if they try accessing a service that is stopped (for example, if they bookmarked it), they would be able to re-start it again. Even better: start the service when url is accessed by authorized user.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.