Git Product home page Git Product logo

doctor-cluster-config's Introduction

Update all servers

  1. Install fabric2 via pip or load via nix develop
  2. Choose a deployment target:
$ fab -l
Available tasks:

  deploy-doctor      Deploy to rpi4, that is used to control power relay
  deploy-edinburgh   Deploy to edinburgh servers starting with rose
  deploy-tum         Deploy to TUM servers
  1. Run!
$ fab deploy-tum

Add new users

Add chair members to ./modules/users.nix and students to ./modules/students.nix. Check that the uid is unique across both files and in the range between 1000-2000 to avoid conflicts.

Add a new host

New hosts are added in configurations.nix.

Quick hard reboot

$ echo 1 > /proc/sys/kernel/sysrq; echo b > /proc/sysrq-trigger

Reboot switch

If the host is no longer reachable, it is possible to remote reboot it from your raspberry pi: Therefore first ssh to doctor.thalheim.io, then execute:

$ reboot-servers --help
USAGE: /run/current-system/sw/bin/reboot-servers rose|martha|donna|amy|clara
# reboots rose
$ reboot-servers rose

Update system

We use flakes to manage nixpkgs versions. To upgrade use:

$ nix flake update

Than commit flake.lock.

Home-manager

To install home-manager for a user simply run:

$ nix-shell '<home-manager>' -A install

Installation

Make sure the system is booted into EFI:

[root@nixos:~]# mount | grep efivars
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)

Also make sure to enable ipmi serial over lan (run nix-shell -p ipmitool on the same machine):

$ ipmitool sol set privilege-level admin
$ ipmitool sol set force-encryption false
$ ipmitool sol set enabled true
$ ipmitool sol payload enable

Partitioning

Assuming /dev/nvme0n1 is the root disk, the following commands will setup up zfs and one partition formatted as fat32 (used for efi boot).

# Creates gpt partition table with one 500MB partition for boot and the rest for zfs
$ sgdisk -n 1:0:+500M -N 2 -t 1:ef00 -t 2:8304 /dev/nvme0n1
# this what the result might look like
$ lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop0         7:0    0 593.4M  0 loop /nix/.ro-store
nvme0n1       8:0    0 894.3G  0 disk
├─nvme0n1p1   8:1    0   500M  0 part
└─nvme0n1p2   8:2    0 893.8G  0 part
# format boot partition
$ mkfs.vfat -b32 /dev/nvme0n1p1
# Create a zpool for zfs
$ zpool create -f zroot /dev/nvme0n1p2
# Create zfs datasets: 
# - zroot/root/nixos for the /
# - zroot/root/tmp for /tmp
# - zroot/nixos/home for /home
# - zroot/reserved in case we run out-of-disk space (can be deleted in this case)
# - zroot/docker for docker
$ zfs create -o acltype=posixacl -o xattr=sa -o compression=lz4 -o mountpoint=none zroot/root
$ zfs create -o mountpoint=legacy -o setuid=off -o devices=off -o sync=disabled zroot/root/tmp
$ zfs create -o mountpoint=legacy -o com.sun:auto-snapshot=true zroot/root/home
$ zfs create -o mountpoint=legacy -o com.sun:auto-snapshot=true zroot/root/nixos
$ zfs create -o mountpoint=none -o refreservation=1G zroot/reserved
$ zfs create -o mountpoint=none -o compression=lz4 zroot/docker
# mount zfs root to /mnt
$ mount -t zfs zroot/root/nixos /mnt
# prepare mountpoints for home,tmp,boot
$ mkdir /mnt/{home,tmp,boot}
# mount boot partition
$ mount /dev/sda1 /mnt/boot/
# mount home partition
$ mount -t zfs zroot/root/home /mnt/home/
# mount temp partition
$ mount -t zfs zroot/root/tmp /mnt/tmp/
# fix permissions on tmp (important!)
$ chmod 777 /mnt/tmp

Prepare NixOS configuration

# We are are just interested in the hardware-configuration.nix that NixOS might come up for this hardware.
# Compare /tmp/nixos/hardware-configuration.nix with hardware-configuration.nix we have in the repo
# and make changes as needed.
$ nixos-generate-config --root /mnt --dir /tmp/nixos
# clone configuration to new server
$ mkdir -p /mnt/etc/nixos
$ nix-shell -p git --run 'git clone https://github.com/Mic92/doctor-cluster-config /mnt/etc/nixos && git -C /mnt/etc/nixos submodule update --init'
# Generates keys for tinc
$ nix-shell -p tinc --run 'mkdir -p /mnt/etc/tinc/retiolum && sudo tincd --generate-keys -c /mnt/etc/tinc/retiolum -n retiolum'
# pick a new hostname and and copy configuration for it
$ cp clara.nix $newname.nix
# copy ssh keys from the installer to the new system (requires sshd service to be started)
# also share this public key (ssh_host_ed25519_key.pub) with @Mic92 to get a ssh certificate
$ cp /etc/ssh/ssh_host_* /mnt/etc/ssh
# the ssh certificate needs to go to modules/sshd/certs/$newname-cert.pub
# Than adjust configuration in $newname.nix
# Also add the new host in configurations.nix
# We primarly need a new tinc ip address for the vpn following https://github.com/Mic92/retiolum
# Also share in this step the key generated to /mnt/etc/tinc/retiolum
$ vim /mnt/etc/nixos/$newname.nix
# add ip address of the machine to
$ vim /mnt/etc/nixos/modules/hosts.nix
# Than run nixos-install
$ nix-shell -p nixFlakes -p git --run 'nixos-install --flake /mnt/etc/nixos#$newname'

Ipmi

On bill.r and nardole.r we have ipmi support!!

On the machine with IPMI:

ipmitool lan print

to get the IPMI address. On the other machine run the following command to get a serial console:

$ ipmitool -I lanplus -H <ipmi-ip-address> -U ADMIN -P "$(</etc/nixos/secrets/ipmi-passwords)" sol activate

Than hit enter in order to get a login Prompt. The root password is in /etc/nixos/secrets/root-password IPMI also allows to change BIOS settings.

Backups

Lorenzo back ups with borgbackup to his personal storage

They can be access from any host like this (secrets are in https://github.com/Mic92/doctor-cluster-secrets for access from other hosts):

$ sudo nix-shell -p borgbackup sshfs
[nix-shell:/etc/nixos]# mkdir -p /mnt/backup /tmp/borgbackup && sshfs -oIdentityFile=/etc/nixos/secrets/borgbackup-ssh-key -oPort=22222 [email protected]:/csce/datastore/inf/users/s1443541 /mnt/backup && BORG_PASSPHRASE=$(cat /etc/nixos/secrets/borgbackup-password) borgfs /mnt/backup/borgbackup /tmp/borgbackup
[nix-shell:/etc/nixos]# ls -la /tmp/borgbackup
total 12
drwxr-xr-x  1 root root  0 Jul  7 17:00 .
drwxrwxrwt 20 root root 21 Jul  7 17:00 ..
drwxr-xr-x  1 root root  0 Jul  7 15:23 rose-all-homes-2020-07-07T15:23:39.failed
drwxr-xr-x  1 root root  0 Jul  7 16:49 rose-all-homes-2020-07-07T16:49:13

Monitoring

Hosts are monitored here: https://grafana.thalheim.io/d/Y3JuredMz/monitoring?orgId=1

doctor-cluster-config's People

Contributors

dimstav23 avatar lzmartinico avatar mic92 avatar supersandro2000 avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.