
zos's Introduction


0-OS is an autonomous operating system designed to expose raw compute, storage, and network capacity.

This repository hosts the V2 of 0-OS, which is a complete rewrite from scratch. If you want to know about the history and the decisions that motivated the creation of V2, you can read this article.

0-OS is mainly used to run nodes on the ThreeFold Grid. Head to https://threefold.io and https://wiki.threefold.io to learn more about ThreeFold and the grid.

Documentation

Start exploring the code base by first checking the documentation and specification documents.

An FAQ is also available for the most common questions.

Setting up your development environment

If you want to contribute, read the contribution guidelines and the documentation to set up your development environment.

Grid Networks

0-OS is deployed on four different "flavors" of network:

  • production network: releases of stable versions. Used to run the real grid with real money. Can never be reset. Only stable, battle-tested features reach this level. You can find the dashboard here
  • test network: mostly stable features that need to be tested at scale; allows previewing and testing new features. Always the latest and greatest. This network can occasionally be reset, but should be relatively stable. You can find the dashboard here
  • QA network: mostly unstable features that need to be tested internally; allows previewing and testing new features. Can be behind development. This network can occasionally be reset, but should be relatively stable. You can find the dashboard here
  • dev network: an ephemeral network set up only to develop and test new features. Can be created and reset at any time. You can find the dashboard here

Learn more about the different networks by reading the upgrade documentation.

Provisioning of workloads

ZOS does not expose a public interface to control it. Instead, it waits for reservations to appear on a trusted source; once a reservation is available, the node applies it to reality. You can start reading about provisioning in this document.
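
As an illustration, the pull-based flow could look roughly like the following Go sketch; the Source interface, the apply callback, and the polling cadence are assumptions for illustration, not the actual zos API.

package provision

import (
    "log"
    "time"
)

// Reservation is a simplified stand-in for a workload reservation.
type Reservation struct {
    ID   string
    Type string // container, volume, network, ...
}

// Source is a hypothetical trusted reservation source (e.g. the BCDB).
type Source interface {
    // Poll returns the reservations targeting this node since the last call.
    Poll(nodeID string) ([]Reservation, error)
}

// Loop pulls reservations from the trusted source and applies them.
func Loop(nodeID string, src Source, apply func(Reservation) error) {
    for {
        reservations, err := src.Poll(nodeID)
        if err != nil {
            log.Println("poll failed:", err)
        }
        for _, r := range reservations {
            if err := apply(r); err != nil {
                log.Println("provisioning failed:", r.ID, err)
            }
        }
        time.Sleep(10 * time.Second) // assumed cadence
    }
}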

Owners

@maxux @muhamadazmy @delandtj @leesmet

Community

If you have some questions or just want to hang out, you can find us on:


zos's Issues

Avoid generating the wireguard key on the node

As requested by @despiegk, we need to remove the generation of the WireGuard key from the node and move it to the user side.

So the user will have to generate a key pair for each member of their network and then publish these key pairs in the BCDB.
This means we will need to encrypt part of the network object with the public key of the node, so only the node will be able to read its private key.

The current layout is not made for something like this; it currently only contains the public keys of the WireGuard peers of a network.
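
To make the encryption step concrete, here is a minimal Go sketch assuming NaCl sealed boxes are used on top of the Curve25519 keys that WireGuard already employs; this is an illustration, not the agreed design.

package main

import (
    "crypto/rand"
    "encoding/base64"
    "fmt"

    "golang.org/x/crypto/nacl/box"
)

func main() {
    // The user generates a Curve25519 key pair for a network member;
    // WireGuard keys use the same curve.
    memberPub, memberPriv, err := box.GenerateKey(rand.Reader)
    if err != nil {
        panic(err)
    }

    // Stand-in for the node public key, which would normally be
    // fetched from the node identity in the BCDB.
    nodePub, _, err := box.GenerateKey(rand.Reader)
    if err != nil {
        panic(err)
    }

    // Seal the member private key to the node: only the node can
    // open it with its own private key.
    sealed, err := box.SealAnonymous(nil, memberPriv[:], nodePub, rand.Reader)
    if err != nil {
        panic(err)
    }

    fmt.Println("member public key :", base64.StdEncoding.EncodeToString(memberPub[:]))
    fmt.Println("sealed private key:", base64.StdEncoding.EncodeToString(sealed))
}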

Define container profiles

After the discussion in #5 (comment),
we agreed to provide different container "profiles".
This issue is about choosing what kinds of profiles we want and writing the config.json file for each of them.

Implement storage module

A basic implementation of storage, which shows what the high-level architecture will look like.
@muhamadazmy

  • Btrfs abstraction
  • Btrfs unit tests
  • Device abstraction

@LeeSmet

  • Module interface
  • Module interface implementation
  • Cache preparation ?
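
A rough sketch of what the module interface items above could amount to, with all names hypothetical:

package storage

// DeviceType distinguishes the class of disk backing a pool.
type DeviceType string

const (
    SSDDevice DeviceType = "ssd"
    HDDDevice DeviceType = "hdd"
)

// Filesystem is a volume carved out of a btrfs storage pool.
type Filesystem struct {
    Name string
    Path string
    Size uint64 // bytes
}

// Module is the contract the storage daemon exposes to other modules.
type Module interface {
    // CreateFilesystem allocates a filesystem of the given size on a
    // pool backed by the requested device type.
    CreateFilesystem(name string, size uint64, kind DeviceType) (Filesystem, error)
    // ReleaseFilesystem removes the filesystem and frees its space.
    ReleaseFilesystem(name string) error
    // Path looks up an existing filesystem by name.
    Path(name string) (Filesystem, error)
}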

Design/Implement Virtual machine module [fcvm]

Technology:

Firecracker has been chosen to run the VMs.

To investigate:

  • check if k3os runs fine on firecracker: https://github.com/rancher/k3os/releases/tag/v0.8.0
  • test if a firecracker VM can use a tap interface connected to a network resource bridge
  • test that a raw file works properly as a way to give the VM access to disks

Module Design.

This module will be quite similar to the container module. Both need to expose methods to start/stop/inspect a VM/container.

There is an SDK for the firecracker API: https://github.com/firecracker-microvm/firecracker-go-sdk
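
For reference, starting a microVM through that SDK could look roughly like the sketch below; the kernel/rootfs paths, tap name, and MAC address are placeholders, and field names may differ between SDK versions.

package main

import (
    "context"

    firecracker "github.com/firecracker-microvm/firecracker-go-sdk"
    models "github.com/firecracker-microvm/firecracker-go-sdk/client/models"
)

func main() {
    ctx := context.Background()

    cfg := firecracker.Config{
        SocketPath:      "/tmp/firecracker.sock",
        KernelImagePath: "/var/cache/vmlinux", // placeholder path
        KernelArgs:      "console=ttyS0 reboot=k panic=1",
        Drives: []models.Drive{{
            DriveID:      firecracker.String("rootfs"),
            PathOnHost:   firecracker.String("/var/cache/k3os.img"), // raw file as disk
            IsRootDevice: firecracker.Bool(true),
            IsReadOnly:   firecracker.Bool(false),
        }},
        NetworkInterfaces: []firecracker.NetworkInterface{{
            // tap device pre-created and attached to the network
            // resource bridge by networkd
            StaticConfiguration: &firecracker.StaticNetworkConfiguration{
                HostDevName: "tap0",
                MacAddress:  "AA:FC:00:00:00:01",
            },
        }},
        MachineCfg: models.MachineConfiguration{
            VcpuCount:  firecracker.Int64(1),
            MemSizeMib: firecracker.Int64(512),
        },
    }

    m, err := firecracker.NewMachine(ctx, cfg)
    if err != nil {
        panic(err)
    }
    if err := m.Start(ctx); err != nil {
        panic(err)
    }
    // block until the VM exits
    if err := m.Wait(ctx); err != nil {
        panic(err)
    }
}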

Todo

  • create udhcpd config (doing it in a separate NS is easier for networkd) -> no longer needed: we found a way to assign an IP statically to the VM
  • create a subvolume on a disk (where?) to host images
    • VMs have their caveats: RAW image files are large, but at the same time we need them to be fast. Storaged allows creating a filesystem on top of multiple disks, so if big disks are required we could use this feature.
      After some performance testing, if the speed is not good enough, we could investigate bcache.
  • create flist for firecracker/K3OS images (or integrate firecracker into zos (0-initramfs)); the binaries are small:
    ls -lh build/cargo_target/x86_64-unknown-linux-musl/release/{firecracker,jailer}
    -rwxr-xr-x 2 delandtj delandtj 2.7M Jan 14 10:05 build/cargo_target/x86_64-unknown-linux-musl/release/firecracker
    -rwxr-xr-x 2 delandtj delandtj 2.4M Jan 14 09:58 build/cargo_target/x86_64-unknown-linux-musl/release/jailer
  • create tap device and attach to NR -> needs schema definition?
    There is no schema definition for VMs; I think that will be necessary

  • managing reservation size for volumes is not really specified; we wouldn't want quotas to break things

  • create an automated install procedure for k3os:

    • prepare tap, attach to NR, bring up, disable_ipv6 (networkd)
    • prepare volume: fallocate, truncate (storaged)
    • use the API to configure the fc VM instance (vmd)
      • mac addr
      • volume name / place
      • k3os.mode=install (and others)
    • boot autoinstall, wait? kill? (the k3os automated install halts by default instead of rebooting :-/) (provisiond)
    • start it (provisiond)

network: Logic to request a new network resource for a node joining a network

When a node needs to join a network for the first time, a new network resource (NR) needs to be added to the tenant network object (TNo).

To create a new NR, the TNoDB requires two pieces of information from the node:

  • a free port
  • the public wireguard key of the node for this TNo

We need to implement this communication between the node and the TNoDB. Currently the networker interface only exposes a method to publish the public WireGuard key: https://github.com/threefoldtech/zosv2/blob/82028f15dd91604c6fd443dda2d7332b632e99f4/modules/network.go#L12-L18

This interface needs to be rethought: PublishWGPubKey should become a method that asks the TNoDB to add a new NR to the TNo, as sketched below.
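
Something along these lines, with the method name and signature purely illustrative:

package network

// TNoDB is a sketch of the reworked interface; RequestNetworkResource
// replaces PublishWGPubKey, and its name and signature are hypothetical.
type TNoDB interface {
    // RequestNetworkResource asks the TNoDB to add a new network
    // resource for this node to the tenant network object, handing
    // over the free listen port and the WireGuard public key the node
    // generated for this TNo.
    RequestNetworkResource(tnoID string, wgPubKey string, listenPort uint16) error
}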

Design container module

The container module is going to be responsible for exposing an interface on top of the chosen OCI runtime.
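
An illustrative sketch of such an interface; all names are hypothetical and the spec is heavily simplified:

package container

// ContainerID uniquely identifies a container on the node.
type ContainerID string

// Spec is a simplified description of what to run.
type Spec struct {
    FList      string            // rootfs flist URL
    Entrypoint string            // command to execute
    Env        map[string]string // environment variables
}

// Module wraps the chosen OCI runtime.
type Module interface {
    // Run creates and starts a container in the given namespace.
    Run(ns string, spec Spec) (ContainerID, error)
    // Inspect returns the spec of a running container.
    Inspect(ns string, id ContainerID) (Spec, error)
    // Delete stops and removes a container.
    Delete(ns string, id ContainerID) error
}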

Document file location of each module

I would like to have a clear view of all the files that each module needs to be able to write/create.

Then from there we can reflect on whether this is the best layout or if we can improve things a bit.

The main idea is that I want to avoid any module being able to write anywhere on the filesystem. That weakens security and makes things harder to debug.

Research and design for versioning, update and upgrades

One of the main features we want to provide with this version of 0-OS is "auto-update". The idea is that the system needs to be able to update itself and each of its components with the minimum downtime possible for the workloads. In most cases, minimum means no downtime at all.

Keeping this in mind actually drives quite a few of the design decisions for all the modules of 0-OS.
One of them I would like to discuss here is how we are going to handle versioning of the system.

Since every component is by definition modular and can be swapped in place for another one or another version, a single version for the OS is not going to be meaningful. So instead of sticking with a plain version, I had another idea:
having 3 "flavors" of 0-OS, main, dev and test, like tfchain does.

Having these 3 "flavors" will allow us to actually have different networks in the grid:

  • Dev: an ephemeral network only set up to develop and test new features. Can be created and reset at any time.
  • Test: mostly stable features that need to be tested at scale; allows preview and testing of new features. Always the latest and greatest. This network can be reset sometimes, but should be relatively stable.
  • Main: releases of stable versions. Used to run the real grid with real money. Can never be reset. Only stable, battle-tested features reach this level.

This allows each component to progress at its own pace and keep its own semantic version. A flavor will be composed of different versions of each module. Once a new version of a module is ready, it will just make its way through the 3 flavors, from dev to main.

Each "flavor" will actually create a separate grid; a node will only ever connect to other nodes with the same "flavor". This simplifies the code, because you don't have to deal with different versions and can be sure of the features of the node you're talking to.

A new feature will bubble up from dev to main. Every time a feature is promoted to the next level, all the nodes in that network will receive the update automatically.
This makes upgrading a network trivial, and it also ensures that the upgrade procedures are tested at scale at least twice before reaching main net.

storaged: /var/cache is mounted a second time when it is already mounted

When storaged is restarted and /var/cache is already mounted, it will mount it again, producing a double mountpoint like:

/dev/sda on /var/cache type btrfs (rw,relatime,ssd,space_cache,subvolid=257,subvol=/zos-cache)
/dev/sda on /var/cache type btrfs (rw,relatime,ssd,space_cache,subvolid=257,subvol=/zos-cache)

Storaged should not mount it a second time.
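
A minimal guard, assuming /proc/self/mounts as the source of truth, could look like this (the function name is hypothetical):

package storaged

import (
    "os"
    "strings"
)

// isMounted reports whether target already appears as a mount point in
// /proc/self/mounts.
func isMounted(target string) (bool, error) {
    data, err := os.ReadFile("/proc/self/mounts")
    if err != nil {
        return false, err
    }
    for _, line := range strings.Split(string(data), "\n") {
        // each entry is "device mountpoint fstype options ..."
        fields := strings.Fields(line)
        if len(fields) >= 2 && fields[1] == target {
            return true, nil
        }
    }
    return false, nil
}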

Storage: disk caching

Right now the storage module does a scan of the disks every time something changes the disk layout (filesystem creation, in the future maybe partitioning, ...). This prevents us from ever powering down the disks completely.

It should be possible to manually update the in-memory device representation when these actions are done, allowing us to power down the disks and avoid a rescan.

Extract network manipulation logic from tnodb

Currently, most of the logic regarding network object manipulation lives inside the tnodb_mock.

While this makes things super easy to use, it doesn't really fit the concept where nodes only provision what they get from the BCDB. Instead, the node is now dynamically talking with the TNoDB by itself, and the network object is modified without the owning user knowing it.

To solve this and move full control to the user, we need to let the user create the network object themselves (using the lib; doing it manually is way too complex).
The user can then send the network object as a provisioning request to the BCDB.
The BCDB will only validate the content of the network object. If the network object is not correct, the provisioning will be refused by the BCDB.

Tasks:

  • extract all the network object manipulation logic into a library (fb3d753)
  • remove some endpoint from the tnodb_mock (fb3d753)
    • create network
    • add member
    • add user
  • implement a full network object validation that will be used by the bcdb when provisioning a network: will be done in #132
  • update provisiond to support a network object as a provisioning request (aed0c23)
  • networkd should not watch the network object anymore, but just react upon provisiond requests (the watcher could be moved to provisiond) (aed0c23)

containerd zero-fs integration

  • Check if it's possible to implement a new image type to support zero-fs directly in containerd
  • Plan B: the rootfs of the container is mounted first, then passed to containerd in the runc spec

Provisioning module has no notion of reservation expiration

Currently, in the provisioning module, reservations have no notion of time. We lack information about how long a reservation needs to stay live.

We need to add a duration to the Reservation struct, so we can know whether a reservation is still valid or should be deleted from the system.
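
A sketch of the extension; the field names are illustrative, not the final struct:

package provision

import "time"

// Reservation with an added lifetime.
type Reservation struct {
    ID       string
    Created  time.Time
    Duration time.Duration // how long the reservation stays valid
}

// Expired reports whether the reservation should be decommissioned.
func (r Reservation) Expired() bool {
    return time.Now().After(r.Created.Add(r.Duration))
}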

Inter node communication

I would like to discuss here how we are going to create a network of nodes and how these nodes are going to communicate.

The idea behind the TFgrid is that each 0-OS node is a stateless capacity provider and the ThreeBots are the directors.
Since the ThreeBots need to be able to reach every single node in the network, and the ThreeBots run on top of the 0-OS nodes, the nodes need to be able to connect to each other.
This idea is simple enough, but practically it raises some questions.

  • How do the nodes know about each other?
  • What does a node do to join a network?
  • How does the network handle a node leaving the network?

Zerotier was the way to go for these things so far. But it has proven not to be scalable, extremely hard to manage, and not usable at all in an environment where a lot of nodes are on the same LAN.

I think some kind of inter-node communication protocol has to be designed to create a fully distributed network where nodes organize themselves and can route information through the network: trying to create direct p2p connections between nodes when possible, and finding routes through publicly reachable nodes when not.

Enable missing unit tests in CI

Now that most of the WIP modules have been merged to master, I would like us to start having a proper build pipeline and CI to run as many tests as possible, and to look at how we need to use go mod to version all the code.

I'm still pretty new to go mod so I don't know if what we have today is valid or not.

  • Enable travis-ci
  • Prepare the test environment for CI (some of the packages are already being tested, still to be activated)
    • flist
    • provision
    • storage
    • gedis
  • Create a Makefile for building all binaries

define clear boot flow

We can organize the service boot into stages by creating pseudo stages. For example, we can create a service called init that execs 'true' as a oneshot and depends on all the boot services (udev, settle, network, etc.); all other second-stage services can then depend only on init, as sketched below.
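
As a sketch, assuming zinit's service file format and placeholder service names:

# /etc/zinit/init.yaml -- the pseudo stage gate
exec: "true"
oneshot: true
after:
  - udevd
  - udev-trigger
  - networkd

# /etc/zinit/containerd.yaml -- a second-stage service
exec: containerd
after:
  - init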

zinit: PID 1 for ZOS v2

Project

https://github.com/threefoldtech/zinit

Tasks

  • process manager in rust/tokio POC
  • resolve dependencies of the services
  • load services config from a directory
  • simple unix socket API for management
  • command line tool to manage the process manager (status, stop, start, restart, and reload)
  • zombie reaping 😨

allow streaming container logs to remote endpoints

Since there will be no way for users to connect directly to a 0-OS node, we want to make it possible for a node to stream the logs of a container to a remote endpoint.

During the creation of the container, the user will specify the endpoint location and type. Then during the lifetime of the container, the 0-OS node will stream the logs to the endpoint.

As first supported endpoint types:

  • redis
  • 0-db
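
A sketch of what the user could pass at container creation time; the type names and the connection string format are assumptions:

package logger

// Endpoint describes where and how to ship container logs.
type Endpoint struct {
    Type     string // "redis" or "zdb"
    Location string // e.g. redis://host:6379/channel
}

// Streams pairs an endpoint with each output stream of the container.
type Streams struct {
    Stdout Endpoint
    Stderr Endpoint
}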

Design storage module

The storage module will be responsible for everything related to storing information on a long-term medium, usually a disk.

Sub components:

  • 0-db management
    • capacity planning, reservation of 0-db and 0-db namespaces
  • disks management
    • formatting of the disks
    • health monitoring
    • management of volumes used by containers

Design node identity

We need to decide how we are going to identify a node.

Since most of the tfgrid identity system will be based on key pairs, we could generate a key pair for each node and use the base64-encoded version of the public key as the identity of the node.

Example: https://play.golang.org/p/IycmGa1USUi

Now this brings one problem: this key pair will have to be stored on the node's disk itself, which means we bring some state into 0-OS. That is something we want to avoid.
One possible solution would be to use a deterministic seed that is unique per node (hardware serial number, ...) to generate the key pair.
This would make it possible to regenerate the key pair at any time. But I'm not sure this is doable without compromising security: if someone can deduce the seed, they can impersonate that node.
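
To illustrate the deterministic-seed idea (and its risk), a small Go sketch; hashing the serial number like this is just the idea under discussion, not a settled design:

package identity

import (
    "crypto/ed25519"
    "crypto/sha256"
    "encoding/base64"
)

// NodeID derives a stable identity from a hardware-bound value such as
// the serial number. Anyone who learns the serial can re-derive the
// private key, which is exactly the security concern raised above.
func NodeID(serial string) string {
    seed := sha256.Sum256([]byte(serial)) // 32-byte deterministic seed
    priv := ed25519.NewKeyFromSeed(seed[:])
    pub := priv.Public().(ed25519.PublicKey)
    return base64.StdEncoding.EncodeToString(pub)
}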

IPC infrastructure

We need to decide how the API layer will discover and talk to all the low-level modules.
Requirements:

  • modules need to be discoverable
  • modules need to be upgradable without impacting the higher layers
  • capable of propagating events up the stack

Design network module

Since we are going to use CNI as much as possible for network configuration, the network module is not going to be responsible for actually doing network configuration. Instead, it will expose an API that lets a user create CNI-compatible configuration files.

0-OS is by nature a multi-user platform, so we need to be able to provide private networks for users, so that the containers of user A running on a node won't have access to the containers of user B.
The network module will be responsible for the management of these private network configurations on a node.

Only create storagepools when needed

Right now the storage module greedily takes all available devices and creates storage pools on them when it starts. This should be changed to only create a storage pool when a filesystem is requested and there is no more space in the existing pools.

Resource IDs and ownership of resources

Reservations can (and will) happen in multiple steps. For example, to start a container you probably have to do the following:

  • Create a storage volume
  • Allocate network resource
  • Create container, and assign both the volume, and the network resource to that container.

The problem is that when creating a container you need to pass the information of the volume (maybe the volume's full path) as well as the network resource identity. But there is no way to guarantee that the caller of the container creation is the rightful owner of the associated resources (volume and/or network).

We think that the IDs of the allocated resources should carry some information about the owner of the resource. In that case, if the owner of the volume and the network matches the owner of the container, the container creation should pass.

Proposal

The ID of a resource can be a JWT whose payload contains both the needed resource information and the tenant ID. The JWT is signed with the node's private key, so it is only valid on the node where the resource is allocated.

For example, you first allocate a volume; the volume ID can hold:

{
  "path": "/pool/volume",
  "tenant": "id of the owner"
}

Then, on container creation, the JWT is passed as the volume ID. The node can verify that the JWT has a valid signature, then match the tenant ID against the tenant ID of the container.create caller. Once verified, the volume path can be mounted inside the container.

The only drawback of this technique is the size of the JWT.
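
A sketch of minting such an ID, using the golang-jwt library and an ed25519 node key as one possible implementation choice:

package resourceid

import (
    "crypto/ed25519"

    "github.com/golang-jwt/jwt/v4"
)

// VolumeID signs the volume path and tenant into a token that can only
// be verified with this node's key, making it valid on this node only.
func VolumeID(nodeKey ed25519.PrivateKey, path, tenant string) (string, error) {
    claims := jwt.MapClaims{
        "path":   path,
        "tenant": tenant,
    }
    token := jwt.NewWithClaims(jwt.SigningMethodEdDSA, claims)
    return token.SignedString(nodeKey)
}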

Design security structure

0-OS is by nature a shared system. Its main goal is to allow different users to use the capacity provided by the hardware.
This of course raises security concerns:

  • How do I ensure that the data I write on a disk is not going to be accessible by someone else?
  • How can we ensure that the networks of one user are not reachable by another?
  • How do I authenticate users when they talk to the public API of the OS?
  • How are we going to allocate resources to a certain user (we are a reservation-based system)?
  • ...

All these questions need to be resolved and be a first thought when developing all the modules that compose 0-OS.
