
demikernel's Introduction

Demikernel


Demikernel is a library operating system (LibOS) architecture designed for use with kernel-bypass I/O devices. This architecture offers a uniform system call API across kernel-bypass technologies (e.g., RDMA, DPDK) and OS functionality (e.g., a user-level networking stack for DPDK).

To read more about the motivation behind the Demikernel, check out this blog post.

To get details about the system, read our paper in SOSP '21.

To read more about Demikernel check out https://aka.ms/demikernel.

Codenames for LibOSes

  • catloop - TCP Socket Loopback LibOS
  • catmem - Shared Memory LibOS
  • catnap - Linux Sockets LibOS
  • catnip - DPDK LibOS
  • catpowder - Linux Raw Sockets LibOS

Documentation

  • For instructions on development environment setup, see doc/setup.md.
  • For instructions on building, see doc/building.md.
  • For instructions on testing and running, see doc/testing.md.
  • For instructions for running on CloudLab, see doc/cloudlab.md.
  • For documentation on the API, see documents in man.
  • For instructions on how to contribute to this project, see CONTRIBUTING.

Usage Statement

This project is a prototype. As such, we provide no guarantees that it will work and you are assuming any risks with using the code. We welcome comments and feedback. Please send any questions or comments to one of the following maintainers of the project:

By sending feedback, you are consenting that it may be used in the further development of this project.

Trademark Notice

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft’s Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party’s policies.

demikernel's People

Contributors

aj-austin, anandbonde, annakornfeldsimpson, ashmrtn, brianzill, carvalhof, deepakverma, deeptir18, ethandmd, gatowololo, ihchoi12, iyzhang, jingliu9, joshuafried, kirkolynyk, kwzhao, kyleholohan, mlr-msft, osalbahr, ppenna, stolet, sujayakar, viniciusfdasilva, zhangwen0411


demikernel's Issues

[inetstack] Handle Reset Generation Correctly

Description

We should handle reset generation correctly [RFC 793].

Case 1

If the connection does not exist (CLOSED) then a reset is sent in response to any incoming segment except another reset. In particular, SYNs addressed to a non-existent connection are rejected by this means.

If the incoming segment has an ACK field, the reset takes its sequence number from the ACK field of the segment, otherwise the reset has sequence number zero and the ACK field is set to the sum of the sequence number and segment length of the incoming segment. The connection remains in the CLOSED state.
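
The rule for Case 1 can be sketched as a small pure function. This is only an illustration, not the inetstack code: the Incoming and Reset types and the reset_for_closed name are hypothetical, and seg_len is assumed to be the segment length in sequence space (payload bytes plus one for each of SYN and FIN), as in RFC 793.

```rust
/// Fields of an incoming TCP segment that matter for reset generation.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy)]
pub struct Incoming {
    pub seq: u32,
    /// `Some(n)` when the ACK bit is set, `None` otherwise.
    pub ack: Option<u32>,
    /// Segment length in sequence space (payload + SYN + FIN).
    pub seg_len: u32,
    /// True when the incoming segment is itself a reset.
    pub rst: bool,
}

/// The reset segment to send in reply. (Hypothetical type.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Reset {
    pub seq: u32,
    /// `Some(n)` when the reset carries an acknowledgment number.
    pub ack: Option<u32>,
}

/// Builds the reset for a segment addressed to a CLOSED connection.
/// Returns `None` for an incoming reset, which must not be answered.
pub fn reset_for_closed(seg: &Incoming) -> Option<Reset> {
    if seg.rst {
        return None;
    }
    Some(match seg.ack {
        // ACK present: the reset takes its sequence number from it.
        Some(ack) => Reset { seq: ack, ack: None },
        // No ACK: sequence number zero, ACK = SEG.SEQ + SEG.LEN.
        None => Reset {
            seq: 0,
            ack: Some(seg.seq.wrapping_add(seg.seg_len)),
        },
    })
}
```

The same sequence/acknowledgment rule is stated for Case 2, so a helper of this shape could be shared between the two cases.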

Case 2

If the connection is in any non-synchronized state (LISTEN, SYN-SENT, SYN-RECEIVED), and the incoming segment acknowledges something not yet sent (the segment carries an unacceptable ACK), or if an incoming segment has a security level or compartment which does not exactly match the level and compartment requested for the connection, a reset is sent.

If our SYN has not been acknowledged and the precedence level of the incoming segment is higher than the precedence level requested then either raise the local precedence level (if allowed by the user and the system) or send a reset; or if the precedence level of the incoming segment is lower than the precedence level requested then continue as if the precedence matched exactly (if the remote TCP cannot raise the precedence level to match ours this will be detected in the next segment it sends, and the connection will be terminated then). If our SYN has been acknowledged (perhaps in this incoming segment) the precedence level of the incoming segment must match the local precedence level exactly, if it does not a reset must be sent.

If the incoming segment has an ACK field, the reset takes its sequence number from the ACK field of the segment, otherwise the reset has sequence number zero and the ACK field is set to the sum of the sequence number and segment length of the incoming segment. The connection remains in the same state.

Case 3

If the connection is in a synchronized state (ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT), any unacceptable segment (out-of-window sequence number or unacceptable acknowledgment number) must elicit only an empty acknowledgment segment containing the current send-sequence number and an acknowledgment indicating the next sequence number expected to be received, and the connection remains in the same state.
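
A minimal sketch of that reply, assuming a deliberately simplified acceptability test that only checks whether the segment's sequence number falls inside the receive window (the full RFC 793 test also considers the segment length and the acknowledgment number). The names EmptyAck, seq_in_window and reply_if_unacceptable are hypothetical.

```rust
/// An empty acknowledgment segment: no payload, only SEQ and ACK.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EmptyAck {
    /// Current send-sequence number (SND.NXT).
    pub seq: u32,
    /// Next sequence number expected to be received (RCV.NXT).
    pub ack: u32,
}

/// Simplified check: does `seg_seq` fall in [rcv_nxt, rcv_nxt + rcv_wnd)?
/// Wrapping arithmetic keeps the comparison valid across 2^32 wraparound.
pub fn seq_in_window(seg_seq: u32, rcv_nxt: u32, rcv_wnd: u32) -> bool {
    seg_seq.wrapping_sub(rcv_nxt) < rcv_wnd
}

/// Returns the empty ACK to send when the segment is out of window,
/// or `None` when the segment is acceptable under the simplified check.
pub fn reply_if_unacceptable(
    seg_seq: u32,
    snd_nxt: u32,
    rcv_nxt: u32,
    rcv_wnd: u32,
) -> Option<EmptyAck> {
    if seq_in_window(seg_seq, rcv_nxt, rcv_wnd) {
        None
    } else {
        Some(EmptyAck { seq: snd_nxt, ack: rcv_nxt })
    }
}
```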

If an incoming segment has a security level, or compartment, or precedence which does not exactly match the level, and compartment, and precedence requested for the connection, a reset is sent and connection goes to the CLOSED state. The reset takes its sequence number from the ACK field of the incoming segment.

[inetstack] Multi-Core Architecture

It would be nice if the API exposed some way to have threads that could each have their own connections (through whatever libos) so applications that require multithreading are easier to implement. This would require work in each libos, because I believe each libos (e.g., DPDK or RDMA) has its own way of handling threads and queues.

[perftools] Support Custom Output Format

Description

Output format for the profiler tool is currently hardcoded.

It would be nice to enable one to use a custom output format.

Notes

  • Add an optional parameter to profiler::write() that controls output formatting.
  • Enable one to register an output-formatting callback function.
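
One possible shape for the first note, as a sketch only: the Profiler, Sample and Formatter types below are hypothetical stand-ins and do not mirror the real perftools types. The default format (name,cycles) is likewise an assumption made for the example.

```rust
use std::io::{self, Write};

/// One recorded measurement. (Hypothetical type.)
pub struct Sample {
    pub name: String,
    pub cycles: u64,
}

/// A caller-supplied callback that renders one sample as one line.
pub type Formatter = dyn Fn(&Sample) -> String;

/// Stand-in for the profiler. (Hypothetical type.)
pub struct Profiler {
    samples: Vec<Sample>,
}

impl Profiler {
    pub fn new() -> Self {
        Profiler { samples: Vec::new() }
    }

    pub fn record(&mut self, name: &str, cycles: u64) {
        self.samples.push(Sample { name: name.to_string(), cycles });
    }

    /// Writes all samples to `out`. When `format` is `None`, falls back
    /// to a built-in `name,cycles` layout (assumed default for this sketch).
    pub fn write<W: Write>(&self, out: &mut W, format: Option<&Formatter>) -> io::Result<()> {
        for sample in &self.samples {
            let line = match format {
                Some(render) => render(sample),
                None => format!("{},{}", sample.name, sample.cycles),
            };
            writeln!(out, "{}", line)?;
        }
        Ok(())
    }
}
```

Passing the formatter per call keeps write() stateless. Registering a callback on the profiler instead, as in the second note, would trade that for a shorter call site.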

[inetstack] `dead_socket_tx` Causes Unit Test Regressions to Fail

Running test_connect currently fails with:

Failed to terminate connection: TrySendError { kind: Disconnected }
thread 'protocols::tcp::tests::test_connect' panicked at 'Failed to terminate connection: TrySendError { kind: Disconnected }', src/protocols/tcp/established/background/mod.rs:58:14
stack backtrace:
   0: rust_begin_unwind
             at /rustc/ca82264ec7556a6011b9d3f1b2fd4c7cd0bc8ae2/library/std/src/panicking.rs:493:5
   1: core::panicking::panic_fmt
             at /rustc/ca82264ec7556a6011b9d3f1b2fd4c7cd0bc8ae2/library/core/src/panicking.rs:92:14
   2: core::result::unwrap_failed
             at /rustc/ca82264ec7556a6011b9d3f1b2fd4c7cd0bc8ae2/library/core/src/result.rs:1355:5
   3: core::result::Result<T,E>::expect
             at /home/gatowololo/.rustup/toolchains/nightly-2021-05-10-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs:997:23
   4: catnip::protocols::tcp::established::background::background::{{closure}}
             at ./src/protocols/tcp/established/background/mod.rs:56:9

This line:

        dead_socket_tx
            .unbounded_send(fd)
            .expect("Failed to terminate connection");

As the error message states, unbounded_send always fails because the receiver is disconnected, so we always panic here.

Looking at the code, it seems the channel pair is created here:

impl<RT: Runtime> Peer<RT> {
    pub fn new(rt: RT, arp: arp::Peer<RT>, file_table: FileTable) -> Self {
        let (tx, _rx) = mpsc::unbounded();
        let inner = Rc::new(RefCell::new(Inner::new(rt.clone(), arp, file_table, tx)));
        Self { inner }
    }
...
}

As you can see, the receiver is bound to _rx and dropped as soon as new() returns, so nothing can ever receive from this channel. I don't see how this code is ever supposed to not fail. What is the point of it, since it doesn't seem to do anything?
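
The failure mode can be reproduced in isolation. The sketch below uses std::sync::mpsc as a stand-in for the channel in the snippet above (which appears to come from the futures crate), so the error type differs, but the behavior is the same in spirit: once the receiving half is gone, every send returns an error. The function names are hypothetical.

```rust
use std::sync::mpsc::{self, Sender};

/// Mirrors the constructor above: the receiver is bound to `_rx`,
/// which goes out of scope (and is dropped) when this function returns.
pub fn make_sender() -> Sender<u32> {
    let (tx, _rx) = mpsc::channel();
    tx
}

/// Reports a dead socket without panicking: the caller decides what
/// to do when nobody is listening on the other end.
pub fn try_report(tx: &Sender<u32>, fd: u32) -> Result<(), String> {
    tx.send(fd)
        .map_err(|e| format!("Failed to terminate connection: {}", e))
}
```

Returning the error instead of calling expect() would at least keep the background task from panicking while the receiving side is missing.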

[inetstack] MTU Discovery

Description

Currently, the MTU size is hardcoded as a constant in the network::consts module. Instead, we should implement an auto-discovery function for this value.

[pdpix] Cleanup Hooks

Description

We should add cleanup hooks to the initialization function of Demikernel, so as to enable a more transparent environment cleanup experience. These hooks should, for instance, wait for all pending operations to complete and release any allocated resources.
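
One way to get this behavior in Rust is to tie the hooks to a value returned by initialization and run them on drop. The sketch below is only an illustration under that assumption: the Runtime type and its init() and on_cleanup() methods are hypothetical and are not part of the pdpix API.

```rust
/// Stand-in for the handle returned by initialization. (Hypothetical.)
pub struct Runtime {
    hooks: Vec<Box<dyn FnOnce()>>,
}

impl Runtime {
    /// Plays the role of the initialization function.
    pub fn init() -> Self {
        Runtime { hooks: Vec::new() }
    }

    /// Registers a hook to run when the runtime is torn down.
    pub fn on_cleanup<F: FnOnce() + 'static>(&mut self, hook: F) {
        self.hooks.push(Box::new(hook));
    }
}

impl Drop for Runtime {
    fn drop(&mut self) {
        // Run hooks in reverse registration order, so that whatever was
        // set up last is torn down first.
        while let Some(hook) = self.hooks.pop() {
            hook();
        }
    }
}
```

With this shape, a hook that waits for pending operations would be registered after the hook that releases resources, so that it runs first.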

[inetstack] Fast `connect()`

Description

When the remote is already in the ARP cache, we can greatly speed up the connect operation by inlining a fast path in protocols::tcp::active_open::new().

[inetstack] Enable Multiple Waiters for the Same Address

Description

In the ARP cache, we should enable multiple waiters for the same address.
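
A minimal sketch of the idea, assuming a cache that keeps a list of waiters per address instead of a single one. The ArpCache type here is hypothetical, and std::sync::mpsc channels stand in for whatever wake-up mechanism the stack actually uses.

```rust
use std::collections::HashMap;
use std::net::Ipv4Addr;
use std::sync::mpsc::{self, Receiver, Sender};

pub type MacAddress = [u8; 6];

/// Toy ARP cache with several waiters per address. (Hypothetical type.)
#[derive(Default)]
pub struct ArpCache {
    resolved: HashMap<Ipv4Addr, MacAddress>,
    waiters: HashMap<Ipv4Addr, Vec<Sender<MacAddress>>>,
}

impl ArpCache {
    /// Returns a receiver that yields the MAC address for `ip`.
    /// If the address is already cached, the answer is queued at once;
    /// otherwise the caller joins the list of waiters for that address.
    pub fn wait_for(&mut self, ip: Ipv4Addr) -> Receiver<MacAddress> {
        let (tx, rx) = mpsc::channel();
        match self.resolved.get(&ip) {
            Some(mac) => {
                let _ = tx.send(*mac);
            }
            None => self.waiters.entry(ip).or_default().push(tx),
        }
        rx
    }

    /// Records a resolution and wakes every waiter for `ip`.
    /// Returns how many waiters were still listening.
    pub fn insert(&mut self, ip: Ipv4Addr, mac: MacAddress) -> usize {
        self.resolved.insert(ip, mac);
        let waiters = self.waiters.remove(&ip).unwrap_or_default();
        let mut woken = 0;
        for tx in waiters {
            if tx.send(mac).is_ok() {
                woken += 1;
            }
        }
        woken
    }
}
```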

How to Reproduce


sudo -E RUST_LOG=trace CONFIG_PATH=~/config.yaml LIBOS=catpowder ./bin/examples/rust/tcp-accept.elf --address 172.19.17.86:12345 --peer server --nclients 128 --run-mode parallel

sudo -E RUST_LOG=trace CONFIG_PATH=~/config.yaml LIBOS=catpowder ./bin/examples/rust/tcp-accept.elf --address 172.19.17.86:12345 --peer client --nclients 128 --run-mode parallel

Updates

  • The tcp-accept operation exposes this bug.

[inetstack] Improve Resource Usage of TCP Layer

Description

We should implement the following features in order to improve the resource usage of the TCP layer:

  • Memory usage
  • Better data structure for open ports
  • Slim down TCP control block
  • Slim down TCP background workers

[inetstack] Handle Cancellation of Ping Requests

Description

When sending ping requests, we spawn a coroutine to wait for the corresponding response and we enqueue this coroutine for later processing.

Unfortunately, if the remote peer does not reply to the ping or the ping packet gets lost, the queue of pending receive coroutines will grow indefinitely.

https://github.com/demikernel/demikernel/blob/c2c867505a07dc4ac3831fc03d2163202997a0e2/src/rust/inetstack/protocols/icmpv4/peer.rs#L289-L292

Proposed Solution

We should drain the queue of pending responses from time to time.
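
A sketch of what periodic draining could look like, assuming each pending request is stamped with a deadline when it is enqueued. The PendingPings type and its fields are hypothetical and do not reflect the layout in icmpv4/peer.rs. The current time is passed in explicitly so the logic is easy to test.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// One outstanding echo request. (Hypothetical type.)
pub struct PendingPing {
    pub id: u16,
    pub seq_num: u16,
    pub deadline: Instant,
}

/// Queue of outstanding echo requests with a fixed timeout.
pub struct PendingPings {
    queue: VecDeque<PendingPing>,
    timeout: Duration,
}

impl PendingPings {
    pub fn new(timeout: Duration) -> Self {
        PendingPings { queue: VecDeque::new(), timeout }
    }

    /// Enqueues a request sent at time `now`.
    pub fn push(&mut self, id: u16, seq_num: u16, now: Instant) {
        let deadline = now + self.timeout;
        self.queue.push_back(PendingPing { id, seq_num, deadline });
    }

    /// Removes the entry matching a received reply, if any.
    pub fn complete(&mut self, id: u16, seq_num: u16) -> bool {
        match self.queue.iter().position(|p| p.id == id && p.seq_num == seq_num) {
            Some(index) => {
                self.queue.remove(index);
                true
            }
            None => false,
        }
    }

    /// Drops every entry whose deadline has passed; returns how many.
    pub fn drain_expired(&mut self, now: Instant) -> usize {
        let before = self.queue.len();
        self.queue.retain(|p| p.deadline > now);
        before - self.queue.len()
    }

    pub fn len(&self) -> usize {
        self.queue.len()
    }
}
```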

[perftools] Dump Timer Information in Seconds

Description

Currently, we dump timer information in cycles, which is not user-friendly. Instead, it would be nice to report these statistics in seconds.

Caveats

  • We get cycle counts using the rdtscp instruction. To present reliable information to the end user by simply dividing the number of cycles by the operating frequency, we must ensure that the time stamp counter runs in invariant mode. Otherwise, we should come up with some approximation strategy.
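
The conversion itself is a division, sketched below under the assumption that the counter is invariant and its frequency in hertz is already known. How tsc_hz is obtained is out of scope here, and the cycles_to_duration name is hypothetical.

```rust
use std::time::Duration;

/// Converts a cycle count to wall-clock time, given the (assumed
/// constant) time stamp counter frequency in hertz.
/// Returns `None` when the frequency is zero.
pub fn cycles_to_duration(cycles: u64, tsc_hz: u64) -> Option<Duration> {
    if tsc_hz == 0 {
        return None;
    }
    let secs = cycles / tsc_hz;
    let rem = cycles % tsc_hz;
    // `rem < tsc_hz`, so the result is below one billion and fits in u32.
    // The multiplication is done in u128 to avoid overflow.
    let nanos = (rem as u128 * 1_000_000_000u128 / tsc_hz as u128) as u32;
    Some(Duration::new(secs, nanos))
}
```

Splitting the value into whole seconds and a remainder avoids the precision loss of a single floating-point division on large cycle counts.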

[inetstack] Re-Enable Entry Eviction

Description

We have currently disabled entry eviction in the ARP cache due to missing functionality in the network stack.

Once we introduce it, we should re-enable eviction in catnip::protocols::arp::peer::background().

Related Issues

[inetstack] Bad Type Casting

Description

In protocols::tcp::established::ControlBlock::receive_data() and potentially elsewhere, we are casting fixed-size integers to usize to perform some checks.

We should do it the other way around.
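
One reading of this, sketched below, is to convert the usize into the fixed-size type with a checked conversion and compare in that type. The fits_in_window function is hypothetical and is not taken from receive_data().

```rust
use std::convert::TryFrom;

/// Checks whether a payload of `len` bytes fits in a window expressed
/// as a fixed-size `u32`. The `usize` is converted to `u32` with a
/// checked conversion rather than widening the `u32` to `usize`.
pub fn fits_in_window(len: usize, window: u32) -> bool {
    match u32::try_from(len) {
        Ok(len) => len <= window,
        // A length that does not even fit in `u32` cannot fit in the window.
        Err(_) => false,
    }
}
```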

[inetstack] Handle `RST` Packets

Description

Currently, we are not handling RST packets. For instance, in protocols::tcp::peer::Inner::send_rst() we are not waiting for ARP replies if needed.

[runtime] Make `RECEIVE_BATCH_SIZE` Generic

Description

We currently have RECEIVE_BATCH_SIZE hardcoded as a constant value in network::consts. Instead, we should make it generically available in network::NetworkRuntime.
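
One option is an associated constant on the trait, so that each runtime picks its own batch size. The sketch below is an assumption about what that could look like: the trait is heavily simplified, and poll_one(), receive() and LoopbackRuntime are hypothetical names rather than the real NetworkRuntime interface.

```rust
use std::collections::VecDeque;

/// Simplified stand-in for the runtime trait. (Hypothetical shape.)
pub trait NetworkRuntime {
    /// Maximum number of packets returned by one `receive` call.
    const RECEIVE_BATCH_SIZE: usize;

    /// Returns the next available packet, if any.
    fn poll_one(&mut self) -> Option<Vec<u8>>;

    /// Collects up to `RECEIVE_BATCH_SIZE` packets.
    fn receive(&mut self) -> Vec<Vec<u8>> {
        let mut batch = Vec::with_capacity(Self::RECEIVE_BATCH_SIZE);
        while batch.len() < Self::RECEIVE_BATCH_SIZE {
            match self.poll_one() {
                Some(packet) => batch.push(packet),
                None => break,
            }
        }
        batch
    }
}

/// Toy runtime backed by an in-memory queue.
pub struct LoopbackRuntime {
    pub queue: VecDeque<Vec<u8>>,
}

impl NetworkRuntime for LoopbackRuntime {
    const RECEIVE_BATCH_SIZE: usize = 4;

    fn poll_one(&mut self) -> Option<Vec<u8>> {
        self.queue.pop_front()
    }
}
```

A const generic parameter on the trait would be an alternative if fixed-size arrays are preferred over a Vec for the batch.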

Related Issues

demikernel/runtime#9

[test] Auto Kill Zombie Tests

Description

Some tests stall in the testing infrastructure and cause future tests to fail.

We should add an auto kill feature to our regression infrastructure to prevent this situation from happening.
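
A minimal sketch of such a watchdog, using only the standard library: spawn the test as a child process, poll it, and kill it once a deadline passes. The run_with_timeout name and the 10 ms polling interval are assumptions made for the example, and how this would plug into the regression scripts is left open.

```rust
use std::io;
use std::process::{Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// Runs `cmd` and waits at most `timeout` for it to finish.
/// Returns `Ok(Some(status))` if it exited in time, or `Ok(None)` if it
/// had to be killed.
pub fn run_with_timeout(cmd: &mut Command, timeout: Duration) -> io::Result<Option<ExitStatus>> {
    let mut child = cmd.spawn()?;
    let deadline = Instant::now() + timeout;
    loop {
        // Non-blocking check for termination.
        if let Some(status) = child.try_wait()? {
            return Ok(Some(status));
        }
        if Instant::now() >= deadline {
            child.kill()?;
            // Reap the process so it does not linger as a zombie.
            child.wait()?;
            return Ok(None);
        }
        thread::sleep(Duration::from_millis(10));
    }
}
```

On a Unix-like system, calling this with a command such as sleep 30 and a 200 ms timeout should return Ok(None) shortly after the deadline.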

[build] Cannot Compile `const_format` Due to Missing Feature

   Compiling const_format v0.2.14 (/home/jack/Dev/demikernel/src/rust/const_format_crates/const_format)
error[E0658]: const generics are unstable
 --> /home/jack/Dev/demikernel/src/rust/const_format_crates/const_format/src/const_generic_concatcp.rs:6:39
  |
6 | pub const fn __priv_concatenate<const LEN: usize>(input: &[PArgument]) -> LenAndArray<[u8; LEN]> {
  |                                       ^^^
  |
  = note: see issue #74878 <https://github.com/rust-lang/rust/issues/74878> for more information
  = help: add `#![feature(min_const_generics)]` to the crate attributes to enable

error[E0658]: const generics are unstable
   --> /home/jack/Dev/demikernel/src/rust/const_format_crates/const_format/src/marker_traits/format_marker.rs:268:15
    |
268 | impl<T, const N: usize> FormatMarker for [T; N] {
    |               ^
    |
    = note: see issue #74878 <https://github.com/rust-lang/rust/issues/74878> for more information
    = help: add `#![feature(min_const_generics)]` to the crate attributes to enable

error[E0658]: const generics are unstable
   --> /home/jack/Dev/demikernel/src/rust/const_format_crates/const_format/src/fmt/str_writer.rs:458:12
    |
458 | impl<const N: usize> StrWriter<[u8; N]> {
    |            ^
    |
    = note: see issue #74878 <https://github.com/rust-lang/rust/issues/74878> for more information
    = help: add `#![feature(min_const_generics)]` to the crate attributes to enable

error: aborting due to 3 previous errors

Adding const_format = { version = "*", features = ["nightly_const_generics"] } to src/rust/catnip/Cargo.toml solves the problem.

Rustc version: rustc 1.47.0-nightly (f44c6e4e2 2020-08-24)

[scheduler] Remove drop bits

Description

We currently have 32 bytes in the WakerPage unused. We should make something useful out of this space.

One simple idea would be to halve the size of the structure itself.
