microsoft / demikernel
Kernel-Bypass LibOS Architecture
Home Page: https://aka.ms/demikernel
License: MIT License
Currently, the MTU size is hardcoded as a constant in the network::consts module. Instead, we should implement an auto-discovery function for this value.
We are missing some prerequisites in our setup script for Debian (scripts/setup/debian.sh).
Our build system is currently broken because we are not solving PHONY conflicts in libcatnip.
We should add the URL of our project's homepage to our README file.
We should refuse an incoming connection and send an RST packet if a passive socket has reached its maximum backlog length.
demikernel/src/rust/inetstack/protocols/tcp/passive_open.rs
Lines 260 to 263 in 3b1dc96
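A minimal sketch of the backlog check, assuming a passive socket keeps pending connections in a bounded queue. The names Backlog, SynAction and on_syn are hypothetical and do not come from the Demikernel sources; the u32 connection id stands in for whatever state the real socket tracks.

```rust
use std::collections::VecDeque;

/// What the passive socket should do with an incoming SYN (hypothetical type).
#[derive(Debug, PartialEq)]
enum SynAction {
    /// There is room in the backlog: keep the connection request.
    Enqueue,
    /// The backlog is full: refuse the connection and reply with an RST.
    RejectWithRst,
}

/// Hypothetical bounded backlog of pending connection ids.
struct Backlog {
    pending: VecDeque<u32>,
    max: usize,
}

impl Backlog {
    fn new(max: usize) -> Self {
        Self { pending: VecDeque::new(), max }
    }

    /// Decides what to do with a new connection request.
    fn on_syn(&mut self, conn_id: u32) -> SynAction {
        if self.pending.len() >= self.max {
            SynAction::RejectWithRst
        } else {
            self.pending.push_back(conn_id);
            SynAction::Enqueue
        }
    }
}
```

The caller would translate RejectWithRst into an actual RST segment; that part is left out here.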
In the ARP cache, we should enable multiple waiters for the same address.
sudo -E RUST_LOG=trace CONFIG_PATH=~/config.yaml LIBOS=catpowder ./bin/examples/rust/tcp-accept.elf --address 172.19.17.86:12345 --peer server --nclients 128 --run-mode parallel
sudo -E RUST_LOG=trace CONFIG_PATH=~/config.yaml LIBOS=catpowder ./bin/examples/rust/tcp-accept.elf --address 172.19.17.86:12345 --peer client --nclients 128 --run-mode parallel
The tcp-accept operation exposes this bug.
In protocols::tcp::established::ControlBlock::receive_data(), and potentially everywhere, we are casting fixed-size values to usize to perform some checks. We should do it the other way around.
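One possible reading of "the other way around" is to convert the usize into the fixed-width type with a checked conversion, rather than widening the fixed-width value. The sketch below assumes that reading; fits_in_window and its parameters are hypothetical names, not part of the actual ControlBlock code.

```rust
use std::convert::TryFrom;

/// Hypothetical check: does a payload of `len` bytes fit in a 32-bit window?
fn fits_in_window(len: usize, window: u32) -> bool {
    // Convert the usize into the fixed-width type. A length that does not
    // fit in a u32 cannot fit in the window either.
    match u32::try_from(len) {
        Ok(len32) => len32 <= window,
        Err(_) => false,
    }
}
```

Compared to an `as` cast, the checked conversion never truncates silently, so the comparison keeps its meaning regardless of the platform's pointer width.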
When sending ping requests, we spawn a coroutine to wait for the corresponding response, and we enqueue this coroutine for later processing.
Unfortunately, if the remote peer does not reply to the ping or the ping packet gets lost, the queue of pending receive coroutines will grow indefinitely.
We should drain the queue of pending responses from time to time.
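A minimal sketch of periodic draining, assuming each pending ping is stored with a deadline and that something calls drain_expired() from time to time. PendingPing, PendingPings and the sequence-number field are hypothetical; the real queue holds coroutines rather than plain records.

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Hypothetical record for a ping that is still waiting for its reply.
struct PendingPing {
    seq: u16,
    deadline: Instant,
}

/// Hypothetical queue of pending pings with a fixed timeout.
struct PendingPings {
    queue: VecDeque<PendingPing>,
    timeout: Duration,
}

impl PendingPings {
    fn new(timeout: Duration) -> Self {
        Self { queue: VecDeque::new(), timeout }
    }

    /// Registers a ping sent at `now`.
    fn push(&mut self, seq: u16, now: Instant) {
        self.queue.push_back(PendingPing { seq, deadline: now + self.timeout });
    }

    /// Drops every entry whose deadline has passed; returns how many were dropped.
    fn drain_expired(&mut self, now: Instant) -> usize {
        let before = self.queue.len();
        self.queue.retain(|p| p.deadline > now);
        before - self.queue.len()
    }
}
```

With a bound on how long an entry may live, the queue size is limited by the ping rate times the timeout instead of growing without limit.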
Running test_connect currently fails with:
Failed to terminate connection: TrySendError { kind: Disconnected }
thread 'protocols::tcp::tests::test_connect' panicked at 'Failed to terminate connection: TrySendError { kind: Disconnected }', src/protocols/tcp/established/background/mod.rs:58:14
stack backtrace:
0: rust_begin_unwind
at /rustc/ca82264ec7556a6011b9d3f1b2fd4c7cd0bc8ae2/library/std/src/panicking.rs:493:5
1: core::panicking::panic_fmt
at /rustc/ca82264ec7556a6011b9d3f1b2fd4c7cd0bc8ae2/library/core/src/panicking.rs:92:14
2: core::result::unwrap_failed
at /rustc/ca82264ec7556a6011b9d3f1b2fd4c7cd0bc8ae2/library/core/src/result.rs:1355:5
3: core::result::Result<T,E>::expect
at /home/gatowololo/.rustup/toolchains/nightly-2021-05-10-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/result.rs:997:23
4: catnip::protocols::tcp::established::background::background::{{closure}}
at ./src/protocols/tcp/established/background/mod.rs:56:9
This line:
dead_socket_tx
.unbounded_send(fd)
.expect("Failed to terminate connection");
As the error message states, the unbounded_send always fails because the receiver is disconnected. So we always panic here.
Looking at the code, it seems the channel pair is created here:
impl<RT: Runtime> Peer<RT> {
    pub fn new(rt: RT, arp: arp::Peer<RT>, file_table: FileTable) -> Self {
        let (tx, _rx) = mpsc::unbounded();
        let inner = Rc::new(RefCell::new(Inner::new(rt.clone(), arp, file_table, tx)));
        Self { inner }
    }
    ...
}
As you can see, the receiver is immediately dropped. So I don't see how this code is ever supposed to not fail. Or what is the point of it, since it doesn't seem to do anything?
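The failure mode can be reproduced in isolation. The sketch below uses std::sync::mpsc as a stand-in (an assumption; the code above calls mpsc::unbounded(), which is a different channel type), but the behaviour is analogous: once the receiving half is dropped, every send returns an error.

```rust
use std::sync::mpsc;

/// Returns true if sending fails after the receiver has been dropped.
fn send_after_receiver_dropped() -> bool {
    let (tx, rx) = mpsc::channel::<u32>();
    // Dropping the receiver disconnects the channel, just like binding it
    // to `_rx` and letting it go out of scope at the end of `new()`.
    drop(rx);
    tx.send(42).is_err()
}

/// Returns true if the value is delivered while the receiver is kept alive.
fn send_with_live_receiver() -> bool {
    let (tx, rx) = mpsc::channel::<u32>();
    let sent = tx.send(42).is_ok();
    sent && rx.recv() == Ok(42)
}
```

So one way out would be to keep the receiver alive somewhere and actually poll it; whether that matches the intended design is not clear from the code.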
We should add cleanup hooks to the initialization function of demikernel, so as to enable a more transparent environment cleanup experience. This function should for instance wait on all pending operations to complete as well as release any allocated resources.
When receiving data in active sockets, we should handle window size overflow errors properly. See ActiveOpenSocket::receive().
We are currently missing remove() for HashTtlCache.
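A rough sketch of what remove() could look like on a TTL cache backed by a HashMap. TtlCache below is a simplified, hypothetical stand-in written for illustration; it does not mirror the actual HashTtlCache layout or API.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::time::{Duration, Instant};

/// Simplified, hypothetical TTL cache: each value is stored with its expiry time.
struct TtlCache<K, V> {
    map: HashMap<K, (V, Instant)>,
    ttl: Duration,
}

impl<K: Hash + Eq, V> TtlCache<K, V> {
    fn new(ttl: Duration) -> Self {
        Self { map: HashMap::new(), ttl }
    }

    /// Inserts a value that expires `ttl` after `now`.
    fn insert(&mut self, key: K, value: V, now: Instant) {
        self.map.insert(key, (value, now + self.ttl));
    }

    /// Returns the value if it is present and has not expired yet.
    fn get(&self, key: &K, now: Instant) -> Option<&V> {
        self.map
            .get(key)
            .filter(|(_, expires)| *expires > now)
            .map(|(value, _)| value)
    }

    /// Removes the entry, expired or not, and returns the stored value.
    fn remove(&mut self, key: &K) -> Option<V> {
        self.map.remove(key).map(|(value, _)| value)
    }
}
```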
When the remote is already in the ARP cache, we can greatly speed up the connect operation by inlining the fast path in protocols::tcp::active_open::new().
We should rely on reusable workflows for GitHub Actions.
We should allow device parameters to be specified through environment variables.
Currently these are hardcoded when using the C bindings.
Currently, we dump timer information in cycles, which is not user-friendly. Instead, it would be nice to report these statistics in seconds.
Cycle counts are read with the rdtscp instruction. Therefore, in order to present reliable information to the end user by simply dividing the number of cycles by the operating frequency, we must ensure that the time stamp counter runs in invariant mode. Otherwise, we should come up with some approximation strategy.
In protocols::tcp::active_open::receive(), we currently panic on a bad window scale instead of failing or rounding down the provided value.
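For the rounding-down option, RFC 7323 caps the window scale shift count at 14 and says a larger received value must be treated as 14. A minimal sketch of that clamp follows; the function names are hypothetical and not taken from active_open.

```rust
/// Largest window scale shift count allowed by RFC 7323.
const MAX_WINDOW_SCALE: u8 = 14;

/// Rounds a received shift count down to the largest legal value
/// instead of panicking on it.
fn clamp_window_scale(shift: u8) -> u8 {
    shift.min(MAX_WINDOW_SCALE)
}

/// Applies the (clamped) shift to a 16-bit window field.
/// 65535 << 14 still fits in a u32, so this cannot overflow.
fn scaled_window(window: u16, shift: u8) -> u32 {
    (window as u32) << clamp_window_scale(shift)
}
```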
We should handle reset generation correctly [RFC 793].
If the connection does not exist (CLOSED) then a reset is sent in response to any incoming segment except another reset. In particular, SYNs addressed to a non-existent connection are rejected by this means.
If the incoming segment has an ACK field, the reset takes its sequence number from the ACK field of the segment, otherwise the reset has sequence number zero and the ACK field is set to the sum of the sequence number and segment length of the incoming segment. The connection remains in the CLOSED state.
If the connection is in any non-synchronized state (LISTEN, SYN-SENT, SYN-RECEIVED), and the incoming segment acknowledges something not yet sent (the segment carries an unacceptable ACK), or if an incoming segment has a security level or compartment which does not exactly match the level and compartment requested for the connection, a reset is sent.
If our SYN has not been acknowledged and the precedence level of the incoming segment is higher than the precedence level requested then either raise the local precedence level (if allowed by the user and the system) or send a reset; or if the precedence level of the incoming segment is lower than the precedence level requested then continue as if the precedence matched exactly (if the remote TCP cannot raise the precedence level to match ours this will be detected in the next segment it sends, and the connection will be terminated then). If our SYN has been acknowledged (perhaps in this incoming segment) the precedence level of the incoming segment must match the local precedence level exactly, if it does not a reset must be sent.
If the incoming segment has an ACK field, the reset takes its sequence number from the ACK field of the segment, otherwise the reset has sequence number zero and the ACK field is set to the sum of the sequence number and segment length of the incoming segment. The connection remains in the same state.
If the connection is in a synchronized state (ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, TIME-WAIT), any unacceptable segment (out of window sequence number or unacceptable acknowledgment number) must elicit only an empty acknowledgment segment containing the current send-sequence number and an acknowledgment indicating the next sequence number expected to be received, and the connection remains in the same state.
If an incoming segment has a security level, or compartment, or precedence which does not exactly match the level, and compartment, and precedence requested for the connection, a reset is sent and connection goes to the CLOSED state. The reset takes its sequence number from the ACK field of the incoming segment.
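The CLOSED-state rule quoted above can be sketched as a small pure function. Segment and Reset are hypothetical, stripped-down types for illustration only (the real headers carry many more fields), and `len` is assumed to be the segment length as RFC 793 defines it, counting SYN and FIN.

```rust
/// Hypothetical view of an incoming segment.
struct Segment {
    seq: u32,
    /// Acknowledgment number, present only when the ACK bit is set.
    ack: Option<u32>,
    /// Segment length, counting SYN and FIN as RFC 793 does.
    len: u32,
    rst: bool,
}

/// Hypothetical description of the reset to send back.
#[derive(Debug, PartialEq)]
struct Reset {
    seq: u32,
    /// Some(..) means the reset also carries the ACK bit.
    ack: Option<u32>,
}

/// Reset generation for a connection that does not exist (CLOSED).
fn reset_for_closed(seg: &Segment) -> Option<Reset> {
    // Never answer a reset with another reset.
    if seg.rst {
        return None;
    }
    Some(match seg.ack {
        // ACK present: the reset takes its sequence number from it.
        Some(ack) => Reset { seq: ack, ack: None },
        // No ACK: sequence number zero, ACK = SEG.SEQ + SEG.LEN (mod 2^32).
        None => Reset { seq: 0, ack: Some(seg.seq.wrapping_add(seg.len)) },
    })
}
```

The non-synchronized states use the same sequence number rule, so this helper could probably be shared; the synchronized-state cases need different handling.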
We should add missing information in Cargo.toml
When receiving data in an active socket, we should clean up ARP query control flow. See ActiveOpenSocket::receive().
Some tests stall in the testing infrastructure and cause future tests to fail.
We should add an auto kill feature to our regression infrastructure to prevent this situation from happening.
We should set up build regression tests to avoid breaking builds in the future.
We should implement the following missing functionalities in the TCP layer:
System calls are currently prefixed with dmtr. We should change this.
Currently, when we close a TCP connection using the close() system call, we do not wait() for the connection FIN packet to arrive, thereby leaving the connection in an unknown state.
We should bump DPDK version to 22.21.
In protocols::tcp::peer::TcpPeer::connect(), we should free ephemeral ports once an active connection is closed.
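A minimal sketch of an allocator that hands ports back on close. EphemeralPorts and its methods are hypothetical names; the port range is passed in by the caller, and the sketch does not guard against releasing the same port twice.

```rust
/// Hypothetical pool of free ephemeral ports.
struct EphemeralPorts {
    free: Vec<u16>,
}

impl EphemeralPorts {
    /// Creates a pool covering `first..=last`; the lowest port is handed out first.
    fn new(first: u16, last: u16) -> Self {
        Self { free: (first..=last).rev().collect() }
    }

    /// Takes a port for a new active connection, if any is left.
    fn alloc(&mut self) -> Option<u16> {
        self.free.pop()
    }

    /// Returns a port to the pool once its connection is closed.
    fn release(&mut self, port: u16) {
        self.free.push(port);
    }
}
```

Without the release step, a long-running client eventually exhausts the range even if only a few connections are open at a time.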
After building Demikernel, we should copy build artifacts to INSTALL_PREFIX
Compiling const_format v0.2.14 (/home/jack/Dev/demikernel/src/rust/const_format_crates/const_format)
error[E0658]: const generics are unstable
--> /home/jack/Dev/demikernel/src/rust/const_format_crates/const_format/src/const_generic_concatcp.rs:6:39
|
6 | pub const fn __priv_concatenate<const LEN: usize>(input: &[PArgument]) -> LenAndArray<[u8; LEN]> {
| ^^^
|
= note: see issue #74878 <https://github.com/rust-lang/rust/issues/74878> for more information
= help: add `#![feature(min_const_generics)]` to the crate attributes to enable
error[E0658]: const generics are unstable
--> /home/jack/Dev/demikernel/src/rust/const_format_crates/const_format/src/marker_traits/format_marker.rs:268:15
|
268 | impl<T, const N: usize> FormatMarker for [T; N] {
| ^
|
= note: see issue #74878 <https://github.com/rust-lang/rust/issues/74878> for more information
= help: add `#![feature(min_const_generics)]` to the crate attributes to enable
error[E0658]: const generics are unstable
--> /home/jack/Dev/demikernel/src/rust/const_format_crates/const_format/src/fmt/str_writer.rs:458:12
|
458 | impl<const N: usize> StrWriter<[u8; N]> {
| ^
|
= note: see issue #74878 <https://github.com/rust-lang/rust/issues/74878> for more information
= help: add `#![feature(min_const_generics)]` to the crate attributes to enable
error: aborting due to 3 previous errors
Add const_format = { version = "*", features = ["nightly_const_generics"] } to src/rust/catnip/Cargo.toml to solve the problem.
Rustc version: rustc 1.47.0-nightly (f44c6e4e2 2020-08-24)
We are missing a GitHub Action workflow for TCP that exercises push-pop.
We currently have RECEIVE_BATCH_SIZE hardcoded as a constant value in network::consts. Instead, we should make this generically available in network::NetworkRuntime.
demikernel/runtime#9
We should implement close() for passive sockets.
When receiving data in passive sockets, we should handle window size overflow errors properly. See PassiveOpenSocket::receive().
We should add the following improvements to the TCP layer so as to improve performance:
We currently don't support the shutdown operation. We should, if we aim for a compliant TCP API.
We have currently disabled entry eviction in the ARP cache due to missing functionality in the network stack.
Once we introduce it, we should re-enable eviction in catnip::protocols::arp::peer::background().
We currently have 32 bytes in the WakerPage unused. We should make something useful out of this space.
One simple idea would be to halve the size of the structure itself.
Currently, we are not handling RST packets. For instance, in protocols::tcp::peer::Inner::send_rst(), we are not waiting for ARP replies if needed.
We are missing a GitHub Action workflow for TCP that exercises ping-pong.
The fallback MSS value is hardcoded.
We should compute the fallback MSS value based on the MTU and the sizes of the IP and TCP headers.
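A minimal sketch of that computation, assuming IPv4 and TCP headers without options (20 bytes each). The constant and function names are hypothetical.

```rust
/// Size of an IPv4 header without options, in bytes.
const IPV4_HEADER_SIZE: u16 = 20;
/// Size of a TCP header without options, in bytes.
const TCP_HEADER_SIZE: u16 = 20;

/// Derives the fallback MSS from the MTU.
/// Returns None if the MTU is too small to hold both headers.
fn fallback_mss(mtu: u16) -> Option<u16> {
    mtu.checked_sub(IPV4_HEADER_SIZE + TCP_HEADER_SIZE)
}
```

For a standard Ethernet MTU of 1500 this gives 1460, and for the minimum IPv4 datagram size of 576 it gives the familiar 536.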
We should implement the following features in order to improve resource usage of the TCP layer:
TCP checksum offloading seems to be malfunctioning. We should investigate and fix, if needed.
In PassiveOpenSocket::receive(), we should drop packets that show up for connections that have not been accepted yet.
We should have a unit test that sanity checks return values of system calls.
We should expose C bindings for the popfrom operation.
We should improve the "contributing" section in the README to welcome people to the project.
We are currently using watched values to drive the TCP stack. We should trigger event actions in a direct way instead.
It would be nice if the API exposed some way to have threads that could each have their own connections (through whatever libos) so applications that require multithreading are easier to implement. This would require work in each libos, because I believe each libos (e.g., DPDK or RDMA) has its own way of handling threads and queues.
The output format for the profiler tool is currently hardcoded.
It would be nice to allow a custom output format.
profiler::write()
that shall be used to output formatting.