Comments (19)

uNetworkingAB avatar uNetworkingAB commented on August 17, 2024

I've added initial io_uring in v21:

image

from fastwebsockets.

uNetworkingAB avatar uNetworkingAB commented on August 17, 2024

Oh wow, uWS is 10% faster on 16 kb echoes with writev now :D

uNetworkingAB avatar uNetworkingAB commented on August 17, 2024

For the next rerun, I have a few relevant changes in v20.41.0. On master, load_test now takes a byte length, and you can specify any length (it switches between short, medium, and long messages as needed).

uNetworkingAB avatar uNetworkingAB commented on August 17, 2024

For 16kb messages at 500 connections there's more than 100% diff:

Using message size of 16000 bytes
Running benchmark now...
Msg/sec: 60466.250000
Msg/sec: 60521.250000
Msg/sec: 61029.250000

Using message size of 16000 bytes
Running benchmark now...
Msg/sec: 124614.000000
Msg/sec: 122536.500000

So those graphs are quite misleading as of now
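
As a quick sanity check on the "more than 100%" figure, here is a minimal Rust sketch that averages the Msg/sec samples from the two runs above and computes the relative difference. The helper names (`mean`, `percent_diff`) and the constants are made up for illustration; only the sample values come from the output quoted above.

```rust
/// Arithmetic mean of a slice of samples; returns 0.0 for an empty slice.
fn mean(samples: &[f64]) -> f64 {
    if samples.is_empty() {
        return 0.0;
    }
    samples.iter().sum::<f64>() / samples.len() as f64
}

/// Percentage by which `candidate` exceeds `baseline`.
fn percent_diff(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

/// Msg/sec samples copied from the first run quoted above.
const FIRST_RUN: [f64; 3] = [60466.25, 60521.25, 61029.25];

/// Msg/sec samples copied from the second run quoted above.
const SECOND_RUN: [f64; 2] = [124614.0, 122536.5];
```

With these samples, `percent_diff(mean(&FIRST_RUN), mean(&SECOND_RUN))` comes out at a little over 100%, which is consistent with the claim.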

littledivy avatar littledivy commented on August 17, 2024

Can reproduce this 👍

Areas to improve:

  1. Payloads are always copied over; they should instead be a clone-on-write view into a shared recv buffer. I wanted to do this earlier, but Rust lifetimes won't let us do it with the current API.

We also cannot use the normal std::borrow::Cow here because masking happens in place and we need a mutable borrow of the recv buffer. Instead, something like this:

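/// Like std::borrow::Cow, but the borrowed variant holds a mutable reference
/// so the payload can be unmasked in place.
pub enum MutCow<'a, B>
where
    B: 'a + ToOwned + ?Sized,
    <B as ToOwned>::Owned: AsRef<B> + AsMut<B>,
{
    /// Mutable view into the shared recv buffer.
    Borrowed(&'a mut B),
    /// Owned copy, e.g. Vec<u8> when B is [u8].
    Owned(<B as ToOwned>::Owned),
}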

  2. Be smart about using vectored writes. I think we should just enable writev when the frame size is large enough. Alternatively, we could improve the write buffer logic with sendto.

  3. Excessive yields back to the Tokio scheduler. Under heavy load (~500 conns), I/O resources are almost always ready and quickly exhaust Tokio's coop budget, which forces the task to yield back to the scheduler so that "other tasks" get a chance to be polled.

    However, in this particular echo_server benchmark there are no "other tasks" we care about, so we essentially end up wasting time.
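
To make the vectored-write idea from the list above concrete, here is a rough std-only sketch, not the fastwebsockets implementation. It assumes an unmasked server-to-client binary frame with the RFC 6455 header layout, and `VECTORED_THRESHOLD`, `encode_header`, and `write_frame` are hypothetical names; the thread does not name a cut-off value. Small frames are coalesced into one buffer, large frames are sent as header plus payload without copying the payload.

```rust
use std::io::{self, IoSlice, Write};

/// Hypothetical cut-off; the thread does not name a value.
const VECTORED_THRESHOLD: usize = 4096;

/// Encode an unmasked binary frame header (FIN set, opcode 0x2) into `out`
/// and return the number of header bytes used.
fn encode_header(payload_len: usize, out: &mut [u8; 10]) -> usize {
    out[0] = 0x82;
    if payload_len < 126 {
        out[1] = payload_len as u8;
        2
    } else if payload_len <= u16::MAX as usize {
        out[1] = 126;
        out[2..4].copy_from_slice(&(payload_len as u16).to_be_bytes());
        4
    } else {
        out[1] = 127;
        out[2..10].copy_from_slice(&(payload_len as u64).to_be_bytes());
        10
    }
}

/// Write one frame, choosing between a coalesced write and a vectored write.
fn write_frame<W: Write>(w: &mut W, payload: &[u8]) -> io::Result<()> {
    let mut head = [0u8; 10];
    let n = encode_header(payload.len(), &mut head);
    let head = &head[..n];

    if payload.len() < VECTORED_THRESHOLD {
        // Small frame: copying into one buffer is cheap and needs one write.
        let mut buf = Vec::with_capacity(n + payload.len());
        buf.extend_from_slice(head);
        buf.extend_from_slice(payload);
        return w.write_all(&buf);
    }

    // Large frame: hand header and payload to the writer as two slices,
    // retrying until everything is written (vectored writes may be partial).
    let total = head.len() + payload.len();
    let mut written = 0;
    while written < total {
        let result = if written < head.len() {
            w.write_vectored(&[IoSlice::new(&head[written..]), IoSlice::new(payload)])
        } else {
            w.write(&payload[written - head.len()..])
        };
        match result {
            Ok(0) => return Err(io::ErrorKind::WriteZero.into()),
            Ok(k) => written += k,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
```

Any `std::io::Write` works as the target, so `write_frame(&mut Vec::new(), &payload)` is enough to try it out. A real threshold would have to be picked by benchmarking.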

littledivy avatar littledivy commented on August 17, 2024

Meh, I just realised MutCow is overkill and Frame payloads can just be a &'f mut [u8] :)
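
A rough sketch of what that could look like, assuming a simplified frame type: `BorrowedFrame` and `parse_short_frame` are hypothetical names for illustration and are not the actual fastwebsockets `Frame` API. The payload is a mutable slice of the receive buffer, so unmasking (RFC 6455, section 5.3) happens in place with no copy. Only short masked frames (payload under 126 bytes) are handled here to keep it small.

```rust
/// A frame whose payload borrows the receive buffer instead of owning a copy.
/// Hypothetical type for illustration only.
pub struct BorrowedFrame<'f> {
    pub fin: bool,
    pub opcode: u8,
    pub payload: &'f mut [u8],
}

/// XOR-unmask `payload` in place with the 4-byte masking key.
pub fn unmask(payload: &mut [u8], key: [u8; 4]) {
    for (i, byte) in payload.iter_mut().enumerate() {
        *byte ^= key[i % 4];
    }
}

/// Parse one masked frame with a payload shorter than 126 bytes from `buf`.
/// Returns `None` if the frame is incomplete, unmasked, or uses an
/// extended length (not handled in this sketch).
pub fn parse_short_frame(buf: &mut [u8]) -> Option<BorrowedFrame<'_>> {
    if buf.len() < 6 {
        return None;
    }
    let fin = buf[0] & 0x80 != 0;
    let opcode = buf[0] & 0x0F;
    let masked = buf[1] & 0x80 != 0;
    let len = (buf[1] & 0x7F) as usize;
    if !masked || len >= 126 || buf.len() < 6 + len {
        return None;
    }
    let key = [buf[2], buf[3], buf[4], buf[5]];
    // The payload is a view into `buf`; nothing is copied.
    let payload = &mut buf[6..6 + len];
    unmask(payload, key);
    Some(BorrowedFrame { fin, opcode, payload })
}
```

The trade-off is that the frame cannot outlive the receive buffer borrow, which is the lifetime constraint mentioned earlier in the thread.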

bartlomieju avatar bartlomieju commented on August 17, 2024

> Excessive yields back to the Tokio scheduler. Under heavy load (~500 conns), I/O resources are almost always ready and quickly fill up the coop budget in Tokio - this forces Tokio to yield back to the scheduler so that "other tasks" can get a chance to be polled.

Wrap the relevant task in tokio::task::unconstrained (https://docs.rs/tokio/latest/tokio/task/fn.unconstrained.html) to avoid forced yields.

littledivy avatar littledivy commented on August 17, 2024

Cool, I was playing with tokio-uring a while ago and it seems doable to add feature-gated code to support tokio-uring TCP streams. https://docs.rs/tokio-uring/latest/tokio_uring/net/struct.TcpStream.html#method.read

littledivy avatar littledivy commented on August 17, 2024

Published fastwebsockets 0.4.2

@uNetworkingAB you might be interested in these charts:

image

image

littledivy avatar littledivy commented on August 17, 2024

Current analysis:

fastwebsockets   uWS      conn   size    % (+/-)
197921           203761   10     20      -3%
211226           214914   200    20      -2%
213680           227030   500    20      -5%
101496           86058    10     16386   18%
122088           97946    200    16386   25%
106938           80347    500    16386   33%

uNetworkingAB avatar uNetworkingAB commented on August 17, 2024

Ah, yes, writev with 2 chunks beats write for long messages; not something I've bothered with (yet?). The short message bars make no sense though, they definitely do not match what I see here. I see at least 40% better short message perf (1 kb and less) with uWS. You never tried v21, right? Even v20 beats fastwebsockets v0.4.2 on small messages by at least 15%, but the diff is extremely apparent in v21.

littledivy avatar littledivy commented on August 17, 2024

Does v21 use epoll/kqueue by default for EchoServer?

uNetworkingAB avatar uNetworkingAB commented on August 17, 2024

Don't get me wrong, this competition is good. I'm already looking at adding no-copy writev sends for anything above a threshold. This is good, and I can confirm those numbers, but current short message numbers are way off.

v21 defaults to epoll. There is a release post on how to compile with io_uring, but you need Linux 6.0 or later.

littledivy avatar littledivy commented on August 17, 2024

Small msgs with uWS v21 EchoServer

fastwebsockets   uWS      conn   size   % (+/-)
191362           208341   10     20     -8%
211942           216165   200    20     -1.9%
200574           224980   500    20     -10%

Linux divy 5.19.0-1022-gcp #24~22.04.1-Ubuntu SMP x86_64 GNU/Linux

32GiB System memory
Intel(R) Xeon(R) CPU @ 3.10GHz

It does degrade by up to 10%, but I cannot reproduce the drastic ~40% here.

uNetworkingAB avatar uNetworkingAB commented on August 17, 2024

It needs Linux 6.0. You are on 5.19. You also need to recompile the load_test so that it uses io_uring. Otherwise you just have epoll trying to stress io_uring. You know it's right if strace only lists io_uring_enter, for both EchoServer and load_test.
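
A benchmark harness could guard against this kind of mismatch by checking the kernel release before enabling an io_uring build. Below is a minimal sketch of such a check; the function names are hypothetical, and the (6, 0) minimum is simply the version stated above.

```rust
/// Parse `major.minor` from a kernel release string such as
/// "5.19.0-1022-gcp". Returns `None` if the string does not start
/// with two numeric components.
fn parse_kernel_release(release: &str) -> Option<(u32, u32)> {
    let mut parts = release.split(|c: char| !c.is_ascii_digit());
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// True if `release` parses and is at least `min` (compared as major, minor).
fn meets_minimum(release: &str, min: (u32, u32)) -> bool {
    parse_kernel_release(release).map_or(false, |version| version >= min)
}
```

For the machine in the previous comment, `meets_minimum("5.19.0-1022-gcp", (6, 0))` is false. On Linux the release string can be read from /proc/sys/kernel/osrelease.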

littledivy avatar littledivy commented on August 17, 2024

I want to compare epoll-based implementations for now to find out where the 40% degradation you see comes from.

The uWS EchoServer I compiled uses epoll, and the results above are for that build. Is the 40% diff you see because of io_uring? (Then that explains the diff.)

uNetworkingAB avatar uNetworkingAB commented on August 17, 2024

Yes 40% is from io_uring on Linux 6.0. There are features of 6.0 that are very central to that bigger diff and that's why I target this kernel version as minimum. This backend will be default as soon as it is stable, so it would be very strange to exclude it.

Anyways, first thing is probably adding this writev send path so we don't have gigantic diffs on bigger messages. I did remember why I never added it though - it's not applicable for compressed messages or SSL, so it's a very specific bypass for only non-ssl, non-compressed, big messages.
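
The conditions for that bypass can be written down as a small predicate. This is only a sketch of the rule as described in this comment, not uWS code: `SendContext`, `use_writev_bypass`, and the `WRITEV_THRESHOLD` value are all made up for illustration.

```rust
/// Facts about an outgoing message that decide which send path applies.
/// Hypothetical type for illustration only.
#[derive(Clone, Copy)]
pub struct SendContext {
    pub payload_len: usize,
    pub compressed: bool,
    pub tls: bool,
}

/// Hypothetical size cut-off; a real value would come from benchmarking.
pub const WRITEV_THRESHOLD: usize = 4096;

/// The no-copy writev path only applies to big, uncompressed, non-SSL
/// messages; everything else goes through the regular buffered send.
pub fn use_writev_bypass(ctx: SendContext) -> bool {
    !ctx.tls && !ctx.compressed && ctx.payload_len >= WRITEV_THRESHOLD
}
```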

littledivy avatar littledivy commented on August 17, 2024

Cool, the 40% diff will be relevant once fastwebsockets has an io_uring backend. Opened #31 for tracking io_uring support.

Self note: Add SSL benchmarks sometime in the future.

Anyways, I believe most of the things have been fixed and I'll continue to improve perf on small msgs (max 10% diff is fine for now). Feel free to open more related issues - this has been constructive 👍

uNetworkingAB avatar uNetworkingAB commented on August 17, 2024

Yes, competition creates incentive to improve, which is good. I will have the writev fix done any time now.
