Git Product home page Git Product logo

Comments (7)

MrCroxx avatar MrCroxx commented on September 22, 2024

The duration between propose and append_log_entries may be the main reason that cause the high latency.

Consider this request:

The request was proposed at (approximately) 14:25:00.379, but it was appended at 14:25:09.171, whose span almost takes the full part of the request latency.

image

image

from runkv.

MrCroxx avatar MrCroxx commented on September 22, 2024

The problem may be the biased strategy when selecting proposal/message/ready futures. If there is any proposal/message in the channels, ready cannot be handled.

        // [`Interval`] with default [`MissedTickBehavior::Brust`].
        let mut ticker = tokio::time::interval(RAFT_HEARTBEAT_TICK_DURATION);

        loop {
            let mut handled = !self.raft.has_ready().await;

            tokio::select! {
                biased;

                _ = ticker.tick() => {
                    self.tick().await;
                }

                Some(msg) = self.message_rx.recv() => {
                    self.step(msg).await?;
                }

                Some(proposal) = self.proposal_rx.recv() => {
                    self.propose(proposal).await?;
                }

                true = self.raft.has_ready() => {
                    self.handle_ready().await?;
                    handled = true;
                }
            }

            if !handled {
                tracing::debug!("has ready but skipped by select");
            }
        }

from runkv.

MrCroxx avatar MrCroxx commented on September 22, 2024

The latencies of the first several requests looks not that bad.

[2022-05-03T12:09:00Z INFO  runkv_wheel::service] txn inner takes: 214.048858ms
[2022-05-03T12:09:00Z INFO  runkv_wheel::service] txn inner takes: 214.313888ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 496.622797ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 496.767884ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 496.874619ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 496.962605ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 497.024285ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 497.095174ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 497.2154ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 497.304463ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 726.521743ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 726.764192ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 726.911805ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 727.030536ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 722.18851ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 722.245253ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 722.355362ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 722.433439ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 722.474536ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 722.509246ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 722.512452ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 722.52525ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 722.675ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 716.390985ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 716.392ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 716.459951ms
[2022-05-03T12:09:01Z INFO  runkv_wheel::service] txn inner takes: 716.573696ms

And then the latencies increased to:

[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.778585874s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.776506203s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.77658502s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.775064749s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.769043578s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.766367544s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.765441751s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.761491207s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.753628482s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.753706547s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.751520124s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.748699473s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.748728649s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.745165191s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.745235667s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.742317884s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.737841398s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.734233232s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.731525069s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.731588724s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.72597238s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.724493535s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.724580558s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.722938509s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.71310062s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.708483158s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.708503405s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.706522122s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.616059401s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.577987709s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.348751476s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.344788489s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.324087938s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.324066748s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.306472587s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.302453369s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.289047649s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.289096015s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.287332311s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.283898354s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.506929926s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.482430406s
[2022-05-03T12:09:27Z INFO  runkv_wheel::service] txn inner takes: 3.37454705s

from runkv.

MrCroxx avatar MrCroxx commented on September 22, 2024

As in #131 , append_log_entires may be the biggest issue for the high latency.

The raft latency in theory is: lat = leader I/O + network RTT + follower I/O + network RTT + fsm apply, and we can see in metrics that the service latency basicly match the formula, append_log_entries latency is the largest item in it.

from runkv.

MrCroxx avatar MrCroxx commented on September 22, 2024

Currently:

benches:

  1. 10 groups, 100 concurrency.
  2. 10 groups, 1 concurrency.
  3. 1 group, 1 concurrency.

image

from runkv.

MrCroxx avatar MrCroxx commented on September 22, 2024

On Ubuntu Server LTS 22.04, i7-10700, KBG40ZNS256G NVMe KIOXIA 256GB.

fio fsync latency:

  fsync/fdatasync/sync_file_range:
    sync (usec): min=894, max=36601, avg=1330.50, stdev=872.29
    sync percentiles (usec):
     |  1.00th=[  988],  5.00th=[  996], 10.00th=[ 1037], 20.00th=[ 1057],
     | 30.00th=[ 1090], 40.00th=[ 1106], 50.00th=[ 1139], 60.00th=[ 1156],
     | 70.00th=[ 1172], 80.00th=[ 1221], 90.00th=[ 2638], 95.00th=[ 2737],
     | 99.00th=[ 2868], 99.50th=[ 2966], 99.90th=[ 3523], 99.95th=[19792],
     | 99.99th=[35914]

image

from runkv.

MrCroxx avatar MrCroxx commented on September 22, 2024

Close.

from runkv.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.