Comments (7)
The duration between propose
and append_log_entries
may be the main reason that cause the high latency.
Consider this request:
The request was proposed at (approximately) 14:25:00.379
, but it was appended at 14:25:09.171
, whose span almost takes the full part of the request latency.
from runkv.
The problem may be the biased
strategy when selecting proposal/message/ready futures. If there is any proposal/message in the channels, ready cannot be handled.
// [`Interval`] with default [`MissedTickBehavior::Brust`].
let mut ticker = tokio::time::interval(RAFT_HEARTBEAT_TICK_DURATION);
loop {
let mut handled = !self.raft.has_ready().await;
tokio::select! {
biased;
_ = ticker.tick() => {
self.tick().await;
}
Some(msg) = self.message_rx.recv() => {
self.step(msg).await?;
}
Some(proposal) = self.proposal_rx.recv() => {
self.propose(proposal).await?;
}
true = self.raft.has_ready() => {
self.handle_ready().await?;
handled = true;
}
}
if !handled {
tracing::debug!("has ready but skipped by select");
}
}
from runkv.
The latencies of the first several requests looks not that bad.
[2022-05-03T12:09:00Z INFO runkv_wheel::service] txn inner takes: 214.048858ms
[2022-05-03T12:09:00Z INFO runkv_wheel::service] txn inner takes: 214.313888ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 496.622797ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 496.767884ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 496.874619ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 496.962605ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 497.024285ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 497.095174ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 497.2154ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 497.304463ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 726.521743ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 726.764192ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 726.911805ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 727.030536ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 722.18851ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 722.245253ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 722.355362ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 722.433439ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 722.474536ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 722.509246ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 722.512452ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 722.52525ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 722.675ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 716.390985ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 716.392ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 716.459951ms
[2022-05-03T12:09:01Z INFO runkv_wheel::service] txn inner takes: 716.573696ms
And then the latencies increased to:
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.778585874s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.776506203s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.77658502s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.775064749s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.769043578s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.766367544s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.765441751s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.761491207s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.753628482s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.753706547s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.751520124s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.748699473s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.748728649s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.745165191s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.745235667s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.742317884s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.737841398s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.734233232s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.731525069s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.731588724s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.72597238s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.724493535s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.724580558s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.722938509s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.71310062s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.708483158s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.708503405s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.706522122s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.616059401s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.577987709s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.348751476s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.344788489s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.324087938s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.324066748s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.306472587s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.302453369s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.289047649s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.289096015s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.287332311s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.283898354s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.506929926s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.482430406s
[2022-05-03T12:09:27Z INFO runkv_wheel::service] txn inner takes: 3.37454705s
from runkv.
As in #131 , append_log_entires
may be the biggest issue for the high latency.
The raft latency in theory is: lat = leader I/O + network RTT + follower I/O + network RTT + fsm apply
, and we can see in metrics that the service latency basicly match the formula, append_log_entries
latency is the largest item in it.
from runkv.
Currently:
benches:
- 10 groups, 100 concurrency.
- 10 groups, 1 concurrency.
- 1 group, 1 concurrency.
from runkv.
On Ubuntu Server LTS 22.04, i7-10700, KBG40ZNS256G NVMe KIOXIA 256GB.
fio fsync latency:
fsync/fdatasync/sync_file_range:
sync (usec): min=894, max=36601, avg=1330.50, stdev=872.29
sync percentiles (usec):
| 1.00th=[ 988], 5.00th=[ 996], 10.00th=[ 1037], 20.00th=[ 1057],
| 30.00th=[ 1090], 40.00th=[ 1106], 50.00th=[ 1139], 60.00th=[ 1156],
| 70.00th=[ 1172], 80.00th=[ 1221], 90.00th=[ 2638], 95.00th=[ 2737],
| 99.00th=[ 2868], 99.50th=[ 2966], 99.90th=[ 3523], 99.95th=[19792],
| 99.99th=[35914]
from runkv.
Close.
from runkv.
Related Issues (20)
- optimize concurrent writes for ObjectLsmTree memory states
- purge raft log files with low utilization rate HOT 1
- introduce tracing with opentelemetry and jaeger HOT 1
- refactor: rename group in raft log store
- bug: some requests are not applied or applied out of order without error HOT 1
- bug: abnormal fsync latency jitter HOT 3
- bug: multi version of one key are separated into two sstables in the same level HOT 1
- bug: bench_kv sticks HOT 4
- bug: tend to delete not-exist sstable after comaction
- bug: raft log store read data race HOT 1
- chore: Celebrate my 100th commit for RunKV HOT 1
- File Cache as Secondary Cache HOT 1
- introduce connection pool
- replace unnecessary Bytes with Vec<u8> and &[u8] in storage module HOT 2
- bug: concurrent put & get found None but Some expected HOT 2
- Tracking: raft WAL
- enhance: avoid unnecessary buffer copy in raft log store
- enhance: better write model for batching and pipeline
- bench: port block iterator micro bench HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from runkv.