The await-tree from risingwavelabs

Maybe add throttling for the "GC" operation while tracing new root futures?

I noticed there is a TODO in Registry::register:

Lines 79 to 81 in 334b563

    pub fn register(&mut self, key: K, root_span: impl Into<Span>) -> TreeRoot {
        // TODO: make this more efficient
        self.contexts.retain(|_, v| v.upgrade().is_some());

I guess this makes spawning a task an O(m) operation, where m is the number of currently running tasks. Perhaps we can add a new config option like gc_throttle_duration: Option<Duration> (default None). Once it is set to Some, the registry keeps a coarse timestamp of the last GC, and register calls that arrive within that period will not trigger another GC.
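The throttling idea above could look roughly like the following. This is a minimal sketch and not the crate's actual code: ThrottledRegistry, last_gc, len, and the generic T that stands in for the per-task tree context are hypothetical names. Only gc_throttle_duration and the retain call come from the proposal.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::{Arc, Weak};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the registry; `T` plays the role of the per-task context.
pub struct ThrottledRegistry<K, T> {
    contexts: HashMap<K, Weak<T>>,
    /// `None` keeps the existing behaviour: sweep on every `register` call.
    gc_throttle_duration: Option<Duration>,
    /// Coarse timestamp of the most recent sweep.
    last_gc: Option<Instant>,
}

impl<K: Eq + Hash, T> ThrottledRegistry<K, T> {
    pub fn new(gc_throttle_duration: Option<Duration>) -> Self {
        Self {
            contexts: HashMap::new(),
            gc_throttle_duration,
            last_gc: None,
        }
    }

    /// Registers `value` under `key`. Dead entries are swept only if no
    /// throttle is configured or the throttle window has elapsed.
    pub fn register(&mut self, key: K, value: T) -> Arc<T> {
        let now = Instant::now();
        let due = match (self.gc_throttle_duration, self.last_gc) {
            (Some(window), Some(last)) => now.duration_since(last) >= window,
            _ => true,
        };
        if due {
            // The O(m) sweep from the TODO, now executed at most once per window.
            self.contexts.retain(|_, v| v.upgrade().is_some());
            self.last_gc = Some(now);
        }
        let strong = Arc::new(value);
        self.contexts.insert(key, Arc::downgrade(&strong));
        strong
    }

    /// Number of entries currently stored, including dead ones not yet swept.
    pub fn len(&self) -> usize {
        self.contexts.len()
    }
}
```

The trade-off in this sketch is that entries for finished tasks can linger for up to one throttle window, so any code that walks the map would have to skip weak references that no longer upgrade.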

I have written a simple benchmark that spawns many background tasks (not really common in the real world, but this case is often used to show how coroutines are lighter than threads):

    fn traced_multi_thread(size: usize) {
        let pool = tokio::runtime::Builder::new_multi_thread()
            .enable_time()
            .build()
            .unwrap();
        let futs = (0..size)
            .map(|_| {
                pool.spawn(root!(async {
                    tokio::time::sleep(std::time::Duration::from_millis(100)).await
                }))
            })
            .collect::<Vec<_>>();
        pool.block_on(futures::future::join_all(futs));
    }

Here root is a simple macro that passes the current source location to the register function. It does roughly the following:

    pub fn root<T>(
        fut: impl Future<Output = T>,
    ) -> impl Future<Output = T> {
        let id = TID.fetch_add(1, SeqCst);
        REG.lock()
            .unwrap()
            .register(id, concat!(file!(), ":", line!(), ",", column!()))
            .instrument(fut)
    }

The result shows that, compared to the baseline with 10,000 tasks, there is a roughly 3x performance regression, which I think is huge.

running 2 tests
test bench_async_trace::multi_thread::baseline_multi_thread_10000 ... bench: 101,856,955 ns/iter (+/- 345,245)
test bench_async_trace::multi_thread::traced_multi_thread_10000   ... bench: 308,596,986 ns/iter (+/- 3,973,652)

I haven't done further work yet. If I made a mistake in the benchmark or elsewhere, please let me know. If the new option is acceptable, I'm glad to do further profiling to test whether the solution is effective and to provide a patch.

How to add instrumentation for functions that return `Poll<T>`?

Hi, OpenDAL is currently developing an AwaitTreeLayer to give our users the ability to dump the execution tree at runtime: apache/opendal#2623

The missing part is: how can we instrument functions that return Poll<T>? For example, AsyncRead has an API like the following:

    pub trait AsyncRead {
        // Required method
        fn poll_read(
            self: Pin<&mut Self>,
            cx: &mut Context<'_>,
            buf: &mut [u8]
        ) -> Poll<Result<usize, Error>>;
    }

Can we add instrumentation for every poll operation?
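One way to picture what "instrumenting every poll" might mean is a wrapper that opens a span at the start of each poll_read call and closes it when the call returns. The sketch below is only an illustration under stated assumptions and does not use await-tree's API: SpanGuard, SPAN_LOG, PollRead (a local mirror of the poll_read signature), Instrumented, StaticBytes, NoopWake, and demo are all hypothetical names, and the thread-local log stands in for whatever context the real library would update.

```rust
use std::cell::RefCell;
use std::io;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

thread_local! {
    /// Hypothetical stand-in for the tracing context: a log of span events.
    static SPAN_LOG: RefCell<Vec<String>> = RefCell::new(Vec::new());
}

/// Records span entry on creation and span exit on drop.
struct SpanGuard(&'static str);

impl SpanGuard {
    fn enter(name: &'static str) -> Self {
        SPAN_LOG.with(|l| l.borrow_mut().push(format!("enter {}", name)));
        SpanGuard(name)
    }
}

impl Drop for SpanGuard {
    fn drop(&mut self) {
        SPAN_LOG.with(|l| l.borrow_mut().push(format!("exit {}", self.0)));
    }
}

/// Local mirror of the `poll_read` shape, so the sketch has no dependencies.
pub trait PollRead {
    fn poll_read(
        self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut [u8],
    ) -> Poll<io::Result<usize>>;
}

/// Wrapper that opens a span around every `poll_read` call of the inner reader.
pub struct Instrumented<R> {
    pub inner: R,
    pub span: &'static str,
}

impl<R: PollRead + Unpin> PollRead for Instrumented<R> {
    fn poll_read(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut [u8],
    ) -> Poll<io::Result<usize>> {
        // The guard lives for exactly one poll call.
        let _guard = SpanGuard::enter(self.span);
        Pin::new(&mut self.inner).poll_read(cx, buf)
    }
}

/// Toy reader that is always ready and copies from a static byte slice.
pub struct StaticBytes(pub &'static [u8]);

impl PollRead for StaticBytes {
    fn poll_read(
        self: Pin<&mut Self>,
        _cx: &mut Context<'_>,
        buf: &mut [u8],
    ) -> Poll<io::Result<usize>> {
        let n = self.0.len().min(buf.len());
        buf[..n].copy_from_slice(&self.0[..n]);
        Poll::Ready(Ok(n))
    }
}

/// Waker that does nothing, enough to build a `Context` for a single poll.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls an instrumented reader once and returns the recorded span events.
pub fn demo() -> Vec<String> {
    SPAN_LOG.with(|l| l.borrow_mut().clear());
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut reader = Instrumented { inner: StaticBytes(b"abc"), span: "poll_read" };
    let mut buf = [0u8; 8];
    let res = Pin::new(&mut reader).poll_read(&mut cx, &mut buf);
    assert!(matches!(res, Poll::Ready(Ok(3))));
    SPAN_LOG.with(|l| l.borrow().clone())
}
```

In this sketch the span exists only while a poll call is executing, so a reader that returned Poll::Pending would leave no open span between polls. Keeping the span alive across polls until Ready would require storing the guard in the wrapper, which is a different design choice.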
