
Tokio Metrics

Provides utilities for collecting metrics from a Tokio application, including runtime and per-task metrics.

[dependencies]
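# note: default-features = false still provides task metrics; the rt feature
# (on by default) is only needed for the runtime metrics described below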
tokio-metrics = { version = "0.3.1", default-features = false }

Getting Started With Task Metrics

Use TaskMonitor to instrument tasks before spawning them, and to observe metrics for those tasks. All tasks instrumented with a given TaskMonitor aggregate their metrics together. To split out metrics for different kinds of tasks, use a separate TaskMonitor for each kind (see the sketch after the example below).

use std::time::Duration;

// a stand-in for real application work
async fn do_work() {
    tokio::time::sleep(Duration::from_millis(100)).await;
}

#[tokio::main]
async fn main() {
    // construct a TaskMonitor
    let monitor = tokio_metrics::TaskMonitor::new();

    // print task metrics every 500ms
    {
        let frequency = Duration::from_millis(500);
        let monitor = monitor.clone();
        tokio::spawn(async move {
            for metrics in monitor.intervals() {
                println!("{:?}", metrics);
                tokio::time::sleep(frequency).await;
            }
        });
    }

    // instrument some tasks, spawn them, and await each in turn;
    // awaiting the handle keeps this loop from spawning without bound
    loop {
        tokio::spawn(monitor.instrument(do_work())).await.unwrap();
    }
}
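As noted above, separate TaskMonitors keep metrics for different kinds of tasks apart. A minimal sketch, in which handle_reads and handle_writes are hypothetical stand-ins for two distinct kinds of application work:

use std::time::Duration;

async fn handle_reads() { tokio::time::sleep(Duration::from_millis(10)).await }
async fn handle_writes() { tokio::time::sleep(Duration::from_millis(10)).await }

#[tokio::main]
async fn main() {
    // one monitor per kind of task; each aggregates only its own tasks
    let read_monitor = tokio_metrics::TaskMonitor::new();
    let write_monitor = tokio_metrics::TaskMonitor::new();

    let reads = tokio::spawn(read_monitor.instrument(handle_reads()));
    let writes = tokio::spawn(write_monitor.instrument(handle_writes()));
    let _ = tokio::join!(reads, writes);

    // the cumulative metrics of each monitor now describe the two
    // kinds of work independently
    println!("reads:  {:?}", read_monitor.cumulative());
    println!("writes: {:?}", write_monitor.cumulative());
}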

Task Metrics

Base Metrics

Derived Metrics

Getting Started With Runtime Metrics

This unstable functionality requires tokio_unstable and the rt crate feature. To enable tokio_unstable, pass --cfg tokio_unstable to rustc when compiling. You can do this by setting the RUSTFLAGS environment variable before compiling your application; e.g.:

RUSTFLAGS="--cfg tokio_unstable" cargo build

Or, by creating the file .cargo/config.toml in the root directory of your crate. If you're using a workspace, put this file in the root directory of your workspace instead.

[build]
rustflags = ["--cfg", "tokio_unstable"]
rustdocflags = ["--cfg", "tokio_unstable"] 

Putting .cargo/config.toml files below the workspace or crate root directory may lead to tools like Rust-Analyzer or VS Code not using your .cargo/config.toml, since they invoke cargo from the workspace or crate root, and cargo only looks for the .cargo directory in the current and parent directories; cargo ignores configurations in child directories. More information about where cargo looks for configuration files can be found in the Cargo Book (https://doc.rust-lang.org/cargo/reference/config.html).

If this configuration file is missing during compilation, tokio-metrics will not work, and alternating between builds with and without it will trigger full rebuilds of your project.

The rt feature of tokio-metrics is on by default; simply check that you do not set default-features = false when declaring it as a dependency; e.g.:

[dependencies]
tokio-metrics = "0.3.1"

From within a Tokio runtime, use RuntimeMonitor to monitor key metrics of that runtime.

use std::time::Duration;

// a stand-in for real application work
async fn do_work() {
    tokio::time::sleep(Duration::from_millis(100)).await;
}

// requires the rt feature (on by default) and --cfg tokio_unstable
#[tokio::main]
async fn main() {
    let handle = tokio::runtime::Handle::current();
    let runtime_monitor = tokio_metrics::RuntimeMonitor::new(&handle);

    // print runtime metrics every 500ms
    let frequency = Duration::from_millis(500);
    tokio::spawn(async move {
        for metrics in runtime_monitor.intervals() {
            println!("Metrics = {:?}", metrics);
            tokio::time::sleep(frequency).await;
        }
    });

    // run some tasks
    tokio::spawn(do_work());
    tokio::spawn(do_work());
    tokio::spawn(do_work());

    // keep the runtime alive long enough to print a few intervals
    tokio::time::sleep(Duration::from_secs(2)).await;
}

Runtime Metrics

Base Metrics

Derived Metrics

Relation to Tokio Console

Currently, Tokio Console is primarily intended for local debugging. Tokio metrics is intended to enable reporting of metrics in production to your preferred tools. Longer term, it is likely that tokio-metrics will merge with Tokio Console.

License

This project is licensed under the MIT license.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in tokio-metrics by you, shall be licensed as MIT, without any additional terms or conditions.


tokio-metrics's Issues

Compatibility with Prometheus and pull-based approach in general

Hi there!

I'd like to start exposing Tokio runtime metrics as part of my application's Prometheus metrics. Unfortunately, there are a number of conceptual differences that make tokio-metrics not really suitable for this.

Prometheus usually scrapes an application's metrics by calling an HTTP endpoint at equal time intervals. In my experience, scrape intervals range between 15 seconds and 5 minutes; the choice is a trade-off between resolution requirements and available storage resources. In any case, metric changes between two scrapes are not observable via Prometheus, so the usual best practice is to implement most metrics as non-decreasing counters and derive rates from those.

Also, since each metric scrape is a network interaction, it can fail and be retried with no guarantee of whether the request actually reached the process. Because of that, it's important for a metrics endpoint to be stateless, which the intervals iterator violates. Ideally, retrieving the current state of metrics would cause no state change at all.

Do you think that tokio-metrics is a good place to implement that kind of stuff or do you believe it targets a different type of metrics here?
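A minimal sketch of the stateless, pull-friendly shape described above, under the assumption that one reads cumulative counters via TaskMonitor::cumulative() on every scrape rather than consuming the stateful intervals() iterator (the metric names and plain-text formatting are illustrative, not an official integration):

// a stateless scrape body built from cumulative task counters;
// cumulative() only reads and does not advance interval state,
// so a failed scrape can simply be retried
fn prometheus_text(monitor: &tokio_metrics::TaskMonitor) -> String {
    let m = monitor.cumulative();
    format!(
        "# TYPE tokio_task_polls_total counter\n\
         tokio_task_polls_total {}\n\
         # TYPE tokio_task_poll_seconds_total counter\n\
         tokio_task_poll_seconds_total {}\n",
        m.total_poll_count,
        m.total_poll_duration.as_secs_f64(),
    )
}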

cargo test with 3 failures at main branch (e66d2ff654c72868b887f77bb472cf5d9bbbcc07)

~/github.com/tokio-metrics:main@e66d2ff$ RUSTFLAGS="--cfg tokio_unstable" cargo test --all-features
    Finished test [unoptimized + debuginfo] target(s) in 0.15s
     Running unittests (target/debug/deps/tokio_metrics-ec134d5a58bb3238)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests tokio-metrics

running 40 tests
test src/task.rs - task::TaskMetrics::first_poll_count (line 607) ... ok
test src/task.rs - task::TaskMetrics::instrumented_count (line 538) ... ok
test src/task.rs - task::TaskMetrics::mean_poll_duration (line 2001) ... ok
test src/task.rs - task::TaskMetrics::dropped_count (line 568) ... ok
test src/task.rs - task::TaskMetrics::total_fast_poll_count (line 1089) ... ok
test src/task.rs - task::TaskMetrics::mean_slow_poll_duration (line 2222) ... ok
test src/task.rs - task::TaskMetrics::mean_fast_poll_duration (line 2131) ... ok
test src/task.rs - task::TaskMetrics::slow_poll_ratio (line 2046) ... ok
test src/task.rs - task::TaskMetrics::mean_idle_duration (line 1881) ... ok
test src/task.rs - task::TaskMetrics::total_fast_poll_duration (line 1144) ... ok
test src/task.rs - task::TaskMetrics::total_first_poll_delay (line 648) ... ok
test src/task.rs - task::TaskMetrics::total_first_poll_delay (line 697) ... ok
test src/task.rs - task::TaskMetrics::total_first_poll_delay (line 731) ... FAILED
test src/task.rs - task::TaskMetrics::total_idle_duration (line 811) ... ok
test src/task.rs - task::TaskMetrics::total_idled_count (line 770) ... ok
test src/task.rs - task::TaskMonitor (line 306) ... ignored
test src/task.rs - task::TaskMonitor (line 321) ... ignored
test src/task.rs - task::TaskMetrics::total_poll_count (line 989) ... ok
test src/task.rs - task::TaskMetrics::total_poll_duration (line 1054) ... ok
test src/task.rs - task::TaskMetrics::total_scheduled_count (line 850) ... ok
test src/task.rs - task::TaskMetrics::mean_first_poll_delay (line 1811) ... ok
test src/task.rs - task::TaskMetrics::total_slow_poll_count (line 1211) ... ok
test src/task.rs - task::TaskMetrics::total_slow_poll_duration (line 1269) ... ok
test src/task.rs - task::TaskMonitor (line 71) - compile ... ok
test src/task.rs - task::TaskMonitor (line 362) ... FAILED
test src/task.rs - task::TaskMonitor (line 388) ... FAILED
test src/task.rs - task::TaskMonitor (line 413) ... ok
test src/lib.rs - (line 12) ... ok
test src/task.rs - task::TaskMonitor::cumulative (line 1571) ... ok
test src/task.rs - task::TaskMonitor (line 452) ... ok
test src/task.rs - task::TaskMonitor::instrument (line 1488) ... ok
test src/task.rs - task::TaskMonitor::instrument (line 1510) ... ok
test src/task.rs - task::TaskMonitor::instrument (line 1530) ... ok
test src/task.rs - task::TaskMonitor (line 281) ... ok
test src/task.rs - task::TaskMetrics::total_scheduled_duration (line 920) ... ok
test src/task.rs - task::TaskMonitor::intervals (line 1632) ... ok
test src/task.rs - task::TaskMonitor::slow_poll_threshold (line 1467) ... ok
test src/task.rs - task::TaskMonitor::with_slow_poll_threshold (line 1406) ... ok
test src/task.rs - task::TaskMetrics::mean_scheduled_duration (line 1920) ... ok
test src/task.rs - task::TaskMonitor (line 24) ... ok

failures:

---- src/task.rs - task::TaskMetrics::total_first_poll_delay (line 731) stdout ----
Test executable failed (exit code 101).

stderr:
thread 'main' panicked at 'overflow when adding duration to instant', library/std/src/time.rs:409:33
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


---- src/task.rs - task::TaskMonitor (line 362) stdout ----
Test executable failed (exit code 101).

stderr:
thread 'main' panicked at 'overflow when adding duration to instant', library/std/src/time.rs:409:33
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


---- src/task.rs - task::TaskMonitor (line 388) stdout ----
Test executable failed (exit code 101).

stderr:
thread 'main' panicked at 'overflow when adding duration to instant', library/std/src/time.rs:409:33
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace



failures:
    src/task.rs - task::TaskMetrics::total_first_poll_delay (line 731)
    src/task.rs - task::TaskMonitor (line 362)
    src/task.rs - task::TaskMonitor (line 388)

test result: FAILED. 35 passed; 3 failed; 2 ignored; 0 measured; 0 filtered out; finished in 7.52s

error: test failed, to rerun pass '--doc'

This is a macOS environment.

Crisper examples of runtime metrics.

For each task metric, it's fairly easy to write a crisp, self-contained example that reliably induces a change in a metric. For runtime metrics, it's currently not so easy to do this, because:

  1. runtime metrics are buffered
  2. some runtime metrics are dependent on scheduling pathologies that are finicky to induce

We could resolve the first obstacle by providing some mechanism to flush metrics on demand. For the second obstacle, I'm not sure there's much we can do.

Emit task metrics for single invocations instead of interval samples

Hello,

This is a feature request for some way to get the TaskMetrics for the invocation of a single future. Something like:

let monitor = tokio_metrics::TaskMonitor::new();

let (metrics, other_return_value) = monitor.instrument_single(some_future()).await;

The API usage above is not intended to be the actual API; it just illustrates the idea. I want this feature so that I can record the overhead of every single execution of the some_future() future.

The ultimate reason is that I'm writing a program that measures the latency of remote service calls, and I want to understand what kind of overhead I'm seeing as a result of using an async runtime, as opposed to a simple blocking, threaded application. I'd like to see this on a per-request basis so that I can confirm that high request latencies come only from the remote system, not from a delay in scheduling the task.
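A hedged sketch of how something close to this could be approximated with today's API: dedicate a fresh TaskMonitor to a single invocation and read its cumulative metrics once the future completes. The instrument_single helper below is hypothetical, not part of tokio-metrics:

// hypothetical helper: per-invocation metrics via a throwaway monitor
async fn instrument_single<F>(fut: F) -> (tokio_metrics::TaskMetrics, F::Output)
where
    F: std::future::Future,
{
    let monitor = tokio_metrics::TaskMonitor::new();
    // only this one future is instrumented, so the cumulative metrics
    // describe exactly this invocation
    let output = monitor.instrument(fut).await;
    (monitor.cumulative(), output)
}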

Is it worth tracking/exposing `num_scheduled`?

I think num_scheduled is going to equal num_polls - num_tasks? Need to double-check this, but if so, it doesn't need to be a field in the Metrics struct; it could be computed in a method, instead.

Should it even be exposed? @carllerche points out that this metric matters much more at the runtime level, since there are multiple ways tasks may be scheduled. For task metrics, what matters more is time spent scheduled. At least internally, we need to account for num_scheduled so we can compute mean_time_scheduled, but maybe num_scheduled doesn't actually need to be exposed.
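For reference, the derived computation mentioned above needs only the scheduled count and a total scheduled duration; a minimal sketch with illustrative names:

use std::time::Duration;

// mean time spent scheduled, derived from internal totals;
// the u32 cast is acceptable for a sketch
fn mean_time_scheduled(total_scheduled: Duration, num_scheduled: u64) -> Duration {
    if num_scheduled == 0 {
        Duration::ZERO
    } else {
        total_scheduled / num_scheduled as u32
    }
}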

0.1 TODOs

  • clean up Cargo.toml, features
    • should time be an optional feature?
  • proofread documentation
  • decide what the runtime metrics MVP is and fill the gaps
  • set up CI
  • update README
  • blog post

Fix based on changes to yield_now

Due to tokio-rs/tokio#5223, some metrics tests were broken. These need to be fixed.

failures:

---- src/task.rs - task::TaskMetrics::mean_scheduled_duration (line 1924) stdout ----
Test executable failed (exit status: 101).

stderr:
thread 'main' panicked at 'assertion failed: interval.mean_scheduled_duration() >= Duration::from_secs(1)', src/task.rs:34:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/panicking.rs:64:14
   2: core::panicking::panic
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/panicking.rs:111:5
   3: rust_out::main::{{closure}}
   4: <core::pin::Pin<P> as core::future::future::Future>::poll
   5: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}
   6: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}
   7: tokio::runtime::scheduler::current_thread::Context::enter
   8: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}
   9: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}
  10: tokio::macros::scoped_tls::ScopedKey<T>::set
  11: tokio::runtime::scheduler::current_thread::CoreGuard::enter
  12: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
  13: tokio::runtime::scheduler::current_thread::CurrentThread::block_on
  14: tokio::runtime::runtime::Runtime::block_on
  15: rust_out::main
  16: core::ops::function::FnOnce::call_once
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.


---- src/task.rs - task::TaskMetrics::total_scheduled_duration (line 922) stdout ----
Test executable failed (exit status: 101).

stderr:
thread 'main' panicked at 'assertion failed: total_scheduled_duration >= Duration::from_millis(1000)', src/task.rs:30:5
stack backtrace:
   0: rust_begin_unwind
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/std/src/panicking.rs:575:5
   1: core::panicking::panic_fmt
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/panicking.rs:64:14
   2: core::panicking::panic
             at /rustc/fc594f15669680fa70d255faec3ca3fb507c3405/library/core/src/panicking.rs:111:5
   3: rust_out::main::{{closure}}
   4: <core::pin::Pin<P> as core::future::future::Future>::poll
   5: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}::{{closure}}
   6: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}::{{closure}}
   7: tokio::runtime::scheduler::current_thread::Context::enter
   8: tokio::runtime::scheduler::current_thread::CoreGuard::block_on::{{closure}}
   9: tokio::runtime::scheduler::current_thread::CoreGuard::enter::{{closure}}
  10: tokio::macros::scoped_tls::ScopedKey<T>::set
  11: tokio::runtime::scheduler::current_thread::CoreGuard::enter
  12: tokio::runtime::scheduler::current_thread::CoreGuard::block_on
  13: tokio::runtime::scheduler::current_thread::CurrentThread::block_on
  14: tokio::runtime::runtime::Runtime::block_on
  15: rust_out::main
  16: core::ops::function::FnOnce::call_once
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.



failures:
    src/task.rs - task::TaskMetrics::mean_scheduled_duration (line 1924)
    src/task.rs - task::TaskMetrics::total_scheduled_duration (line 922)

test result: FAILED. 56 passed; 2 failed; 2 ignored; 0 measured; 0 filtered out; finished in 35.93s

UI for metrics

Hey there,
is there some kind of (maybe optionally feature gated) integrated UI planned for this?

I'm not really good at web stuff, but I guess I'll integrate a small Chart.js-driven one without data retention into my project for now. Should I share that once it's done?

compatibility with tokio

I wanted to print metrics for a Tokio example at master HEAD, but I get the error below:


error[E0308]: mismatched types
    --> examples/tinyhttp.rs:40:51
     |
40   |         let runtime_monitor = RuntimeMonitor::new(&handle);
     |                               ------------------- ^^^^^^^ expected struct `tokio::runtime::handle::Handle`, found struct `Handle`
     |                               |
     |                               arguments to this function are incorrect
     |
     = note: expected reference `&tokio::runtime::handle::Handle`
                found reference `&Handle`
     = note: perhaps two different versions of crate `tokio` are being used?
note: associated function defined here
    --> /root/github/tokio-metrics/src/runtime.rs:1015:12
     |
1015 |     pub fn new(runtime: &runtime::Handle) -> RuntimeMonitor {
     |            ^^^

For more information about this error, try `rustc --explain E0308`.
error: could not compile `examples` due to previous error

Full change in Tokio:

diff --git a/.cargo/config b/.cargo/config
index df885898..71097e3c 100644
--- a/.cargo/config
+++ b/.cargo/config
@@ -1,2 +1,5 @@
+[build]
+rustflags = ["--cfg", "tokio_unstable"]
+rustdocflags = ["--cfg", "tokio_unstable"]
 # [build]
-# rustflags = ["--cfg", "tokio_unstable"]
\ No newline at end of file
+# rustflags = ["--cfg", "tokio_unstable"]
diff --git a/examples/Cargo.toml b/examples/Cargo.toml
index b35c587b..e628ceb2 100644
--- a/examples/Cargo.toml
+++ b/examples/Cargo.toml
@@ -10,7 +10,7 @@ edition = "2018"
 tokio = { version = "1.0.0", path = "../tokio", features = ["full", "tracing"] }
 tokio-util = { version = "0.7.0", path = "../tokio-util", features = ["full"] }
 tokio-stream = { version = "0.1", path = "../tokio-stream" }
-
+tokio-metrics = { version = "0.1.0", path = "../../tokio-metrics" }
 tracing = "0.1"
 tracing-subscriber = { version = "0.3.1", default-features = false, features = ["fmt", "ansi", "env-filter", "tracing-log"] }
 bytes = "1.0.0"
@@ -24,6 +24,9 @@ httpdate = "1.0"
 once_cell = "1.5.2"
 rand = "0.8.3"

+
+
+
 [target.'cfg(windows)'.dev-dependencies.windows-sys]
 version = "0.42.0"

diff --git a/examples/tinyhttp.rs b/examples/tinyhttp.rs
index fa0bc669..0457406a 100644
--- a/examples/tinyhttp.rs
+++ b/examples/tinyhttp.rs
@@ -18,8 +18,10 @@ use futures::SinkExt;
 use http::{header::HeaderValue, Request, Response, StatusCode};
 #[macro_use]
 extern crate serde_derive;
+use std::time::Duration;
 use std::{env, error::Error, fmt, io};
 use tokio::net::{TcpListener, TcpStream};
+use tokio_metrics::RuntimeMonitor;
 use tokio_stream::StreamExt;
 use tokio_util::codec::{Decoder, Encoder, Framed};

@@ -33,6 +35,18 @@ async fn main() -> Result<(), Box<dyn Error>> {
     let server = TcpListener::bind(&addr).await?;
     println!("Listening on: {}", addr);

+    let handle = tokio::runtime::Handle::current();
+    {
+        let runtime_monitor = RuntimeMonitor::new(&handle);
+        tokio::spawn(async move {
+            for interval in runtime_monitor.intervals() {
+                // pretty-print the metric interval
+                println!("{:?}", interval);
+                // wait 500ms
+                tokio::time::sleep(Duration::from_secs(1)).await;
+            }
+        });
+    }
     loop {
         let (stream, _) = server.accept().await?;
         tokio::spawn(async move {

Command:

RUSTFLAGS="--cfg tokio_unstable" cargo run --example tinyhttp

Metric integrity in long-running applications.

Is storing durations as u64 nanoseconds enough? A u64 of nanoseconds can represent about 584 years, but if 5,000 tasks accumulate time into the same counter, you'll burn through it in about 42 days of uptime. That's plausible for a long-running application. At minimum, we should make sure it doesn't panic on overflow/underflow.
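A back-of-envelope check of those numbers (a quick sketch, not taken from the issue):

fn main() {
    let ns_per_year = 365.25 * 24.0 * 3600.0 * 1e9;
    // u64::MAX nanoseconds is roughly 584.5 years of wall time
    let capacity_years = u64::MAX as f64 / ns_per_year;
    // 5,000 tasks accumulating into one counter divide that capacity
    // by 5,000, leaving roughly 42.7 days
    let days_at_5000_tasks = capacity_years * 365.25 / 5000.0;
    println!("{capacity_years:.1} years, {days_at_5000_tasks:.1} days");
}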
