
metrics's People

Contributors

0nkery, ashtonkem, bratsinot, dav1dde, david-perez, dependabot[bot], flub, fornwall, fredr, hlbarber, jean-airoldie, jeromegn, jmagnuson, kathoum, kilpatty, kinrany, look, mozgiii, nilstrieb, nobles5e, nrxus, randers00, robjtede, rodrigopandini, rukai, str4d, superfluffy, tobz, twz123, zohnannor


metrics's Issues

We should try to create a PR for metered to add metrics-core support.

End goal: metered can use metrics as a backend.

Two main things need to happen here:

  • we would need to create a new trait in metrics-core to model ingestion
  • create a PR for metered to add IngestTraitHere support to the macro that defines the registry

We already roughly plan on having a trait to decouple the ingest side (#15), and I think we would need that to be completed before we could realistically expect a 3rd-party crate to accept a PR adding plugin-esque capabilities. The trait would lend some credibility to it being generic and not just a niche thing... even if we're the only metrics-core-compatible library currently. :)
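As a strawman, the ingestion trait could be as small as something like this (the trait name and signatures here are placeholders for discussion, not a finalized design):

// Hypothetical ingestion trait for metrics-core; the name and signatures are
// placeholders, not an agreed-upon API.
pub trait Ingest {
    fn increment_counter(&self, key: &str, value: u64);
    fn update_gauge(&self, key: &str, value: i64);
    fn record_histogram(&self, key: &str, value: u64);
}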

Allow updating a counter to a fixed value

There are cases where a counter is the semantically right choice, but instead of incrementing it ourselves, the application has to poll the counter's value from somewhere else.

My particular use case that calls for this is collecting a process' CPU times from the operating system. (How many seconds of CPU time has this process spent running on the CPU since its start?) This metric can then be processed later by the metrics system and converted into percent CPU usage. This way of counting CPU load seems to be common, e.g. see here

Two issues are present here. First, the amount of seconds is a continuum - there are no discrete increments. (That issue is discussed in #60, however.)
The second issue I want to highlight here is that this amount of seconds is polled from the operating system, and is typically returned as a fixed value that the OS has counted "for you" since the start of the process. Essentially this means you end up periodically running a polling function against the operating system, and then have some amount of seconds that you would want to set a counter to.

Currently, one has to track the difference between invocations of the polling routine and then use counter! to increment the counter by that difference, which is cumbersome.
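For illustration, the workaround today looks roughly like this sketch; poll_cpu_seconds_from_os() is a hypothetical helper, not anything provided by the crate:

use metrics::counter;

// Illustrative sketch of today's workaround: poll an absolute value from the
// OS and feed only the difference into counter!.
fn poll_cpu_seconds_from_os() -> u64 {
    // Hypothetical: read the process' cumulative CPU seconds from the OS.
    0
}

fn report_cpu_time(last_cpu_seconds: &mut u64) {
    let current = poll_cpu_seconds_from_os();
    let delta = current - *last_cpu_seconds; // bookkeeping we have to carry around
    *last_cpu_seconds = current;
    counter!("process_cpu_seconds_total", delta);
}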

Meter for core metrics?

Meter is a frequently used metric for measuring throughput, for example the QPS of our service. In Coda Hale's original metrics library, there is a meter that can report rates over 1-, 5-, and 15-minute windows.

Is there a reason for the absence of a meter? And is there any alternative way to measure this type of metric with the current core primitives?

Lazy metrics evaluation

I'm currently using rust-prometheus in my applications for monitoring, but I'm interested in this crate since it decouples the actual metrics from the exporter. The performance overhead also seems much lower. However, I have a few questions regarding a specific use case I have.

I'm using prometheus to export process-specific metrics such as CPU usage, disk usage, and network usage, provided by the OS. However, since actually obtaining these metrics is relatively expensive (for instance, parsing the contents of the /proc/<pid>/ directory), it must be done infrequently. Is it possible to have such lazily evaluated metrics that are only computed when required? Or would I need to manually evaluate these metrics at a given time interval and push them?
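Absent built-in lazy evaluation, the manual workaround looks roughly like this sketch; read_process_stats() and its fields are hypothetical stand-ins for the expensive /proc parsing:

use metrics::gauge;
use std::time::Duration;

// Sketch of manually polling expensive OS metrics on an interval and pushing
// them as gauges. read_process_stats() is a hypothetical helper.
struct ProcessStats {
    cpu_usage: i64,
    disk_usage: i64,
}

fn read_process_stats() -> ProcessStats {
    // Hypothetical: do the expensive /proc/<pid>/ parsing here, once per tick.
    ProcessStats { cpu_usage: 0, disk_usage: 0 }
}

fn spawn_process_metrics_poller() {
    std::thread::spawn(|| loop {
        let stats = read_process_stats();
        gauge!("process_cpu_usage", stats.cpu_usage);
        gauge!("process_disk_usage", stats.disk_usage);
        std::thread::sleep(Duration::from_secs(10));
    });
}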

Prometheus observer does not group by type

I'm not sure if this is the case with the prometheus scraper, but when using telegraf it is required that all metrics of the same TYPE be grouped together, even if they have different label values.

Currently this happens:

# TYPE metric_name counter
metric_name{label1="value1"} 1

# TYPE metric_name counter
metric_name{label1="value2"} 1

Desired output:

# TYPE metric_name counter
metric_name{label1="value1"} 1
metric_name{label1="value2"} 1

Parse metrics with constant keys/labels in macros

Currently an invocation like increment!(my_labels::METRIC_KEY, my_labels::METRIC_LABEL => label_value); will fail to parse in the macro (e.g., "expected literal" or "unexpected token").

It would be nice if using string constants in these positions were an option, so I don't have to repeat string literals in a bunch of places when the same metric can be produced from multiple sources (e.g. multiple implementations of the same trait).
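To make the request concrete, the desired usage would look roughly like this (the constants module and variable are made up for illustration):

// Illustrative only: today the macro rejects this because it expects string
// literals in these positions rather than paths to constants.
mod my_labels {
    pub const METRIC_KEY: &str = "requests_processed";
    pub const METRIC_LABEL: &str = "component";
}

let label_value = "transform";
increment!(my_labels::METRIC_KEY, my_labels::METRIC_LABEL => label_value);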

Exporter should handle the Recorder conversion logic.

Currently a Recorder is converted to a String via the Into<String> trait. I think a better approach would be to provide a trait method to iterate over the contents of the Recorder, so that an Exporter can implement the conversion logic directly. That way the Into<String> trait would not be required.
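One possible shape for this, with placeholder names only:

// Hypothetical traits letting an exporter walk the recorder's contents and
// build its own output, instead of going through Into<String>.
pub trait MetricsVisitor {
    fn visit_counter(&mut self, name: &str, value: u64);
    fn visit_gauge(&mut self, name: &str, value: i64);
    fn visit_histogram(&mut self, name: &str, values: &[u64]);
}

pub trait IterableRecorder {
    fn visit<V: MetricsVisitor>(&self, visitor: &mut V);
}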

Meta issue for 1.0 release.

Just creating this for some of the more abstract aspects of a 1.0 release:

  • adjust all the docs to point to metrics-runtime as the landing page since metrics is now simply the facade crate
  • ensure that every single crate has: deny missing docs, passing tests, formatted

Add common recorders/exporters to facade modules in main crate?

It's a bit cumbersome to have to import the exporters/recorders directly, i.e. use metrics_exporter_log::LogExporter, when users could instead do something like use metrics::exporters::LogExporter, which feels much more natural.

We could probably feature flag it for people who don't want the code bloat but I think this is probably what most people would prefer.

JSON and YAML observer silently drop metrics

If you have

counter!("a", 1);
counter!("a.b",1);

Both the JSON and YAML observers will only emit the "a" metric, and "a.b" will be dropped (or vice versa), presumably because the hierarchical output cannot represent "a" as both a leaf value and a nested map at the same time. The prometheus observer does the correct thing.

I can't find this behaviour documented anywhere.

To fix this would require changing the JSON and YAML output format. (Or providing an alternative, and deprecating the old one to not break things).

Should ingestion/recording be chainable or separate?

High level: we likely want to define a trait in metrics-core for the input side of things.

We have Recorder which allows composability of the output side, but we don't yet have a trait to allow composability from the input side. This trait is arguably the most useful because it means plugging in metrics to a system, or any metrics competitor, would Just Work (tm) if they implemented the trait.

Now, the question is: do we add a new trait or do we make the ingest side support Recorder?

Right now, our prescribed way to use metrics is to create a Receiver, which holds everything, and then to use a recorder/exporter to deliver those metrics to the console, or to Prometheus, or whatever external system there is. Sink, in this scenario, would implement some trait that generalizes the recording of counters/gauges/histograms.

Technically, Recorder already has matching signatures to do just that. If both Sink and exporters shared the Recorder trait, they could be literally chained together. In some cases, users may not even want/need Receiver/Sink, and they would just wire up an exporter and be, essentially, collecting/exporting a metric within the same method call.

Of course, they'd then have no aggregation of values, among other impractical things. There's also the question of whether it would really make sense to require callers to record histogram values by always passing a slice: nobody logs timing calls with a slice; that's just our concession to make it easier to feed all the recorded values to the exporter without the function-call overhead of calling record_histogram for every single datapoint.

All of that said, I think we may want to consider it. A new trait gets us a cleaner looking API contract, but closes the door on potentially interesting uses of chaining these components together arbitrarily.
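For a rough feel of what chaining enables, here is a tiny sketch; the Record trait below is a stand-in for Recorder, not its real signature:

// Illustrative sketch: a "tee" that forwards every record call to two
// recorder-like components, which is the kind of arbitrary chaining a shared
// trait on both the ingest and export sides would allow.
trait Record {
    fn record_counter(&mut self, key: &str, value: u64);
}

struct Tee<A, B>(A, B);

impl<A: Record, B: Record> Record for Tee<A, B> {
    fn record_counter(&mut self, key: &str, value: u64) {
        // Forward the same measurement to both halves of the chain, e.g. a
        // Sink-like aggregator and an exporter-facing recorder.
        self.0.record_counter(key, value);
        self.1.record_counter(key, value);
    }
}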

Grafana integration?

Hi, I'd really like to be able to use Grafana to visualise metrics. Currently, the best way to do this is to export to Prometheus and connect that to Grafana, but I'd much rather run a server that conforms to the Grafana JSON datasource API.

Does your current HTTP exporter use a standard API? Perhaps you could use that one.

Figure out if there's a way for metered and dipstick to work with metrics-core.

Both @fralalonde and @magnet have expressed interest in making their libraries (dipstick and metered, respectively) compatible with metrics/metrics-core.

Just gonna dump some thoughts here:

Ingestion

We need to come up with an ingestion trait, similar to Recorder. Until we have a trait, we can't reasonably expect these libraries, or others, to interoperate because then they'd just be tying themselves to metrics instead of metrics-core.

I think for dipstick this could be as easy as a new type of output scope that wraps a TheoreticalMetricIngestTrait, and I guess the inverse is also true: it could have an input scope that implements the trait.

For metered, I haven't looked closely enough at the code, but given that it generates registry impls via procedural macros, I think it's possible that it could have the auto-generated registries implement the trait and use it. I'm not sure if it generates a field per specific metric or stores them generically: our plain interface only really works well if you can feed it a string for the metric name plus the value, rather than treating it like a field that you map directly to a callsite.

Exporting

Exporting only requires supporting SnapshotProvider and/or AsyncSnapshotProvider.

Both libraries could have their own snapshot type -- just holds all the metric names -> values, really -- and implement the trait. They'd also need to be able to provide a snapshot provider, like Controller, to actually take the snapshot. That type would have to be thread-safe.

OpenCensus / OpenTelemetry compatibility

OpenCensus is an open source initiative for automated metric harvesting, basically this crate for a bunch of languages. (It also does distributed tracing which is a different thing.)

It might be worth looking at its API for inspiration, see what capabilities it offers.

It also has the OpenCensus service, which is a service you can export to via gRPC, which can then export to like 17 other telemetry things. Might be worth providing an exporter to that service.

Note: There's also OpenTelemetry which is going to be the sequel to OpenCensus but is currently mostly vaporware.

log-style API proposal

I'm excited people are working on a metrics library for Rust! I've been wanting one for a while. Your reddit post said you were okay with big API design ideas, so here are some I've been noodling on :)

The design I was contemplating is something like log, where there's a tiny facade crate with simple macros that library authors can use freely, that calls into a global trait object for functionality; binaries can then install an implementation that does what they want.

E.g., library code can do something like:

use metrics::gauge;

gauge!(request_time, 123);

(Or, if we want to get fancy, you could also add more metadata, like prometheus-style tags and opencensus-style units:)

gauge!(request_time, 123ns, endpoint="/app/thing", method=http::POST);

You could also have other macros to record counters, histograms, time blocks of code, etc. But it's all just simple macros.

Then, users can install a metrics implementation:

fn main() {
   // note: NOT the core metrics crate!
   magic_metrics_library::setup()
     .with_recorder(PrometheusRecorder::new())
     .with_exporter(HttpExporter::new())
     .filter("hyper") // don't include hyper metrics...
     .install();

   start_http_server();
}

And boom, they're done! That's the whole user facing API. Users don't need to know anything more about the internal details of the library. This is especially nice because it means library authors can just add metrics, fire-and-forget style; the core crate is so small that it adds very little compile-time cost, and also practically no runtime cost. (log without a logger installed has a 1ns overhead per-call, by my measurements.) As a result, when a user installs a metric exporter, they'll hopefully have a load of metrics available for free; assuming the facade crate gains adoption.

Now, the implementation. Basically, the core metrics crate contains a core Metrics trait that allows metrics to be recorded -- the API surface looks similar to the Recorder trait, but its responsibilities are broader; it's responsible for the whole metric recording process. It also contains a global, atomically initialized static METRICS: Box<dyn Metrics> trait object. Then, each gauge!, counter!, or histogram! call simply calls a method on this trait object. (You can look at log to see how this works.)
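To make that concrete, here is a minimal sketch of the mechanism, loosely modeled on how log does it; every name is a placeholder, and a &'static reference stands in for the Box<dyn Metrics> described above:

use std::sync::atomic::{AtomicUsize, Ordering};

// Illustrative facade internals, not the actual crate API.
pub trait Metrics: Sync {
    fn gauge(&self, key: &str, value: u64);
}

struct NoopMetrics;
impl Metrics for NoopMetrics {
    fn gauge(&self, _key: &str, _value: u64) {}
}

static mut METRICS: &dyn Metrics = &NoopMetrics;
static STATE: AtomicUsize = AtomicUsize::new(0);

pub fn set_metrics(m: &'static dyn Metrics) {
    // One-time installation; a real implementation guards this race more
    // carefully, as log does.
    if STATE
        .compare_exchange(0, 1, Ordering::SeqCst, Ordering::SeqCst)
        .is_ok()
    {
        unsafe { METRICS = m };
    }
}

pub fn metrics() -> &'static dyn Metrics {
    unsafe { METRICS }
}

#[macro_export]
macro_rules! gauge {
    ($key:expr, $value:expr) => {
        $crate::metrics().gauge($key, $value)
    };
}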

Note that we don't even have SnapshotProvider / Exporter / etc here! Those are super useful, but they can be moved to a crate for backend implementors -- we don't want their compile time cost in the facade crate. (Note: I've been calling the facade crate metrics in the style of log, but it could be metrics_core instead.)

So, that's the design sketch. Thoughts? I know this is pretty significantly different from the current design of the crate, but no code would actually need to be deleted -- the current code would only need a thin shim to comply with this API. This would also allow users more flexibility -- e.g. library authors don't need to be worried about dragging crossbeam in as a dependency, since the facade crate wouldn't have any dependencies; users pay for what they need.

Key construction from `&str` always allocates

Currently creating a Key from a &str always allocates (unless it is a &'static str), which is not ideal.

Instead, we could represent a Key as the output of a perfect hash function over the combination of name + labels. Since the hash would be a Copy type, we don't need to worry about ownership. Conceptually the Key would then be a handle to an allocated name + labels combination. To achieve this we could use a string interning mechanism.

So I think that the metrics recording should be done this way:

  • Create a Key from the hash of a name + labels. This never allocates.
  • Attempt to record a metric using the Key:
    • If this fails (returns an error):
      • Register the Key using the name + labels (this allocates)
      • Record the metric (this will never fail).

This way, each name + labels combination will allocate at most once.
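A self-contained sketch of that flow, with a plain std hasher standing in for the proposed perfect hash / interner, and with invented type and method names:

use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Key is just a Copy u64 produced without allocating; the full name + labels
// are only allocated the first time a key is registered.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct Key(u64);

#[derive(Default)]
struct Registry {
    counters: HashMap<Key, u64>,
    registered: HashMap<Key, String>, // allocated lazily, once per key
}

impl Registry {
    fn key_for(name: &str, labels: &[(&str, &str)]) -> Key {
        let mut hasher = std::collections::hash_map::DefaultHasher::new();
        name.hash(&mut hasher);
        labels.hash(&mut hasher);
        Key(hasher.finish())
    }

    fn increment(&mut self, name: &str, labels: &[(&str, &str)], value: u64) {
        let key = Self::key_for(name, labels); // never allocates
        if !self.registered.contains_key(&key) {
            // First sighting of this key: allocate its description exactly once.
            self.registered.insert(key, format!("{}{:?}", name, labels));
        }
        *self.counters.entry(key).or_insert(0) += value;
    }
}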

edit: I found a better solution

Allow trailing `,` in macro arguments at `0.13.0-alpha.3`

This is rejected:

counter!("processing_errors", 1,
    "component_kind" => "transform",
    "component_type" => "add_fields",
);

While this works:

counter!("processing_errors", 1,
    "component_kind" => "transform",
    "component_type" => "add_fields"
);

This used to work at 0.12.
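For reference, macro_rules! macros usually allow this by accepting an optional trailing comma with $(,)? at the end of the pattern, along the lines of this simplified sketch (not the real counter! implementation):

// The `$(,)?` at the end of the pattern is what accepts the trailing comma.
macro_rules! my_counter {
    ($name:expr, $value:expr $(, $label_key:expr => $label_value:expr)* $(,)?) => {
        // ...hand the name, value, and labels to the recorder here...
    };
}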

Defer scope resolution to snapshot export?

Might not matter at all, but it could potentially be faster to push a copy of the scopes into the snapshot, so that we resolve the scopes while the snapshot is being exported rather than doing it in the critical receiver path.

That would get us very, very close to making taking a snapshot equivalent to a memcpy.

Floating point gauge/histograms?

Hi, maybe I'm missing something, but is there a reason that metrics doesn't support f64 values? Percentages are great most of the time, but sometimes you need something less bounded; CPU usage, for example, goes to 400% and beyond if you have multiple cores.

Metrics redesign: defining metric metadata

High Level Goals

To be able to define metric metadata for consumption by metrics collection systems, or human-readable outputs, in the style of Prometheus HELP text.

Notes

This one could go a few ways, but just to spitball...

An obvious choice, and a forwards-compatible one, is to introduce new macros for defining this metadata. This leaves everything else untouched. This would be coupled with a new trait that defines the metadata specification interface.

This could exist potentially as a separate method to "install" the metadata handler, although I wonder if there's a way to allow progressive enhancement through metrics::set_recorder such that it could take a basic recorder or also a metadata-capable recorder.

Either way, this would imply that there needs to be some out-of-band storage that is hooked up and usable when actual metric recording happens, or whenever exporters/observers need to flush. That part is a bit unclear to me, although it becomes much less of an issue if we eliminate exporters/observers as first-class citizens, per the master issue.
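Purely as a strawman, the metadata side could look like this; none of these names exist today:

// A trait that metadata-capable recorders could implement, paired with a
// hypothetical macro for declaring metadata up front.
pub trait MetadataRecorder {
    fn describe(&self, key: &str, unit: Option<&str>, description: &str);
}

// Hypothetical usage:
//
//   describe_counter!("requests_total", "Total HTTP requests handled");
//
// which would forward to MetadataRecorder::describe on the installed recorder
// when it supports metadata, and be a no-op otherwise.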

Investigate attaching tags to metrics.

Naive thought is that we switch to a composite metric key internally -- an enum, really -- where we have untagged and tagged metrics.

Would be trivial to hash so that normal aggregation worked, and shouldn't add a ton of overhead storage-wise. Biggest issue would likely be the performance of sending the tags with the metric, just in terms of memory being moved around.

We might be able to do some Scope-esque magic with keeping a registry or something, but who knows.
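Sketch of the naive composite key, illustrative only:

// Untagged keys stay cheap; tagged keys carry their labels so each label set
// hashes and aggregates separately.
#[derive(Clone, PartialEq, Eq, Hash)]
enum MetricKey {
    Untagged(String),
    Tagged {
        name: String,
        labels: Vec<(String, String)>,
    },
}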

Arc-Swap dependency update

Hi,

I've been using the really amazing cargo-deny (https://github.com/EmbarkStudios/cargo-deny) crate for removing duplicate crates to help speed up build times. metrics-runtime (0.12.0) is using an older version of arc-swap (0.3.11). I was wondering if we could bump that up to 0.4 to match signal-hook-registry and slog-scope.

Separate Recorder method for timing data

Currently, the timing! macro simply calls the record_histogram method on the Recorder with a nanosecond duration as its argument. This works fine if you want to store nanoseconds directly, but if you want to turn those nanosecond values into seconds before storing them, to keep compatibility with other existing metrics, you would have to write a custom recorder that inspected keys and changed the value at runtime for certain histogram metrics, but not all. To solve this, I propose a new method on the Recorder trait:

fn record_timing_histogram(&self, key: Key, nanoseconds: u64) {
    // Default impl to avoid a breaking change; forwards to the existing method.
    self.record_histogram(key, nanoseconds)
}

This would allow Recorder writers to easily convert the nanosecond measurement to whatever unit they wanted before forwarding the value to their store. Adding a default impl that forwards to record_histogram avoids a breaking change and allows existing Recorders to continue working.

The timing! macro would be changed to internally call record_timing_histogram instead of record_histogram.

Write contributing guide + issue/PR templates.

These are the last important documents that we need to solidify and explain our process for contributing to the project. Pretty self-explanatory, just need to spend the time to write them. :)

Create a JSON observer.

This is a follow-on from #10, where we want to add another observer: JSON.

This would have output similar to the text observer, but just in JSON format. It'd maintain the hierarchy, etc.

Consider renaming metrics-core + metrics-facade

This is just a suggestion - no problem if there isn't interest! I understand names can be a very touchy issue for bikeshedding :)

In my opinion the goal of metrics should be to achieve similar ecosystem penetration to log -- it should be a no-brainer for all rust crates to add metrics, since those metrics will add very little compile-time and run-time cost. I'd love to be able to turn on a prometheus exporter and already have most of my metrics predefined, or add a flag to my CLI app and instantly get a dump of execution telemetry. For that to happen, the metrics crate should provide a dead-simple API and an excellent documentation experience. And the most important part of documentation is names.

Currently, however, the names of the involved crates are a little confusing. The metrics crate is complex and opinionated, whereas basic functionality required for ecosystem interop is relegated to the crates metrics-core and metrics-facade. If the crates were released as-is, it'd be like if fern was named log and log was named log-core. Not the end of the world, but a little confusing, especially for what's aiming to become a fundamental piece of infrastructure.

So I'd like to suggest combining metrics-core + metrics-facade and renaming them to metrics, and renaming the current metrics crate to something like metrics-runtime, or even back to hotmic. This would also avoid privileging metrics over e.g. metered and dipstick -- since they'd all comply with the same basic interop stuff, they'd be on a level playing field.

I do think metrics-facade needs more work before it's ready for prime-time; I've got a PR partway done that offers some decent perf improvements, and I think it should be fairly straightforward to get no_std support fully functional as well.

Thoughts?

metrics-exporter-tcp: drop metrics when no clients are connected.

(This is an issue for the yet-to-be-released metrics-exporter-tcp, an exporter that emits metrics over TCP to downstream clients using a Protocol Buffers-based payload.)

Currently, we always signal to the event loop that a new metric is available for sending, and go through the process of encoding them, enqueueing them, etc. If we have no clients, there's no reason to do all of that work.

We should just be able to add in an atomic counter to track connected clients, and have the ingest codepath short-circuit if no clients are currently connected. We might miss one or two metrics that way when a client first connects, but I don't think it's an issue, really, and it should drop the CPU usage when idle to almost nothing.
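A sketch of the short-circuit; the type and function names here are illustrative, not the exporter's actual internals:

use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// The event loop increments/decrements the counter as clients connect and
// disconnect, and the ingest path bails out early when it reads zero.
#[derive(Clone)]
struct ClientCount(Arc<AtomicUsize>);

impl ClientCount {
    fn has_clients(&self) -> bool {
        self.0.load(Ordering::Relaxed) > 0
    }
}

fn on_metric(clients: &ClientCount, encode_and_enqueue: impl FnOnce()) {
    if !clients.has_clients() {
        // Nobody is listening; skip encoding and waking the event loop.
        return;
    }
    encode_and_enqueue();
}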

Moving to a reusable Recorder

Related to #11, #30.

Currently, each time a snapshot is exported there are multiple (arguably) unnecessary allocations.

Let's say we look at the http metrics exporter code. This is a small snippet extracted from https://github.com/metrics-rs/metrics/blob/master/metrics-exporter-http/src/lib.rs#L70.

// This is the main logic for the async event loop for the http exporter.
service_fn(move |_| {
    let recorder3 = recorder2.clone();

    controller2
        .get_snapshot_async()
        .then(move |result| match result {
            Ok(snapshot) => {
                let mut r = recorder3.clone();
                snapshot.record(&mut r);
                let output = r.into();
                Ok(Response::new(Body::from(output)))
            }
            Err(e) => Err(e),
        })
})

The following is some kind of pseudo code representation meant to illustrate the allocations made by this logic.

Controller::get_snapshot_async() {
    MetricsRegistry::get_snapshot() {
        Vec::new(); // allocates to hold metrics.
        ValueHandle::snapshot() {
            for value in values {
                Counter -> (),
                Gauge -> (),
                Histogram -> AtomicWindowedHistogram::snapshot() {
                    Vec::new(); // allocates to hold values
                }
            }
        }
    }
}
Recorder::clone() {
    HashMap::new();
    Vec::new();
}
Recorder::into<String>() {
  String::new();
}

So each time we allocate a new Snapshot and a new Recorder, which is quite expensive.
What I suggest is that we get rid of the Snapshot entity and move to a reusable Recorder.
The logic would become something like this:

service_fn(move |_| {
    // The recorder should keep track of its state internally, so that it can
    // only be written to once.
    recorder.reset();
    // Takes a snapshot of the metrics and write them in the recorder. This might
    // allocate memory if needed.
    match controller.snapshot(&mut recorder) {
        Ok(_) => {
            let output: String = recorder.export();
            Ok(Response::new(Body::from(output)))
        }
        Err(e) => Err(e),
    }
})

This way we could reuse the allocated memory from the Recorder and only allocate the output String.

Admittedly, I am not currently certain that this is actually possible with the futures 0.1 runtime, but I'll work on a PR.

Specifying options for each metric

Are there any recommendations for how to specify options on a per metric basis?

For cloudwatch we currently have two "options", dimensions and units, and these are specified through the labels that are sent with each metric. Originally we only had dimensions, and since this is already a key-value mapping it seemed natural to directly use labels for that mapping.

However, with ramn/metrics_cloudwatch#18 I would also like it to be possible to specify the unit of the metric, and for this I hacked it in by assuming that an @unit label specifies the unit, instead of creating an @unit dimension.

This isn't great though so I was wondering if there is a better way? Or if the metrics crate could offer more for this case?
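For concreteness, the hack currently looks something like this; the @unit convention is specific to metrics_cloudwatch, and the metric name and labels are made up:

// "@unit" is interpreted by the cloudwatch backend as the metric unit; the
// remaining labels are treated as dimensions.
counter!("network_bytes_sent", 1024, "@unit" => "Bytes", "host" => "worker-1");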

Ignored profile config for metrics-runtime

When compiling with the toolchain version 1.35, I get the following warning:

warning: profiles for the non root package will be ignored, specify profiles at the workspace root:
package:   /home/h4xor/forks/metrics/metrics-runtime/Cargo.toml
workspace: /home/h4xor/forks/metrics/Cargo.toml

This is due to the metrics-runtime crate, which uses a [profile config].
However, it is ignored since only a single profile set can be defined per workspace (rust-lang/cargo#3249).

Possible solutions:

Can't really tell which one is better since I don't understand the motivation for this profile config in the first place.

Key construction from labels always allocates

Currently creating a key from labels will allocate a vector every time, even if the labels are &'static str.

pub fn from_name_and_labels<N, L>(name: N, labels: L) -> Self

pub trait IntoLabels {

This means that the metrics macros might potentially allocate at every call. Obviously this is not ideal.

A solution from the metrics side would be for a Key to be constructible with labels without any allocation. So in the same sense that the name is a ScopedString, the labels could be something like
Cow<'static, [&'static str]>. For the special case where the labels are a slice of &'static str, this would not allocate. I think this would be ideal.

Metrics redesign: labels via context

High Level Goals

Take advantage of existing, contextual data to provide labels for metrics.

Notes

This one is fairly straightforward in concept, and could go many ways in implementation.

More or less, we'd like to be able to do something akin to tracing-error where we simply check if we're currently in a tracing span, and if so, extract its fields and use them as labels for a given metric measurement via counter!, etc.

There are obviously specifics to deal with in terms of which fields should be used, or how the fields should be formatted, but let's first consider what other contextual data might exist and how useful any of it in particular is.

While the idea of reusing the span context is obvious, what's less obvious to me is how much flexibility there needs to be i.e. is it all fields or nothing, or do we allow picking and choosing?

Issues with macro hygiene at `0.13.0-alpha.3`

I noticed a small issue: when using 0.13.0-alpha.3 code with the non-standard package name

metrics1 = { package = "metrics", version = "0.13.0-alpha" }

the counter! macro (and, presumably, other macros too) can't find the metrics crate. Since those macros are re-exported from the metrics crate, I suppose the idea was to bind to the exact (metrics) crate that does the re-export, regardless of that crate's name.

This might result in a serious problem for binaries using libraries that have adopted the metrics crate if there are multiple versions in use.
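For reference, the usual fix for this class of problem is for exported macros to refer back to their own crate through $crate, which resolves correctly no matter what the downstream Cargo.toml renames the dependency to. A simplified sketch, not the actual macro source:

// Paths inside the expansion go through $crate, so the macro keeps working
// even when the dependency is renamed in Cargo.toml.
#[macro_export]
macro_rules! counter {
    ($name:expr, $value:expr) => {
        $crate::record_counter($name, $value)
    };
}

// A plain function in the facade crate that the macro expands to; in a real
// crate this would forward to the globally installed recorder.
pub fn record_counter(_name: &str, _value: u64) {}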

Refactor `Key` to be a struct

Refactoring idea: redefine Key to be similar to common::MetricIdentifier, and just use that everywhere from metrics-core up:

use std::borrow::Cow;

#[derive(Clone, PartialEq, Eq, Hash)]
struct Key<'a> {
    kind: MetricType,
    name: Cow<'a, str>,
    scope: Cow<'a, str>,
    labels: Cow<'a, [(Cow<'a, str>, Cow<'a, str>)]>,
}

#[derive(Clone, Copy, PartialEq, Eq, Hash)]
enum MetricType {
    Counter,
    Gauge,
    Histogram
}

This would make label support easy + turn a bunch of APIs from record_xyz(name, value) to just record(key, value), where key is a Key { kind, name, scope, labels }. It would also hopefully make defining the Ingest trait really easy (#21).

Consuming external metrics with absolute values

It is very common to expose some external metrics as counters created outside of the current application, for example by collecting system statistics (e.g. the total amount of bytes sent via the network interface) or fetching them from another application (as various Prometheus exporters do).
The problem is that these metrics have absolute values, and with the current API it is only possible to increment counters; there is no way to replace the value.

From my understanding, something like Constant Metrics can be added
to allow one-time loading of the metric values; this idea should be discussed further.

A naive way would be to add a Recorder::set_counter method, which changes the value of the inner AtomicU64.
Obviously it is not the best solution possible, yet it can work as a temporary workaround until a better idea lands in the metrics crate; in addition, it can be feature-gated with something like #[cfg(feature = "unstable")].
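As a sketch only, the proposed capability could take a shape like this, whether as a new method on Recorder itself or as a separate trait; this stand-in is not an existing API:

// A way to overwrite a counter with an absolute value, e.g. by storing
// directly into the inner AtomicU64 in the runtime.
pub trait AbsoluteCounter {
    fn set_counter(&self, key: &str, value: u64);
}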

Extra links:

Initializing metrics before they're actually updated.

Our internal architecture keeps metrics around forever -- so they never drop out, even if they have no update activity -- but we currently have no way to initialize them.

There's a few options here that I've been pondering:

  • make the sink API register them if a user is getting a proxy object (this needs #13 to land)
  • add a registration API either as a Controller/Sink thing or just literally add it to Sink

The upside to a registration API, I think, is the ability to provide a way to document metrics which could be used for exporters like Prometheus, etc. We'd have to add to metrics-core to support that on the recorder side, but it'd be workable.

We need examples!

We have the benchmark example, but it's a poor example, to be honest. We also have no examples of the full spectrum usage: creating the receiver/sinks, logging metrics, setting up an exporter/recorder, etc.

This could be a single-file example, it could be a toy example app in a separate repo under the metrics-rs organization, anything really.

What other exporters / recorders should we have bundled by default?

The list I came up with in my head was:

  • log exporter (uses the log crate and its macros)
  • HTTP server exporter (would expose an endpoint, probably use hyper)
  • text recorder (organize keys by their scope hierarchy, looks like yaml sort of)
  • Prometheus recorder (Prometheus exposition format, good for HTTP exporter to act as pull endpoint)

I think that covers a lot of basic needs, but I could be convinced to include others by default. Generally speaking, I'm totally not against more workspace crates for other exporters/recorders: HTTP client (push), JSON recorder, etc.

Metrics redesign: defining metrics via macros

High Level Goals

Allow ahead-of-time definitions of metrics for both self-documentation and performance.

Notes

One of the biggest divergences between metrics, the pluggable facade, and metrics-runtime, the opinionated metrics storage layer, is... the performance. The interface implications of metrics::Recorder make it harder to reduce the overhead of writing to those metrics.

Other metrics crates tackle this by providing macro- or procedural macro-based approaches to statically defining metrics. metrics-runtime tackles this by providing metrics_runtime::Sink which, while not handling metric definition, does provide cached access to handles for updating those metrics. However, this isn't usable in metrics and has some other shortcomings.

On top of that, metrics can't be defined ahead-of-time because all that exists is the interface to record a measurement.

We'd likely want to pair this with #68 by having new macros to define metrics with, with or without accompanying metadata. These macros would follow in the footsteps of tracing_core and its static metadata/"callsite" construction, potentially using the metric key itself as a static string for fast hashing, etc.

A step further would be a much more invasive change, but... we could possibly return/generate handles to metric storage directly in the macros, such that counter!("foo", ...) translated to something like:

define static handle for `foo`
conditionally register `foo`
record counter value via handle

This part is a little hazy because I haven't studied tracing_core well enough, but the underlying goal remains the same: have the macros do more upfront work, or change the shape of metrics overall to better support defining metrics in a way that allows us to have cached access to them.
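Very roughly, the expansion being imagined could look like this; all names are invented for illustration:

use std::sync::OnceLock;

// Roughly what counter!("foo", 1) could expand to if the macro generated a
// cached, lazily registered handle per callsite.
struct CounterHandle;

impl CounterHandle {
    fn increment(&self, _value: u64) {
        // ...write straight into cached metric storage...
    }
}

fn register_counter(_name: &'static str) -> CounterHandle {
    CounterHandle
}

fn expanded_callsite() {
    // One static handle per callsite, registered on first use only.
    static HANDLE: OnceLock<CounterHandle> = OnceLock::new();
    let handle = HANDLE.get_or_init(|| register_counter("foo"));
    handle.increment(1);
}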

Metrics redesign: master issue

This is the master issue for tracking the larger Metrics Redesign (tm).

At a high level, there are a few things we want to do or consider:

  • defining metric metadata (a la Prometheus' HELP text) (#68)
  • defining metrics via macros (tracing_core::Callsite but for metrics) (#69)
  • labels via context (bring tracing-error to metrics) (#70)
  • dissolve metrics-runtime into its individual parts to support reuse
  • eliminate the notion of exporters and observers, and elevate metrics

Is the recorder cloning approach really the best we can do?

Context:

We made the decision to split exporters and recorders into distinct types. This means that instead of a single crate/type that does, say, Prometheus over HTTP, there would be an HTTP exporter and a Prometheus recorder, separating the transport and presentation concerns. This is all fine and good.

Stemming from this fact, some exporters will naturally want the recorder they're using to give them the output of a snapshot in a particular format, such as a String. This is also fine.

The problem comes in where our Snapshot trait exists: a snapshot should be able to take a Recorder and write all of its data into it. For our existing recorders, they make themselves Clone, with a specialized implementation to keep around things like histogram configuration, but zero out other internal storage. This is because we wouldn't want to record two or more snapshots into a single recorder, as we would totally mess up the values by duplicating or overwriting. On top of that, our current exporter -- the log exporter -- also expects that the recorder will be Into<String> to get the string output... so we always have to clone the original recorder that was passed in since we consume it at the end of recording the snapshot.

There's a few potential alternatives:

  • a type/trait for a recorder "builder", where recorders configure a builder, hand that off to the exporters, and the exporters build a new recorder instance for every snapshot
  • a new trait that works similarly to Clone but is named to indicate that you're getting a "fresh" clone, so that people don't think they're getting a true cloned object
  • a new trait that recorders would implement that mutably gets the "value" of the recorder (so, the string output of everything recorded into it at that point in time) and gives it a chance to clear its internal state... something like Flush or Finalize

Right now, Clone is fine, but I think we're sort of abusing it and could potentially do better.
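As a sketch, the third option could look something like this; the trait name and shape are placeholders, not a settled design:

// The exporter asks the recorder for its rendered output and the recorder
// clears its internal state in the same call, so no clone is needed between
// snapshots.
pub trait Finalize {
    type Output;

    fn finalize(&mut self) -> Self::Output;
}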

timing! macro improvements

What do you think of having the timing! macro support the following syntax:

let result = timing!("perf.expensive_work", {
    /* some expensive work */
});

// Which would expand to:
let result = {
    let start = Instant::now();
    let result = { /* some expensive work */ };
    let end = Instant::now();
    Recorder::record_histogram("perf.expensive_work", end - start);

    result
};

As well as:

let result = timing!("perf.expensive_call", expensive_call());

//  Which would expand to:
let result = {
    let start = Instant::now();
    let result = expensive_call();
    let end = Instant::now();
    Recorder::record_histogram("perf.expensive_call", end - start);

    result
};

And maybe if no key is specified, it could default to the location of the call in the src.

let result = timing!(expensive_call());

// Which would expand to:
let result = {
    let start = Instant::now();
    let result = expensive_call();
    let end = Instant::now();
    // Note that this is a static str.
    let key = concat!(file!(), ':', line!());
    Recorder::record_histogram("key", end - start);

    result
};

Sanitize metric names/labels during recording.

At some point, we'll likely want to sanitize metric names and labels when recording them.

For instance, curly braces are meaningful in the Prometheus exposition format, so we'd likely want to turn those into something else. Same with equal signs, etc etc.

Basically, we just need to do a pass over existing recorders to try and help make sure a rogue metric name doesn't blow things up. Either way, a user would have to fix their usage of reserved characters, or change the metric if they don't like our replacement... but at least with the latter, they'd still have metrics flowing.
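A sanitizing pass could be as simple as something along these lines; the allowed character set here is chosen purely for illustration, and what actually needs escaping depends on the exposition format:

// Replace characters that are meaningful in the Prometheus exposition format
// (and other outputs) with underscores.
fn sanitize_metric_name(name: &str) -> String {
    name.chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' || c == ':' { c } else { '_' })
        .collect()
}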

Investigate mixed workload performance.

Our benchmarks right now can test both how things change as producers scale up and updating multiple metrics per round, but what we don't currently test is multiple producers hitting unique counters. That is a common use case if multiple producers each have a distinct counter -- maybe the labels are derived from the worker and thus will always be unique -- and are updating it often.

We should make sure these use cases have solid performance: we can only do so much to optimize for contended metrics, but we should make sure that we're not hitting any false sharing traps for truly sharded workloads.

http exporter failures can't be detected at startup

The changes discussed here are in PR #83

While trying to set up a Receiver, I am encountering an issue with the current design of the async_run implementation.
As async_run will block the current task, it's going to be executed in a spawned task (i.e. tokio::spawn(async { setup_metrics().await })).
If it's spawned in a task, though, I have no way to detect whether the exporter successfully bound itself to the port, as a failure will trigger a panic in the task asynchronously.

async fn setup_metrics(metrics_addr: &str) {
    let receiver = metrics_runtime::Receiver::builder()
        .build()
        .expect("Failed to created a receiver");

    let exporter = HttpExporter::new(
        receiver.controller(),
        PrometheusBuilder::new(),
        metrics_addr.parse().unwrap(),
    );
    receiver.install();
    exporter.async_run().await.expect("Exporer failed");
}

Shouldn't async_run be a non-async function that instead returns a future to await on? Instead of

Server::bind(&self.address).serve(make_svc).await

It should return something like

Box::new(Server::bind(&self.address).serve(make_svc))

and the caller would receive a future only if the binding works.

Crossbeam-Epoch dependency

Similar to #57, cargo-deny is showing a dependency on an older version of crossbeam-epoch, which has a dependency on arrayvec 0.4.12.

slog brings in a series of dependencies that use the latest version of arrayvec (0.5.1). Updating to 0.8 of crossbeam-epoch actually removes the arrayvec dependency.
