rust-lang / measureme Goto Github PK

Support crate for rustc's self-profiling feature

License: Apache License 2.0

Rust 100.00%

measureme's Introduction

measureme

Support crate for rustc's self-profiling feature

This crate is maintained by the Rust compiler team and in particular by the self-profile working group. It is currently only meant to be used within rustc itself, so APIs may change at any moment.

Tools

measureme

measureme is the core library which contains a fast, efficient framework for recording events and serializing them to a compact binary format. It is integrated into rustc via the unstable -Z self-profile flag.

Documentation

summarize

summarize produces a human readable summary of measureme profiling data. It contains two main modes:

summarize which groups the profiling events and orders the results by time taken.
diff which compares two profiles and outputs a summary of the differences.

Learn more

stack_collapse

stack_collapse reads measureme profiling data and outputs folded stack traces compatible with the Flame Graph tools.

Learn more

flamegraph

flamegraph reads measureme profiling data and outputs Flame Graph.

Learn more

crox

crox turns measureme profiling data into files that can be visualized by the Chromium performance tools.

Learn more

measureme's People

Contributors

Stargazers

Watchers

measureme's Issues

Store more information in headers

Taking inspiration from perf on Linux via the description of it's file format, we thought that storing the number of threads, CPU type, and available memory on the system would be a good start. All of these should be optional, though.

Relevant discussion on Zulip: https://rust-lang.zulipchat.com/#narrow/stream/187831-t-compiler.2Fwg-self-profile/topic/process-id.20in.20events/near/165803887

Breaking change: Rename "timestamp" methods to "count"

#143 (comment)

Investigate better error handling

We should investigate what the current best practices are for producing error messages from cli tools. This came up in the context of #36 (see this comment in particular). At the very least, we should be more careful about std::process:exit().

Failing to write profiling data to disk should not panic.

The following try-build crashed because it could not write profiling data to disk (insufficient permissions):

https://travis-ci.com/rust-lang/rust/jobs/192490610

Preferably the compiler should not crash/panic because of that. Better to just print a warning to stderr. The panic happened in the Drop impl of the MmapSerializationSink.

SerializationSink::from_path() should return a Result<>

... so we can handle errors gracefully instead of panicking.

Add [badges] section to Cargo.toml

So the travis badge shows up on crates.io.

panic! when analyzing profiler events

I dumped some profiler events using -Z self-profile running rustc with an own stage2 build (1.36.0-dev on ubuntu 18.04). I also used that one to compile measureme (0.3.0-4-gbb14545). Running ./summarize pid-{pid} yields

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `"GenericActivity"`,
 right: `"Query"`', summarize/src/analysis.rs:150:17

Attached are two sets of profiler events that produce the error.

called unwrap() on None

$ summarize summarize stm32h7xx_hal-21900.mm_profdata 
thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', analyzeme/src/profiling_data.rs:58:71

thread 'main' panicked at 'called `Option::unwrap()` on a `None` value', analyzeme/src/profiling_data.rs:58:71
stack backtrace:
   0: rust_begin_unwind
             at /rustc/9d78d1d02761b906038ba4d54c5f3427f920f5fb/library/std/src/panicking.rs:495:5
   1: core::panicking::panic_fmt
             at /rustc/9d78d1d02761b906038ba4d54c5f3427f920f5fb/library/core/src/panicking.rs:92:14
   2: core::panicking::panic
             at /rustc/9d78d1d02761b906038ba4d54c5f3427f920f5fb/library/core/src/panicking.rs:50:5
   3: analyzeme::profiling_data::ProfilingData::new
   4: summarize::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

measureme/analyzeme/src/profiling_data.rs

Line 58 in 16688a5

let string_data = split_data.remove(&PageTag::StringData).unwrap();

I guess the generated file is invalid? Not sure why that would happen, it didn't spit any errors at me.

Consider getting rid of string index data file

The new file encoding for the string table introduced in #90 still has some unused bit-prefixes in its grammar that would allow it to efficiently (i.e. without additional space overhead) encode the string index data table as part of the regular string table data. The index entries would just be interspersed with the string components and the grammar would make sure that we can parse them out properly again.

The only reason I'm a bit hesitant to implement this is that it would require tools to always parse the entire string table once on startup in order to find all the index entries. It's just a single pass and the parsing is not more complicated than parsing a huge UTF-8 string, but it would add a bit of overhead to every tool invocation.

On the positive side it would allow us to get rid of one of the three files we generate when recording a profile.

Add "incl. time" column to summarize and perf.rlo

Many tools show both the "self-time" of an event and the time that also includes the child events. I found myself wanting that feature for perf.rlo in a number of cases recently. Others seems to think the same: #104 (comment).

I think I would prefer the caption "incl. time" for this since we already use the term "total time" for the total amount of time spent in all threads.

Add memory consumption events

There should be an event kind that records the current memory consumption at a given point in time. This is something rustc's -Ztime-passes can do but self-profiling can't.

Integrated FlameGraph output

We already have a tool to generate collapsed stacks from profile data. However, this still needs to be piped into the FlameGraph script to generate svg output. We should integrate inferno to generate the svg without needing external scripts.

Store some metadata about the profile in a reserved string table slot.

It might be useful to store data about a collected profile with the profile itself (like process-id, start-time). It occurred to me that we could just pack this data into a JSON string and store that under a pre-reserved StringId in the string table. Then we don't need to implement this in the more cumbersome binary format.

cc @wesleywiser

Provide a tool to see what user code is causing rustc to use lots of time

We should have a way to calculate how long each item in your codebase takes to compile. Conceptually this would work by grouping the measureme data by DefId/NodeId or some other identifier rather than by the query name. We can then output a sorted list of items by how long each takes.

For example, the output might look like this:

$ summarize code-profile /path/to/rustc-results

Total time in rustc: 120 seconds

----------------------------------------
| % time | Item                        |
| ------ | -----------------------------
| 20.4%  | example::foo::bar::clone()  |
| 10.2%  | example::baz::widget::bla() |

(more rows)

This will require changes to the profiling code in rustc to record DefIds or NodeIds for each query.

summarize doesn't work if you specify your own event kinds

In Boa we are working with measureme to measure performance, our main use case is the Chrome Profiler, but we were interested in using summarize also.

The solution seems to be to use the generic eventKind but that would mean having to remove our own kinds and on Chrome it would all be one colour, this would not be ideal for us.

Our output when using:

Ignoring event with unknown event kind `value`
Unexpectedly enountered event `Event { event_kind: "value", label: "set_field", additional_data: [], timestamp: Interval { start: SystemTime { intervals: 132353512591167719 }, end: SystemTime { intervals: 132353512591168006 } }, thread_id: 1 }`, while top of stack was `Event { event_kind: "init", label: "make_builtin_fn: toString", additional_data: [], timestamp: Interval { start: SystemTime { intervals: 132353512591167595 }, end: SystemTime { intervals: 132353512591168321 } }, thread_id: 1 }`. Ignoring.
Ignoring event with unknown event kind `value`
Unexpectedly enountered event `Event { event_kind: "object", label: "Object::set", additional_data: [], timestamp: Interval { start: SystemTime { intervals: 132353512591167802 }, end: SystemTime { intervals: 132353512591167977 } }, thread_id: 1 }`, while top of stack was `Event { event_kind: "value", label: "set_field", additional_data: [], timestamp: Interval { start: SystemTime { intervals: 132353512591167719 }, end: SystemTime { intervals: 132353512591168006 } }, thread_id: 1 }`. Ignoring.
Ignoring event with unknown event kind `object`
Unexpectedly enountered event `Event { event_kind: "object", label: "Object::get_internal_slot", additional_data: [], timestamp: Interval { start: SystemTime { intervals: 132353512591167920 }, end: SystemTime { intervals: 132353512591167934 } }, thread_id: 1 }`, while top of stack was `Event { event_kind: "object", label: "Object::set", additional_data: [], timestamp: Interval { start: SystemTime { intervals: 132353512591167802 }, end: SystemTime { intervals: 132353512591167977 } }, thread_id: 1 }`. Ignoring.
Ignoring event with unknown event kind `object`
Unexpectedly enountered event `Event { event_kind: "object", label: "Object::get_own_property", additional_data: [], timestamp: Interval { start: SystemTime { intervals: 132353512591167897 }, end: SystemTime { intervals: 132353512591167908 } }, thread_id: 1 }`, while top of stack was `Event { event_kind: "object", label: "Object::set", additional_data: [], timestamp: Interval { start: SystemTime { intervals: 132353512591167802 }, end: SystemTime { intervals: 132353512591167977 } }, thread_id: 1 }`. Ignoring.
Ignoring event with unknown event kind `object`
Unexpectedly enountered event `Event { event_kind: "value", label: "From<String>", additional_data: [], timestamp: Interval { start: SystemTime { intervals: 132353512591167881 }

Issue:
boa-dev/boa#317

measureme release 9.0.0

Shouldn't this release be 0.9.0 rather than 9.0.0, it must've been a typo?

summarize could show min/max times and other useful stats.

We're investigating a case where a query is taking 30-40ms on average, but by using crox we can see instances of that query being over 200ms sometimes, with the maximum around 330ms.

Would be nice to get this information from summarize itself.

Add a CI badge to the main readme

Because it looks nice :)

Cut new releases?

measureme itself is at 0.7.1, an entirely respectable version number. However, recent commits have grown its capabilities and updated its dependencies. It's been a while, so it should probably get to grow up to be 0.7.2 or 0.8.0, along with some of the other workspace members.

Implement performance tests for SerializationSink impls

That might be helpful.

Is this link broken?

The below link seems to broken.

https://github.com/rust-lang/measureme/blame/34792560521a30030b2949b2c9dce881dea77852/README.md#L8

Is this link intended?
https://rust-lang.github.io/compiler-team/working-groups/self-profile/

If the above is correct, will open pull request.

Event recording & summarize need cleanup with respect to self-time vs incr-load-time vs blocked-time vs total-time.

At the moment, we have not specified how self-time and other kinds of measurements interact with each other:

Does the incremental cache loading time contribute to a queries self-time?
In the incremental cache loading case, is the loading event nested inside a regular query-provider event?
Should we distinguish between actual cache loading, metadata loading, and provider re-executions caused by missing cache entries? If so, how?
Do query-blocked time contribute to self-time? How about total-time?
Is a query-blocked event always followed by a cache-hit event (the latter of which would then be responsible for incrementing the invocation count) What if cache-hit events are filtered out? Is it OK to have wrong invocation counts?

Possible integration of stabilizer

Hi,
I came across a very interesting talk by Prof Emery Berger on how the placement of functions in assembly and even the choice of memory layout might influence the time measurements. I have been benchmarking some parallel sorts using criterion as of now. I do see up to 50-70ms difference (on a scale of 300ms) in the wall-time of the exact same code when I have different set of benchmarking functions in the same group with it.

I would be very interested in a time-measurement crate that would randomize the instructions layout and the memory layout of objects. Here is a link to the project that does this for C/C++ (as mentioned in the talk) https://github.com/ccurtsinger/stabilizer.

Also, some quick thoughts on possible implementations would be helpful.

Lot of <unknown>

rustc 1.42.0-nightly (3ebcfa145 2020-01-12) with measureme 232db90.

+---------------------------------------------+-----------+-----------------+------------+------------+--------------+-----------------------+
| Item                                        | Self time | % of total time | Item count | Cache hits | Blocked time | Incremental load time |
+---------------------------------------------+-----------+-----------------+------------+------------+--------------+-----------------------+
| <unknown>                                   | 12.00s    | 70.392          | 870543     | 0          | 0.00ns       | 0.00ns                |
+---------------------------------------------+-----------+-----------------+------------+------------+--------------+-----------------------+
| expand_crate                                | 1.12s     | 6.563           | 1          | 0          | 0.00ns       | 0.00ns                |
+---------------------------------------------+-----------+-----------------+------------+------------+--------------+-----------------------+
| resolve_crate                               | 926.07ms  | 5.434           | 1          | 0          | 0.00ns       | 0.00ns                |
+---------------------------------------------+-----------+-----------------+------------+------------+--------------+-----------------------+
| link_rlib                                   | 482.88ms  | 2.833           | 1          | 0          | 0.00ns       | 0.00ns                |
+---------------------------------------------+-----------+-----------------+------------+------------+--------------+-----------------------+
| generate_crate_metadata                     | 358.57ms  | 2.104           | 1          | 0          | 0.00ns       | 0.00ns                |
+---------------------------------------------+-----------+-----------------+------------+------------+--------------+-----------------------+
| build_hir_map                               | 349.00ms  | 2.048           | 1          | 0          | 0.00ns       | 0.00ns                |
+---------------------------------------------+-----------+-----------------+------------+------------+--------------+-----------------------+
| hir_lowering                                | 319.03ms  | 1.872           | 1          | 0          | 0.00ns       | 0.00ns                |
+---------------------------------------------+-----------+-----------------+------------+------------+--------------+-----------------------+
| parse_crate                                 | 267.80ms  | 1.571           | 1          | 0          | 0.00ns       | 0.00ns                |
...

Summarize diffs

When writing a PR to rustc you often want either to track the source of a perf regression or to track an improvement. Absolute values don't matter as much.

To that end, it would be nice if you could do summarize pid-$baseline pid-$change where $baseline could correspond to the master branch of rustc and $change could correspond to your PR which made changes. You could possibly also write summarize diff ... instead to simplify the argument parsing logic and infer less.

Example: I wanted this when working on rust-lang/rust#59288; it was hard to work with two summarize outputs side by side.

"summarize diff" could show relative statistic numbers

I was looking at the output of summarize diff recently and was able to get some useful information (wow was LLVM a lot slower!). I think it'd be useful to also have information, though, for things like "how much slower" or "how much faster" are the two builds. For example it'd be cool to have an inline statistic saying like "LLVM is 100% slower" or "is_copy_raw is 10% slower". The absolute times are great but having relative times can also be good for putting things in context!

`summarize` shouldn't show columns that don't apply to the profile

For example, most people do not enable query cache hit events or use a parallel compiler so the "Cache Hits" and "Blocked time" columns never have values. We should hide those columns in this case.

Add bors and rust-highfive support to this repo.

Instructions by @Mark-Simulacrum and @pietroalbini from discord:

bors will be via https://github.com/rust-lang/rust-central-station/blob/master/homu.toml.template; highfive you need to add a file to here https://github.com/rust-lang/highfive/tree/master/highfive/configs/rust-lang and then ping @pietroalbini or me, we can finish setup
for bors you also need to add the configuration in the team repo for the perms
likely adding another bors.foo.review = true to https://github.com/rust-lang/team/blob/master/teams/compiler.toml#L22; note that the foo there is the bors name that you use in the homu toml file
(which can be the same as repo name, but doesn't have to be)
and adding foo to the src/permissions.rs
I should really spend some time to document stuff

`stack_collapse` panic

Not sure if I'm doing something wrong

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `"codegen_crate"`,
 right: `"codegen_and_optimize_crate"`', stack_collapse/src/stack_collapse.rs:46:17
stack backtrace:
   0: std::sys::unix::backtrace::tracing::imp::unwind_backtrace
             at src/libstd/sys/unix/backtrace/tracing/gcc_s.rs:39
   1: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:71
   2: std::panicking::default_hook::{{closure}}
             at src/libstd/sys_common/backtrace.rs:59
             at src/libstd/panicking.rs:197
   3: std::panicking::default_hook
             at src/libstd/panicking.rs:211
   4: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:474
   5: std::panicking::continue_panic_fmt
             at src/libstd/panicking.rs:381
   6: std::panicking::begin_panic_fmt
             at src/libstd/panicking.rs:336
   7: stack_collapse::stack_collapse::collapse_stacks
             at stack_collapse/src/stack_collapse.rs:46
   8: stack_collapse::main
             at stack_collapse/src/main.rs:34
   9: std::rt::lang_start::{{closure}}
             at /rustc/3c235d5600393dfe6c36eeed34042efad8d4f26e/src/libstd/rt.rs:64
  10: std::panicking::try::do_call
             at src/libstd/rt.rs:49
             at src/libstd/panicking.rs:293
  11: __rust_maybe_catch_panic
             at src/libpanic_unwind/lib.rs:87
  12: std::rt::lang_start_internal
             at src/libstd/panicking.rs:272
             at src/libstd/panic.rs:388
             at src/libstd/rt.rs:48
  13: std::rt::lang_start
             at /rustc/3c235d5600393dfe6c36eeed34042efad8d4f26e/src/libstd/rt.rs:64
  14: main
  15: __libc_start_main
  16: _start

Add thread ID to `Event`

We have a thread id in RawEvent but it's missing from Event.

Consider making string fields in `Event` lazy

There's quite a bit of post-processing code that iterates over events without looking at strings. It might make sense to lazily read strings from the string table.

Unbuffered event stream.

To avoid hitting a Mutex for every event being recorded, and assuming that all the other streams have (much) lower frequency, we could keep the current format, while recording it differently:

after the file header, and after any non-events page, start an events page (i.e. leave enough space for a page header)
writing events happens ("atomically"), just like writing whole pages
- requires the backing storage to be a mmap'd file, we don't want to actually hit a syscall here
  - we also need this for the ability to track what position we are at (using an AtomicUsize)
- should work as we're (i.e. the atomic position is) effectively always inside an "open" events page
writing a non-events page (as well as finishing the file) has to write the correct length in the "currently open" events page, for which we'd have to:
- first grab a global (per backing store) lock, to avoid page-writing races with other non-events streams
- claim the range for the new page to write using fetch_add on the same AtomicUsize position (that writing events uses), effectively "closing" the "currently open" events page (new events would effectively end up in a newer page)
- use the "start of the events page" (from the global lock), and the "start of the new non-events page" (claimed above) to compute the correct length to use in the header of the just-closed events page, and to write that header
- update the "start of the events page" (in the global lock) to just after the end of the newly claimed non-events page
- release the lock and start writing the new page (in the claimed range)
  - nothing else will access that range so there's no need to keep the lock any longer

Sadly bypassing File entirely will probably be most of the work here, I don't fully understand why measureme never properly mmap'd a file-on-disk (the pre-paging mmaps were "anonymous", i.e. no different than ungrowable Vec<u8>s).

I'm also not sure how we'd handle not being able to grow the mmap'd region (not without re-adding locking to writing events), we need to be on a 64-bit host to even grab gratuitous amounts of virtual memory, without hampering the rest of the process, don't we?
Maybe for 32-bit hosts we can keep paging events and writing to a File?

cc @michaelwoerister @wesleywiser

Come up with and document policy for adding new event kinds

Right now we have a few event kinds, basically the ones defined in https://github.com/rust-lang/measureme/blob/e6f55c7bcacc1c70b895c858d1840958709b9a34/measureme/src/rustc.rs.

If there is an event that is not recognized some tools will print out a warnings (e.g. summarize). Others might just ignore them.

I think we should document somewhere:

How to add new event kinds
What kind of support is expected for each kind of event (e.g. that all tools should be able to handle all known event kinds gracefully)
How to get approval for adding new events

I think that in general that it should be easy to add new event kinds (implementation-wise) but there should be a thorough review if it's really necessary to have a new event kind or if an existing event kind can be used instead.

A guide for people who know nothing about profiling

It would be nice if there was a thorough guide for people who have very little experience with profiling and in particular wrt. rustc.

If this guide would assume no knowledge about rustc's command line and profilers and hand-hold the reader all the way, that would be even better.

Summarize panics if piped into head

Summarize prints a very long table so I piped it into head to get the start of the output, however it panics:

$ ~/src/measureme/target/debug/summarize pid-8987 | head
+------------------------------------------+-----------+-----------------+------------+------------+--------------+-----------------------+
| Item                                     | Self time | % of total time | Item count | Cache hits | Blocked time | Incremental load time |
+------------------------------------------+-----------+-----------------+------------+------------+--------------+-----------------------+
| LLVM_emit_obj                            | 3.72s     | 44.144          | 152        | 0          | 0.00ns       | 0.00ns                |
+------------------------------------------+-----------+-----------------+------------+------------+--------------+-----------------------+
| LLVM_module_passes                       | 873.62ms  | 10.379          | 152        | 0          | 0.00ns       | 0.00ns                |
+------------------------------------------+-----------+-----------------+------------+------------+--------------+-----------------------+
| LLVM_make_bitcode                        | 608.96ms  | 7.235           | 152        | 0          | 0.00ns       | 0.00ns                |
+------------------------------------------+-----------+-----------------+------------+------------+--------------+-----------------------+
| typeck_tables_of                         | 484.63ms  | 5.758           | 953        | 0          | 0.00ns       | 0.00ns                |
thread 'main' panicked at 'Cannot print table to standard output : Broken pipe (os error 32)', /home/alice/.cargo/registry/src/github.com-1ecc6299db9ec823/prettytable-rs-0.8.0/src/lib.rs:194:23
note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

Should `summarize` have way of specifying which columns to emit?

There's lots of information that is interesting for investigating certain performance issues (e.g. #67) but that might be considered unnecessary clutter in the default case.

Maybe summarize should have an option to specify which columns are emitted in which order, like lsblk's -o option.

Having a way of filtering the output would lower the bar for adding new columns for less common cases.

The alternative is to provide no filtering and let downstream tools (like perf.rlo) take care of only processing the needed information.

Tools should use FxHashMap instead sip-hash based hash maps or BTreeMaps

FxHashMap from rustc-hash is fast and we are doing lots of hashing.

Could not find .mm_profdata file

I'm using for the first time measureme to profile the compilation time of a project which needs to be compiled with nightly-2020-09-30. I followed this guide and I'm getting the following error.

Error: "Could not find profiling data file `prusti_driver-32670.mm_profdata`.It looks like your profiling data has been generated by an out-dated version of measureme (0.7 or older)."

Is it because the measureme version that comes with nightly-2020-09-30 is out-dated? Do you know any workarounds?

Panic in summarize

$ cargo new hello
$ cd hello
$ RUSTFLAGS=-Zself-profile cargo build
$ ~/measureme/target/release/summarize summarize hello-750698
thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `"codegen_and_optimize_crate"`,
 right: `"codegen_crate"`', summarize/src/analysis.rs:121:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

tool idea: optimization simulator

It would be useful to have a tool that can simulate the impact of a hypothetical optimization to a function/query. For example, let's say someone has an idea that allows speeding up typeck_tables_of by 10% but it's kind of complicated and they don't know if they should put in the effort of implementing it. With all the caching going on in the compiler it is non-trivial to predict the effect of such an optimization on total compilation time.

However, given a concrete profile, it should be possible to compute an approximation of the same profile with the hypothetical speed-up applied. For the single-threaded case it should be pretty accurate even.

Filter out small items from summarize table

Since the table printed by summarize is intended to be read by humans, we should avoid printing every simple item, since it results in many lines of output (1000 lines for the regex crate).

I propose to provide a parameter to filter lines, so only those items worth more than some percentage of the total runtime are shown. (or similar)

As for choosing a default value, one should probably experiment a bit to figure out where the long tail is. For reference the items with runtime > 1% are 76% of the total runtime on the regex crate.

Single file on disk format

Right now, measureme records data into three files for each profiling session: one for event data, one for string data and one which maps string ids to positions in the string data file. We've often gotten requests to consolidate this down to a single file.

One idea I've been thinking about is to used fixed size pages to store the event data and string data:

--------------------------------------------------------------------
| event0 | event1 | eventN... |     stringN... | string1 | string0 |
--------------------------------------------------------------------

Events would be written from the beginning of the page to the end and strings would be written from the end of the page to the beginning. Once there is insufficient space in the page to write a new event or a new string, a new page would be created and the data would be written there. Of course, we'll want to know how many events there are and where the string data starts in the page, so we'll probably need a header block at the start of the page.

That takes care of two of the three data types but we still have the string index data. To handle that, we can have dedicated pages which just contain the string index data.

The resulting data file might look a bit like this:

----page0-----------------------------------------------------------
| header | is_string_index=1 | event_count=0 | string_start_idx=0
| idx0 | idx1 | idxN... |
--------------------------------------------------------------------

----page1-----------------------------------------------------------
| header | is_string_index=0 | event_count=n | string_start_idx=x
| event0 | event1 | eventN... |     stringN... | string1 | string0 |
--------------------------------------------------------------------

----pageN-----------------------------------------------------------
...
--------------------------------------------------------------------

On top of this, there are quite a few enhancements we can add:

String index pages can have a "forward pointer" to the next string index page so we don't have to scan every page linearly.
Header blocks in the pages give us some scratch space to record additional data or metadata
We can also change our approach to multi-threaded writing. For example, we can hand pages out per-thread which would then allow us to do concurrent writes without any locking since each thread is writing to different pages. Adding new pages could be done with an atomic add operation so no locks would be required anywhere.

cc @michaelwoerister

Reduce Chrome profiler file size by using Complete events

Right now, the crox tool emits one Begin event and one End event for each non-instant event. The Complete event combines the Begin and End events together which should substantially reduce the size of the json file (up to half).

Chrome Profiler File Format Docs

Move off beta channel

Now that broader atomics support has shipped in Rust 1.34, we can move off the beta channel onto stable.

Release to crates.io

@michaelwoerister unless you have any objections, after #18 and #19 merge, I'd like to release a version to crates.io so we can get rust-lang/rust#59515 merged.

This issue tracks the release on crates.io.

I propose that we use pre-release versions when uploading to crates.io as we don't have any stability guarantees at this point. Therefore, the first release will be 0.1.0 and the next release will be 0.2.0, etc.

Add versioning to the binary profile format

Right now the files generated by rustc -Zself-profile don't contain any indication of what their concrete encoding is. That should change so that post-processing tools support more than one version or at least give a sensible error message when encountering an unsupported encoding.

For example, each file could start with a some file magic (b"MMES" for the event stream, b"MMSD" for the string table data, and b"MMSI" for the string table index) and a 4 byte little-endian version number that can be used to select the right decoder.

Validate `unsafe` blocks are actually giving us a performance improvement

We have various places in measureme where we do low-level unsafe operations for efficiency. Example:

measureme/measureme/src/raw_event.rs

Lines 103 to 113 in 8d2d4fd

 #[cfg(target_endian = "little")] 

 { 

 let raw_event_bytes: &[u8] = unsafe { 

 std::slice::from_raw_parts( 

 self as *const _ as *const u8, 

 std::mem::size_of::<RawEvent>(), 

 ) 

 }; 

 bytes.copy_from_slice(raw_event_bytes); 

 }

We should validate that these places are actually improving performance over their safe counterparts.

summarize: Error: "Event stream file format version \'0\' is not supported by this version of `measureme`."

With latest nightly and measureme master, I get :
Error: "Event stream file format version '0' is not supported\n by this version of measureme."

`summarize` has no concept of "wall clock time"

This issue is a stub. The short version is: summarize uses total CPU time as reference of computing percentages. But there is also wall clock time, which might be more meaningful in parallel contexts.

Make measureme tools available as rustup component

The measureme version in the compiler and the corresponding tools must stay in sync in order for things to be really reliable. I suspect that the best way of doing this in the rustc ecosystem is to make the measureme suite available as a rustup component. This would have the following benefits:

Compiler and tools would stay in sync "automatically".
It's easy to use measureme with multiple toolchains on the same system (which would be a pain otherwise).
It's simple for end users to get ahold of the tools.
- This might also extend to perf.rlo which also needs a compatible version of the tools involved.

@Mark-Simulacrum & @wesleywiser, do you agree that the rustup component would actually have these benefits? (My knowledge about rustup components is mostly accidental)

I assume that measureme would have to be part of the main Rust repository somehow (as a submodule probably), which seems fine.

	#[cfg(target_endian = "little")]
	{
	let raw_event_bytes: &[u8] = unsafe {
	std::slice::from_raw_parts(
	self as const _ as const u8,
	std::mem::size_of::<RawEvent>(),
	)
	};

	bytes.copy_from_slice(raw_event_bytes);
	}

rust-lang / measureme Goto Github PK

measureme's Introduction

measureme

Tools

measureme

summarize

stack_collapse

flamegraph

crox

measureme's People

Contributors

Stargazers

Watchers

Forkers

measureme's Issues

Recommend Projects

Recommend Topics

Recommend Org