rust-jemalloc-pprof

A Rust library to collect heap profiling data from the jemalloc allocator and convert it to the pprof format.

To understand how to use this together with Polar Signals Cloud to continuously collect profiling data, refer to the Use with Polar Signals Cloud section.

This code was originally developed as part of Materialize and was later extracted into this standalone library as a collaboration.

Requirements

Currently, this library only supports Linux.

Furthermore, you must be able to switch your allocator to jemalloc. If you need to continue using the default system allocator for any reason, this library will not be useful.

Usage

Internally this library uses tikv-jemalloc-ctl to interact with jemalloc, so to use it, you must use the jemalloc allocator via the tikv-jemallocator library.

When adding tikv-jemallocator as a dependency, make sure to enable the profiling feature.

[target.'cfg(not(target_env = "msvc"))'.dependencies]
tikv-jemallocator = { version = "0.5.4", features = ["profiling", "unprefixed_malloc_on_supported_platforms"] }

Note: We also recommend enabling the unprefixed_malloc_on_supported_platforms feature. It is not strictly necessary, but it affects the rest of the usage, as described below.

Then set jemalloc as the global allocator and configure it with profiling enabled:

#[cfg(not(target_env = "msvc"))]
#[global_allocator]
static ALLOC: tikv_jemallocator::Jemalloc = tikv_jemallocator::Jemalloc;

#[allow(non_upper_case_globals)]
#[export_name = "malloc_conf"]
pub static malloc_conf: &[u8] = b"prof:true,prof_active:true,lg_prof_sample:19\0";

If you do not use the unprefixed_malloc_on_supported_platforms feature, you have to name the symbol _rjem_malloc_conf instead of malloc_conf.
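
For reference, a minimal sketch of that variant; the configuration is identical, only the exported symbol name changes:

#[allow(non_upper_case_globals)]
#[export_name = "_rjem_malloc_conf"]
pub static malloc_conf: &[u8] = b"prof:true,prof_active:true,lg_prof_sample:19\0";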

2^19 bytes (512 KiB) is the default sampling period, but we recommend configuring it explicitly. To understand more about jemalloc sampling, check out the detailed docs on it.

We recommend serving the profiling data from an HTTP server such as axum. That could look like the following; we intentionally include a 4 MB allocation (1,000,000 4-byte integers) to trigger sampling.

#[tokio::main]
async fn main() {
    let mut v = vec![];
    for i in 0..1000000 {
        v.push(i);
    }

    let app = axum::Router::new()
        .route("/debug/pprof/heap", axum::routing::get(handle_get_heap));

    // run our app with hyper, listening globally on port 3000
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}

use axum::http::StatusCode;
use axum::response::IntoResponse;

pub async fn handle_get_heap() -> Result<impl IntoResponse, (StatusCode, String)> {
    let mut prof_ctl = jemalloc_pprof::PROF_CTL.as_ref().unwrap().lock().await;
    require_profiling_activated(&prof_ctl)?;
    let pprof = prof_ctl
        .dump_pprof()
        .map_err(|err| (StatusCode::INTERNAL_SERVER_ERROR, err.to_string()))?;
    Ok(pprof)
}

/// Checks whether jemalloc profiling is activated and returns an error response if not.
fn require_profiling_activated(prof_ctl: &jemalloc_pprof::JemallocProfCtl) -> Result<(), (StatusCode, String)> {
    if prof_ctl.activated() {
        Ok(())
    } else {
        Err((axum::http::StatusCode::FORBIDDEN, "heap profiling not activated".into()))
    }
}
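
For completeness, the example above assumes dependencies roughly along these lines (the version numbers are illustrative, not pinned requirements):

[dependencies]
# axum needs an async runtime; the example uses #[tokio::main]
tokio = { version = "1", features = ["full"] }
axum = "0.7"
jemalloc_pprof = "0.4"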

Then, running the application, we can capture a profile and view it with the pprof toolchain.

curl localhost:3000/debug/pprof/heap > heap.pb.gz
pprof -http=:8080 heap.pb.gz
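
If your pprof binary can fetch profiles over HTTP, you can also retrieve and open the profile in one step (same endpoint as above):

pprof -http=:8080 http://localhost:3000/debug/pprof/heap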

Note: The profiling data is not symbolized, so either addr2line or llvm-addr2line needs to be available in $PATH, and pprof needs to be able to discover the respective debug info.
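
If pprof cannot locate the binaries on its own, you can point it at them explicitly via PPROF_BINARY_PATH (the path below is just an example):

PPROF_BINARY_PATH=/usr/local/bin pprof -http=:8080 heap.pb.gz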

Writeable temporary directory

This library works by creating a new temporary file (in the platform-specific default temp directory) and instructing jemalloc to dump a profile into it. The platform's temporary directory must therefore be writable by the process. After reading the file and converting it to pprof, the library cleans it up via the destructor. A single profile tends to be only a few kilobytes, so it doesn't require significant space, but the requirement is non-zero and the directory must be writable.
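
On Linux, the default temp directory respects the TMPDIR environment variable, so if the default location is not writable for your process, you can redirect it when launching (the path and binary name are examples):

TMPDIR=/var/tmp ./my-application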

Use with Polar Signals Cloud

Polar Signals Cloud allows continuously collecting heap profiling data, so the right profiling data is always available; you don't need to go searching for it, you already have it!

Polar Signals Cloud supports anything in the pprof format, so a process exposing the pprof endpoint explained above can be scraped as described in the scraping docs.

Use from C or C++

The functionality to dump the current jemalloc heap profile in pprof format is exposed to C and C++ (or any other language that can use jemalloc and can link against libraries via the C ABI). It lives in the capi (C API) package.

Building

The following prerequisites are necessary to build the C API package:

  • Working Rust and C toolchains. The former can be installed by following the instructions at https://rustup.rs. The latter can be installed via the distribution's package manager. For example, on Ubuntu, run sudo apt install build-essential.
  • jemalloc and its development headers. For example, on Ubuntu, run sudo apt install libjemalloc-dev.

Once the prerequisites are installed, the library can be built by running cargo build -p capi --release. There are three files of interest:

  • The library itself, produced at target/release/libjemalloc_pprof.so
  • A header file, at capi/include/jemalloc_pprof.h
  • A manual page, at capi/man/jemalloc_pprof.3.

The procedure for installing and using these files depends on your distribution and build system.

Use

Ensure that your binaries link against both jemalloc and jemalloc_pprof by passing the linker flags -ljemalloc -ljemalloc_pprof. The procedure for ensuring that these flags are passed depends on your build system and is currently outside the scope of this document.
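
For example, a direct invocation of cc might look like this (assuming the header and library built above are installed somewhere the compiler and linker search, or are pointed to via -I/-L):

cc -o myapp myapp.c -ljemalloc -ljemalloc_pprof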

Once that is done, profiling can be enabled either by setting the MALLOC_CONF environment variable or by defining a symbol called malloc_conf in the binary. For example:

export MALLOC_CONF="prof:true,prof_active:true,lg_prof_sample:19"

See the jemalloc man page for more details. When profiling is enabled, a profile may be dumped in pprof format via the dump_jemalloc_pprof function.

Example

This program allocates between 1 and 10 MiB every 100 milliseconds, and dumps a profile to the file my_profile every 2 seconds.

#include <assert.h>
#include <errno.h>
#include <unistd.h>
#include <stdlib.h>
#include <pthread.h>
#include <stdio.h>

#include <jemalloc_pprof.h>

void
a()
{
        size_t sz = 1 * 1024 * 1024;
        char *x = malloc(sz);
        for (size_t i = 0; i < sz; ++i) {
                x[i] = '\0';
        }
}

void
b()
{
        size_t sz = 2 * 1024 * 1024;
        char *x = malloc(sz);
        for (size_t i = 0; i < sz; ++i) {
                x[i] = '\0';
        }
}

void
c()
{
        size_t sz = 3 * 1024 * 1024;
        char *x = malloc(sz);
        for (size_t i = 0; i < sz; ++i) {
                x[i] = '\0';
        }
}

void
d()
{
        size_t sz = 4 * 1024 * 1024;
        char *x = malloc(sz);
        for (size_t i = 0; i < sz; ++i) {
                x[i] = '\0';
        }
}

void
e()
{
        size_t sz = 5 * 1024 * 1024;
        char *x = malloc(sz);
        for (size_t i = 0; i < sz; ++i) {
                x[i] = '\0';
        }
}

void
f()
{
        size_t sz = 6 * 1024 * 1024;
        char *x = malloc(sz);
        for (size_t i = 0; i < sz; ++i) {
                x[i] = '\0';
        }
}

void
g()
{
        size_t sz = 7 * 1024 * 1024;
        char *x = malloc(sz);
        for (size_t i = 0; i < sz; ++i) {
                x[i] = '\0';
        }
}

void
h()
{
        size_t sz = 8 * 1024 * 1024;
        char *x = malloc(sz);
        for (size_t i = 0; i < sz; ++i) {
                x[i] = '\0';
        }
}

void
j()
{
        size_t sz = 9 * 1024 * 1024;
        char *x = malloc(sz);
        for (size_t i = 0; i < sz; ++i) {
                x[i] = '\0';
        }
}

void
k()
{
        size_t sz = 10 * 1024 * 1024;
        char *x = malloc(sz);
        for (size_t i = 0; i < sz; ++i) {
                x[i] = '\0';
        }
}

void *
repeatedly_dump(void *ignored)
{
        char *buf;
        size_t len = 0;
        int result;
        for (;;) {
                sleep(2);
                result = dump_jemalloc_pprof(&buf, &len);
                if (result != JP_SUCCESS) {
                        fprintf(stderr, "errno: %d\n", errno);
                        continue;
                }
                if (buf) {                        
                        FILE *file = fopen("my_profile", "w");
                        assert(file);

                        fwrite(buf, sizeof(char), len, file);    
                        fclose(file);
                        printf("dumped pprof of size %lu\n", len);
                        free(buf);
                }
        }
        return NULL;
}

int
main()
{
        pthread_t tid;
        int result;

        result = pthread_create(&tid, NULL, repeatedly_dump, NULL);
        assert(!result);
        for (;;) {
                usleep(100000);
                switch (rand() % 10) {
                case 0:
                        a();
                        break;
                case 1:
                        b();
                        break;
                case 2:
                        c();
                        break;
                case 3:
                        d();
                        break;
                case 4:
                        e();
                        break;
                case 5:
                        f();
                        break;
                case 6:
                        g();
                        break;
                case 7:
                        h();
                        break;
                case 8:
                        j();
                        break;
                case 9:
                        k();
                        break;
                }
        }
}
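
A sketch of building and running the example (file names are placeholders, and this assumes the header and library are installed where the toolchain can find them):

cc -o example example.c -ljemalloc -ljemalloc_pprof
MALLOC_CONF="prof:true,prof_active:true,lg_prof_sample:19" ./example
pprof -http=:8080 my_profile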

rust-jemalloc-pprof's People

Contributors

brancz, umanwizard

rust-jemalloc-pprof's Issues

Attempt to add with overflow

Describe the bug

There was an overflow when fetching the memory heap file.

How to reproduce?

  1. download rust-jemalloc-pprof
  2. build & run

What did you expect to see?

Successfully fetching files from profiling memory

What did you see instead?

An empty reply, followed by a panic: attempt to add with overflow.

Support parsing existing jemalloc profile /proc/id/maps section

Right now parse_jeheap uses MAPPINGS to populate mapping information:

if let Some(mappings) = MAPPINGS.as_ref() {
    for mapping in mappings {
        profile.push_mapping(mapping.clone());
    }
}

When using parse_jeheap on an existing file, this still collects mappings for the running process:

/// Mappings of the processes' executable and shared libraries.
#[cfg(target_os = "linux")]
pub static MAPPINGS: Lazy<Option<Vec<Mapping>>> = Lazy::new(|| {
    /// Build a list of mappings for the passed shared objects.
    fn build_mappings(objects: &[SharedObject]) -> Vec<Mapping> {
        let mut mappings = Vec::new();
        for object in objects {
            for segment in &object.loaded_segments {
                // I have observed that `memory_offset` can be negative on some very old
                // versions of Linux (e.g. CentOS 7), so use wrapping add here.
                let memory_start = object.base_address.wrapping_add(segment.memory_offset);
                mappings.push(Mapping {
                    memory_start,
                    memory_end: memory_start + segment.memory_size,
                    memory_offset: segment.memory_offset,
                    file_offset: segment.file_offset,
                    pathname: object.path_name.clone(),
                    build_id: object.build_id.clone(),
                });
            }
        }
        mappings
    }

    // SAFETY: We are on Linux, and this is the only place in the program this
    // function is called.
    match unsafe { crate::linux::collect_shared_objects() } {
        Ok(objects) => Some(build_mappings(&objects)),
        Err(err) => {
            error!("build ID fetching failed: {err}");
            None
        }
    }
});

#[cfg(not(target_os = "linux"))]
pub static MAPPINGS: Lazy<Option<Vec<Mapping>>> = Lazy::new(|| {
    error!("build ID fetching is only supported on Linux");
    None
});

/// Information about a shared object loaded into the current process.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct SharedObject {
    /// The address at which the object is loaded.
    pub base_address: usize,
    /// The path of that file the object was loaded from.
    pub path_name: PathBuf,
    /// The build ID of the object, if found.
    pub build_id: Option<BuildId>,
    /// Loaded segments of the object.
    pub loaded_segments: Vec<LoadedSegment>,
}

/// Build ID of a shared object.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct BuildId(Vec<u8>);

impl fmt::Display for BuildId {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for byte in &self.0 {
            write!(f, "{byte:02x}")?;
        }
        Ok(())
    }
}

/// A segment of a shared object that's loaded into memory.
#[derive(Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct LoadedSegment {
    /// Offset of the segment in the source file.
    pub file_offset: u64,
    /// Offset to the `SharedObject`'s `base_address`.
    pub memory_offset: usize,
    /// Size of the segment in memory.
    pub memory_size: usize,
}

/// Collects information about all shared objects loaded into the current
/// process, including the main program binary as well as all dynamically loaded
/// libraries. Intended to be useful for profilers, who can use this information
/// to symbolize stack traces offline.
///
/// Uses `dl_iterate_phdr` to walk all shared objects and extract the wanted
/// information from their program headers.
///
/// SAFETY: This function is written in a hilariously unsafe way: it involves
/// following pointers to random parts of memory, and then assuming that
/// particular structures can be found there. However, it was written by
/// carefully reading `man dl_iterate_phdr` and `man elf`, and is thus intended
/// to be relatively safe for callers to use. Assuming I haven't written any
/// bugs (and that the documentation is correct), the only known safety
/// requirements are:
///
/// (1) It must not be called multiple times concurrently, as `dl_iterate_phdr`
///     is not documented as being thread-safe.
/// (2) The running binary must be in ELF format and running on Linux.
pub unsafe fn collect_shared_objects() -> Result<Vec<SharedObject>, anyhow::Error> {
    let mut state = CallbackState {
        result: Ok(Vec::new()),
    };
    let state_ptr = std::ptr::addr_of_mut!(state).cast();

    // SAFETY: `dl_iterate_phdr` has no documented restrictions on when
    // it can be called.
    unsafe { dl_iterate_phdr(Some(iterate_cb), state_ptr) };

    state.result
}

When profiling is enabled and disabled inline, this makes sense.
However, for an existing jemalloc profile, this is counter-intuitive, because the returned mappings are not actually generated from the heap file.

For example, if this method is run on an existing heap file, the returned mappings will be different each time, and do not actually match what exists in the file.

It would be great to support parsing the /proc/id/maps output from the .heap file.

Not building on M1 mac

Hey! I'm seeing this on M1 mac:

error[E0432]: unresolved imports `libc::dl_iterate_phdr`, `libc::dl_phdr_info`, `libc::Elf64_Word`, `libc::PT_LOAD`, `libc::PT_NOTE`
  --> /Users/jack/.cargo/registry/src/index.crates.io-6f17d22bba15001f/jemalloc_pprof-0.1.0/src/linux.rs:27:20
   |
27 | use libc::{c_void, dl_iterate_phdr, dl_phdr_info, size_t, Elf64_Word, PT_LOAD, PT_NOTE};
   |                    ^^^^^^^^^^^^^^^  ^^^^^^^^^^^^          ^^^^^^^^^^  ^^^^^^^  ^^^^^^^ no `PT_NOTE` in the root
   |                    |                |                     |           |
   |                    |                |                     |           no `PT_LOAD` in the root
   |                    |                |                     no `Elf64_Word` in the root
   |                    |                no `dl_phdr_info` in the root
   |                    no `dl_iterate_phdr` in the root

error[E0609]: no field `p_type` on type `&_`
   --> /Users/jack/.cargo/registry/src/index.crates.io-6f17d22bba15001f/jemalloc_pprof-0.1.0/src/linux.rs:225:15
    |
225 |         if ph.p_type == PT_LOAD {
    |               ^^^^^^

Support macOS

Edits from Brennan:

I'm repurposing this issue to "support macOS". The original issue is below.

The main thing we need to do to support macOS is provide something like linux::collect_shared_objects for macOS. This can probably be done using the functions documented in dyld(3).


Hi peeps,

I am looking for some guidance on how to run pprof on macos,

I have a rust app configured as follows:

  1. app/Cargo.toml

[profile.release]
lto = "thin"
debug = 1  # Line tables only.

[dependencies]
tikv-jemalloc-ctl = { version = "0.5" }
tikv-jemalloc-sys = { version = "0.5", features = ["profiling"] }
tikv-jemallocator = { version = "0.5", features = ["profiling"] }

[target.'cfg(not(target_os = "macos"))'.dependencies]
jemalloc_pprof = "0.1.0"

  2. conditionally add it to axum

    #[cfg(not(target_os = "macos"))]
    {
        router = router.route("/debug/pprof/heap", get(web::pprof::pprof_heap));
    }

  3. port-forward on the pod

  4. run go tool pprof http://localhost:9200/debug/pprof/heap locally, it throws:

Fetching profile over HTTP from http://localhost:9200/debug/pprof/heap
Local symbolization failed for app (build ID 9d67e10139bad60c021451c99d19ec33b08f8a67): open /usr/local/bin/app: no such file or directory
Local symbolization failed for libpthread.so.0 (build ID 255e355c207aba91a59ae1f808e3b4da443abf0c): open /lib/x86_64-linux-gnu/libpthread.so.0: no such file or directory
Local symbolization failed for libc.so.6 (build ID a3780b0b8a5bf5876e31d16b0a9d8fc6ba69a1f2): open /lib/x86_64-linux-gnu/libc.so.6: no such file or directory
Some binary filenames not available. Symbolization may be incomplete.
Try setting PPROF_BINARY_PATH to the search path for local binaries.
http post http://localhost:9200/debug/pprof/symbol: server response: 404 Not Found

the mechanics of how all this connects are quite hazy for me, but it looks like the app binary does not have symbols and that is causing the local symbolization to fail,

question: is that due to #[cfg(not(target_os = "macos"))]? pprof works for the CPU profile, indicating the app binary has the correct symbols (maybe?)

btw, for testing, I've copied the binary locally, and "open /usr/local/bin/app: no such file or directory" went away; the other two remained

Local symbolization failed

Hello,

This is more of a question than a real issue. I do not understand the internals of pprof well, and could not figure out what was going wrong.

I am coming from a Golang background, where pprof works out of the box for CPU and memory via the net/http/pprof package. Nice and easy.

When I work in Rust, I find the CPU profile works similarly to Go, but memory profiling is not out of the box with this lib.

The way I used rust-jemalloc-pprof is:

  1. set up some web handlers, so a profile is triggered when receiving a request like /debug/pprof/heap
  2. curl the handler remotely from my local machine
  3. use pprof to analyze the profile dump on my local machine.

I find it works for Golang cpu/heap and Rust cpu profiles. But for memory, I am seeing Local symbolization failed for xxxxx-a2099af8e65d354b (build ID c04fcdbe120999b2): stat /tmp/xxxxx-a2099af8e65d354b. I can get around it by setting PPROF_BINARY_PATH and downloading xxxxx-a2099af8e65d354b to my local host.

But I do not understand why I only need to do this for the Rust jemalloc memory profile, and is there any way for me to get it to work automatically?
