
ferrisetw's Introduction

FerrisETW 🦀

This crate provides safe Rust abstractions over the ETW consumer APIs.

It started as a KrabsETW rip-off written in Rust (hence the name Ferris 🦀). All credit goes to the team at Microsoft who develop KrabsETW; without it, this project probably wouldn't exist.
Since version 1.0, the API and internal architecture of this crate have been diverging slightly from KrabsETW, so that it is more Rust-idiomatic.

Examples

You can find examples in the crate documentation on docs.rs, as well as in the examples and tests folders.

If you are familiar with KrabsETW, you'll see that it is very similar. If you've never used KrabsETW before, the examples are straightforward and should be easy to follow. If you have any issues, don't hesitate to ask.

Documentation

This crate is documented at docs.rs.

Notes

  • The project is still WIP. Feel free to report bugs, issues, feature requests, etc. Contributions are, of course, happily accepted!

  • The types available for parsing are those that implement the trait TryParse for Parser; basic types are already implemented. In the near future I'll add more :)

  • I tried to keep dependencies as minimal as possible. You'll also see that I went with the new windows-rs instead of winapi. This is a personal decision, mainly because I believe the Windows bindings are going to be the "standard" way to interact with the Windows API in the near future.

Acknowledgments

  • First of all, the team at MS who develop KrabsETW!!
  • Shaddy for, pretty much, teaching me all the Rust I know 😃
  • n4r1b for creating this great crate
  • daladim for adding even more features

ferrisetw's People

Contributors

daladim, dependabot[bot], drchat, jrmuizel, jxy-s, mina86, n4r1b, poliorcetics, roblabla, vthib, yjugl

ferrisetw's Issues

Enable non-threaded or blocking trace session

Hi,
I really like the work you've done here, good job!

For my purposes, I usually want to start a trace, and run it until I quit the program. In the current implementation Ferris spawns a new anonymous thread to do the processing, which means I need to either do a 'sleep forever' or 'wait for user input' hack.

Would it be possible to either:

  • Run a trace in the same thread (e.g. 'blocking' mode); or
  • Expose the spawned thread in the Trace struct, so I could do a .join on it
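Both options can be sketched with the standard library alone. Everything named here (`Trace`, `start`, `process_blocking`, `fake_processing_loop`) is a hypothetical stand-in rather than ferrisetw's API, and the closure merely plays the role of the blocking event-processing loop:

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a trace session; not ferrisetw's real type.
pub struct Trace {
    /// Handle of the processing thread, exposed so callers can `.join()` on it.
    pub worker: JoinHandle<u32>,
}

impl Trace {
    /// Second option: spawn the processing loop and hand the handle back.
    pub fn start<F>(process: F) -> Trace
    where
        F: FnOnce() -> u32 + Send + 'static,
    {
        Trace { worker: thread::spawn(process) }
    }

    /// First option: run the processing loop on the calling thread ("blocking" mode).
    pub fn process_blocking<F: FnOnce() -> u32>(process: F) -> u32 {
        process()
    }
}

/// Placeholder for the event-processing loop; returns the number of events "seen".
pub fn fake_processing_loop() -> u32 {
    (0..5).map(|_| 1).sum()
}
```

With the handle exposed, a caller can write `trace.worker.join()` instead of sleeping forever or waiting for user input.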

Invalid widestring pointer

Hello, I have used the code from this commit (#42). A PropertyError("Invalid widestring pointer") error occurred when I used the following code:

TryParse::<String>::try_parse(&parser, "appname")

This problem does not occur if I switch to the previous commits.

Support filter per PID for system traces

EventFilter::ByPids is only effective on a kernel-mode logger session.

See https://learn.microsoft.com/en-us/windows/win32/api/evntprov/ns-evntprov-event_filter_descriptor:

The PIDs based filter-blob is only valid for a kernel mode logger session because the private logger session runs inside a user-mode process

But this does not work for KernelTraces in ferrisetw. It would be good to support it.

Ideas:

  • Maybe there's a distinction between "a trace run in kernel-mode" and a "System trace"? But is a ferrisetw::KernelTrace one of them in the first place?
  • Maybe that's post-win10 build 20348 anyway? (see https://learn.microsoft.com/en-us/windows/win32/etw/system-providers)
  • Maybe that's not possible at all, and this should be documented in ferrisetw

If this eventually works, it should be covered by an integration test.

Missing LICENSE.md

This repository doesn't have a license file. I see that the Cargo.toml says "MIT or Apache 2.0".

I recommend adding LICENSE.md files to indicate the license more clearly.

Unable to process different events from the same TraceLogging provider

I have a provider binary that emits several different events under the same provider GUID using TraceLogging. Here's an example:

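#include <windows.h>
#include <TraceLoggingProvider.h>

static const SID sid = { SID_REVISION, 1, 5, { 18 } };

TRACELOGGING_DECLARE_PROVIDER(g_hProvider);
// 3B9CAB28-762A-4740-A82B-B6829CC90ADF
TRACELOGGING_DEFINE_PROVIDER(
    g_hProvider,
    "My-Test-Provider",
    (0x3b9cab28, 0x762a, 0x4740, 0xa8, 0x2b, 0xb6, 0x82, 0x9c, 0xc9, 0xa, 0xdf));

// Forward declarations so main() can call the emitters defined below.
void EmitEID1();
void EmitEID2();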

int main() {
    TraceLoggingRegister(g_hProvider);

    EmitEID1();

    for (int i = 0; i < 5; i++) {
        EmitEID2();
    }

    TraceLoggingUnregister(g_hProvider);
}

void EmitEID1() {
    TraceLoggingWrite(g_hProvider,
        "ProcessCreation",
        TraceLoggingUInt8(1, "EventId"),
        TraceLoggingUInt32(44, "Pid"),
        TraceLoggingUInt32(400, "ParentPid"),
        TraceLoggingWideString(L"test-provider.exe", "ParentProcessName"),
        TraceLoggingUInt32(600, "CreatorPid"),
        TraceLoggingWideString(L"test-provider.exe", "CreatorProcessName"),
        TraceLoggingWideString(L"test-provider.exe", "FileName"),
        TraceLoggingBoolean(TRUE, "ExactFileName"),
        TraceLoggingWideString(L"testing", "CommandLine"),
        TraceLoggingSid(&sid, "Sid"),
        TraceLoggingBoolean(FALSE, "SubsystemProcess")
    );
}

void EmitEID2() {
    TraceLoggingWrite(g_hProvider,
        "ThreadCreation",
        TraceLoggingUInt8(2, "EventId"),
        TraceLoggingUInt32(6000, "CreatorPid"),
        TraceLoggingWideString(L"test-provider.exe", "CreatorProcessName"),
        TraceLoggingUInt32(444, "TargetPid"),
        TraceLoggingWideString(L"test-provider.exe", "TargetProcessName"),
        TraceLoggingUInt32(6464, "TargetThreadId"),
        TraceLoggingSid(&sid, "Sid")
    );
}

I've written a basic consumer application that parses the events into a struct.

use ferrisetw::*;

fn main() {
    let test_provider = provider::Provider
        ::by_guid("3B9CAB28-762A-4740-A82B-B6829CC90ADF")
        .add_callback(test_callback)
        .build();

    let test_trace = UserTrace::new()
        .enable(test_provider)
        .start_and_process()
        .unwrap();

    std::thread::sleep(std::time::Duration::new(60, 0));

    test_trace.stop().unwrap();
}

fn test_callback(record: &EventRecord, schema_locator: &SchemaLocator) {
    match schema_locator.event_schema(record) {
        Err(err) => println!("Unable to get the ETW schema for event: {:?}", err),
        Ok(schema) => parse_event(&schema, record)
    }    
}

fn parse_event(schema: &schema::Schema, record: &EventRecord) {
    let parser = parser::Parser::create(record, schema);
    match parser.try_parse::<u8>("EventId").unwrap_or(0) {
        2 => {
            let event = ThreadCreatedEvent {
                id: 2,
                description: String::from("Thread created"),
                creator_pid: parser.try_parse::<u32>("CreatorPid").unwrap_or(0),
                creator_process_name: parser.try_parse::<String>("CreatorProcessName").unwrap_or_else(|_| String::from("")),
                target_pid: parser.try_parse::<u32>("TargetPid").unwrap_or(0),
                target_process_name: parser.try_parse::<String>("TargetProcessName").unwrap_or_else(|_| String::from("")),
                target_thread_id: parser.try_parse::<u32>("TargetThreadId").unwrap_or(0),
                sid: parser.try_parse::<String>("Sid").unwrap_or_else(|_| String::from(""))
            };
        
            println!("{:?}", event);
        }
        _ => {}
    }
}

In the example, I emit Event 1 once before emitting Event 2 five times. My problem is that all instances of Event 2 fail to parse correctly, specifically with TdhNativeError(IoError(Os { code: 1168, kind: Uncategorized, message: "Element not found." })). If I remove the line containing EmitEID1() from the example, the events parse properly. I'm not sure why I'm unable to parse multiple events, and any help would be much appreciated.

Use the W versions of the Windows APIs

We're currently feeding String::as_ptr() to the A versions of the Windows APIs.
This means we'll get unexpected results and/or UB when using non-ASCII trace names, for instance.
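For context, a Rust `String` holds UTF-8 bytes with no trailing NUL, while the `W` APIs expect NUL-terminated UTF-16. A minimal conversion sketch using only the standard library follows; the helper name `to_wide` is made up for illustration and is not taken from the crate:

```rust
/// Encode a Rust string as a NUL-terminated UTF-16 buffer, the form that
/// the `W` ("wide") Windows APIs expect for string arguments.
pub fn to_wide(s: &str) -> Vec<u16> {
    s.encode_utf16().chain(std::iter::once(0)).collect()
}
```

The returned `Vec<u16>` must stay alive for as long as the native call uses the pointer obtained from `as_ptr()`.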

Configure kernel traces with TraceSetInformation

I've hit a roadblock when using this library to gather stack walking information on sampled profiles. Even when wrangling the control handle from the trace with transmutes after calling start(), it's too late to set what I need, and it ends up not working.

I noticed there was this comment:
// TODO: For kernel traces, implement enable_provider function for providers that require call to TraceSetInformation with extended PERFINFO_GROUPMASK

If either a simple callback with the control handle, or a full solution for managing kernel trace information could be added, that would save a lot of headaches.

Trait bound error caused by different windows-rs dependency versions

The windows-rs version in ferrisetw's Cargo.toml is 0.39, but the windows-rs version in my project is 0.42. This causes the following error when compiling:

error[E0277]: the trait bound `Parser<'_>: TryParse<GUID>` is not satisfied
  --> src/main.rs:23:66
   |
23 |                     let guid: GUID = TryParse::<GUID>::try_parse(&mut parser, "Guid").unwrap();
   |                                      --------------------------- ^^^^^^^^^^^ the trait `TryParse<GUID>` is not implemented for `Parser<'_>`
   |                                      |
   |                                      required by a bound introduced by this call
   |
   = help: the following other types implement trait `TryParse<T>`:

My project only compiles when it uses the same windows-rs version as ferrisetw.

Is there any good solution?

Ideas for the next major release

Hello n4r1b,

Here are a few ideas that could improve the crate, but that would likely break compatibility, so they would probably require bumping to a new major version.
(note: I have more or less planned to work on them, and I already have a few draft branches locally. I'll probably make MRs soon, but I'm currently focusing on non-breaking changes first)
Feel free to comment on or discuss them; maybe there are better solutions to these issues that I haven't thought about yet.

Features

  • Implement event filters (that's the // TODO: Add Filters option). As a kind of side-effect in my draft branch, this slightly modified a public API, and thus should be left for a major bump

API

  • Use a bitfield for TraceFlags instead of a u32
  • (to check) have a cleaner distinction between open, start and process. One of them is probably not required.
  • I'd personally rename some structs such as PropertyInfo or Schema. This issue comes from krabsetw names, but I feel they're unfortunate because they do not really convey their actual meaning (could be respectively PropertyWithBuffer and EventAndSchema for instance)
  • ...as a consequence, since the Parser contains the Schema, which contains the EventRecord, the Parser owns the Event data, and that feels weird. If I read correctly, this even prevents an ideal caching of such structs.
  • I think we could improve the Builder pattern for the Provider (e.g. some methods should be mutually exclusive and called only once, such as by_guid and by_name)
  • Provider and Trace have setters for fields that are pub. We could probably have only one of them
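As a rough illustration of the "bitfield instead of a u32" idea, here is a hand-rolled newtype with no external crates. The flag names and bit values (`FLAG_A`, `FLAG_B`) are invented for the sketch and do not correspond to real ETW flags:

```rust
use std::ops::BitOr;

/// Hypothetical flag set; the names and bit values are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct TraceFlags(u32);

impl TraceFlags {
    pub const NONE: TraceFlags = TraceFlags(0);
    pub const FLAG_A: TraceFlags = TraceFlags(1 << 0);
    pub const FLAG_B: TraceFlags = TraceFlags(1 << 1);

    /// Raw value to hand to a native API that expects a `u32`.
    pub const fn bits(self) -> u32 {
        self.0
    }

    /// True if every bit set in `other` is also set in `self`.
    pub const fn contains(self, other: TraceFlags) -> bool {
        self.0 & other.0 == other.0
    }
}

impl BitOr for TraceFlags {
    type Output = TraceFlags;

    /// Combine two flag sets, e.g. `FLAG_A | FLAG_B`.
    fn bitor(self, rhs: TraceFlags) -> TraceFlags {
        TraceFlags(self.0 | rhs.0)
    }
}
```

Because the inner `u32` is private, callers can only build values from the named constants, which is the main gain over a bare integer.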

Safety considerations

  • Be able to move a UserTrace (or KernelTrace). We cannot do this currently, as we are feeding OpenTraceA with pointers to UserTrace::data that must not be moved during the lifetime of the trace.
    For the same reason, we could consider introducing lifetime bounds for EventTraceLogfile
  • EventRecord should be a read-only safe wrapper instead of a simple newtype, because it does not make sense to modify e.g. record.ExtendedDataCount
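One possible shape for such a read-only wrapper, sketched under assumptions: `RawRecord` is a two-field stand-in for the native record (the real EVENT_RECORD is much larger), and the getter names are hypothetical:

```rust
/// Hypothetical stand-in for the native record; only two fields are modelled.
#[repr(C)]
pub struct RawRecord {
    pub event_id: u16,
    pub extended_data_count: u16,
}

/// Read-only view: the inner record is a private field, so code outside
/// this module can read it through getters but cannot modify it.
#[repr(transparent)]
pub struct EventRecord(RawRecord);

impl EventRecord {
    /// Reinterpret a reference to the raw record as a reference to the wrapper.
    pub fn from_raw(raw: &RawRecord) -> &EventRecord {
        // SAFETY: `EventRecord` is `repr(transparent)` over `RawRecord`,
        // so both types have the same layout.
        unsafe { &*(raw as *const RawRecord as *const EventRecord) }
    }

    pub fn event_id(&self) -> u16 {
        self.0.event_id
    }

    pub fn extended_data_count(&self) -> u16 {
        self.0.extended_data_count
    }
}
```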

Performance

Ideas from #25 most likely require an API change, and hence a major version bump as well

Do not ignore the very last events

Since (a PR to come, probably #63), we're ignoring the very last ETW events that were still in the buffers when we called CloseTrace

We may want to process them, and drop the memory structures when we're sure every event is processed.
See

/// TODO: it _might_ be possible to know whether we've processed the last buffered event, as
///       ControlTraceW(EVENT_TRACE_CONTROL_QUERY) _might_ tell us if the buffers are empty or not.
///       In case the trace is in ERROR_CTX_CLOSE_PENDING state, we could call this after every
///       callback so that we know when to actually free memory used by the (now useless) callback.
///       Maybe also setting the BufferCallback in EVENT_TRACE_LOGFILEW may help us.

Panic on Kernel Trace close

Hi there,

I get a panic when I try to call .stop() on a Kernel Trace. Basic code:

let provider_io = Provider::kernel(&kernel_providers::FILE_IO_PROVIDER)
    .build()
    .unwrap();

let mut trace = KernelTrace::new()
    .named(String::from("HijackWatcher"))
    .enable(provider_io)
    .start()
    .unwrap();

std::thread::sleep(Duration::new(3, 0));
trace.stop();

Stack trace:

thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', C:\Users\xxx\.cargo\registry\src\github.com-1ecc6299db9ec823\ferrisetw-0.1.1\src\trace.rs:112:30
stack backtrace:
   0:     0x7ff6de00a782 - std::backtrace_rs::backtrace::dbghelp::trace
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library\std\src\..\..\backtrace\src\backtrace\dbghelp.rs:98
   1:     0x7ff6de00a782 - std::backtrace_rs::backtrace::trace_unsynchronized
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library\std\src\..\..\backtrace\src\backtrace\mod.rs:66
   2:     0x7ff6de00a782 - std::sys_common::backtrace::_print_fmt
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library\std\src\sys_common\backtrace.rs:66
   3:     0x7ff6de00a782 - std::sys_common::backtrace::_print::impl$0::fmt
                               at /rustc/897e37553bba8b42751c67658967889d11ecd120/library\std\src\sys_common\backtrace.rs:45

Windows version:

OS Name:                   Microsoft Windows 11 Pro
OS Version:                10.0.22621 N/A Build 22621

Style: replace Box<Arc<T>> with Arc<T>

That's what is suggested by Clippy

warning: usage of `Box<Arc<CallbackData>>`
   --> src\trace.rs:173:20
    |
173 |     callback_data: Box<Arc<CallbackData>>,
    |                    ^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: `#[warn(clippy::redundant_allocation)]` on by default
    = note: `Arc<CallbackData>` is already on the heap, `Box<Arc<CallbackData>>` makes an extra allocation
    = help: consider using just `Box<CallbackData>` or `Arc<CallbackData>`
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#redundant_allocation

In our case, we do not only want the CallbackData to be on the heap, but the ref counter as well.
https://github.com/rust-lang/rust/blob/96ddd32c4bfb1d78f0cd03eb068b1710a8cebeef/library/alloc/src/sync.rs#L352 suggests the ref counters (both atomic::usizes) really live on the heap, so this change should be sensible.

The change would only avoid one allocation, on a struct that's created only a handful of times. The current code works properly, and this is a private implementation detail of the crate (so it can be changed at any time without any trouble for users), so I'm leaving this for later.
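A small standard-library check of the reasoning above: the data behind an `Arc` keeps the same heap address when the `Arc` handle itself is moved, so a raw pointer to it stays valid without an extra `Box`. The `CallbackData` below is a one-field placeholder, not the crate's private struct:

```rust
use std::sync::Arc;

/// Hypothetical callback state; stands in for the crate's private `CallbackData`.
pub struct CallbackData {
    pub events_seen: u32,
}

/// Address of the heap allocation shared by all clones of the `Arc`.
pub fn heap_address(data: &Arc<CallbackData>) -> *const CallbackData {
    Arc::as_ptr(data)
}

/// Move the `Arc` handle into a new owner (a `Vec`) and report the address again.
pub fn address_after_move(
    data: Arc<CallbackData>,
) -> (*const CallbackData, Vec<Arc<CallbackData>>) {
    let holder = vec![data];
    (Arc::as_ptr(&holder[0]), holder)
}
```

Comparing `heap_address` before the move with the pointer returned by `address_after_move` gives the same address.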

Possible race conditions

Race 1

There is no specific synchronization mechanism in trace_callback_thunk().
This means a thread could be closing the trace session and destroying the TraceData at the very moment the last callback is triggered. The event_record.user_context dereferenced in that other thread, which is supposed to point to a TraceData, could then be dangling.

Race 2

If a thread closes the trace session and destroys the TraceData, the associated Provider, hence the associated callbacks, which are closures, are dropped. That's a problem in case a callback is still in progress (e.g. a callback is stuck in a blocking function call), because the closure may contain state (e.g. in a move || closure).
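One general way to keep a closure alive while a call is in flight is shared ownership: the caller takes its own strong reference before invoking. The sketch below shows only that idea with made-up types (`EventCallback`, `InFlight`, and a simplified `TraceData`); it does not address how a raw `user_context` pointer would be validated, and it is not a description of the crate's implementation:

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical callback type; real callbacks would receive an event record.
pub type EventCallback = Box<dyn FnMut(u32) + Send>;

/// Owner side: dropping this models closing the trace and destroying its data.
pub struct TraceData {
    pub callback: Arc<Mutex<EventCallback>>,
}

/// Thunk side: holds its own strong reference for the duration of one call.
pub struct InFlight {
    callback: Arc<Mutex<EventCallback>>,
}

impl TraceData {
    /// Take a strong reference before invoking, so the closure outlives the call.
    pub fn begin_callback(&self) -> InFlight {
        InFlight { callback: Arc::clone(&self.callback) }
    }
}

impl InFlight {
    /// Run the callback; the closure is freed only after the last reference drops.
    pub fn run(self, event: u32) {
        let mut cb = self.callback.lock().unwrap();
        (*cb)(event);
    }
}
```

Here, dropping the `TraceData` between `begin_callback()` and `run()` does not free the closure, since the `InFlight` value still owns a reference.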

Notes

krabsetw may have the same issues. See the issue in krabsetw

De-duplicate code between user and kernel traces

In ferrisetw 0.x, there was a macro doing most of the work.

It has been removed in ferrisetw 1.0 because of the different UserTraceBuilder and KernelTraceBuilder.
I tried to de-duplicate them using a TraceBuilder<T: TraceTrait>, but that is hard: I ended up needing to make NativeEtw pub, because it would otherwise be a private type leaking into public interfaces. So I gave up.

But there should be a way; one just has to find the correct balance between traits, macros and dedicated impl ... for ....
Also, do we need to have separate types after all? There is only one Provider, so we may be OK with a single Trace type.
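For discussion, here is the general shape of a single builder that is generic over a marker trait, with the trait sealed so downstream crates cannot implement it. All names (`TraceKind`, `User`, `Kernel`, the default names) are invented, and the sketch deliberately avoids the `NativeEtw` visibility problem rather than solving it:

```rust
mod sealed {
    /// Reachable only through this private module, so downstream crates
    /// cannot implement `TraceKind` for their own types.
    pub trait Sealed {}
}

/// Hypothetical marker trait distinguishing the two kinds of trace.
pub trait TraceKind: sealed::Sealed {
    const DEFAULT_NAME: &'static str;
}

pub struct User;
pub struct Kernel;

impl sealed::Sealed for User {}
impl sealed::Sealed for Kernel {}

impl TraceKind for User {
    const DEFAULT_NAME: &'static str = "user-trace";
}
impl TraceKind for Kernel {
    const DEFAULT_NAME: &'static str = "kernel-trace";
}

/// One builder shared by both kinds; kind-specific behaviour lives in the trait.
pub struct TraceBuilder<K: TraceKind> {
    name: String,
    _kind: std::marker::PhantomData<K>,
}

impl<K: TraceKind> TraceBuilder<K> {
    pub fn new() -> Self {
        TraceBuilder {
            name: K::DEFAULT_NAME.to_string(),
            _kind: std::marker::PhantomData,
        }
    }

    /// Override the default name for this trace.
    pub fn named(mut self, name: &str) -> Self {
        self.name = name.to_string();
        self
    }

    pub fn name(&self) -> &str {
        &self.name
    }
}
```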

Ideas to improve performance of Ferrisetw

Here are a few ideas I had when reading the code, before profiling it.
Feel free to add any remarks and comments :)

How important and efficient are all these ideas?
TODO: use a profiler to benchmark the few places that look time-consuming.

When a callback is called (how is the Schema built?)

EVENT_RECORD is Copy. Depending on how the compiler optimizes it, it is possibly copied at every function call:

  • ctx.on_event(*event_record);
  • prov.on_event(record, locator);, once for each provider
  • callbacks.iter_mut().for_each(|cb| cb(record, locator)), once for each callback of each provider
    Even in cases where there is a single provider with a single callback, that may be quite a lot of copies (especially as EVENT_RECORD is quite large).

Solution:

  • To be sure we avoid copies (regardless of how the compiler might happen to optimize it), we could change this to a &EVENT_RECORD.
    This would require Schema to not own it

That's for the event payload part.
Considering the ETW schema, it is properly cached in the SchemaLocator and is retrieved quickly.

When a Parser is created

One of the first steps in the callback is to call Parser::create(&schema). This:

  • copies the user_buffer (i.e. the actual event payload). This might be avoided (only take a reference to it?)
  • calls PropertyIter::enum_properties() for every event record, although this only depends on the schema, not on the record itself!
    • that's costly (because enum_properties() builds a Vec<Property>)
    • (BTW that's not what Rust usually calls an "Iterator", as this does not implement Iterator. Change its name?)
    • Solutions:
      • either build it once per actual schema (not per RecordAndSchema) (not possible, see next comment)
        Here as well, splitting Schema from the EventRecord would be a good thing
      • do this lazily, only when/if we require a property to be parsed (maybe too much work for little benefit: there should not be tons of different SchemaKeys for a given trace, so having a little work done at the first item of every kind should be kinda OK)
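The lazy variant could look roughly like this, using `std::cell::OnceCell` (stable since Rust 1.70). `LazyParser`, its fields, and the `builds` counter are all hypothetical; the counter exists only to make the "built once, on demand" behaviour observable:

```rust
use std::cell::{Cell, OnceCell};

/// Hypothetical parser that defers building its property list until first use.
pub struct LazyParser {
    schema_fields: Vec<&'static str>,
    properties: OnceCell<Vec<String>>,
    /// Counts how many times the property list was actually built.
    pub builds: Cell<u32>,
}

impl LazyParser {
    /// Creating the parser does no enumeration work.
    pub fn new(schema_fields: Vec<&'static str>) -> Self {
        LazyParser {
            schema_fields,
            properties: OnceCell::new(),
            builds: Cell::new(0),
        }
    }

    /// Builds the list on the first call only; later calls reuse it.
    fn properties(&self) -> &[String] {
        self.properties.get_or_init(|| {
            self.builds.set(self.builds.get() + 1);
            self.schema_fields.iter().map(|f| f.to_string()).collect()
        })
    }

    pub fn has_property(&self, name: &str) -> bool {
        self.properties().iter().any(|p| p.as_str() == name)
    }
}
```

A callback that never asks for a property then pays nothing for the enumeration.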

When a Property is accessed

parser.try_parse(...) does many things. But most work is done in find_property()

  • Hopefully, it is cached in the Parser...which depends on the event record
  • Could it be cached (or most of its event-independent work) in the ETW schema instead?
    This would require reviewing the code and splitting it into two parts: the record-dependent and record-independent code

Note: API changes

Currently, the callbacks are passed an EVENT_RECORD and a SchemaLocator.
As stated in a TODO in the code, this is not straightforward. We could/should:

  • (Pass an &EVENT_RECORD, see above)
  • Do not pass the SchemaLocator, but the Schema directly (bad idea, some callbacks do not need the Schema. Let's keep giving them the ability to retrieve it or not)
  • This Schema would probably not own the event record (nor a ref to it).
  • Note: passing an already built Parser instead of a Schema is probably not a good idea. The end user may want to avoid its creation on most events, and create it only for e.g. the event IDs that interest them

Memory usage improvements in EventTraceProperties

Currently, we have

EventTraceProperties {
    etw_trace_properties,
    trace_name: [0; 1024],
    log_file_name: [0; 1024],
};

We could avoid using two arrays of 1024 bytes to store strings that are probably shorter than this. Maybe we could make EventTraceProperties generic on the name length, or really support dynamic allocation of the name.
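The "generic on the name length" option might be sketched with const generics as below. This is a simplified model: the native header field (`etw_trace_properties`) is left out, and the `u16` element type is an assumption made for the sketch:

```rust
/// Hypothetical properties block whose name buffers are sized by the caller.
/// The native header that precedes the names in the real struct is omitted.
#[repr(C)]
pub struct EventTraceProperties<const N: usize> {
    pub trace_name: [u16; N],
    pub log_file_name: [u16; N],
}

impl<const N: usize> EventTraceProperties<N> {
    /// Copies `name` as UTF-16; returns `None` if it does not fit in `N`
    /// code units together with its NUL terminator.
    pub fn new(name: &str) -> Option<Self> {
        let wide: Vec<u16> = name.encode_utf16().collect();
        if wide.len() >= N {
            return None;
        }
        // The buffer starts zeroed, so the copied name is NUL-terminated.
        let mut trace_name = [0u16; N];
        trace_name[..wide.len()].copy_from_slice(&wide);
        Some(EventTraceProperties { trace_name, log_file_name: [0u16; N] })
    }
}
```

The dynamic-allocation alternative would trade the fixed arrays for a heap buffer, at the cost of computing the name offsets at run time.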

Support more TDH property types

Property::new() can return PropertyError::UnimplementedType, depending on the TDH property flags.

It would be good to support more types (e.g. PropertyFlags::PROPERTY_STRUCT, or PropertyFlags::PROPERTY_PARAM_LENGTH)

Enable doctests

Example code in doctests is not checked by the compiler.

We should enable them, e.g. by having cargo test --doc in the GitHub Actions config.

Note that this requires fixing them, because none of them currently passes (mainly because they are missing use ... statements)

Support enabling/disabling a provider on a running trace

In ferrisetw 0.1, UserTrace::enable should be used to add a Provider to a trace. But, IIRC, this has no effect on a trace that has already started (the provider is pushed to a list, but EnableTraceEx2 is not called on this new trace).

To fix this problem, the next_major_version branch reworks this enable function and forces all Providers to be defined before the trace is started.

But the Windows API allows enabling/disabling providers while the trace is running (using EnableTraceEx2), so this could be worth supporting in ferrisetw.

Schema::properties() isn't public

Schema::properties() is currently pub(crate), but is there any reason it shouldn't be just pub? It's very useful for being able to inspect the properties in a schema; I couldn't find any other way of getting this list.
