google / vulkan-performance-layers Goto Github PK

View Code? Open in Web Editor NEW

49.0 9.0 20.0 465 KB

Vulkan layers to measure and analyze graphics performance.

Home Page: https://opensource.google

License: Apache License 2.0

CMake 3.93% Dockerfile 1.25% C++ 85.00% Shell 1.86% Python 7.95%

cpp graphics performance-analysis performance-metrics vulkan

vulkan-performance-layers's Issues

Introduce `LogOutput` abstraction.

After an EventLogger converts an Event to a string, it should write it to the output (Issue: #69 ). The output could be a file, standard output, or a string (for testing). LogOutput is the object that takes care of writing to the output and exposes the LogLine() method. Since the underlying file might be shared among multiple threads, LogLine() must be synchronized to make sure the logs do not get corrupted.
A sample implementation:

class Logoutput {
 public:
  virtual void LogLine(const std::string &line) = 0;
};

class FileOutput : public LogOutput {
 public:
  void LogLine(const std::string &line) override { ... }
 private:
  FILE *out_ = nullptr;
};

class StringOutput: public LogOutput {
 public:
  void LogLine(const std::string &line) override { ... }
  
  const std::string &GetLog() { return out_; } 
 private:
  std::string out_ = "";
};

[CI] Add `FileCheck`-based tests for the Chrome Trace Event format

Our current CI checks the format of the CSV and CommonLog files after each run to make sure the results are as expected. We need to have the same check for the Trace Event format as well.

Fix `return-type` and `comment` warnings.

Some of the current warnings are caught as errors when the code gets imported.
7787cba

We would like to catch all of those warnings in the CI. To do that, we enable all the warnings and treat them as errors by setting -Werror.
comment and return-type are the existing unhandled warnings (excluded from -Werror) that do not break the imports but we would like to fix them.

vkcube hangs with runtime layer enabled on ubuntu 22.04

This is under docker & lavapipe.
The other layers don't hang vkcube.

Introduce abstractions for loggable events

The existing event logging is rather ad-hoc. Examples:

  FrameTimeLayerData(char* log_filename, uint64_t exit_frame_num_or_invalid,
                     const char* benchmark_watch_filename,
                     const char* benchmark_start_string)
      : LayerData(log_filename, "Frame Time (ns),Benchmark State"),
        exit_frame_num_or_invalid_(exit_frame_num_or_invalid),
        benchmark_start_pattern_(StrOrEmpty(benchmark_start_string)) {
    LogEventOnly("frame_time_layer_init");

  layer_data->LogTimeDelta("frame_present",
                           layer_data->HasBenchmarkStarted() ? "1" : "0");

    int64_t logged_delta = ToInt64Nanoseconds(now - last_log_time_);
    LogLine(event_type, CsvCat(logged_delta, extra_content), now);

    layer_data->LogEventOnly("frame_time_layer_exit",
                             absl::StrCat("terminated,frame:", frames_elapsed));

This form of logging is unstructured and does not allow for emitting logs in different output formats. All the event data is converted to strings and concatenated before we can decide which logging format to use.

This task is to come up with a more structured abstraction for logged events that will allow for multiple output formats (e.g., csv and json).

The generic interface should provide the following functions:

getEventName()
getLogLevel() -- decides the 'verbosity level', e.g., whether to log to CSV or to the common log file
getNumAttributes -- returns the number of data entries (or 'columns' / tuple elements / etc)
getAttributes()

We want to support a few data entry types:

String
Int64 -- e.g., count of something
Int64Array -- e.g., array of hashes
Timepoint / timestamp -- when something happened
Duration -- how long something took

All of this combines as follows:

Event is a names set of attributes
Attribute is a key-value pair where the attribute name is the key and value can be of one of the supported types.

A concrete layer-specific event definition could look something like this:

class ShaderModuleCreateEvent : public LoggableEvent {
public:
   // (Constructor and other functions...)
   
  std::vector<const Attribute*> getAttributes() const {
    return {&shader_module_hashes_, &creation_time_};
  }
  
private:
  Int64ArrayAttr shader_module_hashes_("shader_hashes");
  DurationAttr creation_time_("creation_time");
};

std::unique_ptr<LoggableEvent> CreateShaderModuleCreateEvent(const ShaderHashes &hashes,
                                                             Duration creation_time,
                                                             Timestamp when = Timestamp::Now()) {
  return std::make_unique<ShaderModuleCreateEvent>("create_shader_module", when,
                                                   hashes, creation_time);
}

(This is just an example for inspiration. Better ideas welcome!)

Introduce event logging abstraction.

An EventLogger takes a stream of Event as input and produces a log based on the desired format (CSV, Chrome trace event, etc).

flowchart LR
    EventCreator --Event --> EventLogger --log--> OutputLogger

The EventLogger would be an abstract class with some public methods: AddEvent(), StartLog(), and FinalizeLog(). It also requires a flush mechanism. Either a public Flush() method triggered from outside, or an implicitly triggered Flush() invoked when the number of current events reaches a certain threshold.
Each format may require prefixes and suffixes around the final log. For instance, CSV requires adding a header to the first line of the file. StartLog() and FinalizeLog() should be implemented according to the specifications of each format.

To generate a log, these steps are required:

Converting each Event into an entry:
Each Event is an entry in the final log output. The entry format is defined by the EventLogger implementation. For instance: The entry for Event(name: "compile_time_ns", timestamp: 1234, hash:0xabc, duration:123) would look like this in the CSV format:

compile_time,1234,0xabc,123

and this in the Chrome event trace format:

{name:"compile_time_ns", timestamp: 1234, hash: 0xabc, duration: 123}

Combining entries:
This step, generates the log's body by combining the entries. In CSV, the entries are separated by a new line and in Chrome event trace, they're comma separated.
The pseudo code for the EventLogger would be like this:

class EventLogger {
 public:
  void InsertEvent(Event* e) = 0;
  // Flush may receive an input (an instance of LogOutput), pass the string generated from concatenating the events to it, and 
  // delete the current events.
  void Flush(...) = 0;
  void StartLog() = 0;
  void FinishLog() = 0;
 protected:
  std::vector<Event*> events_;
};

class EventLoggerImpl: public EventLogger {
 public:
  void Flush(...) override {
    // Convert Events to entries.
    ...
    // Combine entries.
    ...
    // Delete current events
    ...
    // Use LogOutput to print the log
  }
}

Allow to use libc++ in CMake builds

The filesystem library is hardcoded as the libstdc++ one:

set(FILESYSTEM_LIB_NAME stdc++fs)

Broken Script Tracking Issue

Broken Script

Changing the common log format causes the plot_timeline.py script to break because it's using the previous format to parse the event.log. Currently, we're in the middle of refactoring the layers to use the CommonLogger that writes the events in the new format. We want to finish refactoring all the layers before updating the script. This issue is created to track the broken script and fix it after the refactoring is over.

We can fix this issue when the following layers are refactored:

Fix Unaligned slack events

This is the output of the vkcube in the Trace Event format rendered by Perfetto UI.

The shader slack events should start after the shader creation events (the small box in the bottom) and end before the pipline creation events. The start and end time is not measured properly in the layers.

Introduce the event dispatch abstraction

It is good to separate the function which creates an event from the function that triggers the logger(s) for that event. This is beneficial when we want to add support for a new format, change the current output format for a set of existing events, etc. Consider we have the CSV logger and add support for the Trace Event format. In this case we need to find all the call sites of the CSV logger and change the code so that it uses the newly introduced logger. This is too much change and it's error prone as we might miss some sections. One of the alternatives is to have a dispatcher that forwards an event from its creator to a logger(s) registered for that event. It basically maps the event names to a vector of loggers. This way, when we add a new format and want to use it for a specific type of event, we only append it to the logger vector of the event.
The dispatcher has two public interfaces, HandleEvent() and RegisterHandler(). The first one receives an event and invokes the loggers for that event. The second one adds a logger to the loggers of an event.

flowchart LR
    EventCreator--Event-->Dispatcher
    subgraph Dispatcher
      direction LR
      EventType -.-> EventLogger*      
    end
    Dispatcher -- Event --> EventLogger

A sample implementation would like this:

class Dispatcher {
 public:
  // Invokes the loggers registered for this event.
  void HanldeEvent(Event *event) {
  }
  // Adds the logger to the event's logger vector.
  void RegisterLogger(EventLogger* logger, std::string event_name) {
    ...
  }

 private:
  std::map<std::string, std::vector<EventLogger*> > name_to_logger_vector;

}

Handle `time` conversions from constants.

We use 2 different clocks, TimestampClock and DurationClock. They are aliases for std::chrono::system_clock and std::chrono::steady_clock. I created a wrapper around these clocks to make them easier to work with. E.g: #134,
I had 2 assumptions in mind when writing these classes.
First, the DurationClock and TimestampClock have the same precision. This caused a bug on a system in which this didn't hold. I fixed it in #145 by casting the duration from one clock to the other.

My second assumption was that the clocks always are in the nanoseconds precision. The second bug happens in this code:

class Timestamp {
 public:
  static Timestamp FromNanoseconds(int64_t nanos) {
    return Timestamp(
        TimestampClock::time_point(TimestampClock::duration(nanos)));
  }

  Timestamp(TimestampClock::time_point timestamp) : timestamp_{timestamp} {}

  int64_t ToNanoseconds() const {
    return std::chrono::nanoseconds(timestamp_.time_since_epoch()).count();
  }

 private:
  TimestampClock::time_point timestamp_;
};

The Timestamp class is a wrapper around a TimestampClock. It has a FromNanoseconds() factory method that receives an input and stores it in the underlying TimestampClock::time_point variable assuming the TimestampClock has a nanoseconds precision. That is why this line of test fails in a system that has a different clock.

https://github.com/google/vulkan-performance-layers/blob/main/layer/unittest/trace_event_log_tests.cc#L198

Add runtime layer FileCheck.

Now that the runtime layer doesn't hang in the CI when running vkcube, we can have a FileCheck for the output produced by the runtime layer.

Improve source code directory structure

Currently all source and header files are in the same directory. This is not glob-friendly because we keep multiple binaries (layer .sos) in the same directory.

Instead, we should separate out common source files and create a source directory for each binary target.

Remove all Abseil dependencies

We cannot use Abseil in dynamically loaded and dynamically unloaded libraries:
https://abseil.io/about/compatibility#what-users-must-and-must-not-do

Dynamic unloading of performance layers causes crashes because of Abseil code.

Handle vkDestroyShaderModule

LayerData should handle vkDestroyShaderModule so that we don't accidentally use pointers to objects whose lifetime ended.
https://www.khronos.org/registry/vulkan/specs/1.2-extensions/man/html/vkDestroyShaderModule.html

Event logging implementation for the Chrome Trace Event format.

Here is the base Trace Event format:

{
  "name": "myName",
  "cat": "category,list",
  "ph": "B",
  "ts": 12345,
  "pid": 123,
  "tid": 456,
  "args": {
    "someArg": 1,
    "anotherArg": {
      "value": "my value"
    }
  }

Not all the fields are required and we can skip cat, pid, tid, and args. But, we might need additional fields based on the phase (ph) we specify. Since our main goal is to show duration, the phase should be X which indicates an event with duration in Trace Event (a Complete event). This event requires an additional duration (dur) field. A simple Complete event would look like this:

{ "name": "myName", "ph": "X", "ts": 12345, "dur": 123}

Current Event class does not have a "phase" and putting the phase as an attribute might mess things up because we have to change the other loggers to ignore it. I suggest the following code:

enum Phase { Complete };

class TraceEvent: public Event { 
 public:
  Phase GetPhase() { return phase_;};
  virtual Duration GetDuration() { ... }
 private:
  Phase phase_;
};

std::string TraceEventToStr(TraceEvent *event) {
    std::stringstream event_str;
    int64_t timestamp_ms = ToMilliseconds(event->GetTimestamp());
    event_str << "{" << std::quoted("name") << ":" << std::quoted(event->GetName()) << std::quoted("ts") << ":" << timestamp_ms << "}";
    if (event->GetPhase() == Phase::Complete)
        event_str << std::quoted("ph") << ":" << std::quoted("X") << "," << std::quoted("dur") << ":" << ToMilliseconds(event->GetDuration());

    event_str << "},";    
    return event_str.str();    
}

To get the duration, the events that are created as a TraceEvent have already a duration attribute. There are a couple of ways to get that attribute:

Keep a pointer to the duration attribute in the TraceEvent as a private variable. This approach might require the least modification to the previous code.
Move the duration attribute to the Event class so that all the Events have duration. The default duration should be 0. This might not make sense especially for the events that do not have any duration (e.g., the events indicating the start of a layer). We should also modify the loggers as well because the durations are moved out of the attribute vector and we can't print them by just iterating over the attributes.
Add a GetAttributeByName() method to the event to iterate over the attributes and return the matched one. The good thing about it is that we don't need to duplicate things or change tons of code. But the caveat might be the performance since we have to do it for each event.

The TraceEventLogger takes a TraceEvent, converts it to a 1 line string using the mentioned function and logs it to the output. It also overrides the StartEvent() to print an open bracket to the file.

Then we need a DataLayer with a TraceEventLogger that runtime and compile time layer use.

Update the CI for the common log file.

The format of the events written to the common log file has changed. An example of the new format:
create_graphics_pipelines,timestamp:1667521492863138945,hashes:"[0x57dad0c231e4e5ce,0x92cc37a4b4360ca9]",duration:595730
The CommonLogger logs the events in the new format while the current CI checks the common log file with the previous format.
When we start refactoring the layers with CommonLogger, common log checks should be disabled and re-enabled when the checks are consistent with the new format.

Use dispatchable handles instead of tracking object relationships

This is a low priority refactoring task to simplify core layer code.
See the suggestion form @paulthomson: #18 (comment).

In other words, instead of using VkInstance as a key (and the same idea applies for VkDevice), use the "dispatch table pointer" of the VkInstance, which you get using this function (please ignore the snake_case naming). And when you have a VkPhysicalDevice, use pretty much the same function to get the VkInstance dispatch table pointer, which you can then use to look-up the data you have for the VkInstance (which should include the VkInstance value itself).

Another example of where this is somewhat nice is in a frame counter layer. We don't need to track VkQueue; we can go straight from the VkQueue to the DeviceData.

Refactor data logging

This is a high-level issue for better data logging. Future issues should split this into smaller sub-problems.

performance-layers output two types of logs: event/data logs (layer_data->Log()) and debug logs (LOG(WARNING) << ). We would like to refactor the former one such that layers do not have to worry about the exact output file and output format, yet be able to select from multiple ones.

The goal is to support future logging formats, e.g., chrome event traces.

Requirements:

The new classes should be internally-synchronized and not require any mutexes.
We should be able to unit-test the implementation without having to run the layers.
Do not introduce any new abseil code.

The final set of classes may look something like this:

graph TD;
  aa(Attribute);
  Int64Attr-->aa;
  StringAttr-->aa;
  DurationAttr-->aa;
  TimestampAttr-->aa;
  ...Atr -->aa;

  ae(Event);
  aa -.->|Contains one or more| ae;

graph LR;
  ae(Event);
  EventA-->ae;
  EventB-->ae;
  EventC-->ae;

  el(EventLogger);
  cl(LeafLogger);
  ae-.->|AddEvent| el;
  el-->BroadcastLogger;
  el-->FilterLogger;
  el-->cl;

graph TD;
  cl(LeafLogger);
  TestLogger-->cl;
  CsvLogger-->cl;
  CommonLogger-->cl;
  TraceEventLogger-->cl;

graph LR;
  el(EventLogger);
  ao(LogOutput);
  el-.->|LogLine| ao;
  ao --> StringOutput & FileOutput;

Subtasks:

#75
#81
- Use new logging abstractions for CSV logs
- Use new logging abstractions for common logs
- Introduce log output abstraction and implement it for files and strings
Allow for string log outputs to be used in unit tests
Add event logging implementation for the Chrome Event Tracing format

google / vulkan-performance-layers Goto Github PK

vulkan-performance-layers's Issues

Broken Script

Recommend Projects

Recommend Topics

Recommend Org