Git Product home page Git Product logo

vulkan-performance-layers's Issues

Introduce `LogOutput` abstraction.

After an EventLogger converts an Event to a string, it should write it to the output (Issue: #69 ). The output could be a file, standard output, or a string (for testing). LogOutput is the object that takes care of writing to the output and exposes the LogLine() method. Since the underlying file might be shared among multiple threads, LogLine() must be synchronized to make sure the logs do not get corrupted.
A sample implementation:

class Logoutput {
 public:
  virtual void LogLine(const std::string &line) = 0;
};

class FileOutput : public LogOutput {
 public:
  void LogLine(const std::string &line) override { ... }
 private:
  FILE *out_ = nullptr;
};

class StringOutput: public LogOutput {
 public:
  void LogLine(const std::string &line) override { ... }
  
  const std::string &GetLog() { return out_; } 
 private:
  std::string out_ = "";
};

Fix `return-type` and `comment` warnings.

Some of the current warnings are caught as errors when the code gets imported.
7787cba

We would like to catch all of those warnings in the CI. To do that, we enable all the warnings and treat them as errors by setting -Werror.
comment and return-type are the existing unhandled warnings (excluded from -Werror) that do not break the imports but we would like to fix them.

Introduce abstractions for loggable events

The existing event logging is rather ad-hoc. Examples:

  FrameTimeLayerData(char* log_filename, uint64_t exit_frame_num_or_invalid,
                     const char* benchmark_watch_filename,
                     const char* benchmark_start_string)
      : LayerData(log_filename, "Frame Time (ns),Benchmark State"),
        exit_frame_num_or_invalid_(exit_frame_num_or_invalid),
        benchmark_start_pattern_(StrOrEmpty(benchmark_start_string)) {
    LogEventOnly("frame_time_layer_init");
  layer_data->LogTimeDelta("frame_present",
                           layer_data->HasBenchmarkStarted() ? "1" : "0");
    int64_t logged_delta = ToInt64Nanoseconds(now - last_log_time_);
    LogLine(event_type, CsvCat(logged_delta, extra_content), now);
    layer_data->LogEventOnly("frame_time_layer_exit",
                             absl::StrCat("terminated,frame:", frames_elapsed));

This form of logging is unstructured and does not allow for emitting logs in different output formats. All the event data is converted to strings and concatenated before we can decide which logging format to use.

This task is to come up with a more structured abstraction for logged events that will allow for multiple output formats (e.g., csv and json).

The generic interface should provide the following functions:

  1. getEventName()
  2. getLogLevel() -- decides the 'verbosity level', e.g., whether to log to CSV or to the common log file
  3. getNumAttributes -- returns the number of data entries (or 'columns' / tuple elements / etc)
  4. getAttributes()

We want to support a few data entry types:

  1. String
  2. Int64 -- e.g., count of something
  3. Int64Array -- e.g., array of hashes
  4. Timepoint / timestamp -- when something happened
  5. Duration -- how long something took

All of this combines as follows:

  • Event is a names set of attributes
  • Attribute is a key-value pair where the attribute name is the key and value can be of one of the supported types.

A concrete layer-specific event definition could look something like this:

class ShaderModuleCreateEvent : public LoggableEvent {
public:
   // (Constructor and other functions...)
   
  std::vector<const Attribute*> getAttributes() const {
    return {&shader_module_hashes_, &creation_time_};
  }
  
private:
  Int64ArrayAttr shader_module_hashes_("shader_hashes");
  DurationAttr creation_time_("creation_time");
};

std::unique_ptr<LoggableEvent> CreateShaderModuleCreateEvent(const ShaderHashes &hashes,
                                                             Duration creation_time,
                                                             Timestamp when = Timestamp::Now()) {
  return std::make_unique<ShaderModuleCreateEvent>("create_shader_module", when,
                                                   hashes, creation_time);
} 

(This is just an example for inspiration. Better ideas welcome!)

Introduce event logging abstraction.

An EventLogger takes a stream of Event as input and produces a log based on the desired format (CSV, Chrome trace event, etc).

flowchart LR
    EventCreator --Event --> EventLogger --log--> OutputLogger

The EventLogger would be an abstract class with some public methods: AddEvent(), StartLog(), and FinalizeLog(). It also requires a flush mechanism. Either a public Flush() method triggered from outside, or an implicitly triggered Flush() invoked when the number of current events reaches a certain threshold.
Each format may require prefixes and suffixes around the final log. For instance, CSV requires adding a header to the first line of the file. StartLog() and FinalizeLog() should be implemented according to the specifications of each format.

To generate a log, these steps are required:

  1. Converting each Event into an entry:
    Each Event is an entry in the final log output. The entry format is defined by the EventLogger implementation. For instance: The entry for Event(name: "compile_time_ns", timestamp: 1234, hash:0xabc, duration:123) would look like this in the CSV format:
compile_time,1234,0xabc,123

and this in the Chrome event trace format:

{name:"compile_time_ns", timestamp: 1234, hash: 0xabc, duration: 123}
  1. Combining entries:
    This step, generates the log's body by combining the entries. In CSV, the entries are separated by a new line and in Chrome event trace, they're comma separated.
    The pseudo code for the EventLogger would be like this:
class EventLogger {
 public:
  void InsertEvent(Event* e) = 0;
  // Flush may receive an input (an instance of LogOutput), pass the string generated from concatenating the events to it, and 
  // delete the current events.
  void Flush(...) = 0;
  void StartLog() = 0;
  void FinishLog() = 0;
 protected:
  std::vector<Event*> events_;
};

class EventLoggerImpl: public EventLogger {
 public:
  void Flush(...) override {
    // Convert Events to entries.
    ...
    // Combine entries.
    ...
    // Delete current events
    ...
    // Use LogOutput to print the log
  }
}

Broken Script Tracking Issue

Broken Script

Changing the common log format causes the plot_timeline.py script to break because it's using the previous format to parse the event.log. Currently, we're in the middle of refactoring the layers to use the CommonLogger that writes the events in the new format. We want to finish refactoring all the layers before updating the script. This issue is created to track the broken script and fix it after the refactoring is over.

We can fix this issue when the following layers are refactored:

    • Frame time layer
    • Compile time layer
    • Runtime layer
    • Cache sideload layer
  • Update plot_timeline.py

Fix Unaligned slack events

image
This is the output of the vkcube in the Trace Event format rendered by Perfetto UI.

The shader slack events should start after the shader creation events (the small box in the bottom) and end before the pipline creation events. The start and end time is not measured properly in the layers.

Introduce the event dispatch abstraction

It is good to separate the function which creates an event from the function that triggers the logger(s) for that event. This is beneficial when we want to add support for a new format, change the current output format for a set of existing events, etc. Consider we have the CSV logger and add support for the Trace Event format. In this case we need to find all the call sites of the CSV logger and change the code so that it uses the newly introduced logger. This is too much change and it's error prone as we might miss some sections. One of the alternatives is to have a dispatcher that forwards an event from its creator to a logger(s) registered for that event. It basically maps the event names to a vector of loggers. This way, when we add a new format and want to use it for a specific type of event, we only append it to the logger vector of the event.
The dispatcher has two public interfaces, HandleEvent() and RegisterHandler(). The first one receives an event and invokes the loggers for that event. The second one adds a logger to the loggers of an event.

flowchart LR
    EventCreator--Event-->Dispatcher
    subgraph Dispatcher
      direction LR
      EventType -.-> EventLogger*      
    end
    Dispatcher -- Event --> EventLogger

A sample implementation would like this:

class Dispatcher {
 public:
  // Invokes the loggers registered for this event.
  void HanldeEvent(Event *event) {
  }
  // Adds the logger to the event's logger vector.
  void RegisterLogger(EventLogger* logger, std::string event_name) {
    ...
  }

 private:
  std::map<std::string, std::vector<EventLogger*> > name_to_logger_vector;

}

Handle `time` conversions from constants.

We use 2 different clocks, TimestampClock and DurationClock. They are aliases for std::chrono::system_clock and std::chrono::steady_clock. I created a wrapper around these clocks to make them easier to work with. E.g: #134,
I had 2 assumptions in mind when writing these classes.
First, the DurationClock and TimestampClock have the same precision. This caused a bug on a system in which this didn't hold. I fixed it in #145 by casting the duration from one clock to the other.

My second assumption was that the clocks always are in the nanoseconds precision. The second bug happens in this code:

class Timestamp {
 public:
  static Timestamp FromNanoseconds(int64_t nanos) {
    return Timestamp(
        TimestampClock::time_point(TimestampClock::duration(nanos)));
  }

  Timestamp(TimestampClock::time_point timestamp) : timestamp_{timestamp} {}

  int64_t ToNanoseconds() const {
    return std::chrono::nanoseconds(timestamp_.time_since_epoch()).count();
  }

 private:
  TimestampClock::time_point timestamp_;
};

The Timestamp class is a wrapper around a TimestampClock. It has a FromNanoseconds() factory method that receives an input and stores it in the underlying TimestampClock::time_point variable assuming the TimestampClock has a nanoseconds precision. That is why this line of test fails in a system that has a different clock.

https://github.com/google/vulkan-performance-layers/blob/main/layer/unittest/trace_event_log_tests.cc#L198

Add runtime layer FileCheck.

Now that the runtime layer doesn't hang in the CI when running vkcube, we can have a FileCheck for the output produced by the runtime layer.

Improve source code directory structure

Currently all source and header files are in the same directory. This is not glob-friendly because we keep multiple binaries (layer .sos) in the same directory.

Instead, we should separate out common source files and create a source directory for each binary target.

Event logging implementation for the Chrome Trace Event format.

Here is the base Trace Event format:

{
  "name": "myName",
  "cat": "category,list",
  "ph": "B",
  "ts": 12345,
  "pid": 123,
  "tid": 456,
  "args": {
    "someArg": 1,
    "anotherArg": {
      "value": "my value"
    }
  }

Not all the fields are required and we can skip cat, pid, tid, and args. But, we might need additional fields based on the phase (ph) we specify. Since our main goal is to show duration, the phase should be X which indicates an event with duration in Trace Event (a Complete event). This event requires an additional duration (dur) field. A simple Complete event would look like this:

{ "name": "myName", "ph": "X", "ts": 12345, "dur": 123}

Current Event class does not have a "phase" and putting the phase as an attribute might mess things up because we have to change the other loggers to ignore it. I suggest the following code:

enum Phase { Complete };

class TraceEvent: public Event { 
 public:
  Phase GetPhase() { return phase_;};
  virtual Duration GetDuration() { ... }
 private:
  Phase phase_;
};

std::string TraceEventToStr(TraceEvent *event) {
    std::stringstream event_str;
    int64_t timestamp_ms = ToMilliseconds(event->GetTimestamp());
    event_str << "{" << std::quoted("name") << ":" << std::quoted(event->GetName()) << std::quoted("ts") << ":" << timestamp_ms << "}";
    if (event->GetPhase() == Phase::Complete)
        event_str << std::quoted("ph") << ":" << std::quoted("X") << "," << std::quoted("dur") << ":" << ToMilliseconds(event->GetDuration());

    event_str << "},";    
    return event_str.str();    
}

To get the duration, the events that are created as a TraceEvent have already a duration attribute. There are a couple of ways to get that attribute:

  • Keep a pointer to the duration attribute in the TraceEvent as a private variable. This approach might require the least modification to the previous code.
  • Move the duration attribute to the Event class so that all the Events have duration. The default duration should be 0. This might not make sense especially for the events that do not have any duration (e.g., the events indicating the start of a layer). We should also modify the loggers as well because the durations are moved out of the attribute vector and we can't print them by just iterating over the attributes.
  • Add a GetAttributeByName() method to the event to iterate over the attributes and return the matched one. The good thing about it is that we don't need to duplicate things or change tons of code. But the caveat might be the performance since we have to do it for each event.

The TraceEventLogger takes a TraceEvent, converts it to a 1 line string using the mentioned function and logs it to the output. It also overrides the StartEvent() to print an open bracket to the file.

Then we need a DataLayer with a TraceEventLogger that runtime and compile time layer use.

Update the CI for the common log file.

The format of the events written to the common log file has changed. An example of the new format:
create_graphics_pipelines,timestamp:1667521492863138945,hashes:"[0x57dad0c231e4e5ce,0x92cc37a4b4360ca9]",duration:595730
The CommonLogger logs the events in the new format while the current CI checks the common log file with the previous format.
When we start refactoring the layers with CommonLogger, common log checks should be disabled and re-enabled when the checks are consistent with the new format.

Use dispatchable handles instead of tracking object relationships

This is a low priority refactoring task to simplify core layer code.
See the suggestion form @paulthomson: #18 (comment).

In other words, instead of using VkInstance as a key (and the same idea applies for VkDevice), use the "dispatch table pointer" of the VkInstance, which you get using this function (please ignore the snake_case naming). And when you have a VkPhysicalDevice, use pretty much the same function to get the VkInstance dispatch table pointer, which you can then use to look-up the data you have for the VkInstance (which should include the VkInstance value itself).

Another example of where this is somewhat nice is in a frame counter layer. We don't need to track VkQueue; we can go straight from the VkQueue to the DeviceData.

Refactor data logging

This is a high-level issue for better data logging. Future issues should split this into smaller sub-problems.

performance-layers output two types of logs: event/data logs (layer_data->Log()) and debug logs (LOG(WARNING) << ). We would like to refactor the former one such that layers do not have to worry about the exact output file and output format, yet be able to select from multiple ones.

The goal is to support future logging formats, e.g., chrome event traces.

Requirements:

  • The new classes should be internally-synchronized and not require any mutexes.
  • We should be able to unit-test the implementation without having to run the layers.
  • Do not introduce any new abseil code.

The final set of classes may look something like this:

graph TD;
  aa(Attribute);
  Int64Attr-->aa;
  StringAttr-->aa;
  DurationAttr-->aa;
  TimestampAttr-->aa;
  ...Atr -->aa;

  ae(Event);
  aa -.->|Contains one or more| ae;
graph LR;
  ae(Event);
  EventA-->ae;
  EventB-->ae;
  EventC-->ae;

  el(EventLogger);
  cl(LeafLogger);
  ae-.->|AddEvent| el;
  el-->BroadcastLogger;
  el-->FilterLogger;
  el-->cl;
graph TD;
  cl(LeafLogger);
  TestLogger-->cl;
  CsvLogger-->cl;
  CommonLogger-->cl;
  TraceEventLogger-->cl;
graph LR;
  el(EventLogger);
  ao(LogOutput);
  el-.->|LogLine| ao;
  ao --> StringOutput & FileOutput;

Subtasks:

  • #75
  • #81
    • Use new logging abstractions for CSV logs
    • Use new logging abstractions for common logs
    • Introduce log output abstraction and implement it for files and strings
  • Allow for string log outputs to be used in unit tests
  • Add event logging implementation for the Chrome Event Tracing format

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.