
Compute Pressure

Authors:

  • Kenneth Rohde Christiansen (Intel)
  • Anssi Kostiainen (Intel)
  • Victor Costan (Google)
  • Olivier Yiptong (formerly Google)

Participate

Table of Contents

Introduction

🆕✨ We propose a new API that conveys the utilization of system resources, initially focusing on CPU resources (v1) with the plan to add other resources such as GPU resources in the future (post v1).

In a perfect world, a computing device is able to perform all the tasks assigned to it with guaranteed and consistent quality of service. In practice, the system is constantly balancing the needs of multiple tasks that compete for shared system resources. The underlying system software tries to minimize both the overall wait time and response time (time to interactive) and maximize throughput and fairness across multiple tasks running concurrently.

This scheduling action is handled by an operating system module called scheduler whose work may also be assisted by hardware in modern systems. Notably, all this is transparent to web applications, and as a consequence, the user is only made aware the system is too busy when there's already a perceived degradation in quality of service. For example, a video conferencing application starts dropping video frames, or worse, the audio cuts out.

As this is undesirable for the end-user, software developers would like to avoid such cases and balance the set of enabled features and their quality level against the resource pressure of the end-user device.

Goals / Motivating Use Cases

The primary use cases enhanced by v1 focus on improving the user experience of web apps, in particular, but not restricted to, real-time apps such as video conferencing and video games.

These popular real-time applications are classified as soft: the quality of service degrades if the system is exercised beyond certain states, but this does not lead to a total system failure. Such soft real-time applications greatly benefit from being able to adapt their workloads based on CPU consumption/pressure.

If the use case is to adapt the user experience to the system at hand, measuring the time it takes to achieve certain tasks is an option, but web apps can also suffer from unusually high CPU pressure beyond the app's control.

As an example, external pressure can result in a degraded interactivity experience by making certain tasks take longer than usual, e.g., increasing the time it takes for complex components to render and thus increasing the response time to interactions. This can be mitigated by rendering simpler content or skeleton content while CPU pressure is high.
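The skeleton-content adaptation can be sketched as a small, pure decision function. The state names follow the buckets proposed later in this document; `contentModeFor` is a hypothetical helper, not part of any API:

```javascript
// Decide what to render based on the latest pressure state.
// "serious" and "critical" fall back to lightweight skeleton content;
// in a real app, the state value would come from a pressure callback.
function contentModeFor(pressureState) {
  switch (pressureState) {
    case "critical":
    case "serious":
      return "skeleton"; // placeholder boxes, no expensive rendering
    default:
      return "full";     // regular, fully rendered content
  }
}
```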

Specifically, v1 aims to facilitate the following adaptation decisions for these use cases:

  • Video conferencing
    • Adjust the number of video feeds shown simultaneously during calls with many participants
    • Reduce the quality of video processing (video resolution, frames per second)
    • Skip non-essential video processing, such as some camera filters
    • Disable non-essential audio processing, such as WebRTC noise suppression
    • Turn quality-vs-speed and size-vs-speed knobs towards “speed” in video and audio encoding (in WebRTC, WebCodecs, or software encoding)
  • Video games
    • Use lower-quality assets to compose the game’s video (3D models, textures, shaders) and audio (voices, sound effects)
    • Disable effects that result in less realistic non-essential details (water / cloth / fire animations, skin luminance, glare effects, physical simulations that don’t impact gameplay)
    • Tweak quality-vs-speed knobs in the game’s rendering engine (shadows quality, texture filtering, view distance)
  • User interfaces
    • Render simple or skeleton content instead of real data while system is under pressure

Technically, these can be accomplished by knowing thermal states (e.g., whether the system is being passively cooled, i.e., throttled) as well as CPU pressure states for the threads the site is using, such as the main thread and workers. The system thermal state is global and can be affected by apps and sites other than the observing site.

Future Goals

Post v1 we plan to explore support for other resource types, such as GPU resources.

Additionally, we would like to investigate whether we can enable measurement of hardware resource consumption of different code paths in front end code.

We aim to support the following decision processes:

  • Compare the CPU consumption of alternative implementations of the same feature, for the purpose of determining the most efficient implementation. We aim to support measuring CPU utilization in the field via A/B tests, because an implementation’s CPU utilization depends on the hardware it’s running on, and most developers cannot afford performance measurement labs covering all the devices owned by their users.
  • Estimate the impact of enabling a feature on CPU consumption. This cost estimate feeds into the decisions outlined in the primary use cases.

Non-goals

This proposal exposes a high-level abstraction that considers both CPU utilization and thermal throttling. This limitation leaves out some resource consumption decisions that web applications could make to avoid the bad user experiences mentioned in the introduction.

The following information, which video conferencing applications and games would need to make finer-grained versions of the decisions enumerated above, will not be exposed by this proposal:

  • GPU utilization
  • CPU capabilities, such as number of cores, core speed, cache size
  • CPU vendor and model

Current approach - high-level states

The API defines a set of pressure states delivered to a web application to signal when adaptation of the workload is appropriate to ensure consistent quality of service. The signal is proactively delivered when the system pressure trend is rising to allow timely adaptation. And conversely, when the pressure eases, a signal is provided to allow the web application to adapt accordingly.

Human-readable pressure states with semantics attached to them improve ergonomics for web developers and provide future-proofing against diversity of hardware. Furthermore, the high-level states abstract away complexities of system bottlenecks that cannot be adequately explained with low-level metrics such as processor clock speed and utilization.

For instance, a processor might have additional cores that work can be distributed to in certain cases, and it might be able to adjust its clock speed. The faster a processor runs, the more power it consumes, which affects battery life and the temperature of the processor. A processor that runs too hot may become unstable, crash, or even sustain permanent damage.

For this reason, processors adjust clock speed all the time based on factors such as the amount of work, whether the device is on battery power (DC vs. AC), and whether the cooling system can keep the processor cool. Work often comes in bursts: for example, when the user performs an operation that requires the system to be both fast and responsive, modern processors use boost modes to temporarily run at an extremely high clock rate in order to get the work out of the way and return to normal operation sooner. When this happens in short bursts it does not heat up the processor too much. Real life is more complex still, because boost frequencies depend on, among other factors, how many cores are utilized.

The high-level states proposal hides all this complexity from the web developer.

Throttling

A processor might be throttled, run slower than usual, resulting in a poorer user experience. This can happen for a number of reasons, for example:

  • The temperature of the processor is higher than what can be sustained for longer periods of time
  • Other bottlenecks exist in the system, e.g., work is blocked on memory access
  • System is battery-powered (DC), or its battery level is low
  • The user has explicitly set or the system is preconfigured with a preference for longer battery life over high performance, or better acoustic performance

A user's preferences affecting throttling may be configured via operating-system-provided affordances, while some may be preconfigured policies set by the hardware vendor. These factors are often adjusted dynamically, taking the user's preferences into consideration.

Measuring pressure is complicated

Using utilization as a measure of pressure is suboptimal. Here is what you may think 90% CPU utilization means:

 _____________________________________________________________________
|                                                         |           |
|                          Busy                           |  Waiting  |
|                                                         |  (idle)   |
|_________________________________________________________|___________|

What it might really mean is:

 _____________________________________________________________________
|          |                                              |           |
|   Busy   |                   Waiting                    |  Waiting  |
|          |                  (Stalled)                   |  (idle)   |
|__________|______________________________________________|___________|

Stalled means that the processor is not making forward progress with instructions, and this usually happens because it is waiting on memory I/O. Chances are, you're mostly stalled. This is even more complicated when the processor has multiple cores and the cores you are using are busy but your work cannot simply be distributed to other cores.

The overall system processor utilization may be low for nonobvious reasons. An active core can be running slower waiting on memory I/O, or it may be busy but is throttled due to thermals.

Furthermore, some modern systems have different kinds of cores, such as performance cores and efficiency cores, or even multiple levels of each. Imagine a system where only an efficiency core runs when the workload is nominal (background checks for notifications, etc.) and performance cores take over to prioritize the user experience when an application is in active use. Such a system will never reach 100% overall utilization, because the efficiency core never runs while the other cores are in use.

Clock frequency is likewise a misleading measurement as the frequency is impacted by factors such as which core is active, whether the system is on battery power or plugged in, boost mode being active or not, or other factors.

How to properly calculate pressure

Properly calculating pressure is architecture dependent and as such an implementation must consider multiple input signals that may vary by architecture, form factor, or other system characteristics. Possible signals could be, for example:

  • AC or DC power state
  • Thermals
  • Some weighted values of “utilization” including information about memory I/O

A better metric than utilization could be CPI (cycles per instruction, retired): the average number of clock cycles it takes to execute an instruction. If the processor is waiting on memory I/O, CPI rises sharply. If CPI is around or below 1, the system is usually doing well. CPI is also architecture dependent, as some complex instructions take multiple cycles by design; a competent implementation will take this into consideration.
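As a rough illustration of the metric itself (not of any web-facing API), CPI is just cycles divided by retired instructions. The counter values below are made up; real implementations would read them from hardware performance-monitoring interfaces:

```javascript
// Derive CPI (cycles per retired instruction) from a counter sample.
// Sample values here are invented for illustration.
function cyclesPerInstruction(sample) {
  return sample.cycles / sample.instructionsRetired;
}

// CPI near or below 1 usually means the pipeline is well fed.
const healthy = { cycles: 4_000_000, instructionsRetired: 5_000_000 };
// A memory-bound workload retires far fewer instructions per cycle.
const stalled = { cycles: 9_000_000, instructionsRetired: 1_500_000 };

console.log(cyclesPerInstruction(healthy)); // 0.8
console.log(cyclesPerInstruction(stalled)); // 6
```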

Design considerations

In order to enable web applications to react to changes in pressure with minimal degradation in quality of service or user experience, it is important that they are notified while they can still adjust their workloads (temporal relevance), not once the system is already being throttled. It is equally important not to notify too often, for both privacy (data minimization) and developer ergonomics (conceptual weight minimization) reasons.

In order to expose the minimum data necessary at the highest level of abstraction that satisfies the use cases, we suggest the following buckets:

⚪ Nominal: Work is minimal and the system is running at a lower clock speed to preserve power.

🟢 Fair: The system is doing fine, everything is smooth and it can take on additional work without issues.

🟡 Serious: There is serious pressure on the system. It is sustainable and the system is coping, but it is getting close to its limits:

  • Clock speed (depending on AC or DC power) is consistently high
  • Thermals are high but system can handle it

At this point, if you add more work the system may move into critical.

🔴 Critical: The system is now about to reach its limits, but it hasn’t reached the limit yet. Critical doesn’t mean that the system is being actively throttled, but this state is not sustainable for the long run and might result in throttling if the workload remains the same. This signal is the last call for the web application to lighten its workload.
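To make the buckets concrete, here is a sketch mapping the four states to workload settings for a video conferencing app. The setting names (`maxVideoFeeds`, etc.) are illustrative only, not part of the API:

```javascript
// Map each pressure state to an illustrative set of workload knobs.
function settingsFor(pressureState) {
  switch (pressureState) {
    case "nominal":
    case "fair":
      // Headroom available: run everything at full quality.
      return { maxVideoFeeds: 8, noiseSuppression: true, cameraFilters: true };
    case "serious":
      // Sustainable, but stop adding work: drop non-essential filters.
      return { maxVideoFeeds: 8, noiseSuppression: true, cameraFilters: false };
    case "critical":
      // Last call to lighten the workload before throttling kicks in.
      return { maxVideoFeeds: 4, noiseSuppression: false, cameraFilters: false };
    default:
      throw new Error(`unknown pressure state: ${pressureState}`);
  }
}
```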

API flow illustrated

As an example, a video conferencing app might have the following dialogue with the API:

Developer: How is pressure?

System: 🟢 It's fair

Developer: OK, I'll use a better, more compute intensive audio codec

System: 🟢 Pressure is still fair

Developer: Show video stream for 8 instead of 4 people

System: 🟡 OK, pressure is now serious

Developer: Great, we are doing good and the user experience is optimal!

System: 🔴 The user turned on background blur; pressure is now critical. If you stay in this state for an extended time, the system might start throttling

Developer: OK, let’s only show video stream for 4 people (instead of 8) and tell the users to turn off background blur for a better experience

System: 🟡 User still wants to keep background blur on, but pressure is now back to serious, so we are doing good

Other considerations

There are a lot of advantages to using the above states. For one, they are easier for web developers to understand. What web developers care about is delivering the best user experience to their users given the available resources, which vary from system to system. This may mean taking the system to its limits as long as that provides a better experience, while avoiding taxing the system so much that it starts throttling work.

Another advantage is that this high-level abstraction allows for considering multiple signals and adapts to constant innovation in software and hardware below the API layer. For instance, a CPU can consider memory pressure, thermal conditions and map them to these states. As the industry strives to make the fastest silicon that offers the best user experience, it is important that the API abstraction that developers will depend on is future-proof and stands the test of time.

If we exposed low-level raw values such as clock speed, a developer might hardcode application logic such as treating everything above 90% of the base clock as critical, which could be the case on some systems today but wouldn't generalize well. For example, on a desktop form factor or a properly cooled laptop with an advanced CPU, frequency boosting can go way beyond the base clock without negatively impacting user experience, while a passively cooled mobile device would likely behave differently.

Observer API

We propose a design similar to Intersection Observer to let applications be notified when the system's pressure changes.

function callback(entries) {
  const lastEntry = entries[entries.length - 1];
  console.log(`Current pressure ${lastEntry.state}`);
}

const observer = new PressureObserver(callback);
await observer.observe("cpu", {sampleInterval: 1_000 }); // 1000ms
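Since the API is not available in every environment, real code should feature-detect it before use. A minimal sketch, assuming the `PressureObserver` interface shown above:

```javascript
// Start observing CPU pressure if the API exists; returns the observer,
// or null where PressureObserver is unavailable (e.g. Node.js or
// unsupporting browsers).
async function startObserving() {
  if (!("PressureObserver" in globalThis)) return null;

  const observer = new PressureObserver((entries) => {
    // Only the most recent entry matters for adaptation decisions.
    const last = entries[entries.length - 1];
    console.log(`CPU pressure is ${last.state}`);
  });
  await observer.observe("cpu", { sampleInterval: 2_000 }); // 2000 ms
  // Callers can later stop with observer.unobserve("cpu") or
  // observer.disconnect().
  return observer;
}
```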

Key scenarios

Adjusting the number of video feeds based on CPU usage

In this more advanced example, we lower the number of concurrent video streams when pressure becomes critical. As lowering the number of streams might not exit the critical state, or at least not immediately, we use a strategy of lowering one stream at a time every 30 seconds while still in the critical state.

The example accomplishes this by creating an async iterable that stops iterating as soon as the pressure exits the critical state, yielding every 30 seconds until then.

// Utility: A Promise that is also an Iterable that will iterate
// at a given interval until the promise resolves.

class IteratablePromise extends Promise {
  #interval;
  #fallback;

  constructor(fn, interval, fallbackValue) {
    super(fn);
    this.#interval = interval;
    this.#fallback = fallbackValue;
  }

  async* [Symbol.asyncIterator]() {
    let proceed = true;
    this.then(() => proceed = false);

    yield this.#fallback;

    while (proceed) {
      let value = await Promise.any([
        this,
        new Promise(resolve => setTimeout(resolve, this.#interval))
      ]);

      yield value || this.#fallback;
    }
  }
};
// Allow to resolve a promise externally by calling resolveFn
let resolveFn = null;
function executor(resolve) {
  resolveFn = value => resolve(value)
}

async function lowerStreamCountWhileCritical() {
  let streamsCount = getStreamsCount();
  let iter = new IteratablePromise(executor, 30_000, "critical");

  for await (const state of iter) {
    if (state !== "critical" || streamsCount == 1) {
      break;
    }
    setStreamsCount(--streamsCount);
  }
}

function pressureChange(entries) {
  for (const entry of entries) {
    if (resolveFn) {
      resolveFn(entry.state);
      resolveFn = null;
      continue;
    }

    if (entry.state == "critical") {
      lowerStreamCountWhileCritical();
    }
  }
}

const observer = new PressureObserver(pressureChange);
await observer.observe("cpu", { sampleInterval: 1_000 });

Detailed design discussion

Prevent instead of mitigate bad user experiences

A key goal for our proposal is to prevent, rather than mitigate, bad user experience. Mobile devices such as laptops, smartphones and tablets, when pushed into high CPU or GPU utilization may cause the device to become uncomfortably hot, cause the device’s fans to get disturbingly loud, or drain the battery at an unacceptable rate.

The key goal above disqualifies solutions such as requestAnimationFrame(), which lead towards a feedback system where bad user experience is mitigated, but not completely avoided. Feedback systems have been successful on desktop computers, where the user is insulated from the device's temperature changes, the fan noise variation is not as significant, and DC power means stable power supply.

Third-party contexts

This API will only be available in frames served from the same origin as the top-level frame. This requirement is necessary for preserving the privacy benefits of the API's quantizing scheme.

The same-origin requirement above implies that the API is only available in first-party contexts.

Considered alternatives

Expose a thermal throttling indicator

On some operating systems and devices, applications can detect when thermal throttling occurs. Thermal throttling is a strong indicator of a bad user experience (high temperature, CPU cooling fans maxed out).

This option was discarded because of concerns that the need to mitigate some recent attacks may lead to significant changes in the APIs that this proposal was envisioning using.

Theoretically, Chrome can detect thermal throttling on Android, Chrome OS, and macOS. However, developer experience suggests that the macOS API is not reliable.

Stakeholder Feedback / Opposition

  • Chrome: Positive
  • Gecko: Negative
  • WebKit: TODO
  • Web developers: Positive

References & acknowledgments

Many thanks for valuable feedback and advice from:

  • Asaf Yaffe
  • Chen Xing
  • Evan Shrubsole
  • Jan Gora
  • Jesse Barnes
  • Joshua Bell
  • Kamila Hasanbega
  • Matt Menke
  • Moh Haghighat
  • Nicolás Peña Moreno
  • Opal Voravootivat
  • Paul Jensen
  • Peter Djeu
  • Raphael Kubo Da Costa
  • Reilly Grant
  • Ulan Degenbaev
  • Victor Miura
  • Wei Wang
  • Zhenyao Mo

Exposing CPU utilization information has been explored in the following places.

This explainer is based on the W3C TAG's template.


compute-pressure's Issues

Use-case: Current process contribution

It would be very helpful to know the current browser process (tab) contribution to the total CPU load.
For example, I have a powerful machine, I am in a video conference, and I am also running a Chromium build. The CPU becomes 100% busy, but it does not make sense to reduce the number of video feeds in the conference, as this will not save any visible amount of CPU: most of it is taken by the build process.
In the video conference code I would analyze the current process contribution, and if it is less than, let's say, 50%, I would avoid any CPU-saving activities.

API design: Handling critical pressure

When you handle something like critical pressure, you will need to do less to attempt to exit the state, but you don't always know that doing a bit less is enough to get out of the critical state. Additionally due to thermals etc., doing less might not be reflected immediately.

For this reason, you probably want to keep checking every few seconds (or say half a minute - depending on state) if your change was sufficient or whether you need to do more. An example could be cutting a concurrent video stream from a conference call at each interval while still in the undesirable state.

We have heard before that some prefer to only get events when the state actually changes, but in my own prototype I preferred getting notified at a certain interval, and ASAP when the state changed.

You can build such infrastructure yourself on top of the existing specification, but it is a bit complex.

Here is one way to do it using an async pattern; without special helpers, this becomes quite hard to do yourself in JS.

Helper:

// Utility: A Promise that is also an Iterable that will iterate
// at a given interval until the promise resolves.

class IteratablePromise extends Promise {
  #interval;
  #fallback;

  constructor(fn, interval, fallbackValue) {
    super(fn);
    this.#interval = interval;
    this.#fallback = fallbackValue;
  }

  async* [Symbol.asyncIterator]() {
    let proceed = true;
    this.then(() => proceed = false);

    yield this.#fallback;

    while (proceed) {
      let value = await Promise.any([
        this,
        new Promise(resolve => setTimeout(resolve, this.#interval))
      ]);
          
      yield value || this.#fallback;
    }
  }
};
// Allow to resolve a promise externally by calling resolveFn
let resolveFn = null;
function executor(resolve) {
  resolveFn = value => resolve(value)
}

async function lowerStreamCountWhileCritical() {
  let streamsCount = getStreamsCount();
  let iter = new IteratablePromise(executor, 30_000, "critical");

  for await (const state of iter) {
    if (state !== "critical" || streamsCount == 1) {
      break;
    }
    setStreamsCount(--streamsCount);
  }
}

function pressureChange(entries) {
  for (const entry of entries) {
    if (resolveFn) {
      resolveFn(entry.state);
      resolveFn = null;
      continue;
    }

    if (entry.state == "critical") {
      lowerStreamCountWhileCritical();
    }
  }
}

const observer = new ComputePressureObserver(pressureChange);
observer.observe();

A much simpler option would be to have the callback being called at an interval that is user configurable, and having it being called even earlier if the state actually changes (and the system polls at a higher frequency)

This is much simpler:

let timerId = -1;
function pressureChange(entries) {
  clearTimeout(timerId);

  const lastEntryArray = [entries.at(-1)];
  timerId = setTimeout(pressureChange.bind(this, lastEntryArray), 30_000);

  for (const entry of entries) {
    if (entry.state == "critical") {
      let streamsCount = getStreamsCount();
      setStreamsCount(--streamsCount);
    }
  }
}

const observer = new ComputePressureObserver(pressureChange);
observer.observe();

I suggest we allow configuring this behavior via the options dict passed to observe() or the like.

Simply something like the following

// Default: Always call as soon as a state change is observed
observer.observe({ immediate: true })

 // Generally call the callback at a declared interval, but immediately if state changes
observer.observe({ interval: 30_000, immediate: true })

// Always call given an interval of 2 sec
observer.observe({ interval: 2000, immediate: false }) 

Support querying method

Feature request from Zoom:

As you can add a callback that stores the current value, you can easily implement a querying method yourself, so adding this doesn't affect privacy or security further. Additionally, existing observers expose this functionality as the takeRecords() method.

https://w3c.github.io/performance-timeline/#dom-performanceobserver-takerecords
https://dom.spec.whatwg.org/#dom-mutationobserver-takerecords
https://www.w3.org/TR/intersection-observer/#dom-intersectionobserver-takerecords
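Assuming `takeRecords()` behaves as in the observers linked above (synchronously draining pending records), a polling-style query could be sketched like this; `latestState` is a hypothetical helper, not part of any spec:

```javascript
// Return the state of the most recent record, or a fallback if none
// are pending. Pure helper, usable with any array of records.
function latestState(records, fallback = "nominal") {
  return records.length > 0 ? records[records.length - 1].state : fallback;
}

// Guarded so this is a no-op where PressureObserver is unavailable.
if ("PressureObserver" in globalThis) {
  const observer = new PressureObserver(() => {});
  observer.observe("cpu", { sampleInterval: 1_000 });
  setInterval(() => {
    // Query on the app's own schedule instead of waiting for callbacks.
    console.log(`pressure is ${latestState(observer.takeRecords())}`);
  }, 5_000);
}
```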

ComputePressureObserver ownership

Similar IntersectionObserver issue: w3c/IntersectionObserver#64

Our observers are conceptually attached to the document (not a DOM node or other JS object), so there's no natural place to "hang" ownership off of.

Here are the options I see.

  1. The observer conceptually hangs off of the Document / global object. Downsides:
    a. If the ComputePressureObserver JS object is garbage-collected, there's no way to stop the observer. It will keep the listeners alive (normal for an EventHandler, afaik) and invoke them until the user navigates away from the page.

  2. The observer is owned by the ComputePressureObserver JS object. Downsides:
    a. Garbage collection can be observed by JS (events stop getting dispatched).
    b. Less consistent behavior, probably more difficult for developers to debug.

Quantization clarification - how are threshold values handled?

How should implementations handle values that exactly equal the quantization thresholds?

For example, the CPU utilization thresholds [0.2, 0.5, 0.8] will create the intervals [0, 0.2], [0.2, 0.5], [0.5, 0.8] and [0.8, 1]. If the measured CPU utilization is exactly 0.5, would that be quantized to 0.35 (for the interval [0.2, 0.5]) or 0.65 (for the interval [0.5, 0.8])?

(Yes, the current explainer and samples are intentionally vague, because I didn't have a good answer.)
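One plausible resolution, shown purely as an illustration and not as specified behavior, is to treat the intervals as half-open, `[lo, hi)`, so that a measurement exactly equal to a threshold falls into the interval that starts at that threshold:

```javascript
// Quantize a utilization value in [0, 1] to the midpoint of the
// half-open interval [lo, hi) it falls into. The last interval is
// closed at 1 so that a value of exactly 1 is representable.
function quantize(value, thresholds) {
  const edges = [0, ...thresholds, 1];
  for (let i = 0; i < edges.length - 1; i++) {
    const isLast = i === edges.length - 2;
    if (value >= edges[i] && (value < edges[i + 1] || (isLast && value <= 1))) {
      return (edges[i] + edges[i + 1]) / 2;
    }
  }
}

// With thresholds [0.2, 0.5, 0.8], a measurement of exactly 0.5 lands
// in [0.5, 0.8) and quantizes to 0.65 under this convention.
console.log(quantize(0.5, [0.2, 0.5, 0.8])); // 0.65
```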

Allow the API to work in the background (with limitations)

Broken out from #14

From Zoom:

We would like the API to be workable even if the tab is inactive or minimized. We have a use case about desktop sharing.

This of course affects privacy, so we will see if we need to add certain restrictions to make this work. For example, we could make it work only for focused tabs (as currently implemented, I believe) but also allow it when a tab is currently sharing content using the APIs that enable that.

Add example(s) showing usage of `observer` in callbacks

ComputePressureUpdateCallback takes two arguments, but all the examples in the spec only make use of the first one, records. It'd be good to have some example showing how the second argument, observer, is used, especially in light of the changes landed in #56.

"Queue a ComputePressureObserver Task" should use [=queue a global task=]

Step 3 of https://wicg.github.io/compute-pressure/#queue-a-computepressureobserver-task says "[=Queue=] a [=task=] [...]", which is not very helpful:

  • It should at least refer to "[=Queue a task=]" instead of referring to Infra's "Queue" definition and HTML's "task" definition.
  • Even then, it needs to pass additional parameters to the "queue a task" algorithm.

However, it's probably a better idea to invoke the "queue a global task" wrapper algorithm. From https://html.spec.whatwg.org/multipage/webappapis.html#queuing-tasks:

Failing to pass an event loop and document to queue a task means relying on the ambiguous and poorly-specified implied event loop and implied document concepts. Specification authors should either always pass these values, or use the wrapper algorithms queue a global task or queue an element task instead. Using the wrapper algorithms is recommended.

Explain trimming the quantization scheme

The current API design has a (currently very subtle) avenue for allowing user agents to determine how much entropy they're willing to expose with this feature.

Quantization schemes are expressed as arrays. Web authors are expected to list value thresholds in order of relative importance, and user agents may choose to ignore some of the thresholds.

For example, Chrome's prototype implementation allows 3 thresholds for CPU utilization. A quantization scheme of [0.5, 0.8, 0.3, 0.9, 0.2] will be reduced to [0.3, 0.5, 0.8] (the sorted version of [0.5, 0.8, 0.3]).
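The trimming described above can be sketched in a few lines; `maxCount` stands in for the user-agent-chosen limit (3 in Chrome's prototype):

```javascript
// Keep only the first maxCount thresholds the author listed (their
// order expresses relative importance), then sort ascending.
function trimThresholds(requested, maxCount) {
  return requested.slice(0, maxCount).sort((a, b) => a - b);
}

console.log(trimThresholds([0.5, 0.8, 0.3, 0.9, 0.2], 3)); // → [0.3, 0.5, 0.8]
```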

How do you handle multiple ComputePressureObserver?

The API is very clear that a (main) frame is only allowed to specify one set of buckets. How is the API going to enforce this? With the current example, it sounds like I could create multiple observers and hence get arbitrary precision on the values exposed.

Don't pass in options in the callback

No other Observer spec does this, except PerformanceObserver, but that is really not an "option" as it's basically additional data given to the callback, not options supplied by the developer.

Additional high-level information regarding the cause of the pressure

Conveying a suggestion from @asafyaffe:

One potential extension is to provide developers with additional high-level information regarding the cause of the pressure, if known (i.e. compute, memory, thermals, storage, network etc.). Without this information, developers may need to implement a “trial and error” loop to properly adjust the user experience.

I'd consider this a future enhancement that warrants discussion once the foundational pieces of the high-level API have been established.

Security and privacy review tracker

This issue is to solicit and track security and privacy review feedback from browser vendors, W3C's Privacy Interest Group, other privacy experts. While these reviews are formally part of the standards track, it is beneficial to conduct such reviews and capture any related feedback as early as possible, including any informal feedback and comments.

To facilitate this process, the Compute Pressure API contributors have proactively completed the Self-Review Questionnaire: Security and Privacy, documented the responses in a separate document and updated the Security and privacy considerations accordingly.

Please note the Compute Pressure API has recently been substantially refactored based on the high-level metrics proposal #24 to address feedback provided in the WebKit Request for Position, and with consideration for new use cases and web developer ergonomics in addition to privacy and security. To that end, we are particularly interested in browser vendors' feedback on the security and privacy properties of the new API.

All feedback welcome, including LGTMs and more directional guidance.

Integration with screen share and media capture

To avoid cross-origin user tracking, many specs restrict features to active, focused documents (for example, Generic Sensors).

But for compute pressure, one of the core use cases is actually to know the CPU pressure while screen sharing, to make sure the screen sharing is smooth and the video streams and effects do not negatively affect it. Yet during screen sharing, it is quite common that the web site/app using compute pressure is not actually focused.

This will require hooks in other specs, and we probably need something like an initiator-of-still-active-screensharing-session concept.

@eladalon1983 @riju @fideltian

Zoom needs this API for the Zoom PWA & Web Client

Hello, we have a web client and a PWA version of the Zoom client, and we would love to have these kinds of features. Some use cases I would like to share:

  1. To give a better user experience, we could adapt the video resolution depending on the CPU occupation, especially the initial resolution. Due to privacy, we cannot get the CPU spec, so CPU occupation is a useful proxy. Audio has similar use cases to video.
  2. We have some fun features, some of which need a lot of computing load. We need to know the CPU occupation to decide the minimal machines to be supported.

We also looked at the APIs and have some feedback:

  1. The observer mode returns the cpuUtilizationThresholds every 1 second. How about returning only when there is a change? Otherwise the callback fires too often.
  2. Provide an API that can be queried by the application at any time, so the application can adjust based on the up/down trend of CPU usage.

Another requirement is about implementation:

  1. We would like the API to work even if the tab is inactive or minimized; we have a use case about desktop sharing.
  2. Could the API also work in a worker thread?

Follow the *Observer pattern more closely

As currently specified, we differ somewhat from other *Observers on the platform:

Each instance of the Observer class is constructed with a callback, and optionally with some options to customize what should be observed.

Actually, only IntersectionObserver takes options as part of the constructor; all others take them as part of calls to observe().

Instances begin observing specific targets, using a method named observe(), which takes a reference to the target to be observed. The options to customize what should be observed may be provided here instead of to the constructor. The callback provided in the constructor is invoked when something interesting happens to those targets.

We don't have a special target: we don't allow targeting a specific CPU core or similar. We might allow GPU in the future, so we could make this an enum like "cpu" or "gpu". It might also make sense to allow observe() to change the options.

Callbacks receive change records as arguments. These records contain the details about the interesting thing that happened. Multiple records can be delivered at once.

Yes, all existing observers do that. That might make a lot of sense for us as well, maybe even with a timestamp.

The author may stop observing by calling a method called unobserve() or disconnect() on the Observer instance.

We currently use unobserve(); maybe disconnect() is better, and it also makes more sense if we allow changing options on the fly.

Looking at examples:

[Exposed=(Window)]
interface ResizeObserver {
    constructor(ResizeObserverCallback callback);
    undefined observe(Element target, optional ResizeObserverOptions options = {});
    undefined unobserve(Element target);
    undefined disconnect();
};

I see that you can observe multiple elements and that disconnect() stops all observations. This makes sense when thinking about CPU, GPU, etc.:

enum ComputePressureTarget { "cpu", "gpu" };

interface ComputePressureObserver {
  undefined observe(ComputePressureTarget target, optional ComputePressureObserverOptions options = {});
  undefined unobserve(ComputePressureTarget target);
  undefined disconnect();
};

Optionally, a method may be provided to immediately return records for all observed-but-not-yet-delivered occurrences.

This is the takeRecords() method other observers support. With batching, it makes sense to support it as well.

End result:

enum ComputePressureTarget { "cpu", "gpu" };
callback ComputePressureObserverCallback = undefined (sequence<ComputePressureObserverEntry> entries, ComputePressureObserver observer);


[Exposed=Window]
interface ComputePressureObserver {
  constructor(ComputePressureObserverCallback callback);
  undefined observe(ComputePressureTarget target, optional ComputePressureObserverOptions options = {});
  undefined unobserve(ComputePressureTarget target);
  undefined disconnect();
  sequence<ComputePressureObserverEntry> takeRecords();
};

IntersectionObserver also takes thresholds and actually exposes those set on the object:

readonly attribute FrozenArray<double> thresholds;

For error handling, I see that MutationObserver's observe() throws a TypeError.
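The "end result" shape above could be exercised with a minimal stub like the following. This is written purely to illustrate how per-target observe()/unobserve(), batching, and takeRecords() would fit together; the `_sample`/`_deliver` helpers stand in for the platform collector and are not part of any proposal:

```javascript
// Minimal stub of the proposed observer shape, for illustration only.
// _sample() and _deliver() are invented stand-ins for the platform side.
class ComputePressureObserver {
  constructor(callback) {
    this.callback = callback;
    this.targets = new Map(); // target -> options
    this.pending = [];        // batched, not-yet-delivered entries
  }
  observe(target, options = {}) {
    // Mirrors MutationObserver's behavior of throwing TypeError on bad input.
    if (target !== "cpu" && target !== "gpu") throw new TypeError("unknown target");
    this.targets.set(target, options);
  }
  unobserve(target) { this.targets.delete(target); }
  disconnect() { this.targets.clear(); this.pending = []; }
  takeRecords() {
    const records = this.pending;
    this.pending = [];
    return records;
  }
  // Platform collector pushes a sample; ignored if target not observed.
  _sample(entry) {
    if (this.targets.has(entry.target)) this.pending.push(entry);
  }
  // Batched delivery: callback receives all pending entries at once.
  _deliver() {
    if (this.pending.length) this.callback(this.takeRecords(), this);
  }
}
```

Note how disconnect() both stops all observations and drops pending records, matching the ResizeObserver precedent quoted above.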

Reconsider when to skip events

Currently we have

If state has not changed since last sample for source, abort these steps.

but now that we have factors, we should consider a change in factors as well.

That change is simple to make, but then it might make sense to have an easy way to check the last state and factors. Maybe we just add an example:

this.lastRecord = records.at(-1); // last item
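Along the lines of the suggestion above, a sketch of such an example might look like this. The `state` and `factors` record fields follow the discussion in this issue but are assumptions, not the specced shape:

```javascript
// Sketch: cache the newest record and only react when the state *or*
// the factors changed. `state`/`factors` field names are assumptions.
function sameRecord(a, b) {
  if (!a || !b) return false;
  if (a.state !== b.state) return false;
  const fa = a.factors ?? [], fb = b.factors ?? [];
  return fa.length === fb.length && fa.every((f, i) => f === fb[i]);
}

function makeCallback(onChange) {
  let lastRecord = null;
  return (records) => {
    const newest = records.at(-1); // last item
    if (!sameRecord(newest, lastRecord)) {
      lastRecord = newest;
      onChange(newest);
    }
  };
}
```

This is the app-side mirror of the "skip events" condition: a record whose state and factors both match the cached one triggers no work.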

System shows 100% consumption but browser CPU utilization doesn't

I am running an experiment to measure client-side CPU load under different video conferencing architectures (mesh, mixer, SFU, etc.). I have set this up on Chrome Beta with fake streams and no other program running on this machine (that takes significantly more power than idle).
When I see my CPU utilization spiking to 100% for large-scale video conferences in my system performance monitor, I do not see 100% CPU utilization in the Compute Pressure API reports. Is it designed to never go above 75%, or am I missing something?

For example, I have never seen any value go above 75% CPU utilization and 75% CPU speed:
1.943 0.75 0.75
1.08 0.75 0.75
1.923 0.75 0.75
2.007 0.75 0.75
2.071 0.75 0.75

Even though the CPU is heated immensely and shows ~99% utilization on all cores.
My system has an Intel® Core™ i7-7500U CPU @ 2.70GHz × 4.

Add timestamp to the ComputePressureRecord

As this could be used to identify the user across multiple origins, we could restrict this to focused browsing contexts: the timestamp is only set when the document is focused, and undefined otherwise. @anssik

Don't call callback if values didn't change

Broken out from #14

The observer mode returns the cpuUtilizationThresholds every 1 second. How about returning only when there is a change? Otherwise the callback fires too often.

This should be an easy change and one that I agree with.

How to handle user permission

From @beaufortfrancois:

They do not tell you anything, however, about whether that API is actually connected to a
real [=platform collector=], whether the [=platform collector=] is collecting real telemetry
readings, or whether the user is going to allow you to access it.

I didn't see anywhere in the spec where the user could disallow it.

True, we haven't looked into whether we want to guard this behind a permission and where that should be done. Let's discuss that here.

Modernize spec text

The spec is written with an obsolete style for method and attribute algorithms, e.g.:

The observe() method, when invoked, MUST run the following step, given the arguments source:

When supportedSources's attribute getter is called, MUST run the following steps:

The more compact modern way is:

The observe(source) method steps are:

The supportedSources getter's steps are:

Use-case: GPU pressure

It would be nice to understand GPU load as well, as in some cases the GPU, not the CPU, is the bottleneck.
For example, when I screen-share an external 4K screen in a video conference on a low-end notebook, the built-in internal GPU is 50-60% busy, the CPU is 30% busy, and everything starts working very slowly; even the mouse becomes jerky.

Handle focus/active-non-active change

Currently, if the document is not focused (and has no Picture-in-Picture or active screen share), events are not delivered.

But a site using Compute Pressure might sit in an inactive tab for a long time, in which case it makes no sense to keep the platform collector alive. Some browsers (like Chrome) also have the ability to put inactive tabs to sleep (say, after 5 minutes), so we need to work out how to handle that and at least not prevent tabs from sleeping.

The same applies to bfcache (active vs. non-active (bfcached) state). Currently we don't deliver events to non-active sites (or workers of such sites), but here we might also want some logic to drop the platform collector after a while and reestablish it later.

I don't know how much of this should be specced, or how, but at least we might need to mention it in non-normative text.

@inexorabletash what is your take on this?

Unclear how a sample makes its way from a platform collector to a ComputePressureUpdateCallback

https://wicg.github.io/compute-pressure/#platform-primitives says

When a sample is obtained from the platform collector, the user agent MUST run the 8.6.4 Notify Compute Pressure Observers algorithm as per the 8.6 Processing Model

That looks like the wrong algorithm, though: it is not clear how the sample collected by the platform is added to a ComputePressureRecord, how that ComputePressureRecord is created, or how the relevant ComputePressureObserver instances are selected for a given target type.

High-level metrics to improve web developer ergonomics

I had a chat with @kenchris to propose we revisit both the existing use cases and new use cases that have recently emerged (e.g. #14) to understand whether the current cpuSpeed and cpuUtilization metrics are still the best fit.

I think there's an opportunity to make the API even more ergonomic for web developers who are not experts in computing performance and tuning, and not familiar with related concepts.

I'd like us to assess whether the current use cases could be served with an API that, instead of (or in addition to) the current cpuSpeed and cpuUtilization numerical pair, would expose a finite set of human-readable compute pressure states that have semantics attached to them.

What I'm interested in exploring is whether we could raise the level of abstraction (bonus: more privacy-preserving, more future-proof) and make the underlying low-level metrics implementation details. The low-level metrics are harder to explain to web developers, might evolve, and could in some cases become misleading. I suspect they could be more easily misinterpreted as well.

In this proposal, the low-level metrics to high-level metrics mapping would become an implementation detail, and implementations could also take into consideration other factors that may influence the compute pressure state such as device form factor, thermal budget, and so on when making the decision.

Here's a strawman proposal, plugging into the existing API for illustrative purposes:

enum ComputePressureState { "nominal", "fair", "serious", "critical" };

dictionary ComputePressureEntry {
  ComputePressureState state = "fair";
};
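To illustrate the ergonomics argument, an app could consume such states directly, with no performance-tuning expertise required. The state names come from the strawman above; the resolution ladder is invented here purely for illustration:

```javascript
// How an app might consume the strawman pressure states. The state
// names come from the proposal; the resolution ladder is invented.
const resolutionForState = {
  nominal:  1080,
  fair:      720,
  serious:   480,
  critical:  360,
};

function adjustVideo(entry) {
  // Fall back to a conservative choice for unknown future states,
  // which is part of the future-proofing argument: new states don't
  // break existing apps.
  return resolutionForState[entry.state] ?? 480;
}
```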

Thoughts?

Related: I think this blog post https://www.brendangregg.com/blog/2017-05-09/cpu-utilization-is-wrong.html arguing that the CPU utilization metric is misleading should be reviewed. It triggered quite a long discussion among the developer audience (also on HN), so there are probably good nuggets of information hidden in the comments too. Leaving it here for interested folks to digest.

[Edit: The state names in the strawman proposal were tweaked a bit. The names should be considered as placeholders to illustrate the idea. These names are subject to change based on feedback received.]

Integration with picture-in-picture

To avoid cross-origin user tracking, many specs restrict features to active, focused documents (for example, Generic Sensors).

But for compute pressure, one of the core use cases is actually to know the CPU pressure while writing minutes (a summary of meeting notes), to make sure the typing experience is smooth and the video streams and effects do not negatively affect the minute taking.

During minute taking, however, it is quite common that the web site/app using compute pressure is not actually focused, as the word processing app is.

This will require hooks in other specs, and we probably need something like an initiator-of-still-active-picture-in-picture-session concept.

@beaufortfrancois @riju @fideltian

No indication of the max number of thresholds available per metric

There is no way to know the number of thresholds available for each metric.
For example, at the moment it is 1 for CPU speed (2 buckets) and 3 for CPU utilization (4 buckets).

Should there be a getMaxThreshold() per metric, or a mechanism to signal that the number of thresholds provided is not suitable?
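One possible shape of the requested mechanism, sketched with invented names: the UA advertises a per-metric limit and keeps only the lowest supported thresholds rather than rejecting the whole request. The limits below mirror the current values mentioned in this issue; `clampThresholds` is hypothetical:

```javascript
// Hypothetical sketch: per-metric threshold limits (values taken from
// the issue text: 1 for cpuSpeed, 3 for cpuUtilization) and a helper
// that clamps a developer-supplied threshold list to those limits.
const maxThresholds = { cpuSpeed: 1, cpuUtilization: 3 };

function clampThresholds(metric, requested) {
  const max = maxThresholds[metric] ?? 0;
  // Sort ascending and keep only the first `max` thresholds.
  return [...requested].sort((a, b) => a - b).slice(0, max);
}
```

Whether clamping silently or throwing is better is exactly the open question; this only illustrates the "tell the developer the limit" half.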
