
Comments (21)

padenot commented on August 24, 2024

AudioWG virtual F2F:

  • This is desirable, and can be implemented on some systems fully (Jack on Linux for example, or using kernel extensions on other OSes), but is out of scope of the Web Audio API. This is clearly the responsibility of the WebRTC working group, because they deal with device access and permissions. The security considerations are extremely important: without an explicit user consent, the audio of the machine could be exfiltrated to any website.
  • Closing, but we'll follow up in other groups if needed.

from web-audio-api-v2.

PaulFidika commented on August 24, 2024

I can't believe how difficult / impossible it is to do something as simple as "hey capture user-audio on this machine, then stream or record it" or "hey let users select in their browser what audio device they want their music to play out of". It's honestly kind of bizarre because I can find stackoverflow / github issues dating back past 2017 asking for this simple behavior, and on any native-app this is well-supported behavior.


bradisbell commented on August 24, 2024

The Screen Capture API already supports audio on at least some platforms. It's awkwardly named for this use case, but I think it does what you're looking for. Perhaps we could advocate for improving that spec and support for it?


guest271314 commented on August 24, 2024

@bradisbell No, the Screen Capture API does not unambiguously support system or application audio w3c/mediacapture-screen-share#140. I tried developer advocacy. We should not be required to capture a screen just to capture audio. I have not used Windows for some time and have no need to, not even before deleting it from machines I come across.

Some platforms is not good enough. In fact, good enough is never good enough.

My own diagnosis is that this https://bugs.chromium.org/p/chromium/issues/detail?id=931749 is part of the problem at https://bugs.chromium.org/p/chromium/issues/detail?id=865799 from which I am now banned from participating.

We do not need this API squeezed into APIs that were not intentionally designed for this use case, including here by reference the summary of the use cases listed in the OP.

This is for an unambiguous API to capture both entire system audio and specific application audio, standing on its own, carved out from within the scope of Web Audio. Extensibility is paramount at the outset. We do not want to maintain language or an API that is fixed in 2020 when neither time nor concepts are static. An entirely new class of device could be disclosed, and we need to be prepared to adjust the specification and API to incorporate as-yet undisclosed technology, not be static. There are use cases in the wild for capturing audio output. There is no dedicated specification or API which successfully achieves those use cases, in written or executed code form, deliberately. Developer driven. Developers develop. It don't stop.


bradisbell commented on August 24, 2024

We do not need to be required to capture a screen just to capture audio.

Yeah, agreed completely. I think it's possible to specify audio only in the constraints? Not sure... my platform (Windows) isn't supported yet so I don't have a good way to test.

Some platforms is not good enough.

I wholeheartedly agree, and it's a fixable problem I think.

My own diagnosis is that this https://bugs.chromium.org/p/chromium/issues/detail?id=931749 is part of the problem...

Capturing from a monitor/loopback device is attacking the problem from the side. While I completely agree that this limitation in Chromium should be fixed, a cross-platform solution would be better.

...from which I am now banned from participating.

That is very sad. :-( I, too, cannot see the thread.

We do not need this API squeezed into API's that were not intentionally designed for this use case

The Screen Capture API is poorly named, but is otherwise designed for the use case as far as I see it. With it, you can capture audio streams from the whole system, individual applications, browser tabs, and windows.
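For reference, a minimal sketch of requesting audio through getDisplayMedia(). Note the caveats discussed in this thread: the specification only says audio MAY be supported, audio-only requests are typically rejected, and whether a stream actually carries audio depends on the platform. buildDisplayConstraints() and splitTracks() are hypothetical helper names introduced here for illustration:

```javascript
// Sketch: request display capture and keep only the audio, if granted.
// Implementations vary: the spec only says audio MAY be supported, and
// audio-only requests are typically rejected, so video is requested too
// and its track is stopped afterwards.
function buildDisplayConstraints() {
  return { video: true, audio: true };
}

function splitTracks(stream) {
  return {
    audio: stream.getAudioTracks(),
    video: stream.getVideoTracks(),
  };
}

async function captureAudioViaDisplay() {
  const stream = await navigator.mediaDevices.getDisplayMedia(
    buildDisplayConstraints()
  );
  const { audio, video } = splitTracks(stream);
  video.forEach((t) => t.stop()); // only the audio was wanted
  if (audio.length === 0) {
    throw new Error('Implementation granted no audio track for this surface');
  }
  return new MediaStream(audio);
}
```

Whether an audio track is actually present depends on the implementation and on which surface (tab, window, or monitor) the user picks in the prompt.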

An entirely new class of device could be disclosed, and we need to be prepared to adjust specification and API to incorporate as yet potentially undisclosed technology, not be static.

I don't see why we need a whole new AudioContext replacement for this. You're proposing a new source of MediaStreams, not an entire new audio API from scratch.


bradisbell commented on August 24, 2024

@guest271314 Ah, I now see the comment here, indicating that audio capture is out-of-scope: w3c/mediacapture-screen-share#140 (comment)

I posted a comment back on that issue for clarification as to why audio capture would be out-of-scope. Seems like a strange restriction... perhaps we'll at least get a more detailed answer as to what the objection is so it can be addressed.


guest271314 commented on August 24, 2024

@bradisbell One problem is that the term of art 'audiooutput' in Media Capture and Streams https://w3c.github.io/mediacapture-main/#idl-def-MediaDeviceKind.audiooutput, which appears twice, is not actually defined as being captured. Further, though not stated in the language, the intent appears to be routing audio to an output device, again, not capturing the output therefrom whatsoever. The goal of that specification is to capture microphone input. Firefox does not implement 'audiooutput' as described in that specification; there is nothing to implement, as there are no algorithms associated with the term that is used twice in that specification. 'audiooutput' does not appear whatsoever in the Audio Output Devices API https://w3c.github.io/mediacapture-output/. The question must be asked: why is that term in the specification at all? I have asked several questions re that topic, most closed. I will leave it to you to read those issues. I linked to implementation bugs where applicable therein.
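To make the distinction concrete, a small sketch: enumerateDevices() does return 'audiooutput' entries, but their deviceIds are only meaningful to setSinkId(), not to getUserMedia() audio constraints. devicesByKind() and listAudioDevices() are hypothetical helper names:

```javascript
// Sketch: 'audiooutput' entries returned by enumerateDevices() identify
// playback sinks, not capturable sources; their deviceIds are only
// meaningful to HTMLMediaElement.setSinkId(), not to getUserMedia().
function devicesByKind(devices, kind) {
  return devices.filter((d) => d.kind === kind);
}

async function listAudioDevices() {
  const devices = await navigator.mediaDevices.enumerateDevices();
  return {
    // Capturable inputs (microphones): usable as a getUserMedia() deviceId.
    inputs: devicesByKind(devices, 'audioinput'),
    // Playback sinks: usable only as an argument to setSinkId().
    outputs: devicesByKind(devices, 'audiooutput'),
  };
}
```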

Thus, when the use case is capturing speech synthesis audio output from Web Speech API, where there is no specified means to do so https://lists.w3.org/Archives/Public/public-speech-api/2017Jun/0000.html, a user in the field might read the language in the getUserMedia() specification and attempt to capture what the plain language suggests, audio output. At Chrome and Chromium on Linux that will not happen https://stackoverflow.com/a/45003549.

  • @RonenRabinovici Yes, the original code at the answer did record the device microphone. The original code is a workaround for the requirement to record speech synthesis by default at modern browsers. Updated code to set "audioouput" as device to record github.com/guest271314/SpeechSynthesisRecorder/commit/… – guest271314 Jan 10 '18 at 3:18
  • @loretoparisi See updated code which sets media device to record to "audiooutput" plnkr.co/edit/PmpCSJ9GtVCXDhnOqn3D?p=preview – guest271314 Jan 10 '18 at 3:22
  • @guest271314, I used the code at plnkr.co/edit/PmpCSJ9GtVCXDhnOqn3D?p=preview but it still recorded from my microphone. – Jeff Baker Aug 15 '18 at 22:54
  • This doesn't record speaker output. I tried capturing tab audio using a chrome extension but still failed. It seems speechSynthesis is not using HTMLMediaElement for audio hence we shall not be able to capture at tab/browser level. The audiooutput mentioned above returns "default" for both mic and speaker since there is no way to set the "kind" field while setting constraints in getUserMedia, it always captures "mic". Let me know in case more details required. – Gaurav Srivastava Mar 4 '19 at 1:13
  • Confirming that it records from microphone rather than speech synthesis, at least in Chrome 84. – joe Aug 13 at 11:15

Is it immediately obvious to you that 'audiooutput' in the getUserMedia() specification is not intended to refer to the ability to capture audio output?

There is clearly potential for confusion when users in the field are actually trying to capture audio output, due to the omissions of a different API. That API outputs audio through Speech Dispatcher at Chromium, and through potentially Speech Dispatcher or Google playback itself when window.speechSynthesis.speak() is called with a SpeechSynthesisUtterance that uses Google voices; there is then the possibility of at least two different applications producing output, and that is just with one speech synthesis engine in use.

The thread might have been marked as permission denied for other reasons. I do not know for certain. At the previous developer channel Chromium build I was using, it was possible to disconnect all microphones, catch() the getUserMedia({audio: true}) DOMException that is thrown at Chrome and Chromium only, and still retain permissions for enumerateDevices(), which appears to have changed at a Chromium release over the last day or two. In any event, the use case of that issue was not having to get microphone permissions just to listen to a Google Meet when there were no microphones connected to the machine.

Chrome and Chromium only list and capture microphones, so that use case is currently not possible at Chrome; however, the UI design can be adjusted to work around the issue https://plnkr.co/plunk/wlwgV3BKBVZJQ2tF. That means acknowledging that getUserMedia() is not sufficient for audio-output-only capture, especially when we are not trying to capture a microphone. enumerateDevices() should list deviceIds only when getUserMedia() permission is granted; yet with no microphone connected, Chrome throws a device-not-found error, and we enter a circle where we ultimately discover that there is no existing API for audio output capture. We must return to the topic of the strict limitations of getUserMedia() and, by reading every word and performing at least tens of thousands of manual tests, conclude that there really is no audio output capture API.

The Screen Capture API is poorly named, but is otherwise designed for the use case as far as I see it. With it, you can capture audio streams from the whole system, individual applications, browser tabs, and windows.

The getDisplayMedia() specification does not contain MUST language as to audio capture, so audio capture is not binding on implementations. That is the purpose of that issue. I am well-suited to construct terms of art. MAY means implementers remain in conformance with the governing specification even if they omit the feature, even when a tracking bug says "Implement specification". I would suggest re-reading that issue carefully.

Other users have filed bugs https://bugs.chromium.org/p/chromium/issues/detail?id=991401.

Although Mozilla browsers do provide access to Monitor devices, that covers only part of the technical side of this issue as it currently works in the field https://bugzilla.mozilla.org/show_bug.cgi?id=1670405.

@guest271314 Ah, I now see the comment here, indicating that audio capture is out-of-scope: w3c/mediacapture-screen-share#140 (comment)

I posted a comment back on that issue for clarification as to why audio capture would be out-of-scope. Seems like a strange restriction... perhaps we'll at least get a more detailed answer as to what the objection is so it can be addressed.

We do not want or need to go through capturing a display to capture audio.

I do not settle.

Again, good enough ain't good enough.

I have built workarounds and template language for algorithms with accompanying code in the interim. I do not leave unanswered questions without determining if the requirement is impossible. Here, as demonstrated by the workarounds, the requirement is possible. The issue is not the code, the issue with regard to specifications and implementations is policy.

I have not yet compiled a complete, unabridged list of all of the issues and bugs, and experiments, and tests re this matter. A brief summary of the bugs and issues that I have filed re this matter are referenced at https://github.com/guest271314/captureSystemAudio#references.

If you are compelled to perform due diligence, carry on. I and others have left breadcrumbs along the trail.

In the meantime I do not do any waiting on any other individual or institution for anything. Thus, the various workarounds that I have published for this topic. I am mainly performing due diligence for developers that perhaps have not vetted the entirety of the circular abstraction of how to unambiguously capture entire and application specific audio output.

I perform primary source research. I repudiate and reject spurious claims in any and every domain of human activity in which I am constantly engaged, from biology to history, law to politics, industries, etc., using the scientific method and codified research processes (in brief see https://ncu.libguides.com/researchprocess/primaryandsecondary, http://handbook.reuters.com/index.php?title=Vetting_tips&oldid=3127). Specifications and implementations are not exempt from vetting and evidence-based conclusions reached by individuals outside of the system, per Gödel's second incompleteness theorem https://plato.stanford.edu/entries/goedel-incompleteness/:

Second incompleteness theorem
For any consistent system F within which a certain amount of elementary arithmetic can be carried out, the consistency of F cannot be proved in F itself.

That is what I do. And again, I ain't doing this for me. I build my own stuff and draw my own conclusions. I employ the same approach when writing code and breaking code in tests https://gist.githubusercontent.com/guest271314/1fcb5d90799c48fdf34c68dd7f74a304/raw/c06235ae8ebb1ae20f0edea5e8bdc119f0897d9a/UseItBreakItFileBugsRequestFeaturesTestTestTestTestToThePointItBreaks.txt, and when viewing the language in specifications using the codified rules of statutory construction, as I do in industry, which when utilized correctly leaves nowhere to go except the absolute source of the matter. Or, put another way by Suga Free on the record Angry Enuff:

Look for lies! Misconduct is at an all-time high too


guest271314 commented on August 24, 2024

@bradisbell

Not sure... my platform (Windows) isn't supported yet so I don't have a good way to test.

Supposedly it is possible to capture system audio output with Screen Capture API on Windows w3c/mediacapture-main#694 (comment).

I found this https://bugs.chromium.org/p/chromium/issues/detail?id=1143761 today. Note one of the components listed https://bugs.chromium.org/p/chromium/issues/list?q=component:Internals%3EMedia%3EScreenCapture.

That is not a substitute for this proposal, which is for disambiguation from the 'audiooutput' term of art used in the Media Capture and Streams specification w3c/mediacapture-main#650 (comment)

Correct. It does not mean capturing audio output. It means playing audio, not capturing it.
Devices of audiooutput kind can be used with the setSinkId method of media elements to set which audio device the element should use to play audio.

to concretely and unambiguously establish entire system and application audio output capture in specification and implementation form, without room for conjecture as to the purpose of the API.
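For illustration, the only specified use of an 'audiooutput' deviceId is playback routing, sketched below. playThrough() is a hypothetical helper name, and setSinkId() is not implemented in every browser:

```javascript
// Sketch: route an element's playback to a chosen 'audiooutput' device.
// setSinkId() changes where audio is rendered; there is no corresponding
// way to read the rendered samples back, which is the distinction drawn
// in the comment quoted above.
async function playThrough(audioElement, deviceId) {
  if (typeof audioElement.setSinkId !== 'function') {
    throw new Error('setSinkId() not supported in this browser');
  }
  await audioElement.setSinkId(deviceId); // choose the output device
  await audioElement.play();              // render to it; nothing is captured
  return audioElement.sinkId;             // id of the active output device
}
```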


guest271314 commented on August 24, 2024

With the closure of this Chromium/Chrome bug https://bugs.chromium.org/p/chromium/issues/detail?id=1155954 based on the evidence we conclude no audio output capture specification or API exists.

This means that Web Audio working group will not be intruding into another working group's domain by specifying audio output capture, as this subject matter is not covered by or within the scope of any existing working group or task forces' goals or ongoing work.


guest271314 commented on August 24, 2024

Noting here that technically, for application capture we want to capture the playback stream, or sink inputs, for example using the application name "Chromium", media name Playback; "Firefox", media name AudioStream; or application binary process, "mpv"; "sd_espeak-ng", for disambiguation from capturing entire system audio output, or "What-U-Hear", which could include all of the former.


guest271314 commented on August 24, 2024

Preliminary tests to capture discrete sink inputs on Linux using pactl list sink-inputs to get indexes of playback streams and parecord to record the stream to a file. In this API we can pipe the output to the API as raw PCM or a MediaStream, re-encode to Opus or another codec, etc. We have direct access to the raw audio output data of the application to use in the browser.

Test:

At Firefox on Linux with speech-dispatcher installed (python3-speechd) in one tab we run

window.speechSynthesis
.speak(
  new SpeechSynthesisUtterance('test test test test test '.repeat(20))
);

in another tab we load an audio file file:///path/to/audio/file as a media document and set loop to true on the HTMLMediaElement, resulting in both tabs outputting audio simultaneously: Web Speech API through the speech-dispatcher-espeak-ng module and the local audio file through Firefox AudioStream.

Capture speech synthesis output only

$ parecord -v -r --monitor-stream=26 --file-format=wav output.wav

Capture only output from the HTMLMediaElement

$ parecord -v -r --monitor-stream=31 --file-format=wav output1.wav

Result:

Only the specified monitor stream (sink-input) audio is captured even while a different playback is simultaneously occurring.

Conclusion:

It is possible to capture only specific discrete application playback while multiple sink-inputs are outputting playback simultaneously.
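The index numbers fed to --monitor-stream above come from pactl list sink-inputs; here is a sketch of extracting them with standard tools. The sample text below is illustrative, not real pactl output; on a live PulseAudio session it would come from the command itself:

```shell
# Sketch: extract the sink-input indexes fed to --monitor-stream from
# `pactl list sink-inputs` output. The sample text is illustrative only.
sample='Sink Input #26
	Driver: protocol-native.c
	application.name = "speech-dispatcher-espeak-ng"
Sink Input #31
	Driver: protocol-native.c
	application.name = "Firefox"'

# Pull the number after "Sink Input #", one index per line.
indexes=$(printf '%s\n' "$sample" | sed -n 's/^Sink Input #\([0-9][0-9]*\).*$/\1/p')
echo "$indexes"

# Each index can then be recorded individually, e.g.:
#   parecord -v -r --monitor-stream=26 --file-format=wav output.wav
```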


guest271314 commented on August 24, 2024

The getDisplayMedia() specification does not require capturing audio; it is implementation opt-in. Audio capture is apparently available on Windows for getDisplayMedia({audio: true, video: true}), however, I do not use Windows.

We should not be compelled to capture screens just to capture audio. getDisplayMedia() does not specify allowing {audio: true} only constraints at constructor.

That will not work here.

As I have stated multiple times Chrome and Chromium refuse to support listing and capture of non-microphone devices on Linux, with no plans to revert the change, in brief see ungoogled-software/ungoogled-chromium#1273.

At Chromium at least one contributor to the source code correctly removed the Screen Capture component from the feature request.

It is the province of Web Audio to capture system and application audio. The requirement is not too difficult to achieve, and other specifications have already declined to support audio output capture; there is no way to defer back to specifications that have refused to support the requirement.


padenot commented on August 24, 2024

No specification or API, including Web Audio API, exists for the specific purpose of capturing, analyzing, or processing system or application audio output.

This is because this is not possible to implement on all operating systems.

For example, it's not implementable in macOS, and on some Android (depending on the OEM). It's implementable on Linux when using PulseAudio and monitor devices are available (which is not a guarantee), and is available on Windows > 7, if the stream is not "protected" (I don't know the correct word).


guest271314 commented on August 24, 2024

This is because this is not possible to implement on all operating systems.

For example, it's not implementable in macOS

The following repository is at GitHub

solutions referencing the above code

an article on how to record speaker output on macOS.

Android is based on Linux. Once rooted you can do whatever you want.

From developer.android

developers in the wild are aware of this

Thus the feature appears to be possible on all platforms.


guest271314 commented on August 24, 2024

Whether a feature is currently possible or not on one or another operating system cannot impact progression of a feature to be specified.

Implementers can decide for themselves if and how to implement, as they do with all other specifications.

The technology is not static (Moore's Law). Operating systems change over time. The technology exists to achieve the requirement.

I am certain that given a macOS or Android device I will be able to capture audio the same manner as I do on other machines, without reliance on specifications and working around implementations.

AFAIK Media Capture Automation is not implemented right now on Chrome or Chromium on Linux (or possible without awareness of Chrome refusing to list or capture certain devices, and of the workarounds), though the specification is already published, moving ahead (of the curve).


guest271314 commented on August 24, 2024

This is because this is not possible to implement on all operating systems.

For example, it's not implementable in macOS, and on some Android (depending on the OEM). It's implementable on Linux when using PulseAudio and monitor devices are available (which is not a guarantee), and is available on Windows > 7, if the stream is not "protected" (I don't know the correct word).

Do we have 2020 tests for this on each OS?

If we do not have tests we can run the tests to capture system and application audio output to base conclusions on evidence.


guest271314 commented on August 24, 2024

Capturing entire system audio output alone is not enough. We need to be able to capture specific source-outputs, or virtual devices, etc. This is particularly relevant to speech synthesis output via Web Speech API where, for example, when Google Chrome is used, the playback of multiple utterances (passed as input text) can use different playback devices, Google voices (which have an undisclosed restriction not specified https://bugs.chromium.org/p/chromium/issues/detail?id=1158246), and whichever modules the user configures for use with speech-dispatcher (which Chrome and Mozilla browsers each use), so capturing only one playback output during the output will not suffice.

This is how I am capturing entire system audio output, and specific source-outputs, in this case the speech-dispatcher-espeak-ng module on Linux, using navigator.mediaDevices.getUserMedia(), which is absolutely unspecified behaviour even though the infrastructure supports this capability.

Essentially this Issue can be solved by removing the restrictions that Media Capture and Streams (main) have imposed on getUserMedia() and enumerateDevices(), however, given the closure of the issues I filed there that does not appear to be likely.

I will use the existing technology anyway to meet this requirement. No specification restriction or omission can stop that from occurring; as long as I have control over my machine I can set any real or virtual stream I want as the source for getUserMedia({audio: true}) without asking for a specification change or implementation, guest271314/SpeechSynthesisRecorder#17

# Create a combined sink, Web_Speech_Sink, slaved to the current default sink
pactl load-module module-combine-sink \
sink_name=Web_Speech_Sink slaves=$(pacmd list-sinks | grep -A1 "* index" | grep -oP "<\K[^ >]+") \
sink_properties=device.description="Web_Speech_Stream" \
format=s16le \
channels=1 \
rate=22050
# Move the speech-dispatcher-espeak-ng playback stream to that sink
pactl move-sink-input $(pacmd list-sink-inputs | tac | perl -E'undef$/;$_=<>;/speech-dispatcher-espeak-ng.*?index: (\d+)\n/s;say $1') Web_Speech_Sink
# Expose the sink's monitor as a capturable source labeled Web_Speech_Output
pactl load-module module-remap-source \
master=Web_Speech_Sink.monitor \
source_name=Web_Speech_Monitor \
source_properties=device.description=Web_Speech_Output

We can then select the device with label 'Web_Speech_Output':

navigator.mediaDevices.getUserMedia({audio: true})
.then(async stream => {
  const [track] = stream.getAudioTracks();
  const devices = await navigator.mediaDevices.enumerateDevices();
  const device = devices.find(({label}) => label === 'Web_Speech_Output');
  track.stop();
  console.log(devices, device);
  return navigator.mediaDevices.getUserMedia({audio: {deviceId: {exact: device.deviceId}}});
})
.then(stream => {
  const recorder = new MediaRecorder(stream);
  recorder.ondataavailable = e => console.log(URL.createObjectURL(e.data));
  const synth = speechSynthesis;
  const u = new SpeechSynthesisUtterance('test');
  u.onstart = e => {
    recorder.start();
    console.log(e);
  }
  u.onend = e => {
    recorder.stop();
    recorder.stream.getTracks().forEach(track => track.stop());
    console.log(e);
  }
  synth.speak(u);
});

References:

Notice, the Web has not been broken, and "audiooutput" in that specification https://github.com/w3c/mediacapture-main/issues/756 is still not actually audio output, no matter how the term of art is used there, guest271314/SpeechSynthesisRecorder#18.


guest271314 commented on August 24, 2024

AudioWG virtual F2F:

  • This is desirable, and can be implemented on some systems fully (Jack on Linux for example, or using kernel extensions on other OSes), but is out of scope of the Web Audio API. This is clearly the responsibility of the WebRTC working group, because they deal with device access and permissions. The security considerations are extremely important: without an explicit user consent, the audio of the machine could be exfiltrated to any website.
  • Closing, but we'll follow up in other groups if needed.

All of the relevant WebRTC and MediaStream working groups have "rejected" the multiple issues I have filed re this subject matter. Do I really need to list the numerous issues I have filed re this matter, here? I can if that will help clarify that the groups you mention are well aware of the feature request and have closed all of the issues I have filed there.

A short list of issues closed by W3C repositories and Chromium authors, for the record:

The request at the start is for Chrome to have the ability to execute arbitrary shell scripts from Javascript in order to get around the fact that the standards bodies have rejected requests for capturing monitor devices on Linux.

Closing as "won't fix".

So, no, those groups are not the place to discuss this feature if in fact Web Audio has deemed the feature "desirable".

Closing, but we'll follow up in other groups if needed.

I am not sure what groups you are considering, other than the ones that closed the issues I filed there?

That is why I filed the concept here, given audio developers might just get what the restrictions are and how simply this can be fixed.

The "security" considerations are very simple: prompt, get permissions to capture specific sink-inputs, source-outputs, virtual devices, etc., no different than the existing APIs stemming from the specifications listed above. There is no boogeyman with capturing audio output.


guest271314 commented on August 24, 2024

AudioWG virtual F2F:

  • This is desirable, and can be implemented on some systems fully (Jack on Linux for example, or using kernel extensions on other OSes),

There are various means to achieve the expected result on Linux.

Consider use cases

$ mpv no_browser_supports.this_codec => AudioOutputContext + virtual camera supplying live stream to getDisplayMedia(), etc. (WebCodecs does not solve this).

but is out of scope of the Web Audio API.

  • AudioContext.createMediaStreamDestination, AudioContext.createMediaStreamSource.

  • You folks are the audio experts.

Per the closures of the issues, Web Audio API would not be stepping on any other (WebRTC) "working group's" feet. The other "working group" "rejected" the feature request. Web Audio API found the feature "desirable". Thus there is a different conclusion drawn by each "working group" so far, which is not problematic. Why would a "working group" who "rejected" previous proposals have an issue with a different "working group" that concluded otherwise?

This is clearly the responsibility of the WebRTC working group

Well, Chromium/Chrome refuse to capture 'monitor' devices. There is little possibility in that changing.

, because they deal with device access and permissions.

WebRTC does not have exclusive "rights" to the concept of permissions (for audio devices); Media Capture and Streams is limited, restricted to "microphone" capture, not output device capture whatsoever, because the Audio Output Devices API does not capture output devices.

The security considerations are extremely important: without an explicit user consent, the audio of the machine could be exfiltrated to any website.

Well, the user can capture whatever they want right now. getUserMedia() captures the microphone, right now, which effectively can be entire system audio output, with echo and reverb, right now.

I am not certain what precise concern you have?

List sink inputs, source outputs, virtual device outputs to capture, capture or read the stream in "real-time" with result as a MediaStreamTrackProcessor.readable if a MediaStreamTrack output option is set, or non-MediaStreamTrack options ReadableStream with Float32Array's, set the values to outputs in AudioWorkletProcessor, etc., which do not rely on WebRTC whatsoever.
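As a sketch of that non-WebRTC read path, assuming the Chromium-only MediaStreamTrackProcessor from the mediacapture-transform proposal; readAudioFrames() and computeRms() are hypothetical helper names:

```javascript
// Sketch: read raw audio frames from a MediaStreamTrack without any WebRTC
// transport, using MediaStreamTrackProcessor (Chromium-only as of this
// writing). computeRms() is a plain helper for inspecting captured samples.
function computeRms(samples) {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / (samples.length || 1));
}

async function readAudioFrames(track, onLevel) {
  const processor = new MediaStreamTrackProcessor({ track });
  const reader = processor.readable.getReader();
  for (;;) {
    const { done, value: frame } = await reader.read();
    if (done) break;
    // Copy the first channel out of the AudioData frame as Float32 samples.
    const samples = new Float32Array(frame.numberOfFrames);
    frame.copyTo(samples, { planeIndex: 0 });
    onLevel(computeRms(samples));
    frame.close(); // release the frame's buffer promptly
  }
}
```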

  • Closing, but we'll follow up in other groups if needed.

Yes, that follow-up is needed in the above-listed specification "working group" issues. W3C banned me so you folks will need to file your own issue if you do not want to simply comment in one or more of the multiple "working group" issues I filed.


hoch commented on August 24, 2024

@PaulFidika

  1. capture user-audio on this machine, then stream or record it
  2. let users select in their browser what audio device they want their music to play out of

These two requests are different; the latter is being tracked by #10 and the WG's response to the former request is laid out in #106 (comment). However, please feel free to open a new issue if you have different thoughts.


guest271314 commented on August 24, 2024

I can't believe how difficult / impossible it is to do something as simple as "hey capture user-audio on this machine, then stream or record it"

I solved this several ways https://github.com/guest271314/captureSystemAudio. The latest variation is for PulseAudio on Linux using Native Messaging.

let audioStream = new AudioStream(
  new ReadableStream({
    start(c) {
      c.enqueue(
        new File(
          [
          `parec -v --raw -d $(pactl list | grep -A2 'Source #' | grep 'Name: .*\\.monitor$' | cut -d" " -f2)`,
          ],
          'capture_system_audio',
          {
            type: 'application/octet-stream',
          }
        )
      );
      c.close();
    },
  })
);
// audioStream.mediaStream: live MediaStream
audioStream
  .start()
  .then((ab) => {
    // ab: ArrayBuffer representation of WebM file from MediaRecorder
    console.log(
      URL.createObjectURL(new Blob([ab], { type: 'audio/webm;codecs=opus' }))
    );
  })
  .catch(console.error);
// do stuff
// stop capturing system audio output
audioStream.stop();

then processes the raw PCM https://github.com/guest271314/captureSystemAudio/blob/master/native_messaging/capture_system_audio/audioStream.js.

This is also possible using WebTransport https://github.com/guest271314/webtransport/blob/main/AudioStream.js, or fetch() without an extension (Native Messaging) https://github.com/guest271314/NativeTransferableStreams.

It's honestly kind of bizarre because I can find stackoverflow / github issues dating back past 2017 asking for this simple behavior, and on any native-app this is well-supported behavior.

A brief summary https://github.com/guest271314/captureSystemAudio#references.

My initial use case was simply capturing output of Web Speech API https://lists.w3.org/Archives/Public/public-speech-api/2017Jun/0000.html, https://stackoverflow.com/questions/45003548/how-to-capture-generated-audio-from-window-speechsynthesis-speak-call/45003549, see the comments at https://stackoverflow.com/a/45003549 for Chromium/Chrome behaviour and guest271314/SpeechSynthesisRecorder#17 for workarounds.

Though because Web Speech API does not expect or parse SSML I decided to get STDOUT from the speech synthesis engine directly https://github.com/guest271314/native-messaging-espeak-ng. I quickly realized there were other use cases, and that for undisclosed reasons the capability was not specified. I filed more than one specification and implementation issue re the capturing audio output use case.

I would recommend, if you entertain hope, abandoning it in this case re getting this unambiguously specified. Chromium/Chrome authors simply refuse to list or capture monitor devices. getDisplayMedia({audio: true, video: true}) does capture some tab audio output on Chromium, but not others https://bugs.chromium.org/p/chromium/issues/detail?id=1185527.

or "hey let users select in their browser what audio device they want their music to play out of"

You can probably adjust the code at https://github.com/guest271314/setUserMediaAudioSource, substituting WebTransport for the deprecated QuicTransport to achieve that requirement.

I do not have any evidence from the several issues and bugs others and I have filed re capturing system audio (on Chromium) indicating that the vendor will implement such functionality officially, though APIs do exist to achieve the requirement.

Roll your own.

