
Comments (8)

guest271314 commented on August 24, 2024

This will also provide a simpler solution for https://github.com/WebAudio/web-audio-api-v2/issues/90

const ac = new AudioContext();
const aw = new AudioWorkletNode(ac, 'processor');
const {writable, readable} = new TransformStream();
const writer = writable.getWriter();
const input = new Float32Array(128);
input.fill(.5);
const done = new Float32Array(128);
done.fill(.25);
writer.write(input);
// writer.write(..);
// writer.write(..);
writer.close();
const rawPCMStreamNode = new RawPCMStreamNode(ac, readable);
aw.connect(rawPCMStreamNode);


from web-audio-api-v2.

guest271314 commented on August 24, 2024

Related https://github.com/w3c/mediacapture-insertable-streams. Not sure if that proposal can solve this.


guest271314 commented on August 24, 2024

This is largely solved by https://github.com/w3c/mediacapture-insertable-streams and https://github.com/w3c/webrtc-insertable-streams, and to the extent applicable, without making the tab unresponsive, WebCodecs AudioEncoder and AudioDecoder methods.


guest271314 commented on August 24, 2024

Largely though not completely solved.

The AudioFrame described by WebCodecs has some omissions; for example, precisely what timestamp is ("Presentation timestamp" is not defined in the specification; see #107) and how to generate the same.


guest271314 commented on August 24, 2024

A tentative workaround, in lieu of timestamp being defined in the WebCodecs specification, is to create a MediaStreamAudioDestinationNode, connect an OscillatorNode with frequency set to 0, pass the MediaStreamTrack to MediaStreamTrackProcessor, read the stream, get the generated timestamp, and pass that timestamp with a user-defined AudioBuffer to AudioFrame.

<!DOCTYPE html>

<html>
  <head> </head>

  <body>
    <h1>click</h1>
    <audio controls autoplay></audio>

    <script>
      async function webTransportBreakoutBox(text) {
        const url = 'quic-transport://localhost:4433/tts';
        try {
          const transport = new WebTransport(url);
          await transport.ready;

          const sender = await transport.createUnidirectionalStream();
          const writer = sender.writable.getWriter();
          const encoder = new TextEncoder(); // TextEncoder is always UTF-8; the constructor takes no arguments
          let data = encoder.encode(text);
          await writer.write(data);
          console.log('writer close', await writer.close());
          const reader = transport.incomingUnidirectionalStreams.getReader();
          const result = await reader.read();
          console.log({
            result,
          });
          if (result.done) {
            console.log(result);
          }
          let transportStream = result.value;
          console.log({
            transportStream,
          });
          const { readable } = transportStream;
          const ac = new AudioContext({
            latencyHint: 0,
            sampleRate: 22050,
          });
          const msd = new MediaStreamAudioDestinationNode(ac, {
            channelCount: 1,
          });
          const { stream } = msd;
          const [track] = stream.getAudioTracks();
          const osc = new OscillatorNode(ac, { frequency: 0 });
          osc.connect(msd);
          osc.start();
          track.onmute = track.onunmute = track.onended = (e) => console.log(e);
          stream.oninactive = (e) => console.log(e);
          ac.onstatechange = (e) => {
            console.log(ac.state);
          };
          const processor = new MediaStreamTrackProcessor(track);
          const generator = new MediaStreamTrackGenerator('audio');
          const { writable } = generator;
          const { readable: audioReadable } = processor;
          const audioWriter = writable.getWriter();
          const ms = new MediaStream([generator]);

          document.querySelector('audio').srcObject = ms;
          const recorder = new MediaRecorder(ms);
          recorder.ondataavailable = ({data}) => console.log(URL.createObjectURL(data));
          recorder.start();
          let audioController;
          const rs = new ReadableStream({
            start(c) {
              return (audioController = c);
            },
          });
          const audioControllerReader = rs.getReader();
          const audioReader = audioReadable.getReader();
          // https://bugs.chromium.org/p/chromium/issues/detail?id=1174836
          // https://d27xp8zu78jmsf.cloudfront.net/demo/pure-audio-video6/audioworklet.js
          function Uint8ToFloat32(uint8Array_) {
            var int16Array_ = new Int16Array(uint8Array_.length / 2);
            for (var i = 0; i < int16Array_.length; i++) {
              int16Array_[i] =
                (uint8Array_[i * 2] & 0xff) +
                ((uint8Array_[i * 2 + 1] & 0xff) << 8);
            }
            return Int16ToFloat32(int16Array_, 0, int16Array_.length);
          }

          function Int16ToFloat32(inputArray, startIndex, length) {
            var output = new Float32Array(length - startIndex);
            for (var i = startIndex; i < length; i++) {
              var int_ = inputArray[i];
              // If the high bit is on, then it is a negative number, and actually counts backwards.
              // output[i - startIndex] = ((int_ + 32768) % 65536 - 32768) / 32768.0
              var float_ = int_ / 32768.0;
              if (float_ > 1) float_ = 1;
              if (float_ < -1) float_ = -1;
              // write from index 0 so the output is not left-padded with
              // zeros when startIndex > 0
              output[i - startIndex] = float_;
            }
            return output;
          }

          let index = 0;
          let arr = [];
          let headers = true;

          await Promise.all([
            audioReader.read().then(async function process({ value, done }) {
              if (document.querySelector('audio').currentTime === 0) {
                // avoid clipping first milliseconds of MediaStreamTrack audio
                return audioWriter
                  .write(value)
                  .then(() => audioReader.read().then(process));
              }
              if (done) {
                console.log({ done });
              }
              const {
                value: floats,
                done: audioControllerDone,
              } = await audioControllerReader.read();
              const { timestamp } = value;
              if (audioControllerDone) {
                console.log({ audioControllerDone });
              }
              if (audioControllerDone === false) {
                const buffer = new AudioBuffer({
                  numberOfChannels: 1,
                  length: floats.length,
                  sampleRate: 22050,
                });
                buffer.getChannelData(0).set(floats);
                const frame = new AudioFrame({ timestamp, buffer });
                return audioWriter.write(frame).then(() => {
                  frame.close();
                  return audioReader.read().then(process);
                });
              } else {
                console.log(
                  audioControllerDone,
                  audioReadable,
                  audioReader,
                  writable,
                  audioWriter
                );
                audioReader.releaseLock();
                audioReadable.cancel();
                audioWriter.releaseLock();
                // writable.abort() already errors the stream; returning the
                // uncalled writable.close method was a bug
                return writable.abort();
              }
            }),
            readable.pipeTo(
              new WritableStream({
                async start() {
                  console.log('writable start');
                },
                async write(value, c) {
                  let data;
                  if (headers === false) {
                    // #!/bin/bash
                    // espeak-ng --stdout "$1"
                    // 1 channel, 22050 sample rate
                    // omit WAV headers
                    data = Uint8ToFloat32(value.subarray(44));
                    headers = true;
                  } else {
                    data = Uint8ToFloat32(value);
                  }
                  for (let i = 0; i < data.length; i++) {
                    arr.push(data[i]);
                    if (arr.length === 220) {
                      audioController.enqueue(new Float32Array(arr.slice(0)));
                      arr = [];
                    }
                  }
                },
                close() {
                  console.log('done');
                  audioController.close();
                },
              })
            ),
          ]);

          await transport.close();
          track.stop();
          await ac.close();
          recorder.stop();
          return transport.closed
            .then((_) => {
              console.log('Connection closed normally.');
              return 'done';
            })
            .catch((e) => {
              console.error(e.message);
              console.trace();
            });
        } catch (e) {
          console.error(e);
          console.trace();
        }
      }

      document.querySelector('h1').onclick = async () => {
        webTransportBreakoutBox('a hunting we will go');
      };
    </script>
  </body>
</html>

Interestingly, the ReadableStream from a File (WAV) on disk behaves differently from a ReadableStream from WebTransport. I suspect the issue is that only N indexes are used from the TypedArrays returned by read() across different Arrays in sequential function calls, so the memory is not necessarily contiguous when read again internally to produce output, which results in periodic static within the output. The TypedArray conversion function might also have an issue; however, when File.stream() is substituted for the WebTransport stream the static is not emitted.
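If non-contiguous per-read() chunks are the culprit, one way to test that hypothesis is to copy the chunks into a single contiguous backing array before they are consumed. A minimal sketch; `coalesce` is a hypothetical helper name, not part of the demo above:

```javascript
// Hypothetical helper: copy the Float32Array chunks produced by successive
// read() calls into one contiguous Float32Array, so downstream consumers
// never read across chunk boundaries.
function coalesce(chunks) {
  const total = chunks.reduce((n, chunk) => n + chunk.length, 0);
  const out = new Float32Array(total); // single contiguous allocation
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}
```

If the static disappears when playback is fed from the coalesced array, the chunk boundaries (rather than the conversion function) are the source of the artifacts.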

stream_pcm.zip

I am not certain why timestamp is necessary, given we can stream with AudioWorklet without the need to set timestamps for each output frame.


guest271314 commented on August 24, 2024

One issue is that with live input streams the Uint8Array from ReadableStreamDefaultReader.read() is not guaranteed to have an even length when that Uint8Array (or Int8Array, or its buffer) is then passed to Uint16Array or Int16Array to convert to float values.

Consider the function Uint8ToFloat32 at https://d27xp8zu78jmsf.cloudfront.net/demo/pure-audio-video6/audioworklet.js

      function Uint8ToFloat32(uint8Array_) {
        var int16Array_ = new Int16Array(uint8Array_.length / 2);
        for (let i = 0; i < int16Array_.length; i++) {
          int16Array_[i] =
            (uint8Array_[i * 2] & 0xff) +
            ((uint8Array_[i * 2 + 1] & 0xff) << 8);
        }
        return Int16ToFloat32(int16Array_, 0, int16Array_.length);
      }

when the input Uint8Array from a ReadableStreamDefaultReader.read() call is a single TypedArray, or multiple TypedArrays whose lengths are even numbers, e.g.,

61848

or

441180

the audio is output effectively as expected, with minimal artifacts.

However, when the input is a live stream where the Uint8Array length is arbitrary, and potentially an odd length,

3761
8777
5020
16289
17765
10236

the audio output can include static interference, evidently at the locations in the timeline where the Uint8Arrays having an odd length are converted to Int16Array and then Float32Array; consider

var int16Array_ = new Int16Array(uint8Array_.length / 2);

where constructing an Int16Array from a Uint8Array with an odd length copies the values element-wise and will not throw

var len = 3761;
var uint8 = new Uint8Array(len);
var int16 = new Int16Array(uint8); // will not throw

passing the buffer will throw

var len = 3761;
var uint8 = new Uint8Array(len);
var int16 = new Int16Array(uint8.buffer); // will throw

VM578:3 Uncaught RangeError: byte length of Int16Array should be a multiple of 2
at new Int16Array (<anonymous>)
at <anonymous>:3:13

we lose data

var int16Array_ = new Int16Array(uint8.length / 2);
console.assert(int16Array_.length * 2 === uint8.length, [int16Array_.length * 2 < uint8.length, int16Array_.length]) // VM1132:1 Assertion failed: (2) [true, 1880]

resulting in static interference in audio output.

A tentative solution that I have previously attempted is to "carry over" the odd byte; however, artifacts are still evident in playback at the locations in the timeline where we need to create and slice multiple TypedArrays to carry over a single value.

        readable.pipeTo(
          new WritableStream({
            async start() {
              console.log('writable start');
            },
            async write(value, c) {
              console.log(value.length);
              let data;
              if (float && float instanceof Uint8Array) {
                // prepend the single byte carried over from the previous chunk
                data = new Uint8Array([...float, ...value]);
                float = void 0;
              } else {
                data = value;
              }
              if (data.length % 2 === 1) {
                float = data.slice(-1);
                data = data.slice(0, data.length - 1);
              }
              if (headers === false) {
                data = Uint8ToFloat32(data.slice(44));
                headers = true;
              } else {
                data = Uint8ToFloat32(data);
              }
              for (let i = 0; i < data.length; i+=220) {
                 audioController.enqueue(data.slice(i, i + 220));
              }
              // ...
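For reference, a standalone sketch of the carry-over idea that keeps the leftover odd byte as a raw byte and converts only complete little-endian 16-bit pairs; `pcm16ChunkToFloat32` and `leftover` are hypothetical names, not part of the snippet above:

```javascript
// A single carried byte (Uint8Array of length 1), or null.
let leftover = null;

// Convert one PCM chunk (16-bit signed little-endian samples, arbitrary
// byte length) to Float32Array, carrying any trailing odd byte over to
// the next call so no data is dropped.
function pcm16ChunkToFloat32(value /* Uint8Array */) {
  let data = value;
  if (leftover) {
    // prepend the byte left over from the previous chunk
    data = new Uint8Array(leftover.length + value.length);
    data.set(leftover, 0);
    data.set(value, leftover.length);
    leftover = null;
  }
  if (data.length % 2 === 1) {
    leftover = data.slice(-1); // carry the odd byte to the next call
    data = data.subarray(0, data.length - 1);
  }
  const out = new Float32Array(data.length / 2);
  for (let i = 0; i < out.length; i++) {
    // assemble a little-endian int16, sign-extend, then scale to [-1, 1]
    const int16 = ((data[i * 2] | (data[i * 2 + 1] << 8)) << 16) >> 16;
    out[i] = Math.max(-1, Math.min(1, int16 / 32768));
  }
  return out;
}
```

The key difference from the attempt above is that the carried value is a raw byte, not a decoded float, so no extra Float32Array needs to be created or sliced at the chunk boundary.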

A solution that does output the expected result with AudioWorklet is using a single WebAssembly.Memory instance to write data to a contiguous memory allocation that can grow()
when necessary https://github.com/guest271314/webtransport/blob/main/webTransportAudioWorkletWebAssemblyMemoryGrow.js#L190

     async write(value, controller) {
        console.log(value, value.byteLength, memory.buffer.byteLength);
        if (readOffset + value.byteLength > memory.buffer.byteLength) {
          console.log('before grow', memory.buffer.byteLength);
          memory.grow(3);
          console.log('after grow', memory.buffer.byteLength);
        }
        const uint8_sab = new Uint8Array(memory.buffer);
        let i = 0;
        if (!init) {
          init = true;
          i = 44;
        }
        for (; i < value.buffer.byteLength; i++, readOffset++) {
          uint8_sab[readOffset] = value[i];
        }
        // ...

during the live stream.

For this feature request, the goal is a means to stream raw PCM, which in general will initially be accessed in the form of a Uint8Array, while avoiding the creation of multiple TypedArrays to carry over the odd value and the potential allocation of additional memory for the same value(s).


guest271314 commented on August 24, 2024

Based on testing, there does not appear to be any way to avoid glitches, gaps, and static interference ("fragmentation") when streaming audio using multiple TypedArrays (potentially having an odd length) as input without writing the data to a contiguous ArrayBuffer instance (writing to a Blob or File takes substantially more time in practice).

Memory limits in webassembly https://stackoverflow.com/a/40425252

For asm.js it was difficult to know how the ArrayBuffer was going to be used, and on some 32-bit platforms you often ran into process fragmentation which made it hard to allocate enough contiguous space in the process' virtual memory (the ArrayBuffer must be contiguous in the browser process' virtual address space, otherwise you'd have a huge perf hit).

...

WebAssembly is backed by WebAssembly.Memory which is a special type of ArrayBuffer. This means that a WebAssembly implementation can be clever about that ArrayBuffer. On 32-bit there's not much to do: if you run out of contiguous address space then the VM can't do much. But on 64-bit platforms there's plenty of address space. The browser implementation can choose to prevent you from creating too many WebAssembly.Memory instances (allocating virtual memory is almost free, but not quite), but you should be able to get a few 4GiB allocations. Note that the browser will only allocate that space virtually, and commit physical addresses for the minimum number of pages you said you need. Afterwards it'll only allocate physically when you use grow_memory. That could fail (physical memory is about as abundant as the amount of RAM, give or take swap space), but it's much more predictable.

For StreamNode to work properly, an internal form of WebAssembly.Memory can be used to write input streams, clearing the memory used immediately after each frame is read.
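A minimal sketch of that idea, assuming 64 KiB WebAssembly pages; `append` is a hypothetical helper name, and real code would also track a read offset and clear consumed regions:

```javascript
// Growable contiguous byte buffer backed by WebAssembly.Memory.
// One page is 64 KiB; grow(n) extends the buffer by n pages, and the
// backing ArrayBuffer remains contiguous.
const memory = new WebAssembly.Memory({ initial: 1 }); // 65536 bytes
let writeOffset = 0;

function append(chunk /* Uint8Array */) {
  // grow until the chunk fits past the current write offset
  while (writeOffset + chunk.byteLength > memory.buffer.byteLength) {
    memory.grow(1);
  }
  // memory.buffer is a fresh ArrayBuffer after grow(), so view it here
  new Uint8Array(memory.buffer).set(chunk, writeOffset);
  writeOffset += chunk.byteLength;
}

append(new Uint8Array(70000)); // forces one grow: 65536 -> 131072 bytes
```

Because every chunk lands at a byte offset in one contiguous buffer, odd-length chunks need no carry-over logic at all; the 16-bit sample boundaries are determined by the absolute offset, not by the chunk boundaries.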


guest271314 commented on August 24, 2024

> This is largely solved by https://github.com/w3c/mediacapture-insertable-streams and https://github.com/w3c/webrtc-insertable-streams, and to the extent applicable, without making the tab unresponsive, WebCodecs AudioEncoder and AudioDecoder methods.

WebCodecs does not help with this.

The Insertable Streams (Breakout Box; Media Transform API) can be utilized to achieve the expected result.

