pion / mediadevices

Go implementation of the MediaDevices API.

Home Page: https://pion.ly/

License: MIT License

Go 66.70% C++ 4.55% C 26.01% Dockerfile 0.34% Objective-C 1.74% Shell 0.02% CMake 0.05% Makefile 0.59%
audio-call codec driver face-recognition go golang livestream machine-learning mediadevices mediadevices-api p2p rtp streaming video-call voip webrtc

mediadevices's People

Contributors

adamroach, aljanabim, andrein, at-wat, bazile-clyde, digitalix, edaniels, emrysmyrddin, f-fl0, hexbabe, infamy, kamatama41, kim-mishra, kw-m, lherman-cs, martha-johnston, neversi, qiulin, renovate-bot, renovate[bot], sean-der, seanavery, stv0g, tarrencev, wawesomenogui, zjzhang-cn, zyxar


mediadevices's Issues

Support ranged media constraints

Pixel format is currently scored by exact match only.
For example, I want to select either YUY2 or UYVY, but I do not want to use JPEG.

// Video represents a video's properties
type Video struct {
	Width, Height int
	FrameRate     float32
	FrameFormat   frame.Format
}

// Audio represents an audio's properties
type Audio struct {
	ChannelCount int
	Latency      time.Duration
	SampleRate   int
	SampleSize   int
}

Get encoding media properties from Transformer

Currently, the source media properties are passed directly to the encoder, but a VideoTransformer or AudioTransformer may change them.
For example, a VideoTransformer may change the frame rate and size, and an AudioTransformer may change the number of channels.

Add support for non-YCbCr input to ToI420

As of now, our video encoders, openh264 and vpx, only support the I420 format, so we provide the ToI420 converter to handle other image formats.

However, ToI420 can currently only handle YCbCr images. The motivation for supporting more image formats is:

  1. It allows more flexibility for the input.
  2. Since users can transform the video through VideoTransform, a flexible input also opens the door to libraries such as https://godoc.org/github.com/disintegration/imaging, which relies heavily on NRGBA.

Fallback codec

Select a fallback codec implementation if the first one fails to initialize.
For example, with vaapi (higher priority) and vpx: try vaapi first, and if the environment doesn't have video acceleration hardware, use vpx.

It could be something like:

codec.Register(webrtc.VP8, codec.VideoEncoderFallbacks(
	codec.VideoEncoderBuilder(vaapi.NewVP8Encoder),
	codec.VideoEncoderBuilder(vpx.NewVP8Encoder),
))

One problem is that user code has no way to know which implementation was used, so it's difficult to pass codec-specific parameters when using a fallback codec.
Perhaps add prop.Codec.ImplementationName string and pass multiple CodecParams keyed by ImplementationName as map[string]interface{}?

Define flexible audio data interface

Add something like image.Image to support variable channel counts and sample formats.
It would make conversion between channel counts, sampling rates, and sample formats easy.

Read from camera times out on go1.14rc1

It's not a problem for now, but I would like to leave a note.

The following test just checks the OnEnded callback.

package main

import (
	"testing"
	"time"

	"github.com/pion/mediadevices"
	_ "github.com/pion/mediadevices/pkg/codec/vpx"
	"github.com/pion/mediadevices/pkg/frame"
	"github.com/pion/webrtc/v2"
)

const videoCodecName = webrtc.VP8 // assumption: the original snippet references this name without defining it

func TestMain(t *testing.T) {
	configs := map[string]webrtc.Configuration{
		"WithSTUN": {
			ICEServers: []webrtc.ICEServer{
				{URLs: []string{"stun:stun.l.google.com:19302"}},
			},
		},
		"WithoutSTUN": {
			ICEServers: []webrtc.ICEServer{},
		},
	}
	for name, config := range configs {
		t.Run(name, func(t *testing.T) {
			peerConnection, err := webrtc.NewPeerConnection(config)
			if err != nil {
				t.Fatal(err)
			}

			md := mediadevices.NewMediaDevices(peerConnection)

			s, err := md.GetUserMedia(mediadevices.MediaStreamConstraints{
				Video: func(c *mediadevices.MediaTrackConstraints) {
					c.CodecName = videoCodecName
					c.FrameFormat = frame.FormatI420
					c.Enabled = true
					c.Width = 640
					c.Height = 480
				},
			})
			if err != nil {
				t.Fatal(err)
			}
			trackers := s.GetTracks()
			if len(trackers) != 1 {
				t.Fatal("wrong number of the tracks")
			}
			peerConnection.AddTrack(trackers[0].Track())
			trackers[0].OnEnded(func(err error) {
				t.Error(err)
			})
			time.Sleep(10 * time.Second)
			trackers[0].OnEnded(func(err error) {})
			peerConnection.Close()
			trackers[0].Stop()
			time.Sleep(time.Second)
		})
	}
}

With the following patch, which treats a camera read timeout as an error:

diff --git a/pkg/driver/camera/camera_linux.go b/pkg/driver/camera/camera_linux.go
index cee43b2..f7202f8 100644
--- a/pkg/driver/camera/camera_linux.go
+++ b/pkg/driver/camera/camera_linux.go
@@ -4,6 +4,7 @@ package camera
 import "C"
 
 import (
+       "errors"
        "image"
        "io"
 
@@ -97,6 +98,7 @@ func (c *camera) VideoRecord(p prop.Media) (video.Reader, error) {
                        switch err.(type) {
                        case nil:
                        case *webcam.Timeout:
+                               return nil, errors.New("read timeout")
                                continue
                        default:
                                // Camera has been stopped.

It fails only with a STUN server, and only on go1.14rc1.

$ go1.14rc1 test . -v
=== RUN   TestMain
=== RUN   TestMain/WithSTUN
    TestMain/WithSTUN: main_test.go:51: read timeout
=== RUN   TestMain/WithoutSTUN
--- FAIL: TestMain (32.97s)
    --- FAIL: TestMain/WithSTUN (21.91s)
    --- PASS: TestMain/WithoutSTUN (11.06s)
FAIL
FAIL	github.com/pion/mediadevices/examples/simple	32.986s
FAIL
$ go1.13 test . -v
=== RUN   TestMain
=== RUN   TestMain/WithSTUN
=== RUN   TestMain/WithoutSTUN
--- PASS: TestMain (27.74s)
    --- PASS: TestMain/WithSTUN (16.67s)
    --- PASS: TestMain/WithoutSTUN (11.07s)
PASS
ok  	github.com/pion/mediadevices/examples/simple	27.756s

I will check it again once the next RC of Go 1.14 becomes available.

Improve Windows drivers

A follow-up of #83 and #89

Possible improvements

Microphone

  • Enumerate devices
  • Detect device disconnection and return EOF
  • Get actual properties
  • Migrate to newer API? (DirectSound, Media Foundation)

Camera

  • Support other pixel formats
  • Detect device disconnection and return EOF
  • Migrate to newer API? (Media Foundation)

Any other improvements are welcome!

Redesign GetUserMedia API

As of now, GetUserMedia accepts a single parameter, MediaStreamConstraints defined as follows:

func (m *mediaDevices) GetUserMedia(constraints MediaStreamConstraints) (MediaStream, error) {
   ...
}

type MediaStreamConstraints struct {
	Audio MediaOption
	Video MediaOption
}

// MediaTrackConstraints represents https://w3c.github.io/mediacapture-main/#dom-mediatrackconstraints
type MediaTrackConstraints struct {
	prop.Media
	Enabled bool
	// VideoEncoderBuilders are codec builders that are used for encoding the video
	// and later being used for sending the appropriate RTP payload type.
	//
	// If one encoder builder fails to build the codec, the next builder will be used,
	// repeating until a codec builds. If no builders build successfully, an error is returned.
	VideoEncoderBuilders []codec.VideoEncoderBuilder
	// AudioEncoderBuilders are codec builders that are used for encoding the audio
	// and later being used for sending the appropriate RTP payload type.
	//
	// If one encoder builder fails to build the codec, the next builder will be used,
	// repeating until a codec builds. If no builders build successfully, an error is returned.
	AudioEncoderBuilders []codec.AudioEncoderBuilder
	// VideoTransform will be used to transform the video that's coming from the driver.
	// So, basically it'll look like the following: driver -> VideoTransform -> codec
	VideoTransform video.TransformFunc
	// AudioTransform will be used to transform the audio that's coming from the driver.
	// So, basically it'll look like the following: driver -> AudioTransform -> codec
	AudioTransform audio.TransformFunc
}

type MediaOption func(*MediaTrackConstraints)

From the type definitions above, we see that MediaTrackConstraints is used for unrelated concerns such as:

  • VideoEncoderBuilders
  • AudioEncoderBuilders
  • VideoTransform
  • AudioTransform

I think we should move them out of MediaTrackConstraints because:

  1. It's less confusing for the API user
  2. It'll make interop with JS easier later

The purpose of this issue thread is to talk about possible designs that can solve the problems above.

Redesign codec

Problem

In order to specify what codecs should be used, users need to:

  1. Import a specific codec for its side effect of registering itself with the registrar:
import (
   ...
   _ "github.com/pion/mediadevices/pkg/codec/openh264" // This is required to register h264 video encoder
   ...
)
  2. Specify a proper codec name, from the following possible values, to GetUserMedia:
// From github.com/pion/webrtc
package webrtc
const (
	PCMU = "PCMU"
	PCMA = "PCMA"
	G722 = "G722"
	Opus = "OPUS"
	VP8  = "VP8"
	VP9  = "VP9"
	H264 = "H264"
)

// From example
md.GetUserMedia(mediadevices.MediaStreamConstraints{
	Audio: func(c *mediadevices.MediaTrackConstraints) {
		c.CodecName = webrtc.Opus
		c.Enabled = true
		c.BitRate = 32000 // 32kbps
	},
	Video: func(c *mediadevices.MediaTrackConstraints) {
		c.CodecName = webrtc.H264
		c.FrameFormat = frame.FormatYUY2
		c.Enabled = true
		c.Width = 640
		c.Height = 480
		c.BitRate = 100000 // 100kbps
	},
})

The points above show that the current design (importing packages purely for the side effect of registering a codec) requires implicit knowledge from users: they need to know which codec name each import registers, because they have to pass that same CodecName to GetUserMedia.

Not only is this design confusing and error-prone, it's also inflexible and doesn't scale. What if we want to specify codec-specific parameters (#106)? What about a fallback mechanism (#108), e.g. falling back to a software encoder when hardware acceleration is not available?

These needs seem solvable with an empty interface, but the problem with empty interfaces is that we lose static type checking.

Add libva based codecs

libva supports hardware-accelerated encoding/decoding of MPEG-2, MPEG-4 ASP/H.263, MPEG-4 AVC/H.264, VC-1/WMV3, JPEG, HEVC/H.265, VP8, and VP9.

For example, Intel CPUs have included a VP8/VP9 accelerator since Kaby Lake.

Pass also requested properties to Video/AudioRecord

Currently, only the selected prop is passed to VideoRecord/AudioRecord.

For example, in the screen capture driver, FrameRate is not discrete.
It would be nice to pass both the selected prop and the requested prop, so the driver can read such parameters.

webcam broadcasting

Summary

Broadcast webcam stream from server to browser.

Motivation

This feature definitely should be in the examples section.
It has many use cases, from home security to live video streaming, and is a must-have feature nowadays.

Alternatives

Alternatives: Python (aiortc), Java (Kurento), linux-projects (UV4L)

Additional context

This feature is very much in demand, but the alternatives have limitations: Kurento requires a Java VM to be installed, aiortc can be difficult to set up due to Python versioning and package support, and UV4L is unstable, fairly opinionated, and not open source.
So this is a case where Go will shine.

Manage demo page in this repository

jsfiddle can load code from a GitHub repository, like:
https://jsfiddle.net/gh/get/library/pure/pion/example-webrtc-applications/tree/master/save-to-webm/jsfiddle
https://github.com/pion/example-webrtc-applications/tree/master/save-to-webm/jsfiddle

The demo page contains an extra transceiver.

// Offer to receive 1 audio, and 2 video tracks
pc.addTransceiver('audio', {'direction': 'recvonly'})
pc.addTransceiver('video', {'direction': 'recvonly'})
pc.addTransceiver('video', {'direction': 'recvonly'})

It would be cleaner to have a separate audio/video demo and a video-only demo.

Internally set driver priority

Currently, audio device selection is random (since Go maps are unordered).
The audio source list may contain a monitor device, which is a loopback of an audio output.
Giving slightly higher priority to non-monitor devices and/or the system default device would stabilize selection and suit typical use.

Add a way to control codec encoder parameters

In the Web API, codec bitrates are controlled via SDP, like:

a=mid:audio
b=AS:000

but that is too complicated for this package.

Configuring them directly in GetUserMedia would be better for us. For example:

  s, err := md.GetUserMedia(mediadevices.MediaStreamConstraints{
    Audio: func(c *mediadevices.MediaTrackConstraints) {
      c.Codec = webrtc.Opus
      c.BitRate = 32000 // 32kbps
      ...
    },
    Video: func(c *mediadevices.MediaTrackConstraints) {
      c.Codec = videoCodecName
      c.BitRate = 100000 // 100kbps
      c.KeyFrameInterval = 100
      ...
    },
  })

or

  s, err := md.GetUserMedia(mediadevices.MediaStreamConstraints{
    Audio: func(c *mediadevices.MediaTrackConstraints, c2 *mediadevices.CodecParameters) {
      c.Codec = webrtc.Opus
      c2.BitRate = 32000 // 32kbps
      ...
    },
    Video: func(c *mediadevices.MediaTrackConstraints, c2 *mediadevices.CodecParameters) {
      c.Codec = videoCodecName
      c2.BitRate = 100000 // 100kbps
      c2.KeyFrameInterval = 100
      ...
    },
  })

Add fast decodeYUY2

Profile of #102

      flat  flat%   sum%        cum   cum%
     0.52s 33.77% 33.77%      0.52s 33.77%  runtime.cgocall
     0.47s 30.52% 64.29%      0.53s 34.42%  github.com/pion/mediadevices/pkg/frame.decodeYUY2
     0.09s  5.84% 70.13%      0.09s  5.84%  runtime.usleep

decodeYUY2 occupies almost the same amount of CPU time as hardware-accelerated VP8 encoding.

Reduce mediadevices complexity

As of now, mediadevices uses many interfaces.

While interfaces make the design very flexible, that benefit isn't free. Some of the downsides:

  • They create boilerplate and reduce maintainability, because interfaces carry no implementation details; we always end up with one type for the interface and another for the struct.
  • They increase API complexity when overused. Every interface adds a layer of indirection to the actual definition, which is fine with only a few interfaces, but the problem quickly arises when there are many of them spread all over the package.
  • They reduce docs readability. Similar to the previous point: the extra layer of indirection demands more effort from the reader.

While I've laid out some downsides of interfaces above, I still think they're great when used appropriately. So we should replace some of the interfaces with structs, removing the ones that don't actually need the flexibility.

In my opinion, we should convert MediaDevices and MediaStream interfaces to structs.

Note: Hopefully, when pion/webrtc v3 is ready, the Tracker and LocalTrack interfaces can be merged into pion/webrtc.

Adapters should be grouped by their categories

As of now, the microphone and camera adapters live in the same folder, driver. To make things more organized and modular, it would be good to group them by category.

Before:

driver
-- microphone_linux.go
-- camera_linux.go

After:

driver
-- camera
   -- camera_linux.go
-- microphone
   -- microphone_linux.go

This way, the separation between devices is clear, and it puts less cognitive load on driver implementors.

Implement faster ToI420

A pure Go implementation of this through the image.Image interface incurs a huge amount of overhead.

Camera always times out on Go 1.14 Linux

I haven't dug into the details yet.
The same source code works on go1.13.8.

  • Linux host environment: Linux 5.4.19-100.fc30.x86_64 #1 SMP Tue Feb 11 22:27:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Go version: go version go1.14 linux/amd64

Add device/codec error handler

In the Web API, the MediaStreamTrack.ended event is fired and the MediaStreamTrack.onended handler is called on such errors.
https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamTrack/onended

This event occurs when the track will no longer provide data to the stream for any reason, including the end of the media input being reached, the user revoking needed permissions, the source device being removed, or the remote peer ending a connection.

mediadevices/track.go

Lines 115 to 124 in e4da8fa

n, err = vt.encoder.Read(buff)
if err != nil {
	if e, ok := err.(*mio.InsufficientBufferError); ok {
		buff = make([]byte, 2*e.RequiredSize)
		continue
	}
	// TODO: better error handling
	panic(err)
}

mediadevices/track.go

Lines 181 to 185 in e4da8fa

n, err := t.encoder.Read(buff)
if err != nil {
	// TODO: better error handling
	panic(err)
}

Stream input from remote track

Receive RTP from a WebRTC remote track, decode, process (via Audio/VideoTransform), and re-encode.
This enables tiling (or picture-in-picture) of multiple streams into one stream, as well as audio mixing.

How to get audio from a microphone

Hello

I want to make a video call without a browser by connecting a microphone and a webcam to my Raspberry Pi.

The SFU is Janus.

I'm modifying the video-room example and would like some advice.

I want to set the microphone input as an audio track in pion. How can I do that?
Do you have any samples?

raspivid support

Summary

Add one more example that uses raspivid instead of gstreamer, to provide hardware encoding on the Raspberry Pi.

Motivation

Since the Raspberry Pi is widely used in many projects, it would be nice to have a Go WebRTC implementation for it.

Alternatives

There are no alternatives yet as compelling as Go & WebRTC.

Additional context

Just sending video from a Raspberry Pi to a browser, using raspivid for hardware encoding.

Decouple from webrtc stuff

I think we should try to decouple mediadevices from WebRTC so that it's more generic and useful to a wider audience. Looking at the original MediaDevices API definition from Mozilla, it is never described as being solely for WebRTC:

The MediaDevices interface provides access to connected media input devices like cameras and microphones, as well as screen sharing. In essence, it lets you obtain access to any hardware source of media data.

Reference: https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices

Inserting custom image processor

It would be useful if a custom image processor (func CustomImageProcessor(r video.Reader) video.Reader) could be inserted between the device and the codec.

Personally, I would like to use this package as a replacement for gstreamer; a clock overlay on the image is what I want to insert this way.

Update codec builder to support rate limiting while running

As of now, VideoEncoderBuilder and AudioEncoderBuilder only return io.ReadCloser and error:

BuildAudioEncoder(r audio.Reader, p prop.Media) (io.ReadCloser, error)
BuildVideoEncoder(r video.Reader, p prop.Media) (io.ReadCloser, error)

While returning io.ReadCloser is very idiomatic, it is not enough for our needs. The main limitation is rate limiting: we can't adjust codec parameters on the fly, decreasing or increasing the bitrate as needed based on the current network speed and quality.

So, instead of returning io.ReadCloser, it's better to return a new interface that embeds io.ReadCloser and adds a method to update the BaseParams:

package codec

import "io"

type ReadCloser interface {
	io.ReadCloser
	Update(params BaseParams) error
}
