pion / mediadevices

Go implementation of the MediaDevices API.

Home Page: https://pion.ly/

License: MIT License

Go 66.70% C++ 4.55% C 26.01% Dockerfile 0.34% Objective-C 1.74% Shell 0.02% CMake 0.05% Makefile 0.59%
audio-call codec driver face-recognition go golang livestream machine-learning mediadevices mediadevices-api p2p rtp streaming video-call voip webrtc

mediadevices's People

Contributors

adamroach, aljanabim, andrein, at-wat, bazile-clyde, digitalix, edaniels, emrysmyrddin, f-fl0, hexbabe, infamy, kamatama41, kim-mishra, kw-m, lherman-cs, martha-johnston, neversi, qiulin, renovate-bot, renovate[bot], sean-der, seanavery, stv0g, tarrencev, wawesomenogui, zjzhang-cn, zyxar


mediadevices's Issues

Support ranged media constraints

Pixel format is currently scored by exact match only.
For example, I want to select either YUY2 or UYVY, but I do not want to use JPEG.

// Video represents a video's properties
type Video struct {
	Width, Height int
	FrameRate     float32
	FrameFormat   frame.Format
}

// Audio represents an audio's properties
type Audio struct {
	ChannelCount int
	Latency      time.Duration
	SampleRate   int
	SampleSize   int
}

Get encoding media properties from Transformer

Currently, the source media properties are passed directly to the encoder, but a VideoTransformer or AudioTransformer may change them.
For example, a VideoTransformer may change the frame rate and size, and an AudioTransformer may change the number of channels.

Add support for non-YCbCr input to ToI420

As of now, our video encoders, openh264 and vpx, only support the I420 format, so we provide the ToI420 converter to handle other image formats.

However, ToI420 can currently only handle YCbCr images. The motivation for supporting more image formats is:

  1. It allows more flexibility for the input.
  2. Since users can transform the video through VideoTransform, a flexible input also opens the door to libraries such as https://godoc.org/github.com/disintegration/imaging, which relies heavily on NRGBA.

Fallback codec

Select a fallback codec implementation if the first one fails to initialize.
For example, with vaapi (higher priority) and vpx: try vaapi first, and if the environment doesn't have video acceleration hardware, use vpx.

It could be something like:

codec.Register(webrtc.VP8, codec.VideoEncoderFallbacks(
	codec.VideoEncoderBuilder(vaapi.NewVP8Encoder),
	codec.VideoEncoderBuilder(vpx.NewVP8Encoder),
))

One problem is that user code has no way to know which implementation was used, so it's difficult to pass codec-specific parameters when using a fallback codec.
Perhaps add prop.Codec.ImplementationName string and pass multiple CodecParams keyed by ImplementationName as map[string]interface{}?

Define flexible audio data interface

Add something like image.Image to support variable channel counts and sample formats.
It would make conversion between channel counts, sampling rates, and sample formats easy.

Read from camera times out on go1.14rc1

It's not a problem for now, but I would like to leave a note.

The following test just checks the OnEnded callback.

package main

import (
	"testing"
	"time"

	"github.com/pion/mediadevices"
	_ "github.com/pion/mediadevices/pkg/codec/vpx"
	"github.com/pion/mediadevices/pkg/frame"
	"github.com/pion/webrtc/v2"
)

const videoCodecName = webrtc.VP8 // assumption: the original snippet references this name without defining it

func TestMain(t *testing.T) {
	configs := map[string]webrtc.Configuration{
		"WithSTUN": {
			ICEServers: []webrtc.ICEServer{
				{URLs: []string{"stun:stun.l.google.com:19302"}},
			},
		},
		"WithoutSTUN": {
			ICEServers: []webrtc.ICEServer{},
		},
	}
	for name, config := range configs {
		t.Run(name, func(t *testing.T) {
			peerConnection, err := webrtc.NewPeerConnection(config)
			if err != nil {
				t.Fatal(err)
			}

			md := mediadevices.NewMediaDevices(peerConnection)

			s, err := md.GetUserMedia(mediadevices.MediaStreamConstraints{
				Video: func(c *mediadevices.MediaTrackConstraints) {
					c.CodecName = videoCodecName
					c.FrameFormat = frame.FormatI420
					c.Enabled = true
					c.Width = 640
					c.Height = 480
				},
			})
			if err != nil {
				t.Fatal(err)
			}
			trackers := s.GetTracks()
			if len(trackers) != 1 {
				t.Fatal("wrong number of the tracks")
			}
			peerConnection.AddTrack(trackers[0].Track())
			trackers[0].OnEnded(func(err error) {
				t.Error(err)
			})
			time.Sleep(10 * time.Second)
			trackers[0].OnEnded(func(err error) {})
			peerConnection.Close()
			trackers[0].Stop()
			time.Sleep(time.Second)
		})
	}
}

With the following patch, which treats a camera read timeout as an error:

diff --git a/pkg/driver/camera/camera_linux.go b/pkg/driver/camera/camera_linux.go
index cee43b2..f7202f8 100644
--- a/pkg/driver/camera/camera_linux.go
+++ b/pkg/driver/camera/camera_linux.go
@@ -4,6 +4,7 @@ package camera
 import "C"
 
 import (
+       "errors"
        "image"
        "io"
 
@@ -97,6 +98,7 @@ func (c *camera) VideoRecord(p prop.Media) (video.Reader, error) {
                        switch err.(type) {
                        case nil:
                        case *webcam.Timeout:
+                               return nil, errors.New("read timeout")
                                continue
                        default:
                                // Camera has been stopped.

It fails only with a STUN server, and only on go1.14rc1.

$ go1.14rc1 test . -v
=== RUN   TestMain
=== RUN   TestMain/WithSTUN
    TestMain/WithSTUN: main_test.go:51: read timeout
=== RUN   TestMain/WithoutSTUN
--- FAIL: TestMain (32.97s)
    --- FAIL: TestMain/WithSTUN (21.91s)
    --- PASS: TestMain/WithoutSTUN (11.06s)
FAIL
FAIL	github.com/pion/mediadevices/examples/simple	32.986s
FAIL
$ go1.13 test . -v
=== RUN   TestMain
=== RUN   TestMain/WithSTUN
=== RUN   TestMain/WithoutSTUN
--- PASS: TestMain (27.74s)
    --- PASS: TestMain/WithSTUN (16.67s)
    --- PASS: TestMain/WithoutSTUN (11.07s)
PASS
ok  	github.com/pion/mediadevices/examples/simple	27.756s

I will check it again once the next RC of Go 1.14 becomes available.

Improve Windows drivers

A follow-up of #83 and #89

Possible improvements

Microphone

  • Enumerate devices
  • Detect device disconnection and return EOF
  • Get actual properties
  • Migrate to newer API? (DirectSound, Media Foundation)

Camera

  • Support other pixel formats
  • Detect device disconnection and return EOF
  • Migrate to newer API? (Media Foundation)

Any other improvements are welcome!

Redesign GetUserMedia API

As of now, GetUserMedia accepts a single parameter, MediaStreamConstraints defined as follows:

func (m *mediaDevices) GetUserMedia(constraints MediaStreamConstraints) (MediaStream, error) {
   ...
}

type MediaStreamConstraints struct {
	Audio MediaOption
	Video MediaOption
}

// MediaTrackConstraints represents https://w3c.github.io/mediacapture-main/#dom-mediatrackconstraints
type MediaTrackConstraints struct {
	prop.Media
	Enabled bool
	// VideoEncoderBuilders are codec builders that are used for encoding the video
	// and later being used for sending the appropriate RTP payload type.
	//
	// If one encoder builder fails to build the codec, the next builder will be used,
	// repeating until a codec builds. If no builders build successfully, an error is returned.
	VideoEncoderBuilders []codec.VideoEncoderBuilder
	// AudioEncoderBuilders are codec builders that are used for encoding the audio
	// and later being used for sending the appropriate RTP payload type.
	//
	// If one encoder builder fails to build the codec, the next builder will be used,
	// repeating until a codec builds. If no builders build successfully, an error is returned.
	AudioEncoderBuilders []codec.AudioEncoderBuilder
	// VideoTransform will be used to transform the video that's coming from the driver.
	// So, basically it'll look like the following: driver -> VideoTransform -> codec
	VideoTransform video.TransformFunc
	// AudioTransform will be used to transform the audio that's coming from the driver.
	// So, basically it'll look like the following: driver -> AudioTransform -> codec
	AudioTransform audio.TransformFunc
}

type MediaOption func(*MediaTrackConstraints)

From the type definitions above, we see that MediaTrackConstraints is used for unrelated concerns such as:

  • VideoEncoderBuilders
  • AudioEncoderBuilders
  • VideoTransform
  • AudioTransform

I think we should move them out of MediaTrackConstraints because:

  1. It's less confusing for the API user
  2. It'll make interop with JS easier later

The purpose of this issue thread is to talk about possible designs that can solve the problems above.

Redesign codec

Problem

In order to specify what codecs should be used, users need to:

  1. Import a specific codec for its side effect of registering itself with the registrar:
import (
   ...
   _ "github.com/pion/mediadevices/pkg/codec/openh264" // This is required to register h264 video encoder
   ...
)
  2. Specify a proper codec name, from the following possible values, to GetUserMedia:
// From github.com/pion/webrtc
package webrtc
const (
	PCMU = "PCMU"
	PCMA = "PCMA"
	G722 = "G722"
	Opus = "OPUS"
	VP8  = "VP8"
	VP9  = "VP9"
	H264 = "H264"
)

// From example
md.GetUserMedia(mediadevices.MediaStreamConstraints{
	Audio: func(c *mediadevices.MediaTrackConstraints) {
		c.CodecName = webrtc.Opus
		c.Enabled = true
		c.BitRate = 32000 // 32kbps
	},
	Video: func(c *mediadevices.MediaTrackConstraints) {
		c.CodecName = webrtc.H264
		c.FrameFormat = frame.FormatYUY2
		c.Enabled = true
		c.Width = 640
		c.Height = 480
		c.BitRate = 100000 // 100kbps
	},
})

The points above show that the current design (importing packages purely for the side effect of registering a codec) requires implicit knowledge from users: they need to know which codec name each import registers, because they have to pass that same CodecName to GetUserMedia.

Not only is this design confusing and error-prone, it's also inflexible and doesn't scale. What if we want to specify codec-specific parameters (#106)? What about a fallback mechanism (#108), e.g. falling back to a software encoder when hardware acceleration is not available?

These needs seem solvable with an empty interface, but the problem with empty interfaces is that we lose static type checking.

Add libva based codecs

libva supports hardware-accelerated encoding/decoding of MPEG-2, MPEG-4 ASP/H.263, MPEG-4 AVC/H.264, VC-1/WMV3, JPEG, HEVC/H.265, VP8, and VP9.

For example, Intel CPUs have included a VP8/VP9 accelerator since Kaby Lake.

Pass also requested properties to Video/AudioRecord

Currently, only the selected prop is passed to VideoRecord/AudioRecord.

For example, in the screen capture driver, FrameRate is not discrete.
It would be nice to pass both the selected prop and the requested prop, so the driver can read such parameters.

webcam broadcasting

Summary

Broadcast webcam stream from server to browser.

Motivation

This feature definitely should be in the examples section.
It has many use cases, from home security to live video streaming, and is a must-have feature nowadays.

Alternatives

Alternatives: Python (aiortc), Java (Kurento), linux-projects (UV4L)

Additional context

This feature is very much in demand, but the alternatives have limitations: Kurento requires a Java VM to be installed, aiortc can be difficult to set up due to Python versioning and package support, and UV4L is unstable, fairly opinionated, and not open source.
So this is a case where Go will shine.

Manage demo page in this repository

jsfiddle can load code from a GitHub repository, like:
https://jsfiddle.net/gh/get/library/pure/pion/example-webrtc-applications/tree/master/save-to-webm/jsfiddle
https://github.com/pion/example-webrtc-applications/tree/master/save-to-webm/jsfiddle

The demo page contains an extra transceiver.

// Offer to receive 1 audio, and 2 video tracks
pc.addTransceiver('audio', {'direction': 'recvonly'})
pc.addTransceiver('video', {'direction': 'recvonly'})
pc.addTransceiver('video', {'direction': 'recvonly'})

It would be cleaner to have a separate audio/video demo and a video-only demo.

Internally set driver priority

Currently, audio device selection is random (since Go maps are unordered).
The audio source list may contain a monitor device, which is a loopback of an audio output.
Giving slightly higher priority to non-monitor devices and/or the system default device would stabilize selection and suit typical use.

Add a way to control codec encoder parameters

In the Web API, codec bitrates are controlled via SDP, like:

a=mid:audio
b=AS:000

but that is too complicated for this package.

Configuring them directly in GetUserMedia would be better for us. For example:

  s, err := md.GetUserMedia(mediadevices.MediaStreamConstraints{
    Audio: func(c *mediadevices.MediaTrackConstraints) {
      c.Codec = webrtc.Opus
      c.BitRate = 32000 // 32kbps
      ...
    },
    Video: func(c *mediadevices.MediaTrackConstraints) {
      c.Codec = videoCodecName
      c.BitRate = 100000 // 100kbps
      c.KeyFrameInterval = 100
      ...
    },
  })

or

  s, err := md.GetUserMedia(mediadevices.MediaStreamConstraints{
    Audio: func(c *mediadevices.MediaTrackConstraints, c2 *mediadevices.CodecParameters) {
      c.Codec = webrtc.Opus
      c2.BitRate = 32000 // 32kbps
      ...
    },
    Video: func(c *mediadevices.MediaTrackConstraints, c2 *mediadevices.CodecParameters) {
      c.Codec = videoCodecName
      c2.BitRate = 100000 // 100kbps
      c2.KeyFrameInterval = 100
      ...
    },
  })

Add fast decodeYUY2

Profile of #102

      flat  flat%   sum%        cum   cum%
     0.52s 33.77% 33.77%      0.52s 33.77%  runtime.cgocall
     0.47s 30.52% 64.29%      0.53s 34.42%  github.com/pion/mediadevices/pkg/frame.decodeYUY2
     0.09s  5.84% 70.13%      0.09s  5.84%  runtime.usleep

decodeYUY2 occupies almost the same amount of CPU time as hardware-accelerated VP8 encoding.

Reduce mediadevices complexity

As of now, mediadevices uses many interfaces.

While interfaces make the design very flexible, that benefit isn't free. Some of the downsides:

  • They create boilerplate and reduce maintainability, because interfaces carry no implementation details; we always end up with one type for the interface and another for the struct.
  • They increase API complexity when overused. Every interface adds a layer of indirection to the actual definition, which is fine with only a few interfaces, but the problem quickly arises when there are many of them spread all over the package.
  • They reduce docs readability. Similar to the previous point: the extra layer of indirection demands more effort from the reader.

While I've laid out some downsides of interfaces above, I still think they're great when used appropriately. So we should replace some of the interfaces with structs, removing the ones that don't actually need the flexibility.

In my opinion, we should convert MediaDevices and MediaStream interfaces to structs.

Note: Hopefully, when pion/webrtc v3 is ready, the Tracker and LocalTrack interfaces can be merged into pion/webrtc.

Adapters should be grouped by their categories

As of now, the microphone and camera adapters live in the same folder, driver. To make things more organized and modular, it would be good to group them by category.

Before:

driver
-- microphone_linux.go
-- camera_linux.go

After:

driver
-- camera
   -- camera_linux.go
-- microphone
   -- microphone_linux.go

This way, the separation between devices is clear, and it puts less cognitive load on driver implementors.

Implement faster ToI420

A pure Go implementation of this through the image.Image interface incurs a huge amount of overhead.

Camera always times out on Go 1.14 Linux

I haven't dug into the details yet.
The same source code works on go1.13.8.

  • Linux host environment: Linux 5.4.19-100.fc30.x86_64 #1 SMP Tue Feb 11 22:27:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Go version: go version go1.14 linux/amd64

Add device/codec error handler

In the Web API, the MediaStreamTrack.ended event is fired and the MediaStreamTrack.onended handler is called on such errors.
https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamTrack/onended

This event occurs when the track will no longer provide data to the stream for any reason, including the end of the media input being reached, the user revoking needed permissions, the source device being removed, or the remote peer ending a connection.

mediadevices/track.go

Lines 115 to 124 in e4da8fa

n, err = vt.encoder.Read(buff)
if err != nil {
	if e, ok := err.(*mio.InsufficientBufferError); ok {
		buff = make([]byte, 2*e.RequiredSize)
		continue
	}
	// TODO: better error handling
	panic(err)
}

mediadevices/track.go

Lines 181 to 185 in e4da8fa

n, err := t.encoder.Read(buff)
if err != nil {
	// TODO: better error handling
	panic(err)
}

Stream input from remote track

Receive RTP from a WebRTC remote track, decode, process (via Audio/VideoTransform), and re-encode.
This enables tiling (or picture-in-picture) of multiple streams into one stream, as well as audio mixing.

How to get audio from a microphone

Hello

I want to make a video call without a browser by connecting a microphone and a webcam to my Raspberry Pi.

The SFU is Janus.

I'm modifying the video-room example and would like some advice.

I want to set the microphone input as an audio track in pion. How can I do that?
Do you have any samples?

raspivid support

Summary

Add one more example that uses raspivid instead of gstreamer, to provide hardware encoding on the Raspberry Pi.

Motivation

Since the Raspberry Pi is widely used in many projects, it would be nice to have a Go WebRTC implementation for it.

Alternatives

There are no alternatives yet as compelling as Go & WebRTC.

Additional context

Just sending video from a Raspberry Pi to a browser, using raspivid for hardware encoding.

Decouple from webrtc stuff

I think we should try to decouple mediadevices from WebRTC so that it's more generic and useful to a wider audience. Looking at the original MediaDevices API definition from Mozilla, it is never described as being solely for WebRTC:

The MediaDevices interface provides access to connected media input devices like cameras and microphones, as well as screen sharing. In essence, it lets you obtain access to any hardware source of media data.

Reference: https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices

Inserting custom image processor

It would be useful if a custom image processor (func CustomImageProcessor(r video.Reader) video.Reader) could be inserted between the device and the codec.

Personally, I would like to use this package as a replacement for gstreamer; a clock overlay on the image is what I want to insert this way.

Update codec builder to support rate limiting while running

As of now, VideoEncoderBuilder and AudioEncoderBuilder only return io.ReadCloser and error:

BuildAudioEncoder(r audio.Reader, p prop.Media) (io.ReadCloser, error)
BuildVideoEncoder(r video.Reader, p prop.Media) (io.ReadCloser, error)

While returning io.ReadCloser is very idiomatic, it is not enough for our needs. The main limitation is rate limiting: we can't adjust codec parameters on the fly, decreasing or increasing the bitrate as needed based on the current network speed and quality.

So, instead of returning io.ReadCloser, it's better to return a new interface that embeds io.ReadCloser and adds a method to update the BaseParams:

package codec

import "io"

type ReadCloser interface {
	io.ReadCloser
	Update(params BaseParams) error
}
