
awslabs / aws-clustered-video-streams

A clustered video stream is an AWS architecture that increases the quality and reliability of live events by providing seamless regional failover capabilities for live video streams. Operators can monitor the status of the clustered stream from a single pane of glass and dynamically control from which region the stream consumed by a player originates.

License: Apache License 2.0

Dockerfile 0.66% JavaScript 42.00% Shell 19.21% Python 31.33% HTML 6.80%
hls-live-streaming hls playlist playlist-parser mediapackage mediastore streaming-video outage hls-video monitoring

aws-clustered-video-streams's Introduction

Clustered Video Streams (CVS)

Overview

A clustered video stream is an AWS architecture that increases the quality and reliability of live events by providing seamless regional failover capabilities for live video streams.

Operators can monitor the status of the clustered stream from a single pane of glass and dynamically control which region the stream consumed by a player originates from.

Failure scenarios addressed:

  • Individual live stream interruption - Some component in a live stream (encoder, network) goes down and the live stream stops producing new video segments.
  • AWS region failure - A regionalized disaster causes an AWS outage in a specific region. Regional redundancy at all points of the architecture ensures the live stream can recover by failing over to a new region.

How it works

A clustered video stream is composed of N identical redundant live video stream instances that are each deployed in a different AWS region. Each stream instance has an origin HTTP(S) endpoint (MediaPackage, S3, etc.) and an Amazon CloudFront CDN HTTP(S) endpoint that are unique to the region. Stream instances are shown in the purple shaded boxes in Figure 1 - clustered video stream architecture below.

Figure 1 - clustered video stream architecture

Origin health checks are used to monitor the health of each stream instance. Health checks continuously test the stream instance for different failure states. When changes are detected, a message is written to an SNS topic to notify consumers. Currently, this system has one health check, called the stale playlist detector, that checks the “liveness” of a stream instance by monitoring changes to the segments available in stream playlists. If the stream stops producing new segments within a time threshold, a failure is detected.
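The liveness check described above can be illustrated with a minimal sketch. This is not the project's stale playlist detector; the class name PlaylistWatch, the stale_after_seconds threshold, and the way segments are extracted are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class PlaylistWatch:
    """Tracks when the segment list of one playlist last changed (hypothetical helper)."""

    stale_after_seconds: float
    last_segments: Tuple[str, ...] = ()
    last_change: float = field(default_factory=time.monotonic)

    def observe(self, playlist_text: str, now: Optional[float] = None) -> bool:
        """Record one sample of the playlist and return True if it is stale."""
        now = time.monotonic() if now is None else now
        stripped = (line.strip() for line in playlist_text.splitlines())
        # Segment URIs are the non-blank lines that are not tags or comments.
        segments = tuple(s for s in stripped if s and not s.startswith("#"))
        if segments != self.last_segments:
            # New segments appeared (or old ones rolled off): the stream is live.
            self.last_segments = segments
            self.last_change = now
        return (now - self.last_change) > self.stale_after_seconds
```

A caller would sample each variant playlist on a timer, call observe() with the fetched text, and publish a notification when the returned value changes.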

Clustered video stream state is stored in a DynamoDB global state table so that the state of all the stream instances can be accessed from any region in the cluster. The state table stores the desired state and health status of each stream instance.

  • domain - CloudFront domain for the stream instance. Used as a key to uniquely identify each stream instance.
  • distro_open - indicates the desired behavior of a distribution and can be set by an end user.
  • stale - indicates whether a stale playlist health check has detected a failure.

A Lambda@Edge function, called the copilot, is used to change the HTTP(S) responses to requests for variant playlists and segments from each stream instance. The copilot lambda is installed on the CloudFront distribution for each stream instance and is triggered by origin-response CloudFront events. The lambda checks the desired state of the stream instance in the state table and will change the HTTP(S) response code to 404 if the distribution is closed (i.e. distro_open is false). This will trigger error handling in the player to try a different stream variant.
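The response rewrite can be sketched as follows. This is a simplified illustration, not the copilot's source: the state lookup is injected as a plain function (a stand-in for the DynamoDB read), and using the distribution domain from the event's config block as the lookup key is an assumption.

```python
from typing import Any, Callable, Dict


def make_handler(is_open: Callable[[str], bool]) -> Callable[..., Dict[str, Any]]:
    """Build an origin-response handler around a state lookup.

    is_open is a hypothetical stand-in for reading distro_open from the
    global state table, keyed by the CloudFront domain.
    """

    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, Any]:
        cf = event["Records"][0]["cf"]
        response = cf["response"]
        domain = cf["config"]["distributionDomainName"]
        if not is_open(domain):
            # Closed distribution: force a 404 so the player tries another variant.
            response["status"] = "404"
            response["statusDescription"] = "Not Found"
        return response

    return handler
```

With an in-memory dictionary as the state, make_handler(lambda d: state.get(d, True)) leaves responses from open distributions untouched and rewrites the status for closed ones.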

A merged, multi-region, master playlist is constructed from the top level playlists of each stream instance. This playlist contains the CloudFront endpoints for the stream variants (bitrate ladder playlists) for all of the redundant regions. The master playlist is the origin for the CDN hosted stream that is consumed by the video player.
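As a rough illustration of the merge, the sketch below combines the variant entries of several top-level playlists and rewrites each variant URI to an absolute URL. The function name and the example URLs are hypothetical, and the sketch deliberately ignores other tags (such as EXT-X-MEDIA) that a real merge would need to carry over.

```python
from typing import Dict
from urllib.parse import urljoin


def merge_master_playlists(playlists: Dict[str, str]) -> str:
    """Merge top-level playlists into one multi-region playlist.

    playlists maps the URL of each region's top-level playlist to its text.
    """
    merged = ["#EXTM3U"]
    for playlist_url, text in playlists.items():
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        for index, line in enumerate(lines):
            # In HLS, the line after EXT-X-STREAM-INF is the variant URI.
            if line.startswith("#EXT-X-STREAM-INF") and index + 1 < len(lines):
                merged.append(line)
                merged.append(urljoin(playlist_url, lines[index + 1]))
    return "\n".join(merged) + "\n"
```

Because every variant URI becomes absolute, the merged playlist can be hosted anywhere while each variant still resolves to its own region's CloudFront endpoint.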

The HLS adaptive bitrate protocol enables video players to select from any of the available variants of a stream at any segment boundary. The player might even select variants from different regions while playing the same video.

Variant selection is determined by the player based on performance and health of the variant being played. If a player receives errors (such as 404s) trying to retrieve segments from a particular variant, it will switch to another variant at the same or different bitrate if one is available.

A failover occurs when an operator closes a distribution for a stream instance by setting the distro_open attribute to false for that instance. The copilot lambda will force a 404 return code in responses to all requests for that stream instance. This forces the player to switch to requesting a stream instance in another region. As deployed, this system supports manual failover that must be initiated by an end user by setting the distro_open flag for stream instances. Automatic failover would be a natural future extension to this capability.

Image: copilot-HLS.png

Continue to the INSTALL guide and try the Clustered Video Streams for yourself.

aws-clustered-video-streams's People

Contributors

aburkleaux-amazon, dependabot[bot], jimtharioamazon, jpeddicord

aws-clustered-video-streams's Issues

JavaScript error in dashboard after providing all input fields

dashboard.js:130 MissingRequiredParameter: Missing required key 'TableName' in params
at constructor.fail (https://sdk.amazonaws.com/js/aws-sdk-2.480.0.min.js:62:29767)
at constructor.validateStructure (https://sdk.amazonaws.com/js/aws-sdk-2.480.0.min.js:62:29977)
at constructor.validateMember (https://sdk.amazonaws.com/js/aws-sdk-2.480.0.min.js:62:30353)
at constructor.validate (https://sdk.amazonaws.com/js/aws-sdk-2.480.0.min.js:62:29423)
at constructor. (https://sdk.amazonaws.com/js/aws-sdk-2.480.0.min.js:62:4316)
at constructor.callListeners (https://sdk.amazonaws.com/js/aws-sdk-2.480.0.min.js:64:5128)
at s (https://sdk.amazonaws.com/js/aws-sdk-2.480.0.min.js:64:4994)
at https://sdk.amazonaws.com/js/aws-sdk-2.480.0.min.js:62:3649
at t (https://sdk.amazonaws.com/js/aws-sdk-2.480.0.min.js:61:4075)
at https://sdk.amazonaws.com/js/aws-sdk-2.480.0.min.js:61:4396

Option to perform HEAD operations on segments

Add the ability to check for the presence of the segment files referred to by a playlist.
A missing segment may be an immediate stale condition or other condition.
Make this optional for performance considerations.
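One possible shape for this option is sketched below using only the Python standard library. The function names, the enabled flag, and the timeout are assumptions for illustration and do not reflect the detector's existing code.

```python
import urllib.error
import urllib.request
from typing import Callable, List
from urllib.parse import urljoin


def segment_urls(playlist_url: str, playlist_text: str) -> List[str]:
    """Resolve every segment URI in a media playlist against the playlist URL."""
    stripped = (line.strip() for line in playlist_text.splitlines())
    return [urljoin(playlist_url, s) for s in stripped if s and not s.startswith("#")]


def head_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a HEAD request for the URL succeeds with a 2xx status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False


def missing_segments(
    playlist_url: str,
    playlist_text: str,
    check: Callable[[str], bool] = head_ok,
    enabled: bool = True,
) -> List[str]:
    """List segments that fail the check; skipped entirely when disabled."""
    if not enabled:
        return []
    return [url for url in segment_urls(playlist_url, playlist_text) if not check(url)]
```

Keeping the check behind an enabled flag means deployments that cannot afford one extra request per segment pay nothing for the feature.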

Lambda@Edge removal attempt causes an error during stack delete

It looks like we can't remove replicated Lambda functions for a couple of hours after we remove the associations from CloudFront. The Lambda function might need to be left behind on a stack delete.

After you delete a Lambda@Edge function association from a CloudFront distribution, you can optionally delete the Lambda function or function version from AWS Lambda. You can also delete a specific version of a Lambda function if the version doesn’t have any CloudFront distributions associated with it. If you remove all the associations for a Lambda function version, you can typically delete the function version a few hours later.

https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/lambda-edge-delete-replicas.html

Automatic top-level playlist

Create the merged master playlist and put it in the S3 buckets created by clustered-video-stream-playlist.

This could be a Lambda that samples the top-level playlists from each origin periodically and checks for any changes to either.

When a change happens, generate a new top-level playlist with the segment playlists from each origin and store it in each of the regional buckets. After storing the new top-level playlist, invalidate the CloudFront distribution so the new playlist will be used and cached.
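The change check in such a Lambda could be as simple as comparing content digests between samples. The sketch below assumes the caller has already fetched the playlist text for each origin; the function names and the in-memory seen dictionary are hypothetical, and a real implementation would need to persist the digests between invocations.

```python
import hashlib
from typing import Dict, List


def playlist_digest(text: str) -> str:
    """Return a stable fingerprint of a playlist's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_changes(seen: Dict[str, str], sampled: Dict[str, str]) -> List[str]:
    """Return the origins whose playlist changed since the previous sample.

    seen maps origin to the last digest and is updated in place;
    sampled maps origin to the playlist text just fetched.
    """
    changed = []
    for origin, text in sampled.items():
        digest = playlist_digest(text)
        if seen.get(origin) != digest:
            seen[origin] = digest
            changed.append(origin)
    return changed
```

A non-empty result would be the trigger for regenerating the merged playlist and invalidating the distribution.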

Generated SNS Topic access policy problem

I think we need to leave the default (none) access policy on the SNS topic. It should only allow principals from the current account to use the topic. The installed policy appears to cause a problem sending from the ECS container.

Alternative bucket copy commands

I am not sure why aws s3 cp was doing weird command line globbing. I was able to use the aws s3 sync command to get the job done.

AWS_PROFILE=personal AWS_DEFAULT_REGION=us-west-2 aws s3 sync global-s3-assets/ s3://jtthario-cvs-us-west-2/jtthario-cvs/v1.0.0/
AWS_PROFILE=personal AWS_DEFAULT_REGION=us-west-2 aws s3 sync regional-s3-assets/ s3://jtthario-cvs-us-west-2/jtthario-cvs/v1.0.0/

AWS_PROFILE=personal AWS_DEFAULT_REGION=us-east-1 aws s3 sync global-s3-assets/ s3://jtthario-cvs-us-east-1/jtthario-cvs/v1.0.0/
AWS_PROFILE=personal AWS_DEFAULT_REGION=us-east-1 aws s3 sync regional-s3-assets/ s3://jtthario-cvs-us-east-1/jtthario-cvs/v1.0.0/

Automate deployment of the stale playlist detector on Fargate

Create a CloudFormation script to automate deployment of the SPD as a Fargate task.

  • The monitored live stream must have a CloudFront origin.
  • Each stale-playlist-detector stack monitors one live stream.
  • Multiple stacks can be deployed to monitor multiple streams in the same account.

Support for variable segment length in playlists

How should the detector handle the timing of a playlist whose segment durations are periodically shorter, as in the example below?

#EXT-X-PROGRAM-DATE-TIME:2019-11-26T20:06:50.000Z
#EXT-X-CUE-OUT-CONT:ElapsedTime=4.533,Duration=32,SCTE35=/DAlAAAAAsrYAP/wFAUAApOff+/+UtrKsP4AKpJwAAEBAQAAKTZmog==
#EXTINF:6.000,
index_1_115763.ts?m=1574115362
#EXT-X-PROGRAM-DATE-TIME:2019-11-26T20:06:56.000Z
#EXT-X-CUE-OUT-CONT:ElapsedTime=10.533,Duration=32,SCTE35=/DAlAAAAAsrYAP/wFAUAApOff+/+UtrKsP4AKpJwAAEBAQAAKTZmog==
#EXTINF:6.000,
index_1_115764.ts?m=1574115362
#EXT-X-PROGRAM-DATE-TIME:2019-11-26T20:07:02.000Z
#EXT-X-CUE-OUT-CONT:ElapsedTime=16.533,Duration=32,SCTE35=/DAlAAAAAsrYAP/wFAUAApOff+/+UtrKsP4AKpJwAAEBAQAAKTZmog==
#EXTINF:6.000,
index_1_115765.ts?m=1574115362
#EXT-X-PROGRAM-DATE-TIME:2019-11-26T20:07:08.000Z
#EXT-X-CUE-OUT-CONT:ElapsedTime=22.533,Duration=32,SCTE35=/DAlAAAAAsrYAP/wFAUAApOff+/+UtrKsP4AKpJwAAEBAQAAKTZmog==
#EXTINF:6.000,
index_1_115766.ts?m=1574115362
#EXT-X-PROGRAM-DATE-TIME:2019-11-26T20:07:14.000Z
#EXT-X-CUE-OUT-CONT:ElapsedTime=28.533,Duration=32,SCTE35=/DAlAAAAAsrYAP/wFAUAApOff+/+UtrKsP4AKpJwAAEBAQAAKTZmog==
#EXTINF:2.467,
index_1_115767.ts?m=1574115362
#EXT-X-PROGRAM-DATE-TIME:2019-11-26T20:07:16.466Z
#EXT-X-CUE-IN
#EXTINF:5.533,
index_1_115768.ts?m=1574115362
#EXT-X-PROGRAM-DATE-TIME:2019-11-26T20:07:22.000Z
#EXTINF:6.000,
index_1_115769.ts?m=1574115362
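One option, offered here only as a sketch, is to read the per-segment EXTINF durations and base the expected wait on the newest segment rather than on a fixed segment length. The function names are hypothetical and the choice of the newest segment's duration is an assumption, not a decision recorded in this issue.

```python
import re
from typing import List, Optional

# Matches the duration at the start of an EXTINF tag, e.g. "#EXTINF:2.467,".
EXTINF = re.compile(r"^#EXTINF:([0-9]+(?:\.[0-9]+)?)")


def segment_durations(playlist_text: str) -> List[float]:
    """Return the EXTINF duration of every segment, in playlist order."""
    durations = []
    for line in playlist_text.splitlines():
        match = EXTINF.match(line.strip())
        if match:
            durations.append(float(match.group(1)))
    return durations


def next_update_wait(playlist_text: str) -> Optional[float]:
    """Return the newest segment's duration, or None for an empty playlist."""
    durations = segment_durations(playlist_text)
    return durations[-1] if durations else None
```

For the playlist above, the durations around the cue-in are 6.000, 2.467, and 5.533 seconds, so a fixed 6-second assumption would misjudge the timing of those samples.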

Refresh playlist set periodically

Playlists may change during runtime from encoder reconfiguration. Have a default or operator-specified refresh time for the playlists -- every X minutes?

Regional MediaPackage sync

How do you keep the two separate regions in sync?
I assume having the top master playlist contain redundant streams means that they are aligned in timestamp and segment numbers.

Missing playlists at start time are not detected

If a playlist is missing (HTTP 404), the initial expiration value for it is never set. A 404ing playlist should also set the expiration timer, or be escalated to an immediate stale playlist status.

Video player recommendations

We need a list of the players that best support regional switchover. So far we have identified Safari-based players on the Mac and, to an extent, Video.js. We need a more comprehensive list of players that behave correctly when performing a discontinuous timecode switchover.

Add a CORS policy to the master playlist buckets

It turns out we'll need a permissive CORS policy for the buckets when using the master playlist from a browser page loaded from another location.

THEOplayer.js:39 VIDEOJS: ERROR: (CODE:3 MEDIA_ERR_DECODE) Could not load the manifest file. Make sure the source is set correctly and that CORS support is enabled.

Here is a policy I've used in the past:

<?xml version="1.0" encoding="UTF-8"?>
<CORSConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
<CORSRule>
    <AllowedOrigin>*</AllowedOrigin>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedHeader>*</AllowedHeader>
</CORSRule>
</CORSConfiguration>
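If the policy were applied from a script instead of the console, the same rule could be expressed as the dictionary that boto3's put_bucket_cors call accepts. This is a sketch: the function name apply_cors is hypothetical, and the bucket name must be supplied by the caller.

```python
# Equivalent of the XML policy above, in the shape boto3 expects.
CORS_CONFIGURATION = {
    "CORSRules": [
        {
            "AllowedOrigins": ["*"],
            "AllowedMethods": ["GET"],
            "AllowedHeaders": ["*"],
        }
    ]
}


def apply_cors(bucket_name: str) -> None:
    """Apply the permissive GET-only CORS policy to one bucket."""
    # Imported lazily so the configuration can be inspected without the SDK.
    import boto3

    boto3.client("s3").put_bucket_cors(
        Bucket=bucket_name,
        CORSConfiguration=CORS_CONFIGURATION,
    )
```

The function would need to be called once for each regional master playlist bucket.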

Build command for profiles/regions example

Here's the build command I used with many profiles in my credentials file. You can prefix temporary environment variables at the front for overrides.

AWS_PROFILE=personal AWS_DEFAULT_REGION=us-west-2 ./build-s3-dist.sh jtthario-cvs jtthario-cvs v1.0.0

Install into an existing ECS cluster

We generate a VPC and Fargate cluster for each installation currently. I'd like to specify the ECS cluster name which may be a Fargate cluster or an EC2 cluster, and have the template install into that.

SPD: fix playlist metric results after only 1 sample

Each playlist is using 0 as the start time for the first sample delta calculation. This is producing a very large result and is throwing off the first set of sample metrics.

We will initialize this value with the current time when the playlist object is created.
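A minimal sketch of the proposed fix, with a hypothetical class name and an injectable clock so the behavior can be checked without waiting:

```python
import time
from typing import Callable


class PlaylistSampler:
    """Tracks the time between samples of one playlist (illustrative only)."""

    def __init__(self, url: str, clock: Callable[[], float] = time.time) -> None:
        self.url = url
        self._clock = clock
        # Start from the creation time rather than 0, so the first
        # delta is a real interval instead of the full epoch timestamp.
        self.last_sample_time = clock()

    def sample_delta(self) -> float:
        """Return seconds since the previous sample and record the new time."""
        now = self._clock()
        delta = now - self.last_sample_time
        self.last_sample_time = now
        return delta
```

With this initialization the first delta is measured from object creation, which keeps the first set of sample metrics in the same range as later ones.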

Adding more variables

Are there any examples of how to add more variables to the command line or startup script?

SPD: segment length divisor exposed to variable

The segment time divisor has been hardcoded as 5 in the SPD. This means the interval between samples of playlists is (segment-time / 5) = 1.2s for 6s segments, 0.8s for 4s segments, and 0.4s for 2s segments.

We should expose this as an optional setting with the current default for backward compatibility.
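The calculation with a configurable divisor might look like the sketch below. The function and parameter names are hypothetical; only the default of 5 comes from the current behavior described above.

```python
def sample_interval(segment_seconds: float, divisor: float = 5.0) -> float:
    """Return the seconds between playlist samples for a given segment length."""
    if divisor <= 0:
        raise ValueError("divisor must be positive")
    return segment_seconds / divisor
```

Leaving divisor at its default reproduces the current intervals (1.2s, 0.8s, and 0.4s for 6s, 4s, and 2s segments), while a smaller divisor samples less often.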
