webvtt.js's Introduction

WebVTT parser and validator

Install

You can load the parser.js file into your HTML page, and the API will become available on window. Alternatively, you can install it with bower (webvtt) or npm (npm install webvtt-parser).

API

This module exports its classes either through window or through require()/import; the ones you are likely to need are WebVTTParser and WebVTTSerializer.

To parse a WebVTT string:

import { WebVTTParser } from 'webvtt-parser';
const parser = new WebVTTParser();
const tree = parser.parse(someVTT, 'metadata');

By default, the WebVTT parser only recognizes a small subset of named character entities. If you want the full spec-compliant behavior, pass the content of html-entities.json to the WebVTTParser() constructor.

To serialize a WebVTT tree to string:

import { WebVTTSerializer } from 'webvtt-parser';
// vttTree is the object returned by WebVTTParser#parse()
const seri = new WebVTTSerializer();
const vttText = seri.serialize(vttTree.cues);

webvtt.js's Issues

Adding a license to this project

Would it be possible to add an open-source license to this project? It would help us use it in major open-source projects.

Thanks.

Region definition block incorrectly parsed as cue

Attempting to parse EXAMPLE 8 from https://w3c.github.io/webvtt/#introduction-other-features gives

Line 4: Cue identifier needs to be followed by timestamp.
Line 12: Cue identifier needs to be followed by timestamp.
Line 19, column 31: Invalid setting.
Line 22, column 31: Invalid setting.
Line 25, column 31: Invalid setting.
Line 28, column 31: Invalid setting.
Line 31, column 31: Invalid setting.
Line 34, column 31: Invalid setting.

It appears that the header parser does not understand regions, and that the cue setting parser does not know about the region cue setting.
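
As a rough illustration of the header-side half of this report, a parser could classify each block (a run of non-blank lines) before treating it as a cue. The sketch below is not webvtt.js code: classifyBlock is a hypothetical helper, and it assumes, following the WebVTT spec, that a line containing --> marks a cue and that REGION, STYLE and NOTE blocks announce themselves on their first line.

```javascript
// Hypothetical helper, not part of webvtt.js: guess the kind of a WebVTT
// block from its lines (a block is a run of non-blank lines).
function classifyBlock(lines) {
  const first = lines[0] || "";
  // The timing line is either the first line or follows a cue identifier.
  const hasTiming = lines.slice(0, 2).some((line) => line.includes("-->"));
  if (hasTiming) return "cue";
  if (/^REGION[ \t]*$/.test(first)) return "region";
  if (/^STYLE[ \t]*$/.test(first)) return "style";
  if (/^NOTE(?:[ \t]|$)/.test(first)) return "comment";
  return "unknown";
}

const regionBlock = ["REGION", "id:fred", "width:40%", "lines:3"];
const cueBlock = ["1", "00:11.000 --> 00:13.000", "We are in New York City"];
console.log(classifyBlock(regionBlock)); // "region"
console.log(classifyBlock(cueBlock));    // "cue"
```

With a check like this in place, a region block would no longer fall through to the "cue identifier needs to be followed by timestamp" path. The region cue setting would still need separate handling in the cue setting parser.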

Escape entities not correctly parsed/escaped

A prerequisite to this bug is to fix the bug described in #35.

When the entities object is correctly passed to the WebVTTCueTextParser on line 177 as described in the issue above, there are some problems with parsing escape entities.

Steps to Reproduce

const { parse } = new WebVTTParser({
  "&amp": "&",
  "&amp;": "&",
  "&AMP;": "&",
  "&AMP": "&",
});

const text1 = `
WEBVTT

1
00:11:46.140 --> 00:11:48.380
Texas A&M`

const text2 = `
WEBVTT

1
00:11:46.140 --> 00:11:48.380
Texas A&amp`

const text3 = `
WEBVTT

1
00:11:46.140 --> 00:11:48.380
Texas A&ampM`

const parsed1 = parse(text1, "metadata");
console.log(parsed1.cues[0].tree.children[0].value); // Texas A&M (correctly parsed)

const parsed2 = parse(text2, "metadata");
console.log(parsed2.cues[0].tree.children[0].value); // Texas A& (correctly parsed)

const parsed3 = parse(text3, "metadata");
console.log(parsed3.cues[0].tree.children[0].value); // Texas A&ampM (incorrectly parsed)

As you can see, if the escape sequence &amp is followed by another alphanumeric character, rather than by ; or the end of the string (undefined), it is not parsed properly.

Solution

I believe some conditional logic needs to be updated or added in lines 632-670 to account for escape entities that are followed by an alphanumeric character.
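
One possible shape for that logic is longest-prefix matching: at each &, try the longest candidate that appears in the entities map and fall back to shorter ones. The following is a standalone sketch under that assumption, not the code in parser.js. decodeEntities is a hypothetical helper, and the entities map is assumed to be keyed by names such as &amp and &amp; as in the reproduction above.

```javascript
// Hypothetical helper: replace named entities using longest-prefix matching.
function decodeEntities(text, entities) {
  // Longest key in the map bounds how far ahead we need to look.
  const maxLen = Math.max(0, ...Object.keys(entities).map((key) => key.length));
  let out = "";
  let i = 0;
  while (i < text.length) {
    if (text[i] === "&") {
      let matched = false;
      // Try the longest candidate first, then progressively shorter ones.
      for (let len = Math.min(maxLen, text.length - i); len > 1; len--) {
        const candidate = text.slice(i, i + len);
        if (Object.prototype.hasOwnProperty.call(entities, candidate)) {
          out += entities[candidate];
          i += len;
          matched = true;
          break;
        }
      }
      if (matched) continue;
    }
    // Not an entity (or no match): copy the character through unchanged.
    out += text[i];
    i++;
  }
  return out;
}

const entities = { "&amp": "&", "&amp;": "&" };
console.log(decodeEntities("Texas A&ampM", entities)); // "Texas A&M"
```

Because an unmatched & is copied through unchanged, the Texas A&M case from the first reproduction keeps working.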

wrong warning: 'No blank line after the signature'

As far as I know, after the first line starting with WEBVTT it is fine to have a whole paragraph of metadata such as language, author, etc.
But your validator reports an error, No blank line after the signature, asking for a blank line after the first line.

In my opinion, it is a valid VTT:

[Image: Screenshot from 2021-07-04 06-14-54]

File '/node_modules/webvtt-parser/parser.js' is not a module/Unexpected token in parser.js

I'm having a few issues getting up and running with the WebVTT parser. We want to use it in a React/Redux environment for live error handling when users submit VTT files (basically like the demo, but in a React/Redux JavaScript environment instead of a hardcoded HTML page). After installing via yarn, importing the constructor gives an error saying File '/node_modules/webvtt-parser/parser.js' is not a module. When I ignore that and use the parser anyway, I get this error:

./node_modules/webvtt-parser/parser.js
Module parse failed: Unexpected token (513:31)
You may need an appropriate loader to handle this file type.
|     this.parse = function(cueStart, cueEnd) {
|       function removeCycles(tree) {
|         const cyclelessTree = {...tree};
|         if (tree.children) {
|           cyclelessTree.children = tree.children.map(removeCycles);

Curious if anyone can provide insight on what I might be doing wrong.

Incorrect escape.

Why can characters such as the ampersand (&) be HTML-encoded (&amp;), while other characters, such as ' encoded as &#39;, are not allowed? Trying to validate the track below throws the error: Line 8, column 31: Incorrect escape.

WEBVTT

00:11.000 --> 00:13.000 vertical:rl
<v Roger Bingham>Tom &amp; Jerry

00:13.000 --> 00:16.000
<v Roger Bingham>See you at 5&#39;34
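
Numeric references such as &#39; do not need a lookup table, since the number is the code point itself. The sketch below shows one way they could be decoded. decodeNumericReferences is a hypothetical helper, not something webvtt.js provides, and it deliberately ignores the HTML spec's special cases (for example, the remapping of certain control codes).

```javascript
// Hypothetical helper: decode decimal (&#39;) and hexadecimal (&#x27;)
// character references; everything else is left as-is.
function decodeNumericReferences(text) {
  return text.replace(/&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave the text untouched when the number is outside the Unicode range.
    if (codePoint > 0x10ffff) return match;
    return String.fromCodePoint(codePoint);
  });
}

console.log(decodeNumericReferences("See you at 5&#39;34")); // "See you at 5'34"
```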

Cue id "0" lost in cue serialization

When VTT is parsed and serialized, a cue with id "0" is lost.

This VTT input

WEBVTT

0
00:00.890 --> 00:01.760
that was me

becomes this VTT output when parsed and serialized:

WEBVTT

00:00.890 --> 00:01.760
that was me

This is because '0' evaluates to false in the first line of function serializeCue(cue).
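
The pitfall can be sketched with two stand-in functions; neither is the actual serializeCue code. Note that in JavaScript the string "0" is truthy while the number 0 is falsy, so this sketch assumes the id reaches the check as a number (or that the real check is looser than a plain truthiness test). An explicit comparison avoids the question either way.

```javascript
// Hypothetical stand-ins for the identifier line emitted by a serializer.

// Truthiness check: silently drops an id of 0.
function idLineTruthy(cue) {
  return cue.id ? cue.id + "\n" : "";
}

// Explicit check: only a missing or empty id is skipped.
function idLineExplicit(cue) {
  const hasId = cue.id !== undefined && cue.id !== null && cue.id !== "";
  return hasId ? String(cue.id) + "\n" : "";
}

console.log(JSON.stringify(idLineTruthy({ id: 0 })));   // ""
console.log(JSON.stringify(idLineExplicit({ id: 0 }))); // "0\n"
```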

Add TypeScript types

I manually added types to my repo, but they are not comprehensive and only cover what I use. It would be great if this repo maintained its own types. Switching to TypeScript is not required; you only need to add a .d.ts file to the repo and set the types field in package.json.

Here's what I've gotten started, but I'm sure it's not completely accurate.

// webvtt-parser.d.ts
declare module 'webvtt-parser' {
  export interface Cue {
    direction: 'horizontal' | 'vertical'
    id: string
    startTime: number
    endTime: number
    text: string
    lineAlign: 'start' | 'center' | 'end'
    linePosition: 'auto' | number
    pauseOnExit: boolean
    positionAlign: 'auto' | 'start' | 'center' | 'end'
    size: number
    snapToLines: boolean
    textPosition: 'auto' | number
    tree: {
      children: {
        type: 'text'
        value: string
      }[]
    }
  }

  export class WebVTTParser {
    public parse: (
      input: string,
      mode: 'metadata' | 'chapters'
    ) => {
      cues: Cue[]
      errors: Error[]
      time: number
    }
  }
  export class WebVTTSerializer {
    public serialize: (cues: Cue[]) => string
  }
}

Thanks for considering.

Timestamps with one digit allowed

The spec requires that a WebVTT timestamp has two or more ASCII digits for hours (if non-zero) and two ASCII digits for minutes.

WEBVTT

1:23.456 --> 1:23:45.678

However, the validator accepts the above input as valid. The length of the first value in a timestamp, before the first colon, is not checked.
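
The digit-count rule can be expressed as a single regular expression. This is a minimal sketch of the rule as stated above (optional hours of two or more digits, then two-digit minutes and seconds, then three-digit milliseconds), not the validator's own implementation. isValidTimestamp is a hypothetical name.

```javascript
// Hypothetical helper: strict syntax check for a WebVTT timestamp.
function isValidTimestamp(text) {
  // (hh+:)? mm : ss . ttt, with minutes and seconds limited to 00-59.
  const TIMESTAMP = /^(?:\d{2,}:)?[0-5]\d:[0-5]\d\.\d{3}$/;
  return TIMESTAMP.test(text);
}

console.log(isValidTimestamp("1:23.456"));     // false: one-digit minutes
console.log(isValidTimestamp("1:23:45.678"));  // false: one-digit hours
console.log(isValidTimestamp("01:23:45.678")); // true
```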

CDN for quick usage

Hi, thanks for the library!

Is the library published on any CDN (e.g. cdnjs, jsDelivr) so that a link can be grabbed and used quickly? If not, is there a plan to support that in the future?

Serializer: Fallback to `tree` property if unavailable

Currently I'm trying to write an ASS-to-VTT converter which generates VTTCue-like structures.

The WebVTTSerializer also expects VTTCue-like objects, but it requires the parser-specific tree property, which prevents my library from using it directly.

It could fall back to text if tree is unavailable. What do you think?

(This would also enable compatibility with native VTTCue):

new WebVTTSerializer().serialize([new VTTCue(0, 0, 'wow')])
// Currently throws with TypeError: cue.tree is undefined
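
The proposed fallback could look roughly like the sketch below. cuePayload and the serializeTree callback are hypothetical names used for illustration, since the serializer's internals are not shown here. One trade-off worth noting is that raw text is passed through without the validation that a parsed tree implies.

```javascript
// Hypothetical helper: choose what to emit as the cue payload.
function cuePayload(cue, serializeTree) {
  // Prefer the parser-specific tree when it is present.
  if (cue.tree && Array.isArray(cue.tree.children)) {
    return serializeTree(cue.tree.children);
  }
  // Otherwise fall back to the plain text, as native VTTCue objects provide.
  return typeof cue.text === "string" ? cue.text : "";
}

// Stand-in tree serializer for the example: concatenates text nodes only.
const joinTextNodes = (children) =>
  children.map((node) => (node.type === "text" ? node.value : "")).join("");

console.log(cuePayload({ text: "wow" }, joinTextNodes)); // "wow"
```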

Line Numbers with reference

It would be nice if each error message also quoted the line that had the issue.
For example:
Line 1237, column 8: Must be exactly two digits.

It would be helpful for it to show the line to search for: 00:00:0 6.012 --> 00:00:08.640
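
Until something like this is built in, the offending line can be looked up from the original input. The sketch below assumes each error object carries a 1-based line, an optional col and a message. Those field names are an assumption for illustration and should be checked against what the parser actually returns. describeError is a hypothetical helper.

```javascript
// Hypothetical helper: append the source line to a validator-style message.
// Assumes error = { line, col?, message } with a 1-based line number.
function describeError(input, error) {
  const lines = input.split(/\r\n|\r|\n/);
  const source = lines[error.line - 1];
  const where = error.col
    ? `Line ${error.line}, column ${error.col}`
    : `Line ${error.line}`;
  const base = `${where}: ${error.message}`;
  // Only quote the line when the line number points inside the input.
  return source === undefined ? base : `${base}\n    ${source}`;
}

const sample = "WEBVTT\n\n00:00:0 6.012 --> 00:00:08.640\nHello";
console.log(describeError(sample, { line: 3, col: 8, message: "Must be exactly two digits." }));
```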

[Enhancement] RFC8216 - support X-TIMESTAMP-MAP

The HLS specification, RFC 8216 (https://datatracker.ietf.org/doc/html/rfc8216#section-3.5), extends WebVTT with an X-TIMESTAMP-MAP header.

In order to synchronize timestamps between audio/video and subtitles,
an X-TIMESTAMP-MAP metadata header SHOULD be added to each WebVTT
header. This header maps WebVTT cue timestamps to MPEG-2 (PES)
timestamps in other Renditions of the Variant Stream. Its format is:

X-TIMESTAMP-MAP=LOCAL:<cue time>,MPEGTS:<MPEG-2 time>
e.g., X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

Although X-TIMESTAMP-MAP is not part of the W3C spec, it is now part of an official RFC.

It would be useful if webvtt.js supported this header without generating an error, so that HLS-segmented WebVTT files can be validated using webvtt.js.
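
A header line of this shape is straightforward to recognise. The following sketch parses it into its two parts. parseTimestampMap is a hypothetical helper rather than an existing webvtt.js function, and it only covers the format quoted above.

```javascript
// Hypothetical helper: parse "X-TIMESTAMP-MAP=LOCAL:<cue time>,MPEGTS:<MPEG-2 time>".
// Returns { local, mpegts } or null when the line does not match.
function parseTimestampMap(line) {
  const prefix = "X-TIMESTAMP-MAP=";
  if (!line.startsWith(prefix)) return null;
  const result = {};
  for (const part of line.slice(prefix.length).split(",")) {
    // Split on the first colon only: the cue time itself contains colons.
    const sep = part.indexOf(":");
    if (sep === -1) return null;
    const key = part.slice(0, sep).trim();
    const value = part.slice(sep + 1).trim();
    if (key === "LOCAL" && value !== "") result.local = value;
    else if (key === "MPEGTS" && /^\d+$/.test(value)) result.mpegts = Number(value);
  }
  if (result.local === undefined || result.mpegts === undefined) return null;
  return result;
}

console.log(parseTimestampMap("X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000"));
// { local: '00:00:00.000', mpegts: 900000 }
```

A validator could use a check like this to skip the header line instead of reporting it as a missing blank line after the signature.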

Test case using https://quuz.org/webvtt/

Current behavior "Line 2: No blank line after the signature."
Expected behavior "This is boring, your WebVTT is valid! (1ms)"

WEBVTT - This file has cues. ; Kind: captions; Language: en
X-TIMESTAMP-MAP=LOCAL:00:00:00.000,MPEGTS:900000

00:11.000 --> 00:13.000 vertical:rl
<v Roger Bingham>We are in New York City

00:13.000 --> 00:16.000
<v Roger Bingham>We're actually at the Lucern Hotel, just down the street

00:16.000 --> 00:18.000
<v Roger Bingham>from the American Museum of Natural History

00:18.000 --> 00:20.000
<v Roger Bingham>And with me is Neil deGrasse Tyson

00:20.000 --> 00:22.000
<v Roger Bingham>Astrophysicist, Director of the Hayden Planetarium

Thanks!

Serializer generates invalid WebVTT

Input:

WEBVTT

00:11.000 --> 00:13.000 vertical:rl
<v Roger Bingham>We are in New York City

00:13.000 --> 00:16.000
<v Roger Bingham>We're actually at the Lucern Hotel, just down the street

00:16.000 --> 00:18.000
<v Roger Bingham>from the American Museum of Natural History

00:18.000 --> 00:20.000
<v Roger Bingham>And with me is Neil deGrasse Tyson

00:20.000 --> 00:22.000
<v Roger Bingham>Astrophysicist, Director of the Hayden Planetarium

Code

// str holds the WebVTT input shown above
var webvtt = require("webvtt-parser");
console.log(new webvtt.WebVTTSerializer().serialize(new webvtt.WebVTTParser().parse(str).cues))

Output:

11 13
<v Roger Bingham>We are in New York City</v>

13 16
<v Roger Bingham>We're actually at the Lucern Hotel, just down the street</v>

16 18
<v Roger Bingham>from the American Museum of Natural History</v>

18 20
<v Roger Bingham>And with me is Neil deGrasse Tyson</v>

20 22
<v Roger Bingham>Astrophysicist, Director of the Hayden Planetarium</v>

It seems the serializer assumes that startTime/endTime are already strings, but they are numbers.

In addition, the vertical:rl setting is dropped, and the --> tokens are missing.
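
For reference, turning a numeric time in seconds back into a WebVTT timestamp takes only a few lines. This is a sketch that assumes startTime/endTime are non-negative numbers of seconds, as the output above suggests. formatTimestamp and timingLine are hypothetical helpers, not the serializer's code, and cue settings such as vertical:rl are left out.

```javascript
// Hypothetical helper: seconds (number) -> "mm:ss.ttt" or "hh:mm:ss.ttt".
function formatTimestamp(seconds) {
  // Work in integer milliseconds to avoid floating-point drift.
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  const pad = (n, width) => String(n).padStart(width, "0");
  // Hours are optional in WebVTT and only emitted when non-zero.
  const hours = h > 0 ? pad(h, 2) + ":" : "";
  return `${hours}${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Hypothetical helper: build the timing line for a cue.
function timingLine(cue) {
  return `${formatTimestamp(cue.startTime)} --> ${formatTimestamp(cue.endTime)}`;
}

console.log(timingLine({ startTime: 11, endTime: 13 })); // "00:11.000 --> 00:13.000"
```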

Entities object incorrectly passed to WebVTTCueTextParser

Bug Description

In parser.js, the entities object is assigned to this on line 29:

line 29: this.entities = entities

If we console.log(this) right after line 29, we see that this corresponds to WebVTTParser.

Next, on line 30, this.parse is assigned a function. Inside that parse function, line 177 attempts to pass the same entities object to the WebVTTCueTextParser:

line 177: var cuetextparser = new WebVTTCueTextParser(cue.text, err, mode, this.entities)

However, if we do another console.log(this) here, we see that it corresponds to Window, because it is evaluated inside the function.

So this on line 29 is not the same as this on line 177, meaning that the argument this.entities passed to the WebVTTCueTextParser on line 177 will always be undefined. This causes a bug later, on line 639 of the parser:

if(self.entities[buffer]) {

This throws Uncaught TypeError: Cannot read properties of undefined (reading '&M'), where buffer is &M and self.entities is undefined.
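
The difference in this is ordinary JavaScript behaviour: for a plain function, this is decided at call time, so a method that is destructured from its object and then called on its own (as the reproduction steps in this issue do with const { parse } = new WebVTTParser()) no longer sees the instance. The sketch below demonstrates this with a hypothetical DemoParser constructor. It is not the real parser.js, only an illustration of why reading the closed-over entities argument is more robust than reading this.entities.

```javascript
// Hypothetical constructor mirroring the pattern described above.
function DemoParser(entities) {
  this.entities = entities;

  // Depends on the call-time `this`: breaks when the method is detached.
  this.parseViaThis = function () {
    return this instanceof DemoParser ? this.entities : undefined;
  };

  // Uses the constructor argument captured by the closure: always available.
  this.parseViaClosure = function () {
    return entities;
  };
}

const parser = new DemoParser({ "&amp;": "&" });
const { parseViaThis, parseViaClosure } = parser;

console.log(parser.parseViaThis() !== undefined); // true: called as a method
console.log(parseViaThis() !== undefined);        // false: `this` is no longer the parser
console.log(parseViaClosure() !== undefined);     // true: closure still sees entities
```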


Proposed Change

The proposed solution is to update how entities is passed to the WebVTTCueTextParser constructor on line 177 of parser.js:

Before:

var cuetextparser = new WebVTTCueTextParser(cue.text, err, mode, this.entities)

After:

var cuetextparser = new WebVTTCueTextParser(cue.text, err, mode, entities)

Steps to Reproduce:

const vttData = `
WEBVTT

1
00:11:46.140 --> 00:11:48.380
Texas A&M`

const { parse } = new WebVTTParser();
const parsed = parse(vttData, "metadata");
// ===> Uncaught TypeError: Cannot read properties of undefined (reading '&M')
