subtitles-grouping

Groups subtitles (srt files) by sync. Useful when retrieving results from OpenSubtitles.

Why?

OpenSubtitles has two ways of matching subtitles: by metadata (e.g. IMDB ID) and by MovieHash. The first looks up subtitles for a particular movie or TV episode, while the second retrieves them for a specific video file.

Matching by MovieHash is the better method, since it is meant to ensure that the subtitles are synced to that particular video file.

However, some OpenSubtitles users upload subtitles for a particular video file with the wrong sync. This is where grouping the subtitles by sync helps.

The basic idea is that once the subtitles are grouped, we select the group that contains the most MovieHash matches. As long as most MovieHash matches are correct (which is assumed to always hold), this is the right sync group for that video. That way we weed out the "odd" subtitles and also pick up the correctly synced subtitles from the metadata-based matches (usually 80% of the matches).
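The selection step described above can be sketched as follows. This is only an illustration, not part of the module: `pickBestGroup` is a hypothetical helper, and the `byMovieHash` flag is an assumed property that the caller would set on each subtitle before grouping (the module itself is only documented to add `.heatmap`).

```javascript
// Hypothetical helper, not part of subtitles-grouping's API.
// Assumes every subtitle object carries a boolean `byMovieHash` flag
// that the caller set for results returned by a MovieHash lookup.
function pickBestGroup(groups) {
  var best = null;
  var bestCount = -1;
  groups.forEach(function (group) {
    // Count how many subtitles in this sync group came from MovieHash.
    var count = group.filter(function (sub) {
      return sub.byMovieHash === true;
    }).length;
    if (count > bestCount) {
      best = group;
      bestCount = count;
    }
  });
  return best; // null when `groups` is empty
}

// Made-up sample data: two sync groups, as groupSubtitles() might return them.
var groups = [
  [
    { id: "a", byMovieHash: true },
    { id: "b", byMovieHash: true },
    { id: "c", byMovieHash: false }
  ],
  [{ id: "d", byMovieHash: true }]
];

var best = pickBestGroup(groups);
console.log(best.map(function (sub) { return sub.id; }));
```

In this sketch the first group wins because it holds two MovieHash matches, and the metadata-only subtitle "c" is kept because it shares that group's sync.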

Example

This is what the example looks like. Each line represents a heatmap of an srt file (by time), and each colour represents a sync group. The brighter lines were picked by MovieHash. You can clearly see how the two groups differ in sync. You can also see that the red group has more bright lines, meaning most MovieHash picks are there. This is the correct sync group. The green (wrong) group has one bright line, meaning we found a wrong MovieHash pick and filtered it out. If we had also grouped all non-MovieHash picks, the correctly synced ones would be added to the red group.

API

var groupSubtitles = require("subtitles-grouping");

// Arguments: array of subtitles, callback, options
groupSubtitles([
	{ id: "3562019", uri: "http://dl.opensubtitles.org/en/download/filead/src-api/vrf-52c7037c6b/sid-vo81ml26hrarcsciua7gd44ta6/1952189414.gz" },
	{ id: "3963807", uri: "./examples/dexter-4x1/3963807.srt" },
	{ id: "3567323", uri: "./examples/dexter-4x1/3567323.srt" },
	{ id: "3562666", uri: "./examples/dexter-4x1/3562666.srt" }
], function(err, groups) { console.log(groups); }, { /* options: agent, sensitivity */ });

// each subtitle's uri is a URL or local path to an srt file, a gzip-compressed srt, or a zip containing an srt

// groups will be an array of groups, each group being an array of the subtitles as given to groupSubtitles(), but each also with a `.heatmap` property

// see example/example.js

Other modules

// Retrieves an srt string from a path/URL to an srt, gz or zip file
// Also converts the encoding to UTF-8
require("./lib/retriever").retrieveSrt("./examples/dexter-4x1/3963807.srt", function(err, buf) {  });

// Builds a heatmap of an srt file
require("./lib/heatmap")(/* string in srt format */); // returns an array heatmap of that srt
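To give a rough idea of what a time-based heatmap of an srt file could look like, here is a minimal sketch. It is not the implementation in ./lib/heatmap: the function name `sketchHeatmap`, the one-second bucket size, and the output shape (one counter per bucket) are all assumptions made for illustration.

```javascript
// Illustrative only: bucket size and output shape are assumptions,
// not the actual behaviour of ./lib/heatmap.

// Converts four consecutive regex captures (hh, mm, ss, ms) to milliseconds.
function toMs(match, i) {
  return ((+match[i] * 60 + +match[i + 1]) * 60 + +match[i + 2]) * 1000 + +match[i + 3];
}

// Returns an array where index b counts the cues visible during
// the time bucket [b * bucketMs, (b + 1) * bucketMs).
function sketchHeatmap(srt, bucketMs) {
  bucketMs = bucketMs || 1000;
  var timing = /(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})/g;
  var heatmap = [];
  var match;
  while ((match = timing.exec(srt)) !== null) {
    var start = toMs(match, 1);
    var end = toMs(match, 5);
    for (var b = Math.floor(start / bucketMs); b * bucketMs < end; b++) {
      heatmap[b] = (heatmap[b] || 0) + 1;
    }
  }
  // Fill buckets that no cue touched with 0.
  for (var i = 0; i < heatmap.length; i++) {
    heatmap[i] = heatmap[i] || 0;
  }
  return heatmap;
}

// Made-up two-cue srt string.
var sample = "1\n00:00:01,000 --> 00:00:03,000\nHello\n\n" +
             "2\n00:00:04,500 --> 00:00:05,000\nWorld\n";
console.log(sketchHeatmap(sample)); // [ 0, 1, 1, 0, 1 ]
```

Two srt files with the same sync would produce similar arrays under this scheme, which is the kind of signal that grouping by sync could rely on.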

CLI

subtitles-grouping [path to srt] [path to srt] ...

Contributors

  • Big thanks to OpenSubtitles for providing subtitles dumps to test with
