
simple-yet-powerful-srt-subtitle-parser-cpp's Introduction

srtparser.h: Simple, yet powerful C++ SRT Subtitle Parser Library.

srtparser.h is a simple and powerful single-header C++ srt subtitle parsing library that lets you easily handle, process and manipulate srt subtitle files in your project. It is an extension of Oleksii Maryshchenko’s simple subtitle-parser. It has the following features:

  1. It is a single header C++ (CPP) file, and can be easily used in your project.

  2. Focuses on portability, efficiency and simplicity, with no external dependencies.

  3. A wide variety of functions at the programmer's disposal to parse srt files as needed.

  4. Capable of:

    • extracting and stripping HTML and other styling tags from subtitle text.

    • extracting and stripping speaker names.

    • extracting and stripping non-dialogue text.

  5. Easy to extend with new functionality.
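
To make item 4 more concrete, here is a minimal, self-contained sketch of what stripping style tags and non-dialogue text can look like. It is not the code used inside srtparser.h. The names stripDelimited, stripStyleTags and stripNonDialogue are hypothetical, and the sketch assumes that style tags are delimited by < and > and that non-dialogue text is written in parentheses, as in (laughter).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of srtparser.h: removes every span delimited
// by `open` and `close` from `text` and appends each removed span (delimiters
// included) to `removed`. An `open` with no matching `close` is kept as text.
std::string stripDelimited(const std::string &text, char open, char close,
                           std::vector<std::string> &removed)
{
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text[i] == open) {
            std::size_t end = text.find(close, i + 1);
            if (end != std::string::npos) {
                removed.push_back(text.substr(i, end - i + 1));
                i = end + 1;   // skip past the removed span
                continue;
            }
        }
        out.push_back(text[i]);
        ++i;
    }
    return out;
}

// Assumption: style tags look like <i>, </i> or <font color="...">.
std::string stripStyleTags(const std::string &text,
                           std::vector<std::string> &tags)
{
    return stripDelimited(text, '<', '>', tags);
}

// Assumption: non-dialogue text is written in parentheses, e.g. (laughter).
std::string stripNonDialogue(const std::string &text,
                             std::vector<std::string> &nonDialogue)
{
    return stripDelimited(text, '(', ')', nonDialogue);
}
```

Each helper returns the cleaned text and collects what it removed, which mirrors the "extracting and stripping" wording above: the stripped pieces stay available to the caller instead of being discarded.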

How to use srtparser.h

General usage

srtparser.h is a robust, cross-platform srt subtitle parser.

SubtitleParserFactory *subParserFactory = new SubtitleParserFactory("inputFile.srt");
SubtitleParser *parser = subParserFactory->getParser();

//to get subtitles

std::vector<SubtitleItem*> sub = parser->getSubtitles();
  • Call appropriate functions to perform parsing.

See the demo usage in the examples directory.

Parser Functions

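The following is a complete list of the available parser functions. In the examples for SubtitleItem functions, sub stands for a single SubtitleItem* taken from the vector returned by getSubtitles().

Syntax:

Each entry lists, in order: Class, Return Type, Function, Description.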

SubtitleParserFactory

SubtitleParserFactory

SubtitleParserFactory("inputFile.srt")

Creates a SubtitleParserFactory object. Here inputFile.srt is the path of the subtitle file to be parsed. This object is used to create the parser.

E.g.: SubtitleParserFactory *subParserFactory = new SubtitleParserFactory("inputFile.srt");

SubtitleParserFactory

SubtitleParser

getParser()

Returns the SubtitleParser object. This object will be used to parse the subtitle file.

E.g.: SubtitleParser *parser = subParserFactory->getParser();

SubtitleParser

std::vector<SubtitleItem*>

getSubtitles()

Returns the subtitles as a vector of pointers to SubtitleItem objects.

E.g.: std::vector<SubtitleItem*> sub = parser->getSubtitles();

SubtitleParser

std::string

getFileData()

Returns the complete file data, read as-is from inputFile.srt.

E.g.: std::string fileData = parser->getFileData();

SubtitleItem

long int

getStartTime()

Returns the starting time of the subtitle in milliseconds.

E.g.: long int startTime = sub->getStartTime();

SubtitleItem

long int

getEndTime()

Returns the ending time of the subtitle in milliseconds.

E.g.: long int endTime = sub->getEndTime();

SubtitleItem

std::string

getStartTimeString()

Returns the starting time of the subtitle in srt format.

E.g.: std::string startTime = sub->getStartTimeString();

SubtitleItem

std::string

getEndTimeString()

Returns the ending time of the subtitle in srt format.

E.g.: std::string endTime = sub->getEndTimeString();

SubtitleItem

std::string

getText()

Returns the subtitle text as present in the .srt file.

E.g.: std::string text = sub->getText();

SubtitleItem

std::string

getDialogue(bool keepHTML, bool doNotIgnoreNonDialogues, bool doNotRemoveSpeakerNames)

Returns the subtitle text after processing it according to the parameters.

keepHTML = 1 to stop the parser from stripping style tags.

doNotIgnoreNonDialogues = 1 to stop the parser from ignoring and extracting non-dialogue text such as (laughter).

doNotRemoveSpeakerNames = 1 to stop the parser from ignoring and extracting speaker names.

By default, the values (0,0,0) are passed.

E.g.: std::string text = sub->getDialogue();

SubtitleItem

int

getWordCount()

Returns the number of words present in the subtitle dialogue.

E.g.: int wordCount = sub->getWordCount();

SubtitleItem

std::vector<std::string>

getIndividualWords()

Returns a string vector of the individual words present in the subtitle.

E.g.: std::vector<std::string> words = sub->getIndividualWords();

SubtitleItem

bool

getIgnoreStatus()

Returns the ignore status. Returns true if the justDialogue field, i.e. the subtitle text after processing, is empty.

E.g.: bool ignore = sub->getIgnoreStatus();

SubtitleItem

int

getSpeakerCount()

Returns the number of speakers present in the subtitle.

E.g.: int speakerCount = sub->getSpeakerCount();

SubtitleItem

std::vector<std::string>

getSpeakerNames()

Returns a string vector of speaker names.

E.g.: std::vector<std::string> speakerNames = sub->getSpeakerNames();

SubtitleItem

int

getNonDialogueCount()

Returns the number of non-dialogue words present in the subtitle.

E.g.: int nonDialogueCount = sub->getNonDialogueCount();

SubtitleItem

std::vector<std::string>

getNonDialogueWords()

Returns a string vector of non-dialogue words.

E.g.: std::vector<std::string> nonDialogueWords = sub->getNonDialogueWords();

SubtitleItem

int

getStyleTagCount()

Returns the number of style tags present in the subtitle.

E.g.: int styleTagCount = sub->getStyleTagCount();

SubtitleItem

std::vector<std::string>

getStyleTags()

Returns a string vector of style tags.

E.g.: std::vector<std::string> styleTags = sub->getStyleTags();

SubtitleWord

std::string

getText()

Returns the subtitle text as present in the .srt file.

E.g.: std::string text = sub->getText();
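
The relationship between the millisecond values returned by getStartTime() and getEndTime() and the srt-format strings returned by getStartTimeString() and getEndTimeString() can be sketched as follows. This is an illustration only, not the library's own code: srtTimeToMs and msToSrtTime are hypothetical names, and the sketch assumes the usual HH:MM:SS,mmm layout of srt timestamps.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper, not part of srtparser.h: converts "HH:MM:SS,mmm"
// to milliseconds. Returns -1 if the string does not match that layout.
long int srtTimeToMs(const std::string &stamp)
{
    int h = 0, m = 0, s = 0, ms = 0;
    if (std::sscanf(stamp.c_str(), "%d:%d:%d,%d", &h, &m, &s, &ms) != 4)
        return -1;
    return ((h * 60L + m) * 60L + s) * 1000L + ms;
}

// Hypothetical helper: formats a non-negative millisecond count
// as "HH:MM:SS,mmm".
std::string msToSrtTime(long int totalMs)
{
    long int ms = totalMs % 1000;
    long int s  = (totalMs / 1000) % 60;
    long int m  = (totalMs / 60000) % 60;
    long int h  = totalMs / 3600000;
    char buf[32];
    std::snprintf(buf, sizeof buf, "%02ld:%02ld:%02ld,%03ld", h, m, s, ms);
    return buf;
}
```

Working in milliseconds makes it easy to shift or compare subtitle timings with plain integer arithmetic, while the string form is what appears in the .srt file itself.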

Examples

While I’ve tried to include examples in the list above, a single C++ program that brings all of them together can be found in the example directory.

Contributing

Suggestions, feature requests, PRs, bug reports and bug fixes are welcome. I’ll be thankful.

Credits

Built upon an MIT-licensed simple subtitle parser called LibSub-Parser by Oleksii Maryshchenko.

The original parser had 3 major functions: getStartTime(), getEndTime() and getText().

The rest of the work was done by Saurabh Shrivastava, originally for use in his GSoC project.
