
sponsor-inference's Introduction

Labeling Sponsored Segments in a YouTube Video

Install

git clone https://github.com/anaselmhamdi/sponsor-inference.git
cd sponsor-inference
pip3 install -r requirements.txt
wget https://anas-models.s3.amazonaws.com/tut7-model.pt --directory-prefix="./app"

You'll need around 2.5 GB of free disk space and about 2 GB of RAM for it to run well.

Training with frozen BERT parameters

Training uses a 70-15-15 training-validation-test split.
The script downloads and caches the cased BERT model (11M+ parameters).

python3 app/train.py -f training_file.json
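As an illustration of the 70-15-15 split mentioned above, here is a minimal sketch in plain Python. The function name `split_70_15_15` and the fixed seed are assumptions made for this example; app/train.py may split the data differently.

```python
import random


def split_70_15_15(examples, seed=0):
    """Shuffle a copy of `examples` and cut it into train/validation/test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n = len(shuffled)
    n_train = n * 70 // 100
    n_valid = n * 15 // 100
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # the remainder, so no example is dropped
    return train, valid, test


train, valid, test = split_70_15_15(range(100))
print(len(train), len(valid), len(test))  # 70 15 15
```

Giving the remainder to the test set means the three parts always add up to the full dataset, even when its size is not a multiple of 100.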

The training file should contain a list of objects such as:

[{"This video is sponsored by Squarespace","label":"sponsor"}, {"Welcome to this NLP tutorial","label":"content"}]
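Before a long training run, it can help to sanity-check the file. The sketch below validates a list of examples and counts the labels. The example above does not name the field that holds the sentence, so `TEXT_KEY = "text"` is a hypothetical placeholder; check app/train.py for the real key. The helper name `check_training_examples` is also an assumption.

```python
import json
from collections import Counter

ALLOWED_LABELS = {"sponsor", "content"}  # the two labels shown in the example
TEXT_KEY = "text"  # hypothetical key name, not confirmed by the README


def check_training_examples(examples):
    """Return label counts, raising ValueError on the first malformed object."""
    counts = Counter()
    for i, example in enumerate(examples):
        if not isinstance(example.get(TEXT_KEY), str):
            raise ValueError(f"example {i}: missing string field {TEXT_KEY!r}")
        if example.get("label") not in ALLOWED_LABELS:
            raise ValueError(f"example {i}: unexpected label {example.get('label')!r}")
        counts[example["label"]] += 1
    return counts


examples = [
    {TEXT_KEY: "This video is sponsored by Squarespace", "label": "sponsor"},
    {TEXT_KEY: "Welcome to this NLP tutorial", "label": "content"},
]
print(check_training_examples(examples))
print(json.dumps(examples, indent=2))  # the shape a training file could take
```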

I trained this model on a dataset I built from SponsorBlock's labels.

I used youtube-dl to get the English auto-generated captions when they were available for the different videos.

The dataset is publicly available here on Kaggle.

It took 4 hours 48 minutes to train on a 16 GB GPU on a Kaggle kernel available here.

It yielded a test accuracy of 93.85%.

Inference on a sentence

python3 app/inference.py -s "This video is sponsored by Squarespace"

Should print:

{ "class": "sponsor", "probability": 0.9990846614236943 }
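If the printed result is consumed by another script, it can be parsed and thresholded. This sketch assumes the output is valid JSON exactly as shown above; the helper `is_sponsor` and the 0.5 default threshold are illustrative choices, not part of the project.

```python
import json


def is_sponsor(result_json, threshold=0.5):
    """True when the result is labeled 'sponsor' with enough confidence."""
    result = json.loads(result_json)
    return result["class"] == "sponsor" and result["probability"] >= threshold


printed = '{ "class": "sponsor", "probability": 0.9990846614236943 }'
print(is_sponsor(printed))          # True
print(is_sponsor(printed, 0.9999))  # False: probability is below the stricter threshold
```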

Inference on a YouTube video

python3 app/inference.py -u "https://www.youtube.com/watch?v=MlOPPuNv4Ec"

This downloads the video captions and labels them in 10-second chunks.
The results are written to the labeled_results.json file.
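To make the chunking step concrete, here is a minimal sketch that groups caption cues into 10-second windows. The input shape (a list of `(start_seconds, text)` pairs sorted by start time), the function name `chunk_captions`, and the output keys are all assumptions made for this example; the actual layout of labeled_results.json may differ.

```python
from collections import defaultdict


def chunk_captions(cues, chunk_seconds=10):
    """Group (start_seconds, text) caption cues into fixed-length chunks."""
    buckets = defaultdict(list)
    for start, text in cues:
        # Integer division maps each cue to the window that contains its start time.
        buckets[int(start // chunk_seconds)].append(text)
    return [
        {
            "start": index * chunk_seconds,
            "end": (index + 1) * chunk_seconds,
            "text": " ".join(buckets[index]),
        }
        for index in sorted(buckets)
    ]


cues = [
    (0.5, "welcome back"),
    (4.0, "to the channel"),
    (12.3, "this video is sponsored by"),
]
for chunk in chunk_captions(cues):
    print(chunk)
```

Each chunk's text could then be passed to the classifier one at a time, the same way a single sentence is.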

Running the FastAPI server locally

To serve the model for inference locally, run:

cd app
uvicorn main:app

You can see the routes at http://127.0.0.1:8000/docs
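The routes can also be listed from a script. FastAPI serves an OpenAPI description at /openapi.json by default, so the sketch below reads it and prints each method and path. The helper names `list_routes` and `fetch_routes` and the `/example` path in the sample are hypothetical; the README does not list the real routes.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_routes(openapi_spec):
    """Return sorted (METHOD, path) pairs from an OpenAPI document."""
    return sorted(
        (method.upper(), path)
        for path, operations in openapi_spec.get("paths", {}).items()
        for method in operations
        if method in HTTP_METHODS
    )


def fetch_routes(base_url="http://127.0.0.1:8000"):
    """Download the OpenAPI document from a running server and list its routes."""
    with urlopen(f"{base_url}/openapi.json") as response:
        return list_routes(json.load(response))


# Offline demonstration with a made-up spec; "/example" is not a real route.
sample_spec = {"paths": {"/example": {"post": {}, "get": {}}}}
print(list_routes(sample_spec))  # [('GET', '/example'), ('POST', '/example')]
```

With uvicorn running, calling `fetch_routes()` would return the routes the app actually exposes.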
