
SubPlz🫴: Get Incredibly Accurate Subs for Anything


Generate accurate subtitles from audio, align existing subs to videos, or generate your own Kindle Immersion Reading-style audiobook subs.

This tool allows you to use AI models to generate subtitles from only audio, then match the subtitles to an accurate text, like a book. It supports synchronizing existing subs as well, and soon you will also be able to generate subtitles for videos without needing any existing subtitles. Currently I am only developing this tool for Japanese use, though rumor has it the language flag can be used for other languages too.

It requires a modern GPU with decent VRAM, a decent CPU, and plenty of RAM. There's also a community-built Google Colab notebook available on the Discord.

Current State:

  • The subtitle timings will be 99.99% accurate for most intended use cases.
  • The timings will be mostly accurate, but a line may come in late or leave early.
  • Occasionally, the first word of the next line will show up at the end of the previous subtitle.
  • Occasionally, non-spoken things like sound effects in subtitles will be combined with other lines.
  • Known Issues: RAM usage. 5+ hr audiobooks can take more than 12 GB of RAM. I can't run a 19 hr one with 48 GB of RAM. The current workaround is to use an epub + chaptered m4b audiobook. Then we can automatically split the ebook text and audiobook chapters to sync in smaller chunks accurately. Alternatively, you could use multiple text files and mp3 files to achieve a similar result.

Accuracy has improved tremendously with the latest updates to the AI tooling used. Sometimes the first few lines will be off slightly but will quickly autocorrect. If it gets off midway, it autocorrects. Sometimes multiple lines get bundled together, making large subtitles, but it's not usually an issue.

How does this compare to Alass for video subtitles?

  • Alass is usually either 100% right once it gets lined up - or way off and unusable. In contrast, SubPlz is probably right 95% of the time, but may have a few of the above issues. Ideally you'd have both types of subtitle available and could switch from an Alass version to a SubPlz version if need be. Alternatively, since SubPlz is consistent, you could just default to always using it if you find it to be "good enough"

Support for this tool can be found on KanjiEater's thread on The Moe Way Discord

Support for any tool by KanjiEater can be found on KanjiEater's Discord

Support

The Deep Weeb Podcast - Sub Please 😉

If you find my tools useful, please consider supporting via Patreon. I have spent countless hours making these useful for not only myself but others as well, and am now offering them completely free.

If you can't contribute monetarily, please consider following on a social platform, joining the Discord, and sharing this with a friend.

How to use

Quick Guide

  1. Put an audio/video file and a text file in a folder.
    1. Audio / Video files: m4b, mkv, or any other audio/video file
    2. Text files: srt, vtt, ass, txt, or epub
/sync/
└── /Harry Potter 1/
   ├── Im an audio file.m4b
   └── Harry Potter.epub
└── /Harry Potter 2 The Spooky Sequel/
   ├── Harry Potter 2 The Spooky Sequel.mp3
   └── script.txt
  2. List the directories you want to run this on. The -d parameter can accept multiple audiobooks to process, like: subplz sync -d "/mnt/d/sync/Harry Potter 1/" "/mnt/d/sync/Harry Potter 2 The Spooky Sequel/"
  3. Run subplz sync -d "<full folder path>", like subplz sync -d "/mnt/d/sync/Harry Potter 1/"
  4. From there, use a texthooker with something like mpv_websocket and enjoy Immersion Reading.
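
For example, after a successful sync of the Quick Guide folder above, you'd have an srt named after the audio file. A minimal sketch of playing it with MPV (MPV auto-loads an srt that shares the audio file's name, so --sub-file is only needed when the names differ):

mpv "/mnt/d/sync/Harry Potter 1/Im an audio file.m4b" --sub-file="/mnt/d/sync/Harry Potter 1/Im an audio file.srt"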

Install

Currently supports Docker (preferred), Windows, and Unix-based OSes like Ubuntu 22.04 on WSL2. Primarily supports Japanese, but other languages may work as well with limited dev support.

Run from Colab

  1. Open this Colab
  2. In Google Drive, create a folder named sync in the root of MyDrive
  3. Upload the audio/video file and supported text to your sync folder
  4. Open the Colab. You can change the last line if you want, like -d "/content/drive/MyDrive/sync/Harry Potter 1/" for the Quick Guide example
  5. In the upper menu, click Runtime > Run all, give the necessary permissions, and wait for it to finish; it should take around 30 min for an average book

Running from Docker

  1. Install Docker

  2. docker run -it --rm --name subplz \
    -v <full path up to content folder>:/sync \
    -v <your SyncCache folder path>:/app/SyncCache \
    kanjieater/subplz:latest \
    sync -d "/sync/<content folder>/"

    Example:

    /mnt/d/sync/
             └── /変な家/
                   ├── 変な家.m4b
                   └── 変な家.epub
    docker run -it --rm --name subplz \
    --gpus all \
    -v /mnt/d/sync:/sync \
    -v /mnt/d/SyncCache:/app/SyncCache \
    kanjieater/subplz:latest \
    sync -d "/sync/変な家/"

    a. Optional: --gpus all will let you run with a GPU. If this doesn't work, make sure you've enabled your GPU in Docker (outside the scope of this project)

    b. -v <your folder path>:/sync, e.g. -v /mnt/d/sync:/sync. This is where the files that you want to sync are. The part to the left of the : is the path on your machine; the part to the right is what the app will see as the folder name.

    c. The SyncCache mapping works the same way as the sync folder mapping: it just maps a location on your machine into the container. As long as the app can find the SyncCache folder, it will be able to resync things much faster.

    d. <command> <params>, e.g. sync -d "/sync/変な家/"; this runs subplz <command> <params> just as you would outside of Docker
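
    e. Any other subplz invocation works the same way. As a minimal sketch (no volume mounts needed since nothing is read or written, and assuming the image's entrypoint is subplz as in the examples above), this just prints the sync help text:

    docker run -it --rm kanjieater/subplz:latest sync -h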

Setup from source

  1. Install ffmpeg and make it available on the path

  2. git clone https://github.com/kanjieater/SubPlz.git

  3. Use Python >= 3.11.2 (the latest working version is always specified in pyproject.toml)

  4. pip install .

  5. You can get a full list of CLI params from subplz sync -h

  6. If you're using a single file for the entire audiobook with chapters, you are good to go. If a file with audio is too long, it may use up all of your RAM. You can use the m4b-tool Docker image to make a chaptered audio file. Trust me, you want the improved codecs that are included in the Docker image: I tested both and noticed a huge drop in sound quality without them. When lossy formats like mp3 are transcoded they lose quality, so it's important to use the Docker image to retain the best quality if you plan to listen to the audio file.

Note

  • This script can be GPU-, RAM-, and CPU-intensive. subplz sync -d "<full folder path>", e.g. subplz sync -d "/mnt/d/Editing/Audiobooks/かがみの孤城/". This runs each file to get a character-level transcript. It then creates a sub format that can be matched to the script.txt. Each character-level subtitle is merged into a phrase-level one, and your result should be a <name>.srt file. The video or audio file can then be watched with MPV, playing audio in time with the subtitles.
  • Users with a modern CPU with lots of threads won't notice much of a difference between CUDA/GPU and CPU

Sort Order

By default, the -d parameter will pick up the supported files in the given directories. Ensure that your OS sorts them in the order that you want them to be stitched together in. Sort them by name; as long as all of the audio files are in order and all of the text files are in the same order, they'll be "zipped" up with each other one-to-one, as in the example below.
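
For example, with this hypothetical layout, name-sorting pairs the files one-to-one:

/sync/
└── /Harry Potter 1/
   ├── 01 Audio.mp3
   ├── 02 Audio.mp3
   ├── 01 Text.txt
   └── 02 Text.txt

Here 01 Audio.mp3 is zipped with 01 Text.txt, and 02 Audio.mp3 with 02 Text.txt.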

Overwrite

By default, the tool will overwrite any existing srt named after the audio file. If you don't want it to do this, you must explicitly tell it not to:

subplz sync -d "/mnt/v/somefolder" --no-overwrite

Only Running for the Files It Needs

For subtitles, SubPlz renames the sub file matching the audio to <audiofile>.old.<sub ext>. This ensures that SubPlz runs once and only once per directory for your content. If you want to rerun the syncing, use the --rerun flag to use the matching .old file and ignore all subs that aren't .old, as in the example below.
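
For example (hypothetical names), a directory that started with Episode 01.mkv and Episode 01.srt would, after one run, contain:

/sync/Some Show/
├── Episode 01.mkv
├── Episode 01.srt        (the new SubPlz-aligned sub)
└── Episode 01.old.srt    (your original sub, used again on --rerun)

subplz sync -d "/sync/Some Show/" --rerun would then realign from the .old file.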

Respect Transcript Grouping

If you want to allow the tool to break lines up into smaller chunks, use the --no-respect-grouping flag, as shown below.
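
For example, reusing the folder from the Overwrite example above:

subplz sync -d "/mnt/v/somefolder" --no-respect-grouping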

Tuning Recommendations

For different use cases, different parameters may be optimal.

For Audiobooks

  • Recommended: subplz sync -d "/mnt/d/sync/Harry Potter"
  • A chaptered m4b file will allow us to split up the audio and do things in parallel
  • There can be slight variations between epub and txt files, like where full-width character spaces aren't picked up in epub but are in txt. A chaptered epub may be faster, but you have more control over what text gets synced from a txt file if you need to manually remove things (epub is still probably the easier option, and very reliable)
  • If the audio and the text differ greatly - like full sections of the book being read in a different order - you will want to use --no-respect-grouping to let the algorithm remove content for you
  • The default --model "tiny" seems to work well and is much faster than other models. If your transcript is inaccurate, consider using a larger model to compensate, as in the example below
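
For example, a sketch of swapping in a larger model when the default transcript is inaccurate (any Whisper model name works here; "base" is just one step up from "tiny"):

subplz sync -d "/mnt/d/sync/Harry Potter" --model base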

For Realigning Subtitles

  • Recommended: subplz sync --model large-v3 -d "/mnt/v/Videos/J-Anime Shows/Sousou no Frieren"
  • I highly recommend running with something like --model "large-v3", as subtitles often have sound effects or other things that won't be picked up by transcription models. Using a large model will take much longer (a 24 min episode can go from 30 seconds to 4 minutes for me), but it will be much more accurate.
  • Subs can be cut off in strange ways if you have an unreliable transcript, so you may want to use --respect-grouping (the default). If you find your subs frequently have very long subtitle lines, consider using --no-respect-grouping

Anki Support

  • Generates subs2srs style deck
  • Imports the deck into Anki automatically

The Anki support currently takes your m4b file in <full_folder_path>, named <name>.m4b where <name> is the name of the media, and outputs srs audio and a TSV file that is sent via AnkiConnect to Anki. This is useful for searching across GoldenDict to find sentences that use a word, or for merging automatically with custom scripts (hopefully more releases to support this are coming).

  1. Install the AnkiConnect add-on in Anki.
  2. I recommend using ANKICONNECT as an environment variable. Set export ANKICONNECT=localhost:8765 or export ANKICONNECT="$(hostname).local:8765" in your ~/.zshrc or ~/.bashrc and activate it.
  3. Make sure you are in the project directory: cd ./AudiobookTextSync
  4. Run pip install -r ./requirements.txt (only needs to be done once)
  5. Set ANKI_MEDIA_DIR to your Anki profile's media path, e.g. /mnt/f/Anki2/KanjiEater/collection.media/
  6. Run the command below

Command: ./anki.sh "<full_folder_path>"

Example: ./anki.sh "/mnt/d/sync/kokoro/"
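
As a sketch of steps 2 and 5 together (the paths are the examples from above; substitute your own), the environment setup in ~/.zshrc or ~/.bashrc might look like:

export ANKICONNECT="localhost:8765"
export ANKI_MEDIA_DIR="/mnt/f/Anki2/KanjiEater/collection.media/"

After activating it with source ~/.zshrc, run the anki.sh command as shown above.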

FAQ

Can I run this with multiple audio files and one script?

It's not recommended. You will have a bad time.

If your audiobook is huge (e.g. 38 hours long across 31 audio files), then break each section into an m4b or audio file with a text file for it: one text file per audio file. This will work fine.

But it can work in very specific circumstances. The exception to the Sort Order rule is when we find one transcript and multiple audio files. We'll assume that's something like a bunch of mp3s or other audio files that you want to sync to a single transcript, like an epub. This only works if the epub chapters and the mp3s match (see the layout below). Txt files don't work very well for this case currently. I still don't recommend it.
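
If you do try it anyway, the layout would look something like this (hypothetical names; the epub chapters must line up one-to-one with the mp3s):

/sync/
└── /Harry Potter 1/
   ├── Harry Potter.epub
   ├── 01 Chapter.mp3
   ├── 02 Chapter.mp3
   └── 03 Chapter.mp3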

How do I get a bunch of MP3s into one file then?

Please use m4b for audiobooks. I know you may have gotten them as mp3s and it's an extra step, but m4b is the audiobook format.

I've heard of people using https://github.com/yermak/AudioBookConverter

Personally, I use the Docker image for m4b-tool. If you go down this route, make sure you use the Docker version of m4b-tool, as the improved codecs are included in it. I tested m4b-tool without the Docker image and noticed a huge drop in sound quality. When lossy formats like mp3 are transcoded they lose quality, so it's important to use the Docker image to retain the best quality. I use helpers/merge2.sh to merge audiobooks together in batch with this method; a sketch of a single merge follows below.
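
As a rough sketch of that workflow (the image name and flags come from m4b-tool's own documentation, not this project; double-check them before running), a Docker-based merge looks something like:

docker run -it --rm -u $(id -u):$(id -g) -v "$(pwd)":/mnt sandreas/m4b-tool:latest merge "./my-mp3-book/" --output-file="./my-mp3-book.m4b"

This keeps the transcoding inside the image's bundled codecs, which is the whole point of using the Docker version.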

Alternatively, you could use ChatGPT to help you combine them. Something like this (Colab-style; the rm keeps reruns from appending to an old list):

!rm -f mylist.txt
!for f in "/content/drive/MyDrive/name/成瀬は天下を取りに行く/"*.mp3; do echo "file '$f'" >> mylist.txt; done
!ffmpeg -f concat -safe 0 -i mylist.txt -c copy output.mp3

Thanks

Besides the ones already mentioned and installed, this project uses other open source projects: subs2cia and anki-csv-importer.

https://github.com/gsingh93/anki-csv-importer

https://github.com/kanjieater/subs2cia

https://github.com/ym1234/audiobooktextsync

Other Cool Projects

A cool tool to turn these audiobook subs into Visual Novels


subplz's Issues

Need some help setting it up

I don't think this is an issue on your part, I just don't know how to run it.

I have researched and followed a guide for the following:
How to create a WSL instance.
Followed a guide and installed Python 3.9.9 successfully on WSL with Ubuntu. I'm confident I did this part right.
Installed pip. Followed a guide to install ffmpeg on Ubuntu, and added the path like this: export PATH=$PATH:/bin/ffmpeg
Installed stable-ts. Added it to the path just in case with export PATH=$PATH:$HOME/pacote/ since I got the yellow message saying it was not on the path.
After this, step 4 sounds like it's optional, is it? I just want to run this; if it's optional I'd skip it until I can get it to work.
So I didn't do step 4.
Then I tried to run it with a single audiobook file, very small.
The folder is called "name" and the audiobook file inside is "name.m4b"; in the folder I also have "script.txt" with the ebook in it.
When I run ./run.sh "$(wslpath -a "wsl.localhost\Ubuntu\home\pacote\name")" (I tried copying your example) I only get a new line ">" and a blinking _
(if I add another \ after "name" like in your example, plus another ")", it just gives me bash: ./run.sh: No such file or directory)

Would you mind explaining what I'm missing?

edit: here's a picture of how it is, just in case: [screenshot attached]

WSL2 Ubuntu 22.04.2 LTS package conflicts

Hi, I know this is not really an issue with your program, but I'm hoping I can get some clues on how to fix this. I've installed Python 3.9.9 by compiling from source and then ran python3.9 -m pip install -r requirements.txt --user. I've double-checked that the python3.9 binary is Python 3.9.9.

However, when I run this I get:

ERROR: Cannot install openai-whisper and requests==2.23.0 because these package versions have conflicting dependencies.

The conflict is caused by:
    The user requested requests==2.23.0
    tiktoken 0.3.1 depends on requests>=2.26.0

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/user_guide/#fixing-conflicting-dependencies

I've actually also tried running this pip install with later versions of Python and I'm having the same issue.

IndexError('list index out of range')

I'm trying this out for the first time, and I encountered this error while saving the VTT file. I'm not sure what's causing it, but it seems to be an issue with Python. It's probably my fault somehow, but I have no idea what it could be.

C:\Users\ \Desktop\AudiobookTextSync-master>python run.py -d "E:\Books\とある飛空士への追憶"
Working on E:\Books\とある飛空士への追憶\
Splitting script into sentences
100%| 6286/6286 [00:01<00:00, 3484.25it/s]
'1 file will be processed:'
['E:\\Books\\とある飛空士への追憶\\とある飛空士への追憶（ガガガ文庫） （小学館）.m4b']
100%| 1/1 [07:10<00:00, 430.95s/it]
C:\Users\ \AppData\Local\Programs\Python\Python39\lib\site-packages\stable_whisper\whisper_word_level.py:190: UserWarning: FP16 is not supported on CPU; using FP32 instead
Predicted silence(s) with VAD
100%| 31865.77/31865.77 [1:14:25<00:00, 7.14sec/s]
Saved: E:\Books\とある飛空士への追憶\とある飛空士への追憶（ガガガ文庫） （小学館）.filtered.ass
100%| 1/1 [1:40:01<00:00, 6001.90s/it]
とある飛空士への追憶.vtt
IndexError('list index out of range')
The following 1 failed:
'E:\\Books\\とある飛空士への追憶\\'
IndexError('list index out of range')

The resulting vtt file got about 8 minutes in.

System:
Windows 10 (wsl2 installed)
Python 3.9.9

Terminating Error due to UnicodeDecodeError

Hi, I have this installed in a fresh conda environment, but it's failing after creating the vtt file with Whisper. I'm getting the below error and am not quite sure how to navigate to the offset it indicates in the vtt file (?) in order to evaluate which character is causing the error. I converted the epub to txt using Calibre with UTF-8 encoding, and it displays fine in a text document.

The process terminates upon encountering this error. I'm not quite sure how to troubleshoot this as again, I'm not sure how to navigate to that offset in the text. I've included a sample of the text from the book in an attached text file - to avoid any copyright issues I've only included the first bit of text in case that helps illuminate the issue. Please let me know if you need any other information to help troubleshoot.

The only (uneducated) guess I have is that the error is induced by the fact that the vtt file has "WEBVTT" prepended to the beginning of its contents prior to the transcription.

I'm running Windows 10 with Python 3.10, 64GB of RAM, an i7, and a 4090. I'm not sure if this error could be due to the instructions saying to use Python 3.9.9 - if it is, I apologize. Everything seems to be working up to this point, so I figured that was not the issue, hence the post.
sample.txt

Thanks in advance for the help!

Alignment Problems with Spanish

I think this has to do more with stable-ts, and not much to do with your program. But I'm posting anyway, as it might help someone else.

What I found was that using this for Spanish, the subtitles were often broken up in an illogical way, e.g., one sentence would be split over a few subtitles, a semicolon might also split a subtitle, or the last word in a sentence would get its own subtitle line. Once I set 'regroup' to false, that problem was fixed. Now the subtitles themselves are split perfectly, i.e., each sentence gets its own subtitle line, which is exactly what's needed for sentence mining.

However, the alignment is off by some amount on every line. It's not consistent, so I can't fix it by adjusting the subtitle delay. I've set the model to small, the language to es, and regrouping to false; otherwise, that's all I've changed. Given that it's matching the transcription with actual text, what might cause bad alignment? Is it possible that unrecognized symbols in the text might cause this, e.g., "—", "¿", "«"?

I've checked the text matches and extracted only small portions to make sure there is nothing extra in there other than what's spoken in the audiobook.

Am I hoping for too much to expect that I'll get almost consistently good alignment? That is the reason for wanting to use this, after all, to sentence mine efficiently from books.
