3b1b / captions Goto Github PK


transcripts and captions for 3blue1brown videos

JavaScript 4.85% HTML 2.12% Shell 0.32% TypeScript 78.56% CSS 13.18% Python 0.97%

captions's Issues

Correctly credit authors

This touches on this PR.

When I threw together my quick script, I noticed that authors who made their contributions via github_dev, and who are correctly credited on the GitHub website, are simply credited as "GitHub" in the online data. It is quite possible that I made a mistake or overlooked something, but it is something we should look into.

Here is some of the raw data:

"commits": {
        "b7c944d930ca0a0c482a4667c7c41d8d86930c31": {
            "oid": "b7c944d930ca0a0c482a4667c7c41d8d86930c31",
            "message": "Add raw translations, with associated time ranges",
            "shortMessageHtmlLink": "<a data-pjax=\"true\" title=\"Add raw translations, with associated time ranges\" class=\"color-fg-default\" href=\"/3b1b/captions/commit/b7c944d930ca0a0c482a4667c7c41d8d86930c31\">Add raw translations, with associated time ranges</a>",
            "authorAvatarUrl": "https://avatars.githubusercontent.com/u/11601040?s=80&v=4",
            "committerName": "Grant Sanderson",
            "committerEmail": "[email protected]",
            "committedDate": "2024-01-24T15:52:50.000-06:00",
            "firstParentOid": "45fee6d36feeb666cd681f57904769c0ca3164e4"
        },
        "476075345349dcfcbd378f37cf88ede3f6b8e851": {
            "oid": "476075345349dcfcbd378f37cf88ede3f6b8e851",
            "message": "Replace 'Wurdle' with 'Wordle' and improve first 200 seconds",
            "shortMessageHtmlLink": "<a data-pjax=\"true\" title=\"Replace &#39;Wurdle&#39; with &#39;Wordle&#39; and improve first 200 seconds\" class=\"color-fg-default\" href=\"/3b1b/captions/commit/476075345349dcfcbd378f37cf88ede3f6b8e851\">Replace 'Wurdle' with 'Wordle' and improve first 200 seconds</a>",
            "authorAvatarUrl": "https://avatars.githubusercontent.com/u/60841004?s=80&v=4",
            "committerName": "GitHub",
            "committerEmail": "[email protected]",
            "committedDate": "2024-02-04T12:14:49.000+01:00",
            "firstParentOid": "94c58ca574b56df33ddaade9e074108dd71ab98c"
        },

Secondly, will git blame data even work with the new system? I have no idea how it will look, but if the new system cannot use the git blame data, then what we are doing right now is useless.
Even if we commit to GitHub, how would the commit be attributed to the person using the web app?

Voice dub length issue

I took a look at the newly created translate.3blue1brown.com page to check my previous translations. The site highlights sentences in red when "its estimated to be too long to fit within the time constraint".

Hungarian words are usually longer than English words, so translating is hard enough already given the length requirements, but I've found many cases where my translated Hungarian sentences have fewer characters than the corresponding English ones and still get highlighted in red. This makes the completion rate only 25%, even though more than 90% of the translations are not significantly longer.

Given the current constraints I don't see a way to properly reduce the length to sub-english sizes and still sound natural and precise enough. I suggest loosening the time constraints to +10% or so, as the video contains a lot of spare time between sentences which could be used up by the longer narration.
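To make the suggested +10% concrete, here is a minimal sketch of a length check with a tolerance. It assumes the check compares character counts, which may not be how the site actually estimates spoken length; the function name and the default tolerance are hypothetical.

```python
def fits_time_budget(translated: str, english: str, tolerance: float = 0.10) -> bool:
    """Return True if the translation is at most (1 + tolerance) times
    as long as the English sentence, measured in characters.

    Character count is only a stand-in for spoken duration (an assumption);
    the real estimate used by the site is not shown in this issue.
    """
    return len(translated) <= len(english) * (1 + tolerance)
```

With tolerance=0 this reduces to a strict "no longer than the English" rule, so the same check could be tuned per language.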

Suggestion: Romanian language

I know there would be massive interest in these videos if they were available with Romanian dubs. I suggest including Romanian as a language.

Suggestion: Dutch

I would love to help bring 3b1b to Dutch!

As a start I would like to work on the "Essence of Calculus" course. To this end I would like to request the rough generated translations of the first 4 videos of The Essence of Calculus.

A Dutch channel on the Discord server would also be appreciated, in case anyone wants to help.

Thank you for the amazing content :)

Suggestion: Czech

Hi, I would love to help translate 3B1B videos into Czech. I'm probably going to work backwards and start by translating the Shorts from 2024 and 2023 with the Optics puzzle series if possible:

Requested videos for translations

barber-pole-1
barber-pole-2
cube-shadow-puzzle
mandelbrot
newton-art-puzzle
on-shorts
prism
refractive-index-questions
subset-sum

I have already translated cube-shadow-puzzle without the rough translation. Not exactly sure what I'm supposed to do with it now, but I will gladly translate it once more alongside the rough translation if necessary.

SIDE NOTE:
I've noticed there is no #czech channel on the translation Discord server and I would like to ask you to add one. Even though I will evidently be working solo there, it'd be good to have things organized for future translators.

Hopefully I'll have these translations done in less than a month, even though my progress might be slowed down by school and studying. I believe the shorts will take a few days in the worst case.

The Czech translation of Essence of linear algebra series.

Hey, a new translator has joined our Czech translation team, and he is interested in translating the Essence of linear algebra series into Czech because of a big national exam coming up in the Czech Republic. After reviewing the files for those videos I saw community files. Can we request an AI translation and edit the sentence_translation.json files, or should we only change the community files? Thanks :)

Missing `hebrew` folders

under ego-and-math

Also under shorts/on-shorts

And under 2024/shorts/cube-shadow-puzzle and 2024/shorts/subset-sum

Is it expected that translators create them manually? Or is this a bug in the automatic translation scripts?

Incorrect newline characters breaking JSON parsing

See https://github.com/3b1b/captions/blob/main/2023/gaussian-integral/hebrew/sentence_translations.json#L774

I think this is the AI model trying to translate a \n newline character, and using a Hebrew "n" instead, which is not a valid JSON escape character. So, parsing fails, and going to that lesson page shows that the captions file is missing (I could improve the message to discriminate between loading errors and parsing errors).

It would be hard to make the app recover from this type of parsing error, though. I could replace all \מs with \ns, but what about other languages and escape characters? Perhaps a better solution would be to make sure these characters are removed from the input English before passing it to the models. That way it would be easier to make sure all escape characters are caught.
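As a rough sketch of the language-independent fallback discussed above (not what the app currently does): before parsing, rewrite any backslash that is not followed by a valid JSON escape character. The function name and the choice to substitute a space are assumptions made for illustration.

```python
import json
import re

# A backslash plus the character after it. Matching in pairs means an
# already-escaped backslash (\\) is consumed as one unit and left intact.
_ESCAPE_PAIR = re.compile(r"\\(.)", re.DOTALL)

# Characters that may legally follow a backslash inside a JSON string.
_VALID_ESCAPES = set('"\\/bfnrtu')


def repair_invalid_escapes(raw: str, replacement: str = " ") -> str:
    """Replace escape sequences that JSON does not define (e.g. a backslash
    followed by a Hebrew letter) with `replacement`, leaving valid ones alone.

    Note: a "\\u" followed by non-hex characters is not repaired here.
    """
    def fix(match: re.Match) -> str:
        return match.group(0) if match.group(1) in _VALID_ESCAPES else replacement

    return _ESCAPE_PAIR.sub(fix, raw)


def load_lenient(raw: str):
    """Try a strict parse first; only fall back to the repair on failure."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return json.loads(repair_invalid_escapes(raw))
```

Stripping the escape characters from the English input, as proposed above, would still be the cleaner fix; a fallback like this only keeps already-broken files loadable.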

gpt/english and translations

Original
But once you have a prediction model like this, a simple thing you generate a longer piece of text is to give it an initial snippet to work with, have it take a random sample from the distribution it just generated, append that sample to the text, and then run the whole process again to make a ne...

Time:
110.18 - 129.54

Line
Entry # 16
Line # ~131

Describe the issue
The English subtitle (and the translations as a result) differ from the audio. The text ".... you generate a longer ...." should actually be ".... you can try to make it generate is a longer .....".

differential-equations (DE1) - translation sync is lost at 1163.89

Original
In his book Chaos, the author James Glick describes phase space as, "O

Translation (French):
Dans son livre Chaos, l'auteur James Glick décrit l'espace des phases comme suit : ".

Time:
1163.89 - 1186.89

Line
Entry # 139
Line # ~1115

Describe the issue
This issue exists at least in the French and Hebrew translations, but I assume it is common to most of the translations. About 23 seconds are allocated to this part of the translation, although it takes only a few seconds (is it because of the """?). From this point on, the translation and video are not synchronized.

Note that the English transcript does not have this issue, so it may be that the English transcript was fixed without re-running the auto translations.

Misc app UI enhancements

Add "upload zip" option, to resume work from a previously exported zip.
Add "see raw" button next to flag button in each row that links to raw json files on github for inspection? #:~:text=
More descriptive body text for linked issues (e.g. flag button, .json missing link, etc.). Maybe line/index number of entry? Maybe also include input English text?

limits/hebrew - unable to submit my edits

I worked on the Hebrew translation of "Limits, L'Hôpital's rule, and epsilon delta definitions | Chapter 7, Essence of calculus" but the submit edits button did not work.
I keep getting this message after a few seconds of waiting - "There was an error submitting: Couldn't get main branch: undefined. Please save your edits as a backup, then try again later or report this issue."

Suggestion: Using a package that produces better timestamps than Whisper

Context

Whisper is great for transcriptions; however, its timestamps for subtitles are pretty clunky. Why? Because Whisper aligns at the phrase level.

Solution

I have been working on a Python package for video transcription that uses Whisper and WhisperX, which is an extension of Whisper that aligns at the word level. You can try it out on this Colab notebook. Here is the result of transcribing the "But what is a Neural Network?" video, as an SRT subtitle file and as a plaintext file.

Proposal

Using this package would save the community a lot of manual effort in re-aligning the subtitles, as its alignment quality is far better than that of the original Whisper.

P.S.

This is not shameless self-promotion, I am genuinely interested in helping :)

No sentences_translation.json

No file in 3b1b/captions/tree/main/2019/bayes-theorem-quick/german

gaussian-convolution/hebrew

The translations repo for the gaussian-convolution video ("A pretty reason why Gaussian + Gaussian = Gaussian") is missing the Hebrew captions file. Note that when viewing the video, the Hebrew captions are shown.

Suggestion: Dutch translations of crypto videos

Hi, I would like to help translate some videos into Dutch, specifically the videos on cryptocurrencies:

bitcoin
256-bit-security

I have already translated the 256-bit-security video without the machine-generated translations. However, it would save a lot of time if I could use the rough translations. Would you like to add the files for these two videos?

Thanks in advance, and thanks for your awesome videos.

Suggestion: Bulgarian language

I know a lot of people in my country watch your videos and respect them. There's a bit of a math problem since a lot of people aren't that interested in math and the old teachers that don't speak English don't help that. I would love to help out with a possible Bulgarian translation that could reach more people and ignite their passion for math! I know damn well it did for me!

Suggestion: Consistent formality in German translations

Consistent formality in German translations

When translating from English to German, a certain degree of freedom is left to the translator, and much of it comes down to the formality of the text. As other contributors have pointed out, the pronoun "you" has several translations, each with a different verb conjugation, most notably:

Du (Singular, informal)
Sie (Singular, very formal)¹

Grammatical gender, too, plays a role in this. To avoid any inconsistencies between translations, I'd like to suggest @3b1b let us know what level of formality he finds appropriate for his target audience in advance.

¹ See this article for more information on German pronouns.

Suggestion: Language labels

We should make it easier to know the language of each pull request for smoother reviews.

While some mention it in the title, enforcing a format can be tricky. A simple solution could be using labels, automatically set by the labeler action. For example, a PR changing https://github.com/3b1b/captions/blob/main/2023/ego-and-math/spanish/sentence_translations.json could be labeled as 'Spanish' since it's in the 'spanish' directory, and the same for the other languages.

If you agree, I can submit a PR for this.
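To illustrate the path-to-label rule described above, here is a small sketch in Python. It is not the labeler action's own configuration; it only shows the mapping such a configuration would encode. The function name and the `known_languages` parameter are hypothetical, and the assumption is that the language is always the directory directly containing the changed file.

```python
from pathlib import PurePosixPath


def language_labels(changed_paths, known_languages):
    """Return the sorted set of labels for a PR, one per language directory
    touched by the changed files.

    Assumes layouts such as <year>/<video>/<language>/<file>, where the
    language is the immediate parent directory of the file.
    """
    labels = set()
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        # Need at least a parent directory, and it must be a known language
        # so that paths like app/src/data/... are never labelled.
        if len(parts) >= 2 and parts[-2] in known_languages:
            labels.add(parts[-2].capitalize())
    return sorted(labels)
```

For the example path above, calling it with known_languages={"spanish"} yields the single label 'Spanish'.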

barber-pole-2/hebrew

Line:
At the extreme, the only thing that's more important is the direction that the wave is going to move...

Time:
478 - 493.52

Describe the issue:
The English text used for the Hebrew translation is probably based on an old version of the video. It seems that the video was edited starting at 7:58.000, and from there the text is not aligned with the video. Note that the English subtitles are o.k., so the new version of the English text was probably not propagated to the other languages' translations.

(Note that I submitted PR #372 with changes to the Hebrew translation up to this point, so I will not lose the work I did. I hope that you will be able to use this edited Hebrew translation in the new version.)

Fixing the original English subtitles

Is there a way to submit a review for the original subtitles of a video?

For example, in the "Solving Wordle using information theory" video, the English subtitles contain both "Wurdle" and "wordle" in several places. There are also problems with non-existent words such as "aargh", and with letters that are spelled out loud individually.

Should this be done through the website or as a direct pull request to the source files?

It's possible that just adding English to the list inside app/src/data/languages.json would work, but I don't know the code that well.

What model(s) does Grant use to generate the Korean audio track in the pilot video?

I've been wondering if anyone knows what pipeline he used for the TTS of the Korean audio in this video.

Suggestion: Armenian language

Hi, I would love to help with translating videos into Armenian. Also, can we have an Armenian channel on the Discord server?

Improving Translation Accuracy

Issue

I'd like to emphasize that making minor adjustments is easier than translating the entire text from scratch. However, the Hungarian translations often contain inaccuracies and can sometimes be misleading.

Solution

Based on my research, DeepL appears to be the most reliable tool for English-Hungarian translation. I encourage other native speakers to suggest translation services they find satisfactory for their own languages, so that these can be integrated into the existing pipeline.

Dutch only has community.srt files, and can't upload .json files

On most videos there is no Dutch present, but on some there is a "community.srt" file. My first question is what this file is supposed to be, as it doesn't seem to be human translated (if it was, no offense, but they would suck at Dutch).

For the video dp3t I have converted the .srt file to a .json file in the correct format and revised it to be proper Dutch, but without the "input" field, as there doesn't seem to be an English transcript/subtitle file with corresponding times. I tried uploading the files to the repo, but I don't have permission (for a good reason!) to upload files, so I uploaded them on a fork: CactusBrothers@c40c03f.

So the question is:
What is the deal with the "community.srt" files?

Love this initiative and I will definitely be checking the Dutch translations!
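For anyone wanting to repeat the .srt to .json conversion described in this issue, a minimal sketch follows. The output field names ("text", "time_range") are placeholders chosen for illustration, not the repo's actual schema, and the "input" field is left out for the reason given above.

```python
import json
import re

# SRT timestamps look like 00:01:02,250 (a dot instead of a comma is tolerated).
TIMESTAMP = re.compile(r"(\d+):(\d+):(\d+)[,.](\d{3})")


def to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp to seconds."""
    hours, minutes, seconds, millis = TIMESTAMP.fullmatch(stamp.strip()).groups()
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def srt_to_entries(srt_text: str) -> list:
    """Parse SRT cues into a list of dicts with hypothetical field names."""
    entries = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed cues: need index, timing line, and text
        start, end = lines[1].split("-->")
        entries.append({
            "text": " ".join(line.strip() for line in lines[2:]),
            "time_range": [to_seconds(start), to_seconds(end)],
        })
    return entries


def srt_to_json(srt_text: str) -> str:
    """Serialize the entries; ensure_ascii=False keeps non-ASCII text readable."""
    return json.dumps(srt_to_entries(srt_text), ensure_ascii=False, indent=1)
```

The field names would need to be adjusted to whatever the existing sentence_translations.json files use before opening a PR.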

Suggestion: Malayalam Language

I would like to contribute to writing Malayalam subtitles but there is no option for it

What about the words on the screen?

Would there be any easy way to translate the words and variables that appear on the screen? I doubt it, but I'm no expert. If it's not doable, is it better to translate the corresponding words/variables in the text, or to leave them as is? I think this could be awkward either way. I ran into this issue when translating V - E + R = 2 from the Moser circle problem, which would translate as N - A + R = 2. While it's awkward to hear and see two different things, I think it's worse not to understand where the letters are coming from.

Suggestion: Burmese language

No options to add Burmese subtitles

Suggestion: Adding popular IDE folders to .gitignore

It would be nice if the folders of popular IDEs were added to the .gitignore.

Differential equations DE1/Hebrew - Playing a video part during edit ends too early

When editing the Hebrew DE1 subtitles and playing the video for a specific part (by pressing the "play" button to the left of the sentence), the video playback ends about 2 seconds before the sentence ends. E.g., playing the 35.06 - 43.82 part ends at about 0:41.

This is not a major issue, but I am reporting it in case it is a general issue in the tool.

[SUGGESTION] Add labels for languages

Everything is in the title. I think it would be better to add a label for each language to help with filtering issues and PRs. The formatting in the titles is good, but titles are not as readable and filterable as labels.

Double subtitle

Line:
Wait for it....

Time:
110.66 - 110.04

Subtitle not necessary, there is a copy of it that lasts longer just after it.
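The time range reported here ends before it starts (110.66 - 110.04). A check along the following lines could flag such entries automatically. This is only a sketch: the function name is hypothetical and it takes plain (start, end) pairs rather than the repo's actual file format.

```python
def find_bad_ranges(ranges):
    """Return (index, start, end) for every range whose end is not strictly
    after its start, i.e. reversed or zero-length subtitles."""
    return [
        (index, start, end)
        for index, (start, end) in enumerate(ranges)
        if end <= start
    ]
```

Zero-length ranges are included on purpose, since a subtitle that is shown for no time at all is very likely a splitting artifact too.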

differential-equations/vietnamese

Line:
bout a single initial condition but about a whole spectrum of initial states....

Time:
1192.91 - 1201.91

Describe the issue:
The timing between sentences in the subtitles does not match the voice in the video

CTL Translations are Missing in the Tool

In the "3Blue1Brown Translations" tool I can't find the Hebrew translation, or any other translation, for the CTL video "But what is the Central Limit Theorem?".

I already edited the Hebrew translation, but not using the tool, and since the tool properly shows mixed RTL/LTR text (which is great!), I would like to go over the latest version of the translation using the tool.

Merging guidelines

Following a discussion on the Spanish channel on Discord, I would like to inquire about guidelines regarding when a translation PR is ready to merge.

The Spanish community has a good number of active members who diligently review each other's PRs, which is excellent for catching typos and mistakes. I have observed similar practices in other languages, particularly German. I am interested in knowing if there is a way to "institutionalize" reviews, especially for "larger" communities. If there are only one or two contributors for a language this does not make much sense.

In particular, for the Spanish community, we face challenges due to the diverse ways Spanish is spoken across various countries, with sometimes radical differences. We would appreciate having different perspectives before merging to address these variations.

Suggestion: Mediawiki

Hello there,

TL;DR:
I'd suggest transcribing via OpenAI Whisper, then translating in your preferred way, then pushing all transcripts to a MediaWiki instance. To my knowledge, this is the best middle ground for getting people involved and making it easy to revert, detect, and manage vandalism, while simultaneously having the tools to easily distribute workload, rights, etc.
Then, once given thresholds are met (time passed, edits, overseer approval, ...), pull the transcripts from the wiki, convert them to captions (and text-to-speech; potentially, this could be reviewed, posted, and discussed again on the same wiki), and proceed to the video.

Long story:

Hi, my name is Tim. I am a PhD student in Digital Libraries and Data Science, and I co-founded the BorgNetzWerk in 2023 as a charitable NPO to make accessible knowledge more connected. Some quick references:

Our first in-use project: Transcripts for the history of philosophy, "without any gaps." podcast from Peter Adamson, Professor of Philosophy at the LMU in Munich and at King's College London: https://historyofphilosophy.net/transcripts
A pitch for a potential Wiki for CinemaTherapy: https://www.reddit.com/r/cinema_therapy/comments/131zmgq/searchable_wiki_for_cinema_therapy/
- And one example of what a transcript would look like: https://data.bnwiki.de/index.php?title=Cinema_Therapy:Therapist_Reacts_to_Everything_Everywhere_All_at_Once

We're almost done with the bureaucracy related to properly founding a charitable NPO in Germany, with all major steps completed. Currently, we are reworking our online presence to properly represent the charity, but some artefacts were written in English and/or are only partly translated:

Our code used for transcription, analysis and publishing of media (videos, text, etc.): https://github.com/borgnetzwerk/tools
Our aims and values: https://borgnetzwerk.de/aims-and-values/
Our founding as a charity: https://borgnetzwerk.de/founding/

I personally would love to help, from low-level support like being one of the language support people, to co-designing, hosting, and maintaining the wiki, community effort, etc. This is exactly my PhD topic as well as the reason for founding the NPO, so I'd love to help, first and foremost in whatever way best fits your requirements.

Long story short:
I suggest Mediawiki - and offer a bunch of additional ideas.

I hope this helps :)

Best
Tim

on-shorts/italian

Missing title, captions and descriptions.

Hamming code part 1 and part 2 english subtitles swapped

Hi, I was watching the video and it seemed that the English subtitles were swapped: Part 1's subtitles show on Part 2, and Part 2's show on Part 1.
The files look fine and are in the right folders, and English seems to be the only caption track that is swapped on the video. Can you fix this?

gaussian-integral/hebrew

Missing captions file in the "3Blue1Brown Translations" tool for "Why π is in the normal distribution (beyond integral tricks)".

I already submitted two versions correcting the Hebrew translation, but not through the "3Blue1Brown Translations" tool. I found that using this tool is much better than the methods I used before, because it correctly shows mixed LTR and RTL sentences. Therefore, I would like to edit the translation using this tool. However, for some reason, the translation is not available in the tool.

Suggestion: Tagalog language

I'm a native Tagalog speaker and I can help with this.

Not sure if this could help much with views, but I think this should help bridge your work to young and old Filipinos who prefer to watch Tagalog videos.

eulers-number/italian

Line:
The answer to the question e to the what equals that base....

Time:
653.42 - 653.42

Describe the issue: This subtitle should be merged with the next one.
Also, "814.90 - 829.50" was never said.

derivatives/dutch

Line:
If you change the specific distance vs....

Time:
164.84 - 167.16

Describe the issue:
It splits based on the periods from the "vs." meaning "versus", which occur mid-sentence.
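A sentence splitter can avoid this by checking a short list of abbreviations before treating a period as the end of a sentence. The sketch below is an assumption about how such a guard could look, not the project's actual splitting code; the abbreviation list is illustrative and would need extending.

```python
# Tokens that end in a period but do not end a sentence (illustrative list).
ABBREVIATIONS = {"vs.", "e.g.", "i.e."}


def split_sentences(text: str) -> list:
    """Split text into sentences on ., ! and ?, except after known abbreviations."""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        ends_sentence = token.endswith((".", "!", "?"))
        if ends_sentence and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        # Keep any trailing text that lacks closing punctuation.
        sentences.append(" ".join(current))
    return sentences
```

Applied to the sentence from this issue, the two "vs." tokens no longer trigger a split, so it stays a single entry.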

Seems like /2015/eulers-formula-old and /2015/eulers-formula-poem have their names mixed up

It's all in the title. Haven't checked if there are other folders with this issue

Suggestions: language contribution workflow

I came across the movie-web project, and I think their contribution workflow is what this project really needs. They use Weblate, a web-based translation tool with version control, to automate translation and contributions from the community.

movie-web Contribution Guidelines

Weblate

derivatives

Line:
- If you change the specific distance vs. -
- time function, you'll have some different velocity vs. -
- time function. -

Time:
164.84 - 171.80

Describe the issue:
Currently this sentence is divided into 3 parts. I think it should be merged into a single part.
Same in 175.68 - 179.82, 331.58 - 337.68

Suggestion: Greek Language

I wanted to contribute to this massive project after reading the announcement. I know many people here in Greece who watch and appreciate Grant's videos. Therefore, I am suggesting the addition and availability of the Greek language, the translation of which I would be very willing to contribute.

differential-equations DE1 - "l" in subtitles should be "L"

Original
More specifically, one with a period of 2 pi times the square root of l over g, where l is the length of the pendulum and g is the strength of gravity.

Time:
370.77 - 381.17

Line
Entry # 44
Line # ~355

Describe the issue
"l" in this and other related sentences should be "L", as shown in the video.

Suggestion: Claiming System?

It could be really disheartening to take on a translation project that's larger than a few scripts, knowing that by the time you're done half of them could have already been fixed. Could there be a way to organize work between us, or even a way to know if someone's already done work to a specific video/file?

(A Discord server to coordinate in would go a long way towards solving this, imo.)

Creation of Glossary for cross-/single-language references

Hello.
Does this project have a glossary and a translation memory?
As we have an enormous treasure of assets, it would be helpful to have a glossary to identify terms and commonly used expressions more efficiently and to provide the most likely correct translation for each expression.

Suggestion: using toml format instead of json

In your community post on YouTube you say that it is a bit absurd to ask people to write JSON files. I think this is true. Non-technical people can have difficulties with JSON files.

I suggest that you use TOML. It tries to do the same as JSON but is much more human friendly.

It is also Python compatible without third-party dependencies: Python has a TOML parser in the standard library (since 3.11).

Many tools already use TOML, such as Rust's Cargo and Python's Poetry.

Check it out. It's really cool.
