zh-plus / openlrc

Transcribe and translate voice into LRC files using Whisper and LLMs (GPT, Claude, et al.).

Home Page: https://zh-plus.github.io/openlrc/

License: MIT License

Python 100.00%
faster-whisper lyrics lyrics-generator openai-api speech-to-text transcribe voice-to-text whisper openlrc auto-subtitle

openlrc's Introduction

Open-Lyrics

Open-Lyrics is a Python library that transcribes voice files using faster-whisper, then translates or polishes the resulting text into .lrc files in the desired language using an LLM such as OpenAI GPT or Anthropic Claude.

Key Features:

  • Audio preprocessing to reduce hallucination (loudness normalization and optional noise suppression).
  • Context-aware translation to improve translation quality. Check the prompt for details.

New 🚨

  • 2024.3.29: Claude models are now available for translation. According to testing, Claude 3 Sonnet performs much better than GPT-3.5 Turbo. We recommend Claude 3 Sonnet for translating non-English audio (source language). For now, the default model is still GPT-3.5 Turbo:
    lrcer = LRCer(chatbot_model='claude-3-sonnet-20240229')
  • 2024.4.4: Add basic streamlit GUI support. Try openlrc gui to start the GUI.
  • 2024.5.7:
    • Add custom endpoint (base_url) support for OpenAI & Anthropic:
      lrcer = LRCer(base_url_config={'openai': 'https://api.chatanywhere.tech',
                                     'anthropic': 'https://api.g4f.icu'})
    • Generate bilingual subtitles:
      lrcer.run('./data/test.mp3', target_lang='zh-cn', bilingual_sub=True)
  • 2024.5.11: Add glossary into prompt, which is confirmed to improve domain-specific translation. See the Glossary section for details.

Installation ⚙️

  1. Please install CUDA 11.x and cuDNN 8 for CUDA 11 first according to https://opennmt.net/CTranslate2/installation.html to enable faster-whisper.

    faster-whisper also needs cuBLAS for CUDA 11 installed.

    Windows users can download the libraries from Purfview's repository:

    Purfview's whisper-standalone-win provides the required NVIDIA libraries for Windows in a single archive. Decompress the archive and place the libraries in a directory included in the PATH.

  2. Add LLM API keys, you can either:

  3. Install PyTorch:

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
  4. Install the latest faster-whisper:

    pip install git+https://github.com/guillaumekln/faster-whisper
  5. Install ffmpeg and add its bin directory to your PATH.

  6. This project can be installed from PyPI:

    pip install openlrc

    or install directly from GitHub:

    pip install git+https://github.com/zh-plus/openlrc
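
The prerequisites above can be checked before the first run. The following is a minimal sketch and not part of openlrc: check_environment is a hypothetical helper, and the variable names OPENAI_API_KEY and ANTHROPIC_API_KEY are an assumption based on the defaults read by the official OpenAI and Anthropic Python SDKs.

```python
import os
import shutil


def check_environment(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper, not part of openlrc.
    """
    env = os.environ if env is None else env
    problems = []

    # ffmpeg must be discoverable on PATH (installation step 5).
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")

    # Assumed variable names: the defaults used by the official SDKs.
    if not (env.get("OPENAI_API_KEY") or env.get("ANTHROPIC_API_KEY")):
        problems.append("no LLM API key set")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

The env and which parameters exist only so the check can be exercised without touching the real environment.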

Usage 🐍

GUI

Note

We are migrating the GUI from streamlit to Gradio. The GUI is still under development.

openlrc gui

Python code

from openlrc import LRCer

if __name__ == '__main__':
    lrcer = LRCer()

    # Single file
    lrcer.run('./data/test.mp3',
              target_lang='zh-cn')  # Generate translated ./data/test.lrc with default translate prompt.

    # Multiple files
    lrcer.run(['./data/test1.mp3', './data/test2.mp3'], target_lang='zh-cn')
    # Note we run the transcription sequentially, but run the translation concurrently for each file.

    # Path can contain video
    lrcer.run(['./data/test_audio.mp3', './data/test_video.mp4'], target_lang='zh-cn')
    # Generate translated ./data/test_audio.lrc and ./data/test_video.srt

    # Use context.yaml to improve translation
    lrcer.run('./data/test.mp3', target_lang='zh-cn', context_path='./data/context.yaml')

    # Use glossary to improve translation
    lrcer = LRCer(glossary='./data/aoe4-glossary.yaml')

    # To skip translation process
    lrcer.run('./data/test.mp3', target_lang='en', skip_trans=True)

    # Change asr_options or vad_options, check openlrc.defaults for details
    vad_options = {"threshold": 0.1}
    lrcer = LRCer(vad_options=vad_options)
    lrcer.run('./data/test.mp3', target_lang='zh-cn')

    # Enhance the audio using noise suppression (consume more time).
    lrcer.run('./data/test.mp3', target_lang='zh-cn', noise_suppress=True)

    # Change the LLM model for translation
    lrcer = LRCer(chatbot_model='claude-3-sonnet-20240229')
    lrcer.run('./data/test.mp3', target_lang='zh-cn')

    # Clear temp folder after processing done
    lrcer.run('./data/test.mp3', target_lang='zh-cn', clear_temp_folder=True)

    # Change base_url
    lrcer = LRCer(base_url_config={'openai': 'https://api.chatanywhere.tech',
                                   'anthropic': 'https://api.g4f.icu'})

    # Bilingual subtitle
    lrcer.run('./data/test.mp3', target_lang='zh-cn', bilingual_sub=True)

Check more details in Documentation.

Context

Use any available context to improve the quality of your translation. Save it as context.yaml in the same directory as your audio file.

Note

The improvement of translation quality from Context is NOT guaranteed.

background: "This is a multi-line background.
This is a basic example."
audio_type: Movie
description_map: {
  movie_name1 (without extension): "This
  is a multi-line description for movie1.",
  movie_name2 (without extension): "This
  is a multi-line description for movie2.",
  movie_name3 (without extension): "This is a single-line description for movie 3.",
}
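
The keys of description_map are file names without their extension. As a minimal sketch, such a mapping could be built from media paths with Path.stem. build_context is a hypothetical helper, not part of openlrc, and the JSON-style output assumes the file is read by a standard YAML loader that accepts flow-style mappings (the glossary example uses the same style).

```python
import json
from pathlib import Path


def build_context(background, audio_type, descriptions):
    """Build a context mapping keyed by file name without extension.

    `descriptions` maps media paths to description strings.
    Hypothetical helper, not part of openlrc.
    """
    return {
        "background": background,
        "audio_type": audio_type,
        # Path.stem drops the final extension: "ep1.mp4" -> "ep1".
        "description_map": {Path(p).stem: text for p, text in descriptions.items()},
    }


context = build_context(
    background="This is a basic example.",
    audio_type="Movie",
    descriptions={"./data/movie_name1.mp4": "Description for movie 1."},
)

# JSON-style mappings are also valid YAML flow syntax, so this text could be
# saved as context.yaml (assumption: a standard YAML loader reads the file).
print(json.dumps(context, ensure_ascii=False, indent=2))
```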

Glossary

Add a glossary to improve domain-specific translation, for example aoe4-glossary.yaml:

{
  "aoe4": "帝国时代4",
  "feudal": "封建时代",
  "2TC": "双TC",
  "English": "英格兰文明",
  "scout": "侦察兵"
}
lrcer = LRCer(glossary='./data/aoe4-glossary.yaml')
lrcer.run('./data/test.mp3', target_lang='zh-cn')

Or pass the glossary directly as a dictionary:

lrcer = LRCer(glossary={"aoe4": "帝国时代4", "feudal": "封建时代"})
lrcer.run('./data/test.mp3', target_lang='zh-cn')
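
To illustrate what adding a glossary into a prompt can look like, here is a rough sketch. It is not openlrc's implementation: glossary_prompt, the prompt wording, and the choice to include only terms that occur in the text are all assumptions made for this example.

```python
def glossary_prompt(glossary, text):
    """Render glossary entries that occur in `text` as prompt lines.

    Hypothetical illustration; openlrc's real prompt may differ.
    """
    lowered = text.lower()
    lines = [
        f"{term} -> {translation}"
        for term, translation in glossary.items()
        if term.lower() in lowered  # case-insensitive substring match
    ]
    if not lines:
        return ""
    return "Use the following glossary when translating:\n" + "\n".join(lines)


glossary = {"aoe4": "帝国时代4", "feudal": "封建时代", "scout": "侦察兵"}
print(glossary_prompt(glossary, "Send the Scout out before you reach Feudal."))
```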

Pricing 💰

Pricing data from OpenAI and Anthropic.

Model Name                  Pricing for 1M Tokens (Input/Output) (USD)    Cost for 1 Hour Audio (USD)
gpt-3.5-turbo-0125          0.5, 1.5                                      0.01
gpt-3.5-turbo               0.5, 1.5                                      0.01
gpt-4-0125-preview          10, 30                                        0.5
gpt-4-turbo-preview         10, 30                                        0.5
claude-3-haiku-20240307     0.25, 1.25                                    0.015
claude-3-sonnet-20240229    3, 15                                         0.2
claude-3-opus-20240229      15, 75                                        1

Note the cost is estimated based on the token count of the input and output text. The actual cost may vary due to the language and audio speed.
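
As a worked example of that estimate, the cost is the input and output token counts multiplied by the respective per-1M-token prices. The token counts below are hypothetical and chosen only for illustration; estimate_cost is not an openlrc function.

```python
def estimate_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost in USD given token counts and prices per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical token counts for one hour of audio; real counts depend on
# the language and the speaking rate.
cost = estimate_cost(8_000, 4_000, input_price=0.5, output_price=1.5)
print(f"{cost:.4f} USD")
```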

Recommended translation model

For English audio, we recommend using gpt-3.5-turbo.

For non-English audio, we recommend using claude-3-sonnet-20240229.

Todo

  • [Efficiency] Batched translate/polish for GPT request (enable contextual ability).
  • [Efficiency] Concurrent support for GPT request.
  • [Translation Quality] Make translate prompt more robust according to https://github.com/openai/openai-cookbook.
  • [Feature] Automatically fix json encoder error using GPT.
  • [Efficiency] Asynchronously perform transcription and translation for multiple audio inputs.
  • [Quality] Improve batched translation/polish prompt according to gpt-subtrans.
  • [Feature] Input video support.
  • [Feature] Multiple output format support.
  • [Quality] Speech enhancement for input audio.
  • [Feature] Preprocessor: Voice-music separation.
  • [Feature] Align ground-truth transcription with audio.
  • [Quality] Use multilingual language model to assess translation quality.
  • [Efficiency] Add Azure OpenAI Service support.
  • [Quality] Use claude for translation.
  • [Feature] Add local LLM support.
  • [Feature] Multiple translate engine (Anthropic, Microsoft, DeepL, Google, etc.) support.
  • [Feature] Build an Electron + FastAPI GUI for cross-platform application.
  • [Feature] Web-based streamlit GUI.
  • Add fine-tuned whisper-large-v2 models for common languages.
  • [Feature] Add custom OpenAI & Anthropic endpoint support.
  • [Feature] Add local translation model support (e.g. SakuraLLM).
  • [Others] Add transcribed examples.
    • Song
    • Podcast
    • Audiobook

openlrc's People

Contributors

anilbey, cck0517, zh-plus

openlrc's Issues

Could not locate cublasLt64_12.dll. Please make sure it is in your library path!

I get this error when I use CUDA 11.8, which is the version the CTranslate2 installation guide calls for:
"Could not locate cublasLt64_12.dll. Please make sure it is in your library path!"

However, if I switch to CUDA 12, I get the following message:

Traceback (most recent call last):
  File "C:\Users\16152\Desktop\p projects\Whisper\process.py", line 8, in <module>
    lrcer.run('./095-1-of-8.mp4', target_lang='zh-cn')
  File "C:\Users\16152\Desktop\p projects\Whisper\venv\lib\site-packages\openlrc\openlrc.py", line 243, in run
    producer.result()
  File "C:\Users\16152\AppData\Local\Programs\Python\Python39\lib\concurrent\futures\_base.py", line 445, in result
    return self.__get_result()
  File "C:\Users\16152\AppData\Local\Programs\Python\Python39\lib\concurrent\futures\_base.py", line 390, in __get_result
    raise self._exception
  File "C:\Users\16152\AppData\Local\Programs\Python\Python39\lib\concurrent\futures\thread.py", line 52, in run
    result = self.fn(*self.args, **self.kwargs)
  File "C:\Users\16152\Desktop\p projects\Whisper\venv\lib\site-packages\openlrc\openlrc.py", line 83, in transcription_producer
    segments, info = self.transcriber.transcribe(audio_path, language=src_lang)
  File "C:\Users\16152\Desktop\p projects\Whisper\venv\lib\site-packages\openlrc\transcribe.py", line 36, in transcribe
    seg_gen, info = self.whisper_model.transcribe(str(audio_path), language=language,
  File "C:\Users\16152\Desktop\p projects\Whisper\venv\lib\site-packages\faster_whisper\transcribe.py", line 308, in transcribe
    encoder_output = self.encode(segment)
  File "C:\Users\16152\Desktop\p projects\Whisper\venv\lib\site-packages\faster_whisper\transcribe.py", line 610, in encode
    return self.model.encode(features, to_cpu=to_cpu)
RuntimeError: Library cublas64_11.dll is not found or cannot be loaded

What should I do?
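
The two messages name libraries from different CUDA major versions (cublasLt64_12.dll and cublas64_11.dll), which may mean that different components in the environment expect different CUDA versions. As a diagnostic sketch, the script below lists which of these DLLs can be found in the directories on PATH. find_libraries is a hypothetical helper, and the Windows loader also searches locations other than PATH, so a "not found" result here is only a hint.

```python
import os
from pathlib import Path


def find_libraries(names, search_dirs):
    """Map each library file name to the directories that contain it."""
    found = {name: [] for name in names}
    for directory in search_dirs:
        if not directory:
            continue  # skip empty PATH entries
        for name in names:
            if (Path(directory) / name).is_file():
                found[name].append(directory)
    return found


if __name__ == "__main__":
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    wanted = ["cublas64_11.dll", "cublasLt64_11.dll",
              "cublas64_12.dll", "cublasLt64_12.dll"]
    for name, dirs in find_libraries(wanted, path_dirs).items():
        print(name, "->", dirs or "not found on PATH")
```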

The skip_trans option seems to have a problem: no lrc file is generated

Running the following code:
lrcer.run("D:\CloudMusic\电台节目\阿坑是个坑 - 四六级长难句精听磨耳朵 19.mp3" , target_lang='en', skip_trans=True )

It stops at this step:
96%|█████████▌| 230.71/240 [00:10<00:00, 22.86 seconds/s]
[2024-03-15 23:26:40] INFO [Producer_0] Start Sentence Segmentation
[2024-03-15 23:26:40] INFO [Producer_0] Sentence Segmentation Elapsed: 0.27s
[2024-03-15 23:26:40] INFO [Producer_0] Detected language: en
[2024-03-15 23:26:40] INFO [Producer_0] Transcription process Elapsed: 12.54s
[2024-03-15 23:26:40] INFO [Producer_0] File saved to D:\CloudMusic\电台节目\preprocessed\阿坑是个坑 - 四六级长难句精听磨耳朵 19_preprocessed_transcribed.json
[2024-03-15 23:26:40] INFO [Producer_0] Transcription producer finished.
[2024-03-15 23:26:40] INFO [ThreadPoolExecutor-1_1] Got transcription: D:\CloudMusic\电台节目\preprocessed\阿坑是个坑 - 四六级长难句精听磨耳朵 19_preprocessed_transcribed.json
[2024-03-15 23:26:40] INFO [ThreadPoolExecutor-1_1] Optimized json file saved to D:\CloudMusic\电台节目\preprocessed\阿坑是个坑 - 四六级长难句精听磨耳朵 19_preprocessed_transcribed_optimized.json
[2024-03-15 23:26:40] INFO [Consumer_0] Transcription consumer finished.
[2024-03-15 23:26:40] INFO [MainThread] Transcription (Producer) and Translation (Consumer) process Elapsed: 12.55s
[2024-03-15 23:26:40] INFO [MainThread] Totally used API fee: 0.0000 USD

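After that, no lrc subtitle file is generated, although three files are created as expected in the preprocessed folder, and opening the json shows that the transcription itself is fine.
Without skip_trans everything works and the lrc file is generated.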

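Small suggestion: clean up intermediate files

Great project. I use it to translate English videos, and the base model already works very well.

There is one problem: every run leaves behind a lot of temporary files, including the wav in the root directory and the preprocessed folder, which also contains many files.

When you have time, could you add a parameter that removes these once the task has finished?

I tried modifying the code myself, but I did not dare touch some parts, because those files might still be used somewhere 😝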

The GUI seems to have a problem

A module cannot be found:
Traceback:
  File "C:\Users\infin\AppData\Local\Programs\Python\Python311\Lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 584, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\infin\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\gui\home.py", line 9, in <module>
    from st_pages import Page, show_pages

AssertionError case sensitivity issue

The exception looks like this:

AssertionError: Last split: ab not in ~~~~~AB~~~~~~

The lowercase word can't match its uppercase form in the sentence. The exception may be caused by an abbreviation (e.g. BGM) or a single letter in the sentence.
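
The mismatch can be reproduced with a plain substring check. The sketch below shows a caseless comparison as one possible way around it; contains_ignore_case is a hypothetical helper and not necessarily how openlrc handles this.

```python
def contains_ignore_case(fragment, sentence):
    """Return True if `fragment` occurs in `sentence`, ignoring case."""
    # casefold() is a more aggressive lower() intended for caseless matching.
    return fragment.casefold() in sentence.casefold()


sentence = "~~~~~AB~~~~~~"
print("ab" in sentence)                      # case-sensitive check
print(contains_ignore_case("ab", sentence))  # caseless check
```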

exception when running

Alas, this did not work for me.

[why are we probing an mp3 anyway?]

 [2023-07-06 07:52:54] WARNING  [MainThread] The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
 [2023-07-06 07:52:54] WARNING  [MainThread] The torchaudio backend is switched to 'soundfile'. Note that 'sox_io' is not supported on Windows.
Traceback (most recent call last):
  File "C:\ProgramData\anaconda3\lib\site-packages\openlrc\utils.py", line 52, in get_file_type
    video_stream = ffmpeg.probe(path, select_streams='v')['streams']
  File "C:\ProgramData\anaconda3\lib\site-packages\ffmpeg\_probe.py", line 23, in probe
    raise Error('ffprobe', out, err)
ffmpeg._run.Error: ffprobe error (see stderr output for detail)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\BAT\lrc.py", line 7, in <module>
    lrcer.run(sys.argv[0])  # Generate translated test.lrc with default translate prompt.
  File "C:\ProgramData\anaconda3\lib\site-packages\openlrc\openlrc.py", line 168, in run
    audio_paths = self.pre_process(paths)
  File "C:\ProgramData\anaconda3\lib\site-packages\openlrc\openlrc.py", line 232, in pre_process
    paths[i] = extract_audio(path)
  File "C:\ProgramData\anaconda3\lib\site-packages\openlrc\utils.py", line 25, in extract_audio
    file_type = get_file_type(path)
  File "C:\ProgramData\anaconda3\lib\site-packages\openlrc\utils.py", line 54, in get_file_type
    raise FfmpegException(f'ffmpeg error: {e}')
openlrc.exceptions.FfmpegException: ffmpeg error: ffprobe error (see stderr output for detail)
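
One detail in the traceback may explain the ffprobe failure: the script calls lrcer.run(sys.argv[0]). In Python, sys.argv[0] is the path of the running script (here C:\BAT\lrc.py), and the first user-supplied argument is sys.argv[1]. The sketch below shows one way to read the input path, assuming the media file is passed as the first command-line argument; input_path_from_argv is a hypothetical helper.

```python
def input_path_from_argv(argv):
    """Return the first user-supplied argument, or raise a clear error."""
    # argv[0] is the script itself; real arguments start at index 1.
    if len(argv) < 2:
        raise ValueError(f"usage: {argv[0]} <audio-or-video-file>")
    return argv[1]


# In a real script, pass sys.argv instead of this example list.
example_argv = ["lrc.py", "song.mp3"]
print(input_path_from_argv(example_argv))
```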

Runtime exception during audio preprocessing

Dependencies are all installed according to the documentation.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 120, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 129, in _main
    prepare(preparation_data)
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 240, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 291, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "C:\Users\ezrealc\Desktop\openlrc\openlrc-run.py", line 11, in <module>
    lrcer.run("example.mp3", target_lang='zh-cn')
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\openlrc.py", line 231, in run
    audio_paths = self.pre_process(paths, noise_suppress=noise_suppress)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\openlrc.py", line 289, in pre_process
    paths = preprocessor.run(noise_suppress)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\preprocess.py", line 131, in run
    ln_paths: list[Path] = self.loudness_normalization(ns_paths)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\preprocess.py", line 106, in loudness_normalization
    _ = [executor.submit(loudness_norm_single, *arg) for arg in args]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\preprocess.py", line 106, in <listcomp>
    _ = [executor.submit(loudness_norm_single, *arg) for arg in args]
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\process.py", line 787, in submit
    self._adjust_process_count()
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\process.py", line 746, in _adjust_process_count
    self._spawn_process()
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\process.py", line 764, in _spawn_process
    p.start()
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 158, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 138, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):
  File "C:\Users\ezrealc\Desktop\openlrc\openlrc-run.py", line 11, in <module>
    lrcer.run("example.mp3", target_lang='zh-cn')
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\openlrc.py", line 231, in run
    audio_paths = self.pre_process(paths, noise_suppress=noise_suppress)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\openlrc.py", line 289, in pre_process
    paths = preprocessor.run(noise_suppress)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\preprocess.py", line 136, in run
    ln_path = ln_path.rename(ln_path.parent / f'{audio_name}_preprocessed.wav')
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ezrealc\AppData\Local\Programs\Python\Python311\Lib\pathlib.py", line 1175, in rename
    os.rename(self, target)
FileNotFoundError: [WinError 2] 系统找不到指定的文件。: 'preprocessed\\example_ln.wav' -> 'preprocessed\\example_preprocessed.wav'
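
The RuntimeError text itself points at a missing if __name__ == '__main__': guard. On Windows, child processes are started with the spawn method, which re-imports the main script, so unguarded top-level code that starts a process pool runs again in every child. The usage example in the README wraps its calls in this guard. Below is a self-contained sketch of the pattern with a stand-in worker; scale_peak and main are hypothetical and not openlrc code.

```python
from concurrent.futures import ProcessPoolExecutor


def scale_peak(samples):
    """Stand-in for a CPU-bound job: scale samples so the peak is 1.0."""
    peak = max(abs(s) for s in samples)
    return [s / peak for s in samples] if peak else list(samples)


def main():
    clips = [[0.5, -0.25], [2.0, 1.0]]
    # The pool is only created when the script is run directly, never
    # when a spawned child re-imports this module.
    with ProcessPoolExecutor(max_workers=2) as executor:
        for result in executor.map(scale_peak, clips):
            print(result)


if __name__ == "__main__":
    main()
```

Applied to a script like the one in this report, that would mean moving the LRCer() construction and the lrcer.run(...) call into main().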

TypeError: transcribe() got an unexpected keyword argument 'repetition_penalty'

Running lrcer.run('./data/p1.mp3', target_lang='zh-cn', skip_trans=True) raises an error:

Traceback (most recent call last):
  File "/root/autodl-tmp/code/audio2text/openlrc_test.py", line 7, in <module>
    lrcer.run('./data/p1.mp3',
  File "/root/miniconda3/envs/audio/lib/python3.9/site-packages/openlrc/openlrc.py", line 243, in run
    producer.result()
  File "/root/miniconda3/envs/audio/lib/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/root/miniconda3/envs/audio/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/root/miniconda3/envs/audio/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/root/miniconda3/envs/audio/lib/python3.9/site-packages/openlrc/openlrc.py", line 83, in transcription_producer
    segments, info = self.transcriber.transcribe(audio_path, language=src_lang)
  File "/root/miniconda3/envs/audio/lib/python3.9/site-packages/openlrc/transcribe.py", line 36, in transcribe
    seg_gen, info = self.whisper_model.transcribe(str(audio_path), language=language,
TypeError: transcribe() got an unexpected keyword argument 'repetition_penalty'

After checking, the transcribe() function defined in openlrc.py in version 0.2.3 indeed has no repetition_penalty parameter.
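
A TypeError like this usually means the function being called has a different signature than the caller expects, for example because the installed faster-whisper is older than the one openlrc was written against (an assumption; the report does not state the installed versions). The sketch below shows how a signature can be inspected before calling. accepts_keyword and the two stand-in functions are hypothetical.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs parameter accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Stand-ins for an older and a newer transcribe() signature.
def old_transcribe(audio, language=None):
    ...


def new_transcribe(audio, language=None, repetition_penalty=1.0):
    ...


print(accepts_keyword(old_transcribe, "repetition_penalty"))
print(accepts_keyword(new_transcribe, "repetition_penalty"))
```

With faster-whisper installed, the same check could be run against faster_whisper.WhisperModel.transcribe.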

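Preprocessing fails

ln_path = ln_path.rename(ln_path.parent / f'{audio_name}_preprocessed.wav')
This step fails with no response, which leads to the following error later:
FileNotFoundError: [WinError 2] 系统找不到指定的文件。 (The system cannot find the file specified.)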

today's update seems to not transcribe right

So I've been working on several different LRC solutions.

The default whisper gets the lyrics to this song:

image

Here is the code I used:

image

But when I run your version, I only get a few words:

image

It seems like whisperx.exe is not getting the words out of the audio file:
image

But whisper.exe (the official openai one) is:
image

So, something is broken with whisperx. Any idea what it might be?

lrcer.run raises an error

The code in test.py is as follows:

from openlrc import LRCer
lrcer = LRCer()
lrcer.run('./data/1.wav', target_lang='zh-cn')

The error output is as follows:

D:\test>python test.py
C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\site-packages\df\io.py:9: UserWarning: 'torchaudio.backend.common.AudioMetaData' has been moved to 'torchaudio.AudioMetaData'. Please update the import path.
  from torchaudio.backend.common import AudioMetaData
 [2024-01-23 22:06:02] INFO     [MainThread] Default context config not found: Context(background=, audio_type=Anime, description_map={}), using default context.
 [2024-01-23 22:06:02] INFO     [MainThread] Loudness normalizing...
C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\site-packages\df\io.py:9: UserWarning: 'torchaudio.backend.common.AudioMetaData' has been moved to 'torchaudio.AudioMetaData'. Please update the import path.
  from torchaudio.backend.common import AudioMetaData
 [2024-01-23 22:06:07] INFO     [MainThread] Default context config not found: Context(background=, audio_type=Anime, description_map={}), using default context.
 [2024-01-23 22:06:07] INFO     [MainThread] Loudness normalizing...
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 120, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 129, in _main
    prepare(preparation_data)
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 240, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 291, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen runpy>", line 291, in run_path
  File "<frozen runpy>", line 98, in _run_module_code
  File "<frozen runpy>", line 88, in _run_code
  File "D:\test\test.py", line 18, in <module>
    lrcer.run('./data/1.wav', target_lang='zh-cn')
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\openlrc.py", line 231, in run
    audio_paths = self.pre_process(paths, noise_suppress=noise_suppress)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\openlrc.py", line 289, in pre_process
    paths = preprocessor.run(noise_suppress)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\preprocess.py", line 139, in run
    ln_paths: list[Path] = self.loudness_normalization(ns_paths)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\preprocess.py", line 106, in loudness_normalization
    results = [executor.submit(loudness_norm_single, *arg) for arg in args]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\preprocess.py", line 106, in <listcomp>
    results = [executor.submit(loudness_norm_single, *arg) for arg in args]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\process.py", line 782, in submit
    self._adjust_process_count()
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\process.py", line 741, in _adjust_process_count
    self._spawn_process()
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\concurrent\futures\process.py", line 759, in _spawn_process
    p.start()
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\context.py", line 336, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\popen_spawn_win32.py", line 45, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 158, in get_preparation_data
    _check_not_importing_main()
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 138, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
 [2024-01-23 22:06:07] ERROR    [MainThread] Loudness normalization failed, exception: A process in the process pool was terminated abruptly while the future was running or pending.
Traceback (most recent call last):
  File "D:\test\test.py", line 18, in <module>
    lrcer.run('./data/1.wav', target_lang='zh-cn')
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\openlrc.py", line 231, in run
    audio_paths = self.pre_process(paths, noise_suppress=noise_suppress)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\openlrc.py", line 289, in pre_process
    paths = preprocessor.run(noise_suppress)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\preprocess.py", line 139, in run
    ln_paths: list[Path] = self.loudness_normalization(ns_paths)
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\gyx\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\preprocess.py", line 114, in loudness_normalization
    raise exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

Diarization

Does this support speaker recognition, or do you plan to add it?

Transcription works, but translation does not

The resulting preprocessed.lrc is still in English when opened, and the OpenAI API shows no usage record.

[2023-12-28 20:23:59] INFO     [Producer_0] Start Sentence Segmentation
 [2023-12-28 20:24:00] INFO     [Producer_0] Sentence Segmentation Elapsed: 0.35s
 [2023-12-28 20:24:00] INFO     [Producer_0] Detected language: en
 [2023-12-28 20:24:00] INFO     [Producer_0] Transcription process Elapsed: 11.74s
 [2023-12-28 20:24:00] INFO     [Producer_0] File saved to D:\CloudMusic\电台节目\preprocessed\阿坑是个坑 - 四六级长难句精听磨耳朵 19_preprocessed_transcribed.json
 [2023-12-28 20:24:00] INFO     [Producer_0] Transcription producer finished.
 [2023-12-28 20:24:00] INFO     [ThreadPoolExecutor-1_0] Got transcription: D:\CloudMusic\电台节目\preprocessed\阿坑是个坑 - 四六级长难句精听磨耳朵 19_preprocessed_transcribed.json
 [2023-12-28 20:24:00] INFO     [ThreadPoolExecutor-1_0] Optimized json file saved to D:\CloudMusic\电台节目\preprocessed\阿坑是个坑 - 四六级长难句精听磨耳朵 19_preprocessed_transcribed_optimized.json
 [2023-12-28 20:24:00] INFO     [ThreadPoolExecutor-1_0] Start Translation process
 [2023-12-28 20:24:00] INFO     [ThreadPoolExecutor-1_0] Translating 阿坑是个坑 - 四六级长难句精听磨耳朵 19_preprocessed: 1 chunks, 8 lines in total.
 [2023-12-28 20:24:00] INFO     [ThreadPoolExecutor-1_0] Translation process Elapsed: 0.57s
 [2023-12-28 20:24:00] INFO     [ThreadPoolExecutor-1_0] File saved to D:\CloudMusic\电台节目\preprocessed\阿坑是个坑 - 四六级长难句精听磨耳朵 19_preprocessed.lrc
 [2023-12-28 20:24:00] INFO     [ThreadPoolExecutor-1_0] Translation fee til now: 0.0000 USD
 [2023-12-28 20:24:00] INFO     [Consumer_0] Transcription consumer finished.
 [2023-12-28 20:24:00] INFO     [MainThread] Transcription (Producer) and Translation (Consumer) process Elapsed: 12.33s
Traceback (most recent call last):
  File "D:\pythonProject\open use1\1.py", line 9, in <module>
    lrcer.run("D:\CloudMusic\电台节目\阿坑是个坑 - 四六级长难句精听磨耳朵 19.mp3",
  File "D:\conda\envs\openlrc\Lib\site-packages\openlrc\openlrc.py", line 241, in run
    raise self.exception
  File "D:\conda\envs\openlrc\Lib\site-packages\openlrc\openlrc.py", line 134, in consumer_worker
    final_subtitle = self._translate(audio_name, prompter, target_lang, transcribed_opt_sub,
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\conda\envs\openlrc\Lib\site-packages\openlrc\openlrc.py", line 159, in _translate
    target_texts = translator.translate(
                   ^^^^^^^^^^^^^^^^^^^^^
  File "D:\conda\envs\openlrc\Lib\site-packages\openlrc\translate.py", line 133, in translate
    response = translate_bot.message(messages_list, output_checker=prompter.check_format)[0]
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\conda\envs\openlrc\Lib\site-packages\openlrc\chatbot.py", line 144, in message
    token_numbers = [get_messages_token_number(message) for message in messages_list]
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\conda\envs\openlrc\Lib\site-packages\openlrc\chatbot.py", line 144, in <listcomp>
    token_numbers = [get_messages_token_number(message) for message in messages_list]
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\conda\envs\openlrc\Lib\site-packages\openlrc\utils.py", line 108, in get_messages_token_number
    total = sum([get_text_token_number(element['content'], model=model) for element in messages])
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\conda\envs\openlrc\Lib\site-packages\openlrc\utils.py", line 108, in <listcomp>
    total = sum([get_text_token_number(element['content'], model=model) for element in messages])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\conda\envs\openlrc\Lib\site-packages\openlrc\utils.py", line 102, in get_text_token_number
    tokens = tiktoken.encoding_for_model(model).encode(text)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\conda\envs\openlrc\Lib\site-packages\tiktoken\model.py", line 75, in encoding_for_model
    return get_encoding(encoding_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\conda\envs\openlrc\Lib\site-packages\tiktoken\registry.py", line 63, in get_encoding
    enc = Encoding(**constructor())
                     ^^^^^^^^^^^^^
  File "D:\conda\envs\openlrc\Lib\site-packages\tiktoken_ext\openai_public.py", line 64, in cl100k_base
    mergeable_ranks = load_tiktoken_bpe(
                      ^^^^^^^^^^^^^^^^^^
  File "D:\conda\envs\openlrc\Lib\site-packages\tiktoken\load.py", line 115, in load_tiktoken_bpe
    return {
           ^
  File "D:\conda\envs\openlrc\Lib\site-packages\tiktoken\load.py", line 117, in <dictcomp>
    for token, rank in (line.split() for line in contents.splitlines() if line)
        ^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)
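
The last frame of the traceback unpacks every non-empty line of the vocabulary file into a `(token, rank)` pair. A line with more than two whitespace-separated fields produces exactly this `ValueError`. One possible cause, which is an assumption and not confirmed by the log, is a corrupted or incomplete cached download of the `cl100k_base` file (for example an HTML error page saved in its place). The sketch below mirrors that parsing step with an explicit check; `parse_bpe` is a hypothetical helper for illustration, not part of tiktoken or openlrc.

```python
import base64


def parse_bpe(contents: bytes) -> dict[bytes, int]:
    """Parse 'base64token rank' lines, reporting the first malformed line."""
    ranks: dict[bytes, int] = {}
    for lineno, line in enumerate(contents.splitlines(), start=1):
        if not line:
            continue  # blank lines are skipped, as in the traceback's generator
        fields = line.split()
        if len(fields) != 2:
            # This is the situation that surfaces as
            # "too many values to unpack (expected 2)" in the traceback.
            raise ValueError(f"line {lineno}: expected 2 fields, got {len(fields)}")
        token, rank = fields
        ranks[base64.b64decode(token)] = int(rank)
    return ranks


# A well-formed two-line file parses cleanly.
print(parse_bpe(b"aGVsbG8= 0\nd29ybGQ= 1\n"))
```

If a check like this fails on the locally cached file, deleting that file so it is downloaded again may be worth trying.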

FileNotFoundError: [WinError 2] The system cannot find the file specified: 'data\\preprocessed\\track5_ln.wav' -> 'data\\preprocessed\\track5_preprocessed.wav'

I get this error when running the code. Help?

`C:\Test\whisperx>whisper-this-lc.py
C:\Users\Vert\AppData\Local\Programs\Python\Python311\Lib\site-packages\df\io.py:9: UserWarning: `torchaudio.backend.common.AudioMetaData` has been moved to `torchaudio.AudioMetaData`. Please update the import path.
  from torchaudio.backend.common import AudioMetaData
 [2023-12-27 14:20:22] INFO     [MainThread] Default context config not found: Context(background=, audio_type=Anime, description_map={}), using default context.
 [2023-12-27 14:20:22] INFO     [MainThread] Preprocessed audio already exists in data\preprocessed\track1_preprocessed.wav
 [2023-12-27 14:20:22] INFO     [MainThread] Preprocessed audio already exists in data\preprocessed\track2_preprocessed.wav
 [2023-12-27 14:20:22] INFO     [MainThread] Preprocessed audio already exists in data\preprocessed\track3_preprocessed.wav
 [2023-12-27 14:20:22] INFO     [MainThread] Preprocessed audio already exists in data\preprocessed\track4_preprocessed.wav
 [2023-12-27 14:20:22] INFO     [MainThread] Loudness normalizing...
C:\Users\Vert\AppData\Local\Programs\Python\Python311\Lib\site-packages\df\io.py:9: UserWarning: `torchaudio.backend.common.AudioMetaData` has been moved to `torchaudio.AudioMetaData`. Please update the import path.
  from torchaudio.backend.common import AudioMetaData
 [2023-12-27 14:20:29] INFO     [MainThread] Normalizing file track5.wav (1 of 1)
 [2023-12-27 14:20:29] INFO     [MainThread] Running first pass loudnorm filter for stream 0
 [2023-12-27 14:20:39] INFO     [MainThread] Running second pass for data\track5.wav
 [2023-12-27 14:20:39] ERROR    [MainThread] Error while running command 'C:\Program Files\ImageMagick-7.0.10-Q16-HDRI\ffmpeg.EXE' -hide_banner -y -i 'data\track5.wav' -filter_complex '[0:0]loudnorm=i=-23.0:lra=21.3:tp=-2.0:offset=-0.18:measured_i=-13.62:measured_lra=21.3:measured_tp=-2.6:measured_thresh=-28.24:linear=true:print_format=json[norm0]' -map_metadata 0 -map_metadata:s:a:0 0:s:a:0 -map_chapters 0 -map '[norm0]' -c:a pcm_s16le -ar 48000 -c:s copy -f wav 'C:\Users\Vert\AppData\Local\Temp\tmpmwnezkx8\out.wav'! Error: Error running command ['C:\\Program Files\\ImageMagick-7.0.10-Q16-HDRI\\ffmpeg.EXE', '-hide_banner', '-y', '-i', 'data\\track5.wav', '-filter_complex', '[0:0]loudnorm=i=-23.0:lra=21.3:tp=-2.0:offset=-0.18:measured_i=-13.62:measured_lra=21.3:measured_tp=-2.6:measured_thresh=-28.24:linear=true:print_format=json[norm0]', '-map_metadata', '0', '-map_metadata:s:a:0', '0:s:a:0', '-map_chapters', '0', '-map', '[norm0]', '-c:a', 'pcm_s16le', '-ar', '48000', '-c:s', 'copy', '-f', 'wav', 'C:\\Users\\Vert\\AppData\\Local\\Temp\\tmpmwnezkx8\\out.wav']: Guessed Channel Layout for Input Stream #0.0 : stereo
Input #0, wav, from 'data\track5.wav':
Duration: 00:04:45.33, bitrate: 2822 kb/s
Stream #0:0: Audio: pcm_s32le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s32, 2822 kb/s
[loudnorm @ 00000287a4d25580] Value 21.300000 for parameter 'lra' out of range [1 - 20]
Last message repeated 1 times
[loudnorm @ 00000287a4d25580] Error setting option lra to value 21.3.
[Parsed_loudnorm_0 @ 00000287a4d73a40] Error applying options to the filter.
[AVFilterGraph @ 00000287a4d74400] Error initializing filter 'loudnorm' with args 'i=-23.0:lra=21.3:tp=-2.0:offset=-0.18:measured_i=-13.62:measured_lra=21.3:measured_tp=-2.6:measured_thresh=-28.24:linear=true:print_format=json'
Error initializing complex filters.
Result too large

Traceback (most recent call last):
  File "C:\StableDiff\whisperx\whisper-this-lc.py", line 5, in <module>
    lrcer.run(['./data/track1.wav', './data/track2.wav', './data/track3.wav', './data/track4.wav', './data/track5.wav'], target_lang='en-us')
  File "C:\Users\Vert\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\openlrc.py", line 224, in run
    audio_paths = self.pre_process(paths, noise_suppress=noise_suppress)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Vert\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\openlrc.py", line 282, in pre_process
    paths = preprocessor.run(noise_suppress)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Vert\AppData\Local\Programs\Python\Python311\Lib\site-packages\openlrc\preprocess.py", line 136, in run
    ln_path = ln_path.rename(ln_path.parent / f'{audio_name}_preprocessed.wav')
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Vert\AppData\Local\Programs\Python\Python311\Lib\pathlib.py", line 1175, in rename
    os.rename(self, target)
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'data\\preprocessed\\track5_ln.wav' -> 'data\\preprocessed\\track5_preprocessed.wav'`
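
Reading the log, the `FileNotFoundError` looks like a follow-on failure: the second loudnorm pass aborts because `lra=21.3` is outside the `[1 - 20]` range this ffmpeg build accepts, so `track5_ln.wav` is never written and the later rename has nothing to move. Below is a minimal sketch of two guards that would make this easier to spot. Both `clamp_lra` and `finalize_preprocessed` are hypothetical helpers written for illustration; they are not openlrc or ffmpeg-normalize functions, and the range constants are taken only from the error message above.

```python
from pathlib import Path

# Range reported by the loudnorm filter in the log above; other ffmpeg
# builds may accept a different range.
LRA_MIN, LRA_MAX = 1.0, 20.0


def clamp_lra(lra: float) -> float:
    """Clamp a loudness-range target into the range loudnorm accepts."""
    return max(LRA_MIN, min(LRA_MAX, lra))


def finalize_preprocessed(ln_path: Path, audio_name: str) -> Path:
    """Rename the normalized file, failing early if normalization produced nothing."""
    if not ln_path.exists():
        raise FileNotFoundError(
            f"{ln_path} is missing; loudness normalization probably failed, "
            "check the ffmpeg error earlier in the log"
        )
    return ln_path.rename(ln_path.parent / f"{audio_name}_preprocessed.wav")


print(clamp_lra(21.3))  # the value from the log, clamped to the upper bound
```

Clamping changes the normalization target slightly, so whether it is acceptable depends on the audio; the existence check only improves the error message and does not fix the ffmpeg failure itself.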
