patrickenfuego / chapterize-audiobooks

Split a single, monolithic mp3 audiobook file into chapters using Machine Learning and ffmpeg.

License: Apache License 2.0

Python 100.00%

audiobook-convertor audiobook-tracks audiobooks chapters machine-learning mp3-converter mp3-files mp3-tags speech-to-text

chapterize-audiobooks's Introduction

Chapterize-Audiobooks


About

This is a simple command line utility that will chapterize your mp3 audiobooks for you. No longer will you have to dissect a waveform to look for chapter breaks, or deal with all the annoyances that come with a single audiobook file.

You can use this as an intermediary step for creating .m4b files, or keep the files the way they are if you don't want to sacrifice audio quality through re-encoding (or prefer .mp3 files for some reason).

Machine Learning

The script uses the vosk-api machine learning library to perform a speech-to-text conversion on the audiobook file, generating timestamps throughout that are stored in an SRT (SubRip) file. The file is then parsed for phrases like "prologue", "chapter", and "epilogue", which are used as separators for the generated chapter files.
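Here is a minimal sketch of the parsing step. It is not the script's actual code. It scans SRT text using only the standard library and returns the start time of every subtitle block that mentions a separator word. The names find_chapter_starts and SEPARATORS are hypothetical. Matching whole words anywhere in a block is an assumption about how the search works.

```python
import re

# Hypothetical separator list; the script's real list may differ
SEPARATORS = ("prologue", "chapter", "epilogue")
SEPARATOR_RE = re.compile(r"\b(" + "|".join(SEPARATORS) + r")\b", re.IGNORECASE)
# SRT timestamps look like 01:02:03,456 (hours:minutes:seconds,milliseconds)
TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def srt_time_to_seconds(stamp: str) -> float:
    """Convert an SRT timestamp such as '00:01:02,500' to seconds."""
    match = TIME_RE.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"not an SRT timestamp: {stamp!r}")
    h, m, s, ms = (int(g) for g in match.groups())
    return h * 3600 + m * 60 + s + ms / 1000


def find_chapter_starts(srt_text: str) -> list:
    """Return (start_seconds, text) for each subtitle block mentioning a separator."""
    hits = []
    # SRT blocks are separated by blank lines: index, timing line, then text lines
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3 or "-->" not in lines[1]:
            continue  # skip malformed blocks
        start = lines[1].split("-->")[0]
        text = " ".join(lines[2:])
        if SEPARATOR_RE.search(text):
            hits.append((srt_time_to_seconds(start), text))
    return hits
```

Given blocks that read "prologue" at 00:00:00,000 and "chapter one" at 00:10:00,500, the function returns their start times: 0.0 and 600.5 seconds. These become the split points.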

Models

NOTE: Downloading models requires the requests library

The small United States English (en-us) machine learning model is already provided with this project. However, various other languages and model sizes are available to download directly from the script - it will unpack the archive for you, too, and no additional effort should be required. See Supported Languages and Models for a comprehensive list.

Models should be saved within the project's model directory as this is where the script looks for them. If you want to save multiple models or sizes, no problem - just don't change the name of the unpacked model archive as the script uses portions of it to determine which model to use based on the combination of arguments passed.

If there is a model conflict, the script will make a best-effort guess to determine which one to use. If that fails, an error is thrown.
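Here is a sketch of how name-based model lookup might work. It assumes the upstream vosk naming convention: small models contain '-small-' (e.g. vosk-model-small-en-us-0.15) and large ones do not (e.g. vosk-model-en-us-0.22). The name select_model is hypothetical. Picking the last name in sorted order is only a stand-in for the script's best-effort guess.

```python
def select_model(model_names: list, lang: str, size: str) -> str:
    """Hypothetical helper: pick a model directory name for a language code and size.

    Assumes upstream vosk naming, where small models contain '-small-'
    and large models do not.
    """
    if size not in ("small", "large"):
        raise ValueError(f"unknown model size: {size!r}")
    candidates = []
    for name in model_names:
        if f"-{lang}-" not in name:
            continue  # wrong language
        is_small = "-small-" in name
        if (size == "small") == is_small:
            candidates.append(name)
    if not candidates:
        raise FileNotFoundError(f"no {size} model found for language {lang!r}")
    # Conflict: several models match; stand-in guess is the last name in sorted order
    return sorted(candidates)[-1]
```

This is also why renaming an unpacked model directory can break lookup: the language code and size live only in the name.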

Metadata Parsing

The script will also parse metadata from the source file, along with the cover art (if present), and copy it into each chapter file automatically. You can also pass your own ID3-compliant metadata through CLI parameters. When a field is set both ways, your value always takes precedence over the one extracted from the source file; otherwise, the two sets of tags are combined and added to the final output.
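The precedence rule can be expressed as a simple dictionary merge. In this sketch, merge_metadata is a hypothetical name. User-supplied values override extracted ones. User fields left as None or as an empty string are skipped so they don't erase extracted tags; None is argparse's default for options that weren't passed.

```python
def merge_metadata(extracted: dict, user: dict) -> dict:
    """Hypothetical helper: combine source-file tags with user-supplied tags.

    User values win on conflicts; unset user fields (None or '') are ignored.
    """
    merged = dict(extracted)  # start from the tags found in the source file
    for key, value in user.items():
        if value not in (None, ""):
            merged[key] = value  # user value takes precedence
    return merged
```

For example, merging {'title': 'Old', 'artist': 'A'} with {'title': 'New', 'genre': None} keeps the artist, replaces the title and ignores the unset genre.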

Cue files

Cue files can be created to make editing chapter markers and start/stop timecodes easier. This is especially useful in three cases: the speech-to-text conversion misses a section, timecodes need tweaking, or you want to add breakpoint sections (such as 'foreword' or 'afterword') that the script's parsing engine does not currently support. Those may not be supported anytime soon, because making them work consistently without breaking other parsing is difficult.

Cue files are always generated within the same directory as the audiobook itself, but there are arguments which allow you to specify a custom path to a cue file if it is inconvenient to keep them paired with the audiobook.

Cue file syntax is generated in a somewhat unconventional way to make script parsing easier, but is similar enough that it can be converted to a standard .cue format fairly easily.
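The script's own cue syntax is described only as "similar enough" to standard .cue, so the sketch below does not try to reproduce it. Instead it shows the conversion target: a standard CUE sheet built from (title, start-seconds) pairs. CUE INDEX times use minutes:seconds:frames, with 75 frames per second. The function names are hypothetical. Minutes are allowed to exceed 99 here because audiobooks often run longer than that, which some players may not accept.

```python
def to_cue_index(seconds: float) -> str:
    """Format seconds as a CUE INDEX time mm:ss:ff (75 frames per second)."""
    total_frames = round(seconds * 75)
    minutes, rem = divmod(total_frames, 75 * 60)
    secs, frames = divmod(rem, 75)
    return f"{minutes:02d}:{secs:02d}:{frames:02d}"


def build_cue_sheet(audio_file: str, chapters: list) -> str:
    """Hypothetical helper: build a standard CUE sheet from (title, start_seconds) pairs."""
    lines = [f'FILE "{audio_file}" MP3']
    for number, (title, start) in enumerate(chapters, start=1):
        lines.append(f"  TRACK {number:02d} AUDIO")
        lines.append(f'    TITLE "{title}"')
        lines.append(f"    INDEX 01 {to_cue_index(start)}")
    return "\n".join(lines) + "\n"
```

For example, build_cue_sheet("book.mp3", [("Prologue", 0), ("Chapter 1", 90)]) produces two tracks with INDEX 01 00:00:00 and INDEX 01 01:30:00.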

Configuration File

Included with this project is a defaults.toml file where you can set configuration options you use frequently. This removes the need to pass certain CLI arguments each time you run the script. It also lets you specify file paths, such as the path to ffmpeg if you don't want to set environment variables (see below).

Here is the base template included with the project:

# Uncomment lines starting with '#' to set config options, or modify an existing line.
# No spaces before/after the '='!
#
# Default model language
default_language='english'
# Default model size
default_model='small'
# Defaults to the system's PATH. You can specify a full path in single '' quotes
ffmpeg_path='ffmpeg'
# Change this to True if you always want the script to generate a cue file
generate_cue_file='False'
# Set this to the cue file path you want to use. Useful for continuous edits and script runs where the
# cue file is saved somewhere other than the current audiobook directory (the default search path).
# The cue_path script argument takes precedence over this path if used
cue_path=''
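The template above is valid TOML (single-quoted literal strings), so on Python 3.11+ the standard tomllib module could read it. The project supports Python 3.10, though, so here is a minimal stand-alone reader. It assumes the one key='value' per line format shown in the template and is not a full TOML parser. parse_defaults is a hypothetical name. Note that generate_cue_file='False' is a quoted string, so the sketch converts 'True'/'False' to booleans explicitly.

```python
def parse_defaults(text: str) -> dict:
    """Hypothetical helper: parse key='value' lines like those in defaults.toml.

    Blank lines and lines starting with '#' are skipped; 'True'/'False'
    become booleans. Simplified reader for the template's format only.
    """
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # comment, blank, or not a setting
        key, value = line.split("=", 1)
        value = value.strip().strip("'\"")  # drop surrounding quotes
        if value in ("True", "False"):
            config[key.strip()] = value == "True"
        else:
            config[key.strip()] = value
    return config
```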

Dependencies

ffmpeg
python 3.10+
- Packages:
  - rich
  - vosk
  - requests (if you want to download models)

To install python dependencies, open a command shell and type the following:

NOTE: If you're on Linux, you might need to use pip3 instead

# Using the requirements file (recommended)
pip install -r requirements.txt
# Manually installing packages
pip install vosk rich requests

ffmpeg

It is recommended that you add ffmpeg to your system PATH so you don't have to run the script from the same directory as ffmpeg. How you do this depends on your operating system; consult your OS documentation (if you aren't familiar with the process, it's easy to look up).

Here is a quick example for Windows using PowerShell (it can be done via GUI, too):

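# The folder that contains ffmpeg.exe (PATH entries must be directories, not files)
$ffmpegDir = 'C:\Users\SomeUser\Software'
# Read only the User-scope PATH so Machine entries aren't copied into it
$userPath = [Environment]::GetEnvironmentVariable('PATH', 'User')
[Environment]::SetEnvironmentVariable('PATH', "$userPath;$ffmpegDir", 'User')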
# Now close and reopen PowerShell to update

Here is a quick example using bash:

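# Set this to the directory that contains the ffmpeg binary
ffmpeg_dir="/home/someuser/software/ffmpeg"
# If you're using zsh, replace with .zshrc
# \$PATH is escaped so it expands when the shell starts, not when this line runs
echo "export PATH=\"${ffmpeg_dir}:\$PATH\"" >> ~/.bashrc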
# Source the file to update
source ~/.bashrc

If you don't want to deal with all that, you can add the path of ffmpeg to the defaults.toml file included with the repository - copy and paste the full path and set it equal to the ffmpeg_path option using single quotes '':

# Specifying the path to ffmpeg manually
ffmpeg_path='C:\Users\SomeUser\Software\ffmpeg.exe'
# If ffmpeg is added to PATH, leave the file like this
ffmpeg_path='ffmpeg'
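You may want to check which ffmpeg a given ffmpeg_path setting actually points to. Here is a minimal sketch for that. resolve_ffmpeg is a hypothetical helper, not part of the script. It treats a bare name as a PATH lookup via shutil.which and anything else as a direct path to the executable.

```python
import shutil
from pathlib import Path


def resolve_ffmpeg(ffmpeg_path: str = "ffmpeg") -> str:
    """Hypothetical helper: return a usable ffmpeg executable path or raise.

    A bare name such as 'ffmpeg' is looked up on PATH; anything with a
    directory part is treated as the path to the executable itself.
    """
    candidate = Path(ffmpeg_path)
    if candidate.name == ffmpeg_path:  # no directory part: search PATH
        found = shutil.which(ffmpeg_path)
        if found is None:
            raise FileNotFoundError(f"{ffmpeg_path!r} not found on PATH")
        return found
    if not candidate.is_file():
        raise FileNotFoundError(f"ffmpeg not found at {candidate}")
    return str(candidate)
```

If resolve_ffmpeg('ffmpeg') raises, either PATH wasn't updated (reopen the shell) or ffmpeg_path should be set to a full path.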

Supported Languages and Models

NOTE: You can set a default language and model size in the defaults.toml file included with the repository

The vosk-api provides models in many languages. Only the small 'en-us' model ships with this repository, but you can download additional models with the script's --download_model/-dm parameter. It accepts small or large and defaults to small if no value is given. For a non-English model, you must also specify a language with the --language/-l parameter. See Usage for more info.

Not every vosk model can be downloaded through the script, but you can download additional models manually from the vosk website (and other sources listed on the site). Simply replace the existing model inside the /model directory with the one you wish to use.

The following is a list of models which can be downloaded using the --download_model parameter of the script:

You can use either the Language or Code fields to specify a model

Language	Code	Small	Large
English (US)	en-us	✓	✓
English (India)	en-in	✓	✓
Chinese	cn	✓	✓
Russian	ru	✓	✓
French	fr	✓	✓
German	de	✓	✓
Spanish	es	✓	✓
Portuguese	pt	✓	✓
Greek	el	✕	✓
Turkish	tr	✕	✓
Vietnamese	vn	✓	✕
Italian	it	✓	✓
Dutch	nl	✓	✕
Catalan	ca	✓	✕
Arabic	ar	✕	✓
Farsi	fa	✓	✓
Filipino	tl-ph	✕	✓
Kazakh	kz	✓	✓
Japanese	ja	✓	✓
Ukrainian	uk	✓	✓
Esperanto	eo	✓	✕
Hindi	hi	✓	✓
Czech	cs	✓	✕
Polish	pl	✓	✕

The quality of the speech-to-text conversion depends heavily on the quality of the audio. The model included in this repo is a small one intended for mobile devices, since it is the only size that fits in a GitHub repository. If you aren't getting good results, consider using a larger model (if one is available for your language).

Usage

usage: chapterize_ab.py [-h] or [--help]

usage: chapterize_ab.py [-ll] or [--list_languages]

usage: chapterize_ab.py [AUDIOBOOK_PATH] [--timecodes_file [TIMECODES_FILE]] [--language [LANGUAGE]]
                        [--download_model [{small,large}]] [--narrator [NARRATOR]] [--comment [COMMENT]]
                        [--model [{small,large}]] [--cover_art [COVER_ART_PATH]] [--author [AUTHOR]]
                        [--year [YEAR]] [--title [TITLE]] [--genre [GENRE]] [--write_cue_file]
                        [--cue_path [CUE_PATH]]

positional arguments:

  AUDIOBOOK_PATH          path to audiobook mp3 file. required.
  

optional argument flags:

  -h, --help              show help message with usage examples and exit.

  -ll, --list_languages   list supported languages and exit.

  -wc, --write_cue_file   generate a cue file inside the audiobook directory for editing chapter markers.
                          default disabled, but can be enabled permanently through defaults.toml.
                          
  
optional arguments:
  
  --timecodes_file, -tc [TIMECODES_FILE]
  DESCRIPTION:            optional path to an existing srt timecode file in a different directory.
                        
  --language, -l [LANGUAGE]
  DESCRIPTION:            model language to use. requires a supported model.
                          en-us is provided with the project.
  
  --model, -m [{small,large}]
  DESCRIPTION:            model type to use where multiple models are available. default is small.
  
  --download_model, -dm [{small,large}]
  DESCRIPTION:            download a model archive. language to download specified via
                          the --language argument.     
                        
  --cover_art, -ca [COVER_ART_PATH]
  DESCRIPTION:            path to cover art file. Optional.
                        
  --author, -a [AUTHOR]
  DESCRIPTION:            audiobook author. Optional metadata field.

  --narrator, -n [NARRATOR]
  DESCRIPTION:            audiobook narrator (should be compatible with most players). 
                          optional metadata field.
                        
  --title, -t [TITLE]
  DESCRIPTION:            audiobook title. optional metadata field.
                        
  --genre, -g [GENRE]
  DESCRIPTION:            audiobook genre. optional metadata field. multiple genres can be separated 
                          by a semicolon
                          
  --year, -y [YEAR]
  DESCRIPTION:            audiobook release year. optional metadata field.
                        
  --comment, -c [COMMENT]
  DESCRIPTION:            audiobook comment. optional metadata field.

  --cue_path, -cp [CUE_PATH]
  DESCRIPTION:            path to cue file in non-default location (i.e., not in the audiobook directory) 
                          containing chapter timecodes. can also be set within defaults.toml, which has 
                          lesser precedence than this argument.
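The cue file location follows a three-step precedence: the --cue_path argument, then cue_path in defaults.toml, then a .cue file next to the audiobook. Here is a sketch of that rule. resolve_cue_path is a hypothetical helper. The default mirrors the audiobook.with_suffix('.cue') call visible in the tracebacks in the issues below.

```python
from pathlib import Path
from typing import Optional


def resolve_cue_path(audiobook: Optional[Path],
                     cli_cue_path: Optional[str] = None,
                     config_cue_path: str = "") -> Path:
    """Hypothetical helper: choose the cue file location.

    Precedence: --cue_path argument, then cue_path in defaults.toml,
    then a .cue file with the audiobook's name in the audiobook's directory.
    """
    if cli_cue_path:
        return Path(cli_cue_path)
    if config_cue_path:
        return Path(config_cue_path)
    if audiobook is None:
        # Guard against calling with_suffix() on None (e.g. only -dm was passed)
        raise ValueError("an audiobook path is needed to locate the default cue file")
    return audiobook.with_suffix(".cue")
```

Checking for a missing audiobook path before calling with_suffix avoids the "'NoneType' object has no attribute 'with_suffix'" error. That error shows up in the issues below when the script is run without an audiobook path.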

Examples

NOTE: Each argument has a shortened alias. Most examples use the full argument name for clarity, but it's often more convenient in practice to use the aliases

# Adding the title and genre metadata fields 
~$ python3 ./chapterize_ab.py '/path/to/audiobook/file.mp3' --title 'Game of Thrones' --genre 'Fantasy'

# Adding an external cover art file (using shorthand flag -ca)
PS > python .\chapterize_ab.py 'C:\path\to\audiobook\file.mp3' -ca 'C:\path\to\cover_art.jpg'

# Set model to use German as the language (requires a different model, see above)
PS > python .\chapterize_ab.py 'C:\path\to\audiobook\file.mp3' --language 'de'

# Download a different model (Italian large used here as an example)
~$ python3 ./chapterize_ab.py '/path/to/audiobook/file.mp3' --download_model 'large' --language 'italian'

# Write a cue file inside the audiobook directory using the --write_cue_file option flag
PS > python .\chapterize_ab.py 'C:\path\to\audiobook\file.mp3' --write_cue_file

# Specify custom path to a cue file. Overrides the default search path (audiobook directory)
~$ python3 ./chapterize_ab.py '/path/to/audiobook/file.mp3' --cue_path '/path/to/file.cue'

Improvement

This script is still in alpha, and thus there are bound to be some issues; I've noticed a few words and phrases that have falsely generated chapter markers, which I'm currently compiling into an ignore list as I see them. With that said, it's been remarkably accurate so far.

I encourage anyone who might use this to report any issues you find, particularly with false positive chapter markers. The more false positives identified, the more accurate it will be!

Language Support

So far, this project primarily targets English audiobooks; I've added some German content, but I'm by no means a fluent speaker and there are a lot of gaps.

If you want to contribute an exclusion list and chapter markers for other languages (preferably vosk supported languages), please do! Open a pull request or send them to me in a GitHub issue and I'll gladly merge them into the project. I'd like to make this project multi-lingual, but I can't do it without your help.
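Here is a sketch of how chapter separators and an exclusion list could fit together per language. The tables are hypothetical. The excluded phrases are made-up examples, not the project's actual ignore list, and the German entries (Prolog, Kapitel, Epilog) are only illustrative. LANGUAGE_RULES and is_chapter_marker are illustrative names. A contributed language would add one more entry with the same two keys.

```python
import re

# Hypothetical per-language tables (illustrative only, not the project's data)
LANGUAGE_RULES = {
    "en-us": {
        "separators": ["prologue", "chapter", "epilogue"],
        "excluded": ["chapter and verse", "next chapter"],  # made-up examples
    },
    "de": {
        "separators": ["prolog", "kapitel", "epilog"],
        "excluded": [],
    },
}


def is_chapter_marker(text: str, lang: str) -> bool:
    """True if text contains a separator word and none of the excluded phrases."""
    rules = LANGUAGE_RULES[lang]
    lowered = text.lower()
    if any(phrase in lowered for phrase in rules["excluded"]):
        return False  # known false positive
    pattern = r"\b(" + "|".join(map(re.escape, rules["separators"])) + r")\b"
    return re.search(pattern, lowered) is not None
```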

Known Issues

Access Denied Error on Windows

Every once in a while when downloading a new model on Windows, it will throw an "Access Denied" exception after attempting to rename the extracted file. This isn't really a permissions issue, but rather a concurrency one. I've found that closing any app or Explorer window that might be related to Chapterize-Audiobooks usually fixes this problem. This seems to be a somewhat common issue with Python on Windows when renaming/deleting/moving files.
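If closing other windows isn't convenient, a common workaround is to retry the rename after a short pause. This sketch assumes the failure surfaces as Python's PermissionError (WinError 5, "Access is denied"). rename_with_retry is a hypothetical helper, not something the script currently does.

```python
import os
import time


def rename_with_retry(src: str, dst: str, attempts: int = 5, delay: float = 0.5) -> None:
    """Hypothetical helper: rename src to dst, retrying while Windows reports access denied.

    Another process (e.g. an Explorer window) may briefly hold a handle on the
    extracted files; waiting and retrying often succeeds.
    """
    for attempt in range(1, attempts + 1):
        try:
            os.replace(src, dst)
            return
        except PermissionError:
            if attempt == attempts:
                raise  # give up and surface the original error
            time.sleep(delay * attempt)  # wait a little longer each time
```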


chapterize-audiobooks's Issues

When I run the script, I get a syntax error in line 40, to do with vosk_link

Installed on my Windows 10 machine, with python3 and ffmpeg etc up to date. When I run any command involving chapterize_ab.py (including "-h"), I get the following error:

File "C:\dev\chapterize-audiobooks-main\chapterize_ab.py", line 40
    vosk_link = f"[link={vosk_url}]this link[/link]"
    ^
SyntaxError: invalid syntax

Any ideas why?

ERROR: Failed to generate timecode file with vosk: module 'srt' has no attribute 'Subtitle'

Hi @patrickenfuego, I'm trying to use your package to chapterize an audiobook, and I've successfully installed (with pip) rich, vosk, and requests and even reinstalled them (after purging pip cache), but I am running into this error. Because I wanted to try and troubleshoot the issue I removed the -loglevel quiet flag from chapterize_ab.py#760 as was suggested in this issue #17 so that the output would be verbose.

This is the error below. Any help would be appreciated. Thank you!
M

pip3 install vosk
Collecting vosk
  Downloading vosk-0.3.45-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl.metadata (1.8 kB)
Requirement already satisfied: cffi>=1.0 in /home/homc/miniforge3/lib/python3.10/site-packages (from vosk) (1.16.0)
Requirement already satisfied: requests in /home/homc/miniforge3/lib/python3.10/site-packages (from vosk) (2.31.0)
Requirement already satisfied: tqdm in /home/homc/miniforge3/lib/python3.10/site-packages (from vosk) (4.66.2)
Requirement already satisfied: srt in /home/homc/miniforge3/lib/python3.10/site-packages (from vosk) (3.5.3)
Requirement already satisfied: websockets in /home/homc/miniforge3/lib/python3.10/site-packages (from vosk) (12.0)
Requirement already satisfied: pycparser in /home/homc/miniforge3/lib/python3.10/site-packages (from cffi>=1.0->vosk) (2.22)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/homc/miniforge3/lib/python3.10/site-packages (from requests->vosk) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /home/homc/miniforge3/lib/python3.10/site-packages (from requests->vosk) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/homc/miniforge3/lib/python3.10/site-packages (from requests->vosk) (2.2.1)
Requirement already satisfied: certifi>=2017.4.17 in /home/homc/miniforge3/lib/python3.10/site-packages (from requests->vosk) (2024.2.2)
Downloading vosk-0.3.45-py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (7.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 7.2/7.2 MB 3.4 MB/s eta 0:00:00
Installing collected packages: vosk
Successfully installed vosk-0.3.45

python3 ./chapterize_ab.py /mnt/c/Users/Orca/Downloads/redacted.mp3 -ca /mnt/c/Users/Orca/Downloads/redacted.jpg
─────────────────────────── Starting script 
Preparing chapterfying magic ⚡...
──────────────────────── Extracting metadata 
SUCCESS! Metadata extraction complete
Merging extracted and user metadata...
─────────────────────────── ID3 Metadata 
│ {'title': 'redacted', 'artist': 'redacted', 'album': 'redacted', 'cover_art': PosixPath('/mnt/c/Users/Orca/Downloads/redacted_cover.jpg'), 'genre': 'Audiobook'}        │
──────────────────────── Discovering Cover Art 

SUCCESS! Cover art is...covered!
──────────────────────── Generating Timecodes 
✅ Local ML model found. Language: 'en-us'

▐  ⠠     ▌ Sit tight, this might take a while...ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers
  built with gcc 11 (Ubuntu 11.2.0-19ubuntu1)
  configuration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared
  libavutil      56. 70.100 / 56. 70.100
  libavcodec     58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter     7.110.100 /  7.110.100
  libswscale      5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
[mp3 @ 0x7fffe79f0340] Estimating duration from bitrate, this may be inaccurate
Input #0, mp3, from '/mnt/c/Users/Orca/Downloads/redacted.mp3':
  Metadata:
    title           : redacted
    artist          : redacted
    album           : redacted
  Duration: 10:45:46.42, start: 0.000000, bitrate: 64 kb/s
  Stream #0:0: Audio: mp3, 44100 Hz, mono, fltp, 64 kb/s
Stream mapping:
  Stream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))
Press [q] to stop, [?] for help
Output #0, s16le, to 'pipe:':
  Metadata:
    title           : redacted
    artist          : redacted
    album           : redacted
    encoder         : Lavf58.76.100
  Stream #0:0: Audio: pcm_s16le, 16000 Hz, mono, s16, 256 kb/s
    Metadata:
      encoder         : Lavc58.134.100 pcm_s16le
▐⠂       ▌ Sit tight, this might take a while...[mp3float @ 0x7fffe79f5640] overread, skip -5 enddists: -2 -2
[mp3float @ 0x7fffe79f5640] overread, skip -7 enddists: -2 -2
▐⠠       ▌ Sit tight, this might take a while...[mp3float @ 0x7fffe79f5640] overread, skip -6 enddists: -2 -2
▐ ⠠      ▌ Sit tight, this might take a while...[mp3float @ 0x7fffe79f5640] overread, skip -7 enddists: -5 -5
[mp3float @ 0x7fffe79f5640] overread, skip -5 enddists: -1 -1
▐      ⠈ ▌ Sit tight, this might take a while...[mp3float @ 0x7fffe79f5640] overread, skip -5 enddists: -2 -2
[mp3float @ 0x7fffe79f5640] overread, skip -7 enddists: -5 -5
[mp3float @ 0x7fffe79f5640] overread, skip -5 enddists: -2 -2
▐    ⠂   ▌ Sit tight, this might take a while...[mp3float @ 0x7fffe79f5640] overread, skip -7 enddists: -5 -5
[mp3float @ 0x7fffe79f5640] overread, skip -5 enddists: -1 -1
[mp3float @ 0x7fffe79f5640] overread, skip -7 enddists: -6 -6
[mp3float @ 0x7fffe79f5640] overread, skip -7 enddists: -1 -1
[mp3float @ 0x7fffe79f5640] overread, skip -7 enddists: -2 -2
▐      ⠂ ▌ Sit tight, this might take a while...[mp3float @ 0x7fffe79f5640] overread, skip -5 enddists: -1 -1
[mp3float @ 0x7fffe79f5640] overread, skip -6 enddists: -5 -5
▐⠈       ▌ Sit tight, this might take a while...[mp3float @ 0x7fffe79f5640] overread, skip -6 enddists: -1 -1
[mp3float @ 0x7fffe79f5640] overread, skip -5 enddists: -4 -4
▐    ⠂   ▌ Sit tight, this might take a while...size= 1210825kB time=10:45:46.40 bitrate= 256.0kbits/s speed=19.3x
video:0kB audio:1210825kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.000000%
ERROR: Failed to generate timecode file with vosk: module 'srt' has no attribute 'Subtitle'

upload demo gif

Add Additional Language Support

In a previous release, I modularized the project so it can leverage multiple different languages dynamically. I need help from people who speak those languages to fill out the excluded phrases and chapter separators so more people can use this tool.

Add GUI interface

Add a simple GUI for users who are less comfortable using the command line.

The script doesn't work...

I found your script and I really liked the idea. But I tried to run it and I get stuck all the time!
At first I got the following error:
File "C:\FFOutput\Chapterize-Audiobooks-0.6.0\chapterize_ab.py", line 316, in parse_args
args.audiobook.with_suffix('.cue').exists()
AttributeError: 'NoneType' object has no attribute 'with_suffix'
So I went to line 316 and deleted the
or args.audiobook.with_suffix('.cue').exists()
Now the script started working. And I got the message:
ERROR: The script only works with .mp3 files (for now)

I tried different commands and got the same error:

chapterize_ab.py -h
chapterize_ab.py 'C:\FFOutput\a.mp3' --title 'aaa' --genre 'Fantasy'

I tried at first from the Windows command line, and then also from IDLE (3.11), but without success.
I tried to run older versions of your script, and I got the first error in version 0.5 as well, and the second error in all your versions...
Would appreciate help.
P.S. I don't understand Python that well, so I may have skipped a step that is obvious to you simply due to lack of knowledge.

Generate timecode/cue file

After parsing, generate a file which can be used to edit chapter markers in situations where the split points are inaccurate.

Improve Chapter Parsing

Some audiobooks don't use normal keywords that can help identify the start of a chapter. For example, some don't say "chapter" before the identifier, but instead just say "One".

My goal is to help identify these section separators using the surrounding context, allowing for more accurate chapter breaks.

Option to skip writing the mp3s

Is it possible to just write the CUE file and skip writing the mp3s?

Experimental Chapter Separators

Add additional chapter separators:

Preface
Introduction
Foreword
Afterword

Initially these will be disabled by default and can only be enabled via a CLI switch until thorough testing is complete.

upload new gif

Feature Request: Some indicator that generate_timecodes is working

It would be nice if there was some indicator that the ffmpeg subprocess was working (maybe a tail of the SRT file) so as a user we can see it's still working through the file and not that the process is hung.

I know we could modify the chapterize_ab.py#760 and remove the -loglevel quiet arg and see that it's working but if a prettier option was available it would be nice.

Inconsistent support for mp4a

I love that the script can detect chapters and split a big file into them, but it would be nice if it also supported mp4a encoding consistently. The script can analyze an mp4a file and generate an SRT file from it, but it cannot split the file. Ideally, the script would detect that the source was encoded using mp4a and automatically convert it to a temporary mp3 file so it can be split. Alternatively, it could let the user know before processing starts that the encoding is not supported.

I manually converted the mp4a file to mp3 to confirm that the error was caused by the mp4 encoding and not something else, and it worked as expected.
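The log below shows why the file extension alone is not enough: Star_Raider.mp3 is actually AAC audio in an MP4 container. Here is a minimal sketch of the detect-then-convert idea from this request. It assumes ffprobe (which ships alongside ffmpeg) is available. The function names are hypothetical, and this is not code from the project.

```python
import subprocess


def audio_codec(path: str, ffprobe: str = "ffprobe") -> str:
    """Return the codec name of the first audio stream (e.g. 'mp3' or 'aac')."""
    result = subprocess.run(
        [ffprobe, "-v", "error", "-select_streams", "a:0",
         "-show_entries", "stream=codec_name",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def needs_mp3_conversion(codec: str) -> bool:
    """Stream-copy splitting into .mp3 only works when the audio is already MP3."""
    return codec != "mp3"


def mp3_conversion_command(src: str, dst: str, ffmpeg: str = "ffmpeg") -> list:
    """Build an ffmpeg command that re-encodes src's audio to an MP3 file.

    -vn drops embedded cover art, which can be re-attached separately (e.g. via --cover_art).
    """
    return [ffmpeg, "-i", src, "-vn", "-c:a", "libmp3lame", "-q:a", "2", dst]
```

A caller would check needs_mp3_conversion(audio_codec(path)) before splitting. It would then either run mp3_conversion_command(...) with subprocess.run(..., check=True) or stop early with a clear message.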

ffmpeg_log.txt

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x55e69758ce40] Discarding ID3 tags because more suitable tags were found.
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/Podcast/Vaughn_Heppner/Star_Raider/Star_Raider.mp3':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6mp41
    creation_time   : 2023-11-20T06:27:35.000000Z
  Duration: 12:18:42.06, start: 0.000000, bitrate: 129 kb/s
  Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      creation_time   : 2023-11-20T06:27:35.000000Z
      handler_name    : ISO Media file produced by Google Inc.
      vendor_id       : [0][0][0][0]
Input #1, image2, from '/mnt/Podcast/Vaughn_Heppner/Star_Raider/star_raider.jpg':
  Duration: 00:00:00.04, start: 0.000000, bitrate: 14626 kb/s
  Stream #1:0: Video: mjpeg (Progressive), yuvj444p(pc, bt470bg/unknown/unknown), 362x342 [SAR 300:300 DAR 181:171], 25 fps, 25 tbr, 25 tbn, 25 tbc
[mp3 @ 0x55e6975e6680] Invalid audio stream. Exactly one MP3 audio stream is required.
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
Error initializing output stream 0:1 -- 
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #1:0 -> #0:1 (copy)
    Last message repeated 1 times
----------------------------------------------------

********************************************************
NEW LOG START
********************************************************

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x5641cafe2e80] Discarding ID3 tags because more suitable tags were found.
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/Podcast/Vaughn_Heppner/Star_Raider/Star_Raider.mp3':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6mp41
    creation_time   : 2023-11-20T06:27:35.000000Z
  Duration: 12:18:42.06, start: 0.000000, bitrate: 129 kb/s
  Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      creation_time   : 2023-11-20T06:27:35.000000Z
      handler_name    : ISO Media file produced by Google Inc.
      vendor_id       : [0][0][0][0]
Input #1, image2, from '/mnt/Podcast/Vaughn_Heppner/Star_Raider/star_raider.jpg':
  Duration: 00:00:00.04, start: 0.000000, bitrate: 14626 kb/s
  Stream #1:0: Video: mjpeg (Progressive), yuvj444p(pc, bt470bg/unknown/unknown), 362x342 [SAR 300:300 DAR 181:171], 25 fps, 25 tbr, 25 tbn, 25 tbc
[mp3 @ 0x5641cb03c180] Invalid audio stream. Exactly one MP3 audio stream is required.
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
Error initializing output stream 0:1 -- 
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #1:0 -> #0:1 (copy)
    Last message repeated 1 times
----------------------------------------------------

[mov,mp4,m4a,3gp,3g2,mj2 @ 0x556de9d23e80] Discarding ID3 tags because more suitable tags were found.
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/Podcast/Vaughn_Heppner/Star_Raider/Star_Raider.mp3':
  Metadata:
    major_brand     : dash
    minor_version   : 0
    compatible_brands: iso6mp41
    creation_time   : 2023-11-20T06:27:35.000000Z
  Duration: 12:18:42.06, start: 0.000000, bitrate: 129 kb/s
  Stream #0:0(und): Audio: aac (LC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 127 kb/s (default)
    Metadata:
      creation_time   : 2023-11-20T06:27:35.000000Z
      handler_name    : ISO Media file produced by Google Inc.
      vendor_id       : [0][0][0][0]
Input #1, image2, from '/mnt/Podcast/Vaughn_Heppner/Star_Raider/star_raider.jpg':
  Duration: 00:00:00.04, start: 0.000000, bitrate: 14626 kb/s
  Stream #1:0: Video: mjpeg (Progressive), yuvj444p(pc, bt470bg/unknown/unknown), 362x342 [SAR 300:300 DAR 181:171], 25 fps, 25 tbr, 25 tbn, 25 tbc
[mp3 @ 0x556de9d7d180] Invalid audio stream. Exactly one MP3 audio stream is required.
Could not write header for output file #0 (incorrect codec parameters ?): Invalid argument
Error initializing output stream 0:1 -- 
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
  Stream #1:0 -> #0:1 (copy)
    Last message repeated 1 times
----------------------------------------------------
...

Failure downloading model

I got the error:

Traceback (most recent call last):
  File "/home/savant/Projects/Chapterize-Audiobooks/chapterize_ab.py", line 1078, in <module>
    main()
  File "/home/savant/Projects/Chapterize-Audiobooks/chapterize_ab.py", line 970, in main
    audiobook_file, in_metadata, lang, model_name, model_type, cue_file = parse_args()
  File "/home/savant/Projects/Chapterize-Audiobooks/chapterize_ab.py", line 316, in parse_args
    args.audiobook.with_suffix('.cue').exists()
AttributeError: 'NoneType' object has no attribute 'with_suffix'

Command: python3.10 chapterize_ab.py -dm -l pl

Executed inside venv
OS: debian 11
Installed dependencies using pip3.10 install -r requirements.txt

pip log:

Requirement already satisfied: rich>=12.6.0 in ./lib/python3.10/site-packages (from -r requirements.txt (line 1)) (13.5.3)
Requirement already satisfied: vosk>=0.3.44 in ./lib/python3.10/site-packages (from -r requirements.txt (line 2)) (0.3.45)
Requirement already satisfied: requests>=2.28.0 in ./lib/python3.10/site-packages (from -r requirements.txt (line 3)) (2.31.0)
Requirement already satisfied: markdown-it-py>=2.2.0 in ./lib/python3.10/site-packages (from rich>=12.6.0->-r requirements.txt (line 1)) (3.0.0)
Requirement already satisfied: pygments<3.0.0,>=2.13.0 in ./lib/python3.10/site-packages (from rich>=12.6.0->-r requirements.txt (line 1)) (2.16.1)
Requirement already satisfied: cffi>=1.0 in ./lib/python3.10/site-packages (from vosk>=0.3.44->-r requirements.txt (line 2)) (1.16.0)
Requirement already satisfied: tqdm in ./lib/python3.10/site-packages (from vosk>=0.3.44->-r requirements.txt (line 2)) (4.66.1)
Requirement already satisfied: srt in ./lib/python3.10/site-packages (from vosk>=0.3.44->-r requirements.txt (line 2)) (3.5.3)
Requirement already satisfied: websockets in ./lib/python3.10/site-packages (from vosk>=0.3.44->-r requirements.txt (line 2)) (11.0.3)
Requirement already satisfied: certifi>=2017.4.17 in ./lib/python3.10/site-packages (from requests>=2.28.0->-r requirements.txt (line 3)) (2023.7.22)
Requirement already satisfied: idna<4,>=2.5 in ./lib/python3.10/site-packages (from requests>=2.28.0->-r requirements.txt (line 3)) (3.4)
Requirement already satisfied: charset-normalizer<4,>=2 in ./lib/python3.10/site-packages (from requests>=2.28.0->-r requirements.txt (line 3)) (3.2.0)
Requirement already satisfied: urllib3<3,>=1.21.1 in ./lib/python3.10/site-packages (from requests>=2.28.0->-r requirements.txt (line 3)) (2.0.5)
Requirement already satisfied: pycparser in ./lib/python3.10/site-packages (from cffi>=1.0->vosk>=0.3.44->-r requirements.txt (line 2)) (2.21)
Requirement already satisfied: mdurl~=0.1 in ./lib/python3.10/site-packages (from markdown-it-py>=2.2.0->rich>=12.6.0->-r requirements.txt (line 1)) (0.1.2)
WARNING: You are using pip version 21.2.3; however, version 23.2.1 is available.
You should consider upgrading via the '/home/savant/Projects/Chapterize-Audiobooks/bin/python3.10 -m pip install --upgrade pip' command.

Convert to m4b

Option to convert an mp3 file to m4b with embedded chapter metadata.
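One way to do this is ffmpeg's ffmetadata format, which stores chapters as [CHAPTER] blocks with START/END values in a given TIMEBASE. Here is a minimal sketch that turns (title, start-seconds) pairs, such as the split points the script already finds, into that file. The function names are hypothetical.

```python
def escape_ffmeta(value: str) -> str:
    """Backslash-escape characters that are special in ffmetadata (\\, =, ;, #, newline)."""
    for ch in ("\\", "=", ";", "#", "\n"):  # backslash first so escapes aren't doubled
        value = value.replace(ch, "\\" + ch)
    return value


def build_ffmetadata(chapters: list, total_seconds: float) -> str:
    """Hypothetical helper: build an ffmetadata chapter file from (title, start_seconds) pairs.

    Each chapter ends where the next one begins; the last ends at total_seconds.
    """
    lines = [";FFMETADATA1"]
    for i, (title, start) in enumerate(chapters):
        end = chapters[i + 1][1] if i + 1 < len(chapters) else total_seconds
        lines += [
            "",
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START/END are in milliseconds
            f"START={round(start * 1000)}",
            f"END={round(end * 1000)}",
            f"title={escape_ffmeta(title)}",
        ]
    return "\n".join(lines) + "\n"
```

The resulting file could then be muxed in with something like: ffmpeg -i book.mp3 -i chapters.txt -map 0:a -map_chapters 1 -c:a aac -f mp4 book.m4b. That command re-encodes the audio to AAC.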