
audiosep's Introduction

Separate Anything You Describe


This repository contains the official implementation of "Separate Anything You Describe".

We introduce AudioSep, a foundation model for open-domain sound separation with natural language queries. AudioSep demonstrates strong separation performance and impressive zero-shot generalization ability on numerous tasks, such as audio event separation, musical instrument separation, and speech enhancement. Check out the separated audio examples on the Demo Page!


Setup

Clone the repository and set up the conda environment:

git clone https://github.com/Audio-AGI/AudioSep.git && \
cd AudioSep && \
conda env create -f environment.yml && \
conda activate AudioSep

Download the model weights and place them under checkpoint/.

If you're using this checkpoint to participate in the DCASE 2024 Task 9 challenge, please note that it was trained on 32 kHz audio, with a window size of 2048 points and a hop size of 320 points in the STFT operation, which differs from the provided challenge baseline system (16 kHz, window size 1024, hop size 160).
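To make the mismatch concrete, the two STFT configurations can be compared side by side. The sketch below uses torch.stft with a Hann window purely for illustration; the window type is an assumption, and only the sample rates, window sizes, and hop sizes are taken from the note above.

import torch

# Dummy one-second signals; in practice these would be loaded audio clips.
waveform_32k = torch.randn(1, 32000)   # 32 kHz audio, as used by this checkpoint
waveform_16k = torch.randn(1, 16000)   # 16 kHz audio, as used by the challenge baseline

# This checkpoint: 32 kHz audio, window size 2048, hop size 320
spec_audiosep = torch.stft(
    waveform_32k, n_fft=2048, hop_length=320,
    window=torch.hann_window(2048), return_complex=True)

# Challenge baseline: 16 kHz audio, window size 1024, hop size 160
spec_baseline = torch.stft(
    waveform_16k, n_fft=1024, hop_length=160,
    window=torch.hann_window(1024), return_complex=True)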


Inference

from pipeline import build_audiosep, inference
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = build_audiosep(
      config_yaml='config/audiosep_base.yaml', 
      checkpoint_path='checkpoint/audiosep_base_4M_steps.ckpt', 
      device=device)

audio_file = 'path_to_audio_file'
text = 'textual_description'
output_file = 'separated_audio.wav'

# AudioSep processes the audio at 32 kHz sampling rate  
inference(model, audio_file, text, output_file, device)

To load directly from Hugging Face, you can do the following:

from models.audiosep import AudioSep
from utils import get_ss_model
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

ss_model = get_ss_model('config/audiosep_base.yaml')

model = AudioSep.from_pretrained("nielsr/audiosep-demo", ss_model=ss_model)

audio_file = 'path_to_audio_file'
text = 'textual_description'
output_file = 'separated_audio.wav'

# AudioSep processes the audio at 32 kHz sampling rate  
inference(model, audio_file, text, output_file, device)

Use chunk-based inference to save memory:

inference(model, audio_file, text, output_file, device, use_chunk=True)

Training

To use your own audio-text paired dataset:

  1. Format your dataset to match our JSON structure. Refer to the provided template at datafiles/template.json (an illustrative, filled-in datafile example follows the config snippet below).

  2. Update the config/audiosep_base.yaml file by listing your formatted JSON data files under datafiles. For example:

data:
    datafiles:
        - 'datafiles/your_datafile_1.json'
        - 'datafiles/your_datafile_2.json'
        ...
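For illustration, a hypothetical datafile following the template structure might look as follows; the paths and captions are placeholders, not files shipped with the repository:

{
    "data": [
        {
            "wav": "/data/audio/flute_solo_001.wav",
            "caption": "a flute playing a gentle melody"
        },
        {
            "wav": "/data/audio/dog_bark_003.wav",
            "caption": "a dog barking in the distance"
        }
    ]
}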

Train AudioSep from scratch:

python train.py --workspace workspace/AudioSep --config_yaml config/audiosep_base.yaml --resume_checkpoint_path checkpoint/ ''

Finetune AudioSep from a pretrained checkpoint:

python train.py --workspace workspace/AudioSep --config_yaml config/audiosep_base.yaml --resume_checkpoint_path path_to_checkpoint

Benchmark Evaluation

Download the evaluation data under the evaluation/data folder. The data should be organized as follows:

evaluation:
    data:
        - audioset/
        - audiocaps/
        - vggsound/
        - music/
        - clotho/
        - esc50/

Run the benchmark inference script; the results will be saved at eval_logs/:

python benchmark.py --checkpoint_path audiosep_base_4M_steps.ckpt

"""
Evaluation Results:

VGGSound Avg SDRi: 9.144, SISDR: 9.043
MUSIC Avg SDRi: 10.508, SISDR: 9.425
ESC-50 Avg SDRi: 10.040, SISDR: 8.810
AudioSet Avg SDRi: 7.739, SISDR: 6.903
AudioCaps Avg SDRi: 8.220, SISDR: 7.189
Clotho Avg SDRi: 6.850, SISDR: 5.242
"""

Cite this work

If you find this tool useful, please consider citing:

@article{liu2023separate,
  title={Separate Anything You Describe},
  author={Liu, Xubo and Kong, Qiuqiang and Zhao, Yan and Liu, Haohe and Yuan, Yi and Liu, Yuzhuo and Xia, Rui and Wang, Yuxuan and Plumbley, Mark D and Wang, Wenwu},
  journal={arXiv preprint arXiv:2308.05037},
  year={2023}
}
@inproceedings{liu22w_interspeech,
  title={Separate What You Describe: Language-Queried Audio Source Separation},
  author={Liu, Xubo and Liu, Haohe and Kong, Qiuqiang and Mei, Xinhao and Zhao, Jinzheng and Huang, Qiushi and Plumbley, Mark D and Wang, Wenwu},
  year=2022,
  booktitle={Proc. Interspeech},
  pages={1801--1805},
}

Contributors:

0armaan025, alienishi, badayvedat, bhargavshirin, chenxwh, eltociear, farookhnitap, kalyani2003, liuxubo717, nielsrogge, rs-labhub, shivam250702, shresthasurav


audiosep's Issues

Unable to load music_speech_audioset model

I tried using the Colab notebook. The first model checkpoint loads without any issue, however, the second model checkpoint leads to an error during the model initialization. Below is the snippet of the code that downloads the model checkpoints and attempts to initialize the model:

model = build_audiosep(
    config_yaml='config/audiosep_base.yaml',
    checkpoint_path=str(models[1][1]),
)

Upon executing the model initialization, a KeyError related to pytorch-lightning_version is encountered, as shown below:

KeyError: 'pytorch-lightning_version'

Additionally, a warning concerning the initialization of RobertaModel with some weights not being used is thrown, although it's unclear if this warning is related to the KeyError.

Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.dense.bias', 'lm_head.decoder.weight', 'lm_head.dense.weight', 'lm_head.layer_norm.weight', 'lm_head.bias', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

The issue seems to arise specifically with the second model checkpoint music_speech_audioset_epoch_15_esc_89.98.pt. I would appreciate any guidance or suggestions on how to resolve this KeyError and successfully load the second model checkpoint for further use.

Thank you.

What is the scope of "Anything"?

This is interesting work, and the task it aims at is as exciting to me as SAM.
But I am not familiar with audio research, and I have some questions related to this work.

First, I checked the dataset, and it does not seem very complete for "sound separation" or "separate anything in audio".
I actually tried some samples for separating vocals from songs: no matter whether I used "Human Sounds" or "Vocal" as the query, the model could not separate the vocals, even from a very slow and simple "guitar playing and singing" sample. Conversely, when I queried "acoustic guitar", the output clearly contained some vocals.
Am I misunderstanding the scope, i.e. do "songs" fall outside the music domain and the scope of this work?

Second, I would like to ask why this is called a foundation model. It seems like "multimodal or multiple input types = foundation model", and I do not know what it provides for "downstream tasks". Can someone offer some insight?

multi-gpu support

Thank you for sharing this project. I am wondering if there is, or will be, support for multi-GPU inference. Currently I am unable to run inference on a 3090:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 5.50 GiB (GPU 0; 23.69 GiB total capacity; 15.94 GiB already allocated; 2.85 GiB free; 19.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
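For single-GPU memory pressure like this, the chunk-based inference option documented in the README above may help; a minimal sketch, assuming the model is built as in the README inference example:

from pipeline import build_audiosep, inference
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = build_audiosep(
    config_yaml='config/audiosep_base.yaml',
    checkpoint_path='checkpoint/audiosep_base_4M_steps.ckpt',
    device=device)

# use_chunk=True runs chunk-based inference, which is intended to reduce peak memory
inference(model, 'path_to_audio_file', 'textual_description',
          'separated_audio.wav', device, use_chunk=True)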

how to construct training sample pairs?

Nice work!

How do you construct the training pairs? It looks like you construct the training pairs in the SegmentMixer class. Do you use the same minibatch sources to construct the "mixture" and "target audio" pairs?

There might be one issue:

  • source 1: male speech, s1
  • source 2: another male speech, s2

If the mixture = s1 + s2 and both captions are "male speech", will this confuse the model during training?
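For context, an in-batch mixing scheme of the kind described here can be sketched roughly as follows. This is an illustration only, not the repository's SegmentMixer implementation; the function and argument names are hypothetical.

import torch

def mix_within_batch(sources: torch.Tensor, gain_db: float = 0.0):
    """Illustrative in-batch mixing: each clip serves as the target, and a
    rolled copy of the batch provides the interfering source."""
    # sources: (batch, samples) single-source waveforms from the dataset
    interference = torch.roll(sources, shifts=1, dims=0)  # pair each clip with its neighbour
    gain = 10.0 ** (gain_db / 20.0)                       # simple linear gain on the interference
    mixtures = sources + gain * interference
    return mixtures, sources                              # (mixture, target) training pairs

Under such a scheme, two clips that both carry the caption "male speech" can indeed end up in the same mixture, which is exactly the ambiguity the question above raises.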

Is something broken? This is really bad.

I tried it locally and didn't get much, then tried the HF Space, and the model didn't really separate anything. Was it overfit on the demo data? There is barely any difference between input and output on a random song from my library with organ, drums, and vocals.

How to do speech separation

Hi,
I have an audio clip in which two people are talking and there is noise at the same time. How can I separate the audio of the two speakers? Thanks!
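One starting point is to run the README inference with speaker-describing queries. The file names and query strings below are placeholders, and separating two same-class sources (two speakers) with a text query may give limited results:

from pipeline import build_audiosep, inference
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = build_audiosep(
    config_yaml='config/audiosep_base.yaml',
    checkpoint_path='checkpoint/audiosep_base_4M_steps.ckpt',
    device=device)

# Hypothetical queries describing each speaker; any natural-language description can be used
inference(model, 'two_speakers_with_noise.wav', 'a man speaking', 'speaker1.wav', device)
inference(model, 'two_speakers_with_noise.wav', 'a woman speaking', 'speaker2.wav', device)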

Conda failed to create environment

Machine: ASUS TUF Dash F15 FX517ZE_FX517ZE
OS: Windows 11 Education 23H2, 64 bit
Version: 10.0.22631 Build 22631
GPU: Nvidia RTX 3050

D:\AudioSep>conda env create -f environment.yml
Retrieving notices: ...working... done
Collecting package metadata (repodata.json): done
Solving environment: failed
Error:
ResolvePackageNotFound:

  • urllib3==1.26.14=py310h06a4308_0
  • jupyter_core==5.3.0=py310h06a4308_0
  • libcusolver==11.3.4.124=h33c3c4e_0
  • cuda-nvrtc==11.6.124=h020bade_0
  • ld_impl_linux-64==2.38=h1181459_1
  • libdeflate==1.17=h5eee18b_0
  • tornado==6.2=py310h5eee18b_0
  • cuda-nvml-dev==11.6.55=haa9ef22_0
  • gmp==6.2.1=h295c915_3
  • openh264==2.1.1=h4ff587b_0
  • pytorch==1.13.1=py3.10_cuda11.6_cudnn8.3.2_0
  • cuda-cuxxfilt==11.6.124=hecbf4f6_0
  • libcusparse-dev==11.7.2.124=hbbe9722_0
  • libcufft-dev==10.7.1.112=ha5ce4c0_0
  • zstd==1.5.4=hc292b87_0
  • freetype==2.12.1=h4a9f257_0
  • gnutls==3.6.15=he1e5248_0
  • tk==8.6.12=h1ccaba5_0
  • lz4-c==1.9.4=h6a678d5_0
  • cuda-nsight==12.1.55=0
  • libcusparse==11.7.2.124=h7538f96_0
  • cuda-nvrtc-dev==11.6.124=h249d397_0
  • brotlipy==0.7.0=py310h7f8727e_1002
  • psutil==5.9.0=py310h5eee18b_0
  • ruamel.yaml.clib==0.2.6=py310h5eee18b_1
  • _openmp_mutex==5.1=1_gnu
  • cryptography==38.0.4=py310h9ce1e76_0
  • libcublas-dev==11.9.2.110=h5c901ab_0
  • libcufile-dev==1.6.0.25=0
  • cuda-nvprune==11.6.124=he22ec0a_0
  • ca-certificates==2023.01.10=h06a4308_0
  • libiconv==1.16=h7f8727e_2
  • libcublas==11.9.2.110=h5e84587_0
  • libwebp-base==1.2.4=h5eee18b_1
  • tqdm==4.64.1=py310h06a4308_0
  • libsodium==1.0.18=h7b6447c_0
  • libunistring==0.9.10=h27cfd23_0
  • gds-tools==1.6.0.25=0
  • libtiff==4.5.0=h6a678d5_2
  • readline==8.2=h5eee18b_0
  • cuda-gdb==12.1.55=0
  • libidn2==2.3.2=h7f8727e_0
  • libtasn1==4.19.0=h5eee18b_0
  • lame==3.100=h7b6447c_0
  • conda-content-trust==0.1.3=py310h06a4308_0
  • bzip2==1.0.8=h7b6447c_0
  • setuptools==65.6.3=py310h06a4308_0
  • libffi==3.4.2=h6a678d5_6
  • requests==2.28.1=py310h06a4308_0
  • libnvjpeg-dev==11.6.2.124=hb5906b9_0
  • nest-asyncio==1.5.6=py310h06a4308_0
  • conda-package-handling==2.0.2=py310h06a4308_0
  • debugpy==1.5.1=py310h295c915_0
  • sqlite==3.40.1=h5082296_0
  • mkl-service==2.4.0=py310h7f8727e_0
  • numpy-base==1.23.5=py310h8e6c178_0
  • conda==23.3.1=py310h06a4308_0
  • libgcc-ng==11.2.0=h1234567_1
  • pip==22.3.1=py310h06a4308_0
  • intel-openmp==2021.4.0=h06a4308_3561
  • libnpp-dev==11.6.3.124=h3c42840_0
  • boltons==23.0.0=py310h06a4308_0
  • pycosat==0.6.4=py310h5eee18b_0
  • pyzmq==23.2.0=py310h6a678d5_0
  • cuda-nvcc==11.6.124=hbba6d2d_0
  • ipython==8.12.0=py310h06a4308_0
  • ncurses==6.4=h6a678d5_0
  • nettle==3.7.3=hbbd107a_1
  • libwebp==1.2.4=h11a3e52_1
  • ruamel.yaml==0.17.21=py310h5eee18b_0
  • zstandard==0.18.0=py310h5eee18b_0
  • cffi==1.15.1=py310h5eee18b_3
  • jpeg==9e=h5eee18b_1
  • xz==5.2.10=h5eee18b_1
  • libuuid==1.41.5=h5eee18b_0
  • certifi==2022.12.7=py310h06a4308_0
  • mkl_random==1.2.2=py310h00e6091_0
  • flit-core==3.8.0=py310h06a4308_0
  • libcufile==1.6.0.25=0
  • libgomp==11.2.0=h1234567_1
  • giflib==5.2.1=h5eee18b_3
  • libpng==1.6.39=h5eee18b_0
  • lerc==3.0=h295c915_0
  • typing_extensions==4.4.0=py310h06a4308_0
  • cuda-cupti==11.6.124=h86345e5_0
  • idna==3.4=py310h06a4308_0
  • libstdcxx-ng==11.2.0=h1234567_1
  • platformdirs==2.5.2=py310h06a4308_0
  • ipykernel==6.19.2=py310h2f386ee_0
  • matplotlib-inline==0.1.6=py310h06a4308_0
  • pluggy==1.0.0=py310h06a4308_1
  • zlib==1.2.13=h5eee18b_0
  • lcms2==2.12=h3be6417_0
  • numpy==1.23.5=py310hd5efca6_0
  • cuda-cccl==11.6.55=hf6102b2_0
  • mkl_fft==1.3.1=py310hd6ae3a3_0
  • libnpp==11.6.3.124=hd2722f0_0
  • libcufft==10.7.1.112=hf425ae0_0
  • comm==0.1.2=py310h06a4308_0
  • packaging==23.0=py310h06a4308_0
  • pysocks==1.7.1=py310h06a4308_0
  • cuda-driver-dev==11.6.55=0
  • cuda-nvtx==11.6.124=h0630a44_0
  • cuda-cudart==11.6.55=he381448_0
  • libnvjpeg==11.6.2.124=hd473ad6_0
  • conda-package-streaming==0.7.0=py310h06a4308_0
  • mkl==2021.4.0=h06a4308_640
  • openssl==1.1.1t=h7f8727e_0
  • cuda-cuobjdump==11.6.124=h2eeebcb_0
  • jupyter_client==8.1.0=py310h06a4308_0
  • cuda-cudart-dev==11.6.55=h42ad0f4_0
  • python==3.10.9=h7a1cb2a_0
  • cuda-samples==11.6.101=h8efea70_0
  • zeromq==4.3.4=h2531618_0
  • toolz==0.12.0=py310h06a4308_0

Feature: Adding contributors section to the README.md file.

There is no Contributors section in the README file.
As we know, contributions are what make the open-source community such an amazing place to learn, inspire, and create.
The Contributors section in a README.md file is important as it acknowledges and gives credit to those who have contributed to a project, fosters community and collaboration, adds transparency and accountability, and helps document the project's history for current and future maintainers. It also serves as a form of recognition, motivating contributors to continue their efforts.

Query About Template.json

{
    "data": [
        {
            "wav": "path_to_audio_file",
            "caption": "textual_descriptions"
        }
    ]
}

Do we put mixtures in here, or individual items like flute audio?
Can you provide an example with actual values filled into the JSON object? @liuxubo717

Error when using music_speech..._89.98.pt: pytorch-lightning_version

From your paper, I wasn't sure of the role/purpose of music_speech_audioset_epoch_15_esc_89.98.pt.

Are these the saved model weights one should use if one wants to focus on separation of musical instruments from one another, say? Or is audiosep_base_4M_steps.ckpt still applicable in such use cases?

When I edited your example inference code from the readme to use music_speech_audioset_epoch_15_esc_89.98.pt on a Linux machine running Ubuntu, I got the following error.

Please clarify the purpose/use of this checkpoint, and if it is meant to be used, whether I need to modify the example inference code further.

Thanks!

Traceback (most recent call last):
  File "/home/blah/repos/AudioSep/sayd_infer_example.py", line 6, in <module>
    model = build_audiosep(
  File "/home/blah/repos/AudioSep/pipeline.py", line 17, in build_audiosep
    model = load_ss_model(
  File "/home/blah/repos/AudioSep/utils.py", line 387, in load_ss_model
    pl_model = AudioSep.load_from_checkpoint(
  File "/home/blah/anaconda3/envs/AudioSep/lib/python3.10/site-packages/lightning/pytorch/core/module.py", line 1532, in load_from_checkpoint
    loaded = _load_from_checkpoint(
  File "/home/blah/anaconda3/envs/AudioSep/lib/python3.10/site-packages/lightning/pytorch/core/saving.py", line 65, in _load_from_checkpoint
    checkpoint = _pl_migrate_checkpoint(
  File "/home/blah/anaconda3/envs/AudioSep/lib/python3.10/site-packages/lightning/pytorch/utilities/migration/utils.py", line 113, in _pl_migrate_checkpoint
    old_version = _get_version(checkpoint)
  File "/home/blah/anaconda3/envs/AudioSep/lib/python3.10/site-packages/lightning/pytorch/utilities/migration/utils.py", line 136, in _get_version
    return checkpoint["pytorch-lightning_version"]
KeyError: 'pytorch-lightning_version'

Adding Contributors section to the readme.md

Why a Contributors section: A "Contributors" section in a repo gives credit to and acknowledges the people who have helped with the project, fosters a sense of community, and helps others know who to contact for questions or issues related to the project.

Issue type

  • [✅] Docs

Demo Image:

@liuxubo717 kindly assign this issue to me! I would love to work on it! Thank you!

ImportError: cannot import name 'inference' from 'pipeline'

This is probably just me being really bad at coding; I'm trying to run the example inference code in the README and am getting this error:
ImportError: cannot import name 'inference' from 'pipeline' (/home/jordancruz/Tools/AudioSep/pipeline.py)

Am I doing something wrong?

Adding code-of-conduct & Contributors.md File to the repo!

Code of Conduct: We propose adding a comprehensive Code of Conduct to our repository to ensure
a safe, respectful, and inclusive environment for all contributors and users. This code will
serve as a guideline for behavior, promoting diversity, reducing conflicts, and attracting a
wider range of perspectives.

Contributing.md: A "Contributing.md" file is added to a repository to provide guidelines and
instructions for potential contributors on how to collaborate effectively with the project.
It typically includes information on coding standards, how to submit changes, reporting issues,
and other important details to streamline the contribution process and maintain a healthy
open-source community.

Issue type

  • [✅] Docs

@liuxubo717 kindly assign this issue to me! I would love to work on it! Thank you!

Solving environment: failed

I get stuck here when trying the setup instructions:

(base) PS C:\Users\flush\AaudioSep> conda env create -f environment.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:

  • libnpp-dev==11.6.3.124=h3c42840_0
  • cryptography==38.0.4=py310h9ce1e76_0
  • conda==23.3.1=py310h06a4308_0
  • conda-package-streaming==0.7.0=py310h06a4308_0
  • cuda-nvrtc-dev==11.6.124=h249d397_0
  • lcms2==2.12=h3be6417_0
  • ---and so on...

RuntimeError: could not create a primitive

I wanted to share the following error I got after trying to run the inference script from the README, updating the query, input, and output file.

$ python3 inference_script.py 
/$USER/miniconda3/envs/AudioSep/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1670525552843/work/aten/src/ATen/native/TensorShape.cpp:3190.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.layer_norm.bias', 'lm_head.decoder.weight', 'lm_head.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of the model checkpoint at roberta-base were not used when initializing RobertaModel: ['lm_head.layer_norm.bias', 'lm_head.decoder.weight', 'lm_head.bias', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.dense.weight']
- This IS expected if you are initializing RobertaModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Load AudioSep model from [checkpoint/audiosep_base_4M_steps.ckpt]
Separate audio from [/my/file/path/file.wav] with textual query [my_textual_query_to_separate]
Traceback (most recent call last):
  File "/file/to/local/audio-agi/AudioSep/inference_script.py", line 16, in <module>
    inference(model, audio_file, text, output_file, device)
  File "/file/to/local/audio-agi/AudioSep/pipeline.py", line 47, in inference
    sep_segment = model.ss_model(input_dict)["waveform"]
  File "/$USER/miniconda3/envs/AudioSep/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/path/to/local/AudioSep/models/resunet.py", line 648, in forward
    output_dict = self.base(
  File "/$USER/miniconda3/envs/AudioSep/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/file/to/local/AudioSep/models/resunet.py", line 555, in forward
    x = self.pre_conv(x)
  File "/$USER/miniconda3/envs/AudioSep/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/$USER/miniconda3/envs/AudioSep/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/$USER/miniconda3/envs/AudioSep/lib/python3.10/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: could not create a primitive

This error occurred on the latest commit: 2150ca8.
In the last snippet I changed the paths for readability.

On another note, I previously had this issue:

File "/$USER/miniconda3/lib/python3.11/ctypes/__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: libcudart.so.12: cannot open shared object file: No such file or directory

That got solved by adding the following to my bashrc:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
