Implementation of the M-SpeechCLIP model, introduced in the paper "M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval" (https://arxiv.org/abs/2211.01180)
Hello, thank you very much for open-sourcing the code for M-SpeechCLIP. I have been trying to replicate the paper's results starting from the original SpeechCLIP implementation, so this will be an immense help in my research.
I wanted to ask: are there any plans to share the pre-trained checkpoints as well? If that is not possible, I will train from scratch following the guidelines. To clarify, the checkpoints would be used for non-commercial research purposes only.
Please let me know, and again thank you so much!