I'm able to use example.py for inference with the base parallel flickr model, but I get the following error when I use any of the cascaded models instead, i.e. model_fp = "slt_ckpts/SpeechCLIP/base/flickr/cascaded/epoch_58-step_6902-val_recall_mean_1_7.7700.ckpt" or model_fp = "slt_ckpts/SpeechCLIP/large/flickr/cascaded/epoch_187-step_21995-val_recall_mean_10_62.7700.ckpt" or model_fp = "slt_ckpts/SpeechCLIP/large/coco/cascaded/epoch_12-step_28794-val_recall_mean_10_36.1455.ckpt":
Traceback (most recent call last):
File "/work/07469/lpugalen/ls6/SpeechCLIP/example.py", line 61, in <module>
speechFeatVector_baseFlickrCascasdedModel= baseFlickrCascasdedModel.encode_speech(wav=wav_data)#["cascaded_audio_feat"]
File "/work/07469/lpugalen/ls6/SpeechCLIP/avssl/model/kwClip.py", line 1340, in encode_speech
cascaded_audio_feat, vq_results, keywords = self.cascaded_branch(
File "/work/07469/lpugalen/ls6/SpeechCLIP/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/work/07469/lpugalen/ls6/SpeechCLIP/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/work/07469/lpugalen/ls6/SpeechCLIP/avssl/model/kwClip.py", line 914, in forward
audio_feat = self.clip.encode_keywords(keywords, self.keyword_num)
File "/work/07469/lpugalen/ls6/SpeechCLIP/avssl/module/clip_official.py", line 249, in encode_keywords
x = self.model.token_embedding(text)
File "/work/07469/lpugalen/ls6/SpeechCLIP/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/work/07469/lpugalen/ls6/SpeechCLIP/torch/nn/modules/module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "/work/07469/lpugalen/ls6/SpeechCLIP/torch/nn/modules/sparse.py", line 163, in forward
return F.embedding(
File "/work/07469/lpugalen/ls6/SpeechCLIP/torch/nn/functional.py", line 2237, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
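
For context, the error is raised inside the embedding lookup (self.model.token_embedding(text)), which suggests the index tensor and the embedding weights sit on different devices. Below is a minimal, standalone sketch of that situation in plain PyTorch, not SpeechCLIP code: the helper names embed_on_weight_device and devices_in are hypothetical, and it is only an assumption that the keyword indices are the tensor left on the CPU (the message does not say which side is which).

```python
import torch
import torch.nn as nn


def embed_on_weight_device(embedding: nn.Embedding, indices: torch.Tensor) -> torch.Tensor:
    """Look up `indices` after moving them to the device that holds the embedding weights."""
    # Hypothetical helper for illustration; not part of SpeechCLIP or PyTorch.
    return embedding(indices.to(embedding.weight.device))


def devices_in(module: nn.Module) -> set:
    """Collect the devices of all parameters and buffers, to spot a partially moved model."""
    # Hypothetical helper for illustration; not part of SpeechCLIP or PyTorch.
    tensors = list(module.parameters()) + list(module.buffers())
    return {str(t.device) for t in tensors}


# Use the GPU when one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the token embedding in the traceback; sizes are arbitrary.
token_embedding = nn.Embedding(num_embeddings=16, embedding_dim=4).to(device)

# Index tensor created without device=, so it lives on the CPU (assumed scenario).
keywords = torch.tensor([[1, 2, 3]])

out = embed_on_weight_device(token_embedding, keywords)
print(devices_in(token_embedding), out.shape)
```

Running devices_in on the loaded cascaded model would show whether only part of it was moved to cuda:0, which is one way to narrow down where the mismatch comes from.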