Comments (9)
The `audio` argument exists only in NeMo 2.0 (the current main branch), and only that version accepts tensors.
The older NeMo 1.23 release supports only `paths2audio_files` and does not accept tensors.
Okay, even after installing from source, it still can't transcribe the whole tensor as it used to earlier; it's still only transcribing the first 100000 samples of the waveform.
What's the error trace? Or does it just finish after 100K samples? By the way, if you transcribe that much data, you risk running out of CPU RAM (OOM). If OOM is what you're facing, you might want to try the new `transcribe_generator()` instead.
This is my code:
```python
import nemo.collections.asr as nemo_asr
import torch
import torchaudio

asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(
    model_name="stt_hi_conformer_ctc_medium", map_location=torch.device("cuda:0")
)

aud, sr = torchaudio.load("1.wav")  # Audio is 35 seconds long
aud = torchaudio.functional.resample(aud, sr, 16000)  # Resample to 16 kHz
aud = aud.mean(dim=0)  # Downmix to mono (1-D tensor)

# Try 1
out = asr_model.transcribe(audio=aud)
print(out[0])
# Output: 'फाइनेैंस मैनेजमेंट का सबसे बेसिक कॉन्सेप्ट है इन्वेस्टि को समझना और सी तरीके से इम्प्लीमेंट करना मतलब है अपने इ प्ान बना कै पे इसतेमाल किए जाएंगे'
# Expected:
# 'फाइनेंस मैनेजमेंट का सबसे बेसिक कॉन्सेप्ट है बजटिंग सेविंग और इन्वेस्टिंग को समझना और सही तरीके से इम्प्लीमेंट करना बजटिंग का मतलब है अपने इनकम और एक्सपेंसेस को ट्रैक करना और एक प्लान बनाना कि कैसे पैसे इस्तेमाल किए जाएंगे सेविंग में हम पैसों का एक हिस्सा अलग करके फ्यूचर के लिए रखते हैं और इन्वेस्टिंग में हम पैसों को ग्रोथ के लिए अलग अलग तरीकों से इस्तेमाल करते हैं जैसे स्टॉक्स बॉन्ड्स या रियल एस्टेट में इन्वेस्ट करके जब हम इन तीनों को सही तरीके से मैनेज करते हैं तब हम अपनी फाइनेंशियल स्टेबिलिटी को इम्प्रूव कर सकते हैं'

# Try 2
config = nemo_asr.parts.mixins.transcription.TranscribeConfig(batch_size=1)
gen_out = asr_model.transcribe_generator(aud, config)
print(next(gen_out))
# Output: ['फाइनेैंस मैनेजमेंट का सबसे बेसिक कॉन्सेप्ट है इन्वेस्टि को समझना और सी तरीके से इम्प्लीमेंट करना मतलब है अपने इ प्ान बना कै पे इसतेमाल किए जाएंगे']
```
@titu1994 As you can see, in both outputs it's not transcribing the whole audio.
I see now. In both cases, a dummy data loader is used whose duration is set to 100000. This doesn't matter, because the model computes the actual duration on the fly, so ignore the 100000.
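One quick sanity check on the 100000 figure is to compute the waveform's own length before calling `transcribe`. The following is a minimal sketch, assuming a mono waveform such as the `aud` tensor above; the helper name `waveform_duration_seconds` is hypothetical and not part of NeMo.

```python
def waveform_duration_seconds(samples, sample_rate: int = 16_000) -> float:
    """Hypothetical helper: duration in seconds of a waveform.

    Accepts a list, NumPy array, or torch tensor. If the input has a .shape,
    the last axis is treated as time, so a channels-first (C, T) tensor works too.
    """
    num_samples = samples.shape[-1] if hasattr(samples, "shape") else len(samples)
    return num_samples / sample_rate
```

For the 35-second clip in this thread, the result should be about 35.0, i.e. roughly 560000 samples at 16 kHz. If 100000 were a sample cutoff, it would cover only the first 6.25 s.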
Have you listened to the audio file yourself? For a 35-second file, that much expected text suggests it may be spoken far too fast, or the resampling may be introducing a bug that prevents the model from predicting properly. Write the file to disk after resampling and listen to the whole thing to check for issues.
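To write the resampled audio to disk without extra dependencies, here is a minimal sketch using only Python's standard `wave` and `struct` modules. It assumes float samples in [-1.0, 1.0] (the range `torchaudio.load` returns by default) and writes 16-bit PCM mono; the helper name `write_mono_wav` and any output filename are hypothetical. Since torchaudio is already in use, `torchaudio.save(path, aud.unsqueeze(0), 16000)` should do the same job.

```python
import struct
import wave


def write_mono_wav(path: str, samples, sample_rate: int = 16_000) -> None:
    """Hypothetical helper: write float samples in [-1.0, 1.0] as 16-bit PCM mono WAV."""
    # Clip to [-1.0, 1.0] so out-of-range values cannot overflow a 16-bit integer.
    ints = [int(round(max(-1.0, min(1.0, float(s))) * 32767)) for s in samples]
    # WAV PCM data is little-endian, so pack explicitly with "<".
    frames = struct.pack(f"<{len(ints)}h", *ints)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(frames)
```

Usage would be something like `write_mono_wav("1_resampled.wav", aud.tolist())`, after which the file can be played back in any audio player.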
Yes, the audio is continuous speech at a normal rate. I have resampled it to 16 kHz mono using ffmpeg.
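To confirm that the ffmpeg output really is 16 kHz mono and has the expected length, its WAV header can be read back. This is a minimal sketch using Python's standard `wave` module, which reads only PCM WAV files; it assumes ffmpeg wrote its default 16-bit PCM for `.wav` output. The helper name `describe_wav` is hypothetical.

```python
import wave


def describe_wav(path: str) -> dict:
    """Hypothetical helper: channel count, sample rate, sample width, and duration of a PCM WAV."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wf.getsampwidth(),
            "duration_s": frames / rate,
        }
```

For the file in this thread, one would expect `channels == 1`, `sample_rate == 16000`, and `duration_s` close to 35.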
@titu1994 Can you tell me how to use the model to transcribe audio of at least 48 seconds, given that it is 16 kHz mono?
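This question was not answered before the issue was closed. One common workaround, offered here as an assumption rather than the maintainers' recommendation, is to split a long waveform into shorter chunks and transcribe each one separately. This is a minimal sketch; the helper name `chunk_waveform` and the 20-second default are hypothetical choices.

```python
def chunk_waveform(samples, sample_rate=16_000, chunk_seconds=20.0, overlap_seconds=0.0):
    """Hypothetical helper: split a 1-D waveform into chunks of at most chunk_seconds.

    Works with anything supporting len() and slicing (lists, NumPy arrays,
    1-D torch tensors). A non-zero overlap repeats audio at chunk boundaries,
    which can help with words cut at a boundary but may duplicate them in the
    joined transcript.
    """
    chunk_len = int(chunk_seconds * sample_rate)
    overlap_len = int(overlap_seconds * sample_rate)
    if chunk_len <= 0:
        raise ValueError("chunk_seconds must be positive")
    if not 0 <= overlap_len < chunk_len:
        raise ValueError("overlap_seconds must be >= 0 and shorter than chunk_seconds")

    step = chunk_len - overlap_len
    chunks = []
    start = 0
    total = len(samples)
    while start < total:
        chunks.append(samples[start:start + chunk_len])
        if start + chunk_len >= total:
            break  # this chunk already reaches the end of the waveform
        start += step
    return chunks
```

A 48-second clip at 16 kHz (768000 samples) would yield chunks of 20 s, 20 s, and 8 s. With the model loaded as in the code earlier in the thread, something like `asr_model.transcribe(audio=chunk_waveform(aud))` could then transcribe each chunk, assuming NeMo 2.0's `transcribe` accepts a list of tensors (the thread only confirms a single tensor). The per-chunk texts would then be joined in order.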
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.