Comments (7)
"audio" is an argument only in NeMo 2.0 (the current main branch), and only that version accepts tensors.
The older NeMo 1.23 release supports only paths2audio_files and does not accept tensors.
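Because the keyword differs between versions, one way to avoid guessing is to inspect the installed model's transcribe() signature. This is a minimal sketch, not part of NeMo: `transcribe_input_kwarg` is a hypothetical helper, and it assumes the 1.x keyword is spelled `paths2audio_files` and the 2.0 keyword is `audio`.

```python
import inspect
from typing import Callable


def transcribe_input_kwarg(transcribe_fn: Callable) -> str:
    """Return the keyword a NeMo-style transcribe() uses for its audio input.

    Hypothetical helper; assumes 'audio' (NeMo 2.0) or 'paths2audio_files' (NeMo 1.x).
    """
    params = inspect.signature(transcribe_fn).parameters
    if "audio" in params:
        return "audio"  # NeMo 2.0 style: file paths or tensors
    if "paths2audio_files" in params:
        return "paths2audio_files"  # NeMo 1.x style: file paths only
    raise TypeError("transcribe() has neither 'audio' nor 'paths2audio_files'")
```

Usage would be `transcribe_input_kwarg(asr_model.transcribe)`. If it returns "paths2audio_files", a tensor would first have to be written to a file and its path passed in a list.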
Okay, even after installing from source, it is not able to transcribe the whole tensor as it did earlier. It still transcribes only the first 100000 samples of the waveform.
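For scale, converting between sample counts and seconds at 16 kHz (the rate used in this thread) shows that 100000 samples is only 6.25 s of audio, while a 35 s clip is 560000 samples. The helper names below are illustrative only.

```python
SAMPLE_RATE = 16_000  # 16 kHz mono, as used in this thread


def samples_to_seconds(num_samples: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Duration in seconds covered by num_samples at sample_rate."""
    return num_samples / sample_rate


def seconds_to_samples(seconds: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of samples needed to hold `seconds` of audio at sample_rate."""
    return round(seconds * sample_rate)


print(samples_to_seconds(100_000))  # 6.25
print(seconds_to_samples(35))       # 560000
```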
What's the error trace? Or did it just stop after 100K samples? By the way, if you transcribe that much data, you run the risk of running out of CPU RAM (OOM). If OOM is what you're facing, you might want to try the new transcribe_generator() instead.
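The idea behind a generator-style API can be sketched generically: results are produced one batch at a time instead of being collected into one large list. This is not NeMo's implementation; `transcribe_lazily` and `transcribe_batch` are hypothetical names. Note that a single `next()` call yields only the first batch, so if transcribe_generator() follows this pattern, printing `next(gen_out)` shows just the first batch of results.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def transcribe_lazily(
    items: Sequence[T],
    batch_size: int,
    transcribe_batch: Callable[[Sequence[T]], List[R]],
) -> Iterator[List[R]]:
    """Yield one batch of results at a time instead of building one big list.

    This function only holds the current batch's results; what the caller
    keeps around is up to the caller.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield transcribe_batch(items[start:start + batch_size])


# Usage with a stand-in "model" that upper-cases its inputs:
gen = transcribe_lazily(["a", "b", "c"], batch_size=2,
                        transcribe_batch=lambda b: [s.upper() for s in b])
print(next(gen))  # ['A', 'B'] (next() returns only the first batch)
for batch in gen:
    print(batch)  # ['C']
```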
This is my code:
import nemo.collections.asr as nemo_asr
import torch
import torchaudio
asr_model = nemo_asr.models.EncDecCTCModelBPE.from_pretrained(model_name="stt_hi_conformer_ctc_medium", map_location=torch.device('cuda:0'))
aud, sr = torchaudio.load("1.wav")  # the audio is 35 seconds long; shape (channels, samples)
aud = torchaudio.functional.resample(aud, sr, 16000)  # resample to 16 kHz
aud = aud.mean(dim=0)  # downmix to mono: (channels, samples) -> (samples,)
# Try 1
out = asr_model.transcribe(audio=aud)
print(out[0])
## output: 'फाइनेैंस मैनेजमेंट का सबसे बेसिक कॉन्सेप्ट है इन्वेस्टि को समझना और सी तरीके से इम्प्लीमेंट करना मतलब है अपने इ प्ान बना कै पे इसतेमाल किए जाएंगे'
### Expected:
## 'फाइनेंस मैनेजमेंट का सबसे बेसिक कॉन्सेप्ट है बजटिंग सेविंग और इन्वेस्टिंग को समझना और सही तरीके से इम्प्लीमेंट करना बजटिंग का मतलब है अपने इनकम और एक्सपेंसेस को ट्रैक करना और एक प्लान बनाना कि कैसे पैसे इस्तेमाल किए जाएंगे सेविंग में हम पैसों का एक हिस्सा अलग करके फ्यूचर के लिए रखते हैं और इन्वेस्टिंग में हम पैसों को ग्रोथ के लिए अलग अलग तरीकों से इस्तेमाल करते हैं जैसे स्टॉक्स बॉन्ड्स या रियल एस्टेट में इन्वेस्ट करके जब हम इन तीनों को सही तरीके से मैनेज करते हैं तब हम अपनी फाइनेंशियल स्टेबिलिटी को इम्प्रूव कर सकते हैं'
# Try 2
config = nemo_asr.parts.mixins.transcription.TranscribeConfig(batch_size=1)
gen_out = asr_model.transcribe_generator(aud, config)
print(next(gen_out))
# output: ['फाइनेैंस मैनेजमेंट का सबसे बेसिक कॉन्सेप्ट है इन्वेस्टि को समझना और सी तरीके से इम्प्लीमेंट करना मतलब है अपने इ प्ान बना कै पे इसतेमाल किए जाएंगे']
@titu1994 As we can clearly see, in both outputs it is not transcribing the whole audio.
I see now. In both cases, a dummy data loader is used whose duration is set to 100000. That value doesn't matter, because the model computes the actual duration on the fly, so ignore the 100000.
Have you listened to the audio file yourself? That is a lot of expected text for a 35-second file: it is possibly spoken far too fast, or the resampling is introducing a bug that keeps the model from predicting properly. Write the file to disk after resampling and listen to all of it to check for issues.
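Both checks suggested above can be sketched in a few lines: the speaking rate implied by the expected transcript (words per second over the clip's duration), and saving the resampled waveform so it can be checked by ear. The helper names are hypothetical, and the save step assumes torchaudio's `save(path, tensor, sample_rate)`, which by default expects a (channels, samples) tensor. For comparison, conversational speech is often quoted at roughly 2 to 3 words per second.

```python
def words_per_second(transcript: str, duration_s: float) -> float:
    """Speaking rate implied by a reference transcript (whitespace-split words)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(transcript.split()) / duration_s


def save_for_listening(wave, sample_rate: int, path: str) -> None:
    """Write a resampled torch waveform to disk so it can be checked by ear."""
    import torchaudio  # imported here so words_per_second works without torchaudio

    # torchaudio.save expects (channels, samples); give a mono 1-D tensor a channel axis.
    if wave.dim() == 1:
        wave = wave.unsqueeze(0)
    torchaudio.save(path, wave, sample_rate)
```

With the code above, this would be `words_per_second(expected_text, 35.0)` on the expected Hindi transcript and `save_for_listening(aud, 16000, "1_resampled.wav")` after the resample and downmix steps (the output filename is just an example).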
Yes, the audio contains continuous speech at a normal rate. I have resampled it to 16000 Hz mono using ffmpeg.
@titu1994 Can you tell me how I can use the model to transcribe audio that is at least 48 seconds long, given a 16 kHz mono recording?
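One common workaround for long recordings is to split the waveform into fixed-length chunks and transcribe each chunk separately. The sketch below only does the splitting; `split_into_chunks` is a hypothetical helper, and the 20 s chunk length is an arbitrary assumption, not a NeMo recommendation. It works on any sliceable 1-D sequence, including a 1-D torch tensor.

```python
from typing import List, Sequence


def split_into_chunks(
    wave: Sequence,
    sample_rate: int = 16_000,
    chunk_seconds: float = 20.0,
    overlap_seconds: float = 0.0,
) -> List[Sequence]:
    """Split a 1-D waveform into fixed-length, optionally overlapping chunks.

    Works on anything sliceable with len(): a list, a NumPy array, or a 1-D torch tensor.
    """
    chunk = int(chunk_seconds * sample_rate)
    step = chunk - int(overlap_seconds * sample_rate)
    if chunk <= 0 or step <= 0:
        raise ValueError("need chunk_seconds > overlap_seconds >= 0")
    chunks = []
    for start in range(0, len(wave), step):
        chunks.append(wave[start:start + chunk])
        if start + chunk >= len(wave):
            break  # this chunk already reaches the end of the waveform
    return chunks
```

Assuming transcribe(audio=...) accepts a single 1-D tensor as in Try 1, the pieces could be transcribed with `[asr_model.transcribe(audio=c)[0] for c in split_into_chunks(aud)]` and joined with spaces. Words that straddle a chunk boundary may be cut; an overlap reduces this but then requires removing duplicated words, which this sketch does not do.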