
speech-to-text's People

Contributors

akras14


speech-to-text's Issues

[HELP], Want to recognize the voice

I have used the Google Cloud Speech-to-Text API, which works well, but I need to show the speaker just above each line. Suppose I have an audio file with 4 people in it. I want to get each person's label just before his or her text, like this:
Person1:
Here is the text of person1.
Person2:
Here is the text of person2.
Person1:
Here is another line of text from person1.
Person3:
Here is the text of person3.
Can anyone let me know how I can get the speaker along with the text using the Google API?
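
One possible approach, as a sketch only: Google Cloud Speech-to-Text offers speaker diarization, which (when enabled in the recognition config) tags each recognized word with an integer speaker number. Assuming those words have already been pulled out of the response as `(word, speaker_tag)` pairs, the hypothetical helper `format_by_speaker` below groups consecutive words from the same speaker into the layout asked for. The helper name, the pair format, and the `Person{}` label are illustrative assumptions, not part of any Google library.

```python
from itertools import groupby


def format_by_speaker(tagged_words, label="Person{}"):
    """Group consecutive (word, speaker_tag) pairs into labelled turns.

    tagged_words is an assumed input format: a list of (word, speaker_tag)
    tuples in spoken order, e.g. extracted from a diarized API response.
    """
    lines = []
    # groupby only merges *adjacent* items with the same tag, so a speaker
    # who talks again later gets a new turn instead of being merged.
    for tag, group in groupby(tagged_words, key=lambda pair: pair[1]):
        lines.append(label.format(tag) + ":")
        lines.append(" ".join(word for word, _ in group))
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data for illustration.
    sample = [
        ("Here", 1), ("is", 1), ("person", 1), ("one.", 1),
        ("And", 2), ("person", 2), ("two.", 2),
        ("One", 1), ("again.", 1),
    ]
    print(format_by_speaker(sample))
```

Grouping only adjacent words keeps the turn order of the conversation, which is what the Person1 / Person2 / Person1 layout above needs.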

Can anyone tell me what the reason for this issue is, please?

I am a beginner and could not find anything on Google. My WAV files are in Turkish; I don't know whether that is related, but it might be.
Thank you for your time.
Ali

`Traceback (most recent call last):
  File "fast.py", line 28, in <module>
    all_text = pool.map(transcribe, enumerate(files))
  File "C:\Users\ASUS-25\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "C:\Users\ASUS-25\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 768, in get
    raise self.value
  File "C:\Users\ASUS-25\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\ASUS-25\AppData\Local\Programs\Python\Python38-32\lib\multiprocessing\pool.py", line 48, in mapstar
    return list(map(*args))
  File "fast.py", line 21, in transcribe
    text = r.recognize_google_cloud(audio, credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS)
  File "C:\Users\ASUS-25\AppData\Local\Programs\Python\Python38-32\lib\site-packages\speech_recognition\__init__.py", line 937, in recognize_google_cloud
    if "results" not in response or len(response["results"]) == 0: raise UnknownValueError()
speech_recognition.UnknownValueError

C:\Users\ASUS-25\Google Drive\Work\speech-to-text-master>`
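
The traceback ends in `speech_recognition.UnknownValueError`, which the quoted library line raises when the response contains no results, that is, when no speech was recognized in that chunk. Because it is raised inside a `pool.map` worker, one such chunk aborts the whole run. A minimal sketch of catching the error per chunk follows; `transcribe_or_placeholder` is a hypothetical helper, and the fake recognizer and error class only stand in for the real library so the example runs on its own. If the recordings are Turkish, passing a language code such as `language="tr-TR"` may also be worth trying; that argument is an assumption to check against the installed SpeechRecognition version.

```python
def transcribe_or_placeholder(recognize, audio, no_speech_errors, placeholder=""):
    """Call recognize(audio); return placeholder if no speech was recognized.

    recognize        -- any callable taking the audio and returning text
    no_speech_errors -- tuple of exception classes meaning "nothing recognized"
    """
    try:
        return recognize(audio)
    except no_speech_errors:
        return placeholder


# With the real library this might look like (assumption, check your version):
#   recognize = functools.partial(
#       r.recognize_google_cloud,
#       credentials_json=GOOGLE_CLOUD_SPEECH_CREDENTIALS,
#       language="tr-TR",
#   )
#   no_speech_errors = (sr.UnknownValueError,)

if __name__ == "__main__":
    class FakeUnknownValueError(Exception):
        """Stand-in for speech_recognition.UnknownValueError."""

    def fake_recognize(audio):
        # Pretend that empty audio is unintelligible.
        if not audio:
            raise FakeUnknownValueError()
        return audio.upper()

    chunks = ["hello", "", "world"]
    texts = [
        transcribe_or_placeholder(
            fake_recognize, chunk, (FakeUnknownValueError,), "[unintelligible]"
        )
        for chunk in chunks
    ]
    print(texts)
```

Only the listed exception types are swallowed, so other failures (bad credentials, network errors) still surface instead of being silently turned into blank text.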

ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC

Has anyone encountered this ValueError even though the audio file is a PCM WAV? Any idea how to solve it?
ValueError: Audio file could not be read as PCM WAV, AIFF/AIFF-C, or Native FLAC.

I ran fast.py with some sample WAV files and it worked perfectly! But when I tested it with audio files I collected from a website, I got a ValueError, even though the info from the soxi command says the files are PCM WAV.

I then re-ran the sample WAV files that had previously worked, but received the same error message.

Audio files I collected from the website
I downloaded Amazon's audio (https://www.youtube.com/watch?v=CxK1VhtJlNQ), converted it to a WAV file at a 16 kHz sample rate with 1 channel, and split it into small pieces with py-webrtcvad.

soxi chunk-02.wav
Input File : 'chunk-02.wav'
Channels : 1
Sample Rate : 16000
Precision : 16-bit
Duration : 00:00:03.03 = 48480 samples ~ 227.25 CDDA sectors
File Size : 97.0k
Bit Rate : 256k
Sample Encoding: 16-bit Signed Integer PCM
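
One way to narrow this down, as a sketch: as far as I know, SpeechRecognition's `AudioFile` first tries to open the file with Python's standard `wave` module, so it can help to check whether `wave` itself can parse a chunk. A file may look fine to soxi and still be rejected by `wave`. The helper name `describe_wav` is hypothetical, and the demo writes its own one-second silent file rather than using any file from this issue.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return the header fields Python's wave module reads, or the error text."""
    try:
        with wave.open(path, "rb") as wav:
            return {
                "channels": wav.getnchannels(),
                "sample_rate": wav.getframerate(),
                "sample_width_bytes": wav.getsampwidth(),
                "frames": wav.getnframes(),
            }
    except (wave.Error, EOFError) as exc:
        # wave.Error: header is not something the module understands.
        # EOFError: file is empty or truncated.
        return {"error": f"{type(exc).__name__}: {exc}"}


if __name__ == "__main__":
    # Write one second of 16 kHz, mono, 16-bit silence and read it back.
    path = os.path.join(tempfile.mkdtemp(), "silence.wav")
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 16000)
    print(describe_wav(path))
```

Running `describe_wav` on one of the failing chunks and on one of the original samples would show whether the problem is in the file headers or somewhere else, such as which files the script picks up.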

Not actually an issue

I have set up your library locally and it works like a charm, thank you for the good work! I'm trying to integrate this library with PHP, but I couldn't get it to produce results in that case. The script is saved in a folder named speech_to_text and I'm trying to execute it using PHP's shell_exec. The code I run is `$output = shell_exec("/usr/bin/python3 /var/www/html/speech_to_text/slow.py $directory");` and I have modified the slow.py file in the following way:

https://gist.github.com/khanof89/1c97f178dace3712991d114f95a3da2c

The following is the output I get:

foldername /var/www/html/podcasts-manage/storage/episode-2-of-the-awesome-mypodcast-a5dc/
['genevieve1.wav']
for f in tqdm /var/www/html/podcasts-manage/storage/episode-2-of-the-awesome-mypodcast-a5dc/genevieve1.wav
name /var/www/html/podcasts-manage/storage/episode-2-of-the-awesome-mypodcast-a5dc/genevieve1.wav
inside source
done source
credentials {
"type": "service_account",
"project_id": "api-project-11111111111",
"private_key_id": "private_key_id_goes_here",
"private_key": "-----BEGIN PRIVATE KEY-----\nMY_PRIVATE_KEY_GOES_HERE\n-----END PRIVATE KEY-----\n",
"client_email": "[email protected]",
"client_id": "1111111111111",
"auth_uri": "https://accounts.google.com/o/oauth2/auth",
"token_uri": "https://accounts.google.com/o/oauth2/token",
"auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
"client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/analytics%40api-project-792103813257.iam.gserviceaccount.com"
}

exception in text=r.recognize

Because I am posting this as an issue, I have replaced many things from my Google credentials file here, but in the actual file they are intact. Please help.
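
The output above stops at "exception in text=r.recognize" without saying which exception was raised. PHP's `shell_exec` returns only what the command writes to standard output, so anything Python writes to standard error is lost unless the command redirects it. A minimal sketch of getting the full traceback onto standard output follows; `call_with_report` is a hypothetical helper, and the failing function and its error message are made up for the demo, not taken from slow.py.

```python
import traceback


def call_with_report(func, *args, **kwargs):
    """Run func and return (result, None), or (None, traceback text) on failure."""
    try:
        return func(*args, **kwargs), None
    except Exception:
        # format_exc() gives the same text Python would print to stderr.
        return None, traceback.format_exc()


if __name__ == "__main__":
    def failing_recognize(audio):
        # Made-up failure, standing in for whatever the real call raises.
        raise PermissionError("cannot read credentials file")

    text, report = call_with_report(failing_recognize, b"")
    if report is not None:
        # print() writes to stdout, which shell_exec captures.
        print(report)
    else:
        print(text)
```

Seeing the exception type and message in `$output` should show whether the failure comes from credentials, file permissions for the web server user, or the recognition call itself.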
