kaljurand / K6nele
An Android app that offers speech-to-text user interfaces to other apps
Home Page: http://kaljurand.github.io/K6nele/
License: Apache License 2.0
Make sure that all the images and code samples in the documentation (https://github.com/Kaljurand/K6nele/blob/gh-pages/docs/et/user_guide.md) have textual labels and that the documentation produces a reasonable output when converted to audio using http://heliraamat.eki.ee/voxpopuli/
Original issue 25 created by Kaljurand on 2012-08-25T20:30:03.000Z:
In the Apps list context menu, add "Assign regexp" and "Remove regexp". "Assign regexp" would allow the user to specify a regular expression that modifies the transcription(s) that the server returns.
Use case: TuneIn Radio. TuneIn Radio supports voice search (in car mode, with device language set to English) but expects the returned transcription to start with "listen to". So, Kõnele cannot be used with TuneIn. The solution would be to automatically modify the transcription e.g. by:
s/^/listen to /
or
s/^mängi /listen to /
This would allow the user to say "tallinn põleb" (i.e. simply the search query) or "mängi Arvo Pärt" (the search query with an Estonian prefix).
Use case: removing brackets e.g. from arithmetical expressions: s/[)(]//g
Some modifications would actually require a chain of regexp transformations instead of just a single regexp.
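The chained rewrites suggested above could be sketched as follows. This is only an illustration: the `RewriteChain` class and its API are hypothetical, not part of Kõnele.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the proposed feature: apply an ordered chain of
// sed-style rewrites to a transcription before returning it to the
// calling app. Hypothetical class, not part of Kõnele.
public class RewriteChain {
    // Insertion order matters, hence LinkedHashMap.
    private final Map<String, String> rules = new LinkedHashMap<>();

    public RewriteChain add(String pattern, String replacement) {
        rules.put(pattern, replacement);
        return this;
    }

    public String apply(String transcription) {
        String result = transcription;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            result = result.replaceAll(rule.getKey(), rule.getValue());
        }
        return result;
    }
}
```

For example, `new RewriteChain().add("^mängi ", "listen to ").add("[)(]", "")` would turn "mängi Arvo Pärt" into "listen to Arvo Pärt", covering both use cases from the issue.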
Original issue 16 created by Kaljurand on 2011-12-02T19:08:49.000Z:
Many keyboard apps do not use the RecognizerIntent interface. This includes the default Android keyboard, as well as the latest version of SlideIT. Support such keyboards as well.
Original issue 20 created by Kaljurand on 2011-12-31T13:33:09.000Z:
The Android documentation says that "44100Hz is currently the only rate that is guaranteed to work on all devices, but other rates such as 22050, 16000, and 11025 may work on some devices".
We currently record at 16 kHz, which could be the reason why recording fails completely on some devices (e.g. Samsung Galaxy Gio). The solution would be to record at the single officially supported sample rate (44.1 kHz) and then downsample the result to the sample rate with the best size/quality trade-off (16 kHz).
See also:
http://developer.android.com/reference/android/media/AudioRecord.html#AudioRecord(int, int, int, int, int)
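The record-then-downsample step could be sketched roughly like this. Note this is a naive linear-interpolation resampler for illustration only; a production version should low-pass filter before decimating to avoid aliasing, and the `Resampler` class is hypothetical.

```java
// Naive linear-interpolation resampler, sketching the proposed
// record-at-44.1kHz-then-downsample-to-16kHz approach. A production
// version should low-pass filter before decimating to avoid aliasing.
public class Resampler {
    public static short[] resample(short[] in, int fromRate, int toRate) {
        int outLen = (int) ((long) in.length * toRate / fromRate);
        short[] out = new short[outLen];
        for (int i = 0; i < outLen; i++) {
            double pos = (double) i * fromRate / toRate;  // fractional source index
            int j = (int) pos;
            double frac = pos - j;
            short a = in[j];
            short b = (j + 1 < in.length) ? in[j + 1] : a;  // clamp at the end
            out[i] = (short) Math.round(a + frac * (b - a));
        }
        return out;
    }
}
```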
Jar files are binaries that shouldn't be inside the repository. Could you wire it up to build from the source code (e.g. using submodules) or push net-speech-api to Maven Central?
Hello!
Many-many thanks once again for this great app! I find it indispensable and use it a lot. However, since yesterday I get the 'server not reachable' error message when I try to dictate something.
Many thanks for the application, it helps me a lot!
I have been using it heavily for a couple of months. Unfortunately, it has now stopped working and responds with a "no transcription found" error when I speak. How can I fix this?
Original issue 29 created by Kaljurand on 2014-01-18T14:14:12.000Z:
Kõnele can be currently used as an activity and as a service. It would be useful if it could act as an IME (input method editor) as well. The GUI could be similar to the Google Voice Search IME. For a structured input type (number, datetime, phone), we could do grammar-based recognition.
Original issue 5 created by Kaljurand on 2011-10-27T15:42:48.000Z:
It should be possible to keep the Grammars DB in sync with some official online resource.
Alternatively, make the Grammars list only available off-phone, via a mobile-friendly webpage from where the user can pick grammar URLs to assign them to Apps.
Original issue 14 created by Kaljurand on 2011-11-09T22:22:43.000Z:
It would make sense to pause the media player for the duration of recording, otherwise there is too much interference affecting the pause detection and the eventual transcription quality. E.g. Google Voice Search pauses the media player.
Original issue 11 created by Kaljurand on 2011-11-03T12:10:04.000Z:
There should be a built-in tool which guides the user through a list of written utterances (with their corresponding normalizations), asks the user to speak each of them, and continuously shows how well the speech recognizer produces the matching transcription.
The purpose is to test/evaluate the speech recognizer, or to train the recognizer (for the latter, the API needs to provide a way to communicate the existing written utterance to the server, e.g. with every query).
In terms of the UI, all this functionality could also be packaged as an independent app, but it's probably easier to start building it as part of RecognizerIntent.
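The "performance of getting the matching transcription" that such a tool would display is usually measured as word error rate (WER), i.e. word-level edit distance divided by reference length. A minimal sketch of that core metric (not Kõnele code; class and method names are illustrative):

```java
// Word-level edit distance and word error rate (WER), the core metric
// an evaluation/training tool like this would display continuously.
// Minimal sketch, not Kõnele code.
public class Wer {
    public static int editDistance(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i;
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j;
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int sub = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                d[i][j] = Math.min(sub, Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return d[ref.length][hyp.length];
    }

    // WER = (substitutions + insertions + deletions) / reference length
    public static double wer(String reference, String hypothesis) {
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");
        return (double) editDistance(ref, hyp) / ref.length;
    }
}
```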
hi kaljurand,
I am trying to run this on the Android emulator in Android Studio (I do not have an Android phone) and I am getting this error.
Any suggestions on how to get past this? (I tried the option you suggested in one of the closed issues, but it did not seem to work.)
thanks
Original issue 31 created by Kaljurand on 2014-11-21T19:02:50.000Z:
Blocked by this bug https://code.google.com/p/android/issues/detail?id=80079
Original issue 4 created by Kaljurand on 2011-10-27T15:31:54.000Z:
Compress the recorded audio (e.g. as FLAC) in order to reduce the audio upload size.
See also: http://flac.sourceforge.net/
Note that Android (as of v4.0) does not have native support for FLAC encoding.
Original issue 9 created by Kaljurand on 2011-11-03T10:49:39.000Z:
This would allow the 3rd party app developer to set which server to use, similar to setting the grammar and lang.
When I try to use K6nele as a keyboard to dictate to a text field, I get the "[ insufficient permissions ]" error, reported on top of the yellow mic button, after I tap the button to dictate.
Nexus 5X, Android 6.0.1
Hey there
While following the readme I get stuck at "gradle assembleRelease"
with the error "ee.ioc.phon.netspeechapi.recsession does not exist"
More specifically at ChunkedWebRecSession.
I have built net-speech-api using "mvn package -DskipTests".
Original issue 28 created by Kaljurand on 2013-04-01T12:37:39.000Z:
Look into "Android Intents with Chrome" (https://developers.google.com/chrome/mobile/docs/intents)
Hi,
I am getting the above error. Could you please tell me what is wrong?
Thanks
additional info:
username$ gradle assemble --info
Starting Build
Settings evaluated using settings file '/Applications/Dev/AS/Samples/K6nele/settings.gradle'.
Projects loaded. Root project using build file '/Applications/Dev/AS/Samples/K6nele/build.gradle'.
Included projects: [root project 'K6nele', project ':app', project ':net-speech-api', project ':speechutils', project ':speechutils:app']
Evaluating root project 'K6nele' using build file '/Applications/Dev/AS/Samples/K6nele/build.gradle'.
FAILURE: Build failed with an exception.
Where:
Build file '/Applications/Dev/AS/Samples/K6nele/build.gradle' line: 3
What went wrong:
A problem occurred evaluating root project 'K6nele'.
Could not find method google() for arguments [] on repository container.
Original issue 7 created by Kaljurand on 2011-10-27T16:06:32.000Z:
There is currently support for EXTRAs up to API Level 3. Android has added a few new EXTRAs in API levels 8, 11, and 14. For the most part these are rarely used and non-essential but some form of support would still be nice, e.g. print a log message for every unsupported EXTRA to inform the developer that his/her EXTRA was ignored.
See also:
http://developer.android.com/reference/android/speech/RecognizerIntent.html
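The suggested "log every ignored EXTRA" behavior could be sketched as below. To keep the example self-contained, the extras are modeled as a plain `Map` rather than an `android.os.Bundle`, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the suggested behavior: collect (and, in the app, log) every
// incoming EXTRA that the service does not handle, so the developer is
// informed that their EXTRA was ignored. Extras are modeled as a plain
// Map instead of android.os.Bundle to keep the example self-contained.
public class ExtraChecker {
    public static List<String> unsupportedExtras(Map<String, Object> extras,
                                                 Set<String> supported) {
        List<String> ignored = new ArrayList<>();
        for (String key : extras.keySet()) {
            if (!supported.contains(key)) {
                // In the app this would be: Log.w(TAG, "Ignoring EXTRA: " + key);
                ignored.add(key);
            }
        }
        return ignored;
    }
}
```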
Original issue 24 created by Kaljurand on 2012-04-06T07:37:32.000Z:
Make it work better with apps which expect the continuous dictation interface (e.g. Evernote v3.6.2 on ICS). Currently some audio is not transcribed unless you pause long and carefully between sentences.
Original issue 13 created by Kaljurand on 2011-11-04T19:08:41.000Z:
What steps will reproduce the problem?
What is the expected output? What do you see instead?
Instead of RecognizerIntentActivity (which provides the recorder/recognizer box), you'll see the Demos' activity, i.e. the current topmost activity in the RecognizerIntent activity stack.
To work around this problem, one must press BACK until reaching the HOME screen, so that RecognizerIntent is completely destroyed, and then go back to filling the text field.
When called from another app, RecognizerIntentActivity must ALWAYS start.
Original issue 21 created by Kaljurand on 2012-01-06T09:00:33.000Z:
With longer recordings (15 seconds at 16 kHz) on HTC Wildfire, getting:
W/AudioFlinger( 73): RecordThread: buffer overflow
Original issue 1 created by Kaljurand on 2011-09-30T12:57:35.000Z:
SwiftkeyX Android keyboard has a little microphone button for entering text via speech recognition. I can successfully open recognizer-intent with it, and it recognizes my speech, but the resulting text is not written to the current text field.
Works with Google Voice Search: Voice Search gives a list of recognition hypotheses to select from, and the selected text is written to the text field.
Original issue 19 created by Kaljurand on 2011-12-21T13:43:46.000Z:
Implement beginningOfSpeech, see
http://developer.android.com/reference/android/speech/RecognitionService.Callback.html#beginningOfSpeech()
In this library, can we implement speech recognition when offline? Can you provide a solution or any other way to do this? Is this functionality implemented in Kõnele?
Original issue 26 created by Kaljurand on 2012-10-21T16:56:02.000Z:
What steps will reproduce the problem?
What is the expected output? What do you see instead?
app crashes (logcat attached)
What version of the product are you using? On what operating system?
Android 4.1.2
Please provide any additional information below.
Original issue 10 created by Kaljurand on 2011-11-03T11:08:44.000Z:
When calling RECOGNIZE_SPEECH, the 3rd party app sets certain EXTRAs. The user should be able to configure if these extras are replaced and how, by setting, for each app and each extra, one of the following:
For example, if a 3rd party app contains a hard-coded reference to a grammar URL which does not resolve and which cannot be changed in the app by the user, then the user could still change it in the Apps list of RecognizerIntent.
Hi Kaljurand
I am facing the following error when trying to build K6nele
Caused by: com.android.tools.aapt2.Aapt2Exception: AAPT2 error: check logs for details
at com.android.builder.png.AaptProcess$NotifierProcessOutput.handleOutput(AaptProcess.java:463)
at com.android.builder.png.AaptProcess$NotifierProcessOutput.err(AaptProcess.java:415)
at com.android.builder.png.AaptProcess$ProcessOutputFacade.err(AaptProcess.java:332)
at com.android.utils.GrabProcessOutput$1.run(GrabProcessOutput.java:104)
FAILURE: Build failed with an exception.
Failed to execute aapt
Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
Get more help at https://help.gradle.org
Deprecated Gradle features were used in this build, making it incompatible with Gradle 5.0.
See https://docs.gradle.org/4.6/userguide/command_line_interface.html#sec:command_line_warnings
BUILD FAILED in 0s
28 actionable tasks: 1 executed, 27 up-to-date
Any suggestion?
Thanks
Mohamed Eldesouki
Hi Kaarel Kaljurand,
I'm testing the library and I'm excited about its features. In one of my tests, I used an Android tablet with the language set to Estonian and the recognizer set to Kaldi, with the WebSocket pointing to my local server where Kaldi runs with Spanish models. Kõnele recognized speech as expected, but when I use an application such as Notes or Handoff, the system uses Google's Spanish models, even though it is set to Estonian (fast recognition). I disabled "Google OK" but it still uses the Google model. How can I use other applications with Kaldi? Thanks in advance.
Original issue 12 created by Kaljurand on 2011-11-03T15:09:51.000Z:
On HTC Wildfire (Android v2.2), sometimes getting this NPE:
{{{
E/AndroidRuntime( 5655): java.lang.NullPointerException
E/AndroidRuntime( 5655): at org.apache.harmony.luni.internal.net.www.protocol.http.HttpURLConnectionImpl.readln(HttpURLConnectionImpl.java:1293)
E/AndroidRuntime( 5655): at org.apache.harmony.luni.internal.net.www.protocol.http.HttpURLConnectionImpl.readServerResponse(HttpURLConnectionImpl.java:1351)
E/AndroidRuntime( 5655): at org.apache.harmony.luni.internal.net.www.protocol.http.HttpURLConnectionImpl.doRequest(HttpURLConnectionImpl.java:1644)
E/AndroidRuntime( 5655): at org.apache.harmony.luni.internal.net.www.protocol.http.HttpURLConnectionImpl.getInputStream(HttpURLConnectionImpl.java:1153)
E/AndroidRuntime( 5655): at ee.ioc.phon.netspeechapi.recsession.ChunkedWebRecSession.sendChunk(Unknown Source)
E/AndroidRuntime( 5655): at ee.ioc.phon.android.recognizerintent.RecognizerIntentActivity.sendChunk(RecognizerIntentActivity.java:706)
E/AndroidRuntime( 5655): at ee.ioc.phon.android.recognizerintent.RecognizerIntentActivity.access$4(RecognizerIntentActivity.java:703)
E/AndroidRuntime( 5655): at ee.ioc.phon.android.recognizerintent.RecognizerIntentActivity$6.run(RecognizerIntentActivity.java:560)
W/ActivityManager( 103): Force finishing activity ee.ioc.phon.android.recognizerintent/.demo.RepeaterDemo
}}}
Hi kaljurand,
(I put this question on alumae's GitHub page as well; I thought maybe you could answer it too. Thank you.)
I am new to Kaldi and the GStreamer plugin.
I have a question though: what is the reason behind choosing GStreamer to pass the audio to Kaldi for decoding?
Could a simple server be sufficient to pass the audio to Kaldi for decoding? Are there any limitations to such an approach that made you choose GStreamer as an intermediary? (Pardon my ignorance if my thought process is wrong.)
Thanks in advance
Original issue 15 created by Kaljurand on 2011-11-22T09:13:11.000Z:
Currently the standard RESULT_NETWORK_ERROR result code is returned but no calling app seems to care about it and properly display it to the user. So the solution would be to inform the user in our own UI, as Google Voice Search does.
Original issue 3 created by Kaljurand on 2011-10-27T15:19:54.000Z:
Also, its performance depends too much on the underlying hardware, e.g. the detection works much better on HTC Wildfire than on Samsung Galaxy S II.
Original issue 22 created by Kaljurand on 2012-02-08T07:43:18.000Z:
Example: the user is switching keyboard layouts (e.g. between Estonian and Russian), and expects the speech recognizer to switch its languages as well.
In case the default server cannot handle the language change (the server must be explicit about which languages it supports), then Kõnele could try to intelligently switch to another server, or even dispatch the job to another speech recognition service installed on the phone.
In the settings, the user should be able to give guidelines for such switching, e.g. specify which alternative service she prefers.
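The dispatch rule described above could be sketched as follows. The registry format and service names are hypothetical, not Kõnele's actual configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Sketch of the dispatch rule described above: keep the current service
// if it declares support for the requested language, otherwise pick the
// first configured alternative that does (registration order encodes the
// user's preference). The registry is hypothetical, for illustration.
public class ServiceSwitcher {
    // Service id -> languages the service declares support for.
    private final Map<String, Set<String>> registry = new LinkedHashMap<>();

    public void register(String serviceId, Set<String> languages) {
        registry.put(serviceId, languages);
    }

    public String pick(String currentService, String language) {
        Set<String> langs = registry.get(currentService);
        if (langs != null && langs.contains(language)) {
            return currentService;
        }
        for (Map.Entry<String, Set<String>> e : registry.entrySet()) {
            if (e.getValue().contains(language)) {
                return e.getKey();
            }
        }
        return currentService;  // no better option; let the call fail visibly
    }
}
```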
Original issue 17 created by Kaljurand on 2011-12-13T13:16:22.000Z:
See:
Original issue 30 created by Kaljurand on 2014-01-24T09:56:24.000Z:
Support extras:
used by Google Keep.
For services that support this (e.g. the service that is based on kaldi-gstreamer-server) present the status of the service (number of available slots, latency, etc.) somewhere / under certain conditions, e.g.
Motivated by #48
Original issue 6 created by Kaljurand on 2011-10-27T15:53:49.000Z:
Currently the phone switches to portrait-mode when the RecognizerIntentActivity is started.
Original issue 27 created by Kaljurand on 2013-01-13T20:03:55.000Z:
According to:
http://developer.android.com/reference/android/speech/RecognizerIntent.html#EXTRA_LANGUAGE
the user's preferred locale must be used for the identification of the input speech language in case EXTRA_LANGUAGE is not set.
This has some complications for K6nele because many Estonian users prefer a non-Estonian locale, because the Estonian translation of Android is often horrible. See also:
http://nugiline.wordpress.com/2011/09/03/androidi-eesti-keel-imeb-lurinal-samsung-galaxy-s-ii-i9100/
Note also that IMEs have their own technique of identifying the language (by "selectedLanguage" String).
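The resolution order this implies could be sketched as below: the caller's EXTRA_LANGUAGE wins; otherwise a user-configured override in Kõnele (a hypothetical setting, motivated by the locale problem above); otherwise the device locale that the RecognizerIntent documentation names as the fallback.

```java
import java.util.Locale;

// Sketch of the language-resolution order discussed in the issue.
// The userOverride parameter is a hypothetical Kõnele setting, added
// because many Estonian users run their device in a non-Estonian locale.
public class LanguageResolver {
    public static String resolve(String extraLanguage, String userOverride,
                                 Locale deviceLocale) {
        if (extraLanguage != null && !extraLanguage.isEmpty()) {
            return extraLanguage;       // caller's EXTRA_LANGUAGE wins
        }
        if (userOverride != null && !userOverride.isEmpty()) {
            return userOverride;        // hypothetical in-app setting
        }
        return deviceLocale.toLanguageTag();  // documented fallback, e.g. "en-US"
    }
}
```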
Original issue 2 created by Kaljurand on 2011-10-27T15:12:33.000Z:
What steps will reproduce the problem?
What is the expected output?
A nice beep in the style of Siri or Google Voice Search
What do you hear instead?
An ugly beep...
Unable to resolve dependency for ':app@debug/compileClasspath': Could not resolve project :speechutils:app.
Unable to resolve dependency for ':app@release/compileClasspath': Could not resolve project :speechutils:app.
Unable to resolve dependency for ':app@debugAndroidTest/compileClasspath': Could not resolve project :speechutils:app.
Original issue 18 created by Kaljurand on 2011-12-18T11:42:32.000Z:
In some cases where RESULT_SERVER_ERROR would be correct to return, RESULT_NO_MATCH is currently returned instead, e.g. when the server returns the results in an incorrect format. (This is actually more of an issue with the net-speech-api.)
I want to start a Ratchet WebSocket service to collect audio data from K6nele.
The WebSocket service is started, and I can open the WebSocket connection in JavaScript like:
var ws = createWebSocket("ws://10.6.71.10:8001");
However, when I use K6nele with the WebSocket URL "ws://10.6.71.10:8001" to open the connection, the handshake message showed:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: GJVr5M5/N98Sk4n1Yl2FnZI2/Os=
X-Powered-By: Ratchet/0.4.1
Is there any difference in how Kõnele opens a WebSocket compared to JavaScript?
regards,
Yunzhao
When I turn off the screen (from the physical button) when the Kõnele IME is active, K6nele starts a new recording session itself at the same moment.
When I do this repeatedly, Kõnele gets into a strange, inconsistent state where nothing works any more: nothing happens when I push the yellow button, the message "VIGA: lindistamine luhtus" ("ERROR: recording failed") appears, and nothing happens on the server.
Using latest apk from IME branch, compiled by you.
Original issue 23 created by Kaljurand on 2012-03-31T10:01:25.000Z:
Currently this non-well-formed response triggers the "network error" message:
{
"status": 0,
"hypotheses": [
],
"id": "776be34c343ec24asfaf08ed42a4998e442e"
}
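The fix the issue suggests amounts to classifying this response correctly: status 0 with an empty hypotheses list is a valid "nothing recognized" answer and should map to a no-match result, not a network error. A sketch of that logic, operating on already-parsed fields; the enum is illustrative, mirroring (but not identical to) the RecognizerIntent result codes:

```java
import java.util.List;

// Sketch of the suggested fix: a response with status 0 and an empty
// hypotheses list is a valid "nothing recognized" answer and should map
// to NO_MATCH, not NETWORK_ERROR. The enum is illustrative; the real
// code would return RecognizerIntent result codes.
public class ResponseClassifier {
    public enum Result { OK, NO_MATCH, SERVER_ERROR }

    public static Result classify(int status, List<String> hypotheses) {
        if (status != 0) {
            return Result.SERVER_ERROR;
        }
        if (hypotheses == null || hypotheses.isEmpty()) {
            return Result.NO_MATCH;  // previously misreported as a network error
        }
        return Result.OK;
    }
}
```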
Original issue 8 created by Kaljurand on 2011-10-27T18:40:19.000Z:
What steps will reproduce the problem?
What is the expected output? What do you see instead?
The Settings/Grammars list has become empty which is unexpected.