kaljurand / K6nele
An Android app that offers speech-to-text user interfaces to other apps
Home Page: http://kaljurand.github.io/K6nele/
License: Apache License 2.0
Make sure that all the images and code samples in the documentation (https://github.com/Kaljurand/K6nele/blob/gh-pages/docs/et/user_guide.md) have textual labels and that the documentation produces a reasonable output when converted to audio using http://heliraamat.eki.ee/voxpopuli/
Original issue 25 created by Kaljurand on 2012-08-25T20:30:03.000Z:
In the Apps list context menu, add "Assign regexp" and "Remove regexp". "Assign regexp" would allow the user to specify a regular expression that modifies the transcription(s) that the server returns.
Use case: TuneIn Radio. TuneIn Radio supports voice search (in car mode, with device language set to English) but expects the returned transcription to start with "listen to". So, Kõnele cannot be used with TuneIn. The solution would be to automatically modify the transcription e.g. by:
s/^/listen to /
or
s/^mängi /listen to /
This would allow the user to say "tallinn põleb" (i.e. simply the search query) or "mängi Arvo Pärt" (the search query with an Estonian prefix).
Use case: removing brackets e.g. from arithmetical expressions: s/[)(]//g
Some modifications would actually require a chain of regexp transformations instead of just a single regexp.
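The chained rewrites suggested above could be sketched as follows. This is only an illustration: the `RewriteChain` class and its API are hypothetical, not part of Kõnele.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the proposed feature: apply an ordered chain of
// sed-style rewrites to a transcription before returning it to the
// calling app. Hypothetical class, not part of Kõnele.
public class RewriteChain {
    // Insertion order matters, hence LinkedHashMap.
    private final Map<String, String> rules = new LinkedHashMap<>();

    public RewriteChain add(String pattern, String replacement) {
        rules.put(pattern, replacement);
        return this;
    }

    public String apply(String transcription) {
        String result = transcription;
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            result = result.replaceAll(rule.getKey(), rule.getValue());
        }
        return result;
    }
}
```

For example, `new RewriteChain().add("^mängi ", "listen to ").add("[)(]", "")` would turn "mängi Arvo Pärt" into "listen to Arvo Pärt", covering both use cases from the issue.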
Original issue 16 created by Kaljurand on 2011-12-02T19:08:49.000Z:
Many keyboard apps do not use the RecognizerIntent interface. This includes the default Android keyboard, as well as the latest version of SlideIT. Support such keyboards as well.
Original issue 20 created by Kaljurand on 2011-12-31T13:33:09.000Z:
The Android documentation says that "44100Hz is currently the only rate that is guaranteed to work on all devices, but other rates such as 22050, 16000, and 11025 may work on some devices".
We currently record at 16 kHz, which could be the reason why recording fails completely on some devices (e.g. Samsung Galaxy Gio). The solution would be to record at the single officially supported sample rate (44.1 kHz) and then downsample the result to the sample rate with the best size/quality trade-off (16 kHz).
See also:
http://developer.android.com/reference/android/media/AudioRecord.html#AudioRecord(int, int, int, int, int)
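The record-then-downsample step could be sketched roughly like this. Note this is a naive linear-interpolation resampler for illustration only; a production version should low-pass filter before decimating to avoid aliasing, and the `Resampler` class is hypothetical.

```java
// Naive linear-interpolation resampler, sketching the proposed
// record-at-44.1kHz-then-downsample-to-16kHz approach. A production
// version should low-pass filter before decimating to avoid aliasing.
public class Resampler {
    public static short[] resample(short[] in, int fromRate, int toRate) {
        int outLen = (int) ((long) in.length * toRate / fromRate);
        short[] out = new short[outLen];
        for (int i = 0; i < outLen; i++) {
            double pos = (double) i * fromRate / toRate;  // fractional source index
            int j = (int) pos;
            double frac = pos - j;
            short a = in[j];
            short b = (j + 1 < in.length) ? in[j + 1] : a;  // clamp at the end
            out[i] = (short) Math.round(a + frac * (b - a));
        }
        return out;
    }
}
```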
Jar files are binaries that shouldn't be inside the repository. Could you wire it up to build from the source code (e.g. using submodules) or push net-speech-api to Maven Central?
Hello!
Many-many thanks once again for this great app! I find it indispensable and use it a lot. However, since yesterday I get the 'server not reachable' error message when I try to dictate something.
Many thanks for the application, it helps me a lot!
I have been using it heavily for a couple of months. Unfortunately, it has now stopped working and responds with a "no transcription found" error when I speak. How can I fix this?
Original issue 29 created by Kaljurand on 2014-01-18T14:14:12.000Z:
Kõnele can be currently used as an activity and as a service. It would be useful if it could act as an IME (input method editor) as well. The GUI could be similar to the Google Voice Search IME. For a structured input type (number, datetime, phone), we could do grammar-based recognition.
Original issue 5 created by Kaljurand on 2011-10-27T15:42:48.000Z:
It should be possible to keep the Grammars DB in sync with some official online resource.
Alternatively, make the Grammars list only available off-phone, via a mobile-friendly webpage from where the user can pick grammar URLs to assign them to Apps.
Original issue 14 created by Kaljurand on 2011-11-09T22:22:43.000Z:
It would make sense to pause the media player for the duration of recording, otherwise there is too much interference affecting the pause detection and the eventual transcription quality. E.g. Google Voice Search pauses the media player.
Original issue 11 created by Kaljurand on 2011-11-03T12:10:04.000Z:
There should be a built-in tool which guides the user through a list of written utterances (with their corresponding normalizations), asks the user to speak each of them, and continuously shows how well the speech recognizer produces the matching transcription.
The purpose is to test/evaluate the speech recognizer, or to train the recognizer (for the latter, the API needs to provide a way to communicate the existing written utterance to the server, e.g. with every query).
In terms of the UI, all this functionality could also be packaged as an independent app, but it's probably easier to start building it as part of RecognizerIntent.
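The "performance of getting the matching transcription" that such a tool would display is usually measured as word error rate (WER), i.e. word-level edit distance divided by reference length. A minimal sketch of that core metric (not Kõnele code; class and method names are illustrative):

```java
// Word-level edit distance and word error rate (WER), the core metric
// an evaluation/training tool like this would display continuously.
// Minimal sketch, not Kõnele code.
public class Wer {
    public static int editDistance(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i;
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j;
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int sub = d[i - 1][j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                d[i][j] = Math.min(sub, Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return d[ref.length][hyp.length];
    }

    // WER = (substitutions + insertions + deletions) / reference length
    public static double wer(String reference, String hypothesis) {
        String[] ref = reference.trim().split("\\s+");
        String[] hyp = hypothesis.trim().split("\\s+");
        return (double) editDistance(ref, hyp) / ref.length;
    }
}
```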
hi kaljurand,
I am trying to run this on the Android emulator in Android Studio (I do not have an Android phone) and I am getting this error.
Any suggestions on how to get past this? (I tried the option you suggested in one of the closed issues, but it did not seem to work.)
thanks
Original issue 31 created by Kaljurand on 2014-11-21T19:02:50.000Z:
Blocked by this bug https://code.google.com/p/android/issues/detail?id=80079
Original issue 4 created by Kaljurand on 2011-10-27T15:31:54.000Z:
Compress the recorded audio (e.g. as FLAC) in order to reduce the audio upload size.
See also: http://flac.sourceforge.net/
Note that Android (as of v4.0) does not have native support for FLAC encoding.
Original issue 9 created by Kaljurand on 2011-11-03T10:49:39.000Z:
This would allow the 3rd party app developer to set which server to use, similar to setting the grammar and lang.
When I try to use K6nele as a keyboard to dictate to a text field, I get the "[ insufficient permissions ]" error, reported on top of the yellow mic button, after I tap the button to dictate.
Nexus 5X, Android 6.0.1
Hey there
While following the readme I get stuck at "gradle assembleRelease"
with the error "ee.ioc.phon.netspeechapi.recsession does not exist"
More specifically at ChunkedWebRecSession.
I have built net-speech-api using "mvn package -DskipTests".
Original issue 28 created by Kaljurand on 2013-04-01T12:37:39.000Z:
Look into "Android Intents with Chrome" (https://developers.google.com/chrome/mobile/docs/intents)
Hi,
I am getting the above error. Could you please tell me what is wrong?
Thanks
additional info:
username$ gradle assemble --info
Starting Build
Settings evaluated using settings file '/Applications/Dev/AS/Samples/K6nele/settings.gradle'.
Projects loaded. Root project using build file '/Applications/Dev/AS/Samples/K6nele/build.gradle'.
Included projects: [root project 'K6nele', project ':app', project ':net-speech-api', project ':speechutils', project ':speechutils:app']
Evaluating root project 'K6nele' using build file '/Applications/Dev/AS/Samples/K6nele/build.gradle'.
FAILURE: Build failed with an exception.
Where:
Build file '/Applications/Dev/AS/Samples/K6nele/build.gradle' line: 3
What went wrong:
A problem occurred evaluating root project 'K6nele'.
Could not find method google() for arguments [] on repository container.
Original issue 7 created by Kaljurand on 2011-10-27T16:06:32.000Z:
There is currently support for EXTRAs up to API Level 3. Android has added a few new EXTRAs in API levels 8, 11, and 14. For the most part these are rarely used and non-essential but some form of support would still be nice, e.g. print a log message for every unsupported EXTRA to inform the developer that his/her EXTRA was ignored.
See also:
http://developer.android.com/reference/android/speech/RecognizerIntent.html
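The suggested "log every ignored EXTRA" behavior could be sketched as below. To keep the example self-contained, the extras are modeled as a plain `Map` rather than an `android.os.Bundle`, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch of the suggested behavior: collect (and, in the app, log) every
// incoming EXTRA that the service does not handle, so the developer is
// informed that their EXTRA was ignored. Extras are modeled as a plain
// Map instead of android.os.Bundle to keep the example self-contained.
public class ExtraChecker {
    public static List<String> unsupportedExtras(Map<String, Object> extras,
                                                 Set<String> supported) {
        List<String> ignored = new ArrayList<>();
        for (String key : extras.keySet()) {
            if (!supported.contains(key)) {
                // In the app this would be: Log.w(TAG, "Ignoring EXTRA: " + key);
                ignored.add(key);
            }
        }
        return ignored;
    }
}
```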
Original issue 24 created by Kaljurand on 2012-04-06T07:37:32.000Z:
Make it work better with apps which expect the continuous dictation interface (e.g. Evernote v3.6.2 on ICS). Currently some audio is not transcribed unless you pause long and carefully between sentences.
Original issue 13 created by Kaljurand on 2011-11-04T19:08:41.000Z:
What steps will reproduce the problem?
What is the expected output? What do you see instead?
Instead of RecognizerIntentActivity (which provides the recorder/recognizer box), you'll see the Demos' activity, i.e. the current topmost activity in the RecognizerIntent activity stack.
To work around this problem, one must press BACK until reaching the HOME screen, so that RecognizerIntent is completely destroyed, and then go back to filling the text field.
When called from another app, RecognizerIntentActivity must ALWAYS start.
Original issue 21 created by Kaljurand on 2012-01-06T09:00:33.000Z:
With longer recordings (15 seconds at 16 kHz) on HTC Wildfire, getting:
W/AudioFlinger( 73): RecordThread: buffer overflow
Original issue 1 created by Kaljurand on 2011-09-30T12:57:35.000Z:
SwiftkeyX Android keyboard has a little microphone button for entering text via speech recognition. I can successfully open recognizer-intent with it, and it recognizes my speech, but the resulting text is not written to the current text field.
Works with Google Voice Search: Voice Search gives a list of recognition hypotheses to select from, and the selected text is written to the text field.
Original issue 19 created by Kaljurand on 2011-12-21T13:43:46.000Z:
Implement beginningOfSpeech, see
http://developer.android.com/reference/android/speech/RecognitionService.Callback.html#beginningOfSpeech()
In this library, can we implement speech recognition when offline? Can you provide a solution or any other way to do this? Is this functionality implemented in Kõnele?
Original issue 26 created by Kaljurand on 2012-10-21T16:56:02.000Z:
What steps will reproduce the problem?
What is the expected output? What do you see instead?
app crashes (logcat attached)
What version of the product are you using? On what operating system?
Android 4.1.2
Please provide any additional information below.
Original issue 10 created by Kaljurand on 2011-11-03T11:08:44.000Z:
When calling RECOGNIZE_SPEECH, the 3rd party app sets certain EXTRAs. The user should be able to configure if these extras are replaced and how, by setting, for each app and each extra, one of the following:
For example, if a 3rd party app contains a hard-coded reference to a grammar URL which does not resolve and which cannot be changed in the app by the user, then the user could still change it in the Apps list of RecognizerIntent.
Hi Kaljurand
I am facing the following error when trying to build K6nele
Caused by: com.android.tools.aapt2.Aapt2Exception: AAPT2 error: check logs for details
at com.android.builder.png.AaptProcess$NotifierProcessOutput.handleOutput(AaptProcess.java:463)
at com.android.builder.png.AaptProcess$NotifierProcessOutput.err(AaptProcess.java:415)
at com.android.builder.png.AaptProcess$ProcessOutputFacade.err(AaptProcess.java:332)
at com.android.utils.GrabProcessOutput$1.run(GrabProcessOutput.java:104)
FAILURE: Build failed with an exception.
Failed to execute aapt
Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.
Get more help at https://help.gradle.org
Deprecated Gradle features were used in this build, making it incompatible with Gradle 5.0.
See https://docs.gradle.org/4.6/userguide/command_line_interface.html#sec:command_line_warnings
BUILD FAILED in 0s
28 actionable tasks: 1 executed, 27 up-to-date
Any suggestion?
Thanks
Mohamed Eldesouki
Hi Kaarel Kaljurand,
I'm testing the library and I'm excited about its features. In one of my tests, I used an Android tablet with the language set to Estonian and the recognizer set to Kaldi, with the WebSocket pointing to my local server where Kaldi runs with Spanish models. Kõnele recognized speech as expected, but when I use an application such as Notes or Handoff, the system uses Google's Spanish models, even though it is set to Estonian (fast recognition). I disabled "Google OK" but it still uses the Google model. How can I use other applications with Kaldi? Thanks in advance.
Original issue 12 created by Kaljurand on 2011-11-03T15:09:51.000Z:
On HTC Wildfire (Android v2.2), sometimes getting this NPE:
{{{
E/AndroidRuntime( 5655): java.lang.NullPointerException
E/AndroidRuntime( 5655): at org.apache.harmony.luni.internal.net.www.protocol.http.HttpURLConnectionImpl.readln(HttpURLConnectionImpl.java:1293)
E/AndroidRuntime( 5655): at org.apache.harmony.luni.internal.net.www.protocol.http.HttpURLConnectionImpl.readServerResponse(HttpURLConnectionImpl.java:1351)
E/AndroidRuntime( 5655): at org.apache.harmony.luni.internal.net.www.protocol.http.HttpURLConnectionImpl.doRequest(HttpURLConnectionImpl.java:1644)
E/AndroidRuntime( 5655): at org.apache.harmony.luni.internal.net.www.protocol.http.HttpURLConnectionImpl.getInputStream(HttpURLConnectionImpl.java:1153)
E/AndroidRuntime( 5655): at ee.ioc.phon.netspeechapi.recsession.ChunkedWebRecSession.sendChunk(Unknown Source)
E/AndroidRuntime( 5655): at ee.ioc.phon.android.recognizerintent.RecognizerIntentActivity.sendChunk(RecognizerIntentActivity.java:706)
E/AndroidRuntime( 5655): at ee.ioc.phon.android.recognizerintent.RecognizerIntentActivity.access$4(RecognizerIntentActivity.java:703)
E/AndroidRuntime( 5655): at ee.ioc.phon.android.recognizerintent.RecognizerIntentActivity$6.run(RecognizerIntentActivity.java:560)
W/ActivityManager( 103): Force finishing activity ee.ioc.phon.android.recognizerintent/.demo.RepeaterDemo
}}}
Hi kaljurand,
(I put this question on alumae's GitHub page as well; I thought maybe you could answer it too. Thank you.)
I am new to Kaldi and the GStreamer plugin.
I have a question though: what is the reason behind choosing GStreamer to pass the audio to Kaldi for decoding?
Could a simple server be sufficient to pass the audio to Kaldi for decoding? Are there any limitations to such an approach that made you choose GStreamer as an intermediary? (Pardon my ignorance if my thought process is wrong.)
Thanks in advance
Original issue 15 created by Kaljurand on 2011-11-22T09:13:11.000Z:
Currently the standard RESULT_NETWORK_ERROR result code is returned but no calling app seems to care about it and properly display it to the user. So the solution would be to inform the user in our own UI, as Google Voice Search does.
Original issue 3 created by Kaljurand on 2011-10-27T15:19:54.000Z:
Also, its performance depends too much on the underlying hardware, e.g. the detection works much better on HTC Wildfire than on Samsung Galaxy S II.
Original issue 22 created by Kaljurand on 2012-02-08T07:43:18.000Z:
Example: the user is switching keyboard layouts (e.g. between Estonian and Russian), and expects the speech recognizer to switch its languages as well.
In case the default server cannot handle the language change (the server must be explicit about which languages it supports), then Kõnele could try to intelligently switch to another server, or even dispatch the job to another speech recognition service installed on the phone.
In the settings, the user should be able to give guidelines for such switching, e.g. specify which alternative service she prefers.
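The dispatch rule described above could be sketched as follows. The registry format and service names are hypothetical, not Kõnele's actual configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Sketch of the dispatch rule described above: keep the current service
// if it declares support for the requested language, otherwise pick the
// first configured alternative that does (registration order encodes the
// user's preference). The registry is hypothetical, for illustration.
public class ServiceSwitcher {
    // Service id -> languages the service declares support for.
    private final Map<String, Set<String>> registry = new LinkedHashMap<>();

    public void register(String serviceId, Set<String> languages) {
        registry.put(serviceId, languages);
    }

    public String pick(String currentService, String language) {
        Set<String> langs = registry.get(currentService);
        if (langs != null && langs.contains(language)) {
            return currentService;
        }
        for (Map.Entry<String, Set<String>> e : registry.entrySet()) {
            if (e.getValue().contains(language)) {
                return e.getKey();
            }
        }
        return currentService;  // no better option; let the call fail visibly
    }
}
```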
Original issue 17 created by Kaljurand on 2011-12-13T13:16:22.000Z:
See:
Original issue 30 created by Kaljurand on 2014-01-24T09:56:24.000Z:
Support extras:
used by Google Keep.
For services that support this (e.g. the service that is based on kaldi-gstreamer-server) present the status of the service (number of available slots, latency, etc.) somewhere / under certain conditions, e.g.
Motivated by #48
Original issue 6 created by Kaljurand on 2011-10-27T15:53:49.000Z:
Currently the phone switches to portrait-mode when the RecognizerIntentActivity is started.
Original issue 27 created by Kaljurand on 2013-01-13T20:03:55.000Z:
According to:
http://developer.android.com/reference/android/speech/RecognizerIntent.html#EXTRA_LANGUAGE
the user's preferred locale must be used for the identification of the input speech language in case EXTRA_LANGUAGE is not set.
This has some complications for K6nele because many Estonian users prefer a non-Estonian locale, because the Estonian translation of Android is often horrible. See also:
http://nugiline.wordpress.com/2011/09/03/androidi-eesti-keel-imeb-lurinal-samsung-galaxy-s-ii-i9100/
Note also that IMEs have their own technique of identifying the language (by "selectedLanguage" String).
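The resolution order this implies could be sketched as below: the caller's EXTRA_LANGUAGE wins; otherwise a user-configured override in Kõnele (a hypothetical setting, motivated by the locale problem above); otherwise the device locale that the RecognizerIntent documentation names as the fallback.

```java
import java.util.Locale;

// Sketch of the language-resolution order discussed in the issue.
// The userOverride parameter is a hypothetical Kõnele setting, added
// because many Estonian users run their device in a non-Estonian locale.
public class LanguageResolver {
    public static String resolve(String extraLanguage, String userOverride,
                                 Locale deviceLocale) {
        if (extraLanguage != null && !extraLanguage.isEmpty()) {
            return extraLanguage;       // caller's EXTRA_LANGUAGE wins
        }
        if (userOverride != null && !userOverride.isEmpty()) {
            return userOverride;        // hypothetical in-app setting
        }
        return deviceLocale.toLanguageTag();  // documented fallback, e.g. "en-US"
    }
}
```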
Original issue 2 created by Kaljurand on 2011-10-27T15:12:33.000Z:
What steps will reproduce the problem?
What is the expected output?
A nice beep in the style of Siri or Google Voice Search
What do you hear instead?
An ugly beep...
Unable to resolve dependency for ':app@debug/compileClasspath': Could not resolve project :speechutils:app.
Unable to resolve dependency for ':app@release/compileClasspath': Could not resolve project :speechutils:app.
Unable to resolve dependency for ':app@debugAndroidTest/compileClasspath': Could not resolve project :speechutils:app.
Original issue 18 created by Kaljurand on 2011-12-18T11:42:32.000Z:
In some cases where RESULT_SERVER_ERROR would be correct to return, RESULT_NO_MATCH is currently returned instead, e.g. when the server returns the results in an incorrect format. (This is actually more of an issue with the net-speech-api.)
I want to start a Ratchet WebSocket service to collect audio data from K6nele.
The WebSocket service is started, and I can open the WebSocket connection in JavaScript like:
var ws = createWebSocket("ws://10.6.71.10:8001");
However, when I use K6nele with the WebSocket URL "ws://10.6.71.10:8001" to open the connection, the handshake message showed:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: GJVr5M5/N98Sk4n1Yl2FnZI2/Os=
X-Powered-By: Ratchet/0.4.1
Is there any difference in how Kõnele opens a WebSocket compared to JavaScript?
regards,
Yunzhao
When I turn off the screen (from the physical button) when the Kõnele IME is active, K6nele starts a new recording session itself at the same moment.
When I do this repeatedly, Kõnele gets into a strange, inconsistent state where nothing works any more: nothing happens when I push the yellow button, the message "VIGA: lindistamine luhtus" ("ERROR: recording failed") appears, and nothing happens on the server.
Using latest apk from IME branch, compiled by you.
Original issue 23 created by Kaljurand on 2012-03-31T10:01:25.000Z:
Currently this non-well-formed response triggers the "network error" message:
{
"status": 0,
"hypotheses": [
],
"id": "776be34c343ec24asfaf08ed42a4998e442e"
}
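The fix the issue suggests amounts to classifying this response correctly: status 0 with an empty hypotheses list is a valid "nothing recognized" answer and should map to a no-match result, not a network error. A sketch of that logic, operating on already-parsed fields; the enum is illustrative, mirroring (but not identical to) the RecognizerIntent result codes:

```java
import java.util.List;

// Sketch of the suggested fix: a response with status 0 and an empty
// hypotheses list is a valid "nothing recognized" answer and should map
// to NO_MATCH, not NETWORK_ERROR. The enum is illustrative; the real
// code would return RecognizerIntent result codes.
public class ResponseClassifier {
    public enum Result { OK, NO_MATCH, SERVER_ERROR }

    public static Result classify(int status, List<String> hypotheses) {
        if (status != 0) {
            return Result.SERVER_ERROR;
        }
        if (hypotheses == null || hypotheses.isEmpty()) {
            return Result.NO_MATCH;  // previously misreported as a network error
        }
        return Result.OK;
    }
}
```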
Original issue 8 created by Kaljurand on 2011-10-27T18:40:19.000Z:
What steps will reproduce the problem?
What is the expected output? What do you see instead?
The Settings/Grammars list has become empty which is unexpected.