
k6nele's People

Contributors

kaljurand, licaon-kter, poussinou


k6nele's Issues

Modify transcription by user-specified regular expression

Original issue 25 created by Kaljurand on 2012-08-25T20:30:03.000Z:

In the Apps list context menu, add "Assign regexp" and "Remove regexp". "Assign regexp" would allow the user to specify a regular expression that modifies the transcription(s) that the server returns.

Use case: TuneIn Radio. TuneIn Radio supports voice search (in car mode, with device language set to English) but expects the returned transcription to start with "listen to". So, Kõnele cannot be used with TuneIn. The solution would be to automatically modify the transcription e.g. by:

s/^/listen to /

or

s/^mängi /listen to /

This would allow the user to say "tallinn põleb" (simply the search query; Estonian for "Tallinn is burning") or "mängi Arvo Pärt" (the search query with the Estonian prefix "mängi", i.e. "play").

Use case: removing brackets e.g. from arithmetical expressions: s/[)(]//g

Some modifications would actually require a chain of regexp transformations instead of just a single regexp.
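As an illustration, here is a minimal sketch of applying such a chain of rewrite rules to a transcription; the class and rule names are hypothetical and not Kõnele's actual code.

{{{
// Minimal sketch: apply an ordered chain of user-defined rewrite rules
// to a transcription. Class and rule names are hypothetical.
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RewriteChain {

    // Each rule is a (pattern, replacement) pair, applied in order.
    static class Rule {
        final Pattern pattern;
        final String replacement;
        Rule(String pattern, String replacement) {
            this.pattern = Pattern.compile(pattern);
            this.replacement = replacement;
        }
    }

    static String apply(String transcription, List<Rule> rules) {
        String result = transcription;
        for (Rule rule : rules) {
            result = rule.pattern.matcher(result).replaceAll(rule.replacement);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(
                new Rule("^mängi ", "listen to "),  // replace the Estonian prefix
                new Rule("[)(]", ""));              // drop brackets
        System.out.println(apply("mängi Arvo Pärt", rules));  // "listen to Arvo Pärt"
    }
}
}}}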

Support all keyboard apps

Original issue 16 created by Kaljurand on 2011-12-02T19:08:49.000Z:

Many keyboard apps do not use the RecognizerIntent interface. This includes the default Android keyboard and also the latest version of SlideIT. Support such keyboards as well.

Record in 44100Hz, downsample to 16000Hz

Original issue 20 created by Kaljurand on 2011-12-31T13:33:09.000Z:

The Android documentation says that "44100Hz is currently the only rate that is guaranteed to work on all devices, but other rates such as 22050, 16000, and 11025 may work on some devices".

We currently record at 16 kHz, which could be the reason why recording completely fails on some devices (e.g. the Samsung Galaxy Gio). The solution would be to record at the only officially supported sample rate (44.1 kHz) and then downsample the result to the sample rate with the best size/quality trade-off (16 kHz).

See also:

http://developer.android.com/reference/android/media/AudioRecord.html#AudioRecord(int, int, int, int, int)
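A rough sketch of the proposed approach, assuming 16-bit mono PCM; the decimation below is a simple nearest-neighbour pick, so a real implementation should low-pass filter first to avoid aliasing.

{{{
// Sketch only: record at 44100 Hz with AudioRecord and downsample to 16000 Hz.
// A production version must low-pass filter before decimating.
import android.media.AudioFormat;
import android.media.AudioRecord;
import android.media.MediaRecorder;

public class Downsampler {
    static final int RATE_IN = 44100;
    static final int RATE_OUT = 16000;

    static AudioRecord createRecorder() {
        int minBuf = AudioRecord.getMinBufferSize(RATE_IN,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT);
        return new AudioRecord(MediaRecorder.AudioSource.MIC, RATE_IN,
                AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT, minBuf * 2);
    }

    // Nearest-neighbour decimation from 44100 Hz to 16000 Hz.
    static short[] downsample(short[] in) {
        int outLen = (int) ((long) in.length * RATE_OUT / RATE_IN);
        short[] out = new short[outLen];
        for (int i = 0; i < outLen; i++) {
            out[i] = in[(int) ((long) i * RATE_IN / RATE_OUT)];
        }
        return out;
    }
}
}}}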

Build net-speech-api from source

Jar files are binaries that shouldn't be inside the repository. Could you wire it up to build from the source code (e.g. using submodules) or push net-speech-api to Maven Central?

No transcription found error

Many thanks for the application, it helps me a lot!

I have been using it heavily for a couple of months. Unfortunately, it has now stopped working and responds with a "no transcription found" error when I speak. How can I fix this?

Implement the IME interface

Original issue 29 created by Kaljurand on 2014-01-18T14:14:12.000Z:

Kõnele can currently be used as an activity and as a service. It would be useful if it could act as an IME (input method editor) as well. The GUI could be similar to the Google Voice Search IME. For a structured input type (number, datetime, phone), we could do grammar-based recognition.
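A hypothetical sketch of the IME side, showing only how a finished transcription could be committed into the current text field and how a structured input type could be detected; the recognizer callback name is an assumption.

{{{
// Hypothetical sketch of a voice IME; not Kõnele's actual implementation.
import android.inputmethodservice.InputMethodService;
import android.text.InputType;
import android.view.inputmethod.EditorInfo;
import android.view.inputmethod.InputConnection;

public class VoiceIme extends InputMethodService {

    // Assumed to be called by the recognizer with the final transcription.
    void onTranscription(String text) {
        InputConnection ic = getCurrentInputConnection();
        if (ic != null) {
            ic.commitText(text, 1);
        }
    }

    @Override
    public void onStartInputView(EditorInfo info, boolean restarting) {
        super.onStartInputView(info, restarting);
        // For structured input types one could pick grammar-based recognition.
        int cls = info.inputType & InputType.TYPE_MASK_CLASS;
        boolean structured = cls == InputType.TYPE_CLASS_NUMBER
                || cls == InputType.TYPE_CLASS_PHONE
                || cls == InputType.TYPE_CLASS_DATETIME;
        // structured -> grammar-based recognition; otherwise open dictation.
    }
}
}}}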

Improve grammar management

Original issue 5 created by Kaljurand on 2011-10-27T15:42:48.000Z:

It should be possible to keep the Grammars DB in sync with some official online resource.

Alternatively, make the Grammars list only available off-phone, via a mobile-friendly webpage from where the user can pick grammar URLs to assign them to Apps.

Pause the media player for the duration of recording

Original issue 14 created by Kaljurand on 2011-11-09T22:22:43.000Z:

It would make sense to pause the media player for the duration of recording, otherwise there is too much interference affecting the pause detection and the eventual transcription quality. E.g. Google Voice Search pauses the media player.
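One way to achieve this, sketched here under the assumption that the recording component exposes start/stop hooks, is to hold transient audio focus while recording; well-behaved media players pause or duck until the focus is abandoned.

{{{
// Sketch: hold transient audio focus for the duration of recording.
import android.content.Context;
import android.media.AudioManager;

public class AudioFocusHelper {

    private final AudioManager audioManager;
    private final AudioManager.OnAudioFocusChangeListener listener =
            new AudioManager.OnAudioFocusChangeListener() {
                @Override
                public void onAudioFocusChange(int focusChange) {
                    // No-op in this sketch.
                }
            };

    AudioFocusHelper(Context context) {
        audioManager = (AudioManager) context.getSystemService(Context.AUDIO_SERVICE);
    }

    void onRecordingStarted() {
        audioManager.requestAudioFocus(listener, AudioManager.STREAM_MUSIC,
                AudioManager.AUDIOFOCUS_GAIN_TRANSIENT);
    }

    void onRecordingStopped() {
        audioManager.abandonAudioFocus(listener);
    }
}
}}}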

Add testing/evaluation/calibration functionality

Original issue 11 created by Kaljurand on 2011-11-03T12:10:04.000Z:

There should be a built-in tool which guides the user through a list of written utterances (with their corresponding normalizations) and asks the user to speak each of them, continuously showing how often the speech recognizer produces the matching transcription.

The purpose is to test/evaluate the speech recognizer, or to train the recognizer (for the latter, the API needs to provide a way to communicate the existing written utterance to the server, e.g. with every query).

In terms of the UI:

  • it could be something similar to the current Repeater demo (and it could actually also replace it as a demo);
  • it should be able to pull existing written utterances from a web service (which possibly generates them randomly from a given grammar);
  • it should display a final report about the recognition accuracy/speed and allow sharing it via other apps.

All this functionality could also be packaged as an independent app, but it's probably easier to start building it as part of RecognizerIntent.
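The evaluation core could be as simple as the following sketch, which assumes parallel lists of reference utterances and transcriptions and reports exact-match accuracy (word error rate would be a natural extension).

{{{
// Sketch of the evaluation core: exact-match accuracy over parallel lists.
import java.util.List;

public class Evaluator {

    static String normalize(String s) {
        return s.trim().toLowerCase().replaceAll("\\s+", " ");
    }

    static double accuracy(List<String> references, List<String> transcriptions) {
        if (references.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        int n = Math.min(references.size(), transcriptions.size());
        for (int i = 0; i < n; i++) {
            if (normalize(references.get(i)).equals(normalize(transcriptions.get(i)))) {
                correct++;
            }
        }
        return (double) correct / references.size();
    }
}
}}}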

insufficient permissions on Android Emulator (Nexus 5 API 26)

Hi kaljurand,

I am trying to run this on the Android emulator in Android Studio (I do not have an Android phone) and I am getting this error.

Any suggestions on how to get past this? (I tried the option you suggested in one of the closed issues, but it did not seem to work.)

Thanks

Insufficient permissions on Android 6

When I try to use K6nele as a keyboard to dictate to a text field, I get the "[ insufficient permissions ]" error, reported on top of the yellow mic button, after I tap the button to dictate.

Nexus 5X, Android 6.0.1
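On Android 6+ the RECORD_AUDIO permission has to be granted at runtime, which is a common cause of this message; a minimal sketch of the check (class name hypothetical):

{{{
// Sketch: request the runtime RECORD_AUDIO permission on Android 6+ (API 23+).
import android.Manifest;
import android.app.Activity;
import android.content.pm.PackageManager;

public class MicPermission {
    static final int REQUEST_RECORD_AUDIO = 1;

    static void ensureMicPermission(Activity activity) {
        if (activity.checkSelfPermission(Manifest.permission.RECORD_AUDIO)
                != PackageManager.PERMISSION_GRANTED) {
            activity.requestPermissions(
                    new String[]{Manifest.permission.RECORD_AUDIO}, REQUEST_RECORD_AUDIO);
        }
    }
}
}}}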

Could not find method google() for arguments [] on repository container

Hi,

I am getting the above error. Could you please tell me what is wrong?
Thanks

additional info:

username$ gradle assemble --info

Starting Build
Settings evaluated using settings file '/Applications/Dev/AS/Samples/K6nele/settings.gradle'.
Projects loaded. Root project using build file '/Applications/Dev/AS/Samples/K6nele/build.gradle'.
Included projects: [root project 'K6nele', project ':app', project ':net-speech-api', project ':speechutils', project ':speechutils:app']
Evaluating root project 'K6nele' using build file '/Applications/Dev/AS/Samples/K6nele/build.gradle'.

FAILURE: Build failed with an exception.

  • Where:
    Build file '/Applications/Dev/AS/Samples/K6nele/build.gradle' line: 3

  • What went wrong:
    A problem occurred evaluating root project 'K6nele'.

Could not find method google() for arguments [] on repository container.

  • Try:
    Run with --stacktrace option to get the stack trace. Run with --debug option to get more log output.

Support RecognizerIntent EXTRAs from higher API levels

Original issue 7 created by Kaljurand on 2011-10-27T16:06:32.000Z:

There is currently support for EXTRAs up to API Level 3. Android has added a few new EXTRAs in API levels 8, 11, and 14. For the most part these are rarely used and non-essential but some form of support would still be nice, e.g. print a log message for every unsupported EXTRA to inform the developer that his/her EXTRA was ignored.

See also:

http://developer.android.com/reference/android/speech/RecognizerIntent.html
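A sketch of the suggested logging, with an illustrative (not complete) set of supported extras:

{{{
// Sketch: warn about every incoming EXTRA that is not handled, so the calling
// developer knows it was ignored. The SUPPORTED set is illustrative only.
import android.content.Intent;
import android.os.Bundle;
import android.speech.RecognizerIntent;
import android.util.Log;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ExtraChecker {
    private static final String TAG = "RecognizerIntent";

    private static final Set<String> SUPPORTED = new HashSet<>(Arrays.asList(
            RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.EXTRA_MAX_RESULTS,
            RecognizerIntent.EXTRA_PROMPT));

    static void warnAboutUnsupportedExtras(Intent intent) {
        Bundle extras = intent.getExtras();
        if (extras == null) {
            return;
        }
        for (String key : extras.keySet()) {
            if (!SUPPORTED.contains(key)) {
                Log.w(TAG, "Ignoring unsupported EXTRA: " + key);
            }
        }
    }
}
}}}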

Emulator in Android Studio

Hello :)
Please help me :( When running the Android application, the emulator (Nexus 5x API 28-2) remains in this state.
Is there a solution?
Thank you.
(screenshot attached)

RecognizerIntent sometimes starts in a wrong activity

Original issue 13 created by Kaljurand on 2011-11-04T19:08:41.000Z:

What steps will reproduce the problem?

  1. Start from a launcher icon, and go into a deeper menu, e.g. Demos
  2. Press HOME (leaving the Demos' activity on the top of the stack)
  3. Launch an app with a textfield (e.g. an SMS app)
  4. Try to fill the textfield via SwiftKey X's microphone button

What is the expected output? What do you see instead?

Instead of RecognizerIntentActivity (which provides the recorder/recognizer box), you'll see the Demos' activity, i.e. the current topmost activity in the RecognizerIntent activity stack.

To work around this problem, one must press BACK until reaching the HOME-screen, so that RecognizerIntent is completely destroyed and then go back to filling the textfield.

When called from another app, RecognizerIntentActivity must ALWAYS start.

Doesn't work with SwiftkeyX

Original issue 1 created by Kaljurand on 2011-09-30T12:57:35.000Z:

SwiftkeyX Android keyboard has a little microphone button for entering text via speech recognition. I can successfully open recognizer-intent with it, and it recognizes my speech, but the resulting text is not written to the current text field.

Works with Google Voice Search: Voice Search gives a list of recognition hypotheses to select from, and the selected text is written to the text field.
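For reference, this is the mechanism a RecognizerIntent activity is expected to use to hand results back to the caller (the keyboard then inserts the selected hypothesis itself); a minimal sketch with a hypothetical class name:

{{{
// Sketch: return recognition hypotheses to the calling app.
import android.app.Activity;
import android.content.Intent;
import android.speech.RecognizerIntent;
import java.util.ArrayList;

public class ReturnResults {

    static void finishWithResults(Activity activity, ArrayList<String> hypotheses) {
        Intent result = new Intent();
        result.putStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS, hypotheses);
        activity.setResult(Activity.RESULT_OK, result);
        activity.finish();
    }
}
}}}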

Offline Speech Recognition

Can we implement offline speech recognition with this library? Can you provide a solution or suggest another way? Is this functionality implemented in Kõnele?

speech-to-text crashes every time on 4.1.2

Original issue 26 created by Kaljurand on 2012-10-21T16:56:02.000Z:

What steps will reproduce the problem?

  1. tap on the red icon to talk
  2. talk 3 - 5 seconds
  3. stop talking and let it process the speech

What is the expected output? What do you see instead?

The app crashes (logcat attached).

What version of the product are you using? On what operating system?

android 4.1.2

Please provide any additional information below.

User should be able to override incoming extras

Original issue 10 created by Kaljurand on 2011-11-03T11:08:44.000Z:

When calling RECOGNIZE_SPEECH, the 3rd party app sets certain EXTRAS. The user should be able to configure whether and how these extras are replaced, by setting, for each app and each extra, one of the following:

  • override with a new (possibly empty) value
  • do not change (DEFAULT)

For example, if a 3rd party app contains a hard-coded reference to a grammar URL which does not resolve and which cannot be changed in the app by the user, then the user could still change it in the Apps-list of RecognizerIntent.
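A sketch of how the merge could work, assuming a hypothetical per-app store that maps extra names to replacement values:

{{{
// Sketch: apply per-app user overrides on top of the caller's extras.
import android.content.Intent;
import java.util.Map;

public class ExtraOverrides {

    // userOverrides: extra name -> replacement value (hypothetical per-app store).
    static Intent applyOverrides(Intent incoming, Map<String, String> userOverrides) {
        Intent result = new Intent(incoming);
        for (Map.Entry<String, String> e : userOverrides.entrySet()) {
            // An empty value still counts as an override ("possibly empty value").
            result.putExtra(e.getKey(), e.getValue());
        }
        return result;
    }
}
}}}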

Execution failed for task ':app:processDebugResources'. Failed to execute aapt

Hi Kaljurand

I am facing the following error when trying to build K6nele

Caused by: com.android.tools.aapt2.Aapt2Exception: AAPT2 error: check logs for details
at com.android.builder.png.AaptProcess$NotifierProcessOutput.handleOutput(AaptProcess.java:463)
at com.android.builder.png.AaptProcess$NotifierProcessOutput.err(AaptProcess.java:415)
at com.android.builder.png.AaptProcess$ProcessOutputFacade.err(AaptProcess.java:332)
at com.android.utils.GrabProcessOutput$1.run(GrabProcessOutput.java:104)

FAILURE: Build failed with an exception.

  • What went wrong:
    Execution failed for task ':app:processDebugResources'.

Failed to execute aapt

  • Try:
    Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output. Run with --scan to get full insights.

  • Get more help at https://help.gradle.org

Deprecated Gradle features were used in this build, making it incompatible with Gradle 5.0.
See https://docs.gradle.org/4.6/userguide/command_line_interface.html#sec:command_line_warnings

BUILD FAILED in 0s
28 actionable tasks: 1 executed, 27 up-to-date

Any suggestion?
Thanks
Mohamed Eldesouki

How to set Notes or another application to use Kaldi and not Google ASR?

Hi Kaarel Kaljurand,
I'm testing the library and I'm excited about its features. In one of the tests, I used an Android tablet with the language set to Estonian and automatic recognition set to Kaldi, while the websocket pointed to my local server where Kaldi runs with Spanish models. Kõnele recognized speech as expected, but when I use an application such as Notes or Handoff, the system uses Google's Spanish models, even though it is set to Estonian (fast recognition). I disabled "Google OK" but it still uses the Google model. How can I use other applications with Kaldi? Thanks in advance.

Force close resulting from sendChunk (that uses HttpURLConnectionImpl)

Original issue 12 created by Kaljurand on 2011-11-03T15:09:51.000Z:

On HTC Wildfire (Android v2.2), sometimes getting this NPE:

{{{
E/AndroidRuntime( 5655): java.lang.NullPointerException
E/AndroidRuntime( 5655): at org.apache.harmony.luni.internal.net.www.protocol.http.HttpURLConnectionImpl.readln(HttpURLConnectionImpl.java:1293)
E/AndroidRuntime( 5655): at org.apache.harmony.luni.internal.net.www.protocol.http.HttpURLConnectionImpl.readServerResponse(HttpURLConnectionImpl.java:1351)
E/AndroidRuntime( 5655): at org.apache.harmony.luni.internal.net.www.protocol.http.HttpURLConnectionImpl.doRequest(HttpURLConnectionImpl.java:1644)
E/AndroidRuntime( 5655): at org.apache.harmony.luni.internal.net.www.protocol.http.HttpURLConnectionImpl.getInputStream(HttpURLConnectionImpl.java:1153)
E/AndroidRuntime( 5655): at ee.ioc.phon.netspeechapi.recsession.ChunkedWebRecSession.sendChunk(Unknown Source)
E/AndroidRuntime( 5655): at ee.ioc.phon.android.recognizerintent.RecognizerIntentActivity.sendChunk(RecognizerIntentActivity.java:706)
E/AndroidRuntime( 5655): at ee.ioc.phon.android.recognizerintent.RecognizerIntentActivity.access$4(RecognizerIntentActivity.java:703)
E/AndroidRuntime( 5655): at ee.ioc.phon.android.recognizerintent.RecognizerIntentActivity$6.run(RecognizerIntentActivity.java:560)
W/ActivityManager( 103): Force finishing activity ee.ioc.phon.android.recognizerintent/.demo.RepeaterDemo
}}}

question

Hi kaljurand,

(I posted this question on alumae's GitHub page as well; I thought maybe you could answer it too. Thank you.)

I am new to Kaldi and the GStreamer plugin.

I have a question though: what is the reason behind choosing GStreamer to pass the audio to Kaldi for decoding?

Could a simple server be sufficient to pass the audio to Kaldi for decoding? Are there any limitations to such an approach that made you choose GStreamer as an intermediary? (Pardon my ignorance if my thought process is wrong.)

Thanks in advance

Optionally dispatch the recognition task to another recognizer

Original issue 22 created by Kaljurand on 2012-02-08T07:43:18.000Z:

Example: the user is switching keyboard layouts (e.g. between Estonian and Russian), and expects the speech recognizer to switch its languages as well.

If the default server cannot handle the language change (the server must be explicit about which languages it supports), Kõnele could try to intelligently switch to another server, or even dispatch the job to another speech recognition service installed on the phone.

In the settings, the user should be able to give guidelines for such switching, e.g. specify which alternative service she prefers.
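The dispatch decision itself could be as simple as the following sketch, assuming a hypothetical registry of which languages each server declares:

{{{
// Sketch: pick a recognition server for the requested language, falling back
// to a user-preferred alternative when the default server does not support it.
import java.util.Map;
import java.util.Set;

public class RecognizerDispatcher {

    static String pickServer(String language, String defaultServer,
                             Map<String, Set<String>> serverLanguages,
                             String userPreferredFallback) {
        Set<String> supported = serverLanguages.get(defaultServer);
        if (supported != null && supported.contains(language)) {
            return defaultServer;
        }
        return userPreferredFallback;
    }
}
}}}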

Support Google Keep "get audio" extras

Original issue 30 created by Kaljurand on 2014-01-24T09:56:24.000Z:

Support extras:

  • android.speech.extra.GET_AUDIO_FORMAT: audio/AMR
  • android.speech.extra.GET_AUDIO: true

used by Google Keep.
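A sketch of how the result intent could honour these extras, assuming the recorded audio has already been saved and is exposed via a content Uri in the requested format:

{{{
// Sketch: return the recorded audio alongside the transcriptions when the
// caller asked for it via the GET_AUDIO extras. Format conversion is out of scope.
import android.app.Activity;
import android.content.Intent;
import android.net.Uri;
import android.speech.RecognizerIntent;
import java.util.ArrayList;

public class ResultBuilder {

    static void setResultWithAudio(Activity activity, Intent request,
                                   ArrayList<String> transcriptions, Uri recordedAudio) {
        Intent result = new Intent();
        result.putStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS, transcriptions);
        boolean wantsAudio = request.getBooleanExtra("android.speech.extra.GET_AUDIO", false);
        String format = request.getStringExtra("android.speech.extra.GET_AUDIO_FORMAT");
        if (wantsAudio && recordedAudio != null && format != null) {
            result.setDataAndType(recordedAudio, format);  // e.g. "audio/AMR"
        }
        activity.setResult(Activity.RESULT_OK, result);
    }
}
}}}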

Show service status in the UI

For services that support this (e.g. the service based on kaldi-gstreamer-server), present the status of the service (number of available slots, latency, etc.) somewhere, under certain conditions, e.g.

  • add a live message into "Settings/Recognition services/K6nele (fast recognition)" showing the number of currently available slots (similarly to http://bark.phon.ioc.ee:82/dev/duplex-speech-api/static/status.html), or a network error message;
  • after 2 consecutive network error messages in the IME / voice panel, test the network, and if it's otherwise working then pop up a dialog blaming the server, possibly offering the email address of the server maintainer as a target for complaints.

Motivated by #48
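The status lookup itself could follow this sketch; the URL and the contents of the response are assumptions modelled on the kaldi-gstreamer-server status page linked above.

{{{
// Sketch: fetch the server status page and return its body (assumed to contain
// the number of available workers) for display in the settings UI.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class StatusChecker {

    static String fetchStatus(String statusUrl) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(statusUrl).openConnection();
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(3000);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line);
            }
            return sb.toString();
        } finally {
            conn.disconnect();
        }
    }
}
}}}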

Correctly handle the case where EXTRA_LANGUAGE is not set

Original issue 27 created by Kaljurand on 2013-01-13T20:03:55.000Z:

According to:

http://developer.android.com/reference/android/speech/RecognizerIntent.html#EXTRA_LANGUAGE

the user's preferred locale must be used for the identification of the input speech language in case EXTRA_LANGUAGE is not set.

This has some complications for K6nele because many Estonian users prefer a non-Estonian locale, since the Estonian translation of Android is often horrible. See also:

http://nugiline.wordpress.com/2011/09/03/androidi-eesti-keel-imeb-lurinal-samsung-galaxy-s-ii-i9100/

Note also that IMEs have their own technique for identifying the language (the "selectedLanguage" String).
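A minimal sketch of the intended fallback (the exact language-code format the server expects is an assumption):

{{{
// Sketch: use EXTRA_LANGUAGE when present, otherwise fall back to the
// user's preferred locale as required by the RecognizerIntent documentation.
import android.content.Intent;
import android.speech.RecognizerIntent;
import java.util.Locale;

public class LanguageResolver {

    static String resolveLanguage(Intent intent) {
        String lang = intent.getStringExtra(RecognizerIntent.EXTRA_LANGUAGE);
        if (lang != null && !lang.isEmpty()) {
            return lang;
        }
        return Locale.getDefault().toLanguageTag();  // e.g. "et-EE"
    }
}
}}}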

The beep-sounds should be improved

Original issue 2 created by Kaljurand on 2011-10-27T15:12:33.000Z:

What steps will reproduce the problem?

  1. Switch on "Play audio cues"
  2. Record/recognize something
  3. Notice the ugliness...

What is the expected output?

A nice beep in the style of Siri or Google Voice Search

What do you hear instead?

An ugly beep...

Unable to compile project due to dependency

Unable to resolve dependency for ':app@debug/compileClasspath': Could not resolve project :speechutils:app.

Unable to resolve dependency for ':app@release/compileClasspath': Could not resolve project :speechutils:app.

Unable to resolve dependency for ':app@debugAndroidTest/compileClasspath': Could not resolve project :speechutils:app.

websocket issue

I want to start a Ratchet websocket service to collect audio data from K6nele.
The websocket service is started, and I can open the websocket connection in JavaScript like:
var ws = createWebSocket("ws://10.6.71.10:8001");

However, when I use K6nele with the websocket URL "ws://10.6.71.10:8001" to open the websocket connection, the handshake message shows:

HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: GJVr5M5/N98Sk4n1Yl2FnZI2/Os=
X-Powered-By: Ratchet/0.4.1

Is there any difference in how K6nele opens a websocket compared to JavaScript?

regards,
Yunzhao

IME: Turning off screen starts recording

When I turn off the screen (with the physical button) while the Kõnele IME is active, Kõnele starts a new recording session at that same moment.

When I do this repeatedly, Kõnele gets into a strange, inconsistent state where nothing works any more: nothing happens when I push the yellow button, the message "VIGA: lindistamine luhtus" ("ERROR: recording failed") is shown, and nothing happens on the server.

Using the latest APK from the IME branch, compiled by you.
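One plausible fix, sketched here with a stand-in Recorder interface, is to listen for ACTION_SCREEN_OFF while the IME view is shown and cancel any ongoing recording instead of starting one.

{{{
// Sketch: cancel recording when the screen turns off. ACTION_SCREEN_OFF must be
// registered dynamically; registration should follow the IME view lifecycle.
import android.content.BroadcastReceiver;
import android.content.Context;
import android.content.Intent;
import android.content.IntentFilter;

public class ScreenOffGuard {

    interface Recorder {  // stand-in for the actual recording component
        void cancel();
    }

    static BroadcastReceiver register(Context context, final Recorder recorder) {
        BroadcastReceiver receiver = new BroadcastReceiver() {
            @Override
            public void onReceive(Context ctx, Intent intent) {
                if (Intent.ACTION_SCREEN_OFF.equals(intent.getAction())) {
                    recorder.cancel();
                }
            }
        };
        context.registerReceiver(receiver, new IntentFilter(Intent.ACTION_SCREEN_OFF));
        return receiver;
    }
}
}}}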

The Grammars-list is emptied for no reason

Original issue 8 created by Kaljurand on 2011-10-27T18:40:19.000Z:

What steps will reproduce the problem?

  1. Use RecognizerIntent via some app
  2. Go to Settings/Apps and delete the entry corresponding to the app

What is the expected output? What do you see instead?

The Settings/Grammars list has become empty, which is unexpected.
