xbot_head's Introduction

XBot Head

Overview

XBot Head is an Android application for the XBot robot. Many ROS packages run on Xbot. For more details, please visit http://wiki.ros.org/Robots/Xbot/indigo.

Xbot Head can recognise specific faces with the support of a recognition server. Users can register in just a few steps. It can also control the Android device's media player to play audio files about the Software Museum of the Chinese Academy of Sciences.

Xbot Head communicates with Xbot through the rosbridge_suite protocol, which can improve interaction between ROS devices compared with RosJava.

About Xbot

Download Xbot Head: http://fir.im/u4rz

Prerequisites

  • Before using this application, make sure the ROS server and the recognition server have been started correctly.
  • After the Xbot Head application starts, configure the IP address of the ROS server and the IP address of the recognition server on the settings page of Xbot Head.

Features

1. User Registration: A user registers with the service by taking a portrait photo. The photo is sent to the recognition server, which will recognise the user the next time he or she appears.

2. Face Recognition & Audio Commentary: After face detection and face recognition, the app greets the user and then plays audio files about the Software Museum of the Chinese Academy of Sciences. Xbot can be used for commentary in many settings.

3. Face Sign-In Mode: Face sign-in can be used in common settings such as offices and schools. Xbot Head performs the sign-in in cooperation with Xbot.

4. Comprehensive Interaction: This is the AI-Talk mode. It starts a conversation between a person and Xbot Head.

5. Manipulation & Controller: A separate application, XbotPlayer, is used to control the movement of Xbot.

ROS Topic Statement

Commentary Mode

This mode uses two topics:

  • /audio_status: After the commentary audio starts, the application's background service publishes an AudioStatus message on the /audio_status topic. The message definition is:

    int32 id
    bool iscomplete
    

    int32 id -- The id of the commentary that the media player is currently playing.

    bool iscomplete -- Whether the audio file has finished playing.

  • /museum_pos: When the application starts, it subscribes to this topic to track the current status of the movebase. When Xbot arrives at a location, it publishes a MuseumPosition message on this topic. The message definition is:

    int32 id
    bool ismoving
    

    int32 id -- The id of the area Xbot is currently in.

    bool ismoving -- Whether Xbot is moving.
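As a rough illustration of how these two topics might travel over rosbridge, the sketch below builds the JSON frames defined by the rosbridge protocol (`advertise`, `publish`, `subscribe`) in Python. The message type names `xbot_msgs/AudioStatus` and `xbot_msgs/MuseumPosition` are assumptions: this README gives only the field layout, not the package that defines the messages. It is not the app's actual code, which is an Android application.

```python
import json

# Hypothetical type names; the README does not name the defining ROS package.
AUDIO_STATUS_TYPE = "xbot_msgs/AudioStatus"
MUSEUM_POSITION_TYPE = "xbot_msgs/MuseumPosition"


def advertise(topic, msg_type):
    """Build a rosbridge 'advertise' frame announcing a publisher."""
    return json.dumps({"op": "advertise", "topic": topic, "type": msg_type})


def subscribe(topic, msg_type):
    """Build a rosbridge 'subscribe' frame for a topic."""
    return json.dumps({"op": "subscribe", "topic": topic, "type": msg_type})


def publish_audio_status(commentary_id, is_complete):
    """Build a 'publish' frame carrying the two AudioStatus fields."""
    return json.dumps({
        "op": "publish",
        "topic": "/audio_status",
        "msg": {"id": commentary_id, "iscomplete": is_complete},
    })


def parse_museum_pos(frame):
    """Return (id, ismoving) from a /museum_pos publish frame, else None."""
    data = json.loads(frame)
    if data.get("op") != "publish" or data.get("topic") != "/museum_pos":
        return None
    msg = data["msg"]
    return int(msg["id"]), bool(msg["ismoving"])
```

A client would send the `advertise` and `subscribe` frames once after connecting to the rosbridge WebSocket, then send a `publish` frame whenever the playback state changes.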

Face Sign-In Mode

This mode uses two topics:

  • /robot_status: When Xbot arrives at a target point, it publishes a RobotStatus message on the /robot_status topic. The message definition is:

    int32 id
    bool ismoving

    int32 id -- The id of the area Xbot is currently in.

    bool ismoving -- Whether Xbot is moving.

  • pad_sign_completion: Xbot Head recognizes each person and sends the recognition status to Xbot. After each checkpoint is finished, it sends a SignStatus message to Xbot on the pad_sign_completion topic. The message definition is:

    bool complete
    bool success

    bool complete -- Whether Xbot Head has finished a checkpoint, meaning that the recognition server has completed a recognition request or the request has timed out.

    bool success -- True when a person's face has been recognized successfully; false when recognition failed or the connection to the recognition server timed out.
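The relationship between the two SignStatus flags can be sketched as a small mapping from a recognition outcome to the message fields. The `Outcome` enum and the `sign_status` function are hypothetical names introduced for illustration; only the `complete` and `success` fields come from the description above.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    """Hypothetical outcomes of one recognition request."""
    RECOGNIZED = "recognized"
    NOT_RECOGNIZED = "not_recognized"
    TIMEOUT = "timeout"


def sign_status(outcome: Optional[Outcome]) -> dict:
    """Map a recognition outcome to the SignStatus fields.

    None means the request is still pending, so the checkpoint is not
    complete. Any finished request (including a timeout) is complete,
    but only a successful recognition sets success to True.
    """
    if outcome is None:
        return {"complete": False, "success": False}
    return {"complete": True, "success": outcome is Outcome.RECOGNIZED}
```

Under this reading, a timeout yields `complete = true` and `success = false`, which lets Xbot move on to the next checkpoint without waiting indefinitely.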

Contributors of this project

Thanks

bytefish: a sample Android application for face detection.

PoiCamera: an Android application built on the android.hardware.camera2 API.

IFLYTEK: an online service for voice recognition.

rosbridge_suite: the rosbridge protocol.

icons8.com: provides the icons. The License.

This project was originally developed by Nguyen Minh Tri.


License

Copyright 2017 Wei Wu
Copyright 2017 Songting Li

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

xbot_head's People

Contributors

betri28, lazyparser, lisongting

xbot_head's Issues

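iFLYTEK TTS test: faces are recognized but names are not announced

The Youtu client recognizes the face and displays YOUTU: ret, confidence, id below it.
If the confidence is below the 0.6 threshold, no matching id is considered found and the app announces "你好游客,这里是&#@" ("Hello visitor, this is &#@"); if the confidence reaches 0.6 or above, the matching id is recognized and the app announces "你好,这里是&¥#@" ("Hello, this is &¥#@").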

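[meta] Add command-word recognition: recognize the commands "停止解说" (stop commentary) and "恢复解说" (resume commentary)

This feature is one step in gradually adding intelligent features.
The goal of this issue is not to implement dialogue, but a simple voice-control function.

The current hope is that, once xbot has started playing a commentary, the user can say "停止解说" (stop commentary) into the pad's microphone and xbot will pause the commentary. Saying any one of the three commands "继续解说" (continue commentary), "恢复解说" (resume commentary), or "开始解说" (start commentary) makes xbot resume from where it stopped (not from the exact minute and second, but by restarting the current commentary point).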
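The command handling requested in this issue could look roughly like the sketch below. It assumes the speech recognizer already delivers the recognized phrase as a string; the `CommentaryPlayer` class and its fields are hypothetical and stand in for whatever the app uses to drive the media player.

```python
# Command phrases taken from the issue text.
PAUSE_COMMANDS = {"停止解说"}
RESUME_COMMANDS = {"继续解说", "恢复解说", "开始解说"}


class CommentaryPlayer:
    """Hypothetical stand-in for the app's commentary playback state."""

    def __init__(self, commentary_id):
        self.commentary_id = commentary_id  # current commentary point
        self.playing = True

    def handle_command(self, text):
        """Apply a recognized phrase and report which action was taken."""
        phrase = text.strip()
        if phrase in PAUSE_COMMANDS:
            self.playing = False
            return "pause"
        if phrase in RESUME_COMMANDS:
            # Resume restarts the current commentary point, not the
            # exact playback offset, as the issue describes.
            self.playing = True
            return "resume"
        return "ignore"
```

Unrecognized phrases are ignored on purpose, since the issue scopes the feature to simple voice control rather than dialogue.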

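Handling invalid input values in the settings screen

Two settings currently need invalid-value handling:

Server address: a valid value is in standard dotted-decimal form. Invalid values may include Chinese characters, English strings, strings mixing digits and non-digits, a separator (.) count other than 3, special characters (such as @》*), and so on.

Face detection threshold: a valid value is a number in the open interval (0, 1). Invalid values may include Chinese characters, English strings, numbers greater than or equal to 1 or less than or equal to 0, special characters, and so on.

We also need to decide which value to restore after the user enters an invalid one (I think it should revert to the last valid value that was set, or to the default if the user has never set one).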
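A minimal sketch of the validation and fallback rules proposed in this issue follows. The default values `DEFAULT_SERVER_IP` and `DEFAULT_THRESHOLD` are placeholders chosen for illustration, not the app's real defaults, and all function names are hypothetical.

```python
DEFAULT_SERVER_IP = "192.168.0.1"  # placeholder default, not from the app
DEFAULT_THRESHOLD = "0.6"          # placeholder default, not from the app


def is_valid_ipv4(text):
    """Accept only dotted-decimal addresses with four parts in 0..255."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Rejects empty parts, letters, symbols, and non-ASCII digits.
        if not (part.isascii() and part.isdigit()) or len(part) > 3:
            return False
        if int(part) > 255:
            return False
    return True


def is_valid_threshold(text):
    """Accept only numbers strictly between 0 and 1."""
    try:
        value = float(text)
    except ValueError:
        return False
    return 0.0 < value < 1.0


def resolve_setting(new_text, last_valid, default, is_valid):
    """Keep valid input; otherwise fall back to the last valid value,
    or to the default when no valid value was ever set."""
    if is_valid(new_text):
        return new_text
    return last_valid if last_valid is not None else default
```

For example, `resolve_setting("abc", None, DEFAULT_SERVER_IP, is_valid_ipv4)` falls back to the default because no valid address has been stored yet.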

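[intermediate] Rewrite the face detection and face cropping code to improve detection speed

The existing code was forked directly from a demo project on GitHub.
It works, but because it was written as a demo, speed was not a concern.
The measured speed is currently 3 to 5 FPS (this depends on the device; our JDTab or Xiaomi Pad 3 is the reference).
It is somewhat laggy. We would like to optimize or rewrite the code so that it ideally reaches 7 to 10 FPS.

In addition, the existing code contains many magic numbers. Most of the calculations for cropping the face box were hand-tuned by me at the time and have since become unreadable. This part could be extracted into a small class.

Also, if I remember correctly, the code that crops the image sent to the Youtu server and the code that draws the crop box on screen are two copy-pasted parts. They should be merged (at least I think they should).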

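[goodfirstbug] A better way to enter IP addresses

The IP address is currently entered as a single string.
Ideally we could:

  • Split the input into four numeric fields of up to three digits each (one per octet), so that the input boxes can be restricted to numeric input.
    Further, as a bonus:
  • Offer the two common subnet prefixes 192.168. and 10.0.0. for direct selection, which makes input easier during testing.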

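User registration

The user registration feature needs to:
ask the user to enter a name and then capture a portrait photo, send them to the server, and then show on screen whether registration succeeded or failed.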

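Add an automated build service to the project

GitHub has many automated build services aimed at open source projects, such as Travis.
From what I have seen, Travis may not necessarily be suitable for building Android projects.
GitHub currently integrates several hundred CI services, so there is almost certainly one for Android projects.

The goal of this issue is to survey the CI services that can be integrated with GitHub for free and to add automated builds to the xbothead project.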

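TextToSpeech in Android

Android's TextToSpeech can read text aloud.
However, it currently supports only the following languages (Chinese is not supported): English, German, French, Spanish, and Italian.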

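Integrating TensorFlow in Android

TensorFlow is Google's machine learning framework.

  • I browsed several websites. The official TensorFlow site is https://www.tensorflow.org/, but I have no VPN and cannot access it. The only sources that seemed reasonably reliable were the material provided by Jikexueyuan (极客学院) and the TensorFlow Chinese community, but neither site explains how to use TensorFlow on Android.
  • On GitHub, TensorFlow provides only a huge 88 MB demo (demo only, without source code), at the very bottom of the official repository. I downloaded and ran it; the interface is much like caffe2's AI-camera, and both recognize objects.
  • In short, there is still too little reference material on TensorFlow.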

Improving the look of the main screen

Yesterday I found a good icon resource site: http://www.iconsdb.com/

iconsDB.com currently has 4113 icons in the database that you can customize and download in any color and any size you want ! 412,028,303 icon downloads and counting ! 2659 icons can be used freely in both personal and commercial projects with no attribution required, but always appreciated and 1454 icons require a link to be used. All logos and trademarks presented in some icons are copyright of their respective trademark owners.

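The icons there are copyright-free and can be used commercially.
I plan to pick a few suitable ones to replace the current native button interface.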

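[medium] Investigate how to use Baidu's TTS service, in preparation for adding it to the app

On 20170519 we will have another small release, in which we hope to use Baidu's online TTS service to speak registered users' names aloud.

Current implementation: the names of several developers were generated with Baidu TTS and saved in advance. Ordinary visitors and registered users who are not staff are uniformly announced as "游客" (visitor) or "已注册用户" (registered user).

Expected changes from this issue:

  • After Tencent Youtu returns a face detection result (an ID and a probability),
  • query the name for that ID through the Youtu API (this requires a new POST request; the payload is JSON, and so is the response);
  • call the Baidu TTS API with the name, which returns audio/mp3;
  • cache the audio/mp3 and add it to the TTS playlist.

Bonus (that is, extra credit): cache names that have already been queried, to reduce the number of network queries.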
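The caching bonus in this issue can be sketched as a small wrapper around the two network calls. The callables `lookup_name` and `synthesize` are hypothetical stand-ins for the Youtu name query and the Baidu TTS request; their real request formats are not shown in this issue, so the sketch takes them as injected functions rather than guessing at either API.

```python
from typing import Callable, Dict


class NameAudioCache:
    """Cache ID-to-name lookups and name-to-audio synthesis results."""

    def __init__(self,
                 lookup_name: Callable[[str], str],
                 synthesize: Callable[[str], bytes]):
        self._lookup_name = lookup_name  # stand-in for the Youtu query
        self._synthesize = synthesize    # stand-in for the TTS request
        self._names: Dict[str, str] = {}
        self._audio: Dict[str, bytes] = {}

    def audio_for(self, person_id: str) -> bytes:
        """Return the greeting audio for an ID, hitting the network
        only the first time a given ID or name is seen."""
        name = self._names.get(person_id)
        if name is None:
            name = self._lookup_name(person_id)
            self._names[person_id] = name
        audio = self._audio.get(name)
        if audio is None:
            audio = self._synthesize(name)
            self._audio[name] = audio
        return audio
```

The returned bytes would then be appended to the TTS playlist. An in-memory dictionary is the simplest choice here; persisting the cache across app restarts would be a separate decision.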

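[meta] Refactor the code

The code is currently very messy; in effect, two apps share one repository. Many parts were copied and pasted, and a lot of hard-coding done for the demo needs to be removed.

The goal is to modularize the different features and extract the parts shared by the two apps into a jar library, while also making more parts configurable.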

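[feature] Add a feature that lets the xbothead app act as a remote control for the xbot mobile base

In principle this still works through topics. The keyop approach can serve as a reference: the xbot project contains a keyop project that may be a good starting point. The specific interaction can be agreed on together once roc is free after these two weeks.

In the meantime, keep studying the ROS core libraries, especially the communication parts; the reliability of control-signal transmission may be a factor to consider at that point.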

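Comprehensive interaction mode

After discussing with Wang (a senior labmate) yesterday, we decided to combine face recognition, commentary playback, and AI dialogue into a single feature. The implementation is roughly as follows:

  • 1. First, turn on the camera and run face recognition.
  • 2. Once recognition completes, play the user greeting.
  • 3. After the greeting, start the AI dialogue feature for some basic AI conversation.
  • 4. If, during AI dialogue, the user says the command "带我参观博物馆" (take me to visit the museum) or taps the "参观博物馆" (visit the museum) button on screen, start the voice commentary and communicate with ROS so that the base moves and the specified commentary plays at each specified location. (Face recognition and AI dialogue are turned off during the commentary phase.)
  • 5. When the visit is over, xbot returns to the starting point.
    This feature is tentatively called "comprehensive interaction mode".
    Progress is tracked on this branch: https://github.com/lisongting/xbot_head/commits/comprehensive-mode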
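The five steps above can be read as a small state machine, sketched below. The state and event names are hypothetical labels for the steps, and the final transition (returning to face recognition once xbot is back at the starting point) is an assumption, since the issue does not say what happens after step 5.

```python
from enum import Enum, auto


class State(Enum):
    FACE_RECOGNITION = auto()  # step 1
    GREETING = auto()          # step 2
    AI_TALK = auto()           # step 3
    TOUR = auto()              # step 4
    RETURNING = auto()         # step 5


# (current state, event) -> next state
TRANSITIONS = {
    (State.FACE_RECOGNITION, "face_recognized"): State.GREETING,
    (State.GREETING, "greeting_finished"): State.AI_TALK,
    # Raised by the voice command or by the on-screen button.
    (State.AI_TALK, "tour_requested"): State.TOUR,
    (State.TOUR, "tour_finished"): State.RETURNING,
    # Assumption: the cycle restarts after xbot is home again.
    (State.RETURNING, "arrived_home"): State.FACE_RECOGNITION,
}


def next_state(state, event):
    """Return the next state, ignoring events that do not apply."""
    return TRANSITIONS.get((state, event), state)


def recognition_and_dialogue_enabled(state):
    """Face recognition and AI dialogue are off during the tour."""
    return state is not State.TOUR
```

Keeping the transitions in one table makes it easy to see that a tour can only start from the AI dialogue step, which matches step 4.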
