lazyparser / xbot_head



Robot UI interaction module by XLab, Institute of Software, Chinese Academy of Sciences (ISCAS)

Home Page: https://github.com/lazyparser/weloveinterns/wiki/

License: Apache License 2.0

Java 100.00%

android ros youtu face-detection face-identification robot iscas

xbot_head's Introduction

XBot Head

Overview

XBot Head is an Android application for the Xbot robot. Many ROS packages run on Xbot; for more details, please visit http://wiki.ros.org/Robots/Xbot/indigo.

Xbot Head can recognize specific faces with the support of a recognition server, and users can register in just a few steps. It can also control the media player of the Android device to play audio files about the Software Museum of the Chinese Academy of Sciences.

Xbot Head communicates with Xbot using the rosbridge_suite protocol, which makes interaction with ROS devices easier than using RosJava.
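To illustrate what talking to rosbridge_suite involves, the sketch below builds the JSON operations of the rosbridge v2 protocol (`advertise`, `subscribe`, `publish`) as plain strings. In practice these strings are sent as text frames over a WebSocket to rosbridge_server, which listens on port 9090 by default. The class and method names are hypothetical. The topics and fields come from the topic list in this README. The message type passed to `advertise` must match the package name the robot actually uses, which this README does not state.

```java
import java.util.Locale;

/** Hypothetical helper: builds rosbridge v2 protocol operations as JSON text. */
final class RosbridgeOps {
    private RosbridgeOps() {}

    /** Announces that this client will publish on `topic` with message type `type`. */
    static String advertise(String topic, String type) {
        return String.format(Locale.ROOT,
                "{\"op\":\"advertise\",\"topic\":\"%s\",\"type\":\"%s\"}", topic, type);
    }

    /** Asks rosbridge to forward messages on `topic` (the type field is optional). */
    static String subscribe(String topic) {
        return String.format(Locale.ROOT,
                "{\"op\":\"subscribe\",\"topic\":\"%s\"}", topic);
    }

    /** Publishes an AudioStatus message with the two fields listed in this README. */
    static String publishAudioStatus(int id, boolean isComplete) {
        return String.format(Locale.ROOT,
                "{\"op\":\"publish\",\"topic\":\"/audio_status\",\"msg\":{\"id\":%d,\"iscomplete\":%b}}",
                id, isComplete);
    }
}
```

Hand-built strings are only safe here because topic names and numeric or boolean fields need no JSON escaping. Once messages carry free-form strings, a JSON library is the more robust choice.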

About Xbot

Xbot wiki: http://wiki.ros.org/Robots/Xbot
Source code of Xbot: https://github.com/yowlings/xbot
Website: http://robots.ros.org/xbot

Download Xbot Head: http://fir.im/u4rz

Prerequisite

Before using this application, make sure the ROS server and the recognition server have been started correctly.
After the Xbot Head application starts, configure the IP addresses of the ROS server and the recognition server on its settings page.

Features

1. **User Registration**: Users register with the service by taking a photo of their face. The photo is sent to the recognition server, which can then recognize who they are the next time.

2. **Face Recognition & Audio Commentary**: After face detection and face recognition, the app greets the user and then plays audio files about the Software Museum of the Chinese Academy of Sciences. Xbot can be used for commentary in many scenarios.

3. **Face Sign-In Mode**: The face sign-in function can be used in common settings such as offices and schools. Xbot Head handles sign-in in cooperation with Xbot.

4. **Comprehensive Interaction**: This is the AI-Talk mode, which starts a conversation between people and Xbot Head.

5. **Manipulation & Controller**: A separate application, XbotPlayer, is used to control the movement of Xbot.

ROS Topic Statement

Commentary Mode

There are two topics in this mode:

/audio_status: After the commentary audio starts, the application's background service publishes an AudioStatus message on the /audio_status topic. The message used in /audio_status is:
```
int32 id
bool iscomplete
```
int32 id -- The ID of the commentary the media player is currently playing.

bool iscomplete -- Whether the audio file has finished playing.

/museum_pos: When the application starts, it subscribes to this topic to learn the current status of move_base. When Xbot arrives at a location, Xbot publishes a MuseumPosition message on this topic. The message used in /museum_pos is:
```
int32 id
bool ismoving
```
int32 id -- ID of the area Xbot is currently in.

bool ismoving -- Whether Xbot is moving.
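The sketch below shows one way the app could react to /museum_pos. It ignores updates while the robot is moving, and it starts the commentary for an area once, when the robot stops there. The `CommentaryTrigger` class and the play-once rule are assumptions for illustration. The list of started IDs stands in for the real media player.

```java
import java.util.ArrayList;
import java.util.List;

/** The two fields of a /museum_pos message, as described above. */
final class MuseumPosition {
    final int id;
    final boolean ismoving;

    MuseumPosition(int id, boolean ismoving) {
        this.id = id;
        this.ismoving = ismoving;
    }
}

/** Hypothetical handler: starts each area's commentary once, when the robot stops there. */
final class CommentaryTrigger {
    private int lastPlayedArea = -1;                 // -1: no area played yet (assumed marker)
    private final List<Integer> started = new ArrayList<>();

    void onMuseumPosition(MuseumPosition pos) {
        if (pos.ismoving) {
            return;                                  // still driving: nothing to play yet
        }
        if (pos.id != lastPlayedArea) {
            lastPlayedArea = pos.id;
            started.add(pos.id);                     // real app: start the audio for this area
        }
    }

    List<Integer> startedCommentaries() {
        return started;
    }
}
```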

Face Sign Mode

There are two topics in this mode.

/robot_status: When Xbot arrives at a target point, it publishes a RobotStatus message on the /robot_status topic. The message type of RobotStatus is:
```
int32 id
bool ismoving
```
int32 id -- ID of the area Xbot is currently in.

bool ismoving -- Whether Xbot is moving.

pad_sign_completion: Xbot Head recognizes each person and sends the recognition status to Xbot. After each checkpoint finishes, it sends a SignStatus message to Xbot on the pad_sign_completion topic. The message type of SignStatus is:
```
bool complete
bool success
```
bool complete -- Whether Xbot Head has finished a checkpoint, meaning the recognition server has either completed a recognition request or the request has timed out.

bool success -- true means the person's face was recognized successfully; false means recognition failed or the connection to the recognition server timed out.
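The mapping from a recognition attempt to the two SignStatus flags fits in a few lines. This minimal sketch assumes three possible outcomes: recognized, not recognized, and timeout. The `RecognitionOutcome` enum is hypothetical. The field semantics follow the SignStatus descriptions in this section.

```java
/** Hypothetical outcome of one recognition request. */
enum RecognitionOutcome { RECOGNIZED, NOT_RECOGNIZED, TIMEOUT }

/** Fields of the SignStatus message sent on pad_sign_completion. */
final class SignStatus {
    final boolean complete;
    final boolean success;

    SignStatus(boolean complete, boolean success) {
        this.complete = complete;
        this.success = success;
    }

    /** Every outcome finishes the checkpoint; only a recognized face counts as success. */
    static SignStatus from(RecognitionOutcome outcome) {
        return new SignStatus(true, outcome == RecognitionOutcome.RECOGNIZED);
    }
}
```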

Contributors of this project

Wei Wu lazyparser
Songting Li lisongting

Thanks

bytefish: a sample Android application for face detection.

PoiCamera: an Android application using the android.hardware.camera2 API.

IFLYTEK: an online voice recognition service.

rosbridge_suite: the rosbridge protocol.

icons8.com: provides the icons (used under the icons8 license).

This project was originally developed by Nguyen Minh Tri.

Original contributors:

Nguyen Minh Tri betri28
xxhong hibernate2011

License

Copyright 2017 Wei Wu
Copyright 2017 Songting Li

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

xbot_head's People

Forkers

xbotgroup lisongting

xbot_head's Issues

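iFLYTEK TTS test: faces are recognized but names are not announced

In the Youtu client, faces are recognized, and "YOUTU: ret, confidence, id" is shown below the preview.
If the confidence is below 0.6, the face is treated as not matching any id and the app says "Hello visitor, this is &#@"; if it is 0.6 or above, the corresponding id is recognized and the app says "Hello, this is &￥#@".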
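The threshold rule in this issue amounts to a small decision function. The sketch below is hypothetical: the class name and the English greeting strings are placeholders. It assumes the confidence is on the same 0 to 1 scale as the 0.6 threshold. It also falls back to the generic greeting when no name is available, so a missing name never produces an empty announcement.

```java
/** Hypothetical: picks the text handed to TTS after a Youtu recognition result. */
final class GreetingSelector {
    /** Threshold stated in the issue: confidence >= 0.6 counts as recognized. */
    static final double RECOGNITION_THRESHOLD = 0.6;

    /** `name` is whatever the app resolved for the returned id; null if the lookup failed. */
    static String greetingFor(double confidence, String name) {
        if (confidence < RECOGNITION_THRESHOLD) {
            return "Hello, visitor";
        }
        // Recognized, but without a name there is nothing to announce: use the generic text.
        if (name == null || name.isEmpty()) {
            return "Hello, visitor";
        }
        return "Hello, " + name;
    }
}
```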

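Add error handling and recovery for failed or timed-out connections to the ROS server

Currently the app connects to the ROS server at startup, and if the server is unavailable the app may crash. No network error handling has been implemented.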
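A common way to handle this is to retry the connection a bounded number of times with exponential backoff, and to report failure to the UI instead of crashing. The sketch below is not the app's code. `RetryingConnector`, the `Sleeper` hook (injected so the delays can be tested) and the attempt and delay values are assumptions. The connection attempt itself is passed in as a `Callable`.

```java
import java.util.concurrent.Callable;

/** Hypothetical: retries a connection attempt with exponential backoff instead of crashing. */
final class RetryingConnector {
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    private final int maxAttempts;
    private final long initialDelayMillis;
    private final long maxDelayMillis;
    private final Sleeper sleeper;

    RetryingConnector(int maxAttempts, long initialDelayMillis, long maxDelayMillis, Sleeper sleeper) {
        this.maxAttempts = maxAttempts;
        this.initialDelayMillis = initialDelayMillis;
        this.maxDelayMillis = maxDelayMillis;
        this.sleeper = sleeper;
    }

    /** Returns true once `attempt` succeeds; false if all attempts fail or the thread is interrupted. */
    boolean connect(Callable<Boolean> attempt) {
        long delay = initialDelayMillis;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                if (Boolean.TRUE.equals(attempt.call())) {
                    return true;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            } catch (Exception e) {
                // Timeout, refused connection, unknown host: count it as a failed attempt.
            }
            if (i < maxAttempts) {
                try {
                    sleeper.sleep(delay);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
                delay = Math.min(delay * 2, maxDelayMillis);   // double the wait, up to a cap
            }
        }
        return false;   // caller shows an error and a "retry" option instead of crashing
    }
}
```

On Android this must run off the main thread. With `Thread::sleep` as the sleeper, `new RetryingConnector(5, 500, 8000, Thread::sleep)` would wait 0.5 s, 1 s, 2 s and 4 s between its five attempts.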

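About integrating Caffe2 on Android

Caffe2 is Facebook's machine learning framework.

Related official websites: http://caffe.berkeleyvision.org/ (the original Caffe, from Berkeley) and https://caffe2.ai/ (Caffe2).
I looked through them and found a tutorial on integrating Caffe2 on mobile devices. The demo it provides is this one, the same demo we discussed on Slack earlier.
I ran this demo; it also does object recognition. I think we can integrate Caffe2 into the project later.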

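[meta] Add command-word recognition: recognize the command words "停止解说" (stop commentary) and "恢复解说" (resume commentary).

This feature is one step in gradually adding intelligent features.
The goal of this issue is not dialogue but a simple voice-control function.

The current goal: after Xbot starts playing the commentary, if a user says the four characters "停止解说" into the pad's microphone, Xbot pauses the commentary. If the user says any of the three command words "继续解说", "恢复解说" or "开始解说" (continue, resume or start commentary), Xbot continues from where it stopped. It does not need to resume at the exact minute and second; it restarts from the current commentary point.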
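The following is a minimal sketch of the pause/resume logic described in this issue. It assumes the speech recognizer already returns the recognized text as a string. The class, the punctuation stripping and the segment bookkeeping are hypothetical. On resume it returns the index of the current segment, so playback restarts from the beginning of that commentary point rather than from the exact position.

```java
import java.util.Map;

/** Hypothetical controller mapping recognized command words to pause/resume. */
final class CommentaryVoiceControl {
    enum Command { STOP, RESUME, NONE }

    // Command words from the issue: one for "stop commentary", three for "resume".
    private static final Map<String, Command> WORDS = Map.of(
            "停止解说", Command.STOP,
            "继续解说", Command.RESUME,
            "恢复解说", Command.RESUME,
            "开始解说", Command.RESUME);

    private int currentSegment = 0;   // index of the commentary segment being played
    private boolean paused = false;

    /** Strips punctuation and whitespace (recognizers often append "。") before lookup. */
    static Command parse(String recognized) {
        String cleaned = recognized.replaceAll("[\\p{P}\\s]", "");
        return WORDS.getOrDefault(cleaned, Command.NONE);
    }

    void onSegmentStarted(int segment) {
        currentSegment = segment;
        paused = false;
    }

    /** Returns the segment to restart after a resume command, or -1 if playback should not restart. */
    int handle(String recognized) {
        switch (parse(recognized)) {
            case STOP:
                paused = true;            // real app: pause the media player here
                return -1;
            case RESUME:
                if (!paused) {
                    return -1;            // already playing: ignore
                }
                paused = false;
                return currentSegment;    // restart the current segment from its beginning
            default:
                return -1;
        }
    }

    boolean isPaused() {
        return paused;
    }
}
```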

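Subscribe to /stop_run and stop commentary playback when this signal arrives

A requirement split off from #6. /speaker_done is a high-priority requirement that the robot's lower-level motion control is waiting on, whereas /stop_run is not needed yet.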

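[easy] Re-split and regenerate the commentary according to the exhibit boards

The current commentary TTS is split by text length. Later requirements need each commentary segment to match an exhibit board, so it must be split by board content.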
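To split by board rather than by length, the script needs some way to mark where each board begins. The sketch below assumes a hypothetical marker line of the form `[board N]` before each board's text; the real scripts may use a different convention. It returns one TTS segment per board, in script order.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical: splits a commentary script into one TTS segment per exhibit board. */
final class BoardSegmenter {
    // Assumed marker format: a line containing only "[board N]".
    private static final Pattern MARKER =
            Pattern.compile("^\\[board (\\d+)\\][ \\t]*$", Pattern.MULTILINE);

    /** Returns board number -> text, in script order; text before the first marker is ignored. */
    static Map<Integer, String> split(String script) {
        Map<Integer, String> segments = new LinkedHashMap<>();
        Matcher m = MARKER.matcher(script);
        Integer board = null;
        int textStart = 0;
        while (m.find()) {
            if (board != null) {
                segments.put(board, script.substring(textStart, m.start()).trim());
            }
            board = Integer.parseInt(m.group(1));
            textStart = m.end();
        }
        if (board != null) {
            segments.put(board, script.substring(textStart).trim());   // last board runs to the end
        }
        return segments;
    }
}
```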

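Handling invalid input values on the settings page

Two settings currently need invalid-value handling:

Server address: valid values are in standard dotted-decimal form. Invalid values may include Chinese characters, English strings, mixed digit and non-digit strings, a number of separators (.) other than 3, special characters (such as @》*), and so on.

Face detection threshold: valid values are numbers in the interval (0, 1). Invalid values may include Chinese characters, English strings, numbers greater than or equal to 1 or less than or equal to 0, special characters, and so on.

We also need to decide which value to restore after the user enters an invalid one. I think it should revert to the last valid value the user set; if the user has never set one, it should use the default value.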
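The two validation rules and the proposed fallback policy could look like the sketch below. Class and method names are hypothetical. The default value used in the example ("0.6") is an assumption, not the app's real default. The IP check accepts only ASCII digits, so full-width digits and other characters are rejected, which `Character.isDigit` would not do.

```java
import java.util.function.Predicate;

/** Hypothetical validation for the two settings, with "last valid value, else default" fallback. */
final class SettingsValidator {
    /** Dotted-decimal IPv4: exactly four parts, each 1-3 ASCII digits with value 0-255. */
    static boolean isValidIpv4(String s) {
        if (s == null) {
            return false;
        }
        String[] parts = s.trim().split("\\.", -1);   // -1 keeps empty parts such as "1..2.3"
        if (parts.length != 4) {
            return false;
        }
        for (String p : parts) {
            if (p.isEmpty() || p.length() > 3 || !p.chars().allMatch(c -> c >= '0' && c <= '9')) {
                return false;
            }
            if (Integer.parseInt(p) > 255) {
                return false;
            }
        }
        return true;
    }

    /** Face detection threshold: a number strictly between 0 and 1. */
    static boolean isValidThreshold(String s) {
        if (s == null) {
            return false;
        }
        try {
            double v = Double.parseDouble(s.trim());
            return v > 0 && v < 1;                   // also false for NaN
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Keeps valid input; otherwise restores the last valid value, or the default if none exists. */
    static String resolve(String input, Predicate<String> isValid, String lastValid, String defaultValue) {
        if (isValid.test(input)) {
            return input.trim();
        }
        return lastValid != null ? lastValid : defaultValue;
    }
}
```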

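[meta] Discuss how the UI should handle abnormal situations (emergency braking, obstacles, etc.)

The normal flow is to move, stop, and play commentary, repeatedly. There are several technically feasible ways to recover after an abnormal event. We need to discuss and decide which one best matches what the people present would expect.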

Add a loading animation

Add a loading animation to the screen shown while waiting for the ROS server connection.

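[intermediate] Rewrite the face detection and face cropping code to improve detection speed.

The existing code was forked directly from a demo project on GitHub.
It works, but because it was a demo, speed was never a concern.
The measured speed is currently 3 to 5 FPS. This depends on the device; use our JDTab or Xiaomi Pad 3 as the reference.
It is somewhat laggy. We would like to optimize or rewrite the code so that it ideally reaches 7 to 10 FPS.

Also, the existing code contains many magic numbers. Most of the calculations for cropping the face box were tuned by hand at the time, and they have become unreadable. This part could be extracted into a small class.

Also, if I remember correctly, the code that crops the image sent to the Youtu server and the code that draws the crop box on screen are two copy-pasted pieces. They should be merged into one.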
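The sketch below shows the kind of small class this issue suggests: one place that computes the crop rectangle. Both the image sent to Youtu and the box drawn on screen would use it. `FaceCrop` and its `MARGIN` value are hypothetical stand-ins for the hand-tuned numbers in the current code. The key point is that both call sites derive from the same rectangle, and drawing only applies the preview scale.

```java
/** Hypothetical single source of truth for the face crop rectangle. */
final class FaceCrop {
    /** Fraction of the face width/height added on each side (assumed value, not the tuned one). */
    static final double MARGIN = 0.2;

    final int left, top, right, bottom;   // image pixels; right and bottom are exclusive

    private FaceCrop(int left, int top, int right, int bottom) {
        this.left = left;
        this.top = top;
        this.right = right;
        this.bottom = bottom;
    }

    /** Expands the detected face rectangle by MARGIN and clamps it to the image bounds. */
    static FaceCrop around(int faceX, int faceY, int faceW, int faceH, int imageW, int imageH) {
        int dx = (int) Math.round(faceW * MARGIN);
        int dy = (int) Math.round(faceH * MARGIN);
        return new FaceCrop(
                Math.max(0, faceX - dx),
                Math.max(0, faceY - dy),
                Math.min(imageW, faceX + faceW + dx),
                Math.min(imageH, faceY + faceH + dy));
    }

    int width() {
        return right - left;
    }

    int height() {
        return bottom - top;
    }

    /** The same rectangle in view coordinates, for drawing over a scaled preview. */
    int[] scaledTo(double scaleX, double scaleY) {
        return new int[] {
                (int) Math.round(left * scaleX), (int) Math.round(top * scaleY),
                (int) Math.round(right * scaleX), (int) Math.round(bottom * scaleY)};
    }
}
```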

Add an AI dialogue feature

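[goodfirstbug] A better way to enter IP addresses

Currently the IP address is entered as a single string.
Ideally:

Split the input into four decimal fields of up to three digits each (0 to 255), so each input box can be restricted to digits.
Further, as a bonus:
Offer the two common subnet prefixes 192.168. and 10.0.0. for direct selection, which makes input easier during testing.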
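The four-field input and the two presets might be handled as in the hypothetical sketch below. It assumes each field is already restricted to digits, for example by a numeric input type on the Android EditText, and only checks emptiness and the 0 to 255 range. Parsing each field as an integer also drops leading zeros. This matters because some IP parsers read a leading zero as octal.

```java
/** Hypothetical helpers for a four-field IPv4 input with subnet presets. */
final class IpFields {
    /** Pre-fills the leading fields from a preset such as "192.168." or "10.0.0.". */
    static String[] prefill(String prefix) {
        String[] fields = {"", "", "", ""};
        String[] parts = prefix.split("\\.");   // trailing empty strings are dropped
        for (int i = 0; i < parts.length && i < 4; i++) {
            fields[i] = parts[i];
        }
        return fields;
    }

    /** Joins four digit-only fields into "a.b.c.d", or returns null if any field is empty or > 255. */
    static String join(String[] fields) {
        if (fields.length != 4) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 4; i++) {
            String f = fields[i];
            if (f.isEmpty() || f.length() > 3 || Integer.parseInt(f) > 255) {
                return null;
            }
            if (i > 0) {
                sb.append('.');
            }
            sb.append(Integer.parseInt(f));      // normalizes "010" to "10"
        }
        return sb.toString();
    }
}
```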

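I browsed some websites. TensorFlow's official site is https://www.tensorflow.org/, but I have no VPN and cannot access it. The only somewhat reliable materials I found were those from Jikexueyuan (极客学院) and the TensorFlow Chinese community, but neither explains how to use TensorFlow on Android.
On GitHub, TensorFlow only provides a huge 88 MB demo (just the demo, without source code), linked at the bottom of the official repository. I downloaded and ran it; the interface is similar to Caffe2's AI-camera, and both do object recognition.
In short, there are still too few reference materials on TensorFlow.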

Beautify the main interface

Yesterday I found a good icon website: http://www.iconsdb.com/

iconsDB.com currently has 4113 icons in the database that you can customize and download in any color and any size you want ! 412,028,303 icon downloads and counting ! 2659 icons can be used freely in both personal and commercial projects with no attribution required, but always appreciated and 1454 icons require a link to be used. All logos and trademarks presented in some icons are copyright of their respective trademark owners.

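The icons there can be used in commercial projects, though per the site's note some of them require a link.
I plan to pick a few suitable ones to replace the current native buttons.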

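[medium] Investigate how to use Baidu's TTS service in preparation for adding it to the app

On 2017-05-19 we will have another small release, in which we hope to use Baidu's online TTS service to read out the names of registered users.

Current implementation: the names of a few developers were generated in advance with Baidu TTS and saved. For ordinary visitors and registered users who are not staff, the app always says "visitor" or "registered user".

Expected changes in this issue:

After Tencent Youtu returns a face recognition result (an ID and a probability),
look up the name for that ID through the Youtu API (this requires a new POST request; both the request body and the response are JSON);
call the Baidu TTS API with the name, which returns audio/mp3;
cache the audio/mp3 and add it to the TTS playlist.

Bonus: cache names that have already been looked up to reduce the number of network queries.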
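The bonus item is a cache-aside lookup: check a local map first, and send the POST request only on a miss. In the sketch below, `NameCache` is hypothetical, and the `lookup` function stands in for the Youtu name query. Failed lookups (null) are deliberately not cached, so they can be retried later.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical cache of person ID -> name, so each ID costs at most one successful query. */
final class NameCache {
    private final Map<String, String> names = new HashMap<>();
    private final Function<String, String> lookup;   // stands in for the Youtu POST request
    private int networkCalls = 0;

    NameCache(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    /** Returns the cached name; on a miss, queries once and caches non-null results. */
    String nameFor(String personId) {
        // computeIfAbsent does not store a null result, so failed lookups are retried next time.
        return names.computeIfAbsent(personId, id -> {
            networkCalls++;
            return lookup.apply(id);
        });
    }

    int networkCalls() {
        return networkCalls;
    }
}
```

If lookups can happen on several threads, a `ConcurrentHashMap` would be the safer map. The same pattern could also key the cached TTS mp3 files by name.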

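Mobile base side receives: /speak_done (std_msgs/Bool)
Pad side receives: /stop_run (std_msgs/Bool)
For the concrete implementation, see test_pub2.py. It assumes that the speech function is speakCB and counts to 50 to simulate the speech finishing.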
xbot_navigation/Xbot/src/nav_staff/src/test_pub2.py
https://github.com/DinnerHowe/xbot_navigation/blob/master/Xbot/src/nav_staff/src/test_pub2.py

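[meta] Find APIs in the project marked as deprecated and replace them

If you find deprecated code while auditing the project's code, open a new issue and update one API (or one group of APIs) at a time.

Code quality improvement.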

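Remove the face detection module from the xbothead project, keeping only the original "robot face" module

The face detection feature is really a feature of an access control system.
We can remove it later.
The access control system has been forked into another repository.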

Add a "view registered users" feature

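Comprehensive interaction mode

After a discussion with senior student Wang yesterday, we decided to combine face recognition, commentary playback and AI dialogue into a single feature. The implementation is roughly as follows:

1. First, turn on the camera for face recognition.
2. After recognition, play a greeting for the user.
3. After the greeting, start the AI dialogue for some basic conversation.
4. If, during the AI dialogue, the user says the command "带我参观博物馆" (take me to visit the museum) or taps the "参观博物馆" (visit the museum) button on the screen, start the voice commentary and communicate with ROS so that the base moves to each specified position and plays the corresponding commentary. Face recognition and AI dialogue are turned off during the commentary phase.
5. After the visit, Xbot returns to the starting point.
This feature is tentatively called "comprehensive interaction mode".
Progress is tracked in this branch: https://github.com/lisongting/xbot_head/commits/comprehensive-mode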
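The five steps above map naturally onto a small state machine. The sketch below is an illustration, not the branch's implementation. The state and event names are hypothetical. Returning to face recognition after arriving home is an assumption, since the issue stops at step 5.

```java
/** Hypothetical state machine for the comprehensive interaction mode. */
final class ComprehensiveMode {
    enum State { FACE_RECOGNITION, GREETING, AI_CHAT, TOUR, RETURNING }

    enum Event { FACE_RECOGNIZED, GREETING_DONE, TOUR_REQUESTED, TOUR_FINISHED, ARRIVED_HOME }

    private State state = State.FACE_RECOGNITION;

    State state() {
        return state;
    }

    /** Applies an event; events that do not fit the current state are ignored. */
    State on(Event event) {
        switch (state) {
            case FACE_RECOGNITION:
                if (event == Event.FACE_RECOGNIZED) state = State.GREETING;
                break;
            case GREETING:
                if (event == Event.GREETING_DONE) state = State.AI_CHAT;
                break;
            case AI_CHAT:
                // Raised by the voice command "带我参观博物馆" or the "visit the museum" button.
                if (event == Event.TOUR_REQUESTED) state = State.TOUR;
                break;
            case TOUR:
                if (event == Event.TOUR_FINISHED) state = State.RETURNING;
                break;
            case RETURNING:
                // Assumption: wait for the next visitor once back at the starting point.
                if (event == Event.ARRIVED_HOME) state = State.FACE_RECOGNITION;
                break;
        }
        return state;
    }

    /** Face recognition and AI dialogue stay off during the tour (and, by assumption, the return). */
    boolean interactionAllowed() {
        return state != State.TOUR && state != State.RETURNING;
    }
}
```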

[feature] Implement communication with ROSCore

Add communication with ROS based on @hibernate2011's RosClient project.