KanTV

KanTV ("Kan" is the Chinese Pinyin for the Hanzi "看", meaning "watch/listen" in English) is an open source project focused on studying and practising state-of-the-art AI technology in real scenarios on Android phones/devices, such as online-TV playback, online-TV transcription (real-time subtitles), online-TV language translation, and online-TV video & audio recording, all working at the same time. It is derived from an original project, with many enhancements and new features:

Watch online TV and local media using a customized FFmpeg 6.1; the source code of the customized FFmpeg 6.1 can be found in external/ffmpeg, in accordance with FFmpeg's license
Record online TV to automatically generate videos (useful for short-video creators who need source material, but please respect the IPR of the original content creator/provider); record online TV video/audio content to gather video/audio data that might be required for, or useful in, AI R&D activity
ASR (Automatic Speech Recognition, a subfield of AI) study using the great whisper.cpp
LLM (Large Language Model, a subfield of AI) study using the great llama.cpp; run/experience LLMs (such as llama-2-7b, baichuan2-7b, qwen1_5-1_8b, gemma-2b) on a Xiaomi 14 using llama.cpp
Real-time English subtitles for English online TV (aka OTT TV) using the great & excellent & amazing whisper.cpp (PoC finished on a Xiaomi 14; a Xiaomi 14 or another powerful Android phone is HIGHLY recommended for the real-time subtitle feature, otherwise unexpected behavior may occur)
2D graphics performance
Set up a customized playlist and then use this software to watch its content for R&D activity
UI refactor (closer to a real commercial Android application; only English is currently supported as the UI language)
Well-maintained "workbench" for ASR (Automatic Speech Recognition) researchers who are interested in practising state-of-the-art AI tech (like whisper.cpp) in real scenarios on mobile devices (currently focused on Android)
Well-maintained "workbench" for LLM (Large Language Model) researchers who are interested in practising state-of-the-art AI tech (like llama.cpp) in real scenarios on mobile devices (currently focused on Android)
Android turn-key project for AI researchers (who might not be familiar with regular Android software development), developers, and beginners focused on edge/device-side AI learning and R&D activity; some AI R&D activities (AI algorithm validation, AI model validation, performance benchmarks in the ASR, LLM, TTS, NLP, CV, ... fields) can be done very easily with the Android Studio IDE plus a powerful Android phone

Software architecture of KanTV Android

(depends on zhouwg#121 and zhouwg#176)

How to build the project

Fetch source code


git clone https://github.com/zhouwg/kantv.git

cd kantv

git checkout master

cd kantv

Set up development environment

Option 1: Set up docker environment

Build docker image

docker build build -t kantv --build-arg USER_ID=$(id -u) --build-arg GROUP_ID=$(id -g) --build-arg USER_NAME=$(whoami)

Run docker container

# map source code directory into docker container
docker run -it --name=kantv --volume=`pwd`:/home/`whoami`/kantv kantv

# in docker container
. build/envsetup.sh

./build/prebuild-download.sh
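
Because the container is created with a fixed name (`--name=kantv`), running the same `docker run` command a second time fails with a name conflict. The sketch below is one possible way to choose between creating the container and re-entering an existing one. The helper name `kantv_docker_cmd` is hypothetical and not part of this project; it only prints the command so that it can be inspected before being executed.

```shell
#!/bin/sh
# Sketch (hypothetical helper, not part of the KanTV build scripts):
# print the docker command to use for entering the "kantv" container.
#   $1 = newline-separated list of existing container names
#   $2 = source code directory on the host
#   $3 = user name inside the container
kantv_docker_cmd() {
    names=$1
    src=$2
    user=$3
    if printf '%s\n' "$names" | grep -qx 'kantv'; then
        # The container already exists: start it and attach interactively.
        echo "docker start -ai kantv"
    else
        # First run: create the container and map the source directory into it.
        echo "docker run -it --name=kantv --volume=$src:/home/$user/kantv kantv"
    fi
}
```

A possible invocation from the project's top-level directory is `kantv_docker_cmd "$(docker ps -a --format '{{.Names}}')" "$(pwd)" "$(whoami)"`, whose output can then be run by hand.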

Option 2: Set up local environment

Prerequisites

Host OS information:

uname -a

Linux 5.8.0-43-generic #49~20.04.1-Ubuntu SMP Fri Feb 5 09:57:56 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/issue

Ubuntu 20.04.2 LTS \n \l

tools & utilities

sudo apt-get update
sudo apt-get install build-essential -y
sudo apt-get install cmake -y
sudo apt-get install curl -y
sudo apt-get install wget -y
sudo apt-get install python -y
sudo apt-get install tcl expect -y
sudo apt-get install nginx -y
sudo apt-get install git -y
sudo apt-get install vim -y
sudo apt-get install spawn-fcgi -y
sudo apt-get install u-boot-tools -y
sudo apt-get install ffmpeg -y
sudo apt-get install openssh-client -y
sudo apt-get install nasm -y
sudo apt-get install yasm -y
sudo apt-get install openjdk-17-jdk -y

sudo dpkg --add-architecture i386
sudo apt-get install lib32z1 -y

sudo apt-get install -y android-tools-adb android-tools-fastboot autoconf \
        automake bc bison build-essential ccache cscope curl device-tree-compiler \
        expect flex ftp-upload gdisk acpica-tools libattr1-dev libcap-dev \
        libfdt-dev libftdi-dev libglib2.0-dev libhidapi-dev libncurses5-dev \
        libpixman-1-dev libssl-dev libtool make \
        mtools netcat python-crypto python3-crypto python-pyelftools \
        python3-pycryptodome python3-pyelftools python3-serial \
        rsync unzip uuid-dev xdg-utils xterm xz-utils zlib1g-dev

sudo apt-get install python3-pip -y
sudo apt-get install indent -y
pip3 install meson ninja

echo "export PATH=/home/`whoami`/.local/bin:\$PATH" >> ~/.bashrc

Alternatively, run the script below after fetching the project's source code:


./build/prebuild.sh
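
Before or after installing the packages, it can be useful to see which command-line tools are still missing from `PATH`. The following is a minimal sketch, not part of the project's scripts; the function name `missing_tools` is hypothetical, and which command names to check is left to the caller.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print every given command name
# that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

For example, `missing_tools cmake curl wget git nasm yasm adb` prints nothing when all of these commands are available. The command names in this example are assumed to correspond to the packages installed above.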

Android Studio

Download and install Android Studio manually (Android Studio 4.2.1 or the latest Android Studio).

vim settings

Borrowed from http://ffmpeg.org/developer.html#Editor-configuration

set ai
set nu
set expandtab
set tabstop=4
set shiftwidth=4
set softtabstop=4
set noundofile
set nobackup
set fileformat=unix
set undodir=~/.undodir
set cindent
set cinoptions=(0
" Allow tabs in Makefiles.
autocmd FileType make,automake set noexpandtab shiftwidth=8 softtabstop=8
" Trailing whitespace and tabs are forbidden, so highlight them.
highlight ForbiddenWhitespace ctermbg=red guibg=red
match ForbiddenWhitespace /\s\+$\|\t/
" Do not highlight spaces at the end of line while typing on that line.
autocmd InsertEnter * match ForbiddenWhitespace /\t\|\s\+\%#\@<!$/

Download android-ndk-r26c to prebuilts/toolchain; skip this step if android-ndk-r26c already exists
```
. build/envsetup.sh

./build/prebuild-download.sh
```
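
The "skip this step if it already exists" check can be scripted. The sketch below assumes that the NDK is unpacked to a directory named `android-ndk-r26c` directly under `prebuilts/toolchain`; that layout is inferred from the text above rather than confirmed, and the function name `ndk_present` is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): succeed if the NDK directory already exists.
#   $1 = toolchain directory (default: prebuilts/toolchain)
# Assumption: the NDK lives in <toolchain dir>/android-ndk-r26c.
ndk_present() {
    [ -d "${1:-prebuilts/toolchain}/android-ndk-r26c" ]
}
```

With this helper, `ndk_present || ./build/prebuild-download.sh` would run the download only when the directory is missing, after `build/envsetup.sh` has been sourced.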
Modify ggml/CMakeLists.txt and ncnn/CMakeLists.txt accordingly if the target Android device is a Xiaomi 14 or another Qualcomm Snapdragon 8 Gen 3 SoC based Android phone
Modify ggml/CMakeLists.txt and ncnn/CMakeLists.txt accordingly if the target Android phone is NOT based on a Qualcomm SoC
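
To decide which of the two cases applies, the SoC vendor of a connected phone can be queried over adb. The sketch below only classifies a vendor string. It assumes that the phone exposes the `ro.soc.manufacturer` system property (available on recent Android releases) and that Qualcomm devices report a value containing "QTI" or "Qualcomm"; both points should be verified on the actual device. The function name `soc_family` is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): classify a SoC manufacturer string.
# Assumption: Qualcomm-based phones report "QTI" or "Qualcomm".
soc_family() {
    case $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') in
        *qti*|*qualcomm*) echo qualcomm ;;
        *) echo other ;;
    esac
}
```

A possible use is `soc_family "$(adb shell getprop ro.soc.manufacturer)"`. The result only indicates which of the two CMakeLists.txt adjustments to look at; it does not perform the modification.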

Build native codes

. build/envsetup.sh

Build Android APK

Option 1: Build APK from source code by Android Studio IDE

Option 2: Build APK from source code by command line

  . build/envsetup.sh
  lunch 1
  ./build-all.sh android
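
After a command-line build, the generated APK has to be located and installed on the phone. The output location is not stated here, so the sketch below simply searches a directory tree for APK files; the function name `list_apks` is hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): list all APK files below a directory,
# sorted by path.
#   $1 = directory to search (default: current directory)
list_apks() {
    find "${1:-.}" -type f -name '*.apk' | sort
}
```

Once the right file is identified in the output of `list_apks .`, it can be installed on a connected phone with `adb install -r <path-to-apk>`.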

Run Android APK on Android phone

This Android APK works well on any mainstream Android phone; the following four permissions are required:

Access to storage is required to generate necessary temporary files
Access to device information is required to obtain the phone's current network status, distinguishing whether the current network is Wi-Fi or mobile when playing online TV
Access to the camera is needed for the AI Agent
Access to the mic (audio recorder) is needed for the AI Agent
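
During development it can be convenient to grant these permissions from the host instead of tapping through dialogs. The sketch below prints the corresponding `adb shell pm grant` commands without running them. The mapping from the four accesses above to Android permission names (`WRITE_EXTERNAL_STORAGE`, `READ_PHONE_STATE`, `CAMERA`, `RECORD_AUDIO`) is an assumption, and the package name must be taken from the app's manifest. The function name `print_grant_cmds` and the package `com.example.app` are hypothetical.

```shell
#!/bin/sh
# Sketch (hypothetical helper): print, but do not run, the adb commands
# that grant the runtime permissions assumed to match the list above.
#   $1 = application package name (check the app's manifest)
print_grant_cmds() {
    pkg=$1
    for perm in WRITE_EXTERNAL_STORAGE READ_PHONE_STATE CAMERA RECORD_AUDIO; do
        echo "adb shell pm grant $pkg android.permission.$perm"
    done
}
```

For example, `print_grant_cmds com.example.app` prints four commands that can be reviewed and then executed one by one. `pm grant` only succeeds for permissions that the app actually declares.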

Here is a short video demonstrating AI subtitles by running the great & excellent & amazing whisper.cpp on a Xiaomi 14 device, fully offline and on-device.

realtime-subtitle-by-whispercpp-demo-on-xiaomi14-finetune-20240324.mp4

Here is a screenshot demonstrating LLM inference by running the magic llama.cpp on a Xiaomi 14 device, fully offline and on-device.

Here is a screenshot demonstrating ASR inference by running the excellent whisper.cpp on a Xiaomi 14 device, fully offline and on-device.

Some other screenshots

Hot topics

Android multimodal AI agent(ASR, LLM, TTS, CV, NLP, ..., an open source GPT-4o style multimodal AI agent on Android phone) by GGML + NCNN
improve the quality of Qualcomm QNN backend for GGML
bugfix in UI layer(Java)
bugfix in native layer(C/C++)

Contribution

Be sure to review the open issues before contributing to project KanTV. We use GitHub issues for tracking requests and bugs; please see how to submit an issue in this project.

Issue reports from various Android-based phones, and even PRs to this project, are greatly welcomed.

Docs

Special Acknowledgement

GGML by Georgi Gerganov
ncnn by Tencent

ASR engine whisper.cpp by Georgi Gerganov
LLM engine llama.cpp by Georgi Gerganov
TTS engine bark.cpp by PABannier
Text2Image engine stablediffusion.cpp by leejet
ASR engine sherpa-ncnn(an open-source ASR engine using next-generation Kaldi with ncnn) by k2-fsa

License

Copyright (c) 2021 - 2023 Project KanTV

Copyright (c) 2024 -  Authors of Project KanTV

Licensed under Apache License v2.0 or later
