FunASR-APP

FunASR-APP is a speech application toolkit built on FunASR's open-source speech models. It packages those models into ready-to-use applications so that they are easy to run and to integrate into other projects.

What's New

  • 10/17 Bug fix: choosing multiple periods used to return a video with the wrong length.
  • 10/10 ClipVideo now supports recognition with speaker diarization. Choose the 'yes' button under 'Recognize Speakers' to get recognition results with a speaker ID for each sentence. You can then clip out the periods of one or more speakers (e.g. 'spk0' or 'spk0#spk3') using ClipVideo.
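
The speaker selection format above ('spk0' or 'spk0#spk3') can be illustrated with a minimal sketch. It is not ClipVideo's implementation: the helper names `parse_speakers` and `select_periods`, and the dictionary keys `spk`, `start` and `end`, are hypothetical and chosen only for this example.

```python
from __future__ import annotations


def parse_speakers(selector: str) -> list[str]:
    """Split a selector such as 'spk0#spk3' into ['spk0', 'spk3']."""
    return [s.strip() for s in selector.split("#") if s.strip()]


def select_periods(sentences: list[dict], selector: str) -> list[tuple[int, int]]:
    """Keep the (start, end) period of every sentence spoken by a selected speaker."""
    wanted = set(parse_speakers(selector))
    return [(s["start"], s["end"]) for s in sentences if s["spk"] in wanted]


# Hypothetical recognition results: one entry per sentence, times in milliseconds.
sentences = [
    {"spk": "spk0", "start": 0, "end": 1200},
    {"spk": "spk1", "start": 1200, "end": 2500},
    {"spk": "spk3", "start": 2500, "end": 4000},
]

print(select_periods(sentences, "spk0#spk3"))  # [(0, 1200), (2500, 4000)]
```

Splitting on '#' keeps a single speaker ('spk0') and several speakers ('spk0#spk3') on the same code path.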

ClipVideo

As the first application in FunASR-APP, ClipVideo lets users clip .mp4 video files or .wav audio files by choosing text segments from the recognition results generated by the Paraformer-long model.

With ClipVideo you can get video clips in a few steps (in the Gradio service):

  • Step 1: Upload your video file (or try the example videos below)
  • Step 2: Copy the text segments you need into 'Text to Clip'
  • Step 3: Adjust subtitle settings (if needed)
  • Step 4: Click 'Clip' or 'Clip and Generate Subtitles'
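
To make the idea behind the steps above concrete, here is a minimal sketch of how a chosen text segment could be mapped back to a time range. It assumes the recognition results are a list of sentences with start and end times in milliseconds and works at sentence granularity. The function `find_period` and the data layout are hypothetical; ClipVideo's actual matching may differ (for example, by using finer-grained timestamps).

```python
from __future__ import annotations


def find_period(sentences: list[dict], dest_text: str) -> tuple[int, int] | None:
    """Return (start_ms, end_ms) of the sentences that cover dest_text, or None."""
    if not dest_text:
        return None
    full_text = "".join(s["text"] for s in sentences)
    match_begin = full_text.find(dest_text)
    if match_begin < 0:
        return None
    match_end = match_begin + len(dest_text)

    start_ms = end_ms = None
    offset = 0
    for s in sentences:
        sent_begin, sent_end = offset, offset + len(s["text"])
        offset = sent_end
        # Keep every sentence whose character range overlaps the match.
        if sent_begin < match_end and sent_end > match_begin:
            if start_ms is None:
                start_ms = s["start"]
            end_ms = s["end"]
    return (start_ms, end_ms)


# Hypothetical recognition results with per-sentence timestamps (ms).
sentences = [
    {"text": "hello world. ", "start": 0, "end": 1000},
    {"text": "this is a test. ", "start": 1000, "end": 2200},
    {"text": "goodbye.", "start": 2200, "end": 3000},
]

print(find_period(sentences, "world. this"))  # (0, 2200)
print(find_period(sentences, "not present"))  # None
```

Because the match is done on the concatenated text, a segment copied across a sentence boundary still resolves to a single continuous period.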

Usage

git clone https://github.com/alibaba-damo-academy/FunASR-APP.git
cd FunASR-APP
# install modelscope
pip install "modelscope[audio_asr]" -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
# python environments
pip install -r ClipVideo/requirments.txt

(Optional) To clip video files with embedded subtitles:

  1. ffmpeg and imagemagick are required
  • On Ubuntu
apt-get -y update && apt-get -y install ffmpeg imagemagick
sed -i 's/none/read,write/g' /etc/ImageMagick-6/policy.xml
  • On macOS
brew install imagemagick
# BSD sed on macOS needs an explicit (here empty) backup suffix after -i;
# adjust the version in the path to match your installed imagemagick
sed -i '' 's/none/read,write/g' /usr/local/Cellar/imagemagick/7.1.1-8_1/etc/ImageMagick-7/policy.xml
  2. Download the font file to ClipVideo/font
wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ClipVideo/STHeitiMedium.ttc -O ClipVideo/font/STHeitiMedium.ttc

Experience ClipVideo in ModelScope

You can try ClipVideo in the ModelScope Space: link.

Use ClipVideo as Gradio Service

You can run your own ClipVideo service, identical to the one in the ModelScope Space, as follows:

python clipvideo/gradio_service.py

Then visit localhost:7860 to open the Gradio service, and use ClipVideo by following the steps described in the ClipVideo section.

Use ClipVideo from the command line

ClipVideo can also recognize and clip with commands:

# working in ClipVideo/
# step1: Recognize
python clipvideo/videoclipper.py --stage 1 \
                       --file examples/2022云栖大会_片段.mp4 \
                       --output_dir ./output
# now you can find recognition results and entire SRT file in ./output/
# step2: Clip
python clipvideo/videoclipper.py --stage 2 \
                       --file examples/2022云栖大会_片段.mp4 \
                       --output_dir ./output \
                       --dest_text '我们把它跟乡村振兴去结合起来,利用我们的设计的能力' \
                       --start_ost 0 \
                       --end_ost 100 \
                       --output_file './output/res.mp4'
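
The `--start_ost` and `--end_ost` options and the SRT output can be pictured with a short sketch. It assumes the offsets are given in milliseconds and are added to the start and end of the matched period; that reading is an assumption, not something stated above. The helpers `apply_offsets`, `srt_timestamp` and `srt_cue` are hypothetical and are not part of ClipVideo. The timestamp layout `HH:MM:SS,mmm --> HH:MM:SS,mmm` is the standard SubRip (SRT) cue format.

```python
def apply_offsets(start_ms, end_ms, start_ost=0, end_ost=0):
    """Shift a clip period by the given offsets (assumed to be milliseconds)."""
    new_start = max(0, start_ms + start_ost)        # never start before 0
    new_end = max(new_start, end_ms + end_ost)      # never end before the start
    return new_start, new_end


def srt_timestamp(ms):
    """Format milliseconds as an SRT timestamp, e.g. 01:02:03,004."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def srt_cue(index, start_ms, end_ms, text):
    """Build one SRT cue: index line, time range line, then the subtitle text."""
    return f"{index}\n{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}\n{text}\n"


# Same offsets as the command above: --start_ost 0 --end_ost 100
start, end = apply_offsets(1000, 2000, start_ost=0, end_ost=100)
print(start, end)  # 1000 2100
print(srt_cue(1, start, end, "example subtitle"))
```

Under this reading, a positive `end_ost` extends the clip slightly past the last recognized word, which helps avoid cutting off trailing audio.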

Study Speech Related Models in FunASR

FunASR aims to build a bridge between academic research and industrial applications of speech recognition. By supporting the training and fine-tuning of the industrial-grade speech recognition models released on ModelScope, it lets researchers and developers study and productionize speech recognition models more conveniently, and helps the speech recognition ecosystem grow. ASR for Fun!

Contributors

alibaba-oss, r1ckshi, season4675, wanghuii1
