AI-WEBUI: A universal web interface for AI creation, a handy tool for image, audio, and video processing

⭐ If this project helps you, please give it a star. Thank you! 🤗 Chinese documentation

🌟 1. Introduction

ai-webui is a browser-based interface that aims to be a universal platform for AI-assisted creation.

This project provides basic functionalities such as image segmentation, object tracking, image restoration, speech recognition, speech synthesis, as well as advanced features such as chatbot, video translation, and video watermark removal, which greatly improve the efficiency of short video creation.

⚡ 2. Installation

To install and use AI-WebUI, follow these steps:

2.1 Clone this project to your local machine

git clone https://github.com/jasonaidm/ai_webui.git

2.2 Enter the project directory

cd ai_webui

2.3 Create a virtual environment

conda create -n aiwebui python=3.11
conda activate aiwebui

2.4 Install the required dependencies

apt install ffmpeg -y 
pip install -r requirements.txt
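
Before starting the interface, it can help to confirm that the steps above took effect. The following is a minimal sketch, not part of the project: `check_environment` is a hypothetical helper that only checks the two facts stated in this section (the environment was created with Python 3.11, and ffmpeg was installed). Whether other Python versions also work is not stated here, so the version check is an assumption and only produces a warning.

```python
import shutil
import sys


def check_environment(version=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK."""
    version = tuple(version or sys.version_info[:2])
    problems = []
    # Step 2.3 creates the conda environment with python=3.11.
    if version != (3, 11):
        problems.append(f"Python 3.11 expected, found {version[0]}.{version[1]}")
    # Step 2.4 installs ffmpeg, which must be reachable on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    return problems


# Print a warning for each problem found in the current environment.
for problem in check_environment():
    print("WARNING:", problem)
```

Run it inside the activated `aiwebui` environment; no output means both checks passed.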

🚀 3. Quick Start

Using AI-WebUI is simple: follow the instructions on the interface. You provide input by uploading video, audio, or images, or by entering text, and then interact with the model's output.

python webui.py -c ./configs/webui_configs.yml

After starting, open http://localhost:9090/?__theme=dark in your browser to see the project interface.
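
Because the server may need some time before it accepts connections, a script that opens the page automatically has to wait for the port first. The sketch below is one way to do that with the standard library only. `wait_for_port` is a hypothetical helper, and the host and port defaults are taken from the URL above.

```python
import socket
import time


def wait_for_port(host="localhost", port=9090, timeout=60.0, interval=0.5):
    """Return True once a TCP connection succeeds, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Once `wait_for_port()` returns True, the URL can be opened by hand or with `webbrowser.open("http://localhost:9090/?__theme=dark")`.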

3.1 Single Function Demo

Because some users' personal computers have limited GPU performance, we provide single-function demos that run one specific AI function without starting the entire project.

  1. Image Segmentation
  • Panorama segmentation
  • Segmentation based on point coordinates
  • Segmentation based on textual prompts
python webui.py -c ./configs/segmentation_demo.yml

  2. Speech Recognition
  • Multilingual speech recognition (e.g., Chinese and English)
python webui.py -c ./configs/asr_demo.yml

  3. Speech Synthesis
  • Multilingual speech synthesis (e.g., Chinese and English)
python webui.py -c ./configs/tts_demo.yml

3.2 Combined Function Demo

More complex functions are built by combining multiple AI models, which requires more GPU resources.

  1. Chatbot
  • Text-based chatbot
  • Voice-based chatbot
python webui.py -c ./configs/chatbot_demo.yml

  2. Video Restoration
  • Watermark removal
  • Mosaic removal
  • Object tracking
  • Object removal in videos
python webui.py -c ./configs/video_inpainter_demo.yml

  3. Video Conversion
  • Audio-video separation
  • Image cropping
  • Image noise addition
  • Frame extraction
  • Speech recognition
  • Subtitle translation
  • Speech synthesis
  • BGM addition
  • One-click video generation (automatic video replication from the internet)
python webui.py -c ./configs/video_convertor_demo.yml
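
Every demo is started the same way, with only the config file changing. The sketch below collects the config paths named in this README into one table and builds the matching command. The short demo names and the `build_command` helper are hypothetical conveniences, not part of the project.

```python
import sys

# Config files named in this README; the short keys are made up for illustration.
DEMO_CONFIGS = {
    "segmentation": "./configs/segmentation_demo.yml",
    "asr": "./configs/asr_demo.yml",
    "tts": "./configs/tts_demo.yml",
    "chatbot": "./configs/chatbot_demo.yml",
    "video_inpainter": "./configs/video_inpainter_demo.yml",
    "video_convertor": "./configs/video_convertor_demo.yml",
    "all": "./configs/webui_configs.yml",
}


def build_command(demo, python=sys.executable):
    """Return the argv list that starts webui.py with the config for `demo`."""
    try:
        config = DEMO_CONFIGS[demo]
    except KeyError:
        choices = ", ".join(sorted(DEMO_CONFIGS))
        raise ValueError(f"unknown demo {demo!r}; choose from: {choices}") from None
    return [python, "webui.py", "-c", config]
```

From the project directory, `subprocess.run(build_command("asr"), check=True)` would then start the speech recognition demo.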

3.3 Launching All Functions

Enable all AI functions by running the following command:

python webui.py -c ./configs/webui_configs.yml

Because model loading takes a long time, it is recommended to defer loading each model until its first inference rather than loading everything at startup. The "init_model_when_start_server" option in the configs/base.yml configuration file controls the loading strategy of each AI model.
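
The two loading strategies can be illustrated with a small wrapper. This is a sketch of the general lazy-loading idea and not the project's own implementation: `LazyModel` and its `loader` argument are hypothetical, and only the flag name `init_model_when_start_server` comes from the configuration described above.

```python
class LazyModel:
    """Wrap a loader so the model is built either at startup or on first use."""

    def __init__(self, loader, init_model_when_start_server=False):
        self._loader = loader
        self._model = None
        if init_model_when_start_server:
            # Eager strategy: pay the loading cost when the server starts.
            self._model = loader()

    @property
    def loaded(self):
        return self._model is not None

    def __call__(self, *args, **kwargs):
        if self._model is None:
            # Lazy strategy: load on the first inference request.
            self._model = self._loader()
        return self._model(*args, **kwargs)
```

With the flag left at False, startup stays fast and the first request to each function absorbs that model's loading time. With it set to True, startup is slower but the first request is not delayed.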

🔥 4. Model Files

4.1 Model File Downloads

Model                      Model File Size  Small Model List  Download Link
chatglm2-6b-int4           3.7G                               Baidu Netdisk
chatglm2-6b                12G                                Tsinghua University Cloud Disk
sam_vit_b                  358M                               Baidu Netdisk
sam_vit_h                  2.4G                               Baidu Netdisk
FastSAM-s                  23M                                Baidu Netdisk
FastSAM-x                  138M                               Baidu Netdisk
ProPainter                 150M                               Baidu Netdisk
raft-things                20M                                Baidu Netdisk
recurrent_flow_completion  19M                                Baidu Netdisk
cutie                      134M                               Baidu Netdisk
whisper-small              461M                               Baidu Netdisk
whisper-large-v3           2.9G                               Baidu Netdisk
  • The extraction code for Baidu Netdisk is: zogk

4.2 Directory Structure of Model Weight Files

model_weights/
├── chatglm
│   └── chatglm2-6b-int4
│       ├── config.json
│       ├── configuration_chatglm.py
│       ├── modeling_chatglm.py
│       ├── pytorch_model.bin
│       ├── quantization.py
│       ├── tokenization_chatglm.py
│       ├── tokenizer.model
│       └── tokenizer_config.json
├── fastsam
│   ├── FastSAM-s.pt
│   └── FastSAM-x.pt
├── propainter
│   ├── ProPainter.pth
│   ├── cutie-base-mega.pth
│   ├── raft-things.pth
│   └── recurrent_flow_completion.pth
├── sam
│   ├── sam_vit_b.pth
│   └── sam_vit_h.pth
└── whisper
    ├── large-v3.pt
    └── small.pt
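
A quick way to confirm that downloaded files ended up in the right place is to compare the disk against the tree above. The sketch below does this with the standard library. `missing_weights` is a hypothetical helper, and the path list simply copies the weight files from the tree. Which files are actually required depends on the functions you enable, so treat a missing entry as a hint rather than an error.

```python
from pathlib import Path

# Relative paths copied from the directory tree above.
EXPECTED_WEIGHTS = [
    "chatglm/chatglm2-6b-int4/pytorch_model.bin",
    "fastsam/FastSAM-s.pt",
    "fastsam/FastSAM-x.pt",
    "propainter/ProPainter.pth",
    "propainter/cutie-base-mega.pth",
    "propainter/raft-things.pth",
    "propainter/recurrent_flow_completion.pth",
    "sam/sam_vit_b.pth",
    "sam/sam_vit_h.pth",
    "whisper/large-v3.pt",
    "whisper/small.pt",
]


def missing_weights(root="model_weights", expected=EXPECTED_WEIGHTS):
    """Return the expected weight files that are absent under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).is_file()]
```

Calling `missing_weights()` from the project directory returns an empty list when every listed file is present.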

If your GPU has less than 8 GB of memory, you may need to run the project with the small models. Their results may not be ideal, so use the large models when your hardware allows.
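
The 8 GB rule of thumb can be written down as a small selection function. This is only a sketch: `pick_models` is hypothetical, and the small/large pairing of each model family is an assumption inferred from the file sizes in section 4.1, not something the project states explicitly.

```python
# (small, large) variants per model family; the pairing is assumed from the
# file sizes listed in section 4.1.
MODEL_VARIANTS = {
    "chatglm": ("chatglm2-6b-int4", "chatglm2-6b"),
    "sam": ("sam_vit_b", "sam_vit_h"),
    "fastsam": ("FastSAM-s", "FastSAM-x"),
    "whisper": ("whisper-small", "whisper-large-v3"),
}


def pick_models(gpu_memory_gb, threshold_gb=8.0):
    """Choose the small variants below the threshold, the large ones otherwise."""
    use_small = gpu_memory_gb < threshold_gb
    return {
        family: (small if use_small else large)
        for family, (small, large) in MODEL_VARIANTS.items()
    }
```

For example, `pick_models(6)` selects the small variant of every family, while `pick_models(24)` selects the large ones.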

5. Contributing

If you have any suggestions or feature requests, please feel free to create an issue.

6. References

  • Segment-and-Track-Anything
  • ProPainter
  • ChatGLM2-6B
  • segment-anything
  • FastSAM
  • whisper
