getttttttt / speech-to-text-emotional-generation

This Python project uses pretrained machine learning models to generate speech that carries the emotion underlying a text, with the aim of improving user experience in applications ranging from virtual assistants to interactive gaming. Dive into the code, experiment with different emotional tones, and contribute to making digital communication more human-like!

Speech-to-Text-Emotional-Generation

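Approach

The pipeline runs in two steps: recognition, then generation.

Step 1: Text emotion recognition (debugged and working)

This step automatically identifies the emotion type and its intensity from a given text. A pretrained emotion classification model takes the input text and returns an emotion category (such as joy) together with a score between 0 and 1. The score is the model's probability for that category, and a higher value means the emotion is expressed more clearly.

The function classify_text_emotion takes a piece of text as input, calls the model, and returns the emotion category and intensity. This provides the emotion information needed for speech generation in the next step.

Step 2: Emotional text-to-speech generation (implemented but not yet debugged)

Once the emotion type and intensity have been extracted from the text, the next step converts this information into speech. The tool used here supports transferring emotion between speakers to generate expressive speech.

In practice, an appropriate speaker ID and emotion ID are selected based on the emotion type and intensity from Step 1. The command-line script synthesize.py is then run; given the text, speaker ID, emotion ID, and other parameters, it synthesizes and outputs a speech file with the specified emotion.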
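The selection step described above could be sketched as a small lookup. This is only a minimal sketch: the label set is assumed to match the seven labels listed on the classifier's model card (anger, disgust, fear, joy, neutral, sadness, surprise), while the numeric emotion IDs, the `NEUTRAL_THRESHOLD` value, and the helper name `select_emotion_id` are hypothetical. The real IDs depend on the dataset used to train Cross-Speaker-Emotion-Transfer and should be taken from that project's preprocessed data.

```python
# Hypothetical mapping from classifier labels to synthesizer emotion IDs.
# The IDs below are placeholders; replace them with the IDs defined by
# the dataset the speech model was trained on.
EMOTION_IDS = {
    "neutral": 0,
    "joy": 1,
    "sadness": 2,
    "anger": 3,
    "fear": 4,
    "disgust": 5,
    "surprise": 6,
}

# Assumed cut-off: below this score the prediction is treated as too weak
# to justify an emotional voice, so the neutral ID is used instead.
NEUTRAL_THRESHOLD = 0.5


def select_emotion_id(emotion_type, emotion_intensity,
                      threshold=NEUTRAL_THRESHOLD):
    """Return the emotion ID to pass to the synthesizer."""
    label = emotion_type.lower()
    # Unknown labels and weak predictions both fall back to neutral.
    if label not in EMOTION_IDS or emotion_intensity < threshold:
        return EMOTION_IDS["neutral"]
    return EMOTION_IDS[label]


if __name__ == "__main__":
    print(select_emotion_id("joy", 0.97))
    print(select_emotion_id("joy", 0.30))
```

Falling back to neutral for weak or unknown predictions is one possible design choice; scaling the emotional strength with the intensity would need support from the synthesizer itself.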

Step1 Text emotion recognition

Ref: https://huggingface.co/michellejieli/emotion_text_classifier

from transformers import pipeline

# Load the pretrained emotion classifier once at import time
classifier = pipeline("sentiment-analysis", model="michellejieli/emotion_text_classifier")

def classify_text_emotion(text):
    """Return (emotion label, score in [0, 1]) for the top prediction."""
    results = classifier(text)
    emotion_type = results[0]['label']
    emotion_intensity = results[0]['score']
    return emotion_type, emotion_intensity

if __name__ == '__main__':
    text = 'I love life!'
    emotion_type, emotion_intensity = classify_text_emotion(text)
    print(f"Emotion Type: {emotion_type}, Intensity: {emotion_intensity}")

This extracts the Emotion Type and Intensity of the text.
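For reference, a text-classification pipeline typically returns a list of dictionaries with `label` and `score` keys, which is what `results[0]` indexes above. The following sketch shows that shape with made-up values and a hypothetical helper, `top_emotion`, that picks the highest-scoring entry. It also flattens one level of nesting, since some configurations return one list of predictions per input. The sample scores are illustrative, not real model output.

```python
def top_emotion(results):
    """Return (label, score) of the highest-scoring prediction."""
    # Some pipeline configurations return one list of predictions per input;
    # flatten that extra level so both shapes are handled.
    if results and isinstance(results[0], list):
        results = results[0]
    if not results:
        raise ValueError("classifier returned no predictions")
    best = max(results, key=lambda r: r["score"])
    return best["label"], best["score"]


if __name__ == "__main__":
    # Illustrative values only, not real model output.
    sample = [
        {"label": "joy", "score": 0.91},
        {"label": "surprise", "score": 0.05},
        {"label": "neutral", "score": 0.04},
    ]
    print(top_emotion(sample))
    print(top_emotion([sample]))
```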

Step2 Text-to-Speech

Ref: https://github.com/keonlee9420/Cross-Speaker-Emotion-Transfer

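The result extracted in Step1 is passed as input to Step2, which uses speech synthesis to generate emotional speech. (For environment setup, see the Ref.)

The example command from the GitHub repository is: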

python3 synthesize.py --text "YOUR_DESIRED_TEXT" --speaker_id SPEAKER_ID --emotion_id EMOTION_ID --restore_step RESTORE_STEP --mode single --dataset DATASET

It outputs an audio file.

The code based on this is:

import subprocess
from transformers import pipeline

classifier = pipeline("sentiment-analysis", model="michellejieli/emotion_text_classifier")

def classify_text_emotion(text):
    results = classifier(text)
    emotion_type = results[0]['label']
    emotion_intensity = results[0]['score']
    return emotion_type, emotion_intensity

def synthesize_emotional_speech(text, speaker_id, emotion_id, restore_step, dataset):
    command = [
        'python3', './Cross-Speaker-Emotion-Transfer/synthesize.py',
        '--text', text,
        '--speaker_id', str(speaker_id),
        '--emotion_id', str(emotion_id),
        '--restore_step', str(restore_step),
        '--mode', 'single',
        '--dataset', dataset
    ]
    # check=True raises CalledProcessError if synthesize.py exits with an error
    subprocess.run(command, check=True)

if __name__ == '__main__':
    text = 'I love life!'
    emotion_type, emotion_intensity = classify_text_emotion(text)
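    # Due to time constraints, the output of Step 1 is not yet wired into the
    # input of Step 2 (the IDs below are hard-coded), but this is the rough logic.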
    speaker_id = 1
    emotion_id = 3  
    restore_step = 100000
    dataset = 'your_dataset_name'
    synthesize_emotional_speech(text, speaker_id, emotion_id, restore_step, dataset)
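One way to close the gap noted in the comment above is a small glue function that feeds the classifier's label into the synthesizer's emotion ID. The sketch below takes the classifier and synthesizer as arguments so the wiring can be exercised without loading a model or a checkpoint. The function name `text_to_emotional_speech`, the `emotion_ids` table, and the stand-in callables are all hypothetical, and the real ID values depend on the dataset.

```python
def text_to_emotional_speech(text, classify, synthesize, emotion_ids,
                             fallback_label="neutral"):
    """Classify `text`, pick an emotion ID, and call the synthesizer.

    classify:    callable returning (emotion_type, emotion_intensity)
    synthesize:  callable accepting (text, emotion_id)
    emotion_ids: dict mapping classifier labels to synthesizer emotion IDs
    """
    emotion_type, emotion_intensity = classify(text)
    # Unknown labels fall back to the neutral ID rather than failing.
    emotion_id = emotion_ids.get(emotion_type, emotion_ids[fallback_label])
    synthesize(text, emotion_id)
    return emotion_type, emotion_intensity, emotion_id


if __name__ == "__main__":
    # Stand-ins for the real classifier and synthesizer (assumptions).
    def fake_classify(text):
        return "joy", 0.9

    def fake_synthesize(text, emotion_id):
        print(f"would synthesize {text!r} with emotion_id={emotion_id}")

    ids = {"neutral": 0, "joy": 1}  # placeholder IDs
    text_to_emotional_speech("I love life!", fake_classify, fake_synthesize, ids)
```

With the functions defined earlier, `classify` could be `classify_text_emotion`, and `synthesize` could be a `lambda` that forwards to `synthesize_emotional_speech` with a fixed `speaker_id`, `restore_step`, and `dataset`.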
