Nue ASR

[Paper] [Model card]

This repository contains the code for Nue ASR, an end-to-end speech recognition model that integrates pre-trained speech and language models.

The name Nue comes from the Japanese word 鵺 (ぬえ, Nue), a legendary creature (妖怪/ようかい/Yōkai) in Japanese folklore.

This model provides end-to-end Japanese speech recognition with accuracy comparable to recent ASR models. With a GPU, it can recognize speech faster than real time.
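"Faster than real time" is commonly expressed as a real-time factor (RTF): processing time divided by audio duration, where a value below 1.0 means the audio is transcribed faster than it plays. The sketch below shows one way to measure it. The helper names `real_time_factor` and `timed` are hypothetical and not part of this repository, and no RTF figure is claimed here.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time divided by audio duration (below 1.0 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, wrapping a transcription call in `timed(...)` and passing the elapsed time together with the clip length to `real_time_factor` gives the RTF for that clip.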

Benchmark scores, including our models, can be found at https://rinnakk.github.io/research/benchmarks/asr/

Setup

We tested our code using Python 3.8.10 and 3.10.12 with PyTorch 2.1.1 and Transformers 4.35.2. This codebase is expected to be compatible with Python 3.8 or later and recent PyTorch versions. The version of Transformers should be 4.33.0 or higher.
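The version requirements above can be checked before installation with a short script. This is a minimal sketch using only the standard library; `parse_version` and `check_environment` are hypothetical helpers, and the version parsing is deliberately simple (it reads leading integers and ignores pre-release suffixes).

```python
import sys

MIN_PYTHON = (3, 8)              # "Python 3.8 or later"
MIN_TRANSFORMERS = (4, 33, 0)    # "4.33.0 or higher"


def parse_version(text):
    """Turn '4.35.2' or '2.1.1+cu121' into a tuple of at least three integers."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)  # pad so that '4.33' compares equal to '4.33.0'
    return tuple(parts)


def check_environment():
    """Return a list of human-readable problems; an empty list means all checks passed."""
    if sys.version_info[:2] < MIN_PYTHON:
        return ["Python 3.8 or later is required"]
    # importlib.metadata exists from Python 3.8, so import it after the check above.
    from importlib import metadata

    problems = []
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        problems.append("transformers is not installed")
    else:
        if parse_version(installed) < MIN_TRANSFORMERS:
            problems.append("transformers %s is older than 4.33.0" % installed)
    return problems
```

Calling `check_environment()` and printing the returned list is enough to see whether the environment meets the stated minimums.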

First, install the inference code for this model.

pip install git+https://github.com/rinnakk/nue-asr.git

Both a command-line interface and a Python interface are available.

Command-line usage

The following command transcribes an audio file using the command-line interface. Audio files are automatically downsampled to 16 kHz.

nue-asr audio1.wav

You can specify multiple audio files.

nue-asr audio1.wav audio2.flac audio3.mp3

DeepSpeed-Inference can accelerate inference for the GPT-NeoX module. To use it, install DeepSpeed first.

pip install deepspeed

Then, you can use DeepSpeed-Inference as follows:

nue-asr --use-deepspeed audio1.wav
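When the CLI is driven from a script, the commands shown above can be assembled programmatically. The sketch below uses only the `nue-asr` executable and the `--use-deepspeed` flag that appear in this section; `build_command` and `run_nue_asr` are hypothetical helpers, and the raw standard output is returned unparsed because its format is not specified here.

```python
import shutil
import subprocess


def build_command(audio_files, use_deepspeed=False):
    """Build the argument list for one nue-asr invocation."""
    if not audio_files:
        raise ValueError("at least one audio file is required")
    command = ["nue-asr"]
    if use_deepspeed:
        command.append("--use-deepspeed")
    command.extend(str(path) for path in audio_files)
    return command


def run_nue_asr(audio_files, use_deepspeed=False):
    """Run nue-asr and return whatever it printed to standard output."""
    if shutil.which("nue-asr") is None:
        raise FileNotFoundError("nue-asr was not found on PATH; install the package first")
    completed = subprocess.run(
        build_command(audio_files, use_deepspeed),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.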

Run nue-asr --help for more information.

Python usage

An example of the Python interface is as follows:

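import nue_asr

# Load the model and its tokenizer by name.
model = nue_asr.load_model("rinna/nue-asr")
tokenizer = nue_asr.load_tokenizer("rinna/nue-asr")

# Transcribe one audio file and print the recognized text.
result = nue_asr.transcribe(model, tokenizer, "path_to_audio.wav")
print(result.text)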

In addition to audio file paths, the nue_asr.transcribe function accepts audio data as either a numpy.array or a torch.Tensor.
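Passing in-memory audio might look like the sketch below. It assumes that array input should already be mono at 16 kHz; this section only states that audio files are downsampled automatically, so treat that as an assumption to verify. `read_wav_mono` and `transcribe_samples` are hypothetical helpers, and the reader handles 16-bit PCM WAV files only.

```python
import array
import sys
import wave

TARGET_SAMPLE_RATE = 16000  # assumption: in-memory audio is expected at 16 kHz


def read_wav_mono(path):
    """Read a 16-bit PCM WAV file and return (mono float samples, sample rate)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM WAV files are handled in this sketch")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        pcm = array.array("h")
        pcm.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV sample data is little-endian
    # Average the channels of each frame, then scale to the range [-1.0, 1.0).
    samples = [
        sum(pcm[i:i + channels]) / (channels * 32768.0)
        for i in range(0, len(pcm), channels)
    ]
    return samples, rate


def transcribe_samples(model, tokenizer, samples, rate):
    """Pass mono samples to nue_asr.transcribe as a float32 numpy array."""
    if rate != TARGET_SAMPLE_RATE:
        raise ValueError("resample the audio to 16 kHz before transcribing")
    # Imported here so that read_wav_mono works without the heavy dependencies.
    import numpy as np
    import nue_asr

    waveform = np.asarray(samples, dtype=np.float32)
    return nue_asr.transcribe(model, tokenizer, waveform).text
```

With `model` and `tokenizer` loaded as shown earlier, `transcribe_samples(model, tokenizer, *read_wav_mono("path_to_audio.wav"))` returns the recognized text.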

DeepSpeed-Inference acceleration is also available in the Python interface.

import nue_asr

# use_deepspeed=True requires the deepspeed package to be installed.
model = nue_asr.load_model("rinna/nue-asr", use_deepspeed=True)
tokenizer = nue_asr.load_tokenizer("rinna/nue-asr")

result = nue_asr.transcribe(model, tokenizer, "path_to_audio.wav")
print(result.text)
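Since DeepSpeed is an optional dependency, a script may want to enable it only when it is installed. The following is a minimal sketch of that pattern; `module_available` and `load_nue_asr` are hypothetical helpers, while `load_model`, `load_tokenizer`, and the `use_deepspeed` argument are the ones shown above.

```python
import importlib.util

MODEL_NAME = "rinna/nue-asr"


def module_available(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def load_nue_asr(prefer_deepspeed=True):
    """Load the model and tokenizer, enabling DeepSpeed only if it is installed."""
    import nue_asr  # imported here so module_available can be used on its own

    use_deepspeed = prefer_deepspeed and module_available("deepspeed")
    model = nue_asr.load_model(MODEL_NAME, use_deepspeed=use_deepspeed)
    tokenizer = nue_asr.load_tokenizer(MODEL_NAME)
    return model, tokenizer, use_deepspeed
```

The third return value reports whether DeepSpeed was actually enabled, which is useful when comparing timings across machines.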

How to cite

@article{hono2023integration,
    title={An Integration of Pre-Trained Speech and Language Models for End-to-End Speech Recognition},
    author={Hono, Yukiya and Mitsuda, Koh and Zhao, Tianyu and Mitsui, Kentaro and Wakatsuki, Toshiaki and Sawada, Kei},
    journal={arXiv preprint arXiv:2312.03668},
    year={2023}
}

@misc{rinna-nue-asr,
    title={rinna/nue-asr},
    author={Hono, Yukiya and Mitsuda, Koh and Zhao, Tianyu and Mitsui, Kentaro and Wakatsuki, Toshiaki and Sawada, Kei},
    url={https://huggingface.co/rinna/nue-asr}
}

License

The Apache 2.0 license
