The vaksanca from cyfer0618

Vāksañcayaḥ, a Sanskrit speech corpus, contains more than 78 hours of audio: recordings of 45,953 sentences at a sampling rate of 22 kHz. The content consists mainly of readings of various texts spanning many Śāstras of Saṃskṛt literature, and also includes contemporary stories, radio programs, extempore discourses, etc. The summary datasheet associated with this corpus can be accessed here - Link. Please download the corpus from https://www.cse.iitb.ac.in/~asr/.

Environments

Python version: 3.7.3
Model files
- The speakers used in the train, validation, test, and out-of-domain test splits are listed in the README file of the corpus.
- SRILM LM link
Results for different models
- In-domain test data WER: 21.94 for the best-performing model (SLP1 as the script and BPE splits as the LM unit).
- Out-of-domain test data WER for individual speakers is reported in the paper.

Recipe

This Kaldi recipe is based on subword units: vowel split and byte pair encoding (BPE). For the word-based models, we used the Wall Street Journal (WSJ) recipe.
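
To illustrate what BPE splits are, here is a minimal sketch of how byte pair encoding learns merges from word frequencies. It is a generic textbook version, not the code used in this recipe: the function names, the `</w>` end-of-word marker, and the toy SLP1-style words are all assumptions made for illustration.

```python
from collections import Counter


def get_pair_counts(vocab):
    """Count adjacent symbol pairs over a {tuple_of_symbols: frequency} vocabulary."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Rewrite the vocabulary so every occurrence of `pair` becomes one symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(word_freqs, num_merges):
    """Greedily learn up to `num_merges` merges; return (merges, final vocabulary)."""
    # Start from single characters plus a hypothetical end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab


# Toy word counts (illustrative only, not taken from the corpus).
toy_counts = {"rAmaH": 5, "rAmasya": 3, "vanam": 2}
merges, vocab = learn_bpe(toy_counts, num_merges=2)
```

Each learned merge joins the most frequent adjacent pair, so frequent character sequences gradually become single LM units while rare words stay split into smaller pieces.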

Training

Download the vowel splitter (it requires the text to be in SLP1 format)

Download the pre-trained model

Download the processed dataset

Before testing, convert the test audio files from .mp3 to .wav using the script provided with the corpus.
We used our best-performing model (SLP1 as the script and BPE splits as the LM unit) to test on the out-of-domain data.
In-domain test data link (test.zip)
Out-of-domain test data link (truetest.zip)
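
The script shipped with the corpus is the recommended way to convert the audio. If it is not at hand, the conversion can be sketched as below, assuming `ffmpeg` is installed and on the `PATH`. The function names, the mono output, and the 22050 Hz target rate are assumptions made for illustration; check the corpus script for the settings the recipe actually expects.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(mp3_path, wav_path, sample_rate=22050):
    """Build an ffmpeg command that converts one .mp3 file to a mono .wav file."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", str(mp3_path),       # input file
        "-ac", "1",                # one audio channel (assumed: mono)
        "-ar", str(sample_rate),   # output sampling rate in Hz (assumed value)
        str(wav_path),
    ]


def convert_directory(src_dir, dst_dir, sample_rate=22050, dry_run=False):
    """Convert every .mp3 in src_dir to a .wav in dst_dir; return the commands."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for mp3_path in sorted(src_dir.glob("*.mp3")):
        wav_path = dst_dir / (mp3_path.stem + ".wav")
        cmd = build_ffmpeg_command(mp3_path, wav_path, sample_rate)
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises if ffmpeg fails
    return commands
```

Calling `convert_directory("test_mp3", "test_wav", dry_run=True)` returns the commands without running them, which is a convenient way to check paths before converting a large test set. The directory names here are placeholders.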

Evaluate

From pre-trained model (SLP vowel split)

./decode.sh test
# | WER : 18.12
./decode.sh truetest
# | WER : 34.88
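
For reference, WER is the word-level edit distance (substitutions, deletions, and insertions) between the reference and the hypothesis, divided by the number of reference words and expressed as a percentage. The sketch below shows that standard definition; it is not the scoring code used by `decode.sh`, and the function names are assumptions made for illustration.

```python
def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two word lists (unit cost per edit)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def corpus_wer(references, hypotheses):
    """WER in percent over paired lists of reference and hypothesis sentences."""
    total_errors = 0
    total_ref_words = 0
    for reference, hypothesis in zip(references, hypotheses):
        ref_words = reference.split()
        total_errors += word_edit_distance(ref_words, hypothesis.split())
        total_ref_words += len(ref_words)
    if total_ref_words == 0:
        raise ValueError("references contain no words")
    return 100.0 * total_errors / total_ref_words
```

Note that a corpus-level WER sums errors and reference words over all utterances before dividing, so it generally differs from the average of per-utterance WERs.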

Publications

Devaraja Adiga and Rishabh Kumar and Amrith Krishna and Preethi Jyothi and Ganesh Ramakrishnan and Pawan Goyal, Automatic Speech Recognition in Sanskrit: A New Speech Corpus and Modelling Insights, In ACL 2021.
