
speech-to-intent-dataset's Introduction

Skit-S2I Dataset

Dataset Release for Intent Classification task from Speech

About

This is a dataset for intent classification from human speech, covering 14 coarse-grained intents from the Banking domain. The work is inspired by a similar release, the Minds-14 dataset; here, we restrict ourselves to Indian English but provide a much larger training set. The dataset is split into:

  • test - 100 samples per intent
  • train - >650 samples per intent

The data was generated by 11 (Indian English) speakers, recorded over a telephony line. We also provide anonymised speaker information, such as gender, languages spoken, and native language, to allow more structured discussions around robustness and bias in the models you train.

Download and Usage

The dataset can be downloaded by clicking on this link. In case you face any issues, please reach out to [email protected].

This dataset is shared under the Creative Commons Attribution-NonCommercial 4.0 International Licence, which restricts commercial use of the dataset.

Uses

Most spoken dialog systems use a pipeline of speech recognition followed by intent classification, and optimise each stage individually. But this allows ASR errors to leak downstream. Instead, what if we train end-to-end intent models on speech? More importantly, how well would such models generalise in a language like Indian English, given the diversity of speech behaviours? This dataset is an attempt towards answering such questions around robustness and model bias.
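One simple way to start looking at the bias question is to break accuracy down by speaker group. The following is a minimal sketch, not part of the released baselines: the function name `accuracy_by_group` and its input shapes are hypothetical, and the predictions are assumed to come from whatever model you trained.

```python
from collections import defaultdict

def accuracy_by_group(examples, speaker_to_group):
    """Compute intent accuracy separately for each speaker group.

    examples: iterable of (speaker_id, true_intent, predicted_intent) tuples.
    speaker_to_group: dict mapping speaker_id to a group label, for example
    gender or native language (hypothetical input shape).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for speaker_id, true_intent, predicted_intent in examples:
        group = speaker_to_group[speaker_id]
        total[group] += 1
        if true_intent == predicted_intent:
            correct[group] += 1
    # Every group in `total` has at least one example, so no division by zero.
    return {group: correct[group] / total[group] for group in total}
```

A large gap between groups would be a hint, not proof, that the model generalises unevenly; with only 11 speakers, per-group numbers should be read with care.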

Structure

This release contains (Indian English) speech samples, each tagged with an intent from the Banking domain. It also includes the transcript template used to generate each sample.

Audio Quality: 8 kHz, 16-bit
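To confirm that a given file matches this format, its header can be read with Python's standard `wave` module. This is a minimal sketch: the function name `check_wav_format` is hypothetical, and it assumes the files are uncompressed PCM WAV, which this README does not state explicitly.

```python
import wave

def check_wav_format(path, expected_rate=8000, expected_sample_width=2):
    """Return True if the WAV file at `path` is 8 kHz with 16-bit samples."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        width = wav_file.getsampwidth()  # bytes per sample; 2 bytes == 16 bits
    return rate == expected_rate and width == expected_sample_width
```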

Directory structure

- wav_audios          [contains the wav audio files]
- train.csv           [contains the train split, where each row contains "<id> | <intent-class> | <template> | <audio-path> | <speaker-id>"]
- test.csv            [contains the test split, where each row contains "<id> | <intent-class> | <template> | <audio-path> | <speaker-id>"]
- intent_info.csv     [contains information about the intents, where each row contains "<intent-class> | <intent-name> | <description>"]
- speaker_info.csv    [contains information about the speakers, where each row contains "<speaker-id> | <native-language> | <languages-spoken> | <places-lived> | <gender>"]
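The split files can be read with Python's standard `csv` module. The sketch below makes two assumptions that should be checked against the downloaded files: that fields are separated by `|` as in the row descriptions above, and that there is no header row. The function names and dictionary keys are illustrative, not part of the release.

```python
import csv
from collections import Counter

# Column order taken from the row description above; the key names are illustrative.
SPLIT_COLUMNS = ["id", "intent_class", "template", "audio_path", "speaker_id"]

def read_split(path, delimiter="|", has_header=False):
    """Read train.csv or test.csv into a list of dicts keyed by SPLIT_COLUMNS."""
    rows = []
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        if has_header:
            next(reader, None)  # skip the header row if the file has one
        for fields in reader:
            if not fields:
                continue  # ignore blank lines
            values = [field.strip() for field in fields]
            rows.append(dict(zip(SPLIT_COLUMNS, values)))
    return rows

def samples_per_intent(rows):
    """Count how many samples each intent class has."""
    return Counter(row["intent_class"] for row in rows)
```

If the files turn out to be comma-separated with a header row, `read_split("train.csv", delimiter=",", has_header=True)` would be the corresponding call. Comparing `samples_per_intent` against the split sizes listed in the About section is a quick sanity check on the download.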

More information regarding the dataset can be found in the datasheet.

Baselines

The code for the baselines is provided in the baselines directory.

Citation

If you are using this dataset, please cite using the link in the About section on the right.

License

Shield: CC BY-NC 4.0

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


speech-to-intent-dataset's People

Contributors

janaab11, shangeth


speech-to-intent-dataset's Issues

Hugging Face

Hi, do you have any plans to make this dataset available through Hugging Face?

Label Noise, Speech Noise, Cut Speech in the dataset

I ran a Dataset Cartography analysis on the training set, using an end-to-end SLU model based on a Whisper encoder. This analysis splits the dataset into three parts: easy, hard, and ambiguous samples.
[Figure: data maps]

After the split, I examined the hard samples to understand why the model finds them harder to learn. Listening to the audio, I found that a few samples were mislabelled, a few contained only noise with no speech, and a few had speech that was cut off partway. This analysis covered only the train set; the test set should be checked as well.
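For readers unfamiliar with the technique, the split described in this issue can be sketched as follows. This uses the usual Dataset Cartography definitions (confidence is the mean probability assigned to the labelled class across training epochs, variability is its standard deviation); the function names and the thresholds are illustrative assumptions, not the procedure actually used in the issue.

```python
from statistics import mean, pstdev

def cartography_stats(gold_probs_per_epoch):
    """Return (confidence, variability) for one training sample.

    gold_probs_per_epoch: the probability the model assigned to the sample's
    labelled intent at the end of each training epoch.
    """
    confidence = mean(gold_probs_per_epoch)
    variability = pstdev(gold_probs_per_epoch)  # population standard deviation
    return confidence, variability

def assign_region(confidence, variability, conf_threshold=0.5, var_threshold=0.2):
    """Bucket a sample as 'easy', 'hard' or 'ambiguous'.

    The thresholds are illustrative assumptions, not values from the issue.
    """
    if variability >= var_threshold:
        return "ambiguous"
    return "easy" if confidence >= conf_threshold else "hard"
```

Samples that land in the "hard" bucket (consistently low confidence) are the natural candidates for manual listening, since mislabelled or speech-free clips tend to end up there.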
