common-voice-stats's Introduction

Common Voice

This is a living document on all things related to the Common Voice project.

Feel free to make suggestions!

How to contribute to Numbers + Yes/No

If you want to help with the numbers + yes/no data collection project, you're in the right place! There are three ways to contribute:

  1. You don't have a GitHub account: email me at [email protected] with (1) the language name, (2) the words you're contributing as {english_word: your_translation}, and (3) whether you are a native speaker of the language (YES/NO).
  2. You don't want to send a Pull Request: open a GitHub Issue with the same info as above.
  3. Send a Pull Request (preferred): submit your changes to the table as a PR.
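
As an illustration of the three items requested above, a contribution could be organised like this before it goes into an email or issue. This is only a sketch: the key names (`language`, `words`, `native_speaker`) and the helper `format_contribution` are hypothetical and not a format the project prescribes. The sample words are copied from the Basque row of the numbers table in this document.

```python
# Hypothetical layout for a Numbers + Yes/No contribution.
# The key names are illustrative assumptions, not a project requirement.
contribution = {
    "language": "Basque",
    "words": {
        "zero": "zero",
        "one": "bat",
        "two": "bi",
        "yes": "bai",
        "no": "ez",
    },
    "native_speaker": True,
}


def format_contribution(entry):
    """Render the three requested items as plain text for an email or issue."""
    lines = [f"(1) Language: {entry['language']}", "(2) Words:"]
    for english_word, translation in entry["words"].items():
        lines.append(f"    {english_word}: {translation}")
    answer = "YES" if entry["native_speaker"] else "NO"
    lines.append(f"(3) Native speaker: {answer}")
    return "\n".join(lines)


print(format_contribution(contribution))
```

Keeping the English word as the key makes it easy to see which table column each translation belongs to.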

Licensing

Common Voice is released under the Creative Commons CC0 license.

Download

You can download the current release here: Common Voice Download

Current Release

Release Date

December 10, 2019

Language Statistics

LANGUAGE # HOURS # SPEAKERS LANGUAGE FAMILY
Abkhaz <1 hour (validated); <1 hour (total) 3 speakers (reported: 2% female / 98% male) Northwest Caucasian
Arabic 7 hours (validated); 12 hours (total) 228 speakers (reported: 24% female / 48% male) Afro-Asiatic
Basque 65 hours (validated); 99 hours (total) 638 speakers (reported: 23% female / 51% male) Language Isolate
Breton 5 hours (validated); 12 hours (total) 133 speakers (reported: 2% female / 55% male) Indo-European
Catalan 245 hours (validated); 295 hours (total) 3,724 speakers (reported: 35% female / 43% male) Indo-European
Cantonese (Hong Kong) <1 hour (validated); <1 hour (total) 15 speakers (reported: 24% female / 37% male) Sino-Tibetan
Chuvash <1 hour (validated); 2 hours (total) 38 speakers (reported: 0% female / 47% male) Turkic
Dhivehi 6 hours (validated); 8 hours (total) 101 speakers (reported: 64% female / 28% male) Indo-European
Dutch 24 hours (validated); 33 hours (total) 701 speakers (reported: 10% female / 66% male) Indo-European
English 1,118 hours (validated); 1,488 hours (total) 51,072 speakers (reported: 13% female / 46% male) Indo-European
Esperanto 35 hours (validated); 41 hours (total) 215 speakers (reported: 7% female / 70% male) Indo-European
Estonian 10 hours (validated); 13 hours (total) 230 speakers (reported: 38% female / 57% male) Uralic
French 350 hours (validated); 412 hours (total) 8,164 speakers (reported: 12% female / 65% male) Indo-European
German 483 hours (validated); 538 hours (total) 8,460 speakers (reported: 9% female / 67% male) Indo-European
Hakha Chin 2 hours (validated); 5 hours (total) 290 speakers (reported: 20% female / 23% male) Sino-Tibetan
Indonesian 3 hours (validated); 3 hours (total) 56 speakers (reported: 4% female / 82% male) Austronesian
Interlingua 1 hour (validated); 3 hours (total) 12 speakers (reported: 2% female / 94% male) Indo-European
Irish 2 hours (validated); 4 hours (total) 80 speakers (reported: 16% female / 59% male) Indo-European
Italian 85 hours (validated); 122 hours (total) 4,292 speakers (reported: 18% female / 47% male) Indo-European
Japanese 3 hours (validated); 3 hours (total) 52 speakers (reported: 0% female / 81% male) Japonic
Kabyle 262 hours (validated); 276 hours (total) 693 speakers (reported: 22% female / 55% male) Afro-Asiatic
Kinyarwanda <1 hour (validated); 17 hours (total) 129 speakers (reported: 8% female / 41% male) Niger-Congo
Kyrgyz 11 hours (validated); 21 hours (total) 119 speakers (reported: 44% female / 45% male) Turkic
Latvian 4 hours (validated); 6 hours (total) 86 speakers (reported: 17% female / 64% male) Indo-European
Mandarin (China) 26 hours (validated); 31 hours (total) 963 speakers (reported: 10% female / 64% male) Sino-Tibetan
Mandarin (Taiwan) 42 hours (validated); 60 hours (total) 1,108 speakers (reported: 26% female / 48% male) Sino-Tibetan
Mongolian 9 hours (validated); 12 hours (total) 296 speakers (reported: 25% female / 36% male) Mongolic
Odia 0.8 hours (validated); 1.2 hours (total) 9 speakers (reported: 13% female / 46% male) Indo-European
Persian 211 hours (validated); 255 hours (total) 2,763 speakers (reported: 6% female / 78% male) Indo-European
Portuguese 27 hours (validated); 29 hours (total) 354 speakers (reported: 2% female / 89% male) Indo-European
Romansh Sursilvan <1 hour (validated); <1 hour (total) 3 speakers (reported: 0% female / 75% male) Indo-European
Russian 72 hours (validated); 76 hours (total) 496 speakers (reported: 23% female / 71% male) Indo-European
Sakha 3 hours (validated); 6 hours (total) 37 speakers (reported: 10% female / 54% male) Turkic
Slovenian 3 hours (validated); 6 hours (total) 51 speakers (reported: 16% female / 80% male) Indo-European
Spanish 167 hours (validated); 221 hours (total) 8,252 speakers (reported: 10% female / 55% male) Indo-European
Swedish 5 hours (validated); 6 hours (total) 99 speakers (reported: 8% female / 74% male) Indo-European
Tamil 3 hours (validated); 4 hours (total) 91 speakers (reported: 10% female / 67% male) Dravidian
Tatar 25 hours (validated); 27 hours (total) 142 speakers (reported: 2% female / 81% male) Turkic
Turkish 13 hours (validated); 14 hours (total) 461 speakers (reported: 8% female / 74% male) Turkic
Votic <1 hour (validated); <1 hour (total) 2 speakers (reported: 0% female / 0% male) Uralic
Welsh 59 hours (validated); 77 hours (total) 1,149 speakers (reported: 18% female / 29% male) Indo-European
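
If you want to work with these statistics programmatically, the rows above follow a regular enough pattern to be parsed with a regular expression. The following is a minimal sketch, assuming each row is a single line of text in the layout shown in this table; `ROW_PATTERN`, `parse_row`, and `hours_as_float` are hypothetical names introduced here and are not part of any Common Voice tooling.

```python
import re

# Pattern for one flattened row of the Language Statistics table.
# "hours?" accepts both "hour" and "hours"; a value such as "<1" is kept as text.
ROW_PATTERN = re.compile(
    r"^(?P<language>.+?)\s+"
    r"(?P<validated><?[\d.,]+)\s+hours?\s+\(validated\);\s+"
    r"(?P<total><?[\d.,]+)\s+hours?\s+\(total\)\s+"
    r"(?P<speakers>[\d,]+)\s+speakers\s+"
    r"\(reported:\s+(?P<female>\d+)%\s+female\s+/\s+(?P<male>\d+)%\s+male\)\s+"
    r"(?P<family>.+)$"
)


def parse_row(line):
    """Split one table row into a dict of named fields."""
    match = ROW_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised row: {line!r}")
    row = match.groupdict()
    row["speakers"] = int(row["speakers"].replace(",", ""))
    row["female"] = int(row["female"])
    row["male"] = int(row["male"])
    return row


def hours_as_float(text):
    """Convert '1,118' or '0.8' to a float; return None for bounds like '<1'."""
    if text.startswith("<"):
        return None
    return float(text.replace(",", ""))


row = parse_row(
    "English 1,118 hours (validated); 1,488 hours (total) "
    "51,072 speakers (reported: 13% female / 46% male) Indo-European"
)
validated = hours_as_float(row["validated"])
total = hours_as_float(row["total"])
print(row["language"], row["speakers"], f"{validated / total:.0%} of hours validated")
```

Hour values are left as text in the parsed row because entries such as "<1" are upper bounds rather than exact numbers; `hours_as_float` returns `None` for them so they are not silently counted as a full hour.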

Single-digit numbers + yes + no

WARNING: these words, numbers, and spellings are not guaranteed to be correct.

Use Case

The intended use case is talking to an automated system over the phone: how would you say these numbers if you were talking to a voice bot, counting out loud, or reading a long number aloud digit by digit?
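
To make the digit-by-digit case concrete, here is a minimal sketch of how a long number might be spelled out one digit at a time. The function name `read_digits` is hypothetical, and the word list is copied from the English row of the table in this section; swapping in another row's words would give that language's reading, subject to the warning above.

```python
# Digit words copied from the English row of the table in this section.
ENGLISH_DIGITS = [
    "zero", "one", "two", "three", "four",
    "five", "six", "seven", "eight", "nine",
]


def read_digits(number, digit_words=ENGLISH_DIGITS):
    """Spell out a number digit by digit, skipping spaces and other separators."""
    words = []
    for char in str(number):
        if char in "0123456789":
            words.append(digit_words[int(char)])
    return " ".join(words)


print(read_digits("4071 9"))  # prints "four zero seven one nine"
```

This is the reading a caller would use for something like an account number, as opposed to saying "four thousand and seventy-one".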

For YES/NO, how would you say "yes" or "no" if you were answering a simple question such as "Would you like to check your account balance?"

LANGUAGE 0 1 2 3 4 5 6 7 8 9 yes no native speaker verified?
Abkhaz акымзарак акы ҩба хԥа ԥшьба хәба фба быжьба ааба жәба ааи мап YES
Arabic صفر واحد إثنان ثلاثة أربعة خمسة ستة سبعة ثمانية تسعة نعم لا YES
Basque zero bat bi hiru lau bost sei zazpi zortzi bederatzi bai ez YES
Breton mann unan daou tri pevar pemp c'hwec'h seizh eizh nav ya nann NO
Cantonese (Hong Kong) 唔係 YES
Catalan zero u dos tres quatre cinc sis set vuit nou no YES
Chuvash пӗрре иккӗ виҫҫӗ тӑваттӑ пиллӗк улттӑ ҫиччӗ саккӑр тӑххӑр вуннӑ ҫапла ҫук YES
Czech nula jedna dva tři čtyři pět šest sedm osm devět ano ne YES
Danish nul en to tre fire fem seks syv otte ni ja nej YES
Dhivehi އާ ނޫން NO
Dutch nul één twee drie vier vijf zes zeven acht negen ja nee YES
English zero one two three four five six seven eight nine yes no YES
Esperanto nul unu du tri kvar kvin ses sep ok naŭ jes ne YES
Estonian null üks kaks kolm neli viis kuus seitse kaheksa üheksa jah ei NO
French zéro un deux trois quatre cinq six sept huit neuf oui non YES
Georgian ნული ერთი ორი სამი ოთხი ხუთი ექვსი შვიდი რვა ცხრა დიახ არა YES
German null eins zwei drei vier fünf sechs sieben acht neun ja nein YES
Hakha Chin NO
Indonesian nol satu dua tiga empat lima enam tujuh delapan sembilan ya tidak YES
Interlingua NO
Irish a náid a haon a dó a trí a ceathair a cúig a sé a seacht a hocht a naoi NO
Italian zero uno due tre quattro cinque sei sette otto nove no NO
Japanese [Formal / Informal] れい / まる いち / ひと に / ふた さん / み し / よ ご / いつ ろく / む しち / なな はち / や く / ここの はい / うん いいえ / いや YES
Kabyle [Formal / Informal] ilem yiwen sin kraḍ/tlata kuẓ/ṛebɛa semmus/xemsa sḍis/setta sa/sebɛa ṭam/tmenya tẓa/tesɛa ih uhu YES
Kinyarwanda zeru rimwe kabiri gatatu kane gatanu gatandatu karindwi umunani icyenda yego oya YES
Kyrgyz нөл бир эки үч төрт беш алты жети сегиз тогуз ооба жок NO
Latvian nulle viens divi trīs četri pieci seši septiņi astoņi deviņi NO
Mandarin (China) NO
Mandarin (Taiwan) YES
Mongolian тэг нэг нь хоёр гурав дөрөв тав зургаа долоо найм ес тийм шүү үгүй шүү NO
Odia ଶୂନ ଏକ ଦୁଇ ତିନି ଚାରି ପାଞ୍ଚ ଛଅ ସାତ ଆଠ ନଅ ହଁ ନା YES
Persian صفر یکی دو سه چهار پنج شش هفت هشت نه آره نه NO
Polish zero jeden dwa trzy cztery pięć sześć siedem osiem dziewięć tak nie YES
Portuguese zero um dois três quatro cinco seis [pt-BR also uses "meia"] sete oito nove sim não YES
Romansh Sursilvan NO
Russian ноль один два три четыре пять шесть семь восемь девять да нет YES
Sakha NO
Slovenian nìč êna dvé trí štíri pét šést sédem ósem devét ja ne NO
Spanish cero uno dos tres cuatro cinco seis siete ocho nueve no YES
Swedish noll ett två tre fyra fem sex sju åtta nio ja nej NO
Tamil பூஜ்யம் ஒன்று இரண்டு மூன்று நான்கு ஐந்து ஆறு ஏழு எட்டு ஒன்பது ஆம் இல்லை YES
Tatar ноль бер ике өч дүрт биш алты җиде сигез тугыз әйе юк YES
Turkish sıfır bir iki üç dört beş altı yedi sekiz dokuz evet hayır YES
Votic NO
Welsh sero / dim un dau tri pedwar pump chwech saith wyth naw N/A N/A YES
