
Dataset Card for common_language

Dataset Summary

This dataset is composed of speech recordings in languages that were carefully selected from the CommonVoice database. The total duration of the audio recordings is 45.1 hours (roughly 1 hour of material for each of the 45 languages). It was extracted from CommonVoice to train language identification (language-id) systems.

Supported Tasks and Leaderboards

The baselines for language-id are available in the SpeechBrain toolkit (see recipes/CommonLanguage): https://github.com/speechbrain/speechbrain

Languages

List of included languages:

Arabic, Basque, Breton, Catalan, Chinese_China, Chinese_Hongkong, Chinese_Taiwan, Chuvash, Czech, Dhivehi, Dutch, English, Esperanto, Estonian, French, Frisian, Georgian, German, Greek, Hakha_Chin, Indonesian, Interlingua, Italian, Japanese, Kabyle, Kinyarwanda, Kyrgyz, Latvian, Maltese, Mongolian, Persian, Polish, Portuguese, Romanian, Romansh_Sursilvan, Russian, Sakha, Slovenian, Spanish, Swedish, Tamil, Tatar, Turkish, Ukranian, Welsh

Dataset Structure

Data Instances

A typical data point comprises the path to the audio file and its language label. Additional fields include age, client_id, gender and sentence.

{
  'client_id': 'itln_trn_sp_175',
  'path': '/path/common_voice_kpd/Italian/train/itln_trn_sp_175/common_voice_it_18279446.wav',
  'audio': {'path': '/path/common_voice_kpd/Italian/train/itln_trn_sp_175/common_voice_it_18279446.wav',
           'array': array([-0.00048828, -0.00018311, -0.00137329, ...,  0.00079346, 0.00091553,  0.00085449], dtype=float32),
           'sampling_rate': 48000},
  'sentence': 'Con gli studenti è leggermente simile.',
  'age': 'not_defined',
  'gender': 'not_defined',
  'language': 22
}
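
The integer in the language field can be mapped back to a language name. The sketch below is a minimal illustration that assumes the class ids follow the order of the list in the Languages section (0-based), which is consistent with the instance above (22 is Italian). The names LANGUAGES, id_to_language and language_to_id are hypothetical helpers, not part of the dataset. The label strings stored in the dataset itself may be spelled differently (see Other Known Limitations).

```python
# Language names copied from the "Languages" section of this card, in order.
# Assumption: the ClassLabel ids are 0-based positions in this list.
LANGUAGES = [
    "Arabic", "Basque", "Breton", "Catalan", "Chinese_China",
    "Chinese_Hongkong", "Chinese_Taiwan", "Chuvash", "Czech", "Dhivehi",
    "Dutch", "English", "Esperanto", "Estonian", "French", "Frisian",
    "Georgian", "German", "Greek", "Hakha_Chin", "Indonesian",
    "Interlingua", "Italian", "Japanese", "Kabyle", "Kinyarwanda",
    "Kyrgyz", "Latvian", "Maltese", "Mongolian", "Persian", "Polish",
    "Portuguese", "Romanian", "Romansh_Sursilvan", "Russian", "Sakha",
    "Slovenian", "Spanish", "Swedish", "Tamil", "Tatar", "Turkish",
    "Ukranian", "Welsh",
]


def id_to_language(label_id: int) -> str:
    """Return the language name for an integer class id."""
    return LANGUAGES[label_id]


def language_to_id(name: str) -> int:
    """Return the integer class id for a language name."""
    return LANGUAGES.index(name)


print(id_to_language(22))  # Italian
```

When the dataset is loaded with the datasets library, dataset.features["language"].int2str(22) should give the equivalent lookup without a hand-written list.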

Data Fields

client_id (string): An ID identifying the client (voice) that made the recording

path (string): The path to the audio file

audio (dict): A dictionary containing the path to the downloaded audio file, the decoded audio array, and the sampling rate. When you access the audio column with dataset[0]["audio"], the audio file is automatically decoded and resampled to dataset.features["audio"].sampling_rate. Decoding and resampling a large number of audio files can take a significant amount of time, so always index the row before the "audio" column: prefer dataset[0]["audio"] over dataset["audio"][0].

language (ClassLabel): The language of the recording (see the Languages section above)

sentence (string): The sentence the user was prompted to speak

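age (string): The age bracket of the speaker (for example "twenties"), or "not_defined" when it was not provided

gender (string): The gender of the speaker, or "not_defined" when it was not provided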
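
To illustrate why the access order for the audio field matters, here is a toy model of a lazily decoded column. It is only a sketch: ToyAudioDataset and count_decodes are hypothetical stand-ins written for this example and do not reproduce how the datasets library is implemented. The toy class simply counts how many files get "decoded" under each access pattern.

```python
class ToyAudioDataset:
    """Toy stand-in for a dataset whose "audio" column is decoded on access."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.decode_calls = 0  # number of files "decoded" so far

    def _decode(self, path):
        self.decode_calls += 1
        # A real decoder would read and resample the file; this fakes the result.
        return {"path": path, "array": [0.0], "sampling_rate": 48000}

    def __getitem__(self, key):
        if isinstance(key, int):
            # Row-first access: only the requested row is decoded.
            path = self.paths[key]
            return {"path": path, "audio": self._decode(path)}
        if key == "audio":
            # Column-first access: every row in the column is decoded.
            return [self._decode(p) for p in self.paths]
        if key == "path":
            return list(self.paths)
        raise KeyError(key)


def count_decodes(access, n_rows=1000):
    """Run one access pattern on a fresh toy dataset and count the decodes."""
    ds = ToyAudioDataset(f"clip_{i}.wav" for i in range(n_rows))
    access(ds)
    return ds.decode_calls


row_first = count_decodes(lambda ds: ds[0]["audio"])
column_first = count_decodes(lambda ds: ds["audio"][0])
print(row_first, column_first)  # 1 1000
```

In this toy model both expressions return the same sample, but the column-first form pays for decoding every row before the first one is selected.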

Data Splits

The dataset is already balanced and split into train, dev (validation) and test sets.

Name                          Train     Dev      Test
# of utterances               177552    47104    47704
# of unique speakers          11189     1297     1322
Total duration (hr)           30.04     7.53     7.53
Min duration (sec)            0.86      0.98     0.89
Mean duration (sec)           4.87      4.61     4.55
Max duration (sec)            21.72     105.67   29.83
Duration per language (min)   ~40       ~10      ~10
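
The duration figures in the table can be cross-checked with a little arithmetic. The sketch below uses only the per-split totals from the table and the count of 45 languages; the variable names are illustrative.

```python
# Total duration per split in hours, taken from the table above.
SPLIT_HOURS = {"train": 30.04, "dev": 7.53, "test": 7.53}
N_LANGUAGES = 45  # number of languages listed in this card

# Sum over splits; this should match the 45.1 hours quoted in the summary.
total_hours = sum(SPLIT_HOURS.values())

# Average minutes of audio per language in each split.
minutes_per_language = {
    split: hours * 60 / N_LANGUAGES for split, hours in SPLIT_HOURS.items()
}

print(f"total: {total_hours:.2f} h")
for split, minutes in minutes_per_language.items():
    print(f"{split}: {minutes:.1f} min per language")
```

The per-language averages come out close to the ~40, ~10 and ~10 minutes given in the last row of the table.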

Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

The dataset consists of recordings from people who have donated their voices online. You agree not to attempt to determine the identity of speakers in the Common Voice dataset.

Considerations for Using the Data

Social Impact of Dataset

The dataset consists of recordings from people who have donated their voices online. You agree not to attempt to determine the identity of speakers in the Common Voice dataset.

Discussion of Biases

More Information Needed

Other Known Limitations

The Mongolian and Ukrainian languages are spelled as "Mangolian" and "Ukranian" in this version of the dataset.


Additional Information

Dataset Curators

Ganesh Sinisetty; Pavlo Ruban; Oleksandr Dymov; Mirco Ravanelli

Licensing Information

Creative Commons Attribution 4.0 International

Citation Information

@dataset{ganesh_sinisetty_2021_5036977,
  author       = {Ganesh Sinisetty and
                  Pavlo Ruban and
                  Oleksandr Dymov and
                  Mirco Ravanelli},
  title        = {CommonLanguage},
  month        = jun,
  year         = 2021,
  publisher    = {Zenodo},
  version      = {0.1},
  doi          = {10.5281/zenodo.5036977},
  url          = {https://doi.org/10.5281/zenodo.5036977}
}

Contributions

Thanks to @anton-l for adding this dataset.
