Dataset Card for common_language
Dataset Summary
This dataset is composed of speech recordings in 45 languages, carefully selected from the CommonVoice database. The audio recordings total 45.1 hours, which is about 1 hour of material per language. The dataset was extracted from CommonVoice to train language identification (language-id) systems.
Supported Tasks and Leaderboards
The baselines for language-id are available in the SpeechBrain toolkit (see recipes/CommonLanguage): https://github.com/speechbrain/speechbrain
Languages
List of included languages:
Arabic, Basque, Breton, Catalan, Chinese_China, Chinese_Hongkong, Chinese_Taiwan, Chuvash, Czech, Dhivehi, Dutch, English, Esperanto, Estonian, French, Frisian, Georgian, German, Greek, Hakha_Chin, Indonesian, Interlingua, Italian, Japanese, Kabyle, Kinyarwanda, Kyrgyz, Latvian, Maltese, Mongolian, Persian, Polish, Portuguese, Romanian, Romansh_Sursilvan, Russian, Sakha, Slovenian, Spanish, Swedish, Tamil, Tatar, Turkish, Ukrainian, Welsh
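The language field stores an integer class label rather than a name. The sketch below maps indices to names, assuming the ClassLabel order follows the list above. This is consistent with Arabic being label 0 and the Italian example below being label 22, but the list and helper are a hypothetical reconstruction, not the dataset's own definition.

```python
# Hypothetical reconstruction of the `language` ClassLabel names, ASSUMING
# they follow the order of the list above (Arabic = 0, Italian = 22).
# Spellings are normalized; the dataset itself spells two of them differently.
LANGUAGES = [
    "Arabic", "Basque", "Breton", "Catalan", "Chinese_China",
    "Chinese_Hongkong", "Chinese_Taiwan", "Chuvash", "Czech", "Dhivehi",
    "Dutch", "English", "Esperanto", "Estonian", "French",
    "Frisian", "Georgian", "German", "Greek", "Hakha_Chin",
    "Indonesian", "Interlingua", "Italian", "Japanese", "Kabyle",
    "Kinyarwanda", "Kyrgyz", "Latvian", "Maltese", "Mongolian",
    "Persian", "Polish", "Portuguese", "Romanian", "Romansh_Sursilvan",
    "Russian", "Sakha", "Slovenian", "Spanish", "Swedish",
    "Tamil", "Tatar", "Turkish", "Ukrainian", "Welsh",
]

# Reverse lookup: language name -> integer label.
NAME_TO_ID = {name: idx for idx, name in enumerate(LANGUAGES)}


def id_to_language(label_id: int) -> str:
    """Return the language name for an integer `language` label."""
    return LANGUAGES[label_id]


print(id_to_language(22))  # Italian
```

With the Hugging Face datasets library, the same lookup should be available from the feature itself, e.g. dataset.features["language"].int2str(22), which avoids hard-coding the list.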
Dataset Structure
Data Instances
A typical data point comprises the path to the audio file and its label, language. Additional fields include age, client_id, gender and sentence.
{
'client_id': 'itln_trn_sp_175',
'path': '/path/common_voice_kpd/Italian/train/itln_trn_sp_175/common_voice_it_18279446.wav',
'audio': {'path': '/path/common_voice_kpd/Italian/train/itln_trn_sp_175/common_voice_it_18279446.wav',
'array': array([-0.00048828, -0.00018311, -0.00137329, ..., 0.00079346, 0.00091553, 0.00085449], dtype=float32),
'sampling_rate': 48000},
'sentence': 'Con gli studenti è leggermente simile.',
'age': 'not_defined',
'gender': 'not_defined',
'language': 22
}
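Given a decoded example like the one above, the clip length follows from the number of samples divided by the sampling rate, and the age and gender fields can be checked for the "not_defined" placeholder. The helper names below are hypothetical, and the example uses a synthetic silent array in place of real audio.

```python
import numpy as np


def clip_duration_seconds(example: dict) -> float:
    """Duration of one decoded clip: number of samples / sampling rate."""
    audio = example["audio"]
    return len(audio["array"]) / audio["sampling_rate"]


def has_speaker_metadata(example: dict) -> bool:
    """True when both age and gender are filled in (not "not_defined")."""
    return example["age"] != "not_defined" and example["gender"] != "not_defined"


# Synthetic stand-in for a decoded example: 2 s of silence at 48 kHz.
example = {
    "client_id": "itln_trn_sp_175",
    "audio": {"array": np.zeros(96_000, dtype=np.float32), "sampling_rate": 48_000},
    "age": "not_defined",
    "gender": "not_defined",
    "language": 22,
}

print(clip_duration_seconds(example))  # 2.0
print(has_speaker_metadata(example))   # False
```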
Data Fields
client_id (string): An ID identifying the client (voice) that made the recording
path (string): The path to the audio file
audio (dict): A dictionary containing the path to the downloaded audio file, the decoded audio array, and the sampling rate. Note that when accessing the audio column, e.g. dataset[0]["audio"], the audio file is automatically decoded and resampled to dataset.features["audio"].sampling_rate. Decoding and resampling a large number of audio files can take a significant amount of time, so always query the sample index before the "audio" column: dataset[0]["audio"] should always be preferred over dataset["audio"][0].
language (ClassLabel): The language of the recording (see the Languages section above)
sentence (string): The sentence the speaker was prompted to read
age (string): The age range of the speaker, e.g. "teens", "twenties", "thirties", "fourties" (spelled this way in the data), or "not_defined"
gender (string): The gender of the speaker, e.g. "male", "female", "other", or "not_defined"
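The advice on indexing the audio column comes down to how many files get decoded. The toy model below is not the datasets library's implementation; it is a hypothetical stand-in that counts decode calls, assuming each row is decoded lazily when read. Indexing the row first decodes one file, while selecting the column first decodes every file before the index is applied.

```python
class ToyAudioDataset:
    """Toy model (NOT the real library): audio is decoded only when read."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.decode_calls = 0

    def _decode(self, path):
        # Stands in for reading the file and resampling it.
        self.decode_calls += 1
        return {"path": path, "array": [0.0], "sampling_rate": 48_000}

    def __getitem__(self, key):
        if isinstance(key, int):
            # dataset[i] -> one row, so only one file is decoded
            return {"audio": self._decode(self.paths[key])}
        if key == "audio":
            # dataset["audio"] -> the whole column, so every file is decoded
            return [self._decode(p) for p in self.paths]
        raise KeyError(key)


ds = ToyAudioDataset(f"clip_{i}.wav" for i in range(1000))
_ = ds[0]["audio"]
print(ds.decode_calls)  # 1
_ = ds["audio"][0]
print(ds.decode_calls)  # 1001
```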
Data Splits
The dataset is already balanced and split into train, dev (validation) and test sets.
Statistic | Train | Dev | Test |
---|---|---|---|
# of utterances | 22194 | 5888 | 5963 |
# of unique speakers | 11189 | 1297 | 1322 |
Total duration, hr | 30.04 | 7.53 | 7.53 |
Min duration, sec | 0.86 | 0.98 | 0.89 |
Mean duration, sec | 4.87 | 4.61 | 4.55 |
Max duration, sec | 21.72 | 105.67 | 29.83 |
Duration per language, min | ~40 | ~10 | ~10 |
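The per-language figures in the table follow from dividing each split's total duration evenly across the 45 languages, and the three splits add up to the 45.1 hours quoted in the summary. A quick check of that arithmetic, using only the durations from the table:

```python
N_LANGUAGES = 45
hours = {"train": 30.04, "dev": 7.53, "test": 7.53}  # "Total duration, hr" row

# Total audio across all splits.
total_hours = sum(hours.values())

# Balanced split: each language gets an equal share of every split.
minutes_per_language = {split: h * 60 / N_LANGUAGES for split, h in hours.items()}

print(round(total_hours, 2))  # 45.1
print({s: round(m, 1) for s, m in minutes_per_language.items()})
# {'train': 40.1, 'dev': 10.0, 'test': 10.0}
```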
Dataset Creation
Curation Rationale
Source Data
Initial Data Collection and Normalization
Who are the source language producers?
Annotations
Annotation process
Who are the annotators?
Personal and Sensitive Information
The dataset consists of recordings from people who have donated their voice online. You agree not to attempt to determine the identity of speakers in the Common Voice dataset.
Considerations for Using the Data
Social Impact of Dataset
Discussion of Biases
Other Known Limitations
The Mongolian and Ukrainian languages are spelled as "Mangolian" and "Ukranian" in this version of the dataset.
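When displaying or reporting label names, the two misspellings can be mapped back to the standard English names. A minimal sketch, where LABEL_FIXES and normalize_language_name are hypothetical helpers and only the two spellings named above are corrected:

```python
# Spellings used in this version of the dataset -> standard English names.
LABEL_FIXES = {"Mangolian": "Mongolian", "Ukranian": "Ukrainian"}


def normalize_language_name(name: str) -> str:
    """Return the corrected spelling, or the name unchanged if it is fine."""
    return LABEL_FIXES.get(name, name)


print(normalize_language_name("Ukranian"))  # Ukrainian
print(normalize_language_name("Welsh"))     # Welsh
```

The dataset's own label names should be available via dataset.features["language"].names; mapping them through a helper like this changes only the displayed names and leaves the integer labels untouched.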
Additional Information
Dataset Curators
Ganesh Sinisetty; Pavlo Ruban; Oleksandr Dymov; Mirco Ravanelli
Licensing Information
Creative Commons Attribution 4.0 International
Citation Information
@dataset{ganesh_sinisetty_2021_5036977,
author = {Ganesh Sinisetty and
Pavlo Ruban and
Oleksandr Dymov and
Mirco Ravanelli},
title = {CommonLanguage},
month = jun,
year = 2021,
publisher = {Zenodo},
version = {0.1},
doi = {10.5281/zenodo.5036977},
url = {https://doi.org/10.5281/zenodo.5036977}
}
Contributions
Thanks to @anton-l for adding this dataset.