Dataset Card for ASCEND

Dataset Summary

ASCEND (A Spontaneous Chinese-English Dataset) is a high-quality corpus of spontaneous, multi-turn conversational Chinese-English code-switching speech collected in Hong Kong. It consists of 10.62 hours of spontaneous speech, about 12.3K utterances in total. The corpus is split into training, validation, and test sets at a ratio of 8:1:1, with a balanced gender proportion in each set.

Supported Tasks and Leaderboards

Code-switching

Languages

Chinese and English

Usage

To load the full dataset (train, validation, and test splits), run:

import datasets
dataset = datasets.load_dataset("CAiRE/ASCEND")
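
Once a split is loaded, a common first step is to see how utterances are distributed across the `language` labels (for example `mixed`). The following is a minimal sketch, not part of the official card: `count_by_language` and `load_train_split` are hypothetical helper names, and the loading helper assumes the `datasets` package is installed and the Hub is reachable.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_by_language(rows: Iterable[Mapping]) -> Counter:
    """Count utterances per value of the 'language' field."""
    return Counter(row["language"] for row in rows)


def load_train_split():
    """Load only the train split (hypothetical helper; downloads the corpus)."""
    import datasets  # imported lazily so count_by_language works without it

    return datasets.load_dataset("CAiRE/ASCEND", split="train")


# Usage (not executed here because it needs network access):
#   train = load_train_split()
#   # Read only the 'language' column so that no audio has to be decoded.
#   print(count_by_language({"language": lang} for lang in train["language"]))
```

Reading a single column rather than iterating over full rows avoids decoding every audio file just to tally labels.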

Dataset Structure

A typical data point comprises the path to the audio file, the loaded audio array, and its transcription. Additional fields include datapoint id, duration, language, speaker id, session id, and topic.

{
    'id': '00644',
    'path': '.cache/huggingface/datasets/downloads/extracted/f0b33b5266cd9452ee310eef3577cf7adb7f29aa54dbff74b9a8ee406a55d614/waves/ses2_spk3_L13101_189.900_5.490.wav',
    'audio': {
        'path': '.cache/huggingface/datasets/downloads/extracted/f0b33b5266cd9452ee310eef3577cf7adb7f29aa54dbff74b9a8ee406a55d614/waves/ses2_spk3_L13101_189.900_5.490.wav',
        'array': array([-6.1035156e-05, -1.8310547e-04, 3.0517578e-05, ...,
            0.0000000e+00, -3.0517578e-05, 0.0000000e+00
        ], dtype = float32),
        'sampling_rate': 16000
    },
    'transcription': '因为你不可能邀你的female friends去说走我们去play basketball',
    'duration': 5.489999771118164,
    'language': 'mixed',
    'original_speaker_id': 3,
    'session_id': 2,
    'topic': 'sports'
}
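
The fields of a data point can be cross-checked against each other. The sketch below rests on two assumptions that the card does not state explicitly: that `duration` is in seconds and equals the number of samples divided by `sampling_rate`, and that the wave file name follows the pattern `ses<session>_spk<speaker>_L<line>_<start>_<duration>.wav`, as the example above suggests (its trailing `5.490` matches the `duration` field). The function names and the `line` and `start` keys are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumed file-name pattern, inferred from the example data point only.
WAVE_NAME = re.compile(
    r"^ses(?P<session>\d+)_spk(?P<speaker>\d+)_L(?P<line>\d+)"
    r"_(?P<start>\d+(?:\.\d+)?)_(?P<duration>\d+(?:\.\d+)?)\.wav$"
)


def parse_wave_name(path: str) -> dict:
    """Split a wave file name into its (assumed) components."""
    name = PurePosixPath(path).name
    match = WAVE_NAME.match(name)
    if match is None:
        raise ValueError(f"unexpected wave file name: {name!r}")
    return {
        "session_id": int(match["session"]),
        "original_speaker_id": int(match["speaker"]),
        "line": int(match["line"]),          # meaning of this index is a guess
        "start": float(match["start"]),      # assumed offset in seconds
        "duration": float(match["duration"]),
    }


def audio_duration(audio: dict) -> float:
    """Duration in seconds computed from the decoded audio dictionary."""
    return len(audio["array"]) / audio["sampling_rate"]
```

For the example above, `parse_wave_name` yields session 2 and speaker 3, which agrees with the `session_id` and `original_speaker_id` fields; a mismatch on other rows would indicate that the assumed pattern does not hold everywhere.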

Data Splits

Number of utterances: 9,869 train, 1,130 validation, and 1,315 test.
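
As a quick sanity check, the utterance counts above can be compared with the stated 8:1:1 ratio. This is a small illustrative calculation using only the numbers from this card; the ratio may have been defined on something other than utterance count (for example duration), so an exact match is not expected.

```python
# Utterance counts per split, as listed in this card.
splits = {"train": 9869, "validation": 1130, "test": 1315}

total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}

print(f"total: {total} utterances")
for name, fraction in fractions.items():
    print(f"{name}: {splits[name]} utterances ({fraction:.1%})")
```

The total is 12,314 utterances, consistent with the ~12.3K figure in the summary, and the shares come out to about 80.1%, 9.2%, and 10.7%, which is roughly 8:1:1.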

Additional Information

For comprehensive explanations, please check our paper.

Licensing Information

Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Citation Information

If you use our dataset, please cite us:

@inproceedings{lovenia2022ascend,
  title={ASCEND: A Spontaneous Chinese-English Dataset for Code-switching in Multi-turn Conversation},
  author={Lovenia, Holy and Cahyawijaya, Samuel and Winata, Genta Indra and Xu, Peng and Yan, Xu and Liu, Zihan and Frieske, Rita and Yu, Tiezheng and Dai, Wenliang and Barezi, Elham J and others},
  booktitle={Proceedings of the 13th Language Resources and Evaluation Conference (LREC)},
  year={2022}
}