token (string) | ner_tag (class label, 21 classes)
"الجامع" | 1 (Book)
"المسند" | 1 (Book)
"الصحيح" | 1 (Book)
"المختصر" | 1 (Book)
"من" | 1 (Book)
"أمور" | 1 (Book)
"رسول" | 1 (Book)
"الله" | 1 (Book)
"صلى" | 1 (Book)
"الله" | 1 (Book)
"عليه" | 1 (Book)
"وسلم" | 1 (Book)
"وسننه" | 1 (Book)
"وأيامه" | 1 (Book)
"صحيح" | 1 (Book)
"البخاري" | 1 (Book)
"المؤلف" | 13 (O)
"محمد" | 16 (Pers)
"بن" | 16 (Pers)
"إسماعيل" | 16 (Pers)
"أبو" | 16 (Pers)
"عبد" | 16 (Pers)
"الله" | 16 (Pers)
"البخاري" | 16 (Pers)
"الجعفي" | 16 (Pers)
"المحقق" | 13 (O)
"محمد" | 16 (Pers)
"زهير" | 16 (Pers)
"بن" | 16 (Pers)
"ناصر" | 16 (Pers)
"الناصر" | 16 (Pers)
"الناشر" | 13 (O)
"دار" | 14 (Org)
"طوق" | 14 (Org)
"النجاة" | 14 (Org)
"مصورة" | 13 (O)
"عن" | 13 (O)
"السلطانية" | 13 (O)
"بإضافة" | 13 (O)
"ترقيم" | 13 (O)
"ترقيم" | 13 (O)
"محمد" | 16 (Pers)
"فؤاد" | 16 (Pers)
"عبد" | 16 (Pers)
"الباقي" | 16 (Pers)
"الطبعة" | 13 (O)
"الأولى" | 13 (O)
"1422" | 4 (Date)
"ه" | 13 (O)
"عدد" | 13 (O)
"الأجزاء" | 13 (O)
"9" | 12 (Number)
"ترقيم" | 13 (O)
"الكتاب" | 13 (O)
"موافق" | 13 (O)
"للمطبوع" | 13 (O)
"وهو" | 13 (O)
"ضمن" | 13 (O)
"خدمة" | 13 (O)
"التخريج" | 13 (O)
"ومتن" | 13 (O)
"مرتبط" | 13 (O)
"بشرحه" | 13 (O)
"مع" | 13 (O)
"الكتاب" | 13 (O)
"شرح" | 13 (O)
"وتعليق" | 13 (O)
"د" | 13 (O)
"مصطفى" | 16 (Pers)
"ديب" | 16 (Pers)
"البغا" | 16 (Pers)
"أستاذ" | 13 (O)
"الحديث" | 13 (O)
"وعلومه" | 13 (O)
"في" | 13 (O)
"كلية" | 14 (Org)
"الشريعة" | 14 (Org)
"جامعة" | 14 (Org)
"دمشق" | 14 (Org)
"كالتالي" | 13 (O)
"رقم" | 13 (O)
"الحديث" | 13 (O)
"والجزء" | 13 (O)
"والصفحة" | 13 (O)
"في" | 13 (O)
"ط" | 13 (O)
"البغا" | 16 (Pers)
"يليه" | 13 (O)
"تعليقه" | 13 (O)
"ثم" | 13 (O)
"أطرافه" | 13 (O)
"مقدمة" | 13 (O)
"د" | 13 (O)
"مصطفى" | 16 (Pers)
"البغا" | 16 (Pers)
"بسم" | 13 (O)
"الله" | 0 (Allah)
"الرحمن" | 0 (Allah)
"الرحيم" | 0 (Allah)
"الحمد" | 13 (O)

Dataset Card for CANER

Dataset Summary

The Classical Arabic Named Entity Recognition corpus (CANERCorpus) is a corpus of Classical Arabic text in which every token is tagged with one of 21 classes. It is intended to support named entity recognition for Arabic.

Supported Tasks and Leaderboards

  • Named Entity Recognition

Languages

Classical Arabic

Dataset Structure

Data Instances

Each instance is a single token together with its tag. An example from the dataset:

{'ner_tag': 1, 'token': 'الجامع'}

Here, 1 is the class index of the "Book" tag.
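Because each instance holds a single token, an entity that spans several words is spread over consecutive instances that share the same ner_tag. The following is a minimal sketch of how such runs could be merged into spans. The helper name group_spans is hypothetical and not part of the dataset, the value 13 for "O" is taken from the tag list on this card, and the sample rows are copied from the dataset preview. Since the tags carry no begin/inside markers, two adjacent entities of the same class would be merged into one span by this approach.

```python
from itertools import groupby

# Index of the "O" (outside any entity) tag, per the tag list on this card.
O_TAG = 13


def group_spans(rows):
    """Merge consecutive rows sharing a ner_tag into (tag, tokens) spans.

    Hypothetical helper for illustration. Rows tagged "O" are skipped.
    Adjacent entities of the same class cannot be told apart and are
    returned as a single span.
    """
    spans = []
    for tag, run in groupby(rows, key=lambda row: row["ner_tag"]):
        if tag != O_TAG:
            spans.append((tag, [row["token"] for row in run]))
    return spans


# Five consecutive rows taken from the dataset preview.
rows = [
    {"token": "الناشر", "ner_tag": 13},
    {"token": "دار", "ner_tag": 14},
    {"token": "طوق", "ner_tag": 14},
    {"token": "النجاة", "ner_tag": 14},
    {"token": "مصورة", "ner_tag": 13},
]

spans = group_spans(rows)
print(spans)
```

In this sample the three rows tagged 14 (Org) come back as one span, and the surrounding "O" rows are dropped.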

Data Fields

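  • id: id of the sample
  • token: a single token of the text
  • ner_tag: the NER tag of that token, stored as an integer class label

Each ner_tag value is an index into this list, counting from 0 (so 0 is "Allah", 1 is "Book", and 20 is "Time"):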

"Allah",
"Book",
"Clan",
"Crime",
"Date",
"Day",
"Hell",
"Loc",
"Meas",
"Mon",
"Month",
"NatOb",
"Number",
"O",
"Org",
"Para",
"Pers",
"Prophet",
"Rlig",
"Sect",
"Time"
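The mapping between integer ner_tag values and tag names can be sketched in plain Python as below. The list order is taken from this card. The names NER_TAGS, tag_name and tag_id are hypothetical helpers introduced for illustration and are not part of the dataset or of any library.

```python
# Tag names in the order given on this card; the list index is the ner_tag value.
NER_TAGS = [
    "Allah", "Book", "Clan", "Crime", "Date", "Day", "Hell",
    "Loc", "Meas", "Mon", "Month", "NatOb", "Number", "O",
    "Org", "Para", "Pers", "Prophet", "Rlig", "Sect", "Time",
]


def tag_name(ner_tag: int) -> str:
    """Return the tag name for an integer ner_tag (hypothetical helper)."""
    return NER_TAGS[ner_tag]


def tag_id(name: str) -> int:
    """Return the integer ner_tag for a tag name (hypothetical helper)."""
    return NER_TAGS.index(name)


# The instance shown under Data Instances.
example = {"ner_tag": 1, "token": "الجامع"}
print(tag_name(example["ner_tag"]))  # Book
```

The same lookup confirms the labels in the dataset preview, for example 13 for "O" and 16 for "Pers".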

Data Splits

The dataset provides a training split only.

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

Ramzi Salah and Lailatul Qadri Zakaria

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

[More Information Needed]

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information

@article{article,
  author = {Salah, Ramzi and Zakaria, Lailatul},
  year = {2018},
  month = {12},
  pages = {},
  title = {BUILDING THE CLASSICAL ARABIC NAMED ENTITY RECOGNITION CORPUS (CANERCORPUS)},
  volume = {96},
  journal = {Journal of Theoretical and Applied Information Technology}
}

Contributions

Thanks to @KMFODA for adding this dataset.
