token (string) | ner_tag (class label, 21 classes) |
---|---|
"الجامع" | 1 (Book) |
"المسند" | 1 (Book) |
"الصحيح" | 1 (Book) |
"المختصر" | 1 (Book) |
"من" | 1 (Book) |
"أمور" | 1 (Book) |
"رسول" | 1 (Book) |
"الله" | 1 (Book) |
"صلى" | 1 (Book) |
"الله" | 1 (Book) |
"عليه" | 1 (Book) |
"وسلم" | 1 (Book) |
"وسننه" | 1 (Book) |
"وأيامه" | 1 (Book) |
"صحيح" | 1 (Book) |
"البخاري" | 1 (Book) |
"المؤلف" | 13 (O) |
"محمد" | 16 (Pers) |
"بن" | 16 (Pers) |
"إسماعيل" | 16 (Pers) |
"أبو" | 16 (Pers) |
"عبد" | 16 (Pers) |
"الله" | 16 (Pers) |
"البخاري" | 16 (Pers) |
"الجعفي" | 16 (Pers) |
"المحقق" | 13 (O) |
"محمد" | 16 (Pers) |
"زهير" | 16 (Pers) |
"بن" | 16 (Pers) |
"ناصر" | 16 (Pers) |
"الناصر" | 16 (Pers) |
"الناشر" | 13 (O) |
"دار" | 14 (Org) |
"طوق" | 14 (Org) |
"النجاة" | 14 (Org) |
"مصورة" | 13 (O) |
"عن" | 13 (O) |
"السلطانية" | 13 (O) |
"بإضافة" | 13 (O) |
"ترقيم" | 13 (O) |
"ترقيم" | 13 (O) |
"محمد" | 16 (Pers) |
"فؤاد" | 16 (Pers) |
"عبد" | 16 (Pers) |
"الباقي" | 16 (Pers) |
"الطبعة" | 13 (O) |
"الأولى" | 13 (O) |
"1422" | 4 (Date) |
"ه" | 13 (O) |
"عدد" | 13 (O) |
"الأجزاء" | 13 (O) |
"9" | 12 (Number) |
"ترقيم" | 13 (O) |
"الكتاب" | 13 (O) |
"موافق" | 13 (O) |
"للمطبوع" | 13 (O) |
"وهو" | 13 (O) |
"ضمن" | 13 (O) |
"خدمة" | 13 (O) |
"التخريج" | 13 (O) |
"ومتن" | 13 (O) |
"مرتبط" | 13 (O) |
"بشرحه" | 13 (O) |
"مع" | 13 (O) |
"الكتاب" | 13 (O) |
"شرح" | 13 (O) |
"وتعليق" | 13 (O) |
"د" | 13 (O) |
"مصطفى" | 16 (Pers) |
"ديب" | 16 (Pers) |
"البغا" | 16 (Pers) |
"أستاذ" | 13 (O) |
"الحديث" | 13 (O) |
"وعلومه" | 13 (O) |
"في" | 13 (O) |
"كلية" | 14 (Org) |
"الشريعة" | 14 (Org) |
"جامعة" | 14 (Org) |
"دمشق" | 14 (Org) |
"كالتالي" | 13 (O) |
"رقم" | 13 (O) |
"الحديث" | 13 (O) |
"والجزء" | 13 (O) |
"والصفحة" | 13 (O) |
"في" | 13 (O) |
"ط" | 13 (O) |
"البغا" | 16 (Pers) |
"يليه" | 13 (O) |
"تعليقه" | 13 (O) |
"ثم" | 13 (O) |
"أطرافه" | 13 (O) |
"مقدمة" | 13 (O) |
"د" | 13 (O) |
"مصطفى" | 16 (Pers) |
"البغا" | 16 (Pers) |
"بسم" | 13 (O) |
"الله" | 0 (Allah) |
"الرحمن" | 0 (Allah) |
"الرحيم" | 0 (Allah) |
"الحمد" | 13 (O) |
Dataset Card for CANER
Dataset Summary
The Classical Arabic Named Entity Recognition corpus (CANERCorpus) is a corpus of tagged Classical Arabic text, intended to help address the difficulties of recognizing Arabic named entities.
Supported Tasks and Leaderboards
- Named Entity Recognition
Languages
Classical Arabic
Dataset Structure
Data Instances
An example from the dataset:
{'ner_tag': 1, 'token': 'الجامع'}
Here the ner_tag value 1 stands for "Book".
Data Fields
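- id: id of the sample
- token: the token text of the sample
- ner_tag: the NER tag of the token
The NER tags correspond to this list, where each integer tag is the zero-based position of its label (0 is "Allah", 13 is "O", 20 is "Time"):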
"Allah",
"Book",
"Clan",
"Crime",
"Date",
"Day",
"Hell",
"Loc",
"Meas",
"Mon",
"Month",
"NatOb",
"Number",
"O",
"Org",
"Para",
"Pers",
"Prophet",
"Rlig",
"Sect",
"Time"
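The mapping from integer tags to label names can be sketched in plain Python. This is a minimal illustration, not part of the dataset's tooling: the names `NER_LABELS`, `id_to_label`, and `group_spans` are hypothetical, and the sample rows are copied from the preview of the data. Because the tag set has no begin/inside markers, grouping consecutive tokens by tag is only an approximation, and two adjacent entities of the same type would be merged into one span.

```python
from itertools import groupby

# Label names in the order listed on the card; a tag id is its index here.
NER_LABELS = [
    "Allah", "Book", "Clan", "Crime", "Date", "Day", "Hell", "Loc", "Meas",
    "Mon", "Month", "NatOb", "Number", "O", "Org", "Para", "Pers", "Prophet",
    "Rlig", "Sect", "Time",
]


def id_to_label(tag_id: int) -> str:
    """Return the label name for an integer ner_tag."""
    return NER_LABELS[tag_id]


def group_spans(rows):
    """Merge consecutive rows that share a tag into (label, tokens) spans."""
    spans = []
    for tag_id, group in groupby(rows, key=lambda row: row["ner_tag"]):
        spans.append((id_to_label(tag_id), [row["token"] for row in group]))
    return spans


# Rows in the same shape as the data instance shown on the card.
rows = [
    {"token": "الناشر", "ner_tag": 13},
    {"token": "دار", "ner_tag": 14},
    {"token": "طوق", "ner_tag": 14},
    {"token": "النجاة", "ner_tag": 14},
    {"token": "مصورة", "ner_tag": 13},
]
spans = group_spans(rows)
```

With these rows, the three tokens tagged 14 end up in a single "Org" span, with one "O" span on each side.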
Data Splits
The dataset provides a training split only.
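Since no validation or test split is provided, users who need one have to carve it out themselves. The sketch below is one possible approach, assuming the rows are held in a Python list in corpus order. The function name `contiguous_split` and the parameter `eval_fraction` are hypothetical. It holds out a contiguous tail rather than sampling at random, because each row is a single token and random sampling would scatter the tokens of one entity across both splits. The cut point can still fall inside an entity, so it may need manual adjustment.

```python
def contiguous_split(rows, eval_fraction=0.1):
    """Hold out the last `eval_fraction` of rows, preserving token order."""
    if not 0.0 < eval_fraction < 1.0:
        raise ValueError("eval_fraction must be strictly between 0 and 1")
    # Number of held-out rows, with at least one row in the evaluation part.
    n_eval = max(1, int(len(rows) * eval_fraction))
    cut = len(rows) - n_eval
    return rows[:cut], rows[cut:]


# Placeholder rows in the same shape as the card's data instance.
rows = [{"token": f"t{i}", "ner_tag": 13} for i in range(10)]
train_rows, eval_rows = contiguous_split(rows, eval_fraction=0.2)
```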
Dataset Creation
Curation Rationale
[More Information Needed]
Source Data
Initial Data Collection and Normalization
[More Information Needed]
Who are the source language producers?
[More Information Needed]
Annotations
Annotation process
[More Information Needed]
Who are the annotators?
Ramzi Salah and Lailatul Qadri Zakaria
Personal and Sensitive Information
[More Information Needed]
Considerations for Using the Data
Social Impact of Dataset
[More Information Needed]
Discussion of Biases
[More Information Needed]
Other Known Limitations
[More Information Needed]
Additional Information
[More Information Needed]
Dataset Curators
[More Information Needed]
Licensing Information
[More Information Needed]
Citation Information
@article{article,
  author = {Salah, Ramzi and Zakaria, Lailatul},
  year = {2018},
  month = {12},
  pages = {},
  title = {BUILDING THE CLASSICAL ARABIC NAMED ENTITY RECOGNITION CORPUS (CANERCORPUS)},
  volume = {96},
  journal = {Journal of Theoretical and Applied Information Technology}
}
Contributions
Thanks to @KMFODA for adding this dataset.