Datasets:
Tasks: Text Retrieval
Sub-tasks: entity-linking-retrieval
Languages: Polish
Multilinguality: monolingual
Size Categories: 1K<n<10K
Language Creators: expert-generated
Annotations Creators: expert-generated
Source Datasets: original
License: unknown
Dataset Card for Brand-Product Relation Extraction Corpora
Dataset Summary
Brand-Product Relation Extraction Corpora in Polish
Supported Tasks and Leaderboards
NER, Entity linking
Languages
Polish
Dataset Structure
Data Instances
[More Information Needed]
Data Fields
- id: int, identifier of the text
- text: string, the text itself, for example a consumer comment on social media
- ner: the extracted entities and the relations between them
  - source and target: a pair of entities identified in the text; each entity has the following fields:
    - from: int, start character offset of the entity
    - text: string, the entity text
    - to: int, end character offset of the entity
    - type: one of the predefined entity types:
      - PRODUCT_NAME
      - PRODUCT_NAME_IMP
      - PRODUCT_NO_BRAND
      - BRAND_NAME
      - BRAND_NAME_IMP
      - VERSION
      - PRODUCT_ADJ
      - BRAND_ADJ
      - LOCATION
      - LOCATION_IMP
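As a rough illustration of the fields listed above, the sketch below walks one record and collects its source/target pairs. The record is hypothetical: its text, offsets and labels are invented for illustration and are not taken from the corpus. It also assumes that ner is a list of relations and that to is an exclusive end offset; neither is confirmed by this card, and the layout returned by a particular loader may differ (for example, a dict of lists rather than a list of dicts).

```python
# Hypothetical record shaped like the fields described in this card.
# The text, offsets and labels are invented, not taken from the corpus.
record = {
    "id": 1,
    "text": "Kupiłem telefon Galaxy S10 od Samsunga.",
    "ner": [
        {
            "source": {"from": 16, "text": "Galaxy", "to": 22, "type": "PRODUCT_NAME"},
            "target": {"from": 30, "text": "Samsunga", "to": 38, "type": "BRAND_NAME"},
        }
    ],
}


def entity_pairs(rec):
    """Return (source_text, source_type, target_text, target_type) tuples."""
    pairs = []
    for relation in rec["ner"]:
        src, tgt = relation["source"], relation["target"]
        pairs.append((src["text"], src["type"], tgt["text"], tgt["type"]))
    return pairs


def span_matches(text, entity):
    """Check that the offsets reproduce the entity text.

    Assumption: `to` is an exclusive end offset. If the corpus uses an
    inclusive offset, the slice would need `entity["to"] + 1` instead.
    """
    return text[entity["from"]:entity["to"]] == entity["text"]


for pair in entity_pairs(record):
    print(pair)
```

Running a check like span_matches over real records is a quick way to find out which offset convention the data actually uses.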
Data Splits
No train/validation/test split is provided. The current dataset configurations correspond to four domain categories of texts:
- tele
- electro
- cosmetics
- banking
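Since no split is provided, users who need one have to create it themselves. The following is a minimal sketch of one possible approach, a seeded shuffle over text ids; the function name split_ids, the 80/10/10 proportions and the seed are illustrative assumptions, not something prescribed by the dataset authors.

```python
import random


def split_ids(ids, seed=0, train=0.8, validation=0.1):
    """Shuffle ids reproducibly and cut them into train/validation/test lists.

    The proportions and seed are illustrative defaults, not values
    recommended by the dataset authors.
    """
    ids = sorted(ids)                    # fixed starting order
    random.Random(seed).shuffle(ids)     # same seed gives the same shuffle
    n_train = int(len(ids) * train)
    n_val = int(len(ids) * validation)
    return {
        "train": ids[:n_train],
        "validation": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],   # remainder goes to test
    }


# Example with 100 placeholder ids standing in for the `id` field.
splits = split_ids(range(100))
print({name: len(part) for name, part in splits.items()})
```

Splitting by id keeps all entity pairs of one text in the same partition. Whether to split within each domain configuration or across all of them depends on the intended evaluation.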
Dataset Creation
Curation Rationale
[More Information Needed]
Source Data
Initial Data Collection and Normalization
[More Information Needed]
Who are the source language producers?
[More Information Needed]
Annotations
Annotation process
[More Information Needed]
Who are the annotators?
[More Information Needed]
Personal and Sensitive Information
[More Information Needed]
Considerations for Using the Data
Social Impact of Dataset
[More Information Needed]
Discussion of Biases
[More Information Needed]
Other Known Limitations
[More Information Needed]
Additional Information
Dataset Curators
[More Information Needed]
Licensing Information
[More Information Needed]
Citation Information
@inproceedings{inproceedings,
author = {Janz, Arkadiusz and Kopociński, Łukasz and Piasecki, Maciej and Pluwak, Agnieszka},
year = {2020},
month = {05},
pages = {},
title = {Brand-Product Relation Extraction Using Heterogeneous Vector Space Representations}
}
Contributions
Thanks to @kldarek for adding this dataset.