Datasets:
bprec

Tasks:

Text Retrieval

Sub-tasks: entity-linking-retrieval

Languages: Polish

Multilinguality: monolingual

Size Categories: 1K<n<10K

Language Creators: expert-generated

Annotations Creators: expert-generated

Source Datasets: original

License: unknown

Dataset Card for bprec

Dataset Summary

Brand-Product Relation Extraction Corpora in Polish

Supported Tasks and Leaderboards

Named entity recognition (NER), entity linking

Languages

Polish

Dataset Structure

Data Instances

[More Information Needed]

Data Fields

id: int, identifier of the text
text: string, the text itself, for example a consumer comment posted on social media
ner: the entities extracted from the text and the relations between them
- source and target: the pair of entities linked by a relation
  - from: int, character offset at which the entity starts
  - text: string, the entity text
  - to: int, character offset at which the entity ends
  - type: one of the predefined entity types:
    - PRODUCT_NAME
    - PRODUCT_NAME_IMP
    - PRODUCT_NO_BRAND
    - BRAND_NAME
    - BRAND_NAME_IMP
    - VERSION
    - PRODUCT_ADJ
    - BRAND_ADJ
    - LOCATION
    - LOCATION_IMP
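
The field layout above can be illustrated with a minimal Python sketch. The record below is invented for illustration and is not taken from the corpus. The sketch assumes that ner is a list of source/target pairs and that `to` is an exclusive end offset; neither is confirmed by this card, and the loader may expose the annotations in a different nesting. The helper names `relation_pairs` and `offsets_consistent` are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical record, invented for illustration; not taken from the corpus.
# Assumption: "ner" is a list of {"source": ..., "target": ...} pairs.
record: Dict[str, Any] = {
    "id": 1,
    "text": "Kupiłem telefon Galaxy S10 od Samsunga.",
    "ner": [
        {
            "source": {"from": 16, "text": "Galaxy", "to": 22, "type": "PRODUCT_NAME"},
            "target": {"from": 30, "text": "Samsunga", "to": 38, "type": "BRAND_NAME"},
        }
    ],
}


def relation_pairs(rec: Dict[str, Any]) -> List[Tuple[str, str, str, str]]:
    """Return (source type, source text, target type, target text) per pair."""
    return [
        (
            pair["source"]["type"],
            pair["source"]["text"],
            pair["target"]["type"],
            pair["target"]["text"],
        )
        for pair in rec["ner"]
    ]


def offsets_consistent(rec: Dict[str, Any], end_exclusive: bool = True) -> bool:
    """Check that every entity's offsets select its own text.

    Assumption: whether "to" is exclusive or inclusive is not stated in the
    card, so the convention is a parameter rather than hard-coded.
    """
    for pair in rec["ner"]:
        for role in ("source", "target"):
            entity = pair[role]
            end = entity["to"] if end_exclusive else entity["to"] + 1
            if rec["text"][entity["from"]:end] != entity["text"]:
                return False
    return True


print(relation_pairs(record))
print(offsets_consistent(record))
```

Running `offsets_consistent` with both values of `end_exclusive` on a few real records is one way to find out which offset convention the corpus actually uses.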

Data Splits

No train/validation/test split is provided. The current dataset configurations correspond to four domain categories of the texts:

tele
electro
cosmetics
banking
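
Selecting one domain might look like the sketch below. It assumes that the configuration names are identical to the four domain names listed above and that the dataset is reachable under the identifier "bprec"; neither is confirmed by this card. The wrapper `load_domain` is a hypothetical helper. Loading also depends on the installed version of the `datasets` library supporting this dataset and on the remote data files being available.

```python
# Assumption: configuration names match the domain names listed in the card.
DOMAIN_CONFIGS = ("tele", "electro", "cosmetics", "banking")


def load_domain(name: str):
    """Load one domain configuration of the corpus (hypothetical helper)."""
    if name not in DOMAIN_CONFIGS:
        raise ValueError(
            f"Unknown domain {name!r}; expected one of {', '.join(DOMAIN_CONFIGS)}"
        )
    # Imported lazily so that the name check works without the library installed.
    from datasets import load_dataset

    # Returns a DatasetDict; since no train/validation/test split is defined,
    # inspect its keys rather than assuming a particular split name.
    return load_dataset("bprec", name)


if __name__ == "__main__":
    dataset_dict = load_domain("tele")
    print(list(dataset_dict.keys()))
```

Validating the name before calling the library keeps typos from turning into a slower, less readable download error.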

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

[More Information Needed]

Citation Information

@inproceedings{inproceedings,
author = {Janz, Arkadiusz and Kopociński, Łukasz and Piasecki, Maciej and Pluwak, Agnieszka},
year = {2020},
month = {05},
pages = {},
title = {Brand-Product Relation Extraction Using Heterogeneous Vector Space Representations}
}

Contributions

Thanks to @kldarek for adding this dataset.
