Name: TolokerGraph
Creator: Toloka
License: https://choosealicense.com/licenses/cc-by-4.0/

Tasks: Graph Machine Learning

Size Categories: 10K<n<100K

Tags: toloka graph node-classification

Toloker Graph: Interaction of Crowd Annotators

Dataset Summary

This repository contains a graph representing interactions between crowd annotators on a data labeling project run on the Toloka crowdsourcing platform (see the Toloka overview for details on the terminology used).

The graph contains 11,758 nodes and 519,000 edges. Each node represents an individual annotator and carries four numerical and three categorical features. An edge connects two annotators if they annotated the same task. Each node also has a binary label indicating whether the annotator was banned on this project.
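The edge rule above can be illustrated with a minimal sketch. The dataset ships only the finished graph, so the `(task_id, annotator_id)` input format and the `co_annotation_edges` helper below are hypothetical and are not the authors' actual construction code.

```python
from collections import defaultdict
from itertools import combinations


def co_annotation_edges(assignments):
    """Return undirected edges between annotators who share a task.

    `assignments` is an iterable of (task_id, annotator_id) pairs.
    This input format is an assumption made for illustration only.
    """
    annotators_by_task = defaultdict(set)
    for task_id, annotator_id in assignments:
        annotators_by_task[task_id].add(annotator_id)

    edges = set()
    for annotators in annotators_by_task.values():
        # Sorting gives every pair a canonical (smaller, larger) order,
        # so (a, b) and (b, a) collapse into a single undirected edge.
        for a, b in combinations(sorted(annotators), 2):
            edges.add((a, b))
    return edges


# Toy data: annotators 0 and 1 share tasks t1 and t3; 1 and 2 share t2.
toy = [("t1", 0), ("t1", 1), ("t2", 1), ("t2", 2), ("t3", 0), ("t3", 1)]
edges = co_annotation_edges(toy)
print(sorted(edges))  # [(0, 1), (1, 2)]
```

Sharing several tasks still yields one edge per pair, which matches an unweighted graph as described here.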

Nodes

Nodes are stored in the nodes.tsv file, a tab-separated (TSV) file with the following columns:

id: unique identifier of the annotator
approved_rate: share of this annotator's labels that were approved (a fraction between 0 and 1)
skipped_rate: share of this annotator's tasks that were skipped (a fraction between 0 and 1)
expired_rate: share of this annotator's tasks that expired (a fraction between 0 and 1)
rejected_rate: share of this annotator's labels that were rejected (a fraction between 0 and 1)
education: level of education as self-reported by this annotator (none, basic, middle, high)
english_profile: knowledge of English as self-reported by this annotator (0 for no, 1 for yes)
english_tested: whether the annotator passed the Toloka language test for English (0 for no, 1 for yes)
banned: whether the annotator was banned on this project (0 for no, 1 for yes)

For each annotator, the four *_rate attributes should sum to 1.
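As a minimal sketch, the node table can be parsed and the rate constraint checked with the Python standard library. The sketch assumes that nodes.tsv starts with a header row using the column names listed above and that the binary columns hold literal 0 or 1; neither is confirmed by this card. The sample rows are made up for illustration and do not come from the dataset.

```python
import csv
import io
import math

RATE_COLUMNS = ["approved_rate", "skipped_rate", "expired_rate", "rejected_rate"]


def read_nodes(handle):
    """Parse a nodes.tsv-like stream into a list of dicts with typed values.

    Assumes a header row with the column names from the card.
    """
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        parsed = {
            "id": row["id"],  # kept as a string; the id type is not specified
            "education": row["education"],
            "english_profile": int(row["english_profile"]),
            "english_tested": int(row["english_tested"]),
            "banned": int(row["banned"]),
        }
        for column in RATE_COLUMNS:
            parsed[column] = float(row[column])
        rows.append(parsed)
    return rows


def ids_with_bad_rates(rows, tolerance=1e-6):
    """Return ids of rows whose four rates do not sum to 1 within tolerance."""
    return [
        row["id"]
        for row in rows
        if not math.isclose(
            sum(row[c] for c in RATE_COLUMNS), 1.0, abs_tol=tolerance
        )
    ]


# Made-up sample: the second row deliberately violates the constraint.
SAMPLE_NODES = (
    "id\tapproved_rate\tskipped_rate\texpired_rate\trejected_rate"
    "\teducation\tenglish_profile\tenglish_tested\tbanned\n"
    "0\t0.7\t0.1\t0.1\t0.1\thigh\t1\t1\t0\n"
    "1\t0.5\t0.2\t0.1\t0.1\tmiddle\t0\t0\t1\n"
)

nodes = read_nodes(io.StringIO(SAMPLE_NODES))
print(ids_with_bad_rates(nodes))  # ['1']
```

To run this on the real file, replace the `io.StringIO` sample with `open("nodes.tsv", newline="", encoding="utf-8")`.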

Edges

Edges are stored in the edges.tsv file, a tab-separated (TSV) file with the following columns:

source: identifier of the first annotator in the pair
target: identifier of the second annotator in the pair

As the graph is undirected, source and target can be interchanged for the given pair of nodes.
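Because the graph is undirected, a loader should treat `(source, target)` and `(target, source)` as the same edge. The sketch below builds a symmetric adjacency map using only the standard library. It assumes that edges.tsv has a header row with `source` and `target` columns and that the file contains no self-loops; the sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict


def read_adjacency(handle):
    """Build an undirected adjacency map from an edges.tsv-like stream."""
    adjacency = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        source, target = row["source"], row["target"]
        # Store both directions so neighbor lookups are symmetric;
        # sets make a repeated or reversed pair harmless.
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


# Invented sample: the last row repeats the first edge in reverse order.
SAMPLE_EDGES = "source\ttarget\n0\t1\n1\t2\n1\t0\n"

adjacency = read_adjacency(io.StringIO(SAMPLE_EDGES))
# Each undirected edge is stored twice, once per endpoint.
num_edges = sum(len(neighbors) for neighbors in adjacency.values()) // 2
print(num_edges)  # 2
```

The resulting edge count can be compared against the figure stated in the summary as a quick sanity check on the real file.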

Citation

Likhobaba, D., Pavlichenko, N., Ustalov, D. (2023). Toloker Graph: Interaction of Crowd Annotators. Zenodo. https://doi.org/10.5281/zenodo.7620795

@dataset{Tolokers,
  author     = {Likhobaba, Daniil and Pavlichenko, Nikita and Ustalov, Dmitry},
  title      = {{Toloker Graph: Interaction of Crowd Annotators}},
  year       = {2023},
  publisher  = {Zenodo},
  doi        = {10.5281/zenodo.7620795},
  url        = {https://github.com/Toloka/TolokerGraph},
  language   = {english},
}

Copyright

Licensed under the Creative Commons Attribution 4.0 License. See the LICENSE file for more details.
