Dataset Card for "trivia_qa"
Dataset Summary
TriviaQA is a reading comprehension dataset containing over 650K question-answer-evidence triples. It includes 95K question-answer pairs authored by trivia enthusiasts, together with independently gathered evidence documents (six per question on average) that provide high-quality distant supervision for answering the questions.
Supported Tasks and Leaderboards
Languages
English.
Dataset Structure
Data Instances
rc
- Size of downloaded dataset files: 2.67 GB
- Size of the generated dataset: 16.02 GB
- Total amount of disk used: 18.68 GB
An example of 'train' looks as follows.
rc.nocontext
- Size of downloaded dataset files: 2.67 GB
- Size of the generated dataset: 126.27 MB
- Total amount of disk used: 2.79 GB
An example of 'train' looks as follows.
unfiltered
- Size of downloaded dataset files: 3.30 GB
- Size of the generated dataset: 29.24 GB
- Total amount of disk used: 32.54 GB
An example of 'validation' looks as follows.
unfiltered.nocontext
- Size of downloaded dataset files: 632.55 MB
- Size of the generated dataset: 74.56 MB
- Total amount of disk used: 707.11 MB
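
The disk figures listed for the four configurations can guide which one to download first. The following is a minimal sketch, not part of the original card: the helper names `configs_within_budget` and `load_config` are hypothetical, the MB-to-GB conversion assumes 1 GB = 1000 MB, and the `load_dataset("trivia_qa", ...)` call assumes the Hugging Face `datasets` package is installed and that the dataset resolves under the name given in this card's title.

```python
# Total disk use per configuration, copied from the figures listed above (in GB).
TOTAL_DISK_GB = {
    "rc": 18.68,
    "rc.nocontext": 2.79,
    "unfiltered": 32.54,
    "unfiltered.nocontext": 707.11 / 1000,  # listed as 707.11 MB; 1 GB = 1000 MB assumed
}


def configs_within_budget(budget_gb):
    """Return the configurations whose total disk use fits the budget, smallest first."""
    fitting = [name for name, size in TOTAL_DISK_GB.items() if size <= budget_gb]
    return sorted(fitting, key=TOTAL_DISK_GB.get)


def load_config(name, split="train"):
    """Load one configuration; needs the `datasets` package and network access."""
    # Imported lazily so that configs_within_budget works without the package.
    from datasets import load_dataset

    # The dataset identifier "trivia_qa" is an assumption taken from the card title.
    return load_dataset("trivia_qa", name, split=split)
```

For instance, `configs_within_budget(5.0)` selects only the two `.nocontext` configurations, which omit the bulky evidence text.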
An example of 'train' looks as follows.
Data Fields
The data fields are the same among all splits.
rc
- `question`: a `string` feature.
- `question_id`: a `string` feature.
- `question_source`: a `string` feature.
- `entity_pages`: a dictionary feature containing:
  - `doc_source`: a `string` feature.
  - `filename`: a `string` feature.
  - `title`: a `string` feature.
  - `wiki_context`: a `string` feature.
- `search_results`: a dictionary feature containing:
  - `description`: a `string` feature.
  - `filename`: a `string` feature.
  - `rank`: an `int32` feature.
  - `title`: a `string` feature.
  - `url`: a `string` feature.
  - `search_context`: a `string` feature.
- `aliases`: a `list` of `string` features.
- `normalized_aliases`: a `list` of `string` features.
- `matched_wiki_entity_name`: a `string` feature.
- `normalized_matched_wiki_entity_name`: a `string` feature.
- `normalized_value`: a `string` feature.
- `type`: a `string` feature.
- `value`: a `string` feature.
rc.nocontext
- `question`: a `string` feature.
- `question_id`: a `string` feature.
- `question_source`: a `string` feature.
- `entity_pages`: a dictionary feature containing:
  - `doc_source`: a `string` feature.
  - `filename`: a `string` feature.
  - `title`: a `string` feature.
  - `wiki_context`: a `string` feature.
- `search_results`: a dictionary feature containing:
  - `description`: a `string` feature.
  - `filename`: a `string` feature.
  - `rank`: an `int32` feature.
  - `title`: a `string` feature.
  - `url`: a `string` feature.
  - `search_context`: a `string` feature.
- `aliases`: a `list` of `string` features.
- `normalized_aliases`: a `list` of `string` features.
- `matched_wiki_entity_name`: a `string` feature.
- `normalized_matched_wiki_entity_name`: a `string` feature.
- `normalized_value`: a `string` feature.
- `type`: a `string` feature.
- `value`: a `string` feature.
unfiltered
- `question`: a `string` feature.
- `question_id`: a `string` feature.
- `question_source`: a `string` feature.
- `entity_pages`: a dictionary feature containing:
  - `doc_source`: a `string` feature.
  - `filename`: a `string` feature.
  - `title`: a `string` feature.
  - `wiki_context`: a `string` feature.
- `search_results`: a dictionary feature containing:
  - `description`: a `string` feature.
  - `filename`: a `string` feature.
  - `rank`: an `int32` feature.
  - `title`: a `string` feature.
  - `url`: a `string` feature.
  - `search_context`: a `string` feature.
- `aliases`: a `list` of `string` features.
- `normalized_aliases`: a `list` of `string` features.
- `matched_wiki_entity_name`: a `string` feature.
- `normalized_matched_wiki_entity_name`: a `string` feature.
- `normalized_value`: a `string` feature.
- `type`: a `string` feature.
- `value`: a `string` feature.
unfiltered.nocontext
- `question`: a `string` feature.
- `question_id`: a `string` feature.
- `question_source`: a `string` feature.
- `entity_pages`: a dictionary feature containing:
  - `doc_source`: a `string` feature.
  - `filename`: a `string` feature.
  - `title`: a `string` feature.
  - `wiki_context`: a `string` feature.
- `search_results`: a dictionary feature containing:
  - `description`: a `string` feature.
  - `filename`: a `string` feature.
  - `rank`: an `int32` feature.
  - `title`: a `string` feature.
  - `url`: a `string` feature.
  - `search_context`: a `string` feature.
- `aliases`: a `list` of `string` features.
- `normalized_aliases`: a `list` of `string` features.
- `matched_wiki_entity_name`: a `string` feature.
- `normalized_matched_wiki_entity_name`: a `string` feature.
- `normalized_value`: a `string` feature.
- `type`: a `string` feature.
- `value`: a `string` feature.
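
The answer-related fields (`value`, `aliases`, and their normalized counterparts) suggest a simple way to check a predicted answer against every accepted string. The sketch below is illustrative only and makes several assumptions: the record is a plain dictionary with the fields laid out flat as listed above (the loaded dataset may nest them differently), the helper names `accepted_answers` and `is_match` are hypothetical, and the lowercase-and-collapse-whitespace normalization is a simplification rather than the official TriviaQA scoring procedure.

```python
def accepted_answers(record):
    """Collect every normalized answer string listed for a record."""
    answers = set(record.get("normalized_aliases", []))
    value = record.get("normalized_value")
    if value:
        answers.add(value)
    return answers


def is_match(prediction, record):
    """Check a prediction against the accepted answers of a record."""
    # Simplified normalization (an assumption): lowercase and collapse whitespace.
    normalized = " ".join(prediction.lower().split())
    return normalized in accepted_answers(record)


# Hypothetical record for illustration only; it is not taken from the dataset.
example = {
    "question": "Placeholder question?",
    "value": "Example Answer",
    "normalized_value": "example answer",
    "aliases": ["Example Answer", "The Example Answer"],
    "normalized_aliases": ["example answer", "the example answer"],
}
```

Matching against the full alias set rather than `value` alone avoids penalizing a prediction that uses a different but accepted surface form.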
Data Splits
name | train | validation | test |
---|---|---|---|
rc | 138384 | 18669 | 17210 |
rc.nocontext | 138384 | 18669 | 17210 |
unfiltered | 87622 | 11313 | 10832 |
unfiltered.nocontext | 87622 | 11313 | 10832 |
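
As a quick check on the table, the share of examples in each split can be computed directly from the listed counts. This is a small sketch with a hypothetical helper name, `split_fractions`; the counts are copied from the table, and the `.nocontext` configurations are omitted because their counts match their parent configurations.

```python
# Example counts per split, copied from the table above.
SPLIT_SIZES = {
    "rc": {"train": 138384, "validation": 18669, "test": 17210},
    "unfiltered": {"train": 87622, "validation": 11313, "test": 10832},
}


def split_fractions(sizes):
    """Return each split's share of the configuration's total example count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


if __name__ == "__main__":
    for config, sizes in SPLIT_SIZES.items():
        shares = split_fractions(sizes)
        print(config, {name: round(share, 3) for name, share in shares.items()})
```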
Dataset Creation
Curation Rationale
Source Data
Initial Data Collection and Normalization
Who are the source language producers?
Annotations
Annotation process
Who are the annotators?
Personal and Sensitive Information
Considerations for Using the Data
Social Impact of Dataset
Discussion of Biases
Other Known Limitations
Additional Information
Dataset Curators
Licensing Information
The University of Washington does not own the copyright of the questions and documents included in TriviaQA.
Citation Information
@article{2017arXivtriviaqa,
author = {{Joshi}, Mandar and {Choi}, Eunsol and {Weld},
Daniel and {Zettlemoyer}, Luke},
title = "{TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension}",
journal = {arXiv e-prints},
year = 2017,
eid = {arXiv:1705.03551},
pages = {arXiv:1705.03551},
archivePrefix = {arXiv},
eprint = {1705.03551},
}
Contributions
Thanks to @thomwolf, @patrickvonplaten, @lewtun for adding this dataset.