
Dataset Card for GLUE

Dataset Summary

GLUE, the General Language Understanding Evaluation benchmark (https://gluebenchmark.com/), is a collection of resources for training, evaluating, and analyzing natural language understanding systems.

Supported Tasks and Leaderboards

The leaderboard for the GLUE benchmark is available on the benchmark website (https://gluebenchmark.com/). The benchmark comprises the following tasks:

ax

A manually curated evaluation dataset for fine-grained analysis of system performance on a broad range of linguistic phenomena. This dataset evaluates sentence understanding through Natural Language Inference (NLI) problems. Use a model trained on MultiNLI to produce predictions for this dataset.

cola

The Corpus of Linguistic Acceptability consists of English acceptability judgments drawn from books and journal articles on linguistic theory. Each example is a sequence of words annotated with whether it is a grammatical English sentence.

mnli

The Multi-Genre Natural Language Inference Corpus is a crowdsourced collection of sentence pairs with textual entailment annotations. Given a premise sentence and a hypothesis sentence, the task is to predict whether the premise entails the hypothesis (entailment), contradicts the hypothesis (contradiction), or neither (neutral). The premise sentences are gathered from ten different sources, including transcribed speech, fiction, and government reports. The authors of the benchmark use the standard test set, for which they obtained private labels from the corpus authors, and evaluate on both the matched (in-domain) and mismatched (cross-domain) sections. They also use and recommend the SNLI corpus as 550k examples of auxiliary training data.
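As a minimal sketch of reporting results on both sections, the snippet below computes accuracy separately for matched and mismatched predictions. The `accuracy` helper and the toy label lists are illustrative assumptions, not benchmark code; the integer ids follow the label mapping listed under Data Fields (0 entailment, 1 neutral, 2 contradiction).

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], references: Sequence[int]) -> float:
    """Fraction of positions where the predicted label equals the gold label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy gold labels and predictions (hypothetical values, for illustration only).
matched_gold, matched_pred = [0, 1, 2, 0], [0, 1, 1, 0]
mismatched_gold, mismatched_pred = [2, 2, 0, 1], [2, 0, 0, 2]

# Report the two sections side by side rather than pooling them.
scores = {
    "matched": accuracy(matched_pred, matched_gold),
    "mismatched": accuracy(mismatched_pred, mismatched_gold),
}
print(scores)
```

Keeping the two scores apart makes a gap between in-domain and cross-domain performance visible, which a pooled number would hide.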

mnli_matched

The matched validation and test splits from MNLI. See the "mnli" BuilderConfig for additional information.

mnli_mismatched

The mismatched validation and test splits from MNLI. See the "mnli" BuilderConfig for additional information.

mrpc

The Microsoft Research Paraphrase Corpus (Dolan & Brockett, 2005) is a corpus of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in the pair are semantically equivalent.

qnli

The Stanford Question Answering Dataset is a question-answering dataset consisting of question-paragraph pairs, where one of the sentences in the paragraph (drawn from Wikipedia) contains the answer to the corresponding question (written by an annotator). The authors of the benchmark convert the task into sentence pair classification by forming a pair between each question and each sentence in the corresponding context, and filtering out pairs with low lexical overlap between the question and the context sentence. The task is to determine whether the context sentence contains the answer to the question. This modified version of the original task removes the requirement that the model select the exact answer, but also removes the simplifying assumptions that the answer is always present in the input and that lexical overlap is a reliable cue.
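The conversion described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the benchmark authors' code: the regex-based sentence splitter, the token-set overlap measure, and the `min_overlap` threshold are all hypothetical choices made for the example.

```python
import re
from typing import List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(paragraph: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]


def make_pairs(question: str, paragraph: str, min_overlap: int = 2) -> List[Tuple[str, str]]:
    """Pair the question with each context sentence sharing at least
    `min_overlap` tokens with it (assumed overlap criterion)."""
    q_tokens = tokenize(question)
    pairs = []
    for sentence in split_sentences(paragraph):
        if len(q_tokens & tokenize(sentence)) >= min_overlap:
            pairs.append((question, sentence))
    return pairs


# Toy input, for illustration only.
question = "Where is the Eiffel Tower located?"
paragraph = (
    "The Eiffel Tower is located in Paris. "
    "It was completed in 1889. "
    "Millions of people visit the tower every year."
)
pairs = make_pairs(question, paragraph)
for q, s in pairs:
    print(q, "|", s)
```

In this toy run the second sentence is filtered out, while the third survives the filter even though it does not answer the question. That is the situation the task description points at: lexical overlap alone does not identify the answer sentence.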

qqp

The Quora Question Pairs dataset is a collection of question pairs from the community question-answering website Quora. The task is to determine whether a pair of questions are semantically equivalent.

rte

The Recognizing Textual Entailment (RTE) datasets come from a series of annual textual entailment challenges. The authors of the benchmark combined the data from RTE1 (Dagan et al., 2006), RTE2 (Bar Haim et al., 2006), RTE3 (Giampiccolo et al., 2007), and RTE5 (Bentivogli et al., 2009). Examples are constructed based on news and Wikipedia text. The authors of the benchmark convert all datasets to a two-class split, where for three-class datasets they collapse neutral and contradiction into not entailment, for consistency.
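The collapse from three classes to two can be written as a simple lookup. The sketch below is illustrative only; the string label names and the `collapse_label` helper are assumptions made for this example, not identifiers taken from the benchmark code.

```python
# Map three-class NLI labels onto the two-class split described above.
THREE_WAY_TO_TWO_WAY = {
    "entailment": "entailment",
    "neutral": "not_entailment",
    "contradiction": "not_entailment",
}


def collapse_label(label: str) -> str:
    """Return the two-class label for a three-class NLI label."""
    try:
        return THREE_WAY_TO_TWO_WAY[label]
    except KeyError:
        raise ValueError(f"unknown three-class label: {label!r}") from None


collapsed = [collapse_label(name) for name in ("entailment", "neutral", "contradiction")]
print(collapsed)
```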

sst2

The Stanford Sentiment Treebank consists of sentences from movie reviews and human annotations of their sentiment. The task is to predict the sentiment of a given sentence. It uses the two-way (positive/negative) class split, with only sentence-level labels.

stsb

The Semantic Textual Similarity Benchmark (Cer et al., 2017) is a collection of sentence pairs drawn from news headlines, video and image captions, and natural language inference data. Each pair is human-annotated with a similarity score from 0 to 5.

wnli

The Winograd Schema Challenge (Levesque et al., 2011) is a reading comprehension task in which a system must read a sentence with a pronoun and select the referent of that pronoun from a list of choices. The examples are manually constructed to foil simple statistical methods: each one is contingent on contextual information provided by a single word or phrase in the sentence. To convert the problem into sentence pair classification, the authors of the benchmark construct sentence pairs by replacing the ambiguous pronoun with each possible referent. The task is to predict if the sentence with the pronoun substituted is entailed by the original sentence. They use a small evaluation set consisting of new examples derived from fiction books that was shared privately by the authors of the original corpus. While the included training set is balanced between two classes, the test set is imbalanced between them (65% not entailment). Also, due to a data quirk, the development set is adversarial: hypotheses are sometimes shared between training and development examples, so if a model memorizes the training examples, it will predict the wrong label on the corresponding development set example. As with QNLI, each example is evaluated separately, so there is not a systematic correspondence between a model's score on this task and its score on the unconverted original task. The authors of the benchmark call the converted dataset WNLI (Winograd NLI).
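The pronoun substitution step can be sketched as below. This is a simplified illustration, not the procedure used to build WNLI: the `build_pairs` helper is hypothetical, it replaces only the first whole-word occurrence of the pronoun, and it keeps the full sentence as the hypothesis.

```python
import re
from typing import List, Tuple


def build_pairs(sentence: str, pronoun: str, referents: List[str]) -> List[Tuple[str, str]]:
    """Return one (premise, hypothesis) pair per candidate referent."""
    pattern = re.compile(rf"\b{re.escape(pronoun)}\b")
    if pattern.search(sentence) is None:
        raise ValueError(f"pronoun {pronoun!r} not found in sentence")
    pairs = []
    for referent in referents:
        # A function replacement avoids treating backslashes in the referent as escapes.
        hypothesis = pattern.sub(lambda _match: referent, sentence, count=1)
        pairs.append((sentence, hypothesis))
    return pairs


# Classic Winograd-style sentence, used here purely as a toy input.
sentence = "The trophy doesn't fit in the suitcase because it is too big."
pairs = build_pairs(sentence, "it", ["the trophy", "the suitcase"])
for premise, hypothesis in pairs:
    print(hypothesis)
```

Each resulting pair would then receive its own entailment label, which is why examples are scored one at a time rather than as a multiple-choice group.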

Languages

The language data in GLUE is in English (BCP-47 en).

Dataset Structure

Data Instances

ax

  • Size of downloaded dataset files: 0.22 MB
  • Size of the generated dataset: 0.24 MB
  • Total amount of disk used: 0.46 MB

An example of 'test' looks as follows.

{
  "premise": "The cat sat on the mat.",
  "hypothesis": "The cat did not sit on the mat.",
  "label": -1,
  "idx": 0
}
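A record like the one above could be fetched and inspected as sketched below. The sketch assumes the Hugging Face `datasets` package is installed and that the dataset is reachable under the identifier "glue", as on this page; the identifier may differ across Hub or library versions. The `load_glue_config` and `has_gold_label` helpers are hypothetical names introduced for this example.

```python
from typing import Any, Dict


def load_glue_config(config_name: str):
    """Load one GLUE configuration (e.g. "ax", "cola", "mnli").

    Assumes the `datasets` package is installed and the Hub is reachable.
    The import is local so the helper below still works without it.
    """
    from datasets import load_dataset

    return load_dataset("glue", config_name)


def has_gold_label(example: Dict[str, Any]) -> bool:
    """Return False for examples whose label is withheld (stored as -1)."""
    return example["label"] != -1


# The 'test' instance shown above, written as a plain dict.
ax_example = {
    "premise": "The cat sat on the mat.",
    "hypothesis": "The cat did not sit on the mat.",
    "label": -1,
    "idx": 0,
}
print(has_gold_label(ax_example))
```

With network access, `load_glue_config("ax")["test"][0]` would be expected to return a record of this shape. Filtering on `has_gold_label` is a cheap guard against accidentally scoring a model on splits whose labels are hidden.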

cola

  • Size of downloaded dataset files: 0.38 MB
  • Size of the generated dataset: 0.61 MB
  • Total amount of disk used: 0.99 MB

An example of 'train' looks as follows.

{
  "sentence": "Our friends won't buy this analysis, let alone the next one we propose.",
  "label": 1,
  "idx": 0
}

mnli

  • Size of downloaded dataset files: 312.78 MB
  • Size of the generated dataset: 82.47 MB
  • Total amount of disk used: 395.26 MB

An example of 'train' looks as follows.

{
  "premise": "Conceptually cream skimming has two basic dimensions - product and geography.",
  "hypothesis": "Product and geography are what make cream skimming work.",
  "label": 1,
  "idx": 0
}

mnli_matched

  • Size of downloaded dataset files: 312.78 MB
  • Size of the generated dataset: 3.69 MB
  • Total amount of disk used: 316.48 MB

An example of 'test' looks as follows.

{
  "premise": "Hierbas, ans seco, ans dulce, and frigola are just a few names worth keeping a look-out for.",
  "hypothesis": "Hierbas is a name worth looking out for.",
  "label": -1,
  "idx": 0
}

mnli_mismatched

  • Size of downloaded dataset files: 312.78 MB
  • Size of the generated dataset: 3.91 MB
  • Total amount of disk used: 316.69 MB

An example of 'test' looks as follows.

{
  "premise": "What have you decided, what are you going to do?",
  "hypothesis": "So what's your decision?",
  "label": -1,
  "idx": 0
}

mrpc

More Information Needed

qnli

More Information Needed

qqp

More Information Needed

rte

More Information Needed

sst2

More Information Needed

stsb

More Information Needed

wnli

More Information Needed

Data Fields

The data fields are the same among all splits.

ax

  • premise: a string feature.
  • hypothesis: a string feature.
  • label: a classification label, with possible values including entailment (0), neutral (1), contradiction (2).
  • idx: an int32 feature.

cola

  • sentence: a string feature.
  • label: a classification label, with possible values including unacceptable (0), acceptable (1).
  • idx: an int32 feature.

mnli

  • premise: a string feature.
  • hypothesis: a string feature.
  • label: a classification label, with possible values including entailment (0), neutral (1), contradiction (2).
  • idx: an int32 feature.

mnli_matched

  • premise: a string feature.
  • hypothesis: a string feature.
  • label: a classification label, with possible values including entailment (0), neutral (1), contradiction (2).
  • idx: an int32 feature.

mnli_mismatched

  • premise: a string feature.
  • hypothesis: a string feature.
  • label: a classification label, with possible values including entailment (0), neutral (1), contradiction (2).
  • idx: an int32 feature.
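The integer label ids listed above can be turned back into names with a small lookup. The sketch below is illustrative; `decode_label` and the two name lists are hypothetical helpers written from the field descriptions in this section, with -1 treated as "no label" as in the test instances shown earlier.

```python
from typing import List, Optional

# Label names in id order, taken from the field descriptions above.
NLI_LABELS = ["entailment", "neutral", "contradiction"]  # ax, mnli, mnli_matched, mnli_mismatched
COLA_LABELS = ["unacceptable", "acceptable"]


def decode_label(label_id: int, names: List[str]) -> Optional[str]:
    """Map a label id to its name; return None when the label is withheld (-1)."""
    if label_id == -1:
        return None
    if not 0 <= label_id < len(names):
        raise ValueError(f"label id {label_id} is out of range for {names}")
    return names[label_id]


print(decode_label(1, NLI_LABELS))   # neutral
print(decode_label(1, COLA_LABELS))  # acceptable
print(decode_label(-1, NLI_LABELS))  # None
```

When the data is loaded through the `datasets` library, the label feature's `int2str` method offers a comparable conversion.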

mrpc

More Information Needed

qnli

More Information Needed

qqp

More Information Needed

rte

More Information Needed

sst2

More Information Needed

stsb

More Information Needed

wnli

More Information Needed

Data Splits

ax

config   test
ax       1104

cola

config   train   validation   test
cola     8551    1043         1063

mnli

config   train    validation_matched   validation_mismatched   test_matched   test_mismatched
mnli     392702   9815                 9832                    9796           9847

mnli_matched

config         validation   test
mnli_matched   9815         9796

mnli_mismatched

config            validation   test
mnli_mismatched   9832         9847

mrpc

More Information Needed

qnli

More Information Needed

qqp

More Information Needed

rte

More Information Needed

sst2

More Information Needed

stsb

More Information Needed

wnli

More Information Needed

Dataset Creation

Curation Rationale

More Information Needed

Source Data

Initial Data Collection and Normalization

More Information Needed

Who are the source language producers?

More Information Needed

Annotations

Annotation process

More Information Needed

Who are the annotators?

More Information Needed

Personal and Sensitive Information

More Information Needed

Considerations for Using the Data

Social Impact of Dataset

More Information Needed

Discussion of Biases

More Information Needed

Other Known Limitations

More Information Needed

Additional Information

Dataset Curators

More Information Needed

Licensing Information

More Information Needed

Citation Information

@article{warstadt2018neural,
  title={Neural Network Acceptability Judgments},
  author={Warstadt, Alex and Singh, Amanpreet and Bowman, Samuel R},
  journal={arXiv preprint arXiv:1805.12471},
  year={2018}
}
@inproceedings{wang2019glue,
  title={{GLUE}: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding},
  author={Wang, Alex and Singh, Amanpreet and Michael, Julian and Hill, Felix and Levy, Omer and Bowman, Samuel R.},
  note={In the Proceedings of ICLR.},
  year={2019}
}

Note that each GLUE dataset has its own citation. Please consult the source of
each contained dataset for its correct citation.

Contributions

Thanks to @patpizio, @jeswan, @thomwolf, @patrickvonplaten, @mariamabarham for adding this dataset.
