Dataset Preview

Columns: sentence1 (string), sentence2 (string), similarity_score (float32). Each row below is listed on three consecutive lines in that order.
"A plane is taking off."
"An air plane is taking off."
5
"A man is playing a large flute."
"A man is playing a flute."
3.8
"A man is spreading shreded cheese on a pizza."
"A man is spreading shredded cheese on an uncooked pizza."
3.8
"Three men are playing chess."
"Two men are playing chess."
2.6
"A man is playing the cello."
"A man seated is playing the cello."
4.25
"Some men are fighting."
"Two men are fighting."
4.25
"A man is smoking."
"A man is skating."
0.5
"The man is playing the piano."
"The man is playing the guitar."
1.6
"A man is playing on a guitar and singing."
"A woman is playing an acoustic guitar and singing."
2.2
"A person is throwing a cat on to the ceiling."
"A person throws a cat on the ceiling."
5
"The man hit the other man with a stick."
"The man spanked the other man with a stick."
4.2
"A woman picks up and holds a baby kangaroo."
"A woman picks up and holds a baby kangaroo in her arms."
4.6
"A man is playing a flute."
"A man is playing a bamboo flute."
3.867
"A person is folding a piece of paper."
"Someone is folding a piece of paper."
4.667
"A man is running on the road."
"A panda dog is running on the road."
1.667
"A dog is trying to get bacon off his back."
"A dog is trying to eat the bacon on its back."
3.75
"The polar bear is sliding on the snow."
"A polar bear is sliding across the snow."
5
"A woman is writing."
"A woman is swimming."
0.5
"A cat is rubbing against baby's face."
"A cat is rubbing against a baby."
3.8
"The man is riding a horse."
"A man is riding on a horse."
5
"A man pours oil into a pot."
"A man pours wine in a pot."
3.2
"A man is playing a guitar."
"A girl is playing a guitar."
2.8
"A panda is sliding down a slide."
"A panda slides down a slide."
4.6
"A woman is eating something."
"A woman is eating meat."
3
"A woman peels a potato."
"A woman is peeling a potato."
5
"The boy fell off his bike."
"A boy falls off his bike."
4.8
"The woman is playing the flute."
"A woman is playing a flute."
5
"A rabbit is running from an eagle."
"A hare is running from a eagle."
4.2
"The woman is frying a breaded pork chop."
"A woman is cooking a breaded pork chop."
4.2
"A girl is flying a kite."
"A girl running is flying a kite."
4
"A man is riding a mechanical bull."
"A man rode a mechanical bull."
4
"The man is playing the guitar."
"A man is playing a guitar."
4.909
"A woman is dancing and singing with other women."
"A woman is dancing and singing in the rain."
3
"A man is slicing a bun."
"A man is slicing an onion."
2.4
"A man is pouring oil into a pan."
"A man is pouring oil into a skillet."
4.2
"A lion is playing with people."
"A lion is playing with two men."
3.4
"A dog rides a skateboard."
"A dog is riding a skateboard."
5
"Someone is carving a statue."
"A man is carving a statue."
3.75
"A woman is slicing an onion."
"A man is cutting an onion."
2.75
"A woman peels shrimp."
"A woman is peeling shrimp."
5
"A woman is frying fish."
"A woman is cooking fish."
4
"A woman is playing an electric guitar."
"A woman is playing a guitar."
3.6
"A baby tiger is playing with a ball."
"A baby is playing with a doll."
1.6
"A person is slicing a tomato."
"A person is slicing some meat."
1.75
"A person cuts an onion."
"A person is cutting an onion."
5
"A man is playing the piano."
"A woman is playing the violin."
1
"A woman is playing the flute."
"A man is playing the guitar."
1
"A man is cutting up a potato."
"A man is cutting up carrots."
2.375
"A kid is playing guitar."
"A boy is playing a guitar."
3.8
"A boy is playing guitar."
"A man is playing a guitar."
3.2
"A man is playing guitar."
"A boy is playing a guitar."
3.2
"A little boy is playing a keyboard."
"A boy is playing key board."
4.4
"A man is playing a guitar."
"A man is playing an electric guitar."
3.75
"A dog licks a baby."
"A dog is licking a baby."
4.75
"A woman is slicing an onion."
"A man is cutting and onion."
3.2
"A man is playing the guitar."
"A man is playing the drums."
1.556
"A woman is slicing a pepper."
"A woman is cutting a red pepper."
3.938
"A man is playing the drums."
"A man plays the drum."
5
"A woman rides a horse."
"A woman is riding a horse."
5
"A man is eating a banana by a tree."
"A man is eating a banana."
4
"A cat is playing a key board."
"A man is playing two keyboards."
1.6
"A man chops down a tree with an axe."
"A man cut a tree with an axe."
4.75
"A kid plays with a toy phone."
"A little boy plays with a toy phone."
3.5
"A man is riding a motorcycle."
"A man is riding a horse."
1.4
"A man is riding a motorcycle."
"A man is riding a horse."
1.4
"A squirrel is spinning around in circles."
"A squirrel runs around in circles."
4
"A man and a woman are kissing."
"A man and woman kiss."
5
"A man is getting into a car."
"A man is getting into a car in a garage."
3.833
"A man is dancing."
"A man is talking."
0.6
"A man is playing the guitar and singing."
"A man is playing the guitar."
2.917
"A person is cutting mushrooms."
"A person is cutting mushrooms with a knife."
4.2
"A tiger cub is making a sound."
"A tiger is walking around."
2
"A person is slicing onions."
"A person is peeling an onion."
2.6
"A man is playing the piano."
"A man is playing the trumpet."
1.6
"A woman is peeling a potato."
"A woman is peeling an apple."
2
"A pankda is eating bamboo."
"A panda bear is eating some bamboo."
4.2
"A person is peeling an onion."
"A person is peeling an eggplant."
2
"A monkey pushes another monkey."
"The monkey pushed the other monkey."
4.8
"A squirrel runs around in circles."
"A squirrel is moving in circles."
4.4
"A man is tying his shoe."
"A man ties his shoe."
5
"A boy is singing and playing the piano."
"A boy is playing the piano."
3
"A dog is eating water melon."
"A dog is eating a piece of watermelon."
4.25
"A woman is chopping broccoli."
"A woman is chopping broccoli with a knife."
4.25
"A man is peeling a potato."
"A man peeled a potatoe."
3.8
"A woman is playing a guitar."
"A man plays a guitar."
2.4
"A woman is slicing tomato."
"A man is slicing onion."
1.6
"A man swims underwater."
"A woman is swimming underwater."
2
"A man and woman are talking."
"A man and woman is eating."
1.6
"A small dog is chasing a yoga ball."
"A dog is chasing a ball."
4
"The men are playing cricket."
"The men are playing basketball."
2.2
"A man rides off on a motorcycle."
"A man is riding on a motorcycle."
4.4
"A man is playing a guitar."
"A man is singing and playing a guitar."
3.6
"The man talked on the telephone."
"The man is talking on the phone."
3.6
"A man is fishing."
"A man is exercising."
0.5
"A man is levitating."
"A man is talking."
0.8
"Two boys are driving."
"Two bays are dancing."
0.6
"A man is riding on a horse."
"A girl is riding a horse."
2.6
"A man is riding a bicycle."
"A monkey is riding a bike."
2
"A man is slicing potatoes."
"A woman is peeling potato."
2.2
"A woman is peeling a potato."
"A man is slicing potato."
2.4
End of preview (truncated to 100 rows)

Dataset Card for STSb Multi MT

Dataset Summary

STS Benchmark comprises a selection of the English datasets used in the STS tasks organized in the context of SemEval between 2012 and 2017. The selection of datasets includes text from image captions, news headlines and user forums. (source)

This dataset contains the English original of the STSbenchmark dataset together with machine translations of it into several other languages. The translations were produced with deepl.com. The dataset can be used to train sentence embedding models such as T-Systems-onsite/cross-en-de-roberta-sentence-transformer.

Examples of Use

Load German dev Dataset:

from datasets import load_dataset
dataset = load_dataset("stsb_multi_mt", name="de", split="dev")

Load English train Dataset:

from datasets import load_dataset
dataset = load_dataset("stsb_multi_mt", name="en", split="train")

Supported Tasks and Leaderboards

[More Information Needed]

Languages

Available languages are: de, en, es, fr, it, nl, pl, pt, ru, zh
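
Because the language is selected through the `name` argument, a mistyped code only fails once the download is attempted. The following is a minimal sketch of a validation step that runs before loading, assuming the language codes and split names listed in this card. `build_load_kwargs` and `load_stsb` are hypothetical helpers, not part of the `datasets` library.

```python
from typing import Dict

# Language codes and split names as listed in this dataset card.
LANGUAGES = ("de", "en", "es", "fr", "it", "nl", "pl", "pt", "ru", "zh")
SPLITS = ("train", "dev", "test")


def build_load_kwargs(language: str, split: str) -> Dict[str, str]:
    """Validate the arguments and return keyword arguments for load_dataset.

    Hypothetical helper for illustration only.
    """
    if language not in LANGUAGES:
        raise ValueError(f"unknown language {language!r}; expected one of {LANGUAGES}")
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    return {"path": "stsb_multi_mt", "name": language, "split": split}


def load_stsb(language: str, split: str):
    """Load one language/split pair after validating the arguments."""
    # Imported lazily so the validation helper works without the library installed.
    from datasets import load_dataset

    return load_dataset(**build_load_kwargs(language, split))
```

For example, `load_stsb("de", "dev")` is equivalent to the German dev call shown under Examples of Use, while `load_stsb("xx", "dev")` raises a `ValueError` before any network request is made.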

Dataset Structure

Data Instances

This dataset provides pairs of sentences and a score of their similarity.

| score | 2 example sentences | explanation |
|---|---|---|
| 5 | The bird is bathing in the sink. / Birdie is washing itself in the water basin. | The two sentences are completely equivalent, as they mean the same thing. |
| 4 | Two boys on a couch are playing video games. / Two boys are playing a video game. | The two sentences are mostly equivalent, but some unimportant details differ. |
| 3 | John said he is considered a witness but not a suspect. / “He is not a suspect anymore.” John said. | The two sentences are roughly equivalent, but some important information differs/missing. |
| 2 | They flew out of the nest in groups. / They flew into the nest together. | The two sentences are not equivalent, but share some details. |
| 1 | The woman is playing the violin. / The young lady enjoys listening to the guitar. | The two sentences are not equivalent, but are on the same topic. |
| 0 | The black dog is running through the snow. / A race car driver is driving his car through the mud. | The two sentences are completely dissimilar. |
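
The scores in the data are fractional (for example 3.8), while the rubric above describes whole-number levels. As a rough sketch, a score can be mapped to the nearest rubric level for inspection purposes. Rounding half up is an assumption made here, not something the dataset card prescribes, and `rubric_level` is a hypothetical helper.

```python
import math

# Explanations copied from the rubric in this dataset card.
RUBRIC = {
    5: "The two sentences are completely equivalent, as they mean the same thing.",
    4: "The two sentences are mostly equivalent, but some unimportant details differ.",
    3: "The two sentences are roughly equivalent, but some important information differs/missing.",
    2: "The two sentences are not equivalent, but share some details.",
    1: "The two sentences are not equivalent, but are on the same topic.",
    0: "The two sentences are completely dissimilar.",
}


def rubric_level(similarity_score: float) -> int:
    """Map a fractional score to the nearest whole rubric level.

    Assumption: ties (x.5) round up; the card does not define this.
    """
    if not 0.0 <= similarity_score <= 5.0:
        raise ValueError(f"score {similarity_score} is outside the range 0.0 to 5.0")
    return int(math.floor(similarity_score + 0.5))


if __name__ == "__main__":
    score = 3.8
    print(score, "->", rubric_level(score), RUBRIC[rubric_level(score)])
```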

An example:

{
    "sentence1": "A man is playing a large flute.",
    "sentence2": "A man is playing a flute.",
    "similarity_score": 3.8
}

Data Fields

  • sentence1: The 1st sentence as a str.
  • sentence2: The 2nd sentence as a str.
  • similarity_score: The similarity score as a float in the range 0.0 to 5.0 (inclusive).
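
When the pairs are used to train sentence embeddings with a cosine-similarity objective, the score is often rescaled to the range 0.0 to 1.0. The sketch below shows one way to do that, assuming simple division by the maximum score of 5.0. The rescaling is a common convention rather than something this card specifies, and `normalize_example` is a hypothetical helper.

```python
from typing import Dict, Union

MAX_SCORE = 5.0  # upper bound of similarity_score stated in this card

Example = Dict[str, Union[str, float]]


def normalize_example(example: Example) -> Example:
    """Return a copy of the example with similarity_score rescaled to [0.0, 1.0]."""
    score = float(example["similarity_score"])
    if not 0.0 <= score <= MAX_SCORE:
        raise ValueError(f"score {score} is outside the range 0.0 to {MAX_SCORE}")
    normalized = dict(example)  # leave the input untouched
    normalized["similarity_score"] = score / MAX_SCORE
    return normalized


if __name__ == "__main__":
    row = {
        "sentence1": "A man is playing a large flute.",
        "sentence2": "A man is playing a flute.",
        "similarity_score": 3.8,
    }
    print(normalize_example(row))
```

A function of this shape can also be passed to the `map` method of a loaded dataset, which applies it to each row.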

Data Splits

  • train with 5749 samples
  • dev with 1500 samples
  • test with 1379 samples
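
The split sizes above add up to 8628 sentence pairs per language. A small sanity check against these numbers can catch a partial or corrupted download. The sketch below assumes the caller supplies the observed sizes as a plain dictionary; `size_mismatches` is a hypothetical helper.

```python
from typing import Dict, Optional, Tuple

# Split sizes as stated in this dataset card.
EXPECTED_SIZES = {"train": 5749, "dev": 1500, "test": 1379}


def size_mismatches(actual: Dict[str, int]) -> Dict[str, Tuple[int, Optional[int]]]:
    """Return {split: (expected, observed)} for every split whose size differs.

    A split missing from `actual` is reported with an observed size of None.
    """
    return {
        split: (expected, actual.get(split))
        for split, expected in EXPECTED_SIZES.items()
        if actual.get(split) != expected
    }


if __name__ == "__main__":
    print(sum(EXPECTED_SIZES.values()))  # total number of pairs per language
    print(size_mismatches({"train": 5749, "dev": 1500, "test": 1379}))
```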

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

See LICENSE and the download page of the original dataset.

Citation Information

@InProceedings{huggingface:dataset:stsb_multi_mt,
title = {Machine translated multilingual STS benchmark dataset.},
author={Philip May},
year={2021},
url={https://github.com/PhilipMay/stsb-multi-mt}
}

Contributions

Thanks to @PhilipMay for adding this dataset.
