Dataset Card for TGIF

Dataset Summary

The Tumblr GIF (TGIF) dataset contains 100K animated GIFs and 120K sentences describing the visual content of the animated GIFs. The animated GIFs were collected from Tumblr, from randomly selected posts published between May and June of 2015. We provide the URLs of the animated GIFs in this release. The sentences were collected via crowdsourcing, with a carefully designed annotation interface that ensures a high-quality dataset. We provide one sentence per animated GIF for the training and validation splits, and three sentences per GIF for the test split. The dataset is intended for evaluating animated GIF/video description techniques.

Languages

The captions in the dataset are in English.

Dataset Structure

Data Fields

- video_path: str URL of the GIF, e.g. "https://31.media.tumblr.com/001a8b092b9752d260ffec73c0bc29cd/tumblr_ndotjhRiX51t8n92fo1_500.gif"
- video_bytes: large_bytes video file in bytes format
- en_global_captions: list_str List of English captions describing the entire video
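
To make the field layout concrete, here is a minimal sketch of one record as a Python typed dictionary, with a small structural check. The names `TGIFRecord` and `check_record` are hypothetical helpers, not part of any library, and mapping `large_bytes` to Python `bytes` and `list_str` to `List[str]` is an assumption. The caption and byte content in the example are made-up placeholders.

```python
from typing import List, TypedDict


class TGIFRecord(TypedDict):
    """One example, mirroring the three fields listed above (assumed types)."""
    video_path: str                # URL of the GIF
    video_bytes: bytes             # raw file contents
    en_global_captions: List[str]  # English captions for the whole GIF


def check_record(record: dict) -> TGIFRecord:
    """Raise ValueError if `record` does not match the documented fields."""
    if not isinstance(record.get("video_path"), str):
        raise ValueError("video_path must be a str")
    if not isinstance(record.get("video_bytes"), (bytes, bytearray)):
        raise ValueError("video_bytes must be bytes")
    captions = record.get("en_global_captions")
    if not isinstance(captions, list) or not all(isinstance(c, str) for c in captions):
        raise ValueError("en_global_captions must be a list of str")
    return record  # type: ignore[return-value]


# Placeholder record: the caption and the bytes are invented for illustration.
example = check_record({
    "video_path": "https://31.media.tumblr.com/001a8b092b9752d260ffec73c0bc29cd/tumblr_ndotjhRiX51t8n92fo1_500.gif",
    "video_bytes": b"GIF89a",
    "en_global_captions": ["placeholder caption"],
})
```

A check like this can be useful before writing decoded GIFs to disk, since it fails early on records with a missing or mistyped field.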

Data Splits

	train	validation	test	Overall
# of GIFs	80,000	10,708	11,360	102,068
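
As a quick consistency check on the table, the sketch below sums the split sizes, computes each split's share, and estimates the number of sentences. The estimate assumes exactly one sentence per GIF in the training and validation splits and three per GIF in the test split, as stated in the summary. The variable names are illustrative only.

```python
# Split sizes copied from the table above.
splits = {"train": 80_000, "validation": 10_708, "test": 11_360}

total_gifs = sum(splits.values())

# Share of each split, as a fraction of all GIFs.
shares = {name: count / total_gifs for name, count in splits.items()}

# Assumption: captions per GIF as described in the summary.
captions_per_gif = {"train": 1, "validation": 1, "test": 3}
estimated_sentences = sum(splits[name] * captions_per_gif[name] for name in splits)

for name, share in shares.items():
    print(f"{name}: {splits[name]:,} GIFs ({share:.1%})")
print(f"total GIFs: {total_gifs:,}")
print(f"estimated sentences: {estimated_sentences:,}")
```

Under these assumptions the total matches the "Overall" column, and the sentence estimate lands close to the roughly 120K sentences quoted in the summary.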

Annotations

Quoting the TGIF paper:
"We annotated animated GIFs with natural language descriptions using the crowdsourcing service CrowdFlower. We carefully designed our annotation task with various quality control mechanisms to ensure the sentences are both syntactically and semantically of high quality. A total of 931 workers participated in our annotation task. We allowed workers only from Australia, Canada, New Zealand, UK and USA in an effort to collect fluent descriptions from native English speakers. Figure 2 shows the instructions given to the workers. Each task showed 5 animated GIFs and asked the worker to describe each with one sentence. To promote language style diversity, each worker could rate no more than 800 images (0.7% of our corpus). We paid 0.02 USD per sentence; the entire crowdsourcing cost less than 4K USD. We provide details of our annotation task in the supplementary material."
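
The figures in the quote can be cross-checked with a little arithmetic. The sketch below is a rough sanity check, not a reconstruction of the authors' accounting: it assumes the round figure of 120K sentences from the summary and treats the 0.7% figure as exact.

```python
from decimal import Decimal

price_per_sentence = Decimal("0.02")  # USD, from the quote
budget_ceiling = Decimal("4000")      # "less than 4K USD"
approx_sentences = 120_000            # assumption: round figure from the summary

# Estimated payout if every one of the ~120K sentences was paid once.
estimated_cost = price_per_sentence * approx_sentences

# Maximum number of paid sentences that would still fit under the ceiling.
max_paid_sentences = budget_ceiling / price_per_sentence

# Corpus size implied by "800 images (0.7% of our corpus)".
implied_corpus_size = Decimal("800") / Decimal("0.007")

print(f"estimated cost: {estimated_cost} USD")
print(f"sentences payable under the ceiling: {max_paid_sentences}")
print(f"implied corpus size: about {round(implied_corpus_size):,}")
```

The estimated cost stays below the stated ceiling. The implied corpus size is on the order of 100K, consistent with the dataset size given in the summary.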

Personal and Sensitive Information

Nothing specifically mentioned in the paper.

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Licensing Information

This dataset is provided to be used for approved non-commercial research purposes. No personally identifying information is available in this dataset.

Citation Information

@InProceedings{tgif-cvpr2016,
  author = {Li, Yuncheng and Song, Yale and Cao, Liangliang and Tetreault, Joel and Goldberg, Larry and Jaimes, Alejandro and Luo, Jiebo},
  title = "{TGIF: A New Dataset and Benchmark on Animated GIF Description}",
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  month = {June},
  year = {2016}
}

Contributions

Thanks to @leot13 for adding this dataset.
