Dataset Card for OneStopEnglish corpus

Dataset Summary

OneStopEnglish is a corpus of texts written at three reading levels. Its usefulness has been demonstrated through two applications: automatic readability assessment and automatic text simplification.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More Information Needed]

Dataset Structure

Data Instances

An instance example:

{
  "text": "When you see the word Amazon, what’s the first thing you think...",
  "label": 0
}

Note that each instance contains the full text of the document.

Data Fields

  • text: Full document text.
  • label: Reading level of the document: ele, int, or adv (Elementary, Intermediate, or Advanced).
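
The instance example above stores the label as an integer, while the field description names the levels as strings. The following is a minimal sketch of turning one into the other. It assumes that the integer indices follow the order in which the levels are listed (0 for ele, 1 for int, 2 for adv); the card does not state this mapping, so verify it against the loaded dataset's features before relying on it. The names LABEL_NAMES, LABEL_DESCRIPTIONS, and describe_label are hypothetical helpers, not part of the dataset.

```python
# ASSUMPTION: label indices follow the order the card lists the levels in.
# The card does not confirm this; check the dataset's own feature metadata.
LABEL_NAMES = ["ele", "int", "adv"]
LABEL_DESCRIPTIONS = {
    "ele": "Elementary",
    "int": "Intermediate",
    "adv": "Advanced",
}


def describe_label(label: int) -> str:
    """Return a human-readable reading level for an integer label."""
    if not 0 <= label < len(LABEL_NAMES):
        raise ValueError(f"unknown label index: {label}")
    return LABEL_DESCRIPTIONS[LABEL_NAMES[label]]


# A shortened stand-in for the instance shown under Data Instances.
instance = {"text": "When you see the word Amazon, ...", "label": 0}
print(describe_label(instance["label"]))
```

Under the assumed ordering, the instance with label 0 would be an Elementary-level document.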

Data Splits

The OneStopEnglish dataset has a single train split.

Split    Number of instances
train    567
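
Because only a train split is provided, anyone who needs a held-out set has to create it. The card does not prescribe how; the sketch below shows one common approach, a stratified split that keeps the proportion of each reading level roughly equal in both parts. The function stratified_split and its parameters are hypothetical, and the toy data is synthetic: it only reuses the instance count of 567 and is not the real label distribution, which this card does not state.

```python
import random
from collections import defaultdict


def stratified_split(instances, test_fraction=0.2, seed=0):
    """Split instances into (train, test), stratified by the "label" field."""
    # Group instances by their reading-level label.
    by_label = defaultdict(list)
    for inst in instances:
        by_label[inst["label"]].append(inst)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]  # copy so the input is not mutated
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


# SYNTHETIC stand-in data: 567 placeholder documents with labels cycling 0, 1, 2.
toy = [{"text": f"doc {i}", "label": i % 3} for i in range(567)]
train, test = stratified_split(toy)
print(len(train), len(test))
```

With real data, the same function could be applied to the list of loaded instances, as long as each one exposes the text and label fields described above.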

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

[More Information Needed]

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

Creative Commons Attribution-ShareAlike 4.0 International License

Citation Information

[More Information Needed]

Contributions

Thanks to @purvimisal for adding this dataset.
