Dataset Card for OneStopEnglish corpus
Dataset Summary
OneStopEnglish is a corpus of texts written at three reading levels. Its usefulness is demonstrated through two applications: automatic readability assessment and automatic text simplification.
Supported Tasks and Leaderboards
[More Information Needed]
Languages
English
Dataset Structure
Data Instances
An example instance:
{
"text": "When you see the word Amazon, what’s the first thing you think...",
"label": 0
}
Note that each instance contains the full text of the document.
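The sketch below shows one way to represent and parse a record in Python. It assumes records arrive as JSON objects shaped like the example above. The `OneStopRecord` class and the `parse_record` helper are hypothetical names for illustration and are not part of the dataset's loader.

```python
import json
from dataclasses import dataclass


@dataclass
class OneStopRecord:
    # Full document text (the whole article, not a single sentence).
    text: str
    # Integer reading-level label, as in the example instance.
    label: int


def parse_record(raw: str) -> OneStopRecord:
    """Parse one JSON-encoded instance into a typed record (hypothetical helper)."""
    obj = json.loads(raw)
    if not isinstance(obj.get("text"), str):
        raise ValueError("'text' must be a string")
    if not isinstance(obj.get("label"), int):
        raise ValueError("'label' must be an integer")
    return OneStopRecord(text=obj["text"], label=obj["label"])


record = parse_record('{"text": "When you see the word Amazon, ...", "label": 0}')
print(record.label)  # 0
```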
Data Fields
text: Full document text.
label: Reading level of the document, one of ele/int/adv (Elementary/Intermediate/Advanced).
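The example instance stores the reading level as an integer. The following sketch converts between integers and level codes. It assumes the integers follow the order in which the levels are listed above (0 = ele, 1 = int, 2 = adv), so verify this against the label features of the copy you load. `decode_label` and `encode_label` are illustrative names.

```python
# Assumed ordering: follows the ele/int/adv listing in the field description.
LEVEL_NAMES = ["ele", "int", "adv"]
LEVEL_FULL = {"ele": "Elementary", "int": "Intermediate", "adv": "Advanced"}


def decode_label(label: int) -> str:
    """Map an integer label to its short reading-level code."""
    if not 0 <= label < len(LEVEL_NAMES):
        raise ValueError(f"unknown label: {label}")
    return LEVEL_NAMES[label]


def encode_label(code: str) -> int:
    """Map a short reading-level code back to its integer label."""
    return LEVEL_NAMES.index(code)


print(decode_label(0), LEVEL_FULL[decode_label(0)])  # ele Elementary
```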
Data Splits
The OneStopEnglish dataset has a single train split.
| Split | Number of instances |
|---|---|
| train | 567 |
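Only a train split is provided, so evaluation requires carving out a held-out set. The sketch below performs a per-label (stratified) split using only the standard library. The `stratified_split` name, the 20% test fraction, and the fixed seed are illustrative assumptions. The listed fields carry no article identifier, so this sketch splits by instance and cannot keep related texts in the same partition.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with roughly the same label mix in each part."""
    # Group instance indices by their label.
    by_label = defaultdict(list)
    for i, label in enumerate(labels):
        by_label[label].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    # Shuffle each label group and move a fixed fraction to the test part.
    for label in sorted(by_label):
        idx = by_label[label]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels for illustration only (not the real label distribution).
labels = [0, 1, 2] * 10
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 24 6
```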
Dataset Creation
Curation Rationale
[More Information Needed]
Source Data
Initial Data Collection and Normalization
[More Information Needed]
Who are the source language producers?
[More Information Needed]
Annotations
Annotation process
[More Information Needed]
Who are the annotators?
[More Information Needed]
Personal and Sensitive Information
[More Information Needed]
Considerations for Using the Data
Social Impact of Dataset
[More Information Needed]
Discussion of Biases
[More Information Needed]
Other Known Limitations
[More Information Needed]
Additional Information
Dataset Curators
[More Information Needed]
Licensing Information
Creative Commons Attribution-ShareAlike 4.0 International License
Citation Information
[More Information Needed]
Contributions
Thanks to @purvimisal for adding this dataset.