Dataset Card for VCTK

Dataset Summary

The CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about 400 sentences, which were selected from a newspaper, the Rainbow Passage, and an elicitation paragraph used for the Speech Accent Archive.

Supported Tasks and Leaderboards

[More Information Needed]

Languages

[More Information Needed]

Dataset Structure

Data Instances

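A data point comprises the path to the audio file, called file, the decoded audio, called audio, and its transcription, called text. Speaker metadata such as age, gender, accent, and region is also provided.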

{
  'speaker_id': 'p225',
  'text_id': '001',
  'text': 'Please call Stella.',
  'age': '23',
  'gender': 'F',
  'accent': 'English',
  'region': 'Southern England',
  'file': '/datasets/downloads/extracted/8ed7dad05dfffdb552a3699777442af8e8ed11e656feb277f35bf9aea448f49e/wav48_silence_trimmed/p225/p225_001_mic1.flac',
  'audio':
    {
      'path': '/datasets/downloads/extracted/8ed7dad05dfffdb552a3699777442af8e8ed11e656feb277f35bf9aea448f49e/wav48_silence_trimmed/p225/p225_001_mic1.flac',
      'array': array([0.00485229, 0.00689697, 0.00619507, ..., 0.00811768, 0.00836182, 0.00854492], dtype=float32),
      'sampling_rate': 48000
    },
  'comment': ''
}

Each audio file is a single-channel FLAC with a sample rate of 48000 Hz.
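
As a minimal sketch of how these fields relate, the helpers below derive a clip's duration from the length of the audio array and its sampling_rate, and recover the speaker ID, text ID, and microphone tag from a file name. The function names are hypothetical, and the file-name pattern is assumed from the single example path above (p225_001_mic1.flac); it has not been checked against every file in the corpus.

```python
import os
import re

# Pattern assumed from the example path above: <speaker>_<text>_<mic>.flac
_NAME_RE = re.compile(
    r"^(?P<speaker_id>[A-Za-z]\d+)_(?P<text_id>\d+)_(?P<mic>mic\d+)\.flac$"
)


def parse_vctk_filename(path):
    """Split a file name such as 'p225_001_mic1.flac' into its parts."""
    match = _NAME_RE.match(os.path.basename(path))
    if match is None:
        raise ValueError(f"unexpected file name: {path!r}")
    return match.groupdict()


def duration_seconds(audio):
    """Clip length in seconds: number of samples divided by the sampling rate."""
    return len(audio["array"]) / audio["sampling_rate"]


# Toy stand-in for the 'audio' field; real rows hold a NumPy array of samples.
toy_audio = {"array": [0.0] * 96000, "sampling_rate": 48000}
print(duration_seconds(toy_audio))  # 2.0
print(parse_vctk_filename("wav48_silence_trimmed/p225/p225_001_mic1.flac"))
```

Because len() works on both lists and NumPy arrays, duration_seconds should apply unchanged to the audio dictionary of a real row.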

Data Fields

Each row consists of the following fields:

  • speaker_id: Speaker ID
  • audio: Audio recording
  • file: Path to audio file
  • text: Text transcription of corresponding audio
  • text_id: Text ID
  • age: Speaker's age
  • gender: Speaker's gender
  • accent: Speaker's accent
  • region: Speaker's region, if annotation exists
  • comment: Miscellaneous comments, if any
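
The fields above can be used to select subsets of the corpus. The sketch below filters rows by exact field values and converts age to an integer, on the assumption that age is stored as a string as in the example instance ('23'). The helper names are hypothetical, and the second toy row is made up for illustration rather than taken from the corpus.

```python
def filter_rows(rows, **criteria):
    """Yield rows whose fields equal every key=value pair in criteria."""
    for row in rows:
        if all(row.get(key) == value for key, value in criteria.items()):
            yield row


def speaker_age(row):
    """Return the speaker's age as an int (assumed to be a string in the row)."""
    return int(row["age"])


# Toy rows shaped like the example instance; the audio fields are omitted.
rows = [
    {"speaker_id": "p225", "text_id": "001", "age": "23",
     "gender": "F", "accent": "English"},
    # Made-up row, for illustration only.
    {"speaker_id": "x001", "text_id": "001", "age": "30",
     "gender": "M", "accent": "Scottish"},
]

english_female = list(filter_rows(rows, gender="F", accent="English"))
print([row["speaker_id"] for row in english_female])  # ['p225']
print(speaker_age(english_female[0]))  # 23
```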

Data Splits

The dataset has no predefined splits.
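
Since no splits are provided, users have to define their own. One common choice for multi-speaker corpora, shown here only as a sketch and not as a split endorsed by the dataset authors, is to hold out whole speakers so that no voice appears in both sets. The function name, the default fraction, and the seed are all assumptions.

```python
import random


def split_by_speaker(rows, test_fraction=0.1, seed=0):
    """Split rows into (train, test) so that no speaker_id occurs in both."""
    speakers = sorted({row["speaker_id"] for row in rows})
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])
    train = [row for row in rows if row["speaker_id"] not in test_speakers]
    test = [row for row in rows if row["speaker_id"] in test_speakers]
    return train, test


# Toy data: 10 made-up speakers with 3 utterances each.
toy_rows = [
    {"speaker_id": f"spk{s}", "text_id": f"{t:03d}"}
    for s in range(10)
    for t in range(1, 4)
]
train, test = split_by_speaker(toy_rows, test_fraction=0.2)
print(len(train), len(test))  # 24 6
```

Splitting at the speaker level rather than the utterance level avoids evaluating on voices the model has already heard, at the cost of test sets whose size varies with how many utterances the held-out speakers recorded.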

Dataset Creation

Curation Rationale

[More Information Needed]

Source Data

Initial Data Collection and Normalization

[More Information Needed]

Who are the source language producers?

[More Information Needed]

Annotations

Annotation process

[More Information Needed]

Who are the annotators?

[More Information Needed]

Personal and Sensitive Information

The dataset consists of recordings from people who have donated their voices online. You agree not to attempt to determine the identity of speakers in this dataset.

Considerations for Using the Data

Social Impact of Dataset

[More Information Needed]

Discussion of Biases

[More Information Needed]

Other Known Limitations

[More Information Needed]

Additional Information

Dataset Curators

[More Information Needed]

Licensing Information

Public Domain, Creative Commons Attribution 4.0 International Public License (CC-BY-4.0)

Citation Information

@inproceedings{Veaux2017CSTRVC,
    title        = {CSTR VCTK Corpus: English Multi-speaker Corpus for CSTR Voice Cloning Toolkit},
    author       = {Christophe Veaux and Junichi Yamagishi and Kirsten MacDonald},
    year         = 2017
}

Contributions

Thanks to @jaketae for adding this dataset.
