Dataset Card for asvspoof2019

Dataset Summary

This is the database used for the Third Automatic Speaker Verification Spoofing and Countermeasures Challenge, ASVspoof 2019 for short (http://www.asvspoof.org), organized in 2019 by Junichi Yamagishi, Massimiliano Todisco, Md Sahidullah, Héctor Delgado, Xin Wang, Nicholas Evans, Tomi Kinnunen, Kong Aik Lee, Ville Vestman, and Andreas Nautsch.

Supported Tasks and Leaderboards

[Needs More Information]

Languages

English

Dataset Structure

Data Instances

{'speaker_id': 'LA_0091',
 'audio_file_name': 'LA_T_8529430',
 'audio': {'path': 'D:/Users/80304531/.cache/huggingface/datasets/downloads/extracted/8cabb6d5c283b0ed94b2219a8d459fea8e972ce098ef14d8e5a97b181f850502/LA/ASVspoof2019_LA_train/flac/LA_T_8529430.flac',
  'array': array([-0.00201416, -0.00234985, -0.0022583 , ...,  0.01309204,
          0.01339722,  0.01461792], dtype=float32),
  'sampling_rate': 16000},
 'system_id': 'A01',
 'key': 1}

Data Fields

Logical access (LA):

  • speaker_id: LA_****, a 4-digit speaker ID
  • audio_file_name: name of the audio file
  • audio: a dictionary containing the path to the downloaded audio file, the decoded audio array, and the sampling rate. When the audio column is accessed, e.g. dataset[0]["audio"], the audio file is automatically decoded and resampled to dataset.features["audio"].sampling_rate. Decoding and resampling a large number of audio files can take a significant amount of time, so query the sample index before the "audio" column: dataset[0]["audio"] should always be preferred over dataset["audio"][0].
  • system_id: ID of the speech spoofing system (A01 - A19); for bonafide speech, system_id is left blank ('-')
  • key: 'bonafide' for genuine speech, or 'spoof' for spoofed speech
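
The access-order advice for the audio column can be illustrated with a toy stand-in. This is a minimal sketch, not the datasets library: ToyAudioDataset and count_decodes are hypothetical names, and the class only simulates the behaviour the card describes (row access decodes one file, column access decodes every file) by counting decode calls.

```python
class ToyAudioDataset:
    """Toy stand-in (NOT the datasets library) that counts simulated decodes."""

    def __init__(self, file_names):
        self._file_names = list(file_names)
        self.decode_calls = 0

    def _decode(self, name):
        # Stand-in for reading and resampling one FLAC file.
        self.decode_calls += 1
        return {"path": name, "array": [0.0], "sampling_rate": 16000}

    def __getitem__(self, key):
        if isinstance(key, int):
            # Row access: only the requested example is decoded.
            return {"audio": self._decode(self._file_names[key])}
        if key == "audio":
            # Column access: every example is decoded before indexing.
            return [self._decode(name) for name in self._file_names]
        raise KeyError(key)


def count_decodes(access, n_files=1000):
    """Run one access pattern on a fresh toy dataset and return the decode count."""
    ds = ToyAudioDataset(f"LA_T_{i:07d}.flac" for i in range(n_files))
    access(ds)
    return ds.decode_calls


row_first = count_decodes(lambda ds: ds[0]["audio"])     # index, then column
column_first = count_decodes(lambda ds: ds["audio"][0])  # column, then index
print(row_first, column_first)  # 1 1000
```

Both expressions return the same example, but in this toy model the column-first form pays for a thousand decodes to obtain one.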

Physical access (PA):

  • speaker_id: PA_****, a 4-digit speaker ID

  • audio_file_name: name of the audio file

  • audio: a dictionary containing the path to the downloaded audio file, the decoded audio array, and the sampling rate. When the audio column is accessed, e.g. dataset[0]["audio"], the audio file is automatically decoded and resampled to dataset.features["audio"].sampling_rate. Decoding and resampling a large number of audio files can take a significant amount of time, so query the sample index before the "audio" column: dataset[0]["audio"] should always be preferred over dataset["audio"][0].

  • environment_id: a triplet (S, R, D_s); each element takes one letter from the set {a, b, c} as its categorical value, defined as

                                          a         b          c
    S:   Room size (square meters)        2-5       5-10       10-20
    R:   T60 (ms)                         50-200    200-600    600-1000
    D_s: Talker-to-ASV distance (cm)      10-50     50-100     100-150

  • attack_id: a duple (D_a, Q); each element takes one letter from the set {A, B, C} as its categorical value, defined as

                                            A         B         C
    D_a: Attacker-to-talker distance (cm)   10-50     50-100    > 100
    Q:   Replay device quality              perfect   high      low

    For bonafide speech, attack_id is left blank ('-').

  • key: 'bonafide' for genuine speech, or 'spoof' for spoofed speech
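
The two PA lookup tables above can be applied mechanically. The following is a minimal sketch that assumes the IDs are stored as concatenated letter strings (for example "abc" for environment_id and "BC" for attack_id); that storage format and the helper names decode_environment_id and decode_attack_id are assumptions for illustration, not part of the dataset's loading script.

```python
# Each factor: (description, {letter: value range}), in the order the letters appear.
ENVIRONMENT_FACTORS = [
    ("Room size (square meters)", {"a": "2-5", "b": "5-10", "c": "10-20"}),
    ("T60 (ms)", {"a": "50-200", "b": "200-600", "c": "600-1000"}),
    ("Talker-to-ASV distance (cm)", {"a": "10-50", "b": "50-100", "c": "100-150"}),
]
ATTACK_FACTORS = [
    ("Attacker-to-talker distance (cm)", {"A": "10-50", "B": "50-100", "C": "> 100"}),
    ("Replay device quality", {"A": "perfect", "B": "high", "C": "low"}),
]


def _decode(code, factors):
    """Map each letter of `code` to the value range of the matching factor."""
    if len(code) != len(factors):
        raise ValueError(f"expected {len(factors)} letters, got {code!r}")
    decoded = {}
    for letter, (description, levels) in zip(code, factors):
        if letter not in levels:
            raise ValueError(f"unknown level {letter!r} for {description}")
        decoded[description] = levels[letter]
    return decoded


def decode_environment_id(environment_id):
    """Assumed format: three letters from {a, b, c}, e.g. 'abc'."""
    return _decode(environment_id, ENVIRONMENT_FACTORS)


def decode_attack_id(attack_id):
    """Assumed format: two letters from {A, B, C}, or '-' for bonafide speech."""
    if attack_id == "-":
        return None  # bonafide speech has no replay attack
    return _decode(attack_id, ATTACK_FACTORS)


print(decode_environment_id("abc"))
print(decode_attack_id("BC"))
```

Under these assumptions, "abc" describes a 2-5 square meter room with a T60 of 200-600 ms and a talker-to-ASV distance of 100-150 cm, and "BC" describes an attacker 50-100 cm from the talker using a low-quality replay device.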

Data Splits

            Training set   Development set   Evaluation set
Bonafide    2580           2548              7355
Spoof       22800          22296             63882
Total       25380          24844             71237
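
As a quick check on the counts above, the totals and the class imbalance can be recomputed. This is a small sketch; the dictionary keys and the summarize helper are illustrative names, while the numbers are copied from the table.

```python
# Utterance counts per split, copied from the Data Splits table.
SPLITS = {
    "train": {"bonafide": 2580, "spoof": 22800},
    "dev": {"bonafide": 2548, "spoof": 22296},
    "eval": {"bonafide": 7355, "spoof": 63882},
}


def summarize(counts):
    """Return the split total and the number of spoofed utterances per bonafide one."""
    total = counts["bonafide"] + counts["spoof"]
    spoof_per_bonafide = counts["spoof"] / counts["bonafide"]
    return total, spoof_per_bonafide


for name, counts in SPLITS.items():
    total, ratio = summarize(counts)
    print(f"{name}: total={total}, spoof per bonafide={ratio:.2f}")
```

Every split has roughly nine spoofed utterances per bonafide one, which is worth keeping in mind when choosing a loss weighting or an evaluation metric.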

Dataset Creation

Curation Rationale

[Needs More Information]

Source Data

Initial Data Collection and Normalization

[Needs More Information]

Who are the source language producers?

[Needs More Information]

Annotations

Annotation process

[Needs More Information]

Who are the annotators?

[Needs More Information]

Personal and Sensitive Information

[Needs More Information]

Considerations for Using the Data

Social Impact of Dataset

[Needs More Information]

Discussion of Biases

[Needs More Information]

Other Known Limitations

[Needs More Information]

Additional Information

Dataset Curators

[Needs More Information]

Licensing Information

This ASVspoof 2019 dataset is made available under the Open Data Commons Attribution License: http://opendatacommons.org/licenses/by/1.0/

Citation Information

@InProceedings{Todisco2019,
  Title     = {{ASV}spoof 2019: {F}uture {H}orizons in {S}poofed and {F}ake {A}udio {D}etection},
  Author    = {Todisco, Massimiliano and
               Wang, Xin and
               Sahidullah, Md and
               Delgado, H{\'e}ctor and
               Nautsch, Andreas and
               Yamagishi, Junichi and
               Evans, Nicholas and
               Kinnunen, Tomi and
               Lee, Kong Aik},
  booktitle = {Proc. of Interspeech 2019},
  Year      = {2019}
}