Dataset Card for NST Swedish Speech Synthesis (44 kHz)

Dataset Summary

The corpus contains recordings of a single speaker, comprising 5,277 segments.

Supported Tasks and Leaderboards

[Needs More Information]

Languages

The audio is in Swedish.

Dataset Structure

[Needs More Information]

Data Instances

[Needs More Information]

Data Fields

[Needs More Information]

Data Splits

[Needs More Information]

Dataset Creation

Curation Rationale

(The text below is a partially corrected machine translation of the original source.)

The data was developed by Nordisk språkteknologi holding AS (NST), which went bankrupt in 2003.

In 2006, a consortium jointly owned by the University of Oslo, the University of Bergen, the Norwegian University of Science and Technology, the Language Council and IBM bought NST's assets to ensure that the language resources NST had developed would be preserved. In 2009, the Ministry of Culture commissioned the National Library to build a Norwegian language bank, and the work began in 2010.

NST's resources were transferred to the National Library in May 2011. They are now available through the Language Bank, initially without further processing.

Source Data

Initial Data Collection and Normalization

[Needs More Information]

Who are the source language producers?

[Needs More Information]

Annotations

Annotation process

[Needs More Information]

Who are the annotators?

[Needs More Information]

Personal and Sensitive Information

[Needs More Information]

Considerations for Using the Data

Social Impact of Dataset

[Needs More Information]

Discussion of Biases

[Needs More Information]

Other Known Limitations

[Needs More Information]

Additional Information

Dataset Curators

The Norwegian Language Bank

Licensing Information

CC0: Public Domain

Citation Information

[Needs More Information]

Contributions

[Needs More Information]