List splits and configurations

Datasets typically have splits and may also have configurations. A split is a subset of the dataset, such as train or test, used during a particular stage of training or evaluating a model. A configuration is a sub-dataset contained within a larger dataset. Configurations are especially common in multilingual speech datasets, where there may be a different configuration for each language. If you’re interested in learning more about splits and configurations, check out the Load a dataset from the Hub tutorial!

This guide shows you how to use Datasets Server’s /splits endpoint to retrieve a dataset’s splits and configurations programmatically. Feel free to also try it out with Postman, RapidAPI, or ReDoc.

The /splits endpoint takes the dataset name as the value of its dataset query parameter:

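import os
import requests

# Optional access token read from an environment variable (the HF_TOKEN name is an assumption)
API_TOKEN = os.environ.get("HF_TOKEN")
headers = {"Authorization": f"Bearer {API_TOKEN}"} if API_TOKEN else {}
API_URL = "https://datasets-server.huggingface.co/splits"

def query(dataset):
    # Send the dataset name as the "dataset" query parameter and fail loudly on HTTP errors
    response = requests.get(API_URL, headers=headers, params={"dataset": dataset})
    response.raise_for_status()
    return response.json()

data = query("duorc")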
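If you assemble the request URL by hand instead of letting the HTTP client encode it, the dataset name should be percent-encoded, because namespaced names contain a "/". The sketch below uses only the standard library's urllib.parse.urlencode; the build_splits_url helper and the "user/my-dataset" name are hypothetical and serve only as illustrations.

```python
from urllib.parse import urlencode

# Base URL of the /splits endpoint used in the example above
SPLITS_ENDPOINT = "https://datasets-server.huggingface.co/splits"


def build_splits_url(dataset: str) -> str:
    """Hypothetical helper: return the /splits URL for `dataset`."""
    # urlencode percent-encodes reserved characters such as "/" in the value
    return f"{SPLITS_ENDPOINT}?{urlencode({'dataset': dataset})}"


print(build_splits_url("duorc"))
print(build_splits_url("user/my-dataset"))  # hypothetical namespaced dataset name
```

Passing the name through params in requests produces the same kind of encoding, so this is only needed when you build the URL string yourself.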

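The endpoint responds with a JSON object containing a list of the dataset’s splits and configurations. For example, the duorc dataset has two configurations, ParaphraseRC and SelfRC, each with three splits (train, validation, and test), giving six entries in total: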

{
  "splits": [
    { "dataset": "duorc", "config": "ParaphraseRC", "split": "train" },
    { "dataset": "duorc", "config": "ParaphraseRC", "split": "validation" },
    { "dataset": "duorc", "config": "ParaphraseRC", "split": "test" },
    { "dataset": "duorc", "config": "SelfRC", "split": "train" },
    { "dataset": "duorc", "config": "SelfRC", "split": "validation" },
    { "dataset": "duorc", "config": "SelfRC", "split": "test" }
  ],
  "pending": [],
  "failed": []
}
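Each entry in "splits" pairs one configuration with one split, so a common next step is to group the entries by configuration. The sketch below does this on the sample duorc response above, pasted in as a Python dict; with a live request you would pass the parsed JSON instead. The splits_by_config helper is hypothetical, and it reads only the "config" and "split" fields shown in the example.

```python
from collections import defaultdict

# The sample /splits response for duorc shown above
response = {
    "splits": [
        {"dataset": "duorc", "config": "ParaphraseRC", "split": "train"},
        {"dataset": "duorc", "config": "ParaphraseRC", "split": "validation"},
        {"dataset": "duorc", "config": "ParaphraseRC", "split": "test"},
        {"dataset": "duorc", "config": "SelfRC", "split": "train"},
        {"dataset": "duorc", "config": "SelfRC", "split": "validation"},
        {"dataset": "duorc", "config": "SelfRC", "split": "test"},
    ],
    "pending": [],
    "failed": [],
}


def splits_by_config(payload: dict) -> dict:
    """Hypothetical helper: map each configuration name to its list of splits."""
    grouped = defaultdict(list)
    for entry in payload["splits"]:
        # Keep splits in the order the endpoint listed them
        grouped[entry["config"]].append(entry["split"])
    return dict(grouped)


print(splits_by_config(response))
```

For this response the result maps both ParaphraseRC and SelfRC to ["train", "validation", "test"].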