Dataset Card for CLIP-Kinetics700

Dataset Description

Dataset Summary

CLIP-Kinetics700 is a compressed version of the Kinetics700 dataset using OpenAI's CLIP model.

The original dataset is ~700 GB, which makes it difficult to use and to hold in memory on one machine. By downsampling each video to 1 FPS and encoding the frames with CLIP, we were able to compress the dataset to ~8 GB, making it memory-friendly and easy to use.
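As a rough illustration of the 1 FPS downsampling step, the sketch below picks one frame index per whole second of a clip. It is only an assumption about how such sampling could be done, not the logic used by clip-video-encode; the function name `one_fps_indices` and its parameters are hypothetical.

```python
def one_fps_indices(n_frames: int, native_fps: float) -> list[int]:
    """Return the index of the frame nearest to each whole second of a clip."""
    if n_frames <= 0 or native_fps <= 0:
        return []
    duration_s = n_frames / native_fps
    indices = []
    second = 0
    while second < duration_s:
        # Clamp so rounding never points past the last frame.
        idx = min(int(round(second * native_fps)), n_frames - 1)
        indices.append(idx)
        second += 1
    return indices


# A 10-second clip at 25 FPS keeps 10 of its 250 frames.
demo_indices = one_fps_indices(250, 25.0)
print(len(demo_indices), demo_indices[:3])
```

Each kept frame would then be passed through the CLIP image encoder, so a video is stored as a small array of embeddings instead of raw pixels.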

Dataset Preprocessing

clip-video-encode is a tool you can use to easily and efficiently compute CLIP embeddings from video frames. We used it to generate the embeddings for this dataset.

Dataset Structure

Data Format

We formatted this as a WebDataset for better data-loading performance when training models. Each split consists of a list of tar files, each with 10000 data samples. This format can be read easily with the EmbeddingWebDatasetReader from clip-video-encode.

CLIP-Kinetics700
 ├── splits.csv
 ├── ds_00000.tar
 |     ├── vid_00000.npy
 |     ├── vid_00000.txt
 |     ├── vid_00000.json
 |     ├── vid_00001.npy
 |     ├── vid_00001.txt
 |     ├── vid_00001.json
 |     └── ...
 |     ├── vid_10000.npy
 |     ├── vid_10000.txt
 |     ├── vid_10000.json
 ├── ds_00001.tar
 |     ├── vid_10001.npy
 |     ├── vid_10001.txt
 |     ├── vid_10001.json
 |     ...
 ...

Data Fields

  • vid.npy: the numpy array with the per-frame embeddings. Shape -> (n_frames, 512)
  • vid.cap: the "caption" of the video. In this case it is the Kinetics700 label.
  • vid.json: additional metadata - YouTube video ID, start time, end time.
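If you prefer not to depend on the EmbeddingWebDatasetReader, a shard can also be inspected with the standard library and NumPy. The sketch below is a minimal example under stated assumptions: it groups tar members by sample key and decodes each file by extension, accepting both `.txt` and `.cap` for the caption. The helper names (`read_shard`, `make_demo_shard`) and the metadata key in the demo are hypothetical, and the demo shard is synthetic rather than real data.

```python
import io
import json
import tarfile
from collections import defaultdict

import numpy as np

CAPTION_EXTS = ("txt", "cap")  # assumption: accept either caption extension


def read_shard(fileobj):
    """Group tar members by sample key and decode each file by its extension."""
    samples = defaultdict(dict)
    with tarfile.open(fileobj=fileobj, mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, _, ext = member.name.rpartition(".")
            data = tar.extractfile(member).read()
            if ext == "npy":
                samples[key]["embeddings"] = np.load(io.BytesIO(data))
            elif ext in CAPTION_EXTS:
                samples[key]["caption"] = data.decode("utf-8").strip()
            elif ext == "json":
                samples[key]["meta"] = json.loads(data)
    return dict(samples)


def make_demo_shard():
    """Build a tiny in-memory shard with one synthetic sample (not real data)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        def add(name, payload):
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))

        npy = io.BytesIO()
        np.save(npy, np.zeros((10, 512), dtype=np.float32))
        add("vid_00000.npy", npy.getvalue())
        add("vid_00000.txt", b"synthetic label")
        add("vid_00000.json", json.dumps({"hypothetical_key": 1}).encode())
    buf.seek(0)
    return buf


demo_samples = read_shard(make_demo_shard())
print(demo_samples["vid_00000"]["embeddings"].shape)
```

For a real shard, pass an opened file instead, for example `read_shard(open("ds_00000.tar", "rb"))`.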

Data Splits

  • Train - 536489 samples | 54 tar files
  • Validation - 33966 samples | 4 tar files
  • Test - 64532 samples | 7 tar files
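The tar counts follow from the 10000-samples-per-tar packing described under Data Format. A quick check (the variable names are illustrative only):

```python
import math

SAMPLES_PER_TAR = 10_000  # from the Data Format section
split_sizes = {"train": 536_489, "validation": 33_966, "test": 64_532}

# Each split needs enough tar files to hold all of its samples.
tar_counts = {name: math.ceil(n / SAMPLES_PER_TAR) for name, n in split_sizes.items()}
print(tar_counts)  # {'train': 54, 'validation': 4, 'test': 7}
```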

Dataset Creation

Source Data

Data was sourced from DeepMind's Kinetics700 dataset and downloaded using this convenient repository.

Simple Experiments

Using this repository we evaluate CLIP-Kinetics700 with the following simple methods:

Zero-shot Evaluation

Metric              Accuracy
Top-1               0.31
Top-5               0.56
mean(Top1, Top5)    0.44
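For readers unfamiliar with zero-shot evaluation on precomputed embeddings, here is a minimal sketch of one common approach: mean-pool the per-frame embeddings, then rank class labels by cosine similarity to CLIP text embeddings of the label names. This is an assumption about the general technique, not a description of the evaluation repository's code. The function names are hypothetical and the data is random, standing in for real embeddings.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def zero_shot_topk(frame_embeddings, label_embeddings, k=5):
    """Return the indices of the k labels most similar to the mean-pooled video."""
    video = l2_normalize(frame_embeddings.mean(axis=0))
    labels = l2_normalize(label_embeddings)
    scores = labels @ video
    return np.argsort(-scores)[:k]


def topk_accuracy(predictions, targets, k):
    """Fraction of samples whose target appears among the first k predictions."""
    hits = [t in p[:k] for p, t in zip(predictions, targets)]
    return float(np.mean(hits))


# Synthetic stand-ins: 700 "label" embeddings and one video near label 42.
rng = np.random.default_rng(0)
label_emb = rng.normal(size=(700, 512))
frames = label_emb[42] + 0.1 * rng.normal(size=(10, 512))

top5 = zero_shot_topk(frames, label_emb, k=5)
print(top5[0], topk_accuracy([top5], [42], k=1))
```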

Linear-probe Evaluation

Metric              Accuracy
Top-1               0.41
Top-5               0.65
mean(Top1, Top5)    0.53
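A linear probe trains a single linear classifier on top of frozen embeddings. The sketch below shows the idea with a plain NumPy softmax regression on mean-pooled, L2-normalized features. It is a simplified assumption, not the evaluation repository's code: the hyperparameters and function names are hypothetical, and the data is synthetic.

```python
import numpy as np


def train_linear_probe(features, labels, n_classes, lr=0.5, epochs=100):
    """Fit softmax regression with full-batch gradient descent."""
    n, d = features.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(features, W, b):
    """Return the highest-scoring class for each row of features."""
    return (features @ W + b).argmax(axis=1)


def make_split(rng, centers, n):
    """Draw n noisy, unit-length samples around randomly chosen class centers."""
    y = rng.integers(0, len(centers), size=n)
    x = centers[y] + 0.1 * rng.normal(size=(n, centers.shape[1]))
    return x / np.linalg.norm(x, axis=1, keepdims=True), y


# Synthetic stand-in for pooled video embeddings of 3 classes.
rng = np.random.default_rng(0)
centers = rng.normal(size=(3, 512))
x_train, y_train = make_split(rng, centers, 300)
x_test, y_test = make_split(rng, centers, 100)

W, b = train_linear_probe(x_train, y_train, n_classes=3)
probe_accuracy = float((predict(x_test, W, b) == y_test).mean())
print(probe_accuracy)
```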