Columns: image (string), width (int64), height (int64), left (int64), top (int64), right (int64), bottom (int64), question (string)
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000013.jpg"
640
427
129
192
155
212
"What does it use to breath?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000019.jpg"
640
427
424
28
427
32
"What can fly in the sky?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000030.jpg"
640
428
242
149
406
351
"Where can i put my flowers"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000070.jpg"
640
480
52
372
636
443
"What equipment is used for snowboarding?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000073.jpg"
565
640
316
323
452
419
"Where can I put my feet?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000092.jpg"
640
427
417
120
583
426
"what do you use to eat a cake?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000097.jpg"
500
337
294
126
329
153
"Where a seller can show his products prices?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000106.jpg"
640
426
11
56
301
134
"What do you call the object where we put water for animals to drink on?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000128.jpg"
490
640
250
526
316
589
"What food is elephants favorite?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000141.jpg"
640
399
3
33
110
216
"Where do we control the water from"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000159.jpg"
640
318
372
225
414
272
"What can we use to promote a store?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000233.jpg"
640
640
544
387
590
590
"Which is the best utensil for eating?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000257.jpg"
640
480
101
415
176
479
"what is used for carrying a baby?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000267.jpg"
640
504
370
199
384
213
"Who flies an aeroplane"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000294.jpg"
640
427
506
311
639
425
"Where can I warm my food."
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000308.jpg"
640
426
276
94
322
162
"Where does the wine options posted?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000309.jpg"
600
600
420
71
509
194
"Who likes to play with toys?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000335.jpg"
640
480
270
16
345
86
"What do we use to watch movies"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000346.jpg"
640
480
58
157
172
272
"What is the flavor enhancer that comes with food?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000360.jpg"
500
375
240
139
312
196
"What sport requires a skating board?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000370.jpg"
480
640
233
530
260
553
"What can we wear on neck?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000389.jpg"
640
480
387
267
460
479
"What can I use to carry my stuff?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000398.jpg"
640
480
245
76
272
131
"Where can you touch to open something?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000410.jpg"
640
480
38
103
303
362
"What can cut paper easily?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000412.jpg"
640
426
471
162
638
423
"What living creature can speak?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000421.jpg"
640
233
310
143
354
183
"Where do people go to have leisure time?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000436.jpg"
427
640
221
237
321
310
"WHAT DO WE EAT FOR DESSERT?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000450.jpg"
640
480
559
0
639
195
"What do we drink?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000459.jpg"
516
640
372
245
475
316
"What can be used in taking pictures?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000473.jpg"
500
375
112
340
132
365
"What is used to move the bicycle?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000491.jpg"
500
313
0
273
35
308
"What is used to hang small items?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000492.jpg"
640
383
166
248
281
353
"What do parents love and care for"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000508.jpg"
640
480
477
252
507
263
"What can be used to warn the drivers when driving on the road?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000520.jpg"
640
480
345
299
355
319
"What anchored floating item serves as a navigation mark for sailors?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000527.jpg"
640
360
229
126
299
172
"Where do animals at the zoo find water to drink?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000540.jpg"
640
425
157
187
449
270
"What can we take to travel faster?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000552.jpg"
640
427
384
78
492
158
"What is used to play tennis?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000617.jpg"
425
640
182
79
196
93
"What do we hit with tennis racket?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000629.jpg"
640
427
323
248
562
410
"What can we ride on?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000636.jpg"
480
640
138
376
291
552
"What do you sit on during natures call?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000645.jpg"
480
640
251
376
467
507
"What do we use for slicing cake?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000659.jpg"
640
425
346
299
365
397
"What can I use to join two parts of a train?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000664.jpg"
640
480
497
102
631
236
"Where do you watch?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000693.jpg"
640
428
334
39
506
353
"What is synonym of a female child ?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000694.jpg"
640
480
339
379
376
388
"What is used to identify a vehicle?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000713.jpg"
640
425
380
244
401
274
"Where do we heat water?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000736.jpg"
428
640
23
332
137
491
"What is used for washing clothes"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000737.jpg"
426
640
163
381
260
622
"What indicates that I should stop my vehicle?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000761.jpg"
640
480
69
28
245
187
"Where can we sleep?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000790.jpg"
640
422
361
61
385
86
"What is used to measure and indicate time?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000841.jpg"
640
480
288
257
348
384
"What do we use for cleaning our teeth?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000851.jpg"
500
375
78
54
113
104
"Where do we plug our electronics?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000853.jpg"
640
480
351
57
639
161
"What could be used to place filling onto bread?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000855.jpg"
640
480
106
205
115
245
"What can i use to take pictures?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000872.jpg"
621
640
367
158
431
201
"What glove does a person use to take a baseball ball?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000882.jpg"
640
427
227
130
470
425
"Who is man's best friend"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000885.jpg"
640
427
385
267
479
311
"What do we use to hit the ball in tennis?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000888.jpg"
640
427
556
271
573
281
"Where do you put dog food?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000908.jpg"
612
612
406
412
484
466
"Where do we warm food?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000913.jpg"
640
512
413
138
422
158
"What lights up the runway?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000943.jpg"
640
360
487
48
591
236
"Where did the water come from?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000966.jpg"
640
427
554
253
600
412
"Who rides the bus?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000000970.jpg"
640
427
23
120
353
363
"what do racers use when racing in a circuit?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001000.jpg"
640
480
282
164
326
223
"What do we win when we compete?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001005.jpg"
640
427
466
75
532
292
"What is a monkey's favorite food?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001023.jpg"
640
480
0
287
24
379
"What is used to drink?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001053.jpg"
640
419
517
244
639
418
"What makes the car to move?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001093.jpg"
500
347
409
172
432
206
"Which is different from the group?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001109.jpg"
640
427
0
248
46
392
"What thing is used to maintain electricity"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001127.jpg"
500
375
316
38
365
139
"What tells me to stop."
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001151.jpg"
640
480
184
285
219
369
"What object is used to contain liquids?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001175.jpg"
640
334
122
203
207
280
"What can we use to watch news or movies?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001180.jpg"
426
640
161
402
407
491
"What do you eat at the birthday party?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001190.jpg"
640
479
317
321
497
464
"What do we usually cycle for personal use?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001200.jpg"
480
640
83
74
149
110
"Which device slows our fall from the sky?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001205.jpg"
640
480
100
242
322
364
"Where can I sleep tonight?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001206.jpg"
390
500
170
76
210
107
"What does a bird use to pick grains off the ground?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001233.jpg"
437
360
146
63
190
89
"What do we wear in our head to protect it from injury?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001239.jpg"
500
333
276
196
370
230
"What do we use for camping?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001244.jpg"
640
479
425
289
522
322
"What do people use to cover themselves from the sun?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001247.jpg"
640
427
266
248
335
304
"Where do we sit to do our business?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001268.jpg"
640
427
189
225
264
261
"What type of bird lives by the water?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001279.jpg"
500
350
306
3
329
46
"What is used to release smoke in house?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001289.jpg"
640
480
342
252
384
285
"What can be used in the kitchen as a fruit container?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001295.jpg"
480
640
214
256
292
330
"What is used to tell time?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001296.jpg"
427
640
300
143
371
278
"What do people use to call?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001316.jpg"
640
360
215
29
222
35
"What is the object we hit when playing tennis?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001323.jpg"
640
480
266
124
420
335
"Which animal has seven lives?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001324.jpg"
640
427
244
154
266
178
"what is the item we are playing with by throwing it at each other?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001329.jpg"
640
480
313
236
343
277
"What do you use to play tennis?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001330.jpg"
480
640
287
314
326
339
"What is the circular object we can play with while throwing?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001332.jpg"
640
419
254
136
483
309
"What likes to run on ranches?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001348.jpg"
480
640
420
269
467
315
"Where can you find shelter?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001359.jpg"
427
640
192
259
218
312
"What is he flying?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001398.jpg"
407
482
70
56
260
366
"What meows?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001404.jpg"
478
640
113
180
441
525
"What do kids play with ?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001409.jpg"
640
480
248
205
320
271
"What flies in the air?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001426.jpg"
384
640
106
41
268
147
"What is she wearing on her head?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001464.jpg"
640
427
343
100
507
257
"Who skates in the snow?"
"https://toloka-cdn.azureedge.net/wsdmcup2023/000000001507.jpg"
612
612
1
317
188
555
"What do we use for eating?"

Dataset Card for WSDMCup2023

Question | Image and Answer
What do you use to hit the ball? | [image]
What do people use for cutting? | [image]
What do we use to support the immune system and get vitamin C? | [image]

Dataset Summary

The WSDMCup2023 Dataset consists of images associated with textual questions. One entry (instance) in our dataset is a question-image pair labeled with the ground truth coordinates of a bounding box containing the visual answer to the given question. The images were obtained from a CC BY-licensed subset of the Microsoft Common Objects in Context dataset, MS COCO. All data labeling was performed on the Toloka crowdsourcing platform.

Our dataset has 45,199 instances split among three subsets: train (38,990 instances), public test (1,705 instances), and private test (4,504 instances). The entire train subset has been available to everyone since the start of the challenge. The public test subset was made available at the evaluation phase of the competition, but without ground truth labels. After the competition ended, both the public and private test sets were released.
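The split sizes quoted above can be checked with a few lines of Python. This is only a sanity check of the arithmetic; the name `SPLIT_SIZES` is introduced here for illustration and is not part of the dataset.

```python
# Split sizes as stated in the dataset summary.
SPLIT_SIZES = {"train": 38_990, "test_public": 1_705, "test_private": 4_504}

# Total number of instances and the share of each split.
total = sum(SPLIT_SIZES.values())
shares = {name: size / total for name, size in SPLIT_SIZES.items()}

print(total)
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```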

Dataset Citation

Please cite the challenge results or dataset description as follows.

  • Ustalov D., Pavlichenko N., Likhobaba D., and Smirnova A. WSDM Cup 2023 Challenge on Visual Question Answering. Proceedings of the 4th Crowd Science Workshop on Collaboration of Humans and Learning Algorithms for Data Labeling. Singapore, 2023, pp. 1–7.
@inproceedings{TolokaWSDMCup2023,
  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Likhobaba, Daniil and Smirnova, Alisa},
  title     = {{WSDM~Cup 2023 Challenge on Visual Question Answering}},
  year      = {2023},
  booktitle = {Proceedings of the 4th Crowd Science Workshop on Collaboration of Humans and Learning Algorithms for Data Labeling},
  pages     = {1--7},
  address   = {Singapore},
  issn      = {1613-0073},
  url       = {http://ceur-ws.org/Vol-3357/invited1.pdf},
  language  = {english},
}

Supported Tasks and Leaderboards

Visual question answering: given an image and a question, predict the bounding box that contains the visual answer.
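This card does not state how predictions are scored. A common choice for bounding box tasks is intersection over union (IoU), so the sketch below assumes that metric. The function name `iou` and the predicted box are hypothetical; only the ground truth box comes from the dataset.

```python
def iou(box_a, box_b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    # Corners of the overlapping rectangle, if any.
    left = max(box_a[0], box_b[0])
    top = max(box_a[1], box_b[1])
    right = min(box_a[2], box_b[2])
    bottom = min(box_a[3], box_b[3])

    # Clamp to zero so that disjoint boxes give an empty intersection.
    intersection = max(0, right - left) * max(0, bottom - top)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


truth = (129, 192, 155, 212)       # ground truth box of the first instance
prediction = (130, 190, 160, 210)  # hypothetical model output
print(round(iou(truth, prediction), 3))
```

A score of 1.0 means the predicted box matches the ground truth exactly, and 0.0 means the boxes do not overlap.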

Language

English

Dataset Structure

Data Instances

A data instance contains the URL of the image, the image size (width and height), the ground truth bounding box given by its top-left and bottom-right corners, and the question related to the image.

{'image': 'https://toloka-cdn.azureedge.net/wsdmcup2023/000000000013.jpg',
'width': 640,
'height': 427,
'left': 129,
'top': 192,
'right': 155,
'bottom': 212,
'question': 'What does it use to breath?'}
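As a minimal sketch, the instance above can be checked for consistency before use. The helper `box_is_valid` is hypothetical, and the check assumes that `right` and `bottom` may equal the image width and height at most, which this card does not state explicitly.

```python
instance = {
    "image": "https://toloka-cdn.azureedge.net/wsdmcup2023/000000000013.jpg",
    "width": 640,
    "height": 427,
    "left": 129,
    "top": 192,
    "right": 155,
    "bottom": 212,
    "question": "What does it use to breath?",
}


def box_is_valid(row):
    """Return True if the box has positive size and lies inside the image."""
    return (
        0 <= row["left"] < row["right"] <= row["width"]
        and 0 <= row["top"] < row["bottom"] <= row["height"]
    )


# Size of the ground truth box in pixels.
box_width = instance["right"] - instance["left"]
box_height = instance["bottom"] - instance["top"]
print(box_is_valid(instance), box_width, box_height)
```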

Data Fields

  • image: the URL of the image
  • width: the image width in pixels
  • height: the image height in pixels
  • left: the x coordinate, in pixels, of the top-left corner of the bounding box
  • top: the y coordinate, in pixels, of the top-left corner of the bounding box
  • right: the x coordinate, in pixels, of the bottom-right corner of the bounding box
  • bottom: the y coordinate, in pixels, of the bottom-right corner of the bounding box
  • question: a question related to the image
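Many detection tools expect boxes in a different layout than the corner coordinates listed above. The sketch below converts a row to the [x, y, width, height] layout used by COCO annotations and to coordinates normalized by the image size. The function names `to_xywh` and `to_normalized` are illustrative assumptions, not part of the dataset.

```python
def to_xywh(row):
    """Convert corner coordinates to (x, y, width, height)."""
    return (
        row["left"],
        row["top"],
        row["right"] - row["left"],
        row["bottom"] - row["top"],
    )


def to_normalized(row):
    """Scale corner coordinates to the [0, 1] range using the image size."""
    return (
        row["left"] / row["width"],
        row["top"] / row["height"],
        row["right"] / row["width"],
        row["bottom"] / row["height"],
    )


# Field values taken from the 000000000030.jpg row of the dataset preview.
row = {"width": 640, "height": 428, "left": 242, "top": 149, "right": 406, "bottom": 351}
print(to_xywh(row))
print(to_normalized(row))
```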

Data Splits

There are four splits in the data: train, train_sample, test_public, and test_private. The 'train' split contains the full pool of data for model training. The 'train_sample' split contains a subset of the 'train' split. The 'test_public' split contains public data for testing the model. The 'test_private' split contains private data for the final model test.

Source Data

The images were obtained from a CC BY-licensed subset of the Microsoft Common Objects in Context dataset, MS COCO.

Annotations

All data labeling was performed on the Toloka crowdsourcing platform.

Only annotators who self-reported knowledge of English had access to the annotation task.
