"COCO_train2014_000000000009.jpg"
"plate rack broccoli"
"colorful dishes holding meat vegetables fruit and bread."
"COCO_train2014_000000000025.jpg"
"gazelle cheetah"
"two giraffes standing in a tree filled area."
"COCO_train2014_000000000030.jpg"
"vase flower"
"a white vase filled with different colored flowers."
"COCO_train2014_000000000034.jpg"
"zebra plant"
"a lone zebra grazing in some green grass."
"COCO_train2014_000000000036.jpg"
"umbrella dress human face"
"a woman with an umbrella near the sea"
"COCO_train2014_000000000049.jpg"
"pickelhaube cock horse"
"a pair of horses performing tricks in a field."
"COCO_train2014_000000000061.jpg"
"siamang gorilla"
"there are people riding elephants in the middle of a forest"
"COCO_train2014_000000000064.jpg"
"parking meter analog clock car tree"
"a clock is seen in front of a tall tree."
"COCO_train2014_000000000071.jpg"
"electric locomotive freight car"
"a train coming down the tracks in the city."
"COCO_train2014_000000000072.jpg"
"upright whippet"
"two giraffes hang out near trees and nuzzle up to each other."
"COCO_train2014_000000000077.jpg"
"corkscrew chain saw person footwear man"
"group of boys performing skateboard trick on ramp with graffiti on it"
"COCO_train2014_000000000078.jpg"
"table lamp analog clock"
"an antique looking clock beside an owl with a candle in it."
"COCO_train2014_000000000081.jpg"
"airliner"
"a big plane with airfrance on the side of it."
"COCO_train2014_000000000086.jpg"
"bicycle built for two tricycle"
"a person looks down at something while sitting on a bike."
"COCO_train2014_000000000089.jpg"
"chiffonier microwave building countertop kitchen appliance"
"smooth top stove with exhaust fan that has light turned on."
"COCO_train2014_000000000092.jpg"
"chocolate sauce"
"a chocolate desert on a plate with a fork."
"COCO_train2014_000000000094.jpg"
"street sign"
"this is a view of a quaint city street."
"COCO_train2014_000000000109.jpg"
"paddlewheel lakeside vehicle boat"
"a waterfront walkway and garden area next to a river."
"COCO_train2014_000000000110.jpg"
"pizza"
"a woman cutting a pizza with a fork and knife."
"COCO_train2014_000000000113.jpg"
"ballplayer jigsaw puzzle"
"two people standing near a table with a cake"
"COCO_train2014_000000000127.jpg"
"boathouse espresso"
"a slice of cake and mug sitting on a wooden table outside."
"COCO_train2014_000000000138.jpg"
"chiffonier switch"
"a series of photographs of a tiny model kitchen"
"COCO_train2014_000000000142.jpg"
"skunk bagel dessert food drink"
"a close up of a plate of a jelly and banana sandwich"
"COCO_train2014_000000000144.jpg"
"upright fountain"
"two giraffes eating together out of a trough in a fenced area."
"COCO_train2014_000000000149.jpg"
"kite park bench"
"a host of kites being flown on a field by people."
"COCO_train2014_000000000151.jpg"
"freight car park bench"
"a man leans out of a vehicle near a short stop sign in a forest."
"COCO_train2014_000000000154.jpg"
"zebra"
"three zebras standing on a grassy dirt field."
"COCO_train2014_000000000165.jpg"
"military uniform vestment"
"a woman soldier holding up a pair of giant scissors to someones tie."
"COCO_train2014_000000000194.jpg"
"pizza"
"a small personal sized pizza is shown on display."
"COCO_train2014_000000000201.jpg"
"ski"
"a row of snow boards sticking out of the snow."
"COCO_train2014_000000000247.jpg"
"plane wing"
"a red white and blue helicopter has a tag on one blade."
"COCO_train2014_000000000250.jpg"
"street sign"
"street signs near tall buildings on the corner of greenwich st and vesey st."
"COCO_train2014_000000000260.jpg"
"pay phone turnstile"
"the woman is standing with her luggage"
"COCO_train2014_000000000263.jpg"
"african elephant indian"
"a couple of elephants playing with each other in their pen."
"COCO_train2014_000000000307.jpg"
"shetland sheepdog"
"a cute dog is in the grass with a frisbee."
"COCO_train2014_000000000308.jpg"
"barrel caldron"
"a man pouring wine from casks for patrons"
"COCO_train2014_000000000309.jpg"
"cairn teddy"
"a small brown stuffed bear laying in the grass."
"COCO_train2014_000000000312.jpg"
"african elephant"
"a herd of elephants including a tiny baby elephant cross a dirt road."
"COCO_train2014_000000000315.jpg"
"sunscreen seashore"
"women are walking towards seats on the beach."
"COCO_train2014_000000000321.jpg"
"hotdog"
"a sub sandwich in a box next to two hot dogs."
"COCO_train2014_000000000322.jpg"
"projectile balance beam baseball bat person man"
"a person is throwing a frisbee at night."
"COCO_train2014_000000000326.jpg"
"red wine bubble"
"a male in a white shirt is holding a glass of wine"
"COCO_train2014_000000000332.jpg"
"wok king crab food fast"
"asian noodles cut up egg roll with greens and carrots in a white bowl."
"COCO_train2014_000000000349.jpg"
"electric locomotive"
"a white and orange train passing by trees and a lamppost."
"COCO_train2014_000000000368.jpg"
"soccer ball"
"a young man kicking a soccer ball around a field."
"COCO_train2014_000000000370.jpg"
"broccoli"
"a little girl holds a piece of broccoli towards the camera."
"COCO_train2014_000000000382.jpg"
"ski alp person"
"a distance shot shows rolling snowy hills partially shadowed with ski prints reaching off into the distance and to the foreground a single skier."
"COCO_train2014_000000000384.jpg"
"window shade"
"this is a living room with white curtains."
"COCO_train2014_000000000389.jpg"
"bolo tie vestment man human face"
"the man smiles wearing a green neck tie near a crowd of people."
"COCO_train2014_000000000394.jpg"
"sealyham terrier toy poodle"
"a white dog has a purple frisbee in its mouth."
"COCO_train2014_000000000404.jpg"
"catamaran dock"
"three boats are docked together on the cloudy day."
"COCO_train2014_000000000419.jpg"
"racket"
"a tennis player taking a swing at a ball"
"COCO_train2014_000000000431.jpg"
"racket"
"a male tennis player hits the ball on a grass court."
"COCO_train2014_000000000436.jpg"
"harmonica bagel"
"a man licking his thumb with a peanut buttered bread slice in his hand."
"COCO_train2014_000000000438.jpg"
"bakery pretzel dessert snack"
"a box full of donuts in different shapes and flavors."
"COCO_train2014_000000000443.jpg"
"mitten cardigan"
"a person sitting on a couch with a cat laying on its back."
"COCO_train2014_000000000446.jpg"
"bassoon joystick"
"a woman playing a video game indoors."
"COCO_train2014_000000000450.jpg"
"pizza"
"a pizza sitting on a white plate next to a glass of water."
"COCO_train2014_000000000471.jpg"
"school bus wheel"
"a yellow bus without passengers standing on the road."
"COCO_train2014_000000000490.jpg"
"basset english foxhound person footwear building"
"dog and skateboard in a room and shadow of dog and person on a wall."
"COCO_train2014_000000000491.jpg"
"eskimo dog wig"
"toy pomeranian dog on bag for playing with and enjoyment."
"COCO_train2014_000000000508.jpg"
"beacon alp"
"a field covered in snow on top of a hillside."
"COCO_train2014_000000000510.jpg"
"fountain park bench"
"an old man sits on a park bench beside a fountain"
"COCO_train2014_000000000514.jpg"
"theater curtain four poster"
"a very big and fancy looking room with a big pretty bed."
"COCO_train2014_000000000529.jpg"
"bicycle built for two mountain bike motorcycle human face"
"a motorcyclist and a woman posing on a bike for a photo."
"COCO_train2014_000000000531.jpg"
"tennis ball maze"
"two tennis courts both have doubles games playing on them."
"COCO_train2014_000000000532.jpg"
"school bus fire engine"
"a man and two women disembark from two colorful busses."
"COCO_train2014_000000000540.jpg"
"airliner"
"very large jal jet taxiing on the runway at airport in urban environment"
"COCO_train2014_000000000542.jpg"
"plane schooner helicopter wheel person"
"large group of people standing next to very old style air plane"
"COCO_train2014_000000000560.jpg"
"bathtub tub"
"this is a bathroom with a jacuzzi shower sink and toilet."
"COCO_train2014_000000000562.jpg"
"pencil sharpener soup bowl"
"three colored toothbrushes standing in a glass holder."
"COCO_train2014_000000000572.jpg"
"four poster crate"
"a person standing and a person looking down"
"COCO_train2014_000000000575.jpg"
"kit fox wallaby"
"a cat playing with a shoe in a grassy field."
"COCO_train2014_000000000581.jpg"
"ibizan hound wallaby"
"there is a white and beige dog sitting on the floor"
"COCO_train2014_000000000584.jpg"
"acorn squash plate"
"a picture of a bowl of food that is mostly vegetables."
"COCO_train2014_000000000595.jpg"
"rain barrel ashcan"
"an old tv monitor sits in the middle of a stream."
"COCO_train2014_000000000597.jpg"
"african elephant"
"elephant herd in a line walking across savanna."
"COCO_train2014_000000000605.jpg"
"partridge tray"
"a cup of liquid with a fancy design on top of it."
"COCO_train2014_000000000612.jpg"
"theater curtain teddy"
"a group of teddy bears laying on top of a red bed."
"COCO_train2014_000000000620.jpg"
"rotisserie safe"
"large cheese pizzas stacked in a warming oven."
"COCO_train2014_000000000625.jpg"
"volleyball rugby ball"
"3 people attempt to catch a frisbee in midair."
"COCO_train2014_000000000629.jpg"
"moped motor scooter"
"an old motorcycle rests on its kickstand by a door in front of a wall with a mural of a tree."
"COCO_train2014_000000000634.jpg"
"corkscrew aircraft carrier"
"a man riding a skateboard on a ramp."
"COCO_train2014_000000000643.jpg"
"desktop computer"
"a computer desk with a desktop computer and a stuffed turtle on top."
"COCO_train2014_000000000650.jpg"
"madagascar cat tiger"
"cat sitting on top of car with brush behind."
"COCO_train2014_000000000656.jpg"
"racer crash helmet"
"a motorcycle parked in a parking space in front of a store."
"COCO_train2014_000000000659.jpg"
"electric locomotive school bus"
"a train engine carrying carts parked at a station."
"COCO_train2014_000000000670.jpg"
"grocery store carton"
"a bunch of boxes full of assorted fruit on display."
"COCO_train2014_000000000671.jpg"
"leonberg fountain camel tree"
"a statue of a soldier sitting atop a horse"
"COCO_train2014_000000000673.jpg"
"paddle boat vehicle"
"eleven surfboards face down on an empty beach."
"COCO_train2014_000000000681.jpg"
"trolleybus"
"passenger bus parked on street with sunset in background"
"COCO_train2014_000000000684.jpg"
"car mirror hourglass"
"a three piece circular mirror with a girl taking a picture of it in the reflection"
"COCO_train2014_000000000690.jpg"
"corkscrew bicycle built for two tree land vehicle"
"a person riding a motorcycle on a road with trees"
"COCO_train2014_000000000706.jpg"
"upright cheetah giraffe tree"
"a giraffe standing in a stand of trees."
"COCO_train2014_000000000714.jpg"
"washbasin"
"two different bathrooms one has a green toilet"
"COCO_train2014_000000000716.jpg"
"joystick laptop"
"the extra laptop is on standby for the computer users."
"COCO_train2014_000000000722.jpg"
"kite parachute person land vehicle"
"clowns are on hand to assist fliers with their kites."
"COCO_train2014_000000000723.jpg"
"restaurant"
"a large motorcycle sits outside of a bar."
"COCO_train2014_000000000731.jpg"
"poncho umbrella"
"a woman holding her umbrella is standing in a quite large puddle."
"COCO_train2014_000000000735.jpg"
"confectionery altar"
"a women cutting a large cake with one lit candle."
"COCO_train2014_000000000753.jpg"
"hartebeest cheetah"
"the giraffe is standing alone in the wilderness."

# Update (June 2023): added both soft and hard labels to visual_caption_cosine_score (0.2, 0.3, 0.4, and 0.5)

Introduction

Modern image captioning relies heavily on extracting knowledge from images, such as objects, to capture the concept of a static story in the image. In this paper, we propose a textual visual context dataset for captioning, in which the publicly available COCO Caption dataset (Lin et al., 2014) is extended with information about the scene (such as the objects in the image). Since this information is in textual form, it can be used to bring any NLP task, such as text similarity or semantic relation methods, into captioning systems, either as an end-to-end training strategy or as a post-processing approach.

Please refer to the project page and GitHub for more information.

For a quick start, please have a look at this demo and the pre-trained models with th 0.2, 0.3, and 0.4.

Overview

We enrich COCO-Caption with textual visual context information. We use ResNet152, CLIP, and Faster R-CNN to extract object information for each image. We apply three filters to ensure the quality of the dataset: (1) threshold: filter out predictions where the object classifier is not confident enough; (2) semantic alignment: use semantic similarity to remove duplicated objects; and (3) semantic relatedness score as soft label: guarantee that the visual context and the caption are strongly related. In particular, we use Sentence-RoBERTa-sts to compute a soft score via cosine similarity, and then apply a threshold th (0.2, 0.3, or 0.4) to annotate the final hard label: 1 if the score is ≥ th, and 0 otherwise. Finally, to take advantage of the visual overlap between the caption and the visual context, and to extract global information, we use BERT followed by a shallow 1D-CNN (Kim, 2014) to estimate the visual relatedness score.
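The soft-to-hard labeling step described above can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the function names `cosine_similarity` and `hard_labels` are hypothetical, and the two short vectors stand in for sentence embeddings that would in practice come from a sentence encoder such as Sentence-RoBERTa-sts.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as unrelated
    return dot / (norm_u * norm_v)


def hard_labels(score, thresholds=(0.2, 0.3, 0.4, 0.5)):
    """Map a soft cosine score to one hard label (1 or 0) per threshold."""
    return {th: int(score >= th) for th in thresholds}


# Toy stand-ins for the embeddings of a visual context and a caption
# (assumption: real embeddings come from a sentence encoder).
visual_vec = [1.0, 0.0]
caption_vec = [1.0, 1.0]

score = cosine_similarity(visual_vec, caption_vec)
labels = hard_labels(score)
print(round(score, 4), labels)
```

Keeping the soft score alongside the per-threshold labels lets a user choose a stricter or looser notion of relatedness without recomputing the embeddings.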

Download

  1. Download Raw data with ID and Visual context -> original dataset with related ID caption train2014
  2. Download Data with cosine score -> soft cosine label with th 0.2, 0.3, 0.4, and 0.5, and hard label [0,1]
  3. Download Overlapping visual with caption -> overlap between the visual context and the human-annotated caption
  4. Download Dataset (tsv file) 0.0 -> raw data with hard label without cosine similarity, and with threshold cosine similarity (degree of the relation between the visual context and the caption) = 0.2, 0.3, 0.4
  5. Download Dataset GenderBias -> man/woman replaced with person class label
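The gender-neutral variant in item 5 replaces man/woman with the person class label. A minimal sketch of such a replacement is shown below; the helper name `neutralize` and the regular expression are assumptions for illustration, and the actual preprocessing behind the released file may differ (for example in how plurals or pronouns are handled).

```python
import re

# Match "man" or "woman" as whole words only, so that words such as
# "human" are left untouched. (Hypothetical pattern, not from the repo.)
_GENDERED = re.compile(r"\b(man|woman)\b", flags=re.IGNORECASE)


def neutralize(text):
    """Replace the words man/woman with the class label 'person'."""
    return _GENDERED.sub("person", text)


example = neutralize("a woman with an umbrella near the sea")
print(example)
```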

For future work, we plan to extract the visual context from the caption (without using a visual classifier) and estimate the visual relatedness score by employing unsupervised learning (i.e. contrastive learning). (work in progress)

  1. Download CC -> Caption dataset from Conceptual Captions (CC) 2M (2255927 captions)
  2. Download CC+wiki -> CC+1M-wiki 3M (3255928)
  3. Download CC+wiki+COCO -> CC+wiki+COCO-Caption 3.5M (366984)
  4. Download COCO-caption+wiki -> COCO-caption +wiki 1.4M (1413915)
  5. Download COCO-caption+wiki+CC+8Mwiki -> COCO-caption+wiki+CC+8Mwiki 11M (11541667)

Citation

The details of this repo are described in the following paper. If you find this repo useful, please kindly cite it:

@article{sabir2023visual,
  title={Visual Semantic Relatedness Dataset for Image Captioning},
  author={Sabir, Ahmed and Moreno-Noguer, Francesc and Padr{\'o}, Llu{\'\i}s},
  journal={arXiv preprint arXiv:2301.08784},
  year={2023}
}