Dataset Card for PathVQA

Dataset Description

PathVQA is a dataset of question-answer pairs on pathology images, intended for training and testing Medical Visual Question Answering (VQA) systems. It includes both open-ended questions and binary "yes/no" questions. The dataset is built from two publicly available pathology textbooks, "Textbook of Pathology" and "Basic Pathology", and a publicly available digital library, the "Pathology Education Informational Resource" (PEIR). The copyrights of the images and captions belong to the publishers and authors of these two books and to the owners of the PEIR digital library.

Repository: PathVQA Official GitHub Repository
Paper: PathVQA: 30000+ Questions for Medical Visual Question Answering
Leaderboard: Papers with Code Leaderboard

Dataset Summary

The dataset was obtained from the updated Google Drive link shared by the authors on Feb 15, 2023 (see the commit in the GitHub repository). This version contains a total of 5,004 images and 32,795 question-answer pairs. Of the 5,004 images, 4,289 are referenced by at least one question-answer pair, while 715 are not used. A few image-question-answer triplets occur more than once in the same split (training, validation, test). After dropping these duplicates, the dataset contains 32,632 question-answer pairs on 4,289 images.
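
The deduplication step described above can be sketched as follows. This is a minimal illustration and not the code used to build the dataset: the `image_id` key is a hypothetical stand-in for whatever identifies an image (a file name or a content hash, for example), and the function is assumed to be applied to each split separately.

```python
def drop_duplicate_triplets(rows):
    """Keep the first occurrence of each (image, question, answer) triplet.

    `rows` is a list of dicts for a single split. The "image_id" key is a
    hypothetical identifier for the image; it is not a field of the dataset.
    """
    seen = set()
    unique = []
    for row in rows:
        key = (row["image_id"], row["question"], row["answer"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Toy rows standing in for one split.
train_rows = [
    {"image_id": "img_0001", "question": "what is present?", "answer": "colon"},
    {"image_id": "img_0001", "question": "is colon present?", "answer": "yes"},
    # Exact repeat of the first row: dropped.
    {"image_id": "img_0001", "question": "what is present?", "answer": "colon"},
    # Same image and question but a different answer: kept.
    {"image_id": "img_0001", "question": "what is present?", "answer": "gastrointestinal"},
]

deduped = drop_duplicate_triplets(train_rows)
print(len(train_rows), len(deduped))  # prints "4 3"
```

Only exact repeats of the full triplet are dropped, so an image can still carry the same question with several different answers. On the full dataset this step accounts for the 163 pairs separating 32,795 from 32,632.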

Supported Tasks and Leaderboards

The PathVQA dataset has an active leaderboard on Papers with Code where models are ranked based on three metrics: "Yes/No Accuracy", "Free-form accuracy" and "Overall accuracy". "Yes/No Accuracy" is the accuracy of a model's generated answers for the subset of binary "yes/no" questions. "Free-form accuracy" is the accuracy of a model's generated answers for the subset of open-ended questions. "Overall accuracy" is the accuracy of a model's generated answers across all questions.
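
One way to compute the three metrics is sketched below. It rests on two assumptions that the leaderboard may not share: an answer counts as correct only on an exact match after lowercasing and trimming whitespace, and a question is treated as binary when its reference answer is "yes" or "no". The function name and the keys of the returned dictionary are illustrative.

```python
def pathvqa_accuracies(predictions, references):
    """Return yes/no, free-form and overall accuracy as a dict.

    Assumptions (not taken from the leaderboard): exact string match after
    lowercasing and stripping, and binary questions are identified by the
    reference answer being "yes" or "no".
    """
    yes_no_hits = []
    free_form_hits = []
    for pred, ref in zip(predictions, references):
        ref_norm = ref.strip().lower()
        correct = pred.strip().lower() == ref_norm
        if ref_norm in {"yes", "no"}:
            yes_no_hits.append(correct)
        else:
            free_form_hits.append(correct)

    def mean(hits):
        # None signals that the subset is empty.
        return sum(hits) / len(hits) if hits else None

    return {
        "yes_no": mean(yes_no_hits),
        "free_form": mean(free_form_hits),
        "overall": mean(yes_no_hits + free_form_hits),
    }


refs = ["yes", "no", "colon", "in the canals of hering"]
preds = ["yes", "no", "colon", "bile duct"]
scores = pathvqa_accuracies(preds, refs)
print(scores)  # {'yes_no': 1.0, 'free_form': 0.5, 'overall': 0.75}
```

Exact match is a strict criterion for open-ended answers; a published evaluation may normalize answers differently, so scores from this sketch are not directly comparable to leaderboard entries.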

Languages

The question-answer pairs are in English.

Dataset Structure

Data Instances

Each instance consists of an image-question-answer triplet.

{
  'image': <PIL.JpegImagePlugin.JpegImageFile image mode=CMYK size=309x272>,
  'question': 'where are liver stem cells (oval cells) located?',
  'answer': 'in the canals of hering'
}

Data Fields

  • 'image': the image referenced by the question-answer pair.
  • 'question': the question about the image.
  • 'answer': the expected answer.

Data Splits

The dataset is split into training, validation and test sets. The split is provided directly by the authors.

          Training Set   Validation Set   Test Set
QAs       19,654         6,259            6,719
Images    2,599          832              858
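
As a quick consistency check, the per-split counts can be summed and compared with the totals reported in the Dataset Summary. The dictionary layout below is only an illustrative way to hold the numbers from the table.

```python
# Per-split counts copied from the table above.
splits = {
    "training": {"qas": 19_654, "images": 2_599},
    "validation": {"qas": 6_259, "images": 832},
    "test": {"qas": 6_719, "images": 858},
}

total_qas = sum(s["qas"] for s in splits.values())
total_images = sum(s["images"] for s in splits.values())

print(total_qas)     # 32632
print(total_images)  # 4289

# Share of question-answer pairs in each split.
for name, counts in splits.items():
    print(f"{name}: {counts['qas'] / total_qas:.1%}")
```

Both sums agree with the deduplicated totals (32,632 question-answer pairs on 4,289 images). The image counts adding up to 4,289 is consistent with each image belonging to a single split.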

Additional Information

Licensing Information

The authors have released the dataset under the MIT License.

Citation Information

@article{he2020pathvqa,
    title={PathVQA: 30000+ Questions for Medical Visual Question Answering},
    author={He, Xuehai and Zhang, Yichen and Mou, Luntian and Xing, Eric and Xie, Pengtao},
    journal={arXiv preprint arXiv:2003.10286},
    year={2020}
}