PubMedQA is a biomedical question answering (QA) dataset collected from PubMed abstracts. The task is to answer research questions (e.g., "Do preoperative statins reduce atrial fibrillation after coronary artery bypass grafting?") with yes, no, or maybe, using the corresponding abstract. The dataset contains 1,000 expert-annotated QA instances, 61,200 unannotated instances, and 211,300 artificially generated QA instances. Each PubMedQA instance consists of: (1) a question, which is either the title of an existing research article or derived from it; (2) a context, i.e., the corresponding abstract without its conclusion; (3) a long answer, i.e., the conclusion of the abstract, which is assumed to answer the research question; and (4) a yes/no/maybe answer that summarizes the conclusion.
What sets PubMedQA apart is how thoroughly it mines PubMed. Approximately 760,000 PubMed articles have titles phrased as questions, and these questions are often answered directly by the conclusion of the abstract, giving QA systems a ready-made answer source. Of these, about 120,000 have structured abstracts divided into conventional sections such as "Background" and "Results", which can serve as context for understanding the question and reaching the conclusion. This structure pairs each question directly with its answer, but it also requires models to cope with varied writing styles. More than half of these question titles can be answered with a simple yes/no/maybe, which makes PubMedQA a rich resource for natural language processing research on scientific reasoning and automated literature processing.
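To make the construction concrete, here is a minimal Python sketch of turning one article into an instance, assuming its abstract is already split into (label, text) sections. The `build_instance` function, the `QAInstance` record, the rule that a title must end in "?", and the set of labels treated as the conclusion are all illustrative assumptions, not the official PubMedQA pipeline. The yes/no/maybe answer is not derived here, since it comes from annotation or artificial generation.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumed section labels that mark the conclusion of a structured abstract.
CONCLUSION_LABELS = {"CONCLUSION", "CONCLUSIONS"}


@dataclass
class QAInstance:
    question: str        # the article title, phrased as a question
    contexts: list[str]  # abstract sections other than the conclusion
    labels: list[str]    # section labels aligned with `contexts`
    long_answer: str     # the conclusion section(s)


def build_instance(title: str, sections: list[tuple[str, str]]) -> QAInstance | None:
    """Split a structured abstract into context and long answer.

    Returns None when the title is not phrased as a question or the
    abstract has no conclusion section.
    """
    if not title.rstrip().endswith("?"):
        return None
    contexts, labels, conclusion = [], [], []
    for label, text in sections:
        if label.upper() in CONCLUSION_LABELS:
            conclusion.append(text)  # becomes the long answer
        else:
            labels.append(label.upper())
            contexts.append(text)    # kept as context
    if not conclusion:
        return None
    return QAInstance(title.strip(), contexts, labels, " ".join(conclusion))
```

Articles whose titles are not questions, or whose abstracts lack a conclusion, simply yield `None` and are skipped.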
Task Type | Language | Train | Val | Test | File Format | Size |
---|---|---|---|---|---|---|
QA | English | 500 labeled; 61.2k unlabeled; 211.3k artificially labeled | - | 500 | .json | 656 MB |
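Since the 500 training and 500 test instances together make up the 1,000 labeled instances, the split can be recovered with a short sketch like the one below. It assumes that `ori_pqal.json` holds the labeled instances as a dict keyed by PMID and that `test_ground_truth.json` is a JSON object whose keys are the PMIDs of the test instances. The helper names `load_json` and `split_labeled` are hypothetical.

```python
from __future__ import annotations

import json


def load_json(path: str) -> dict:
    """Read one of the released JSON files into a dict keyed by PMID."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def split_labeled(labeled: dict, ground_truth: dict) -> tuple[dict, dict]:
    """Partition the labeled instances into (train, test) by PMID.

    Assumption: every PMID that appears as a key in `ground_truth`
    belongs to the test set; all other labeled PMIDs are training data.
    """
    train = {pmid: x for pmid, x in labeled.items() if pmid not in ground_truth}
    test = {pmid: x for pmid, x in labeled.items() if pmid in ground_truth}
    return train, test
```

Usage, under the same assumptions: `train, test = split_labeled(load_json("ori_pqal.json"), load_json("test_ground_truth.json"))`.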
Statistic | PQA-L | PQA-U | PQA-A |
---|---|---|---|
Number of QA pairs | 1.0k | 61.2k | 211.3k |
Prop. of yes (%) | 55.2 | — | 92.8 |
Prop. of no (%) | 33.8 | — | 7.2 |
Prop. of maybe (%) | 11.0 | — | 0.0 |
Avg. question length | 14.4 | 15.0 | 16.3 |
Avg. context length | 238.9 | 237.3 | 238.0 |
Avg. long answer length | 43.2 | 45.9 | 41.0 |
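The answer proportions in the table can be recomputed from a loaded JSON file with a sketch like the following. It assumes each file is a dict keyed by PMID whose values store the answer under a `final_decision` key, as in the example instance shown later; instances without that key (such as unlabeled ones) are skipped. `label_proportions` is a hypothetical helper name.

```python
from __future__ import annotations

from collections import Counter

LABELS = ("yes", "no", "maybe")


def label_proportions(instances: dict) -> dict[str, float]:
    """Percentage of each answer label in a PMID-keyed dict of instances.

    Returns an empty dict when no instance carries a label.
    """
    counts = Counter(x.get("final_decision") for x in instances.values())
    total = sum(counts[label] for label in LABELS)
    if total == 0:
        return {}
    return {label: round(100 * counts[label] / total, 1) for label in LABELS}
```

An empty result corresponds to the dashes shown for PQA-U, which has no answer labels.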
A visualization of the distribution of PubMedQA topics: nearly all instances are human studies, and they cover a wide range of topics, including retrospective, prospective, and cohort studies, different age groups, and healthcare-related subjects such as treatment outcomes, prognosis, and disease risk factors.
Proportions of question types, reasoning types, and whether the numbers in the context come with textual interpretations (or whether the context contains numbers at all).
Question Type | % | Example Questions |
---|---|---|
Does a factor influence the output? | 36.5 | Does reducing spasticity translate into functional benefit? Does ibuprofen increase perioperative blood loss during hip arthroplasty? |
Is a therapy good/necessary? | 26.0 | Should circumcision be performed in childhood? Is external palliative radiotherapy for gallbladder carcinoma effective? |
Is a statement true? | 18.0 | Sternal fracture in growing children: A rare and often overlooked fracture? Xanthogranulomatous cholecystitis: a premalignant condition? |
Is a factor related to the output? | 18.0 | Can PRISM predict length of PICU stay? Is trabecular bone related to primary stability of miniscrews? |
Reasoning Type | % | Example Snippet in Context |
---|---|---|
Inter-group comparison | 57.5 | Postoperative AF was significantly lower in the Statin group compared with the Non-statin group (16% versus 33%, p=0.005). |
Interpreting subgroup statistics | 16.5 | 57% of patients were of lower socioeconomic status and they had more health problems, less functioning, and more symptoms |
Interpreting (single) group statistics | 16.0 | A total of 4 children aged 5-14 years with a sternal fracture were treated in 2 years, 2 children were hospitalized for pain management and... |
Text Interpretations of Numbers | % | Example Snippet in Context |
---|---|---|
Existing interpretations of numbers | 75.5 | Postoperative AF was significantly lower in the Statin group compared with the Non-statin group (16% versus 33%, p=0.005). |
No interpretations (numbers only) | 21.0 | 30-day mortality was 12.4% in those aged<70 years and 22% in those>70 years (p<0.001). |
No numbers (texts only) | 3.5 | The halofantrine therapeutic dose group showed loss and distortion of inner hair cells and inner phalangeal cells |
Example instance from the official data (one entry of the JSON file, keyed by its PMID):

```json
{
  "21645374": {
    "QUESTION": "Do mitochondria play a role in remodelling lace plant leaves during programmed cell death?",
    "CONTEXTS": [
      "Programmed cell death (PCD) is the regulated death of cells within an organism. The lace plant (Aponogeton madagascariensis) produces perforations in its leaves through PCD. The leaves of the plant consist of a latticework of longitudinal and transverse veins enclosing areoles. PCD occurs in the cells at the center of these areoles and progresses outwards, stopping approximately five cells from the vasculature. The role of mitochondria during PCD has been recognized in animals; however, it has been less studied during PCD in plants.",
      "The following paper elucidates the role of mitochondrial dynamics during developmentally regulated PCD in vivo in A. madagascariensis. A single areole within a window stage leaf (PCD is occurring) was divided into three areas based on the progression of PCD; cells that will not undergo PCD (NPCD), cells in early stages of PCD (EPCD), and cells in late stages of PCD (LPCD). Window stage leaves were stained with the mitochondrial dye MitoTracker Red CMXRos and examined. Mitochondrial dynamics were delineated into four categories (M1-M4) based on characteristics including distribution, motility, and membrane potential (\u0394\u03a8m). A TUNEL assay showed fragmented nDNA in a gradient over these mitochondrial stages. Chloroplasts and transvacuolar strands were also examined using live cell imaging. The possible importance of mitochondrial permeability transition pore (PTP) formation during PCD was indirectly examined via in vivo cyclosporine A (CsA) treatment. This treatment resulted in lace plant leaves with a significantly lower number of perforations compared to controls, and that displayed mitochondrial dynamics similar to that of non-PCD cells."
    ],
    "LABELS": [
      "BACKGROUND",
      "RESULTS"
    ],
    "MESHES": [
      "Alismataceae",
      "Apoptosis",
      "Cell Differentiation",
      "Mitochondria",
      "Plant Leaves"
    ],
    "YEAR": "2011",
    "reasoning_required_pred": "yes",
    "reasoning_free_pred": "yes",
    "final_decision": "yes",
    "LONG_ANSWER": "Results depicted mitochondrial dynamics in vivo as PCD progresses within the lace plant, and highlight the correlation of this organelle with other organelles during developmental PCD. To the best of our knowledge, this is the first report of mitochondria and chloroplasts moving on transvacuolar strands to form a ring structure surrounding the nucleus during developmental PCD. Also, for the first time, we have shown the feasibility for the use of CsA in a whole plant system. Overall, our findings implicate the mitochondria as playing a critical and early role in developmentally regulated PCD in the lace plant."
  }
}
```
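A minimal sketch of loading entries like the one above into typed records, assuming the key names shown in the example (`QUESTION`, `CONTEXTS`, `LABELS`, `LONG_ANSWER`, `final_decision`). The `PubMedQAExample` record, the `parse_examples` and `load_examples` helpers, and the choice to prefix each context passage with its section label are illustrative assumptions, not an official loader.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass
class PubMedQAExample:
    pmid: str
    question: str
    context: str        # labeled abstract sections joined into one string
    long_answer: str
    answer: str | None  # "yes" / "no" / "maybe"; None if the entry is unlabeled


def parse_examples(raw: dict) -> list[PubMedQAExample]:
    """Convert a PMID-keyed dict (as stored in the JSON files) into records."""
    examples = []
    for pmid, entry in raw.items():
        # Prefix each context passage with its section label, e.g. "BACKGROUND: ..."
        context = " ".join(
            f"{label}: {text}"
            for label, text in zip(entry["LABELS"], entry["CONTEXTS"])
        )
        examples.append(PubMedQAExample(
            pmid=pmid,
            question=entry["QUESTION"],
            context=context,
            long_answer=entry.get("LONG_ANSWER", ""),
            answer=entry.get("final_decision"),
        ))
    return examples


def load_examples(path: str) -> list[PubMedQAExample]:
    """Read one JSON file from disk and parse every entry in it."""
    with open(path, encoding="utf-8") as f:
        return parse_examples(json.load(f))
```

Keeping the section labels in the context string is one simple way to preserve the abstract's structure for a text-only model; dropping them is an equally valid choice.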
The dataset files are organized as follows: ori_pqal.json contains the 1,000 expert-annotated instances, ori_pqau.json contains the 61.2 thousand unannotated instances, ori_pqaa.json contains the 211.3 thousand artificially generated instances, and test_ground_truth.json contains the yes/no/maybe answers for the 500 test instances.
```
PubMedQA
│
├── ori_pqaa.json
├── ori_pqal.json
├── ori_pqau.json
└── test_ground_truth.json
```
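To score predictions against `test_ground_truth.json`, a sketch like the following computes accuracy and macro-averaged F1 over the three answer classes. It assumes the file maps each test PMID to its yes/no/maybe answer and that predictions use the same format; `load_json` and `evaluate` are hypothetical helpers, and a PMID with no prediction is counted as wrong.

```python
from __future__ import annotations

import json

LABELS = ("yes", "no", "maybe")


def load_json(path: str) -> dict:
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def evaluate(predictions: dict[str, str], ground_truth: dict[str, str]) -> tuple[float, float]:
    """Return (accuracy, macro-F1) over the PMIDs in a non-empty `ground_truth`."""
    pmids = list(ground_truth)
    correct = sum(predictions.get(p) == ground_truth[p] for p in pmids)
    accuracy = correct / len(pmids)

    f1_scores = []
    for label in LABELS:
        # Per-class counts, treating `label` as the positive class.
        tp = sum(predictions.get(p) == label and ground_truth[p] == label for p in pmids)
        fp = sum(predictions.get(p) == label and ground_truth[p] != label for p in pmids)
        fn = sum(predictions.get(p) != label and ground_truth[p] == label for p in pmids)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    # Macro average: every class weighs the same regardless of its frequency.
    return accuracy, sum(f1_scores) / len(LABELS)
```

For example, `evaluate(load_json("my_predictions.json"), load_json("test_ground_truth.json"))` returns `(accuracy, macro_f1)`, where `my_predictions.json` is a placeholder file name. Macro-F1 matters here because the labels are imbalanced, so a model that always answers "yes" can still reach a deceptively high accuracy.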
Qiao Jin (University of Pittsburgh, USA)
Bhuwan Dhingra (Carnegie Mellon University, USA)
Zhengping Liu (University of Pittsburgh, USA)
William W. Cohen (Google AI, USA)
Xinghua Lu (University of Pittsburgh, USA)
Official Website: https://pubmedqa.github.io/
Download Link: https://github.com/pubmedqa/pubmedqa?tab=readme-ov-file#download
Paper: https://arxiv.org/abs/1909.06146
Publication Date: 2019
```bibtex
@inproceedings{jin2019pubmedqa,
  title={PubMedQA: A Dataset for Biomedical Research Question Answering},
  author={Jin, Qiao and Dhingra, Bhuwan and Liu, Zhengping and Cohen, William and Lu, Xinghua},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)},
  pages={2567--2577},
  year={2019}
}
```