This repository collects medical data that can be freely accessed and used in research.
- MIMIC-IV (40k) 2008 - 2019
- MIMIC (Medical Information Mart for Intensive Care) is a large, freely-available database comprising de-identified health-related data from patients who were admitted to the critical care units of the Beth Israel Deaconess Medical Center. MIMIC-IV contains data from 2008 - 2019.
- Intro:
- Data:
- Code:
- eICU (200k) 2014-2015
- The eICU Collaborative Research Database is populated with data from a combination of many critical care units throughout the continental United States. The data in the collaborative database covers patients who were admitted to critical care units in 2014 and 2015.
- Intro:
- Data:
- Code:
- HiRID - high time resolution ICU data set (33k) 2008-2016
- The dataset contains de-identified demographic information and a total of 712 routinely collected physiological variables, diagnostic test results and treatment parameters from more than 33 thousand admissions during the period from January 2008 to June 2016. Data is stored with a uniquely high time resolution of one entry every two minutes.
- Intro:
- Data:
- Code:
- AmsterdamUMCdb (23k) 2003-2016
- This is the first freely accessible intensive care database from within the European Union. It contains de-identified health data related to tens of thousands of intensive care unit admissions, including demographics, vital signs, laboratory tests and medications. This version (v1.0.2) contains data related to 23,106 intensive care unit and high dependency unit admissions of adult patients from 2003-2016.
- Intro & Data:
- Code:
- Critical care database comprising patients with infection at Zigong Fourth People's Hospital (2.79k)
- A critical care database of patients with infection, collected at a hospital in China.
- Critical care database comprising patients with infection at Zigong Fourth People's Hospital v1.1
- VitalDB (6.4k) 2016-2017
- The VitalDB (Vital Signs DataBase) is an open dataset created specifically to facilitate machine learning studies related to monitoring vital signs in surgical patients. This dataset contains high-resolution multi-parameter data from 6,388 cases, including 486,451 waveform and numeric data tracks of 196 intraoperative monitoring parameters, 73 perioperative clinical parameters, and 34 time-series laboratory result parameters.
- Intro:
- Data:
- Code:
- Alzheimer’s Disease Neuroimaging Initiative (ADNI) database
- Provides imaging, clinical, and genetic data for over 2,220 patients spanning four studies.
- Intro & Data:
- SEER
- The Surveillance, Epidemiology, and End Results (SEER) Program provides information on cancer statistics in an effort to reduce the cancer burden among the U.S. population.
- Sleep Heart Health Study (SHHS)
- Sleep studies were recorded with the Compumedics PS polysomnograph in an unattended setting, usually in the participants' homes. The dataset includes ECG and EEG recordings, among other signals.
- Intro & Data:
- Hospitalized patients with heart failure
- A retrospective heart failure dataset built from electronic health data of patients who were admitted to a hospital in China.
- Intro & Data:
- MeDAL: A large medical text dataset curated for abbreviation disambiguation, designed for natural language understanding pre-training in the medical domain.
- MedQA: A large-scale open-domain question answering dataset from medical exams.
- PubMedQA: A Dataset for Biomedical Research Question Answering
- MedMCQA: A large-scale (194k) multiple-choice question answering (MCQA) dataset designed to address real-world medical entrance exam questions.
- MMLU: Measuring Massive Multitask Language Understanding. The benchmark includes practice questions for tests such as the Graduate Record Examination and the United States Medical Licensing Examination.
- MIMIC-CXR (227k) 2011-2016
- A large, publicly available dataset of chest radiographs in DICOM format with free-text radiology reports.
- MIMIC-CXR Database v2.0.0
- CheXpert (65k) 2002-2017
- A large public dataset for chest radiograph interpretation, consisting of 224,316 chest radiographs of 65,240 patients.
- CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison
- VinDr-CXR (18k) 2018-2020
- The published dataset consists of 18,000 postero-anterior (PA) view CXR scans that come with both the localization of critical findings and the classification of common thoracic diseases.
- VinDr-CXR: An open dataset and benchmarks for disease classification and abnormality localization on chest radiographs | VinDr
- VinDr-Mammo (5k) 2018-2020
- A large-scale benchmark dataset of full-field digital mammography.
- VinDr-Mammo: A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography | VinDr
- CBIS-DDSM (2.6k) 1988-1999
- The DDSM is a database of 2,620 scanned film mammography studies. It contains normal, benign, and malignant cases with verified pathology information.
- Curated Breast Imaging Subset of Digital Database for Screening Mammography (CBIS-DDSM), The Cancer Imaging Archive (TCIA)
- GitHub - ys-zong/MEDFAIR: MEDFAIR: Benchmarking Fairness for Medical Imaging
- PTB-XL (18k) 1989-1996
- Chapman-Shaoxing (10k) 2020
- Georgia 12-Lead ECG Challenge (10k) 2020
- China Physiological Signal Challenge (6.8k) 2018
- SHHS (5.8k) 1995-1998
- Includes 5,804 adults aged 40 and older.
- BioLINCC: Sleep Heart Health Study (SHHS)
- ISRUC-Sleep (0.1k) 2009-2013
- Collected from hospital subjects aged 20 to 85, with an average age of 51.
- ISRUC-SLEEP Dataset | A comprehensive public dataset for sleep researchers
- RadFusion: Multimodal Pulmonary Embolism Dataset (1.8k) 2000-2016
- Data collected from 1,794 patients susceptible to pulmonary embolism at the Stanford University Medical Center. The dataset consists of chest CT, patient demographics, and medical history.
- Stanford AIMI Shared Datasets
- You can also search for medical and other datasets in the Nature Scientific Data journal, on the PhysioNet website, via Google Dataset Search, and on Mendeley Data.