This module contains the implementation of multiple medical time series datasets following the tensorflow dataset API.
Currently implemented datasets are:
physionet2012
(mortality prediction)mimic3_mortality
(mortality prediction)mimic3_phenotyping
(mortality prediction)physionet2019
(online sepsis early prediction)
In order to get a tensorflow dataset representation of one of the datasets simply
import tensorflow_datasets
and this module. The datasets can then be accessed
like any other tensorflow dataset.
import tensorflow_datasets as tfds
import medical_ts_datasets
physionet_dataset = tfds.load(name='physionet2012', split='train')
Each instance in the dataset is represented as a nested directory of the following structure:
statics
: Static variables such as demographics or the unit the patient was admitted totime
: Scalar time variable containing the time since admission in hoursvalues
: Observation values of time series, these by default contain NaN for modalities which were not observed for the given timepoint.targets
: Directory of potential target values, the available endpoints are dataset specific.metadata
: Directory of metadata on an individual patient, such as the identifier using in the dataset.
If the load method is called with the flag as_supervised=True
, it will
return a dataset which can readily be used together with keras. Here each
instance is represented by a (X, y) tuple and the X tuple contains the
following 4 elements: time
, values
, measurements
(indicators if
a value was measured or not) and length
.
When using datasets from this repository, please do not forget to cite the following references besides a reference to the original dataset:
@InProceedings{horn2020set,
title = {{S}et {F}unctions for {T}ime {S}eries},
author = {Max Horn and Michael Moor and Christian Bock and Bastian Rieck and Karsten Borgwardt},
booktitle = {Proceedings of the 37th International Conference on Machine Learning},
year = {2020},
volume = {119},
series = {Proceedings of Machine Learning Research},
publisher = {PMLR}
}
@misc{TFDS,
title = {{TensorFlow Datasets}, A collection of ready-to-use datasets},
howpublished = {\url{https://www.tensorflow.org/datasets}},
}