Skip to content

ExpectationMax/medical_ts_datasets

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

39 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Medical time series datasets

This module contains the implementation of multiple medical time series datasets following the tensorflow dataset API.

Currently implemented datasets are:

  • physionet2012 (mortality prediction)
  • mimic3_mortality (mortality prediction)
  • mimic3_phenotyping (mortality prediction)
  • physionet2019 (online sepsis early prediction)

Example usage

In order to get a tensorflow dataset representation of one of the datasets simply import tensorflow_datasets and this module. The datasets can then be accessed like any other tensorflow dataset.

import tensorflow_datasets as tfds
import medical_ts_datasets

physionet_dataset = tfds.load(name='physionet2012', split='train')

Instance structure

Each instance in the dataset is represented as a nested directory of the following structure:

  • statics: Static variables such as demographics or the unit the patient was admitted to
  • time: Scalar time variable containing the time since admission in hours
  • values: Observation values of time series, these by default contain NaN for modalities which were not observed for the given timepoint.
  • targets: Directory of potential target values, the available endpoints are dataset specific.
  • metadata: Directory of metadata on an individual patient, such as the identifier using in the dataset.

Supervised dataset

If the load method is called with the flag as_supervised=True, it will return a dataset which can readily be used together with keras. Here each instance is represented by a (X, y) tuple and the X tuple contains the following 4 elements: time, values, measurements (indicators if a value was measured or not) and length.

Citing

When using datasets from this repository, please do not forget to cite the following references besides a reference to the original dataset:

@InProceedings{horn2020set,
    title = {{S}et {F}unctions for {T}ime {S}eries},
    author = {Max Horn and Michael Moor and Christian Bock and Bastian Rieck and Karsten Borgwardt},
    booktitle = {Proceedings of the 37th International Conference on Machine Learning},
    year = {2020},
    volume = {119},
    series = {Proceedings of Machine Learning Research},
    publisher = {PMLR}
}

@misc{TFDS,
  title = {{TensorFlow Datasets}, A collection of ready-to-use datasets},
  howpublished = {\url{https://www.tensorflow.org/datasets}},
}

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages