The Free Spoken Digit Dataset consists of 4 individuals each speaking the digits 0 through 9 50 times, for a total of 2000 wave files. We took each segment, normalized it to zero mean and unit variance, clipped the white space, and then converted the resulting arrays into 2-D images. This data can be thought of as “spoken MNIST” in image form.
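The preprocessing steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used to build the dataset: the text does not say how the arrays were turned into images, so a log-magnitude spectrogram is assumed here, "clipping white space" is interpreted as trimming leading and trailing near-silent samples, and the function names, `threshold`, `frame_len`, and `hop` values are hypothetical. A synthetic tone padded with silence stands in for a real recording.

```python
import numpy as np

# Dataset size as stated in the text: 4 speakers x 10 digits x 50 repetitions.
N_FILES = 4 * 10 * 50

def normalize(x):
    """Scale a 1-D signal to zero mean and unit variance."""
    x = np.asarray(x, dtype=np.float64)
    centered = x - x.mean()
    std = centered.std()
    return centered / std if std > 0 else centered

def trim_silence(x, threshold=0.05):
    """Drop leading/trailing samples with magnitude below `threshold`.

    The threshold is a hypothetical value, not taken from the source.
    """
    loud = np.flatnonzero(np.abs(x) > threshold)
    if loud.size == 0:
        return x[:0]
    return x[loud[0]: loud[-1] + 1]

def to_image(x, frame_len=256, hop=128):
    """Assumed image conversion: log-magnitude spectrogram.

    Returns an array of shape (frame_len // 2 + 1, n_frames).
    """
    if len(x) < frame_len:
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])
    spectrum = np.abs(np.fft.rfft(frames * np.hanning(frame_len), axis=1))
    return np.log1p(spectrum).T

# Synthetic stand-in for one recording: silence, a 440 Hz tone, silence.
rate = 8000
t = np.arange(rate // 2) / rate
signal = np.concatenate([np.zeros(1000), np.sin(2 * np.pi * 440 * t), np.zeros(1000)])

clean = normalize(signal)      # zero mean, unit variance
trimmed = trim_silence(clean)  # silent margins removed
image = to_image(trimmed)      # 2-D array, one column per time frame
print(N_FILES, len(signal), len(trimmed), image.shape)
```

The order of operations follows the text (normalize first, then clip). With real recordings the trimming threshold would likely need tuning, since background noise is rarely exactly zero.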
The data repository can be viewed at the following link and downloaded as the following tarfile (38 MB).
Additional images referred to in the ICML 2019 Time Series workshop may be found in the arXiv preprint, which focuses solely on the medical time series data. In particular, Figure 1 shows MNIST prototypes, while Figures 5 and 6 and Tables 1 and 2 cover the NICU-related accuracy plots, confusion matrices, and hyperparameter experiment results.