sugrrants: Visual methods for big temporal data

Dianne Cook edited this page Feb 3, 2017 · 7 revisions

Background

The abundance of temporal data drives demand for understanding and exploring trends and seasonal patterns, and for spotting anomalies, using a variety of visual tools. However, the current time series visualisation toolbox falls short of exploiting the information in temporal data that are big in length, big in volume, and big in complexity. When a time series spans a long period, conventional line graphs become ineffective, because plotting the series in its natural time order crams the lines into a mess on a given screen. A large collection of time series brings further visualisation challenges, such as depicting associations between series and increasing data density (Tufte, 2001) while retaining clarity. Other complications of temporal data, such as irregular time intervals, missing values, anomalies and additional variables or structures, add to the difficulty of visualising time series.

In this project, we aim to build a new package named sugrrants to support and extend temporal data exploration and visualisation, and possibly add interactivity using plotly and shiny. It will also handle visualisation for time series models (e.g. ARIMA and ETS), for example visualising diagnostics to evaluate model performance (i.e. the model in the data space) and visualising time series data in the model space (Wickham, Cook, Hofmann, 2015).

Related work

zoo and xts are the most popular R packages for handling time series data. They define their own object classes, zoo and xts respectively. zoo provides algorithms for regularising unequally spaced time series, imputing missing values, and applying a function over fixed and rolling windows. It also provides ggplot2.zoo for plotting overlaid time series and small multiples from a zoo object. xts extends zoo to work with different kinds of date-time based data in R. Both packages aim to be consistent with ts and mts objects in base R.

Beyond time series data wrangling, statistical modelling is one of the most important components of time series analysis, and it is served by the widely used R packages forecast and hts. Both rely on ts and mts objects as their fundamental data structure. forecast not only fits models to time series and produces forecasts, but also provides methods to detect outliers and handle missing values. hts targets the forecasting of hierarchical and grouped time series.
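To illustrate why conversion out of ts objects matters for plotting, the sketch below fits a seasonal ARIMA model and flattens the observed, fitted and residual values into a data frame. It deliberately uses stats::arima from base R rather than forecast, so it runs without extra packages; the model orders and the name diag_df are illustrative assumptions, not part of the proposal.

```r
# Built-in monthly series, stored as a `ts` object.
y <- log(AirPassengers)

# Seasonal ARIMA(0,1,1)(0,1,1)[12], chosen only for illustration.
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Flatten the model output into a data frame ("model in the data space").
diag_df <- data.frame(
  time     = as.numeric(time(y)),
  observed = as.numeric(y),
  residual = as.numeric(residuals(fit))
)
diag_df$fitted <- diag_df$observed - diag_df$residual

head(diag_df)
```

Once in this shape, the diagnostics could be passed to any data-frame-based plotting tool.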

Packages such as quantmod and TTR focus on financial time series. The more recent tidyquant package integrates the tidy data framework with these time series packages, including quantmod, TTR, zoo and xts, making it easy to analyse and plot financial data in the data frame (tibble) format.

A more recent package, padr, provides functions to make implicit missing data explicit with imputed records and to aggregate date-time based data to a higher-level time interval.
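The idea of making implicit gaps explicit can be sketched in base R, without calling padr itself. The helper pad_daily below is hypothetical and assumes a daily series; it leaves the new values as NA rather than imputing them.

```r
# Hypothetical helper: add one row for every missing day in a daily series.
pad_daily <- function(df, date_col = "date") {
  full <- data.frame(seq(min(df[[date_col]]), max(df[[date_col]]), by = "day"))
  names(full) <- date_col
  # Left join onto the complete date sequence; gaps become NA.
  merge(full, df, by = date_col, all.x = TRUE)
}

obs <- data.frame(
  date  = as.Date(c("2017-01-01", "2017-01-02", "2017-01-05")),
  value = c(10, 12, 9)
)

padded <- pad_daily(obs)
print(padded)

# Aggregate to a higher-level interval (monthly totals); NA rows are dropped.
padded$month <- format(padded$date, "%Y-%m")
monthly <- aggregate(value ~ month, data = padded, FUN = sum)
print(monthly)
```

Here the two missing days (3 and 4 January) appear as explicit rows, which keeps gaps visible when the series is plotted.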

Details of your coding project

  • Tidy data: a couple of functions are expected to facilitate conversion between data frames and ts, mts, hts and forecast objects.
  • Slicing and dicing: slicing and wrapping/faceting sub-series of a given date-time data frame based on input temporal components. This would help when plotting a long time series with multiple seasonality.
  • Calendar view: it has become increasingly common to collect data at daily or even finer time resolution. This creates a need to use the familiar calendar layout to organise the raw series effectively, particularly if seasonal effects, such as time of day, day of week and day of year, are present in the data. A new Geom for handling a calendar-format data frame will be added.
  • Nested facet: ggplot2::facet_grid nicely handles two-dimensional crossed categories (a*b), whereas a nested expression (a/b) lends itself to a one-dimensional structure. Wilkinson and Wills (2005) point out that the difference between (a*b) and (a/b) seems subtle in terms of layout but is fundamental in interpretation, since one denotes crossing and the other nesting. A new facet system would improve the labelling to emphasise the nested structure. It may also help to display time series in a hierarchical structure for an hts object.
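The first three items can be sketched in base R. The function names ts_to_df and calendar_grid are hypothetical and are not the package's API; the calendar layout assumes weeks start on Monday and only computes grid positions, without any ggplot2 Geom.

```r
# Tidy data: convert a monthly or quarterly `ts` object to a data frame.
ts_to_df <- function(x) {
  data.frame(
    year   = as.integer(floor(time(x) + 1e-8)),  # guard against rounding
    period = as.integer(cycle(x)),               # position within the year
    value  = as.numeric(x)
  )
}

air <- ts_to_df(AirPassengers)

# Slicing: one sub-series per month, ready for wrapping or faceting.
by_month <- split(air$value, air$period)

# Calendar view: map each date to a (row, col) cell within its month.
calendar_grid <- function(dates) {
  lt <- as.POSIXlt(dates)
  col <- (lt$wday + 6) %% 7 + 1                  # 1 = Monday, ..., 7 = Sunday
  first <- as.Date(format(dates, "%Y-%m-01"))
  offset <- (as.POSIXlt(first)$wday + 6) %% 7    # weekday of the 1st, 0 = Monday
  row <- (lt$mday + offset - 1) %/% 7 + 1        # week of the month
  data.frame(date = dates, month = format(dates, "%Y-%m"), row = row, col = col)
}

feb <- seq(as.Date("2017-02-01"), as.Date("2017-02-28"), by = "day")
grid <- calendar_grid(feb)
head(grid)
```

A plotting layer could then draw each day's sub-series inside its cell, with month as the panel variable.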

Expected impact

The completed package is expected to contribute useful visual methods to the existing graphical toolkit for big temporal data and to make time series analysis work seamlessly with tidyverse packages.

Mentors

  • Dr Dianne Cook, visualisation
  • Dr Rob Hyndman, forecasting

Tests

NA

Solutions of tests

NA
