sugrrants: Visual methods for big temporal data
The abundance of temporal data drives demand for visual tools to understand and explore trends and seasonal patterns and to spot anomalies. However, the current time series visualisation toolbox is inadequate for exploiting this information when temporal data are big in length, big in volume, and big in complexity. If a time series spans a substantially long period, conventional line graphs become ineffective, because the natural ordering of time crowds all the lines together on a given screen. A large collection of time series brings further visualisation challenges, such as depicting associations between series and increasing data density (Tufte, 2001) while retaining clarity. Other complications of temporal data, such as irregular time intervals, missing values, anomalies and additional variables or structures, increase the difficulty and complexity of visualising time series.
In this project, we aim to build a new package named sugrrants to support and
extend temporal data exploration and visualisation, and possibly add
interactivity using plotly and shiny. Furthermore, it will handle
visualisation for time series models (e.g. ARIMA and ETS), for example
visualising diagnostics to evaluate model performance (i.e. the model in the
data space) and visualising time series data in the model space (Wickham, Cook
and Hofmann, 2015).
zoo and xts are the most popular R packages for handling time series data. Both
define their own objects, of class zoo and xts respectively. zoo provides
algorithms for regularising unequally spaced time series, imputing missing
values, and applying a function over fixed and rolling windows. It also
provides ggplot2.zoo for plotting overlaid time series and small multiples
from a zoo object. xts extends zoo to work with different date-time based
data in R. These two packages intend to be consistent with ts and mts
objects in base R.
Apart from time series data wrangling, statistical modelling is one of the
most important components, and it is served by the widely used R packages
forecast and hts. The fundamental data structure of forecast and hts relies
on ts and mts objects. forecast not only builds models on time series and
produces forecasts, but also provides methods to detect outliers and handle
missing values in a time series. hts targets forecasting hierarchical
and grouped time series.
Packages such as quantmod and TTR focus on financial time series. The more
recent tidyquant package integrates the tidy data framework with these time
series packages, including quantmod, TTR, zoo and xts, making it easy to
analyse and plot financial data in data frame (tibble) format.
A fairly new package, padr, provides functionality to make implicit missing
data explicit with imputed records and to aggregate date-time based data to a
higher-level time interval.
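To make the idea of implicit versus explicit missing data concrete, here is a minimal sketch in base R. It does not use padr; the toy data frame `obs` and its column names are hypothetical, chosen only for illustration. The gaps are filled with `NA` rather than imputed values.

```r
# Hypothetical toy data: hourly readings with two hours silently absent.
obs <- data.frame(
  time = as.POSIXct(
    c("2017-01-01 00:00", "2017-01-01 01:00", "2017-01-01 04:00"),
    tz = "UTC"
  ),
  value = c(10, 12, 9)
)

# Build the complete hourly grid spanning the observed range.
full_grid <- data.frame(
  time = seq(min(obs$time), max(obs$time), by = "hour")
)

# A left join onto the grid turns the implicit gaps into explicit NA rows.
padded <- merge(full_grid, obs, by = "time", all.x = TRUE)

# Aggregate to a higher-level interval (here daily totals).
# The formula interface of aggregate() drops NA rows by default.
padded$day <- as.Date(padded$time, tz = "UTC")
daily <- aggregate(value ~ day, data = padded, FUN = sum)
```

After padding, the two missing hours appear as rows with `NA` in `value`, so a plot or summary can show the gap instead of silently joining across it.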
- Tidy data: a couple of functions would be expected to facilitate
  conversions between data frames and ts, mts, hts and forecast objects.
- Slicing and dicing: slicing and wrapping/faceting sub-series based on input
  temporal components from a given date-time data frame. This would be helpful
  for plotting a long time series with multiple seasonalities.
- Calendar view: it has become increasingly common to collect data at daily or
  even finer time resolution. This creates a need to employ our familiar
  calendar layout to effectively organise raw series, particularly if seasonal
  effects, such as time of day, day of week and day of year, are present
  in the data. A new Geom for handling a calendar-format data frame will be added.
- Nested facet: ggplot2::facet_grid nicely handles two-dimensional crossed
  categories (a*b), whereas a nested expression (a/b) lends itself to a
  one-dimensional structure. Wilkinson and Wills (2005) point out that the
  difference between (a*b) and (a/b) seems subtle in terms of layout but is
  fundamental in interpretation, as crossing versus nesting. A new facet system
  would improve the labelling to emphasise the nested structure. It may also be
  helpful for displaying time series in a hierarchical structure for an hts
  object.
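The tidy-data and calendar-view items above can be sketched in base R as follows. This is only an illustration under stated assumptions: `ts_to_df()` and `calendar_position()` are hypothetical helper names, not functions of sugrrants or any other package, and the calendar grid assumes weeks start on Monday.

```r
# Hypothetical helper: convert a monthly ts object into a long data frame
# with explicit year and month columns, ready for slicing or faceting.
ts_to_df <- function(x) {
  stopifnot(is.ts(x), frequency(x) == 12)
  data.frame(
    # Small offset guards against floating-point error at year boundaries.
    year  = as.integer(floor(time(x) + 1e-8)),
    month = as.integer(cycle(x)),
    value = as.numeric(x)
  )
}

# Hypothetical helper: place each date in a monthly calendar grid.
# col is the day of the week (Monday = 1, ..., Sunday = 7) and
# row is the week of the month, counted from 1.
calendar_position <- function(dates) {
  monday_first <- function(d) {
    w <- as.POSIXlt(d)$wday        # 0 = Sunday, ..., 6 = Saturday
    ifelse(w == 0, 7, w)
  }
  mday <- as.POSIXlt(dates)$mday
  first_of_month <- as.Date(format(dates, "%Y-%m-01"))
  offset <- monday_first(first_of_month) - 1   # blank cells before day 1
  data.frame(
    date = dates,
    row  = (mday + offset - 1) %/% 7 + 1,
    col  = monday_first(dates)
  )
}

# Usage with a built-in monthly series and one month of daily dates.
air <- ts_to_df(AirPassengers)
jan <- calendar_position(as.Date("2017-01-01") + 0:30)
```

With `row` and `col` available as ordinary columns, each day's sub-series could be drawn in its own calendar cell, and the `year` and `month` columns give the temporal components needed for slicing a long series.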
The completion of this package is expected to contribute useful visual methods
to the existing graphical toolkit for big temporal data and to make time
series analysis seamless with tidyverse packages.
- Dr Dianne Cook, visualisation
- Dr Rob Hyndman, forecasting