Time series #267

vincentqb · 2019-09-05T13:47:16Z

I'd like to reflect upon what functions would be desired (that could live within torchaudio or outside) in order to offer preliminary support for time series.

Time series data format (e.g. many channels compared to audio)
Missing-data imputation
Interaction with calendar information
Conversion to other formats, e.g. to an audio waveform
An option for transformations to respect the direction of time
Streaming use case?
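Forward-filling is one way to impute missing data while respecting the direction of time. It only copies the most recent observed value, so no future information leaks into earlier samples. The sketch below assumes missing samples are marked with `None` in a plain Python list. `forward_fill` is a hypothetical helper, not a torchaudio function.

```python
from typing import List, Optional


def forward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each missing sample (None) with the most recent observed value.

    Only past samples are consulted, so the output at index t never depends on
    anything after t. Gaps before the first observation stay None.
    """
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


print(forward_fill([None, 1.0, None, None, 4.0, None]))
# [None, 1.0, 1.0, 1.0, 4.0, 4.0]
```

Interpolating between neighbors would usually be smoother, but it reads the next observed value and so is not causal.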

Could we make sure that our constructs are general enough to touch on time series, without sacrificing the primary goal of audio for this library?

Motivation

An audio (multichannel) waveform is a (vector) time series with a constant time step of 1 / sample_rate.

Audio processing and time series analysis are related, though their goals may differ. The transformations used for audio and for general time series are sometimes different (e.g. dB, Mel, ...). For instance, in time series forecasting, transforms are usually expected to respect the direction of time: they should use only past information to predict future values, as in "online" consumption of an audio waveform.
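A transform that respects the direction of time can be illustrated with a moving average over the last `window` samples only. Because it is updated one sample at a time, the same object also fits the streaming use case. `CausalMovingAverage` and `causal_moving_average` are hypothetical names, not torchaudio APIs. A centered moving average would need samples after t, which is exactly what a forecasting transform must avoid.

```python
from collections import deque
from typing import Deque, Iterable, Iterator


class CausalMovingAverage:
    """Streaming moving average over the last `window` samples (hypothetical helper)."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be >= 1")
        self.window = window
        self.buffer: Deque[float] = deque(maxlen=window)
        self.total = 0.0

    def update(self, x: float) -> float:
        # Once the buffer is full, the oldest sample leaves the running sum.
        if len(self.buffer) == self.window:
            self.total -= self.buffer[0]
        self.buffer.append(x)  # deque(maxlen=...) drops the oldest sample itself
        self.total += x
        # During warm-up, average over however many samples have been seen so far.
        return self.total / len(self.buffer)


def causal_moving_average(xs: Iterable[float], window: int) -> Iterator[float]:
    """Apply CausalMovingAverage to a whole sequence; output t uses only xs[:t + 1]."""
    ma = CausalMovingAverage(window)
    for x in xs:
        yield ma.update(x)


print(list(causal_moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=2)))
# [1.0, 1.5, 2.5, 3.5, 4.5]
```

The running sum keeps each update O(1). For very long streams, floating-point drift could call for recomputing the sum from the buffer now and then.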

@nairbv @zdevito @kingjr @adefossez @gully -- do you have use cases for time series that could relate to torchaudio?

Additional context

Python pandas
R tidyverse time series
Prophet
GPyTorch for PyTorch and Gaussian Processes
Streaming (internal doc)


vincentqb · 2019-09-18T15:33:27Z

@sergulaydore -- for reference

gully · 2019-10-12T00:04:05Z

Here are some thoughts on time series from the perspective of astronomy data.

The astropy project had a discussion on tradeoffs surrounding a TimeSeries class for astronomical applications in their ongoing Proposal for Enhancement. There are some subtle discussions distinguishing two types of time series:

sampled time series that sum up a count rate observed over a time interval, such as how many photons a telescope sensor received in a 30-minute interval
event data that record the timestamps of discrete events, such as the arrival time and energy of a single proton striking a sensor.

The distinction essentially comes down to sparsity: populating zeros between infrequent, discrete events is wasteful.
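The conversion from event data to a sampled time series can be sketched by binning timestamps into fixed-width intervals and counting. The function name and the half-open interval convention are assumptions made for illustration. The dense output grows with the observation span rather than with the number of events, which is the sparsity trade-off just described.

```python
import math
from typing import List, Sequence


def bin_events(timestamps: Sequence[float], start: float, stop: float,
               bin_width: float) -> List[int]:
    """Count events in each interval [start + k*w, start + (k+1)*w) (hypothetical helper)."""
    n_bins = math.ceil((stop - start) / bin_width)
    counts = [0] * n_bins
    for t in timestamps:
        if start <= t < stop:  # events outside the window are ignored
            k = int((t - start) // bin_width)
            counts[min(k, n_bins - 1)] += 1  # guard against rounding at the top edge
    return counts


print(bin_events([0.1, 0.2, 1.5, 4.9], start=0.0, stop=5.0, bin_width=1.0))
# [2, 1, 0, 0, 1]
```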

Here at the NASA Kepler/K2 Guest Observer Office we focus on high-precision flux time series: the brightness of a star measured every 30 minutes for four years, with quarterly gaps for transmitting the telescope data back to Earth. This acquisition rate yields a modest amount of data by the standards of audio: our "impressive" 70,000 time samples are acquired in under 2 seconds of single-channel 44.1 kHz audio.
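The sample count and the audio comparison above can be checked quickly. This assumes 365.25-day years and ignores the quarterly gaps, so it slightly overestimates the real number of samples.

```python
# One sample every 30 minutes for four years (quarterly gaps ignored).
samples_per_day = 24 * 60 // 30           # 48
n_samples = 4 * 365.25 * samples_per_day  # 70128.0

# Duration of the same number of samples at the 44.1 kHz CD audio rate.
seconds = n_samples / 44_100
print(n_samples, round(seconds, 2))       # 70128.0 1.59
```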

Some other distinctions: our time series data come with metadata headers that are generally preserved in our objects. Each time sample possesses columns (multichannels) of mixed data types: time, flux (float), flux uncertainty, quality flag (int), quality mask (bool), sky coordinate xy movement. Our in-house toolkit lightkurve deals with this time series data, with tons of application-specific pre-processing steps that wouldn't matter much for a general time series class. The name nods to the convention of "light curves" rather than the audio-familiar waveforms.
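A record with mixed-type columns plus preserved header metadata could look like the sketch below. It is a hypothetical container, not lightkurve's API. The column names are assumptions, and the sky-coordinate columns are omitted for brevity. Its main point is that filtering rows keeps every column aligned and carries the metadata along.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class LightCurve:
    """Hypothetical light-curve container: one entry per time sample in each column."""

    time: List[float]      # e.g. days relative to some reference epoch
    flux: List[float]
    flux_err: List[float]  # per-sample flux uncertainty
    quality: List[int]     # integer quality flags
    mask: List[bool]       # True = keep this sample
    meta: Dict[str, Any] = field(default_factory=dict)  # header metadata

    def __post_init__(self) -> None:
        # Every column must line up with the time axis.
        n = len(self.time)
        for name in ("flux", "flux_err", "quality", "mask"):
            if len(getattr(self, name)) != n:
                raise ValueError(f"column {name!r} does not match len(time)={n}")

    def select(self, keep: List[bool]) -> "LightCurve":
        """Return the rows where `keep` is True, with a copy of the metadata."""
        rows = [i for i, k in enumerate(keep) if k]

        def pick(col: List[Any]) -> List[Any]:
            return [col[i] for i in rows]

        return LightCurve(
            time=pick(self.time),
            flux=pick(self.flux),
            flux_err=pick(self.flux_err),
            quality=pick(self.quality),
            mask=pick(self.mask),
            meta=dict(self.meta),
        )


lc = LightCurve(
    time=[0.0, 0.0208, 0.0417],
    flux=[1.000, 0.998, 1.003],
    flux_err=[0.001, 0.001, 0.001],
    quality=[0, 0, 16],
    mask=[True, True, False],
    meta={"target": "example star"},
)
clean = lc.select(lc.mask)  # drops the flagged third sample
```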

We do frequency-domain analysis with FFTs all the time, with some slight differences: we use an algorithm that supports unevenly spaced time samples. We occasionally do spectrogram analysis, but a 70,000-sample signal can only be cut into 175 non-overlapping frames of nfft=400, which makes for a crude spectrogram.
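The comment does not name the algorithm for uneven sampling. The classical Lomb-Scargle periodogram is one standard choice in astronomy, and it is assumed here. Below is a minimal pure-Python version, O(N·F) in the number of samples and frequencies. In practice, a library implementation such as astropy's `LombScargle` would be the usual route.

```python
import math
import random
from typing import List, Sequence


def lomb_scargle(t: Sequence[float], y: Sequence[float],
                 freqs: Sequence[float]) -> List[float]:
    """Classical (unnormalized) Lomb-Scargle periodogram for unevenly sampled data.

    `freqs` are in cycles per unit of `t` and must be positive.
    """
    mean = sum(y) / len(y)
    yc = [v - mean for v in y]
    power = []
    for f in freqs:
        w = 2.0 * math.pi * f
        # The offset tau makes the sine and cosine terms orthogonal at this frequency.
        tau = math.atan2(sum(math.sin(2 * w * ti) for ti in t),
                         sum(math.cos(2 * w * ti) for ti in t)) / (2 * w)
        c = [math.cos(w * (ti - tau)) for ti in t]
        s = [math.sin(w * (ti - tau)) for ti in t]
        yc_c = sum(a * b for a, b in zip(yc, c))
        yc_s = sum(a * b for a, b in zip(yc, s))
        power.append(0.5 * (yc_c ** 2 / sum(v * v for v in c)
                            + yc_s ** 2 / sum(v * v for v in s)))
    return power


# Irregularly sampled sine at 0.2 cycles per time unit.
rng = random.Random(0)
t = sorted(rng.uniform(0, 100) for _ in range(200))
y = [math.sin(2 * math.pi * 0.2 * ti) for ti in t]
freqs = [0.01 * k for k in range(1, 51)]
p = lomb_scargle(t, y, freqs)
best = freqs[p.index(max(p))]  # the peak should sit at the true frequency, 0.2
```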

Astronomers use scalable Gaussian Process analysis all the time. Popular frameworks are tailored towards 1D astronomical time series, but could (and should?) apply more broadly to time series applications that care about uncertainty quantification or probabilistic prediction. The GPyTorch framework is promising, and I aspire to create astronomy-specific demos to advertise this library more widely to astronomers. The fixed sample spacing of audio makes it amenable to some of the geometric assumptions of GPyTorch.
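Regular spacing helps in at least one concrete way. A stationary kernel evaluated on an evenly spaced grid gives a Toeplitz covariance matrix, where each entry depends only on i - j. Such a matrix is determined by its first column, so it can be stored in O(n) and multiplied by a vector with FFT-based methods. The sketch below only checks this structural property with an RBF kernel. It does not show how GPyTorch exploits it, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def rbf_kernel(xs: Sequence[float], lengthscale: float = 1.0) -> List[List[float]]:
    """Squared-exponential covariance: K[i][j] = exp(-(x_i - x_j)^2 / (2 * lengthscale^2))."""
    return [[math.exp(-((a - b) ** 2) / (2 * lengthscale ** 2)) for b in xs] for a in xs]


def is_toeplitz(m: List[List[float]], tol: float = 1e-12) -> bool:
    """True if every diagonal of the square matrix m is constant."""
    n = len(m)
    return all(abs(m[i][j] - m[i - 1][j - 1]) <= tol
               for i in range(1, n) for j in range(1, n))


even = [0.5 * i for i in range(6)]      # evenly spaced samples
uneven = [0.0, 0.3, 1.7, 2.0, 3.5]      # irregular sampling
print(is_toeplitz(rbf_kernel(even)))    # True
print(is_toeplitz(rbf_kernel(uneven)))  # False
```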

Those are some thoughts for now. Very curious to see how these themes evolve!

nairbv · 2020-01-17T21:54:59Z

For non-audio applications (e.g. in finance) I could imagine a number of useful features/functions.

I'm not familiar with audio time series requirements, but similar to what @gully describes above, there are a number of ways time series of financial data can be represented that may be broadly applicable:

In raw trade or tick data, each data point discretely represents a trade or price change. Some data might include each change to the best bid and ask.
Tick data is typically aggregated into "candlesticks" as open (start), low, high, close (final), volume (total number of shares traded) per time period. Each period then ends up being represented as a vector of these five values.
Other approaches, similar to the "sampled time series" @gully describes above, would be open/high/low/close/duration per N trades, N shares traded, or N ticks. There are a variety of approaches like this for summarizing discrete financial time series data into "bars".
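The two kinds of bars described above can be sketched directly from raw ticks: candlesticks per fixed time period, and bars per N trades. The sketch assumes ticks arrive sorted by time as `(timestamp, price, size)` tuples. `Bar`, `time_bars` and `tick_bars` are hypothetical names. Periods with no trades are simply absent from `time_bars` rather than filled.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Tick = Tuple[float, float, float]  # (timestamp, price, size), sorted by timestamp


@dataclass
class Bar:
    open: float      # first price in the bar
    high: float
    low: float
    close: float     # last price in the bar
    volume: float    # total size traded
    duration: float  # time from first to last tick in the bar


def _make_bar(ticks: Sequence[Tick]) -> Bar:
    prices = [p for _, p, _ in ticks]
    return Bar(
        open=prices[0],
        high=max(prices),
        low=min(prices),
        close=prices[-1],
        volume=sum(s for _, _, s in ticks),
        duration=ticks[-1][0] - ticks[0][0],
    )


def time_bars(ticks: Sequence[Tick], period: float) -> Dict[int, Bar]:
    """One candlestick per period; key k covers [k * period, (k + 1) * period)."""
    buckets: Dict[int, List[Tick]] = {}
    for tick in ticks:
        buckets.setdefault(int(tick[0] // period), []).append(tick)
    return {k: _make_bar(v) for k, v in sorted(buckets.items())}


def tick_bars(ticks: Sequence[Tick], n: int) -> List[Bar]:
    """One bar per n consecutive trades; the last bar may be partial."""
    return [_make_bar(ticks[i:i + n]) for i in range(0, len(ticks), n)]


ticks = [(0.0, 10.0, 100.0), (10.0, 10.5, 50.0), (30.0, 9.8, 200.0),
         (65.0, 10.2, 10.0), (70.0, 10.1, 40.0)]
minute_bars = time_bars(ticks, period=60.0)
pair_bars = tick_bars(ticks, n=2)
```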

Ideally a time-series representation should be flexible/abstract enough so that other representations can be added easily. Tools that convert representations of the data could be useful.

Some other functionality that could be useful in time series tools, at least if applied to certain financial problems:

For "Interaction with calendar information," it could be useful to have a way to "join" multiple time series from different sources.
- One may want to train a single model with data from multiple securities aligned on time.
- Maybe also useful for multi-modal models or stereo audio?
A way to augment time-series data with cumulative or moving averages, stdev, etc.
- Traders often augment their price data with a variety of derived metrics (Bollinger Bands, EMA, SMA, MACD, etc.). I'm not sure whether there are similar derived metrics for audio time series.
Forecasting data loaders that help deal with look-ahead or recency bias, maybe using sliding time windows?
- It's easy to introduce look-ahead bias, especially when training incrementally (online learning).
- One wouldn't want to re-train a model from scratch with each new tick, but could use some kind of sampling method to incorporate new information while controlling or eliminating recency bias.
- Ways to preprocess the data during loading, e.g. to convert values to deltas or returns
Ways to test for and adjust for stationarity.
- One might want to normalize a return series by its mean return, but would need a cumulative or rolling historical mean to avoid look-ahead bias.
Something for generating simplistic auto-regressive test time series could be useful (http://www.jessicayung.com/generating-autoregressive-data-for-experiments/)
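Several of the items above can be sketched as plain functions whose output at time t uses only samples at or before t:

- synthetic AR(1) test data,
- deltas,
- look-ahead-free normalization,
- an EMA as a derived metric,
- sliding-window (input, target) pairs for a forecasting loader.

All names are hypothetical. The AR(1) generator follows the standard recursion x_t = phi * x_{t-1} + eps_t and is not necessarily the linked post's code. Stationarity tests are not covered.

```python
import random
from typing import List, Optional, Sequence, Tuple


def ar1_series(n: int, phi: float, sigma: float = 1.0, seed: int = 0) -> List[float]:
    """Synthetic AR(1) series: x_t = phi * x_{t-1} + eps_t, eps_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    xs: List[float] = []
    prev = 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, sigma)
        xs.append(prev)
    return xs


def deltas(xs: Sequence[float]) -> List[float]:
    """First differences x_t - x_{t-1}; one element shorter than the input."""
    return [b - a for a, b in zip(xs, xs[1:])]


def ema(xs: Sequence[float], alpha: float) -> List[float]:
    """Exponential moving average; output t depends only on xs[:t + 1]."""
    out: List[float] = []
    s: Optional[float] = None
    for x in xs:
        s = x if s is None else alpha * x + (1 - alpha) * s
        out.append(s)
    return out


def demean_expanding(xs: Sequence[float]) -> List[float]:
    """Subtract the mean of strictly earlier samples, so there is no look-ahead.

    The first sample has no history, so it is left unchanged.
    """
    out: List[float] = []
    total = 0.0
    for t, x in enumerate(xs):
        mean_past = total / t if t > 0 else 0.0
        out.append(x - mean_past)
        total += x
    return out


def sliding_windows(xs: Sequence[float], lookback: int,
                    horizon: int = 1) -> List[Tuple[List[float], float]]:
    """(inputs, target) pairs; the target lies `horizon` steps after the last input."""
    return [(list(xs[end - lookback:end]), xs[end + horizon - 1])
            for end in range(lookback, len(xs) - horizon + 1)]


# Toy pipeline: synthetic data -> deltas -> causal normalization -> EMA -> windows.
series = ar1_series(200, phi=0.8, seed=42)
features = ema(demean_expanding(deltas(series)), alpha=0.1)
train_pairs = sliding_windows(features, lookback=16, horizon=1)
```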

nairbv · 2020-06-04T16:21:40Z

More ideas:

Functions to convert time series to Gramian Angular Fields, for applying vision models to time series data:
https://www.aaai.org/ocs/index.php/WS/AAAIW15/paper/viewFile/10179/10251
http://www.ieee-jas.org/article/doi/10.1109/JAS.2020.1003132?pageType=en
Functions to convert duration or date/time to positional encodings (for attention models).
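Both ideas are sketched below under stated assumptions.

- **Gramian Angular Summation Field.** Rescale the series to [-1, 1], map each value to an angle phi = arccos(x), and form G[i][j] = cos(phi_i + phi_j). This is the summation variant; the linked papers also describe others.
- **Time encoding.** This reuses the sinusoidal Transformer positional-encoding scheme on a scalar time value. A date/time would first have to be turned into a number, such as days since some epoch. That choice, the base of 10000, and the function names are assumptions.

```python
import math
from typing import List, Sequence


def gramian_angular_summation_field(xs: Sequence[float]) -> List[List[float]]:
    """GASF image: rescale to [-1, 1], phi = arccos(x), G[i][j] = cos(phi_i + phi_j)."""
    lo, hi = min(xs), max(xs)
    if hi == lo:
        raise ValueError("a constant series cannot be rescaled to [-1, 1]")
    scaled = [(2 * x - hi - lo) / (hi - lo) for x in xs]
    # Clamp so rounding just outside [-1, 1] cannot break acos.
    phi = [math.acos(max(-1.0, min(1.0, v))) for v in scaled]
    return [[math.cos(a + b) for b in phi] for a in phi]


def time_encoding(t: float, dim: int, base: float = 10000.0) -> List[float]:
    """Sinusoidal encoding of a scalar time or duration (Transformer-style)."""
    if dim % 2:
        raise ValueError("dim must be even")
    out: List[float] = []
    for i in range(dim // 2):
        freq = base ** (-2 * i / dim)  # geometric spread of frequencies
        out += [math.sin(t * freq), math.cos(t * freq)]
    return out


image = gramian_angular_summation_field([0.0, 1.0, 2.0])  # 3 x 3 "image"
print(time_encoding(0.0, dim=4))  # [0.0, 1.0, 0.0, 1.0]
```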

vincentqb · 2020-12-15T15:30:22Z

pytorch/pytorch#49338

vincentqb self-assigned this Sep 5, 2019

vincentqb mentioned this issue May 7, 2020 in Quantile Regression Loss pytorch/pytorch#38035 (Open)

vincentqb changed the title from "Time series?" to "Time series" Jan 8, 2021

mthrok closed this as completed Jul 31, 2023
