Time series #267
Comments
@sergulaydore -- for reference |
Here are some thoughts on time series from the astronomy data perspective. The astropy project discussed the tradeoffs surrounding a TimeSeries class for astronomical applications in their ongoing Proposal for Enhancement. There are some subtle discussions distinguishing two types of time series:
The distinction essentially comes down to sparsity: populating zeros in between infrequent, discrete events is wasteful. Here at the NASA Kepler/K2 Guest Observer Office we focus on high-precision flux time series: the brightness of a star measured every 30 minutes for four years, with quarterly gaps for transmitting the telescope data back to Earth. This acquisition rate yields a modest amount of data by the standards of audio: our "impressive" 70,000 time samples would be acquired in under 2 seconds of single-channel 44.1 kHz audio.
Some other distinctions: our time series data come with metadata headers that are generally preserved in our objects. Each time sample possesses columns (multichannels) of mixed data types: time, flux (float), flux uncertainty, quality flag (int), quality mask (bool), sky coordinate xy movement.
Our in-house toolkit lightkurve deals with this time series data, with tons of application-specific pre-processing steps that wouldn't matter much for a general time series class. The name nods to the convention of "light curves" rather than the audio-familiar waveforms.
We do frequency-domain analysis with FFTs all the time, with some slight differences: we use an algorithm that can support unevenly sampled time spacings. We occasionally do spectrogram analysis, but you can see that a 70,000 sample signal can only be cut into 175 bins of
Astronomers use scalable Gaussian Process analysis all the time. Popular frameworks are tailored towards 1D time series astronomy, but could (and should?) apply more broadly to time series applications that care about uncertainty quantification or probabilistic prediction. The GPyTorch framework is promising, and I aspire to create astronomy-specific demos to advertise this library more widely to astronomers. The fixed time sample size of audio makes it amenable to some of the geometric assumptions of GPyTorch.
Those are some thoughts for now. Very curious to see how these themes evolve! |
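To make the point about unevenly sampled frequency analysis concrete, here is a minimal sketch of a classic Lomb-Scargle periodogram in plain Python. The comment above does not name the algorithm it uses, so Lomb-Scargle is an assumption here (it is one common choice for irregular sampling). The function name lomb_scargle, the synthetic signal, and the frequency grid are all illustrative, not taken from lightkurve or torchaudio.

```python
import math
import random


def lomb_scargle(times, values, frequencies):
    """Classic Lomb-Scargle power at each frequency (cycles per unit time).

    Works for arbitrary, unevenly spaced sample times. All frequencies
    must be strictly positive.
    """
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    power = []
    for f in frequencies:
        w = 2.0 * math.pi * f
        # Time offset tau makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2.0 * w * t) for t in times)
        c2 = sum(math.cos(2.0 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * w)
        yc = ys = cc = ss = 0.0
        for t, v in zip(times, centered):
            c = math.cos(w * (t - tau))
            s = math.sin(w * (t - tau))
            yc += v * c
            ys += v * s
            cc += c * c
            ss += s * s
        power.append(0.5 * (yc * yc / cc + ys * ys / ss))
    return power


# Synthetic example (hypothetical data): a sinusoid observed at random times.
rng = random.Random(0)
times = sorted(rng.uniform(0.0, 100.0) for _ in range(300))
true_freq = 0.2
values = [math.sin(2.0 * math.pi * true_freq * t) for t in times]

frequencies = [0.01 * k for k in range(1, 51)]  # 0.01 to 0.50
power = lomb_scargle(times, values, frequencies)
best_freq = frequencies[power.index(max(power))]
```

Unlike an FFT, nothing here requires a constant time step, which is why this family of methods suits data with gaps. In practice one would likely reach for an existing implementation rather than this loop.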
For non-audio applications (e.g. in finance) I could imagine a number of useful features/functions. I'm not familiar with audio time series requirements, but similar to what @gully describes above, there are a number of ways time series of financial data can be represented that may be broadly applicable:
Ideally a time-series representation should be flexible/abstract enough so that other representations can be added easily. Tools that convert representations of the data could be useful. Some other functionality that could be useful in time series tools, at least if applied to certain financial problems:
|
More ideas:
|
I'd like to reflect upon what functions would be desired (that could live within torchaudio or outside) in order to offer preliminary support for time series.
Could we make sure that our constructs are general enough to touch on time series, without sacrificing the primary goal of audio for this library?
Motivation
An audio (multichannel) waveform is a (vector) time series with a constant time step whose length is set by sample_rate: consecutive samples are 1 / sample_rate seconds apart. Audio processing and time series analysis are related, though their goals may differ. The types of transformations used in audio and in general time series analysis are sometimes different (e.g. dB, Mel, ...). For instance, in time series forecasting, transforms are usually expected to respect the time direction and to consume only past information when producing a future value, as in "online" consumption of an audio waveform.
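As a rough illustration of a transform that respects the time direction, here is a minimal sketch of a causal (trailing) moving average in plain Python. The function name causal_moving_average and the toy values are hypothetical and not part of torchaudio; the sketch only shows that each output depends on the current and earlier samples.

```python
def causal_moving_average(samples, window):
    """Trailing mean: output[n] uses only samples[max(0, n - window + 1)] .. samples[n]."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    running = 0.0
    for n, x in enumerate(samples):
        running += x
        if n >= window:
            # Drop the sample that just left the trailing window.
            running -= samples[n - window]
        out.append(running / min(n + 1, window))
    return out


sample_rate = 4  # hypothetical rate, in samples per second
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# With a constant time step, sample n sits at n / sample_rate seconds.
timestamps = [n / sample_rate for n in range(len(signal))]

smoothed = causal_moving_average(signal, window=3)

# Changing future samples leaves earlier outputs untouched.
changed = signal[:3] + [100.0, 100.0, 100.0]
assert causal_moving_average(changed, window=3)[:3] == smoothed[:3]
```

A centered moving average would fail that last check, because each output would also depend on samples that have not arrived yet.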
@nairbv @zdevito @kingjr @adefossez @gully -- do you have use cases for time series that could relate to torchaudio?
Additional context