[FEAT] Efficient Schafer-Strimmer for MinT #280
Conversation
@christophertitchen Again, thanks for the remarks. I made a few other micro-optimizations, in total further reducing computation time by ~30%.
I am happy the comments were helpful!
The only further change I thought of would be to potentially modify the naïve approach when computing the covariance and correlation to a pairwise approach for each pair of time series, like what is done in `stats::cov` for R. This could better describe the relationship between pairs of time series, but honestly, it would slow down the generation of the shrunk covariance matrix.
As a quick example to demonstrate, imagine monthly time series representing the units sold of products over the past six months, some of them with missing observations. If we take two of those products, a pairwise approach would compute their covariance using only the months where both have observations, so the effective `n_samples` can differ for each pair of series instead of being fixed for the whole matrix:
```python
not_mask = ~np.isnan(residuals)
...
for i in prange(n_timeseries):
    x = residuals[i]
    for j in range(i + 1):
        y = residuals[j]
        # Get the temporally aligned pairs of observations.
        mask = not_mask[i] & not_mask[j]
        n_samples = np.sum(mask)
        # Check for insufficient observations for the unbiased estimator.
        if n_samples < 2:
            ...
        x_ij = x[mask]
        y_ij = y[mask]
        # Calculate the pairwise sample means.
        mean_x = np.mean(x_ij)
        mean_y = np.mean(y_ij)
        ...
```
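To make the idea concrete, here is a complete (plain NumPy, non-numba) sketch of such a pairwise estimator; the function name is illustrative, not part of the PR:

```python
import numpy as np

def pairwise_covariance(residuals):
    """Covariance using only the temporally aligned observations
    for each pair of series (illustrative sketch)."""
    m, _ = residuals.shape
    not_mask = ~np.isnan(residuals)
    cov = np.full((m, m), np.nan)
    for i in range(m):
        for j in range(i + 1):
            # Keep only time steps where both series are observed.
            mask = not_mask[i] & not_mask[j]
            n_samples = mask.sum()
            if n_samples < 2:  # unbiased estimator needs at least 2 points
                continue
            x = residuals[i, mask]
            y = residuals[j, mask]
            c = ((x - x.mean()) * (y - y.mean())).sum() / (n_samples - 1)
            cov[i, j] = cov[j, i] = c
    return cov
```

When the residuals contain no NaNs, this reduces to the usual unbiased sample covariance, i.e. `np.cov(residuals)`.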
Anyway, I am not familiar with any literature on this topic, nor aware of any evidence to suggest it is significantly worse, and the legacy version also takes the same approach, so as I said, I favour the fast approach you have implemented here. Good job!
Didn't think about this, but it's a good point. I'll give it a try and play around with it a bit :)
@christophertitchen Thanks for the suggestion on the temporal alignment. I included two versions: one that can handle NaNs in the way you describe, and a faster one that doesn't have to deal with NaNs. The NaN version is a bit slower (among other things due to the masking involved), but it enlarges the scope of MinT significantly, as it can handle many more cases without leading to an ill-conditioned covariance matrix.
I added a few more thoughts, but I am honestly not too familiar with the Schäfer and Strimmer paper, so I will leave the rest in your more capable hands! 😅
@christophertitchen Again, thanks for the thoughtful comments. I think I'll deprecate the legacy version now that I get equivalent results across datasets.
@elephaint Great job! Also, regarding Hyndman's implementation, you were right: https://github.com/earowang/hts/blob/1408ab2c5c40f1022e6957bcf8438aaefc8464bf/R/MinT.R#L24
It uses the biased estimator, i.e. normalisation by `n` rather than `n - 1`.
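For anyone following along, the difference between the two estimators is just the normalisation constant, which NumPy exposes directly; a tiny illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])
n = len(x)

biased = ((x - x.mean()) ** 2).sum() / n          # normalise by n
unbiased = ((x - x.mean()) ** 2).sum() / (n - 1)  # normalise by n - 1

assert np.isclose(biased, np.var(x))           # NumPy default is biased
assert np.isclose(unbiased, np.var(x, ddof=1)) # ddof=1 gives unbiased
```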
This PR implements an optimized version of the Schäfer-Strimmer shrunk empirical covariance algorithm for `MinT`.

1. Up to 60x faster

First, the result is that we can perform `MinT` reconciliation much faster, as demonstrated in the table below, where `mint_legacy` is the old version and `mint` is the new version. The flag in brackets denotes whether the forecasts contain NaNs (True or False), and the number denotes how many bottom-level time series are reconciled (i.e., `20`, `200`, or `2000`). An improvement of up to 60x (on my machine, an i7-12700K) is seen for the no-NaN, 2000-time-series case. Note that the NaN version is about 3x slower than the version that doesn't have to deal with NaNs, which is due to all the masking involved when dealing with NaNs. However, it is still significantly faster than the legacy version (which doesn't handle NaNs properly).
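For reference, the Schäfer-Strimmer estimator shrinks the empirical correlation matrix toward the identity with a data-driven intensity. A minimal NumPy sketch for the NaN-free case follows; the function name and structure are illustrative and not the PR's actual implementation:

```python
import numpy as np

def shrunk_covariance(residuals):
    """Schäfer-Strimmer shrinkage toward the identity correlation
    target (illustrative sketch, residuals: (n_series, n_samples))."""
    n = residuals.shape[1]
    x = residuals - residuals.mean(axis=1, keepdims=True)
    std = x.std(axis=1, ddof=1)
    xs = x / std[:, None]                 # standardised residuals
    r = (xs @ xs.T) / (n - 1)             # empirical correlation matrix
    # Estimated variance of each correlation coefficient.
    w = xs[:, None, :] * xs[None, :, :]
    w_bar = w.mean(axis=2, keepdims=True)
    var_r = n / (n - 1) ** 3 * ((w - w_bar) ** 2).sum(axis=2)
    # Shrinkage intensity: ratio of estimation noise to signal
    # in the off-diagonal correlations, clipped to [0, 1].
    off = ~np.eye(len(r), dtype=bool)
    lam = np.clip(var_r[off].sum() / (r[off] ** 2).sum(), 0.0, 1.0)
    # Shrink off-diagonal correlations, then rescale to a covariance.
    r_shrunk = (1.0 - lam) * r
    np.fill_diagonal(r_shrunk, 1.0)
    return lam, std[:, None] * r_shrunk * std[None, :]
```

The shrunk matrix keeps the sample variances on the diagonal while damping noisy off-diagonal correlations, which is what keeps it well-conditioned for the `MinT` inverse.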
2. Forecasting performance
Forecasting performance is similar:
3. Improved NaN handling
This PR improves NaN handling (thanks to @christophertitchen for the suggestions!), extending the set of cases that MinT can handle. For example, introducing 20% NaN values in the Australian Tourism forecasts could not be handled by the old version, whereas the new version gives results nearly identical to the NaN-free case:
Code for introducing NaNs in the Australian Tourism example:
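A minimal sketch of how such NaNs could be introduced; the column name `y` follows the usual Nixtla long-format convention and the helper name is hypothetical, not the exact code used in the PR:

```python
import numpy as np
import pandas as pd

def introduce_nans(df, frac=0.2, col="y", seed=0):
    """Randomly set `frac` of the target column to NaN
    (illustrative sketch)."""
    out = df.copy()
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(out), size=int(frac * len(out)), replace=False)
    out.iloc[idx, out.columns.get_loc(col)] = np.nan
    return out
```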