
[FEAT] Efficient Schafer-Strimmer for MinT #280

Merged · 25 commits · Sep 3, 2024

Conversation

@elephaint (Contributor) commented Aug 16, 2024

This PR implements an optimized version of the Schafer-Strimmer shrunk empirical covariance algorithm for MinT.
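For context on what is being optimized: the Schafer-Strimmer estimator shrinks the empirical correlation matrix toward the identity, with an analytically derived shrinkage intensity. The following is a minimal NumPy sketch of the idea, not the PR's implementation; the function name is illustrative and the variance formula follows Schäfer and Strimmer (2005) for the diagonal shrinkage target:

```python
import numpy as np

def shrunk_covariance(residuals):
    """Minimal sketch of a Schafer-Strimmer shrunk covariance.

    residuals: (n_obs, n_series) array without NaNs.
    Shrinks off-diagonal correlations toward zero (diagonal target).
    """
    n, p = residuals.shape
    X = residuals - residuals.mean(axis=0)
    emp_cov = X.T @ X / n                        # biased empirical covariance
    std = np.sqrt(np.diag(emp_cov))
    Xs = X / std                                 # standardised residuals
    corr = Xs.T @ Xs / n                         # empirical correlations
    # Per-observation products; their mean over t equals corr.
    w = np.einsum("ti,tj->tij", Xs, Xs)
    # Estimated variance of each correlation coefficient.
    var_corr = n / (n - 1) ** 3 * ((w - corr) ** 2).sum(axis=0)
    # Shrinkage intensity: ratio of estimation noise to signal, clipped to [0, 1].
    off = ~np.eye(p, dtype=bool)
    lam = np.clip(var_corr[off].sum() / (corr[off] ** 2).sum(), 0.0, 1.0)
    shrunk_corr = (1.0 - lam) * corr
    np.fill_diagonal(shrunk_corr, 1.0)
    return shrunk_corr * np.outer(std, std), lam
```

The closed-form intensity is what makes the estimator cheap: no cross-validation is needed, and the whole computation is a handful of matrix products, which is why vectorizing it (as this PR does) pays off so much.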

1. Up to 60x faster

First, the result is that we can perform MinT reconciliation much faster, as demonstrated in the table below, where mint_legacy is the old version and mint is the new version. The value in brackets denotes whether the forecasts contain NaNs (True or False), and the trailing number is the count of bottom-level time series reconciled (20, 200, or 2000). An improvement of up to 60x (on my machine, an i7-12700K) is seen for the no-NaN, 2000-series case.

Note that the NaN version is about 3x slower than the version that doesn't have to deal with NaNs, due to all the masking involved. However, it is still significantly faster than the legacy version (which doesn't handle NaNs properly).

[Screenshot 2024-09-02 113916: timing comparison of mint vs. mint_legacy]

2. Forecasting performance

Forecasting performance is similar:

[Screenshot 2024-09-02 113002: forecasting performance comparison]

3. Improved NaN handling

This PR improves NaN handling (thanks to @christophertitchen for the suggestions!), enlarging the set of cases that MinT can handle. For example, the old version could not handle 20% NaN values introduced into the Australian Tourism forecasts, whereas the new version gives results nearly identical to the no-NaN case:

[Screenshot 2024-09-02 113218: reconciliation results with 20% NaNs vs. no NaNs]

Code for introducing NaNs in the Australian Tourism example:

import numpy as np

# Randomly set NaNs in a fraction of the rows
nan_fraction = 0.2
Y_fitted_df = Y_fitted_df.reset_index()
# Sample rows and blank out both the actuals and the ETS fitted values
Y_fitted_df_sample_idx = Y_fitted_df.sample(frac=nan_fraction).index
Y_fitted_df.loc[Y_fitted_df_sample_idx, ["y", "ETS"]] = np.nan
Y_fitted_df = Y_fitted_df.set_index("unique_id")


@elephaint elephaint marked this pull request as ready for review August 20, 2024 08:32
@elephaint (Contributor, Author)

@christophertitchen Again, thanks for the remarks. I made a few more micro-optimizations, in total further reducing computation time by ~30%.

@christophertitchen (Contributor) left a comment

I am happy the comments were helpful!

The only further change I thought of would be to potentially modify the naïve approach when computing covariance and correlation to a pairwise approach for each time series, like what is done in stats::cov for R. This could better describe the relationship between pairs of time series, but honestly, it would slow down the generation of the shrunk $W$ and go against the spirit of what you were trying to achieve here by implementing a fast method, so I think it is perfectly fine as you have done it.

As a quick example to demonstrate, imagine monthly time series representing the units sold of products over the past six months. We take two of those products, $X$ and $Y$, with $Y$ being an NPI (new product introduction) which was implemented only four months prior. The residuals of a naïve forecast would be:

$X = [(x_{t-5} - x_{t-6}), (x_{t-4} - x_{t-5}), (x_{t-3} - x_{t-4}), (x_{t-2} - x_{t-3}), (x_{t-1} - x_{t-2}), (x_{t} - x_{t-1})]$

$Y = [-, -, -, (y_{t-2} - y_{t-3}), (y_{t-1} - y_{t-2}), (y_{t} - y_{t-1})]$

This approach would use an n_samples of $6$ and the mean of $X$ would include all six observations. However, a pairwise approach would only use pairs of temporally aligned observations, so an n_samples of $3$, and the mean of $X$ would be $\overline{X} = ((x_{t-2} - x_{t-3}) + (x_{t-1} - x_{t-2}) + (x_{t} - x_{t-1})) / 3$, which would obviously lead to a different $\text{cov}(X, Y)$, $\text{corr}(X, Y)$, $\sigma_{x}$ for standardisation and so on.

not_mask = ~np.isnan(residuals)
...
for i in prange(n_timeseries):
    x = residuals[i]
    for j in range(i + 1):
        y = residuals[j]
        # Get the temporally aligned pairs of observations.
        mask = not_mask[i] & not_mask[j]
        n_samples = np.sum(mask)  # count the aligned pairs, not the full length
        # Check for insufficient observations for the unbiased estimator.
        if n_samples < 2:
            ...
        # Use fresh names so the full series are kept for the next j.
        x_ij = x[mask]
        y_ij = y[mask]
        # Calculate the pairwise sample means.
        mean_x = np.mean(x_ij)
        mean_y = np.mean(y_ij)
        ...

Anyway, I am not familiar with any literature on this topic or if there is any evidence to suggest it is significantly worse, and the legacy version also takes the same approach, so as I said, I favour the fast approach you have implemented here. Good job!
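Filling in the elided parts of the sketch above, a complete runnable version of the pairwise idea could look like this. The function name and exact layout are illustrative (this is not the PR's code); it mirrors R's stats::cov with use="pairwise.complete.obs":

```python
import numpy as np

def pairwise_cov(residuals):
    """Pairwise-complete covariance over possibly-NaN residuals.

    residuals: (n_series, n_obs) array; NaNs mark missing observations.
    Each entry uses only the temporally aligned pairs of observations,
    including their means, like stats::cov(..., use="pairwise.complete.obs").
    """
    n_series = residuals.shape[0]
    not_mask = ~np.isnan(residuals)
    cov = np.full((n_series, n_series), np.nan)
    for i in range(n_series):
        for j in range(i + 1):
            # Temporally aligned pairs of observations.
            mask = not_mask[i] & not_mask[j]
            n_samples = int(np.sum(mask))
            if n_samples < 2:
                continue  # too few pairs for the unbiased estimator
            x = residuals[i, mask]
            y = residuals[j, mask]
            # Pairwise sample means, computed over the aligned pairs only.
            c = np.sum((x - x.mean()) * (y - y.mean())) / (n_samples - 1)
            cov[i, j] = cov[j, i] = c
    return cov
```

On the NPI example above, the (X, Y) entry is computed from only the last three aligned residuals, with means over those three, while the (X, X) entry still uses all six, which is exactly the discrepancy being discussed.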

@elephaint (Contributor, Author)

> The only further change I thought of would be to potentially modify the naïve approach when computing covariance and correlation to a pairwise approach for each time series, like what is done in stats::cov for R. […]

Didn't think about this, but it's a good point. I'll have a try and play around a bit with it :)

@elephaint (Contributor, Author)

elephaint commented Aug 27, 2024

@christophertitchen Thanks for the suggestion on temporal alignment. I included two versions: one that can handle NaNs in the way you describe, and a faster one that doesn't have to deal with NaNs. The NaN version is a bit slower (among other things due to the masking involved), but it significantly enlarges the scope of MinT, as it can handle many more cases without producing ill-conditioned W matrices (see the example in the top post). Thanks!

@christophertitchen (Contributor) left a comment

I added a few more thoughts, but I am honestly not too familiar with the Schäfer and Strimmer paper, so I will leave the rest in your more capable hands! 😅

@elephaint (Contributor, Author)

@christophertitchen Again thanks for the thoughtful comments.

I think I'll be deprecating the legacy version now that I get equivalent results across datasets.

@christophertitchen (Contributor)

> @christophertitchen Again thanks for the thoughtful comments.
>
> I think I'll be deprecating the legacy version now that I get equivalent results across datasets.

@elephaint great job!

Also, regarding Hyndman's implementation, you were right.

https://github.com/earowang/hts/blob/1408ab2c5c40f1022e6957bcf8438aaefc8464bf/R/MinT.R#L24

  covm <- crossprod(x) / n
  v <- (1/(n * (n - 1))) * (crossprod(xs^2) - 1/n * (crossprod(xs))^2)

It uses the biased estimator, i.e. normalisation by $N$. We could have changed ddof in np.ma.cov to $0$ to copy this approach, but deprecating the legacy approach in favour of your superior one makes sense. 😉
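To make the normalisation difference concrete, here is a small illustrative NumPy check (not project code) showing that dividing the centred cross-product by $N$, as hts does, matches `np.cov` with `ddof=0`, while NumPy's default divides by $N - 1$:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=(100, 3))  # 100 observations, 3 series

# hts-style biased estimator: covm <- crossprod(x) / n, on centred data
xc = x - x.mean(axis=0)
covm = xc.T @ xc / xc.shape[0]

# NumPy equivalents: ddof=0 normalises by N, the default ddof=1 by N - 1
cov_biased = np.cov(x, rowvar=False, ddof=0)
cov_unbiased = np.cov(x, rowvar=False)

assert np.allclose(covm, cov_biased)
assert np.allclose(cov_biased * 100 / 99, cov_unbiased)
```

The two estimators differ only by the constant factor $N / (N - 1)$, which is why the choice rarely changes reconciliation results materially, but matching conventions matters when comparing against the R implementation.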

jmoralez previously approved these changes Sep 3, 2024
@elephaint elephaint merged commit 2e788e4 into main Sep 3, 2024
19 checks passed
@elephaint elephaint deleted the feature/efficient_mint branch September 3, 2024 15:34
3 participants