Improve numerical stability of CCA variance #629

Merged: 1 commit, Jun 22, 2024


@stephenswat (Member) commented Jun 22, 2024

I noticed that, at some point, a term of $\frac{1}{12}$ (the variance of a uniform distribution of unit width) had been added to the variance of measurements. This slipped through because the tolerance on the variance test was extremely loose: never smaller than 0.1. That tolerance is unacceptably high, so I decided the variance computation needed an update and adopted two strategies. The first is Welford's online algorithm, which relies on the following recurrence relation:

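$$\sigma^2_n = \left(1 - \frac{w_n}{W_n}\right) \sigma^2_{n-1} + \frac{w_n}{W_n} (x_n - \mu_n) (x_n - \mu_{n-1})$$

where $x_n$ and $w_n$ are the position and weight of the $n$-th cell, $W_n = \sum_{i=1}^{n} w_i$ is the cumulative weight, and $\mu_n$ is the weighted mean of the first $n$ positions.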
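As a sketch of where this recurrence comes from (the standard weighted generalization of Welford's update, written out here for illustration rather than taken from the PR code), track the weighted sum of squared deviations $S_n = \sum_{i=1}^{n} w_i (x_i - \mu_n)^2 = W_n \sigma^2_n$ alongside the weight and mean updates:

$$
\begin{aligned}
W_n &= W_{n-1} + w_n, \\
\mu_n &= \mu_{n-1} + \frac{w_n}{W_n} (x_n - \mu_{n-1}), \\
S_n &= S_{n-1} + w_n (x_n - \mu_{n-1}) (x_n - \mu_n), \\
\sigma^2_n &= \frac{S_n}{W_n} = \frac{W_{n-1}}{W_n} \sigma^2_{n-1} + \frac{w_n}{W_n} (x_n - \mu_n) (x_n - \mu_{n-1}).
\end{aligned}
$$

Substituting $\frac{W_{n-1}}{W_n} = 1 - \frac{w_n}{W_n}$ gives the recurrence above. Each step only combines deviations from the running mean, so no large sum of squares is ever formed.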

This formulation is significantly less prone to catastrophic cancellation. Second, I shifted the entire computation by the position of the first cell. This brings the values closer to zero, where floating-point numbers are more densely spaced and arithmetic is therefore more accurate. It relies on two identities, which hold for any constant $C$:

$$\mu(x_1, \ldots, x_n) = \mu(x_1 - C, \ldots, x_n - C) + C$$

and

$$\sigma^2(x_1, \ldots, x_n) = \sigma^2(x_1 - C, \ldots, x_n - C)$$
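Both identities follow directly from the definitions. A short check, written for weights $w_i$ with $W = \sum_i w_i$ (the unweighted case is $w_i = 1$) and $\mu = \mu(x_1, \ldots, x_n)$:

$$
\begin{aligned}
\mu(x_1 - C, \ldots, x_n - C) &= \frac{1}{W} \sum_{i=1}^{n} w_i (x_i - C) = \mu - C, \\
\sigma^2(x_1 - C, \ldots, x_n - C) &= \frac{1}{W} \sum_{i=1}^{n} w_i \big((x_i - C) - (\mu - C)\big)^2 = \frac{1}{W} \sum_{i=1}^{n} w_i (x_i - \mu)^2 = \sigma^2(x_1, \ldots, x_n).
\end{aligned}
$$

Only the magnitudes entering the arithmetic change. For example, in single precision, adjacent representable values near $10^4$ are $2^{-10} \approx 9.8 \times 10^{-4}$ apart, while values in $[1, 2)$ are $2^{-23}$ apart.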

Together, these changes allow me to reduce the test tolerance from a _minimum_ of 0.1 to a fixed value of 0.0001.
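A minimal C++ sketch of the combined approach, for illustration only: the types `weighted_value` and `mean_variance` and the function `shifted_weighted_welford` are hypothetical and do not reflect traccc's actual data model or API. It handles a single coordinate in single precision and returns the weight-normalized (population) variance described by the recurrence above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical 1D stand-in for a cell: a position and a non-negative weight.
// This is NOT the traccc cell/measurement data model.
struct weighted_value {
    float x;
    float w;
};

// Hypothetical result type: weighted mean and weighted (population) variance.
struct mean_variance {
    float mean;
    float variance;
};

// Weighted Welford update, evaluated relative to the first value.
// Assumes a non-empty input whose first weight is strictly positive.
mean_variance shifted_weighted_welford(const std::vector<weighted_value>& in) {
    assert(!in.empty());
    const float C = in.front().x;  // shift origin: position of the first value
    float W = 0.f;                 // running total weight W_n
    float mu = 0.f;                // running weighted mean of (x - C)
    float var = 0.f;               // running weighted variance sigma^2_n
    for (const weighted_value& v : in) {
        const float x = v.x - C;   // shifted coordinate, close to zero
        W += v.w;
        const float r = v.w / W;           // w_n / W_n
        const float mu_prev = mu;          // mu_{n-1}
        mu = mu_prev + r * (x - mu_prev);  // mu_n
        // sigma^2_n = (1 - w_n/W_n) sigma^2_{n-1} + (w_n/W_n)(x_n - mu_n)(x_n - mu_{n-1})
        var = (1.f - r) * var + r * (x - mu) * (x - mu_prev);
    }
    // The mean is shifted back by C; the variance is shift-invariant.
    return {mu + C, var};
}
```

Shifting by the first value rather than by the mean is what keeps this a single pass: the mean is only known once the loop ends, whereas the first value is available immediately and already lies within the cluster's extent. For comparison, evaluating $\frac{1}{W}\sum_i w_i x_i^2 - \mu^2$ directly on unshifted positions near $10^4$ would need single-precision intermediates around $10^8$, where adjacent representable values are at least 8 apart, so a variance on the order of 1 cannot be resolved.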

@stephenswat added the `tests` (Make sure the code keeps working), `improvement` (Improve an existing feature), and `shared` (Changes related to shared code) labels on Jun 22, 2024
@stephenswat changed the title from "Improve numerical stability of CCL variance" to "Improve numerical stability of CCA variance" on Jun 22, 2024
@stephenswat (Member, Author) commented:

I also had to update the CPU measurement creation algorithm, which, as I discovered in this PR, wasn't computing the variance at all and was simply defaulting to $\frac{1}{12}$. 🤭

@krasznaa (Member) left a review comment:


I'm very supportive overall. 👍

@stephenswat (Member, Author) commented:

Updated. 👍

@stephenswat merged commit f22a970 into acts-project:main on Jun 22, 2024 (29 checks passed).