Improve numerical stability of CCA variance #629

stephenswat · 2024-06-22T10:53:57Z

I noticed that, at some point, a factor of $\frac{1}{12}$ was added to the variance of measurements, and that this slipped through because the tolerance on the variance test was extremely large, i.e. never smaller than 0.1. This is an unacceptably high tolerance, so I decided that the variance computation needed an update, and I adopted two strategies to do this. The first is an implementation of Welford's online algorithm, which relies on the following recurrence relation:

$$\sigma^2_n = \left(1 - \frac{w_n}{W_n}\right) \sigma^2_{n-1} + \frac{w_n}{W_n} (x_n - \mu_n) (x_n - \mu_{n-1})$$
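The recurrence can be sketched in C++ as a small accumulator. This is a minimal sketch, not the traccc implementation. `weighted_welford` and `push` are hypothetical names, and weights are assumed to be strictly positive. The cumulative weight $W_n = W_{n-1} + w_n$ and the mean update $\mu_n = \mu_{n-1} + \frac{w_n}{W_n}(x_n - \mu_{n-1})$ are the standard companion recurrences. They are assumed here because only the variance update is given above. The result is the weighted population variance, i.e. normalised by $W_n$.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical accumulator implementing the weighted Welford recurrence.
// Produces the weighted population variance (normalised by W_n).
struct weighted_welford {
    double W = 0.0;     // cumulative weight W_n
    double mean = 0.0;  // running weighted mean mu_n
    double var = 0.0;   // running weighted variance sigma^2_n

    void push(double x, double w) {
        assert(w > 0.0 && "weights are assumed to be strictly positive");
        W += w;                         // W_n = W_{n-1} + w_n
        const double f = w / W;         // w_n / W_n
        const double prev_mean = mean;  // mu_{n-1}
        mean += f * (x - prev_mean);    // mu_n = mu_{n-1} + f (x_n - mu_{n-1})
        // sigma^2_n = (1 - f) sigma^2_{n-1} + f (x_n - mu_n)(x_n - mu_{n-1})
        var = (1.0 - f) * var + f * (x - mean) * (x - prev_mean);
    }

    // Weighted standard deviation derived from the running variance.
    double stddev() const { return std::sqrt(var); }
};
```

Pushing $x = \{1, 2, 4, 7\}$ with weights $w = \{0.5, 1, 2, 0.5\}$ gives $\mu = 3.5$ and $\sigma^2 = 3.0$. This matches the two-pass result $\sum_i w_i (x_i - \mu)^2 / W$. No running sum of squares is ever formed, so there is no large difference of nearly equal terms at the end.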

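In this recurrence, $w_n$ is the weight of the $n$-th value, $W_n = \sum_{i=1}^{n} w_i$ is the cumulative weight, and $\mu_n$ is the running weighted mean. This formulation is significantly less prone to catastrophic cancellation. Second, I shifted the entire computation by the position of the first cell. This brings the values being accumulated close to zero, where floating-point numbers are more densely spaced and the computation is therefore more accurate. This relies on two identities that hold for any constant $C$: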

$$\mu(x_1, \ldots, x_n) = \mu(x_1 - C, \ldots, x_n - C) + C$$

and

$$\sigma^2(x_1, \ldots, x_n) = \sigma^2(x_1 - C, \ldots, x_n - C)$$
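To isolate the effect of the shift, the sketch below deliberately uses the naive one-pass formula $\mathrm{E}_w[(x - C)^2] - \mathrm{E}_w[x - C]^2$. This formula is prone to cancellation and is not Welford's algorithm. The helper `shifted_naive_variance` and the `mean_var` type are hypothetical, and single precision is chosen only for illustration. The two identities above appear directly in the code: $C$ is added back to the mean but not to the variance.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Weighted mean and weighted population variance.
struct mean_var {
    float mean;
    float var;
};

// Hypothetical helper: evaluates the naive one-pass formula
// E_w[(x - C)^2] - (E_w[x - C])^2 on values shifted by C.
// The formula is used on purpose because it is prone to cancellation;
// it is not Welford's algorithm.
mean_var shifted_naive_variance(const std::vector<float>& x,
                                const std::vector<float>& w, float C) {
    assert(!x.empty() && x.size() == w.size());
    float sw = 0.f, swd = 0.f, swd2 = 0.f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const float d = x[i] - C;  // shifted value x_i - C
        sw += w[i];
        swd += w[i] * d;
        swd2 += w[i] * d * d;
    }
    const float mu_shifted = swd / sw;
    // mu(x) = mu(x - C) + C, whereas sigma^2(x) = sigma^2(x - C).
    return {mu_shifted + C, swd2 / sw - mu_shifted * mu_shifted};
}
```

Take $x = \{10000, 10001, 10002\}$ with unit weights. Passing $C = x_1$ gives a mean of 10001 and a variance of $\frac{2}{3}$ to within float rounding. Passing $C = 0$ instead accumulates squares of order $10^8$, where adjacent single-precision floats are at least 8 apart, so on typical IEEE-754 hardware the true variance of $\frac{2}{3}$ is swamped by rounding error.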

Combined, these changes allow me to drop the tolerance in the tests from a _minimum_ of 0.1 to a fixed value of 0.0001.
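The arithmetic shows why the spurious term went unnoticed. Assume the test compares computed and expected variances with an absolute tolerance. An additive error of $\frac{1}{12}$ then passes the old bound and fails the new one by almost three orders of magnitude:

$$\left| \left(\sigma^2 + \tfrac{1}{12}\right) - \sigma^2 \right| = \tfrac{1}{12} \approx 0.0833 < 0.1, \qquad \tfrac{1}{12} \approx 833 \times 10^{-4}$$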

stephenswat · 2024-06-22T11:33:26Z

I also had to update the CPU measurement creation algorithm, which, as I discovered in this PR, wasn't actually computing the variance at all and was simply defaulting to $\frac{1}{12}$. 🤭
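For a cluster of cells, computing the variance instead of defaulting to a constant means applying both techniques independently along each local axis. The following is a minimal sketch under stated assumptions, not the traccc `measurement_creation.ipp` code. The `cell` and `cluster_estimate` types and `estimate_cluster` are hypothetical, and the cell activation is assumed to be the weight. Positions also stay in pixel units without any pitch conversion.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical cell: integer pixel coordinates plus a positive activation
// used as the weight. This is not the traccc cell EDM.
struct cell {
    int channel0;
    int channel1;
    float activation;
};

// Hypothetical per-cluster estimate: weighted mean and weighted population
// variance along each local axis, in pixel units.
struct cluster_estimate {
    float mean[2];
    float variance[2];
};

cluster_estimate estimate_cluster(const std::vector<cell>& cells) {
    assert(!cells.empty());
    // Shift C: the position of the first cell, subtracted from every cell.
    const float c0 = static_cast<float>(cells.front().channel0);
    const float c1 = static_cast<float>(cells.front().channel1);

    float W = 0.f;               // cumulative weight W_n
    float mu[2] = {0.f, 0.f};    // running weighted means (shifted frame)
    float var[2] = {0.f, 0.f};   // running weighted variances

    for (const cell& c : cells) {
        const float w = c.activation;
        assert(w > 0.f);
        const float x[2] = {static_cast<float>(c.channel0) - c0,
                            static_cast<float>(c.channel1) - c1};
        W += w;
        const float f = w / W;  // w_n / W_n
        for (int i = 0; i < 2; ++i) {
            const float prev = mu[i];  // mu_{n-1}
            mu[i] += f * (x[i] - prev);
            // Weighted Welford variance update for axis i.
            var[i] = (1.f - f) * var[i] + f * (x[i] - mu[i]) * (x[i] - prev);
        }
    }
    // Undo the shift for the mean; the variance is unaffected by it.
    return {{mu[0] + c0, mu[1] + c1}, {var[0], var[1]}};
}
```

For cells at $(100, 200)$, $(101, 200)$ and $(100, 201)$ with activations 1, 1 and 2, this gives means of 100.25 and 200.5 and variances of 0.1875 and 0.25. Those values depend on the cluster shape, unlike a fixed $\frac{1}{12}$.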

krasznaa

I'm very supportive overall. 👍

core/include/traccc/clusterization/impl/measurement_creation.ipp

tests/common/tests/cca_test.hpp

device/common/include/traccc/clusterization/device/impl/aggregate_cluster.ipp

stephenswat · 2024-06-22T14:20:36Z

Updated. 👍

stephenswat added the tests, improvement and shared labels Jun 22, 2024

stephenswat requested review from krasznaa and niermann999 June 22, 2024 10:53

stephenswat changed the title ~~Improve numerical stability of CCL variance~~ Improve numerical stability of CCA variance Jun 22, 2024

stephenswat mentioned this pull request Jun 22, 2024

Add SYCL tests for CCA code #630

Merged

stephenswat force-pushed the enh/welford branch from e81f09a to e238fa4 June 22, 2024 11:31

krasznaa requested changes Jun 22, 2024

core/include/traccc/clusterization/impl/measurement_creation.ipp (resolved)

tests/common/tests/cca_test.hpp (outdated, resolved)

device/common/include/traccc/clusterization/device/impl/aggregate_cluster.ipp (outdated, resolved)

stephenswat force-pushed the enh/welford branch from e238fa4 to 65fa953 June 22, 2024 14:20

stephenswat force-pushed the enh/welford branch from 65fa953 to 09c3264 June 22, 2024 14:20

stephenswat requested a review from krasznaa June 22, 2024 14:21

krasznaa approved these changes Jun 22, 2024

stephenswat merged commit f22a970 into acts-project:main Jun 22, 2024
29 checks passed
