Add normalized version of MVN entropy estimator and fix floating point issue #20

Status: Open · wants to merge 2 commits into master
Conversation

@mahlzahn (Contributor) commented Nov 11, 2023

1. Add argument normalized

Add an argument `normalized` to the `get_h_mvn` function which returns the entropy of the normalized MVN distribution: the data are rescaled such that each variance is 1, so the covariance matrix becomes the matrix of Pearson correlation coefficients. The entropy thereby becomes invariant under (some) linear transformations (scalar multiplication).

import numpy as np
from entropy_estimators.continuous import get_h_mvn
rng = np.random.default_rng(seed=0)
a = rng.normal(scale=1, size=10000).reshape(-1, 1)
b = rng.normal(scale=2, size=10000).reshape(-1, 1)
c = 5 * a + 7
d = np.c_[a,b]
r = np.pi / 4
rot = [[np.cos(r), -np.sin(r)], [np.sin(r), np.cos(r)]]
e = (rot @ d.T).T
f = d * [5, 2] + [-3, 8]
g = np.c_[a, a + b/1e5]
h = np.c_[a, a + b/1e9]
i = np.c_[a, a]
dists = [a, b, c, d, e, f, g, h, i]
print('|  |a |b |5a+7|d=[a b]|rot(d)|[5 2]⋅d+[-3 8]|[a a+b/1e5]|[a a+b/1e9]|[a a]|')
print('|--|--|--|----|-------|------|--------------|-----------|-----------|-----|')
print('|μ', *[' '.join([f'{s:.2f}' for s in np.ravel(x.mean(axis=0))]) for x in dists], sep='|', end='|\n')
print('|σ', *[' '.join([f'{s:.2f}' for s in np.ravel(x.std(ddof=1, axis=0))]) for x in dists], sep='|', end='|\n')
print('|H', *[f'{get_h_mvn(x):.2f}' for x in dists], sep='|', end='|\n')
print('|H’', *[f'{get_h_mvn(x, normalized=True):.2f}' for x in dists], sep='|', end='|\n')

calculates the entropy H and the normalized entropy H’ for two 1D distributions a and b, a third c = 5a + 7, etc.:

|   | a    | b    | 5a+7 | d=[a b]   | rot(d)    | [5 2]⋅d+[-3 8] | [a a+b/1e5] | [a a+b/1e9] | [a a]     |
|---|------|------|------|-----------|-----------|----------------|-------------|-------------|-----------|
| μ | 0.01 | 0.01 | 7.03 | 0.01 0.01 | 0.00 0.01 | -2.97 8.01     | 0.01 0.01   | 0.01 0.01   | 0.01 0.01 |
| σ | 1.00 | 1.99 | 4.99 | 1.00 1.99 | 1.58 1.56 | 4.99 3.98      | 1.00 1.00   | 1.00 1.00   | 1.00 1.00 |
| H | 1.42 | 2.11 | 3.03 | 3.52      | 3.52      | 5.83           | -7.99       | nan         | -inf      |
| H’| 1.42 | 1.42 | 1.42 | 2.84      | 2.62      | 2.84           | -7.99       | nan         | -inf      |

Thus, the normalized entropy of an MVN random variable X with dimension d is equal to

H’(X) = d ⋅ log(2 ⋅ π ⋅ e) / 2 ≈ 1.42 ⋅ d.

This is also the maximum normalized entropy for a d-dimensional variable. It is lower if the components are correlated, e.g., in the case of the rotated 2D MVN random variable (see table above).
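The ceiling d ⋅ log(2πe) / 2 can be checked numerically (a small sketch, independent of the PR's code):

```python
import numpy as np

# Maximum normalized differential entropy of a d-dimensional MVN
# random variable: H' = d * log(2*pi*e) / 2
for d in (1, 2, 3):
    h_max = d * np.log(2 * np.pi * np.e) / 2
    print(d, round(h_max, 2))
```

For d = 1 this evaluates to ≈ 1.42, matching the first three columns of the H’ row above.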

2. Fix floating point issue

The current implementation fails to calculate the entropy of highly correlated variables properly because of limited floating-point resolution. I fixed this by returning -inf if the determinant of the matrix of Pearson correlation coefficients equals 0, and nan if the determinant is close to 0 (|det(…)| < 10⁻¹³). The last three columns of the table above demonstrate the new behaviour: the entropy of [a a+b/1e5] is -7.99, that of [a a+b/1e9] is nan, and that of [a a] is -inf, indicating that the second cannot be calculated reliably.
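The guard described above can be sketched as follows (the function name and the exact placement of the check are illustrative; the actual PR code may differ):

```python
import numpy as np

def mvn_entropy_normalized(x, det_tol=1e-13):
    """Entropy of the normalized MVN fit to x (rows = samples).

    Returns -inf if the correlation matrix is exactly singular, and
    nan if its determinant is merely close to zero, where the result
    would be dominated by floating-point rounding error.
    """
    corr = np.atleast_2d(np.corrcoef(x, rowvar=False))
    det = np.linalg.det(corr)
    d = x.shape[1]
    if det == 0:
        return -np.inf
    if abs(det) < det_tol:
        return np.nan
    # standard closed-form MVN entropy, with the correlation matrix
    # in place of the covariance matrix
    return 0.5 * (d * np.log(2 * np.pi * np.e) + np.log(det))

rng = np.random.default_rng(0)
a = rng.normal(size=10000)
print(mvn_entropy_normalized(np.c_[a, a]))  # -inf (perfectly correlated)
```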

3. Speed-up of MVN entropy estimate for 1D variables

…by using the variance instead of the full covariance matrix calculation.
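For a single variable the determinant of the 1×1 covariance matrix is just the variance, so the estimate reduces to a closed form with no matrix machinery (a sketch of the idea, not the PR's exact code):

```python
import numpy as np

def h_mvn_1d(x):
    # H = 0.5 * log(2*pi*e*sigma^2): for a 1D variable the covariance
    # matrix and its determinant collapse to the sample variance
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x, ddof=1))

rng = np.random.default_rng(0)
x = rng.normal(size=10000)
print(round(h_mvn_1d(x), 2))  # ≈ 1.42 for a unit-variance normal
```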

@paulbrodersen (Owner) commented:

> Add argument normalized

Could you expand a bit on the motivation, or provide some references and/or applications?

> Fix floating point issue. The current implementation fails to calculate the entropy properly of highly correlated variables because of float resolution.

Much appreciated.

> Speed-up of MVN entropy estimate for 1D variables by using the variance instead of the covariance matrix calculation.

Did you time it? Since both implementations ultimately rely on LAPACK/OpenBLAS, I would be shocked if the difference was substantial (> 1.5x).

@mahlzahn (Contributor, Author) commented:

> Speed-up of MVN entropy estimate for 1D variables by using the variance instead of the covariance matrix calculation.
>
> Did you time it? Since both implementations ultimately rely on LAPACK/OpenBLAS, I would be shocked if the difference was substantial (> 1.5x).

~2 times in my tests. As I am running entropy for thousands of variables or pairs, I’d say it matters (a bit) ;)
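A ratio in that ballpark can be checked with a quick micro-benchmark (illustrative only; the exact factor depends on the BLAS build and the sample size):

```python
import timeit
import numpy as np

x = np.random.default_rng(0).normal(size=10000)

# variance-only path vs. covariance-matrix + determinant path for 1D
t_var = timeit.timeit(lambda: np.var(x, ddof=1), number=1000)
t_cov = timeit.timeit(
    lambda: np.linalg.det(np.atleast_2d(np.cov(x))), number=1000)
print(f"var: {t_var:.3f}s  cov+det: {t_cov:.3f}s  ratio: {t_cov / t_var:.1f}x")
```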

@paulbrodersen (Owner) commented:

> Speed-up of MVN entropy estimate for 1D variables by using the variance instead of the covariance matrix calculation.
>
> Did you time it? Since both implementations ultimately rely on LAPACK/OpenBLAS, I would be shocked if the difference was substantial (> 1.5x).
>
> ~2 times in my tests. As I am running entropy for thousands of variables or pairs, I’d say it matters (a bit) ;)

Alright, I hate the increase in code complexity but we don't leave factors of two on the table.

@paulbrodersen (Owner) commented:

When you have time, could you expand a bit on the motivation for the normalization, or provide some references and/or applications? I don't want to support something even I don't understand. ;-)
