Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add function to compute L2 norm of distributions #1321

Merged
merged 4 commits into from
May 27, 2021

Conversation

giordano
Copy link
Contributor

@giordano giordano commented May 5, 2021

This implements the L2 norm of some distributions, as requested in #806. I started with continuous univariate distributions for which I was able to work out the integral. The default method throws an error explaining that the function isn't implemented for the given distribution, but other methods can be added later. CC: @ablaom @fkiraly

@azev77
Copy link
Contributor

azev77 commented May 5, 2021

Thanks for this PR!
I noticed you take a slightly different approach.
Usually there is a file for each distribution (such as Cauchy.jl) and there are various properties (mean/var/cdf/fit) within that file.

The approach you took is to instead create one file (property.jl) and implement that property for many distributions in that file. I think I like this.

src/pdfnorm.jl Outdated Show resolved Hide resolved
src/pdfnorm.jl Outdated Show resolved Hide resolved
src/pdfnorm.jl Outdated Show resolved Hide resolved
@giordano
Copy link
Contributor Author

giordano commented May 5, 2021

The approach you took is to instead create one file (property.jl) and implement that property for many distributions in that file.

I initially considered defining the methods in the distributions files, but development and testing is much faster:

julia> @time include("test/pdfnorm.jl");
Test Summary: | Pass  Total
pdf L2 norm   |   28     28
  0.083126 seconds (12.20 k allocations: 954.906 KiB, 0.00% compilation time)

compared to the >10 minutes for running the whole test suite, that's a 7500x improvement in productivity

src/pdfnorm.jl Outdated Show resolved Hide resolved
src/pdfnorm.jl Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented May 6, 2021

Codecov Report

Merging #1321 (fbd84e8) into master (a746130) will increase coverage by 0.05%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1321      +/-   ##
==========================================
+ Coverage   82.08%   82.14%   +0.05%     
==========================================
  Files         115      116       +1     
  Lines        6577     6598      +21     
==========================================
+ Hits         5399     5420      +21     
  Misses       1178     1178              
Impacted Files Coverage Δ
src/pdfnorm.jl 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a746130...fbd84e8. Read the comment docs.

src/pdfnorm.jl Show resolved Hide resolved
src/pdfnorm.jl Outdated Show resolved Hide resolved
@fkiraly
Copy link

fkiraly commented May 10, 2021

Hm, @giordano, this takes me back - when was it that we discussed this, 2 years ago? Nice that you were still working on this!

Indeed I like the property.jl approach, especially since later you may like to dispatch on two distributions, e.g., implementing the L2 inner product $\int f_d(x|\theta) f_{d'}(x|\theta')$ where $d,d'$ are two different parametric distributions, or sth like the energy distance $\mathbb{E}_{X\sim d, Y\sim d'}[X-Y]$.

Just to avoid duplication of effort in working out the integrals: I believe @RaphaelS1 and @aintoha have worked out a couple of these (entropy and cross-entropy/interaction terms) for various scoring rules, and there are also a number of other papers which have lists of integrals, e.g., for energy scores and integrated Brier (which @RaphaelS1 and @aintoha can point to).

Regarding the naming discussion: there is on occasion a bit of terminology confusion between Brier/Gneiting loss ($L(p,x) = -2p(x) + |p|_2^2$) and integrated Brier loss ($L(F,x) = \int (F(y) - \Theta(y-x))^2 dy$) aka RPS, so I'd avoid using the name "Brier" in distribution properties as opposed to scoring rules or losses.

@fkiraly
Copy link

fkiraly commented May 10, 2021

PS: I'm happy to help implement the squared L1 norm of the pdf for the most popular distributions if anyone wants that.

Copy link
Member

@devmotion devmotion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor suggestions, otherwise the PR looks good.

src/pdfnorm.jl Outdated Show resolved Hide resolved
src/pdfnorm.jl Outdated Show resolved Hide resolved
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
src/pdfnorm.jl Outdated Show resolved Hide resolved
Co-authored-by: David Widmann <devmotion@users.noreply.github.com>
@giordano
Copy link
Contributor Author

Bump 🙂

@devmotion devmotion merged commit d285de5 into JuliaStats:master May 27, 2021
@devmotion
Copy link
Member

Sorry for the delay, I wanted to make sure that people have time to comment on it but then I completely forgot the PR 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants