Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: ak.std and ak.var for axis != -1. #2918

Merged
merged 2 commits into from
Jan 12, 2024

Conversation

jpivarski
Copy link
Member

Although $\frac{1}{n}\sum_i (x_i - \mbox{mean}(x))^2$ is more precise than $(\frac{1}{n}\sum_i x_i^2) - \mbox{mean}(x)$, the latter can only be broadcasted when axis=-1. This had been failing without a useful error message; now it works (with less precision).

Copy link

codecov bot commented Jan 2, 2024

Codecov Report

Attention: 1 lines in your changes are missing coverage. Please review.

Comparison is base (eb004ce) 81.93% compared to head (a61df73) 81.88%.
Report is 1 commits behind head on main.

Additional details and impacted files
Files Coverage Δ
src/awkward/operations/ak_var.py 82.69% <80.00%> (-1.31%) ⬇️

... and 3 files with indirect coverage changes

@agoose77
Copy link
Collaborator

agoose77 commented Jan 3, 2024

This is probably the same issue as #2760, so we should look at softmax too. I'd be tempted to keep the axis=-1 path, and add the new logic as an axis != -1 case in order to improve precision of the result in what I suspect would be the most common case.

@jpivarski
Copy link
Member Author

Note to self:

      def test_dtype_deprecated():
          form = ak.forms.EmptyForm()
  >       with pytest.warns(
              DeprecationWarning,
              match=r"the `dtype` parameter in EmptyForm\.to_NumpyForm is deprecated",
          ):
  E       DeprecationWarning: In version 2.4.0, this will be an error.
  E       To raise these warnings as errors (and get stack traces to find out where they're called), run
  E           import warnings
  E           warnings.filterwarnings("error", module="awkward.*")
  E       after the first `import awkward` or use `@pytest.mark.filterwarnings("error:::awkward.*")` in pytest.
  E       Issue: from_dtype conversion of temporal units to generic `datetime64` and `timedelta64` types is deprecated, pass `time_units_as_parameter=False` to disable this warning..
  
  form       = EmptyForm()
  next_form  = NumpyForm('int64')
  
  /project/tests/test_2503_deprecate_to_numpyform.py:13: DeprecationWarning

@jpivarski
Copy link
Member Author

This is probably the same issue as #2760, so we should look at softmax too. I'd be tempted to keep the axis=-1 path, and add the new logic as an axis != -1 case in order to improve precision of the result in what I suspect would be the most common case.

I noticed this in association with softmax when I had to turn a test from xfail to normal. I haven't looked at softmax yet, but it's likely the same issue of trying to compute it by calling the other semi-reducer functions 1. For axis=-1, calling another semi-reducer, even with the right axis, puts it into a shape that isn't broadcastable with the original. Computing things from associate reducers only removes the need for any broadcasting.

About the precision, it would be a bad surprise for different axis values to have different precisions due to different algorithms. This is as much something to think about as the Kahan summation (which would turn even ak.sum into a "semi-reducer.")

Footnotes

  1. by which I mean functions that do reduce a dimension, but do not satisfy associativity.

@jpivarski jpivarski enabled auto-merge (squash) January 12, 2024 16:26
@jpivarski jpivarski merged commit 5769911 into main Jan 12, 2024
38 checks passed
@jpivarski jpivarski deleted the jpivarski/fix-var-axis-not-minus-one branch January 12, 2024 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants