Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support for writing hist derived profiles #1000

Merged
merged 10 commits into from
Jan 18, 2024

Conversation

ioanaif
Copy link
Collaborator

@ioanaif ioanaif commented Oct 19, 2023

This PR aims to add a writing feature for hist derived profiles; As requested in #908

@ioanaif ioanaif linked an issue Oct 19, 2023 that may be closed by this pull request
@ioanaif ioanaif marked this pull request as ready for review October 26, 2023 15:57
@ioanaif ioanaif requested a review from jpivarski October 26, 2023 15:57
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without weights, this implementation round-trips through boost-histogram/hist, but the intermediate values in boost-histogram/hist are wrong. If someone takes a TProfile into hist and plots it, they'll see completely different values than they would if they plot the TProfile in ROOT.

With weights, the round-tripped values are subtly different.

@henryiii, do you have the TProfile ↔ boost-histogram transformations that Hans derived?


Below is a terminal-based test of histograms without weights. Your test using uproot.from_pyroot is fine; I was testing this by going through files in case there's a difference. (This is how users would ordinarily do it.)

>>> import ROOT
>>> f = ROOT.TFile("/tmp/whatever.root", "RECREATE")
>>> h1 = ROOT.TProfile("h1", "", 2, 0, 2)
>>> h1.Fill(0.5, 10)
1
>>> h1.Fill(0.5, 20)
1
>>> h1.Fill(0.5, 30)
1
>>> h1.Fill(0.5, 40)
1
>>> h1.Write()
284
>>> f.Close()

(new Python session)

>>> import ROOT
>>> f = ROOT.TFile("/tmp/whatever.root")
>>> h1 = f.Get("h1")
>>> h1.GetBinContent(1)
25.0
>>> h1.GetBinError(1)
5.5901699437494745

>>> import uproot
>>> h2 = uproot.open("/tmp/whatever.root:h1")
>>> h2.values()
array([25.,  0.])                    # good
>>> h2.errors()
array([5.59016994, 0.        ])      # good

>>> import numpy as np
>>> h3 = h2.to_hist()
>>> h3.values()
array([100.,   0.])                  # this is very different
>>> np.sqrt(h3.variances())
array([2.79508497,        nan])      # this is very different

>>> f2 = uproot.recreate("/tmp/whatever2.root")
>>> f2["h2"] = h2
>>> f2["h3"] = h3

(new Python session)

>>> import ROOT
>>> f = ROOT.TFile("/tmp/whatever2.root")
>>> h2 = f.Get("h2")
>>> h3 = f.Get("h3")

>>> h2.GetBinContent(1)              # good
25.0
>>> h2.GetBinError(1)                # good
5.5901699437494745

>>> h3.GetBinContent(1)              # good
25.0
>>> h3.GetBinError(1)                # good
5.5901699437494745

Below is the same test, but this time, the histogram is filled with weights.

>>> import ROOT
>>> f = ROOT.TFile("/tmp/whatever.root", "RECREATE")
>>> h1 = ROOT.TProfile("h1", "", 2, 0, 2)
>>> h1.Fill(0.5, 10, 1)
1
>>> h1.Fill(0.5, 20, 1.2)
1
>>> h1.Fill(0.5, 30, 0.75)
1
>>> h1.Fill(0.5, 40, 0.1)
1
>>> h1.Write()
301
>>> f.Close()

(new Python session)

>>> import ROOT
>>> f = ROOT.TFile("/tmp/whatever.root")
>>> h1 = f.Get("h1")
>>> h1.GetBinContent(1)
19.83606557377049
>>> h1.GetBinError(1)
4.776944517205074

>>> import uproot
>>> h2 = uproot.open("/tmp/whatever.root:h1")
>>> h2.values()
array([19.83606557,  0.        ])    # good
>>> h2.errors()
array([4.77694452, 0.        ])      # good

>>> import numpy as np
>>> h3 = h2.to_hist()
>>> h3.values()
array([60.5,  0. ])                  # this is very different
>>> np.sqrt(h3.variances())
array([2.73527047,        nan])      # this is very different

>>> f2 = uproot.recreate("/tmp/whatever2.root")
>>> f2["h2"] = h2
>>> f2["h3"] = h3

(new Python session)

>>> import ROOT
>>> f = ROOT.TFile("/tmp/whatever2.root")
>>> h2 = f.Get("h2")
>>> h3 = f.Get("h3")

>>> h2.GetBinContent(1)
19.83606557377049                    # good
>>> h2.GetBinError(1)
4.776944517205074                    # good

>>> h3.GetBinContent(1)
19.592179521633962                   # close, but it's a little different
>>> h3.GetBinError(1)
4.9077505439230436                   # also close, but not quite right

@ioanaif
Copy link
Collaborator Author

ioanaif commented Oct 27, 2023

h2 = uproot.open("/tmp/whatever.root:h1")
h2.values()
array([25., 0.]) # good
h2.errors()
array([5.59016994, 0. ]) # good

import numpy as np
h3 = h2.to_hist()
h3.values()
array([100., 0.]) # this is very different
np.sqrt(h3.variances())
array([2.79508497, nan]) # this is very different



The above happens because internally, in to_hist we define values to be equal to data; which means than values in hist is set to be the sum of [10,20,30,40] = [100].



The variances should be correct, we set them as the square of errors, but as I understood from Henry, they are internally recomputed using the "_sum_of_weighted_deltas_squared”, which we set as :

view["_sum_of_weighted_deltas_squared"] = variances * (
            sum_of_bin_weights - sum_of_bin_weights_squared / sum_of_bin_weights
        )

When writing them to a file and checking from ROOT, it is all good because we made the values data swap (the same that created the .values() mismatch above)


>>> h2.GetBinContent(1) # good
25.0

h2.GetBinError(1) # good
5.5901699437494745

h3.GetBinContent(1) # good
25.0
h3.GetBinError(1) # good
5.5901699437494745

@jpivarski
Copy link
Member

The new docs/readthedocs.org:uproot test requirement can be satisfied by merging #1084/jpivarski/fix-readthedocs-documentation into this PR (or by merging in main, which has it).

@jpivarski
Copy link
Member

This PR is ready to be merged; it adds the ability to write TProfile objects to ROOT files and the test is 100% showing that they can be round-tripped from ROOT to Uproot to ROOT again.

This doesn't say anything about whether these TProfiles can come from or go to boost-histogram, but that's a separate problem for boost-histogram/hist if it turns out that they can't.

@jpivarski jpivarski enabled auto-merge (squash) January 18, 2024 15:32
@jpivarski jpivarski merged commit 4c2a5e5 into main Jan 18, 2024
21 checks passed
@jpivarski jpivarski deleted the ioanaif/feat-write-tprofile branch January 18, 2024 16:37
lobis added a commit that referenced this pull request Jan 22, 2024
* origin/main: (41 commits)
  chore: hatch-vcs generates 'version.py'; add it to .gitignore
  docs: add bnavigator as a contributor for test (#1087)
  feat: support for writing hist derived profiles (#1000)
  feat: add the ability to read RNTuple alias columns (#1004)
  chore: update pre-commit hooks (#1082)
  docs: fix ReadTheDocs documentation (#1084)
  chore: update pre-commit hooks (#1073)
  chore(deps): bump actions/upload-artifact from 3 to 4 (#1071)
  chore(deps): bump actions/download-artifact from 3 to 4 (#1072)
  build: change build to autogen version info (#1062)
  chore: remove src/uproot/version.py (for #1062).
  The next release will be 5.2.1.
  add known_base_form option so that opening root files can be avoided for mature analyses (#1077)
  test: fsspec cache (#1075)
  test: xrootd server fixture (#1076)
  The next release will be 5.2.0.
  fix: correct typo in fsspec globbing (#1067)
  chore(deps): bump actions/setup-python from 4 to 5 (#1059)
  The next release will be 5.2.0rc5.
  fix: dask distributed fsspec issue (#1065)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Request for support of writing hist-derived profiles
2 participants