-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support for writing hist derived profiles #1000
Conversation
…wx, fTsumwx2, fTsumwy, fTsumw2.
… by root to not throw segmentation error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without weights, this implementation round-trips through boost-histogram/hist, but the intermediate values in boost-histogram/hist are wrong. If someone takes a TProfile into hist and plots it, they'll see completely different values than they would if they plot the TProfile in ROOT.
With weights, the round-tripped values are subtly different.
@henryiii, do you have the TProfile ↔ boost-histogram transformations that Hans derived?
Below is a terminal-based test of histograms without weights. Your test using uproot.from_pyroot
is fine; I was testing this by going through files in case there's a difference. (This is how users would ordinarily do it.)
>>> import ROOT
>>> f = ROOT.TFile("/tmp/whatever.root", "RECREATE")
>>> h1 = ROOT.TProfile("h1", "", 2, 0, 2)
>>> h1.Fill(0.5, 10)
1
>>> h1.Fill(0.5, 20)
1
>>> h1.Fill(0.5, 30)
1
>>> h1.Fill(0.5, 40)
1
>>> h1.Write()
284
>>> f.Close()
(new Python session)
>>> import ROOT
>>> f = ROOT.TFile("/tmp/whatever.root")
>>> h1 = f.Get("h1")
>>> h1.GetBinContent(1)
25.0
>>> h1.GetBinError(1)
5.5901699437494745
>>> import uproot
>>> h2 = uproot.open("/tmp/whatever.root:h1")
>>> h2.values()
array([25., 0.]) # good
>>> h2.errors()
array([5.59016994, 0. ]) # good
>>> import numpy as np
>>> h3 = h2.to_hist()
>>> h3.values()
array([100., 0.]) # this is very different
>>> np.sqrt(h3.variances())
array([2.79508497, nan]) # this is very different
>>> f2 = uproot.recreate("/tmp/whatever2.root")
>>> f2["h2"] = h2
>>> f2["h3"] = h3
(new Python session)
>>> import ROOT
>>> f = ROOT.TFile("/tmp/whatever2.root")
>>> h2 = f.Get("h2")
>>> h3 = f.Get("h3")
>>> h2.GetBinContent(1) # good
25.0
>>> h2.GetBinError(1) # good
5.5901699437494745
>>> h3.GetBinContent(1) # good
25.0
>>> h3.GetBinError(1) # good
5.5901699437494745
Below is the same test, but this time, the histogram is filled with weights.
>>> import ROOT
>>> f = ROOT.TFile("/tmp/whatever.root", "RECREATE")
>>> h1 = ROOT.TProfile("h1", "", 2, 0, 2)
>>> h1.Fill(0.5, 10, 1)
1
>>> h1.Fill(0.5, 20, 1.2)
1
>>> h1.Fill(0.5, 30, 0.75)
1
>>> h1.Fill(0.5, 40, 0.1)
1
>>> h1.Write()
301
>>> f.Close()
(new Python session)
>>> import ROOT
>>> f = ROOT.TFile("/tmp/whatever.root")
>>> h1 = f.Get("h1")
>>> h1.GetBinContent(1)
19.83606557377049
>>> h1.GetBinError(1)
4.776944517205074
>>> import uproot
>>> h2 = uproot.open("/tmp/whatever.root:h1")
>>> h2.values()
array([19.83606557, 0. ]) # good
>>> h2.errors()
array([4.77694452, 0. ]) # good
>>> import numpy as np
>>> h3 = h2.to_hist()
>>> h3.values()
array([60.5, 0. ]) # this is very different
>>> np.sqrt(h3.variances())
array([2.73527047, nan]) # this is very different
>>> f2 = uproot.recreate("/tmp/whatever2.root")
>>> f2["h2"] = h2
>>> f2["h3"] = h3
(new Python session)
>>> import ROOT
>>> f = ROOT.TFile("/tmp/whatever2.root")
>>> h2 = f.Get("h2")
>>> h3 = f.Get("h3")
>>> h2.GetBinContent(1)
19.83606557377049 # good
>>> h2.GetBinError(1)
4.776944517205074 # good
>>> h3.GetBinContent(1)
19.592179521633962 # close, but it's a little different
>>> h3.GetBinError(1)
4.9077505439230436 # also close, but not quite right
The above happens because internally, in The variances should be correct, we set them as the square of errors, but as I understood from Henry, they are internally recomputed using the "_sum_of_weighted_deltas_squared”, which we set as :
When writing them to a file and checking from ROOT, it is all good because we made the
>>> h2.GetBinContent(1) # good
|
The new docs/readthedocs.org:uproot test requirement can be satisfied by merging #1084/jpivarski/fix-readthedocs-documentation into this PR (or by merging in |
This PR is ready to be merged; it adds the ability to write TProfile objects to ROOT files and the test is 100% showing that they can be round-tripped from ROOT to Uproot to ROOT again. This doesn't say anything about whether these TProfiles can come from or go to boost-histogram, but that's a separate problem for boost-histogram/hist if it turns out that they can't. |
* origin/main: (41 commits) chore: hatch-vcs generates 'version.py'; add it to .gitignore docs: add bnavigator as a contributor for test (#1087) feat: support for writing hist derived profiles (#1000) feat: add the ability to read RNTuple alias columns (#1004) chore: update pre-commit hooks (#1082) docs: fix ReadTheDocs documentation (#1084) chore: update pre-commit hooks (#1073) chore(deps): bump actions/upload-artifact from 3 to 4 (#1071) chore(deps): bump actions/download-artifact from 3 to 4 (#1072) build: change build to autogen version info (#1062) chore: remove src/uproot/version.py (for #1062). The next release will be 5.2.1. add known_base_form option so that opening root files can be avoided for mature analyses (#1077) test: fsspec cache (#1075) test: xrootd server fixture (#1076) The next release will be 5.2.0. fix: correct typo in fsspec globbing (#1067) chore(deps): bump actions/setup-python from 4 to 5 (#1059) The next release will be 5.2.0rc5. fix: dask distributed fsspec issue (#1065) ...
This PR aims to add a writing feature for hist derived profiles; As requested in #908