Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Converting from scikit-hep histogram to Table #243

Merged
merged 12 commits into from
Nov 15, 2023

Conversation

yimuchen
Copy link
Contributor

@yimuchen yimuchen commented Oct 27, 2023

  • Adding the scikit-hep histograms conversion function
  • Adding help function to simplify the Table/Variable declaration
  • Properly add test cases

📚 Documentation preview 📚: https://hepdata-lib--243.org.readthedocs.build/en/243/

@clelange
Copy link
Collaborator

Thanks, this looks useful! I've triggered the tests, but we should have tests as you also point out to make sure this is working and will continue to work in the future. I'll wait for you to add some. Let me know if you need help.

@yimuchen
Copy link
Contributor Author

Cool! If the developers think this contribution is in the right direction, I will continue rounding out the features, testing and the linting! More to come over the next few days.

@yimuchen
Copy link
Contributor Author

@clelange I have included all the features that for testing. Also checked with against pylint for the new hist_utils.py.

@GraemeWatt
Copy link
Member

@yimuchen : can you please also check the linting of tests/test_histread.py? The CI runs python -m pylint tests/*.py --rcfile=tests/pylintrc which is currently failing.

@@ -47,8 +47,10 @@ def test_default_read(self):
"""
try:
readout = read_hist(TestHistUtils.base_hist)
# pylint: disable=W0702
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you using this try-except block here (and also below) in the first place? Does it ever fail? If so, you should know which kind of exception is thrown and use that. Please do not disable pylint warnings.

Copy link
Contributor Author

@yimuchen yimuchen Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is following the format that was found in cfilereader and other tests [1].

Hm... I guess since the format parsing is pure python. I should actually remove the try/except all together, and simply have pytest handle the exception?

[1] https://github.com/HEPData/hepdata_lib/blob/main/tests/test_cfilereader.py#L48

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, OK, I had forgotten that this is used elsewhere. I think also in https://github.com/HEPData/hepdata_lib/blob/main/tests/test_cfilereader.py#L48 it should actually not be used, but let's not worry about that here.
I'm wondering what test_default_read is meant to test. Can read_hist(TestHistUtils.base_hist) actually fail? I would think that it can only happen if the hist.Hist() interface changes. So I would think that you can just read in the file and get rid of the try-except. However, there might be other ways that reading fails. For example, is it possible that an invalid hist is provided?

Copy link
Contributor Author

@yimuchen yimuchen Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of the test function is to check if axis information is properly formatted into compatible lists (hepdata_lib basically wants everything to be in 1D list, so all axis information needs to be multiplied out)

Technically, yes, the read_hist function can fail, as it depends on the storage format of the histogram entries [1], which can be user defined. For completeness’ sake, let me work on getting all official storage formats [2] into the package, and just have the function throw an error for user-defined formats for now.

[1] https://github.com/yimuchen/hepdata_lib/blob/hist_reader/hepdata_lib/hist_utils.py#L39
[2] https://github.com/scikit-hep/hist/blob/main/src/hist/storage.py#L14

@yimuchen
Copy link
Contributor Author

yimuchen commented Oct 30, 2023

Hm... I just checked, the hist package is only strictly available for python>=3.7 [1], so that is why some of the CI scripts are failing. I'm not sure if there is an easy way to limit sub-packages to require a different python version.

[1] https://pypi.org/project/hist/

@clelange
Copy link
Collaborator

Hm... I just checked, the hist package is only strictly available for python>=3.7 [1], so that is why some of the CI scripts are failing. I'm not sure if there is an easy way to limit sub-packages to require a different python version.

[1] https://pypi.org/project/hist/

Yes, there's a way, but I need to look it up. We could either deprecate Python 3.6 since it's EOL anyway or do that. I'll get back to you later (hopefully tomorrow) in case nobody else has done so in the meantime.

@codecov-commenter
Copy link

codecov-commenter commented Oct 30, 2023

Codecov Report

Merging #243 (11ae275) into main (8dcf6a1) will decrease coverage by 0.51%.
The diff coverage is 81.18%.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

@@            Coverage Diff             @@
##             main     #243      +/-   ##
==========================================
- Coverage   88.54%   88.04%   -0.51%     
==========================================
  Files           4        5       +1     
  Lines         978     1079     +101     
  Branches      203      229      +26     
==========================================
+ Hits          866      950      +84     
- Misses         82       91       +9     
- Partials       30       38       +8     
Flag Coverage Δ
unittests-3.10 88.04% <81.18%> (-0.51%) ⬇️
unittests-3.6 87.73% <81.18%> (-0.48%) ⬇️
unittests-3.7 87.73% <81.18%> (-0.48%) ⬇️
unittests-3.8 87.85% <81.18%> (-0.49%) ⬇️
unittests-3.9 87.85% <81.18%> (-0.49%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
hepdata_lib/hist_utils.py 81.18% <81.18%> (ø)

... and 1 file with indirect coverage changes

@clelange
Copy link
Collaborator

clelange commented Nov 3, 2023

Hm... I just checked, the hist package is only strictly available for python>=3.7 [1], so that is why some of the CI scripts are failing. I'm not sure if there is an easy way to limit sub-packages to require a different python version.
[1] https://pypi.org/project/hist/

Yes, there's a way, but I need to look it up. We could either deprecate Python 3.6 since it's EOL anyway or do that. I'll get back to you later (hopefully tomorrow) in case nobody else has done so in the meantime.

Thinking about this a bit more, I would say we should just get rid of Python 3.6 support. It's not great since lots of systems we use still have 3.6 as system Python, but then again there are relatively easy ways to get to Python >= 3.7 and we need to make sure we document them.

@GraemeWatt
Copy link
Member

Hm... I just checked, the hist package is only strictly available for python>=3.7 [1], so that is why some of the CI scripts are failing. I'm not sure if there is an easy way to limit sub-packages to require a different python version.

[1] https://pypi.org/project/hist/

The last version of hist for Python 3.6 is v2.4.0. The tests are failing for Python 3.6 not because hist cannot be installed, but because you're using features from hist that were not available in v2.4.0. So it might be possible to keep Python 3.6 support if you rewrite your code to work with hist v2.4.0. But if this is difficult or you need features from the later hist versions, then dropping Python 3.6 support is definitely the simpler solution.

@yimuchen
Copy link
Contributor Author

yimuchen commented Nov 3, 2023

Let me quickly compare what are the minimum set of features required for hist==2.4.0

@yimuchen
Copy link
Contributor Author

yimuchen commented Nov 3, 2023

This new readout method should work with both older versions of hist, as well as user defined accumulators.

Copy link
Collaborator

@clelange clelange left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay, looks good to me. I'll create a new release soon (might be tomorrow though).

@clelange clelange merged commit b7a9238 into HEPData:main Nov 15, 2023
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants