Allow dummy uncertainties for variable member with dummy (string) value #247

Open
wmtford opened this issue Feb 7, 2024 · 4 comments

@wmtford

wmtford commented Feb 7, 2024

In constructing HEPData tables, we sometimes have distributions in which no actual information is available for a subset of the dependent variables in some bins. One solution I’ve used for these is to enter a string, such as ‘---’, for the value and zero for the uncertainty. This works, but it produces lots of warnings about zero error values. Would it be possible to protect string values of the uncertainties, as is done for the central values? Then one could enter a similar dummy indicator string for those.

Of course, the consumer of the table has to trap any special dummy contents in the “values” and “uncertainties” fields. I don’t know if there are any guidelines for conventions to make this uniform.
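
On the consumer side, one way to trap such placeholders is to treat any non-numeric value as missing. The following is a minimal sketch in plain Python, assuming the table has already been loaded into the usual HEPData YAML structure (a list of `values` entries, each with a `value` and optional `errors`). The helper name `numeric_points` and the sample data are hypothetical and not part of hepdata_lib.

```python
def numeric_points(values):
    """Return only the entries whose 'value' is a real number.

    Hypothetical helper: placeholder strings such as '---' are skipped.
    """
    kept = []
    for entry in values:
        value = entry.get("value")
        # bool is a subclass of int, so exclude it explicitly
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            kept.append(entry)
    return kept


# Made-up sample data, for illustration only
sample = [
    {"value": 1.5, "errors": [{"symerror": 0.2, "label": "stat"}]},
    {"value": "---"},
    {"value": 2.0, "errors": [{"symerror": 0.3, "label": "stat"}]},
]
kept_values = [entry["value"] for entry in numeric_points(sample)]
```

Filtering on type rather than on one specific string means the consumer does not depend on which placeholder convention a given table uses.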

@clelange
Collaborator

clelange commented Feb 9, 2024

Hi Bill, thanks for creating this issue. Do you have an example of how you solved this that you could share so that it's easier to test and reproduce (and then to change eventually)?

@GraemeWatt
Member

Thanks for the feedback, but HEPData doesn't support string values for the uncertainties. Recommendations for how to encode missing bins are given in:

https://hepdata-submission.readthedocs.io/en/latest/data_yaml.html#uncertainties (second paragraph)

The implementation in hepdata_lib was done in PR #161, i.e. if all uncertainties are zero for a particular bin, then the errors key is omitted from the YAML output. The warning message was my suggestion to discourage empty bins if there is only one dependent variable. But for your use case where there are multiple dependent variables and a (different) subset of bins are empty for each dependent variable, it is legitimate to set uncertainties to zero for those bins and you should just ignore the warnings from hepdata_lib.
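
The rule described above (drop the `errors` key for a bin when all of its uncertainties are zero) can be sketched in plain Python as follows. This is an illustration of the rule only, not the actual hepdata_lib code; the function name `build_value_entry` and the example numbers are hypothetical.

```python
def build_value_entry(value, uncertainties):
    """Build one entry of a dependent variable's 'values' list.

    uncertainties: dict mapping a label to a symmetric uncertainty.
    Hypothetical sketch, not the hepdata_lib implementation.
    """
    entry = {"value": value}
    errors = [
        {"symerror": err, "label": label}
        for label, err in uncertainties.items()
    ]
    # Omit the 'errors' key if every uncertainty in this bin is zero
    if any(err["symerror"] != 0 for err in errors):
        entry["errors"] = errors
    return entry


# Made-up numbers, for illustration only
filled = build_value_entry(1.5, {"stat": 0.2, "syst": 0.0})
empty = build_value_entry("---", {"stat": 0.0, "syst": 0.0})
```

Here `empty` contains only the `value` key, which corresponds to the missing-bin encoding with no `errors` key in the YAML output.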

I don't think the behaviour of the hepdata_lib code needs to be changed, but there could be better documentation and perhaps an option could be added to suppress the warnings. I'll try to address these two points and I'll leave this issue open until they are addressed.

@wmtford
Author

wmtford commented Feb 11, 2024

Thanks both. An example of the use case is Fig. 5, the second table, in [1]. A stand-alone notebook to reproduce that table, along with the needed input files, can be found in cernbox at [2]. From my perspective this works fine, other than that the warning messages seem to imply a problem.

The encoding of missing bins that Graeme pointed to doesn't work in hepdata_lib because uncertainties there are added, or not, for an entire column; we have no way to treat some of the rows differently.

[1] https://www.hepdata.net/record/146018
[2] https://cernbox.cern.ch/s/8Nk392EQuLX8jJC

@GraemeWatt
Member

Bill, thank you for providing the detailed example. Unfortunately, it is a limitation of hepdata_lib that it is not possible to use a different treatment of uncertainties for different rows of a dependent variable. Your existing treatment is the best that can be done with the current code. I've opened PR #251, which adds a paragraph of explanation to the end of the Uncertainties section of the documentation. I also added an option zero_uncertainties_warning (default value True) to the Variable class. In your example notebook, you could suppress the warnings using:

    # Dependent variable
    gy = Variable(axTitles[1][0], is_independent=False, is_binned=False, units=axTitles[1][1], zero_uncertainties_warning=False)

I also added a test that the errors key will be omitted if the uncertainties are zero, which should have been added when PR #161 was completed. Thanks again for the feedback.

3 participants