
Add ability to read in YODA files #229

Open
GraemeWatt opened this issue Jul 6, 2023 · 10 comments
Labels: enhancement (New feature or request)

Comments

@GraemeWatt (Member) commented Jul 6, 2023

For cases where an analyser already has data in the YODA format for use with Rivet, it would be useful if hepdata_lib could read YODA files and convert them to the HEPData YAML format. Ideally, YODA would be an optional rather than a mandatory dependency. Converting YODA to HEPData YAML has been a long-standing request (HEPData/hepdata-converter#10), but it would be better handled by hepdata_lib than by hepdata-converter.

Cc: @20DM
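
A minimal sketch of what such a reader might look like, assuming the optional yoda Python module and hedging on its point-accessor names (these vary between YODA versions, so treat them as placeholders rather than a proposed hepdata_lib API):

```python
# Rough sketch only: convert one YODA Scatter2D into a hepdata_lib Table.
# The yoda point accessors used below (points, xMin, xMax, y, yErrs) are
# assumptions; check them against your installed YODA version.
from hepdata_lib import Table, Variable, Uncertainty

def yoda_scatter_to_table(filename, path):
    import yoda  # imported lazily so yoda stays an optional dependency
    aos = yoda.read(filename)     # dict: analysis-object path -> object
    points = aos[path].points()   # assumed Scatter2D point accessor

    x = Variable("x", is_independent=True, is_binned=True)
    x.values = [(p.xMin(), p.xMax()) for p in points]  # assumed accessors

    y = Variable("y", is_independent=False, is_binned=False)
    y.values = [p.y() for p in points]                 # assumed accessor
    tot = Uncertainty("total", is_symmetric=False)
    tot.values = [(-p.yErrs()[0], p.yErrs()[1]) for p in points]  # assumed
    y.add_uncertainty(tot)

    table = Table(path.strip("/").replace("/", "_"))
    table.add_variable(x)
    table.add_variable(y)
    return table
```

Importing yoda inside the function keeps it an optional dependency, as requested above.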

GraemeWatt added the enhancement (New feature or request) label on Jul 6, 2023
@20DM (Contributor) commented Jul 22, 2024

Hi Graeme!

I'm in the process of preparing submissions for the reference data files in Rivet that don't have a HepData entry yet. I'm currently struggling to use hepdata_lib for cases with inhomogeneous error breakdowns across bins. For instance, I have a distribution with three bins where the first two bins have error components 'A' and 'B' (but not 'C') and the third bin has error component 'C' (but not 'A' and 'B').

I know this is supported in principle, e.g. by simply omitting the respective components in the dictionary. However, when using the library, hepdata_lib/helpers.py raises a ValueError:

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.

Is there a trick?
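
For concreteness, the ragged structure behind this error can be reproduced directly in NumPy (illustrative values, not from a real analysis):

```python
# Illustrative reproduction of the NumPy error quoted above: a ragged
# per-bin error structure cannot be turned into a regular array.
import numpy as np

per_bin_errors = [
    {"A": 0.1, "B": 0.05},   # bin 1: components A and B
    {"A": 0.2, "B": 0.04},   # bin 2: components A and B
    {"C": 0.3},              # bin 3: component C only
]
ragged = [list(e.values()) for e in per_bin_errors]  # lengths 2, 2, 1
np.array(ragged)  # ValueError: ... inhomogeneous shape after 1 dimensions ...
```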

@20DM (Contributor) commented Jul 22, 2024

PS - just to be clear: of course I can "make it pass" by setting the missing uncertainties to zero, but then all bins will have three uncertainty components, some of them zero, which is not the same as the bin not having that component in its breakdown to begin with. I think the problem is that the check for non-zero uncertainties only tests whether there is at least one non-zero component and then adds all of them, regardless of their values. Can we make this more flexible?
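
In hepdata_lib terms, the zero-padding workaround looks roughly like this (a sketch with made-up values):

```python
# Sketch of the zero-padding workaround: every bin carries all three
# components, with zeros standing in for "not present", which is not
# the same as omitting the component from that bin's breakdown.
from hepdata_lib import Variable, Uncertainty

y = Variable("observable", is_independent=False, is_binned=False)
y.values = [1.0, 2.0, 3.0]

for label, values in [("A", [0.1, 0.2, 0.0]),
                      ("B", [0.05, 0.04, 0.0]),
                      ("C", [0.0, 0.0, 0.3])]:
    unc = Uncertainty(label)  # symmetric by default
    unc.values = values
    y.add_uncertainty(unc)
```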

@20DM (Contributor) commented Jul 23, 2024

On a different note: we have a few cases with a discrete (string) axis where a subset of the edges is technically a floating-point range. The library then throws an error like this:

error - independent_variable 'value' must not be a string range (use 'low' and 'high' to represent a range): '1.225-1.300' in 'independent_variables[0].values[6].value' (expected: {'type': 'number or string (not a range)'})

Of course I agree that a discrete axis where all bins are of the form float - float should just be a continuous axis, and it's great that the validator enforces this. However, there are also a number of examples on HepData with a mix of these kinds of bins and genuine discrete bins, so we might want to allow this kind of axis in general, no?

One simple example I'm looking at has two bins = [ "7 - 8", "13" ] corresponding to LHC centre-of-mass energies. One could get around the error by splitting this table into two tables with a continuous [7.0, 8.0] bin and a discrete [ "13" ] bin, respectively, but then the two measurement points would not end up in the same plot without additional post-processing, which seems a shame. 🤔

@20DM (Contributor) commented Jul 23, 2024

On second thought, I suspect this requirement comes from cases where a differential distribution is prepended/appended with a single bin corresponding to the average, which probably shouldn't be allowed. Maybe it's best to leave the validator as is; I will work around these cases (there are only 5 of them, so it should be manageable).

@GraemeWatt (Member, Author) commented Jul 23, 2024

This error comes from the hepdata-validator package rather than hepdata_lib. A common encoding mistake was for uploaders to specify a bin as a single value with the bin limits separated by a hyphen rather than giving separate low and high values (HEPData/hepdata-validator#33), so we implemented a check to catch it. I think hepdata_lib does not support mixed bins such as {low: 7, high: 8} and value: 13, although this is allowed in the HEPData YAML format. You could use {low: 13, high: 13} (unless a zero-width bin causes problems?) or use a separator other than "-" for the discrete bin "7 - 8", like "7 to 8" or "7 & 8".
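
For reference, the mixed-bin encoding that the HEPData YAML format itself allows would look something like this (a hand-written illustration, assuming a centre-of-mass-energy axis):

```yaml
# Illustration of a mixed axis in the HEPData YAML format: one binned
# value and one point value on the same independent variable.
independent_variables:
- header: {name: SQRT(S), units: TeV}
  values:
  - {low: 7, high: 8}   # binned centre-of-mass range
  - {value: 13}         # genuine discrete point on the same axis
```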

@20DM (Contributor) commented Jul 23, 2024

Well, there were only 5 cases where I encountered this issue, so I've just replaced the dash with a "to" or "&", depending on the context. It's sufficiently rare that this is probably good enough for now.

Good news, though: I've now managed to create submission tarballs that make the validator happy for all of the Rivet reference files that don't have a HepData entry yet. There's a total of 780 tarballs. What's the best way to submit them? I hope I don't have to upload them through the browser one by one? 😉

@20DM (Contributor) commented Jul 23, 2024

PS - I have a guest account for the IPPP cluster if it would be helpful for me to upload them there somewhere?

@GraemeWatt (Member, Author) commented

Great work! You should log into hepdata.net and click "Request Coordinator Privileges" on your Dashboard, then enter "Rivet" as the Experiment/Group. You can then click the "Submit" button to initiate a submission with an INSPIRE ID and specify an Uploader and Reviewer (maybe just yourself in both roles, unless you want a check from someone else). This will create an empty record that allows you to upload, then the record can be reviewed (there's a shortcut "Approve All Tables") and finalised from your Dashboard.

In terms of automation, we haven't yet encountered a need for bulk uploads like this, so unfortunately there's no easy way to finalise 780 records. The upload stage could be done from the command line (or from Python) using the hepdata-cli tool (see Example 9), but it requires an invitation cookie specific to each record. Record creation, reviewing and finalisation can only be done from the web interface. It might be possible to (semi-)automate these steps using something like Selenium, but I think each record should undergo a basic visual check by a human before it is finalised. I suggest you perform the create/upload/review/finalise workflow manually for a few records until you see what is involved; then you can decide whether it is worthwhile to write scripts to (semi-)automate the procedure.

@GraemeWatt (Member, Author) commented

I've approved your Coordinator request. I realised that we already have a module for bulk imports, originally written to import records from hepdata.net into a developer's local instance. Previously, we had a similar module for bulk migration of records from the old HepData site to the new hepdata.net site. The importer module bypasses the web interface of the normal submission system, so it would be a more efficient way of importing a large number of tarballs. If you could copy the tarballs to a web-accessible location and provide a list of INSPIRE IDs in a format similar to https://www.hepdata.net/search/ids?inspire_ids=true, I'll look into making the necessary changes to the importer module. I've opened a new issue (HEPData/hepdata#811), so please continue the discussion there, as it no longer relates to hepdata_lib.
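
(For clarity, that URL returns a plain JSON array of integer INSPIRE IDs, e.g. [1234567, 1234568] with made-up numbers here, so a simple text or JSON list in that shape should be easy to produce.)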

@20DM (Contributor) commented Jul 24, 2024

Great - thank you!! 🙏
