Read shapes in from pandas dataframe #548
…for event yield and variance
… of the histogram
Thanks @shane-breeze. Will this also support the use of the
Yes. There's a variance column in the dataframe that fills the errors of the histograms. I just tested the simple shapes card with autoMCStats and got the same results with TH1s and with the dataframe.
I added a cast of the index selection in the datacard to the index dtypes. The selection is read as a string from the datacard, but the dataframe index can be int, float, etc.
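Since the datacard passes every index label as a string, a lookup such as `"1"` would not match an integer `1` stored in the dataframe index. A minimal sketch of such a cast, assuming the selection is a tuple of strings aligned with the leading levels of a `MultiIndex` and that numeric levels use NumPy dtypes. The helper name `cast_to_index_dtypes` is hypothetical and not taken from the PR:

```python
import pandas as pd


def cast_to_index_dtypes(index, labels):
    """Cast datacard (string) labels to the dtypes of the matching index levels.

    Hypothetical helper: labels[i] is converted using the dtype of level i of
    a pandas MultiIndex. Only numeric levels are cast; string-like levels
    keep the label unchanged.
    """
    cast = []
    for level, label in zip(index.levels, labels):
        if pd.api.types.is_numeric_dtype(level.dtype):
            # e.g. numpy.int64("1") -> 1, numpy.float64("2.5") -> 2.5
            cast.append(level.dtype.type(label))
        else:
            cast.append(label)
    return tuple(cast)


# Usage: the "bin" level is stored as integers, but the datacard says "1".
df = pd.DataFrame(
    {"channel": ["ch1", "ch1"], "bin": [0, 1], "sum_w": [3.0, 4.0]}
).set_index(["channel", "bin"])
key = cast_to_index_dtypes(df.index, ("ch1", "1"))
value = df.loc[key, "sum_w"]  # the raw ("ch1", "1") key would not be found
```

Casting to the level's own dtype keeps the datacard format unchanged (everything stays a string there) while letting the dataframe use whatever index types are natural for it.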
Just wondering about the label: what work needs to be done on this?
@nsmith-, the action items on this PR from the last discussion were the following:
@shane-breeze, can you allow edits from maintainers?
@amarini, I have unarchived my fork of this repository. It should now be writable. |
Conflicts: python/ShapeTools.py
autoMCStats is available (I checked that it still works), so I think no feature is missing. Unbinned data is anyway detected by the output of
Pull Request Test.
Merge pull request #548 from shane-breeze/shapes-df
Read shapes in from pandas dataframe
Cherry-pick #548 merge into 112x
I kept converting pandas dataframes into TH1s for my binned shape fits, so I've included this conversion in combine in case others find it useful (e.g. shapes can be saved in human-readable CSV/JSON files or even Excel spreadsheets).
- Changed `ShapeTools.py` to interpret files with the extensions `[".csv", ".json", ".html", ".pkl", ".xlsx", ".h5", ".parquet"]` as pandas dataframes (see the pandas IO docs). Any other extension is handled as before, i.e. as a ROOT file. Note that multiindexed dataframes are used, and files in some formats need to be converted to a multiindex, where all but the last 2 columns are used for indexing.
- `DataFrameWrapper.py` adds a class that wraps a pandas dataframe and provides a `Get` method which, like `ROOT::TFile::Get`, returns TH1s.

An example CSV file is included, and the following commands gave the same results (apart from file names/CPU time):
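The extension dispatch and the wrapper described above can be sketched roughly as follows. This is not the code from the PR: the reader table, the class name `DataFrameShapes`, the `"channel/process/systematic"` name format and the `SimpleHist` return type are all assumptions. The real wrapper returns ROOT TH1s; this sketch returns a plain container so it runs without ROOT.

```python
import os
from collections import namedtuple

import numpy as np
import pandas as pd

# Extensions routed to pandas; any other extension is opened as a ROOT file.
PANDAS_READERS = {
    ".csv": pd.read_csv,
    ".json": pd.read_json,
    ".html": lambda path: pd.read_html(path)[0],  # read_html returns a list
    ".pkl": pd.read_pickle,
    ".xlsx": pd.read_excel,
    ".h5": pd.read_hdf,
    ".parquet": pd.read_parquet,
}

# Hypothetical stand-in for a ROOT TH1 so the sketch runs without ROOT.
SimpleHist = namedtuple("SimpleHist", ["contents", "errors"])


def is_dataframe_file(path):
    """True if the file should be read with pandas rather than as a ROOT file."""
    return os.path.splitext(path)[1].lower() in PANDAS_READERS


def to_multiindex(df):
    """Index by every column except the last two (assumed: yield, variance)."""
    if isinstance(df.index, pd.MultiIndex):
        return df  # e.g. pickle/parquet files can keep the MultiIndex
    return df.set_index(list(df.columns[:-2]))


class DataFrameShapes:
    """Hypothetical wrapper exposing a TFile-like Get(name) over a dataframe."""

    def __init__(self, path):
        reader = PANDAS_READERS[os.path.splitext(path)[1].lower()]
        self.df = to_multiindex(reader(path))

    def Get(self, name):
        # Assumed name format "channel/process/systematic": one part per
        # leading index level, with the remaining level enumerating the bins.
        key = tuple(name.split("/"))
        try:
            hist = self.df.xs(key)
        except KeyError:
            return None  # TFile::Get returns a null pointer for missing objects
        contents = hist.iloc[:, -2].to_numpy(dtype=float)
        errors = np.sqrt(hist.iloc[:, -1].to_numpy(dtype=float))
        return SimpleHist(contents, errors)
```

Keeping the extension check separate from the wrapper means the shape-loading code only has to decide which opener to call; anything downstream that expects a `Get` method can stay unchanged.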
Excel spreadsheets depend on openpyxl and xlrd, parquet depends on pyarrow or fastparquet, and HDF depends on PyTables (all can be installed through pip).
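Because these backends are optional, it can help to check which formats are usable before pointing a datacard at such a file. A small sketch using `importlib.util.find_spec`; the mapping follows the dependency list above (PyTables is imported as `tables`), and the function name `available_formats` is hypothetical:

```python
import importlib.util

# Each format maps to a list of alternatives; an alternative is a list of
# modules that must all be importable. Follows the dependency list above.
OPTIONAL_BACKENDS = {
    "excel": [["openpyxl", "xlrd"]],            # both required
    "parquet": [["pyarrow"], ["fastparquet"]],  # either one is enough
    "hdf": [["tables"]],                        # PyTables' import name
}


def available_formats():
    """Return {format: bool}, True if some complete set of backends is installed."""
    return {
        fmt: any(
            all(importlib.util.find_spec(mod) is not None for mod in required)
            for required in alternatives
        )
        for fmt, alternatives in OPTIONAL_BACKENDS.items()
    }
```

`find_spec` only locates the modules without importing them, so the check stays cheap even when the heavier packages are installed.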
This could be extended to unbinned fits by using a similar multiindex for channel/process categorisation and one or more columns holding the observable(s).
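To make the proposed extension concrete, one possible layout (not implemented in this PR) keeps one row per event, indexed by channel and process, with the observable and an optional per-event weight as ordinary columns. The column names `mass` and `weight` are hypothetical:

```python
import pandas as pd

# Hypothetical unbinned layout: one row per event, indexed by (channel, process).
events = pd.DataFrame(
    {
        "channel": ["ch1", "ch1", "ch1", "ch2"],
        "process": ["signal", "signal", "background", "signal"],
        "mass": [124.8, 125.3, 110.2, 126.1],
        "weight": [1.0, 1.0, 0.5, 1.0],
    }
).set_index(["channel", "process"])

# Select the events of one channel/process, the unbinned analogue of
# fetching one histogram from a binned shapes file.
mask = (events.index.get_level_values("channel") == "ch1") & (
    events.index.get_level_values("process") == "signal"
)
signal_ch1 = events[mask]
observable = signal_ch1["mass"].to_numpy()
total_weight = signal_ch1["weight"].sum()
```

The index plays the same categorisation role as in the binned case; only the trailing columns change from (yield, variance) per bin to observable values per event.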