First, a summary of the proposed weekly plan:
- Week 1-2: Learn MadAnalysis5 (ma5).
- Week 2-3: Make ma5 Docker image.
- Week 4-5: Implement and test ma5 Docker image.
- Week 6-7: Learn pyhf.
- Week 8-9: Implement and test pyhf subworkflow and Docker image.
- Week 10-11: REANA integration.
- Week 12: Write documentation.
Below are comprehensive summaries of progress for each week. New progress will appear at the bottom by Friday 11:59pm (PDT) every week.
- By this point, I was already ahead by a few weeks, since I had started investigating ma5 early.
- I had finished running an end-to-end analysis of an interpretation of the t-channel model for one parameter point.
- The end goal was to run the analysis for the whole "grid" (parameter space) and interpolate the points in between to reproduce a given contour plot.
- This analysis was run manually using madgraph with pythia to generate a showered .hepmc file which was then fed as input to madanalysis5 (ma5)
- The results from this initial analysis were not comparable to the given outputs, since the previous researchers were using a more complex ma5 analysis.
- During the investigation, I created an ma5 docker image to develop in, because I had issues setting up my local environment for ma5.
- The docker image would later be integrated into recast-workflow
- From last week, the goal was to do a full parameter scan using a specific ma5 analysis.
- To run the specific ma5 analysis, it was necessary to use ma5 expert/recast mode
- This was a special ma5 option that also required additional packages (delphes, delphesMa5Tune, PAD, PADMa5Tune)
- Some of these packages were specifically ma5 versions of other packages, e.g. delphes vs. delphesMa5Tune.
- After fixing some compilation issues with the ma5 package manager, I was able to successfully run the analysis and reproduce one point.
- The next step was to scan the rest of the points (~170 points).
- I started working on a python script.
- I used recast-workflow to quickly create a madgraph+pythia workflow and then manually added madanalysis as another stage in this workflow.
- Since I was trying to automate the process through a workflow, I had to make modifications to the ma5 docker image I was using for development.
- Using ma5's run/recast card system, I was able to run the docker image automatically, e.g. `docker run recast/ma5 -c ma5/bin -R run_card.dat` (a small wrapper sketch follows this group of notes).
- The run card contained a series of commands that I would normally have to execute manually, but ma5 had a feature that would process them for me.
- The recast card contained the analysis name I wanted to reuse.
- The run and recast cards can be found here and here
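Since the workflow only needs to drive this container non-interactively, a thin wrapper around the command above is enough. A minimal sketch, assuming the cards sit in a local `cards/` directory that is mounted into the container (the directory name and mount path are my assumptions; the image name and `-c`/`-R` flags are the ones quoted above):

```python
import subprocess
from pathlib import Path

# Minimal sketch: run the ma5 image non-interactively on a run card.
# The cards/ directory and the /cards mount point are assumptions;
# the image name and flags are the ones quoted in the notes above.
cards = Path.cwd() / "cards"

subprocess.run(
    [
        "docker", "run", "--rm",
        "-v", f"{cards}:/cards",
        "recast/ma5",
        "-c", "ma5/bin",
        "-R", "/cards/run_card.dat",
    ],
    check=True,
)
```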
- Once that had been set up, I could copy over the ma5 workflow I had been using.
- Workflow found here
- I used the input madgraph param_card.dat file from the previous point and reproduced its result by running this new workflow.
- Then, I made a template file for the input param_card
- Template found here (note: in lines 32 and 33 of the template, two variables are being formatted: `{mChi}` and `{mPhi}`).
- Then, I wrote a simple python script that, for every point, generated an input param_card.dat from the template, ran the ma5 workflow for that input file, and saved the results (sketched below).
- The script is here
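A minimal sketch of that loop is below. The grid spacing, template and output file names are placeholders I am making up here, not the exact values from the linked script; the yadage invocation is the one used elsewhere in these notes.

```python
import itertools
import shutil
import subprocess
from pathlib import Path

# Sketch of the parameter scan: fill the param_card template for each point,
# run the ma5 yadage workflow, and keep the result file for plotting later.
# Grid values, file names, and the output path inside the workdir are
# placeholders, not the real ones from the linked script.
template = Path("param_card_template.dat").read_text()
results = Path("results")
results.mkdir(exist_ok=True)

for mchi, mphi in itertools.product(range(100, 600, 50), range(200, 2000, 100)):
    point = f"mchi{mchi}_mphi{mphi}"
    inputs = Path("inputs")
    inputs.mkdir(exist_ok=True)

    # Substitute the {mChi} and {mPhi} placeholders in the template.
    (inputs / "param_card.dat").write_text(template.format(mChi=mchi, mPhi=mphi))

    # Run the workflow for this point in its own work directory.
    workdir = Path("workdir") / point
    subprocess.run(
        ["yadage-run", str(workdir), "workflow.yml", "inputs/input.yml",
         "-d", f"initdir={Path.cwd() / 'inputs'}"],
        check=True,
    )

    # Save the confidence-limit output (file name assumed) for this point.
    shutil.copy(workdir / "madanalysis" / "cls_output.txt", results / f"{point}.txt")
```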
- I also repurposed this scanner to scan for decay and cross-section instead of confidence limits, to validate an assumption that madgraph computes the decay width the same way as the original researchers did.
- Decay/xsection script here
- The whole scan took about 72 hours running on my computer without any parallelisation.
- From these results, I was able to produce a plot that closely matched the given one.
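Producing such a plot from the scanned points takes only a few lines with matplotlib's triangulation-based contouring. A sketch, where the results file layout, its name, and the 0.05 exclusion threshold are my assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed layout: one row per scanned point with columns mChi, mPhi, CLs.
mchi, mphi, cls = np.loadtxt("scan_results.csv", delimiter=",", unpack=True)

# Interpolate between the scanned grid points via triangulation.
plt.tricontourf(mphi, mchi, cls, levels=20)
plt.colorbar(label="CLs")
plt.tricontour(mphi, mchi, cls, levels=[0.05], colors="red")  # 95% CL exclusion contour
plt.xlabel("mPhi [GeV]")
plt.ylabel("mChi [GeV]")
plt.savefig("exclusion_contour.png", dpi=150)
```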
- This is the week where I started applying knowledge from the prior weeks to building recast-workflow.
- The docker image was fully implemented, so I could just copy the files needed to build the image to recast-workflow.
- For the workflow, I had to cut out the ma5 stage and put it in a separate file.
- The way subworkflows are defined in recast-workflow requires that there is also a description.yml file for the ma5 stage that describes the input/output file interfaces, and a common_inputs.py file for determining common inputs for the subworkflow (common inputs are inputs shared among multiple subworkflows, not frequently entered values).
- After defining these extra files so that recast-workflow can understand the subworkflow, I was able to generate a new end-to-end ma5 workflow (from input cards for data generation in madgraph to confidence limits in madanalysis5).
- The files defining the ma5 subworkflow can be found [here](https://github.com/vladov3000/recast_workflow/tree/master/src/recast_workflow/subworkflows/analysis/madanalysis).
- With this yadage workflow, I was able to start working with REANA.
- I had to use SSH tunneling with a VNC to open reana.cern.ch and get my access token.
- Then, on lxplus, I submitted some hello-world tutorial REANA jobs that introduced me to the reana.yml file, which specifies how to run the job and how to retrieve the outputs.
- I then found a more specific tutorial for my use case that was about using REANA with the yadage engine.
- Using this tutorial, I was able to create a job for my ma5 workflow and scan through all the input cards. This was significantly faster than the python script due to the parallelisation of execution in REANA.
- I got the same results and was able to reuse my code to recreate the contour plot.
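For reference, the reana-client steps from the tutorial looked roughly like the sketch below. The workflow name is made up, the exact flags may differ between reana-client versions, and REANA_SERVER_URL and REANA_ACCESS_TOKEN are assumed to be exported already.

```python
import subprocess

# Rough reana-client sequence for submitting the yadage workflow and a batch
# of input cards. Flags are from memory of the REANA docs; double-check them
# against the installed reana-client version.
wf = "ma5-scan"  # made-up workflow name

def rc(*args: str) -> None:
    subprocess.run(["reana-client", *args], check=True)

rc("create", "-f", "reana.yml", "-n", wf)  # register the workflow specification
rc("upload", "./inputs", "-w", wf)         # copy the input cards into the workspace
rc("start", "-w", wf)                      # launch; REANA parallelises the stages
rc("status", "-w", wf)                     # poll until the run finishes
rc("download", "-w", wf)                   # fetch the declared output files
```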
- After gaining familiarity with running these scans (through python scripts or REANA), I could start implementing features in recast-workflow.
- First, I made a command (`recast-cli scan build workflow.yml`).
- This would convert a single-input workflow (such as the original ma5 workflow) to a workflow that could take several input files.
- Then, I added a command to get a directory with your workflow and reana.yml, such that you could copy your inputs into the directory and submit it without any extra modifications.
- The idea was to also use REANA as a backend for running these workflows, so that you could submit using recast-cli, but this is tricky to implement and not very useful for a few reasons.
- This week was dedicated to learning about pyhf and cleaning up the project, which had gotten bloated with random files.
- The previous setup had recast-workflow nested in recast-cli, the command line interface for recast-workflow.
- Due to this setup, a setup script was required to set the $PYTHONPATH environment variable (which tells python where to search for modules).
- There were also many complexities with the packaging of recast-cli, since many non-.py files had to be included.
- This is why I decided to reorganize the project by copying in folders one at a time from recast-workflow. Currently, recast-cli has not been fully integrated yet, but recast-workflow only fails one test.
- I also changed recast-workflow to be an installable package. This allows for a non-command-line python interface to the package (e.g. `import recast_workflow`).
- It also makes installation quite simple: only `pip install -e .[test]` is needed to start developing.
- The goal by the end of this project is to build and upload everything, so anyone can install the package just by running `pip install recast-workflow`.
- With pyhf, I was able to convert a HistFitter analysis workspace to pyhf and do a basic fit.
- This was done for the MonoJet analysis to reproduce its results, as well as for a simple single-bin example. The GitHub repos are here and here.
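The single-bin case boils down to very little code. A minimal sketch (the counts are made up; `uncorrelated_background` is the helper name in recent pyhf releases, older versions call it `hepdata_like`):

```python
import pyhf

# One signal bin on top of a background with an uncertainty (numbers made up).
model = pyhf.simplemodels.uncorrelated_background(
    signal=[12.0], bkg=[50.0], bkg_uncertainty=[7.0]
)
observations = [52.0]
data = observations + model.config.auxdata  # observed counts plus auxiliary data

# Observed and expected CLs for signal strength mu = 1.
cls_obs, cls_exp = pyhf.infer.hypotest(
    1.0, data, model, test_stat="qtilde", return_expected=True
)
print(f"CLs_obs = {cls_obs:.3f}, CLs_exp = {cls_exp:.3f}")
```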
- I am ready to start pyhf integration into recast-workflow and focus on reworking recast-workflow through better organization and more unit tests
- I added back the command line interface component to recast-workflow
- Found at https://github.com/vladov3000/recast_workflow/tree/master/src/recast_workflow/cli
- This includes the `recast-wf make new` command, which creates a new workflow through an interactive input process with the user.
- The make new command also takes command line arguments that can be substituted for user input, so that the whole process of making a new workflow can be run without interaction.
- This is useful for a potential backend of a website that provides a UI for making these workflows, so the backend can just run the command with comprehensive arguments for any requests
- Furthermore, I reimplemented the inventory system (renamed from catalogue).
- Before, the 'catalogue' in recast-workflow was ambiguous. Now, the catalogue is just a list of all the possible workflows for the given filters that the user can pick from, while the inventory is a cache of these generated workflows that can be accessed from any directory on the user's machine.
- This puts the rework of recast-workflow at about 90% of what it was before the research fellowship began.
- All that is left is adding get_dir (which returns a directory with a run script for executing the workflow) and miscellaneous functions for the inventory
- Then, the newer features can be copy-pasted and tested more thoroughly with the new system.
- The subworkflow specifications for pyhf are complete (description.yml, common_inputs.py, workflow.yml), but the docker image is incomplete. The python script that is the entrypoint for the pyhf docker image currently does nothing with the given arguments.
- Currently, I have been implementing my own CL calculator using this paper. Pyhf uses the same approximate formulas, so I should get a better understanding of the pyhf source code and of how to find the CLs using pyhf.
- Then, I can finish the pyhf script next week, and the CL calculator will also be done by Wednesday next week.
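For reference, my reading of the asymptotic relations from the paper, written for the plain q_mu test statistic (pyhf's default q̃_mu adds a piecewise correction for negative signal strengths), with q_{mu,A} evaluated on the background-only Asimov dataset:

```latex
% Profile likelihood ratio and the q_mu test statistic
\lambda(\mu) = \frac{L(\mu, \hat{\hat{\theta}})}{L(\hat{\mu}, \hat{\theta})},
\qquad
q_\mu =
\begin{cases}
  -2 \ln \lambda(\mu) & \hat{\mu} \le \mu, \\
  0                   & \hat{\mu} > \mu.
\end{cases}

% Asymptotic tail probabilities and CLs
\mathrm{CL}_{s+b} = 1 - \Phi\!\left(\sqrt{q_\mu}\right),
\qquad
\mathrm{CL}_{b} = \Phi\!\left(\sqrt{q_{\mu,A}} - \sqrt{q_\mu}\right),
\qquad
\mathrm{CL}_{s} = \frac{\mathrm{CL}_{s+b}}{\mathrm{CL}_{b}}.
```

The 95% CL upper limit is then the value of mu at which CLs drops to 0.05.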
- I finished the rest of the 'base' command line features this week: `recast-wf inv add/getyml/getdir` now work.
- The inventory works as a global storage for workflows generated/used by recast-workflow.
- `add` will add a workflow to the inventory from a local path. This will be useful later when the scan build feature is copied back in; that feature would allow users to take a workflow too complex for recast-wf to make and convert it to a multistage workflow (allowing for parameter scans and parallelisation using REANA).
- `getyml` will print the workflow text or produce a yaml file with the workflow. Since recast-wf automatically replaces all references to other files in the main workflow file, the whole workflow can be stored in one yaml file.
- `getdir` will produce a directory that includes some extra files/folders to organize the process of running the workflow. For example, it includes a run script used as an alias so users don't have to type in the full command to run a yadage workflow with the right arguments (`yadage-run workdir $WORKFLOW inputs/input.yml -d initdir=$PWD/inputs` becomes `./run.sh`). All the extra files are stored in a templates folder that can easily be modified to include new files.
- There are still some complexities with pyhf which require more investigation. Specifically, how .yoda files translate to pyhf JSON histograms.
- I have also begun working on documentation by writing the introduction of the first tutorial (this will later be moved to a readthedocs.io page): https://github.com/vladov3000/recast_workflow/blob/master/TUTORIAL.md
- There will also be a summer workshop soon where we want to present recast-workflow, so we have been brainstorming more ideas for applications of recast.
- I have also been reading through a paper on statistics in physics and concurrently constructing a presentation that can be used to explain the analysis/statistics subworkflows well to new students/users.