Task overview
Always needed:
- version -> so the current workflow output gets allocated accordingly
- local-scheduler -> to keep track of tasks running at the moment; the other option would be a global scheduler
- year -> which specific config shall be used for running this (e.g. 2016|2017|2018 in Run 2)
- category -> if you want specific cuts applied to your selection and only these processed further
```shell
--version (some name) --local-scheduler --year 2018 --category SR0b
```
Show if task is done (0: only this one | 1: this one and the preceding one)
```shell
--print-status 1
```
Remove output (0: only of this task | a: all output)
```shell
--remove-output 0,a
```
If you have a law.LocalWorkflow | HTCondorWorkflow and want to use a dedicated one
```shell
--workflow (local|htcondor)
```
Helpful to kill local jobs still running somewhere
```shell
--cleanup-jobs
```
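Putting the general options together, a typical call might look like this (the task name 0b.SomeTask and the version string are placeholders, not tasks of this repository):

```shell
# placeholder task and version; combine the general flags above as needed
law run 0b.SomeTask --version my_dev_version --local-scheduler --year 2018 --category SR0b --print-status 1
```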
By default, this example uses a local scheduler, which - by definition - offers no visualization tools in the browser. If you want to see how the task tree is built and subsequently run, run luigid in a second terminal. This will start a central scheduler at localhost:8082 (the default address). To inform tasks (or rather workers) about the scheduler, either add --local-scheduler False to the law run command, or set the local-scheduler value in the [luigi_core] config section in the law.cfg file to False.
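For reference, the corresponding law.cfg entry would look roughly like this (a minimal sketch; adapt it to your existing config):

```ini
[luigi_core]
# use the central scheduler started by `luigid` (default address: localhost:8082)
# instead of the local one
local-scheduler = False
```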
Important note for the start: these are all example commands. The version string should always be the one matching your current development. Also, the namespace parameter (here 0b) defining the task namespace has to be set according to the analysis. All example tasks use 2018.
Collects input data from the specified directory (should end in /merged/ or /root/ if using the skimmer) and writes a fileset dictionary used for the CoffeaProcessing. This task also runs 0b.WriteDatasets as a dependency.
```shell
law run 0b.WriteDatasetPathDict --version final_run_2018 --local-scheduler --year 2018 --directory-path /pnfs/desy.de/cms/tier2/store/user/frengelk/skimmed_files_2018_new_analysis/ --category SR0b
```
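The fileset dictionary written here is essentially a mapping from dataset name to the list of input ROOT files, roughly of this shape (illustrative only; the exact keys, nesting and paths depend on the skimmer output):

```python
# illustrative structure only, not the exact output of 0b.WriteDatasetPathDict
fileset = {
    "MET": [
        "/pnfs/desy.de/cms/tier2/store/user/<user>/<skim>/MET/file_1.root",
        "/pnfs/desy.de/cms/tier2/store/user/<user>/<skim>/MET/file_2.root",
    ],
    "TTbar": [
        "/pnfs/desy.de/cms/tier2/store/user/<user>/<skim>/TTbar/file_1.root",
    ],
}
```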
Then collecting the generator weights and cutflows from the input
```shell
law run 0b.CollectInputData --version final_run_2018 --local-scheduler --year 2018 --category SR0b
```
Find out which mass points are included in signal files, needed later for independent running
```shell
law run 0b.CollectMasspoints --version final_run_2018 --local-scheduler --year 2018 --category SR0b
```
Calculating b-tagging scale factors for all events connected to the files specified in the dataset path dict
```shell
law run 0b.CalcBTagSF --version final_run_2018 --local-scheduler --year 2018 --category SR0b --workflow local
```
Example call for the CoffeaProcessor; for testing, call it with --workflow local and --debug to spawn just one test job
```shell
law run 0b.CoffeaProcessor --version final_run_2018 --local-scheduler --year 2018 --lepton-selection LeptonIncl --datasets-to-process '["MET"]' --category SR0b (--workflow local --debug)
```
Normally, we do the complete run directly, including the merging of all subprocesses. Giving all top-level datasets here together with all relevant channels (make sure these are orthogonal, they will be used as tree names) spawns Coffea tasks for each included subprocess as parallel HTCondor jobs, collects the input afterwards and merges it per dataset. The processes are defined in processes.py, the channels in analysis.py, the categories in categories.py.
```shell
law run 0b.MergeArrays --version final_run_2018 --local-scheduler --year 2018 --processor ArrayExporter --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY", "T5qqqqWW_1500_1000", "T5qqqqWW_2200_100", "T5qqqqWW_2200_800", "MET", "SingleMuon"]' --channel '["LeptonIncl"]' --category SR0b
```
Doing the plotting over all numpy arrays. For each defined variable this produces a png/pdf in linear and log scale.
- --merged requires MergeArrays as input; unmerged plotting may lead to ill-defined plots if datasets are not named properly
- --unblinded produces a ratio plot with the data points drawn (attention to CMS rules here)
- --signal plots the signal as a step histogram as well
```shell
law run 0b.ArrayPlotting --version final_run_2018 --local-scheduler --year 2018 --processor ArrayExporter --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY", "T5qqqqWW_1500_1000", "T5qqqqWW_2200_100", "MET", "SingleMuon"]' --channel '["LeptonIncl"]' --category SR0b --merged (--signal --unblinded)
```
Preparing the inputs for the ML tasks; requires MergeArrays. The normalisation is done on the full distribution, then the dataset is split into k-fold datasets.
```shell
law run 0b.ArrayNormalisation --version final_run_2018 --local-scheduler --year 2018 --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY", "T5qqqqWW", "MET", "SingleMuon"]' --category All_Lep --channel '["LeptonIncl"]'
law run 0b.CrossValidationPrep --version final_run_2018 --local-scheduler --year 2018 --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY", "T5qqqqWW"]' --category All_Lep --channel '["LeptonIncl"]' (--kfold 2)
```
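Conceptually, the k-fold preparation splits each dataset into k disjoint folds so that every event ends up in the validation set exactly once. A minimal sketch of the idea with scikit-learn (not the task's actual code; the feature array is a placeholder):

```python
import numpy as np
from sklearn.model_selection import KFold

events = np.random.rand(1000, 14)  # placeholder feature array

kfold = KFold(n_splits=2, shuffle=True, random_state=42)  # corresponds to --kfold 2
for i, (train_idx, val_idx) in enumerate(kfold.split(events)):
    train, val = events[train_idx], events[val_idx]
    # each fold is written out separately and trained on independently later
```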
DNN training with PyTorch of the k-fold separate networks simultaneously, specifying the training parameters explicitly; each configuration gets a different save path. The training is performed on the All_Lep category for the later background estimation:
```shell
law run 0b.PytorchCrossVal --version final_run_2018 --year 2018 --local-scheduler --category All_Lep --batch-size 64 --epochs 100 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125
```
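As a rough sketch of how such command-line hyperparameters typically map to PyTorch objects (the real network architecture, loss and the exact role of --gamma in this repository may differ; everything below is illustrative):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# placeholder data: 14 input features, 3 output classes (illustrative only)
features = torch.randn(1000, 14)
labels = torch.randint(0, 3, (1000,))
loader = DataLoader(TensorDataset(features, labels), batch_size=64, shuffle=True)  # --batch-size 64

# --n-nodes 128, --dropout 0.1: a simple fully connected classifier
model = nn.Sequential(
    nn.Linear(14, 128), nn.ReLU(), nn.Dropout(0.1),
    nn.Linear(128, 128), nn.ReLU(), nn.Dropout(0.1),
    nn.Linear(128, 3),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # --learning-rate 0.0001
# --gamma 1.125: mapped here to an LR scheduler purely for illustration; its actual role is analysis-specific
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=1.125)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):  # --epochs 100
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```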
For plotting the training history, e.g. accuracy and loss over all epochs, comparing train/val performance.
Printing the ROC curve for signal vs. background and the confusion matrix for the classification task, normalised in each row:
```shell
law run 0b.DNNEvaluationCrossValPlotting --version final_run_2018 --year 2018 --local-scheduler --category All_Lep --batch-size 64 --epochs 100 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125
```
Predicting scores for named processes
```shell
law run 0b.PredictDNNScores --version final_run_2017 --year 2017 --local-scheduler --category SR0b --batch-size 64 --dropout 0.1 --learning-rate 0.00005 --n-nodes 256 --gamma 1.125 --workflow local
```
Looking at the distribution of DNN scores, or split by process
```shell
law run 0b.DNNScorePlotting --version final_run_2018 --year 2018 --local-scheduler --category All_Lep --batch-size 64 --epochs 100 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125
law run 0b.DNNScorePerProcess --version final_run_2018 --local-scheduler --year 2018 --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY", "T5qqqqWW_1500_1000", "T5qqqqWW_2200_100", "MET", "SingleMuon", "EGamma"]' --batch-size 64 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125 --channel '["LeptonIncl"]' --category SR0b --QCD-estimate
```
Starting a grid search and, afterwards, finding the best training
```shell
law run 0b.OptimizeHyperParam --version final_run_2018 --local-scheduler --year 2018 --category All_Lep --nParameters 5 --epochs 10
law run 0b.FindBestTraining --version final_run_2018 --local-scheduler --year 2018 --category All_Lep --nParameters 5
```
Data-driven QCD estimation: first the QCD contribution is fitted, and then each yield in the SR is estimated
```shell
law run 0b.IterativeQCDFitting --version final_run_2017 --year 2018 --local-scheduler --batch-size 64 --dropout 0.1 --learning-rate 0.00005 --n-nodes 256 --gamma 1.125 --category SR0b --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY", "MET", "SingleMuon", "EGamma"]'
law run 0b.EstimateQCDinSR --version final_run_2018 --year 2018 --local-scheduler --batch-size 64 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125 --category SR0b --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY", "T5qqqqWW_2200_100", "MET", "SingleMuon", "EGamma"]'
```
Calculating the normalization factors for the EWK backgrounds, either including QCD or using the data-driven estimate
```shell
law run 0b.CalcNormFactors --version final_run_2018 --year 2018 --local-scheduler --category SR0b --batch-size 64 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125 --QCD-estimate
```
Constructing background event yields in all bins for the final fit
```shell
law run 0b.ConstructInferenceBins --version final_run_2018 --local-scheduler --year 2018 --category SR0b --batch-size 64 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125 --QCD-est
```
Estimate signal yields independently and per shift
```shell
law run 0b.YieldPerMasspoint --version final_run_2018 --local-scheduler --year 2018 --datasets-to-process '["T5qqqqWW"]' --lepton-selection LeptonIncl --category SR0b --batch-size 64 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.1 (--shift TotalUp)
```
Running the selection again for all systematics and comparing the yields to nominal:
```shell
law run 0b.MergeShiftArrays --version final_run_2018 --local-scheduler --year 2018 --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY"]' --channel '["LeptonIncl"]' --shifts '["TotalUp", "TotalDown", "systematic_shifts", "JERUp", "JERDown"]' --category SR0b
law run 0b.GetShiftedYields --version final_run_2018 --local-scheduler --year 2018 --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY"]' --channel '["LeptonIncl"]' --shifts '["systematic_shifts", "TotalUp", "TotalDown", "JERUp", "JERDown"]' --category SR0b --batch-size 64 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125
```
Creating datacards for every masspoint, including all nominal and shifted yields
```shell
law run 0b.DatacardPerMasspoint --version final_run_2018 --local-scheduler --year 2018 --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY"]' --channel '["LeptonIncl"]' --shifts '["TotalUp", "TotalDown", "systematic_shifts", "JERUp", "JERDown"]' --batch-size 64 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125 --category SR0b --workflow local
```
Plotting all yields in the bins used for the final fit
```shell
law run 0b.AllFittedBinsPlotting --version final_run_2018 --year 2018 --local-scheduler --category SR0b --batch-size 64 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125 --unblinded --datasets-to-process '["T5qqqqWW_1500_1000", "T5qqqqWW_2200_100"]' --split-signal --do-shifts
```
Inference is run using combine
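For example, expected limits for a single mass point could be computed from the datacards produced above with a call like the following (the datacard name is just a placeholder; the exact combine method and options depend on the analysis setup):

```shell
# example only: blinded expected limits for one mass point
combine -M AsymptoticLimits --run blind datacard_T5qqqqWW_2200_100.txt
```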
Grouping the Coffea outputs and plotting the resulting cutflows:
```shell
law run 0b.GroupCoffea --version test_iso --local-scheduler --year 2017 --datasets-to-process '["SingleMuon", "SingleElectron", "MET"]'
law run 0b.CutflowPlotting --version test_iso --local-scheduler --year 2017 --datasets-to-process '["SingleMuon", "SingleElectron", "MET"]'
```
Compute how many events pass your cuts
```shell
law run 0b.ComputeEfficiencies --version test_mu_2017C --year 2017 --local-scheduler --channel Muon
```
Plotting integrated gradients for the trained model. For more explanation, see the Captum documentation or check the old (23.02.2022) SUSY 1lep ML presentation.
```shell
law run mj.PlotFeatureImportance --version testDNN --local-scheduler --batch-size 3000 --epochs 3 --dropout 0.5
law run mj.PlotNeuronConductance --version testDNN --local-scheduler --batch-size 1200 --epochs 1000 --dropout 0.5
```
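For reference, the core Captum call for integrated gradients looks roughly like this (a minimal sketch with a placeholder model and inputs, not the task's actual code):

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# placeholder network and inputs; the real model comes from the training tasks above
model = nn.Sequential(nn.Linear(14, 128), nn.ReLU(), nn.Linear(128, 3))
inputs = torch.randn(100, 14)

ig = IntegratedGradients(model)
# attribute each input feature's contribution to the score of class 0
attributions, delta = ig.attribute(inputs, target=0, return_convergence_delta=True)
print(attributions.mean(dim=0))  # average feature attribution across events
```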