Task overview
Always needed:
- version -> so the current workflow output gets allocated accordingly
- local-scheduler -> to keep track of tasks running at the moment; the other option would be a global scheduler
- year -> which specific config shall be used for running this (e.g. 2016|2017|2018 in Run 2)
- category -> if you want specific cuts applied to your selection and only these processed further
```shell
--version (some name) --local-scheduler --year 2018 --category SR0b
```
Show if task is done (0: only this one | 1: this one and the preceding one)
```shell
--print-status 1
```
Remove output (0: only of this task | a: all output)
```shell
--remove-output 0,a
```
If you have a law.LocalWorkflow | HTCondorWorkflow and want to use a dedicated one
```shell
--workflow (local|htcondor)
```
Helpful to kill local jobs still running somewhere
```shell
--cleanup-jobs
```
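Putting the general options together, a typical call might look like this (the task name 0b.SomeTask and the version string are placeholders, not tasks of this repository):

```shell
# placeholder task and version; combine the general flags above as needed
law run 0b.SomeTask --version my_dev_version --local-scheduler --year 2018 --category SR0b --print-status 1
```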
By default, this example uses a local scheduler, which - by definition - offers no visualization tools in the browser. If you want to see how the task tree is built and subsequently run, run luigid in a second terminal. This will start a central scheduler at localhost:8082 (the default address). To inform tasks (or rather workers) about the scheduler, either add --local-scheduler False to the law run command, or set the local-scheduler value in the [luigi_core] config section in the law.cfg file to False.
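For reference, the corresponding law.cfg entry would look roughly like this (a minimal sketch; adapt it to your existing config):

```ini
[luigi_core]
# use the central scheduler started by `luigid` (default address: localhost:8082)
# instead of the local one
local-scheduler = False
```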
Important note for the start: these are all example commands. The version string should always be the one matching your current development. Also, the namespace parameter (here 0b) defining the task namespace has to be set according to the analysis. All example tasks use 2018.
Collects input data from the specified directory (should end in /merged/ or /root/ if using the skimmer) and writes a fileset dictionary used for the CoffeaProcessing. This task also runs 0b.WriteDatasets as a dependency.
```shell
law run 0b.WriteDatasetPathDict --version final_run_2018 --local-scheduler --year 2018 --directory-path /pnfs/desy.de/cms/tier2/store/user/frengelk/skimmed_files_2018_new_analysis/ --category SR0b
```
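The fileset dictionary written here is essentially a mapping from dataset name to the list of input ROOT files, roughly of this shape (illustrative only; the exact keys, nesting and paths depend on the skimmer output):

```python
# illustrative structure only, not the exact output of 0b.WriteDatasetPathDict
fileset = {
    "MET": [
        "/pnfs/desy.de/cms/tier2/store/user/<user>/<skim>/MET/file_1.root",
        "/pnfs/desy.de/cms/tier2/store/user/<user>/<skim>/MET/file_2.root",
    ],
    "TTbar": [
        "/pnfs/desy.de/cms/tier2/store/user/<user>/<skim>/TTbar/file_1.root",
    ],
}
```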
Then collecting the generator weights and cutflows from the input
```shell
law run 0b.CollectInputData --version final_run_2018 --local-scheduler --year 2018 --category SR0b
```
Find out which mass points are included in signal files, needed later for independent running
```shell
law run 0b.CollectMasspoints --version final_run_2018 --local-scheduler --year 2018 --category SR0b
```
Calculating b-tagging scale factors for all events connected to the files specified in the dataset path dict
```shell
law run 0b.CalcBTagSF --version final_run_2018 --local-scheduler --year 2018 --category SR0b --workflow local
```
Example call for the CoffeaProcessor; for testing, call it with --workflow local and --debug to spawn just one test job
```shell
law run 0b.CoffeaProcessor --version final_run_2018 --local-scheduler --year 2018 --lepton-selection LeptonIncl --datasets-to-process '["MET"]' --category SR0b (--workflow local --debug)
```
Normally, we do the complete run directly, including the merging of all subprocesses. Giving all top-level datasets here together with all relevant channels (make sure these are orthogonal, they will be used as tree names) spawns Coffea tasks for each included subprocess as parallel HTCondor jobs, collects the input afterwards and merges it per dataset. The processes are defined in processes.py, the channels in analysis.py, the categories in categories.py.
```shell
law run 0b.MergeArrays --version final_run_2018 --local-scheduler --year 2018 --processor ArrayExporter --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY", "T5qqqqWW_1500_1000", "T5qqqqWW_2200_100", "T5qqqqWW_2200_800", "MET", "SingleMuon"]' --channel '["LeptonIncl"]' --category SR0b
```
Doing the plotting over all numpy arrays. For each defined variable this produces a png/pdf in linear and log scale.
- --merged requires MergeArrays as input; unmerged plotting may lead to ill-defined plots if datasets are not named properly
- --unblinded produces a ratio plot with the data points drawn (attention to CMS rules here)
- --signal plots the signal as a step histogram as well
```shell
law run 0b.ArrayPlotting --version final_run_2018 --local-scheduler --year 2018 --processor ArrayExporter --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY", "T5qqqqWW_1500_1000", "T5qqqqWW_2200_100", "MET", "SingleMuon"]' --channel '["LeptonIncl"]' --category SR0b --merged (--signal --unblinded)
```
Preparing the inputs for the ML tasks; requires MergeArrays. The normalisation is done on the full distribution, then the dataset is split into k-fold datasets.
```shell
law run 0b.ArrayNormalisation --version final_run_2018 --local-scheduler --year 2018 --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY", "T5qqqqWW", "MET", "SingleMuon"]' --category All_Lep --channel '["LeptonIncl"]'
law run 0b.CrossValidationPrep --version final_run_2018 --local-scheduler --year 2018 --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY", "T5qqqqWW"]' --category All_Lep --channel '["LeptonIncl"]' (--kfold 2)
```
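Conceptually, the k-fold preparation splits each dataset into k disjoint folds so that every event ends up in the validation set exactly once. A minimal sketch of the idea with scikit-learn (not the task's actual code; the feature array is a placeholder):

```python
import numpy as np
from sklearn.model_selection import KFold

events = np.random.rand(1000, 14)  # placeholder feature array

kfold = KFold(n_splits=2, shuffle=True, random_state=42)  # corresponds to --kfold 2
for i, (train_idx, val_idx) in enumerate(kfold.split(events)):
    train, val = events[train_idx], events[val_idx]
    # each fold is written out separately and trained on independently later
```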
DNN training with PyTorch of the k-fold separate networks simultaneously, specifying the training parameters explicitly; each configuration gets a different save path. The training is performed on the All_Lep category for the later background estimation:
```shell
law run 0b.PytorchCrossVal --version final_run_2018 --year 2018 --local-scheduler --category All_Lep --batch-size 64 --epochs 100 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125
```
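As a rough sketch of how such command-line hyperparameters typically map to PyTorch objects (the real network architecture, loss and the exact role of --gamma in this repository may differ; everything below is illustrative):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# placeholder data: 14 input features, 3 output classes (illustrative only)
features = torch.randn(1000, 14)
labels = torch.randint(0, 3, (1000,))
loader = DataLoader(TensorDataset(features, labels), batch_size=64, shuffle=True)  # --batch-size 64

# --n-nodes 128, --dropout 0.1: a simple fully connected classifier
model = nn.Sequential(
    nn.Linear(14, 128), nn.ReLU(), nn.Dropout(0.1),
    nn.Linear(128, 128), nn.ReLU(), nn.Dropout(0.1),
    nn.Linear(128, 3),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # --learning-rate 0.0001
# --gamma 1.125: mapped here to an LR scheduler purely for illustration; its actual role is analysis-specific
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=1.125)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):  # --epochs 100
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()
```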
For plotting the training history, e.g. accuracy and loss over all epochs, comparing train/val performance.
Printing the ROC curve for signal vs. background and the confusion matrix for the classification task, normalised in each row:
```shell
law run 0b.DNNEvaluationCrossValPlotting --version final_run_2018 --year 2018 --local-scheduler --category All_Lep --batch-size 64 --epochs 100 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125
```
Predicting scores for named processes
```shell
law run 0b.PredictDNNScores --version final_run_2017 --year 2017 --local-scheduler --category SR0b --batch-size 64 --dropout 0.1 --learning-rate 0.00005 --n-nodes 256 --gamma 1.125 --workflow local
```
Looking at the distribution of DNN scores, or split by process
```shell
law run 0b.DNNScorePlotting --version final_run_2018 --year 2018 --local-scheduler --category All_Lep --batch-size 64 --epochs 100 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125
law run 0b.DNNScorePerProcess --version final_run_2018 --local-scheduler --year 2018 --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY", "T5qqqqWW_1500_1000", "T5qqqqWW_2200_100", "MET", "SingleMuon", "EGamma"]' --batch-size 64 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125 --channel '["LeptonIncl"]' --category SR0b --QCD-estimate
```
Starting a grid search and, afterwards, finding the best training
```shell
law run 0b.OptimizeHyperParam --version final_run_2018 --local-scheduler --year 2018 --category All_Lep --nParameters 5 --epochs 10
law run 0b.FindBestTraining --version final_run_2018 --local-scheduler --year 2018 --category All_Lep --nParameters 5
```
Data-driven QCD estimation: first the QCD contribution is fitted, and then each yield in the SR is estimated
```shell
law run 0b.IterativeQCDFitting --version final_run_2017 --year 2018 --local-scheduler --batch-size 64 --dropout 0.1 --learning-rate 0.00005 --n-nodes 256 --gamma 1.125 --category SR0b --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY", "MET", "SingleMuon", "EGamma"]'
law run 0b.EstimateQCDinSR --version final_run_2018 --year 2018 --local-scheduler --batch-size 64 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125 --category SR0b --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY", "T5qqqqWW_2200_100", "MET", "SingleMuon", "EGamma"]'
```
Calculating the normalization factors for the EWK backgrounds, either including QCD or using the data-driven estimate
```shell
law run 0b.CalcNormFactors --version final_run_2018 --year 2018 --local-scheduler --category SR0b --batch-size 64 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125 --QCD-estimate
```
Constructing background event yields in all bins for the final fit
```shell
law run 0b.ConstructInferenceBins --version final_run_2018 --local-scheduler --year 2018 --category SR0b --batch-size 64 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125 --QCD-est
```
Estimate signal yields independently and per shift
```shell
law run 0b.YieldPerMasspoint --version final_run_2018 --local-scheduler --year 2018 --datasets-to-process '["T5qqqqWW"]' --lepton-selection LeptonIncl --category SR0b --batch-size 64 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.1 (--shift TotalUp)
```
Running the selection again for all systematics and comparing the yields to nominal:
```shell
law run 0b.MergeShiftArrays --version final_run_2018 --local-scheduler --year 2018 --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY"]' --channel '["LeptonIncl"]' --shifts '["TotalUp", "TotalDown", "systematic_shifts", "JERUp", "JERDown"]' --category SR0b
law run 0b.GetShiftedYields --version final_run_2018 --local-scheduler --year 2018 --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY"]' --channel '["LeptonIncl"]' --shifts '["systematic_shifts", "TotalUp", "TotalDown", "JERUp", "JERDown"]' --category SR0b --batch-size 64 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125
```
Creating datacards for every masspoint, including all nominal and shifted yields
```shell
law run 0b.DatacardPerMasspoint --version final_run_2018 --local-scheduler --year 2018 --datasets-to-process '["WJets", "SingleTop", "TTbar", "QCD", "Rare", "DY"]' --channel '["LeptonIncl"]' --shifts '["TotalUp", "TotalDown", "systematic_shifts", "JERUp", "JERDown"]' --batch-size 64 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125 --category SR0b --workflow local
```
Plotting all yields in the bins used for the final fit
```shell
law run 0b.AllFittedBinsPlotting --version final_run_2018 --year 2018 --local-scheduler --category SR0b --batch-size 64 --dropout 0.1 --learning-rate 0.0001 --n-nodes 128 --gamma 1.125 --unblinded --datasets-to-process '["T5qqqqWW_1500_1000", "T5qqqqWW_2200_100"]' --split-signal --do-shifts
```
Inference is run using combine
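For example, expected limits for a single mass point could be computed from the datacards produced above with a call like the following (the datacard name is just a placeholder; the exact combine method and options depend on the analysis setup):

```shell
# example only: blinded expected limits for one mass point
combine -M AsymptoticLimits --run blind datacard_T5qqqqWW_2200_100.txt
```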
Grouping the Coffea outputs and plotting the resulting cutflows:
```shell
law run 0b.GroupCoffea --version test_iso --local-scheduler --year 2017 --datasets-to-process '["SingleMuon", "SingleElectron", "MET"]'
law run 0b.CutflowPlotting --version test_iso --local-scheduler --year 2017 --datasets-to-process '["SingleMuon", "SingleElectron", "MET"]'
```
Compute how many events pass your cuts
```shell
law run 0b.ComputeEfficiencies --version test_mu_2017C --year 2017 --local-scheduler --channel Muon
```
Plotting integrated gradients for the trained model. For more explanation, see the Captum documentation or check the old (23.02.2022) SUSY 1lep ML presentation.
```shell
law run mj.PlotFeatureImportance --version testDNN --local-scheduler --batch-size 3000 --epochs 3 --dropout 0.5
law run mj.PlotNeuronConductance --version testDNN --local-scheduler --batch-size 1200 --epochs 1000 --dropout 0.5
```
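For reference, the core Captum call for integrated gradients looks roughly like this (a minimal sketch with a placeholder model and inputs, not the task's actual code):

```python
import torch
import torch.nn as nn
from captum.attr import IntegratedGradients

# placeholder network and inputs; the real model comes from the training tasks above
model = nn.Sequential(nn.Linear(14, 128), nn.ReLU(), nn.Linear(128, 3))
inputs = torch.randn(100, 14)

ig = IntegratedGradients(model)
# attribute each input feature's contribution to the score of class 0
attributions, delta = ig.attribute(inputs, target=0, return_convergence_delta=True)
print(attributions.mean(dim=0))  # average feature attribution across events
```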