Improve documentation. Organise runtest.sh variables. Update and sync configuration files. (AliceO2Group#455)
vkucera authored May 27, 2024
1 parent 4070176 commit 79180c3
Showing 15 changed files with 308 additions and 119 deletions.
173 changes: 140 additions & 33 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,33 +4,59 @@

## Introduction

The main purpose of the Run 3 validation framework is to provide a compact and flexible tool for validation of the
The Run 3 validation framework is a tool for easy execution, testing and validation of the Run 3 analysis code on large local samples.

Its features include

* simple specification of input datasets,
* simple configuration and activation of analysis tasks,
* easy generation of the O<sup>2</sup> command for complex workflow topologies,
* job parallelisation,
* output merging,
* error checking and reporting,
* specification of postprocessing.

It also provides tools for:

* post mortem debugging of failing jobs,
* comparison of histograms between ROOT files,
* visualisation of workflow dependencies,
* downloading of data samples from the Grid,
* maintenance of Git repositories and installations of aliBuild packages.

The original purpose of the Run 3 validation framework was to provide a compact and flexible tool for validation of the
[O<sup>2</sup>(Physics)](https://github.com/AliceO2Group/O2Physics) analysis framework by comparison of its output to its
[AliPhysics](https://github.com/alisw/AliPhysics) counterpart.
The general idea is to run the same analysis using AliPhysics and O<sup>2</sup>(Physics) and produce comparison plots.

However, it can be used without AliPhysics as well to run O<sup>2</sup> analyses locally, similar to running trains on AliHyperloop.
This makes it a convenient framework for local development, testing and debugging of O<sup>2</sup>(Physics) code.

## Overview

The validation framework is a general configurable platform that gives the user full control over what is done.
Its flexibility is enabled by strict separation of its specialised components into a system of bash scripts.
Its flexibility is enabled by strict separation of its specialised components into a system of Bash scripts.
Configuration is separate from execution code, input configuration is separate from task configuration, execution steps are separate from the main steering code.

* The steering script [`runtest.sh`](exec/runtest.sh) provides control parameters and an interface to the machinery for task execution.
* User provides configuration bash scripts which:
* The user provides configuration Bash scripts which:
* modify control parameters,
* produce modified configuration files,
* generate step scripts executed by the framework in the validation steps.

### Execution
## Execution

Execution code can be found in the [`exec`](exec) directory.

**The user should not touch anything in this directory!**

The steering script [`runtest.sh`](exec/runtest.sh) performs the following execution steps:

* Load input specification.
* Load tasks configuration.
* Print out input description.
* Clean before running. (activated by `DOCLEAN=1`)
* Deletes specified files.
* Deletes specified files (produced by previous runs).
* Generate list of input files.
* Modify the JSON file.
* Convert `AliESDs.root` to `AO2D.root`. (activated by `DOCONVERT=1`)
Expand All @@ -51,46 +77,107 @@ The steering script [`runtest.sh`](exec/runtest.sh) performs the following execu
* Executes the postprocessing step script.
* This step typically compares AliPhysics and O<sup>2</sup> output and produces plots.
* Clean after running. (activated by `DOCLEAN=1`)
* Deletes specified files.
* Deletes specified (temporary) files.
* Done
* This step is just a visual confirmation that all steps have finished without errors.

All steps are activated by default and some can be disabled individually by setting the respective activation variables to `0` in the user's task configuration.

### Configuration
## Configuration

The steering script [`runtest.sh`](exec/runtest.sh) can be executed with the following optional arguments:

```bash
bash [<path>/]runtest.sh [-h] [-i <input config>] [-t <task config>] [-d]
bash [<path>/]runtest.sh [-h] [-i <input-configuration>] [-t <task-configuration>] [-d]
```

`-h` Prints out the usage specification above.
`<input-configuration>` Input specification script. See [Input specification](#input-specification).

`-d` (Debug mode) Prints out more information about settings and execution.

`<input config>` Input specification
* Bash script that modifies input parameters.
* This script defines which data will be processed.
* Defaults to `config_input.sh` (in the current directory).

`<task config>` Task configuration
* Bash script that cleans the directory, deactivates steps, modifies the JSON file, generates step scripts.
* This script defines what the validation steps will do.
`<task-configuration>` Task configuration script. See [Task configuration](#task-configuration).

* Defaults to `config_tasks.sh` (in the current directory).
* Provides these mandatory functions:
* `Clean` Performs cleanup before and after running.
* `AdjustJson` Modifies the JSON file. (e.g. selection cut activation)
* `MakeScriptAli` Generates the AliPhysics step script.
* `MakeScriptO2` Generates the O<sup>2</sup> step script.
* `MakeScriptPostprocess` Generates the postprocessing step script. (e.g. plotting)
* The `Clean` function takes one argument: `$1=1` before running, `$1=2` after running.
* The AliPhysics and O<sup>2</sup> step scripts take two arguments: `$1="<input file>"`, `$2="<JSON file>"`.
* The postprocessing step script takes two arguments: `$1="<O2 output file>"`, `$2="<AliPhysics output file>"`.

Implementation of these configuration scripts is fully up to the user.
`-d` Debug mode. Prints out more information about settings and execution.

`-h` Help. Prints out the usage specification above.

### Input specification

The input specification script is a Bash script that sets input parameters used by the steering script.

**This script defines which data will be processed and how.**

These are the available input parameters and their default values:

* `INPUT_LABEL="nothing"` Input description
* `INPUT_DIR="$PWD"` Input directory
* `INPUT_FILES="AliESDs.root"` Input file pattern
* `INPUT_SYS="pp"` Collision system (`"pp"`, `"PbPb"`)
* `INPUT_RUN=2` LHC Run (2, 3, 5)
* `INPUT_IS_O2=0` Input files are in O<sup>2</sup> format.
* `INPUT_IS_MC=0` Input files are MC data.
* `INPUT_PARENT_MASK=""` Path replacement mask for the input directory of parent files in case of linked derived O<sup>2</sup> input. Set to `";"` if no replacement needed.
* `JSON="dpl-config.json"` O<sup>2</sup> device configuration

You can define several input datasets in this script and switch between them easily by setting the corresponding value of `INPUT_CASE`.

Other available parameters allow you to specify how many input files to process and how to parallelise the job execution.
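
A minimal sketch of what such an input specification script could look like is shown below. It only uses parameter names that appear in this repository's configuration files; the dataset labels, the `INPUT_BASE` path and the directory names are hypothetical placeholders, and the real scripts may be organised differently.

```bash
#!/bin/bash
# Sketch of an input specification script (all values are illustrative).

INPUT_CASE=1  # Selects one of the datasets defined below.

NFILESMAX=1         # Maximum number of processed input files
NFILESPERJOB_O2=1   # Number of input files per O2 job
NJOBSPARALLEL_O2=2  # Maximum number of simultaneously running O2 jobs

JSON="dpl-config.json"  # O2 device configuration

INPUT_BASE="/path/to/data"  # Hypothetical base directory of the datasets

case $INPUT_CASE in
  1)
    # Only the label and directory are set here; parameters that are not set
    # keep the defaults provided by the steering script.
    INPUT_LABEL="Run 2, pp, real data (hypothetical dataset)"
    INPUT_DIR="$INPUT_BASE/run2_pp_real"
    ;;
  2)
    INPUT_LABEL="Run 3, pp, MC, O2 format (hypothetical dataset)"
    INPUT_DIR="$INPUT_BASE/run3_pp_mc"
    INPUT_FILES="AO2D.root"
    INPUT_RUN=3
    INPUT_IS_O2=1
    INPUT_IS_MC=1
    ;;
esac
```

Switching to the second dataset then only requires changing `INPUT_CASE=1` to `INPUT_CASE=2`.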

### Task configuration

Dummy examples can be found in: [`config/config_input_dummy.sh`](config/config_input_dummy.sh), [`config/config_tasks_dummy.sh`](config/config_tasks_dummy.sh).
The task configuration script is a Bash script that modifies the task parameters used by the steering script.

**This script defines which validation steps will run and what they will do.**

* It cleans the directory, deactivates incompatible steps, modifies the JSON file, generates step scripts.
* The body of the script has to provide these mandatory functions:
* `Clean` Performs cleanup before and after running.
* `AdjustJson` Modifies the JSON file (e.g. selection cut activation).
* `MakeScriptAli` Generates the AliPhysics step script `script_ali.sh`.
* `MakeScriptO2` Generates the O<sup>2</sup> step script `script_o2.sh`.
* `MakeScriptPostprocess` Generates the postprocessing step script `script_postprocess.sh` (e.g. plotting).
* The `Clean` function takes one argument: `$1=1` for cleaning before running, `$1=2` for cleaning after running.
* The AliPhysics and O<sup>2</sup> step scripts take two arguments: `$1="<input-file>"`, `$2="<JSON-file>"`.
* The postprocessing step script takes two arguments: `$1="<O2-output-file>"`, `$2="<AliPhysics-output-file>"`.
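
The structure described above can be sketched as the following skeleton. It is an illustration under stated assumptions, not the actual configuration shipped with the framework: the function bodies are placeholders, the files deleted in `Clean` are hypothetical, and the generated step scripts only echo their arguments instead of running AliPhysics or O<sup>2</sup>.

```bash
#!/bin/bash
# Skeleton of a task configuration script (function bodies are placeholders).

# Activation of the validation steps
DOCLEAN=1
DOCONVERT=1
DOALI=1
DOO2=1
DOPOSTPROCESS=1

# Disable steps that are incompatible with O2 input.
[ "${INPUT_IS_O2:-0}" -eq 1 ] && { DOCONVERT=0; DOALI=0; }

# Clean before ($1=1) and after ($1=2) running.
function Clean {
  if [ "$1" -eq 1 ]; then
    # Hypothetical choice: remove step scripts left over from a previous run.
    rm -f script_ali.sh script_o2.sh script_postprocess.sh
  elif [ "$1" -eq 2 ]; then
    # Hypothetical temporary file
    rm -f tmp_example.txt
  fi
  return 0
}

# Modify the JSON file (e.g. selection cut activation).
function AdjustJson {
  :  # Placeholder: nothing to adjust in this sketch.
}

# Generate the AliPhysics step script.
function MakeScriptAli {
  cat << 'EOF' > script_ali.sh
#!/bin/bash
# $1 = input file, $2 = JSON file
echo "AliPhysics step: input=$1 json=$2"
EOF
}

# Generate the O2 step script.
function MakeScriptO2 {
  cat << 'EOF' > script_o2.sh
#!/bin/bash
# $1 = input file, $2 = JSON file
echo "O2 step: input=$1 json=$2"
EOF
}

# Generate the postprocessing step script.
function MakeScriptPostprocess {
  cat << 'EOF' > script_postprocess.sh
#!/bin/bash
# $1 = O2 output file, $2 = AliPhysics output file
echo "Postprocessing step: o2=$1 ali=$2"
EOF
}
```

The quoted `'EOF'` delimiter keeps `$1` and `$2` unexpanded at generation time, so they are evaluated only when the step script itself is executed.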

Configuration that should be defined in the task configuration includes:

* Deactivation of the validation steps (`DOCLEAN`, `DOCONVERT`, `DOALI`, `DOO2`, `DOPOSTPROCESS`)
* Customisation of the commands loading the AliPhysics, O2Physics and postprocessing environments (`ENV_ALI`, `ENV_O2`, `ENV_POST`). By default the latest builds of AliPhysics, O2Physics and ROOT are used, respectively.
* Any other parameters related to "what should run and how", e.g. `SAVETREES`, `MAKE_GRAPH`, `USEALIEVCUTS`

### Workflow specification

The full O<sup>2</sup> command, executed in the O<sup>2</sup> step script to run the activated O<sup>2</sup> workflows, is generated in the `MakeScriptO2` function using a dedicated Python script [`make_command_o2.py`](exec/make_command_o2.py).
This script generates the command using a **YAML database (`workflows.yml`) that specifies workflow options and how workflows depend on each other**.
You can consider a workflow specification in this database to be the equivalent of a wagon definition on AliHyperloop, including the definition of the wagon name, the workflow name, the dependencies and the derived data. The main difference is that the device configuration is stored in the JSON file.

The workflow database has two sections: `options` and `workflows`.
The `options` section defines `global` options, used once at the end of the command, and `local` options, used for every workflow.
The `workflows` section contains the "wagon" definitions.
The available parameters are:

* `executable` Workflow command, if different from the "wagon" name
* This allows you to define multiple wagons for the same workflow.
* `dependencies` **Direct** dependencies (i.e. other wagons **directly** needed to run this wagon)
* Allowed formats: string, list of strings
* Direct dependencies are wagons that produce tables consumed by this wagon. You can figure them out using the [`find_dependencies.py`](https://github.com/AliceO2Group/O2Physics/blob/master/Scripts/find_dependencies.py) script in O2Physics.
* `requires_mc` Boolean parameter to specify whether the workflow can only run on MC
* `options` Command line options. (Currently not supported on AliHyperloop.)
* Allowed formats: string, list of strings, dictionary with keys `default`, `real`, `mc`
* `tables` Descriptions of output tables to be saved as trees
* Allowed formats: string, list of strings, dictionary with keys `default`, `real`, `mc`

The `make_command_o2.py` script allows you to generate a topology graph to visualise the dependencies defined in the database, using [Graphviz](https://graphviz.org/).
Generation of the topology graph can be conveniently enabled with `MAKE_GRAPH=1` in the task configuration.

Dummy examples of the configuration files can be found in:

* [`config/config_input_dummy.sh`](config/config_input_dummy.sh),
* [`config/config_tasks_dummy.sh`](config/config_tasks_dummy.sh),
* [`config/workflows_dummy.yml`](config/workflows_dummy.yml).

## Preparation

Expand Down Expand Up @@ -136,7 +223,7 @@ sudo apt install parallel

Now you are ready to run the validation code.

**Make sure that your bash environment is clean!
**Make sure that your Bash environment is clean!
Do not load ROOT, AliPhysics, O<sup>2</sup>, O<sup>2</sup>Physics or any other aliBuild package environment before running the framework!**

Enter any directory and execute the steering script `runtest.sh`.
Expand All @@ -156,12 +243,25 @@ If any step fails, the script will display an error message and you should look

If the main log file of a validation step mentions "parallel: This job failed:", inspect the respective log file in the directory of the corresponding job.

## How to add a new workflow

To add a new workflow in the framework configuration, you need to follow these steps.

* Add the workflow in the [task configuration](#task-configuration):
* Add the activation switch: `DOO2_...=0 # name of the workflow (without o2-analysis)`.
* Add the application of the switch in the `MakeScriptO2` function: `[ $DOO2_... -eq 1 ] && WORKFLOWS+=" o2-analysis-..."`.
* If needed, add lines in the `AdjustJson` function to modify the JSON configuration.
* Add the [workflow specification](#workflow-specification) in the workflow database:
* See the dummy example `o2-analysis-workflow` for the full list of options.
* Add the device configuration in the default JSON file.
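
The first two steps can be sketched as follows. The workflow names `example-a` and `example-b` are hypothetical, and the sketch only builds and prints the list of activated workflows; in the framework, `MakeScriptO2` passes this list on to `make_command_o2.py`, whose invocation is not reproduced here.

```bash
#!/bin/bash
# Sketch: activation switches and their application (hypothetical workflow names).

# Activation switches
DOO2_EXAMPLE_A=1  # example-a
DOO2_EXAMPLE_B=0  # example-b

function MakeScriptO2 {
  WORKFLOWS=""
  # Append each activated workflow to the space-separated list.
  [ $DOO2_EXAMPLE_A -eq 1 ] && WORKFLOWS+=" o2-analysis-example-a"
  [ $DOO2_EXAMPLE_B -eq 1 ] && WORKFLOWS+=" o2-analysis-example-b"
  echo "Activated workflows:$WORKFLOWS"
}

MakeScriptO2
```

With the switches set as above, only `o2-analysis-example-a` ends up in `WORKFLOWS`.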

## Job debugging

If you run many parallelised jobs and some of them don't finish successfully, you can make use of the debugging script [`debug.sh`](exec/debug.sh) in the [`exec`](exec) directory
which can help you figure out what went wrong, where and why.

You can execute the script from the current working directory using the following syntax (options can be combined):

```bash
bash [<path>/]debug.sh [-h] [-t TYPE] [-b [-u]] [-f] [-w] [-e]
```
Expand All @@ -180,10 +280,16 @@ bash [<path>/]debug.sh [-h] [-t TYPE] [-b [-u]] [-f] [-w] [-e]
`-e` Show errors (for all jobs).
## Heavy-flavour analyses
## Specific analyses
### Heavy-flavour analyses
Enter the [`codeHF`](codeHF) directory and see the [`README`](codeHF/README.md).
### Jet analyses
Enter the [`codeJE`](codeJE) directory.
## Keep your repositories and installations up to date and clean
With the ongoing fast development, it can easily happen that updating the O<sup>2</sup>Physics part of the validation
Expand All @@ -197,8 +303,9 @@ This includes updating alidist, AliPhysics, O<sup>2</sup>(Physics), and this Run
as well as re-building your AliPhysics and O<sup>2</sup>(Physics) installations via aliBuild and deleting obsolete builds.
You can execute the script from any directory on your system using the following syntax:
```bash
python <path to the Run3Analysisvalidation directory>/exec/update_packages.py [-h] [-d] [-l] [-c] database
python [<path>/]exec/update_packages.py [-h] [-d] [-l] [-c] database
```
optional arguments:
Expand Down Expand Up @@ -245,7 +352,7 @@ It is possible to check your code locally (before even committing or pushing):
### Space checker
```bash
bash <path to the Run3Analysisvalidation directory>/exec/check_spaces.sh
bash [<path>/]exec/check_spaces.sh
```
### [ClangFormat](https://clang.llvm.org/docs/ClangFormat.html)
Expand All @@ -254,7 +361,7 @@ bash <path to the Run3Analysisvalidation directory>/exec/check_spaces.sh
clang-format -style=file -i <file>
```
### [MegaLinter](https://oxsecurity.github.io/megalinter/latest/mega-linter-runner/)
### [MegaLinter](http://megalinter.io/latest/mega-linter-runner/)
```bash
npx mega-linter-runner
Expand Down
10 changes: 0 additions & 10 deletions codeHF/README.md
Expand Up @@ -55,13 +55,3 @@ The postprocessing step produces several plots `comparison_histos_(...).pdf`, `M
To confirm that the output of the default settings looks as expected, compare the produced plots with their reference counterparts `(...)_ref.pdf`.

The complete list of commit hashes used to produce the reference plots can be found in `versions_ref.txt`.

## Add a new workflow

- Add the workflow in the task configuration ([`config_tasks.sh`](config_tasks.sh)):
- Add the activation switch: `DOO2_...=0 # name of the workflow (without o2-analysis)`.
- Add the application of the switch in the `MakeScriptO2` function: `[ $DOO2_... -eq 1 ] && WORKFLOWS+=" o2-analysis-..."`.
- If needed, add lines in the `AdjustJson` function to modify the JSON configuration.
- Add the workflow specification in the workflow database ([`workflows.yml`](workflows.yml)):
- See the dummy example `o2-analysis-workflow` for the full list of options.
- Add the device configuration in the default JSON file ([`dpl-config_run3.json`](dpl-config_run3.json)).
33 changes: 17 additions & 16 deletions codeHF/config_input.sh
Expand Up @@ -8,29 +8,30 @@ INPUT_CASE=2 # Input case

NFILESMAX=1 # Maximum number of processed input files. (Set to -0 to process all; to -N to process all but the last N files.)

# Number of input files per job (Automatic optimisation on if < 1.)
# Number of input files per job. (Will be automatically optimised if set to 0.)
NFILESPERJOB_CONVERT=0 # Conversion
NFILESPERJOB_ALI=0 # AliPhysics
NFILESPERJOB_O2=1 # O2

# Maximum number of simultaneously running O2 jobs
# Maximum number of simultaneously running O2 jobs. (Adjust it based on available memory.)
NJOBSPARALLEL_O2=$(python3 -c "print(min(10, round($(nproc) / 2)))")

JSONRUN3="dpl-config_run3.json" # Run 3 tasks parameters
# Run 5 tasks parameters for open HF study
JSONRUN5_HF="dpl-config_run5_hf.json"
# Run 5 tasks parameters for onia studies:
# J/psi and X (higher pt cut on 2-prong decay tracks and no DCA cut on single track)
JSONRUN5_ONIAX="dpl-config_run5_oniaX.json"
JSON="$JSONRUN3"

# Default settings:
# INPUT_FILES="AliESDs.root"
# INPUT_SYS="pp"
# INPUT_RUN=2
# INPUT_IS_O2=0
# INPUT_IS_MC=0
# JSON="$JSONRUN3"
# INPUT_LABEL="nothing" # Input description
# INPUT_DIR="$PWD" # Input directory
# INPUT_FILES="AliESDs.root" # Input file pattern
# INPUT_SYS="pp" # Collision system ("pp", "PbPb")
# INPUT_RUN=2 # LHC Run (2, 3, 5)
# INPUT_IS_O2=0 # Input files are in O2 format.
# INPUT_IS_MC=0 # Input files are MC data.
# INPUT_PARENT_MASK="" # Path replacement mask for the input directory of parent files in case of linked derived O2 input. Set to ";" if no replacement needed.
# JSON="dpl-config.json" # O2 device configuration

# O2 device configuration
JSONRUN3="dpl-config_run3.json" # Run 3
# JSONRUN5_HF="dpl-config_run5_hf.json" # Run 5, open HF
# JSONRUN5_ONIAX="dpl-config_run5_oniaX.json" # Run 5, onia (J/psi and X), (higher pt cut on 2-prong decay tracks and no DCA cut on single track)
JSON="$JSONRUN3"

INPUT_BASE="/data2/data" # alicecerno2

Expand Down
7 changes: 3 additions & 4 deletions codeHF/config_tasks.sh
Expand Up @@ -13,7 +13,7 @@

####################################################################################################

# Here you can select the AliPhysics and O2Physics branches to load.
# Here you can select the AliPhysics and O2Physics Git branches to load. (You need to have them built with aliBuild.)
# BRANCH_ALI="master"
# ENV_ALI="alienv setenv AliPhysics/latest-${BRANCH_ALI}-o2 -c"
# BRANCH_O2="master"
Expand All @@ -29,9 +29,8 @@ DOPOSTPROCESS=1 # Run output postprocessing. (Comparison plots. Requires DOA
# Disable incompatible steps.
[ "$INPUT_IS_O2" -eq 1 ] && { DOCONVERT=0; DOALI=0; }

# O2 database
DATABASE_O2="workflows.yml"
MAKE_GRAPH=0 # Make topology graph.
DATABASE_O2="workflows.yml" # Workflow specification database
MAKE_GRAPH=0 # Make topology graph.

# Activation of O2 workflows
# Trigger selection
Expand Down
1 change: 1 addition & 0 deletions codeHF/dpl-config_run3.json
Expand Up @@ -6,6 +6,7 @@
"start-value-enumeration": "0",
"end-value-enumeration": "-1",
"step-value-enumeration": "1",
"aod-file-private": "@list_o2.txt",
"aod-file": "@list_o2.txt",
"aod-parent-base-path-replacement": "PARENT_PATH_MASK",
"aod-parent-access-level": 1
Expand Down
2 changes: 1 addition & 1 deletion codeHF/workflows.yml
Expand Up @@ -20,7 +20,7 @@ workflows:
# default: ""
# real: ""
# mc: "--doMC"
tables: [] # descriptions of tables to be saved in the output tree (format: str, list), see more detailed format below
tables: [] # descriptions of output tables to be saved as trees (format: str, list), see more detailed format below
# tables:
# default: []
# real: []
Expand Down