Scripts to ease the reduction of MC data on the LST cluster at La Palma. With this package, the analysis/creation of R1/DL0/DL1/DL2/IRFs can be orchestrated.
Contact:
Thomas Vuillaume, thomas.vuillaume [at] lapp.in2p3.fr
Enrique Garcia, garcia [at] lapp.in2p3.fr
Lukas Nickel, lukas.nickel [at] tu-dortmund.de
If lstMCpipe was used for your analysis, please cite:
```
@misc{garcia2022lstmcpipe,
      title={The lstMCpipe library},
      author={Enrique Garcia and Thomas Vuillaume and Lukas Nickel},
      year={2022},
      eprint={2212.00120},
      archivePrefix={arXiv},
      primaryClass={astro-ph.IM}
}
```
in addition to the exact lstMCpipe version used from https://doi.org/10.5281/zenodo.6460727
You may also want to include the config file with your published code for reproducibility.
As a user:
```
wget https://raw.githubusercontent.com/cta-observatory/lstmcpipe/master/environment.yml
conda env create -f environment.yml
conda activate lstmcpipe
pip install lstmcpipe
```
This will set up a new environment with lstchain and the other needed tools in supported versions. If you already have an lstchain conda environment, you may simply activate it and install lstmcpipe there using `pip install lstmcpipe`.
HIPERTA (referred to as rta in the following) support is built in, but no installation instructions can be provided as of now.
Alternatively, you can install lstmcpipe in your own environment to use different versions of the analysis pipelines. WARNING: due to changing APIs and data models, we cannot support versions other than the ones specified in the environment.
As a developer:
```
git clone https://github.com/cta-observatory/lstmcpipe.git
cd lstmcpipe
conda env create -n lstmcpipe_dev -f environment.yml
conda activate lstmcpipe_dev
pip install -e .
pre-commit install
```
This will set up a pre-commit hook: provided you are in the right environment, it will run black on the files you are about to commit and reformat them if needed. (You need to stage the changes again after that.) This ensures the code formatting follows our guidelines and reduces the work of dealing with the code checker in the CI.
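As an illustration, a typical flow when the hook reformats a file looks like this (plain git usage, not lstmcpipe-specific):

```
git commit -m "fix stage ordering"   # hook runs black; commit is rejected if files were reformatted
git add -u                           # stage the reformatted files again
git commit -m "fix stage ordering"   # hook passes, commit succeeds
```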
You may find the list of already-run productions in the documentation. Please check this list to make sure the request you are about to make does not already exist!
As an LST member, you may require an MC analysis with a specific configuration, for example to later analyse a specific source with tuned MC parameters.
To do so, please:

- Open a pull request from your fork into lstMCpipe, adding the desired configuration in a new directory named `date_ProdID` in `production_configs`. You may have a look at `production_configs/template_prod` as an example.
- Add a descriptive production ID (e.g. `src***_psf_tuned`) to your directory and configuration.
- The requested config must contain (see the sketch after this list):
   - a lstchain config file (please provide an exhaustive config; it will help others and provides more explicit provenance information)
   - a lstmcpipe config file (to generate it, please refer to the documentation)
   - a readme with a short description of why you require this analysis to be run. Do not add information that should not appear publicly (such as source names) here. If you are requesting a production for a specific new source, please edit this table on the LST wiki. Also add the command line used to generate the lstmcpipe config; that will help debugging.
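As a rough illustration, a request directory could look like the following (the date, production ID and file names are hypothetical):

```
production_configs/
└── 20220601_src***_psf_tuned/
    ├── lstchain_config.json    # lstchain configuration
    ├── lstmcpipe_config.yml    # lstmcpipe configuration
    └── README.md               # short description + config generation command line
```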
The proposed configuration will be tested for validity by the continuous integration tests, and we will interact with you to run the analysis on the cluster at La Palma.
Depending on the number of requests, we may prioritise them.
To generate your lstmcpipe configuration file, use the `lstmcpipe_generate_config` command. If the type of production you want is not among the existing ones, you may create your own `PathConfig` class from an existing one, or generate a config from an existing prod type and edit the file manually.
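For example (the path-config class name and production ID here are illustrative; see `lstmcpipe_generate_config --help` for the options available in your version):

```
lstmcpipe_generate_config PathConfigProd5Trans80 --prod_id my_tuned_prod
```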
Once you have your configuration file, you may launch the pipeline with the stages described in the config using:
```
lstmcpipe -c config_MC_prod.yml -conf_lst lstchain_*.json [-conf_cta CONFIG_FILE_CTA] [-conf_rta CONFIG_FILE_RTA] [--debug] [--log-file LOG_FILE]
```
The `lstmcpipe_start.py` script is the orchestrator of the pipeline: it schedules the stages specified in the `onsite_MC_prod.yml` file. All the configuration related to the MC pipeline must be declared in this file (stages, particles to be analysed, zenith, pointing, type of MC production...). Pipeline-specific configuration options (such as cleaning or model parameters) are declared in a separate configuration file, which is passed via the `-conf_lst`/`-conf_cta`/`-conf_rta` options.
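For orientation, here is a minimal sketch of such a file with illustrative key names and values; the exact schema depends on your lstmcpipe version, so generate a real one with `lstmcpipe_generate_config`:

```yaml
prod_id: my_tuned_prod               # hypothetical production ID
prod_type: PathConfigProd5Trans80    # type of MC production
stages_to_run:                       # stages scheduled by the orchestrator, in order
  - r0_to_dl1
  - merge_dl1
  - train_pipe
  - dl1_to_dl2
  - dl2_to_irfs
# particles, zenith and pointing appear in the per-stage input/output paths
```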
Note: you can always launch this command without fear; an intermediate step verifies and displays the configuration that you are passing to the pipeline.
The use of SLURM job arrays in the r0_to_dl1 stage, combined with a cap on the number of jobs running at the same time, reduces the load on the cluster compared to previous versions. Please note, however, that processing a full MC production still requires a lot of resources. Think about other LP-IT cluster users.
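For reference, this kind of throttling is a standard SLURM feature; in a sketch (where `process_batch.sh` is a hypothetical wrapper script, not an lstmcpipe file):

```
# submit a 100-task job array, with at most 10 tasks running concurrently
sbatch --array=0-99%10 process_batch.sh
```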
After the pipeline is launched, all selected tasks are performed in order. These tasks are referred to as stages and are collected in `lstmcpipe/stages`. The following is a short overview of each stage that can be specified in the configuration.
r0_to_dl1
In this stage, simtel files are processed up to data level 1 and separated into files for training and files for testing. For efficiency, files are processed in batches: N files (N depends on the particle type, as that influences the average duration of the processing) are submitted as one job in a job array. To group the files together, the paths are saved in list files that are passed to Python scripts in `lstmcpipe/scripts`, which then call the selected pipeline's processing tool. These are:
- lstchain: `lstchain_mc_r0_to_dl1`
- ctapipe: `ctapipe-stage1`
- rta: `lstmcpipe_hiperta_r0_to_dl1lstchain` (`lstmcpipe/hiperta/hiperta_r0_to_dl1lstchain.py`)
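Conceptually, for the lstchain case, each job in the array then runs something like the following for every file in its list (flag names may differ between lstchain versions; the paths are placeholders):

```
lstchain_mc_r0_to_dl1 --input-file gamma_run101.simtel.gz --output-dir DL1/.../training --config lstchain_config.json
```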
dl1ab
As an alternative to processing simtel R0 files, existing DL1 files can be reprocessed. This can be useful to apply different cleanings or to alter the images, e.g. by adding noise. For this to work, the old files have to contain images, i.e. they need to have been processed with the `no_image: False` flag in the config. The config key `dl1_reference_id` is used to determine the input files. Its value needs to be the full prod_id including software versions (i.e. the name of the directories directly above the DL1 files). For lstchain the dl1ab script is used; ctapipe can use the same script as for simtel processing. There is no support for hiperta!
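A minimal sketch of the relevant lstmcpipe config entry (the value below is a hypothetical prod_id):

```yaml
# select the existing DL1 production to reprocess; must be the full prod_id
# including software versions, i.e. the directory name above the DL1 files
dl1_reference_id: 20221201_v0.9.2_prod5_trans_80_my_prod
```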
merge_dl1
In this stage the previously created DL1 files are merged, so that you end up with train and test datasets for the next stages.
train_test_split
Split the dataset into training and testing datasets, performing a random selection of files with the specified ratio (default = 0.5).
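A minimal sketch of the random selection described above (illustrative only, not lstmcpipe's actual implementation):

```python
import random

def train_test_split(files, ratio=0.5, seed=42):
    """Randomly assign files to a training and a testing set with the given ratio."""
    rng = random.Random(seed)
    shuffled = list(files)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * ratio)
    return shuffled[:n_train], shuffled[n_train:]

train, test = train_test_split(["run1.h5", "run2.h5", "run3.h5", "run4.h5"])
```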
train_pipe
IMPORTANT: from here on, only `lstchain` tools are available. More about that at the end.
In this stage, the models to reconstruct the primary particle's properties are trained on the gamma-diffuse and proton train data. At present this means that random forests are created using lstchain's `lstchain_mc_trainpipe`. Models will be stored in the `models` directory.
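A sketch of what such a call can look like (flag names may differ between lstchain versions; the file names are placeholders):

```
lstchain_mc_trainpipe --fg dl1_gamma_diffuse_train.h5 --fp dl1_proton_train.h5 -o models/
```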
dl1_to_dl2
The previously trained models are evaluated on the merged DL1 files using `lstchain_dl1_to_dl2` from the lstchain package. DL2 data can be found in the `DL2` directory.
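Illustrative invocation (flag names may differ between lstchain versions; the paths are placeholders):

```
lstchain_dl1_to_dl2 --input-file dl1_gamma_test.h5 --path-models models/ --output-dir DL2/
```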
dl2_to_irfs
Point-like IRFs are produced for each set of offset gammas. The processing is performed by calling `lstchain_create_irf_files`.
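Illustrative invocation (flag names may differ between lstchain versions; the input DL2 files and output name are placeholders):

```
lstchain_create_irf_files --point-like -g dl2_gamma_off0.0deg.h5 -p dl2_proton_test.h5 -e dl2_electron_test.h5 -o irf_off0.0deg.fits.gz
```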
dl2_to_sensitivity
A sensitivity curve is estimated using a script based on pyirf, which performs a cut optimisation similar to EventDisplay's. The script can be found in `lstmcpipe/scripts/script_dl2_to_sensitivity.py`. This does not use the IRFs and cuts computed in dl2_to_irfs, so it cannot be compared to observed data. It is a mere benchmark for the pipeline.
NOTE: `lstmcpipe` expects the data to be located in a specific structure on the cluster. Output will be written in a standardized way next to the input data, to make sure everyone can access it. Analysing a custom dataset requires replicating parts of the directory structure and is not the intended use case for this package.
All the `r0_to_dl1` stage job logs are stored in `/fefs/aswg/data/mc/running_analysis/.../job_logs` and later moved to `/fefs/aswg/data/mc/analysis_logs/.../`.
Every time a full MC production is launched, two files with logging information are created:

- `log_reduced_Prod{3,5}_{PROD_ID}.yml`
- `log_onsite_mc_r0_to_dl3_Prod{3,5}_{PROD_ID}.yml`

The first one contains a reduced summary of all the scheduled job IDs (and which particle each job corresponds to), while the second one contains the same plus all the commands passed to SLURM.
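As a purely hypothetical sketch of the kind of job-ID-to-particle mapping the reduced log holds (the actual file layout may differ):

```yaml
r0_to_dl1:
  gamma: [1234567, 1234568]
  proton: [1234569, 1234570]
```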
The directory structure and the stages to run are determined by the config stages. The job dependencies between stages are then handled automatically.
- If the full workflow is launched, directories will not be verified as containing data. Overwriting will only happen when an MC prod sharing the same `prod_id` is run and analysed on the same day.
- If each step is launched independently (advanced users), no directory will be overwritten without prior confirmation from the user.
Example of default directory structure for a prod5 MC prod:
```
/fefs/aswg/data/
├── mc/
|   ├── DL0/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
|   |   └── simtel files
|   |
|   ├── running_analysis/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
|   |   └── YYYYMMDD_v{lstchain}_{prod_id}/
|   |       └── temporary dir for r0_to_dl1 + merging stages
|   |
|   ├── analysis_logs/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
|   |   └── YYYYMMDD_v{lstchain}_{prod_id}/
|   |       ├── file_lists_training/
|   |       ├── file_lists_testing/
|   |       └── job_logs/
|   |
|   ├── DL1/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
|   |   └── YYYYMMDD_v{lstchain}_{prod_id}/
|   |       ├── dl1 files
|   |       ├── training/
|   |       └── testing/
|   |
|   ├── DL2/20200629_prod5_trans_80/{particle}/zenith_20deg/south_pointing/
|   |   └── YYYYMMDD_v{lstchain}_{prod_id}/
|   |       └── dl2 files
|   |
|   └── IRF/20200629_prod5_trans_80/zenith_20deg/south_pointing/
|       └── YYYYMMDD_v{lstchain}_{prod_id}/
|           ├── off0.0deg/
|           ├── off0.4deg/
|           └── diffuse/
|
└── models/
    └── 20200629_prod5_trans_80/zenith_20deg/south_pointing/
        └── YYYYMMDD_v{lstchain}_{prod_id}/
            ├── reg_energy.sav
            ├── reg_disp_vector.sav
            └── cls_gh.sav
```
These scripts are not meant to support real data analysis. Use at your own risk.
So far the reference pipeline is `lstchain`, and only with it is a full analysis possible. There is, however, support for `ctapipe` and `hiperta` as well. The processing up to DL1 is relatively agnostic of the pipeline; working implementations exist for all of them. In the case of `hiperta`, a custom script converts the DL1 output to `lstchain`-compatible files, and the later stages run using `lstchain` scripts. In the case of `ctapipe`, DL1 files can be produced using `ctapipe-stage1`. Once the dependency issues are solved and ctapipe 0.12 is released, this will most likely switch to using `ctapipe-process`. We currently have no plans to keep supporting older versions longer than necessary. Because the files are not compatible with `lstchain` and there is no support for higher data levels in `ctapipe` yet, it is not possible to use any of the subsequent stages. This might change in the future.