See the presentation folder for a less technically detailed overview.
Note that when we write about clustering tracker hits, we mean what the ML community calls image segmentation.
If you have any questions feel free to contact me at adam.mendl@cvut.cz or amendl@hotmail.com.
- Calculate number of tracks in event by CNN (1, 2, 3 or 4 tracks).
- Use generative adversarial networks with convolutional autoencoders for track clustering.
- Predict associated calorimeter hit(s) using single-label classification on clustered events or multi-label classification on not clustered events.
- Tested track counting model on real data. Summary of all efforts is that the model adapts pretty well on measured data.
- First parts of CapsNET layer, hoping to finish this soon.
- Currently working on autoencoders - adding attention inside Matteo's architecture; GAN algorithm finished.
- Counting tracks done on my_generator (works really well). It removes the problem that SN-IEGenerator sometimes generates events with fewer tracks.
- Tested on real data, two main problems:
- isolated tracker hits are detected as one track,
- if the z position of a tracker hit cannot be computed, it is set to zero. The side and front views of this model detect those hits as an additional track.
- Primitive solutions applied (use only top view and filter isolated tracker hits) and then it works well. See Results section.
- Reformulation of this task to finding number of linear segments might be useful. This means add kinks to generator and possibly fine tune models on physical simulations (Falaise).
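The isolated-hit filter mentioned in the list above could be sketched as follows. This is only a minimal illustration, assuming the top view is a binary 2D grid and that a hit counts as isolated when none of its 8 neighbouring cells is hit. The function name filter_isolated_hits and the neighbourhood definition are assumptions, not the code used in this repository.

```python
def filter_isolated_hits(grid):
    """Return a copy of a binary 2D grid with isolated hits removed.

    Assumption: a hit is "isolated" when none of its 8 neighbours is hit.
    """
    rows, cols = len(grid), len(grid[0])
    cleaned = [row[:] for row in grid]  # do not modify the input event
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            has_neighbour = any(
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if not has_neighbour:
                cleaned[r][c] = 0
    return cleaned


# Toy event: a diagonal track plus one hit without any neighbour.
event = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 0, 0],
]
print(filter_isolated_hits(event))
```

A stricter or looser definition of "isolated" (for example a larger neighbourhood) only changes the two range() bounds.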
A way to add TensorFlow to the TKEvent binaries was found. It only requires the C API for TensorFlow, which can be downloaded from the TensorFlow website and is linked dynamically to the analysis code. It might even be possible to use this solution within root macros (i.e. running code via root <filename>.cxx). For more information, see the "Working with real data (future-proof)" section.
From now, this section is only about results on generator
Performance depends on the number of tracks in the event. For one track, we have 98% accuracy. For more tracks, it is a multi-label classification problem, for which performance is much harder to analyse and measure. However, we can say that the multi-label classification approach from Meta Research (softmax on the one-hot encoding divided by the number of tracks) performs significantly better than the classical approach with BinaryCrossentropy. See Results section. However, an associated calorimeter hit without clustering is probably useless, and once clustering works we can use single-label classification on one track, so there is nothing groundbreaking here.
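To make the difference between the two losses concrete, here is a minimal pure-Python sketch. It assumes the target is a multi-hot vector (one entry per calorimeter block) and that the softmax variant divides this vector by the number of positive labels before taking the cross-entropy. The function names are illustrative and the code is not taken from the training scripts.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_multilabel_loss(logits, multi_hot):
    """Cross-entropy between softmax(logits) and the multi-hot target
    divided by the number of positive labels (assumed to be >= 1)."""
    k = sum(multi_hot)
    probs = softmax(logits)
    return -sum((t / k) * math.log(p) for t, p in zip(multi_hot, probs) if t)


def binary_crossentropy_loss(logits, multi_hot):
    """Classical approach: independent sigmoid per label, mean binary cross-entropy."""
    loss = 0.0
    for x, t in zip(logits, multi_hot):
        p = 1.0 / (1.0 + math.exp(-x))
        loss -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return loss / len(logits)


logits = [2.0, 2.0, -1.0, -3.0]
target = [1, 1, 0, 0]  # two associated calorimeter hits
print(softmax_multilabel_loss(logits, target))
print(binary_crossentropy_loss(logits, target))
```

With the softmax variant the labels compete for probability mass, so the prediction for an event with k hits is expected to put roughly 1/k on each of them.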
Three strategies proposed:
- Approach by Matteo (basically SegNET architecture - see resources in the end of this document) enhanced by Generative Adversarial Networks. *
- Approach by Matteo enhanced by some kind of attention mechanism. It has been shown (for example on LLMs) that attention models sequences and relations between features well. It can be then trained using GANs.
- Train a simple autoencoder. Then disconnect the decoder and use only the encoder. The clustering/image segmentation will be done within the latent space (output of the encoder): we generate the latent representation of an event (r1), then the latent representation of the same event without one track (r2), and train a model to go from (r1) to (r2). If we want to see the clustered event, we can push the modified latent representation into the decoder.
- Cannot find simple resources about image segmentation within latent space (only really complicated modern foundation models which are definitely overkill for SuperNEMO tracker).
- Will be beneficial only if the latent space is small.
- Some results from fitting give the idea that this will not work (see next subsection).
- Some of these problems might be resolved by using Variational autoencoder.
- Model (basically one layer with convolutional filters) with two channels as input. One channel will be the actual event and the second will be the track that we are clustering.
In literature, the approach of learning similarity metrics in GAN manner is called (V)AE/GAN depending on whether autoencoder is variational. See for example Autoencoding beyond pixels using a learned similarity metric.
Testing clustering on events: convolutional autoencoder with an output neuron trained on classification tasks:
- is significant number of hits in the track no. 2 missing?
- is significant number of hits in the track no. 1 present?
- and possibly any other criterion where the generator would be failing.
Autoencoders do not by default produce nice almost everywhere differentiable distribution in latent space, therefore Variational autoencoders are commonly used. See Resources section.
- Trying to give hint to TKEvent, where it should search for solution (angle - 5 segments, position on foil - 10 segments)
- Mixed results: For one track we have approximately 70% accuracy (see Results section). Then it falls for events with more tracks.
- Information about kinks.
Information in this section is mostly for the CCLyon in2p3 computation cluster.
Almost everything runs on top of python3. On CCLyon, use python sourced with root via
- ccenv root 6.22.06 - loads python 3.8.6 (does not work now)
- since July 12 2023: module add Analysis/root/6.22.06-fix01 - loads python 3.9.1 (currently, this is the way to go)
This software should be installed in a python or Anaconda environment (a python environment is preferred since it can access both the sourced root package and all GPU-related software directly; however, it is still possible to make it work with Anaconda).
- root - Root does not need to be explicitly installed in the python or Anaconda environment, any sourced Root on CCLyon should work - minimum tested version 6.22.06 (since July 12 2023 6.22.06-fix01 on CCLyon). PyROOT is required.
- cudatoolkit, cudnn - Should be already installed on CCLyon
- tensorflow - (ideally 2.13, older versions produce some random bug with an invalid version of the openSSL build on CCLyon. However, there seems to be a bug in 2.13 regarding training on multiple GPUs)
- keras - Should be part of tensorflow
- keras-tuner - hyperparameter tuning
- numpy
- matplotlib, seaborn - plotting
- scikit-learn - some helper functions
- pydot, graphviz - drawing models
- argparse
Optional:
- tensorrt - really useful, makes inference and training of models faster on CPU. It can also be installed in two steps: first install nvidia-pyindex and then nvidia-tensorrt. To use it successfully within scripts, you should import it before tensorflow!
- tensorboard - Should be part of tensorflow
- tensorboard_plugin_profile - profiling
- nvidia-pyindex, nvidia-tensorrt - For TensorRT support
- nvidia-smi - For checking usage and available memory on NVIDIA V100 GPU (on CCLyon)
An example is at example_exec.sh. If you have access to a GPU, run it with sbatch --mem=... -n 1 -t ... --gres=gpu:v100:N example_exec.sh, where N is the number of GPUs you want to use (currently CCLyon does not allow me to use more than three of them). Otherwise, leave out the --gres option.
Scripts can use two strategies. To use only one GPU, use the option --OneDeviceStrategy "/gpu:0". If you want to use more GPUs, use for example --MirroredStrategy "/gpu:0" "/gpu:1" "/gpu:2". For some reason, I was never able to use more than 3 GPUs on CCLyon.
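For orientation, the two options could be mapped onto tf.distribute roughly like this. This is a hedged sketch: how the scripts in this repository actually parse their arguments is not shown here, so build_parser and make_strategy are illustrative names. TensorFlow is imported lazily so that the parsing part runs without it.

```python
import argparse


def build_parser():
    """Parser accepting either --OneDeviceStrategy or --MirroredStrategy."""
    parser = argparse.ArgumentParser(description="Select a tf.distribute strategy")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--OneDeviceStrategy", metavar="DEVICE", default=None)
    group.add_argument("--MirroredStrategy", metavar="DEVICE", nargs="+", default=None)
    return parser


def make_strategy(args):
    """Turn parsed arguments into a tf.distribute strategy object."""
    # Imported here so that argument parsing works without TensorFlow installed.
    import tensorflow as tf

    if args.MirroredStrategy:
        return tf.distribute.MirroredStrategy(devices=args.MirroredStrategy)
    if args.OneDeviceStrategy:
        return tf.distribute.OneDeviceStrategy(device=args.OneDeviceStrategy)
    return tf.distribute.get_strategy()  # TensorFlow's default strategy


args = build_parser().parse_args(["--MirroredStrategy", "/gpu:0", "/gpu:1"])
print(args.MirroredStrategy)
```

The model would then be built and compiled inside `with make_strategy(args).scope():` so that its variables are created on the selected devices.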
If you start a job from a bash instance with some packages, modules or a virtual environment loaded, you should unload/deactivate them (use module purge --force). The best way is to start from a fresh bash instance.
- source root (and python) - currently use module add Analysis/root/6.22.06-fix01
- create python virtual environment (if not done yet)
- install packages (if not done yet)
- load python virtual environment (or add #! <path-to-your-python-bin-in-environment> to the first line of your script)
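After the steps above, a quick sanity check at the start of a job can save a failed submission. The following is a small sketch using only the standard library; the package list is an example and the helper names are made up for this illustration. Note that the prefix comparison detects python virtual environments, not Anaconda environments.

```python
import importlib.util
import sys


def in_virtual_environment():
    """True when the interpreter runs inside a venv (prefix differs from base prefix)."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def missing_packages(names):
    """Return the packages from `names` that the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_environment())
    print("missing:", missing_packages(["ROOT", "tensorflow", "numpy"]))
```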
We tested models on real data and compared them with TKEvent. Unfortunately, it is not possible to open root files produced by the TKEvent library from python with the same library sourced, since this library might be built with a different version of python and libstdc++. Fortunately, a workaround exists. We need to download and build two versions of TKEvent. The first version is built in the manner described in the TKEvent README.md. The second library should be built (we ignore the red_to_tk target) with the following steps:
- module add ROOT, where ROOT is the version of the root library used by tensorflow (currently module add Analysis/root/6.22.06-fix01)
- TKEvent/TKEvent/install.sh to build the library
Now, we can use red_to_tk from the first library to obtain a root file with TKEvent objects and open this root file with the second version of the TKEvent library.
If the collaboration wants to use keras models inside its software, the best way is probably to use cppflow. It is a single-header C++ library for accessing the TensorFlow C API. This means that we will not have to build TensorFlow from source and we should not be restricted by root/python/gcc/libstdc++ versions or calling conventions.
Not to be restricted by the organization structure of our folders, we use this script to load, register and return local modules.
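import importlib.util
import sys


def import_arbitrary_module(module_name, path):
    """Load a module from an arbitrary file path, register it and return it."""
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load module {module_name!r} from {path!r}")
    imported_module = importlib.util.module_from_spec(spec)
    # Register before executing so the module is importable by name elsewhere.
    sys.modules[module_name] = imported_module
    spec.loader.exec_module(imported_module)
    return imported_module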
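A short usage sketch for this helper: it writes a throwaway module into a temporary folder and loads it by path. The module name demo_helpers and its content are made up for the example. The helper is repeated so that the snippet runs on its own.

```python
import importlib.util
import os
import sys
import tempfile


def import_arbitrary_module(module_name, path):
    spec = importlib.util.spec_from_file_location(module_name, path)
    imported_module = importlib.util.module_from_spec(spec)
    sys.modules[module_name] = imported_module
    spec.loader.exec_module(imported_module)
    return imported_module


# Write a throwaway module to a temporary folder and load it by path.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "demo_helpers.py")
    with open(path, "w") as handle:
        handle.write("def double(x):\n    return 2 * x\n")
    demo = import_arbitrary_module("demo_helpers", path)

print(demo.double(21))  # prints 42
```

Because the module is registered in sys.modules, a later `import demo_helpers` in the same process returns the same object.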
- sbatch and tensorflow sometimes fail to initialize libraries (mainly to source python from the virtual environment or root) - start the script again, ideally from a new bash instance without any modules or virtual environment loaded.
- tensorflow sometimes runs out of memory - Don't use checkpoints for tensorboard. Another cause of this problem might be training several models in one process; we can solve this by keras.backend.clear_session(). If this error occurs after several hours of program execution, check out the function tf.config.experimental.set_memory_growth.
- TensorFlow 2.13 distributed training fails - tensorflow/tensorflow#61314
- Sometimes, there are strange errors regarding ranks of tensors while using the custom training loop in gan.py - looks like a really old, still unresolved bug inside the core tensorflow library. However, the workaround is to pass only one channel into the CNN architecture and concatenate them with tf.keras.layers.Concatenate
- numpy occasionally stops working (maybe connected to this issue), failing to import (numpy.core._multiarray_umath or other parts of the numpy library). It is accompanied by the message:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.
The simplest solution is to create a new environment and install all libraries inside it.
- lib.py - small library with some functions that are reused across this project
- number_of_tracks_classification.py
- combined.py - Script for constructing and training a model consisting of top, side and front models.
- vae.py - Trains VAE models
- number_of_tracks_classification3D.py - TODO: Classification using Conv3D
- plot_confusion.py - Script helping analyze badly classified events
- clustering_one.py - example of a custom training loop for GAN autoencoders used for clustering
- multilabel_analysis.py - script for analysing various aspects of the multilabel classifier used for associated calorimeter hit detection
- gan.py
- testing_clustering.py - code for testing tracker hit clustering algorithms
- self_attention.py - primitive self-attention model for convolutional autoencoder within GAN
- capsule_lib.py - basic parts of CapsNET architecture
  - For more information about CapsNET and capsules, see Resources
  - class RoutingByAgreement implements the routing by agreement procedure proposed by Hinton using tf.while_loop
  - class PrimaryCapsule implements the first capsule layer in CapsNET architecture
  - class SecondaryCapsule implements the second capsule layer in CapsNET architecture
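For readers unfamiliar with routing by agreement, the procedure can be sketched in plain Python as below. This follows the published algorithm (Sabour, Frosst, Hinton: Dynamic Routing Between Capsules) and is not the tf.while_loop implementation from capsule_lib.py. The data layout u_hat[i][j] (prediction vector of input capsule i for output capsule j) is an assumption of this sketch.

```python
import math


def squash(s):
    """Squash non-linearity: keeps the direction, maps the norm into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def routing_by_agreement(u_hat, iterations=3):
    """Return output capsule vectors v[j] and coupling coefficients c[i][j]."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits
    for _ in range(iterations):
        c = [softmax(row) for row in b]  # couplings of each input capsule
        v = []
        for j in range(n_out):
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s_j))
        # Increase the logit where prediction and output agree (dot product).
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c


# Two input capsules agree on output 0 and contradict each other on output 1.
u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, -1.0]],
]
v, c = routing_by_agreement(u_hat)
print(v, c)
```

In this toy case the couplings towards output capsule 0 grow with every iteration, while the contradictory predictions for output capsule 1 cancel.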
Testing whether my GAN algorithm works by replicating https://www.tensorflow.org/tutorials/generative/dcgan. One important problem found: when batch normalization is missing inside the generator architecture, the GAN fails to reproduce digits. A possible solution for modelling longer shapes such as tracks might be self-attention (really want this in the architecture; massively beneficial for any ML task and implemented in CapsNET). See the short article in Resources.
- top.py
- top_big.py
- side.py
- front.py
- combined.py - code generating combined model
- top_associated_calorimeter.py
- front_associated_calorimeter.py
- side_associated_calorimeter.py
- generator.py - Generator used in GAN architecture
- discriminator.py - Discriminator used in GAN architecture
- matteo_with_skip_connections.py - Autoencoder for clustering proposed and tested by Matteo (don't know exactly what the results are and if it was working at all). In fact, it is a modified SegNET (see Resources section).
- matteo_without_skip_connections.py - The same as above but the skip connections are removed. This should not work for clustering, but it will work as an autoencoder.
This folder contains files from SN-IEGenerator (version from Mar 7, 2018) that were modified for our project.
- toyhaystack.py - Clustering of hits into tracks added.
- my_generator.cxx - Alternative to the toyhaystack generator in the form of a root script.
- generate.sh - sample bash script for using my_generator.cxx
This folder contains essential scripts for loading and preprocessing data
- number_of_tracks.py - If you want to change the folder with training and testing data, see line 20.
- number_of_tracks3D.py - TODO
- clustering.py
- associated_calohit_multilabel.py
- associated_calohit_singlelabel.py
Trained models in TensorFlow format.
- top - Number of tracks classifier viewing detector from top (SN-IEGenerator)
- side - Number of tracks classifier viewing detector from side (SN-IEGenerator)
- front - Number of tracks classifier viewing detector from front (SN-IEGenerator)
- combined - Top, side and front view combined using transfer learning (SN-IEGenerator)
- top_my_generator - Number of tracks classifier viewing detector from top (my_generator)
- side_my_generator - Number of tracks classifier viewing detector from side (my_generator)
- front_my_generator - Number of tracks classifier viewing detector from front (my_generator)
- combined_my_generator - Top, side and front view combined using transfer learning (my_generator)
- matteo_with_skip - Autoencoder based on Matteo's architecture (my_generator)
- matteo_without_skip - Autoencoder based on Matteo's architecture (my_generator)
- clustering_matteo_with_skip_connections - Learned to remove track on left from events with two tracks (my_generator)
- clustering_matteo_without_skip_connections - Learned to remove track on left from events with two tracks (my_generator)
First attempts to use ML to help TKEvent.
- TKEvent - slightly modified TKEvent library.
- fit_one_iteratively.py - uses ML to predict the number of tracks, fits one track, removes the associated tracker hits from the event and repeats until the predicted tracks are fitted
- special_events.py - can modify events and inspect differences between the number of predicted tracks before and after the modification
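The loop described for fit_one_iteratively.py can be pictured as follows. This is a schematic sketch only: predict_number_of_tracks and fit_one_track are hypothetical callables standing in for the ML model and for the TKEvent fitter, and the sketch assumes the number of tracks is predicted once at the start, which may differ from the actual script.

```python
def fit_tracks_iteratively(hits, predict_number_of_tracks, fit_one_track):
    """Fit tracks one at a time until the predicted number of tracks is reached.

    predict_number_of_tracks(hits) -> int          (hypothetical model wrapper)
    fit_one_track(hits) -> (track, used_hits)      (hypothetical fitter wrapper)
    """
    remaining = list(hits)
    n_tracks = predict_number_of_tracks(remaining)
    tracks = []
    while len(tracks) < n_tracks and remaining:
        track, used = fit_one_track(remaining)
        if not used:  # the fitter found nothing more; avoid an endless loop
            break
        tracks.append(track)
        remaining = [h for h in remaining if h not in used]
    return tracks, remaining


# Toy stand-ins: hits are (x, y) pairs and a "track" is a row of equal y.
def count_rows(hs):
    return len({y for _, y in hs})


def fit_row(hs):
    y0 = hs[0][1]
    used = [h for h in hs if h[1] == y0]
    return {"y": y0, "n_hits": len(used)}, used


hits = [(0, 0), (1, 0), (2, 0), (0, 5), (1, 5)]
tracks, leftover = fit_tracks_iteratively(hits, count_rows, fit_row)
print(tracks, leftover)
```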
Helper files for Variational Autoencoder.
- decoders.py - decoders for VAEs
- encoders.py - encoders for VAEs
- lib.py - VAE architecture and layer for Reparametrization trick
- my_dataset_with_hint.py - implementation of tf.datasets workflow for VAE and my_generator
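For reference, the reparametrization trick and the accompanying KL term from Auto-Encoding Variational Bayes can be written in a few lines of plain Python. This is a generic sketch, assuming the encoder outputs a mean and a log-variance per latent dimension; it is not the layer from lib.py, and the function names are illustrative.

```python
import math
import random


def reparametrize(mean, log_var, rng=random):
    """Sample z = mean + sigma * eps with eps ~ N(0, 1) and sigma = exp(log_var / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mean, log_var)]


def kl_to_standard_normal(mean, log_var):
    """KL divergence between N(mean, diag(exp(log_var))) and N(0, I)."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mean, log_var))


mean, log_var = [0.5, -1.0], [0.0, -2.0]
print(reparametrize(mean, log_var))
print(kl_to_standard_normal(mean, log_var))
```

Sampling through mean and log_var rather than from the distribution directly keeps the sample differentiable with respect to the encoder outputs, which is what makes training by backpropagation possible.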
Approach to clustering done by Matteo. Sometimes works really well, sometimes really badly.
- Confusion matrix for combined model (SN-IEGenerator)
- Confusion matrix for top model (my_generator)
- Confusion matrix for side model (my_generator)
- Confusion matrix for front model (my_generator)
- Confusion matrix for combined model (my_generator)
- Confusion matrix for prediction of angle from top view (my_generator)
- Comparison of accuracy for associated calorimeter (classical sigmoid approach and softmax approach proposed by Meta Research for multi-label classification tasks) - please note that multi-label classification is a complex task, so these results require further analysis!
- Clustering approach by Matteo (no attention, with skip connections) - learned on my_generator with events with 2 tracks to remove left track - sometimes works, sometimes doesn't
- To see this well, cat this file into a terminal.
- Approach done by Matteo without skip connections produces empty pictures (learned on my_generator with events with 2 tracks to remove left track - sometimes works, sometimes doesn't)
- To be added: Clustering approach by Matteo (attention, with skip connections) - learned on my_generator with events with 2 tracks to remove left track
- To be added: Clustering approach by Matteo (attention, without skip connections) - learned on my_generator with events with 2 tracks to remove left track
- Prediction of number of tracks on real data
- only top view
- run 974
- Top model predicted number of tracks and TKEvent tried to fit this number of tracks/linear segments.
- isolated tracker hits filtered
- Multi-label image classification
- ResNeXt original paper
- Deep Convolutional Generative Adversarial Network
- Building a simple Generative Adversarial Network (GAN) using TensorFlow
- NIPS 2016 Tutorial: Generative Adversarial Networks
- Variational autoencoders Auto-Encoding Variational Bayes
- enables a nicer distribution in latent space, which can then be reused for other ML algorithms or used as a generator
- Adversarial autoencoders
- connect discriminator to latent space and force encoder to produce latent space mimicking the preferred distribution
- next discusses semi-supervised learning and unsupervised clustering (clustering of images, not the image segmentation that we want to achieve)
- Autoencoding beyond pixels using a learned similarity metric
- Summarizes AE, VAE, VAE/GAN approach (my approach for clustering tracker hits)
- Advanced GANs - Exploring Normalization Techniques for GAN training: Self-Attention and Spectral Norm
- Attention + convolution #1, Attention + convolution #2, Attention + convolution #3
- Original CBAM paper
- Series of articles on Capsule architecture
- Preprint on routing algorithm (Capsule architecture)
- Tensorflow implementation 1
- Tensorflow implementation 2
- Matrix Capsules with EM routing
- Meta, Instagram: Exploring the Limits of Weakly Supervised Pretraining - Discussion #1, Discussion #2
- how to use softmax for multi-label classification
- Bags of Tricks for Multi-Label Classification
- Discussion about Tensorflow losses
- A no-regret generalization of hierarchical softmax to extreme multi-label classification
- Metrics for Multi-Label Classification
- good overview: Visual Comparison of Multi-label Classification Results
- Single-label multi-class classification ComDia+
- Multi-label classification UnTangleMap - can show only one result at a time
A label is a set; the data samples to which the label was assigned are elements of the set.