# Automatic classification of ATLAS objects
This repo contains a pipeline to connect to the ATLAS (or PS1) database, get cutouts of difference images, build a data set, train a classifier to differentiate between real and bogus images, and plot the results.
- getATLASTrainingSetCutouts.py: Takes as input a config file, a list of dates (in MJD), and a directory to store the output in. It connects to the ATLAS database using the credentials in the config file and fetches all exposures for the given time frame. For each exposure it creates a .txt file containing the x,y positions of all objects in the image and a 40x40 pixel cutout image for each object. It also creates a "good.txt" and a "bad.txt" file, containing the x,y positions of the real and bogus objects, respectively.

  Options:
  - configFile : .yaml file with database credentials
  - mjds : list of nights (MJD)
  - stampSize : size of the cutouts, in pixels
  - stampLocation : directory where the cutouts are stored
  - camera : '02a' for Haleakala, '01a' for Mauna Loa
  - downloadthreads : number of threads used for downloading images
  - stampThreads : number of threads used for building the cutout stamps
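For example, to fetch cutouts for two nights from the Haleakala camera (the exact flag syntax below is an assumption; check the script's help output for the real interface):

```
python getATLASTrainingSetCutouts.py \
    --configFile config.yaml \
    --mjds 58100 58101 \
    --stampSize 40 \
    --stampLocation /path/to/stamps \
    --camera 02a \
    --downloadthreads 8 \
    --stampThreads 8
```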
- getPS1TrainingSetCutouts.py: Same as the script above, but it connects to the PS1 database instead.
- buildMLDataset.py: Takes as input the good.txt and bad.txt files with the x,y positions of all real and bogus objects. From those, it builds an .h5 file containing the features (20x20 pixels of each image) and targets (real or bogus label) to be used later as a training set.

  Options:
  - good : file with x,y pixel positions of real objects
  - bad : file with x,y pixel positions of bogus objects
  - outputFile : .h5 output file
  - e : extent (default=10)
  - E : extension (default=0)
  - s : skew, how many bogus objects per real one (default=3)
  - r : rotation (default=None)
  - N : normalization function (default='signPreserveNorm')
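A typical invocation might look like the following (option names are taken from the list above; the exact flag syntax is an assumption):

```
python buildMLDataset.py \
    --good /path/to/stamps/good.txt \
    --bad /path/to/stamps/bad.txt \
    --outputFile trainingSet.h5 \
    -e 10 -s 3 -N signPreserveNorm
```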
- kerasTensorflowClassifier.py: Takes as input an .h5 file with the training set and a path to store the classifier as an .h5 file. If the model doesn't exist yet, it creates it, trains it, and classifies a test set. It returns a .csv file containing the targets and scores for all images. The classifier used is a [CNN](http://cs231n.github.io/convolutional-networks/).

  Options:
  - outputcsv : output csv file
  - trainingset : .h5 input dataset
  - classifierfile : .h5 file to store the model (classifier)
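For orientation, a minimal Keras CNN for 20x20 single-channel stamps that outputs a real/bogus score could look like the sketch below. The layer sizes and counts here are illustrative assumptions, not the architecture this repo actually trains:

```python
# A minimal sketch, not the repo's actual model: layer sizes are assumptions.
from tensorflow.keras import layers, models

def build_model():
    model = models.Sequential([
        # 20x20 pixel stamps, one channel
        layers.Conv2D(32, (3, 3), activation="relu", input_shape=(20, 20, 1)),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),  # score: 0 = bogus, 1 = real
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```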
- plotResults.py: Takes as input csv files with the scores and targets for all images and plots the ROC curve and the detection error tradeoff (DET) graph for the data set.

  Options:
  - inputFiles : csv files to be plotted, each with a target and a score for every object
  - outputFile : output .png file with the plots
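As a rough sketch of what this step does, a ROC curve can be drawn from one such csv file like this (the column names "target" and "score" are assumptions about the csv layout):

```python
# Plot a ROC curve from a csv of targets and scores.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

df = pd.read_csv("scores.csv")
fpr, tpr, _ = roc_curve(df["target"], df["score"])

plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.3f}")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.savefig("roc.png")
```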
## How to run the pipeline?

When asked to run a task, the pipeline searches for the resources needed to complete it and tries to run it. If those resources are missing, it first runs the task that produces them, and it keeps doing this recursively until the requested task can run.
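This is luigi's dependency resolution: each task declares what it requires and what it outputs, and luigi runs missing dependencies first. A minimal sketch, with hypothetical task names rather than the pipeline's actual classes:

```python
import luigi

class BuildMLDataset(luigi.Task):
    """Hypothetical upstream task that produces the training set."""

    def output(self):
        return luigi.LocalTarget("trainingSet.h5")

    def run(self):
        ...  # build the .h5 dataset

class TrainClassifier(luigi.Task):
    """Hypothetical downstream task."""

    def requires(self):
        # If trainingSet.h5 doesn't exist yet, luigi runs
        # BuildMLDataset first, recursively.
        return BuildMLDataset()

    def output(self):
        return luigi.LocalTarget("classifier.h5")

    def run(self):
        ...  # train the model and write classifier.h5
```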
The general invocation is:

```
python atlasClassificationPipeline.py Name_of_Task --local-scheduler --name_of_option1 option1 ... --name_of_optionN optionN
```
- To run the PlotResults task:

```
python atlasClassificationPipeline.py PlotResults --local-scheduler --inputfiles [file1.csv,...,filen.csv] --outputFile output.png
```
For more information on how to run a pipeline, see the [luigi documentation](https://luigi.readthedocs.io/).
## Installation

- Create a virtual environment with Python 3.6 and activate it
- pip install -r requirements.txt
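For example, using the standard venv module (assuming python3.6 is on your PATH):

```
python3.6 -m venv env
source env/bin/activate
pip install -r requirements.txt
```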