Fast Automated Spectral Extraction Software for IFU Datacubes

a-griffiths/AutoSpec

AutoSpec

This software aims to provide fast, automated extraction of high-quality 1D spectra from astronomical datacubes with minimal user effort. AutoSpec takes an IFU datacube and a simple parameter file and extracts a 1D spectrum for each object in a supplied catalogue. A custom-designed cross-correlation algorithm helps to improve signal-to-noise and to isolate sources from neighbouring contaminants. A preprint of the science paper describing the software can be found on the arXiv.

Getting Started

Currently the code has only been tested on Linux, but it should work on other systems as long as the prerequisites are met. For this reason, all installation and running instructions assume a Linux system (for now).

Prerequisites

Before you start make sure you have the following pieces of software installed:

- Python 2.7 or 3.3+
- Source Extractor (https://www.astromatic.net/software/sextractor)

You will also need the following Python packages:

- numpy
- matplotlib
- seaborn
- mpdaf (http://mpdaf.readthedocs.io/en/latest/) and its dependencies.

Installing

AutoSpec doesn't need to be installed, just clone or download this github repository. With git clone use:

git clone https://github.com/a-griffiths/AutoSpec.git

The easiest way to run AutoSpec is to make it executable. To do this, navigate to the AutoSpec directory and run chmod +x AutoSpec. You also need to add the line export PATH=$PATH:/installed_dir/AutoSpec/ to your .bashrc file.

Usage

AutoSpec is designed to be as intuitive as possible. This section gives a brief run-through of the different elements of the software.

First off, create a working folder; at a minimum this should contain the datacube, the parameter file and the catalogue file. The default parameter file can be found here (important note: this file needs to be in your working directory and named 'param.py' for AutoSpec to function correctly), and an example catalogue file can be found here. Alternatively, working examples of each can be found in the test folder on GitHub. You can also include any additional images, segmentation maps and SExtractor configuration files.
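A quick pre-flight check of the working folder can catch a missing file before a run. The sketch below is not part of AutoSpec: the function name is hypothetical, and apart from param.py the file names are whatever you use for your own datacube and catalogue.

```python
import os


def missing_inputs(workdir, cube, catalogue):
    """Return the required input files that are absent from workdir."""
    # param.py must have exactly this name; the other two names are yours.
    required = ['param.py', cube, catalogue]
    return [name for name in required
            if not os.path.isfile(os.path.join(workdir, name))]
```

For example, missing_inputs('.', 'datacube.fits', 'catalogue.txt') returns an empty list when all three files are present in the current directory.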

AutoSpec runs in two main operating modes which can be specified in the parameter file:

  • 'param' mode: runs the software with a constant set of parameters (defined in the param file) for every source in the catalogue.
  • 'cat' (catalogue) mode: runs AutoSpec with different parameters for each object based on values specified in the catalogue file. For more info on this see the catalogue section.

Edit the parameter file with your usual Python or text editor, making sure to keep the correct Python formatting (as detailed below). The code can then be run from the command line by simply navigating to your working directory and running:

AutoSpec

* Note: If you haven't made AutoSpec executable then you will need to run AutoSpec directly from the software directory

/installed_dir/AutoSpec

** Note: All output subdirectories will be automatically created.

The Catalogue

The first thing you need to do is create a catalogue file (simple text or csv file, not fits). There are two options for this; if you run the code in parameter mode (MODE = 'param' in the param.py file), you only need to supply the ID, RA and DEC for each source:

#ID     RA      DEC
(int)   (deg)   (deg)

In this mode, AutoSpec will also accept any existing catalogue as long as the first three columns follow this same format (this includes the output catalogue generated by MUSELET).

Alternatively, if you run in catalogue mode (MODE = 'cat' in the param.py file), you will also need to specify the SIZE and REF values for each source. 'SIZE' sets the subcube/image extraction size; this is the per-object version of the SIZE parameter in param.py. 'REF' specifies which weight image or aperture size to use as the main extraction on a source-to-source basis (note that the weight image or aperture must also be specified in param.py). If the reference is not specified properly, AutoSpec will default to using the white-light weighted extraction. The format of the catalogue in this case should follow:

#ID     RA      DEC     SIZE       REF
(int)   (deg)   (deg)   (arcsec)   (str)
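If you build the catalogue from your own source list, a few lines of Python can write either layout. This is a minimal sketch, not an AutoSpec utility: the function name is hypothetical, and it assumes a tab-separated text file with a single '#' header line, which you should check against the example catalogue.

```python
def format_catalogue(rows, mode='param'):
    """Build catalogue text from a list of dicts.

    Each dict needs ID, RA and DEC; 'cat' mode also needs SIZE and REF.
    """
    columns = ['ID', 'RA', 'DEC']
    if mode == 'cat':
        columns += ['SIZE', 'REF']
    # Header line, then one tab-separated line per source.
    lines = ['#' + '\t'.join(columns)]
    for row in rows:
        lines.append('\t'.join(str(row[c]) for c in columns))
    return '\n'.join(lines) + '\n'
```

The returned string can be written to the catalogue file in your working directory with a plain open(...).write(...) call.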

The Parameter File

The parameter file provides easy user modification of AutoSpec run modes. Each parameter is briefly explained in the comments of the file, and a more in-depth explanation is provided below. Incorrect specification of parameters in this file is the most likely place you can go wrong. This is a Python-based file, so make sure to follow Python conventions; if you are unfamiliar with them, it might be best to edit the example parameter file provided. For any variables that you wish to leave empty, make sure you leave empty quotation marks ('' or "") or the default stated in the parameter descriptions below and in the file. Where multiple values can be provided (APER, IMG and SEG), make sure you separate values with a comma and enclose filenames in single or double quotation marks (' or "), e.g. 'one', 'two'... or for the APER parameter 1.0, 1.5, 2.0...
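As an illustration of these formatting conventions, a fragment of a parameter file might look like the following. The values are examples chosen for this sketch and are not AutoSpec's defaults; the parameters themselves are described in the rest of this section.

```python
# Illustrative param.py fragment; the values are examples, not defaults.
MODE = 'param'                       # or 'cat'
APER = 1.0, 1.5, 2.0                 # several values, separated by commas
IMG = 'g-band.fits', 'r-band.fits'   # file names go inside quotation marks
SEG = ''                             # left empty: keep the quotation marks
DATA_EXT = ()                        # not in use
```

Python reads comma-separated values such as these as a tuple, so no brackets are needed.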

MODE: this is the main operating mode AutoSpec will run in. For parameter mode ('param'), AutoSpec will extract each source within the catalogue based only on the settings provided in the parameter file. In catalogue mode ('cat'), the software will take extraction size and reference from the catalogue file on a source to source basis.

REF: this is the reference spectrum to use for cross-correlation. This can either be an aperture size or an image name. If an aperture size is specified it must also exist in the APER parameter. Likewise, if you provide an image it must exist in the IMG parameter; if REF is left empty, AutoSpec uses the white light image created from the datacube.

DATA_EXT: this allows the user to specify the data and/or variance extensions of the datacube if they can not be automatically detected by AutoSpec (necessary for the likes of MaNGA cubes). These can be specified by extension number or name. For data only, this should be in the form of int or str, for data and variance specification, use (int,int) or (str,str). When not in use leave as ().

APER: this is a list of aperture sizes (in arcseconds) in which to extract spectra. This can either be a single value or a list of values (e.g. 2.0 or 1.0, 1.5, 2.0). The maximum value should be less than or equal to half the SIZE parameter.
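The size limit is easy to check before a run. The helper below is a hypothetical sketch, not something AutoSpec provides; it accepts APER either as a single number or as several values.

```python
def oversized_apertures(aper, size):
    """Return the apertures larger than half the cutout size (arcsec)."""
    # APER may be one number or a tuple/list of numbers.
    values = aper if isinstance(aper, (tuple, list)) else (aper,)
    return [a for a in values if a > size / 2.0]
```

With SIZE = 5, the apertures 1.0, 1.5 and 2.0 all pass, while an aperture of 3.0 would be flagged.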

IMG: list of additional image file names. AutoSpec will use each of the images to produce weighted spectra and to derive object and sky masks. This is either a single file name as a string ('g-band.fits') or a list of file names ('g-band.fits', 'r-band.fits').

USE_IMGS: tells AutoSpec whether to use your additional images when deriving the object and sky masks. This is best set to False if you are using a pre-defined segmentation map but want to output spectra weighted by images. If you want AutoSpec to calculate segmentation maps for you from the images, use True here.

OBJ_MASK: here we define if you want to use the intersection or the union of the individual segmentation maps to produce an object mask in order to build the initial spectrum. The intersection mask only selects areas where the object overlaps in the combined segmentation maps, whereas union uses the combination. If you choose intersection and it is found to be empty, the code defaults to using the union mask in order to successfully extract a spectrum.
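The intersection-with-fallback behaviour can be pictured with a few lines of numpy. This is a sketch of the logic described above and not AutoSpec's own code: the function name and the 'intersection'/'union' strings are assumptions, and each input is taken to be a boolean mask of the object (for example, a segmentation map compared against the object's label).

```python
import numpy as np


def combine_object_masks(masks, how='intersection'):
    """Combine per-image boolean object masks into a single mask."""
    stack = np.asarray(masks, dtype=bool)
    union = stack.any(axis=0)          # pixel is set in at least one mask
    if how == 'union':
        return union
    intersection = stack.all(axis=0)   # pixel is set in every mask
    # An empty intersection would give no spectrum, so fall back to union.
    return intersection if intersection.any() else union
```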

SEG: similar to IMG but contains a list of additional segmentation map files. A caveat here: because of the nature of segmentation maps, they cannot be easily rescaled to new pixel sizes. Thus, if you want to use this option, ideally you should produce the maps with the same pixel scale and WCS as the datacube you are using. I have found the easiest way to do this is to register your image with a white light image created from the datacube BEFORE you run SExtractor.

OUTPUT: name of the output directory files will be saved to (this will be created by AutoSpec).

PRE_OUT: string to prepend to the output files (will be followed by '...id.fits')

SIZE: this is the size of the subcube and postage stamp images the software will create (in arcseconds). In parameter mode, make sure this is at least as big as the largest source in your data. In catalogue mode, the size is defined in the catalogue file instead; this parameter is then used as the default if a value is missing. Note that this size is the full size of the image/cube (i.e. 5 will produce a 5x5 arcsecond cutout centred on the RA and DEC from the catalogue). You should also consider processing time when deciding on this value: a larger size means bigger subcubes and images and longer extraction times.
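To get a feel for how SIZE drives the amount of data per source, the sketch below converts a cutout size to pixels. It is an illustration only: the function name is hypothetical, the rounding may differ from what AutoSpec and MPDAF do internally, and the pixel scale must come from your own datacube (0.2 arcsec per pixel is used here as an example value).

```python
def cutout_pixels(size_arcsec, pixel_scale):
    """Side length of a square cutout in pixels, to the nearest pixel."""
    return int(round(size_arcsec / pixel_scale))


# Example pixel scale in arcsec per pixel; read the real one from your cube.
side_small = cutout_pixels(5, 0.2)
side_large = cutout_pixels(10, 0.2)

# Doubling SIZE quadruples the number of spaxels in every subcube.
spaxel_ratio = (side_large ** 2) / float(side_small ** 2)
```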

XCOR: this parameter tells the code whether or not to perform the extra cross-correlation step. This can produce a higher S/N final spectrum but adds a little more time for each object. This method is beneficial if you have no ancillary imaging along with your datacube. Also, combined with continuum subtraction (CONT_SUB), this is a powerful tool for deblending sources from neighbouring contaminants. See the AutoSpec paper for more information. Cross-correlation spectra are extracted both with and without the continuum subtraction.
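The general idea of comparing every spaxel with a reference spectrum can be sketched with numpy. The example below computes a zero-lag Pearson correlation between the reference spectrum and the spectrum in each spaxel; it is a generic illustration under that assumption and is not the algorithm implemented in AutoSpec, which is described in the paper. The function name is hypothetical.

```python
import numpy as np


def correlation_map(cube, reference):
    """Correlate each spaxel spectrum with a reference spectrum.

    cube has shape (n_wave, ny, nx); reference has shape (n_wave,).
    Returns an (ny, nx) map of Pearson correlation coefficients.
    """
    cube = np.asarray(cube, dtype=float)
    ref = np.asarray(reference, dtype=float)
    ref = ref - ref.mean()
    dat = cube - cube.mean(axis=0)
    # Sum over the wavelength axis for every spaxel at once.
    num = np.tensordot(ref, dat, axes=(0, 0))
    den = np.sqrt((ref ** 2).sum() * (dat ** 2).sum(axis=0))
    out = np.zeros(num.shape)
    # Spaxels with a flat spectrum have den == 0 and are left at zero.
    np.divide(num, den, out=out, where=den > 0)
    return out
```

Spaxels whose spectra resemble the reference score close to 1, so a map like this highlights where the source's light falls even without ancillary imaging.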

CONT_SUB: here we decide if we want to perform continuum subtraction. This step is crucial if your sources are likely to have neighbouring contaminants that fall within the same cutouts. Note that this step will only run if you have also set XCOR to True.

CONT_POLY: sets the order of the polynomial to fit the continuum, 5 tends to be a good start.
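To show what the polynomial order controls, here is a minimal continuum fit using numpy. It is a plain least-squares fit written for illustration (the function name is hypothetical) and not AutoSpec's continuum routine; in particular it makes no attempt to mask emission or absorption lines before fitting.

```python
import numpy as np


def subtract_continuum(wave, flux, order=5):
    """Fit a polynomial continuum and return (residual, continuum)."""
    wave = np.asarray(wave, dtype=float)
    flux = np.asarray(flux, dtype=float)
    # Rescale wavelengths to [-1, 1] so the fit stays well conditioned.
    x = 2.0 * (wave - wave.min()) / (wave.max() - wave.min()) - 1.0
    coeffs = np.polyfit(x, flux, order)
    continuum = np.polyval(coeffs, x)
    return flux - continuum, continuum
```

A higher order follows more structure in the continuum but also starts to fit broad spectral features, which is why a moderate value is a sensible starting point.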

PLOTS: this parameter lets the user decide whether or not to output plots. The software creates up to 4 plots per source. ID_IMAGES.jpg shows a postage stamp of each of the images with the corresponding segmentation map and any additional segmentation maps provided. ID_MASKS.jpg shows the sky and object masks generated by AutoSpec. ID_SPECTRA.jpg shows the reference and cross-correlation spectra. Finally, ID_XCOR.jpg displays the calculated cross-correlation maps.

OUT_XXX: these options let the user decide which objects are saved in the output source FITS file. The average size of the output (with 2 additional images) is ~35 MB if you save everything. The default setting is to not save the subcubes, as they contribute almost all of this file size (with the subcubes turned off the file size is generally only a few hundred kB).

CMAP: here you can specify which of the matplotlib colour maps you would like to use. Examples of the default colourmaps can be found here.

ORIG_XXX: these are related to the MPDAF package (see here) and detail the origin of the source information (detection software and datacube info). This will be saved in the FITS headers for all sources.

WARNINGS: sometimes your datacube might have some extra headers that astropy doesn't like. MPDAF also outputs a lot of info into the terminal that isn't necessary. Turn this parameter on to debug any issue you may be having.

SExtractor File

SExtractor will use the default.nnw, default.param, default.sex and .conv files present in the working directory. If not present, default parameter files are created and used. It is best to try running SExtractor on the images first to get the settings right. If you are using multiple images where a single SExtractor file is not ideal, you can create the segmentation maps using your input images outside of AutoSpec and load them under the SEG parameter instead (be sure to read the caveat first).

Running the Code

My advice would be to try this on a single object first and make sure it works how you want by outputting the plots and/or all of the images, spectra, etc. (defined in the OUT_XXX parameters explained above). If you are using AutoSpec to produce segmentation maps, you should also check that the SExtractor file (default.sex) is set up correctly for your data; a simple check is to look at the ID_IMAGES.jpg output and see how well it is defining the segmentation maps. If you haven't used SExtractor before, there is much too much to explain here, but the 'for dummies' manual is a good place to start.

If you are using the extra cross-correlation step, you may want to check if it is doing a good job. The best way to do this is to compare the reference and final spectra (top and bottom on the ID_SPECTRA.jpg image). The cross-correlation spectrum usually has visibly better signal to noise, and the emission/absorption features tend to be more well defined for sources with contamination or extractions without ancillary imaging. Otherwise, results tend to be comparable to extractions weighted by deep imaging.

You can also check the output file by following the steps detailed below.

Loading the Output

The output files are created and saved via the MPDAF framework (here) in fits format. You should be able to open these however you normally open fits files but some basic python commands are detailed here (see the mpdaf page for more):

from mpdaf.sdetect import Source

# load the file.
source = Source.from_file('filename.fits')

# view the contents.
source.info()

# plot an image.
source.images['MUSE_WHITE'].plot(title='MUSE WHITE')

Running the tests

Download the test folder from this GitHub page to somewhere on your computer. You will also need to download the datacube into the test folder; it can be downloaded from here (it can't be uploaded to GitHub due to its file size).

You can run the code as is by opening a terminal and navigating to the test directory and running the code with:

# executable
AutoSpec

# non-executable
/installed_dir/AutoSpec

More information can be found in the usage section.

I tried to choose test data in which there were a range of objects at various redshifts, some of which need deblending from a neighbouring source (look at objects ID:207 and 208).

For further test data, there are various datasets available on the MUSE website and at the ESO Archive.

Current Issues

  • Only one SExtractor file for each run (over all images). This is due to the way the MPDAF module works. To avoid this issue, run SExtractor manually and import the segmentation maps via the SEG parameter instead of defining the images in the IMG parameter (again, be sure to read the caveat for this, as the segmentation maps need to be aligned with the datacube beforehand to function correctly).

Further Improvements

Here's a list of functionality that I'd like to add in the near future:

  • Test/adapt code to work with python 2?
  • Fix spectra that don't extract due to empty segmentation maps.
  • Let users specify a weight image for initial spectral extraction.
  • Create output for summary of results (if successful or error encountered etc).
  • Test on other systems (windows/mac)
  • Test compatibility with other datacubes (not just MUSE).
  • Allow user to specify number of cores to use. (Now uses faster numpy method instead of multiprocessing)
  • Fix automatic MUSE naming for use with different data.
  • Let user import a segmentation map instead of images.
  • Add more useful information to output logs.
  • Let user specify wavelength ranges to extract narrow band images around emission lines.
  • Implement an iterative process to perform cross-correlation mapping.

...and some more long term goals:

  • Create GUI interface.
  • Direct redshift estimation?
  • Add option to output fits file for MARZ redshift analysis.
  • Integrate the input of MUSELET and other input catalogues (see here)
  • Improve speed of continuum subtraction.
  • Adapt code to work on a per object basis.

Authors

How to Cite

The paper describing the original method can be found here: http://adsabs.harvard.edu/abs/2018arXiv180705922G or here https://arxiv.org/abs/1807.05922

Please cite AutoSpec as:

\bibitem[Griffiths \& Conselice(2018)]{2018arXiv180705922G} Griffiths, A., \& Conselice, C.~J.\ 2018, arXiv:1807.05922, Zenodo:1305848  

Acknowledgements

The creation of this software wouldn't have been possible without:

License

Copyright (c) 2018, Alex Griffiths

AutoSpec is licensed under a BSD 3-Clause License.

Changelog

v.1.1.2: September 20, 2018

  • Small fixes over 1.1.1 where apertures were not extracting correctly.
  • Changed USE_WHITE parameter to USE_IMGS parameter for users with existing segmentation maps.

v.1.1.1: September 7, 2018

  • Small fix over 1.1.0 where images were not importing properly.

v.1.1.0: September 5, 2018

  • Tidied up code and simplified parameter file.
  • Fixed issue where weighted spectra were not being extracted correctly.
  • Cross-correlation based spectra are extracted before and after continuum subtraction.
  • Cross-correlation extractions use masks as defined by the reference spectra.
  • Now extracts spectra for all defined apertures and weight images supplied in param.py as well as white light.
  • Fixed various other small bugs.

v.1.0.1: July 23, 2018

  • Added DATA_EXT parameter for user to specify data and/or variance datacube extensions.
  • Cleaned up some code.
  • Added citation information to README

v.1.0.0: July 5, 2018

  • Increased the speed of continuum subtraction routine (now ~8x faster).
  • Adapted code to work on python 2.
  • Fixed a number of small issues.
  • Can extract spectra from a list of user defined apertures.
  • User can now import additional segmentation maps.
  • User can specify extraction methods on a per object basis.

Testing: March 23, 2018

  • Functioning code for testing purposes.