spacetelescope · braingram · Oct 2, 2024 · Aug 8, 2024
@@ -0,0 +1 @@
+Remove unused arguments to outlier detection.
@@ -0,0 +1 @@
+Update input handling to raise an exception on an invalid input instead of issuing a warning and skipping the step.
@@ -0,0 +1 @@
+Use stcal common code in outlier detection.
@@ -1,16 +1,19 @@
 .. _outlier_detection_step_args:
 
+For more details about step arguments (including data types, possible values,
+and defaults), see :py:obj:`romancal.outlier_detection.OutlierDetectionStep.spec`.
+
 Step Arguments
 ==============
 The ``outlier_detection`` step has the following optional arguments that control the
 behavior of the processing:
 
-``--weight_type`` (string, default='exptime')
+``--weight_type``
   The type of data weighting to use during resampling the images for creating the
   median image used for detecting outliers; options are `'ivm'`, `'exptime'`,
   and `None` (see :ref:`weight_type_options_details_section` for details).
 
-``--pixfrac`` (float, default=1.0)
+``--pixfrac``
   Fraction by which input pixels are “shrunk” before being drizzled onto the output
   image grid, given as a real number between 0 and 1. This specifies the size of the
   footprint, or “dropsize”, of a pixel in units of the input pixel size. If `pixfrac`
@@ -20,7 +23,7 @@ behavior of the processing:
   output drizzled image is fully populated with pixels from the input image.
   Valid values range from 0.0 to 1.0.
 
-``--kernel`` (string, default='square')
+``--kernel``
   This parameter specifies the form of the kernel function used to distribute
   flux onto the separate output images, for the initial separate drizzling
   operation only. The value options for this parameter include:
@@ -43,7 +46,7 @@ behavior of the processing:
          should never be used for ``pixfrac != 1.0``, and is not recommended
          for ``scale!=1.0``.
 
-``--fillval`` (string, default='INDEF')
+``--fillval``
     The value for this parameter is to be assigned to the output pixels that
     have zero weight or which do not receive flux from any input pixels during
     drizzling. This parameter corresponds to the ``fillval`` parameter of the
@@ -55,77 +58,53 @@ behavior of the processing:
     Any floating-point value, given as a string, is valid.
     A value of 'INDEF' will use the last zero weight flux.
 
-``--nlow`` (integer, default=0)
-  The number of low values in each pixel stack to ignore when computing the median
-  value.
-
-``--nhigh`` (integer, default=0)
-  The number of high values in each pixel stack to ignore when computing the median
-  value.
-
-``--maskpt`` (float, default=0.7)
+``--maskpt``
   Percentage of weight image values below which they are flagged as bad and rejected
   from the median image. Valid values range from 0.0 to 1.0.
 
-``--grow`` (integer, default=1)
-  The distance, in pixels, beyond the limit set by the rejection algorithm being
-  used, for additional pixels to be rejected in an image.
-
-``--snr`` (string, default='4.0 3.0')
+``--snr``
   The signal-to-noise values to use for bad pixel identification. Since cosmic rays
   often extend across several pixels the user must specify two cut-off values for
   determining whether a pixel should be masked: the first for detecting the primary
   cosmic ray, and the second (typically lower threshold) for masking lower-level bad
   pixels adjacent to those found in the first pass.  Valid values are a pair of
-  floating-point values in a single string.
+  floating-point values in a single string (for example "5.0 4.0").
 
-``--scale`` (string, default='0.5 0.4')
+``--scale``
   The scaling factor applied to derivative used to identify bad pixels. Since cosmic
   rays often extend across several pixels the user must specify two cut-off values for
   determining whether a pixel should be masked: the first for detecting the primary
   cosmic ray, and the second (typically lower threshold) for masking lower-level bad
   pixels adjacent to those found in the first pass.  Valid values are a pair of
-  floating-point values in a single string.
+  floating-point values in a single string (for example "1.2 0.7").
 
-``--backg`` (float, default=0.0)
+``--backg``
   User-specified background value (scalar) to subtract during final identification
   step of outliers in `driz_cr` computation.
 
-``--kernel_size`` (string, default='7 7')
-  Size of kernel to be used during resampling of the data
-  (i.e. when `resample_data=True`).
-
-``--save_intermediate_results`` (boolean, default=False)
-  Specifies whether or not to write out intermediate products such as median image or
+``--save_intermediate_results``
+  Boolean specifying whether or not to write out intermediate products such as median image or
   resampled individual input exposures to disk. Typically, only used to track down
   problems with final results when too many or too few pixels are flagged as outliers.
 
-``--resample_data`` (boolean, default=True)
-  Specifies whether or not to resample the input images when performing outlier
+``--resample_data``
+  Boolean specifying whether or not to resample the input images when performing outlier
   detection.
 
-``--good_bits`` (string, default=0)
+``--good_bits``
   The DQ bit values from the input image DQ arrays that should be considered 'good'
   when creating masks of bad pixels during outlier detection when resampling the data.
   See `Roman's Data Quality Flags
   <https://github.com/spacetelescope/romancal/blob/main/romancal/lib/dqflags.py>`_
   for details.
 
-``--allowed_memory`` (float, default=None)
-  Specifies the fractional amount of free memory to allow when creating the resampled
-  image. If ``None``, the environment variable ``DMODEL_ALLOWED_MEMORY`` is used. If
-  not defined, no check is made. If the resampled image would be larger than specified,
-  an ``OutputTooLargeError`` exception will be generated. For example, if set to
-  ``0.5``, only resampled images that use less than half the available memory can be
-  created.
-
-``--in_memory`` (boolean, default=False)
-  Specifies whether or not to keep all intermediate products and datamodels in
+``--in_memory``
+  Boolean specifying whether or not to keep all intermediate products and datamodels in
   memory at the same time during the processing of this step.  If set to `False`,
-  all input and output data will be written to disk at the start of the step
-  (as much as `roman_datamodels` will allow, anyway), then read in to memory only when
-  accessed.  This results in a much lower memory profile at the expense of file I/O,
-  which can allow large mosaics to process in more limited amounts of memory.
+  any `ModelLibrary` opened by this step will use ``on_disk=True`` and use temporary
+  files to store model modifications. Additionally, any resampled images will
+  be kept in memory (as long as needed). This can result in much lower memory
+  usage (at the expense of file I/O) when processing large associations.
 
 .. _weight_type_options_details_section:
 

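Review note: the ``snr`` and ``scale`` arguments documented above are each a pair of floating-point values in a single string. A minimal sketch of how such a string could be split into its two thresholds follows. The helper name `parse_pair` is hypothetical and is not taken from romancal or stcal; the step may parse these values differently.

```python
def parse_pair(value):
    """Split a string such as "5.0 4.0" into two floats (hypothetical helper)."""
    parts = value.split()
    if len(parts) != 2:
        raise ValueError(f"expected two space-separated floats, got {value!r}")
    first, second = (float(p) for p in parts)
    return first, second


# The example values quoted in the argument descriptions above.
snr1, snr2 = parse_pair("5.0 4.0")
scale1, scale2 = parse_pair("1.2 0.7")
```

The first value of each pair applies to the primary cosmic-ray detection pass and the second to the pass over adjacent pixels, as the argument descriptions state.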
@@ -55,20 +55,13 @@ Specifically, this routine performs the following operations:
 
    * The median image is created by combining all grouped mosaic images or
      non-resampled input data pixel-by-pixel.
-   * The ``nlow`` and ``nhigh`` parameters specify how many low and high values
-     to ignore when computing the median for any given pixel.
    * The ``maskpt`` parameter sets the percentage of the weight image values to
      use, and any pixel with a weight below this value gets flagged as "bad" and
      ignored when resampled.
-   * The ``grow`` parameter sets the width, in pixels, beyond the limit set by
-     the rejection algorithm being used, for additional pixels to be rejected in
-     an image.
-   * The median image is written out to disk as `_<asn_id>_median` by default.
 
 #. By default, the median image is blotted back (inverse of resampling) to
    match each original input image.
 
-   * Blotted images are written out to disk as `_<asn_id>_blot` by default.
    * **If resampling is turned off**, the median image is compared directly to
      each input image.
 
@@ -136,26 +129,16 @@ memory usage at the expense of file I/O.  The control over this memory model hap
 with the use of the ``in_memory`` parameter.  The full impact of this parameter
 during processing includes:
 
-#. The ``save_open`` parameter gets set to `False`
+#. The ``on_disk`` parameter gets set to `True`
    when opening the input :py:class:`~romancal.datamodels.library.ModelLibrary`
-   object. This forces all input models in the input
-   :py:class:`~romancal.datamodels.library.ModelLibrary` to get written out to disk.
-   It then uses the filename of the input model during subsequent processing.
+   object. This causes modified models to be written to temporary files.
 
-#. The ``in_memory`` parameter gets passed to the :py:class:`~romancal.resample.ResampleStep`
-   to set whether or not to keep the resampled images in memory or not.  By default,
-   the outlier detection processing sets this parameter to `False` so that each resampled
-   image gets written out to disk.
-
-#. Computing the median image works section-by-section by only keeping 1Mb of each input
-   in memory at a time.  As a result, only the final output product array for the final
-   median image along with a stack of 1Mb image sections are kept in memory.
-
-#. The final resampling step also avoids keeping all inputs in memory by only reading
-   each input into memory 1 at a time as it gets resampled onto the final output product.
+#. Computing the median image uses temporary files. Each resampled group
+   is split into sections (1 per "row") and each section is appended to a different
+   temporary file. After resampling all groups, each temporary file is read and a
+   median is computed for all sections in that file (yielding a median for that
+   section across all resampled groups). Finally, these median sections are
+   combined into a final median image.
 
 These changes result in a minimum amount of memory usage during processing at the obvious
 expense of reading and writing the products from disk.
-
-
-.. automodapi:: romancal.outlier_detection.outlier_detection
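Review note: the temporary-file median described in the updated list above can be sketched as follows. This is an illustration under stated assumptions, not the stcal implementation: the function name `median_via_temp_files` and the `n_sections` parameter are hypothetical, the sketch splits each image into a fixed number of row sections rather than one per row, and it assumes `n_sections` does not exceed the number of rows.

```python
import tempfile
from pathlib import Path

import numpy as np


def median_via_temp_files(images, n_sections=2):
    """Median of equally shaped 2D arrays, computed one section file at a time.

    Hypothetical sketch; assumes n_sections <= number of image rows.
    """
    shape = None
    n_images = 0
    with tempfile.TemporaryDirectory() as tmpdir:
        paths = [Path(tmpdir) / f"section_{i}.bin" for i in range(n_sections)]
        for image in images:
            image = np.asarray(image, dtype=np.float32)
            if shape is None:
                shape = image.shape
            elif image.shape != shape:
                raise ValueError("all images must share the same shape")
            # Split along rows and append each section to its own file.
            sections = np.array_split(image, n_sections, axis=0)
            for path, section in zip(paths, sections):
                with open(path, "ab") as f:
                    f.write(section.tobytes())
            n_images += 1
        if shape is None:
            raise ValueError("at least one image is required")
        # Read each file back and take the median across images for that section.
        medians = []
        for path in paths:
            stack = np.fromfile(path, dtype=np.float32)
            stack = stack.reshape(n_images, -1, shape[1])
            medians.append(np.nanmedian(stack, axis=0))
    # Combine the per-section medians into the final median image.
    return np.concatenate(medians, axis=0)


images = [np.full((4, 3), v, dtype=np.float32) for v in (1.0, 2.0, 9.0)]
median = median_via_temp_files(images)  # every pixel is the median of 1, 2, 9
```

Only one image is held in memory while writing, and only one section stack while reading, which is the trade of memory for file I/O that the text describes.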
@@ -4,9 +4,8 @@ OutlierDetectionStep
 --------------------
 
 This module provides the sole interface to all methods of performing outlier detection
-on Roman observations. The outlier detection algorithm used for WFI data is implemented
-in :py:class:`~romancal.outlier_detection.outlier_detection.OutlierDetection`
-and described in :ref:`outlier-detection-imaging`.
+on Roman observations. The outlier detection algorithm used for WFI data is
+described in :ref:`outlier-detection-imaging`.
 
 .. note::
     Whether the data are being provided in an `association file`_ or as a list of ASDF filenames,

@@ -0,0 +1,45 @@
+import logging
+
+from astropy.units import Quantity
+
+log = logging.getLogger(__name__)
+log.setLevel(logging.DEBUG)
+
+
+def save_median(example_model, median_data, median_wcs, make_output_path):
+    _save_intermediate_output(
+        _make_median_model(example_model, median_data, median_wcs),
+        "median",
+        make_output_path,
+    )
+
+
+def save_drizzled(drizzled_model, make_output_path):
+    _save_intermediate_output(drizzled_model, "outlier_i2d", make_output_path)
+
+
+def _make_median_model(example_model, data, wcs):
+    model = example_model.copy()
+    model.data = Quantity(data, unit=model.data.unit)
+    model.meta.filename = "drizzled_median.asdf"
+    model.meta.wcs = wcs
+    return model
+
+
+def _save_intermediate_output(model, suffix, make_output_path):
+    """
+    Save an intermediate output from OutlierDetectionStep under a consistent file name.
+
+    Notes
+    -----
+    The step's make_output_path is updated globally in the main pipeline to
+    include the asn_id in the output path, so there is no need to handle it here.
+    """
+
+    # outlier_?2d is not a known suffix, and make_output_path cannot handle an
+    # underscore in an unknown suffix, so do a manual string replacement
+    input_path = model.meta.filename.replace("_outlier_", "_")
+
+    output_path = make_output_path(input_path, suffix=suffix)
+    model.save(output_path)
+    log.info(f"Saved {suffix} model in {output_path}")
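Review note: the manual `"_outlier_"` replacement in `_save_intermediate_output` can be illustrated with a stand-in for `make_output_path`. Everything below is an assumption for illustration: `make_output_path_stub` is hypothetical and only mimics the idea of swapping the trailing suffix, the real step method may behave differently, and the file name `r0000101_outlier_i2d.asdf` is made up.

```python
import os


def make_output_path_stub(basepath, suffix):
    """Hypothetical stand-in for the step's make_output_path.

    Replaces the last underscore-separated token of the file stem with
    ``suffix``; the real method may behave differently.
    """
    stem, ext = os.path.splitext(basepath)
    root = stem.rsplit("_", 1)[0]
    return f"{root}_{suffix}{ext}"


def intermediate_output_path(filename, suffix, make_output_path):
    # Same manual replacement as in _save_intermediate_output above.
    input_path = filename.replace("_outlier_", "_")
    return make_output_path(input_path, suffix=suffix)


# With the replacement, the two-part suffix is not duplicated.
path = intermediate_output_path(
    "r0000101_outlier_i2d.asdf", "outlier_i2d", make_output_path_stub
)

# Without it, the stub only strips "i2d" and leaves "_outlier" in place.
naive = make_output_path_stub("r0000101_outlier_i2d.asdf", suffix="outlier_i2d")
```

Under these assumptions `path` keeps a single `outlier_i2d` suffix while `naive` repeats `outlier`, which is the situation the code comment about underscores in unknown suffixes appears to guard against.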