Commit d52f20c (deploy: 01c3fc3)
aadyotb committed Mar 22, 2023 · 1 parent 1450a58
Showing 83 changed files with 7,990 additions and 16,677 deletions.
194 changes: 194 additions & 0 deletions latest/_sources/architecture.rst.txt
Merlion Architecture
====================
This document is intended for Merlion developers. It outlines the architecture of Merlion's key components,
and how they interact with each other. In general, everything in this document describes the ``base.py`` files
of the modules being discussed.

Transforms
----------
:doc:`Transforms <merlion.transform>` in Merlion apply various useful pre-processing to time series data.

Training
^^^^^^^^
Many transforms are *trainable*.
For example, if we want to normalize the data to have zero mean and unit variance, we use training data to learn the
mean and variance of each variable in the time series. If we wish to resample the data to a fixed granularity, we use
the most commonly observed timedelta in the training data.
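
For example, here is a minimal sketch of training and applying a trainable transform. The
``MeanVarNormalize`` class lives in :py:mod:`merlion.transform.normalize`; treat the surrounding
setup as illustrative.

.. code-block:: python

    from merlion.transform.normalize import MeanVarNormalize
    from merlion.utils import TimeSeries

    train_data = TimeSeries.from_pd(train_df)  # train_df: a time-indexed pandas DataFrame

    mvn = MeanVarNormalize()
    mvn.train(train_data)          # learns the mean & variance of each univariate
    normalized = mvn(train_data)   # applies y = (x - mu) / sigma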

Inversion
^^^^^^^^^
Many transforms are *invertible*.
For example, one may invert the normalization ``y = (x - mu) / sigma`` via ``x = sigma * y + mu``.
However, other transforms are lossy, and the input cannot be recovered without a *state*. For example, consider the
difference transform ``y[i+1] = x[i+1] - x[i]``. We need to record ``x[0]`` as the ``transform.inversion_state``
in order to invert the difference transform and recover ``x`` from ``y``.

For invertible transforms which require an inversion state, we handle the inversion state as follows:

* When the transform is called, the inversion state is set. For example, if ``diff = DifferenceTransform()``,
``y = diff(x)`` will record the first observation of each univariate in ``x`` as its inversion state.
* When ``transform.invert(y)`` is called, the inversion state is reset to ``None``, unless the user explicitly
invokes ``transform.invert(y, retain_inversion_state=True)``. This ensures that the user doesn't inadvertently
apply a stale inversion state to a new time series.

Some transforms are not invertible at all (e.g. resampling). In these cases, ``transform.invert(y)`` simply returns
``y``, and a warning is emitted.
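
The inversion state handling above can be illustrated with a short sketch (the
``DifferenceTransform`` import path is our best recollection; the call signatures are the ones
described in the bullets above):

.. code-block:: python

    from merlion.transform.moving_average import DifferenceTransform

    diff = DifferenceTransform()
    y = diff(x)  # x is a TimeSeries; the first observation of each univariate is recorded

    # Retain the inversion state so we can invert more than once.
    x_hat = diff.invert(y, retain_inversion_state=True)
    x_hat = diff.invert(y)  # the inversion state is reset to None after this call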

Multivariate Time Series
^^^^^^^^^^^^^^^^^^^^^^^^
For the time being, all transforms are applied identically to all univariates in a time series.
We generally track the variables required for each univariate via a dictionary that maps the name of the univariate to
the variables relevant for it. We explicitly key on the names of the univariates so that everything behaves as expected
even if the individual variables are reordered.

A notable limitation of the current implementation is the fact that we cannot currently apply different transforms to
different univariates. For example, we cannot mean-variance normalize univariate 0 and apply a difference transform
to univariate 1. If there is demand for this sort of behavior in the future, we may consider adding a parameter to
each transform which indicates the names of the univariates it should be applied to. This may be combined with a
:py:class:`TransformStack <merlion.transform.sequence.TransformStack>` to apply different transforms to different
univariates. A new tutorial should be written if this feature is added.

Models
------
:doc:`Models <merlion.models>` are the central object in Merlion.

Pre-Processing
^^^^^^^^^^^^^^
Each ``model`` has a ``model.transform`` which pre-processes the data. Automatically applying this transform at both
training and inference time (and inverting the transform for forecasting) is a key feature of Merlion models. Note that
``model.transform`` is generally just a reference to ``model.config.transform``.
If your data is already pre-processed, then you can set ``model.transform`` to be the
:py:class:`Identity <merlion.transform.base.Identity>`.
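
For instance (``Arima`` is just an illustrative choice of model, and the specific config kwargs
shown are assumptions):

.. code-block:: python

    from merlion.models.forecast.arima import Arima, ArimaConfig
    from merlion.transform.base import Identity

    # Disable pre-processing for data that is already pre-processed.
    model = Arima(ArimaConfig(max_forecast_steps=100, transform=Identity()))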

When ``model.train()`` is called, the first step is to call ``model.train_pre_process()``. This method

* Records the dimension of the training data as ``model.dim``
* Trains ``model.transform`` and applies it to the training data
* Records the sampling frequency of the transformed training data as ``model.timedelta``
(as well as the offset ``model.timedelta_offset``)
* For forecasters, we additionally train and apply ``model.exog_transform`` on the exogenous data if any are given.
We also record the dimension of the exogenous data as ``model.exog_dim``.

For anomaly detection, ``model.get_anomaly_score(time_series, time_series_prev)``
includes the following pre-processing steps:

* Apply ``model.transform`` to the concatenation of ``time_series_prev`` and ``time_series``.
* Ensure that the data's dimension matches the dimension of the training data.
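
For instance (variable names illustrative; ``model`` is any trained anomaly detector):

.. code-block:: python

    # time_series_prev supplies recent context; it is concatenated with time_series,
    # passed through model.transform, and checked against model.dim.
    scores = model.get_anomaly_score(time_series=test_data, time_series_prev=recent_data)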

For forecasting, ``model.forecast(time_stamps, time_series_prev, exog_data)``
includes the following pre-processing steps:

* If the model expects time series to be sampled at a fixed frequency, resample ``time_stamps``
to the frequency specified by ``model.timedelta`` and ``model.timedelta_offset``.
* Save the current inversion state of ``model.transform``, and then apply ``model.transform`` to ``time_series_prev``.
* If ``exog_data`` is given, apply ``model.exog_transform`` to ``exog_data``, and
resample ``exog_data`` to the same time stamps as ``time_series_prev`` (after the transform) and ``time_stamps``.
* Ensure that the dimensions of ``time_series_prev`` and ``exog_data`` match the training data.
See `tutorials/forecast/3_ForecastExogenous` for more details on exogenous regressors.
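
A typical call looks like the following sketch (variable names illustrative):

.. code-block:: python

    forecast, stderr = model.forecast(
        time_stamps=test_data.time_stamps,  # resampled to model.timedelta if needed
        time_series_prev=recent_data,       # transformed, with the inversion state saved
        exog_data=exog_data,                # optionally transformed & resampled
    )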

User-Defined Implementations
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
After pre-processing the input data, we pass it to the user-defined implementations ``model._train()``,
``model._train_with_exog()``, ``model._get_anomaly_score()``, or ``model._forecast()``. These methods do the real work
of training or inference for the underlying model, and they are the methods that must be implemented for each new model.
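
Below is a schematic sketch of a new forecaster. ``MeanForecaster`` is hypothetical, and the
``_train``/``_forecast`` signatures are simplified; consult
:py:class:`ForecasterBase <merlion.models.forecast.base.ForecasterBase>` for the exact contracts.

.. code-block:: python

    import pandas as pd

    from merlion.models.forecast.base import ForecasterBase, ForecasterConfig

    class MeanForecasterConfig(ForecasterConfig):
        pass

    class MeanForecaster(ForecasterBase):
        config_class = MeanForecasterConfig

        def _train(self, train_data: pd.DataFrame, train_config=None):
            # train_data has already been pre-processed by model.transform
            self.mean = float(train_data.iloc[:, self.target_seq_index].mean())
            forecast = pd.DataFrame(self.mean, index=train_data.index, columns=[self.target_name])
            return forecast, None  # (forecast of the training data, optional stderr)

        def _forecast(self, time_stamps, time_series_prev=None, return_prev=False):
            index = pd.to_datetime(time_stamps, unit="s")
            return pd.DataFrame(self.mean, index=index, columns=[self.target_name]), None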

Post-Processing
^^^^^^^^^^^^^^^
After training, both anomaly detectors and forecasters apply ``model.train_post_process()`` on the output of
``model._train()``. For anomaly detectors, this involves training the post-rule (calibrator and threshold) and then
returning the anomaly scores produced by ``model._train()``. For forecasters, this involves applying the inverse of
``model.transform`` to the forecast returned by ``model._train()``.

For anomaly detectors, the final step of calling ``model.get_anomaly_label()`` is to apply the post-rule on the
unprocessed anomaly scores. For forecasters, we apply the inverse transform on the forecast and then set the inversion
state of ``model.transform`` to be what it was before ``model.forecast()`` was called.
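
The distinction can be seen in a short sketch (``model`` is any trained anomaly detector):

.. code-block:: python

    raw_scores = model.get_anomaly_score(test_data)  # no post-rule applied
    labels = model.get_anomaly_label(test_data)      # applies model.post_rule (calibration + threshold)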

Multiple Time Series
^^^^^^^^^^^^^^^^^^^^
If we extend Merlion to accommodate training models on multiple time series, we must make some changes to the way that
models handle transforms. In particular,

* ``model.transform`` should be re-trained for each time series individually.

* At training time, we will probably need to write a new method ``model.train_pre_process_multiple()`` which
uses a different copy of ``model.transform`` for each time series. The other functionality should be similar to
``model.train_pre_process()``.
* At inference time, ``time_series_prev`` must be a required parameter, and a copy of ``model.transform``
should be trained on ``time_series_prev``.
* To make training code easier to write, ``model.train_multiple()`` probably doesn't need to return anything when
trained on multiple time series. This also removes the need to invert the transform on the training data.
* For anomaly detection, the :doc:`post-processing transforms <merlion.post_process>` should be updated to accommodate
  multiple time series. This is especially important for calibration. For example, if we receive 10 time series of
  anomaly scores, we should use all 10 to learn a single calibrator, rather than learning one calibrator per time
  series. The underlying assumption is that the anomaly score distributions should be similar across all time series.
* For forecasting, ``model.transform`` can be trained and applied on ``time_series_prev``, and then inverted on the
concatenation of ``time_series_prev`` and ``forecast`` as it is done now, via a call to ``model._process_forecast()``.
``model.exog_transform`` should also be handled similarly (minus the inversion).
See `tutorials/forecast/3_ForecastExogenous` for more details on exogenous regressors.

In general, the code changes to ``model.forecast()`` and ``model.get_anomaly_score()`` are relatively minor.
If the flag ``model.multi_series`` is set, we make sure that ``time_series_prev`` is given and then train
``model.transform`` and ``model.exog_transform`` on ``time_series_prev`` and ``exog_data`` respectively. After this
point, the functions should be unchanged.

Model Variants
--------------
There are a number of model variants which either build upon the above model classes or modify them slightly.

Simple Variants
^^^^^^^^^^^^^^^
Below are some simpler model variants that are useful to understand:

* In order to support forecasting with exogenous regressors, we implement the
:py:class:`ForecasterExogBase <merlion.models.forecast.base.ForecasterExogBase>` base class.
Most of the functionality to support exogenous regressors is actually implemented in
:py:class:`ForecasterBase <merlion.models.forecast.base.ForecasterBase>`, which this class inherits from. The only
real difference is that a few internal fields have been changed to indicate that exogenous regressors are supported.
* We support using basic forecasters as the basis for anomaly detection models. The key piece is the mixin class
:py:class:`ForecastingDetectorBase <merlion.models.anomaly.forecast_based.base.ForecastingDetectorBase>`.
* Some models don't work unless the input is pre-normalized. To support these models, we implement the
:py:class:`NormalizingConfig <merlion.models.base.NormalizingConfig>`. This config class applies a
``MeanVarNormalize`` after any other pre-processing (specified by the user in ``transform``) has been applied.
The full transform is accessed via ``config.full_transform``. Models automatically understand how this works because
the property ``model.transform`` tries to get ``model.config.full_transform`` if possible and defaults to
  ``model.config.transform`` otherwise. To use this class when implementing a new model, simply add ``NormalizingConfig``
  as a base class of your model's config, as sketched after this list.
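
A minimal sketch, assuming an anomaly detector (``MyDetectorConfig`` and its ``hidden_dim``
parameter are hypothetical):

.. code-block:: python

    from merlion.models.anomaly.base import DetectorConfig
    from merlion.models.base import NormalizingConfig

    class MyDetectorConfig(DetectorConfig, NormalizingConfig):
        def __init__(self, hidden_dim: int = 32, **kwargs):
            self.hidden_dim = hidden_dim
            super().__init__(**kwargs)

    # config.full_transform = the user-specified transform, followed by MeanVarNormalize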

Ensembles
^^^^^^^^^
Merlion supports ensembles of both anomaly detectors and forecasters. The ensemble config has two key components
which make this possible: ``ensemble.config.models`` contains all the models present in the ensemble, while
``ensemble.config.combiner`` contains a :py:mod:`combiner <merlion.models.ensemble.combine>` object which defines
a way of combining the outputs of multiple models. Supported combiners include ``Mean``, ``Median``, and
``ModelSelector``, which selects the best model according to an evaluation metric. When doing model selection, the
``ensemble.train()`` method automatically splits the training data into training and validation splits, evaluates the
performance of each model on the validation split, and then re-trains each model on the full training data.
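
For example, here is a sketch of a forecaster ensemble that performs model selection by sMAPE (the
choice of component models is illustrative):

.. code-block:: python

    from merlion.evaluate.forecast import ForecastMetric
    from merlion.models.ensemble.combine import ModelSelector
    from merlion.models.ensemble.forecast import ForecasterEnsemble, ForecasterEnsembleConfig
    from merlion.models.forecast.arima import Arima, ArimaConfig
    from merlion.models.forecast.prophet import Prophet, ProphetConfig

    ensemble = ForecasterEnsemble(
        config=ForecasterEnsembleConfig(combiner=ModelSelector(metric=ForecastMetric.sMAPE)),
        models=[Arima(ArimaConfig()), Prophet(ProphetConfig())],
    )
    ensemble.train(train_data)  # splits train/validation, selects, then re-trains on everything
    forecast, stderr = ensemble.forecast(time_stamps=test_data.time_stamps)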

One possible improvement is to parallelize the training of the individual models in the ensemble. We can probably
just use Python's native ``multiprocessing`` library.

Layered Models
^^^^^^^^^^^^^^
Layered models are a useful abstraction for models that act as a wrapper around another model. This feature is
especially useful for AutoML. Like ensembles, we store the wrapped model in ``layered_model.config.model``,
and ``layered_model.model`` is a reference to ``layered_model.config.model``. The *base model* is the model at the
lowest level of the hierarchy.

There are a number of dirty tricks used to (1) ensure that layered anomaly detectors and forecasters inherit from the
right base classes, (2) ensure that config parameters are not duplicated between different levels of the hierarchy, and
(3) let users access a parameter like ``layered_model.config.max_forecast_steps`` (which should only be defined for the
base model) and receive ``layered_model.base_model.config.max_forecast_steps`` directly.

The documentation for :py:mod:`merlion.models.layers` has some more details.
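
As an illustration of the parameter forwarding described above (``AutoSarima`` is a real layered
model, but treat the zero-argument config construction as an assumption):

.. code-block:: python

    from merlion.models.automl.autosarima import AutoSarima, AutoSarimaConfig

    model = AutoSarima(AutoSarimaConfig())
    print(model.base_model)                 # the wrapped Sarima at the bottom of the hierarchy
    print(model.config.max_forecast_steps)  # forwarded to model.base_model.config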

Post-Processing
---------------
Distinct :doc:`post-rules <merlion.post_process>` are only relevant for anomaly detection.
There are two types of post-rules: calibration and thresholding. Similar to transforms, post-rules may be trained by
calling ``post_rule.train(train_anom_scores)`` and applied by calling ``post_rule(anom_scores)``. Extending post-rules
so that they can be trained on multiple time series simultaneously is a worthwhile direction to investigate.
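
As a sketch of the train/apply pattern above (``AggregateAlarms`` and its ``alm_threshold``
parameter come from :py:mod:`merlion.post_process.threshold`; a model normally manages its own
``post_rule``, so constructing one by hand is purely illustrative):

.. code-block:: python

    from merlion.post_process.threshold import AggregateAlarms

    post_rule = AggregateAlarms(alm_threshold=3.5)
    post_rule.train(train_anom_scores)  # anomaly scores produced during model training
    labels = post_rule(anom_scores)     # thresholded labels for new anomaly scores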

Other Modules
-------------
Most other modules are stand-alone pieces that don't directly interact with each other, except in longer pipelines. We
defer to the main documentation in :doc:`merlion`.
4 changes: 4 additions & 0 deletions latest/_sources/index.rst.txt
For code resources, we recommend the linked tutorials on `anomaly detection <tutorials/anomaly/0_AnomalyIntro>`
and `forecasting <tutorials/forecast/0_ForecastIntro>`. After that, you should read in more detail about Merlion's
main data structure for representing time series `here <tutorials/TimeSeries>`.

Finally, developers should look at the `architecture <architecture>` document to better understand how Merlion's
key components interact with each other.

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   merlion
   ts_datasets
   tutorials
   architecture


Indices and tables
9 changes: 9 additions & 0 deletions latest/_static/nbsphinx-broken-thumbnail.svg
