unquad: Uncertainty-Quantified Anomaly Detection


unquad is a wrapper applicable to most PyOD detectors (see Supported Estimators), enabling uncertainty-quantified anomaly detection based on one-class classification and the principles of conformal inference.

pip install unquad

Mind the optional dependencies for using deep learning models or the built-in datasets (see pyproject.toml).

What is Conformal Anomaly Detection?


Conformal Anomaly Detection (CAD) applies the principles of conformal inference (conformal prediction) to anomaly detection. It focuses on controlling error metrics such as the false discovery rate (FDR) while maintaining statistical power.

CAD converts anomaly scores to p-values by comparing test scores against calibration scores obtained from normal training data. The p-value of a test score is computed as its normalized rank among the calibration scores. These statistically valid p-values enable error control through methods like Benjamini-Hochberg, replacing traditional anomaly estimates that lack statistical guarantees.
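The two steps above can be sketched in a few lines of NumPy. This is an illustrative implementation of conformal p-values and the Benjamini-Hochberg procedure, not unquad's internal code; function names and the toy data are assumptions for demonstration only.

```python
import numpy as np

def conformal_p_values(calib_scores, test_scores):
    """p-value = normalized rank of each test score among the calibration scores."""
    calib = np.asarray(calib_scores)
    n = calib.size
    # Higher score = more anomalous; count calibration scores at least as extreme.
    return np.array([(1 + np.sum(calib >= s)) / (n + 1) for s in test_scores])

def benjamini_hochberg(p_values, alpha=0.2):
    """Boolean mask of discoveries at nominal FDR level alpha."""
    p = np.asarray(p_values)
    m = p.size
    order = np.argsort(p)
    # Compare sorted p-values against the BH step-up thresholds alpha * k / m.
    below = p[order] <= alpha * (np.arange(1, m + 1) / m)
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    mask = np.zeros(m, dtype=bool)
    mask[order[:k]] = True  # reject the k smallest p-values
    return mask

calib = np.random.default_rng(0).normal(size=1_000)  # scores of normal data
test = np.array([0.1, 2.5, 4.0])                     # increasingly anomalous
p = conformal_p_values(calib, test)
rejected = benjamini_hochberg(p, alpha=0.2)
```

The more extreme a test score is relative to the calibration scores, the smaller its p-value; the `+1` terms guarantee validity of the p-value even for finite calibration sets.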

Usage: Split-Conformal (Inductive Approach)

Using the default behavior of ConformalDetector() with default DetectorConfig().

from pyod.models.gmm import GMM

from unquad.utils.enums import Dataset
from unquad.data.loader import DataLoader
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.split import SplitConformal
from unquad.utils.metrics import false_discovery_rate, statistical_power

dl = DataLoader(dataset=Dataset.SHUTTLE)
x_train, x_test, y_test = dl.get_example_setup(random_state=1)

ce = ConformalDetector(
    detector=GMM(),
    strategy=SplitConformal(calib_size=1_000)
)

ce.fit(x_train)
estimates = ce.predict(x_test)

print(f"Empirical FDR: {false_discovery_rate(y=y_test, y_hat=estimates)}")
print(f"Empirical Power: {statistical_power(y=y_test, y_hat=estimates)}")

Output:

Empirical FDR: 0.108
Empirical Power: 0.892

The behavior can be customized by changing the DetectorConfig():

from dataclasses import dataclass

from unquad.utils.enums import Adjustment, Aggregation

@dataclass
class DetectorConfig:
    alpha: float = 0.2  # Nominal FDR level
    adjustment: Adjustment = Adjustment.BH  # Multiple testing procedure
    aggregation: Aggregation = Aggregation.MEDIAN  # Score aggregation (if necessary)
    seed: int = 1  # Random seed
    silent: bool = True

Usage: Bootstrap-after-Jackknife+ (JaB+)

Using ConformalDetector() with a customized DetectorConfig(). The BootstrapConformal() strategy allows setting 2 of the 3 parameters resampling_ratio, n_bootstraps and n_calib; the remaining parameter is derived automatically. This allows exact control of the calibration procedure when using a bootstrap strategy.
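The 2-of-3 rule can be illustrated with a small helper. This function is hypothetical (not part of unquad) and assumes each bootstrap contributes its out-of-bag share, roughly (1 - resampling_ratio) * n_train samples, to the calibration set; unquad's actual derivation may differ.

```python
def resolve_bootstrap_params(n_train, resampling_ratio=None, n_bootstraps=None, n_calib=None):
    """Given exactly two of the three parameters, derive the third
    (illustrative assumption: n_calib ~= n_bootstraps * (1 - resampling_ratio) * n_train)."""
    given = sum(p is not None for p in (resampling_ratio, n_bootstraps, n_calib))
    if given != 2:
        raise ValueError("Set exactly two of the three parameters.")
    if n_calib is None:
        n_calib = round(n_bootstraps * (1 - resampling_ratio) * n_train)
    elif n_bootstraps is None:
        n_bootstraps = round(n_calib / ((1 - resampling_ratio) * n_train))
    else:
        resampling_ratio = 1 - n_calib / (n_bootstraps * n_train)
    return resampling_ratio, n_bootstraps, n_calib

# e.g. with 1,000 training samples, ratio 0.99 and 20 bootstraps
# leave about 200 out-of-bag calibration scores in total
params = resolve_bootstrap_params(1_000, resampling_ratio=0.99, n_bootstraps=20)
```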

from pyod.models.iforest import IForest

from unquad.data.loader import DataLoader
from unquad.estimator.configuration import DetectorConfig
from unquad.estimator.detector import ConformalDetector
from unquad.strategy.bootstrap import BootstrapConformal
from unquad.utils.enums import Aggregation, Adjustment, Dataset
from unquad.utils.metrics import false_discovery_rate, statistical_power

dl = DataLoader(dataset=Dataset.SHUTTLE)
x_train, x_test, y_test = dl.get_example_setup(random_state=1)

ce = ConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=BootstrapConformal(resampling_ratio=0.99, n_bootstraps=20, plus=True),
    config=DetectorConfig(alpha=0.1, adjustment=Adjustment.BY, aggregation=Aggregation.MEAN),
)

ce.fit(x_train)
estimates = ce.predict(x_test)

print(f"Empirical FDR: {false_discovery_rate(y=y_test, y_hat=estimates)}")
print(f"Empirical Power: {statistical_power(y=y_test, y_hat=estimates)}")

Output:

Empirical FDR: 0.0
Empirical Power: 1.0

Supported Estimators

The package only supports anomaly estimators suitable for unsupervised one-class classification. Since these detectors are fitted exclusively on normal (non-anomalous) data, parameters like threshold are internally set to the smallest possible values.
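In practice this means any anomalous rows must be filtered out of the training split before calling fit(). A minimal sketch, assuming labels where 0 marks normal data (the helper name and toy arrays are illustrative, not part of unquad):

```python
import numpy as np

def normal_only(x, y, normal_label=0):
    """Keep only the rows labeled as normal for one-class training."""
    x = np.asarray(x)
    y = np.asarray(y)
    return x[y == normal_label]

x = np.array([[0.1], [0.2], [5.0], [0.15]])
y = np.array([0, 0, 1, 0])   # 1 marks an anomaly
x_train = normal_only(x, y)  # drops the anomalous row -> shape (3, 1)
```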

Models that are currently supported include:

  • Angle-Based Outlier Detection (ABOD)
  • Autoencoder (AE)
  • Cook's Distance (CD)
  • Copula-based Outlier Detector (COPOD)
  • Deep Isolation Forest (DIF)
  • Empirical-Cumulative-distribution-based Outlier Detection (ECOD)
  • Gaussian Mixture Model (GMM)
  • Histogram-based Outlier Detection (HBOS)
  • Isolation-based Anomaly Detection using Nearest-Neighbor Ensembles (INNE)
  • Isolation Forest (IForest)
  • Kernel Density Estimation (KDE)
  • k-Nearest Neighbor (kNN)
  • Kernel Principal Component Analysis (KPCA)
  • Linear Model Deviation-based Outlier Detection (LMDD)
  • Local Outlier Factor (LOF)
  • Local Correlation Integral (LOCI)
  • Lightweight Online Detector of Anomalies (LODA)
  • Locally Selective Combination of Parallel Outlier Ensembles (LSCP)
  • GNN-based Anomaly Detection Method (LUNAR)
  • Median Absolute Deviation (MAD)
  • Minimum Covariance Determinant (MCD)
  • One-Class SVM (OCSVM)
  • Principal Component Analysis (PCA)
  • Quasi-Monte Carlo Discrepancy Outlier Detection (QMCD)
  • Rotation-based Outlier Detection (ROD)
  • Subspace Outlier Detection (SOD)
  • Scalable Unsupervised Outlier Detection (SUOD)

Contact

Bug reporting: https://github.com/OliverHennhoefer/unquad/issues