Unraveling the 'Anomaly' in Time Series Anomaly Detection: A Self-supervised Tri-domain Solution (ICDE 2024)
This paper addresses key challenges in time series anomaly detection (TSAD): (1) the scarcity of labels, and (2) the diversity in anomaly types and lengths. Furthermore, it contributes to the ongoing debate on the effectiveness of deep learning models in TSAD, highlighting the problems of flawed benchmarks and ill-posed evaluation metrics. This study stands out as the first to reassess the potential of deep learning in TSAD, employing both rigorously designed datasets (UCR Archive) and evaluation metrics (PA%K and affiliation). (paper)
-
Download the UCR dataset, ready for use. Next, run preprocess_data.py. This script partitions 10% of the training data into a validation set and saves the dataset to ./dataset/ucr_data.pt in the following format:
{'train_data': train_x, 'valid_data': valid_x, 'test_data': test_x, 'test_labels': test_y}
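To inspect the preprocessed data, here is a minimal loading sketch (assuming the file was written with torch.save, as the .pt extension suggests):

```python
import torch

# Load the preprocessed UCR data produced by preprocess_data.py.
data = torch.load('./dataset/ucr_data.pt')

train_x = data['train_data']    # training series (no labels)
valid_x = data['valid_data']    # the 10% of training data held out for validation
test_x  = data['test_data']     # test series
test_y  = data['test_labels']   # point-wise anomaly labels for the test series
```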
-
Simply run train.py to train TriAD over the whole dataset. The results are saved as tri_res.pt (a demo version is provided) and wrapped in a pandas DataFrame.
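To take a quick look at the saved results, here is a minimal sketch (no particular column schema is assumed; inspect the frame for the exact fields):

```python
import torch

# tri_res.pt stores the detection results wrapped in a pandas DataFrame.
res = torch.load('tri_res.pt')

print(type(res))   # expected: a pandas DataFrame
print(res.head())  # preview the per-dataset results
```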
-
To get a summary of both tri-window and single-window detection accuracy (i.e., how many of the 250 datasets are successfully detected by the tri/single window), simply run single_window_selection.py. The results are saved as merlin_win.pt, from which discord_data_prep.py generates the Merlin-readable files. By restricting our focus to the single window, we force Merlin to scan around that window to find anomalies.
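If you want to check the window-selection output before generating the Merlin files, here is a minimal inspection sketch (the internal layout of merlin_win.pt is an assumption; print it before relying on any particular structure):

```python
import torch

# merlin_win.pt holds the per-dataset window-selection results
# written by single_window_selection.py.
win = torch.load('merlin_win.pt')

# Inspect the container first; the layout is not guaranteed here.
print(type(win))
print(win.head() if hasattr(win, 'head') else win)
```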
-
To get a summary of the detection results on the shortest 62 datasets, simply run shortest_62.py.
-
Visualizations of the detection results and point-wise metrics are provided in the directory ./eval_demo. UCR 025 and UCR 150 are used as demo examples; each test_xxx.txt file contains the Merlin search results, where the columns represent search_length, start_index, end_index, and discord_distance (a parsing sketch follows the sample output below). Install the affiliation metrics, and run convert_pw.py:
python convert_pw.py 150
which will produce output like the following:
Dataset: UCR 150
window magic correction !! UCR 150
Traditional Metrics:
  F1 Score: 0.3947
PA:
  F1 Score: 0.8619
PA%K - AUC:
  Precision: 0.5859
  Recall: 0.5442
  F1 Score: 0.5466
Affinity:
  Precision: 0.9922
  Recall: 0.9954
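To work with a Merlin results file programmatically, here is a minimal parsing sketch (the whitespace delimiter, the lack of a header row, and the demo filename are assumptions; the column order follows the description above):

```python
import pandas as pd

# Parse one of the test_xxx.txt Merlin result files from ./eval_demo
# (adjust the filename to the demo you are looking at).
cols = ['search_length', 'start_index', 'end_index', 'discord_distance']
discords = pd.read_csv('./eval_demo/test_150.txt', sep=r'\s+', header=None, names=cols)

# The discord with the largest distance is the strongest anomaly candidate.
print(discords.loc[discords['discord_distance'].idxmax()])
```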
*Please note that the experimental outcomes might vary between runs due to the randomness introduced during the augmentation process.
You can access several widely used TSAD datasets from Data Smith. Additionally, we offer a comprehensive visualization of them, including the UCR dataset. The preprocessed version of the UCR dataset used in this study is available for direct download here.
You may also be interested in this blog, where we discuss why the popular evaluation metric, point adjustment (PA), can be tricky. We also provide a detailed explanation, along with calculation examples, of the two reliable evaluation metrics used in this study.
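For a quick feel for the issue, here is a minimal sketch of the point-adjustment step itself (standard definition: if any point inside a ground-truth anomaly segment is flagged, the whole segment counts as detected; this is our own illustration, not code from the repository):

```python
import numpy as np

def point_adjust(pred: np.ndarray, label: np.ndarray) -> np.ndarray:
    """Mark an entire ground-truth anomaly segment as detected
    if at least one of its points is predicted anomalous."""
    adjusted = pred.copy()
    in_segment, start = False, 0
    for i, l in enumerate(label):
        if l == 1 and not in_segment:
            in_segment, start = True, i
        if in_segment and (l == 0 or i == len(label) - 1):
            end = i if l == 0 else i + 1
            if adjusted[start:end].any():
                adjusted[start:end] = 1  # credit the whole segment
            in_segment = False
    return adjusted

# A single lucky hit inside a 6-point anomaly segment...
label = np.array([0, 0, 1, 1, 1, 1, 1, 1, 0, 0])
pred  = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0])
print(point_adjust(pred, label))  # ...becomes 6 true positives after PA
```

This is exactly why PA-based F1 scores can look deceptively high: one correct point per segment is enough to achieve a perfect segment-level recall.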