durgeshsamariya/awesome-outlier-detection-resources

Awesome Outlier Detection Resources

An awesome curated list of outlier (a.k.a. anomaly) detection papers.


Outlier Detection

In data mining, anomaly detection (also outlier detection) is the identification of rare items, events or observations which raise suspicions by differing significantly from the majority of the data. - Wikipedia
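
As a rough illustration of "differing significantly from the majority of the data", here is a minimal sketch that flags values far from the median using a modified z-score. The function name `mad_outliers`, the threshold of 3.5, and the sample readings are assumptions made for this example; it is not taken from any of the papers listed below.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Hypothetical helper for illustration only. The score is based on the
    median absolute deviation (MAD), which is less affected by the
    outliers themselves than the mean and standard deviation.
    """
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # At least half the values equal the median, so this score
        # is undefined; report nothing rather than divide by zero.
        return []
    # 0.6745 rescales the MAD so the score is comparable to a z-score
    # when the bulk of the data is roughly normal.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Assumed sample data: seven similar readings and one that stands apart.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 25.0, 10.1]
print(mad_outliers(readings))  # prints [6]
```

This univariate rule is only a starting point; the density-, distance-, and isolation-based methods listed below target multivariate data where such a simple score is not enough.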


1. Books

Outlier Analysis by Charu C. Aggarwal [URL].

2. Research Papers

2.1. Survey Papers

| Title | Publication Venue | Year | Reference | URL |
|---|---|---|---|---|
| Novelty detection: a review—part 1: statistical approaches | Elsevier | 2003 | [1] | [URL] |
| Novelty detection: a review—part 2: neural network based approaches | Elsevier | 2003 | [2] | [URL] |
| A Survey of Outlier Detection Methodologies | Springer | 2004 | [3] | [URL] [PDF] |
| Anomaly detection: A survey | ACM | 2009 | [4] | [PDF] |
| A Comprehensive Survey of Data Mining-based Fraud Detection Research | ArXiv Preprint | 2010 | [15] | [PDF] |
| A survey on unsupervised outlier detection in high‐dimensional numerical data | Wiley Online Library | 2012 | [16] | [URL] |
| Survey on Anomaly Detection using Data Mining Techniques | Procedia Computer Science | 2015 | [14] | [URL] |
| Graph based anomaly detection and description: a survey | DMKD | 2015 | [47] | [URL] [PDF] |
| A comparative evaluation of outlier detection algorithms: Experiments and analyses | Pattern Recognition | 2018 | [48] | [PDF] |
| Progress in outlier detection techniques: A survey | IEEE Access | 2019 | [46] | [URL] |
| Deep learning for anomaly detection: A survey | ArXiv | 2019 | [49] | [PDF] |
| Deep learning for anomaly detection: A review | ArXiv | 2020 | [50] | [PDF] |

2.2. State-of-the-Art Papers

| Title | Publication Venue | Year | Reference | URL |
|---|---|---|---|---|
| LOF: Identifying Density-Based Local Outliers | ACM SIGMOD Record | 2000 | [6] | [PDF] |
| Efficient algorithms for mining outliers from large data sets | ACM SIGMOD Record | 2000 | [17] | [PDF] |
| Fast outlier detection in high dimensional spaces | PKDD | 2002 | [33] | [PDF] |
| Isolation Forest | IEEE | 2008 | [5] | [URL] |

2.3. Density Based Outlier Detection Methods

| Title | Publication Venue | Year | Reference | URL |
|---|---|---|---|---|
| OPTICS-OF: Identifying Local Outliers | Springer | 1999 | [9] | [URL] |
| LOF: Identifying Density-Based Local Outliers | ACM SIGMOD Record | 2000 | [6] | [PDF] |
| Enhancing effectiveness of outlier detections for low density patterns (COF) | PAKDD | 2002 | [55] | [URL] |
| LOCI: fast outlier detection using the local correlation integral | IEEE | 2003 | [7] | [URL] |
| RDF: A density-based outlier detection method using vertical data representation | ICDM | 2004 | [57] | [URL] |
| LoOP: local outlier probabilities | CIKM | 2009 | [58] | [PDF] |
| Resolution-based outlier factor: detecting the top-n most outlying data points in engineering data (ROF) | KAIS | 2009 | [59] | [URL] |
| FastLOF: An expectation-maximization based local outlier detection algorithm | ICPR | 2012 | [60] | [URL] |
| Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection (SimplifiedLOF) | DMKD | 2014 | [56] | [URL] |
| LiNearN: A new approach to nearest neighbour density estimator | Pattern Recognition | 2014 | [52] | [URL] |
| Hierarchical density estimates for data clustering, visualization, and outlier detection (GLOSH) | TKDD | 2015 | [61] | [URL] |
| Revisiting Attribute Independence Assumption in Probabilistic Unsupervised Anomaly Detection | Springer | 2016 | [8] | [URL] |
| Improved histogram-based anomaly detector with the extended principal component features | arXiv preprint | 2019 | [51] | [PDF] |

2.4. Distance Based Outlier Detection Methods

| Title | Publication Venue | Year | Reference | URL |
|---|---|---|---|---|
| Efficient algorithms for mining outliers from large data sets | ACM SIGMOD Record | 2000 | [17] | [PDF] |
| Fast outlier detection in high dimensional spaces | PKDD | 2002 | [33] | [PDF] |
| A New Local Distance-Based Outlier Detection Approach for Scattered Real-World Data | PAKDD | 2009 | [30] | [URL] |
| Rapid Distance-Based Outlier Detection via Sampling | NIPS | 2013 | [32] | [PDF] |
| Distance-based Outlier Detection in Data Streams | VLDB | 2016 | [31] | [PDF] |

2.5. Clustering Based Outlier Detection Methods

| Title | Publication Venue | Year | Reference | URL |
|---|---|---|---|---|
| Clustering-Based Outlier Detection Method | FSKD | 2008 | [35] | [URL] |
| Efficient Clustering-Based Outlier Detection Algorithm for Dynamic Data Stream | FSKD | 2008 | [36] | [URL] |
| Cluster-based outlier detection | Annals of Operations Research | 2009 | [34] | [PDF] |
| Framework of Clustering-Based Outlier Detection | FSKD | 2009 | [40] | [URL] |
| An Outlier Detection Method Based on Clustering | EAIT | 2011 | [37] | [PDF] |
| A Minimum Spanning Tree-Inspired Clustering-Based Outlier Detection Technique | ICDM | 2012 | [39] | [PDF] |
| Cluster Based Outlier Detection Algorithm for Healthcare Data | Procedia Computer Science | 2015 | [38] | [PDF] |

2.6. Isolation Based Outlier Detection Methods

| Title | Publication Venue | Year | Reference | URL |
|---|---|---|---|---|
| Isolation Forest | IEEE | 2008 | [5] | [URL] |
| On Detecting Clustered Anomalies Using SCiForest | Springer | 2010 | [12] | [PDF] |
| Isolation-Based Anomaly Detection | ACM | 2012 | [10] | [PDF] |
| Improving iForest with Relative Mass | Springer | 2014 | [11] | [URL] |
| Efficient anomaly detection by isolation using nearest neighbour ensemble | ICDMW | 2014 | [42] | [URL] |
| LeSiNN: Detecting anomalies by identifying Least Similar Nearest Neighbours | IEEE | 2015 | [13] | [URL] |
| Isolation‐based anomaly detection using nearest‐neighbor ensembles | Computational Intelligence | 2018 | [45] | [URL] |
| Anomaly Detection Technique Robust to Units and Scales of Measurement | PAKDD | 2018 | [53] | [URL] |
| usfAD: a robust anomaly detector based on unsupervised stochastic forest | International Journal of Machine Learning and Cybernetics | 2020 | [54] | [URL] |

2.7. Subspace Outlier Detection Methods

| Title | Publication Venue | Year | Reference | URL |
|---|---|---|---|---|
| Outlier Detection in Axis-Parallel Subspaces of High Dimensional Data | Springer | 2009 | [28] | [URL] [PDF] |
| Local Subspace Based Outlier Detection | Springer | 2009 | [24] | [URL] |
| HiCS: High Contrast Subspaces for Density-Based Outlier Ranking | ACM | 2012 | [21] | [URL] |
| Outlier Detection in Arbitrarily Oriented Subspaces | ICDM | 2012 | [26] | [PDF] |
| An angle-based subspace anomaly detection approach to high-dimensional data: With an application to industrial fault detection | Elsevier | 2015 | [27] | [URL] [PDF] |
| ZERO++: Harnessing the Power of Zero Appearances to Detect Anomalies in Large-Scale Data Sets | JAIR | 2016 | [23] | [URL] [PDF] |
| Subspace Outlier Detection in Linear Time with Randomized Hashing | IEEE | 2016 | [25] | [URL] [PDF] |
| Hiding outliers in high-dimensional data spaces | Springer | 2017 | [22] | [URL] |

2.8. Ensemble Based Outlier Detection Methods

| Title | Publication Venue | Year | Reference | URL |
|---|---|---|---|---|
| LODA: Lightweight on-line detector of anomalies | Machine Learning | 2016 | [62] | [PDF] |
| LSCP: Locally selective combination in parallel outlier ensembles | SIAM | 2019 | [63] | [PDF] |
| DCSO: dynamic combination of detector scores for outlier ensembles | ArXiv | 2019 | [64] | [PDF] |

2.9. Deep Learning Outlier Detection Methods

| Title | Publication Venue | Year | Reference | URL |
|---|---|---|---|---|
| DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning | CCS | 2017 | [65] | [PDF] |

2.10. Graph Outlier Detection

| Title | Publication Venue | Year | Reference | URL |
|---|---|---|---|---|
| Graph based anomaly detection and description: a survey | DMKD | 2015 | [67] | [PDF] |
| On Using Classification Datasets to Evaluate Graph Outlier Detection: Peculiar Observations and New Insights | ArXiv | 2020 | [66] | [PDF] |

2.11. Outlying Aspect Mining

| Title | Publication Venue | Year | Reference | URL |
|---|---|---|---|---|
| HOS-Miner: a system for detecting outlying subspaces of high-dimensional data | VLDB | 2004 | [20] | [PDF] |
| Mining outlying aspects on numeric data | Springer | 2015 | [19] | [URL] |
| Scalable Outlying-Inlying Aspects Discovery via Feature Ranking | PAKDD | 2015 | [29] | [URL] |
| Discovering outlying aspects in large datasets | Springer | 2016 | [18] | [PDF] |
| A new effective and efficient measure for outlying aspect mining | WISE | 2020 | [41] | [URL] |

3. Tutorials

| Title | Publication Venue | Year | Reference | URL |
|---|---|---|---|---|
| Outlier detection techniques | KDD | 2010 | [44] | [URL] |
| Which Outlier Detector Should I use? | ICDE | 2018 | [43] | [URL] |

4. Datasets

ODDS - Outlier Detection DataSets

5. Tools

| Tool | Language | URL |
|---|---|---|
| ELKI | Java | [URL] |
| PyOD | Python | [URL] |

References

[1] Markou, M., & Singh, S. (2003). Novelty detection: a review—part 1: statistical approaches. Signal processing, 83(12), 2481-2497.
[2] Markou, M., & Singh, S. (2003). Novelty detection: a review—part 2: neural network based approaches. Signal processing, 83(12), 2499-2521.
[3] Hodge, V., & Austin, J. (2004). A survey of outlier detection methodologies. Artificial intelligence review, 22(2), 85-126.
[4] Chandola, V., Banerjee, A., & Kumar, V. (2009). Anomaly detection: A survey. ACM computing surveys (CSUR), 41(3), 1-58.
[5] Liu, F. T., Ting, K. M., & Zhou, Z. H. (2008, December). Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining (pp. 413-422). IEEE.
[6] Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (2000, May). LOF: identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data (pp. 93-104).
[7] Papadimitriou, S., Kitagawa, H., Gibbons, P. B., & Faloutsos, C. (2003, March). LOCI: Fast outlier detection using the local correlation integral. In Proceedings 19th international conference on data engineering (Cat. No. 03CH37405) (pp. 315-326). IEEE.
[8] Aryal, S., Ting, K. M., & Haffari, G. (2016, April). Revisiting attribute independence assumption in probabilistic unsupervised anomaly detection. In Pacific-Asia Workshop on Intelligence and Security Informatics (pp. 73-86). Springer, Cham.
[9] Breunig, M. M., Kriegel, H. P., Ng, R. T., & Sander, J. (1999, September). OPTICS-OF: Identifying local outliers. In European Conference on Principles of Data Mining and Knowledge Discovery (pp. 262-270). Springer, Berlin, Heidelberg.
[10] Liu, F. T., Ting, K. M., & Zhou, Z. H. (2012). Isolation-based anomaly detection. ACM Transactions on Knowledge Discovery from Data (TKDD), 6(1), 1-39.
[11] Aryal, S., Ting, K. M., Wells, J. R., & Washio, T. (2014, May). Improving iForest with relative mass. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 510-521). Springer, Cham.
[12] Liu, F. T., Ting, K. M., & Zhou, Z. H. (2010, September). On detecting clustered anomalies using SCiForest. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases (pp. 274-290). Springer, Berlin, Heidelberg.
[13] Pang, G., Ting, K. M., & Albrecht, D. (2015, November). LeSiNN: Detecting anomalies by identifying least similar nearest neighbours. In 2015 IEEE international conference on data mining workshop (ICDMW) (pp. 623-630). IEEE.
[14] Agrawal, S., & Agrawal, J. (2015). Survey on anomaly detection using data mining techniques. Procedia Computer Science, 60, 708-713.
[15] Phua, C., Lee, V., Smith, K., & Gayler, R. (2010). A comprehensive survey of data mining-based fraud detection research. arXiv preprint arXiv:1009.6119.
[16] Zimek, A., Schubert, E., & Kriegel, H. P. (2012). A survey on unsupervised outlier detection in high‐dimensional numerical data. Statistical Analysis and Data Mining: The ASA Data Science Journal, 5(5), 363-387.
[17] Ramaswamy, S., Rastogi, R., & Shim, K. (2000, May). Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data (pp. 427-438).
[18] Vinh, N. X., Chan, J., Romano, S., Bailey, J., Leckie, C., Ramamohanarao, K., & Pei, J. (2016). Discovering outlying aspects in large datasets. Data mining and knowledge discovery, 30(6), 1520-1555.
[19] Duan, L., Tang, G., Pei, J., Bailey, J., Campbell, A., & Tang, C. (2015). Mining outlying aspects on numeric data. Data Mining and Knowledge Discovery, 29(5), 1116-1151.
[20] Zhang, J., Lou, M., Ling, T. W., & Wang, H. (2004). HOS-miner: A system for detecting outlying subspaces of high-dimensional data. In Proceedings of the 30th International Conference on Very Large Data Bases (VLDB'04) (pp. 1265-1268). Morgan Kaufmann Publishers Inc.
[21] Keller, F., Müller, E., & Böhm, K. (2012, April). HiCS: High contrast subspaces for density-based outlier ranking. In 2012 IEEE 28th international conference on data engineering (pp. 1037-1048). IEEE.
[22] Steinbuss, G., & Böhm, K. (2017). Hiding outliers in high-dimensional data spaces. International Journal of Data Science and Analytics, 4(3), 173-189.
[23] Pang, G., Ting, K. M., Albrecht, D., & Jin, H. (2016). ZERO++: Harnessing the power of zero appearances to detect anomalies in large-scale data sets. Journal of Artificial Intelligence Research, 57, 593-620.
[24] Agrawal, A. (2009, August). Local subspace based outlier detection. In International Conference on Contemporary Computing (pp. 149-157). Springer, Berlin, Heidelberg.
[25] Sathe, S., & Aggarwal, C. C. (2016, December). Subspace outlier detection in linear time with randomized hashing. In 2016 IEEE 16th International Conference on Data Mining (ICDM) (pp. 459-468). IEEE.
[26] Kriegel, H. P., Kröger, P., Schubert, E., & Zimek, A. (2012, December). Outlier detection in arbitrarily oriented subspaces. In 2012 IEEE 12th international conference on data mining (pp. 379-388). IEEE.
[27] Zhang, L., Lin, J., & Karim, R. (2015). An angle-based subspace anomaly detection approach to high-dimensional data: With an application to industrial fault detection. Reliability Engineering & System Safety, 142, 482-497.
[28] Kriegel, H. P., Kröger, P., Schubert, E., & Zimek, A. (2009, April). Outlier detection in axis-parallel subspaces of high dimensional data. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 831-838). Springer, Berlin, Heidelberg.
[29] Vinh, N. X., Chan, J., Bailey, J., Leckie, C., Ramamohanarao, K., & Pei, J. (2015, May). Scalable outlying-inlying aspects discovery via feature ranking. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 422-434). Springer, Cham.
[30] Zhang, K., Hutter, M., & Jin, H. (2009, April). A new local distance-based outlier detection approach for scattered real-world data. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 813-822). Springer, Berlin, Heidelberg.
[31] Tran, L., Fan, L., & Shahabi, C. (2016). Distance-based outlier detection in data streams. Proceedings of the VLDB Endowment, 9(12), 1089-1100.
[32] Sugiyama, M., & Borgwardt, K. (2013). Rapid distance-based outlier detection via sampling. In Advances in Neural Information Processing Systems (pp. 467-475).
[33] Angiulli, F., & Pizzuti, C. (2002, August). Fast outlier detection in high dimensional spaces. In European conference on principles of data mining and knowledge discovery (pp. 15-27). Springer, Berlin, Heidelberg.
[34] Duan, L., Xu, L., Liu, Y., & Lee, J. (2009). Cluster-based outlier detection. Annals of Operations Research, 168(1), 151-168.
[35] Jiang, S. Y., & An, Q. B. (2008, October). Clustering-based outlier detection method. In 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery (Vol. 2, pp. 429-433). IEEE.
[36] Elahi, M., Li, K., Nisar, W., Lv, X., & Wang, H. (2008, October). Efficient clustering-based outlier detection algorithm for dynamic data stream. In 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery (Vol. 5, pp. 298-304). IEEE.
[37] Pamula, R., Deka, J. K., & Nandi, S. (2011, February). An outlier detection method based on clustering. In 2011 Second International Conference on Emerging Applications of Information Technology (pp. 253-256). IEEE.
[38] Christy, A., Gandhi, G. M., & Vaithyasubramanian, S. (2015). Cluster based outlier detection algorithm for healthcare data. Procedia Computer Science, 50, 209-215.
[39] Wang, X., Wang, X. L., & Wilkes, D. M. (2012, July). A minimum spanning tree-inspired clustering-based outlier detection technique. In Industrial Conference on Data Mining (pp. 209-223). Springer, Berlin, Heidelberg.
[40] Jiang, S. Y., & Yang, A. M. (2009, August). Framework of clustering-based outlier detection. In 2009 Sixth International Conference on Fuzzy Systems and Knowledge Discovery (Vol. 1, pp. 475-479). IEEE.
[41] Samariya, D., Aryal, S., Ting, K. M., & Ma, J. (2020, October). A new effective and efficient measure for outlying aspect mining. In International Conference on Web Information Systems Engineering (pp. 463-474). Springer, Cham.
[42] Bandaragoda, T. R., Ting, K. M., Albrecht, D., Liu, F. T., & Wells, J. R. (2014). Efficient anomaly detection by isolation using nearest neighbour ensemble. In 2014 IEEE International Conference on Data Mining Workshop (pp. 698-705). IEEE. doi: 10.1109/ICDMW.2014.70.
[43] Ting, K. M., Aryal, S., & Washio, T. (2018, November). Which Outlier Detector Should I use? In 2018 IEEE International Conference on Data Mining (ICDM) (pp. 8-8). IEEE.
[44] Kriegel, H. P., Kröger, P., & Zimek, A. (2010). Outlier detection techniques. Tutorial at KDD, 10, 1-76.
[45] Bandaragoda, T. R., Ting, K. M., Albrecht, D., Liu, F. T., Zhu, Y., & Wells, J. R. (2018). Isolation‐based anomaly detection using nearest‐neighbor ensembles. Computational Intelligence, 34(4), 968-998.
[46] Wang, H., Bah, M. J., & Hammad, M. (2019). Progress in outlier detection techniques: A survey. IEEE Access, 7, 107964-108000.
[47] Akoglu, L., Tong, H., & Koutra, D. (2015). Graph based anomaly detection and description: a survey. Data mining and knowledge discovery, 29(3), 626-688.
[48] Domingues, R., Filippone, M., Michiardi, P., & Zouaoui, J. (2018). A comparative evaluation of outlier detection algorithms: Experiments and analyses. Pattern Recognition, 74, 406-421.
[49] Chalapathy, R., & Chawla, S. (2019). Deep learning for anomaly detection: A survey. arXiv preprint arXiv:1901.03407.
[50] Pang, G., Shen, C., Cao, L., & Hengel, A. V. D. (2020). Deep learning for anomaly detection: A review. arXiv preprint arXiv:2007.02500.
[51] Aryal, S., Baniya, A. A., & Santosh, K. C. (2019). Improved histogram-based anomaly detector with the extended principal component features. arXiv preprint arXiv:1909.12702.
[52] Wells, J. R., Ting, K. M., & Washio, T. (2014). LiNearN: A new approach to nearest neighbour density estimator. Pattern Recognition, 47(8), 2702-2720.
[53] Aryal, S. (2018, June). Anomaly detection technique robust to units and scales of measurement. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 589-601). Springer, Cham.
[54] Aryal, S., Santosh, K. C., & Dazeley, R. (2020). usfAD: a robust anomaly detector based on unsupervised stochastic forest. International Journal of Machine Learning and Cybernetics, 1-14.
[55] Tang, J., Chen, Z., Fu, A. W. C., & Cheung, D. W. (2002, May). Enhancing effectiveness of outlier detections for low density patterns. In Pacific-Asia Conference on Knowledge Discovery and Data Mining (pp. 535-548). Springer, Berlin, Heidelberg.
[56] Schubert, E., Zimek, A., & Kriegel, H. P. (2014). Local outlier detection reconsidered: a generalized view on locality with applications to spatial, video, and network outlier detection. Data mining and knowledge discovery, 28(1), 190-237.
[57] Ren, D., Wang, B., & Perrizo, W. (2004, November). RDF: A density-based outlier detection method using vertical data representation. In Fourth IEEE International Conference on Data Mining (ICDM'04) (pp. 503-506). IEEE.
[58] Kriegel, H. P., Kröger, P., Schubert, E., & Zimek, A. (2009, November). LoOP: local outlier probabilities. In Proceedings of the 18th ACM conference on Information and knowledge management (pp. 1649-1652).
[59] Fan, H., Zaïane, O. R., Foss, A., & Wu, J. (2009). Resolution-based outlier factor: detecting the top-n most outlying data points in engineering data. Knowledge and Information Systems, 19(1), 31-51.
[60] Goldstein, M. (2012, November). FastLOF: An expectation-maximization based local outlier detection algorithm. In Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012) (pp. 2282-2285). IEEE.
[61] Campello, R. J., Moulavi, D., Zimek, A., & Sander, J. (2015). Hierarchical density estimates for data clustering, visualization, and outlier detection. ACM Transactions on Knowledge Discovery from Data (TKDD), 10(1), 1-51.
[62] Pevný, T. (2016). LODA: Lightweight on-line detector of anomalies. Machine Learning, 102(2), 275-304.
[63] Zhao, Y., Nasrullah, Z., Hryniewicki, M. K., & Li, Z. (2019, May). LSCP: Locally selective combination in parallel outlier ensembles. In Proceedings of the 2019 SIAM International Conference on Data Mining (pp. 585-593). Society for Industrial and Applied Mathematics.
[64] Zhao, Y., & Hryniewicki, M. K. (2019). DCSO: dynamic combination of detector scores for outlier ensembles. arXiv preprint arXiv:1911.10418.
[65] Du, M., Li, F., Zheng, G., & Srikumar, V. (2017, October). DeepLog: Anomaly detection and diagnosis from system logs through deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security (pp. 1285-1298).
[66] Zhao, L., & Akoglu, L. (2020). On Using Classification Datasets to Evaluate Graph Outlier Detection: Peculiar Observations and New Insights. arXiv preprint arXiv:2012.12931.
[67] Akoglu, L., Tong, H., & Koutra, D. (2015). Graph based anomaly detection and description: a survey. Data mining and knowledge discovery, 29(3), 626-688.

More to come...

More items will be added to the repository. Please feel free to suggest other key resources by opening an issue, submitting a pull request, or emailing me at samariya.durgesh@gmail.com. Enjoy reading!

Last updated on January 2, 2021