KDD 2021 Tutorial on 

"Toward Explainable Deep Anomaly Detection"

Abstract

Anomaly detection can offer important insights into many safety-critical or commercially significant real-world applications, such as extreme climate event detection, mechanical fault detection, terrorist detection, fraud detection, and malicious URL detection, to name a few. Because of this significance, it has been studied extensively for decades, and numerous shallow methods have been proposed for the task. However, these methods are challenged by various data complexities, such as high dimensionality, data interdependencies, and data heterogeneity. In recent years, deep learning has shown tremendous success in tackling these complexities in a wide range of applications, but popular deep learning techniques are inapplicable to anomaly detection because of some unique characteristics of anomalies, e.g., rarity, heterogeneity, unbounded nature, and the prohibitively high cost of collecting large-scale anomaly data. A large number of studies have therefore been dedicated to deep learning techniques designed specifically for anomaly detection. These studies demonstrate great success in addressing some major challenges on which shallow anomaly detection methods fail in different application contexts.

In this tutorial, we aim to present a comprehensive review of the advances in deep learning-based anomaly detection and explanation. We first introduce the key intuitions, objective functions, underlying assumptions, advantages, and disadvantages of 12 categories of state-of-the-art deep anomaly detection methods. Anomaly explanation is often as important as anomaly detection, especially for deep detection models, which are "black-box" models, so we also introduce a number of principled approaches used to provide anomaly explanations for deep detection models. Deep anomaly detection is significantly less explored than many other data mining tasks, and through this tutorial we aim to actively promote its development in algorithms, theories, and evaluation.

Schedule and Materials

Date and Time: Aug 14, 9am-12pm (GMT+8) [No video recording was made, as ACM policies do not allow it]

[Slides]

Q & A (5 min)

+ Deep learning for feature extraction

+ Learning feature representations of normality

– Generic normality feature learning

∗ Autoencoder-based approaches

∗ Generative adversarial network-based approaches

∗ Predictability modeling approaches

∗ Self-supervised classification approaches

– Anomaly measure-dependent feature learning

∗ Feature learning for distance-based measure

∗ Feature learning for one-class classification measure

∗ Feature learning for clustering-based measure

+ End-to-end anomaly score learning

– Ranking models

– Prior-driven models

– Softmax likelihood models

– End-to-end one-class classification models

+ Unsupervised approach

+ Semi-supervised approach

+ Weakly-supervised approach

Break (10 min)

+ Outlying aspect mining

+ Joint feature selection and anomaly detection

+ Data reconstruction

+ Back-propagation approach

Q & A (5 min)

Presenters

Guansong Pang

Research Fellow

Australian Institute for Machine Learning

University of Adelaide

Charu Aggarwal

Distinguished Research Staff Member

IBM T. J. Watson Research Center


Dr. Guansong Pang obtained his PhD degree in Data Mining at the University of Technology Sydney in 2019. He is a Research Fellow in the Australian Institute for Machine Learning at the University of Adelaide, and an incoming Assistant Professor at Singapore Management University. His research interests lie in data mining, machine learning, and their applications; he has been dedicated to research on anomaly detection for over six years. He has published more than 25 papers, most of them on (deep) anomaly detection, in refereed conferences and journals such as KDD, AAAI, IJCAI, CVPR, ACM MM, ICDM, CIKM, IEEE Transactions on Knowledge and Data Engineering, and Data Mining and Knowledge Discovery. He was one of the key presenters of the KDD17 tutorial on "Non-IID Learning" and the KDD18 tutorial on "Behavior Analytics: Methods and Applications". He has also given a number of oral presentations of his papers at top conferences such as IJCAI16, IJCAI17, CIKM17, KDD18, and KDD19, as well as invited talks at various universities.

Dr. Charu Aggarwal completed his Ph.D. in Operations Research at the Massachusetts Institute of Technology in 1996. He has worked extensively in the field of data mining, with particular interests in data streams, privacy, uncertain data, and social network analysis. He is a recipient of the IEEE ICDM Research Contributions Award (2015) and the ACM SIGKDD Innovation Award (2019), which are the two highest awards for research in the field of data mining. He has served as the general or program co-chair of the IEEE Big Data Conference (2014), the ICDM Conference (2015), the ACM CIKM Conference (2015), and the KDD Conference (2016). He is an editor-in-chief of the ACM Transactions on Knowledge Discovery from Data, and has served as editor-in-chief of ACM SIGKDD Explorations. He is a fellow of the IEEE (2010), the ACM (2013), and SIAM (2015) for "contributions to knowledge discovery and data mining algorithms". He is the sole author of the popular anomaly detection textbook "Outlier Analysis". He has delivered a number of invited keynotes at top conferences such as ECML 06, ASONAM 14, ECML 14, and SIGIR 18, and has been one of the key presenters of several conference tutorials, such as the CIKM13 and SDM13 tutorials on "Outlier Detection in Temporal Data" and the ASONAM13 tutorial on "Outlier Detection in Graph Data".

Key References

Pang, Guansong, et al. Deep learning for anomaly detection: A review. ACM Computing Surveys 54, 2, Article 38 (March 2021), 38 pages. https://doi.org/10.1145/3439950. arXiv preprint.

Aggarwal, Charu. Outlier analysis. Springer (2017). 

Contact

Any questions can be sent to Guansong Pang (pangguansong@gmail.com).