This kind of data is everywhere.
Unsupervised and semi-supervised anomaly detection with data-centric ML
February 08, 2023 In Googleblog
Posted by Jinsung Yoon and Sercan O. Arik, Research Scientists, Google Research, Cloud AI Team
Anomaly detection (AD), the task of distinguishing anomalies from normal data, plays a vital role in many real-world applications, such as detecting faulty products from vision sensors in manufacturing, fraudulent behaviors in financial transactions, or network security threats. Depending on the availability of the type of data — negative (normal) vs. positive (anomalous) and the availability of their labels — the task of AD involves different challenges.
(a) Fully supervised anomaly detection, (b) normal-only anomaly detection, (c, d, e) semi-supervised anomaly detection, (f) unsupervised anomaly detection.
While most previous works were shown to be effective for cases with fully-labeled data (either (a) or (b) in the above figure), such settings are less common in practice because labels are particularly tedious to obtain. In most scenarios users have a limited labeling budget, and sometimes there aren’t even any labeled samples during training. Furthermore, even when labeled data are available, there could be biases in the way samples are labeled, causing distribution differences. Such real-world data challenges limit the achievable accuracy of prior methods in detecting anomalies.
This post covers two of our recent papers on AD, published in Transactions on Machine Learning Research (TMLR), that address the above challenges in unsupervised and semi-supervised settings. Using data-centric approaches, we show state-of-the-art results in both. In “Self-supervised, Refine, Repeat: Improving Unsupervised Anomaly Detection”, we propose a novel unsupervised AD framework that relies on the principles of self-supervised learning without labels and iterative data refinement based on the agreement of one-class classifier (OCC) outputs. In “SPADE: Semi-supervised Anomaly Detection under Distribution Mismatch”, we propose a novel semi-supervised AD framework that yields robust performance even under distribution mismatch with limited labeled samples. ... '
No comments:
Post a Comment