15. Anomaly detection

Author: 玄语梨落 | Published: 2020-09-04 21:48

Anomaly detection

Problem motivation

Gaussian distribution

Gaussian distribution: Say x\in R. If x is distributed Gaussian with mean \mu and variance \sigma^2, then

x\sim N(\mu,\sigma^2)

P(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)

Parameter estimation:

\mu = \frac{1}{m}\sum\limits_{i=1}^mx^{(i)}
\sigma^2 = \frac{1}{m}\sum\limits_{i=1}^m(x^{(i)}-\mu)^2

Some formulations use m-1 instead of m in the denominator of \sigma^2; in practice, whether you use m or m-1 makes very little difference.
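A minimal sketch of these estimates and the density above, assuming a 1-D NumPy array of training values (the array name and data below are illustrative):

```python
import numpy as np

def fit_gaussian(x):
    """Estimate mu and sigma^2 from a 1-D array of examples."""
    mu = x.mean()
    sigma2 = ((x - mu) ** 2).mean()   # divide by m; using m-1 changes very little
    return mu, sigma2

def gaussian_pdf(x, mu, sigma2):
    """P(x; mu, sigma^2) for a scalar or array x."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

# Illustrative data: 1000 draws from N(5, 2^2)
x_train = np.random.randn(1000) * 2 + 5
mu, sigma2 = fit_gaussian(x_train)
print(gaussian_pdf(5.0, mu, sigma2))  # density is highest near the mean
```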

Algorithm

Density estimation

\begin{aligned} P(x) & = P(x_1;\mu_1,\sigma_1^2)P(x_2;\mu_2,\sigma_2^2)...P(x_n;\mu_n,\sigma_n^2) \\ & =\prod_{j=1}^nP(x_j;\mu_j,\sigma_j^2) \end{aligned}

Anomaly detection algorithm

  1. Choose features x_i that you think might be indicative of anomalous examples.
  2. Fit parameters \mu_1,...,\mu_n,\sigma_1^2,...,\sigma_n^2
  3. Given new example x, compute p(x):
    \begin{aligned} P(x) & = P(x_1;\mu_1,\sigma_1^2)P(x_2;\mu_2,\sigma_2^2)...P(x_n;\mu_n,\sigma_n^2) \\ & =\prod_{j=1}^nP(x_j;\mu_j,\sigma_j^2) \end{aligned}
    Anomaly if p(x)\le \epsilon. (A code sketch of these three steps follows.)
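A minimal sketch of the algorithm, assuming a training matrix X of shape (m, n) containing only normal examples (the names and the value of \epsilon below are illustrative):

```python
import numpy as np

def fit(X):
    """Step 2: fit mu_j and sigma_j^2 for every feature."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)          # mean of (x_j - mu_j)^2
    return mu, sigma2

def p(X, mu, sigma2):
    """Step 3: p(x) as the product of the per-feature Gaussians."""
    densities = np.exp(-(X - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    return densities.prod(axis=1)

def is_anomaly(X, mu, sigma2, epsilon=1e-3):
    """Flag an example as anomalous when p(x) <= epsilon."""
    return p(X, mu, sigma2) <= epsilon

# Illustrative use: fit on normal data, score a new example
X_train = np.random.randn(500, 2)
mu, sigma2 = fit(X_train)
print(is_anomaly(np.array([[4.0, -4.0]]), mu, sigma2))  # far from the mean -> True
```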

Developing and evaluating an anomaly detection system

When developing a learning algorithm (choosing features, etc.), making decisions is much easier if we have a way of evaluating the learning algorithm.

Assume we have some labeled data of anomalous and non-anomalous examples, which we split as follows:

  • Training set (normal examples only)
  • Cross validation set (labeled examples)
  • Test set (labeled examples)

The cross validation set can also be used to choose the parameter \epsilon (for example, by picking the value that gives the best F1 score on that set).
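A sketch of choosing \epsilon by scanning candidate thresholds and keeping the one with the best F1 score, assuming p_cv holds p(x) for the cross validation examples and y_cv holds their 0/1 labels (1 = anomaly); both names are illustrative:

```python
import numpy as np

def select_epsilon(p_cv, y_cv):
    """Try thresholds between min and max of p(x) on the CV set; keep the best F1."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_cv.min(), p_cv.max(), 1000):
        preds = (p_cv <= eps).astype(int)          # 1 = predicted anomaly
        tp = np.sum((preds == 1) & (y_cv == 1))
        fp = np.sum((preds == 1) & (y_cv == 0))
        fn = np.sum((preds == 0) & (y_cv == 1))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Usage: best_eps, best_f1 = select_epsilon(p_cv, y_cv); then flag anomalies with p(x) <= best_eps
```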

Anomaly detection vs. supervised learning

| Anomaly detection | Supervised learning |
| --- | --- |
| Very small number of positive examples; large number of negative examples | Large number of positive and negative examples |
| Hard for any algorithm to learn from positive examples what the anomalies look like; future anomalies may look nothing like any of the anomalous examples seen so far | Enough positive examples for the algorithm to get a sense of what positive examples are like; future positive examples are likely to be similar to the ones in the training set |

Choosing what features to use

Non-Gaussian features: transform the data (e.g. \log(x+c), x^{1/2}, x^{1/3}) so that it looks more Gaussian; a short sketch follows.
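A sketch of the kind of transformation meant here; the feature below is illustrative, and which transform to use is a judgment call made by inspecting a histogram of the transformed values:

```python
import numpy as np

# Illustrative skewed, non-negative feature (e.g. a latency measurement)
x = np.random.exponential(scale=2.0, size=1000)

# Candidate transforms; pick whichever makes the histogram look most bell-shaped
x_log  = np.log(x + 1)     # log(x + c)
x_sqrt = np.sqrt(x)        # x^(1/2)
x_cube = x ** (1.0 / 3.0)  # x^(1/3)
```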

Error analysis for anomaly detection

  • Most common problem: p(x) is comparable (say, both large) for normal and anomalous examples.
    In that case, look at the anomalous example and try to create a new feature that distinguishes it.
  • Choose features that might take on unusually large or small values in the event of an anomaly; see the sketch after this list.
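As an illustration of creating such a feature (the names below are hypothetical), a ratio of two existing features can take on an unusually large value for an anomalous example even when each feature on its own looks normal:

```python
import numpy as np

# Hypothetical monitoring features for a machine
cpu_load = np.random.rand(1000) * 0.8 + 0.1
network_traffic = cpu_load * (1 + 0.1 * np.random.randn(1000))  # normally tracks CPU load

# New feature: unusually large when CPU is busy but traffic is low
x_new = cpu_load / network_traffic
```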

Multivariate Gaussian distribution

  • x\in R^n. Don't model p(x_1),p(x_2),..., etc. separately.
  • Model p(x) all in one go.
  • Parameters: \mu\in R^n, \Sigma\in R^{n\times n}

p(x;\mu,\Sigma) = \frac{1}{(2\pi)^{n/2}\det(\Sigma)^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)

The lecture video includes plots showing what the multivariate Gaussian looks like for different values of \mu and \Sigma.

Anomaly detection using the multivariate Gaussian distribution

p(x;\mu,\Sigma) = \frac{1}{(2\pi)^{n/2}\det(\Sigma)^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)

\begin{aligned} \mu & = \frac{1}{m}\sum_{i=1}^mx^{(i)} \\ \Sigma & = \frac{1}{m}\sum_{i=1}^m(x^{(i)}-\mu)(x^{(i)}-\mu)^T \end{aligned}
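A sketch of anomaly detection with the multivariate Gaussian, assuming X has shape (m, n) with m > n; scipy's multivariate_normal is used for the density here, though it could also be computed directly from the formula above (the data and threshold are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def fit_multivariate(X):
    """Estimate mu (n,) and Sigma (n, n) from the training set."""
    mu = X.mean(axis=0)
    diff = X - mu
    Sigma = diff.T @ diff / X.shape[0]   # (1/m) * sum (x - mu)(x - mu)^T
    return mu, Sigma

# Illustrative correlated training data
X_train = np.random.randn(500, 2) @ np.array([[1.0, 0.8], [0.0, 0.6]])
mu, Sigma = fit_multivariate(X_train)

p = multivariate_normal(mean=mu, cov=Sigma).pdf(X_train)  # p(x) for each example
anomalies = p <= 1e-3                                     # flag as in the univariate case
```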

Original model vs. Multivariate Gaussian

original model:

  • manually create features to capture anomalies where x_1,x_2 take unusual combinations of values.
  • computationally cheaper
  • ok even if m is small

multivariate Gaussian:

  • automatically captures correlations between features
  • computationally more expensive
  • must have m\ge n, or else \Sigma is non-invertible
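In fact, the original model is the special case of the multivariate Gaussian in which \Sigma is constrained to be diagonal, i.e. the features are modeled as uncorrelated:

\Sigma = \begin{pmatrix} \sigma_1^2 & & \\ & \ddots & \\ & & \sigma_n^2 \end{pmatrix} \Rightarrow p(x;\mu,\Sigma) = \prod_{j=1}^n P(x_j;\mu_j,\sigma_j^2)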
