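Using a cancer example
A specific type of cancer occurs in 1% of the population.
A test for this cancer comes back positive 90% of the time when the person has the cancer. This is usually called the sensitivity.
But the test sometimes comes back positive even when the person does not have cancer. Suppose it comes back negative 90% of the time when the person does not have the cancer. This is usually called the specificity.
Now consider the following question:
You have no other symptoms, you take the test, and the result is positive.
What do you think is the probability that you have this specific type of cancer?
To answer this question, let's draw a diagram first.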
![](https://img.haomeiwen.com/i12735209/dae8220f74009c54.png)
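Suppose this is the whole population: exactly 1% have the cancer and 99% do not.
We know there is a test that, if you have the cancer, detects it correctly 90% of the time. The region where the person has cancer and the test is positive is shaded red.
But that is not the whole picture. The test can also come back positive when there is no cancer. In our example this happens in 10% of the cancer-free cases, so we have to add another region covering 10% of the large cancer-free area, where the test is positive without cancer (shaded blue). Everything outside these circles corresponds to having no cancer and testing negative.
Question:
If the test result is positive, with a prior probability of cancer of 1% and a sensitivity and specificity of 90%, what do you think the new probability of cancer is?
First, some basic terminology.
Prior probability: the probability before the test is taken.
Posterior probability: the probability after the evidence from the test has been taken into account.
Bayes' rule incorporates the evidence from the test into the prior probability to produce the posterior probability.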
![](https://img.haomeiwen.com/i12735209/f630b270779d4a62.png)
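For example, in the cancer case the prior probability of cancer is P(C) = 1%, and the probability of testing positive given that you have cancer is P(Pos|C) = 90%.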
![](https://img.haomeiwen.com/i12735209/4d6b905437a97cc7.png)
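For the posterior probability, we want to infer the probability of cancer from a positive result, so two parts are involved: the probability of having cancer and testing positive (0.01 × 0.9 = 0.009), and the probability of not having cancer and testing positive (0.99 × 0.1 = 0.099).
The calculation gives: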
![](https://img.haomeiwen.com/i12735209/9387e608be553197.png)
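Next, normalize the two parts. The normalizer is their sum, which is the total probability of a positive test: P(Pos) = 0.009 + 0.099 = 0.108. In the diagram above, it is the elliptical area made up of the red and blue regions.
Strictly speaking, the two products computed so far are joint probabilities of two events (cancer status and a positive test), not posteriors. To get the posterior probability of cancer given a positive test, divide the joint probability of cancer and a positive test, 0.009, by the normalizer 0.108, which gives about 0.0833.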
![](https://img.haomeiwen.com/i12735209/864c0b76b839c404.png)
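The arithmetic for this example can be written out in full as a worked derivation. It assumes the false-positive rate is one minus the 90% specificity, that is P(Pos|¬C) = 0.1; the last line for the no-cancer hypothesis is added only as a check that the two posteriors sum to 1.

```latex
\begin{align*}
P(C, \mathrm{Pos}) &= P(C)\,P(\mathrm{Pos} \mid C) = 0.01 \times 0.9 = 0.009 \\
P(\neg C, \mathrm{Pos}) &= P(\neg C)\,P(\mathrm{Pos} \mid \neg C) = 0.99 \times 0.1 = 0.099 \\
P(\mathrm{Pos}) &= P(C, \mathrm{Pos}) + P(\neg C, \mathrm{Pos}) = 0.108 \\
P(C \mid \mathrm{Pos}) &= \frac{0.009}{0.108} \approx 0.0833 \\
P(\neg C \mid \mathrm{Pos}) &= \frac{0.099}{0.108} \approx 0.9167
\end{align*}
```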
To recap, the setup consists of:
- a prior probability, P(C)
- a test with a certain sensitivity, P(Pos|C)
- and a certain specificity, P(Neg|¬C)
![](https://img.haomeiwen.com/i12735209/89951aeec12d7822.png)
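For example, when you receive a positive test result, you do the following:
- Multiply the prior P(C) by the probability of this test result given cancer, P(Pos|C).
- Multiply the prior P(¬C) by the probability of this test result given no cancer, P(Pos|¬C).
- This gives one number for the cancer hypothesis and one for the no-cancer hypothesis. Add them together; the sum is usually not 1. It is the total probability of the observed test result, here a positive one, P(Pos).
- Normalize: divide each of the two numbers by this total probability.
You have now computed the posterior probabilities you were after.
That is the algorithm behind Bayes' rule.
Mapping this onto the earlier diagram gives the figure below.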
![](https://img.haomeiwen.com/i12735209/22969391ddfbeb47.png)
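The multiply, add, and normalize steps can be sketched in a few lines of Python. This is a minimal illustration rather than code from the course: the function name `posterior_given_positive` and its parameter names are made up for this example, and it assumes a binary test whose false-positive rate is `1 - specificity`.

```python
def posterior_given_positive(prior, sensitivity, specificity):
    """Return P(C | Pos) for a binary test (hypothetical helper)."""
    # Joint probability of having cancer AND testing positive.
    joint_cancer = prior * sensitivity
    # Joint probability of NOT having cancer AND testing positive.
    joint_no_cancer = (1.0 - prior) * (1.0 - specificity)
    # Total probability of a positive result: the normalizer.
    normalizer = joint_cancer + joint_no_cancer
    # Normalize to get the posterior for the cancer hypothesis.
    return joint_cancer / normalizer


p = posterior_given_positive(prior=0.01, sensitivity=0.9, specificity=0.9)
print(round(p, 4))  # 0.0833
```

Even with a 90% sensitive test, the posterior stays near 8% because the 1% prior makes false positives far more numerous than true positives.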
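Exercise
In this example you are a robot on a road that has only two locations, red and green, denoted R and G. Initially the robot does not know where it is, so the prior probability of each location is 0.5. It also has a sensor, much like an eye, but the sensor is not very reliable: the probability of seeing red in the red cell is 0.8, and the probability of seeing green in the green cell is 0.8. Now suppose the robot sees red. What is the posterior probability that it is in the red cell? And what is the probability that it is in the green cell? You can now apply Bayes' rule to work out the result.
The diagram is shown below.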
![](https://img.haomeiwen.com/i12735209/81650b833ea1c1e4.png)
Hint: Bayes' rule is shown below:
![](https://img.haomeiwen.com/i12735209/d537043735dd3866.png)
We can replace A and B in Bayes' rule to get:
![](https://img.haomeiwen.com/i12735209/bfb4031855731d95.png)
Now, using the prior and conditional probabilities we know, this can be rewritten as:
![](https://img.haomeiwen.com/i12735209/63d87803d5d84434.png)
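There is still one thing we don't know: the probability of seeing red. The answer is 0.5.
Law of total probability
There are two ways the robot can see red:
- The robot is in the red cell and its sensor works correctly.
- The robot is in the green cell and its sensor makes a mistake.
Adding these two probabilities gives the total probability of seeing red: 0.5 × 0.8 + 0.5 × 0.2 = 0.5.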
![](https://img.haomeiwen.com/i12735209/bc14af89bc780d03.png)
This gives the answer:
![](https://img.haomeiwen.com/i12735209/05fa7666d68a9f6b.png)
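The robot exercise can be checked with a small Python sketch that updates a belief over several locations at once. The function name `bayes_update` and the dictionary layout are assumptions made for this illustration; the value 0.2 for seeing red while in the green cell follows from the stated 0.8 chance of seeing green there.

```python
def bayes_update(prior, likelihood):
    """Hypothetical helper.

    prior:      dict mapping state -> P(state)
    likelihood: dict mapping state -> P(observation | state)
    Returns (posterior dict, total probability of the observation).
    """
    # Joint probability of each state together with the observation.
    joint = {state: prior[state] * likelihood[state] for state in prior}
    # Law of total probability: sum the joints over all states.
    total = sum(joint.values())
    # Normalize so the posterior sums to 1.
    posterior = {state: joint[state] / total for state in joint}
    return posterior, total


prior = {"R": 0.5, "G": 0.5}
see_red = {"R": 0.8, "G": 0.2}  # P(see red | R), P(see red | G)

posterior, p_see_red = bayes_update(prior, see_red)
print(round(p_see_red, 4))       # 0.5
print(round(posterior["R"], 4))  # 0.8
print(round(posterior["G"], 4))  # 0.2
```

Because the function works on dictionaries, the same sketch would extend to a road with more than two cells without changes.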
The following questions will help you review what you learned in the Bayes' Rule lesson.
Prior knowledge
For questions 1-3, assume you already have the following knowledge:
You’re interested in finding out the probability of a car stopping if it sees a yellow traffic light.
- Past data tells you that the probability of a car stopping at a traffic light intersection is P(S) = 0.40
- You also know that the past probability of a traffic light being yellow (as opposed to red or green) is P(Y) = 0.10
![](https://img.haomeiwen.com/i12735209/cc46f20f990875f6.png)
Question 1/5
When a car is stopped at an intersection, data shows that 12% of the time the light is yellow. So if we know a car is stopped, there's a 12% chance the light is yellow. This is called a conditional probability.
Given P(S) and P(Y) above, how would you represent this conditional probability in notation?
Given that a car is stopped, we know that it is 12% likely (0.12 in decimal value) that the light is yellow. This is written P(Y|S), which can be read as "Probability of Yellow given a Stopped car."
P(Y|S) = 0.12
Question 2/5
Using what you know from question 1, answer the following: if the traffic light is yellow, what is the chance that the car will stop?
Using Bayes' rule, we know that
P(S|Y) = P(Y|S)P(S) / P(Y)
P(S|Y) = 0.12 × 0.4 / 0.1 = 0.48
And intuitively this value seems about right; a car should stop about half the time when faced with a yellow light.
0.48
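When the total probability of the evidence is already known, as P(Y) is here, Bayes' rule reduces to a single multiplication and division. The sketch below shows that form; the function name `invert_conditional` and its parameter names are hypothetical and chosen only for this example.

```python
def invert_conditional(p_b_given_a, p_a, p_b):
    """Return P(A | B) from P(B | A), P(A) and P(B) (hypothetical helper)."""
    return p_b_given_a * p_a / p_b


# A = car stops (S), B = light is yellow (Y).
p_s_given_y = invert_conditional(p_b_given_a=0.12, p_a=0.40, p_b=0.10)
print(round(p_s_given_y, 2))  # 0.48
```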
Question 3/5
Knowing that a car stopping at an intersection and the presence of a yellow traffic light are related events, what are P(S) and P(Y) known as?
![](https://img.haomeiwen.com/i12735209/bb33e8488d0396bd.png)
Questions 4 and 5 are different scenarios.
Prior knowledge for question 4:
On a four-lane highway, cars are either going fast or not fast. Faster cars should go in the leftmost lanes.
- At any given time, 20% of cars are in the left-most lane.
- Overall, 40% of cars on the highway are classified as going fast.
- Out of all the cars in the leftmost lane, 90% are going fast.
Question 4/5
Given the above information, if a car is going fast, what is the probability that it will be in the leftmost lane?
Using Bayes' rule, we know that P(Left|Fast) = P(Fast|Left)P(Left) / P(Fast) = 0.9 × 0.2 / 0.4 = 0.45.
Bayes' rule is not only used to incorporate sensor data into an estimate; it’s also often used to incorporate test data into a medical diagnosis.
Prior knowledge for question 5:
- 1% of all people have cancer.
- 90% of people who have cancer test positive when given a cancer-detecting blood test, meaning the test detects cancer 90% of the time.
- 5% of people will have false positives, meaning that 5% of the time, this test will produce a positive result when people do not have cancer.
Question 5/5
Given the above data, what is the probability that a person has cancer if they have a positive cancer-test result? (Note: answers are rounded to the nearest 4th decimal place).
0.1538
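One way to sanity-check this answer is to count people instead of multiplying probabilities. The sketch below assumes an imaginary population of 10,000 (an arbitrary size chosen for illustration; any size gives the same ratio) and uses the 5% false-positive rate stated for this question.

```python
population = 10_000  # assumed size, for illustration only

with_cancer = population * 0.01            # about 100 people have cancer
without_cancer = population - with_cancer  # about 9,900 do not

true_positives = with_cancer * 0.90        # about 90 test positive with cancer
false_positives = without_cancer * 0.05    # about 495 test positive without cancer

# Among everyone who tests positive, the fraction who actually have cancer.
posterior = true_positives / (true_positives + false_positives)
print(round(posterior, 4))  # 0.1538
```

Roughly 90 of the 585 positive results come from people who have cancer, which matches the value obtained from Bayes' rule.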