1. 思路
The point of this quick post is to write out why using the log-odds is infact very well motivated in the first place, and once it is modeled by a linear function, what you get is the logistic function.
Beginning with log-odds would infact be begging the question, so let us try to understand.
2. 结论
log-odds 是个很自然的选择,sigmoid 是对 log-odds 的线性建模。(事实逻辑回归可以说是the log-odds with a linear function的最简单的例子,如果我们有结构化输出,这种模型的自然扩展将是the Conditional Random Field。使用线性函数的选择只是在其他一些有利的属性中凸优化)。
3. 理解
假设我们有一个线性分类器:data:image/s3,"s3://crabby-images/0d38b/0d38b5fe79571f3fb7919269bf0b6bc3723c05c7" alt=""
我们要求得合适的W和 ,使 0-1 loss 的期望值最小,即下面这个期望最小:
data:image/s3,"s3://crabby-images/76127/76127ca93eb6c033d0e12992cbabcd583f00cd67" alt=""
data:image/s3,"s3://crabby-images/3b09c/3b09c6d0302cc0e93f1bb62ff4a3300b5c13a434" alt=""
data:image/s3,"s3://crabby-images/f19e9/f19e9ea61c113280d11bc4a51c493238806ebaef" alt=""
data:image/s3,"s3://crabby-images/c74b3/c74b3903c056c22d8323cc2b01ef96cb90fff438" alt=""
data:image/s3,"s3://crabby-images/5361c/5361c0b7f6a77055cf550aa0fdede1d127dd5274" alt=""
data:image/s3,"s3://crabby-images/0e1c8/0e1c8c7c626e3c494b234d895cbc9051c85e3a24" alt=""
data:image/s3,"s3://crabby-images/37187/37187fe6325952922b1bbd47a7ab940c046dde97" alt=""
data:image/s3,"s3://crabby-images/dd4d2/dd4d2634b9d13daa24b0905e4ae4204f3ade1910" alt=""
data:image/s3,"s3://crabby-images/8194f/8194fafe0d16452d710c08be69575cbde43e1482" alt=""
值得注意的是,到目前为止,我们对数据完全没有做出任何假设。 所以上面的分类器就新样本点的预期损失而言,是我们在泛化方面可以拥有的最佳分类器。 这种分类器称为贝叶斯分类器,有时也称为Plug-in 分类器。
data:image/s3,"s3://crabby-images/38b0b/38b0b4ff64ac69ca2bedb6ec6921e5a7365c8e75" alt=""
data:image/s3,"s3://crabby-images/515a8/515a89d83c443c677b97a96626f89274f2e0f4be" alt=""
data:image/s3,"s3://crabby-images/b6dfb/b6dfba947de005bbe8e8cc375fd02c537d0a1827" alt=""
我们得到了 log-odds ratio !
请注意,通过不对数据做出任何假设,只需写出条件风险,log-odds ratio 就会直接下降。 这不是偶然的,因为最佳贝叶斯分类器具有用于二进制分类的这种形式。 但问题仍然存在,我们如何模拟这个对数比值比? 最简单的选择是考虑线性模型(there is no reason to stick to a linear model, but due to some reasons, one being convexity, we stick to a linear model):
data:image/s3,"s3://crabby-images/7c4ab/7c4ab4ea1b7be9d73d3a0ba368bb7c4faf4f85f3" alt=""
data:image/s3,"s3://crabby-images/df4d0/df4d09a32c2a492805df203cfcf3c590366f0589" alt=""
data:image/s3,"s3://crabby-images/8994f/8994f638a25d9a12abfe558cf27fcb7965264a94" alt=""
由此可见,log-odds 是个很自然的选择,sigmoid 是对 log-odds 的线性建模。
网友评论