Linear Regression & Logistic Reg

Author: asuka_19d5 | Published: 2018-10-18 06:19

Random Variable:

  • its value changes due to chance
  • can be a vector of components: Y = [y1, y2, ..., yn]
  • discrete / continuous

Discrete variable:

  • Expected Value
    \mu = E(X) = \sum^n_{i=1}x_ip_i
  • Variance
    \sigma^2 = Var(X) = \sum^n_{i=1}(x_i - \mu)^2p_i
    Here X means random variable
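
The two formulas above can be checked numerically; a minimal sketch in Python, using a made-up four-point distribution as the example data:

```python
# E(X) = sum of x_i * p_i, Var(X) = sum of (x_i - mu)^2 * p_i
# (hypothetical distribution, chosen only for illustration)
values = [1, 2, 3, 4]
probs = [0.1, 0.2, 0.3, 0.4]

mu = sum(x * p for x, p in zip(values, probs))               # expected value
var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))  # variance

print(mu, var)  # 3.0 1.0
```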

Discrete Probability Distribution:

  • Bernoulli Distribution (two-point)
    P(X = x) = p^x(1 - p)^{1 - x}, 0 < p < 1, x = 0, 1
    p is the probability that x = 1
  • Binomial Distribution
    n independent trials
    P(X = x) = C^x_np^x(1 - p)^{n-x}, x = 0, 1, 2, ..., n
    p is the probability of success (1) in a single trial, and x is the number of successes
    P describes the probability of x successes in n independent trials
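
The binomial formula above translates directly into code; a minimal sketch using the standard-library `math.comb` for the C^x_n term:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): probability of x successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# the probabilities over x = 0..n must sum to one
total = sum(binomial_pmf(x, 10, 0.3) for x in range(11))
print(total)
```

For a quick sanity check, P(X = 0) for n = 4 fair-coin flips is (1/2)^4 = 0.0625.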

Continuous random variable:

  • probability density function (PDF)
    • probabilities are given by integrals of the PDF
    • the integral over the entire space equals one


  • Expected value
    \mu = E(X) = \int^\infty_{-\infty}xf(x)dx
  • Variance
    \sigma^2 = Var(X) = \int^\infty_{-\infty}[x - E(X)]^2f(x)dx
  • Normal Distribution (PDF)
    f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2\sigma^2} (x- \mu)^2}, - \infty < x < \infty
    • given \mu and \sigma, the form is determined
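
The normal PDF above depends only on \mu and \sigma; a minimal sketch evaluating it with the standard library:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density f(x) with mean mu and standard deviation sigma."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# peak of the standard normal: 1 / sqrt(2*pi) ~= 0.3989
print(normal_pdf(0.0))
```

The density is symmetric about \mu, so normal_pdf(mu + d) equals normal_pdf(mu - d).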

Central Limit Theorem

  • example: use random(5) to get random(25)
    answer: 5*random(5) + random(5) (a two-digit base-5 number)
  • Background:
    If a random variable reflects the combined influence of a large number of independent random factors, and no single factor plays a significant role in that combined influence, then the random variable generally follows a normal distribution.
  • analysis:
    • the sum of independent, identically distributed variables has a normal distribution in the limit
    • because the sum itself can grow toward \infty, we consider its normalized (standardized) form, which converges to a normal distribution
  • no matter what distribution these factors have, when n is very large the distribution of their sum (and the corresponding sampling distribution) is close to a normal distribution
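
The random(5) -> random(25) trick above can be sketched directly; `random5` below is a stand-in for the assumed primitive, implemented here with Python's `random.randrange`:

```python
import random

def random5():
    """Stand-in for the given primitive: uniform integer in {0, 1, 2, 3, 4}."""
    return random.randrange(5)

def random25():
    # two base-5 digits: high digit * 5 + low digit is uniform on {0, ..., 24},
    # because each of the 25 digit pairs is equally likely
    return 5 * random5() + random5()
```

Over many draws, every value in 0..24 appears with equal frequency.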

Linear Regression

Loss Function
  • Least Squares (argmin): builds the objective function from a distance perspective
    argmin_{\beta}\sum_{i=1}^{n}\epsilon^2_i = \sum_{i=1}^{n}(y_i - \beta_0 - \sum_{j=1}^{m}\beta_jx_{ji})^2
  • Maximum Likelihood Estimation (MLE): builds the objective function from a probability perspective
    • Definition: find the parameter values that maximize the likelihood of making the observations given the parameters
    • assumption for linear regression:
      p(y|x) is a Gaussian distribution with mean \mu = ax + b and variance \sigma^2
      p(y) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2\sigma^2}(y - \mu)^2}, -\infty < y < \infty
      p(y_1, y_2, ..., y_n|\sigma, \mu) = p(y_1|\sigma, \mu)p(y_2|\sigma, \mu) \cdots p(y_n|\sigma, \mu)
      Using \theta to denote (\sigma, \mu), we have \theta_{ML}(Y) = argmax_{\theta}p(Y|\theta)
    • For a sample Y_i, its PDF: P(Y_i) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2\sigma^2}(Y_i - \beta_0 - \beta_1X_i)^2}
      Because all the Y_i are independent, the likelihood function becomes:
      L(\beta_0, \beta_1, \sigma^2) = P(Y_1, Y_2, ..., Y_n) = \left(\frac{1}{\sigma \sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum(Y_i - \beta_0 - \beta_1X_i)^2}
      \ln L = -n\ln(\sqrt{2\pi}\sigma) - \frac{1}{2\sigma^2}\sum_{i=1}^n(Y_i - \beta_0 - \beta_1X_i)^2
      max L -> min \sum_{i=1}^{n}\epsilon^2_i = \sum_{i=1}^{n}(Y_i - \beta_0 - \beta_1X_i)^2
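
As the derivation shows, maximizing the likelihood reduces to minimizing the squared residuals; in the one-feature case the minimum has a closed form. A minimal sketch with made-up data (roughly y = 2x):

```python
# closed-form least-squares fit for y = b0 + b1 * x:
# b1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2), b0 = y_mean - b1 * x_mean
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.1, 8.0, 9.9]  # hypothetical observations near y = 2x

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n
b1 = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / \
     sum((x - x_mean) ** 2 for x in xs)
b0 = y_mean - b1 * x_mean

print(b0, b1)  # close to 0.09 and 1.97 for this data
```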

Logistic Regression

  • How to create a function to fit discrete values?
    step 1: Discrete Y -> Continuous y(p)
    step 2: Continuous y(p) -> X
    then we can get X <-> Y
  • p ∈ (0, 1)
    odds = p/(1-p) ∈ (0, +∞)
    log(odds) ∈ (-∞, +∞)
    thus, x = log(odds) <-> p
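
The two mappings above are exact inverses of each other; a minimal sketch:

```python
from math import exp, log

def logit(p):
    """Maps p in (0, 1) to the log-odds in (-inf, +inf)."""
    return log(p / (1 - p))

def sigmoid(x):
    """Inverse of logit: maps x in (-inf, +inf) back to (0, 1)."""
    return 1.0 / (1.0 + exp(-x))

# round trip recovers the original probability
print(sigmoid(logit(0.3)))
```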
Loss Function
  • assumption for logistic regression:
    p(y|x) is a Bernoulli distribution with P(Y = 1 | X) = \frac{1}{1+ e^{-ax-b}}
  • Step 1: Choose the model
    P(Y = y_i) = p^{y_i}(1-p)^{1-y_i}, 0 < p < 1, y_i = 0, 1
    p = h_{\beta}(x_i) = \frac{1}{1+e^{-\beta_0-\beta_1x_i}}
  • Step 2: Calculate loss function - MLE
    L(\beta_0, \beta_1) = P(Y_1, Y_2, ..., Y_n) = \prod_{i=1}^n h_{\beta}(x_i)^{y_i}(1 - h_{\beta}(x_i))^{1-y_i}, y_i = 0, 1
    max L -> min -\log(L) = \sum^n_{i=1}[-y_i\log(h_{\beta}(x_i)) - (1 - y_i)\log(1 - h_\beta(x_i))]
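
The negative log-likelihood above has no closed-form minimizer, but plain gradient descent works; a minimal sketch on made-up 1-D data (labels 1 for positive x, 0 for negative x):

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

# hypothetical toy data, chosen only for illustration
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

# gradient descent on the cross-entropy loss from the section above;
# the gradient of -log(L) w.r.t. (b0, b1) is sum of (h(x_i) - y_i) * (1, x_i)
b0, b1 = 0.0, 0.0
lr = 0.5
for _ in range(2000):
    preds = [sigmoid(b0 + b1 * x) for x in xs]
    g0 = sum(p - y for p, y in zip(preds, ys)) / len(xs)
    g1 = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    b0 -= lr * g0
    b1 -= lr * g1
```

After training, sigmoid(b0 + b1*x) > 0.5 exactly where the label is 1.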


Source link: https://www.haomeiwen.com/subject/kkadzftx.html