Random Variable:
- its value changes due to chance
- can be a combination of several components (a random vector): Y = [y1, y2, ..., yn]
- discrete / continuous
Discrete random variable:
- Expected Value
- Variance
Here X denotes the random variable in the formulas below
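For a discrete random variable X taking values x_i with probabilities P(X = x_i), the standard definitions are:

E[X] = \sum_i x_i P(X = x_i)
\mathrm{Var}(X) = E[(X - E[X])^2] = E[X^2] - (E[X])^2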
Discrete Probability Distribution:
- Bernoulli Distribution (two-point)
p is the probability that x = 1
- Binomial Distribution
n independent trials
p is the probability of success (1) in each trial; x is the number of successes
P describes the probability of x successes in n independent trials (the standard PMFs are given below)
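In formula form, the standard PMFs are:

Bernoulli: P(X = 1) = p, \quad P(X = 0) = 1 - p
Binomial: P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, \ldots, n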
Continuous random variable:
- probability density function (PDF)
- the probability of X falling in an interval is given by the integral of the PDF over that interval
- the integral of the PDF over the entire space is equal to one
- Expected value
- Variance
- Normal Distribution (PDF)
- given μ and σ, the form of the PDF is fully determined (see the formulas below)
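For a continuous random variable X with PDF f(x), the statements above take the standard form:

P(a \le X \le b) = \int_a^b f(x)\,dx, \qquad \int_{-\infty}^{+\infty} f(x)\,dx = 1
E[X] = \int_{-\infty}^{+\infty} x f(x)\,dx, \qquad \mathrm{Var}(X) = \int_{-\infty}^{+\infty} (x - E[X])^2 f(x)\,dx

In particular, the normal PDF with mean μ and standard deviation σ is:

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)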
Central Limit Theorem
- example: use random(5) to build random(25)
answer: 5*random(5) + random(5) (read the two draws as the digits of a base-5 number); see the sketch below
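A minimal sketch of this construction, assuming random(5) returns a uniform integer in {0, 1, 2, 3, 4}; the helper names random5/random25 are illustrative:

```python
import random
from collections import Counter

def random5():
    # assumed primitive: uniform integer in {0, 1, 2, 3, 4}
    return random.randrange(5)

def random25():
    # two independent draws form the two digits of a base-5 number,
    # giving a uniform integer in {0, ..., 24}
    return 5 * random5() + random5()

# empirical check: each of the 25 values should appear with roughly equal frequency
counts = Counter(random25() for _ in range(250_000))
print(min(counts.values()), max(counts.values()))  # both should be close to 10,000
```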
- Background: If a random variable reflects the combined effect of a large number of independent random factors, and no single factor plays a significant role in that combined effect, then the random variable generally follows the normal distribution.
- analysis:
- the limiting distribution of a sum of independent variables with the same distribution is the normal distribution
- because the sum itself can grow without bound (approach infinity) as n increases, we consider its normalized form; the normalized sum then converges to a standard normal distribution
- no matter what distribution these factors follow, when n is very large, the distribution of their normalized sum, and hence the sampling distribution of the mean, is close to a normal distribution
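Stated formally: if X_1, \ldots, X_n are independent and identically distributed with mean \mu and finite variance \sigma^2, then

\frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} N(0, 1) \quad \text{as } n \to \infty

equivalently, the sample mean \bar{X} is approximately N(\mu, \sigma^2 / n) for large n.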
Linear Regression
Loss Function
- Least Squares (argmin of the squared error): builds the objective function from the perspective of distance
- Maximum Likelihood Estimation (MLE): builds the objective function from the perspective of probability
- Definition: find the parameter values that maximize the likelihood of making the observations given the parameters
- assumption for linear regression:
p(y|x) is a Gaussian distribution with mean θᵀx and variance σ²
using θᵀx_i to denote the mean for sample i, we have
- For a sample Y_i, its PDF:
p(y_i \mid x_i; \theta) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(y_i - \theta^T x_i)^2}{2\sigma^2}\right)
Because all the Y_i are independent, the likelihood function (MLE objective) becomes:
L(\theta) = \prod_{i=1}^{n} p(y_i \mid x_i; \theta)
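Taking the log (a standard step, using the θᵀx mean assumed above) and dropping terms that do not depend on θ:

\log L(\theta) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \theta^T x_i)^2

so maximizing the likelihood is equivalent to minimizing \sum_i (y_i - \theta^T x_i)^2, which is exactly the least-squares objective.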
Logistic Regression
- How to create a function to fit discrete values?
step 1: Discrete Y -> Continuous y(p)
step 2: Continuous y(p) -> X
then we can get X <-> Y
- p -> [0, 1]
odds = p/(1-p) -> [0, +infty]
log(odds) -> [-infty, +infty]
thus, x = log(odds) <-> p
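Inverting x = log(p / (1 - p)) gives the sigmoid function, which maps the real-valued score back to a probability:

p = \sigma(x) = \frac{1}{1 + e^{-x}}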
Loss Function
- assumption for logistic regression:
p(y|x) is a Bernoulli distribution with success probability p
- Step 1: Choose Model
- Step 2: Calculate loss function via MLE
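Under that Bernoulli assumption (writing p_i = \sigma(\theta^T x_i), the parameterization assumed above), the likelihood of the observed labels and its negative log (the cross-entropy loss) are:

L(\theta) = \prod_{i=1}^{n} p_i^{y_i} (1 - p_i)^{1 - y_i}
-\log L(\theta) = -\sum_{i=1}^{n}\left[\, y_i \log p_i + (1 - y_i)\log(1 - p_i) \,\right]

minimizing this negative log-likelihood is the usual logistic regression loss.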