
1. Linear Regression

Author: xxy41902_29d6 | Published 2019-07-07 19:05

The italicized parts are my own understanding.

Linear regression (LinearRegression): the model is the least-squares line y = ax + b, and the machine-learning task is to find suitable values of a and b by minimizing a loss function.

The simple linear regression model is y = bX + e, where the error e ~ N(0, sigma^2 * I). In probabilistic terms this means y - bX ~ N(0, sigma^2 * I), i.e. y | X, b ~ N(bX, sigma^2 * I). The loss for this model is the squared error between the predictions and the true values, and the maximum-likelihood estimate of b can be computed in closed form as b = (X^T X)^{-1} X^T y, where the matrix (X^T X)^{-1} X^T is called the pseudoinverse (Moore-Penrose inverse).
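
As a quick sanity check of the normal equation (my own minimal sketch; the toy numbers are made up for illustration), we can compute b directly and compare it with NumPy's built-in least-squares solver:

import numpy as np

# toy data: y = 2*x + 1 plus a little noise
rng = np.random.RandomState(0)
x = rng.rand(50)
y = 2 * x + 1 + 0.01 * rng.randn(50)

# design matrix with a leading column of ones for the intercept
X = np.c_[np.ones_like(x), x]

# normal equation: b = (X^T X)^{-1} X^T y
b = np.linalg.inv(X.T @ X) @ X.T @ y

# the same solution via NumPy's least-squares routine
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b)        # roughly [1, 2], i.e. [intercept, slope]
print(b_lstsq)  # matches b up to numerical error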



Code

import numpy as np

class LinearRegression:

    """

    The simple linear regression model is

        y = bX + e  where e ~ N(0, sigma^2 * I)

    In probabilistic terms this corresponds to

        y - bX ~ N(0, sigma^2 * I)

        y | X, b ~ N(bX, sigma^2 * I)

    The loss for the model is simply the squared error between the model

    predictions and the true values:

        Loss = ||y - bX||^2

    The MLE for the model parameters b can be computed in closed form via the

    normal equation

        b = (X^T X)^{-1} X^T y

    where (X^T X)^{-1} X^T is known as the pseudoinverse / Moore-Penrose

    inverse.

    """

    def __init__(self, fit_intercept=True):

        self.beta = None

        self.fit_intercept = fit_intercept

    def fit(self, X, y):

        # convert X to a design matrix if we're fitting an intercept

        if self.fit_intercept:

            X = np.c_[np.ones(X.shape[0]), X]  # prepend a column of ones (build the design matrix)

        pseudo_inverse = np.dot(np.linalg.inv(np.dot(X.T, X)), X.T)  # the pseudoinverse (X^T X)^{-1} X^T

        self.beta = np.dot(pseudo_inverse, y)  # solve for b (stored as beta)

    def predict(self, X):

        # convert X to a design matrix if we're fitting an intercept

        if self.fit_intercept:

            X = np.c_[np.ones(X.shape[0]), X]

        return np.dot(X, self.beta)  # predictions y = Xb
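
A minimal usage sketch for the class above (my own example, not from the original code): fit it on synthetic data where the true intercept and slope are known.

import numpy as np

# synthetic 1-D data: y = 3*x - 2 plus noise
rng = np.random.RandomState(42)
X = rng.rand(100, 1)
y = 3 * X[:, 0] - 2 + 0.05 * rng.randn(100)

model = LinearRegression(fit_intercept=True)
model.fit(X, y)

print(model.beta)            # roughly [-2, 3], i.e. [intercept, slope]
print(model.predict(X[:3]))  # predictions for the first three samples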

2. Ridge Regression (RidgeRegression)

Compared with LinearRegression, the loss function gains an alpha-weighted L2 penalty on the coefficients, which regularizes the fit.

Ridge regression uses the same model as simple linear regression but adds an extra penalty on the L2 norm of the coefficients to the loss function; this is sometimes called Tikhonov regularization. The model is still

      y = bX + e  where e ~ N(0, sigma^2 * I)

but its loss function is now

    RidgeLoss = ||y - bX||^2 + alpha * ||b||^2

The MLE for the model parameters b can be computed via the adjusted normal equation

    b = (X^T X + alpha I)^{-1} X^T y

    where (X^T X + alpha I)^{-1} X^T is the pseudoinverse adjusted for the L2 penalty on the model coefficients.

Original code


class RidgeRegression:
    """
    Ridge regression uses the same simple linear regression model but adds an
    additional penalty on the L2-norm of the coefficients to the loss function.
    This is sometimes known as Tikhonov regularization.

    In particular, the ridge model is still simply

        y = bX + e  where e ~ N(0, sigma^2 * I)

    except now the error for the model is calculated as

        RidgeLoss = ||y - bX||^2 + alpha * ||b||^2

    The MLE for the model parameters b can be computed in closed form via the
    adjusted normal equation:

        b = (X^T X + alpha I)^{-1} X^T y

    where (X^T X + alpha I)^{-1} X^T is the pseudoinverse / Moore-Penrose
    inverse adjusted for the L2 penalty on the model coefficients.
    """

    def __init__(self, alpha=1, fit_intercept=True):
        """
        A ridge regression model fit via the normal equation.

        Parameters
        ----------
        alpha : float (default: 1)
            L2 regularization coefficient. Higher values correspond to a
            larger penalty on the L2 norm of the model coefficients.
        fit_intercept : bool (default: True)
            Whether to fit an additional intercept term in addition to the
            model coefficients.
        """
        self.beta = None
        self.alpha = alpha
        self.fit_intercept = fit_intercept

    def fit(self, X, y):
        # convert X to a design matrix if we're fitting an intercept
        if self.fit_intercept:
            X = np.c_[np.ones(X.shape[0]), X]

        A = self.alpha * np.eye(X.shape[1])  # alpha * I, the L2 penalty term
        pseudo_inverse = np.dot(np.linalg.inv(X.T @ X + A), X.T)  # adjusted pseudoinverse
        self.beta = pseudo_inverse @ y

    def predict(self, X):
        # convert X to a design matrix if we're fitting an intercept
        if self.fit_intercept:
            X = np.c_[np.ones(X.shape[0]), X]
        return np.dot(X, self.beta)
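
A minimal usage sketch (my own example): as alpha grows, the L2 penalty shrinks the fitted coefficients toward zero.

import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(100)

for alpha in (0.01, 1.0, 100.0):
    ridge = RidgeRegression(alpha=alpha, fit_intercept=True)
    ridge.fit(X, y)
    # the norm of the non-intercept coefficients shrinks as alpha grows
    print(alpha, np.linalg.norm(ridge.beta[1:]))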


The sigmoid function

def sigmoid(x):

    return 1 / (1 + np.exp(-x))

3. Logistic Regression (LogisticRegression)

A linear model assigns a weight to each feature; the weighted sum is then passed through the sigmoid function to decide which class the sample falls into.

class LogisticRegression:

    def __init__(self, penalty="l2", gamma=0, fit_intercept=True):

        """

        A simple logistic regression model fit via gradient descent on the

        penalized negative log likelihood.

        Parameters

        ----------

        penalty : str (default: 'l2')

            The type of regularization penalty to apply on the coefficients

            `beta`. Valid entries are {'l2', 'l1'}.

        gamma : float in [0, 1] (default: 0)

            The regularization weight. Larger values correspond to larger

            regularization penalties, and a value of 0 indicates no penalty.

        fit_intercept : bool (default: True)

            Whether to fit an intercept term in addition to the coefficients in

            b. If True, the estimates for `beta` will have M+1 dimensions,

            where the first dimension corresponds to the intercept

        """

        err_msg = "penalty must be 'l1' or 'l2', but got: {}".format(penalty)

        assert penalty in ["l2", "l1"], err_msg

        self.beta = None

        self.gamma = gamma

        self.penalty = penalty

        self.fit_intercept = fit_intercept

    def fit(self, X, y, lr=0.01, tol=1e-7, max_iter=1e7):

        # convert X to a design matrix if we're fitting an intercept

        if self.fit_intercept:

            X = np.c_[np.ones(X.shape[0]), X]

        l_prev = np.inf  # initialize the previous loss to infinity

        self.beta = np.random.rand(X.shape[1])

        for _ in range(int(max_iter)):

            y_pred = sigmoid(np.dot(X, self.beta))

            loss = self._NLL(X, y, y_pred)

            if l_prev - loss < tol:

                return

            l_prev = loss

            self.beta -= lr * self._NLL_grad(X, y, y_pred)

    def _NLL(self, X, y, y_pred):

        """

        Penalized negative log likelihood of the targets under the current

        model.

            NLL = -1/N * (

                [sum_{i=0}^N y_i log(y_pred_i) + (1-y_i) log(1-y_pred_i)] -

                (gamma ||b||) / 2

            )

        """

        N, M = X.shape

        order = 2 if self.penalty == "l2" else 1

        nll = -np.log(y_pred[y == 1]).sum() - np.log(1 - y_pred[y == 0]).sum()

        penalty = 0.5 * self.gamma * np.linalg.norm(self.beta, ord=order)

        return (penalty + nll) / N

    def _NLL_grad(self, X, y, y_pred):

        """ Gradient of the penalized negative log likelihood wrt beta """

        N, M = X.shape

        p = self.penalty

        beta = self.beta

        gamma = self.gamma

        d_penalty = gamma * beta if p == "l2" else gamma * np.sign(beta)

        return -(np.dot(y - y_pred, X) + d_penalty) / N

    def predict(self, X):

        # convert X to a design matrix if we're fitting an intercept

        if self.fit_intercept:

            X = np.c_[np.ones(X.shape[0]), X]

        return sigmoid(np.dot(X, self.beta))
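
A minimal usage sketch (my own example) on two linearly separable Gaussian blobs; `predict` returns sigmoid probabilities, so thresholding at 0.5 gives class labels.

import numpy as np

# two Gaussian blobs labelled 0 and 1
rng = np.random.RandomState(0)
X0 = rng.randn(50, 2) + np.array([-2.0, -2.0])
X1 = rng.randn(50, 2) + np.array([2.0, 2.0])
X = np.vstack([X0, X1])
y = np.r_[np.zeros(50), np.ones(50)]

clf = LogisticRegression(penalty="l2", gamma=0.01, fit_intercept=True)
clf.fit(X, y, lr=0.1, max_iter=10000)

probs = clf.predict(X)                   # probabilities in (0, 1)
accuracy = np.mean((probs > 0.5) == y)   # threshold at 0.5 to get labels
print(accuracy)                          # close to 1.0 on this easy toy data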

Update 2019-07-10

4. Bayesian Regression

There are also two Bayesian regression algorithms. I don't fully understand them yet, so I'm just posting them here for now.
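
Note: the two classes below call is_number and is_symmetric_positive_definite, which the post never defines (the original code imports them from the library's utility module). A minimal stand-in consistent with how they are used might look like this (my own sketch, not the original implementation):

import numbers
import numpy as np

def is_number(a):
    # True for plain Python / NumPy scalars, False for arrays and everything else
    return isinstance(a, numbers.Number)

def is_symmetric_positive_definite(A):
    # symmetric: A equals its transpose (up to floating-point tolerance)
    if not np.allclose(A, A.T):
        return False
    # positive definite: all eigenvalues strictly positive
    return bool(np.all(np.linalg.eigvalsh(A) > 0))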

class BayesianLinearRegressionUnknownVariance:

    """

    Bayesian Linear Regression

    --------------------------

    In its general form, Bayesian linear regression extends the simple linear

    regression model by introducing priors on model parameters b and/or the

    error variance sigma^2.

    The introduction of a prior allows us to quantify the uncertainty in our

    parameter estimates for b by replacing the MLE point estimate in simple

    linear regression with an entire posterior *distribution*, p(b | X, y,

    sigma), simply by applying Bayes rule:

        p(b | X, y) = [ p(y | X, b) * p(b | sigma) ] / p(y | X)

    We can also quantify the uncertainty in our predictions y* for some new

    data X* with the posterior predictive distribution:

        p(y* | X*, X, Y) = \int_{b} p(y* | X*, b) p(b | X, y) db

    Depending on the choice of prior it may be impossible to compute an

    analytic form for the posterior / posterior predictive distribution. In

    these cases, it is common to use approximations, either via MCMC or

    variational inference.

    Bayesian Regression w/ unknown variance

    ---------------------------------------

    If *both* b and the error variance sigma^2 are unknown, the conjugate prior

    for the Gaussian likelihood is the Normal-Gamma distribution (univariate

    likelihood) or the Normal-Inverse-Wishart distribution (multivariate

    likelihood).

        Univariate:

            b, sigma^2 ~ NG(b_mean, b_V, alpha, beta)

            sigma^2 ~ InverseGamma(alpha, beta)

            b | sigma^2 ~ N(b_mean, sigma^2 * b_V)

            where alpha, beta, b_V, and b_mean are parameters of the prior.

        Multivariate:

            b, Sigma ~ NIW(b_mean, lambda, Psi, rho)

            Sigma ~ W^{-1}(Psi, rho)

            b | Sigma ~ N(b_mean, 1/lambda * Sigma)

            where b_mean, lambda, Psi, and rho are parameters of the prior.

    Due to the conjugacy of the above priors with the Gaussian likelihood of

    the linear regression model we can compute the posterior distributions for

    the model parameters in closed form:

        B = (y - X b_mean)

        shape = N + alpha

        scale = (1 / shape) * {alpha * beta + B^T ([X b_V X^T + I])^{-1} B}

        sigma^2 | X, y ~ InverseGamma(shape, scale)

        A    = (b_V^{-1} + X^T X)^{-1}

        mu_b  = A b_V^{-1} b_mean + A X^T y

        cov_b = sigma^2 A

        b | X, y, sigma^2 ~ N(mu_b, cov_b)

    This allows us a closed form for the posterior predictive distribution as

    well:

        y* | X*, X, Y ~ N(X* mu_b, X* cov_b X*^T + I)

    """

    def __init__(self, alpha=1, beta=2, b_mean=0, b_V=None, fit_intercept=True):

        """

        Bayesian linear regression model with conjugate Normal-Gamma prior on b

        and sigma^2

            b, sigma^2 ~ NG(b_mean, b_V, alpha, beta)

            sigma^2 ~ InverseGamma(alpha, beta)

            b ~ N(b_mean, sigma^2 * b_V)

        Parameters

        ----------

        alpha : float (default: 1)

            The shape parameter for the Inverse-Gamma prior on sigma^2. Must be

            strictly greater than 0.

        beta : float (default: 2)

            The scale parameter for the Inverse-Gamma prior on sigma^2. Must be

            strictly greater than 0.

        b_mean : np.array of shape (M,) or float (default: 0)

            The mean of the Gaussian prior on b. If a float, assume b_mean is

            np.ones(M) * b_mean.

        b_V : np.array of shape (N, N) or np.array of shape (N,) or None

            A symmetric positive definite matrix that when multiplied

            element-wise by b_sigma^2 gives the covariance matrix for the

            Gaussian prior on b. If a list, assume b_V=diag(b_V). If None,

            assume b_V is the identity matrix.

        fit_intercept : bool (default: True)

            Whether to fit an intercept term in addition to the coefficients in

            b. If True, the estimates for b will have M+1 dimensions, where

            the first dimension corresponds to the intercept

        """

        # this is a placeholder until we know the dimensions of X

        b_V = 1.0 if b_V is None else b_V

        if isinstance(b_V, list):

            b_V = np.array(b_V)

        if isinstance(b_V, np.ndarray):

            if b_V.ndim == 1:

                b_V = np.diag(b_V)

            elif b_V.ndim == 2:

                assert is_symmetric_positive_definite(

                    b_V

                ), "b_V must be symmetric positive definite"

        self.b_V = b_V

        self.beta = beta

        self.alpha = alpha

        self.b_mean = b_mean

        self.fit_intercept = fit_intercept

        self.posterior = {"mu": None, "cov": None}

        self.posterior_predictive = {"mu": None, "cov": None}

    def fit(self, X, y):

        # convert X to a design matrix if we're fitting an intercept

        if self.fit_intercept:

            X = np.c_[np.ones(X.shape[0]), X]

        N, M = X.shape

        beta = self.beta

        self.X, self.y = X, y

        if is_number(self.b_V):

            self.b_V *= np.eye(M)

        if is_number(self.b_mean):

            self.b_mean *= np.ones(M)

        # sigma

        I = np.eye(N)

        a = y - np.dot(X, self.b_mean)

        b = np.linalg.inv(np.dot(X, self.b_V).dot(X.T) + I)

        c = y - np.dot(X, self.b_mean)

        shape = N + self.alpha

        sigma = (1 / shape) * (self.alpha * beta ** 2 + np.dot(a, b).dot(c))

        scale = sigma ** 2

        # b_sigma is the mode of the inverse gamma prior on sigma^2

        b_sigma = scale / (shape - 1)

        # mean

        b_V_inv = np.linalg.inv(self.b_V)

        l = np.linalg.inv(b_V_inv + np.dot(X.T, X))

        r = np.dot(b_V_inv, self.b_mean) + np.dot(X.T, y)

        mu = np.dot(l, r)

        cov = l * b_sigma

        # posterior distribution for sigma^2 and c

        self.posterior = {

            "sigma**2": {"dist": "InvGamma", "shape": shape, "scale": scale},

            "b | sigma**2": {"dist": "Gaussian", "mu": mu, "cov": cov},

        }

    def predict(self, X):

        # convert X to a design matrix if we're fitting an intercept

        if self.fit_intercept:

            X = np.c_[np.ones(X.shape[0]), X]

        I = np.eye(X.shape[0])

        mu = np.dot(X, self.posterior["b | sigma**2"]["mu"])

        cov = np.dot(X, self.posterior["b | sigma**2"]["cov"]).dot(X.T) + I

        # MAP estimate for y corresponds to the mean of the posterior

        # predictive

        self.posterior_predictive["mu"] = mu

        self.posterior_predictive["cov"] = cov

        return mu
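
A minimal usage sketch (my own example, assuming the helper stand-ins above): fit the unknown-variance model and inspect the posterior it stores.

import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(60, 2)
y = X @ np.array([1.5, -0.5]) + 0.1 * rng.randn(60)

blr = BayesianLinearRegressionUnknownVariance(alpha=1, beta=2, fit_intercept=True)
blr.fit(X, y)

# Gaussian posterior over b (conditioned on sigma^2) and inverse-gamma posterior over sigma^2
print(blr.posterior["b | sigma**2"]["mu"])   # posterior mean, roughly [0, 1.5, -0.5], shrunk slightly toward the prior
print(blr.posterior["sigma**2"]["shape"])    # N + alpha

# predict returns the mean of the posterior predictive distribution
print(blr.predict(X[:3]))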

class BayesianLinearRegressionKnownVariance:

    """

    Bayesian Linear Regression

    --------------------------

    In its general form, Bayesian linear regression extends the simple linear

    regression model by introducing priors on model parameters b and/or the

    error variance sigma^2.

    The introduction of a prior allows us to quantify the uncertainty in our

    parameter estimates for b by replacing the MLE point estimate in simple

    linear regression with an entire posterior *distribution*, p(b | X, y,

    sigma), simply by applying Bayes rule:

        p(b | X, y) = [ p(y | X, b) * p(b | sigma) ] / p(y | X)

    We can also quantify the uncertainty in our predictions y* for some new

    data X* with the posterior predictive distribution:

        p(y* | X*, X, Y) = \int_{b} p(y* | X*, b) p(b | X, y) db

    Depending on the choice of prior it may be impossible to compute an

    analytic form for the posterior / posterior predictive distribution. In

    these cases, it is common to use approximations, either via MCMC or

    variational inference.

    This part of the docstring is the same as the section above.

    Bayesian linear regression with known variance

    ----------------------------------------------

    If we happen to already know the error variance sigma^2, the conjugate

    prior on b is Gaussian. A common parameterization is:

        b | sigma, b_V ~ N(b_mean, sigma^2 * b_V)

    where b_mean, sigma and b_V are hyperparameters. Ridge regression is a

    special case of this model where b_mean = 0, sigma = 1 and b_V = I (ie.,

    the prior on b is a zero-mean, unit covariance Gaussian).

    Due to the conjugacy of the above prior with the Gaussian likelihood in the

    linear regression model, we can compute the posterior distribution over the

    model parameters in closed form:

        A    = (b_V^{-1} + X^T X)^{-1}

        mu_b  = A b_V^{-1} b_mean + A X^T y

        cov_b = sigma^2 A

        b | X, y ~ N(mu_b, cov_b)

    which allows us a closed form for the posterior predictive distribution as

    well:

        y* | X*, X, Y ~ N(X* mu_b, X* cov_b X*^T + I)

    """

    def __init__(self, b_mean=0, b_sigma=1, b_V=None, fit_intercept=True):

        """

        Bayesian linear regression model with known error variance and

        conjugate Gaussian prior on b

            b | b_mean, sigma^2, b_V ~ N(b_mean, sigma^2 * b_V)

        Ridge regression is a special case of this model where b_mean = 0,

        sigma = 1 and b_V = I (ie., the prior on b is a zero-mean, unit

        covariance Gaussian).

        Parameters

        ----------

        b_mean : np.array of shape (M,) or float (default: 0)

            The mean of the Gaussian prior on b. If a float, assume b_mean is

            np.ones(M) * b_mean.

        b_sigma : float (default: 1)

            A scaling term for covariance of the Gaussian prior on b

        b_V : np.array of shape (N,N) or np.array of shape (N,) or None

            A symmetric positive definite matrix that when multiplied

            element-wise by b_sigma^2 gives the covariance matrix for the

            Gaussian prior on b. If a list, assume b_V=diag(b_V). If None,

            assume b_V is the identity matrix.

        fit_intercept : bool (default: True)

            Whether to fit an intercept term in addition to the coefficients in

            b. If True, the estimates for b will have M+1 dimensions, where

            the first dimension corresponds to the intercept

        """

        # this is a placeholder until we know the dimensions of X

        b_V = 1.0 if b_V is None else b_V

        if isinstance(b_V, list):

            b_V = np.array(b_V)

        if isinstance(b_V, np.ndarray):

            if b_V.ndim == 1:

                b_V = np.diag(b_V)

            elif b_V.ndim == 2:

                assert is_symmetric_positive_definite(

                    b_V

                ), "b_V must be symmetric positive definite"

        self.posterior = {}

        self.posterior_predictive = {}

        self.b_V = b_V

        self.b_mean = b_mean

        self.b_sigma = b_sigma

        self.fit_intercept = fit_intercept

    def fit(self, X, y):

        # convert X to a design matrix if we're fitting an intercept

        if self.fit_intercept:

            X = np.c_[np.ones(X.shape[0]), X]

        N, M = X.shape

        self.X, self.y = X, y

        if is_number(self.b_V):

            self.b_V *= np.eye(M)

        if is_number(self.b_mean):

            self.b_mean *= np.ones(M)

        b_V = self.b_V

        b_mean = self.b_mean

        b_sigma = self.b_sigma

        b_V_inv = np.linalg.inv(b_V)

        l = np.linalg.inv(b_V_inv + np.dot(X.T, X))

        r = np.dot(b_V_inv, b_mean) + np.dot(X.T, y)

        mu = np.dot(l, r)

        cov = l * b_sigma ** 2

        # posterior distribution over b conditioned on b_sigma

        self.posterior["b"] = {"dist": "Gaussian", "mu": mu, "cov": cov}

    def predict(self, X):

        # convert X to a design matrix if we're fitting an intercept

        if self.fit_intercept:

            X = np.c_[np.ones(X.shape[0]), X]

        I = np.eye(X.shape[0])

        mu = np.dot(X, self.posterior["b"]["mu"])

        cov = np.dot(X, self.posterior["b"]["cov"]).dot(X.T) + I

        # MAP estimate for y corresponds to the mean of the posterior

        # predictive distribution

        self.posterior_predictive = {"dist": "Gaussian", "mu": mu, "cov": cov}

        return mu
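
The docstring above notes that ridge regression is the special case b_mean = 0, sigma = 1, b_V = I. A small check (my own sketch, using the RidgeRegression class defined earlier in this post) that the posterior mean under the default prior coincides with the ridge solution for alpha = 1:

import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(80, 2)
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.randn(80)

bayes = BayesianLinearRegressionKnownVariance(b_mean=0, b_sigma=1, fit_intercept=True)
bayes.fit(X, y)

ridge = RidgeRegression(alpha=1, fit_intercept=True)
ridge.fit(X, y)

# posterior mean == ridge coefficients with alpha = 1
print(np.allclose(bayes.posterior["b"]["mu"], ridge.beta))  # True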
