Introduction to Data Mining and Machine Learning Algorithms - 1. Optimization

Author: sarai_c7eb | Published 2019-07-29 14:49

    1.1 Algorithms

    An algorithm is a repeatable, step-by-step procedure produced for computation;
    it may be a concise description, a formula, or a formula combined with a description.
    Typical examples include finding the roots of a polynomial, testing whether a number is prime, and generating random numbers; all of these are algorithms.

    An algorithm is an iterative, step-by-step procedure for computation.

    1.1.1 The essence of an algorithm

    The essence of an algorithm can be written as one iterative equation or a combination of several iterative equations.

    In essence, an algorithm can be written as an iterative equation or a set of iterative equations;

    For example, to find the square root of a, the iterative formula is:
    x_{k+1}=\frac{1}{2}\left(x_k+\frac{a}{x_k}\right) \qquad k=0,1,2,\dots
    where k is the iteration counter, usually starting from 0.
    With a=100, iterating from x_0=1, the R code is as follows:

    cnt <- 8    # number of iterations to run
    a <- 100    # we want the square root of a
    i <- 0      # iteration counter
    x <- 1      # initial value x_0
    while(i<cnt) {
        x <- (1/2)*(x+a/x)   # x_{k+1} = (x_k + a/x_k) / 2
        i <- i+1
        cat("the iteration counter is: ",i," and the result is: ",x,"\n")
    }
    
    Iteration results

    By the 6th iteration the value is already close to the true solution; the 7th and 8th iterations in fact give the exact result (10).

    Taylor's formula
    If f(x) is n-times differentiable at x_0, then f(x) can be approximated by an n-th degree polynomial in (x-x_0):
    f(x)=\frac{f(x_0)}{0!}+\frac{f'(x_0)}{1!}(x-x_0)+\frac{f''(x_0)}{2!}(x-x_0)^2+...+\frac{f^{(n)}(x_0)}{n!}(x-x_0)^n+R_n(x)
    Newton's iteration method
    Take the first two terms of Taylor's formula and set them to zero: f(x)\approx\frac{f(x_0)}{0!}+\frac{f'(x_0)}{1!}(x-x_0)=0
    Solving gives: x=x_0-\frac{f(x_0)}{f'(x_0)}
    Iterative formula: x_{k+1}=x_k-\frac{f(x_k)}{f'(x_k)}
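
    The square-root iteration above is exactly Newton's method applied to f(x)=x^2-a. As an illustration (not part of the original text), here is a minimal R sketch of the generic Newton iteration; the names newton, f and fprime are chosen purely for this example:

    # Newton's iteration: x_{k+1} = x_k - f(x_k)/f'(x_k)
    newton <- function(f, fprime, x0, n = 8) {
        x <- x0
        for (k in 1:n) {
            x <- x - f(x) / fprime(x)
            cat("iteration", k, "x =", x, "\n")
        }
        x
    }

    # Applied to f(x) = x^2 - 100, this reproduces the square-root iteration above
    f      <- function(x) x^2 - 100
    fprime <- function(x) 2 * x
    newton(f, fprime, x0 = 1)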

    1.1.2 Issues with algorithms

    • The final solution depends on the initial value: in the example above, iterating from x_0=-1 converges to the negative root -10 instead (a short sketch follows this list).
      That is, the solution depends on the starting value of the iteration.
      Remedy: generate the initial value randomly.
    • The algorithm discussed here requires f(x) to be differentiable at x_0, with a nonzero derivative there.
    • Other issues:
      • setting of parameters
      • the slow rate of convergence
      • condition numbers
      • iteration structures
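
    A quick way to see the dependence on the initial value (an illustrative sketch, not from the original) is to rerun the square-root iteration from x_0 = -1:

    # Same iteration as before, but starting from a negative initial value
    a <- 100
    x <- -1
    for (i in 1:8) {
        x <- (1/2) * (x + a/x)
    }
    x   # converges to -10, the negative square root of 100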

    1.1.3 Types of algorithms

    An algorithm performs one specific computational task; no single algorithm can accomplish every task.
    Common types include:

    • root-finding algorithms
    • sorting algorithms

    1.2 Optimization

    1.2.1 A simple example

    Given a cylindrical can with a fixed volume, how should it be designed so that its surface area is minimal?

    Using the surface-area and volume formulas, one finds that the optimum is reached when h = 2r (a short derivation follows below).
    If a spherical shape were allowed instead, an even smaller surface area could be achieved.
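
    A brief worked derivation (added here for clarity; the original only states the result): with radius r and height h, eliminate h using the fixed volume V and minimize the surface area S:

    S=2\pi r^2+2\pi rh,\qquad V=\pi r^2h \Rightarrow h=\frac{V}{\pi r^2}
    S(r)=2\pi r^2+\frac{2V}{r}
    S'(r)=4\pi r-\frac{2V}{r^2}=0 \Rightarrow V=2\pi r^3=\pi r^2h \Rightarrow h=2r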

    1.2.2 General formulation of optimization

    maximize/minimize f(x), x=(x_1,x_2,...,x_D)^T\in R^D
    subject to:
    \phi_j(x)=0 (j=1,2,...,M)
    \psi_k(x)\leq 0 (k=1,...,N)

    From calculus, the first and second derivatives can be used to distinguish a maximum from a minimum:

                First derivative    Second derivative
    Maximum     f'(x)=0             f''(x)<0
    Minimum     f'(x)=0             f''(x)>0

    1.2.3 Feasible solution

    A point x that satisfies all the constraints is called a feasible point and thus is a feasible solution to the problem.
    The set of all feasible points is called the feasible region.

    Any point within the range that satisfies all the constraints is a feasible solution.

    1.2.4 Optimality criteria

    A point x_* is called a strong local maximum of the nonlinearly constrained optimization problem if f(x) is defined in a \delta-neighborhood N(x_*,\delta) and satisfies f(x_*)>f(u) for all u\in N(x_*,\delta), where \delta>0 and u\neq x_*.
    If f(x_*)\geq f(u), x_* is a weak local maximum;
    if f(x_*)<f(u), x_* is a strong local minimum;
    if f(x_*)\leq f(u), x_* is a weak local minimum.

    1.3 Unconstrained optimization

    1.3.1 Univariate functions

    The table in Section 1.2.2 can be applied directly:
    combine the first-derivative condition f'(x)=0 with the sign of the second derivative to classify each stationary point.
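
    A small R sketch (added as an illustration, not from the original) applies this test to f(x)=x^2-4x+5: f'(x)=2x-4=0 gives x=2, and f''(x)=2>0, so x=2 is a minimum. R's built-in optimize() is used only as a numerical cross-check:

    f <- function(x) x^2 - 4*x + 5
    optimize(f, interval = c(-10, 10))   # minimum near x = 2, with f(2) = 1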

    1.3.2 Multivariate functions

    The first derivative is replaced by the gradient vector G;
    the second derivative is replaced by the Hessian matrix H.

    G=\nabla f=(\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},...,\frac{\partial f}{\partial x_n})^T
    H= \left[ \begin{matrix} \frac{\partial^2f}{\partial x_1^2} &\frac{\partial^2f}{\partial x_1\partial x_2} & \cdots & \frac{\partial^2f}{\partial x_1\partial x_n} \\ \frac{\partial^2f}{\partial x_2\partial x_1} &\frac{\partial^2f}{\partial x_2^2} & \cdots & \frac{\partial^2f}{\partial x_2\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2f}{\partial x_n\partial x_1} & \frac{\partial^2f}{\partial x_n\partial x_2} & \cdots & \frac{\partial^2f}{\partial x_n^2} \\ \end{matrix} \right]
    When G=0 and the Hessian H is positive definite, the point is a local minimum.
    When G=0 and H is negative definite, the point is a local maximum.
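
    As an illustration (not in the original), a minimal R check for f(x,y)=x^2+2y^2, whose gradient vanishes at (0,0); the gradient and Hessian are written out analytically here:

    # f(x, y) = x^2 + 2*y^2; gradient G = (2x, 4y), Hessian H = diag(2, 4)
    G <- function(x, y) c(2*x, 4*y)
    H <- matrix(c(2, 0,
                  0, 4), nrow = 2, byrow = TRUE)
    G(0, 0)          # gradient is zero at (0, 0)
    eigen(H)$values  # all eigenvalues positive => H positive definite => (0, 0) is a minimum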

    1.4 Nonlinearly constrained optimization

    Idea: transform the constrained problem into an unconstrained one.

    1.4.1 Penalty method

    minimize f(x),x=(x_1,...,x_n)^T\in R^n
    subject to:
    \phi_i(x)=0,(i=1,...,M)
    \psi_j(x)\leq0,(j=1,...,N)

    Define a penalty function to turn the constrained problem into an unconstrained one:
    \Pi(x,u_i,v_j)=f(x)+\sum_{i=1}^{M}u_i\phi^2_i(x)+\sum_{j=1}^{N}v_j\max\left\{0,\psi_j(x)\right\}^2
    where u_i\geq1 and v_j\geq0 are penalty constants.
    Then find the optimum using the methods of Section 1.3 (a small sketch follows below).
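
    A minimal sketch of the penalty approach in R (illustrative, not from the original): minimize f(x,y)=x^2+y^2 subject to x+y-1=0, using a single fixed penalty constant u and R's optim() for the unconstrained subproblem; the exact solution is x=y=0.5.

    # Penalty function: f(x) + u * phi(x)^2 for the equality constraint phi(x) = x1 + x2 - 1
    f       <- function(x) x[1]^2 + x[2]^2
    phi     <- function(x) x[1] + x[2] - 1
    u       <- 1000                          # large penalty constant
    penalty <- function(x) f(x) + u * phi(x)^2
    optim(c(0, 0), penalty)$par              # approximately (0.5, 0.5)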

    1.4.2 Lagrange multipliers

    minimize f(x),x=(x_1,...,x_n)^T\in R^n
    subject to:
    h(x)=0
    Combining f(x) and h(x) gives the Lagrangian:
    \Pi=f(x)+\lambda h(x)
    where \lambda is the Lagrange multiplier, an unknown scalar to be determined.

    If there are M equality constraints h_j(x)=0\ (j=1,...,M), then M Lagrange multipliers are needed:
    \Pi(x,\lambda_j)=f(x)+\sum_{j=1}^{M}\lambda_jh_j(x)
    The stationarity conditions are then:
    \frac{\partial\Pi}{\partial x_i}=\frac{\partial{f}}{\partial{x_i}}+\sum_{j=1}^M\lambda_j\frac{\partial{h_j}}{\partial{x_i}}=0 \quad (i=1,...,n)
    \frac{\partial\Pi}{\partial \lambda_j}=h_j=0 \quad (j=1,...,M)
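
    A short worked example (added here for illustration): minimize f(x,y)=x^2+y^2 subject to h(x,y)=x+y-1=0.

    \Pi=x^2+y^2+\lambda(x+y-1)
    \frac{\partial\Pi}{\partial x}=2x+\lambda=0,\quad\frac{\partial\Pi}{\partial y}=2y+\lambda=0,\quad\frac{\partial\Pi}{\partial\lambda}=x+y-1=0
    \Rightarrow x=y=\tfrac{1}{2},\quad\lambda=-1

    This agrees with the penalty-method sketch in Section 1.4.1.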

    1.4.3 Karush-Kuhn-Tucker conditions

    minimize f(x),x=(x_1,...,x_n)^T\in R^n
    subject to:
    \phi_i(x)=0,(i=1,...,M)
    \psi_j(x)\leq0,(j=1,...,N)

    If all the functions are continuously differentiable at a local minimum x_*, then there exist constants \lambda_0,\lambda_1,...,\lambda_N and u_1,...,u_M such that:
    \lambda_0\nabla f(x_*)+\sum_{i=1}^{M}u_i\nabla\phi_i(x_*)+\sum_{j=1}^{N}\lambda_j\nabla\psi_j(x_*)=0
    \psi_j(x_*)\leq0,\quad\lambda_j\psi_j(x_*)=0\quad(j=1,2,...,N)
    where \lambda_j\geq0\ (j=0,1,...,N) and \sum_{j=0}^N\lambda_j+\sum_{i=1}^{M}\left|u_i\right|\geq0.
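
    To make these conditions concrete (an example added here, not in the original): minimize f(x)=x^2 subject to \psi(x)=1-x\leq0, whose solution is x_*=1. Taking \lambda_0=1:

    \lambda_0\nabla f(x_*)+\lambda_1\nabla\psi(x_*)=2x_*-\lambda_1=0 \Rightarrow \lambda_1=2
    \psi(x_*)=1-x_*=0,\quad\lambda_1\psi(x_*)=0,\quad\lambda_1=2\geq0
    All the KKT conditions hold at x_*=1.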
