CVNN

Author: 咚咚董dyh | Published: 2022-03-16 19:49

Complex-Valued Neural Network

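Some concepts about complex numbers:

  • Conjugate: z = x + yj and z^* = x - yj are conjugates of each other. If one of them is treated as the complex variable, the other is its conjugate.
  • Complex function: a function whose argument is a complex number.
  • Real-valued complex function: a complex function whose result is real, f: \mathbb{C} \to \mathbb{R}, or equivalently f: \mathbb{R}^2 \to \mathbb{R}.
  • Complex-valued complex function: a complex function whose result is complex, f: \mathbb{C} \to \mathbb{C}, or equivalently f: \mathbb{R}^2 \to \mathbb{R}^2.
  • Wirtinger Calculus/Derivatives: differentiation rules that apply to both real-valued and complex-valued complex functions. Because they switch frequently between the real and complex forms of calculus, they are also called \mathbb{CR}-calculus.
  • Complex differentiable/derivable (\mathbb{C}-differentiable/\mathbb{C}-derivable): defined in terms of f: \mathbb{C} \to \mathbb{C}.
  • Real differentiable/derivable (\mathbb{R}-differentiable/\mathbb{R}-derivable): defined in terms of f: \mathbb{R}^2 \to \mathbb{R}^2 or f: \mathbb{R}^2 \to \mathbb{R}, treating the real and imaginary parts of a complex number separately. A complex function is usually real differentiable, but not necessarily complex differentiable.
  • Complex gradient: the gradient obtained from complex differentiation.
  • Real gradient: the gradient obtained from real differentiation.
  • Holomorphic function: a complex function that is complex differentiable on its domain. Holomorphic functions are analytic and vice versa; they are the same as the analytic functions of complex analysis.
  • CVNN: a neural network whose loss is a real-valued complex function and is therefore non-holomorphic.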

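Role and positioning of these concepts:

  • A holomorphic function can be differentiated just like a real function, and its chain rule is the same as the real chain rule. Wirtinger Calculus and the CVNN chain rule form a unified theory covering both holomorphic and non-holomorphic complex functions. They are more involved than the rules for holomorphic functions, but give the same results when applied to them.
  • Wirtinger Calculus handles the real and imaginary parts of a complex number separately. It applies to both real-valued and complex-valued complex functions and is a rule for complex differentiation.
  • The CVNN chain rule treats a complex number as a whole, and the differentiation it needs may rely on Wirtinger Calculus. It is a backpropagation rule for computing the gradient of a CVNN with a real-valued loss.
  • The complex derivative and the CVNN gradient are different: the latter is the derivative with respect to the conjugate.
  • In practical CVNN work, both differentiation and the chain rule matter. Holomorphic functions are differentiated directly, and their derivative with respect to the conjugate is 0. Non-holomorphic functions use Wirtinger derivatives.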

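Complex derivative

Consider a complex function f(z=x+yj) = F(x,y) = u(x,y) + v(x,y)j, where x=Re(z) and y=Im(z) are real, and u, v are real functions: the real and imaginary parts of f. In other words, u, v are to f what x, y are to z. The limit definition of the complex derivative is: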

f'(z) = \lim_{\Delta z \to 0, \Delta z \in \mathbb{C}} \frac{f(z+\Delta z) - f(z)}{\Delta z}
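The limit must not depend on the direction from which \Delta z approaches 0. The following is a minimal Python sketch of that idea, assuming the illustrative function f(z) = z^2, a sample point z0, and a small step h (none of which come from the original text). It evaluates the difference quotient along four directions.

```python
import cmath

def difference_quotient(f, z, direction, h=1e-6):
    """Approximate (f(z + dz) - f(z)) / dz with dz = h * direction."""
    dz = h * direction
    return (f(z + dz) - f(z)) / dz

def square(z):
    """Illustrative holomorphic function f(z) = z^2."""
    return z * z

z0 = 1.0 + 2.0j  # hypothetical sample point
# Approach z0 along four different unit directions in the complex plane.
directions = [cmath.exp(1j * k * cmath.pi / 4) for k in range(4)]
quotients = [difference_quotient(square, z0, d) for d in directions]

for d, q in zip(directions, quotients):
    print(f"direction {d:.3f}: quotient {q:.5f}")
```

For this function every quotient should land close to 2 * z0, regardless of direction, which is what complex differentiability demands.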

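Cauchy-Riemann equations

The core idea of the Cauchy-Riemann (CR) equations is that, in the limit definition of the derivative, approaching along the real axis (\Delta x \to 0) and approaching along the imaginary axis (\Delta y j \to 0) must give the same derivative, which is what complex differentiability requires.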

\begin{aligned} f'(z) &= \lim_{\Delta x \to 0, \Delta x \in \mathbb{R}} \frac{f(z+\Delta x) - f(z)}{\Delta x} = \frac{\partial F}{\partial x}(z) \\ f'(z) &= \lim_{\Delta y \to 0, \Delta y \in \mathbb{R}} \frac{f(z+\Delta y j) - f(z)}{\Delta y j} = -j * \frac{\partial F}{\partial y}(z) \\ \frac{\partial F}{\partial x}(z) &= -j * \frac{\partial F}{\partial y}(z) \\ \frac{\partial F}{\partial x} &= \frac{\partial u}{\partial x} + \frac{\partial v}{\partial x}j \\ -j * \frac{\partial F}{\partial y} &= -j*(\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y}j) = \frac{\partial v}{\partial y} - \frac{\partial u}{\partial y}j \end{aligned}

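Assume that u and v are differentiable at the point z = x + yj (they need not be continuously differentiable), so that their partial derivatives exist. This is a prerequisite for all the conclusions that follow. Then f (equivalently F) is complex differentiable if and only if the partial derivatives of u and v satisfy the CR equations below.

Real form: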

\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \\ \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}

Complex form:

j*\frac{\partial F}{\partial x} = \frac{\partial F}{\partial y}

Combining the complex form with the Wirtinger derivative \frac{\partial f}{\partial z^*} = 1/2 * (\frac{\partial F}{\partial x} + 1j * \frac{\partial F}{\partial y}) gives the form below, which says that f is independent of the variable z^* = x - yj (the conjugate of z):

\frac{\partial f}{\partial z^*} = 0
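The complex form of the CR equations can be checked numerically. The sketch below is only an illustration under stated assumptions: the helper names `partials` and `cr_residual`, the sample point, and the finite-difference step are all hypothetical. It estimates \frac{\partial F}{\partial x} and \frac{\partial F}{\partial y} by central differences and measures how far j*\frac{\partial F}{\partial x} is from \frac{\partial F}{\partial y}.

```python
def partials(f, x, y, h=1e-6):
    """Central-difference estimates of dF/dx and dF/dy, where F(x, y) = f(x + yj)."""
    dF_dx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)
    dF_dy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)
    return dF_dx, dF_dy

def cr_residual(f, x, y):
    """Return |j * dF/dx - dF/dy|, which is close to 0 when the CR equations hold."""
    dF_dx, dF_dy = partials(f, x, y)
    return abs(1j * dF_dx - dF_dy)

# f(z) = z^2 is holomorphic, so its residual should be near 0.
res_square = cr_residual(lambda z: z * z, 1.0, 2.0)
# f(z) = Re(z) depends on z*, so its residual should be far from 0.
res_real = cr_residual(lambda z: complex(z.real, 0.0), 1.0, 2.0)

print(res_square, res_real)
```

For f(z) = Re(z) the partials are 1 and 0, so the residual is |j| = 1, while for z^2 it is zero up to rounding error.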

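Holomorphic functions

If the complex function f(z=x+yj) = F(x,y) = u(x,y) + v(x,y)j is complex differentiable (satisfies the CR equations) everywhere on its domain (an open connected subset of the complex plane \mathbb{C}), then f, F is holomorphic.

  • Holomorphic: complex differentiable.
  • Nonholomorphic: not complex differentiable.

If a complex function f(z) depends on z^*, then f(z) is certainly not holomorphic. Take the (non-constant) real-valued complex function f(z) = \frac{z+z^*}{2} = x+0j. Here v=0, so \frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} = 0, while \frac{\partial u}{\partial x} = 1, so the CR equations are not satisfied.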
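As a further worked check (an illustrative sketch, not part of the original text), take f(z) = |z|^2 = zz^*, which also depends on z^*. Here u = x^2 + y^2 and v = 0, so:

\begin{aligned} \frac{\partial u}{\partial x} &= 2x, \quad \frac{\partial v}{\partial y} = 0 \\ \frac{\partial u}{\partial y} &= 2y, \quad -\frac{\partial v}{\partial x} = 0 \end{aligned}

The CR equations hold only at x = y = 0, which is a single point rather than an open set, so f is not holomorphic anywhere.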

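Several equivalent statements:

  • f'(z)=\frac{\partial f}{\partial z} exists.
  • f(z) is holomorphic (i.e., analytic).
  • f(z) satisfies the CR equations.
  • All derivatives of f(z) exist, and f(z) has a convergent power series.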

Wirtinger Calculus

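Any f(z) with z = x + yj (not necessarily holomorphic) can always be rewritten as G(z,z^*). Note that the two are different functions, but equivalent. The conversion is: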

\begin{aligned} x &= \frac {z + z^*}{2} \\ y &= \frac {z - z^*}{2j} \end{aligned}

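If one of z, z^* is treated as a constant, G(z,z^*) becomes a formally holomorphic function of the other, so the partial derivative \frac{\partial G}{\partial z} exists. It is only "formal" because z and z^* cannot really vary independently: when one is held constant, the other cannot be a variable. It is "holomorphic" because, with one of z, z^* held constant, G(z,z^*) formally becomes a complex-valued function of a single complex variable. Another way to see it is that \frac{\partial G}{\partial z} is "\mathbb{R}-differentiable", which amounts to switching from the r(x,y) coordinate system to the c(z,z^*) coordinate system. For example: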

f(z) = Re(z) = F(x,y) = x = \frac{z+z^*}{2} = G(z,z^*) \\ f(z) = |z|^2 = F(x,y) = x^2 + y^2 = zz^* = G(z,z^*)

Taking partial derivatives with respect to x and y using the chain rule gives \frac{\partial F}{\partial x} and \frac{\partial F}{\partial y}:

\begin{aligned} \frac{\partial F}{\partial x} &= \frac{\partial G}{\partial z}\frac{\partial z}{\partial x} + \frac{\partial G}{\partial z^*}\frac{\partial z^*}{\partial x} \\ &= \frac{\partial G}{\partial z} + \frac{\partial G}{\partial z^*} \\ \\ \frac{\partial F}{\partial y} &= \frac{\partial G}{\partial z}\frac{\partial z}{\partial y} + \frac{\partial G}{\partial z^*}\frac{\partial z^*}{\partial y} \\ &= 1j * (\frac{\partial G}{\partial z} - \frac{\partial G}{\partial z^*}) \end{aligned}

From the equations above we obtain the Wirtinger Calculus/Derivatives:

\begin{aligned} \frac{\partial G}{\partial z} &= 1/2 * (\frac{\partial F}{\partial x} - 1j * \frac{\partial F}{\partial y}) \\ \frac{\partial G}{\partial z^*} &= 1/2 * (\frac{\partial F}{\partial x} + 1j * \frac{\partial F}{\partial y}) \end{aligned}
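These two formulas translate almost directly into code. The sketch below is an illustration rather than a reference implementation. It assumes a hypothetical helper `wirtinger`, a sample point z0, and a finite-difference step h. It approximates \frac{\partial F}{\partial x} and \frac{\partial F}{\partial y} by central differences and applies it to the two examples Re(z) and |z|^2.

```python
def wirtinger(f, z, h=1e-6):
    """Estimate (dG/dz, dG/dz*) at z from central differences of F in x and y."""
    dF_dx = (f(z + h) - f(z - h)) / (2 * h)            # step along the real axis
    dF_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # step along the imaginary axis
    d_dz = 0.5 * (dF_dx - 1j * dF_dy)
    d_dzconj = 0.5 * (dF_dx + 1j * dF_dy)
    return d_dz, d_dzconj

z0 = 1.0 + 2.0j  # hypothetical sample point

# G(z, z*) = (z + z*) / 2 = Re(z): both derivatives should be close to 1/2.
re_dz, re_dzconj = wirtinger(lambda z: z.real, z0)

# G(z, z*) = z z* = |z|^2: dG/dz should be close to z*, dG/dz* close to z.
abs2_dz, abs2_dzconj = wirtinger(lambda z: abs(z) ** 2, z0)

print(re_dz, re_dzconj)
print(abs2_dz, abs2_dzconj)
```

The numerical values agree with what one gets by differentiating G(z,z^*) while treating the other variable as a constant.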

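The Wirtinger derivatives can also be derived through the chain rule \frac{\partial G}{\partial x}\frac{\partial x}{\partial z}... Note which of the derivatives involved exist (existence means differentiability):

  • \frac{\partial f(z)}{\partial z} does not necessarily exist; it exists only when f(z) is holomorphic.
  • \frac{\partial F(x,y)}{\partial x}, \frac{\partial F(x,y)}{\partial y} exist.
  • \frac{\partial G(z,z^*)}{\partial z} exists.
  • \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y} exist.
  • \frac{\partial x}{\partial z}, \frac{\partial x}{\partial z^*} exist, and likewise for y.
  • \frac{\partial G}{\partial x} exists, because \frac{\partial G}{\partial z} and \frac{\partial z}{\partial x} exist.

The Wirtinger derivatives give the relations below, which show that z and z^* behave as independent variables: when differentiating with respect to one, the other can be treated as a constant.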

\begin{aligned} \frac{\partial z^*}{\partial z} &= 1/2 * (1 - 1j * (-1j)) = 0 \\ \frac{\partial z}{\partial z^*} &= 1/2 * (1 + 1j * 1j) = 0 \end{aligned}

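\frac{\partial G}{\partial z^*} has the form of the gradient a CVNN needs. The factor 1/2 can be dropped and absorbed into the learning rate, and the result can be used to update complex weight parameters.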
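To illustrate a weight update driven by the derivative with respect to the conjugate, here is a minimal sketch on a toy problem. The loss L(w, w^*) = |w - target|^2, the target value, the learning rate, and the iteration count are all assumptions made for illustration. For this loss the Wirtinger derivative is \frac{\partial L}{\partial w^*} = w - target.

```python
def loss(w, target):
    """Real-valued toy loss L(w, w*) = |w - target|^2."""
    return abs(w - target) ** 2

def grad_conj(w, target):
    """Analytic Wirtinger derivative dL/dw* = w - target for the toy loss."""
    return w - target

target = 3.0 - 1.0j    # hypothetical complex target
w = 0.0 + 0.0j         # initial complex weight
learning_rate = 0.1    # hypothetical; constant factors such as 1/2 are absorbed here

for _ in range(200):
    # Step against the derivative with respect to the conjugate weight.
    w = w - learning_rate * grad_conj(w, target)

print(w, loss(w, target))
```

Each step shrinks w - target by the factor 1 - learning_rate, so the complex weight moves straight toward the target.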

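When f, F is holomorphic, the CR equations reduce the Wirtinger derivatives to the following (consistent with the definition of the complex derivative above):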

\begin{aligned} \frac{\partial f}{\partial z} &= \frac{\partial G}{\partial z} = \frac{\partial F}{\partial x} = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial x}j \\ \frac{\partial f}{\partial z^*} &= \frac{\partial G}{\partial z^*} = 0 \end{aligned}

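How complex derivatives interact with conjugation:

  • Derivative when f is holomorphic: \frac{\partial f^*}{\partial z^*} = (\frac{\partial f}{\partial z})^*
  • Wirtinger derivative when f is real-valued (f = f^*): \frac{\partial f}{\partial z^*} = (\frac{\partial f}{\partial z})^*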
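These conjugation identities can be spot-checked numerically. The sketch below uses assumptions that are not in the original: a hypothetical finite-difference helper `wirtinger`, an illustrative complex-valued function f(z) = z^2 + 3z^*, and the real-valued function |z|^2. It suggests that \frac{\partial f^*}{\partial z^*} = (\frac{\partial f}{\partial z})^* holds even for this non-holomorphic f, while \frac{\partial f}{\partial z^*} = (\frac{\partial f}{\partial z})^* relies on f being real-valued.

```python
def wirtinger(f, z, h=1e-6):
    """Estimate (df/dz, df/dz*) at z from central differences in x and y."""
    dF_dx = (f(z + h) - f(z - h)) / (2 * h)
    dF_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (dF_dx - 1j * dF_dy), 0.5 * (dF_dx + 1j * dF_dy)

def f(z):
    """Illustrative complex-valued, non-holomorphic function."""
    return z * z + 3 * z.conjugate()

def f_conj(z):
    """The conjugate function f*."""
    return f(z).conjugate()

def r(z):
    """Illustrative real-valued function |z|^2."""
    return abs(z) ** 2

z0 = 0.5 - 1.5j  # hypothetical sample point
df_dz, df_dzconj = wirtinger(f, z0)
_, dfconj_dzconj = wirtinger(f_conj, z0)
dr_dz, dr_dzconj = wirtinger(r, z0)

print(dfconj_dzconj, df_dz.conjugate())  # these two should agree
print(dr_dzconj, dr_dz.conjugate())      # these two should agree (r is real-valued)
print(df_dzconj, df_dz.conjugate())      # these two differ (f is not real-valued)
```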

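Chain rule

For holomorphic functions, the chain rule is the same as the real chain rule. For a CVNN whose loss l(z)=L(z,z^*) is real-valued, \frac{\partial l}{\partial z} does not exist, so the chain rule has to be carried out with Wirtinger derivatives (\frac{\partial L}{\partial z} exists formally).

Given a CVNN with real-valued loss L(s,s^*), let s = f(z) = G(z,z^*) be the forward output, z the forward input, and \frac{\partial L}{\partial s^*} the backward input grad_output. The backward output \frac{\partial L}{\partial z^*} (the gradient) follows from the chain rule: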

\begin{aligned} \frac{\partial L}{\partial z^*} &= \frac{\partial L}{\partial s} * \frac{\partial s}{\partial z^*} + \frac{\partial L}{\partial s^*} * \frac{\partial s^*}{\partial z^*} \\ &= (\frac{\partial L}{\partial s^*})^* * \frac{\partial s}{\partial z^*} + \frac{\partial L}{\partial s^*} * (\frac{\partial s}{\partial z})^* \\ &= \boxed{ (grad\_output)^* * \frac{\partial s}{\partial z^*} + grad\_output * {(\frac{\partial s}{\partial z})}^* } \\ \end{aligned}

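  • In a CVNN the objective/loss value is real, and \frac{\partial L}{\partial z^*} is the "gradient" of each layer, used to update the weights.

  • This convention matches TensorFlow's complex differentiation and differs from JAX (whose gradient is \frac{\partial L}{\partial z}).

  • By the CR equations, \frac{\partial s}{\partial z^*} = 0 when s=f(z) is holomorphic.

  • f: ℂ → ℝ, f: ℝ → ℂ and f: ℝ → ℝ are special cases of this rule and still follow it.

  • For f: ℂ → ℝ: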

\frac{\partial L}{\partial z^*} = 2 * grad\_output * \frac{\partial s}{\partial z^*}

  • For f: ℝ → ℂ:

\frac{\partial L}{\partial z^*} = 2 * Re(grad\_output^* * \frac{\partial s}{\partial z^*})
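The boxed chain rule for the general case can be exercised on a small example. The following sketch assumes a hypothetical one-step "network" with a non-holomorphic layer s = z^2 + z^* and loss L = |s|^2, for which grad_output = \frac{\partial L}{\partial s^*} = s. The function names, the sample input, and the finite-difference check are illustrative choices, not part of the original text.

```python
def wirtinger(f, z, h=1e-6):
    """Estimate (df/dz, df/dz*) at z from central differences in x and y."""
    dF_dx = (f(z + h) - f(z - h)) / (2 * h)
    dF_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    return 0.5 * (dF_dx - 1j * dF_dy), 0.5 * (dF_dx + 1j * dF_dy)

def forward(z):
    """Hypothetical non-holomorphic layer s = z^2 + z*."""
    return z * z + z.conjugate()

def loss_of_s(s):
    """Real-valued loss L(s, s*) = s s* = |s|^2."""
    return abs(s) ** 2

def backward(z, grad_output):
    """Chain rule: conj(grad_output) * ds/dz* + grad_output * conj(ds/dz)."""
    ds_dz = 2 * z           # analytic ds/dz for s = z^2 + z*
    ds_dzconj = 1.0 + 0.0j  # analytic ds/dz* for s = z^2 + z*
    return grad_output.conjugate() * ds_dzconj + grad_output * ds_dz.conjugate()

z0 = 0.7 + 0.3j              # hypothetical forward input
s0 = forward(z0)
grad_output = s0             # dL/ds* for L = s s*
analytic = backward(z0, grad_output)

# Compare with dL/dz* estimated directly on the composed function.
_, numeric = wirtinger(lambda z: loss_of_s(forward(z)), z0)

print(analytic, numeric)
```

Both terms of the rule are needed here because the layer is non-holomorphic. For a holomorphic layer the first term vanishes.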

Examples

Holomorphic function f(z) = cz, F(x,y) = cx + cyj, c \in \mathbb{R}:

  1. Direct differentiation: f'(z) = c
  2. Via Wirtinger derivatives: f'(z) = 0.5(\frac{\partial F}{\partial x} - \frac{\partial F}{\partial y}j) = 0.5[c - (cj)j] = c
  3. Via the CR equations: f'(z) = \frac{\partial F}{\partial x} = -\frac{\partial F}{\partial y}j = c

Holomorphic function f(z) = cz, F(x,y) = (ax-by) + (bx+ay)j, c=a+bj \in \mathbb{C}, a,b \in \mathbb{R}:

  1. Direct differentiation: f'(z) = c
  2. Via Wirtinger derivatives: f'(z) = 0.5(\frac{\partial F}{\partial x} - \frac{\partial F}{\partial y}j) = 0.5[(a+bj) -(-b+aj)j] = c
  3. Via the CR equations: f'(z) = \frac{\partial F}{\partial x} = -\frac{\partial F}{\partial y}j = a+bj = c

Holomorphic function f(z) = z^2, F(x,y) = x^2 - y^2 + 2xyj:

  1. Direct differentiation: f'(z) = 2z = 2x+2yj
  2. Via Wirtinger derivatives: f'(z) = 0.5(\frac{\partial F}{\partial x} - \frac{\partial F}{\partial y}j) = 0.5[(2x+2yj) - (-2y+2xj)j] = 2x+2yj
  3. Via the CR equations: f'(z) = \frac{\partial F}{\partial x} = -\frac{\partial F}{\partial y}j = 2x+2yj
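The three routes in these examples can be compared numerically. The sketch below is only a sanity check under stated assumptions: the helper `three_routes`, the constant c, the sample point z0, and the finite-difference step are hypothetical choices. It computes the Wirtinger expression and both sides of the CR route for f(z) = cz and f(z) = z^2.

```python
def three_routes(f, z, h=1e-6):
    """Return (0.5 * (F_x - j * F_y), F_x, -j * F_y) from central differences."""
    dF_dx = (f(z + h) - f(z - h)) / (2 * h)
    dF_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)
    via_wirtinger = 0.5 * (dF_dx - 1j * dF_dy)
    return via_wirtinger, dF_dx, -1j * dF_dy

c = 2.0 - 0.5j   # hypothetical complex constant
z0 = 1.0 + 2.0j  # hypothetical sample point

linear_routes = three_routes(lambda z: c * z, z0)  # direct derivative is c
square_routes = three_routes(lambda z: z * z, z0)  # direct derivative is 2 * z0

print(linear_routes)
print(square_routes)
```

For a holomorphic function all three values should coincide with the direct derivative, up to finite-difference error.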
