Common Matrix Derivatives | Personal blog of 白红宇 (Bai Hongyu)

https://zhuanlan.zhihu.com/p/25063314

Or the condensed version: http://www.sohu.com/a/221429567_129720

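I don't think I ever learned matrix derivatives in school. Courses on matrices don't cover differentiation, and courses on differentiation don't mention matrices. Yet anyone working in machine learning is bound to run into matrix derivatives. Wikipedia gives a specific derivative formula for each combination of types of Y and X (scalar, vector, matrix), together with a pile of related identities, which makes looking anything up tedious.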

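In practical machine learning work, the case you will meet most often is the derivative of a real-valued function y with respect to a vector X. It is defined as follows (it is simply the derivative of y with respect to each element of X):

$\frac{\partial y}{\partial X} = \left[\frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \dots, \frac{\partial y}{\partial x_n}\right]^T$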

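The derivative of a real-valued function with respect to a matrix X is defined in the same element-wise way. $\frac{\partial y}{\partial X}$ is a matrix of the same shape as X, and its (i, j) element is $\frac{\partial y}{\partial X_{ij}}$.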
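To make the element-wise definition concrete, here is a minimal sketch in plain Python with no libraries. It estimates $\frac{\partial y}{\partial x}$ and $\frac{\partial y}{\partial X}$ by central differences, one element at a time. The helper names `grad_vector` and `grad_matrix` and the step size `h` are illustrative choices and do not come from the original post.

```python
def grad_vector(f, x, h=1e-6):
    """Estimate dy/dx for a real-valued f: element i is df/dx_i (central differences)."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def grad_matrix(f, X, h=1e-6):
    """Estimate dy/dX for a real-valued f: element (i, j) is df/dX_ij."""
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            Xp = [r[:] for r in X]
            Xm = [r[:] for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            row.append((f(Xp) - f(Xm)) / (2 * h))
        G.append(row)
    return G


# y = x1^2 + 3*x1*x2, whose exact gradient is [2*x1 + 3*x2, 3*x1]
g = grad_vector(lambda x: x[0] ** 2 + 3 * x[0] * x[1], [1.0, 2.0])

# y = sum of squared entries of X, whose exact gradient is 2X
G = grad_matrix(lambda X: sum(v * v for row in X for v in row), [[1.0, 2.0], [3.0, 4.0]])

print(g)  # approximately [8.0, 3.0]
print(G)  # approximately [[2.0, 4.0], [6.0, 8.0]]
```

The gradient has the same shape as the variable it is taken with respect to, which is exactly what the definitions say.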

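This matters because of the general recipe of (supervised) machine learning. Given an input X, you choose a model f as the decision function and use f(X) to produce a prediction Y'. To obtain the parameters θ of f (usually a vector), you define a loss function, usually real-valued, that measures how close the prediction Y' is to the true Y. Learning the model means finding the θ that minimizes the loss L(f(X), Y). This is an optimization problem, and in practice it is solved with gradient-based methods such as gradient descent, conjugate gradient, and quasi-Newton methods.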

In fact, mastering just the definition above is enough to handle a great many problems.
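Here is a sketch of how this gradient is used, assuming a linear model with squared-error loss $L(\theta) = (A\theta - y)^T(A\theta - y)$. The data A and y, the step size and the iteration count are made up for illustration. Plain gradient descent repeatedly steps against the analytic gradient $2A^T(A\theta - y)$. That gradient is the $\frac{d}{dx}(Ax+b)^T(Ax+b) = 2A^T(Ax+b)$ rule from the quadratic-products list, with b = -y.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def loss_grad(A, y, theta):
    """Gradient of L(theta) = (A theta - y)^T (A theta - y), namely 2 A^T (A theta - y)."""
    residual = [p - t for p, t in zip(matvec(A, theta), y)]
    return [2 * g for g in matvec(transpose(A), residual)]


def gradient_descent(A, y, theta0, lr=0.05, steps=500):
    theta = list(theta0)
    for _ in range(steps):
        grad = loss_grad(A, y, theta)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


A = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
y = [1.0, -2.0, 0.0]  # generated from theta = [1, -1], so the minimum loss is 0
theta = gradient_descent(A, y, [0.0, 0.0])
print(theta)  # close to [1.0, -1.0]
```

The fixed step 0.05 works here because the Hessian $2A^TA$ has largest eigenvalue about 10.6. On a quadratic loss, constant-step gradient descent diverges once the step exceeds 2 divided by that eigenvalue.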

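To make derivations easier, some derivative formulas commonly used in machine learning are listed below. Andrew Ng's approach based on the matrix trace works well. The trace of a matrix is also real-valued, and the trace of a real number (a 1×1 matrix) equals the number itself. In practice you can therefore rewrite the loss function as a trace before differentiating, which can shorten the derivation.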
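Here is a short worked example of the trace trick, using the least-squares loss as an assumed example. Because L is a scalar, it equals its own trace. The trace properties $\operatorname{tr}(M^T) = \operatorname{tr}(M)$ and $\operatorname{tr}(MN) = \operatorname{tr}(NM)$ then let the differential be written in the form $\operatorname{tr}(g^T dx)$, and the gradient g can be read off directly:

```latex
L(x) = (Ax-b)^T(Ax-b) = \operatorname{tr}\!\left[(Ax-b)^T(Ax-b)\right]

dL = \operatorname{tr}\!\left[(A\,dx)^T(Ax-b)\right] + \operatorname{tr}\!\left[(Ax-b)^T A\,dx\right]
   = 2\operatorname{tr}\!\left[(Ax-b)^T A\,dx\right]
   = \operatorname{tr}\!\left[\left(2A^T(Ax-b)\right)^T dx\right]

\Rightarrow\quad \frac{dL}{dx} = 2A^T(Ax-b)
```

This agrees with the rule $\frac{d}{dx}(Ax+b)^T(Ax+b) = 2A^T(Ax+b)$ in the quadratic-products list, with b replaced by -b.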

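These are only the most basic formulas. They can solve some problems, and their main purpose is to make matrix derivatives less intimidating. For more on matrices, see the Wikipedia link above and The Matrix Cookbook (thanks to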

Notation

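[x: scalar, y: vector] $\frac{d}{dx}(y)$ is a vector whose i-th element is $\frac{dy_i}{dx}$

[x: vector, y: scalar] $\frac{d}{dx}(y)$ is a vector whose i-th element is $\frac{dy}{dx_i}$

[x, y: vectors] $\frac{d}{dx}(y^T)$ is a matrix whose (i,j) element is $\frac{dy_j}{dx_i}$

[x: scalar, Y: matrix] $\frac{d}{dx}(Y)$ is a matrix whose (i,j) element is $\frac{dY_{ij}}{dx}$

[X: matrix, y: scalar] $\frac{d}{dX}(y)$ is a matrix whose (i,j) element is $\frac{dy}{dX_{ij}}$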

Note that the Hermitian transpose is not used because complex conjugates are not analytic.

In the expressions below, the matrices and vectors A, B, C, D, a, b, e do not depend on x or X.

Derivatives of Linear Products

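[x: scalar] $\frac{d}{dx}(AYB) = A\,\frac{d}{dx}(Y)\,B$

- [x: scalar] $\frac{d}{dx}(Ay) = A\,\frac{d}{dx}(y)$

$\frac{d}{dx}(x^TA) = A$

- $\frac{d}{dx}(x^T) = I$
- $\frac{d}{dx}(x^Ta) = \frac{d}{dx}(a^Tx) = a$

$\frac{d}{dX}(a^TXb) = ab^T$

- $\frac{d}{dX}(a^TXa) = \frac{d}{dX}(a^TX^Ta) = aa^T$

$\frac{d}{dX}(a^TX^Tb) = ba^T$

[x: scalar] $\frac{d}{dx}(YZ) = Y\,\frac{d}{dx}(Z) + \frac{d}{dx}(Y)\,Z$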
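These identities are easy to sanity-check numerically. The following plain-Python sketch uses no libraries, and the vectors a, b and the matrix X are arbitrary test values. It compares a central-difference estimate of $\frac{d}{dX}(a^TXb)$ against the closed form $ab^T$.

```python
def bilinear(a, X, b):
    """a^T X b for vectors a, b and a matrix X stored as a list of rows."""
    return sum(a[i] * X[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def numeric_grad(f, X, h=1e-6):
    """Central-difference estimate of df/dX; element (i, j) is df/dX_ij."""
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            Xp = [r[:] for r in X]
            Xm = [r[:] for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            row.append((f(Xp) - f(Xm)) / (2 * h))
        G.append(row)
    return G


a = [1.0, -2.0]
b = [0.5, 3.0, -1.0]
X = [[0.3, -1.2, 2.0], [1.5, 0.7, -0.4]]  # 2x3, so a^T X b is defined

G = numeric_grad(lambda M: bilinear(a, M, b), X)
closed_form = [[ai * bj for bj in b] for ai in a]  # a b^T
max_err = max(abs(G[i][j] - closed_form[i][j]) for i in range(2) for j in range(3))
print(max_err)  # tiny: a^T X b is linear in X, so only rounding error remains
```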

Derivatives of Quadratic Products

$\frac{d}{dx}\left[(Ax+b)^TC(Dx+e)\right] = A^TC(Dx+e) + D^TC^T(Ax+b)$

- $\frac{d}{dx}(x^TCx) = (C+C^T)x$
  - [C: symmetric] $\frac{d}{dx}(x^TCx) = 2Cx$
    - $\frac{d}{dx}(x^Tx) = 2x$
- $\frac{d}{dx}\left[(Ax+b)^T(Dx+e)\right] = A^T(Dx+e) + D^T(Ax+b)$
  - $\frac{d}{dx}\left[(Ax+b)^T(Ax+b)\right] = 2A^T(Ax+b)$
- [C: symmetric] $\frac{d}{dx}\left[(Ax+b)^TC(Ax+b)\right] = 2A^TC(Ax+b)$

$\frac{d}{dX}(a^TX^TXb) = X(ab^T + ba^T)$

- $\frac{d}{dX}(a^TX^TXa) = 2Xaa^T$

$\frac{d}{dX}(a^TX^TCXb) = C^TXab^T + CXba^T$

- $\frac{d}{dX}(a^TX^TCXa) = (C + C^T)Xaa^T$
- [C: symmetric] $\frac{d}{dX}(a^TX^TCXa) = 2CXaa^T$

$\frac{d}{dX}\left[(Xa+b)^TC(Xa+b)\right] = (C+C^T)(Xa+b)a^T$
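The matrix-quadratic rules can be checked the same way. This sketch is plain Python, and the test values are arbitrary. It uses a deliberately non-symmetric C and a non-square X, so that both terms of $\frac{d}{dX}(a^TX^TCXb) = C^TXab^T + CXba^T$ matter. It compares that closed form against a central-difference estimate.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def add(P, Q):
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def numeric_grad(f, X, h=1e-6):
    """Central-difference estimate of df/dX; element (i, j) is df/dX_ij."""
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            Xp = [r[:] for r in X]
            Xm = [r[:] for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            row.append((f(Xp) - f(Xm)) / (2 * h))
        G.append(row)
    return G


a = [[1.0], [2.0]]     # column vector, 2x1
b = [[-0.5], [1.5]]    # column vector, 2x1
C = [[2.0, 1.0, 0.0], [-1.0, 3.0, 0.5], [0.0, 2.0, 1.0]]  # 3x3, deliberately not symmetric
X = [[0.2, -1.0], [1.1, 0.4], [-0.3, 0.8]]                # 3x2


def f(M):
    """a^T M^T C M b as a plain float."""
    return matmul(matmul(transpose(a), matmul(transpose(M), matmul(C, M))), b)[0][0]


G = numeric_grad(f, X)
closed_form = add(matmul(matmul(transpose(C), X), matmul(a, transpose(b))),   # C^T X a b^T
                  matmul(matmul(C, X), matmul(b, transpose(a))))              # C X b a^T
max_err = max(abs(G[i][j] - closed_form[i][j]) for i in range(3) for j in range(2))
print(max_err)
```

Because f is quadratic in X, the central difference has no truncation error, and max_err reflects only floating-point rounding.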

Derivatives of Cubic Products

$\frac{d}{dx}(x^TAxx^T) = (A+A^T)xx^T + (x^TAx)I$
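Under the notation above, element (i, j) of $\frac{d}{dx}(y^T)$ is $\frac{dy_j}{dx_i}$. The cubic rule then follows from the product rule applied element by element. A short derivation:

```latex
y^T = (x^TAx)\,x^T \quad\Longrightarrow\quad y_j = s\,x_j, \qquad s = x^TAx

\frac{\partial y_j}{\partial x_i} = \frac{\partial s}{\partial x_i}\,x_j + s\,\delta_{ij},
\qquad \frac{\partial s}{\partial x} = (A+A^T)x

\Longrightarrow\quad \frac{d}{dx}(x^TAxx^T) = (A+A^T)x\,x^T + (x^TAx)\,I
```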

Derivatives of Inverses

[x: scalar] $\frac{d}{dx}(Y^{-1}) = -Y^{-1}\,\frac{d}{dx}(Y)\,Y^{-1}$
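The inverse rule follows from the product rule $\frac{d}{dx}(YZ)$ listed earlier. Differentiate the identity $YY^{-1} = I$ with respect to the scalar x, assuming Y(x) is invertible and differentiable:

```latex
Y\,Y^{-1} = I
\quad\Longrightarrow\quad
Y\,\frac{d(Y^{-1})}{dx} + \frac{dY}{dx}\,Y^{-1} = 0
\quad\Longrightarrow\quad
\frac{d(Y^{-1})}{dx} = -Y^{-1}\,\frac{dY}{dx}\,Y^{-1}
```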

Derivative of Trace

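Note: the matrix dimensions must make the argument of tr() square (n×n).

$\frac{d}{dX}\operatorname{tr}(X) = I$

$\frac{d}{dX}\operatorname{tr}(X^k) = k\,(X^{k-1})^T$

$\frac{d}{dX}\operatorname{tr}(AX^k) = \sum_{r=0}^{k-1}\left(X^rAX^{k-r-1}\right)^T$

$\frac{d}{dX}\operatorname{tr}(AX^{-1}B) = -\left(X^{-1}BAX^{-1}\right)^T$
- $\frac{d}{dX}\operatorname{tr}(AX^{-1}) = \frac{d}{dX}\operatorname{tr}(X^{-1}A) = -X^{-T}A^TX^{-T}$

$\frac{d}{dX}\operatorname{tr}(A^TXB^T) = \frac{d}{dX}\operatorname{tr}(BX^TA) = AB$
- $\frac{d}{dX}\operatorname{tr}(XA^T) = \frac{d}{dX}\operatorname{tr}(A^TX) = \frac{d}{dX}\operatorname{tr}(X^TA) = \frac{d}{dX}\operatorname{tr}(AX^T) = A$

$\frac{d}{dX}\operatorname{tr}(AXBX^T) = A^TXB^T + AXB$
- $\frac{d}{dX}\operatorname{tr}(XAX^T) = X(A+A^T)$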
- $\frac{d}{dX}\operatorname{tr}(X^TAX) = (A+A^T)X$
- $\frac{d}{dX}\operatorname{tr}(AX^TX) = X(A+A^T)$

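$\frac{d}{dX}\operatorname{tr}(AXBX) = A^TX^TB^T + B^TX^TA^T$

[C: symmetric] $\frac{d}{dX}\operatorname{tr}\left((X^TCX)^{-1}A\right) = \frac{d}{dX}\operatorname{tr}\left(A(X^TCX)^{-1}\right) = -\left(CX(X^TCX)^{-1}\right)(A+A^T)(X^TCX)^{-1}$

[B, C: symmetric] $\frac{d}{dX}\operatorname{tr}\left((X^TCX)^{-1}(X^TBX)\right) = \frac{d}{dX}\operatorname{tr}\left((X^TBX)(X^TCX)^{-1}\right) = -2\left(CX(X^TCX)^{-1}\right)X^TBX(X^TCX)^{-1} + 2BX(X^TCX)^{-1}$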
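The trace rules can also be checked numerically. This plain-Python sketch uses arbitrarily chosen test matrices, and the helper names are illustrative. It estimates $\frac{d}{dX}\operatorname{tr}(AXBX^T)$ by central differences for a non-square X and compares the result with $A^TXB^T + AXB$. Using a non-square X also confirms that the gradient has the same shape as X.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def trace(P):
    return sum(P[i][i] for i in range(len(P)))


def numeric_grad(f, X, h=1e-6):
    """Central-difference estimate of df/dX; element (i, j) is df/dX_ij."""
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            Xp = [r[:] for r in X]
            Xm = [r[:] for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            row.append((f(Xp) - f(Xm)) / (2 * h))
        G.append(row)
    return G


A = [[1.0, 2.0], [-0.5, 0.3]]                             # 2x2, not symmetric
B = [[0.4, 0.0, 1.0], [2.0, -1.0, 0.5], [0.1, 0.7, 1.2]]  # 3x3, not symmetric
X = [[0.5, -1.0, 0.2], [1.3, 0.6, -0.8]]                  # 2x3

G = numeric_grad(lambda M: trace(matmul(matmul(matmul(A, M), B), transpose(M))), X)
closed_form = [[p + q for p, q in zip(r1, r2)]
               for r1, r2 in zip(matmul(matmul(transpose(A), X), transpose(B)),  # A^T X B^T
                                 matmul(matmul(A, X), B))]                        # A X B
max_err = max(abs(G[i][j] - closed_form[i][j]) for i in range(2) for j in range(3))
print(max_err)
```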

Derivative of Determinant

Note: the matrix dimensions must make the argument of det() square (n×n).

$\frac{d}{dX}\det(X) = \frac{d}{dX}\det(X^T) = \det(X)\,X^{-T}$

- $\frac{d}{dX}\det(AXB) = \det(AXB)\,X^{-T}$
- $\frac{d}{dX}\ln\det(AXB) = X^{-T}$

$\frac{d}{dX}\det(X^k) = k\det(X^k)\,X^{-T}$
- $\frac{d}{dX}\ln\det(X^k) = kX^{-T}$

[Real] $\frac{d}{dX}\det(X^TCX) = \det(X^TCX)\left(CX(X^TCX)^{-1} + C^TX(X^TC^TX)^{-1}\right)$

- [C: real, symmetric] $\frac{d}{dX}\det(X^TCX) = 2\det(X^TCX)\,CX(X^TCX)^{-1}$

[C: real, symmetric] $\frac{d}{dX}\ln\det(X^TCX) = 2CX(X^TCX)^{-1}$
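Here is a numerical check of $\frac{d}{dX}\det(X) = \det(X)\,X^{-T}$ and $\frac{d}{dX}\ln\det(X) = X^{-T}$ (the k = 1 case above) for an arbitrarily chosen non-symmetric 3×3 X with positive determinant. This plain-Python sketch avoids computing an inverse. If $G = \det(X)X^{-T}$, then $G^TX = \det(X)\,I$, and likewise $G^TX = I$ for the log-determinant gradient.

```python
import math


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


def numeric_grad(f, X, h=1e-6):
    """Central-difference estimate of df/dX; element (i, j) is df/dX_ij."""
    G = []
    for i in range(len(X)):
        row = []
        for j in range(len(X[0])):
            Xp = [r[:] for r in X]
            Xm = [r[:] for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            row.append((f(Xp) - f(Xm)) / (2 * h))
        G.append(row)
    return G


X = [[2.0, 1.0, 0.0], [0.5, 3.0, 1.0], [0.0, -1.0, 4.0]]  # det(X) = 24 > 0
d = det3(X)

G_det = numeric_grad(det3, X)
G_log = numeric_grad(lambda M: math.log(det3(M)), X)

# If G_det = det(X) X^{-T}, then G_det^T X = det(X) I; likewise G_log^T X = I.
P_det = matmul(transpose(G_det), X)
P_log = matmul(transpose(G_log), X)
err_det = max(abs(P_det[i][j] - (d if i == j else 0.0)) for i in range(3) for j in range(3))
err_log = max(abs(P_log[i][j] - (1.0 if i == j else 0.0)) for i in range(3) for j in range(3))
print(d, err_det, err_log)
```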

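If y is a function of x, then $\frac{dy^T}{dx}$ is the Jacobian matrix of y with respect to x. With the layout above, its (i, j) element is $\frac{dy_j}{dx_i}$. This is the transpose of the Jacobian as it is often written elsewhere, and transposing does not change the determinant.

Its determinant, $\det\left(\frac{dy^T}{dx}\right)$, is the Jacobian of y with respect to x, and its absolute value is the ratio of the hyper-volumes dy and dx. The Jacobian occurs when changing variables in an integration: $\int f(y)\,dy = \int f(y(x))\left|\det\left(\frac{dy^T}{dx}\right)\right|dx$.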
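A standard worked example is polar coordinates, chosen here for illustration. Take x = (r, θ) and y = (r cos θ, r sin θ). With the layout above, element (i, j) of $\frac{dy^T}{dx}$ is $\frac{\partial y_j}{\partial x_i}$, so:

```latex
\frac{dy^T}{dx} =
\begin{bmatrix}
\partial y_1/\partial r & \partial y_2/\partial r \\
\partial y_1/\partial\theta & \partial y_2/\partial\theta
\end{bmatrix}
=
\begin{bmatrix}
\cos\theta & \sin\theta \\
-r\sin\theta & r\cos\theta
\end{bmatrix},
\qquad
\det\frac{dy^T}{dx} = r\cos^2\theta + r\sin^2\theta = r

\int f(y)\,dy = \int\!\!\int f(r\cos\theta,\ r\sin\theta)\; r\,dr\,d\theta
```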

Hessian matrix

If f is a scalar function of x, then the symmetric matrix $\frac{d^2f}{dx^2} = \frac{d}{dx^T}\left(\frac{df}{dx}\right)$ is the Hessian matrix of f(x). A value of x for which $\frac{df}{dx} = 0$ is a local minimum, a local maximum, or a saddle point if the Hessian there is positive definite, negative definite, or indefinite, respectively. If the Hessian is only semidefinite, this test does not decide.

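$\frac{d^2}{dx^2}(a^Tx) = 0$

$\frac{d^2}{dx^2}\left[(Ax+b)^TC(Dx+e)\right] = A^TCD + D^TC^TA$

- $\frac{d^2}{dx^2}(x^TCx) = C + C^T$
  - $\frac{d^2}{dx^2}(x^Tx) = 2I$
- $\frac{d^2}{dx^2}\left[(Ax+b)^T(Dx+e)\right] = A^TD + D^TA$
  - $\frac{d^2}{dx^2}\left[(Ax+b)^T(Ax+b)\right] = 2A^TA$
- [C: symmetric] $\frac{d^2}{dx^2}\left[(Ax+b)^TC(Ax+b)\right] = 2A^TCA$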
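As an illustration of how the Hessian formulas are used, here is a minimal plain-Python sketch of one Newton step $x \leftarrow x - H^{-1}\nabla f$ on $f(x) = (Ax+b)^T(Ax+b)$. The data A and b and the starting point are made up. The step uses the gradient $2A^T(Ax+b)$ and the Hessian $2A^TA$ from the lists above. Because f is quadratic, a single step lands exactly on the stationary point. The 2×2 Hessian has a positive leading entry and a positive determinant, so it is positive definite, and that point is a minimum.

```python
def newton_step(A, b, x):
    """One Newton step for f(x) = (Ax+b)^T (Ax+b) with x in R^2."""
    n_rows = len(A)
    r = [sum(A[k][i] * x[i] for i in range(2)) + b[k] for k in range(n_rows)]  # Ax + b
    g = [2 * sum(A[k][i] * r[k] for k in range(n_rows)) for i in range(2)]      # 2 A^T (Ax + b)
    H = [[2 * sum(A[k][i] * A[k][j] for k in range(n_rows)) for j in range(2)]
         for i in range(2)]                                                     # 2 A^T A
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    # Solve H d = g for the 2x2 case with Cramer's rule, then step to x - d.
    d0 = (g[0] * H[1][1] - H[0][1] * g[1]) / det
    d1 = (H[0][0] * g[1] - H[1][0] * g[0]) / det
    return [x[0] - d0, x[1] - d1], H


A = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
b = [-1.0, 2.0, 0.0]  # Ax + b = 0 at x = [1, -1]
x1, H = newton_step(A, b, [5.0, 7.0])
is_pos_def = H[0][0] > 0 and H[0][0] * H[1][1] - H[0][1] * H[1][0] > 0
print(x1)  # [1.0, -1.0]
print(H)   # [[4.0, 2.0], [2.0, 10.0]]
```

Cramer's rule keeps the sketch dependency-free. For larger problems, a linear solver would replace it.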

http://www.psi.toronto.edu/matrix/calculus.html

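Reposted from: https://blog.csdn.net/xiaojiajia007/article/details/52748111. If this infringes your copyright, please leave a comment with the address of the original article and we will remove this post. We apologize for any inconvenience.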

