Linear Algebra Foundation


Gradients and Hessians

Recall that a matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $A^T = A$, that is, $A_{ij} = A_{ji}$ for all $i, j$. Also recall the gradient $\nabla f(x)$ of a function $f : \mathbb{R}^n \rightarrow \mathbb{R}$, which is the $n$-vector of partial derivatives

$$\nabla f(x) = \begin{bmatrix} \frac{\partial}{\partial x_1} f(x) \\ \vdots \\ \frac{\partial}{\partial x_n} f(x) \end{bmatrix} \quad \text{where} \quad x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.$$

The Hessian $\nabla^2 f(x)$ of a function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is the $n \times n$ symmetric matrix of second partial derivatives,

$$\nabla^2 f(x) = \begin{bmatrix} \frac{\partial^2}{\partial x_1^2} f(x) & \frac{\partial^2}{\partial x_1 \partial x_2} f(x) & \cdots & \frac{\partial^2}{\partial x_1 \partial x_n} f(x) \\ \frac{\partial^2}{\partial x_2 \partial x_1} f(x) & \frac{\partial^2}{\partial x_2^2} f(x) & \cdots & \frac{\partial^2}{\partial x_2 \partial x_n} f(x) \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2}{\partial x_n \partial x_1} f(x) & \frac{\partial^2}{\partial x_n \partial x_2} f(x) & \cdots & \frac{\partial^2}{\partial x_n^2} f(x) \end{bmatrix}$$
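
As a small concrete instance of these two definitions, consider the case $n = 2$ with $f(x) = x_1^2 + 3 x_1 x_2$. This function is an illustrative choice and is not part of the problem set. Differentiating once and then twice gives

$$\nabla f(x) = \begin{bmatrix} 2 x_1 + 3 x_2 \\ 3 x_1 \end{bmatrix}, \qquad \nabla^2 f(x) = \begin{bmatrix} 2 & 3 \\ 3 & 0 \end{bmatrix}.$$

The two off-diagonal entries agree because the mixed partial derivatives $\frac{\partial^2}{\partial x_1 \partial x_2} f(x)$ and $\frac{\partial^2}{\partial x_2 \partial x_1} f(x)$ are equal here, which is why the Hessian is symmetric.
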
  (a) Let $f(x) = \frac{1}{2} x^T A x + b^T x$, where $A$ is a symmetric matrix and $b \in \mathbb{R}^n$ is a vector. What is $\nabla f(x)$?

  (b) Let $f(x) = g(h(x))$, where $g : \mathbb{R} \rightarrow \mathbb{R}$ is differentiable and $h : \mathbb{R}^n \rightarrow \mathbb{R}$ is differentiable. What is $\nabla f(x)$?

  (c) Let $f(x) = \frac{1}{2} x^T A x + b^T x$, where $A$ is symmetric and $b \in \mathbb{R}^n$ is a vector. What is $\nabla^2 f(x)$?

  (d) Let $f(x) = g(a^T x)$, where $g : \mathbb{R} \rightarrow \mathbb{R}$ is continuously differentiable and $a \in \mathbb{R}^n$ is a vector. What are $\nabla f(x)$ and $\nabla^2 f(x)$? (Hint: your expression for $\nabla^2 f(x)$ may have as few as 11 symbols, including ' and parentheses.)

Solutions

Problem 1(a) Solution

We want to find the gradient $\nabla f(x)$ of the function $f(x) = \frac{1}{2} x^T A x + b^T x$, where $A$ is a symmetric matrix and $b \in \mathbb{R}^n$ is a vector.

  1. Differentiate $\frac{1}{2} x^T A x$:

    The gradient of the quadratic form $x^T A x$ with respect to $x$ is $Ax + A^T x$. Since $A$ is symmetric ($A = A^T$), this simplifies to $2Ax$. The coefficient $\frac{1}{2}$ in front of $x^T A x$ cancels the 2 from the derivative, resulting in:

    $$\frac{\partial}{\partial x} \left(\frac{1}{2} x^T A x\right) = Ax$$
  2. Differentiate $b^T x$:

    Writing the linear form as $b^T x = \sum_{i=1}^n b_i x_i$, the partial derivative with respect to $x_i$ is just $b_i$, so the gradient with respect to $x$ is $b$:

    $$\frac{\partial}{\partial x} (b^T x) = b$$
  3. Combine the results:

    The gradient $\nabla f(x)$ is the sum of the gradients of $\frac{1}{2} x^T A x$ and $b^T x$:

    $$\nabla f(x) = Ax + b$$

Thus, the gradient $\nabla f(x)$ of the function $f(x)$ is $Ax + b$.
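
The result can be sanity-checked numerically. The sketch below compares $Ax + b$ against a central-difference approximation of the gradient. The values of `A`, `b`, and `x` and the helper names are illustrative assumptions, not part of the problem, and only the Python standard library is used.

```python
# Finite-difference check that grad f(x) = A x + b
# for f(x) = 0.5 * x^T A x + b^T x with symmetric A.
# A, b, and x are arbitrary illustrative values, not from the problem set.

def matvec(A, x):
    """Matrix-vector product A x, with A stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def dot(u, v):
    """Inner product u^T v."""
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

def f(x, A, b):
    return 0.5 * dot(x, matvec(A, x)) + dot(b, x)

def numerical_gradient(func, x, eps=1e-6):
    """Central-difference approximation of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((func(x_plus) - func(x_minus)) / (2 * eps))
    return grad

A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 4.0]]  # symmetric by construction
b = [1.0, -2.0, 0.5]
x = [0.3, -1.2, 2.0]

analytic = [ax_i + b_i for ax_i, b_i in zip(matvec(A, x), b)]
numeric = numerical_gradient(lambda z: f(z, A, b), x)
max_error = max(abs(p - q) for p, q in zip(analytic, numeric))
print("analytic:", analytic)
print("numeric: ", numeric)
print("max abs error:", max_error)
```

If `A` were not symmetric, the same check would be expected to match $\frac{1}{2}(A + A^T)x + b$ rather than $Ax + b$, which is where the symmetry assumption enters.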

Problem 1(b) Solution

Find the gradient $\nabla f(x)$ of the function $f(x) = g(h(x))$, where $g : \mathbb{R} \rightarrow \mathbb{R}$ is differentiable and $h : \mathbb{R}^n \rightarrow \mathbb{R}$ is differentiable.

  1. By the chain rule for gradients, the gradient of $f$ with respect to $x$ is the product of the derivative of $g$ evaluated at $h(x)$ and the gradient of $h$ with respect to $x$:

    $$\nabla f(x) = g'(h(x)) \cdot \nabla h(x)$$

    where $g'(h(x))$ is a scalar and $\nabla h(x)$ is a vector.
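
To make the chain rule concrete, here is a minimal sketch with one particular pair of functions. The choices $g(u) = \sin u$ and $h(x) = \sum_i x_i^2$ (so that $\nabla h(x) = 2x$) are assumptions made purely for illustration; the problem itself leaves $g$ and $h$ general.

```python
import math

# Illustrative choices (not from the problem): g(u) = sin(u), h(x) = sum of squares.
def h(x):
    return sum(x_i * x_i for x_i in x)

def grad_h(x):
    """Gradient of h(x) = sum_i x_i^2, which is 2x."""
    return [2.0 * x_i for x_i in x]

def g(u):
    return math.sin(u)

def g_prime(u):
    return math.cos(u)

def f(x):
    return g(h(x))

def analytic_gradient(x):
    """Chain rule: grad f(x) = g'(h(x)) * grad h(x)."""
    scale = g_prime(h(x))  # scalar factor
    return [scale * d for d in grad_h(x)]

def numerical_gradient(func, x, eps=1e-6):
    """Central-difference approximation of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((func(x_plus) - func(x_minus)) / (2 * eps))
    return grad

x = [0.5, -0.3, 0.8]
analytic = analytic_gradient(x)
numeric = numerical_gradient(f, x)
max_error = max(abs(p - q) for p, q in zip(analytic, numeric))
print("max abs error:", max_error)
```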

Problem 1(c) Solution

Given $f(x) = \frac{1}{2} x^T A x + b^T x$, where $A$ is symmetric and $b \in \mathbb{R}^n$ is a vector, find the Hessian $\nabla^2 f(x)$.

  1. We have already calculated $\nabla f(x) = Ax + b$ in problem 1(a).

  2. To find the Hessian, we differentiate the gradient $\nabla f(x)$ with respect to $x$ again. Entry $(i, j)$ of the Hessian is the partial derivative of the $i$-th gradient component with respect to $x_j$.

  3. The $i$-th component of $Ax$ is $\sum_k A_{ik} x_k$, whose derivative with respect to $x_j$ is $A_{ij}$. So the derivative of $Ax$ with respect to $x$ is $A$, since $A$ is constant with respect to $x$.

  4. The derivative of $b$ with respect to $x$ is zero since $b$ does not depend on $x$.

    Combining these results, we get:

    $$\nabla^2 f(x) = A$$
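
A second-order finite-difference check is one way to confirm this numerically. In the sketch below, `A`, `b`, `x`, the step size, and the helper `numerical_hessian` are all illustrative assumptions rather than anything specified in the problem.

```python
# Finite-difference check that the Hessian of
# f(x) = 0.5 * x^T A x + b^T x equals A.
# A, b, x, and the step size are arbitrary illustrative values.

def matvec(A, x):
    """Matrix-vector product A x, with A stored as a list of rows."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

def f(x, A, b):
    return 0.5 * dot(x, matvec(A, x)) + dot(b, x)

def numerical_hessian(func, x, eps=1e-3):
    """Approximate each second partial derivative with a four-point central difference."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def at(sign_i, sign_j):
                # Evaluate func at x shifted by +/- eps along coordinates i and j.
                y = list(x)
                y[i] += sign_i * eps
                y[j] += sign_j * eps
                return func(y)
            H[i][j] = (at(1, 1) - at(1, -1) - at(-1, 1) + at(-1, -1)) / (4 * eps * eps)
    return H

A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 4.0]]  # symmetric by construction
b = [1.0, -2.0, 0.5]
x = [0.3, -1.2, 2.0]

H = numerical_hessian(lambda z: f(z, A, b), x)
max_error = max(abs(H[i][j] - A[i][j]) for i in range(3) for j in range(3))
print("max abs error:", max_error)
```

Changing `x` or `b` should leave the estimate essentially unchanged, which reflects the fact that the Hessian of a quadratic depends on neither.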

Problem 1(d) Solution

To solve problem 1(d), we need to find the gradient $\nabla f(x)$ and the Hessian $\nabla^2 f(x)$ of the function $f(x) = g(a^T x)$, where $g : \mathbb{R} \rightarrow \mathbb{R}$ is continuously differentiable and $a \in \mathbb{R}^n$ is a vector.

Finding the Gradient $\nabla f(x)$:

The gradient of a scalar function is a vector of its first partial derivatives. Here, we use the chain rule:

  1. Apply the chain rule: $f(x) = g(u)$ with $u = a^T x$. The derivative of $f$ with respect to $x_i$ is:

    $$\frac{\partial f}{\partial x_i} = g'(u) \cdot \frac{\partial u}{\partial x_i}$$

    where $\frac{\partial u}{\partial x_i} = a_i$.

  2. Compute the gradient:

    $$\nabla f(x) = \begin{bmatrix} g'(u) \, a_1 \\ \vdots \\ g'(u) \, a_n \end{bmatrix}$$

    Factoring out $g'(u)$ and substituting $u = a^T x$:

    $$\nabla f(x) = g'(a^T x) \cdot a$$

Finding the Hessian $\nabla^2 f(x)$:

The Hessian is a square matrix of second partial derivatives. For it to exist here, we additionally assume that $g$ is twice differentiable.

  1. Apply the chain rule again: The $i$-th component of the gradient is $g'(a^T x) \, a_i$. Since $a_i$ is constant, differentiating with respect to $x_j$ only acts on $g'(a^T x)$, which contributes a factor $g''(a^T x) \, a_j$.

  2. Compute the Hessian: Each element $(i, j)$ of the Hessian is:

    $$\frac{\partial^2 f}{\partial x_i \partial x_j} = a_i \cdot g''(a^T x) \cdot a_j$$

    Thus, the Hessian matrix is:

    $$\nabla^2 f(x) = g''(a^T x) \cdot a a^T$$

The gradient is a vector in the direction of $a$ scaled by $g'(a^T x)$, and the Hessian is the outer product of $a$ with itself, scaled by $g''(a^T x)$.
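
Both formulas can be checked numerically for a specific $g$. The sketch below assumes the softplus function $g(u) = \log(1 + e^u)$, whose first derivative is the logistic sigmoid; this choice, the vectors `a` and `x`, and the helper names are illustrative and not part of the problem.

```python
import math

def dot(u, v):
    return sum(u_i * v_i for u_i, v_i in zip(u, v))

# Illustrative choice of g (not from the problem): softplus.
def g(u):
    return math.log1p(math.exp(u))

def g_prime(u):
    """Derivative of softplus: the logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-u))

def g_double_prime(u):
    s = g_prime(u)
    return s * (1.0 - s)

def f(x, a):
    return g(dot(a, x))

def gradient(x, a):
    """Analytic gradient g'(a^T x) * a."""
    scale = g_prime(dot(a, x))
    return [scale * a_i for a_i in a]

def hessian(x, a):
    """Analytic Hessian g''(a^T x) * a a^T."""
    scale = g_double_prime(dot(a, x))
    return [[scale * a_i * a_j for a_j in a] for a_i in a]

def shifted(x, i, delta):
    """Copy of x with coordinate i moved by delta."""
    y = list(x)
    y[i] += delta
    return y

a = [1.0, -2.0, 0.5]
x = [0.2, 0.1, -0.4]
eps = 1e-5
n = len(x)

# Central differences of f approximate the gradient.
numeric_grad = [
    (f(shifted(x, i, eps), a) - f(shifted(x, i, -eps), a)) / (2 * eps)
    for i in range(n)
]
grad_error = max(abs(p - q) for p, q in zip(gradient(x, a), numeric_grad))

# Central differences of the analytic gradient approximate the Hessian columns.
H = hessian(x, a)
hess_error = 0.0
for j in range(n):
    grad_plus = gradient(shifted(x, j, eps), a)
    grad_minus = gradient(shifted(x, j, -eps), a)
    for i in range(n):
        numeric_ij = (grad_plus[i] - grad_minus[i]) / (2 * eps)
        hess_error = max(hess_error, abs(H[i][j] - numeric_ij))

print("gradient error:", grad_error)
print("Hessian error: ", hess_error)
```

Because the Hessian is a scalar multiple of $a a^T$, it has rank at most one: every row of `H` is a multiple of `a`.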