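Recall that a matrix $A \in \mathbb{R}^{n \times n}$ is symmetric if $A^T = A$, that is, $A_{ij} = A_{ji}$ for all $i, j$. Also recall the gradient $\nabla f(x)$ of a function $f : \mathbb{R}^n \to \mathbb{R}$, which is the $n$-vector of partial derivatives
$$\nabla f(x) = \begin{bmatrix} \frac{\partial}{\partial x_1} f(x) \\ \vdots \\ \frac{\partial}{\partial x_n} f(x) \end{bmatrix} \quad \text{where} \quad x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.$$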
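The Hessian $\nabla^2 f(x)$ of a function $f : \mathbb{R}^n \to \mathbb{R}$ is the $n \times n$ symmetric matrix of second partial derivatives,
$$\nabla^2 f(x) = \begin{bmatrix} \frac{\partial^2}{\partial x_1^2} f(x) & \cdots & \frac{\partial^2}{\partial x_1 \partial x_n} f(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial^2}{\partial x_n \partial x_1} f(x) & \cdots & \frac{\partial^2}{\partial x_n^2} f(x) \end{bmatrix}.$$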
(a) Let $f(x) = \frac{1}{2} x^T A x + b^T x$, where $A$ is a symmetric matrix and $b \in \mathbb{R}^n$ is a vector. What is $\nabla f(x)$?
(b) Let $f(x) = g(h(x))$, where $g : \mathbb{R} \to \mathbb{R}$ is differentiable and $h : \mathbb{R}^n \to \mathbb{R}$ is differentiable. What is $\nabla f(x)$?
(c) Let $f(x) = \frac{1}{2} x^T A x + b^T x$, where $A$ is symmetric and $b \in \mathbb{R}^n$ is a vector. What is $\nabla^2 f(x)$?
(d) Let $f(x) = g(a^T x)$, where $g : \mathbb{R} \to \mathbb{R}$ is continuously differentiable and $a \in \mathbb{R}^n$ is a vector. What are $\nabla f(x)$ and $\nabla^2 f(x)$? (Hint: your expression for $\nabla^2 f(x)$ may have as few as 11 symbols, including ' and parentheses.)
Solutions
Problem 1(a) Solution
We want to find the gradient $\nabla f(x)$ of the function $f(x) = \frac{1}{2} x^T A x + b^T x$, where $A$ is a symmetric matrix and $b \in \mathbb{R}^n$ is a vector.
Differentiate $\frac{1}{2} x^T A x$:
The derivative of the quadratic form $x^T A x$ with respect to $x$ is $Ax + A^T x$. Since $A$ is symmetric ($A = A^T$), this simplifies to $2Ax$. The coefficient $\frac{1}{2}$ in front of $x^T A x$ cancels the 2 from the derivative, resulting in:
$$\frac{\partial}{\partial x}\left(\frac{1}{2} x^T A x\right) = Ax$$
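One way to see where the quadratic-form derivative comes from is to expand it in components. The following is a sketch of that step; the index $k$ and the notation $(Ax)_k$ for the $k$-th entry of $Ax$ are introduced here only for illustration:

```latex
\begin{aligned}
\frac{\partial}{\partial x_k}\left(\frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij} x_i x_j\right)
&= \frac{1}{2}\left(\sum_{j=1}^{n} A_{kj} x_j + \sum_{i=1}^{n} A_{ik} x_i\right) \\
&= \frac{1}{2}\left((Ax)_k + (A^T x)_k\right) \\
&= (Ax)_k \quad \text{since } A = A^T .
\end{aligned}
```

Stacking these entries for $k = 1, \dots, n$ gives the vector $Ax$.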
Differentiate $b^T x$:
The gradient of the linear form $b^T x$ with respect to $x$ is $b$, because the derivative of each component $b_i x_i$ with respect to $x_i$ is just $b_i$:
$$\frac{\partial}{\partial x}\left(b^T x\right) = b$$
Combine the results:
The gradient $\nabla f(x)$ is the sum of the gradients of $\frac{1}{2} x^T A x$ and $b^T x$:
$$\nabla f(x) = Ax + b$$
Thus, the gradient $\nabla f(x)$ of the function $f(x)$ is $Ax + b$.
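A quick numerical sanity check can make the result above more convincing. The sketch below compares $Ax + b$ against a central-difference approximation of the gradient, using only the Python standard library. The particular values of `A`, `b`, and `x0`, and the helper names, are arbitrary choices made for this illustration and do not come from the problem set.

```python
# Finite-difference check of grad f(x) = A x + b for f(x) = 0.5 x^T A x + b^T x.
# A, b, and x0 are illustrative values (assumptions), not part of the problem.

def matvec(A, x):
    """Return the matrix-vector product A x for a list-of-lists matrix A."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

def f(A, b, x):
    """Evaluate 0.5 * x^T A x + b^T x."""
    Ax = matvec(A, x)
    quad = 0.5 * sum(x_i * ax_i for x_i, ax_i in zip(x, Ax))
    lin = sum(b_i * x_i for b_i, x_i in zip(b, x))
    return quad + lin

def numerical_gradient(func, x, eps=1e-6):
    """Central-difference approximation of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((func(xp) - func(xm)) / (2 * eps))
    return grad

A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 4.0]]  # symmetric by construction
b = [1.0, -2.0, 0.5]
x0 = [0.3, -1.2, 2.0]

# Analytic gradient from the derivation: A x + b.
analytic = [ax_i + b_i for ax_i, b_i in zip(matvec(A, x0), b)]
numeric = numerical_gradient(lambda x: f(A, b, x), x0)
max_error = max(abs(p - q) for p, q in zip(analytic, numeric))
print("largest gap between analytic and numerical gradient:", max_error)
```

If `A` is replaced by a non-symmetric matrix, the gap should no longer be small, because the gradient of the quadratic term is then $\frac{1}{2}(A + A^T)x$ rather than $Ax$.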
Problem 1(b) Solution
Find the gradient $\nabla f(x)$ of the function $f(x) = g(h(x))$, where $g : \mathbb{R} \to \mathbb{R}$ is differentiable and $h : \mathbb{R}^n \to \mathbb{R}$ is differentiable.
By the chain rule for gradients, the gradient of $f$ with respect to $x$ is the product of the derivative of $g$ evaluated at $h(x)$ and the gradient of $h$ with respect to $x$:
$$\nabla f(x) = g'(h(x)) \cdot \nabla h(x)$$
where $g'(h(x))$ is a scalar and $\nabla h(x)$ is a vector.
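To illustrate the chain rule on a concrete case, the sketch below picks $g(t) = \sin t$ and $h(x) = \sum_i x_i^2$ (so $\nabla h(x) = 2x$) and compares $g'(h(x)) \cdot \nabla h(x)$ with a central-difference gradient. These choices of $g$ and $h$, the point `x0`, and the helper names are assumptions made only for this example.

```python
import math

# Illustrative choices (assumptions, not from the problem set):
# g(t) = sin(t) and h(x) = x_1^2 + ... + x_n^2, so grad h(x) = 2x.

def h(x):
    """Inner function h: R^n -> R."""
    return sum(x_i * x_i for x_i in x)

def grad_h(x):
    """Gradient of h, computed by hand."""
    return [2.0 * x_i for x_i in x]

def f(x):
    """Composite function f(x) = g(h(x)) with g = sin."""
    return math.sin(h(x))

def grad_f(x):
    """Chain rule: the scalar g'(h(x)) multiplies the vector grad h(x)."""
    scale = math.cos(h(x))  # g'(t) = cos(t), evaluated at t = h(x)
    return [scale * d for d in grad_h(x)]

def numerical_gradient(func, x, eps=1e-6):
    """Central-difference approximation of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((func(xp) - func(xm)) / (2 * eps))
    return grad

x0 = [0.5, -0.3, 0.8]
analytic = grad_f(x0)
numeric = numerical_gradient(f, x0)
max_error = max(abs(p - q) for p, q in zip(analytic, numeric))
print("largest gap between analytic and numerical gradient:", max_error)
```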
Problem 1(c) Solution
Given $f(x) = \frac{1}{2} x^T A x + b^T x$, where $A$ is symmetric and $b \in \mathbb{R}^n$ is a vector, find the Hessian $\nabla^2 f(x)$.
We have already calculated $\nabla f(x) = Ax + b$ in problem 1(a).
Now, to find the Hessian, we differentiate the gradient $\nabla f(x)$ with respect to $x$ again.
The derivative of $Ax$ with respect to $x$ is $A$, since $A$ is constant with respect to $x$.
The derivative of $b$ with respect to $x$ is zero, since $b$ does not depend on $x$.
Combining these results, we get:
$$\nabla^2 f(x) = A$$
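The claim that the Hessian of the quadratic is just $A$ can also be checked numerically. The sketch below approximates each second partial derivative with a four-point finite-difference stencil and compares the result with `A` entry by entry. The values of `A`, `b`, and `x0`, the step size, and the helper names are assumptions chosen for illustration.

```python
# Finite-difference check that the Hessian of 0.5 x^T A x + b^T x equals A.
# A, b, x0, and eps are illustrative values (assumptions).

def f(A, b, x):
    """Evaluate 0.5 * x^T A x + b^T x for a list-of-lists matrix A."""
    Ax = [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]
    quad = 0.5 * sum(x_i * ax_i for x_i, ax_i in zip(x, Ax))
    lin = sum(b_i * x_i for b_i, x_i in zip(b, x))
    return quad + lin

def numerical_hessian(func, x, eps=1e-3):
    """Approximate d^2 func / dx_i dx_j with a four-point central stencil."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # Evaluate func at x + si*eps*e_i + sj*eps*e_j.
                y = list(x)
                y[i] += si * eps
                y[j] += sj * eps
                return func(y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * eps * eps)
    return H

A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, -1.0],
     [0.0, -1.0, 4.0]]  # symmetric by construction
b = [1.0, -2.0, 0.5]
x0 = [0.3, -1.2, 2.0]

H = numerical_hessian(lambda x: f(A, b, x), x0)
max_error = max(abs(H[i][j] - A[i][j]) for i in range(3) for j in range(3))
print("largest gap between numerical Hessian and A:", max_error)
```

Changing `x0` should leave the numerical Hessian essentially unchanged, which reflects the fact that the Hessian of a quadratic does not depend on $x$.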
Problem 1(d) Solution
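To solve problem 1(d), we need to find the gradient $\nabla f(x)$ and the Hessian $\nabla^2 f(x)$ of the function $f(x) = g(a^T x)$, where $g : \mathbb{R} \to \mathbb{R}$ is continuously differentiable and $a \in \mathbb{R}^n$ is a vector.
Finding the gradient $\nabla f(x)$:
The gradient of a scalar function is the vector of its first partial derivatives.
Apply the chain rule:
Write $f(x) = g(u)$ with $u = a^T x$. The derivative of $f$ with respect to $x_i$ is:
$$\frac{\partial f}{\partial x_i} = \frac{\partial g}{\partial u} \cdot \frac{\partial u}{\partial x_i}$$
where $\frac{\partial u}{\partial x_i} = a_i$.
Compute the gradient:
$$\nabla f(x) = \begin{bmatrix} \frac{\partial g}{\partial u} a_1 \\ \vdots \\ \frac{\partial g}{\partial u} a_n \end{bmatrix}$$
Factoring out $\frac{\partial g}{\partial u} = g'(a^T x)$:
$$\nabla f(x) = g'(a^T x) \cdot a$$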
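Finding the Hessian $\nabla^2 f(x)$:
The Hessian is the square matrix of second partial derivatives. Writing $g''$ below assumes that $g$ is twice differentiable.
Use the chain rule again:
For the second derivatives, we differentiate each component $g'(a^T x) \cdot a_i$ of the gradient with respect to $x_j$. Since $a_i$ is constant, only the factor $g'(a^T x)$ depends on $x$, and its derivative with respect to $x_j$ is $g''(a^T x) \cdot a_j$.
Compute the Hessian:
Each element $(i, j)$ of the Hessian is:
$$\frac{\partial^2 f}{\partial x_i \partial x_j} = a_i \cdot g''(a^T x) \cdot a_j$$
Thus, the Hessian matrix is:
$$\nabla^2 f(x) = g''(a^T x) \cdot a a^T$$
The gradient is the vector $a$ scaled by $g'(a^T x)$, and the Hessian is the outer product of $a$ with itself, scaled by $g''(a^T x)$.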
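Both formulas for $f(x) = g(a^T x)$ can be checked numerically on a concrete case. The sketch below assumes $g(u) = u^3$, so that $g'(u) = 3u^2$ and $g''(u) = 6u$, and compares the analytic gradient and Hessian with finite-difference approximations. The choice of $g$, the vectors `a` and `x0`, and the helper names are assumptions made for illustration only.

```python
# Check grad f = g'(a^T x) a and Hessian = g''(a^T x) a a^T for f(x) = g(a^T x).
# Assumed example: g(u) = u**3, so g'(u) = 3u^2 and g''(u) = 6u.

def dot(a, x):
    """Inner product a^T x."""
    return sum(a_i * x_i for a_i, x_i in zip(a, x))

a = [1.0, -2.0, 0.5]
x0 = [1.0, 0.5, 2.0]

def f(x):
    return dot(a, x) ** 3

def numerical_gradient(func, x, eps=1e-6):
    """Central-difference approximation of the gradient of func at x."""
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += eps
        xm[i] -= eps
        grad.append((func(xp) - func(xm)) / (2 * eps))
    return grad

def numerical_hessian(func, x, eps=1e-3):
    """Approximate d^2 func / dx_i dx_j with a four-point central stencil."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # Evaluate func at x + si*eps*e_i + sj*eps*e_j.
                y = list(x)
                y[i] += si * eps
                y[j] += sj * eps
                return func(y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * eps * eps)
    return H

u = dot(a, x0)
analytic_grad = [3.0 * u * u * a_i for a_i in a]               # g'(u) * a
analytic_hess = [[6.0 * u * a_i * a_j for a_j in a] for a_i in a]  # g''(u) * a a^T

num_grad = numerical_gradient(f, x0)
num_hess = numerical_hessian(f, x0)

grad_error = max(abs(p - q) for p, q in zip(analytic_grad, num_grad))
hess_error = max(abs(analytic_hess[i][j] - num_hess[i][j])
                 for i in range(3) for j in range(3))
print("gradient gap:", grad_error)
print("Hessian gap:", hess_error)
```

Note that every row of `analytic_hess` is a multiple of `a`, which is what the outer-product structure $a a^T$ implies: the Hessian has rank at most one.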