## Section 8.3 The derivative

*Note: 2–3 lectures*

### Subsection 8.3.1 The derivative

For a function \(f \colon \R \to \R\text{,}\) we defined the derivative at \(x\) as
\[
f'(x) := \lim_{h \to 0} \frac{f(x+h)-f(x)}{h} .
\]

In other words, there is a number \(a\) (the derivative of \(f\) at \(x\)) such that
\[
\lim_{h \to 0} \frac{\sabs{f(x+h)-f(x)-ah}}{\sabs{h}} = 0 .
\]

Multiplying by \(a\) is a linear map in one dimension: \(h \mapsto ah\text{.}\) Namely, we think of \(a \in L(\R^1,\R^1)\text{,}\) which is the best linear approximation of how \(f\) changes near \(x\text{.}\) We use this interpretation to extend differentiation to more variables.
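This one-variable picture can be checked numerically. The following is a quick sketch (the function \(f(x) = x^2\) and the point \(x = 3\) are our own illustrative choices, not from the text), showing that the remainder \(f(x+h)-f(x)-ah\) vanishes faster than \(h\) when \(a\) is the derivative:

```python
# Sketch: the derivative a = f'(x) is the number for which
# |f(x+h) - f(x) - a*h| / |h| -> 0 as h -> 0.
# Sample choices (not from the text): f(x) = x^2, x = 3, so a = 6.

def f(x):
    return x * x

x, a = 3.0, 6.0
ratios = []
for k in range(1, 6):
    h = 10.0 ** (-k)
    ratios.append(abs(f(x + h) - f(x) - a * h) / abs(h))

# For this f the ratio works out to |h| (since f(x+h)-f(x)-6h = h^2),
# so it shrinks linearly as h decreases.
print(ratios)
```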

#### Definition 8.3.1.

Let \(U \subset \R^n\) be open and \(f \colon U \to \R^m\) a function. We say \(f\) is *differentiable* at \(x \in U\) if there exists an \(A \in L(\R^n,\R^m)\) such that
\[
\lim_{h \to 0} \frac{\snorm{f(x+h)-f(x) - Ah}}{\snorm{h}} = 0 .
\]

We write \(Df(x) := A\text{,}\) or \(f'(x) := A\text{,}\) and we say \(A\) is the *derivative* of \(f\) at \(x\text{.}\) When \(f\) is differentiable at every \(x \in U\text{,}\) we say simply that \(f\) is *differentiable*. See Figure 8.3 for an illustration.

For a differentiable function, the derivative of \(f\) is a function from \(U\) to \(L(\R^n,\R^m)\text{.}\) Compare to the one-dimensional case, where the derivative is a function from \(U\) to \(\R\text{,}\) but we really want to think of \(\R\) here as \(L(\R^1,\R^1)\text{.}\) As in one dimension, the idea is that a differentiable mapping is “infinitesimally close” to a linear mapping, and this linear mapping is the derivative.

Notice which norms are being used in the definition. The norm in the numerator is on \(\R^m\text{,}\) and the norm in the denominator is on \(\R^n\) where \(h\) lives. Normally it is understood that \(h \in \R^n\) from context (the formula makes no sense otherwise). We will not explicitly say so from now on.

We have again cheated somewhat and said that \(A\) is *the* derivative. We have not yet shown that there is only one; let us do that now.

#### Proposition 8.3.2.

Let \(U \subset \R^n\) be an open subset and \(f \colon U \to \R^m\) a function. Suppose \(x \in U\) and there exist \(A,B \in L(\R^n,\R^m)\) such that
\[
\lim_{h \to 0} \frac{\snorm{f(x+h)-f(x) - Ah}}{\snorm{h}} = 0
\qquad \text{and} \qquad
\lim_{h \to 0} \frac{\snorm{f(x+h)-f(x) - Bh}}{\snorm{h}} = 0 .
\]

Then \(A=B\text{.}\)

#### Proof.

Suppose \(h \in \R^n\text{,}\) \(h \not= 0\text{.}\) Compute
\[
\frac{\snorm{(A-B)h}}{\snorm{h}}
= \frac{\snorm{\bigl(f(x+h)-f(x)-Bh\bigr) - \bigl(f(x+h)-f(x)-Ah\bigr)}}{\snorm{h}}
\leq \frac{\snorm{f(x+h)-f(x)-Ah}}{\snorm{h}} + \frac{\snorm{f(x+h)-f(x)-Bh}}{\snorm{h}} .
\]

So \(\frac{\snorm{(A-B)h}}{\snorm{h}} \to 0\) as \(h \to 0\text{.}\) Given \(\epsilon > 0\text{,}\) for all nonzero \(h\) in some \(\delta\)-ball around the origin we have
\[
\frac{\snorm{(A-B)h}}{\snorm{h}} < \epsilon .
\]

For any given \(v \in \R^n\) with \(\snorm{v}=1\text{,}\) let \(h = (\nicefrac{\delta}{2}) \, v\text{;}\) then \(\snorm{h} < \delta\) and \(\frac{h}{\snorm{h}} = v\text{.}\) So \(\snorm{(A-B)v} < \epsilon\text{.}\) Taking the supremum over all \(v\) with \(\snorm{v} = 1\text{,}\) we get the operator norm \(\snorm{A-B} \leq \epsilon\text{.}\) As \(\epsilon > 0\) was arbitrary, \(\snorm{A-B} = 0\text{,}\) or in other words \(A = B\text{.}\)

#### Example 8.3.3.

If \(f(x) = Ax\) for a linear mapping \(A\text{,}\) then \(f'(x) = A\text{:}\)
\[
\frac{\snorm{f(x+h)-f(x) - Ah}}{\snorm{h}}
= \frac{\snorm{A(x+h)-Ax - Ah}}{\snorm{h}}
= \frac{0}{\snorm{h}} = 0 .
\]

#### Example 8.3.4.

Let \(f \colon \R^2 \to \R^2\) be defined by
\[
f(x,y) := \bigl( 1+x+2y+x^2 ,\; 2x+3y+y^3 \bigr) .
\]

Let us show that \(f\) is differentiable at the origin and let us compute the derivative, directly using the definition. If the derivative exists, it is in \(L(\R^2,\R^2)\text{,}\) so it can be represented by a \(2\)-by-\(2\) matrix \(\left[\begin{smallmatrix}a&b\\c&d\end{smallmatrix}\right]\text{.}\) Suppose \(h = (h_1,h_2)\text{.}\) We need the following expression to go to zero.
\[
\frac{\snorm{f(h_1,h_2)-f(0,0) - (ah_1+bh_2 ,\, ch_1+dh_2)}}{\snorm{(h_1,h_2)}} .
\]

If we choose \(a=1\text{,}\) \(b=2\text{,}\) \(c=2\text{,}\) \(d=3\text{,}\) the expression becomes
\[
\frac{\snorm{(h_1^2, h_2^3)}}{\snorm{(h_1,h_2)}}
= \frac{\sqrt{h_1^4+h_2^6}}{\sqrt{h_1^2+h_2^2}} .
\]

This expression does indeed go to zero as \(h \to 0\text{:}\) when \(\snorm{h} \leq 1\text{,}\) both \(h_1^4\) and \(h_2^6\) are at most \({\bigl(h_1^2+h_2^2\bigr)}^2\text{,}\) so the expression is at most \(\sqrt{2}\,\snorm{h}\text{.}\) The function \(f\) is differentiable at the origin and the derivative \(f'(0)\) is represented by the matrix \(\left[\begin{smallmatrix}1&2\\2&3\end{smallmatrix}\right]\text{.}\)
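The computation can be sanity-checked numerically. This sketch assumes a sample map whose linear part at the origin is \(\left[\begin{smallmatrix}1&2\\2&3\end{smallmatrix}\right]\) (the particular formula for `f` below is our illustrative choice), and evaluates the difference quotient from the definition:

```python
# Sketch: the quotient ||f(h) - f(0) - A h|| / ||h|| from the definition
# of the derivative, with the candidate matrix A = [[1,2],[2,3]].
import math

def f(x, y):
    # assumed illustrative map; its derivative at (0,0) is [[1,2],[2,3]]
    return (1 + x + 2 * y + x * x, 2 * x + 3 * y + y ** 3)

def quotient(h1, h2):
    f0 = f(0.0, 0.0)
    fh = f(h1, h2)
    # subtract f(0,0) and the linear term A h = (h1 + 2 h2, 2 h1 + 3 h2)
    r1 = fh[0] - f0[0] - (h1 + 2 * h2)
    r2 = fh[1] - f0[1] - (2 * h1 + 3 * h2)
    return math.hypot(r1, r2) / math.hypot(h1, h2)

vals = [quotient(10.0 ** (-k), 10.0 ** (-k)) for k in range(1, 6)]
print(vals)
```

The printed quotients shrink roughly linearly in \(\snorm{h}\text{,}\) as the estimate in the example predicts.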

#### Proposition 8.3.5.

Let \(U \subset \R^n\) be open and \(f \colon U \to \R^m\) be differentiable at \(p \in U\text{.}\) Then \(f\) is continuous at \(p\text{.}\)

#### Proof.

Another way to write the differentiability of \(f\) at \(p\) is to consider
\[
r(h) := f(p+h)-f(p)-f'(p)\, h .
\]

The function \(f\) is differentiable at \(p\) if \(\frac{\snorm{r(h)}}{\snorm{h}}\) goes to zero as \(h \to 0\text{,}\) so \(r(h)\) itself goes to zero. The mapping \(h \mapsto f'(p) h\) is a linear mapping between finite-dimensional spaces, hence continuous, and \(f'(p) h \to 0\) as \(h \to 0\text{.}\) As \(f(p+h) = f(p) + f'(p)h + r(h)\text{,}\) the value \(f(p+h)\) must go to \(f(p)\) as \(h \to 0\text{.}\) That is, \(f\) is continuous at \(p\text{.}\)

The derivative is itself a linear operator on the space of differentiable functions.

#### Proposition 8.3.6.

Suppose \(U \subset \R^n\) is open, \(f \colon U \to \R^m\) and \(g \colon U \to \R^m\) are differentiable at \(p\text{,}\) and \(\alpha \in \R\text{.}\) Then the functions \(f+g\) and \(\alpha f\) are differentiable at \(p\) and
\[
(f+g)'(p) = f'(p) + g'(p)
\qquad \text{and} \qquad
(\alpha f)'(p) = \alpha f'(p) .
\]

#### Proof.

Let \(h \in \R^n\text{,}\) \(h \not= 0\text{.}\) Then
\[
\frac{\snorm{(f+g)(p+h)-(f+g)(p) - \bigl(f'(p)+g'(p)\bigr)h}}{\snorm{h}}
\leq
\frac{\snorm{f(p+h)-f(p) - f'(p)h}}{\snorm{h}}
+
\frac{\snorm{g(p+h)-g(p) - g'(p)h}}{\snorm{h}}
\]

and
\[
\frac{\snorm{\alpha f(p+h)-\alpha f(p) - \alpha f'(p) h}}{\snorm{h}}
=
\sabs{\alpha} \frac{\snorm{f(p+h)-f(p) - f'(p)h}}{\snorm{h}} .
\]

The limits as \(h\) goes to zero of the right-hand sides are zero by hypothesis. The result follows.

If \(A \in L(\R^n,\R^m)\) and \(B \in L(\R^m,\R^k)\) are linear maps, then they are their own derivative. The composition \(BA \in L(\R^n,\R^k)\) is also its own derivative, and so the derivative of the composition is the composition of the derivatives. As differentiable maps are “infinitesimally close” to linear maps, they have the same property:

#### Theorem 8.3.7. Chain rule.

Let \(U \subset \R^n\) be open and let \(f \colon U \to \R^m\) be differentiable at \(p \in U\text{.}\) Let \(V \subset \R^m\) be open, \(f(U) \subset V\text{,}\) and let \(g \colon V \to \R^\ell\) be differentiable at \(f(p)\text{.}\) Then
\[
F \colon U \to \R^\ell, \qquad F(x) := g\bigl(f(x)\bigr),
\]

is differentiable at \(p\) and
\[
F'(p) = g'\bigl(f(p)\bigr) f'(p) .
\]

Without the points where things are evaluated, this is sometimes written as \(F' = {(g \circ f)}' = g' f'\text{.}\) The way to understand it is that the derivative of the composition \(g \circ f\) is the composition of the derivatives of \(g\) and \(f\text{.}\) If \(f'(p) = A\) and \(g'\bigl(f(p)\bigr) = B\text{,}\) then \(F'(p) = BA\text{,}\) just as for linear maps.

#### Proof.

Let \(A := f'(p)\) and \(B := g'\bigl(f(p)\bigr)\text{.}\) Take a nonzero \(h \in \R^n\) and write \(q := f(p)\text{,}\) \(k := f(p+h)-f(p)\text{.}\) Let
\[
r(h) := f(p+h)-f(p) - Ah .
\]

Then \(r(h) = k-Ah\) or \(Ah = k-r(h)\text{,}\) and \(f(p+h) = q+k\text{.}\) We look at the quantity we need to go to zero:
\[
\frac{\snorm{F(p+h)-F(p) - BAh}}{\snorm{h}}
= \frac{\snorm{g(q+k)-g(q) - B\bigl(k-r(h)\bigr)}}{\snorm{h}}
\leq \frac{\snorm{g(q+k)-g(q) - Bk}}{\snorm{h}} + \snorm{B} \frac{\snorm{r(h)}}{\snorm{h}} .
\]

First, \(\snorm{B}\) is a constant and \(f\) is differentiable at \(p\text{,}\) so the term \(\snorm{B}\frac{\snorm{r(h)}}{\snorm{h}}\) goes to 0. Next, because \(f\) is continuous at \(p\text{,}\) \(k\) goes to 0 as \(h\) goes to 0. Thus \(\frac {\snorm{g(q+k)-g(q) - Bk}} {\snorm{k}}\) goes to 0, because \(g\) is differentiable at \(q\text{.}\) Finally,
\[
\frac{\snorm{g(q+k)-g(q) - Bk}}{\snorm{h}}
= \frac{\snorm{g(q+k)-g(q) - Bk}}{\snorm{k}} \frac{\snorm{k}}{\snorm{h}}
= \frac{\snorm{g(q+k)-g(q) - Bk}}{\snorm{k}} \frac{\snorm{f(p+h)-f(p)}}{\snorm{h}}
\]
(when \(k = 0\text{,}\) the left-hand side is zero and there is nothing to estimate).

As \(f\) is differentiable at \(p\text{,}\) for small enough \(h\text{,}\) the quantity \(\frac{\snorm{f(p+h)-f(p)-Ah}}{\snorm{h}}\) is bounded. Hence, the term \(\frac {\snorm{f(p+h)-f(p)}} {\snorm{h}}\) stays bounded as \(h\) goes to 0. Therefore, \(\frac{\snorm{F(p+h)-F(p) - BAh}}{\snorm{h}}\) goes to zero, and \(F'(p) = BA\text{,}\) which is what was claimed.
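The chain rule can also be illustrated numerically. In this sketch the maps \(f\) and \(g\) are sample choices of our own (not from the text); we compare a finite-difference gradient of \(F = g \circ f\) against the product \(BA\) assembled from hand-computed partial derivatives:

```python
# Sketch of the chain rule with sample maps:
# f : R^2 -> R^2, g : R^2 -> R, so F = g o f has a 1-by-2 derivative BA.
import math

def f(x, y):
    return (x * y, x + y * y)

def g(u, v):
    return math.sin(u) + u * v

def F(x, y):
    return g(*f(x, y))

def num_grad(fun, x, y, h=1e-6):
    # central finite differences approximate the two partial derivatives
    return ((fun(x + h, y) - fun(x - h, y)) / (2 * h),
            (fun(x, y + h) - fun(x, y - h)) / (2 * h))

p = (0.5, -0.3)
u, v = f(*p)
# B = g'(f(p)) as a row vector, A = f'(p) as the 2x2 matrix of partials
B = (math.cos(u) + v, u)
A = ((p[1], p[0]),        # partials of f_1 = x*y
     (1.0, 2 * p[1]))     # partials of f_2 = x + y^2
BA = (B[0] * A[0][0] + B[1] * A[1][0],
      B[0] * A[0][1] + B[1] * A[1][1])
approx = num_grad(F, *p)
print(BA, approx)
```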

### Subsection 8.3.2 Partial derivatives

There is another way to generalize the derivative from one dimension. We hold all but one variable constant and take the regular one-variable derivative.

#### Definition 8.3.8.

Let \(f \colon U \to \R\) be a function on an open set \(U \subset \R^n\text{.}\) If the following limit exists, we write
\[
\frac{\partial f}{\partial x_j} (x) :=
\lim_{h \to 0} \frac{f(x_1,\ldots,x_{j-1},x_j+h,x_{j+1},\ldots,x_n)-f(x)}{h}
= \lim_{h \to 0} \frac{f(x+h e_j)-f(x)}{h} .
\]

We call \(\frac{\partial f}{\partial x_j} (x)\) the *partial derivative* of \(f\) with respect to \(x_j\text{.}\) See Figure 8.4. Here \(h\) is a number, not a vector.

For a mapping \(f \colon U \to \R^m\) we write \(f = (f_1,f_2,\ldots,f_m)\text{,}\) where \(f_k\) are real-valued functions. We then take partial derivatives of the components, \(\frac{\partial f_k}{\partial x_j}\text{.}\)

Partial derivatives are easier to compute with all the machinery of calculus, and they provide a way to compute the derivative of a function.

#### Proposition 8.3.9.

Let \(U \subset \R^n\) be open and let \(f \colon U \to \R^m\) be differentiable at \(p \in U\text{.}\) Then all the partial derivatives at \(p\) exist and, in terms of the standard bases of \(\R^n\) and \(\R^m\text{,}\) \(f'(p)\) is represented by the matrix
\[
\begin{bmatrix}
\frac{\partial f_1}{\partial x_1}(p) & \frac{\partial f_1}{\partial x_2}(p) & \cdots & \frac{\partial f_1}{\partial x_n}(p) \\
\frac{\partial f_2}{\partial x_1}(p) & \frac{\partial f_2}{\partial x_2}(p) & \cdots & \frac{\partial f_2}{\partial x_n}(p) \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1}(p) & \frac{\partial f_m}{\partial x_2}(p) & \cdots & \frac{\partial f_m}{\partial x_n}(p)
\end{bmatrix} .
\]

In other words,
\[
f'(p)\, e_j = \sum_{k=1}^m \frac{\partial f_k}{\partial x_j}(p) \, e_k .
\]

If \(v = \sum_{j=1}^n c_j\, e_j = (c_1,c_2,\ldots,c_n)\text{,}\) then
\[
f'(p)\, v
= \sum_{j=1}^n \sum_{k=1}^m c_j \frac{\partial f_k}{\partial x_j}(p) \, e_k
= \sum_{k=1}^m \Biggl( \sum_{j=1}^n c_j \frac{\partial f_k}{\partial x_j}(p) \Biggr) e_k .
\]

#### Proof.

Fix a \(j\) and note that for nonzero \(h\text{,}\)
\[
\snorm{ \frac{f(p+h e_j)-f(p)}{h} - f'(p)\, e_j }
= \frac{\snorm{f(p+h e_j)-f(p) - h\, f'(p)\, e_j}}{\sabs{h}}
= \frac{\snorm{f(p+h e_j)-f(p) - f'(p) (h e_j)}}{\snorm{h e_j}} .
\]

As \(h\) goes to 0, the right-hand side goes to zero by differentiability of \(f\text{,}\) and hence
\[
\lim_{h \to 0} \frac{f(p+h e_j)-f(p)}{h} = f'(p)\, e_j .
\]

Let us represent \(f\) by components \(f = (f_1,f_2,\ldots,f_m)\text{,}\) since it is vector-valued. Taking a limit in \(\R^m\) is the same as taking the limit in each component separately. For every \(k\text{,}\) the partial derivative
\[
\frac{\partial f_k}{\partial x_j}(p) = \lim_{h \to 0} \frac{f_k(p+h e_j)-f_k(p)}{h}
\]

exists and is equal to the \(k\)th component of \(f'(p)\, e_j\text{,}\) and we are done.
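The proposition can be checked numerically. Here the map and the point are sample choices of our own: we assemble the matrix of partials by hand and compare \(f'(p)\,v\) against the difference quotient \(\bigl(f(p+tv)-f(p)\bigr)/t\) for small \(t\text{:}\)

```python
# Sketch of Proposition 8.3.9 with a sample map: the matrix of partial
# derivatives applied to v matches (f(p + t v) - f(p))/t as t -> 0.
import math

def f(x, y):
    # sample differentiable map, not from the text
    return (x * x * y, x + math.sin(y))

p = (1.2, 0.4)
# partial derivatives at p, computed by hand from the formula above
J = ((2 * p[0] * p[1], p[0] ** 2),   # partials of f_1 = x^2 y
     (1.0, math.cos(p[1])))          # partials of f_2 = x + sin y

v = (0.3, -0.5)
Jv = (J[0][0] * v[0] + J[0][1] * v[1],
      J[1][0] * v[0] + J[1][1] * v[1])

t = 1e-6
fp = f(*p)
fq = f(p[0] + t * v[0], p[1] + t * v[1])
quot = ((fq[0] - fp[0]) / t, (fq[1] - fp[1]) / t)
print(Jv, quot)
```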

The converse of the proposition is not true. Just because the partial derivatives exist does not mean that the function is differentiable. See the exercises. However, when the partial derivatives are continuous, we will prove that the converse holds. One of the consequences of the proposition is that if \(f\) is differentiable on \(U\text{,}\) then \(f' \colon U \to L(\R^n,\R^m)\) is a continuous function if and only if all the \(\frac{\partial f_k}{\partial x_j}\) are continuous functions.
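To see how the converse can fail, here is a numerical look at a standard example (our illustrative choice; the exercises contain functions of the same kind): both partial derivatives at the origin exist, yet the function is not even continuous there.

```python
# A standard example (chosen here for illustration): f(x,y) = xy/(x^2+y^2)
# with f(0,0) = 0. Both partials exist at the origin, but f is not
# continuous there, hence not differentiable.

def f(x, y):
    if x == 0.0 and y == 0.0:
        return 0.0
    return (x * y) / (x * x + y * y)

# Both partials at the origin exist and equal 0, since f vanishes on the axes:
h = 1e-8
dfdx = (f(h, 0.0) - f(0.0, 0.0)) / h
dfdy = (f(0.0, h) - f(0.0, 0.0)) / h

# But along the diagonal x = y the value is constantly 1/2, so f does not
# tend to f(0,0) = 0 at the origin:
diag = [f(t, t) for t in (0.1, 0.01, 0.001)]
print(dfdx, dfdy, diag)
```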

### Subsection 8.3.3 Gradients, curves, and directional derivatives

Let \(U \subset \R^n\) be open and \(f \colon U \to \R\) a differentiable function. We define the *gradient* as
\[
\nabla f (x) := \sum_{j=1}^n \frac{\partial f}{\partial x_j}(x) \, e_j
= \left( \frac{\partial f}{\partial x_1}(x) , \frac{\partial f}{\partial x_2}(x) , \ldots, \frac{\partial f}{\partial x_n}(x) \right) .
\]

The gradient gives a way to represent the action of the derivative as a dot product: \(f'(x)\,v = \nabla f(x) \cdot v\text{.}\)

Suppose \(\gamma \colon (a,b) \subset \R \to \R^n\) is differentiable. Such a function and its image is sometimes called a *curve*, or a *differentiable curve*. Write \(\gamma =
(\gamma_1,\gamma_2,\ldots,\gamma_n)\text{.}\) For the purposes of computation, we identify \(L(\R^1)\) and \(\R\) as we did when we defined the derivative in one variable. We also identify \(L(\R^1,\R^n)\) with \(\R^n\text{.}\) We treat \(\gamma^{\:\prime}(t)\) both as an operator in \(L(\R^1,\R^n)\) and the vector \(\bigl(\gamma_1^{\:\prime}(t),
\gamma_2^{\:\prime}(t),\ldots,\gamma_n^{\:\prime}(t)\bigr)\) in \(\R^n\text{.}\) Using Proposition 8.3.9, if \(v\in \R^n\) is \(\gamma^{\:\prime}(t)\) acting as a vector, then \(h \mapsto h \, v\) (for \(h \in \R^1 = \R\)) is \(\gamma^{\:\prime}(t)\) acting as an operator in \(L(\R^1,\R^n)\text{.}\) We often use this slight abuse of notation when dealing with curves. The vector \(\gamma^{\:\prime}(t)\) is called a *tangent vector*. See Figure 8.5.

Suppose \(\gamma\bigl((a,b)\bigr) \subset U\) and let
\[
g(t) := f\bigl(\gamma(t)\bigr) .
\]

The function \(g\) is differentiable. Treating \(g'(t)\) as a number,
\[
g'(t)
= \sum_{j=1}^n \frac{\partial f}{\partial x_j}\bigl(\gamma(t)\bigr) \, \gamma_j^{\:\prime}(t)
= \sum_{j=1}^n \frac{\partial f}{\partial x_j} \, \gamma_j^{\:\prime} .
\]

For convenience, we often leave out the points where we are evaluating, such as above on the far right-hand side. With the notation of the gradient and the dot product the equation becomes
\[
g'(t) = (\nabla f)\bigl(\gamma(t)\bigr) \cdot \gamma^{\:\prime}(t) .
\]

We use this idea to define derivatives in a specific direction. A direction is simply a vector pointing in that direction. Pick a vector \(u \in \R^n\) such that \(\snorm{u} = 1\text{,}\) and fix \(x \in U\text{.}\) We define the *directional derivative* as
\[
D_u f(x) := \frac{d}{dt}\Big|_{t=0} \Bigl[ f(x+tu) \Bigr]
= \lim_{t \to 0} \frac{f(x+tu)-f(x)}{t} ,
\]

where the notation \(\frac{d}{dt}\big|_{t=0}\) represents the derivative evaluated at \(t=0\text{.}\) Taking the standard basis vector \(e_j\) we find \(\frac{\partial f}{\partial x_j} = D_{e_j} f\text{.}\) For this reason, sometimes the notation \(\frac{\partial f}{\partial u}\) is used instead of \(D_u f\text{.}\)

Let \(\gamma\) be defined by
\[
\gamma(t) := x + tu .
\]

Then \(\gamma^{\:\prime}(t) = u\) for all \(t\text{.}\) Let us see what happens to \(f\) when we travel along \(\gamma\text{:}\)
\[
D_u f(x)
= \frac{d}{dt}\Big|_{t=0} \Bigl[ f\bigl(\gamma(t)\bigr) \Bigr]
= (\nabla f)(x) \cdot \gamma^{\:\prime}(0)
= (\nabla f)(x) \cdot u .
\]

In fact, this computation holds whenever \(\gamma\) is any curve such that \(\gamma(0) = x\) and \(\gamma^{\:\prime}(0) = u\text{.}\)

Suppose \((\nabla f)(x) \neq 0\text{.}\) By the Cauchy–Schwarz inequality,
\[
\sabs{D_u f(x)} = \sabs{(\nabla f)(x) \cdot u} \leq \snorm{(\nabla f)(x)} \snorm{u} = \snorm{(\nabla f)(x)} .
\]

Equality is achieved when \(u\) is a scalar multiple of \((\nabla f)(x)\text{.}\) That is, when
\[
u = \frac{(\nabla f)(x)}{\snorm{(\nabla f)(x)}} ,
\]

we get \(D_u f(x) = \snorm{(\nabla f)(x)}\text{.}\) The gradient points in the direction in which the function grows fastest, in other words, in the direction in which \(D_u f(x)\) is maximal.
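Both facts can be sketched numerically with a sample function of our own choosing: the dot-product formula reproduces the one-variable limit defining \(D_u f\text{,}\) and scanning over unit vectors shows the maximum near \(\snorm{(\nabla f)(x)}\text{:}\)

```python
# Sketch (sample function assumed): D_u f(x) = grad f(x) . u, and over
# unit vectors u the directional derivative peaks in the gradient direction.
import math

def f(x, y):
    return x * x + 3 * x * y      # sample function, not from the text

def grad(x, y):
    return (2 * x + 3 * y, 3 * x)  # its gradient, computed by hand

def dir_deriv(x, y, u, t=1e-6):
    # one-variable derivative of t -> f(x + t u) at t = 0 (central difference)
    return (f(x + t * u[0], y + t * u[1]) - f(x - t * u[0], y - t * u[1])) / (2 * t)

p = (1.0, 2.0)
g = grad(*p)
norm = math.hypot(*g)
best = None
for k in range(360):
    th = 2 * math.pi * k / 360
    u = (math.cos(th), math.sin(th))
    d = dir_deriv(*p, u)
    # the dot-product formula should match the limit definition
    assert abs(d - (g[0] * u[0] + g[1] * u[1])) < 1e-4
    best = d if best is None else max(best, d)

# the maximal directional derivative is (about) the norm of the gradient
print(best, norm)
```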

### Subsection 8.3.4 The Jacobian

#### Definition 8.3.10.

Let \(U \subset \R^n\) be open and \(f \colon U \to \R^n\) be a differentiable mapping. Define the *Jacobian*^{ 2 }, or the *Jacobian determinant*, of \(f\) at \(x\) as
\[
J_f(x) := \det\bigl( f'(x) \bigr) .
\]

Sometimes \(J_f\) is written as
\[
J_f = \frac{\partial(f_1,f_2,\ldots,f_n)}{\partial(x_1,x_2,\ldots,x_n)} .
\]

This last piece of notation may seem somewhat confusing, but it is quite useful when we need to specify the exact variables and function components used, as we will do, for example, in the implicit function theorem.

The Jacobian \(J_f\) is a real-valued function, and when \(n=1\) it is simply the derivative. From the chain rule and the fact that \(\det(AB) = \det(A)\det(B)\text{,}\) it follows that:
\[
J_{g \circ f} (x) = J_g\bigl(f(x)\bigr)\, J_f(x) .
\]

The determinant of a linear mapping tells us what happens to area/volume under the mapping. Similarly, the Jacobian measures how much a differentiable mapping stretches things locally, and whether it flips orientation. In particular, if the Jacobian is non-zero, then we would expect the mapping to be locally invertible (and we will later see that it is).
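The determinant identity above can be sketched numerically with sample maps (our own choices), comparing \(J_{g \circ f}\) computed by finite differences against the product \(J_g\bigl(f(x)\bigr) J_f(x)\text{:}\)

```python
# Sketch (sample maps assumed): J_{g o f}(x) = J_g(f(x)) * J_f(x),
# the determinant version of the chain rule.
import math

def f(x, y):
    return (x + y * y, x * y)

def g(u, v):
    return (math.exp(u), u + v)

def jac(fun, x, y, h=1e-6):
    # 2x2 Jacobian matrix by central finite differences
    fx1, fx2 = fun(x + h, y)
    mx1, mx2 = fun(x - h, y)
    fy1, fy2 = fun(x, y + h)
    my1, my2 = fun(x, y - h)
    return (((fx1 - mx1) / (2 * h), (fy1 - my1) / (2 * h)),
            ((fx2 - mx2) / (2 * h), (fy2 - my2) / (2 * h)))

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

p = (0.3, 0.7)
comp = lambda x, y: g(*f(x, y))
lhs = det(jac(comp, *p))
rhs = det(jac(g, *f(*p))) * det(jac(f, *p))
print(lhs, rhs)
```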

### Subsection 8.3.5 Exercises

#### Exercise 8.3.1.

Suppose \(\gamma \colon (-1,1) \to \R^n\) and \(\alpha \colon (-1,1) \to \R^n\) are two differentiable curves such that \(\gamma(0) = \alpha(0)\) and \(\gamma^{\:\prime}(0) = \alpha'(0)\text{.}\) Suppose \(F \colon \R^n \to \R\) is a differentiable function. Show that
\[
\frac{d}{dt}\Big|_{t=0} \Bigl[ F\bigl(\gamma(t)\bigr) \Bigr]
= \frac{d}{dt}\Big|_{t=0} \Bigl[ F\bigl(\alpha(t)\bigr) \Bigr] .
\]

#### Exercise 8.3.2.

Let \(f \colon \R^2 \to \R\) be given by \(f(x,y) := \sqrt{x^2+y^2}\text{,}\) see Figure 8.6. Show that \(f\) is not differentiable at the origin.

#### Exercise 8.3.3.

Using only the definition of the derivative, show that the following \(f \colon \R^2 \to \R^2\) are differentiable at the origin and find their derivative.

1. \(f(x,y) := (1+x+xy,x)\text{,}\)

2. \(f(x,y) := \bigl(y-y^{10},x \bigr)\text{,}\)

3. \(f(x,y) := \bigl( {(x+y+1)}^2 , {(x-y+2)}^2 \bigr)\text{.}\)

#### Exercise 8.3.4.

Suppose \(f \colon \R \to \R\) and \(g \colon \R \to \R\) are differentiable functions. Using only the definition of the derivative, show that \(h \colon \R^2 \to \R^2\) defined by \(h(x,y) := \bigl(f(x),g(y)\bigr)\) is a differentiable function, and find the derivative, at all points \((x,y)\text{.}\)

#### Exercise 8.3.5.

Define a function \(f \colon \R^2 \to \R\) by (see Figure 8.7)
\[
f(x,y) :=
\begin{cases}
\dfrac{xy}{x^2+y^2} & \text{if } (x,y) \not= (0,0), \\
0 & \text{if } (x,y) = (0,0).
\end{cases}
\]

1. Show that the partial derivatives \(\frac{\partial f}{\partial x}\) and \(\frac{\partial f}{\partial y}\) exist at all points (including the origin).

2. Show that \(f\) is not continuous at the origin (and hence not differentiable).

#### Exercise 8.3.6.

Define a function \(f \colon \R^2 \to \R\) by (see Figure 8.8)
\[
f(x,y) :=
\begin{cases}
\dfrac{x^2 y}{x^2+y^2} & \text{if } (x,y) \not= (0,0), \\
0 & \text{if } (x,y) = (0,0).
\end{cases}
\]

1. Show that the partial derivatives \(\frac{\partial f}{\partial x}\) and \(\frac{\partial f}{\partial y}\) exist at all points.

2. Show that for all \(u \in \R^2\) with \(\snorm{u}=1\text{,}\) the directional derivative \(D_u f\) exists at all points.

3. Show that \(f\) is continuous at the origin.

4. Show that \(f\) is not differentiable at the origin.

#### Exercise 8.3.7.

Suppose \(f \colon \R^n \to \R^n\) is one-to-one, onto, differentiable at all points, and such that \(f^{-1}\) is also differentiable at all points.

1. Show that \(f'(p)\) is invertible at all points \(p\) and compute \({(f^{-1})}'\bigl(f(p)\bigr)\text{.}\) Hint: Consider \(x = f^{-1}\bigl(f(x)\bigr)\text{.}\)

2. Let \(g \colon \R^n \to \R^n\) be a function differentiable at \(q \in \R^n\) and such that \(g(q)=q\text{.}\) Suppose \(f(p) = q\) for some \(p \in \R^n\text{.}\) Show \(J_g(q) = J_{f^{-1} \circ g \circ f}(p)\text{,}\) where \(J_g\) is the Jacobian determinant.

#### Exercise 8.3.8.

Suppose \(f \colon \R^2 \to \R\) is differentiable and such that \(f(x,y) = 0\) if and only if \(y=0\) and such that \(\nabla f(0,0) = (0,1)\text{.}\) Prove that \(f(x,y) > 0\) whenever \(y > 0\text{,}\) and \(f(x,y) < 0\) whenever \(y < 0\text{.}\)

As for functions of one variable, \(f \colon U \to \R\) has a *relative maximum* at \(p \in U\) if there exists a \(\delta >0\) such that \(f(q) \leq f(p)\) for all \(q \in B(p,\delta) \cap U\text{.}\) Similarly for *relative minimum*.

#### Exercise 8.3.9.

Suppose \(U \subset \R^n\) is open and \(f \colon U \to \R\) is differentiable. Suppose \(f\) has a relative maximum at \(p \in U\text{.}\) Show that \(f'(p) = 0\text{,}\) that is, the zero mapping in \(L(\R^n,\R)\text{.}\) That is, \(p\) is a *critical point* of \(f\text{.}\)

#### Exercise 8.3.10.

Suppose \(f \colon \R^2 \to \R\) is differentiable and \(f(x,y) = 0\) whenever \(x^2+y^2 = 1\text{.}\) Prove that there exists at least one point \((x_0,y_0)\) such that \(\frac{\partial f}{\partial x}(x_0,y_0) = \frac{\partial f}{\partial y}(x_0,y_0) = 0\text{.}\)

#### Exercise 8.3.11.

Define \(f(x,y) := ( x-y^2 ) ( 2 y^2 - x)\text{.}\) The graph of \(f\) is called the *Peano surface*.^{ 5 }

1. Show that \((0,0)\) is a critical point, that is, \(f'(0,0) = 0\) (the zero linear map in \(L(\R^2,\R)\)).

2. Show that for every direction the restriction of \(f\) to a line through the origin in that direction has a relative maximum at the origin. In other words, for every \((x,y)\) such that \(x^2+y^2=1\text{,}\) the function \(g(t) := f(tx,ty)\) has a relative maximum at \(t=0\text{.}\) Hint: While not necessary, Section 4.3 makes this part easier.

3. Show that \(f\) does not have a relative maximum at \((0,0)\text{.}\)

#### Exercise 8.3.12.

Suppose \(f \colon \R \to \R^n\) is differentiable and \(\snorm{f(t)} = 1\) for all \(t\) (that is, we have a curve in the unit sphere). Show that \(f'(t) \cdot f(t) = 0\) (treating \(f'\) as a vector) for all \(t\text{.}\)

#### Exercise 8.3.13.

Define \(f \colon \R^2 \to \R^2\) by \(f(x,y) := \bigl(x,y+\varphi(x)\bigr)\) for some differentiable function \(\varphi\) of one variable. Show \(f\) is differentiable and find \(f'\text{.}\)

#### Exercise 8.3.14.

Suppose \(U \subset \R^n\) is open, \(p \in U\text{,}\) and \(f \colon U \to \R\text{,}\) \(g \colon U \to \R\text{,}\) \(h \colon U \to \R\) are functions such that \(f(p) = g(p) = h(p)\text{,}\) \(f\) and \(h\) are differentiable at \(p\text{,}\) \(f'(p) = h'(p)\text{,}\) and
\[
f(x) \leq g(x) \leq h(x)
\]

for all \(x \in U\text{.}\) Show that \(g\) is differentiable at \(p\) and \(g'(p) = f'(p) = h'(p)\text{.}\)

^{ 2 } Named after Carl Gustav Jacob Jacobi (1804–1851).

`https://en.wikipedia.org/wiki/Carl_Gustav_Jacob_Jacobi`

^{ 5 } Named after Giuseppe Peano (1858–1932).

`https://en.wikipedia.org/wiki/Giuseppe_Peano`