
Section 8.4 Continuity and the derivative

Note: 1–2 lectures

Subsection 8.4.1 Bounding the derivative

Let us prove a “mean value theorem” for vector-valued functions.


By the mean value theorem on the scalar-valued function \(t \mapsto \bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi(t)\text{,}\) where the dot is the dot product, we obtain that there is a \(t_0 \in (a,b)\) such that

\begin{equation*} \begin{split} \snorm{\varphi(b)-\varphi(a)}^2 & = \bigl( \varphi(b)-\varphi(a) \bigr) \cdot \bigl( \varphi(b)-\varphi(a) \bigr) \\ & = \bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi(b) - \bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi(a) \\ & = (b-a) \bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi'(t_0) , \end{split} \end{equation*}

where we treat \(\varphi'\) as a vector in \(\R^n\) by the abuse of notation we mentioned in the previous section. If we think of \(\varphi'(t)\) as a vector, then by Exercise 8.2.6, \(\snorm{\varphi'(t)}_{L(\R,\R^n)} = \snorm{\varphi'(t)}_{\R^n}\text{.}\) That is, the euclidean norm of the vector is the same as the operator norm of \(\varphi'(t)\text{.}\)

By the Cauchy–Schwarz inequality,

\begin{equation*} \snorm{\varphi(b)-\varphi(a)}^2 = (b-a)\bigl(\varphi(b)-\varphi(a) \bigr) \cdot \varphi'(t_0) \leq (b-a) \snorm{\varphi(b)-\varphi(a)} \, \snorm{\varphi'(t_0)} . \qedhere \end{equation*}
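As a numerical sanity check of the inequality just derived, here is a small Python sketch. The helix \(\varphi(t) = (\cos t, \sin t, t)\) on \([0,2\pi]\) is our own choice of example, not one from the text, and the helper names are hypothetical. On a sample grid it suggests that the norm inequality holds, while the vector equality \(\varphi(b)-\varphi(a) = (b-a)\varphi'(t_0)\) from the scalar mean value theorem fails at every sampled \(t_0\text{.}\)

```python
import math

# Assumed example curve (not from the text): a helix in R^3.
def phi(t):
    return (math.cos(t), math.sin(t), t)

def phi_prime(t):
    return (-math.sin(t), math.cos(t), 1.0)

def norm(v):
    return math.sqrt(sum(c * c for c in v))

a, b = 0.0, 2.0 * math.pi
diff = tuple(p - q for p, q in zip(phi(b), phi(a)))
lhs = norm(diff)  # ||phi(b) - phi(a)||

# Sample t_0 on an interior grid of (a, b).
grid = [a + (b - a) * i / 1000 for i in range(1, 1000)]

# Right-hand side (b - a) ||phi'(t_0)|| at each sampled t_0.
rhs_values = [(b - a) * norm(phi_prime(t)) for t in grid]

# Distance between phi(b) - phi(a) and (b - a) phi'(t_0): zero would mean
# the equality form of the mean value theorem holds at that t_0.
gaps = [
    norm(tuple(d - (b - a) * dp for d, dp in zip(diff, phi_prime(t))))
    for t in grid
]

print("||phi(b) - phi(a)||       =", lhs)
print("min (b-a)||phi'(t_0)||    =", min(rhs_values))
print("min gap in vector equality =", min(gaps))
```

For this particular curve the inequality is strict at every sampled point, and the gap in the vector equality never gets close to zero, which is why the lemma is stated as an inequality of norms.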

Recall that a set \(U\) is convex if whenever \(x,y \in U\text{,}\) the line segment from \(x\) to \(y\) lies in \(U\text{.}\)


Fix \(x\) and \(y\) in \(U\) and note that \((1-t)x+ty \in U\) for all \(t \in [0,1]\) by convexity. Next, by the chain rule,

\begin{equation*} \frac{d}{dt} \Bigl[f\bigl((1-t)x+ty\bigr)\Bigr] = f'\bigl((1-t)x+ty\bigr) (y-x) . \end{equation*}

By Lemma 8.4.1 applied to the function \(t \mapsto f\bigl((1-t)x+ty\bigr)\) on \([0,1]\text{,}\) there is some \(t_0 \in (0,1)\) such that

\begin{equation*} \begin{split} \snorm{f(x)-f(y)} & \leq \norm{\frac{d}{dt} \Big|_{t=t_0} \Bigl[ f\bigl((1-t)x+ty\bigr) \Bigr] } \\ & \leq \norm{f'\bigl((1-t_0)x+t_0y\bigr)} \, \snorm{y-x} \leq M \snorm{y-x} . \qedhere \end{split} \end{equation*}

Example 8.4.3.

If \(U\) is not convex the proposition is not true: Consider the set

\begin{equation*} U := \bigl\{ (x,y) : 0.5 < x^2+y^2 < 2 \bigr\} \setminus \bigl\{ (x,0) : x < 0 \bigr\} . \end{equation*}

For \((x,y) \in U\text{,}\) let \(f(x,y)\) be the angle that the line from the origin to \((x,y)\) makes with the positive \(x\)-axis. We even have a formula for \(f\text{:}\)

\begin{equation*} f(x,y) = 2 \operatorname{arctan}\left( \frac{y}{x+\sqrt{x^2+y^2}}\right) . \end{equation*}

Think of a spiral staircase with room in the middle. See Figure 8.9.

Figure 8.9. A non-Lipschitz function with uniformly bounded derivative.

The function is differentiable, and the derivative is bounded on \(U\text{,}\) which is not hard to see. Now think of what happens near where the negative \(x\)-axis cuts the annulus in half. As we approach this cut from positive \(y\text{,}\) \(f(x,y)\) approaches \(\pi\text{.}\) From negative \(y\text{,}\) \(f(x,y)\) approaches \(-\pi\text{.}\) So as \(\epsilon > 0\) tends to zero, \(\sabs{f(-1,\epsilon)-f(-1,-\epsilon)}\) approaches \(2\pi\text{,}\) but \(\snorm{(-1,\epsilon)-(-1,-\epsilon)} = 2\epsilon\text{,}\) which is arbitrarily small. The conclusion of the proposition does not hold for this nonconvex \(U\text{.}\)
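The jump across the cut can be seen numerically. The following Python sketch evaluates the arctangent formula above at the two points \((-1,\pm\epsilon)\) and prints the ratio of the change in \(f\) to the distance between the points. The function names and the particular values of \(\epsilon\) are our own choices for illustration.

```python
import math

def angle(x, y):
    # The formula for f from the example; it requires (x, y) to lie
    # off the nonpositive x-axis so that the denominator is positive.
    return 2.0 * math.atan(y / (x + math.sqrt(x * x + y * y)))

def jump_and_ratio(eps):
    # |f(-1, eps) - f(-1, -eps)| and that jump divided by the distance 2*eps.
    jump = abs(angle(-1.0, eps) - angle(-1.0, -eps))
    return jump, jump / (2.0 * eps)

for eps in (1e-1, 1e-2, 1e-3):
    jump, ratio = jump_and_ratio(eps)
    print(f"eps={eps:g}  jump={jump:.6f}  jump/distance={ratio:.1f}")
```

The jump stays near \(2\pi\) while the ratio grows without bound as \(\epsilon\) shrinks, so no single constant \(M\) can work in the conclusion of the proposition on this \(U\text{.}\)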

Let us solve the differential equation \(f' = 0\text{.}\)


For any given \(x \in U\text{,}\) there is a ball \(B(x,\delta) \subset U\text{.}\) The ball \(B(x,\delta)\) is convex. Since \(\snorm{f'(y)} \leq 0\) for all \(y \in B(x,\delta)\text{,}\) the proposition gives \(\snorm{f(x)-f(y)} \leq 0 \snorm{x-y} = 0\text{.}\) So \(f(x) = f(y)\) for all \(y \in B(x,\delta)\text{.}\)

This means that \(f^{-1}(c)\) is open for all \(c \in \R^m\text{.}\) Suppose \(f^{-1}(c)\) is nonempty. The two sets

\begin{equation*} U' = f^{-1}(c), \qquad U'' = f^{-1}\bigl(\R^m\setminus\{c\}\bigr) \end{equation*}

are open (\(U''\) is the union of the open sets \(f^{-1}(a)\) over all \(a \not= c\)) and disjoint, and further \(U = U' \cup U''\text{.}\) As \(U'\) is nonempty and \(U\) is connected, \(U'' = \emptyset\text{.}\) So \(f(x) = c\) for all \(x \in U\text{.}\)

Subsection 8.4.2 Continuously differentiable functions

Definition 8.4.5.

Let \(U \subset \R^n\) be open. We say \(f \colon U \to \R^m\) is continuously differentiable, or \(C^1(U)\text{,}\) if \(f\) is differentiable and \(f' \colon U \to L(\R^n,\R^m)\) is continuous.

Without continuity the theorem does not hold. Just because partial derivatives exist does not mean that \(f\) is differentiable; in fact, \(f\) may not even be continuous. See the exercises for the last section and also for this section.
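To make the warning concrete, here is a short Python sketch using the function \(f(x,y) = \frac{xy}{x^2+y^2}\) (with \(f(0,0)=0\)) that appears in the exercises of this section. The step sizes and variable names are our own. The difference quotients at the origin along both axes vanish, so both partial derivatives exist there, yet the values of \(f\) along the diagonal stay at \(\nicefrac{1}{2}\) rather than tending to \(f(0,0) = 0\text{.}\)

```python
def f(x, y):
    # The function from the exercises: xy / (x^2 + y^2), and 0 at the origin.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

steps = [10.0 ** (-k) for k in range(1, 8)]

# Difference quotients at the origin in the x and y directions.
dq_x = [(f(h, 0.0) - f(0.0, 0.0)) / h for h in steps]
dq_y = [(f(0.0, h) - f(0.0, 0.0)) / h for h in steps]

# Values of f along the diagonal (h, h) as h shrinks.
diagonal = [f(h, h) for h in steps]

print("x-direction quotients:", dq_x)
print("y-direction quotients:", dq_y)
print("values on the diagonal:", diagonal)
```

This is only numerical evidence, not a proof; the exercises ask for the actual argument.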


We proved that if \(f\) is differentiable, then the partial derivatives exist. The partial derivatives are the entries of the matrix of \(f'(x)\text{.}\) If \(f' \colon U \to L(\R^n,\R^m)\) is continuous, then the entries are continuous, and hence the partial derivatives are continuous.

To prove the opposite direction, suppose the partial derivatives exist and are continuous. Fix \(x \in U\text{.}\) If we show that \(f'(x)\) exists we are done, because the entries of the matrix \(f'(x)\) are the partial derivatives and if the entries are continuous functions, the matrix-valued function \(f'\) is continuous.

We do induction on dimension. First, the conclusion is true when \(n=1\text{.}\) In this case the derivative is just the regular derivative (see Exercise 8.4.9, noting that \(f\) is vector-valued).

Suppose the conclusion is true for \(\R^{n-1}\text{,}\) that is, if we restrict to the first \(n-1\) variables, the function is differentiable. It is easy to see that the first \(n-1\) partial derivatives of \(f\) restricted to the set where the last coordinate is fixed are the same as those for \(f\text{.}\) In the following, by a slight abuse of notation, we think of \(\R^{n-1}\) as a subset of \(\R^n\text{,}\) that is the set in \(\R^n\) where \(x_n = 0\text{.}\) In other words, we identify the vectors \((x_1,x_2,\ldots,x_{n-1})\) and \((x_1,x_2,\ldots,x_{n-1},0)\text{.}\) Let

\begin{equation*} A := \begin{bmatrix} \frac{\partial f_1}{\partial x_1}(x) & \ldots & \frac{\partial f_1}{\partial x_n}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(x) & \ldots & \frac{\partial f_m}{\partial x_n}(x) \end{bmatrix} , \qquad A' := \begin{bmatrix} \frac{\partial f_1}{\partial x_1}(x) & \ldots & \frac{\partial f_1}{\partial x_{n-1}}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(x) & \ldots & \frac{\partial f_m}{\partial x_{n-1}}(x) \end{bmatrix} , \qquad v := \begin{bmatrix} \frac{\partial f_1}{\partial x_n}(x) \\ \vdots \\ \frac{\partial f_m}{\partial x_n}(x) \end{bmatrix} . \end{equation*}

Let \(\epsilon > 0\) be given. By the induction hypothesis, there is a \(\delta > 0\) such that for every \(k \in \R^{n-1}\) with \(\snorm{k} < \delta\text{,}\) we have

\begin{equation*} \frac{\snorm{f(x+k) - f(x) - A' k}}{\snorm{k}} < \epsilon . \end{equation*}

By continuity of the partial derivatives, suppose \(\delta\) is small enough so that

\begin{equation*} \abs{\frac{\partial f_j}{\partial x_n}(x+h) - \frac{\partial f_j}{\partial x_n}(x)} < \epsilon \end{equation*}

for all \(j\) and all \(h \in \R^n\) with \(\snorm{h} < \delta\text{.}\)

Suppose \(h = k + t e_n\) is a vector in \(\R^n\text{,}\) where \(k \in \R^{n-1}\text{,}\) \(t \in \R\text{,}\) such that \(\snorm{h} < \delta\text{.}\) Then \(\snorm{k} \leq \snorm{h} < \delta\text{.}\) Note that \(Ah = A' k + tv\text{.}\)

\begin{equation*} \begin{split} \snorm{f(x+h) - f(x) - Ah} & = \snorm{f(x+k + t e_n) - f(x+k) - tv + f(x+k) - f(x) - A' k} \\ & \leq \snorm{f(x+k + t e_n) - f(x+k) -tv} + \snorm{f(x+k) - f(x) - A' k} \\ & \leq \snorm{f(x+k + t e_n) - f(x+k) -tv} + \epsilon \snorm{k} . \end{split} \end{equation*}

As all the partial derivatives exist, by the mean value theorem, for each \(j\) there is some \(\theta_j \in [0,t]\) (or \([t,0]\) if \(t < 0\)), such that

\begin{equation*} f_j(x+k + t e_n) - f_j(x+k) = t \frac{\partial f_j}{\partial x_n}(x+k+\theta_j e_n). \end{equation*}

Note that if \(\snorm{h} < \delta\text{,}\) then \(\snorm{k+\theta_j e_n} \leq \snorm{h} < \delta\text{.}\) We finish the estimate

\begin{equation*} \begin{split} \snorm{f(x+h) - f(x) - Ah} & \leq \snorm{f(x+k + t e_n) - f(x+k) -tv} + \epsilon \snorm{k} \\ & \leq \sqrt{\sum_{j=1}^m {\left(t\frac{\partial f_j}{\partial x_n}(x+k+\theta_j e_n) - t \frac{\partial f_j}{\partial x_n}(x)\right)}^2} + \epsilon \snorm{k} \\ & \leq \sqrt{m}\, \epsilon \sabs{t} + \epsilon \snorm{k} \\ & \leq (\sqrt{m}+1)\epsilon \snorm{h} . \qedhere \end{split} \end{equation*}
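The estimate says that the matrix \(A\) of partial derivatives really is the derivative: the quotient \(\frac{\snorm{f(x+h)-f(x)-Ah}}{\snorm{h}}\) goes to zero with \(h\text{.}\) The Python sketch below illustrates this for one map with continuous partial derivatives. The map \(f(x,y) = (x^2 y, \sin x + y^2)\text{,}\) the base point, and the direction of \(h\) are assumptions chosen only for illustration.

```python
import math

# Assumed example map R^2 -> R^2 with continuous partial derivatives.
def f(x, y):
    return (x * x * y, math.sin(x) + y * y)

def jacobian(x, y):
    # Matrix A of partial derivatives of f at (x, y), computed by hand.
    return ((2.0 * x * y, x * x),
            (math.cos(x), 2.0 * y))

def quotient(x, y, h1, h2):
    # ||f(x+h) - f(x) - A h|| / ||h|| for h = (h1, h2).
    fx = f(x, y)
    fxh = f(x + h1, y + h2)
    A = jacobian(x, y)
    remainder = [
        fxh[i] - fx[i] - (A[i][0] * h1 + A[i][1] * h2) for i in range(2)
    ]
    return math.hypot(remainder[0], remainder[1]) / math.hypot(h1, h2)

x0, y0 = 0.7, -1.3           # assumed base point
scales = (1e-1, 1e-2, 1e-3, 1e-4)
quotients = [quotient(x0, y0, 0.6 * s, 0.8 * s) for s in scales]

for s, q in zip(scales, quotients):
    print(f"||h||={s:g}  quotient={q:.3e}")
```

For this smooth example the quotient shrinks roughly in proportion to \(\snorm{h}\text{,}\) which is faster than the proposition requires; continuity of the partial derivatives alone only guarantees that it tends to zero.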

A common application is to prove that a certain function is differentiable. For example, let us show that all polynomials are differentiable, and in fact continuously differentiable, by computing the partial derivatives.


Consider the partial derivative of \(p\) in the \(x_n\) variable. Write \(p\) as

\begin{equation*} p(x) = \sum_{j=0}^d p_j(x_1,\ldots,x_{n-1}) \, x_n^j , \end{equation*}

where \(p_j\) are polynomials in one less variable. Then

\begin{equation*} \frac{\partial p}{\partial x_n}(x) = \sum_{j=1}^d p_j(x_1,\ldots,x_{n-1}) \, j x_n^{j-1} , \end{equation*}

which is again a polynomial. So the partial derivatives of polynomials exist and are again polynomials. By the continuity of algebraic operations, polynomials are continuous functions. Therefore \(p\) is continuously differentiable.
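The computation in the proof is mechanical enough to sketch in code. In the following Python sketch a polynomial is stored as a dictionary from exponent tuples to coefficients; this representation and the function name `partial` are our own hypothetical choices. Differentiating in one variable lowers that exponent by one and multiplies the coefficient by the old exponent, so the output is a dictionary of the same kind, that is, again a polynomial.

```python
def partial(poly, var):
    """Partial derivative of a polynomial in the variable with index `var`.

    `poly` maps exponent tuples (e_1, ..., e_n) to coefficients, so
    {(2, 3): 3} stands for 3 * x_1^2 * x_2^3.
    """
    result = {}
    for exps, coeff in poly.items():
        j = exps[var]
        if j == 0:
            continue  # the term does not involve this variable
        new_exps = exps[:var] + (j - 1,) + exps[var + 1:]
        result[new_exps] = result.get(new_exps, 0) + j * coeff
    return result

# p(x_1, x_2) = 3 x_1^2 x_2^3 + 5 x_2 - 7
p = {(2, 3): 3, (0, 1): 5, (0, 0): -7}

dp_dx1 = partial(p, 0)   # 6 x_1 x_2^3
dp_dx2 = partial(p, 1)   # 9 x_1^2 x_2^2 + 5

print(dp_dx1)
print(dp_dx2)
```

Since `partial` returns the same kind of object it takes, it can be applied repeatedly, mirroring the fact that partial derivatives of polynomials are polynomials.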

Subsection 8.4.3 Exercises

Exercise 8.4.1.

Define \(f \colon \R^2 \to \R\) as

\begin{equation*} f(x,y) := \begin{cases} (x^2+y^2)\sin\bigl({(x^2+y^2)}^{-1}\bigr) & \text{if } (x,y) \not= (0,0), \\ 0 & \text{if } (x,y) = (0,0). \end{cases} \end{equation*}

Show that \(f\) is differentiable at the origin, but that it is not continuously differentiable.
Note: Feel free to use what you know about sine and cosine from calculus.

Exercise 8.4.2.

Let \(f \colon \R^2 \to \R\) be the function from Exercise 8.3.5, that is,

\begin{equation*} f(x,y) := \begin{cases} \frac{xy}{x^2+y^2} & \text{if } (x,y) \not= (0,0), \\ 0 & \text{if } (x,y) = (0,0). \end{cases} \end{equation*}

Compute the partial derivatives \(\frac{\partial f}{\partial x}\) and \(\frac{\partial f}{\partial y}\) at all points and show that these are not continuous functions.

Exercise 8.4.3.

Let \(B(0,1) \subset \R^2\) be the unit ball, that is, the set given by \(x^2 + y^2 < 1\text{.}\) Suppose \(f \colon B(0,1) \to \R\) is a differentiable function such that \(\sabs{f(0,0)} \leq 1\text{,}\) and \(\babs{\frac{\partial f}{\partial x}} \leq 1\) and \(\babs{\frac{\partial f}{\partial y}} \leq 1\) for all points in \(B(0,1)\text{.}\)

  1. Find an \(M \in \R\) such that \(\snorm{f'(x,y)} \leq M\) for all \((x,y) \in B(0,1)\text{.}\)

  2. Find a \(B \in \R\) such that \(\sabs{f(x,y)} \leq B\) for all \((x,y) \in B(0,1)\text{.}\)

Exercise 8.4.4.

Define \(\varphi \colon [0,2\pi] \to \R^2\) by \(\varphi(t) = \bigl(\sin(t),\cos(t)\bigr)\text{.}\) Compute \(\varphi'(t)\) for all \(t\text{.}\) Compute \(\snorm{\varphi'(t)}\) for all \(t\text{.}\) Notice that \(\varphi'(t)\) is never zero, yet \(\varphi(0) = \varphi(2\pi)\text{.}\) Therefore, Rolle's theorem is not true in more than one dimension.

Exercise 8.4.5.

Let \(f \colon \R^2 \to \R\) be a function such that \(\frac{\partial f}{\partial x}\) and \(\frac{\partial f}{\partial y}\) exist at all points and there exists an \(M \in \R\) such that \(\babs{\frac{\partial f}{\partial x}} \leq M\) and \(\babs{\frac{\partial f}{\partial y}} \leq M\) at all points. Show that \(f\) is continuous.

Exercise 8.4.6.

Let \(f \colon \R^2 \to \R\) be a function and \(M \in \R\text{,}\) such that for every \((x,y) \in \R^2\text{,}\) the function \(g(t) := f(xt,yt)\) is differentiable and \(\sabs{g'(t)} \leq M\) for all \(t\text{.}\)

  1. Show that \(f\) is continuous at \((0,0)\text{.}\)

  2. Find an example of such an \(f\) that is discontinuous at every other point of \(\R^2\text{.}\)
    Hint: Think back to how we constructed a nowhere continuous function on \([0,1]\text{.}\)

Exercise 8.4.7.

Suppose \(r \colon \R^n \setminus X \to \R\) is a rational function, that is, let \(p \colon \R^n \to \R\) and \(q \colon \R^n \to \R\) be polynomials, \(q\) not identically zero, where \(X = q^{-1}(0)\text{,}\) and \(r = \frac{p}{q}\text{.}\) Show that \(r\) is continuously differentiable.

Exercise 8.4.8.

Suppose \(f \colon \R^n \to \R\) and \(h \colon \R^n \to \R\) are two differentiable functions such that \(f'(x) = h'(x)\) for all \(x \in \R^n\text{.}\) Prove that if \(f(0) = h(0)\text{,}\) then \(f(x) = h(x)\) for all \(x \in \R^n\text{.}\)

Exercise 8.4.9.

Prove the base case in Proposition 8.4.6. That is, prove that if \(n=1\) and “the partials exist and are continuous,” then the function is continuously differentiable. Note that \(f\) is vector-valued.

Exercise 8.4.10.

Suppose \(g \colon \R \to \R\) is continuously differentiable and \(h \colon \R^2 \to \R\) is continuous. Show that

\begin{equation*} F(x,y) := g(x) + \int_0^y h(x,s) \,ds \end{equation*}

is continuously differentiable, and that it is the solution of the partial differential equation \(\frac{\partial F}{\partial y} = h\text{,}\) with the initial condition \(F(x,0) = g(x)\) for all \(x \in \R\text{.}\)
