Intuitively, if a function is continuously differentiable, then it locally “behaves like” the derivative (which is a linear function). The idea of the inverse function theorem is that if a function is continuously differentiable and the derivative is invertible, the function is (locally) invertible.

Theorem 8.5.1. Inverse function theorem.

Let \(U \subset \R^n\) be an open set and let \(f \colon U \to \R^n\) be a continuously differentiable function. Suppose \(p \in U\) and \(f'(p)\) is invertible (that is, \(J_f(p) \not=0\)). Then there exist open sets \(V, W \subset \R^n\) such that \(p \in V \subset U\text{,}\) \(f(V) = W\text{,}\) and \(f|_V\) is one-to-one. Hence there is a function \(g \colon W \to V\) defined by \(g(y) \coloneqq (f|_V)^{-1}(y)\text{.}\) Furthermore, \(g\) is continuously differentiable and

\begin{equation*}
g'(y) = {\bigl(f'(x)\bigr)}^{-1}, \qquad \text{for all } x \in V, y = f(x).
\end{equation*}

To prove the theorem, we use the contraction mapping principle from Chapter 7, where we used it to prove Picard’s theorem. Recall that a mapping \(f \colon X \to Y\) between metric spaces \((X,d_X)\) and \((Y,d_Y)\) is a contraction if there exists a \(k < 1\) such that

\begin{equation*}
d_Y\bigl(f(p),f(q)\bigr) \leq k \, d_X(p,q)
\qquad \text{for all } p,q \in X.
\end{equation*}

The contraction mapping principle says that if \(f \colon X \to X\) is a contraction and \(X\) is a complete metric space, then there exists a unique fixed point, that is, there exists a unique \(x \in X\) such that \(f(x) = x\text{.}\)
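A minimal numerical sketch of the principle, assuming for illustration \(X = [0,1]\) with the usual metric and the map \(\cos\): it takes \([0,1]\) into \([\cos 1, 1] \subset [0,1]\), and by the mean value theorem \(\lvert \cos p - \cos q \rvert \leq (\sin 1) \lvert p-q \rvert\text{,}\) so it is a contraction with \(k = \sin 1 < 1\). The helper `fixed_point` is hypothetical, and its stopping rule is a practical choice rather than part of the theorem.

```python
import math

def fixed_point(phi, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{j+1} = phi(x_j) until successive iterates differ by less than tol.

    Assumes phi is a contraction of a complete metric space into itself, so the
    iterates form a Cauchy sequence converging to the unique fixed point.
    """
    x = x0
    for _ in range(max_iter):
        x_next = phi(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")

# cos maps [0, 1] into itself and is a contraction there with k = sin(1) < 1.
x_star = fixed_point(math.cos, 0.5)
print(x_star)  # approximately 0.739085
```

Any starting point in \([0,1]\) leads to the same limit, as uniqueness of the fixed point predicts.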

Proof.

Write \(A = f'(p)\text{.}\) As \(f'\) is continuous, there is an open ball \(V\) centered at \(p\) such that

\begin{equation*}
\snorm{A-f'(x)} < \frac{1}{2\snorm{A^{-1}}}
\qquad \text{for all } x \in V.
\end{equation*}

Consequently, the derivative \(f'(x)\) is invertible for all \(x \in V\) by Proposition 8.2.6.

Given \(y \in \R^n\text{,}\) define \(\varphi_y \colon V \to \R^n\) by

\begin{equation*}
\varphi_y (x) \coloneqq x + A^{-1}\bigl(y-f(x)\bigr) .
\end{equation*}

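As \(A^{-1}\) is one-to-one, \(\varphi_y(x) = x\) (\(x\) is a fixed point) if and only if \(y-f(x) = 0\text{,}\) or in other words \(f(x)=y\text{.}\) Using the chain rule we obtain

\begin{equation*}
\varphi_y'(x) = I - A^{-1} f'(x) = A^{-1} \bigl( A-f'(x) \bigr) .
\end{equation*}

So for \(x \in V\text{,}\) we have \(\snorm{\varphi_y'(x)} \leq \snorm{A^{-1}} \, \snorm{A-f'(x)} < \frac{1}{2}\text{.}\) As \(V\) is a ball, it is convex, and a continuously differentiable mapping on a convex set whose derivative has norm at most \(\frac{1}{2}\) is Lipschitz with constant \(\frac{1}{2}\text{.}\) Hence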

\begin{equation*}
\snorm{\varphi_y(x_1)-\varphi_y(x_2)} \leq \frac{1}{2} \snorm{x_1-x_2}
\qquad
\text{for all } x_1,x_2 \in V.
\end{equation*}

In other words, \(\varphi_y\) is a contraction defined on \(V\text{,}\) although we do not yet know the range of \(\varphi_y\text{.}\) We cannot yet apply the fixed point theorem, but we can say that \(\varphi_y\) has at most one fixed point in \(V\text{:}\) If \(\varphi_y(x_1) = x_1\) and \(\varphi_y(x_2) = x_2\text{,}\) then \(\snorm{x_1-x_2} = \snorm{\varphi_y(x_1)-\varphi_y(x_2)} \leq \frac{1}{2} \snorm{x_1-x_2}\text{,}\) so \(x_1 = x_2\text{.}\) That is, there exists at most one \(x \in V\) such that \(f(x) = y\text{,}\) and so \(f|_V\) is one-to-one.

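Let \(W \coloneqq f(V)\) and let \(g \colon W \to V\) be the inverse of \(f|_V\text{.}\) We need to show that \(W\) is open. Take a \(y_0 \in W\text{.}\) There is a unique \(x_0 \in V\) such that \(f(x_0) = y_0\text{.}\) Let \(r > 0\) be small enough that the closed ball \(C(x_0,r) \subset V\) (such \(r > 0\) exists as \(V\) is open).

Suppose \(y\) is such that

\begin{equation*}
\snorm{y-y_0} < \frac{r}{2\snorm{A^{-1}}} .
\end{equation*}

We show that every such \(y\) is in \(W\text{.}\) Since \(\varphi_y(x_0) = x_0 + A^{-1}(y-y_0)\text{,}\) for every \(x \in C(x_0,r)\) we have

\begin{equation*}
\begin{split}
\snorm{\varphi_y(x)-x_0}
& \leq \snorm{\varphi_y(x)-\varphi_y(x_0)} + \snorm{\varphi_y(x_0)-x_0} \\
& \leq \frac{1}{2}\snorm{x-x_0} + \snorm{A^{-1}(y-y_0)} \\
& \leq \frac{r}{2} + \snorm{A^{-1}} \, \snorm{y-y_0} \\
& < \frac{r}{2} + \snorm{A^{-1}} \frac{r}{2\snorm{A^{-1}}} = r .
\end{split}
\end{equation*}

So \(\varphi_y\) takes \(C(x_0,r)\) into \(B(x_0,r) \subset C(x_0,r)\text{.}\) It is a contraction on \(C(x_0,r)\text{,}\) and \(C(x_0,r)\) is complete (a closed subset of \(\R^n\) is complete). Apply the contraction mapping principle to obtain a fixed point \(x\text{,}\) i.e. \(\varphi_y(x) = x\text{.}\) That is, \(f(x) = y\text{,}\) and \(y \in f\bigl(C(x_0,r)\bigr) \subset f(V) = W\text{.}\) Hence \(W\) contains the ball \(B\bigl(y_0, r/(2\snorm{A^{-1}})\bigr)\text{,}\) and therefore \(W\) is open.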
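The proof finds the solution of \(f(x) = y\) by iterating \(\varphi_y(x) = x + A^{-1}\bigl(y-f(x)\bigr)\) with the derivative frozen at \(A = f'(p)\text{.}\) The sketch below does this numerically, assuming (for illustration only) \(f(x_1,x_2) = (x_1^2-x_2^2, 2x_1x_2)\) and \(p = (1,0)\text{,}\) so that \(A = 2I\) and \(A^{-1} = \frac{1}{2}I\text{.}\) For this \(f\) one checks \(\snorm{A - f'(x)} = 2\snorm{x-p}\text{,}\) so the condition in the proof holds on the ball \(V = B(p,1/2)\text{.}\) The helper names are hypothetical.

```python
def f(x):
    # Example map (assumed for illustration): f(x1, x2) = (x1^2 - x2^2, 2 x1 x2).
    x1, x2 = x
    return (x1 * x1 - x2 * x2, 2.0 * x1 * x2)

def phi(x, y):
    # phi_y(x) = x + A^{-1}(y - f(x)) with A = f'(1, 0) = 2I, so A^{-1} = I/2.
    fx = f(x)
    return (x[0] + 0.5 * (y[0] - fx[0]), x[1] + 0.5 * (y[1] - fx[1]))

def solve_near_p(y, p=(1.0, 0.0), iterations=100):
    # Fixed-point iteration started at p; phi_y is a contraction with
    # constant 1/2 on B(p, 1/2), where the iterates stay for this y.
    x = p
    for _ in range(iterations):
        x = phi(x, y)
    return x

y = (1.1, 0.1)        # a point close to f(p) = (1, 0)
x = solve_near_p(y)
print(x, f(x))        # f(x) agrees with y up to rounding
```

Each step costs only one evaluation of \(f\) and a multiplication by the fixed matrix \(A^{-1}\text{;}\) no derivative is recomputed, which is exactly what makes \(\varphi_y\) convenient in the proof.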

Next we need to show that \(g\) is continuously differentiable and compute its derivative. First, let us show that it is differentiable. Let \(y \in W\) and \(k \in \R^n\text{,}\) \(k\not= 0\text{,}\) such that \(y+k \in W\text{.}\) Because \(f|_V\) is a one-to-one mapping of \(V\) onto \(W\text{,}\) there are unique \(x \in V\) and \(h \in \R^n\) with \(x+h \in V\) such that \(f(x) = y\) and \(f(x+h) = y+k\text{;}\) moreover \(h \not= 0\) as \(k \not= 0\text{.}\) In other words, \(g(y) = x\) and \(g(y+k) = x+h\text{.}\) See Figure 8.12.

We can still squeeze some information from the fact that \(\varphi_y\) is a contraction.

\begin{equation*}
\varphi_y(x+h)-\varphi_y(x) = h + A^{-1} \bigl( f(x)-f(x+h) \bigr) = h - A^{-1} k .
\end{equation*}

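So \(\snorm{h-A^{-1}k} = \snorm{\varphi_y(x+h)-\varphi_y(x)} \leq \frac{1}{2}\snorm{x+h-x} = \frac{\snorm{h}}{2}\text{.}\) By the reverse triangle inequality, \(\snorm{h}-\snorm{A^{-1}k} \leq \frac{1}{2}\snorm{h}\text{,}\) so

\begin{equation*}
\snorm{h} \leq 2 \snorm{A^{-1}k} \leq 2 \snorm{A^{-1}} \, \snorm{k} .
\end{equation*}

In particular, as \(k\) goes to 0, so does \(h\text{.}\)

As \(x \in V\text{,}\) the derivative \(f'(x)\) is invertible. Let \(B \coloneqq \bigl(f'(x)\bigr)^{-1}\text{,}\) which is what we think the derivative of \(g\) at \(y\) is. Then

\begin{equation*}
\begin{split}
\frac{\snorm{g(y+k)-g(y)-Bk}}{\snorm{k}}
& = \frac{\snorm{h-Bk}}{\snorm{k}}
= \frac{\snorm{h-B\bigl(f(x+h)-f(x)\bigr)}}{\snorm{k}} \\
& = \frac{\snorm{B\bigl(f(x+h)-f(x)-f'(x)h\bigr)}}{\snorm{k}} \\
& \leq \snorm{B} \, \frac{\snorm{h}}{\snorm{k}} \, \frac{\snorm{f(x+h)-f(x)-f'(x)h}}{\snorm{h}} \\
& \leq 2\snorm{B}\,\snorm{A^{-1}} \, \frac{\snorm{f(x+h)-f(x)-f'(x)h}}{\snorm{h}} .
\end{split}
\end{equation*}

As \(k\) goes to 0, so does \(h\text{.}\) So the right-hand side goes to 0 as \(f\) is differentiable, and hence the left-hand side also goes to 0. And \(B\) is precisely what we wanted \(g'(y)\) to be.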

We have shown that \(g\) is differentiable; let us show it is \(C^1(W)\text{.}\) The function \(g \colon W \to V\) is continuous (it is differentiable), \(f'\) is a continuous function from \(V\) to \(L(\R^n)\text{,}\) and \(X \mapsto X^{-1}\) is a continuous function on the set of invertible operators. As \(g'(y) = {\bigl( f'\bigl(g(y)\bigr)\bigr)}^{-1}\) is the composition of these three continuous functions, it is continuous.
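A small numerical check of the formula \(g'(y) = \bigl(f'(x)\bigr)^{-1}\text{,}\) assuming the illustrative map \(f(x_1,x_2) = (x_1^2-x_2^2, 2x_1x_2)\text{,}\) which is complex squaring \(z \mapsto z^2\) written in real coordinates. Near \(x = (1,0)\) a local inverse \(g\) is the principal complex square root, available as `cmath.sqrt`. The sketch compares a central finite-difference Jacobian of \(g\) with the inverse of \(f'(x) = \left[\begin{smallmatrix} 2x_1 & -2x_2 \\ 2x_2 & 2x_1 \end{smallmatrix}\right]\text{;}\) the helper names are hypothetical.

```python
import cmath

def g(y):
    # Local inverse of f(x1, x2) = (x1^2 - x2^2, 2 x1 x2) near (1, 0):
    # the principal complex square root of y1 + i y2.
    w = cmath.sqrt(complex(y[0], y[1]))
    return (w.real, w.imag)

def jacobian_fd(func, y, eps=1e-6):
    # Central finite differences: entry [i][j] approximates d func_i / d y_j.
    jac = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        y_plus = list(y)
        y_minus = list(y)
        y_plus[j] += eps
        y_minus[j] -= eps
        f_plus, f_minus = func(y_plus), func(y_minus)
        for i in range(2):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

y = (1.1, 0.1)
x1, x2 = g(y)                                    # x = g(y), so f(x) = y
f_prime = [[2 * x1, -2 * x2], [2 * x2, 2 * x1]]  # f'(x)
predicted = inverse_2x2(f_prime)                 # (f'(x))^{-1}
approx = jacobian_fd(g, y)                       # numerical g'(y)
print(predicted)
print(approx)
```

The two printed matrices agree to roughly the accuracy of the finite-difference step.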

Corollary 8.5.2.

Suppose \(U \subset \R^n\) is open and \(f \colon U \to \R^n\) is a continuously differentiable mapping such that \(f'(x)\) is invertible for all \(x \in U\text{.}\) Then for every open set \(V \subset U\text{,}\) the set \(f(V)\) is open (\(f\) is said to be an open mapping).

Proof.

Without loss of generality, suppose \(U=V\text{.}\) For each \(y \in f(V)\text{,}\) pick \(x \in f^{-1}(y)\) (there could be more than one such point). By the inverse function theorem, there is a neighborhood of \(x\) in \(V\) that \(f\) maps onto a neighborhood of \(y\text{.}\) Hence \(f(V)\) is open.

Example 8.5.3.

The theorem and the corollary are not true if \(f'(x)\) is not invertible for some \(x\text{.}\) For example, the map \(f(x,y) \coloneqq (x,xy)\) maps \(\R^2\) onto the set \(\R^2 \setminus \bigl\{ (0,y) : y \neq 0 \bigr\}\text{,}\) which is neither open nor closed. In fact, \(f^{-1}(0,0) = \bigl\{ (0,y) : y \in \R \bigr\}\text{.}\) This bad behavior only occurs on the \(y\)-axis; everywhere else the function is locally invertible. If we avoid the \(y\)-axis, \(f\) is even one-to-one.
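To see the example concretely: the Jacobian determinant of \(f(x,y) = (x,xy)\) is \(\det \left[\begin{smallmatrix} 1 & 0 \\ y & x \end{smallmatrix}\right] = x\text{,}\) which vanishes exactly on the \(y\)-axis. Off the \(y\)-axis the inverse is explicit, \((a,b) \mapsto (a, b/a)\text{.}\) The following sketch (function names hypothetical) checks this and shows the whole \(y\)-axis collapsing to the origin.

```python
def f(x, y):
    return (x, x * y)

def jacobian_det(x, y):
    # f'(x, y) = [[1, 0], [y, x]], whose determinant is x.
    return x

def inverse_off_axis(a, b):
    # Valid only for a != 0: from (a, b) = (x, x y) we get x = a, y = b / a.
    if a == 0:
        raise ValueError("no local inverse on the y-axis")
    return (a, b / a)

# Every point of the y-axis is mapped to the origin.
print({f(0, t) for t in (-2, -1, 0, 1, 2)})   # {(0, 0)}
# Off the y-axis, inverse_off_axis undoes f.
print(f(*inverse_off_axis(3.0, 6.0)))          # (3.0, 6.0)
```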

Example 8.5.4.

Just because \(f'(x)\) is invertible everywhere does not mean that \(f\) is one-to-one. It is “locally” one-to-one, but perhaps not “globally.” Consider \(f \colon \R^2 \setminus \bigl\{ (0,0) \bigr\} \to
\R^2 \setminus \bigl\{ (0,0) \bigr\}\) defined by \(f(x,y) \coloneqq (x^2-y^2,2xy)\text{.}\) It is left to the reader to verify the following statements. The map \(f\) is differentiable and the derivative is invertible. On the other hand, \(f\) is 2-to-1 globally: For every \((a,b)\) that is not the origin, there are exactly two solutions to \(x^2-y^2=a\) and \(2xy=b\) (\(f\) is also onto). Notice that once you show that there is at least one solution, replacing \(x\) and \(y\) with \(-x\) and \(-y\) we obtain another solution.
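Identifying \((x,y)\) with the complex number \(z = x+iy\text{,}\) this map is \(z \mapsto z^2\text{,}\) which makes the 2-to-1 claim easy to test numerically. The sketch (helper names hypothetical) finds one preimage with `cmath.sqrt` and obtains the second by negation, as in the text.

```python
import cmath

def f(x, y):
    return (x * x - y * y, 2 * x * y)

def preimages(a, b):
    # Solutions of x^2 - y^2 = a, 2xy = b are the two square roots of a + ib.
    w = cmath.sqrt(complex(a, b))
    return [(w.real, w.imag), (-w.real, -w.imag)]

for (x, y) in preimages(3.0, 4.0):
    print((x, y), f(x, y))   # both points map to (3.0, 4.0) up to rounding
```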

The invertibility of the derivative is a sufficient, but not a necessary, condition for having a continuous inverse and for being an open mapping. For example, the function \(f(x) \coloneqq x^3\) is an open mapping from \(\R\) to \(\R\) and is globally one-to-one with a continuous inverse, although \(f'(0) = 0\) and the inverse is not differentiable at \(0\text{.}\)
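The inverse of \(x \mapsto x^3\) is the real cube root \(g\text{,}\) which is continuous, but its difference quotients at the origin are \(\frac{g(t)-g(0)}{t} = t^{-2/3}\) for \(t > 0\text{,}\) unbounded as \(t \to 0\text{.}\) A quick numerical look; the name `cbrt` is our own, and the sign is handled separately because Python's `**` on a negative float with a fractional exponent returns a complex number.

```python
import math

def cbrt(t):
    # Real cube root; copysign restores the sign of t.
    return math.copysign(abs(t) ** (1.0 / 3.0), t)

for t in (1e-3, 1e-6, 1e-9):
    # The difference quotient of cbrt at 0 equals t^(-2/3) and grows without bound.
    print(t, cbrt(t) / t)
```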

As a side note, there is a related famous, and as yet unsolved, problem called the Jacobian conjecture. If \(F \colon \R^n \to \R^n\) is polynomial (each component is a polynomial) and \(J_F\) (the Jacobian determinant) is a nonzero constant, does \(F\) have a polynomial inverse? The inverse function theorem gives a local \(C^1\) inverse, but the question is whether one can always find a global polynomial inverse.
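For a feel of the question, here is one map for which the answer is easily seen to be yes (the example is ours, not part of the conjecture's literature): with \(u = y + x^2\text{,}\) the map \(F(x,y) = \bigl(x + u^2, u\bigr)\) has \(J_F = (1 + 4xu) - 2u \cdot 2x = 1\text{,}\) and its inverse \((a,b) \mapsto \bigl(a - b^2, b - (a-b^2)^2\bigr)\) is again polynomial. The sketch checks both facts at integer points, where the arithmetic is exact.

```python
def F(x, y):
    u = y + x * x
    return (x + u * u, u)

def F_inverse(a, b):
    # From F(x, y) = (a, b): u = b, x = a - b^2, y = u - x^2.
    x = a - b * b
    return (x, b - x * x)

def jacobian_det(x, y):
    u = y + x * x
    # Partial derivatives of F1 = x + u^2 and F2 = u, with u = y + x^2.
    d1dx, d1dy = 1 + 4 * x * u, 2 * u
    d2dx, d2dy = 2 * x, 1
    return d1dx * d2dy - d1dy * d2dx

for x in range(-3, 4):
    for y in range(-3, 4):
        assert jacobian_det(x, y) == 1
        assert F_inverse(*F(x, y)) == (x, y)
        assert F(*F_inverse(x, y)) == (x, y)

print(F(1, 2), F_inverse(*F(1, 2)))   # (10, 3) (1, 2)
```

Such easy cases are exactly what make the conjecture tempting; the difficulty is the general statement.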

Subsection 8.5.1 Implicit function theorem

The inverse function theorem is a special case of the implicit function theorem, which we prove next, although, somewhat ironically, we prove the implicit function theorem using the inverse function theorem. In the inverse function theorem we showed that the equation \(x-f(y) = 0\) is solvable for \(y\) in terms of \(x\) if the derivative with respect to \(y\) is invertible, that is, if \(f'(y)\) is invertible. Then there is (locally) a function \(g\) such that \(x-f\bigl(g(x)\bigr) = 0\text{.}\)

In general, the equation \(f(x,y) = 0\) is not always solvable for \(y\) in terms of \(x\text{.}\) For instance, there is generally no solution when \(f(x,y)\) does not actually depend on \(y\text{.}\) For a more interesting example, notice that \(x^2+y^2-1 = 0\) defines the unit circle, and we can locally solve for \(y\) in terms of \(x\) when 1) we are near a point on the unit circle and 2) we are not at a point where the circle has a vertical tangency, that is, where \(\frac{\partial f}{\partial y} = 0\text{.}\)

We fix some notation. Let \((x,y) \in
\R^{n+m}\) denote the coordinates \((x_1,\ldots,x_n,y_1,\ldots,y_m)\text{.}\) We can then write a linear map \(A \in L(\R^{n+m},\R^m)\) as \(A = [ A_x ~ A_y ]\) so that \(A(x,y) = A_x x + A_y y\text{,}\) where \(A_x \in L(\R^n,\R^m)\) and \(A_y \in L(\R^m)\text{.}\) First, the linear version of the theorem.

Proposition 8.5.5.

Let \(A = [A_x~A_y] \in L(\R^{n+m},\R^m)\) and suppose \(A_y\) is invertible. If \(B = - {(A_y)}^{-1} A_x\text{,}\) then

\begin{equation*}
0 = A ( x, Bx) = A_x x + A_y Bx .
\end{equation*}

Furthermore, \(y=Bx\) is the unique \(y \in \R^m\) such that \(A(x,y) = 0\text{.}\)
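A sketch of Proposition 8.5.5 in exact rational arithmetic, assuming the hypothetical data \(n = m = 2\text{,}\) \(A_x = \left[\begin{smallmatrix} 1 & 0 \\ 3 & -1 \end{smallmatrix}\right]\text{,}\) \(A_y = \left[\begin{smallmatrix} 2 & 1 \\ 1 & 1 \end{smallmatrix}\right]\) (so \(\det A_y = 1\)). It forms \(B = -{(A_y)}^{-1}A_x\) and checks that \(A_x x + A_y Bx = 0\) for a sample \(x\text{.}\) The small matrix helpers are our own.

```python
from fractions import Fraction

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N))) for j in range(len(N[0]))]
            for i in range(len(M))]

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("A_y is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

A_x = [[1, 0], [3, -1]]
A_y = [[2, 1], [1, 1]]
B = [[-entry for entry in row] for row in matmul(inverse_2x2(A_y), A_x)]
print(B)          # [[2, -1], [-5, 2]] as Fractions

x = [5, -7]
residual = [u + v for u, v in zip(matvec(A_x, x), matvec(A_y, matvec(B, x)))]
print(residual)   # [Fraction(0, 1), Fraction(0, 1)]
```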

The proof is immediate: \(A_x x + A_y y = 0\) if and only if \(y = -{(A_y)}^{-1} A_x x = Bx\text{.}\) Another way to solve is to “complete the basis,” that is, add rows to the matrix until we have an invertible matrix: The operator in \(L(\R^{n+m})\) given by \((x,y) \mapsto (x,A_x x + A_y y)\) is invertible, and the map \(B\) can be read off from the inverse. Let us show that the same can be done for \(C^1\) functions.
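The “complete the basis” remark can be made explicit. Multiplying out the block matrices (with \(A_y\) invertible) gives

\begin{equation*}
\begin{bmatrix} I & 0 \\ A_x & A_y \end{bmatrix}
\begin{bmatrix} I & 0 \\ -{(A_y)}^{-1} A_x & {(A_y)}^{-1} \end{bmatrix}
=
\begin{bmatrix} I & 0 \\ A_x - A_y {(A_y)}^{-1} A_x & A_y {(A_y)}^{-1} \end{bmatrix}
=
\begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix} ,
\end{equation*}

so the lower-left block of the inverse is exactly \(B = -{(A_y)}^{-1}A_x\text{.}\) Applying the inverse to \((x,0)\) gives \((x,Bx)\text{,}\) the point whose image under \((x,y) \mapsto (x,A_x x + A_y y)\) is \((x,0)\text{,}\) that is, the solution of \(A(x,y) = 0\text{.}\)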

Theorem 8.5.6. Implicit function theorem.

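Let \(U \subset \R^{n+m}\) be an open set and let \(f \colon U \to \R^m\) be a \(C^1(U)\) mapping. Let \((p,q) \in U\) be a point such that \(f(p,q) = 0\) and such that

\begin{equation*}
\frac{\partial(f_1,\ldots,f_m)}{\partial(y_1,\ldots,y_m)} (p,q) \neq 0 .
\end{equation*}

Then there exists an open set \(W \subset \R^n\) with \(p \in W\text{,}\) an open set \(W' \subset \R^m\) with \(q \in W'\text{,}\) where \(W \times W' \subset U\text{,}\) and a \(C^1(W)\) map \(g \colon W \to W'\text{,}\) with \(g(p) = q\text{,}\) and for all \(x \in W\text{,}\) the point \(g(x)\) is the unique point in \(W'\) such that

\begin{equation*}
f\bigl(x,g(x)\bigr) = 0 .
\end{equation*}

Furthermore, if \(A = [ A_x ~ A_y ] = f'(p,q)\text{,}\) then

\begin{equation*}
g'(p) = -{(A_y)}^{-1} A_x .
\end{equation*}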

The condition \(\frac{\partial(f_1,\ldots,f_m)}{\partial(y_1,\ldots,y_m)} (p,q) =
\det(A_y) \neq 0\) simply means that \(A_y\) is invertible. If \(n=m=1\text{,}\) the condition is \(\frac{\partial f}{\partial y}(p,q) \not= 0\text{,}\) and \(W\) and \(W'\) are open intervals. See Figure 8.13.
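For \(n = m = 1\) the implicit function can be computed pointwise by Newton's method in \(y\) alone, with \(x\) held fixed. The sketch assumes the circle example \(f(x,y) = x^2+y^2-1\) at \((p,q) = (0.6, 0.8)\text{,}\) where \(\frac{\partial f}{\partial y}(p,q) = 1.6 \neq 0\text{.}\) Differentiating \(f\bigl(x,g(x)\bigr) = 0\) with the chain rule gives \(g'(p) = -\frac{\partial f}{\partial x}(p,q) \big/ \frac{\partial f}{\partial y}(p,q) = -\frac{1.2}{1.6} = -0.75\text{,}\) and the sketch compares this with a finite difference. Newton's method is our choice of solver; the proof below uses the inverse function theorem instead. Function names are hypothetical.

```python
def f(x, y):
    return x * x + y * y - 1.0

def df_dy(x, y):
    return 2.0 * y

def g(x, q=0.8, steps=50):
    # Solve f(x, y) = 0 for y by Newton's method in y, starting at y = q.
    y = q
    for _ in range(steps):
        y -= f(x, y) / df_dy(x, y)
    return y

p = 0.6
print(g(p))                                # approximately 0.8
h = 1e-6
print((g(p + h) - g(p - h)) / (2 * h))     # approximately -0.75
```

Starting Newton at \(y = q\) matters: it selects the branch of solutions through \((p,q)\text{,}\) just as the theorem only speaks about solutions in the neighborhood \(W'\) of \(q\text{.}\)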

Proof.

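Define \(F \colon U \to \R^{n+m}\) by \(F(x,y) \coloneqq \bigl(x,f(x,y)\bigr)\text{.}\) It is clear that \(F\) is \(C^1\text{,}\) and we want to show that its derivative at \((p,q)\) is invertible. Let us compute the derivative. With \(A = [ A_x ~ A_y ] = f'(p,q)\text{,}\) the quotient

\begin{equation*}
\frac{\snorm{F(p+h,q+k)-F(p,q)-(h,A_x h+A_y k)}}{\snorm{(h,k)}}
=
\frac{\snorm{f(p+h,q+k)-f(p,q)-A_x h-A_y k}}{\snorm{(h,k)}}
\end{equation*}

goes to zero as \(\snorm{(h,k)} = \sqrt{\snorm{h}^2+\snorm{k}^2}\) goes to zero, since \(f\) is differentiable at \((p,q)\) with derivative \(A\text{.}\)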

So the derivative of \(F\) at \((p,q)\) takes \((h,k)\) to \((h,A_x h+A_y k)\text{.}\) In block matrix form, it is \(\left[\begin{smallmatrix}I & 0\\A_x & A_y\end{smallmatrix}\right]\text{.}\) If \((h,A_x h+A_y k) = (0,0)\text{,}\) then \(h=0\text{,}\) and so \(A_y k = 0\text{.}\) As \(A_y\) is one-to-one, \(k=0\text{.}\) Thus \(F'(p,q)\) is one-to-one, and hence invertible. We apply the inverse function theorem.

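That is, there exists an open set \(V \subset \R^{n+m}\) with \(F(p,q) = (p,0) \in V\text{,}\) and a \(C^1\) mapping \(G \colon V \to \R^{n+m}\text{,}\) such that \(F\bigl(G(x,s)\bigr) = (x,s)\) for all \((x,s) \in V\text{,}\) \(G\) is one-to-one, and \(G(V)\) is open. Write \(G = (G_1,G_2)\) (the first \(n\) and the next \(m\) components of \(G\)). Then

\begin{equation*}
F\bigl(G_1(x,s),G_2(x,s)\bigr) = \Bigl(G_1(x,s),f\bigl(G_1(x,s),G_2(x,s)\bigr)\Bigr) = (x,s) .
\end{equation*}

So \(x = G_1(x,s)\) and \(f\bigl(x,G_2(x,s)\bigr) = s\text{.}\) Plugging in \(s=0\text{,}\) we obtain

\begin{equation*}
f\bigl(x,G_2(x,0)\bigr) = 0 .
\end{equation*}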

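As the set \(G(V)\) is open and \((p,q) \in G(V)\text{,}\) there exist some open sets \(\widetilde{W}\) and \(W'\) such that \(\widetilde{W} \times W' \subset G(V)\) with \(p \in \widetilde{W}\) and \(q \in W'\text{.}\) Take \(W \coloneqq \bigl\{ x \in \widetilde{W} : (x,0) \in V \text{ and } G_2(x,0) \in W' \bigr\}\text{.}\) The set of \(x\) with \((x,0) \in V\) is open, and the function that takes \(x\) to \(G_2(x,0)\) is continuous on it; therefore \(W\) is open. It contains \(p\) as \(G(p,0) = (p,q)\text{.}\) Define \(g \colon W \to W'\) by \(g(x) \coloneqq G_2(x,0)\text{,}\) which is the \(g\) in the theorem: it is \(C^1\) as \(G\) is, and \(f\bigl(x,g(x)\bigr) = 0\text{.}\) The fact that \(g(x)\) is the unique point in \(W'\) follows because \(W \times W' \subset G(V)\) and \(G\) is one-to-one. Finally, differentiating \(f\bigl(x,g(x)\bigr) = 0\) at \(x = p\) using the chain rule gives \(0 = A_x + A_y g'(p)\text{,}\) that is, \(g'(p) = -{(A_y)}^{-1} A_x\text{.}\)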

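Given a point \((p,q)\) with \(f(p,q) = 0\text{,}\) the theorem guarantees a local solution \(y = g(x)\) of the system \(f_1(x,y) = 0, \ldots, f_m(x,y) = 0\) near \((p,q)\) if \(f=(f_1,f_2,\ldots,f_m)\) is a \(C^1\) map (the components are \(C^1\text{:}\) partial derivatives in all variables exist and are continuous) and the matrix

\begin{equation*}
\begin{bmatrix}
\frac{\partial f_1}{\partial y_1} & \cdots & \frac{\partial f_1}{\partial y_m} \\
\vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial y_1} & \cdots & \frac{\partial f_m}{\partial y_m}
\end{bmatrix}
\end{equation*}

is invertible at \((p,q)\text{.}\)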

In other words, near the origin the set of solutions is a smooth curve in \(\R^3\) that goes through the origin. The theorem does not tell us how to find \(y(x)\) and \(z(x)\) explicitly, it just tells us they exist.

An interesting, and sometimes useful, observation from the proof is that we solved the equation \(f\bigl(x,g(x)\bigr) = s\) for all \(s\) in some neighborhood of \(0\text{,}\) not just \(s=0\text{.}\)

Remark 8.5.8.

There are versions of the theorem for arbitrarily many derivatives: If \(f\) has \(k\) continuous derivatives (see the next section), then the solution has \(k\) continuous derivatives as well.

Exercise 8.5.1.

Let \(C \coloneqq \bigl\{ (x,y) \in \R^2 : x^2+y^2 = 1 \bigr\}\text{.}\)

Solve for \(y\) in terms of \(x\) near \((0,1)\) (that is, find the function \(g\) from the implicit function theorem for a neighborhood of the point \((p,q) = (0,1)\)).

Solve for \(y\) in terms of \(x\) near \((0,-1)\text{.}\)

Solve for \(x\) in terms of \(y\) near \((-1,0)\text{.}\)

Exercise 8.5.2.

Define \(f \colon \R^2 \to \R^2\) by \(f(x,y) \coloneqq
\bigl(x,y+h(x)\bigr)\) for some continuously differentiable function \(h\) of one variable.

Show that \(f\) is one-to-one and onto.

Compute \(f'\text{.}\) (Make sure to argue why \(f'\) exists.)

Show that \(f'\) is invertible at all points, and compute its inverse.

Show that \(f\) is not one-to-one; in fact, for every \((a,b) \in \R^2 \setminus \bigl\{ (0,0) \bigr\}\text{,}\) there exist infinitely many different points \((x,y) \in \R^2\) such that \(f(x,y) = (a,b)\text{.}\)

Therefore, an invertible derivative at every point does not imply that \(f\) is invertible globally. Note: Feel free to use what you know about sine and cosine from calculus.

Exercise 8.5.4.

Find a map \(f \colon \R^n \to \R^n\) that is one-to-one, onto, continuously differentiable, but \(f'(0) = 0\text{.}\) Hint: Generalize \(f(x) = x^3\) from one to \(n\) dimensions.

Exercise 8.5.5.

Consider \(z^2 + xz + y =0\) in \(\R^3\text{.}\) Find an equation \(D(x,y)=0\text{,}\) such that if \(D(x_0,y_0) \not= 0\) and \(z^2+x_0z+y_0 = 0\) for some \(z \in \R\text{,}\) then for points near \((x_0,y_0)\) there exist exactly two distinct continuously differentiable functions \(r_1(x,y)\) and \(r_2(x,y)\) such that \(z=r_1(x,y)\) and \(z=r_2(x,y)\) solve \(z^2 + xz + y =0\text{.}\) Do you recognize the expression \(D\) from algebra?

Exercise 8.5.6.

Suppose \(f \colon (a,b) \to \R^2\) is continuously differentiable and the first component (the \(x\) component) of \(\nabla f(t)\) is not equal to 0 for all \(t \in (a,b)\text{.}\) Prove that there exists an open interval \(I \subset \R\) and a continuously differentiable function \(g \colon I \to \R\) such that \((x,y) \in f\bigl((a,b)\bigr)\) if and only if \(x \in I\) and \(y=g(x)\text{.}\) In other words, the set \(f\bigl((a,b)\bigr)\) is a graph of \(g\text{.}\)

Show that \(f\) is not one-to-one in every neighborhood of the origin (it is not locally invertible, that is, the inverse function theorem does not work).

Show that \(f\) is not continuously differentiable.

Note: Feel free to use what you know about sine and cosine from calculus.

Exercise 8.5.8.

(Polar coordinates) Define a mapping \(F(r,\theta) \coloneqq \bigl(r \cos(\theta), r \sin(\theta) \bigr)\text{.}\)

Show that \(F\) is continuously differentiable (for all \((r,\theta) \in
\R^2\)).

Compute \(F'(0,\theta)\) for all \(\theta\text{.}\)

Show that if \(r \not= 0\text{,}\) then \(F'(r,\theta)\) is invertible, therefore an inverse of \(F\) exists locally as long as \(r \not= 0\text{.}\)

Show that \(F \colon \R^2 \to \R^2\) is onto, and for each point \((x,y) \in
\R^2\text{,}\) the set \(F^{-1}(x,y)\) is infinite.

Show that \(F \colon \R^2 \to \R^2\) is an open map, despite not satisfying the condition of the inverse function theorem.

Show that \(F|_{(0,\infty) \times [0,2\pi)}\) is one-to-one and onto \(\R^2 \setminus \bigl\{ (0,0) \bigr\}\text{.}\)

Note: Feel free to use what you know about sine and cosine from calculus.

Exercise 8.5.9.

Let \(H \coloneqq \bigl\{ (x,y) \in \R^2 : y > 0 \bigr\}\text{,}\) and for \((x,y) \in H\) define

Prove that \(F\) is a bijective mapping from \(H\) to \(B(0,1)\text{,}\) it is continuously differentiable on \(H\text{,}\) and its inverse is also continuously differentiable.

Exercise 8.5.10.

Suppose \(U \subset \R^2\) is open and \(f \colon U \to \R\) is a \(C^1\) function such that \(\nabla f(x,y) \not= 0\) for all \((x,y) \in U\text{.}\) Show that every level set is a \(C^1\) smooth curve. That is, for every \((x,y) \in U\text{,}\) there exists a \(C^1\) function \(\gamma \colon (-\delta,\delta)
\to \R^2\) with \(\gamma^{\:\prime}(0) \not= 0\) such that \(f\bigl(\gamma(t)\bigr)\) is constant for all \(t \in (-\delta,\delta)\text{.}\)

Exercise 8.5.11.

Suppose \(U \subset \R^2\) is open and \(f \colon U \to \R\) is a \(C^1\) function such that \(\nabla f(x,y) \not= 0\) for all \((x,y) \in U\text{.}\) Show that for every \((x,y) \in U\) there exist a neighborhood \(V\) of \((x,y)\text{,}\) an open set \(W \subset \R^2\text{,}\) and a bijective \(C^1\) function \(g \colon W \to V\) with a \(C^1\) inverse such that the level sets of \(f \circ g\) are horizontal lines in \(W\text{,}\) that is, the set given by \((f \circ g) (s,t) = c\) for a constant \(c\) is a set of the form \(\bigl\{ (s,t_0) \in \R^2 : s \in \R, (s,t_0) \in W \bigr\}\text{,}\) where \(t_0\) is fixed. That is, the level curves can be locally “straightened.”

For a higher quality printout use the PDF versions: https://www.jirka.org/ra/realanal.pdf or https://www.jirka.org/ra/realanal2.pdf