## Section 8.5 Inverse and implicit function theorems

*Note: 2–3 lectures*

To prove the inverse function theorem, we use the contraction mapping principle from Chapter 7, where we used it to prove Picard's theorem. Recall that a mapping \(f \colon X \to Y\) between two metric spaces \((X,d_X)\) and \((Y,d_Y)\) is called a contraction if there exists a \(k < 1\) such that

\[
d_Y\bigl(f(p),f(q)\bigr) \leq k \, d_X(p,q) \qquad \text{for all } p,q \in X.
\]

The contraction mapping principle says that if \(f \colon X \to X\) is a contraction and \(X\) is a complete metric space, then there exists a unique fixed point, that is, there exists a unique \(x \in X\) such that \(f(x) = x\text{.}\)
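The principle is constructive: starting from any point and repeatedly applying the map produces a sequence that converges to the fixed point. Below is a minimal Python sketch. As an illustration it assumes the contraction \(\cos\) on the complete metric space \([0,1]\), which maps \([0,1]\) into itself and satisfies \(\lvert \cos'(t) \rvert = \lvert \sin(t) \rvert \leq \sin(1) < 1\) there. The helper name `fixed_point` and the stopping tolerance are illustrative choices, not from the text.

```python
import math

def fixed_point(phi, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{n+1} = phi(x_n) until two successive iterates differ by less than tol."""
    x = x0
    for _ in range(max_iter):
        x_next = phi(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("no convergence within max_iter steps")

# cos maps [0, 1] into [cos(1), 1], a subset of [0, 1], and |cos'(t)| = |sin(t)| <= sin(1) < 1
# there, so cos is a contraction of the complete metric space [0, 1] with constant k = sin(1).
x_star = fixed_point(math.cos, 0.5)
print(x_star)  # approximately 0.739085
```

The iterates converge at a geometric rate governed by the contraction constant. This is why the proof below works to make the constant \(\frac{1}{2}\).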

Intuitively, if a function is continuously differentiable, then it locally “behaves like” the derivative (which is a linear function). The idea of the inverse function theorem is that if a function is continuously differentiable and the derivative is invertible, the function is (locally) invertible.

### Theorem 8.5.1. Inverse function theorem.

Let \(U \subset \R^n\) be an open set and let \(f \colon U \to \R^n\) be a continuously differentiable function. Suppose \(p \in U\) and \(f'(p)\) is invertible (that is, \(J_f(p) \not=0\)). Then there exist open sets \(V, W \subset \R^n\) such that \(p \in V \subset U\text{,}\) \(f(V) = W\text{,}\) and \(f|_V\) is one-to-one. Hence there is a function \(g \colon W \to V\) given by \(g(y) := (f|_V)^{-1}(y)\text{.}\) See Figure 8.10. Furthermore, \(g\) is continuously differentiable and

\[
g'(y) = {\bigl(f'(x)\bigr)}^{-1} \qquad \text{for all } x \in V,\ y = f(x).
\]

### Proof.

Write \(A = f'(p)\text{.}\) As \(f'\) is continuous, there exists an open ball \(V\) around \(p\) such that

\[
\snorm{A-f'(x)} < \frac{1}{2\snorm{A^{-1}}} \qquad \text{for all } x \in V.
\]

Consequently, the derivative \(f'(x)\) is invertible for all \(x \in V\) by Proposition 8.2.6.

Given \(y \in \R^n\text{,}\) we define \(\varphi_y \colon V \to \R^n\) by

\[
\varphi_y(x) := x + A^{-1}\bigl(y-f(x)\bigr).
\]

As \(A^{-1}\) is one-to-one, \(\varphi_y(x) = x\) (\(x\) is a fixed point) if and only if \(y-f(x) = 0\text{,}\) or in other words \(f(x)=y\text{.}\) Using the chain rule we obtain

\[
\varphi_y'(x) = I - A^{-1}f'(x) = A^{-1}\bigl(A-f'(x)\bigr).
\]

So for \(x \in V\text{,}\) we have

\[
\snorm{\varphi_y'(x)} \leq \snorm{A^{-1}} \, \snorm{A-f'(x)} < \frac{1}{2}.
\]

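As \(V\) is a ball, it is convex, so the bound on the derivative bounds the differences of \(\varphi_y\text{.}\) Hence

\[
\snorm{\varphi_y(x_1)-\varphi_y(x_2)} \leq \frac{1}{2}\snorm{x_1-x_2} \qquad \text{for all } x_1,x_2 \in V.
\]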

In other words, \(\varphi_y\) is a contraction defined on \(V\text{,}\) though so far we do not know the range of \(\varphi_y\text{.}\) We cannot yet apply the fixed point theorem, but we can say that \(\varphi_y\) has at most one fixed point in \(V\text{:}\) If \(\varphi_y(x_1) = x_1\) and \(\varphi_y(x_2) = x_2\text{,}\) then \(\snorm{x_1-x_2} = \snorm{\varphi_y(x_1)-\varphi_y(x_2)} \leq \frac{1}{2} \snorm{x_1-x_2}\text{,}\) so \(x_1 = x_2\text{.}\) That is, there exists at most one \(x \in V\) such that \(f(x) = y\text{,}\) and so \(f|_V\) is one-to-one.

Let \(W := f(V)\) and let \(g \colon W \to V\) be the inverse of \(f|_V\text{.}\) We need to show that \(W\) is open. Take a \(y_0 \in W\text{.}\) There is a unique \(x_0 \in V\) such that \(f(x_0) = y_0\text{.}\) Let \(r > 0\) be small enough such that the closed ball \(C(x_0,r) \subset V\) (such \(r > 0\) exists as \(V\) is open).

Suppose \(y\) is such that

\[
\snorm{y-y_0} < \frac{r}{2\snorm{A^{-1}}}.
\]

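If we show that \(y \in W\text{,}\) then we have shown that \(W\) is open. If \(x_1 \in C(x_0,r)\text{,}\) then, using \(\varphi_y(x_0)-x_0 = A^{-1}(y-y_0)\text{,}\)

\[
\begin{aligned}
\snorm{\varphi_y(x_1)-x_0}
&\leq \snorm{\varphi_y(x_1)-\varphi_y(x_0)} + \snorm{\varphi_y(x_0)-x_0} \\
&\leq \frac{1}{2}\snorm{x_1-x_0} + \snorm{A^{-1}(y-y_0)} \\
&\leq \frac{1}{2}r + \snorm{A^{-1}} \, \snorm{y-y_0} \\
&< \frac{1}{2}r + \snorm{A^{-1}} \frac{r}{2\snorm{A^{-1}}} = r.
\end{aligned}
\]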

So \(\varphi_y\) takes \(C(x_0,r)\) into \(B(x_0,r) \subset C(x_0,r)\text{.}\) It is a contraction on \(C(x_0,r)\text{,}\) and \(C(x_0,r)\) is complete (a closed subset of \(\R^n\) is complete). Apply the contraction mapping principle to obtain a fixed point \(x\text{,}\) i.e., \(\varphi_y(x) = x\text{.}\) That is, \(f(x) = y\text{,}\) and \(y \in f\bigl(C(x_0,r)\bigr) \subset f(V) = W\text{.}\) Therefore \(W\) is open.
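The proof also suggests a way to compute the local inverse numerically: iterate \(\varphi_y(x) = x + A^{-1}\bigl(y-f(x)\bigr)\) with \(A = f'(p)\) held fixed (sometimes called a chord or simplified Newton method). Below is a minimal Python sketch. As a hypothetical example it assumes the map \(f(x_1,x_2) = (x_1^2-x_2^2,\,2x_1x_2)\) from Example 8.5.4 below, the base point \(p = (1,1)\) (where \(J_f(p) = 8\)), and a target \(y\) close to \(f(p) = (0,2)\). The helper names are illustrative. Convergence is only expected when \(y\) is close enough to \(f(p)\) that the iterates stay in a ball where \(\varphi_y\) is a contraction, as in the proof.

```python
def f(x):
    """Hypothetical example map f(x1, x2) = (x1^2 - x2^2, 2*x1*x2)."""
    x1, x2 = x
    return (x1 * x1 - x2 * x2, 2.0 * x1 * x2)

def jacobian(x):
    """The derivative f'(x) as a 2x2 matrix, stored as a list of rows."""
    x1, x2 = x
    return [[2.0 * x1, -2.0 * x2], [2.0 * x2, 2.0 * x1]]

def inverse_2x2(m):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def apply(m, v):
    """Matrix-vector product for a 2x2 matrix m and a 2-vector v."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])

def solve_near(p, y, tol=1e-12, max_iter=200):
    """Look for x near p with f(x) = y by iterating phi_y(x) = x + A^{-1}(y - f(x))."""
    a_inv = inverse_2x2(jacobian(p))  # A = f'(p) is computed once and never updated
    x = p
    for _ in range(max_iter):
        fx = f(x)
        step = apply(a_inv, (y[0] - fx[0], y[1] - fx[1]))
        x = (x[0] + step[0], x[1] + step[1])  # x <- phi_y(x)
        if max(abs(step[0]), abs(step[1])) < tol:
            return x
    raise RuntimeError("iterates did not settle; y may be too far from f(p)")

p = (1.0, 1.0)    # f(p) = (0, 2) and J_f(p) = 8, so f'(p) is invertible
y = (0.1, 2.05)   # a target value close to f(p)
x = solve_near(p, y)
print(x, f(x))    # f(x) should match y up to rounding
```

Freezing \(A\) mirrors the proof. Recomputing \(f'(x)\) at every step instead would give Newton's method, which typically converges faster near the solution.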

Next we need to show that \(g\) is continuously differentiable and compute its derivative. First, let us show that it is differentiable. Let \(y \in W\) and \(k \in \R^n\text{,}\) \(k\not= 0\text{,}\) such that \(y+k \in W\text{.}\) Because \(f|_V\) is a one-to-one mapping of \(V\) onto \(W\text{,}\) there exist unique \(x \in V\) and \(h \in \R^n\text{,}\) with \(h \not= 0\) and \(x+h \in V\text{,}\) such that \(f(x) = y\) and \(f(x+h) = y+k\text{.}\) In other words, \(g(y) = x\) and \(g(y+k) = x+h\text{.}\) See Figure 8.11.

We can still squeeze some information from the fact that \(\varphi_y\) is a contraction. We have

\[
\varphi_y(x+h)-\varphi_y(x) = h + A^{-1}\bigl(f(x)-f(x+h)\bigr) = h - A^{-1}k.
\]

So

\[
\snorm{h-A^{-1}k} = \snorm{\varphi_y(x+h)-\varphi_y(x)} \leq \frac{1}{2}\snorm{h}.
\]

By the inverse triangle inequality, \(\snorm{h} - \snorm{A^{-1}k} \leq \frac{1}{2}\snorm{h}\text{.}\) So

\[
\snorm{h} \leq 2\snorm{A^{-1}k} \leq 2\snorm{A^{-1}} \, \snorm{k}.
\]

In particular, as \(k\) goes to 0, so does \(h\text{.}\)

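As \(x \in V\text{,}\) the derivative \(f'(x)\) is invertible. Let \(B := \bigl(f'(x)\bigr)^{-1}\text{,}\) which is what we think the derivative of \(g\) at \(y\) is. Then

\[
\begin{aligned}
\frac{\snorm{g(y+k)-g(y)-Bk}}{\snorm{k}}
&= \frac{\snorm{h-Bk}}{\snorm{k}}
= \frac{\snorm{h-B\bigl(f(x+h)-f(x)\bigr)}}{\snorm{k}} \\
&= \frac{\snorm{B\bigl(f(x+h)-f(x)-f'(x)h\bigr)}}{\snorm{k}} \\
&\leq \snorm{B} \, \frac{\snorm{h}}{\snorm{k}} \, \frac{\snorm{f(x+h)-f(x)-f'(x)h}}{\snorm{h}} \\
&\leq 2\snorm{B} \, \snorm{A^{-1}} \, \frac{\snorm{f(x+h)-f(x)-f'(x)h}}{\snorm{h}}.
\end{aligned}
\]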

As \(k\) goes to 0, so does \(h\text{.}\) So the right-hand side goes to 0 as \(f\) is differentiable, and hence the left-hand side also goes to 0. And \(B\) is precisely what we wanted \(g'(y)\) to be.

We have shown that \(g\) is differentiable; let us show it is \(C^1(W)\text{.}\) The function \(g \colon W \to V\) is continuous (it is differentiable), \(f'\) is a continuous function from \(V\) to \(L(\R^n)\text{,}\) and \(X \mapsto X^{-1}\) is a continuous function on the set of invertible operators. As \(g'(y) = {\bigl( f'\bigl(g(y)\bigr)\bigr)}^{-1}\) is the composition of these three continuous functions, it is continuous.
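A quick numerical sanity check of the formula \(g'(y) = {\bigl(f'(x)\bigr)}^{-1}\) in one dimension. This is a sketch under illustrative assumptions: the function \(f(x) = x^3+x\) (whose derivative \(3x^2+1\) never vanishes), a bisection routine for computing the inverse, and a central difference with step \(10^{-5}\) are all choices made here, not part of the text.

```python
def f(x):
    """Hypothetical example: f(x) = x^3 + x, with f'(x) = 3x^2 + 1 > 0 everywhere."""
    return x ** 3 + x

def f_prime(x):
    return 3.0 * x * x + 1.0

def g(y, lo=-10.0, hi=10.0):
    """Inverse of f by bisection; valid when f(lo) < y < f(hi), since f is strictly increasing."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y0 = 2.0                                 # f(1) = 2, so g(y0) = 1
h = 1e-5
fd = (g(y0 + h) - g(y0 - h)) / (2 * h)   # central-difference estimate of g'(y0)
print(fd, 1.0 / f_prime(g(y0)))          # both should be close to 1/4
```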

### Corollary 8.5.2.

Suppose \(U \subset \R^n\) is open and \(f \colon U \to \R^n\) is a continuously differentiable mapping such that \(f'(x)\) is invertible for all \(x \in U\text{.}\) Then for every open set \(V \subset U\text{,}\) the set \(f(V)\) is open (\(f\) is said to be an *open mapping*).

### Proof.

Without loss of generality, suppose \(U=V\text{.}\) For each point \(y \in f(V)\text{,}\) pick some \(x \in f^{-1}(y)\) (there could be more than one such point). By the inverse function theorem, there is a neighborhood of \(x\) in \(V\) that \(f\) maps onto a neighborhood of \(y\text{.}\) Hence \(f(V)\) is open.

### Example 8.5.3.

The theorem and the corollary are not true if \(f'(x)\) is not invertible for some \(x\text{.}\) For example, the map \(f(x,y) := (x,xy)\) maps \(\R^2\) onto the set \(\R^2 \setminus \bigl\{ (0,y) : y \neq 0 \bigr\}\text{,}\) which is neither open nor closed. In fact, \(f^{-1}(0,0) = \bigl\{ (0,y) : y \in \R \bigr\}\text{.}\) This bad behavior only occurs on the \(y\)-axis; everywhere else the function is locally invertible. If we avoid the \(y\)-axis, \(f\) is even one-to-one.
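To see where the hypothesis of the inverse function theorem fails for this map, compute its derivative:

\[
f'(x,y) = \begin{bmatrix} 1 & 0 \\ y & x \end{bmatrix}, \qquad J_f(x,y) = x.
\]

So \(f'(x,y)\) is invertible exactly when \(x \neq 0\text{,}\) that is, off the \(y\)-axis.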

### Example 8.5.4.

Just because \(f'(x)\) is invertible everywhere does not mean that \(f\) is one-to-one globally. It is “locally” one-to-one but perhaps not “globally.” For an example, take the map \(f \colon \R^2 \setminus \bigl\{ (0,0) \bigr\} \to \R^2 \setminus \bigl\{ (0,0) \bigr\}\) defined by \(f(x,y) := (x^2-y^2,2xy)\text{.}\) It is left to the student to show that \(f\) is differentiable and the derivative is invertible.

On the other hand, the mapping \(f\) is 2-to-1 globally. For every \((a,b)\) that is not the origin, there are exactly two solutions to \(x^2-y^2=a\) and \(2xy=b\) (it is also onto). We leave it to the student to show that there is at least one solution, and then notice that replacing \(x\) and \(y\) with \(-x\) and \(-y\) we obtain another solution.
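One way to see the two solutions concretely, using an observation the text does not rely on: if we identify \((x,y)\) with the complex number \(x+iy\text{,}\) the map is \(z \mapsto z^2\text{,}\) since \({(x+iy)}^2 = x^2-y^2 + 2xy\,i\text{.}\) The Python sketch below uses this to produce both preimages of a sample point. The point \((a,b) = (-3,4)\) and the helper name `preimages` are illustrative. This is a numerical check at one point, not a proof of the 2-to-1 claim.

```python
import cmath

def f(x, y):
    """The map from Example 8.5.4: f(x, y) = (x^2 - y^2, 2xy)."""
    return (x * x - y * y, 2.0 * x * y)

def preimages(a, b):
    """Both solutions of f(x, y) = (a, b) for (a, b) != (0, 0), using f(x, y) <-> (x + iy)^2."""
    w = cmath.sqrt(complex(a, b))  # one complex square root of a + ib
    return [(w.real, w.imag), (-w.real, -w.imag)]  # the other square root is -w

for (x, y) in preimages(-3.0, 4.0):
    print((x, y), f(x, y))  # both values of f(x, y) should equal (-3, 4) up to rounding
```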

The invertibility of the derivative is not a necessary condition, just sufficient, for having a continuous inverse and being an open mapping. For example, the function \(f(x) := x^3\) is an open mapping from \(\R\) to \(\R\) and is globally one-to-one with a continuous inverse, although the inverse is not differentiable at \(x=0\text{.}\)
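For \(f(x) = x^3\text{,}\) the inverse is the real cube root \(g(y) = \sqrt[3]{y}\text{.}\) The difference quotient at \(0\) shows why \(g\) is not differentiable there:

\[
\frac{g(y)-g(0)}{y} = \frac{\sqrt[3]{y}}{y} = {\bigl(\sqrt[3]{y}\bigr)}^{-2} \to \infty \quad \text{as } y \to 0.
\]

Away from \(0\text{,}\) we have \(g'(y) = \frac{1}{3{(\sqrt[3]{y})}^2} = \frac{1}{f'(g(y))}\text{,}\) as the inverse function theorem predicts.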

As a side note, there is a related famous, and as yet unsolved, problem called the *Jacobian conjecture*. If \(F \colon \R^n \to \R^n\) is polynomial (each component is a polynomial) and \(J_F\) is a nonzero constant, does \(F\) have a polynomial inverse? The inverse function theorem gives a local \(C^1\) inverse; the question is whether one can always find a global polynomial inverse.

### Subsection 8.5.1 Implicit function theorem

The inverse function theorem is really a special case of the implicit function theorem, which we prove next, although, somewhat ironically, we prove the implicit function theorem using the inverse function theorem. In the inverse function theorem we showed that the equation \(x-f(y) = 0\) is solvable for \(y\) in terms of \(x\) if the derivative in terms of \(y\) is invertible, that is, if \(f'(y)\) is invertible. Then there is (locally) a function \(g\) such that \(x-f\bigl(g(x)\bigr) = 0\text{.}\)

OK, so how about the equation \(f(x,y) = 0\text{?}\) This equation is not solvable for \(y\) in terms of \(x\) in every case. For example, there is no solution when \(f(x,y)\) does not actually depend on \(y\text{.}\) For a slightly more complicated example, notice that \(x^2+y^2-1 = 0\) defines the unit circle, and we can locally solve for \(y\) in terms of \(x\) when 1) we are near a point that lies on the unit circle and 2) we are not at a point where the circle has a vertical tangency, or in other words where \(\frac{\partial f}{\partial y} = 0\text{.}\)

To make things simple, we fix some notation. We let \((x,y) \in \R^{n+m}\) denote the coordinates \((x_1,\ldots,x_n,y_1,\ldots,y_m)\text{.}\) A linear transformation \(A \in L(\R^{n+m},\R^m)\) can then be written as \(A = [ A_x ~ A_y ]\) so that \(A(x,y) = A_x x + A_y y\text{,}\) where \(A_x \in L(\R^n,\R^m)\) and \(A_y \in L(\R^m)\text{.}\)

#### Proposition 8.5.5.

Let \(A = [A_x~A_y] \in L(\R^{n+m},\R^m)\) and suppose \(A_y\) is invertible. If \(B = - {(A_y)}^{-1} A_x\text{,}\) then

\[
0 = A(x,Bx) = A_x x + A_y B x \qquad \text{for all } x \in \R^n.
\]

Furthermore, \(y=Bx\) is the unique \(y \in \R^m\) such that \(A(x,y) = 0\text{.}\)

The proof is immediate: We solve \(A_x x + A_y y = 0\) for \(y\) and obtain \(y = Bx\text{.}\) Another way to solve is to “complete the basis,” that is, add rows to the matrix until we have an invertible matrix. In this case, we construct a mapping \((x,y) \mapsto (x,A_x x + A_y y)\text{,}\) and find that this operator in \(L(\R^{n+m})\) is invertible, and the map \(B\) can be read off from the inverse. Let us show that the same can be done for \(C^1\) functions.
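The “complete the basis” computation can be written out in block form. Multiplying the two block matrices confirms that

\[
\begin{bmatrix} I & 0 \\ A_x & A_y \end{bmatrix}^{-1}
=
\begin{bmatrix} I & 0 \\ -{(A_y)}^{-1}A_x & {(A_y)}^{-1} \end{bmatrix},
\]

so the inverse sends \((x,0)\) to \(\bigl(x,-{(A_y)}^{-1}A_x x\bigr) = (x,Bx)\text{.}\) The operator maps \((x,y)\) to \(\bigl(x,A(x,y)\bigr)\text{,}\) so this is exactly the point with first component \(x\) at which \(A(x,Bx) = 0\text{.}\)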

#### Theorem 8.5.6. Implicit function theorem.

Let \(U \subset \R^{n+m}\) be an open set and let \(f \colon U \to \R^m\) be a \(C^1(U)\) mapping. Let \((p,q) \in U\) be a point such that \(f(p,q) = 0\) and such that

\[
\frac{\partial(f_1,\ldots,f_m)}{\partial(y_1,\ldots,y_m)} (p,q) \neq 0.
\]

Then there exist an open set \(W \subset \R^n\) with \(p \in W\text{,}\) an open set \(W' \subset \R^m\) with \(q \in W'\text{,}\) with \(W \times W' \subset U\text{,}\) and a \(C^1(W)\) mapping \(g \colon W \to W'\text{,}\) with \(g(p) = q\text{,}\) such that for all \(x \in W\text{,}\) the point \(g(x)\) is the unique point in \(W'\) such that

\[
f\bigl(x,g(x)\bigr) = 0.
\]

Furthermore, if \(A = [ A_x ~ A_y ] = f'(p,q)\text{,}\) then

\[
g'(p) = -{(A_y)}^{-1}A_x.
\]

The condition \(\frac{\partial(f_1,\ldots,f_m)}{\partial(y_1,\ldots,y_m)} (p,q) = \det(A_y) \neq 0\) simply means that \(A_y\) is invertible. If \(n=m=1\text{,}\) the condition becomes \(\frac{\partial f}{\partial y}(p,q) \not= 0\text{,}\) and \(W\) and \(W'\) are open intervals. See Figure 8.12.
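In the case \(n=m=1\text{,}\) the formula \(g'(p) = -{(A_y)}^{-1}A_x\) reads

\[
g'(p) = -\frac{\frac{\partial f}{\partial x}(p,q)}{\frac{\partial f}{\partial y}(p,q)}.
\]

As an illustration, take \(f(x,y) = x^2+y^2-1\) at the point \((p,q) = \bigl(\frac{3}{5},\frac{4}{5}\bigr)\) on the unit circle. There \(\frac{\partial f}{\partial x} = \frac{6}{5}\) and \(\frac{\partial f}{\partial y} = \frac{8}{5} \neq 0\text{,}\) so \(g'\bigl(\frac{3}{5}\bigr) = -\frac{3}{4}\text{.}\)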

#### Proof.

Define \(F \colon U \to \R^{n+m}\) by \(F(x,y) := \bigl(x,f(x,y)\bigr)\text{.}\) It is clear that \(F\) is \(C^1\text{,}\) and we want to show that the derivative at \((p,q)\) is invertible.

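Let us compute the derivative. The quotient

\[
\frac{\snorm{f(p+h,q+k)-f(p,q)-A_x h-A_y k}}{\snorm{(h,k)}}
\]

goes to zero as \(\snorm{(h,k)} = \sqrt{\snorm{h}^2+\snorm{k}^2}\) goes to zero. But then so does

\[
\frac{\snorm{F(p+h,q+k)-F(p,q)-(h,A_x h+A_y k)}}{\snorm{(h,k)}}
=
\frac{\snorm{\bigl(h,f(p+h,q+k)-f(p,q)\bigr)-(h,A_x h+A_y k)}}{\snorm{(h,k)}}
=
\frac{\snorm{f(p+h,q+k)-f(p,q)-A_x h-A_y k}}{\snorm{(h,k)}}.
\]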

So the derivative of \(F\) at \((p,q)\) takes \((h,k)\) to \((h,A_x h+A_y k)\text{.}\) In block matrix form, it is \(\left[\begin{smallmatrix}I & 0\\A_x & A_y\end{smallmatrix}\right]\text{.}\) If \((h,A_x h+A_y k) = (0,0)\text{,}\) then \(h=0\text{,}\) and so \(A_y k = 0\text{.}\) As \(A_y\) is one-to-one, \(k=0\text{.}\) Thus \(F'(p,q)\) is one-to-one or in other words invertible, and we apply the inverse function theorem.

That is, there exists an open set \(V \subset \R^{n+m}\) with \(F(p,q) = (p,0) \in V\text{,}\) and a \(C^1\) mapping \(G \colon V \to \R^{n+m}\text{,}\) such that \(F\bigl(G(x,s)\bigr) = (x,s)\) for all \((x,s) \in V\text{,}\) \(G\) is one-to-one, and \(G(V)\) is open. Write \(G = (G_1,G_2)\) (the first \(n\) and the second \(m\) components of \(G\)). Then

\[
F\bigl(G_1(x,s),G_2(x,s)\bigr) = \Bigl(G_1(x,s),f\bigl(G_1(x,s),G_2(x,s)\bigr)\Bigr) = (x,s).
\]

So \(x = G_1(x,s)\) and \(f\bigl(G_1(x,s),G_2(x,s)\bigr) = f\bigl(x,G_2(x,s)\bigr) = s\text{.}\) Plugging in \(s=0\text{,}\) we obtain

\[
f\bigl(x,G_2(x,0)\bigr) = 0.
\]

As the set \(G(V)\) is open and \((p,q) \in G(V)\text{,}\) there exist some open sets \(\widetilde{W}\) and \(W'\) such that \(\widetilde{W} \times W' \subset G(V)\) with \(p \in \widetilde{W}\) and \(q \in W'\text{.}\) Take \(W := \bigl\{ x \in \widetilde{W} : G_2(x,0) \in W' \bigr\}\text{.}\) The function that takes \(x\) to \(G_2(x,0)\) is continuous and therefore \(W\) is open. Define \(g \colon W \to \R^m\) by \(g(x) := G_2(x,0)\text{,}\) which is the \(g\) in the theorem. The fact that \(g(x)\) is the unique point in \(W'\) follows because \(W \times W' \subset G(V)\) and \(G\) is one-to-one.

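Next, differentiate

\[
x \mapsto f\bigl(x,g(x)\bigr)
\]

at \(p\text{.}\) This is the zero map on \(W\text{,}\) so its derivative is zero. Using the chain rule,

\[
0 = A\bigl(h,g'(p)h\bigr) = A_x h + A_y g'(p)h
\]

for all \(h \in \R^{n}\text{.}\) As \(A_y\) is invertible, \(g'(p) = -{(A_y)}^{-1}A_x\text{,}\) which is the desired derivative for \(g\text{.}\)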

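In other words, in the context of the theorem, we have \(m\) equations in \(n+m\) unknowns:

\[
\begin{aligned}
f_1(x_1,\ldots,x_n,y_1,\ldots,y_m) &= 0, \\
&\;\;\vdots \\
f_m(x_1,\ldots,x_n,y_1,\ldots,y_m) &= 0.
\end{aligned}
\]

The condition guaranteeing a local solution for \(y\) in terms of \(x\) is that \(f\) is a \(C^1\) mapping (all the components are \(C^1\text{:}\) partial derivatives in all variables exist and are continuous) and that the matrix

\[
\begin{bmatrix}
\frac{\partial f_1}{\partial y_1} & \frac{\partial f_1}{\partial y_2} & \cdots & \frac{\partial f_1}{\partial y_m} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial y_1} & \frac{\partial f_m}{\partial y_2} & \cdots & \frac{\partial f_m}{\partial y_m}
\end{bmatrix}
\]

is invertible at \((p,q)\text{.}\)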

#### Example 8.5.7.

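Consider the set given by \(x^2+y^2-{(z+1)}^3 = -1\) and \(e^x+e^y+e^z = 3\) near the point \((0,0,0)\text{.}\) It is the zero set of the mapping

\[
f(x,y,z) := \bigl(x^2+y^2-{(z+1)}^3+1,\ e^x+e^y+e^z-3\bigr),
\]

whose derivative is

\[
f'(x,y,z) = \begin{bmatrix} 2x & 2y & -3{(z+1)}^2 \\ e^x & e^y & e^z \end{bmatrix}.
\]

The matrix of partial derivatives in \(y\) and \(z\) at the origin (the last two columns of \(f'(0,0,0)\))

\[
\begin{bmatrix} 2(0) & -3{(0+1)}^2 \\ e^0 & e^0 \end{bmatrix}
=
\begin{bmatrix} 0 & -3 \\ 1 & 1 \end{bmatrix}
\]

is invertible. Hence near \((0,0,0)\) we can solve for \(y\) and \(z\) as \(C^1\) functions of \(x\) such that for \(x\) near \(0\text{,}\) we have

\[
x^2+{y(x)}^2-{\bigl(z(x)+1\bigr)}^3 = -1, \qquad e^x+e^{y(x)}+e^{z(x)} = 3.
\]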

The theorem does not tell us how to find \(y(x)\) and \(z(x)\) explicitly; it only tells us that they exist. In other words, near the origin the set of solutions is a smooth curve in \(\R^3\) that goes through the origin.
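Although the theorem is not constructive, the functions \(y(x)\) and \(z(x)\) can be approximated numerically. Here \(A_x = \left[\begin{smallmatrix} 0 \\ 1 \end{smallmatrix}\right]\) and \(A_y = \left[\begin{smallmatrix} 0 & -3 \\ 1 & 1 \end{smallmatrix}\right]\text{,}\) so the formula \(g'(0) = -{(A_y)}^{-1}A_x\) from the implicit function theorem gives \(y'(0) = -1\) and \(z'(0) = 0\text{.}\) The Python sketch below solves for \((y,z)\) at a fixed \(x\) using Newton's method (a technique chosen here, not one the theorem provides) and compares a central-difference estimate of the derivative with these values. The starting guess, tolerance, step size, and helper name `solve_yz` are illustrative assumptions.

```python
import math

def f(x, y, z):
    """The map from Example 8.5.7; the set in question is its zero set."""
    return (x * x + y * y - (z + 1.0) ** 3 + 1.0,
            math.exp(x) + math.exp(y) + math.exp(z) - 3.0)

def solve_yz(x, y=0.0, z=0.0, tol=1e-14, max_iter=50):
    """Newton's method in the unknowns (y, z) with x held fixed, starting from (y, z)."""
    for _ in range(max_iter):
        f1, f2 = f(x, y, z)
        # Partial derivatives in (y, z) at the current point: [[a, b], [c, d]].
        a, b = 2.0 * y, -3.0 * (z + 1.0) ** 2
        c, d = math.exp(y), math.exp(z)
        det = a * d - b * c
        # Solve [[a, b], [c, d]] (dy, dz) = (f1, f2) by the adjugate formula.
        dy = (d * f1 - b * f2) / det
        dz = (-c * f1 + a * f2) / det
        y, z = y - dy, z - dz
        if max(abs(dy), abs(dz)) < tol:
            return y, z
    raise RuntimeError("Newton's method did not converge")

h = 1e-5
y_plus, z_plus = solve_yz(h)
y_minus, z_minus = solve_yz(-h)
dy_dx = (y_plus - y_minus) / (2 * h)  # central-difference estimate of y'(0)
dz_dx = (z_plus - z_minus) / (2 * h)  # central-difference estimate of z'(0)
print(dy_dx, dz_dx)  # expected to be close to (-1, 0)
```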

An interesting observation from the proof is that we solved the equation \(f\bigl(x,G_2(x,s)\bigr) = s\) for all \(s\) in some neighborhood of \(0\text{,}\) not just \(s=0\text{.}\)

#### Remark 8.5.8.

There are versions of the theorem for arbitrarily many derivatives. If \(f\) has \(k\) continuous derivatives, then the solution also has \(k\) continuous derivatives. See also the next section.

### Subsection 8.5.2 Exercises

#### Exercise 8.5.1.

Let \(C := \bigl\{ (x,y) \in \R^2 : x^2+y^2 = 1 \bigr\}\text{.}\)

Solve for \(y\) in terms of \(x\) near \((0,1)\) (that is, find the function \(g\) from the implicit function theorem for a neighborhood of the point \((p,q) = (0,1)\)).

Solve for \(y\) in terms of \(x\) near \((0,-1)\text{.}\)

Solve for \(x\) in terms of \(y\) near \((-1,0)\text{.}\)

#### Exercise 8.5.2.

Define \(f \colon \R^2 \to \R^2\) by \(f(x,y) := \bigl(x,y+h(x)\bigr)\) for some continuously differentiable function \(h\) of one variable.

Show that \(f\) is one-to-one and onto.

Compute \(f'\text{.}\)

Show that \(f'\) is invertible at all points, and compute its inverse.

#### Exercise 8.5.3.

Define \(f \colon \R^2 \to \R^2 \setminus \bigl\{ (0,0) \bigr\}\) by \(f(x,y) := \bigl(e^x\cos(y),e^x\sin(y)\bigr)\text{.}\)

Show that \(f\) is onto.

Show that \(f'\) is invertible at all points.

Show that \(f\) is not one-to-one, in fact for every \((a,b) \in \R^2 \setminus \bigl\{ (0,0) \bigr\}\text{,}\) there exist infinitely many different points \((x,y) \in \R^2\) such that \(f(x,y) = (a,b)\text{.}\)

Therefore, invertible derivative at every point does not mean that \(f\) is invertible globally.

Note: Feel free to use what you know about sine and cosine from calculus.

#### Exercise 8.5.4.

Find a map \(f \colon \R^n \to \R^n\) that is one-to-one, onto, continuously differentiable, but \(f'(0) = 0\text{.}\) Hint: Generalize \(f(x) = x^3\) from one to \(n\) dimensions.

#### Exercise 8.5.5.

Consider \(z^2 + xz + y =0\) in \(\R^3\text{.}\) Find an equation \(D(x,y)=0\text{,}\) such that if \(D(x_0,y_0) \not= 0\) and \(z^2+x_0z+y_0 = 0\) for some \(z \in \R\text{,}\) then for points near \((x_0,y_0)\) there exist exactly two distinct continuously differentiable functions \(r_1(x,y)\) and \(r_2(x,y)\) such that \(z=r_1(x,y)\) and \(z=r_2(x,y)\) solve \(z^2 + xz + y =0\text{.}\) Do you recognize the expression \(D\) from algebra?

#### Exercise 8.5.6.

Suppose \(f \colon (a,b) \to \R^2\) is continuously differentiable and the first component (the \(x\) component) of \(\nabla f(t)\) is nonzero for every \(t \in (a,b)\text{.}\) Prove that there exists an interval \((c,d)\) and a continuously differentiable function \(g \colon (c,d) \to \R\) such that \((x,y) \in f\bigl((a,b)\bigr)\) if and only if \(x \in (c,d)\) and \(y=g(x)\text{.}\) In other words, the set \(f\bigl((a,b)\bigr)\) is a graph of \(g\text{.}\)

#### Exercise 8.5.7.

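Define \(f \colon \R^2 \to \R^2\) by

\[
f(x,y) :=
\begin{cases}
\bigl(x^2 \sin(1/x) + \frac{x}{2},\, y\bigr) & \text{if } x \neq 0, \\
(0,y) & \text{if } x = 0.
\end{cases}
\]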

Show that \(f\) is differentiable everywhere.

Show that \(f'(0,0)\) is invertible.

Show that \(f\) is not one-to-one on any neighborhood of the origin (it is not locally invertible, that is, the conclusion of the inverse function theorem fails).

Show that \(f\) is not continuously differentiable.

Note: Feel free to use what you know about sine and cosine from calculus.

#### Exercise 8.5.8.

*(Polar coordinates)* Define a mapping \(F(r,\theta) := \bigl(r \cos(\theta), r \sin(\theta) \bigr)\text{.}\)

Show that \(F\) is continuously differentiable (for all \((r,\theta) \in \R^2\)).

Compute \(F'(0,\theta)\) for all \(\theta\text{.}\)

Show that if \(r \not= 0\text{,}\) then \(F'(r,\theta)\) is invertible, therefore an inverse of \(F\) exists locally as long as \(r \not= 0\text{.}\)

Show that \(F \colon \R^2 \to \R^2\) is onto, and for each point \((x,y) \in \R^2\text{,}\) the set \(F^{-1}(x,y)\) is infinite.

Show that \(F \colon \R^2 \to \R^2\) is an open map, despite not satisfying the condition of the inverse function theorem.

Show that \(F|_{(0,\infty) \times [0,2\pi)}\) is one-to-one and onto \(\R^2 \setminus \bigl\{ (0,0) \bigr\}\text{.}\)

Note: Feel free to use what you know about sine and cosine from calculus.

#### Exercise 8.5.9.

Let \(H := \bigl\{ (x,y) \in \R^2 : y > 0 \bigr\}\text{,}\) and for \((x,y) \in H\) define

\[
F(x,y) := \left( \frac{x^2+y^2-1}{x^2+2y+y^2+1},\ \frac{-2x}{x^2+2y+y^2+1} \right).
\]

Prove that \(F\) is a bijective mapping from \(H\) to \(B(0,1)\text{,}\) it is continuously differentiable on \(H\text{,}\) and its inverse is also continuously differentiable.

#### Exercise 8.5.10.

Suppose \(U \subset \R^2\) is open and \(f \colon U \to \R\) is a \(C^1\) function such that \(\nabla f(x,y) \not= 0\) for all \((x,y) \in U\text{.}\) Show that every level set is a \(C^1\) smooth curve. That is, for every \((x,y) \in U\text{,}\) there exists a \(C^1\) function \(\gamma \colon (-\delta,\delta) \to \R^2\) with \(\gamma^{\:\prime}(0) \not= 0\) such that \(f\bigl(\gamma(t)\bigr)\) is constant for all \(t \in (-\delta,\delta)\text{.}\)

#### Exercise 8.5.11.

Suppose \(U \subset \R^2\) is open and \(f \colon U \to \R\) is a \(C^1\) function such that \(\nabla f(x,y) \not= 0\) for all \((x,y) \in U\text{.}\) Show that for every \((x,y) \in U\) there exist a neighborhood \(V\) of \((x,y)\text{,}\) an open set \(W \subset \R^2\text{,}\) and a bijective \(C^1\) function \(g \colon W \to V\) with a \(C^1\) inverse such that the level sets of \(f \circ g\) are horizontal lines in \(W\text{.}\) That is, the set given by \((f \circ g) (s,t) = c\) for a constant \(c\) is a set of the form \(\bigl\{ (s,t_0) \in \R^2 : s \in \R, (s,t_0) \in W \bigr\}\text{,}\) where \(t_0\) is fixed. In other words, the level curves can be locally “straightened.”