Skip to main content

Section 8.2 Analysis with vector spaces

Note: 3 lectures

Subsection 8.2.1 Norms

Let us start measuring the size of vectors and hence distance.

Definition 8.2.1.

If \(X\) is a vector space, then we say a function \(\snorm{\cdot} \colon X \to \R\) is a norm if

  1. \(\snorm{x} \geq 0\text{,}\) with \(\snorm{x}=0\) if and only if \(x=0\text{.}\)

  2. \(\snorm{cx} = \sabs{c} \, \snorm{x}\) for all \(c \in \R\) and \(x \in X\text{.}\)

  3. \(\snorm{x+y} \leq \snorm{x}+\snorm{y}\) for all \(x,y \in X\text{.}\)        (triangle inequality)

A vector space equipped with a norm is called a normed vector space.

Given a norm (any norm) on a vector space \(X\text{,}\) we define a distance \(d(x,y) := \snorm{x-y}\text{,}\) and this \(d\) makes \(X\) into a metric space (exercise). So everything you know about metric spaces applies to normed vector spaces.

Before defining the standard norm on \(\R^n\text{,}\) let us define the standard scalar dot product on \(\R^n\text{.}\) For vectors \(x=(x_1,x_2,\ldots,x_n) \in \R^n\) and \(y=(y_1,y_2,\ldots,y_n) \in \R^n\) define

\begin{equation*} x \cdot y := \sum_{j=1}^n x_j\, y_j . \end{equation*}

The dot product is linear in each variable separately, or in more fancy language it is bilinear. That is, if \(y\) is fixed, the map \(x \mapsto x \cdot y\) is a linear map from \(\R^n\) to \(\R\text{.}\) Similarly, if \(x\) is fixed, then \(y \mapsto x \cdot y\) is linear. It is also symmetric in the sense that \(x \cdot y = y \cdot x\text{.}\) The euclidean norm is defined as

\begin{equation*} \snorm{x} := \snorm{x}_{\R^n} := \sqrt{x \cdot x} = \sqrt{(x_1)^2+(x_2)^2 + \cdots + (x_n)^2}. \end{equation*}

We normally just use \(\snorm{x}\text{,}\) only in the rare instance when it is necessary to emphasize that we are talking about the euclidean norm will we use \(\snorm{x}_{\R^n}\text{.}\) It is easy to see that the euclidean norm satisfies i and ii. To prove that iii holds, the key inequality is the so-called Cauchy–Schwarz inequality we saw before. As this inequality is so important, we restate and reprove a slightly stronger version using the notation of this chapter.


If \(x=0\) or \(y = 0\text{,}\) then the theorem holds trivially. So assume \(x\not= 0\) and \(y \not= 0\text{.}\)

If \(x\) is a scalar multiple of \(y\text{,}\) that is \(x = \lambda y\) for some \(\lambda \in \R\text{,}\) then the theorem holds with equality:

\begin{equation*} \sabs{ x \cdot y } = \sabs{\lambda y \cdot y} = \sabs{\lambda} \, \sabs{y\cdot y} = \sabs{\lambda} \, \snorm{y}^2 = \snorm{\lambda y} \, \snorm{y} = \snorm{x} \, \snorm{y} . \end{equation*}

Fixing \(x\) and \(y\text{,}\) as a function of \(t\text{,}\) \(\snorm{x+ty}^2\) is a quadratic polynomial:

\begin{equation*} \snorm{x+ty}^2 = (x+ty) \cdot (x+ty) = x \cdot x + x \cdot ty + ty \cdot x + ty \cdot ty = \snorm{x}^2 + 2t(x \cdot y) + t^2 \snorm{y}^2 . \end{equation*}

If \(x\) is not a scalar multiple of \(y\text{,}\) then \(\snorm{x+ty}^2 > 0\) for all \(t\text{.}\) So the polynomial \(\snorm{x+ty}^2\) is never zero. Elementary algebra says that the discriminant must be negative:

\begin{equation*} 4 {(x \cdot y)}^2 - 4 \snorm{x}^2\snorm{y}^2 < 0, \end{equation*}

or in other words, \({(x \cdot y)}^2 < \snorm{x}^2\snorm{y}^2\text{.}\)

Item iii, the triangle inequality in \(\R^n\text{,}\) follows from:

\begin{equation*} \snorm{x+y}^2 = x \cdot x + y \cdot y + 2 (x \cdot y) \leq \snorm{x}^2 + \snorm{y}^2 + 2 \bigl(\snorm{x} \,\snorm{y}\bigr) = {\bigl(\snorm{x} + \snorm{y}\bigr)}^2 . \end{equation*}

The distance \(d(x,y) := \snorm{x-y}\) is the standard distance (standard metric) on \(\R^n\) that we used when we talked about metric spaces.

Definition 8.2.3.

Let \(A \in L(X,Y)\text{.}\) Define

\begin{equation*} \snorm{A} := \sup \bigl\{ \snorm{Ax} : x \in X \text{ with } \snorm{x} = 1 \bigr\} . \end{equation*}

The number \(\snorm{A}\) (possibly \(\infty\)) is called the operator norm. We will see below that it is indeed a norm for finite-dimensional spaces. Again, when necessary to emphasize which norm we are talking about, we may write it as \(\snorm{A}_{L(X,Y)}\text{.}\)

For example, if \(X=\R^1\) with norm \(\snorm{x}=\sabs{x}\text{,}\) we think of elements of \(L(X)\) as multiplication by scalars: \(x \mapsto ax\text{.}\) If \(\snorm{x}=\sabs{x}=1\text{,}\) then \(\sabs{a x} = \sabs{a}\text{,}\) so the operator norm of \(a\) is \(\sabs{a}\text{.}\)

By linearity, \(\norm{A \frac{x}{\snorm{x}}} = \frac{\snorm{Ax}}{\snorm{x}}\) for all nonzero \(x \in X\text{.}\) The vector \(\frac{x}{\snorm{x}}\) is of norm 1. Therefore,

\begin{equation*} \snorm{A} = \sup \bigl\{ \snorm{Ax} : x \in X \text{ with } \snorm{x} = 1 \bigr\} = \sup_{\substack{x \in X\\x\neq 0}} \frac{\snorm{Ax}}{\snorm{x}} . \end{equation*}

This implies, assuming \(\snorm{A}\) is not infinity, that for every \(x \in X\text{,}\)

\begin{equation*} \snorm{Ax} \leq \snorm{A} \snorm{x} . \end{equation*}

We will use this inequality a lot. It is not hard to see from the definition that \(\snorm{A} = 0\) if and only if \(A = 0\text{,}\) where by \(A=0\) we mean that \(A\) takes every vector to the zero vector. It is also not difficult to compute the operator norm of the identity operator:

\begin{equation*} \snorm{I} = \sup_{\substack{x \in X\\x\neq 0}} \frac{\snorm{Ix}}{\snorm{x}} = \sup_{\substack{x \in X\\x\neq 0}} \frac{\snorm{x}}{\snorm{x}} = 1. \end{equation*}

The operator norm is not always a norm on \(L(X,Y)\text{,}\) in particular, \(\snorm{A}\) is not always finite for \(A \in L(X,Y)\text{.}\) We prove below that \(\snorm{A}\) is finite when \(X\) is finite-dimensional. The operator norm being finite is equivalent to \(A\) being continuous. For infinite-dimensional spaces, neither statement needs to be true. For an example, consider the vector space of continuously differentiable functions on \([0,2\pi]\) using the uniform norm. The functions \(t \mapsto \sin(n t)\) have norm 1, but their derivatives have norm \(n\text{.}\) So differentiation, which is a linear operator valued in the space of continuous functions, has infinite operator norm on this space. We will stick to finite-dimensional spaces.

When we talk about a finite-dimensional vector space \(X\text{,}\) we often think of \(\R^n\text{,}\) although if we have a norm on \(X\text{,}\) the norm might not be the standard euclidean norm. In the exercises, you can prove that every norm is “equivalent” to the euclidean norm in that the topology it generates is the same. For simplicity, we only prove the following proposition for the euclidean space, and the proof for a general finite-dimensional space is left as an exercise.


As we said we only prove the proposition for euclidean space, so suppose that \(X = \R^n\) and the norm is the standard euclidean norm. The general case is left as an exercise.

Let \(\{ e_1,e_2,\ldots,e_n \}\) be the standard basis of \(\R^n\text{.}\) Write \(x \in \R^n\text{,}\) with \(\snorm{x} = 1\text{,}\) as

\begin{equation*} x = \sum_{j=1}^n c_j \, e_j . \end{equation*}

Since \(e_j \cdot e_\ell = 0\) whenever \(j\not=\ell\) and \(e_j \cdot e_j = 1\text{,}\) then \(c_j = x \cdot e_j\) and by Cauchy–Schwarz

\begin{equation*} \sabs{c_j} = \sabs{ x \cdot e_j } \leq \snorm{x} \, \snorm{e_j} = 1 . \end{equation*}


\begin{equation*} \snorm{Ax} = \norm{\sum_{j=1}^n c_j \, Ae_j} \leq \sum_{j=1}^n \sabs{c_j} \, \snorm{Ae_j} \leq \sum_{j=1}^n \snorm{Ae_j} . \end{equation*}

The right-hand side does not depend on \(x\text{.}\) We found a finite upper bound for \(\snorm{Ax}\) independent of \(x\text{,}\) so \(\snorm{A} < \infty\text{.}\)

For any normed vector spaces \(X\) and \(Y\text{,}\) and \(A \in L(X,Y)\text{,}\) suppose that \(\snorm{A} < \infty\text{.}\) For \(v,w \in X\text{,}\)

\begin{equation*} \snorm{Av - Aw} = \snorm{A(v-w)} \leq \snorm{A} \, \snorm{v-w} . \end{equation*}

As \(\snorm{A} < \infty\text{,}\) then this says \(A\) is Lipschitz with constant \(\snorm{A}\text{.}\)


First, since all the spaces are finite-dimensional, then all the operator norms are finite, and the statements make sense to begin with.

For i,

\begin{equation*} \snorm{(A+B)x} = \snorm{Ax+Bx} \leq \snorm{Ax}+\snorm{Bx} \leq \snorm{A} \, \snorm{x}+\snorm{B} \,\snorm{x} = \bigl(\snorm{A}+\snorm{B}\bigr) \snorm{x} . \end{equation*}

So \(\snorm{A+B} \leq \snorm{A}+\snorm{B}\text{.}\)


\begin{equation*} \snorm{(cA)x} = \sabs{c} \, \snorm{Ax} \leq \bigl(\sabs{c} \,\snorm{A}\bigr) \snorm{x} . \end{equation*}

Thus \(\snorm{cA} \leq \sabs{c} \, \snorm{A}\text{.}\) Next,

\begin{equation*} \sabs{c} \, \snorm{Ax} = \snorm{cAx} \leq \snorm{cA} \, \snorm{x} . \end{equation*}

Hence \(\sabs{c} \, \snorm{A} \leq \snorm{cA}\text{.}\)

For ii write

\begin{equation*} \snorm{BAx} \leq \snorm{B} \, \snorm{Ax} \leq \snorm{B} \, \snorm{A} \, \snorm{x} . \qedhere \end{equation*}

As a norm defines a metric, there is a metric space topology on \(L(X,Y)\) for finite-dimensional vector spaces, so we can talk about open/closed sets, continuity, and convergence.

Let us make sense of this proposition on a simple example. Consider \(X=\R^1\text{,}\) where linear operators are just numbers \(a\) and the operator norm of \(a\) is \(\sabs{a}\text{.}\) The operator \(a\) is invertible (\(a^{-1} = \nicefrac{1}{a}\)) whenever \(a \not=0\text{.}\) The condition \(\sabs{a-b} < \frac{1}{\sabs{a^{-1}}}\) does indeed imply that \(b\) is not zero. And \(a \mapsto \nicefrac{1}{a}\) is a continuous map. When the dimension is bigger than \(1\text{,}\) there are other noninvertible operators than just zero, and in general things are a bit more difficult.


Let us prove i. We know something about \(A^{-1}\) and \(A-B\text{;}\) they are linear operators. So apply them to a vector:

\begin{equation*} A^{-1}(A-B)x = x-A^{-1}Bx . \end{equation*}


\begin{equation*} \begin{split} \snorm{x} & = \snorm{A^{-1} (A-B)x + A^{-1}Bx} \\ & \leq \snorm{A^{-1}}\,\snorm{A-B}\, \snorm{x} + \snorm{A^{-1}}\,\snorm{Bx} . \end{split} \end{equation*}

Assume \(x \neq 0\) and so \(\snorm{x} \neq 0\text{.}\) Using (8.2), we obtain

\begin{equation*} \snorm{x} < \snorm{x} + \snorm{A^{-1}} \, \snorm{Bx} . \end{equation*}

In other words, \(\snorm{Bx} \not= 0\) for all nonzero \(x\text{,}\) and hence \(Bx \not= 0\) for all nonzero \(x\text{.}\) This is enough to see that \(B\) is one-to-one (if \(Bx = By\text{,}\) then \(B(x-y) = 0\text{,}\) so \(x=y\)). As \(B\) is one-to-one operator from \(X\) to \(X\text{,}\) which is finite-dimensional and hence also onto by Proposition 8.1.18, \(B\) is invertible.

Let us prove ii. Item i immediately implies that \(GL(X)\) is open. Let us show that the inverse is continuous. Fix some \(A \in GL(X)\text{.}\) Let \(B\) be near \(A\text{,}\) specifically \(\snorm{A-B} < \frac{1}{2 \snorm{A^{-1}}}\text{.}\) Then (8.2) is satisfied and \(B\) is invertible. A similar computation as above (using \(B^{-1}y\) instead of \(x\)) gives

\begin{equation*} \snorm{B^{-1}y} \leq \snorm{A^{-1}} \, \snorm{A-B} \, \snorm{B^{-1}y} + \snorm{A^{-1}} \, \snorm{y} \leq \frac{1}{2} \snorm{B^{-1}y} + \snorm{A^{-1}}\,\snorm{y} , \end{equation*}


\begin{equation*} \snorm{B^{-1}y} \leq 2\snorm{A^{-1}}\,\snorm{y} . \end{equation*}

So \(\snorm{B^{-1}} \leq 2 \snorm{A^{-1}} \text{.}\)


\begin{equation*} A^{-1}(A-B)B^{-1} = A^{-1}(AB^{-1}-I) = B^{-1}-A^{-1} , \end{equation*}


\begin{equation*} \snorm{B^{-1}-A^{-1}} = \snorm{A^{-1}(A-B)B^{-1}} \leq \snorm{A^{-1}}\,\snorm{A-B}\,\snorm{B^{-1}} \leq 2\snorm{A^{-1}}^2 \snorm{A-B} . \end{equation*}

Therefore, as \(B\) tends to \(A\text{,}\) \(\snorm{B^{-1}-A^{-1}}\) tends to 0, and so the inverse operation is a continuous function at \(A\text{.}\)

Subsection 8.2.2 Matrices

Once we fix a basis in a finite-dimensional vector space \(X\text{,}\) we can represent a vector of \(X\) as an \(n\)-tuple of numbers—a vector in \(\R^n\text{.}\) Same can be done with \(L(X,Y)\text{,}\) bringing us to matrices, which are a convenient way to represent finite-dimensional linear transformations. Suppose \(\{ x_1, x_2, \ldots, x_n \}\) and \(\{ y_1, y_2, \ldots, y_m \}\) are bases for vector spaces \(X\) and \(Y\) respectively. A linear operator is determined by its values on the basis. Given \(A \in L(X,Y)\text{,}\) \(A x_j\) is an element of \(Y\text{.}\) Define the numbers \(a_{i,j}\) as follows

\begin{equation} A x_j = \sum_{i=1}^m a_{i,j} \, y_i ,\tag{8.3} \end{equation}

and write them as a matrix

\begin{equation*} A = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix} . \end{equation*}

We sometimes write \(A\) as \([a_{i,j}]\text{.}\) We say \(A\) is an \(m\)-by-\(n\) matrix. The \(j\)th column of the matrix gives precisely the coefficients that represent \(A x_j\) in terms of the basis \(\{ y_1,y_2,\ldots,y_m \}\text{.}\) If we know the numbers \(a_{i,j}\text{,}\) then via the formula (8.3) we find the corresponding linear operator, as it is determined by the action on a basis. Hence, once we fix a basis on \(X\) and on \(Y\text{,}\) we have a one-to-one correspondence between \(L(X,Y)\) and the \(m\)-by-\(n\) matrices.


\begin{equation*} z = \sum_{j=1}^n c_j \, x_j , \end{equation*}


\begin{equation*} A z = \sum_{j=1}^n c_j \, A x_j = \sum_{j=1}^n c_j \left( \sum_{i=1}^m a_{i,j}\, y_i \right) = \sum_{i=1}^m \left(\sum_{j=1}^n a_{i,j}\, c_j \right) y_i , \end{equation*}

which gives rise to the familiar rule for matrix multiplication.

More generally, if \(B\) is an \(n\)-by-\(r\) matrix with entries \(b_{j,k}\text{,}\) then the matrix for \(C = AB\) is an \(m\)-by-\(r\) matrix whose \((i,k)\)th entry \(c_{i,k}\) is

\begin{equation*} c_{i,k} = \sum_{j=1}^n a_{i,j}\,b_{j,k} . \end{equation*}

A way to remember it is if you order the indices as we do, that is row, column, and put the elements in the same order as the matrices, then it is the “middle index” that is “summed-out.”

There is a one-to-one correspondence between matrices and linear operators in \(L(X,Y)\text{,}\) once we fix a basis in \(X\) and in \(Y\text{.}\) If we choose a different basis, we get different matrices. This is an important distinction, the operator \(A\) acts on elements of \(X\text{,}\) the matrix is something that works with \(n\)-tuples of numbers, that is, vectors of \(\R^n\text{.}\) By convention, we use standard bases in \(\R^n\) unless otherwise specified, and we identify \(L(\R^n,\R^m)\) with the set of \(m\)-by-\(n\) matrices.

A linear mapping changing one basis to another is represented by a square matrix in which the columns represent vectors of the second basis in terms of the first basis. We call such a linear mapping a change of basis. So for two choices of a basis in an \(n\)-dimensional vector space, there is a linear mapping (a change of basis) taking one basis to the other, and this corresponds to an \(n\)-by-\(n\) matrix which does the corresponding operation on \(\R^n\text{.}\)

Suppose \(X=\R^n\text{,}\) \(Y=\R^m\text{,}\) and all the bases are just the standard bases. Using the Cauchy–Schwarz inequality compute

\begin{equation*} \snorm{Az}^2 = \sum_{i=1}^m { \left(\sum_{j=1}^n a_{i,j} \, c_j \right)}^2 \leq \sum_{i=1}^m { \left(\sum_{j=1}^n {(c_j)}^2 \right) \left(\sum_{j=1}^n {(a_{i,j})}^2 \right) } = \left( \sum_{i=1}^m \sum_{j=1}^n {(a_{i,j})}^2 \right) \snorm{z}^2 . \end{equation*}

In other words, we have a bound on the operator norm (note that equality rarely happens)

\begin{equation*} \snorm{A} \leq \sqrt{\sum_{i=1}^m \sum_{j=1}^n {(a_{i,j})}^2} . \end{equation*}

The right hand side is the euclidean norm on \(\R^{nm}\text{,}\) the space of all the entries of the matrix. If the entries go to zero, then \(\snorm{A}\) goes to zero. Conversely,

\begin{equation*} \sum_{i=1}^m \sum_{j=1}^n {(a_{i,j})}^2 = \sum_{j=1}^n \snorm{A e_j}^2 \leq \sum_{j=1}^n \snorm{A}^2 = n \snorm{A}^2 . \end{equation*}

So if the operator norm of \(A\) goes to zero, so do the entries. In particular, if \(A\) is fixed and \(B\) is changing, then the entries of \(B\) go to the entries of \(A\) if and only if \(B\) goes to \(A\) in operator norm (\(\snorm{A-B}\) goes to zero). We have proved:

Subsection 8.2.3 Determinants

A certain number can be assigned to square matrices that measures how the corresponding linear mapping stretches space. In particular, this number, called the determinant, can be used to test for invertibility of a matrix.

Define the symbol \(\operatorname{sgn}(x)\) (read “sign of \(x\)”) for a number \(x\) by

\begin{equation*} \operatorname{sgn}(x) := \begin{cases} -1 & \text{if } x < 0 , \\ 0 & \text{if } x = 0 , \\ 1 & \text{if } x > 0 . \end{cases} \end{equation*}

Suppose \(\sigma = (\sigma_1,\sigma_2,\ldots,\sigma_n)\) is a permutation of the integers \((1,2,\ldots,n)\text{,}\) that is, a reordering of \((1,2,\ldots,n)\text{.}\) Define

\begin{equation} \operatorname{sgn}(\sigma) = \operatorname{sgn}(\sigma_1,\ldots,\sigma_n) := \prod_{p < q} \operatorname{sgn}(\sigma_q-\sigma_p) .\tag{8.4} \end{equation}

Here \(\prod\) stands for multiplication, similarly to how \(\sum\) stands for summation.

Any permutation can be obtained by a sequence of transpositions (switchings of two elements). A permutation is even (resp. odd) if it takes an even (resp. odd) number of transpositions to get from \((1,2,\ldots,n)\) to \(\sigma\text{.}\) For instance, \((2,4,3,1)\) is two transpositions away from \((1,2,3,4)\) and is therefore even: \((1,2,3,4) \to (2,1,3,4) \to (2,4,3,1)\text{.}\) Being even or odd is well-defined: \(\operatorname{sgn}(\sigma)\) is \(1\) if \(\sigma\) is even and \(-1\) if \(\sigma\) is odd (exercise). This fact can be proved by noting that applying a transposition changes the sign, and computing that \(\operatorname{sgn}(1,2,\ldots,n) = 1\text{.}\)

Let \(S_n\) be the set of all permutations on \(n\) elements (the symmetric group). Let \(A= [a_{i,j}]\) be a square \(n\)-by-\(n\) matrix. Define the determinant of \(A\) as

\begin{equation*} \det(A) := \sum_{\sigma \in S_n} \operatorname{sgn} (\sigma) \prod_{i=1}^n a_{i,\sigma_i} . \end{equation*}

In fact, the determinant is the unique function that satisfies i, ii, and iii. But we digress. By ii, we mean that if we fix all the vectors \(x_1,\ldots,x_n\) except for \(x_j\text{,}\) and let \(v,w \in \R^n\) be two vectors, and \(a,b \in \R\) be scalars, then

\begin{multline*} \det\bigl([x_1 ~~~ \cdots ~~~ x_{j-1} ~~~ (av+bw) ~~~ x_{j+1} ~~~ \cdots ~~~ x_n]\bigr) = \\ a \det\bigl([x_1 ~~~ \cdots ~~~ x_{j-1} ~~~ v ~~~ x_{j+1} ~~~ \cdots ~~~ x_n]\bigr) + b \det\bigl([x_1 ~~~ \cdots ~~~ x_{j-1} ~~~ w ~~~ x_{j+1} ~~~ \cdots ~~~ x_n]\bigr) . \end{multline*}


We go through the proof quickly, as you have likely seen it before. Item i is trivial. For ii, notice that each term in the definition of the determinant contains exactly one factor from each column. Item iii follows by noting that switching two columns is like switching the two corresponding numbers in every element in \(S_n\text{.}\) Hence, all the signs are changed. Item iv follows because if two columns are equal, and we switch them, we get the same matrix back. So item iii says the determinant must be 0. Item v follows because the product in each term in the definition includes one element from the zero column. Item vi follows as \(\det\) is a polynomial in the entries of the matrix and hence continuous (as a function of the entries of the matrix). Two matrices are A function defined on matrices is continuous in the operator norm if and only if it is continuous as a function of the entries (Proposition 8.2.7). Finally, item vii is a direct computation.

The determinant tells us about areas and volumes, and how they change. For example, in the \(1\)-by-\(1\) case, a matrix is just a number, and the determinant is exactly this number. It says how the linear mapping “stretches” the space. Similarly for \(\R^2\text{.}\) Suppose \(A \in L(\R^2)\) is a linear transformation. It can be checked directly that the area of the image of the unit square \(A\bigl([0,1]^2\bigr)\) is precisely \(\sabs{\det(A)}\text{.}\) This works with arbitrary figures, not just the unit square: The absolute value of the determinant tells us the stretch in the area. The sign of the determinant tells us if the image is flipped (changes orientation) or not. In \(\R^3\) it tells us about the 3-dimensional volume, and in \(n\) dimensions about the \(n\)-dimensional volume. We claim this without proof.


Let \(b_1,b_2,\ldots,b_n\) be the columns of \(B\text{.}\) Then

\begin{equation*} AB = [ Ab_1 \quad Ab_2 \quad \cdots \quad Ab_n ] . \end{equation*}

That is, the columns of \(AB\) are \(Ab_1,Ab_2,\ldots,Ab_n\text{.}\)

Let \(b_{j,k}\) denote the elements of \(B\) and \(a_j\) the columns of \(A\text{.}\) By linearity of the determinant,

\begin{equation*} \begin{split} \det(AB) & = \det \bigl([ Ab_1 \quad Ab_2 \quad \cdots \quad Ab_n ] \bigr) = \det \left(\left[ \sum_{j=1}^n b_{j,1} a_j \quad Ab_2 \quad \cdots \quad Ab_n \right]\right) \\ & = \sum_{j=1}^n b_{j,1} \det \bigl([ a_j \quad Ab_2 \quad \cdots \quad Ab_n ]\bigr) \\ & = \sum_{1 \leq j_1,j_2,\ldots,j_n \leq n} b_{j_1,1} b_{j_2,2} \cdots b_{j_n,n} \det \bigl([ a_{j_1} \quad a_{j_2} \quad \cdots \quad a_{j_n} ]\bigr) \\ & = \left( \sum_{(j_1,j_2,\ldots,j_n) \in S_n} b_{j_1,1} b_{j_2,2} \cdots b_{j_n,n} \operatorname{sgn}(j_1,j_2,\ldots,j_n) \right) \det \bigl([ a_{1} \quad a_{2} \quad \cdots \quad a_{n} ]\bigr) . \end{split} \end{equation*}

In the last equality, we can sum over just the elements of \(S_n\) instead of all \(n\)-tuples for integers between 1 and \(n\) by noting that when two columns in the determinant are the same, then the determinant is zero. Then we reordered the columns to the original ordering to obtain the sgn.

The conclusion that \(\det(AB) = \det(A)\det(B)\) follows by recognizing that the expression in parentheses above is the determinant of \(B\text{.}\) We obtain this by plugging in \(A=I\text{.}\) The expression we get for the determinant of \(B\) has rows and columns swapped, so as a side note, we have also just proved that the determinant of a matrix and its transpose are equal.

Let us prove the “Furthermore” part. If \(A\) is invertible, then \(A^{-1}A = I\) and consequently \(\det(A^{-1})\det(A) = \det(A^{-1}A) = \det(I) = 1\text{.}\) If \(A\) is not invertible, then there must be a nonzero vector that \(A\) takes to zero as \(A\) is not one-to-one. In other words, the columns of \(A\) are linearly dependent. Suppose

\begin{equation*} \sum_{j=1}^n \gamma_j\, a_j = 0 , \end{equation*}

where not all \(\gamma_j\) are equal to 0. Without loss of generality suppose \(\gamma_1\neq 0\text{.}\) Take

\begin{equation*} B := \begin{bmatrix} \gamma_1 & 0 & 0 & \cdots & 0 \\ \gamma_2 & 1 & 0 & \cdots & 0 \\ \gamma_3 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \gamma_n & 0 & 0 & \cdots & 1 \end{bmatrix} . \end{equation*}

Using the definition of the determinant (there is only a single permutation \(\sigma\) for which \(\prod_{i=1}^n b_{i,\sigma_i}\) is nonzero) we find \(\det(B) = \gamma_1 \not= 0\text{.}\) Then \(\det(AB) = \det(A)\det(B) = \gamma_1\det(A)\text{.}\) The first column of \(AB\) is zero, and hence \(\det(AB) = 0\text{.}\) We conclude \(\det(A) = 0\text{.}\)

The proof is to compute \(\det(B^{-1}AB) = \det(B^{-1})\det(A)\det(B) = \frac{1}{\det(B)}\det(A)\det(B) = \det(A)\text{.}\)

If in one basis \(A\) is the matrix representing a linear operator, then for another basis we can find a matrix \(B\) such that the matrix \(B^{-1}AB\) takes us to the first basis, applies \(A\) in the first basis, and takes us back to the basis we started with. Let \(X\) be a finite-dimensional vector space. Let \(\Phi \in L(X,\R^n)\) take a basis \(\{ x_1,\ldots,x_n \}\) to the standard basis \(\{ e_1,\ldots,e_n \}\) and let \(\Psi \in L(X,\R^n)\) take another basis \(\{ y_1, \ldots, y_n \}\) to the standard basis. Let \(T \in L(X)\) be a linear operator and let a matrix \(A\) represent the operator in the basis \(\{ x_1,\ldots,x_n \}\text{.}\) Then \(B\) would be such that we have the following diagram 3 : The two \(\R^n\)s on the bottom row represent \(X\) in the first basis, and the \(\R^n\)s on top represent \(X\) in the second basis.

If we compute the determinant of the matrix \(A\text{,}\) we obtain the same determinant if we use any other basis; in the other basis the matrix would be \(B^{-1}AB\text{.}\) Consequently,

\begin{equation*} \det \colon L(X) \to \R \end{equation*}

is a well-defined function without the need to fix a basis. That is, \(\det\) is defined on \(L(X)\text{,}\) not just on matrices.

There are three types of so-called elementary matrices. Let \(e_1,e_2,\ldots,e_n\) be the standard basis on \(\R^n\) as usual.

First, for \(j = 1,2,\ldots,n\) and \(\lambda \in \R\text{,}\) \(\lambda \neq 0\text{,}\) define the first type of an elementary matrix, an \(n\)-by-\(n\) matrix \(E\) by

\begin{equation*} Ee_i := \begin{cases} e_i & \text{if } i \neq j , \\ \lambda e_i & \text{if } i = j . \end{cases} \end{equation*}

Given any \(n\)-by-\(m\) matrix \(M\) the matrix \(EM\) is the same matrix as \(M\) except with the \(j\)th row multiplied by \(\lambda\text{.}\) It is an easy computation (exercise) that \(\det(E) = \lambda\text{.}\)

Next, for \(j\) and \(k\) with \(j\neq k\text{,}\) and \(\lambda \in \R\text{,}\) define the second type of an elementary matrix \(E\) by

\begin{equation*} Ee_i := \begin{cases} e_i & \text{if } i \neq j , \\ e_i + \lambda e_k & \text{if } i = j . \end{cases} \end{equation*}

Given any \(n\)-by-\(m\) matrix \(M\) the matrix \(EM\) is the same matrix as \(M\) except with \(\lambda\) times the \(k\)th row added to the \(j\)th row. It is an easy computation (exercise) that \(\det(E) = 1\text{.}\)

Finally, for \(j\) and \(k\) with \(j\neq k\text{,}\) define the third type of an elementary matrix \(E\) by

\begin{equation*} Ee_i := \begin{cases} e_i & \text{if } i \neq j \text{ and } i \neq k , \\ e_k & \text{if } i = j , \\ e_j & \text{if } i = k . \end{cases} \end{equation*}

Given any \(n\)-by-\(m\) matrix \(M\) the matrix \(EM\) is the same matrix with \(j\)th and \(k\)th rows swapped. It is an easy computation (exercise) that \(\det(E) = -1\text{.}\)

The proof is left as an exercise. The proposition says that we can compute the determinant by doing elementary row operations. For computing the determinant, one does not have to factor the matrix into a product of elementary matrices completely. One only does row operations until one finds an upper triangular matrix, that is, a matrix \([a_{i,j}]\) where \(a_{i,j} = 0\) if \(i > j\text{.}\) Computing determinant of such a matrix is not difficult (exercise).

Factorization into elementary matrices (or variations on elementary matrices) is useful in proofs involving an arbitrary linear operator, by reducing to a proof for an elementary matrix, similarly as the computation of the determinant.

Subsection 8.2.4 Exercises

Exercise 8.2.1.

For a vector space \(X\) with a norm \(\snorm{\cdot}\text{,}\) show that \(d(x,y) := \snorm{x-y}\) makes \(X\) a metric space.

Exercise 8.2.2.

(Easy)   Show that for square matrices \(A\) and \(B\text{,}\) \(\det(AB) = \det(BA)\text{.}\)

Exercise 8.2.3.

For \(x \in \R^n\text{,}\) define

\begin{equation*} \snorm{x}_{\infty} := \max \bigl\{ \sabs{x_1}, \sabs{x_2}, \ldots, \sabs{x_n} \bigr\} , \end{equation*}

sometimes called the sup or the max norm.

  1. Show that \(\snorm{\cdot}_\infty\) is a norm on \(\R^n\) (defining a different distance).

  2. What is the unit ball \(B(0,1)\) in this norm?

Exercise 8.2.4.

For \(x \in \R^n\text{,}\) define

\begin{equation*} \snorm{x}_{1} := \sum_{j=1}^n \sabs{x_j}, \end{equation*}

sometimes called the \(1\)-norm (or \(L^1\) norm).

  1. Show that \(\snorm{\cdot}_1\) is a norm on \(\R^n\) (defining a different distance, sometimes called the taxicab distance).

  2. What is the unit ball \(B(0,1)\) in this norm? Think about what it is in \({\mathbb{R}}^2\) and \(\mathbb{R}^3\text{.}\) Hint: It is, for example, a convex hull of a finite number of points.

Exercise 8.2.5.

Using the euclidean norm on \(\R^2\text{,}\) compute the operator norm of the operators in \(L(\R^2)\) given by the matrices:

a) \(\left[ \begin{smallmatrix} 1 & 0 \\ 0 & 2 \end{smallmatrix} \right]\)     b) \(\left[ \begin{smallmatrix} 0 & 1 \\ -1 & 0 \end{smallmatrix} \right]\)     c) \(\left[ \begin{smallmatrix} 1 & 1 \\ 0 & 1 \end{smallmatrix} \right]\)     d) \(\left[ \begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix} \right]\)

Exercise 8.2.6.

Using the standard euclidean norm \(\R^n\text{,}\) show:

  1. Suppose \(A \in L(\R,\R^n)\) is defined for \(x \in \R\) by \(Ax := xa\) for a vector \(a \in \R^n\text{.}\) Then the operator norm \(\snorm{A}_{L(\R,\R^n)} = \snorm{a}_{\R^n}\text{.}\) (That is, the operator norm of \(A\) is the euclidean norm of \(a\text{.}\))

  2. Suppose \(B \in L(\R^n,\R)\) is defined for \(x \in \R^n\) by \(Bx := b \cdot x\) for a vector \(b \in \R^n\text{.}\) Then the operator norm \(\snorm{B}_{L(\R^n,\R)} = \snorm{b}_{\R^n}\text{.}\)

Exercise 8.2.7.

Suppose \(\sigma = (\sigma_1,\sigma_2,\ldots,\sigma_n)\) is a permutation of \((1,2,\ldots,n)\text{.}\)

  1. Show that we can make a finite number of transpositions (switching of two elements) to get to \((1,2,\ldots,n)\text{.}\)

  2. Using the definition (8.4) show that \(\sigma\) is even if \(\operatorname{sgn}(\sigma) = 1\) and \(\sigma\) is odd if \(\operatorname{sgn}(\sigma) = -1\text{.}\) In particular, showing that being odd or even is well-defined.

Exercise 8.2.8.

Verify the computation of the determinant for the three types of elementary matrices.

Exercise 8.2.10.

  1. Suppose \(D = [d_{i,j}]\) is an \(n\)-by-\(n\) diagonal matrix, that is, \(d_{i,j} = 0\) whenever \(i \not= j\text{.}\) Show that \(\det(D) = d_{1,1}d_{2,2} \cdots d_{n,n}\text{.}\)

  2. Suppose \(A\) is a diagonalizable matrix. That is, there exists a matrix \(B\) such that \(B^{-1}AB = D\) for a diagonal matrix \(D = [d_{i,j}]\text{.}\) Show that \(\det(A) = d_{1,1}d_{2,2} \cdots d_{n,n}\text{.}\)

Exercise 8.2.11.

Take the vector space of polynomials \(\R[t]\) and the linear operator \(D \in L(\R[t])\) that is the differentiation (we proved in an earlier exercise that \(D\) is a linear operator). Given \(P(t) = c_0 + c_1 t + \cdots + c_n t^n \in \R[t]\) define \(\snorm{P} := \sup \bigl\{ \sabs{c_j} : j = 0,1,2,\ldots,n \bigr\}\text{.}\)

  1. Show that \(\snorm{P}\) is a norm on \(\R[t]\text{.}\)

  2. Show that \(D\) does not have bounded operator norm, that is \(\snorm{D} = \infty\text{.}\) Hint: Consider the polynomials \(t^n\) as \(n\) tends to infinity.

Exercise 8.2.12.

We finish the proof of Proposition 8.2.4. Let \(X\) be a finite-dimensional normed vector space with basis \(\{ x_1,x_2,\ldots,x_n \}\text{.}\)

  1. Show that \(f \colon \R^n \to \R\text{,}\)

    \begin{equation*} f(c_1,c_2,\ldots,c_n) := \snorm{c_1 x_1 + c_2 x_2 + \cdots + c_n x_n} , \end{equation*}

    is continuous (the norm is the one on \(X\)).

  2. Show that there exist numbers \(m\) and \(M\) such that if \(c = (c_1,c_2,\ldots,c_n) \in \R^n\) with \(\snorm{c} = 1\) (standard euclidean norm), then \(m \leq \snorm{c_1 x_1 + c_2 x_2 + \cdots + c_n x_n} \leq M\) (here the norm is on \(X\)).

  3. Show that there exists a number \(B\) such that if \(\snorm{c_1 x_1 + c_2 x_2 + \cdots + c_n x_n}=1\text{,}\) then \(\sabs{c_j} \leq B\text{.}\)

  4. Use part c) to show that if \(X\) is finite-dimensional vector spaces and \(A \in L(X,Y)\text{,}\) then \(\snorm{A} < \infty\text{.}\)

Exercise 8.2.13.

Let \(X\) be a finite-dimensional vector space with norm \(\snorm{\cdot}\) and basis \(\{ x_1,x_2,\ldots,x_n \}\text{.}\) Let \(c = (c_1,c_2,\ldots,c_n) \in \R^n\) and \(\snorm{c}\) be the standard euclidean norm on \(\R^n\text{.}\)

  1. Prove that there exist positive numbers \(m,M > 0\) such that for all \(c \in \R^n\text{,}\)

    \begin{equation*} m \snorm{c} \leq \snorm{c_1 x_1 + c_2 x_2 + \cdots + c_n x_n} \leq M \snorm{c} . \end{equation*}

    Hint: See previous exercise.

  2. Use part a) to show that of \(\snorm{\cdot}_1\) and \(\snorm{\cdot}_2\) are two norms on \(X\text{,}\) then there exist positive numbers \(m,M > 0\) (perhaps different from above) such that for all \(x \in X\text{,}\) we have

    \begin{equation*} m \snorm{x}_1 \leq \snorm{x}_2 \leq M \snorm{x}_1 . \end{equation*}
  3. Show that \(U \subset X\) is open in the metric defined by \(\norm{x-y}_1\) if and only if \(U\) is open in the metric defined by \(\norm{x-y}_2\text{.}\) In particular, convergence of sequences and continuity of functions is the same in either norm.

Exercise 8.2.14.

Let \(A\) be an upper triangular matrix. Find a formula for the determinant of \(A\) in terms of the diagonal entries, and prove that your formula works.

Exercise 8.2.15.

Given an \(n\)-by-\(n\) matrix \(A\text{,}\) prove that \(\sabs{\det(A)} \leq \snorm{A}^n\) (the norm on \(A\) is the operator norm). Hint: One way to do it is to first prove it in the case \(\snorm{A}=1\text{,}\) which means that all columns are of norm 1 or less, then prove that this means that \(\sabs{\det(A)} \leq 1\) using linearity.

Exercise 8.2.16.

Consider Proposition 8.2.6 where \(X=\R^n\) (for all \(n\)) using the euclidean norm.

  1. Prove that the estimate \(\snorm{A-B} < \frac{1}{\snorm{A^{-1}}}\) is the best possible: For every \(A \in GL(\R^n)\text{,}\) find a \(B\) where equality is satisfied and \(B\) is not invertible. Hint: Difficulty is that \(\snorm{A}\snorm{A^{-1}}\) is not always \(1\text{.}\) Prove that a vector \(x_1\) can be completed to a basis \(\{ x_1,\ldots,x_n \}\) such that \(x_1 \cdot x_j = 0\) for \(j \geq 2\text{.}\) For the right \(x_1\text{,}\) make it so that \((A-B)x_j = 0\) for \(j \geq 2\text{.}\)

  2. For every fixed \(A \in GL(\R^n)\text{,}\) let \(\sM\) denote the set of matrices \(B\) such that \(\snorm{A-B} < \frac{1}{\snorm{A^{-1}}}\text{.}\) Prove that while every \(B \in \sM\) is invertible, \(\snorm{B^{-1}}\) is unbounded as a function of \(B\) on \(\sM\text{.}\)

Let \(A\) be an \(n\)-by-\(n\) matrix. A number \(\lambda \in \C\) (possibly complex even for a real matrix) is called an eigenvalue of \(A\) if there is a nonzero (possibly complex) vector \(x \in \C^n\) such that \(Ax = \lambda x\) (the multiplication by complex vectors is the same as for real vectors. In particular, if \(x = a+ib\) for real vectors \(a\) and \(b\text{,}\) and \(A\) is a real matrix, then \(Ax = Aa + i Ab\)). The number

\begin{equation*} \rho(A) := \sup \bigl\{ \abs{\lambda} : \lambda \text{ is an eigenvalue of } A \bigr\} \end{equation*}

is called the spectral radius of \(A\text{.}\) Here \(\abs{\lambda}\) is the complex modulus. We state without proof that at least one eigenvalue always exists, and there are no more than \(n\) distinct eigenvalues of \(A\text{.}\) You can therefore assume that \(0 \leq \rho(A) < \infty\text{.}\) The exercises below hold for complex matrices, but feel free to assume they are real matrices.

Exercise 8.2.17.

Let \(A,S\) be \(n\)-by-\(n\) matrices, where \(S\) is invertible. Prove that \(\lambda\) is an eigenvalue of \(A\text{,}\) if and only if it is an eigenvalue of \(S^{-1}AS\text{.}\) Then prove that \(\rho(S^{-1}AS) = \rho(S)\text{.}\) In particular, \(\rho\) is a well-defined function on \(L(X)\) for every finite-dimensional vector space \(X\text{.}\)

Exercise 8.2.18.

Let \(A\) be an \(n\)-by-\(n\) matrix \(A\text{.}\)

  1. Prove \(\rho(A) \leq \snorm{A}\text{.}\)

  2. For every \(k \in \N\text{,}\) prove \(\rho(A) \leq \snorm{A^k}^{1/k}\text{.}\)

  3. Suppose \(\displaystyle \lim_{k\to\infty} A^k = 0\) (limit in the operator norm). Prove that \(\rho(A) < 1\text{.}\)

Exercise 8.2.19.

We say a set \(C \subset \R^n\) is symmetric if \(x \in C\) implies \(-x \in C\text{.}\)

  1. Let \(\snorm{\cdot}\) be any given norm on \(\R^n\text{.}\) Show that the closed unit ball \(C(0,1)\) (using the metric induced by this norm) is a compact symmetric convex set.

  2. (Challenging) Let \(C \subset \R^n\) be a compact symmetric convex set and \(0 \in C\text{.}\) Show that

    \begin{equation*} \snorm{x} := \inf \left\{ \lambda : \lambda > 0 \text{ and } \frac{x}{\lambda} \in C \right\} \end{equation*}

    is a norm on \(\R^n\text{,}\) and \(C = C(0,1)\) (the closed unit ball) in the metric induced by this norm.

Hint: Feel free to the result of Exercise 8.2.13 part c).

If we strike the “In particular” part and interpret the algebra with infinite operator norms properly, namely decree that \(0\) times \(\infty\) is 0, then this result also holds for infinite-dimensional spaces.
\(GL(X)\) is called the general linear group, that is where the acronym GL comes from.
This is a so-called commutative diagram. Following arrows in any way should end up with the same result.
For a higher quality printout use the PDF versions: or