
Section 8.2 Analysis with vector spaces

Note: 3 lectures

Subsection 8.2.1 Norms

Let us start measuring the size of vectors and hence distance.

Definition 8.2.1.

If \(X\) is a vector space, then we say a function \(\snorm{\cdot} \colon X \to \R\) is a norm if
  1. \(\snorm{x} \geq 0\text{,}\) with \(\snorm{x}=0\) if and only if \(x=0\text{.}\)
  2. \(\snorm{cx} = \sabs{c} \, \snorm{x}\) for all \(c \in \R\) and \(x \in X\text{.}\)
  3. \(\snorm{x+y} \leq \snorm{x}+\snorm{y}\) for all \(x,y \in X\)        (triangle inequality).
A vector space equipped with a norm is called a normed vector space.
Given a norm (any norm) on a vector space \(X\text{,}\) define a distance \(d(x,y) \coloneqq \snorm{x-y}\text{,}\) which makes \(X\) into a metric space (exercise). So what you know about metric spaces applies to normed vector spaces. Before defining the standard norm on \(\R^n\text{,}\) we define the standard scalar dot product on \(\R^n\text{.}\) For \(x=(x_1,x_2,\ldots,x_n) \in \R^n\) and \(y=(y_1,y_2,\ldots,y_n) \in \R^n\) define
\begin{equation*} x \cdot y \coloneqq \sum_{k=1}^n x_k\, y_k . \end{equation*}
The dot product is linear in each variable separately; in fancier language, it is bilinear. That is, if \(y\) is fixed, the map \(x \mapsto x \cdot y\) is a linear map from \(\R^n\) to \(\R\text{.}\) Similarly, if \(x\) is fixed, \(y \mapsto x \cdot y\) is linear. It is symmetric in the sense that \(x \cdot y = y \cdot x\text{.}\) Define the euclidean norm as
\begin{equation*} \snorm{x} \coloneqq \snorm{x}_{\R^n} \coloneqq \sqrt{x \cdot x} = \sqrt{(x_1)^2+(x_2)^2 + \cdots + (x_n)^2}. \end{equation*}
We will normally write \(\snorm{x}\text{;}\) only in the rare instance when it is necessary to emphasize that we are talking about the euclidean norm will we write \(\snorm{x}_{\R^n}\text{.}\) Unless otherwise stated, if we talk about \(\R^n\) as a normed vector space, we mean the standard euclidean norm. It is easy to see that the euclidean norm satisfies items i and ii. To prove that item iii holds, the key inequality is the so-called Cauchy–Schwarz inequality we saw before. As this inequality is so important, we state and prove a slightly stronger version using the notation of this chapter.
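For instance, item ii for the euclidean norm is a one-line computation:
\begin{equation*} \snorm{cx} = \sqrt{(cx_1)^2+(cx_2)^2+\cdots+(cx_n)^2} = \sqrt{c^2} \, \sqrt{(x_1)^2+(x_2)^2+\cdots+(x_n)^2} = \sabs{c} \, \snorm{x} . \end{equation*}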

Proof.

If \(x=0\) or \(y = 0\text{,}\) then the theorem holds trivially. So assume \(x\not= 0\) and \(y \not= 0\text{.}\)
If \(x\) is a scalar multiple of \(y\text{,}\) that is, \(x = \lambda y\) for some \(\lambda \in \R\text{,}\) then the theorem holds with equality:
\begin{equation*} \sabs{ x \cdot y } = \sabs{\lambda y \cdot y} = \sabs{\lambda} \, \sabs{y\cdot y} = \sabs{\lambda} \, \snorm{y}^2 = \snorm{\lambda y} \, \snorm{y} = \snorm{x} \, \snorm{y} . \end{equation*}
Fixing \(x\) and \(y\text{,}\) \(\snorm{x+ty}^2\) is a quadratic polynomial as a function of \(t\text{:}\)
\begin{equation*} \snorm{x+ty}^2 = (x+ty) \cdot (x+ty) = x \cdot x + x \cdot ty + ty \cdot x + ty \cdot ty = \snorm{x}^2 + 2t(x \cdot y) + t^2 \snorm{y}^2 . \end{equation*}
If \(x\) is not a scalar multiple of \(y\text{,}\) then \(\snorm{x+ty}^2 > 0\) for all \(t\text{.}\) So the polynomial \(\snorm{x+ty}^2\) is never zero. Elementary algebra says that the discriminant must be negative:
\begin{equation*} 4 {(x \cdot y)}^2 - 4 \snorm{x}^2\snorm{y}^2 < 0. \end{equation*}
In other words, \({(x \cdot y)}^2 < \snorm{x}^2\snorm{y}^2\text{.}\)
Item iii, the triangle inequality in \(\R^n\text{,}\) follows from:
\begin{equation*} \snorm{x+y}^2 = x \cdot x + y \cdot y + 2 (x \cdot y) \leq \snorm{x}^2 + \snorm{y}^2 + 2 \bigl(\snorm{x} \,\snorm{y}\bigr) = {\bigl(\snorm{x} + \snorm{y}\bigr)}^2 . \end{equation*}
The distance \(d(x,y) \coloneqq \snorm{x-y}\) is the standard distance (standard metric) on \(\R^n\) that we used when we talked about metric spaces.

Definition 8.2.3.

Let \(A \in L(X,Y)\text{.}\) Define
\begin{equation*} \snorm{A} \coloneqq \sup \bigl\{ \snorm{Ax} : x \in X \text{ with } \snorm{x} = 1 \bigr\} . \end{equation*}
The number \(\snorm{A}\) (possibly \(\infty\)) is called the operator norm. We will see below that it is indeed a norm on \(L(X,Y)\) for finite-dimensional spaces. Again, when necessary to emphasize which norm we are talking about, we may write it as \(\snorm{A}_{L(X,Y)}\text{.}\)
For example, if \(X=\R^1\) with norm \(\snorm{x}=\sabs{x}\text{,}\) elements of \(L(X)\) are multiplication by scalars, \(x \mapsto ax\text{,}\) and we identify \(a \in \R\) with the corresponding element of \(L(X)\text{.}\) If \(\snorm{x}=\sabs{x}=1\text{,}\) then \(\sabs{a x} = \sabs{a}\text{,}\) so the operator norm of \(a\) is \(\sabs{a}\text{.}\)
By linearity, \(\norm{A \frac{x}{\snorm{x}}} = \frac{\snorm{Ax}}{\snorm{x}}\) for all nonzero \(x \in X\text{.}\) The vector \(\frac{x}{\snorm{x}}\) is of norm 1. Therefore,
\begin{equation*} \snorm{A} = \sup \bigl\{ \snorm{Ax} : x \in X \text{ with } \snorm{x} = 1 \bigr\} = \sup_{\substack{x \in X\\x\neq 0}} \frac{\snorm{Ax}}{\snorm{x}} . \end{equation*}
This implies, assuming \(\snorm{A}\not= \infty\) to avoid a technicality when \(x=0\text{,}\) that for every \(x \in X\text{,}\)
\begin{equation*} \snorm{Ax} \leq \snorm{A} \snorm{x} . \end{equation*}
Conversely, if one shows \(\snorm{Ax} \leq C \snorm{x}\) for all \(x\text{,}\) then \(\snorm{A} \leq C\text{.}\)
It is not hard to see from the definition that \(\snorm{A} = 0\) if and only if \(A = 0\text{,}\) where \(A=0\) means that \(A\) takes every vector to the zero vector. It is also not difficult to compute the operator norm of the identity operator:
\begin{equation*} \snorm{I} = \sup_{\substack{x \in X\\x\neq 0}} \frac{\snorm{Ix}}{\snorm{x}} = \sup_{\substack{x \in X\\x\neq 0}} \frac{\snorm{x}}{\snorm{x}} = 1. \end{equation*}
The operator norm is not always so easy to compute using the definition alone, nor is it easy to read off from the form of the operator. Consider \(\R^2\) and the operator \(A \in L(\R^2)\) that takes \((x,y)\) to \((x+y,2x)\text{.}\) Unit norm vectors can be written as \(\bigl(\pm t, \pm \sqrt{1-t^2}\bigr)\) for \(t \in [0,1]\) (or perhaps \(\bigl(\cos(\theta), \sin(\theta) \bigr)\)). One then maximizes
\begin{equation*} \snorm{A(x,y)} = \sqrt{{\left(t \pm \sqrt{1-t^2}\right)}^2+4t^2} \end{equation*}
to find \(\snorm{A} = \sqrt{3+\sqrt{5}}\) (the maximization is spelled out after this paragraph). More generally, one often proceeds in two steps, bounding the norm from above and from below. For instance, consider the operator \(B \in L\bigl(C([0,1],\R),\R\bigr)\) taking a continuous \(f\) to \(f(0)\text{.}\) If \(\snorm{f} = 1\) (the uniform norm), then clearly \(\sabs{f(0)} \leq 1\text{,}\) so \(\sabs{Bf} \leq 1\text{,}\) meaning \(\snorm{B} \leq 1\text{.}\) To prove that it equals \(1\text{,}\) note that the constant function \(1\) has norm \(1\) and \(B1 = 1\text{,}\) so \(\snorm{B} \geq 1\text{.}\) So \(\snorm{B}=1\text{.}\)
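To spell out the maximization for the operator \(A\) above, use the parametrization \(\bigl(\cos(\theta),\sin(\theta)\bigr)\text{:}\)
\begin{equation*} \snorm{A\bigl(\cos(\theta),\sin(\theta)\bigr)}^2 = {\bigl(\cos(\theta)+\sin(\theta)\bigr)}^2 + 4\cos^2(\theta) = 3 + \sin(2\theta) + 2\cos(2\theta) . \end{equation*}
As \(a\sin(\varphi) + b\cos(\varphi)\) has maximum \(\sqrt{a^2+b^2}\text{,}\) the right-hand side has maximum \(3+\sqrt{5}\) (attained for a suitable \(\theta\)), and so \(\snorm{A} = \sqrt{3+\sqrt{5}}\text{.}\)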
The operator norm is not always a norm on \(L(X,Y)\text{,}\) in particular, \(\snorm{A}\) is not always finite for \(A \in L(X,Y)\text{.}\) We prove below that \(\snorm{A}\) is finite when \(X\) is finite-dimensional. The operator norm being finite is equivalent to \(A\) being continuous. For infinite-dimensional spaces, neither statement needs to be true. For an example, consider the vector space of continuously differentiable functions on \([0,2\pi]\) using the uniform norm. The functions \(t \mapsto \sin(n t)\) have norm 1, but their derivatives have norm \(n\text{.}\) So differentiation, which is a linear operator valued in the space of continuous functions, has infinite operator norm on this space. We will stick to finite-dimensional spaces.
Given a finite-dimensional vector space \(X\text{,}\) we often think of it as \(\R^n\text{,}\) although if we have a norm on \(X\text{,}\) the norm might not be the standard euclidean norm. In the exercises, you can prove that every norm on \(\R^n\) is “equivalent” to the euclidean norm in that the topology it generates is the same. For simplicity, we only prove the following proposition for euclidean spaces, and the proof for general finite-dimensional spaces is left as an exercise.

Proof.

As we said, we only prove the proposition for euclidean spaces: Suppose \(X = \R^n\) and the norm is the standard euclidean norm. The general case is left as an exercise.
Let \(\{ e_1,e_2,\ldots,e_n \}\) be the standard basis of \(\R^n\text{.}\) Write \(x \in \R^n\text{,}\) with \(\snorm{x} = 1\text{,}\) as
\begin{equation*} x = \sum_{k=1}^n c_k \, e_k . \end{equation*}
Since \(e_k \cdot e_\ell = 0\) whenever \(k\not=\ell\) and \(e_k \cdot e_k = 1\text{,}\) we have \(c_k = x \cdot e_k\text{.}\) By Cauchy–Schwarz,
\begin{equation*} \sabs{c_k} = \sabs{ x \cdot e_k } \leq \snorm{x} \, \snorm{e_k} = 1 . \end{equation*}
Then
\begin{equation*} \snorm{Ax} = \norm{\sum_{k=1}^n c_k \, Ae_k} \leq \sum_{k=1}^n \sabs{c_k} \, \snorm{Ae_k} \leq \sum_{k=1}^n \snorm{Ae_k} . \end{equation*}
The right-hand side does not depend on \(x\text{.}\) We found a finite upper bound for \(\snorm{Ax}\) independent of \(x\text{,}\) so \(\snorm{A} < \infty\text{.}\)
Take normed vector spaces \(X\) and \(Y\text{,}\) and \(A \in L(X,Y)\) with \(\snorm{A} < \infty\text{.}\) For \(v,w \in X\text{,}\)
\begin{equation*} \snorm{Av - Aw} = \snorm{A(v-w)} \leq \snorm{A} \, \snorm{v-w} . \end{equation*}
As \(\snorm{A} < \infty\text{,}\) the inequality above says that \(A\) is Lipschitz with constant \(\snorm{A}\text{.}\)

Proof.

First, since all the spaces are finite-dimensional, all the operator norms are finite, and the statements make sense to begin with.
For i, let \(x \in X\) be arbitrary. Then
\begin{equation*} \snorm{(A+B)x} = \snorm{Ax+Bx} \leq \snorm{Ax}+\snorm{Bx} \leq \snorm{A} \, \snorm{x}+\snorm{B} \,\snorm{x} = \bigl(\snorm{A}+\snorm{B}\bigr) \snorm{x} . \end{equation*}
So \(\snorm{A+B} \leq \snorm{A}+\snorm{B}\text{.}\) Similarly,
\begin{equation*} \snorm{(cA)x} = \sabs{c} \, \snorm{Ax} \leq \bigl(\sabs{c} \,\snorm{A}\bigr) \snorm{x} . \end{equation*}
Thus \(\snorm{cA} \leq \sabs{c} \, \snorm{A}\text{.}\) Next,
\begin{equation*} \sabs{c} \, \snorm{Ax} = \snorm{cAx} \leq \snorm{cA} \, \snorm{x} . \end{equation*}
Hence \(\sabs{c} \, \snorm{A} \leq \snorm{cA}\text{.}\)
For ii, write
\begin{equation*} \snorm{BAx} \leq \snorm{B} \, \snorm{Ax} \leq \snorm{B} \, \snorm{A} \, \snorm{x} . \qedhere \end{equation*}
A norm defines a metric, giving a metric space topology on \(L(X,Y)\) for finite-dimensional vector spaces. So, we can talk about open/closed sets, continuity, convergence, etc.
We illustrate this proposition on a simple example. Consider \(X=\R^1\text{,}\) where linear operators are just numbers \(a\) and the operator norm of \(a\) is \(\sabs{a}\text{.}\) The operator \(a\) is invertible (\(a^{-1} = \nicefrac{1}{a}\)) whenever \(a \not=0\text{.}\) The condition \(\sabs{a-b} < \frac{1}{\sabs{a^{-1}}}\) indeed implies that \(b\) is not zero. Moreover, \(a \mapsto \nicefrac{1}{a}\) is a continuous function. When the dimension is \(2\) or higher, there are other noninvertible operators than just zero, and things are a bit more difficult.

Proof.

Let us prove i. We know something about \(A^{-1}\) and \(A-B\text{;}\) they are linear operators. So apply them to a vector:
\begin{equation*} A^{-1}(A-B)x = x-A^{-1}Bx . \end{equation*}
Therefore,
\begin{equation*} \begin{split} \snorm{x} & = \snorm{A^{-1} (A-B)x + A^{-1}Bx} \\ & \leq \snorm{A^{-1}}\,\snorm{A-B}\, \snorm{x} + \snorm{A^{-1}}\,\snorm{Bx} . \end{split} \end{equation*}
Assume \(x \neq 0\) and so \(\snorm{x} \neq 0\text{.}\) Using (8.2), we obtain
\begin{equation*} \snorm{x} < \snorm{x} + \snorm{A^{-1}} \, \snorm{Bx} . \end{equation*}
Thus \(\snorm{Bx} \not= 0\) for all \(x \not= 0\text{,}\) and consequently \(Bx \not= 0\) for all \(x \not= 0\text{.}\) So \(B\) is one-to-one; if \(Bx = By\text{,}\) then \(B(x-y) = 0\text{,}\) so \(x=y\text{.}\) As \(B\) is a one-to-one linear mapping from \(X\) to \(X\text{,}\) which is finite-dimensional, it is also onto by Proposition 8.1.18. Therefore, \(B\) is invertible. It follows that, in particular, \(GL(X)\) is open.
Let us prove ii. We must show that the inverse is continuous. Fix \(A \in GL(X)\text{.}\) Let \(B\) be near \(A\text{,}\) specifically \(\snorm{A-B} < \frac{1}{2 \snorm{A^{-1}}}\text{.}\) Then (8.2) is satisfied and \(B\) is invertible. A computation similar to the one above (using \(B^{-1}y\) instead of \(x\)) gives
\begin{equation*} \snorm{B^{-1}y} \leq \snorm{A^{-1}} \, \snorm{A-B} \, \snorm{B^{-1}y} + \snorm{A^{-1}} \, \snorm{y} \leq \frac{1}{2} \snorm{B^{-1}y} + \snorm{A^{-1}}\,\snorm{y} , \end{equation*}
or
\begin{equation*} \snorm{B^{-1}y} \leq 2\snorm{A^{-1}}\,\snorm{y} . \end{equation*}
So \(\snorm{B^{-1}} \leq 2 \snorm{A^{-1}} \text{.}\)
Now
\begin{equation*} A^{-1}(A-B)B^{-1} = A^{-1}(AB^{-1}-I) = B^{-1}-A^{-1} , \end{equation*}
and
\begin{equation*} \snorm{B^{-1}-A^{-1}} = \snorm{A^{-1}(A-B)B^{-1}} \leq \snorm{A^{-1}}\,\snorm{A-B}\,\snorm{B^{-1}} \leq 2\snorm{A^{-1}}^2 \snorm{A-B} . \end{equation*}
Therefore, as \(B\) tends to \(A\text{,}\) \(\snorm{B^{-1}-A^{-1}}\) tends to 0, and so the inverse operation is a continuous function at \(A\text{.}\)

Subsection 8.2.2 Matrices

Once we fix a basis in a finite-dimensional vector space \(X\text{,}\) we can represent a vector of \(X\) as an \(n\)-tuple of numbers, that is, a vector in \(\R^n\text{.}\) The same can be done with \(L(X,Y)\text{,}\) bringing us to matrices, which are a convenient way to represent finite-dimensional linear transformations. Suppose \(\{ x_1, x_2, \ldots, x_n \}\) and \(\{ y_1, y_2, \ldots, y_m \}\) are bases for vector spaces \(X\) and \(Y\) respectively. A linear operator is determined by its values on the basis. Given \(A \in L(X,Y)\text{,}\) \(A x_j\) is an element of \(Y\text{.}\) Define the numbers \(a_{i,j}\) via
\begin{equation} A x_j = \sum_{i=1}^m a_{i,j} \, y_i ,\tag{8.3} \end{equation}
and write them as a matrix, which we, by slight abuse of notation, also call \(A\text{,}\)
\begin{equation*} A = \begin{bmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{bmatrix} . \end{equation*}
We sometimes write \(A\) as \([a_{i,j}]\text{.}\) We say \(A\) is an \(m\)-by-\(n\) matrix. The \(j\)th column of the matrix contains precisely the coefficients that represent \(A x_j\) in terms of the basis \(\{ y_1,y_2,\ldots,y_m \}\text{.}\) Conversely, given the numbers \(a_{i,j}\text{,}\) the formula (8.3) determines the corresponding linear operator, as a linear operator is determined by its action on a basis. Hence, once we fix bases on \(X\) and \(Y\text{,}\) we have a one-to-one correspondence between \(L(X,Y)\) and the \(m\)-by-\(n\) matrices. When
\begin{equation*} z = \sum_{j=1}^n z_j \, x_j , \end{equation*}
then
\begin{equation*} A z = \sum_{j=1}^n z_j \, A x_j = \sum_{j=1}^n z_j \left( \sum_{i=1}^m a_{i,j}\, y_i \right) = \sum_{i=1}^m \left(\sum_{j=1}^n a_{i,j}\, z_j \right) y_i , \end{equation*}
which gives rise to the familiar rule for matrix multiplication, thinking of \(z\) as a column vector, that is, an \(n\)-by-1 matrix. More generally, if \(B\) is an \(n\)-by-\(r\) matrix with entries \(b_{j,k}\text{,}\) then the matrix for \(C = AB\) is an \(m\)-by-\(r\) matrix whose \((i,k)\)th entry \(c_{i,k}\) is
\begin{equation*} c_{i,k} = \sum_{j=1}^n a_{i,j}\,b_{j,k} . \end{equation*}
A way to remember it: If you order the indices as we do, that is, row then column, and write the factors in the same order as the matrices, then the “middle index” is “summed out.”
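For instance (a purely illustrative numerical example), for a pair of 2-by-2 matrices,
\begin{equation*} \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 \cdot 0 + 2 \cdot 1 & 1 \cdot 1 + 2 \cdot 1 \\ 3 \cdot 0 + 4 \cdot 1 & 3 \cdot 1 + 4 \cdot 1 \end{bmatrix} = \begin{bmatrix} 2 & 3 \\ 4 & 7 \end{bmatrix} . \end{equation*}
The \((1,2)\)th entry, say, is \(c_{1,2} = a_{1,1} b_{1,2} + a_{1,2} b_{2,2} = 1 \cdot 1 + 2 \cdot 1 = 3\text{;}\) the middle index \(j\) is summed out.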
There is a one-to-one correspondence between matrices and linear operators in \(L(X,Y)\text{,}\) once we fix bases in \(X\) and \(Y\text{.}\) If we choose different bases, we get different matrices. This is an important distinction. The operator \(A\) acts on elements of \(X\text{,}\) while the matrix is something that works with \(n\)-tuples of numbers, that is, vectors of \(\R^n\text{.}\) By convention, we use standard bases in \(\R^n\) unless otherwise specified, and we identify \(L(\R^n,\R^m)\) with the set of \(m\)-by-\(n\) matrices.
A linear mapping changing one basis to another is represented by a square matrix in which the columns represent vectors of the second basis in terms of the first basis. We call such a linear mapping a change of basis. So for two choices of a basis in an \(n\)-dimensional vector space, there is a linear mapping (a change of basis) taking one basis to the other, and this corresponds to an \(n\)-by-\(n\) matrix which does the corresponding operation on \(\R^n\text{.}\)
Suppose \(X=\R^n\text{,}\) \(Y=\R^m\text{,}\) and all the bases are just the standard bases. Using the Cauchy–Schwarz inequality, with \(c=(c_1,c_2,\ldots,c_n) \in \R^n\text{,}\) compute
\begin{equation*} \snorm{Ac}^2 = \sum_{i=1}^m { \left(\sum_{j=1}^n a_{i,j} \, c_j \right)}^2 \leq \sum_{i=1}^m \left( \left(\sum_{j=1}^n {(a_{i,j})}^2 \right) \left(\sum_{j=1}^n {(c_j)}^2 \right) \right) = \left( \sum_{i=1}^m \sum_{j=1}^n {(a_{i,j})}^2 \right) \snorm{c}^2 . \end{equation*}
In other words, we have a bound on the operator norm (equality rarely happens; for example, for the identity on \(\R^2\) the right-hand side below is \(\sqrt{2}\text{,}\) while \(\snorm{I} = 1\))
\begin{equation*} \snorm{A} \leq \sqrt{\sum_{i=1}^m \sum_{j=1}^n {(a_{i,j})}^2} . \end{equation*}
The right hand side is the euclidean norm on \(\R^{nm}\text{,}\) the space of all the entries of the matrix. If the entries go to zero, then \(\snorm{A}\) goes to zero. Conversely,
\begin{equation*} \sum_{i=1}^m \sum_{j=1}^n {(a_{i,j})}^2 = \sum_{j=1}^n \snorm{A e_j}^2 \leq \sum_{j=1}^n \snorm{A}^2 = n \snorm{A}^2 . \end{equation*}
So if the operator norm of \(A\) goes to zero, so do the entries. In particular, if \(A\) is fixed and \(B\) is changing, then the entries of \(B\) go to the entries of \(A\) if and only if \(B\) goes to \(A\) in operator norm (\(\snorm{A-B}\) goes to zero). We have proved:

Subsection 8.2.3 Determinants

A certain number can be assigned to square matrices that measures how the corresponding linear mapping stretches space. In particular, this number, called the determinant, can be used to test for invertibility of a matrix.
Define the symbol \(\operatorname{sgn}(x)\) (read “sign of \(x\)”) for a number \(x\) by
\begin{equation*} \operatorname{sgn}(x) \coloneqq \begin{cases} -1 & \text{if } x < 0 , \\ 0 & \text{if } x = 0 , \\ 1 & \text{if } x > 0 . \end{cases} \end{equation*}
A permutation \(\sigma = (\sigma_1,\sigma_2,\ldots,\sigma_n)\) is a reordering of \((1,2,\ldots,n)\text{.}\) Define
\begin{equation} \operatorname{sgn}(\sigma) = \operatorname{sgn}(\sigma_1,\ldots,\sigma_n) \coloneqq \prod_{p < q} \operatorname{sgn}(\sigma_q-\sigma_p) .\tag{8.4} \end{equation}
Here \(\prod\) stands for multiplication, similarly to how \(\sum\) stands for summation.
Every permutation can be obtained by a sequence of transpositions (switchings of two elements). A permutation is even (resp. odd) if it takes an even (resp. odd) number of transpositions to get from \((1,2,\ldots,n)\) to \(\sigma\text{.}\) For instance, \((2,4,3,1)\) is two transpositions away from \((1,2,3,4)\) and is therefore even: \((1,2,3,4) \to (2,1,3,4) \to (2,4,3,1)\text{.}\) Being even or odd is well-defined: \(\operatorname{sgn}(\sigma)\) is \(1\) if \(\sigma\) is even and \(-1\) if \(\sigma\) is odd (exercise). This fact follows since applying a transposition changes the sign and \(\operatorname{sgn}(1,2,\ldots,n) = 1\text{.}\)
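To illustrate (8.4) on the permutation above, take the product over the six pairs \(p < q\text{:}\)
\begin{equation*} \operatorname{sgn}(2,4,3,1) = \operatorname{sgn}(4-2) \operatorname{sgn}(3-2) \operatorname{sgn}(1-2) \operatorname{sgn}(3-4) \operatorname{sgn}(1-4) \operatorname{sgn}(1-3) = (1)(1)(-1)(-1)(-1)(-1) = 1 , \end{equation*}
in agreement with the transposition count.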
Let \(S_n\) be the set of all permutations on \(n\) elements (the symmetric group). Let \(A= [a_{i,j}]\) be a square \(n\)-by-\(n\) matrix. Define the determinant of \(A\) as
\begin{equation*} \det(A) \coloneqq \sum_{\sigma \in S_n} \operatorname{sgn} (\sigma) \prod_{i=1}^n a_{i,\sigma_i} . \end{equation*}
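For instance, for \(n = 2\text{,}\) the only permutations are \((1,2)\) with sign \(1\) and \((2,1)\) with sign \(-1\text{,}\) and the definition reduces to the familiar formula
\begin{equation*} \det \left( \begin{bmatrix} a_{1,1} & a_{1,2} \\ a_{2,1} & a_{2,2} \end{bmatrix} \right) = a_{1,1} a_{2,2} - a_{1,2} a_{2,1} . \end{equation*}
In general, the sum has \(n!\) terms, one for each permutation.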
In fact, the determinant is the unique function that satisfies i, ii, and iii, but we digress. By ii, we mean that if we fix all the vectors \(x_1,\ldots,x_n\) except for \(x_j\text{,}\) and let \(v,w \in \R^n\) be two vectors, and \(a,b \in \R\) be scalars, then
\begin{multline*} \det\bigl([x_1 ~~~ \cdots ~~~ x_{j-1} ~~~ (av+bw) ~~~ x_{j+1} ~~~ \cdots ~~~ x_n]\bigr) = \\ a \det\bigl([x_1 ~~~ \cdots ~~~ x_{j-1} ~~~ v ~~~ x_{j+1} ~~~ \cdots ~~~ x_n]\bigr) + b \det\bigl([x_1 ~~~ \cdots ~~~ x_{j-1} ~~~ w ~~~ x_{j+1} ~~~ \cdots ~~~ x_n]\bigr) . \end{multline*}

Proof.

We go through the proof quickly, as you have likely seen it before. Item i is trivial. For ii, note that each term in the definition of the determinant contains exactly one factor from each column. Item iii follows as switching two columns is switching the two corresponding numbers in every element in \(S_n\text{.}\) Hence, all the signs are changed. Item iv follows because if two columns are equal, and we switch them, we get the same matrix back. So item iii says the determinant must be 0. Item v follows because the product in each term in the definition includes one element from the zero column. Item vi follows as \(\det\) is a polynomial in the entries of the matrix and hence continuous (as a function of the entries of the matrix). A function defined on matrices is continuous in the operator norm if and only if it is continuous as a function of the entries (Proposition 8.2.7). Finally, item vii is a direct computation.
The determinant tells us about areas and volumes, and how they change. For example, in the \(1\)-by-\(1\) case, a matrix is just a number, and the determinant is exactly this number. It says how the linear mapping “stretches” the space. Similarly, suppose \(A \in L(\R^2)\) is a linear transformation. It can be checked directly that the area of the image of the unit square \(A\bigl([0,1]^2\bigr)\) is \(\sabs{\det(A)}\text{,}\) see Figure 8.3 for an example. This works with arbitrary figures, not just the unit square: The absolute value of the determinant tells us the stretch in the area. The sign of the determinant tells us if the image is flipped (changes orientation) or not. In \(\R^3\) it tells us about the 3-dimensional volume, and in \(n\) dimensions about the \(n\)-dimensional volume. We claim this without proof.

Figure 8.3. Image of the unit square \({[0,1]}^2\) via the matrix \(\left[\begin{smallmatrix}1 & 1 \\ -1 & 1\end{smallmatrix}\right]\text{.}\) The image is a square of side \(\sqrt{2}\text{,}\) thus of area 2, and the determinant of the matrix is 2.

Proof.

Let \(b_1,b_2,\ldots,b_n\) be the columns of \(B\text{.}\) Then
\begin{equation*} AB = [ Ab_1 \quad Ab_2 \quad \cdots \quad Ab_n ] . \end{equation*}
That is, the columns of \(AB\) are \(Ab_1,Ab_2,\ldots,Ab_n\text{.}\)
Let \(b_{j,k}\) denote the elements of \(B\) and \(a_j\) the columns of \(A\text{.}\) By linearity of the determinant,
\begin{equation*} \begin{split} \det(AB) & = \det \bigl([ Ab_1 \quad Ab_2 \quad \cdots \quad Ab_n ] \bigr) = \det \left(\left[ \sum_{j=1}^n b_{j,1} a_j \quad Ab_2 \quad \cdots \quad Ab_n \right]\right) \\ & = \sum_{j=1}^n b_{j,1} \det \bigl([ a_j \quad Ab_2 \quad \cdots \quad Ab_n ]\bigr) \\ & = \sum_{1 \leq j_1,j_2,\ldots,j_n \leq n} b_{j_1,1} b_{j_2,2} \cdots b_{j_n,n} \det \bigl([ a_{j_1} \quad a_{j_2} \quad \cdots \quad a_{j_n} ]\bigr) \\ & = \left( \sum_{(j_1,j_2,\ldots,j_n) \in S_n} b_{j_1,1} b_{j_2,2} \cdots b_{j_n,n} \operatorname{sgn}(j_1,j_2,\ldots,j_n) \right) \det \bigl([ a_{1} \quad a_{2} \quad \cdots \quad a_{n} ]\bigr) . \end{split} \end{equation*}
In the last equality, we sum over the elements of \(S_n\) instead of over all \(n\)-tuples of integers between 1 and \(n\text{,}\) because when two columns in the determinant are the same, then the determinant is zero. Reordering the columns to the original ordering obtains the sgn.
The conclusion that \(\det(AB) = \det(A)\det(B)\) follows by recognizing that the expression in parentheses above is the determinant of \(B\text{.}\) We obtain this by plugging in \(A=I\text{.}\) The expression we get for the determinant of \(B\) has rows and columns swapped, so as a bonus, we have also just proved that the determinant of a matrix and its transpose are equal.
Let us prove the “Furthermore.” If \(A\) is invertible, then \(A^{-1}A = I\text{.}\) Consequently \(\det(A^{-1})\det(A) = \det(A^{-1}A) = \det(I) = 1\text{.}\) If \(A\) is not invertible, then it is not one-to-one, and so \(A\) takes some nonzero vector to zero. In other words, the columns of \(A\) are linearly dependent. Suppose
\begin{equation*} \sum_{k=1}^n \gamma_k\, a_k = 0 , \end{equation*}
where not all \(\gamma_k\) are equal to 0. Without loss of generality, suppose \(\gamma_1\neq 0\text{.}\) Take
\begin{equation*} B \coloneqq \begin{bmatrix} \gamma_1 & 0 & 0 & \cdots & 0 \\ \gamma_2 & 1 & 0 & \cdots & 0 \\ \gamma_3 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \gamma_n & 0 & 0 & \cdots & 1 \end{bmatrix} . \end{equation*}
Using the definition of the determinant (there is only a single permutation \(\sigma\) for which \(\prod_{i=1}^n b_{i,\sigma_i}\) is nonzero) we find \(\det(B) = \gamma_1 \not= 0\text{.}\) Then \(\det(AB) = \det(A)\det(B) = \gamma_1\det(A)\text{.}\) The first column of \(AB\) is zero, and hence \(\det(AB) = 0\text{.}\) We conclude \(\det(A) = 0\text{.}\)

Proof.

\(\det(B^{-1}AB) = \det(B^{-1})\det(A)\det(B) = \frac{1}{\det(B)}\det(A)\det(B) = \det(A)\text{.}\)
If in one basis \(A\) is the matrix representing a linear operator, then for another basis we can find a matrix \(B\) such that the matrix \(B^{-1}AB\) takes us to the first basis, applies \(A\) in the first basis, and takes us back to the basis we started with. Let \(X\) be a finite-dimensional vector space. Let \(\Phi \in L(X,\R^n)\) take a basis \(\{ x_1,\ldots,x_n \}\) to the standard basis \(\{ e_1,\ldots,e_n \}\) and let \(\Psi \in L(X,\R^n)\) take another basis \(\{ y_1, \ldots, y_n \}\) to the standard basis. Let \(T \in L(X)\) be a linear operator and let a matrix \(A\) represent the operator in the basis \(\{ x_1,\ldots,x_n \}\text{.}\) Then \(B\) would be such that we have a commutative diagram in which the two \(\R^n\)s on the bottom row represent \(X\) in the first basis, and the \(\R^n\)s on top represent \(X\) in the second basis.
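In formulas (one way to spell the diagram out): The matrix \(A\) representing \(T\) in the basis \(\{ x_1,\ldots,x_n \}\) is \(A = \Phi T \Phi^{-1}\) (identifying matrices with operators on \(\R^n\)), and taking \(B \coloneqq \Phi \Psi^{-1}\) gives
\begin{equation*} B^{-1} A B = \Psi \Phi^{-1} \bigl( \Phi T \Phi^{-1} \bigr) \Phi \Psi^{-1} = \Psi T \Psi^{-1} , \end{equation*}
which is the matrix representing \(T\) in the basis \(\{ y_1,\ldots,y_n \}\text{.}\)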
If we compute the determinant of the matrix \(A\text{,}\) we obtain the same determinant if we use any other basis; in the other basis the matrix would be \(B^{-1}AB\text{.}\) Consequently,
\begin{equation*} \det \colon L(X) \to \R \end{equation*}
is a well-defined function without the need to fix a basis. That is, \(\det\) is defined on \(L(X)\text{,}\) not just on matrices.
There are three types of so-called elementary matrices. Let \(e_1,e_2,\ldots,e_n\) be the standard basis on \(\R^n\) as usual. First, for \(j = 1,2,\ldots,n\) and \(\lambda \in \R\text{,}\) \(\lambda \neq 0\text{,}\) define the first type of an elementary matrix, an \(n\)-by-\(n\) matrix \(E\) by
\begin{equation*} Ee_i \coloneqq \begin{cases} e_i & \text{if } i \neq j , \\ \lambda e_i & \text{if } i = j . \end{cases} \end{equation*}
Given any \(n\)-by-\(m\) matrix \(M\) the matrix \(EM\) is the same matrix as \(M\) except with the \(j\)th row multiplied by \(\lambda\text{.}\) It is an easy computation (exercise) that \(\det(E) = \lambda\text{.}\)
Next, for \(j,k\) with \(j\neq k\) and \(\lambda \in \R\text{,}\) define the second type of an elementary matrix \(E\) by
\begin{equation*} Ee_i \coloneqq \begin{cases} e_i & \text{if } i \neq j , \\ e_i + \lambda e_k & \text{if } i = j . \end{cases} \end{equation*}
Given any \(n\)-by-\(m\) matrix \(M\) the matrix \(EM\) is the same matrix as \(M\) except with \(\lambda\) times the \(j\)th row added to the \(k\)th row. It is an easy computation (exercise) that \(\det(E) = 1\text{.}\)
Finally, for \(j\) and \(k\) with \(j\neq k\text{,}\) define the third type of an elementary matrix \(E\) by
\begin{equation*} Ee_i \coloneqq \begin{cases} e_i & \text{if } i \neq j \text{ and } i \neq k , \\ e_k & \text{if } i = j , \\ e_j & \text{if } i = k . \end{cases} \end{equation*}
Given any \(n\)-by-\(m\) matrix \(M\) the matrix \(EM\) is the same matrix with \(j\)th and \(k\)th rows swapped. It is an easy computation (exercise) that \(\det(E) = -1\text{.}\)
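For concreteness, here are the three types for \(n = 3\) (with the illustrative choice \(j=2\) for the first type, \(j=1\text{,}\) \(k=3\) for the second, and \(j=1\text{,}\) \(k=2\) for the third):
\begin{equation*} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & 1 \end{bmatrix} , \qquad \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ \lambda & 0 & 1 \end{bmatrix} , \qquad \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix} , \end{equation*}
with determinants \(\lambda\text{,}\) \(1\text{,}\) and \(-1\) respectively.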
The proof is left as an exercise. The proposition says we can compute the determinant via elementary row operations. We do not have to factor the matrix into a product of elementary matrices completely. It is sufficient to do row operations until we find an upper triangular matrix, that is, a matrix \([a_{i,j}]\) where \(a_{i,j} = 0\) if \(i > j\text{.}\) Computing the determinant of such a matrix is not difficult (exercise).
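For instance (a small illustrative computation), adding \(-2\) times the first row to the second row is multiplication by an elementary matrix of the second type, which has determinant \(1\text{,}\) so
\begin{equation*} \det \left( \begin{bmatrix} 2 & 1 \\ 4 & 5 \end{bmatrix} \right) = \det \left( \begin{bmatrix} 2 & 1 \\ 0 & 3 \end{bmatrix} \right) = 2 \cdot 3 = 6 . \end{equation*}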
Factorization into elementary matrices (or variations on elementary matrices) is useful in proofs involving an arbitrary linear operator: One reduces to a proof about elementary matrices, just as with the computation of the determinant above.

Subsection 8.2.4 Exercises

Exercise 8.2.1.

For a vector space \(X\) with a norm \(\snorm{\cdot}\text{,}\) show that \(d(x,y) \coloneqq \snorm{x-y}\) makes \(X\) a metric space.

Exercise 8.2.2.

(Easy)   Show that for square matrices \(A\) and \(B\text{,}\) \(\det(AB) = \det(BA)\text{.}\)

Exercise 8.2.3.

For \(x \in \R^n\text{,}\) define
\begin{equation*} \snorm{x}_{\infty} \coloneqq \max \bigl\{ \sabs{x_1}, \sabs{x_2}, \ldots, \sabs{x_n} \bigr\} , \end{equation*}
sometimes called the sup or the max norm.
  1. Show that \(\snorm{\cdot}_\infty\) is a norm on \(\R^n\) (defining a different distance).
  2. What is the unit ball \(B(0,1)\) in this norm?

Exercise 8.2.4.

For \(x \in \R^n\text{,}\) define
\begin{equation*} \snorm{x}_{1} \coloneqq \sum_{k=1}^n \sabs{x_k}, \end{equation*}
sometimes called the \(1\)-norm (or \(L^1\) norm).
  1. Show that \(\snorm{\cdot}_1\) is a norm on \(\R^n\) (defining a different distance, sometimes called the taxicab distance).
  2. What is the unit ball \(B(0,1)\) in this norm? Think about what it is in \(\R^2\) and \(\R^3\text{.}\) Hint: It is, for example, a convex hull of a finite number of points.

Exercise 8.2.5.

Using the euclidean norm on \(\R^2\text{,}\) compute the operator norm of the operators in \(L(\R^2)\) given by the matrices:
a) \(\left[ \begin{smallmatrix} 1 & 0 \\ 0 & 2 \end{smallmatrix} \right]\)     b) \(\left[ \begin{smallmatrix} 0 & 1 \\ -1 & 0 \end{smallmatrix} \right]\)     c) \(\left[ \begin{smallmatrix} 1 & 1 \\ 0 & 1 \end{smallmatrix} \right]\)     d) \(\left[ \begin{smallmatrix} 0 & 1 \\ 0 & 0 \end{smallmatrix} \right]\)

Exercise 8.2.6.

Using the standard euclidean norm on \(\R^n\text{,}\) show:
  1. Suppose \(A \in L(\R,\R^n)\) is defined for \(x \in \R\) by \(Ax \coloneqq xa\) for a vector \(a \in \R^n\text{.}\) Then the operator norm \(\snorm{A}_{L(\R,\R^n)} = \snorm{a}_{\R^n}\text{.}\) (That is, the operator norm of \(A\) is the euclidean norm of \(a\text{.}\))
  2. Suppose \(B \in L(\R^n,\R)\) is defined for \(x \in \R^n\) by \(Bx \coloneqq b \cdot x\) for a vector \(b \in \R^n\text{.}\) Then the operator norm \(\snorm{B}_{L(\R^n,\R)} = \snorm{b}_{\R^n}\text{.}\)

Exercise 8.2.7.

Suppose \(\sigma = (\sigma_1,\sigma_2,\ldots,\sigma_n)\) is a permutation of \((1,2,\ldots,n)\text{.}\)
  1. Show that we can make a finite number of transpositions (switching of two elements) to get to \((1,2,\ldots,n)\text{.}\)
  2. Using the definition (8.4), show that \(\sigma\) is even if \(\operatorname{sgn}(\sigma) = 1\) and \(\sigma\) is odd if \(\operatorname{sgn}(\sigma) = -1\text{.}\) In particular, this shows that being odd or even is well-defined.

Exercise 8.2.8.

Verify the computation of the determinant for the three types of elementary matrices.

Exercise 8.2.10.

  1. Suppose \(D = [d_{i,j}]\) is an \(n\)-by-\(n\) diagonal matrix, that is, \(d_{i,j} = 0\) whenever \(i \not= j\text{.}\) Show that \(\det(D) = d_{1,1}d_{2,2} \cdots d_{n,n}\text{.}\)
  2. Suppose \(A\) is a diagonalizable matrix. That is, there exists a matrix \(B\) such that \(B^{-1}AB = D\) for a diagonal matrix \(D = [d_{i,j}]\text{.}\) Show that \(\det(A) = d_{1,1}d_{2,2} \cdots d_{n,n}\text{.}\)

Exercise 8.2.11.

Take the vector space of polynomials \(\R[t]\) and let \(D \in L(\R[t])\) be differentiation (we proved in an earlier exercise that \(D\) is a linear operator). Given \(P(t) = c_0 + c_1 t + \cdots + c_n t^n \in \R[t]\) define \(\snorm{P} \coloneqq \sup \bigl\{ \sabs{c_j} : j = 0,1,2,\ldots,n \bigr\}\text{.}\)
  1. Show that \(\snorm{\cdot}\) is a norm on \(\R[t]\text{.}\)
  2. Prove \(\snorm{D} = \infty\text{.}\) Hint: Consider the polynomials \(t^n\) as \(n\) tends to infinity.

Exercise 8.2.12.

We finish the proof of Proposition 8.2.4. Let \(X\) be a finite-dimensional normed vector space with basis \(\{ x_1,x_2,\ldots,x_n \}\text{.}\) Denote by \(\snorm{\cdot}_X\) the norm on \(X\text{,}\) by \(\snorm{\cdot}_{\R^n}\) the standard euclidean norm on \(\R^n\text{,}\) and by \(\snorm{\cdot}_{L(X,Y)}\) the operator norm.
  1. Define \(f \colon \R^n \to \R\text{,}\)
    \begin{equation*} f(c_1,c_2,\ldots,c_n) \coloneqq \snorm{c_1 x_1 + c_2 x_2 + \cdots + c_n x_n}_X . \end{equation*}
    Show \(f\) is continuous.
  2. Show that there exist numbers \(m, M > 0\) such that if \(c = (c_1,c_2,\ldots,c_n) \in \R^n\) with \(\snorm{c}_{\R^n} = 1\text{,}\) then \(m \leq \snorm{c_1 x_1 + c_2 x_2 + \cdots + c_n x_n}_X \leq M\text{.}\)
  3. Show that there exists a number \(B\) such that if \(\snorm{c_1 x_1 + c_2 x_2 + \cdots + c_n x_n}_X = 1\text{,}\) then \(\sabs{c_j} \leq B\text{.}\)
  4. Use part c) to show that if \(X\) is a finite-dimensional vector space and \(A \in L(X,Y)\text{,}\) then \(\snorm{A}_{L(X,Y)} < \infty\text{.}\)

Exercise 8.2.13.

Let \(X\) be a finite-dimensional vector space with basis \(\{ x_1,x_2,\ldots,x_n \}\text{.}\)
  1. Let \(\snorm{\cdot}_X\) be a norm on \(X\text{,}\) \(c = (c_1,c_2,\ldots,c_n) \in \R^n\text{,}\) and \(\snorm{\cdot}_{\R^n}\) the standard euclidean norm on \(\R^n\text{.}\) Prove that there exist numbers \(m,M > 0\) such that for all \(c \in \R^n\text{,}\)
    \begin{equation*} m \snorm{c}_{\R^n} \leq \snorm{c_1 x_1 + c_2 x_2 + \cdots + c_n x_n}_X \leq M \snorm{c}_{\R^n} . \end{equation*}
    Hint: See the previous exercise.
  2. Use part a) to show that if \(\snorm{\cdot}_1\) and \(\snorm{\cdot}_2\) are two norms on \(X\text{,}\) then there exist numbers \(m,M > 0\) (perhaps different from above) such that for all \(x \in X\text{,}\)
    \begin{equation*} m \snorm{x}_1 \leq \snorm{x}_2 \leq M \snorm{x}_1 . \end{equation*}
  3. Show that \(U \subset X\) is open in the metric defined by \(\norm{x-y}_1\) if and only if \(U\) is open in the metric defined by \(\norm{x-y}_2\text{.}\) So convergence of sequences and continuity of functions is the same in either norm.

Exercise 8.2.14.

Let \(A\) be an upper triangular matrix. Find a formula for the determinant of \(A\) in terms of the diagonal entries, and prove that your formula works.

Exercise 8.2.15.

Given an \(n\)-by-\(n\) matrix \(A\text{,}\) prove that \(\sabs{\det(A)} \leq \snorm{A}^n\) (the norm on \(A\) is the operator norm). Hint: One way to do it is to first prove it in the case \(\snorm{A}=1\text{,}\) which means that all columns are of norm 1 or less, then prove that this means that \(\sabs{\det(A)} \leq 1\) using linearity.

Exercise 8.2.16.

Consider Proposition 8.2.6 where \(X=\R^n\) (for all \(n\)) using the euclidean norm.
  1. Prove that the estimate \(\snorm{A-B} < \frac{1}{\snorm{A^{-1}}}\) is the best possible: For every \(A \in GL(\R^n)\text{,}\) find a \(B\) where equality is satisfied and \(B\) is not invertible. Hint: The difficulty is that \(\snorm{A}\,\snorm{A^{-1}}\) is not always \(1\text{.}\) Prove that a vector \(x_1\) can be completed to a basis \(\{ x_1,\ldots,x_n \}\) such that \(x_1 \cdot x_j = 0\) for \(j \geq 2\text{.}\) For the right \(x_1\text{,}\) make it so that \((A-B)x_j = 0\) for \(j \geq 2\text{.}\)
  2. For every fixed \(A \in GL(\R^n)\text{,}\) let \(\sM\) denote the set of matrices \(B\) such that \(\snorm{A-B} < \frac{1}{\snorm{A^{-1}}}\text{.}\) Prove that while every \(B \in \sM\) is invertible, \(\snorm{B^{-1}}\) is unbounded as a function of \(B\) on \(\sM\text{.}\)
Let \(A\) be an \(n\)-by-\(n\) matrix. A \(\lambda \in \C\) (possibly complex even for a real matrix) is an eigenvalue of \(A\) if there is a nonzero (possibly complex) vector \(x \in \C^n\) such that \(Ax = \lambda x\) (the multiplication by complex vectors is the same as for real vectors; if \(x = a+ib\) for real vectors \(a\) and \(b\text{,}\) and \(A\) is a real matrix, then \(Ax = Aa + i Ab\)). The number
\begin{equation*} \rho(A) \coloneqq \sup \bigl\{ \abs{\lambda} : \lambda \text{ is an eigenvalue of } A \bigr\} \end{equation*}
is the spectral radius of \(A\text{.}\) Here \(\abs{\lambda}\) is the complex modulus. We state without proof that at least one eigenvalue always exists, and there are no more than \(n\) distinct eigenvalues of \(A\text{.}\) You can therefore assume that \(0 \leq \rho(A) < \infty\text{.}\) The exercises below hold for complex matrices, but feel free to assume they are real matrices.
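For instance (a quick illustrative computation), consider the operator on \(\R^2\) given by the matrix below; it has the purely imaginary eigenvalues \(\pm i\) even though the matrix is real:
\begin{equation*} A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} , \qquad A(1,i) = (i,-1) = i \, (1,i) , \qquad A(1,-i) = (-i,-1) = -i \, (1,-i) , \end{equation*}
so \(\rho(A) = 1\text{.}\)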

Exercise 8.2.17.

Let \(A,S\) be \(n\)-by-\(n\) matrices, where \(S\) is invertible. Prove that \(\lambda\) is an eigenvalue of \(A\) if and only if it is an eigenvalue of \(S^{-1}AS\text{.}\) Then prove that \(\rho(S^{-1}AS) = \rho(A)\text{.}\) In particular, \(\rho\) is a well-defined function on \(L(X)\) for every finite-dimensional vector space \(X\text{.}\)

Exercise 8.2.18.

Let \(A\) be an \(n\)-by-\(n\) matrix.
  1. Prove \(\rho(A) \leq \snorm{A}\text{.}\) (See above for definition of \(\rho\text{.}\))
  2. For every \(k \in \N\text{,}\) prove \(\rho(A) \leq \snorm{A^k}^{1/k}\text{.}\)
  3. Suppose \(\displaystyle \lim_{k\to\infty} A^k = 0\) (limit in the operator norm). Prove that \(\rho(A) < 1\text{.}\)

Exercise 8.2.19.

We say a set \(C \subset \R^n\) is symmetric if \(x \in C\) implies \(-x \in C\text{.}\)
  1. Let \(\snorm{\cdot}\) be any given norm on \(\R^n\text{.}\) Show that the closed unit ball \(C(0,1)\) (using the metric induced by this norm) is a compact symmetric convex set.
  2. (Challenging) Let \(C \subset \R^n\) be a compact symmetric convex set with \(0\) in the interior of \(C\text{.}\) Show that
    \begin{equation*} \snorm{x} \coloneqq \inf \left\{ \lambda : \lambda > 0 \text{ and } \frac{x}{\lambda} \in C \right\} \end{equation*}
    is a norm on \(\R^n\text{,}\) and \(C = C(0,1)\) (the closed unit ball) in the metric induced by this norm.
Hint: Feel free to use the result of Exercise 8.2.13 part c). In particular, whether a set is “compact” is independent of the norm.
If we strike the “In particular” part and interpret the algebra with infinite operator norms properly, namely decree that \(0\) times \(\infty\) is 0, then this result also holds for infinite-dimensional spaces.
\(GL(X)\) is called the general linear group; that is where the acronym GL comes from.
This is a so-called commutative diagram. Following arrows in any way should end up with the same result.