Skip to main content

Section 8.1 Vector spaces, linear mappings, and convexity

Note: 2–3 lectures

Subsection 8.1.1 Vector spaces

The euclidean space \(\R^n\) has already made an appearance in the metric space chapter. In this chapter, we extend the differential calculus we created for one variable to several variables. The key idea in differential calculus is to approximate differentiable functions by linear functions (approximating the graph by a straight line). In several variables, we must introduce a little bit of linear algebra before we can move on. We start with vector spaces and linear mappings on vector spaces.

While it is common to use \(\vec{v}\) or the bold \(\mathbf{v}\) for elements of \(\R^n\text{,}\) especially in the applied sciences, we use just plain old \(v\text{,}\) which is common in mathematics. That is, \(v \in \R^n\) is a vector, which means \(v = (v_1,v_2,\ldots,v_n)\) is an \(n\)-tuple of real numbers. 1  It is common to write and treat vectors as column vectors, that is, \(n\)-by-\(1\) matrices:

\begin{equation*} v = (v_1,v_2,\ldots,v_n) = { \left[ \begin{smallmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{smallmatrix} \right] } \end{equation*}

We will do so when convenient. We call real numbers scalars to distinguish them from vectors.

We often think of vectors as a direction and a magnitude and draw the vector as an arrow. The vector \((v_1,v_2,\ldots,v_n)\) is represented by an arrow from the origin to the point \((v_1,v_2,\ldots,v_n)\text{,}\) see Figure 8.1 in the plane \(\R^2\text{.}\) When we think of vectors as arrows, they are not based at the origin necessarily; a vector is simply the direction and the magnitude, and it does not know where it starts.

Figure 8.1. Vector as an arrow.

On the other hand, each vector also represents a point in \(\R^n\text{.}\) Usually, we think of \(v \in \R^n\) as a point if we are thinking of \(\R^n\) as a metric space, and we think of it as an arrow if we think of the so-called vector space structure on \(\R^n\) (addition and scalar multiplication). Let us define the abstract notion of a vector space, as there are many other vector spaces than just \(\R^n\text{.}\)

Definition 8.1.1.

Let \(X\) be a set together with the operations of addition, \(+ \colon X \times X \to X\text{,}\) and multiplication, \(\cdot \colon \R \times X \to X\text{,}\) (we usually write \(ax\) instead of \(a \cdot x\)). \(X\) is called a vector space (or a real vector space) if the following conditions are satisfied:

  1. (Addition is associative)     If \(u, v, w \in X\text{,}\) then \(u+(v+w) = (u+v)+w\text{.}\)

  2. (Addition is commutative)     If \(u, v \in X\text{,}\) then \(u+v = v+u\text{.}\)

  3. (Additive identity)     There is a \(0 \in X\) such that \(v+0=v\) for all \(v \in X\text{.}\)

  4. (Additive inverse)     For every \(v \in X\text{,}\) there is a \(-v \in X\text{,}\) such that \(v+(-v)=0\text{.}\)

  5. (Distributive law)     If \(a \in \R\text{,}\) \(u,v \in X\text{,}\) then \(a(u+v) = au+av\text{.}\)

  6. (Distributive law)     If \(a,b \in \R\text{,}\) \(v \in X\text{,}\) then \((a+b)v = av+bv\text{.}\)

  7. (Multiplication is associative)     If \(a,b \in \R\text{,}\) \(v \in X\text{,}\) then \((ab)v = a(bv)\text{.}\)

  8. (Multiplicative identity)     \(1v = v\) for all \(v \in X\text{.}\)

Elements of a vector space are usually called vectors, even if they are not elements of \(\R^n\) (vectors in the “traditional” sense).

If \(Y \subset X\) is a subset that is a vector space itself using the same operations, then \(Y\) is called a subspace or a vector subspace of \(X\text{.}\)

Multiplication by scalars works as one would expect. For example, \(2v = (1+1)v = 1v + 1v = v+v\text{,}\) similarly \(3v = v+v+v\text{,}\) and so on. One particular fact we often use is that \(0 v = 0\text{,}\) where the zero on the left is \(0 \in \R\) and the zero on the right is \(0 \in X\text{.}\) To see this start with \(0v = (0+0)v = 0v + 0v\text{,}\) and add \(-(0v)\) to both sides to obtain \(0 = 0v\text{.}\) Similarly \(-v = (-1)v\text{,}\) which follows by \((-1)v+v = (-1)v + 1v = (-1+1)v = 0v = 0\text{.}\) These algebraic facts which follow quickly from the definition we will take for granted from now on.

Example 8.1.2.

An example vector space is \(\R^n\text{,}\) where addition and multiplication by a scalar is done componentwise: If \(a \in \R\text{,}\) \(v = (v_1,v_2,\ldots,v_n) \in \R^n\text{,}\) and \(w = (w_1,w_2,\ldots,w_n) \in \R^n\text{,}\) then

\begin{equation*} \begin{aligned} & v+w := (v_1,v_2,\ldots,v_n) + (w_1,w_2,\ldots,w_n) = (v_1+w_1,v_2+w_2,\ldots,v_n+w_n) , \\ & a v := a (v_1,v_2,\ldots,v_n) = (a v_1, a v_2,\ldots, a v_n) . \end{aligned} \end{equation*}

In this book, we mostly deal with vector spaces that can be often regarded as subsets of \(\R^n\text{,}\) but there are other vector spaces useful in analysis. Let us give a couple of examples.

Example 8.1.3.

A trivial example of a vector space is just \(X := \{ 0 \}\text{.}\) The operations are defined in the obvious way: \(0 + 0 := 0\) and \(a0 := 0\text{.}\) A zero vector must always exist, so all vector spaces are nonempty sets, and this \(X\) is the smallest possible vector space.

Example 8.1.4.

The space \(C([0,1],\R)\) of continuous functions on the interval \([0,1]\) is a vector space. For two functions \(f\) and \(g\) in \(C([0,1],\R)\) and \(a \in \R\text{,}\) we make the obvious definitions of \(f+g\) and \(af\text{:}\)

\begin{equation*} (f+g)(x) := f(x) + g(x), \qquad (af) (x) := a\bigl(f(x)\bigr) . \end{equation*}

The 0 is the function that is identically zero. We leave it as an exercise to check that all the vector space conditions are satisfied.

The space \(C^1([0,1],\R)\) of continuously differentiable functions is a subspace of \(C([0,1],\R)\text{.}\)

Example 8.1.5.

The space of polynomials \(c_0 + c_1 t + c_2 t^2 + \cdots + c_m t^m\) (of arbitrary degree \(m\)) is a vector space. Let us denote it by \(\R[t]\) (coefficients are real and the variable is \(t\)). The operations are defined in the same way as for functions above. Suppose there are two polynomials, one of degree \(m\) and one of degree \(n\text{.}\) Assume \(n \geq m\) for simplicity. Then

\begin{multline*} (c_0 + c_1 t + c_2 t^2 + \cdots + c_m t^m) + (d_0 + d_1 t + d_2 t^2 + \cdots + d_n t^n) = \\ (c_0+d_0) + (c_1+d_1) t + (c_2 + d_2) t^2 + \cdots + (c_m+d_m) t^m + d_{m+1} t^{m+1} + \cdots + d_n t^n \end{multline*}


\begin{equation*} a(c_0 + c_1 t + c_2 t^2 + \cdots + c_m t^m) = (ac_0) + (ac_1) t + (ac_2) t^2 + \cdots + (ac_m) t^m . \end{equation*}

Despite what it looks like, \(\R[t]\) is not equivalent to \(\R^n\) for any \(n\text{.}\) In particular, it is not “finite-dimensional.” We will make this notion precise in just a little bit. One can make a finite-dimensional vector subspace by restricting the degree. For example, if \(\sP_n\) is the set of polynomials of degree \(n\) or less, then \(\sP_n\) is a finite-dimensional vector space, and we could identify it with \(\R^{n+1}\text{.}\)

In the above, the variable \(t\) is really just a formal placeholder. By setting \(t\) equal to a real number we obtain a function. So the space \(\R[t]\) can be thought of as a subspace of \(C(\R,\R)\text{.}\) If we restrict the range of \(t\) to \([0,1]\text{,}\) \(\R[t]\) can be identified with a subspace of \(C([0,1],\R)\text{.}\)

Remark 8.1.6.

If \(X\) is a vector space, to check that a subset \(S \subset X\) is a vector subspace, we only need

  1. \(0 \in S\text{,}\)

  2. \(S\) is closed under addition, adding two vectors in \(S\) gets us a vector in \(S\text{,}\) and

  3. \(S\) is closed under scalar multiplication, multiplying a vector in \(S\) by a scalar gets us a vector in \(S\text{.}\)

Items 2) and 3) make sure that the addition and scalar multiplication are indeed defined on \(S\text{.}\) Item 1) is required to fulfill item iii from the definition of vector space. Existence of additive inverse \(-v\) follows because \(-v = (-1)v\) and item 3) says that \(-v \in S\) if \(v \in S\text{.}\) All other properties are certain equalities that are already satisfied in \(X\) and thus must be satisfied in a subset.

It is often better to think of even the simpler “finite-dimensional” vector spaces using the abstract notion rather than always as \(\R^n\text{.}\) It is possible to use other fields than \(\R\) in the definition (for example it is common to use the complex numbers \(\C\)), but let us stick with the real numbers 2 .

Subsection 8.1.2 Linear combinations and dimension

Definition 8.1.7.

Suppose \(X\) is a vector space, \(x_1, x_2, \ldots, x_k \in X\) are vectors, and \(a_1, a_2, \ldots, a_k \in \R\) are scalars. Then

\begin{equation*} a_1 x_1 + a_2 x_2 + \cdots + a_k x_k \end{equation*}

is called a linear combination of the vectors \(x_1, x_2, \ldots, x_k\text{.}\)

If \(Y \subset X\) is a set, then the span of \(Y\text{,}\) or in notation \(\spn(Y)\text{,}\) is the set of all linear combinations of all finite subsets of \(Y\text{.}\) We say \(Y\) spans \(\spn(Y)\text{.}\) By convention, define \(\spn(\emptyset) := \{ 0 \}.\)

Example 8.1.8.

Let \(Y := \bigl\{ (1,1) \bigr\} \subset \R^2\text{.}\) Then

\begin{equation*} \spn(Y) = \bigl\{ (x,x) \in \R^2 : x \in \R \bigr\} . \end{equation*}

That is, \(\spn(Y)\) is the line through the origin and the point \((1,1)\text{.}\)

Example 8.1.9.

Let \(Y := \bigl\{ (1,1), (0,1) \bigr\} \subset \R^2\text{.}\) Then

\begin{equation*} \spn(Y) = \R^2 , \end{equation*}

as every point \((x,y) \in \R^2\) can be written as a linear combination

\begin{equation*} (x,y) = x (1,1) + (y-x) (0,1) . \end{equation*}

Example 8.1.10.

Let \(Y := \{ 1, t, t^2, t^3, \ldots \} \subset \R[t]\text{,}\) and \(E := \{ 1, t^2, t^4, t^6, \ldots \} \subset \R[t]\text{.}\)

The span of \(Y\) is all polynomials,

\begin{equation*} \spn(Y) = \R[t] . \end{equation*}

The span of \(E\) is the set of polynomials with even powers of \(t\) only.

Suppose we have two linear combinations of vectors from \(Y\text{.}\) One linear combination uses the vectors \(\{ x_1,x_2,\ldots,x_k \}\text{,}\) and the other uses \(\{ \widetilde{x}_1,\widetilde{x}_2,\ldots,\widetilde{x}_\ell \}\text{.}\) Then clearly we can write both linear combinations using vectors from the union \(\{ x_1,x_2,\ldots,x_k \} \cup \{ \widetilde{x}_1,\widetilde{x}_2,\ldots,\widetilde{x}_\ell \}\text{,}\) by just taking zero multiples of the vectors we do not need, e.g. \(x_1 = x_1 + 0 \widetilde{x}_1\text{.}\) Suppose we have two linear combinations, we can without loss of generality write them as a linear combination of \(x_1,x_2,\ldots,x_k\text{.}\) Then their sum is also a linear combination of vectors from \(Y\text{:}\)

\begin{multline*} (a_1 x_1 + a_2 x_2 + \cdots + a_k x_k) + (b_1 x_1 + b_2 x_2 + \cdots + b_k x_k) \\ = (a_1 + b_1) x_1 + (a_2 + b_2) x_2 + \cdots + (a_k + b_k) x_k . \end{multline*}

Similarly, a scalar multiple of a linear combination of vectors from \(Y\) is a linear combination of vectors from \(Y\text{:}\)

\begin{equation*} b (a_1 x_1 + a_2 x_2 + \cdots + a_k x_k) = b a_1 x_1 + b a_2 x_2 + \cdots + b a_k x_k . \end{equation*}

Finally, \(0 \in Y\text{;}\) if \(Y\) is nonempty, \(0 = 0 v\) for some \(v \in Y\text{.}\) We have proved the following proposition.

If \(Y\) is already a vector space, then \(\spn(Y) = Y\text{.}\)

Definition 8.1.12.

A set of vectors \(\{ x_1, x_2, \ldots, x_k \} \subset X\) is linearly independent 3  if the equation

\begin{equation} a_1 x_1 + a_2 x_2 + \cdots + a_k x_k = 0\tag{8.1} \end{equation}

has only the trivial solution \(a_1 = a_2 = \cdots = a_k = 0\text{.}\) A set that is not linearly independent is linearly dependent. A linearly independent set of vectors \(B\) such that \(\spn(B) = X\) is called a basis of \(X\text{.}\)

If a vector space \(X\) contains a linearly independent set of \(d\) vectors, but no linearly independent set of \(d+1\) vectors, then we say the dimension of \(X\) is \(d\text{,}\) and we write \(\dim \, X := d\text{.}\) If for all \(d \in \N\) the vector space \(X\) contains a set of \(d\) linearly independent vectors, we say \(X\) is infinite-dimensional and write \(\dim \, X := \infty\text{.}\) For the trivial vector space \(\{ 0 \}\text{,}\) we define \(\dim \, \{ 0 \} := 0\text{.}\)

A subset of a linear independent set is clearly linearly independent, so in the definition of dimension, notice that if a set does not have \(d+1\) linearly independent vectors, no set of more than \(d+1\) vectors is linearly independent either. Also note that no element of a linear independent set can be zero. In particular, \(\{ 0 \}\) is the only vector space of dimension \(0\text{.}\) By convention, the empty set is linearly independent and thus a basis of \(\{ 0 \}\text{.}\)

As an example, the set \(Y\) of the two vectors in Example 8.1.9 is a basis of \(\R^2\text{,}\) and so \(\dim \, \R^2 \geq 2\text{.}\) We will see in a moment that every vector subspace of \(\R^n\) has a finite dimension, and that dimension is less than or equal to \(n\text{.}\) So every set of 3 vectors in \(\R^2\) is linearly dependent, and \(\dim \, \R^2 = 2\text{.}\)

If a set is linearly dependent, then one of the vectors is a linear combination of the others. In other words, in (8.1) if \(a_j \not= 0\text{,}\) then we solve for \(x_j\text{:}\)

\begin{equation*} x_j = \frac{-a_1}{a_j} x_1 + \cdots + \frac{-a_{j-1}}{a_j} x_{j-1} + \frac{-a_{j+1}}{a_j} x_{j+1} + \cdots + \frac{-a_k}{a_k} x_k . \end{equation*}

The vector \(x_j\) has at least two different representations as linear combinations of \(\{ x_1,x_2,\ldots,x_k \}\text{.}\) The one above and \(x_j\) itself. For example, the set \(\bigl\{ (0,1), (2,3), (5,0) \bigr\}\) in \(\R^2\) is linearly dependent:

\begin{equation*} 3(0,1) - (2,3) + 2(1,0) = 0 , \qquad \text{so} \qquad (2,3) = 3(0,1) + 2(1,0) . \end{equation*}


As \(X\) is the span of \(B\text{,}\) every \(y \in X\) is a linear combination of elements of \(B\text{.}\) Suppose

\begin{equation*} y = \sum_{j=1}^k a_j \, x_j = \sum_{j=1}^k b_j \, x_j . \end{equation*}


\begin{equation*} \sum_{j=1}^k (a_j-b_j) x_j = 0 . \end{equation*}

By linear independence of the basis \(a_j = b_j\) for all \(j\text{,}\) and so the representation is unique.

For \(\R^n\) we define the standard basis of \(\R^n\text{:}\)

\begin{equation*} e_1 := (1,0,0,\ldots,0) , \quad e_2 := (0,1,0,\ldots,0) , \quad \ldots, \quad e_n := (0,0,0,\ldots,1) , \end{equation*}

We use the same letters \(e_j\) for any \(\R^n\text{,}\) and which space \(\R^n\) we are working in is understood from context. A direct computation shows that \(\{ e_1, e_2, \ldots, e_n \}\) is really a basis of \(\R^n\text{;}\) it spans \(\R^n\) and is linearly independent. In fact,

\begin{equation*} a = (a_1,a_2,\ldots,a_n) = \sum_{j=1}^n a_j \, e_j . \end{equation*}


All statements hold trivially when \(d=0\text{,}\) so assume \(d \geq 1\text{.}\)

Let us start with i. Suppose \(S = \{ x_1 , x_2, \ldots, x_d \}\) spans \(X\text{,}\) and \(T = \{ y_1, y_2, \ldots, y_m \}\) is a set of linearly independent vectors of \(X\text{.}\) We wish to show that \(m \leq d\text{.}\) Write

\begin{equation*} y_1 = \sum_{k=1}^d a_{k,1} x_k , \end{equation*}

for some numbers \(a_{1,1},a_{2,1},\ldots,a_{d,1}\text{,}\) which we can do as \(S\) spans \(X\text{.}\) One of the \(a_{k,1}\) is nonzero (otherwise \(y_1\) would be zero), so suppose without loss of generality that this is \(a_{1,1}\text{.}\) Then we solve

\begin{equation*} x_1 = \frac{1}{a_{1,1}} y_1 - \sum_{k=2}^d \frac{a_{k,1}}{a_{1,1}} x_k . \end{equation*}

In particular, \(\{ y_1 , x_2, \ldots, x_d \}\) span \(X\text{,}\) since \(x_1\) can be obtained from \(\{ y_1 , x_2, \ldots, x_d \}\text{.}\) Therefore, there are some numbers for some numbers \(a_{1,2},a_{2,2},\ldots,a_{d,2}\text{,}\) such that

\begin{equation*} y_2 = a_{1,2} y_1 + \sum_{k=2}^d a_{k,2} x_k . \end{equation*}

As \(T\) is linearly independent—and so \(\{ y_1, y_2 \}\) is linearly independent—one of the \(a_{k,2}\) for \(k \geq 2\) must be nonzero. Without loss of generality suppose \(a_{2,2} \not= 0\text{.}\) Proceed to solve for

\begin{equation*} x_2 = \frac{1}{a_{2,2}} y_2 - \frac{a_{1,2}}{a_{2,2}} y_1 - \sum_{k=3}^d \frac{a_{k,2}}{a_{2,2}} x_k . \end{equation*}

In particular, \(\{ y_1 , y_2, x_3, \ldots, x_d \}\) spans \(X\text{.}\)

We continue this procedure. If \(m < d\text{,}\) then we are done. So suppose \(m \geq d\text{.}\) After \(d\) steps, we obtain that \(\{ y_1 , y_2, \ldots, y_d \}\) spans \(X\text{.}\) Any other vector \(v\) in \(X\) is a linear combination of \(\{ y_1 , y_2, \ldots, y_d \}\text{,}\) and hence cannot be in \(T\) as \(T\) is linearly independent. So \(m = d\text{.}\)

Let us look at ii. First a short claim. If \(T\) is a set of linearly independent vectors that do not span \(X\text{,}\) that is, \(X \setminus \spn (T) \not= \emptyset\text{,}\) then for any vector \(v \in X \setminus \spn (T)\text{,}\) the set \(T \cup \{ v \}\) is linearly independent. Indeed, a nonzero linear combination of elements of \(T \cup \{ v \}\) would either produce \(v\) as a combination of \(T\text{,}\) or it would be a combination of elements of \(T\text{,}\) and neither option is possible.

If \(\dim \, X = d\text{,}\) then there must exist some linearly independent set \(T\) of \(d\) vectors, and \(T\) must span \(X\text{,}\) otherwise we could choose a larger set of linearly independent vectors via the claim. So we have a basis of \(d\) vectors. On the other hand, if we have a basis of \(d\) vectors, the dimension is at least \(d\) as a basis is linearly independent. On the other hand a basis also spans \(X\text{,}\) and so by i we know that dimension is at most \(d\text{.}\) Hence the dimension of \(X\) must equal \(d\text{.}\)

For iii notice that \(\{ e_1, e_2, \ldots, e_n \}\) is a basis of \(\R^n\text{.}\)

To see iv, suppose \(Y \subset X\) is a vector subspace, where \(\dim \, X = d\text{.}\) As \(X\) cannot contain \(d+1\) linearly independent vectors, neither can \(Y\text{.}\)

For v suppose \(T\) is a set of \(m\) vectors that is linearly dependent and spans \(X\text{,}\) we will show that \(m > d\text{.}\) One of the vectors is a linear combination of the others. If we remove it from \(T\) we obtain a set of \(m-1\) vectors that still span \(X\) and hence \(d = \dim \, X \leq m-1\) by i.

For vi suppose \(T = \{ x_1, x_2, \ldots, x_m \}\) is a linearly independent set. Firstly, \(m \leq d\) by definition of dimension. If \(m=d\text{,}\) we are done. Otherwise, we follow the procedure above in the proof of ii to add a vector \(v\) not in the span of \(T\text{.}\) The set \(T \cup \{ v \}\) is linearly independent, whose span has dimension \(m+1\text{.}\) Therefore, we repeat this procedure \(d-m\) times to find a set of \(d\) linearly independent vectors. They must span \(X\) otherwise we could add yet another vector.

Subsection 8.1.3 Linear mappings

A function \(f \colon X \to Y\text{,}\) when \(Y\) is not \(\R\text{,}\) is often called a mapping or a map rather than a function.

Definition 8.1.15.

A mapping \(A \colon X \to Y\) of vector spaces \(X\) and \(Y\) is linear (we also say \(A\) is a linear transformation or a linear operator) if for all \(a \in \R\) and all \(x,y \in X\text{,}\)

\begin{equation*} A(a x) = a A(x), \qquad \text{and} \qquad A(x+y) = A(x)+A(y) . \end{equation*}

We usually write \(Ax\) instead of \(A(x)\) if \(A\) is linear. If \(A\) is one-to-one and onto, then we say \(A\) is invertible, and we denote the inverse by \(A^{-1}\text{.}\) If \(A \colon X \to X\) is linear, then we say \(A\) is a linear operator on \(X\).

We write \(L(X,Y)\) for the set of all linear transformations from \(X\) to \(Y\text{,}\) and just \(L(X)\) for the set of linear operators on \(X\text{.}\) If \(a \in \R\) and \(A,B \in L(X,Y)\text{,}\) define the transformations \(aA\) and \(A+B\) by

\begin{equation*} (aA)(x) := aAx , \qquad (A+B)(x) := Ax + Bx . \end{equation*}

If \(A \in L(Y,Z)\) and \(B \in L(X,Y)\text{,}\) define the transformation \(AB\) as the composition \(A \circ B\text{,}\) that is,

\begin{equation*} ABx := A(Bx) . \end{equation*}

Finally, denote by \(I \in L(X)\) the identity: the linear operator such that \(Ix = x\) for all \(x\text{.}\)

It is not hard to see that \(aA \in L(X,Y)\) and \(A+B \in L(X,Y)\text{,}\) and that \(AB \in L(X,Z)\text{.}\) In particular, \(L(X,Y)\) is a vector space (\(0\) is the linear map that takes everything to \(0\)). As the set \(L(X)\) is not only a vector space, but also admits a product (composition of operators), it is often called an algebra.

An immediate consequence of the definition of a linear mapping is: If \(A\) is linear, then \(A0 = 0\text{.}\)


Let \(a \in \R\) and \(y \in Y\text{.}\) As \(A\) is onto, then there is an \(x\) such that \(y = Ax\text{,}\) and further as it is also one-to-one \(A^{-1}(Az) = z\) for all \(z \in X\text{.}\) So

\begin{equation*} A^{-1}(ay) = A^{-1}(aAx) = A^{-1}\bigl(A(ax)\bigr) = ax = aA^{-1}(y). \end{equation*}

Similarly let \(y_1,y_2 \in Y\text{,}\) and \(x_1, x_2 \in X\) such that \(Ax_1 = y_1\) and \(Ax_2 = y_2\text{,}\) then

\begin{equation*} A^{-1}(y_1+y_2) = A^{-1}(Ax_1+Ax_2) = A^{-1}\bigl(A(x_1+x_2)\bigr) = x_1+x_2 = A^{-1}(y_1) + A^{-1}(y_2). \qedhere \end{equation*}

We will only prove this proposition for finite-dimensional spaces, as we do not need infinite-dimensional spaces. For infinite-dimensional spaces, the proof is essentially the same, but a little trickier to write, so let us stick with finitely many dimensions.


Let \(\{ x_1, x_2, \ldots, x_n \}\) be a basis of \(X\text{,}\) and let \(y_j := A x_j\text{.}\) Every \(x \in X\) has a unique representation

\begin{equation*} x = \sum_{j=1}^n b_j \, x_j \end{equation*}

for some numbers \(b_1,b_2,\ldots,b_n\text{.}\) By linearity

\begin{equation*} Ax = A\sum_{j=1}^n b_j x_j = \sum_{j=1}^n b_j \, Ax_j = \sum_{j=1}^n b_j \, y_j . \end{equation*}

The “furthermore” follows by setting \(y_j := \widetilde{A}(x_j)\text{,}\) and then for \(x = \sum_{j=1}^n b_j \, x_j\text{,}\) defining the extension as \(Ax := \sum_{j=1}^n b_j \, y_j\text{.}\) The function is well-defined by uniqueness of the representation of \(x\text{.}\) We leave it to the reader to check that \(A\) is linear.

The next proposition only works for finite-dimensional vector spaces. It is a special case of the so-called rank-nullity theorem from linear algebra.


Let \(\{ x_1,x_2,\ldots,x_n \}\) be a basis for \(X\text{.}\) First suppose \(A\) is one-to-one. Let \(c_1,c_2,\ldots,c_n\) be such that

\begin{equation*} 0 = \sum_{j=1}^n c_j \, Ax_j = A\sum_{j=1}^n c_j \, x_j . \end{equation*}

As \(A\) is one-to-one, the only vector that is taken to 0 is 0 itself. Hence,

\begin{equation*} 0 = \sum_{j=1}^n c_j \, x_j \end{equation*}

and \(c_j = 0\) for all \(j\text{.}\) So \(\{ Ax_1, Ax_2, \ldots, Ax_n \}\) is a linearly independent set. By Proposition 8.1.14 and the fact that the dimension is \(n\text{,}\) we conclude \(\{ Ax_1, Ax_2, \ldots, Ax_n \}\) spans \(X\text{.}\) Any point \(x \in X\) can be written as

\begin{equation*} x = \sum_{j=1}^n a_j \, Ax_j = A\sum_{j=1}^n a_j \, x_j , \end{equation*}

so \(A\) is onto.

For the other direction, suppose \(A\) is onto. As \(A\) is determined by the action on the basis, every element of \(X\) is in the span of \(\{ Ax_1, Ax_2, \ldots, Ax_n \}\text{.}\) Suppose that for some \(c_1,c_2,\ldots,c_n\text{,}\)

\begin{equation*} 0 = A\sum_{j=1}^n c_j \, x_j = \sum_{j=1}^n c_j \, Ax_j . \end{equation*}

By Proposition 8.1.14 as \(\{ Ax_1, Ax_2, \ldots, Ax_n \}\) span \(X\text{,}\) the set is linearly independent, and hence \(c_j = 0\) for all \(j\text{.}\) In other words, if \(Ax = 0\text{,}\) then \(x=0\text{.}\) This means that \(A\) is one-to-one: If \(Ax = Ay\text{,}\) then \(A(x-y) = 0\) and so \(x=y\text{.}\)

We leave the proof of the next proposition as an exercise.

We often identify a finite-dimensional vector space \(X\) of dimension \(n\) with \(\R^n\text{,}\) provided we fix a basis \(\{ x_1, x_2, \ldots, x_n \}\) in \(X\text{.}\) That is, we define a bijective linear map \(A \in L(X,\R^n)\) by \(Ax_j := e_j\text{,}\) where \(\{ e_1, e_2, \ldots, e_n \}\) is the standard basis in \(\R^n\text{.}\) Then we have the correspondence

\begin{equation*} \sum_{j=1}^n c_j \, x_j \, \in X \quad \overset{A}{\mapsto} \quad (c_1,c_2,\ldots,c_n) \, \in \R^n . \end{equation*}

Subsection 8.1.4 Convexity

A subset \(U\) of a vector space is convex if whenever \(x,y \in U\text{,}\) the line segment from \(x\) to \(y\) lies in \(U\text{.}\) That is, if the convex combination \((1-t)x+ty\) is in \(U\) for all \(t \in [0,1]\text{.}\) Sometimes we write \([x,y]\) for this line segment. See Figure 8.2.

Figure 8.2. Convexity.

In \(\R\text{,}\) convex sets are precisely the intervals, which are also precisely the connected sets. In two or more dimensions there are lots of nonconvex connected sets. For example, the set \(\R^2 \setminus \{0\}\) is not convex, but it is connected. To see this, take any \(x \in \R^2 \setminus \{0\}\) and let \(y:=-x\text{.}\) Then \((\nicefrac{1}{2})x + (\nicefrac{1}{2})y = 0\text{,}\) which is not in the set. Balls in \(\R^n\) are convex. We use this result often enough we state it as a proposition, and leave the proof as an exercise.

Example 8.1.21.

As a convex combination is, in particular, a linear combination, so every subspace \(V\) of a vector space \(X\) is convex.

Example 8.1.22.

Let \(C([0,1],\R)\) be the vector space of continuous real-valued functions on \(\R\text{.}\) Let \(X \subset C([0,1],\R)\) be the set of those \(f\) such that

\begin{equation*} \int_0^1 f(x)\,dx \leq 1 \qquad \text{and} \qquad f(x) \geq 0 \enspace \text{for all } x \in [0,1] . \end{equation*}

Then \(X\) is convex. Take \(t \in [0,1]\text{,}\) and note that if \(f,g \in X\text{,}\) then \(t f(x) + (1-t) g(x) \geq 0\) for all \(x\text{.}\) Furthermore,

\begin{equation*} \int_0^1 \bigl(tf(x) + (1-t)g(x)\bigr) \,dx = t \int_0^1 f(x) \,dx + (1-t)\int_0^1 g(x) \,dx \leq 1 . \end{equation*}

Note that \(X\) is not a vector subspace of \(C([0,1],\R)\text{.}\) The function \(f(x):=1\) is in \(X\text{,}\) but \(2f\) and \(-f\) is not.


If \(x, y \in C\text{,}\) then \(x,y \in C_\lambda\) for all \(\lambda \in I\text{,}\) and hence if \(t \in [0,1]\text{,}\) then \(tx + (1-t)y \in C_\lambda\) for all \(\lambda \in I\text{.}\) Therefore, \(tx + (1-t)y \in C\) and \(C\) is convex.


Take two points \(p,q \in T(C)\text{.}\) Pick \(x,y \in C\) such that \(Tx = p\) and \(Ty=q\text{.}\) As \(C\) is convex, then \(tx+(1-t)y \in C\) for all \(t \in [0,1]\text{,}\) so

\begin{equation*} tp+(1-t)q = tTx+(1-t)Ty = T\bigl(tx+(1-t)y\bigr) \in T(C) . \qedhere \end{equation*}

For completeness, a very useful construction is the convex hull. Given a subset \(S \subset V\) of a vector space, define the convex hull of \(S\) as the intersection of all convex sets containing \(S\text{:}\)

\begin{equation*} \operatorname{co}(S) := \bigcap \{ C \subset V : S \subset C, \text{ and } C \text{ is convex} \} . \end{equation*}

That is, the convex hull is the smallest convex set containing \(S\text{.}\) By a proposition above, the intersection of convex sets is convex and hence, the convex hull is convex.

Example 8.1.25.

The convex hull of \(\{ 0, 1 \}\) in \(\R\) is \([0,1]\text{.}\) Proof: Any convex set containing 0 and 1 must contain \([0,1]\text{,}\) so \([0,1] \subset \operatorname{co}(\{0,1\})\text{.}\) The set \([0,1]\) is convex and contains \(\{0,1\}\text{,}\) so \(\operatorname{co}(\{0,1\}) \subset [0,1]\text{.}\)

Subsection 8.1.5 Exercises

Exercise 8.1.1.

Show that in \(\R^n\) (with the standard euclidean metric), for every \(x \in \R^n\) and every \(r > 0\text{,}\) the ball \(B(x,r)\) is convex.

Exercise 8.1.2.

Verify that \(\R^n\) is a vector space.

Exercise 8.1.3.

Let \(X\) be a vector space. Prove that a finite set of vectors \(\{ x_1,x_2,\ldots,x_n \} \subset X\) is linearly independent if and only if for every \(j=1,2,\ldots,n\)

\begin{equation*} \spn\bigl( \{ x_1,\ldots,x_{j-1},x_{j+1},\ldots,x_n \}\bigr) \subsetneq \spn\bigl( \{ x_1,x_2,\ldots,x_n \}\bigr) . \end{equation*}

That is, the span of the set with one vector removed is strictly smaller.

Exercise 8.1.4.

Show that the set \(X \subset C([0,1],\R)\) of those functions such that \(\int_0^1 f = 0\) is a vector subspace.

Exercise 8.1.5.

(Challenging)   Prove \(C([0,1],\R)\) is an infinite-dimensional vector space where the operations are defined in the obvious way: \(s=f+g\) and \(m=af\) are defined as \(s(x) := f(x)+g(x)\) and \(m(x) := a f(x)\text{.}\) Hint: For the dimension, think of functions that are only nonzero on the interval \((\nicefrac{1}{n+1},\nicefrac{1}{n})\text{.}\)

Exercise 8.1.6.

Let \(k \colon [0,1]^2 \to \R\) be continuous. Show that \(L \colon C([0,1],\R) \to C([0,1],\R)\) defined by

\begin{equation*} Lf(y) := \int_0^1 k(x,y)f(x)\,dx \end{equation*}

is a linear operator. That is, first show that \(L\) is well-defined by showing that \(Lf\) is continuous whenever \(f\) is, and then showing that \(L\) is linear.

Exercise 8.1.7.

Let \(\sP_n\) be the vector space of polynomials in one variable of degree \(n\) or less. Show that \(\sP_n\) is a vector space of dimension \(n+1\text{.}\)

Exercise 8.1.8.

Let \(\R[t]\) be the vector space of polynomials in one variable \(t\text{.}\) Let \(D \colon \R[t] \to \R[t]\) be the derivative operator (derivative in \(t\)). Show that \(D\) is a linear operator.

Exercise 8.1.9.

Let us show that Proposition 8.1.18 only works in finite dimensions. Take the space of polynomials \(\R[t]\) and define the operator \(A \colon \R[t] \to \R[t]\) by \(A\bigl(P(t)\bigr) := tP(t)\text{.}\) Show that \(A\) is linear and one-to-one, but show that it is not onto.

Exercise 8.1.10.

Finish the proof of Proposition 8.1.17 in the finite-dimensional case. That is, suppose \(\{ x_1, x_2,\ldots x_n \}\) is a basis of \(X\text{,}\) \(\{ y_1, y_2,\ldots y_n \} \subset Y\text{,}\) and we define a function

\begin{equation*} Ax := \sum_{j=1}^n b_j y_j, \qquad \text{if} \quad x=\sum_{j=1}^n b_j x_j . \end{equation*}

Then prove that \(A \colon X \to Y\) is linear.

Exercise 8.1.11.

Prove Proposition 8.1.19. Hint: A linear transformation is determined by its action on a basis. So given two bases \(\{ x_1,\ldots,x_n \}\) and \(\{ y_1,\ldots,y_m \}\) for \(X\) and \(Y\) respectively, consider the linear operators \(A_{jk}\) that send \(A_{jk} x_j = y_k\text{,}\) and \(A_{jk} x_\ell = 0\) if \(\ell \not= j\text{.}\)

Exercise 8.1.12.

(Easy)   Suppose \(X\) and \(Y\) are vector spaces and \(A \in L(X,Y)\) is a linear operator.

  1. Show that the nullspace \(N := \{ x \in X : Ax = 0 \}\) is a vector space.

  2. Show that the range \(R := \{ y \in Y : Ax = y \text{ for some } x \in X \}\) is a vector space.

Exercise 8.1.13.

(Easy)   Show by example that a union of convex sets need not be convex.

Exercise 8.1.14.

Compute the convex hull of the set of 3 points \(\bigl\{ (0,0), (0,1), (1,1) \bigr\}\) in \(\R^2\text{.}\)

Exercise 8.1.15.

Show that the set \(\bigl\{ (x,y) \in \R^2 : y > x^2 \bigr\}\) is a convex set.

Exercise 8.1.16.

Show that the set \(X \subset C([0,1],\R)\) of those functions such that \(\int_0^1 f = 1\) is a convex set, but not a vector subspace.

Exercise 8.1.17.

Show that every convex set in \(\R^n\) is connected using the standard topology on \(\R^n\text{.}\)

Exercise 8.1.18.

Suppose \(K \subset \R^2\) is a convex set such that the only point of the form \((x,0)\) in \(K\) is the point \((0,0)\text{.}\) Further suppose that there \((0,1) \in K\) and \((1,1) \in K\text{.}\) Then show that if \((x,y) \in K\text{,}\) then \(y > 0\) unless \(x=0\text{.}\)

Exercise 8.1.19.

Prove that an arbitrary intersection of vector subspaces is a vector subspace. That is, if \(X\) is a vector space and \(\{ V_\lambda \}_{\lambda \in I}\) is an arbitrary collection of vector subspaces of \(X\text{,}\) then \(\bigcap_{\lambda \in I} V_\lambda\) is a vector subspace of \(X\text{.}\)

Subscripts are used for many purposes, so sometimes we may have several vectors that may also be identified by subscript, such as a finite or infinite sequence of vectors \(y_1,y_2,\ldots\text{.}\)
If you want a very funky vector space over a different field, \(\R\) itself is a vector space over the rational numbers.
For an infinite set \(Y \subset X\text{,}\) we would say \(Y\) is linearly independent if every finite subset of \(Y\) is linearly independent in the sense given. However, this situation only comes up in infinitely many dimensions and we will not require it.
For a higher quality printout use the PDF versions: or