Before we start talking about analysis, we need to fix some language. Modern^{ 1 } analysis uses the language of sets, and therefore that is where we start. We talk about sets in a rather informal way, using the so-called “naïve set theory.” Do not worry, that is what the majority of mathematicians use, and it is hard to get into trouble. The reader has hopefully seen the very basics of set theory and proof writing before, and this section should be a quick refresher.

#### Definition 0.3.1.

A *set* is a collection of objects called *elements* or *members*. A set with no objects is called the *empty set* and is denoted by \(\emptyset\) (or sometimes by \(\{ \}\)).

Think of a set as a club with a certain membership. For example, the students who play chess are members of the chess club. The same student can be a member of many different clubs. However, do not take the analogy too far. A set is only defined by the members that form the set; two sets that have the same members are the same set.

Most of the time we will consider sets of numbers. For example, the set

\begin{equation*}
S \coloneqq \{ 0, 1, 2 \}
\end{equation*}

is the set containing the three elements 0, 1, and 2. By “\(\coloneqq\)”, we mean we are defining what \(S\) is, rather than just showing equality. We write

\begin{equation*}
1 \in S
\end{equation*}

to denote that the number 1 belongs to the set \(S\text{.}\) That is, 1 is a member of \(S\text{.}\) At times we want to say that two elements are in a set \(S\text{,}\) so we write “\(1,2 \in S\)” as a shorthand for “\(1 \in S\) and \(2 \in S\text{.}\)”

Similarly, we write

\begin{equation*}
7 \notin S
\end{equation*}

to denote that the number 7 is not in \(S\text{.}\) That is, 7 is not a member of \(S\text{.}\)

The elements of all sets under consideration come from some set we call the *universe*. For simplicity, we often consider the universe to be the set that contains only the elements we are interested in. The universe is generally understood from context and is not explicitly mentioned. In this course, our universe will most often be the set of real numbers.

While the elements of a set are often numbers, other objects, such as other sets, can be elements of a set. A set may also contain some of the same elements as another set. For example,

\begin{equation*}
T \coloneqq \{ 0, 2 \}
\end{equation*}

contains the numbers 0 and 2. In this case all elements of \(T\) also belong to \(S\text{.}\) We write \(T \subset S\text{.}\) See Figure 1 for a diagram.

- A set \(A\) is a *subset* of a set \(B\) if \(x \in A\) implies \(x \in B\text{,}\) and we write \(A \subset B\text{.}\) That is, all members of \(A\) are also members of \(B\text{.}\) At times we write \(B \supset A\) to mean the same thing.
- Two sets \(A\) and \(B\) are *equal* if \(A \subset B\) and \(B \subset A\text{.}\) We write \(A = B\text{.}\) That is, \(A\) and \(B\) contain exactly the same elements. If it is not true that \(A\) and \(B\) are equal, then we write \(A \not= B\text{.}\)
- A set \(A\) is a *proper subset* of \(B\) if \(A \subset B\) and \(A \not= B\text{.}\) We write \(A \subsetneq B\text{.}\)

For the example \(S\) and \(T\) defined above, \(T \subset S\text{,}\) but \(T \not= S\text{.}\) So \(T\) is a proper subset of \(S\text{.}\) If \(A = B\text{,}\) then \(A\) and \(B\) are simply two names for the same exact set.
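These relations can be explored concretely: Python's built-in set type happens to implement `<=` as the subset relation and `<` as proper subset, so the example sets \(S\) and \(T\) above behave as one would expect. This is only an illustration, not part of the text.

```python
# Python's built-in sets: <= tests subset, < tests proper subset.
S = {0, 1, 2}
T = {0, 2}

print(T <= S)          # True: T is a subset of S
print(T < S)           # True: T is a proper subset of S
print(S == {2, 1, 0})  # True: sets with the same members are equal
```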

To define sets, one often uses the *set building notation*,

\begin{equation*}
\bigl\{ x \in A : P(x) \bigr\} .
\end{equation*}

This notation refers to a subset of the set \(A\) containing all elements of \(A\) that satisfy the property \(P(x)\text{.}\) Using \(S = \{ 0, 1, 2 \}\) as above, \(\{ x \in S : x \not= 2 \}\) is the set \(\{ 0, 1 \}\text{.}\) The notation is sometimes abbreviated as \(\bigl\{ x : P(x) \bigr\}\text{,}\) that is, \(A\) is not mentioned when understood from context. Furthermore, \(x \in A\) is sometimes replaced with a formula to make the notation easier to read.
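Set-building notation corresponds directly to a set comprehension; a quick sketch using the example above:

```python
# { x in S : x != 2 } written as a set comprehension
S = {0, 1, 2}
subset = {x for x in S if x != 2}
print(subset == {0, 1})  # True
```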

The following are commonly used sets, along with their standard notation.

- The set of *natural numbers*, \(\N \coloneqq \{ 1, 2, 3, \ldots \}\text{.}\)
- The set of *integers*, \(\Z \coloneqq \{ 0, -1, 1, -2, 2, \ldots \}\text{.}\)
- The set of *rational numbers*, \(\Q \coloneqq \bigl\{ \frac{m}{n} : m, n \in \Z \text{ and } n \not= 0 \bigr\}\text{.}\)
- The set of even natural numbers, \(\{ 2m : m \in \N \}\text{.}\)
- The set of real numbers, \(\R\text{.}\)

Note that \(\N \subset \Z \subset \Q \subset \R\text{.}\)

We create new sets out of old ones by applying some natural operations.

- A *union* of two sets \(A\) and \(B\) is defined as
\begin{equation*}
A \cup B \coloneqq \{ x : x \in A \text{ or } x \in B \} .
\end{equation*}
- An *intersection* of two sets \(A\) and \(B\) is defined as
\begin{equation*}
A \cap B \coloneqq \{ x : x \in A \text{ and } x \in B \} .
\end{equation*}
- A *complement of \(B\) relative to \(A\)* (or *set-theoretic difference* of \(A\) and \(B\)) is defined as
\begin{equation*}
A \setminus B \coloneqq \{ x : x \in A \text{ and } x \notin B \} .
\end{equation*}
- We say *complement* of \(B\) and write \(B^c\) instead of \(A \setminus B\) if the set \(A\) is either the entire universe or the obvious set containing \(B\text{,}\) and is understood from context.
- We say sets \(A\) and \(B\) are *disjoint* if \(A \cap B = \emptyset\text{.}\)

The notation \(B^c\) may be a little vague at this point. If the set \(B\) is a subset of the real numbers \(\R\text{,}\) then \(B^c\) means \(\R \setminus B\text{.}\) If \(B\) is naturally a subset of the natural numbers, then \(B^c\) is \(\N \setminus B\text{.}\) If ambiguity can arise, we use the set difference notation \(A \setminus B\text{.}\)
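The operations above match Python's set operators (`|`, `&`, and `-` for union, intersection, and difference), which make a convenient sandbox for small examples:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

print(A | B)                 # union: {1, 2, 3, 4, 5}
print(A & B)                 # intersection: {3, 4}
print(A - B)                 # set difference A \ B: {1, 2}
print(A.isdisjoint({7, 8}))  # True: A and {7, 8} have empty intersection
```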

We illustrate the operations on the *Venn diagrams* in Figure 2. Let us now establish one of the most basic theorems about sets and logic.

Let \(A, B, C\) be sets. Then

\begin{equation*}
{(B \cup C)}^c = B^c \cap C^c , \qquad
{(B \cap C)}^c = B^c \cup C^c ,
\end{equation*}

or, more generally,

\begin{equation*}
A \setminus (B \cup C) = (A \setminus B) \cap (A \setminus C) ,
\qquad
A \setminus (B \cap C) = (A \setminus B) \cup (A \setminus C) .
\end{equation*}

The first statement follows from the second statement if we take the set \(A\) to be our “universe.”

Let us prove \(A \setminus (B \cup C) = (A \setminus B) \cap (A \setminus C)\text{.}\) Remember the definition of equality of sets. First, we must show that if \(x \in A \setminus (B \cup C)\text{,}\) then \(x \in (A \setminus B) \cap (A \setminus C)\text{.}\) Second, we must also show that if \(x \in (A \setminus B) \cap (A \setminus C)\text{,}\) then \(x \in A \setminus (B \cup C)\text{.}\) So let us assume \(x \in A \setminus (B \cup C)\text{.}\) Then \(x\) is in \(A\text{,}\) but in neither \(B\) nor \(C\text{.}\) Hence \(x\) is in \(A\) and not in \(B\text{,}\) that is, \(x \in A \setminus B\text{.}\) Similarly, \(x \in A \setminus C\text{.}\) Thus \(x \in (A \setminus B) \cap (A \setminus C)\text{.}\) On the other hand, suppose \(x \in (A \setminus B) \cap (A \setminus C)\text{.}\) In particular, \(x \in (A \setminus B)\text{,}\) so \(x \in A\) and \(x \notin B\text{.}\) Similarly, as \(x \in (A \setminus C)\text{,}\) we have \(x \notin C\text{.}\) As \(x\) is in neither \(B\) nor \(C\text{,}\) it is not in \(B \cup C\text{,}\) and hence \(x \in A \setminus (B \cup C)\text{.}\)

The proof of the other equality is left as an exercise.
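Though no substitute for the proof, both identities can be spot-checked on concrete finite sets; the particular \(A\text{,}\) \(B\text{,}\) \(C\) below are arbitrary choices.

```python
# Check both De Morgan identities on one concrete choice of A, B, C.
A = set(range(10))
B = {1, 2, 3}
C = {3, 4, 5}

assert A - (B | C) == (A - B) & (A - C)
assert A - (B & C) == (A - B) | (A - C)
print("both identities hold for this example")
```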

The result above we called a *Theorem*, while most results we call a *Proposition*, and a few we call a *Lemma* (a result leading to another result) or *Corollary* (a quick consequence of the preceding result). Do not read too much into the naming. Some of it is traditional, some of it is stylistic choice. It is not necessarily true that a *Theorem* is always “more important” than a *Proposition* or a *Lemma*.

We will also need to intersect or union several sets at once. If there are only finitely many, then we simply apply the union or intersection operation several times. However, suppose we have an infinite collection of sets (a set of sets) \(\{ A_1, A_2, A_3, \ldots \}\text{.}\) We define

\begin{equation*}
\begin{aligned}
& \bigcup_{n=1}^\infty A_n \coloneqq \{ x : x \in A_n \text{ for some } n \in \N
\} , \\
& \bigcap_{n=1}^\infty A_n \coloneqq \{ x : x \in A_n \text{ for all } n \in \N
\} .
\end{aligned}
\end{equation*}

We can also have sets indexed by two natural numbers. For example, we can have the set of sets \(\{ A_{1,1}, A_{1,2}, A_{2,1}, A_{1,3}, A_{2,2}, A_{3,1}, \ldots \}\text{.}\) Then we write

\begin{equation*}
\bigcup_{n=1}^\infty \bigcup_{m=1}^\infty A_{n,m}
=
\bigcup_{n=1}^\infty \left( \bigcup_{m=1}^\infty A_{n,m} \right) .
\end{equation*}

And similarly with intersections.

It is not hard to see that we can take the unions in any order. However, switching the order of unions and intersections is not generally permitted without proof. For instance,

\begin{equation*}
\bigcup_{n=1}^\infty
\bigcap_{m=1}^\infty
\{ k \in \N : mk < n \}
=
\bigcup_{n=1}^\infty \emptyset = \emptyset .
\end{equation*}

However,

\begin{equation*}
\bigcap_{m=1}^\infty
\bigcup_{n=1}^\infty
\{ k \in \N : mk < n \}
=
\bigcap_{m=1}^\infty
\N
=
\N.
\end{equation*}
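Both computations can be illustrated by truncating to finitely many \(k\text{;}\) the helper `S(m, n)` below is only a finite stand-in for the infinite sets involved, with a bound on \(k\) chosen for the illustration.

```python
def S(m, n, bound=100):
    """The set { k in N : m*k < n }, truncated to k <= bound."""
    return {k for k in range(1, bound + 1) if m * k < n}

# Fixed n: intersecting over m = 1, ..., n is already empty
# (m = n forces n*k < n, impossible for k >= 1), so the union over n is empty.
for n in range(1, 6):
    print(n, set.intersection(*(S(m, n) for m in range(1, n + 1))))

# Fixed m: the union over n contains every k (take n = m*k + 1),
# so the intersection over m is all of N.
k, m = 7, 3
print(k in S(m, m * k + 1))  # True
```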

Sometimes, the index set is not the natural numbers. In such a case we require a more general notation. Suppose \(I\) is some set and for each \(\lambda \in
I\text{,}\) there is a set \(A_\lambda\text{.}\) Then we define

\begin{equation*}
\bigcup_{\lambda \in I} A_\lambda \coloneqq \{ x : x \in A_\lambda \text{ for some }
\lambda \in I
\} ,
\qquad
\bigcap_{\lambda \in I} A_\lambda \coloneqq \{ x : x \in A_\lambda \text{ for all }
\lambda \in I
\} .
\end{equation*}

When a statement includes an arbitrary natural number, a common method of proof is the principle of induction. We start with the set of natural numbers \(\N = \{ 1,2,3,\ldots \}\text{,}\) and we give them their natural ordering, that is, \(1 < 2 < 3 < 4 < \cdots\text{.}\) By \(S \subset \N\) having a *least element*, we mean that there exists an \(x \in S\text{,}\) such that for every \(y \in S\text{,}\) we have \(x \leq y\text{.}\)

The natural numbers \(\N\) ordered in the natural way possess the so-called *well ordering property*. We take this property as an axiom; we simply assume it is true.

Every nonempty subset of \(\N\) has a least (smallest) element.

The *principle of induction* is the following theorem, which is in a sense^{ 2 } equivalent to the well ordering property of the natural numbers.

Let \(P(n)\) be a statement depending on a natural number \(n\text{.}\) Suppose that

- *(basis statement)* \(P(1)\) is true.
- *(induction step)* If \(P(n)\) is true, then \(P(n+1)\) is true.

Then \(P(n)\) is true for all \(n \in \N\text{.}\)

Let \(S\) be the set of natural numbers \(n\) for which \(P(n)\) is not true. Suppose for contradiction that \(S\) is nonempty. Then \(S\) has a least element by the well ordering property. Call \(m \in S\) the least element of \(S\text{.}\) We know \(1 \notin S\) by hypothesis. So \(m > 1\text{,}\) and \(m-1\) is a natural number as well. Since \(m\) is the least element of \(S\text{,}\) we know that \(m - 1 \notin S\text{,}\) that is, \(P(m-1)\) is true. But the induction step says that \(P((m-1)+1) = P(m)\) is true, contradicting the statement that \(m \in S\text{.}\) Therefore, \(S\) is empty and \(P(n)\) is true for all \(n \in \N\text{.}\)

Sometimes it is convenient to start at a different number than 1; all that changes is the labeling. The assumption that \(P(n)\) is true in “if \(P(n)\) is true, then \(P(n+1)\) is true” is usually called the *induction hypothesis*.

Let us prove that for all \(n \in \N\text{,}\)

\begin{equation*}
2^{n-1} \leq n! \qquad \text{(recall } n! = 1 \cdot 2 \cdot 3 \cdots n\text{)}.
\end{equation*}

We let \(P(n)\) be the statement that \(2^{n-1} \leq n!\) is true. Plug in \(n=1\) to see that \(P(1)\) is true.

Suppose \(P(n)\) is true. That is, suppose \(2^{n-1} \leq n!\) holds. Multiply both sides by 2 to obtain

\begin{equation*}
2^n \leq 2(n!) .
\end{equation*}

As \(2 \leq (n+1)\) when \(n \in \N\text{,}\) we have \(2(n!) \leq (n+1)(n!) = (n+1)!\text{.}\) That is,

\begin{equation*}
2^n \leq 2(n!) \leq (n+1)!,
\end{equation*}

and hence \(P(n+1)\) is true. By the principle of induction, \(P(n)\) is true for all \(n \in \N\text{.}\) In other words, \(2^{n-1} \leq n!\) is true for all \(n \in \N\text{.}\)
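A numerical spot-check of the inequality for small \(n\) (again, no substitute for the induction proof):

```python
import math

# Spot-check 2^(n-1) <= n! for the first ten natural numbers.
for n in range(1, 11):
    assert 2 ** (n - 1) <= math.factorial(n)
print("2^(n-1) <= n! holds for n = 1, ..., 10")
```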

We claim that for all \(c \not= 1\) and all \(n \in \N\text{,}\)

\begin{equation*}
1 + c + c^2 + \cdots + c^n = \frac{1-c^{n+1}}{1-c} .
\end{equation*}

Proof: It is easy to check that the equation holds with \(n=1\text{.}\) Suppose it is true for \(n\text{.}\) Then

\begin{equation*}
\begin{split}
1 + c + c^2 + \cdots + c^n + c^{n+1} & =
( 1 + c + c^2 + \cdots + c^n ) + c^{n+1} \\
& = \frac{1-c^{n+1}}{1-c} + c^{n+1} \\
& = \frac{1-c^{n+1} + (1-c)c^{n+1}}{1-c} \\
& = \frac{1-c^{n+2}}{1-c} .
\end{split}
\end{equation*}
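The identity can be spot-checked with exact rational arithmetic; the sample values of \(c\) and range of \(n\) below are arbitrary.

```python
from fractions import Fraction

# Exact check of 1 + c + ... + c^n == (1 - c^(n+1)) / (1 - c).
for c in (Fraction(1, 2), Fraction(-3), Fraction(2)):
    for n in range(1, 8):
        lhs = sum(c ** k for k in range(n + 1))
        rhs = (1 - c ** (n + 1)) / (1 - c)
        assert lhs == rhs
print("formula verified for the sample values")
```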

Sometimes, it is easier to use in the inductive step that \(P(k)\) is true for all \(k = 1,2,\ldots,n\text{,}\) not just for \(k=n\text{.}\) This principle is called *strong induction* and is equivalent to the normal induction above. The proof of that equivalence is left as an exercise.

Let \(P(n)\) be a statement depending on a natural number \(n\text{.}\) Suppose that

- *(basis statement)* \(P(1)\) is true.
- *(induction step)* If \(P(k)\) is true for all \(k = 1,2,\ldots,n\text{,}\) then \(P(n+1)\) is true.

Then \(P(n)\) is true for all \(n \in \N\text{.}\)

Informally, a *set-theoretic function* \(f\) taking a set \(A\) to a set \(B\) is a mapping that to each \(x \in A\) assigns a unique \(y \in B\text{.}\) We write \(f \colon A \to B\text{.}\) An example function \(f \colon S \to T\) taking \(S \coloneqq \{ 0, 1, 2 \}\) to \(T \coloneqq \{ 0, 2 \}\) can be defined by assigning \(f(0) \coloneqq 2\text{,}\) \(f(1) \coloneqq 2\text{,}\) and \(f(2) \coloneqq 0\text{.}\) That is, a function \(f
\colon A \to B\) is a black box, into which we stick an element of \(A\) and the function spits out an element of \(B\text{.}\) Sometimes \(f\) is called a *mapping* or a *map*, and we say \(f\) *maps \(A\) to \(B\)*.

Often, functions are defined by some sort of formula; however, you should really think of a function as just a very big table of values. The subtle issue here is that a single function can have several formulas, all giving the same function. Also, for many functions, there is no formula that expresses its values.

To define a function rigorously, first let us define the Cartesian product.

Let \(A\) and \(B\) be sets. The *Cartesian product* is the set of tuples defined as

\begin{equation*}
A \times B \coloneqq
\bigl\{ (x,y) : x \in A, y \in B \bigr\} .
\end{equation*}

For instance, \(\{ a,b \} \times \{ c , d\} =
\bigl\{
(a,c), (a,d), (b,c), (b,d)
\bigr\}\text{.}\) A more complicated example is the set \([0,1] \times [0,1]\text{:}\) a subset of the plane bounded by a square with vertices \((0,0)\text{,}\) \((0,1)\text{,}\) \((1,0)\text{,}\) and \((1,1)\text{.}\) When \(A\) and \(B\) are the same set we sometimes use a superscript 2 to denote such a product. For example, \([0,1]^2 =
[0,1] \times [0,1]\) or \(\R^2 = \R \times \R\) (the Cartesian plane).
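Cartesian products of finite sets can be enumerated with `itertools.product`; this sketch mirrors the example \(\{ a,b \} \times \{ c,d \}\) above.

```python
from itertools import product

A = {'a', 'b'}
B = {'c', 'd'}
pairs = set(product(A, B))
print(pairs == {('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')})  # True
print(len(set(product(A, A))))  # 4, i.e. |A|^2 for the "superscript 2" product
```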

A *function* \(f \colon A \to B\) is a subset \(f\) of \(A \times B\) such that for each \(x \in A\text{,}\) there exists a unique \(y \in B\) for which \((x,y) \in f\text{.}\) We write \(f(x) = y\text{.}\) Sometimes the set \(f\) is called the *graph* of the function rather than the function itself.

The set \(A\) is called the *domain* of \(f\) (and sometimes confusingly denoted \(D(f)\)). The set

\begin{equation*}
R(f) \coloneqq \{ y \in B : \text{there exists an } x \in A \text{ such that }
f(x)=y \}
\end{equation*}

is called the *range* of \(f\text{.}\) The set \(B\) is called the *codomain* of \(f\text{.}\)

It is possible that the range \(R(f)\) is a proper subset of the codomain \(B\text{,}\) while the domain of \(f\) is always equal to \(A\text{.}\) We generally assume that the domain of \(f\) is nonempty.

From calculus, you are most familiar with functions taking real numbers to real numbers. However, you saw some other types of functions as well. The derivative is a function mapping the set of differentiable functions to the set of all functions. Another example is the Laplace transform, which also takes functions to functions. Yet another example is the function that takes a continuous function \(g\) defined on the interval \([0,1]\) and returns the number \(\int_0^1 g(x) \,dx\text{.}\)

Consider a function \(f \colon A \to B\text{.}\) Define the *image* (or *direct image*) of a subset \(C \subset A\) as

\begin{equation*}
f(C) \coloneqq \bigl\{ f(x) \in B : x \in C \bigr\} .
\end{equation*}

Define the *inverse image* of a subset \(D
\subset B\) as

\begin{equation*}
f^{-1}(D) \coloneqq \bigl\{ x \in A : f(x) \in D \bigr\} .
\end{equation*}

In particular, \(R(f) = f(A)\text{,}\) the range is the direct image of the domain \(A\text{.}\)
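For finite sets, images and inverse images can be computed directly. The helper names `image` and `preimage` below are ours, not standard library functions; the function \(f\) is the earlier example on \(S = \{0,1,2\}\text{,}\) stored as a dictionary.

```python
def image(f, C):
    """Direct image f(C) = { f(x) : x in C }."""
    return {f[x] for x in C}

def preimage(f, D):
    """Inverse image f^{-1}(D) = { x : f(x) in D }."""
    return {x for x in f if f[x] in D}

f = {0: 2, 1: 2, 2: 0}
print(image(f, {0, 1}))   # {2}
print(preimage(f, {2}))   # {0, 1}
print(image(f, set(f)))   # {0, 2}: the range is the image of the domain
```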

Define the function \(f \colon \R \to \R\) by \(f(x) \coloneqq \sin(\pi x)\text{.}\) Then \(f\bigl([0,\nicefrac{1}{2}]\bigr) = [0,1]\text{,}\) \(f^{-1}\bigl(\{0\}\bigr) = \Z\text{,}\) etc.

Consider \(f \colon A \to B\text{.}\) Let \(C, D\) be subsets of \(B\text{.}\) Then

\begin{equation*}
\begin{aligned}
& f^{-1}( C \cup D) = f^{-1} (C) \cup f^{-1} (D) , \\
& f^{-1}( C \cap D) = f^{-1} (C) \cap f^{-1} (D) , \\
& f^{-1}( C^c) = {\bigl( f^{-1} (C) \bigr)}^c .
\end{aligned}
\end{equation*}

Read the last line of the proposition as \(f^{-1}( B \setminus C) = A \setminus f^{-1} (C)\text{.}\)

We start with the union. If \(x \in
f^{-1}( C \cup D)\text{,}\) then \(x\) is taken to \(C\) or \(D\text{,}\) that is, \(f(x) \in C\) or \(f(x) \in D\text{.}\) Thus \(f^{-1}( C \cup D) \subset f^{-1} (C) \cup f^{-1} (D)\text{.}\) Conversely if \(x \in f^{-1}(C)\text{,}\) then \(x \in f^{-1}(C \cup D)\text{.}\) Similarly for \(x \in f^{-1}(D)\text{.}\) Hence \(f^{-1}( C \cup D) \supset f^{-1} (C) \cup f^{-1} (D)\text{,}\) and we have equality.

The rest of the proof is left as an exercise.

For direct images, the best we can do is the following weaker result.

Consider \(f \colon A \to B\text{.}\) Let \(C, D\) be subsets of \(A\text{.}\) Then

\begin{equation*}
\begin{aligned}
& f( C \cup D) = f (C) \cup f (D) , \\
& f( C \cap D) \subset f (C) \cap f (D) .
\end{aligned}
\end{equation*}

The proof is left as an exercise.
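A small concrete check of both relations; the function and sets below are arbitrary choices, and the second inclusion happens to be strict for them.

```python
def image(f, C):
    """Direct image f(C) = { f(x) : x in C }."""
    return {f[x] for x in C}

f = {1: 'a', 2: 'a', 3: 'b'}
C, D = {1, 3}, {2, 3}

print(image(f, C | D) == image(f, C) | image(f, D))  # True: equality for unions
print(image(f, C & D), image(f, C) & image(f, D))    # {'b'} versus {'a', 'b'}
```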

Let \(f \colon A \to B\) be a function. The function \(f\) is said to be *injective* or *one-to-one* if \(f(x_1) = f(x_2)\) implies \(x_1 = x_2\text{.}\) In other words, \(f\) is injective if for all \(y \in B\text{,}\) the set \(f^{-1}(\{y\})\) is empty or consists of a single element. We call such an \(f\) an *injection*.

If \(f(A) = B\text{,}\) then we say \(f\) is *surjective* or *onto*. In other words, \(f\) is surjective if the range and the codomain of \(f\) are equal. We call such an \(f\) a *surjection*.

If \(f\) is both surjective and injective, then we say \(f\) is *bijective* or that \(f\) is a *bijection*.

When \(f \colon A \to B\) is a bijection, then the inverse image of a single element, \(f^{-1}(\{y\})\text{,}\) is always a unique element of \(A\text{.}\) We then consider \(f^{-1}\) as a function \(f^{-1} \colon B \to A\) and we write simply \(f^{-1}(y)\text{.}\) In this case, we call \(f^{-1}\) the *inverse function* of \(f\text{.}\) For instance, for the bijection \(f \colon \R \to \R\) defined by \(f(x) \coloneqq x^3\text{,}\) we have \(f^{-1}(x) = \sqrt[3]{x}\text{.}\)

Consider \(f \colon A \to B\) and \(g \colon B \to C\text{.}\) The *composition* of the functions \(f\) and \(g\) is the function \(g \circ f \colon A \to C\) defined as

\begin{equation*}
(g \circ f)(x) \coloneqq g\bigl(f(x)\bigr) .
\end{equation*}

For example, if \(f \colon \R \to \R\) is \(f(x)\coloneqq x^3\) and \(g \colon \R \to \R\) is \(g(y) \coloneqq \sin(y)\text{,}\) then \((g \circ f)(x) = \sin(x^3)\text{.}\) It is left to the reader as an easy exercise to show that the composition of one-to-one maps is one-to-one and the composition of onto maps is onto. Therefore, the composition of bijections is a bijection.

We often compare two objects in some way. We say \(1 < 2\) for natural numbers, or \(\nicefrac{1}{2} = \nicefrac{2}{4}\) for rational numbers, or \(\{ a,c \} \subset \{ a,b,c \}\) for sets. The `\(<\)’, `\(=\)’, and `\(\subset\)’ are examples of relations.

Given a set \(A\text{,}\) a *binary relation* on \(A\) is a subset \(\sR \subset A \times A\text{,}\) which are those pairs where the relation is said to hold. Instead of \((a,b) \in \sR\text{,}\) we write \(a \, \sR \, b\text{.}\)

Take \(A \coloneqq \{ 1,2,3 \}\text{.}\)

Consider the relation `\(<\)’. The corresponding set of pairs is \(\bigl\{ (1,2), (1,3), (2,3) \bigr\}\text{.}\) So \(1 < 2\) holds as \((1,2)\) is in the corresponding set of pairs, but \(3 < 1\) does not hold as \((3,1)\) is not in the set.

Similarly, the relation `\(=\)’ is defined by the set of pairs \(\bigl\{ (1,1), (2,2), (3,3) \bigr\}\text{.}\)

Any subset of \(A \times A\) is a relation. For example, define the relation \(\dagger\) via \(\bigl\{ (1,2), (2,1), (2,3), (3,1) \bigr\}\text{;}\) then \(1 \dagger 2\) and \(3 \dagger 1\) are true, but \(1 \dagger 3\) is not.

Let \(\sR\) be a relation on a set \(A\text{.}\) Then \(\sR\) is said to be

- *Reflexive* if \(a\, \sR \, a\) for all \(a \in A\text{.}\)
- *Symmetric* if \(a\, \sR \, b\) implies \(b \, \sR \, a\text{.}\)
- *Transitive* if \(a\, \sR \, b\) and \(b \, \sR \, c\) implies \(a\, \sR \, c\text{.}\)

If \(\sR\) is reflexive, symmetric, and transitive, then it is said to be an *equivalence relation*.

Let \(A \coloneqq \{ 1,2,3 \}\text{.}\) The relation `\(<\)’ is transitive, but neither reflexive nor symmetric. The relation `\(\leq\)’ defined by \(\bigl\{ (1,1), (1,2), (1,3), (2,2), (2,3), (3,3) \bigr\}\) is reflexive and transitive, but not symmetric. Finally, a relation `\(\star\)’ defined by \(\bigl\{ (1,1), (1,2), (2,1), (2,2), (3,3) \bigr\}\) is an equivalence relation.
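For a finite relation given as a set of pairs, the three properties can be tested mechanically; a sketch using the relations `\(<\)’ and `\(\star\)’ above (the helper names are ours):

```python
def is_reflexive(R, A):
    return all((a, a) in R for a in A)

def is_symmetric(R):
    return all((b, a) in R for (a, b) in R)

def is_transitive(R):
    return all((a, d) in R for (a, b) in R for (c, d) in R if b == c)

A = {1, 2, 3}
less = {(1, 2), (1, 3), (2, 3)}
star = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 3)}

print(is_reflexive(less, A), is_symmetric(less), is_transitive(less))  # False False True
print(is_reflexive(star, A), is_symmetric(star), is_transitive(star))  # True True True
```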

Equivalence relations are useful in that they divide a set into sets of “equivalent” elements.

Let \(A\) be a set and \(\sR\) an equivalence relation. An *equivalence class* of \(a \in A\text{,}\) often denoted by \([a]\text{,}\) is the set \(\{ x \in A : a\, \sR \, x \}\text{.}\)

For example, given the relation `\(\star\)’ above, there are two equivalence classes, \([1] = [2] = \{ 1,2 \}\) and \([3] = \{ 3 \}\text{.}\)

Reflexivity guarantees that \(a \in [a]\text{.}\) Symmetry guarantees that if \(b \in [a]\text{,}\) then \(a \in [b]\text{.}\) Finally, transitivity guarantees that if \(b \in [a]\) and \(c \in [b]\text{,}\) then \(c \in [a]\text{.}\) In particular, we have the following proposition, whose proof is an exercise.

If \(\sR\) is an equivalence relation on a set \(A\text{,}\) then every \(a \in A\) is in exactly one equivalence class. Moreover, \(a\, \sR \, b\) if and only if \([a] = [b]\text{.}\)

The set of rational numbers can be defined as equivalence classes of a pair of an integer and a natural number, that is, elements of \(\Z \times \N\text{.}\) The relation is defined by \((a,b) \sim (c,d)\) whenever \(ad = bc\text{.}\) It is left as an exercise to prove that `\(\sim\)’ is an equivalence relation. Usually the equivalence class \(\bigl[(a,b)\bigr]\) is written as \(\nicefrac{a}{b}\text{.}\)
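The defining condition \(ad = bc\) is easy to test on sample pairs; the snippet below also spot-checks transitivity on a few pairs (with second entries nonzero, as required). The name `related` is ours.

```python
def related(p, q):
    """(a, b) ~ (c, d) whenever a*d == b*c."""
    (a, b), (c, d) = p, q
    return a * d == b * c

print(related((1, 2), (2, 4)))  # True: both pairs represent 1/2
print(related((1, 2), (2, 3)))  # False

# spot-check transitivity on a small sample of pairs in Z x N
pairs = [(1, 2), (2, 4), (3, 6), (1, 3)]
for p in pairs:
    for q in pairs:
        for r in pairs:
            if related(p, q) and related(q, r):
                assert related(p, r)
```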

A subtle issue in set theory and one generating a considerable amount of confusion among students is that of cardinality, or “size” of sets. The concept of cardinality is important in modern mathematics in general and in analysis in particular. In this section, we will see the first really unexpected theorem.

Let \(A\) and \(B\) be sets. We say \(A\) and \(B\) have the same *cardinality* when there exists a bijection \(f \colon A \to B\text{.}\) We denote by \(\abs{A}\) the equivalence class of all sets with the same cardinality as \(A\) and we simply call \(\abs{A}\) the cardinality of \(A\text{.}\)

For example, \(\{ 1,2,3 \}\) has the same cardinality as \(\{ a,b,c \}\) by defining a bijection \(f(1) \coloneqq a\text{,}\) \(f(2) \coloneqq b\text{,}\) \(f(3) \coloneqq c\text{.}\) Clearly the bijection is not unique.

Having the same cardinality really is an equivalence relation. The identity, \(f(x) \coloneqq x\text{,}\) is a bijection of \(A\) and \(A\text{,}\) showing reflexivity. If \(f\) is a bijection, then so is \(f^{-1}\text{,}\) showing symmetry. If \(f \colon A \to B\) and \(g \colon B \to C\) are bijections, then \(g \circ f\) is a bijection of \(A\) and \(C\text{,}\) showing transitivity. A set \(A\) has the same cardinality as the empty set if and only if \(A\) itself is the empty set: If \(B\) is nonempty, then no function \(f \colon B \to \emptyset\) can exist. In particular, there is no bijection of \(B\) and \(\emptyset\text{.}\)

If \(A\) has the same cardinality as \(\{ 1,2,3,\ldots,n \}\) for some \(n \in \N\text{,}\) we write \(\abs{A} \coloneqq n\text{.}\) If \(A\) is empty, we write \(\abs{A} \coloneqq 0\text{.}\) In either case, we say that \(A\) is *finite*. We say \(A\) is *infinite* or “of infinite cardinality” if \(A\) is not finite.

That the notation \(\abs{A} = n\) is justified we leave as an exercise. That is, for each nonempty finite set \(A\text{,}\) there exists a unique natural number \(n\) such that there exists a bijection from \(A\) to \(\{ 1,2,3,\ldots,n \}\text{.}\)

We can order sets by size.

We write

\begin{equation*}
\abs{A} \leq \abs{B}
\end{equation*}

if there exists an injection from \(A\) to \(B\text{.}\) We write \(\abs{A} = \abs{B}\) if \(A\) and \(B\) have the same cardinality. We write \(\abs{A} < \abs{B}\) if \(\abs{A} \leq \abs{B}\text{,}\) but \(A\) and \(B\) do not have the same cardinality.

We state without proof that \(A\) and \(B\) have the same cardinality if and only if \(\abs{A} \leq \abs{B}\) and \(\abs{B} \leq \abs{A}\text{.}\) This is the so-called Cantor–Bernstein–Schröder theorem. Furthermore, if \(A\) and \(B\) are any two sets, we can always write \(\abs{A} \leq \abs{B}\) or \(\abs{B} \leq \abs{A}\text{.}\) The issues surrounding this last statement are very subtle. As we do not require either of these two statements, we omit proofs.

The truly interesting cases of cardinality are infinite sets. We will distinguish two types of infinite cardinality.

If \(\abs{A} = \abs{\N}\text{,}\) then we say \(A\) is *countably infinite*. If \(A\) is finite or countably infinite, then we say \(A\) is *countable*. If \(A\) is not countable, then \(A\) is said to be *uncountable*.

The set of even natural numbers has the same cardinality as \(\N\text{.}\) Proof: Let \(E \subset \N\) be the set of even natural numbers. Given \(k \in E\text{,}\) write \(k=2n\) for some \(n \in \N\text{.}\) Then \(f(n) \coloneqq 2n\) defines a bijection \(f \colon \N \to E\text{.}\)

In fact, we mention without proof the following characterization of infinite sets: *A set is infinite if and only if it is in one-to-one correspondence with a proper subset of itself*.

\(\N \times \N\) is a countably infinite set. Proof: Arrange the elements of \(\N \times \N\) as follows: \((1,1)\text{,}\) \((1,2)\text{,}\) \((2,1)\text{,}\) \((1,3)\text{,}\) \((2,2)\text{,}\) \((3,1)\text{,}\) .... That is, always write down first all the elements whose two entries sum to \(k\text{,}\) then write down all the elements whose entries sum to \(k+1\text{,}\) and so on. Define a bijection with \(\N\) by letting 1 go to \((1,1)\text{,}\) 2 go to \((1,2)\text{,}\) and so on. See Figure 4.
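The diagonal ordering described above can be generated explicitly; a sketch that lists pairs by increasing sum of entries (the function name is ours):

```python
def diagonal_enumeration(count):
    """List the first `count` pairs of N x N in diagonal order:
    first all pairs whose entries sum to 2, then to 3, and so on."""
    out = []
    s = 2  # current sum of the two entries
    while len(out) < count:
        for n in range(1, s):
            out.append((n, s - n))
            if len(out) == count:
                break
        s += 1
    return out

print(diagonal_enumeration(6))
# [(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1)]
```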

The set of rational numbers is countable. Proof: (informal) For positive rational numbers follow the same procedure as in the previous example, writing \(\nicefrac{1}{1}\text{,}\) \(\nicefrac{1}{2}\text{,}\) \(\nicefrac{2}{1}\text{,}\) etc. However, leave out fractions (such as \(\nicefrac{2}{2}\)) that have already appeared. The list would continue: \(\nicefrac{1}{3}\text{,}\) \(\nicefrac{3}{1}\text{,}\) \(\nicefrac{1}{4}\text{,}\) \(\nicefrac{2}{3}\text{,}\) etc. For all rational numbers, include \(0\) and the negative numbers: \(0\text{,}\) \(\nicefrac{1}{1}\text{,}\) \(\nicefrac{-1}{1}\text{,}\) \(\nicefrac{1}{2}\text{,}\) \(\nicefrac{-1}{2}\text{,}\) etc.

For completeness, we mention the following statements from the exercises. *If \(A \subset
B\) and \(B\) is countable, then \(A\) is countable.* The contrapositive of the statement is that if \(A\) is uncountable, then \(B\) is uncountable. As a consequence, if \(\abs{A} < \abs{\N}\text{,}\) then \(A\) is finite. Similarly, if \(B\) is finite and \(A \subset B\text{,}\) then \(A\) is finite.

We give the first truly striking result about cardinality. To do so we need a notation for the set of all subsets of a set.

The *power set* of a set \(A\text{,}\) denoted by \(\sP(A)\text{,}\) is the set of all subsets of \(A\text{.}\)

For example, if \(A \coloneqq \{ 1,2\}\text{,}\) then \(\sP(A) = \bigl\{ \emptyset, \{ 1 \}, \{ 2 \}, \{ 1, 2 \} \bigr\}\text{.}\) In particular, \(\abs{A} = 2\) and \(\abs{\sP(A)} = 4 = 2^2\text{.}\) In general, for a finite set \(A\) of cardinality \(n\text{,}\) the cardinality of \(\sP(A)\) is \(2^n\text{.}\) This fact is left as an exercise. Hence, for a finite set \(A\text{,}\) the cardinality of \(\sP(A)\) is strictly larger than the cardinality of \(A\text{.}\) What is an unexpected and striking fact is that this statement is also true for infinite sets.
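For finite sets, the power set can be enumerated and the count \(2^n\) checked directly; the helper `power_set` below is our own illustration, not a library function.

```python
from itertools import combinations

def power_set(A):
    """All subsets of A, as a list of frozensets."""
    A = list(A)
    return [frozenset(c) for r in range(len(A) + 1)
            for c in combinations(A, r)]

print(len(power_set({1, 2})))  # 4 == 2**2
for n in range(6):
    assert len(power_set(range(n))) == 2 ** n
```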

We now prove that \(\abs{A} < \abs{\sP(A)}\) for every set \(A\) (a theorem due to Cantor). An injection \(f \colon A \to \sP(A)\) exists: For \(x \in A\text{,}\) let \(f(x) \coloneqq \{ x \}\text{.}\) Thus, \(\abs{A} \leq \abs{\sP(A)}\text{.}\)

To finish the proof, we must show that no function \(g \colon A \to \sP(A)\) is a surjection. Suppose \(g \colon A \to \sP(A)\) is a function. So for \(x \in A\text{,}\) \(g(x)\) is a subset of \(A\text{.}\) Define the set

\begin{equation*}
B \coloneqq \bigl\{ x \in A : x \notin g(x) \bigr\} .
\end{equation*}

We claim that \(B\) is not in the range of \(g\) and hence \(g\) is not a surjection. Suppose for contradiction that there exists an \(x_0\) such that \(g(x_0) = B\text{.}\) Either \(x_0 \in B\) or \(x_0 \notin B\text{.}\) If \(x_0 \in B\text{,}\) then \(x_0 \notin
g(x_0) = B\text{,}\) which is a contradiction. If \(x_0 \notin B\text{,}\) then \(x_0 \in
g(x_0) = B\text{,}\) which is again a contradiction. Thus such an \(x_0\) does not exist. Therefore, \(B\) is not in the range of \(g\text{,}\) and \(g\) is not a surjection. As \(g\) was an arbitrary function, no surjection exists.
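For a small finite set, the argument can even be verified exhaustively: for every map \(g \colon A \to \sP(A)\text{,}\) the set \(B\) constructed above is never a value of \(g\text{.}\) A sketch for \(A = \{1,2,3\}\text{:}\)

```python
from itertools import product

A = [1, 2, 3]
subsets = [frozenset(s) for s in
           [(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]]

# Every map g : A -> P(A); there are 8**3 = 512 of them.
for values in product(subsets, repeat=len(A)):
    g = dict(zip(A, values))
    B = frozenset(x for x in A if x not in g[x])
    assert all(g[x] != B for x in A)  # B is never in the range of g
print("checked all", len(subsets) ** len(A), "maps")
```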

One particular consequence of this theorem is that there do exist uncountable sets, as \(\sP(\N)\) must be uncountable. A related fact is that the set of real numbers (which we study in the next chapter) is uncountable. The existence of uncountable sets may seem unintuitive, and the theorem caused quite a controversy at the time it was announced. The theorem not only says that uncountable sets exist, but that there in fact exist progressively larger and larger infinite sets \(\N\text{,}\) \(\sP(\N)\text{,}\) \(\sP(\sP(\N))\text{,}\) \(\sP(\sP(\sP(\N)))\text{,}\) etc.

Show \(A \setminus (B \cap C) = (A \setminus B) \cup (A \setminus C)\text{.}\)

Prove that the principle of strong induction is equivalent to the standard induction.

Finish the proof of Proposition 0.3.15.

- Prove Proposition 0.3.16.
- Find an example for which equality of sets in \(f( C \cap D) \subset f (C) \cap f (D)\) fails. That is, find an \(f\text{,}\) \(A\text{,}\) \(B\text{,}\) \(C\text{,}\) and \(D\) such that \(f( C \cap D)\) is a proper subset of \(f(C) \cap f(D)\text{.}\)

Prove:

- \(A \cap (B \cup C) = (A \cap B) \cup (A \cap C)\text{.}\)
- \(A \cup (B \cap C) = (A \cup B) \cap (A \cup C)\text{.}\)

Let \(A \Delta B\) denote the *symmetric difference*, that is, the set of all elements that belong to either \(A\) or \(B\text{,}\) but not to both \(A\) and \(B\text{.}\)

- Draw a Venn diagram for \(A \Delta B\text{.}\)
- Show \(A \Delta B = (A \setminus B) \cup (B \setminus A)\text{.}\)
- Show \(A \Delta B = (A \cup B) \setminus ( A \cap B)\text{.}\)

For each \(n \in \N\text{,}\) let \(A_n \coloneqq \{ (n+1)k : k \in \N \}\text{.}\)

- Find \(A_1 \cap A_2\text{.}\)
- Find \(\bigcup_{n=1}^\infty A_n\text{.}\)
- Find \(\bigcap_{n=1}^\infty A_n\text{.}\)

Determine \(\sP(S)\) (the power set) for each of the following:

- \(S = \emptyset\text{,}\)
- \(S = \{1\}\text{,}\)
- \(S = \{1,2\}\text{,}\)
- \(S = \{1,2,3,4\}\text{.}\)
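To double-check your answers, the subsets of a finite set can be enumerated mechanically. A minimal sketch using the standard library:

```python
from itertools import combinations

def powerset(s):
    """Return P(s) as a list of frozensets, smallest subsets first."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

# The cardinalities for the four sets above; note |P(S)| = 2^|S|.
print([len(powerset(range(1, n + 1))) for n in (0, 1, 2, 4)])  # → [1, 2, 4, 16]
```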

Let \(f \colon A \to B\) and \(g \colon B \to C\) be functions.

- Prove that if \(g \circ f\) is injective, then \(f\) is injective.
- Prove that if \(g \circ f\) is surjective, then \(g\) is surjective.
- Find an explicit example where \(g \circ f\) is bijective, but neither \(f\) nor \(g\) is bijective.

Prove by induction that \(n < 2^n\) for all \(n \in \N\text{.}\)

Show that for a finite set \(A\) of cardinality \(n\text{,}\) the cardinality of \(\sP(A)\) is \(2^n\text{.}\)

Prove \(\frac{1}{1\cdot 2} + \frac{1}{2\cdot 3} + \cdots + \frac{1}{n(n+1)} = \frac{n}{n+1}\) for all \(n \in \N\text{.}\)

Prove \(1^3 + 2^3 + \cdots + n^3 = {\left( \frac{n(n+1)}{2} \right)}^2\) for all \(n \in \N\text{.}\)
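Before attempting the inductions, it can be reassuring to verify such identities numerically for the first several values of \(n\) (a check, not a proof). Using exact rational arithmetic avoids floating-point error:

```python
from fractions import Fraction

for n in range(1, 50):
    # The sum of 1/(k(k+1)) telescopes to n/(n+1).
    s = sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))
    assert s == Fraction(n, n + 1)
    # The sum of cubes equals the square of the n-th triangular number.
    assert sum(k**3 for k in range(1, n + 1)) == (n * (n + 1) // 2) ** 2
```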

Prove that \(n^3 + 5n\) is divisible by \(6\) for all \(n \in \N\text{.}\)

Find the smallest \(n \in \N\) such that \(2{(n+5)}^2 < n^3\) and call it \(n_0\text{.}\) Show that \(2{(n+5)}^2 < n^3\) for all \(n \geq n_0\text{.}\)

Find all \(n \in \N\) such that \(n^2 < 2^n\text{.}\)

Give an example of a countably infinite collection of finite sets \(A_1, A_2, \ldots\text{,}\) whose union is not a finite set.

Give an example of a countably infinite collection of infinite sets \(A_1, A_2, \ldots\text{,}\) with \(A_j \cap A_k\) being infinite for all \(j\) and \(k\text{,}\) such that \(\bigcap_{j=1}^\infty A_j\) is nonempty and finite.

Suppose \(A \subset B\) and \(B\) is finite. Prove that \(A\) is finite. That is, if \(A\) is nonempty, construct a bijection of \(A\) onto \(\{ 1,2,\ldots,n \}\) for some \(n \in \N\text{.}\)

Prove Proposition 0.3.24. That is, prove that if \(\sR\) is an equivalence relation on a set \(A\text{,}\) then every \(a \in A\) is in exactly one equivalence class. Then prove that \(a \, \sR \, b\) if and only if \([a] =
[b]\text{.}\)

Prove that the relation “\(\sim\)” in Example 0.3.25 is an equivalence relation.

- Suppose \(A \subset B\) and \(B\) is countably infinite. By constructing a bijection, show that \(A\) is countable (that is, \(A\) is empty, finite, or countably infinite).
- Use part a) to show that if \(\abs{A} < \abs{\N}\text{,}\) then \(A\) is finite.

Prove the infinite versions of DeMorgan’s laws. Suppose \(A\) is a set and \(B_\lambda\) is a collection of sets for \(\lambda
\in I\text{.}\) Prove

\begin{equation*}
A \setminus \biggl(
\bigcup_{\lambda \in I} B_\lambda
\biggr)
=
\bigcap_{\lambda \in I}
(
A \setminus
B_\lambda
) ,
\qquad
A \setminus \biggl(
\bigcap_{\lambda \in I} B_\lambda
\biggr)
=
\bigcup_{\lambda \in I}
(
A \setminus
B_\lambda
) .
\end{equation*}

Suppose \(f \colon A \to B\) is a function, and for \(\lambda \in I\text{,}\) we have a collection of subsets \(C_\lambda \subset A\) and \(D_\lambda \subset B\text{.}\) Prove

\begin{equation*}
f^{-1} \biggl(
\bigcup_{\lambda \in I} D_\lambda
\biggr)
=
\bigcup_{\lambda \in I}
f^{-1}(D_\lambda) ,
\quad
f^{-1} \biggl(
\bigcap_{\lambda \in I} D_\lambda
\biggr)
=
\bigcap_{\lambda \in I}
f^{-1}(D_\lambda) ,
\end{equation*}

and

\begin{equation*}
f \biggl(
\bigcup_{\lambda \in I} C_\lambda
\biggr)
=
\bigcup_{\lambda \in I}
f(C_\lambda) ,
\quad
f \biggl(
\bigcap_{\lambda \in I} C_\lambda
\biggr)
\subset
\bigcap_{\lambda \in I}
f(C_\lambda) .
\end{equation*}
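All four statements can be sanity-checked on a small indexed family before writing the proofs. The sketch below uses a two-element index set and, as a hypothetical example, the function \(f(x) = x^2\) on a small domain:

```python
A = set(range(10))
family = {1: {0, 1, 2, 3}, 2: {2, 3, 4, 5}}   # B_lambda for lambda in I = {1, 2}

union = set().union(*family.values())
inter = set.intersection(*family.values())

# Infinite (here: indexed) DeMorgan's laws.
assert A - union == set.intersection(*(A - B for B in family.values()))
assert A - inter == set().union(*(A - B for B in family.values()))

f = lambda x: x * x
def preimage(D):
    return {x for x in A if f(x) in D}

# Preimages commute with both unions and intersections.
D1, D2 = {0, 1, 4}, {4, 9, 16}
assert preimage(D1 | D2) == preimage(D1) | preimage(D2)
assert preimage(D1 & D2) == preimage(D1) & preimage(D2)

# Images commute with unions, but for intersections only an inclusion holds.
def image(C):
    return {f(x) for x in C}

C1, C2 = {1, 2, 3}, {3, 4, 5}
assert image(C1 | C2) == image(C1) | image(C2)
assert image(C1 & C2) <= image(C1) & image(C2)
```

No finite check replaces the proofs, but running such examples often suggests why the image of an intersection can be strictly smaller: a non-injective \(f\) can send points outside \(C \cap D\) to a common value.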

The term “modern” refers to the late 19th century up to the present.

To be completely rigorous, this equivalence is only true if we also assume as an axiom that \(n-1\) exists for all natural numbers bigger than \(1\text{,}\) which we do. In this book, we are assuming all the usual arithmetic holds.

For the fans of the TV show *Futurama*, there is a movie theater in one episode called an \(\aleph_0\)-plex.

Named after the German mathematician Georg Ferdinand Ludwig Philipp Cantor (1845–1918).