Skip to main content
Logo image

Section 11.7 The Stone–Weierstrass theorem

Note: 3 lectures

Subsection 11.7.1 Weierstrass approximation

Perhaps surprisingly, even a very badly behaved continuous function is a uniform limit of polynomials. We cannot really get any “nicer” functions than polynomials. The idea of the proof is a very common approximation or “smoothing” idea (convolution with an approximate delta function) that has applications far beyond pure mathematics.


For \(x \in [0,1]\text{,}\) define
\begin{equation*} g(x) \coloneqq f\bigl((b-a)x+a\bigr)-f(a) - x\bigl(f(b)-f(a)\bigr) . \end{equation*}
If we prove the theorem for \(g\) and find the sequence \(\{ p_n \}_{n=1}^\infty\) for \(g\text{,}\) it is proved for \(f\) as we simply composed with an invertible affine function and added an affine function to \(f\text{:}\) We reverse the process and apply that to our \(p_n\text{,}\) to obtain polynomials approximating \(f\text{.}\) The function \(g\) is defined on \([0,1]\) and \(g(0)=g(1)=0\text{.}\) For simplicity, assume that \(g\) is defined on \(\R\) by letting \(g(x) \coloneqq 0\) if \(x < 0\) or \(x > 1\text{.}\) This extended \(g\) is continuous.
\begin{equation*} c_n \coloneqq {\left( \int_{-1}^1 {(1-x^2)}^n\,dx \right)}^{-1} , \qquad q_n(x) \coloneqq c_n (1-x^2)^n . \end{equation*}
The choice of \(c_n\) is so that \(\int_{-1}^1 q_n(x)\,dx = 1\text{.}\) See Figure 11.8.

Figure 11.8. Plot of the approximate delta functions \(q_n\) on \([-1,1]\) for \(n=5,10,15,20,\ldots,100\) with higher \(n\) in lighter shade.

The functions \(q_n\) are peaks around 0 (ignoring what happens outside of \([-1,1]\)) that get narrower and taller as \(n\) increases, while the area underneath is always 1. A classic approximation idea is to do a convolution integral with peaks like this: For for \(x \in [0,1]\text{,}\) let
\begin{equation*} p_n(x) \coloneqq \int_{0}^1 g(t)q_n(x-t) \,dt \quad \left( = \int_{-\infty}^\infty g(t)q_n(x-t) \,dt \right) . \end{equation*}
The idea of this convolution is that we do a “weighted average” of the function \(g\) around the point \(x\) using \(q_n\) as the weight. See Figure 11.9.

Figure 11.9. For \(x=0.3\text{,}\) the plot of \(q_{100}(x-t)\) (light gray peak centered at \(x\)), some continuous function \(g(t)\) (the jagged line) and the product \(g(t)q_{100}(x-t)\) (the bold line).

As \(q_n\) is a narrow peak, the integral mostly sees the values of \(g\) that are close to \(x\) and it does the weighted average of them. When the peak gets narrower, we compute this average closer to \(x\) and we expect the result to get closer to the value of \(g(x)\text{.}\) Really, we are approximating what is called a delta function 1  (don’t worry if you have not heard of this concept), and functions like \(q_n\) are often called approximate delta functions. We could do this with any set of polynomials that look like narrower and narrower peaks near zero. These just happen to be the simplest ones. We only need this behavior on \([-1,1]\) as the convolution sees nothing further than this as \(g\) is zero outside \([0,1]\text{.}\)
Because \(q_n\) is a polynomial, we write
\begin{equation*} q_n(x-t) = a_0(t) + a_1(t)\,x + \cdots + a_{2n}(t)\, x^{2n} , \end{equation*}
where \(a_k(t)\) are polynomials in \(t\text{,}\) and hence integrable functions. So
\begin{equation*} \begin{split} p_n(x) & = \int_{0}^1 g(t)q_n(x-t) \,dt \\ &= \left( \int_0^1 g(t) a_0(t)\,dt \right) + \left( \int_0^1 g(t) a_1(t)\,dt \right) \, x + \cdots + \left( \int_0^1 g(t) a_{2n}(t)\,dt \right) \, x^{2n} . \end{split} \end{equation*}
In other words, \(p_n\) is a polynomial 2  in \(x\text{.}\) If \(g(t)\) is real-valued, then the functions \(g(t)a_j(t)\) are real-valued and \(p_n\) has real coefficients, proving the “furthermore” part of the theorem.
We still need to prove that \(\{ p_n \}_{n=1}^\infty\) converges to \(g\text{.}\) We start with estimating the size of \(c_n\text{.}\) For \(x \in [0,1]\text{,}\) we have that \(1-x \leq 1-x^2\text{.}\) We estimate
\begin{equation*} \begin{split} c_n^{-1} = \int_{-1}^1 {(1-x^2)}^n \, dx & = 2\int_0^1 {(1-x^2)}^n \, dx \\ & \geq 2\int_0^{1} {(1-x)}^n \, dx = \frac{2}{n+1} . \end{split} \end{equation*}
So \(c_n \leq \frac{n+1}{2} \leq n\text{.}\) Let us see how small \(q_n\) is if we ignore some small interval around the origin, where the peak is. Given any \(\delta > 0\text{,}\) \(\delta < 1\text{,}\) we have that for all \(x\) such that \(\delta \leq \sabs{x} \leq 1\text{,}\)
\begin{equation*} q_n(x) \leq c_n {(1-\delta^2)}^n \leq n{(1-\delta^2)}^n , \end{equation*}
because \(q_n\) is increasing on \([-1,0]\) and decreasing on \([0,1]\text{.}\) By the ratio test, \(n{(1-\delta^2)}^n\) goes to 0 as \(n\) goes to infinity.
The function \(q_n\) is even, \(q_n(t) = q_n(-t)\text{,}\) and \(g\) is zero outside of \([0,1]\text{.}\) So for \(x \in [0,1]\text{,}\)
\begin{equation*} p_n(x) = \int_{0}^1 g(t)q_n(x-t) \, dt = \int_{-x}^{1-x} g(x+t)q_n(-t) \, dt = \int_{-1}^{1} g(x+t)q_n(t) \, dt . \end{equation*}
Let \(\epsilon > 0\) be given. As \([-1,2]\) is compact and \(g\) is continuous on \([-1,2]\text{,}\) we have that \(g\) is uniformly continuous. Pick \(0 < \delta < 1\) such that if \(\sabs{x-y} < \delta\) (and \(x,y \in [-1,2]\)), then
\begin{equation*} \sabs{g(x)-g(y)} < \frac{\epsilon}{2} . \end{equation*}
Let \(M\) be such that \(\sabs{g(x)} \leq M\) for all \(x\text{.}\) Let \(N\) be such that for all \(n \geq N\text{,}\)
\begin{equation*} 4M n{(1-\delta^2)}^n < \frac{\epsilon}{2} . \end{equation*}
Note that \(\int_{-1}^1 q_n(t) \, dt = 1\) and \(q_n(t) \geq 0\) on \([-1,1]\text{.}\) So for \(n \geq N\) and every \(x \in [0,1]\text{,}\)
\begin{equation*} \begin{aligned} \sabs{p_n(x)-g(x)} & = \abs{\int_{-1}^1 g(x+t)q_n(t) \, dt -g(x)\int_{-1}^1 q_n(t) \, dt} \\ & = \abs{\int_{-1}^1 \bigl(g(x+t)-g(x)\bigr)q_n(t) \, dt} \\ & \leq \int_{-1}^1 \sabs{g(x+t)-g(x)} q_n(t) \, dt \\ & = \int_{-1}^{-\delta} \sabs{g(x+t)-g(x)} q_n(t) \, dt \quad + \int_{-\delta}^{\delta} \sabs{g(x+t)-g(x)} q_n(t) \, dt \\ & \phantom{\leq} + \int_{\delta}^1 \sabs{g(x+t)-g(x)} q_n(t) \, dt \\ & \leq 2M \int_{-1}^{-\delta} q_n(t) \, dt \quad + \quad \frac{\epsilon}{2} \int_{-\delta}^{\delta} q_n(t) \, dt \quad + \quad 2M \int_{\delta}^1 q_n(t) \, dt \\ & \leq 2M n{(1-\delta^2)}^n(1-\delta) \quad + \quad \frac{\epsilon}{2} \quad + \quad 2M n{(1-\delta^2)}^n(1-\delta) \\ & < 4M n{(1-\delta^2)}^n + \frac{\epsilon}{2} < \epsilon . \qedhere \end{aligned} \end{equation*}
A convolution often inherits some property of the functions we are convolving. In our case the convolution \(p_n\) inherited the property of being a polynomial from \(q_n\text{.}\) The same idea of the proof is often used to get other properties. If \(q_n\) or \(g\) is infinitely differentiable, so is \(p_n\text{.}\) If \(q_n\) or \(g\) is a solution to a linear differential equation, so is \(p_n\text{.}\) Etc.
Let us note an immediate application of the Weierstrass theorem. We have already seen that countable dense subsets can be very useful.


Without loss of generality, consider only \(C\bigl([a,b],\R\bigr)\) (why?). Real polynomials are dense in \(C\bigl([a,b],\R\bigr)\) by Weierstrass. If we show that every real polynomial can be approximated by polynomials with rational coefficients, we are done. Indeed, there are only countably many rational numbers and so there are only countably many polynomials with rational coefficients (a countable union of countable sets is countable).
Further without loss of generality, suppose \([a,b]=[0,1]\text{.}\) Let
\begin{equation*} p(x) \coloneqq \sum_{k=0}^n a_k\, x^k \end{equation*}
be a polynomial of degree \(n\) where \(a_k \in \R\text{.}\) Given \(\epsilon > 0\text{,}\) pick \(b_k \in \Q\) such that \(\sabs{a_k-b_k} < \frac{\epsilon}{n+1}\text{.}\) Then if we let
\begin{equation*} q(x) \coloneqq \sum_{k=0}^n b_k \, x^k , \end{equation*}
we have
\begin{equation*} \sabs{p(x)-q(x)} = \abs{\sum_{k=0}^n (a_k-b_k) x^k} \leq \sum_{k=0}^n \sabs{a_k-b_k} x^k \leq \sum_{k=0}^n \sabs{a_k-b_k} < \sum_{k=0}^n \frac{\epsilon}{n+1} = \epsilon . \qedhere \end{equation*}

Remark 11.7.3.

While we will not prove so, the corollary above implies that \(C\bigl([a,b],\C\bigr)\) has the same cardinality as \(\R\text{,}\) which may be a bit surprising. The set of all functions \([a,b] \to \C\) has cardinality strictly greater than the cardinality of \(\R\text{,}\) it has the cardinality of the power set of \(\R\text{.}\) So the set of continuous functions is a very tiny subset of the set of all functions.
Warning! The fact that every continuous function \(f \colon [-1,1] \to \C\) (or any interval \([a,b]\)) can be uniformly approximated by polynomials
\begin{equation*} \sum_{k=0}^n a_k\, x^k \end{equation*}
does not mean that every continuous \(f\) is analytic, that is, equal to a power series
\begin{equation*} \sum_{k=0}^\infty c_k\, x^k . \end{equation*}
An analytic function is infinitely differentiable, however, the function \(\sabs{x}\) is continuous and near the origin approximable by polynomials, and so provides a counterexample.
The key distinction is that the polynomials coming from the Weierstrass theorem are not the partial sums of a power series. For each one, the coefficients \(a_k\) above can be completely different—they do not need to come from a single sequence \(\{ c_k \}_{k=1}^\infty\text{.}\)
Interestingly, to generalize Weierstrass, we will only need to use it to approximate the absolute value function by polynomials without a constant term.


As \(f(x) \coloneqq \sabs{x}\) is continuous and real-valued on \([-a,a]\text{,}\) the Weierstrass theorem gives a sequence of real polynomials \(\{ \widetilde{p}_n \}_{n=1}^\infty\) that converges to \(f\) uniformly on \([-a,a]\text{.}\) Let
\begin{equation*} p_n(x) \coloneqq \widetilde{p}_n(x) - \widetilde{p}_n(0) . \end{equation*}
Obviously \(p_n(0) = 0\text{.}\)
Given \(\epsilon > 0\text{,}\) let \(N\) be such that for \(n \geq N\text{,}\) we have \(\bigl\lvert\widetilde{p}_n(x)-\sabs{x}\big\rvert < \nicefrac{\epsilon}{2}\) for all \(x \in [-a,a]\text{.}\) In particular, \(\sabs{\widetilde{p}_n(0)} < \nicefrac{\epsilon}{2}\text{.}\) Then for \(n \geq N\text{,}\)
\begin{equation*} \bigl\lvert p_n(x)-\sabs{x} \bigr\rvert = \bigl\lvert \widetilde{p}_n(x) - \widetilde{p}_n(0) - \sabs{x} \bigr\rvert \leq \bigl\lvert \widetilde{p}_n(x) - \sabs{x} \bigr\rvert + \sabs{\widetilde{p}_n(0)} < \nicefrac{\epsilon}{2} + \nicefrac{\epsilon}{2} = \epsilon . \qedhere \end{equation*}
Generalizing the corollary, we can make the polynomials from the Weierstrass theorem be equal to our target function at one point, not just for \(\sabs{x}\text{,}\) but that’s the one we will need. It is also possible (see Exercise 11.7.14) to make the polynomials equal at finitely many points by subtracting not a constant but a properly crafted polynomial.

Subsection 11.7.2 Stone–Weierstrass approximation

We want to abstract away what is not really necessary and prove a general version of the Weierstrass theorem, the Stone–Weierstrass theorem 3 . Polynomials are dense in the space of continuous functions on a compact interval. What other kind of families of functions are also dense? And if the domain is an arbitrary metric space, then we no longer have polynomials to begin with.

Definition 11.7.5.

A set \(\sA\) of complex-valued functions \(f \colon X \to \C\) is said to be an algebra (sometimes complex algebra or algebra over \(\C\)) if for all \(f, g \in \sA\) and \(c \in \C\text{,}\) we have
  1. \(f+g \in \sA\text{.}\)
  2. \(fg \in \sA\text{.}\)
  3. \(cg \in \sA\text{.}\)
A real algebra or an algebra over \(\R\) is a set of real-valued functions that satisfies the three properties above for \(c \in \R\text{.}\)
We are interested in the case when \(X\) is a compact metric space. Then \(C(X,\C)\) and \(C(X,\R)\) are metric spaces. Given a set \(\sA \subset C(X,\C)\text{,}\) the set of all uniform limits is the metric space closure \(\widebar{\sA}\text{.}\) When we talk about closure of an algebra from now on we mean the closure in \(C(X,\C)\) as a metric space. Same for \(C(X,\R)\text{.}\)
The set \(\sP\) of all polynomials is an algebra in \(C\bigl([a,b],\C\bigr)\text{,}\) and we have shown that its closure \(\widebar{\sP} = C\bigl([a,b],\C\bigr)\text{.}\) That is, it is dense. That is the sort of result that we wish to prove.
We leave the following proposition as an exercise.
We distill the properties of polynomials that are sufficient for an approximation theorem.

Definition 11.7.7.

Let \(\sA\) be a set of complex-valued functions defined on a set \(X\text{.}\)
  1. \(\sA\) separates points if for every \(x,y \in X\) with \(x \not= y\text{,}\) there is an \(f \in \sA\) such that \(f(x) \not= f(y)\text{.}\)
  2. \(\sA\) vanishes at no point if for every \(x \in X\) there is an \(f \in \sA\) such that \(f(x) \not= 0\text{.}\)

Example 11.7.8.

Given any \(X \subset \R\) (or \(X \subset \C\)), the set \(\sP\) of polynomials in one variable separates points and vanishes at no point on \(X\text{.}\) That is, \(1 \in \sP\text{,}\) so it vanishes at no point. And for \(x,y \in X\text{,}\) \(x\not= y\text{,}\) take \(f(t) \coloneqq t\text{.}\) Then \(f(x) = x \not= y = f(y)\text{.}\) So \(\sP\) separates points.

Example 11.7.9.

The set of functions of the form
\begin{equation*} f(t) = a_0 + \sum_{n=1}^k a_n \cos(nt) \end{equation*}
is an algebra, which follows by the identity \(\cos(mt)\cos(nt) = \frac{\cos((n+m) t)}{2}+ \frac{\cos((n-m) t)}{2}\text{.}\) The algebra vanishes at no point as it contains a constant function. It does not separate points if the domain is an interval \([-a,a]\text{,}\) as \(f(-t) = f(t)\) for all \(t\text{.}\) It does separate points if the domain is \([0,\pi]\text{;}\) \(\cos(t)\) is one-to-one on \([0,\pi]\text{.}\)

Example 11.7.10.

The set \(\sP\) of real polynomials with no constant term is an algebra that vanishes at the origin. Clearly, any function in the closure of \(\sP\) also vanishes at the origin, so the closure of \(\sP\) cannot be \(C\bigl([0,1],\R\bigr)\text{.}\)
Similarly, the set of constant functions is an algebra that does not separate points. Uniform limits of constants are constants, so we also do not obtain all continuous functions.
It is interesting that these two properties, “vanishes at no point” and “separates points,” are sufficient to obtain approximation of any real-valued continuous function. Before we prove this theorem, we note that such an algebra can interpolate a finite number of values exactly. We will state this result only for two points as that is all that we will require.


There must exist an \(g,h,k \in \sA\) such that \(g(x) \not= g(y)\text{,}\) \(h(x) \not= 0\text{,}\) \(k(y) \not= 0\text{.}\) Let
\begin{equation*} \begin{split} f & \coloneqq c \frac{\bigl(g - g(y)\bigr)h}{\bigl(g(x)-g(y)\bigr)h(x) } + d \frac{\bigl(g - g(x)\bigr)k}{\bigl(g(y)-g(x)\bigr)k(y)} \\ & = c \frac{gh - g(y)h}{g(x)h(x)-g(y)h(x) } + d \frac{gk - g(x)k}{g(y)k(y)-g(x)k(y)} . \end{split} \end{equation*}
We are not dividing by zero (clear from the first formula). Also by the first formula, \(f(x) = c\) and \(f(y) = d\text{.}\) By the second formula, \(f \in \sA\) (as \(\sA\) is an algebra).
The proof is divided into several claims.
Claim 1: If \(f \in \widebar{\sA}\text{,}\) then \(\sabs{f} \in \widebar{\sA}\text{.}\)


The function \(f\) is bounded (continuous on a compact set), so there is an \(M\) such that \(\sabs{f(x)} \leq M\) for all \(x \in X\text{.}\) Let \(\epsilon > 0\) be given. By the corollary to the Weierstrass theorem, there exists a real polynomial \(c_1 y + c_2 y^2 + \cdots+ c_n y^n\) (vanishing at \(y=0\)) such that
\begin{equation*} \abs{\sabs{y} - \sum_{k=1}^n c_k y^k} < \epsilon \end{equation*}
for all \(y \in [-M,M]\text{.}\) Because \(\widebar{\sA}\) is an algebra and because there is no constant term in the polynomial,
\begin{equation*} \sum_{k=1}^n c_k f^k \in \widebar{\sA} . \end{equation*}
As \(\sabs{f(x)} \leq M\text{,}\) then for all \(x \in X\)
\begin{equation*} \abs{\sabs{f(x)} - \sum_{k=1}^n c_k {\bigl(f(x)\bigr)}^k} < \epsilon . \end{equation*}
So \(\sabs{f}\) is in the closure of \(\widebar{\sA}\text{,}\) which is itself closed. In other words, \(\sabs{f} \in \widebar{\sA}\text{.}\)
Claim 2: If \(f \in \widebar{\sA}\) and \(g \in \widebar{\sA}\text{,}\) then \(\max(f,g) \in \widebar{\sA}\) and \(\min(f,g) \in \widebar{\sA}\text{,}\) where
\begin{equation*} \bigl(\max(f,g)\bigr) (x) \coloneqq \max \bigl\{ f(x), g(x) \bigr\} , \qquad \text{and} \qquad \bigl(\min(f,g)\bigr) (x) \coloneqq \min \bigl\{ f(x), g(x) \bigr\} . \end{equation*}


\begin{equation*} \max(f,g) = \frac{f+g}{2} + \frac{\sabs{f-g}}{2} , \qquad\text{and}\qquad \min(f,g) = \frac{f+g}{2} - \frac{\sabs{f-g}}{2} . \end{equation*}
As \(\widebar{\sA}\) is an algebra we are done.
By induction, the claim is also true for the minimum or maximum of a finite collection of functions.
Claim 3: Given \(f \in C(X,\R)\text{,}\) \(x \in X\text{,}\) and \(\epsilon > 0\text{,}\) there exists a \(g_x \in \widebar{\sA}\) with \(g_x(x) = f(x)\) and
\begin{equation*} g_x(t) > f(t)-\epsilon \qquad \text{for all } t \in X. \end{equation*}


Fix \(f\text{,}\) \(x\text{,}\) and \(\epsilon\text{.}\) By Proposition 11.7.11, for every \(y \in X\text{,}\) find an \(h_y \in \sA\) such that
\begin{equation*} h_y(x) = f(x), \qquad h_y(y)=f(y) . \end{equation*}
As \(h_y\) and \(f\) are continuous, the function \(h_y-f\) is continuous, and the set
\begin{equation*} U_y \coloneqq \bigl\{ t \in X : h_y(t) > f(t) -\epsilon \bigr\} = {(h_y-f)}^{-1} \bigl( (-\epsilon,\infty) \bigr) \end{equation*}
is open (it is the inverse image of an open set by a continuous function). Furthermore \(y \in U_y\text{.}\) So the sets \(U_y\) cover \(X\text{.}\) The space \(X\) is compact, so there exist finitely many points \(y_1,y_2,\ldots,y_n\) in \(X\) such that
\begin{equation*} X = \bigcup_{k=1}^n U_{y_k} . \end{equation*}
\begin{equation*} g_x \coloneqq \max(h_{y_1},h_{y_2},\ldots,h_{y_n}) . \end{equation*}
By Claim 2, \(g_x \in \widebar{\sA}\text{.}\) See Figure 11.10. Moreover,
\begin{equation*} g_x(t) > f(t) -\epsilon \end{equation*}
for all \(t \in X\text{,}\) since for every \(t\text{,}\) there is a \(y_k\) such that \(t \in U_{y_k}\text{,}\) and so \(h_{y_k}(t) > f(t) -\epsilon\text{.}\) Finally, \(h_y(x) = f(x)\) for all \(y \in X\text{,}\) so \(g_x(x) = f(x)\text{.}\)

Figure 11.10. Construction of \(g_x\) out of two \(h_{y_1}\) (longer dashes) and \(h_{y_2}\) (shorter dashes).

What we have now is for each \(x\) a function \(g_x \in \widebar{\sA}\) that is within \(\epsilon\) of \(f\) near \(x\) (being continuous), but also \(g_x\) is within \(\epsilon\) of \(f\) from at least one side at all points. If we cover \(X\) with neighborhoods where \(g_x\) is a good approximation, we can repeat the idea of the argument with a minimum to get a function that is within \(\epsilon\) from both sides.
Claim 4: If \(f \in C(X,\R)\) and \(\epsilon > 0\) is given, then there exists an \(\varphi \in \widebar{\sA}\) such that
\begin{equation*} \sabs{f(x) - \varphi(x)} < \epsilon . \end{equation*}


For every \(x \in X\text{,}\) find the function \(g_x\) as in Claim 3. Let
\begin{equation*} V_x \coloneqq \bigl\{ t \in X : g_x(t) < f(t) + \epsilon \bigr\}. \end{equation*}
The sets \(V_x\) are open as \(g_x\) and \(f\) are continuous. As \(g_x(x) = f(x)\text{,}\) then \(x \in V_x\text{.}\) So the sets \(V_x\) cover \(X\text{.}\) By compactness of \(X\text{,}\) there are finitely many points \(x_1,x_2,\ldots,x_n\) such that
\begin{equation*} X = \bigcup_{k=1}^n V_{x_k} . \end{equation*}
\begin{equation*} \varphi \coloneqq \min(g_{x_1},g_{x_2},\ldots,g_{x_n}) . \end{equation*}
By Claim 2, \(\varphi \in \widebar{\sA}\text{.}\) Similarly as before (same argument as in Claim 3), for all \(t \in X\text{,}\)
\begin{equation*} \varphi(t) < f(t) + \epsilon . \end{equation*}
Since all the \(g_x\) satisfy \(g_x(t) > f(t) - \epsilon\) for all \(t \in X\text{,}\) \(\varphi(t) > f(t) - \epsilon\) as well. Hence, for all \(t\text{,}\)
\begin{equation*} -\epsilon < \varphi(t) - f(t) < \epsilon , \end{equation*}
which is the desired conclusion.
The proof of the theorem follows from Claim 4. The claim states that an arbitrary continuous function is in the closure of \(\widebar{\sA}\text{,}\) which is already closed. The theorem is proved.

Example 11.7.13.

The functions of the form
\begin{equation*} f(t) = \sum_{k=1}^n c_k \, e^{kt}, \end{equation*}
for \(c_k \in \R\text{,}\) are dense in \(C\bigl([a,b],\R\bigr)\text{.}\) Such functions are a real algebra, which follows from \(e^{kt} e^{\ell t} = e^{(k+\ell)t}\text{.}\) They separate points as \(e^t\) is one-to-one. As \(e^t > 0\) for all \(t\text{,}\) the algebra does not vanish at any point.
In general, given a set of functions that separates points and does not vanish at any point, we let these functions generate an algebra by considering all the linear combinations of arbitrary multiples of such functions. That is, we consider all real polynomials without constant term of such functions. In the example above, the algebra is generated by \(e^t\text{.}\) We consider polynomials in \(e^t\) without constant term.

Example 11.7.14.

We mentioned that the set of all functions of the form
\begin{equation*} a_0 + \sum_{n=1}^N a_n \cos(nt) \end{equation*}
is an algebra. When considered on \([0,\pi]\text{,}\) it separates points and vanishes nowhere so Stone–Weierstrass applies. As for polynomials, you do not want to conclude that every continuous function on \([0,\pi]\) has a uniformly convergent Fourier cosine series, that is, that every continuous function can be written as
\begin{equation*} a_0 + \sum_{n=1}^\infty a_n \cos(nt) . \end{equation*}
That is not true! There exist continuous functions whose Fourier series does not converge even pointwise let alone uniformly. See Section 11.8.
To obtain Stone–Weierstrass for complex algebras, we must make an extra assumption.

Definition 11.7.15.

An algebra \(\sA\) is self-adjoint if for all \(f \in \sA\text{,}\) the function \(\bar{f}\) defined by \(\bar{f}(x) \coloneqq \overline{f(x)}\) is in \(\sA\text{,}\) where by the bar we mean the complex conjugate.


Suppose \(\sA_\R \subset \sA\) is the set of the real-valued elements of \(\sA\text{.}\) For \(f \in \sA\text{,}\) write \(f = u+iv\) where \(u\) and \(v\) are real-valued. Then
\begin{equation*} u = \frac{f+\bar{f}}{2}, \qquad v = \frac{f-\bar{f}}{2i} . \end{equation*}
So \(u, v \in \sA\) as \(\sA\) is a self-adjoint algebra, and since they are real-valued \(u, v \in \sA_\R\text{.}\)
If \(x \not= y\text{,}\) then find an \(f \in \sA\) such that \(f(x) \not= f(y)\text{.}\) If \(f = u+iv\text{,}\) then it is obvious that either \(u(x) \not= u(y)\) or \(v(x) \not= v(y)\text{.}\) So \(\sA_\R\) separates points. Similarly, for every \(x\) find \(f \in \sA\) such that \(f(x) \not= 0\text{.}\) If \(f = u+iv\text{,}\) then either \(u(x) \not= 0\) or \(v(x) \not= 0\text{.}\) So \(\sA_\R\) vanishes at no point. The set \(\sA_\R\) is a real algebra, and satisfies the hypotheses of the real Stone–Weierstrass theorem. Given any \(f = u+iv \in C(X,\C)\text{,}\) we find \(g,h \in \sA_\R\) such that \(\sabs{u(t)-g(t)} < \nicefrac{\epsilon}{2}\) and \(\sabs{v(t)-h(t)} < \nicefrac{\epsilon}{2}\) for all \(t \in X\text{.}\) Next, \(g+i h \in \sA\text{,}\) and
\begin{multline*} \abs{f(t) - \bigl(g(t)+ih(t)\bigr)} = \abs{u(t)+iv(t) - \bigl(g(t)+ih(t)\bigr)} \\ \leq \sabs{u(t)-g(t)}+\sabs{v(t)-h(t)} < \nicefrac{\epsilon}{2} + \nicefrac{\epsilon}{2} = \epsilon \end{multline*}
for all \(t \in X\text{.}\) So \(\widebar{\sA} = C(X,\C)\text{.}\)
The self-adjoint requirement is necessary, although it is not so obvious to see it. For an example, see Exercise 11.7.9.
We give an interesting application. When working with functions of two variables, it may be useful to work with functions of the form \(f(x)g(y)\) rather than \(F(x,y)\text{.}\) For example, they are easier to integrate. We have the following.

Example 11.7.17.

Any continuous \(F \colon [0,1] \times [0,1] \to \C\) can be approximated uniformly by functions of the form
\begin{equation*} \sum_{j=1}^n f_j(x) g_j(y) , \end{equation*}
where \(f_j \colon [0,1] \to \C\) and \(g_j \colon [0,1] \to \C\) are continuous.
Proof: It is not hard to see that the functions of the above form are a complex algebra. It is equally easy to show that they vanish nowhere, separate points, and the algebra is self-adjoint. As \([0,1] \times [0,1]\) is compact, Stone–Weierstrass obtains the result.

Subsection 11.7.3 Exercises

Exercise 11.7.1.

Prove Proposition 11.7.6. Hint: If \(\{ f_n \}_{n=1}^\infty\) is a sequence in \(C(X,\R)\) converging to \(f\text{,}\) then as \(f\) is bounded, show that \(f_n\) is uniformly bounded, that is, there exists a single bound for all \(f_n\) (and \(f\)).

Exercise 11.7.2.

Suppose \(X \coloneqq \R\) (not compact in particular). Show that \(f(t) \coloneqq e^t\) is not possible to uniformly approximate by polynomials on \(X\text{.}\) Hint: Consider \(\babs{\frac{e^t}{t^n}}\) as \(t \to \infty\text{.}\)

Exercise 11.7.3.

Suppose \(f \colon [0,1] \to \C\) is a uniform limit of a sequence of polynomials of degree at most \(d\text{,}\) then the limit is a polynomial of degree at most \(d\text{.}\) Conclude that to approximate a function which is not a polynomial, we need the degree of the approximations to go to infinity.
Hint: First prove that if a sequence of polynomials of degree \(d\) converges uniformly to the zero function, then the coefficients converge to zero. One way to do this is linear algebra: Consider a polynomial \(p\) evaluated at \(d+1\) points to be a linear operator taking the coefficients of \(p\) to the values of \(p\) (an operator in \(L(\R^{d+1})\)).

Exercise 11.7.4.

Suppose \(f \colon [0,1] \to \R\) is continuous and \(\int_0^1 f(x) x^n \, dx = 0\) for all \(n = 0,1,2,\ldots\text{.}\) Show that \(f(x) = 0\) for all \(x \in [0,1]\text{.}\) Hint: Approximate by polynomials to show that \(\int_0^1 {\bigl( f(x) \bigr)}^2 \, dx = 0\text{.}\)

Exercise 11.7.5.

Suppose \(I \colon C\bigl([0,1],\R\bigr) \to \R\) is a linear continuous function such that \(I(x^n) = \frac{1}{n+1}\) for all \(n=0,1,2,3,\ldots\text{.}\) Prove that \(I(f) = \int_0^1 f\) for all \(f \in C\bigl([0,1],\R\bigr)\text{.}\)

Exercise 11.7.6.

Let \(\sA\) be the collection of real polynomials in \(x^2\text{,}\) that is, polynomials of the form \(c_0 + c_1 x^2 + c_2 x^4 + \cdots + c_d x^{2d}\text{.}\)
  1. Show that every \(f \in C\bigl([0,1],\R\bigr)\) is a uniform limit of polynomials from \(\sA\text{.}\)
  2. Find an \(f \in C\bigl([-1,1],\R\bigr)\) that is not a uniform limit of polynomials from \(\sA\text{.}\)
  3. Which hypothesis of the real Stone–Weierstrass is not satisfied for the domain \([-1,1]\text{?}\)

Exercise 11.7.7.

Let \(\sabs{z}=1\) define the unit circle \(S^1 \subset \C\text{.}\)
  1. Show that functions of the form
    \begin{equation*} \sum_{k=-n}^n c_k\, z^k \end{equation*}
    are dense in \(C(S^1,\C)\text{.}\) Notice the negative powers.
  2. Show that functions of the form
    \begin{equation*} c_0 + \sum_{k=1}^n c_k \, z^k + \sum_{k=1}^n c_{-k}\, \bar{z}^k \end{equation*}
    are dense in \(C(S^1,\C)\text{.}\) These are so-called harmonic polynomials, and this approximation leads to, for example, the solution of the steady state heat problem.
Hint: A good way to write the equation for \(S^1\) is \(z \bar{z} = 1\text{.}\)

Exercise 11.7.8.

Show that for complex numbers \(c_j\text{,}\) the set of functions of \(x\) on \([-\pi,\pi]\) of the form
\begin{equation*} \sum_{k=-n}^n c_k \, e^{ik x} \end{equation*}
satisfies the hypotheses of the complex Stone–Weierstrass theorem and therefore such functions are dense in the \(C\bigl([-\pi,\pi],\C\bigr)\text{.}\)

Exercise 11.7.9.

Let \(S^1 \subset \C\) be the unit circle, that is the set where \(\sabs{z} = 1\text{.}\) Orient this set counterclockwise. Let \(\gamma(t) \coloneqq e^{it}\text{.}\) For the one-form \(f(z)\,dz\) we write 4 
\begin{equation*} \int_{S^1} f(z) \,dz \coloneqq \int_0^{2\pi} f(e^{it}) \, i e^{it} \, dt . \end{equation*}
  1. Prove that for all nonnegative integers \(k = 0,1,2,3,\ldots\text{,}\) we have \(\int_{S^1} z^k \, dz = 0\text{.}\)
  2. Prove that if \(P(z) = \sum_{k=0}^n c_k z^k\) is a polynomial in \(z\text{,}\) then \(\int_{S^1} P(z) \, dz = 0\text{.}\)
  3. Prove \(\int_{S^1} \bar{z} \, dz \not= 0\text{.}\)
  4. Conclude that polynomials in \(z\) (this algebra of functions is not self-adjoint) are not dense in \(C(S^1,\C)\text{.}\)

Exercise 11.7.10.

Let \((X,d)\) be a compact metric space and suppose \(\sA \subset C(X,\R)\) is a real algebra that separates points, but vanishes at exactly one point \(x_0 \in X\text{.}\) That is, \(f(x_0) = 0\) for all \(f \in \sA\text{,}\) but for every \(y \in X \setminus \{ x_0 \}\) there is a \(\varphi \in \sA\) such that \(\varphi(y) \not= 0\text{.}\) Prove that every function \(g \in C(X,\R)\) such that \(g(x_0) = 0\) is a uniform limit of functions from \(\sA\text{.}\)

Exercise 11.7.11.

Let \((X,d)\) be a compact metric space and suppose \(\sA \subset C(X,\R)\) is a real algebra. Suppose that for each \(y \in X\) the closure \(\widebar{\sA}\) contains the function \(\varphi_y(x) \coloneqq d(y,x)\text{.}\) Then \(\widebar{\sA} = C(X,\R)\text{.}\)

Exercise 11.7.12.

  1. Suppose \(f \colon [a,b] \to \C\) is continuously differentiable. Show that there exists a sequence of polynomials \(\{ p_n \}_{n=1}^\infty\) that converges in the \(C^1\) norm to \(f\text{,}\) that is, \(\snorm{f - p_n}_{[a,b]} + \snorm{f'-p_n'}_{[a,b]} \to 0\) as \(n \to \infty\text{.}\)
  2. Suppose \(f \colon [a,b] \to \C\) is \(k\) times continuously differentiable. Show that there exists a sequence of polynomials \(\{ p_n \}_{n=1}^\infty\) that converges in the \(C^k\) norm to \(f\text{,}\) that is,
    \begin{equation*} \sum_{j=0}^k \norm{f^{(j)} - p_n^{(j)}}_{[a,b]} \to 0 \qquad \text{as} \qquad n \to \infty. \end{equation*}

Exercise 11.7.13.

  1. Show that an even function \(f \colon [-1,1] \to \R\) is a uniform limit of polynomials with even powers only, that is, polynomials of the form \(a_0 + a_1 x^2 + a_2 x^4 + \cdots + a_k x^{2k}\text{.}\)
  2. Show that an odd function \(f \colon [-1,1] \to \R\) is a uniform limit of polynomials with odd powers only, that is, polynomials of the form \(b_1 x + b_2 x^3 + b_3 x^5 + \cdots + b_k x^{2k-1}\text{.}\)

Exercise 11.7.14.

Let \(f \colon [a,b] \to \R\) be continuous.
  1. Given two points \(x_1,x_2 \in [a,b]\text{,}\) show that there exists a sequence of real polynomials \(\{ p_n \}_{n=1}^\infty\) so that \(p_n(x_1) = f(x_1)\) and \(p_n(x_2) = f(x_2)\) for all \(n\text{.}\)
  2. Generalize the previous part to \(k\) points: Given the points \(x_1,x_2,\ldots,x_k \in [a,b]\text{,}\) show that there exists a sequence of real polynomials \(\{ p_n \}_{n=1}^\infty\) so that for all \(n\text{,}\) \(p_n(x_j) = f(x_j)\) for \(j=1,2,\ldots,k\text{.}\)
    Hint: The polynomial \((x-x_1)(x-x_2)\cdots(x-x_{\ell-1})(x-x_{\ell+1}) \cdots(x-x_k)\) is zero at \(x_j\) for \(j\not=\ell\) but nonzero at \(x_\ell\text{.}\) Use it to construct a polynomial that takes prescribed values at \(x_1,x_2,\ldots,x_k\text{.}\)
The delta function is not actually a function, it is a “thing” that should give “\(\int_{-\infty}^\infty g(t) \delta(x-t) \, dt = g(x)\text{.}\)
Do note that the functions \(a_j\) depend on \(n\text{,}\) so the coefficients of \(p_n\) change as \(n\) changes.
Named after the American mathematician Marshall Harvey Stone (1903–1989), and the German mathematician Karl Theodor Wilhelm Weierstrass (1815–1897).
Alternatively, one could define \(dz \coloneqq dx + i \, dy\) and extend the path integral from Chapter 9 to complex-valued one-forms.
For a higher quality printout use the PDF versions: or