Linear Transformations

Definition. Let V and W be vector spaces over a field F. A linear transformation $f: V \to W$ is a function which satisfies

$$f(ku + v) = kf(u) + f(v) \quad\hbox{for all}\quad u, v \in V, \quad k \in F.$$

Note that u and v are vectors, whereas k is a scalar (number).

You can break the definition down into two pieces:

$$f(u + v) = f(u) + f(v) \quad\hbox{for all}\quad u, v \in V,$$

$$f(ku) = kf(u) \quad\hbox{for all}\quad k \in F, \quad u \in V.$$

Conversely, it is clear that if these two equations are satisfied then f is a linear transformation.
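To see the definition in action, here's a quick numerical spot-check in Python. The map f and the helpers `scale` and `add` are hypothetical, written just for this illustration; checking a few samples is evidence of linearity, not a proof.

```python
def f(u):
    # A sample linear map R^2 -> R^2: f(x, y) = (2x + y, x - 3y).
    x, y = u
    return (2*x + y, x - 3*y)

def scale(k, u):
    # Scalar multiple k*u, componentwise.
    return tuple(k * c for c in u)

def add(u, v):
    # Vector sum u + v, componentwise.
    return tuple(a + b for a, b in zip(u, v))

# Check f(ku + v) == k*f(u) + f(v) on several samples.
for k in (0, 1, -2, 3):
    for u in ((1, 2), (-1, 0), (4, -3)):
        for v in ((0, 1), (2, 2)):
            assert f(add(scale(k, u), v)) == add(scale(k, f(u)), f(v))
```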

The notation $f: F^m \to F^n$ means that f is a function which takes a vector in $F^m$ as input and produces a vector in $F^n$ as output.

Example. The function

$$f(x,y) = (x^2, y^2, xy)$$

is a function $f: \real^2 \to \real^3$ .

Sometimes, the outputs are given variable names, e.g.

$$(u, v, w) = f(x,y) = (x^2, y^2, xy).$$

This is the same as saying

$$u = x^2, \quad v = y^2, \quad w = xy.$$

You plug numbers into f in the obvious way:

$$f(3,-2) = \left(3^2, (-2)^2, 3\cdot (-2)\right) = (9, 4, -6).$$

Since you can think of f as taking a 2-dimensional vector as its input and producing a 3-dimensional vector as its output, you could write

$$(u, v, w) = f((x,y)) = (x^2, y^2, xy).$$

But I'll suppress some of the extra parentheses when there's no danger of confusion.

Example. Define $f: \real^2 \to \real^3$ by

$$f(x,y) = (x + 2y, x - y, -2x + 3y).$$

I'll show that f is a linear transformation the hard way. First, I need two 2-dimensional vectors:

$$u = (u_1, u_2), \quad v = (v_1, v_2).$$

I also need a real number k.

Notice that I don't pick specific numerical vectors like $u = (4, -2)$ or a specific number for k, like $k = 3$ . I have to do the proof with general vectors and numbers.

I must show that

$$f(ku + v) = kf(u) + f(v).$$

I'll compute the left side and the right side, then show that they're equal. Here's the left side:

$$f(ku + v) = f\left(k(u_1, u_2) + (v_1, v_2)\right) = f\left((ku_1 + v_1, ku_2 + v_2)\right) =$$

$$((ku_1 + v_1) + 2(ku_2 + v_2), (ku_1 + v_1) - (ku_2 + v_2), -2(ku_1 + v_1) + 3(ku_2 + v_2)) =$$

$$(k(u_1 + 2u_2) + (v_1 + 2v_2), k(u_1 - u_2) + (v_1 - v_2), k(-2u_1 + 3u_2) + (-2v_1 + 3v_2)).$$

And here's the right side:

$$kf(u) + f(v) = kf\left((u_1, u_2)\right) + f\left((v_1, v_2)\right) = k(u_1 + 2u_2, u_1 - u_2, -2u_1 + 3u_2) + (v_1 + 2v_2, v_1 - v_2, -2v_1 + 3v_2) =$$

$$(k(u_1 + 2u_2) + (v_1 + 2v_2), k(u_1 - u_2) + (v_1 - v_2), k(-2u_1 + 3u_2) + (-2v_1 + 3v_2)).$$

Therefore, $f(ku + v) = kf(u) + f(v)$ , so f is a linear transformation.

This was a pretty disgusting computation, and it would be a shame to have to go through this every time. I'll come up with a better way of recognizing linear transformations shortly.
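The algebra above is also easy to spot-check numerically. This Python sketch (the helper functions `scale` and `add` are mine, for illustration) verifies $f(ku + v) = kf(u) + f(v)$ for one concrete choice of k, u, and v:

```python
def f(u):
    # f(x, y) = (x + 2y, x - y, -2x + 3y)
    x, y = u
    return (x + 2*y, x - y, -2*x + 3*y)

def scale(k, u):
    return tuple(k * c for c in u)

def add(u, v):
    return tuple(a + b for a, b in zip(u, v))

k, u, v = 3, (4, -2), (1, 5)
lhs = f(add(scale(k, u), v))          # f(ku + v) = f(13, -1)
rhs = add(scale(k, f(u)), f(v))       # k*f(u) + f(v)
assert lhs == rhs == (11, 14, -29)
```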

Example. The function

$$f(x,y) = (x^2, y^2, xy)$$

is not a linear transformation from $\real^2$ to $\real^3$ .

To prove that a function is not a linear transformation --- unlike proving that it is --- you must come up with specific, numerical vectors u and v and a number k for which the defining equation is false. There is often no systematic way to come up with such a counterexample; you simply try some numbers till you get what you want.

I'll take $k = 1$ , $u = (1, 2)$ , and $v = (1, 1)$ . This time, I want to show $f(ku + v) \ne kf(u) + f(v)$ ; again, I compute the two sides.

$$f(ku + v) = f\left(1\cdot (1, 2) + (1, 1)\right) = f\left((2, 3)\right) = (2^2, 3^2, 2\cdot 3) = (4, 9, 6).$$

$$kf(u) + f(v) = 1\cdot f\left((1, 2)\right) + f\left((1, 1)\right) = (1^2, 2^2, 1\cdot 2) + (1^2, 1^2, 1\cdot 1) =$$

$$(1, 4, 2) + (1, 1, 1) = (2, 5, 3).$$

Since $f(ku + v) \ne kf(u) + f(v)$ , f is not a linear transformation.
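The counterexample is mechanical enough to check in code. A quick sketch:

```python
def f(u):
    # f(x, y) = (x^2, y^2, xy)
    x, y = u
    return (x*x, y*y, x*y)

k, u, v = 1, (1, 2), (1, 1)
ku_plus_v = (k*u[0] + v[0], k*u[1] + v[1])        # (2, 3)
lhs = f(ku_plus_v)                                 # f(2, 3)
rhs = tuple(k*a + b for a, b in zip(f(u), f(v)))   # 1*f(1,2) + f(1,1)
assert lhs == (4, 9, 6)
assert rhs == (2, 5, 3)
assert lhs != rhs   # so f is not a linear transformation
```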

Example. Let $f: \real^n \to \real^m$ be a function. f is differentiable at a point $a \in \real^n$ if there is a linear transformation $Df(a): \real^n \to \real^m$ such that

$$\lim_{h \to 0} \dfrac{|f(a + h) - f(a) - Df(a)(h)|}{|h|} = 0.$$

In this definition, if $v \in \real^m$ , then $|v|$ is the length of v:

$$|v| = |(v_1, v_2, \ldots, v_m)| = \sqrt{v_1^2 + v_2^2 + \cdots + v_m^2}.$$

Since f produces outputs in $\real^m$ , you can think of f as being built out of m component functions. Suppose that $f = (f_1, f_2, \ldots, f_m)$ .

It turns out that the matrix of $Df(a)$ (relative to the standard bases on $\real^n$ and $\real^m$ ) is the $m \times n$ matrix whose $(i, j)^{\rm th}$ entry is

$$D_jf_i = \dfrac{\partial f_i}{\partial x_j}.$$

This matrix is called the Jacobian matrix of f at a.

For example, suppose $f: \real^2 \to \real^2$ is given by

$$f(x, y) = (x^2 y^3, x^2 - y^5).$$


Then the Jacobian matrix is

$$Df(x, y) = \left[\matrix{ 2x y^3 & 3x^2 y^2 \cr 2x & -5y^4 \cr}\right].\quad\halmos$$
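As a sanity check on this Jacobian, one can compare the formula against central-difference approximations of the partial derivatives. This Python sketch is illustrative; the function names are my own:

```python
def f(x, y):
    # f(x, y) = (x^2 y^3, x^2 - y^5)
    return (x*x * y**3, x*x - y**5)

def jacobian_formula(x, y):
    # The Jacobian computed from the partial-derivative formulas.
    return [[2*x*y**3, 3*x*x*y*y],
            [2*x,      -5*y**4]]

def jacobian_numeric(x, y, h=1e-6):
    # Approximate each partial with a central difference.
    J = []
    for i in range(2):
        row = [(f(x + h, y)[i] - f(x - h, y)[i]) / (2*h),
               (f(x, y + h)[i] - f(x, y - h)[i]) / (2*h)]
        J.append(row)
    return J

x0, y0 = 1.5, -0.5
A = jacobian_formula(x0, y0)
B = jacobian_numeric(x0, y0)
for i in range(2):
    for j in range(2):
        assert abs(A[i][j] - B[i][j]) < 1e-4
```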

The next theorem gives an easy way of constructing --- or recognizing --- linear transformations.

Theorem. Let F be a field, and let A be an $n \times m$ matrix over F. The function $f: F^m \to F^n$ given by

$$f(u) = A\cdot u \quad\hbox{for}\quad u \in F^m$$

is a linear transformation.

Proof. This is pretty easy given the rules for matrix arithmetic. Let $u, v \in F^m$ and let $k \in F$ . Then

$$f(ku + v) = A\cdot (ku + v) = A\cdot (ku) + A\cdot v = k(A\cdot u) + A\cdot v = kf(u) + f(v).$$

Therefore, f is a linear transformation.

This result says that any function which is defined by matrix multiplication is a linear transformation. Later on, I'll show that for finite-dimensional vector spaces, any linear transformation can be thought of as multiplication by a matrix.

Example. Define $f: \real^2 \to \real^3$ by

$$f(x,y) = (x + 2y, x - y, -2x + 3y).$$

I'll show that f is a linear transformation the easy way. Observe that

$$f((x, y)) = \left[\matrix{1 & 2 \cr 1 & -1 \cr -2 & 3 \cr}\right] \left[\matrix{x \cr y \cr}\right].$$

f is given by multiplication by a matrix of numbers, exactly as in the theorem. ($(x, y)$ is taking the place of u.) So the theorem implies that f is a linear transformation.
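In code, "multiplication by a matrix" is just the usual row-times-column recipe. This sketch (with a hypothetical `matvec` helper) confirms that the matrix form agrees with the formula for f:

```python
def matvec(A, u):
    # Multiply the matrix A by the column vector u: each output entry
    # is a row of A dotted with u.
    return tuple(sum(a*x for a, x in zip(row, u)) for row in A)

A = [[ 1,  2],
     [ 1, -1],
     [-2,  3]]

def f(x, y):
    return (x + 2*y, x - y, -2*x + 3*y)

for (x, y) in ((3, -2), (0, 0), (1, 5)):
    assert matvec(A, (x, y)) == f(x, y)
```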

Lemma. Let F be a field, and let V and W be vector spaces over F. Suppose that $f: V \to W$ is a linear transformation. Then:

  1. $f(\vec{0}) = \vec{0}$ .
  2. $f(-v) = -f(v)$ for all $v \in V$ .

Proof. (1) Put $u = \vec{0}$ , $v = \vec{0}$ , and $k = 1$ in the defining equation for a linear transformation. Then

$$f(\vec{0} + \vec{0}) = f(\vec{0}) + f(\vec{0}), \quad\hbox{so}\quad f(\vec{0}) = f(\vec{0}) + f(\vec{0}).$$

Subtracting $f(\vec{0})$ from both sides, I get $f(\vec{0}) = \vec{0}$ .

(2) I know that $-v = (-1)\cdot v$ , so

$$f(-v) = f\left[(-1)\cdot v\right] = (-1)\cdot f(v) = -f(v). \quad\halmos$$

The lemma gives a quick way of showing a function is not a linear transformation.

Example. Define $g: \real^2 \to \real^2$ by

$$g(x,y) = (x + 1, y + 2).$$


Notice that

$$g(0,0) = (1, 2) \ne (0, 0).$$

Since g does not take the zero vector to the zero vector, it is not a linear transformation.

Be careful! If $f(\vec{0}) = \vec{0}$ , you can't conclude that f is a linear transformation. For example, I showed that the function $f(x,y) = (x^2, y^2, xy)$ is not a linear transformation from $\real^2$ to $\real^3$ . But $f(0,0) = (0, 0, 0)$ , so it does take the zero vector to the zero vector.

Next, I want to prove the result I mentioned earlier: Every linear transformation on a finite-dimensional vector space can be represented by matrix multiplication. I'll begin by reviewing some notation.

Definition. The standard basis vectors for $F^m$ are

$$e_1 = (1, 0, 0, \ldots, 0), \quad e_2 = (0, 1, 0, \ldots, 0), \quad \ldots \quad e_m = (0, 0, 0, \ldots, 1).$$

Thus, $e_i$ is an m-dimensional vector with a 1 in the $i^{\rm th}$ position and 0's elsewhere. For instance, in $\real^3$ ,

$$e_1 = (1, 0, 0), \quad e_2 = (0, 1, 0), \quad e_3 = (0, 0, 1).$$

I showed earlier that $\{e_1, e_2, \ldots, e_m\}$ is a basis for $F^m$ . This implies the following result.

Lemma. Every vector $u \in F^m$ can be written uniquely as

$$u = u_1e_1 + u_2e_2 + \cdots + u_me_m,$$

where $u_1, u_2, \ldots, u_m \in F$ .

Example. In $\real^3$ ,

$$(\pi, -5, 17) = \pi\cdot (1, 0, 0) + (-5)\cdot (0, 1, 0) + 17\cdot (0, 0, 1) = \pi\cdot e_1 + (-5)\cdot e_2 + 17\cdot e_3.\quad\halmos$$

Now here is the converse to the Theorem I proved earlier.

Theorem. If $f: F^m \rightarrow F^n$ is a linear transformation, there is an $n \times m$ matrix A such that

$$f(u) = A\cdot u \quad\hbox{for all}\quad u \in F^m.$$

Proof. Regard the standard basis vectors $e_1$ , $e_2$ , ..., $e_m$ as m-dimensional column vectors. Then $f(e_1)$ , $f(e_2)$ , ..., $f(e_m)$ are n-dimensional column vectors, because f produces outputs in $F^n$ . Take these m n-dimensional column vectors $f(e_1)$ , $f(e_2)$ , ..., $f(e_m)$ and build a matrix:

$$A = \left[\matrix{\uparrow & \uparrow & & \uparrow \cr f(e_1) & f(e_2) & \cdots & f(e_m) \cr \downarrow & \downarrow & & \downarrow \cr}\right].$$

I claim that A is the matrix I want. To see this, take a vector $u \in F^m$ and write it in component form:

$$u = (u_1, u_2, \ldots, u_m) = \left[\matrix{u_1 \cr u_2 \cr \vdots \cr u_m \cr}\right].$$

By the standard basis vector lemma above,

$$u = u_1e_1 + u_2e_2 + \cdots + u_me_m.$$

Then I can use the fact that f is a linear transformation --- so f of a sum is the sum of the f's, and constants (like the $u_i$ 's) can be pulled out --- to write

$$f(u) = f(u_1e_1 + u_2e_2 + \cdots + u_me_m) = u_1f(e_1) + u_2f(e_2) + \cdots + u_mf(e_m).$$

On the other hand,

$$A\cdot u = \left[\matrix{\uparrow & \uparrow & & \uparrow \cr f(e_1) & f(e_2) & \cdots & f(e_m) \cr \downarrow & \downarrow & & \downarrow \cr}\right] \left[\matrix{u_1 \cr u_2 \cr \vdots \cr u_m \cr}\right] = u_1f(e_1) + u_2f(e_2) + \cdots + u_mf(e_m).$$

To get the last equality, think about how matrix multiplication works.

Therefore, $f(u) = A\cdot u$ .

Linear transformations and matrices are not quite identical. If a linear transformation is like a person, then a matrix for the transformation is like a picture of the person --- the point being that there can be many different pictures of the same person. You get different "pictures" of a linear transformation by changing coordinates --- something I'll discuss later.

Example. Define $f: \real^2 \to \real^3$ by

$$f(x,y) = (x + 2y, x - y, -2x + 3y).$$

I already know that f is a linear transformation, and I found its matrix by inspection. Here's how it would work using the theorem. I feed the standard basis vectors into f:

$$f(e_1) = f(1,0) = (1,1,-2), \quad f(e_2) = f(0,1) = (2,-1,3).$$

I construct a matrix with these vectors as the columns:

$$A = \left[\matrix{1 & 2 \cr 1 & -1 \cr -2 & 3 \cr}\right].$$

It's the same matrix as I found by inspection.
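The recipe in the theorem --- evaluate f on the standard basis vectors and use the results as the columns --- is easy to carry out in code. A sketch (the helper names are mine, for illustration):

```python
def f(x, y):
    return (x + 2*y, x - y, -2*x + 3*y)

e1, e2 = (1, 0), (0, 1)
cols = [f(*e1), f(*e2)]            # (1, 1, -2) and (2, -1, 3)

# The rows of A come from transposing the list of columns.
A = [list(row) for row in zip(*cols)]
assert A == [[1, 2], [1, -1], [-2, 3]]

def matvec(A, u):
    # Matrix times column vector.
    return tuple(sum(a*x for a, x in zip(row, u)) for row in A)

# The matrix reproduces f on an arbitrary sample vector.
u = (3, -2)
assert matvec(A, u) == f(*u)
```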

You can combine linear transformations to obtain other linear transformations. First, I'll consider sums and scalar multiplication.

Definition. Let $f, g: V \to W$ be linear transformations of vector spaces over a field F, and let $k \in F$ .

  1. The sum $f + g$ of f and g is the function $V \to W$ which is defined by

$$(f + g)(v) = f(v) + g(v) \quad\hbox{for all}\quad v \in V.$$

  2. The product $k\cdot f$ of k and f is the function $V \to W$ which is defined by

$$(k\cdot f)(v) = k\cdot f(v) \quad\hbox{for all}\quad v \in V.$$

Lemma. Let $f, g: V \to W$ be linear transformations of vector spaces over a field F, and let $k \in F$ .

  1. $f + g$ is a linear transformation.
  2. $k\cdot f$ is a linear transformation.

Proof. I'll prove the first part by way of example and leave the proof of the second part to you.

Let $u, v \in V$ and let $k \in F$ . Then

$$(f + g)(ku + v) = f(ku + v) + g(ku + v) = [kf(u) + f(v)] + [kg(u) + g(v)] =$$

$$k[f(u) + g(u)] + [f(v) + g(v)] = k(f + g)(u) + (f + g)(v).$$

Therefore, $f + g$ is a linear transformation.

If $f: X \to Y$ and $g: Y \to Z$ are functions, the composite of f and g is

$$(g\circ f)(x) = g(f(x)).$$

$$\hbox{\epsfysize=1.75in \epsffile{lintrans1.eps}}$$

The $\circ$ between the g and f does not mean multiplication, but it's so easy to confuse that I'll usually just write "$g(f(x))$ " when I want to compose functions.

Note that things go from left to right in the picture, but that they go right to left in "$g(f(x))$ ". The effect is to do f first, then g.

Lemma. Let $f: U \to V$ and $g: V \to W$ be linear transformations of vector spaces over a field F. Then $g\circ f: U \to W$ is a linear transformation.

Proof. Let $u_1, u_2 \in U$ and let $k \in F$ . Then

$$(g\circ f)(ku_1 + u_2) = g(f(ku_1 + u_2)) = g(kf(u_1) + f(u_2)) = kg(f(u_1)) + g(f(u_2)) = k(g\circ f)(u_1) + (g\circ f)(u_2).$$

Hence, $g\circ f: U \to W$ is a linear transformation.

Now suppose $f: F^m \to F^n$ and $g: F^n \to F^p$ are linear transformations.

$$\matrix{& f & & g & & \cr F^m & \longrightarrow & F^n & \longrightarrow & F^p \cr}$$

f and g can be represented by matrices; I'll use $[f]$ for the matrix of f and $[g]$ for the matrix of g, so

$$f(u) = [f]\cdot u \quad\hbox{for all}\quad u \in F^m,$$

$$g(v) = [g]\cdot v \quad\hbox{for all}\quad v \in F^n.$$

What's the matrix for the composite $g\circ f$ ?

$$g(f(u)) = [g]\cdot (f(u)) = [g]\cdot ([f]\cdot u) = ([g]\cdot [f])\cdot u.$$

The matrix for $g\circ f$ is $[g]\cdot [f]$ , the product of the matrices for f and g.

Example. Suppose

$$f(x,y) = (x + y, 2x + 5y, -3y) \quad\hbox{and}\quad g(u,v,w) = (u + v - w, 2u + 3w, -u - v + 6w)$$

are linear transformations. Thus, $f: \real^2 \to \real^3$ and $g: \real^3 \to \real^3$ .

Write them in matrix form:

$$f\left(\left[\matrix{x \cr y \cr}\right]\right) = \left[\matrix{1 & 1 \cr 2 & 5 \cr 0 & -3 \cr}\right] \left[\matrix{x \cr y \cr}\right], \quad g\left(\left[\matrix{u \cr v \cr w \cr}\right]\right) = \left[\matrix{1 & 1 & -1 \cr 2 & 0 & 3 \cr -1 & -1 & 6 \cr}\right] \left[\matrix{u \cr v \cr w \cr}\right].$$

The matrix for the composite transformation $g\circ f$ is

$$\left[\matrix{1 & 1 & -1 \cr 2 & 0 & 3 \cr -1 & -1 & 6 \cr}\right] \left[\matrix{1 & 1 \cr 2 & 5 \cr 0 & -3 \cr}\right] = \left[\matrix{3 & 9 \cr 2 & -7 \cr -3 & -24 \cr}\right].$$

To write it out in equation form, multiply:

$$(g\circ f)\left(\left[\matrix{x \cr y \cr}\right]\right) = \left[\matrix{3 & 9 \cr 2 & -7 \cr -3 & -24 \cr}\right] \left[\matrix{x \cr y \cr}\right] = \left[\matrix{3x + 9y \cr 2x - 7y \cr -3x - 24y \cr}\right].$$

That is,

$$(g\circ f)(x,y) = (3x + 9y, 2x - 7y, -3x - 24y).\quad\halmos$$
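The whole example can be checked by machine: multiply the matrices, then compare $g(f(x,y))$ computed directly with the product matrix applied to $(x, y)$ . A Python sketch, with hypothetical helpers `matmul` and `matvec`:

```python
def matmul(A, B):
    # Matrix product: the (i, j) entry is row i of A dotted with column j of B.
    return [[sum(A[i][k]*B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, u):
    return tuple(sum(a*x for a, x in zip(row, u)) for row in A)

F = [[1, 1], [2, 5], [0, -3]]                 # matrix of f
G = [[1, 1, -1], [2, 0, 3], [-1, -1, 6]]      # matrix of g

GF = matmul(G, F)
assert GF == [[3, 9], [2, -7], [-3, -24]]

def f(x, y): return (x + y, 2*x + 5*y, -3*y)
def g(u, v, w): return (u + v - w, 2*u + 3*w, -u - v + 6*w)

# The product matrix and the composite function agree on a sample point.
x, y = 2, -1
assert g(*f(x, y)) == matvec(GF, (x, y))
```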

Example. The idea of composing transformations can be extended to affine transformations. For the sake of this example, you can think of an affine transformation as a linear transformation plus a translation (a constant). This provides a powerful way of doing various geometric constructions.

For example, I wanted to write a program to generate self-similar fractals. There are many self-similar fractals, but I was interested in those that can be constructed in the following way. Start with an initiator, which is a collection of segments in the plane. For example, this initiator is a square:

$$\hbox{\epsfysize=1in \epsffile{lintrans2.eps}}$$

Next, I need a generator. It's a collection of segments which start at the point $(0,0)$ and end at the point $(1,0)$ . Here's a generator shaped like a square hump:

$$\hbox{\epsfysize=1in \epsffile{lintrans3.eps}}$$

The construction proceeds in stages. Start with the initiator and replace each segment with a scaled copy of the generator. (There is an issue here of which way you "flip" the generator when you copy it, but I'll ignore this for simplicity.) Here's what you get by replacing the segments of the square with copies of the square hump:

$$\hbox{\epsfysize=1.5in \epsffile{lintrans4.eps}}$$

Now keep going. Take the current figure and replace all of its segments with copies of the generator. And so on. Here's what you get after around 4 steps:

$$\hbox{\epsfysize=1.5in \epsffile{lintrans5.eps}}$$

Roughly, self-similarity means that if you enlarge a piece of the figure, the enlarged piece looks like the original. If you imagine carrying out infinitely many steps of the construction above, you'd get a figure which would look "the same" no matter how much you enlarged it --- which is a very crude definition of a fractal. If you're interested in this stuff, you should look at Benoit Mandelbrot's classic book ([1]) --- it has great pictures!

What does this have to do with transformations? The idea is that to replace a segment of the current figure with a scaled copy of the generator, you need to stretch the generator, rotate it, then translate it. Here's a picture with a different initiator and generator:

$$\hbox{\epsfysize=1.75in \epsffile{lintrans6.eps}}$$

Stretching by a factor of k amounts to multiplying by the matrix

$$\left[\matrix{k & 0 \cr 0 & k \cr}\right].$$

This is a linear transformation.

Rotation through an angle $\theta$ is accomplished by multiplying by

$$\left[\matrix{\cos \theta & -\sin \theta \cr \sin \theta & \cos \theta \cr}\right].$$

This is also a linear transformation.

Finally, translation by a vector $(a,b)$ amounts to adding the vector $(a,b)$ . This is not a linear transformation (unless the vector is zero).

Thus, to stretch by k, rotate through $\theta$ , and then translate by $(a,b)$ , you should do this:

$$\left[\matrix{a \cr b \cr}\right] + \left[\matrix{\cos \theta & -\sin \theta \cr \sin \theta & \cos \theta \cr}\right] \left[\matrix{k & 0 \cr 0 & k \cr}\right] \left[\matrix{x \cr y \cr}\right].$$

By thinking of the operations as linear or affine transformations, it is very easy to write down the formula.
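Here's the stretch-rotate-translate formula as a small Python function (a sketch; `affine` is my own name, and I use the standard counterclockwise rotation matrix):

```python
import math

def affine(k, theta, a, b, x, y):
    # Stretch (x, y) by k, rotate counterclockwise through theta,
    # then translate by (a, b).
    sx, sy = k*x, k*y
    rx = math.cos(theta)*sx - math.sin(theta)*sy
    ry = math.sin(theta)*sx + math.cos(theta)*sy
    return (rx + a, ry + b)

# Stretching (1, 0) by 2 gives (2, 0); rotating 90 degrees gives (0, 2);
# translating by (5, 1) lands at (5, 3).
px, py = affine(2, math.pi/2, 5, 1, 1, 0)
assert abs(px - 5) < 1e-9 and abs(py - 3) < 1e-9
```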

If $f: V \to W$ is an invertible linear transformation, the inverse of f is the linear transformation $f^{-1}: W \to V$ which satisfies

$$(f^{-1}\circ f)(v) = v \quad\hbox{for all}\quad v \in V,$$

$$(f\circ f^{-1})(w) = w \quad\hbox{for all}\quad w \in W.$$

That is, $f^{-1}$ is a linear transformation which "undoes" the effect of f. If f can be represented by matrix multiplication, it's natural to suppose that if A is the matrix of f, then $A^{-1}$ is the matrix of $f^{-1}$ --- and it's true.

Lemma. If A and B are $n \times n$ matrices and

$$ABu = u \quad\hbox{and}\quad BAu = u \quad\hbox{for all}\quad u \in F^n,$$

then A and B are inverses --- that is, $B = A^{-1}$ .

Proof. Since $ABu = u$ for any vector u,

$$AB\left[\matrix{1 \cr 0 \cr \vdots \cr 0 \cr}\right] = \left[\matrix{1 \cr 0 \cr \vdots \cr 0 \cr}\right],$$

$$AB\left[\matrix{0 \cr 1 \cr \vdots \cr 0 \cr}\right] = \left[\matrix{0 \cr 1 \cr \vdots \cr 0 \cr}\right],$$

$$\vdots$$

$$AB\left[\matrix{0 \cr 0 \cr \vdots \cr 1 \cr}\right] = \left[\matrix{0 \cr 0 \cr \vdots \cr 1 \cr}\right].$$

Put the vectors in the equations above into the columns of a matrix. They form the identity:

$$AB\left[\matrix{1 & 0 & \cdots & 0 \cr 0 & 1 & \cdots & 0 \cr \vdots & \vdots & & \vdots \cr 0 & 0 & \cdots & 1 \cr}\right] = \left[\matrix{1 & 0 & \cdots & 0 \cr 0 & 1 & \cdots & 0 \cr \vdots & \vdots & & \vdots \cr 0 & 0 & \cdots & 1 \cr}\right], \quad\hbox{or}\quad ABI = I, \quad\hbox{so}\quad AB = I.$$

The same reasoning shows $BA = I$ , so A and B are inverses.

Proposition. If $f: F^n \to F^n$ is a linear transformation whose matrix is A and $f^{-1}$ is the inverse of f, then the matrix of $f^{-1}$ is $A^{-1}$ .

Remark. If I use $[f]$ to denote the matrix of the linear transformation f, this result can be expressed more concisely as

$$\left[f^{-1}\right] = [f]^{-1}.$$

Proof. Let A be the matrix of f and let B be the matrix of $f^{-1}$ . Since f and $f^{-1}$ are inverses, for all $u \in F^n$ ,

$$f^{-1}(f(u)) = u, \quad\hbox{so}\quad BAu = u.$$

Similarly, $f(f^{-1}(u)) = u$ gives $ABu = u$ for all $u \in F^n$ . By the lemma, A and B are inverses, which is what I wanted to prove.

Of course, a linear transformation may not be invertible, and the last result gives an easy test --- a linear transformation is invertible if and only if its matrix is invertible. You should know almost half a dozen conditions which are equivalent to the invertibility of a matrix.

Example. Suppose $f: \real^2 \to \real^2$ is given by

$$f(x,y) = (x + 3y, x + 4y).$$

The matrix of f is

$$\left[\matrix{1 & 3 \cr 1 & 4 \cr}\right].$$

Therefore, the matrix of $f^{-1}$ is

$$\left[\matrix{4 & -3 \cr -1 & 1 \cr}\right].$$

Hence, the inverse transformation is

$$f^{-1}(u,v) = (4u - 3v, -u + v).$$

So, for example,

$$f^{-1}(f(x,y)) = f^{-1}(x + 3y, x + 4y) = \left(4(x + 3y) - 3(x + 4y), -(x + 3y) + (x + 4y)\right) = (x,y).$$

You can check that $f(f^{-1}(u,v)) = (u,v)$ as well.

Example. Let $\real_2[x]$ denote the set of polynomials with real coefficients of degree 2 or less. Thus,

$$\real_2[x] = \{ax^2 + bx + c \mid a, b, c \in \real\}.$$

Here are some elements of $\real_2[x]$ :

$$x^2 + 3x + 2, \quad -7x^2, \quad 0, \quad 42x - 5.$$

You can represent polynomials in $\real_2[x]$ by vectors. For example, here is how to represent $x^2 + 3x + 2$ as a vector:

$$\matrix{(& & 2 & , & 3 & , & 1 & ) \cr & & \uparrow & & \uparrow & & \uparrow & \cr & & 2 & + & 3x & + & x^2 & \cr}$$

I'm writing the coefficients with the powers increasing to make it easy to extend this to higher powers. For example,

$$\matrix{(& & 7 & , & -1 & , & 4 & , & 5 & ) \cr & & \uparrow & & \uparrow & & \uparrow & & \uparrow & \cr & & 7 & - & x & + & 4x^2 & + & 5x^3 \cr}$$

Returning to $\real_2[x]$ , there is a function $D: \real_2[x] \to \real_2[x]$ defined by

$$D((c, b, a)) = (b, 2a, 0).$$

Do you see what it is?

If I switch back to polynomial notation, it is

$$D(c + bx + ax^2) = b + 2ax.$$

Of course, D is just differentiation.

Now you know from calculus that the derivative of a sum is the sum of the derivatives, and that a constant can be pulled out of a derivative:

$$D(p + q) = D(p) + D(q) \quad\hbox{and}\quad D(kp) = kD(p).$$

This means that D is a linear transformation $\real_2[x] \to \real_2[x]$ . What is its matrix?

To find the matrix of a linear transformation (relative to the standard basis), apply the transformation to the standard basis vectors. Use the results as the columns of your matrix.

In vector form, $\real_2[x]$ is just $\real^3$ , so the standard basis vectors are

$$e_1 = (1,0,0), \quad e_2 = (0,1,0), \quad e_3 = (0,0,1).$$

As I apply D, I'll translate to polynomial notation to make it easier for you to follow:

$$D((1,0,0)) = D(1) = 0 = (0,0,0).$$

$$D((0,1,0)) = D(x) = 1 = (1,0,0).$$

$$D((0,0,1)) = D(x^2) = 2x = (0,2,0).$$

Therefore, the matrix of D is

$$\left[\matrix{0 & 1 & 0 \cr 0 & 0 & 2 \cr 0 & 0 & 0 \cr}\right]. \quad\halmos$$
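You can confirm that this matrix really differentiates by applying it to a coefficient vector. A sketch (the `matvec` helper is mine):

```python
def matvec(A, u):
    return tuple(sum(a*x for a, x in zip(row, u)) for row in A)

# Matrix of D relative to the basis (1, x, x^2); coefficient vectors
# are written with powers increasing.
D = [[0, 1, 0],
     [0, 0, 2],
     [0, 0, 0]]

p = (2, 3, 1)                       # the polynomial 2 + 3x + x^2
assert matvec(D, p) == (3, 2, 0)    # its derivative, 3 + 2x
```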

Example. Construct a linear transformation which maps the unit square $0 \le x \le 1$ , $0 \le y \le 1$ to the parallelogram in $\real^3$ determined by the vectors $(2,-3,6)$ and $(1,-1,1)$ .

The idea is to send the vectors for the square's sides --- namely, $(1,0)$ and $(0,1)$ --- to the target vectors $(2,-3,6)$ and $(1,-1,1)$ . If I do this with a linear transformation, the rest of the square will go with them.

$$\hbox{\epsfysize=1.75in \epsffile{lintrans7.eps}}$$

The following matrix does it:

$$\left[\matrix{2 & 1 \cr -3 & -1 \cr 6 & 1 \cr}\right] \left[\matrix{1 \cr 0 \cr}\right] = \left[\matrix{2 \cr -3 \cr 6 \cr}\right], \quad \left[\matrix{2 & 1 \cr -3 & -1 \cr 6 & 1 \cr}\right] \left[\matrix{0 \cr 1 \cr}\right] = \left[\matrix{1 \cr -1 \cr 1 \cr}\right].$$

Therefore, the transformation is

$$f\left(\left[\matrix{x \cr y \cr}\right]\right) = \left[\matrix{2 & 1 \cr -3 & -1 \cr 6 & 1 \cr}\right] \left[\matrix{x \cr y \cr}\right].\quad\halmos$$
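As a final check, the four corners of the unit square should land on the corners of the parallelogram. A quick Python sketch (with a hypothetical `matvec` helper):

```python
def matvec(A, u):
    return tuple(sum(a*x for a, x in zip(row, u)) for row in A)

A = [[ 2,  1],
     [-3, -1],
     [ 6,  1]]

# Corners of the unit square map to the parallelogram's corners.
assert matvec(A, (0, 0)) == (0, 0, 0)
assert matvec(A, (1, 0)) == (2, -3, 6)
assert matvec(A, (0, 1)) == (1, -1, 1)
assert matvec(A, (1, 1)) == (3, -4, 7)   # the sum of the two side vectors
```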

  1. Benoit Mandelbrot, The fractal geometry of nature. New York: W. H. Freeman and Company, 1983. [ISBN 0-7167-1186-9]


Copyright 2008 by Bruce Ikenaga