Derivatives

Suppose we have a function of several variables $f: \real^n \to \real^m$ --- a function with possibly multiple inputs and multiple outputs. What should the derivative of such a function be?

We've seen that the partial derivatives of a function $z = f(x, y)$ give the rates of change of f in the x- and y-directions. This is a partial answer to our question. However, what do we do if we have a function like

$$(u, v) = f(x, y, z) = (2 x y - 3 z, y z - 3 x z)?$$

A naive approach might be to write the function in component form:

$$u = 2 x y - 3 z, \quad v = yz - 3 x z.$$

Now we can take partial derivatives:

$$\eqalign{ \pder u x = 2 y, \quad \pder u y & = 2 x, \quad \pder u z = -3,\cr \pder v x = -3 z, \quad \pder v y & = z, \quad \pder v z = y - 3 x.\cr}$$

Well, we have 6 partial derivatives --- do we want to call all of these (collectively) the "derivative" of f? This seems somewhat ad hoc and unsatisfying.

You might think of arranging these partial derivatives in a matrix:

$$\left[\matrix{ \pder u x = 2 y & \pder u y = 2 x & \pder u z = -3 \cr \pder v x = -3 z & \pder v y = z & \pder v z = y - 3 x \cr}\right].$$

Now we have a single entity, and it is reasonable to say that it contains information about the rate(s) of change of f. However, this still seems a bit arbitrary --- for example, why put the u derivatives in the first row (as opposed to, say, the first column)?

Actually, the matrix we've written down above will turn out to be the derivative of f. In order to give a rationale for this, we have to think about derivatives in a conceptual way.

You probably encountered the derivative of a function $y = f(x)$ as the slope of the tangent line to a curve. What is the tangent line, in a conceptual sense? It's the best "flat" approximation to the curve near the point of tangency. That is, near the point of tangency, the tangent and the curve are very close.

We could generalize this to a function $z = f(x, y)$ by considering the tangent plane to the graph, which is a surface in three dimensions. And though it's difficult to picture, we can imagine continuing this procedure to higher dimensions. A line in the plane looks like this:

$$a x + b y = c.$$

A plane in space looks like this:

$$a x + b y + c z = d.$$

With more input variables for f, we just keep adding more variables to the equation.

These equations are linear, which means roughly that the variables occur only to the first power, multiplied by constants. Equations of this kind describe things that are "flat".

With multiple inputs and multiple outputs, an equation for a "flat" thing involves a matrix. For example, here's a linear function with 3 inputs and 2 outputs:

$$\left[\matrix{u \cr v \cr}\right] = \left[\matrix{1 & 3 & -8 \cr 4 & 0 & 3 \cr}\right] \left[\matrix{x \cr y \cr z \cr}\right].$$

Written out, this would be

$$u = x + 3 y - 8 z, \quad v = 4 x + 3 z.$$

Here's the general definition.

Definition. A function $T: \real^n \to \real^m$ is a linear transformation if:

(a) $T(x + y) = T(x) + T(y)$ for all $x, y \in \real^n$ .

(b) $T(c \cdot x) = c \cdot T(x)$ for all $c \in \real$ and $x \in
   \real^n$ .
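The two conditions can be tried out on the matrix example above. The following Python sketch is only an illustration: the helper name `T`, the sample vectors, and the scalar are made up for this check, and integer inputs are used so that the comparisons are exact. Checking a few vectors is of course not a proof.

```python
# The 2 x 3 matrix from the example above.
A = [[1, 3, -8],
     [4, 0, 3]]

def T(vec):
    """Multiply the matrix A by a vector with 3 entries, giving 2 entries."""
    return [sum(a * x for a, x in zip(row, vec)) for row in A]

# Arbitrary sample data (assumptions, not taken from the text).
x = [2, -1, 5]
y = [0, 4, 1]
c = 7

x_plus_y = [a + b for a, b in zip(x, y)]
c_times_x = [c * a for a in x]

# (a) additivity: T(x + y) equals T(x) + T(y)
additive = T(x_plus_y) == [a + b for a, b in zip(T(x), T(y))]

# (b) homogeneity: T(c x) equals c T(x)
homogeneous = T(c_times_x) == [c * a for a in T(x)]

print(T(x), additive, homogeneous)
```

Both comparisons come out true for any integer vectors you substitute, because matrix multiplication distributes over addition and commutes with scalar multiplication.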

You've seen things which have these properties --- derivatives and antiderivatives, for example.

A linear transformation is a function that behaves like the linear functions you know: $y = m x$ or $z = a x + b y$ , for instance. The derivative of a function will be the linear transformation that best approximates the function. Here's the definition.

Definition. Let $f: U \to \real^m$ be a function, where U is an open set in $\real^n$ , and let $x \in U$ . The derivative of f at x is a linear transformation $Df:
   \real^n \to \real^m$ which satisfies

$$\lim_{\|h\| \to 0} \dfrac{\|f(x + h) - f(x) - Df(h)\|}{\|h\|} = 0.$$

People sometimes write "$Df(x)(h)$" or "$Df_x(h)$" to indicate the dependence of $Df$ on x. It is also common to see "$df$" instead of "$Df$".

Notice that this resembles our old difference quotient definition for derivatives:

$$f'(x) = \lim_{h \to 0} \dfrac{f(x + h) - f(x)}{h}.$$

Why do we have $\|\cdot\|$'s in the equation? The idea is that h, $f(x + h)$, and $f(x)$ are vectors. h is an element of $\real^n$, so h is really $(h_1, \ldots, h_n)$. And f is a function which produces m numbers as its output, so $f(x + h)$ and $f(x)$ are elements of $\real^m$ and look like $(y_1, \ldots, y_m)$. We can't take the quotient of two vectors, but we can take the quotient of their lengths.
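To see the definition in action, the Python sketch below evaluates the quotient for the function $f(x, y, z) = (2 x y - 3 z, y z - 3 x z)$ from the beginning of the section, using the matrix of partial derivatives as the candidate for $Df$. The base point, the direction of h, and the helper names are assumptions chosen for illustration.

```python
import math

def f(p):
    x, y, z = p
    return (2 * x * y - 3 * z, y * z - 3 * x * z)

def Df(p, h):
    """Apply the 2 x 3 matrix of partial derivatives at p to the vector h."""
    x, y, z = p
    rows = [(2 * y, 2 * x, -3),
            (-3 * z, z, y - 3 * x)]
    return tuple(sum(a * b for a, b in zip(row, h)) for row in rows)

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(a * a for a in v))

def quotient(p, h):
    """||f(p + h) - f(p) - Df(h)|| / ||h||, the quantity in the definition."""
    shifted = tuple(a + b for a, b in zip(p, h))
    error = tuple(s - b - l for s, b, l in zip(f(shifted), f(p), Df(p, h)))
    return norm(error) / norm(h)

p = (1.0, 2.0, 3.0)            # sample base point (an assumption)
direction = (1.0, -2.0, 0.5)   # sample direction for h (an assumption)

# Shrink h along the chosen direction and record the quotient each time.
ratios = [quotient(p, tuple(t * d for d in direction))
          for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

The quotients shrink roughly in proportion to $\|h\|$, which is what the limit requires. A single direction does not establish the limit, but it shows what the definition is measuring.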

As with the difference quotient definition for the derivative $f'(x)$ of a function $y = f(x)$, the definition we've given is not so great for actually computing derivatives. The following result makes it a lot easier.

Theorem. Suppose $f: \real^n \to \real^m$ is differentiable at $c \in \real^n$. Write $x = (x_1, x_2, \ldots, x_n)$ and $f = (f_1, f_2, \ldots, f_m)$, and let $h \in \real^n$. Then, with all partial derivatives evaluated at c,

$$Df(h) = \left[\matrix{ \pder {f_1} {x_1} & \pder {f_1} {x_2} & \cdots & \pder {f_1} {x_n} \cr \pder {f_2} {x_1} & \pder {f_2} {x_2} & \cdots & \pder {f_2} {x_n} \cr & & \vdots & \cr \pder {f_m} {x_1} & \pder {f_m} {x_2} & \cdots & \pder {f_m} {x_n} \cr}\right] \cdot h.$$

In other words, $Df$ is represented by the matrix of partial derivatives of the components of f with respect to the input variables.

The converse isn't true. If the partial derivatives of a function exist, the function might fail to be differentiable (in the sense that no linear transformation $Df$ makes the limit above equal to 0). On the other hand, we have the following result.

Theorem. Suppose $f: \real^n \to \real^m$. Write $x = (x_1, x_2, \ldots, x_n)$ and $f = (f_1, f_2, \ldots, f_m)$. If the partial derivatives $\pder {f_i} {x_j}$ exist near c and are continuous at $c \in \real^n$ for $i = 1, \ldots, m$ and $j = 1, \ldots, n$, then f is differentiable at c.

In this case, by the preceding theorem, $Df$ is represented by the matrix of partial derivatives.

Remark. A function with continuous partial derivatives is called continuously differentiable.
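To illustrate the earlier point that existence of the partial derivatives does not guarantee differentiability, here is a standard example (not taken from the text above): $g(x, y) = \dfrac{x y}{x^2 + y^2}$ with $g(0, 0) = 0$. Since g vanishes along both axes, both partial derivatives at the origin exist and equal 0, so the only candidate for the derivative there is the zero transformation. The Python sketch below, with made-up helper names, evaluates the quotient from the definition along the diagonal $h = (t, t)$.

```python
import math

def g(x, y):
    """g(x, y) = xy / (x^2 + y^2) away from the origin, and g(0, 0) = 0."""
    if x == 0 and y == 0:
        return 0.0
    return x * y / (x * x + y * y)

def partial_x_at_origin(t):
    """Difference quotient (g(t, 0) - g(0, 0)) / t; g is 0 on the x-axis."""
    return (g(t, 0.0) - g(0.0, 0.0)) / t

def partial_y_at_origin(t):
    """Difference quotient (g(0, t) - g(0, 0)) / t; g is 0 on the y-axis."""
    return (g(0.0, t) - g(0.0, 0.0)) / t

def quotient_on_diagonal(t):
    """|g(t, t) - g(0, 0) - 0| / ||(t, t)||, with the zero map as candidate."""
    return abs(g(t, t) - g(0.0, 0.0)) / math.hypot(t, t)

steps = (1e-1, 1e-2, 1e-3)
partials = [(partial_x_at_origin(t), partial_y_at_origin(t)) for t in steps]
quotients = [quotient_on_diagonal(t) for t in steps]
print(partials)
print(quotients)
```

The difference quotients for the partial derivatives are all 0, while the quotient from the definition grows as t shrinks, because $g(t, t) = \dfrac{1}{2}$ for every $t \ne 0$. So the limit in the definition is not 0, and g is not differentiable at the origin.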


Example. Compute $Df$ for

$$(u, v) = f(x, y, z) = (7 x y + z^2, x + 3 y - z^4).$$

Since f has 3 inputs and 2 outputs, the matrix for $Df$ will be $2 \times 3$ :

$$Df = \left[\matrix{ 7 y & 7 x & 2 z \cr 1 & 3 & -4 z^3 \cr}\right].\quad\halmos$$
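A matrix like this can be sanity-checked numerically. The sketch below compares the matrix from the example with central-difference estimates of the partial derivatives at one sample point. The function names, the sample point, and the step size are assumptions made for illustration.

```python
def f(x, y, z):
    """The function from the example."""
    return (7 * x * y + z ** 2, x + 3 * y - z ** 4)

def Df(x, y, z):
    """The 2 x 3 matrix found in the example."""
    return [[7 * y, 7 * x, 2 * z],
            [1, 3, -4 * z ** 3]]

def numeric_jacobian(func, point, eps=1e-5):
    """Central differences; entry [i][j] estimates d(output i)/d(input j)."""
    n = len(point)
    m = len(func(*point))
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus = list(point)
        minus = list(point)
        plus[j] += eps
        minus[j] -= eps
        f_plus, f_minus = func(*plus), func(*minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac

point = (1.0, 2.0, -1.0)   # sample point (an assumption)
exact = Df(*point)
approx = numeric_jacobian(f, point)

# Largest entrywise gap between the hand computation and the estimate.
max_error = max(abs(exact[i][j] - approx[i][j])
                for i in range(2) for j in range(3))
print(exact)
print(max_error)
```

Rows correspond to outputs and columns to inputs, matching the $2 \times 3$ shape in the example. A small gap at one point does not prove the formula, but a large gap would flag a slip in the hand computation.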


Example. Compute $Df$ for

$$(x, y, z) = f(s, t) = (s^2 - t^2, 3 st, s^3 t^3).$$

We have 2 inputs and 3 outputs, so $Df$ will be a $3 \times 2$ matrix:

$$Df =\left[\matrix{ 2 s & -2 t \cr 3 t & 3 s \cr 3 s^2 t^3 & 3 s^3 t^2 \cr}\right].\quad\halmos$$


Example. Compute $Df$ for

$$y = \tan x.$$

$$Df = \left[(\sec x)^2\right].$$

You can see this is the ordinary derivative of a function of 1 variable. We drop the matrix brackets and just write $Df = (\sec x)^2$, or $y' = (\sec x)^2$ as usual.


Definition. Let $f: U \to \real$, where $U \subset \real^n$. The derivative $Df$ is called the gradient of f, and is denoted $\nabla f$.

Example. Find the gradient (the derivative) of

$$z = f(x, y) = x^2 - xy - y^2.$$

$$\nabla f = \left[\matrix{2 x - y & -x - 2 y \cr}\right].$$

Notice that the gradient is a row vector --- that is, a $1 \times n$ matrix.
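For this example, the limit in the definition of the derivative can be checked by hand. Write $h = (h_1, h_2)$ (notation introduced only for this check), and let $\nabla f \cdot h$ denote the row vector $\nabla f$ times the column vector h. Expanding $f(x + h_1, y + h_2)$ and cancelling gives

$$f(x + h_1, y + h_2) - f(x, y) - \nabla f \cdot h = h_1^2 - h_1 h_2 - h_2^2.$$

Since $|h_1 h_2| \le \dfrac{1}{2}(h_1^2 + h_2^2)$, the right side is at most $\dfrac{3}{2} \|h\|^2$ in absolute value. Therefore,

$$\dfrac{|f(x + h_1, y + h_2) - f(x, y) - \nabla f \cdot h|}{\|h\|} \le \dfrac{3}{2} \|h\| \to 0 \quad\hbox{as}\quad \|h\| \to 0.$$

So the row vector found above does satisfy the definition of the derivative.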


Example. Find the gradient (the derivative) of

$$w = f(x, y, z) = \dfrac{x + 3 y}{z}.$$

$$\nabla f = \left[\matrix{ \dfrac{1}{z} & \dfrac{3}{z} & -\dfrac{x + 3 y}{z^2} \cr}\right].\quad\halmos$$


Since you'd expect that the best linear approximation to a linear function is the linear function, and since the derivative of f is the best linear approximation to f, you'd expect the derivative of a linear function to be the linear function.

Proposition. Let $f: \real^n \to \real^m$ be a linear function. Then f is differentiable, and $Df = f$ .

Proof. Since f is linear, $f(x + h) = f(x) + f(h)$ . Therefore,

$$\lim_{\|h\| \to 0} \dfrac{\|f(x + h) - f(x) - f(h)\|}{\|h\|} = \lim_{\|h\| \to 0} \dfrac{\|f(x) + f(h) - f(x) - f(h)\|}{\|h\|} = \lim_{\|h\| \to 0} \dfrac{0}{\|h\|} = 0.$$

This shows that $Df = f$ .

Corollary. (a) The function $s: \real^n \times \real^n \to \real^n$ given by $s(x, y) = x + y$ is differentiable.

Note: $\real^n \times \real^n$ consists of pairs $(x, y)$ where $x, y \in
   \real^n$ .

(b) For $c \in \real$, the function $p: \real^n \to \real^n$ given by $p(x) = c \cdot x$ is differentiable.

Proof. Both s and p are linear functions, so (a) and (b) follow from the proposition.

Example. The following function $f: \real^2 \to \real^3$ is linear:

$$f(x, y) = (2 x + 3 y, 5 x - y, 7 x + 10 y).$$

Verify that it's equal to its derivative.

Computing the partial derivatives directly,

$$Df = \left[\matrix{ 2 & 3 \cr 5 & -1 \cr 7 & 10 \cr}\right].$$

Writing f as a matrix multiplication, I have

$$f\left(\left[\matrix{x \cr y \cr}\right]\right) = \left[\matrix{ 2 & 3 \cr 5 & -1 \cr 7 & 10 \cr}\right] \left[\matrix{x \cr y \cr}\right].$$

You can see that the matrices are the same.


Here are some other standard and unsurprising results about derivatives. You can prove them using the preceding corollary and the Chain Rule, which I'll discuss later.

Theorem. Let $U \subset \real^n$, let $f, g: U \to \real^m$, let $a \in U$, and let $c \in \real$. Suppose f and g are differentiable at a. Then:

(a) $f + g$ is differentiable at a, and

$$D(f + g) = Df + Dg.$$

(b) $c \cdot f$ is differentiable at a, and

$$D(c \cdot f) = c \cdot Df.\quad\halmos$$
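Both rules can be tried out on two functions that appeared earlier in this section, $(2 x y - 3 z, y z - 3 x z)$ and $(7 x y + z^2, x + 3 y - z^4)$. The Python sketch below compares central-difference estimates for the sum and for a scalar multiple with the corresponding combinations of the hand-computed matrices. The sample point, the scalar, and all helper names are assumptions chosen for illustration.

```python
def f(x, y, z):
    return (2 * x * y - 3 * z, y * z - 3 * x * z)

def g(x, y, z):
    return (7 * x * y + z ** 2, x + 3 * y - z ** 4)

def Df(x, y, z):
    """Matrix of partial derivatives of f, computed by hand."""
    return [[2 * y, 2 * x, -3], [-3 * z, z, y - 3 * x]]

def Dg(x, y, z):
    """Matrix of partial derivatives of g, computed by hand."""
    return [[7 * y, 7 * x, 2 * z], [1, 3, -4 * z ** 3]]

def numeric_jacobian(func, point, eps=1e-5):
    """Central differences; entry [i][j] estimates d(output i)/d(input j)."""
    m = len(func(*point))
    jac = [[0.0] * len(point) for _ in range(m)]
    for j in range(len(point)):
        plus, minus = list(point), list(point)
        plus[j] += eps
        minus[j] -= eps
        f_plus, f_minus = func(*plus), func(*minus)
        for i in range(m):
            jac[i][j] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac

def max_gap(a, b):
    """Largest entrywise difference between two matrices of the same shape."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))

point = (1.0, 2.0, -1.0)   # sample point (an assumption)
c = 5.0                    # sample scalar (an assumption)

def f_plus_g(x, y, z):
    return tuple(u + v for u, v in zip(f(x, y, z), g(x, y, z)))

def c_times_f(x, y, z):
    return tuple(c * u for u in f(x, y, z))

# Right-hand sides of the two rules, built from the hand-computed matrices.
sum_matrix = [[a + b for a, b in zip(ra, rb)]
              for ra, rb in zip(Df(*point), Dg(*point))]
scaled_matrix = [[c * a for a in row] for row in Df(*point)]

sum_error = max_gap(numeric_jacobian(f_plus_g, point), sum_matrix)
scaled_error = max_gap(numeric_jacobian(c_times_f, point), scaled_matrix)
print(sum_error, scaled_error)
```

Both gaps should be tiny compared with the matrix entries, which is consistent with the two rules at this point.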



Copyright 2018 by Bruce Ikenaga