Suppose we have a function of several variables --- a function with possibly multiple
inputs and multiple outputs. What should the *derivative* of such a function be?

We've seen that the partial derivatives of a function f(x, y) give the rates of change of f in the x- and y-directions. This is a partial answer to our question. However, what do we do if we have a function like

A naive approach might be to write the function in component form:

Now we can take partial derivatives:

Well, we have 6 partial derivatives --- do we want to call all of
these (collectively) the "derivative" of f? This seems
somewhat *ad hoc* and unsatisfying.

You might think of arranging these partial derivatives in a matrix:

Now we have a single entity, and it is reasonable to say that it contains information about the rate(s) of change of f. However, this still seems a bit arbitrary --- for example, why put the u derivatives in the first row (as opposed to, say, the first column)?

Actually, the matrix we've written down above *will* turn out
to be *the* derivative of f. In order to give a rationale for
this, we have to think about derivatives in a conceptual way.
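As a concrete illustration, here is how six partial derivatives can be computed and arranged into a matrix symbolically. The function f(u, v) below is a made-up example (any function with two inputs and three outputs would do):

```python
import sympy as sp

u, v = sp.symbols('u v')

# A hypothetical f(u, v) with three components -- chosen just for illustration.
f1 = u**2 * v
f2 = u + sp.cos(v)
f3 = u * v

# Arrange the six partial derivatives into a matrix: one row per
# component of f, one column per input variable.
M = sp.Matrix([[sp.diff(fi, u), sp.diff(fi, v)] for fi in (f1, f2, f3)])

print(M)
# Matrix([[2*u*v, u**2], [1, -sin(v)], [v, u]])
```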

You probably encountered the derivative of a function as the slope of the tangent line to a curve. What
*is* the tangent line, in a conceptual sense? It's the best
"flat" approximation to the curve near the point of
tangency. That is, near the point of tangency, the tangent and the
curve are very close.

We could generalize this to a function of two variables by considering
the tangent *plane* to the graph, which is a surface in three
dimensions. And though it's difficult to picture, we can
*imagine* continuing this procedure to higher dimensions. A
line in the plane looks like this:

y = mx + b

A plane in space looks like this:

z = ax + by + c

With more input variables for f, we just keep adding more variables to the equation.

These equations are *linear*, which means
roughly that the variables occur only to the first power, multiplied
by constants. Equations of this kind describe things that are
"flat".

With multiple inputs and multiple outputs, an equation for a
"flat" thing involves a *matrix*. For example,
here's a *linear function* with 3 inputs and 2
outputs:

f(x, y, z) = A (x, y, z), where A is a 2 x 3 matrix of constants, say A = [[a, b, c], [d, e, g]].

Written out, this would be

f(x, y, z) = (ax + by + cz, dx + ey + gz).

Here's the general definition.

*Definition.* A function f: R^n -> R^m is a *linear
transformation* if:

(a) f(u + v) = f(u) + f(v) for all u, v in R^n.

(b) f(cu) = c f(u) for all u in R^n and c in R.
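For instance, any map of the form f(x) = Ax for a fixed matrix A has both properties. Here is a quick numerical check; the matrix and vectors are arbitrary choices, just for illustration:

```python
import numpy as np

# A fixed 2 x 3 matrix; f(x) = A @ x is a linear map from R^3 to R^2.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

def f(x):
    return A @ x

u = np.array([1.0, -2.0, 0.5])
v = np.array([0.0, 3.0, -1.0])
c = 7.0

print(np.allclose(f(u + v), f(u) + f(v)))  # property (a): True
print(np.allclose(f(c * u), c * f(u)))     # property (b): True
```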

You've seen things which have these properties --- derivatives and antiderivatives, for example.

A linear transformation is a function that behaves like the simplest linear functions you know: f(x) = mx for a constant m, for instance. The derivative of a function will be the linear transformation that best approximates the function. Here's the definition.

*Definition.* Let f: U -> R^m be a
function, where U is an open set in R^n, and let x be a point of U. The *derivative* of f at x is
a linear transformation Df(x): R^n -> R^m which
satisfies

lim_{h -> 0} |f(x + h) - f(x) - Df(x)h| / |h| = 0.

People sometimes write "Df(x)" or "(Df)(x)" to indicate the dependence of Df on x. It is also common to see "f'(x)" instead of "Df(x)".

Notice that this resembles our old difference quotient definition for derivatives:

f'(x) = lim_{h -> 0} (f(x + h) - f(x)) / h.

Why do we have | |'s in the equation? The idea is
that h is really an n-dimensional *vector* (h_1, h_2, ..., h_n), while f is a function which
produces m numbers as its output, so f(x + h), f(x), and Df(x)h are elements of R^m --- they look like (y_1, y_2, ..., y_m). We can't take the
quotient of two vectors, but we *can* take the quotient of
their lengths.
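You can watch the limit happen numerically. The function f and its matrix of partials Df below are a hypothetical example chosen just for illustration; the ratio |f(x + h) - f(x) - Df(x)h| / |h| shrinks as h does:

```python
import numpy as np

# A hypothetical f: R^2 -> R^2 and its matrix of partial derivatives,
# computed by hand.
def f(x):
    return np.array([x[0]**2 + x[1], np.sin(x[0]) * x[1]])

def Df(x):
    return np.array([[2 * x[0], 1.0],
                     [np.cos(x[0]) * x[1], np.sin(x[0])]])

x = np.array([1.0, 2.0])
direction = np.array([3.0, -1.0])
direction = direction / np.linalg.norm(direction)

ratios = []
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * direction
    num = np.linalg.norm(f(x + h) - f(x) - Df(x) @ h)
    ratios.append(num / np.linalg.norm(h))

print(ratios)  # each ratio is roughly 10 times smaller than the last
```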

As with the difference quotient definition for the derivative of a function of one variable, the definition we've given is not so great for actually computing derivatives. The following result makes it a lot easier.

*Theorem.* Suppose f: R^n -> R^m is
differentiable at c in R^n. Write x = (x_1, x_2, ..., x_n) and f = (f_1, f_2, ..., f_m), and
let Df(c) denote the derivative of f at c. Then Df(c) is the m x n matrix whose (i, j)-th entry is ∂f_i/∂x_j evaluated at c.

In other words, Df(c) is represented by the matrix of partial derivatives of the components of f with respect to the input variables.
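In practice, this means the derivative can be computed symbolically. sympy's Matrix.jacobian builds exactly this matrix of partials; the function below is a hypothetical example with 3 inputs and 2 outputs:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

# A hypothetical f: R^3 -> R^2.
f = sp.Matrix([x*y + z,
               sp.exp(x) * z**2])

# One row per component of f, one column per input variable.
J = f.jacobian([x, y, z])
print(J)
# Matrix([[y, x, 1], [z**2*exp(x), 0, 2*z*exp(x)]])
```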

The converse isn't true. If the partial derivatives of a function exist, the function might fail to be differentiable (in the sense that the limit above might be undefined). On the other hand, we have the following result.

*Theorem.* Suppose f: R^n -> R^m.
Write x = (x_1, ..., x_n) and f = (f_1, ..., f_m). If the partial derivatives ∂f_i/∂x_j exist near c and are continuous at c for 1 <= i <= m and 1 <= j <= n, then f is differentiable at
c.

In this case, by the preceding theorem, Df(c) is represented by the matrix of partial derivatives.

*Remark.* A function with continuous partial
derivatives is called *continuously
differentiable*.

*Example.* Compute Df(x, y, z) for

Since f has 3 inputs and 2 outputs, the matrix for Df(x, y, z) will be 2 x 3:

*Example.* Compute Df(x, y) for

We have 2 inputs and 3 outputs, so Df(x, y) will be a 3 x 2 matrix:

*Example.* Compute Df(x) for

You can see this is the ordinary derivative of a function of 1 variable. We drop the matrix brackets and just write f'(x), as usual.

*Definition.* Let f: U -> R, where
U is an open set in R^n. The derivative Df
is called the *gradient* of f, and is denoted
∇f.

*Example.* Find the gradient (the derivative) of

Notice that the gradient is a *row vector* ---
that is, a 1 x n matrix.
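For a scalar-valued function, the same symbolic computation produces the gradient. Here a hypothetical f: R^3 -> R is treated as a 1-component vector function, so its Jacobian is the 1 x 3 row vector ∇f:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')

# A hypothetical scalar function f(x, y, z), chosen for illustration.
f = sp.Matrix([x**2 * y + sp.sin(z)])

grad = f.jacobian([x, y, z])
print(grad.shape)  # (1, 3) -- a row vector
print(grad)        # Matrix([[2*x*y, x**2, cos(z)]])
```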

*Example.* Find the gradient (the derivative) of

Since you'd expect the best linear approximation to a linear function to be the linear function itself, and since the derivative of f *is* the best linear approximation to f, you'd expect the derivative of a linear function to be the function itself.

*Proposition.* Let f: R^n -> R^m be
a linear transformation. Then f is differentiable, and Df(x) = f for all x.

*Proof.* Since f is linear, f(x + h) = f(x) + f(h), so f(x + h) - f(x) - f(h) = 0. Therefore,

lim_{h -> 0} |f(x + h) - f(x) - f(h)| / |h| = lim_{h -> 0} 0 / |h| = 0.

This shows that Df(x) = f.

*Corollary.* (a) The function s: R^n x R^n -> R^n given by s(u, v) = u + v is differentiable.

Note: R^n x R^n consists of pairs (u, v) where u, v are elements of R^n.

(b) For a fixed scalar c in R, the function p: R^n -> R^n given by p(u) = cu is differentiable.

*Proof.* Both s and p are linear functions, so
(a) and (b) follow from the proposition.

*Example.* The following function is linear:

Verify that it's equal to its derivative.

Computing the partial derivatives directly,

Writing f as a matrix multiplication, I have

You can see that the matrices are the same.
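The same check works for any linear map f(x) = Ax: its Jacobian is A itself. A small symbolic sketch, with an arbitrary matrix A chosen for illustration:

```python
import sympy as sp

x, y = sp.symbols('x y')

# An arbitrary 2 x 2 matrix; f(x, y) = A * (x, y) is linear.
A = sp.Matrix([[2, -1],
               [0, 3]])
f = A * sp.Matrix([x, y])

J = f.jacobian([x, y])
print(J == A)  # True: a linear function equals its own derivative
```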

Here are some other standard and unsurprising results about
derivatives. You can prove them using the preceding corollary and the
*Chain Rule*, which I'll discuss later.

*Theorem.* Let f, g: R^n -> R^m, let
c be a point of R^n, and let k be in R. Suppose f
and g are differentiable at c. Then:

(a) f + g is differentiable at c, and D(f + g)(c) = Df(c) + Dg(c).

(b) kf is differentiable at c, and D(kf)(c) = k Df(c).
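Both rules are easy to check symbolically for specific functions, since the Jacobian is built from partial derivatives, which are linear in the function. The f, g, and k below are hypothetical choices:

```python
import sympy as sp

x, y = sp.symbols('x y')
inputs = [x, y]

# Hypothetical f, g: R^2 -> R^2 and a scalar k.
f = sp.Matrix([x*y, x + y**2])
g = sp.Matrix([sp.sin(x), x * y**3])
k = 5

# (a) D(f + g) = Df + Dg
print((f + g).jacobian(inputs) == f.jacobian(inputs) + g.jacobian(inputs))  # True
# (b) D(kf) = k Df
print((k * f).jacobian(inputs) == k * f.jacobian(inputs))  # True
```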

Copyright 2018 by Bruce Ikenaga