Tag Archives: function

Thinking about a function as a mathematical object

A mathematician’s mental representation of a function is generally quite rich and may involve many different metaphors and images kept in mind simultaneously. The abmath article on metaphors and images for functions discusses many of these representations, although the article is incomplete. This post is a fairly thorough rewrite of the discussion in that article of the representation of the concept of “function” as a mathematical object. You must think of functions as math objects when you are taking the rigorous view, which happens when you are trying to prove something about functions (or large classes of functions) in general.

What often happens is that you visualize one of your functions in many of the ways described in this article (it is a calculation, it maps one space to another, its graph is bounded, and so on) but those images can mislead you. So when you are completely stuck, you go back to thinking of the function as an axiomatically-defined mathematical structure of some sort that just sits there, like a complicated machine where you can see all the parts and how they relate to each other. That enables you to prove things by strict logical deduction. (Mathematicians mostly only go this far when they are desperate. We would much rather quote somebody’s theorem.) This is what I have called the dry bones approach.

The “mathematical structure” is most commonly a definition of function in terms of sets and axioms. The abmath article Specification and definition of “function” discusses the usual definitions of “function” in detail.

Example

This example is intended to raise your consciousness about the possibilities for functions as objects.

Consider the function $f:\mathbb{R}\to\mathbb{R}$ defined by $f(x)=2{{\sin }^{2}}x-1$. Its value can be computed at many different numbers but it is a single, static math object.

You can apply operators to it

  • Just as you can multiply a number by $2$, you can multiply $f$ by $2$. You can say “Let $g(x)=2f(x)$” or “Let $g=2f$”. Multiplying a numerical function by $2$ is an operator that takes the function $f$ to $2f$. Its input is a function and its output is another function. Then the value of $g$ (which is $2f$) at any real $x$ is $g(x)=2f(x)=4{{\sin }^{2}}x-2$. The notation “$g=2f$” reveals that mathematicians think of $f$ as a single math object just as the $3$ in the expression “$2\times 3$” represents the number $3$ as a single object.
  • But you can’t do arithmetic operations to functions that don’t have numerical output, such as the function $\text{FL}$ that takes an English word to its first letter, so $\text{FL}(`\text{wolf’})=`\text{w’}$. (The quotes mean that I am writing about the word ‘wolf’ and the letter ‘w’.) The expression $2\times \text{FL}(`\text{wolf’})$ doesn’t make sense because ‘w’ is a letter, not a number.
  • You can find the derivative. The derivative operator is a function from differentiable functions to functions. Such a thing is usually called an operator. The derivative operator is sometimes written as $D$, so $Df$ is the function defined by: “$(Df)(x)$ is the slope of the tangent line to $f$ at the point $(x,f(x))$.” That is a perfectly good definition. In calculus class you learn formulas that allow you to calculate $(Df)(x)$ (usually called “$f'(x)$”) to be $4 \sin (x) \cos (x)$.
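The idea that an operator takes a whole function as its input can be sketched in Python, where functions are ordinary values that can be passed around. This is only an illustration: the names `scale` and `D` are made up for this sketch, and `D` approximates the derivative with a central difference (the step size `h` is an assumption) rather than computing it exactly.

```python
import math

def f(x):
    """The function f(x) = 2 sin^2 x - 1 from the example."""
    return 2 * math.sin(x) ** 2 - 1

def scale(c, func):
    """Operator: takes a function and returns the function c * func."""
    return lambda x: c * func(x)

def D(func, h=1e-6):
    """Approximate derivative operator: takes a function, returns a function."""
    return lambda x: (func(x + h) - func(x - h)) / (2 * h)

g = scale(2, f)   # g = 2f is a new function object
Df = D(f)         # Df is again a function

x = 0.7
print(g(x), 4 * math.sin(x) ** 2 - 2)          # the two values agree
print(Df(x), 4 * math.sin(x) * math.cos(x))    # agree up to rounding error
```

Note that `scale(2, f)` and `D(f)` never mention a particular input; the number `x` only appears when the resulting functions are evaluated.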

Like all math objects, functions may have properties

  • The function defined by $f(x)=2{{\sin}^{2}}x-1$ is differentiable, as noted above. It is also continuous.
  • But $f$ is not injective. This means that two different inputs can give the same output. For example, $f(\frac{\pi}{3})=f(\frac{4\pi}{3})=\frac{1}{2}$. This is a property of the whole function, not individual values. It makes no sense to say that $f(\frac{\pi}{3})$ is injective.
  • The function $f$ is periodic with period $2\pi$, meaning that for any $x$, $f(x+2\pi)=f(x)$.     It is the function itself that has period $2\pi$, not any particular value of it.  
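These properties can be spot-checked numerically. The sketch below is not a proof: it confirms the non-injectivity example exactly as stated above, but it tests periodicity only at a finite list of sample points chosen for illustration.

```python
import math

def f(x):
    return 2 * math.sin(x) ** 2 - 1

# Non-injectivity: two different inputs with the same output.
a, b = math.pi / 3, 4 * math.pi / 3
same_output = math.isclose(f(a), f(b)) and math.isclose(f(a), 0.5)
print(a != b and same_output)

# Periodicity: f(x + 2*pi) == f(x), checked only at sample points.
samples = [k / 10 for k in range(-50, 51)]
periodic_on_samples = all(
    math.isclose(f(x + 2 * math.pi), f(x), abs_tol=1e-12) for x in samples
)
print(periodic_on_samples)
```

Both checks take the whole function `f` as their subject; neither makes sense for a single value such as `f(a)`.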

As a math object, a function can be an element of a set

  • For example, $f$ is an element of the set ${{C}^{\infty }}(\mathbb{R})$ of real-valued functions that have derivatives of all orders.
  • On ${{C}^{\infty }}(\mathbb{R})$, differentiation is an operator that takes a function in that set to another function in the set.   It takes $f(x)$ to the function $4\sin x\cos x$.
  • If you restrict $f$ to the unit interval, it is an element of the function space ${{\text{L}}^{2}}[0,1]$. As such it is convenient to think of it as a point in the space (the whole function is the point, not just values of it). In this particular space, you can think of the points as vectors in an uncountably-infinite-dimensional space. (Ideas like that weird some people out. Do not worry if you are one of them. If you keep on doing math, function spaces will seem ordinary. They are OK by me, except that I think they come in entirely too many different kinds which I can never keep straight.) As a vector, $f$ has a norm, which you can think of as its length. The norm of $f$, namely $\sqrt{\int_0^1 f(x)^2\,dx}$, is about $0.64$.
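The norm can be estimated numerically. The sketch below assumes the standard norm on ${{\text{L}}^{2}}[0,1]$, that is, the square root of $\int_0^1 f(x)^2\,dx$, and uses a plain midpoint rule (the helper `l2_norm` and the number of subintervals are choices made for this illustration). Since $2\sin^2 x-1=-\cos 2x$, the integral also has the closed form $\frac{1}{2}+\frac{\sin 4}{8}$, which the code uses as a cross-check.

```python
import math

def f(x):
    return 2 * math.sin(x) ** 2 - 1

def l2_norm(func, a=0.0, b=1.0, n=100_000):
    """Midpoint-rule estimate of sqrt(integral of func(x)^2 over [a, b])."""
    h = (b - a) / n
    total = sum(func(a + (i + 0.5) * h) ** 2 for i in range(n))
    return math.sqrt(total * h)

estimate = l2_norm(f)
closed_form = math.sqrt(0.5 + math.sin(4) / 8)   # uses f(x) = -cos(2x)
print(estimate, closed_form)
```

The input to `l2_norm` is the function itself, which matches the point of view that the whole function is a single point (or vector) in the space.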

The discussion above shows many examples of thinking of a function as an object. You are thinking about it as an undivided whole, as a chunk, just as you think of the number $3$ (or $\pi$) as just a thing. You think the same way about your bicycle as a whole when you say, “I’ll ride my bike to the library”. But if the transmission jams, then you have to put it down on the grass and observe its individual pieces and their relation to each other (the chain came off a gear or whatever), in much the same way as noticing that the function $g(x)=x^3$ goes through the origin and looks kind of flat there, but at $(2,8)$ it is really rather steep. Phrases like “steep” and “goes through the origin” are a clue that you are thinking of the function as a curve that goes left to right and levels off in one place and goes up fast in another — you are thinking in a dynamic, not a static way like the dry bones of a math object.


The definition of “function”

 

This is the new version of the abstractmath article on the definition of function. I had to adapt the formatting and some of it looks weird, but legible. It is prettier on abstractmath.org.

I expect to announce new revisions of other abmath articles on this blog, with links, but not to publish them here. This article brings out a new point of view about defining functions that I wanted to call attention to, so I am publishing it here, as well.

 

FUNCTIONS: SPECIFICATION AND DEFINITION

It is essential that you understand many of the images, metaphors and terminology that mathematicians use when they think and talk about functions. For many purposes, the precise mathematical definition of "function" does not play much of a role when you are trying to understand particular kinds of functions. But there is one point of view about functions that has resulted in fundamental progress in math:

 

 

A function is a mathematical object.

To deal with functions in that way you need a precise definition of "function". That is what this article gives you.

  • The article starts by giving a specification of "function".
  • After that, we get into the technicalities of the definitions of the general concept of function.
  • Things get complicated because there are several inequivalent definitions of "function" in common use.

Specification of "function"

A function $f$ is a mathematical object which determines and is completely determined by the following data:

(DOM) $f$ has a domain, which is a set. The domain may be denoted by $\text{dom} f$.

(COD) $f$ has a codomain, which is also a set and may be denoted by $\text{cod} f$.

(VAL) For each element $a$ of the domain of $f$, $f$ has a value at $a$, denoted by $f(a)$.

(FP) The value of $f$ at $a$ is completely determined by $a$ and $f$.

(VIC) The value of $f$ at $a$ must be an element of the codomain of $f$.

  • The operation of finding $f(a)$ given $f$ and $a$ is called evaluation.
  • "FP" means functional property.
  • "VIC" means "value in codomain".

Examples

The examples of functions chapter contains many examples. The two I give here provide immediate examples.

A finite function

Let $F$ be the function defined on the set $\left\{1,\,2,\,3,\,6 \right\}$ as follows: $F(1)=3,\,\,\,F(2)=3,\,\,\,F(3)=2,\,\,\,F(6)=1$. This is the function called "Finite" in the chapter on examples of functions.

  • The definition of $F$ says "$F$ is defined on the set $\left\{1,\,2,\,3,\,6 \right\}$". That phrase means that the domain is that set.
  • The value of $F$ at each element of the domain is given explicitly. The value at 3, for example, is 2, because the definition says that $F(3) = 2$. No other reason needs to be given. Mathematical definitions can be arbitrary.
  • The codomain of $F$ is not specified, but must include the set $\{1,2,3\}$. The codomain of a function is often not specified when it is not important — which is most of the time in freshman calculus (for example).
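A finite function like this one is easy to model as a lookup table. The following is a minimal sketch using a Python `dict`, with the keys playing the role of the domain; the variable names `domain` and `image` are chosen here for illustration.

```python
# The finite function F as a Python dict: keys are the domain, values are F(a).
F = {1: 3, 2: 3, 3: 2, 6: 1}

domain = set(F)            # the set F is defined on
image = set(F.values())    # any codomain for F must contain this set

print(F[3])                # evaluation: the value of F at 3
print(domain, image)
```

Nothing in the table explains why `F[3]` is `2`. It is simply given, which mirrors the remark that mathematical definitions can be arbitrary.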

A real-valued function

Let $G$ be the real-valued function defined by the formula $G(x)={{x}^{2}}+2x+5$.

  • The definition of $G$ gives the value at each element of the domain by a formula. The value at $3$, for example, is $G(3)=3^2+2\cdot3+5=20$.
  • The definition of $G$ does not specify the domain. The convention in the case of functions defined on the real numbers by a formula is to take the domain to be all real numbers at which the formula is defined. In this case, that is every real number, so the domain is $\mathbb{R}$.
  • The definition does not specify the codomain, either. However, it must include all real numbers greater than or equal to 4. (Why?)

What the specification means

  • The specification guarantees that a function satisfies all five of the properties listed.
  • The specification does not define a mathematical structure in the way mathematical structures have been defined in the past: In particular, it does not require a function to be one or more sets with structure.
  • Even so, it is useful to have the specification, because:

     

     

    Many mathematical definitions
    introduce extraneous technical elements
    which clutter up your thinking
    about the object they define.

     

     

    I will say more about this when I give the various definitions that are in use.

History

Until late in the nineteenth century, functions were usually thought of as defined by formulas (including infinite series). Problems arose in the theory of harmonic analysis which made mathematicians require a more general notion of function. They came up with the concept of function as a set of ordered pairs with the functional property (discussed below), and that concept revolutionized our understanding of math.

This discussion is an oversimplification of the history of mathematics, which many people have written thick books about. A book relevant to these ideas is Plato's Ghost, by Jeremy Gray.

In particular, this definition, along with the use of set theory, enabled abstract math (ahem) to become a common tool for understanding math and proving theorems. It is conceivable that some of you may wish it hadn't. Well, tough.

The more modern definition of function given here (which builds on the older definition) came into use beginning in the 1950's. The strict version became necessary in algebraic topology and is widely used in many fields today.

The concept of function as a formula never disappeared entirely, but was studied mostly by logicians who generalized it to the study of function-as-algorithm. Of course, the study of algorithms is one of the central topics of modern computing science, so the notion of function-as-formula (updated to function-as-algorithm) has achieved a new importance in recent years.

To state both the old abstract definition and the modern one, we need a preliminary idea.

The functional property

A set $P$ of ordered pairs has the functional property if two pairs in $P$ with the same first coordinate have to have the same second coordinate (which means they are the same pair). In other words, if $(x,a)$ and $(x,b)$ are both in $P$, then $a=b$.

How to think about the functional property

The point of the functional property is that for any pair in the set of ordered pairs, the first coordinate determines what the second one is. That's why you can write "$G(x)$" for any $x$ in the domain of $G$ and not be ambiguous.

Examples

  • The set $\{(1,2), (2,4), (3,2), (5,8)\}$ has the functional property, since no two different pairs have the same first coordinate. Note that there are two different pairs with the same second coordinate. This is irrelevant to the functional property.
  • The set $\{(1,2), (2,4), (3,2), (2,8)\}$ does not have the functional property. There are two different pairs with first coordinate 2.
  • The empty set $\emptyset$ has the functional property vacuously.
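The functional property is a condition you can test mechanically for a finite set of pairs. Here is a minimal sketch; the function name `has_functional_property` is made up for this illustration.

```python
def has_functional_property(pairs):
    """Return True if no two pairs share a first coordinate but differ in the second."""
    seen = {}
    for first, second in pairs:
        if first in seen and seen[first] != second:
            return False
        seen[first] = second
    return True

print(has_functional_property({(1, 2), (2, 4), (3, 2), (5, 8)}))  # True
print(has_functional_property({(1, 2), (2, 4), (3, 2), (2, 8)}))  # False
print(has_functional_property(set()))                             # True (vacuously)
```

For the empty set the loop body never runs, so nothing can violate the condition. That is exactly what "vacuously" means here.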

Graph of a function.

Example: graph of a function defined by a formula

In calculus books, a picture like this one (of part of $y=x^2+2x+5$) is called a graph. Here I use the word "graph" to denote the set of ordered pairs \[\left\{ (x,{{x}^{2}}+2x+5)\,\mathsf{|}\,x\in \mathbb{R } \right\}\] which is a mathematical object rather than some ink on a page or pixels on a screen.

The graph of any function studied in beginning calculus has the functional property. For example, the set of ordered pairs above has the functional property because if $x$ is any real number, the formula ${{x}^{2}}+2x+5$ defines a specific real number.

  • If $x = 0$, then ${{x}^{2}}+2x+5=5$, so the pair $(0, 5)$ is an element of the graph of $G$. Each time you plug in $0$ in the formula you get 5.
  • If $x = 1$, then ${{x}^{2}}+2x+5=8$.
  • If $x = -2$, then ${{x}^{2}}+2x+5=5$.

You can measure where the point $(-2,5)$ is on the (picture of) the graph and see that it is on the blue curve as it should be. No other pair whose first coordinate is $-2$ is in the graph of $G$, only $(-2, 5)$. That is because when you plug $-2$ into the formula ${{x}^{2}}+2x+5$, you get $5$ and nothing else. Of course, $(0, 5)$ is in the graph, but that does not contradict the functional property. $(0, 5)$ and $(-2, 5)$ have the same second coordinate, but that is OK.

Modern mathematical definition of function

A function $f$ is a mathematical structure consisting of the following objects:

  • A set called the domain of $f$, denoted by $\text{dom} f$.
  • A set called the codomain of $f$, denoted by $\text{cod} f$.
  • A set of ordered pairs called the graph of $f$, with the following properties:
      • $\text{dom} f$ is the set of all first coordinates of pairs in the graph of $f$.
      • Every second coordinate of a pair in the graph of $f$ is in $\text{cod} f$ (but $\text{cod} f$ may contain other elements).
      • The graph of $f$ has the functional property.

Using arrow notation, this implies that $f:A\to B$, where $A=\text{dom} f$ and $B=\text{cod} f$.

Remark

The main difference between the specification of function given previously and this definition is that the definition replaces the statement "$f$ has a value at $a$" by introducing a set of ordered pairs (the graph) with the functional property.

  • This set of ordered pairs is extra structure introduced by the definition mainly in order to make the definition a classical sets-with-structure, which makes the graph, which should be a concept derived from the concept of function, into an apparently necessary part of the function.
  • That suggests incorrectly that the graph is more of a primary intuition than other intuitions such as function as relocator, function as transformer, and other points of view discussed in the article Intuitions and metaphors for functions.

Examples

  • Let $F$ have graph $\{(1,2), (2,4), (3,2), (5,8)\}$ and define $A = \{1, 2, 3, 5\}$ and $B = \{2, 4, 8\}$. Then $F:A\to B$ is a function. In speaking, we would usually say, "$F$ is a function from $A$ to $B$."
  • Let $G$ have graph $\{(1,2), (2,4), (3,2), (5,8)\}$ (same as above), and define $A = \{1, 2, 3, 5\}$ and $C = \{2, 4, 8, 9, 11, \pi, 3/2\}$. Then $G:A\to C$ is an (admittedly ridiculous) function. Note that all the second coordinates of the graph are in $C$, along with a bunch of miscellaneous suspicious characters that are not second coordinates of pairs in the graph.
  • Let $H$ have graph $\{(1,2), (2,4), (3,2), (5,8)\}$. Then $H:A\to \mathbb{R}$ is a function, since $2$, $4$ and $8$ are all real numbers.
  • Let $D = \{1, 2, 5\}$ and $E = \{1, 2, 3, 4, 5\}$. Then there is no function $D\to A$ and no function $E\to A$ with graph $\{(1,2), (2,4), (3,2), (5,8)\}$. Neither $D$ nor $E$ has exactly the same elements as the first coordinates of the graph.
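The modern definition translates fairly directly into a data structure. The sketch below is one possible encoding for finite sets only: the class name `Function` and its error messages are invented for this illustration, and a codomain such as $\mathbb{R}$ cannot be represented this way.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Function:
    """A function as a structure: domain, codomain and graph (finite sets only)."""
    domain: frozenset
    codomain: frozenset
    graph: frozenset  # set of (input, output) pairs

    def __post_init__(self):
        firsts = [a for a, _ in self.graph]
        if set(firsts) != set(self.domain):
            raise ValueError("domain must equal the set of first coordinates")
        if len(firsts) != len(set(firsts)):
            raise ValueError("graph lacks the functional property")
        if not all(b in self.codomain for _, b in self.graph):
            raise ValueError("every value must lie in the codomain")

    def __call__(self, a):
        return dict(self.graph)[a]

graph = frozenset({(1, 2), (2, 4), (3, 2), (5, 8)})
A = frozenset({1, 2, 3, 5})
B = frozenset({2, 4, 8})
C = frozenset({2, 4, 8, 9, 11, math.pi, 1.5})

F = Function(A, B, graph)
G = Function(A, C, graph)
print(F(3), G(3))   # the same values everywhere
print(F == G)       # False: same graph, different codomain

try:
    Function(frozenset({1, 2, 5}), B, graph)   # the set D is missing 3
except ValueError as err:
    print(err)
```

Because equality compares all three components, `F` and `G` count as different functions even though they agree at every input. This is the strict reading of the definition.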

Identity and inclusion

Suppose we have two sets $A$ and $B$ with $A\subseteq B$.

  • The identity function on $A$ is the function ${{\operatorname{id}}_{A}}:A\to A$ defined by ${{\operatorname{id}}_{A}}(x)=x$ for all $x\in A$. (Many authors call it ${{1}_{A}}$.)
  • When $A\subseteq B$, the inclusion function from $A$ to $B$ is the function $i:A\to B$ defined by $i(x)=x$ for all $x\in A$. Note that there is a different function for each pair of sets $A$ and $B$ for which $A\subseteq B$. Some authors call it ${{i}_{A,\,B}}$ or $\text{in}{{\text{c}}_{A,\,B}}$.

The identity function and an inclusion function for the same set $A$ have exactly the same graph, namely $\left\{ (a,a)|a\in A \right\}$. More about this below.
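The following sketch makes the comparison concrete for small sets. It represents a function as a (domain, codomain, graph) triple, which is an assumption of this illustration, and the helper `is_surjective` is likewise invented here.

```python
A = {1, 2}
B = {1, 2, 3}

# Each function is a (domain, codomain, graph) triple.
identity_on_A = (frozenset(A), frozenset(A), frozenset((a, a) for a in A))
inclusion_A_B = (frozenset(A), frozenset(B), frozenset((a, a) for a in A))

def is_surjective(func):
    """True if every element of the codomain occurs as a value."""
    _, codomain, graph = func
    return {b for _, b in graph} == set(codomain)

print(identity_on_A[2] == inclusion_A_B[2])   # True: same graph
print(identity_on_A == inclusion_A_B)         # False: different codomains
print(is_surjective(identity_on_A), is_surjective(inclusion_A_B))
```

Surjectivity cannot even be asked about the graph alone, which is one reason to keep the codomain as part of the function.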

Other definitions of function

Original abstract definition of function

Definition

Remarks

Possible confusion

Some confusion can result because of the presence of these two different definitions.

Multivalued function

Some older mathematical papers in complex function theory do not tell you that their functions are multivalued. There was a time when complex function theory was such a Big Deal in research mathematics that the phrase "function theory" meant complex function theory and all the cognoscenti knew that their functions were multivalued.

The phrase multivalued function refers to an object that is like a function $f:S\to T$ except that for $s\in S$, $f(s)$ may denote more than one value.

Examples

  • Multivalued functions arose in considering complex functions. In common practice, the symbol $\sqrt{4}$ denotes $2$, although $-2$ is also a square root of $4$. But in complex function theory, the square root function takes on both the values $2$ and $-2$. This is discussed in detail in Wikipedia.
  • The antiderivative is an example of a multivalued operator. For any constant $C$, $\frac{x^3}{3}+C$ is an antiderivative of $x^2$.

A multivalued function $f:S\to T$ can be modeled as a function with domain $S$ and codomain the set of all subsets of $T$. The two meanings are equivalent in a strong sense (naturally equivalent). Even so, it seems to me that they represent two different ways of thinking about multivalued functions. ("The value may be any of these things…" as opposed to "The value is this whole set of things.")
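The "value is this whole set of things" model is easy to sketch for the square root. The function name `square_roots` is made up for this illustration; it relies on `cmath.sqrt` from the Python standard library, which returns one square root, and adds its negative.

```python
import cmath

def square_roots(z):
    """Set-valued model: return the set of all complex square roots of z."""
    r = cmath.sqrt(z)
    return {r, -r}

print(square_roots(4))    # contains both 2 and -2 (as complex numbers)
print(square_roots(-1))   # contains both i and -i
print(square_roots(0))    # a single value
```

Here `square_roots` is an ordinary single-valued function. Its value just happens to be a set, which is the point of the modeling trick.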

The phrases "multivalued function" and "partial function" upset some picky types who say things like, "But a multivalued function is not a function!". A stepmother is not a mother, either. See the Handbook article on radial category.

Partial function

A partial function $f:S\to T$ is just like a function except that it may be defined on only a subset of $S$. For example, the function $f(x)=\frac{1}{x}$ is a partial function from the real numbers to the real numbers, since it is not defined at $0$.

This models the behavior of computer programs (algorithms): if you consider a program with one input and one output as a function, it may not be defined on some inputs because for them it runs forever (or gives an error message).
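One common way to model a partial function in code is to return a special "undefined" marker. The sketch below uses `None` for that purpose, which is a modeling choice made for this illustration. Raising an exception would be another choice, and a program that runs forever cannot be captured this way at all. The names `reciprocal` and `domain_of_definition` are invented here.

```python
from typing import Optional

def reciprocal(x: float) -> Optional[float]:
    """Partial function on the reals: undefined (None) at 0."""
    if x == 0:
        return None
    return 1 / x

def domain_of_definition(func, candidates):
    """The subset of candidates on which func is actually defined."""
    return {x for x in candidates if func(x) is not None}

print(reciprocal(4))
print(reciprocal(0))
print(domain_of_definition(reciprocal, {-1, 0, 1, 2}))
```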

In some texts in computing science and mathematical logic, a function is by convention a partial function, and this fact may not be mentioned explicitly, especially in research papers.

New approaches to functions

All the definitions of function given here produce mathematical structures, using the traditional way to define mathematical objects in terms of sets. Such definitions have disadvantages.

Mathematicians have many ways to think about functions. That a function is a set of ordered pairs with a certain property (functional) and possibly some ancillary ideas (domain, codomain, and others) is not the way we usually think about them, except when we need to reduce the thing we are studying to its absolutely most abstract form to make sure our proofs are correct. That most abstract form is what I have called the rigorous view or the dry bones, and it is when that reasoning is needed that the sets-with-structure approach has succeeded.

Our practice of abstraction has led us to new approaches to talking about functions. The most important one currently is category theory. Roughly, a category is a bunch of objects together with some arrows going between them that can be composed head to tail. Functions between sets are examples of this: the sets are the objects and the functions the arrows.

This abstracts the idea of function in a way that brings out common ideas in various branches of math. Research papers in many branches of mathematics now routinely use the language of category theory. Categories now appear in some undergraduate math courses, meaning that Someone needs to write a chapter on category theory for abstractmath.org.

Besides category theory, computing scientists have come up with other abstract ways of dealing with functions, for example type theory. It has not come as far along as category theory, but has shown recent signs of major progress.

Both category theory and type theory define math objects in terms of their effect on and relationship with other math objects. This makes it possible to do abstract math entirely without using sets-with-structure as a means of defining concepts.

 


Function and codomain

I recently posted the following information in the talk page of the Wikipedia article on functions, where they were arguing about whether "function" means a set of ordered pairs with the functional property or a structure with a domain $D$, a codomain $C$, and a graph $G$ which is a subset of $D\times C$ with the functional property.

I collected data from some math books published since 2000 that contain a definition of function; they are listed below. In this list, "typed" means the function was defined as going from a set A to a set B, A was called the domain, and B was not given a name. If "typed" is followed by a word (codomain, range or target), that was the name given to the codomain. One book defined a function essentially as a partial function. Some that did not name the codomain defined "range" in the sense of image. Some of them emphasized that the range/image need not be the same as the codomain.

As far as I know, none of these books said that if two functions had the same domain and the same graph but different codomains they had to be different functions.  But I didn't read any of them extensively. 

My impression is that modern mathematical writing at least at college level does distinguish the domain, codomain, and image/range of a function, not always providing a word to refer to the codomain.

If the page number has a question mark after it, that means I got the biblio data for the book from Amazon and the page number from Google books, which doesn't give the edition number, so it might be different.

I did not look for books by logicians or computing scientists.  My experience is that logicians tend to use partial functions and modern computing scientists generally require the codomain to be specified.

Opinion: If you don't distinguish functions as different if they have different codomains, you lose some basic intuition (a function is a map) and you mess up common terminology. For example, the only function from {1} to {1} is the identity function, and it is surjective. The inclusion function from {1} to the set of real numbers (which is a point on the real line) is not the identity function and is not surjective.

THE LIST

Mathematics for Secondary School Teachers
By Elizabeth G. Bremigan, Ralph J. Bremigan, John D. Lorch, MAA, 2011
p. 6 (typed)

Oxford Concise Dictionary of Mathematics
Ed. Christopher Clapham and James Nicholson, 4th ed., Oxford University Press, 2009
p. 184 (typed, codomain)

Math and Math-in-school: Changes in the Treatment of the Function Concept in …
By Kyle M. Cochran, Proquest, 2011
p. 74 (partial function)

Discrete Mathematics: An Introduction to Mathematical Reasoning
By Susanna S. Epp, 4th ed., Cengage Learning, 2010
p. 294? (typed, co-domain)

Teaching Mathematics in Grades 6 – 12: Developing Research-Based …
By Randall E. Groth, SAGE, 2011
p. 236 (typed, codomain)

Essentials of Mathematics
By Margie Hale, MAA, 2003
p. 38 (typed, target)

Elements of Advanced Mathematics
By Steven G. Krantz, 3rd ed., Chapman and Hall, 2012
p. 79? (typed, range)

Bridge to Abstract Mathematics
By Ralph W. Oberste-Vorth, Aristides Mouzakitis, Bonita A. Lawrence, MAA, 2012
p. 76 (typed, codomain)

The Road to Reality
By Roger Penrose, Knopf, 2005
p. 104 (typed, target)

Precalculus: Mathematics for Calculus
By James Stewart, Lothar Redlin, Saleem Watson, Cengage, 2011
p. 143 (typed)

The Mathematics that Every Secondary School Math Teacher Needs to Know
By Alan Sultan, Alice F. Artzt, Routledge, 2010
p. 400 (typed)
 
 


The power of being naive

The interactive examples in this post require installing Wolfram CDF player, which is free and works on most desktop computers using Firefox, Safari and Internet Explorer, but not Chrome. The source code is the Mathematica Notebook MM Def Deriv.nb, which is available for free use under a Creative Commons Attribution-ShareAlike 2.5 License. See How to manipulate the diagrams for more information on what you can do with them. The notebook can be read by CDF Player if you cannot make the embedded versions in this post work.

Learning about the derivative as a concept

The derivative $f'(x)$ of $f(x)$ is the function whose value at $a$ is the slope of the line tangent to the graph $y=f(x)$ at the point $(a,f(a))$.

To gain understanding of the concept of derivative, the student needs to see and play with the pictures that illustrate the definition. This can be done in stages:

  • Give an intuitive, pictorial explanation of the tangent line.
  • Show in pictures what the slope of a line is.
  • Show in pictures how you can approximate the tangent line with secant lines.

Of course, many teachers and textbooks do this. I propose that:

The student will benefit in the long run by spending a whole class session on the intuitive ideas I just described and doing a homework set based only on intuition. Then you can start doing the algebraic stuff.

This post provides some ideas about manipulable diagrams that students can play with to gain intuition about derivatives. Others are possible. There are many on the Mathematica Demonstrations website. There are others written in Java and other languages, but I don't know of a site that tries to collect them in one place.

My claim that the student will benefit in the long run is not something I can verify, since I no longer teach.

Present the tangent line conceptually

The tangent line to a curve

  • is a straight line that touches the curve at a point on the curve,
  • and it goes in the same direction that the curve is going, like the red line in the picture below. (See How to manipulate the diagrams.)

 

My recommendation is that you let the students bring up some of the fine points.

  • The graph of $y=x^3-x$ has places where the tangent line cuts the curve at another point without being parallel to the curve there. Move the slider to find these places.
  • The graph of $y=\cos(\pi x)$ has places where the same line is tangent at more than one point on the curve. (This may require stepping the slider using the incrementers.)
  • Instigate a conversation about the tangent line to a given straight line.
  • My post Tangents has other demos intended to bother the students.
  • Show the unit circle with some tangent lines and make them stare at it until they notice something peculiar.
  • "This graph shows the tangent line but how do you calculate it?" You can point out that if you draw the curve carefully and then slide a ruler around it so that it is tangent at the point you are interested in, then you can draw the tangent carefully and measure the rise and run with the ruler. This is a perfectly legitimate way to estimate the value of the slope there.

Slope of the tangent line conceptually

This diagram shows the slope of the tangent line as height over width.

  • Slide the $x$ slider back and forth. The width does not change. The height is measured from the tangent line to the corner, so the height does change; in particular, it changes sign appropriately.
  • This shows that the standard formula for the derivative of the curve gives the same value as the calculated slope of the tangent. (If you are careful you can find a place where the last decimal places differ.) You may want to omit the "derivative value" info line, but most students in college calculus already know how to calculate the formulas for the derivative of a polynomial– or you can just tell them what it is in this case and promise to show how to calculate the formula later.
  • Changing the width while leaving $x$ fixed does not change the slope of the tangent line (up to roundoff error).
  • In fact I could add another parameter that allows you to calculate height over width at other places on the tangent line. But that is probably excessive. (You could do that in a separate demo that shows that basic property that the slope of a straight line does not change depending on where you measure it — that is what a curve being a straight line means.)
  • This graph provides a way to estimate the slope, but does not suggest a way to come up with a formula for the slope, in other words, a formula for the derivative.

Conceptual calculation of the slope

This diagram shows how to calculate the value of the slope at a point using secant lines to approximate the tangent line. If you have a formula for the function, you can calculate the limit of the slope of the secant line and get a formula for the derivative.

 

  • The function is $f(x)=x^3-x$.
  • The secant points are $(x-h,f(x-h))$ and $(x+h, f(x+h))$. $h$ is called "width" in the diagram.
  • Moving $x$ with the slider shows how the tangent line and secant line have similar slopes.
  • Moving the width to the left, to $0$ (almost), makes the secant line coincide with the tangent line. So intuitively the limit of the slope of the secant line is the slope of the tangent line.
  • The distance between the secant points is the Euclidean distance. (It may be that including this information does not help, so maybe it should be left out.)
  • The slope of the secant line is $\frac{f(x+h)-f(x-h)}{(x+h)-(x-h)}$ when $h\neq0$. This simplifies to $3x^2+h^2-1$, so the limit when $h\to0$ is $3x^2-1$, which is therefore a formula for the derivative function.
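The behavior of the secant slope as the width shrinks can also be checked numerically, without the interactive diagram. The sketch below uses the same $f(x)=x^3-x$ and the same symmetric secant points; the helper name `secant_slope` and the sample point $x=1.5$ are choices made for this illustration.

```python
def f(x):
    return x ** 3 - x

def secant_slope(func, x, h):
    """Slope of the secant through (x-h, f(x-h)) and (x+h, f(x+h)); requires h != 0."""
    return (func(x + h) - func(x - h)) / ((x + h) - (x - h))

x = 1.5
for h in (1.0, 0.1, 0.01, 0.001):
    # Second and third columns: computed slope and the simplified formula 3x^2 + h^2 - 1.
    print(h, secant_slope(f, x, h), 3 * x ** 2 + h ** 2 - 1)

print("limit as h -> 0:", 3 * x ** 2 - 1)
```

The two columns agree up to rounding, and both approach $3x^2-1$ as $h$ gets small, which is the numerical counterpart of moving the width slider toward $0$.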

 

Testing intuitive concepts

Most of the work students do when studying derivatives is to solve some word problems (rate of change, maximization) in which the student is expected to come up with an appropriate function $f(x)$ and then know or find out the formula for $f'(x)$ in the process of solving the problem. In other words there is a heavy emphasis on computation and much less on concept.

The student in the past has had to do very few homework problems that test for understanding the concept. Lately some texts do have problems that test the concept, for example:

This is the graph of a function and its derivative. Which one is the function and which is its derivative?

[Figure: Concept Prob, the graph of a function and its derivative]

Note that the problem does not give you the formula for the function, nor does it have to.

Many variations are possible, all involving calculating parameters directly from the graph:

  • "These are the first and second derivatives of a function. Where (within the bounds of the graph) is the function concave up?"
  • "These are the first and second derivatives of a function. Where (within the bounds of the graph) are its maxima and minima?"
  • "This straight line is the derivative of a function. Show that the function is a quadratic function and measure the slope of the line in order to estimate some of the coefficients of the quadratic."

 

How to manipulate the diagrams

 

  • You can move the sliders back and forth to move to different points on the curve.
  • In the first diagram, you can click on one of the four buttons to see how it works for various curves.
  • The arrow at the upper right makes it run automatically in a not very useful sort of way.
  • The little plus sign below the arrow opens up some other controls, including step-by-step operation (plus and minus signs), and a box showing the value of $a$.
  • If you are using Mathematica, you can enter values into the box, but if you are using CDF Player, you can only manipulate the number using the slider or the plus and minus incrementers.

 


Conceptual blending

This post uses MathJax.  If you see formulas in unrendered TeX, try refreshing the screen.

A conceptual blend is a structure in your brain that connects two concepts by associating part of one with part of another.  Conceptual blending is a major tool used by our brain to understand the world.

The concept of conceptual blend includes special cases, such as representations, images and conceptual metaphors, that math educators have used for years to understand how mathematics is communicated and how it is learned.  The Wikipedia article is a good starting place for understanding conceptual blending. 

In this post I will illustrate some of the ways conceptual blending is used to understand a function of the sort you meet with in freshman calculus.  I omit the connections with programs, which I will discuss in a separate post.

A particular function

Consider the function $h(t)=4-(t-2)^2$. You may think of this function in many ways.

FORMULA:

$h(t)$ is defined by the formula $4-(t-2)^2$.

  • The formula encapsulates a particular computation of the value of $h$ at a given value $t$.
  • The formula defines the function, which is a stronger statement than saying it represents the function.
  • The formula is in standard algebraic notation. (See Note 1)
  • To use the formula requires one of these:
    • Understand and use the rules of algebra
    • Use a calculator
    • Use an algebraic programming language. 
  • Other formulas could be used, for example $4t-t^2$.
    • That formula encapsulates a different computation of the value of $h$.

TREE: 

$h(t)$ is also defined by this tree (right).
  • The tree makes explicit the computation needed to evaluate the function.
  • The form of the tree is based on a convention, almost universal in computing science, that the last operation performed (the root) is placed at the top and that evaluation is done from bottom to top.
  • Both formula and tree require knowledge of conventions.
  • The blending of formula and tree matches some of the symbols in the formula with nodes in the tree, but the parentheses do not appear in the tree because they are not necessary by the bottom-up convention.
  • Other formulas correspond to other trees.  In other words, conceptually, each tree captures not only everything about the function, but everything about a particular computation of the function.
  • More about trees in these posts:
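The bottom-up evaluation convention can also be sketched in code. The following Python fragment is only an illustration: it assumes the pictured tree has the shape given by `tree_a`, and the tuple encoding and the name `evaluate` are invented here. It also builds a second tree for $4t-t^2$ to show that another formula gives another tree for the same function.

```python
import operator

# A tree is either a leaf (a number or the variable name "t")
# or a tuple (op, left, right) whose root is the last operation performed.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "^": operator.pow}

def evaluate(tree, t):
    """Evaluate bottom-up: children first, then the operation at the root."""
    if tree == "t":
        return t
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, t), evaluate(right, t))

# Tree for 4 - (t - 2)^2: no parentheses needed, the nesting does their job.
tree_a = ("-", 4, ("^", ("-", "t", 2), 2))
# Tree for 4t - t^2: a different computation of the same function.
tree_b = ("-", ("*", 4, "t"), ("^", "t", 2))

for t in (0, 1, 2, 3.5):
    print(t, evaluate(tree_a, t), evaluate(tree_b, t))
```

The two trees are different objects, yet they give the same value at every input tried.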

GRAPH:

$h(t)$ is represented by its graph (right). (See note 2.)

  • This is the graph as visual image, not the graph as a set of ordered pairs.
  • The blending of graph and formula associates each point on the (blue) graph with the value of the formula at the number on the x-axis directly underneath the point.
  • In contrast to the formula, the graph does not define the function because it is a physical picture that is only approximate.
  • But the graph does represent the function.  (This is "represents" in the sense of cognitive psychology, but not in the mathematical sense.)
  • The blending requires familiarity with the conventions concerning graphs of functions. 
  • It sets into operation the vision machinery of your brain, which is remarkably elaborate and powerful.
    • Your visual machinery allows you to see instantly that the maximum of the curve occurs at about $t=2$. 
  • The blending leaves out many things.
    • For one, the graph does not show the whole function.  (That's another reason why the graph does not define the function.)
    • Nor does it make it obvious that the rest of the graph goes off to negative infinity in both directions, whereas the formula does make that obvious (if you understand algebraic notation).  

GEOMETRIC

The graph of $h(t)$ is the parabola with vertex $(2,4)$, directrix $y=\frac{17}{4}$, and focus $(2,\frac{15}{4})$. 

  • The blending with the graph makes the parabola identical with the graph.
  • This tells you immediately (if you know enough about parabolas!) that the maximum is at $(2,4)$ (because the directrix is horizontal and lies above the focus, so the parabola opens downward).
  • Knowing where the focus and directrix are enables you to mechanically construct a drawing of the parabola using pins, string, T-square and pencil.  (In the age of computers, do you care?)
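The focus and directrix can be checked against the defining property of a parabola: every point is equally far from both. Writing the graph as $(t-2)^2=-(y-4)$ gives $4p=-1$, so the focus is $\frac14$ below the vertex at $(2,\frac{15}{4})$ and the directrix is $\frac14$ above it at $y=\frac{17}{4}$. A minimal Python sketch of the check (the helper names are invented for this illustration):

```python
import math

def h(t):
    return 4 - (t - 2) ** 2

FOCUS = (2.0, 15 / 4)      # derived from (t-2)^2 = -(y-4), so p = -1/4
DIRECTRIX_Y = 17 / 4       # horizontal line y = 17/4

def distance_to_focus(t):
    """Distance from the graph point (t, h(t)) to the focus."""
    return math.hypot(t - FOCUS[0], h(t) - FOCUS[1])

def distance_to_directrix(t):
    """Vertical distance from the graph point (t, h(t)) to the directrix."""
    return abs(h(t) - DIRECTRIX_Y)

for t in (-1.0, 0.0, 2.0, 3.5):
    print(t, distance_to_focus(t), distance_to_directrix(t))
```

The two distances should agree at every sampled point, up to rounding.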

HEIGHT:

$h(t)$ gives the height of a certain projectile going straight up and down over time.

  • The blending of height and graph lets you see instantly (using your visual machinery) how high the projectile goes. 
  • The blending of formula and height allows you to determine the projectile's velocity at any point by taking the derivative of the function.
  • A student may easily be confused into thinking that the path of the projectile is a parabola like the graph shown.  Such a student has misunderstood the blending.

KINETIC:

You may understand $h(t)$ kinetically in various ways.

  • You can visualize moving along the graph from left to right, going up, reaching the maximum, then starting down.
    • This calls on your experience of going over a hill. 
    • You are feeling this with the help of mirror neurons.
  • As you imagine traversing the graph, you feel it getting less and less steep until it is briefly level at the maximum, then it gets steeper and steeper going down.
    • This gives you a physical understanding of how the derivative represents the slope.
    • You may have seen teachers swooping with their hand up one side and down the other to illustrate this.
  • You can kinetically blend the movement of the projectile (see height above) with the graph of the function.
    • As it goes up (with $t$ increasing) the projectile starts fast but begins to slow down.
    • Then it is briefly stationary at $t=2$ and then starts to go down.
    • You can associate these feelings with riding in an elevator.
      • Yes, the elevator is not a projectile, so this blending is inaccurate in detail.
    • This gives you a kinetic understanding of how the derivative gives the velocity and the second derivative gives the acceleration.
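For this particular $h$ the velocity and acceleration are easy to write down, which makes the kinetic picture concrete. The sketch below (Python; the function names are chosen for this illustration, and the numerical derivative is only a sanity check) uses $h'(t)=-2(t-2)$ and $h''(t)=-2$.

```python
def h(t):
    return 4 - (t - 2) ** 2

def velocity(t):
    """h'(t) = -2(t - 2), obtained by differentiating the formula."""
    return -2 * (t - 2)

ACCELERATION = -2  # h''(t) is constant

def numeric_derivative(g, t, dt=1e-6):
    """Central-difference estimate, used only as a sanity check."""
    return (g(t + dt) - g(t - dt)) / (2 * dt)

for t in (0, 1, 2, 3, 4):
    print(t, h(t), velocity(t), round(numeric_derivative(h, t), 6))
```

The velocity is positive before $t=2$, zero at $t=2$ and negative afterwards, while the acceleration never changes.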

OBJECT:

The function $h(t)$ is a mathematical object.

  • Usually the mental picture of function-as-object consists of thinking of the function as a set of ordered pairs $\Gamma(h):=\{(t,4-(t-2)^2)|t\in\mathbb{R}\}$. 
  • Sometimes you have to specify domain and codomain, but not usually in calculus problems, where conventions tell you they are both the set of real numbers.
  • The blend of object and graph identifies each point on the graph with an element of $\Gamma(h)$.
  • When you give a formal proof, you usually revert to a dry-bones mode and think of math objects as inert and timeless, so that the proof does not mention change or causation.
    • The mathematical object $h(t)$ is a particular set of ordered pairs. 
    • It just sits there.
    • When reasoning about something like this, implication statements work like they are supposed to in math: no causation, just picking apart a bunch of dead things. (See Note 3).
    • I did not say that math objects are inert and timeless, I said you think of them that way.  This post is not about Platonism or formalism. What math objects "really are" is irrelevant to understanding understanding math [sic].

DEFINITION

A definition of the concept of function provides a way of thinking about the function.

  • One definition is simply to specify a mathematical object corresponding to a function: A set of ordered pairs satisfying the property that no two distinct ordered pairs have the same first coordinate, along with a specification of the codomain if that is necessary.
  • A concept can have many different definitions.
    • A group is usually defined as a set with a binary operation, an inverse operation, and an identity with specific properties.  But it can be defined as a set with a ternary operation, as well.
    • A partition of a set is a set of subsets of a set with certain properties. An equivalence relation is a relation on a set with certain properties.  But a partition is an equivalence relation and an equivalence relation is a partition.  You have just picked different primitives to spell out the definition. 
    • If you are a beginner at doing proofs, you may focus on the particular primitive objects in the definition to the exclusion of other objects and properties that may be more important for your current purposes.
      • For example, the definition of $h(t)$ does not mention continuity, differentiability, parabola, and other such things.
      • The definition of group doesn't mention that it has linear representations.
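The remark that a partition and an equivalence relation are the same thing spelled out with different primitives can be made concrete for a small finite set. This is a sketch only: the function names are invented here, and `partition_from_relation` assumes its input really is an equivalence relation without checking.

```python
def relation_from_partition(blocks):
    """All pairs (a, b) such that a and b lie in the same block."""
    return {(a, b) for block in blocks for a in block for b in block}

def partition_from_relation(elements, relation):
    """The set of equivalence classes of an equivalence relation."""
    return {frozenset(b for b in elements if (a, b) in relation) for a in elements}

blocks = {frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})}
elements = set().union(*blocks)
relation = relation_from_partition(blocks)

# Going to the relation and back recovers the original partition.
print(partition_from_relation(elements, relation) == blocks)
```

Either data structure can be rebuilt from the other, which is the sense in which they carry the same information.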

SPECIFICATION

A function can be given as a specification, such as this:

If $t$ is a real number, then $h(t)$ is a real number, whose value is obtained by subtracting $2$ from $t$, squaring the result, and then subtracting that result from $4$.

  • This tells you everything you need to know to use the function $h$.
  • It does not tell you what it is as a mathematical object: It is only a description of how to use the notation $h(t)$.

Notes

1. Formulas can be given in other notations, in particular Polish and Reverse Polish notation. Some forms of these notations don't need parentheses.
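As an illustration of a parenthesis-free notation, here is a small Reverse Polish evaluator in Python. It is a sketch under assumptions: the token string for $h$ and the name `eval_rpn` are made up here, and only the four binary operators shown are supported.

```python
def eval_rpn(tokens, t):
    """Evaluate a Reverse Polish token list; 't' stands for the input value."""
    ops = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "^": lambda a, b: a ** b,
    }
    stack = []
    for token in tokens:
        if token in ops:
            b = stack.pop()   # right operand was pushed last
            a = stack.pop()
            stack.append(ops[token](a, b))
        elif token == "t":
            stack.append(t)
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]

# 4 - (t - 2)^2 in Reverse Polish notation: no parentheses needed.
H_RPN = "4 t 2 - 2 ^ -".split()
print(eval_rpn(H_RPN, 3))
```

The order of the tokens alone determines which operation applies to which operands.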

2. There are various ways to give a pictorial image of the function.  The usual way to do this is presenting the graph as shown above.  But you can also show its cograph and its endograph, which are other ways of representing a function pictorially.  They  are particularly useful for finite and discrete functions. You can find lots of detail in these posts and Mathematica notebooks:

3. See How to understand conditionals in the abstractmath article on conditionals.

References

  1. Conceptual blending (Wikipedia)
  2. Conceptual metaphors (Wikipedia)
  3. Definitions (abstractmath)
  4. Embodied cognition (Wikipedia)
  5. Handbook of mathematical discourse (see articles on conceptual blend, mental representation, representation, and metaphor)
  6. Images and Metaphors (article in abstractmath)
  7. Links to G&G posts on representations
  8. Metaphors in Computing Science (previous post)
  9. Mirror neurons (Wikipedia)
  10. Representations and models (article in abstractmath)
  11. Representations II: dry bones (article in abstractmath)
  12. The transition to formal thinking in mathematics, David Tall, 2010
  13. What is the object of the encapsulation of a process? Tall et al., 2000.

 


More about the definition of function

Maya Incaand commented on my post Definition of "function":

Why did you decide against "two inequivalent descriptions in common use"?  Is it no longer true?

This question concerns [1], which is a draft article.  I have not promoted it to the standard article in abstractmath because I am not satisfied with some things in it. 

More specifically, there really are two inequivalent descriptions in common use.  This is stated by the article, buried in the text, but if you read the beginning, you get the impression that there is only one specification.  I waffled, in other words, and I expect to rewrite the beginning to make things clearer.

Below are the two main definitions you see in university courses taken by math majors and grad students.  A functional relation has the property that no two distinct ordered pairs have the same first element.

Strict definition: A function consists of a functional relation with specified codomain (the domain is then defined to be the set of first elements of pairs in the relation).  Thus if $A$ and $B$ are sets, $A\subseteq B$ and $A\neq B$, then the identity function $1_A:A\to A$ and the inclusion function $i:A\to B$  are two different functions.

Relational definition: A function is a functional relation.  Then the identity and inclusion functions are the same function.  This means that a function and its graph are the same thing (discussed in the draft article).
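The difference between the two definitions shows up as soon as you try to compare functions for equality. The following Python sketch models a small finite case; the class `StrictFunction` and the helper `is_functional` are invented for this illustration and are not taken from any of the cited sources.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StrictFunction:
    """A functional relation together with a specified codomain."""
    graph: frozenset      # set of (input, output) pairs
    codomain: frozenset

    @property
    def domain(self):
        # The domain is determined by the graph: the set of first elements.
        return frozenset(a for a, _ in self.graph)

def is_functional(pairs):
    """No two distinct pairs share a first element."""
    firsts = [a for a, _ in pairs]
    return len(firsts) == len(set(firsts))

A = frozenset({1, 2})
B = frozenset({1, 2, 3})
graph = frozenset((a, a) for a in A)

identity = StrictFunction(graph, codomain=A)    # 1_A : A -> A
inclusion = StrictFunction(graph, codomain=B)   # i : A -> B

print(is_functional(graph))               # the shared graph is functional
print(identity == inclusion)              # strict definition compares codomains too
print(identity.graph == inclusion.graph)  # relational definition compares graphs only
```

Under the strict definition the two objects are unequal because their codomains differ. Under the relational definition only the graphs are compared, and they coincide.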

These definitions are subject to variations:

Variations in the strict definition: Some authors use "range" for "codomain" in the definition, and some don't make it clear that two functions with the same functional relation but different codomains are different functions.

Variations in the relational definition: Most such definitions state explicitly that the domain and range are determined by the relation (the set of first coordinates and the set of second coordinates). 

Formalism

There are many other variations in the formalism used in the definition.  For example, the strict definition can be formalized (as in Wikipedia) as an ordered triple $(A, B, f)$ where $A$ and $B$ are sets and $f$ is a functional relation with the property that every element of $A$ is the first element of an ordered pair in the relation.  

You could of course talk about an ordered triple $(A,f,B)$ blah blah.  Such definitions introduce arbitrary constructions that have properties irrelevant to the concept of function.  Would you ever say that the second element of the function $f(x)=x+1$ on the reals is the set of real numbers?  (Of course, if you used the formalism $(A,f,B)$ you would have to say the second element of the function is its graph! )

It is that kind of thing that led me to use a specification instead of a definition.  If you pay attention to such irrelevant formalism there seem to be many definitions of function.  In fact, at the university level there are only two, the strict definition and the relational definition.  The usage varies by discipline and age.  Younger mathematicians are more likely to use the strict definition.  Topologists use the strict definition more often than analysts (I think).

Usage

There is also variation in usage.

  • Most authors don't tell you which definition they use, and it often doesn't matter anyway. 
  • If an author defines a function using a formula, there is commonly an implicit assumption that the domain includes everything for which the formula is well-defined.  (The "everything" may be modified by referring to it as an integer, real, or complex function.)

Definitions of function on the web

Below are some definitions of function that appear on the web.  I have excluded most definitions aimed at calculus students or below; they often assume you are talking about numbers and formulas.  I have not surveyed textbooks and research papers.  That would have to be done for a proper scholarly article about mathematical usage of "function". But most younger people get their knowledge from the web anyway.

  1. Abstractmath draft article: Functions: Specification and Definition.  (Note:  Right now you can't get to this from the Table of Contents; you have to click the preceding link.) 
  2. Gyre&Gimble post: Definition of "function"
  3. Intmath discussion of function: Function as functional relation between numbers, with induced domain and range.
  4. Mathworld definition of function: Functional-relation definition.  Defines $F:A\to B$ in a way that requires $B$ to be the image.
  5. Planet Math definition of function: Strict definition.
  6. Prime Encyclopedia of Mathematics: Functional-relation definition.
  7. Springer Encyclopedia of Math definition of function: Strict definition, except not clear if different codomains mean different functions.
  8. Wikipedia definition of function: Discusses both definitions.
  9. Wisconsin Department of Public Instruction Definition of function: Function as functional relation.

Mathematical usage

Comments about mathematical usage, extending those in my post on abuse of notation.

Geoffrey Pullum, in his post Dogma vs. Evidence: Singular They, makes some good points about usage that I want to write about in connection with mathematical usage.  There are two different attitudes toward language usage abroad in the English-speaking world. (See Note [1])

  • What matters is what people actually write and say.   Usage in this sense may often be reported with reference to particular dialects or registers, but in any case it is based on evidence, for example citations of quotations or a linguistic corpus.  (Note [2].)  This approach is scientific.
  • What matters is what a particular writer (of usage or style books) believes about  standards for speaking or writing English.  Pullum calls this "faith-based grammar".  (People who think in this way often use the word "grammar" for usage.)  This approach is unscientific.

People who write about mathematical usage fluctuate between these two camps.

My writings in the Handbook of Mathematical Discourse and abstractmath.org are mostly evidence based, with some comments here and there deprecating certain usages because they are confusing to students.  I think that is about the right approach.  Students need to know what is actual mathematical usage, even usage that many mathematicians deprecate.

Most math usage that is deprecated (by me and others) is deprecated for a reason.  This reason should be explained, and that is enough to stop it being faith-based.  To make it really scientific you ought to cite evidence that students have been confused by the usage.  Math education people have done some work of this sort.  Most of it is at the K-12 level, but some have worked with college students observing the way they solve problems or how they understand some concepts, and this work often cites examples.

Examples of usage to be deprecated

 

Powers of functions

$f^n(x)$ can mean either iterated composition or multiplication of the values.  For example, $f^2(x)$ can mean $f(x)f(x)$ or $f(f(x))$.  This is exacerbated by the fact that in undergrad calculus texts, $\sin^{-1}x$ refers to the arcsine, and $\sin^2 x$ refers to $\sin x\sin x$.  This causes innumerable students trouble.  It is a Big Deal.
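The two readings really do give different numbers, which a few lines of Python make plain. This is a sketch; the helper names `compose_power` and `value_power` are invented for this illustration.

```python
import math

def compose_power(f, n):
    """f^n read as n-fold composition: f(f(...f(x)...))."""
    def result(x):
        for _ in range(n):
            x = f(x)
        return x
    return result

def value_power(f, n):
    """f^n read as the n-th power of the value: (f(x))^n."""
    return lambda x: f(x) ** n

x = 1.0
print(compose_power(math.sin, 2)(x))  # sin(sin(1))
print(value_power(math.sin, 2)(x))    # (sin 1)^2

# The same clash for exponent -1: inverse function versus reciprocal.
print(math.asin(0.5), 1 / math.sin(0.5))
```

A programming language forces you to say which reading you mean. The notation on the page does not.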

In

Set "in" another set.  This is discussed in the Handbook.  My impression is that for students the problem is that they confuse "element of" with "subset of", and the fact that "in" is used for both meanings is not the primary culprit.  That's because most sets in practice don't have both sets and non-sets as elements.  So the problem is a Big Deal when students first meet with the concept of set, but the notational confusion with "in" is only a Small Deal.

Two

This is not a Big Deal.  But I have personally witnessed students (in upper level undergrad courses) that were confused by this.

Parentheses

The many uses of parentheses, discussed in abstractmath.  (The Handbook article on parentheses gives citations, including one in which the notation "$(a,b)$" means open interval once and GCD once in the same sentence!)  I think the only part that is a Big Deal, or maybe Medium Deal, is the fact that the value of a function $f$ at an input $x$ can be written either "$f\,x$" or as "$f(x)$".  In fact, we do without the parentheses when the name of the function is a convention, as in $\sin x$ or $\log x$, and with the parentheses when it is a variable symbol, as in "$f(x)$".  (But a substantial minority of mathematicians use $f\,x$ in the latter case.  Not to mention $xf$.)  This causes some beginning calculus students to think "$\sin x$" means "sin" times $x$.

More

The examples given above are only a sampling of troubles caused by mathematical notation.   Many others are mentioned in the Handbook and in Abstractmath, but they are scattered.  I welcome suggestions for other examples, particularly at the college and graduate level. Abstractmath will probably have a separate article listing the examples someday…

Notes

[1] The situation Pullum describes for English is probably different in languages such as Spanish, German and French, which have Academies that dictate usage for the language.  On the other hand, from what I know about them most speakers of those languages ignore their dictates.

[2] Actually, they may use more than one corpus, but I didn't want to write "corpuses" or "corpora" because either way I would get sharp comments from faith-based usage people.

References on mathematical usage

Bagchi, A. and C. Wells (1997), Communicating Logical Reasoning.

Bagchi, A. and C. Wells (1998)  Varieties of Mathematical Prose.

Bullock, J. O. (1994), ‘Literacy in the language of mathematics’. American Mathematical Monthly, volume 101, pages 735–743.

de Bruijn, N. G. (1994), ‘The mathematical vernacular, a language for mathematics with typed sets’. In Selected Papers on Automath, Nederpelt, R. P., J. H. Geuvers, and R. C. de Vrijer, editors, volume 133 of Studies in Logic and the Foundations of Mathematics, pages 865–935.

Epp, S. S. (1999), ‘The language of quantification in mathematics instruction’. In Developing Mathematical Reasoning in Grades K-12. Stiff, L. V., editor (1999),  NCTM Publications.  Pages 188–197.

Gillman, L. (1987), Writing Mathematics Well. Mathematical Association of America

Higham, N. J. (1993), Handbook of Writing for the Mathematical Sciences. Society for Industrial and Applied Mathematics.

Knuth, D. E., T. Larrabee, and P. M. Roberts (1989), Mathematical Writing, volume 14 of MAA Notes. Mathematical Association of America.

Krantz, S. G. (1997), A Primer of Mathematical Writing. American Mathematical Society.

O'Halloran, K. L.  (2005), Mathematical Discourse: Language, Symbolism And Visual Images.  Continuum International Publishing Group.

Pimm, D. (1987), Speaking Mathematically: Communications in Mathematics Classrooms.  Routledge & Kegan Paul.

Schweiger, F. (1994b), ‘Mathematics is a language’. In Selected Lectures from the 7th International Congress on Mathematical Education, Robitaille, D. F., D. H. Wheeler, and C. Kieran, editors. Sainte-Foy: Presses de l’Université Laval.

Steenrod, N. E., P. R. Halmos, M. M. Schiffer, and J. A. Dieudonné (1975), How to Write Mathematics. American Mathematical Society.

Wells, C. (1995), Communicating Mathematics: Useful Ideas from Computer Science.

Wells, C. (2003), Handbook of Mathematical Discourse

Wells, C. (ongoing), Abstractmath.org.


Definition of “function”

I have made a major revision of the abstractmath.org article Functions: Specification and Definition.   The links from the revised article lead into the main abstractmath website, but links from other articles on the website still go back to the old version. So if you click on a link in the revised version, make it come up in a new window.

I expect to link the revision in after I make a few small changes, and I will take into account any comments from you all.

Remarks

1.  You will notice that the new version is in PDF instead of HTML.  A couple of other articles on the website are already in PDF, but I don’t expect to continue replacing HTML by PDF.   It is too much work.  Besides, you can’t shrink it to fit tablets.

2. It would also have been a lot of work to adapt the revision so that I could display it directly on Word Press.  In some cases I have written revisions first in WP and then posted them on the abmath website.  That is not so difficult and I expect to do it again.


Tangents

The interactive examples in this post require installing Wolfram CDF player, which is free and works on most desktop computers using Firefox, Safari and Internet Explorer, but not Chrome. The source code is the Mathematica Notebook Tangent Line.nb, which is available for free use under a Creative Commons Attribution-ShareAlike 2.5 License. The notebook can be read by CDF Player if you cannot make the embedded versions in this post work.

This is an experiment in exposition of the mathematical concepts of tangent.  It follows the same pattern as my previous post on secant, although that post has explanations of my motivation for this kind of presentation that are not repeated here.

Tangent line

A line is tangent to a curve (in the plane) at a given point if all the following conditions hold (Wikipedia has more detail.):

  1. The line is a straight line through the point.
  2. The curve goes through that point.
  3. The curve is differentiable in a neighborhood of the point.
  4. The slope of the straight line is the same as the derivative of the curve at that point.

In this picture the curve is $ y=x^3-x$ and the tangent is shown in red. You can click on the + signs for additional controls and information.
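For readers without CDF Player, the tangent line in this picture can be computed directly from condition 4. The sketch below is in Python rather than Mathematica, and the names `f_prime` and `tangent_line` are made up for this illustration.

```python
def f(x):
    return x**3 - x

def f_prime(x):
    """Derivative of x^3 - x."""
    return 3 * x**2 - 1

def tangent_line(a):
    """Return the function x -> f(a) + f'(a)(x - a)."""
    slope = f_prime(a)
    return lambda x: f(a) + slope * (x - a)

a = 1.0
line = tangent_line(a)
print(line(a) == f(a))        # the line passes through (a, f(a))
print(f_prime(a))             # its slope is f'(1) = 2
print(line(-2.0), f(-2.0))    # the same line meets the curve again at x = -2
```

The last line shows that a tangent line may meet the curve at a second point far from the point of tangency.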

Etymology and metaphor

The word “tangent” comes from the Latin word for “touching”. (See Note below.) The early scholars who talked about “tangent” all read Latin and knew that the word meant touching, so the metaphor was alive to them.

The mathematical meaning of “tangent” requires that the tangent line have slope equal to the derivative of the curve at the point of contact. All of the red lines in the picture below touch the curve at the point (0, 1.5). None of them are tangent to the curve there because the curve has no derivative at the point:

The curve in this picture is defined by

The mathematical meaning restricts the metaphor. The red lines you can generate in the graph all touch the curve at one point, in fact at exactly one point (because I made the limits on the slider -1 and 1), but they are not tangent to the curve.

Tangents can hug!

On the other hand, “touching” in English usage includes maintaining contact on an interval (hugging!) as well as just one point, like this:

The blue curve in this graph is given by

The green curve is the derivative dy/dx. Notice that it has corners at the endpoints of the unit interval, so the blue curve has no second derivative there. (See my post Curvature).

Tangent lines in calculus usually touch at the point of tangency and not nearby (although they can cross the curve somewhere else). But the red line above is nevertheless tangent to the curve at every point on the curve defined on the unit interval, according to the definition of tangent. It hugs the curve at the straight part.

The calculus-book behavior of tangent line touching at only one point comes about because functions in calculus books are always analytic, and two analytic curves cannot agree on an open set without being the same curve.

The blue curve above is not analytic; it is not even smooth, because its second derivative is broken at $x=0$ and $x=1$. With bump functions you can get pictures like that with a smooth function, but I am too lazy to do it.

Tangent on the unit circle

In trigonometry, the value of the tangent function at an angle $ \theta$ erected on the x-axis is the length of the segment of the tangent at (1,0) to the unit circle (in the sense defined above) measured from the x-axis to the tangent’s intersection with the secant line given by the angle. The tangent line segment is the red line in this picture:


This defines the tangent function for $ -\frac{\pi}{2} < \theta < \frac{\pi}{2}$.

The tangent function in calculus

That is not the way the tangent function is usually defined in calculus. It is given by $\tan\theta=\frac{\sin\theta}{\cos\theta}$, which is easily seen by similar triangles to be the same on $-\frac{\pi}{2} < \theta < \frac{\pi}{2}$.
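The similar-triangles argument amounts to rescaling the point $(\cos\theta,\sin\theta)$ until its first coordinate is $1$. Here is a minimal Python sketch of that construction. The function name is invented, and it returns a signed height (negative below the x-axis), which is an assumption about how the picture treats negative angles.

```python
import math

def tangent_segment_length(theta):
    """Signed height at which the ray at angle theta meets the line x = 1.

    The line x = 1 is the tangent to the unit circle at (1, 0).
    Valid for -pi/2 < theta < pi/2.
    """
    if not -math.pi / 2 < theta < math.pi / 2:
        raise ValueError("theta must lie strictly between -pi/2 and pi/2")
    # The ray passes through (cos theta, sin theta); scale it so that x = 1.
    scale = 1 / math.cos(theta)
    return scale * math.sin(theta)

for theta in (-1.2, -0.3, 0.0, 0.7, 1.5):
    print(theta, tangent_segment_length(theta), math.tan(theta))
```

The geometric value and the library's `math.tan` should agree on the whole open interval.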

We can now see the relationship between the geometric and the $ \frac{\sin\theta}{\cos\theta}$ definition of the tangent function using this graph:


The red segment and the green segment are always the same length.
It might make sense to extend the geometric definition to $ \frac{\pi}{2} < \theta < \frac{3\pi}{2}$ by constructing the tangent line to the unit circle at (-1,0), but then the definition would not agree with the $ \frac{\sin\theta}{\cos\theta}$ definition.

References


Case Study in Exposition: Secant

The interactive examples in this post require installing Wolfram CDF player, which is free and works on most desktop computers using Firefox, Safari and Internet Explorer, but not Chrome. The source code comes from several Mathematica notebooks listed in the References. The notebooks are available for free use under a Creative Commons Attribution-ShareAlike 2.5 License. The notebook can be read by CDF Player if you cannot make the embedded versions in this post work.

Pictures, metaphors and etymology

Math texts and too many math teachers do not provide enough pictures and metaphors to help students understand a concept.  I suspect that the etymology of the technical terms might also be useful. This post is an experimental exposition of the math concept of “secant” that uses pictures, metaphors and etymology to describe the concept.

The exposition is interlarded with comments about what I am doing and why.  An exposition directly aimed at students would be slimmer — but some explanations of why you are doing such and such in an exposition are not necessarily out of place every time!

Secant Line

The word “secant” is used in various related ways in math.  To start with, a secant line on a curve is the unique line determined by two distinct points on the curve, like this:


The word “secant” comes from the Latin word for “cut”, which came from the Indo-European root “sek”, meaning “cut”.  The IE root also came directly into English via various Germanic sound changes to give us “saw” and “sedge”.

The picture

Showing pictures of mathematical objects that the reader can fiddle with may make it much easier to understand a new concept.  The static picture you get above by keeping your mitts off the sliders requires imagining similar lines going through other pairs of points. When you wiggle the picture you see similar lines going through other pairs of points.  You also get a very strong understanding of how the secant line is a function of the two given points.  I don’t think that is obvious to someone without some experience with such things.

This belief contains the hidden claim that individuals vary a lot on how they can see the possibilities in a still picture that stands as an example of a lot of similar mathematical objects.  (Math books are full of such pictures.)  So people who have not had much practice learning about possible variation in abstract structures by looking at one motionless one will benefit from using movable parametrized pictures of various kinds.  This is the sort of claim that is amenable to field testing.

The metaphor

Most metaphors are based on a physical phenomenon.  The mathematical meanings of “secant” use the metaphor of cutting.  When the word “secant” was first introduced by a European writer (see its etymology) in the 16th century, the word really was a metaphor.   In those days essentially every European scholar read Latin. To them “secant” would transparently mean “cutting”.  This is not transparent to many of us these days, so the metaphor may be hidden.

If you examine the metaphor you realize that (like all metaphors) it involves making some remarkably subtle connections in your brain.

  • The straight line does not really cut the curve.  Indeed, the curve itself is both an abstract object that is not physical, so can’t be cut, and also the picture you see on the screen, which is physical, but what would it mean to cut it?  Cut the screen?  The line can’t do that.
  • You can make up a story that (for example) the use was suggested by the mental image of a mark made by a knife edge crossing the plane at points a and b that looks like it is severing the curve.
  • The metaphor is restricted further by saying that it is determined by two points on the curve.   This restriction turns the general idea of secant line into a (not necessarily faithful!) two-parameter family of straight lines.  You could define such a family by using one point on the curve and a slope, for example.  This particular way of doing it with two points on the curve leads directly to the concept of tangent line as limit.
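The two-point parametrization and the limit idea can be tried numerically. What follows is only a minimal sketch: the curve $f(x)=x^3-x$ and the names `secant_slope` and `secant_line` are assumptions made up for illustration, not anything taken from the picture above. It builds the secant line from two points on the curve and then slides the second point toward the first.

```python
def secant_slope(f, a, b):
    """Slope of the line through (a, f(a)) and (b, f(b)); needs a != b."""
    if a == b:
        raise ValueError("a secant line needs two distinct points")
    return (f(b) - f(a)) / (b - a)


def secant_line(f, a, b):
    """The secant line through (a, f(a)) and (b, f(b)), as a function of x."""
    m = secant_slope(f, a, b)
    return lambda x: f(a) + m * (x - a)


# Hypothetical example curve (an assumption, not the curve in the post).
def f(x):
    return x**3 - x


a = 1.0
# Slide the second point toward the first: b = a + h with h shrinking.
slopes = [secant_slope(f, a, a + h) for h in (1.0, 0.1, 0.01, 0.001)]

# For this f the derivative is 3x^2 - 1, so the tangent slope at x = 1 is 2.
# The secant slopes should get close to that value as h shrinks.
for h, m in zip((1.0, 0.1, 0.01, 0.001), slopes):
    print(f"h = {h:<6} secant slope = {m}")
```

The pair $(a,b)$ is the two-parameter input and the line is the output, which is the sense in which the secant line is a function of the two given points.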

Secant on circle

Another use of the word “secant” is the red line in this picture:


This is the secant line on the unit circle determined by the origin and one point on the circle, with one difference: The secant of the angle is the line segment between the origin and the point on the curve.  This means it corresponds to a number, and that number is what we mean by “secant” in trigonometry.

To the ancient Greeks, a (positive) number was the length of a line segment.

The definition

The secant of an angle $\theta$ is usually defined as $\frac{1}{\cos\theta}$, which you can see by similar triangles is the length of the red line in the picture above.

In many parts of the world, trig students don’t learn the word “secant”. They simply use $\frac{1}{\cos\theta}$.

This illustrates important facts about definitions:

  • Different equivalent definitions all make the same theorems true.
  • Different equivalent definitions can give you a very different understanding of the concept.

The red-line-segment-in-picture definition gives you a majorly important visual understanding of the concept of “secant”.  You can tell a lot about its behavior right off (it goes to infinity as $\theta$ approaches $\pi/2$, for example).

The definition $\sec\theta=\frac{1}{\cos\theta}$ gives you a way of computing $\sec\theta$.  It also reduces the definition of $\sec\theta$ to a previously known concept.
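Here is a rough numerical check that the two definitions agree. It rests on an assumption about the construction, since the picture is not reproduced here: the geometric secant is taken to be the segment from the origin, along the ray at angle $\theta$, to the vertical line $x=1$ that is tangent to the unit circle at $(1,0)$. The function names are made up for this sketch, and it only covers angles with $0\le\theta<\pi/2$.

```python
import math


def secant_by_geometry(theta):
    """Length of the segment from the origin to the point where the ray at
    angle theta meets the line x = 1 (assumed construction, 0 <= theta < pi/2)."""
    # The ray meets x = 1 at the point (1, tan(theta)).
    return math.hypot(1.0, math.tan(theta))


def secant_by_formula(theta):
    """The usual computational definition: 1 / cos(theta)."""
    return 1.0 / math.cos(theta)


for theta in (0.1, 0.5, 1.0, 1.5):
    geometric = secant_by_geometry(theta)
    computed = secant_by_formula(theta)
    # Equivalent definitions should give the same number.
    print(theta, geometric, computed, math.isclose(geometric, computed))

# The segment gets longer without bound as theta approaches pi/2.
near_right_angle = [secant_by_geometry(math.pi / 2 - eps) for eps in (0.1, 0.01, 0.001)]
print(near_right_angle)
```

Both functions return the same values, but only the first one tells you why the number blows up near $\pi/2$: the ray becomes nearly parallel to the line it has to reach.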

It used to be common to give only the $\frac{1}{\cos\theta}$ definition of secant, with no mention of the geometric idea behind it.  That is a crime.  Yes, I know many students don’t want to “understand” stuff, they only want to know how to do the problems.  Teachers need to talk them out of that attitude.  One way to do that in this case is to test them on the geometric definition.

Etymology

This idea was known to the Arabs, and brought into European view in the 16th century by Danish mathematician Thomas Fincke in “Geometria Rotundi” (1583), where the first known use of the word “secant” occurs.  I have not checked, but I suspect from the title of the book that the geometric definition was the one he used in the book.

It would be interesting to know the original Arabic name for secant, and what physical metaphor it is based on.  A cursory search of the internet gave me the current name in Arabic for secant but nothing else.

Graph of the secant function

The familiar graph of the secant function can be seen as generated by the angle sweeping around the curve, as in the picture below. The two red line segments always have the same length.


References

Mathematica notebooks used in this post:

 
