
Dysfunctions in doing math II

This post continues Dysfunctions in doing math I, with some more revisions to the article in abstractmath on dysfunctions.


First Myth

MYTH: There are two kinds of mathematical objects: "sets" and "elements".

This is the TRUTH: Being an element is not a property that some math objects have and others don’t. “Element” is a binary relation; it relates an object and a set. So “$3$ is an element” means nothing, but “$3$ is an element of the set of integers” is true and relates two mathematical objects to each other.

Any mathematical object can be an element of a set.
In particular, any set can be an element of another set.


  • The number $42$ is not a set, but it is an element of the set $\{5,10,41,42,-30\}$.
  • The sine function is not a set, but it is an element of the set of all differentiable functions defined on the real numbers.
  • The set $\{1,2,5\}$ is a set, but it is also an element of the set $\left\{\{1,2,5\},\{3,5\}, \emptyset,\{42\}\right\}$. It is not an element of the set $\{1,2,3,4,5\}$.
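The three examples can be tried out in Python. This is only a rough sketch: it assumes that Python's `frozenset` is an acceptable stand-in for a mathematical set that can itself be an element, and the names `A`, `family` and `numbers` are made up for the illustration.

```python
# frozenset is hashable, so it can be an element of another set.
A = frozenset({1, 2, 5})
family = {A, frozenset({3, 5}), frozenset(), frozenset({42})}
numbers = {1, 2, 3, 4, 5}

a_in_family = A in family         # "element of" relates A to family
a_in_numbers = A in numbers       # the elements of numbers are integers, not sets
a_subset_numbers = A <= numbers   # "subset of" is a different relation

print(a_in_family, a_in_numbers, a_subset_numbers)
```

Notice that the question "is `A` an element?" cannot even be typed in: `in` needs something on both sides.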

If you find these examples confusing, read this.

Second Myth

MYTH: The empty set is an element of every set.

This is the TRUTH:
The empty set is an element of a set $S$ if and only if the definition of $S$ requires it to be an element.


  • The empty set is not an element of every set. It is not an element of the set $\{2,3\}$ for example; that set has only the elements $2$ and $3$.
  • The empty set is an element of the set $\{2,3,\emptyset\}$.
  • The empty set is a subset of every set.
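The difference between "element of" and "subset of" for the empty set can be checked the same way. Again this is a sketch that assumes `frozenset()` models $\emptyset$; the names `S` and `T` are hypothetical.

```python
empty = frozenset()
S = {2, 3}
T = {2, 3, empty}

empty_in_S = empty in S        # S has only the elements 2 and 3
empty_in_T = empty in T        # T was defined to contain the empty set
empty_subset_S = empty <= S    # the empty set is a subset of every set
empty_subset_T = empty <= T

print(empty_in_S, empty_in_T, empty_subset_S, empty_subset_T)
```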

Other ways to misunderstand sets

The myths just listed are explicit; students tell them to each other. The articles below tell you about other misunderstandings about sets, which are usually subconscious.


An enthymeme is an argument based partly on unexpressed beliefs. Beginners at the art of writing proofs often produce enthymemes.


In the process of showing that the intersection of two equivalence relations $E$ and $E’$ is also an equivalence relation, a student may write “$E\cap E’$ is transitive because $E$ and $E’$ are transitive.”

  • This is an enthymeme; it omits stating, much less proving, that the intersection of transitive relations is transitive.
  • The student may “know” that it is obvious that the intersection of transitive relations is transitive, having never considered the similar question of the union of transitive relations.
  • It is very possible that the student possesses (probably subconsciously) a malrule to the effect that for any property $P$ the union or intersection of relations with property $P$ also has property $P$.
  • The instructor very possibly suspects this. For some students, of course, the suspicion will be unjustified, but for which ones?
  • This sort of thing is a frequent source of tension between student and instructor: “Why did you take points off because I assumed the intersection of transitive relations is transitive? It’s true!”
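The reason the instructor wants the step justified shows up in a tiny computation. The sketch below represents a relation as a Python set of pairs (an assumption made only for this illustration) and exhibits two transitive relations whose intersection is transitive but whose union is not.

```python
def is_transitive(relation):
    """True if (a, b) and (b, c) in relation always imply (a, c) in relation."""
    return all(
        (a, d) in relation
        for (a, b) in relation
        for (c, d) in relation
        if b == c
    )

E1 = {(1, 2)}   # transitive: no two pairs chain together
E2 = {(2, 3)}   # transitive for the same reason

both_transitive = is_transitive(E1) and is_transitive(E2)
intersection_ok = is_transitive(E1 & E2)
union_ok = is_transitive(E1 | E2)   # (1, 2) and (2, 3) are there, (1, 3) is not

print(both_transitive, intersection_ok, union_ok)
```

One example does not prove that intersections of transitive relations are always transitive, of course. It only shows that the malrule "unions and intersections preserve every property" fails for union.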


A malrule is an incorrect rule for syntactic transformation of a mathematical expression.


The malrule $\sqrt{x+y}=\sqrt{x}+\sqrt{y}$ invented by algebra students may come from the pattern given by the distributive law $a(x+y)=ax+ay$. The malrule invented by many first year calculus students that transforms $\frac{d(uv)}{dx}$ to $\frac{du}{dx}\frac{dv}{dx}$ may have been generated by extrapolating from the correct rule
\[\frac{d(u+v)}{dx}=\frac{du}{dx}+\frac{dv}{dx}\] by changing addition to multiplication. Both are examples of “every operation is linear”, which students want desperately to be true, although they are not aware of it.
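Both malrules fail on the first numbers you try. The following sketch checks them numerically; the sample functions $u(x)=x$ and $v(x)=x^2$, the point $x_0=3$ and the central-difference helper `derivative` are all choices made for this illustration.

```python
import math

# sqrt(x + y) versus sqrt(x) + sqrt(y) at x = 9, y = 16
sqrt_of_sum = math.sqrt(9 + 16)
sum_of_sqrts = math.sqrt(9) + math.sqrt(16)

def derivative(f, x, h=1e-6):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def u(x):
    return x

def v(x):
    return x ** 2

x0 = 3.0
true_value = derivative(lambda x: u(x) * v(x), x0)     # derivative of x**3 at 3
malrule_value = derivative(u, x0) * derivative(v, x0)  # what the malrule predicts
product_rule = derivative(u, x0) * v(x0) + u(x0) * derivative(v, x0)

print(sqrt_of_sum, sum_of_sqrts)
print(true_value, malrule_value, product_rule)
```

The true derivative and the product rule agree (both are close to $27$), while the malrule gives a value close to $6$.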

Existential bigamy

Beginning abstract math students sometimes make a particular type of mistake that occurs in connection with a property $P$ of a mathematical object $x$ that is defined by requiring the existence of an item $y$ with a certain relationship to $x$. When students have a proof that assumes that there are two items $x$ and $x’$ with property $P$, they sometimes assume that the same $y$ serves for both of them. This mistake is called existential bigamy: The fact that Muriel and Bertha are both married (there is a person to whom Muriel is married and there is a person to whom Bertha is married) doesn’t mean they are married to the same person.


Let $m$ and $n$ be integers. By definition, $m$ divides $n$ if there is an integer $q$ such that $n=qm$. Suppose you are asked to prove that if $m$ divides both $n$ and $p$, then $m$ divides $n+p$. If you begin the proof by saying, “Let $n = qm$ and $p = qm$…” then you are committing existential bigamy.

You need to begin the proof this way: “Let $n = qm$ and $p = q’m$…”
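A concrete instance makes the two witnesses visible. In this sketch the helper `witness` and the numbers $m=3$, $n=6$, $p=15$ are invented for the illustration.

```python
def witness(m, n):
    """Return an integer q with n == q * m, or None if m does not divide n."""
    if m == 0:
        return 0 if n == 0 else None
    q, r = divmod(n, m)
    return q if r == 0 else None

m, n, p = 3, 6, 15
q = witness(m, n)         # the q that works for n
q_prime = witness(m, p)   # the q' that works for p

# The witness for n + p is q + q', which only makes sense with two names.
sum_witness = witness(m, n + p)
print(q, q_prime, sum_witness)
```

Here $q=2$ and $q’=5$ are different, and the witness for $n+p$ is $q+q’=7$.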

Inverse image demo revisited

This post is an update of the post Demonstrating the inverse image of a function.

To manipulate the demos in this post, you must have Wolfram CDF Player installed on your computer. It is available free from the Wolfram website. The code for the demos, with some explanatory remarks, is in the file InverseImage.nb on my Mathematica website. That website also includes some other examples as .cdf files.

  • If the diagrams don’t appear, or appear but show pink diagrams, or if the formulas in the text are too high or too low, refresh the screen.
  • The vertical red interval has the horizontal green interval(s) as inverse image.
  • You can move the sliders back and forth to move to different points on the curve. As you move the sliders back and forth you will see the inverse image breaking up into a disjoint union of intervals, merging into a single interval, or disappearing entirely.
  • The arrow at the upper right makes it run automatically in a not very useful sort of way.
  • If you are using Mathematica, you can enter values into the boxes, but if you are using CDF Player, you can only change the number using the slider or the plus and minus incrementers.


This is the graph of $y=x^2-1$.
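For this first function the inverse image of an interval can be worked out by hand, and a few lines of Python reproduce the three behaviors the demo shows. This is a sketch written for this one function only; it is not the Mathematica code in InverseImage.nb, and the name `inverse_image_of_interval` is hypothetical.

```python
import math

def inverse_image_of_interval(c, d):
    """Inverse image of the closed interval [c, d] under f(x) = x**2 - 1,
    returned as a list of closed intervals (pairs of endpoints)."""
    if c > d or d < -1:
        return []                       # f never takes a value below -1
    outer = math.sqrt(d + 1)
    if c <= -1:
        return [(-outer, outer)]        # the two pieces merge across x = 0
    inner = math.sqrt(c + 1)
    return [(-outer, -inner), (inner, outer)]

two_pieces = inverse_image_of_interval(0, 3)
one_piece = inverse_image_of_interval(-2, 0)
nothing = inverse_image_of_interval(-3, -2)
print(two_pieces, one_piece, nothing)
```

The three calls give a disjoint union of two intervals, a single interval, and the empty set.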

The graph of $-.5 + .5 x + .2 x^2 - .19 x^3 - .015 x^4 + .01 x^5$

The graph of the rational function $0.5 x+\frac{1.5 \left(x^4-1\right)}{x^4+1}$

The graph of a straight line whose slope can be changed. You can design demos of other functions with variable parameters.

The graph of the sine function. The other demos were coded using the Mathematica Reduce function to get the inverse image. This one had to be done in an ad hoc way as explained in the InverseImage.nb file.


Dysfunctions in doing math I

I am in the middle of revising the article in abstractmath on dysfunctional attitudes and behaviors in doing math. Here are three of the sections I have finished.

Misuse of analogy

When William Rowan Hamilton was trying to understand the new type of number called quaternions (MW, Wik) that he invented, he assumed by analogy that, like multiplication of other numbers, quaternion multiplication was commutative. It was a major revelation to him that it was not.

Analogy may suggest new theorems or ways of doing things. But it is fallible. What happens particularly often in abstract math is applying a rule to a situation where it is not appropriate. This is an easy trap to fall into when the notation in two different cases has the same form; that is an example of formal analogy.

Matrix multiplication

Matrix multiplication is not commutative

If $r$ and $s$ are real numbers then the products $rs$ and $sr$ are always the same number. In other words, multiplication of real numbers is commutative : $rs = sr$ for all real numbers $r$ and $s$.

The product of two matrices $M$ and $N$ is written $MN$, just as for numbers. But matrix multiplication is not commutative. For example,
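\[\begin{pmatrix}
1 & 2 \\
3 & 4
\end{pmatrix}
\begin{pmatrix}
3 & 1 \\
3 & 2
\end{pmatrix}
=
\begin{pmatrix}
9 & 5 \\
21 & 11
\end{pmatrix}\]
but
\[\begin{pmatrix}
3 & 1 \\
3 & 2
\end{pmatrix}
\begin{pmatrix}
1 & 2 \\
3 & 4
\end{pmatrix}
=
\begin{pmatrix}
6 & 10 \\
9 & 14
\end{pmatrix}.\]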
Because $rs = sr$ for numbers, the formal similarity of the notation suggests $MN = NM$, which is wrong.

This means you can’t blindly manipulate $MNM$ to become $M^2N$. More generally, a law such as $(MN)^n=M^nN^n$ is not correct when $M$ and $N$ are matrices.
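The two products, and the failure of the shortcut from $MNM$ to $M^2N$, can be confirmed with a few lines of plain Python. The helper `matmul` is written here just for the illustration, using the same two matrices $M$ and $N$.

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    size = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]

M = [[1, 2], [3, 4]]
N = [[3, 1], [3, 2]]

MN = matmul(M, N)   # [[9, 5], [21, 11]]
NM = matmul(N, M)   # [[6, 10], [9, 14]]

MNM = matmul(MN, M)
M2N = matmul(matmul(M, M), N)

print(MN == NM, MNM == M2N)
```

Both comparisons come out false: $MN\neq NM$ and $MNM\neq M^2N$ for these matrices.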

You must understand the meanings of the symbols you manipulate.

The product of two nonzero matrices can be 0

If the product of two numbers is 0, then one or both of the numbers is zero. But that is not true for matrix multiplication:
\[\begin{pmatrix}
-2 & 2 \\
-1 & 1
\end{pmatrix}
\begin{pmatrix}
1 & 1 \\
1 & 1
\end{pmatrix}
=
\begin{pmatrix}
0 & 0 \\
0 & 0
\end{pmatrix}\]
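A quick check of this product, as a sketch in plain Python (the names `A`, `B` and the helper `matmul` are chosen for the illustration):

```python
def matmul(A, B):
    """Matrix product, pairing each row of A with each column of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

A = [[-2, 2], [-1, 1]]
B = [[1, 1], [1, 1]]
zero = [[0, 0], [0, 0]]

product = matmul(A, B)
print(product == zero, A != zero, B != zero)
```

Neither factor is the zero matrix, but the product is.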

Canceling sine

  • Beginning calculus students have already learned algebra.
  • They have learned that an expression such as $xy$ means $x$ times $y$.
  • They have learned to cancel like terms in a quotient, so that for example \[\frac{3x}{3y}=\frac{x}{y}\]
  • They have learned to write the value of a function $f$ at the input $x$ by $f(x)$.
  • They have seen people write $\sin x$ instead of $\sin(x)$ but have never really thought about it.
  • So they write \[\frac{\sin x}{\sin y}=\frac{x}{y}\]

This happens fairly often in freshman calculus classes. But you wouldn’t do that, would you?
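If you are tempted, try numbers. The values $x=1$ and $y=2$ below are arbitrary choices for this illustration.

```python
import math

x, y = 1.0, 2.0
quotient_of_sines = math.sin(x) / math.sin(y)   # sin is applied to x and to y
quotient_of_inputs = x / y                      # what "canceling sin" would give

print(quotient_of_sines, quotient_of_inputs)
```

The first number is about $0.925$ and the second is $0.5$. Writing `math.sin(x)` with its parentheses makes it harder to mistake $\sin x$ for a product.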

Boundary values of definitions

Definitions are usually inclusive

Definitions of math concepts usually include the special cases they generalize.


  • A square is a special case of rectangle. As far as I know texts that define “rectangle” include squares in the definition. Thus a square is a rectangle.
  • A straight line is a curve.
  • A group is a semigroup.
  • An integer is a real number. (But not always in computing languages — see here.)
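Python happens to show both sides of the remark about computing languages. The sketch below uses the standard `numbers` module, which models the inclusive point of view, alongside the concrete types `int` and `float`, which do not. This is a fact about Python only; other languages behave differently.

```python
import math
import numbers

int_is_real = isinstance(3, numbers.Real)             # inclusive: an integer is a real
int_is_float = isinstance(3, float)                   # exclusive: int and float are separate types
pi_is_complex = isinstance(math.pi, numbers.Complex)  # a real number is a complex number
same_value = (3 == 3.0)                               # the values still compare equal

print(int_is_real, int_is_float, pi_is_complex, same_value)
```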

But not always

  • The axioms of a field include a bunch of axioms that a one-element set satisfies, plus a special axiom that does nothing but exclude the one-element set. So a field has to have at least two elements, and that fact does not follow from the other axioms.
  • Boolean algebras are usually defined that way, too, but not always. MathWorld gives several definitions of Boolean algebra that disagree on this point.

When boundary values are not special cases

Definitions may or may not include other types of boundary values.


  • If $S$ is a set, it is a subset of itself. The empty set is also a subset of $S$.
  • Similarly the divisors of $6$ are $-6$, $-3$, $-2$, $-1$, $1$, $2$, $3$ and $6$, not just $2$ and $3$ and not just $1$, $2$, $3$ and $6$ (there are two different boundaries here).
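The list of divisors of $6$ can be generated directly from the definition. The function `divisors` below is a sketch written for this illustration; it assumes its argument is a nonzero integer.

```python
def divisors(n):
    """All integer divisors of a nonzero integer n, negative ones included."""
    positive = [d for d in range(1, abs(n) + 1) if n % d == 0]
    return sorted([-d for d in positive] + positive)

print(divisors(6))   # [-6, -3, -2, -1, 1, 2, 3, 6]
```

Both boundaries show up: $1$ and $6$ themselves are divisors, and so are the negatives.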

But …

  • The positive real numbers include everything bigger than $0$, but not $0$. ( Note).


A definition that includes such special cases may be called inclusive; otherwise it is exclusive. People new to abstract math very commonly use words defined inclusively as if their definition was exclusive.

  • They say things such as “That’s not a rectangle, it is a square!” and “Is that a group or a semigroup?”
  • They object if you say “Consider the complex number $\pi $.”

This appears to be natural linguistic behavior. Even so, math is picky-picky: a square is a rectangle, a group is a semigroup and $\pi$ is a complex number (of course, it is also a real number).


  • You attend a math lecture and the speaker starts talking about things you never heard of.
  • Your fellow students babble at you about manifolds and tensors and you thought they were car parts and lamps.
  • You suspect your professor is deliberately talking over your head to put you down.
  • You suspect your friends are trying to make you believe they are much smarter than you are.
  • You suspect your friends are smarter than you are.

There are two possibilities:

  • They are not trying to intimidate you (most common).
  • They are deliberately setting out to intimidate you with their arcane knowledge so you will know what a worm you are. (There are people like that.)

Another possibility, which can overlap with the two above, is:

  • You expect to be intimidated. You may be what might be called a co-intimidator, similar to the way someone who is codependent wants some other person to be dependent on them. (This is not like the “co” in category theory: “product” and “coproduct” have a symmetric relationship with each other, but the co-intimidator relation is asymmetric.)

There are many ways to get around being intimidated.

  • Ask “What the heck is a manifold?”
  • (In a lecture where it might be imprudent or impractical to ask) Write down what they say, then later ask a friend or look it up.
  • Most teachers like to be asked to explain something. Yes, I know some professors repeatedly put down people. Change sections! If you can’t, live with it! Not knowing something says nothing bad about you.

And remember:

If you don’t know something, probably many other students don’t know it either.

A proof by diagram chasing

In Rigorous proofs, I went through a medium-easy epsilon-delta proof in great detail as a way of showing what is hidden by the wording of the proof. In this post, I will do the same for an easy diagram-chasing proof in category theory. This theorem is stated and proved in Category Theory for Computing Science, page 365, but the proof I give here maximizes the diagram-chasing as a way of illustrating the points I want to make.

Theorem (J. Lambek) Let $F$ be a functor from a category to itself and let $\alpha:Fa\to a$ be an algebra for $F$ which is initial. Then $\alpha$ is an isomorphism.


  1. $F\alpha:FFa\to Fa$ is also an $F$-algebra.
  2. Initiality means that there is a unique algebra morphism $\eta:a\to Fa$ from $\alpha:Fa\to a$ to $F\alpha:FFa\to Fa$ for which this diagram commutes:

  3. To that diagram we can adjoin another (obviously) commutative square:

  4. Then the outside rectangle in the diagram above also commutes.
  5. This means that $\alpha\circ\eta:a\to a$ is an $F$-algebra morphism from $\alpha:Fa\to a$ to itself.
  6. Another such $F$-algebra morphism is $\text{id}_{a}$.
  7. Initiality of $\alpha$ means that the diagram below commutes:

  8. Because the upper bow and the left square both commute we are justified in inserting a diagonal arrow as below.

  9. Now we can read off the diagram that $\alpha\circ\eta=\text{id}_a$ and that $\eta\circ\alpha=F\alpha\circ F\eta=\text{id}_{Fa}$. By definition, then, $\eta$ is a two-sided inverse to $\alpha$, so $\alpha$ is an isomorphism.
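A familiar instance may help make the theorem concrete. The example is not part of the proof above and is brought in only as an illustration: for the functor $FX=1+X$ on sets, the natural numbers with the map "zero or successor" form an initial algebra, and the inverse promised by the theorem is "predecessor, or nothing at zero". The sketch models $1+X$ with Python's `Optional` (with `None` as the element of $1$) and checks the two composites on a finite sample only.

```python
from typing import Optional

def alpha(t: Optional[int]) -> int:
    """The algebra structure [zero, successor] : 1 + N -> N."""
    return 0 if t is None else t + 1

def eta(n: int) -> Optional[int]:
    """Candidate inverse N -> 1 + N: 0 goes to the point of 1, n + 1 goes to n."""
    return None if n == 0 else n - 1

naturals = range(50)
f_naturals = [None, *naturals]   # a finite sample of 1 + N

alpha_eta_is_identity = all(alpha(eta(n)) == n for n in naturals)
eta_alpha_is_identity = all(eta(alpha(t)) == t for t in f_naturals)

print(alpha_eta_is_identity, eta_alpha_is_identity)
```

A finite check is of course not a proof. It only shows what the two identities in step (9) say in a case where $\alpha$ and $\eta$ can be written down.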

Analysis of the proof

This is an analysis of the proof showing what is not mentioned in the proof, similar to the analysis in Rigorous proofs.

  • An $F$-algebra is any arrow of the form $\alpha:Fa\to a$. This definition directly verifies statement (1). You do need to know the definition of “functor” and that the notation $Fa$ means $F(a)$ and $FFa$ means $F(F(a))$.
  • When I am chasing diagrams, I visualize the commutativity of the diagram in (2) by thinking of the red path and the blue path as having the same composites in this graph:

    In other words, $F\alpha\circ F\eta=\eta\circ\alpha$. Notice that the diagram carries all the domain and codomain information for the arrows, whereas the formula “$F\alpha\circ F\eta=\eta\circ\alpha$” requires you to hold the domains and codomains in your head.

  • (Definition of morphism of $F$-algebra) The reader needs to know that a morphism of $F$ algebras is any arrow $\delta:c\to d$ for which

  • (Definition of initial $F$-algebra) $\alpha$ is an initial $F$-algebra means that for any algebra $\beta:Fb\to b$, there is a unique arrow $\delta$ for which the diagram above commutes.
  • (2) is justified by the last two definitions.
  • Pulling a “rabbit out of a hat” in a proof means introducing something that is obviously correct with no motivation, and then checking that it results in a proof. Step (9) in the proof given in Rigorous proofs has an example of adding zero cleverly. It is completely OK to pull a rabbit out of a hat in a proof, as long as the result is correct, but it makes students furious.
  • In statement (3) of the proof we are considering here, the rabbit is the trivially commutative diagram that is adjoined on the right of the diagram from (2).
  • Statement (4) uses a fact known to all diagram chasers: Two joined commutative squares make the outside rectangle commute. You can visualize this by seeing that the three red paths shown below all have the same composite. When I am chasing a complicated diagram I trace the various paths with my finger, or in my head.

    You could also show it by pointing out that $\alpha\circ F\alpha\circ F\eta=\alpha\circ\eta\circ\alpha$, but to check that I think most of us would go back and look at the diagram in (3) to see why it is true. Why not work directly with the diagram?

  • The definition of initiality requires that there be only one $F$-algebra morphism from $\alpha:Fa\to a$ to itself. This means that the upper and lower bows in (7) commute.
  • The diagonal identity arrow in (8) is justified by the fact that the upper bow is exactly the same diagram as the upper triangular diagram in (8). It follows that the upper triangle in (8) commutes. I visualize this as moving the bow down and to the left with the upper left node $Fa$ as a hinge, so that the two triangles coincide. (It needs to be flipped, too.) I should make an interactive diagram that shows this.
  • The lower triangle in (8) also commutes because the square in (2) is given to be commutative.
  • (Definition of isomorphism in a category) An arrow $f:a\to b$ in a category is an isomorphism if there is an arrow $g:b\to a$ for which these diagrams commute:


    This justifies statement (9).

Remark: I have been profligate in using as many diagrams as I want because this can be seen on a screen instead of on paper. That and the fact that much more data about domains and codomains are visible because I am using diagrams instead of equations involving composition means that the proof requires the readers to carry much less invisible data in their heads.

Rigorous proofs

Rich and rigorous

When we try to understand a math statement, we visualize what the statement says using metaphors, images and kinetic feelings to feel how it is true, or to suggest that the statement is not true.

If we are convinced that it is true, we may then want to prove it. Doing that involves pitching out all the lovely pictures and metaphors and gestures and treating the mathematical objects involved in the proof as static and inert. “Static” means the object does not change. “Inert” means that it does not affect anything else. I am saying how we think about math objects for the purpose of rigorous proof. I am not saying anything about “what math objects are”.

In this post I give a detailed example of a proof of the rigorous sort.


Informal statement

First, I’ll describe this example in typical spoken mathematical English. Suppose you suspect that the following statement is true:

Claim: Let $f(x)$ be a differentiable function with $f’(a)=0$.
Going from left to right, suppose the graph of $f(x)$ goes UP before $x$ reaches $a$ and then DOWN for $x$ to the right of $a$.
Then $a$ has to be a local maximum of the function.

This claim is written in informal math English. Mathematicians talk like that a lot. In this example they will probably wave their hands around in swoops.

The language used is an attempt to get a feeling for the graph going up to $(a,f(a))$ and then falling away from it. It uses two different metaphors for $x\lt a$ and $x\gt a$. I suspect that most of us would want to clean that up a bit even in informal writing.

A more formal statement

Theorem: Let $f$ be a real valued differentiable function defined on an open interval $R$. Let $a$ be a number in $R$ for which $f’(a)=0$. Suppose that for all $x\in R$, $f$ increases for $x\lt a$ and decreases for $x\gt a$. Then $f(a)$ is a maximum of $f$ in $R$.


  1. By definition of derivative, \[\lim_{x\to a}\frac{f(x)-f(a)}{x-a}=0.\]
  2. By definition of limit, then for any positive $\epsilon$ there is a positive $\delta$ for which if $0\lt|x-a|\lt\delta$ then \[\left|\frac{f(x)-f(a)}{x-a}\right|\lt\epsilon.\]
  3. By requiring that $\delta\lt 1$, it follows from (2) that for any positive $\epsilon$, there is a positive $\delta$ for which if $0\lt|x-a|\lt\delta$, then $|f(x)-f(a)|\lt\epsilon$.
  4. “$f$ increases for $x\lt a$” means that if $x$ and $y$ are numbers in $R$ and $x\lt y\lt a$, then $f(x)\lt f(y)$.
  5. “$f$ decreases for $x\gt a$” means that if $x$ and $y$ are numbers in $R$ and $a\lt x\lt y$, then $f(x)\gt f(y)$.
  6. “$f(a)$ is a maximum of $f$ in $R$” means that for $x\in R$, if $x\neq a$, then $f(x)\lt f(a)$.
  7. Suppose that $x\in R$ and $x\lt a$. (The case that $x\gt a$ has a symmetric proof.)
  8. Given $\epsilon\gt0$ with $\delta$ as given by (3), choose $y\in R$ such that $x\lt y\lt a$ and $|f(y)-f(a)|\lt\epsilon$.
  9. By (4), $f(x)\lt f(y)$. So by (8), \[\begin{align*}
    f(x)-f(a)&=f(x)-f(y)+f(y)-f(a)\\ &\lt f(y)-f(a)\\ &\leq|f(y)-f(a)|\lt\epsilon\end{align*}\]
    so that $f(x)\lt f(a)+\epsilon$. By inserting “$-f(y)+f(y)$” into the second formula, I am “adding zero cleverly”, an example of pulling a rabbit out of a hat. Students hate that. But you have to live with it; as long as the statements following are correct, it makes a valid proof. Rabbit-out-of-a-hat doesn’t make a proof wrong, but it does make you wonder how the author thought of it. Live with it.
  10. Since (9) is true for all positive $\epsilon$, it follows that $f(x)\leq f(a)$.
  11. By the same argument as that leading up to (10), $f(\frac{x+a}{2})\leq f(a)$.
  12. Since $x\lt\frac{x+a}{2}\lt a$, (4) gives $f(x)\lt f(\frac{x+a}{2})$, and it follows that $f(x)\lt f(a)$ as required.
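Steps (8) and (9) can be followed with actual numbers. The sketch below assumes the sample function $f(x)=-(x-a)^2$ with $a=1$ and the point $x=0$; the helper `choose_y` is one possible way of making the choice that step (8) asks for, valid for this particular $f$ only.

```python
import math

A = 1.0

def f(x):
    """Sample function with f'(A) = 0, increasing left of A and decreasing right of A."""
    return -(x - A) ** 2

def choose_y(x, eps):
    """Step (8): pick y with x < y < A and |f(y) - f(A)| < eps (for x < A)."""
    return max((x + A) / 2, A - math.sqrt(eps) / 2)

x = 0.0
checks = []
for eps in (1.0, 0.1, 0.001):
    y = choose_y(x, eps)
    checks.append(
        x < y < A
        and abs(f(y) - f(A)) < eps   # the constraint on y in step (8)
        and f(x) < f(y)              # step (4)
        and f(x) < f(A) + eps        # the conclusion of step (9)
    )

print(checks)
```

Every check succeeds, for each of the three values of $\epsilon$. Each pass through the loop uses a new $\epsilon$ and a new $y$ but the same $x$, which is how the proof treats them.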

About the proof

This proof is intended to be a typical “rigorous” proof. I suspect it tends to be more rigorous than most mathematicians would find necessary.


The point about “rigor”, about insisting that the objects be static and inert, is that this causes symbols and expressions to retain the same meaning throughout the text. This is one aspect of extensionality.

Of course, some of the symbols denote variables, or variable objects. This does not mean they are “varying”. I am taking this point of view: A variable refers to a math object but you don’t know what it is. Constraints such as $x\lt a$ rule out some possible values but don’t generally tell you exactly what $x$ is. There is more about this in Variable Objects.

The idea in (8), for example, is that $y$ denotes a real number. You don’t know which number it is, but you do know some facts about it: $x\lt y\lt a$, $|f(y)-f(a)|\lt\epsilon$ and so on. Similarly you don’t know what function $f$ is, but you do know some facts about it: It is differentiable, for example, and $f’(a)=0$.

My statement that the variables aren’t “varying” means specifically that each unbound occurrence of the variable refers to the same value as any other occurrence, unless some intervening remark changes its meaning. For example, the references to $x$ in (8) through (12) refer to the same value it has in (7), and (12), in particular, constitutes a statement that the claim about $x$ is correct.


The elimination of metaphors that lets the proof achieve rigor is part of a plan in the back of the mind of at least some mathematicians who write proofs. The idea is that the proof be totally checkable:

  • Every statement in the proof has a semantics, a meaning, that is invariant (given the remark about variables above).
  • Each statement is justified by some of the previous statements. This justification is given by two systems that the reader is supposed to understand.
  • One system is the rules of symbol manipulation that are applied to the symbolic expressions, ordinary algebra, and higher-level manipulations used in particular branches of math.
  • The other system consists of the rules of logical reasoning that justify the claims that each statement follows logically from preceding ones.
  • These two systems are really branches of one system, the entire system of math computation and reasoning. It can be obscure which system is being used in a particular step.

Suppression of reasons

The logical and symbolic-manipulation reasons justifying the deductions may not be made completely explicit. In fact, for many steps they may not be mentioned at all, and for others, one or two phrases may be used to give a hint. This is standard practice in writing “rigorous” proofs. That is a descriptive statement, made without criticism. Giving all the reasons is essentially impossible without a computer.

I am aware that some work has been done to write proof checkers that can read a theorem like the one we are considering, stated in natural language, and correctly implement the semantics I have described in this list. I don’t know of any references to such work and would appreciate information about it.

Suppression of reasons makes it difficult to mechanically check a proof written in this standard “rigorous” writing style. Basically, you must be at least at the graduate student level to be able to make sense of what is said, and even experienced math research people find it difficult to read a paper in a very different field. Writing the proof so that it can be checked by a proof checker requires understanding of the same sort, and it typically makes the proof much longer.

One hopeful new approach is to write the proofs using homotopy type theory. The pioneers in that field report that the proofs don’t expand nearly as much as is required by first order logic.

Examples of suppression

Here are many examples of suppression in the $\epsilon$-$\delta$ proof above. This is intended to raise your consciousness concerning how nearly opaque writing in math research is to anyone but the cognoscenti.

  • The first sentence of the theorem names $R$ and $f$ and puts constraints on them that can be used to justify statements in the proof. The naming of $R$ and $f$ requires that every occurrence of $R$ in the proof refers to the same mathematical object, and similarly for $f$.

Remark: The savvy reader “knows” the facts stated in the item above, possibly entirely subconsciously. For many of us there is no conscious thought of constraints and permanence of naming. My goal is to convince those who teach beginning abstract math courses to become conscious of these phenomena. This remark applies to all the following items as well.

  • The second sentence gives $a$ a specific meaning that will be maintained throughout the proof. It also puts constraints on $a$ and an additional constraint on $f$.
  • The third sentence gives a constraint on $R$, $f$ and $a$. It does not give a constraint on $x$, which is a bound variable. Nor does it name $x$ as a specific number with the same meaning in the rest of the proof. (That happens later).
  • The fact that the first three sentences impose constraints on various objects is signaled by the fact that the sentences are introduced by “let” and “suppose”. The savvy reader knows this.
  • The fourth sentence announces that “$f(a)$ is a maximum of $f$ in $R$” is a consequence of the constraints imposed by the preceding three sentences. (In other words, it follows from the context.) This is signaled by the word “then”.
  • The fact that the paragraph is labeled “Theorem” informs us that the fourth sentence is therefore a statement of what is to be proved, and that every constraint imposed by the first three sentences of the Theorem may be used in the proof.
  • In the proof, statements (1), (4), (5) and (6) rewrite the statements in the theorem according to the definitions of the words involved, namely “derivative”, “increases”, “decreases” and “maximum”. Rewriting statements according to the definitions of the words involved is a fundamental method for starting a proof.
  • (2) follows from (1) by rewriting using the definition of “limit”. Note that pattern-matching against the definition of limit requires understanding that there is a zero inside the absolute value signs that is not written down. Could a computer proof-checker handle that?
  • (3) follows from (2). The reader or proof-checker must:
    • Know that it is acceptable to put an upper bound on $\delta$ in the definition of limit.
    • Notice that you can move $|x-a|$ out of the denominator because $x\neq a$ by (2).
  • By (6), the statement we must now prove is that $f(x)\lt f(a)$ for every $x\in R$ with $x\neq a$.

Remark: In the following items, I mention the context of the proof. I am using the word informally here. It is used in some forms of formal logic with a related but more precise meaning. The context consists of the variables you must hold in your head as you read each part of the proof, along with their current constraints. “Current” means the “now” that you are in when considering the step of the proof you are reading right now. I give some references at the end of the post.

  • At the point between (6) and (7), our context consists of $a$, $R$ and $f$ all subject to some constraints. $x$ is not yet in the context of our proof because its previous occurrences in the theorem and in (1) through (6) have been bound, mostly by an unexpressed universal quantifier. Now we are to think of $x$ as a specific number bound by some constraints.
  • The statement in (7) that the case $x\gt a$ has a symmetric proof is a much higher-level claim than the other steps in this proof, even though in fact it is not very high level compared to statements such as “An application of Serre’s spectral sequence shows$\ldots$”. Most mathematicians with even a little experience will read this statement and accept it in the confidence that they will know how to swap “$\lt$” and “$\gt$” in the proof in the correct way (which is a bit picky) to provide a dual proof. Some students might write out the dual proof to make sure they understand it (more likely because writing it out was a class assignment). I await the day that an automated proof checker can handle a statement like this.
  • (8) introduces three new math objects $\epsilon$, $\delta$ and $y$ subject to several constraints. The symbols occur earlier but they are all bound. $\epsilon$ will be fixed in our context from now until (10). The others don’t appear later.
  • (9) consists of several steps of algebraic computation. A cognoscent (I am tired of writing “savvy”) reader first looks at the computation as a whole and notices that it deduces that $f(x)-f(a)\lt\epsilon$, which is almost what is to be proved. This helps the reader understand the reason for the calculation. No mention whatever is made in this step of all this stuff that should go through your mind (or the proof-checker’s “mind”).
  • The computations in (9) are basic algebra not explained step by step, except that the remark that $f(x)\lt f(y)$ explains how you get $f(x)-f(y)+f(y)-f(a) \lt f(y)-f(a)$.
  • (10) banishes $\epsilon$ from the context by universally quantifying over it. That $f(x)\leq f(a)$ follows by the garbage-dump-in-Star-Wars trick that often baffles first year analysis students: Since for all positive $\epsilon$, $f(x)\lt f(a)+\epsilon$, then $f(x)\leq f(a)$. (See also Terry Tao’s article in Tricks Wiki.)
  • (11) “By the same argument as leading up to (10)” puts some demands on the reader, who has to discover that you have to go back to (7) and do the following steps with a new context using a value of $x$ that is halfway closer to $a$ than the “old” $x$ was. This means in particular that the choice of $\frac{x+a}{2}$ is unnecessarily specific. But it works.
  • (12) suppresses the reference to (11).

References

I have written extensively on these topics. Here are some links.

  • Rich-rigorous bifurcation in math thinking
  • The symbolic language
  • Math English and the language of proofs
  • Proofs and context

Guest post by F. Kafi

Before I posted Extensional and Intensional, I had emailed a draft to F. Kafi.  The following was his response.  –cw


In your example, “Suppose you set out to prove that if $f(x)$ is a differentiable function and $f(a)=0$ and the graph going from left to right goes UP to $f(a)$ and then DOWN after that then $a$ has to be a maximum of the function”, could we have the graph of the function $f(x)$ without being aware of the internal structure of the function; i.e., the mathematical formulation of $f(x)$ such as $f(x):=-(x-a)^2$ or simply its intensional meaning? Certainly not.

Furthermore, what paves the way for the comparison with our real world experiences leading to the metaphoric thinking is nothing but the graph of the function. Therefore, it is the intensional meaning of the function which makes the metaphoric mode of thinking possible.

The intensional meaning is especially required if we are using a grounding metaphor. A grounding metaphor uses concepts from our physical and real world life. As a result we require a medium to connect such real life concepts like “going up” and “going down” to mathematical concepts like the function $f(x)$. The intensional meaning of function $f(x)$ through providing numbers opens the door of the mind to the outer world. This is possible because numbers themselves are the result of a kind of abstraction process which the famous educational psychologist Piaget calls empirical abstraction. In fact, through empirical abstraction we transform the real world experience to numbers.



Let’s consider an example. We see some racing cars in the picture above, a real world experience if you are the spectator of a car match. The empirical abstraction works something like this:



Now we may choose a symbol like "$5$" to denote our understanding of "|||||".

It is now clear that the metaphoric mode of thinking is the reverse process of “empirical abstraction”. For example, in comparing “|||||||||||” with “||||” we may say “A car race with more competing cars is much more exciting than a much less crowded one.” Therefore, “|||||||||||”>“||||”, where “>” is the abstraction of “much more exciting than”.

In the rigorous mode of thinking, the idea is almost similar. However, there is an important difference. Here again we have a metaphor. But this time, the two concepts are mathematical. There is no outer world concept. For example, we want to prove a differentiable function is also a continuous one. Both concepts of “differentiability” and “continuity” have rigorous mathematical definitions. Actually, we want to show that differentiability is similar to continuity, a linking metaphor. As a result, we again require a medium to connect the two mathematical concepts. This time there is no need to open the door of the mind to the outer world because the two concepts are in the mind. Hence, the intensional meaning of function $f(x)$ through providing numbers is not helpful. However, we need the intensional meanings of differentiability and continuity of $f(x)$; i.e., the logical definitions of differentiability and continuity.

In the case of comparing the graph of $f(x)$ with a real hill we associated dots on the graph with the path on the hill. Right? Here we need to do the same. We need to associate the $f(x)$’s in the definition of differentiability to the $f(x)$’s used in the definition of continuity. The $f(x)$’s play the role of dots on the graph. As the internal structure of dots on the graph is unimportant to the association process in the grounding metaphor, the internal structure of the $f(x)$’s in the logical definition is unimportant to the association process in the linking metaphor. Therefore, we only need the extensional meaning of the function $f(x)$; i.e., syntactically valid roles it can play in expressions.

Extensional and Intensional

This post uses the word intensional, which is not the word "intentional" and doesn't mean the same thing.


The connection between rich view/rigorous view and intensional/extensional


In the abmath article Images and Metaphors I wrote about the rigorous view of math, in contrast to the rich view which allows metaphors, images and intuition. F. Kafi has proposed the following thesis:

The rigorous mode of thinking deals with the extensional meaning of mathematical objects while the metaphoric mode of thinking deals with the intensional meaning of mathematical objects.

This statement is certainly suggestive as an analogy. I have several confused and disjointed thoughts about it.

What does "intensional" mean?


Philosophers say that "the third largest planet in the solar system" has intensional meaning and "Neptune" has extensional meaning. Among other things we might discover a planet ridiculously far out that is bigger than Neptune. But the word "Neptune" denotes a specific object.

The intensional meaning of "the third largest planet in the solar system" has a hidden time dimension that, if made overt, makes the statement more nearly explicit. (Don't read this paragraph as a mathematical statement; it is merely thrashing about to inch towards understanding.)

Computing science

Computer languages are distinguished as intensional or extensional, but the meaning there is technical, although clearly related to the philosophers' meaning.

I don't understand it very well, but in Type Theory and in Logic, an intensional language seems to make a distinction between declaring two math objects to be equal and proving that they are equal. In an extensional language there is no such distinction, with the effect that in a typed language typing would be undecidable.

Here is another point: If you define the natural numbers by the Peano axioms, you can define addition and then prove that addition is commutative. But for example a vector space is usually defined by axioms and one of the axioms is a declaration that addition of vectors is commutative. That is an imposed truth, not a deduced one. So is the difference between intensional and extensional languages really a big deal or just a minor observation?

What is "dry-bones rigor"?

Another problem is that I have never spelled out in more than a little detail what I mean by rigor, dry-bones rigor as I have called it. This is about the process mathematicians go through to prove a theorem, and I don't believe that process can be given a completely mathematical description. But I could go into much more detail than I have in the past.

Suppose you set out to prove that if $f(x)$ is a differentiable function and $f(a)=0$ and the graph going from left to right goes UP before $x$ reaches $a$ and then DOWN for $x$ to the right of $a$, then $a$ has to be a maximum of the function. That is a metaphorical description based on the solid physical experience of walking up to the top of a hill. But when you get into the proof you start using lots of epsilons and deltas. This abandons ideas of moving up and down and left to right and so on. As one of the members of Bourbaki said, rigorous math is when everything goes dead. That sounds like extensionality, but isn't their work really based on the idea that everything has to be reduced to sets and logic? (This paragraph was modified on 2013.11.07)

Many perfectly rigorous proofs are based on reasoning in category theory. You can define an Abelian group as a categorical diagram with the property that any product preserving functor to any category will result in a group. This takes you away from sets altogether, and is a good illustration of the axiomatic method. It is done by using nodes, arrows and diagrams. The group is an object and the binary operation is an arrow from the square of the object. Commutativity is required by stating that a certain diagram must commute. But when you prove that two elements in an Abelian group (an Abelian topological group, an Abelian group in the category of differentiable manifolds, or whatever) can be added in either order, then you find yourself staring at dead arrows and diagrams rather than dead collections of things and so you are still in rigor mortis mode.

I will write a separate post describing these examples in much more detail than you might want to think about.

Metaphors and intensionality

One other thing I won't go into now: How are thinking in metaphors and intensional descriptions related? It seems to me the two ideas are related somehow, but I don't know how to formulate it.


This is an update of an article in the abstractmath glossary.

About theorems

A theorem is said to be trivial to prove or trivially true


Here is a scenario that exemplifies (a):

  • A textbook defines the image of a function $F:A\to B$ to be the set of all elements of $B$ of the form $F(a)$ for some $a\in A$.
  • It then goes on to say that $F$ is surjective if for every element $b$ of $B$ there is an element $a\in A$ with the property that $F(a) = b$.
  • It then states a theorem, or gives an exercise, that says that a function $F:A\to B$ is surjective if and only if the image of $F$ is $B$.
  • The proof follows immediately by rewriting using definitions.
  • The instructor calls the proof trivial and goes on to the next topic.
  • Some students are totally baffled.

I have seen this happen many times with this and other theorems.  This sort of incident may be why many intelligent people feel they are "bad at math".
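For finite sets, the two definitions in the scenario above can be written out directly and compared. The following is a minimal sketch, not a proof: it models a function as a Python dict, and the names `image`, `is_surjective`, `A` and `B` are chosen here for illustration only. It checks every function from a three-element set to a two-element set.

```python
from itertools import product

def image(F, A):
    """Definition of image: the set of all values F(a) for a in A."""
    return {F[a] for a in A}

def is_surjective(F, A, B):
    """Definition of surjective: every b in B equals F(a) for some a in A."""
    return all(any(F[a] == b for a in A) for b in B)

A = [1, 2, 3]
B = ["x", "y"]

# Every function from A to B, each represented as a dict from inputs to outputs.
all_functions = [dict(zip(A, values)) for values in product(B, repeat=len(A))]

# The theorem says the two conditions agree for every function.
agree = all(is_surjective(F, A, B) == (image(F, A) == set(B)) for F in all_functions)
print(len(all_functions), agree)  # 8 True
```

Writing the two functions side by side shows why the proof is a rewriting: the condition inside `is_surjective` says exactly that each `b` belongs to `image(F, A)`.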

People are not born knowing the principle of rewriting by definitions. The principle needs to be TAUGHT.

  • When a class is first introduced to proof techniques the instructor should explicitly describe rewriting by definitions with several examples.
  • After that, the instructor can say that a proof follows by rewriting by definitions and make it clear that the students will have to do the work (then or later).
  • Such a proof is justly called "trivial", but saying it is trivial is also a putdown if no one has pointed out the procedure of rewriting by definitions.


This example illustrates (b).

Theorem: Let $G$ be a finite group and $H$ a subgroup of index $2$ (meaning it has half the number of elements of the group).  Then $H$ is normal in $G$.

Basic facts about groups and subgroups learned in first semester abstract algebra:

  • A subgroup of a group determines a partition consisting of left cosets and another partition of right cosets, each (in the finite case) with the same number of elements as the subgroup.
  • A subgroup is a left coset of itself and also a right coset of itself.
  • If every left coset is also a right coset and vice versa (so the two partitions just mentioned are the same), then by definition the subgroup is normal in the group.

Now if $H$ has index $2$ that means that each partition consists of two cosets. In both cases, one of them has to be $H$, so the other one has to be $G\setminus H$, which must therefore be a left and right coset of $H$. So $H$ is normal in $G$.

So once you understand the basics about cosets and normal subgroups, the fact that $H$ has to be normal if it is of index $2$ is "obvious". I don't think you should call this "trivial". Best to say it is "obvious if you have a clear understanding of cosets of groups".
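The coset argument can be checked on a small case. The sketch below is an illustration under assumptions chosen here, not part of the original article: it takes $G$ to be the symmetric group on three letters with permutations stored as tuples, $H$ the subgroup of rotations (index $2$), and, for contrast, a two-element subgroup $K$ of index $3$. The helper names `compose` and `cosets` are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Return the permutation 'p after q' of {0, 1, 2}, as a tuple of images."""
    return tuple(p[q[i]] for i in range(len(q)))

def cosets(G, H):
    """Return the set of left cosets gH and the set of right cosets Hg."""
    left = {frozenset(compose(g, h) for h in H) for g in G}
    right = {frozenset(compose(h, g) for h in H) for g in G}
    return left, right

G = set(permutations(range(3)))          # the symmetric group S_3 (6 elements)
H = {(0, 1, 2), (1, 2, 0), (2, 0, 1)}    # the three rotations: index 2
K = {(0, 1, 2), (1, 0, 2)}               # identity and one swap: index 3

left_H, right_H = cosets(G, H)
left_K, right_K = cosets(G, K)

# For index 2 both partitions are {H, G \ H}, so they coincide.
print(left_H == right_H == {frozenset(H), frozenset(G - H)})  # True
# For the index 3 subgroup the two partitions differ, so K is not normal.
print(left_K == right_K)                                       # False
```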

About mathematical objects

  • A function may be called trivial if it is the identity function or a constant function, and possibly in other circumstances. (If someone showed that the cosmological constant is 0 that would not be called trivial.)
  • A solution to an equation may be said to be trivial if it is 0 or 1. There may be other situations in which a solution is called "trivial" as well.
  • A mathematical structure is said to be trivial if its underlying set is empty or a singleton set. In particular, a subset of a set is nontrivial if it is nonempty. I have not found an example where "nontrivial subset" means it is not a singleton. 

Note: "Trivial" and "degenerate" overlap in meaning but are not interchangeable.  What is called "degenerate" seems to depend on the mathematical specialty.

Thinking about a function as a mathematical object

A mathematician’s mental representation of a function is generally quite rich and may involve many different metaphors and images kept in mind simultaneously. The abmath article on metaphors and images for functions discusses many of these representations, although the article is incomplete. This post is a fairly thorough rewrite of the discussion in that article of the representation of the concept of “function” as a mathematical object. You must think of functions as math objects when you are taking the rigorous view, which happens when you are trying to prove something about functions (or large classes of functions) in general.

What often happens is that you visualize one of your functions in many of the ways described in this article (it is a calculation, it maps one space to another, its graph is bounded, and so on) but those images can mislead you. So when you are completely stuck, you go back to thinking of the function as an axiomatically-defined mathematical structure of some sort that just sits there, like a complicated machine where you can see all the parts and how they relate to each other. That enables you to prove things by strict logical deduction. (Mathematicians mostly only go this far when they are desperate. We would much rather quote somebody’s theorem.) This is what I have called the dry bones approach.

The “mathematical structure” is most commonly a definition of function in terms of sets and axioms. The abmath article Specification and definition of “function” discusses the usual definitions of “function” in detail.


This example is intended to raise your consciousness about the possibilities for functions as objects.

Consider the function $f:\mathbb{R}\to\mathbb{R}$ defined by $f(x)=2{{\sin }^{2}}x-1$. Its value can be computed at many different numbers but it is a single, static math object.

You can apply operators to it

  • Just as you can multiply a number by $2$, you can multiply $f$ by $2$. You can say “Let $g(x)=2f(x)$” or “Let $g=2f$”. Multiplying a numerical function by $2$ is an operator that takes the function $f$ to $2f$. Its input is a function and its output is another function. Then the value of $g$ (which is $2f$) at any real $x$ is $g(x)=2f(x)=4{{\sin }^{2}}x-2$. The notation “$g=2f$” reveals that mathematicians think of $f$ as a single math object just as the $3$ in the expression “$2\times 3$” represents the number $3$ as a single object.
  • But you can’t do arithmetic operations to functions that don’t have numerical output, such as the function $\text{FL}$ that takes an English word to its first letter, so $\text{FL}(`\text{wolf’})=`\text{w’}$. (The quotes mean that I am writing about the word ‘wolf’ and the letter ‘w’.) The expression $2\times \text{FL}(`\text{wolf’})$ doesn’t make sense because ‘w’ is a letter, not a number.
  • You can find the derivative. The derivative operator is a function from differentiable functions to functions. Such a thing is usually called an operator. The derivative operator is sometimes written as $D$, so $Df$ is the function defined by: “$(Df)(x)$ is the slope of the tangent line to the graph of $f$ at the point $(x,f(x))$.” That is a perfectly good definition. In calculus class you learn formulas that allow you to calculate $(Df)(x)$ (usually called “$f’(x)$”) to be $4 \sin (x) \cos (x)$.
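Operators of this kind can be imitated in a programming language that lets functions be passed around as values. The following is a rough sketch under assumptions made here: `scale` and `D` are hypothetical names, and `D` only approximates the derivative with a central difference rather than computing it symbolically.

```python
import math

def f(x):
    """The example function f(x) = 2 sin^2(x) - 1."""
    return 2 * math.sin(x) ** 2 - 1

def scale(c, func):
    """Operator: takes a function and returns the function c * func."""
    return lambda x: c * func(x)

def D(func, h=1e-6):
    """Operator: takes a function and returns a numerical approximation of its derivative."""
    return lambda x: (func(x + h) - func(x - h)) / (2 * h)

g = scale(2, f)   # g = 2f, a new function built from f as a whole
Df = D(f)         # Df, another function built from f as a whole

x = 0.7
print(g(x), 4 * math.sin(x) ** 2 - 2)           # the two numbers agree
print(Df(x), 4 * math.sin(x) * math.cos(x))     # the two numbers agree approximately
```

In both calls the argument is `f` itself, not a value such as `f(0.7)`, which is the sense in which the function is being treated as a single object.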

Like all math objects, functions may have properties

  • The function defined by $f(x)=2{{\sin}^{2}}x-1$ is differentiable, as noted above. It is also continuous.
  • But $f$ is not injective. This means that two different inputs can give the same output. For example, $f(\frac{\pi}{3})=f(\frac{4\pi}{3})=\frac{1}{2}$. This is a property of the whole function, not individual values. It makes no sense to say that $f(\frac{\pi}{3})$ is injective.
  • The function $f$ is periodic with period $2\pi$, meaning that for any $x$, $f(x+2\pi)=f(x)$. (Its smallest period is actually $\pi$, since $f(x)=-\cos 2x$.) It is the function itself that has period $2\pi$, not any particular value of it.
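These properties can be spot-checked numerically. The sketch below is only a sanity check on a finite list of sample points chosen here, not a proof of periodicity.

```python
import math

def f(x):
    """The example function f(x) = 2 sin^2(x) - 1."""
    return 2 * math.sin(x) ** 2 - 1

# Not injective: two different inputs with the same output 1/2.
a, b = math.pi / 3, 4 * math.pi / 3
same_value = math.isclose(f(a), f(b)) and math.isclose(f(a), 0.5)

# Periodic: f(x + 2*pi) agrees with f(x) at every sample point.
samples = [k / 10 for k in range(-50, 51)]
periodic = all(math.isclose(f(x + 2 * math.pi), f(x), abs_tol=1e-9) for x in samples)

print(same_value, periodic)  # True True
```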

As a math object, a function can be an element of a set

  • For example, $f$ is an element of the set ${{C}^{\infty }}(\mathbb{R})$ of real-valued functions that have derivatives of all orders.
  • On ${{C}^{\infty }}(\mathbb{R})$, differentiation is an operator that takes a function in that set to another function in the set.   It takes $f(x)$ to the function $4\sin x\cos x$.
  • If you restrict $f$ to the unit interval, it is an element of the function space ${{\text{L}}^{2}}[0,1]$. As such it is convenient to think of it as a point in the space (the whole function is the point, not just values of it). In this particular space, you can think of the points as vectors in an uncountably-infinite-dimensional space. (Ideas like that weird some people out. Do not worry if you are one of them. If you keep on doing math, function spaces will seem ordinary. They are OK by me, except that I think they come in entirely too many different kinds which I can never keep straight.) As a vector, $f$ has a norm, which you can think of as its length. The norm of $f$ is about $0.64$.
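The norm can be estimated numerically. The following is a minimal sketch, assuming the usual norm $\left(\int_0^1 f(x)^2\,dx\right)^{1/2}$ on ${{\text{L}}^{2}}[0,1]$; the helper name `l2_norm` and the choice of the midpoint rule are illustrative. Since $f(x)=-\cos 2x$, the integral also has the closed form $\frac{1}{2}+\frac{\sin 4}{8}$, which the code uses as a cross-check.

```python
import math

def f(x):
    """The example function f(x) = 2 sin^2(x) - 1."""
    return 2 * math.sin(x) ** 2 - 1

def l2_norm(func, a=0.0, b=1.0, n=100_000):
    """Approximate the L^2 norm of func on [a, b] with the midpoint rule."""
    width = (b - a) / n
    total = sum(func(a + (i + 0.5) * width) ** 2 for i in range(n))
    return math.sqrt(total * width)

estimate = l2_norm(f)
closed_form = math.sqrt(0.5 + math.sin(4) / 8)  # uses f(x) = -cos(2x)

print(round(estimate, 2), round(closed_form, 2))  # 0.64 0.64
```

The single number returned is attached to the whole function on $[0,1]$, not to any one of its values.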

The discussion above shows many examples of thinking of a function as an object. You are thinking about it as an undivided whole, as a chunk, just as you think of the number $3$ (or $\pi$) as just a thing. You think the same way about your bicycle as a whole when you say, “I’ll ride my bike to the library”. But if the transmission jams, then you have to put it down on the grass and observe its individual pieces and their relation to each other (the chain came off a gear or whatever), in much the same way as noticing that the function $g(x)=x^3$ goes through the origin and looks kind of flat there, but at $(2,8)$ it is really rather steep. Phrases like “steep” and “goes through the origin” are a clue that you are thinking of the function as a curve that goes left to right and levels off in one place and goes up fast in another — you are thinking in a dynamic, not a static way like the dry bones of a math object.

The definition of “function”


This is the new version of the abstractmath article on the definition of function. I had to adapt the formatting and some of it looks weird, but legible. It is prettier on

I expect to announce new revisions of other abmath articles on this blog, with links, but not to publish them here. This article brings out a new point of view about defining functions that I wanted to call attention to, so I am publishing it here, as well.



It is essential that you understand many of the images, metaphors and terminology that mathematicians use when they think and talk about functions. For many purposes, the precise mathematical definition of "function" does not play much of a role when you are trying to understand particular kinds of functions. But there is one point of view about functions that has resulted in fundamental progress in math:



A function is a mathematical object.

To deal with functions in that way you need a precise definition of "function". That is what this article gives you.

  • The article starts by giving a specification of "function".
  • After that, we get into the technicalities of the definitions of the general concept of function.
  • Things get complicated because there are several inequivalent definitions of "function" in common use.

Specification of "function"

A function $f$ is a mathematical object which determines and is completely determined by the following data:

(DOM) $f$ has a domain, which is a set. The domain may be denoted by $\text{dom} f$.

(COD) $f$ has a codomain, which is also a set and may be denoted by $\text{cod} f$.

(VAL) For each element $a$ of the domain of $f$, $f$ has a value at $a$, denoted by $f(a)$.

(FP) The value of $f$ at $a$ is completely determined by $a$ and $f$.

(VIC) The value of $f$ at $a$ must be an element of the codomain of $f$.

  • The operation of finding $f(a)$ given $f$ and $a$ is called evaluation.
  • "FP" means functional property.
  • "VIC" means "value in codomain".


The examples of functions chapter contains many examples. The two I give here provide immediate examples.

A finite function

Let $F$ be the function defined on the set $\left\{1,\,2,\,3,\,6 \right\}$ as follows: $F(1)=3,\,\,\,F(2)=3,\,\,\,F(3)=2,\,\,\,F(6)=1$. This is the function called "Finite" in the chapter on examples of functions.

  • The definition of $F$ says "$F$ is defined on the set $\left\{1,\,2,\,3,\,6 \right\}$". That phrase means that the domain is that set.
  • The value of $F$ at each element of the domain is given explicitly. The value at 3, for example, is 2, because the definition says that $F(3) = 2$. No other reason needs to be given. Mathematical definitions can be arbitrary.
  • The codomain of $F$ is not specified, but must include the set $\{1,2,3\}$. The codomain of a function is often not specified when it is not important — which is most of the time in freshman calculus (for example).

A real-valued function

Let $G$ be the real-valued function defined by the formula $G(x)={{x}^{2}}+2x+5$.

  • The definition of $G$ gives the value at each element of the domain by a formula. The value at $3$, for example, is $G(3)=3^2+2\cdot3+5=20$.
  • The definition of $G$ does not specify the domain. The convention in the case of functions defined on the real numbers by a formula is to take the domain to be all real numbers at which the formula is defined. In this case, that is every real number, so the domain is $\mathbb{R}$.
  • The definition does not specify the codomain, either. However, it must include all real numbers greater than or equal to 4. (Why?)

What the specification means

  • The specification guarantees that a function satisfies all five of the properties listed.
  • The specification does not define a mathematical structure in the way mathematical structures have been defined in the past: In particular, it does not require a function to be one or more sets with structure.
  • Even so, it is useful to have the specification, because:



    Many mathematical definitions
    introduce extraneous technical elements
    which clutter up your thinking
    about the object they define.



    I will say more about this when I give the various definitions that are in use.


Until late in the nineteenth century, functions were usually thought of as defined by formulas (including infinite series). Problems arose in the theory of harmonic analysis which made mathematicians require a more general notion of function. They came up with the concept of function as a set of ordered pairs with the functional property (discussed below), and that concept revolutionized our understanding of math.

This discussion is an oversimplification of the history of mathematics, which many people have written thick books about. A book relevant to these ideas is Plato's Ghost, by Jeremy Gray.

In particular, this definition, along with the use of set theory, enabled abstract math (ahem) to become a common tool for understanding math and proving theorems. It is conceivable that some of you may wish it hadn't. Well, tough.

The more modern definition of function given here (which builds on the older definition) came into use beginning in the 1950's. The strict version became necessary in algebraic topology and is widely used in many fields today.

The concept of function as a formula never disappeared entirely, but was studied mostly by logicians who generalized it to the study of function-as-algorithm. Of course, the study of algorithms is one of the central topics of modern computing science, so the notion of function-as-formula (updated to function-as-algorithm) has achieved a new importance in recent years.

To state both the old abstract definition and the modern one, we need a preliminary idea.

The functional property

A set $P$ of ordered pairs has the functional property if two pairs in $P$ with the same first coordinate have to have the same second coordinate (which means they are the same pair). In other words, if $(x,a)$ and $(x,b)$ are both in $P$, then $a=b$.

How to think about the functional property

The point of the functional property is that for any pair in the set of ordered pairs, the first coordinate determines what the second one is. That's why you can write "$G(x)$" for any $x$ in the domain of $G$ and not be ambiguous.


  • The set $\{(1,2), (2,4), (3,2), (5,8)\}$ has the functional property, since no two different pairs have the same first coordinate. Note that there are two different pairs with the same second coordinate. This is irrelevant to the functional property.
  • The set $\{(1,2), (2,4), (3,2), (2,8)\}$ does not have the functional property. There are two different pairs with first coordinate 2.
  • The empty set $\emptyset$ has the functional property vacuously.
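The functional property is easy to test mechanically for a finite set of pairs. Here is a minimal sketch; the function name `has_functional_property` is chosen here for illustration. It is applied to the three examples just listed.

```python
def has_functional_property(pairs):
    """Return True if no two pairs share a first coordinate but differ in the second."""
    seen = {}
    for first, second in pairs:
        if first in seen and seen[first] != second:
            return False  # (first, seen[first]) and (first, second) are both present
        seen[first] = second
    return True

print(has_functional_property({(1, 2), (2, 4), (3, 2), (5, 8)}))  # True
print(has_functional_property({(1, 2), (2, 4), (3, 2), (2, 8)}))  # False
print(has_functional_property(set()))                              # True
```

The empty set passes because the loop body never runs, which is the programming counterpart of "vacuously".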

Graph of a function.

Example: graph of a function defined by a formula

In calculus books, a picture like this one (of part of $y=x^2+2x+5$) is called a graph. Here I use the word "graph" to denote the set of ordered pairs \[\left\{ (x,{{x}^{2}}+2x+5)\mid x\in \mathbb{R} \right\}\] which is a mathematical object rather than some ink on a page or pixels on a screen.

The graph of any function studied in beginning calculus has the functional property. For example, the set of ordered pairs above has the functional property because if $x$ is any real number, the formula ${{x}^{2}}+2x+5$ defines a specific real number.

  • if $x = 0$, then ${{x}^{2}}+2x+5=5$, so the pair $(0, 5)$ is an element of the graph of $G$. Each time you plug in $0$ in the formula you get 5.
  • if $x = 1$, then ${{x}^{2}}+2x+5=8$.
  • if $x = -2$, then ${{x}^{2}}+2x+5=5$.

You can measure where the point $(-2,5)$ is on the (picture of) the graph and see that it is on the blue curve as it should be. No other pair whose first coordinate is $-2$ is in the graph of $G$, only $(-2, 5)$. That is because when you plug $-2$ into the formula ${{x}^{2}}+2x+5$, you get $5$ and nothing else. Of course, $(0, 5)$ is in the graph, but that does not contradict the functional property. $(0, 5)$ and $(-2, 5)$ have the same second coordinate, but that is OK.

Modern mathematical definition of function

A function $f$ is a mathematical structure consisting of the following objects:

  • A set called the domain of $f$, denoted by $\text{dom} f$.
  • A set called the codomain of $f$, denoted by $\text{cod} f$.
  • A set of ordered pairs called the graph of $f$, with the following properties:
      • $\text{dom} f$ is the set of all first coordinates of pairs in the graph of $f$.
      • Every second coordinate of a pair in the graph of $f$ is in $\text{cod} f$ (but $\text{cod} f$ may contain other elements).
      • The graph of $f$ has the functional property.

Using arrow notation, if $A=\text{dom} f$ and $B=\text{cod} f$, this says that $f:A\to B$.


The main difference between the specification of function given previously and this definition is that the definition replaces the statement "$f$ has a value at $a$" by introducing a set of ordered pairs (the graph) with the functional property.

  • This set of ordered pairs is extra structure introduced by the definition mainly in order to make the definition a classical sets-with-structure, which makes the graph, which should be a concept derived from the concept of function, into an apparently necessary part of the function.
  • That suggests incorrectly that the graph is more of a primary intuition than other intuitions such as function as relocator, function as transformer, and other points of view discussed in the article Intuitions and metaphors for functions.


  • Let $F$ have graph $\{(1,2), (2,4), (3,2), (5,8)\}$ and define $A = \{1, 2, 3, 5\}$ and $B = \{2, 4, 8\}$. Then $F:A\to B$ is a function. In speaking, we would usually say, "$F$ is a function from $A$ to $B$."
  • Let $G$ have graph $\{(1,2), (2,4), (3,2), (5,8)\}$ (same as above), and define $A = \{1, 2, 3, 5\}$ and $C = \{2, 4, 8, 9, 11, \pi, 3/2\}$. Then $G:A\to C$ is an (admittedly ridiculous) function. Note that all the second coordinates of the graph are in $C$, along with a bunch of miscellaneous suspicious characters that are not second coordinates of pairs in the graph.
  • Let $H$ have graph $\{(1,2), (2,4), (3,2), (5,8)\}$. Then $H:A\to \mathbb{R}$ is a function, since $2$, $4$ and $8$ are all real numbers.
  • Let $D = \{1, 2, 5\}$ and $E = \{1, 2, 3, 4, 5\}$. Then there is no function $D\to A$ and no function $E\to A$ with graph $\{(1,2), (2,4), (3,2), (5,8)\}$. Neither $D$ nor $E$ has exactly the same elements as the first coordinates of the graph.
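The modern definition translates almost line by line into a small data structure. The following is a sketch under assumptions made here: the class name `Function` and its field names are hypothetical, and only finite sets are handled. The constructor checks the three properties required of the graph, and the examples above are then replayed.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Function:
    dom: frozenset
    cod: frozenset
    graph: frozenset  # a set of (input, output) pairs

    def __post_init__(self):
        firsts = {x for x, _ in self.graph}
        if firsts != self.dom:
            raise ValueError("dom must be exactly the set of first coordinates")
        if not all(y in self.cod for _, y in self.graph):
            raise ValueError("every second coordinate must be in cod")
        if len(firsts) != len(self.graph):
            raise ValueError("the graph must have the functional property")

    def __call__(self, a):
        """Evaluation: the second coordinate of the unique pair starting with a."""
        return dict(self.graph)[a]

graph = frozenset({(1, 2), (2, 4), (3, 2), (5, 8)})
A = frozenset({1, 2, 3, 5})
B = frozenset({2, 4, 8})
C = frozenset({2, 4, 8, 9, 11, math.pi, 1.5})

F = Function(A, B, graph)
G = Function(A, C, graph)
print(F(3), G(3), F == G)  # 2 2 False

# D = {1, 2, 5} is not the set of first coordinates, so no function results.
rejected = False
try:
    Function(frozenset({1, 2, 5}), A, graph)
except ValueError:
    rejected = True
print(rejected)  # True
```

`F` and `G` take the same values everywhere yet compare as different objects, because under this definition the codomain is part of the function.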

Identity and inclusion

Suppose we have two sets $A$ and $B$ with $A\subseteq B$.

  • The identity function on $A$ is the function ${{\operatorname{id}}_{A}}:A\to A$ defined by ${{\operatorname{id}}_{A}}(x)=x$ for all $x\in A$. (Many authors call it ${{1}_{A}}$.)
  • When $A\subseteq B$, the inclusion function from $A$ to $B$ is the function $i:A\to B$ defined by $i(x)=x$ for all $x\in A$. Note that there is a different function for each pair of sets $A$ and $B$ for which $A\subseteq B$. Some authors call it ${{i}_{A,\,B}}$ or $\text{in}{{\text{c}}_{A,\,B}}$.

The identity function and an inclusion function for the same set $A$ have exactly the same graph, namely $\left\{ (a,a)|a\in A \right\}$. More about this below.

Other definitions of function

Original abstract definition of function



Possible confusion

Some confusion can result because of the presence of these two different definitions.

Multivalued function

Some older mathematical papers in complex function theory do not tell you that their functions are multivalued. There was a time when complex function theory was such a Big Deal in research mathematics that the phrase "function theory" meant complex function theory and all the cognoscenti knew that their functions were multivalued.

The phrase multivalued function refers to an object that is like a function $f:S\to T$ except that for $s\in S$, $f(s)$ may denote more than one value.


  • Multivalued functions arose in considering complex functions. In common practice, the symbol $\sqrt{4}$ denotes $2$, although $-2$ is also a square root of $4$. But in complex function theory, the square root function takes on both the values $2$ and $-2$. This is discussed in detail in Wikipedia.
  • The antiderivative is an example of a multivalued operator. For any constant $C$, $\frac{x^3}{3}+C$ is an antiderivative of $x^2$.

A multivalued function $f:S\to T$ can be modeled as a function with domain $S$ and codomain the set of all subsets of $T$. The two meanings are equivalent in a strong sense (naturally equivalent). Even so, it seems to me that they represent two different ways of thinking about multivalued functions. ("The value may be any of these things…" as opposed to "The value is this whole set of things.")
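The set-valued model is the one that is easiest to write down in code. Here is a minimal sketch for the square root, assuming complex inputs; the name `square_roots` is chosen here for illustration.

```python
import cmath

def square_roots(z):
    """Set-valued model of the multivalued square root: the set of all square roots of z."""
    r = cmath.sqrt(z)          # the principal square root
    return frozenset({r, -r})  # the other root is its negative

print(square_roots(4) == frozenset({2, -2}))  # True
print(len(square_roots(0)))                   # 1
```

Here "the value is this whole set of things": the output for $4$ is the two-element set $\{2,-2\}$, and the output for $0$ is a one-element set.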

The phrases "multivalued function" and "partial function" upset some picky types who say things like, "But a multivalued function is not a function!". A stepmother is not a mother, either. See the Handbook article on radial category.

Partial function

A partial function $f:S\to T$ is just like a function except that it may be defined on only a subset of $S$. For example, the function $f(x)=\frac{1}{x}$ is a partial function from the real numbers to the real numbers, since it is not defined at $0$.

This models the behavior of computer programs (algorithms): if you consider a program with one input and one output as a function, it may not be defined on some inputs because for them it runs forever (or gives an error message).
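One common way to represent a partial function in a program is to return a special "no value" marker at inputs where it is undefined. The sketch below does this for $\frac{1}{x}$ using `None`; this is one illustrative convention among several (raising an error is another), and the helper `domain_of_definition` is a hypothetical name.

```python
from typing import Optional

def reciprocal(x: float) -> Optional[float]:
    """Partial function from the reals to the reals: undefined at 0, signalled by None."""
    if x == 0:
        return None
    return 1 / x

def domain_of_definition(func, candidates):
    """The subset of the candidate inputs at which func is defined."""
    return {x for x in candidates if func(x) is not None}

print(reciprocal(4))  # 0.25
print(reciprocal(0))  # None
print(domain_of_definition(reciprocal, {-1, 0, 2}) == {-1, 2})  # True
```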

In some texts in computing science and mathematical logic, a function is by convention a partial function, and this fact may not be mentioned explicitly, especially in research papers.

New approaches to functions

All the definitions of function given here produce mathematical structures, using the traditional way to define mathematical objects in terms of sets. Such definitions have disadvantages.

Mathematicians have many ways to think about functions. That a function is a set of ordered pairs with a certain property (functional) and possibly some ancillary ideas (domain, codomain, and others) is not the way we usually think about them $\ldots$ except when we need to reduce the thing we are studying to its absolutely most abstract form to make sure our proofs are correct. That most abstract form is what I have called the rigorous view or the dry bones and it is when that reasoning is needed that the sets-with-structure approach has succeeded.

Our practice of abstraction has led us to new approaches to talking about functions. The most important one currently is category theory. Roughly, a category is a bunch of objects together with some arrows going between them that can be composed head to tail. Functions between sets are examples of this: the sets are the objects and the functions the arrows.
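Composing arrows head to tail can be illustrated with ordinary functions. The sketch below is only a spot check on sample inputs, with three arbitrary functions `f`, `g`, `h` chosen here; it checks the two rules that any category requires of composition (associativity, and identities doing nothing).

```python
def compose(g, f):
    """Head-to-tail composition: first apply f, then g."""
    return lambda x: g(f(x))

def identity(x):
    return x

def f(n):
    return n + 1      # an arrow from integers to integers

def g(n):
    return str(n)     # an arrow from integers to strings

def h(s):
    return len(s)     # an arrow from strings to integers

samples = range(-5, 120)

# Associativity: h after (g after f) equals (h after g) after f.
left = compose(h, compose(g, f))
right = compose(compose(h, g), f)
associative = all(left(n) == right(n) for n in samples)

# Identity arrows: composing with the identity changes nothing.
unital = all(compose(f, identity)(n) == f(n) == compose(identity, f)(n) for n in samples)

print(associative, unital)  # True True
```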

This abstracts the idea of function in a way that brings out common ideas in various branches of math. Research papers in many branches of mathematics now routinely use the language of category theory. Categories now appear in some undergraduate math courses, meaning that Someone needs to write a chapter on category theory for

Besides category theory, computing scientists have come up with other abstract ways of dealing with functions, for example type theory. It has not come as far along as category theory, but has shown recent signs of major progress.

Both category theory and type theory define math objects in terms of their effect on and relationship with other math objects. This makes it possible to do abstract math entirely without using sets-with-structure as a means of defining concepts.