Presenting binary operations

This is the first of a set of notes I am writing to help me develop my thoughts about how particular topics in my book Abstracting algebra should be organized. This article describes my plan for the book in some detail. The present post has some thoughts about presenting binary operations.

Before binary operations are introduced

Traditionally, an abstract algebra book assumes that the student is familiar with high school algebra and will then proceed with an observation that such operations as $+$ and $\times$ can be thought of as functions of two variables that take a pair of numbers to another number. So the first abstract idea is typically the concept of binary operation, although in another post I will consider whether that really should be the first abstract concept.

The Abstracting Algebra book will have a chapter that presents concrete examples of algebraic operations and expressions on numbers as in elementary school and as in high school algebra. This section of the post outlines what should be presented there. Each subsection needs to be expanded with lots of examples.

In elementary school

In elementary school you see expressions such as

  • $3+4$
  • $3\times 4$
  • $3-4$

The student invariably thinks of these expressions as commands to calculate the value given by the expression.

They will also see expressions such as
\[\begin{array}[b]{r}
23\\
355\\
+ 96\\
\hline
\end{array}\]
which they will take as a command to calculate the sum of the whole list:
\[\begin{array}[b]{r}
23\\
355\\
+ 96\\
\hline
474
\end{array}\]

That uses the fact that addition is associative, and the format suggests using the standard school algorithm for adding up lists. You don’t usually see the same format with more than two numbers for multiplication, even though it is associative as well. In some elementary schools in recent years students are learning other ways of doing arithmetic and in particular are encouraged to figure out short cuts for problems that admit them. But the context is always “do it”, not “this represents a number”.

Algebra

In algebra you start using letters for numbers. In algebra, “$a\times b$” and “$a+b$” are expressions in the symbolic language of math, which means they are like noun phrases in English such as “My friend” and “The car I bought last week and immediately totaled” in that both are used semantically as names of objects. English and the symbolic language are both languages, but the symbolic language is not a natural language, nor is it a formal language.

Example

In beginning algebra, we say “$3+5=8$”, which is a (true) statement.

Basic facts about this equation:

The expressions “$3+5$” and “$8$”

  • are not the same expression
  • but in the standard semantics of algebra they have the same meaning
  • and therefore the equation communicates information that neither “$3+5$” nor “$8$” communicates.

Another example is “$3+5=6+2$”.

Facts like this example need to be communicated explicitly before binary operations are introduced formally. The students in a college abstract algebra class probably know the meaning of an equation operationally (subconsciously) but they have never seen it made explicit. See Algebra is a difficult foreign language.

Note

The equation “$3+5=6+2$” is an expression just as much as “$3+5$” and “$6+2$” are. It denotes an object of type “equation”, which is a mathematical object in the same way as numbers are. Most mathematicians do not talk this way, but they should.

Binary operations

Early examples

Consciousness-expanding examples should appear early and often after binary operations are introduced.

Common operations

  • The GCD is a binary operation on the natural numbers. This disturbs some students because it is not written in infix form. It is associative. The GCD can be defined conceptually, but for computation purposes needs (Euclid’s) algorithm. This gives you an early example of conceptual definitions and algorithms.
  • The maximum function is another example of this sort. This is a good place to point out that a binary operation with the “same” definition can be defined on different sets. The max function on the natural numbers does not have quite the same conceptual definition as the max on the integers. (Both GCD and max are sketched in code just after this list.)
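
Here is a minimal Python sketch (my own illustration, not part of the book) of these two operations, written in prefix notation; the function names are arbitrary.

    def gcd(a, b):
        # Euclid's algorithm: the computational counterpart of the conceptual
        # definition "the largest natural number that divides both a and b"
        while b != 0:
            a, b = b, a % b
        return a

    def maximum(a, b):
        # conceptual definition: whichever of a and b the other is less than or equal to
        return a if a >= b else b

    print(gcd(12, 30))                                  # 6
    print(gcd(gcd(12, 30), 8) == gcd(12, gcd(30, 8)))   # True: one instance of associativity
    print(maximum(maximum(3, 7), 5))                    # 7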

Extensional definitions

In order to emphasize the arbitrariness of definitions, some random operations on small finite sets should be given by multiplication tables, both on sets of numbers and on sets whose elements are letters of the alphabet. This will elicit the common reaction, “What operation is it?” Hidden behind this question is the fact that you are giving an extensional definition instead of a formula — an algorithm or a combination of familiar operations.

Properties

The associative and commutative properties should be introduced early just for consciousness-raising. Subtraction is not associative or commutative. Rock paper scissors is commutative but not associative. Composition in groups of symmetries is associative but in general not commutative.
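
The properties themselves can be checked by brute force on small samples. The sketch below is my own illustration (not from the text); it assumes nothing beyond plain Python.

    def commutative(op, elements):
        return all(op(a, b) == op(b, a) for a in elements for b in elements)

    def associative(op, elements):
        return all(op(op(a, b), c) == op(a, op(b, c))
                   for a in elements for b in elements for c in elements)

    nums = range(-3, 4)
    sub = lambda a, b: a - b
    print(commutative(sub, nums), associative(sub, nums))      # False False

    # Rock paper scissors as a binary operation: the "product" of two throws
    # is the winning throw (or the common throw if they tie).
    beats = {('rock', 'scissors'), ('scissors', 'paper'), ('paper', 'rock')}
    def rps(a, b):
        return a if (a, b) in beats or a == b else b

    throws = ['rock', 'paper', 'scissors']
    print(commutative(rps, throws), associative(rps, throws))  # True False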

Binary operation as function

The first definition of binary operation should be as a function. For example, “$+$” is a function that takes pairs of numbers to numbers. In other words, $+:\mathbb{Z}\times\mathbb{Z}\to\mathbb{Z}$ is a function.

We then abstract from that example and others like it to arbitrary functions $\Delta:S\times S\to S$ for arbitrary sets $S$.

This is abstraction twice.

  • First we replace the specific example operations, such as multiplication, subtraction, GCD and MAX on $\mathbb{Z}$, or something complicated such as \[(x,y)\mapsto 3(xy-1)^2(x^2+xy^3)^3,\] by an arbitrary operation.
  • Then we replace sets of numbers by arbitrary sets. An example would be the random multiplication on the set $\{1,2,5\}$ given by the table
    \[
    \begin{array}{c|ccc}
    \Delta& 1&2&5\\
    \hline
    1&2&2&1\\
    2&5&2&1\\
    5&2&1&5
    \end{array}
    \]
    This defines a function $\Delta:\{1,2,5\}\times\{1,2,5\}\to\{1,2,5\}$ for which, for example, $\Delta(2,1)=5$, or $2\Delta 1=5$. This example uses numbers as elements of the set and is good for eliciting the “What operation is it?” question.
  • I will use examples where the elements are letters of the alphabet, as well. That sort of example makes the students think the letters are variables they can substitute for, another confusion to be banished by the wise professor who knows the right thing to say to make it clear. (Don’t ask me; I taught algebra for 35 years and I still don’t know the right thing to say.)

It is important to define prefix notation and infix notation right away and to use both of them in examples.
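
As a concrete illustration (mine, and hypothetical), the random operation on $\{1,2,5\}$ given by the table above can be stored as a lookup table and then used in both prefix and infix form:

    table = {
        (1, 1): 2, (1, 2): 2, (1, 5): 1,
        (2, 1): 5, (2, 2): 2, (2, 5): 1,
        (5, 1): 2, (5, 2): 1, (5, 5): 5,
    }

    def delta(x, y):              # prefix notation: delta(2, 1)
        return table[(x, y)]

    class El:
        # a thin wrapper so the same operation can be written in infix form,
        # borrowing * to stand in for the symbol Delta
        def __init__(self, n): self.n = n
        def __mul__(self, other): return El(table[(self.n, other.n)])

    print(delta(2, 1))            # 5
    print((El(2) * El(1)).n)      # 5, i.e. 2 Δ 1 = 5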

Other representations of binary operations

The main way of representing binary operations in Abstracting Algebra will be as trees, which I will cover in later posts. Those posts will be much more interesting than this one.

Binary operations in high school and college algebra

  • Some binops are represented in infix notation: “$a+b$”, “$a-b$”, and “$a\times b$”.
  • “$a\times b$” is usually written “$ab$” for letters and with the “$\times$” symbol for numbers.
  • Some binops have idiosyncratic representation: “$a^b$”, “$\binom{a}{b}$”.
  • A lot of binops such as GCD and MAX are given as functions of two variables (prefix notation) and their status as binary operations usually goes unmentioned. (That is not necessarily wrong.)
  • The symbol “$(a,b)$” is used to denote the GCD (a binop) and is also used to denote a point in the plane or an open interval, both of which are not strictly binops. They are binary operations in a multisorted algebra (a concept I expect to introduce later in the book.)
  • Some apparent binops are in infix notation but have flaws: In “$a/b$”, the second entry can’t be $0$, and the expression when $a$ and $b$ are integers is often treated as having good forms ($3/4$) and bad forms ($6/8$).

Trees

The chaotic nature of algebraic notation I just described is a stumbling block, but not the primary reason high school algebra is a stumbling block for many students. The big reason it is hard is that the notation requires students to create and hold complicated abstract structures in their head.

Example

This example is a teaser for future posts on using trees to represent binary operations. The tree below shows much more of the structure of a calculation of the area of a rectangle surmounted by a semicircle than the expression

\[A=wh+\frac{1}{2}\left(\pi(\frac{1}{2}w)^2\right)\]
does.

The tree explicitly embodies the thought process that leads to the formula:

  • You need to add the area of the rectangle and the area of the semicircle.
  • The area of the rectangle is width times height.
  • The area of the semicircle is $\frac{1}{2}(\pi r^2)$.
  • In this case, $r=\frac{1}{2}w$.

Any mathematician will extract the same abstract structure from the formula \[A=wh+\frac{1}{2}\left(\pi(\frac{1}{2}w)^2\right).\] This is difficult for students beginning algebra.

References

The two languages of math

I am revising the (large) section of abstractmath.org that concerns the languages of math. Below is most of the introduction to that section, which contains in particular detailed links to its contents. All of these are now available, but only a few of them have been revised. They are the ones that say “Abstractmath 2.0” in the header.

Introduction

Mathematics in the English-speaking world is communicated using two languages:

  • Mathematical English is a special form of English.
      • It uses ordinary words with special meanings.
      • Some of its structural words (“if”, “or”) have different meanings from those of ordinary English.
      • It is both written and spoken.
      • Other languages also have special mathematical forms.
  • The symbolic language of math is a distinct, special-purpose language.
      • It has its own symbols and rules that are rather unlike those that spoken languages have.
      • It is not a dialect of English.
      • It is largely a written language.
      • Simple expressions can be pronounced, but complicated expressions may only be pointed to or referred to.
      • It is used by all mathematicians, not just those who write math in English.

Math in writing and in lectures involves both mathematical English and the symbolic language. They are embedded in each other and refer back and forth to each other.

Contents

The languages of math are covered in three chapters, each with several parts. Some things are not covered; see Notes.

Notes

Math communication also uses pictures, graphs and diagrams, which abstractmath.org doesn’t discuss. Also not covered is the history and etymology of mathematical notation.

Creative Commons License 

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

Dysfunctions in doing math III

This post concludes the work begun in Dysfunctions in doing math I and Dysfunctions in doing math II, with more revisions to the article in abstractmath on dysfunctions.

False symmetry

Bases of vector spaces

In a finite dimensional vector space $V$ with subspace $W$, every basis of $W$ can be extended to a basis of $V$. But in general there are bases of $V$ that do not contain a subset that is a basis of $W$. A tragic lack of symmetry that causes innocent students to lose points in linear algebra.

Example

The plane $P$ defined by $x=y$ is a two-dimensional subspace of the three dimensional Euclidean space with axes $x,y,z$. One basis of $P$ is $\{(1,1,0),(0,0,1)\}$. It can be extended to the basis $\{(1,1,0),(0,0,1),(0,1,0)\}$ of $\mathbb{R}^3$. But the basis $\{(1,0,0),(0,1,0),(0,0,1)\}$ of $\mathbb{R}^3$ does not contain a subset that is a basis of $P$.
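
For readers who like to check such claims numerically, here is a rough sketch of the verification; it is my own illustration and assumes numpy is available.

    import numpy as np
    from itertools import combinations

    P_basis = [np.array([1, 1, 0]), np.array([0, 0, 1])]     # spans the plane x = y
    std = [np.array([1, 0, 0]), np.array([0, 1, 0]), np.array([0, 0, 1])]

    def is_basis_of_P(pair):
        # the pair is a basis of P iff both vectors lie in P (adding them to
        # P's basis keeps the rank at 2) and the pair itself is independent
        in_P = np.linalg.matrix_rank(np.vstack(P_basis + list(pair))) == 2
        independent = np.linalg.matrix_rank(np.vstack(pair)) == 2
        return in_P and independent

    print(any(is_basis_of_P(pair) for pair in combinations(std, 2)))   # False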

Normal subgroups

Every subgroup $B$ of a commutative group $A$ is a normal subgroup of $A$. But if $B$ is a commutative subgroup of a non-commutative group $S$, then $B$ may not be a normal subgroup of $S$. For example, $\text{Sym}_3$ (the group of symmetries of an equilateral triangle) has three subgroups with two elements each. Each of these subgroups is commutative, but none of them is a normal subgroup of $\text{Sym}_3$.

Jump the fence

If you are working with an expression whose variables are constrained to certain values, and you substitute a value in the expression that violates the constraint, you jump the fence (also called a fencepost error).

Example

The Fibonacci numbers (MW, Wi) are usually defined inductively like this:

\[F(n)=\left\{ \begin{align}
& 0\text{ if }n=0 \\
& 1\text{ if }n=1 \\
& F(n-1)+F(n-2)\text{ if }n\gt 1 \\
\end{align} \right.\]

In calculating a sum of Fibonacci numbers, you might write

\[\sum_{k=0}^{n}F(k)=\sum_{k=0}^{n}F(k-1)+\sum_{k=0}^{n}F(k-2)\]
This contains errors: the sums on the right involve $F(-1)$ and $F(-2)$, which are not defined by the definition above. You could add
\[F(n)=0\text{ if }n\lt 0\]
to the definition to get around this, or keep better track of the fence by writing

\[\sum_{k=0}^{n}F(k)=\sum_{k=1}^{n}F(k-1)+\sum_{k=2}^{n}F(k-2)\qquad(n\gt 1)\]

(The notation “$(n \gt 1)$” means “for all $n$ greater than $1$.” See here.)
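
A small Python sketch (my own, not from the text) makes the fence visible: implementing the definition literally leaves $F(-1)$ and $F(-2)$ undefined, so the commented-out sum below fails while the reindexed one does not.

    def F(n):
        if n < 0:
            raise ValueError("F is not defined for negative arguments")
        return n if n in (0, 1) else F(n - 1) + F(n - 2)

    n = 5
    print(sum(F(k) for k in range(0, n + 1)))        # 12
    # sum(F(k - 1) for k in range(0, n + 1))         # jumps the fence: F(-1) raises ValueError
    print(sum(F(k - 1) for k in range(1, n + 1)))    # 7, stays inside the fence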

Literalism

Every type of math object has to have a definition. In giving a definition, a few of the many ingredients that are involved in that type of object are selected as a basis for the definition. They are not necessarily the most important parts. People who make definitions try to use as little as possible in the definition so that it is easier to verify that something is an example of the thing being defined.

A definitional literalist is someone who insists on thinking about a type of math object primarily in terms of what the definition says it is.


Definitional literalism inhibits your understanding of abstract math.

Ordered pairs

One of the major tools in the study of the foundations of mathematics is to try to define all mathematical objects in terms of as few as possible objects. The most common form this takes is to define everything in terms of sets. For example, the ordered pair $(a,b)$ can be defined to be the set $\{a, \{a, b\}\}$ (see Wi). A definitional literalist will conclude that the ordered pair $(a,b)$ is the set $\{a, \{a, b\}\}$.

This would mean that it makes sense to say that $a\in(a,b)$ but $b\notin(a,b)$.
No mathematician would ever think of saying such things.

What is important about an ordered pair is its specification:

  • An ordered pair has a first coordinate and a second coordinate.
  • What the first and second coordinates are completely determine the ordered pair.

It is ludicrous to say something like “$a\in (a,b)$”. The “definition” that $(a,b)$ is the set $\{a,\{a,b\}\}$ is done purely for the purpose of showing that the study of ordered pairs can be reduced to the study of sets. It is not a fact about ordered pairs that we can use.

Equivalence relations

An equivalence relation on a set $S$ is a relation on $S$ with certain properties. A partition of $S$ is a set of subsets of $S$ with certain properties. The two definitions can be proven to give the same structure (that is done here).

I have personally heard literalists say,
“How can they give the same structure? One is a relation and one is a partition.” The point is that an equivalence relation/partition has a total structure which can be described either by starting with a relation and imposing axioms, or by giving a set of subsets and imposing axioms. Each set of axioms describes exactly the same structure; every theorem that can be deduced from the axioms for an equivalence relation can be deduced from the axioms for a partition.

Functions

The (less strict) definition of function says that a function is a set of ordered pairs with the functional property.

This does not mean that if your function is $F(x) = 2x + 1$, then you would say “$(3,7)\in F$”. The most common practice is to say that “$F(3) = 7$” or “the value of $F$ at $3$ is $7$” or something of the sort.

I do know mathematicians who tell me that they really do think of a function as a set of ordered pairs and would indeed say “$(3,7)\in F$”.
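
The two points of view are easy to juxtapose in a toy example. The sketch below is my own illustration, restricting $F(x)=2x+1$ to a small finite domain so the set of ordered pairs can be written out.

    F_pairs = {(x, 2 * x + 1) for x in range(0, 5)}

    print((3, 7) in F_pairs)     # True: the "set of ordered pairs" view
    print(dict(F_pairs)[3])      # 7:  the usual "F(3) = 7" view

    # the functional property: no two pairs share a first coordinate
    firsts = [a for (a, b) in F_pairs]
    print(len(firsts) == len(set(firsts)))   # True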

Vanishing

Many years ago I had a math professor who hated it with a purple passion if anyone said a function vanishes at some number $a$, meaning its value at $a$ is $0$. If you said, “The function $x^2-1$ vanishes at $1$”, he would say, “Pah! The function is still there, isn’t it?”

There are in fact two different points a literalist can make about such a statement.

  • The function’s value at $1$ is $0$. The function is not zero anywhere, it is $x^2-1$, or if you have other literalness attitudes, it is “the function $f(x)$ defined by $f(x)=x^2-1$”.
  • Even its value doesn’t literally “vanish”. The value is written as “$0$”. Look at it closely. You can see it. It has not vanished.

The phrase “the function vanishes at $a$” is a metaphor. Mathematicians use metaphors in writing and talking about math all the time, just as people do in writing and talking about anything. Nevertheless, being occasionally the obnoxious literalist sometimes clears up misunderstanding. That is why mathematicians have a reputation for literalism.

Method addiction

Beginners at abstract math sometimes have the attitudes that a problem must be solved or a proof constructed by a specific procedure. They become quite uncomfortable when faced with problem solutions that involve guessing or conceptual proofs that involve little or no calculation.

Example

Once I gave a problem in my Theoretical Computer Science class whose solution required finding the largest integer $n$ for which $n!\lt 10^9$. Most students solved it correctly, but several wrote apologies on their papers for doing it by trial and error. Of course:


Trial and error is a perfectly valid method.

Example

Students at a more advanced level may feel insecure in the case where they are faced with solving a problem for which they know there is no known feasible algorithm, a situation that occurs mostly in senior and graduate level classes. For example, there are no known feasible general algorithms for determining if two finite groups given by their multiplication tables are isomorphic, and there is no algorithm at all to determine if two presentations (generators and relations) give the same group. Even so, the question, “Are the dihedral group of order 8 and the quaternion group isomorphic?” is not hard. (Answer: No, they have different numbers of elements of order 2 and 4.)


Sometimes you can solve special cases of unsolvable problems.

See also look ahead and conceptual.

Proof by Example

Definition: An integer is even if it is divisible by 2.

Theorem: Prove that if $n$ is an even integer then so is $n^2$.

This is proved by universal generalization.

One type of mistake made by beginners for proofs like this would be the following:

“Proof: Let $n = 8$. Then $n^2=64$ and $64$ is even.”

This violates the requirement of universal generalization that you have “made no restrictions on $c$” (the arbitrary element used in the method): you have restricted it to being a particular even integer!

It may be that some people who make this kind of mistake don’t understand universal generalization (see also bound variable). But for others, the mistake is caused by misreading the phrase “An integer is even if…” to read that you can prove the statement by picking an integer and showing that it is true for that integer. But in fact, “an” in a statement like this means “any”. See indefinite article.

Reading variable names as labels

An assertion such as “There are six times as many students as professors” is translated by some students as $6s = p$ instead of $6p = s$ (where $p$ and $s$ have the obvious meanings). This sort of thing can be avoided by plugging in numbers for the variables to see if the resulting equations make sense. You know it’s wrong to say that if you have $12$ professors then you have $2$ students!

Math ed people have referred to this as the “student-professor problem”. But it is not the real student-professor problem.

The representation is the object

Many newbies at abstract mathematics firmly believe that the number $735$ is the expression “735”. In fact, the number $735$ is an abstract math object, not a string of symbols that represents the number. This attitude inhibits your ability to use whatever representation of an object is best for the purpose.

Example

Someone faced with a question such as “Does $21$ divide $3 \cdot5\cdot72$?” may immediately multiply the expression out to get $1080$ and then carry out long division to see if indeed $21$ divides $1080$. They will say things such as, “I can’t tell what the number is until I multiply it out.”

In this example, it is easy to see that $21$ does not divide $3 \cdot5\cdot72$, because if it did, $7$ would be a prime factor, but $7$ does not divide $72$.

Integers have many representations: decimal, binary, the prime factorization, and so on. Clearly the prime factorization is the best form for determining divisors, whereas for example the decimal notation is a good form for determining which of two integers is the larger. For example, is $3 \cdot5\cdot72$ bigger or smaller than $2\cdot 11\cdot49$?
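
As a hedged illustration (mine, not from the text), the two routes to the divisibility question look like this in Python:

    n = 3 * 5 * 72
    print(n % 21 == 0)                           # False: the "multiply it out" route (n = 1080)

    # the factorization route: 21 = 3 * 7, and since 7 is prime it divides
    # 3 * 5 * 72 only if it divides one of the factors
    print(any(f % 7 == 0 for f in (3, 5, 72)))   # False, so 21 cannot divide the product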

Unique

By definition, a set $R$ of ordered pairs has the functional property if any two pairs in $R$ with the same first coordinate have the same second coordinate.

It is wrong to rephrase the definition this way: “The first coordinate determines a unique second coordinate.” That use of “unique” is ambiguous. It could be read as saying that the set \[\{(1,2), (2,4), (3,2), (5,8)\}\] does not have the functional property because the first coordinate in $(1,2)$ determines $2$ and the first coordinate in $(3,2)$ also determines $2$, so the second coordinate is “not unique”. That reading is wrong: the set does have the functional property.

A related error is to reword the definition of injective by saying, “For each input there is a unique output.” It is easy to read this and think injectivity is merely the functional property.
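
The distinction can be made mechanical. This sketch is my own illustration of the two conditions applied to the set above.

    def functional(pairs):
        # same first coordinate forces same second coordinate
        return all(b == d for (a, b) in pairs for (c, d) in pairs if a == c)

    def injective(pairs):
        # same second coordinate forces same first coordinate
        return all(a == c for (a, b) in pairs for (c, d) in pairs if b == d)

    R = {(1, 2), (2, 4), (3, 2), (5, 8)}
    print(functional(R))   # True: R is a function
    print(injective(R))    # False: 1 and 3 both go to 2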

It seemed to me that during the 35 years I taught calculus and discrete math, students fell into this trap about 100,000 times. Of course, this could be a slight exaggeration.


Avoid rewording any definition that does not use the word unique
so that it DOES use the word unique.
Such activity fries your brain and turns A’s into B’s.


Unnecessarily weak assertion

Examples

  • The statement “Either $x \gt 0$ or $x \lt 2$” is true (for real numbers). Yes, you could make a stronger statement, for example “Either $x\le 0$ or $x \gt 0$”. But the statement “Either $x \gt 0$ or $x \lt 2$” is still true.
  • Some students have problems with the true statements “$2\le 2$” and with “$2\le 3$” for a similar reason, since in fact $2 = 2$ and $2 \lt 3$.
  • You may get a twinge if someone says “Many primes are odd”, since in fact there is only one that is not odd. But it is still true that many primes are odd.

An unnecessarily weak assertion may occur in math texts because it is the form your proof gives you, or it is the form you need for a proof. In the latter case you may feel the author has pulled a rabbit out of a hat.

There is another example here.


It is not wrong for an author to make an unnecessarily weak assertion.




Rabbits

Sometimes when you are reading or listening to a proof you will find yourself following each step but with no idea why these steps are going to give a proof. This can happen with the whole structure of the proof or with the sudden appearance of a step that seems like the prover pulled a rabbit out of a hat. You feel as if you are walking blindfolded.

Example (mysterious proof structure)

The lecturer says he will prove that for an integer $n$, if $n^2$ is even then $n$ is even. He begins the proof, “Let $n$ be odd”, and then continues to the conclusion, “Therefore $n^2$ is odd.”

Why did he begin a proof about being even with the assumption that $n$ is odd?

The answer is that in this case he is doing a proof by contrapositive . If you don’t recognize the pattern of the proof you may be totally lost. This can happen if you don’t recognize other forms, for example contradiction and induction.

Example (rabbit)

You are reading a proof that $\lim_{x\to 2}x^2=4$. It is an $\varepsilon\text{-}\delta$ proof, so what must be proved is:

  • (*) For any positive real number $\varepsilon $,
  • there is a positive real number $\delta $ for which:
  • if $\left| x-2 \right|\lt\delta$ then
  • $\left| x^2-4 \right|\lt\varepsilon$.

Proof

Here is the proof, with what I imagine might be your agitated reaction to certain steps. Below is a proof with detailed explanations.

1) Suppose $\varepsilon \gt0$ is given.

2) Let $\delta=\min\left(1,\frac{\varepsilon}{5}\right)$ (the minimum of the two numbers $1$ and $\frac{\varepsilon}{5}$).

Where the *!#@! did that come from? They pulled it out of thin air! I can’t see where we are going with this proof!

3) Suppose that $\left| x-2 \right|\lt\delta$.

4) Then $\left| x-2 \right|\lt1$ by (2) and (3).

5) By (4) and algebra, $\left|x+2 \right|\lt5$.

Well, so what? We know that $\left| x+39 \right|\lt42$ and lots of other things, too. Why did they do this?

6) Also $\left| x-2 \right|\lt\frac{\varepsilon }{5}$ by (2).

7) Then $\left| x^2-4 \right|=\left| (x-2)(x+2) \right|\lt\frac{\varepsilon}{5}\cdot 5=\varepsilon$ by (5) and (6). End of Proof.
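
The choice of $\delta$ can also be sanity-checked numerically. This is my own sketch, not part of the proof, and of course no amount of sampling replaces the argument above.

    import random

    for eps in (10.0, 1.0, 0.01):
        delta = min(1, eps / 5)
        for _ in range(1000):
            x = 2 + random.uniform(-delta, delta)
            assert abs(x ** 2 - 4) < eps
    print("no counterexamples found")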

Remarks

This proof is typical of proofs in texts.

  • Steps 2) and 5) look like they were rabbits pulled out of a hat.
  • The author gives no explanation of where they came from.
  • Even so, each step of the proof follows from previous steps, so the proof is correct.
  • Whether you are surprised or not has nothing to do with whether it is correct.
  • In order to understand a proof, you do not have to know where the rabbits came from.
  • In general, the author did not think up the proof steps in the order they occur in the proof. (See this remark in the section on Forms of Proofs.)
  • See also look ahead.

Acknowledgments

Thanks to Robert Burns for corrections and suggestions.

Variations in meaning in math

Words in a natural language may have different meanings in different social groups or different places.  Words and symbols in both mathematical English and the symbolic language vary according to specialty and, occasionally, country (see convention, default).  And words and symbols can change their meanings from place to place within the same mathematical discourse (see scope).

This article mostly provides pointers to other articles in abstractmath.org that give more details about the ideas.

Conventions

A convention in mathematical discourse is notation or terminology used with a special meaning in certain contexts or in certain fields. Articles and books in a specialty do not always clue you in on these conventions.

Some conventions are nearly universal in math.

Example 1

The use of “if” to mean “if and only if” in a definition is a convention. More about this here. This is a hidden definition by cases. “Hidden” means that no one tells the students, except for Susanna Epp and me.

Example 2

Constants or parameters are conventionally denoted by $a$, $b$, …, functions by $f$, $g$, …, and variables by $x$, $y$, …. More.

Example 3

Referring to a group (or other mathematical structure) and its underlying set by the same name is a convention.  This is an example of both synecdoche and context-sensitive.

Example 4

The meaning of $\sin^n x$ in many calculus books is:

  • The inverse sine (arcsin) if $n=-1$.
  • The multiplicative power for positive $n$; in other words, $\sin^n x=(\sin x)^n$ if $n\ne -1$.

This, like Example 1, is a definition by cases. Unlike Example 1, calculus books often make it explicit. Explicit or not, this usage is an abomination.

Some conventions are pervasive among mathematicians but different conventions hold in other subjects that use mathematics.

  • Scientists and engineers may regard a truncated decimal such as 0.252 as an approximation, but a mathematician is likely to read it as an exact rational number, namely $\frac{252}{1000}$.
  • In most computer languages a distinction is made between real numbers and integers; 42 would be an integer but 42.0 would be a real number. Older mathematicians may not know this.
  • Mathematicians use $i$ to denote the imaginary unit. In electrical engineering it is commonly denoted $j$ instead, a fact that many mathematicians are unaware of. I first learned about it when a student asked me if $i$ was the same as $j$.

Conventions may vary by country.

  • In France and possibly other countries, schools may use “positive” to mean “nonnegative”, so that zero is positive.
  • In the secondary schools in some places, the value of $\sin x$ may be computed clockwise starting at $(0,1)$ instead of counterclockwise starting at $(1,0)$. I have heard this from students.

Conventions may vary by specialty within math.

“Field” and “log” are examples.

Defaults

An interface to a computer program may have many possible choices for the user to make. In most cases, the interface will use certain choices automatically when the user doesn’t specify them.  One says the program defaults to those choices.  

Examples

  • A word processing program may default to justified paragraphs and insert mode, but allow you to pick ragged right or typeover mode.
  • There is a sense in which the word “ski” defaults to snow skiing in Minnesota and to water skiing in Georgia.
  • I have spent a lot of time in both Minnesota and Georgia and the remarks about skiing are based on my own observation. But these usages are not absolute. Some affluent Georgians may refer to snow skiing as “skiing”, for example, and this usage can result in a put-down if the hearer thinks they are talking about water skiing. One wonders where the boundary line is. Perhaps people in Kentucky are confused on the issue.
  • “CSU” defaults to Cleveland State University in northern Ohio and to Colorado State University in parts of the west.

Math language behaves in this way, too.

Default usage in mathematical discourse

Symbols

  • In high school, $\pi$ refers by default to the ratio of the circumference of a circle to its diameter.  Students are often quite surprised when they get to abstract math courses and discover the many other meanings of $\pi $ (see here).
  • Recently authors in the popular literature seem to think that $\phi$ (phi) defaults to the golden ratio.  In fact, a search through the research literature shows very few hits for $\phi$ meaning the golden ratio: in other words, it usually means something else. 
  • The set $\mathbb{R}$ of real numbers has many different group structures defined on it but “The group $\mathbb{R}$” essentially always means that the group operation is ordinary addition.  In other words, “$\mathbb{R}$” as a group defaults to +.  Analogous remarks apply to “the field $\mathbb{R}$”. 
  • In informal conversation among many analysts, functions are continuous by default.
  • It used to be the case that in informal conversations among topologists, “group” defaulted to Abelian group. I don’t know whether that is still true or not.

Remark

This meaning of “default” has made it into dictionaries only since around 1960 (see the Wikipedia entry). This usage does not carry a derogatory connotation.   In abstractmath.org I am using the word to mean a special type of convention that imposes a choice of parameter, so that it is a special case of both “convention” and “suppression of parameters”.

Scope

Both mathematical English and the symbolic language have a feature that is uncommon in ordinary spoken or written English:  The meaning of a phrase or a symbolic expression can be different in different parts of the discourse.   The portion of the text in which a particular meaning is in effect is called the scope of the meaning.  This is accomplished in several ways.

Explicit statement

Examples

  • “In this paper, all groups are abelian”. This means that every group mentioned in the paper, whether referred to by the word “group” or denoted by a symbol, is constrained to be abelian. The scope in this case is the whole paper. See assumption.
  • “Suppose (or “let” or “assume”) $n$ is divisible by $4$”. Before this statement, you could not assume $n$ is divisible by $4$. Now you can, until the end of the current paragraph or section.

Definition

The definition of a word, phrase or symbol sets its meaning.  If the word definition is used and the scope is not given explicitly, it is probably the whole discourse.

Example

“Definition.  An integer is even if it is divisible by 2.”  This is marked as a definition, so it establishes the meaning of the word “even” (when applied to an integer) for the rest of the text. 

If

Used in modus ponens (see here) and (along with let, usually “now let…”) in proof by cases.

Example (modus ponens)

Suppose you want to prove that if an integer $n$ is divisible by $4$ then it is even. To show that it is even you must show that it is divisible by $2$. So you write:

  • “Let $n$ be divisible by $4$. That means $n=4k$ for some integer $k$. But then $n=2(2k)$, so $n$ is even by definition.”

Now if you start a new paragraph with something like “For any integer $n\ldots$” you can no longer assume $n$ is divisible by $4$.

Example (proof by cases)

Theorem: For all integers $n$, $n^2+n+1$ is odd.

Definitions:

  • “$n$ is even” means that $n=2s$ for some integer $s$.
  • “$n$ is odd” means that $n=2t+1$ for some integer $t$.

Proof:

  • Suppose $n$ is even. Then

    \[\begin{align*}
    n^2+n+1&=4s^2+2s+1\\
    &=2(2s^2+s)+1\\
    &=2(\text{something})+1
    \end{align*}\]

    so $n^2+n+1$ is odd. (See Zooming and Chunking.)

  • Now suppose $n$ is odd. Then

    \[\begin{align*}
    n^2+n+1&=(2t+1)^2+2t+1+1\\
    &=4t^2+4t+1+2t+1+1\\
    &=2(2t^2+3t)+3\\
    &=2(2t^2+3t+1)+1\\
    &=2(\text{something})+1
    \end{align*}\]

    So $n^2+n+1$ is odd.

Remark

The proof I just gave uses only the definition of even and odd and some high school algebra. Some simple grade-school facts about even and odd numbers are:

  • Even plus even is even.
  • Odd plus odd is even.
  • Even times even is even.
  • Odd times odd is odd.

Put these facts together and you get a nicer proof (I think anyway): $n^2+n$ is even, so when you add $1$ to it you must get an odd number.
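
Either proof can be double-checked by brute force over a range of integers; the one-liner below is my own illustration and is, of course, no substitute for the proof.

    print(all((n * n + n + 1) % 2 == 1 for n in range(-1000, 1001)))   # True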

Bound variables

A variable is bound if it is in the scope of an integral, quantifier, summation, or other binding operator. More here.

Example

Consider this text:

“Exercise: Show that for all real numbers $x$, it is true that $x^2\geq0$. Proof: Let $x=-2$. Then $x^2=(-2)^2=4$, which is greater than $0$. End of proof.”

The problem with that text is that in the statement, “For all real numbers $x$, it is true that $x^2\geq0$”, $x$ is a bound variable. It is bound by the universal quantifier “for all”, which means that $x$ can be any real number whatever. But in the next sentence, the meaning of $x$ is changed by the assumption that $x=-2$. So the statement that $x^2\geq0$ has been verified only for $-2$. As a result the proof does not cover all cases.

Many students just beginning to learn to do proofs make this mistake. Fellow students who are a little further along may be astonished that someone would write something like that paragraph and might sneer at them. But this common mistake does not deserve a sneer, it deserves an explanation. This is an example of the ratchet effect.

Variable meaning in natural language

Meanings commonly vary in natural language because of conventions and defaults. But varying in scope during a conversation seems to me uncommon.

It does occur in games. In Skat and Bridge, the meaning of “trump” changes from hand to hand. The meaning of “strike” in a baseball game changes according to context: If the current batter has already had fewer than two strikes, a foul is a strike, but not otherwise.

I have not come up with non-game examples, and anyway games are played by rules that are suspiciously like mathematical axioms. Perhaps you can think of some non-game occasions in which meaning is determined by scoping that I have overlooked.

Thinking about thought

Modules of the brain

Cognitive neuroscientists have taken the point of view that concepts, memories, words, and so on are represented in the brain by physical systems: perhaps they are individual neurons, or systems of structures, or even waves of discharges. In my previous writing I have referred to these as modules, and I will do that here. Each module is connected to many other modules that encode various properties of the concept, thoughts and memories that occur when you think of that concept (in other words stimulate the module), and so on.

How these modules implement the way we think and perceive the world is not well understood and forms a major research task of cognitive neuroscience. The fact that they are implemented in physical systems in the brain gives us a new way of thinking about thought and perception.

Examples

The grandmother module

There is a module in your brain representing the concept of grandmother. It is likely to be connected to other modules representing your actual grandmothers if you have any memory of them. These modules are connected to many others — memories (if you knew them), other relatives related to them, incidents in their lives that you were told about, and so on. Even if you don’t have any memory of them, you have a module representing the fact that you don’t have any memory of them, and maybe modules explaining why you don’t.

Each different aspect related to “grandmother” belongs to a separate module somehow connected to the grandmother module. That may be hard to believe, but the human brain has over eighty billion neurons.

A particular module connected with math

There is a module in your brain connected with the number $42$. That module has many connections to things you know about it, such as its factorization, the fact that it is an integer, and so on. The module may also have connections to a module concerning the attitude that $42$ is the Answer. If it does, that module may have a connection with the module representing Douglas Adams. He was physically outside your body, but is the number $42$ outside your body?

That has a decidedly complicated answer. The number $42$ exists in a network of brains which communicate with each other and share some ideas about properties of $42$. So it exists socially. This social existence occasionally changes your knowledge of the properties of $42$ and in particular may make you realize that you were wrong about some of its aspects. (Perhaps you once thought it was $7\times 8$.)

This example suggests how I have been using the module idea to explain how we think about math.

A new metaphor for understanding thinking

I am proposing to use the idea of module as a metaphor for thinking about thinking. I believe that it clarifies a lot of the confusion people have about the relation between thinking and the real world. In particular it clarifies why we think of mathematical objects as if they were real-world objects (see Modules and math below.)

I am explicitly proposing this metaphor as a successor to previous metaphors drawn from science to explain things. For example when machines became useful in the 18th century many naturalists used metaphors such as the Universe is a Machine or the Body is a Machine as a way of understanding the world. In the 20th century we fell heavily for the metaphor that the Mind Is A Computer (or Program). Both the 18th century and the 20th century metaphors (in my opinion) improved our understanding of things, even though they both fell short in many ways.

In no way am I claiming that the ways of thinking I am pushing have anything but a rough resemblance to current neuroscientists’ thinking. Even so, further discoveries in neuroscience may give us even more insight into thinking than they do now. Unless at some point something goes awry and we have to, ahem, think differently again.

For thousands of years, new scientific theories have been giving us new metaphors for thinking about life, the universe and everything. I am saying: here is a new apple on the tree of knowledge; let’s eat it.

The rest of this post elaborates my proposed metaphor. Like any metaphor, it gets some things right and some wrong, and my explanations of how it works are no doubt full of errors and dubious ideas. Nevertheless, I think it is worth thinking about thought using these ideas with the usual correction process that happens in society with new metaphors.

Our theory of the world

We don’t have any direct perception of the “real world”; we have only the sensations we get from those parts of our body which sense things in the world. These sensations are organized by our brain into a theory of the world.

  • The theory of the world says that the world is “out there” and that our sensory units give us information about it. We are directly aware of our experiences because they are a function of our brain. That the experiences (many of them) originate from outside our body is a very plausible theory generated by our brain on the basis of these experiences.
  • The theory is generated by our brain in a way that we cannot observe and is out of our control (mostly). We see a table and we know we can see it in daytime but not when it is dark, and we can bump into it, which causes experiences to occur via our touch and sound facilities. But the concept of “table” and the fact that we decide something is or is not a table takes place in our brain, not “out there”.
  • We do make some conscious amendments to the theory. For example, we “know” the sky is not a blue shell around our world, although it looks like it. That we think of the apparent blue surface as an artifact of our vision processing comes about through conscious reasoning. But most of how we understand the world comes about subconsciously.
  • Our brain (and the rest of our body) does an enormous amount of processing to create the view of the world that we have. Visual perception requires a huge amount of processing in our brain and the other sensory methods we use also undergo a lot of processing, but not as much as vision.
  • The theory of the world organizes a lot of what we experience as interaction with physical objects. We perceive physical objects as having properties such as persistence, changing with time, and so on. Our brains create the concept of physical object and the properties of persistence, changing, and particular properties an individual object might have.
  • We think of the Mississippi River as an object that is many years old even though none of its current molecules are the same as were in the river a decade ago. How is it one thing when its substance is constantly changing? This is a famous and ancient conundrum which becomes a non-problem if you realize that the “object” is created inside your brain and imposed by your thinking on your understanding of the world.
  • The notion that semantics is a connection between our brain and the outside world has also become a philosophical conundrum that vanishes if we understand that the connection with the outside world exists entirely inside our theory, which is entirely within our brain.

Society

Our brain also has a theory of society. We are immersed in a world of people; we have close connections with some of them and more distant connections with many others via speech, stories, reading and various kinds of long-distance communication.

  • We associate with individual people, in our family and with our friends. The communication is not just through speech: it involves vision heavily (seeing what The Other is thinking) and probably through pheromones, among other channels. For one perspective on vision, see The vision revolution, by Mark Changizi. (Review)
  • We consciously and unconsciously absorb ideas and attitudes (cultural and otherwise) from the people around us, especially including the adults and children we grow up with. In this way we are heavily embedded in the social world, which creates our point of view and attitudes by our observation and experience and presumably via memes. An example is the widespread recent changes in attitudes in the USA concerning gay marriage.
  • The theory of society seems to me to be a mechanism in our brain that is separate from our theory of the physical world, but which interacts with it. But it may be that it is better to regard the two theories as modules in one big theory.

Modules and math

The module associated with a math object is connected to many other modules, some of which have nothing to do with math.

  • For example, they may have connections to our sensory organs. We may get a physical feeling that the parabola $y=x^2$ is going “up” as $x$ “moves to the right”. The mirror neurons in our brain that “feel” this are connected to our “parabola $y=x^2$” module. (See Constructivism and Platonism and the posts it links to.)
  • I tend to think of math objects as “things”. Every time I investigate the number $111$, it turns out to be $3\times37$. Every time I investigate the alternating group on $6$ letters it is simple. If I prove a new theorem it feels as if I have discovered the theorem. So math objects are out there and persistent.
  • If some math calculation does not give the same answer the second time I frequently find that I made a mistake. So math facts are consistent.
  • There is presumably a module that recognizes that something is “out there” when I have repeatable and consistent experiences with it. The feeling originates in a brain arranged to detect consistent behavior. The feeling is not evidence that math objects exist in some ideal space. In this way, my proposed new way of thinking about thought abolishes all the problems with Platonism.
  • If I think of two groups that are isomorphic (for example the cyclic group of order $3$ and the alternating group on $3$ letters), I picture them as in two different places with a connection between them. This phenomenon is presumably connected with modules that respond to seeing physical objects and carry with them a sense of where they are (two different places). This is a strategy my brain uses to think about objects without having to name them, using the mechanism already built in to think about two things in different places.

Acknowledgments

Many of the ideas in this post come from my previous writing, listed in the references. This post was also inspired by ideas from Chomsky, Jackendoff (particularly Chapter 9), the Scientific American article Brain cells for Grandmother by Quian Quiroga, Fried and Koch, and the papers by Ernest and Hersh.


References

Previous posts

In reverse chronological order

Abstractmath articles

Other sources

Dysfunctions in doing math II

This post continues Dysfunctions in doing math I, with some more revisions to the article in abstractmath on dysfunctions.

Elements

First Myth

MYTH: There are two kinds of mathematical objects: "sets" and "elements".

This is the TRUTH: Being an element is not a property that some math objects have and others don’t. “Element” is a binary relation; it relates an object and a set. So “$3$ is an element” means nothing, but “$3$ is an element of the set of integers” is true and relates two mathematical objects to each other.


Any mathematical object can be an element of a set.
In particular, any set can be an element of another set.

Examples

  • The number $42$ is not a set, but it is an element of the set $\{5,10,41,42,-30\}$.
  • The sine function is not a set, but it is an element of the set of all differentiable functions defined on the real numbers.
  • The set $\{1,2,5\}$ is a set, but it is also an element of the set $\left\{\{1,2,5\},\{3,5\}, \emptyset,\{42\}\right\}$. It is not an element of the set $\{1,2,3,4,5\}$.

If you find these examples confusing, read this.

Second Myth

MYTH: The empty set is an element of every set.

This is the TRUTH:
The empty set is an element of a set $S$ if and only if the definition of $S$ requires it to be an element.

Examples

  • The empty set is not an element of every set. It is not an element of the set $\{2,3\}$ for example; that set has only the elements $2$ and $3$.
  • The empty set is an element of the set $\{2,3,\emptyset\}$.
  • The empty set is a subset of every set.

Other ways to misunderstand sets

The myths just listed are explicit; students tell them to each other. The articles below tell you about other misunderstandings about sets, which are usually subconscious.

Enthymeme

An enthymeme is an argument based partly on unexpressed beliefs. Beginners at the art of writing proofs often produce enthymemes.

Example

In the process of showing that the intersection of two equivalence relations $E$ and $E'$ is also an equivalence relation, a student may write “$E\cap E'$ is transitive because $E$ and $E'$ are transitive.”

  • This is an enthymeme; it omits stating, much less proving, that the intersection of transitive relations is transitive.
  • The student may “know” that it is obvious that the intersection of transitive relations is transitive, having never considered the similar question of the union of transitive relations.
  • It is very possible that the student possesses (probably subconsciously) a malrule to the effect that for any property $P$ the union or intersection of relations with property $P$ also has property $P$.
  • The instructor very possibly suspects this. For some students, of course, the suspicion will be unjustified, but for which ones?
  • This sort of thing is a frequent source of tension between student and instructor: “Why did you take points off because I assumed the intersection of transitive relations is transitive? It’s true!”

Malrule

A malrule is an incorrect rule for syntactic transformation of a mathematical expression.

Example

The malrule $\sqrt{x+y}=\sqrt{x}+\sqrt{y}$ invented by algebra students may come from the pattern given by the distributive law $a(x+y)=ax+ay$. The malrule invented by many first year calculus students that transforms $\frac{d(uv)}{dx}$ to $\frac{du}{dx}\frac{dv}{dx}$ may have been generated by extrapolating from the correct rule
\[\frac{d(u+v)}{dx}=\frac{du}{dx}+\frac{dv}{dx}\] by changing addition to multiplication. Both are examples of “every operation is linear”, which students want desperately to be true, although they are not aware of it.
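
A single numerical counterexample is enough to kill the first malrule; this snippet is my own illustration.

    from math import sqrt
    print(sqrt(9 + 16), sqrt(9) + sqrt(16))   # 5.0 versus 7.0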

Existential bigamy

Beginning abstract math students sometimes make a particular type of mistake that occurs in connection with a property $P$ of a mathematical object $x$ that is defined by requiring the existence of an item $y$ with a certain relationship to $x$. When students have a proof that assumes that there are two items $x$ and $x'$ with property $P$, they sometimes assume that the same $y$ serves for both of them. This mistake is called existential bigamy: The fact that Muriel and Bertha are both married (there is a person to whom Muriel is married and there is a person to whom Bertha is married) doesn’t mean they are married to the same person.

Example

Let $m$ and $n$ be integers. By definition, $m$ divides $n$ if there is an integer $q$ such that $n=qm$. Suppose you are asked to prove that if $m$ divides both $n$ and $p$, then $m$ divides $n+p$. If you begin the proof by saying, “Let $n = qm$ and $p = qm$…” then you are committing existential bigamy.

You need to begin the proof this way: “Let $n = qm$ and $p = q'm$…”

Inverse image demo revisited

This post is an update of the post Demonstrating the inverse image of a function.

To manipulate the demos in this post, you must have Wolfram CDF Player installed on your computer. It is available free from the Wolfram website. The code for the demos, with some explanatory remarks, is in the file InverseImage.nb on my Mathematica website. That website also includes some other examples as .cdf files.

  • If the diagrams don’t appear, or appear but show pink diagrams, or if the formulas in the text are too high or too low, refresh the screen.
  • The vertical red interval has the horizontal green interval(s) as inverse image.
  • You can move the sliders back and forth to move to different points on the curve. As you move the sliders back and forth you will see the inverse image breaking up into a disjoint union of intervals, merging into a single interval, or disappearing entirely.
  • The arrow at the upper right makes it run automatically in a not very useful sort of way.
  • If you are using Mathematica, you can enter values into the boxes, but if you are using CDF Player, you can only change the number using the slider or the plus and minus incrementers.

 

This is the graph of $y=x^2-1$.

The graph of $-0.5 + 0.5x + 0.2x^2 - 0.19x^3 - 0.015x^4 + 0.01x^5$

The graph of the rational function $0.5 x+\frac{1.5 \left(x^4-1\right)}{x^4+1}$

The graph of a straight line whose slope can be changed. You can design demos of other functions with variable parameters.

The graph of the sine function. The other demos were coded using the Mathematica Reduce function to get the inverse image. This one had to be done in an ad hoc way as explained in the InverseImage.nb file.
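
For readers without Mathematica, here is a rough Python analogue (my own sketch, not the Mathematica code described above) of what the first demo computes: the inverse image of a vertical interval $[c,d]$ under $f(x)=x^2-1$, approximated on a grid of $x$ values.

    def inverse_image(f, c, d, xs):
        return [x for x in xs if c <= f(x) <= d]

    def runs(points, step):
        # group consecutive grid points into maximal runs, i.e. approximate intervals
        out, start, prev = [], points[0], points[0]
        for p in points[1:]:
            if p - prev > 1.5 * step:
                out.append((start, prev))
                start = p
            prev = p
        out.append((start, prev))
        return out

    xs = [k / 1000 for k in range(-3000, 3001)]              # grid on [-3, 3]
    hits = inverse_image(lambda x: x * x - 1, 0.0, 1.0, xs)
    print(runs(hits, 0.001))   # two intervals, roughly (-1.414, -1.0) and (1.0, 1.414)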

 

Dysfunctions in doing math I

I am in the middle of revising the article in abstractmath.org on dysfunctional attitudes and behaviors in doing math. Here are three of the sections I have finished.

Misuse of analogy

When William Rowan Hamilton was trying to understand the new type of number called quaternions (MW, Wik) that he invented, he assumed by analogy that like other numbers, quaternion multiplication was commutative. It was a major revelation to him that they were not commutative.

Analogy may suggest new theorems or ways of doing things. But it is fallible. What happens particularly often in abstract math is applying a rule to a situation where it is not appropriate. This is an easy trap to fall into when the notation in two different cases has the same form; that is an example of formal analogy.

Matrix multiplication

Matrix multiplication is not commutative

If $r$ and $s$ are real numbers then the products $rs$ and $sr$ are always the same number. In other words, multiplication of real numbers is commutative: $rs = sr$ for all real numbers $r$ and $s$.

The product of two matrices $M$ and $N$ is written $MN$, just as for numbers. But matrix multiplication is not commutative. For example,
\[\left(
\begin{array}{cc}
1 & 2 \\
3 & 4\\
\end{array}
\right)
\left(
\begin{array}{cc}
3 & 1 \\
3 &2\\
\end{array}
\right)
=
\left(
\begin{array}{cc}
9 & 5\\
21 & 11 \\
\end{array}
\right)\]
but
\[\left(
\begin{array}{cc}
3 & 1 \\
3 & 2\\
\end{array}
\right)
\left(\begin{array}{cc}
1 & 2 \\
3 & 4\\
\end{array}
\right)
=
\left(
\begin{array}{cc}
6 & 10\\
9 & 14 \\
\end{array}
\right)\]
Because $rs = sr$ for numbers, the formal similarity of the notation suggests $MN$ = $NM$, which is wrong.

This means you can’t blindly manipulate $MNM$ to become $M^2N$. More generally, a law such as $(MN)^n=M^nN^n$ need not hold when $M$ and $N$ are matrices.


You must understand the meanings
of the symbols you manipulate.

The product of two nonzero matrices can be 0

If the product of two numbers is 0, then one or both of the numbers is zero. But that is not true for matrix multiplication:
\[\left(
\begin{array}{cc}
-2 & 2 \\
-1 & 1\\
\end{array}
\right)
\left(
\begin{array}{cc}
1 & 1 \\
1 &1\\
\end{array}
\right)
=
\left(
\begin{array}{cc}
0 &0\\
0 & 0 \\
\end{array}
\right)\]
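
Both facts are easy to confirm numerically; the check below is my own illustration and assumes numpy is available.

    import numpy as np

    M = np.array([[1, 2], [3, 4]])
    N = np.array([[3, 1], [3, 2]])
    print(M @ N)    # [[ 9  5] [21 11]]
    print(N @ M)    # [[ 6 10] [ 9 14]]  -- not the same matrix

    A = np.array([[-2, 2], [-1, 1]])
    B = np.array([[1, 1], [1, 1]])
    print(A @ B)    # the zero matrix, although neither A nor B is zero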

Canceling sine

  • Beginning calculus students have already learned algebra.
  • They have learned that an expression such as $xy$ means $x$ times $y$.
  • They have learned to cancel like terms in a quotient, so that for example \[\frac{3x}{3y}=\frac{x}{y}\]
  • They have learned to write the value of a function $f$ at the input $x$ by $f(x)$.
  • They have seen people write $\sin x$ instead of $\sin(x)$ but have never really thought about it.
  • So they write \[\frac{\sin x}{\sin y}=\frac{x}{y}\]

This happens fairly often in freshman calculus classes. But you wouldn’t do that, would you?

Boundary values of definitions

Definitions are usually inclusive

Definitions of math concepts usually include the special cases they generalize.

Examples

  • A square is a special case of rectangle. As far as I know, texts that define “rectangle” include squares in the definition. Thus a square is a rectangle.
  • A straight line is a curve.
  • A group is a semigroup.
  • An integer is a real number. (But not always in computing languages — see here.)

But not always

  • The axioms of a field include a bunch of axioms that a one-element set satisfies, plus a special axiom that does nothing but exclude the one-element set. So a field has to have at least two elements, and that fact does not follow from the other axioms.
  • Boolean algebras are usually defined that way, too, but not always. MathWorld gives several definitions of Boolean algebra that disagree on this point.

When boundary values are not special cases

Definitions may or may not include other types of boundary values.

Examples

  • If $S$ is a set, it is a subset of itself. The empty set is also a subset of $S$.
  • Similarly, the divisors of $6$ are $-6$, $-3$, $-2$, $-1$, $1$, $2$, $3$ and $6$, not just $2$ and $3$ and not just $1$, $2$, $3$ and $6$ (there are two different boundaries here; see the sketch just after this list).
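A small computational sketch of that example (the function below and its name are mine, introduced only for illustration):

    def divisors(n):
        """All integer divisors of a nonzero integer n, negative as well as positive."""
        positive = [d for d in range(1, abs(n) + 1) if n % d == 0]
        return [-d for d in reversed(positive)] + positive

    print(divisors(6))   # [-6, -3, -2, -1, 1, 2, 3, 6]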

But …

  • The positive real numbers include everything bigger than $0$, but not $0$. (Note).

Blunders

A definition that includes such special cases may be called inclusive; otherwise it is exclusive. People new to abstract math very commonly use words defined inclusively as if their definitions were exclusive.

  • They say things such as “That’s not a rectangle, it is a square!” and “Is that a group or a semigroup?”
  • They object if you say “Consider the complex number $\pi $.”

This appears to be natural linguistic behavior. Even so, math is picky-picky: a square is a rectangle, a group is a semigroup and $\pi$ is a complex number (of course, it is also a real number).

Co-intimidator

  • You attend a math lecture and the speaker starts talking about things you never heard of.
  • Your fellow students babble at you about manifolds and tensors and you thought they were car parts and lamps.
  • You suspect your professor is deliberately talking over your head to put you down.
  • You suspect your friends are trying to make you believe they are much smarter than you are.
  • You suspect your friends are smarter than you are.

There are two possibilities:

  • They are not trying to intimidate you (most common).
  • They are deliberately setting out to intimidate you with their arcane knowledge so you will know what a worm you are. (There are people like that.)

Another possibility, which can overlap with the two above, is:

  • You expect to be intimidated. You may be what might be called a co-intimidator, similar to the way someone who is codependent wants some other person to be dependent on them. (This is not like the “co” in category theory: “product” and “coproduct” have a symmetric relationship with each other, but the co-intimidator relation is asymmetric.)

There are many ways to get around being intimidated.

  • Ask “What the heck is a manifold?”
  • (In a lecture where it might be imprudent or impractical to ask) Write down what they say, then later ask a friend or look it up.
  • Most teachers like to be asked to explain something. Yes, I know some professors repeatedly put down people. Change sections! If you can’t, live with it! Not knowing something says nothing bad about you.

And remember:


If you don’t know something
probably many other students don’t know it either.

A proof by diagram chasing



In Rigorous proofs, I went through a medium-easy epsilon-delta proof in great detail as a way of showing what is hidden by the wording of the proof. In this post, I will do the same for an easy diagram-chasing proof in category theory. This theorem is stated and proved in Category Theory for Computing Science, page 365, but the proof I give here maximizes the diagram-chasing as a way of illustrating the points I want to make.

Theorem (J. Lambek) Let $F$ be a functor from a category to itself and let $\alpha:Fa\to a$ be an algebra for $F$ which is initial. Then $\alpha$ is an isomorphism.

Proof

  1. $F\alpha:FFa\to Fa$ is also an $F$-algebra.
  2. Initiality means that there is a unique algebra morphism $\eta:a\to Fa$ from $\alpha:Fa\to a$ to $F\alpha:FFa\to Fa$ for which this diagram commutes (a plain-text sketch of the diagram appears just after the proof):



  3. To that diagram we can adjoin another (obviously) commutative square:



  4. Then the outside rectangle in the diagram above also commutes.
  5. This means that $\alpha\circ\eta:a\to a$ is an $F$-algebra morphism from $\alpha:Fa\to a$ to itself.
  6. Another such $F$-algebra morphism is $\text{id}_{a}$.
  7. Initiality of $\alpha$ means that the diagram below commutes:



  8. Because the upper bow and the left square both commute we are justified in inserting a diagonal arrow as below.



  9. Now we can read off the diagram that $\eta\circ\alpha=\text{id}_{Fa}$, and the lower bow in (7) gives $\alpha\circ\eta=\text{id}_a$. By definition, then, $\eta$ is a two-sided inverse to $\alpha$, so $\alpha$ is an isomorphism.
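The diagrams in steps (2), (3), (7) and (8) are images that do not appear in this text-only version. As a rough substitute, here is a plain-array sketch, reconstructed from the wording of the steps, of the rectangle in step (3); its left square is the diagram in step (2):
\[\begin{array}{ccccc}
Fa & \xrightarrow{\ F\eta\ } & FFa & \xrightarrow{\ F\alpha\ } & Fa\\
\downarrow{\scriptstyle\alpha} & & \downarrow{\scriptstyle F\alpha} & & \downarrow{\scriptstyle\alpha}\\
a & \xrightarrow{\ \eta\ } & Fa & \xrightarrow{\ \alpha\ } & a
\end{array}\]
As I read steps (7) and (8), the diagram in (7) adds a bow labeled $\text{id}_{Fa}$ joining the two outer top nodes and a bow labeled $\text{id}_a$ joining the two outer bottom nodes, and the diagram in (8) inserts the diagonal arrow $\text{id}_{Fa}$ from the upper-left $Fa$ to the lower-middle $Fa$.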

Analysis of the proof

This is an analysis of the proof showing what is not mentioned in the proof, similar to the analysis in Rigorous proofs.

  • An $F$-algebra is any arrow of the form $\alpha:Fa\to a$. This definition directly verifies statement (1). You do need to know the definition of “functor” and that the notation $Fa$ means $F(a)$ and $FFa$ means $F(F(a))$.
  • When I am chasing diagrams, I visualize the commutativity of the diagram in (2) by thinking of the red path and the blue path as having the same composites in this graph:





    In other words, $F\alpha\circ F\eta=\eta\circ\alpha$. Notice that the diagram carries all the domain and codomain information for the arrows, whereas the formula “$F\alpha\circ F\eta=\eta\circ\alpha$” requires you to hold the domains and codomains in your head.

  • (Definition of morphism of $F$-algebras) The reader needs to know that a morphism of $F$-algebras from an algebra $\gamma:Fc\to c$ to an algebra $\beta:Fd\to d$ is an arrow $\delta:c\to d$ for which the square expressing $\delta\circ\gamma=\beta\circ F\delta$ commutes.
  • (Definition of initial $F$-algebra) “$\alpha$ is an initial $F$-algebra” means that for any algebra $\beta:Fb\to b$, there is a unique $F$-algebra morphism $\delta:a\to b$ from $\alpha$ to $\beta$.
  • (2) is justified by the last two definitions.
  • Pulling a “rabbit out of a hat” in a proof means introducing something that is obviously correct with no motivation, and then checking that it results in a proof. Step (9) in the proof given in Rigorous proofs has an example of adding zero cleverly. It is completely OK to pull a rabbit out of a hat in a proof, as long as the result is correct, but it makes students furious.
  • In statement (3) of the proof we are considering here, the rabbit is the trivially commutative diagram that is adjoined on the right of the diagram from (2).
  • Statement (4) uses a fact known to all diagram chasers: Two joined commutative squares make the outside rectangle commute. You can visualize this by seeing that the three red paths shown below all have the same composite. When I am chasing a complicated diagram I trace the various paths with my finger, or in my head.



    You could also show it by pointing out that $\alpha\circ F\alpha\circ F\eta=\alpha\circ\eta\circ\alpha$, but to check that I think most of us would go back and look at the diagram in (3) to see why it is true. Why not work directly with the diagram?

  • The definition of initiality requires that there be only one $F$-algebra morphism from $\alpha:Fa\to a$ to itself. This means that the upper and lower bows in (7) commute.
  • The diagonal identity arrow in (8) is justified by the fact that the upper bow is exactly the same diagram as the upper triangular diagram in (8). It follows that the upper triangle in (8) commutes. I visualize this as moving the bow down and to the left with the upper left node $Fa$ as a hinge, so that the two triangles coincide. (It needs to be flipped, too.) I should make an interactive diagram that shows this.
  • The lower triangle in (8) also commutes because the square in (2) is given to be commutative.
  • (Definition of isomorphism in a category) An arrow $f:a\to b$ in a category is an isomorphism if there is an arrow $g:b\to a$ for which $g\circ f=\text{id}_a$ and $f\circ g=\text{id}_b$. This justifies statement (9).

Remark: I have been profligate in using as many diagrams as I want because this post will be read on a screen instead of on paper. That, together with the fact that much more data about domains and codomains is visible when you use diagrams instead of equations involving composition, means that the proof requires the readers to carry much less invisible data in their heads.

Rigorous proofs

Rich and rigorous

When we try to understand a math statement, we visualize what the statement says using metaphors, images and kinetic feelings to feel how it is true, or to suggest that the statement is not true.

If we are convinced that it is true, we may then want to prove it. Doing that involves pitching out all the lovely pictures and metaphors and gestures and treating the mathematical objects involved in the proof as static and inert. “Static” means the object does not change. “Inert” means that it does not affect anything else. I am saying how we think about math objects for the purpose of rigorous proof. I am not saying anything about “what math objects are”.

In this post I give a detailed example of a proof of the rigorous sort.

Example

Informal statement

First, I’ll describe this example in typical spoken mathematical English. Suppose you suspect that the following statement is true:

Claim: Let $f(x)$ be a differentiable function with $f’(a)=0$.
Going from left to right, suppose the graph of $f(x)$ goes UP before $x$ reaches $a$ and then DOWN for $x$ to the right of $a$.
Then $a$ has to be a local maximum of the function.

This claim is written in informal math English. Mathematicians talk like that a lot. In this example they will probably wave their hands around in swoops.

The language used is an attempt to get a feeling for the graph going up to $(a,f(a))$ and then falling away from it. It uses two different metaphors for $x\lt a$ and $x\gt a$. I suspect that most of us would want to clean that up a bit even in informal writing.

A more formal statement

Theorem: Let $f$ be a real-valued differentiable function defined on an open interval $R$. Let $a$ be a number in $R$ for which $f’(a)=0$. Suppose that for all $x\in R$, $f$ increases for $x\lt a$ and decreases for $x\gt a$. Then $f(a)$ is a maximum of $f$ in $R$.

Proof

  1. By definition of derivative, \[\lim_{x\to a}\frac{f(x)-f(a)}{x-a}=0.\]
  2. By definition of limit, then for any positive $\epsilon$ there is a positive $\delta$ for which if $0\lt|x-a|\lt\delta$ then \[\left|\frac{f(x)-f(a)}{x-a}\right|\lt\epsilon.\]
  3. By requiring that $\delta\lt 1$, it follows from (2) that for any positive $\epsilon$, there is a positive $\delta$ for which if $0\lt|x-a|\lt\delta$, then $|f(x)-f(a)|\lt\epsilon$.
  4. “$f$ increases for $x\lt a$” means that if $x$ and $y$ are numbers in $R$ and $x\lt y\lt a$, then $f(x)\lt f(y)$.
  5. “$f$ decreases for $x\gt a$” means that if $x$ and $y$ are numbers in $R$ and $a\lt x\lt y$, then $f(x)\gt f(y)$.
  6. “$f(a)$ is a maximum of $f$ in $R$” means that for $x\in R$, if $x\neq a$, then $f(x)\lt f(a)$.
  7. Suppose that $x\in R$ and $x\lt a$. (The case that $x\gt a$ has a symmetric proof.)
  8. Given $\epsilon\gt0$ with $\delta$ as given by (3), choose $y\in R$ such that $x\lt y\lt a$ and $|f(y)-f(a)|\lt\epsilon$.
  9. By (4), $f(x)\lt f(y)$. So by (8), \[\begin{align*}
    f(x)-f(a)&=
    f(x)-f(y)+f(y)-f(a)\\ &\lt f(y)-f(a)\\ &\leq|f(y)-f(a)|\lt\epsilon\end{align*}\]
    so that $f(x)\lt f(a)+\epsilon$. By inserting “$-f(y)+f(y)$” into the second formula, I am “adding zero cleverly”, an example of pulling a rabbit out of a hat. Students hate that, but pulling a rabbit out of a hat doesn’t make a proof wrong; as long as the statements that follow are correct, the proof is valid. It just makes you wonder how the author thought of it. Live with it.
  10. Since (9) is true for all positive $\epsilon$, it follows that $f(x)\leq f(a)$.
  11. By the same argument as that leading up to (10), $f(\frac{x+a}{2})\leq f(a)$.
  12. Since $f(x)\lt f(\frac{x+a}{2})$, it follows that $f(x)\lt f(a)$ as required.
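Before analyzing the proof, here is a small numerical sanity check of the theorem (a sketch in Python; the function $f(x)=1-(x-1)^2$ on the interval $(0,2)$, the point $a=1$, and the sample points are my choices for illustration):

    # f(x) = 1 - (x-1)^2 on (0, 2): f'(1) = 0, f increases for x < 1 and
    # decreases for x > 1, so by the theorem f(1) should be the maximum.
    def f(x):
        return 1 - (x - 1) ** 2

    a = 1.0
    samples = [0.1 * k for k in range(1, 20) if abs(0.1 * k - a) > 1e-9]
    assert all(f(x) < f(a) for x in samples)
    print("f(a) =", f(a), "exceeds f(x) at every sampled x != a")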

About the proof

This proof is intended to be a typical “rigorous” proof. I suspect it is more rigorous than most mathematicians would find necessary.

Extensionality

The point about “rigor”, about insisting that the objects be static and inert, is that this causes symbols and expressions to retain the same meaning throughout the text. This is one aspect of extensionality.

Of course, some of the symbols denote variables, or variable objects. This does not mean they are “varying”. I am taking this point of view: A variable refers to a math object, but you don’t know what it is. Constraints such as $x\lt a$ rule out some possible values but don’t generally tell you exactly what $x$ is. There is more about this in Variable Objects.

The idea in (8), for example, is that $y$ denotes a real number. You don’t know which number it is, but you do know some facts about it: $x\lt y\lt a$, $|f(y)-f(a)|\lt\epsilon$ and so on. Similarly you don’t know what function $f$ is, but you do know some facts about it: It is differentiable, for example, and $f’(a)=0$.

My statement that the variables aren’t “varying” means specifically that each unbound occurrence of a variable refers to the same value as any other occurrence, unless some intervening remark changes its meaning. For example, the references to $x$ in (8) through (12) refer to the same value it is given in (7), and (12), in particular, constitutes a statement that the claim about $x$ is correct.

Checkability

The elimination of metaphors that lets the proof achieve rigor is part of a plan in the back of the mind of at least some mathematicians who write proofs. The idea is that the proof be totally checkable:

  • Every statement in the proof has a semantics, a meaning, that is invariant (given the remark about variables above).
  • Each statement is justified by some of the previous statements. This justification is given by two systems that the reader is supposed to understand.
  • One system is the rules of symbol manipulation that are applied to the symbolic expressions, ordinary algebra, and higher-level manipulations used in particular branches of math.
  • The other system consists of the rules of logical reasoning that justify the claims that each statement follows logically from preceding ones.
  • These two systems are really branches of one system, the entire system of math computation and reasoning. It can be obscure which system is being used in a particular step.

Suppression of reasons

The logical and symbolic-manipulation reasons justifying the deductions may not be made completely explicit. In fact, for many steps they may not be mentioned at all, and for others, one or two phrases may be used to give a hint. This is standard practice in writing “rigorous” proofs. That is a descriptive statement, made without criticism. Giving all the reasons is essentially impossible without a computer.

I am aware that some work has been done to write proof checkers that can read a theorem like the one we are considering, stated in natural language, and correctly implement the semantics I have described in this list. I don’t know of any references to such work and would appreciate information about it.

Suppression of reasons makes it difficult to mechanically check a proof written in this standard “rigorous” writing style. Basically, you must be at least at the graduate-student level to be able to make sense of what is said, and even experienced math research people find it difficult to read a paper in a very different field. Writing the proof so that it can be checked by a proof checker requires understanding of the same sort, and it typically makes the proof much longer.

One hopeful new approach is to write the proofs using homotopy type theory. The pioneers in that field report that the proofs don’t expand nearly as much as is required by first order logic.

Examples of suppression

Here are many examples of suppression in the $\epsilon$-$\delta$ proof above. This is intended to raise your consciousness concerning how nearly opaque writing in math research is to anyone but the cognoscenti.

  • The first sentence of the theorem names $R$ and $f$ and puts constraints on them that can be used to justify statements in the proof. The naming of $R$ and $f$ requires that every occurrence of $R$ in the proof refers to the same mathematical object, and similarly for $f$.

Remark: The savvy reader “knows” the facts stated in the item above, possibly entirely subconsciously. For many of us there is no conscious thought of constraints and permanence of naming. My goal is to convince those who teach beginning abstract math courses to become conscious of these phenomena. This remark applies to all the following items as well.

  • The second sentence gives $a$ a specific meaning that will be maintained throughout the proof. It also puts constraints on $a$ and an additional constraint on $f$.
  • The third sentence gives a constraint on $R$, $f$ and $a$. It does not give a constraint on $x$, which is a bound variable. Nor does it name $x$ as a specific number with the same meaning in the rest of the proof. (That happens later).
  • The fact that the first three sentences impose constraints on various objects is signaled by the fact that the sentences are introduced by “let” and “suppose”. The savvy reader knows this.
  • The fourth sentence announces that “$f(a)$ is a maximum of $f$ in $R$” is a consequence of the constraints imposed by the preceding three sentences. (In other words, it follows from the context.) This is signaled by the word “then”.
  • The fact that the paragraph is labeled “Theorem” informs us that the fourth sentence is therefore a statement of what is to be proved, and that every constraint imposed by the first three sentences of the Theorem may be used in the proof.
  • In the proof, statements (1), (4), (5) and (6) rewrite the statements in the theorem according to the definitions of the words involved, namely “derivative”, “increases”, “decreases” and “maximum”. Rewriting statements according to the definitions of the words involved is a fundamental method for starting a proof.
  • (2) follows from (1) by rewriting using the definition of “limit”. Note that pattern-matching against the definition of limit requires understanding that there is a zero inside the absolute value signs that is not written down. Could a computer proof-checker handle that?
  • (3) follows from (2). The reader or proof-checker must:
    • Know that it is acceptable to put an upper bound on $\delta$ in the definition of limit.
    • Notice that you can move $|x-a|$ out of the denominator because $x\neq a$ by (2).
  • The conclusion in (6), that we must show $f(x)\lt f(a)$ for $x\neq a$, is now the statement we must prove.

Remark: In the following items, I mention the context of the proof. I am using the word informally here. It is used in some forms of formal logic with a related but more precise meaning. The context consists of the variables you must hold in your head as you read each part of the proof, along with their current constraints. “Current” means the “now” that you are in when considering the step of the proof you are reading right now. I give some references at the end of the post.

  • At the point between (6) and (7), our context consists of $a$, $R$ and $f$, all subject to some constraints. $x$ is not yet in the context of our proof because its previous occurrences in the theorem and in (1) through (6) have been bound, mostly by an unexpressed universal quantifier. Now we are to think of $x$ as a specific number bound by some constraints.
  • The statement in (7) that the case $x\gt a$ has a symmetric proof is a much higher-level claim than the other steps in this proof, even though in fact it is not very high-level compared to statements such as “An application of Serre’s spectral sequence shows$\ldots$”. Most mathematicians with even a little experience will read this statement and accept it, confident that they would know how to swap “$\lt$” and “$\gt$” in the proof in the correct way (which is a bit picky) to produce the dual proof. Some students might write out the dual proof to make sure they understand it (more likely because writing it out was a class assignment). I await the day that an automated proof checker can handle a statement like this.
  • (8) introduces three new math objects $\epsilon$, $\delta$ and $y$ subject to several constraints. The symbols occur earlier but they are all bound. $\epsilon$ will be fixed in our context from now until (10). The others don’t appear later.
  • (9) consists of several steps of algebraic computation. A cognoscente (I am tired of writing “savvy”) reader first looks at the computation as a whole and notices that it deduces that $f(x)\lt f(a)+\epsilon$, which is almost what is to be proved. This helps the reader understand the reason for the calculation. No mention whatever is made in this step of all the things that should go through your mind (or the proof-checker’s “mind”).
  • The computations in (9) are basic algebra, not explained step by step, except that the remark that $f(x)\lt f(y)$ explains how you get $f(x)-f(y)+f(y)-f(a) \lt f(y)-f(a)$.
  • (10) banishes $\epsilon$ from the context by universally quantifying over it. That $f(x)\leq f(a)$ follows by the garbage-dump-in-Star-Wars trick that often baffles first-year analysis students: Since $f(x)\lt f(a)+\epsilon$ for all positive $\epsilon$, then $f(x)\leq f(a)$. (If $f(x)$ were bigger than $f(a)$, taking $\epsilon=f(x)-f(a)$ would give a contradiction.) (See also Terry Tao’s article in Tricks Wiki.)
  • (11) “By the same argument as that leading up to (10)” puts some demands on the reader, who has to discover that you have to go back to (7) and do the following steps with a new context, using a value of $x$ that is halfway closer to $a$ than the “old” $x$ was. This means in particular that the choice of $\frac{x+a}{2}$ is unnecessarily specific: any number strictly between $x$ and $a$ would do. But it works.
  • (12) suppresses the reference to (11).
References

I have written extensively on these topics. Here are some links.

  • Rich-rigorous bifurcation in math thinking
  • The symbolic language
  • Math English and the language of proofs
  • Proofs and context