Problems caused for students by the two languages of math

The two languages of math

Mathematics is communicated using two languages: Mathematical English and the symbolic language of math (more about them in two languages).

This post is a collection of examples of the sorts of trouble that the two languages cause beginning abstract math students. I have gathered many of them here since they are scattered throughout the literature. I would welcome suggestions for other references to problems caused by the languages of math.

In many of the examples, I give links to the literature and leave you to fish out the details there. Almost all of the links are to documents on the internet.

There is an extensive list of references.

Conjectures

Scattered through this post are conjectures. Like most of my writing about difficulties students have with math language, these conjectures are based on personal observation over 37 years of teaching mostly computer engineering and math majors. The only hard research of any sort I have done in math ed consists of the 426 citations of mathematical writing included in the Handbook of Mathematical Discourse.

Disclaimer

This post is an attempt to gather together the ways in which math language causes trouble for students. It is even more preliminary and rough than most of my other posts.

  • The arrangement of the topics is unsatisfactory. Indeed, the topics are so interrelated that it is probably impossible to give a satisfactory linear order to them. That is where writing online helps: lots of forward and backward references.
  • Other people and I have written extensively about some of the topics, and they have lots of links. Other topics are stubs and need to be filled out. I have probably missed important points about and references to many of them.
  • Please note that many of the most important difficulties that students have with understanding mathematical ideas are not caused by the languages of math and are not represented here.

I expect to revise this article periodically as I find more references and examples and understand some of the topics better. Suggestions would be very welcome.

Intricate symbolic expressions

I have occasionally had students tell me that they have great difficulty understanding a complicated symbolic expression. They can’t just look at it and learn something about what it means.

Example

Consider the symbolic expression \[\displaystyle\left(\frac{x^3-10}{3 e^{-x}+1}\right)^6\]

Now, I could read this expression aloud as if it were text, or more precisely describe it so that someone else could write it down. But if I am in math mode and see this expression I don’t “read” it, even to myself.

I am one of those people who much of the time think in pictures or abstractions without words. (See references here.)

In this case I would look at the expression as a structured picture. I could determine a number of things about it, and when I was explaining it I would point at the board, not try to pronounce it or part of it:

  • The denominator is always positive so the expression is defined for all reals.
  • The exponent is even so the value of the expression is always nonnegative. I would say, “This (pointing at the exponent) is an even power so the expression is never negative.”
  • It is zero in exactly one place, namely $x=\sqrt[3]{10}$.
  • Its derivative is also $0$ at $\sqrt[3]{10}$. You can see this without calculating the formula for the derivative (ugh).
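
The last bullet can be checked with a short sketch, where $u$ is my name for the fraction inside the parentheses (it is not notation used elsewhere in this post). By the chain rule,
\[\frac{d}{dx}\,u(x)^6=6\,u(x)^5\,u'(x),\qquad u(x)=\frac{x^3-10}{3e^{-x}+1}.\]
Since $u(\sqrt[3]{10})=0$, the factor $u(x)^5$ vanishes at $x=\sqrt[3]{10}$, so the derivative is $0$ there whatever the value of $u'(\sqrt[3]{10})$ turns out to be.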

There is much more about this example in Zooming and Chunking.

Algebra in high school

Many high school students are stymied by algebra, never do well at it, and hate math as a result. I have known many such people over the years. A revealing remark that I have heard many times is that “algebra is totally meaningless to me”. This is sometimes accompanied by a remark that geometry is “obvious” or something similar. This may be because they think they have to “read” an algebraic expression instead of studying it as they would a graph or a diagram.

Conjecture

Many beginning abstract math students have difficulty understanding a symbolic expression like the one above. Could this be caused by resistance to treating the expression as a structure to be studied?

Context-sensitive pronunciation

A symbolic assertion (“formula” to logicians) can be embedded in a math English sentence in different ways, requiring the symbolic assertion to be pronounced in different ways. The assertion itself is not modified in any way in these different situations.

I used the phrase “symbolic assertion” in abstractmath.org because students are confused by the logicians’ use of “formula”.
In everyday English, “$\text{H}_2\text{O}$” is the “formula” for water, but it is a term, not an assertion.

Example

“For every real number $x\gt0$ there is a real number $y$ such that $x\gt y\gt0$.”

  • In the sentence above, the assertion “$x\gt0$” must be pronounced “$x$ that is greater than $0$” or something similar.
  • The standalone assertion “$x\gt0$” is pronounced “$x$ is greater than $0$.”
  • The sentence “Let $x\gt0$” must be pronounced “Let $x$ be greater than $0$”.

The consequence is that the symbolic assertion, in this case “$x\gt0$”, does not reveal the role it plays in the math English sentence that it is embedded in.

Many of the examples occurring later in the post are also examples of context-sensitive pronunciation.

Conjectures

Many students are subconsciously bothered by the way the same symbolic expression is pronounced differently in different math English sentences.

This probably impedes some students’ progress. Teachers should point this phenomenon out with examples.

Students should be discouraged from pronouncing mathematical expressions.

For one thing, this could get you into trouble. Consider pronouncing “$\sqrt{3+5}+6$”. In any case, when you are reading any text you don’t pronounce the words, you just take in their meaning. Why not take in the meaning of algebraic expressions in the same way?
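
To see the sort of trouble I mean, here is a sketch of three expressions that could all plausibly be read aloud as “the square root of three plus five plus six” (the spoken phrasing is my assumption about how someone might say it):
\[\sqrt{3}+5+6\approx12.73,\qquad\sqrt{3+5}+6\approx8.83,\qquad\sqrt{3+5+6}\approx3.74.\]
The written expression settles the grouping at a glance. The spoken version does not, unless you add pauses or extra words.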

Parenthetic assertions

A parenthetic assertion is a symbolic assertion embedded in a sentence in math English in such a way that it is a subordinate clause.

Example

In the math English sentence

“For every real number $x\gt0$ there is a real number $y$ such that $x\gt y\gt0$”

mentioned above, the symbolic assertion “$x\gt0$” plays the role of a subordinate clause.

It is not merely that the pronunciation is different compared to that of the independent statement “$x\gt0$”. The math English sentence is hard to parse. The obvious (to an experienced mathematician) meaning is that the beginning of the sentence can be read this way: “For every real number $x$, which is bigger than $0$…”.

But a new student might try to read it as “For every real number $x$ is greater than $0$ …” by literally substituting the standalone meaning of “$x\gt0$” where it occurs in the sentence. This makes the text what linguists call a garden path sentence. The student has to stop and start over to try to make sense of it, and the symbolic expression lacks the natural language hints that help the reader understand how it should be read.

Note that the other two symbolic expressions in the sentence are not parenthetic assertions. The phrase “real number” needs to be followed by a term, and it is, and the phrase “such that” must be followed by a clause, and it is.

More examples

  • “Consider the circle $S^1\subseteq\mathbb{C}=\mathbb{R}^2$.” This has subordinate clauses to depth 2.
  • “The infinite series $\displaystyle\sum_{k=1}^\infty\frac{1}{k^2}$ converges to $\displaystyle\zeta(2)=\frac{\pi^2}{6}\approx1.64$”
  • “We define a null set in $I:=[a,b]$ to be a set that can be covered by a countable collection of intervals with arbitrarily small total length.” This shows a parenthetical definition.
  • “Let $F:A\to B$ be a function.”
    A type declaration is a function? In any case, it would be better to write this sentence simply as “Let $F:A\to B$”.

David Butler’s post Contrapositive grammar has other good examples.

Math texts are in general badly written. Students need to be taught how to read badly written math as well as how to write math clearly. Those that succeed (in my observation) in being able to read math texts often solve the problem by glancing at what is written and then reconstructing what the author is supposedly saying.

Conjectures

Some students are baffled, or at least bothered consciously or unconsciously, by parenthetic assertions, because the clues that would exist in a purely English statement are missing.

Nevertheless, many if not most math students read parenthetic assertions correctly the first time and never even notice how peculiar they are.

What makes the difference between them and the students who are stymied by parenthetic assertions?

There is another conjecture concerning parenthetic assertions below.

Context-sensitive meaning

“If” in definitions

Example

The word “if” in definitions does not mean the same thing that it means in other math statements.

  • In the definition “An integer is even if it is divisible by $2$,” “if” means “if and only if”. In particular, the definition implies that an integer is not even if it is not divisible by $2$.
  • In a theorem, for example “If a function is differentiable, then it is continuous”, the word “if” has the usual one-way meaning. In particular, in this case, a continuous function might not be differentiable.

Context-sensitive meaning occurs in ordinary English as well. Think of a strike in baseball.

Conjectures

The nearly universal custom of using “if” to mean “if and only if” in definitions makes it harder for students to understand implication.

This custom is not the major problem in understanding the role of definitions. See my article Definitions.

Underlying sets

Example

In a course in group theory, a lecturer may say at one point, “Let $F:G\to H$ be a homomorphism”, and at another point, “Let $g\in G$”.

In the first sentence, $G$ refers to the group, and in the second sentence it refers to the underlying set of the group.

This usage is almost universal. I think the difficulty it causes is subtle. When you refer to $\mathbb{R}$, for example, you (usually) are referring to the set of real numbers together with all its canonical structure. The way students think of it, a real number comes with its many relations and connections with the other real numbers, ordering, field properties, topology, and so on.

But in a group theory class, you may define the Klein $4$-group to be $\mathbb{Z}_2\times\mathbb{Z}_2$. Later you may say “the symmetry group of a rectangle that is not a square is the Klein $4$-group.” Almost invariably some student will balk at this.

Referring to a group by naming its underlying set is also an example of synecdoche.

Conjecture

Students expect every important set in math to have a canonical structure. When they get into a course that is a bit more abstract, suddenly the same set can have different structures, and math objects with different underlying sets can have the same structure. This catastrophic shift in a way of thinking should be described explicitly with examples.

Way back when, it got mighty upsetting when the earth started going around the sun instead of vice versa. Remind your students that these upheavals happen in the math world too.

Overloaded notation

Identity elements

A particular text may refer to the identity element of any group as $e$.

This is as far as I know not a problem for students. I think I know why: There is a generic identity element. The identity element in any group is an instantiation of that generic identity element. The generic identity element exists in the sketch for groups; every group is a functor defined on that sketch. (Or if you insist, the generic identity element exists in the first order theory for groups.) I suspect mathematicians subconsciously think of identity elements in this way.

Matrix multiplication

Matrix multiplication is not commutative. A student may forget this and write $A^2B^2=(AB)^2$. This also happens in group theory courses.

This problem occurs because the symbolic language uses the same symbol for many different operations, in this case the juxtaposition notation for multiplication. This phenomenon is called overloaded notation and is discussed in abstractmath.org here.
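
Here is a small example of the failure, with two $2\times2$ matrices that I picked only for illustration:
\[A=\begin{pmatrix}1&1\\0&1\end{pmatrix},\qquad B=\begin{pmatrix}1&0\\1&1\end{pmatrix}.\]
Then
\[A^2B^2=\begin{pmatrix}1&2\\0&1\end{pmatrix}\begin{pmatrix}1&0\\2&1\end{pmatrix}=\begin{pmatrix}5&2\\2&1\end{pmatrix},\qquad(AB)^2=\begin{pmatrix}2&1\\1&1\end{pmatrix}^2=\begin{pmatrix}5&3\\3&2\end{pmatrix}.\]
The point is that $(AB)^2=ABAB$, and turning $ABAB$ into $AABB$ requires swapping the middle $B$ and $A$, which is exactly what commutativity would allow.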

Conjecture

Noncommutative binary operations written using juxtaposition cause students trouble because going to noncommutative operations requires abandoning some overlearned reflexes in doing algebra.

Identity elements seem to behave the same in any binary operation, so there are no reflexes to unlearn. There are generic binary operations of various types as well. That’s why mathematicians are comfortable overloading juxtaposition. But to get to be a mathematician you have to unlearn some reflexes.

Negation

Sometimes you need to reword a math statement that contains symbolic expressions. This particularly causes trouble in connection with negation.

Ordinary English

The English language is notorious among language learners for making it complicated to negate a sentence. The negation of “I saw that movie” is “I did not see that movie”. (You have to put “do not” (using the appropriate form of “do”) before the verb and then modify the verb appropriately.) You can’t just say “I not saw that movie” (as in Spanish) or “I saw not that movie” (as in German).

Conjecture

The method in English used to negate a sentence may cause problems with math students whose native language is not English. (But does it cause math problems with those students?)

Negating symbolic expressions

Examples

  • The negation of “$n$ is even and a prime” is “$n$ is either odd or it is not a prime”. The negation should not be written “$n$ is not even and a prime” because that sentence is ambiguous. In the heat of doing a proof students may sometimes think the negation is “$n$ is odd and $n$ is not a prime,” essentially forgetting about De Morgan. (He must roll over in his grave a lot.)
  • The negation of “$x\gt0$” is “$x\leq0$”. It is not “$x\lt0$”. This is a very common mistake.
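
In symbols, the two examples can be sketched like this, where $P$ and $Q$ are my stand-ins for “$n$ is even” and “$n$ is a prime”:
\[\neg(P\land Q)\iff(\neg P)\lor(\neg Q),\qquad\neg(x\gt0)\iff x\leq0.\]
The first equivalence is De Morgan’s law. The second holds because exactly one of $x\lt0$, $x=0$, $x\gt0$ is true for a real number $x$, so denying $x\gt0$ leaves two possibilities, not one.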

These examples are difficulties caused by not understanding the math. They are not directly caused by difficulties with the languages of math.

Negating expressions containing parenthetic assertions

Suppose you want to prove:

“If $f:\mathbb{R}\to\mathbb{R}$ is differentiable, then $f$ is continuous”.

A good way to do this is by using the contrapositive. A mechanical way of writing the contrapositive is:

“If $f$ is not continuous, then $f:\mathbb{R}\to\mathbb{R}$ is not differentiable.”

That is not good. The sentence needs to be massaged:

“If $f:\mathbb{R}\to\mathbb{R}$ is not continuous, then $f$ is not differentiable.”

Even better would be to write the original sentence as:

“Suppose $f:\mathbb{R}\to\mathbb{R}$. Then if $f$ is differentiable, then $f$ is continuous.”

This is discussed in detail in David Butler’s post Contrapositive grammar.

Conjecture

Students need to be taught to understand parenthetic assertions that occur in the symbolic language and to learn to extract a parenthetic assertion and write it as a standalone assertion ahead of the statement it occurs in.

Scope

The scope of a word or variable consists of the part of the text for which its current definition is in effect.

Examples

  • “Suppose $n$ is divisible by $4$.” The scope is probably the current paragraph or perhaps the current proof. This means that the properties of $n$ are constrained in that section of the text.
  • “In this book, all rings are unitary.” This will hold for the whole book.

There are many more examples in the abstractmath.org article Scope.

If you are a grasshopper (you like to dive into the middle of a book or paper to find out what it says), the scope of a variable can be hard to determine. It is particularly difficult for commonly used words or symbols that have been defined differently from the usual usage. You may not suspect that this has happened, since the word or symbol might be defined only once, early in the text. Some books on writing mathematics have urged writers to keep global definitions to a minimum. This is good advice.

Finding the scope is considerably easier when the text is online and you can search for the definition.

Conjecture

Knowing the scope of a word or variable can be difficult. It is particularly hard when the word or variable has a large scope (chapter or whole book).

Variables

Variables are often introduced in math writing and then used in the subsequent discussion. In a complicated discussion, several variables may be referred to that have different statuses, some of them introduced several pages before. Variables can cause trouble for students in many particular ways, which are discussed below. This post is restricted to trouble in connection with the languages of math. The concept of variable is difficult in itself, not just because of the way the math languages represent variables, but that is not covered here.

Much of this part of the post is based on work of Susanna Epp, including three papers listed in the references. Her papers also include many references to other work in the math ed literature that has to do with understanding variables.

See also Variables in abstractmath.org and Variables in Wikipedia.

Types

Students blunder by forgetting the type of the variable they are dealing with. The example given previously of problems with matrix multiplication is occasioned by forgetting the type of a variable.

Conjecture

Students sometimes have problems because they forget the data type of the variables they are dealing with. This is primarily caused by overloaded notation.

Dependent and independent

If you define $y=x^2+1$, then $x$ is an independent variable and $y$ is a dependent variable. But dependence and independence of variables are more general than that example suggests.
In an epsilon-delta proof of the limit of a function (example below), $\varepsilon$ is independent and $\delta$ is dependent on $\varepsilon$, although not functionally dependent.

Conjecture

Distinguishing dependent and independent variables causes problems, particularly when the dependence is not clearly functional.

I recently ran across a discussion of this on the internet but failed to record where I saw it. Help!

Bound and free

This causes trouble with integration, among other things. It is discussed in abstractmath.org in Variables and Substitution. I expect to add some references to the math ed literature soon.

Instantiation

Some of these variables may be given by existential instantiation, in which case they are dependent on variables that define them. Others may be given by universal instantiation, in which case the variable is generic; it is independent of other variables, and you can’t impose arbitrary restrictions on it.

Existential instantiation

A theorem that an object exists under certain conditions allows you to name it and use it by that name in further arguments.

Example

Suppose $m$ and $n$ are integers. Then by definition, $m$ divides $n$ if there is an integer $q$ such that $n=qm$. Then you can use “$q$” in further discussion, but $q$ depends on $m$ and $n$. You must not use it with any other meaning unless you start a new paragraph and redefine it.

So the following (start of a) “proof” blunders by ignoring this restriction:

Theorem: If an integer $m$ divides both integers $n$ and $p$, then $m$ divides $n+p$.

“Proof”: “Let $n = qm$ and $p = qm$…”
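
A repaired opening, as a sketch (the letter $r$ is my choice, and any unused letter would do), gives the second quotient its own name:
\[n=qm,\qquad p=rm,\qquad n+p=qm+rm=(q+r)m.\]
Since $q+r$ is an integer, $m$ divides $n+p$. Using $q$ for both quotients, as in the blundering version, quietly adds the assumption $n=p$, which is not in the theorem.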

Universal instantiation

It is a theorem that for any integer $n$, there is no integer strictly between $n$ and $n+1$. So if you are given an arbitrary integer $k$, there is no integer strictly between $k$ and $k+1$. There is no integer between $42$ and $43$.

By itself, universal instantiation does not seem to cause problems, provided you pay attention to the types of your variables. (“There is no integer between $\pi$ and $\pi+1$” is false.)

However, when you introduce variables using both universal and existential quantification, students can get confused.

Example

Consider the definition of limit:

Definition: $\lim_{x\to a} f(x)=L$ if and only if for every $\epsilon\gt0$ there is a $\delta\gt0$ for which if $0\lt|x-a|\lt\delta$ then $|f(x)-L|\lt\epsilon$.

A proof for a particular instance of this definition is given in detail in Rabbits out of a Hat. In this proof, you may not put constraints on $\epsilon$ except the given one that it is positive. On the other hand, you have to come up with a definition of $\delta$ and prove that it works. The $\delta$ depends on what $f$, $a$ and $L$ are, but there are always infinitely many values of $\delta$ which fit the constraints, and you have to come up with only one. So in general, two people doing this proof will not get the same answer.
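
One way to see why there are infinitely many choices, sketched with a name $\delta'$ that I am introducing only for this remark: suppose $\delta$ works for a given $\epsilon$, and let $\delta'$ be any number with $0\lt\delta'\leq\delta$. Every $x$ that is within $\delta'$ of $a$ is also within $\delta$ of $a$:
\[|x-a|\lt\delta'\leq\delta\ \Longrightarrow\ |x-a|\lt\delta.\]
So whatever the definition guarantees for the $x$ within $\delta$ of $a$, namely $|f(x)-L|\lt\epsilon$, it also guarantees for the $x$ within $\delta'$ of $a$. In particular $\delta/2$, $\delta/3$, and so on are all acceptable answers.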

Reference

Susanna Epp’s paper Proof issues with existential quantification discusses the problems that students have with both existential and universal quantification with excellent examples. In particular, that paper gives examples of problems students have that are not hinted at here.

References

A nearly final version of The Handbook of Mathematical Discourse is available on the web with links, including all the citations. This version contains some broken links. I am unable to recompile it because TeX has evolved enough since 2003 that the source no longer compiles. The paperback version (without the citations) can be bought as a book here. (There are usually cheaper used versions on Amazon.)

Abstractmath.org is a website for beginning students in abstract mathematics. It includes most of the material in the Handbook, but not the citations. The Introduction gives you a clue as to what it is about.

Two languages

My take on the two languages of math is discussed in these articles:

The Language of Mathematics, by Mohan Ganesalingam, covers these two languages in more detail than any other book I know of. He says right away on page 18 that mathematical language consists of “textual sentences with symbolic material embedded like ‘islands’ in the text.” So for him, math language is one language.

I have envisioned two separate languages for math in abstractmath.org and in the Handbook, because in fact you can in principle translate any mathematical text into either English or logical notation (first order logic or type theory), although the result in either case would be impossible to understand for any sizeable text.

Topics in abstractmath.org

Context-sensitive interpretation.

“If” in definitions.

Mathematical English.

Parenthetic assertion.

Scope.

Semantic contamination.

Substitution.

The symbolic language of math.

Variables.

Zooming and Chunking.

Topics in the Handbook of Mathematical Discourse

These topics have a strong overlap with the topics with the same name in abstractmath.org. They are included here because the Handbook contains links to citations of the usage.

Context-sensitive.

“If” in definitions.

Parenthetic assertion.

Substitution.

Posts in Gyre&Gimble

Names.

Naming mathematical objects.

Rabbits out of a Hat.

Semantics of algebra I.

Syntactic and semantic thinkers.

Technical meanings clash with everyday meanings.

Thinking without words.

Three kinds of mathematical thinkers.

Variations in meaning in math.

Other references

Contrapositive grammar, blog post by David Butler.

Proof issues with existential quantification, by Susanna Epp.

The role of logic in teaching proof, by Susanna Epp (2003).

The language of quantification in mathematics instruction, by Susanna Epp (1999).

The Language of Mathematics: A Linguistic and Philosophical Investigation, by Mohan Ganesalingam, 2013. (Not available from the internet.)

On the communication of mathematical reasoning, by Atish Bagchi and Charles Wells (1998a), PRIMUS, volume 8, pages 15–27.

Variables in Wikipedia.

Creative Commons License

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

Rabbits out of a hat

This is a revision and expansion of the entry on rabbits in the abstractmath article Dysfunctional attitudes and behaviors.

Rabbits

Sometimes when you are reading or listening to a proof you will find yourself following each step but with no idea why these steps are going to give a proof. This can happen with the whole structure of the proof or with the sudden appearance of a step that seems like the prover pulled a rabbit out of a hat. You feel as if you are walking blindfolded.

Example (mysterious proof structure)

The lecturer says he will prove that for an integer $n$, if $n^2$ is even then $n$ is even. He begins the proof: “Let $n$ be odd” and then continues to the conclusion, “Therefore $n^2$ is odd.”

Why did he begin a proof about being even with the assumption that $n$ is odd?

The answer is that in this case he is doing a proof by contrapositive. If you don’t recognize the pattern of the proof you may be totally lost. This can happen if you don’t recognize other forms, for example contradiction and induction.
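
For the record, the algebra in the middle of such a proof might run as follows (a sketch of my own, with $k$ an integer I am introducing, not necessarily the lecturer’s argument). If $n$ is odd, write $n=2k+1$. Then
\[n^2=(2k+1)^2=4k^2+4k+1=2(2k^2+2k)+1,\]
which is odd. So “$n$ odd implies $n^2$ odd” holds, and that is the contrapositive of “$n^2$ even implies $n$ even”.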

Example (rabbit)

You are reading a proof that $\lim_{x\to2}x^2=4$. It is an $\varepsilon \text{-}\delta$ proof, so what must be proved is:

  • For any positive real number $\varepsilon $,
  • there is a positive real number $\delta $ for which:
  • if $\left| x-2 \right|\lt\delta$ then
  • $\left| x^2-4 \right|\lt\varepsilon$.

Proof

Here is the proof, with what I imagine might be your agitated reaction to certain steps. Below is a proof with detailed explanations.

1) Suppose $\varepsilon \gt0$ is given.

2) Let $\delta =\text{min}\,(1,\,\frac{\varepsilon }{5})$ (the minimum of the two numbers $1$ and $\frac{\varepsilon}{5}$).

Where the *!#@! did that come from? They pulled it out of thin air! I can’t see where we are going with this proof!

3) Suppose that $\left| x-2 \right|\lt\delta$.

4) Then $\left| x-2 \right|\lt1$ by (2) and (3).

5) By (4) and algebra, $\left|x+2 \right|\lt5$.

Well, so what? We know that $\left| x+39\right|\lt42$ and lots of other things, too. Why did they do this?

6) Also $\left| x-2 \right|\lt\frac{\varepsilon }{5}$ by (2) and (3).

7) Then $\left| {{x}^{2}}-4\right|=\left| (x-2)(x+2) \right|\lt\frac{\varepsilon }{5}\cdot 5=\varepsilon$ by (5) and (6). End of Proof.

Remarks

This proof is typical of proofs in texts.

  • Steps 2) and 5) look like they were rabbits pulled out of a hat.
  • The author gives no explanation of where they came from.
  • Even so, each step of the proof follows from previous steps, so the proof is correct.
  • Whether you are surprised or not has nothing to do with whether it is correct.
  • In order to understand a proof, you do not have to know where the rabbits came from.
  • In general, the author did not think up the proof steps in the order they occur in the proof. (See this remark in the section on Forms of Proofs.) See also look ahead.

Proof with detailed explanations

  1. Suppose $\varepsilon >0$ is given. (We are starting a proof by universal generalization.)
  2. Let $\delta$ be the minimum of the two numbers $1$ and $\frac{\varepsilon}{5}$. (Rabbit out of the hat. You can “let” any symbol mean anything you want, so this is a legitimate thing to do even if you don’t see where this is all going.)
  3. Suppose $\left|x-2\right|\lt\delta$. (We are about to prove the conditional statement “If $\left| x-2 \right|\lt\delta$ then $\left| {{x}^{2}}-4 \right|\lt\varepsilon$” and we are proceeding by the direct method.)
  4. Then $\left| x-2 \right|\lt 1$ by (2) and (3). (The fact that $\delta =\text{min}\,(1,\,\frac{\varepsilon }{5})$ means that $\delta \le 1$ and that $\delta \le \frac{\varepsilon }{5}$. Since $\left| x-2 \right|\lt \delta $, the statement $\left| x-2 \right|\lt 1$ follows by transitivity of “$\lt $”. This is another rabbit. WHY do we want $\left| x-2 \right|\lt 1$? Be patient.)
  5. By (4) and algebra, $\left| x+2 \right|\lt 5$. ($\left| x-2 \right|\lt 1$ means that $-1\lt x-2\lt 1$. Add $4$ to each term in this inequality to get $3\lt x+2\lt 5$. This is another rabbit, but it is a correct statement!)
  6. Also $\left| x-2 \right|\lt \frac{\varepsilon }{5}$ by (2) and (3). ((2) says that $\delta\le\frac{\varepsilon }{5}$ and (3) says that $\left| x-2 \right|\lt\delta$, so $\left| x-2 \right|\lt \frac{\varepsilon }{5}$ follows by transitivity.)
  7. Then $\left| {{x}^{2}}-4\right|=\left| (x-2)(x+2) \right|\lt\frac{\varepsilon }{5}\cdot 5=\varepsilon$ by (5) and (6). End of Proof. (This last statement actually shows the algebra.)
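
Written as a single chain, and assuming nothing beyond steps (5) and (6), the final estimate is
\[\left|x^2-4\right|=\left|x-2\right|\cdot\left|x+2\right|\leq\left|x-2\right|\cdot5\lt\frac{\varepsilon}{5}\cdot5=\varepsilon.\]
The first inequality uses (5) and the second uses (6). This layout is my own sketch; the proof above compresses the two inequalities into one.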

Coming up with that proof

The author did not think up the proof steps in the order they occur in the proof. She looked ahead at the goal of proving that \[\left| {{x}^{2}}-4\right|\lt\varepsilon\] and thought of factoring the left side. Now she must prove that \[\left| (x-2)(x+2) \right|\lt\varepsilon\]

But if $\left|x-2\right|$ is small then $x$ has to be close to $2$, so that $x + 2$ can’t be too big. Since the only restriction on $\delta$ is that it has to be positive, let’s restrict it to being smaller than $1$. (The choice of $1$ is purely arbitrary. Any positive real number would do.)

In that case step (5) shows that $\left|x+2\right|\lt5$. So how small do you have to make $\left|x-2\right|$ to make $\left|(x-2)(x+2)\right|\lt\varepsilon$? In other words, how small do you have to make $\delta $ to make $\left| 5(x-2) \right|\lt\varepsilon$ (remembering that $\left| x-2 \right|\lt\delta $)? Well, clearly $\frac{\varepsilon }{5}$ will do!

That explains her choice of $\delta$ as the minimum of the two numbers $1$ and $\frac{\varepsilon}{5}$. Notice that that choice is made very early in the proof but it was made only after experimenting with the sizes of $\left|x-2\right|$ and $\left|x+2\right|$.

You can check that if she had chosen to restrict $\delta $ to being less than $42$, then $\left|x+2\right|\lt46$ and she could take $\delta =\text{min}\,(42,\,\frac{\varepsilon }{46})$.
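
Here is a sketch of that check, using the triangle inequality. If $\left|x-2\right|\lt42$, then
\[\left|x+2\right|=\left|(x-2)+4\right|\leq\left|x-2\right|+4\lt46,\]
so that
\[\left|x^2-4\right|=\left|x-2\right|\cdot\left|x+2\right|\leq46\left|x-2\right|.\]
Any $\delta$ that is at most $42$ and at most $\frac{\varepsilon}{46}$ therefore works. A smaller second bound, such as $\frac{\varepsilon}{47}$, also works, since shrinking $\delta$ never hurts.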

Acknowledgments

Thanks to Robert Burns for corrections and suggestions.

Rigorous proofs

Rich and rigorous

When we try to understand a math statement, we visualize what the statement says using metaphors, images and kinetic feelings to feel how it is true, or to suggest that the statement is not true.

If we are convinced that it is true, we may then want to prove it. Doing that involves pitching out all the lovely pictures and metaphors and gestures and treating the mathematical objects involved in the proof as static and inert. “Static” means the object does not change. “Inert” means that it does not affect anything else. I am saying how we think about math objects for the purpose of rigorous proof. I am not saying anything about “what math objects are”.

In this post I give a detailed example of a proof of the rigorous sort.

Example

Informal statement

First, I’ll describe this example in typical spoken mathematical English. Suppose you suspect that the following statement is true:

Claim: Let $f(x)$ be a differentiable function with $f'(a)=0$.
Going from left to right, suppose the graph of $f(x)$ goes UP before $x$ reaches $a$ and then DOWN for $x$ to the right of $a$.
Then $a$ has to be a local maximum of the function.

This claim is written in informal math English. Mathematicians talk like that a lot. In this example they will probably wave their hands around in swoops.

The language used is an attempt to get a feeling for the graph going up to $(a,f(a))$ and then falling away from it. It uses two different metaphors for $x\lt a$ and $x\gt a$. I suspect that most of us would want to clean that up a bit even in informal writing.

A more formal statement

Theorem: Let $f$ be a real valued differentiable function defined on an open interval $R$. Let $a$ be a number in $R$ for which $f'(a)=0$. Suppose that for all $x\in R$, $f$ increases for $x\lt a$ and decreases for $x\gt a$. Then $f(a)$ is a maximum of $f$ in $R$.

Proof

  1. By definition of derivative, \[\lim_{x\to a}\frac{f(x)-f(a)}{x-a}=0.\]
  2. By definition of limit, for any positive $\epsilon$ there is a positive $\delta$ for which if $0\lt|x-a|\lt\delta$ then \[\left|\frac{f(x)-f(a)}{x-a}\right|\lt\epsilon.\]
  3. By requiring that $\delta\lt 1$, it follows from (2) that for any positive $\epsilon$, there is a positive $\delta$ for which if $0\lt|x-a|\lt\delta$, then $|f(x)-f(a)|\lt\epsilon$.
  4. “$f$ increases for $x\lt a$” means that if $x$ and $y$ are numbers in $R$ and $x\lt y\lt a$, then $f(x)\lt f(y)$.
  5. “$f$ decreases for $x\gt a$” means that if $x$ and $y$ are numbers in $R$ and $a\lt x\lt y$, then $f(x)\gt f(y)$.
  6. “$f(a)$ is a maximum of $f$ in $R$” means that for $x\in R$, if $x\neq a$, then $f(x)\lt f(a)$.
  7. Suppose that $x\in R$ and $x\lt a$. (The case that $x\gt a$ has a symmetric proof.)
  8. Given $\epsilon\gt0$ with $\delta$ as given by (3), choose $y\in R$ such that $x\lt y\lt a$ and $|f(y)-f(a)|\lt\epsilon$.
  9. By (4), $f(x)\lt f(y)$. So by (8), \[\begin{align*}
    f(x)-f(a)&=
    f(x)-f(y)+f(y)-f(a)\\ &\lt f(y)-f(a)\\ &\leq|f(y)-f(a)|\lt\epsilon\end{align*}\]
    so that $f(x)\lt f(a)+\epsilon$. By inserting “$-f(y)+f(y)$” into the second formula, I am “adding zero cleverly”, an example of pulling a rabbit out of a hat. Students hate that. But you have to live with it; as long as the statements following are correct, it makes a valid proof. Rabbit-out-of-a-hat doesn’t make a proof wrong, but it does make you wonder how the author thought of it. Live with it.
  10. Since (9) is true for all positive $\epsilon$, it follows that $f(x)\leq f(a)$.
  11. By the same argument as that leading up to (10), $f(\frac{x+a}{2})\leq f(a)$.
  12. Since $f(x)\lt f(\frac{x+a}{2})$, it follows that $f(x)\lt f(a)$ as required.

About the proof

This proof is intended to be a typical “rigorous” proof. I suspect it tends to be more rigorous than most mathematicians would find necessary.

Extensionality

The point about “rigor”, about insisting that the objects be static and inert, is that this causes symbols and expressions to retain the same meaning throughout the text. This is one aspect of extensionality.

Of course, some of the symbols denote variables, or variable objects. This does not mean they are “varying”. I am taking this point of view: A variable refers to a math object but you don’t know what it is. Constraints such as $x\lt a$ rule out some possible values but don’t generally tell you exactly what $x$ is. There is more about this in Variable Objects.

The idea in (8), for example, is that $y$ denotes a real number. You don’t know which number it is, but you do know some facts about it: $x\lt y\lt a$, $|f(y)-f(a)|\lt\epsilon$ and so on. Similarly you don’t know what function $f$ is, but you do know some facts about it: It is differentiable, for example, and $f'(a)=0$.

My statement that the variables aren’t “varying” means specifically that each unbound occurrence of the variable refers to the same value as any other occurrence, unless some intervening remark changes its meaning. For example, the references to $x$ in (8) through (12) refer to the same value it has in (7), and (12), in particular, constitutes a statement that the claim about $x$ is correct.

Checkability

The elimination of metaphors that lets the proof achieve rigor is part of a plan in the back of the mind of at least some mathematicians who write proofs. The idea is that the proof be totally checkable:

  • Every statement in the proof has a semantics, a meaning, that is invariant (given the remark about variables above).
  • Each statement is justified by some of the previous statements. This justification is given by two systems that the reader is supposed to understand.
  • One system is the rules of symbol manipulation that are applied to the symbolic expressions, ordinary algebra, and higher-level manipulations used in particular branches of math.
  • The other system consists of the rules of logical reasoning that justify the claims that each statement follows logically from preceding ones.
  • These two systems are really branches of one system, the entire system of math computation and reasoning. It can be obscure which system is being used in a particular step.

Suppression of reasons

The logical and symbolic-manipulation reasons justifying the deductions may not be made completely explicit. In fact, for many steps they may not be mentioned at all, and for others, one or two phrases may be used to give a hint. This is standard practice in writing “rigorous” proofs. That is a descriptive statement, made without criticism. Giving all the reasons is essentially impossible without a computer.

I am aware that some work has been done to write proof checkers that can read a theorem like the one we are considering, stated in natural language, and correctly implement the semantics I have described in this list. I don’t know of any references to such work and would appreciate information about it.

Suppression of reasons makes it difficult to mechanically check a proof written in this standard “rigorous” writing style. Basically, you must be at least at the graduate student level to be able to make sense of what is said, and even experienced math research people find it difficult to read a paper in a very different field. Writing the proof so that it can be checked by a proof checker requires understanding of the same sort, and it typically makes the proof much longer.

One hopeful new approach is to write the proofs using homotopy type theory. The pioneers in that field report that the proofs don’t expand nearly as much as is required by first order logic.

Examples of suppression

Here are many examples of suppression in the $\epsilon$-$\delta$ proof above. This is intended to raise your consciousness concerning how nearly opaque writing in math research is to anyone but the cognoscenti.

  • The first sentence of the theorem names $R$ and $f$ and puts constraints on them that can be used to justify statements in the proof. The naming of $R$ and $f$ requires that every occurrence of $R$ in the proof refers to the same mathematical object, and similarly for $f$.

Remark: The savvy reader “knows” the facts stated in the first item, possibly entirely subconsciously. For many of us there is no conscious thought of constraints and permanence of naming. My goal is to convince those who teach beginning abstract math courses to become conscious of these phenomena. This remark applies to all the following items as well.

  • The second sentence gives $a$ a specific meaning that will be maintained throughout the proof. It also puts constraints on $a$ and an additional constraint on $f$.
  • The third sentence gives a constraint on $R$, $f$ and $a$. It does not give a constraint on $x$, which is a bound variable. Nor does it name $x$ as a specific number with the same meaning in the rest of the proof. (That happens later.)
  • The fact that the first three sentences impose constraints on various objects is signaled by the fact that the sentences are introduced by “let” and “suppose”. The savvy reader knows this.
  • The fourth sentence announces that “$f(a)$ is a maximum of $f$ in $R$” is a consequence of the constraints imposed by the preceding three sentences. (In other words, it follows from the context.) This is signaled by the word “then”.
  • The fact that the paragraph is labeled “Theorem” informs us that the fourth sentence is therefore a statement of what is to be proved, and that every constraint imposed by the first three sentences of the Theorem may be used in the proof.
  • In the proof, statements (1), (4), (5) and (6) rewrite the statements in the theorem according to the definitions of the words involved, namely “derivative”, “increases”, “decreases” and “maximum”. Rewriting statements according to the definitions of the words involved is a fundamental method for starting a proof.
  • (2) follows from (1) by rewriting using the definition of “limit”. Note that pattern-matching against the definition of limit requires understanding that there is a zero inside the absolute value signs that is not written down. Could a computer proof-checker handle that?
  • (3) follows from (2). The reader or proof-checker must:
    • Know that it is acceptable to put an upper bound on $\delta$ in the definition of limit.
    • Notice that you can move $|x-a|$ out of the denominator because $x\neq a$ by (2).
  • The conclusion in (6), that $f(x)\lt f(a)$, is now the statement we must prove.
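
The step from (2) to (3) described above suppresses a short computation. What follows is only a sketch of it, and it assumes one particular reading of “requiring that $\delta\lt 1$”: replace the $\delta$ supplied by (2) with $\delta'=\min(\delta,\frac{1}{2})$, a name introduced here for illustration. Any smaller positive $\delta$ still satisfies (2). Then for $0\lt|x-a|\lt\delta'$, \[\begin{align*}
    |f(x)-f(a)|&=\left|\frac{f(x)-f(a)}{x-a}\right|\,|x-a|\\
    &\lt\epsilon\,|x-a|\\
    &\lt\epsilon\,\delta'\\
    &\lt\epsilon.\end{align*}\]
The first line needs $x\neq a$, the second uses (2) together with $|x-a|\gt0$, and the last uses $\delta'\lt1$.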

Remark: In the following items, I mention the context of the proof. I am using the word informally here. It is used in some forms of formal logic with a related but more precise meaning. The context consists of the variables you must hold in your head as you read each part of the proof, along with their current constraints. “Current” means the “now” that you are in when considering the step of the proof you are reading right now. I give some references at the end of the post.

  • At the point between (6) and (7), our context consists of $a$, $R$ and $f$ all subject to some constraints. $x$ is not yet in the context of our proof because its previous occurrences in the theorem and in (1) through (6) have been bound, mostly by an unexpressed universal quantifier. Now we are to think of $x$ as a specific number subject to some constraints.
  • The statement in (7) that the case $x\gt a$ has a symmetric proof is a much higher-level claim than the other steps in this proof, even though in fact it is not very high level compared to statements such as “An application of Serre’s spectral sequence shows$\ldots$”. Most mathematicians with even a little experience will read this statement and accept it in the confidence that they will know how to swap “$\lt$” and “$\gt$” in the proof in the correct way (which is a bit picky) to provide a dual proof. Some students might write out the dual proof to make sure they understand it (more likely because writing it out was a class assignment). I await the day that an automated proof checker can handle a statement like this.
  • (8) introduces three new math objects $\epsilon$, $\delta$ and $y$ subject to several constraints. The symbols occur earlier but they are all bound. $\epsilon$ will be fixed in our context from now until (10). The others don’t appear later.
  • (9) consists of several steps of algebraic computation. A cognoscent (I am tired of writing “savvy”) reader first looks at the computation as a whole and notices that it deduces that $f(x)-f(a)\lt\epsilon$, which is almost what is to be proved. This helps the reader understand the reason for the calculation. No mention whatever is made in this step of all this stuff that should go through your mind (or the proof-checker’s “mind”).
  • The computations in (9) are basic algebra not explained step by step, except that the remark that $f(x)\lt f(y)$ explains how you get $f(x)-f(y)+f(y)-f(a) \lt f(y)-f(a)$.
  • (10) banishes $\epsilon$ from the context by universally quantifying over it. That $f(x)\leq f(a)$ follows by the garbage-dump-in-Star-Wars trick that often baffles first year analysis students: Since for all positive $\epsilon$, $f(x)\lt f(a)+\epsilon$, then $f(x)\leq f(a)$. (See also Terry Tao’s article in Tricks Wiki.)
  • (11) “By the same argument as leading up to (10)” puts some demands on the reader, who has to discover that you have to go back to (7) and do the following steps with a new context using a value of $x$ that is halfway closer to $a$ than the “old” $x$ was. This means in particular that the choice of $\frac{x+a}{2}$ is unnecessarily specific. But it works.
  • (12) suppresses the reference to (11).
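
Two of the arguments left to the reader in the items above can be written out. Both are sketches that use only statements already in the proof. For (10), one standard route is contradiction: if $f(x)\gt f(a)$, put $\epsilon_0=f(x)-f(a)$, which is positive (the name $\epsilon_0$ is introduced here for illustration). Then (9) with $\epsilon=\epsilon_0$ gives \[f(x)\lt f(a)+\epsilon_0=f(x),\] which is impossible, so $f(x)\leq f(a)$. For the symmetric case mentioned in (7), suppose $x\gt a$ and choose $y\in R$ with $a\lt y\lt x$ and $|f(y)-f(a)|\lt\epsilon$. By (5), $f(x)\lt f(y)$, so \[\begin{align*}
    f(x)-f(a)&=f(x)-f(y)+f(y)-f(a)\\
    &\lt f(y)-f(a)\\
    &\leq|f(y)-f(a)|\lt\epsilon.\end{align*}\]
Only the order relations among $x$, $y$ and $a$ are reversed. The inequalities between values of $f$ keep their direction, which is the picky part of the swap.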
References

I have written extensively on these topics. Here are some links.

  • Rich-rigorous bifurcation in math thinking
  • The symbolic language
  • Math English and the language of proofs
  • Proofs and context
