Tag Archives: proof

Very early difficulties in studying abstract math

Introduction

There are a some difficulties that students have at the very beginning of studying abstract math that are overwhelmingly important, not because they are difficult to explain but because too many teachers don’t even know the difficulties exist, or if they do, they think they are trivial and the students should know better without being told. These difficulties cause too many students to give up on abstract math and drop out of STEM courses altogether.

I spent my entire career in math at Case Western Reserve University. I taught many calculus sections, some courses taken by math majors, and discrete math courses taken mostly by computing science majors. I became aware that some students who may have been A students in calculus essentially fell off a cliff when they had to do the more abstract reasoning involved in discrete math, and in the initial courses in abstract algebra, linear algebra, advanced calculus and logic.

That experience led me to write the Handbook of Mathematical Discourse and to create the website abstractmath.org. Abstractmath.org in particular grew quite large. It does describe some of the major difficulties that caused good students to fall of the abstraction cliff, but also describes many many minor difficulties. The latter are mostly about the peculiarities of the languages of math.

I have observed people’s use of language since I was like four or five years old. Not because I consciously wanted to — I just did. When I was a teenager I would have wanted to be a linguist if I had known what linguistics is.

I will describe one of the major difficulties here (failure to rewrite according to the definition) with an example. I am planning future posts concerning other difficulties that occur specifically at the very beginning of studying abstract math.

Rewrite according to the definition

To prove that a statement
involving some concepts is true,
start by rewriting the statement
using the definitions of the concepts.

Example

Definition

A function $f:S\to T$ is surjective if for any $t\in T$ there is an $s\in S$ for which $f(s)=t$.

Definition

For a function $f:S\to T$, the image of $f$ is the set \[\{t\in T\,|\,\text{there is an }s\in S\text{ for which }f(s)=t\}\]

Theorem

Let $f:S\to T$ be a function between sets. Then $f$ is surjective if and only if the image of $f$ is $T$.

Proof

If $f$ is surjective, then the statement “there is an $s\in S$ for which $f(s)=t$” is true for any $t\in T$ by definition of surjectivity. Therefore, by definition of image, the image of $f$ is $T$.

If the image of $f$ is $T$, then the definition of image means that there is an $s\in S$ for which $f(s)=t$ for any $t\in T$. So by definition of surjective, $f$ is surjective.

“This proof is trivial”

The response of many mathematicians I know is that this proof is trivial and a student who can’t come up with it doesn’t belong in a university math course. I agree that the proof is trivial. I even agree that such a student is not a likely candidate for getting a Ph.D. in math. But:

  • Most math students in an American university are not going to get a Ph.D. in math. They may be going on in some STEM field or to teach high school math.
  • Some courses taken by students who are not math majors take courses in which simple proofs are required (particularly discrete math and linear algebra). Some of these students may simply be interested in math for its own sake!

A sizeable minority of students who are taking a math course requiring proofs need to be told the most elementary facts about how to do proofs. To refuse to explain these facts is a disfavor to the mathematics community and adds to the fear and dislike of math that too many people already have.

These remarks may not apply to students in many countries other than the USA. See When these problems occur.

“This proof does not describe how mathematicians think”

The proof I wrote out above does not describe how I would come up with a proof of the statement, which would go something like this: I do math largely in pictures. I envision the image of $f$ as a kind of highlighted area of the codomain of $f$. If $f$ is surjective, the highlighting covers the whole codomain. That’s what the theorem says. I wouldn’t dream of writing out the proof I gave about just to verify that it is true.

More examples

Abstractmath.org and Gyre&Gimble contain several spelled-out theorems that start by rewriting according to the definition. In these examples one then goes on to use algebraic manipulation or to quote known theorems to put the proof together.

Comments

This post contains testable claims

Herein, I claim that some things are true of students just beginning abstract math. The claims are based largely on my teaching experience and some statements in the math ed literature. These claims are testable.

When these problems occur

In the United States, the problems I describe here occur in the student’s first or second year, in university courses aimed at math majors and other STEM majors. Students typically start university at age 18, and when they start university they may not choose their major until the second year.

In much of the rest of the world, students are more likely to have one more year in a secondary school (sixth form in England lasts two years) or go to a “college” for a year or two before entering a university, and then they get their bachelor’s degree in three years instead of four as in the USA. Not only that, when they do go to university they enter a particular program immediately — math, computing science, etc.

These differences may mean that the abstract math cliff occurs early in a student’s university career in the USA and before the student enters university elsewhere.

In my experience at CWRU, some math majors fall of the cliff, but the percentage of computing science students having trouble was considerably greater. On the other hand, more of them survived the discrete math course when I taught it because the discrete math course contain less abstraction and more computation than the math major courses (except linear algebra, which had a balance similar to the discrete math course — and was taken by a sizeable number of non-math majors).

References

Creative Commons License

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

Send to Kindle

Mathematical Information I

Introduction

The January, 2016 meeting of the American Mathematical Society in Seattle included a special session on Mathe­matical Information in the Digital Age of Science. Here is a link to the list of talks in that session (you have to scroll down a ways to get to the list).

Several talks at that session were about communi­cating math, to other mathe­maticians and to the general public. Well, that’s what I have been about for the last 20 years. Mostly.

Overview

These posts discuss the ways we communi­cate math and (mostly in later posts) the revolution in math communication that the internet has caused. Parts of this discussion were inspired by the special session talks. When they are relevant, I include footnotes referring to the talks. Be warned that what I say about these ideas may not be the same as what the speakers had to say, but I feel I ought to give them credit for getting me to think about those concepts.

Some caveats

  • The distinctions between different kinds of math communi­cation are inevitably fuzzy.
  • Not all kinds of communication are mentioned.
  • Several types of communication normally occur in the same document.

Articles published in journals

Until recently, math journals were always published on paper. Now many journals exist only on the internet. What follows is a survey of the types of articles published in journals.

Refereed papers containing new results

These communications typically containing proofs of (usually new) theorems. Such papers are the main way that academic mathematicians get credit for their researchG for the purpose of getting tenure (at least in the USA), although some other types of credit are noted below.

Proofs published in refereed journals in the past were generally restricted to formal proofs, without very many comments intended to aid the reader’s under­standing. This restricted text was often enforced by the journal. In the olden days this would have been prompted by the expense of publishing on paper. I am not sure how much this restriction has relaxed in electronic journals.

I have been writing articles for abstractmath.org and Gyre&Gimble for many years, and it has taken me a very long time to get over unnecessarily restricting the space I use in what I write. If I introduce a diagram in an article and then want to refer to it later, I don’t have to link to it — I can copy it into the current location. If it makes sense for an informative paragraph to occur in two different articles, I can put it into both articles. And so on. Nowadays, that sort of thing doesn’t cost anything.

Survey articles and invited addresses

You may also get credit for an invited address to a prestigious organi­zation, or for a survey of your field, in for example the Bulletin of the AMS. Invited addresses and surveys may contain considerably more explanatory asides. This was quite noticeable in the invited talks at the AMS Seattle meeting.

Books

There is a whole spectrum of math books. The following list mentions some Fraunhofer lines on the spectrum, but the gamut really is as continuous as a large finite list of books could be. This list needs more examples. (This is a blog post, so it has the status of an alpha release.)

Research books that are concise and without much explanation.

The Bourbaki books that I have dipped into (mostly the algebra book and mostly in the 1970’s) are definitely concise and seem to strictly avoid explanation, diagrams, pictures, etc). I have heard people say they are unreadable, but I have not found them so.

Contain helpful explanations that will make sense to people in the field but probably would be formidable to someone in a substantially different area.

Toposes, triples and theories, by Michael Barr and Charles Wells. I am placing our book here in the spectrum because several non-category-theorists (some of them computer scientists) have remarked that it is “formidable” or other words like that.

Intended to introduce professional mathematicians to a particular field.

Categories for the working mathematician, by Saunders Mac Lane. I learned from this (the 1971 edition) in my early days as a category theorist, six years after getting my Ph.D. In fact, I think that this book belongs to the grad student level instead of here, but I have not heard any comments one way or another.

Intended to introduce math graduate students to a particular field.

There are lots of examples of good books in this area. Years ago (but well after I got my Ph.D.), I found Serge Lang’s Algebra quite useful and studied parts of it in detail.

But for grad students? It is still used for grad students, but perhaps Nathan Jacobson’s Basic Algebra would be a better choice for a first course in algebra for first-year grad students.

The post My early life as a mathematician discusses algebra texts in the olden days, among other things.

Intended to explain a part of math to a general audience.

Love and math: the heart of hidden reality. by Edward Frenkel, 2014. This is a wonderful book. After reading it, I felt that at last I had some clue as to what was going on with the Langlands Program. He assumes that the reader knows very little about math and gives hand-waving pictorial expla­nations for some of the ideas. Many of the concepts in the book were already familiar to me (not at an expert level). I doubt that someone who had had no college math courses that included some abstract math would get much out of it.

Symmetry: A Journey into the Patterns of Nature, by Marcus du Sautoy, 2009. He also produced a video on symmetry.

My post Explaining “higher” math to beginners, describes du Sautoy’s use of terminology (among others).

Secrets of creation: the mystery of the prime numbers (Volume 1) by Matthew Watkins (author) and Matt Tweed (Illustrator), 2015. This is the first book of a trilogy that explains the connection between the Riemann $\zeta$ function and the primes. He uses pictures and verbal descriptions, very little terminology or symbolic notation. This is the best attempt I know of at explaining deep math that might really work for non-mathe­maticians.

My post The mystery of the prime numbers: a review describes the first book.

Piper Harron’s Thesis

The Equidistribution of Lattice Shapes of Rings of Integers of Cubic, Quartic, and Quintic Number Fields: an Artist’s Rendering, Ph.D. thesis by Piper Harron.

This is a remarkable departure from the usual dry, condensed, no-useful-asides Ph.D. thesis in math. Each chapter has three main parts, Layscape (explanations for nonspecialists — not (in my opinion) for nonmathe­maticians), Mathscape (most like what goes into the usual math paper but with much more explanation) and Weedscape (irrelevant stuff which she found helpful and perhaps the reader will too). The names of these three sections vary from chapter to chapter. This seems like a great idea, and the parts I have read are well-done.

These blog posts have useful comments about her thesis:

Types of explanations

Any explanation of math in any of the categories above will be of several different types. Some of them are considered here, and more will appear in Mathematical Information II.

The paper Varieties of Mathematical Prose, by Atish Bagchi and me, provides a more fine-grained description of certain types of math communication that includes some types of explanations and also other types of communication.

Images and metaphors

In abstractmath.org

I have written about images and metaphors in abstractmath.org:

Abstractmath.org is aimed at helping students who are beginning their study of abstract math, and so the examples are mostly simple and not at a high level of abstraction. In the general literature, the images and metaphors that are written about may be much more sophisticated.

The User’s GuideW

Luke Wolcott edits a new journal called Enchiridion: Mathematics User’s Guides (this link allows you to download the articles in the first issue). Each article in this journal is written by a mathematician who has published a research paper in a refereed journal. The author’s article in Enchiridion provides information intended to help the reader to understand the research paper. Enchiridion and its rationale is described in more detail in the paper The User’s Guide Project: Giving Experential Context to Research Papers.

The guidelines for writing a User’s Guide suggest writing them in four parts, and one of the parts is to introduce useful images and metaphors that helped the author. You can see how the authors’ user’s guides carry this out in the first issue of Enchiridion.

Piper Harron’s thesis

Piper Harron’s explanation of integrals in her thesis is a description of integrals and measures using creative metaphors that I think may raise some mathematicians’ consciousness and others’ hackles, but I doubt it would be informative to a non-mathematician. I love “funky-summing” (p. 116ff): it communicates how integration is related to real adding up a finite bunch of numbers in a liberal-artsy way, in other words via the connotations of the word “funky”, in contrast to rigorous math which depends on every word have an accumulation-of-properties definition.

The point about “funky-summing” (in my opinion, not necessarily Harron’s) is that when you take the limit of all the Riemann sums as all meshes go to zero, you get a number which

  • Is really and truly not a sum of numbers in any way
  • Smells like a sum of numbers

Connotations communicate metaphors. Metaphors are a major cause of grief for students beginning abstract math, but they are necessary for understanding math. Working around this paradox is probably the most important problem for math teachers.

Informal summaries of a proofW

The User’s Guide requires a “colloquial summary” of a paper as one of the four parts of the guide for that paper.

  • Wolcott’s colloquial summary of his paper keeps the level aimed at non-mathematicians, starting with a hand-waving explanation of what a ring is. He uses many metaphors in the process of explaining what his paper does.
  • The colloquial summary of another User’s Guide, by Cary Malkiewich, stays strictly at the general-public level. He uses a few metaphors. I liked his explanation of how mathematicians work first with examples, then finding patterns among the examples.
  • The colloquial summary of David White’s paper stays at the general-public level but uses some neat metaphors. He also has a perceptive paragraph discussing the role of category theory in math.

The summaries I just mentioned are interesting to read. But I wonder if informal summaries aimed at math majors or early grad students might be more useful.

Insights

The first of the four parts of the explanatory papers in Enchiridion is supposed to present the key insights and organizing principles that were useful in coming up with the proofs. Some of them do a good job with this. They are mostly very special to the work in question, but some are more general.

This suggests that when teaching a course in some math subject you make a point of explaining the basic techniques that have turned out very useful in the subject.

For example, a fundamental insight in group theory is:

Study the linear representations of a group.

That is an excellent example of a fundamental insight that applies everywhere in math:

Find a functor that maps the math objects you are studying to objects in a different branch of math.

The organizing principles listed in David White’s article has (naturally more specialized) insights like that.

Proof stories

“Proof stories” tell in sequence (more or less) how the author came up with a proof. This means describing the false starts, insights and how they came about. Piper Harron’s thesis does that all through her work.

Some authors do more than that: their proof stories intertwine the mathe­matical events of their progress with a recount of life events, which sometimes make a mathe­matical difference and sometimes just produces a pause to let the proof stew in their brain. Luke Wolcott wrote a User’s Guide for one of his own papers, and his proof story for that paper involves personal experiences. (I recommend his User’s Guide as a model to learn from.)

Reports of personal experiences in doing math seem to add to my grasp of the math, but I am not sure I understand why.

References

The talks in Seattle

  • List of all the talks.
  • W. Timothy Gowers, How should mathe­matical knowledge be organized? Talk at the AMS Special Session on Mathe­matical Information in the Digital Age of Science, 6 January 2016.
  • Colloquium notes. Gowers gave a series of invited addresses for which these are the notes. They have many instances of describing what sorts of problems obstruct a desirable step in the proof and what can be done about it.

  • Luke Wolcott, The User’s Guide. Talk at the AMS Special Session on Mathe­matical Information in the Digital Age of Science, 6 January 2016.

Creative Commons License< ![endif]>

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

Send to Kindle

Context

This is a revised draft of the abstractmath.org article on context in math texts. Note: WordPress changed double primes into quotes. Tsk.

Context

Written and especially spoken language depends heavily on the context – the physical surroundings, the preceding conversation, and social and cultural assumptions.  Mathematical statements are produced in such contexts, too, but here I will discuss a special thing that happens in math conversation and writing that does not seem to happen much in other sorts of discourse:

The meanings of expressions
in both the symbolic language and math English
change from phrase to phrase
as the speaker or writer changes the constraints on them.

Example

In a math text, before the occurrence of a phrase such as “Let $n=3$”, $n$ may be known only as an integer variable.  After the phrase, it means specifically $3$.  So this phrase changes the meaning of $n$ by constraining $n$
to be $3$.  We say the context of occurrences of “$n$” before the phrase requires only that $n$ be an integer, but after the occurrence the context requires $n=3$.

Definition

In this article, the context at a particular location in mathematical discourse is the sum total of what the reader or listener can know about the symbols and names used in the discourse when they have read everything up to that location.

Remarks

  • Each clause can change the meaning of or constraints on one or more symbols or names. The conventions in effect during the discourse can also put constraints on the symbols and names.
  • Chierchia and McConnell-Ginet give a mathematical definition of context in the sense described here.
  • The references to “before” and “after” the phrase “Let $3$” refer to the physical location in text and to actual time in spoken math. There is more about this phenomenon in the Handbook of Mathematical Discourse, page 252, items (f) and (g).
  • Contextual changes of this sort take place using the pretense that you are reading the text in order, which many students and professionals do not do (they are “grasshoppers”).
  • I am not aware of much context-changing in everyday speech. One place it does occur is in playing games. For example, during some card games the word “trumps” changes meaning from time to time.
  • In symbolic logic, the context at a given place may be denoted by “$\Gamma$”.

Detailed example of a math text

Here is a typical example of a theorem and its proof.  It is printed twice, the second time with comments about the changes of context.  This is the same proof that is already analyzed practically to death in the chapter on presentation of proofs.

First time through

Definition: Divides

Let $m$ and $n$ be integers with $m\ne 0$. The statement “$m$ divides $n$” means that there is an integer $q$ for which $n=qm$

Theorem

Let $m$, $n$ and $p$ be integers, with $m$ and $n$ nonzero, and suppose $m$ divides $n$ and $n$ divides $p$.  Then $m$ divides $p$.

Proof

By definition of divides, there are integers $q$ and $q’$ for which $n=qm$ and $p=q’n$. We must prove that there is an integer $q”$ for which $p=q”m$. But $p=q’n=q’qm$, so let $q”=q’q$.  Then $p=q”m$.

Second time, with analysis

Definition: Divides

Begins a definition. The word “divides” is the word being defined. The scope of the definition is the following paragraph.

Let $m$ and $n$ be integers

$m$ and $n$ are new symbols in this discourse, constrained to be integers.

with $m\ne 0$

Another constraint on $m$.

The statement “$m$ divides $n$ means that”

This phrase means that what follows is the definition of “$m$ divides $n$”

there is an integer $q$

“There is” signals that we are beginning an existence statement and that $q$ is the bound variable within the existence statement.

for which $n=qm$

Now we know that “$m$ divides $n$” and “there is an integer $q$ for which $m=qn$” are equivalent statements.  Notes: (1) The first statement would only have implied the second statement if this had not been in the context of a definition. (2) After the conclusion of the definition, $m$, $n$ and $q$ are undefined variables.

Theorem

This announces that the next paragraph is a statement has been proved. In fact, in real time the statement was proved long before this discourse was written, but in terms of reading the text in order, it has not yet been proved.

Let $m$, $n$ and
$p$ be integers,

“Let” tells us that the following statement is the hypothesis of an implication, so we can assume that $m$, $n$ and $p$ are all integers.  This changes the status of $m$ and $n$, which were variables used in the preceding paragraph, but whose constraints disappeared at the end of the paragraph.  We are starting over with $m$ and $n$.

with $m$
and $n$ nonzero.

This clause is also part of the hypothesis. We can assume $m$ and $n$ are constrained to be nonzero.

and suppose $m$ divides $n$ and $n$ divides $p$.

This is the last clause in the hypothesis. We can assume that $m$ divides $n$ and $n$ divides $p$.

Then $m$
divides $p$.

This is a claim that $m$ divides $p$. It has a different status from the assumptions that $m$ divides $n$ and $n$ divides $p$. If we are going to follow the proof we have to treat $m$ and $n$ as if they divide $n$ and $p$ respectively. However, we can’t treat $m$ as if it divides $p$. All we know is that the author is claiming that $m$ divides $p$, given the facts in the hypothesis.

Proof

An announcement that a proof is about to begin, meaning a chain of math reasoning. The fact that it is a proof of the Theorem just stated is not explicitly stated.

By definition of divides, there are integers $q$ and $q’$ for which $n=qm$ and $p=q’n$.

The proof uses the direct method (rather than contradiction or induction or some other method) and begins by rewriting the hypothesis using the definition of “divides”. The proof does not announce the use of these techniques, it just starts in doing it. So $q$ and $q$’ are new symbols that satisfy the equations $n=qm$ and $p=q’n$. The phrase “by definition of divides” justifies the introduction of $q$ and $q’$. $m$, $n$ and $p$ have already been introduced in the statement of the Theorem.

We must prove that there is an integer $q”$ for which $p=q”m$.

Introduces a new variable $q”$ which has not been given a value. We must define it so that $p=q”m$; this requirement is justified (without saying so) by the definition of “divides”.

But $p=q’n=q’qm$,

This is a claim about $p$, $q$, $q’$, $m$ and $n$.  It is justified by certain preceding sentences but this justification is not made explicit. Note that “$p=q’n=q’qm$” pivots on $q’n$, in other words makes two claims about it.

so let $q”=q’q$.

We have already introduced $q”$; now we give it the value $q”=q’q$.

Then $p=q”m$

This is an assertion about $p$, $q”$ and $n$, justified (but not explicitly — note the hidden use of associativity) by the previous claim that $p=q’n=q’qm$.

 

The proof is now complete, although no
statement asserts that it is.

Remark

If you have some skill in reading proofs, all the stuff in the right hand column happens in your brain without, for the most part, your being conscious of it.

Acknowledgment

Thanks to Chris Smith for correcting errors.

References for “context”

Chierchia, G. and S. McConnell-Ginet
(1990), Meaning and Grammar. The MIT Press.

de Bruijn, N. G. (1994), “The mathematical vernacular, a
language for mathematics with typed sets”. In Selected Papers on Automath,
Nederpelt, R. P., J. H. Geuvers, and R. C. de Vrijer, editors, volume 133 of
Studies in Logic and the Foundations of Mathematics, pages 865 – 935. Elsevier

Steenrod, N. E., P. R. Halmos, M. M. Schif­fer,
and J. A. Dieudonné (1975), How to Write Mathematics.
American Mathematical Society.

Send to Kindle

The Mathematics Depository: A Proposal

Introduction

This post is about taking texts written in mathematical English and the symbolic language and encoding it in a formal language that could be tested by an automated proof verifier. This is a very difficult undertaking, but we could get closer and closer to a working system by a worldwide effort continuing over, probably, decades. The system would have to contain many components working together to create incremental improvements in the process.

This post, which is a first draft, outlines some suggestions as to how this could work. I do not discuss the encoding required, which is not my area of expertise. Yes, I understand that coding is the hard part!

Much work has been done by computing scientists in developing proof checking and proof-finding programs. Work has also been done, primarily by math education workers but also by some philosophers and computing scientists, in uncovering the many areas where ordinary math language is ambiguous and deviates from ordinary English usage. These characteristics confuse students and also make it hard to design a program that can interpret the language. I have been working in that area mostly from the math ed point of view for the last twenty years.

The Reference section lists many references to the problem of parsing mathematical English, some from the point of view of automatic translation of math language into code, but most from the point of view of helping students understand how to understand it.

The Mathematics Depository

I imagine a system for converting documents written in math language into machine-readable language and testing their claims. An organization, call it the Mathematics Depository, would be developed that is supported by many countries, organizations and individual supporters. It should consist of several components listed below, no doubt with other components as we become aware of needing them. The organization would be tasked with supporting and improving these components over time.

The main parts of the system

Each component is linked to a more detailed description that is given later in this post.

  • A Proof Verifier (PV), that inputs a proof and determines if it is correct.
  • A specification of a supported subset of Mathematical English and the symbolic language, that I will call Strict Math English (SME).
  • A Text-SME Converter, a program that would input a text written in ordinary math English that has been annotated by a knowledgeable person and convert it into SME.
  • An SME-PV Converter that will convert text written in SME into code that can be directly read by the Proof Verifier.
  • One or more Automatic Theorem Provers, that to begin with can take fairly simple conjectures written in SME and sometimes succeed in proving them.
  • An Annotation System containing an Annotation Editor that would allow a person to use SME to annotate an article written in ordinary math English so that it could be read by the Text-SME Converter.
  • A Data Base that would include the texts that have been collected in this endeavor, along with the annotations and the results of the proof checking.
  • A Data Base Miner that would watch for patterns in the annotations as new papers were submitted. The operators might also program it to watch for patterns in other aspects of the operation.

These facilities would be organized so that the systems work together, with the result that the individual components I named improve over time, both automatically and via human intervention.

Flow of Work

  1. A math text is submitted.
  2. If it is already in Strict Math English (SME), it is input to the Proof Verifier (PV).
  3. Otherwise, the math text is input into the Annotation System.
  4. The resulting SME text is input into the Text-SME Converter.
  5. The output of the Text-SME Converter is input into the Proof Verifier.
  6. The PV incorporates each definition in the text into the context of the math text. This is a specific meaning of the word “context”, including a list of the status of variables (bound, unbound, type, and so on), meanings of technical words, and other facts created in the text. “Context” is described informally in my article Context in abstractmath.org. That article gives references to the formal literature.
  7. In my experience mathematicians spend only a little time reading arguments step by step as described in the Context article. They usually look at a theorem and try to figure it out themselves, “cheating” occasionally by glancing at parts of the proof.

  8. Each mathematical assertion in the text is marked as a claim.
  9. The checking process records those claims occurring in the proof that are not proved in the text, along with any references given to other texts.
  10. If a reference to a result in another text is made, the PV looks for the result in the Database. If it does not find it, the PV incorporates the result and its location in the Database as an externally proven but untested claim.
  11. If no reference or proof for a claim is given, the PV checks the Database to see if it has already been proved.
  12. Any claim in the current text not shown as proven in the Database is submitted to the Automatic Theorem Prover (ATP). The output of the ATP is put in the database (proved, counterexample found, or unable to determine truth).
  13. If a segment of text is presented as a proof, it is input into the PV to be verified.
  14. The PV reports the result for each claimed proof, which can consist of several possibilities:
    • A counterexample for a proof is found, so the claim that the proof was supposed to report is false.
    • The proof contains gaps, so the claim is unsettled.
    • The proof is reported as correct.
  15. At the end of the process, all the information gathered is put into the Database:
    • The original text showing all the annotations.
    • The text in SME.
    • All claims, with their status (proven true, proven false, truth unknown, reference if one was given).
    • Every proof, with its status and the entire context at each step of the proof.

Details

The proof verifier

  • Proof checking programs have been developed over the last thirty or so years. The MD should write or adapt one or more Proof Verifiers and improve it incrementally as a result of experience in running the system. In this post I have assumed the use of just one Proof Verifier.
  • The Proof Verifier should be designed to read the output of the SME-PV converter.
  • The PV must read a whole math text in SME, identify and record each claim and check each proof (among other things). This is different from current proof verifiers, which take exactly one proof as input.
  • The PV must create the context of each proof and change it step by step as it reads each syntactic fragment of the math text.
  • Typically the context for a claimed proof is built up in the whole math text, not just in the part called “Proof”.
  • The PV should automatically query the Data Base for unproved steps in a proof in the input text to see if they have already been verified somewhere else. These results should be quoted in a proof verifier output.
  • The PV should also automatically submit steps in the proof that haven’t been verified to the Automatic Theorem Provers and wait for the step to be verified or not.
  • The Proof Verifier should output details of the result of the checking whether it succeeded in verifying the whole input text or not. In particular, it should list steps in proofs it failed to verify, including steps in proofs for which the input text cited the proof in some other paper, in the MD system or not.
  • The Proof Verifier should be available online for anyone to submit, in SME, a mathematical text claiming to prove a theorem. Submission might require a small charge.

Strict Math English

  • One of the most important aspects of the system would be the simultaneous incremental updating of the SME and the SME-PV Converter.
  • The idea is that SME would get more and more inclusive of the phrases and clauses it allows.

Example: Universal Assertions

At the start SME might allow these statements to be recognized as the same universal assertion:

  • “$\forall x(x^2+1\gt0)$”
  • “For all [every, any] $x$, $x^2+1\gt0$.” (universality asserted using an English word.)
  • “For all [every, any] $x$, $x^2+1$ is positive.”

As time goes on, a person or the Data Base Miner might detect that many annotators also recognized these statements as saying the same thing:

  • “$x^2+1\gt0\,\,\,\,\,(\text{all } x)$” (as a displayed statement)
  • “$x^2+1$ is positive for every $x$.” Universality asserted using an adjective in a postposited phrase.
  • “$x^2+1$ is always positive.” Universality hidden in a postposited adverb that seems to be referring to time!
  • There are more examples in my article Universally True Assertions. See also Susanna Epp’s article on quantification for other problems in this area.

These other variations would then be added to the Strict Math Language. (This is only an example of how the system would evolve. I have no doubt that in fact all the terminology mentioned above would be included at the outset, since they are all documented in the math ed literature.)

Even at the start, SME will include phrases and clauses in the English language as well as symbolic expressions. It is notorious that automatically parsing general English sentences is difficult and that the ubiquity of metaphors makes it essentially impossible to reliably construct the meaning of a sentence. That is why SME must start with a very narrow subset of math English. But even in early days, it should include some stereotyped metaphors, such as using “always” in universal assertions.

The SME-PV Converter

  • The SME-PV Converter would read documents written in SME and convert them into code readable by the proof checking program, as well as by the automatic theorem provers.
  • Such a program is essentially the subject of Ganesingalam’s book.
  • Converting SME so that the Proof Verifier can handle it involves lots of subtleties. For example, if the text says, “For any $x$, $x^2+1\gt0$”, the translation has to recognize not only that this is a universally quantified statement with $x$ as the bound variable, but that $x$ must be a real number, since complex numbers don’t do greater-than.
  • Frequent revisions of the SME-PV Converter will be necessary since its input language, the SME, will be constantly expanded.
  • It may be that the output language of the SME-PV Converter (which the Proof Verifier and Automatic Theorem Provers read) will require only infrequent revisions.

The Automatic Theorem Provers

  • The system could support several ATP’s, each one adapted to read the output of the SME-PV Converter.
  • The Automatic Theorem Provers should provide output in such a way that the Proof Verifier can include in its report the positive or negative results of the Theorem Prover in detail.

The Annotation System

  • The Annotation system would facilitate construction of a data structure that connects each annotation to the specific piece of text it rewrites. The linking should be facilitated by the Annotation Editor.
  • For example, an annotation that is meant to explain that the statement (in the input text) “$x^2+1$ is always greater than $0$” is to be translated as “$\forall x(x^2+1\gt0)$” (which is presumably allowed by SME) should cause the first statement to be be linked to the second statement. The first statement, the one in the input text, should not be changed. This will enable the Data Base Miner to find patterns of similar text being annotated in similar ways.
  • The annotations should clarify words, symbolic expressions and sentences in the input text to allow the Proof Verifier to input them correctly.
  • In particular, every claim that a statement is true should be marked as a proposed theorem, and similarly every proof should be marked as a proof and every definition should be marked as a definition. Such labeling is often omitted in the math literature. Annotators would have to recognize segments of the text as claims, proofs and definitions and annotate them as such.
  • The annotations would be written in the current version of Strict Math English. Since SME is frequently updated, the instructions for the annotator would also have to be frequently updated.

Examples

  • If a paper used the word “domain” without defining it, the annotator would clarify whether it meant an open connected set, a type of ring, a type of poset, or the domain of a function. See Example 1
  • Annotators will note instances in which the same text will use a symbol with two different meanings. See Example 2.
  • In a phrase, a single occurrence of a symbol can require an annotation that assigns more than one attribute to the symbol. See Example 3.

The Annotation Editor

  • The annotators should be provided with an Annotation Editor designed specifically for annotation.
  • The editor should include a system of linking an annotation to the exact phrase it annotates that is easy for a person reading the annotated document to understand it as well as providing the information to the Text-SME Converter.

The Annotators

  • Great demands will be made of an annotator.
  • They must understand the detailed meaning of the text they annotate. This means they must be quite familiar with the field of math the text is concerned with.
  • They must learn SME. I know for a fact that many mathematicians are not good at learning foreign languages. It will help that SME will be a subset of the full language of math.
  • All this means that annotators must be chosen carefully and paid well. This means that not very many papers will get annotated by paid annotators, so that there will have to be some committee that chooses the papers to be annotated. This will be a genuine bottleneck.
  • One thing that will help in the long run is that the SME should evolve to include more features of the general language of math, so many mathematicians will actually write their papers in SME and submit it directly to the Depository. (“Long run” may mean more than ten years).

The Text-to-SME Converter

  • This converter takes a math text in ordinary Math English that has been annotated and convert it into SME.
  • The format for feeding it to the Automatic Theorem Prover may very well have to be different from the format to be read by a human. Both formats should be saved.

The Data Base

  • The Data Base would contain all math papers that have been run through the Proof Verifier, along with the results found by the Proof Verifier. A paper should be included whether or not every claim in the paper was verified.
  • Funding agencies (and private individuals) might choose particularly important papers and pay more money for annotation for those than for other papers.
  • Mathematicians in a particular field could be hired to annotate particular articles in their field, using a standard annotation language that would develop through time.
  • The annotated papers would be made freely available to the public.
  • It will no doubt prove useful for the Data Base to contain many other items. Possibilities:
  • A searchable list of all theorems that have been verified.
  • A glossary: a list of math words that have been defined in the papers in the Depository. This will include synonyms and words with multiple meanings.

The Data Base Miner

Watch for patterns

The DBM would watch for patterns in annotation as new annotated papers were submitted. It should probably look only at annotated papers whose proofs had been verified. The patterns might include:

  • Correlation between annotations that associate particular meanings to particular words or symbols with the branch of math the paper belongs to. See Example 1.
  • Noting that a particular format of combining symbols usually results in the same kind of annotation. See Example 4.
  • Providing data in such a way that lexicographers studying math English could make use of them. My Handbook began with my doing lexicographical research on math English, but I found it so slow that when I started abstractmath.org I resolved not to such research any more. Nevertheless, it needs to be done and the Database should make the process much easier.

Statistical translation

Since the annotated papers will be stored in the Data Base, the Data Base Miner could use the annotations in somewhat the same way some language translators work (in part): to translate a phrase, it will find occurrences of the phrase in the source language that have been translated into the target language and use the most common translation. In this case the source language is the paper (in English) and the target language is in annotated math English readable by the Proof Verifier. Once the Database includes most of the papers ever published (twenty years from now?), statistical translation might actually become useful.

Examples

Example 1: Meaning varies with branch of math

  • Field” means one thing in an algebra paper and another in a mathematical physics paper.
  • Domain” means
  • An open connected set in topology.
  • A type of ring in algebra.
  • A type of poset in theoretical computing science.
  • The domain of a function –everywhere in math, which makes it seem that this is going to be very hard to distinguish without human help!
  • Log” usually implies base $2$ in the computing world, base $10$ in engineering (but I am not sure how prevalant this meaning is there), and base $e$ in pure math. With exceptions!
  • Example 2: Meaning varies even in the same article

    • The notation “$(a,b)$” can mean an ordered pair, an open interval, or the GCD. What’s worse, there are many instances where the symbol is used without definition. Citation 139 in the Handbook provides a single sentence in which the first two meanings both occur:

      $\dots$ Richard Darst and Gerald Taylor investigated the differentiability of functions $f^p$ (which for our purposes we will restrict to $(0,1)$) defined for each $p\geq1$ by\[F(x):=
      \begin{cases}
      0 &
      \text{if }x\text{ is irrational}\\
      \displaystyle{\frac{1}{n^p}} &
      \text{if }x = \displaystyle{\frac{m}{n}}\text{ with }(m,n)=1\\ \end{cases}\]

      The sad thing is that any mathematician will know immediately what each occurrence means. This may be a case where the correct annotation will never be automatically detectable.

    Example 3: One mention of a symbol may require several meanings

    In the sentence, “This infinite series converges to $\zeta(2)=\frac{\pi^2}{6}\approx 1.65$,” the annotator would provide two pieces of information about “$\frac{\pi^2}{6}$”, namely that it is both the right constituent of the equation “$\zeta(2)=\frac{\pi^2}{6}$” and the left constituent of the approximation statement “$\frac{\pi^2}{6}\approx 1.65$” — and that these two statements were the constituents of an asserted conjunction. (See my post Pivoted symbols.)

    Example 4: Function to a power

    Some expressions not in the SME will almost always be annotated in the same way. This makes it discoverable by the Data Base Miner.

    • “$\sin^{-1}x$” always means $\arcsin x$.
    • For positive $n$, “$\sin^n x$” always means $(\sin x)^n$. It never means the $n$-fold application of $\sin$ to $x$.
    • In contrast, for an arbitrary function symbol, $f^n(x)$ will often be annotated as $n$-fold application of $f$ and also often as $f(x)^n$. (And maybe those last two possibilities are correlated by branch of math.)

    References

    I believe that work in formal verification has tended to overlook the work on math language difficulties in math ed, so I have included some articles from that specialty.

    The following are posts from my blog Gyre&Gimble. They are in reverse chronological order.

    Creative Commons License

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.


    Send to Kindle

    Abstraction and axiomatic systems

    Abstraction and the axiomatic method

    This post will become an article in abstractmath.org.

    Abstraction

    An abstraction of a concept $C$ is a concept $C’$ with these properties:

    • $C’$ includes all instances of $C$ and
    • $C’$ is constructed by taking as axioms certain assertions that are true of all instances of $C$.

    There are two major situations where abstraction is used in math.

    • $C$ may be a familiar concept or property that has not yet been given a math definition.
    • $C$ may already have a mathe­matical definition using axioms. In that case the abstraction will be a generalization of $C$. 

    In both cases, the math definition may allow instances of $C’$ that were not originally thought of as being part of $C$.

    Example: Relations

    Mathematicians have made use of relations between math objects since antiquity.

    • For real numbers $r$ and $s$. “$r\lt x$” means that $r$ is less than $s$. So the statement “$5\lt 7$” is true, but the statement “$7\lt 5$” is false. We say that “$\lt$” is a relation on the real numbers. Other relations on real numbers denoted by symbols are “$=$” and “$\leq$”.
    • Suppose $m$ and $n$ are positive integers. $m$ and $n$ are said to be relatively prime if the greatest common divisor of $m$ and $n$ is $1$. So $5$ and $7$ are relatively prime, but $15$ and $21$ are not relatively prime. So being relatively prime is a relation on positive integers. This is a relation that does not have a commonly used symbol.
    • The concept of congruence of triangles has been used for a couple of millenia. In recent centuries it has been denoted by the symbol “$\cong$”. Congruence is a relation on triangles.

    One could say that a relation is a true-or-false statement that can be made about a pair of math objects of a certain type. Logicians have in fact made that a formal definition. But when set theory came to be used around 100 years ago as a basis for all definitions in math, we started using this definition:

    A relation on a set $S$ is a set $\alpha$ of ordered pairs of elements of $S$.

    “$\alpha$” is the Greek letter alpha.

    The idea is that if $(s,t)\in\alpha$, then $s$ is related by $\alpha$ to $t$, then $(s,t)$ is an element of $\alpha$, and if $s$ is not related by $\alpha$ to $t$, then $(s,t)$ is not an element of $\alpha$. That abstracts the everyday concept of relationship by focusing on the property that a relation either holds or doesn’t hold between two given objects.

    For example, the less-than relation on the set of all real numbers $\mathbb{R}$ is the set \[\alpha:=\{(r,s)|r\in\mathbb{R}\text{ and }s\in\mathbb{R}\text{ and }r\lt s\}\] In other words, $r\lt s$ if and only if $(r,s)\in \alpha$.

    Example

    A consequence of this definition is that any set of ordered pairs is a relation. Example: Let $\xi:=\{(2,3),(2,9),(9,1),(9,2)\}$. Then $\xi$ is a relation on the set $\{1,2,3,9\}$. Your reaction may be: What relation IS it? Answer: just that set of ordered pairs. You know that $2\xi3$ and $2\xi9$, for example, but $9\xi1$ is false. There is no other definition of $\xi$.

    Yes, the relation $\xi$ is weird. It is an arbitrary definition. It does not have any verbal description other than listing the element of $\xi$. It is probably useless. Live with it.

    The symbol “$\xi$” is a Greek letter. It looks weird, so I used it to name a weird relation. Its upper case version is “$\Xi$”, which is even weirder. I pronounce “$\xi$” as “ksee” but most mathematicians call it “si” or “zi” (rhyming with “pie”).

    Defining a relation as any old set of ordered pairs is an example of a reconstructive generalization.

    $n$-ary relations

    Years ago, mathematicians started coming up with things that were like relations but which involved more than two elements of a set.

    Example

    Let $r$, $s$ and $t$ be real numbers. We say that “$s$ is between $r$ and $t$” if $r\lt s$ and $s\lt t$. Then betweenness is a relation that is true or false about three real numbers.

    Mathematicians now call this a ternary relation. The abstract definition of a ternary relation is this: A ternary relation on a set $S$ is a set of ordered triple of elements of $S$. This is an reconstructive generalization of the concept of relation that allows ordered triples of elements as well as ordered pairs of elements.

    In the case of betweenness, we have to decide on the ordering. Let us say that the betweenness relation holds for the triple $(r,s,t)$ if $r\lt s$ and $s\lt t$. So $(4,5,7)$ is in the betweenness relation and $(4,7,5)$ is not.

    You could argue that in the sentence, “$s$ is between $r$ and $t$”, the $s$ comes first, so that we should say that the betweenness relation (meaning $r$ is between $s$ and $t$) holds for $(r,s,t)$ if $s\lt r$ and $r\lt t$. Well, when you write an article you can write it that way. But I am writing this article.

    Nowadays we talk about $n$-ary relations for any positive integer $n$. One consequence of this is that if we want to talk just about sets of ordered pairs we must call them binary relations.

    When I was a child there was only one kind of guitar and it was called “a guitar”. (My older cousin Junior has a guitar, but I had only a plastic ukelele.) Some time in the fifties, electrically amplified guitars came into being, so we had to refer to the original kind as “acoustic guitars”. I was a teenager when this happened, and being a typical teenager, I was completely contemptuous of the adults who reacted with extreme irritation at the phrase “acoustic guitar”.

    The axiomatic method

    The axiomatic method is a technique for studying math objects of some kind by formulating them as a type of math structure. You take some basic properties of the kind of structure you are interested in and set them down as axioms, then deduce other properties (that you may or may not have already known) as theorems. The point of doing this is to make your reasoning and all your assumptions completely explicit.

    Nowadays research papers typically state and prove their theorems in terms of math structures defined by axioms, although a particular paper may not mention the axioms but merely refer to other papers or texts where the axioms are given.  For some common structures such as the real numbers and sets, the axioms are not only not referenced, but the authors clearly don’t even think about them in terms of axioms: they use commonly-known properties (or real numbers or sets, for example) without reference.

    The axiomatic method in practice

    Typically when using the axiomatic method some of these things may happen:

    • You discover that there are other examples of this system that you hadn’t previously known about.  This makes the axioms more broadly applicable.
    • You discover that some properties that your original examples had don’t hold for some of the new examples.  Depending on your research goals, you may then add some of those properties to the axioms, so that the new examples are not examples any more.
    • You may discover that some of your axioms follow from others, so that you can omit them from the system.

    Example: Continuity

    A continuous function (from the set of real numbers to the set of real numbers) is sometimes described as a function whose graph you can draw without lifting your chalk from the board.  This is a physical description, not a mathe­matical definition.

    In the nineteenth century, mathe­ma­ticians talked about continuous functions but became aware that they needed a rigorous definition.  One possibility was functions given by formulas, but that didn’t work: some formulas give discontinuous functions and they couldn’t think of formulas for some continuous functions.

    This description of nineteenth century math is an oversimpli­fication.

    Cauchy produced the definition we now use (the epsilon-delta definition) which is a rigorous mathe­matical version of the no-lifting-chalk idea and which included the functions they thought of as continuous.

    To their surprise, some clever mathe­maticians produced examples of some weird continuous functions that you can’t draw, for example the sine blur function.  In the terminology in the discussion of abstraction above, the abstraction $C’$ (epsilon-delta continuous functions) had functions in it that were not in $C$ (no-chalk-lifting functions.) On the other hand, their definition now applied to functions between some spaces besides the real numbers, for example the complex numbers, for which drawing the graph without lifting the chalk doesn’t even make sense.

    Example: Rings

    Suppose you are studying the algebraic properties of numbers.  You know that addition and multiplication are both associative operations and that they are related by the distributive law:  $x(y+z)=xy+xz$. Both addition and multiplication have identity elements ($0$ and $1$) and satisfy some other properties as well: addition forms a commutative group for example, and if $x$ is any number, then $0\cdot x=0$.

    One way to approach this problem is to write down some of these laws as axioms on a set with two binary operations without assuming that the elements are numbers. In doing this, you are abstracting some of the properties of numbers.

    Certain properties such as those in the first paragraph of this example were chosen to define a type of math structure called a ring. (The precise set of axioms for rings is given in the Wikipedia article.)

    You may then prove theorems about rings strictly by logical deduction from the axioms without calling on your familiarity with numbers.

    When mathematicians did this, the following events occurred:

    • They discovered systems such as matrices whose elements are not numbers but which obey most of the axioms for rings.
    • Although multiplication of numbers is commutative, multiplication of matrices is not commutative.
    • Now they had to decide whether to require commutative of multiplication as an axioms for rings or not.  In this example, historically, mathe­maticians decided not to require multi­plication to be commutative, so (for example) the set of all $2\times 2$ matrices with real entries is a ring.
    • They then defined a commutative ring to be a ring in which multi­plication is commutative.
    • So the name “commutative ring” means the multiplication is commutative, because addition in rings is always commutative. Mathematical names are not always transparent.

    • You can prove from the axioms that in any ring, $0 x=0$ for all $x$, so you don’t need to include it as an axiom.

    Nowadays, all math structures are defined by axioms.

    Other examples

    • Historically, the first example of something like the axiomatic method is Euclid’s axiomatization of geometry.  The axiomatic method began to take off in the late nineteenth century and now is a standard tool in math.  For more about the axiomatic method see the Wikipedia article.
    • Partitions. and equivalence
      relations
      are two other concepts that have been axiomatized. Remarkably, although the axioms for the two types of structures are quite different, every partition is in fact an equivalence relation in exactly one way, and any equivalence relation is a partition in exactly one way.

    Remark

    Many articles on the web about the axiomatic method emphasize the representation of the axiom system as a formal logical theory (formal system). 
    In practice, mathematicians create and use a particular axiom system as a tool for research and understanding, and state and prove theorems of the system in semi-formal narrative form rather than in formal logic.


    Creative Commons License

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

    Send to Kindle

    Rabbits out of a hat

    This is a revision and expansion of the entry on rabbits in the abstractmath article Dysfunctional attitudes and behaviors.

    Rabbits

    Sometimes when you are reading or listening to a proof you will find yourself following each step but with no idea why these steps are going to give a proof. This can happen with the whole structure of the proof or with the sudden appearance of a step that seems like the prover pulled a rabbit out of a hat . You feel as if you are walking blindfolded.

    Example (mysterious proof structure)

    The lecturer says he will prove that for an integer $n$, if $n^2$ is even then $n$ is even. He begins the proof: Let $n^2$ be odd” and then continues to the conclusion, “Therefore $n$ is odd.”

    Why did he begin a proof about being even with the assumption that $n$ is odd?

    The answer is that in this case he is doing a proof by contrapositive . If you don’t recognize the pattern of the proof you may be totally lost. This can happen if you don’t recognize other forms, for example contradiction and induction.

    Example (rabbit)

    You are reading a proof that $\underset{x\to2}{\mathop{\lim }}{{x}^{2}}=4$. It is an $\varepsilon \text{-}\delta$ proof, so what must be proved is:

    • For any positive real number $\varepsilon $,
    • there is a positive real number $\delta $ for which:
    • if $\left| x-2 \right|\lt\delta$ then
    • $\left| x^2-4 \right|\lt\varepsilon$.

    Proof

    Here is the proof, with what I imagine might be your agitated reaction to certain steps. Below is a proof with detailed explanations .

    1) Suppose $\varepsilon \gt0$ is given.

    2) Let $\delta =\text{min}\,(1,\,\frac{\varepsilon }{5})$ (the minimum of the two numbers 1 and $\frac{\varepsilon}{5}$ ).

    Where the *!#@! did that come from? They pulled it out of thin air! I can’t see where we are going with this proof!

    3) Suppose that $\left| x-2 \right|\lt\delta$.

    4) Then $\left| x-2 \right|\lt1$ by (2) and (3).

    5) By (4) and algebra, $\left|x+2 \right|\lt5$.

    Well, so what? We know that $\left| x+39\right|\lt42$ and lots of other things, too. Why did they do this?

    6) Also $\left| x-2 \right|\lt\frac{\varepsilon }{5}$ by (2) and(3).

    7) Then $\left| {{x}^{2}}-4\right|=\left| (x-2)(x+2) \right|\lt\frac{\varepsilon }{5}\cdot 5=\varepsilon$ by (5) and (6). End of Proof.

    Remarks

    This proof is typical of proofs in texts.

    • Steps 2) and 5) look like they were rabbits pulled out of a hat.
    • The author gives no explanation of where they came from.
    • Even so, each step of the proof follows from previous steps, so the proof is correct.
    • Whether you are surprised or not has nothing to do with whether it is correct.
    • In order to understand a proof, you do not have to know where the rabbits came from.
    • In general, the author did not think up the proof steps in the order they occur in the proof. (See this remark in the section on Forms of Proofs.) See also look ahead.

    Proof with detailed explanations

    1. Suppose $\varepsilon >0$ is given. (We are starting a proof by universal generalization.)
    2. Let $\delta$ be the minimum of the two numbers $1$ and $\frac{\varepsilon}{5}$). (Rabbit out of the hat. You can “let” any symbol mean anything you want, so this is a legitimate thing to do even if you don’t see where this is all going.{
    3. Suppose $\left|x-2\right|\lt\delta$. (We are about to prove the conditional statement “If $\left| x-2 \right|\lt\delta$ then $\left| {{x}^{2}}-4 \right|\lt\varepsilon$” and we are proceeding by the direct method.)
    4. Then $\left| x-2 \right|\lt 1$ by (2) and (3). (The fact that $\delta =\text{min}\,(1,\,\frac{\varepsilon }{5})$ means that $\delta \le 1$ and that $\delta \le \frac{\varepsilon }{5}$. Since $\left| x-2 \right|\lt \delta $, the statement $\left| x-2 \right|\lt 1$ follows by transitivity of “$\lt $”. This is another rabbit. WHY do we want $\left| x-2 \right|\lt 1$? Be Patient.)
    5. By (4) and algebra, $\left| x+2 \right|\lt 5$. ($\left| x-2 \right|\lt 1$ means that $-1\lt x-2\lt 1$. Add $4$ to each term in this equation to get $3\lt x+2\lt 5$. This is another rabbit, but it is a correct statement!)
    6. Also $\left| x-2 \right|\lt \frac{\varepsilon }{5}$ by (2) and (3). ((2) says that $\delta\le\frac{\varepsilon }{5}$ and (3) says that $\left| x-2 \right|\lt\delta$, so $\left| x-2 \right|\lt \frac{\varepsilon }{5}$ follows by transitivity.)
    7. Then $\left| {{x}^{2}}-4\right|=\left| (x-2)(x+2) \right|\lt\frac{\varepsilon }{5}\cdot 5=\varepsilon$ by (5) and (6). End of Proof. (This last statement actually shows the algebra.)

    Coming up with that proof

    The author did not think up the proof steps in the order they occur in the proof. She looked ahead at the goal of proving that \[\left| {{x}^{2}}-4\right|\lt\varepsilon\] and thought of factoring the left side. Now she must prove that \[\left| (x-2)(x+2) \right|\lt\varepsilon\]

    But if $\left|x-2\right|$ is small then $x$ has to be close to $2$, so that $x + 2$ can’t be too big. Since the only restriction on $\delta$ is that it has to be positive, let’s restrict it to being smaller than $1$. (The choice of $1$ is purely arbitrary. Any positive real number would do.)

    In that case step (5) shows that $\left|x+2\right|\lt5$.. So how small do you have to make to make $\varepsilon$? In other words, how small do you have to make $\delta $ to make $\left| 5(x-2) \right|\lt\varepsilon$ (remembering that $\left| x-2 \right|\lt\delta $). Well, clearly $\frac{\varepsilon }{5}$ will do!

    That explains her choice of $\delta$ be the minimum of the two numbers $1$ and $\frac{\varepsilon}{5}$. Notice that that choice is made very early in the proof but it was made only after experimenting with the sizes of $\left|x-2\right|$ and $\left|x+2\right|$.

    You can check that if she had chosen to restrict $\delta $ to being less than 42, then she would need $\delta =\text{min}\,(42,\,\frac{\varepsilon }{47})$.

    Acknowledgments

    Thanks to Robert Burns for corrections and suggestions

    Send to Kindle

    Forms of proofs

    Abstractmath.org is a website I have been maintaining since 2005. It is intended for people beginning the study of abstract math, often a course that requires proofs and thinking about mathematical structures. The Introduction to the website and the article Attitude explain the website in more detail.

    One of the chapters in abstractmath.org covers Proofs. As everywhere in abstractmath.org, there is no attempt at complete coverage: the emphasis is on aspects that cause difficulty for abstraction-newbies. In the case of proofs, this includes sections on how proofs are written (math language is a big emphasis all over abstractmath.org). One of those sections is Forms of Proof. This post is a fairly extensive revision of that section.

    More than half of the section on Proofs has already been revised (the ones entitled “abstractmath.org 2.0)”, and my current task is to finish that revision.

    Normally, I post the actual article here on Gyre&Gimble, but something has changed in the operation of WordPress which causes the html processor to obey linebreaks in the input, which would make the article look chaotic.

    So this time, I have to ask you to click a button to read the revised section on Forms of Proof. I apologize for the excessive effort by your finger.
     

    Creative Commons License

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

    Send to Kindle

    The many boobytraps of “if…then”

    CONTENTS

    The truth table for conditionals

    Conditionals and truth sets

    Vacuous truth

    Universal conditional assertions

    Related assertions

    Understanding conditionals

    Modus ponens

    CONDITIONAL ASSERTIONS

    This section is concerned with logical construc­tions made with the connective called the conditional operator. In mathe­matical English, applying the conditional operator to $P$ and $Q$ produces a sentence that may bewritten, “If $P$, then $Q$”, or “$P$ implies$Q$”. Sentences of this form are conditional assertions.

    Conditional assertions are at the very heart of mathematical reasoning. Mathematical proofs typically consist of chains of conditional assertions.

    Some of the narrative formats used for proving conditional assertions are discussed in Forms of Proof.

    The truth table for conditional assertions

    A conditional assertion “If $P$ then $Q$” has the precise truth table shown here.

     

    $P$ $Q$ If $P$ then $Q$
    T T T
    T F F
    F T T
    F F T

    The meaning of “If $P$ then $Q$” is determined entirely by the truth values of $P$ and $Q$ and this truth table. The meaning is not determined by the usual English meanings of the words “if” and “then”.

    The truth table is summed up by this purple pronouncement:

    The Prime Directive of conditional assertions:
    A conditional assertion is true unless
    the hypothesis is true and the conclusion is false.
    That means that to prove “If $P$ then $Q$” is  FALSE
    you must show that $P$ is TRUE(!) and $Q$ is FALSE.

    The Prime Directive is harder to believe in than leprechauns. Some who are new to abstract math get into an enormous amount of difficulty because they don’t take it seriously.

    Example

    The statement “if $n\gt 5$, then $n\gt 3$” is true for all integers
    $n$.

    • This means that “If $7\gt 5$ then $7\gt 3$” is true.
    • It also means that “If $2\gt 5$ then $2\gt 3$” is true!  If you really believe that “If $n\gt 5$, then $n\gt 3$” is true for all integers n, then you must in particular believe that  “If $2\gt 5$ then $2\gt 3$” is true.  That’s why the truth table for conditional assertions takes the form it does.
    • On the other hand, “If $n\gt 5$, then $n\gt 8$” is not true for all integers $n$.  In particular, “If $7\gt 5$, then $7\gt 8$” is false. This fits what the truth table says, too.

    For more about this, see Understanding conditionals.

    Remark

    Most of the time in mathematical writing the conditional assertions which are actually stated involve assertions containing variables, and the claim is typically that the assertion is true for all instances of the variables. Assertions involving statements without variables occur only implicitly in the process of checking instances of the assertions. That is why a statement such as, “If $2\gt 5$ then $2\gt 3$” seems awkward and unfamiliar.

    It is unfamiliar and occurs rarely. I mention it here because of the occurrence of vacuous truths, which do occur in mathematical writing.

    Conditionals and Truth Sets

    The set $\{x|P(x)\}$ is the set of exactly all $x$ for which $P(x)$ is true. It is called the truth set of $P(x)$.

    Examples
    • If $n$ is an integer variable, then the truth set of “$3\lt n\lt9$” is the set $\{4,5,6,7,8\}$.
    • The truth set of “$n\gt n+1$” is the empty set.

    Weak and strong

    “If $P(x)$ then $Q(x)$” means that $\{x|P(x)\}\subseteq
    \{x|Q(x)\}$.  We say $P(x)$ is stronger than $Q(x)$, meaning that $P$ puts more requirements on $x$ than $Q$ does.  The objects $x$ that make $P$ true necessarily make $Q$ true, so there might be objects making $Q$ true that don’t make $P$ true.

    Example

    The statement “$x\gt4$” is stronger than the statement “$x\gt\pi$”. That means that $\{x|x\gt4\}$ is a proper subset of $\{x|x\gt\pi\}$. In other words, $\{x|x\gt4\}$ is “smaller” than $\{x|x\gt\pi\}$ in the sense of subsets. For example, $3.5\in\{x|x\gt\pi\}$ but $3.5\notin\{x|x\gt4\}$. This is a kind of reversal (a Galois correspondence) that confused many of my students.

    “Smaller” means the truth set of the stronger statement omits elements that are in the truth set of the weaker statement. In the case of finite truth sets, “smaller” also means it has fewer elements, but that does not necessarily work for infinite sets, such as in the example above, because the two truth sets $\{x|x\gt4\}$ and $\{x|x\gt\pi\}$ have the same cardinality.

    Making a statement stronger
    makes its truth set “smaller”.

    Terminology and usage

    Hypothesis and conclusion

    In the assertion “If $P$, then $Q$”:

    • P is the hypothesis or antecedent
      of the assertion.  It is a constraint or condition that holds in the very narrow context of the assertion.  In other words, the assertion, “If $P$, then $Q$” does not say that $P$ is true. The idea of the direct method of proof is to assume that $P$ is true during the proof.
    • $Q$ is the conclusion or consequent. It is also incorrect to assume that $Q$ is true anywhere else except in the assertion “If $P$, then $Q$”.

    “Implication”

    Conditionals such as “If $P$ then $Q$” are also called implications , but be wary: “implication” is a technical term and does not fit the meaning of the word in conversational English.

    • In ordinary English, you might ask, “What are the implications of knowing that $x\gt4$? Answer: “Well, for one thing, $x$ is bigger that $\pi$.”
    • In the terminology of math and logic, the whole statement “If $x\gt4$ then $x\gt\pi$” is called an “implication”.

    Vacuous truth

    The last two lines of the truth table for conditional assertions mean that if the hypothesis of the assertion is false, then the assertion is automatically true.
    In the case that “If $P$ then $Q$” is true because $P$ is false, the assertion is said to be vacuously true.

    The word “vacuous” refers to the fact that in the vacuous case the conditional assertion says nothing interesting about either $P$ or $Q$. In particular, the conditional assertion may be true even if the conclusion is false (because of the last line of the truth table).

    Example

    Both these statements are vacuously true!

    • If $4$ is odd, then $3 = 3$.
    • If $4$ is odd, then $3\neq3$.
    Example

    If $A$ is any set then $\emptyset\subseteq A.$ Proof (rewrite by definition): You have to prove that if $x\in\emptyset$, then $x\in A$. But the statement “$x\in\emptyset$” is false no matter what $x$ is, so the statement “$\emptyset\subseteq A$” is vacuously true.

    Definitions involving vacuous truth

    Vacuous truth can cause surprises in connection with certain concepts which are defined using a conditional assertion.

    Example
    • Suppose $R$ is a relation on a set $S$. Then $R$ is antisymmetric if the following statement is true: If for all $x,y\in S$, $xRy$ and $yRx$, then $x=y$.
    • For example, the relation “$\leq$” on the real numbers is antisymmetric, because if $x\leq y$ and $y\leq x$, then $x=y$.
    • The relation “$\lt$” on the real numbers is also antisymmetric. It is vacuously antisymmetric, because the statement

      (AS) “if $x\gt y$ and $y\gt x$, then $x = y$”

      is vacuously true. If you say it can’t happen that $x\gt y$ and $y\gt x$, you are correct, and that means precisely that (AS) is vacuously true.

    Remark

    Although vacuous truth may be disturbing when you first see it, making either statement in the example false would result in even more peculiar situations. For example, if you decided that “If $P$ then $Q$” must be false when $P$ and $Q$ are both false, you would then have to say that this statement

    “For any integers $m$ and $n$, if $m\gt 5$ and $5\gt n$, then $m\gt n$”
     

     

    is not always true (substitute $3$ for $m$ and $4$ for $n$ and you get both $P$ and $Q$ false). This would surely be an unsatisfactory state of affairs.

    How conditional assertions are worded

    A conditional assertion may be worded in various ways.  It takes some practice to get used to understanding all of them as conditional.

    Our habit of swiping English words and phrases and changing their meaning in an unintuitive way causes many problems for new students, but I am sure that the worst problem of that kind is caused by the way conditional assertions are worded.

    In math English

    The most common ways of wording a conditional assertion with hypothesis $P$ and conclusion $Q$ are:

    • If $P$, then $Q$.
    • $P$ implies $Q$.
    • $P$ only if $Q$.
    • $P$ is sufficient for $Q$.
    • $Q$ is necessary for $P$.

    In the symbolic language

    • $P(x)\to Q(x)$
    • $P(x)\Rightarrow Q(x)$
    • $P(x)\supset Q(x)$

    Math logic is notorious for the many different symbols used by different authors with the same meaning. This is in part because it developed separately in three different academic areas: Math, Philosophy and Computing Science.

    Example

    All the statements below mean the same thing. In these statements $n$ is an integer variable.

    • If $n\lt5$, then $n\lt10$.
    • $n\lt5$ implies $n\lt10$.
    • $n\lt5$ only if $n\lt10$.
    • $n\lt5$ is sufficient for $n\lt10$.
    • $n\lt10$ is necessary for $n\lt5$.
    • $n\lt5\to n\lt10$
    • $n\lt5\Rightarrow n\lt10$
    • $n\lt5\supset n\lt10$

    Since “$P(x)\supset Q(x)$” means that $\{x|P(x)\}\subseteq
    \{x|Q(x)\}$, there is a notational clash between implication written “$\supset $” and inclusion written “$\subseteq $”. This is exacerbated by the two meanings of the inclusion symbol “$\subset$”.

    These ways of wording conditionals cause problems for students, some of them severe. They are discussed in the section Understanding conditionals.

    Usage of symbols

    The logical symbols “$\to$”, “$\Rightarrow$”,
    “$\supset$” are frequently used when writing on the blackboard, but are not common in texts, except for texts in mathematical logic.

    More about implication in logic

    If you know some logic, you may know that there is a subtle difference between the statements

    • “If $P$ then $Q$”
    • “$P$ implies $Q$”.

    Here is a concrete example:

    1. “If $x\gt2$,  then $x$ is positive.”
    2. “$x\gt2$ implies that $x$ is positive.”

    Note that the subject of sentence (1) is the (variable) number $x$, but the subject of sentence (2) is the assertion
    “$x\lt2$”.   Behind this is a distinction made in formal logic between the material conditional “if $P$ then $Q$” (which means that $P$ and $Q$ obey the truth table for “If..then”) and logical consequence ($Q$ can be proved given $P$). I will ignore the distinction here, as most mathematicians do except when they are proving things about logic.

    In some texts, $P\Rightarrow Q$ denotes the material conditional and $P\to Q$ denotes logical consequence.

    Universal conditional assertions

    A conditional assertion containing a variable that is true for any value of the correct type of that variable is a universally true conditional assertion. It is a special case of the general notion of universally true assertion.

    Examples
    1. For all $x$, if $x\lt5$, then $x\lt10$.
    2. For any integer $n$, if $n^2$ is even, then $n$ is even.
    3. For any real number $x$, if $x$ is an integer, then $x^2$ is an integer.

    These are all assertions of the form “If $P(x)$, then $Q(x)$”. In (1), the hypothesis is the assertion “$x\lt5$”; in (2), it is the assertion “$n^2$ is even”, using an adjective to describe property that $n^2$ is even; in (3), it is the assertion “$x$ is an integer”, using a noun to assert that $x$ has the property of being an integer. (See integral.)

    Expressing universally true conditionals in math English

    The sentences listed in the example above provide ways of expressing universally true conditionals in English. They use “for all” or “for any”, You may also use these forms (compare in this discussion of universal assertions in general.)

    • For all functions $f$, if $f$ is differentiable then it is continuous.
    • For (every, any, each) function $f$, if $f$ is differentiable then it is continuous.
    • If $f$ is differentiable then it is continuous, for any function $f$.
    • If $f$ is differentiable then it is continuous, where $f$ is any function.
    • If a function $f$ is differentiable, then it is continuous. (See indefinite article.)

    Sometimes mathematicians write, “If the function $f$ is differentiable, then it is continuous.” At least sometimes, they mean that every function that is differentiable is continuous. I suspect that this usage occurs in texts written by non-native-English speakers.

    Disguised conditionals

    There are other ways of expressing universal conditionals that are disguised, because they are not conditional assertions in English.

    Let $C(f)$ mean that $f$ is continuous and and $D(f)$ mean that $f$ is differentiable. The (true) assertion

    “For all $f$, if $D(f)$, then $C(f)$”
     

     

    can be said in the following ways:

    1. Every (any, each) differentiable function is continuous.
    2. All differentiable functions are continuous.
    3. Differentiable functions are continuous. Or: “…are always continuous.”
    4. A differentiable function is continuous.
    5. The differentiable functions are continuous.

    Notes

    • Watch out for (4). Beginning abstract math students sometimes don’t recognize it as universal. They may read it as “Some differentiable function is continuous.” Authors often write, “A differentiable function is necessarily continuous.”
    • I believe that (5) is obsolescent. I don’t think younger native-English-speaking Americans would use it. (Warning: This claim is not based on lexicographical research.)

    Assertions related to a conditional assertion

    Converse

    The converse of a conditional assertion “If $P$ then $Q$” is “If $Q$ then $P$”.

    Whether a conditional assertion is true
    has no bearing on whether its converse it true.

    Examples
    • The converse of “If it’s a cow, it eats grass” is “If it eats grass, it’s a cow”. The first statement is true (let’s ignore the Japanese steers that drink beer or whatever), but the second statement is definitely false. Sheep eat grass, and they are not cows..
    • The converse of “For all real numbers $x$, if $x > 3$, then $x > 2$.” is “For all real numbers $x$, if $x > 2$, then $x > 3$.” The first is true and the second one is false.
    • “For all integers $n$, if $n$ is even, then $n^2$ is even.” Both this statement and its converse are true.
    • “For all integers $n$, if $n$ is divisible by $2$, then $2n +1$ is divisible by $3$.” Both this statement and its converse are false.

    Contrapositive

    The contrapositive of a conditional assertion “If $P$ then $Q$” is “If not $Q$ then not $P$.”

    A conditional assertion and its contrapositive
    are both true or both false.

    Example

    The contrapositive of
    “If $x > 3$, then $x > 2$”
    is (after a little translation)
    “If $x\leq2$ then $x\leq3$.”
    For any number $x$, these two statements are both true or both false.

    This means that if you prove “If not $P$ then not $Q$”, then you have also proved “If $P$ then $Q$.”

    You can prove an assertion by proving its contrapositive.

    This is called the contrapositive method and is discussed in detail in this section.

    So a conditional assertion and its contrapositive have the same truth value. Two assertions that have the same truth value are said to be equivalent. Equivalence is discussed with examples in the Wikipedia article on necessary and sufficient.

    Understanding conditional assertions

    As you can see from the preceding discussions, statements of the form “If $P$ then Q” don’t mean the same thing in math that they do in ordinary English. This causes semantic contamination.

    Examples

    Time

    In ordinary English, “If $P$ then $Q$” can suggest order of occurrence. For example, “If we go outside, then the neighbors will see us” implies that the neighbors will see us after we go outside.

    Consider “If $n\gt7$, then $n\gt5$.” If $n\gt7$, that doesn’t mean $n$ suddenly gets greater than $7$ earlier than $n$ gets greater than $5$. On the other hand, “$n\gt5$ is necessary for $n\gt7$” (which remember means the same thing as “If $n\gt7$, then $n\gt5$) doesn’t mean that $n\gt5$ happens earlier than $n\gt7$. Since we are used to “if…then” having a timing implication, I suspect we get subconscious dissonance between “If $P$ then $Q$” and “$Q$ is necessary for $P$” in mathematical statements, and this dissonance makes it difficult to believe that that can mean the same thing.

    Causation

    “If $P$ then $Q$” can also suggest causation. The the sentence, “If we go outside, the neighbors will see us” has the connotation that the neighbors will see us because we went outside.

    The contrapositive is “If the neighbors won’t see us, then we don’t go outside.” This English sentence seems to me to mean that if the neighbors are not around to see us, then that causes us to stay inside. In contrast to contrapositive in math, this means something quite different from the original sentence.

    Wrong truth table

    For some instances of the use of “if…then” in English, the truth table is different.

    Consider: “If you eat your vegetables, you can have dessert.” Every child knows that this means they will get dessert if they eat their vegetables and not otherwise. So the truth table is:






    $P$ $Q$ If $P$ then $Q$
    T T T
    T F F
    F T F
    F F T

    In other words, $P$ is equivalent to $Q$. It appears to me that this truth table corresponds to English “if…then” when a rule is being asserted.

    These examples show:

    The different ways of expressing conditional assertions
    may mean different things in English.

    How can you get to the stage where you automatically understand the meaning of conditional assertions in math English?

    You need to understand the equivalence of these formulations so well that it is part of your unconscious reaction to conditionals.

    How can you gain that intuitive understanding? One way is by doing abstract math regularly for several years! (Of course, this is how you gain expertise in anything.) In other words, Practice, Practice!

    Rigor

    But it may help to remember that when doing proofs, we must take the rigorous view of mathematical objects:

    • Math objects don’t change.
    • Math objects don’t cause anything to happen.

    The integers (like all math objects) just sit there, not doing anything and not affecting anything. $10$ is not greater than $4$ “because” it is greater than $7$. There is no “because” in rigorous math. Both facts, $10\gt4$ and $10\gt7$, are eternally true.

    Eternal is how we think of them – I am not making a claim about “reality”.

    • When you look at the integers, every time you find one that is greater than $7$ it turns out to be greater than $4$. That is how to think about “If $n > 7$, then $n > 4$”.
    • You can’t find one that is greater than $7$ unless it is greater than $4$: It happens that $n > 7$ only if $n > 4$.
    • Every time you look at one less than or equal to $4$ it turns out to be less than or equal to $7$ (contrapositive).

    These three observations describe the same set of facts about a bunch of things (integers) that just sit there in their various relationships without changing, moving or doing anything. If you keep these remarks in mind, you will eventually have a natural, unforced understanding of conditionals in math.

    Remark

    None of this means you have to think of mathematical objects as dead and fossilized all the time. Feel free to think of them using all the metaphors and imagery you know, except when you are reading or formulating a proof written in mathematical English. Then you have to be rigorous!

    Modus ponens

    The truth table for conditional assertions may be summed up by saying: The conditional assertion “If $P$, then $Q$” is true unless $P$ is true and $Q$ is false.

    This fits with the major use of conditional assertions in reasoning:

    Modus Ponens

    • If you know that a conditional assertion is true
    • and
      you know that its hypothesis is true,
    • then you know its conclusion is true.

    In symbols:

    (1) When “If $P$ then $Q$” and $P$ are both true,

    (2) then $Q$ must be true as well.

    Modus Ponens is the most used method of deduction of all.

    Remark

    Modus ponens is not a method of proving conditional assertions. It is a method of using a conditional assertion in the proof of another assertion. Methods for proving conditional assertions are found in the chapter Forms of proof.

    Creative Commons License<![endif]>

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

     


    Send to Kindle

    The role of proofs in mathematical writing

    This post outlines the way that proofs are used in mathematical writing. I have been revising the chapter on Proofs in abstractmath.org, and I felt that giving an overview would keep my mind organized when I was enmeshed in writing up complicated details.

    Proofs are the sole method for ensuring that a math statement is correct.

    • Evidence that something is true gooses us into trying to prove it, but as all research mathematicians know, evidence only means that some instances are true, nothing else.
    • Intuition, metaphors, analogies may lead us to come up with conjectures. If the gods are smiling that day, they may even suggest a method of proof. And that method may even (miracle) work. Sometimes. If it does, we get a theorem, but not a Fields medal.
    • Students may not know these facts about proof. Indeed, students at the very beginning probably don’t know what a proof actually is: “Proof” in math is not at all the same as “proof” in science or “proof” in law.

    A proof has two faces: Its logical structure and its presentation.

    The logical structure of a proof consists of methods of compounding and quantifying assertions and methods of deduction.

    • The logical structure is usually expressed as a mathematical object.
    • The most familiar such math objects are the predicate calculus and type theory.
    • Mathematical logic does not have standard terminology (see Math reasoning.) Because of that, the chapter on Proofs uses English words, for example “or” instead of symbols such as $P\lor Q$ or $P+Q$ or $P||Q$.
    • For beginning students, throwing large chunks of mathematical logic at them doesn’t work. The expressions and the rules of deduction need to be introduced to them in context, and in my opinion using few or no logical symbols.
    • Students vary widely in their ability to grasp foreign languages, and symbolic logic in any of its forms is a foreign language. (So is algebra; see my rant.)
    • The rules of deduction do not come naturally to the students, and yet they need to have the rules operate automatically and subconsciously. They should know the names of the nonobvious rules, like “proof by contradiction” and “induction”, but teaching them to be fluent with logical notation is probably a waste of time, since they would have to learn the rules of deduction and a new foreign language at the same time.
    • I hasten to add, a waste of time for beginning students. There are good reasons for students aiming at certain careers to be proficient in type theory, and maybe even for predicate calculus.

    Presentation of proofs

    • Proofs are usually written in narrative form
    • A major source of difficulties is that the presentation of a proof (the way it is written in narrative form) omits the reasons that most of the proof steps follow from preceding ones.
    • Some of the omitted reasons may depend on knowledge the reader does not have. “Let $S\subset\mathbb{Q}\times\mathbb{Q}$. Let $i:S\to\mathbb{N}$ be a bijection…” Note: I am not criticizing someone who writes an argument like this, I am just saying that it is a problem for many beginning students.
    • Some reasons are given for some of the steps, presumably ones that the writer thinks might not be obvious to the reader.
    • Sometimes the narrative form gives a clue to the form of proof to be used. Example: “Prove that the length $C$ of the hypotenuse of a right triangle is less than the sum of the other two sides $A$ and $B$. Proof: Assume $C\geq A+B$…” So you immediately know that this is going to be a proof by contradiction. But you have to teach the student to recognize this.
    • Another example: in proving $P$ implies $Q$, the author will assume that $Q$ is false implies $P$ is false without further comment. The reader is suppose to recognize the proof by contrapositive.

    Translation problem

    • The Translation problem is the problem of translating a narrative proof into the logical reasoning needed to see that it really is a proof.
    • Many experienced professional mathematicians say it is so hard for them to read a narrative proof that they read the theorem and the try to recreate the proof by thinking about it and glancing at the written proof for hints from time to time. That is a sign of how difficult the translation problem really is.
    • Nevertheless, the students need to learn the unfamiliar proof techniques such as contrapositive and contradiction and the wording tricks that communicate proof methodology. Learning this is hard work. It helps for teachers to be more explicit about the techniques and tricks with students who are beginning math major courses.

    References

    Added 2014-12-19

    Creative Commons License

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

    Send to Kindle

    Those monks

    In my long post, Proofs without dry bones, I discussed the Monk Theorem (my name) in the context of my ideas about rigorous proof. Here, I want to amplify some of my remarks in the post.

    This post was stimulated by Mark Turner’s new book on conceptual blending. That book has many examples of conceptual blending, including the monk theorem, that go into deep detail about how they work. I highly recommend reading his analysis of the monk theorem. Note: I haven’t finished reading the book.

    The Monk Theorem

    A monk starts at dawn at the bottom of a mountain and goes up a path to the top, arriving there at dusk. The next morning at dawn he begins to go down the path, arriving at dusk at the place he started from on the previous day. Prove that there is a time of day at which he is at the same place on the path on both days.

    Proof: Envision both events occurring on the same day, with a monk starting at the top and another starting at the bottom at the same time and doing the same thing the original monk did on different days. They are on the same path, so they must meet each other. The time at which they meet is the time required.

    The proof

    One of the points in Proofs without dry bones was that the proof above is a genuine mathematical proof, in spite of the fact that it uses no recognizable math theorems or math objects. It does contain unspoken assumptions, but so does any math proof. Some of the assumptions:

    • A path has the property that if two people, one at each end, start walking to the opposite end, they will meet each other at a certain time.
    • A day is a period of time which contains a time “dawn” and a later time “dusk”.

    From a mathematician’s point of view, the words “people”, “walking”, “meet”, “path”, “day”, “dawn” and “dusk” could be arbitrary names having the properties stipulated by the assumptions. This is typical mathematical behavior. “Time” is assumed to behave as we commonly perceive it.

    If you think closely about the proof, you will probably come up with some refinements that are necessary to reveal other hidden assumptions (particularly about time). That is also typical mathematical behavior. (Remember Hilbert refining Euclid’s postulates about geometry after thousands of year of people not noticing the enthymemes in the postulates.)

    This proof does not require that walking on the path be modeled by a function \[t\mapsto (x,y,z):\mathbb{R}\to\mathbb{R}\times\mathbb{R}\times\mathbb{R}\] followed by an appeal to the intermediate value theorem, which I mentioned in “Proofs without dry bones”.

    You could simply proceed to make your assumptions about “path”, “meet”, “time”, and so on more explicit until you (or the mathematician you are arguing with) is satisfied. It is in that sense that I claim the proof given above is a genuine mathematical proof.

    References

    • The Origin of Ideas: Blending, Creativity, and the Human Spark, by Mark Turner. Oxford University Press, 2014.
    • The Way We Think: Conceptual Blending And The Mind’s Hidden Complexities, by Giles Fauconnier and Mark Turner. Basic Books, 2003.
    • Proofs without dry bones. Blog post.
    • The rigorous view: inertness. Article on abstractmath.org.
    • Conceptual blending in Wikipedia.
    Send to Kindle