Tag Archives: mathematical English

The intent of mathematical assertions

An assertion in mathematical writing can be a claim, a definition or a constraint.  It may be difficult to determine the intent of the author.  That is discussed briefly here.

Assertions in math texts can play many different roles.

English sentences can state facts, ask questions, give commands, and do other things.  The intent of an English sentence is often obvious, but sometimes it can be unexpectedly different from what is apparent in the sentence.  For example, the statement “Could you turn the TV down?” is apparently a question expecting a yes or no answer, but in fact it may be a request. (See the Wikipedia article on speech acts.) Such things are normally understood by people who know each other, but people for whom English is a foreign language or who have a different culture have difficulties with them.

There are some problems of this sort in math English and the symbolic language, too.  An assertion can have the intent of being a claim, a definition, or a constraint.

Most of the time the intent of an assertion in math is obvious. But there are conventions and special formats that newcomers to abstract math may not recognize, so they misunderstand the point of the assertion. This section takes a brief look at some of the problems.

Terminology

The way I am using the words “assertion”, “claim”, and “constraint” is not standard usage in math, logic or linguistics.


Claims

In most circumstances, you would expect that if a lecturer or author makes a math assertion, they are claiming that it is a true statement, and you would be right.

Examples
  a. “The $240$th digit of $\pi$ after the decimal point is $4$.”
  b. “If a function is differentiable, it must be continuous.”
  c. “$7\gt3$”

Remarks

  • You don’t have to know whether these statements are true or not to recognize them as claims. An incorrect claim is still a claim.
  • The assertion in (a) is a statement, in this case a false one.  If it claimed the googolth digit was $4$ you would never be able to tell whether it is true or not, but it would still be an assertion intended as a claim.
  • The assertion in (b) uses the standard math convention that an indefinite noun phrase (such as “a widget”) in the subject of a sentence is universally quantified (see also the article about “a” in the Glossary). So (b) claims that every differentiable function is continuous. In the same way, “An integer divisible by $4$ must be even” claims that any integer divisible by $4$ is even. That statement is a claim, and it is true.
  • (c) is a (true) claim in the symbolic language. (Note that “$3 + 4$” is not an assertion at all, much less a claim.)


Definitions

Definitions are discussed primarily in the chapter on definitions. A definition is not the same thing as a claim.

Example

The definition

“An integer is even if it is divisible by $2$”

makes the claim

“An integer is even if and only if it is divisible by $2$”

true.

(If you are surprised that the definition uses “if” but the claim uses “if and only if”, see the Glossary article on “if”.)

Unmarked definitions

Math texts sometimes define something without saying that it is a definition. Because of that, students may sometimes think a claim is a definition.

Example

Suppose that the concept of “even integer” was new to you and the book said, “A number is even if it is divisible by $4$.” Perhaps you thought that this was a definition. Later the book refers to $6$ as even and you pull your hair out wondering why. The statement is a correct claim but an incorrect definition. A good writer would write something like “Recall that a number is even if it is divisible by $2$, so that in particular it is even if it is divisible by $4$.”
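The difference between the claim and the definition in this example can be checked by hand. The following is a small worked version; the witness $k$ is a symbol introduced here only for illustration.

```latex
% The claim is true: divisibility by 4 implies divisibility by 2.
n = 4k \;\Longrightarrow\; n = 2(2k), \quad\text{so $n$ is even.}

% Read as a definition it fails: 6 is even but is not divisible by 4.
6 = 2\cdot 3, \qquad 6 = 4\cdot 1 + 2 .
```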

On the other hand, you may think a definition is only a claim.

Example

A lecturer may say “By definition, an integer is even if it is divisible by $2$”, and you write down: “An integer is even if it is divisible by $2$”. Later, you get all panicky wondering How did she know that?? (This has happened to me.)

The confusion in the preceding example can also occur if a book says, “An integer is even if it is divisible by $2$”, with the word “even” set in boldface or italics, and you don’t know about the convention that when an author puts a word or phrase in boldface or italics it may mean that they are defining it.

A good writer always labels definitions.


Constraints

Here are two assertions that contain variables.

  • “$n$ is even.”
  • “$x\gt1$”.

Such an assertion is a constraint (or a condition) if the intent is that the assertion will hold in that part of the text (the scope of the constraint). The part of the text in which it holds is usually the immediate vicinity unless the author explicitly says it will hold in a larger part of the text such as “this chapter” or “in the rest of the book”.

Examples
  • Sometimes the wording makes it clear that the phrase is a constraint. So a statement such as “Suppose $3x^2-2x-5\geq0$” is a constraint on the possible values of $x$.
  • The statement “Suppose $n$ is even” is an explicit requirement that $n$ be even and an implicit requirement that $n$ be an integer.
  • A condition for which you are told to find the solution(s) is a constraint. For example: “Solve the equation $3x^2-2x-5=0$”. This equation is a constraint on the variable $x$. “Solving” the equation means saying explicitly which numbers make the equation true.
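As a worked illustration of the last example (a sketch, not part of the original list), factoring makes explicit which numbers satisfy the constraint. The same factorization also describes the values of $x$ allowed by the inequality in the first example.

```latex
3x^2-2x-5 = (3x-5)(x+1) = 0
\quad\Longleftrightarrow\quad
x = \tfrac{5}{3} \ \text{ or } \ x = -1 ,

3x^2-2x-5 \geq 0
\quad\Longleftrightarrow\quad
x \leq -1 \ \text{ or } \ x \geq \tfrac{5}{3} .
```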

Postconditions

A constraint may appear in parentheses after an assertion, as a postcondition on that assertion.

Example

“$x^2\gt x\qquad(\text{all }x\gt1)$”

which means that if the constraint “$x\gt1$” holds, then “$x^2\gt x$” is true. In other words, for all $x\gt1$, the statement $x^2\gt x$ is true. In this statement, “$x^2\gt x$” is not a constraint, but a claim which is true when the constraint is true.
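Written with an explicit quantifier, the postcondition becomes the hypothesis of an implication. The one-line justification on the right is supplied here for illustration and is not part of the displayed statement.

```latex
\forall x\,\bigl(x>1 \;\Rightarrow\; x^2>x\bigr),
\qquad\text{since } x^2-x = x(x-1) > 0 \text{ when } x>1 .
```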

Creative Commons License

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.


Context

This is a revised draft of the abstractmath.org article on context in math texts.

Context

Written and especially spoken language depends heavily on the context – the physical surroundings, the preceding conversation, and social and cultural assumptions.  Mathematical statements are produced in such contexts, too, but here I will discuss a special thing that happens in math conversation and writing that does not seem to happen much in other sorts of discourse:

The meanings of expressions in both the symbolic language and math English change from phrase to phrase as the speaker or writer changes the constraints on them.

Example

In a math text, before the occurrence of a phrase such as “Let $n=3$”, $n$ may be known only as an integer variable.  After the phrase, it means specifically $3$.  So this phrase changes the meaning of $n$ by constraining $n$ to be $3$.  We say the context of occurrences of “$n$” before the phrase requires only that $n$ be an integer, but after the occurrence the context requires $n=3$.

Definition

In this article, the context at a particular location in mathematical discourse is the sum total of what the reader or listener can know about the symbols and names used in the discourse when they have read everything up to that location.

Remarks

  • Each clause can change the meaning of or constraints on one or more symbols or names. The conventions in effect during the discourse can also put constraints on the symbols and names.
  • Chierchia and McConnell-Ginet give a mathematical definition of context in the sense described here.
  • The references to “before” and “after” the phrase “Let $n=3$” refer to the physical location in text and to actual time in spoken math. There is more about this phenomenon in the Handbook of Mathematical Discourse, page 252, items (f) and (g).
  • Contextual changes of this sort take place using the pretense that you are reading the text in order, which many students and professionals do not do (they are “grasshoppers”).
  • I am not aware of much context-changing in everyday speech. One place it does occur is in playing games. For example, during some card games the word “trumps” changes meaning from time to time.
  • In symbolic logic, the context at a given place may be denoted by “$\Gamma$”.
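One way to picture the definition is as a table of what is known about each symbol, with each clause producing a new table. The sketch below is only a toy model under that assumption; the class `Context` and the method `constrain` are hypothetical names, not anything from the literature cited here.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """What a reader knows about each symbol at one location in the text."""
    constraints: dict[str, list[str]] = field(default_factory=dict)

    def constrain(self, symbol: str, constraint: str) -> "Context":
        """Return the context after a clause adds a constraint.

        The earlier context is left unchanged, so "before" and "after"
        can be compared.
        """
        updated = {name: list(known) for name, known in self.constraints.items()}
        updated.setdefault(symbol, []).append(constraint)
        return Context(updated)


# Before the phrase "Let n = 3", n is known only as an integer variable.
before = Context().constrain("n", "n is an integer")
# Reading "Let n = 3" adds a constraint and so changes the context.
after = before.constrain("n", "n = 3")

print(before.constraints["n"])  # ['n is an integer']
print(after.constraints["n"])   # ['n is an integer', 'n = 3']
```

Returning a new object at each step mirrors the idea that the context is attached to a location in the text rather than to the text as a whole.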

Detailed example of a math text

Here is a typical example of a theorem and its proof.  It is printed twice, the second time with comments about the changes of context.  This is the same proof that is already analyzed practically to death in the chapter on presentation of proofs.

First time through

Definition: Divides

Let $m$ and $n$ be integers with $m\ne 0$. The statement “$m$ divides $n$” means that there is an integer $q$ for which $n=qm$.

Theorem

Let $m$, $n$ and $p$ be integers, with $m$ and $n$ nonzero, and suppose $m$ divides $n$ and $n$ divides $p$.  Then $m$ divides $p$.

Proof

By definition of divides, there are integers $q$ and $q'$ for which $n=qm$ and $p=q'n$. We must prove that there is an integer $q''$ for which $p=q''m$. But $p=q'n=q'qm$, so let $q''=q'q$.  Then $p=q''m$.

Second time, with analysis

Definition: Divides

Begins a definition. The word “divides” is the word being defined. The scope of the definition is the following paragraph.

Let $m$ and $n$ be integers

$m$ and $n$ are new symbols in this discourse, constrained to be integers.

with $m\ne 0$

Another constraint on $m$.

The statement “$m$ divides $n$” means that

This phrase means that what follows is the definition of “$m$ divides $n$”.

there is an integer $q$

“There is” signals that we are beginning an existence statement and that $q$ is the bound variable within the existence statement.

for which $n=qm$

Now we know that “$m$ divides $n$” and “there is an integer $q$ for which $n=qm$” are equivalent statements.  Notes: (1) The first statement would only have implied the second statement if this had not been in the context of a definition. (2) After the conclusion of the definition, $m$, $n$ and $q$ are undefined variables.

Theorem

This announces that the next paragraph is a statement that has been proved. In fact, in real time the statement was proved long before this discourse was written, but in terms of reading the text in order, it has not yet been proved.

Let $m$, $n$ and $p$ be integers,

“Let” tells us that the following statement is the hypothesis of an implication, so we can assume that $m$, $n$ and $p$ are all integers.  This changes the status of $m$ and $n$, which were variables used in the preceding paragraph, but whose constraints disappeared at the end of the paragraph.  We are starting over with $m$ and $n$.

with $m$ and $n$ nonzero,

This clause is also part of the hypothesis. We can assume $m$ and $n$ are constrained to be nonzero.

and suppose $m$ divides $n$ and $n$ divides $p$.

This is the last clause in the hypothesis. We can assume that $m$ divides $n$ and $n$ divides $p$.

Then $m$ divides $p$.

This is a claim that $m$ divides $p$. It has a different status from the assumptions that $m$ divides $n$ and $n$ divides $p$. If we are going to follow the proof we have to treat $m$ and $n$ as if they divide $n$ and $p$ respectively. However, we can’t treat $m$ as if it divides $p$. All we know is that the author is claiming that $m$ divides $p$, given the facts in the hypothesis.

Proof

An announcement that a proof is about to begin, meaning a chain of math reasoning. The fact that it is a proof of the Theorem just stated is not explicitly stated.

By definition of divides, there are integers $q$ and $q'$ for which $n=qm$ and $p=q'n$.

The proof uses the direct method (rather than contradiction or induction or some other method) and begins by rewriting the hypothesis using the definition of “divides”. The proof does not announce the use of these techniques; it just starts doing it. So $q$ and $q'$ are new symbols that satisfy the equations $n=qm$ and $p=q'n$. The phrase “by definition of divides” justifies the introduction of $q$ and $q'$. $m$, $n$ and $p$ have already been introduced in the statement of the Theorem.

We must prove that there is an integer $q''$ for which $p=q''m$.

Introduces a new variable $q''$ which has not been given a value. We must define it so that $p=q''m$; this requirement is justified (without saying so) by the definition of “divides”.

But $p=q'n=q'qm$,

This is a claim about $p$, $q$, $q'$, $m$ and $n$.  It is justified by certain preceding sentences but this justification is not made explicit. Note that “$p=q'n=q'qm$” pivots on $q'n$, in other words makes two claims about it.

so let $q''=q'q$.

We have already introduced $q''$; now we give it the value $q''=q'q$.

Then $p=q''m$.

This is an assertion about $p$, $q''$ and $m$, justified (but not explicitly: note the hidden use of associativity) by the previous claim that $p=q'n=q'qm$.

 

The proof is now complete, although no statement asserts that it is.
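For comparison, here is the central computation of the proof written as a single chain with every justification spelled out, including the use of associativity that the text leaves implicit. The layout is only an illustration and writes the primes as $q'$ and $q''$.

```latex
\begin{align*}
p &= q'n      && \text{(because $n$ divides $p$)}\\
  &= q'(qm)   && \text{(because $m$ divides $n$, so $n=qm$)}\\
  &= (q'q)m   && \text{(associativity of multiplication)}\\
  &= q''m     && \text{(definition of $q''$ as $q'q$)}
\end{align*}
```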

Remark

If you have some skill in reading proofs, all the analysis in the comments above happens in your brain without, for the most part, your being conscious of it.

Acknowledgment

Thanks to Chris Smith for correcting errors.

References for “context”

Chierchia, G. and S. McConnell-Ginet (1990), Meaning and Grammar. The MIT Press.

de Bruijn, N. G. (1994), “The mathematical vernacular, a language for mathematics with typed sets”. In Selected Papers on Automath, Nederpelt, R. P., J. H. Geuvers, and R. C. de Vrijer, editors, volume 133 of Studies in Logic and the Foundations of Mathematics, pages 865–935. Elsevier.

Steenrod, N. E., P. R. Halmos, M. M. Schiffer, and J. A. Dieudonné (1975), How to Write Mathematics. American Mathematical Society.


The Mathematics Depository: A Proposal

Introduction

This post is about taking texts written in mathematical English and the symbolic language and encoding them in a formal language that could be tested by an automated proof verifier. This is a very difficult undertaking, but we could get closer and closer to a working system by a worldwide effort continuing over, probably, decades. The system would have to contain many components working together to create incremental improvements in the process.

This post, which is a first draft, outlines some suggestions as to how this could work. I do not discuss the encoding required, which is not my area of expertise. Yes, I understand that coding is the hard part!

Much work has been done by computing scientists in developing proof checking and proof-finding programs. Work has also been done, primarily by math education workers but also by some philosophers and computing scientists, in uncovering the many areas where ordinary math language is ambiguous and deviates from ordinary English usage. These characteristics confuse students and also make it hard to design a program that can interpret the language. I have been working in that area mostly from the math ed point of view for the last twenty years.

The Reference section lists many references to the problem of parsing mathematical English, some from the point of view of automatic translation of math language into code, but most from the point of view of helping students understand how to understand it.

The Mathematics Depository

I imagine a system for converting documents written in math language into machine-readable language and testing their claims. An organization, call it the Mathematics Depository, would be developed that is supported by many countries, organizations and individual supporters. It should consist of several components listed below, no doubt with other components as we become aware of needing them. The organization would be tasked with supporting and improving these components over time.

The main parts of the system

Each component is linked to a more detailed description that is given later in this post.

  • A Proof Verifier (PV), that inputs a proof and determines if it is correct.
  • A specification of a supported subset of Mathematical English and the symbolic language, that I will call Strict Math English (SME).
  • A Text-SME Converter, a program that would input a text written in ordinary math English that has been annotated by a knowledgeable person and convert it into SME.
  • An SME-PV Converter that will convert text written in SME into code that can be directly read by the Proof Verifier.
  • One or more Automatic Theorem Provers, that to begin with can take fairly simple conjectures written in SME and sometimes succeed in proving them.
  • An Annotation System containing an Annotation Editor that would allow a person to use SME to annotate an article written in ordinary math English so that it could be read by the Text-SME Converter.
  • A Data Base that would include the texts that have been collected in this endeavor, along with the annotations and the results of the proof checking.
  • A Data Base Miner that would watch for patterns in the annotations as new papers were submitted. The operators might also program it to watch for patterns in other aspects of the operation.

These facilities would be organized so that the systems work together, with the result that the individual components I named improve over time, both automatically and via human intervention.

Flow of Work

  1. A math text is submitted.
  2. If it is already in Strict Math English (SME), it is input to the Proof Verifier (PV).
  3. Otherwise, the math text is input into the Annotation System.
  4. The resulting SME text is input into the Text-SME Converter.
  5. The output of the Text-SME Converter is input into the Proof Verifier.
  6. The PV incorporates each definition in the text into the context of the math text. This is a specific meaning of the word “context”, including a list of the status of variables (bound, unbound, type, and so on), meanings of technical words, and other facts created in the text. “Context” is described informally in my article Context in abstractmath.org. That article gives references to the formal literature.
     (In my experience mathematicians spend only a little time reading arguments step by step as described in the Context article. They usually look at a theorem and try to figure it out themselves, “cheating” occasionally by glancing at parts of the proof.)
  7. Each mathematical assertion in the text is marked as a claim.
  8. The checking process records those claims occurring in the proof that are not proved in the text, along with any references given to other texts.
  9. If a reference to a result in another text is made, the PV looks for the result in the Database. If it does not find it, the PV incorporates the result and its location in the Database as an externally proven but untested claim.
  10. If no reference or proof for a claim is given, the PV checks the Database to see if it has already been proved.
  11. Any claim in the current text not shown as proven in the Database is submitted to the Automatic Theorem Prover (ATP). The output of the ATP is put in the Database (proved, counterexample found, or unable to determine truth).
  12. If a segment of text is presented as a proof, it is input into the PV to be verified.
  13. The PV reports the result for each claimed proof, which can be one of several possibilities:
    • A counterexample is found, so the claim that the proof was supposed to establish is false.
    • The proof contains gaps, so the claim is unsettled.
    • The proof is reported as correct.
  14. At the end of the process, all the information gathered is put into the Database:
    • The original text showing all the annotations.
    • The text in SME.
    • All claims, with their status (proven true, proven false, truth unknown, reference if one was given).
    • Every proof, with its status and the entire context at each step of the proof.
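The part of this flow that deals with unproved claims can be sketched in a few lines. This is a toy model under strong assumptions: the names `Status`, `settle_claims` and `stub_prover` are hypothetical, the Database is a plain dictionary, and the theorem prover is a stub that never succeeds.

```python
from enum import Enum


class Status(Enum):
    PROVEN = "proven true"
    REFUTED = "proven false"
    UNKNOWN = "truth unknown"


def settle_claims(claims, database, prover):
    """Assign a status to each claim: look in the database first, then ask the prover.

    `database` maps claim text to a Status and `prover` is any callable that
    returns a Status. Both are stand-ins for the Database and the Automatic
    Theorem Prover described above.
    """
    results = {}
    for claim in claims:
        if claim in database:
            results[claim] = database[claim]
        else:
            results[claim] = prover(claim)
            database[claim] = results[claim]  # record the prover's output
    return results


def stub_prover(claim):
    """A placeholder prover that cannot decide anything."""
    return Status.UNKNOWN


# Toy run: one claim is already in the database, the other goes to the stub prover.
database = {"7 > 3": Status.PROVEN}
claims = ["7 > 3", "every even n > 2 is a sum of two primes"]
results = settle_claims(claims, database, stub_prover)
for claim, status in results.items():
    print(f"{claim}: {status.value}")
```

Keeping "truth unknown" as an explicit status, rather than treating it as failure, matches the requirement that a text is stored whether or not every claim in it was verified.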

Details

The proof verifier

  • Proof checking programs have been developed over the last thirty or so years. The MD should write or adapt one or more Proof Verifiers and improve them incrementally as a result of experience in running the system. In this post I have assumed the use of just one Proof Verifier.
  • The Proof Verifier should be designed to read the output of the SME-PV converter.
  • The PV must read a whole math text in SME, identify and record each claim and check each proof (among other things). This is different from current proof verifiers, which take exactly one proof as input.
  • The PV must create the context of each proof and change it step by step as it reads each syntactic fragment of the math text.
  • Typically the context for a claimed proof is built up in the whole math text, not just in the part called “Proof”.
  • The PV should automatically query the Data Base for unproved steps in a proof in the input text to see if they have already been verified somewhere else. These results should be quoted in a proof verifier output.
  • The PV should also automatically submit steps in the proof that haven’t been verified to the Automatic Theorem Provers and wait for the step to be verified or not.
  • The Proof Verifier should output details of the result of the checking, whether or not it succeeded in verifying the whole input text. In particular, it should list the steps in proofs that it failed to verify, including steps for which the input text cited a proof in some other paper, whether or not that paper is in the MD system.
  • The Proof Verifier should be available online for anyone to submit, in SME, a mathematical text claiming to prove a theorem. Submission might require a small charge.

Strict Math English

  • One of the most important aspects of the system would be the simultaneous incremental updating of the SME and the SME-PV Converter.
  • The idea is that SME would get more and more inclusive of the phrases and clauses it allows.

Example: Universal Assertions

At the start SME might allow these statements to be recognized as the same universal assertion:

  • “$\forall x(x^2+1\gt0)$”
  • “For all [every, any] $x$, $x^2+1\gt0$.” (universality asserted using an English word.)
  • “For all [every, any] $x$, $x^2+1$ is positive.”

As time goes on, a person or the Data Base Miner might detect that many annotators also recognized these statements as saying the same thing:

  • “$x^2+1\gt0\,\,\,\,\,(\text{all } x)$” (as a displayed statement)
  • “$x^2+1$ is positive for every $x$.” Universality asserted using an adjective in a postposited phrase.
  • “$x^2+1$ is always positive.” Universality hidden in a postposited adverb that seems to be referring to time!
  • There are more examples in my article Universally True Assertions. See also Susanna Epp’s article on quantification for other problems in this area.

These other variations would then be added to Strict Math English. (This is only an example of how the system would evolve. I have no doubt that in fact all the terminology mentioned above would be included at the outset, since they are all documented in the math ed literature.)
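To make the idea of several surface forms mapping to one universal assertion concrete, here is a deliberately naive sketch using regular expressions on plain-text versions of the sentences. It is an assumption-laden toy, not a proposal for how SME would really be parsed; the function `normalize` and the tuple it returns are hypothetical.

```python
import re

# Each pattern captures the bound variable and the body of a universal assertion.
# The sentences are written in plain text rather than LaTeX to keep the toy small.
PATTERNS = [
    re.compile(r"^For (?:all|every|any) (?P<var>\w), (?P<body>.+?)\.?$"),
    re.compile(r"^(?P<body>.+?) for (?:all|every|any) (?P<var>\w)\.?$"),
    re.compile(r"^(?P<body>.+?)\s+\(all (?P<var>\w)\)$"),
]


def normalize(sentence):
    """Return ('forall', variable, body) if a known pattern matches, else None."""
    for pattern in PATTERNS:
        match = pattern.match(sentence.strip())
        if match:
            # Treat "is positive" as a synonym for "> 0".
            body = match.group("body").replace(" is positive", " > 0")
            return ("forall", match.group("var"), body)
    return None


for sentence in [
    "For all x, x^2+1 > 0.",
    "For every x, x^2+1 is positive.",
    "x^2+1 is positive for every x.",
    "x^2+1 > 0 (all x)",
    "x^2+1 is always positive.",
]:
    print(sentence, "->", normalize(sentence))
```

The last sentence is not recognized, because “always” names no variable. That is the kind of case where an annotator, or a pattern found later in the annotations, would have to supply the missing information.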

Even at the start, SME will include phrases and clauses in the English language as well as symbolic expressions. It is notorious that automatically parsing general English sentences is difficult and that the ubiquity of metaphors makes it essentially impossible to reliably construct the meaning of a sentence. That is why SME must start with a very narrow subset of math English. But even in early days, it should include some stereotyped metaphors, such as using “always” in universal assertions.

The SME-PV Converter

  • The SME-PV Converter would read documents written in SME and convert them into code readable by the proof checking program, as well as by the automatic theorem provers.
  • Such a program is essentially the subject of Ganesalingam’s book.
  • Converting SME so that the Proof Verifier can handle it involves lots of subtleties. For example, if the text says, “For any $x$, $x^2+1\gt0$”, the translation has to recognize not only that this is a universally quantified statement with $x$ as the bound variable, but that $x$ must be a real number, since complex numbers don’t do greater-than.
  • Frequent revisions of the SME-PV Converter will be necessary since its input language, the SME, will be constantly expanded.
  • It may be that the output language of the SME-PV Converter (which the Proof Verifier and Automatic Theorem Provers read) will require only infrequent revisions.

The Automatic Theorem Provers

  • The system could support several ATPs, each one adapted to read the output of the SME-PV Converter.
  • The Automatic Theorem Provers should provide output in such a way that the Proof Verifier can include in its report the positive or negative results of the Theorem Prover in detail.

The Annotation System

  • The Annotation system would facilitate construction of a data structure that connects each annotation to the specific piece of text it rewrites. The linking should be facilitated by the Annotation Editor.
  • For example, an annotation that is meant to explain that the statement (in the input text) “$x^2+1$ is always greater than $0$” is to be translated as “$\forall x(x^2+1\gt0)$” (which is presumably allowed by SME) should cause the first statement to be linked to the second statement. The first statement, the one in the input text, should not be changed. This will enable the Data Base Miner to find patterns of similar text being annotated in similar ways.
  • The annotations should clarify words, symbolic expressions and sentences in the input text to allow the Proof Verifier to input them correctly.
  • In particular, every claim that a statement is true should be marked as a proposed theorem, and similarly every proof should be marked as a proof and every definition should be marked as a definition. Such labeling is often omitted in the math literature. Annotators would have to recognize segments of the text as claims, proofs and definitions and annotate them as such.
  • The annotations would be written in the current version of Strict Math English. Since SME is frequently updated, the instructions for the annotator would also have to be frequently updated.
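A minimal sketch of such a data structure follows, assuming that an annotation is tied to its phrase by character offsets into the unchanged input text. The names `Annotation` and `annotate` and the plain-text rendering of the SME statement are hypothetical choices made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int           # offset of the first annotated character in the input text
    end: int             # offset just past the last annotated character
    sme: str             # the SME rewriting supplied by the annotator
    kind: str = "claim"  # for example "claim", "definition" or "proof"


def annotate(text, phrase, sme, kind="claim"):
    """Link an SME rewriting to the first occurrence of `phrase`; `text` is not modified."""
    start = text.index(phrase)
    return Annotation(start, start + len(phrase), sme, kind)


source = "Note that x^2+1 is always greater than 0."
note = annotate(source, "x^2+1 is always greater than 0", "forall x (x^2+1 > 0)")

print(source[note.start:note.end])  # x^2+1 is always greater than 0
print(note.sme)                     # forall x (x^2+1 > 0)
```

Storing offsets instead of editing the text keeps the original wording available, which is what the Data Base Miner needs in order to compare how similar phrases were annotated.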

Examples

  • If a paper used the word “domain” without defining it, the annotator would clarify whether it meant an open connected set, a type of ring, a type of poset, or the domain of a function. See Example 1.
  • Annotators will note instances in which the same text will use a symbol with two different meanings. See Example 2.
  • In a phrase, a single occurrence of a symbol can require an annotation that assigns more than one attribute to the symbol. See Example 3.

The Annotation Editor

  • The annotators should be provided with an Annotation Editor designed specifically for annotation.
  • The editor should include a system for linking an annotation to the exact phrase it annotates. The links should be easy for a person reading the annotated document to understand, and should also provide the information the Text-SME Converter needs.

The Annotators

  • Great demands will be made of an annotator.
  • They must understand the detailed meaning of the text they annotate. This means they must be quite familiar with the field of math the text is concerned with.
  • They must learn SME. I know for a fact that many mathematicians are not good at learning foreign languages. It will help that SME will be a subset of the full language of math.
  • All this means that annotators must be chosen carefully and paid well. This means that not very many papers will get annotated by paid annotators, so that there will have to be some committee that chooses the papers to be annotated. This will be a genuine bottleneck.
  • One thing that will help in the long run is that the SME should evolve to include more features of the general language of math, so many mathematicians will actually write their papers in SME and submit them directly to the Depository. (“Long run” may mean more than ten years).

The Text-SME Converter

  • This converter takes a math text in ordinary math English that has been annotated and converts it into SME.
  • The format for feeding it to the Automatic Theorem Prover may very well have to be different from the format to be read by a human. Both formats should be saved.

The Data Base

  • The Data Base would contain all math papers that have been run through the Proof Verifier, along with the results found by the Proof Verifier. A paper should be included whether or not every claim in the paper was verified.
  • Funding agencies (and private individuals) might choose particularly important papers and pay more money for annotation for those than for other papers.
  • Mathematicians in a particular field could be hired to annotate particular articles in their field, using a standard annotation language that would develop through time.
  • The annotated papers would be made freely available to the public.
  • It will no doubt prove useful for the Data Base to contain many other items. Possibilities:
    • A searchable list of all theorems that have been verified.
    • A glossary: a list of math words that have been defined in the papers in the Depository. This will include synonyms and words with multiple meanings.

The Data Base Miner

Watch for patterns

The DBM would watch for patterns in annotation as new annotated papers were submitted. It should probably look only at annotated papers whose proofs had been verified. The patterns might include:

  • Correlation between annotations that associate particular meanings to particular words or symbols with the branch of math the paper belongs to. See Example 1.
  • Noting that a particular format of combining symbols usually results in the same kind of annotation. See Example 4.
  • Providing data in such a way that lexicographers studying math English could make use of them. My Handbook began with my doing lexicographical research on math English, but I found it so slow that when I started abstractmath.org I resolved not to do such research any more. Nevertheless, it needs to be done and the Database should make the process much easier.

Statistical translation

Since the annotated papers will be stored in the Data Base, the Data Base Miner could use the annotations in somewhat the same way some language translators work (in part): to translate a phrase, it will find occurrences of the phrase in the source language that have been translated into the target language and use the most common translation. In this case the source language is the paper (in English) and the target language is in annotated math English readable by the Proof Verifier. Once the Database includes most of the papers ever published (twenty years from now?), statistical translation might actually become useful.
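The "use the most common translation" idea can be sketched with a frequency table. The counts below are made up for illustration, and `build_table` and `translate` are hypothetical names; a real system would need far more context than a bare phrase.

```python
from collections import Counter, defaultdict


def build_table(annotated_pairs):
    """Count how often each source phrase was annotated with each rendering."""
    table = defaultdict(Counter)
    for phrase, rendering in annotated_pairs:
        table[phrase][rendering] += 1
    return table


def translate(phrase, table):
    """Return the most frequent rendering seen so far, or None for an unseen phrase."""
    if phrase not in table:
        return None
    return table[phrase].most_common(1)[0][0]


# Made-up annotation data, for illustration only.
pairs = [
    ("f^2(x)", "f(f(x))"),
    ("f^2(x)", "f(f(x))"),
    ("f^2(x)", "(f(x))^2"),
    ("sin^2 x", "(sin x)^2"),
]
table = build_table(pairs)

print(translate("f^2(x)", table))   # f(f(x))
print(translate("sin^2 x", table))  # (sin x)^2
print(translate("log x", table))    # None
```

Keying the table on the pair (branch of math, phrase) instead of the phrase alone would be one way to let the same approach capture meanings that vary by field.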

Examples

Example 1: Meaning varies with branch of math

  • “Field” means one thing in an algebra paper and another in a mathematical physics paper.
  • “Domain” means
    • An open connected set in topology.
    • A type of ring in algebra.
    • A type of poset in theoretical computing science.
    • The domain of a function, everywhere in math, which makes it seem that this is going to be very hard to distinguish without human help!
  • “Log” usually implies base $2$ in the computing world, base $10$ in engineering (but I am not sure how prevalent this meaning is there), and base $e$ in pure math. With exceptions!

    Example 2: Meaning varies even in the same article

    • The notation “$(a,b)$” can mean an ordered pair, an open interval, or the GCD. What’s worse, there are many instances where the symbol is used without definition. Citation 139 in the Handbook provides a single sentence in which the last two meanings both occur:

      $\dots$ Richard Darst and Gerald Taylor investigated the differentiability of functions $f^p$ (which for our purposes we will restrict to $(0,1)$) defined for each $p\geq1$ by
      \[f^p(x):=
      \begin{cases}
      0 & \text{if }x\text{ is irrational}\\
      \displaystyle{\frac{1}{n^p}} & \text{if }x = \displaystyle{\frac{m}{n}}\text{ with }(m,n)=1
      \end{cases}\]

      The sad thing is that any mathematician will know immediately what each occurrence means. This may be a case where the correct annotation will never be automatically detectable.

    Example 3: One mention of a symbol may require several meanings

    In the sentence, “This infinite series converges to $\zeta(2)=\frac{\pi^2}{6}\approx 1.65$,” the annotator would provide two pieces of information about “$\frac{\pi^2}{6}$”, namely that it is both the right constituent of the equation “$\zeta(2)=\frac{\pi^2}{6}$” and the left constituent of the approximation statement “$\frac{\pi^2}{6}\approx 1.65$” — and that these two statements were the constituents of an asserted conjunction. (See my post Pivoted symbols.)

    Example 4: Function to a power

    Some expressions not in the SME will almost always be annotated in the same way. This makes them discoverable by the Data Base Miner.

    • “$\sin^{-1}x$” always means $\arcsin x$.
    • For positive $n$, “$\sin^n x$” always means $(\sin x)^n$. It never means the $n$-fold application of $\sin$ to $x$.
    • In contrast, for an arbitrary function symbol, $f^n(x)$ will often be annotated as $n$-fold application of $f$ and also often as $f(x)^n$. (And maybe those last two possibilities are correlated by branch of math.)
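
The two readings of $f^n(x)$ really do give different answers, which a small Python sketch makes concrete. The function names `power_of_value`, `n_fold_application` and `add_one` are hypothetical, chosen only for this illustration.

```python
def power_of_value(f, n, x):
    """Read f^n(x) as (f(x))^n, the reading used for sin^n x."""
    return f(x) ** n


def n_fold_application(f, n, x):
    """Read f^n(x) as f applied n times in succession to x."""
    for _ in range(n):
        x = f(x)
    return x


def add_one(x):
    return x + 1


# With f(x) = x + 1, n = 3 and x = 1 the two readings disagree.
as_power = power_of_value(add_one, 3, 1)            # (1 + 1)^3
as_iteration = n_fold_application(add_one, 3, 1)    # 1 + 1 + 1 + 1
```

An annotation has to record which of the two is meant, because nothing in the symbols themselves settles it.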

    References

    I believe that work in formal verification has tended to overlook the work on math language difficulties in math ed, so I have included some articles from that specialty.

    The following are posts from my blog Gyre&Gimble. They are in reverse chronological order.

    Creative Commons License

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.



    Problems caused for students by the two languages of math

    The two languages of math

    Mathematics is communicated using two languages: Mathematical English and the symbolic language of math (more about them in two languages).

    This post is a collection of examples of the sorts of trouble that the two languages cause beginning abstract math students. I have gathered many of them here since they are scattered throughout the literature. I would welcome suggestions for other references to problems caused by the languages of math.

    In many of the examples, I give links to the literature and leave you to fish out the details there. Almost all of the links are to documents on the internet.

    There is an extensive list of references.

    Conjectures

    Scattered through this post are conjectures. Like most of my writing about difficulties students have with math language, these conjectures are based on personal observation over 37 years of teaching mostly computer engineering and math majors. The only hard research of any sort I have done in math ed consists of the 426 citations of mathematical writing included in the Handbook of Mathematical Discourse.

    Disclaimer

    This post is an attempt to gather together the ways in which math language causes trouble for students. It is even more preliminary and rough than most of my other posts.

    • The arrangement of the topics is unsatisfactory. Indeed, the topics are so interrelated that it is probably impossible to give a satisfactory linear order to them. That is where writing online helps: lots of forward and backward references.
    • Other people and I have written extensively about some of the topics, and they have lots of links. Other topics are stubs and need to be filled out. I have probably missed important points about and references to many of them.
    • Please note that many of the most important difficulties that students have with understanding mathematical ideas are not caused by the languages of math and are not represented here.

    I expect to revise this article periodically as I find more references and examples and understand some of the topics better. Suggestions would be very welcome.

    Intricate symbolic expressions

    I have occasionally had students tell me that they have great difficulty understanding a complicated symbolic expression. They can’t just look at it and learn something about what it means.

    Example

    Consider the symbolic expression \[\displaystyle\left(\frac{x^3-10}{3 e^{-x}+1}\right)^6\]

    Now, I could read this expression aloud as if it were text, or more precisely describe it so that someone else could write it down. But if I am in math mode and see this expression I don’t “read” it, even to myself.

    I am one of those people who much of the time think in pictures or abstractions without words. (See references here.)

    In this case I would look at the expression as a structured picture. I could determine a number of things about it, and when I was explaining it I would point at the board, not try to pronounce it or part of it:

    • The denominator is always positive so the expression is defined for all reals.
    • The exponent is even so the value of the expression is always nonnegative. I would say, “This (pointing at the exponent) is an even power so the expression is never negative.”
    • It is zero in exactly one place, namely $x=\sqrt[3]{10}$.
    • Its derivative is also $0$ at $\sqrt[3]{10}$. You can see this without calculating the formula for the derivative (ugh).
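
The last point can be checked with one application of the chain rule. Write $u(x)$ for the fraction inside the parentheses (the name $u$ is introduced only for this remark). The expression is then $u(x)^6$, and
\[\frac{d}{dx}\,u(x)^6 = 6\,u(x)^5\,u'(x).\]
Since $u(\sqrt[3]{10})=0$ and $u'$ is defined everywhere (the denominator never vanishes), the factor $u(x)^5$ forces the derivative to be $0$ at $x=\sqrt[3]{10}$, whatever the value of $u'$ is there.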

    There is much more about this example in Zooming and Chunking.

    Algebra in high school

    Many high school students are stymied by algebra, never do well at it, and hate math as a result. I have known many such people over the years. A revealing remark that I have heard many times is that “algebra is totally meaningless to me”. This is sometimes accompanied by a remark that geometry is “obvious” or something similar. This may be because they think they have to “read” an algebraic expression instead of studying it as they would a graph or a diagram.

    Conjecture

    Many beginning abstract math students have difficulty understanding a symbolic expression like the one above. Could this be caused by resistance to treating the expression as a structure to be studied?

    Context-sensitive pronunciation

    A symbolic assertion (“formula” to logicians) can be embedded in a math English sentence in different ways, requiring the symbolic assertion to be pronounced in different ways. The assertion itself is not modified in any way in these different situations.

    I used the phrase “symbolic assertion” in abstractmath.org because students are confused by the logicians’ use of “formula”.
    In everyday English, “$\text{H}_2\text{O}$” is the “formula” for water, but it is a term, not an assertion.

    Example

    “For every real number $x\gt0$ there is a real number $y$ such that $x\gt y\gt0$.”

    • In the sentence above, the assertion “$x\gt0$” must be pronounced “$x$ that is greater than $0$” or something similar.
    • The standalone assertion “$x\gt0$” is pronounced “$x$ is greater than $0$.”
    • The sentence “Let $x\gt0$” must be pronounced “Let $x$ be greater than $0$”.

    The consequence is that the symbolic assertion, in this case “$x\gt0$”, does not reveal the role it plays in the math English sentence that it is embedded in.

    Many of the examples occurring later in the post are also examples of context-sensitive pronunciation.

    Conjectures

    Many students are subconsciously bothered by the way the same symbolic expression is pronounced differently in different math English sentences.

    This probably impedes some students’ progress. Teachers should point this phenomenon out with examples.

    Students should be discouraged from pronouncing mathematical expressions.

    For one thing, this could get you into trouble. Consider pronouncing “$\sqrt{3+5}+6$”. In any case, when you are reading any text you don’t pronounce the words, you just take in their meaning. Why not take in the meaning of algebraic expressions in the same way?

    Parenthetic assertions

    A parenthetic assertion is a symbolic assertion embedded in a sentence in math English in such a way that it is a subordinate clause.

    Example

    In the math English sentence

    “For every real number $x\gt0$ there is a real number $y$ such that $x\gt y\gt0$”

    mentioned above, the symbolic assertion “$x\gt0$” plays the role of a subordinate clause.

    It is not merely that the pronunciation is different compared to that of the independent statement “$x\gt0$”. The math English sentence is hard to parse. The obvious (to an experienced mathematician) meaning is that the beginning of the sentence can be read this way: “For every real number $x$, which is bigger than $0$…”.

    But a new student might try to read it as “For every real number $x$ is greater than $0$ …” by literally substituting the standalone meaning of “$x\gt0$” where it occurs in the sentence. This makes the text what linguists call a garden path sentence. The student has to stop and start over to try to make sense of it, and the symbolic expression lacks the natural language hints that help the reader understand how it should be read.

    Note that the other two symbolic expressions in the sentence are not parenthetic assertions. The phrase “real number” needs to be followed by a term, and it is, and the phrase “such that” must be followed by a clause, and it is.

    More examples

    • “Consider the circle $S^1\subseteq\mathbb{C}=\mathbb{R}^2$.” This has subordinate clauses to depth 2.
    • “The infinite series $\displaystyle\sum_{k=1}^\infty\frac{1}{k^2}$ converges to $\displaystyle\zeta(2)=\frac{\pi^2}{6}\approx1.65$”
    • “We define a null set in $I:=[a,b]$ to be a set that can be covered by a countable number of intervals with arbitrarily small total length.” This shows a parenthetical definition.
    • “Let $F:A\to B$ be a function.”
      A type declaration is a function? In any case, it would be better to write this sentence simply as “Let $F:A\to B$”.

    David Butler’s post Contrapositive grammar has other good examples.

    Math texts are in general badly written. Students need to be taught how to read badly written math as well as how to write math clearly. Those that succeed (in my observation) in being able to read math texts often solve the problem by glancing at what is written and then reconstructing what the author is supposedly saying.

    Conjectures

    Some students are baffled, or at least bothered consciously or unconsciously, by parenthetic assertions, because the clues that would exist in a purely English statement are missing.

    Nevertheless, many if not most math students read parenthetic assertions correctly the first time and never even notice how peculiar they are.

    What makes the difference between them and the students who are stymied by parenthetic assertions?

    There is another conjecture concerning parenthetic assertions below.

    Context-sensitive meaning

    “If” in definitions

    Example

    The word “if” in definitions does not mean the same thing that it means in other math statements.

    • In the definition “An integer is even if it is divisible by $2$,” “if” means “if and only if”. In particular, the definition implies that an integer is not even if it is not divisible by $2$.
    • In a theorem, for example “If a function is differentiable, then it is continuous”, the word “if” has the usual one-way meaning. In particular, in this case, a continuous function might not be differentiable.

    Context-sensitive meaning occurs in ordinary English as well. Think of a strike in baseball.

    Conjectures

    The nearly universal custom of using “if” to mean “if and only if” in definitions makes it harder for students to understand implication.

    This custom is not the major problem in understanding the role of definitions. See my article Definitions.

    Underlying sets

    Example

    In a course in group theory, a lecturer may say at one point, “Let $F:G\to H$ be a homomorphism”, and at another point, “Let $g\in G$”.

    In the first sentence, $G$ refers to the group, and in the second sentence it refers to the underlying set of the group.

    This usage is almost universal. I think the difficulty it causes is subtle. When you refer to $\mathbb{R}$, for example, you (usually) are referring to the set of real numbers together with all its canonical structure. The way students think of it, a real number comes with its many relations and connections with the other real numbers, ordering, field properties, topology, and so on.

    But in a group theory class, you may define the Klein $4$-group to be $\mathbb{Z}_2\times\mathbb{Z}_2$. Later you may say “the symmetry group of a rectangle that is not a square is the Klein $4$-group.” Almost invariably some student will balk at this.

    Referring to a group by naming its underlying set is also an example of synecdoche.

    Conjecture

    Students expect every important set in math to have a canonical structure. When they get into a course that is a bit more abstract, suddenly the same set can have different structures, and math objects with different underlying sets can have the same structure. This catastrophic shift in a way of thinking should be described explicitly with examples.

    Way back when, it got mighty upsetting when the earth started going around the sun instead of vice versa. Remind your students that these upheavals happen in the math world too.

    Overloaded notation

    Identity elements

    A particular text may refer to the identity element of any group as $e$.

    This is as far as I know not a problem for students. I think I know why: There is a generic identity element. The identity element in any group is an instantiation of that generic identity element. The generic identity element exists in the sketch for groups; every group is a functor defined on that sketch. (Or if you insist, the generic identity element exists in the first order theory for groups.) I suspect mathematicians subconsciously think of identity elements in this way.

    Matrix multiplication

    Matrix multiplication is not commutative. A student may forget this and write $A^2B^2=(AB)^2$. This also happens in group theory courses.

    This problem occurs because the symbolic language uses the same symbol for many different operations, in this case the juxtaposition notation for multiplication. This phenomenon is called overloaded notation and is discussed in abstractmath.org here.
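
A concrete pair of matrices shows how the overlearned reflex fails. The following is a minimal sketch in plain Python; the helper `matmul` and the particular matrices `A` and `B` are my own choices for illustration, not taken from any text.

```python
def matmul(X, Y):
    """Multiply two 2x2 matrices given as nested lists."""
    return [
        [sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]

AB = matmul(A, B)
BA = matmul(B, A)   # differs from AB: the operation is not commutative

lhs = matmul(matmul(A, A), matmul(B, B))   # A^2 B^2
rhs = matmul(AB, AB)                       # (AB)^2
```

Here `lhs` and `rhs` come out different, because expanding $(AB)^2=ABAB$ into $A^2B^2$ would require swapping the middle $B$ and $A$.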

    Conjecture

    Noncommutative binary operations written using juxtaposition cause students trouble because going to noncommutative operations requires abandoning some overlearned reflexes in doing algebra.

    Identity elements seem to behave the same in any binary operation, so there are no reflexes to unlearn. There are generic binary operations of various types as well. That’s why mathematicians are comfortable overloading juxtaposition. But to get to be a mathematician you have to unlearn some reflexes.

    Negation

    Sometimes you need to reword a math statement that contains symbolic expressions. This particularly causes trouble in connection with negation.

    Ordinary English

    The English language is notorious among language learners for making it complicated to negate a sentence. The negation of “I saw that movie” is “I did not see that movie”. (You have to put “d** not” (using the appropriate form of “do”) before the verb and then modify the verb appropriately.) You can’t just say “I not saw that movie” (as in Spanish) or “I saw not that movie” (as in German).

    Conjecture

    The method in English used to negate a sentence may cause problems with math students whose native language is not English. (But does it cause math problems with those students?)

    Negating symbolic expressions

    Examples

    • The negation of “$n$ is even and a prime” is “$n$ is either odd or it is not a prime”. The negation should not be written “$n$ is not even and a prime” because that sentence is ambiguous. In the heat of doing a proof students may sometimes think the negation is “$n$ is odd and $n$ is not a prime,” essentially forgetting about De Morgan. (He must roll over in his grave a lot.)
    • The negation of “$x\gt0$” is “$x\leq0$”. It is not “$x\lt0$”. This is a very common mistake.
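
Both the correct negation and the tempting wrong one can be checked against every combination of truth values. This is a small sketch; the variable names are hypothetical, with `p` standing for “$n$ is even” and `q` for “$n$ is a prime”.

```python
from itertools import product

truth_values = list(product([True, False], repeat=2))

# De Morgan: not (p and q) agrees with (not p) or (not q) in every case.
de_morgan_holds = all(
    (not (p and q)) == ((not p) or (not q)) for p, q in truth_values
)

# The wrong negation "(not p) and (not q)" disagrees with not (p and q)
# exactly in the cases collected here.
wrong_negation_cases = [
    (p, q)
    for p, q in truth_values
    if (not (p and q)) != ((not p) and (not q))
]
```

The disagreements occur when exactly one of the two statements is true, for example $n=4$, which is even but not a prime.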

    These examples are difficulties caused by not understanding the math. They are not directly caused by difficulties with the languages of math.

    Negating expressions containing parenthetic assertions

    Suppose you want to prove:

    “If $f:\mathbb{R}\to\mathbb{R}$ is differentiable, then $f$ is continuous”.

    A good way to do this is by using the contrapositive. A mechanical way of writing the contrapositive is:

    “If $f$ is not continuous, then $f:\mathbb{R}\to\mathbb{R}$ is not differentiable.”

    That is not good. The sentence needs to be massaged:

    “If $f:\mathbb{R}\to\mathbb{R}$ is not continuous, then $f$ is not differentiable.”

    Even better would be to write the original sentence as:

    “Suppose $f:\mathbb{R}\to\mathbb{R}$. Then if $f$ is differentiable, then $f$ is continuous.”

    This is discussed in detail in David Butler’s post Contrapositive grammar.

    Conjecture

    Students need to be taught to understand parenthetic assertions that occur in the symbolic language and to learn to extract a parenthetic assertion and write it as a standalone assertion ahead of the statement it occurs in.

    Scope

    The scope of a word or variable consists of the part of the text for which its current definition is in effect.

    Examples

    • “Suppose $n$ is divisible by $4$.” The scope is probably the current paragraph or perhaps the current proof. This means that the properties of $n$ are constrained in that section of the text.
    • “In this book, all rings are unitary.” This will hold for the whole book.

    There are many more examples in the abstractmath.org article Scope.

    If you are a grasshopper (you like to dive into the middle of a book or paper to find out what it says), the scope of a variable can be hard to determine. It is particularly difficult for commonly used words or symbols that have been defined differently from the usual usage. You may not suspect that this has happened since it might be defined only once early in the text. Some books on writing mathematics have urged writers to keep global definitions to a minimum. This is good advice.

    Finding the scope is considerably easier when the text is online and you can search for the definition.

    Conjecture

    Knowing the scope of a word or variable can be difficult. It is particularly hard when the word or variable has a large scope (chapter or whole book).

    Variables

    Variables are often introduced in math writing and then used in the subsequent discussion. In a complicated discussion, several variables may be referred to that have different statuses, some of them introduced several pages before. There are many particular ways discussed below that can cause trouble for students. This post is restricted to trouble in connection with the languages of math. The concept of variable is difficult in itself, not just because of the way the math languages represent them, but that is not covered here.

    Much of this part of the post is based on work of Susanna Epp, including three papers listed in the references. Her papers also include many references to other work in the math ed literature that have to do with understanding variables.

    See also Variables in abstractmath.org and Variables in Wikipedia.

    Types

    Students blunder by forgetting the type of the variable they are dealing with. The example given previously of problems with matrix multiplication is occasioned by forgetting the type of a variable.

    Conjecture

    Students sometimes have problems because they forget the data type of the variables they are dealing with. This is primarily caused by overloaded notation.

    Dependent and independent

    If you define $y=x^2+1$, then $x$ is an independent variable and $y$ is a dependent variable. But dependence and independence of variables are more general than that example suggests.
    In an epsilon-delta proof of the limit of a function (example below), $\varepsilon$ is independent and $\delta$ is dependent on $\varepsilon$, although not functionally dependent.

    Conjecture

    Distinguishing dependent and independent variables causes problems, particularly when the dependence is not clearly functional.

    I recently ran across a discussion of this on the internet but failed to record where I saw it. Help!

    Bound and free

    This causes trouble with integration, among other things. It is discussed in abstractmath.org in Variables and Substitution. I expect to add some references to the math ed literature soon.

    Instantiation

    Some of these variables may be given by existential instantiation, in which case they are dependent on variables that define them. Others may be given by universal instantiation, in which case the variable is generic; it is independent of other variables, and you can’t impose arbitrary restrictions on it.

    Existential instantiation

    A theorem that an object exists under certain conditions allows you to name it and use it by that name in further arguments.

    Example

    Suppose $m$ and $n$ are integers. Then by definition, $m$ divides $n$ if there is an integer $q$ such that $n=qm$. Then you can use “$q$” in further discussion, but $q$ depends on $m$ and $n$. You must not use it with any other meaning unless you start a new paragraph and redefine it.

    So the following (start of a) “proof” blunders by ignoring this restriction:

    Theorem: Prove that if an integer $m$ divides both integers $n$ and $p$, then $m$ divides $n+p$.

    “Proof”: “Let $n = qm$ and $p = qm$ …”
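
The blunder is that the same letter $q$ is asked to serve as the quotient for both $n$ and $p$, which would force $n=p$. One way to repair the start (the names $q_1$ and $q_2$ are just one possible choice) is to instantiate each existence statement with its own variable:
\[n=q_1m,\qquad p=q_2m,\qquad\text{so}\qquad n+p=(q_1+q_2)m.\]
Since $q_1+q_2$ is an integer, $m$ divides $n+p$.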

    Universal instantiation

    It is a theorem that for any integer $n$, there is no integer strictly between $n$ and $n+1$. So if you are given an arbitrary integer $k$, there is no integer strictly between $k$ and $k+1$. There is no integer between $42$ and $43$.

    By itself, universal instantiation does not seem to cause problems, provided you pay attention to the types of your variables. (“There is no integer between $\pi$ and $\pi+1$” is false.)

    However, when you introduce variables using both universal and existential quantification, students can get confused.

    Example

    Consider the definition of limit:

    Definition: $\lim_{x\to a} f(x)=L$ if and only if for every $\epsilon\gt0$ there is a $\delta\gt0$ for which if $0\lt|x-a|\lt\delta$ then $|f(x)-L|\lt\epsilon$.

    A proof for a particular instance of this definition is given in detail in Rabbits out of a Hat. In this proof, you may not put constraints on $\epsilon$ except the given one that it is positive. On the other hand, you have to come up with a definition of $\delta$ and prove that it works. The $\delta$ depends on what $f$, $a$ and $L$ are, but there are always infinitely many values of $\delta$ which fit the constraints, and you have to come up with only one. So in general, two people doing this proof will not get the same answer.
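
That two people can legitimately arrive at different values of $\delta$ can be seen in a numerical spot-check. The sketch below is not the proof from Rabbits out of a Hat; it assumes the hypothetical example $f(x)=2x+1$, $a=1$, $L=3$, and it only samples points, so it is a sanity check rather than a proof.

```python
def f(x):
    return 2 * x + 1


a, L = 1.0, 3.0


def delta_works(delta, eps, samples=1000):
    """Spot-check that sampled x with 0 < |x - a| < delta give |f(x) - L| < eps."""
    for k in range(1, samples):
        offset = delta * k / samples          # 0 < offset < delta
        for x in (a - offset, a + offset):
            if not abs(f(x) - L) < eps:
                return False
    return True


eps = 0.01
first_choice = eps / 2     # one person's delta
second_choice = eps / 4    # another person's delta, equally valid
too_big = eps              # this delta fails for the same eps
```

Here `delta_works(first_choice, eps)` and `delta_works(second_choice, eps)` both succeed, while `delta_works(too_big, eps)` fails. Any positive value no larger than a value that works will also work, which is why there are infinitely many acceptable choices.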

    Reference

    Susanna Epp’s paper Proof issues with existential quantification discusses the problems that students have with both existential and universal quantification with excellent examples. In particular, that paper gives examples of problems students have that are not hinted at here.

    References

    A nearly final version of The Handbook of Mathematical Discourse is available on the web with links, including all the citations. This version contains some broken links. I am unable to recompile it because TeX has evolved enough since 2003 that the source no longer compiles. The paperback version (without the citations) can be bought as a book here. (There are usually cheaper used versions on Amazon.)

    Abstractmath.org is a website for beginning students in abstract mathematics. It includes most of the material in the Handbook, but not the citations. The Introduction gives you a clue as to what it is about.

    Two languages

    My take on the two languages of math is discussed in these articles:

    The Language of Mathematics, by Mohan Ganesalingam, covers these two languages in more detail than any other book I know of. He says right away on page 18 that mathematical language consists of “textual sentences with symbolic material embedded like ‘islands’ in the text.” So for him, math language is one language.

    I have envisioned two separate languages for math in abstractmath.org and in the Handbook, because in fact you can in principle translate any mathematical text into either English or logical notation (first order logic or type theory), although the result in either case would be impossible to understand for any sizeable text.

    Topics in abstractmath.org

    Context-sensitive interpretation.

    “If” in definitions.

    Mathematical English.

    Parenthetic assertion.

    Scope

    Semantic contamination.

    Substitution.

    The symbolic language of math

    Variables.

    Zooming and Chunking.

    Topics in the Handbook of Mathematical Discourse

    These topics have a strong overlap with the topics with the same name in abstractmath.org. They are included here because the Handbook contains links to citations of the usage.

    Context-sensitive.

    “If” in definitions.

    Parenthetic assertion.

    Substitution.

    Posts in Gyre&Gimble

    Names

    Naming mathematical objects

    Rabbits out of a Hat.

    Semantics of algebra I.

    Syntactic and semantic thinkers

    Technical meanings clash with everyday meanings

    Thinking without words.

    Three kinds of mathematical thinkers

    Variations in meaning in math.

    Other references

    Contrapositive grammar, blog post by David Butler.

    Proof issues with existential quantification, by Susanna Epp.

    The role of logic in teaching proof, by Susanna Epp (2003).

    The language of quantification in mathematics instruction, by Susanna Epp (1999).

    The Language of Mathematics: A Linguistic and Philosophical Investigation, by Mohan Ganesalingam (2013). (Not available from the internet.)

    On the communication of mathematical reasoning, by Atish Bagchi and Charles Wells (1998a), PRIMUS, volume 8, pages 15–27.

    Variables in Wikipedia.


    The many boobytraps of “if…then”

    CONTENTS

    The truth table for conditionals

    Conditionals and truth sets

    Vacuous truth

    Universal conditional assertions

    Related assertions

    Understanding conditionals

    Modus ponens

    CONDITIONAL ASSERTIONS

    This section is concerned with logical constructions made with the connective called the conditional operator. In mathematical English, applying the conditional operator to $P$ and $Q$ produces a sentence that may be written, “If $P$, then $Q$”, or “$P$ implies $Q$”. Sentences of this form are conditional assertions.

    Conditional assertions are at the very heart of mathematical reasoning. Mathematical proofs typically consist of chains of conditional assertions.

    Some of the narrative formats used for proving conditional assertions are discussed in Forms of Proof.

    The truth table for conditional assertions

    A conditional assertion “If $P$ then $Q$” has the precise truth table shown here.

     

    $P$    $Q$    If $P$ then $Q$
    T      T      T
    T      F      F
    F      T      T
    F      F      T

    The meaning of “If $P$ then $Q$” is determined entirely by the truth values of $P$ and $Q$ and this truth table. The meaning is not determined by the usual English meanings of the words “if” and “then”.
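
Because the meaning is nothing more than the table, the conditional can be written as a one-line function. This is a minimal sketch in Python; the name `implies` is my own choice here, not a built-in.

```python
from itertools import product


def implies(p, q):
    """Truth value of "If p then q": false only when p is true and q is false."""
    return (not p) or q


# Rows in the same order as the truth table: TT, TF, FT, FF.
rows = [(p, q, implies(p, q)) for p, q in product([True, False], repeat=2)]

for p, q, value in rows:
    print("T" if p else "F", "T" if q else "F", "T" if value else "F")
```

Writing the conditional as `(not p) or q` is one common encoding of the table. It uses no notion of cause or consequence.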

    The truth table is summed up by this purple pronouncement:

    The Prime Directive of conditional assertions:
    A conditional assertion is true unless
    the hypothesis is true and the conclusion is false.
    That means that to prove “If $P$ then $Q$” is  FALSE
    you must show that $P$ is TRUE(!) and $Q$ is FALSE.

    The Prime Directive is harder to believe in than leprechauns. Some who are new to abstract math get into an enormous amount of difficulty because they don’t take it seriously.

    Example

    The statement “if $n\gt 5$, then $n\gt 3$” is true for all integers $n$.

    • This means that “If $7\gt 5$ then $7\gt 3$” is true.
    • It also means that “If $2\gt 5$ then $2\gt 3$” is true! If you really believe that “If $n\gt 5$, then $n\gt 3$” is true for all integers $n$, then you must in particular believe that “If $2\gt 5$ then $2\gt 3$” is true. That’s why the truth table for conditional assertions takes the form it does.
    • On the other hand, “If $n\gt 5$, then $n\gt 8$” is not true for all integers $n$.  In particular, “If $7\gt 5$, then $7\gt 8$” is false. This fits what the truth table says, too.
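
These instances can be checked mechanically over a range of integers. The sketch below assumes a finite sample from $-100$ to $100$ standing in for all integers, and `implies` is a hypothetical helper encoding the truth table.

```python
def implies(p, q):
    """Truth value of "If p then q" according to the truth table."""
    return (not p) or q


integers = range(-100, 101)   # a finite sample; the claim is about all integers

always_true = all(implies(n > 5, n > 3) for n in integers)
vacuous_instance = implies(2 > 5, 2 > 3)

# Integers in the sample for which "If n > 5 then n > 8" is false.
counterexamples = [n for n in integers if not implies(n > 5, n > 8)]
```

The counterexamples are exactly the integers that satisfy the hypothesis but not the conclusion, as the Prime Directive requires.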

    For more about this, see Understanding conditionals.

    Remark

    Most of the time in mathematical writing the conditional assertions which are actually stated involve assertions containing variables, and the claim is typically that the assertion is true for all instances of the variables. Assertions involving statements without variables occur only implicitly in the process of checking instances of the assertions. That is why a statement such as, “If $2\gt 5$ then $2\gt 3$” seems awkward and unfamiliar.

    It is unfamiliar and occurs rarely. I mention it here because of the occurrence of vacuous truths, which do occur in mathematical writing.

    Conditionals and Truth Sets

    The set $\{x|P(x)\}$ is the set of exactly all $x$ for which $P(x)$ is true. It is called the truth set of $P(x)$.

    Examples
    • If $n$ is an integer variable, then the truth set of “$3\lt n\lt9$” is the set $\{4,5,6,7,8\}$.
    • The truth set of “$n\gt n+1$” is the empty set.
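
Set-builder notation translates almost word for word into a set comprehension. In this sketch the finite `universe` stands in for the integers, and the helper name `truth_set` is hypothetical.

```python
universe = range(-50, 51)   # finite stand-in for the integers


def truth_set(predicate, domain):
    """Collect the elements of `domain` that make `predicate` true."""
    return {x for x in domain if predicate(x)}


between = truth_set(lambda n: 3 < n < 9, universe)
impossible = truth_set(lambda n: n > n + 1, universe)
```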

    Weak and strong

    “If $P(x)$ then $Q(x)$” means that $\{x|P(x)\}\subseteq\{x|Q(x)\}$. We say $P(x)$ is stronger than $Q(x)$, meaning that $P$ puts more requirements on $x$ than $Q$ does. The objects $x$ that make $P$ true necessarily make $Q$ true, but there might be objects making $Q$ true that don’t make $P$ true.

    Example

    The statement “$x\gt4$” is stronger than the statement “$x\gt\pi$”. That means that $\{x|x\gt4\}$ is a proper subset of $\{x|x\gt\pi\}$. In other words, $\{x|x\gt4\}$ is “smaller” than $\{x|x\gt\pi\}$ in the sense of subsets. For example, $3.5\in\{x|x\gt\pi\}$ but $3.5\notin\{x|x\gt4\}$. This is a kind of reversal (a Galois correspondence) that confused many of my students.

    “Smaller” means the truth set of the stronger statement omits elements that are in the truth set of the weaker statement. In the case of finite truth sets, “smaller” also means it has fewer elements, but that does not necessarily work for infinite sets, such as in the example above, because the two truth sets $\{x|x\gt4\}$ and $\{x|x\gt\pi\}$ have the same cardinality.

    Making a statement stronger
    makes its truth set “smaller”.
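
The reversal can be watched on a finite grid of numbers. In this sketch the grid of multiples of $1/4$ is an assumption made so that the truth sets are finite; the real truth sets are of course infinite.

```python
import math

# Finite stand-in for the reals: multiples of 1/4 between -10 and 10.
universe = [k / 4 for k in range(-40, 41)]

stronger = {x for x in universe if x > 4}         # truth set of "x > 4"
weaker = {x for x in universe if x > math.pi}     # truth set of "x > pi"

is_proper_subset = stronger < weaker
only_in_weaker = sorted(weaker - stronger)
```

The elements of `only_in_weaker` include $3.5$, which satisfies the weaker statement but not the stronger one.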

    Terminology and usage

    Hypothesis and conclusion

    In the assertion “If $P$, then $Q$”:

    • $P$ is the hypothesis or antecedent of the assertion. It is a constraint or condition that holds in the very narrow context of the assertion. In other words, the assertion, “If $P$, then $Q$” does not say that $P$ is true. The idea of the direct method of proof is to assume that $P$ is true during the proof.
    • $Q$ is the conclusion or consequent. It is also incorrect to assume that $Q$ is true anywhere else except in the assertion “If $P$, then $Q$”.

    “Implication”

    Conditionals such as “If $P$ then $Q$” are also called implications, but be wary: “implication” is a technical term and does not fit the meaning of the word in conversational English.

    • In ordinary English, you might ask, “What are the implications of knowing that $x\gt4$?” Answer: “Well, for one thing, $x$ is bigger than $\pi$.”
    • In the terminology of math and logic, the whole statement “If $x\gt4$ then $x\gt\pi$” is called an “implication”.

    Vacuous truth

    The last two lines of the truth table for conditional assertions mean that if the hypothesis of the assertion is false, then the assertion is automatically true.
    In the case that “If $P$ then $Q$” is true because $P$ is false, the assertion is said to be vacuously true.

    The word “vacuous” refers to the fact that in the vacuous case the conditional assertion says nothing interesting about either $P$ or $Q$. In particular, the conditional assertion may be true even if the conclusion is false (because of the last line of the truth table).

    Example

    Both these statements are vacuously true!

    • If $4$ is odd, then $3 = 3$.
    • If $4$ is odd, then $3\neq3$.
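
    If you like to experiment, the truth table can be written as a one-line function. This is a minimal sketch; the name `implies` is my own choice, not standard notation.

```python
def implies(p: bool, q: bool) -> bool:
    """Truth table of "If p then q": false only when p is true and q is false."""
    return (not p) or q

four_is_odd = (4 % 2 == 1)   # False

print(implies(four_is_odd, 3 == 3))   # hypothesis false, conclusion true
print(implies(four_is_odd, 3 != 3))   # hypothesis false, conclusion false
```

    Both lines print `True`: once the hypothesis is false, the conclusion has no effect on the result.
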
    Example

    If $A$ is any set then $\emptyset\subseteq A.$ Proof (rewrite by definition): You have to prove that if $x\in\emptyset$, then $x\in A$. But the statement “$x\in\emptyset$” is false no matter what $x$ is, so the statement “$\emptyset\subseteq A$” is vacuously true.
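
    Python happens to follow the same convention, which gives a quick way to see the vacuous case in action. The set `A` below is an arbitrary choice for the illustration.

```python
A = {1, 2, 3}        # any set would do; this one is an arbitrary choice
empty = set()

# "For every x in the empty set, x is in A": there is nothing to check, so it holds.
print(all(x in A for x in empty))
# Python's built-in subset test agrees.
print(empty <= A)
```

    Both lines print `True`, because `all` applied to no items at all returns `True`.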

    Definitions involving vacuous truth

    Vacuous truth can cause surprises in connection with certain concepts which are defined using a conditional assertion.

    Example
    • Suppose $R$ is a relation on a set $S$. Then $R$ is antisymmetric if the following statement is true: For all $x,y\in S$, if $xRy$ and $yRx$, then $x=y$.
    • For example, the relation “$\leq$” on the real numbers is antisymmetric, because if $x\leq y$ and $y\leq x$, then $x=y$.
    • The relation “$\lt$” on the real numbers is also antisymmetric. It is vacuously antisymmetric, because the statement

      (AS) “if $x\lt y$ and $y\lt x$, then $x = y$”

      is vacuously true. If you say it can’t happen that $x\lt y$ and $y\lt x$, you are correct, and that means precisely that (AS) is vacuously true.
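
    The definition can be tried out on a finite set. The sketch below assumes a small sample of integers in place of the real numbers; `is_antisymmetric` and `same_parity` are hypothetical helpers written for this illustration.

```python
from itertools import product
import operator

def is_antisymmetric(rel, elements):
    """True if rel(x, y) and rel(y, x) together force x == y on the given elements."""
    return all(x == y
               for x, y in product(elements, repeat=2)
               if rel(x, y) and rel(y, x))

sample = list(range(-3, 4))   # hypothetical finite stand-in for the reals

def same_parity(x, y):
    return (x - y) % 2 == 0

print(is_antisymmetric(operator.le, sample))   # hypothesis holds only when x == y
print(is_antisymmetric(operator.lt, sample))   # hypothesis never holds: vacuous
print(is_antisymmetric(same_parity, sample))   # fails, for example at x = 1, y = 3
```

    For “less than” the filter never lets a pair through, so `all` has nothing to check and returns `True`. That is vacuous truth again.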

    Remark

    Although vacuous truth may be disturbing when you first see it, making either statement in the example false would result in even more peculiar situations. For example, if you decided that “If $P$ then $Q$” must be false when $P$ and $Q$ are both false, you would then have to say that this statement

    “For any integers $m$ and $n$, if $m\gt 5$ and $5\gt n$, then $m\gt n$”

    is not always true (substitute $3$ for $m$ and $4$ for $n$ and you get both $P$ and $Q$ false). This would surely be an unsatisfactory state of affairs.
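
    Here is the same substitution carried out mechanically, as a sketch. The function names are mine, and the range of integers is an arbitrary choice, so the last line is a spot check and not a proof.

```python
def hypothesis(m, n):
    return m > 5 and 5 > n

def conclusion(m, n):
    return m > n

def implies(p, q):
    return (not p) or q

# At m = 3, n = 4 both the hypothesis and the conclusion are false...
print(hypothesis(3, 4), conclusion(3, 4))
# ...and the conditional is still true there under the usual truth table.
print(implies(hypothesis(3, 4), conclusion(3, 4)))

# Check every pair in a small, arbitrarily chosen range of integers.
pairs = [(m, n) for m in range(-10, 11) for n in range(-10, 11)]
print(all(implies(hypothesis(m, n), conclusion(m, n)) for m, n in pairs))
```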

    How conditional assertions are worded

    A conditional assertion may be worded in various ways.  It takes some practice to get used to understanding all of them as conditional.

    Our habit of swiping English words and phrases and changing their meaning in an unintuitive way causes many problems for new students, but I am sure that the worst problem of that kind is caused by the way conditional assertions are worded.

    In math English

    The most common ways of wording a conditional assertion with hypothesis $P$ and conclusion $Q$ are:

    • If $P$, then $Q$.
    • $P$ implies $Q$.
    • $P$ only if $Q$.
    • $P$ is sufficient for $Q$.
    • $Q$ is necessary for $P$.

    In the symbolic language

    • $P(x)\to Q(x)$
    • $P(x)\Rightarrow Q(x)$
    • $P(x)\supset Q(x)$

    Math logic is notorious for the many different symbols used by different authors with the same meaning. This is in part because it developed separately in three different academic areas: Math, Philosophy and Computing Science.

    Example

    All the statements below mean the same thing. In these statements $n$ is an integer variable.

    • If $n\lt5$, then $n\lt10$.
    • $n\lt5$ implies $n\lt10$.
    • $n\lt5$ only if $n\lt10$.
    • $n\lt5$ is sufficient for $n\lt10$.
    • $n\lt10$ is necessary for $n\lt5$.
    • $n\lt5\to n\lt10$
    • $n\lt5\Rightarrow n\lt10$
    • $n\lt5\supset n\lt10$

    Since “$P(x)\supset Q(x)$” means that $\{x|P(x)\}\subseteq
    \{x|Q(x)\}$, there is a notational clash between implication written “$\supset $” and inclusion written “$\subseteq $”. This is exacerbated by the two meanings of the inclusion symbol “$\subset$”.

    These ways of wording conditionals cause problems for students, some of them severe. They are discussed in the section Understanding conditionals.

    Usage of symbols

    The logical symbols “$\to$”, “$\Rightarrow$”,
    “$\supset$” are frequently used when writing on the blackboard, but are not common in texts, except for texts in mathematical logic.

    More about implication in logic

    If you know some logic, you may know that there is a subtle difference between the statements

    • “If $P$ then $Q$”
    • “$P$ implies $Q$”.

    Here is a concrete example:

    1. “If $x\gt2$,  then $x$ is positive.”
    2. “$x\gt2$ implies that $x$ is positive.”

    Note that the subject of sentence (1) is the (variable) number $x$, but the subject of sentence (2) is the assertion “$x\gt2$”.   Behind this is a distinction made in formal logic between the material conditional “if $P$ then $Q$” (which means that $P$ and $Q$ obey the truth table for “If..then”) and logical consequence ($Q$ can be proved given $P$). I will ignore the distinction here, as most mathematicians do except when they are proving things about logic.

    In some texts, $P\Rightarrow Q$ denotes the material conditional and $P\to Q$ denotes logical consequence.

    Universal conditional assertions

    A conditional assertion containing a variable that is true for any value of the correct type of that variable is a universally true conditional assertion. It is a special case of the general notion of universally true assertion.

    Examples
    1. For all $x$, if $x\lt5$, then $x\lt10$.
    2. For any integer $n$, if $n^2$ is even, then $n$ is even.
    3. For any real number $x$, if $x$ is an integer, then $x^2$ is an integer.

    These are all assertions of the form “If $P(x)$, then $Q(x)$”. In (1), the hypothesis is the assertion “$x\lt5$”; in (2), it is the assertion “$n^2$ is even”, using an adjective to describe the property that $n^2$ is even; in (3), it is the assertion “$x$ is an integer”, using a noun to assert that $x$ has the property of being an integer. (See integral.)
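
    A universal conditional can be spot-checked by testing the conditional at many values. The sketch below does that for the three examples, using arbitrary finite samples that I chose for the illustration. Passing such a check is evidence, not a proof, since the assertions are about infinitely many values.

```python
def implies(p, q):
    return (not p) or q

integers = range(-50, 51)                    # arbitrary finite sample of integers
reals = [k / 4 for k in range(-40, 41)]      # arbitrary sample: -10.0, -9.75, ..., 10.0

ex1 = all(implies(x < 5, x < 10) for x in reals)
ex2 = all(implies(n * n % 2 == 0, n % 2 == 0) for n in integers)
ex3 = all(implies(x.is_integer(), (x * x).is_integer()) for x in reals)
print(ex1, ex2, ex3)
```

    Most of the sample values make the hypothesis false (for instance, $9.75$ is not less than $5$), and those values pass vacuously.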

    Expressing universally true conditionals in math English

    The sentences listed in the example above provide ways of expressing universally true conditionals in English. They use “for all” or “for any”. You may also use these forms (compare this discussion of universal assertions in general).

    • For all functions $f$, if $f$ is differentiable then it is continuous.
    • For (every, any, each) function $f$, if $f$ is differentiable then it is continuous.
    • If $f$ is differentiable then it is continuous, for any function $f$.
    • If $f$ is differentiable then it is continuous, where $f$ is any function.
    • If a function $f$ is differentiable, then it is continuous. (See indefinite article.)

    Sometimes mathematicians write, “If the function $f$ is differentiable, then it is continuous.” At least sometimes, they mean that every function that is differentiable is continuous. I suspect that this usage occurs in texts written by non-native-English speakers.

    Disguised conditionals

    There are other ways of expressing universal conditionals that are disguised, because they are not conditional assertions in English.

    Let $C(f)$ mean that $f$ is continuous and $D(f)$ mean that $f$ is differentiable. The (true) assertion

    “For all $f$, if $D(f)$, then $C(f)$”

    can be said in the following ways:

    1. Every (any, each) differentiable function is continuous.
    2. All differentiable functions are continuous.
    3. Differentiable functions are continuous. Or: “…are always continuous.”
    4. A differentiable function is continuous.
    5. The differentiable functions are continuous.

    Notes

    • Watch out for (4). Beginning abstract math students sometimes don’t recognize it as universal. They may read it as “Some differentiable function is continuous.” Authors often write, “A differentiable function is necessarily continuous.”
    • I believe that (5) is obsolescent. I don’t think younger native-English-speaking Americans would use it. (Warning: This claim is not based on lexicographical research.)

    Assertions related to a conditional assertion

    Converse

    The converse of a conditional assertion “If $P$ then $Q$” is “If $Q$ then $P$”.

    Whether a conditional assertion is true
    has no bearing on whether its converse is true.

    Examples
    • The converse of “If it’s a cow, it eats grass” is “If it eats grass, it’s a cow”. The first statement is true (let’s ignore the Japanese steers that drink beer or whatever), but the second statement is definitely false. Sheep eat grass, and they are not cows.
    • The converse of “For all real numbers $x$, if $x > 3$, then $x > 2$.” is “For all real numbers $x$, if $x > 2$, then $x > 3$.” The first is true and the second one is false.
    • “For all integers $n$, if $n$ is even, then $n^2$ is even.” Both this statement and its converse are true.
    • “For all integers $n$, if $n$ is divisible by $2$, then $2n +1$ is divisible by $3$.” Both this statement and its converse are false.
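
    The three numerical examples can be explored with a short script. This is a sketch under assumptions: `holds_for_all` is a hypothetical helper, and it checks only a finite sample of integers (the second example above is about real numbers). A result of `False` means a counterexample was found in the sample; a result of `True` only means none was found there.

```python
def implies(p, q):
    return (not p) or q

def holds_for_all(hypothesis, conclusion, sample):
    """Check "for all x, if hypothesis(x) then conclusion(x)" on a finite sample."""
    return all(implies(hypothesis(x), conclusion(x)) for x in sample)

sample = range(-20, 21)   # arbitrary finite sample of integers

examples = {
    "x > 3, x > 2": (lambda x: x > 3, lambda x: x > 2),
    "n even, n^2 even": (lambda n: n % 2 == 0, lambda n: n * n % 2 == 0),
    "2 | n, 3 | 2n+1": (lambda n: n % 2 == 0, lambda n: (2 * n + 1) % 3 == 0),
}

for name, (p, q) in examples.items():
    statement = holds_for_all(p, q, sample)
    converse = holds_for_all(q, p, sample)   # swap hypothesis and conclusion
    print(name, statement, converse)
```

    The three rows come out true/false, true/true and false/false, so every combination except false/true shows up.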

    Contrapositive

    The contrapositive of a conditional assertion “If $P$ then $Q$” is “If not $Q$ then not $P$.”

    A conditional assertion and its contrapositive
    are both true or both false.

    Example

    The contrapositive of
    “If $x > 3$, then $x > 2$”
    is (after a little translation)
    “If $x\leq2$ then $x\leq3$.”
    For any number $x$, these two statements are both true or both false.

    This means that if you prove “If not $Q$ then not $P$”, then you have also proved “If $P$ then $Q$.”

    You can prove an assertion by proving its contrapositive.

    This is called the contrapositive method and is discussed in detail in this section.

    So a conditional assertion and its contrapositive have the same truth value. Two assertions that have the same truth value are said to be equivalent. Equivalence is discussed with examples in the Wikipedia article on necessary and sufficient.
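
    Since there are only four combinations of truth values for $P$ and $Q$, the claim about the contrapositive can be checked exhaustively. The sketch below does that, and makes the same comparison with the converse for contrast; `implies` is my own name for the truth table.

```python
from itertools import product

def implies(p, q):
    return (not p) or q

rows = list(product([True, False], repeat=2))   # all four (P, Q) combinations

same_as_contrapositive = all(implies(p, q) == implies(not q, not p) for p, q in rows)
same_as_converse = all(implies(p, q) == implies(q, p) for p, q in rows)
print(same_as_contrapositive, same_as_converse)
```

    This prints `True False`: the conditional agrees with its contrapositive in every row, but not with its converse.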

    Understanding conditional assertions

    As you can see from the preceding discussions, statements of the form “If $P$ then $Q$” don’t mean the same thing in math that they do in ordinary English. This causes semantic contamination.

    Examples

    Time

    In ordinary English, “If $P$ then $Q$” can suggest order of occurrence. For example, “If we go outside, then the neighbors will see us” implies that the neighbors will see us after we go outside.

    Consider “If $n\gt7$, then $n\gt5$.” If $n\gt7$, that doesn’t mean $n$ suddenly gets greater than $7$ earlier than $n$ gets greater than $5$. On the other hand, “$n\gt5$ is necessary for $n\gt7$” (which, remember, means the same thing as “If $n\gt7$, then $n\gt5$”) doesn’t mean that $n\gt5$ happens earlier than $n\gt7$. Since we are used to “if…then” having a timing implication, I suspect we get subconscious dissonance between “If $P$ then $Q$” and “$Q$ is necessary for $P$” in mathematical statements, and this dissonance makes it difficult to believe that they can mean the same thing.

    Causation

    “If $P$ then $Q$” can also suggest causation. The sentence, “If we go outside, the neighbors will see us” has the connotation that the neighbors will see us because we went outside.

    The contrapositive is “If the neighbors won’t see us, then we don’t go outside.” This English sentence seems to me to mean that if the neighbors are not around to see us, then that causes us to stay inside. In contrast to contrapositive in math, this means something quite different from the original sentence.

    Wrong truth table

    For some instances of the use of “if…then” in English, the truth table is different.

    Consider: “If you eat your vegetables, you can have dessert.” Every child knows that this means they will get dessert if they eat their vegetables and not otherwise. So the truth table is:

    $P$    $Q$    If $P$ then $Q$
    T      T      T
    T      F      F
    F      T      F
    F      F      T

    In other words, $P$ is equivalent to $Q$. It appears to me that this truth table corresponds to English “if…then” when a rule is being asserted.
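
    To see exactly where the two readings part company, the sketch below prints both truth tables side by side. The function names are made up for the illustration.

```python
from itertools import product

def math_if_then(p, q):
    # The mathematical conditional: false only when p is true and q is false.
    return (not p) or q

def rule_if_then(p, q):
    # The "vegetables and dessert" reading: true exactly when p and q agree.
    return p == q

for p, q in product([True, False], repeat=2):
    print(p, q, math_if_then(p, q), rule_if_then(p, q))

differing_rows = [(p, q) for p, q in product([True, False], repeat=2)
                  if math_if_then(p, q) != rule_if_then(p, q)]
print(differing_rows)
```

    The only row where they differ is $P$ false and $Q$ true: the child who skips the vegetables and gets dessert anyway.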

    These examples show:

    The different ways of expressing conditional assertions
    may mean different things in English.

    How can you get to the stage where you automatically understand the meaning of conditional assertions in math English?

    You need to understand the equivalence of these formulations so well that it is part of your unconscious reaction to conditionals.

    How can you gain that intuitive understanding? One way is by doing abstract math regularly for several years! (Of course, this is how you gain expertise in anything.) In other words, Practice, Practice!

    Rigor

    But it may help to remember that when doing proofs, we must take the rigorous view of mathematical objects:

    • Math objects don’t change.
    • Math objects don’t cause anything to happen.

    The integers (like all math objects) just sit there, not doing anything and not affecting anything. $10$ is not greater than $4$ “because” it is greater than $7$. There is no “because” in rigorous math. Both facts, $10\gt4$ and $10\gt7$, are eternally true.

    Eternal is how we think of them – I am not making a claim about “reality”.

    • When you look at the integers, every time you find one that is greater than $7$ it turns out to be greater than $4$. That is how to think about “If $n > 7$, then $n > 4$”.
    • You can’t find one that is greater than $7$ unless it is greater than $4$: It happens that $n > 7$ only if $n > 4$.
    • Every time you look at one less than or equal to $4$ it turns out to be less than or equal to $7$ (contrapositive).

    These three observations describe the same set of facts about a bunch of things (integers) that just sit there in their various relationships without changing, moving or doing anything. If you keep these remarks in mind, you will eventually have a natural, unforced understanding of conditionals in math.

    Remark

    None of this means you have to think of mathematical objects as dead and fossilized all the time. Feel free to think of them using all the metaphors and imagery you know, except when you are reading or formulating a proof written in mathematical English. Then you have to be rigorous!

    Modus ponens

    The truth table for conditional assertions may be summed up by saying: The conditional assertion “If $P$, then $Q$” is true unless $P$ is true and $Q$ is false.

    This fits with the major use of conditional assertions in reasoning:

    Modus Ponens

    • If you know that a conditional assertion is true
    • and you know that its hypothesis is true,
    • then you know its conclusion is true.

    In symbols:

    (1) When “If $P$ then $Q$” and $P$ are both true,

    (2) then $Q$ must be true as well.

    Modus Ponens is the most used method of deduction of all.
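
    Modus ponens can be read straight off the truth table. As a small sketch (with `implies` again my own name for the table), list the rows in which both premises are true and look at $Q$ in those rows.

```python
from itertools import product

def implies(p, q):
    return (not p) or q

# Rows of the truth table in which both premises, "If P then Q" and P, are true.
premises_true = [(p, q) for p, q in product([True, False], repeat=2)
                 if implies(p, q) and p]
print(premises_true)
# In every such row the conclusion Q is true as well.
print(all(q for _, q in premises_true))
```

    There is only one such row, the one with $P$ and $Q$ both true.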

    Remark

    Modus ponens is not a method of proving conditional assertions. It is a method of using a conditional assertion in the proof of another assertion. Methods for proving conditional assertions are found in the chapter Forms of proof.

    Creative Commons License

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

     



    Variations in meaning in math

    Words in a natural language may have different meanings in different social groups or different places.  Words and symbols in both mathematical English and the symbolic language vary according to specialty and, occasionally, country (see convention, default).  And words and symbols can change their meanings from place to place within the same mathematical discourse (see scope).

    This article mostly provides pointers to other articles in abstractmath.org that give more details about the ideas.

    Conventions

    A convention in mathematical discourse is notation or terminology used with a special meaning in certain contexts or in certain fields. Articles and books in a specialty do not always clue you in on these conventions.

    Some conventions are nearly universal in math.

    Example 1

    The use of “if” to mean “if and only if” in a definition is a convention. More about this here. This is a hidden definition by cases. “Hidden” means that no one tells the students, except for Susanna Epp and me.

    Example 2

    Constants or parameters are conventionally denoted by a, b, … , functions by f, g, … and variables by x, y,…. More.

    Example 3

    Referring to a group (or other mathematical structure) and its underlying set by the same name is a convention.  This is an example of both synecdoche and context sensitivity.

    Example 4

    The meaning of ${{\sin }^{n}}x$ in many calculus books is:

    • The inverse sine (arcsin) if $n=-1$.
    • The multiplicative power for positive $n$; in other words, ${{\sin }^{n}}x={{(\sin x)}^{n}}$ if $n\ne -1$.

    This, like Example 1, is a definition by cases. Unlike Example 1, calculus books often make it explicit. Explicit or not, this usage is an abomination.
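
    The two cases really do give different numbers. The sketch below compares them at one arbitrarily chosen value of $x$, using Python’s standard `math` module.

```python
import math

x = 0.5   # arbitrary sample value in the domain of arcsin

inverse_sine = math.asin(x)      # the calculus-book reading of sin^{-1} x
reciprocal = math.sin(x) ** -1   # what the rule sin^n x = (sin x)^n would give for n = -1
square = math.sin(x) ** 2        # the reading of sin^2 x

print(inverse_sine, reciprocal, square)
print(math.isclose(inverse_sine, reciprocal))
```

    The last line prints `False`: the inverse sine of $0.5$ is $\pi/6$, about $0.52$, while the reciprocal of $\sin 0.5$ is a little more than $2$.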

    Some conventions are pervasive among mathematicians but different conventions hold in other subjects that use mathematics.

    • Scientists and engineers may regard a truncated decimal such as 0.252 as an approximation, but a mathematician is likely to read it as an exact rational number, namely $\frac{252}{1000}$.
    • In most computer languages a distinction is made between real numbers and integers; 42 would be an integer but 42.0 would be a real number.  Older mathematicians may not know this.
    • Mathematicians use i to denote the imaginary unit. In electrical engineering it is commonly denoted j instead, a fact that many mathematicians are unaware of. I first learned about it when a student asked me if i was the same as j.
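
    Python is one language in which all three of these conventions can be seen directly. The snippet is just an illustration using the standard library; in Python the “real numbers” are floating point numbers.

```python
from fractions import Fraction

# Integers and floating-point "reals" are different types.
print(type(42).__name__, type(42.0).__name__)
print(42 == 42.0)   # equal as numbers, even though the types differ

# Reading the decimal 0.252 as an exact rational number.
print(Fraction("0.252") == Fraction(252, 1000))

# Python writes the imaginary unit as 1j, following the engineers' j.
print(1j * 1j == -1)
```

    The first line prints `int float`; the other three print `True`.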

    Conventions may vary by country.

    • In France and possibly other countries schools may use “positive” to mean “nonnegative”, so that zero is positive. 
    • In the secondary schools in some places, the value of sin x may be computed clockwise starting at (0,1)  instead of counterclockwise starting at (1,0).  I have heard this from students. 

    Conventions may vary by specialty within math.

    “Field” and “log” are examples. 

    Defaults

    An interface to a computer program may have many possible choices for the user to make. In most cases, the interface will use certain choices automatically when the user doesn’t specify them.  One says the program defaults to those choices.  
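
    Programmers meet defaults all the time in the form of default arguments. Here is a minimal sketch; the function `format_paragraph` and its parameters are hypothetical, invented for the illustration, and do not belong to any real word processor.

```python
def format_paragraph(text, alignment="justified", mode="insert"):
    """Hypothetical word-processor call; the defaults apply when nothing is specified."""
    return f"{alignment}/{mode}: {text}"

print(format_paragraph("Hello"))                             # uses both defaults
print(format_paragraph("Hello", alignment="ragged right"))   # overrides one of them
```

    The caller who says nothing about alignment still gets a definite choice, which is the point of a default.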

    Examples

    • A word processing program may default to justified paragraphs and insert mode, but allow you to pick ragged right or typeover mode.
    • There is a sense in which the word “ski” defaults to snow skiing in Minnesota and to water skiing in Georgia.
    • “CSU” defaults to Cleveland State University in northern Ohio and to Colorado State University in parts of the west.

    I have spent a lot of time in both Minnesota and Georgia and the remarks about skiing are based on my own observation. But these usages are not absolute. Some affluent Georgians may refer to snow skiing as “skiing”, for example, and this usage can result in a put-down if the hearer thinks they are talking about water skiing. One wonders where the boundary line is. Perhaps people in Kentucky are confused on the issue.

    Math language behaves in this way, too.

    Default usage in mathematical discourse

    Symbols

    • In high school, $\pi$ refers by default to the ratio of the circumference of a circle to its diameter.  Students are often quite surprised when they get to abstract math courses and discover the many other meanings of $\pi $ (see here).
    • Recently authors in the popular literature seem to think that $\phi$ (phi) defaults to the golden ratio.  In fact, a search through the research literature shows very few hits for $\phi$ meaning the golden ratio: in other words, it usually means something else. 
    • The set $\mathbb{R}$ of real numbers has many different group structures defined on it but “The group $\mathbb{R}$” essentially always means that the group operation is ordinary addition.  In other words, “$\mathbb{R}$” as a group defaults to +.  Analogous remarks apply to “the field $\mathbb{R}$”. 
    • In informal conversation among many analysts, functions are continuous by default.
    • It used to be the case that in informal conversations among topologists, “group” defaulted to Abelian group. I don’t know whether that is still true or not.

    Remark

    This meaning of “default” has made it into dictionaries only since around 1960 (see the Wikipedia entry). This usage does not carry a derogatory connotation.   In abstractmath.org I am using the word to mean a special type of convention that imposes a choice of parameter, so that it is a special case of both “convention” and “suppression of parameters”.

    Scope

    Both mathematical English and the symbolic language have a feature that is uncommon in ordinary spoken or written English:  The meaning of a phrase or a symbolic expression can be different in different parts of the discourse.   The portion of the text in which a particular meaning is in effect is called the scope of the meaning.  This is accomplished in several ways.

    Explicit statement

    Examples

    • “In this paper, all groups are abelian”.  This means that for every instance of the word “group” or of any symbol denoting a group, the group is constrained to be abelian.   The scope in this case is the whole paper.   See assumption.
    • “Suppose (or “let” or “assume”) $n$ is divisible by $4$”. Before this statement, you could not assume $n$ is divisible by $4$. Now you can, until the end of the current paragraph or section.

    Definition

    The definition of a word, phrase or symbol sets its meaning.  If the word definition is used and the scope is not given explicitly, it is probably the whole discourse.

    Example

    “Definition.  An integer is even if it is divisible by 2.”  This is marked as a definition, so it establishes the meaning of the word “even” (when applied to an integer) for the rest of the text. 

    If

    “If” is used in modus ponens (see here) and (along with “let”, usually “now let…”) in proof by cases.

    Example (modus ponens)

    Suppose you want to prove that if an integer $n$ is divisible by $4$ then it is even. To show that it is even you must show that it is divisible by $2$. So you write:

    • “Let $n$ be divisible by $4$. That means $n=4k$ for some integer $k$. But then $n=2(2k)$, so $n$ is even by definition.”

    Now if you start a new paragraph with something like “For any integer $n\ldots$” you can no longer assume $n$ is divisible by $4$.

    Example (proof by cases)

    Theorem: For all integers $n$, $n^2+n+1$ is odd.

    Definitions:

    • “$n$ is even” means that $n=2s$ for some integer $s$.
    • “$n$ is odd” means that $n=2t+1$ for some integer $t$.

    Proof:

    • Suppose $n$ is even. Then

      \[\begin{align*}
      n^2+n+1&=4s^2+2s+1\\
      &=2(2s^2+s)+1\\
      &=2(\text{something})+1
      \end{align*}\]

      so $n^2+n+1$ is odd. (See Zooming and Chunking.)

    • Now suppose $n$ is odd. Then

      \[\begin{align*}
      n^2+n+1&=(2t+1)^2+2t+1+1\\
      &=4t^2+4t+1+2t+1+1\\
      &=2(2t^2+3t)+3\\
      &=2(2t^2+3t+1)+1\\
      &=2(\text{something})+1
      \end{align*}\]

      So $n^2+n+1$ is odd.
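
    If you distrust the algebra, you can check the two computations numerically. The sketch below tests the first and last expressions of each case, and the theorem itself, over an arbitrarily chosen range of integers. This is a sanity check on the algebra, not a replacement for the proof.

```python
sample = range(-100, 101)   # arbitrary finite range of integers

# Case n = 2s: first expression against 2(2s^2 + s) + 1.
even_case = all((2 * s) ** 2 + (2 * s) + 1 == 2 * (2 * s * s + s) + 1 for s in sample)

# Case n = 2t + 1: first expression against 2(2t^2 + 3t + 1) + 1.
odd_case = all((2 * t + 1) ** 2 + (2 * t + 1) + 1 == 2 * (2 * t * t + 3 * t + 1) + 1
               for t in sample)

# The theorem itself, on the same range.
theorem = all((n * n + n + 1) % 2 == 1 for n in sample)
print(even_case, odd_case, theorem)
```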

    Remark

    The proof I just gave uses only the definition of even and odd and some high school algebra. Some simple grade-school facts about even and odd numbers are:

    • Even plus even is even.
    • Odd plus odd is even.
    • Even times even is even.
    • Odd times odd is odd.

    Put these facts together and you get a nicer proof (I think anyway): $n^2+n$ is even, so when you add $1$ to it you must get an odd number.
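
    The four facts, and the consequence that $n^2+n$ is even, can be spot-checked in the same way. Again this is a sketch over an arbitrary finite sample, with `is_even` a helper I wrote for the purpose.

```python
def is_even(n):
    return n % 2 == 0

sample = range(-20, 21)
evens = [n for n in sample if is_even(n)]
odds = [n for n in sample if not is_even(n)]

print(all(is_even(a + b) for a in evens for b in evens))      # even + even
print(all(is_even(a + b) for a in odds for b in odds))        # odd + odd
print(all(is_even(a * b) for a in evens for b in evens))      # even * even
print(all(not is_even(a * b) for a in odds for b in odds))    # odd * odd
print(all(is_even(n * n + n) for n in sample))                # n^2 + n
```

    All five lines print `True`. The last one follows from the others: if $n$ is even, $n^2+n$ is even plus even, and if $n$ is odd it is odd plus odd.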

    Bound variables

    A variable is bound if it is in the scope of an integral, quantifier, summation, or other binding operator.  More here.

    Example

    Consider this text:

    “Exercise: Show that for all real numbers $x$, it is true that $x^2\geq0$. Proof: Let $x=-2$. Then $x^2=(-2)^2=4$ which is greater than $0$. End of proof.”

    The problem with that text is that in the statement, “For all real numbers $x$, it is true that $x^2\geq0$”, $x$ is a bound variable. It is bound by the universal quantifier “for all” which means that $x$ can be any real number whatever. But in the next sentence, the meaning of $x$ is changed by the assumption that $x=-2$. So the statement that $x^2\geq0$ only applies to $-2$. As a result the proof does not cover all cases.
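
    Programming languages have bound variables too, and the analogy may help. In the Python sketch below, the `x` inside the generator expression is bound by the `for`, much as “for all” binds $x$ in the exercise. The sample of values is arbitrary, and even the check over the whole sample is not a proof about all real numbers.

```python
x = -2
one_case = x ** 2 >= 0            # says something about -2 only

# Here x is bound by the generator expression, much as "for all x" binds it.
sample = [k / 10 for k in range(-50, 51)]   # arbitrary finite sample
many_cases = all(x ** 2 >= 0 for x in sample)

print(one_case, many_cases)
print(x)   # the outer x is untouched by the bound x
```

    The last line prints `-2`: the bound `x` has its own scope and has nothing to do with the `x` that was set to $-2$.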

    Many students just beginning to learn to do proofs make this mistake. Fellow students who are a little further along may be astonished that someone would write something like that paragraph and might sneer at them. But this common mistake does not deserve a sneer; it deserves an explanation. This is an example of the ratchet effect.

    Variable meaning in natural language

    Meanings commonly vary in natural language because of conventions and defaults. But varying in scope during a conversation seems to me uncommon.

    It does occur in games. In Skat and Bridge, the meaning of “trump” changes from hand to hand. The meaning of “strike” in a baseball game changes according to context: If the current batter has already had fewer than two strikes, a foul is a strike, but not otherwise.

    I have not come up with non-game examples, and anyway games are played by rules that are suspiciously like mathematical axioms. Perhaps you can think of some non-game occasions in which meaning is determined by scoping that I have overlooked.




    Rigorous proofs

    Rich and rigorous

    When we try to understand a math statement, we visualize what the statement says using metaphors, images and kinetic feelings to feel how it is true, or to suggest that the statement is not true.

    If we are convinced that it is true, we may then want to prove it. Doing that involves pitching out all the lovely pictures and metaphors and gestures and treating the mathematical objects involved in the proof as static and inert. “Static” means the object does not change. “Inert” means that it does not affect anything else. I am saying how we think about math objects for the purpose of rigorous proof. I am not saying anything about “what math objects are”.

    In this post I give a detailed example of a proof of the rigorous sort.

    Example

    Informal statement

    First, I’ll describe this example in typical spoken mathematical English. Suppose you suspect that the following statement is true:

    Claim: Let $f(x)$ be a differentiable function with $f'(a)=0$.
    Going from left to right, suppose the graph of $f(x)$ goes UP before $x$ reaches $a$ and then DOWN for $x$ to the right of $a$.
    Then $a$ has to be a local maximum of the function.

    This claim is written in informal math English. Mathematicians talk like that a lot. In this example they will probably wave their hands around in swoops.

    The language used is an attempt to get a feeling for the graph going up to $(a,f(a))$ and then falling away from it. It uses two different metaphors for $x\lt a$ and $x\gt a$. I suspect that most of us would want to clean that up a bit even in informal writing.

    A more formal statement

    Theorem: Let $f$ be a real valued differentiable function defined on an open interval $R$. Let $a$ be a number in $R$ for which $f'(a)=0$. Suppose that for all $x\in R$, $f$ increases for $x\lt a$ and decreases for $x\gt a$. Then $f(a)$ is a maximum of $f$ in $R$.

    Proof

    1. By definition of derivative, \[\lim_{x\to a}\frac{f(x)-f(a)}{x-a}=0.\]
    2. By definition of limit, for any positive $\epsilon$ there is a positive $\delta$ for which if $0\lt|x-a|\lt\delta$ then \[\left|\frac{f(x)-f(a)}{x-a}\right|\lt\epsilon.\]
    3. By requiring that $\delta\lt 1$, it follows from (2) that for any positive $\epsilon$, there is a positive $\delta$ for which if $0\lt|x-a|\lt\delta$, then $|f(x)-f(a)|\lt\epsilon$.
    4. “$f$ increases for $x\lt a$” means that if $x$ and $y$ are numbers in $R$ and $x\lt y\lt a$, then $f(x)\lt f(y)$.
    5. “$f$ decreases for $x\gt a$” means that if $x$ and $y$ are numbers in $R$ and $a\lt x\lt y$, then $f(x)\gt f(y)$.
    6. “$f(a)$ is a maximum of $f$ in $R$” means that for $x\in R$, if $x\neq a$, then $f(x)\lt f(a)$.
    7. Suppose that $x\in R$ and $x\lt a$. (The case that $x\gt a$ has a symmetric proof.)
    8. Given $\epsilon\gt0$ with $\delta$ as given by (3), choose $y\in R$ such that $x\lt y\lt a$ and $|f(y)-f(a)|\lt\epsilon$.
    9. By (4), $f(x)\lt f(y)$. So by (8), \[\begin{align*}
      f(x)-f(a)&=
      f(x)-f(y)+f(y)-f(a)\\ &\lt f(y)-f(a)\\ &\leq|f(y)-f(a)|\lt\epsilon\end{align*}\]
      so that $f(x)\lt f(a)+\epsilon$. By inserting “$-f(y)+f(y)$” into the second formula, I am “adding zero cleverly”, an example of pulling a rabbit out of a hat. Students hate that. But you have to live with it; as long as the statements following are correct, it makes a valid proof. Rabbit-out-of-a-hat doesn’t make a proof wrong, but it does make you wonder how the author thought of it. Live with it.
    10. Since (9) is true for all positive $\epsilon$, it follows that $f(x)\leq f(a)$.
    11. By the same argument as that leading up to (10), applied to the midpoint $\frac{x+a}{2}$ in place of $x$, $f(\frac{x+a}{2})\leq f(a)$.
    12. Since $x\lt\frac{x+a}{2}\lt a$, (4) gives $f(x)\lt f(\frac{x+a}{2})$, and it follows that $f(x)\lt f(a)$ as required.
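
    The last two steps can be watched in action on a concrete function. The sketch below assumes the example $f(x)=-(x-1)^2$ on $R=(0,2)$ with $a=1$, which is my own choice and not part of the theorem. It checks the chain of inequalities at sample points to the left of $a$, using the midpoint of $x$ and $a$ as a point strictly between them.

```python
def f(x):
    # Hypothetical example function: increases for x < 1, decreases for x > 1, f'(1) = 0.
    return -(x - 1) ** 2

a = 1.0
left_points = [k / 100 for k in range(1, 100)]   # sample points x in R = (0, 2) with x < a

def midpoint(x):
    return (x + a) / 2   # a point strictly between x and a

# f(x) < f(midpoint) because f increases to the left of a, and f(midpoint) <= f(a).
chain_holds = all(f(x) < f(midpoint(x)) <= f(a) for x in left_points)
print(chain_holds)
```

    This prints `True`. The strict inequality $f(x)\lt f(a)$ comes from the first link of the chain; the second link alone gives only “$\leq$”.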

    About the proof

    This proof is intended to be a typical “rigorous” proof. I suspect it tends to be more rigorous than most mathematicians would find necessary.

    Extensionality

    The point about “rigor”, about insisting that the objects be static and inert, is that this causes symbols and expressions to retain the same meaning throughout the text. This is one aspect of extensionality.

    Of course, some of the symbols denote variables, or variable objects. This does not mean they are “varying”. I am taking this point of view: A variable refers to a math object but you don’t know what it is. Constraints such as $x\lt a$ rule out some possible values but don’t generally tell you exactly what $x$ is. There is more about this in Variable Objects.

    The idea in (8), for example, is that $y$ denotes a real number. You don’t know which number it is, but you do know some facts about it: $x\lt y\lt a$, $|f(y)-f(a)|\lt\epsilon$ and so on. Similarly you don’t know what function $f$ is, but you do know some facts about it: It is differentiable, for example, and $f'(a)=0$.

    My statement that the variables aren’t “varying” means specifically that each unbound occurrence of the variable refers to the same value as any other occurrence, unless some intervening remark changes its meaning. For example, the references to $x$ in (8) through (12) refer to the same value it has in (7), and (12), in particular, constitutes a statement that the claim about $x$ is correct.

    Checkability

    The elimination of metaphors that lets the proof achieve rigor is part of a plan in the back of the mind of at least some mathematicians who write proofs. The idea is that the proof be totally checkable:

    • Every statement in the proof has a semantics, a meaning, that is invariant (given the remark about variables above).
    • Each statement is justified by some of the previous statements. This justification is given by two systems that the reader is supposed to understand.
    • One system is the rules of symbol manipulation that are applied to the symbolic expressions, ordinary algebra, and higher-level manipulations used in particular branches of math.
    • The other system consists of the rules of logical reasoning that justify the claims that each statement follows logically from preceding ones.
    • These two systems are really branches of one system, the entire system of math computation and reasoning. It can be obscure which system is being used in a particular step.

    Suppression of reasons

    The logical and symbolic-manipulation reasons justifying the deductions may not be made completely explicit. In fact, for many steps they may not be mentioned at all, and for others, one or two phrases may be used to give a hint. This is standard practice in writing “rigorous” proofs. That is a descriptive statement, made without criticism. Giving all the reasons is essentially impossible without a computer.

    I am aware that some work has been done to write proof checkers that can read a theorem like the one we are considering, stated in natural language, and correctly implement the semantics I have described in this list. I don’t know of any references to such work and would appreciate information about it.

    Suppression of reasons makes it difficult to mechanically check a proof written in this standard “rigorous” writing style. Basically, you must be at least at the graduate student level to be able to make sense of what is said, and even experienced math research people find it difficult to read a paper in a very different field. Writing the proof so that it can be checked by a proof checker requires understanding of the same sort, and it typically makes the proof much longer.

    One hopeful new approach is to write the proofs using homotopy type theory. The pioneers in that field report that the proofs don’t expand nearly as much as is required by first order logic.

    Examples of suppression

    Here are many examples of suppression in the $\epsilon$-$\delta$ proof above. This is intended to raise your consciousness concerning how nearly opaque writing in math research is to anyone but the cognoscenti.

    • The first sentence of the theorem names $R$ and $f$ and puts constraints on them that can be used to justify statements in the proof. The naming of $R$ and $f$ requires that every occurrence of $R$ in the proof refers to the same mathematical object, and similarly for $f$.

    Remark: The savvy reader “knows” the facts stated in (a), possibly entirely subconsciously. For many of us there is no conscious thought of constraints and permanence of naming. My goal is to convince those who teach beginning abstract math courses to become conscious of these phenomena. This remark applies to all the following items as well.

    • The second sentence gives $a$ a specific meaning that will be maintained throughout the proof. It also puts constraints on $a$ and an additional constraint on $f$.
    • The third sentence gives a constraint on $R$, $f$ and $a$. It does not give a constraint on $x$, which is a bound variable. Nor does it name $x$ as a specific number with the same meaning in the rest of the proof. (That happens later).
    • The fact that the first three sentences impose constraints on various objects is signaled by the fact that the sentences are introduced by “let” and “suppose”. The savvy reader knows this.
    • The fourth sentence announces that “$f(a)$ is a maximum of $f$ in $R$” is a consequence of the constraints imposed by the preceding three sentences. (In other words, it follows from the context.) This is signaled by the word “then”.
    • The fact that the paragraph is labeled “Theorem” informs us that the fourth sentence is therefore a statement of what is to be proved, and that every constraint imposed by the first three sentences of the Theorem may be used in the proof.
    • In the proof, statements (1), (4), (5) and (6) rewrite the statements in the theorem according to the definitions of the words involved, namely “derivative”, “increases”, “decreases” and “maximum”. Rewriting statements according to the definitions of the words involved is a fundamental method for starting a proof.
    • (2) follows from (1) by rewriting using the definition of “limit”. Note that pattern-matching against the definition of limit requires understanding that there is a zero inside the absolute value signs that is not written down. Could a computer proof-checker handle that?
    • (3) follows from (2). The reader or proof-checker must:
      • Know that it is acceptable to put an upper bound on $\delta$ in the definition of limit.
      • Notice that you can move $|x-a|$ out of the denominator because $x\neq a$ by (2).
    • The conclusion in (6), that we must show that $f(x)\lt f(a)$, is now the statement we must prove.

    Remark: In the following items, I mention the context of the proof. I am using the word informally here. It is used in some forms of formal logic with a related but more precise meaning. The context consists of the variables you must hold in your head as you read each part of the proof, along with their current constraints. “Current” means the “now” that you are in when considering the step of the proof you are reading right now. I give some references at the end of the post.

    • At the point between (6) and (7), our context consists of $a$, $R$ and $f$ all subject to some constraints. $x$ is not yet in the context of our proof because its previous occurrences in the theorem and in (1) through (6) have been bound, mostly by an unexpressed universal quantifier. Now we are to think of $x$ as a specific number bound by some constraints.
    • The statement in (7) that the case $x\gt a$ has a symmetric proof is a much higher-level claim than the other steps in this proof, even though in fact it is not very high level compared to statements such as “An application of Serre’s spectral sequence shows$\ldots$”. Most mathematicians with even a little experience will read this statement and accept it in the confidence that they will know how to swap “$\lt$” and “$\gt$” in the proof in the correct way (which is a bit picky) to provide a dual proof. Some students might write out the dual proof to make sure they understand it (more likely because writing it out was a class assignment). I await the day that an automated proof checker can handle a statement like this.
    • (8) introduces three new math objects $\epsilon$, $\delta$ and $y$ subject to several constraints. The symbols occur earlier but they are all bound. $\epsilon$ will be fixed in our context from now until (10). The others don’t appear later.
    • (9) consists of several steps of algebraic computation. A cognoscent (I am tired of writing “savvy”) reader first looks at the computation as a whole and notices that it deduces that $|f(x)-f(a)|\lt\epsilon$, which is almost what is to be proved. This helps the reader understand the reason for the calculation. No mention whatever is made in this step of all this stuff that should go through your mind (or the proof-checker’s “mind”).
    • The computations in (9) are basic algebra not explained step by step, except that the remark that $f(x)\lt f(y)$ explains how you get $f(x)-f(y)+f(y)-f(a) \lt f(y)-f(a)$.
    • (10) banishes $\epsilon$ from the context by universally quantifying over it. That $f(x)\leq f(a)$ follows by the garbage-dump-in-Star-Wars trick that often baffles first year analysis students: Since for all positive $\epsilon$, $f(x)\lt f(a)+\epsilon$, then $f(x)\leq f(a)$. (See also Terry Tao’s article in Tricks Wiki.)
    • (11) “By the same argument as leading up to (10)” puts some demands on the reader, who has to discover that you have to go back to (7) and do the following steps with a new context using a value of $x$ that is halfway closer to $a$ than the “old” $x$ was. This means in particular that the choice of $\frac{x-2}{2}$ is unnecessarily specific. But it works.
    • (12) suppresses the reference to (11).
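
The step in (10) from “$f(x)\lt f(a)+\epsilon$ for every positive $\epsilon$” to “$f(x)\leq f(a)$” can be spelled out as a short proof by contradiction. This is a sketch of one standard argument, not necessarily the one the writer of the proof had in mind; the name $\epsilon_0$ and its particular value are my own choices.

```latex
\begin{align*}
&\text{Suppose } f(x) \gt f(a), \text{ and put } \epsilon_0 = f(x) - f(a) \gt 0.\\
&\text{Applying the hypothesis to } \epsilon_0 \text{ gives }
  f(x) \lt f(a) + \epsilon_0 = f(a) + \bigl(f(x) - f(a)\bigr) = f(x),\\
&\text{which is impossible. Hence } f(x) \leq f(a).
\end{align*}
```

The suppressed reason is the choice of a specific $\epsilon$, which the reader is expected to supply.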
    References

    I have written extensively on these topics. Here are some links.

    • Rich-rigorous bifurcation in math thinking
    • The symbolic language
    • Math English and the language of proofs
    • Proofs and context


    Representations of mathematical objects

    This is a long post. Notes on viewing.

    About this post

    A mathematical object, or a type of math object, is represented in practice in a great variety of ways, including some that mathematicians rarely think of as "representations".  

    In this post you will find examples and comments about many different types of representations as well as references to the literature. I am not aware that anyone has considered all these different ideas of representation in one place before. Reading through this post should raise your consciousness about what is going on when you do math.  

    This is also an experiment in exposition.  The examples are discussed in a style similar to the way a Mathematica command is discussed in the Documentation Center, using mostly nonhierarchical bulleted lists. I find it easy to discover what I want to know when it is written in that way.  (What is hard is discovering the name of a command that will do what I want.)

    Types of representations

    Using language

    • Language can be used to define a type of object.
    • A definition is intended to be precise enough to determine all the properties that objects of that type all have.  (Pay attention to the two uses of the word "all" in that sentence; they are both significant, in very different ways.)
    • Language can be used to describe an object, exhibiting properties without determining all properties.
    • It can also provide metaphors, making use of one of the basic tools of our brain to understand the world. 
    • The language used is most commonly mathematical English, a special dialect of English.
    • The symbolic language of mathematics (distinct from mathematical English) is used widely in calculations. Phrases from the symbolic language are often embedded in a statement in math English. The symbolic language includes among others algebraic notation and logical notation. 
    • The language may also be a formal language, a language that is mathematically defined and is thus itself a mathematical object. Logic texts generally present the first order predicate calculus as a formal language. 
    • Neither mathematical English nor the symbolic language is a formal language. Both allow irregularities and ambiguities.

    Mathematical objects

    The representation itself may be a mathematical object, such as:

    • A linear representation of a group. Not only are the groups mathematical objects, so is the representation.
    • An embedding of a manifold into Euclidean space.
    • A definition, given in a formal language of the first order predicate calculus, of the property of commutativity of binary operations. (Thus a property can be represented as a math object.)

    Visual representations

    A math object can be represented visually using a physical object such as a picture, graph (in several senses), or diagram.  

    • The visual processing of our brain is our major source of knowledge of the world and takes about a fifth of the brain's processing power.  We can learn many things using our vision that would take much longer to learn using verbal descriptions.  (Proofs are a different matter.)
    • When you look at a graph (for example) your brain creates a mental representation of the graph (see below).

    Mental representations

    If you are a mathematician, a math object such as "$42$", "the real numbers" or "continuity" has a mental representation in your brain.  

    • In the math ed literature, such a representation is called "mental image", "concept image", "procept", or "schema".   (The word "image" in these names is not thought of as necessarily visual.) 
    • The procept or schema describes all the things that come to mind when you think about a particular math object: The definition, important theorems, visual images, important examples, and various metaphors that help you understand it. 
    • The visual images occurring in a mental schema for an object may themselves be mental representations of physical objects. The examples and theorems may be mental representations of ideas you learned from language or pictures, and so on.  The relationships between different kinds of representations get quite convoluted.

    Metaphors

    A conceptual metaphor is a particular kind of mental representation of an object that involves mentally associating some aspects of the object with some aspects of something else: a physical object, an image, an action or another abstract object.

    • A conceptual metaphor may give you new insight into the object.
    • It may also mislead you because you think of properties of the other object that the math object doesn't have.
    • A graph of a function is a conceptual metaphor.
    • When you say that a point on a graph "rises as it goes from left to right" your metaphor is an action. 
    • When you say that the cosets of a normal subgroup of a group "get along" with the group multiplication, your metaphor identifies a property they have with an aspect of human behavior.

    Properties of representations

    A representation of a math object may or may not

    • determine it completely
    • exhibit some of its properties
    • suggest easy proofs of some theorems
    • provide a useful way of thinking about it
    • mislead you about the object's properties
    • mislead you about what is significant about the object

    Examples of representations

    This list shows many of the possibilities of representation.  In each case I discuss the example in terms of the two bulleted lists above. Some of the examples are reused from my previous publications.

    Functions

    Example (F1) "Let $f(x)$ be the function defined by $f(x)=x^3-x$."

    • This is an expression in mathematical English that a fluent reader of mathematical English will recognize gives a definition of a specific function.
    • (F1) is therefore a representation of that function.  
    • The word "representation" is not usually used in this way in math.  My intention is that it should be recognized as the same kind of object as many other representations.
    • The expression contains the formula $x^3-x$.  This is an encapsulated computation in the symbolic language of math. It allows someone who knows basic algebra and calculus to perform calculations that find the roots, extrema and inflection points of the function $f$.  
    • The word "let" suggests to the fluent reader of mathematical English that (F1) is a definition which is probably going to hold for the next chunk of text, but probably not for the whole article or book.
    • Statements in mathematical English are generally subject to conventions.  In a calculus text (F1) would automatically mean that the function had the real numbers as domain and codomain.
    • The last two remarks show that a beginner has to learn to read mathematical English. 
    • Another convention is discussed in the following diatribe.

    Diatribe 

    You would expect $f(x)$ by itself to mean the value of $f$ at $x$, but in (F1) the $x$ has the property of a bound variable.  In mathematical English, "let" binds variables. However, after the definition, in the text the "$x$" in the expression "$f(x)$" will be free, but the $f$ will be bound to the specific meaning.  It is reasonable to say that the term "$f(x)$" represents the expression "$x^3-x$" and that $f$ is the (temporary) name of the function. Nevertheless, it is very common to say "the function $f(x)$" to mean $f$.  

    A fluent reader of mathematical English knows all this, but probably no one has ever said it explicitly to them.  Mathematical English and the symbolic language should be taught explicitly, including its peculiarities such as "the function $f(x)$".  (You may want to deprecate this usage when you teach it, but students deserve to understand its meaning.)
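
Programming languages make the distinction between $f$ and $f(x)$ explicit. Here is a minimal Python sketch, in which the name `f` mirrors (F1) and the integer test inputs are my own choice: `f` names the function, while `f(2)` is a number.

```python
def f(x):
    """The function of (F1): x maps to x**3 - x."""
    return x**3 - x

# f by itself is the function (an object you can pass around);
# f(2) is the value of the function at the input 2.
value_at_2 = f(2)
is_function = callable(f)
is_number = isinstance(value_at_2, int)

# The formula encapsulates a computation: factoring x**3 - x = x(x - 1)(x + 1)
# predicts the roots -1, 0 and 1, which evaluating f confirms.
roots_found = [x for x in range(-3, 4) if f(x) == 0]

print(value_at_2, is_function, is_number, roots_found)
```

In this setting nobody would write "the function `f(x)`", because `f(x)` cannot even be evaluated until `x` has a value.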

    The positive integers

    You have a mental representation of the positive integers $1,2,3,\ldots$.  In this discussion I will assume that "you" know a certain amount of math.  Non-mathematicians may have very different mental representations of the integers.

    • You have a concept of "an integer" in some operational way as an abstract object.
    • "Abstract object" needs a post of its own. Meanwhile see Mathematical Objects (abstractmath) and the Wikipedia articles on Mathematical objects and Abstract objects.
    • You have a connection in your brain between the concept of integer and the concept of listing things in order, numbering them by $1,2,3,\ldots$.
    • You have a connection in your brain between the concept of an integer and the concept of counting a finite number of objects.  But then you need zero!
    • You understand how to represent an integer using the decimal representation, and perhaps representations to other bases as well. 
    • Your mental image has the integer "$42$" connected to but not the same as the decimal representation "42". This is not true of many students.
    • The decimal rep has a picture of the string "42" associated to it, and of course the picture of the string may come up when you think of the integer $42$ as well (it does for me; it is an icon for the number $42$).
    • You have a concept of the set of integers. 
    • Students need to be told that by convention "the set of integers" means the set of all integers.  This particularly applies to students whose native language does not have articles, but American students have trouble with this, too.
    • Your concept of  "the set of integers" may have the icon "$\mathbb{N}$" associated with it.  If you are a mathematician, the icon and the concept of the set of integers are associated with each other but not identified with each other.
    • For me, at least, the concept "set of integers" is mentally connected to each integer by the "element of" relation. (See third bullet below.)
    • You have a mental representation of the fact that the set of integers is infinite.  
    • This does not mean that your brain contains an infinite number of objects, but that you have a representation of infinity as a concept, it is brain-connected to the concept of the set of integers, and also perhaps to a proof of the fact that $\mathbb{N}$ is infinite.
    • In particular, the idea that the set of integers is mentally connected to each integer does not mean that the whole infinite number of integers is attached in your brain to the concept of the set of integers.  Rather, the idea is a predicate in your brain.  When it is connected to "$42$", it says "yes".  To "$\pi$" it says "No".
    • Philosophers worry about the concept of completed infinity.  It exists as a concept in your brain that interacts as a meme with concepts in other mathematicians' brains. In that way, and in that way only (as far as I am concerned) it is a physical object, in particular an object that exists in scattered physical form in a social network.
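
The idea of holding a set as a predicate rather than as a list of its elements has a direct analogue in programming. Here is a minimal Python sketch; the function name `is_positive_integer` is hypothetical, and counting a float such as `42.0` as an integer is my own choice.

```python
import math

def is_positive_integer(x):
    """Answer yes or no for one candidate at a time.

    Nothing infinite is stored: membership is decided by a test,
    not by looking x up in a list of all positive integers.
    """
    if isinstance(x, bool):      # exclude True/False, which Python treats as ints
        return False
    if isinstance(x, int):
        return x > 0
    if isinstance(x, float):
        return x > 0 and x.is_integer()
    return False

print(is_positive_integer(42))       # the predicate says yes
print(is_positive_integer(math.pi))  # the predicate says no
```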

    Graph of a function

    This is a graph of the function $y=x^3-x$:

    Graph of a cubic function

    • The graph is a physical object, either on a screen or on paper
    • It is processed by your visual system, the most powerful sensory management system in your brain
    • It also represents the graph in the mathematical sense (set of ordered pairs) of the function $y=x^3-x$
    • Both the mathematical graph and the physical graph are represented by modules in your brain, which associates the two of them with each other by a conceptual metaphor
    • The graph shows some properties of the function: inflection point, going off to infinity in a specific way, and so on.
    • These properties are made apparent (if you are knowledgeable) by means of the powerful pattern recognition system in your brain. You see them much more quickly than you can discover them by calculation.
    • These properties are not proved by the graph. Nevertheless, the graph communicates information: for example, it suggests that you can prove that there is an inflection point near $(0,0)$.
    • The graph does not determine or define the function: It is inaccurate and it does not (cannot) show all of the graph.
    • More subtle details about this graph are discussed in my post Representations 2.
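
What the graph suggests about the inflection point can be confirmed by a short calculation. This is the standard second-derivative argument, which the graph itself does not supply.

```latex
\begin{align*}
f(x)   &= x^3 - x, \\
f'(x)  &= 3x^2 - 1, \\
f''(x) &= 6x .
\end{align*}
```

Since $f''(x)\lt0$ for $x\lt0$ and $f''(x)\gt0$ for $x\gt0$, the concavity changes at $x=0$. Because $f(0)=0$, the inflection point is exactly $(0,0)$, not just near it.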

    Continuity

    Example (C1) The $\epsilon$-$\delta$ definition of the continuity of a function $f:\mathbb{R}\to\mathbb{R}$ may be given in the symbolic language of math:

    A function $f$ is continuous at a number $c$ if \[\forall\epsilon(\epsilon\gt0\implies\exists\delta(\delta\gt0\land\forall x(|x-c|\lt\delta\implies|f(x)-f(c)|\lt\epsilon)))\]

    • To understand (C1), you must be familiar with the notation of first order logic.  For most students, getting the notation right is quite a bit of work.  
    • You must also understand  the concepts, rules and semantics of first order logic.  
    • Even if you are familiar with all that, continuity is still a difficult concept to understand.
    • This statement does show that the concept is logically complicated. I don't see how it gives any other intuition about the concept. 

    Example (C2) The definition of continuity can also be represented in mathematical English like this:

    A function $f$ is continuous at a number $c$ if for any $\epsilon\gt0$ there is a $\delta\gt0$ such that for any $x$, if $|x-c|\lt\delta$, then $|f(x)-f(c)|\lt\epsilon$. 

    • This definition doesn't give any more intuition than (C1) does.
    • It is easier to read than (C1) for most math students, but it still requires intimate familiarity with the quirks of math English.
    • The fact that "continuous" is in boldface signals that this is a definition.  This is a convention.
    • The phrase "For any $\epsilon\gt0$" contains an unmarked parenthetic insertion that makes it grammatically incoherent.  It could be translated as: "For any $\epsilon$ that is greater than $0$".  Most math majors eventually understand such things subconsciously.  This usage is very common.
    • Unless it is explicitly pointed out, most students won't notice that if you change the phrase "there is a $\delta\gt0$ such that for any $x$" to "for any $x$ there is a $\delta\gt0$ such that", the result means something quite different.  Cauchy never caught onto this.
    • In both (C1) and (C2), the "if" in the phrase "A function $f$ is continuous at a number $c$ if…" means "if and only if" because it is in a definition.  Students rarely see this pointed out explicitly.  
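
The effect of swapping quantifiers can be seen on a small finite example. The following Python sketch illustrates quantifier order only, not continuity; the domain and the relation `equals` are my own choices. Nested `all` and `any` calls play the roles of $\forall$ and $\exists$.

```python
domain = [0, 1, 2]

def equals(x, y):
    return x == y

# "For any x there is a y such that x == y": y may depend on x.
forall_exists = all(any(equals(x, y) for y in domain) for x in domain)

# "There is a y such that for any x, x == y": one y must work for every x.
exists_forall = any(all(equals(x, y) for x in domain) for y in domain)

print(forall_exists, exists_forall)
```

The two statements use the same words in a different order, and on this domain one is true and the other is false.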

    Example (C3) The definition of continuity can be given in a formally defined first order logical theory

    • The theory would have to contain function symbols and axioms expressing the algebra of real numbers as an ordered field. 
    • I don't know that such a definition has ever been given, but there are various semi-automated and automated theorem-proving systems (which I know little about) that might be able to state such a definition.  I would appreciate information about this.
    • Such a definition would make the property of continuity a mathematical object.
    • An automated theorem-proving system might be able to prove that $x^3-x$ is continuous, but I wonder if the resulting proof would aid your intuition much.

    Example (C4) A function from one topological space to another is continuous if the inverse image of every open set in the codomain is an open set in the domain.

    • This definition is stated in mathematical English.
    • All definitions start with primitive data. 
    • In definitions (C1) – (C3), the primitive data are real numbers and the statement uses properties of an ordered field.
    • In (C4), the data are real numbers and the arithmetic operations of a topological field, along with the open sets of the field. The ordering is not mentioned.
    • This shows that a definition need not mention some important aspects of the structure. 
    • One marvelous example of this is that  a partition of a set and an equivalence relation on a set are based on essentially disjoint sets of data, but they define exactly the same type of structure.
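
The correspondence between partitions and equivalence relations can be made concrete for a finite set. In this Python sketch the function names and the sample partition are hypothetical; a partition is given as a list of blocks and a relation as a set of ordered pairs.

```python
def relation_from_partition(blocks):
    """Pairs (a, b) such that a and b lie in the same block."""
    return {(a, b) for block in blocks for a in block for b in block}

def partition_from_relation(elements, relation):
    """Blocks are the equivalence classes {b : (a, b) in relation}.

    Assumes `relation` really is an equivalence relation on `elements`.
    """
    return {frozenset(b for b in elements if (a, b) in relation) for a in elements}

blocks = [{1, 2}, {3}, {4, 5, 6}]
elements = set().union(*blocks)

relation = relation_from_partition(blocks)
recovered = partition_from_relation(elements, relation)

# Going from the partition to the relation and back returns the same blocks.
print(recovered == {frozenset(block) for block in blocks})
```

The data are quite different (a collection of subsets versus a set of pairs), yet each can be reconstructed from the other.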

    Example (C5) "The graph of a continuous function can be drawn without picking up the chalk".

    • This is a metaphor that associates an action with the graph.
    • It is incorrect: The graphs of some continuous functions cannot be drawn.  For example, the function $x\mapsto x^2\sin(1/x)$ (with the value $0$ at $x=0$) is continuous on the interval $[-1,1]$ but cannot be drawn at $x=0$. 
    • Generally speaking, if the function can be drawn then it can be drawn without picking up the chalk, so the metaphor provides a useful insight, and it provides an entry into consciousness-raising examples like the one in the preceding bullet.

    References

    1. 1.000… and .999… (post)
    2. Conceptual blending (post)
    3. Conceptual blending (Wikipedia)
    4. Conceptual metaphors (Wikipedia)
    5. Convention (abstractmath)
    6. Definitions (abstractmath)
    7. Embodied cognition (Wikipedia)
    8. Handbook of mathematical discourse (see articles on conceptual blend, mental representation, representation, metaphor, parenthetic assertion)
    9. Images and Metaphors (abstractmath)
    10. The interplay of text, symbols and graphics in math education, Lin Hammill
    11. Math and the modules of the mind (post)
    12. Mathematical discourse: Language, symbolism and visual images, K. L. O’Halloran.
    13. Mathematical objects (abstractmath)
    14. Mathematical objects (Wikipedia)
    15. Mathematical objects are “out there?” (post)
    16. Metaphors in computing science (post)
    17. Procept (Wikipedia)
    18. Representations 2 (post)     
    19. Representations and models (abstractmath)
    20. Representations II: dry bones (post)
    21. Representation theorems (Wikipedia) Concrete representations of abstractly defined objects.
    22. Representation theory (Wikipedia) Linear representations of algebraic structures.
    23. Semiotics, symbols and mathematical visualization, Norma Presmeg, 2006.
    24. The transition to formal thinking in mathematics, David Tall, 2010
    25. Theory in mathematical logic (Wikipedia)
    26. What is the object of the encapsulation of a process? Tall et al., 2000.
    27. Where mathematics comes from, by George Lakoff and Rafael Núñez, Basic Books, 2000. 
    28. Where mathematics comes from (Wikipedia) This is a review of the preceding book.  It is a permanent link to the version of 04:23, 25 October 2012.  The review is opinionated, partly wrong, not well written and does not fit the requirements of a Wikipedia entry.  I recommend it anyway; it is well worth reading.  It contains links to three other reviews.

    Notes on Viewing  

    This post uses MathJax. If you see mathematical expressions with dollar signs around them, or badly formatted formulas, try refreshing the screen. Sometimes you have to do it two or three times.


    Semantics of algebra I

    In the post Algebra is a difficult foreign language  I listed some of the difficulties of the syntax of the symbolic language of math (which includes high school algebra and precalculus).  The semantics causes difficulties as well.  Again I will list some examples without any attempt at completeness.

    The status of the symbolic language as a language

    There is a sharp distinction between the symbolic language of math and mathematical English, which I have written about in The languages of math and in the Handbook of mathematical discourse. Other authors do not make this sharp distinction (see the list of references at the end of this post). The symbolic language occurs embedded in mathematical English and the embedding has its own semantics which may cause great difficulty for students.

    The symbolic language of math can be described as a natural formal language. Pieces of it were invented by mathematicians and others over the course of the last several hundred years. Individual pieces (notation such as "$3x+1=2y$") can be given a strictly formal syntax, but the whole system is ambiguous, inconsistent, and context-sensitive.  When you get to the research level, it has many dialects: Research mathematicians in one field may not be able to read research articles in a very different field.

    Examples

    I think the examples below will make these claims plausible.  This should be the subject of deep research.

    Superscripts and functions

    • A superscript, as in $5^2$ or $x^3$, has a pretty standard meaning denoting a power, at least until you get to higher level stuff such as tensors.  
    • A function can be denoted by a letter, symbol, or string, and the notation $f(x)$ refers to the value of the function at input $x$.  

    For functions defined on numbers, it is common in precalculus and higher to write $f^2(x)$ to denote $(f(x))^2=f(x)\,f(x)$.  Since the values of certain multiletter functions are commonly written without the parentheses (for example, $\sin\,x$), one writes $\sin^2x$ to mean $(\sin\,x)^2$.

    The notation $f^n$ is also widely used to mean the $n$th iterate of $f$ (if it exists), so $f^3(x)=f(f(f(x)))$ and so on.  This leads naturally to writing $f^{-1}(x)$ for the inverse function of $f$; this is common notation whether the function $f$ is bijective or not (in which case $f^{-1}$ is set-valued).  Thus $\sin^{-1}x$ means $\arcsin\,x$.

    It is notorious that words in mathematical English have different meanings in different texts.  This is an example in the symbolic language (and not just at the research level) of a systematic construction that can give expressions that have ambiguous meanings.

    This phenomenon is an example of why I say the symbolic language of math is a natural formal language: The superscript notation, originally used for repeated multiplication of values, has been naturally extended to the binary operation of composition.  And that leads to students thinking that $\sin^{-1}x$ means $\frac{1}{\sin\,x}$. 
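
The two readings of a superscript on a function name can be separated by giving them different names. In this Python sketch the helper names `power_of_value`, `iterate` and `successor` are hypothetical, chosen only for illustration.

```python
import math

def power_of_value(f, n):
    """Read f^n(x) as (f(x))**n."""
    return lambda x: f(x) ** n

def iterate(f, n):
    """Read f^n(x) as f applied n times to x (n >= 0)."""
    def iterated(x):
        for _ in range(n):
            x = f(x)
        return x
    return iterated

def successor(x):
    return x + 1

print(power_of_value(successor, 2)(3))  # (3 + 1)**2
print(iterate(successor, 2)(3))         # (3 + 1) + 1

# The same clash for the exponent -1 applied to sine:
print(math.asin(0.5))     # inverse function, arcsin
print(1 / math.sin(0.5))  # reciprocal of the value
```

The symbolic language uses one notation for both constructions, so the reader has to decide which is meant.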

    History can overtake notation, too: Mathematicians probably took to writing $\sin\,x$ instead of $\sin(x)$ because it saves writing.  That was not very misleading in the old days when mathematical variables were always single symbols.  But students see multiletter variable names all the time these days (in programming languages, Excel and elsewhere), so of course some of them think $\sin\,x$ means $\sin$ times $x$. People who do this are not idiots.

    Juxtaposition

    Juxtaposition of two symbols means many different things.

    • If $m$ and $n$ are numbers, $mn$ denotes the product of the two numbers.
      • Multiplication is commutative, so $mn$ and $nm$ denote the same number, but they correspond to different calculations.  
    • If $M$ and $N$ are matrices, $MN$ denotes the matrix product of the two matrices.
      • This is a binary operation but it is not the same operation denoted by juxtaposition of numbers. (In fact it involves both addition and multiplication of numbers.)
      • Now $MN$ may not be the same matrix as $NM$.
    • If $A$ and $B$ are points in a geometric drawing, $AB$ denotes the line segment from $A$ to $B$.
      • This is a function of two variables denoting points whose value is a line segment.  
      • It is not what is usually called a binary operation, although as an opinionated category theorist I would call it a multisorted binary operation.
      • It is commutative, but it doesn't make sense to ask if it is associative.

    This phenomenon is called overloaded notation.  

    • In order to understand the meaning of the juxtaposition of symbols, you have to know the type of the variables.
    • The surrounding text may tell you specifically the variables denote matrices or whatever. So this is an instance of context-sensitive semantics. 
      • Students tend to expect that they know what any formula means in isolation from the text.  It may make them very sad to discover that this doesn't work — once they believe it, which can take quite a while.
    • In many cases the problem is alleviated by the use of convention.
      • Matrices are usually denoted by capital letters, numbers by lower case letters.
      • But points in geometry are usually denoted by capital letters too.  So you have to know that referring to a geometric diagram is significant to understanding the notation. This is an indirect form of context-sensitivity.  Did any teacher ever point this out to students?  Does it appear anywhere in print?
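
    Programmers meet the same phenomenon under the name operator overloading. The sketch below is only an illustration, not anything from the original post: Python has no juxtaposition, so `*` stands in for it, and the classes `Matrix2` and `Point` are hypothetical names invented here. The same symbol means three different things depending on the types of its operands.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Matrix2:
    """A 2x2 matrix stored row by row: [[a, b], [c, d]] (hypothetical helper)."""
    a: float
    b: float
    c: float
    d: float

    def __mul__(self, other: "Matrix2") -> "Matrix2":
        # Matrix product: each entry uses both addition and multiplication of numbers.
        return Matrix2(
            self.a * other.a + self.b * other.c,
            self.a * other.b + self.b * other.d,
            self.c * other.a + self.d * other.c,
            self.c * other.b + self.d * other.d,
        )


@dataclass(frozen=True)
class Point:
    """A point in the plane (hypothetical helper)."""
    x: float
    y: float

    def __mul__(self, other: "Point") -> frozenset:
        # "AB" read as the segment with endpoints A and B.
        # A frozenset ignores order, so segment AB equals segment BA.
        return frozenset({self, other})


m, n = 3, 5
M, N = Matrix2(1, 1, 0, 1), Matrix2(1, 0, 1, 1)
A, B = Point(0, 0), Point(1, 2)

print(m * n == n * m)  # numbers: the same value either way
print(M * N == N * M)  # these two matrices do not commute
print(A * B == B * A)  # segments: the endpoints are unordered
```

    Note that `(A * B) * A` raises an error here, because the value of `A * B` is a segment and not a point. That is the programming counterpart of the remark that it makes no sense to ask whether this operation is associative.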

    The earlier example of $\sin^{-1}x$ is a case which is not context-sensitive.  Knowing the types of the variables won't help.  Of course, if the author explains which meaning is meant, that explanation is within the context of the book!  That is not a lot of help for grasshoppers like me who look back and forth at different parts of a math book instead of reading it straight through.  

    Equations

    Consider the expressions

    1. $x^2-5x+4=0$
    2. $x^2+y^2=1$
    3. $x^2+2x+1=(x+1)^2$

    They are assertions that two expressions have the same value. A strictly logical view of an equation containing variables is that it puts a constraint on the variables.  It is true of some numbers (or pairs of numbers) and false of others.  That is the defining property of an equation. Equation 1 requires that $x=1$ or $x=4$.  Equation 2 imposes a constraint which is satisfied by uncountably many pairs of real numbers, and is also not true of uncountably many pairs. But equation 3 puts no constraint on the variable.  It is true of every number $x$.
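
    The "truth set" view can be made concrete with a small Python sketch. It is only a sample check under a loud assumption: the variables range over the integers from $-10$ to $10$, not over the real numbers, so for equation 2 it finds just the four integer points of a truth set that is really uncountable. The function names are invented for this illustration.

```python
def eq1(x):
    return x**2 - 5*x + 4 == 0


def eq2(x, y):
    return x**2 + y**2 == 1


def eq3(x):
    return x**2 + 2*x + 1 == (x + 1)**2


# Assumption: we only test integers in [-10, 10], a tiny sample of the reals.
sample = range(-10, 11)

truth_set_1 = [x for x in sample if eq1(x)]
truth_set_2 = [(x, y) for x in sample for y in sample if eq2(x, y)]
identity_3 = all(eq3(x) for x in sample)

print(truth_set_1)   # [1, 4]
print(truth_set_2)   # [(-1, 0), (0, -1), (0, 1), (1, 0)]
print(identity_3)    # True: no sampled value violates equation 3
```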

    A strictly logical view of symbolic notation does math a disservice.  Here, the notion that an equation is by definition a symbolic statement that has a truth set and a falsity set may be correct but it is not the important thing about any particular equation. When we read and do math we have many different metaphors and images about a concept.  The definition of a kind of object is often in terms of things that may not be the most important things to know about it.  (One of the most important facts about groups is that they are an abstraction of symmetries, which the axioms don't mention at all.)

    Equation 1 is something that would make most people set out to discover the truth set.  Equation 2 calls out for drawing its graph.  Equation 3, being an identity, is useful in algebraic reasoning.  The images they call up are different and what you do with them is different.  The images and metaphors that cluster around a concept are an important part of the semantics of the symbolic language.

    I expect to post separately about the semantics of variables and about the semantics of symbolic language embedded in mathematical English.


    Bugs in English and in math

    Everyone knows that computer programs have bugs.  In fact, languages have bugs, too, although we don't usually call them that.  

    Bugs in English 

      

    Right

    Q: "Should I turn left at the next corner?" A: "Right".  Probably most Americans who drive now know this bug.  The answer could mean "yes" or "turn right".  So we have to stop and think how to answer this question.  That makes it a bug.  

    Too, two

    Comment: "We will take Route 30".  Answer: "We will take Route 30 too".  This bug is probably responsible for the survival of the word "also".  

    Note that unlike the case of "right", this is a bug only of spoken English.

    Subject and predicate

    In Comma rule found dysfunctional, I wrote about the problem that in formal English writing there is no way to indicate where the subject ends and the predicate begins.  This causes a problem reading complicated sentences with many clauses such as academic writing often uses.  Of course, one way around this is to write short, simple sentences!  (That sounds like the subject of a future blog…) 

    Bugs in the symbolic language of math

      

    Fractions

    In both Excel and Mathematica, "1/2*3" means 3/2. Now, I would think "1/2a" means "1/(2a)", but younger mathematicians are taught PEMDAS (see Purplemath), which says that division and multiplication have the same precedence and operations are evaluated from left to right.  

    If in Mathematica you define a function f[a_] := 1/2a, f[3] evaluates to 3/2, so Mathematica (like most other computer languages) agrees with PEMDAS. (Note: When you write 1/2a in a Mathematica notebook, it automatically puts a space between the 2 and the a, and space in Mathematica means times, so it does warn you.)

    Nevertheless, my ancient education would lead me to write (1/2)a for that meaning.  This means I must learn to write 1/(2a) for the other meaning instead of 1/2a.  
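
    For comparison, here is how the two readings look in Python, which is not one of the systems discussed above and is used here only as a third data point. Python has no juxtaposition, so the product must be written with an explicit `*`. The names are invented for the illustration, and `Fraction` is used so the printed values are exact.

```python
from fractions import Fraction

a = 3  # sample value, matching the f[3] example in the text

# Division and multiplication share one precedence level in Python
# and group from left to right.
left_to_right = Fraction(1) / 2 * a          # parsed as (1/2) * a
explicit_product = (Fraction(1) / 2) * a     # the same value, written unambiguously
grouped_denominator = Fraction(1) / (2 * a)  # the "1/(2a)" reading

print(left_to_right)        # 3/2
print(explicit_product)     # 3/2
print(grouped_denominator)  # 1/6
```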

    Questions:

    • Did the language really change or was I always "doing it wrong"?  I would like to hear from other ancient mathematicians.  (But I don't know very many who would read blogs or Purplemath.)
    • Should such a phenomenon be called a bug? 

    Repeated exponentiation

    In Excel, "2^2^3" means $(2^2)^3$, in other words, 64.  In Mathematica, it means $2^{(2^3)}=2^8=256$.  My impression is that most mathematicians expect it to mean $2^{(2^3)}$.  

    References: This post in Walking Randomly, my post Mathematical Usage, and Wikipedia's article.  
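
    Python, used here only as another example language, sides with Mathematica: its `**` operator groups from the right. A short sketch with both groupings written out explicitly (the variable names are invented for the illustration):

```python
default_parse = 2 ** 2 ** 3   # Python groups this as 2 ** (2 ** 3)
top_down = 2 ** (2 ** 3)      # the reading described above for Mathematica
left_first = (2 ** 2) ** 3    # the reading described above for Excel

print(default_parse)  # 256
print(top_down)       # 256
print(left_first)     # 64
```

    Whatever the language, writing the parentheses removes the question.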

    Exponentiation on functions is ambiguous

    If $f:\mathbb{R}\to\mathbb{R}$ is a function, $f^2(x)$ can mean either $f(f(x))$ or $f(x)f(x)$, and both usages are common.  You should tell your students about this because no one is ever going to make one of the usages go away.

    A far worse catastrophe is the fact that in calculus books, $\sin^2x=(\sin\,x)(\sin\,x)$ but $\sin^{-1}x=\text{arcsin}\,x$.  I betcha (lived in Minnesota four years now) we could succeed with a campaign to convince calc book publishers to always write $(\sin\,x)^2$ and $\arcsin\,x$.  
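
    A program has to spell out which meaning is intended, which makes the ambiguity easy to see. In the Python sketch below, the function `f` and the helper names are hypothetical choices made for this illustration; only `math.sin` and `math.asin` are real library functions.

```python
import math


def f(x):
    # Hypothetical sample function f(x) = x + 1.
    return x + 1


def iterate_twice(g, x):
    # One reading of g^2(x): apply g twice.
    return g(g(x))


def square_of_value(g, x):
    # The other reading of g^2(x): multiply the value by itself.
    return g(x) * g(x)


print(iterate_twice(f, 2))    # 4
print(square_of_value(f, 2))  # 9

# The two readings of sin^{-1} x, at the sample point x = 0.5:
x = 0.5
inverse_function = math.asin(x)  # arcsin x, the calculus-book meaning
reciprocal = 1 / math.sin(x)     # 1 / sin x, the common student misreading

print(inverse_function)  # about 0.5236, which is pi/6
print(reciprocal)        # about 2.0858
```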

    Bugs in the Mathematical Dialect of English

    The mathematical dialect of English is what I call Mathematical English in the abstractmath website.  It is a different language from the symbolic language, which is not a dialect of English.

    I have written about the problems with Mathematical English in a ridiculous number of places.  (See references in The Handbook of Mathematical Discourse).  It is normal for a dialect of a language to use words and grammatical structures that in the original language mean different things.  (See Dialects below).

    Words with different meanings

    • A set is a group in standard English, but not in math English.  
    • The number 2+3i is a real number in standard English, but not in math English.  
    • And so on.

    Use of adjectives and prefixes

    • A "noncommutative ring" has commutative addition.
    • A "semigroup" has a fully defined binary operation.

    If, then

    The bug that grabs math newbies by the throat and won't let go is the meaning of "If P, then Q".  

    • "If a number is divisible by 4, then it is even" in math dialect means a number not divisible by 4 might be even anyway.
    • "If you eat your broccoli you will get your dessert" in standard American Parental English does not mean you might get your dessert if you don't eat your broccoli.

    And then there is the phenomenon of Vacuous Implication, which leaves students gasping and writhing.
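
    The mathematical reading of "If P, then Q" can be tabulated directly. The Python sketch below is a minimal illustration; the helper `implies` and the ranges tested are choices made here, not anything from the original post. It treats the implication as false only when P is true and Q is false.

```python
def implies(p, q):
    # Material implication: false only when p is true and q is false.
    return (not p) or q


# Full truth table; the two rows with p false are the "vacuous" cases.
for p in (True, False):
    for q in (True, False):
        print(p, q, implies(p, q))

# "If a number is divisible by 4, then it is even", checked for 1..100.
claim_holds = all(implies(n % 4 == 0, n % 2 == 0) for n in range(1, 101))
print(claim_holds)  # True

# Numbers not divisible by 4 that are even anyway.
even_anyway = [n for n in range(1, 11) if n % 4 != 0 and n % 2 == 0]
print(even_anyway)  # [2, 6, 10]

# Vacuous implication: no n in 1..100 exceeds 1000, so the hypothesis is
# never true and the statement holds even though the conclusion is absurd.
vacuous = all(implies(n > 1000, n < 0) for n in range(1, 101))
print(vacuous)  # True
```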

    About "dialects"

    Most Americans are not familiar with dialects in the sense I am using the word here, since the only really different dialects we have are Gullah and Hawaiian Pidgin, both of which are very hard to understand; although for example Appalachian English and African-American urban vernacular [1] are dialects of a milder sort.  I grew up in Savannah and heard diluted Gullah sometimes on the street (didn't understand much).  I am also rather familiar with Züritüütsch since we lived in Zürich for a year.   

    What the rest of the world calls dialects have many distinctive properties:

    • They have nonstandard pronunciation to the point where they are difficult to understand. 
    • They have differences in grammar.  (Both Gullah and especially Hawaiian Creole have differences in grammar from Standard English.) 
    • They have differences in vocabulary, enough sometimes to cause misunderstanding.

    I grew up speaking an Atlanta dialect, which really did have differences in all those parameters.  But what people today call a Southern accent is really just an accent (minor variations in pronunciation), not a dialect.  

    Hawaiian Creole, and possibly Gullah, but not the other dialects I mentioned, are singled out by linguists as creoles because they have been modified by heavy influence from another language.  Züritüütsch is not a creole, but it is quite difficult for native German-speakers to understand.  The Swiss situation particularly emphasizes the distinction between "dialect" and "accent".  The typical native of Zürich speaks Züritüütsch and also speaks standard German with a Swiss accent.  

    Reference

    [1] What Language Is (And What It Isn't and What It Could Be) by John H. McWhorter. Gotham, 2011.

     

     
