Category Archives: language of math

Representations of functions III

Introduction to this post

I am writing a new abstractmath chapter called Representations of Functions. It will replace some of the material in the chapter Functions: Images, Metaphors and Representations. This post is a draft of the sections on representations of finite functions.

The diagrams in this post were created using the Mathematica Notebook Constructions for cographs and endographs of finite functions.nb.
You can open this notebook with Mathematica, which is commercial software but is available free to faculty and students at many universities, or with Mathematica CDF Player, which is free for anyone and runs on Windows, Mac and Linux.

Like everything in abstractmath.org, the notebooks are covered by a Creative Commons ShareAlike 3.0 License.

Segments posted so far

Graphs of finite functions

When a function is continuous, its graph shows up as a curve in the plane or as a curve or surface in 3D space. When a function is defined on a set without any notion of continuity (for example a finite set), the graph is just a set of ordered pairs and does not tell you much.

A finite function $f:S\to T$ may be represented in these ways:

  • Its graph $\{(s,f(s))|s\in S\}$. This is graph as a mathematical object, not as a drawing or as a directed graph; see graph (two meanings).
  • A table, rule or two-line notation. (All three of these are based on the same idea, but differ in presentation and are used in different mathematical specialties.)
  • By using labels with arrows between them, arranged in one of two ways:
      • A cograph, in which the domain and the codomain are listed separately.
      • An endograph, in which the elements of the domain and the codomain are all listed together without repetition.

All these techniques can also be used to show finite portions of infinite discrete functions, but that possibility will not be discussed here.

Introductory Example

Let \[\text{f}:\{a,b,c,d,e\}\to\{a,b,c,d\}\] be the function defined by requiring that $f(a)=c$, $f(b)=a$, $f(c)=c$, $f(d)=b$, and $f(e)=d$.

Graph

The graph of $f$ is the set
\[\{(a,c),(b,a),(c,c),(d,b),(e,d)\}\]
As with any set, the order in which the pairs are listed is irrelevant. Also, the letters $a$, $b$, $c$, $d$ and $e$ are merely letters. They are not variables.

Table

$\text{f}$ is given by this table:

This sort of table is the format used in databases. For example, a table in a database might show the department each employee of a company works in:

Rule

The rule determined by the finite function $f$ has the form

\[(a\mapsto c,b\mapsto a,c\mapsto c,d\mapsto b,e\mapsto d)\]

Rules are built into Mathematica and are useful in many situations. In particular, the endographs in this article are created using rules. In Mathematica, however, rules are written like this:

\[(a\to c,b\to a,c\to c,d\to b,e\to d)\]

This is inconsistent with the usual math usage (see barred arrow notation) but on the other hand is easier to enter in Mathematica.

In fact, Mathematica uses very short arrows in its notation for rules, shorter than the ones used for the arrow notation for functions. Those extra short arrows don’t seem to exist in TeX.

Two-line notation

Two-line notation is a kind of horizontal table.

\[\begin{pmatrix} a&b&c&d&e\\c&a&c&b&d\end{pmatrix}\]

The three notations table, rule and two-line do the same thing: If $n$ is in the domain, $f(n)$ is shown adjacent to $n$ — to its right for the table and the rule and below it for the two-line.
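The four notations so far can be produced from one stored copy of the function. The following is a minimal Python sketch, assuming the introductory example $f$ is stored as a dictionary; the dictionary and the variable names are choices made for this illustration, not something used to make the diagrams in this post.

```python
# The introductory example f: {a,b,c,d,e} -> {a,b,c,d}, stored as a dict.
# (Storing it as a dict is an assumption of this sketch.)
f = {"a": "c", "b": "a", "c": "c", "d": "b", "e": "d"}

# Graph: the set of ordered pairs (s, f(s)). A set has no order, as in the text.
graph = {(s, t) for s, t in f.items()}

# Table: one row per domain element, with f(n) to the right of n.
table = [f"{s} | {t}" for s, t in sorted(f.items())]

# Rule: the same pairs written with arrows.
rule = ", ".join(f"{s} -> {t}" for s, t in sorted(f.items()))

# Two-line notation: the domain on the top row, the values directly below.
top = " ".join(sorted(f))
bottom = " ".join(f[s] for s in sorted(f))

print("\n".join(table))
print(rule)
print(top)
print(bottom)
```

In each case $f(n)$ sits next to $n$, so a value such as $c$ is repeated as often as it is taken.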

Note that in contrast to the table, rule and two-line notation, in a cograph each element of the codomain is shown only once, even if the function is not injective.

Cograph

To make the cograph of a finite function, you list the domain and codomain in separate parallel rows or columns (even if the domain and codomain are the same set), and draw an arrow from each $n$ in the domain to $f(n)$ in the codomain.

This is the cograph for $\text{f}$, represented in columns

and in rows (note that $c$ occurs only once in the codomain)

Pretty ugly, but the cograph for finite functions does have its uses, as for example in the Wikipedia article composition of functions.

In both the two-line notation and in cographs displayed vertically, the function goes down from the domain to the codomain. I guess functions obey the law of gravity.

Rearrange the cograph

There is no expectation that in the cograph $f(n)$ will be adjacent to $n$. But in most cases you can rearrange both the domain and the codomain so that some of the structure of the function is made clearer; for example:

The domain and codomain of a finite function can be rearranged in any way you want because finite functions are not continuous functions. This means that the locations of points $x_1$ and $x_2$ have nothing to do with the locations of $f(x_1)$ and $f(x_2)$: The domain and codomain are discrete.

Endograph

The endograph of a function $f:S\to T$ contains one node labeled $s$ for each $s\in S\cup T$, and an arrow from $s$ to $s'$ if $f(s)=s'$. Below is the endograph for $\text{f}$.

The endograph shows you immediately that $\text{f}$ is not a permutation. You can also see that with whatever letter you start with, you will end up at $c$ and continue looping at $c$ forever. You could have figured this out from the cograph (especially the rearranged cograph above), but it is not immediately obvious in the cograph the way it is in the endograph.
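Following the arrows of the endograph is the same as applying the function repeatedly. Here is a small Python sketch of that walk for the introductory example; the dictionary representation and the helper name `orbit` are assumptions made for this illustration.

```python
# The introductory example, stored as a dict (an assumption of this sketch).
f = {"a": "c", "b": "a", "c": "c", "d": "b", "e": "d"}

def orbit(func, start):
    """Return start, f(start), f(f(start)), ... up to the first repeated value."""
    seen = []
    x = start
    while x not in seen:
        seen.append(x)
        x = func[x]
    return seen

for s in sorted(f):
    # Each line is the path you trace in the endograph starting at s.
    print(" -> ".join(orbit(f, s)))
```

Every path stops at $c$, because $c$ is the only letter that $f$ sends to itself.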

There are more examples of endographs below and in the blog post A tiny step towards killing string-based math. Calculus-type functions can also be shown using endographs and cographs: See Mapping Diagrams from A(lgebra) B(asics) to C(alculus) and D(ifferential) E(quation)s, by Martin Flashman, and my blog posts Endographs and cographs of real functions and Demos for graph and cograph of calculus functions.

Example: A permutation

Suppose $p$ is the permutation of the set \[\{0,1,2,3,4,5,6,7,8,9\}\] given in two-line form by
\[\begin{pmatrix} 0&1&2&3&4&5&6&7&8&9\\0&2&1&4&5&3&7&8&9&6\end{pmatrix}\]

Cograph

Endograph

Again, the endograph shows the structure of the function much more clearly than the cograph does.

The endograph consists of four separate parts (called components) not connected with each other. Each part shows that repeated application of the function runs around a kind of loop; such a thing is called a cycle. Every permutation of a finite set consists of disjoint cycles as in this example.

Disjoint cycle notation

Any permutation of a finite set can be represented in disjoint cycle notation: The function $p$ is represented by:

\[(0)(1,2)(3,4,5)(6,7,8,9)\]

Given the disjoint cycle notation, the function can be determined as follows: For a given entry $n$, $p(n)$ is the next entry in the notation, if there is a next entry (instead of a parenthesis). If there is not a next entry, $p(n)$ is the first entry in the cycle that $n$ is in. For example, $p(7)=8$ because $8$ is the next entry after $7$, but $p(5)=3$ because the next symbol after $5$ is a parenthesis and $3$ is the first entry in the same cycle.

The disjoint cycle notation is not unique for a given permutation. All the following notations determine the same function $p$:

\[(0)(1,2)(4,5,3)(6,7,8,9)\]
\[(0)(1,2)(8,9,6,7)(3,4,5)\]
\[(1,2)(3,4,5)(0)(6,7,8,9)\]
\[(2,1)(5,3,4)(9,6,7,8)\]
\[(5,3,4)(1,2)(6,7,8,9)\]

Cycles such as $(0)$ that contain only one element are usually omitted in this notation.
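The recipe for reading disjoint cycle notation, and the reverse step of finding the cycles from the two-line form, can be sketched in Python as follows. The function names `cycles_of` and `apply_cycles` are made up for this illustration, and storing $p$ as a dictionary is an assumption of the sketch.

```python
def cycles_of(perm):
    """Disjoint cycles of a permutation given as a dict n -> p(n)."""
    remaining = set(perm)
    cycles = []
    for start in sorted(perm):
        if start not in remaining:
            continue  # start already belongs to a cycle found earlier
        cycle = []
        x = start
        while x in remaining:
            remaining.remove(x)
            cycle.append(x)
            x = perm[x]
        cycles.append(tuple(cycle))
    return cycles

def apply_cycles(cycles, n):
    """Evaluate at n the permutation described by a list of cycles."""
    for cycle in cycles:
        if n in cycle:
            i = cycle.index(n)
            # The next entry, or the first entry if n is last in its cycle.
            return cycle[(i + 1) % len(cycle)]
    return n  # n lies in an omitted 1-cycle, so it is a fixed point

# The permutation p from the two-line form above.
p = dict(zip(range(10), [0, 2, 1, 4, 5, 3, 7, 8, 9, 6]))
print(cycles_of(p))

# One of the alternative notations listed above, with (0) omitted.
other = [(2, 1), (5, 3, 4), (9, 6, 7, 8)]
print(all(apply_cycles(other, n) == p[n] for n in p))
```

The second check confirms, for this example, that a different ordering of the cycles and of the entries within each cycle gives the same function.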

Example: A tree

Below is the endograph of a function \[t:\{0,1,2,3,4,5,6,7,8,9\}\to\{0,1,2,3,4,5,6,7,8,9\}\]

This endograph is a tree. The endograph of a function $f$ is a tree if the domain has a particular element $r$ called the root with the properties that

  • $f(r)=r$, and
  • starting at any element of the domain, repeatedly applying $f$ eventually produces $r$.

In the case of $t$, the root is $4$. Note that $t(4)=4$, $t(t(7))=4$, $t(t(t(9)))=4$, $t(1)=4$, and so on.

The endograph

shown here is also a tree.

See the Wikipedia article on trees for the usual definition of tree as a special kind of graph. For reading this article, the definition given in the previous paragraph is sufficient.

The general form of a finite function

This is the endograph of a function $t$ on a $17$-element set:

It has two components. The upper one contains one $2$-cycle, and no matter where you start in that component, when you apply $t$ over and over you wind up flipping back and forth in the $2$-cycle forever. The lower component has a $3$-cycle with a similar property.

This illustrates a general fact about finite functions:

  • The endograph of any finite function contains one or more components $C_1$ through $C_k$.
  • Each component $C_i$ contains exactly one $n_i$-cycle, for some integer $n_i\geq 1$, to which are attached zero or more trees.
  • Each tree in $C_i$ is attached in such a way that its root is on the unique cycle contained in $C_i$.

In the example above, the top component has three trees attached to it, two to $3$ and one to $4$. (This example does not illustrate the fact that an element of one of the cycles does not have to have any trees attached to it.)

You can check your understanding of finite functions by thinking about the following two theorems:

  • A permutation is a finite function with the property that its cycles have no trees attached to them.
  • A tree is a finite function that has exactly one component whose cycle is a $1$-cycle.
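The general form and the two theorems can be checked on small examples. Below is a minimal Python sketch, assuming the function is stored as a dictionary whose values all occur among its keys (so the codomain is contained in the domain); the helper names are hypothetical. It groups elements by the cycle they eventually reach, which gives one group per component.

```python
def cycle_reached(func, start):
    """Return the cycle that repeated application of func reaches from start."""
    seen = []
    x = start
    while x not in seen:
        seen.append(x)
        x = func[x]
    # x is the first repeated value, so the cycle runs from x to the end of seen.
    return frozenset(seen[seen.index(x):])

def components(func):
    """Map each cycle to the set of elements that end up in it (one per component)."""
    groups = {}
    for s in func:
        groups.setdefault(cycle_reached(func, s), set()).add(s)
    return groups

def is_permutation(func):
    # No trees attached: every element already lies on its cycle.
    return all(s in cycle
               for cycle, members in components(func).items()
               for s in members)

def is_tree(func):
    # Exactly one component, and its cycle is a 1-cycle (the root).
    comps = components(func)
    return len(comps) == 1 and all(len(cycle) == 1 for cycle in comps)

# The introductory example and the permutation p from earlier in the post.
f = {"a": "c", "b": "a", "c": "c", "d": "b", "e": "d"}
p = dict(zip(range(10), [0, 2, 1, 4, 5, 3, 7, 8, 9, 6]))

print(len(components(f)), is_permutation(f), is_tree(f))
print(len(components(p)), is_permutation(p), is_tree(p))
```

By the definition of tree given above, the introductory example is itself a tree with root $c$, and the sketch reports it that way.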



Creative Commons License

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.


Insights into mathematical definitions

My general practice with abstractmath.org has been to write about the problems students have at the point where they first start studying abstract math, with some emphasis on the languages of math. I have used my own observations of students, lexicographical work I did in the early 2000’s, and papers written by workers in math ed at the college level.

A few months ago, I finished revising and updating abstractmath.org. This took rather more than a year because among other things I had to reconstitute the files so that the html could be edited directly. During that time I just about quit reading the math ed literature. In the last few weeks I have found several articles that have changed my thinking about some things I wrote in abmath, so now I need to go back and revise some more!

In this post I will make some points about definitions that I learned from the paper by Edwards and Ward and the paper by Selden and Selden.

I hope math ed people will read the final remarks.

Peculiarities of math definitions

“When I use a word, it means just what I choose it to mean–neither more nor less.” (Humpty Dumpty)

A mathematical definition is fundamentally different from other sorts of definitions in two different ways. These differences are not widely appreciated by students or even by mathematicians. The differences cause students a lot of trouble.

List of properties

One of the ways in which a math definition is different from other kinds is that the definition of a math object is given by accumulation of attributes, that is, by listing properties that the object is required to have. Any object defined by the definition must have all those properties, and conversely any object with all the properties must be an example of the type of object being defined. Furthermore, there is no other criterion than the list of attributes.

Definitions in many fields, including some sciences, don’t follow this rule. Those definitions may list some properties the objects defined may have, but exceptions may be allowed. They also sometimes give prototypical examples. Dictionary definitions are generally based on observation of usage in writing and speech.

Imposed by decree

One thing that Edwards and Ward pointed out is that, unlike definitions in most other areas of knowledge, a math definition is stipulated. That means that the meaning of (the name of) a math object is imposed on the reader by decree, rather than being determined by studying the way the word is used, as a lexicographer would do. Mathematicians have the liberty of defining (or redefining) a math object in any way they want, provided it is expressed as a compulsory list of attributes. (When I read the paper by Edwards and Ward, I realized that the abstractmath.org article on math definitions did not spell that out, although it was implicit. I have recently revised it to say something about this, but it needs further work.)

An example is the fact that in the nineteenth century some mathematicians allowed $1$ to be a prime. Eventually they restricted the definition to exclude $1$ because including it made the Fundamental Theorem of Arithmetic complicated to state.

Another example is that it has become common to stipulate codomains as well as domains for functions.

Student difficulties

Giving the math definition low priority

Some beginning abstract math students don’t give the math definition the absolute dictatorial power that it has. They may depend on their understanding of some examples they have studied and actively avoid referring to the definition. Examples of this are given by Edwards and Ward.

Arbitrary bothers them

Students are bothered by definitions that seem arbitrary. This includes the fact that the definition of “prime” excludes $1$. There is of course no rule that says definitions must not seem arbitrary, but the students still need an explanation (when we can give it) about why definitions are specified in the way they are.

What do you DO with a definition?

Some students don’t realize that a definition gives a magic formula — all you have to do is say it out loud.
More generally, the definition of a kind of math object, and also each theorem about it, gives you one or more methods to deal with the type of object.

For example, $n$ is a prime by definition if $n\gt 1$ and the only positive integers that divide $n$ are $1$ and $n$. Now if you know that $p$ is a prime bigger than $10$ then you can say that $p$ is not divisible by $3$ because the definition of prime says so. (In Hogwarts you have to say it in Latin, but that is no longer true in math!) Likewise, if $n\gt10$ and $3$ divides $n$ then you can say that $n$ is not a prime by definition of prime.

The paper by Bills and Tall calls this sort of thing an operable definition.

The paper by Selden and Selden gives a more substantial example using the definition of inverse image. If $f:S\to T$ and $T'\subseteq T$, then by definition, the inverse image $f^{-1}T'$ is the set $\{s\in S\,|\,f(s)\in T'\}$. You now have a magic spell; just say it and it makes something true:

  • If you know $x\in f^{-1}T'$ then you can state that $f(x)\in T'$, and all you need to justify that statement is to say “by definition of inverse image”.
  • If you know $f(x)\in T'$ then you can state that $x\in f^{-1}T'$, using the same magic spell.
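For a finite domain, the definition of inverse image can be typed in almost word for word, and the two uses of the spell become a check that holds for every element. This is a minimal Python sketch; the squaring function, the sets, and the name `inverse_image` are hypothetical choices for the illustration and do not come from the paper.

```python
def inverse_image(func, domain, subset):
    """The set {s in S | f(s) in T'} for a function func on a finite domain S."""
    return {s for s in domain if func(s) in subset}

# Hypothetical example: squaring on S = {-3, ..., 3}, with T' = {1, 4}.
S = range(-3, 4)

def square(s):
    return s * s

T_prime = {1, 4}
pre = inverse_image(square, S, T_prime)
print(sorted(pre))  # [-2, -1, 1, 2]

# Both directions of the spell: x is in the inverse image exactly when f(x) is in T'.
for x in S:
    assert (x in pre) == (square(x) in T_prime)
```

The loop never needs to know anything about squaring; it only uses the definition, which is the point of calling the definition operable.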

Theorems can be operable, too. Wiles’ Theorem wipes out the possibility that there is an integer $n$ for which $n^{42}=365^{42}+666^{42}$. You just quote Wiles’ Theorem — you don’t have to calculate anything. It’s a spell that reveals impossibilities.

What the operability of definitions and theorems means is:

A definition or theorem is not just a static statement, it is a weapon for deducing truth.

Some students do not realize this. The students need to be told what is going on. They do not have to be discarded to become history majors just because they may not have the capability of becoming another Andrew Wiles.

Final remarks

I have a wish that more math ed people would write blog posts or informal articles (like the one by Edwards and Ward) about what they have learned about students learning math at the college level. Math ed people do write scholarly articles, but most of the articles are behind paywalls. We need accessible articles and blog posts aimed at students and others aimed at math teachers.

And feel free to steal other math ed people’s ideas (and credit them in a footnote). That’s what I have been doing in abstractmath.org and in this blog for the last ten years.

References


  • Bills, L., & Tall, D. (1998). Operable definitions in advanced mathematics: The case of the least upper bound. In A. Olivier & K. Newstead (Eds.), Proceedings of the 22nd Conference of the International Group for the Psychology of Mathematics Education, Vol. 2 (pp. 104-111). Stellenbosch, South Africa: University of Stellenbosch.
  • B. S. Edwards and M. B. Ward, Surprises from mathematics education research: Student (mis)use of mathematical definitions (2004). American Mathematical Monthly, 111, 411-424.
  • G. Lakoff, Women, Fire and Dangerous Things. University of Chicago Press, 1990. See his discussion of concepts and prototypes.
  • J. Selden and A. Selden, Proof Construction Perspectives: Structure, Sequences of Actions, and Local Memory, Extended Abstract for KHDM Conference, Hanover, Germany, December 1-4, 2015. This paper may be downloaded from Academia.edu.
  • A Handbook of mathematical discourse, by Charles Wells. See concept, definition, and prototype.
  • Definitions, article in abstractmath.org. (Some of the ideas in this post have now been included in this article, but it is due for another revision.)
  • Definitions in logic and mathematics in Wikipedia.

    Very early difficulties II

    This is the second part of a series of posts about certain difficulties math students have in the very early stages of studying abstract math. The first post, Very early difficulties in studying abstract math, gives some background to the subject and discusses one particular difficulty: Some students do not know that it is worthwhile to try starting a proof by rewriting what is to be proved using the definitions of the terms involved.

    Math StackExchange

    The website Math StackExchange is open to any questions about math, even very easy ones. It is in contrast with MathOverflow, which is aimed at professional mathematicians asking questions in their own field.

    Math SE contains many examples of the early difficulties discussed in this series of posts, and I recommend to math ed people (not just RUME people, since some abstract math occurs in advanced high school courses) that they consider reading through questions on Math SE for examples of the misunderstandings students have.

    There are two caveats:

    • Most questions on Math SE are at a high enough level that they don’t really concern these early difficulties.
    • Many of the questions are so confused that it is hard to pinpoint what is causing the difficulty that the questioner has.

    Connotations of English words

    The term(s) defined in a definition are often given ordinary English words as names, and the beginner automatically associates the connotations of the meaning of the English word with the objects defined in the definition.

    Infinite cardinals

    If $A$ is a finite set, the cardinality of $A$ is simply a natural number (including $0$). If $A$ is a proper subset of another finite set $B$, then the cardinality of $A$ is strictly less than the cardinality of $B$.

    In the nineteenth century, mathematicians extended the definition of cardinality to infinite sets, and for the most part cardinality has the same behavior as for finite sets. For example, the cardinal numbers are well-ordered. However, for infinite sets it is possible for a set and a proper subset of the set to have the same cardinality. For example, the cardinality of the set of natural numbers is the same as the cardinality of the set of rational numbers. This phenomenon causes major cognitive dissonance.

    Question 1331680 on Math Stack Exchange shows an example of this confusion. I have also discussed the problem with cardinality in the abstractmath.org section Cardinality.

    Morphism in category theory

    The concept of category is defined by saying there is a bunch of objects called objects (sorry about that) and a bunch of objects called morphisms, subject to certain axioms. One requirement is that there are functions from morphisms to objects choosing a “domain” and a “codomain” of each morphism. This is spelled out in Category Theory in Wikibooks, and in any other book on category theory.

    The concepts of morphism, domain and codomain in a category are therefore defined by abstract definitions, which means that any property of morphisms and their domains and codomains that is true in every category must follow from the axioms. However, the word “morphism” and the talk about domains and codomains naturally suggests to many students that a morphism must be a function, so they immediately and incorrectly expect to evaluate it at an element of its domain, or to treat it as a function in other ways.

    Example

    If $\mathcal{C}$ is a category, its opposite category $\mathcal{C}^{op}$ is defined this way:

    • The objects of $\mathcal{C}^{op}$ are the objects of $\mathcal{C}$.
    • A morphism $f:X\to Y$ of $\mathcal{C}^{op}$ is a morphism from $Y$ to $X$ of $\mathcal{C}$ (swap the domain and codomain).

    In Question 980933 on Math SE, the questioner is saying (among other things) that in $\text{Set}^{op}$, this would imply that there has to be a morphism from a nonempty set to the empty set. This of course is true, but the questioner is worried that you can’t have a function from a nonempty set to the empty set. That is also true, but what it implies is that in $\text{Set}^{op}$, the morphism from $\{1,2,3\}$ to the empty set is not a function from $\{1,2,3\}$ to the empty set. The morphism exists, but it is not a function. This does not in any sense make the definition of $\text{Set}^{op}$ incorrect.
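One way to see that the morphism exists without being a function is to model the two kinds of morphism as data. The following is a rough Python sketch restricted to finite sets; the class names `Function` and `OpMorphism` are invented for this illustration and are not taken from any category theory library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Function:
    """A morphism of Set between finite sets: it can be evaluated."""
    dom: frozenset
    cod: frozenset
    table: tuple  # pairs (s, f(s)), one for each s in dom

    def __call__(self, s):
        return dict(self.table)[s]

@dataclass(frozen=True)
class OpMorphism:
    """A morphism of Set^op: a function of Set with domain and codomain swapped."""
    underlying: Function

    @property
    def dom(self):
        return self.underlying.cod

    @property
    def cod(self):
        return self.underlying.dom
    # Deliberately no __call__: the definition gives no way to evaluate it.

empty = frozenset()
three = frozenset({1, 2, 3})

# The only function from the empty set to {1,2,3} has an empty table.
unique = Function(dom=empty, cod=three, table=())

# The same data, seen as a morphism from {1,2,3} to the empty set in Set^op.
m = OpMorphism(unique)
print(m.dom == three, m.cod == empty, callable(m))  # True True False
```

The object `m` has a domain and a codomain, which is all the definition of category asks of a morphism, yet there is nothing to apply it to.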

    Student confusion like this tends to make the teacher want to have a one foot by six foot billboard in his classroom saying

    A MORPHISM DOESN’T HAVE TO BE A FUNCTION!

    However, even that statement causes confusion. The questioner who asked Question 1594658 essentially responded to the statement in purple prose above by assuming a morphism that is “not a function” must have two distinct values at some input!

    That questioner is still allowing the connotations of the word “morphism” to lead them to assume something that the definition of category does not give: that the morphism can evaluate elements of the domain to give elements of the codomain.

    So we need a more elaborate poster in the classroom:

    The definition of “category” makes no requirement
    that an object has elements
    or that morphisms evaluate elements.

    As was remarked long long ago, category theory is pointless.

    English words implementing logic

    There are lots of questions about logic that show that students really do not think that the definition of some particular logical construction can possibly be correct. That is why in the abstractmath.org chapter on definitions I inserted this purple prose:

    A definition is a totalitarian dictator.

    It is often the case that you can explain why the definition is worded the way it is, and of course when you can you should. But it is also true that the student has to grovel and obey the definition no matter how weird they think it is.

    Formula and term

    In logic you learn that a formula is a statement with variables in it, for example “$\exists x((x+5)^3\gt2)$”. The expression “$(x+5)^3$” is not a formula because it is not a statement; it is a “term”. But in English, $H_2O$ is a formula, the formula for water. As a result, some students have a remarkably difficult time understanding the difference between “term” and “formula”. I think that is because those students don’t really believe that the definition must be taken seriously.

    Exclusive or

    Question 804250 in MathSE says:

    “Consider $P$ and $Q$. Let $P+Q$ denote exclusive or. Then if $P$ and $Q$ are both true or are both false then $P+Q$ is false. If one of them is true and one of them is false then $P+Q$ is true. By exclusive or I mean $P$ or $Q$ but not both. I have been trying to figure out why the truth table is the way it is. For example if $P$ is true and $Q$ is true then no matter what would it be true?”

    I believe that the questioner is really confused by the plus sign: $P+Q$ ought to be true if $P$ and $Q$ are both true because that’s what the plus sign ought to mean.

    Yes, I know this is about a symbol instead of an English word, but I think the difficulty has the same dynamics as the English-word examples I have given.

    If I have understood this difficulty correctly, it is similar to the students who want to know why $1$ is not a prime number. In that case, there is a good explanation.
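The truth table the questioner describes follows mechanically from the stated definition “$P$ or $Q$ but not both”. A minimal Python sketch, in which the function name `xor` is just a label for this illustration:

```python
from itertools import product

def xor(p, q):
    """Exclusive or, straight from the definition: P or Q, but not both."""
    return (p or q) and not (p and q)

# Print all four rows of the truth table.
for p, q in product([True, False], repeat=2):
    print(p, q, xor(p, q))
```

The row with both inputs true comes out false because of the “but not both” clause, whatever the plus sign may suggest.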

    Only if

    The phrase “only if” simply does not mean the same thing in math as it does in English. In Question 17562 in MathSE, a reader asks the question, why does “$P$ only if $Q$” mean the same as “if $P$ then $Q$” instead of “if $Q$ then $P$”?

    Many answerers wasted a lot of time trying to convince us that “$P$ only if $Q$” means the same as “if $P$ then $Q$” in ordinary English, when in fact it does not. That’s because in English, clauses involving “if” usually connote causation, which does not happen in math English.

    Consider these two pairs of examples.

    1. “I take my umbrella only if it is raining.”
    2. “If I take my umbrella, then it is raining.”
    3. “I flip that switch only if a light comes on.”
    4. “If I flip that switch, a light comes on.”

    The average non-mathematical English speaker will easily believe that (1) and (4) are true, but will balk at (2) and (3). To me, (3) means that the light coming on makes me flip the switch. (2) is more problematical, but it does (to me) have a feeling of causation going the wrong way. It is this difference that causes students to balk at the equivalence in math of “$P$ only if $Q$” and “If $P$, then $Q$”. In math, there is no such thing as causation, and the truth tables for implication force us to live with the fact that these two sentences mean the same thing.
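The equivalence forced by the truth tables can be checked row by row. In this minimal Python sketch, `only_if` encodes the reading “$P$ cannot be true while $Q$ is false”, which is an assumption about how to phrase the math meaning of “only if”; the function names are made up for the illustration.

```python
from itertools import product

def implies(p, q):
    """Truth-functional 'if P then Q': false only when P is true and Q is false."""
    return (not p) or q

def only_if(p, q):
    """'P only if Q' read as: P cannot be true while Q is false."""
    return not (p and not q)

rows = [(p, q, implies(p, q), only_if(p, q))
        for p, q in product([True, False], repeat=2)]
for row in rows:
    print(*row)

# The two columns agree on every row, while the converse 'if Q then P' does not.
print(all(r[2] == r[3] for r in rows))
print(all(implies(q, p) == only_if(p, q) for p, q, _, _ in rows))
```

Nothing in the table mentions umbrellas, switches, or what causes what, which is exactly why the ordinary English intuition does not transfer.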

    Henning Makholm’s answer to Question 17562 begins this way: “I don’t think there’s really anything to understand here. One simply has to learn as a fact that in mathematics jargon the words ‘only if’ invariably encode that particular meaning. It is not really forced by the everyday meanings of ‘only’ and ‘if’ in isolation; it’s just how it is.” That is the best way to answer the question. (Other answerers besides Makholm said something similar.)

    I have also discussed this difficulty (and other difficulties with logic) in the abmath section on “only if”.


    Very early difficulties in studying abstract math

    Introduction

    There are some difficulties that students have at the very beginning of studying abstract math that are overwhelmingly important, not because they are difficult to explain but because too many teachers don’t even know the difficulties exist, or if they do, they think they are trivial and the students should know better without being told. These difficulties cause too many students to give up on abstract math and drop out of STEM courses altogether.

    I spent my entire career in math at Case Western Reserve University. I taught many calculus sections, some courses taken by math majors, and discrete math courses taken mostly by computing science majors. I became aware that some students who may have been A students in calculus essentially fell off a cliff when they had to do the more abstract reasoning involved in discrete math, and in the initial courses in abstract algebra, linear algebra, advanced calculus and logic.

    That experience led me to write the Handbook of Mathematical Discourse and to create the website abstractmath.org. Abstractmath.org in particular grew quite large. It does describe some of the major difficulties that caused good students to fall off the abstraction cliff, but also describes many many minor difficulties. The latter are mostly about the peculiarities of the languages of math.

    I have observed people’s use of language since I was like four or five years old. Not because I consciously wanted to — I just did. When I was a teenager I would have wanted to be a linguist if I had known what linguistics is.

    I will describe one of the major difficulties here (failure to rewrite according to the definition) with an example. I am planning future posts concerning other difficulties that occur specifically at the very beginning of studying abstract math.

    Rewrite according to the definition

    To prove that a statement
    involving some concepts is true,
    start by rewriting the statement
    using the definitions of the concepts.

    Example

    Definition

    A function $f:S\to T$ is surjective if for any $t\in T$ there is an $s\in S$ for which $f(s)=t$.

    Definition

    For a function $f:S\to T$, the image of $f$ is the set \[\{t\in T\,|\,\text{there is an }s\in S\text{ for which }f(s)=t\}\]

    Theorem

    Let $f:S\to T$ be a function between sets. Then $f$ is surjective if and only if the image of $f$ is $T$.

    Proof

    If $f$ is surjective, then the statement “there is an $s\in S$ for which $f(s)=t$” is true for any $t\in T$ by definition of surjectivity. Therefore, by definition of image, the image of $f$ is $T$.

    If the image of $f$ is $T$, then the definition of image means that there is an $s\in S$ for which $f(s)=t$ for any $t\in T$. So by definition of surjective, $f$ is surjective.
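For finite sets, the theorem can be tested exhaustively by writing both definitions as code and comparing them on every function. This is a minimal Python sketch with hypothetical sets $S$ and $T$; it is a sanity check on small cases and not a substitute for the proof above.

```python
from itertools import product

def is_surjective(func, S, T):
    """For any t in T there is an s in S for which f(s) = t."""
    return all(any(func[s] == t for s in S) for t in T)

def image(func, S, T):
    """The set of t in T for which there is an s in S with f(s) = t."""
    return {t for t in T if any(func[s] == t for s in S)}

# Hypothetical small sets chosen for the check.
S = ["a", "b", "c"]
T = [0, 1]

# Every function from S to T, each stored as a dict.
all_functions = [dict(zip(S, values)) for values in product(T, repeat=len(S))]

# The theorem: f is surjective if and only if the image of f is T.
checked = [is_surjective(f, S, T) == (image(f, S, T) == set(T))
           for f in all_functions]
print(len(all_functions), all(checked))  # 8 True
```

Both functions are direct rewrites of the two definitions, so the code mirrors the proof: each side of the comparison is the other with the definition unfolded.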

    “This proof is trivial”

    The response of many mathematicians I know is that this proof is trivial and a student who can’t come up with it doesn’t belong in a university math course. I agree that the proof is trivial. I even agree that such a student is not a likely candidate for getting a Ph.D. in math. But:

    • Most math students in an American university are not going to get a Ph.D. in math. They may be going on in some STEM field or to teach high school math.
    • Some students who are not math majors take courses in which simple proofs are required (particularly discrete math and linear algebra). Some of these students may simply be interested in math for its own sake!

    A sizeable minority of students who are taking a math course requiring proofs need to be told the most elementary facts about how to do proofs. To refuse to explain these facts is a disservice to the mathematics community and adds to the fear and dislike of math that too many people already have.

    These remarks may not apply to students in many countries other than the USA. See When these problems occur.

    “This proof does not describe how mathematicians think”

    The proof I wrote out above does not describe how I would come up with a proof of the statement, which would go something like this: I do math largely in pictures. I envision the image of $f$ as a kind of highlighted area of the codomain of $f$. If $f$ is surjective, the highlighting covers the whole codomain. That’s what the theorem says. I wouldn’t dream of writing out the proof I gave above just to verify that it is true.

    More examples

    Abstractmath.org and Gyre&Gimble contain several spelled-out theorems that start by rewriting according to the definition. In these examples one then goes on to use algebraic manipulation or to quote known theorems to put the proof together.

    Comments

    This post contains testable claims

    Herein, I claim that some things are true of students just beginning abstract math. The claims are based largely on my teaching experience and some statements in the math ed literature. These claims are testable.

    When these problems occur

    In the United States, the problems I describe here occur in the student’s first or second year, in university courses aimed at math majors and other STEM majors. Students typically start university at age 18, and when they start university they may not choose their major until the second year.

    In much of the rest of the world, students are more likely to have one more year in a secondary school (sixth form in England lasts two years) or go to a “college” for a year or two before entering a university, and then they get their bachelor’s degree in three years instead of four as in the USA. Not only that, when they do go to university they enter a particular program immediately — math, computing science, etc.

    These differences may mean that the abstract math cliff occurs early in a student’s university career in the USA and before the student enters university elsewhere.

    In my experience at CWRU, some math majors fall off the cliff, but the percentage of computing science students having trouble was considerably greater. On the other hand, more of them survived the discrete math course when I taught it because the discrete math course contained less abstraction and more computation than the math major courses (except linear algebra, which had a balance similar to the discrete math course and was taken by a sizeable number of non-math majors).

    References

    Creative Commons License

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.


    Names of mathematical objects

    This is a revision of the abstractmath.org article on names.

    The name of a mathematical object is a word or phrase in math English used to identify an object. A name plays the same role that symbolic terms play in the symbolic language.

    Sources of names


    Suggestive English words

    A suggestive name is a common English word or phrase, chosen to suggest its meaning. This means it is a type of metaphor.

    Examples


    In none of these examples is the metaphorical meaning exactly suitable to be the mathematical definition.

    • “Curve”, “point”, “line”, “slope”, “circle” and many other English words are used in elementary math with precise meanings that more or less fit their everyday meanings.
    • Connected subspace (of a topological space). When you draw a picture of a connected set it looks “connected”.
    • “Set” suggests a collection of things and provides a reasonable metaphor for its mathematical meaning. Both the abstractmath article on sets and the Wikipedia article on sets give you insight on why this metaphor cannot be entirely accurate.

    Random English words

      Most English words used in math are not suggestive. They are either chosen at random or were intended to suggest something but misfired in some way.

      Groups

      A group is a collection of math objects with a binary operation defined on it subject to certain constraints. The binary operation is much more important than the underlying set! To many non-mathematicians, a “group” sounds like essentially what a mathematician calls a “set”.

      The concept of group was one of the earliest mathematical concepts described as a set-with-structure. I believe that a group was originally referred to as a “group of transformations”. Maybe that phrase got shortened to “group” without anyone realizing what a disastrous metaphor it caused.

      Fields

      A field in the algebraic sense is a structure which is not in any way suggested by the word “field”. The German word for field in this sense is “Körper”, which means “body”. That is about as bad as “group”, and I suspect it was motivated in much the same way. The name “Körper” may be due to Dedekind. I don’t know who to blame for “field”.

      A field in the sense of an assignment of a scalar or a vector to every point in a space is a completely separate notion from that of a field as an algebraic structure. The concept was invented in the nineteenth century by physicists, but any math student is likely to see fields in this sense in several different courses.

      Perhaps the second meaning of field was suggested by contour plowing.

      The word “field” is also discussed in the Glossary.

      Person’s name

      A concept may be named after a person.

      Examples

      • L’Hôpital’s Rule
      • Hausdorff space
      • Turing machine
      • Riemann surface
      • Riemannian manifold
      • Pythagorean Theorem
      I have no idea why “Riemann” gets an ending when it is a manifold but not when it is a surface.

        Made-up name

        Some names are made up in a random way, not based on any other language. Googol is an example.

        Named after notation

        Symbols

        A mathematical object may be named by the typographical symbol(s) used to denote it. This is used both formally and in on-the-fly references.  

        Some objects have standard names that are single letters (Greek or Roman), such as $e$, $i$ and $\pi$. There is much more about this in Alphabets.

        Be warned that any letter can be given another definition. $\pi$ is also used to name a projection, $i$ is commonly used as an index, and $e$ denotes the elementary charge in physics.

        Expressions

        • The multiplication in a Lie Algebra is called the “Lie bracket”. It is written “$[v,w]$”.
        • In quantum mechanics, a vector $\vec{w}$ may be notated “$|w\rangle$” and called a “ket”. Another vector $\vec{v}$ induces a linear operator on vectors that is denoted by “$\langle v|$”, which is called a “bra”. The action of $\langle v|$ on $|w\rangle$ is the inner product $\langle v|w\rangle$, which suggested the “bra” and “ket” terminology (from “bracket”). You can blame Paul Dirac for this stuff.
        • In 1985, Michael Barr and I published a book in category theory called Triples, Toposes and Theories. Immediately after that everyone in category theory started saying “monad” for what had been called “triple”. (The notation for a triple, er, monad, is of the form “$(T,\eta,\mu)$”.)
        Synecdoche

          A synecdoche is a name of part of something that is used as a name for the whole thing.

          Examples

        The Tocharians appear to have called a cart by their word for wheel several thousand years ago. See the blog post by Don Ringe.

        Names from other languages

        In English, many technical names are borrowed from other languages. It may be difficult to determine what the meaning in the old language has to do with the mathematical meaning.

        Examples

      • Matrix. This is the Latin word for “uterus”. I suppose the analogy is with “container”.
      • Parabola. “Parabola” is a word borrowed from Greek in late Latin, meaning something like “comparison”. The parabola $y=x^2$ “compares” a number with its square: it curves upward because the area of a square grows faster than the length of its side. “Parable” is from the same word.
      • Algebra. This comes from an Arabic word meaning the art of setting joints, or more generally “restore”. It came through Spanish where it once meant “surgical procedure” but that meaning is now obsolete.

      Much of this information comes from The On-Line Etymological Dictionary. (Read its article about “sine”.) See also my articles on secant and tangent.

      I enjoy finding out about etymologies, but I concede that knowing an etymology doesn’t help you very much in understanding the math.

      Names made up from other languages’ roots

      A name may be a new word made out of (usually) Greek or Latin roots.

      Examples

      • Homomorphism. “Homo” in Greek is a root meaning “same” and “morphism” comes from a root referring to shape.
      • Quasiconformal. “Quasi” is a Latin word meaning something like “as if”. It is a prefix mathematicians use a bunch. It usually implies a weakening of the constraints that define the word it is attached to. A map is conformal if it preserves angles in a certain sense, and if it is quasiconformal then it need not preserve angles but it does take circles into ellipses in a certain restricted sense (which conformal maps also do). So it replaces a constraint by a weaker constraint.


      Mathematical names cause problems for students

      The name may suggest the wrong meaning

      This is discussed in detail in the article cognitive dissonance.

      The name may not suggest any meaning

      English is unusual among major languages in the number of technical words borrowed from other languages instead of being made up from native roots.  We have some, listed under suggestive names.  But how can you tell from looking at them what “parabola” or “homomorphism” mean?   This applies to concepts named after people, too: The fact that “Hausdorff” is German for a village near an estate doesn’t tell me what a Hausdorff space is.

      The English word “carnivore” (from Latin roots) can be translated as “Fleischfresser” in German; to a German speaker, that word means literally “meat eater”.  So a question such as “What does a carnivore eat” translates into something like, “What does a meat-eater eat?” 

      Chinese is another language that forms words in that way: see the discussion of “diagonal” in Julia Lan Dai’s blog.  (I stole the carnivore example from her blog, too.)

      The result is that many technical words in English do not suggest their meaning at all to a reader not familiar with the subject.  Of course, in the case of “carnivore” if you know Latin, French or Spanish you are likely to guess the meaning, but it is nevertheless true that English has a kind of elitist stratum of technical words that provide little or no clue to their meaning and Chinese and German do not, at least not so much. This is a problem in all technical fields, not just in math.

      Pronunciation

      There are two main reasons math students have difficulties in pronouncing technical words in math.

      Most students have little knowledge of other languages

      Forty years ago nearly all Ph.D. students had to show mastery in reading math in two foreign languages; this included pronunciation, although that was not emphasized. Today the language requirements in the USA are much weaker, and younger educated Americans are generally weak in foreign languages. As a result, graduate students pronounce foreign names in a variety of ways, some of which attract ridicule from older mathematicians.

      Example: the graduate student at a blackboard who came to the last step of a long proof and announced, “Viola!”, much to the hilarity of his listeners.

      Pronunciation of words from other languages has become unpredictable

      In English-speaking countries until the early twentieth century, the practice was to pronounce a name from another language as if it were English, following the rules of English pronunciation.

      We still pronounce many common math words this way: “Euclid” is pronounced “you-clid” and “parabola” with the second syllable rhyming with “dab”.

      But other words (mostly derived from people’s names) are pronounced using the pronunciation of the language they came from, or what the speaker thinks is the foreign pronunciation. This particularly involves pronouncing “a” as “ah”, “e” like “ay”, and “i” like “ee”.

      Examples
      • Euler (oiler)
      • Fourier (foo-ree-ay)
      • Lagrange (second a pronounced “ah”)
      • Lie (lee)
      • Riemann (ree-monn)

      The older practice of pronunciation is explained by history: In 1100 AD, the rules of pronunciation of English, German and French, in particular, were remarkably similar. Over the centuries, the sound systems changed, and Englishmen, for example, changed their pronunciation of “Lagrange” so that the second syllable rhymes with “range”, whereas the French changed it so that the second vowel is nasalized (and the “n” is not otherwise pronounced) and rhymes with the “a” in “father”.

      German spelling

      The German letters “ä”, “ö” and “ü” may also be spelled “ae”, “oe” and “ue” respectively. It is far better to spell “Möbius” as “Moebius” than to spell it “Mobius”.

      The German letter “ß” may be spelled “ss” and often is by the Swiss. Thus Karl Weierstrass spelled his last name “Weierstraß”. Students sometimes confuse the letter “ß” with “f” or “r”. In English language documents it is probably better to use “ss” than “ß”.

      Transliterations from Cyrillic

      The name of the Russian mathematician most commonly spelled “Chebyshev” in English is also spelled Chebyshov, Chebishev, Chebysheff, Tschebischeff, Tschebyshev, Tschebyscheff and Tschebyschef. (Also Tschebyschew in papers written in German.) The only spelling in the list above that could be said to have some official sanction is “Chebyshev”, which is used by the Library of Congress.

      The correct spelling of his name is “Чебышев” since he was Russian and the Russian language uses the Cyrillic alphabet.

      In spite of the fact that most of the transliterations show the last vowel to be an “e”, the name in Russian is pronounced approximately “chebby-SHOFF”, accent on the last syllable.  Now, that is a ridiculous situation, and it is the transliterators who are ridiculous, not Russian spelling, which in spite of that peculiarity about the Cyrillic letter “e” is much more nearly phonetic than English spelling.

      Some other Russian names have variant spellings (Tychonov, Vinogradov) but Chebyshev probably wins the prize for the most.

    Plurals

    Many authors form the plural of certain technical words using endings from the language from which the words originated. Students may get these wrong, and may sometimes meet with ridicule for doing so.

    Plurals ending in a vowel

    Here are some of the common mathematical terms with vowel plurals.

    singular plural
    automaton automata
    polyhedron polyhedra
    focus foci
    locus loci
    radius radii
    formula formulae
    parabola parabolae
    • Linguists have noted that such plurals seem to be processed differently from s-plurals.  In particular, when used as adjectives, most nouns appear in the singular, but vowel-plural nouns appear in the plural: Compare “automata theory” with “group theory”.  No one says groups theory.  I used to say “automaton theory” but people looked at me funny.
    • The plurals that end in a (of Greek and Latin neuter nouns) are often not recognized as plurals and are therefore used as singulars.  That is how “data” became singular.  This does not seem to happen with my students with the -i plurals and the -ae plurals.
    • In the written literature, the -ae plural appears to be dying, but the -a and -i plurals are hanging on. The commonest -ae plural is “formulae”; other feminine Latin nouns such as “parabola” are usually used with the English plural. In the 1990-1995 issues of six American mathematics journals, I found 829 occurrences of “formulas” and 260 occurrences of “formulae”, in contrast with 17 occurrences of “parabolas” and no occurrences of “parabolae”. (There were only three occurrences of “parabolae” after 1918.)  In contrast, there were 107 occurrences of “polyhedra” and only 14 of “polyhedrons”.
    Plurals in s with modified roots

      singular plural
      matrix matrices
      simplex simplices
      vertex vertices

      Students recognize these as plurals but produce new singulars for the words as back formations. For example, one hears “matricee” and “verticee” as the singular for “matrix” and “vertex”. I have also heard “vertec”.

      Remarks

      It is not unfair to say that some scholars insist on using foreign plurals as a form of one-upmanship. Students and young professors need to be aware of these plurals in their own self interest.

      It appears to me that ridicule and put-down for using standard English plurals instead of foreign plurals, and for mispronouncing foreign names, is much less common than it was thirty years ago. However, I am assured by students that it still happens.


    Recent revisions to abstractmath.org

    For the last six months or so I have been systematically going through the abstractmath.org files, editing them for consistency, updating them, and in some cases making major revisions.

    In the past I have usually posted revised articles here on Gyre&Gimble, but WordPress makes it difficult to simply paste the HTML into the WP editor, because the editor modifies the HTML and does things such as recognizing line breaks and extra spaces, which an HTML interpreter is supposed to ignore.

    Here are two lists of articles that I have revised, with links.

    Major revisions

    Other revised articles

    Other recent changes


    The intent of mathematical assertions

    An assertion in mathematical writing can be a claim, a definition or a constraint.  It may be difficult to determine the intent of the author.  That is discussed briefly here.

    Assertions in math texts can play many different roles.

    English sentences can state facts, ask questions, give commands, and other things.  The intent of an English sentence is often obvious, but sometimes it can be unexpectedly different from what is apparent in the sentence.  For example, the statement “Could you turn the TV down?” is apparently a question expecting a yes or no answer, but in fact it may be a request. (See the Wikipedia article on speech acts.) Such things are normally understood by people who know each other, but people for whom English is a foreign language or who have a different culture have difficulties with them.

    There are some problems of this sort in math English and the symbolic language, too.  An assertion can have the intent of being a claim, a definition, or a constraint.

    Most of the time the intent of an assertion in math is obvious. But there are conventions and special formats that newcomers to abstract math may not recognize, so they misunderstand the point of the assertion. This section takes a brief look at some of the problems.

    Terminology

    The way I am using the words “assertion”, “claim”, and “constraint” is not standard usage in math, logic or linguistics.


    Claims

    In most circumstances, you would expect that if a lecturer or author makes a math assertion, they are claiming that it is a true statement, and you would be right.

    Examples
    1. “The $240$th digit of $\pi$ after the decimal point is $4$.”
    2. “If a function is differentiable, it must be continuous.”
    3. “$7\gt3$”

    Remarks

    • You don’t have to know whether these statements are true or not to recognize them as claims. An incorrect claim is still a claim.
    • The assertion in (1) is a statement, in this case a false one.  If it claimed the googolth digit was $4$ you would never be able to tell whether it is true or not, but it still would be an assertion intended as a claim.
    • The assertion in (2) uses the standard math convention that an indefinite noun phrase (such as “a widget”) in the subject of a sentence is universally quantified (see also the article about “a” in the Glossary). For example, “An integer divisible by $4$ must be even” claims that any integer divisible by $4$ is even. This statement is a claim, and it is true.
    • (3) is a (true) claim in the symbolic language. (Note that “$3 + 4$” is not an assertion at all, much less a claim.)
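
    The convention about indefinite noun phrases can be made explicit by writing out the hidden quantifier. The following is only a sketch in the symbolic language; the symbol $\mid$ for “divides” and the letter $k$ are assumptions made for the illustration.

    ```latex
    \[
    \text{“An integer divisible by $4$ must be even”}
    \quad\text{means}\quad
    \forall n\in\mathbb{Z}\;\bigl(4\mid n \;\Rightarrow\; 2\mid n\bigr).
    \]
    \[
    \text{Justification: if } n=4k \text{ for some integer } k,\ \text{then } n=2(2k).
    \]
    ```

    Written this way, the sentence is visibly a claim about every integer, not about one particular integer.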


    Definitions

    Definitions are discussed primarily in the chapter on definitions. A definition is not the same thing as a claim.

    Example

    The definition

    “An integer is even if it is divisible by $2$”

    makes the claim

    “An integer is even if and only if it is divisible by $2$”

    true.

    (If you are surprised that the definition uses “if” but the claim uses “if and only if”, see the Glossary article on “if”.)

    Unmarked definitions

    Math texts sometimes define something without saying that it is a definition. Because of that, students may sometimes think a claim is a definition.

    Example

    Suppose that the concept of “even integer” was new to you and the book said, “A number is even if it is divisible by $4$.” Perhaps you thought that this was a definition. Later the book refers to $6$ as even and you pull your hair out wondering why. The statement is a correct claim but an incorrect definition. A good writer would write something like “Recall that a number is even if it is divisible by $2$, so that in particular it is even if it is divisible by $4$.”
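
    One way to see the difference between the two readings of that sentence is to write both in the symbolic language. This is only a sketch; the symbol $\mid$ for “divides” is an assumption of the illustration.

    ```latex
    \[
    \text{Read as a claim (true):}\qquad 4\mid n \;\Rightarrow\; n \text{ is even}
    \]
    \[
    \text{Misread as a definition (false):}\qquad n \text{ is even} \;\Leftrightarrow\; 4\mid n
    \]
    \[
    \text{Counterexample to the second reading:}\qquad 6=2\cdot 3 \text{ is even, but } 4\nmid 6 .
    \]
    ```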

    On the other hand, you may think a definition is only a claim.

    Example

    A lecturer may say “By definition, an integer is even if it is divisible by $2$”, and you write down: “An integer is even if it is divisible by $2$”. Later, you get all panicky wondering How did she know that?? (This has happened to me.)

    The confusion in the preceding example can also occur if a book says, “An integer is even if it is divisible by $2$” and you don’t know about the convention that when an author puts a word or phrase in boldface or italics it may mean that they are defining it.

    A good writer always labels definitions


    Constraints

    Here are two assertions that contain variables.

    • “$n$ is even.”
    • “$x\gt1$”.

    Such an assertion is a constraint (or a condition) if the intent is that the assertion will hold in that part of the text (the scope of the constraint). The part of the text in which it holds is usually the immediate vicinity unless the author explicitly says it will hold in a larger part of the text such as “this chapter” or “in the rest of the book”.

    Examples
    • Sometimes the wording makes it clear that the phrase is a constraint. So a statement such as “Suppose $3x^2-2x-5\geq0$” is a constraint on the possible values of $x$.
    • The statement “Suppose $n$ is even” is an explicit requirement that $n$ be even and an implicit requirement that $n$ be an integer.
    • A condition for which you are told to find the solution(s) is a constraint. For example: “Solve the equation $3x^2-2x-5=0$”. This equation is a constraint on the variable $x$. “Solving” the equation means saying explicitly which numbers make the equation true.
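
    As a worked illustration of the equation and the inequality in the examples above, here is one way to make the constraints explicit by factoring. The choice of factoring (rather than, say, the quadratic formula) is just one possible route.

    ```latex
    \[
    3x^2-2x-5=(3x-5)(x+1)=0
    \quad\Longleftrightarrow\quad
    x=\tfrac{5}{3}\ \text{ or }\ x=-1 .
    \]
    \[
    3x^2-2x-5\geq 0
    \quad\Longleftrightarrow\quad
    x\leq -1\ \text{ or }\ x\geq \tfrac{5}{3}.
    \]
    ```

    So the equation constrains $x$ to two values, while the inequality constrains it to two rays.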

    Postconditions

    The constraint may appear in parentheses after the assertion as a postcondition on an assertion.

    Example

    “$x^2\gt x\qquad(\text{all }x\gt1)$”

    which means that if the constraint “$x\gt1$” holds, then “$x^2\gt x$” is true. In other words, for all $x\gt1$, the statement $x^2\gt x$ is true. In this statement, “$x^2\gt x$” is not a constraint, but a claim which is true when the constraint is true.
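
    A short check of this example, written as a sketch: the first line shows why the claim holds under the constraint, and the second shows a value outside the constraint (chosen here only for illustration) where the claim fails.

    ```latex
    \[
    x^2-x=x(x-1)\gt 0 \qquad (\text{all } x\gt 1),
    \]
    \[
    \text{whereas for } x=\tfrac12:\qquad x^2=\tfrac14\lt\tfrac12=x .
    \]
    ```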


    Context

    This is a revised draft of the abstractmath.org article on context in math texts. Note: WordPress changed double primes into quotes. Tsk.

    Context

    Written and especially spoken language depends heavily on the context – the physical surroundings, the preceding conversation, and social and cultural assumptions.  Mathematical statements are produced in such contexts, too, but here I will discuss a special thing that happens in math conversation and writing that does not seem to happen much in other sorts of discourse:

    The meanings of expressions
    in both the symbolic language and math English
    change from phrase to phrase
    as the speaker or writer changes the constraints on them.

    Example

    In a math text, before the occurrence of a phrase such as “Let $n=3$”, $n$ may be known only as an integer variable.  After the phrase, it means specifically $3$.  So this phrase changes the meaning of $n$ by constraining $n$ to be $3$.  We say the context of occurrences of “$n$” before the phrase requires only that $n$ be an integer, but after the occurrence the context requires $n=3$.

    Definition

    In this article, the context at a particular location in mathematical discourse is the sum total of what the reader or listener can know about the symbols and names used in the discourse when they have read everything up to that location.

    Remarks

    • Each clause can change the meaning of or constraints on one or more symbols or names. The conventions in effect during the discourse can also put constraints on the symbols and names.
    • Chierchia and McConnell-Ginet give a mathematical definition of context in the sense described here.
    • The references to “before” and “after” the phrase “Let $n=3$” refer to the physical location in text and to actual time in spoken math. There is more about this phenomenon in the Handbook of Mathematical Discourse, page 252, items (f) and (g).
    • Contextual changes of this sort take place using the pretense that you are reading the text in order, which many students and professionals do not do (they are “grasshoppers”).
    • I am not aware of much context-changing in everyday speech. One place it does occur is in playing games. For example, during some card games the word “trumps” changes meaning from time to time.
    • In symbolic logic, the context at a given place may be denoted by “$\Gamma$”.
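
    As a rough sketch of how such notation might track the example at the start of this section: the primed symbol $\Gamma'$ and the presentation of a context as a set of constraints are assumptions of this illustration, not a convention taken from the references.

    ```latex
    \[
    \Gamma=\{\,n\in\mathbb{Z}\,\}
    \quad\xrightarrow{\ \text{“Let } n=3\text{”}\ }\quad
    \Gamma'=\{\,n\in\mathbb{Z},\ n=3\,\}
    \]
    ```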

    Detailed example of a math text

    Here is a typical example of a theorem and its proof.  It is printed twice, the second time with comments about the changes of context.  This is the same proof that is already analyzed practically to death in the chapter on presentation of proofs.

    First time through

    Definition: Divides

    Let $m$ and $n$ be integers with $m\ne 0$. The statement “$m$ divides $n$” means that there is an integer $q$ for which $n=qm$.

    Theorem

    Let $m$, $n$ and $p$ be integers, with $m$ and $n$ nonzero, and suppose $m$ divides $n$ and $n$ divides $p$.  Then $m$ divides $p$.

    Proof

    By definition of divides, there are integers $q$ and $q'$ for which $n=qm$ and $p=q'n$. We must prove that there is an integer $q''$ for which $p=q''m$. But $p=q'n=q'qm$, so let $q''=q'q$.  Then $p=q''m$.

    Second time, with analysis

    Definition: Divides

    Begins a definition. The word “divides” is the word being defined. The scope of the definition is the following paragraph.

    Let $m$ and $n$ be integers

    $m$ and $n$ are new symbols in this discourse, constrained to be integers.

    with $m\ne 0$

    Another constraint on $m$.

    The statement “$m$ divides $n$” means that

    This phrase means that what follows is the definition of “$m$ divides $n$”

    there is an integer $q$

    “There is” signals that we are beginning an existence statement and that $q$ is the bound variable within the existence statement.

    for which $n=qm$

    Now we know that “$m$ divides $n$” and “there is an integer $q$ for which $n=qm$” are equivalent statements.  Notes: (1) The first statement would only have implied the second statement if this had not been in the context of a definition. (2) After the conclusion of the definition, $m$, $n$ and $q$ are undefined variables.

    Theorem

    This announces that the next paragraph is a statement that has been proved. In fact, in real time the statement was proved long before this discourse was written, but in terms of reading the text in order, it has not yet been proved.

    Let $m$, $n$ and $p$ be integers,

    “Let” tells us that the following statement is the hypothesis of an implication, so we can assume that $m$, $n$ and $p$ are all integers.  This changes the status of $m$ and $n$, which were variables used in the preceding paragraph, but whose constraints disappeared at the end of the paragraph.  We are starting over with $m$ and $n$.

    with $m$ and $n$ nonzero,

    This clause is also part of the hypothesis. We can assume $m$ and $n$ are constrained to be nonzero.

    and suppose $m$ divides $n$ and $n$ divides $p$.

    This is the last clause in the hypothesis. We can assume that $m$ divides $n$ and $n$ divides $p$.

    Then $m$ divides $p$.

    This is a claim that $m$ divides $p$. It has a different status from the assumptions that $m$ divides $n$ and $n$ divides $p$. If we are going to follow the proof we have to treat $m$ and $n$ as if they divide $n$ and $p$ respectively. However, we can’t treat $m$ as if it divides $p$. All we know is that the author is claiming that $m$ divides $p$, given the facts in the hypothesis.

    Proof

    An announcement that a proof is about to begin, meaning a chain of math reasoning. The fact that it is a proof of the Theorem just stated is not explicitly stated.

    By definition of divides, there are integers $q$ and $q'$ for which $n=qm$ and $p=q'n$.

    The proof uses the direct method (rather than contradiction or induction or some other method) and begins by rewriting the hypothesis using the definition of “divides”. The proof does not announce the use of these techniques; it just starts doing so. So $q$ and $q'$ are new symbols that satisfy the equations $n=qm$ and $p=q'n$. The phrase “by definition of divides” justifies the introduction of $q$ and $q'$. $m$, $n$ and $p$ have already been introduced in the statement of the Theorem.

    We must prove that there is an integer $q''$ for which $p=q''m$.

    Introduces a new variable $q''$ which has not been given a value. We must define it so that $p=q''m$; this requirement is justified (without saying so) by the definition of “divides”.

    But $p=q'n=q'qm$,

    This is a claim about $p$, $q$, $q'$, $m$ and $n$.  It is justified by certain preceding sentences but this justification is not made explicit. Note that “$p=q'n=q'qm$” pivots on $q'n$, in other words makes two claims about it.

    so let $q''=q'q$.

    We have already introduced $q''$; now we give it the value $q''=q'q$.

    Then $p=q''m$

    This is an assertion about $p$, $q''$ and $m$, justified (but not explicitly; note the hidden use of associativity) by the previous claim that $p=q'n=q'qm$.

     

    The proof is now complete, although no statement asserts that it is.
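
    For readers who want the hidden steps spelled out, the chain of equalities in the proof can be annotated as follows. This is a sketch; the numerical instance $m=2$, $n=6$, $p=30$ is chosen here purely for illustration and is not part of the original proof.

    ```latex
    \begin{align*}
    p &= q'n        && \text{definition of “$n$ divides $p$”}\\
      &= q'(qm)     && \text{definition of “$m$ divides $n$”}\\
      &= (q'q)m     && \text{associativity of multiplication}\\
      &= q''m       && \text{setting } q''=q'q .
    \end{align*}
    \[
    \text{Instance: } m=2,\ n=6,\ p=30:\quad q=3,\ q'=5,\ q''=q'q=15,\quad 30=15\cdot 2 .
    \]
    ```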

    Remark

    If you have some skill in reading proofs, all the stuff in the right hand column happens in your brain without, for the most part, your being conscious of it.

    Acknowledgment

    Thanks to Chris Smith for correcting errors.

    References for “context”

    Chierchia, G. and S. McConnell-Ginet (1990), Meaning and Grammar. The MIT Press.

    de Bruijn, N. G. (1994), “The mathematical vernacular, a language for mathematics with typed sets”. In Selected Papers on Automath, Nederpelt, R. P., J. H. Geuvers, and R. C. de Vrijer, editors, volume 133 of Studies in Logic and the Foundations of Mathematics, pages 865–935. Elsevier.

    Steenrod, N. E., P. R. Halmos, M. M. Schiffer, and J. A. Dieudonné (1975), How to Write Mathematics. American Mathematical Society.


    The Mathematics Depository: A Proposal

    Introduction

    This post is about taking texts written in mathematical English and the symbolic language and encoding them in a formal language that could be tested by an automated proof verifier. This is a very difficult undertaking, but we could get closer and closer to a working system by a worldwide effort continuing over, probably, decades. The system would have to contain many components working together to create incremental improvements in the process.

    This post, which is a first draft, outlines some suggestions as to how this could work. I do not discuss the encoding required, which is not my area of expertise. Yes, I understand that coding is the hard part!

    Much work has been done by computing scientists in developing proof checking and proof-finding programs. Work has also been done, primarily by math education workers but also by some philosophers and computing scientists, in uncovering the many areas where ordinary math language is ambiguous and deviates from ordinary English usage. These characteristics confuse students and also make it hard to design a program that can interpret the language. I have been working in that area mostly from the math ed point of view for the last twenty years.

    The Reference section lists many works on the problem of parsing mathematical English, some from the point of view of automatically translating math language into code, but most from the point of view of helping students learn to understand it.

    The Mathematics Depository

    I imagine a system for converting documents written in math language into machine-readable language and testing their claims. An organization, call it the Mathematics Depository, would be developed that is supported by many countries, organizations and individual supporters. It should consist of several components listed below, no doubt with other components as we become aware of needing them. The organization would be tasked with supporting and improving these components over time.

    The main parts of the system

    Each component is linked to a more detailed description that is given later in this post.

    • A Proof Verifier (PV), that inputs a proof and determines if it is correct.
    • A specification of a supported subset of Mathematical English and the symbolic language, that I will call Strict Math English (SME).
    • A Text-SME Converter, a program that would input a text written in ordinary math English that has been annotated by a knowledgeable person and convert it into SME.
    • An SME-PV Converter that will convert text written in SME into code that can be directly read by the Proof Verifier.
    • One or more Automatic Theorem Provers, that to begin with can take fairly simple conjectures written in SME and sometimes succeed in proving them.
    • An Annotation System containing an Annotation Editor that would allow a person to use SME to annotate an article written in ordinary math English so that it could be read by the Text-SME Converter.
    • A Data Base that would include the texts that have been collected in this endeavor, along with the annotations and the results of the proof checking.
    • A Data Base Miner that would watch for patterns in the annotations as new papers were submitted. The operators might also program it to watch for patterns in other aspects of the operation.

    These facilities would be organized so that the systems work together, with the result that the individual components I named improve over time, both automatically and via human intervention.

    Flow of Work

    1. A math text is submitted.
    2. If it is already in Strict Math English (SME), it is input to the Proof Verifier (PV).
    3. Otherwise, the math text is input into the Annotation System.
    4. The resulting SME text is input into the Text-SME Converter.
    5. The output of the Text-SME Converter is input into the Proof Verifier.
    6. The PV incorporates each definition in the text into the context of the math text. This is a specific meaning of the word “context”, including a list of the status of variables (bound, unbound, type, and so on), meanings of technical words, and other facts created in the text. “Context” is described informally in my article Context in abstractmath.org. That article gives references to the formal literature.
    7. In my experience mathematicians spend only a little time reading arguments step by step as described in the Context article. They usually look at a theorem and try to figure it out themselves, “cheating” occasionally by glancing at parts of the proof.

    8. Each mathematical assertion in the text is marked as a claim.
    9. The checking process records those claims occurring in the proof that are not proved in the text, along with any references given to other texts.
    10. If a reference to a result in another text is made, the PV looks for the result in the Database. If it does not find it, the PV incorporates the result and its location in the Database as an externally proven but untested claim.
    11. If no reference or proof for a claim is given, the PV checks the Database to see if it has already been proved.
    12. Any claim in the current text not shown as proven in the Database is submitted to the Automatic Theorem Prover (ATP). The output of the ATP is put in the database (proved, counterexample found, or unable to determine truth).
    13. If a segment of text is presented as a proof, it is input into the PV to be verified.
    14. The PV reports the result for each claimed proof, which can consist of several possibilities:
      • A counterexample is found, so the claim that the proof was supposed to establish is false.
      • The proof contains gaps, so the claim is unsettled.
      • The proof is reported as correct.
    15. At the end of the process, all the information gathered is put into the Database:
      • The original text showing all the annotations.
      • The text in SME.
      • All claims, with their status (proven true, proven false, truth unknown, reference if one was given).
      • Every proof, with its status and the entire context at each step of the proof.
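
Steps 10 to 12 of the workflow can be read in more than one way. The following is a minimal sketch of one possible reading, not a design for the actual Proof Verifier. The names `Status`, `resolve_claim`, `database` and `prover` are all hypothetical: the database is modeled as a plain dictionary and the Automatic Theorem Prover as any function that returns a status.

```python
from enum import Enum


class Status(Enum):
    PROVEN = "proven true"
    REFUTED = "proven false"
    UNKNOWN = "truth unknown"
    EXTERNAL_UNTESTED = "externally proven but untested"


def resolve_claim(claim, reference, database, prover):
    """Decide the status of a claim that the text itself does not prove.

    claim     -- hashable stand-in for the claim (hypothetical encoding)
    reference -- a citation given in the text, or None
    database  -- dict mapping claims to Status (stand-in for the Data Base)
    prover    -- callable standing in for an Automatic Theorem Prover
    """
    # Steps 10-11: a claim already proven in the database needs no more work.
    if database.get(claim) is Status.PROVEN:
        return Status.PROVEN
    # Step 12: anything not shown as proven goes to the theorem prover.
    result = prover(claim)
    # Step 10: a cited result the prover cannot settle is kept as
    # "externally proven but untested" (an assumption of this sketch).
    if result is Status.UNKNOWN and reference is not None:
        result = Status.EXTERNAL_UNTESTED
    database[claim] = result
    return result
```

In this reading a cited result is still offered to the theorem prover; a real system might prefer to skip that step for cited results.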

    Details

    The proof verifier

    • Proof checking programs have been developed over the last thirty or so years. The MD should write or adapt one or more Proof Verifiers and improve it incrementally as a result of experience in running the system. In this post I have assumed the use of just one Proof Verifier.
    • The Proof Verifier should be designed to read the output of the SME-PV converter.
    • The PV must read a whole math text in SME, identify and record each claim and check each proof (among other things). This is different from current proof verifiers, which take exactly one proof as input.
    • The PV must create the context of each proof and change it step by step as it reads each syntactic fragment of the math text.
    • Typically the context for a claimed proof is built up in the whole math text, not just in the part called “Proof”.
    • The PV should automatically query the Data Base for unproved steps in a proof in the input text to see if they have already been verified somewhere else. These results should be quoted in a proof verifier output.
    • The PV should also automatically submit steps in the proof that haven’t been verified to the Automatic Theorem Provers and wait for the step to be verified or not.
    • The Proof Verifier should output details of the result of the checking whether it succeeded in verifying the whole input text or not. In particular, it should list steps in proofs it failed to verify, including steps in proofs for which the input text cited the proof in some other paper, in the MD system or not.
    • The Proof Verifier should be available online for anyone to submit, in SME, a mathematical text claiming to prove a theorem. Submission might require a small charge.

    Strict Math English

    • One of the most important aspects of the system would be the simultaneous incremental updating of the SME and the SME-PV Converter.
    • The idea is that SME would get more and more inclusive of the phrases and clauses it allows.

    Example: Universal Assertions

    At the start SME might allow these statements to be recognized as the same universal assertion:

    • “$\forall x(x^2+1\gt0)$”
    • “For all [every, any] $x$, $x^2+1\gt0$.” (universality asserted using an English word.)
    • “For all [every, any] $x$, $x^2+1$ is positive.”

    As time goes on, a person or the Data Base Miner might detect that many annotators also recognized these statements as saying the same thing:

    • “$x^2+1\gt0\,\,\,\,\,(\text{all } x)$” (as a displayed statement)
    • “$x^2+1$ is positive for every $x$.” Universality asserted using an adjective in a postposited phrase.
    • “$x^2+1$ is always positive.” Universality hidden in a postposited adverb that seems to be referring to time!
    • There are more examples in my article Universally True Assertions. See also Susanna Epp’s article on quantification for other problems in this area.

    These other variations would then be added to Strict Math English. (This is only an example of how the system would evolve. I have no doubt that in fact all the terminology mentioned above would be included at the outset, since they are all documented in the math ed literature.)
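
As a rough illustration of what "recognized as the same universal assertion" could mean, here is a small sketch that maps two of the surface forms above to one canonical triple. The pattern list and the function name `canonical_form` are hypothetical, the input is plain text rather than TeX, and real math English would need far more than two regular expressions.

```python
import re

# Hypothetical patterns; each extracts the bound variable and the body.
PATTERNS = [
    # "For all x, P." / "For every x, P." / "For any x, P."
    re.compile(r"^For (?:all|every|any) (?P<var>\w+), (?P<body>.+?)\.?$"),
    # "P for every x."  (universality in a postposited phrase)
    re.compile(r"^(?P<body>.+?) for (?:all|every|any) (?P<var>\w+)\.?$"),
]


def canonical_form(sentence):
    """Return ('forall', variable, body) if the sentence is recognized, else None."""
    for pattern in PATTERNS:
        match = pattern.match(sentence.strip())
        if match:
            return ("forall", match.group("var"), match.group("body"))
    return None


forms = [
    canonical_form("For all x, x^2+1 > 0."),
    canonical_form("For every x, x^2+1 > 0."),
    canonical_form("x^2+1 > 0 for every x."),
]
```

All three sentences yield the same triple. Variants such as "is positive" or "is always positive" are not handled here; recognizing them is exactly the kind of extension the text describes.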

    Even at the start, SME will include phrases and clauses in the English language as well as symbolic expressions. It is notorious that automatically parsing general English sentences is difficult and that the ubiquity of metaphors makes it essentially impossible to reliably construct the meaning of a sentence. That is why SME must start with a very narrow subset of math English. But even in early days, it should include some stereotyped metaphors, such as using “always” in universal assertions.

    The SME-PV Converter

    • The SME-PV Converter would read documents written in SME and convert them into code readable by the proof checking program, as well as by the automatic theorem provers.
    • Such a program is essentially the subject of Ganesalingam’s book.
    • Converting SME so that the Proof Verifier can handle it involves lots of subtleties. For example, if the text says, “For any $x$, $x^2+1\gt0$”, the translation has to recognize not only that this is a universally quantified statement with $x$ as the bound variable, but that $x$ must be a real number, since complex numbers don’t do greater-than.
    • Frequent revisions of the SME-PV Converter will be necessary since its input language, the SME, will be constantly expanded.
    • It may be that the output language of the SME-PV Converter (which the Proof Verifier and Automatic Theorem Provers read) will require only infrequent revisions.

    The Automatic Theorem Provers

    • The system could support several ATPs, each one adapted to read the output of the SME-PV Converter.
    • The Automatic Theorem Provers should provide output in such a way that the Proof Verifier can include in its report the positive or negative results of the Theorem Prover in detail.

    The Annotation System

    • The Annotation system would facilitate construction of a data structure that connects each annotation to the specific piece of text it rewrites. The linking should be facilitated by the Annotation Editor.
    • For example, an annotation that is meant to explain that the statement (in the input text) “$x^2+1$ is always greater than $0$” is to be translated as “$\forall x(x^2+1\gt0)$” (which is presumably allowed by SME) should cause the first statement to be linked to the second statement. The first statement, the one in the input text, should not be changed. This will enable the Data Base Miner to find patterns of similar text being annotated in similar ways.
    • The annotations should clarify words, symbolic expressions and sentences in the input text to allow the Proof Verifier to input them correctly.
    • In particular, every claim that a statement is true should be marked as a proposed theorem, and similarly every proof should be marked as a proof and every definition should be marked as a definition. Such labeling is often omitted in the math literature. Annotators would have to recognize segments of the text as claims, proofs and definitions and annotate them as such.
    • The annotations would be written in the current version of Strict Math English. Since SME is frequently updated, the instructions for the annotator would also have to be frequently updated.
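
One simple way to link an annotation to the exact piece of text it rewrites, without changing that text, is to store character offsets. This is only a sketch under that assumption; the class `Annotation`, its fields, and the helper `annotated_span` are hypothetical names, not part of any existing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    start: int  # offset of the first annotated character in the input text
    end: int    # offset one past the last annotated character
    kind: str   # for example "claim", "proof" or "definition"
    sme: str    # the annotator's rendering in Strict Math English


def annotated_span(text, annotation):
    """Return the piece of the unchanged input text that the annotation refers to."""
    return text[annotation.start:annotation.end]


text = "Note that x^2+1 is always greater than 0."
phrase = "x^2+1 is always greater than 0"
start = text.index(phrase)
annotation = Annotation(start, start + len(phrase), "claim", "forall x (x^2+1 > 0)")
```

Because the input text is never modified, many annotations can point into the same document, and a miner can later compare the annotated spans across papers.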

    Examples

    • If a paper used the word “domain” without defining it, the annotator would clarify whether it meant an open connected set, a type of ring, a type of poset, or the domain of a function. See Example 1.
    • Annotators will note instances in which the same text will use a symbol with two different meanings. See Example 2.
    • In a phrase, a single occurrence of a symbol can require an annotation that assigns more than one attribute to the symbol. See Example 3.

    The Annotation Editor

    • The annotators should be provided with an Annotation Editor designed specifically for annotation.
    • The editor should link each annotation to the exact phrase it annotates, in a way that is easy for a person reading the annotated document to follow and that also provides the information the Text-SME Converter needs.

    The Annotators

    • Great demands will be made of an annotator.
    • They must understand the detailed meaning of the text they annotate. This means they must be quite familiar with the field of math the text is concerned with.
    • They must learn SME. I know for a fact that many mathematicians are not good at learning foreign languages. It will help that SME will be a subset of the full language of math.
    • All this means that annotators must be chosen carefully and paid well. This means that not very many papers will get annotated by paid annotators, so that there will have to be some committee that chooses the papers to be annotated. This will be a genuine bottleneck.
    • One thing that will help in the long run is that the SME should evolve to include more features of the general language of math, so many mathematicians will actually write their papers in SME and submit them directly to the Depository. (“Long run” may mean more than ten years.)

    The Text-to-SME Converter

    • This converter takes a math text in ordinary Math English that has been annotated and converts it into SME.
    • The format for feeding it to the Automatic Theorem Prover may very well have to be different from the format to be read by a human. Both formats should be saved.

    The Data Base

    • The Data Base would contain all math papers that have been run through the Proof Verifier, along with the results found by the Proof Verifier. A paper should be included whether or not every claim in the paper was verified.
    • Funding agencies (and private individuals) might choose particularly important papers and pay more money for annotation for those than for other papers.
    • Mathematicians in a particular field could be hired to annotate particular articles in their field, using a standard annotation language that would develop through time.
    • The annotated papers would be made freely available to the public.
    • It will no doubt prove useful for the Data Base to contain many other items. Possibilities:
      • A searchable list of all theorems that have been verified.
      • A glossary: a list of math words that have been defined in the papers in the Depository. This will include synonyms and words with multiple meanings.

    The Data Base Miner

    Watch for patterns

    The DBM would watch for patterns in annotation as new annotated papers were submitted. It should probably look only at annotated papers whose proofs had been verified. The patterns might include:

    • Correlation between annotations that associate particular meanings to particular words or symbols with the branch of math the paper belongs to. See Example 1.
    • Noting that a particular format of combining symbols usually results in the same kind of annotation. See Example 4.
    • Providing data in such a way that lexicographers studying math English could make use of them. My Handbook began with my doing lexicographical research on math English, but I found it so slow that when I started abstractmath.org I resolved not to do such research any more. Nevertheless, it needs to be done and the Database should make the process much easier.

    Statistical translation

    Since the annotated papers will be stored in the Data Base, the Data Base Miner could use the annotations in somewhat the same way some language translators work (in part): to translate a phrase, it will find occurrences of the phrase in the source language that have been translated into the target language and use the most common translation. In this case the source language is the paper (in English) and the target language is in annotated math English readable by the Proof Verifier. Once the Database includes most of the papers ever published (twenty years from now?), statistical translation might actually become useful.
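
The "use the most common translation" idea can be sketched in a few lines. This is a toy illustration, not how a production translator works; the function names and the sample pairs are made up for the example.

```python
from collections import Counter, defaultdict


def build_table(pairs):
    """Count how often each source phrase received each annotation."""
    table = defaultdict(Counter)
    for phrase, annotation in pairs:
        table[phrase][annotation] += 1
    return table


def most_common_annotation(table, phrase):
    """Return the annotation most often given to the phrase, or None if unseen."""
    counts = table.get(phrase)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up sample data: (phrase in the paper, annotation in SME).
sample_pairs = [
    ("x^2+1 is always positive", "forall x (x^2+1 > 0)"),
    ("x^2+1 is always positive", "forall x (x^2+1 > 0)"),
    ("x^2+1 is always positive", "forall t (x^2+1 > 0)"),
]
table = build_table(sample_pairs)
suggestion = most_common_annotation(table, "x^2+1 is always positive")
```

A suggestion found this way would still need checking by a person or by the Proof Verifier, since the most common annotation is not always the right one in a given paper.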

    Examples

    Example 1: Meaning varies with branch of math

    • “Field” means one thing in an algebra paper and another in a mathematical physics paper.
    • “Domain” means
      • An open connected set in topology.
      • A type of ring in algebra.
      • A type of poset in theoretical computing science.
      • The domain of a function, everywhere in math, which makes it seem that this is going to be very hard to distinguish without human help!
    • “Log” usually implies base $2$ in the computing world, base $10$ in engineering (but I am not sure how prevalent this meaning is there), and base $e$ in pure math. With exceptions!

    Example 2: Meaning varies even in the same article

    • The notation “$(a,b)$” can mean an ordered pair, an open interval, or the GCD. What’s worse, there are many instances where the symbol is used without definition. Citation 139 in the Handbook provides a single sentence in which the last two meanings both occur:

      $\dots$ Richard Darst and Gerald Taylor investigated the differentiability of functions $f^p$ (which for our purposes we will restrict to $(0,1)$) defined for each $p\geq1$ by\[F(x):=
      \begin{cases}
      0 &
      \text{if }x\text{ is irrational}\\
      \displaystyle{\frac{1}{n^p}} &
      \text{if }x = \displaystyle{\frac{m}{n}}\text{ with }(m,n)=1\\ \end{cases}\]

      The sad thing is that any mathematician will know immediately what each occurrence means. This may be a case where the correct annotation will never be automatically detectable.

    Example 3: One mention of a symbol may require several meanings

    In the sentence, “This infinite series converges to $\zeta(2)=\frac{\pi^2}{6}\approx 1.645$,” the annotator would provide two pieces of information about “$\frac{\pi^2}{6}$”, namely that it is both the right constituent of the equation “$\zeta(2)=\frac{\pi^2}{6}$” and the left constituent of the approximation statement “$\frac{\pi^2}{6}\approx 1.645$”, and that these two statements are the constituents of an asserted conjunction. (See my post Pivoted symbols.)

    Example 4: Function to a power

    Some expressions not in the SME will almost always be annotated in the same way. This makes the pattern discoverable by the Data Base Miner.

    • “$\sin^{-1}x$” always means $\arcsin x$.
    • For positive $n$, “$\sin^n x$” always means $(\sin x)^n$. It never means the $n$-fold application of $\sin$ to $x$.
    • In contrast, for an arbitrary function symbol, $f^n(x)$ will often be annotated as $n$-fold application of $f$ and also often as $f(x)^n$. (And maybe those last two possibilities are correlated by branch of math.)
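
The two readings of $f^n(x)$ really are different functions, which is why the annotation matters. The following sketch compares them numerically for $\sin$; the helper names `power_of_value` and `iterate` are chosen for this example only.

```python
import math


def power_of_value(f, n, x):
    """The reading f(x)^n: apply f once, then raise the result to the n-th power."""
    return f(x) ** n


def iterate(f, n, x):
    """The reading 'n-fold application': apply f to x repeatedly, n times."""
    for _ in range(n):
        x = f(x)
    return x


x = 1.0
sin_squared = power_of_value(math.sin, 2, x)  # (sin 1)^2, the usual meaning of sin^2 x
sin_of_sin = iterate(math.sin, 2, x)          # sin(sin 1), the meaning sin^2 x never has
```

At $x=1$ the two values differ, so a Proof Verifier that picked the wrong reading would be checking a different statement.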

    References

    I believe that work in formal verification has tended to overlook the work on math language difficulties in math ed, so I have included some articles from that specialty.

    The following are posts from my blog Gyre&Gimble. They are in reverse chronological order.

    Creative Commons License

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.



    The real numbers

    My website abstractmath.org contains separate short articles about certain number systems (natural numbers, integers, rationals, reals). The intent of each article is to discuss problems that students have when they begin studying abstract math. The articles do not give complete coverage of each system. They contain links when concepts are mentioned that the reader might not be familiar with.

    This post is a revision of the abstractmath.org article on the real numbers. The other articles have also been recently revised.

    Introduction

    A real number is a number that can be represented as a (possibly infinite) decimal expansion, such as 2.56, -3 (which is -3.0), 1/3 (which has the infinite decimal expansion 0.333…), and $\pi$. Every integer and every rational number is a real number, but numbers such as $\sqrt{2}$ and $\pi$ are real numbers that are not rational.

    • I will not give a mathematical definition of “real number”.  There are several equivalent definitions of real number all of which are quite complicated.   Mathematicians rarely think about real numbers in terms of these definitions; what they have in mind when they work with them are their familiar algebraic and topological properties.
    • “Real number” is a technical term.  Real numbers are not any more “genuine” than any other numbers.
    • Integers and rational numbers are real numbers, but there are real numbers that are not integers or rationals. One such number is $\sqrt{2}$. Such numbers are called irrational numbers.

    Properties of the real numbers

    Closure

    The real numbers are closed under addition, subtraction, and multiplication, as well as division by a nonzero number.

    Notice that these are exactly the same arithmetic closure properties that rational numbers have. In the previous sections in this chapter on numbers, each new number system (natural numbers, integers and rational numbers) was closed under more arithmetic operations than the earlier ones. We don’t appear to have gained anything concerning arithmetic operations in going from the rationals to the reals.

    The real numbers do allow you to find zeroes of some polynomials that don’t have rational zeroes. For example, the equation $x^2-2=0$ has the root $x=\sqrt{2}$, which is a real number but not a rational number. However, you get only some zeroes of polynomials by going to the reals — consider the equation $x^2+2=0$, which requires going to the complex numbers to get a root.
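
Both halves of that paragraph can be checked by a short computation. The sketch below uses the rational root theorem (a standard result not stated in the text above): any rational zero $p/q$ in lowest terms of a polynomial with integer coefficients has $p$ dividing the constant term and $q$ dividing the leading coefficient. The function name is made up for this example.

```python
import cmath
from fractions import Fraction


def rational_root_candidates(constant, leading):
    """All fractions p/q with p dividing `constant` and q dividing `leading`.

    Assumes both arguments are nonzero integers.
    """
    def divisors(n):
        n = abs(n)
        return [d for d in range(1, n + 1) if n % d == 0]

    candidates = set()
    for p in divisors(constant):
        for q in divisors(leading):
            candidates.add(Fraction(p, q))
            candidates.add(Fraction(-p, q))
    return candidates


# x^2 - 2: the only candidates are 1, -1, 2, -2, and none of them is a zero.
candidates = rational_root_candidates(-2, 1)
has_rational_zero = any(r * r - 2 == 0 for r in candidates)

# x^2 + 2: a zero exists only among the complex numbers.
complex_zero = cmath.sqrt(-2)
```

Here `has_rational_zero` comes out false, and `complex_zero` is a purely imaginary number whose square is $-2$ up to floating-point rounding.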

    Closed under limits

    The real numbers are closed under another operation (not an algebraic operation) that rational numbers are not closed under:

    The real numbers are closed under taking limits.
    That fact is the primary reason real numbers are so important
    in math, science and engineering.

    Consider: The concepts of continuous function, derivative and integral — the basic ideas in calculus and differential equations — are all defined in terms of limits. Those are the basic building blocks of mathematical analysis, which provides most of the mathematical tools used by scientists and engineers.

    Some images and metaphors for real numbers

    Line segments

    The length of any line segment is given by a positive real number.

     

    Example

    The diagonal of the square above has length $2\sqrt{2}$.

    Directed line segments

    Measuring directed line segments requires the use of negative real numbers as well as positive ones. You can regard the diagonal above as a directed line segment. If you regard “left to right” as the positive direction (which is what we usually do), then if you measure it from right to left you get $-2\sqrt{2}$.

    Real numbers are quantities

    Real numbers are used to measure continuous variable quantities.

    Examples
    • The temperature at a given place and a given time.
    • The speed of a moving car.
    • The amount of water in a particular jar.

    Remarks

    • Temperature, speed, volume of water are thought of as quantities that can change, or be changed, which is why I called them “variable” quantities.
    • The name “continuous” for these quantities indicates that the quantity can change from one value to another without “jumping”. (This is a metaphor, not a mathematical definition!)
    Example

    If you have $1.334 \text{ cm}^3$ of water in a jar you can add any additional small amount into it or you can withdraw any small amount from it.  The volume does not suddenly jump from $1.334$ to $1.335$ – as you put in the water it goes up gradually from $1.334$ to $1.335$.

    Caveat

    This explanation of “continuous quantity” is done in terms of how we think about continuous quantities, not in terms of a mathematical definition.  In fact, since you can’t measure an amount smaller than one molecule of water, the volume does jump up in tiny discrete amounts.   Because of quantum phenomena, temperature and speed change in tiny jumps, too (much tinier than molecules). 

    Quantum jumps and individual molecules are ignored in large-scale physical applications because the scale at which they occur is so tiny it doesn’t matter.  For such applications, physicists and chemists (and cooks and traffic policemen!) think of the quantities they are measuring as continuous, even though at tiny scales they are not.

    The fact that scientists and engineers treat changes of physical quantities as continuous, ignoring the fact that they are not continuous at tiny scales, is sometimes called the “continuum hypothesis”. This is not what mathematicians mean by that phrase: see continuum hypothesis in Wikipedia.

    The real line

    It is useful to visualize the set of real numbers as the real line.

    The real line goes off to infinity in both directions. Each real number represents a location on the real line. Some locations are shown here:

    The locations are commonly called points on the real line.  This can lead to a seriously mistaken mental image of the reals as a row of points, like beads.  Just as in the case of the rationals, there is no real number “just to the right” of a given real number. 

    Decimal representation of the real numbers

    In this section, I will go into more detail about the decimal representation of the real numbers. There are two reasons for doing this.

    • People just beginning abstract math tend to think in terms of bad metaphors about the real numbers as decimals, and I want to introduce ways of thinking about them that are more helpful.
    • The real numbers can be defined in terms of the decimal representation. This is spelled out in a blog post by Tim Gowers. The definition requires some detail and in some ways is inelegant compared to the definitions usually used in analysis textbooks. But it means that the more you understand about the decimal representation, the better you understand real numbers, and in a pretty direct way.

    The decimal representation of a real number is also called its decimal expansion.  A representation can be given to other bases besides $10$; more about that here.

    Decimal representation as directed length.

    The decimal representation of a real number gives the approximate location of the number on the real line as its directed distance from $0$.

    Examples
    • The rational number $1/2$ is real and has the decimal representation $0.5$.
    • The rational number $-1/2$ has the representation $-0.5$.
    • The number $1/3$ is also real and has the infinite decimal representation $0.333\ldots$. There is an infinite number of $3$’s, or to put it another way, for every positive integer $n$, the $n$th decimal place of the decimal representation of $1/3$ is $3$.
    • The number $\pi $ has a decimal representation beginning $3.14159\ldots$. So you can locate $\pi$ approximately by going $3.14$ units to the right from $0$.  You can locate it more exactly by going $3.14159$ units to the right, if you can measure that accurately.  The decimal representation of $\pi$ is infinitely long so you can theoretically represent it with as much accuracy as you wish.  In practice, of course, it would take longer than the age of the universe to find the first ${{10}^{({{10}^{10}})}}$ digits.

    Bar notation

    It is customary to put a bar over a sequence of digits at the end of a decimal representation to indicate that the sequence is repeated forever. 

    Examples
    • $42\frac{1}{3}=42.\overline{3}$
    • $52.71656565\ldots$ (the group $65$ repeating infinitely often) may be written $52.71\overline{65}$.
    • A decimal representation that is only finitely long, for example $5.477$, could also be written $5.477\overline{0}$.
    • In particular, $6=6.0=6.\overline{0}$, and that works for any integer.
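
For a rational number, the digits under the bar can be found by long division: the digits start repeating as soon as a remainder repeats. Here is a minimal sketch of that idea, assuming a nonnegative numerator and a positive denominator; the function name `repeating_decimal` is made up for this example.

```python
def repeating_decimal(numerator, denominator):
    """Return (integer part, non-repeating digits, repeating digits) as strings.

    Assumes numerator >= 0 and denominator > 0. A finite expansion is
    reported with repeating part "0", matching the convention 5.477 = 5.477(0).
    """
    integer_part, remainder = divmod(numerator, denominator)
    digits = []
    first_seen = {}  # remainder -> index of the digit it produces
    while remainder != 0 and remainder not in first_seen:
        first_seen[remainder] = len(digits)
        digit, remainder = divmod(remainder * 10, denominator)
        digits.append(str(digit))
    if remainder == 0:
        return str(integer_part), "".join(digits), "0"
    start = first_seen[remainder]
    return str(integer_part), "".join(digits[:start]), "".join(digits[start:])


forty_two_and_a_third = repeating_decimal(127, 3)    # 42 1/3 = 127/3
finite_example = repeating_decimal(5477, 1000)       # 5.477
mixed_example = repeating_decimal(260947, 4950)      # 52.71656565...
```

The three results correspond to $42.\overline{3}$, $5.477\overline{0}$ and $52.71\overline{65}$ from the examples above.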

    Approximations

    If you give the first few decimal places of a real number, you are giving an approximation to it.  Mathematicians on the one hand and scientists and engineers on the other tend to treat expressions such as $3.14159$ in two different ways:

    • The mathematician may think of it as a precisely given number, namely $\frac{314159}{100000}$, so in particular it represents a rational number. This number is not $\pi$, although it is close to it.
    • The scientist or engineer will probably treat it as the known part of the decimal representation of a real number. From their point of view, one knows $3.14159$ to six significant figures.
    • Abstractmath.org always takes the mathematician’s point of view.  If I refer to $3.14159$, I mean the rational number $\frac{314159}{100000}$.  I may also refer to $\pi$ as “approximately $3.14159$”.
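
The mathematician's reading of $3.14159$ as an exact rational number can be made concrete with Python's `fractions` module. This is only an illustration; note that `math.pi` is itself a floating-point approximation of $\pi$, not $\pi$.

```python
import math
from fractions import Fraction

# The decimal string is converted exactly, with no rounding.
approx = Fraction("3.14159")

is_that_fraction = (approx == Fraction(314159, 100000))

# How far the rational number 3.14159 is from (the float approximation of) pi.
error = abs(math.pi - float(approx))
```

The error is small but not zero, which is the point of the first bullet: $3.14159$ is a rational number close to $\pi$, not $\pi$ itself.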

    Integers and reals in computer languages

    Computer languages typically treat integers as if they were distinct from real numbers. In particular, many languages have the convention that the expression ‘$2$’ denotes the integer and the expression ‘$2.0$’ denotes the real number.   Mathematicians do not use this convention.  They usually regard the integer $2$ and the real number $2.0$ as the same mathematical object.
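
Python is one language that follows this convention, and it shows both sides of the story: `2` and `2.0` have different types but compare as equal. The variable names below are chosen for this example.

```python
two_int = 2      # the expression 2 denotes an int
two_float = 2.0  # the expression 2.0 denotes a float

same_type = type(two_int) is type(two_float)  # the types differ
same_value = (two_int == two_float)           # the values compare equal

# The type affects which operation the language performs.
floor_quotient = 7 // 2  # integer (floor) division
true_quotient = 7 / 2    # division producing a float

# Floats only approximate real numbers, so familiar identities can fail.
float_sum_is_exact = (0.1 + 0.2 == 0.3)
```

The last line is a reminder that a computer's "real" type is really a finite set of rational numbers, which is one more way the programming convention differs from the mathematician's view.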

    Decimal representation and infinite series

    The decimal representation of a real number is shorthand for a particular infinite series.  Suppose the part before the decimal place is the integer $n$ and the part after the decimal place is\[{{d}_{1}}{{d}_{2}}{{d}_{3}}…\]where ${{d}_{i}}$ is the digit in the $i$th place.  (For example, for $\pi$, $n=3$, ${{d}_{1}}=1,\,\,\,{{d}_{2}}=4,\,\,\,{{d}_{3}}=1,$ and so forth.)  Then the decimal notation $n.{{d}_{1}}{{d}_{2}}{{d}_{3}}…$ represents the limit of the infinite series\[n+\sum\limits_{i=1}^{\infty }{\frac{{{d}_{i}}}{{{10}^{i}}}}\]

    Example

                 \[42\frac{1}{3}=42+\sum\limits_{i=1}^{\infty}{\frac{3}{{{10}^{i}}}}\]

    The number $42\frac{1}{3}$ is exactly equal to the sum of the infinite series, which is represented by the expression $42.\overline{3}$.

    If you stop the series after a finite number of terms, then the number is approximately equal to the resulting sum. For example, $42\frac{1}{3}$ is approximately equal to\[42+\frac{3}{10}+\frac{3}{100}+\frac{3}{1000}\]which is the same as $42.333$.

    This inequality gives an estimate of the accuracy of this approximation:\[42.333\lt42\frac{1}{3}\lt42.334\]
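
The partial sum and the inequality can be checked with exact rational arithmetic. This is a small sketch; the helper name `partial_sum` is made up for the example.

```python
from fractions import Fraction


def partial_sum(integer_part, digits):
    """Sum n + d_1/10 + d_2/100 + ... for the finitely many digits given."""
    total = Fraction(integer_part)
    for i, digit in enumerate(digits, start=1):
        total += Fraction(digit, 10 ** i)
    return total


exact = Fraction(127, 3)               # 42 1/3
approx = partial_sum(42, [3, 3, 3])    # 42.333
gap = exact - approx                   # what the truncation leaves out
```

The gap is exactly $\frac{1}{3000}$, which is less than $\frac{1}{1000}$, so $42.333\lt42\frac{1}{3}\lt42.334$ as claimed.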

    How to think about infinite decimal representations

    The expression $42.\overline{3}$ must be thought of as including all the $3$’s all at once rather than as gradually extending to the right over an infinite period of time.

    In ordinary English, the “…” often indicates continuing through time, as in this example:

    “They climbed to the top of the ridge, and saw another, higher ridge in the distance, so they walked to that ridge and climbed it, only to see another one still further away…”

    But the situation with decimal representations is different:

    The decimal representation of $42\frac{1}{3}$ as $42.333\ldots$ must be thought of as a complete, infinitely long sequence of decimal digits, every one of which (after the decimal point) is a “$3$” right now.

    In the same way, you need to think of the decimal expansion of $\sqrt{2}$ as having all its decimal digits in place at once. Of course, in this case you have to calculate them in order. And note that calculating them is only finding out what they are. They are already there!

    The preceding description is about how a mathematician thinks about infinite decimal expansions.  The thinking has some sort of physical representation in your head that allows you to think about the hundred millionth decimal place of $\sqrt{2}$ or of $\pi$ even if you don’t know what it is. This does not mean that you have an infinite number of slots in your brain, one for each decimal place!  Nor does it mean that the infinitely many decimal places actually exist “somewhere”.  After all, you can think about unicorns and they don’t actually exist somewhere.

    Exact definitions

    Both the following statements are true:

    • The numbers $1/3$, $\sqrt{2}$ and $\pi $ have infinitely long decimal representations, in contrast for example to $\frac{1}{2}$, whose decimal representation is exactly $0.5$.
    • The expressions “$1/3$”, “$\sqrt{2}$” and “$\pi $” exactly determine the numbers $1/3$, $\sqrt{2}$ and $\pi$.

    These two statements don’t contradict each other. All three numbers have exact definitions.

    • $1/3$ is exactly the number that gives 1 when multiplied by $3$.
    • $\sqrt{2}$ is exactly the unique positive real number whose square is 2.
    • $\pi $ is exactly the ratio of the circumference of a circle to its diameter.

    The decimal representation of each one to a finite number of places provides an approximate location of that number on the real line. On the other hand, the complete decimal representation of each one represents it exactly, although you can’t write it down.

    Different decimal representations for the same number

    The decimal representations of two different real numbers must be different. However, two different decimal representations can, in certain circumstances, represent the same real number. This happens when the decimal representation ends in an infinite sequence of $9$’s or an infinite sequence of $0$’s.

    Examples

    • $0.\overline{9}=1.\overline{0}$. This means that $0.\overline{9}$ is exactly the same number as $1$. It is not just an approximation of $1$.
    • $3.4\overline{9}=3.5\overline{0}$. Indeed, $3.4\overline{9}$, $3.5$, $35/10$, and $7/2$ are all different representations of the same number.

    The Wikipedia article “$0.\overline{9}$” is an elaborate discussion of the fact that $0.\overline{9}=1$, a fact that many students find hard to believe.
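
    One standard way to see this uses the definition of a decimal representation as the limit of an infinite series, given earlier. The following worked equation is a sketch of that argument. It relies on the fact that the sum of the first $k$ terms is $0.99\ldots9$ with $k$ nines, which falls short of $1$ by exactly $\frac{1}{10^{k}}$:\[0.\overline{9}=\sum\limits_{i=1}^{\infty}\frac{9}{10^{i}}=\lim_{k\to\infty}\sum\limits_{i=1}^{k}\frac{9}{10^{i}}=\lim_{k\to\infty}\left(1-\frac{1}{10^{k}}\right)=1\]

    The partial sums are all less than $1$, but $0.\overline{9}$ is not any one of the partial sums. It is their limit, and the limit is exactly $1$.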



    Creative Commons License

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
