
A new kind of introduction to category theory

About this article

  • This post is an alpha version of the first part of the intended article.
  • People who are beginners in learning abstract math concepts have many misunderstandings about the definitions and early theorems of category theory.
  • This article introduces a few basic concepts of category theory. It goes into detail in Purple Prose about the misunderstandings that can arise with each of the concepts. The article is not at all a complete introduction to categories.
  • My blog post Introducing abstract topics describes some of the strategies needed in teaching a new abstract math concept.
  • This article also introduces a few examples of categories that are primarily chosen to cause the reader to come up against some of those misunderstandings. The first example is completely abstract.
  • Math students usually see categories after considerable exposure to abstract math, but students in computing science and other fields may see them without having much background in abstraction. I hope teachers in such courses will include explanations of the sort of misunderstandings mentioned in this article.
  • Like all posts in Gyre&Gimble and all posts in abstractmath.org, this article is licensed under a Creative Commons Attribution-ShareAlike 2.5 License. If you are teaching a class involving category theory, feel free to hand it out, and to modify it (in which case you should include a link to this post).
  • You could also use the article as a source of remarks you make in the class about the topics.

About categories

To be written.

Definition of category

A category is a type of mathematical structure consisting of two types of data, whose relationships are entirely determined by some axioms. After the definition is complete, I will introduce several categories with a detailed discussion of each one, explaining how they fit the definition of category.

Axiom 1: Data

  1. A category consists of two types of data: objects and arrows.
  2. No object can be an arrow and no arrow can be an object.

Notes for Axiom 1

  • An object of a category can be any kind of mathematical object. It does not have to be a set and it does not have to have elements.
  • Arrows of a category are also called morphisms. You may be familiar with “homomorphisms”, “homeomorphisms” or “isomorphisms”, all of which are functions. This does not mean that a “morphism” in an arbitrary category is a function.

Axiom 2: Domain and codomain

  1. Each arrow has a domain and a codomain, each of which is an object of the category.
  2. The domain and the codomain of an arrow may or may not be the same object.
  3. Each arrow has only one domain and only one codomain.

Notes for Axiom 2

  • If $f$ is an arrow with domain $A$ and codomain $B$, that fact is typically shown either by the notation “$f:A\to B$” or by a diagram like this:
  • The notation “$f:A\to B$” is like that used for functions. This notation may be used in any category, but it does not imply that $f$ is a function or that $A$ and $B$ have elements.
  • For such an arrow, the notation “$\text{dom}(f)$” refers to $A$ and “$\text{cod}(f)$” refers to $B$.
  • For a given category $\mathsf{C}$, the collection of all the arrows with domain $A$ and codomain $B$ may be denoted by
    • “$\text{Hom}(A,B)$” or
    • “$\text{Hom}_\mathsf{C}(A,B)$” or
    • “$\mathsf{C}(A,B)$”.
  • Some newer books and articles in category theory use the name source for domain and target for codomain. This usage has the advantage that a newcomer to category theory will be less likely to think of an arrow as a function.

Axiom 3: Composition

  1. If $f$ and $g$ are arrows in a category for which $\text{cod}(f)=\text{dom}(g)$, as in this diagram:

    then there is a unique arrow with domain $A$ and codomain $C$ called the composite of $f$ and $g$.

Notes for Axiom 3


  • An important metaphor for composition is: Every path of length 2 has exactly one composite.
  • The unique arrow required by Axiom 3 may be denoted by “$g\circ f$” or “$gf$”. “$g\circ f$” is more explicit, but “$gf$” is much more commonly used by category theorists.
  • Many constructions in categories may be shown by diagrams, like the one used just above.
  • The diagram

    is said to commute if $h=g\circ f$. The idea is that going along $f$ and then $g$ is the same as going along $h$.

  • It is customary in some texts in category theory to indicate that a diagram commutes by putting a gyre in the middle:
  • The concept of category is an abstraction of the idea of function, and the composition of arrows is an abstraction of the composition of functions. It uses the same notation, “$g\circ f$”. If $f$ and $g$ are set functions, then for an element $x$ in the domain of $f$, \[(g\circ f)(x)=g(f(x))\]
  • But in an arbitrary category, it may make no sense to evaluate an arrow $f$ at some element $x$; indeed, the domain of $f$ may not have elements at all, and then the statement “$(g\circ f)(x)=g(f(x))$” is meaningless.
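For the special case of set functions, the formula $(g\circ f)(x)=g(f(x))$ can be tried out directly. The following is a minimal Python sketch; the helper `compose` and the sample functions `f` and `g` are hypothetical names chosen only for illustration, and nothing here applies to arrows that are not functions.

```python
def compose(g, f):
    """Return the composite g∘f of two ordinary functions: apply f first, then g."""
    return lambda x: g(f(x))

# Hypothetical sample functions on the integers (assumptions, not from the article).
f = lambda x: x + 1   # f sends x to x + 1
g = lambda y: 2 * y   # g sends y to 2y

g_after_f = compose(g, f)

# (g∘f)(3) = g(f(3)) = g(4) = 8
print(g_after_f(3))
```

Note that `compose(f, g)` is a different function in general, which is why the order in “$g\circ f$” matters.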

Axiom 4: Identity arrows

  1. For each object $A$ of a category, there is a unique arrow denoted by $\textsf{id}_A$.
  2. $\textsf{dom}(\textsf{id}_A)=A$ and $\textsf{cod}(\textsf{id}_A)=A$.
  3. For any object $B$ and any arrow $f:B\to A$, the diagram

    commutes.

  4. For any object $C$ and any arrow $g:A\to C$, the diagram

    commutes.

Notes for Axiom 4

  • The fact stated in Axiom 4(b) could be shown diagrammatically either as

    or as

  • Facts (c) and (d) can be written in algebraic notation: For any arrow $f$ going to $A$,\[\textsf{id}_A\circ f=f\]and for any arrow $g$ coming from $A$,\[g\circ \textsf{id}_A=g\]

Axiom 5: Associativity

  1. If $f$, $g$ and $h$ are arrows in a category for which $\text{cod}(f)=\text{dom}(g)$ and $\text{cod}(g)=\text{dom}(h)$, as in this diagram:

    then there is a unique arrow $k$ with domain $A$ and codomain $D$ called the composite of $f$, $g$ and $h$.

  2. In the diagram below, the two triangles containing $k$ must both commute.

Notes for Axiom 5

  • Axiom 5b requires that \[h\circ(g\circ f)=(h\circ g)\circ f\](which both equal $k$), which is the usual formula for associativity.
  • Note that the top two triangles commute by Axiom 3.
  • The associativity axiom means that we can get rid of parentheses and write \[k=h\circ g\circ f\]just as we do for addition and multiplication of numbers.
  • In my opinion the notation using categorical diagrams communicates information much more clearly than algebraic notation does. In particular, you don’t have to remember the domains and codomains of the functions — they appear in the picture. I admit that diagrams take up much more space, but now that we read math stuff on a computer screen instead of on paper, space is free.
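When the arrows happen to be set functions, the associativity formula $h\circ(g\circ f)=(h\circ g)\circ f$ can be spot-checked numerically. This is only a sketch under that assumption: the functions `f`, `g`, `h` and the helper `compose` are hypothetical, and checking a finite sample of inputs illustrates the law rather than proving it.

```python
def compose(g, f):
    """Return g∘f for ordinary functions: apply f first, then g."""
    return lambda x: g(f(x))

# Hypothetical sample functions on the integers.
f = lambda x: x + 1
g = lambda x: 2 * x
h = lambda x: x - 3

left = compose(h, compose(g, f))    # h∘(g∘f)
right = compose(compose(h, g), f)   # (h∘g)∘f

# Compare the two composites on a finite sample of inputs.
agree = all(left(x) == right(x) for x in range(-10, 11))
print(agree)
```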

Examples of categories

For the first three examples, I will give a detailed explanation about how they fit the definition of category.

Example 1: MyFin

This first example is a small, finite category which I have named $\mathsf{MyFin}$ (my very own finite category). It is not at all an important category, but it has advantages as a first example.

  • It’s small enough that you can see all the objects and arrows on the screen at once, but big enough not to be trivial.
  • The objects and arrows have no properties other than being objects and arrows. (The other examples involve familiar math objects.)
  • So in order to check that $\mathsf{MyFin}$ really obeys the axioms for a category, you can use only the skeletal information given here. As a result, you must really understand the axioms!

A correct proof will be based on axioms and theorems. The proof can be suggested by your intuitions, but intuitions are not enough. When working with $\mathsf{MyFin}$ you won’t have any intuitions!

A diagram for $\mathsf{MyFin}$

This diagram gives a partial description of $\mathsf{MyFin}$.

Now let’s see how to make the diagram above into a category.

Axiom 1

  • The objects of $\mathsf{MyFin}$ are $A$, $B$, $C$ and $D$.
  • The arrows are $f$, $g$, $h$, $j$, $k$, $r$, $s$, $u$, $v$, $w$ and $x$.
  • You can regard the letters just listed as names of the objects and arrows. The point is that at this stage all you know about the objects and arrows are their names.
  • If you prefer, you can think of the arrows as the actual arrows shown in the $\mathsf{MyFin}$ diagram.
  • Our definition of $\mathsf{MyFin}$ is an abstract definition. You may have seen multiplication tables of groups given in terms of undefined letters. (If you haven’t, don’t worry.) Those are also abstract definitions.
  • Most of our other definitions of categories involve math objects you actually know something about. They are like the definition of division, for example, where the math objects are integers.

Axiom 2

  • The domains and codomains of the arrows are shown by the diagram above.
  • For example, $\text{dom}(r)=A$ and $\text{cod}(r)=C$, and $\text{dom}(v)=\text{cod}(v)=B$.

Axiom 3

Showing the $\mathsf{MyFin}$ diagram does not completely define $\mathsf{MyFin}$. We must say what the composites of all the paths of length 2 are.

  • In fact, most of them are forced, but two of them are not.
  • We must have $g\circ f=r$ because $r$ is the only arrow possible for the composite, and Axiom 3 requires that every path of length 2 must have a composite.
  • For the same reason, $h\circ g=s$.
  • All the paths involving $u$, $v$, $w$ and $x$ are forced:

  • (p1) $u\circ u=u$, $v\circ v=v$, $w\circ w=w$ and $x\circ x=x$.
  • (p2) $f\circ u=f$, $r\circ u=r$, $j\circ u=j$ and $k\circ u=k$. You can see that, for example, $f\circ u=f$ by opening up the loop on $f$ like this:

    There is only one arrow going from $A$ to $B$, namely $f$, so $f$ has to be the composite $f\circ u$.

  • (p3) $v\circ f=f$, $g\circ v=g$ and $s\circ v=s$.
  • (p4) $w\circ g=g$, $w\circ r=r$ and $h\circ w=h$.
  • (p5) $x\circ h=h$, $x\circ s=s$, $x\circ j=j$ and $x\circ k=k$.

  • For $s\circ f$ and $h\circ r$, we have to choose between $j$ and $k$ as composites. Since $s\circ f=(h\circ g)\circ f$ and $h\circ r=h\circ (g\circ f)$, Axiom 5 (associativity) requires that we choose the same one of $j$ and $k$ for both composites.

    Definition: $s\circ f=h\circ r=j$.

    If we had defined $s\circ f=h\circ r=k$ we would have a different category, although one that is “isomorphic” to $\mathsf{MyFin}$ (you have to define “isomorphic” or look it up.)

Axiom 4

  • It is clear from the $\mathsf{MyFin}$ diagram that for each object there is just one arrow that has that object both as domain and as codomain, as required by Axiom 4a.
  • The requirements in Axiom 4c and 4d are satisfied by statements (p1) through (p5).

Axiom 5

  • Since we have already required both $(h\circ g)\circ f$ and $h\circ(g\circ f)$ to be $j$, composition is associative.
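The checks for Axioms 3, 4 and 5 can also be carried out mechanically. The sketch below encodes $\mathsf{MyFin}$ as plain Python data. The domains and codomains in `ARROWS` are inferred from the composites stated in the text (the diagram itself is not reproduced here), so treat them as an assumption; the names `ARROWS`, `IDENTITY`, `build_table` and `is_category` are hypothetical.

```python
from itertools import product

# Arrow name -> (domain, codomain), inferred from the text's description.
ARROWS = {
    "f": ("A", "B"), "g": ("B", "C"), "h": ("C", "D"),
    "r": ("A", "C"), "s": ("B", "D"),
    "j": ("A", "D"), "k": ("A", "D"),
    "u": ("A", "A"), "v": ("B", "B"), "w": ("C", "C"), "x": ("D", "D"),
}
IDENTITY = {"A": "u", "B": "v", "C": "w", "D": "x"}

def build_table(hr_composite="j"):
    """Composition table: table[(q, p)] is q∘p (p first, then q)."""
    table = {("g", "f"): "r", ("h", "g"): "s",
             ("s", "f"): "j", ("h", "r"): hr_composite}
    for name, (dom, cod) in ARROWS.items():
        table[(name, IDENTITY[dom])] = name   # statements (p1) to (p5)
        table[(IDENTITY[cod], name)] = name
    return table

def composable(q, p):
    return ARROWS[p][1] == ARROWS[q][0]

def is_category(table):
    # Axiom 3: every path of length 2 has a composite with the right domain and codomain.
    for p, q in product(ARROWS, repeat=2):
        if composable(q, p):
            c = table.get((q, p))
            if c is None or ARROWS[c] != (ARROWS[p][0], ARROWS[q][1]):
                return False
    # Axiom 4: the chosen loops act as identities on both sides.
    for name, (dom, cod) in ARROWS.items():
        if table[(name, IDENTITY[dom])] != name or table[(IDENTITY[cod], name)] != name:
            return False
    # Axiom 5: t∘(q∘p) equals (t∘q)∘p for every path of length 3.
    for p, q, t in product(ARROWS, repeat=3):
        if composable(q, p) and composable(t, q):
            if table[(t, table[(q, p)])] != table[(table[(t, q)], p)]:
                return False
    return True

myfin_ok = is_category(build_table("j"))   # s∘f = h∘r = j, as in the Definition
mixed_ok = is_category(build_table("k"))   # s∘f = j but h∘r = k
print(myfin_ok, mixed_ok)
```

The second call shows why the two composites cannot be chosen independently: with $s\circ f=j$ and $h\circ r=k$ the associativity check fails.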

    Example 2: Set

    To be written.

    This will be a very different example, because it involves known mathematical objects, namely sets and functions. But there are still issues, for example the fact that the inclusion of $\{1,2\}$ into $\{1,2,3\}$ and the identity map on $\{1,2\}$ are two different arrows in the category of sets.
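One way to see why the inclusion and the identity are different arrows is to record the codomain as part of the arrow. The following Python sketch does that; representing an arrow as a triple (domain, codomain, graph) is an assumption made for illustration, and `make_arrow` is a hypothetical helper.

```python
def make_arrow(dom, cod, rule):
    """Build an arrow of sets as a triple (domain, codomain, graph)."""
    dom, cod = frozenset(dom), frozenset(cod)
    graph = frozenset((x, rule(x)) for x in dom)
    # Every value must land in the declared codomain.
    assert all(y in cod for _, y in graph)
    return (dom, cod, graph)

inclusion = make_arrow({1, 2}, {1, 2, 3}, lambda x: x)
identity = make_arrow({1, 2}, {1, 2}, lambda x: x)

same_graph = inclusion[2] == identity[2]   # both send 1 to 1 and 2 to 2
same_arrow = inclusion == identity         # the codomains differ
print(same_graph, same_arrow)
```

The two arrows have the same set of input-output pairs, yet they are unequal as arrows because Axiom 2 gives each arrow exactly one codomain.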

    Example 3: IntegerDiv

    To be written.

    The objects are all the positive integers and there is an arrow from $m$ to $n$ if and only if $m$ divides $n$. So this example involves familiar objects and predicates, but the arrows are nevertheless not functions that take elements to elements. Integers don’t have elements. I would expect to show how the GCD of two integers is a limit.
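A small Python sketch can make the divisibility arrows concrete and illustrate the sense in which the GCD is a limit (here, a product of two objects). The function names `has_arrow` and `is_product` are hypothetical. Because there is at most one arrow between any two objects of this category, the uniqueness part of the product's universal property holds automatically, so only existence is checked.

```python
from math import gcd

def has_arrow(m, n):
    """There is an arrow m -> n exactly when m divides n (m, n positive integers)."""
    return n % m == 0

def is_product(d, a, b):
    """Check that d is a product of a and b in the divisibility category."""
    # d needs arrows (projections) to both a and b.
    if not (has_arrow(d, a) and has_arrow(d, b)):
        return False
    # Every object c with arrows to a and b must have an arrow to d.
    common = [c for c in range(1, min(a, b) + 1)
              if has_arrow(c, a) and has_arrow(c, b)]
    return all(has_arrow(c, d) for c in common)

candidates = [d for d in range(1, 19) if is_product(d, 12, 18)]
print(candidates, gcd(12, 18))
```

Composition in this category is just transitivity of divisibility: if $m$ divides $n$ and $n$ divides $p$, then $m$ divides $p$.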


    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.


    Introducing abstract topics

    I have been busy for the past several years revising abstractmath.org (abmath). Now I believe, perhaps foolishly, that most of the articles in abmath have reached beta, so now it is time for something new.

    For some time I have been considering writing introductions to topics in abstract math, some typically studied by undergraduates and some taken by scientists and engineers. The topics I have in mind to do first include group theory and category theory.

    The point of these introductions is to get the student started at the very beginning of the topic, when some students give up in total confusion. They meet and fall off of what I have called the abstraction cliff, which is discussed here and also in my blog posts Very early difficulties and Very early difficulties II.

    I may have stolen the phrase “abstraction cliff” from someone else.

    Group theory

    Group theory sets several traps for beginning students.

    Multiplication table

    • A student may balk when a small finite group is defined using a set of letters in a multiplication table.
      “But you didn’t say what the letters are or what the multiplication is?”
    • Such a definition is an abstract definition, in contrast to the definition of “prime”, for example, which is stated in terms of already known entities, namely the integers.
    • The multiplication table of a group tells you exactly what the binary operation is and any set with an operation that makes such a table correct is an example of the group being defined.
    • A student who has no understanding of abstraction is going to be totally lost in this situation. It is quite possible that the professor has never even mentioned the concept of abstract definition. The professor is probably like most successful mathematicians: when they were students, they understood abstraction without having to have it explained, and possibly without even noticing they did so.
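To make the idea of an abstract definition by multiplication table concrete, here is a Python sketch. The three-letter table below is a hypothetical example (it is not taken from any particular textbook), and `is_group` is an illustrative helper that checks the group axioms by brute force.

```python
# A group defined only by letters and a table; the letters have no other meaning.
elements = ["e", "a", "b"]
table = {
    ("e", "e"): "e", ("e", "a"): "a", ("e", "b"): "b",
    ("a", "e"): "a", ("a", "a"): "b", ("a", "b"): "e",
    ("b", "e"): "b", ("b", "a"): "e", ("b", "b"): "a",
}

def is_group(elements, table):
    """Brute-force check of closure, associativity, identity and inverses."""
    if any(table.get((x, y)) not in elements for x in elements for y in elements):
        return False
    mul = lambda x, y: table[(x, y)]
    associative = all(mul(mul(x, y), z) == mul(x, mul(y, z))
                      for x in elements for y in elements for z in elements)
    identities = [e for e in elements
                  if all(mul(e, x) == x and mul(x, e) == x for x in elements)]
    if not associative or not identities:
        return False
    e = identities[0]
    return all(any(mul(x, y) == e and mul(y, x) == e for y in elements)
               for x in elements)

# One concrete set with an operation that makes the table correct:
# the integers 0, 1, 2 under addition mod 3.
model = {"e": 0, "a": 1, "b": 2}
model_fits = all(model[table[(x, y)]] == (model[x] + model[y]) % 3
                 for x in elements for y in elements)
print(is_group(elements, table), model_fits)
```

The table alone determines the group; the integers mod 3 are just one example of it.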

    Cosets

    • Cosets are a real killer. Some students at this stage are nowhere near thinking of a set as an object or a thing. The concept of applying a binary operation on a pair of sets (or any other mathematical objects with internal structure) is completely foreign to them. Did anyone ever talk to them about mathematical objects?
    • The consequence of this early difficulty is that such a student will find it hard to understand what a quotient group is, and that is one of the major concepts you get early in a group theory course.
    • The conceptual problems with multiplication of cosets are similar to those with pointwise addition of functions. Given two functions $f,g:\mathbb{R}\to\mathbb{R}$, you define $f+g$ to be the function \[(f+g)(x):=f(x)+g(x)\] Along with pointwise multiplication, this makes the space of functions $\mathbb{R}\to\mathbb{R}$ a ring with nice properties.
    • But you have to understand that each element of the ring is a function thought of as a single math object. The values of the function are properties of the function, but they are not elements of the ring. (You can include the real numbers in the ring as constant functions, but don’t confuse me with facts.)
    • Similarly the elements of the quotient group are math objects called cosets. They are not elements of the original group. (To add to the confusion, they are also blocks of a congruence.)
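The point that a coset is a single object on which an operation acts can be illustrated in Python. This sketch uses the additive group of integers mod 6 and the subgroup $\{0,3\}$, both chosen here as assumptions for illustration; since the group is written additively, the operation on cosets is addition rather than multiplication.

```python
n = 6
H = frozenset({0, 3})   # a subgroup of the integers mod 6

def coset(a):
    """The coset a + H, treated as a single object (a frozenset)."""
    return frozenset((a + h) % n for h in H)

def add_cosets(X, Y):
    """Add two cosets by adding their elements in all possible ways."""
    return frozenset((x + y) % n for x in X for y in Y)

cosets = {coset(a) for a in range(n)}

# The sum of two cosets is again one of the cosets.
closed = all(add_cosets(X, Y) in cosets for X in cosets for Y in cosets)
print(len(cosets), closed, sorted(add_cosets(coset(1), coset(2))))
```

The elements of the quotient group are the three frozensets in `cosets`, not the six integers they contain.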

    Isomorphic groups

    • Many books, and many professors (including me) regard two isomorphic groups as the same. I remember getting anguished questions: “But the elements of $\mathbb{Z}_2$ are equivalence classes and the elements of the group of permutations of $\{1,2\}$ are functions.”
    • I admit that regarding two isomorphic groups as the same needs to be treated carefully when, unlike $\mathbb{Z}_2$, the group has a nontrivial automorphism group. ($\mathbb{Z}_3$ is “the same as itself” in two different ways.) But you don’t have to bring that up the first time you attack that subject, any more than you have to bring up the fact that the category of sets does not have a set of objects on the first day you define categories.
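The sense in which $\mathbb{Z}_2$ and the group of permutations of $\{1,2\}$ are the same can be checked directly. In this Python sketch the elements of $\mathbb{Z}_2$ are represented by the integers 0 and 1 (standing in for the equivalence classes), a permutation is a tuple listing the images of 1 and 2, and `phi` is a hypothetical name for the correspondence.

```python
def compose_perm(p, q):
    """p∘q on {1, 2}: apply q first, then p."""
    return tuple(p[q[i] - 1] for i in range(2))

IDENT, SWAP = (1, 2), (2, 1)

# The correspondence between Z_2 (addition mod 2) and the permutations.
phi = {0: IDENT, 1: SWAP}

is_bijection = len(set(phi.values())) == 2
is_homomorphism = all(phi[(a + b) % 2] == compose_perm(phi[a], phi[b])
                      for a in (0, 1) for b in (0, 1))
print(is_bijection, is_homomorphism)
```

The elements on the two sides are different kinds of things, but the correspondence preserves the operation, and that is all that "the same" means here.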

    Category theory

    Category theory causes similar troubles. Beginning college math majors don’t usually meet it early. But category theory has begun to be used in other fields, so plenty of computer science students, people dealing with databases, and so on are suddenly trying to understand categories and failing to do so at the very start.

    The G&G post A new kind of introduction to category theory constitutes an alpha draft of the first part of an article introducing category theory following the ideas of this post.

    Objects and arrows are abstract

    • Every once in a while someone asks a question on Math StackExchange that shows they have no idea that an object of a category need not have elements and that morphisms need not be functions that take elements to elements.
    • One questioner understood that the claim that a morphism need not be a function meant that it might be a multivalued function.

    Duality

    • That misunderstanding comes up with duality. The definition of dual category requires turning the arrows around. Even if the original morphism takes elements to elements, the opposite morphism does not have to take elements to elements. In the case of the category of sets, an arrow in $\text{Set}^{op}$ cannot take elements to elements — for example, the opposite of the function $\emptyset\to\{1,2\}$.
    • The fact that there is a concrete category equivalent to $\text{Set}^{op}$ is a red herring. It involves different sets: the function corresponding to the function just mentioned goes from a four-element set to a singleton. But in the category $\text{Set}^{op}$ as defined it is simply an arrow, not a function.

    Not understanding how to use definitions

    • Some of the questioners on Math Stack Exchange ask how to prove a statement that is quite simple to prove directly from the definitions of the terms involved, but what they ask and what they are obviously trying to do is to gain an intuition in order to understand why the statement is true. This is backward: the first thing you should do is use the definition (at least in the first few days of a math class; after that you have to use theorems as well!).
    • I have discussed this in the blog post Insights into mathematical definitions (which gives references to other longer discussions by math ed people). See also the abmath section Rewrite according to the definitions.

    How an introduction to a math topic needs to be written

    The following list shows some of the tactics I am thinking of using in the math topic introductions. It is quite likely that I will conclude that some tactics won’t work, and I am sure that tactics I haven’t mentioned here will be used.

    • The introductions should not go very far into the subject. Instead, they should bring an exhaustive and explicit discussion of how to get into the very earliest part of the topic, perhaps the definition, some examples, and a few simple theorems. I doubt that a group theory student who hasn’t mastered abstraction and what proofs are about will ever be ready to learn the Sylow theorems.
    • You can’t do examples and definitions simultaneously, but you can come close by going through an example step by step, checking each part of the definition.
    • There is a real split between students who want the definitions first (most of whom don’t have the abstraction problems I am trying to overcome) and those who really really think they need examples first (the majority) because they don’t understand abstraction.

    • When you introduce an axiom, give an example of how you would prove that some binary operation satisfies the axiom. For example, if the axiom is that every element of a group must have an inverse, right then and there prove that addition on the integers satisfies the axiom and disprove that multiplication on integers satisfies it.
    • When the definition uses some undefined math objects, point out immediately with examples that you can’t have any intuition about them except what the axioms give you. (In contrast to definition of division of integers, where you and the student already have intuitions about the objects.)
    • Make explicit the possible problems with abstractmath.org and Gyre&Gimble) will indeed find it difficult to become mathematical researchers — but not impossible!
    • But that is not the point. All college math professors will get people who will go into theoretical computing science, and therefore need to understand category theory, or into particle physics, and need to understand groups, and so on.
    • By being clear at the earliest stages of how mathematicians actually do math, they will produce more people in other fields who actually have some grasp of what is going on with the topics they have studied in math classes, and hopefully will be willing to go back and learn some more math if some type of math rears its head in the theories of their field.
    • Besides, why do you want to alienate huge numbers of people from math, as our way of teaching in the past has done?
    • “Our” means grammar school teachers, high school teachers and college professors.
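The tactic of testing an axiom on the spot, mentioned in the list above for the inverse axiom, can be mirrored in a short Python sketch. The helper `has_inverse` and the search ranges are hypothetical. A finite search is not a proof: the actual arguments are that $n+(-n)=0$ for every integer $n$, and that $2m$ is even for every integer $m$ and so can never equal $1$.

```python
from operator import add, mul

def has_inverse(x, op, identity, candidates):
    """Search the candidates for a two-sided inverse of x under op."""
    return any(op(x, y) == identity and op(y, x) == identity for y in candidates)

sample = range(-50, 51)

# Addition: every integer n in the sample has the inverse -n.
every_integer_has_negative = all(has_inverse(n, add, 0, sample) for n in sample)

# Multiplication: 2 is a counterexample, since no integer y satisfies 2 * y == 1.
two_has_reciprocal = has_inverse(2, mul, 1, range(-1000, 1001))

print(every_integer_has_negative, two_has_reciprocal)
```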

    Acknowledgment

    Thanks to Kevin Clift for corrections.


    Very early difficulties II


    This is the second part of a series of posts about certain difficulties math students have in the very early stages of studying abstract math. The first post, Very early difficulties in studying abstract math, gives some background to the subject and discusses one particular difficulty: Some students do not know that it is worthwhile to try starting a proof by rewriting what is to be proved using the definitions of the terms involved.

    Math StackExchange

    The website Math StackExchange is open to any questions about math, even very easy ones. It is in contrast with MathOverflow, which is aimed at professional mathematicians asking questions in their own field.

    Math SE contains many examples of the early difficulties discussed in this series of posts. I recommend that math ed people (not just RUME people, since some abstract math occurs in advanced high school courses) read through questions on Math SE for examples of the misunderstandings students have.

    There are two caveats:

    • Most questions on Math SE are at a high enough level that they don’t really concern these early difficulties.
    • Many of the questions are so confused that it is hard to pinpoint what is causing the difficulty that the questioner has.

    Connotations of English words

    The term(s) defined in a definition are often given ordinary English words as names, and the beginner automatically associates the connotations of the meaning of the English word with the objects defined in the definition.

    Infinite cardinals

    If $A$ is a finite set, the cardinality of $A$ is simply a natural number (including $0$). If $A$ is a proper subset of another set $B$, then the cardinality of $A$ is strictly less than the cardinality of $B$.

    In the nineteenth century, mathematicians extended the definition of cardinality to infinite sets, and for the most part cardinality has the same behavior as for finite sets. For example, the cardinal numbers are well-ordered. However, for infinite sets it is possible for a set and a proper subset of the set to have the same cardinality. For example, the cardinality of the set of natural numbers is the same as the cardinality of the set of rational numbers. This phenomenon causes major cognitive dissonance.
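One way to make the equal cardinality of the natural numbers and the rationals less mysterious is to list the rationals one after another. The Python sketch below uses the Calkin–Wilf sequence, a known enumeration of the positive rationals that is not mentioned in this post and is swapped in here only as an illustration; the generator name `positive_rationals` is hypothetical. Each positive rational appears exactly once in the full sequence, and the code only spot-checks a finite prefix.

```python
from fractions import Fraction
from itertools import islice

def positive_rationals():
    """Yield the Calkin-Wilf sequence 1, 1/2, 2, 1/3, 3/2, 2/3, 3, ..."""
    q = Fraction(1)
    while True:
        yield q
        floor_q = q.numerator // q.denominator
        # Newman's formula for the next term of the sequence.
        q = 1 / (2 * floor_q - q + 1)

first_terms = list(islice(positive_rationals(), 6))
prefix = list(islice(positive_rationals(), 500))
no_repeats = len(set(prefix)) == len(prefix)
print(first_terms, no_repeats)
```

Pairing the $n$th term with the natural number $n$ gives a one-to-one correspondence with the positive rationals; handling zero and the negative rationals takes one more interleaving step.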

    Question 1331680 on Math Stack Exchange shows an example of this confusion. I have also discussed the problem with cardinality in the abstractmath.org section Cardinality.

    Morphism in category theory

    The concept of category is defined by saying there is a bunch of objects called objects (sorry bout that) and a bunch of objects called morphisms, subject to certain axioms. One requirement is that there are functions from morphisms to objects choosing a “domain” and a “codomain” of each morphism. This is spelled out in Category Theory in Wikibooks, and in any other book on category theory.

    The concepts of morphism, domain and codomain in a category are therefore defined by abstract definitions, which means that any property of morphisms and their domains and codomains that is true in every category must follow from the axioms. However, the word “morphism” and the talk about domains and codomains naturally suggests to many students that a morphism must be a function, so they immediately and incorrectly expect to evaluate it at an element of its domain, or to treat it as a function in other ways.

    Example

    If $\mathcal{C}$ is a category, its opposite category $\mathcal{C}^{op}$ is defined this way:

    • The objects of $\mathcal{C}^{op}$ are the objects of $\mathcal{C}$.
    • A morphism $f:X\to Y$ of $\mathcal{C}^{op}$ is a morphism from $Y$ to $X$ of $\mathcal{C}$ (swap the domain and codomain).

    In Question 980933 on Math SE, the questioner is saying (among other things) that in $\text{Set}^{op}$, this would imply that there has to be a morphism from a nonempty set to the empty set. This of course is true, but the questioner is worried that you can’t have a function from a nonempty set to the empty set. That is also true, but what it implies is that in $\text{Set}^{op}$, the morphism from $\{1,2,3\}$ to the empty set is not a function from $\{1,2,3\}$ to the empty set. The morphism exists, but it is not a function. This does not in any sense make the definition of $\text{Set}^{op}$ incorrect.
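The definition of the opposite category only relabels domains and codomains, which a small Python sketch can show. The tiny category below (two objects named `"empty"` and `"S"`, standing for the empty set and $\{1,2,3\}$, with the single non-identity arrow `"e"`) is a hypothetical fragment chosen for illustration, and `opposite` is an illustrative helper.

```python
def opposite(arrows, compose):
    """Swap domain and codomain of every arrow and reverse the order of composition."""
    op_arrows = {name: (cod, dom) for name, (dom, cod) in arrows.items()}
    # compose[(q, p)] is q∘p in C; the same arrow is p∘q in the opposite category.
    op_compose = {(p, q): c for (q, p), c in compose.items()}
    return op_arrows, op_compose

# Arrow name -> (domain, codomain); "e" stands for the empty function into S.
arrows = {"id_empty": ("empty", "empty"), "id_S": ("S", "S"), "e": ("empty", "S")}
compose = {
    ("id_empty", "id_empty"): "id_empty",
    ("id_S", "id_S"): "id_S",
    ("e", "id_empty"): "e",
    ("id_S", "e"): "e",
}

op_arrows, op_compose = opposite(arrows, compose)
print(op_arrows["e"])
```

In the opposite category the arrow `"e"` goes from `"S"` to `"empty"`. It is the same piece of data as before, not a newly constructed function from a nonempty set to the empty set.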

    Student confusion like this tends to make the teacher want to have a one foot by six foot billboard in his classroom saying

    A MORPHISM DOESN’T HAVE TO BE A FUNCTION!

    However, even that statement causes confusion. The questioner who asked Question 1594658 essentially responded to the statement in purple prose above by assuming a morphism that is “not a function” must have two distinct values at some input!

    That questioner is still allowing the connotations of the word “morphism” to lead them to assume something that the definition of category does not give: that the morphism can evaluate elements of the domain to give elements of the codomain.

    So we need a more elaborate poster in the classroom:

    The definition of “category” makes no requirement
    that an object has elements
    or that morphisms evaluate elements.

    As was remarked long long ago, category theory is pointless.

    English words implementing logic

    There are lots of questions about logic that show that students really do not think that the definition of some particular logical construction can possibly be correct. That is why in the abstractmath.org chapter on definitions I inserted this purple prose:

    A definition is a totalitarian dictator.

    It is often the case that you can explain why the definition is worded the way it is, and of course when you can you should. But it is also true that the student has to grovel and obey the definition no matter how weird they think it is.

    Formula and term

    In logic you learn that a formula is a statement with variables in it, for example “$\exists x((x+5)^3\gt2)$”. The expression “$(x+5)^3$” is not a formula because it is not a statement; it is a “term”. But in English, $H_2O$ is a formula, the formula for water. As a result, some students have a remarkably difficult time understanding the difference between “term” and “formula”. I think that is because those students don’t really believe that the definition must be taken seriously.

    Exclusive or

    Question 804250 in MathSE says:

    “Consider $P$ and $Q$. Let $P+Q$ denote exclusive or. Then if $P$ and $Q$ are both true or are both false then $P+Q$ is false. If one of them is true and one of them is false then $P+Q$ is true. By exclusive or I mean $P$ or $Q$ but not both. I have been trying to figure out why the truth table is the way it is. For example if $P$ is true and $Q$ is true then no matter what would it be true?”

    I believe that the questioner is really confused by the plus sign: $P+Q$ ought to be true if $P$ and $Q$ are both true because that’s what the plus sign ought to mean.
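The truth table the questioner describes can be generated mechanically, which underlines that it is fixed by the definition and not by what the plus sign suggests. In this Python sketch the function name `xor` is a hypothetical choice, and the last column shows inclusive "or" for comparison.

```python
from itertools import product

def xor(p, q):
    """Exclusive or: true when exactly one of p and q is true."""
    return p != q

# Each row: P, Q, P xor Q, P or Q (inclusive).
truth_table = [(p, q, xor(p, q), p or q)
               for p, q in product([True, False], repeat=2)]

for row in truth_table:
    print(row)
```

The two connectives differ only in the row where $P$ and $Q$ are both true.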

    Yes, I know this is about a symbol instead of an English word, but I think the difficulty has the same dynamics as the English-word examples I have given.

    If I have understood this difficulty correctly, it is similar to the students who want to know why $1$ is not a prime number. In that case, there is a good explanation.

    Only if

    The phrase “only if” simply does not mean the same thing in math as it does in English. In Question 17562 in MathSE, a reader asks the question, why does “$P$ only if $Q$” mean the same as “if $P$ then $Q$” instead of “if $Q$ then $P$”?

    Many answerers wasted a lot of time trying to convince us that “$P$ only if $Q$” means the same as “if $P$ then $Q$” in ordinary English, when in fact it does not. That’s because in English, clauses involving “if” usually connote causation, which does not happen in math English.

    Consider these two pairs of examples.

    1. “I take my umbrella only if it is raining.”
    2. “If I take my umbrella, then it is raining.”
    3. “I flip that switch only if a light comes on.”
    4. “If I flip that switch, a light comes on.”

    The average non-mathematical English speaker will easily believe that (1) and (4) are true, but will balk at (2) and (3). To me, (3) means that the light coming on makes me flip the switch. (2) is more problematical, but it does (to me) have a feeling of causation going the wrong way. It is this difference that causes students to balk at the equivalence in math of “$P$ only if $Q$” and “If $P$, then $Q$”. In math, there is no such thing as causation, and the truth tables for implication force us to live with the fact that these two sentences mean the same thing.
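The equivalence can be checked against the truth tables in a few lines of Python. The sketch reads “$P$ only if $Q$” as “whenever $Q$ fails, $P$ fails too”, which is one standard way to unpack the phrase and is an assumption of this example; the function names are hypothetical.

```python
from itertools import product

def implies(p, q):
    """Truth table of 'if p then q'."""
    return (not p) or q

def only_if(p, q):
    """'P only if Q', read as: whenever Q fails, P fails too."""
    return implies(not q, not p)

rows = list(product([True, False], repeat=2))

same_as_if_p_then_q = all(only_if(p, q) == implies(p, q) for p, q in rows)
same_as_if_q_then_p = all(only_if(p, q) == implies(q, p) for p, q in rows)
print(same_as_if_p_then_q, same_as_if_q_then_p)
```

The two readings differ in the rows where exactly one of $P$ and $Q$ is true; for example, with $P$ true and $Q$ false, “$P$ only if $Q$” is false while “if $Q$ then $P$” is true.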

    Henning Makholm’s answer to Question 17562 begins this way: “I don’t think there’s really anything to understand here. One simply has to learn as a fact that in mathematics jargon the words ‘only if’ invariably encode that particular meaning. It is not really forced by the everyday meanings of ‘only’ and ‘if’ in isolation; it’s just how it is.” That is the best way to answer the question. (Other answerers besides Makholm said something similar.)

    I have also discussed this difficulty (and other difficulties with logic) in the abmath section on “only if”.


    Mathematical Information II

    Introduction

    This is the second post about Mathematical Information inspired by talks at the AMS meeting in Seattle in January, 2016. The first post was Mathematical Information I. That post covered, among other things, types of explanations.

    In this post as in the previous one, footnotes link to talks at Seattle that inspired me to write about a topic. The speakers may not agree with what I say.

    The internet

    Math sources on the internet

    Publishing math on the internet

    • Publishing on the internet is instantaneous, in the sense that once it is written (which of course may take a long time), it can be made available on the internet immediately.
    • Publishing online is also cheap. It requires only a modest computer, an editor and LaTeX or MathJax, all of which are either free, one-time purchases, or available from your university. (These days all these items are required for publishing a math book on paper or submitting an article to a paper journal as well as for publishing on the internet.)
    • Publishing online has the advantage that taking up more space does not cost more. I believe this is widely underappreciated. You can add comments explaining how you think about some type of math object, or about false starts that you had to abandon, and so on. If you want to refer to a diagram that occurs in another place in the paper, you can simply include a copy in the current place. (It took me much too long to realize that I could do things like that in abstractmath.org.)

    Online journals

    Many new online journals have appeared in the last few years. Some of them are deliberately intended as a way to avoid putting papers behind a paywall. But aside from that, online journals speed up publication and reduce costs (not necessarily to zero if the journal is refereed).

    A special type of online journal is the overlay journal [G]. A paper published there is posted on arXiv; the journal merely links to it. This provides a way of refereeing articles that appear on arXiv. It seems to me that such journals could include articles that already appear on arXiv if the referees deem them suitable.

    Types of mathematical communication

    I wrote about some types of math communication in Mathematical Information I.

    The paper Varieties of Mathematical Prose, by Atish Bagchi and me, describes other forms of communicating math not described here.

    What mathematicians would like to know

    Has this statement been proved? [G]

    • The internet has already made it easier to answer this query: Post it on MathOverflow or Math Stack Exchange.
    • It should be a long-term goal of the math community to construct a database of what is known. This would be a difficult, long-term project. I discussed it in my article The Mathematical Depository: A Proposal, which concentrated on how the depository should work as a system. Constructing it would require machine reading and understanding of mathematical prose, which is difficult and not something I know much about (the article gives some references).
    • An approach that would be completely different from the depository might be through a database of proved theorems that anyone could contribute to, like a wiki, but with editing to maintain consistency, avoid repetition, etc.

    Known information about a conjecture

    This information could include partial results [G]. An example would be Faltings’s Theorem, which implies a partial result for Fermat’s Last Theorem: for each fixed $n\geq4$ there is only a finite number of solutions of $x^n+y^n=z^n$ in pairwise coprime integers $x, y, z$. That theorem became widely known, but many partial results never even get published.
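
    A short worked note on why Faltings’s Theorem applies here (standard facts, stated as a sketch rather than a proof): the theorem says that a smooth projective curve of genus at least $2$ over the rationals has only finitely many rational points, and the genus $g$ of the Fermat curve $x^n+y^n=z^n$ is given by the genus formula for smooth plane curves of degree $n$.

```latex
g \;=\; \frac{(n-1)(n-2)}{2},
\qquad\text{so}\qquad
g \ge 2 \iff n \ge 4 .
```

    For $n=3$ the genus is $1$, so the theorem says nothing in that case. Rational points on the curve correspond to integer solutions counted up to a common factor, which is why the finiteness statement concerns solutions with no common factor.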

    Strategies for proofs

    Strategies that are useful in a particular field.

    The website Tricki is developing a list of such strategies.

    It appears that Tricki should be referred to as “The Tricki”, like The Hague and The Bronx.

    Note that there are strategies that essentially work just once, to prove some important theorem. An example is Craig’s Trick, which proves that a recursively enumerable theory is recursively axiomatizable. But of course, who can say that it will never be useful for some other theorem? I can’t think of how, though.

    Strategies that don’t work, and why [G]

    The article How to discover for yourself the solution of the cubic, by Timothy Gowers, leads you down the garden path of trying to “complete the cubic” by copying the way you solve a quadratic, and then shows conclusively that that can’t possibly work.

    Instructors should point out situations like that in class when they are relevant. A database of Methods That Work Here But Not There would be helpful, too. And, most important of all, if you run into a method that doesn’t work when you are trying to prove a theorem, when you do prove it, mention the failed method in your paper! (Remember: space is now free.)
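
    To make the cubic example concrete, here is a sketch of the underlying algebra. It is my own summary of the standard computation, not a quotation of Gowers’s article, and the coefficient names $a$, $b$, $c$, $p$, $q$ are chosen only for illustration. Completing the square turns a quadratic into a pure square plus a constant. The analogous substitution in a cubic removes the quadratic term but leaves a linear term behind, so the cubic does not become a pure cube plus a constant.

```latex
x^2 + ax + b \;=\; \Bigl(x + \tfrac{a}{2}\Bigr)^{2} + \Bigl(b - \tfrac{a^{2}}{4}\Bigr),
\\[1ex]
x^3 + ax^2 + bx + c \;=\; y^3 + p\,y + q
\quad\text{with}\quad
x = y - \tfrac{a}{3},\;\;
p = b - \tfrac{a^{2}}{3},\;\;
q = \tfrac{2a^{3}}{27} - \tfrac{ab}{3} + c .
```

    Unless $p$ happens to be $0$, a further idea is needed to solve $y^3+py+q=0$.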

    Examples and Counterexamples

    I discovered these examples in twenty minutes on the internet.

    Discussions

    “Mathematical discussion is very useful and virtually unpublishable.” [G] But in the internet age such discussions can take place online, and they do, in discussion lists for particular branches of math. That is not the same thing as discussing in person, but it is still useful.

    Polymath [G]

    Polymath sessions are organized attempts to use a kind of crowdsourcing to study (and hopefully prove) a conjecture. The Polymath blog and the Polymath wiki provide information about ongoing efforts.

    Videos

    • Videos that teach math are used all over the world now, after the spectacular success of Khan Academy.
    • Some math meetings produce videos of invited talks and make them available on YouTube. It would be wonderful if a systematic effort could be made to increase the number of such videos. I suppose part of the problem is that it requires an operator to operate the equipment. It is not impossible that filming an academic lecture could be automated, but I don’t know if anyone is doing this. It ought to be possible. After all, some computer games follow the motions of the player(s).
    • There are some documentaries explaining research-level math to the general public, but I don’t know much about them. Documentaries about other sciences seem much more common.

    References

    The talks in Seattle

    • List of all the talks.
    • W. Timothy Gowers, How should mathematical knowledge be organized? Talk at the AMS Special Session on Mathematical Information in the Digital Age of Science, 6 January 2016.
    • Mathematical discussions, links to pages by Timothy Gowers. “Often [these pages] contain ideas that I have come across in one way or another and wish I had been told as an undergraduate.”
    • Colloquium notes


    abstractmath.org beta

    Around two years ago I began a systematic revision of abstractmath.org. This involved rewriting some of the articles completely, fixing many errors and bad links, and deleting some articles. It also involved changing over from using Word and MathType to writing directly in html and using MathJax. The changeover was very time consuming.

    Before I started the revision, abstractmath.org was in alpha mode, and now it is in beta. That means it still has flaws, and I will be repairing them probably till I can’t work any more, but it is essentially in a form that approximates my original intention for the website.

    I do not intend to bring it out of beta into “final form”. I have written and published three books, two of them with Michael Barr, and I found that the detailed work necessary to put a book into its final form, where it stays frozen, was difficult and took me away from things I wanted to do. I had to do it that way then (the olden days before the internet), but now I think websites that are constantly updated and have live links are far more useful to people who want to learn about some piece of math.

    My last book, the Handbook of Mathematical Discourse, was in fact published after the internet was well under way, but I was still thinking in Olden Days Paper Mode and never clearly realized that there was a better way to do things.

    In any case, the entire website (as well as Gyre&Gimble) is published under a Creative Commons license, so if someone wants to include part or all of it in another website, or in a book, and revise it while they do it, they can do so as long as they publish under the terms of the license and link to abstractmath.org.

    Previous posts about the evolution of abstractmath.org

    Books by Michael Barr and Charles Wells

    Toposes, triples and theories

    Category theory for computing science


    Mathematical Information I

    Introduction

    The January 2016 meeting of the American Mathematical Society in Seattle included a special session on Mathematical Information in the Digital Age of Science. Here is a link to the list of talks in that session (you have to scroll down a ways to get to the list).

    Several talks at that session were about communicating math, to other mathematicians and to the general public. Well, that’s what I have been about for the last 20 years. Mostly.

    Overview

    These posts discuss the ways we communicate math and (mostly in later posts) the revolution in math communication that the internet has caused. Parts of this discussion were inspired by the special session talks. When they are relevant, I include footnotes referring to the talks. Be warned that what I say about these ideas may not be the same as what the speakers had to say, but I feel I ought to give them credit for getting me to think about those concepts.

    Some caveats

    • The distinctions between different kinds of math communication are inevitably fuzzy.
    • Not all kinds of communication are mentioned.
    • Several types of communication normally occur in the same document.

    Articles published in journals

    Until recently, math journals were always published on paper. Now many journals exist only on the internet. What follows is a survey of the types of articles published in journals.

    Refereed papers containing new results

    These communications typically contain proofs of (usually new) theorems. Such papers are the main way that academic mathematicians get credit for their research [G] for the purpose of getting tenure (at least in the USA), although some other types of credit are noted below.

    Proofs published in refereed journals in the past were generally restricted to formal proofs, without very many comments intended to aid the reader’s understanding. This restricted text was often enforced by the journal. In the olden days this would have been prompted by the expense of publishing on paper. I am not sure how much this restriction has relaxed in electronic journals.

    I have been writing articles for abstractmath.org and Gyre&Gimble for many years, and it has taken me a very long time to get over unnecessarily restricting the space I use in what I write. If I introduce a diagram in an article and then want to refer to it later, I don’t have to link to it — I can copy it into the current location. If it makes sense for an informative paragraph to occur in two different articles, I can put it into both articles. And so on. Nowadays, that sort of thing doesn’t cost anything.

    Survey articles and invited addresses

    You may also get credit for an invited address to a prestigious organization, or for a survey of your field, in for example the Bulletin of the AMS. Invited addresses and surveys may contain considerably more explanatory asides. This was quite noticeable in the invited talks at the AMS Seattle meeting.

    Books

    There is a whole spectrum of math books. The following list mentions some Fraunhofer lines on the spectrum, but the gamut really is as continuous as a large finite list of books could be. This list needs more examples. (This is a blog post, so it has the status of an alpha release.)

    Research books that are concise and without much explanation.

    The Bourbaki books that I have dipped into (mostly the algebra book and mostly in the 1970’s) are definitely concise and seem to strictly avoid explanation, diagrams, pictures, etc. I have heard people say they are unreadable, but I have not found them so.

    Contain helpful explanations that will make sense to people in the field but probably would be formidable to someone in a substantially different area.

    Toposes, triples and theories, by Michael Barr and Charles Wells. I am placing our book here in the spectrum because several non-category-theorists (some of them computer scientists) have remarked that it is “formidable” or other words like that.

    Intended to introduce professional mathematicians to a particular field.

    Categories for the working mathematician, by Saunders Mac Lane. I learned from this (the 1971 edition) in my early days as a category theorist, six years after getting my Ph.D. In fact, I think that this book belongs to the grad student level instead of here, but I have not heard any comments one way or another.

    Intended to introduce math graduate students to a particular field.

    There are lots of examples of good books in this area. Years ago (but well after I got my Ph.D.), I found Serge Lang’s Algebra quite useful and studied parts of it in detail.

    But for grad students? It is still used for grad students, but perhaps Nathan Jacobson’s Basic Algebra would be a better choice for a first course in algebra for first-year grad students.

    The post My early life as a mathematician discusses algebra texts in the olden days, among other things.

    Intended to explain a part of math to a general audience.

    Love and math: the heart of hidden reality, by Edward Frenkel, 2014. This is a wonderful book. After reading it, I felt that at last I had some clue as to what was going on with the Langlands Program. He assumes that the reader knows very little about math and gives hand-waving pictorial explanations for some of the ideas. Many of the concepts in the book were already familiar to me (not at an expert level). I doubt that someone who had had no college math courses that included some abstract math would get much out of it.

    Symmetry: A Journey into the Patterns of Nature, by Marcus du Sautoy, 2009. He also produced a video on symmetry.

    My post Explaining “higher” math to beginners describes du Sautoy’s use of terminology (among others).

    Secrets of creation: the mystery of the prime numbers (Volume 1) by Matthew Watkins (author) and Matt Tweed (illustrator), 2015. This is the first book of a trilogy that explains the connection between the Riemann $\zeta$ function and the primes. He uses pictures and verbal descriptions, with very little terminology or symbolic notation. This is the best attempt I know of at explaining deep math that might really work for non-mathematicians.

    My post The mystery of the prime numbers: a review describes the first book.

    Piper Harron’s Thesis

    The Equidistribution of Lattice Shapes of Rings of Integers of Cubic, Quartic, and Quintic Number Fields: an Artist’s Rendering, Ph.D. thesis by Piper Harron.

    This is a remarkable departure from the usual dry, condensed, no-useful-asides Ph.D. thesis in math. Each chapter has three main parts, Layscape (explanations for nonspecialists, though not, in my opinion, for nonmathematicians), Mathscape (most like what goes into the usual math paper but with much more explanation) and Weedscape (irrelevant stuff which she found helpful and perhaps the reader will too). The names of these three sections vary from chapter to chapter. This seems like a great idea, and the parts I have read are well-done.

    These blog posts have useful comments about her thesis:

    Types of explanations

    Any explanation of math in any of the categories above will be of several different types. Some of them are considered here, and more will appear in Mathematical Information II.

    The paper Varieties of Mathematical Prose, by Atish Bagchi and me, provides a more fine-grained description of certain types of math communication that includes some types of explanations and also other types of communication.

    Images and metaphors

    In abstractmath.org

    I have written about images and metaphors in abstractmath.org:

    Abstractmath.org is aimed at helping students who are beginning their study of abstract math, and so the examples are mostly simple and not at a high level of abstraction. In the general literature, the images and metaphors that are written about may be much more sophisticated.

    The User’s Guide [W]

    Luke Wolcott edits a new journal called Enchiridion: Mathematics User’s Guides (this link allows you to download the articles in the first issue). Each article in this journal is written by a mathematician who has published a research paper in a refereed journal. The author’s article in Enchiridion provides information intended to help the reader to understand the research paper. Enchiridion and its rationale are described in more detail in the paper The User’s Guide Project: Giving Experiential Context to Research Papers.

    The guidelines for writing a User’s Guide suggest writing them in four parts, and one of the parts is to introduce useful images and metaphors that helped the author. You can see how the authors’ user’s guides carry this out in the first issue of Enchiridion.

    Piper Harron’s thesis

    Piper Harron’s explanation of integrals in her thesis is a description of integrals and measures using creative metaphors that I think may raise some mathematicians’ consciousness and others’ hackles, but I doubt it would be informative to a non-mathematician. I love “funky-summing” (p. 116ff): it communicates how integration is related to really adding up a finite bunch of numbers in a liberal-artsy way, in other words via the connotations of the word “funky”, in contrast to rigorous math, which depends on every word having an accumulation-of-properties definition.

    The point about “funky-summing” (in my opinion, not necessarily Harron’s) is that when you take the limit of all the Riemann sums as all meshes go to zero, you get a number which

    • Is really and truly not a sum of numbers in any way
    • Smells like a sum of numbers
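
    The “smells like a sum” half of this can be seen numerically. The following is a minimal sketch, assuming the simplest possible setting: left-endpoint Riemann sums of $x^2$ on $[0,1]$ with equal subintervals. The function names are mine, chosen for illustration. Each Riemann sum is an honest finite sum, and the sums approach $1/3$ as the mesh shrinks, but no single one of them equals the limit.

```python
from fractions import Fraction


def riemann_sum(f, a, b, n):
    """Left-endpoint Riemann sum of f on [a, b] using n equal subintervals."""
    width = Fraction(b - a, n)
    return sum(f(a + i * width) * width for i in range(n))


def square(x):
    return x * x


limit = Fraction(1, 3)  # the value of the integral of x^2 over [0, 1]
for n in (10, 100, 1000):
    approx = riemann_sum(square, 0, 1, n)
    # Every approx is a genuine finite sum; the gap to the limit shrinks with n.
    print(n, float(approx), float(limit - approx))
```

    Exact fractions are used so that the gap between each finite sum and the limit is not blurred by floating-point rounding.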

    Connotations communicate metaphors. Metaphors are a major cause of grief for students beginning abstract math, but they are necessary for understanding math. Working around this paradox is probably the most important problem for math teachers.

    Informal summaries of a proof [W]

    The User’s Guide requires a “colloquial summary” of a paper as one of the four parts of the guide for that paper.

    • Wolcott’s colloquial summary of his paper keeps the level aimed at non-mathematicians, starting with a hand-waving explanation of what a ring is. He uses many metaphors in the process of explaining what his paper does.
    • The colloquial summary of another User’s Guide, by Cary Malkiewich, stays strictly at the general-public level. He uses a few metaphors. I liked his explanation of how mathematicians work first with examples, then finding patterns among the examples.
    • The colloquial summary of David White’s paper stays at the general-public level but uses some neat metaphors. He also has a perceptive paragraph discussing the role of category theory in math.

    The summaries I just mentioned are interesting to read. But I wonder if informal summaries aimed at math majors or early grad students might be more useful.

    Insights

    The first of the four parts of the explanatory papers in Enchiridion is supposed to present the key insights and organizing principles that were useful in coming up with the proofs. Some of them do a good job with this. They are mostly very special to the work in question, but some are more general.

    This suggests that when teaching a course in some math subject you make a point of explaining the basic techniques that have turned out very useful in the subject.

    For example, a fundamental insight in group theory is:

    Study the linear representations of a group.

    That is an excellent example of a fundamental insight that applies everywhere in math:

    Find a functor that maps the math objects you are studying to objects in a different branch of math.

    The organizing principles listed in David White’s article have (naturally more specialized) insights like that.

    Proof stories

    “Proof stories” tell in sequence (more or less) how the author came up with a proof. This means describing the false starts, insights and how they came about. Piper Harron’s thesis does that all through her work.

    Some authors do more than that: their proof stories intertwine the mathematical events of their progress with an account of life events, which sometimes make a mathematical difference and sometimes just produce a pause to let the proof stew in their brain. Luke Wolcott wrote a User’s Guide for one of his own papers, and his proof story for that paper involves personal experiences. (I recommend his User’s Guide as a model to learn from.)

    Reports of personal experiences in doing math seem to add to my grasp of the math, but I am not sure I understand why.

    References

    The talks in Seattle

    • List of all the talks.
    • W. Timothy Gowers, How should mathematical knowledge be organized? Talk at the AMS Special Session on Mathematical Information in the Digital Age of Science, 6 January 2016.
    • Colloquium notes. Gowers gave a series of invited addresses for which these are the notes. They have many instances of describing what sorts of problems obstruct a desirable step in the proof and what can be done about it.
    • Luke Wolcott, The User’s Guide. Talk at the AMS Special Session on Mathematical Information in the Digital Age of Science, 6 January 2016.


    Recent revisions to abstractmath.org

    For the last six months or so I have been systematically going through the abstractmath.org files, editing them for consistency, updating them, and in some cases making major revisions.

    In the past I have usually posted revised articles here on Gyre&Gimble, but WordPress makes it difficult to simply paste the HTML into the WP editor, because the editor modifies the HTML and does things such as recognizing line breaks and extra spaces which an HTML interpreter is supposed to ignore.

    Here are two lists of articles that I have revised, with links.

    Major revisions

    Other revised articles

    Other recent changes


    The intent of mathematical assertions

    An assertion in mathematical writing can be a claim, a definition or a constraint.  It may be difficult to determine the intent of the author.  That is discussed briefly here.

    Assertions in math texts can play many different roles.

    English sentences can state facts, ask questions, give commands, and do other things.  The intent of an English sentence is often obvious, but sometimes it can be unexpectedly different from what is apparent in the sentence.  For example, the statement “Could you turn the TV down?” is apparently a question expecting a yes or no answer, but in fact it may be a request. (See the Wikipedia article on speech acts.) Such things are normally understood by people who know each other, but people for whom English is a foreign language or who have a different culture have difficulties with them.

    There are some problems of this sort in math English and the symbolic language, too.  An assertion can have the intent of being a claim, a definition, or a constraint.

    Most of the time the intent of an assertion in math is obvious. But there are conventions and special formats that newcomers to abstract math may not recognize, so they misunderstand the point of the assertion. This section takes a brief look at some of the problems.

    Terminology

    The way I am using the words “assertion”, “claim”, and “constraint” is not standard usage in math, logic or linguistics.


    Claims

    In most circumstances, you would expect that if a lecturer or author makes a math assertion, they are claiming that it is a true statement, and you would be right.

    Examples
    1. “The $240$th digit of $\pi$ after the decimal point is $4$.”
    2. “If a function is differentiable, it must be continuous.”
    3. “$7\gt3$”

    Remarks

    • You don’t have to know whether these statements are true or not to recognize them as claims. An incorrect claim is still a claim.
    • The assertion in (1) is a statement, in this case a false one.  If it claimed the googolth digit was $4$ you would never be able to tell whether it is true or not, but it still would be an assertion intended as a claim.
    • The assertion in (2) uses the standard math convention that an indefinite noun phrase (such as “a widget”) in the subject of a sentence is universally quantified (see also the article about “a” in the Glossary). In other words, it claims that every differentiable function is continuous. Similarly, “An integer divisible by $4$ must be even” claims that any integer divisible by $4$ is even. Both statements are claims, and both are true.
    • (3) is a (true) claim in the symbolic language. (Note that “$3 + 4$” is not an assertion at all, much less a claim.)
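
    Claims (1) and (3) are concrete enough to be tested by computation. The sketch below is only an illustration, and its helper names are mine. It computes digits of $\pi$ with Machin’s formula $\pi = 4\,(4\arctan\tfrac15 - \arctan\tfrac1{239})$ in integer arithmetic, using a few guard digits (assumed sufficient here) to absorb truncation error. It also shows that a programming language draws the same line as the last remark: “$7\gt3$” evaluates to a truth value, while “$3+4$” evaluates to a number.

```python
def arctan_inverse(x, scale):
    """Approximate scale * arctan(1/x) by the alternating series, in integers."""
    power = scale // x            # scale / x^(2k+1), starting with k = 0
    total = power
    x_squared = x * x
    k = 1
    while power:
        power //= x_squared
        term = power // (2 * k + 1)
        total += -term if k % 2 else term
        k += 1
    return total


def pi_digit(position, guard=10):
    """Return the digit of pi at the given position after the decimal point."""
    scale = 10 ** (position + guard)
    pi_scaled = 4 * (4 * arctan_inverse(5, scale) - arctan_inverse(239, scale))
    digits = str(pi_scaled // 10 ** guard)   # "3" followed by `position` digits
    return int(digits[position])


claim_1 = (pi_digit(240) == 4)   # a claim: it has a truth value, right or wrong
claim_3 = (7 > 3)                # also a claim
not_a_claim = 3 + 4              # an expression naming a number, not an assertion

print(claim_1, claim_3)
print(type(claim_3).__name__, type(not_a_claim).__name__)
```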


    Definitions

    Definitions are discussed primarily in the chapter on definitions. A definition is not the same thing as a claim.

    Example

    The definition

    “An integer is even if it is divisible by $2$”

    makes the claim

    “An integer is even if and only if it is divisible by $2$”

    true.

    (If you are surprised that the definition uses “if” but the claim uses “if and only if”, see the Glossary article on “if”.)

    Unmarked definitions

    Math texts sometimes define something without saying that it is a definition. Because of that, students may sometimes think a claim is a definition.

    Example

    Suppose that the concept of “even integer” was new to you and the book said, “A number is even if it is divisible by $4$.” Perhaps you thought that this was a definition. Later the book refers to $6$ as even and you pull your hair out wondering why. The statement is a correct claim but an incorrect definition. A good writer would write something like “Recall that a number is even if it is divisible by $2$, so that in particular it is even if it is divisible by $4$.”
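
    The difference between a correct claim and a correct definition can be checked mechanically on a sample of integers. This is a minimal sketch with illustrative function names: the claim “divisible by $4$ implies even” holds for every integer tested, but using “divisible by $4$” as the definition of even would wrongly exclude numbers such as $6$.

```python
def is_even(n):
    """The definition: an integer is even if (and only if) it is divisible by 2."""
    return n % 2 == 0


def divisible_by_4(n):
    return n % 4 == 0


sample = range(-20, 21)

# The claim "if n is divisible by 4 then n is even" holds on the whole sample.
claim_holds = all(is_even(n) for n in sample if divisible_by_4(n))

# Read as a definition, "divisible by 4" misses these even numbers.
missed_by_wrong_definition = [n for n in sample if is_even(n) and not divisible_by_4(n)]

print(claim_holds)
print(6 in missed_by_wrong_definition)
```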

    On the other hand, you may think a definition is only a claim.

    Example

    A lecturer may say “By definition, an integer is even if it is divisible by $2$”, and you write down: “An integer is even if it is divisible by $2$”. Later, you get all panicky wondering How did she know that?? (This has happened to me.)

    The confusion in the preceding example can also occur if a book says, “An integer is even if it is divisible by $2$” and you don’t know about the convention that when an author puts a word or phrase in boldface or italics it may mean that they are defining it.

    A good writer always labels definitions.


    Constraints

    Here are two assertions that contain variables.

    • “$n$ is even.”
    • “$x\gt1$”.

    Such an assertion is a constraint (or a condition) if the intent is that the assertion will hold in that part of the text (the scope of the constraint). The part of the text in which it holds is usually the immediate vicinity unless the author explicitly says it will hold in a larger part of the text such as “this chapter” or “in the rest of the book”.

    Examples
    • Sometimes the wording makes it clear that the phrase is a constraint. So a statement such as “Suppose $3x^2-2x-5\geq0$” is a constraint on the possible values of $x$.
    • The statement “Suppose $n$ is even” is an explicit requirement that $n$ be even and an implicit requirement that $n$ be an integer.
    • A condition for which you are told to find the solution(s) is a constraint. For example: “Solve the equation $3x^2-2x-5=0$”. This equation is a constraint on the variable $x$. “Solving” the equation means saying explicitly which numbers make the equation true.
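
    As a worked illustration of the constraints in these examples, the sketch below finds the numbers that make $3x^2-2x-5=0$ true and tests a few values against $3x^2-2x-5\geq0$. The helper names are mine, and the solver is deliberately limited to integer coefficients whose discriminant is a perfect square, which is an assumption that happens to hold here.

```python
from fractions import Fraction
from math import isqrt


def solve_quadratic(a, b, c):
    """Rational solutions of a*x^2 + b*x + c = 0 for integers a, b, c with a != 0."""
    discriminant = b * b - 4 * a * c
    if discriminant < 0:
        return []                      # no real solutions at all
    root = isqrt(discriminant)
    if root * root != discriminant:
        raise ValueError("irrational solutions are outside this sketch")
    return sorted({Fraction(-b - root, 2 * a), Fraction(-b + root, 2 * a)})


def satisfies_inequality(x):
    """Does x satisfy the constraint 3x^2 - 2x - 5 >= 0?"""
    return 3 * x**2 - 2 * x - 5 >= 0


solutions = solve_quadratic(3, -2, -5)
print(solutions)
print([x for x in (-2, -1, 0, 1, 2) if satisfies_inequality(x)])
```

    “Solving” the equation produces an explicit list of values, whereas the inequality only restricts which values $x$ may take.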

    Postconditions

    The constraint may appear in parentheses after the assertion as a postcondition on an assertion.

    Example

    “$x^2\gt x\qquad(\text{all }x\gt1)$”

    which means that if the constraint “$x\gt1$” holds, then “$x^2\gt x$” is true. In other words, for all $x\gt1$, the statement $x^2\gt x$ is true. In this statement, “$x^2\gt x$” is not a constraint, but a claim which is true when the constraint is true.
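
    For completeness, here is a one-line justification of the claim under the constraint. It is a routine computation added only as an illustration:

```latex
x > 1 \;\Longrightarrow\; x > 0 \text{ and } x - 1 > 0
\;\Longrightarrow\; x^2 - x = x\,(x-1) > 0
\;\Longrightarrow\; x^2 > x .
```

    Without the constraint the claim can fail: for $x=\tfrac12$ we get $x^2=\tfrac14\lt x$.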


    Context

    This is a revised draft of the abstractmath.org article on context in math texts. Note: WordPress changed double primes into quotes. Tsk.

    Context

    Written and especially spoken language depends heavily on the context – the physical surroundings, the preceding conversation, and social and cultural assumptions.  Mathematical statements are produced in such contexts, too, but here I will discuss a special thing that happens in math conversation and writing that does not seem to happen much in other sorts of discourse:

    The meanings of expressions
    in both the symbolic language and math English
    change from phrase to phrase
    as the speaker or writer changes the constraints on them.

    Example

    In a math text, before the occurrence of a phrase such as “Let $n=3$”, $n$ may be known only as an integer variable.  After the phrase, it means specifically $3$.  So this phrase changes the meaning of $n$ by constraining $n$ to be $3$.  We say the context of occurrences of “$n$” before the phrase requires only that $n$ be an integer, but after the occurrence the context requires $n=3$.

    Definition

    In this article, the context at a particular location in mathematical discourse is the sum total of what the reader or listener can know about the symbols and names used in the discourse when they have read everything up to that location.
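
    One way to picture this definition is as a record that grows as the text is read. The following toy model is purely illustrative and all of its names are hypothetical: the context is a mapping from each symbol to the constraints known so far, and each phrase of the discourse produces a new context from the previous one.

```python
def update_context(context, symbol, constraint):
    """Return a new context in which `symbol` carries one more constraint."""
    new_context = {name: list(known) for name, known in context.items()}
    new_context.setdefault(symbol, []).append(constraint)
    return new_context


# The discourse from the example above: n is first an integer, then "Let n = 3".
discourse = [("n", "is an integer"), ("n", "equals 3")]

context = {}
history = [context]          # history[k] is the context after reading k phrases
for symbol, constraint in discourse:
    context = update_context(context, symbol, constraint)
    history.append(context)

for location, snapshot in enumerate(history):
    print(location, snapshot)
```

    Keeping every snapshot makes the point of the definition visible: what the reader knows about $n$ depends on the location in the text.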

    Remarks

    • Each clause can change the meaning of or constraints on one or more symbols or names. The conventions in effect during the discourse can also put constraints on the symbols and names.
    • Chierchia and McConnell-Ginet give a mathematical definition of context in the sense described here.
    • The references to “before” and “after” the phrase “Let $n=3$” refer to the physical location in text and to actual time in spoken math. There is more about this phenomenon in the Handbook of Mathematical Discourse, page 252, items (f) and (g).
    • Contextual changes of this sort take place using the pretense that you are reading the text in order, which many students and professionals do not do (they are “grasshoppers”).
    • I am not aware of much context-changing in everyday speech. One place it does occur is in playing games. For example, during some card games the word “trumps” changes meaning from time to time.
    • In symbolic logic, the context at a given place may be denoted by “$\Gamma$”.

    Detailed example of a math text

    Here is a typical example of a theorem and its proof.  It is printed twice, the second time with comments about the changes of context.  This is the same proof that is already analyzed practically to death in the chapter on presentation of proofs.

    First time through

    Definition: Divides

    Let $m$ and $n$ be integers with $m\ne 0$. The statement “$m$ divides $n$” means that there is an integer $q$ for which $n=qm$.

    Theorem

    Let $m$, $n$ and $p$ be integers, with $m$ and $n$ nonzero, and suppose $m$ divides $n$ and $n$ divides $p$.  Then $m$ divides $p$.

    Proof

    By definition of divides, there are integers $q$ and $q'$ for which $n=qm$ and $p=q'n$. We must prove that there is an integer $q''$ for which $p=q''m$. But $p=q'n=q'qm$, so let $q''=q'q$.  Then $p=q''m$.
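
    The proof is constructive: it tells you how to build the witness for “$m$ divides $p$” from the witnesses for the two hypotheses. The sketch below mirrors that construction on concrete integers. The function names are illustrative, and running it on examples checks instances of the theorem but is of course not a proof.

```python
def divides_witness(m, n):
    """Return an integer q with n == q * m if m divides n, otherwise None."""
    if m == 0:
        raise ValueError("the definition requires m to be nonzero")
    q, remainder = divmod(n, m)
    return q if remainder == 0 else None


def transitivity_witness(m, n, p):
    """Given m | n and n | p, build the q'' of the proof and check p == q'' * m."""
    q = divides_witness(m, n)          # n == q * m
    q_prime = divides_witness(n, p)    # p == q_prime * n
    if q is None or q_prime is None:
        raise ValueError("the hypothesis of the theorem does not hold")
    q_double_prime = q_prime * q       # the step "let q'' = q'q" in the proof
    assert p == q_double_prime * m
    return q_double_prime


print(transitivity_witness(3, 12, 60))
print(transitivity_witness(-3, 12, -60))
```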

    Second time, with analysis

    Definition: Divides

    Begins a definition. The word “divides” is the word being defined. The scope of the definition is the following paragraph.

    Let $m$ and $n$ be integers

    $m$ and $n$ are new symbols in this discourse, constrained to be integers.

    with $m\ne 0$

    Another constraint on $m$.

    The statement “$m$ divides $n$” means that

    This phrase means that what follows is the definition of “$m$ divides $n$”

    there is an integer $q$

    “There is” signals that we are beginning an existence statement and that $q$ is the bound variable within the existence statement.

    for which $n=qm$

    Now we know that “$m$ divides $n$” and “there is an integer $q$ for which $n=qm$” are equivalent statements.  Notes: (1) The first statement would only have implied the second statement if this had not been in the context of a definition. (2) After the conclusion of the definition, $m$, $n$ and $q$ are undefined variables.

    Theorem

    This announces that the next paragraph is a statement that has been proved. In fact, in real time the statement was proved long before this discourse was written, but in terms of reading the text in order, it has not yet been proved.

    Let $m$, $n$ and $p$ be integers,

    “Let” tells us that the following statement is the hypothesis of an implication, so we can assume that $m$, $n$ and $p$ are all integers.  This changes the status of $m$ and $n$, which were variables used in the preceding paragraph, but whose constraints disappeared at the end of the paragraph.  We are starting over with $m$ and $n$.

    with $m$ and $n$ nonzero.

    This clause is also part of the hypothesis. We can assume $m$ and $n$ are constrained to be nonzero.

    and suppose $m$ divides $n$ and $n$ divides $p$.

    This is the last clause in the hypothesis. We can assume that $m$ divides $n$ and $n$ divides $p$.

    Then $m$ divides $p$.

    This is a claim that $m$ divides $p$. It has a different status from the assumptions that $m$ divides $n$ and $n$ divides $p$. If we are going to follow the proof we have to treat $m$ and $n$ as if they divide $n$ and $p$ respectively. However, we can’t treat $m$ as if it divides $p$. All we know is that the author is claiming that $m$ divides $p$, given the facts in the hypothesis.

    Proof

    An announcement that a proof is about to begin, meaning a chain of math reasoning. The fact that it is a proof of the Theorem just stated is not explicitly stated.

    By definition of divides, there are integers $q$ and $q’$ for which $n=qm$ and $p=q’n$.

    The proof uses the direct method (rather than contradiction or induction or some other method) and begins by rewriting the hypothesis using the definition of “divides”. The proof does not announce the use of these techniques; it just starts in doing it. So $q$ and $q’$ are new symbols that satisfy the equations $n=qm$ and $p=q’n$. The phrase “by definition of divides” justifies the introduction of $q$ and $q’$. $m$, $n$ and $p$ have already been introduced in the statement of the Theorem.

    We must prove that there is an integer $q”$ for which $p=q”m$.

    Introduces a new variable $q”$ which has not been given a value. We must define it so that $p=q”m$; this requirement is justified (without saying so) by the definition of “divides”.

    But $p=q’n=q’qm$,

    This is a claim about $p$, $q$, $q’$, $m$ and $n$.  It is justified by certain preceding sentences but this justification is not made explicit. Note that “$p=q’n=q’qm$” pivots on $q’n$, in other words makes two claims about it.

    so let $q”=q’q$.

    We have already introduced $q”$; now we give it the value $q”=q’q$.

    Then $p=q”m$

    This is an assertion about $p$, $q”$ and $m$, justified (but not explicitly; note the hidden use of associativity) by the previous claim that $p=q’n=q’qm$.

     

    The proof is now complete, although no statement asserts that it is.

    Remark

    If you have some skill in reading proofs, all the stuff in the right hand column happens in your brain without, for the most part, your being conscious of it.

    Acknowledgment

    Thanks to Chris Smith for correcting errors.

    References for “context”

    Chierchia, G. and S. McConnell-Ginet (1990), Meaning and Grammar. The MIT Press.

    de Bruijn, N. G. (1994), “The mathematical vernacular, a language for mathematics with typed sets”. In Selected Papers on Automath, Nederpelt, R. P., J. H. Geuvers, and R. C. de Vrijer, editors, volume 133 of Studies in Logic and the Foundations of Mathematics, pages 865–935. Elsevier.

    Steenrod, N. E., P. R. Halmos, M. M. Schiffer, and J. A. Dieudonné (1975), How to Write Mathematics. American Mathematical Society.


    The real numbers

    My website abstractmath.org contains separate short articles about certain number systems (natural numbers, integers, rationals, reals). The intent of each article is to discuss problems that students have when they begin studying abstract math. The articles do not give complete coverage of each system. They contain links when concepts are mentioned that the reader might not be familiar with.

    This post is a revision of the abstractmath.org article on the real numbers. The other articles have also been recently revised.

    Introduction

    A real number is a number that can be represented as a (possibly infinite) decimal expansion, such as 2.56, -3 (which is -3.0), 1/3 (which has the infinite decimal expansion 0.333…), and $\pi$. Every integer and every rational number is a real number, but numbers such as $\sqrt{2}$ and $\pi$ are real numbers that are not rational.

    • I will not give a mathematical definition of “real number”.  There are several equivalent definitions of real number all of which are quite complicated.   Mathematicians rarely think about real numbers in terms of these definitions; what they have in mind when they work with them are their familiar algebraic and topological properties.
    • “Real number” is a technical term.  Real numbers are not any more “genuine” than any other numbers.
    • Integers and rational numbers are real numbers, but there are real numbers that are not integers or rationals. One such number is $\sqrt{2}$. Such numbers are called irrational numbers.

    Properties of the real numbers

    Closure

    The real numbers are closed under addition, subtraction, and multiplication, as well as division by a nonzero number.

    Notice that these are exactly the same arithmetic closure properties that rational numbers have. In the previous sections in this chapter on numbers, each new number system (natural numbers, integers and rational numbers) was closed under more arithmetic operations than the earlier ones. We don’t appear to have gained anything concerning arithmetic operations in going from the rationals to the reals.

    The real numbers do allow you to find zeroes of some polynomials that don’t have rational zeroes. For example, the equation $x^2-2=0$ has the root $x=\sqrt{2}$, which is a real number but not a rational number. However, you get only some zeroes of polynomials by going to the reals — consider the equation $x^2+2=0$, which requires going to the complex numbers to get a root.

    Closed under limits

    The real numbers are closed under another operation (not an algebraic operation) that rational numbers are not closed under:

    The real numbers are closed under taking limits. That fact is the primary reason real numbers are so important in math, science and engineering.

    Consider: The concepts of continuous function, derivative and integral — the basic ideas in calculus and differential equations — are all defined in terms of limits. Those are the basic building blocks of mathematical analysis, which provides most of the mathematical tools used by scientists and engineers.
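
    A sequence of rational numbers whose limit is not rational shows what the rationals lack. The sketch below (Python, using exact fractions) applies Newton's method to $x^2-2=0$; that iteration and the function name `newton_sqrt2` are choices made for this illustration. Every term is a rational number and the squares of the terms get closer and closer to $2$, but the limit, $\sqrt{2}$, is not rational.

```python
from fractions import Fraction

def newton_sqrt2(steps):
    """Rational approximations to sqrt(2) from Newton's method, starting at 1."""
    x = Fraction(1)
    terms = [x]
    for _ in range(steps):
        x = (x + 2 / x) / 2   # stays an exact Fraction at every step
        terms.append(x)
    return terms

terms = newton_sqrt2(4)              # 1, 3/2, 17/12, 577/408, ...
gaps = [t * t - 2 for t in terms]    # how far each square is from 2
```

    The gaps shrink quickly but none of them is ever exactly $0$, since no rational number has square $2$.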

    Some images and metaphors for real numbers

    Line segments

    The length of any line segment is given by a positive real number.

     

    Example

    The diagonal of the square above has length $2\sqrt{2}$.

    Directed line segments

    Measuring directed line segments requires the use of negative real numbers as well as positive ones. You can regard the diagonal above as a directed line segment. If you regard “left to right” as the positive direction (which is what we usually do), then if you measure it from right to left you get $-2\sqrt{2}$.

    Real numbers are quantities

    Real numbers are used to measure continuous variable quantities.

    Examples
    • The temperature at a given place and a given time.
    • The speed of a moving car.
    • The amount of water in a particular jar.

    Remarks

    • Temperature, speed, volume of water are thought of as quantities that can change, or be changed, which is why I called them “variable” quantities.
    • The name “continuous” for these quantities indicates that the quantity can change from one value to another without “jumping”. (This is a metaphor, not a mathematical definition!)
    Example

    If you have $1.334 \text{ cm}^3$ of water in a jar you can add any additional small amount into it or you can withdraw any small amount from it.  The volume does not suddenly jump from $1.334$ to $1.335$ – as you put in the water it goes up gradually from $1.334$ to $1.335$.

    Caveat

    This explanation of “continuous quantity” is done in terms of how we think about continuous quantities, not in terms of a mathematical definition.  In fact, since you can’t measure an amount smaller than one molecule of water, the volume does jump up in tiny discrete amounts.   Because of quantum phenomena, temperature and speed change in tiny jumps, too (much tinier than molecules).

    Quantum jumps and individual molecules are ignored in large-scale physical applications because the scale at which they occur is so tiny it doesn’t matter.  For such applications, physicists and chemists (and cooks and traffic policemen!) think of the quantities they are measuring as continuous, even though at tiny scales they are not.

    The fact that scientists and engineers treat changes of physical quantities as continuous, ignoring the fact that they are not continuous at tiny scales, is sometimes called the “continuum hypothesis”. This is not what mathematicians mean by that phrase: see continuum hypothesis in Wikipedia.

    The real line

    It is useful to visualize the set of real numbers as the real line.

    The real line goes off to infinity in both directions. Each real number represents a location on the real line. Some locations are shown here:

    The locations are commonly called points on the real line.  This can lead to a seriously mistaken mental image of the reals as a row of points, like beads.  Just as in the case of the rationals, there is no real number “just to the right” of a given real number. 
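
    One way to see why there is no “next” real number: if $a$ and $b$ are real numbers with $a\lt b$, then their average lies strictly between them,\[a\lt\frac{a+b}{2}\lt b\]so $b$ cannot be the number “just to the right” of $a$.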

    Decimal representation of the real numbers

    In this section, I will go into more detail about the decimal representation of the real numbers. There are two reasons for doing this.

    • People just beginning abstract math tend to think in terms of bad metaphors about the real numbers as decimals, and I want to introduce ways of thinking about them that are more helpful.
    • The real numbers can be defined in terms of the decimal representation. This is spelled out in a blog post by Tim Gowers. The definition requires some detail and in some ways is inelegant compared to the definitions usually used in analysis textbooks. But it means that the more you understand about the decimal representation, the better you understand real numbers, and in a pretty direct way.

    The decimal representation of a real number is also called its decimal expansion.  A representation can be given to other bases besides $10$; more about that here.

    Decimal representation as directed length.

    The decimal representation of a real number gives the approximate location of the number on the real line as its directed distance from $0$.

    Examples
    • The rational number $1/2$ is real and has the decimal representation $0.5$.
    • The rational number $-1/2$ has the representation $-0.5$.
    • The number $1/3$ is also real and has the infinite decimal representation $0.333\ldots$. There is an infinite number of $3$’s, or to put it another way, for every positive integer $n$, the $n$th decimal place of the decimal representation of $1/3$ is $3$.
    • The number $\pi $ has a decimal representation beginning $3.14159\ldots$. So you can locate $\pi$ approximately by going $3.14$ units to the right from $0$.  You can locate it more exactly by going $3.14159$ units to the right, if you can measure that accurately.  The decimal representation of $\pi$ is infinitely long so you can theoretically represent it with as much accuracy as you wish.  In practice, of course, it would take longer than the age of the universe to find the first ${{10}^{({{10}^{10}})}}$ digits.
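
    For a rational number, the digits in these examples can be produced by ordinary long division. The following Python sketch assumes a nonnegative numerator and a positive denominator; the function name `decimal_digits` is made up for this illustration.

```python
def decimal_digits(p, q, places):
    """Integer part and first `places` decimal digits of p/q, for integers p >= 0 and q > 0."""
    integer_part, remainder = divmod(p, q)
    digits = []
    for _ in range(places):
        # One step of long division: bring down a zero, then divide.
        digit, remainder = divmod(remainder * 10, q)
        digits.append(digit)
    return integer_part, digits

third = decimal_digits(1, 3, 6)   # 1/3
half = decimal_digits(1, 2, 3)    # 1/2
```

    However large `places` is, every digit returned for $1/3$ is a $3$, and every digit after the first for $1/2$ is a $0$.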

    Bar notation

    It is customary to put a bar over a sequence of digits at the end of a decimal representation to indicate that the sequence is repeated forever. 

    Examples
    • $42\frac{1}{3}=42.\overline{3}$
    • $52.71656565\ldots$ (the group $65$ repeating infinitely often) may be written $52.71\overline{65}$.
    • A decimal representation that is only finitely long, for example $5.477$, could also be written $5.477\overline{0}$.
    • In particular, $6=6.0=6.\overline{0}$, and that works for any integer.
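
    For a rational number, the block that goes under the bar can be found by long division: as soon as a remainder repeats, the digits repeat from that point on. Here is a minimal Python sketch of that idea, assuming a nonnegative numerator and a positive denominator. The function name `repeating_decimal` is hypothetical, and the fraction $260947/4950$ used for the second example was worked out for this illustration.

```python
def repeating_decimal(p, q):
    """Return (integer part, non-repeating digits, repeating digits) of p/q, for p >= 0 and q > 0."""
    integer_part, remainder = divmod(p, q)
    digits = []
    seen = {}   # remainder -> position of the digit that remainder produces
    while remainder not in seen:
        seen[remainder] = len(digits)
        digit, remainder = divmod(remainder * 10, q)
        digits.append(str(digit))
    start = seen[remainder]   # the repeating block begins here
    return integer_part, "".join(digits[:start]), "".join(digits[start:])

a = repeating_decimal(127, 3)        # 42 1/3
b = repeating_decimal(260947, 4950)  # 52.71656565...
c = repeating_decimal(5477, 1000)    # 5.477
```

    The three parts returned correspond to what is written before the decimal point, after the point but not under the bar, and under the bar. A terminating decimal such as $5.477$ comes out with the repeating block $0$.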

    Approximations

    If you give the first few decimal places of a real number, you are giving an approximation to it.  Mathematicians on the one hand and scientists and engineers on the other tend to treat expressions such as $3.14159$ in two different ways:

    • The mathematician may think of it as a precisely given number, namely $\frac{314159}{100000}$, so in particular it represents a rational number. This number is not $\pi$, although it is close to it.
    • The scientist or engineer will probably treat it as the known part of the decimal representation of a real number. From their point of view, one knows $3.14159$ to six significant figures.
    • Abstractmath.org always takes the mathematician’s point of view.  If I refer to $3.14159$, I mean the rational number $\frac{314159}{100000}$.  I may also refer to $\pi$ as “approximately $3.14159$”.
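
    The mathematician’s reading can be imitated in Python with exact fractions. This is only a sketch; note that `math.pi` is itself a floating-point approximation to $\pi$, so the error computed here is approximate too.

```python
from fractions import Fraction
import math

approx = Fraction("3.14159")             # the exact rational number 314159/100000
error = abs(float(approx) - math.pi)     # roughly how far that rational is from pi
```

    The value of `approx` is a rational number in its own right; it is close to $\pi$ but not equal to it.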

    Integers and reals in computer languages

    Computer languages typically treat integers as if they were distinct from real numbers. In particular, many languages have the convention that the expression ‘$2$’ denotes the integer and the expression ‘$2.0$’ denotes the real number.   Mathematicians do not use this convention.  They usually regard the integer $2$ and the real number $2.0$ as the same mathematical object.
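
    Python is one language that follows this convention. In the short sketch below (the variable names are made up for illustration), the two expressions compare as equal but have different types. Keep in mind that a Python `float` is a finite approximation, not a real number in the mathematical sense.

```python
two_int = 2      # an integer (type int)
two_real = 2.0   # a floating-point number (type float)

same_value = (two_int == two_real)               # the values compare as equal
same_type = (type(two_int) is type(two_real))    # but the types differ
exact_arithmetic = (0.1 + 0.2 == 0.3)            # floats only approximate real arithmetic
```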

    Decimal representation and infinite series

    The decimal representation of a real number is shorthand for a particular infinite series.  Suppose the part before the decimal point is the integer $n$ and the part after the decimal point is\[{{d}_{1}}{{d}_{2}}{{d}_{3}}\ldots\]where ${{d}_{i}}$ is the digit in the $i$th place.  (For example, for $\pi$, $n=3$, ${{d}_{1}}=1,\,\,\,{{d}_{2}}=4,\,\,\,{{d}_{3}}=1,$ and so forth.)  Then the decimal notation $n.{{d}_{1}}{{d}_{2}}{{d}_{3}}\ldots$ represents the sum of the infinite series\[n+\sum\limits_{i=1}^{\infty }{\frac{{{d}_{i}}}{{{10}^{i}}}}\]

    Example

                 \[42\frac{1}{3}=42+\sum\limits_{i=1}^{\infty}{\frac{3}{{{10}^{i}}}}\]

    The number $42\frac{1}{3}$ is exactly equal to the sum of the infinite series, which is represented by the expression $42.\overline{3}$.
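
    The equality can be checked with the formula for the sum of a geometric series, which is assumed known here. The series has first term $\frac{3}{10}$ and ratio $\frac{1}{10}$, so\[\sum\limits_{i=1}^{\infty}{\frac{3}{{{10}^{i}}}}=\frac{3/10}{1-1/10}=\frac{3}{9}=\frac{1}{3}\]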

    If you stop the series after a finite number of terms, then the number is approximately equal to the resulting sum. For example, $42\frac{1}{3}$ is approximately equal to\[42+\frac{3}{10}+\frac{3}{100}+\frac{3}{1000}\]which is the same as $42.333$.

    This inequality gives an estimate of the accuracy of this approximation:\[42.333\lt42\frac{1}{3}\lt42.334\]
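
    The size of the error can be computed exactly. The following Python sketch uses exact fractions; the function name `partial_sum` is made up for this illustration.

```python
from fractions import Fraction

def partial_sum(n, digits):
    """Exact value of n + d_1/10 + ... + d_k/10^k for a list of k digits."""
    return n + sum(Fraction(d, 10 ** i) for i, d in enumerate(digits, start=1))

exact = Fraction(127, 3)               # 42 1/3
approx = partial_sum(42, [3, 3, 3])    # 42.333
error = exact - approx                 # what the remaining terms of the series add up to
```

    Here the error is exactly $\frac{1}{3000}$, which is less than $\frac{1}{1000}$, in agreement with the inequality.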

    How to think about infinite decimal representations

    The expression $42.\overline{3}$ must be thought of as including all the $3$’s all at once rather than as gradually extending to the right over an infinite period of time.

    In ordinary English, the “…” often indicates continuing through time, as in this example

    “They climbed to the top of the ridge, and saw another, higher ridge in the distance, so they walked to that ridge and climbed it, only to see another one still further away…”

    But the situation with decimal representations is different:

    The decimal representation of $42\frac{1}{3}$ as $42.333\ldots$ must be thought of as a complete, infinitely long sequence of decimal digits, every one of which (after the decimal point) is a “$3$” right now.

    In the same way, you need to think of the decimal expansion of $\sqrt{2}$ as having all its decimal digits in place at once. Of course, in this case you have to calculate them in order. And note that calculating them is only finding out what they are. They are already there!

    The preceding description is about how a mathematician thinks about infinite decimal expansions.  The thinking has some sort of physical representation in your head that allows you to think about the hundred millionth decimal place of $\sqrt{2}$ or of $\pi$ even if you don’t know what it is. This does not mean that you have an infinite number of slots in your brain, one for each decimal place!  Nor does it mean that the infinite number of decimal places actually exist “somewhere”.  After all, you can think about unicorns and they don’t actually exist somewhere.
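
    Finding out what the digits of $\sqrt{2}$ are is a matter of computation. Here is one possible sketch in Python, using the integer square root function `math.isqrt` (available in Python 3.8 and later); the function name `sqrt2_to` is made up for this illustration. It relies on the fact that the integer part of $\sqrt{2}\cdot 10^k$ is the integer square root of $2\cdot 10^{2k}$.

```python
from math import isqrt

def sqrt2_to(places):
    """Decimal expansion of sqrt(2) truncated to `places` decimal places (places >= 1), as a string."""
    scaled = isqrt(2 * 10 ** (2 * places))   # integer part of sqrt(2) * 10**places
    s = str(scaled)
    return s[0] + "." + s[1:]

ten_places = sqrt2_to(10)
twenty_places = sqrt2_to(20)
```

    Asking for more places never changes the digits already found; it only reveals more of an expansion that is already determined.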

    Exact definitions

    Both the following statements are true:

    • The numbers $1/3$, $\sqrt{2}$ and $\pi $ have infinitely long decimal representations, in contrast for example to $\frac{1}{2}$, whose decimal representation is exactly $0.5$.
    • The expressions “$1/3$”, “$\sqrt{2}$” and “$\pi $” exactly determine the numbers $1/3$, $\sqrt{2}$ and $\pi$.

    These two statements don’t contradict each other. All three numbers have exact definitions.

    • $1/3$ is exactly the number that gives 1 when multiplied by $3$.
    • $\sqrt{2}$ is exactly the unique positive real number whose square is 2.
    • $\pi $ is exactly the ratio of the circumference of a circle to its diameter.

    The decimal representation of each one to a finite number of places provides an approximate location of that number on the real line. On the other hand, the complete decimal representation of each one represents it exactly, although you can’t write it down.

    Different decimal representations for the same number

    The decimal representations of two different real numbers must be different. However, two different decimal representations can, in certain circumstances, represent the same real number. This happens when the decimal representation ends in an infinite sequence of $9$’s or an infinite sequence of $0$’s.

    Examples

    • $0.\overline{9}=1.\overline{0}$. This means that $0.\overline{9}$ is exactly the same number as $1$. It is not just an approximation of $1$.
    • $3.4\overline{9}=3.5\overline{0}$. Indeed, $3.4\overline{9}$, $3.5$, $35/10$, and $7/2$ are all different representations of the same number.

    The Wikipedia article “$0.\overline{9}$” is an elaborate discussion of the fact that $0.\overline{9}=1$, a fact that many students find hard to believe.
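
    One standard way to see that $0.\overline{9}=1$ is to read the decimal representation as an infinite series, as described earlier, and sum it as a geometric series (the geometric series formula is assumed known here):\[0.\overline{9}=\sum\limits_{i=1}^{\infty}{\frac{9}{{{10}^{i}}}}=\frac{9/10}{1-1/10}=\frac{9}{9}=1\]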



    Creative Commons License

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
