Tag Archives: category theory

A slow introduction to category theory

Category theory turns math inside-out. Definitions depend on nothing inside, but on everything outside. — John Cook

About this post

This is a draft of the first part of an article on category theory that will be posted on abstractmath.org. It replaces an earlier version that was posted in June, 2016.

During the last year or so, I have been monitoring the category theory questions on Math Stack Exchange. Some of the queries are clearly from people who do not have enough of a mathematical background to understand basic abstract reasoning, for example the importance of definitions and the difficulties described in the abmath artice on Dysfunctional attitudes and behaviors. Category theory has become important in several fields outside mathematics, for example computer science and database theory.

This article is intended to get people started in category theory by giving a very detailed definition of “category” and some examples described in detail with an emphasis on how the example fits the definition of category. That’s all the present version does, but I intend to add some examples of constructions and properties such as the dual category, product, and other concepts that some of the inquirers on Math Stack Exchange had great difficulty with.

There is no way in which this article is a proper introduction to category theory. It is intended only to give beginners some help over the initial steps of understanding the subject, particularly the aspects of understanding that cause many hopeful math majors to fall off the Abstraction Cliff.

About categories

To be written.

Definition of category

A category is a type of Mathematical structure consisting of two types of data, whose relationships are entirely determined by some axioms. After the definition is complete, I introduce several example categories with a detailed discussion of each one, explaining how they fit the definition of category.

Axiom 1: Data

A category consists of two types of data: objects and arrows.

Notes for Axiom 1

  • You will see in the section on Examples of categories that every definition of a category $\mathsf{C}$ starts by specifying what the objects of $\mathsf{C}$ are and what the arrows of $\mathsf{C}$ are. That is what Axiom I requires.
  • An object of a category can be any kind of mathematical object. It does not have to be a set and it does not have to have elements.
  • Arrows of a category are also called morphisms. You may be familiar with “homomorphisms”, “homeomorphisms” or “isomorphisms”, all of which are functions. This does not mean that a “morphism” in an arbitrary category is a function.

Axiom 2: Domain and codomain

Each arrow of a category has a domain and a codomain, each of which is an object of the category.

Notes for Axiom 2

  • The domain and the codomain of an arrow may or may not be the same object.
  • Each arrow has only one domain and only one codomain.
  • If $f$ is an arrow with domain $A$ and codomain $B$, that fact is typically shown either by the notation “$f:A\to B$” or by a diagram like this:
  • Warning: The notation “$f:A\to B$” is like that used for functions. This notation may be used in any category, but it does not imply that $f$ is a function or that $A$ and $B$ have elements.
  • For an arrow $f:A\to B$, the notation “$\text{dom}(f)$” refers to $A$ and “$\text{cod}(f)$” refers to $B$.
  • For a given category $\mathsf{C}$, the collection of all the arrows with domain $A$ and codomain $B$ may be denoted by
    • “$\text{Hom}(A,B)$” or
    • “$\text{Hom}_\mathsf{C}(A,B)$” or
    • “$\mathsf{C}(A,B)$”.

  • Some newer books and articles in category theory use the name source for domain and target for codomain. This usage has the advantage that a newcomer to category theory will be less likely to think of an arrow as a function.

Axiom 3: Composition

If $f$ and $g$ are arrows in a category for which $\text{cod}(f)=\text{dom}(g)$, as in this diagram:

then there is a unique arrow with domain $A$ and codomain $C$ called the composite of $f$ and $g$.

Notes for Axiom 3

  • The unique arrow required by Axiom 3 may be denoted by “$g\circ f$” or “$gf$”. “$g\circ f$” is more explicit, but “$gf$” is much more commonly used by category theorists.
  • Many constructions in categories may be shown by diagrams, like the one used just above.
  • The diagram

    is said to commute if $h=g\circ f$.

  • It is useful to think of $f$ followed by $g$ as a path in the diagram. Then a metaphor for composition is: Every path of length 2 has exactly one composite.
  • It is customary in some texts in category theory to indicate that a diagram commutes by putting a gyre in the middle:
  • Note that the composite of the path that I described as “$f$ followed by $g$” is written as “$g\circ f$” or “$gf$”, which seems backward. Nevertheless, the most common notation in category theory for the composite of “$f$ followed by $g$” is $gf$. Some authors in computer science write “$f;g$” for “$gf$” to get around this problem.
  • The concept of category is an abstraction of the idea of function, and the composition of arrows is an abstraction of the composition of functions. It uses the same notation, “$g\circ f$”. If $f$ and $g$ are set functions, then for an element $x$ in the domain of $f$, \[(g\circ f)(x)=g(f(x))\]
  • But in arbitrary category, it may make no sense to evaluate an arrow $f$ at some element $x$; indeed, the domain of $f$ may not have elements at all, and then the statement “$(g\circ f)(x)=g(f(x))$” is meaningless.

Axiom 4: Identity arrows

Note: WordpPress does not recognize the html command

    . Axiom 1 should be 4a, Axiom 2 4b Axiom 3 4c and Axiom 4 4d.

  1. For each object $A$ of a category, there is an arrow denoted by $\mathsf{id}_A$.
  2. $\textsf{dom}(\textsf{id}_A)=A$ and $\textsf{cod}(\textsf{id}_A)=A$.
  3. For any object $B$ and any arrow $f:B\to A$, the diagram


  4. For any object $C$ and any arrow $g:A\to C$, the diagram


Notes for Axiom 4

  • The fact stated in Axiom 4(b) could be shown diagrammatically either as

    or as

  • Facts (c) and (d) can be written in algebraic notation: For any arrow $f$ going to $A$,\[\textsf{id}_A\circ f=f\]and for any arrow $g$ coming from $A$,\[g\circ \textsf{id}_A=g\]
  • There may be many arrows with domain and codomain both equal to $A$ (for example in the category $\mathsf{Set}$), but only one of them is $\textsf{id}_A$. It can be proved that $\textsf{id}_A$ is the unique arrow satisfying both (c) and (d) of the axiom.

Axiom 5: Associativity

  1. If $f$, $g$ and $h$ are arrows in a category for which $\text{cod}(f)=\text{dom}(g)$ and $\text{cod}(g)=\text{dom}(h)$, as in this diagram:

    then there is a unique arrow $k$ with domain $A$ and codomain $D$ called the composite of $f$, $g$ and $h$.

  2. In the diagram below, the two triangles containing $k$ must both commute.

Notes for Axiom 5

  • Axiom 5b requires that \[h\circ(g\circ f)=(h\circ g)\circ f\](which both equal $k$), which is the usual algebraic notation for associativity.
  • Note that the top two triangles commute by Axiom 3.
  • The associativity axiom means that we can get rid of parentheses and write \[k=h\circ f\circ g\]just as we do for addition and multiplication of numbers.
  • In my opinion the notation using categorical diagrams communicates information much more clearly than algebraic notation does. In particular, you don’t have to remember the domains and codomains of the functions — they appear in the picture. I admit that diagrams take up much more space, but now that we read math stuff on a computer screen instead of on paper, space is free.

Examples of categories

For these examples, I give a detailed explanation about how they fit the definition of category.

Example 1: MyFin

This first example is a small, finite category which I have named $\mathsf{MyFin}$ (“my finite category”). It is not at all an important category, but it has advantages as a first example.

  • It’s small enough that you can see all the objects and arrows on the screen at once, but big enough not to be trivial.
  • The objects and arrows have no properties other than being objects and arrows. (Some of the other examples involve familiar math objects.)
  • So in order to check that $\mathsf{MyFin}$ really obeys the axioms for a category, you can use only the skeletal information given here. As a result, you must really understand the axioms!

A correct proof will be based on axioms and theorems.
The proof can be suggested by your intuitions,
but intuitions are not enough.
When working with $\mathsf{MyFin}$ you won’t have any intuitions!

A diagram for $\mathsf{MyFin}$

This diagram gives a partial description of $\mathsf{MyFin}$.

Now let’s see how to make the diagram above into a category.

Axiom 1: Data

  • The objects of $\mathsf{MyFin}$ are $A$, $B$, $C$ and $D$.
  • The arrows are $f$, $g$, $h$, $j$, $k$, $r$, $s$, $u$, $v$, $w$ and $x$.
  • You can regard the letters just listed as names of the objects and arrows. The point is that at this stage all you know about the objects and arrows are their names.
  • If you prefer, you can think of the arrows as the actual arrows shown in the $\mathsf{MyFin}$ diagram.
  • Our definition of $\mathsf{MyFin}$ is an abstract definition. You may have seen multiplication tables of groups given in terms of undefined letters. (If you haven’t, don’t worry.) Those are also abstract definitions.
  • Our other definitions of categories involve math objects you actually know something about.

Axiom 2: Domain and Codomain

  • The domains and codomains of the arrows are shown by the diagram above.
  • For example, $\text{dom}(r)=A$ and $\text{cod}(r)=C$, and $\text{dom}(v)=\text{cod}(v)=B$.

Axiom 3: Composition

Showing the $\mathsf{MyFin}$ diagram does not completely define $\mathsf{MyFin}$. We must say what the composites of all the paths of length 2 are.

  • In fact, most of them are forced, but two of them are not.
  • We must have $g\circ f=r$ because $r$ is the only arrow possible for the composite, and Axiom 3 requires that every path of length 2 must have a composite.
  • For the same reason, $h\circ g=s$.
  • All the paths involving $u$, $v$, $w$ and $x$ are forced:

  • (p1) $u\circ u=u$, $v\circ v=v$, $w\circ w=w$ and $x\circ x=x$.
  • (p2) $f\circ u=f$, $r\circ u=r$, $j\circ u=j$ and $k\circ u=k$. You can see that, for example, $f\circ u=f$ by opening up the loop on $f$ like this:

    There is only one arrow going from $A$ to $B$, namely$f$, so $f$ has to be the composite $f\circ u$.

  • (p3) $v\circ f=f$, $g\circ v=g$ and $s\circ v=s$.
  • (p4) $w\circ g=g$, $w\circ r=r$ and $h\circ w=h$.
  • (p5) $x\circ h=h$, $x\circ s=s$, $x\circ j=j$ and $x\circ k=k$.
  • For $s\circ f$ and $h\circ r$, we have to choose between $j$ and $k$ as composites. Since $s\circ f=(h\circ g)\circ f$ and $h\circ r=h\circ (g\circ f)$, Axiom 3 requires that we must chose one of $j$ and $k$ to be both composites.

    Definition: $s\circ f=h\circ r=j$.

    If we had defined $s\circ f=h\circ r=k$ we would have a different category, although one that is “isomorphic” to $\mathsf{MyFin}$ (you have to define “isomorphic” or look it up.)

Axiom 4: Identity arrows

Axiom 5: Associativity

  • Since we have already required both $(h\circ g)\circ f$ and $h\circ(g\circ f)$ to be $k$, composition is associative.

Example 2: IntegerDiv

  • This example uses familiar mathematical objects — positive integers.
  • The arrows are not functions that can be applied to elements, since integers do not have elements.

Axiom 1: Data

  • The objects of IntegerDiv are all the positive integers.
  • Suppose $m$ and $n$ are positive integers:
  • If $m$ divides $n$, there is exactly one arrow from $m$ to $n$. I will call this arrow $\textsf{mdn}$. (This is my notation. There is no standard notation for this category.) For example there is one arrow from $2$ to $6$, denoted by $\textsf{2d6}$.
  • If $m$ does not divide $n$, there is no arrow from $m$ to $n$.

Axiom 2: Domain and codomain

The arrow denoted by $\textsf{mdn}$ has domain $m$ and codomain $n$.




which may also be shown as

Axiom 3: Composition

The composite of

must be $\textsf{rdt}$, since that is the only arrow with domain $r$ and codomain $t$.

This fact can also be written this way: \[\mathsf{sdt}\circ\textsf{rds}=\textsf{rdt}\]

Axiom 4: Identity arrows

The composites


must commute since the arrows shown are the only possible arrows with the domains and codomains shown. In other words, $\textsf{id}_\textsf{r}=\textsf{rdr}$ and $\textsf{id}_\textsf{s}=\textsf{sds}$.

Axiom 5: Associativity

In the diagram below,

there is only one arrow from one integer to another, so $\textsf{k}$ must be both \[\textsf{tdu}\circ(\textsf{sdt}\circ\textsf{rds})\] and \[(\textsf{tdu}\circ\textsf{sdt})\circ\textsf{rds}\] as required.

Example 3: The category of Sets

In this section, I define the category $\mathsf{Set}$ (that is standard terminology in category theory.) This example will be very different from $\mathsf{MyFin}$, because it involves known mathematical objects — sets and functions.

Axiom 1: Data

  • Every set is an object of $\mathsf{Set}$ and nothing else is.
  • Every function between sets is an arrow of $\mathsf{Set}$ and nothing else is an arrow of $\mathsf{Set}$.

Axiom 2: Domain and codomain

For a given function $f$, $\text{dom}(f)$ is the domain of the function $f$ in the usual sense, and $\text{cod}(f)$ is the codomain of $f$ in the usual sense. (See Functions: specification and definition for more about domain and codomain.)


  • Let $f:\mathbb{R}\to\mathbb{R}$ be the function defined by $f(x):=x^2$. Then the arrow $f$ in $\mathsf{Set}$ satisfies $\text{dom}(f)= \mathbb{R}$ and also $\text{cod}(f)=\mathbb{R}$.
  • Let $j:\{1,2,3\}\to\{1,2,3,4\}$ be defined by $j(1):=1$, $j(2):=4$ and $j(3):=3$. Then $\text{dom}(j)=\{1,2,3\}$ and $\text{cod}(j)=\{1,2,3,4\}$.

Axiom 3: Composition

The composite of $f:A\to B$ and $g:B\to C$ is the function $g\circ f:A\to C$ defined by \[\text{(DC)}\,\,\,\,\,\,\,\,\,\,(g\circ f)(a):=g(f(a))\]


Many other categories have a similar definition of composition, including categories whose objects are math structures with underlying sets and whose arrows are structure-preserving functions between the underlying sets. But be warned: There are many useful categories whose arrows do not evaluate at an element of an object because the objects don’t have elements. In that case, (DC) is meaningless. This is true of $\mathsf{MyFin}$ and $\mathsf{IntegerDiv}$.

Axiom 4: Identity arrows

For a set $A$, the identity arrow $\textsf{id}_A:A\to A$ is, as you might expect, the identity function defined by $\textsf{id}_A(a)=a$ for every $a\in A$. We must prove that these diagrams commute:

The calculations below show that they commute. They use the definition of composite given by (DC).

  • For any $b\in B$, \[(\textsf{id}_A\circ f)(b)=\textsf{id}_A(f(b))=f(b)\]
  • For any $a\in A$, \[(g\circ \textsf{id}_A)(a)=g(\textsf{id}_A(a))=g(a)\]

Note: In $\mathsf{Set}$, there are generally many arrows from a particular set $S$ to itself (for example there are $4$ from $\{1,2\}$ to itself), but only one is the identity arrow.

Axiom 5: Associativity

Composition of arrows in $\mathsf{Set}$ is associative because function composition is associative. Suppose we have functions as in this diagram:

We must show that the two triangles containing $k$ in this diagram commute:

In algebraic notation, this requires showing that for every element $a\in A$,\[(h\circ(g\circ f))(a))=((h\circ g)\circ f)(a)\]

The calculation below does that. It makes repeated use of Definition (DC) of composition. For any $a\in A$,\[\begin{equation}
\big(h\circ (g\circ f)\big)(a)
& = h\big((g\circ f)(a)\big) \\
& = h\big(g(f(a))\big) \\
& = (h\circ g)(f(a)) \\
& = \big((h\circ g)\circ f\big)(a)

Example 4: The category of Monoids

  • This definition makes repeated use of the fact that a homomorphism of monoids is a set function. Specifically, if $(S,\Delta)$ and $(T,\nabla)$ are monoids with identities $e_S$ and $e_{T}$, a homomorphism $h:S\to T$ must be a set function that satisfies the following two axioms: \[\text{(ME)}\,\,\,\,\,\,\,\,h(e_S)=e_T\] and for all elements $s, s’$ of $S$, \[\text{(MM)}\,\,\,\,\,\,\,\,h(s\Delta s’)=h(s)\nabla h(s’)\]
  • The category of monoids may be denoted by $\mathsf{Mon}$.

Axiom 1: Data

  • Every monoid is an object of the category of monoids, and nothing else is.
  • If $f$ is a homomorphism of monoids, then $f$ is an arrow of the category of monoids, and nothing else is.

Axiom 2: Domain and codomain

If $(S,\Delta)$ and $(T,\nabla)$ are monoids and $f:(S,\Delta)\to(T,\nabla)$ is a homomorphism of monoids, then the domain of $f$ is $(S,\Delta)$ and the codomain of $f$ is $(T,\nabla)$.


  • Since $f$ takes elements of the set $S$ to elements of the set $T$, it is also an arrow in the category $\mathsf{Set}$. In general, very few functions from $S$ to $T$ will be monoid homomorphisms from $(S,\Delta)$ to $(T,\nabla)$.
  • Many authors do not distinguish between $f$ regarded as an arrow of $\mathsf{Mon}$ and $f$ regarded as an arrow of $\mathsf{Set}$. Others may write $U(f)$ for the arrow in $\mathsf{Set}$. “$U$” stands for “underlying functor“, also called “forgetful functor”.

Axiom 3: Composition

The composite of

is the composite $g\circ f$ as set functions:

It is necessary to check that $g\circ f$ is a monoid homomorphism. The following calculation shows that it preserves the monoid operation; it makes repeated use of equations (DC) and (MM).

The calculation: For elements $r$ and $r’$ of $R$,\[\begin{align*}
(g\circ f)(r\,{\scriptstyle \square}\, r’)
&=g\left(f(r\, {\scriptstyle \square}\, r’)\right)\,\,\,\,\,\text{(DC)}\\ &=g\left(f(r) {\scriptstyle\, \Delta}\, f(r’)\right)\,\,\,\,\,\text{(MM)}\\
&=g(f(r)){\scriptstyle \,\nabla}\, g(f(r’))\,\,\,\,\text{(MM)}\\
&=(g\circ f)(r){\scriptstyle \,\nabla}\,(g\circ f)(r’)\,\,\,\,\,\text{(DC)}

The fact that $g\circ f$ preserves the identity of the monoid is shown in the next section.

Axiom 4: Identity arrows

For a monoid $(S,\Delta)$, the identity function $\text{id}_S:S\to S$ preserves the monoid operation $\Delta$, because $\text{id}_S(s\Delta s’)=s\Delta s’$ by definition of the identity function, and that is $\text{id}_S(s)\Delta \text{id}_S(s’)$ for the same reason.

The required diagrams below must commute because the set functions commute and, by Axiom 3, the set composition of a monoid homomorphism is a monoid homomorphism.

We also need to show that $g\circ f$ as in

preserves identities. This calculation proves that it does; it uses (DC) and (ME)

(g\circ f)(\text{id}_R)

Axiom 5: Associativity

The diagram

in the category $\mathsf{Set}$ commutes, so the diagram

must also commute.


All these references are available on line.

  Creative Commons License        

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

Send to Kindle

Very early difficulties II

Very early difficulties II

This is the second part of a series of posts about certain difficulties math students have in the very early stages of studying abstract math. The first post, Very early difficulties in studying abstract math, gives some background to the subject and discusses one particular difficulty: Some students do not know that it is worthwhile to try starting a proof by rewriting what is to be proved using the definitions of the terms involved.

Math StackExchange

The website Math StackExchange is open to any questions about math, even very easy ones. It is in contrast with Math OverFlow, which is aimed at professional mathematicians asking questions in their own field.

Math SE contains many examples of the early difficulties discussed in this series of posts, and I recommend to math ed people (not just RUME people, since some abstract math occurs in advanced high school courses) that they might consider reading through questions on Math SE for examples of misunderstanding students have.

There are two caveats:

  • Most questions on Math SE are at a high enough level that they don’t really concern these early difficulties.
  • Many of the questions are so confused that it is hard to pinpoint what is causing the difficulty that the questioner has.

Connotations of English words

The terms(s) defined in a definition are often given ordinary English words as names, and the beginner automatically associates the connotations of the meaning of the English word with the objects defined in the definition.

Infinite cardinals

If $A$ if a finite set, the cardinality of $A$ is simply a natural number (including $0$). If $A$ is a proper subset of another set $B$, then the cardinality of $A$ is strictly less than the cardinality of $B$.

In the nineteenth century, mathematicians extended the definition of cardinality for infinite sets, and for the most part cardinality has the same behavior as for finite sets. For example, the cardinal numbers are well-ordered. However, for infinite sets it is possible for a set and a proper subset of the set to have the same cardinality. For example, the cardinality of the set of natural numbers is the same as the cardinality of the set of rational numbers. This phenomenon causes major cognitive dissonance.

Question 1331680 on Math Stack Exchange shows an example of this confusion. I have also discussed the problem with cardinality in the abstractmath.org section Cardinality.

Morphism in category theory

The concept of category is defined by saying there is a bunch of objects called objects (sorry bout that) and a bunch of objects called morphisms, subject to certain axioms. One requirement is that there are functions from morphisms to objects choosing a “domain” and a “codomain” of each morphism. This is spelled out in Category Theory in Wikibooks, and in any other book on category theory.

The concepts of morphism, domain and codomain in a category are therefore defined by abstract definitions, which means that any property of morphisms and their domains and codomains that is true in every category must follow from the axioms. However, the word “morphism” and the talk about domains and codomains naturally suggests to many students that a morphism must be a function, so they immediately and incorrectly expect to evaluate it at an element of its domain, or to treat it as a function in other ways.


If $\mathcal{C}$ is a category, its opposite category $\mathcal{C}^{op}$ is defined this way:

  • The objects of $\mathcal{C}^{op}$ are the objects of $\mathcal{C}$.
  • A morphism $f:X\to Y$ of $\mathcal{C}^{op}$ is a morphism from $Y$ to $X$ of $\mathcal{C}$ (swap the domain and codomain).

In Question 980933 on Math SE, the questioner is saying (among other things) that in $\text{Set}^{op}$, this would imply that there has to be a morphism from a nonempty set to the empty set. This of course is true, but the questioner is worried that you can’t have a function from a nonempty set to the empty set. That is also true, but what it implies is that in $\text{Set}^{op}$, the morphism from $\{1,2,3\}$ to the empty set is not a function from $\{1,2,3\}$ to the empty set. The morphism exists, but it is not a function. This does not any any sense make the definition of $\text{Set}^{op}$ incorrect.

Student confusion like this tends to make the teacher want to have a one foot by six foot billboard in his classroom saying


However, even that statement causes confusion. The questioner who asked Question 1594658 essentially responded to the statement in purple prose above by assuming a morphism that is “not a function” must have two distinct values at some input!

That questioner is still allowing the connotations of the word “morphism” to lead them to assume something that the definition of category does not give: that the morphism can evaluate elements of the domain to give elements of the codomain.

So we need a more elaborate poster in the classroom:

The definition of “category” makes no requirement
that an object has elements
or that morphisms evaluate elements.

As was remarked long long ago, category theory is pointless.

English words implementing logic

There are lots of questions about logic that show that students really do not think that the definition of some particular logical construction can possibly be correct. That is why in the abstractmath.org chapter on definitions I inserted this purple prose:

A definition is a totalitarian dictator.

It is often the case that you can explain why the definition is worded the way it is, and of course when you can you should. But it is also true that the student has to grovel and obey the definition no matter how weird they think it is.

Formula and term

In logic you learn that a formula is a statement with variables in it, for example “$\exists x((x+5)^3\gt2)$”. The expression “$(x+5)^3$” is not a formula because it is not a statement; it is a “term”. But in English, $H_2O$ is a formula, the formula for water. As a result, some students have a remarkably difficult time understanding the difference between “term” and “formula”. I think that is because those students don’t really believe that the definition must be taken seriously.

Exclusive or

Question 804250 in MathSE says:

“Consider $P$ and $Q$. Let $P+Q$ denote exclusive or. Then if $P$ and $Q$ are both true or are both false then $P+Q$ is false. If one of them is true and one of them is false then $P+Q$ is true. By exclusive or I mean $P$ or $Q$ but not both. I have been trying to figure out why the truth table is the way it is. For example if $P$ is true and $Q$ is true then no matter what would it be true?”

I believe that the questioner is really confused by the plus sign: $P+Q$ ought to be true if $P$ and $Q$ are both true because that’s what the plus sign ought to mean.

Yes, I know this is about a symbol instead of an English word, but I think the difficulty has the same dynamics as the English-word examples I have given.

If I have understood this difficulty correctly, it is similar to the students who want to know why $1$ is not a prime number. In that case, there is a good explanation.

Only if

The phrase “only if” simply does not mean the same thing in math as it does in English. In Question 17562 in MathSE, a reader asks the question, why does “$P$ only if $Q$” mean the same as “if $P$ then $Q$” instead of “if $Q$ then $P$”?

Many answerers wasted a lot of time trying to convince us that “$P$ only if $Q$” mean the same as “if $P$ then $Q$” in ordinary English, when in fact it does not. That’s because in English, clauses involving “if” usually connote causation, which does not happen in math English.

Consider these two pairs of examples.

  1. “I take my umbrella only if it is raining.”
  2. “If I take my umbrella, then it is raining.”
  3. “I flip that switch only if a light comes on.”
  4. “If I flip that switch, a light comes on.”

The average non-mathematical English speaker will easily believe that (1) and (4) are true, but will balk and (2) and (3). To me, (3) means that the light coming on makes me flip the switch. (2) is more problematical, but it does (to me) have a feeling of causation going the wrong way. It is this difference that causes students to balk at the equivalence in math of “$P$ only if $Q$” and “If $P$, then $Q$”. In math, there is no such thing as causation, and the truth tables for implication force us to live with the fact that these two sentences mean the same thing.

Henning Makholm’ answer to Question 17562 begins this way: “I don’t think there’s really anything to understand here. One simply has to learn as a fact that in mathematics jargon the words ‘only if’ invariably encode that particular meaning. It is not really forced by the everyday meanings of ‘only’ and’ if’ in isolation; it’s just how it is.” That is the best way to answer the question. (Other answerers besides Makholm said something similar.)

I have also discussed this difficulty (and other difficulties with logic) in the abmath section on “only if“.


Creative Commons License

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

Send to Kindle

My early life as a mathematician

My early life as a mathematician.

Revised 22 January 2016.

In 1965, I received my Ph.D. at Duke University based on a dissertation about polynomials over finite fields. My advisor was Leonard Carlitz.

In Carlitz’s algebra course, the textbook was Van der Waerden’s Algebra. It is way too old-fashioned to be used nowadays, but it did indeed present post-Noether type abstract algebra. Carlitz also had me read large chunks of Martin Weber’s Lehrbuch der Algebra, written in German in 1895 (so totally not post-Noether) and published using Fraktur. A few years ago one of my sons asked me to retype the words to some of the songs written in Fraktur in a German-American shape note book in Roman type (but still in German), which I did. This was for German teachers in the Concordia Language Villages to use with their students. I sometimes wonder if I am the last person on earth able to read Fraktur fluently.

I learned mathematical logic from Joe Shoenfield from his dittoed notes that later became an excellent textbook. I rediscovered Craig’s Trick while working on problem he gave. That considerably strengthened my sense of self-worth.

I accepted a job at Western Reserve University, now Case Western Reserve University, where I stayed until I retired in 1999. In the few years after 1965, I wrote several papers about finite fields. They are all summarized in the book Finite Fields, by Rudolf Lidl and Harald Niederreiter.

I was almost immediately attracted to category theory and to computing science, both of which Carlitz hated. I did not let that stop me. (Now is the time to say, Follow The Beat of your Own Drum or some such cliché.)

Early on, Paul Dedecker was at CWRU briefly, and from him I learned about sheaves, cribles and the like. This inspired me to take part in an algebraic geometry summer school at Bowdoin College, where I learned from lectures by David Mumford and by reading his Red Book when it was still red.

Because one of the papers in finite fields showed that certain types of permutation polynomials formed wreath products of groups, I also pursued group theory, in particular by taking part in the finite group theory summer school at Bowdoin in 1970.

During that time I pored over Beck’s thesis on cohomology, which with the group theory I had learned resulted in my paper Automorphisms of group extensions. That paper has the most citations of all my research papers.

In the early days, I had several graduate students. All of them worked in group theory. One of them, Shair Ahmad, went on to produce several Ph.D. students, all in differential equations and dynamical systems.

One thing I can brag about is that I never ever told him I hated differential equations or dynamical systems. In fact, I didn’t hate either one. There were people in the department in both fields and they made me jealous the way they could model real life phenomena with those tools. One relevant point about that is that I was a liberal arts math major from Oberlin before going to Duke and had had very few courses in any kind of science. This made me very different from most people in the department, who has B.S. undergrad degrees.

In those days, John Isbell and Peter Hilton were in the math department at CWRU for awhile, which boosted my knowledge and interest in category theory. Hilton arranged for me to spend a year at the E.T.H. in Zürich, where I met Michael Barr. I eventually wrote two books on category theory with him. But that is getting away from Early Days, so I will stop here.

Creative Commons License

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

Send to Kindle