# A slow introduction to category theory

Category theory turns math inside-out. Definitions depend on nothing inside, but on everything outside. — John Cook

This is a draft of the first part of an article on category theory that will be posted on abstractmath.org. It replaces an earlier version that was posted in June, 2016.

During the last year or so, I have been monitoring the category theory questions on Math Stack Exchange. Some of the queries are clearly from people who do not have enough of a mathematical background to understand basic abstract reasoning, for example the importance of definitions and the difficulties described in the abmath artice on Dysfunctional attitudes and behaviors. Category theory has become important in several fields outside mathematics, for example computer science and database theory.

This article is intended to get people started in category theory by giving a very detailed definition of “category” and some examples described in detail with an emphasis on how the example fits the definition of category. That’s all the present version does, but I intend to add some examples of constructions and properties such as the dual category, product, and other concepts that some of the inquirers on Math Stack Exchange had great difficulty with.

There is no way in which this article is a proper introduction to category theory. It is intended only to give beginners some help over the initial steps of understanding the subject, particularly the aspects of understanding that cause many hopeful math majors to fall off the Abstraction Cliff.

To be written.

# Definition of category

A category is a type of Mathematical structure consisting of two types of data, whose relationships are entirely determined by some axioms. After the definition is complete, I introduce several example categories with a detailed discussion of each one, explaining how they fit the definition of category.

## Axiom 1: Data

A category consists of two types of data: objects and arrows.

### Notes for Axiom 1

• You will see in the section on Examples of categories that every definition of a category $\mathsf{C}$ starts by specifying what the objects of $\mathsf{C}$ are and what the arrows of $\mathsf{C}$ are. That is what Axiom I requires.
• An object of a category can be any kind of mathematical object. It does not have to be a set and it does not have to have elements.
• Arrows of a category are also called morphisms. You may be familiar with “homomorphisms”, “homeomorphisms” or “isomorphisms”, all of which are functions. This does not mean that a “morphism” in an arbitrary category is a function.

## Axiom 2: Domain and codomain

Each arrow of a category has a domain and a codomain, each of which is an object of the category.

### Notes for Axiom 2

• The domain and the codomain of an arrow may or may not be the same object.
• Each arrow has only one domain and only one codomain.
• If $f$ is an arrow with domain $A$ and codomain $B$, that fact is typically shown either by the notation “$f:A\to B$” or by a diagram like this:
• Warning: The notation “$f:A\to B$” is like that used for functions. This notation may be used in any category, but it does not imply that $f$ is a function or that $A$ and $B$ have elements.
• For an arrow $f:A\to B$, the notation “$\text{dom}(f)$” refers to $A$ and “$\text{cod}(f)$” refers to $B$.
• For a given category $\mathsf{C}$, the collection of all the arrows with domain $A$ and codomain $B$ may be denoted by
• “$\text{Hom}(A,B)$” or
• “$\text{Hom}_\mathsf{C}(A,B)$” or
• “$\mathsf{C}(A,B)$”.

• Some newer books and articles in category theory use the name source for domain and target for codomain. This usage has the advantage that a newcomer to category theory will be less likely to think of an arrow as a function.

## Axiom 3: Composition

If $f$ and $g$ are arrows in a category for which $\text{cod}(f)=\text{dom}(g)$, as in this diagram:

then there is a unique arrow with domain $A$ and codomain $C$ called the composite of $f$ and $g$.

### Notes for Axiom 3

• The unique arrow required by Axiom 3 may be denoted by “$g\circ f$” or “$gf$”. “$g\circ f$” is more explicit, but “$gf$” is much more commonly used by category theorists.
• Many constructions in categories may be shown by diagrams, like the one used just above.
• The diagram

is said to commute if $h=g\circ f$.

• It is useful to think of $f$ followed by $g$ as a path in the diagram. Then a metaphor for composition is: Every path of length 2 has exactly one composite.
• It is customary in some texts in category theory to indicate that a diagram commutes by putting a gyre in the middle:
• Note that the composite of the path that I described as “$f$ followed by $g$” is written as “$g\circ f$” or “$gf$”, which seems backward. Nevertheless, the most common notation in category theory for the composite of “$f$ followed by $g$” is $gf$. Some authors in computer science write “$f;g$” for “$gf$” to get around this problem.
• The concept of category is an abstraction of the idea of function, and the composition of arrows is an abstraction of the composition of functions. It uses the same notation, “$g\circ f$”. If $f$ and $g$ are set functions, then for an element $x$ in the domain of $f$, $(g\circ f)(x)=g(f(x))$
• But in arbitrary category, it may make no sense to evaluate an arrow $f$ at some element $x$; indeed, the domain of $f$ may not have elements at all, and then the statement “$(g\circ f)(x)=g(f(x))$” is meaningless.

## Axiom 4: Identity arrows

Note: WordpPress does not recognize the html command

. Axiom 1 should be 4a, Axiom 2 4b Axiom 3 4c and Axiom 4 4d.

1. For each object $A$ of a category, there is an arrow denoted by $\mathsf{id}_A$.
2. $\textsf{dom}(\textsf{id}_A)=A$ and $\textsf{cod}(\textsf{id}_A)=A$.
3. For any object $B$ and any arrow $f:B\to A$, the diagram

commutes.

4. For any object $C$ and any arrow $g:A\to C$, the diagram

commutes.

### Notes for Axiom 4

• The fact stated in Axiom 4(b) could be shown diagrammatically either as

or as

• Facts (c) and (d) can be written in algebraic notation: For any arrow $f$ going to $A$,$\textsf{id}_A\circ f=f$and for any arrow $g$ coming from $A$,$g\circ \textsf{id}_A=g$
• There may be many arrows with domain and codomain both equal to $A$ (for example in the category $\mathsf{Set}$), but only one of them is $\textsf{id}_A$. It can be proved that $\textsf{id}_A$ is the unique arrow satisfying both (c) and (d) of the axiom.

## Axiom 5: Associativity

1. If $f$, $g$ and $h$ are arrows in a category for which $\text{cod}(f)=\text{dom}(g)$ and $\text{cod}(g)=\text{dom}(h)$, as in this diagram:

then there is a unique arrow $k$ with domain $A$ and codomain $D$ called the composite of $f$, $g$ and $h$.

2. In the diagram below, the two triangles containing $k$ must both commute.

### Notes for Axiom 5

• Axiom 5b requires that $h\circ(g\circ f)=(h\circ g)\circ f$(which both equal $k$), which is the usual algebraic notation for associativity.
• Note that the top two triangles commute by Axiom 3.
• The associativity axiom means that we can get rid of parentheses and write $k=h\circ f\circ g$just as we do for addition and multiplication of numbers.
• In my opinion the notation using categorical diagrams communicates information much more clearly than algebraic notation does. In particular, you don’t have to remember the domains and codomains of the functions — they appear in the picture. I admit that diagrams take up much more space, but now that we read math stuff on a computer screen instead of on paper, space is free.

# Examples of categories

For these examples, I give a detailed explanation about how they fit the definition of category.

## Example 1: MyFin

This first example is a small, finite category which I have named $\mathsf{MyFin}$ (“my finite category”). It is not at all an important category, but it has advantages as a first example.

• It’s small enough that you can see all the objects and arrows on the screen at once, but big enough not to be trivial.
• The objects and arrows have no properties other than being objects and arrows. (Some of the other examples involve familiar math objects.)
• So in order to check that $\mathsf{MyFin}$ really obeys the axioms for a category, you can use only the skeletal information given here. As a result, you must really understand the axioms!

A correct proof will be based on axioms and theorems.
The proof can be suggested by your intuitions,
but intuitions are not enough.
When working with $\mathsf{MyFin}$ you won’t have any intuitions!

### A diagram for $\mathsf{MyFin}$

This diagram gives a partial description of $\mathsf{MyFin}$.

Now let’s see how to make the diagram above into a category.

### Axiom 1: Data

• The objects of $\mathsf{MyFin}$ are $A$, $B$, $C$ and $D$.
• The arrows are $f$, $g$, $h$, $j$, $k$, $r$, $s$, $u$, $v$, $w$ and $x$.
• You can regard the letters just listed as names of the objects and arrows. The point is that at this stage all you know about the objects and arrows are their names.
• If you prefer, you can think of the arrows as the actual arrows shown in the $\mathsf{MyFin}$ diagram.
• Our definition of $\mathsf{MyFin}$ is an abstract definition. You may have seen multiplication tables of groups given in terms of undefined letters. (If you haven’t, don’t worry.) Those are also abstract definitions.
• Our other definitions of categories involve math objects you actually know something about.

### Axiom 2: Domain and Codomain

• The domains and codomains of the arrows are shown by the diagram above.
• For example, $\text{dom}(r)=A$ and $\text{cod}(r)=C$, and $\text{dom}(v)=\text{cod}(v)=B$.

### Axiom 3: Composition

Showing the $\mathsf{MyFin}$ diagram does not completely define $\mathsf{MyFin}$. We must say what the composites of all the paths of length 2 are.

• In fact, most of them are forced, but two of them are not.
• We must have $g\circ f=r$ because $r$ is the only arrow possible for the composite, and Axiom 3 requires that every path of length 2 must have a composite.
• For the same reason, $h\circ g=s$.
• All the paths involving $u$, $v$, $w$ and $x$ are forced:

• (p1) $u\circ u=u$, $v\circ v=v$, $w\circ w=w$ and $x\circ x=x$.
• (p2) $f\circ u=f$, $r\circ u=r$, $j\circ u=j$ and $k\circ u=k$. You can see that, for example, $f\circ u=f$ by opening up the loop on $f$ like this:

There is only one arrow going from $A$ to $B$, namely$f$, so $f$ has to be the composite $f\circ u$.

• (p3) $v\circ f=f$, $g\circ v=g$ and $s\circ v=s$.
• (p4) $w\circ g=g$, $w\circ r=r$ and $h\circ w=h$.
• (p5) $x\circ h=h$, $x\circ s=s$, $x\circ j=j$ and $x\circ k=k$.
• For $s\circ f$ and $h\circ r$, we have to choose between $j$ and $k$ as composites. Since $s\circ f=(h\circ g)\circ f$ and $h\circ r=h\circ (g\circ f)$, Axiom 3 requires that we must chose one of $j$ and $k$ to be both composites.

Definition: $s\circ f=h\circ r=j$.

If we had defined $s\circ f=h\circ r=k$ we would have a different category, although one that is “isomorphic” to $\mathsf{MyFin}$ (you have to define “isomorphic” or look it up.)

### Axiom 5: Associativity

• Since we have already required both $(h\circ g)\circ f$ and $h\circ(g\circ f)$ to be $k$, composition is associative.

## Example 2: IntegerDiv

• This example uses familiar mathematical objects — positive integers.
• The arrows are not functions that can be applied to elements, since integers do not have elements.

### Axiom 1: Data

• The objects of IntegerDiv are all the positive integers.
• Suppose $m$ and $n$ are positive integers:
• If $m$ divides $n$, there is exactly one arrow from $m$ to $n$. I will call this arrow $\textsf{mdn}$. (This is my notation. There is no standard notation for this category.) For example there is one arrow from $2$ to $6$, denoted by $\textsf{2d6}$.
• If $m$ does not divide $n$, there is no arrow from $m$ to $n$.

### Axiom 2: Domain and codomain

The arrow denoted by $\textsf{mdn}$ has domain $m$ and codomain $n$.

##### Example

which may also be shown as

### Axiom 3: Composition

The composite of

must be $\textsf{rdt}$, since that is the only arrow with domain $r$ and codomain $t$.

This fact can also be written this way: $\mathsf{sdt}\circ\textsf{rds}=\textsf{rdt}$

### Axiom 4: Identity arrows

The composites

and

must commute since the arrows shown are the only possible arrows with the domains and codomains shown. In other words, $\textsf{id}_\textsf{r}=\textsf{rdr}$ and $\textsf{id}_\textsf{s}=\textsf{sds}$.

### Axiom 5: Associativity

In the diagram below,

there is only one arrow from one integer to another, so $\textsf{k}$ must be both $\textsf{tdu}\circ(\textsf{sdt}\circ\textsf{rds})$ and $(\textsf{tdu}\circ\textsf{sdt})\circ\textsf{rds}$ as required.

## Example 3: The category of Sets

In this section, I define the category $\mathsf{Set}$ (that is standard terminology in category theory.) This example will be very different from $\mathsf{MyFin}$, because it involves known mathematical objects — sets and functions.

### Axiom 1: Data

• Every set is an object of $\mathsf{Set}$ and nothing else is.
• Every function between sets is an arrow of $\mathsf{Set}$ and nothing else is an arrow of $\mathsf{Set}$.

### Axiom 2: Domain and codomain

For a given function $f$, $\text{dom}(f)$ is the domain of the function $f$ in the usual sense, and $\text{cod}(f)$ is the codomain of $f$ in the usual sense. (See Functions: specification and definition for more about domain and codomain.)

#### Examples

• Let $f:\mathbb{R}\to\mathbb{R}$ be the function defined by $f(x):=x^2$. Then the arrow $f$ in $\mathsf{Set}$ satisfies $\text{dom}(f)= \mathbb{R}$ and also $\text{cod}(f)=\mathbb{R}$.
• Let $j:\{1,2,3\}\to\{1,2,3,4\}$ be defined by $j(1):=1$, $j(2):=4$ and $j(3):=3$. Then $\text{dom}(j)=\{1,2,3\}$ and $\text{cod}(j)=\{1,2,3,4\}$.

### Axiom 3: Composition

The composite of $f:A\to B$ and $g:B\to C$ is the function $g\circ f:A\to C$ defined by $\text{(DC)}\,\,\,\,\,\,\,\,\,\,(g\circ f)(a):=g(f(a))$

#### Note

Many other categories have a similar definition of composition, including categories whose objects are math structures with underlying sets and whose arrows are structure-preserving functions between the underlying sets. But be warned: There are many useful categories whose arrows do not evaluate at an element of an object because the objects don’t have elements. In that case, (DC) is meaningless. This is true of $\mathsf{MyFin}$ and $\mathsf{IntegerDiv}$.

### Axiom 4: Identity arrows

For a set $A$, the identity arrow $\textsf{id}_A:A\to A$ is, as you might expect, the identity function defined by $\textsf{id}_A(a)=a$ for every $a\in A$. We must prove that these diagrams commute:

The calculations below show that they commute. They use the definition of composite given by (DC).

• For any $b\in B$, $(\textsf{id}_A\circ f)(b)=\textsf{id}_A(f(b))=f(b)$
• For any $a\in A$, $(g\circ \textsf{id}_A)(a)=g(\textsf{id}_A(a))=g(a)$

Note: In $\mathsf{Set}$, there are generally many arrows from a particular set $S$ to itself (for example there are $4$ from $\{1,2\}$ to itself), but only one is the identity arrow.

### Axiom 5: Associativity

Composition of arrows in $\mathsf{Set}$ is associative because function composition is associative. Suppose we have functions as in this diagram:

We must show that the two triangles containing $k$ in this diagram commute:

In algebraic notation, this requires showing that for every element $a\in A$,$(h\circ(g\circ f))(a))=((h\circ g)\circ f)(a)$

The calculation below does that. It makes repeated use of Definition (DC) of composition. For any $a\in A$,$\begin{split} \big(h\circ (g\circ f)\big)(a) & = h\big((g\circ f)(a)\big) \\ & = h\big(g(f(a))\big) \\ & = (h\circ g)(f(a)) \\ & = \big((h\circ g)\circ f\big)(a) \end{split}$

## Example 4: The category of Monoids

• This definition makes repeated use of the fact that a homomorphism of monoids is a set function. Specifically, if $(S,\Delta)$ and $(T,\nabla)$ are monoids with identities $e_S$ and $e_{T}$, a homomorphism $h:S\to T$ must be a set function that satisfies the following two axioms: $\text{(ME)}\,\,\,\,\,\,\,\,h(e_S)=e_T$ and for all elements $s, s’$ of $S$, $\text{(MM)}\,\,\,\,\,\,\,\,h(s\Delta s’)=h(s)\nabla h(s’)$
• The category of monoids may be denoted by $\mathsf{Mon}$.

### Axiom 1: Data

• Every monoid is an object of the category of monoids, and nothing else is.
• If $f$ is a homomorphism of monoids, then $f$ is an arrow of the category of monoids, and nothing else is.

### Axiom 2: Domain and codomain

If $(S,\Delta)$ and $(T,\nabla)$ are monoids and $f:(S,\Delta)\to(T,\nabla)$ is a homomorphism of monoids, then the domain of $f$ is $(S,\Delta)$ and the codomain of $f$ is $(T,\nabla)$.

#### Notes

• Since $f$ takes elements of the set $S$ to elements of the set $T$, it is also an arrow in the category $\mathsf{Set}$. In general, very few functions from $S$ to $T$ will be monoid homomorphisms from $(S,\Delta)$ to $(T,\nabla)$.
• Many authors do not distinguish between $f$ regarded as an arrow of $\mathsf{Mon}$ and $f$ regarded as an arrow of $\mathsf{Set}$. Others may write $U(f)$ for the arrow in $\mathsf{Set}$. “$U$” stands for “underlying functor“, also called “forgetful functor”.

### Axiom 3: Composition

The composite of

is the composite $g\circ f$ as set functions:

It is necessary to check that $g\circ f$ is a monoid homomorphism. The following calculation shows that it preserves the monoid operation; it makes repeated use of equations (DC) and (MM).

The calculation: For elements $r$ and $r’$ of $R$,\begin{align*} (g\circ f)(r\,{\scriptstyle \square}\, r’) &=g\left(f(r\, {\scriptstyle \square}\, r’)\right)\,\,\,\,\,\text{(DC)}\\ &=g\left(f(r) {\scriptstyle\, \Delta}\, f(r’)\right)\,\,\,\,\,\text{(MM)}\\ &=g(f(r)){\scriptstyle \,\nabla}\, g(f(r’))\,\,\,\,\text{(MM)}\\ &=(g\circ f)(r){\scriptstyle \,\nabla}\,(g\circ f)(r’)\,\,\,\,\,\text{(DC)} \end{align*}

The fact that $g\circ f$ preserves the identity of the monoid is shown in the next section.

### Axiom 4: Identity arrows

For a monoid $(S,\Delta)$, the identity function $\text{id}_S:S\to S$ preserves the monoid operation $\Delta$, because $\text{id}_S(s\Delta s’)=s\Delta s’$ by definition of the identity function, and that is $\text{id}_S(s)\Delta \text{id}_S(s’)$ for the same reason.

The required diagrams below must commute because the set functions commute and, by Axiom 3, the set composition of a monoid homomorphism is a monoid homomorphism.

We also need to show that $g\circ f$ as in

preserves identities. This calculation proves that it does; it uses (DC) and (ME)

\begin{align*} (g\circ f)(\text{id}_R) &=g(f((\text{id}_R))\\ &=g(\text{id}_S)\\ &=\text{id}_T \end{align*}

### Axiom 5: Associativity

The diagram

in the category $\mathsf{Set}$ commutes, so the diagram

must also commute.

# References

All these references are available on line.



Send to Kindle

# Monads for High School III: Algebras

The interactive examples in this post require installing Wolfram CDF player, which is free and works on most desktop computers using Firefox, Safari and Internet Explorer, but not Chrome. The source code is the Mathematica Notebook MonadAlg.nb, which is available for free use under a Creative Commons Attribution-ShareAlike 2.5 License. The notebook can be read by CDF Player if you cannot make the embedded versions in this post work.

This is a continuation of Monads for high school I and Monads for High School II: Lists. This post covers the concept of algebras for the monad for lists.

# Lists

$\textrm{Lists}(S)$ is the set of all lists of finite length whose entries are elements of $S$.

• $\boxed{2\; 2\; 4}$ is the way I denote the list of length $3$ whose first and second entries are each $2$ and whose third entry is $4$.
• A list with only one entry, such as $\boxed{2}$, is called a singleton list.
• The empty list $\boxed{\phantom{2}}$ has no entries.
• $\textrm{Lists}^*(S)$ is the set of all nonempty lists of finite length whose entries are elements of $S$.
• $\textrm{Lists}(\textrm{Lists}(S))$ is the list whose entries are lists with entries from $S$.
• For example, $\boxed{\boxed{5\; 7}\; \boxed{2\; 12\; 7}}$ and $\boxed{\boxed{5\; 7\; 2\; 12\; 7}}$ are both entries in $\textrm{Lists}^*(\textrm{Lists}^*(\mathbb{Z}))$. The second one is a singleton list!
• $\boxed{\boxed{\phantom{3}}\; \boxed{2}}$ and $\boxed{\boxed{\phantom{3}}}$ are entries in $\textrm{Lists}^*(\textrm{Lists}(\mathbb{Z}))$.
• The empty list $\boxed{\phantom{2}}$ is an entry in $\textrm{Lists}(\mathbb{Z})$, in $\textrm{Lists}(\textrm{Lists}^*(\mathbb{Z}))$ and in $\textrm{Lists}(\textrm{Lists}(\mathbb{Z}))$. If you have stared at this for more than ten minutes, do something else and come back to it later.

The star notation is used widely in math and computing science to imply that you are including everything except some insignificant shrimp of a thing such as the empty list, the empty set, or $0$. For example, $\mathbb{R}^*$ denotes the set of all nonzero real numbers.

More details about lists are in Monads for High School II: Lists.

# Join

The function join (or concatenation) takes two lists and creates a third list. For example, if you join $\boxed{5\; 7}$ to $\boxed{2\; 12\; 7 }$ in that order you get $\boxed{5\; 7\; 2\; 12\; 7}$.

• I will use this notation: join$\boxed{\boxed{5\; 7}\; \boxed{2\; 12\; 7}}=\boxed{5\; 7\; 2\; 12\; 7}$.
• This notation means that I am regarding join as a function that takes a two-element list in $\textrm{Lists}(\textrm{Lists}(S))$ to an element of $\textrm{Lists}(S)$.
• join removes one level of lists
• join is not commutative: join$\boxed{\boxed{2\; 12\; 7}\; \boxed{5\; 7}}=\boxed{2\; 12\; 7\; 5\; 7}$
• Join is associative, and as for any associative binary operation, join is defined on any finite list of lists of elements of $S$. So for example, join$\boxed{\boxed{5\; 7}\; \boxed{2\; 12\; 7}\; \boxed{1}}=\boxed{5\; 7\; 2\; 12\; 7\; 1}$.
• For any single list $\boxed{a\; b\; c}$, join$\boxed{\boxed{a\; b\; c}}=\boxed{a\; b\; c}$. This is required to make the theory work. It is called the oneidentity property.
• If the empty list $\boxed{\phantom{2}}$ occurs in a list of lists, it disappears when join is applied: join $\boxed{\boxed{2\; 3}\; \boxed{\phantom{2}}\; \boxed{4\; 5\; 6}}=\boxed{2\; 3\; 4\; 5\; 6}$.

When you have a list of lists of lists, join can be applied in two different ways, "inside" and "outside" as illustrated in the diagram below. It gives you several different inputs to try out as a way to understand what is happening.

This is the special case of the main diagram for all monads as it applies to the List monad.

As you can see, after doing either of "inside" and "outside", if you then apply join, you get the same list. That list is simply the list of entries in the beginning list (and the two intermediate ones) in the same order, disregarding groupings.

From what I have just written, you must depend on your pattern recognition abilities to learn what inside and outside mean. But both can also be described in words.

• The lists outlined in black are lists of elements of $\mathbb{Z}$. In other words, they are elements of $\textrm{Lists}(\mathbb{Z})$.
• The lists outlined in blue are lists of elements of $\textrm{Lists}(\mathbb{Z})$. In other words, they are list of lists of elements of $\mathbb{Z}$. Those are the kinds of things you can apply join to.
• The leftmost list in the diagram, outlined in green, is a list in $\textrm{Lists}(\textrm{Lists}(\mathbb{Z}))$. This means you can apply join in two different ways:
• Each list boxed in blue is a list of lists of integers (two of the are singletons!) so you can apply join to each of them. This is joining inside first.
• You can apply join directly to the leftmost list, which is a list of lists (of lists, but forget that for the moment), so you can apply join to the blue lists. This is join outside first.

To understand this diagram, staring at the diagram (for most people) uses the visual pattern recognition part of your brain (which uses over a fifth of the energy used by your brain) to understand what inside and outside mean, and then check your understanding by reading the verbal description. Starting by reading the verbal description first does not work as well for most people.

There is a second unitary diagram for all monads:

The two right hand entries are always the same. Again, I am asking you to use your pattern recognition abilities to learn what singleton list and singleton each mean.

The main and unit monad diagrams will be used as axioms to give the general definition of monad. To give those axioms, we also need the concepts of functor and natural transformation, which I will define later after I have finished the monad algebra diagrams for Lists and several other examples.

# Algebras for the List monad

If you have any associative binary operation on a set $S$, its definition can be extended to any nonempty list of elements (see Monads for High School I.)

Plus and Times are like that:

• $(3+2)+4$ and $3+(2+4)$ have the same value $9$, so you can write $3+2+4$ and it means $9$ no matter how you calculate it.
• I will be using the notation Plus$\boxed{3\; 2\; 4}$ instead of $3+2+4$.
• Times is also associative, so for example we can write Times$\boxed{3\; 2\; 4}=24$.
• Like join, we require that these operations satisfy oneidentity, so we know Plus$\boxed{3}=3$ and Times$\boxed{3}=3$.
• When the associative binary operation has an identity element, you can also define its value on the empty list as the identity element: Plus$\boxed{\phantom{3}}=0$ and Times$\boxed{\phantom{3}}=1$. I recommend that you experiment with examples to see why it works.

An algebra for the List monad is a function algop:$\textrm{Lists}(S)\to S$ with certain properties: It must satisfy the Main Monad Algebra Diagram and the Unit Monad Algebra Diagram, discussed below.

## The main monad algebra diagram

### Example using Plus and Times

The following interactive diagram allows you to see what happens with Plus and Times. Afterwards, I will give the general definition.

Plus insides replaces each inside list with the result of applying Plus to it, and the other operation Join is the same operation I have used before.

### Another example

The main monad algebra diagram requires that if you have a list of lists of numbers such as the one below, you can add up each list (Plus insides) and then add up the list of totals (top list in diagram), you must get the same answer that you get when you join all the lists of numbers together into one list (bottom list in the diagram) and then add up that list.

This is illustrated by this special case of the main monad algebra diagram for Plus:

### General statement of the main monad algebra diagram

Suppose we have any function $\blacksquare$ $:\textrm{Lists}(S)\to S$ for any set $S$.
If we want to give the main monad algebra diagram for $\blacksquare$ we have a problem. We know for example that Plus$\boxed{1\; 2}=3$. But for some elements $a$ and $b$ of $S$, we don’t know what $\blacksquare\boxed{a\; b}$ is. One way to write it is simply to write $\blacksquare\boxed{a\; b}$ (the usual way we write a function). Or we could use tree notation and write

.

I will use tree notation mostly, but it is a good exercise to redraw the diagrams with functional notation.

### Main monad diagram in prose

Below is a presentation of the general main monad algebra diagram using (gasp!) English phrases to describe the nodes.

## The unit monad algebra diagram

Suppose $\blacksquare$ is any function from $\textrm{Lists}(S)$ to $S$ for any set $S$. Then the diagram is

This says that if you apply $\blacksquare$ to a singleton you get the unique entry of the singleton. This is not surprising: I defined above what it means when you apply an operation to a singleton just so this would happen!

## A particular example

These are specific examples of the general main monad algebra diagram for an arbitrary operation $\blacksquare$:

These examples show that if $\blacksquare$ is any function from $\textrm{Lists}(S)$ to $S$ for any set $S$, then

equals

and

equals

Well, according to some ancient Greek guy, that means

equals

which says that

is an associative binary operation!

# The mother of all associative operations

We also know that any associative binary $\blacksquare$ on any set $S$ can be extended to a function on all finite nonempty lists of elements of $S$. This is the general associative law and was discussed (without using that name) in Monads fo High School I.

Let’s put what we’ve done together into one statement:

Every associative binary operation $\blacksquare$ on a set $S$ can be extended uniquely to a function $\blacksquare:\textrm{Lists}^*(S)\to S$ that satisfies both the main monad algebra diagram and the unit monad algebra diagram. Furthermore, any function $\blacksquare:\textrm{Lists}^*(S)\to S$ that satisfies both the main monad algebra diagram and the unit monad algebra diagram is an asssociative binary operation when applied to lists of length $2$ of elements of $S$.

That is why I claim that the NonemptyList monad is the mother of all associative binary operations.

I have not proved this, but the work in this and preceding posts provide (I think) a good intuitive understanding of this fundamental relationship between lists and associative binary operations.

# Things to do in upcoming posts

• I have to give a proper definition of monads using the concepts of functor and natural transformation. I expect to do this just for set functors, not mentioning categories.
• Every type of binary operation that is defined by equations corresponds to a monad which is the mother of all binary operations of that type. I will give examples, but not prove the general case.

• Associative binary operations on $S$ with identity element (monoids) corresponds to all lists, including the empty list, with entries from $S$.
• Commutative, associative and idempotent binary operations, like and and or in Boolean algebra, correspond to the set monad: $\text{Sets}(S)$ is the set of all finite and countably infinite sets of elements of $S$. (You can change the cardinality restrictions, but you have to have some cardinality restrictions.) Join is simply union.
• Commutative and associative binary operations corresponds to the multiset monad (with a proper definition of join) and appropriate cardinality restrictions. You have to fuss about identity elements here, too.
• Various kinds of nonassociative operations get much more complicated, involving tree structures with equivalence relations on them. I expect to work out a few of them.
• There are lots of monads in computing science that you never heard of (unless you are a computing scientist). I will mention a few of them.

• Every type of binary operation defined by equations corresponds to a monad. But some of them are unsolvable, meaning you cannot describe the monad precisely.

There will probably be long delay before I get back to this project. There are too many other things I want to do!

Send to Kindle

# “Category Theory for Computing Science” online

The entire contents of the 3rd edition of Category theory for computing science, by Michael Barr and Charles Wellsis now available online at either of these addresses:

http://www.abstractmath.org/CTCS/ctcs.pdf

We have submitted it to be published by Theory and Application of Categories in their reprint series. That is where Triples, Toposes and Theories is published.

There are two distinct versions of parts of CTCS already available on the internet.  Neither of them contain all the material in the book, nor do they contain answers to all the exercises, as the copy available above does.

Send to Kindle

# Metaphors in computing science 2

In Metaphors in Computer Science 1, I discussed some metaphors used when thinking about various aspects of computing.  This is a continuation of that post.

### Metaphor: A program is a list of instructions.

• I discussed this metaphor in detail in the earlier post.
• Note particularly that the instructions can be in a natural or a programming language. (Is that a zeugma?)  Many writers would call instructions in a natural language an algorithm.
• I will continue to use “program” in the broader sense.

### Metaphor: A programming language is a language.

• This metaphor is a specific conceptual blend that associates the strings of symbols that constitute a program in a computer language with text in a natural language.
• The metaphor is based on some similarities between expressions in a programming language and expressions in a natural language.
• In both, the expressions have a meaning.
• Both natural and programming languages have specific rules for constructing well-formed expressions.
• This way of thinking ignores many deep differences between programming languages and natural languages. In particular, they don’t talk about the same things!
• The metaphor has been powerful in suggesting ways of thinking about computer programs, for example semantics (below) and ambiguity.

### Metaphor: A computer program is a list of statements

• A consequence of this metaphor is that a computer program is a list of symbols that can be stored in a computer’s memory.
• This metaphor comes with the assumption that if the program is written in accordance with the language’s rules, a computer can execute the program and perhaps produce an output.
• This is the profound discovery, probably by Alan Turing, that made the computer revolution possible. (You don’t have to have different physical machines to do different things.)
• You may want me to say more in the heading above: “A computer program is a list of statements in a programming language that satisfies the well-formedness requirements of the language.”  But the point of the metaphor is only that a program is a list of statements.  The metaphor is not intended to define the concept of “program”.

### Metaphor: A program in a computer language has meanings.

A program is intended to mean something to a human reader.

• Some languages are designed to be easily read by a human reader: Cobol, Basic, SQL.
• Their instructions look like English.
• The algorithm can nevertheless be difficult to understand.
• Some languages are written in a dense symbolic style.
• In many cases the style is an extension of the style of algebraic formulas: C, Fortran.
• Other languages are written in a notation not based on algebra:  Lisp, APL, Forth.
• The boundary between “easily read” and “dense symbolic” is a matter of opinion!

A program is intended to be executed by a computer.

• The execution always involves translation into intermediate languages.
• Most often the execution requires repeated translation into a succession of intermediate languages.
• Each translation requires the preservation of the intended meaning of the program.
• The preservation of intended meaning is what is usually called the semanticsof a programming language.
• In fact, the meaning of the program to a person could be called semantics, too.
• And the human semantics had better correspond in “meaning” to the machine semantics!
• The actual execution of the program requires successive changes in the state of the computer.
• By “state” I mean a list of the form of the electrical charges of each unit of memory in the computer.
• Or you can restrict it to the relevant units of memory, but spelling that out is horrifying to contemplate.
• The resulting state of the machine after the program is run is required to preserve the intended meaning as well as all the intermediate translations.
• Notice that the actual execution is a series of physical events.  You can describe the execution in English or in some notation, but that notation is not the actual execution.

#### References

Conceptual blend (Wikipedia)

Conceptual metaphors (Wikipedia)

Images and Metaphors (article in abstractmath)

Semantics in computer science (Wikipedia)

Send to Kindle

# A visualization of a computation in tree form

The interactive examples in this post require installing Wolfram CDF player, which is free and works on most desktop computers using Firefox, Safari and Internet Explorer, but not Chrome. The source code is the Mathematica Notebook Live evaluation of expressions in TreeForm 3, which is available for free use under a Creative Commons Attribution-ShareAlike 2.5 License. The code is ad-hoc.  It might be worthwhile for someone to design a package that produces this sort of tree for any expression. The notebook can be read by CDF Player if you cannot make the embedded versions in this post work.

This demonstration shows the step by step computation of the value of the expression $3x^2+2(1+y)$ shown as a tree.  By moving the first slider from right to left, you go through the six steps of the computation. You may select the values of $x$ and $y$ with the second and third sliders.  If you click on the plus sign next to a slide, a menu opens up that allows you to make the slider move automatically, shows the values, and other things.

Note that subtrees on the same level are evaluate left to right.  Parallel processing would save two steps.

A previous post related to this post is Making visible the abstraction in algebraic notation.

Send to Kindle

# Making visible the abstraction in algebraic notation

The interactive examples in this post require installing Wolfram CDF player, which is free and works on most desktop computers using Firefox, Safari and Internet Explorer, but not Chrome. The source code is the Mathematica Notebook Handmade Exp Tree.nb, which is available for free use under a Creative Commons Attribution-ShareAlike 2.5 License. The notebook can be read by CDF Player if you cannot make the embedded versions in this post work.

### Algebraic notation

Algebraic notation contains a hidden abstract structure coded by apparently arbitrary conventions that many college calculus students don't understand completely. This very simple example shows one of the ways in which calc students may be confused:

1. $x+2y$
2. $(x+2)y$

Students often mean to express formula 2 when they write something like $x\!\!+\!\!2\,\,\,\,y$ (with a space).  This is a perfectly natural way to write it. But it is against the rules, I presume because in handwriting it is not clear when you mean a space and when you don't.

Formula 1 can also be written as $x+(2y)$, and if it were usually written that way students (I predict) would be less confused.   Always writing it this way would exacerbate the clutter of parentheses but would allow a simple rule:

Evaluate every expression inside parentheses first, starting with the innermost.

### Using trees for algebra

Writing algebraic expressions as a tree (as in computing science)

• makes it obvious what gets evaluated first
• uses no parentheses at all.

An example of using the tree of an expression to do calculations is available in Expressions.nb (requires Mathematica) and Expressions.cdf (requires CDF player only) on my Mathematica website.  I could imagine using tree expressions instead of standard notation as the normal way of doing things. That would require working on Ipads or some such and would take a big amount of investment in software making it intuitive and easy to use.  No, I am not going to embark on such an adventure, but I think it ought to be attempted.  (Brett Victor has many ideas like this.)

### Transforming algebraic notation into trees

The two manipulable diagrams below show the algebraic notation being transformed into tree form.  I expect that this will make the abstract structure more concrete for many students and I encourage others to show it to their students.  Note that the tree form makes everything explicit.>

After I return from a ten-day trip I will explore the possibility of making the expression-to-tree transformer turn the expression into an evaluable tree as in Expressions.nb and Expressions.cdf.  In the I hope not to distant future students should have access to many transformers that morph expressions from one form into another.  Such transformers are much more politically correct than Optimus Prime.

Offloading chunking and Computable algebraic expressions in tree form are earlier posts related to this post.

Send to Kindle

# Metaphors in computing science I

Michael Barr recently told me of a transcription of a talk by Edsger Dijkstra dissing the use of metaphors in teaching programming and advocating that every program be written together with a proof that it works.  This led me to think about the metaphors used in computing science, and that is what this post is about.  It is not a direct answer to what Dijkstra said.

We understand almost anything by using metaphors.  This is a broader sense of metaphor than that thing in English class where you had to say "my love is a red red rose" instead of "my love is like a red red rose".  Here I am talking about conceptual metaphors (see references at the end of the post).

### Metaphor: A program is a set of instructions

You can think of a program as a list of instructions that you can read and, if it is not very complicated, understand how to carry them out.  This metaphor comes from your experience with directions on how to do something (like directions from Google Maps or for assembling a toy).   In the case of a program, you can visualize doing what the program says to do and coming out with the expected output. This is one of the fundamental metaphors for programs.

Such a program may be informal text or it may be written in a computer language.

#### Example

A description of how to calculate $n!$ in English could be:  "Multiply the integers $1$ through $n$".  In Mathematica, you could define the factorial function this way:

fac[n_] := Apply[Times, Table[i, {i, 1, n}]]

This more or less directly copies the English definition, which could have been reworded as "Apply the Times function to the integers from $1$ to $n$ inclusive."  Mathematica programmers customarily use the abbreviation "@@" for Apply because it is more convenient:

Fac[n_]:=Times @@ Table[i, {i, 1, 6}]

As far as I know, C does not have list operations built in.  This simple program gives you the factorial function evaluated at $n$:

j=1;  for (i=2; i<=n; i++)   j=j*i; return j;

This does the calculation in a different way: it goes through the numbers $1, 2,\ldots,n$ and multiplies the result-so-far by the new number.  If you are old enough to remember Pascal or Basic, you will see that there you could use a DO loop to accomplish the same thing.

#### What this metaphor makes you think of

Every metaphor suggests both correct and incorrect ideas about the concept.

• If you think of a list of instructions, you typically think that you should carry out the instructions in order.  (If they are Ikea instructions, your experience may have taught you that you must carry out the instructions in order.)
• In fact, you don't have to "multiply the numbers from $1$ to $n$" in order at all: You could break the list of numbers into several lists and give each one to a different person to do, and they would give their answers to you and you would multiply them together.
• The instructions for calculating the factorial can be translated directly into Mathematica instructions, which does not specify an order.   When $n$ is large enough, Mathematica would in fact do something like the process of giving it to several different people (well, processors) to speed things up.
• I had hoped that Wolfram alpha would answer "720" if I wrote "multiply the numbers from $1$ to $6$" in its box, but it didn't work.  If it had worked, the instruction in English would not be translated at all. (Note added 7 July 2012:  Wolfram has repaired this.)
• The example program for C that I gave above explicitly multiplies the numbers together in order from little to big.  That is the way it is usually taught in class.  In fact, you could program a package for lists using pointers (a process taught in class!) and then use your package to write a C program that looks like the  "multiply the numbers from $1$ to $n$" approach.  I don't know much about C; a reader could probably tell me other better ways to do it.

So notice what happened:

• You can translate the "multiply the numbers from $1$ to $n$" directly into Mathematica.
•  For C, you have to write a program that implements multiplying the numbers from $1$ to $n$. Implementation in this sense doesn't seem to come up when we think about instruction sets for putting furniture together.  It is sort of like: Build a robot to insert & tighten all the screws.

Thus the concept of program in computing science comes with the idea of translating the program instruction set into another instruction set.

• The translation provided above for Mathematica resembles translating the instruction set into another language.
• The two translations I suggested for C (the program and the definition of a list package to be used in the translation) are not like translating from English to another language.  They involve a conceptual reconstruction of the set of instructions.

Similarly, a compiler translates a program in a computer language into machine code, which involves automated conceptual reconstruction on a vast scale.

#### Other metaphors

• C or Mathematica as like a natural language in some ways
• Compiling (or interpreting) as translation

Computing science has used other VIM's (Very Important Metaphors) that I need to write about later:

• Semantics (metaphor: meaning)
• Program as text — this allows you to treat the program as a mathematical object
• Program as machine, with states and actions like automata and Turing machines.
• Specification of a program.  You can regard  "the product of the numbers from $1$ to $n$" as a specification.  Notice that saying "the product" instead of "multiply" changes the metaphor from "instruction" to "specification".

#### References

Conceptual metaphors (Wikipedia)

Images and Metaphors (article in abstractmath)

Images and Metaphors for Sets (article in abstractmath)

Images and Metaphors for Functions (incomplete article in abstractmath)

Send to Kindle

# Bugs in English and in math

Everyone knows that computer programs have bugs.  In fact, languages have bugs, too, although we don't usually call them that.

## Bugs in English

### Right

Q: "Should I turn left at the next corner?" A: "Right".  Probably most Americans who drive now know this bug.  The answer could mean "yes" or "turn right".  So we have to stop and think how to answer this question.  That makes it a bug.

### Too, two

Comment: " We will take Route 30".  Answer: "We will take Route 30 too".  This bug is probably responsible for the survival of the word "also".

Note that unlike the case of "right", this is a bug only of spoken English.

### Subject and predicate

In Comma rule found dysfunctional, I wrote about the problem that in formal English writing there is no way to indicate where the subject ends and the predicate begins.  This causes a problem reading complicated sentences with many clauses such as academic writing often uses.  Of course, one way around this is to write short, simple sentences!  (That sounds like the subject of a future blog…)

## Bugs in the symbolic language of math

### Fractions

In both Excel and Mathematica, "1/2*3" means 3/2. Now, I would think "1/2a" means "1/(2a)", but younger mathematicians are taught PEMDAS (see Purplemath), which says that division and multiplication have the same precedence and operations are evaluated from left to right.

If in Mathematica you define a function f[a_] := 1/2a, f[3] evaluates to 3/2, so Mathematica (and most other computer languages) agree with PEMDAS. (Note: When you write 1/2a in a Mathematica notebook, it automatically puts a space between the 2 and the a, and space in Mathematica means times, so it does warn you.)

Nevertheless, my ancient education would lead me to write (1/2)a for that meaning.  This means I must learn to write 1/(2a) for the other meaning instead of 1/2a.

Questions:

• Did the language really change or was I always "doing it wrong"?  I would like to hear from other ancient mathematicians.  (But I don't know very many who would read blogs or Purplemath.)
• Should such a phenomenon be called a bug?

### Repeated exponentiation

In Excel, "2^2^3" means $(2^2)^3$, in other words, 64.  In Mathematica, it means $2^{(2^3)}=2^8=256$.  My impression is that most mathematicians expect it to mean $2^{(2^3)}$.

References: This post in Walking Randomly, my post Mathematical UsageWikipedia's article.

### Exponentiation on functions is ambiguous

If $f:\mathbb{R}\to\mathbb{R}$ is a function, $f^2(x)$ can mean either $f(f(x))$ or $f(x)f(x)$, and both usages are common.  You should tell your students about this because no one is ever going to make one of the usages go away.

A far worse catastrophe is the fact that in calculus books, $\sin^2x=(\sin\,x)(\sin\,x)$ but $\sin^{-1}x=\text{arcsin}\,x$.  I betcha (lived in Minnesota four years now) we could succeed with a campaign to convince calc book publishers to always write $(\sin\,x)^2$ and $\arcsin\,x$.

## Bugs in the Mathematical Dialect of English

The mathematical dialect of English is what I call Mathematical English in the abstractmath website.  It is a different language from the symbolic language, which is not a dialect of English.

I have written about the problems with Mathematical English in a ridiculous number of places.  (See references in The Handbook of Mathematical Discourse).  It is normal for a dialect of a language to use words and grammatical structures that in the original language mean different things.  (See Dialects below).

### Words with different meanings

• A set is a group in standard English, but not in math English.
• The number 2+3i is a real number in standard English, but not in math English.
• And so on.

### Use of adjectives and prefixes

• A "noncommutative ring" has commutative addition.
• A "semigroup" has a fully defined binary operation.

### If, then

The bug that grabs math newbies by the throat and won't let go is the meaning of "If P, then Q".

• "If a number is divisible by 4, then it is even" in math dialect means a number not divisible by 4 might be even anyway.
• "If you eat your broccoli you will get your dessert" in standard American Parental English does not mean you might get your dessert if you don't eat your broccoli.

And then there is the phenomenon of Vacuous Implication, which leaves students gasping and writhing.

Most Americans are not familiar with dialects in the sense I am using the word here, since the only really different dialects we have are Gullah and Hawaiian Pidgin, both of which are very hard to understand; although for example Appalachian English and African-American urban vernacular [1] are dialects of a milder sort.  I grew up in Savannah and heard diluted Gullah sometimes on the street (didn't understand much).  I am also rather familiar with Züritüütsch since we lived in Zürich for a year.

What the rest of the world call dialects have many distinctive properties:

• They have nonstandard pronunciation to the point where they are difficult to understand.
• They have differences in grammar.  (Both Gullah and especially Hawaiian Creole have differences in grammar from Standard English.)
• They have differences in vocabulary, enough sometimes to cause misunderstanding.

I grew up speaking an Atlanta dialect, which really did have differences in all those parameters.  But what people today call a Southern accent is really just an accent (minor variations in pronunciation), not a dialect.

Hawaiian Creole, and possibly Gullah, but not the other dialects I mentioned, are singled out by linguists as creoles because they been modified heavy influence from another language.  Züritüütsch is not a creole, but it is quite difficult for native German-speakers to understand.  The Swiss situation particularly emphasizes the distinction between "dialect" and "accent".  The typical native of Zürich speaks Züritüütsch and also speaks standard German with a Swiss accent.

### Reference

[1] What Language Is (And What It Isn't and What It Could Be) by John H. McWhorter. Gotham, 2011.

Send to Kindle

# An Introduction to Forms

In 2009, I wrote a sequence of posts on this blog explaining the concept of form that I introduced in [1].  I have now updated and combined them into an article [2].  The posts no longer exist on the blog. The article contains links to other papers on forms.

[1] A generalization of the concept of sketch, Theoretical Computer Science 70, 1990.

Send to Kindle

# Operation as metaphor in math

Operation: Is it just a name or is there a metaphor behind it?

A function of the form ${f:S\times S\rightarrow S}$ may be called a binary operation on ${S}$. The main point to notice is that it takes pairs of elements of ${S}$ to the same set ${S}$.

A binary operation is a special case of n-ary operation for any natural number ${n}$, which is a function of the form ${f:S^n\rightarrow S}$. A ${1}$-ary (unary) operation on ${S}$ is a function from a set to itself (such as the map that takes an element of a group to its inverse), and a ${0}$-ary (nullary) operation on ${S}$ is a constant.

It is useful at times to consider multisorted algebra, where a binary operation can be a function ${f:S_1\times S_2\rightarrow S_3}$ where the ${S_i}$ are possibly different sets. Then a unary operation is simply a function.

Calling a function a multisorted unary operation suggest a different way of thinking about it, but as far as I can tell the difference is only that the author is thinking of algebraic operations as examples. This does not seem to be a different metaphor the way “function as map” and “function as transformation” are different metaphors. Am I missing something?

In the 1960’s some mathematicians (not algebraists) were taken aback by the idea that addition of real numbers (for example) is a function. I observed this personally. I don’t think any mathematician would react this way today.

Send to Kindle