Definition of function

Note: This is a revision of the article on specification and definition of functions from abstractmath.org. Many of the links in this article take you to other articles in abstractmath.org.

 A function is a mathematical object.

To deal with functions as a math object, you need a precise definition of “function”. That is what this article gives you.

• The article starts by giving a specification of “function”.
• After that, we get into the technicalities of the
definitions of the general concept of function.
• Things get complicated because there are several inequivalent definitions of “function” in common use.

Specification of “function”

A function $f$ is a mathematical object which determines and is completely determined by the following data:

• (DOM) $f$ has a domain, which is a set. The domain may be denoted by $\text{dom} f$.
• (COD) $f$ has a codomain, which is also a set and may be denoted by $\text{cod} f$.
• (VAL) For each element $a$ of the domain of $f$, $f$ has a value at $a$.
• (FP) The value of $f$ at $a$ is
completely determined by $a$ and $f$.
• (VIC) The value of $f$ at $a$ must be an element of the codomain of $f$.

• The value of $f$ at $a$ is most cohttp://www.abstractmath.org/MM/MMonly written $f(a)$, but see Functions: Notation and Terminology.
• To evaluate $f$ at $a$ means to determine $f(a)$. The two examples of functions below show that different functions may have different strategies for evaluating them.
• In the expression “$f(a)$”, $a$ is called the input or (old-fashioned) argument of $f$.
• “FP” means functional property.
• “VIC” means “value in codomain”.

Examples

I give two examples here. The examples of functions chapter contains many other examples.

A finite function

Let $F$ be the function defined on the set $\left\{\text{a},\text{b},\text{c},\text{d}\right\}$ as follows: $F(\text{a})=\text{a},\,\,\,F(\text{b})=\text{c},\,\,\,F(\text{c})=\text{c},\,\,\,F(\text{d})=\text{b}$In this definition, $\text{a},\text{b},\text{c},\text{d}$ are letters of the alphabet, not variables. This is the function called “Finite” in the chapter on examples of functions.

• The definition of $F$ says “$F$ is defined on the set $\left\{\text{a},\,\text{b},\,\text{c},\,\text{d} \right\}$”. The phrase “is defined on”
means that the domain is that set. That is standard terminology.
• The value of $F$ at each element of the domain is given explicitly. The value at
$\text{b}$, for example, is $\text{c}$, because the definition says that $F(\text{b}) = \text{c}$. No other reason needs to be given. Mathematical definitions can be arbitrary.
• The codomain of $F$ is not specified, but must include the set $\{\text{a},\text{b},\text{c}\}$. The codomain of a function is often not specified when it is not important, which is most of the time in freshman calculus (for example).
• The diagram below shows how $F$ obeys the rule that the value of an element $x$ in the domain is completely determined by $x$ and $F$.
• If two arrows had started from the same element of the domain, then $F$ would not be a function. (It would be a multivalued function).
• If there were an element of the domain that no arrow started from, it $F$ would not be a function. (It would be a partial function.)
• In this example, to evaluate $F$ at $b$ (to determine the value of $F$ at $b$) means to look at the definition of $F$, which says among other things that the value is $c$ (or alternatively, look at the diagram above and see what letter the arrow starting at $b$ points to). In this case, “evaluation” does not imply calculating a formula.

A real-valued function

Let $G$ be the real-valued function defined by the formula $G(x)={{x}^{2}}+2x+5$.

• The definition of $G$ gives the value at each element of the domain by a formula. The value at $3$, for example, is obtained by calculating $G(3)=3^2+2\cdot3+5=20$
• The definition of $G$
does not specify the domain. The convention in the case of functions defined on the real numbers by a formula is to take the domain to be all real numbers at which the formula is defined. In this case, that is every real number, so the domain is $\mathbb{R}$.
• The definition of $G$ does not specify the codomain, either. However, the codomain must include all real numbers greater than or equal to $4$. (Why?)
• So if an author wrote, “Let $H(x)=\frac{1}{x}$”, the domain would be the set of all real numbers except $0$. But a careful author would write, “Let $H(x)=\frac{1}{x}$ ($x\neq0$).”

What the specification means

• The specification guarantees that a function satisfies all five of the properties listed.
• The specification does not define a mathematical structure in the way mathematical structures have been defined in the past: In particular, it does not require a function to be one or more sets with structure.
• Even so, it is useful to have the specification, because:

 Many mathematical definitions introduce extraneous technical elements which clutter up your thinking about the object they define.

History

The discussion below is an over­simpli­fication of the history of mathe­matics, which many people have written thick books about. A book relevant to these ideas is Plato’s Ghost, by Jeremy Gray.

Until late in the nineteenth century, functions were usually thought of as defined by formulas (including infinite series). Problems arose in the theory of harmonic analysis which made mathematicians require a more general notion of function. They came up with the concept of function as a set of ordered pairs with the functional property (discussed below), and that understanding revolutionized our understanding of math.

In particular, this definition, along with the use of set theory, enabled abstract math (ahem) to become a cohttp://www.abstractmath.org/MM/MMon tool for understanding math and proving theorems. It is conceivable that some readers may wish it hadn’t. Well, tough.

The modern definition of function given here (which builds on the ordered pairs with functional property definition) came into use beginning in the 1950’s. The modern definition became necessary in algebraic topology and is widely used in many fields today.

The concept of function as a formula never disappeared entirely, but was studied mostly by logicians who generalized it to the study of function-as-algorithm. Of course, the study of algorithms is one of the central topics of modern computing science, so the notion of function-as-formula (updated to function-as-algorithm) has achieved a new importance in recent years.

To state both the definition, we need a preliminary idea.

The functional property

A set $P$ of ordered pairs has the functional property if two pairs in $P$ with the same first coordinate have to have the same second coordinate (which means they are the same pair). In other words, if $(x,a)$ and $(x,b)$ are both in $P$, then $a=b$.

How to think about the functional property

The point of the functional property is that for any pair in the set of ordered pairs, the first coordinate determines what the second one is (which is just what requirement FP says in the specification). That’s why you can write “$G(x)$” for any $x$ in the domain of $G$ and not be ambiguous.

Examples

• The set $\{(1,2), (2,4), (3,2), (5,8)\}$ has the functional property, since no two different pairs have the same first coordinate. Note that there are two different pairs with the same second coordinate. This is irrelevant to the functional property.
• The set $\{(1,2), (2,4), (3,2), (2,8)\}$ does not have the functional property. There are two different pairs with first coordinate 2.
• The empty set $\emptyset$ has the function property vacuously.

Example: graph of a function defined by a formula

In calculus books, a picture like this one (of part of $y=x^2+2x+5$) is called a graph. Here I use the word “graph” to denote the set of ordered pairs
$\left\{ (x,{{x}^{2}}+2x+5)\,\mathsf{|}\,x\in \mathbb{R } \right\}$
which is a mathematical object rather than some ink on a page or pixels on a screen.

The graph of any function studied in beginning calculus has the functional property. For example, the set of ordered pairs above has the functional property because if $x$ is any real number, the formula ${{x}^{2}}+2x+5$ defines a specific real number.

• if $x = 0$, then ${{x}^{2}}+2x+5=5$, so the pair $(0, 5)$ is an element of the graph of $G$. Each time you plug in $0$ in the formula you get 5.
• if $x = 1$, then ${{x}^{2}}+2x+5=8$.
• if $x = -2$, then ${{x}^{2}}+2x+5=5$.

You can measure where the point $\{-2,5\}$ is on the (picture of) the graph and see that it is on the blue curve as it should be. No other pair whose first coordinate is $-2$ is in the graph of $G$, only $(-2, 5)$. That is because when you plug $-2$ into the formula ${{x}^{2}}+2x+5$, you get $5$ and nothing else. Of course, $(0, 5)$ is in the graph, but that does not contradict the functional property. $(0, 5)$ and $(-2, 5)$ have the same second coordinate, but that is OK.

Mathematical definition of function

A function $f$ is a
mathematical structure consisting of the following objects:

• A set called the domain of $f$, denoted by $\text{dom} f$.
• A set called the codomain of $f$, denoted by $\text{cod} f$.
• A set of ordered pairs called the graph of $f$, with the following properties:
• $\text{dom} f$ \text{dom} fis the set of all first coordinates of pairs in the graph of $f$.
• Every second coordinate of a pair in the graph of $f$ is in $\text{cod} f$ (but $\text{cod} f$ may contain other elements).
• The graph of $f$ has the functional property.

Using arrow notation, this implies that $f:\text{dom}f\to\text{cod} f$.

Remark

The main difference between the specification of function given previously and this definition is that the definition replaces the statement “$f$ has a value at $a$” by introducing a set of ordered pairs (the graph) with the functional property.

• This set of ordered pairs is extra structure introduced by the definition mainly in order to make the definition a classical sets-with-structure.
• This makes the graph, which should be a concept derived from the concept of function, appear to be a necessary part of the function.
• That suggests incorrectly that the graph is more of a primary intuition that other intuitions such as function as map, function as transformer, and other points of view discussed in the article Images and meta­phors for functions.
• The concept of graph of a function is indeed an important intuition, and is discussed with examples in the articles Graphs of continuous functions and Graphs of finite functions.
• Nevertheless, the fact that the concept of graph appears in the definition of function does not make it the most important intuition.

Examples

• Let $F$ have graph $\{(1,2), (2,4), (3,2), (5,8)\}$ and define $A = \{1, 2, 3, 5\}$ and $B = \{2, 4, 8\}$. Then $F:A\to B$ is a function. In speaking, we would usually say, “$F$ is a function from $A$ to $B$.”
• Let $G$ have graph $\{(1,2), (2,4), (3,2), (5,8)\}$ (same as above), and define $A = \{1, 2, 3, 5\}$ and $C = \{2, 4, 8, 9, 11, \pi, 3/2\}$. Then $G:A\to C$ is a (admittedly ridiculous) function. Note that all the second coordinates of the graph are in the codomain $C$, along with a bunch of miscellaneous suspicious characters that are not second coordinates of pairs in the graph.
• Let $H$ have graph $\{(1,2), (2,4), (3,2), (5,8)\}$. Then $H:A\to \mathbb{R}$ is a function, since $2$, $4$ and $8$ are all real numbers.
• Let $D = \{1, 2, 5\}$ and $E = \{1, 2, 3, 4, 5\}$. Then there is no function $D\to A$ and no function $E\to A$ with graph $\{(1,2), (2,4), (3,2), (5,8)\}$. Neither $D$ nor $E$ has exactly the same elements as the first coordinates of the graph.

Identity and inclusion

Suppose we have two sets  A and  B with $A\subseteq B$.

• The identity function on A is the function ${{\operatorname{id}}_{A}}:A\to A$ defined by ${{\operatorname{id}}_{A}}(x)=x$ for all $x\in A$. (Many authors call it ${{1}_{A}}$).
• When $A\subseteq B$, the inclusion function from $A$ to $B$ is the function $i:A\to B$ defined by $i(x)=x$ for all $x\in A$. Note that there is a different function for each pair of sets $A$ and $B$ for which $A\subseteq B$. Some authors call it ${{i}_{A,\,B}}$ or $\text{in}{{\text{c}}_{A,\,B}}$.

The identity function and an inclusion function for the same set $A$ have exactly the same graph, namely $\left\{ (a,a)|a\in A \right\}$. More about this below.

Other definitions of function

Original abstract definition of function

Definition

• A function $f$ is a set of ordered pairs with the functional property.
• If $f$ is a function according to this definition, the domain of $f$ is the set of first coordinates of all the pairs in $f$.
• If $x\in \text{dom} f$, then we define the value of $f$ at $x$, denoted by $f(x)$, to be the second coordinate of the only ordered pair in $f$ whose first coordinate is $x$.

Remarks

• This definition is still widely used in mathematical writing.
• Many authors do not tell you which definition they are using.
• For many purposes (including freshman calculus for the most part) it does not matter which definition is used.
• In some branches of math, the modern definition adds great clarity to many complicated situations; using the older definition can even make it difficult to describe some important constructions. There is more about this in New Approaches below.

Possible confusion

Some confusion can result because of the presence of these two different definitions.

• For example, since the identity function ${{\operatorname{id}}_{A}}:A\to A$ and the inclusion function ${{i}_{A,\,B}}:A\to B$ have the same graph, users of the older definition are required in theory to say they are the same function.
• Also it requires you to say that the graph of a function is the same thing as the function.
• In my observation, this does not make a problem in practice, unless there is a very picky person in the room.
• It also appears to me that the modern definition is (quite rightly) winning and the original abstract definition is disappearing.

Multivalued function

The phrase multivalued function refers to an object that is like a function $f:S\to T$ except that for $s\in S$, $f(s)$ may denote more than one value.

Examples

• Multivalued functions arose in considering complex functions. In cohttp://www.abstractmath.org/MM/MMon practice, the symbol $\sqrt{4}$ denoted $2$, although $-2$ is also a square root of $4$. But in complex function theory, the square root function takes on both the values $2$ and $-2$. This is discussed in detail in Wikipedia.
• The antiderivative is an example of a multivalued operator. For any constant $C$, $\frac{x^3}{3}+C$ is an antiderivative of $x^2$, so that $\frac{x^3}{3}$, $\frac{x^3}{3}+42$, $\frac{x^3}{3}-1$ and $\frac{x^3}{3}+2\pi$ are among the infinitely many antiderivatives of $x^2$.

A multivalued function $f:S\to T$ can be modeled as a function with domain $S$ and codomain the set of all subsets of $T$. The two meanings are equivalent in a strong sense (naturally equivalent). Even so, it seems to me that they represent two differ­ent ways of thinking about
multivalued functions. (“The value may be any of these things…” as opposed to “The value is this whole set of things.”)

Some older mathematical papers in com­plex func­tion theory do not tell you that their functions are multi­valued. There was a time when com­plex func­tion theory was such a Big Deal in research mathe­matics that the phrase “func­tion theory” meant complex func­tion theory and every mathe­ma­tician with a Ph. D. knew that complex functions were multi­valued.

Partial function

A partial function $f:S\to T$ is just like a function except that its input may be defined on only a subset of $S$. For example, the function $f(x):=\frac{1}{x}$ is a partial function from the real numbers to the real numbers.

This models the behavior of computer programs (algorithms): if you consider a program with one input and one output as a function, it may not be defined on some inputs because for them it runs forever (or gives an error message).

In some texts in computing science and mathematical logic, a function is by
convention a partial function, and this fact may not be mentioned explicitly, especially in research papers.

The phrases “multivalued function” and “partial function” upset some picky types who say things like, “But a multi­valued func­tion is not a func­tion!”. A hot dog is not a dog, either. I once had a Russian teacher who was Polish and a German teacher who was Hungarian. So what? See the Hand­book (click on

New approaches to functions

All the definitions of function given here produce mathematical structures, using the traditional way to define mathematical objects in terms of sets. Such definitions have disadvantages.

Mathematicians have many ways to think about functions. That a function is a set of ordered pairs with a certain property (functional) and possibly some ancillary ideas (domain, codomain, and others) is not the way we usually think about them$\ldots$Except when we need to reduce the thing we are studying to its absolutely most abstract form to make sure our proofs are correct.
That most abstract form is what I have called the rigorous view or the dry bones and it is when that reasoning is needed that the sets-with-structure approach has succeeded.

Our practice of abstraction has led us to new approaches to talking about functions. The most important one currently is category theory. Roughly, a category is a bunch of objects together with some arrows going between them that can be composed head to tail. Functions between sets are examples of this: the sets are the objects and the functions the arrows. But arrows in a category do not have to be functions; in that way category theory is an abstraction of functions.

This abstracts the idea of function in a way that brings out common ideas in various branches of math. Research papers in many branches of mathematics now routinely use the language of category theory. Categories now appear in some undergraduate math courses, meaning that Someone needs to write a chapter on category theory for abstractmath.org.

Besides category theory, computing scientists have come up with other abstract ways of dealing with functions, for example type theory. It has not come as far along as category theory, but has shown recent signs of major progress.

Both category theory and type theory define math objects in terms of their effect on and relationship with other math objects. This makes it possible to do abstract math entirely without using sets-with-structure as a means of defining concepts.

References

• Functions in Wikipedia. This is an extensive and mostly well-done description of the use of functions in mathematics.

Send to Kindle

Introducing abstract topics

I have been busy for the past several years revising abstractmath.org (abmath). Now I believe, perhaps foolishly, that most of the articles in abmath have reached beta, so now it is time for something new.

For some time I have been considering writing introductions to topics in abstract math, some typically studied by undergraduates and some taken by scientists and engineers. The topics I have in mind to do first include group theory and category theory.

The point of these introductions is to get the student started at the very beginning of the topic, when some students give up in total confusion. They meet and fall off of what I have called the abstraction cliff, which is discussed here and also in my blog posts Very early difficulties and Very early difficulties II.

I may have stolen the phrase “abstraction cliff” from someone else.

Group theory

Group theory sets several traps for beginning students.

Multiplication table

• A student may balk when a small finite group is defined using a set of letters in a multiplication table.
“But you didn’t say what the letters are or what the multiplication is?”
• Such a definition is an abstract definition, in contrast to the definition of “prime”, for example, which is stated in terms of already known entities, namely the integers.
• The multiplication table of a group tells you exactly what the binary operation is and any set with an operation that makes such a table correct is an example of the group being defined.
• A student who has no understanding of abstraction is going to be totally lost in this situation. It is quite possible that the professor has never even mentioned the concept of abstract definition. The professor is probably like most successful mathematicians: when they were students, they understood abstraction without having to have it explained, and possibly without even noticing they did so.

Cosets

• Cosets are a real killer. Some students at this stage are nowhere near thinking of a set as an object or a thing. The concept of applying a binary operation on a pair of sets (or any other mathematical objects with internal structure) is completely foreign to them. Did anyone ever talk to them about mathematical objects?
• The consequence of this early difficulty is that such a student will find it hard to understand what a quotient group is, and that is one of the major concepts you get early in a group theory course.
• The conceptual problems with multiplication of cosets is similar to those with pointwise addition of functions. Given two functions $f,g:\mathbb{R}\to\mathbb{R}$, you define $f+g$ to be the function $(f+g)(x):=f(x)+g(x)$ Along with pointwise multiplication, this makes the space of functions $\mathbb{R}\to\mathbb{R}$ a ring with nice properties.
• But you have to understand that each element of the ring is a function thought of as a single math object. The values of the function are properties of the function, but they are not elements of the ring. (You can include the real numbers in the ring as constant functions, but don’t confuse me with facts.)
• Similarly the elements of the quotient group are math objects called cosets. They are not elements of the original group. (To add to the confusion, they are also blocks of a congruence.)

Isomorphic groups

• Many books, and many professors (including me) regard two isomorphic groups as the same. I remember getting anguished questions: “But the elements of $\mathbb{Z}_2$ are equivalence classes and the elements of the group of permutations of $\{1,2\}$ are functions.”
• I admit that regarding two isomorphic groups as the same needs to be treated carefully when, unlike $\mathbb{Z}_2$, the group has a nontrivial automorphism group. ($\mathbb{Z}_3$ is “the same as itself” in two different ways.) But you don’t have to bring that up the first time you attack that subject, any more than you have to bring up the fact that the category of sets does not have a set of objects on the first day you define categories.

Category theory

Category theory causes similar troubles. Beginning college math majors don’t usually meet it early. But category theory has begun to be used in other fields, so plenty of computer science students, people dealing with databases, and so on are suddenly trying to understand categories and failing to do so at the very start.

The G&G post A new kind of introduction to category theory constitutes an alpha draft of the first part of an article introducing category theory following the ideas of this post.

Objects and arrows are abstract

• Every once in a while someone asks a question on Math StackExchange that shows they have no idea that an object of a category need not have elements and that morphisms need not be functions that take elements to elements.
• One questioner understood that the claim that a morphism need not be a function meant that it might be a multivalued function.

Duality

• That misunderstanding comes up with duality. The definition of dual category requires turning the arrows around. Even if the original morphism takes elements to elements, the opposite morphism does not have to take elements to elements. In the case of the category of sets, an arrow in $\text{Set}^{op}$ cannot take elements to elements — for example, the opposite of the function $\emptyset\to\{1,2\}$.
• The fact that there is a concrete category equivalent to $\text{Set}^{op}$ is a red herring. It involves different sets: the function corresponding to the function just mentioned goes from a four-element set to a singleton. But in the category $\text{Set}^{op}$ as defined it is simply an arrow, not a function.

Not understanding how to use definitions

• Some of the questioners on Math Stack Exchange ask how to prove a statement that is quite simple to prove directly from the definitions of the terms involved, but what they ask and what they are obviously trying to do is to gain an intuition in order to understand why the statement is true. This is backward — the first thing you should do is use the definition (at least in the first few days of a math class — after that you have to use theorems as well!
• I have discussed this in the blog post Insights into mathematical definitions (which gives references to other longer discussions by math ed people). See also the abmath section Rewrite according to the definitions.

How an introduction to a math topic needs to be written

The following list shows some of the tactics I am thinking of using in the math topic introductions. It is quite likely that I will conclude that some tactics won’t work, and I am sure that tactics I haven’t mentioned here will be used.

• The introductions should not go very far into the subject. Instead, they should bring an exhaustive and explicit discussion of how to get into the very earliest part of the topic, perhaps the definition, some examples, and a few simple theorems. I doubt that a group theory student who hasn’t mastered abstraction and what proofs are about will ever be ready to learn the Sylow theorems.
• You can’t do examples and definitions simultaneously, but you can come close by going through an example step by step, checking each part of the definition.
• There is a real split between students who want the definitions first
(most of whom don’t have the abstraction problems I am trying to overcome)
and those who really really think they need examples first (the majority)
because they don’t understand abstraction.

• When you introduce an axiom, give an example of how you would prove that some binary operation satisfies the axiom. For example, if the axiom is that every element of a group must have an inverse, right then and there prove that addition on the integers satisfies the axiom and disprove that multiplication on integers satisies it.
• When the definition uses some undefined math objects, point out immediately with examples that you can’t have any intuition about them except what the axioms give you. (In contrast to definition of division of integers, where you and the student already have intuitions about the objects.)
• Make explicit the possible problems with abstractmath.org and Gyre&Gimble) will indeed find it difficult to become mathematical researchers — but not impossible!
• But that is not the point. All college math professors will get people who will go into theoretical computing science, and therefore need to understand category theory, or into particle physics, and need to understand groups, and so on.
• By being clear at the earliest stages of how mathematicians actually do math, they will produce more people in other fields who actually have some grasp of what is going on with the topics they have studied in math classes, and hopefully will be willing to go back and learn some more math if some type of math rears its head in the theories of their field.
• Besides, why do you want to alienate huge numbers of people from math, as our way of teaching in the past has done?
• “Our” means grammar school teachers, high school teachers and college professors.

Acknowledgment

Thanks to Kevin Clift for corrections.



Send to Kindle

Proofs using diagrams

Introduction

This post gives a proof of an easy theorem in category theory using the graph-based logic approach of Graph based logic and sketches, (GBLS) by Atish Bagchi and me.

Formal logic is typically defined in terms of formulas and terms, defined recursively as strings of characters, together with rules of inference. GBLS proposes a new approach to logic where diagrams are used instead of strings of characters. The exposition here spells out the proof in more detail than GBLS does and uses various experimental ways of drawing diagrams using Mathematica.

To follow this proof, you need to be familiar with basic category theory. Most special definitions that are needed are defined in this post where they are first used. Section 1 of GBLS also gives the definitions you need with more context.

The theorem

The Theorem to be proved (it is Theorem 8.3.1 of GBLS) says that, in any category, if the triangles in the diagram below commute, then the outside square commutes. This is easy using the associative law: If $xf=h$ and $kx=g$, then $kh=k(xf)=(kx)f=gf$.

Subject Diagram

So what?

This theorem is not interesting. The point of this post is to present a new approach to proving such theorems, using diagrams instead of strings. The reason that exhibiting the dig
rammatic proof is interesting is that many different kinds of categories have a FL cattheory, including these:

Essentially algebraic string-based logic is described in detail in Partial Horn logic and cartesian categories, by E. Palmgren and Steven Vickers.

Commercial

My concept of form in A generalization of the concept of sketch generalizes sketches to all the categories that can be defined as models of FL cattheories. So the method of proof using diagrams can be applied to theorems about the objects defined by forms.

Concepts needed for the graph-based proof

To prove the theorem, I will make use of $\mathbf{ThCat}$, the FL cattheory for categories.

• An FL category is a category with all finite limits.
• GLBS uses the word cattheory for what Category theory for computing science and Toposes, triples and theories call the theory of a sketch.
• In many books and articles, and in nLab, a “sketch” is what we call the cattheory (or the theory) of a sketch. For us, the sketch is a generating collection of objects, arrows, diagrams, cones and cocones for the cattheory. The category of models of the sketch and the cattheory are equivalent.
• $\mathbf{ThCat}$ is a category with finite limits freely generated by certain designated objects, arrows, commutative diagrams and limit cones, listed below.
• A model of $\mathbf{ThCat}$ in $\mathbf{Set}$ (the category of sets, whichever one you like) is an FL functor $\mathfrak{C}:\mathbf{ThCat}\to\mathbf{Set}.$
• Such a model $\mathfrak{C}$ is a small category, and every small category is such a model. If this statement worries you, read Section 3.4 of GBLS.
• Natural transformations between models are FL-preserving functors that preserve the structure on the nose.
• The category of models of $\mathbf{ThCat}$ in $\mathbf{Set}$ is equivalent to the category of small categories and morphisms, which, unlike the category of models, includes functors that don’t preserve things on the nose.
• $\mathbf{ThCat}$ is an example of the theory of an FL sketch. Chapter 4 of GBLS describes this idea in detail. The theory has the same models as the sketch.
• The sketch generating $\mathbf{ThCat}$ is defined in detail in section 7.2 of GBLS.

Some objects and arrows of $\mathbf{ThCat}$

I will make use of the following objects and arrows that occur in $\mathbf{ThCat}.$ A formal thing is a construction in $\mathbf{ThCat}$ that becomes an actual thing in a model. So for example a model $\mathfrak{C}$ of $\mathbf{ThCat}$ in $\mathbf{Set}$ is an actual (small) category, and $\mathfrak{C}(\mathsf{ar_2})$ is the set of all composable pairs of arrows in the category $\mathfrak{C}$.

• $\mathsf{ob}$, the formal set of objects.
• $\mathsf{ar}$, the formal set of arrows.
• $\mathsf{ar}_2$, the formal set of composable pairs of arrows.
• $\mathsf{ar}_3$, the formal set of composable triples of arrows.
• $\mathsf{unit} : \mathsf{ob}\to \mathsf{ar}$ that formally picks out the identity arrow of an object.
• $\mathsf{dom},\mathsf{cod} : \mathsf{ar}\to \mathsf{ob}$ that formally pick out the domain and codomain of an arrow.
• $\mathsf{comp} : \mathsf{ar}_2\to \mathsf{ar}$ that picks out the composite of a composable pair.
• $\mathsf{lfac}, \mathsf{rfac} :\mathsf{ar}_2\to \mathsf{ar}$ that pick out the left and right factors in a composable pair.
• $\mathsf{lfac}, \mathsf{mfac},\mathsf{rfac} :\mathsf{ar}_3 \to\mathsf{ar}$ that pick out the left, middle and right factors in a composable triple of arrows.
• $\mathsf{lass}, \mathsf{rass} : \mathsf{ar}_3 \to \mathsf{ar}_2$: $\mathsf{lass}$ formally takes $\langle{h,g,f}\rangle$ to $\langle{hg,f}\rangle$ and $\mathsf{rass}$ takes it to $\langle{h,gf}\rangle$.

$\mathsf{ob}$, $\mathsf{ar}$, $\mathsf{unit}$, $\mathsf{dom}$, $\mathsf{cod}$ and $\mathsf{comp}$ are given primitives and the others are defined as limits of finite diagrams composed of those objects. This is spelled out in Chapter 7.2 of GBLS. The definition of $\mathbf{ThCat}$ also requires certain diagrams to be commutative. They are all provided in GBLS; the one enforcing associativity is shown later in this post.

Color coding

I will use color coding to separate syntax from semantics.

• Syntax consists of constructions in $\mathbf{ThCat}.$ The description will always be a commutative diagram in black, with annotations as explained later.
• The limit of the description will be an object in $\mathbf{ThCat}$ (the form) whose value in a model $\mathfrak{C}$ will be shown in green, because being an element of the value of a model makes it semantics.
• When a limit cone is defined, the projections (which are arrows in $\mathbf{ThCat}$) will be shown in blue.

Descriptions

In graph-based logic, a type of construction that can be made in a category has a description, which (in the case of our Theorem) is a finite diagram in $\mathbf{ThCat}$. The value of the limit of the description in a model $\mathfrak{C}$ is the set of all instances of that type of construction in $\mathfrak{C}$.

The Subject Diagram

• This diagram is the subject matter of the Theorem. It is not assumed to be commutative.
• As in most diagrams in category theory texts, the labels in this diagram are variables, so the diagram is implicitly universally quantified. The Subject Diagram is a generic diagram of its shape.
• “Any diagram of its shape” includes diagrams in which some of the nodes may represent the same object. An extreme example is the graph in which every node is an object $\mathsf{E}$ and every arrow is its identity arrow. The diagram below is nevertheless an example of the Subject Diagram:
• Shapes of diagrams are defined properly in Section 2.3 of
GBLS and in Section 4.1 of Category Theory for Computing Science.

The description of the Subject Diagram

Diagram SDD below shows the Subject Diagram as the limit of its description. The description is the black diagram.

Diagram SDD

Definition of $\mathsf{ar}_2$

The object $\mathsf{ar}_2$ of composable pairs of arrows is defined as a pullback:

In the usual categorical notation this would be shown as

This makes use of the fact that the unnamed blue arrow is induced by the other two projection arrows. In the rest of the post, projection arrows that are induced are normally omitted.

An enrichment of the description

Because $\mathsf{ar}_2$ is defined as a pullback, we can enrich the description of Diagram SDD by adjoining two pullbacks as shown below. This is Diagram 8.10 in GBLS. The enriched diagram has the same limit as the description of Diagram SDD.

Enriched Diagram SDD

Note that the projections from the limit to the two occurrences of $\mathsf{ar}_2$ induce all the other projections. This follows by diagram chasing; remember that the description must be a commutative diagram.

Make the triangles commute

To make the triangles commute, we add two comp arrows to the enriched diagram as shown below. These two arrows are not induced by the description; they are therefore additions to the description — they describe a more restrictive (green) diagram with commutative triangles and so are shown in black.

Diagram TC: The triangles commute

The left comp makes $xf=h$ and the right comp makes $kx=g$.

The outside square commutes

Now we enrich Diagram TC with four objects, <comp,id>, <id,comp> and three comp arrows as shown in bolder black. These objects and arrows already exist in $\mathbf{ThCat}$ and therefore do not change the limit, which must be the same as the limit of Diagram TC.

The outside square commutes

The diagram in bold black is exactly the commutative diagram that requires associativity for these particular objects and arrows, which immediately implies that $gf=kh$, as the Theorem requires.

By the definitions of $\mathsf{ar_2}$ and $\mathsf{ar_3}$, the part of the description in bold black induces the rest of the diagram. Omitting the rest of the diagram would make $\mathsf{ar_2}$ and $\mathsf{ar_3}$ modules in the sense of GBLS, Chapter 7.4. Modules would be vital to deal with proofs more complicated than the one given here.

Send to Kindle

Sets

I have been working my way through abstractmath.org, revising the articles and turning them into pure HTML so they will be easier to update. In some cases I am making substantial revisions. In particular, many of the articles need a more modern point of view.

The math community’s understanding of sets and structures has changed because of category theory and will change
because of homotopy type theory.

This post considers some issues and possibilities concerning the chapter on sets.

The references listed at the end of the article include several about homotopy type theory. They provide different viewpoints and require different levels of sophistication.

A specification of the concept of set

The abmath article Specification of sets specifies what a set is in this way:

A set is a single math object distinct from but completely determined by what its elements are.

I have used this specification for sets since the eighties, first in my Discrete Math lecture notes and then in abstractmath.org. It has proved useful because it is quite simple and the statement implies lots of immediate consequences. Each of the first four consequences in this list below exposes a confusion that some students have.

Consequences of the specification

1. A set is a math object. It has the same status as the number “$143$” and the sine function and the real line: they are all objects of math. A set is not merely a typographically convenient way to define a certain collection of things.
2. A set is a single object. Many beginners seem to have in their head that the set $\{3,4\}$ is two things.
3. A set is distinct from its elements. The set $\{3,4\}$ is not $3$, it is not $4$, it is not a number at all.
4. The spec implies that $\{3,4\}$ is the same set as $\{4,3\}$. Some students think they understand this but some of their mistakes show that they don’t really understand it.
5. On the other hand, $\{3,5\}$ is a different set from $\{3,4\}$. I haven’t noticed this bothering students but it bothers me. See the discussion on ursets below.

Those consequences make the spec a useful teaching tool. But if a beginning abstract math student gets very far in their studies, some complications come up.

Defining “set”

In the late nineteenth century, math people started formally defining particular math structures such as groups and various
kinds of spaces. This was normally done by starting with a set and adding structure.

You may think that “starting with a set and adding structure” brushes a lot of complications under the rug. Well, don’t look under the rug, at least not right now.

The way they thought about sets was a informal version of what is now called naive set theory. In particular, they freely defined particular sets using what is essentially setbuilder notation, producing sets in a way which (I claim) satisfies my specification.

Bertrand Russell wakes everyone up

Then along came Russell’s paradox. In the context of this discussion, the paradox implied that the spec for sets is not a definition.The spec provides a set of necessary conditions for being a set. But it is not sufficient. You can say “Let $S$ be the set of all sets that…[satisfy some condition]” until you are blue in the face, but there are conditions (including the empty condition) that don’t define a set.

The Zermelo-Fraenkel axioms

The Zermelo-Fraenkel axioms were designed to provide a definition that didn’t create contradictions. The axioms accomplish this by creating a sort of hierarchy that requires that each set must be defined in terms of sets defined previously. They provide a good way (but not the only one) of providing a way of legitimizing our use of sets in math.

Observe that the “set of all sets” is certainly not “defined” in terms of previously defined sets!

Sets as a foundation

During those days there was a movement to provide a solid foundation for mathematics. After Zermelo-Fraenkel came along, the progress of thinking seemed to be:

1. Sets are in trouble.
2. Zermelo-Fraenkel solves our set difficulties.
3. So let’s require that every math object be a set.

That list is oversimplified. In particular, the development of predicate logic was essential to this approach, but I can’t write about everything at once.

This leads to monsters such as the notorious definition of ordered pair:

The ordered pair $(a,b)$ is the set $\{a,\{b\}\}$.

This leads to the ludicrous statement that $a$ is an element of $(a,b)$ but that $b$ is not.

By saying every math object may be modeled as a set with structure, ZF set theory becomes a model of all of math. This approach gives a useful proof that all of math is as consistent as ZF set theory is.

But many mathematicians jumped to the conclusion that every math object must be a set with structure. This approach does not match the way mathematicians think about math objects. In particular, it makes computerized proof assistance hard to use because you have to translate your thinking into sets and first order logic.

Sets by category theory

“A mathematical object is determined by the role it plays in a category.” — A. Grothendieck

In category theory, you define math structures in terms of how they relate to other math structures. This shifts the emphasis from

What is it?

to

What are its properties?

For example, an ordered pair is a mathematical object $p$ determined by these properties:

• It determines mathematical objects $p_1$ and $p_2$.
• $p$ is completely determined by what $p_1$ is and what $p_2$ is.
• If $p$ and $q$ are ordered pairs and $p_1=q_1$ and $p_2=q_2$ then $p=q$.

Categorical definition of set

“Categorical” here means “as understood in category theory”. It unfortunately has a very different meaning in model theory (set of axioms with only one model up to isomorphism) and in general usage, as in “My answer is categorically NO” said by someone who is red in the face. The word “categorial” has an entirely different meaning in linguistics. *Sigh*.

William Lawvere has produced an axiomatization of the category of sets.
The most accessible introduction to it that I know of is the article Rethinking set theory, by Tom Leinster. This axiomatization defines sets by their relationship with each other and other math objects in much the same way as the categorical definition of (for example) groups gives a definition of groups that works in any category.

“Set” means two different things

The word set as used informally has two different meanings.

• According to my specification of sets, $\{3,4\}$ is a set and so is $\{3,5\}$.
• $\{3,4\}$ and $\{3,5\}$ are not the same set because they don’t have the same elements.
• But in the category of sets, any two $2$-element sets are isomorphic. (So are any two seven element sets.)
• From a categorical point of view, two isomorphic objects in a category can be be thought of as the same object, with a caveat that you have better make it clear which isomorphism you are thinking of.

One of the great improvements in mathematics that homotopy type theory supplies is a systematic way of keeping track of the isomorphisms, the isomorphisms between the isomorphisms, and so on ad infinitum (literally). But note: I am just beginning to understand htt, so regard this remark as something to be suspicious of.

• But $\{3,4\}$ and $\{3,5\}$ may not be thought of as the same object according to the spec I gave, because they don’t have the same elements.
• This means that the traditional idea of set is not the same as the strict categorical idea of set.

I suggest that we keep the word “set” for the traditional concept and call the strict categorical concept an urset.

A traditional set is a structure on an urset

The traditional set $\{3,5\}$ consists of the unique two-element urset coindexed on the integers.

A (ur)set $S$ coindexed by a math structure $A$ is a monic map from $S$ to the underlying set of $A$. In this example, the map has codomain the integers and takes one element of the two-element urset to $3$ and the other to $5$.

Note added 2014-10-05 in response to Toby Bartels’ comment: I am inclined to use the names “abstract set” for “urset” and “concrete set” for coindexed sets when I revise the articles on sets. But most of the time we can get away with just “set”.

There is clearly no isomorphism of coindexed sets from $\{3,4\}$ to $\{3,5\}$, so those two traditional sets are not equal in the category of coindexed sets.

I made up the phrase “coindexed set” to use in this sense, since it is a kind of opposite of indexed set. If terminology for this already exists, lemme know. Linguists will tell you they use the word “coindexed” in a different sense.

Elements

The concept of “element” in categorical thinking is very different from the traditional idea, where an element of a set can be any mathematical object. In categorical thinking, an element of an object $A$ of a category $\mathbf{C}$ is an arrow $1\to A$ where $1$ is the terminal object. Thus $4$ as an integer is the arrow $1\to \mathbb{Z}$ whose unique value is the number $4$.

An object is an element of only one set

In the usage of category theory, the arrow $1\to\mathbb{R}$ whose value is the real number $4$ is a different math object from the arrow $1\to\mathbb{Z}$ whose value is the integer $4$.

A category theorist will probably agree that we can identify the integer $4$ with the real number $4$ via the well known canonical embedding of the ring of integers into the field of real numbers. But in categorical thinking you have to keep all such embeddings in mind; you don’t say the integer $4$ is the same thing as the real number $4$. (Most computer languages keep them distinct, too.)

This difference is actually not hard to get used to and is in fact an improvement over traditional set theory. When you do category theory you use lots of commutative diagrams. The embeddings show up as monic arrows and are essential in keeping the different objects ($\mathbb{Z}$ and $\mathbb{R}$ in the example) separate.

The paper Relating first-order set theory and elementary toposes, by Awodey, Butz, Simpson and Streicher, introduces a concept of “structural system of inclusions” that appears to me to restore the idea of object being an element of more than one set for many purposes.

Homotopy type theory allows an object to have only one type, with much the same effect as in the categorical approach.

Variable elements

The arrow $1\to \mathbb{Z}$ that picks out the integer $4$ is a constant function. It is useful to think of any arrow $A\to B$ of any category as a variable element (or generalized element) of the object $B$. For example, the function $f:\mathbb{R}\to \mathbb{R}$ defined by $f(x)=x^2$ allows you to
think of $x^2$ as a variable number with real parameter. This is another way of thinking about the “$y$” in the equation $y=x^2$, which is commonly called a dependent variable.

One way to think about $y$ is that some statements about it are true, some are false, and many statements are neither true nor false.

• $y\geq 0$ is true.
• $y\lt0$ is false.
• $y\leq1$ is neither true nor false.

This way of thinking about variable objects clears up a lot of confusion about variables and deserves to be more widely used in teaching.

The book Category theory for computing science provides some examples of the use of variable elements as a way of thinking about categorical ideas.

References

< ![endif]>


.

Send to Kindle

A proof by diagram chasing

In Rigorous proofs, I went through the details of a medium-easy epsilon-delta proof in great detail as a way of showing what is hidden by the wording of the proof. In this post, I will do the same for an easy diagram-chasing proof in category theory. This theorem is stated and proved in Category Theory for Computing Science, page 365, but the proof I give here maximizes the diagram-chasing as a way of illustrating the points I want to make.

Theorem (J. Lambek) Let $F$ be a functor from a category to itself and let $\alpha:Fa\to a$ be an algebra for $F$ which is initial. Then $\alpha$ is an isomorphism.

Proof

1. $F\alpha:FFa\to Fa$ is also an $F$-algebra.
2. Initiality means that there is a unique algebra morphism $\eta:a\to Fa$ from $\alpha:Fa\to a$ to $F\alpha:FFa\to Fa$ for which this diagram commutes:

3. To that diagram we can adjoin another (obviously) commutative square:

4. Then the outside rectangle in the diagram above also commutes.
5. This means that $\alpha\circ\eta:a\to a$ is an $F$-algebra morphism from $\alpha:Fa\to a$ to itself.
6. Another such $F$-algebra morphism is $\text{id}_{A}$.
7. Initiality of $\alpha$ means that the diagram below commutes:

8. Because the upper bow and the left square both commute we are justified in inserting a diagonal arrow as below.

9. Now we can read off the diagram that $F\alpha\circ F(\eta)=\text{id}_{Fa}$ and $\eta\circ\alpha=\text{id}_a$. By definition, then, $\eta$ is a two-sided inverse to $\alpha$, so $\alpha$ is an isomorphism.

Analysis of the proof

This is an analysis of the proof showing what is not mentioned in the proof, similar to the analysis in Rigorous proofs.

• An $F$-algebra is any arrow of the form $\alpha:Fa\to a$. This definition directly verifies statement (1). You do need to know the definition of “functor” and that the notation $Fa$ means $F(a)$ and $FFa$ means $F(F(a))$.
• When I am chasing diagrams, I visualize the commutativity of the diagram in (2) by thinking of the red path and the blue path as having the same composites in this graph:

In other words, $F\alpha\circ F\eta=\eta\circ\alpha$. Notice that the diagram carries all the domain and codomain information for the arrows, whereas the formula “$F\alpha\circ F\eta=\eta\circ\alpha$” requires you to hold the domains and codomains in your head.

• (Definition of morphism of $F$-algebra) The reader needs to know that a morphism of $F$ algebras is any arrow $\delta:c\to d$ for which

commutes.
• (Definition of initial $F$-algebra) $\alpha$ is an initial $F$-algebra means that for any algebra $\beta:Fb\to b$, there is a unique arrow $\delta$ for which the diagram above commutes.
• (2) is justified by the last two definitions.
• Pulling a “rabbit out of a hat” in a proof means introducing something that is obviously correct with no motivation, and then checking that it results in a proof. Step (9) in the proof given in Rigorous proofs has an example of adding zero cleverly. It is completely OK to pull a rabbit out of a hat in a proof, as long as the result is correct, but it makes students furious.
• In statement (3) of the proof we are considering here, the rabbit is the trivially commutative diagram that is adjoined on the right of the diagram from (2).
• Statement (4) uses a fact known to all diagram chasers: Two joined commutative squares make the outside rectangle commute. You can visualize this by seeing that the three red paths shown below all have the same composite. When I am chasing a complicated diagram I trace the various paths with my finger, or in my head.

You could also show it by pointing out that $\alpha\circ F\alpha\circ F\eta=\alpha\circ\eta\circ\alpha$, but to check that I think most of us would go back and look at the diagram in (3) to see why it is true. Why not work directly with the diagram?

• The definition of initiality requires that there be only one $F$-algebra morphism from $\alpha:Fa\to a$ to itself. This means that the upper and lower bows in (7) commute.
• The diagonal identity arrow in (8) is justified by the fact that the upper bow is exactly the same diagram as the upper triangular diagram in (8). It follows that the upper triangle in (8) commutes. I visualize this as moving the bow down and to the left with the upper left node $Fa$ as a hinge, so that the two triangles coincide. (It needs to be flipped, too.) I should make an interactive diagram that shows this.
• The lower triangle in (8) also commutes because the square in (2) is given to be commutative.
• (Definition of isomorphism in a category) An arrow $f:a\to b$ in a category is an isomorphism if there is an arrow $g:b\to a$ for which these diagrams commute:

xx

This justifies statement (9).

Remark: I have been profligate in using as many diagrams as I want because this can be seen on a screen instead of on paper. That and the fact that much more data about domains and codomains are visible because I am using diagrams instead of equations involving composition means that the proof requires the readers to carry much less invisible data in their heads.

Send to Kindle

A mathematical saga

This post outlines some of the intellectual developments in the history of math. I call it a saga because it is like one:

• It is episodic, telling one story after another.
• It does not try to give an accurate history, but its episodes resemble what happened in math in the last 3000 years.
• It tells about only a few of the things that happened.

Early techniques

We represented numbers by symbols.

Thousands of years ago, we figured out how to write down words and phrases in such a way that someone much later could read and understand them.

Naturally, we wanted to keep records of the number of horses the Queen owned, so we came up with various notations for numbers (number representing count). In some societies, these symbols were separate from the symbols used to represent words.

We invented algorithms

We discovered positional notation. We write $213$, which is based on a system: it means $2\times100+1\times10+3\times 1$. This notation encapsulates a particular computation of a number (its base-10 representation). (The expression $190+23$ is another piece of notation that encapsulates a computation that yields $213$.)

Compare that to the Roman notation $CCXIII$, which is an only partly organized jumble.
Try adding $CCXIII+CDXXIX$. (The answer is $DCXLII$.)

Positional notation allowed us to create the straightforward method of addition involving adding single digits and carrying:
$\overset{\hspace{6pt}1\phantom{1}} {\frac {\overset{\displaystyle{213}}{429}}{642} }$
Measuring land requires multiplication, which positional notation also allows us to perform easily.
The invention of such algorithms (methodically manipulating symbols) made it easy to calculate with numbers.

Geometry: Direct construction of mathematical objects

We discovered geometry in ancient times, in laying out plots of land and designing buildings. We had a bunch of names for different shapes and for some of them we knew how to calculate their area, perimeter and other things.

Euclid showed how to construct new geometric figures from given ones using specific methods (ruler and compasses) that preserve some properties.

Example

We can bisect a line segment (black) by drawing two circles (blue) centered at the endpoints with radius the length of the line segment. We then construct a line segment (red) between the points of intersection of the circle that intersects the given line segment at its midpoint. These constructions can be thought of as algorithms creating and acting on geometric figures rather than on symbols.

It is true that diagrams were drawn to represent line segments, triangles and so on.
But the diagrams are visualization helpers. The way we think about the process is that we are operating directly on the geometric objects to create new ones. We are thinking of the objects Platonically, although we don’t have to share Plato’s exact concept of their reality. It is enough to say we are thinking about the objects as if they were real.

Axioms and theorems

Euclid came up with the idea that we should write down axioms that are true of these figures and constructions, so that we can systematically use the constructions
to prove theorems about figures using axioms and previously proved theorems. This provided documented reasoning (in natural language, not in symbols) for building up a collection of true statements about math objects.

Example

After creating some tools for proving triangles are congruent, we can prove the the intersection of red and black lines in the figure really is the midpoint of the black line by constructing the four green line segments below and making appeals to congruences between the triangles that show up:

Note that the green lines have the same length as the black line.

Euclid thought about axioms and theorems as applying to geometry, but he also proved theorems about numbers by representing them as ratios of line segments.

Algebra

People in ancient India and Greece knew how to solve linear and quadratic equations using verbal descriptions of what you should do.
Later, we started using a symbolic language to express numerical problems and symbolic manipulation to solve (some of) them.

Example

The quadratic formula is an encapsulated computation that provides the roots of a quadratic equation. Newton’s method is a procedure for finding a root of an arbitrary polynomial. It is recursive in the loose sense (it does not always give an answer).

The symbolic language is a vast expansion of the symbolic notation for numbers. A major innovations was to introduce variables to represent unknowns and to state equations that are always true.

Logic

Aristotle developed an early form of logic (syllogisms) aimed at determining which arguments in rhetoric were sound. “All men are mortal. Socrates is a man. Therefore Socrates is mortal.” This was written in sentences, not in symbols.

By explicit analogy with algebra, we introduced symbolism and manipulation rules for logical reasoning, with an eye toward making mathematical reasoning sound and to some extent computable. For example, in one dialect of logical notation, modus ponens (used in the Socrates syllogism) is expressed as $(P\rightarrow Q,\,P)\,\,\vdash\,\, Q$. This formula is an encapsulated algorithm: it says that if you know $P\rightarrow Q$ and $P$ are valid (are theorems) then $Q$ is valid as well.

Crises of understanding

We struggled with the notion of function as a result of dealing with infinite series. For example, the limit of a sequence of algebraic expressions may not be an algebraic expression. It would no longer do to think of a function as the same thing as an algebraic expression.

We realized that Euclid’s axioms for geometry lacked clarity. For example, as I understand it, the original version of his axioms didn’t imply that the two circles in the proof above had to intersect each other. There were other more subtle problems. Hilbert made a big effort to spell out the axioms in more detail.

We refined our understanding of logic by trying to deal with the mysteries of calculus, limits and spaces. An example is the difference between continuity and uniform continuity.
We also created infinitesimals, only to throw up our hands because we could not make a logic that fit them. Infinitesimals were temporarily replaced by the use of epsilon-delta methods.

We began to understand that there are different kinds of spaces. For example, there were other models of some of Euclid’s axioms than just Euclidean space, and some of those models showed that the parallel axiom is independent of the other axioms. And we became aware of many kinds of topological spaces and manifolds.

We started to investigate sets, in part because spaces have sets of points. Then we discovered that a perfectly innocent activity like considering the set of all sets resulted in an impossibility.

We were led to consider how to understand the Axiom of Choice from several upsetting discoveries. For example, the Banach-Tarski “paradox” implies that you can rearrange the points in a sphere of radius $1$ to make two spheres of radius $1$.

Mathematics adopts a new covenant… for awhile

These problems caused a kind of tightening up, or rigorizing.
For a period of time, less than a century, we settled into a standard way of practicing research mathematics called new math or modern math. Those names were used mostly by math educators. Research mathematicians might have called it axiomatic math based on set theory. Although I was around for the last part of that period I was not aware of any professional mathematicians calling it anything at all; it was just what we did.

First, we would come up with a new concept, type of math object, or a new theorem. In this creative process we would freely use intuition, metaphors, images and analogies.

Example

We might come up with the idea that a function reaches its maximum when its graph swoops up from the left, then goes horizontally for an infinitesimal amount of time, then swoops down to the right. The point at which it was going horizontally would obviously have to be the maximum.

But when we came to publish a paper about the subject, all these pictures would disappear. All our visual, metaphorical/conceptual and kinetic feelings that explain the phenomenon would have to be suppressed.

Rigorizing consisted of a set of practices, which I will hint at:

Orthodox behavior among mathematicians in 1950

Definition in terms of sets and axioms

Each mathematical object had to be defined in some way that started with a set and some other data defined in terms of the set. Axioms were imposed on these data. Everything had to be defined in terms of sets, including functions and relations. (Multiple sets were used occasionally.)

Definitions done in this way omit a lot of the intuition that we have concerning the object being defined.

Examples
• The definition of group as a set with a binary operation satisfying some particular axioms does not tell you that groups constitute the essence of symmetry.
• The definitions of equivalence relation and of partition do not even hint that they define the same concept.

Even so, definitions done in this way have an advantage: They tend to be close to minimal in the sense that to verify that something fits the definition requires checking no more (or not much more) than necessary.

Proofs had to be clearly translatable into symbolic logic

First order logic (and other sorts of logic) were well developed and proofs were written in a way that they could in principle be reduced to arguments written in the notation of symbolic logic and following the rules of inference of logic. This resulted in proofs which did not appeal to intuition, metaphors or pictures.

Example

In the case of the theorem that the maximum of a (differentiable) function occurs only where the derivative is zero, that meant epsilon-delta proofs in which the proof appeared as a thick string of symbols. Here, “thick” means it had superscripts, subscripts, and other things that gave the string a fractal dimension of about $1.2$ (just guessing!).

Example

When I was a student at Oberlin College in 1959, Fuzzy Vance (Elbridge P. Vance) would sometimes stop in the middle of an epsilon-delta proof and draw pictures and provide intuition. Before he started that he would say “Shut the door, don’t tell anyone”. (But he told us!)

Example

A more famous example of this is the story that Oscar Zariski, when presenting a proof in algebraic geometry at the board, would sometimes remind himself of a part of a proof by hunching over the board so the students couldn’t see what he was doing and drawing a diagram which he would immediately erase. (I fault him for not telling them about the diagram.)

It doesn’t matter whether this story is true or not. It is true in the sense that any good myth is true.

Commercial

I wrote about rigor in these articles:

Rigorous view in abstractmath.org.

Dry bones, post in this blog.

Logic and sets clarify but get in the way

The orthodox method of “define it by sets and axioms” and “makes proofs at least resemble first order logic” clarified a lot of suspect proofs. But it got in the way of intuition and excessive teaching by using proofs made it harder to students to learn.

• The definition of a concept can make you think of things that are foreign to your intuition of the concept. A function is a mapping,. The ordered pairs are a secondary construction; you should not think of ordered pairs as essential to your intuition. Even so the definition of function in terms of ordered pairs got rid of a lot of cobwebs.
• The cartesian product of sets is obviously an associative binary operation. Except that if you define the cartesian product of sets in terms of ordered pairs then it is not associative.
• Not only that, but if you define the ordered pair $(a,b)$ as $\{\{a,b\},a\}$ the you have to say that $a$ is an element of $(a,b)$ but $b$ is not That is not merely an inconvenient definition of ordered pair, it is wrong. It is not bad way to show that the concept of ordered pair is consistent with ZF set theory, but that is a minor point mathematicians hardly ever worry about.

Mathematical methods applied to everything

The early methods described at the beginning of this post began to be used everywhere in math.

Algorithms on symbols

Algorithms, or methodical procedures, began with the addition and multiplication algorithms and Euclid’s ruler and compass constructions, but they began to be used everywhere.

They are applied to the symbols of math, for example to describe rules for calculating derivatives and integrals and for summing infinite series.

Algorithms are used on strings, arrays and diagrams of math symbols, for example concatenating lists, multiplying matrices, and calculating binary operations on trees.

Algorithms as definitions

Algorithms are used to define the strings that make up the notation of symbolic logic. Such definitions include something like: “If $E$ and $F$ are expressions than $(E)\land (F)$ and $(\forall x)(E)$ are expressions”. So if $E$ is “$x\geq 3$” then $(\forall x)(x\geq 3)$ is an expression. This had the effect of turning an expression in symbolic logic into a mathematical object. Deduction rules such as “$E\land F\vdash E$” also become mathematical objects in this way.

We can define the symbols and expressions of algebra, calculus, and other part of math using algorithms, too. This became a big deal when computer algebra programs such as Mathematica came in.

Example

You can define the set $RP$ of real polynomials this way:

• $0\in RP$
• If $p\in RP$ then $p+r x^n\in RP$, where $x$ is a variable and $n$ a nonnegative integer.

That is a recursive definition. You can also define polynomials by pattern recognition:

Let $n$ be a positive integer, $a_0,\,a_1\,\ldots a_n$ be real numbers and $k_0,\,k_1\,\ldots k_n$ be nonnegative integers. Then $a_0 x^{k_0}+a_1 x^{k_1}+\ldots+ a_n x^{k_n}$ is a polynomial.

The recursive version is a way of letting a compiler discover that a string of symbols is a polynomial. That sort of thing became a Big Deal when computers arrived in our world.

Algorithms on mathematical objects

I am using the word “algorithm” in a loose sense to mean any computation that may or may not give a result. Computer programs are algorithms, but so is the quadratic formula. You might not think of a formula as an algorithm, but that is because if you use it in a computer program you just type in the formula; the language compiler has a built-in algorithm to execute calculations given by formulas.

It has not been clearly understood that mathematicians apply algorithms not only to symbols, but also directly to mathematical objects. Socrates thought that way long ago, as I described in the construction of a midpoint above. The procedure says “draw circles with center at the endpoints of the line segment.” It doesn’t say “draw pictures of circles…”

In the last section and this one, I am talking about how we think of applying an algorithm. Socrates thought he was talking about ideal lines and circles that exist in some other universe that we can access by thought. We can think about them as real things without making a metaphysical claim like Socrates did about them. Our brains are wired to think of abstract ideas in some many of the same ways we think about physical objects.

Example

The unit circle (as a topological space at least) is the quotient space of the space $\mathbb{R}$ of real numbers mod the equivalence relation defined by: $x\sim y$ if and only if $x-y$ is an integer.

Mathematicians who understand that construction may have various images in their mind when they read this. One would be something like imagining the real line $\mathbb{R}$ and then gluing all the points together that are an integer apart. This is a distinctly dizzying thing to think about but mathematicians aren’t worried because they know that taking the quotient of a space is a well-understood construction that works. They might check that by imagining the unit circle as the real line wrapped around an infinite number of times, with points an integer apart corresponding to the same point on the unit circle. (When I did that check I hastily inserted the parenthetical remark saying “as a topological space” because I realized the construction doesn’t preserve the metric.) The point of this paragraph is that many mathematicians think of this construction as a construction on math objects, not a construction on symbols.

Everything is a mathematical object

A lot of concepts start out as semi-vague ideas and eventually get defined as mathematical objects.

Examples

• A function was originally thought of as a formula, but then get formalized in the days of orthodoxy as a set of ordered pairs with the functional property.
• The concept of equation has been turned into a math object many times, for example in universal algebra and in logic. I suspect that some practitioners in those fields might disagree with me. This requires further research.
• Propositions are turned into math objects by Boolean Algebra.
• Perhaps numbers were always thought of as math objects, but much later the set $\mathbb{N}$ of all natural numbers and the set $\mathbb{R}$ of all real numbers came to be thought of explicitly as math objects, causing some mathematicians to have hissy fits.
• Definitions are math objects. This has been done in various ways. A particular theory is a mathematical object, and it is essentially a definition by definition (!): Its models are what the theory defines. A particular example of “theory” is first-order theory which was the gold standard in the orthodox era. A classifying topos is also a math object that is essentially a definition.

Category Theory

The introduction of categories broke the orthodoxy of everything-is-a-set. It has become widely used as a language in many branches of math. It started with problems in homological algebra arising in at least these two ways:

• Homotopy classes of continuous functions are not functions in the set theory sense. So we axiomatized the concept of function as an arrow (morphism) in a category.
• The concept of mathematical object is axiomatized as an object in a category. This forces all properties of an object to be expressed in terms of its external relations with other objects and arrows.
• Categories capture the idea of “kind of math”. There is a category of groups and homomorphisms, a category of topological spaces and homeomorphisms, and so on. This is a new level of abstraction. Before, if someone said “I work in finite groups”, their field was a clear idea and people knew what they were talking about, but now the category of finite groups is a mathematical object.
• Homology maps one kind of math (topology) into another kind (algebra). Since categories capture the general notion of “kind of math”, we invented the idea of functor to capture the idea of modeling or representing one branch of math in another one. So Homology became a mathematical object.
• The concept of functor allowed the definition of natural transformation as a mathematical object. Before categories, naturality was only an informal idea.

Advantages of category theory

• Categories, in the form of toposes, quickly became candidates to replace set theory as a foundation system for math. They are more flexible and allow the kind of logic you want to use (classical, intuitionistic and others) to be a parameter in your foundational system.
• “Arrow” (morphism) replaced not only the concept of function but also the concept of “element of” (“$\in$”). It allows the concept of variable elements. (This link is to a draft of a section of abstractmath.org that has not been included in the table of contents yet.) It also requires that an “element” has to be an element of one single object; for example, the maps $1\to \mathbb{N}$ and $1\to \mathbb{R}$ that pick out the number $42$ are not the same maps, although of course they are related by the canonical inclusion map $\mathbb{N}\to\mathbb{R}$.
• Diagrams are used in proofs and give much better immediate understanding than formulas written in strings, which compress a lot of things unnecessarily into thick strings that require understanding lots of conventions and holding things in your memory.
• Categories-as-kinds-of-math makes it easy to turn an analogy, for example between products of groups and products of spaces, into two examples of the same kind of mathematical object: Namely, a product in a category.

Disadvantages of category theory

• Category theory requires a new way of thinking. Some people think that is a disadvantage. But genuine innovation is always disruptive. New technology breaks old technology. Of course, the new technology has to turn out to be useful to win out.
• Category theory has several notions of “equal”. Objects can be the same or isomorphic. Categories can be isomorphic or equivalent. When you are doing category theory, you should never worry about whether two objects are equal: that is considered evil. Category theorists generally ignored the fuzziness of this problem because you can generally get away with it. Still, it was an example of something that had not been turned into a mathematical definition. Two ways of accomplishing this are anafunctors and homotopy type theory.

I expect to write about homotopy type theory soon. It may be the Next Revolution.

Send to Kindle

Naming mathematical objects

Commonword names confuse

Many technical words and phrases in math are ordinary English words ("commonwords") that are assigned a different and precisely defined mathematical meaning.

• Group  This sounds to the "layman" as if it ought to mean the same things as "set".  You get no clue from the name that it involves a binary operation with certain properties.
• Formula  In some texts on logic, a formula is a precisely defined expression that becomes a true-or-false sentence (in the semantics) when all its variables are instantiated.  So $(\forall x)(x>0)$ is a formula.  The word "formula" in ordinary English makes you think of things like "$\textrm{H}_2\textrm{O}$", which has no semantics that makes it true or false — it is a symbolic expression for a name.
• Simple group This has a technical meaning: a group with no nontrivial normal subgroup.  The Monster Group is "simple".  Yes, the technical meaning is motivated by the usual concept of "simple", but to say the Monster Group is simple causes cognitive dissonance.

Beginning students come with the (generally subconscious) expectation that they will pick up clues about the meanings of words from connotations they are already familiar with, plus things the teacher says using those words.  They think in terms of refining an understanding they already have.  This is more or less what happens in most non-math classes.  They need to be taught what definition means to a mathematician.

Names that don't confuse but may intimidate

Other technical names in math don't cause the problems that commonwords cause.

Named after somebody The phrase "Hausdorff space" leads a math student to understand that it has a technical meaning.  They may not even know it is named after a person, but it screams "geek word" and "you don't know what it means".  That is a signal that you can find out what it means.  You don't assume you know its meaning.

New made-up words  Words such as "affine", "gerbe"  and "logarithm" are made up of words from other languages and don't have an ordinary English meaning.  Acronyms such as "QED", "RSA" and "FOIL" don't occur often.  I don't know of any math objects other than "RSA algorithm" that have an acronymic name.  (No doubt I will think of one the minute I click the Publish button.)  Whole-cloth words such as "googol" are also rare.  All these sorts of words would be good to name new things since they do not fool the readers into thinking they know what the words mean.

Both types of words avoid fooling the student into thinking they know what the words mean, but some students are intimidated by the use of words they haven't seen before.  They seem to come to class ready to be snowed.  A minority of my students over my 35 years of teaching were like that, but that attitude was a real problem for them.

Audience

You can write for several different audiences.

Math fans (non-mathematicians who are interested in math and read books about it occasionally) In my posts Explaining higher math to beginners and in Renaming technical conceptsI wrote about several books aimed at explaining some fairly deep math to interested people who are not mathematicians.  They renamed some things. For example, Mark Ronan in Symmetry and the Monster used the phrase "atom" for "simple group" presumably to get around the cognitive dissonance.  There are other examples in my posts.

Math newbies  (math majors and other students who want to understand some aspect of mathematics).  These are the people abstractmath.org is aimed at. For such an audience you generally don't want to rename mathematical objects. In fact, you need to give them a glossary to explain the words and phrases used by people in the subject area.

Postsecondary math students These people, especially the math majors, have many tasks:

• Gain an intuitive understanding of the subject matter.
• Understand in practice the logical role of definitions.
• Learn how to come up with proofs.
• Understand the ins and outs of mathematical English, particularly the presence of ordinary English words with technical definitions.
• Understand and master the appropriate parts of the symbolic language of math — not just what the symbols mean but how to tell a statement from a symbolic name.

It is appropriate for books for math fans and math newbies to try to give an understanding of concepts without necessary proving theorems.  That is the aim of much of my work, which has more an emphasis on newbies than on fans. But math majors need as well the traditional emphasis on theorem and proof and clear correct explanations.

Lately, books such as Visual Group Theory have addressed beginning math majors, trying for much more effective ways to help the students develop good intuition, as well as getting into proofs and rigor. Visual Group Theory uses standard terminology.  You can contrast it with Symmetry and the Monster and The Mystery of the Prime Numbers (read the excellent reviews on Amazon) which are clearly aimed at math fans and use nonstandard terminology.

Terminology for algebraic structures

I have been thinking about the section of Abstracting Algebra on binary operations.  Notice this terminology:

The "standard names" are those in Wikipedia.  They give little clue to the meaning, but at least most of them, except "magma" and "group", sound technical, cluing the reader in to the fact that they'd better learn the definition.

I came up with the names in the right column in an attempt to make some sense out of them.  The design is somewhat like the names of some chemical compounds.  This would be appropriate for a text aimed at math fans, but for them you probably wouldn't want to get into such an exhaustive list.

I wrote various pieces meant to be part of Abstracting Algebra using the terminology on the right, but thought better of it. I realized that I have been vacillating between thinking of AbAl as for math fans and thinking of it as for newbies. I guess I am plunking for newbies.

I will call groups groups, but for the other structures I will use the phrases in the middle column.  Since the book is for newbies I will include a table like the one above.  I also expect to use tree notation as I did in Visual Algebra II, and other graphical devices and interactive diagrams.

Magmas

In the sixties magmas were called groupoids or monoids, both of which now mean something else.  I was really irritated when the word "magma" started showing up all over Wikipedia. It was the name given by Bourbaki, but it is a bad name because it means something else that is irrelevant.  A magma is just any binary operation. Why not just call it that?

Well, I will tell you why, based on my experience in Ancient Times (the sixties and seventies) in math. (I started as an assistant professor at Western Reserve University in 1965). In those days people made a distinction between a binary operation and a "set with a binary operation on it".  Nowadays, the concept of function carries with it an implied domain and codomain.  So a binary operation is a function $m:S\times S\to S$.  Thinking of a binary operation this way was just beginning to appear in the common mathematical culture in the late 60's, and at least one person remarked to me: "I really like this new idea of thinking of 'plus' and 'times' as functions."  I was startled and thought (but did not say), "Well of course it is a function".  But then, in the late sixties I was being indoctrinated/perverted into category theory by the likes of John Isbell and Peter Hilton, both of whom were briefly at Case Western Reserve University.  (Also Paul Dedecker, who gave me a glimpse of Grothendieck's ideas).

Now, the idea that a binary operation is a function comes with the fact that it has a domain and a codomain, and specifically that the domain is the Cartesian square of the codomain.  People who didn't think that a binary operation was a function had to introduce the idea of the universe (universal algebraists) or the underlying set (category theorists): you had to specify it separately and introduce terminology such as $(S,\times)$ to denote the structure.   Wikipedia still does it mostly this way, and I am not about to start a revolution to get it to change its ways.

Groups

In the olden days, people thought of groups in this way:

• A group is a set $G$ with a binary operation denoted by juxtaposition that is closed on $G$, meaning that if $a$ and $b$ are any elements of $G$, then $ab$ is in $G$.
• The operation is associative, meaning that if $a,\ b,\ c\in G$, then $(ab)c=a(bc)$.
• The operation has a unity element, meaning an element $e$ for which for any element $a\in G$, $ae=ea=a$.
• For each element $a\in G$, there is an element $b$ for which $ab=ba=e$.

This is a better way to describe a group:

• A group consist of a nullary operation e, a unary operation inv,  and a binary operation denoted by juxtaposition, all with the same codomain $G$. (A nullary operation is a map from a singleton set to a set and a unary operation is a map from a set to itself.)
• The value of e is denoted by $e$ and the value of inv$(a)$ is denoted by $a^{-1}$.
• These operations are subject to the following equations, true for all $a,\ b,\ c\in G$:

• $ae=ea=a$.
• $aa^{-1}=a^{-1}a=e$.
• $(ab)c=a(bc)$.

This definition makes it clear that a group is a structure consisting of a set and three operations whose axioms are all equations.  It was formulated by people in universal algebra but you still see the older form in texts.

The old form is not wrong, it is merely inelegant.  With the old form, you have to prove the unity and inverses are unique before you can introduce notation, and more important, by making it clear that groups satisfy equational logic you get a lot of theorems for free: you construct products on the cartesian power of the underlying set, quotients by congruence relations, and other things. (Of course, in AbAl those theorem will be stated later than when groups are defined because the book is for newbies and you want lots of examples before theorems.)

References

1. Three kinds of mathematical thinkers (G&G post)
2. Technical meanings clash with everyday meanings (G&G post)
3. Commonword names for technical concepts (G&G post)
4. Renaming technical concepts (G&G post)
5. Explaining higher math to beginners (G&G post)
6. Visual Algebra II (G&G post)
7. Monads for high school II: Lists (G&G post)
8. The mystery of the prime numbers: a review (G&G post)
9. Hersh, R. (1997a), "Math lingo vs. plain English: Double entendre". American Mathematical Monthly, volume 104, pages 48–51.
10. Names (in abmath)
11. Cognitive dissonance (in abmath)
Send to Kindle

Idempotents by sketches and forms

This post provides a detailed description of an example of a mathematical structure presented as a sketch and as a form.  It is a supplement to my article An Introduction to forms.  Most of the constructions I mention here are given in more detail in that article.

It helps in reading this post to be familiar with the basic ideas of category, including commutative diagram and limit cone, and of the concepts of logical theory and model in logic.

Sketches and forms

sketch of a mathematical structure is a collection of objects and arrows that make up a digraph (directed graph), together with some specified cones, cocones and diagrams in the digraph.  A model of the sketch is a digraph morphism from the digraph to some category that takes the cones to limit cones, the cocones to colimit cocones, and the diagrams to commutative diagrams.  A morphism of models of a sketch from one model to another in the same category is a natural transformation.  Sketches can be used to define all kinds of algebraic structures in the sense of universal algebra, and many other types of structures (including many types of categories).

There are many structures that sketches cannot sketch.  Forms were first defined in [4].  They can define anything a sketch can define and lots of other things.  [5] gives a leisurely description of forms suitable for people who have a little bit of knowledge of categories and [1] gives a more thorough description.

An idempotent is a very simple kind of algebraic structure.  Here I will describe both a sketch and a form for idempotents. In another post I will do the same for binops (magmas).

Idempotent

An idempotent is a unary operation $u$ for which $u^2=u$.

• If $u$ is a morphism in a category whose morphisms are set functions, a function $u:S\to S$ is an idempotent if $u(u(x))=u(x)$ for all $x$ in the domain.
• Any identity element in any category is an idempotent.
• A nontrivial example is the function $u(x,y):=(-x,0)$ on the real plane.

Any idempotent $u$ makes the following diagram commute

and that diagram can be taken as the definition of idempotent in any category.

The diagram is in green.  In this post (and in [5]) diagrams in the category of models of a sketch or a form are shown in green.

A sketch for idempotents

The sketch for idempotents contains a digraph with one object and one arrow from that object to itself (above left) and one diagram (above right).  It has no cones or cocones.  So this is an almost trivial example.  When being expository (well, I can hardly say "when you are exposing") your first example should not be trivial, but it should be easy.  Let's call the sketch $\mathcal{S}$.

• The diagram looks the same as the green diagram above.  It is in black, because I am showing things in syntax (things in sketches and forms) in black and semantics (things in categories of models) in green.
• The green diagram is a commutative diagram in some category (unspecified).
• The black diagram is a diagram in a digraph. It doesn't make sense to say it is commutative because digraphs don't have composition of arrows.
• Each sketch has a specific digraph and lists of specific diagrams, cones and cocones.  The left digraph above is not in the list of diagrams of $\mathcal{S}$ (see below).

The definition of sketch says that every diagram in the official list of diagrams of a given sketch must become a commutative diagram in a model.  This use of the word "become" means in this case that a model must be a digraph morphism $M:\mathcal{S}\to\mathcal{C}$ for some category $\mathcal{C}$ for which the diagram below commutes.

This sketch generates a category called the Theory ("Cattheory" in [5]) of the sketch $\mathcal{S}$, denoted by $\text{Th}(\mathcal{S})$.  It is roughly the "smallest" category containing $f$ and $C$ for which the diagrams in $\mathcal{S}$ are commutative.

This theory contains the generic model $G:\mathcal{S}\to \text{Th}(\mathcal{S})$ that takes $f$ and $C$ to themselves.
• $G$ is "generic" because anything you prove about $G$ is true of every model of $\mathcal{S}$ in any category.
• In particular, in the category $\text{Th}(\mathcal{S})$, $G(f)\circ G(f)=G(f)$.
• $G$ is a universal morphism in the sense of category theory: It lifts any model $M:\mathcal{S}\to\mathcal{C}$ to a unique functor $\bar{M}=M\circ G:\text{Th}(\mathcal{S})\to\mathcal{C}$ which can therefore be regarded as the same model.  See Note [2].
SInce models are functors, morphisms between models are natural transformations.  This gives what you would normally call homomorphisms for models of almost any sketchable structure.  In [2] you can find a sketch for groups, and indeed the natural transformations between models are group homomorphisms.

Sketching categories

You can sketch categories with a sketch CatSk containing diagrams and cones, but no cocones.  This is done in detail in [3]. The resulting theory $\text{Th}(\mathbf{CatSk})$ is required to be the least category-with-finite-limits generated by $\mathcal{S}$ with the diagrams becoming commutative diagrams and the cones becoming limit cones.  This theory is the FL-Theory for categories, which I will call ThCat (suppressing mention of FL).

Doctrines

In general the theory of a particular kind of structure contains a parameter that denotes its doctrine. The sketch $\mathcal{S}$ for idempotents didn't require cones, but you can construct theories $\text{Th}(\mathcal{S})$, $\text{Th} (\text{FP},\mathcal{S})$ and $\text{Th}(\text{FL},\mathcal{S})$ for idempotents (FP means it is a category with finite products).

In a strong sense, all these theories have the same models, namely idempotents, but the doctrine of the theory allows you to use more mechanisms for proving properties of idempotents.  (The doctrine for $\text{Th}(\mathcal{S})$ provides for equational proofs for unary operations only, a doctrine which has no common name such as FP or FS.)  The paper [1] is devoted to explicating proof in the context of forms, using graphs and diagrams instead of formulas that are strings of symbols.

Describing composable pairs of arrows

The form for any type of structure is constructed using the FL theory for some type of category, for example category with all limits, cartesian closed category, topos, and so on.  The form for idempotents can be constructed in ThCat (no extra structure needed).  The form for reflexive function spaces (for example) needs the FL theory for cartesian closed categories (see [5]).

Such an FL theory must contain objects $\text{ob}$ and $\text{ar}$ that become the set of objects and the set of arrows of the category that a model produces.  (Since FL theories have models in any category with finite limits, I could have said "object of objects" and "object of arrows".  But in this post I will talk about only models in Set.)

ThCat contains an object  $\text{ar}_2$ that represents composable pairs of arrows.  That requires a cone to define it:

This must become a limit cone in a model.

• I usually show cones in blue.
• $\text{dom}$ and $\text{cod}$ give (in a model) the domain and codomain of an arrow.
• $\text{lfac}$ gives the left factor and $\text{rfac}$ gives the right factor. It is usually useful to give suggestive names to some of the projections in situations like this, since they will be used elsewhere (where they will be black!).
• The objects and arrows in the diagram (including $\text{ar}_2$) are already members of the FL theory for categories.
• This diagram is annotated in green with sample names of objects and arrows that might exist in a model.  Atish and I introduced that annotation system in [1] to help you chase the diagram and think about what it means.

This cone is a graph-based description of the object of composable arrows in a category (as opposed to a linguistic or string-based description).

Describing endomorphisms

Now an idempotent must be an endomorphism, so we provide a cone describing the object of endomorphisms in a category. This cone already exists in the FL theory for categories.

• $\text{loop}$ is a monomorphism (in fact a regular mono because it is the mono produced by an equalizer) so it is not unreasonable to give the element annotation for $\text{endo}$ and $\text{ar}$ the same name.
• "$\text{dc}$" takes $f$ to its domain and codomain.
• $\text{loop}$ and "$\text{dc}$" were not created when I produced the cone above.  They were already in the FL theory for categories.

Since the cone defining $\text{ar}_2$ is a limit cone (in the Theory, not in a model), if you have any other commutative cone (purple) to that cone, a unique arrow (red) $\text{diag}$ automatically is present as shown below:

This particular purple cone is the limit cone defining $\text{endo}$ just defined.  Now $\text{diag}$ is a specific arrow in the FL theory for categories. In a model of the theory (which is a category in Set or in some other category) takes an endomorphism to the corresponding pair of composable arrows.

The object of idempotents

Now using these arrows we can define the object $\text{idm}$ of idempotents using the diagram below. See Note [3].

Idm is an object in ThCat.  In any category, in other words in any model of ThCat, idm becomes the set of idempotent arrows in that category.

In the terminology of [5], the object idm is the form for idempotents, and the cone it is the limit of is the description of idempotent.

Now take ThCat and adjoin an arrow $g:1\to\text{idm}$.  You get a new FL category I will call the FL-theory of the form for idempotents.  A model of the theory of the form in Set  is a category with a specified idempotent. A particular example of a model of the form idm in the category of real linear vector spaces is the map $u(x,y):=(-x,0)$ of the (set of points of) the real plane to itself (it is an idempotent endomorphism of $\textbf{R}^2$).

This example is typical of forms and their models, except in one way:  Idempotents are also sketchable, as I described above.  Many mathematical structures can be perceived as models of forms, but not models of sketches, such as reflexive function spaces as in [5].

Notes

[1] The diagrams shown in this post were drawn in Mathematica.  The code for them is shown in the notebook SketchFormExamples.nb .  I am in the early stages of developing a package for drawing categorical diagrams in Mathematica, so this notebook shows the diagrams defined in very primitive machine-code-like Mathematica.  The package will not rival xypic for TeX any time soon.  I am doing it so I can produce diagrams (including 3D diagrams) you can manipulate.

[2] In practice I would refer to the names of the objects and arrows in the sketch rather than using the M notation:  I might write $f\circ f=f$ instead of $M(f)\circ M(f)=M(f)$ for example.  Of course this confuses syntax with semantics, which sounds like a Grievous Sin, but it is similar to what we do all the time in writing math:  "In a semigroup, $x$ is an idempotent if $xx=x$."  We use same notation for the binary operation for any semigroup and we use $x$ as an arbitrary element of most anything.  Actually, if I write $f\circ f=f$ I can claim I am talking in the generic model, since any statement true in the generic model is true in any model.  So there.

[3] In the Mathematica notebook SketchFormExamples.nb in which I drew these diagrams, this diagram is plotted in Euclidean 3-space and can be viewed from different viewpoints by running your cursor over it.

References

[1] Atish Bagchi and Charles Wells, Graph-Base Logic and Sketches, draft, September 2008, on ArXiv.

[2] Michael Barr and Charles Wells, Category Theory for Computing Science (1999). Les Publications CRM, Montreal (publication PM023).

[3] Michael Barr and Charles Wells, Toposes, Triples and Theories (2005). Reprints in Theory and Applications of Categories 1.

[4] Charles Wells, A generalization of the concept of sketch, Theoretical Computer Science 70, 1990

[5] Charles Wells, An Introduction to forms.

Send to Kindle

An Introduction to Forms

In 2009, I wrote a sequence of posts on this blog explaining the concept of form that I introduced in [1].  I have now updated and combined them into an article [2].  The posts no longer exist on the blog. The article contains links to other papers on forms.

[1] A generalization of the concept of sketch, Theoretical Computer Science 70, 1990.

Send to Kindle