Category Archives: category theory

A new kind of introduction to category theory

About this article

  • This post is an alpha version of the first part of the intended article.
  • People who are beginners in learning abstract math concepts have many misunderstandings about the definitions and early theorems of category theory.
  • This article introduces a few basic concepts of category theory. It goes into detail in Purple Prose about the misunderstandings that can arise with each of the concepts. The article is not at all a complete introduction to categories.
  • My blog post Introducing abstract topics describes some of the strategies needed in teaching a new abstract math concept.
  • This article also introduces a few examples of categories that are primarily chosen to cause the reader to come up against some of those misunderstandings. The first example is completely abstract.
  • Math students usually see categories after considerable exposure to abstract math, but students in computing science and other fields may see it without having much background in abstraction. I hope teachers in such courses will include explanations of the sort of misunderstandings mentioned in this article.
  • Like all posts in Gyre&Gimble and all posts in abstractmath.org, this article is licensed under a Creative Commons Attribution-ShareAlike 2.5 License. If you are teaching a class involving category theory, feel free to hand it out, and to modify it (in which case you should include a link to this post).
  • You could also use the article as a source of remarks you make in the class about the topics.

About categories

To be written.

Definition of category

A category is a type of Mathematical structure consisting of two types of data, whose relationships are entirely determined by some axioms. After the definition is complete, I will introduce several categories with a detailed discussion of each one, explaining how they fit the definition of category.

Axiom 1: Data

  1. A category consists of two types of data: objects and arrows.
  2. No object can be an arrow and no arrow can be an object.

Notes for Axiom 1

  • An object of a category can be any kind of mathematical object. It does not have to be a set and it does not have to have elements.
  • Arrows of a category are also called morphisms. You may be familiar with “homomorphisms”, “homeomorphisms” or “isomorphisms”, all of which are functions. This does not mean that a “morphism” in an arbitrary category is a function.

Axiom 2: Domain and codomain

  1. Each arrow has a domain and a codomain, each of which is an object of the category.
  2. The domain and the codomain of an arrow may or may not be the same object.
  3. Each arrow has only one domain and only one codomain.

Notes for Axiom 2

  • If $f$ is an arrow with domain $A$ and codomain $B$, that fact is typically shown either by the notation “$f:A\to B$” or by a diagram like this:
  • The notation “$f:A\to B$” is like that used for functions. This notation may be used in any category, but it does not imply that $f$ is a function or that $A$ and $B$ have elements.
  • For such an arrow, the notation “$\text{dom}(f)$” refers to $A$ and “$\text{cod}(f)$” refers to $B$.
  • For a given category $\mathsf{C}$, the collection of all the arrows with domain $A$ and codomain $B$ may be denoted by
    • “$\text{Hom}(A,B)$” or
    • “$\text{Hom}_\mathsf{C}(A,B)$” or
    • “$\mathsf{C}(A,B)$”.
  • Some newer books and articles in category theory use the name source for domain and target for codomain. This usage has the advantage that a newcomer to category theory will be less likely to think of an arrow as a function.

Axiom 3: Composition

  1. If $f$ and $g$ are arrows in a category for which $\text{cod}(f)=\text{dom}(g)$, as in this diagram:

    then there is a unique arrow with domain $A$ and codomain $C$ called the composite of $f$ and $g$.

Notes for Axiom 3

    diagra

  • An important metaphor for composition is: Every path of length 2 has exactly one composite.
  • The unique arrow required by Axiom 3 may be denoted by “$g\circ f$” or “$gf$”. “$g\circ f$” is more explicit, but “$gf$” is much more commonly used by category theorists.
  • Many constructions in categories may be shown by diagrams, like the one used just above.
  • The diagram

    is said to commute if $h=g\circ f$. The idea is that going along $f$ and then $g$ is the same as going along $h$.

  • It is customary in some texts in category theory to indicate that a diagram commutes by putting a gyre in the middle:
  • The concept of category is an abstraction of the idea of function, and the composition of arrows is an abstraction of the composition of functions. It uses the same notation, “$g\circ f$”. If $f$ and $g$ are set functions, then for an element $x$ in the domain of $f$, \[(g\circ f)(x)=g(f(x))\]
  • But in arbitrary category, it may make no sense to evaluate an arrow $f$ at some element $x$; indeed, the domain of $f$ may not have elements at all, and then the statement “$(g\circ f)(x)=g(f(x))$” is meaningless.

Axiom 4: Identity arrows

  1. For each object $A$ of a category, there is a unique arrow denoted by $\textsf{id}_A$.
  2. $\textsf{dom}(\textsf{id}_A)=A$ and $\textsf{cod}(\textsf{id}_A)=A$.
  3. For any object $B$ and any arrow $f:B\to A$, the diagram

    commutes.

  4. For any object $C$ and any arrow $g:A\to C$, the diagram

    commutes.

Notes for Axiom 4

  • The fact stated in Axiom 4(b) could be shown diagrammatically either as

    or as

  • Facts (c) and (d) can be written in algebraic notation: For any arrow $f$ going to $A$,\[\textsf{id}_A\circ f=f\]and for any arrow $g$ coming from $A$,\[g\circ \textsf{id}_A=g\]

Axiom 5: Associativity

  1. If $f$, $g$ and $h$ are arrows in a category for which $\text{cod}(f)=\text{dom}(g)$ and $\text{cod}(g)=\text{dom}(h)$, as in this diagram:

    then there is a unique arrow $k$ with domain $A$ and codomain $C$ called the composite of $f$, $g$ and $h$.

  2. In the diagram below, the two triangles containing $k$ must both commute.

Notes for Axiom 5

  • Axiom 5b requires that \[h\circ(g\circ f)=(h\circ g)\circ f\](which both equal $k$), which is the usual formula for associativity.
  • Note that the top two triangles commute by Axiom 3.
  • The associativity axiom means that we can get rid of parentheses and write \[k=h\circ f\circ g\]just as we do for addition and multiplication of numbers.
  • In my opinion the notation using categorical diagrams communicates information much more clearly than algebraic notation does. In particular, you don’t have to remember the domains and codomains of the functions — they appear in the picture. I admit that diagrams take up much more space, but now that we read math stuff on a computer screen instead of on paper, space is free.

Examples of categories

For the first three examples, I will give a detailed explanation about how they fit the definition of category.

Example 1: MyFin

This first example is a small, finite category which I have named $\mathsf{MyFin}$ (my very own finite category). It is not at all an important category, but it has advantages as a first example.

  • It’s small enough that you can see all the objects and arrows on the screen at once, but big enough not to be trivial.
  • The objects and arrows have no properties other than being objects and arrows. (The other examples involve familiar math objects.)
  • So in order to check that $\mathsf{MyFin}$ really obeys the axioms for a category, you can use only the skeletal information given here. As a result, you must really understand the axioms!

A correct proof will be based on axioms and theorems. The proof can be suggested by your intuitions, but intuitions are not enough. When working with $\mathsf{MyFin}$ you won’t have any intuitions!

A diagram for $\mathsf{MyFin}$

This diagram gives a partial description of $\mathsf{MyFin}$.

Now let’s see how to make the diagram above into a category.

Axiom 1

  • The objects of $\mathsf{MyFin}$ are $A$, $B$, $C$ and $D$.
  • The arrows are $f$, $g$, $h$, $j$, $k$, $r$, $s$, $u$, $v$, $w$ and $x$.
  • You can regard the letters just listed as names of the objects and arrows. The point is that at this stage all you know about the objects and arrows are their names.
  • If you prefer, you can think of the arrows as the actual arrows shown in the $\mathsf{MyFin}$ diagram.
  • Our definition of $\mathsf{MyFin}$ is an abstract definition. You may have seen multiplication tables of groups given in terms of undefined letters. (If you haven’t, don’t worry.) Those are also abstract definitions.
  • Most of our other definitions of categories involve math objects you actually know something about. They are like the definition of division, for example, where the math objects are integers.

Axiom 2

  • The domains and codomains of the arrows are shown by the diagram above.
  • For example, $\text{dom}(r)=A$ and $\text{cod}(r)=C$, and $\text{dom}(v)=\text{cod}(v)=B$.

Axiom 3

Showing the $\mathsf{MyFin}$ diagram does not completely define $\mathsf{MyFin}$. We must say what the composites of all the paths of length 2 are.

  • In fact, most of them are forced, but two of them are not.
  • We must have $g\circ f=r$ because $r$ is the only arrow possible for the composite, and Axiom 3 requires that every path of length 2 must have a composite.
  • For the same reason, $h\circ g=s$.
  • All the paths involving $u$, $v$, $w$ and $x$ are forced:

  • (p1) $u\circ u=u$, $v\circ v=v$, $w\circ w=w$ and $x\circ x=x$.
  • (p2) $f\circ u=f$, $r\circ u=r$, $j\circ u=j$ and $k\circ u=k$. You can see that, for example, $f\circ u=f$ by opening up the loop on $f$ like this:

    There is only one arrow going from $A$ to $B$, namely$f$, so $f$ has to be the composite $f\circ u$.

  • (p3) $v\circ f=f$, $g\circ v=g$ and $s\circ v=s$.
  • (p4) $w\circ g=g$, $w\circ r=r$ and $h\circ w=h$.
  • (p5) $x\circ h=h$, $x\circ s=s$, $x\circ j=j$ and $x\circ k=k$.

  • For $s\circ f$ and $h\circ r$, we have to choose between $j$ and $k$ as composites. Since $s\circ f=(h\circ g)\circ f$ and $h\circ r=h\circ (g\circ f)$, Axiom 3 requires that we must chose one of $j$ and $k$ to be both composites.

    Definition: $s\circ f=h\circ r=j$.

    If we had defined $s\circ f=h\circ r=k$ we would have a different category, although one that is “isomorphic” to $\mathsf{MyFin}$ (you have to define “isomorphic” or look it up.)

  • Axiom 4

    • It is clear from the $\mathsf{MyFin}$ diagram that for each object there is just one arrow that has that object both as domain and as codomain, as required by Axiom 4a.
    • The requirements in Axiom 4b and 4c are satisfied by statements (p1) through (p5).

    Axiom 5

    • Since we have already required both $(h\circ g)\circ f$ and $h\circ(g\circ f)$ to be $k$, composition is associative.

    Example 2: Set

    To be written.

    This will be a very different example, because it involves known mathematical objects — sets and functions. But there are still issues, for example the fact that the inclusion of $\{1,2\}$ into $\{1,2,3\}$ and the identity map on $\{1,2\}$ are two different arows in the category of sets.

    Example 3: IntegerDiv

    To be written.

    The objects are all the positive integers and there is an arrow from $m$ to $n$ if and only if $m$ divides $n$. So this example involves familiar objects and predicates, but the arrows are nevertheless not functions that take elements to elements. Integers don’t have elements. I would expect to show how the GCD of two integers is a limit.

    References

      Creative Commons License        

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

    Send to Kindle

    Very early difficulties II

    Very early difficulties II

    This is the second part of a series of posts about certain difficulties math students have in the very early stages of studying abstract math. The first post, Very early difficulties in studying abstract math, gives some background to the subject and discusses one particular difficulty: Some students do not know that it is worthwhile to try starting a proof by rewriting what is to be proved using the definitions of the terms involved.

    Math StackExchange

    The website Math StackExchange is open to any questions about math, even very easy ones. It is in contrast with Math OverFlow, which is aimed at professional mathematicians asking questions in their own field.

    Math SE contains many examples of the early difficulties discussed in this series of posts, and I recommend to math ed people (not just RUME people, since some abstract math occurs in advanced high school courses) that they might consider reading through questions on Math SE for examples of misunderstanding students have.

    There are two caveats:

    • Most questions on Math SE are at a high enough level that they don’t really concern these early difficulties.
    • Many of the questions are so confused that it is hard to pinpoint what is causing the difficulty that the questioner has.

    Connotations of English words

    The terms(s) defined in a definition are often given ordinary English words as names, and the beginner automatically associates the connotations of the meaning of the English word with the objects defined in the definition.

    Infinite cardinals

    If $A$ if a finite set, the cardinality of $A$ is simply a natural number (including $0$). If $A$ is a proper subset of another set $B$, then the cardinality of $A$ is strictly less than the cardinality of $B$.

    In the nineteenth century, mathematicians extended the definition of cardinality for infinite sets, and for the most part cardinality has the same behavior as for finite sets. For example, the cardinal numbers are well-ordered. However, for infinite sets it is possible for a set and a proper subset of the set to have the same cardinality. For example, the cardinality of the set of natural numbers is the same as the cardinality of the set of rational numbers. This phenomenon causes major cognitive dissonance.

    Question 1331680 on Math Stack Exchange shows an example of this confusion. I have also discussed the problem with cardinality in the abstractmath.org section Cardinality.

    Morphism in category theory

    The concept of category is defined by saying there is a bunch of objects called objects (sorry bout that) and a bunch of objects called morphisms, subject to certain axioms. One requirement is that there are functions from morphisms to objects choosing a “domain” and a “codomain” of each morphism. This is spelled out in Category Theory in Wikibooks, and in any other book on category theory.

    The concepts of morphism, domain and codomain in a category are therefore defined by abstract definitions, which means that any property of morphisms and their domains and codomains that is true in every category must follow from the axioms. However, the word “morphism” and the talk about domains and codomains naturally suggests to many students that a morphism must be a function, so they immediately and incorrectly expect to evaluate it at an element of its domain, or to treat it as a function in other ways.

    Example

    If $\mathcal{C}$ is a category, its opposite category $\mathcal{C}^{op}$ is defined this way:

    • The objects of $\mathcal{C}^{op}$ are the objects of $\mathcal{C}$.
    • A morphism $f:X\to Y$ of $\mathcal{C}^{op}$ is a morphism from $Y$ to $X$ of $\mathcal{C}$ (swap the domain and codomain).

    In Question 980933 on Math SE, the questioner is saying (among other things) that in $\text{Set}^{op}$, this would imply that there has to be a morphism from a nonempty set to the empty set. This of course is true, but the questioner is worried that you can’t have a function from a nonempty set to the empty set. That is also true, but what it implies is that in $\text{Set}^{op}$, the morphism from $\{1,2,3\}$ to the empty set is not a function from $\{1,2,3\}$ to the empty set. The morphism exists, but it is not a function. This does not any any sense make the definition of $\text{Set}^{op}$ incorrect.

    Student confusion like this tends to make the teacher want to have a one foot by six foot billboard in his classroom saying

    A MORPHISM DOESN’T HAVE TO BE A FUNCTION!

    However, even that statement causes confusion. The questioner who asked Question 1594658 essentially responded to the statement in purple prose above by assuming a morphism that is “not a function” must have two distinct values at some input!

    That questioner is still allowing the connotations of the word “morphism” to lead them to assume something that the definition of category does not give: that the morphism can evaluate elements of the domain to give elements of the codomain.

    So we need a more elaborate poster in the classroom:

    The definition of “category” makes no requirement
    that an object has elements
    or that morphisms evaluate elements.

    As was remarked long long ago, category theory is pointless.

    English words implementing logic

    There are lots of questions about logic that show that students really do not think that the definition of some particular logical construction can possibly be correct. That is why in the abstractmath.org chapter on definitions I inserted this purple prose:

    A definition is a totalitarian dictator.

    It is often the case that you can explain why the definition is worded the way it is, and of course when you can you should. But it is also true that the student has to grovel and obey the definition no matter how weird they think it is.

    Formula and term

    In logic you learn that a formula is a statement with variables in it, for example “$\exists x((x+5)^3\gt2)$”. The expression “$(x+5)^3$” is not a formula because it is not a statement; it is a “term”. But in English, $H_2O$ is a formula, the formula for water. As a result, some students have a remarkably difficult time understanding the difference between “term” and “formula”. I think that is because those students don’t really believe that the definition must be taken seriously.

    Exclusive or

    Question 804250 in MathSE says:

    “Consider $P$ and $Q$. Let $P+Q$ denote exclusive or. Then if $P$ and $Q$ are both true or are both false then $P+Q$ is false. If one of them is true and one of them is false then $P+Q$ is true. By exclusive or I mean $P$ or $Q$ but not both. I have been trying to figure out why the truth table is the way it is. For example if $P$ is true and $Q$ is true then no matter what would it be true?”

    I believe that the questioner is really confused by the plus sign: $P+Q$ ought to be true if $P$ and $Q$ are both true because that’s what the plus sign ought to mean.

    Yes, I know this is about a symbol instead of an English word, but I think the difficulty has the same dynamics as the English-word examples I have given.

    If I have understood this difficulty correctly, it is similar to the students who want to know why $1$ is not a prime number. In that case, there is a good explanation.

    Only if

    The phrase “only if” simply does not mean the same thing in math as it does in English. In Question 17562 in MathSE, a reader asks the question, why does “$P$ only if $Q$” mean the same as “if $P$ then $Q$” instead of “if $Q$ then $P$”?

    Many answerers wasted a lot of time trying to convince us that “$P$ only if $Q$” mean the same as “if $P$ then $Q$” in ordinary English, when in fact it does not. That’s because in English, clauses involving “if” usually connote causation, which does not happen in math English.

    Consider these two pairs of examples.

    1. “I take my umbrella only if it is raining.”
    2. “If I take my umbrella, then it is raining.”
    3. “I flip that switch only if a light comes on.”
    4. “If I flip that switch, a light comes on.”

    The average non-mathematical English speaker will easily believe that (1) and (4) are true, but will balk and (2) and (3). To me, (3) means that the light coming on makes me flip the switch. (2) is more problematical, but it does (to me) have a feeling of causation going the wrong way. It is this difference that causes students to balk at the equivalence in math of “$P$ only if $Q$” and “If $P$, then $Q$”. In math, there is no such thing as causation, and the truth tables for implication force us to live with the fact that these two sentences mean the same thing.

    Henning Makholm’ answer to Question 17562 begins this way: “I don’t think there’s really anything to understand here. One simply has to learn as a fact that in mathematics jargon the words ‘only if’ invariably encode that particular meaning. It is not really forced by the everyday meanings of ‘only’ and’ if’ in isolation; it’s just how it is.” That is the best way to answer the question. (Other answerers besides Makholm said something similar.)

    I have also discussed this difficulty (and other difficulties with logic) in the abmath section on “only if“.

    References

    Creative Commons License

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

    Send to Kindle

    My early life as a mathematician

    My early life as a mathematician.

    Revised 22 January 2016.

    In 1965, I received my Ph.D. at Duke University based on a dissertation about polynomials over finite fields. My advisor was Leonard Carlitz.

    In Carlitz’s algebra course, the textbook was Van der Waerden’s Algebra. It is way too old-fashioned to be used nowadays, but it did indeed present post-Noether type abstract algebra. Carlitz also had me read large chunks of Martin Weber’s Lehrbuch der Algebra, written in German in 1895 (so totally not post-Noether) and published using Fraktur. A few years ago one of my sons asked me to retype the words to some of the songs written in Fraktur in a German-American shape note book in Roman type (but still in German), which I did. This was for German teachers in the Concordia Language Villages to use with their students. I sometimes wonder if I am the last person on earth able to read Fraktur fluently.

    I learned mathematical logic from Joe Shoenfield from his dittoed notes that later became an excellent textbook. I rediscovered Craig’s Trick while working on problem he gave. That considerably strengthened my sense of self-worth.

    I accepted a job at Western Reserve University, now Case Western Reserve University, where I stayed until I retired in 1999. In the few years after 1965, I wrote several papers about finite fields. They are all summarized in the book Finite Fields, by Rudolf Lidl and Harald Niederreiter.

    I was almost immediately attracted to category theory and to computing science, both of which Carlitz hated. I did not let that stop me. (Now is the time to say, Follow The Beat of your Own Drum or some such cliché.)

    Early on, Paul Dedecker was at CWRU briefly, and from him I learned about sheaves, cribles and the like. This inspired me to take part in an algebraic geometry summer school at Bowdoin College, where I learned from lectures by David Mumford and by reading his Red Book when it was still red.

    Because one of the papers in finite fields showed that certain types of permutation polynomials formed wreath products of groups, I also pursued group theory, in particular by taking part in the finite group theory summer school at Bowdoin in 1970.

    During that time I pored over Beck’s thesis on cohomology, which with the group theory I had learned resulted in my paper Automorphisms of group extensions. That paper has the most citations of all my research papers.

    In the early days, I had several graduate students. All of them worked in group theory. One of them, Shair Ahmad, went on to produce several Ph.D. students, all in differential equations and dynamical systems.

    One thing I can brag about is that I never ever told him I hated differential equations or dynamical systems. In fact, I didn’t hate either one. There were people in the department in both fields and they made me jealous the way they could model real life phenomena with those tools. One relevant point about that is that I was a liberal arts math major from Oberlin before going to Duke and had had very few courses in any kind of science. This made me very different from most people in the department, who has B.S. undergrad degrees.

    In those days, John Isbell and Peter Hilton were in the math department at CWRU for awhile, which boosted my knowledge and interest in category theory. Hilton arranged for me to spend a year at the E.T.H. in Zürich, where I met Michael Barr. I eventually wrote two books on category theory with him. But that is getting away from Early Days, so I will stop here.

    Creative Commons License

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

    Send to Kindle

    Functions: Metaphors, Images and Representations

    Please read this post at abstractmath.org. I originally posted the document here but some of the diagrams would not render, and I haven’t been able to figure out why. Sorry for having to redirect.

    Send to Kindle

    Proofs using diagrams

    Introduction

    This post gives a proof of an easy theorem in category theory using the graph-based logic approach of Graph based logic and sketches, (GBLS) by Atish Bagchi and me.

    Formal logic is typically defined in terms of formulas and terms, defined recursively as strings of characters, together with rules of inference. GBLS proposes a new approach to logic where diagrams are used instead of strings of characters. The exposition here spells out the proof in more detail than GBLS does and uses various experimental ways of drawing diagrams using Mathematica.

    To follow this proof, you need to be familiar with basic category theory. Most special definitions that are needed are defined in this post where they are first used. Section 1 of GBLS also gives the definitions you need with more context.

    The theorem

    The Theorem to be proved (it is Theorem 8.3.1 of GBLS) says that, in any category, if the triangles in the diagram below commute, then the outside square commutes. This is easy using the associative law: If $xf=h$ and $kx=g$, then $kh=k(xf)=(kx)f=gf$.

    Subject Diagram

    So what?

    This theorem is not interesting. The point of this post is to present a new approach to proving such theorems, using diagrams instead of strings. The reason that exhibiting the dig
    rammatic proof is interesting is that many different kinds of categories have a FL cattheory, including these:

    Essentially algebraic string-based logic is described in detail in Partial Horn logic and cartesian categories, by E. Palmgren and Steven Vickers.

    Commercial

    My concept of form in A generalization of the concept of sketch generalizes sketches to all the categories that can be defined as models of FL cattheories. So the method of proof using diagrams can be applied to theorems about the objects defined by forms.

    Concepts needed for the graph-based proof

    To prove the theorem, I will make use of $\mathbf{ThCat}$, the FL cattheory for categories.

    • An FL category is a category with all finite limits.
    • GLBS uses the word cattheory for what Category theory for computing science and Toposes, triples and theories call the theory of a sketch.
    • In many books and articles, and in nLab, a “sketch” is what we call the cattheory (or the theory) of a sketch. For us, the sketch is a generating collection of objects, arrows, diagrams, cones and cocones for the cattheory. The category of models of the sketch and the cattheory are equivalent.
    • $\mathbf{ThCat}$ is a category with finite limits freely generated by certain designated objects, arrows, commutative diagrams and limit cones, listed below.
    • A model of $\mathbf{ThCat}$ in $\mathbf{Set}$ (the category of sets, whichever one you like) is an FL functor $\mathfrak{C}:\mathbf{ThCat}\to\mathbf{Set}.$
    • Such a model $\mathfrak{C}$ is a small category, and every small category is such a model. If this statement worries you, read Section 3.4 of GBLS.
    • Natural transformations between models are FL-preserving functors that preserve the structure on the nose.
    • The category of models of $\mathbf{ThCat}$ in $\mathbf{Set}$ is equivalent to the category of small categories and morphisms, which, unlike the category of models, includes functors that don’t preserve things on the nose.
    • $\mathbf{ThCat}$ is an example of the theory of an FL sketch. Chapter 4 of GBLS describes this idea in detail. The theory has the same models as the sketch.
    • The sketch generating $\mathbf{ThCat}$ is defined in detail in section 7.2 of GBLS.

    Some objects and arrows of $\mathbf{ThCat}$

    I will make use of the following objects and arrows that occur in $\mathbf{ThCat}.$ A formal thing is a construction in $\mathbf{ThCat}$ that becomes an actual thing in a model. So for example a model $\mathfrak{C}$ of $\mathbf{ThCat}$ in $\mathbf{Set}$ is an actual (small) category, and $\mathfrak{C}(\mathsf{ar_2})$ is the set of all composable pairs of arrows in the category $\mathfrak{C}$.

    • $\mathsf{ob}$, the formal set of objects.
    • $\mathsf{ar}$, the formal set of arrows.
    • $\mathsf{ar}_2$, the formal set of composable pairs of arrows.
    • $\mathsf{ar}_3$, the formal set of composable triples of arrows.
    • $\mathsf{unit} : \mathsf{ob}\to \mathsf{ar}$ that formally picks out the identity arrow of an object.
    • $\mathsf{dom},\mathsf{cod} : \mathsf{ar}\to \mathsf{ob}$ that formally pick out the domain and codomain of an arrow.
    • $\mathsf{comp} : \mathsf{ar}_2\to \mathsf{ar}$ that picks out the composite of a composable pair.
    • $\mathsf{lfac}, \mathsf{rfac} :\mathsf{ar}_2\to \mathsf{ar}$ that pick out the left and right factors in a composable pair.
    • $\mathsf{lfac}, \mathsf{mfac},\mathsf{rfac} :\mathsf{ar}_3 \to\mathsf{ar}$ that pick out the left, middle and right factors in a composable triple of arrows.
    • $\mathsf{lass}, \mathsf{rass} : \mathsf{ar}_3 \to \mathsf{ar}_2$: $\mathsf{lass}$ formally takes $\langle{h,g,f}\rangle$ to $\langle{hg,f}\rangle$ and $\mathsf{rass}$ takes it to $\langle{h,gf}\rangle$.

    $\mathsf{ob}$, $\mathsf{ar}$, $\mathsf{unit}$, $\mathsf{dom}$, $\mathsf{cod}$ and $\mathsf{comp}$ are given primitives and the others are defined as limits of finite diagrams composed of those objects. This is spelled out in Chapter 7.2 of GBLS. The definition of $\mathbf{ThCat}$ also requires certain diagrams to be commutative. They are all provided in GBLS; the one enforcing associativity is shown later in this post.

    Color coding

    I will use color coding to separate syntax from semantics.

    • Syntax consists of constructions in $\mathbf{ThCat}.$ The description will always be a commutative diagram in black, with annotations as explained later.
    • The limit of the description will be an object in $\mathbf{ThCat}$ (the form) whose value in a model $\mathfrak{C}$ will be shown in green, because being an element of the value of a model makes it semantics.
    • When a limit cone is defined, the projections (which are arrows in $\mathbf{ThCat}$) will be shown in blue.

    Descriptions

    In graph-based logic, a type of construction that can be made in a category has a description, which (in the case of our Theorem) is a finite diagram in $\mathbf{ThCat}$. The value of the limit of the description in a model $\mathfrak{C}$ is the set of all instances of that type of construction in $\mathfrak{C}$.

    The Subject Diagram

    • This diagram is the subject matter of the Theorem. It is not assumed to be commutative.
    • As in most diagrams in category theory texts, the labels in this diagram are variables, so the diagram is implicitly universally quantified. The Subject Diagram is a generic diagram of its shape.
    • “Any diagram of its shape” includes diagrams in which some of the nodes may represent the same object. An extreme example is the graph in which every node is an object $\mathsf{E}$ and every arrow is its identity arrow. The diagram below is nevertheless an example of the Subject Diagram:
    • Shapes of diagrams are defined properly in Section 2.3 of
      GBLS and in Section 4.1 of Category Theory for Computing Science.

    The description of the Subject Diagram

    Diagram SDD below shows the Subject Diagram as the limit of its description. The description is the black diagram.


    Diagram SDD

    Definition of $\mathsf{ar}_2$

    The object $\mathsf{ar}_2$ of composable pairs of arrows is defined as a pullback:

    In the usual categorical notation this would be shown as

    This makes use of the fact that the unnamed blue arrow is induced by the other two projection arrows. In the rest of the post, projection arrows that are induced are normally omitted.

    An enrichment of the description

    Because $\mathsf{ar}_2$ is defined as a pullback, we can enrich the description of Diagram SDD by adjoining two pullbacks as shown below. This is Diagram 8.10 in GBLS. The enriched diagram has the same limit as the description of Diagram SDD.

    Enriched Diagram SDD

    Note that the projections from the limit to the two occurrences of $\mathsf{ar}_2$ induce all the other projections. This follows by diagram chasing; remember that the description must be a commutative diagram.

    Make the triangles commute

    To make the triangles commute, we add two comp arrows to the enriched diagram as shown below. These two arrows are not induced by the description; they are therefore additions to the description — they describe a more restrictive (green) diagram with commutative triangles and so are shown in black.

    Diagram TC: The triangles commute

    The left comp makes $xf=h$ and the right comp makes $kx=g$.

    The outside square commutes

    Now we enrich Diagram TC with four objects, <comp,id>, <id,comp> and three comp arrows as shown in bolder black. These objects and arrows already exist in $\mathbf{ThCat}$ and therefore do not change the limit, which must be the same as the limit of Diagram TC.

    The outside square commutes

    The diagram in bold black is exactly the commutative diagram that requires associativity for these particular objects and arrows, which immediately implies that $gf=kh$, as the Theorem requires.

    By the definitions of $\mathsf{ar_2}$ and $\mathsf{ar_3}$, the part of the description in bold black induces the rest of the diagram. Omitting the rest of the diagram would make $\mathsf{ar_2}$ and $\mathsf{ar_3}$ modules in the sense of GBLS, Chapter 7.4. Modules would be vital to deal with proofs more complicated than the one given here.

    References

    Creative Commons License

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

    Send to Kindle

    Grothendieck

    In 1965, I received my Ph.D. with a dissertation about polynomials over finite fields. I accepted a job at Western Reserve U., now Case W.R.U. There I immediately began trying to learn category theory, which I thought was the best math thing since sliced bread. (My thesis adviser hated category theory. He also hated computer science, which also fascinated me.)

    I was really lucky, for during the next few years John Isbell, Peter Hilton and Paul Dedecker came to stay for a few years, allowing me to pick up a decent understanding of categories. Dedecker only stayed one year, but from him I learned about sheaves, cribles and the like, which inspired me to spend a summer at Bowdoin College learning about algebraic geometry from David Mumford and his Red Book.

    It was only gradually that I learned that many of the most interesting ideas came from Alexandre Grothendieck. His ideas are now everywhere in math and we should all be grateful for his life and work.  I wish he had continued working.  He could have done wonders for computing science.

     

    Send to Kindle

    Sets

    I have been working my way through abstractmath.org, revising the articles and turning them into pure HTML so they will be easier to update. In some cases I am making substantial revisions. In particular, many of the articles need a more modern point of view.

     

    The math community’s understanding of sets and structures has changed because of category theory and will change
    because of homotopy type theory.

     

    This post considers some issues and possibilities concerning the chapter on sets.

    The references listed at the end of the article include several about homotopy type theory. They provide different viewpoints and require different levels of sophistication.

    A specification of the concept of set

    The abmath article Specification of sets specifies what a set is in this way:

    A set is a single math object distinct from but completely determined by what its elements are.

    I have used this specification for sets since the eighties, first in my Discrete Math lecture notes and then in abstractmath.org. It has proved useful because it is quite simple and the statement implies lots of immediate consequences. Each of the first four consequences in this list below exposes a confusion that some students have.

    Consequences of the specification

    1. A set is a math object. It has the same status as the number “$143$” and the sine function and the real line: they are all objects of math. A set is not merely a typographically convenient way to define a certain collection of things.
    2. A set is a single object. Many beginners seem to have in their head that the set $\{3,4\}$ is two things.
    3. A set is distinct from its elements. The set $\{3,4\}$ is not $3$, it is not $4$, it is not a number at all.
    4. The spec implies that $\{3,4\}$ is the same set as $\{4,3\}$. Some students think they understand this but some of their mistakes show that they don’t really understand it.
    5. On the other hand, $\{3,5\}$ is a different set from $\{3,4\}$. I haven’t noticed this bothering students but it bothers me. See the discussion on ursets below.

    Those consequences make the spec a useful teaching tool. But if a beginning abstract math student gets very far in their studies, some complications come up.

    Defining “set”

    In the late nineteenth century, math people started formally defining particular math structures such as groups and various
    kinds of spaces. This was normally done by starting with a set and adding structure.

    You may think that “starting with a set and adding structure” brushes a lot of complications under the rug. Well, don’t look under the rug, at least not right now.

    The way they thought about sets was a informal version of what is now called naive set theory. In particular, they freely defined particular sets using what is essentially setbuilder notation, producing sets in a way which (I claim) satisfies my specification.

    Bertrand Russell wakes everyone up

    Then along came Russell’s paradox. In the context of this discussion, the paradox implied that the spec for sets is not a definition.The spec provides a set of necessary conditions for being a set. But it is not sufficient. You can say “Let $S$ be the set of all sets that…[satisfy some condition]” until you are blue in the face, but there are conditions (including the empty condition) that don’t define a set.

    The Zermelo-Fraenkel axioms

    The Zermelo-Fraenkel axioms were designed to provide a definition that didn’t create contradictions. The axioms accomplish this by creating a sort of hierarchy that requires that each set must be defined in terms of sets defined previously. They provide a good way (but not the only one) of providing a way of legitimizing our use of sets in math.

    Observe that the “set of all sets” is certainly not “defined” in terms of previously defined sets!

    Sets as a foundation

    During those days there was a movement to provide a solid foundation for mathematics. After Zermelo-Fraenkel came along, the progress of thinking seemed to be:

    1. Sets are in trouble.
    2. Zermelo-Fraenkel solves our set difficulties.
    3. So let’s require that every math object be a set.

    That list is oversimplified. In particular, the development of predicate logic was essential to this approach, but I can’t write about everything at once.

    This leads to monsters such as the notorious definition of ordered pair:

    The ordered pair $(a,b)$ is the set $\{a,\{b\}\}$.

    This leads to the ludicrous statement that $a$ is an element of $(a,b)$ but that $b$ is not.

    By saying every math object may be modeled as a set with structure, ZF set theory becomes a model of all of math. This approach gives a useful proof that all of math is as consistent as ZF set theory is.

    But many mathematicians jumped to the conclusion that every math object must be a set with structure. This approach does not match the way mathematicians think about math objects. In particular, it makes computerized proof assistance hard to use because you have to translate your thinking into sets and first order logic.

    Sets by category theory

    “A mathematical object is determined by the role it plays in a category.” — A. Grothendieck

    In category theory, you define math structures in terms of how they relate to other math structures. This shifts the emphasis from

    What is it?

    to

    What are its properties?

    For example, an ordered pair is a mathematical object $p$ determined by these properties:

    • It determines mathematical objects $p_1$ and $p_2$.
    • $p$ is completely determined by what $p_1$ is and what $p_2$ is.
    • If $p$ and $q$ are ordered pairs and $p_1=q_1$ and $p_2=q_2$ then $p=q$.

    Categorical definition of set

    “Categorical” here means “as understood in category theory”. It unfortunately has a very different meaning in model theory (set of axioms with only one model up to isomorphism) and in general usage, as in “My answer is categorically NO” said by someone who is red in the face. The word “categorial” has an entirely different meaning in linguistics. *Sigh*.

    William Lawvere has produced an axiomatization of the category of sets.
    The most accessible introduction to it that I know of is the article Rethinking set theory, by Tom Leinster. This axiomatization defines sets by their relationship with each other and other math objects in much the same way as the categorical definition of (for example) groups gives a definition of groups that works in any category.

    “Set” means two different things

    The word set as used informally has two different meanings.

    • According to my specification of sets, $\{3,4\}$ is a set and so is $\{3,5\}$.
    • $\{3,4\}$ and $\{3,5\}$ are not the same set because they don’t have the same elements.
    • But in the category of sets, any two $2$-element sets are isomorphic. (So are any two seven element sets.)
    • From a categorical point of view, two isomorphic objects in a category can be be thought of as the same object, with a caveat that you have better make it clear which isomorphism you are thinking of.

    One of the great improvements in mathematics that homotopy type theory supplies is a systematic way of keeping track of the isomorphisms, the isomorphisms between the isomorphisms, and so on ad infinitum (literally). But note: I am just beginning to understand htt, so regard this remark as something to be suspicious of.

    • But $\{3,4\}$ and $\{3,5\}$ may not be thought of as the same object according to the spec I gave, because they don’t have the same elements.
    • This means that the traditional idea of set is not the same as the strict categorical idea of set.

    I suggest that we keep the word “set” for the traditional concept and call the strict categorical concept an urset.

    A traditional set is a structure on an urset

    The traditional set $\{3,5\}$ consists of the unique two-element urset coindexed on the integers.

    A (ur)set $S$ coindexed by a math structure $A$ is a monic map from $S$ to the underlying set of $A$. In this example, the map has codomain the integers and takes one element of the two-element urset to $3$ and the other to $5$.

    Note added 2014-10-05 in response to Toby Bartels’ comment: I am inclined to use the names “abstract set” for “urset” and “concrete set” for coindexed sets when I revise the articles on sets. But most of the time we can get away with just “set”.

    There is clearly no isomorphism of coindexed sets from $\{3,4\}$ to $\{3,5\}$, so those two traditional sets are not equal in the category of coindexed sets.

    I made up the phrase “coindexed set” to use in this sense, since it is a kind of opposite of indexed set. If terminology for this already exists, lemme know. Linguists will tell you they use the word “coindexed” in a different sense.

    Elements

    The concept of “element” in categorical thinking is very different from the traditional idea, where an element of a set can be any mathematical object. In categorical thinking, an element of an object $A$ of a category $\mathbf{C}$ is an arrow $1\to A$ where $1$ is the terminal object. Thus $4$ as an integer is the arrow $1\to \mathbb{Z}$ whose unique value is the number $4$.

    An object is an element of only one set

    In the usage of category theory, the arrow $1\to\mathbb{R}$ whose value is the real number $4$ is a different math object from the arrow $1\to\mathbb{Z}$ whose value is the integer $4$.

    A category theorist will probably agree that we can identify the integer $4$ with the real number $4$ via the well known canonical embedding of the ring of integers into the field of real numbers. But in categorical thinking you have to keep all such embeddings in mind; you don’t say the integer $4$ is the same thing as the real number $4$. (Most computer languages keep them distinct, too.)

    This difference is actually not hard to get used to and is in fact an improvement over traditional set theory. When you do category theory you use lots of commutative diagrams. The embeddings show up as monic arrows and are essential in keeping the different objects ($\mathbb{Z}$ and $\mathbb{R}$ in the example) separate.

    The paper Relating first-order set theory and elementary toposes, by Awodey, Butz, Simpson and Streicher, introduces a concept of “structural system of inclusions” that appears to me to restore the idea of object being an element of more than one set for many purposes.

    Homotopy type theory allows an object to have only one type, with much the same effect as in the categorical approach.

    Variable elements

    The arrow $1\to \mathbb{Z}$ that picks out the integer $4$ is a constant function. It is useful to think of any arrow $A\to B$ of any category as a variable element (or generalized element) of the object $B$. For example, the function $f:\mathbb{R}\to \mathbb{R}$ defined by $f(x)=x^2$ allows you to
    think of $x^2$ as a variable number with real parameter. This is another way of thinking about the “$y$” in the equation $y=x^2$, which is commonly called a dependent variable.

    One way to think about $y$ is that some statements about it are true, some are false, and many statements are neither true nor false.

    • $y\geq 0$ is true.
    • $y\lt0$ is false.
    • $y\leq1$ is neither true nor false.

    This way of thinking about variable objects clears up a lot of confusion about variables and deserves to be more widely used in teaching.

    The book Category theory for computing science provides some examples of the use of variable elements as a way of thinking about categorical ideas.

    References

    Creative Commons License< ![endif]>
    

    This work is licensed under a Creative
    Commons Attribution-ShareAlike 2.5
    License
    .

    Send to Kindle

    The only axiom of algebra

    This is one of a series of posts I am writing to help me develop my thoughts about how particular topics in my book Abstracting Algebra (“AbAl“) should be organized. This post concerns the relation between substitution and evaluation that essentially constitutes the definition of algebra. The Mathematica code for the diagrams is in Subs Eval.nb.

    Substitution and evaluation

    This post depends heavily on your understanding of the ideas in the post Presenting binary operations as trees.

    Notation for evaluation

    I have been denoting evaluation of an expression represented as a tree like this:



    In standard algebra notation this would be written:\[(6-4)-1=2-1=1\]

    Comments

    This treatment of evaluation is intended to give you an intuition about evaluation that is divorced from the usual one-dimensional (well, nearly) notation of standard algebra. So it is sloppy. It omits fine points that will have to be included in AbAl.

    • The evaluation goes from bottom up until it reaches a single value.
    • If you reach an expression with an empty box, evaluation stops. Thus $(6-3)-a$ evaluates only to $3-a$.
    • $(6-a)-1$ doesn’t evaluate further at all, although you can use properties peculiar to “minus” to change it to $5-a$.
    • I used the boxed “1” to show that the value is represented as a trivial tree, not a number. That’s so it can be substituted into another tree.

    Notation for substitution

    I will use a configuration like this

    to indicate the data needed to substitute the lower tree into the upper one at the variable (blank box). The result of the substitution is the tree

    In standard algebra one would say, “Substitute $3\times 4$ for $a$ in the expression $a+5$.” Note that in doing this you have to name the variable.

    Example

    “If you substitute $12$ for $a$ in $a+5$ you get $12+5$”:

    results in

    Example

    “If you substitute $3\times 4$ for $a$ in $a+b$ you get $3\times4+b$”:

    results in

    Comments

    Like evaluation, this treatment of substitution omits details that will have to be included in AbAl.

    • You can also substitute on the right side.
    • Substitution in standard algebraic notation often requires sudden syntactic changes because the standard notation is essentially two-dimensional. Example: “If you substitute $3+ 4$ for $a$ in $a\times b$ you get $(3+4)\times b$”.
    • The allowed renaming of free variables except when there is a clash causes students much trouble. This has to be illustrated and contrasted with the “binop is tree” treatment which is context-free. Example: The variable $b$ in the expression $(3\times 4)+b$ by itself could be changed to $a$ or $c$, but in the sentence “If you substitute $3+ 4$ for $a$ in $a\times b$ you get $(3+4)\times b$”, the $b$ is bound. It is going to be difficult to decide how much of this needs explaining.

    The axiom

    The Axiom for Algebra says that the operations of substitution and evaluation commute: if you apply them in either order, you get the same resulting tree. That says that for the current example, this diagram commutes:

    The Only Axiom for Algebra

    In standard algebra notation, this becomes:

    • Substitute, then evaluate: If $a=3\times 4$, then $a+5=3\times 4+5=12+5$.
    • Evaluate, then substitute: If $a=3\times 4$, then $a=12$, so $a+5=12+5$.

    Well, how underwhelming. In ordinary algebra notation my so-called Only Axiom amounts to a mere rewording. But that’s the point:


    The Only Axiom of Algebra is what makes algebraic manipulation work.

    Miscellaneous comments

    • In functional notation, the Only Axiom says precisely that $\text{eval}∘\text{subst}=\text{subst}∘(\text{eval},\text{id})$.
    • The Only Axiom has a symmetric form: $\text{eval}∘\text{subst}=\text{subst}∘(\text{id},\text{eval})$ for the right branch.
    • You may expostulate: “What about associativity and commutativity. They are axioms of algebra.” But they are axioms of particular parts of algebra. That’s why I include examples using operations such as subtraction. The Only Axiom is the (ahem) only one that applies to all algebraic expressions.
    • You may further expostulate: Using monads requires the unitary or oneidentity axiom. Here that means that a binary operation $\Delta$ can be applied to one element $a$, and the result is $a$. My post Monads for high school III. shows how it is used for associative operations. The unitary axiom is necessary for representing arbitrary binary operations as a monad, which is a useful way to give a theoretical treatment of algebra. I don’t know if anyone has investigated monads-without-the-unitary-axiom. It sounds icky.
    • The Only Axiom applies to things such as single valued functions, which are unary operations, and ternary and higher operations. They also apply to algebraic expressions involving many different operations of different arities. In that sense, my presentation of the Only Axiom only gives a special case.
    • In the case of unary operations, evaluation is what we usually call evaluation. If you think about sets the way I do (as a special kind of category), evaluation is the same as composition. See “Rethinking Set Theory”, by Tom Leinster, American Mathematical Monthly, May, 2014.
    • Calculus functions such as sine and the exponential are unary operations. But not all of calculus is algebra, because substitution in the differential and integral operators is context-sensitive.

    References

    Preceding posts in this series

    Remarks concerning these posts
    • Each of the posts in this series discusses how I will present a small part of AbAl.
    • The wording of some parts of the posts may look like a first draft, and such wording may indeed appear in the text.
    • In many places I will talk about how I should present the topic, since I am not certain about it.

    Other references

    Creative Commons License

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.


    Send to Kindle

    Presenting binops as trees

    Binary operations as trees

    This is one of a series of posts I am writing to help me develop my thoughts about how particular topics in my book Abstracting Algebra (“AbAl“) should be organized. In some parts, I present various options that I have not decided between.

    This post concerns the presen­ta­tion of binary operations as trees. The Mathematica code for the diagrams is in Substitution in algebra.nb

    Binary operations as functions

    A binary operation or binop $\Delta$ is a function of two variables whose value at $(a,b)$ is traditionally denoted by $a\Delta b$. Most commonly, the function is restricted to having inputs and outputs in the same set. In other words, a binary operation is a function $\Delta:S\times S\to S$ defined on some set $S$. $S$ is the underlying set of the operation. For now, this will be the definition, although binops may be generalized to multiple sets later in the book.

    In AbAl:

    • Binops will be defined as functions in the way just described.
    • Algebraic expressions will be represented
      as trees, which exhibit more clearly the structure of the expressions that is encoded in algebraic notation.
    • They will also be represented using the usual infix expressions such as “$3\times 5$” and “$3-5$”,

    Fine points

    The definition of a binop as a function has termi­no­logical consequences. The correct point of view concerning a function is that it determines its domain and its codomain. In particular:


    A binary operation determines its underlying set.

    Thus if we talk about an arbitrary binop $\Delta$, we don’t have to give a name to its underlying set. We can just say “the underlying set of $\Delta$” or “$U(\Delta)$”.

    Examples

    “$+$” is not one binary operation.

    • $+:\mathbb{Z}\times\mathbb{Z}\to\mathbb{Z}$ is a binary operation.
    • $+:\mathbb{R}\times\mathbb{R}\to\mathbb{R}$ is another binary operation.

    Mathematicians commonly refer to these particular binops as “addition on the integers” and “addition on the reals”.

    Remark

    You almost never see this attitude in textbooks on algebra. It is required by both category theory and type theory, two Waves flooding into math. Category theory is a middle-aged Wave and type theory, in the version of homo­topy type theory, is a brand new baby Wave. Both Waves have changed and will change our under­standing of math in deep ways.

    Trees

    An arbitrary binop $\Delta$ can be represented as a binary tree in this way:

    generic binop

    This tree represents the expression that in standard algebraic notation is “$a\Delta b$”.

    In more detail, the tree is an ordered rooted binary tree. The “ordered” part means that the leaves (nodes with no descendants) are in a specific left to right order. In AbAl, I will define trees in some detail, with lots of pictures.

    The root shows the operation and the two leaves show elements of the underlying set. I follow the custom in computing science to put the root at the top.

    Metaphors should not dictate your life by being taken literally.

    Remark

    The Wikipedia treatment of trees is scat­tered over many articles and they almost always describe things mostly in words, not pictures. Describing math objects in words when you could use pictures is against my religion. Describing is not the same as defining, which usually requires words.

    Some concrete examples:



        
        

    3trees

    These are represen­ta­tions of the expressions “$3+5$”, “$3\times5$”, and “$3-5$”.

    Just as “$5+3$” is a different expression from “$3+5$”, the left tree in 3trees above is a different expression from this one:



        

    switch

    They have the same value, but they are distinct as expressions — otherwise, how could you state the commutative law?

    Fine points

    I regard an expression as an abstract math object that can have many repre­sentations. For example “$3+5$” and the left tree in 3trees are two different represen­ta­tions of the same (abstract) expression. This deviates from the usual idea that “expression” refers to a typographical construction.

    In previous posts, when the operation is not commutative, I have sometimes labeled the legs like this:


    I have thought about using this notation consistently in AbAl, but I suspect it would be awkward in places.

    Evaluation and substitution


    The two basic operations on algebraic expressions
    are evaluation and substitution.

    They and the Only Axiom of Algebra, which I will discuss in a later post, are all that is needed to express the true nature of algebra.

    Evaluation

    • If you evaluate $3+5$ you get $8$.
    • If you evaluate $3\times 5$ you get $15$.
    • If you evaluate $3-5$ you get $-2$.

    I will show evaluation on trees like this:




    Evaluation with trace

    A more elaborate version, valuation with trace, would look like this. This allows you to keep track of where the valuations come from.




    You could also keep track of the operation used at each node. An interactive illustration of this is in the post Visible algebra I supplement. That illustration requires CDF Player to be installed on your computer. You can get it free from the Mathematica website.

    Variables

    In the tree above, the $a$ and $b$ are variables, just as they are in the equivalent expression $a\Delta b$. Algebra beginners have a hard time understanding variables.

    • You can’t evaluate an expression until you substitute numbers for the letters, which produces an instance of expression. (“Instance” is the preferable name for this, but I often refer to such a thing as an “example”.)
    • If a variable is repeated you have to substitute the same value for each occurrence. So $a\Delta b$ is a different expression from $a\Delta a$: $2+3$ is an instance of $a+b$ but it is not an instance of $a+a$. But $a\Delta a$ and $b\Delta b$ are the same expression: any instance of one is an instance of the other.
    • Substitute $a\Delta b$ for $a$ in $a\Delta b$ and you get $(a\Delta b)\Delta b$. You may have committed variable clash. You might have meant $(a\Delta b)\Delta c$. (Somebody please tell me a good link that describes variable clash.)
    • Later, you will deal with multiplication tables for algebraic structures. There the elements are denoted by letters of the alphabet. They can’t be substituted for.

    Empty boxes

    A straightforward way to denote variables would be to use empty boxes:

    The idea is that a number (element of the underlying set) can be inserted in each box. If $3$ (left) and $5$ (right) are placed in the boxes, evaluation would place the value of $3\Delta5$ in the root. Each empty box represents a separate variable.

    Empty boxes could also be used in the standard algebraic notation: $\Delta$ or $+$ or $-$.
    I have seen that notation in texts explaining variables, but I don’t know a reference. I expect to use this notation with trees in AbAl.

    To achieve the effect of one variable in two different places, as in

    we can cause it to repeat, as below, where “$\text{id}$” denotes the identity function on the underlying set:

    To evaluate at a number (member of the underlying set) you insert a number into the only empty box

    which evaluates to

    which of course evaluates to $3\Delta3$.

    This way of treating repeated variables exhibits the nature of repeated variables explicitly and naturally, putting the values automatically in the correct places. This process, like everything in this section, comes from monad theory. It also reminds me of linear logic in that it shows that if you want to use a value more than once you have to copy it.

    Substitution

    Given two binary trees



          

    you could attach the root of the first one to one of the leaves of the second one, in two different ways, to get these trees:



          


    2trees

    which in standard algebra notation would be written $(a-b)-c)$ and $a-(b-c)$ respectively. Note that this tree



    would be represented in algebra as $(a-b)-b$.

    In general, substituting a tree for an input (variable or empty box) consists of replacing the empty box by the whole tree, identifying the root of the new tree with the empty box. In graph theorem, “substitution” may be called “grafting”, which is a good metaphor.

    You can evaluate the left tree in 2trees at particular numbers to evaluate it in two stages:



    Of course, evaluating the right one at the same values would give you a different answer, since subtraction is not associative. Here is another example:


    Binary trees in general

    By repeated substitution, you can create general binary trees built up of individual trees of this form:

    In AbAl I will give examples of such things and their counterparts in algebraic notation. This will include binary trees involving more than one binop, as well. I showed an example in the previous post, which example I repeat here:

    It represents the precise unsimplified expression

    \[A=wh+\frac{1}{2}\left(\pi(\frac{1}{2}w)^2\right)\]

    Some of the operations in that tree are associative and commutative, which is why the expression can be simplified. The collection of all (finite) binary trees built out of a single binop with no assumption that it satisfies laws (associative, commutative and so on) is the free algebra on that binary operation. It is the mother of all binary operations, so it plays the same role for an arbitrary binop that the set of lists plays for associative operations, as described in Monads for High School III: Algebras. All this will be covered in later chapters of AbAl.

    References

    Preceding posts in this series

    Other references

    Creative Commons License

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.


    Send to Kindle

    Presenting binary operations

    This is the first of a set of notes I am writing to help me develop my thoughts about how particular topics in my book Abstracting algebra should be organized. This article describes my plan for the book in some detail. The present post has some thoughts about presenting binary operations.

    Before binary operations are introduced

    Traditionally, an abstract algebra book assumes that the student is familiar with high school algebra and will then proceed with an observation that such operations as $+$ and $\times$ can be thought of as functions of two variables that take a number to another number. So the first abstract idea is typically the concept of binary operation, although in another post I will consider whether that really should be the first abstract concept.

    The Abstracting Algebra book will have a chapter that presents concrete examples of algebraic operations and expressions on numbers as in elementary school and as in high school algebra. This section of the post outlines what should be presented there. Each subsection needs to be expanded with lots of examples.

    In elementary school

    In elementary school you see expressions such as

    • $3+4$
    • $3\times 4$
    • $3-4$

    The student invariably thinks of these expressions as commands to calculate the value given by the expression.

    They will also see expressions such as
    \[\begin{equation}
    \begin{array}[b]{r}
    23\\
    355\\
    + 96\\
    \hline
    \end{array}
    \end{equation}\]
    which they will take as a command to calculate the sum of the whole list:
    \[\begin{equation}
    \begin{array}[b]{r}
    23\\
    355\\
    + 96\\
    \hline
    474
    \end{array}
    \end{equation}\]

    That uses the fact that addition is associative, and the format suggests using the standard school algorithm for adding up lists. You don’t usually see the same format with more than two numbers for multiplication, even though it is associative as well. In some elementary schools in recent years students are learning other ways of doing arithmetic and in particular are encouraged to figure out short cuts for problems that allow them. But the context is always “do it”, not “this represents a number”.

    Algebra

    In algebra you start using letters for numbers. In algebra, “$a\times b$” and “$a+b$” are expressions in the symbolic language of math, which means they are like noun phrases in English such as “My friend” and “The car I bought last week and immediately totaled” in that both are used semantically as names of objects. English and the symbolic language are both languages, but the symbolic language is not a natural language, nor is it a formal language.

    Example

    In beginning algebra, we say “$3+5=8$”, which is a (true) statement.

    Basic facts about this equation:

    The expressions “$3+5$” and “$8$”

    • are not the same expression
    • but in the standard semantics of algebra they have the same meaning
    • and therefore the equation communicates information that neither “$3+5$” nor “$8$” communicate.

    Another example is “$3+5=6+2$”.

    Facts like this example need to be communicated explicitly before binary operations are introduced formally. The students in a college abstract algebra class probably know the meaning of an equation operationally (subconsciously) but they have never seen it made explicit. See Algebra is a difficult foreign language.

    Note

    The equation “$3+5=6+2$” is an expression just as much as “$3+5$” and “$6+2$” are. It denotes an object of type “equation”, which is a mathematical object in the same way as numbers are. Most mathematicians do not talk this way, but they should.

    Binary operations

    Early examples

    Consciousness-expanding examples should appear early and often after binary operations are introduced.

    Common operations

    • The GCD is a binary operation on the natural numbers. This disturbs some students because it is not written in infix form. It is associative. The GCD can be defined conceptually, but for computation purposes needs (Euclid’s) algorithm. This gives you an early example of conceptual definitions and algorithms.
    • The maximum function is another example of this sort. This is a good place to point out that a binary operation with the “same” definition cen be defined on different sets. The max function on the natural numbers does not have quite the same conceptual definition as the max on the integers.

    Extensional definitions

    In order to emphasize the arbitrariness of definitions, some random operations on a small finite sets should be given by a multiplication table, on sets of numbers and sets represented by letters of the alphabet. This will elicit the common reaction, “What operation is it?” Hidden behind this question is the fact that you are giving an extensional definition instead of a formula — an algorithm or a combination of familiar operations.

    Properties

    The associative and commutative properties should be introduced early just for consciousness-raising. Subtraction is not associative or commutative. Rock paper scissors is commutative but not associative. Groups of symmetries are associative but not commutative.

    Binary operation as function

    The first definition of binary operation should be as a function. For example, “$+$” is a function that takes pairs of numbers to numbers. In other words, $+:\mathbb{Z}\times\mathbb{Z}\to\mathbb{Z}$ is a function.

    We then abstract from that example and others like it from specific operations to arbitrary functions $\Delta:S\times S\to S$ for arbitrary sets $S$.

    This is abstraction twice.

    • First we replace the example operations by an arbitrary operation. such as multiplication, subtraction, GCD and MAX on $\mathbb{Z}$, or something complicated such as \[(x,y)\mapsto 3(xy-1)^2(x^2+xy^3)^3\].
    • Then we replace sets of numbers by arbitrary sets. An example would be the random multiplication on the set $\{1,2,5\}$ given by the table
      \[
      \begin{array}{c|ccc}
      \Delta& 1&2&5\\
      \hline
      1&2&2&1\\
      2&5&2&1\\
      5&2&1&5
      \end{array}
      \]
      This defines a function $\Delta:\{1,2,5\}\times\{1,2,5\}\to\{1,2,5\}$ for which for example $\Delta(2,1)=5$, or $2\Delta 1=5$. This example uses numbers as elements of the set and is good for eliciting the “What operation is it?” question.
    • I will use examples where the elements are letters of the alphabet, as well. That sort of example makes the students think the letters are variables they can substitute for, another confusion to be banished by the wise professor who know the right thing to say to make it clear. (Don’t ask me; I taught algebra for 35 years and I still don’t know the right thing to say.)

    It is important to define prefix notation and infix notation right away and to use both of them in examples.

    Other representations of binary operations.

    The main way of representing binary operations in Abstracting Algebra will be as trees, which I will cover in later posts. Those posts will be much more interesting than this one.

    Binary operations in high school and college algebra

    • Some binops are represented in infix notation: “$a+b$”, “$a-b$”, and “$a\times b$”.
    • “$a\times b$” is usually written “$ab$” for letters and with the “$\times$” symbol for numbers.
    • Some binops have idiosyncratic representation: “$a^b$”, “${a}\choose{b}$”.
    • A lot of binops such as GCD and MAX are given as functions of two variables (prefix notation) and their status as binary operations usually goes unmentioned. (That is not necessarily wrong.)
    • The symbol “$(a,b)$” is used to denote the GCD (a binop) and is also used to denote a point in the plane or an open interval, both of which are not strictly binops. They are binary operations in a multisorted algebra (a concept I expect to introduce later in the book.)
    • Some apparent binops are in infix notation but have flaws: In “$a/b$”, the second entry can’t be $0$, and the expression when $a$ and $b$ are integers is often treated as having good forms ($3/4$) and bad forms ($6/8$).

    Trees

    The chaotic nature of algebraic notation I just described is a stumbling block, but not the primary reason high school algebra is a stumbling block for many students. The big reason it is hard is that the notation requires students to create and hold complicated abstract structures in their head.

    Example

    This example is a teaser for future posts on using trees to represent binary operations. The tree below shows much more of the structure of a calculation of the area of a rectangle surmounted by a semicircle than the expression

    \[A=wh+\frac{1}{2}\left(\pi(\frac{1}{2}w)^2\right)\]
    does.

    The tree explicitly embodies the thought process that leads to the formula:

    • You need to add the area of the rectangle and the area of the semicircle.
    • The area of the rectangle is width times height.
    • The area of the semicircle is $\frac{1}{2}(\pi r^2)$.
    • In this case, $r=\frac{1}{2}w$.

    Any mathematician will extract the same abstract structure from the formula\[A=wh+\frac{1}{2}\left(\pi(\frac{1}{2}w)^2\right)\] This is difficult for students beginning algebra.

    References

    Creative Commons License

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.


    Send to Kindle