This post has been replaced by the post A slow introduction to category theory.
Proofs using diagrams
Introduction
This post gives a proof of an easy theorem in category theory using the graph-based logic approach of Graph based logic and sketches (GBLS), by Atish Bagchi and me.
Formal logic is typically defined in terms of formulas and terms, defined recursively as strings of characters, together with rules of inference. GBLS proposes a new approach to logic where diagrams are used instead of strings of characters. The exposition here spells out the proof in more detail than GBLS does and uses various experimental ways of drawing diagrams using Mathematica.
To follow this proof, you need to be familiar with basic category theory. Most of the specialized concepts that are needed are defined in this post where they are first used. Section 1 of GBLS also gives the definitions you need with more context.
The theorem
The Theorem to be proved (it is Theorem 8.3.1 of GBLS) says that, in any category, if the triangles in the diagram below commute, then the outside square commutes. This is easy using the associative law: If $xf=h$ and $kx=g$, then $kh=k(xf)=(kx)f=gf$.
Subject Diagram
So what?
This theorem is not interesting. The point of this post is to present a new approach to proving such theorems, using diagrams instead of strings. The reason that exhibiting the diagrammatic proof is interesting is that many different kinds of categories have an FL cattheory, including these:
 Categories with all finite limits, also called FL categories, left exact categories, cartesian categories, or essentially algebraic theories. Note that what I am saying here is that the category of all categories with finite limits is the category of models of a particular category (a cattheory) with finite limits. It is easy to get syntax and semantics (theory and models) confused!
 Finitely complete and cocomplete categories.
 Cartesian closed categories.
 Locally cartesian closed categories.
 Toposes.
Essentially algebraic string-based logic is described in detail in Partial Horn logic and cartesian categories, by E. Palmgren and Steven Vickers.
Commercial
My concept of form in A generalization of the concept of sketch generalizes sketches to all the categories that can be defined as models of FL cattheories. So the method of proof using diagrams can be applied to theorems about the objects defined by forms.
Concepts needed for the graph-based proof
To prove the theorem, I will make use of $\mathbf{ThCat}$, the FL cattheory for categories.
 An FL category is a category with all finite limits.
 GBLS uses the word cattheory for what Category theory for computing science and Toposes, triples and theories call the theory of a sketch.
 In many books and articles, and in nLab, a “sketch” is what we call the cattheory (or the theory) of a sketch. For us, the sketch is a generating collection of objects, arrows, diagrams, cones and cocones for the cattheory. The category of models of the sketch and the cattheory are equivalent.
 $\mathbf{ThCat}$ is a category with finite limits freely generated by certain designated objects, arrows, commutative diagrams and limit cones, listed below.
 A model of $\mathbf{ThCat}$ in $\mathbf{Set}$ (the category of sets, whichever one you like) is an FL functor $\mathfrak{C}:\mathbf{ThCat}\to\mathbf{Set}.$
 Such a model $\mathfrak{C}$ is a small category, and every small category is such a model. If this statement worries you, read Section 3.4 of GBLS.
 Natural transformations between models are FL-preserving functors that preserve the structure on the nose.
 The category of models of $\mathbf{ThCat}$ in $\mathbf{Set}$ is equivalent to the category of small categories and morphisms, which, unlike the category of models, includes functors that don’t preserve things on the nose.
 $\mathbf{ThCat}$ is an example of the theory of an FL sketch. Chapter 4 of GBLS describes this idea in detail. The theory has the same models as the sketch.
 The sketch generating $\mathbf{ThCat}$ is defined in detail in section 7.2 of GBLS.
Some objects and arrows of $\mathbf{ThCat}$
I will make use of the following objects and arrows that occur in $\mathbf{ThCat}.$ A formal thing is a construction in $\mathbf{ThCat}$ that becomes an actual thing in a model. So for example a model $\mathfrak{C}$ of $\mathbf{ThCat}$ in $\mathbf{Set}$ is an actual (small) category, and $\mathfrak{C}(\mathsf{ar_2})$ is the set of all composable pairs of arrows in the category $\mathfrak{C}$.
 $\mathsf{ob}$, the formal set of objects.
 $\mathsf{ar}$, the formal set of arrows.
 $\mathsf{ar}_2$, the formal set of composable pairs of arrows.
 $\mathsf{ar}_3$, the formal set of composable triples of arrows.
 $\mathsf{unit} : \mathsf{ob}\to \mathsf{ar}$ that formally picks out the identity arrow of an object.
 $\mathsf{dom},\mathsf{cod} : \mathsf{ar}\to \mathsf{ob}$ that formally pick out the domain and codomain of an arrow.
 $\mathsf{comp} : \mathsf{ar}_2\to \mathsf{ar}$ that picks out the composite of a composable pair.
 $\mathsf{lfac}, \mathsf{rfac} :\mathsf{ar}_2\to \mathsf{ar}$ that pick out the left and right factors in a composable pair.
 $\mathsf{lfac}, \mathsf{mfac},\mathsf{rfac} :\mathsf{ar}_3 \to\mathsf{ar}$ that pick out the left, middle and right factors in a composable triple of arrows.
 $\mathsf{lass}, \mathsf{rass} : \mathsf{ar}_3 \to \mathsf{ar}_2$: $\mathsf{lass}$ formally takes $\langle{h,g,f}\rangle$ to $\langle{hg,f}\rangle$ and $\mathsf{rass}$ takes it to $\langle{h,gf}\rangle$.
$\mathsf{ob}$, $\mathsf{ar}$, $\mathsf{unit}$, $\mathsf{dom}$, $\mathsf{cod}$ and $\mathsf{comp}$ are given primitives, and the others are defined as limits of finite diagrams built from those objects and arrows. This is spelled out in Section 7.2 of GBLS. The definition of $\mathbf{ThCat}$ also requires certain diagrams to be commutative. They are all provided in GBLS; the one enforcing associativity is shown later in this post.
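To see what these formal objects become in a model, here is a minimal Python sketch of one model $\mathfrak{C}$ in $\mathbf{Set}$. The choice of category is my own illustrative assumption, not something taken from GBLS. Its objects are the sets $\{0\}$ and $\{0,1\}$, encoded by their sizes 1 and 2, and its arrows are all functions between them. The code computes the values of $\mathsf{ar}_2$ and $\mathsf{ar}_3$ as sets of composable tuples. It then checks that $\mathsf{comp}\circ\mathsf{lass}$ and $\mathsf{comp}\circ\mathsf{rass}$ agree, which is associativity expressed with the arrows listed above. The Python names only mirror the formal names.

```python
from itertools import product

# Illustrative model (an assumption): objects 1 and 2 stand for the sets {0}
# and {0, 1}; an arrow is a triple (dom, cod, table) with table[i] the image of i.
OB = [1, 2]

def all_arrows(obs):
    """Every function between the chosen finite sets."""
    return [(d, c, table)
            for d in obs for c in obs
            for table in product(range(c), repeat=d)]

AR = all_arrows(OB)

def dom(a):
    return a[0]

def cod(a):
    return a[1]

def unit(n):
    """Identity arrow on the object n."""
    return (n, n, tuple(range(n)))

def comp(pair):
    """Composite gf of a composable pair <g, f>: apply f first, then g."""
    g, f = pair
    assert dom(g) == cod(f)
    return (dom(f), cod(g), tuple(g[2][f[2][i]] for i in range(dom(f))))

# Values of ar_2 and ar_3: composable pairs <g, f> and triples <h, g, f>.
AR2 = [(g, f) for g in AR for f in AR if dom(g) == cod(f)]
AR3 = [(h, g, f) for h in AR for g in AR for f in AR
       if dom(h) == cod(g) and dom(g) == cod(f)]

def lass(t):
    h, g, f = t
    return (comp((h, g)), f)   # <h, g, f>  |->  <hg, f>

def rass(t):
    h, g, f = t
    return (h, comp((g, f)))   # <h, g, f>  |->  <h, gf>

# Associativity: comp . lass and comp . rass agree on every composable triple.
assert all(comp(lass(t)) == comp(rass(t)) for t in AR3)
# The unit laws hold as well.
assert all(comp((unit(cod(a)), a)) == a == comp((a, unit(dom(a)))) for a in AR)
print(len(AR), len(AR2), len(AR3))  # 8 36 164 for this particular category
```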
Color coding
I will use color coding to separate syntax from semantics.
 Syntax consists of constructions in $\mathbf{ThCat}.$ The description will always be a commutative diagram in black, with annotations as explained later.
 The limit of the description will be an object in $\mathbf{ThCat}$ (the form) whose value in a model $\mathfrak{C}$ will be shown in green, because being an element of the value of a model makes it semantics.
 When a limit cone is defined, the projections (which are arrows in $\mathbf{ThCat}$) will be shown in blue.
Descriptions
In graph-based logic, a type of construction that can be made in a category has a description, which (in the case of our Theorem) is a finite diagram in $\mathbf{ThCat}$. The value of the limit of the description in a model $\mathfrak{C}$ is the set of all instances of that type of construction in $\mathfrak{C}$.
The Subject Diagram
 This diagram is the subject matter of the Theorem. It is not assumed to be commutative.
 As in most diagrams in category theory texts, the labels in this diagram are variables, so the diagram is implicitly universally quantified. The Subject Diagram is a generic diagram of its shape.
 “Any diagram of its shape” includes diagrams in which some of the nodes may represent the same object. An extreme example is the graph in which every node is an object $\mathsf{E}$ and every arrow is its identity arrow. The diagram below is nevertheless an example of the Subject Diagram:
 Shapes of diagrams are defined properly in Section 2.3 of GBLS and in Section 4.1 of Category Theory for Computing Science.
The description of the Subject Diagram
Diagram SDD below shows the Subject Diagram as the limit of its description. The description is the black diagram.
Diagram SDD
 Following the practice in GBLS, the nodes of the description are annotated by the variable names used in the Subject Diagram. The annotations are purely for the reader’s convenience, to see for example which copy of $\mathsf{ar}$ formally contains the arrow named $f$.
 The description is a commutative diagram in $\mathbf{ThCat}$, and a model functor $\mathfrak{C}$ must take the description to a commutative diagram in $\mathbf{Set}$. Every node in this commutative diagram is either the set of all objects or the set of all arrows of the category $\mathfrak{C}$. Every arrow is either the map that takes an arrow to its domain or the map that takes an arrow to its codomain.
 You can see that the description spells out precisely what the domain and codomain of each arrow are. So the value of the limit node must be the set of all diagrams of the form of the Subject Diagram. I have named the limit by exhibiting the Subject Diagram rather than by giving it another random name that you have to remember.
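The claim that the limit of the description is the set of all diagrams of the shape of the Subject Diagram can be checked by brute force in a single small model. This Python sketch assumes an illustrative category of my own choosing: the objects are $\{0\}$ and $\{0,1\}$, and the arrows are all functions between them. The required domains and codomains are read off from the equations $xf=h$ and $kx=g$ in the Theorem, and nothing else is imposed, so the triangles need not commute. Degenerate instances, in which several nodes are the same object, are included automatically.

```python
from itertools import product

# Illustrative model (an assumption): objects 1 and 2 stand for the sets {0}
# and {0, 1}; an arrow is (dom, cod, table) with table[i] the image of i.
OB = [1, 2]
AR = [(d, c, t) for d in OB for c in OB for t in product(range(c), repeat=d)]

def dom(a):
    return a[0]

def cod(a):
    return a[1]

def subject_diagrams():
    """All tuples (f, h, x, g, k) with the domains and codomains of the
    Subject Diagram; no commutativity is required."""
    for f, h, x, g, k in product(AR, repeat=5):
        if (dom(h) == dom(f)                     # f and h start at the same node
                and dom(x) == cod(f) == dom(g)   # x and g start where f ends
                and cod(x) == cod(h) == dom(k)   # x and h end where k starts
                and cod(g) == cod(k)):           # g and k end at the same node
            yield f, h, x, g, k

instances = list(subject_diagrams())
print(len(instances))
```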
Definition of $\mathsf{ar}_2$
The object $\mathsf{ar}_2$ of composable pairs of arrows is defined as a pullback:
In the usual categorical notation this would be shown as
This makes use of the fact that the unnamed blue arrow is induced by the other two projection arrows. In the rest of the post, projection arrows that are induced are normally omitted.
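In $\mathbf{Set}$, a pullback of two maps into a common object is the set of pairs that the two maps send to the same element. The Python sketch below computes $\mathfrak{C}(\mathsf{ar}_2)$ this way for an illustrative category of my own choosing, with objects $\{0\}$ and $\{0,1\}$ and all functions between them. It pairs $\mathsf{dom}$ on the left factor with $\mathsf{cod}$ on the right factor, so that each pair $\langle g,f\rangle$ is composable. I am assuming that the unnamed arrow is the one from $\mathsf{ar}_2$ to $\mathsf{ob}$. The sketch checks that the two composites that could define that arrow agree.

```python
from itertools import product

# Illustrative model (an assumption): objects 1 and 2 stand for {0} and {0, 1};
# an arrow is (dom, cod, table) with table[i] the image of i.
OB = [1, 2]
AR = [(d, c, t) for d in OB for c in OB for t in product(range(c), repeat=d)]

def dom(a):
    return a[0]

def cod(a):
    return a[1]

def pullback(p, q, X, Y):
    """Pullback in Set of p: X -> Z and q: Y -> Z: the pairs (x, y) with
    p(x) == q(y), together with the two projections."""
    P = [(x, y) for x in X for y in Y if p(x) == q(y)]
    return P, (lambda pair: pair[0]), (lambda pair: pair[1])

# ar_2 as the pullback of dom (left factor) and cod (right factor).
AR2, lfac, rfac = pullback(dom, cod, AR, AR)

# The arrow ar_2 -> ob is induced: dom . lfac and cod . rfac agree on ar_2,
# so either composite gives the same "middle object" of a composable pair.
mid = {pair: dom(lfac(pair)) for pair in AR2}
assert all(mid[pair] == cod(rfac(pair)) for pair in AR2)
print(len(AR2))
```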
An enrichment of the description
Because $\mathsf{ar}_2$ is defined as a pullback, we can enrich the description of Diagram SDD by adjoining two pullbacks as shown below. This is Diagram 8.10 in GBLS. The enriched diagram has the same limit as the description of Diagram SDD.
Enriched Diagram SDD
Note that the projections from the limit to the two occurrences of $\mathsf{ar}_2$ induce all the other projections. This follows by diagram chasing; remember that the description must be a commutative diagram.
Make the triangles commute
To make the triangles commute, we add two comp arrows to the enriched diagram as shown below. These two arrows are not induced by the description, so they are additions to it: they describe a more restrictive (green) diagram with commutative triangles, and so they are shown in black.
Diagram TC: The triangles commute
The left comp makes $xf=h$ and the right comp makes $kx=g$.
The outside square commutes
Now we enrich Diagram TC with four objects, the arrows $\langle\mathsf{comp},\mathsf{id}\rangle$ and $\langle\mathsf{id},\mathsf{comp}\rangle$, and three $\mathsf{comp}$ arrows, as shown in bolder black. These objects and arrows already exist in $\mathbf{ThCat}$ and therefore do not change the limit, which must be the same as the limit of Diagram TC.
The outside square commutes
The diagram in bold black is exactly the commutative diagram that requires associativity for these particular objects and arrows, which immediately implies that $gf=kh$, as the Theorem requires.
By the definitions of $\mathsf{ar_2}$ and $\mathsf{ar_3}$, the part of the description in bold black induces the rest of the diagram. Omitting the rest of the diagram would make $\mathsf{ar_2}$ and $\mathsf{ar_3}$ modules in the sense of GBLS, Section 7.4. Modules would be vital to deal with proofs more complicated than the one given here.
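A brute-force check in one model cannot replace the proof, which works in every model at once, but it shows concretely what the Theorem says. This Python sketch assumes an illustrative category of my own choosing, with objects $\{0\}$ and $\{0,1\}$ and all functions between them. It keeps only the instances of the Subject Diagram whose triangles commute ($xf=h$ and $kx=g$) and checks that the outside square commutes for each one. It also confirms that without the triangle conditions the square can fail.

```python
from itertools import product

# Illustrative model (an assumption): objects 1 and 2 stand for {0} and {0, 1};
# an arrow is (dom, cod, table) with table[i] the image of i.
OB = [1, 2]
AR = [(d, c, t) for d in OB for c in OB for t in product(range(c), repeat=d)]

def dom(a):
    return a[0]

def cod(a):
    return a[1]

def comp(g, f):
    """The composite gf: apply f, then g. Requires dom(g) == cod(f)."""
    assert dom(g) == cod(f)
    return (dom(f), cod(g), tuple(g[2][f[2][i]] for i in range(dom(f))))

def subject_diagrams():
    """All (f, h, x, g, k) with the shape of the Subject Diagram."""
    for f, h, x, g, k in product(AR, repeat=5):
        if (dom(h) == dom(f) and dom(x) == cod(f) == dom(g)
                and cod(x) == cod(h) == dom(k) and cod(g) == cod(k)):
            yield f, h, x, g, k

def square_commutes(f, h, x, g, k):
    return comp(k, h) == comp(g, f)

# Diagram TC: instances whose two triangles commute, xf = h and kx = g.
tc = [(f, h, x, g, k) for f, h, x, g, k in subject_diagrams()
      if comp(x, f) == h and comp(k, x) == g]

assert all(square_commutes(*d) for d in tc)   # the Theorem, in this one model
failures = [d for d in subject_diagrams() if not square_commutes(*d)]
print(len(tc), len(failures) > 0)
```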
References
 Graph based logic and sketches, by Atish Bagchi and Charles Wells.
 Category theory for computing science, by Michael Barr and Charles Wells.
 Toposes, triples and theories, by Michael Barr and Charles Wells.
 On the limitations of sketches, by Michael Barr and Charles Wells. Examples of what forms can describe that sketches can’t.
 Left exact logic, Colin McLarty. Journal of Pure and Applied Algebra 41 (1986), pp. 63-66.
 Partial Horn logic and cartesian categories, E. Palmgren and Steven Vickers. Annals of Pure and Applied Logic, 143 (2007), pp. 314-353. ISSN 0168-0072.
 A formalism for the specification of essentially algebraic structures in $2$-categories, (1992), by John Power and Charles Wells.
 A generalization of the concept of sketch, (1990), by Charles Wells. This is the founding paper for forms.
 Forms, by Charles Wells. This is an introduction to forms aimed at people with only a little knowledge of category theory.
 Forms diagrams, link to Mathematica notebooks and diagrams concerning Forms.
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
A very early satori that occurs with beginning abstract math students
In the previous post Pattern recognition and me, I wrote about how much I enjoyed sudden flashes of understanding that were caused by my recognizing a pattern (or learning about a pattern). I have had several such, shall we say, Thrills in learning about math and doing research in math. This post is about a very early thrill I had when I first started studying abstract algebra. As is my wont, I will make various pronouncements about what these mean for teaching and understanding math.
Cosets
Early in any undergraduate course involving group theory, you learn about cosets.
Basic facts about cosets
 1. Every subgroup of a group generates a set of left cosets and a set of right cosets.
 2. If $H$ is a subgroup of $G$ and $a$ and $b$ are elements of $G$, then $a$ and $b$ are in the same left coset of $H$ if and only if $a^{-1}b\in H$. They are in the same right coset of $H$ if and only if $ab^{-1}\in H$.
 3. Alternative definition: $a$ and $b$ are in the same left coset of $H$ if $a=bh$ for some $h\in H$, and are in the same right coset of $H$ if $a=hb$ for some $h\in H$.
 4. One of the (left or right) cosets of $H$ is $H$ itself.
 5. The relations
$a\underset{L}\sim b$ if and only if $a^{-1}b\in H$
and
$a\underset{R}\sim b$ if and only if $ab^{-1}\in H$
are equivalence relations.
 6. It follows from (5) that each of the set of left cosets of $H$ and the set of right cosets of $H$ is a partition of $G$.
 7. By definition, $H$ is a normal subgroup of $G$ if the two sets of cosets coincide.
 8. The index of a subgroup in a group is the cardinal number of (left or right) cosets the subgroup has.
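Several of these basic facts can be tried out by machine. This Python sketch uses the symmetric group $S_3$, encoded as permutations of $(0,1,2)$ with composition as the product. The choice of example is mine. The code computes left and right cosets and checks that each set of cosets partitions the group. It also checks that the criterion $a^{-1}b\in H$ picks out exactly the pairs that lie in the same left coset. Finally, it shows that a subgroup of order 2 is not normal, while the subgroup of rotations is.

```python
from itertools import permutations, product

# S_3 as tuples p with p[i] the image of i (an illustrative choice of group).
G = list(permutations(range(3)))

def mul(p, q):
    """The product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inv(p):
    r = [0] * len(p)
    for i, image in enumerate(p):
        r[image] = i
    return tuple(r)

def left_cosets(H):
    return {frozenset(mul(a, h) for h in H) for a in G}

def right_cosets(H):
    return {frozenset(mul(h, a) for h in H) for a in G}

def is_partition(blocks):
    """Blocks cover G and their sizes add up to |G|, so they are disjoint."""
    return (set().union(*blocks) == set(G)
            and sum(len(b) for b in blocks) == len(G))

e = (0, 1, 2)
H = {e, (1, 0, 2)}              # generated by the transposition swapping 0 and 1
R = {e, (1, 2, 0), (2, 0, 1)}   # the rotations, a subgroup of index 2

for K in (H, R):
    L = left_cosets(K)
    assert is_partition(L) and is_partition(right_cosets(K))
    assert frozenset(K) in L                  # K is one of its own cosets
    for a, b in product(G, repeat=2):         # a^{-1}b in K  <=>  same left coset
        same = any(a in c and b in c for c in L)
        assert same == (mul(inv(a), b) in K)

print(left_cosets(H) == right_cosets(H), left_cosets(R) == right_cosets(R))
```

For this group it prints `False True`: the order-2 subgroup has three left cosets that differ from its right cosets, while the rotation subgroup is normal.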
Elementary proofs in group theory
In the course, you will be asked to prove some of the interrelationships between (2) through (5) using just the definitions of group and subgroup. The teacher assigns these exercises to train the students in the elementary algebra of elements of groups.
Examples:
 1. If $a=bh$ for some $h\in H$, then $b=ah'$ for some $h'\in H$. Proof: If $a=bh$, then $ah^{-1}=(bh)h^{-1}=b(hh^{-1})=b$, and $h^{-1}\in H$ because $H$ is a subgroup.
 2. If $a^{-1}b\in H$, then $b=ah$ for some $h\in H$. Proof: $b=a(a^{-1}b)$.
 3. The relation “$\underset{L}\sim$” is transitive. Proof: Let $a^{-1}b\in H$ and $b^{-1}c\in H$. Then $a^{-1}c=a^{-1}bb^{-1}c$ is the product of two elements of $H$ and so is in $H$.
Miscellaneous remarks about the examples
 Which exercises are used depends on what is taken as definition of coset.
 In proving Exercise 2 at the board, the instructor might write “Proof: $b=a(a^{-1}b)$” on the board and then point to the expression “$a^{-1}b$” and say, “$a^{-1}b$ is in $H$!”
 I wrote “$a^{-1}c=a^{-1}bb^{-1}c$” in Exercise 3. That will result in some brave student asking, “How on earth did you think of inserting $bb^{-1}$ like that?” The only reasonable answer is: “This is a trick that often helps in dealing with group elements, so keep it in mind.” See Rabbits.
 That expression “$a^{-1}c=a^{-1}bb^{-1}c$” doesn’t explicitly mention that it uses associativity. That, too, might cause pointing at the board.
 Pointing at the board is one thing you can do in a video presentation that you can’t do in a text. But in watching a video, it is harder to flip back to look at something done earlier. Flipping is easier to do if the video is short.
 The first sentence of the proof of Exercise 3 is, “Let $a^{-1}b\in H$ and $b^{-1}c\in H$.” This uses rewrite according to the definition. One hopes that beginning group theory students already know about rewrite according to the definition. But my experience is that there will be some who don’t automatically do it.
 An excellent exercise for the students that would require more than short algebraic calculations would be:
 Discuss which of the two definitions of left coset embedded in (2), (3), (5) and (6) is preferable.
 Show in detail how it is equivalent to the other definition.
In beginning abstract math courses, very few teachers tell students about rewrite according to the definition. Why not?
A theorem
In the undergraduate course, you will almost certainly be asked to prove this theorem:
A subgroup $H$ of index $2$ of a group $G$ is normal in $G$.
Proving the theorem
In trying to prove this, a student may fiddle around with the definitions of left and right cosets for a while, using elementary manipulations of group elements as illustrated above. Then a lightbulb appears:
In the 1980’s or earlier a well known computer scientist wrote to me that something I had written gave him a satori. I was flattered, but I had to look up “satori”.
If the subgroup has index $2$, then there are two left cosets and two right cosets. One of the left cosets and one of the right cosets must be $H$ itself. Since the cosets form a partition, the other left coset must be the complement of $H$, and so must the other right coset. So those two cosets must be the same set! So $H$ is normal in $G$.
This is one of the earlier cases of sudden pattern recognition that occurs among students of abstract math. Its main attraction for me is that suddenly after a bunch of algebraic calculations (enough to determine that the cosets form a partition) you get the fact that the left cosets are the same as the right cosets by a purely conceptual observation with no computation at all.
This proof raises a question:
Why isn’t this point immediately obvious to students?
I have to admit that it was not immediately obvious to me. However, before I thought about it much someone told me how to do it. So I was denied the Thrill of figuring this out myself. Nevertheless I thought the solution was, shall we say, cute, and so had a little thrill.
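The index-2 argument can also be checked by brute force on small groups, as a sanity check rather than a proof. This Python sketch finds every subgroup of index 2 in $S_3$ and in the symmetry group of a square, represented as permutations of the square's vertices. The choice of groups and all the names are illustrative assumptions. For each such subgroup $H$, the code checks that the left cosets are exactly $H$ and its complement, and that the right cosets are the same two sets.

```python
from itertools import combinations, permutations

def mul(p, q):
    """The product pq: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def generate(gens):
    """All products of the generators; in a finite group this is the generated subgroup."""
    identity = tuple(range(len(gens[0])))
    elems, frontier = {identity}, [identity]
    while frontier:
        new = []
        for a in frontier:
            for g in gens:
                b = mul(a, g)
                if b not in elems:
                    elems.add(b)
                    new.append(b)
        frontier = new
    return frozenset(elems)

def index_two_subgroups(G):
    """Subsets of size |G|/2 containing the identity and closed under mul."""
    identity = tuple(range(len(next(iter(G)))))
    others = [g for g in G if g != identity]
    for rest in combinations(others, len(G) // 2 - 1):
        H = frozenset((identity,) + rest)
        if all(mul(a, b) in H for a in H for b in H):
            yield H

def cosets(H, G, side):
    if side == "left":
        return {frozenset(mul(a, h) for h in H) for a in G}
    return {frozenset(mul(h, a) for h in H) for a in G}

S3 = frozenset(permutations(range(3)))
D4 = generate([(1, 2, 3, 0), (0, 3, 2, 1)])   # a rotation and a reflection of a square

for G in (S3, D4):
    for H in index_two_subgroups(G):
        complement = G - H
        assert cosets(H, G, "left") == {H, complement}
        assert cosets(H, G, "right") == {H, complement}

print(len(D4), len(list(index_two_subgroups(S3))), len(list(index_two_subgroups(D4))))
```

It prints `8 1 3`: the square's symmetry group has order 8, $S_3$ has one subgroup of index 2 and the square's group has three, and all of them pass the check.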
A story about how the light bulb appears
In doing exercises like those above, the student has become accustomed to using algebraic manipulation to prove things about groups. They naturally start doing such calculations to prove this theorem. They persevere for a while…
Scenario I
Some students may be in the habit of abandoning their calculations, getting up to walk around, and trying to find other points of view.
 They think: What else do I know besides the definitions of cosets?
 Well, the cosets form a partition of the group.
 So they draw a picture of two boxes for the left cosets and two boxes for the right cosets, marking one box in each as being the subgroup $H$.
 If they have a sufficiently clear picture in their head of how a partition behaves, it dawns on them that the other two boxes have to be the same.
Remarks about Scenario I
 Not many students at the earliest level of abstract math ever take a break and walk around with the intent of having another approach come to mind. Those who do Will Go Far. Teachers should encourage this practice. I need to push this in abstractmath.org.
 In good weather, David Hilbert would stand outside at a shelf doing math or writing it up. Every once in a while he would stop for a while and work in his garden. The breaks no doubt helped. So did standing up, I bet. (I don’t remember where I read this.)
 This scenario would take place only if the students have a clear understanding of what a partition is. I suspect that often the first place they see the connection between equivalence relations and partitions is in a hasty introduction at the beginning of a group theory or abstract algebra course, so the understanding has not had long to sink in.
Scenario II
Some students continue to calculate…
 1. They might say, suppose $a$ is not in $H$. Then it is in the other left coset, namely $aH$.
 2. Now suppose $a$ is not in the “other” right coset, the one that is not $H$. But there are only two right cosets, so $a$ must be in $H$.
 3. But that contradicts the first calculation I made, so the only possibility left is that $a$ is in the right coset $Ha$. So $aH\subseteq Ha$.
 4. Aha! But then I can use the same argument the other way around, getting $Ha\subseteq aH$.
 5. So it must be that $aH=Ha$. Aha! …indeed.
Remarks about Scenario II
 In step (2), the student is starting a proof by contradiction. Many beginning abstract math students are not savvy enough to do this.
 Step (4) involves recognizing that an argument has a dual. Abstractmath.org does not mention dual arguments and I can’t remember emphasizing the idea to my classes. Tsk.
 Scenario II involves the student continuing algebraic calculations till the lightbulb strikes. The lightbulb could also occur at other places in the calculation.
References
 Cosets in Wikipedia.
 Normal subgroup in Wikipedia.
 Equivalence Relations. Article in abmath.
 Pattern recognition and me. Post in G&G.
 Pattern recognition in understanding math. Post in G&G.
 Pattern recognition. Article in abmath.
 Rabbits. Article in abmath.
 Representations and models. Article in abmath.
 Rewrite according to the definition. Article in abmath.
Presenting binary operations
This is the first of a set of notes I am writing to help me develop my thoughts about how particular topics in my book Abstracting algebra should be organized. This article describes my plan for the book in some detail. The present post has some thoughts about presenting binary operations.
Before binary operations are introduced
Traditionally, an abstract algebra book assumes that the student is familiar with high school algebra and then proceeds with the observation that such operations as $+$ and $\times$ can be thought of as functions of two variables that take a pair of numbers to a number. So the first abstract idea is typically the concept of binary operation, although in another post I will consider whether that really should be the first abstract concept.
The Abstracting Algebra book will have a chapter that presents concrete examples of algebraic operations and expressions on numbers as in elementary school and as in high school algebra. This section of the post outlines what should be presented there. Each subsection needs to be expanded with lots of examples.
In elementary school
In elementary school you see expressions such as
 $3+4$
 $3\times 4$
 $3-4$
The student invariably thinks of these expressions as commands to calculate the value given by the expression.
They will also see expressions such as
\[\begin{equation}
\begin{array}[b]{r}
23\\
355\\
+ 96\\
\hline
\end{array}
\end{equation}\]
which they will take as a command to calculate the sum of the whole list:
\[\begin{equation}
\begin{array}[b]{r}
23\\
355\\
+ 96\\
\hline
474
\end{array}
\end{equation}\]
That uses the fact that addition is associative, and the format suggests using the standard school algorithm for adding up lists. You don’t usually see the same format with more than two numbers for multiplication, even though it is associative as well. In some elementary schools in recent years students are learning other ways of doing arithmetic and in particular are encouraged to figure out short cuts for problems that allow them. But the context is always “do it”, not “this represents a number”.
Algebra
In algebra you start using letters for numbers. In algebra, “$a\times b$” and “$a+b$” are expressions in the symbolic language of math, which means they are like noun phrases in English such as “My friend” and “The car I bought last week and immediately totaled” in that both are used semantically as names of objects. English and the symbolic language are both languages, but the symbolic language is not a natural language, nor is it a formal language.
Example
In beginning algebra, we say “$3+5=8$”, which is a (true) statement.
Basic facts about this equation:
The expressions “$3+5$” and “$8$”
 are not the same expression
 but in the standard semantics of algebra they have the same meaning
 and therefore the equation communicates information that neither “$3+5$” nor “$8$” communicates.
Another example is “$3+5=6+2$”.
Facts like this example need to be communicated explicitly before binary operations are introduced formally. The students in a college abstract algebra class probably know the meaning of an equation operationally (subconsciously) but they have never seen it made explicit. See Algebra is a difficult foreign language.
Note
The equation “$3+5=6+2$” is an expression just as much as “$3+5$” and “$6+2$” are. It denotes an object of type “equation”, which is a mathematical object in the same way as numbers are. Most mathematicians do not talk this way, but they should.
Binary operations
Early examples
Consciousness-expanding examples should appear early and often after binary operations are introduced.
Common operations
 The GCD is a binary operation on the natural numbers. This disturbs some students because it is not written in infix form. It is associative. The GCD can be defined conceptually, but for computation purposes needs (Euclid’s) algorithm. This gives you an early example of conceptual definitions and algorithms.
 The maximum function is another example of this sort. This is a good place to point out that a binary operation with the “same” definition can be defined on different sets. The max function on the natural numbers does not have quite the same conceptual definition as the max on the integers.
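The contrast between the conceptual definition of GCD and Euclid's algorithm can be made concrete. In this Python sketch, taking "the largest natural number dividing both" as the conceptual definition is my assumption, and that version needs the inputs not to be both $0$. The conceptual version searches every candidate divisor, while Euclid's algorithm reaches the same value by taking repeated remainders. The last check illustrates associativity on a finite sample, which is evidence, not a proof.

```python
import math
from itertools import product

def gcd_conceptual(a, b):
    """Largest natural number dividing both a and b; assumes a and b are not both 0."""
    return max(d for d in range(1, max(a, b) + 1) if a % d == 0 and b % d == 0)

def gcd_euclid(a, b):
    """Euclid's algorithm: replace (a, b) by (b, a mod b) until b is 0."""
    while b != 0:
        a, b = b, a % b
    return a

pairs = [(a, b) for a, b in product(range(40), repeat=2) if (a, b) != (0, 0)]
assert all(gcd_conceptual(a, b) == gcd_euclid(a, b) == math.gcd(a, b) for a, b in pairs)

# Associativity on a sample: gcd(gcd(a, b), c) == gcd(a, gcd(b, c)).
assert all(gcd_euclid(gcd_euclid(a, b), c) == gcd_euclid(a, gcd_euclid(b, c))
           for a, b, c in product(range(20), repeat=3))
print(gcd_euclid(84, 36))  # 12
```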
Extensional definitions
In order to emphasize the arbitrariness of definitions, some random operations on small finite sets should be given by multiplication tables, both on sets of numbers and on sets whose elements are letters of the alphabet. This will elicit the common reaction, “What operation is it?” Hidden behind this question is the fact that you are giving an extensional definition instead of a formula (an algorithm or a combination of familiar operations).
Properties
The associative and commutative properties should be introduced early just for consciousness-raising. Subtraction is neither associative nor commutative. Rock paper scissors is commutative but not associative. Groups of symmetries are associative but in general not commutative.
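The three examples of properties above can be checked mechanically. In this Python sketch, rock paper scissors is read as the operation "return the winning hand, and on a tie return the common hand". That reading is my assumption, as is the use of the symmetries of an equilateral triangle (permutations of its three vertices) as the group of symmetries. For a finite set these checks are decisive. For subtraction, a finite sample of integers can only refute a law, but refuting the laws is all that is needed here.

```python
from itertools import permutations, product

def is_commutative(op, S):
    return all(op(a, b) == op(b, a) for a, b in product(S, repeat=2))

def is_associative(op, S):
    return all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(S, repeat=3))

def sub(a, b):
    return a - b

SAMPLE = range(-3, 4)   # a finite sample of integers: enough to find counterexamples

# Assumed reading of rock paper scissors: the winner; a tie returns the common hand.
BEATS = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}
HANDS = ["rock", "paper", "scissors"]

def rps(a, b):
    return a if a == b or (a, b) in BEATS else b

# Symmetries of an equilateral triangle as permutations of its vertices.
TRIANGLE = list(permutations(range(3)))

def compose(p, q):
    """Apply q first, then p."""
    return tuple(p[q[i]] for i in range(3))

for name, op, S in [("subtraction", sub, SAMPLE),
                    ("rock paper scissors", rps, HANDS),
                    ("triangle symmetries", compose, TRIANGLE)]:
    print(name, is_commutative(op, S), is_associative(op, S))
```

The output is `False False` for subtraction, `True False` for rock paper scissors and `False True` for the triangle symmetries. For example, (rock, paper) then scissors gives scissors, but rock then (paper, scissors) gives rock.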
Binary operation as function
The first definition of binary operation should be as a function. For example, “$+$” is a function that takes pairs of numbers to numbers. In other words, $+:\mathbb{Z}\times\mathbb{Z}\to\mathbb{Z}$ is a function.
We then abstract from that example and others like it from specific operations to arbitrary functions $\Delta:S\times S\to S$ for arbitrary sets $S$.
This is abstraction twice.
 First we replace the example operations, such as multiplication, subtraction, GCD and MAX on $\mathbb{Z}$, or something complicated such as \[(x,y)\mapsto 3(xy1)^2(x^2+xy^3)^3,\] by an arbitrary operation.
 Then we replace sets of numbers by arbitrary sets. An example would be the random multiplication on the set $\{1,2,5\}$ given by the table
\[
\begin{array}{cccc}
\Delta& 1&2&5\\
\hline
1&2&2&1\\
2&5&2&1\\
5&2&1&5
\end{array}
\]
This defines a function $\Delta:\{1,2,5\}\times\{1,2,5\}\to\{1,2,5\}$ for which, for example, $\Delta(2,1)=5$, or $2\Delta 1=5$. This example uses numbers as elements of the set and is good for eliciting the “What operation is it?” question. I will use examples where the elements are letters of the alphabet as well. That sort of example makes the students think the letters are variables they can substitute for, another confusion to be banished by the wise professor who knows the right thing to say to make it clear. (Don’t ask me; I taught algebra for 35 years and I still don’t know the right thing to say.)
It is important to define prefix notation and infix notation right away and to use both of them in examples.
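The table for $\Delta$ can be transcribed directly as data. In this Python sketch, storing $\Delta$ as a dictionary keyed by pairs is my choice of representation. The prefix call `delta(2, 1)` and the infix reading $2\,\Delta\,1$ name the same value. Since no formula lies behind the table, the only way to learn whether $\Delta$ is commutative or associative is to check the cases.

```python
from itertools import product

S = [1, 2, 5]
# Row = first argument, column = second argument, copied from the table.
DELTA = {
    (1, 1): 2, (1, 2): 2, (1, 5): 1,
    (2, 1): 5, (2, 2): 2, (2, 5): 1,
    (5, 1): 2, (5, 2): 1, (5, 5): 5,
}

def delta(a, b):
    """Prefix form Δ(a, b); the infix expression a Δ b denotes the same element."""
    return DELTA[(a, b)]

assert delta(2, 1) == 5
assert all(delta(a, b) in S for a, b in product(S, repeat=2))   # Δ maps S × S into S

commutative = all(delta(a, b) == delta(b, a) for a, b in product(S, repeat=2))
associative = all(delta(delta(a, b), c) == delta(a, delta(b, c))
                  for a, b, c in product(S, repeat=3))
print(commutative, associative)  # False False
```

For instance, $\Delta(1,2)=2$ but $\Delta(2,1)=5$. Also $(1\,\Delta\,1)\,\Delta\,1=2\,\Delta\,1=5$, while $1\,\Delta\,(1\,\Delta\,1)=1\,\Delta\,2=2$.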
Other representations of binary operations
The main way of representing binary operations in Abstracting Algebra will be as trees, which I will cover in later posts. Those posts will be much more interesting than this one.
Binary operations in high school and college algebra
 Some binops are represented in infix notation: “$a+b$”, “$a-b$”, and “$a\times b$”.
 “$a\times b$” is usually written “$ab$” for letters and with the “$\times$” symbol for numbers.
 Some binops have idiosyncratic representation: “$a^b$”, “${a}\choose{b}$”.
 A lot of binops such as GCD and MAX are given as functions of two variables (prefix notation) and their status as binary operations usually goes unmentioned. (That is not necessarily wrong.)
 The symbol “$(a,b)$” is used to denote the GCD (a binop) and is also used to denote a point in the plane or an open interval, neither of which is strictly a binop. They are binary operations in a multisorted algebra (a concept I expect to introduce later in the book).
 Some apparent binops are in infix notation but have flaws: In “$a/b$”, the second entry can’t be $0$, and the expression when $a$ and $b$ are integers is often treated as having good forms ($3/4$) and bad forms ($6/8$).
Trees
The chaotic nature of algebraic notation I just described is a stumbling block, but not the primary reason high school algebra is a stumbling block for many students. The big reason it is hard is that the notation requires students to create and hold complicated abstract structures in their head.
Example
This example is a teaser for future posts on using trees to represent binary operations. The tree below shows much more of the structure of a calculation of the area of a rectangle surmounted by a semicircle than the expression
\[A=wh+\frac{1}{2}\left(\pi(\frac{1}{2}w)^2\right)\]
does.
The tree explicitly embodies the thought process that leads to the formula:
 You need to add the area of the rectangle and the area of the semicircle.
 The area of the rectangle is width times height.
 The area of the semicircle is $\frac{1}{2}(\pi r^2)$.
 In this case, $r=\frac{1}{2}w$.
Any mathematician will extract the same abstract structure from the formula \[A=wh+\frac{1}{2}\left(\pi(\frac{1}{2}w)^2\right),\] but this is difficult for students beginning algebra.
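One way to make the tree's structure concrete is to encode it as nested data and evaluate it recursively. The tuple encoding in this Python sketch is a hypothetical choice of mine, not the representation planned for the book. A leaf is a number or a variable name, and an internal node is an operator with two subtrees. Each subtree corresponds to one step of the thought process listed above.

```python
import math

# Hypothetical encoding: a leaf is a number or a variable name;
# an internal node is (operator, left subtree, right subtree).
AREA = ("+",
        ("*", "w", "h"),                 # area of the rectangle: width times height
        ("*", 0.5,                       # half of the area of the full circle,
         ("*", "pi",
          ("^", ("*", 0.5, "w"), 2))))   # which is pi r^2 with r = w/2

OPS = {"+": lambda a, b: a + b,
       "*": lambda a, b: a * b,
       "^": lambda a, b: a ** b}

def evaluate(tree, env):
    """Evaluate a tree bottom-up; env maps variable names to numbers."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return OPS[op](evaluate(left, env), evaluate(right, env))
    if isinstance(tree, str):
        return env[tree]
    return tree

value = evaluate(AREA, {"w": 2.0, "h": 3.0, "pi": math.pi})
print(value)  # 6 + pi/2, about 7.5708
```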
References
 Abstracting algebra. An outline of the proposed book.
 Abstraction in abstractmath.org.
 Algebra is a difficult foreign language. G&G post.
 Formal language in Wikipedia.
 Monads for high school III. G&G post.
 Monads in Wikipedia
 Natural language in Wikipedia.
 Noun phrases in Wikipedia.
 Symbolic language of math in abstractmath.org.
 Visible Algebra I. G&G post.
 Visible algebra I supplement. G&G post.
 Visible Algebra II. G&G post.
Monads for High School III: Algebras
The interactive examples in this post require installing Wolfram CDF player, which is free and works on most desktop computers using Firefox, Safari and Internet Explorer, but not Chrome. The source code is the Mathematica Notebook MonadAlg.nb, which is available for free use under a Creative Commons Attribution-ShareAlike 2.5 License. The notebook can be read by CDF Player if you cannot make the embedded versions in this post work.
This is a continuation of Monads for high school I and Monads for High School II: Lists. This post covers the concept of algebras for the monad for lists.
Lists
$\textrm{Lists}(S)$ is the set of all lists of finite length whose entries are elements of $S$.
 $\boxed{2\; 2\; 4}$ is the way I denote the list of length $3$ whose first and second entries are each $2$ and whose third entry is $4$.
 A list with only one entry, such as $\boxed{2}$, is called a singleton list.
 The empty list $\boxed{\phantom{2}}$ has no entries.
 $\textrm{Lists}^*(S)$ is the set of all nonempty lists of finite length whose entries are elements of $S$.
 $\textrm{Lists}(\textrm{Lists}(S))$ is the set of all lists whose entries are lists with entries from $S$.
 For example, $\boxed{\boxed{5\; 7}\; \boxed{2\; 12\; 7}}$ and $\boxed{\boxed{5\; 7\; 2\; 12\; 7}}$ are both elements of $\textrm{Lists}^*(\textrm{Lists}^*(\mathbb{Z}))$. The second one is a singleton list!
 $\boxed{\boxed{\phantom{3}}\; \boxed{2}}$ and $\boxed{\boxed{\phantom{3}}}$ are elements of $\textrm{Lists}^*(\textrm{Lists}(\mathbb{Z}))$.
 The empty list $\boxed{\phantom{2}}$ is an element of $\textrm{Lists}(\mathbb{Z})$, of $\textrm{Lists}(\textrm{Lists}^*(\mathbb{Z}))$ and of $\textrm{Lists}(\textrm{Lists}(\mathbb{Z}))$. If you have stared at this for more than ten minutes, do something else and come back to it later.
The star notation is used widely in math and computing science to imply that you are including everything except some insignificant shrimp of a thing such as the empty list, the empty set, or $0$. For example, $\mathbb{R}^*$ denotes the set of all nonzero real numbers.
More details about lists are in Monads for High School II: Lists.
Join
The function join (or concatenation) takes two lists and creates a third list. For example, if you join $\boxed{5\; 7}$ to $\boxed{2\; 12\; 7 }$ in that order you get $\boxed{5\; 7\; 2\; 12\; 7}$.
 I will use this notation: join$\boxed{\boxed{5\; 7}\; \boxed{2\; 12\; 7}}=\boxed{5\; 7\; 2\; 12\; 7}$.
 This notation means that I am regarding join as a function that takes a two-element list in $\textrm{Lists}(\textrm{Lists}(S))$ to an element of $\textrm{Lists}(S)$.
 join removes one level of lists
 join is not commutative: join$\boxed{\boxed{2\; 12\; 7}\; \boxed{5\; 7}}=\boxed{2\; 12\; 7\; 5\; 7}$
 Join is associative, and as for any associative binary operation, join is defined on any finite list of lists of elements of $S$. So for example, join$\boxed{\boxed{5\; 7}\; \boxed{2\; 12\; 7}\; \boxed{1}}=\boxed{5\; 7\; 2\; 12\; 7\; 1}$.
 For any single list $\boxed{a\; b\; c}$, join$\boxed{\boxed{a\; b\; c}}=\boxed{a\; b\; c}$. This is required to make the theory work. It is called the one-identity property.
 If the empty list $\boxed{\phantom{2}}$ occurs in a list of lists, it disappears when join is applied: join $\boxed{\boxed{2\; 3}\; \boxed{\phantom{2}}\; \boxed{4\; 5\; 6}}=\boxed{2\; 3\; 4\; 5\; 6}$.
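Here is a minimal Python sketch of join, assuming Python lists stand in for the cartouche lists. The name `join` is just my illustrative choice, not a library function. The calls check the properties listed above: order matters, join works on any number of lists, one-identity holds, and empty lists disappear.

```python
from typing import List, TypeVar

T = TypeVar("T")


def join(lists: List[List[T]]) -> List[T]:
    """Concatenate a list of lists into one list, removing one level of lists."""
    result: List[T] = []
    for inner in lists:
        result.extend(inner)
    return result


print(join([[5, 7], [2, 12, 7]]))       # [5, 7, 2, 12, 7]
print(join([[2, 12, 7], [5, 7]]))       # [2, 12, 7, 5, 7]: order matters
print(join([[5, 7], [2, 12, 7], [1]]))  # [5, 7, 2, 12, 7, 1]
print(join([["a", "b", "c"]]))          # ['a', 'b', 'c']: one-identity
print(join([[2, 3], [], [4, 5, 6]]))    # [2, 3, 4, 5, 6]: the empty list disappears
```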
More details about join are in Monads for High School II: Lists.
The main monad diagram
When you have a list of lists of lists, join can be applied in two different ways, "inside" and "outside" as illustrated in the diagram below. It gives you several different inputs to try out as a way to understand what is happening.
This is the special case of the main diagram for all monads as it applies to the List monad.
As you can see, after doing either of "inside" and "outside", if you then apply join, you get the same list. That list is simply the list of entries in the beginning list (and the two intermediate ones) in the same order, disregarding groupings.
From what I have just written, you must depend on your pattern recognition abilities to learn what "inside" and "outside" mean. But both can also be described in words.
 The lists outlined in black are lists of elements of $\mathbb{Z}$. In other words, they are elements of $\textrm{Lists}(\mathbb{Z})$.
 The lists outlined in blue are lists of elements of $\textrm{Lists}(\mathbb{Z})$. In other words, they are lists of lists of elements of $\mathbb{Z}$. Those are the kinds of things you can apply join to.
 The leftmost list in the diagram, outlined in green, is a list in $\textrm{Lists}(\textrm{Lists}(\mathbb{Z}))$. This means you can apply join in two different ways:
 Each list boxed in blue is a list of lists of integers (two of them are singletons!), so you can apply join to each of them. This is joining inside first.
 You can apply join directly to the leftmost list, which is a list of lists (of lists, but forget that for the moment). That concatenates the blue lists into a single blue list. This is joining outside first.
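To try the two routes on other inputs, here is a small Python sketch, assuming nested Python lists play the role of the black, blue and green lists. The helper names `join_inside` and `join_outside` are hypothetical labels for the two routes, and the starting list is made up.

```python
def join(lists):
    """Concatenate a list of lists into a single list."""
    return [x for inner in lists for x in inner]


def join_inside(lll):
    """Join inside first: apply join to each blue list of lists."""
    return [join(ll) for ll in lll]


def join_outside(lll):
    """Join outside first: apply join to the green outside list."""
    return join(lll)


# A made-up list of lists of lists, including a singleton and an empty list.
lll = [[[1, 2], [3]], [[4], [], [5, 6]]]

print(join_inside(lll))         # [[1, 2, 3], [4, 5, 6]]
print(join_outside(lll))        # [[1, 2], [3], [4], [], [5, 6]]
print(join(join_inside(lll)))   # [1, 2, 3, 4, 5, 6]
print(join(join_outside(lll)))  # [1, 2, 3, 4, 5, 6]
```

Both routes end at the list of all the integers in their original order, with the groupings forgotten.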
To understand this diagram, most people do best to stare at it first, letting the visual pattern recognition part of the brain (which uses over a fifth of the energy used by your brain) work out what "inside" and "outside" mean, and then to check that understanding by reading the verbal description. Starting by reading the verbal description first does not work as well for most people.
The unit monad diagram
There is a second diagram for all monads, the unit diagram:
The two right-hand entries are always the same. Again, I am asking you to use your pattern recognition abilities to learn what "singleton list" and "singleton each" mean.
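Since the diagram itself is not reproduced here, the following Python sketch rests on an assumption: I read "singleton list" as wrapping the whole list in a singleton and "singleton each" as making every entry a singleton. The function names are hypothetical. Applying join after either one gives back the list you started with.

```python
def join(lists):
    """Concatenate a list of lists into a single list."""
    return [x for inner in lists for x in inner]


def singleton_list(xs):
    """Wrap the whole list in a singleton list: [a, b, c] -> [[a, b, c]]."""
    return [xs]


def singleton_each(xs):
    """Make each entry a singleton: [a, b, c] -> [[a], [b], [c]]."""
    return [[x] for x in xs]


xs = [3, 1, 4]
print(singleton_list(xs), join(singleton_list(xs)))  # [[3, 1, 4]] [3, 1, 4]
print(singleton_each(xs), join(singleton_each(xs)))  # [[3], [1], [4]] [3, 1, 4]
```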
The main and unit monad diagrams will be used as axioms to give the general definition of monad. To give those axioms, we also need the concepts of functor and natural transformation, which I will define later after I have finished the monad algebra diagrams for Lists and several other examples.
Algebras for the List monad
If you have any associative binary operation on a set $S$, its definition can be extended to any nonempty list of elements (see Monads for High School I.)
Plus and Times are like that:
 $(3+2)+4$ and $3+(2+4)$ have the same value $9$, so you can write $3+2+4$ and it means $9$ no matter how you calculate it.
 I will be using the notation Plus$\boxed{3\; 2\; 4}$ instead of $3+2+4$.
 Times is also associative, so for example we can write Times$\boxed{3\; 2\; 4}=24$.
 Like join, we require that these operations satisfy one-identity, so we know Plus$\boxed{3}=3$ and Times$\boxed{3}=3$.
 When the associative binary operation has an identity element, you can also define its value on the empty list as the identity element: Plus$\boxed{\phantom{3}}=0$ and Times$\boxed{\phantom{3}}=1$. I recommend that you experiment with examples to see why it works.
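One way to see why the identity element is the right value on the empty list: if you compute Plus and Times by folding the list starting from the identity element, the empty list, singletons and longer lists are all handled by the same rule. This Python sketch uses `functools.reduce` for that; the names `plus` and `times` are mine. Python's built-in `sum` and `math.prod` follow the same convention.

```python
import math
import operator
from functools import reduce


def plus(xs):
    """Plus on a list of numbers: fold with + starting at the identity 0."""
    return reduce(operator.add, xs, 0)


def times(xs):
    """Times on a list of numbers: fold with * starting at the identity 1."""
    return reduce(operator.mul, xs, 1)


print(plus([3, 2, 4]), times([3, 2, 4]))  # 9 24
print(plus([3]), times([3]))              # 3 3: one-identity
print(plus([]), times([]))                # 0 1: identity elements
print(sum([]), math.prod([]))             # 0 1: the built-ins agree
```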
An algebra for the List monad is a function algop:$\textrm{Lists}(S)\to S$ with certain properties: It must satisfy the Main Monad Algebra Diagram and the Unit Monad Algebra Diagram, discussed below.
The main monad algebra diagram
Example using Plus and Times
The following interactive diagram allows you to see what happens with Plus and Times. Afterwards, I will give the general definition.
Plus insides replaces each inside list with the result of applying Plus to it, and the other operation Join is the same operation I have used before.
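For readers without CDF Player, the same comparison can be sketched in Python, assuming Python lists stand in for the boxed lists. The helper `plus_insides` is a hypothetical name for the "Plus insides" step, and the list of lists is made up.

```python
def join(lists):
    """Concatenate a list of lists into a single list."""
    return [x for inner in lists for x in inner]


def plus(xs):
    """Plus on a list of numbers."""
    return sum(xs)


def plus_insides(ll):
    """Replace each inside list by its total."""
    return [plus(xs) for xs in ll]


ll = [[4, 4], [2], [8, 1, 5]]  # a made-up list of lists of numbers

print(plus_insides(ll), plus(plus_insides(ll)))  # [8, 2, 14] 24
print(join(ll), plus(join(ll)))                  # [4, 4, 2, 8, 1, 5] 24
```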
Another example
The main monad algebra diagram requires that if you have a list of lists of numbers such as the one below, then adding up each list (Plus insides) and then adding up the list of totals (top list in the diagram) must give the same answer as joining all the lists of numbers together into one list (bottom list in the diagram) and then adding up that list.
This is illustrated by this special case of the main monad algebra diagram for Plus:
General statement of the main monad algebra diagram
Suppose we have any function $\blacksquare$ $:\textrm{Lists}(S)\to S$ for any set $S$.
If we want to give the main monad algebra diagram for $\blacksquare$ we have a problem. We know for example that Plus$\boxed{1\; 2}=3$. But for some elements $a $ and $b$ of $S$, we don’t know what $\blacksquare\boxed{a\; b}$ is. One way to write it is simply to write $\blacksquare\boxed{a\; b}$ (the usual way we write a function). Or we could use tree notation and write
I will use tree notation mostly, but it is a good exercise to redraw the diagrams with functional notation.
Main monad diagram in prose
Below is a presentation of the general main monad algebra diagram using (gasp!) English phrases to describe the nodes.
The unit monad algebra diagram
Suppose $\blacksquare$ is any function from $\textrm{Lists}(S)$ to $S$ for any set $S$. Then the diagram is
This says that if you apply $\blacksquare$ to a singleton you get the unique entry of the singleton. This is not surprising: I defined above what it means when you apply an operation to a singleton just so this would happen!
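The unit monad algebra diagram is easy to test for a particular function. In this Python sketch the checker name is hypothetical; `sum` and `max` pass, while `len`, which is also a function from lists of integers to integers, fails, so it cannot be an algebra for the List monad.

```python
def satisfies_unit_diagram(op, elements):
    """Check that op applied to each singleton list gives back its only entry."""
    return all(op([a]) == a for a in elements)


sample = [-3, 0, 1, 7, 42]
print(satisfies_unit_diagram(sum, sample))  # True
print(satisfies_unit_diagram(max, sample))  # True
print(satisfies_unit_diagram(len, sample))  # False: len([7]) is 1, not 7
```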
A particular example
These are specific examples of the general main monad algebra diagram for an arbitrary operation $\blacksquare$:
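These examples show that if $\blacksquare$ is any function from $\textrm{Lists}(S)$ to $S$ for any set $S$ that satisfies the main and unit monad algebra diagrams, then $\blacksquare\boxed{\blacksquare\boxed{a\; b}\; c}$ equals $\blacksquare\boxed{a\; b\; c}$ and $\blacksquare\boxed{a\; \blacksquare\boxed{b\; c}}$ equals $\blacksquare\boxed{a\; b\; c}$.
Well, according to some ancient Greek guy, that means $\blacksquare\boxed{\blacksquare\boxed{a\; b}\; c}$ equals $\blacksquare\boxed{a\; \blacksquare\boxed{b\; c}}$, which says that $(a,b)\mapsto\blacksquare\boxed{a\; b}$ is an associative binary operation!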
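It helps to see the main diagram fail. In this Python sketch, `subtract_left` is a made-up operation on nonempty lists that subtracts from the left; it satisfies the unit diagram but not the main one, and its two-entry version is not associative. The checker `main_diagram_holds` is likewise a hypothetical helper.

```python
from functools import reduce


def join(lists):
    """Concatenate a list of lists into a single list."""
    return [x for inner in lists for x in inner]


def subtract_left(xs):
    """A made-up operation on nonempty lists: x1 - x2 - ... - xn, grouped from the left."""
    return reduce(lambda a, b: a - b, xs)


def main_diagram_holds(op, ll):
    """Compare 'apply op insides, then op' with 'join, then op' for one list of lists."""
    return op([op(xs) for xs in ll]) == op(join(ll))


print(subtract_left([7]))                                # 7: the unit diagram holds
print(main_diagram_holds(sum, [[5], [3, 2]]))            # True
print(main_diagram_holds(subtract_left, [[5], [3, 2]]))  # False: 5 - (3 - 2) = 4 but 5 - 3 - 2 = 0
print(subtract_left([subtract_left([5, 3]), 2]),
      subtract_left([5, subtract_left([3, 2])]))         # 0 4
```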
The mother of all associative operations
We also know that any associative binary operation $\blacksquare$ on any set $S$ can be extended to a function on all finite nonempty lists of elements of $S$. This is the general associative law and was discussed (without using that name) in Monads for High School I.
Let’s put what we’ve done together into one statement:
Every associative binary operation $\blacksquare$ on a set $S$ can be extended uniquely to a function $\blacksquare:\textrm{Lists}^*(S)\to S$ that satisfies both the main monad algebra diagram and the unit monad algebra diagram. Furthermore, any function $\blacksquare:\textrm{Lists}^*(S)\to S$ that satisfies both the main monad algebra diagram and the unit monad algebra diagram is an associative binary operation when applied to lists of length $2$ of elements of $S$.
That is why I claim that the NonemptyList monad is the mother of all associative binary operations.
I have not proved this, but the work in this and preceding posts provides (I think) a good intuitive understanding of this fundamental relationship between lists and associative binary operations.
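The first half of the statement can at least be spot-checked. This Python sketch extends a binary operation to nonempty lists by folding from the left, then tests both diagrams on randomly generated nonempty lists of lists. The names `extend` and `check_both_diagrams` are hypothetical, and a passing random test is evidence, not a proof.

```python
import random
from functools import reduce


def join(lists):
    """Concatenate a list of lists into a single list."""
    return [x for inner in lists for x in inner]


def extend(binop):
    """Extend a binary operation to nonempty lists by folding from the left."""
    def on_lists(xs):
        if not xs:
            raise ValueError("only defined on nonempty lists")
        return reduce(binop, xs)
    return on_lists


def check_both_diagrams(op, sample, trials=200, seed=0):
    """Spot-check the main and unit monad algebra diagrams on random nonempty lists of lists."""
    rng = random.Random(seed)
    for _ in range(trials):
        ll = [[rng.choice(sample) for _ in range(rng.randint(1, 4))]
              for _ in range(rng.randint(1, 4))]
        if op([op(xs) for xs in ll]) != op(join(ll)):  # main diagram
            return False
    return all(op([a]) == a for a in sample)           # unit diagram


concat = extend(lambda a, b: a + b)  # string concatenation: associative, not commutative
print(check_both_diagrams(concat, ["x", "y", "zz"]))      # True
print(check_both_diagrams(extend(max), [3, 1, 4, 1, 5]))  # True
```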
Things to do in upcoming posts
 I have to give a proper definition of monads using the concepts of functor and natural transformation. I expect to do this just for set functors, not mentioning categories.
 Every type of binary operation that is defined by equations corresponds to a monad which is the mother of all binary operations of that type. I will give examples, but not prove the general case.
Other examples of monads
 Associative binary operations on $S$ with identity element (monoids) correspond to all lists, including the empty list, with entries from $S$.
 Commutative, associative and idempotent binary operations, like "and" and "or" in Boolean algebra, correspond to the set monad: $\text{Sets}(S)$ is the set of all finite sets of elements of $S$. (You can change the cardinality restrictions, for example to allow countably infinite sets, but you have to have some cardinality restrictions.) Join is simply union.
 Commutative and associative binary operations correspond to the multiset monad (with a proper definition of join) and appropriate cardinality restrictions. You have to fuss about identity elements here, too.
 Various kinds of nonassociative operations get much more complicated, involving tree structures with equivalence relations on them. I expect to work out a few of them.
 There are lots of monads in computing science that you never heard of (unless you are a computing scientist). I will mention a few of them.
 Every type of binary operation defined by equations corresponds to a monad. But some of them are unsolvable, meaning you cannot describe the monad precisely.
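For the set monad, here is a small Python sketch under the assumption that finite sets are modeled by `frozenset` and join is union; `join_sets` and `main_diagram_holds` are hypothetical names. `max` and `any` ("or") pass the main diagram on this example, while `sum`, which is not idempotent, fails it.

```python
def join_sets(set_of_sets):
    """Join for a finite-set monad: the union of a set of sets."""
    result = frozenset()
    for s in set_of_sets:
        result = result | s
    return result


def main_diagram_holds(op, sss):
    """Compare 'apply op to each inner set, then op' with 'take the union, then op'."""
    return op(frozenset(op(s) for s in sss)) == op(join_sets(sss))


sss = frozenset({frozenset({3, 1}), frozenset({4}), frozenset({1, 5})})
print(join_sets(sss) == frozenset({1, 3, 4, 5}))  # True
print(main_diagram_holds(max, sss))               # True
print(main_diagram_holds(sum, sss))               # False: the totals 4, 4, 6 collapse to {4, 6}

bools = frozenset({frozenset({True, False}), frozenset({False})})
print(main_diagram_holds(any, bools))             # True: "or" on sets of truth values
```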
There will probably be a long delay before I get back to this project. There are too many other things I want to do!
Naming mathematical objects
Commonword names confuse
Many technical words and phrases in math are ordinary English words ("commonwords") that are assigned a different and precisely defined mathematical meaning.
 Group This sounds to the "layman" as if it ought to mean the same thing as "set". You get no clue from the name that it involves a binary operation with certain properties.
 Formula In some texts on logic, a formula is a precisely defined expression that becomes a true-or-false sentence (in the semantics) when all its variables are instantiated. So $(\forall x)(x>0)$ is a formula. The word "formula" in ordinary English makes you think of things like "$\textrm{H}_2\textrm{O}$", which has no semantics that makes it true or false; it is a symbolic expression for a name.
 Simple group This has a technical meaning: a group with no nontrivial normal subgroup. The Monster Group is "simple". Yes, the technical meaning is motivated by the usual concept of "simple", but to say the Monster Group is simple causes cognitive dissonance.
Beginning students come with the (generally subconscious) expectation that they will pick up clues about the meanings of words from connotations they are already familiar with, plus things the teacher says using those words. They think in terms of refining an understanding they already have. This is more or less what happens in most nonmath classes. They need to be taught what definition means to a mathematician.
Names that don't confuse but may intimidate
Other technical names in math don't cause the problems that commonwords cause.
Named after somebody The phrase "Hausdorff space" leads a math student to understand that it has a technical meaning. They may not even know it is named after a person, but it screams "geek word" and "you don't know what it means". That is a signal that you can find out what it means. You don't assume you know its meaning.
New made-up words Words such as "affine", "gerbe" and "logarithm" are made up of words from other languages and don't have an ordinary English meaning. Acronyms such as "QED", "RSA" and "FOIL" don't occur often. I don't know of any math objects other than "RSA algorithm" that have an acronymic name. (No doubt I will think of one the minute I click the Publish button.) Whole-cloth words such as "googol" are also rare. All these sorts of words would be good to name new things since they do not fool the readers into thinking they know what the words mean.
Both types of words avoid fooling the student into thinking they know what the words mean, but some students are intimidated by the use of words they haven't seen before. They seem to come to class ready to be snowed. A minority of my students over my 35 years of teaching were like that, but that attitude was a real problem for them.
Audience
You can write for several different audiences.
Math fans (nonmathematicians who are interested in math and read books about it occasionally) In my posts Explaining higher math to beginners and in Renaming technical concepts, I wrote about several books aimed at explaining some fairly deep math to interested people who are not mathematicians. They renamed some things. For example, Mark Ronan in Symmetry and the Monster used the phrase "atom" for "simple group" presumably to get around the cognitive dissonance. There are other examples in my posts.
Math newbies (math majors and other students who want to understand some aspect of mathematics). These are the people abstractmath.org is aimed at. For such an audience you generally don't want to rename mathematical objects. In fact, you need to give them a glossary to explain the words and phrases used by people in the subject area.
Postsecondary math students These people, especially the math majors, have many tasks:
 Gain an intuitive understanding of the subject matter.
 Understand in practice the logical role of definitions.
 Learn how to come up with proofs.
 Understand the ins and outs of mathematical English, particularly the presence of ordinary English words with technical definitions.
 Understand and master the appropriate parts of the symbolic language of math — not just what the symbols mean but how to tell a statement from a symbolic name.
It is appropriate for books for math fans and math newbies to try to give an understanding of concepts without necessarily proving theorems. That is the aim of much of my work, which puts more emphasis on newbies than on fans. But math majors also need the traditional emphasis on theorem and proof and clear correct explanations.
Lately, books such as Visual Group Theory have addressed beginning math majors, trying for much more effective ways to help the students develop good intuition, as well as getting into proofs and rigor. Visual Group Theory uses standard terminology. You can contrast it with Symmetry and the Monster and The Mystery of the Prime Numbers (read the excellent reviews on Amazon) which are clearly aimed at math fans and use nonstandard terminology.
Terminology for algebraic structures
I have been thinking about the section of Abstracting Algebra on binary operations. Notice this terminology:
The "standard names" are those in Wikipedia. They give little clue to the meaning, but at least most of them, except "magma" and "group", sound technical, cluing the reader in to the fact that they'd better learn the definition.
I came up with the names in the right column in an attempt to make some sense out of them. The design is somewhat like the names of some chemical compounds. This would be appropriate for a text aimed at math fans, but for them you probably wouldn't want to get into such an exhaustive list.
I wrote various pieces meant to be part of Abstracting Algebra using the terminology on the right, but thought better of it. I realized that I have been vacillating between thinking of AbAl as for math fans and thinking of it as for newbies. I guess I am plunking for newbies.
I will call groups groups, but for the other structures I will use the phrases in the middle column. Since the book is for newbies I will include a table like the one above. I also expect to use tree notation as I did in Visual Algebra II, and other graphical devices and interactive diagrams.
Magmas
In the sixties magmas were called groupoids or monoids, both of which now mean something else. I was really irritated when the word "magma" started showing up all over Wikipedia. It was the name given by Bourbaki, but it is a bad name because it means something else that is irrelevant. A magma is just any binary operation. Why not just call it that?
Well, I will tell you why, based on my experience in Ancient Times (the sixties and seventies) in math. (I started as an assistant professor at Western Reserve University in 1965). In those days people made a distinction between a binary operation and a "set with a binary operation on it". Nowadays, the concept of function carries with it an implied domain and codomain. So a binary operation is a function $m:S\times S\to S$. Thinking of a binary operation this way was just beginning to appear in the common mathematical culture in the late 60's, and at least one person remarked to me: "I really like this new idea of thinking of 'plus' and 'times' as functions." I was startled and thought (but did not say), "Well of course it is a function". But then, in the late sixties I was being indoctrinated/perverted into category theory by the likes of John Isbell and Peter Hilton, both of whom were briefly at Case Western Reserve University. (Also Paul Dedecker, who gave me a glimpse of Grothendieck's ideas).
Now, the idea that a binary operation is a function comes with the fact that it has a domain and a codomain, and specifically that the domain is the Cartesian square of the codomain. People who didn't think that a binary operation was a function had to introduce the idea of the universe (universal algebraists) or the underlying set (category theorists): you had to specify it separately and introduce terminology such as $(S,\times)$ to denote the structure. Wikipedia still does it mostly this way, and I am not about to start a revolution to get it to change its ways.
Groups
In the olden days, people thought of groups in this way:
 A group is a set $G$ with a binary operation denoted by juxtaposition that is closed on $G$, meaning that if $a$ and $b$ are any elements of $G$, then $ab$ is in $G$.
 The operation is associative, meaning that if $a,\ b,\ c\in G$, then $(ab)c=a(bc)$.
 The operation has a unity element, meaning an element $e$ for which for any element $a\in G$, $ae=ea=a$.
 For each element $a\in G$, there is an element $b$ for which $ab=ba=e$.
This is a better way to describe a group:
 A group consists of a nullary operation e, a unary operation inv, and a binary operation denoted by juxtaposition, all with the same codomain $G$. (A nullary operation is a map from a singleton set to a set and a unary operation is a map from a set to itself.)
 The value of e is denoted by $e$ and the value of inv$(a)$ is denoted by $a^{-1}$.

These operations are subject to the following equations, true for all $a,\ b,\ c\in G$:
 $ae=ea=a$.
 $aa^{-1}=a^{-1}a=e$.
 $(ab)c=a(bc)$.
This definition makes it clear that a group is a structure consisting of a set and three operations whose axioms are all equations. It was formulated by people in universal algebra but you still see the older form in texts.
The old form is not wrong, it is merely inelegant. With the old form, you have to prove the unity and inverses are unique before you can introduce notation. More important, making it clear that groups are defined by equations gives you a lot of theorems for free: you can construct products on the cartesian powers of the underlying set, quotients by congruence relations, and other things. (Of course, in AbAl those theorems will be stated later than when groups are defined because the book is for newbies and you want lots of examples before theorems.)
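To make the equational form concrete, here is a Python sketch that checks the three equations by brute force for one illustrative example I chose, the integers mod 5 under addition. The functions `op`, `e` and `inv` mirror the three operations in the definition; they are not from the text.

```python
from itertools import product

N = 5  # illustrative choice: the integers mod 5 under addition


def op(a, b):
    """The binary operation (written as juxtaposition ab in the text)."""
    return (a + b) % N


def e():
    """The nullary operation: it picks out the identity element."""
    return 0


def inv(a):
    """The unary operation a -> a^{-1}."""
    return (-a) % N


G = range(N)
identity_law = all(op(a, e()) == a and op(e(), a) == a for a in G)
inverse_law = all(op(a, inv(a)) == e() and op(inv(a), a) == e() for a in G)
associative_law = all(op(op(a, b), c) == op(a, op(b, c))
                      for a, b, c in product(G, repeat=3))
print(identity_law, inverse_law, associative_law)  # True True True
```

Because every axiom is an equation, checking a candidate group is just a matter of evaluating both sides; no existence claims need to be verified.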
References
 Three kinds of mathematical thinkers (G&G post)
 Technical meanings clash with everyday meanings (G&G post)
 Commonword names for technical concepts (G&G post)
 Renaming technical concepts (G&G post)
 Explaining higher math to beginners (G&G post)
 Visual Algebra II (G&G post)
 Monads for high school II: Lists (G&G post)
 The mystery of the prime numbers: a review (G&G post)
 Hersh, R. (1997a), "Math lingo vs. plain English: Double entendre". American Mathematical Monthly, volume 104, pages 48–51.
 Names (in abmath)
 Cognitive dissonance (in abmath)
Monads for high school II: Lists
The interactive examples in this post require installing Wolfram CDF player, which is free and works on most desktop computers using Firefox, Safari and Internet Explorer, but not Chrome. The source code is the Mathematica Notebook Monad.nb, which is available for free use under a Creative Commons Attribution-ShareAlike 2.5 License. The notebook can be read by CDF Player if you cannot make the embedded versions in this post work.
Introduction
This is the second part of a series of posts describing how I will lead up to introducing monads in my proposed ebook Abstracting Algebra (AbAl). It follows Monads for high school I. Comments in red are meta and mostly will not be included in the book.
Lists
A list is a specific kind of mathematical object. This is a reasonable specification for lists:
A list of length $n$ determines and is determined by what its first, second, $\ldots$, $n$th entries are.
In this post, lists will always be finite in length.
For doing rigorous proofs you need a precise definition of a list, such as a function from $\{1,2,…,n\}$ to a set, or a recursive definition. This book is not about proofs.
Terminology and representation
The most common way in the symbolic language of math to represent a finite list is to use a commadelimited expression in parentheses. For example, \[(4,4,2,8)\] is the list of length 4 whose first and second entries are both $4$, third entry $2$ and fourth entry $8$.
 The order matters and repetitions are allowed. For example, $(4,4,2,8)$, $(4,2,8)$ and $(4,2,4,8)$ are all different lists.
 Other words for lists are (finite) sequence, word, tuple and string.
 Many mathematicians would call $(4,4,2,8)$ a $4$-tuple.
 My Discrete math classnotes discusses the specification and the definition of lists (called tuples there) at length on pages 50ff. This section of AbAl will incorporate some of the information there.
 Some computer languages represent our list without the commas: $(4\,\,4\,\,2\,\,8)$.
 Mathematica represents it this way: $\{4,4,2,8\}$. This conflicts with the usual set notation, where the order does not matter and where repetitions are ignored — the set $\{4,4,2,8\}$ has three elements. But if you type Length[$\{4,4,2,8\}$] in Mathematica, you get the answer 4.
 A list of characters (alphabetical, numerical, or other symbols) can be represented by writing the characters down in order without spaces between them. For example $(a,a,c,d)$ would be written "aacd". This representation is referred to as a string or as a word in computing science. The string "4428" is the base-10 representation of the integer $4,428$. Of course, it is also the hexadecimal representation of the integer $17,448$.
 In the text, I will mostly use a cartouche representation: for example, $\boxed{1\ 2\ 3\ 4}$ is the list consisting of the first four positive integers in order.
 The cartouche is more in-your-face than the other representations I've listed and as far as I know is not used to mean anything else. I'm not sure I can give any better explanation for why I prefer it than that. Math is supposed to be explicit and precisely defined and justified by clear reasoning, but after all deciding which representation to use is not math, it is art.
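A few of the points above can be checked in Python, whose list literal `[4, 4, 2, 8]` is yet another notation for the same list (Python is my choice here, not something the text uses). The checks show that a list counts repetitions and respects order while a set does not, and confirm the base-10 and hexadecimal readings of "4428".

```python
xs = [4, 4, 2, 8]
print(len(xs))                        # 4: repeated entries count
print(len({4, 4, 2, 8}))              # 3: a set ignores repetitions
print(xs == [4, 2, 4, 8])             # False: order matters
print("".join(["a", "a", "c", "d"]))  # aacd: a list of characters as a string
print(int("4428"), int("4428", 16))   # 4428 17448
```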
Lists with entries from a given set
If $S$ is any set, finite or infinite, $\textrm{Lists}(S)$ denotes the set of all lists of finite length whose entries come from $S$. Thus the set $\textrm{Lists}(\{1,\ 2,\ 3\})$ contains:
 $\boxed{2\ 2\ 3\ 2\ 2\ 1}$,
 $\boxed{3\ 3\ 3\ 3}$,
 the list of length $42$ whose first entry is $3$ and every other entry is $1$,
 the empty list $\boxed{\vphantom{n}}$,
 the singleton lists $\boxed{1}$, $\boxed{2}$ and $\boxed{3}$, and
 an infinite number of other lists,
 but the list $\boxed{4\ 2\ 3}$ is not an element of $\textrm{Lists}(\{1,\ 2,\ 3\})$.
$\textrm{Lists}$ is a function from sets to sets. Its input is any set and its output is the set of all finitely long lists whose entries are from the input set. We will also use the similar function $\textrm{Lists}^+$ which takes a set to the set of nonempty lists with entries from the set.
Associativity
(Review from Monads for high school I.) If a binary operation is associative, then the operation is defined on any finite list of two or more inputs in its underlying set. For example, the sum of the list $\boxed{4\ 4\ 2\ 8}$ is 18. It follows from associativity that you can add it up as $(4+4)+(2+8)$, $4+(4+(2+8))$, $4+((4+2)+8)$, $(4+(4+2))+8$ or as $((4+4)+2)+8$. They all give the same answer. In other words, Plus is in fact an operation on lists of numbers. It is customary to extend associative binary operations to lists of length $0$ and $1$ by setting the value at the empty list to be the identity element of the operation, and the value at a one-element list to be its only entry. Thus Plus($\boxed{4\ 4\ 2\ 8}$)$=18$, Plus($\boxed{\ \vphantom{0} }$)$=0$, Times($\boxed{\ \vphantom{0} }$)$=1$ and Plus($\boxed{3 }$)$=3$.
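The five groupings listed above are all the ways of fully parenthesizing a list of four entries without changing their order. This Python sketch, with a hypothetical helper `all_bracketings`, computes the value of every grouping. With addition they all give 18; with subtraction, which is not associative, they do not agree.

```python
import operator


def all_bracketings(xs, op):
    """Values of op over every full parenthesization of the nonempty list xs (order kept)."""
    if len(xs) == 1:
        return [xs[0]]
    values = []
    for i in range(1, len(xs)):  # split point between the left and right subtrees
        for left in all_bracketings(xs[:i], op):
            for right in all_bracketings(xs[i:], op):
                values.append(op(left, right))
    return values


plus_values = all_bracketings([4, 4, 2, 8], operator.add)
print(len(plus_values), set(plus_values))  # 5 {18}
minus_values = all_bracketings([4, 4, 2, 8], operator.sub)
print(sorted(set(minus_values)))           # [-10, -6, 6, 10]
```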
Operations defined on finite lists
You can join two lists together in order to make one list.
The order matters. If you join $\boxed{5\ 7}$ to $\boxed{2\ 12\ 7 }$ in that order you get $\boxed{5\ 7\ 2\ 12\ 7}$.
Join is in fact an associative binary operation on lists. Example:
This means we can define an operation on lists of lists that joins all the lists inside together to make one list.
Notice the blue rectangle disappears when you do the operation. What I have defined here is a function that has a list of lists as input and a list of numbers as output.
The operation of joining lists to get a single list has a property shown by the drawing below (which will be interactive when I work on it some more). Start on the left with a list of lists of lists. The border colors distinguish the innermost lists, bordered in black, from the second level lists, in blue, and the outside list, bordered in green.
 There is only one outside list: It is a list of (blue) lists. That is the kind of list you can apply join to, so when you do you get a single blue list with five lists inside it (on the bottom of the diagram). "Join outside first" means "apply join to the outside list first".
 The single blue list on the bottom is again the kind of list you can apply join to, and when you do you get the lower list on the right end of the diagram.
 However, the green list also contains two lists each of which is a list of lists that you can apply join to. Apply it to both of them and you get the list at the top of the diagram.
 Again, that list is the kind you can apply join to and when you do you get the upper list on the right.
The two lists on the right are the same. That always happens, whatever lists you start with. (Try it with others, and include some singleton and empty lists while you are at it.)
You might not have thought of this property, and now that you see it, it may look like some sort of second-rate phenom to take note of. Or not. But in fact, it turns out that it means that our modest function $\textrm{Lists}^+$, which takes a set to the set of nonempty lists of its elements, is a monad. (So is $\textrm{Lists}$.) In order to say this we must define some other concepts, functor and natural transformation, and we have to verify a number of other properties of the $\textrm{Lists}^+$ function: it is not just a function, it is a functor on the category of sets, the join function is a natural transformation, and some other technicalities.
Once we do that, we can define what the algebras of the join monad are, and it turns out that they are exactly all the associative binary operations.
In other words:
The binary operation of join on nonempty lists is the mother of all associative binary operations.
But that will have to wait for the next post.
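One of the technicalities mentioned above, that join is a natural transformation, can at least be spot-checked. In this Python sketch, `lists_map` is a hypothetical name for what the functor does to a function: apply it to every entry. Naturality says that joining and then mapping gives the same result as mapping inside and then joining. One example is evidence, not a proof.

```python
def join(lists):
    """Concatenate a list of lists into a single list."""
    return [x for inner in lists for x in inner]


def lists_map(f, xs):
    """Lists applied to a function f: apply f to every entry of a list."""
    return [f(x) for x in xs]


def square(n):
    return n * n


ll = [[1, 2], [3], [4, 5]]
join_then_map = lists_map(square, join(ll))
map_then_join = join(lists_map(lambda xs: lists_map(square, xs), ll))
print(join_then_map)                   # [1, 4, 9, 16, 25]
print(join_then_map == map_then_join)  # True
```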
References
 Associative in Wikipedia.
 Binary operation in Wikipedia.
 Discrete math classnotes, by Charles Wells.
 Toposes, triples and theories, by Michael Barr and Charles Wells.
Monads for high school I
Notes for viewing
The interactive examples in this post require installing Wolfram CDF player, which is free and works on most desktop computers using Firefox, Safari and Internet Explorer, but not Chrome. The source code is the Mathematica Notebook associative.nb, which is available for free use under a Creative Commons Attribution-ShareAlike 2.5 License. The notebook can be read by CDF Player if you cannot make the embedded versions in this post work.
Monads in Abstracting Algebra
I've been working on first drafts (topic posts) of several sections of my proposed book Abstracting algebra (AbAl), concentrating on the ideas leading up to monads. This is going slowly because I want the book to be full of illustrations and interactive demos. I am writing the demos in Mathematica simultaneously with writing the text, and designing demos is very s l o w work. It occurred to me that I should write an outline of the path leading up to monads, using some of the demos I have already produced. This is the first of probably two posts about the thread.
 AbAl is intended to give people with a solid high school math background a mental picture of or way of thinking about the various levels of abstraction of high school algebra.
 This outline is not a "Topic post" as described in the AbAl page. In particular, it is not aimed at high school students! It is a guided tour of my current thoughts about a particular thread through the book.
 The AbAl page has a brief outline of the topics to be covered in the whole book. Perhaps it should also have a list of threads like this post.
Associativity
AbAl will have sections introducing functions and binary operations using pictures and demos (not outlined in this thread). The section on binary operations will introduce infix, prefix and postfix notation but will use trees (illustrated below) as the main display method. Then it will introduce associativity, using demos such as this one:
Using this computingscienceish tree notation makes it much easier to visualize what is happening (see Visible Algebra II), compared to, for example, \[(ab)(cd)=a(b(cd))=a((bc)d)=((ab)c)d=(a(bc))d\] In this equation, the abstract structure is hidden. You have to visualize doing the operation starting with the innermost parentheses and moving out. With the trees you can see the computation going up the tree.
I will give examples of associative binary operations that are not commutative using $2\times2$ matrices and endofunctions on finite sets such as the one below, which gives all the functions from a two-element set to itself.
 Note that each function is shown by a diagram, not by an arbitrary name such as "id" or "sw", which would add a burden to the memory for an example that occurs in one place in the book. (See structural notation in the Handbook.)
 The section on composition of functions will also look in some depth at permutations of a three-element set, anticipating a section on groups.
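The claim that composition of the functions from a two-element set to itself is associative but not commutative can be checked exhaustively. This Python sketch stores each function $f:\{0,1\}\to\{0,1\}$ as the pair $(f(0), f(1))$; that encoding and the name `compose` are my assumptions, not the book's notation.

```python
from itertools import product

# Each function f: {0, 1} -> {0, 1} is stored as the pair (f(0), f(1)).
FUNCS = list(product([0, 1], repeat=2))


def compose(g, f):
    """Return the function 'first f, then g' as a pair."""
    return (g[f[0]], g[f[1]])


associative = all(compose(compose(h, g), f) == compose(h, compose(g, f))
                  for f, g, h in product(FUNCS, repeat=3))
commutative = all(compose(g, f) == compose(f, g)
                  for f, g in product(FUNCS, repeat=2))
print(len(FUNCS), associative, commutative)  # 4 True False
```

For instance, with the constant function $(0,0)$ and the swap $(1,0)$, doing the constant first and then the swap gives $(1,1)$, while doing the swap first gives $(0,0)$.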
By introducing a mechanism for transforming trees of associative binary operations, you can demonstrate (as in the demo below) that any associative binary operation is defined on any list of two or more elements of its domain.
For example, applying addition to three numbers $a$, $b$ and $c$ is uniquely defined. This sort of demo gives an understanding of why you get that unique definition but it is not a proof, which requires formal induction. AbAl is not concerned with showing the reader how to prove math statements.
In this section I will also introduce the one-identity concept: the value of an associative binary operation on a single element $a$ is $a$. Thus applying addition or multiplication to $a$ alone gives $a$. (The reason for this is that you want an associative binary operation to be a unique quotient of the free associative binary operation. That will come up after we talk about some examples of monads.)
The one-identity property also implies that, for an associative binary operation with identity element, applying the operation to the empty list gives the identity element. Now we can say:
An associative binary operation with identity element is uniquely defined on any finite list of elements of its domain.
Thus, in prefix notation, $+(2,3)=5$, $+(2,3,5)=10$, $+(2)=2$ and $+()=0$. Similarly, $\times(2)=2$ and $\times()=1$.
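Here is a minimal Python sketch of extending a binary operation with identity to all finite lists. It folds the list starting from the identity element. The helper name `list_operation` is hypothetical, and this is only one way to realize the definition.

```python
import operator
from functools import reduce

def list_operation(op, identity):
    """Extend an associative binary operation with identity to all finite lists."""
    def apply(*args):
        # Folding from the identity makes () give the identity, and
        # (a,) give op(identity, a) == a (the one-identity property).
        return reduce(op, args, identity)
    return apply

plus = list_operation(operator.add, 0)
times = list_operation(operator.mul, 1)

print(plus(2, 3), plus(2, 3, 5), plus(2), plus())  # 5 10 2 0
print(times(2), times())                            # 2 1
```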
This fact suggests that the natural definition of addition, multiplication, and other associative binary operations is as functions from lists of elements of the domain to elements of the domain. This fits with our early intuition of addition from grade school, not to mention from Excel: addition is something you do to lists. That feeling (for me) is not so strong for multiplication; in many common business applications you multiply just two things, such as price and number sold. That's because multiplication usually combines things of different data types, whereas you usually add things of the same data type (not apples and oranges?).
That raises the question: Does every function taking lists to elements come from an associative binary operation? I will give an example that says no. But the next thing is to introduce joining lists (concatenation), where we discover that joining lists is itself an associative binary operation, so it too can be regarded as an operation on lists: in this case, lists of lists. This will turn out to give us a systematic way to define all associative binary operations by one mechanism, because it is an example of a monad. That is for the second installment of this outline.
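One way to test whether a function on lists fits together with joining is to compare applying it to a joined list with applying it to the list of its values on the pieces. The Python sketch below does this. The post does not say which counterexample it will use, so the alternating sum $a-b+c-\cdots$ is an illustration chosen here. All function names are hypothetical.

```python
from itertools import chain

def flatten(list_of_lists):
    """Join (concatenate) a list of lists into a single list."""
    return list(chain.from_iterable(list_of_lists))

def respects_joining(f, list_of_lists):
    """Check f(join(xss)) == f([f(xs) for xs in xss]) for one list of lists."""
    return f(flatten(list_of_lists)) == f([f(xs) for xs in list_of_lists])

def alternating_sum(xs):
    """a - b + c - d + ...: defined on every list, but not built from one associative operation."""
    return sum(x if i % 2 == 0 else -x for i, x in enumerate(xs))

xss = [[1, 2], [3, 4, 5]]
print(respects_joining(sum, xss))              # True: 15 both ways
print(respects_joining(alternating_sum, xss))  # False: 3 versus -5
```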
Visible algebra II
The interactive examples in this post require installing Wolfram CDF Player, which is free and works on most desktop computers using Firefox, Safari and Internet Explorer, but not Chrome. The code for the demos is in the Mathematica notebook algebra2.nb, which is available for free use under a Creative Commons Attribution-ShareAlike 2.5 License. The notebook can be read by CDF Player if you cannot make the embedded versions in this post work.
More about visible algebra
I have written about visible algebra in previous posts (see References). My ideas about the interface are constantly changing. Some new ideas are described here.
In the first place I want to make it clear that what I am showing in these posts is a simulation of a possible visible algebra system. I have not constructed any part of the system; these posts only show something about what the interface might look like. My practice in the last few years has been to throw out ideas, not construct completed documents or programs. (I am not saying how long I will continue to do this.) All these posts, Mathematica programs and abstractmath.org are available to reuse under a Creative Commons license.
Commutative and associative operations
Times and Plus are commutative and associative operations. They are usually defined as binary operations. A binary operation $*$ is said to be commutative if for all $x$ and $y$ in the underlying set of the operation, $x*y=y*x$, and it is associative if for all $x$, $y$ and $z$ in the underlying set of the operation, $(x*y)*z=x*(y*z)$.
It is far better to define a commutative and associative operation $*$ on some underlying set $S$ as an operation on any multiset of elements of $S$. A multiset is like a set in that its elements can be rearranged in any order, but unlike a set in that elements can be repeated, and a different number of repetitions of an element makes a different multiset. So for any particular multiset, the number of repetitions of each element is fixed. Thus $\{a,a,b,b,c\} = \{c,b,a,b,a\}$ but $\{a,a,b,b,c\}\neq\{c,b,a,b,c\}$. This means that the function (operation) Plus, for example, is defined on any multiset of numbers, and \[\mathbf{Plus}\{a,a,b,b,c\}=\mathbf{Plus} \{c,b,a,b,a\}\] but $\mathbf{Plus}\{a,a,b,b,c\}$ might not be equal to $\mathbf{Plus} \{c,b,a,b,c\}$.
This way of defining any associative and commutative operation comes from the theory of monads. An operation defined on all the multisets drawn from a particular set is necessarily commutative and associative if it satisfies some basic monad identities, the main one being that it commutes with union of multisets (which is defined in the way you would expect; if this irritates you, read the Wikipedia article on multisets). You don't have to impose any conditions specifically referring to commutativity or associativity. I expect to write further about monads in a later post.
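Python's `collections.Counter` behaves like a multiset: order is ignored and multiplicities are kept. The sketch below uses it to check the two multiset comparisons above and to compute Plus on a multiset. The sample values for $a$, $b$, $c$ and the names `multiset` and `plus_multiset` are hypothetical.

```python
from collections import Counter

def multiset(*elems):
    """A multiset of elems: order is forgotten, multiplicity is kept."""
    return Counter(elems)

def plus_multiset(m):
    """Plus on a multiset of numbers: each element counted with its multiplicity."""
    return sum(x * n for x, n in m.items())

a, b, c = 1, 10, 100  # hypothetical sample values
print(multiset(a, a, b, b, c) == multiset(c, b, a, b, a))  # True
print(multiset(a, a, b, b, c) == multiset(c, b, a, b, c))  # False
print(plus_multiset(multiset(a, a, b, b, c)),
      plus_multiset(multiset(c, b, a, b, c)))              # 122 221
```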
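The "commutes with union" condition can be spot-checked numerically. For multisets $M_1$ and $M_2$, the test is whether $\mathbf{op}(M_1 \uplus M_2) = \mathbf{op}\{\mathbf{op}(M_1), \mathbf{op}(M_2)\}$, where $\uplus$ adds multiplicities; in Python that is `Counter`'s `+`, not its `|`, which takes maximums. This is a minimal sketch with hypothetical names. The average is included as an illustrative non-example: it is defined on every nonempty multiset but fails the condition.

```python
from collections import Counter
from fractions import Fraction

def plus(m):
    """Plus on a multiset (Counter) of numbers."""
    return sum(x * n for x, n in m.items())

def mean(m):
    """Average of a nonempty multiset: defined on multisets, but does not commute with union."""
    return Fraction(plus(m), sum(m.values()))

def respects_union(op, m1, m2):
    """Check op(m1 + m2) == op({op(m1), op(m2)}), with + adding multiplicities."""
    return op(m1 + m2) == op(Counter([op(m1), op(m2)]))

m1 = Counter([1, 1, 2])
m2 = Counter([2, 5])
print(respects_union(plus, m1, m2))  # True: 11 both ways
print(respects_union(mean, m1, m2))  # False: 11/5 versus 29/12
```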
The input process for a visible algebra system should allow the full strength of this fact. You can attach as many inputs as you want to Times or Plus and you can move them around. For example, you can click on any input and move it to a different place in the following demo.
Other input notations might be suitable for different purposes. The example below shows how the inputs can be placed randomly in two dimensions (but preserving multiplicity). I experimented with making it show the variables slowly moving around inside the circle the way the fish do in that screensaver (which mesmerizes small children, by the way — never mind what it does to me), but I haven't yet made it work.
A visible algebra system might well allow tables to be input directly and added up (or multiplied), like the one below. Spreadsheets have such an operation. In particular, the spreadsheet operation does not insist that you apply it only as a binary operation to columns with two entries. By far the most natural way to define addition of numbers is as an operation on multisets of numbers.
Other operations
Operations that are associative but not commutative, such as matrix multiplication, can be defined the monad way as operations on finite lists (or tuples or vectors) of elements of the underlying set; for matrix multiplication, these are lists of matrices. The operation is automatically associative if you require it to preserve concatenation of lists and some other monad requirements.
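As a sketch of this idea for $2\times2$ matrices, the product of a whole list can be defined by folding from the identity matrix. The check below confirms, for one pair of lists, that the product of a concatenation is the product of the two partial products, while the matrices themselves do not commute. The tuple encoding and the names `matmul` and `mat_product` are hypothetical choices for this illustration.

```python
from functools import reduce

IDENTITY = ((1, 0), (0, 1))  # a 2x2 matrix encoded as ((a, b), (c, d))

def matmul(p, q):
    """Product of two 2x2 matrices."""
    return tuple(tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def mat_product(ms):
    """Product of a finite list of 2x2 matrices; the empty list gives the identity."""
    return reduce(matmul, ms, IDENTITY)

A = ((1, 1), (0, 1))
B = ((1, 0), (1, 1))
xs, ys = [A, B], [B, A, A]
print(mat_product(xs + ys) == matmul(mat_product(xs), mat_product(ys)))  # True
print(matmul(A, B) == matmul(B, A))                                      # False
```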
Some binary operations are neither commutative nor associative. Two such operations on numbers are Subtract and Power. Such operations are truly binary operations; there is no obvious way to apply them to other structures. They are only binary because the two inputs have different roles. This suggests that the inputs be given names, as in the examples below.
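In a programming language, giving the inputs names corresponds roughly to keyword-only arguments. Each input is identified by its role rather than its position, so the order in which you supply them no longer matters. This Python sketch uses the traditional names minuend/subtrahend and base/exponent; these parameter names are choices made here, not part of the post.

```python
def subtract(*, minuend, subtrahend):
    """Subtract with named inputs: the two inputs play different roles."""
    return minuend - subtrahend

def power(*, base, exponent):
    """Power with named inputs."""
    return base ** exponent

print(subtract(minuend=10, subtrahend=3), subtract(subtrahend=3, minuend=10))  # 7 7
print(power(base=2, exponent=3), power(exponent=3, base=2))                    # 8 8
```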
Later, I will write more about simplifying trees, solving the max area problem for rectangles surmounted by semicircles, and other things concerning this system of doing algebra.
References
Previous posts about visible algebra
 Making visible the abstraction in algebraic notation
 A visualization of a computation in tree form
 Visible algebra I
 Visible algebra I supplement
Other references