This post has been replaced by the post A slow introduction to category theory.
Tag Archives: abstraction
Introducing abstract topics
I have been busy for the past several years revising abstractmath.org (abmath). Now I believe, perhaps foolishly, that most of the articles in abmath have reached beta, so now it is time for something new.
For some time I have been considering writing introductions to topics in abstract math, some typically studied by undergraduates and some taken by scientists and engineers. The topics I have in mind to do first include group theory and category theory.
The point of these introductions is to get the student started at the very beginning of the topic, when some students give up in total confusion. They meet and fall off of what I have called the abstraction cliff, which is discussed here and also in my blog posts Very early difficulties and Very early difficulties II.
I may have stolen the phrase “abstraction cliff” from someone else.
Group theory
Group theory sets several traps for beginning students.
Multiplication table
- A student may balk when a small finite group is defined using a set of letters in a multiplication table.
“But you didn’t say what the letters are or what the multiplication is?” - Such a definition is an abstract definition, in contrast to the definition of “prime”, for example, which is stated in terms of already known entities, namely the integers.
- The multiplication table of a group tells you exactly what the binary operation is and any set with an operation that makes such a table correct is an example of the group being defined.
- A student who has no understanding of abstraction is going to be totally lost in this situation. It is quite possible that the professor has never even mentioned the concept of abstract definition. The professor is probably like most successful mathematicians: when they were students, they understood abstraction without having to have it explained, and possibly without even noticing they did so.
Cosets
- Cosets are a real killer. Some students at this stage are nowhere near thinking of a set as an object or a thing. The concept of applying a binary operation on a pair of sets (or any other mathematical objects with internal structure) is completely foreign to them. Did anyone ever talk to them about mathematical objects?
- The consequence of this early difficulty is that such a student will find it hard to understand what a quotient group is, and that is one of the major concepts you get early in a group theory course.
- The conceptual problems with multiplication of cosets is similar to those with pointwise addition of functions. Given two functions $f,g:\mathbb{R}\to\mathbb{R}$, you define $f+g$ to be the function \[(f+g)(x):=f(x)+g(x)\] Along with pointwise multiplication, this makes the space of functions $\mathbb{R}\to\mathbb{R}$ a ring with nice properties.
- But you have to understand that each element of the ring is a function thought of as a single math object. The values of the function are properties of the function, but they are not elements of the ring. (You can include the real numbers in the ring as constant functions, but don’t confuse me with facts.)
- Similarly the elements of the quotient group are math objects called cosets. They are not elements of the original group. (To add to the confusion, they are also blocks of a congruence.)
Isomorphic groups
- Many books, and many professors (including me) regard two isomorphic groups as the same. I remember getting anguished questions: “But the elements of $\mathbb{Z}_2$ are equivalence classes and the elements of the group of permutations of $\{1,2\}$ are functions.”
- I admit that regarding two isomorphic groups as the same needs to be treated carefully when, unlike $\mathbb{Z}_2$, the group has a nontrivial automorphism group. ($\mathbb{Z}_3$ is “the same as itself” in two different ways.) But you don’t have to bring that up the first time you attack that subject, any more than you have to bring up the fact that the category of sets does not have a set of objects on the first day you define categories.
Category theory
Category theory causes similar troubles. Beginning college math majors don’t usually meet it early. But category theory has begun to be used in other fields, so plenty of computer science students, people dealing with databases, and so on are suddenly trying to understand categories and failing to do so at the very start.
The G&G post A new kind of introduction to category theory constitutes an alpha draft of the first part of an article introducing category theory following the ideas of this post.
Objects and arrows are abstract
- Every once in a while someone asks a question on Math StackExchange that shows they have no idea that an object of a category need not have elements and that morphisms need not be functions that take elements to elements.
- One questioner understood that the claim that a morphism need not be a function meant that it might be a multivalued function.
Duality
- That misunderstanding comes up with duality. The definition of dual category requires turning the arrows around. Even if the original morphism takes elements to elements, the opposite morphism does not have to take elements to elements. In the case of the category of sets, an arrow in $\text{Set}^{op}$ cannot take elements to elements — for example, the opposite of the function $\emptyset\to\{1,2\}$.
- The fact that there is a concrete category equivalent to $\text{Set}^{op}$ is a red herring. It involves different sets: the function corresponding to the function just mentioned goes from a four-element set to a singleton. But in the category $\text{Set}^{op}$ as defined it is simply an arrow, not a function.
Not understanding how to use definitions
- Some of the questioners on Math Stack Exchange ask how to prove a statement that is quite simple to prove directly from the definitions of the terms involved, but what they ask and what they are obviously trying to do is to gain an intuition in order to understand why the statement is true. This is backward — the first thing you should do is use the definition (at least in the first few days of a math class — after that you have to use theorems as well!
- I have discussed this in the blog post Insights into mathematical definitions (which gives references to other longer discussions by math ed people). See also the abmath section Rewrite according to the definitions.
How an introduction to a math topic needs to be written
The following list shows some of the tactics I am thinking of using in the math topic introductions. It is quite likely that I will conclude that some tactics won’t work, and I am sure that tactics I haven’t mentioned here will be used.
- The introductions should not go very far into the subject. Instead, they should bring an exhaustive and explicit discussion of how to get into the very earliest part of the topic, perhaps the definition, some examples, and a few simple theorems. I doubt that a group theory student who hasn’t mastered abstraction and what proofs are about will ever be ready to learn the Sylow theorems.
- You can’t do examples and definitions simultaneously, but you can come close by going through an example step by step, checking each part of the definition.
- When you introduce an axiom, give an example of how you would prove that some binary operation satisfies the axiom. For example, if the axiom is that every element of a group must have an inverse, right then and there prove that addition on the integers satisfies the axiom and disprove that multiplication on integers satisies it.
- When the definition uses some undefined math objects, point out immediately with examples that you can’t have any intuition about them except what the axioms give you. (In contrast to definition of division of integers, where you and the student already have intuitions about the objects.)
- Make explicit the possible problems with abstractmath.org and Gyre&Gimble) will indeed find it difficult to become mathematical researchers — but not impossible!
- But that is not the point. All college math professors will get people who will go into theoretical computing science, and therefore need to understand category theory, or into particle physics, and need to understand groups, and so on.
- By being clear at the earliest stages of how mathematicians actually do math, they will produce more people in other fields who actually have some grasp of what is going on with the topics they have studied in math classes, and hopefully will be willing to go back and learn some more math if some type of math rears its head in the theories of their field.
- Besides, why do you want to alienate huge numbers of people from math, as our way of teaching in the past has done?
- “Our” means grammar school teachers, high school teachers and college professors.
There is a real split between students who want the definitions first
(most of whom don’t have the abstraction problems I am trying to overcome)
and those who really really think they need examples first (the majority)
because they don’t understand abstraction.
Acknowledgment
Thanks to Kevin Clift for corrections.
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
Abstraction and axiomatic systems
Abstraction and the axiomatic method
This post will become an article in abstractmath.org.
Abstraction
An abstraction of a concept $C$ is a concept $C’$ with these properties:
- $C’$ includes all instances of $C$ and
- $C’$ is constructed by taking as axioms certain assertions that are true of all instances of $C$.
There are two major situations where abstraction is used in math.
- $C$ may be a familiar concept or property that has not yet been given a math definition.
- $C$ may already have a mathematical definition using axioms. In that case the abstraction will be a generalization of $C$.
In both cases, the math definition may allow instances of $C’$ that were not originally thought of as being part of $C$.
Example: Relations
Mathematicians have made use of relations between math objects since antiquity.
- For real numbers $r$ and $s$. “$r\lt x$” means that $r$ is less than $s$. So the statement “$5\lt 7$” is true, but the statement “$7\lt 5$” is false. We say that “$\lt$” is a relation on the real numbers. Other relations on real numbers denoted by symbols are “$=$” and “$\leq$”.
- Suppose $m$ and $n$ are positive integers. $m$ and $n$ are said to be relatively prime if the greatest common divisor of $m$ and $n$ is $1$. So $5$ and $7$ are relatively prime, but $15$ and $21$ are not relatively prime. So being relatively prime is a relation on positive integers. This is a relation that does not have a commonly used symbol.
- The concept of congruence of triangles has been used for a couple of millenia. In recent centuries it has been denoted by the symbol “$\cong$”. Congruence is a relation on triangles.
One could say that a relation is a true-or-false statement that can be made about a pair of math objects of a certain type. Logicians have in fact made that a formal definition. But when set theory came to be used around 100 years ago as a basis for all definitions in math, we started using this definition:
A relation on a set $S$ is a set $\alpha$ of ordered pairs of elements of $S$.
“$\alpha$” is the Greek letter alpha.
The idea is that if $(s,t)\in\alpha$, then $s$ is related by $\alpha$ to $t$, then $(s,t)$ is an element of $\alpha$, and if $s$ is not related by $\alpha$ to $t$, then $(s,t)$ is not an element of $\alpha$. That abstracts the everyday concept of relationship by focusing on the property that a relation either holds or doesn’t hold between two given objects.
For example, the less-than relation on the set of all real numbers $\mathbb{R}$ is the set \[\alpha:=\{(r,s)|r\in\mathbb{R}\text{ and }s\in\mathbb{R}\text{ and }r\lt s\}\] In other words, $r\lt s$ if and only if $(r,s)\in \alpha$.
Example
A consequence of this definition is that any set of ordered pairs is a relation. Example: Let $\xi:=\{(2,3),(2,9),(9,1),(9,2)\}$. Then $\xi$ is a relation on the set $\{1,2,3,9\}$. Your reaction may be: What relation IS it? Answer: just that set of ordered pairs. You know that $2\xi3$ and $2\xi9$, for example, but $9\xi1$ is false. There is no other definition of $\xi$.
Yes, the relation $\xi$ is weird. It is an arbitrary definition. It does not have any verbal description other than listing the element of $\xi$. It is probably useless. Live with it.
The symbol “$\xi$” is a Greek letter. It looks weird, so I used it to name a weird relation. Its upper case version is “$\Xi$”, which is even weirder. I pronounce “$\xi$” as “ksee” but most mathematicians call it “si” or “zi” (rhyming with “pie”).
Defining a relation as any old set of ordered pairs is an example of a reconstructive generalization.
$n$-ary relations
Years ago, mathematicians started coming up with things that were like relations but which involved more than two elements of a set.
Example
Let $r$, $s$ and $t$ be real numbers. We say that “$s$ is between $r$ and $t$” if $r\lt s$ and $s\lt t$. Then betweenness is a relation that is true or false about three real numbers.
Mathematicians now call this a ternary relation. The abstract definition of a ternary relation is this: A ternary relation on a set $S$ is a set of ordered triple of elements of $S$. This is an reconstructive generalization of the concept of relation that allows ordered triples of elements as well as ordered pairs of elements.
In the case of betweenness, we have to decide on the ordering. Let us say that the betweenness relation holds for the triple $(r,s,t)$ if $r\lt s$ and $s\lt t$. So $(4,5,7)$ is in the betweenness relation and $(4,7,5)$ is not.
You could argue that in the sentence, “$s$ is between $r$ and $t$”, the $s$ comes first, so that we should say that the betweenness relation (meaning $r$ is between $s$ and $t$) holds for $(r,s,t)$ if $s\lt r$ and $r\lt t$. Well, when you write an article you can write it that way. But I am writing this article.
Nowadays we talk about $n$-ary relations for any positive integer $n$. One consequence of this is that if we want to talk just about sets of ordered pairs we must call them binary relations.
When I was a child there was only one kind of guitar and it was called “a guitar”. (My older cousin Junior has a guitar, but I had only a plastic ukelele.) Some time in the fifties, electrically amplified guitars came into being, so we had to refer to the original kind as “acoustic guitars”. I was a teenager when this happened, and being a typical teenager, I was completely contemptuous of the adults who reacted with extreme irritation at the phrase “acoustic guitar”.
The axiomatic method
The axiomatic method is a technique for studying math objects of some kind by formulating them as a type of math structure. You take some basic properties of the kind of structure you are interested in and set them down as axioms, then deduce other properties (that you may or may not have already known) as theorems. The point of doing this is to make your reasoning and all your assumptions completely explicit.
Nowadays research papers typically state and prove their theorems in terms of math structures defined by axioms, although a particular paper may not mention the axioms but merely refer to other papers or texts where the axioms are given. For some common structures such as the real numbers and sets, the axioms are not only not referenced, but the authors clearly don’t even think about them in terms of axioms: they use commonly-known properties (or real numbers or sets, for example) without reference.
The axiomatic method in practice
Typically when using the axiomatic method some of these things may happen:
- You discover that there are other examples of this system that you hadn’t previously known about. This makes the axioms more broadly applicable.
- You discover that some properties that your original examples had don’t hold for some of the new examples. Depending on your research goals, you may then add some of those properties to the axioms, so that the new examples are not examples any more.
- You may discover that some of your axioms follow from others, so that you can omit them from the system.
Example: Continuity
A continuous function (from the set of real numbers to the set of real numbers) is sometimes described as a function whose graph you can draw without lifting your chalk from the board. This is a physical description, not a mathematical definition.
In the nineteenth century, mathematicians talked about continuous functions but became aware that they needed a rigorous definition. One possibility was functions given by formulas, but that didn’t work: some formulas give discontinuous functions and they couldn’t think of formulas for some continuous functions.
This description of nineteenth century math is an oversimplification.
Cauchy produced the definition we now use (the epsilon-delta definition) which is a rigorous mathematical version of the no-lifting-chalk idea and which included the functions they thought of as continuous.
To their surprise, some clever mathematicians produced examples of some weird continuous functions that you can’t draw, for example the sine blur function. In the terminology in the discussion of abstraction above, the abstraction $C’$ (epsilon-delta continuous functions) had functions in it that were not in $C$ (no-chalk-lifting functions.) On the other hand, their definition now applied to functions between some spaces besides the real numbers, for example the complex numbers, for which drawing the graph without lifting the chalk doesn’t even make sense.
Example: Rings
Suppose you are studying the algebraic properties of numbers. You know that addition and multiplication are both associative operations and that they are related by the distributive law: $x(y+z)=xy+xz$. Both addition and multiplication have identity elements ($0$ and $1$) and satisfy some other properties as well: addition forms a commutative group for example, and if $x$ is any number, then $0\cdot x=0$.
One way to approach this problem is to write down some of these laws as axioms on a set with two binary operations without assuming that the elements are numbers. In doing this, you are abstracting some of the properties of numbers.
Certain properties such as those in the first paragraph of this example were chosen to define a type of math structure called a ring. (The precise set of axioms for rings is given in the Wikipedia article.)
You may then prove theorems about rings strictly by logical deduction from the axioms without calling on your familiarity with numbers.
When mathematicians did this, the following events occurred:
- They discovered systems such as matrices whose elements are not numbers but which obey most of the axioms for rings.
- Although multiplication of numbers is commutative, multiplication of matrices is not commutative.
- Now they had to decide whether to require commutative of multiplication as an axioms for rings or not. In this example, historically, mathematicians decided not to require multiplication to be commutative, so (for example) the set of all $2\times 2$ matrices with real entries is a ring.
- They then defined a commutative ring to be a ring in which multiplication is commutative.
- You can prove from the axioms that in any ring, $0 x=0$ for all $x$, so you don’t need to include it as an axiom.
So the name “commutative ring” means the multiplication is commutative, because addition in rings is always commutative. Mathematical names are not always transparent.
Nowadays, all math structures are defined by axioms.
Other examples
- Historically, the first example of something like the axiomatic method is Euclid’s axiomatization of geometry. The axiomatic method began to take off in the late nineteenth century and now is a standard tool in math. For more about the axiomatic method see the Wikipedia article.
- Partitions. and equivalence
relationsare two other concepts that have been axiomatized. Remarkably, although the axioms for the two types of structures are quite different, every partition is in fact an equivalence relation in exactly one way, and any equivalence relation is a partition in exactly one way.
Remark
Many articles on the web about the axiomatic method emphasize the representation of the axiom system as a formal logical theory (formal system).
In practice, mathematicians create and use a particular axiom system as a tool for research and understanding, and state and prove theorems of the system in semi-formal narrative form rather than in formal logic.
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
A mathematical saga
This post outlines some of the intellectual developments in the history of math. I call it a saga because it is like one:
- It is episodic, telling one story after another.
- It does not try to give an accurate history, but its episodes resemble what happened in math in the last 3000 years.
- It tells about only a few of the things that happened.
Early techniques
We represented numbers by symbols.
Thousands of years ago, we figured out how to write down words and phrases in such a way that someone much later could read and understand them.
Naturally, we wanted to keep records of the number of horses the Queen owned, so we came up with various notations for numbers (number representing count). In some societies, these symbols were separate from the symbols used to represent words.
We invented algorithms
We discovered positional notation. We write $213$, which is based on a system: it means $2\times100+1\times10+3\times 1$. This notation encapsulates a particular computation of a number (its base-10 representation). (The expression $190+23$ is another piece of notation that encapsulates a computation that yields $213$.)
Compare that to the Roman notation $CCXIII$, which is an only partly organized jumble.
Try adding $CCXIII+CDXXIX$. (The answer is $DCXLII$.)
Positional notation allowed us to create the straightforward method of addition involving adding single digits and carrying:
\[\overset{\hspace{6pt}1\phantom{1}}
{\frac
{\overset{\displaystyle{213}}{429}}{642}
}
\]
Measuring land requires multiplication, which positional notation also allows us to perform easily.
The invention of such algorithms (methodically manipulating symbols) made it easy to calculate with numbers.
Geometry: Direct construction of mathematical objects
We discovered geometry in ancient times, in laying out plots of land and designing buildings. We had a bunch of names for different shapes and for some of them we knew how to calculate their area, perimeter and other things.
Euclid showed how to construct new geometric figures from given ones using specific methods (ruler and compasses) that preserve some properties.
Example
We can bisect a line segment (black) by drawing two circles (blue) centered at the endpoints with radius the length of the line segment. We then construct a line segment (red) between the points of intersection of the circle that intersects the given line segment at its midpoint. These constructions can be thought of as algorithms creating and acting on geometric figures rather than on symbols.
It is true that diagrams were drawn to represent line segments, triangles and so on.
But the diagrams are visualization helpers. The way we think about the process is that we are operating directly on the geometric objects to create new ones. We are thinking of the objects Platonically, although we don’t have to share Plato’s exact concept of their reality. It is enough to say we are thinking about the objects as if they were real.
Axioms and theorems
Euclid came up with the idea that we should write down axioms that are true of these figures and constructions, so that we can systematically use the constructions
to prove theorems about figures using axioms and previously proved theorems. This provided documented reasoning (in natural language, not in symbols) for building up a collection of true statements about math objects.
Example
After creating some tools for proving triangles are congruent, we can prove the the intersection of red and black lines in the figure really is the midpoint of the black line by constructing the four green line segments below and making appeals to congruences between the triangles that show up:
Note that the green lines have the same length as the black line.
Euclid thought about axioms and theorems as applying to geometry, but he also proved theorems about numbers by representing them as ratios of line segments.
Algebra
People in ancient India and Greece knew how to solve linear and quadratic equations using verbal descriptions of what you should do.
Later, we started using a symbolic language to express numerical problems and symbolic manipulation to solve (some of) them.
Example
The quadratic formula is an encapsulated computation that provides the roots of a quadratic equation. Newton’s method is a procedure for finding a root of an arbitrary polynomial. It is recursive in the loose sense (it does not always give an answer).
The symbolic language is a vast expansion of the symbolic notation for numbers. A major innovations was to introduce variables to represent unknowns and to state equations that are always true.
Logic
Aristotle developed an early form of logic (syllogisms) aimed at determining which arguments in rhetoric were sound. “All men are mortal. Socrates is a man. Therefore Socrates is mortal.” This was written in sentences, not in symbols.
By explicit analogy with algebra, we introduced symbolism and manipulation rules for logical reasoning, with an eye toward making mathematical reasoning sound and to some extent computable. For example, in one dialect of logical notation, modus ponens (used in the Socrates syllogism) is expressed as $(P\rightarrow Q,\,P)\,\,\vdash\,\, Q$. This formula is an encapsulated algorithm: it says that if you know $P\rightarrow Q$ and $P$ are valid (are theorems) then $Q$ is valid as well.
Crises of understanding
We struggled with the notion of function as a result of dealing with infinite series. For example, the limit of a sequence of algebraic expressions may not be an algebraic expression. It would no longer do to think of a function as the same thing as an algebraic expression.
We realized that Euclid’s axioms for geometry lacked clarity. For example, as I understand it, the original version of his axioms didn’t imply that the two circles in the proof above had to intersect each other. There were other more subtle problems. Hilbert made a big effort to spell out the axioms in more detail.
We refined our understanding of logic by trying to deal with the mysteries of calculus, limits and spaces. An example is the difference between continuity and uniform continuity.
We also created infinitesimals, only to throw up our hands because we could not make a logic that fit them. Infinitesimals were temporarily replaced by the use of epsilon-delta methods.
We began to understand that there are different kinds of spaces. For example, there were other models of some of Euclid’s axioms than just Euclidean space, and some of those models showed that the parallel axiom is independent of the other axioms. And we became aware of many kinds of topological spaces and manifolds.
We started to investigate sets, in part because spaces have sets of points. Then we discovered that a perfectly innocent activity like considering the set of all sets resulted in an impossibility.
We were led to consider how to understand the Axiom of Choice from several upsetting discoveries. For example, the Banach-Tarski “paradox” implies that you can rearrange the points in a sphere of radius $1$ to make two spheres of radius $1$.
Mathematics adopts a new covenant… for awhile
These problems caused a kind of tightening up, or rigorizing.
For a period of time, less than a century, we settled into a standard way of practicing research mathematics called new math or modern math. Those names were used mostly by math educators. Research mathematicians might have called it axiomatic math based on set theory. Although I was around for the last part of that period I was not aware of any professional mathematicians calling it anything at all; it was just what we did.
First, we would come up with a new concept, type of math object, or a new theorem. In this creative process we would freely use intuition, metaphors, images and analogies.
Example
We might come up with the idea that a function reaches its maximum when its graph swoops up from the left, then goes horizontally for an infinitesimal amount of time, then swoops down to the right. The point at which it was going horizontally would obviously have to be the maximum.
But when we came to publish a paper about the subject, all these pictures would disappear. All our visual, metaphorical/conceptual and kinetic feelings that explain the phenomenon would have to be suppressed.
Rigorizing consisted of a set of practices, which I will hint at:
Orthodox behavior among mathematicians in 1950
Definition in terms of sets and axioms
Each mathematical object had to be defined in some way that started with a set and some other data defined in terms of the set. Axioms were imposed on these data. Everything had to be defined in terms of sets, including functions and relations. (Multiple sets were used occasionally.)
Definitions done in this way omit a lot of the intuition that we have concerning the object being defined.
Examples
- The definition of group as a set with a binary operation satisfying some particular axioms does not tell you that groups constitute the essence of symmetry.
- The definitions of equivalence relation and of partition do not even hint that they define the same concept.
Even so, definitions done in this way have an advantage: They tend to be close to minimal in the sense that to verify that something fits the definition requires checking no more (or not much more) than necessary.
Proofs had to be clearly translatable into symbolic logic
First order logic (and other sorts of logic) were well developed and proofs were written in a way that they could in principle be reduced to arguments written in the notation of symbolic logic and following the rules of inference of logic. This resulted in proofs which did not appeal to intuition, metaphors or pictures.
Example
In the case of the theorem that the maximum of a (differentiable) function occurs only where the derivative is zero, that meant epsilon-delta proofs in which the proof appeared as a thick string of symbols. Here, “thick” means it had superscripts, subscripts, and other things that gave the string a fractal dimension of about $1.2$ (just guessing!).
Example
When I was a student at Oberlin College in 1959, Fuzzy Vance (Elbridge P. Vance) would sometimes stop in the middle of an epsilon-delta proof and draw pictures and provide intuition. Before he started that he would say “Shut the door, don’t tell anyone”. (But he told us!)
Example
A more famous example of this is the story that Oscar Zariski, when presenting a proof in algebraic geometry at the board, would sometimes remind himself of a part of a proof by hunching over the board so the students couldn’t see what he was doing and drawing a diagram which he would immediately erase. (I fault him for not telling them about the diagram.)
It doesn’t matter whether this story is true or not. It is true in the sense that any good myth is true.
Commercial
I wrote about rigor in these articles:
Rigorous view in abstractmath.org.
Dry bones, post in this blog.
Logic and sets clarify but get in the way
The orthodox method of “define it by sets and axioms” and “makes proofs at least resemble first order logic” clarified a lot of suspect proofs. But it got in the way of intuition and excessive teaching by using proofs made it harder to students to learn.
- The definition of a concept can make you think of things that are foreign to your intuition of the concept. A function is a mapping,. The ordered pairs are a secondary construction; you should not think of ordered pairs as essential to your intuition. Even so the definition of function in terms of ordered pairs got rid of a lot of cobwebs.
- The cartesian product of sets is obviously an associative binary operation. Except that if you define the cartesian product of sets in terms of ordered pairs then it is not associative.
- Not only that, but if you define the ordered pair $(a,b)$ as $\{\{a,b\},a\}$ the you have to say that $a$ is an element of $(a,b)$ but $b$ is not That is not merely an inconvenient definition of ordered pair, it is wrong. It is not bad way to show that the concept of ordered pair is consistent with ZF set theory, but that is a minor point mathematicians hardly ever worry about.
Mathematical methods applied to everything
The early methods described at the beginning of this post began to be used everywhere in math.
Algorithms on symbols
Algorithms, or methodical procedures, began with the addition and multiplication algorithms and Euclid’s ruler and compass constructions, but they began to be used everywhere.
They are applied to the symbols of math, for example to describe rules for calculating derivatives and integrals and for summing infinite series.
Algorithms are used on strings, arrays and diagrams of math symbols, for example concatenating lists, multiplying matrices, and calculating binary operations on trees.
Algorithms as definitions
Algorithms are used to define the strings that make up the notation of symbolic logic. Such definitions include something like: “If $E$ and $F$ are expressions than $(E)\land (F)$ and $(\forall x)(E)$ are expressions”. So if $E$ is “$x\geq 3$” then $(\forall x)(x\geq 3)$ is an expression. This had the effect of turning an expression in symbolic logic into a mathematical object. Deduction rules such as “$E\land F\vdash E$” also become mathematical objects in this way.
We can define the symbols and expressions of algebra, calculus, and other part of math using algorithms, too. This became a big deal when computer algebra programs such as Mathematica came in.
Example
You can define the set $RP$ of real polynomials this way:
- $0\in RP$
- If $p\in RP$ then $p+r x^n\in RP$, where $x$ is a variable and $n$ a nonnegative integer.
That is a recursive definition. You can also define polynomials by pattern recognition:
Let $n$ be a positive integer, $a_0,\,a_1\,\ldots a_n$ be real numbers and $k_0,\,k_1\,\ldots k_n$ be nonnegative integers. Then $a_0 x^{k_0}+a_1 x^{k_1}+\ldots+ a_n x^{k_n}$ is a polynomial.
The recursive version is a way of letting a compiler discover that a string of symbols is a polynomial. That sort of thing became a Big Deal when computers arrived in our world.
Algorithms on mathematical objects
I am using the word “algorithm” in a loose sense to mean any computation that may or may not give a result. Computer programs are algorithms, but so is the quadratic formula. You might not think of a formula as an algorithm, but that is because if you use it in a computer program you just type in the formula; the language compiler has a built-in algorithm to execute calculations given by formulas.
It has not been clearly understood that mathematicians apply algorithms not only to symbols, but also directly to mathematical objects. Socrates thought that way long ago, as I described in the construction of a midpoint above. The procedure says “draw circles with center at the endpoints of the line segment.” It doesn’t say “draw pictures of circles…”
In the last section and this one, I am talking about how we think of applying an algorithm. Socrates thought he was talking about ideal lines and circles that exist in some other universe that we can access by thought. We can think about them as real things without making a metaphysical claim like Socrates did about them. Our brains are wired to think of abstract ideas in some many of the same ways we think about physical objects.
Example
The unit circle (as a topological space at least) is the quotient space of the space $\mathbb{R}$ of real numbers mod the equivalence relation defined by: $x\sim y$ if and only if $x-y$ is an integer.
Mathematicians who understand that construction may have various images in their mind when they read this. One would be something like imagining the real line $\mathbb{R}$ and then gluing all the points together that are an integer apart. This is a distinctly dizzying thing to think about but mathematicians aren’t worried because they know that taking the quotient of a space is a well-understood construction that works. They might check that by imagining the unit circle as the real line wrapped around an infinite number of times, with points an integer apart corresponding to the same point on the unit circle. (When I did that check I hastily inserted the parenthetical remark saying “as a topological space” because I realized the construction doesn’t preserve the metric.) The point of this paragraph is that many mathematicians think of this construction as a construction on math objects, not a construction on symbols.
Everything is a mathematical object
A lot of concepts start out as semi-vague ideas and eventually get defined as mathematical objects.
Examples
- A function was originally thought of as a formula, but then get formalized in the days of orthodoxy as a set of ordered pairs with the functional property.
- The concept of equation has been turned into a math object many times, for example in universal algebra and in logic. I suspect that some practitioners in those fields might disagree with me. This requires further research.
- Propositions are turned into math objects by Boolean Algebra.
- Perhaps numbers were always thought of as math objects, but much later the set $\mathbb{N}$ of all natural numbers and the set $\mathbb{R}$ of all real numbers came to be thought of explicitly as math objects, causing some mathematicians to have hissy fits.
- Definitions are math objects. This has been done in various ways. A particular theory is a mathematical object, and it is essentially a definition by definition (!): Its models are what the theory defines. A particular example of “theory” is first-order theory which was the gold standard in the orthodox era. A classifying topos is also a math object that is essentially a definition.
Category Theory
The introduction of categories broke the orthodoxy of everything-is-a-set. It has become widely used as a language in many branches of math. It started with problems in homological algebra arising in at least these two ways:
- Homotopy classes of continuous functions are not functions in the set theory sense. So we axiomatized the concept of function as an arrow (morphism) in a category.
- The concept of mathematical object is axiomatized as an object in a category. This forces all properties of an object to be expressed in terms of its external relations with other objects and arrows.
- Categories capture the idea of “kind of math”. There is a category of groups and homomorphisms, a category of topological spaces and homeomorphisms, and so on. This is a new level of abstraction. Before, if someone said “I work in finite groups”, their field was a clear idea and people knew what they were talking about, but now the category of finite groups is a mathematical object.
- Homology maps one kind of math (topology) into another kind (algebra). Since categories capture the general notion of “kind of math”, we invented the idea of functor to capture the idea of modeling or representing one branch of math in another one. So Homology became a mathematical object.
- The concept of functor allowed the definition of natural transformation as a mathematical object. Before categories, naturality was only an informal idea.
Advantages of category theory
- Categories, in the form of toposes, quickly became candidates to replace set theory as a foundation system for math. They are more flexible and allow the kind of logic you want to use (classical, intuitionistic and others) to be a parameter in your foundational system.
- “Arrow” (morphism) replaced not only the concept of function but also the concept of “element of” (“$\in$”). It allows the concept of variable elements. (This link is to a draft of a section of abstractmath.org that has not been included in the table of contents yet.) It also requires that an “element” has to be an element of one single object; for example, the maps $1\to \mathbb{N}$ and $1\to \mathbb{R}$ that pick out the number $42$ are not the same maps, although of course they are related by the canonical inclusion map $\mathbb{N}\to\mathbb{R}$.
- Diagrams are used in proofs and give much better immediate understanding than formulas written in strings, which compress a lot of things unnecessarily into thick strings that require understanding lots of conventions and holding things in your memory.
- Categories-as-kinds-of-math makes it easy to turn an analogy, for example between products of groups and products of spaces, into two examples of the same kind of mathematical object: Namely, a product in a category.
Disadvantages of category theory
- Category theory requires a new way of thinking. Some people think that is a disadvantage. But genuine innovation is always disruptive. New technology breaks old technology. Of course, the new technology has to turn out to be useful to win out.
- Category theory has several notions of “equal”. Objects can be the same or isomorphic. Categories can be isomorphic or equivalent. When you are doing category theory, you should never worry about whether two objects are equal: that is considered evil. Category theorists generally ignored the fuzziness of this problem because you can generally get away with it. Still, it was an example of something that had not been turned into a mathematical definition. Two ways of accomplishing this are anafunctors and homotopy type theory.
I expect to write about homotopy type theory soon. It may be the Next Revolution.
Abstracting algebra
This post has been turned into a page on WordPress, accessible in the upper right corner of the screen. The page will be referred to by all topic posts for Abstracting Algebra.
Visible Algebra I
This is the first in a series of articles about how algebra could be implemented without using the standard language of algebra that so many people find difficult. The code for the graphs are in the Mathematica notebook Algebra1.nb.
An algebra problem
Suppose you are designing a window that is in the shape of a rectangle surmounted by a semicircle, shown above for the window with width 2 and rectangle height 3.
This example occurs in a tiresomely familiar calculus problem where you put a constraint on the perimeter of the window, thus turning it into a one-variable problem, then finding the values of the width and height that give the maximum area. In this post, I am not going to get that far. All I will do is come up with a calculation for the area. I will describe a way you might do it on a laptop five or ten years from now.
You have an algebra application that shows a screen with some operations that you may select to paste into your calculation. The ones we use are called plus, times, power, value and input. You choose a function called value, and label it "Area of window". You recognize that the answer is the sum of the areas of the rectangle and the area of the semicircle, so you choose plus and attach to it two inputs which you label "area of rectangle" and "area of semicircle", like this:
The notational metaphor is that the computation starts at the bottom and goes upward, performing the operations indicated.
You know (or are told by the system) that the area of a rectangle is the product of its width and height, so you replace the value called "area of rectangle" with a times button and attach two values called $w$ and $h$:
You also determine that the area under the semicircle is half the area of a circle of radius $r$ (where $r$ must be calculated).
You have a function for the area of a circle of radius $r$, so you attach that:
Finally, you use the fact that you know that the semicircle has a radius which is half the width of the rectangle.
Now, to make the calculation operational, you attach two inputs named "width" and "height" and feed them into the values $w$ and $h$. When you type numbers into these buttons, the calculation will proceed upward and finally show the area of the window at the top.
In a later post I will produce a live version of this diagram. (Added 2012-09-08: the live version is here.) Right now I want to get this post out before I leave for MathFest. (I might even produce the live version at MathFest, depending on how boring the talks are.)
You can see an example of a live calculation resembling this in my post A visualization of a computation in tree form.
Remarks
Who
- This calculation might be a typical exercise for a student part way along learning basic algebra.
- College students and scientists and engineers would have a system with a lot more built-in functions, including some they built themselves.
Syntax
- Once you have grasped the idea that the calculation proceed upward from the inputs, carrying out the operations shown, this picture is completely self-explanatory.
- Well, you have to know what the operations do.
- The syntax for standard algebra is much more difficult to learn (more later about this).
- The syntax actually used in later years may not look like mine.
- For one thing, the flow might run top down or left to right instead of bottom up.
- Or something very different might be used. What works best will be discovered by using different approaches.
- The syntax is fully two-dimensional, which makes it simple to understand (because it uses the most powerful tool our brain has: the visual system).
- The usual algebraic code was developed because people used pencil and paper.
- I would guess that the usual code has fractional dimension about 1.2.
- The tree syntax would require too much writing with pencil and paper. That is alleviated on a computer by using menus.
- Once you construct the computation and input some data it evaluates automatically.
- It may be worthwhile to use 3D syntax. I have an experiment with this in my post Showing categorical diagrams in 3D.
Later posts will cover related topics:
- The difficulties with standard algebraic notation. They are not trivial.
- Solving equations in tree form.
- Using properties such as associativity and commutativity in tree form.
- Using this syntax with calculus.
- The deep connection with Lawvere theories and sketches.
References
- A visualization of a computation in tree form (previous post)
- Making visible the abstraction in algebraic notation (previous post)
- Scrubbing Calculator by Bret Victor
- Showing categorical diagrams in 3D (previous post)
- Soulver
- The symbolic language of math (abstractmath)
- Up and down the ladder of abstraction by Brett Victor
Metaphors in computing science I
(This article is continued in Metaphors in computing science II)
Michael Barr recently told me of a transcription of a talk by Edsger Dijkstra dissing the use of metaphors in teaching programming and advocating that every program be written together with a proof that it works. This led me to think about the metaphors used in computing science, and that is what this post is about. It is not a direct answer to what Dijkstra said.
We understand almost anything by using metaphors. This is a broader sense of metaphor than that thing in English class where you had to say "my love is a red red rose" instead of "my love is like a red red rose". Here I am talking about conceptual metaphors (see references at the end of the post).
Metaphor: A program is a set of instructions
You can think of a program as a list of instructions that you can read and, if it is not very complicated, understand how to carry them out. This metaphor comes from your experience with directions on how to do something (like directions from Google Maps or for assembling a toy). In the case of a program, you can visualize doing what the program says to do and coming out with the expected output. This is one of the fundamental metaphors for programs.
Such a program may be informal text or it may be written in a computer language.
Example
A description of how to calculate $n!$ in English could be: "Multiply the integers $1$ through $n$". In Mathematica, you could define the factorial function this way:
fac[n_] := Apply[Times, Table[i, {i, 1, n}]]
This more or less directly copies the English definition, which could have been reworded as "Apply the Times function to the integers from $1$ to $n$ inclusive." Mathematica programmers customarily use the abbreviation "@@" for Apply because it is more convenient:
Fac[n_]:=Times @@ Table[i, {i, 1, 6}]
As far as I know, C does not have list operations built in. This simple program gives you the factorial function evaluated at $n$:
j=1; for (i=2; i<=n; i++) j=j*i; return j;
This does the calculation in a different way: it goes through the numbers $1, 2,\ldots,n$ and multiplies the result-so-far by the new number. If you are old enough to remember Pascal or Basic, you will see that there you could use a DO loop to accomplish the same thing.
What this metaphor makes you think of
Every metaphor suggests both correct and incorrect ideas about the concept.
- If you think of a list of instructions, you typically think that you should carry out the instructions in order. (If they are Ikea instructions, your experience may have taught you that you must carry out the instructions in order.)
- In fact, you don't have to "multiply the numbers from $1$ to $n$" in order at all: You could break the list of numbers into several lists and give each one to a different person to do, and they would give their answers to you and you would multiply them together.
- The instructions for calculating the factorial can be translated directly into Mathematica instructions, which does not specify an order. When $n$ is large enough, Mathematica would in fact do something like the process of giving it to several different people (well, processors) to speed things up.
- I had hoped that Wolfram alpha would answer "720" if I wrote "multiply the numbers from $1$ to $6$" in its box, but it didn't work. If it had worked, the instruction in English would not be translated at all. (Note added 7 July 2012: Wolfram has repaired this.)
- The example program for C that I gave above explicitly multiplies the numbers together in order from little to big. That is the way it is usually taught in class. In fact, you could program a package for lists using pointers (a process taught in class!) and then use your package to write a C program that looks like the "multiply the numbers from $1$ to $n$" approach. I don't know much about C; a reader could probably tell me other better ways to do it.
So notice what happened:
- You can translate the "multiply the numbers from $1$ to $n$" directly into Mathematica.
- For C, you have to write a program that implements multiplying the numbers from $1$ to $n$. Implementation in this sense doesn't seem to come up when we think about instruction sets for putting furniture together. It is sort of like: Build a robot to insert & tighten all the screws.
Thus the concept of program in computing science comes with the idea of translating the program instruction set into another instruction set.
- The translation provided above for Mathematica resembles translating the instruction set into another language.
- The two translations I suggested for C (the program and the definition of a list package to be used in the translation) are not like translating from English to another language. They involve a conceptual reconstruction of the set of instructions.
Similarly, a compiler translates a program in a computer language into machine code, which involves automated conceptual reconstruction on a vast scale.
Other metaphors
In writing about this, I have brought in other metaphors, for example:
- C or Mathematica as like a natural language in some ways
- Compiling (or interpreting) as translation
Computing science has used other VIM's (Very Important Metaphors) that I need to write about later:
- Semantics (metaphor: meaning)
- Program as text — this allows you to treat the program as a mathematical object
- Program as machine, with states and actions like automata and Turing machines.
- Specification of a program. You can regard "the product of the numbers from $1$ to $n$" as a specification. Notice that saying "the product" instead of "multiply" changes the metaphor from "instruction" to "specification".
References
Conceptual metaphors (Wikipedia)
Images and Metaphors (article in abstractmath)
Images and Metaphors for Sets (article in abstractmath)
Images and Metaphors for Functions (incomplete article in abstractmath)
An Elaborate Riemann Sums Demo
This post has been rewritten and posted on the abstractmath.org Articles Page as Elaborate Riemann Sum Demo.
Offloading abstraction
The interactive examples in this post require installing Wolfram CDF Player., which is free and works on most desktop computers using Firefox, Safari and Internet Explorer, but not Chrome. The source code is the Mathematica Notebook Tangent Line.nb, which is available for free use under a Creative Commons Attribution-ShareAlike 2.5 License. The notebook can be read by CDF Player if you cannot make the embedded versions in this post work.
The diagram above shows you the tangent line to the curve $y=x^3-x$ at a specific point. The slider allows you to move the point around, and the tangent line moves with it. You can click on one of the plus signs for options about things you can do with the slider. (Note: This is not new. Many other people have produced diagrams like this one.)
I have some comments to make about this very simple diagram. I hope they raise your consciousness about what is going on when you use a manipulable demonstration.
Farming out your abstraction load
A diagram showing a tangent line drawn on the board or in a paper book requires you visualize how the tangent line would look at other points. This imposes a burden of visualization on you. Even if you are a new student you won't find that terribly hard (am I wrong?) but you might miss some things at first:
- There are places where the tangent line is horizontal.
- There are places where some of the tangent lines cross the curve at another point. Many calculus students believe in the myth that the tangent line crosses the curve at only one point. (It is not really a myth, it is a lie. Any decent myth contains illuminating stories and metaphors.)
- You may not envision (until you have some experience anyway) how when you move the tangent line around it sort of rocks like a seesaw.
You see these things immediately when you manipulate the slider.
Manipulating the slider reduces the load of abstract thinking in your learning process. You have less to keep in your memory; some of the abstract thinking is offloaded onto the diagram. This could be described as contracting out (from your head to the picture) part of the visualization process. (Visualizing something in your head is a form of abstraction.)
Of course, reading and writing does that, too. And even a static graph of a function lowers your visualization load. What interactive diagrams give the student is a new tool for offloading abstraction.
You can also think of it as providing external chunking. (I'll have to think about that more…)
Simple manipulative diagrams vs. complicated ones
The diagram above is very simple with no bells and whistles. People have come up with much more complicated diagrams to illustrate a mathematical point. Such diagrams:
- May give you buttons that give you a choice of several curves that show the tangent line.
- May give a numerical table that shows things like the slope or intercept of the current tangent line.
- May also show the graph of the derivative, enabling you to see that it is in fact giving the value of the slope.
Such complicated diagrams are better suited for the student to play with at home, or to play with in class with a partner (much better than doing it by yourself). When the teacher first explains a concept, the diagrams ought to be simple.
Examples
- The Definition of derivative demo (from the Wolfram Demonstration Project) is an example that provides a table that shows the current values of some parameters that depend on the position of the slider.
- The Wolfram demo Graphs of Taylor Polynomials is a good example of a demo to take home and experiment extensively with. It gives buttons to choose different functions, a slider to choose the expansion point, another one to choose the number of Taylor polynomials, and other things.
- On the other hand, the Wolfram demo Tangent to a Curve is very simple and differs from the one above in one respect: It shows only a finite piece of the tangent line. That actually has a very different philosophical basis: it is representing for you the stalk of the tangent space at that point (the infinitesimal vector that contains the essence of the tangent line).
- Brian Hayes wrote an article in American Scientist containing a moving graph (it moves only on the website, not in the paper version!) that shows the changes of the population of the world by bars representing age groups. This makes it much easier to visualize what happens over time. Each age group moves up the graph — and shrinks until it disappears around age 100 — step by step. If you have only the printed version, you have to imagine that happening. The printed version requires more abstract visualization than the moving version.
- Evaluating an algebraic expression requires seeing the abstract structure of the expression, which can be shown as a tree. I would expect that if the students could automatically generate the tree (as you can in Mathematica) they would retain the picture when working with an expression. In my post computable algebraic expressions in tree form I show how you could turn the tree into an evaluation aid. See also my post Syntax trees.
This blog has a category "Mathematica" which contains all the graphs (many of the interactive) that are designed as an aid to offloading abstraction.
Prechunking
The emerging theory of how the brain works gives us a new language to us for discussing how we teach, learn and communicate math.
Modules
Our minds have many functionalities. They are implemented by what I called modules in Math and modules of the mind because I don’t understand very much about what cognitive scientists have learned about how these functionalities are carried out. They talk about a particular neuron, a collection of neurons, electrical charges flowing back and forth, and so on, and it appears there is no complete agreement about these ideas.
The functions the modules implement are physical structures or activities in the brain. At a certain level of abstraction we can ignore the mechanism.
Most modules carry out functionalities that are hidden from our consciousness.
- When we walk, the walking is carried out by a module that operates without our paying (much) attention to it.
- When we recognize someone, the identity of the person pops into our consciousness without us knowing how it got there. Indeed, we cannot introspect to see how the process was carried out; it is completely hidden.
Reasoning, for example if you add 56 and 49 in your head, has part of the process visible to your introspection, but not all of it. It uses modules such as the sum of 9 and 6 which feel like random access memory. When you carry the addition out, you (or at least I) are conscious of the carry: you are aware of it and aware of adding it to 9 to get 10.
Good places to find detailed discussion of this hiddenness are references [2] and [4] below.
Chunking
Math ed people have talked for years about the technique of chunking in doing math.
- You see an algebraic expression, you worry about how it might be undefined, you gray out all of it except the denominator and inspect that, and so on. (This should be the subject of a Mathematica demo.)
- You look at a diagram in the category of topological spaces. Each object in the diagram stands for a whole, even uncountably infinite, space with lots of open and closed subsets and so on, but you think of it just as a little pinpoint in the diagram to discover facts about its relationship with other spaces. You don’t look inside the space unless you have to to verify something.
Students have a hard time doing that. When an experienced mathematician does this, they are very likely to chunk subconsciously; they don’t think, “Now I am chunking”. Nevertheless, you can call it to their attention and they will be aware of the process.
There are modules that perform chunking whose operation you cannot be aware of even if you think about it. Here are two examples.
Example 1. Consider these two sentences from [2], p. 137:
- “I splashed next to the bank.”
- “There was a run on the bank.”
When you read the first one you visualize a river bank. When you read the second one you visualize a bank as an institution that handles money. If these two sentences were separated by a couple of paragraphs, or even a few words, in a text you are likely not to notice that you have processed the same word in two different ways. (When they are together as above it is kind of blatant.)
The point is the when you read each sentence your brain directly presents you with the proper image in each case (different ones as appropriate). You cannot recover the process that did that (by introspection, anyway).
Example 2. I discussed the sentence below in the Handbook. The sentence appears in references [3].
…Richard Darst and Gerald Taylor investigated the
differentiability of functions (which for our
purposes we will restrict to ) defined for
each by
In this sentence, the identical syntax appears twice; the first occurrence refers to the open interval from 0 to 1 and the second refers to the GCD of integers m and n. When I first inserted it into the Handbook’s citation list, I did not notice that (I was using it for another phenomenon, although now I have forgotten what it was). Later I noticed it. My mind preprocessed the two occurrences of the syntax and threw up two different meanings without my noticing it.
Of course, “restricting to (0, 1)” doesn’t make sense if (0, 1) means the GCD of 0 and 1, and saying “(m, n) = 1” doesn’t make sense if (m, n) is an interval. This preprocessing no doubted came to its two different conclusions based on such clues, but I claim that this preprocessing operated at a much deeper level of the brain than the preprocessing that results in your thinking (for example) of a topological space as a single unstructured object in a category.
This phenomenon could be called prechunking. It is clearly a different phenomenon that zooming in on a denominator and then zooming out on the whole expression as I described in [1].
This century’s metaphor
In the nineteenth century we came up with a machine metaphor for how we think. In the twentieth century the big metaphor was our brain is a computer. This century’s metaphor is that of a bunch a processes in our brain and in our body all working simultaneously, mostly out of our awareness, to enable us to live our life, learn things, and just as important (as Davidson [4] points out) to unlearn things. But don’t think we have Finally Discovered The Last Metaphor.
References
- Zooming and chunking in abstractmath.org.
- Mark Changizi, The vision revolution. Benbella Books, 2009.
- Mark Frantz, “Two functions whose powers make fractals”. American Mathematical Monthly, v 105, pp 609–617 (1998).
- Cathy N. Davidson, Now you see it. Viking Penguin, 2011. Chapters 1 and 2.
- Math and modules of the mind (previous post).
- Cognitive science in Wikipedia.
- Charles Wells, The handbook of mathematical discourse, Infinity Publishing Company, 2003.