Please read this post at abstractmath.org. I originally posted the document here but some of the diagrams would not render, and I haven’t been able to figure out why. Sorry for having to redirect.
This post outlines some of the intellectual developments in the history of math. I call it a saga because it is like one:
Thousands of years ago, we figured out how to write down words and phrases in such a way that someone much later could read and understand them.
Naturally, we wanted to keep records of the number of horses the Queen owned, so we came up with various notations for numbers (number representing count). In some societies, these symbols were separate from the symbols used to represent words.
We discovered positional notation. We write $213$, which is based on a system: it means $2\times100+1\times10+3\times 1$. This notation encapsulates a particular computation of a number (its base-10 representation). (The expression $190+23$ is another piece of notation that encapsulates a computation that yields $213$.)
Compare that to the Roman notation $CCXIII$, which is an only partly organized jumble.
Try adding $CCXIII+CDXXIX$. (The answer is $DCXLII$.)
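One way to see how little the Roman notation helps with computation is that a program typically adds Roman numerals by converting them to counts, adding, and converting back. The following is a minimal sketch under that assumption; the function names are made up for this illustration and it handles only standard subtractive notation.

```python
# Value/symbol pairs in descending order, including the subtractive pairs.
ROMAN_VALUES = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]
SYMBOL_VALUE = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral):
    """Convert a Roman numeral in standard form to an integer."""
    total = 0
    for i, ch in enumerate(numeral):
        value = SYMBOL_VALUE[ch]
        # A smaller symbol written before a larger one is subtracted (as in CD, IX).
        if i + 1 < len(numeral) and value < SYMBOL_VALUE[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


def int_to_roman(n):
    """Convert a positive integer to a Roman numeral, greedily from the largest value."""
    parts = []
    for value, symbol in ROMAN_VALUES:
        while n >= value:
            parts.append(symbol)
            n -= value
    return "".join(parts)


def add_roman(x, y):
    """Add two Roman numerals by passing through their integer values."""
    return int_to_roman(roman_to_int(x) + roman_to_int(y))


print(add_roman("CCXIII", "CDXXIX"))  # DCXLII
```

The sketch never manipulates the Roman symbols directly during the addition, which is the point of the comparison with positional notation.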
Positional notation allowed us to create the straightforward method of addition involving adding single digits and carrying:
\[\overset{\hspace{6pt}1\phantom{1}}
{\frac
{\overset{\displaystyle{213}}{429}}{642}
}
\]
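The digit-by-digit method in the display above can be written out as a short program. This is a sketch that assumes the numbers are given as strings of base-10 digits; the name `add_with_carry` is invented for the illustration.

```python
def add_with_carry(a, b):
    """Add two base-10 digit strings column by column, right to left.

    Returns the digit string of the sum and the list of carries
    produced by each column (rightmost column first).
    """
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad the shorter number with leading zeros
    digits = []
    carries = []
    carry = 0
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))  # digit written under this column
        carry = total // 10             # carry passed to the next column
        carries.append(carry)
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits)), carries


print(add_with_carry("213", "429"))  # ('642', [1, 0, 0])
```

The only arithmetic facts the procedure needs are sums of single digits; everything else is manipulation of symbols by position.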
Measuring land requires multiplication, which positional notation also allows us to perform easily.
The invention of such algorithms (methodically manipulating symbols) made it easy to calculate with numbers.
We discovered geometry in ancient times, in laying out plots of land and designing buildings. We had a bunch of names for different shapes and for some of them we knew how to calculate their area, perimeter and other things.
Euclid showed how to construct new geometric figures from given ones using specific methods (ruler and compasses) that preserve some properties.
We can bisect a line segment (black) by drawing two circles (blue) centered at the endpoints with radius the length of the line segment. We then construct a line segment (red) between the points of intersection of the circles; the red segment intersects the given line segment at its midpoint. These constructions can be thought of as algorithms creating and acting on geometric figures rather than on symbols.
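For readers who like to check such a construction numerically, here is a sketch that imitates it in coordinates: find the two points where the circles meet, then intersect the line through them with the line through the endpoints. The coordinate formulas and function names are assumptions of this illustration; Euclid of course worked with the figures, not with coordinates.

```python
import math


def circle_intersections(c1, c2, r):
    """Two intersection points of circles of equal radius r centered at c1 and c2."""
    (x1, y1), (x2, y2) = c1, c2
    d = math.hypot(x2 - x1, y2 - y1)
    if d == 0 or d > 2 * r:
        raise ValueError("the circles do not meet in two points")
    along = d / 2                          # distance from c1 to the common chord
    h = math.sqrt(r * r - along * along)   # half the length of the common chord
    bx = x1 + along * (x2 - x1) / d
    by = y1 + along * (y2 - y1) / d
    ux, uy = -(y2 - y1) / d, (x2 - x1) / d  # unit vector perpendicular to c1c2
    return (bx + h * ux, by + h * uy), (bx - h * ux, by - h * uy)


def line_intersection(p1, p2, p3, p4):
    """Intersection of the line through p1, p2 with the line through p3, p4."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        raise ValueError("the lines are parallel")
    det12 = x1 * y2 - y1 * x2
    det34 = x3 * y4 - y3 * x4
    x = (det12 * (x3 - x4) - (x1 - x2) * det34) / denom
    y = (det12 * (y3 - y4) - (y1 - y2) * det34) / denom
    return x, y


def bisect_segment(a, b):
    """Midpoint of segment ab, found by the two-circle construction."""
    r = math.hypot(b[0] - a[0], b[1] - a[1])  # radius = length of the segment
    p, q = circle_intersections(a, b, r)       # the two points where the circles meet
    return line_intersection(a, b, p, q)       # where the red line crosses the black one


print(bisect_segment((1.0, 1.0), (3.0, 5.0)))  # approximately (2.0, 3.0)
```

The numerical check is no substitute for the proof by congruent triangles; it only confirms the construction on particular inputs.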

It is true that diagrams were drawn to represent line segments, triangles and so on.
But the diagrams are visualization helpers. The way we think about the process is that we are operating directly on the geometric objects to create new ones. We are thinking of the objects Platonically, although we don’t have to share Plato’s exact concept of their reality. It is enough to say we are thinking about the objects as if they were real.
Euclid came up with the idea that we should write down axioms that are true of these figures and constructions, so that we can systematically use the constructions
to prove theorems about figures using axioms and previously proved theorems. This provided documented reasoning (in natural language, not in symbols) for building up a collection of true statements about math objects.
After creating some tools for proving triangles are congruent, we can prove that the intersection of the red and black lines in the figure really is the midpoint of the black line by constructing the four green line segments below and making appeals to congruences between the triangles that show up:

Note that the green lines have the same length as the black line.
Euclid thought about axioms and theorems as applying to geometry, but he also proved theorems about numbers by representing them as ratios of line segments.
People in ancient India and Greece knew how to solve linear and quadratic equations using verbal descriptions of what you should do.
Later, we started using a symbolic language to express numerical problems and symbolic manipulation to solve (some of) them.
The quadratic formula is an encapsulated computation that provides the roots of a quadratic equation. Newton’s method is a procedure for finding a root of an arbitrary polynomial. It is recursive in the loose sense (it does not always give an answer).
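The contrast between the two can be sketched in code. The quadratic formula is a fixed computation; Newton's method is a loop that may or may not arrive at an answer. This is a minimal illustration, with the tolerance, step limit and function names chosen arbitrarily for the example.

```python
import math


def quadratic_roots(a, b, c):
    """Real roots of a*x**2 + b*x + c = 0 (a != 0) by the quadratic formula."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return ()  # no real roots
    s = math.sqrt(disc)
    return ((-b - s) / (2 * a), (-b + s) / (2 * a))


def newton_root(coeffs, x0, tol=1e-12, max_steps=50):
    """Try to find a real root of a polynomial by Newton's method.

    coeffs lists the coefficients from the highest power down.
    Returns an approximate root, or None if the iteration gives up.
    """
    degree = len(coeffs) - 1

    def value(x):
        v = 0.0
        for c in coeffs:
            v = v * x + c  # Horner's rule
        return v

    def derivative(x):
        v = 0.0
        for i, c in enumerate(coeffs[:-1]):
            v = v * x + c * (degree - i)
        return v

    x = x0
    for _ in range(max_steps):
        fx = value(x)
        if abs(fx) < tol:
            return x
        slope = derivative(x)
        if slope == 0:
            return None  # horizontal tangent: the next step is undefined
        x -= fx / slope
    return None  # did not settle within max_steps


print(quadratic_roots(1, -3, 2))       # (1.0, 2.0)
print(newton_root([1, 0, -2], 1.0))    # close to the square root of 2
print(newton_root([1, 0, 1], 1.0))     # None: x**2 + 1 has no real root
```

The last call shows the loose sense of "procedure" used here: the method can run and still fail to produce a root.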
The symbolic language is a vast expansion of the symbolic notation for numbers. A major innovation was to introduce variables to represent unknowns and to state equations that are always true.
Aristotle developed an early form of logic (syllogisms) aimed at determining which arguments in rhetoric were sound. “All men are mortal. Socrates is a man. Therefore Socrates is mortal.” This was written in sentences, not in symbols.
By explicit analogy with algebra, we introduced symbolism and manipulation rules for logical reasoning, with an eye toward making mathematical reasoning sound and to some extent computable. For example, in one dialect of logical notation, modus ponens (used in the Socrates syllogism) is expressed as $(P\rightarrow Q,\,P)\,\,\vdash\,\, Q$. This formula is an encapsulated algorithm: it says that if you know $P\rightarrow Q$ and $P$ are valid (are theorems) then $Q$ is valid as well.
We struggled with the notion of function as a result of dealing with infinite series. For example, the limit of a sequence of algebraic expressions may not be an algebraic expression. It would no longer do to think of a function as the same thing as an algebraic expression.
We realized that Euclid’s axioms for geometry lacked clarity. For example, as I understand it, the original version of his axioms didn’t imply that the two circles in the proof above had to intersect each other. There were other more subtle problems. Hilbert made a big effort to spell out the axioms in more detail.
We refined our understanding of logic by trying to deal with the mysteries of calculus, limits and spaces. An example is the difference between continuity and uniform continuity.
We also created infinitesimals, only to throw up our hands because we could not make a logic that fit them. Infinitesimals were temporarily replaced by the use of epsilon-delta methods.
We began to understand that there are different kinds of spaces. For example, there were other models of some of Euclid’s axioms than just Euclidean space, and some of those models showed that the parallel axiom is independent of the other axioms. And we became aware of many kinds of topological spaces and manifolds.
We started to investigate sets, in part because spaces have sets of points. Then we discovered that a perfectly innocent activity like considering the set of all sets resulted in an impossibility.
Several upsetting discoveries led us to consider how to understand the Axiom of Choice. For example, the Banach-Tarski “paradox” implies that you can rearrange the points in a sphere of radius $1$ to make two spheres of radius $1$.
These problems caused a kind of tightening up, or rigorizing.
For a period of time, less than a century, we settled into a standard way of practicing research mathematics called new math or modern math. Those names were used mostly by math educators. Research mathematicians might have called it axiomatic math based on set theory. Although I was around for the last part of that period I was not aware of any professional mathematicians calling it anything at all; it was just what we did.
First, we would come up with a new concept, type of math object, or a new theorem. In this creative process we would freely use intuition, metaphors, images and analogies.
We might come up with the idea that a function reaches its maximum when its graph swoops up from the left, then goes horizontally for an infinitesimal amount of time, then swoops down to the right. The point at which it was going horizontally would obviously have to be the maximum.
But when we came to publish a paper about the subject, all these pictures would disappear. All our visual, metaphorical/conceptual and kinetic feelings that explain the phenomenon would have to be suppressed.
Rigorizing consisted of a set of practices, which I will hint at:
Each mathematical object had to be defined in some way that started with a set and some other data defined in terms of the set. Axioms were imposed on these data. Everything had to be defined in terms of sets, including functions and relations. (Multiple sets were used occasionally.)
Definitions done in this way omit a lot of the intuition that we have concerning the object being defined.
Even so, definitions done in this way have an advantage: They tend to be close to minimal in the sense that to verify that something fits the definition requires checking no more (or not much more) than necessary.
First order logic (and other sorts of logic) were well developed and proofs were written in a way that they could in principle be reduced to arguments written in the notation of symbolic logic and following the rules of inference of logic. This resulted in proofs which did not appeal to intuition, metaphors or pictures.
In the case of the theorem that the maximum of a (differentiable) function occurs only where the derivative is zero, that meant epsilon-delta proofs in which the proof appeared as a thick string of symbols. Here, “thick” means it had superscripts, subscripts, and other things that gave the string a fractal dimension of about $1.2$ (just guessing!).
When I was a student at Oberlin College in 1959, Fuzzy Vance (Elbridge P. Vance) would sometimes stop in the middle of an epsilon-delta proof and draw pictures and provide intuition. Before he started that he would say “Shut the door, don’t tell anyone”. (But he told us!)
A more famous example of this is the story that Oscar Zariski, when presenting a proof in algebraic geometry at the board, would sometimes remind himself of a part of a proof by hunching over the board so the students couldn’t see what he was doing and drawing a diagram which he would immediately erase. (I fault him for not telling them about the diagram.)
It doesn’t matter whether this story is true or not. It is true in the sense that any good myth is true.
I wrote about rigor in these articles:
Rigorous view in abstractmath.org.
Dry bones, post in this blog.
The orthodox method of “define it by sets and axioms” and “make proofs at least resemble first order logic” clarified a lot of suspect proofs. But it got in the way of intuition, and excessive teaching by using proofs made it harder for students to learn.
The early methods described at the beginning of this post began to be used everywhere in math.
Algorithms, or methodical procedures, began with the addition and multiplication algorithms and Euclid’s ruler and compass constructions, but they began to be used everywhere.
They are applied to the symbols of math, for example to describe rules for calculating derivatives and integrals and for summing infinite series.
Algorithms are used on strings, arrays and diagrams of math symbols, for example concatenating lists, multiplying matrices, and calculating binary operations on trees.
Algorithms are used to define the strings that make up the notation of symbolic logic. Such definitions include something like: “If $E$ and $F$ are expressions then $(E)\land (F)$ and $(\forall x)(E)$ are expressions”. So if $E$ is “$x\geq 3$” then $(\forall x)(x\geq 3)$ is an expression. This had the effect of turning an expression in symbolic logic into a mathematical object. Deduction rules such as “$E\land F\vdash E$” also become mathematical objects in this way.
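A programmer would recognize this kind of definition as a data type. The sketch below treats expressions as objects built by the two formation rules quoted above, starting from atomic formulas; the class names, the `render` function and the treatment of atomic formulas as plain strings are all assumptions made for the illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    """An atomic formula, kept as an uninterpreted string such as 'x≥3'."""
    text: str


@dataclass(frozen=True)
class And:
    """The expression (E)∧(F) built from expressions E and F."""
    left: object
    right: object


@dataclass(frozen=True)
class ForAll:
    """The expression (∀x)(E) built from a variable name and an expression E."""
    var: str
    body: object


def render(e):
    """Write an expression object out as a string of symbols."""
    if isinstance(e, Atom):
        return e.text
    if isinstance(e, And):
        return f"({render(e.left)})∧({render(e.right)})"
    if isinstance(e, ForAll):
        return f"(∀{e.var})({render(e.body)})"
    raise TypeError("not an expression")


def and_elimination(e):
    """The deduction rule E∧F ⊢ E, as a function from expressions to expressions."""
    if not isinstance(e, And):
        raise ValueError("the rule applies only to a conjunction")
    return e.left


E = Atom("x≥3")
print(render(ForAll("x", E)))                    # (∀x)(x≥3)
print(render(and_elimination(And(E, Atom("x≤7")))))  # x≥3
```

Here both the expressions and the deduction rule are ordinary values that a program can inspect and transform, which is the sense in which they have become mathematical objects.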
We can define the symbols and expressions of algebra, calculus, and other parts of math using algorithms, too. This became a big deal when computer algebra programs such as Mathematica came in.
You can define the set $RP$ of real polynomials this way:
That is a recursive definition. You can also define polynomials by pattern recognition:
Let $n$ be a positive integer, $a_0,\,a_1,\,\ldots,\,a_n$ be real numbers and $k_0,\,k_1,\,\ldots,\,k_n$ be nonnegative integers. Then $a_0 x^{k_0}+a_1 x^{k_1}+\ldots+ a_n x^{k_n}$ is a polynomial.
The recursive version is a way of letting a compiler discover that a string of symbols is a polynomial. That sort of thing became a Big Deal when computers arrived in our world.
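To make the idea of recursive recognition concrete, here is a sketch built on one possible recursive definition, which is an assumption of this example and not necessarily the definition intended above: the zero polynomial is in $RP$, and if $p$ is in $RP$, $a$ is a real number and $k$ is a nonnegative integer, then $p+ax^k$ is in $RP$. The tuple encoding and function names are likewise invented for the illustration.

```python
ZERO = ("zero",)  # the zero polynomial, the base case of the assumed definition


def add_term(p, a, k):
    """Build p + a*x**k from a polynomial p, a coefficient a and an exponent k."""
    return ("plus", p, a, k)


def is_polynomial(obj):
    """Recursively check that obj was built by the two rules of the assumed definition."""
    if obj == ZERO:
        return True
    if isinstance(obj, tuple) and len(obj) == 4 and obj[0] == "plus":
        _, p, a, k = obj
        return (
            isinstance(a, (int, float))
            and isinstance(k, int)
            and k >= 0
            and is_polynomial(p)  # the recursive step
        )
    return False


def evaluate(p, x):
    """Value of the polynomial p at the number x, following the same recursion."""
    if p == ZERO:
        return 0
    _, rest, a, k = p
    return evaluate(rest, x) + a * x ** k


p = add_term(add_term(ZERO, 3, 2), -1, 0)   # 3x^2 - 1
print(is_polynomial(p), evaluate(p, 2))      # True 11
```

Both the recognizer and the evaluator follow the shape of the definition, one clause per rule, which is what makes recursive definitions convenient for programs.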
I am using the word “algorithm” in a loose sense to mean any computation that may or may not give a result. Computer programs are algorithms, but so is the quadratic formula. You might not think of a formula as an algorithm, but that is because if you use it in a computer program you just type in the formula; the language compiler has a built-in algorithm to execute calculations given by formulas.
It has not been clearly understood that mathematicians apply algorithms not only to symbols, but also directly to mathematical objects. Socrates thought that way long ago, as I described in the construction of a midpoint above. The procedure says “draw circles with center at the endpoints of the line segment.” It doesn’t say “draw pictures of circles…”
In the last section and this one, I am talking about how we think of applying an algorithm. Socrates thought he was talking about ideal lines and circles that exist in some other universe that we can access by thought. We can think about them as real things without making a metaphysical claim like Socrates did about them. Our brains are wired to think of abstract ideas in many of the same ways we think about physical objects.
The unit circle (as a topological space at least) is the quotient space of the space $\mathbb{R}$ of real numbers mod the equivalence relation defined by: $x\sim y$ if and only if $x-y$ is an integer.
Mathematicians who understand that construction may have various images in their mind when they read this. One would be something like imagining the real line $\mathbb{R}$ and then gluing all the points together that are an integer apart. This is a distinctly dizzying thing to think about but mathematicians aren’t worried because they know that taking the quotient of a space is a well-understood construction that works. They might check that by imagining the unit circle as the real line wrapped around an infinite number of times, with points an integer apart corresponding to the same point on the unit circle. (When I did that check I hastily inserted the parenthetical remark saying “as a topological space” because I realized the construction doesn’t preserve the metric.) The point of this paragraph is that many mathematicians think of this construction as a construction on math objects, not a construction on symbols.
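The wrapping picture can be checked numerically. The sketch below uses the map $x\mapsto(\cos 2\pi x,\sin 2\pi x)$, which is one standard way to realize the quotient and is an assumption of this illustration, as are the function names and the tolerance.

```python
import math


def to_circle(x):
    """Wrap the real number x around the unit circle, one full turn per unit length."""
    angle = 2 * math.pi * x
    return (math.cos(angle), math.sin(angle))


def equivalent(x, y, tol=1e-9):
    """True when x - y is an integer, up to floating-point tolerance."""
    diff = x - y
    return abs(diff - round(diff)) < tol


p = to_circle(0.25)
q = to_circle(3.25)   # 0.25 and 3.25 differ by an integer
print(equivalent(0.25, 3.25), math.dist(p, q) < 1e-9)   # True True
print(equivalent(0.25, 0.5))                             # False
```

Equivalent real numbers land on the same point of the circle and inequivalent ones on different points, which is what gluing together points an integer apart amounts to.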
A lot of concepts start out as semi-vague ideas and eventually get defined as mathematical objects.
The introduction of categories broke the orthodoxy of everything-is-a-set. Category theory has become widely used as a language in many branches of math. It started with problems in homological algebra arising in at least these two ways:
I expect to write about homotopy type theory soon. It may be the Next Revolution.
The interactive examples in this post require installing Wolfram CDF player, which is free and works on most desktop computers using Firefox, Safari and Internet Explorer, but not Chrome. The source code is the Mathematica Notebook SolvEq.nb, which is available for free use under a Creative Commons Attribution-ShareAlike 2.5 License. The notebook can be read by CDF Player if you cannot make the embedded versions in this post work.
This post explains some basic distinctions that need to be made about the process of writing and explaining math. Everyone who teaches math knows subconsciously what is happening here; I am trying to raise your consciousness. For simplicity, I have chosen a technique used in elementary algebra, but much of what I suggest also applies to more abstract college level math.
Solve the equation "$ax=b$" ($a\neq0$).
Understanding the statement of this problem requires a lot of Secret Knowledge (the language of ninth grade algebra) that most people don't have.
The expression "$ax$" means that $a$ and $x$ are numbers and $ax$ is their product. It is not the word "ax". You have to know that writing two symbols next to each other means multiply them, except when it doesn't mean multiply them as in "$\sin\,x$".
The whole expression "$ax=b$" ostensibly says that the number $ax$ is the same number as $b$. In fact, it means more than that. The phrase "solve the equation" tells you that in fact you are supposed to find the value of $x$ that makes $ax$ the same number as $b$.
How do you know that "solve the equation" doesn't mean find the value of $a$ that makes $ax$ the same number as $b$? Answer: The word "solve" triggers a convention that $x$, $y$ and $z$ are numbers you are trying to find and $a$, $b$, $c$ stand for numbers that you are allowed to plug in to the equation.
The conventions of symbolic math require that you give a solution for any nonzero value of $a$ and any value of $b$. You specifically are not allowed to pick $a=1$ and $b=33$ and find the value just for those numbers. (Some college calculus students do this with problems involving literal coefficients.)
The little thingy "$(a\neq0)$" must be read as a constraint on $a$. It does not mean that $a\neq0$ is a fact that you ought to know. (I've seen college math students make this mistake, admittedly in more complex situations.) Nor does it mean that you can't solve the problem if $a=0$ (you can if $b$ is also zero!).
So understanding what this problem asks, as given, requires (fairly sophisticated in some cases) pattern recognition both to understand the symbolic language it uses, and also to understand the special conventions of the mathematical English that it uses.
This problem could be reworded so that it gives an explicit description of the problem, not requiring pattern recognition. (Warning: "Not requiring pattern recognition" is a fuzzy concept.) Something like this:
You have two numbers $a$ and $b$. Find a number $c$ for which if you multiply $a$ by $c$ you get $b$.
This version is not completely explicit. It still requires understanding the idea of referring to a number by a letter, and it still requires pattern recognition to catch on that the two occurrences of each letter mean that their meanings have to match. Also, I know from experience that some American first year college students have trouble with the syntax of the sentence ("for which…", "if…").
The following version is more explicit, but it cheats by creating an ad hoc way to distinguish the numbers.
Alice and Bob each give you a number. How do you find a number with the property that Alice's number times your number is equal to Bob's number?
If the problem had a couple more variables it would be so difficult to understand in an explicit form that most people would have to draw a picture of the relationships between them. That is why algebraic notation was invented.
Algebra is a difficult foreign language. Showing the problem visually makes it easier to understand for most people. Our brain's visual processing unit is the most powerful tool the brain has to understand things. There are various ways to do this.
Visualization can help someone understand algebraic notation better.
You can state the problem by producing examples such as
where the reader has to know the multiplication symbol and, one hopes, will recognize "$\boxed{\text{??}}$" as "What's the value?". But the reader does not have to understand what it means to use letters for numbers, or that "$x$ means you are supposed to discover what it is". This way of writing an algebra problem is used in some software aimed at K-12 students. Some of them use a blank box instead of "$\boxed{\text{??}}$".
Such software often shows the algorithm for solving the problem visually, using algebraic notation like this:
I have put in some buttons to show numbers as well as $a$ and $b$. If you have access to Mathematica instead of just to CDF player, you can load SolvEq.nb and put in any numbers you want, but CDF's don't allow input data.
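For readers without CDF Player, the algorithm for $ax=b$ can also be written as a few lines of code. This sketch is not the Mathematica source in SolvEq.nb; the function names and the use of exact fractions are choices made for this illustration.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Solve a*x = b for x, reporting what happens when a is zero."""
    a, b = Fraction(a), Fraction(b)
    if a != 0:
        return ("unique", b / a)   # divide both sides by a
    if b == 0:
        return ("all", None)       # 0*x = 0 holds for every number x
    return ("none", None)          # 0*x = b with b nonzero has no solution


def solution_steps(a, b):
    """The value-preserving steps of the algorithm, written out as strings."""
    a, b = Fraction(a), Fraction(b)
    if a == 0:
        raise ValueError("the constraint a != 0 is violated")
    return [
        f"{a}*x = {b}",
        f"({a}*x)/{a} = {b}/{a}",   # divide both sides by a
        f"x = {b / a}",
    ]


print(solve_linear(3, 33))     # ('unique', Fraction(11, 1))
print(solution_steps(3, 33))   # ['3*x = 33', '(3*x)/3 = 33/3', 'x = 11']
```

The branch on `a != 0` is where the constraint in the problem statement shows up in the code: the program accepts any numbers, but the division step is legitimate only for nonzero $a$.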
You can also illustrate the algorithm using the tree notation for algebra I used in Monads for high school I (and other posts). The demo below shows how to depict the value-preserving transformation given by the algorithm. (In this case the value is the truth since the root operation is equals.)
This demo is not as visually satisfactory as the one illustrating the use of the associative law in Monads for high school I. For one thing, I had to cheat by reversing the placement of $a$ and $x$. Note that I put labels for the numerator and denominator legs, a practice I have been using in demos for a while for noncommutative operations. I await a new inspiration for a better presentation of this and other equation-solving algorithms.
Another advantage of using pictures is that you can often avoid having to code things as letters, which then have to be remembered. In Monads for high school I, I used drawings of the four functions from a two-element set to itself instead of assigning them letters. Even mnemonic letters such as $s$ for "switch" and $\text{id}$ for the identity element carry a burden that the picture dispenses with.
This post is a first pass at what will eventually be a section of abstractmath.org. It’s time to get back to abstractmath; I have been neglecting it for a couple of years.
What I say here is based mainly on my many years of teaching discrete mathematics at Case Western Reserve University in Cleveland and more recently at Metro State University in Saint Paul.
Beginning abstract math
College students typically get into abstract math at the beginning in such courses as linear algebra, discrete math and abstract algebra. Certain problems that come up in those early courses can be grouped together under the notion of (what I call) applying templates [note 0]. These are not the problems people usually think about concerning beginners in abstract math, of which the following is an incomplete list:
The students’ problems discussed here concern understanding what a template is and how to apply it.
Templates can be formulas, rules of inference, or mini-programs. I’ll talk about three examples here.
The template for quadratic equations
The solution of a real quadratic equation of the form $ax^2+bx+c=0$ is given by the formula
\[x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}\]
This is a template for finding the roots of the equation. It has subtleties.
For example, the numerator is symmetric in $a$ and $c$ but the denominator isn’t. So sometimes I try to trick my students (warning them ahead of time that that’s what I’m trying to do) by asking for a formula for the solution of the equation $cx^2+bx+a=0$. The answer is
\[x=\frac{-b\pm\sqrt{b^2-4ac}}{2c}\]
I start writing it on the board, asking them to tell me what comes next. When we get to the denominator, often someone says “$2a$”.
The template is telling you that the denominator is 2 times the coefficient of the square term. It is not telling you it is “$2a$”. Using a template (in the sense I mean here) requires pattern matching, but in this particular example, the quadratic template has a shallow incorrect matching and a deeper correct matching. In detail, the shallow matching says “match the letters” and the deep matching says “match the position of the letters”.
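The two ways of matching can be imitated in code. In this sketch a quadratic is a list of (coefficient, power) pairs in the order the terms are written; one matcher fills the template slots by order of appearance and the other by the role each term plays. The equation $3+5x+2x^2=0$ is a made-up example, and the function names are invented for the illustration.

```python
import math


def textual_match(terms):
    """Shallow matching: the first coefficient written is a, the second b, the third c."""
    (a, _), (b, _), (c, _) = terms
    return a, b, c


def functional_match(terms):
    """Deep matching: a is the coefficient of x**2, b of x, c the constant term."""
    by_power = {power: coeff for coeff, power in terms}
    return by_power.get(2, 0), by_power.get(1, 0), by_power.get(0, 0)


def apply_template(a, b, c):
    """Substitute into the quadratic formula (assumes real roots and a != 0)."""
    s = math.sqrt(b * b - 4 * a * c)
    return sorted([(-b - s) / (2 * a), (-b + s) / (2 * a)])


def residuals(terms, xs):
    """Plug candidate solutions back into the equation; all zeros means they are roots."""
    return [sum(coeff * x ** power for coeff, power in terms) for x in xs]


terms = [(3, 0), (5, 1), (2, 2)]   # 3 + 5x + 2x^2, written constant term first

right = apply_template(*functional_match(terms))
wrong = apply_template(*textual_match(terms))
print(right, residuals(terms, right))   # [-1.5, -1.0] [0.0, 0.0]
print(wrong)                            # one value is -1.0, the other is not a root
```

Because the numerator is symmetric in $a$ and $c$, the shallow matching gets the numerator right and only the denominator wrong, which is why one of its two answers happens to be a genuine root.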
Most of the time the quadratic being matched has particular numbers instead of the same letters that the template has, so the trap I just described seldom occurs. But this makes me want to try a variation of the trick: Find the solution of […]. Would some students match the textual position (getting […]) instead of the functional position (getting […])? [note 0] If they did they would get the solutions […] instead of […].
Substituting in algebraic expressions has other traps, too. What sorts of mistakes would students make solving […]?
Most students on the verge of abstract math don’t make mistakes with the quadratic formula that I have described. The thing about abstract math is that it uses more sophisticated templates.
The template for proof by induction
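This template gives a method of proof of a statement of the form $\forall n\,P(n)$, where $P$ is a predicate (presumably containing $n$ as a variable) and $n$ varies over positive integers. The template says:
Goal: Prove $\forall n\,P(n)$.
Method: Prove $P(1)$. Then, for an arbitrary integer $n>0$, assume $P(n)$ and deduce $P(n+1)$.
For example, to prove a particular statement of this form using the template, you have to prove that the predicate holds when $1$ is substituted for $n$, and that for any $n>0$, if the predicate holds for $n$, then it holds for $n+1$. You come up with the need to prove these statements by substituting into the template. This template has several problems that the quadratic formula does not have.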
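Substituting into the induction template can be imitated by passing a predicate to a function. The sketch below is a spot check on finitely many cases, not a proof, and the predicates in it (a formula for $1+2+\cdots+n$, and the false claim $n<5$) are hypothetical examples chosen for the illustration.

```python
def instantiate_induction(P, upto=20):
    """Substitute the predicate P into the induction template and spot-check it.

    Returns (base, step): whether P(1) holds, and whether P(n) implies P(n+1)
    for every n from 1 to upto - 1. Passing both checks is evidence, not a proof.
    """
    base = P(1)
    step = all((not P(n)) or P(n + 1) for n in range(1, upto))
    return base, step


def sum_formula(n):
    """The predicate '1 + 2 + ... + n equals n(n+1)/2'."""
    return sum(range(1, n + 1)) == n * (n + 1) // 2


def less_than_five(n):
    """The predicate 'n < 5', true for small n but false in general."""
    return n < 5


print(instantiate_induction(sum_formula))      # (True, True)
print(instantiate_induction(less_than_five))   # (True, False): the step fails at n = 4
```

Notice that the function takes a predicate as its argument while the predicate takes an integer, so two different types of variable are in play at once.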
Variables of different types
The variable $n$ is of type integer and the variable $P$ is of type predicate [note 0]. Having to deal with several types of variables comes up already in multivariable calculus (vectors vs. numbers, cross product vs. numerical product, etc.) and they multiply like rabbits in beginning abstract math classes. Students sometimes write things like “Let […]”. Multiple types is a big problem that math ed people don’t seem to discuss much (correct me if I am wrong).
Free and bound
The variable $n$ occurs as a bound variable in the Goal and a free variable in the Method. This happens in this case because the induction step in the Method originates as the requirement to prove $\forall n\,(P(n)\rightarrow P(n+1))$, but as I have presented it (which seems to be customary) I have translated this into a requirement based on modus ponens. This causes students problems, if they notice it. (“You are assuming what you want to prove!”) Many of them apparently go ahead and produce competent proofs without noticing the dual role of $n$. I say more power to them. I think.
The template has variations
It’s like you can go to the store and buy one template for quadratic equations, but you have to buy a package of templates for induction, like highway engineers used to buy packages of plastic French curves to draw highway curves without discontinuous curvature.
The template for row reduction
I am running out of time and won’t go into as much detail on this one. Row reduction is an algorithm. If you write it up as a proper computer program there have to be all sorts of if-thens depending on what you are doing it for. For example, if you want solutions to the simultaneous equations
| 2x+4y+z | = | 1 |
| x+2y | = | 0 |
| x+2y+4z | = | 5 |
you must row reduce the matrix
| 2 | 4 | 1 | 1 |
| 1 | 2 | 0 | 0 |
| 1 | 2 | 4 | 5 |
(I haven’t yet figured out how to wrap this in parentheses) which gives you
| 1 | 2 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 0 | 0 | 1 |
This introduces another problem with templates: They come with conditions. In this case the condition is “a row of three 0s followed by a nonzero number means the equations have no solutions”. (There is another condition when there is a row of all 0’s.)
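Here is what the calculation and the condition look like as a program. It is a sketch of one common way to organize row reduction, using exact fractions to avoid rounding; the function names are invented for the illustration, and the second function checks only the one condition quoted above.

```python
from fractions import Fraction


def rref(matrix):
    """Reduced row echelon form of a matrix given as a list of rows."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n_cols):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, n_rows) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        lead = rows[pivot_row][col]
        rows[pivot_row] = [x / lead for x in rows[pivot_row]]  # make the pivot 1
        for r in range(n_rows):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                # Clear the rest of the pivot column.
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return rows


def has_no_solutions(reduced):
    """The condition: some row is all zeros except for a nonzero last entry."""
    return any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in reduced)


augmented = [[2, 4, 1, 1], [1, 2, 0, 0], [1, 2, 4, 5]]
reduced = rref(augmented)
print([[int(x) for x in row] for row in reduced])  # [[1, 2, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(has_no_solutions(reduced))                   # True
```

The first function is the calculation; the second is the leaning back to see what the result means. A student can get the first right without ever running the second.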
It is very easy for the new student to get the calculation right but to never sit back and see what they have — which conditions apply or whatever.
When you do math you have to repeatedly lean in and focus on the details and then lean back and see the Big Picture. This is something that has to be learned.
What to do, what to do
I have recently experimented with being explicit about templates, in particular going through examples of the use of a template after explicitly stating the template. It is too early to say how successful this is. But I want to point out that even though it might not help to be explicit with students about templates, the analysis in this post of a phenomenon that occurs in beginning abstract math courses
Notes