Tag Archives: algorithm

Functions: Metaphors, Images and Representations

Please read this post at abstractmath.org. I originally posted the document here but some of the diagrams would not render, and I haven’t been able to figure out why. Sorry for having to redirect.

A mathematical saga

This post outlines some of the intellectual developments in the history of math. I call it a saga because it is like one:

  • It is episodic, telling one story after another.
  • It does not try to give an accurate history, but its episodes resemble what happened in math in the last 3000 years.
  • It tells about only a few of the things that happened.

Early techniques

We represented numbers by symbols.

Thousands of years ago, we figured out how to write down words and phrases in such a way that someone much later could read and understand them.

Naturally, we wanted to keep records of the number of horses the Queen owned, so we came up with various notations for numbers (number representing count). In some societies, these symbols were separate from the symbols used to represent words.

We invented algorithms

We discovered positional notation. We write $213$, which is based on a system: it means $2\times100+1\times10+3\times 1$. This notation encapsulates a particular computation of a number (its base-10 representation). (The expression $190+23$ is another piece of notation that encapsulates a computation that yields $213$.)

Compare that to the Roman notation $CCXIII$, which is an only partly organized jumble.
Try adding $CCXIII+CDXXIX$. (The answer is $DCXLII$.)

Positional notation allowed us to create the straightforward method of addition involving adding single digits and carrying:
\[\overset{\hspace{6pt}1\phantom{1}}
{\frac
{\overset{\displaystyle{213}}{429}}{642}
}
\]
Measuring land requires multiplication, which positional notation also allows us to perform easily.
The invention of such algorithms (methodically manipulating symbols) made it easy to calculate with numbers.
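The carrying method can be written out as a short Python sketch. The function name `add_decimal` and the choice to pass the numbers as digit strings are assumptions made for illustration; the point is only that the procedure manipulates digit symbols column by column.

```python
def add_decimal(x: str, y: str) -> str:
    """Add two nonnegative integers given as strings of decimal digits."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)   # pad the shorter one with leading zeros
    digits = []
    carry = 0
    # Work from the rightmost (units) column to the leftmost.
    for dx, dy in zip(reversed(x), reversed(y)):
        total = int(dx) + int(dy) + carry
        digits.append(str(total % 10))      # digit written under the column
        carry = total // 10                 # digit carried to the next column
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))


print(add_decimal("213", "429"))  # 642
```

Nothing comparable is available for $CCXIII+CDXXIX$ without first regrouping the Roman symbols.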

Geometry: Direct construction of mathematical objects

We discovered geometry in ancient times, in laying out plots of land and designing buildings. We had a bunch of names for different shapes and for some of them we knew how to calculate their area, perimeter and other things.

Euclid showed how to construct new geometric figures from given ones using specific methods (ruler and compasses) that preserve some properties.

Example

We can bisect a line segment (black) by drawing two circles (blue) centered at the endpoints with radius the length of the line segment. We then construct a line segment (red) between the two points where the circles intersect; it crosses the given line segment at its midpoint. These constructions can be thought of as algorithms creating and acting on geometric figures rather than on symbols.
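For comparison, here is a sketch of the same construction carried out on coordinates in Python. It necessarily works on symbols (pairs of numbers) rather than on the figures themselves, and the helper names are hypothetical. It finds the two circle intersections as the apexes of the equilateral triangles on the segment, then intersects the line through them with the original segment, without ever using the midpoint formula.

```python
import math


def rotate(v, angle):
    """Rotate the vector v counterclockwise by the given angle (radians)."""
    x, y = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)


def circle_intersections(p, q):
    """The two points whose distance from both p and q equals |pq|."""
    v = (q[0] - p[0], q[1] - p[1])
    up = rotate(v, math.pi / 3)       # apex of the equilateral triangle on one side
    down = rotate(v, -math.pi / 3)    # apex on the other side
    return (p[0] + up[0], p[1] + up[1]), (p[0] + down[0], p[1] + down[1])


def line_intersection(a, b, c, d):
    """Intersection of line ab with line cd (assumed not parallel)."""
    r = (b[0] - a[0], b[1] - a[1])
    s = (d[0] - c[0], d[1] - c[1])
    denom = r[0] * s[1] - r[1] * s[0]
    t = ((c[0] - a[0]) * s[1] - (c[1] - a[1]) * s[0]) / denom
    return (a[0] + t * r[0], a[1] + t * r[1])


def bisect_segment(p, q):
    """Point where the line through the circle intersections meets segment pq."""
    u, w = circle_intersections(p, q)
    return line_intersection(p, q, u, w)


m = bisect_segment((0.0, 0.0), (4.0, 2.0))
print(m)  # approximately (2.0, 1.0), up to rounding error
```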



It is true that diagrams were drawn to represent line segments, triangles and so on.
But the diagrams are visualization helpers. The way we think about the process is that we are operating directly on the geometric objects to create new ones. We are thinking of the objects Platonically, although we don’t have to share Plato’s exact concept of their reality. It is enough to say we are thinking about the objects as if they were real.

Axioms and theorems

Euclid came up with the idea that we should write down axioms that are true of these figures and constructions, so that we can systematically use the constructions
to prove theorems about figures using axioms and previously proved theorems. This provided documented reasoning (in natural language, not in symbols) for building up a collection of true statements about math objects.

Example

After creating some tools for proving triangles are congruent, we can prove that the intersection of the red and black lines in the figure really is the midpoint of the black line by constructing the four green line segments below and making appeals to congruences between the triangles that show up:



Note that the green lines have the same length as the black line.

Euclid thought about axioms and theorems as applying to geometry, but he also proved theorems about numbers by representing them as ratios of line segments.

Algebra

People in ancient India and Greece knew how to solve linear and quadratic equations using verbal descriptions of what you should do.
Later, we started using a symbolic language to express numerical problems and symbolic manipulation to solve (some of) them.

Example

The quadratic formula is an encapsulated computation that provides the roots of a quadratic equation. Newton’s method is a procedure for finding a root of an arbitrary polynomial. It is recursive in the loose sense (it does not always give an answer).
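The contrast between the two can be sketched in Python. The function names, the tolerance and the step limit below are assumptions chosen for illustration. The formula always returns after a fixed amount of work, while the Newton iteration may give up without an answer.

```python
import math


def quadratic_roots(a, b, c):
    """Real roots of a*x^2 + b*x + c = 0 by the quadratic formula (a != 0)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return ()                      # no real roots
    root = math.sqrt(disc)
    return ((-b - root) / (2 * a), (-b + root) / (2 * a))


def horner(coeffs, x):
    """Evaluate a polynomial whose coefficients are listed from highest degree down."""
    value = 0.0
    for c in coeffs:
        value = value * x + c
    return value


def newton(coeffs, x0, tol=1e-12, max_steps=50):
    """Look for a real root near x0; return None if none is found in max_steps."""
    degree = len(coeffs) - 1
    derivative = [c * (degree - i) for i, c in enumerate(coeffs[:-1])]
    x = x0
    for _ in range(max_steps):
        fx = horner(coeffs, x)
        if abs(fx) < tol:
            return x
        slope = horner(derivative, x)
        if slope == 0:
            return None                # horizontal tangent: the step is undefined
        x -= fx / slope
    return None


print(quadratic_roots(1, -3, 2))       # (1.0, 2.0)
print(newton([1, -3, 2], 0.0))         # close to 1.0
print(newton([1, 0, 1], 0.5))          # None: x^2 + 1 has no real root
```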

The symbolic language is a vast expansion of the symbolic notation for numbers. A major innovation was to introduce variables to represent unknowns and to state equations that are always true.

Logic

Aristotle developed an early form of logic (syllogisms) aimed at determining which arguments in rhetoric were sound. “All men are mortal. Socrates is a man. Therefore Socrates is mortal.” This was written in sentences, not in symbols.

By explicit analogy with algebra, we introduced symbolism and manipulation rules for logical reasoning, with an eye toward making mathematical reasoning sound and to some extent computable. For example, in one dialect of logical notation, modus ponens (used in the Socrates syllogism) is expressed as $(P\rightarrow Q,\,P)\,\,\vdash\,\, Q$. This formula is an encapsulated algorithm: it says that if you know $P\rightarrow Q$ and $P$ are valid (are theorems) then $Q$ is valid as well.
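In a modern proof assistant the same rule is literally a computation. The following Lean fragment is a sketch under the usual reading of a proof of $P\rightarrow Q$ as a function from proofs of $P$ to proofs of $Q$; the hypothesis names `hpq` and `hp` are arbitrary.

```lean
-- Modus ponens: from a proof of P → Q and a proof of P, obtain a proof of Q
-- by applying the first to the second.
example (P Q : Prop) (hpq : P → Q) (hp : P) : Q :=
  hpq hp
```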

Crises of understanding

We struggled with the notion of function as a result of dealing with infinite series. For example, the limit of a sequence of algebraic expressions may not be an algebraic expression. It would no longer do to think of a function as the same thing as an algebraic expression.

We realized that Euclid’s axioms for geometry lacked clarity. For example, as I understand it, the original version of his axioms didn’t imply that the two circles in the proof above had to intersect each other. There were other more subtle problems. Hilbert made a big effort to spell out the axioms in more detail.

We refined our understanding of logic by trying to deal with the mysteries of calculus, limits and spaces. An example is the difference between continuity and uniform continuity.
We also created infinitesimals, only to throw up our hands because we could not make a logic that fit them. Infinitesimals were temporarily replaced by the use of epsilon-delta methods.

We began to understand that there are different kinds of spaces. For example, there were other models of some of Euclid’s axioms than just Euclidean space, and some of those models showed that the parallel axiom is independent of the other axioms. And we became aware of many kinds of topological spaces and manifolds.

We started to investigate sets, in part because spaces have sets of points. Then we discovered that a perfectly innocent activity like considering the set of all sets resulted in an impossibility.

Several upsetting discoveries led us to reconsider how to understand the Axiom of Choice. For example, the Banach-Tarski “paradox” implies that you can rearrange the points in a sphere of radius $1$ to make two spheres of radius $1$.

Mathematics adopts a new covenant… for a while

These problems caused a kind of tightening up, or rigorizing.
For a period of time, less than a century, we settled into a standard way of practicing research mathematics called new math or modern math. Those names were used mostly by math educators. Research mathematicians might have called it axiomatic math based on set theory. Although I was around for the last part of that period I was not aware of any professional mathematicians calling it anything at all; it was just what we did.

First, we would come up with a new concept, type of math object, or a new theorem. In this creative process we would freely use intuition, metaphors, images and analogies.

Example

We might come up with the idea that a function reaches its maximum when its graph swoops up from the left, then goes horizontally for an infinitesimal amount of time, then swoops down to the right. The point at which it was going horizontally would obviously have to be the maximum.

But when we came to publish a paper about the subject, all these pictures would disappear. All our visual, metaphorical/conceptual and kinetic feelings that explain the phenomenon would have to be suppressed.

Rigorizing consisted of a set of practices, which I will hint at:

Orthodox behavior among mathematicians in 1950

Definition in terms of sets and axioms

Each mathematical object had to be defined in some way that started with a set and some other data defined in terms of the set. Axioms were imposed on these data. Everything had to be defined in terms of sets, including functions and relations. (Multiple sets were used occasionally.)

Definitions done in this way omit a lot of the intuition that we have concerning the object being defined.

Examples
  • The definition of group as a set with a binary operation satisfying some particular axioms does not tell you that groups constitute the essence of symmetry.
  • The definitions of equivalence relation and of partition do not even hint that they define the same concept.

Even so, definitions done in this way have an advantage: They tend to be close to minimal in the sense that to verify that something fits the definition requires checking no more (or not much more) than necessary.

Proofs had to be clearly translatable into symbolic logic

First order logic (and other sorts of logic) were well developed and proofs were written in a way that they could in principle be reduced to arguments written in the notation of symbolic logic and following the rules of inference of logic. This resulted in proofs which did not appeal to intuition, metaphors or pictures.

Example

In the case of the theorem that the maximum of a (differentiable) function occurs only where the derivative is zero, that meant epsilon-delta proofs in which the proof appeared as a thick string of symbols. Here, “thick” means it had superscripts, subscripts, and other things that gave the string a fractal dimension of about $1.2$ (just guessing!).

Example

When I was a student at Oberlin College in 1959, Fuzzy Vance (Elbridge P. Vance) would sometimes stop in the middle of an epsilon-delta proof and draw pictures and provide intuition. Before he started that he would say “Shut the door, don’t tell anyone”. (But he told us!)

Example

A more famous example of this is the story that Oscar Zariski, when presenting a proof in algebraic geometry at the board, would sometimes remind himself of a part of a proof by hunching over the board so the students couldn’t see what he was doing and drawing a diagram which he would immediately erase. (I fault him for not telling them about the diagram.)

It doesn’t matter whether this story is true or not. It is true in the sense that any good myth is true.

Commercial

I wrote about rigor in these articles:

Rigorous view in abstractmath.org.

Dry bones, post in this blog.

Logic and sets clarify but get in the way

The orthodox method of “define it by sets and axioms” and “make proofs at least resemble first order logic” clarified a lot of suspect proofs. But it got in the way of intuition, and excessive teaching by using proofs made it harder for students to learn.

  • The definition of a concept can make you think of things that are foreign to your intuition of the concept. A function is a mapping. The ordered pairs are a secondary construction; you should not think of ordered pairs as essential to your intuition. Even so, the definition of function in terms of ordered pairs got rid of a lot of cobwebs.
  • The cartesian product of sets is obviously an associative binary operation. Except that if you define the cartesian product of sets in terms of ordered pairs then it is not associative.
  • Not only that, but if you define the ordered pair $(a,b)$ as $\{\{a,b\},a\}$ then you have to say that $a$ is an element of $(a,b)$ but $b$ is not. That is not merely an inconvenient definition of ordered pair, it is wrong. It is not a bad way to show that the concept of ordered pair is consistent with ZF set theory, but that is a minor point mathematicians hardly ever worry about.
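Both complaints can be checked mechanically. The Python sketch below is only an illustration: it models sets with `frozenset`, uses Python tuples as a stand-in for ordered pairs in the cartesian product, and the function name `pair` is made up.

```python
from itertools import product


def pair(a, b):
    """The set-theoretic encoding (a, b) := {{a, b}, a}."""
    return frozenset({frozenset({a, b}), a})


p = pair(1, 2)
print(1 in p)   # True:  a is an "element" of the pair
print(2 in p)   # False: b is not

# Cartesian products built from ordered pairs, grouped two different ways.
A, B, C = {1}, {2}, {3}
left = set(product(product(A, B), C))    # contains ((1, 2), 3)
right = set(product(A, product(B, C)))   # contains (1, (2, 3))
print(left == right)                     # False: not associative on the nose
```

The two products are of course isomorphic in an obvious way, which is the intuition the definition fails to capture.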

Mathematical methods applied to everything

The early methods described at the beginning of this post began to be used everywhere in math.

Algorithms on symbols

Algorithms, or methodical procedures, began with the addition and multiplication algorithms and Euclid’s ruler and compass constructions, but they began to be used everywhere.

They are applied to the symbols of math, for example to describe rules for calculating derivatives and integrals and for summing infinite series.

Algorithms are used on strings, arrays and diagrams of math symbols, for example concatenating lists, multiplying matrices, and calculating binary operations on trees.

Algorithms as definitions

Algorithms are used to define the strings that make up the notation of symbolic logic. Such definitions include something like: “If $E$ and $F$ are expressions then $(E)\land (F)$ and $(\forall x)(E)$ are expressions”. So if $E$ is “$x\geq 3$” then $(\forall x)(x\geq 3)$ is an expression. This had the effect of turning an expression in symbolic logic into a mathematical object. Deduction rules such as “$E\land F\vdash E$” also become mathematical objects in this way.

We can define the symbols and expressions of algebra, calculus, and other parts of math using algorithms, too. This became a big deal when computer algebra programs such as Mathematica came in.

Example

You can define the set $RP$ of real polynomials this way:

  • $0\in RP$
  • If $p\in RP$ then $p+r x^n\in RP$, where $r$ is a real number, $x$ is a variable and $n$ is a nonnegative integer.

That is a recursive definition. You can also define polynomials by pattern recognition:

Let $n$ be a positive integer, $a_0,\,a_1,\,\ldots,\,a_n$ be real numbers and $k_0,\,k_1,\,\ldots,\,k_n$ be nonnegative integers. Then $a_0 x^{k_0}+a_1 x^{k_1}+\ldots+ a_n x^{k_n}$ is a polynomial.

The recursive version is a way of letting a compiler discover that a string of symbols is a polynomial. That sort of thing became a Big Deal when computers arrived in our world.
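Here is a sketch of how the recursive definition turns into a recognizer. The tuple encoding, with a tag `"zero"` for the first clause and `"plus"` for the second, is an assumption made for illustration; a real compiler would work on strings or parse trees.

```python
ZERO = ("zero",)                      # first clause: 0 is in RP


def add_term(p, r, n):
    """Second clause: from p in RP, form p + r*x^n."""
    return ("plus", p, float(r), int(n))


def is_rp(obj):
    """Recursively check that obj was built using only the two clauses."""
    if obj == ZERO:
        return True
    if isinstance(obj, tuple) and len(obj) == 4 and obj[0] == "plus":
        _, p, r, n = obj
        return isinstance(r, float) and isinstance(n, int) and n >= 0 and is_rp(p)
    return False


def evaluate(p, x):
    """Value of the polynomial p at the number x, following the same recursion."""
    if p == ZERO:
        return 0.0
    _, q, r, n = p
    return evaluate(q, x) + r * x**n


# 3x^2 - 1, built as ((0 + 3x^2) + (-1)x^0)
poly = add_term(add_term(ZERO, 3, 2), -1, 0)
print(is_rp(poly))                     # True
print(evaluate(poly, 2))               # 11.0
print(is_rp(("plus", ZERO, 1.0, -1)))  # False: negative exponent
```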

Algorithms on mathematical objects

I am using the word “algorithm” in a loose sense to mean any computation that may or may not give a result. Computer programs are algorithms, but so is the quadratic formula. You might not think of a formula as an algorithm, but that is because if you use it in a computer program you just type in the formula; the language compiler has a built-in algorithm to execute calculations given by formulas.

It has not been clearly understood that mathematicians apply algorithms not only to symbols, but also directly to mathematical objects. Socrates thought that way long ago, as I described in the construction of a midpoint above. The procedure says “draw circles with center at the endpoints of the line segment.” It doesn’t say “draw pictures of circles…”

In the last section and this one, I am talking about how we think of applying an algorithm. Socrates thought he was talking about ideal lines and circles that exist in some other universe that we can access by thought. We can think about them as real things without making a metaphysical claim like Socrates did about them. Our brains are wired to think of abstract ideas in many of the same ways we think about physical objects.

Example

The unit circle (as a topological space at least) is the quotient space of the space $\mathbb{R}$ of real numbers mod the equivalence relation defined by: $x\sim y$ if and only if $x-y$ is an integer.

Mathematicians who understand that construction may have various images in their mind when they read this. One would be something like imagining the real line $\mathbb{R}$ and then gluing all the points together that are an integer apart. This is a distinctly dizzying thing to think about but mathematicians aren’t worried because they know that taking the quotient of a space is a well-understood construction that works. They might check that by imagining the unit circle as the real line wrapped around an infinite number of times, with points an integer apart corresponding to the same point on the unit circle. (When I did that check I hastily inserted the parenthetical remark saying “as a topological space” because I realized the construction doesn’t preserve the metric.) The point of this paragraph is that many mathematicians think of this construction as a construction on math objects, not a construction on symbols.
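The wrapping check can be imitated numerically. This sketch assumes the usual wrapping map $x\mapsto(\cos 2\pi x,\,\sin 2\pi x)$, which the text does not name explicitly, and the helper names are hypothetical. It only tests a few sample points, so it illustrates the idea rather than proving anything.

```python
import math


def to_circle(x):
    """Send a real number x to the point of the unit circle at angle 2*pi*x."""
    angle = 2 * math.pi * x
    return (math.cos(angle), math.sin(angle))


def same_point(p, q, tol=1e-9):
    """Compare two points of the plane up to floating-point rounding."""
    return (math.isclose(p[0], q[0], abs_tol=tol)
            and math.isclose(p[1], q[1], abs_tol=tol))


# Numbers an integer apart land on the same point of the circle ...
print(same_point(to_circle(0.25), to_circle(3.25)))   # True
# ... while numbers that are not an integer apart do not.
print(same_point(to_circle(0.25), to_circle(0.75)))   # False
```

Note that $0.1$ and $0.9$ are $0.8$ apart on the line but much closer on the circle, which is one way to see that the metric is not preserved.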

Everything is a mathematical object

A lot of concepts start out as semi-vague ideas and eventually get defined as mathematical objects.

Examples

  • A function was originally thought of as a formula, but then got formalized in the days of orthodoxy as a set of ordered pairs with the functional property.
  • The concept of equation has been turned into a math object many times, for example in universal algebra and in logic. I suspect that some practitioners in those fields might disagree with me. This requires further research.
  • Propositions are turned into math objects by Boolean Algebra.
  • Perhaps numbers were always thought of as math objects, but much later the set $\mathbb{N}$ of all natural numbers and the set $\mathbb{R}$ of all real numbers came to be thought of explicitly as math objects, causing some mathematicians to have hissy fits.
  • Definitions are math objects. This has been done in various ways. A particular theory is a mathematical object, and it is essentially a definition by definition (!): Its models are what the theory defines. A particular example of “theory” is first-order theory which was the gold standard in the orthodox era. A classifying topos is also a math object that is essentially a definition.

Category Theory

The introduction of categories broke the orthodoxy of everything-is-a-set. It has become widely used as a language in many branches of math. It started with problems in homological algebra arising in at least these ways:

  • Homotopy classes of continuous functions are not functions in the set theory sense. So we axiomatized the concept of function as an arrow (morphism) in a category.
  • The concept of mathematical object is axiomatized as an object in a category. This forces all properties of an object to be expressed in terms of its external relations with other objects and arrows.
  • Categories capture the idea of “kind of math”. There is a category of groups and homomorphisms, a category of topological spaces and continuous maps, and so on. This is a new level of abstraction. Before, if someone said “I work in finite groups”, their field was a clear idea and people knew what they were talking about, but now the category of finite groups is a mathematical object.
  • Homology maps one kind of math (topology) into another kind (algebra). Since categories capture the general notion of “kind of math”, we invented the idea of functor to capture the idea of modeling or representing one branch of math in another one. So Homology became a mathematical object.
  • The concept of functor allowed the definition of natural transformation as a mathematical object. Before categories, naturality was only an informal idea.

Advantages of category theory

  • Categories, in the form of toposes, quickly became candidates to replace set theory as a foundation system for math. They are more flexible and allow the kind of logic you want to use (classical, intuitionistic and others) to be a parameter in your foundational system.
  • “Arrow” (morphism) replaced not only the concept of function but also the concept of “element of” (“$\in$”). It allows the concept of variable elements. (This link is to a draft of a section of abstractmath.org that has not been included in the table of contents yet.) It also requires that an “element” has to be an element of one single object; for example, the maps $1\to \mathbb{N}$ and $1\to \mathbb{R}$ that pick out the number $42$ are not the same maps, although of course they are related by the canonical inclusion map $\mathbb{N}\to\mathbb{R}$.
  • Diagrams are used in proofs and give much better immediate understanding than formulas written in strings, which compress a lot of things unnecessarily into thick strings that require understanding lots of conventions and holding things in your memory.
  • Categories-as-kinds-of-math makes it easy to turn an analogy, for example between products of groups and products of spaces, into two examples of the same kind of mathematical object: Namely, a product in a category.

Disadvantages of category theory

  • Category theory requires a new way of thinking. Some people think that is a disadvantage. But genuine innovation is always disruptive. New technology breaks old technology. Of course, the new technology has to turn out to be useful to win out.
  • Category theory has several notions of “equal”. Objects can be the same or isomorphic. Categories can be isomorphic or equivalent. When you are doing category theory, you should never worry about whether two objects are equal: that is considered evil. Category theorists generally ignored the fuzziness of this problem because you can generally get away with it. Still, it was an example of something that had not been turned into a mathematical definition. Two ways of accomplishing this are anafunctors and homotopy type theory.

I expect to write about homotopy type theory soon. It may be the Next Revolution.

Explaining math

The interactive examples in this post require installing Wolfram CDF player, which is free and works on most desktop computers using Firefox, Safari and Internet Explorer, but not Chrome. The source code is the Mathematica Notebook SolvEq.nb, which is available for free use under a Creative Commons Attribution-ShareAlike 2.5 License. The notebook can be read by CDF Player if you cannot make the embedded versions in this post work.

This post explains some basic distinctions that need to be made about the process of writing and explaining math.  Everyone who teaches math knows subconsciously what is happening here; I am trying to raise your consciousness.  For simplicity, I have chosen a technique used in elementary algebra, but much of what I suggest also applies to more abstract college level math.

An algebra problem

Solve the equation "$ax=b$" ($a\neq0$).

Understanding the statement of this problem requires a lot of Secret Knowledge (the language of ninth grade algebra) that most people don't have.

  • The expression "$ax$" means that $a$ and $x$ are numbers and $ax$ is their product. It is not the word "ax". You have to know that writing two symbols next to each other means multiply them, except when it doesn't mean multiply them as in "$\sin\,x$".

  • The whole expression "$ax=b$" ostensibly says that the number $ax$ is the same number as $b$.  In fact, it means more than that. The phrase "solve the equation" tells you that in fact you are supposed to find the value of $x$ that makes $ax$ the same number as $b$.

  • How do you know that "solve the equation" doesn't mean find the value of $a$ that makes $ax$ the same number as $b$? Answer: The word "solve" triggers a convention that $x$, $y$ and $z$ are numbers you are trying to find and $a$, $b$, $c$ stand for numbers that you are allowed to plug in to the equation.

  • The conventions of symbolic math require that you give a solution for any nonzero value of $a$ and any value of $b$.  You specifically are not allowed to pick $a=1$ and $b=33$ and find the value just for those numbers.  (Some college calculus students do this with problems involving literal coefficients.)

  • The little thingy "$(a\neq0)$" must be read as a constraint on $a$.  It does not mean that $a\neq0$ is a fact that you ought to know. (I've seen college math students make this mistake, admittedly in more complex situations.) Nor does it mean that you can't solve the problem if $a=0$ (you can if $b$ is also zero!).

So understanding what this problem asks, as given, requires (fairly sophisticated in some cases) pattern recognition both to understand the symbolic language it uses, and also to understand the special conventions of the mathematical English that it uses.
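Writing the problem as a program forces several of these conventions into the open: which letters are inputs, which one is the output, and what the constraint on $a$ is for. The following Python sketch is one way to do that. The function name and the tuple-shaped return values are assumptions for illustration, and exact fractions are used so that no rounding intrudes.

```python
from fractions import Fraction


def solve_linear(a, b):
    """Describe every number x for which a*x equals b."""
    a, b = Fraction(a), Fraction(b)
    if a != 0:
        return ("unique", b / a)       # the case the problem statement asks about
    if b == 0:
        return ("every number",)       # 0*x = 0 holds for all x
    return ("no solution",)            # 0*x = b is impossible when b != 0


print(solve_linear(3, 6))   # ('unique', Fraction(2, 1))
print(solve_linear(5, 2))   # ('unique', Fraction(2, 5))
print(solve_linear(0, 0))   # ('every number',)
print(solve_linear(0, 7))   # ('no solution',)
```

The program takes any $a$ and $b$, which is what the convention about literal coefficients demands; the constraint $a\neq0$ shows up as the branch that yields a single answer.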

Explicit descriptions

This problem could be reworded so that it gives an explicit description of the problem, not requiring pattern recognition.  (Warning: "Not requiring pattern recognition" is a fuzzy concept.)  Something like this:  

You have two numbers $a$ and $b$.  Find a number $c$ for which if you multiply $a$ by $c$ you get $b$.

This version is not completely explicit.  It still requires understanding the idea of referring to a number by a letter, and it still requires pattern recognition to catch on that the two occurrences of each letter mean that their meanings have to match. Also, I know from experience that some American first year college students have trouble with the syntax of the sentence ("for which…", "if…").

The following version is more explicit, but it cheats by creating an ad hoc way to distinguish the numbers.

Alice and Bob each give you a number.  How do you find a number with the property that Alice's number times your number is equal to Bob's number? 

If the problem had a couple more variables it would be so difficult to understand in an explicit form that most people would have to draw a picture of the relationships between them.  That is why algebraic notation was invented.

Visual descriptions

Algebra is a difficult foreign language.  Showing the problem visually makes it easier to understand for most people. Our brain's visual processing unit is the most powerful tool the brain has to understand things.  There are various ways to do this.  

Visualization can help someone understand algebraic notation better.  

You can state the problem by producing examples such as

  • $\boxed{3}\times\boxed{\text{??}}=\boxed{6}$ 
  • $\boxed{5}\times\boxed{\text{??}}=\boxed{2}$ 
  • $\boxed{42}\times\boxed{\text{??}}=\boxed{24}$

where the reader has to know the multiplication symbol and, one hopes, will recognize "$\boxed{\text{??}}$" as "What's the value?". But the reader does not have to understand what it means to use letters for numbers, or that "$x$ means you are supposed to discover what it is".  This way of writing an algebra problem is used in some software aimed at K-12 students.  Some of them use a blank box instead of "$\boxed{\text{??}}$".

Such software often shows the algorithm for solving the problem visually, using algebraic notation like this:

I have put in some buttons to show numbers as well as $a$ and $b$.  If you have access to Mathematica instead of just to CDF player, you can load SolvEq.nb and put in any numbers you want, but CDF's don't allow input data. 

You can also illustrate the algorithm using the tree notation for algebra I used in Monads for high school I  (and other posts). The demo below shows how to depict the value-preserving transformation given by the algorithm.  (In this case the value is the truth since the root operation is equals.)

This demo is not as visually satisfactory as the one illustrating the use of the associative law in Monads for high school I.  For one thing, I had to cheat by reversing the placement of $a$ and $x$.  Note that I put labels for the numerator and denominator legs, a practice I have been using in demos for a while for noncommutative operations.  I await a new inspiration for a better presentation of this and other equation-solving algorithms.

Another advantage of using pictures is that you can often avoid having to code things as letters which then have to be remembered.  In Monads for high school I, I used drawings of the four functions from a two-element set to itself instead of assigning them letters.  Even mnemonic letters such as $s$ for "switch" and $\text{id}$ for the identity element carry a burden that the picture dispenses with.

Templates in mathematical practice

This post is a first pass at what will eventually be a section of abstractmath.org. It’s time to get back to abstractmath; I have been neglecting it for a couple of years.

What I say here is based mainly on my many years of teaching discrete mathematics at Case Western Reserve University in Cleveland and more recently at Metro State University in Saint Paul.

Beginning abstract math

College students typically get into abstract math at the beginning in such courses as linear algebra, discrete math and abstract algebra. Certain problems that come up in those early courses can be grouped together under the notion of (what I call) applying templates [note 0]. These are not the problems people usually think about concerning beginners in abstract math, of which the following is an incomplete list:

The students’ problems discussed here concern understanding what a template is and how to apply it.

Templates can be formulas, rules of inference, or mini-programs. I’ll talk about three examples here.

The template for quadratic equations

The solution of a real quadratic equation of the form {ax^2+bx+c=0} is given by the formula

\displaystyle  x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}

This is a template for finding the roots of the equations. It has subtleties.

For example, the numerator is symmetric in {a} and {c} but the denominator isn’t. So sometimes I try to trick my students (warning them ahead of time that that’s what I’m trying to do) by asking for a formula for the solution of the equation {a+bx+cx^2=0}. The answer is

\displaystyle x=\frac{-b\pm\sqrt{b^2-4ac}}{2c}

I start writing it on the board, asking them to tell me what comes next. When we get to the denominator, often someone says “{2a}”.

The template is telling you that the denominator is 2 times the coefficient of the square term. It is not telling you it is “{a}”. Using a template (in the sense I mean here) requires pattern matching, but in this particular example, the quadratic template has a shallow incorrect matching and a deeper correct matching. In detail, the shallow matching says “match the letters” and the deep matching says “match the position of the letters”.

Most of the time the quadratic being matched has particular numbers instead of the same letters that the template has, so the trap I just described seldom occurs. But this makes me want to try a variation of the trick: Find the solution of {3+5x+2x^2=0}. Would some students match the textual position (getting {a=3}) instead of the functional position (getting {a=2})? [note 0] If they did they would get the solutions {(-1,-\frac{2}{3})} instead of {(-1,-\frac{3}{2})}.
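The two matchings can be compared in a short Python sketch. The function names are hypothetical. The first function matches by role, looking up each coefficient by the power of x it multiplies; the second matches by the order in which the numbers appear on the page.

```python
import math


def roots_by_role(coeff_of):
    """Deep matching: coeff_of maps each power of x (0, 1, 2) to its coefficient."""
    a, b, c = coeff_of[2], coeff_of[1], coeff_of[0]
    root = math.sqrt(b * b - 4 * a * c)      # assumes a nonnegative discriminant
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


def roots_by_text_order(first, second, third):
    """Shallow matching: treat the numbers as a, b, c in the order written."""
    a, b, c = first, second, third
    root = math.sqrt(b * b - 4 * a * c)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


# The equation 3 + 5x + 2x^2 = 0
print(roots_by_role({0: 3, 1: 5, 2: 2}))     # [-1.5, -1.0]
print(roots_by_text_order(3, 5, 2))          # -1 and -2/3: the wrong answer
```

The numerator comes out the same both ways, because it is symmetric in the first and last coefficients; only the denominator exposes the shallow matching.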

Substituting in algebraic expressions has other traps, too. What sorts of mistakes would students have solving {3x^2+b^2x-5=0}?

Most students on the verge of abstract math don’t make the mistakes with the quadratic formula that I have described. The thing about abstract math is that it uses more sophisticated templates:

  • subject to conditions
  • with variations
  • with extra levels of abstraction

The template for proof by induction

This template gives a method of proof of a statement of the form {\forall{n}\mathcal{P}(n)}, where {\mathcal{P}} is a predicate (presumably containing {n} as a variable) and {n} varies over positive integers. The template says:

Goal: Prove {\forall{n}\mathcal{P}(n)}.

Method:

  • Prove {\mathcal{P}(1)}
  • For an arbitrary integer {n\geq 1}, assume {\mathcal{P}(n)} and deduce {\mathcal{P}(n+1)}.

For example, to prove {\forall n (2^n+1\geq n^2)} using the template, you have to prove that {2^1+1\geq  1^2}, and that for any {n\geq 1}, if {2^n+1\geq n^2}, then {2^{n+1}+1\geq  (n+1)^2}. You come up with the need to prove these statements by substituting into the template. This template has several problems that the quadratic formula does not have.
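Substituting into the template can be imitated in Python by treating the predicate as a function. This is a sanity check on finitely many cases, not a proof, and the bound of 200 cases is an arbitrary choice.

```python
def P(n):
    """The predicate being proved: 2^n + 1 >= n^2."""
    return 2**n + 1 >= n**2


# First obligation from the template: the statement P(1).
base_case = P(1)

# Second obligation: P(n) implies P(n + 1), spot-checked for n = 1, ..., 199.
steps = all((not P(n)) or P(n + 1) for n in range(1, 200))

print(base_case, steps)  # True True
```

Notice that `P` and `n` are values of different kinds here, a function and an integer, so an assignment like `P = n + 1` would replace the predicate with a number.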

Variables of different types

The variable {n} is of type integer and the variable {\mathcal{P}} is of type predicate [Note 3]. Having to deal with several types of variables comes up already in multivariable calculus (vectors vs. numbers, cross product vs. numerical product, etc.) and they multiply like rabbits in beginning abstract math classes. Students sometimes write things like “Let {\mathcal{P}=n+1}”. Multiple types is a big problem that math ed people don’t seem to discuss much (correct me if I am wrong).

Free and bound

The variable {n} occurs as a bound variable in the Goal and a free variable in the Method. This happens in this case because the induction step in the Method originates as the requirement to prove {\forall  n(\mathcal{P}(n)\rightarrow\mathcal{P}(n+1))}, but as I have presented it (which seems to be customary) I have translated this into a requirement based on modus ponens. This causes students problems, if they notice it. (“You are assuming what you want to prove!”) Many of them apparently go ahead and produce competent proofs without noticing the dual role of {n}. I say more power to them. I think.

The template has variations

  • You can start the induction at other places.
  • You may have to have two starting points and a double induction hypothesis (for {n-1} and {n}). In fact, you will have to have two starting points, because it seems to be a Fundamental Law of Discrete Math Teaching that you have to talk about the Fibonacci function ad nauseam.
  • Then there is strong induction.

It’s like you can go to the store and buy one template for quadratic equations, but you have to buy a package of templates for induction, like highway engineers used to buy packages of plastic French curves to draw highway curves without discontinuous curvature.

The template for row reduction

I am running out of time and won’t go into as much detail on this one. Row reduction is an algorithm. If you write it up as a proper computer program there have to be all sorts of if-thens depending on what you are doing it for. For example, if you want solutions to the simultaneous equations

2x+4y+z = 1
x+2y = 0
x+2y+4z = 5

you must row reduce the matrix

2 4 1 1
1 2 0 0
1 2 4 5

(I haven’t yet figured out how to wrap this in parentheses) which gives you

1 2 0 0
0 0 1 0
0 0 0 1

This introduces another problem with templates: They come with conditions. In this case the condition is “a row of three 0s followed by a nonzero number means the equations have no solutions”. (There is another condition when there is a row of all 0’s.)
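As a rough sketch of what the algorithm plus its condition might look like in Python (the function names `rref` and `is_inconsistent` are my own, and this is one of many ways to organize the if-thens):

```python
from fractions import Fraction

def rref(rows):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        if pivot_row == len(m):
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        # Scale the pivot row so that the pivot entry is 1.
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        # Clear this column in every other row.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[pivot_row])]
        pivot_row += 1
    return m

def is_inconsistent(reduced):
    """True if some row is all 0s except for a nonzero last entry."""
    return any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in reduced)

augmented = [[2, 4, 1, 1],
             [1, 2, 0, 0],
             [1, 2, 4, 5]]
reduced = rref(augmented)
no_solutions = is_inconsistent(reduced)
print(reduced, no_solutions)
```

The calculation lives in `rref`, and the leaning back to see what you have lives in `is_inconsistent`. The program has to be told to do the second part, and so does the student.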

It is very easy for the new student to get the calculation right but to never sit back and see what they have — which conditions apply or whatever.

When you do math you have to repeatedly lean in and focus on the details and then lean back and see the Big Picture. This is something that has to be learned.

What to do, what to do

I have recently experimented with being explicit about templates, in particular going through examples of the use of a template after explicitly stating the template. It is too early to say how successful this is. But I want to point out that even though it might not help to be explicit with students about templates, the analysis in this post of a phenomenon that occurs in beginning abstract math courses

  • may still be accurate (or not), and
  • may help teachers teach such things if they are aware of the phenomenon, even if the students are not.

Notes

  1. Many years ago, I heard someone use the word “template” in the way I am using it now, but I don’t recollect who it was. Applied mathematicians sometimes use it with a meaning similar to mine to refer to soft algorithms–recipes for computation that are not formal algorithms but close enough to be easily translated into a sufficiently high level computer language.
  2. In the formula {ax^2+bx+c}, the “{a}” has the first textual position, but its functional position is that of the coefficient of the quadratic term. This name “functional position” has nothing to do with functions. Can someone suggest a different name that won’t confuse people?
  3. I am using “variable” the way logicians do. Mathematicians would not normally refer to “{\mathcal{P}}” as a variable.
  4. I didn’t say anything about how templates can involve extra layers of abstraction. That will have to wait.