Abstracting algebra

This post has been turned into a page on WordPress, accessible in the upper right corner of the screen.  The page will be referred to by all topic posts for Abstracting Algebra.

 

Visible Algebra I

This is the first in a series of articles about how algebra could be implemented without using the standard language of algebra that so many people find difficult. The code for the graphs is in the Mathematica notebook Algebra1.nb.

An algebra problem

Suppose you are designing a window that is in the shape of a rectangle surmounted by a semicircle, shown above for the window with width 2 and rectangle height 3. 

This example occurs in a tiresomely familiar calculus problem where you put a constraint on the perimeter of the window, thus turning it into a one-variable problem, and then find the values of the width and height that give the maximum area.  In this post, I am not going to get that far.  All I will do is come up with a calculation for the area.  I will describe a way you might do it on a laptop five or ten years from now. 

You have an algebra application that shows a screen with some operations that you may select to paste into your calculation.  The ones we use are called plus, times, power, value and input. You choose the operation called value, and label it "Area of window". You recognize that the answer is the sum of the area of the rectangle and the area of the semicircle, so you choose plus and attach to it two inputs which you label "area of rectangle" and "area of semicircle", like this:

 

The notational metaphor is that the computation starts at the bottom and goes upward, performing the operations indicated.

You know (or are told by the system) that the area of a rectangle is the product of its width and height, so you replace the value called "area of rectangle" with a times button and attach two values called $w$ and $h$:

 

You also determine that the area under the semicircle is half the area of a circle of radius $r$ (where $r$ must be calculated).

 

You have a function for the area of a circle of radius $r$, so you attach that:

Finally, you use the fact that you know that the semicircle has a radius which is half the width of the rectangle.

Now, to make the calculation operational, you attach two inputs named "width" and "height" and feed them into the values $w$ and $h$.  When you type numbers into these buttons, the calculation will proceed upward and finally show the area of the window at the top.
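
The finished tree can also be read as ordinary nested function calls. Here is a minimal sketch in Python, which is not the notation the imagined application would use. The functions plus, times and power mirror the operation names above; circle_area and window_area are names invented here for illustration.

```python
import math

# Operation nodes named as in the post; each takes already-evaluated inputs.
def plus(a, b):
    return a + b

def times(a, b):
    return a * b

def power(base, exponent):
    return base ** exponent

def circle_area(r):
    # Area of a circle of radius r: pi * r^2 (hypothetical helper name).
    return times(math.pi, power(r, 2))

def window_area(width, height):
    # The inputs "width" and "height" feed the values w and h at the bottom.
    w, h = width, height
    r = times(0.5, w)                        # the radius is half the width
    area_of_rectangle = times(w, h)
    area_of_semicircle = times(0.5, circle_area(r))
    return plus(area_of_rectangle, area_of_semicircle)

print(window_area(2, 3))  # the window described above: 6 + pi/2
```

Evaluation runs from the innermost calls outward, which corresponds to the bottom-up flow in the diagram.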

In a later post I will produce a live version of this diagram.  (Added 2012-09-08: the live version is here.) Right now I want to get this post out before I leave for MathFest.  (I might even produce the live version at MathFest, depending on how boring the talks are.) 

You can see an example of a live calculation resembling this in my post A visualization of a computation in tree form.

Remarks

Who

  • This calculation might be a typical exercise for a student part way along learning basic algebra. 
  • College students and scientists and engineers would have a system with a lot more built-in functions, including some they built themselves.

Syntax

  • Once you have grasped the idea that the calculation proceeds upward from the inputs, carrying out the operations shown, this picture is completely self-explanatory.
    • Well, you have to know what the operations do.
    • The syntax for standard algebra is much more difficult to learn (more later about this).
  • The syntax actually used in later years may not look like mine.
    • For one thing, the flow might run top down or left to right instead of bottom up. 
    • Or something very different might be used. What works best will be discovered by using different approaches.
  • The syntax is fully two-dimensional, which makes it simple to understand (because it uses the most powerful tool our brain has: the visual system).
    • The usual algebraic code was developed because people used pencil and paper. 
    • I would guess that the usual code has fractional dimension about 1.2. 
    • The tree syntax would require too much writing with pencil and paper.  That is alleviated on a computer by using menus.
    • Once you construct the computation and input some data it evaluates automatically.
  • It may be worthwhile to use 3D syntax.  I have an experiment with this in my post Showing categorical diagrams in 3D.

Later posts will cover related topics:

  • The difficulties with standard algebraic notation.  They are not trivial.
  • Solving equations in tree form.
  • Using properties such as associativity and commutativity in tree form.
  • Using this syntax with calculus.
  • The deep connection with Lawvere theories and sketches.

Metaphors in computing science I

(This article is continued in Metaphors in computing science II)

Michael Barr recently told me of a transcription of a talk by Edsger Dijkstra dissing the use of metaphors in teaching programming and advocating that every program be written together with a proof that it works.  This led me to think about the metaphors used in computing science, and that is what this post is about.  It is not a direct answer to what Dijkstra said. 

We understand almost anything by using metaphors.  This is a broader sense of metaphor than that thing in English class where you had to say "my love is a red red rose" instead of "my love is like a red red rose".  Here I am talking about conceptual metaphors (see references at the end of the post).  

Metaphor: A program is a set of instructions

You can think of a program as a list of instructions that you can read and, if it is not very complicated, understand how to carry them out.  This metaphor comes from your experience with directions on how to do something (like directions from Google Maps or for assembling a toy).   In the case of a program, you can visualize doing what the program says to do and coming out with the expected output. This is one of the fundamental metaphors for programs. 

Such a program may be informal text or it may be written in a computer language.

Example

A description of how to calculate $n!$ in English could be:  "Multiply the integers $1$ through $n$".  In Mathematica, you could define the factorial function this way:

fac[n_] := Apply[Times, Table[i, {i, 1, n}]]

This more or less directly copies the English definition, which could have been reworded as "Apply the Times function to the integers from $1$ to $n$ inclusive."  Mathematica programmers customarily use the abbreviation "@@" for Apply because it is more convenient:

fac[n_] := Times @@ Table[i, {i, 1, n}]

As far as I know, C does not have list operations built in.  This simple program gives you the factorial function evaluated at $n$:

 int fac(int n) { int i, j = 1; for (i = 2; i <= n; i++) j = j * i; return j; }

This does the calculation in a different way: it goes through the numbers $1, 2,\ldots,n$ and multiplies the result-so-far by the new number.  If you are old enough to remember Pascal or Basic, you will see that there you could use a DO loop to accomplish the same thing.

What this metaphor makes you think of

Every metaphor suggests both correct and incorrect ideas about the concept.  

  • If you think of a list of instructions, you typically think that you should carry out the instructions in order.  (If they are Ikea instructions, your experience may have taught you that you must carry out the instructions in order.)  
  • In fact, you don't have to "multiply the numbers from $1$ to $n$" in order at all: You could break the list of numbers into several lists and give each one to a different person to do, and they would give their answers to you and you would multiply them together.
  • The instructions for calculating the factorial can be translated directly into Mathematica instructions, which do not specify an order.   When $n$ is large enough, Mathematica would in fact do something like the process of giving it to several different people (well, processors) to speed things up.
  • I had hoped that Wolfram alpha would answer "720" if I wrote "multiply the numbers from $1$ to $6$" in its box, but it didn't work.  If it had worked, the instruction in English would not be translated at all. (Note added 7 July 2012:  Wolfram has repaired this.)
  • The example program for C that I gave above explicitly multiplies the numbers together in order from little to big.  That is the way it is usually taught in class.  In fact, you could program a package for lists using pointers (a process taught in class!) and then use your package to write a C program that looks like the  "multiply the numbers from $1$ to $n$" approach.  I don't know much about C; a reader could probably tell me other better ways to do it.
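
The point about order can be checked with a small sketch. It is written in Python rather than Mathematica or C, and the names fac_in_order, fac_in_chunks and the workers parameter are invented for illustration. Nothing here actually runs in parallel; it only shows that handing separate lists to separate "people" gives the same product.

```python
import math

def fac_in_order(n):
    # Multiply 1..n from small to large, like the C loop.
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

def fac_in_chunks(n, workers=3):
    # Deal the numbers 1..n into `workers` separate lists,
    # multiply each list on its own, then multiply the partial products.
    numbers = list(range(1, n + 1))
    chunks = [numbers[k::workers] for k in range(workers)]
    partial_products = [math.prod(chunk) for chunk in chunks]
    return math.prod(partial_products)

print(fac_in_order(6), fac_in_chunks(6))
```

Both calls agree because multiplication of integers is associative and commutative, so the grouping and order of the factors do not matter.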

So notice what happened:

  • You can translate the "multiply the numbers from $1$ to $n$" directly into Mathematica.
  •  For C, you have to write a program that implements multiplying the numbers from $1$ to $n$. Implementation in this sense doesn't seem to come up when we think about instruction sets for putting furniture together.  It is sort of like: Build a robot to insert & tighten all the screws.

Thus the concept of program in computing science comes with the idea of translating the program instruction set into another instruction set.

  • The translation provided above for Mathematica resembles translating the instruction set into another language. 
  • The two translations I suggested for C (the program and the definition of a list package to be used in the translation) are not like translating from English to another language.  They involve a conceptual reconstruction of the set of instructions.

Similarly, a compiler translates a program in a computer language into machine code, which involves automated conceptual reconstruction on a vast scale.

Other metaphors

In writing about this, I have brought in other metaphors, for example:

  • C or Mathematica as like a natural language in some ways 
  • Compiling (or interpreting) as translation

Computing science has used other VIMs (Very Important Metaphors) that I need to write about later:

  • Semantics (metaphor: meaning)
  • Program as text – this allows you to treat the program as a mathematical object
  • Program as machine, with states and actions like automata and Turing machines.
  • Specification of a program.  You can regard  "the product of the numbers from $1$ to $n$" as a specification.  Notice that saying "the product" instead of "multiply" changes the metaphor from "instruction" to "specification".

References

Conceptual metaphors (Wikipedia)

Images and Metaphors (article in abstractmath)

Images and Metaphors for Sets (article in abstractmath)

Images and Metaphors for Functions (incomplete article in abstractmath)

 

 

An Elaborate Riemann Sums Demo

Note

To manipulate the demo in this post, you must have Wolfram CDF Player installed on your computer. It is available free from the Wolfram website.

The demo currently shows a banner that says "This file contains potentially unsafe dynamic content".  You can view the diagram by clicking on the "Enable Dynamics" button.  If and when I figure out how to get rid of the banner, this paragraph will disappear from the post!

Riemann Sums

The Riemann Sum is a complicated idea.  The integral \[\int_a^b f(x)\,dx\] involves three parameters: two numbers $a$ and $b$ and the function $x\mapsto f(x)$.  These are not freely varying parameters: They are subject to the requirements

  • The function $x\mapsto f(x)$  must be defined on the closed interval $[a,b]$ (let's pretend improper integrals don't exist).
  • The function must be Riemann integrable (continuous will do).

A particular Riemann Sum for this integral looks like \[\sum_{i=1}^n f(p_i)(x_i-x_{i-1})\]

It has three more parameters, a number and two lists of numbers satisfying some complicated conditions:

  • The number $n$ of subdivisions. 
  • The partition, which
    • is a list of $n+1$ numbers $\{x_0,x_1,\ldots,x_n\}$
    • satisfies the conditions
      •  $x_0<x_1<\ldots<x_n$
      • $x_0=a$
      • $x_n=b$
  • The list of evaluation points, which
    • is a list of $n$ numbers $\{p_1,\ldots,p_n\}$
    • satisfies the condition $x_{i-1}\leq p_i \leq x_i$ for $i=1,\ldots,n$.
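
The conditions in this list can be spelled out as a short sketch. It is in Python, not the Mathematica code behind the demo, and the function name riemann_sum and its argument names are assumptions made here for illustration.

```python
def riemann_sum(f, xs, ps):
    """Sum of f(p_i) * (x_i - x_{i-1}) for i = 1..n.

    xs is the partition [x_0, ..., x_n], so a = xs[0] and b = xs[-1];
    ps is the list of evaluation points [p_1, ..., p_n].
    """
    n = len(xs) - 1
    if n < 1 or len(ps) != n:
        raise ValueError("need n+1 partition points and n evaluation points")
    if any(xs[i - 1] >= xs[i] for i in range(1, n + 1)):
        raise ValueError("partition must be strictly increasing")
    if any(not (xs[i - 1] <= ps[i - 1] <= xs[i]) for i in range(1, n + 1)):
        raise ValueError("each p_i must lie in [x_{i-1}, x_i]")
    return sum(f(ps[i - 1]) * (xs[i] - xs[i - 1]) for i in range(1, n + 1))

# A non-uniform partition of [0, 1] with hand-picked evaluation points.
xs = [0.0, 0.25, 0.5, 1.0]
ps = [0.1, 0.5, 0.75]
print(riemann_sum(lambda x: x * x, xs, ps))
```

The three checks correspond to the three requirements: the right number of points, a strictly increasing partition, and each evaluation point lying inside its own subinterval.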

A Riemann sum may or may not have various important properties.

  • The partition can be
    • uniform
    • random
    • chosen by a rule (increase the number of points as the derivative increases, for example)
  • The evaluation points can be chosen
    • randomly
    • at the midpoint
    • at the left end
    • at the right end
    • at the lowest point
    • at the highest point.

So the concept is complex, with several constituents and interrelationships to hold in your head all at once.  Experienced math people learn concepts like this all the time.  Math students have a harder time.  Manipulable diagrams can help.  Here is an example:

The Demo

 

In a class where students use computers with CDF Player installed, you could give them this demo along with instructions about how to use it and a list of questions that they must answer.  

Examples of instructions

  • Click on the big plus sign in the upper right corner for some options.
  • Move the slider labeled $n$ to make more or fewer subdivisions.
  • Click on the little plus sign beside the slider for some options such as allowing $n$ to increase automatically.
  • The buttons allow you to choose the type of partition, the type of evaluation points, and five functions to play with.

Sample questions

  1. Set $n=1$, uniform partition and midpoint and look at the results for each function.  Explain what you see.
  2. Set $n=4$,  uniform partition and midpoint and look at the results for each function.  Explain each of the following by referring to the picture:
    • For $x\mapsto x$, the estimate is exact.
    • For $x\mapsto x^2$, the estimate is less than the value of the integral.
    • For $x\mapsto x^5$, the error in the estimate is much worse than for $x^2$.
    • For $x\mapsto \sqrt{1-x^2}$ , the estimate is greater than the value of the integral.
  3. Go through the examples in 2. and check that when you make $n$ bigger the properties stated continue to be true.  Can you explain this?
  4. Starting with $n=4$, uniform and midpoint and then using bigger values, note that the error for  $x\mapsto \sqrt{1-x^2}$ is always bigger than the error for  $x\mapsto \sin \pi x$.  Try to explain this.  (Don't ask the students to prove it in freshman calculus).
  5. For $n=4$, uniform and midpoint (and then try bigger $n$), for $x\mapsto x^5$, the LeftSide error is always less than the RightSide error.  Explain using the picture.
  6. For which curves is the LeftSide estimate always the Lower Sum?  Always the Upper Sum?  Neither?  Does using Random instead of Uniform change these answers?
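
A teacher preparing answers to question 2 might want a numeric check alongside the picture. The following is a sketch in Python under the assumption that the demo integrates over $[0,1]$ (the post does not state the interval); midpoint_sum, cases and signed_error are names invented here.

```python
import math

def midpoint_sum(f, a, b, n):
    # Uniform partition of [a, b] into n pieces, evaluated at midpoints.
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# (function, exact integral over [0, 1]); the interval is an assumption.
cases = {
    "x": (lambda x: x, 1 / 2),
    "x^2": (lambda x: x ** 2, 1 / 3),
    "x^5": (lambda x: x ** 5, 1 / 6),
    "sqrt(1-x^2)": (lambda x: math.sqrt(1 - x * x), math.pi / 4),
}

def signed_error(name, n=4):
    # Estimate minus exact value: negative means the estimate is too small.
    f, exact = cases[name]
    return midpoint_sum(f, 0.0, 1.0, n) - exact

for name in cases:
    print(name, signed_error(name))
```

The signs of the printed errors match the statements in question 2; explaining why from the picture is still left to the student.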

There are many other questions like this you can ask. After answering some of them, I claim (without proof) that the students will have a much better understanding of Riemann sums.

Note that teachers can use this Demo without knowing anything at all about Mathematica.  There are hundreds of Demos available in the cloud that can be used in the same way; many of the best are on the Wolfram Demonstration Project.

If you can program some in Mathematica, you can take the source code for this demo and modify it, for example to use other functions, to provide functions with changeable parameters and to use partitions following dynamic rules.

You could also have this up on a screen in your classroom for class discussion.  But I doubt that is the best use.  For classroom demos you probably need simple one-off demos that you prepare ahead or even write on the spot.  An example of a simple demo is in the post Offloading Abstraction.  I will talk about simple demos more in a later post.

Rant about why math teachers should use manipulable diagrams

A teacher in the past would draw an example of a Riemann sum on the blackboard and talk about a few features as they point at the board.  Nowadays, teachers have slides with accurately drawn Riemann sums and books have pictures of them.  This sort of thing gives the student a picture which (hopefully) stays in their head.  That picture is a kind of metaphor which enables you to think of the sum in terms of something that you are familiar with, just as you can think of a function as position and its derivative as velocity.  (Position and velocity are familiar from driving or any other kind of moving.  The picture of a Riemann sum is not something you knew before you studied them, but your brain has remarkable abilities to absorb a picture and the relations between parts of the picture, so once you have seen it you can call it up whenever you think of Riemann sums.)

But there are a lot of aspects of Riemann sums that cannot be demonstrated by a still picture.  When the mesh gets finer, the value of the sum tends to be closer to the exact value of the integral.  You can stare at the still picture and sort of visualize this.  Can you visualize a situation where changing to a finer mesh could make the error worse?  If someone suggests a high-frequency sine wave, can you visualize in your head why a finer mesh might make it worse?
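
One concrete case where a finer mesh does worse can be sketched in Python. The choices here are assumptions made for illustration: the function is a cosine with four periods on $[0,1]$ (rather than a sine), the partition is uniform, and the evaluation points are the left ends. The names left_sum and wave are invented.

```python
import math

def left_sum(f, a, b, n):
    # Uniform partition, evaluated at the left end of each piece.
    h = (b - a) / n
    return sum(f(a + i * h) for i in range(n)) * h

def wave(x):
    # Four full periods on [0, 1]; its integral there is exactly 0.
    return math.cos(8 * math.pi * x)

for n in (3, 4):
    # The exact integral is 0, so the estimate itself is the error.
    print(n, abs(left_sum(wave, 0.0, 1.0, n)))
```

With $n=3$ the sampled values are $1$, $-\frac12$, $-\frac12$ and cancel, so the error is essentially zero. With $n=4$ every sample lands on a crest where the function equals $1$, so the estimate is about $1$ although the integral is $0$.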

An elaborate demo with lots of push buttons is something for students to play with on their own time and thereby gain a better understanding of the topic.  Before manipulable diagrams the only way you could do this was produce physical models.  I don't know of anyone who produced a physical model of a Riemann sum.  It is possible to do so with some parameters changeable but it would be difficult and not as flexible as the demo given here.

The world has more possibilities.  Use them.

Related posts

An elaborate Riemann Sum Demo (Mathematica notebook, source of the demo in this post)

Freezing a family of functions (previous post)

Images and Metaphors (in abstractmath.org)

Offloading abstraction (previous post)

Offloading abstraction

Note: To manipulate the diagrams in this post and in most of the files it links to, you must have Wolfram CDF Player installed on your computer. It is available free from the Wolfram website.

The diagram above shows you the tangent line to the curve $y=x^3-x$ at a specific point.  The slider allows you to move the point around, and the tangent line moves with it. You can click on one of the plus signs for options about things you can do with the slider.  (Note: This is not new.  Many other people have produced diagrams like this one.)

I have some comments to make about this very simple diagram. I hope they raise your consciousness about what is going on when you use a manipulable demonstration.

Farming out your abstraction load

A diagram showing a tangent line drawn on the board or in a paper book requires you to visualize how the tangent line would look at other points.  This imposes a burden of visualization on you.  Even if you are a new student you won't find that terribly hard (am I wrong?) but you might miss some things at first:

  • There are places where the tangent line is horizontal.
  • There are places where some of the tangent lines cross the curve at another point. Many calculus students believe in the myth that the tangent line crosses the curve at only one point.  (It is not really a myth, it is a lie.  Any decent myth contains illuminating stories and metaphors.)
  • You may not envision (until you have some experience anyway) how when you move the tangent line around it sort of rocks like a seesaw.

You see these things immediately when you manipulate the slider.
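
The first two observations can also be checked by calculation. This is a small Python sketch, not the code behind the diagram; the helper names f, slope and tangent are invented here.

```python
import math

def f(x):
    return x ** 3 - x

def slope(a):
    # Derivative of x^3 - x at x = a.
    return 3 * a ** 2 - 1

def tangent(a):
    # The tangent line at x = a, returned as a function of x.
    return lambda x: f(a) + slope(a) * (x - a)

# Horizontal tangents: slope(a) = 0 at a = +/- 1/sqrt(3).
a_flat = 1 / math.sqrt(3)
print(slope(a_flat), slope(-a_flat))

# f(x) - tangent(a)(x) factors as (x - a)^2 (x + 2a),
# so for a != 0 the tangent meets the curve again at x = -2a.
a = 1.0
other = -2 * a
print(f(other), tangent(a)(other))
```

The factorization shows that the only tangent line meeting this curve at a single point is the one at $a=0$.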

Manipulating the slider reduces the load of abstract thinking in your learning process.     You have less to keep in your memory; some of the abstract thinking is offloaded onto the diagram.  This could be described as contracting out (from your head to the picture) part of the visualization process.  (Visualizing something in your head is a form of abstraction.)

Of course, reading and writing does that, too.  And even a static graph of a function lowers your visualization load.  What interactive diagrams give the student is a new tool for offloading abstraction.

You can also think of it as providing external chunking.  (I'll have to think about that more…)

Simple manipulative diagrams vs. complicated ones

The diagram above is very simple with no bells and whistles.  People have come up with much more complicated diagrams to illustrate a mathematical point.  Such diagrams:

  • May give you buttons that give you a choice of several curves that show the tangent line.
  • May give a numerical table that shows things like the slope or intercept of the current tangent line.
  • May also show the graph of the derivative, enabling you to see that it is in fact giving the value of the slope.

Such complicated diagrams are better suited for the student to play with at home, or to play with in class with a partner (much better than doing it by yourself).  When the teacher first explains a concept, the diagrams ought to be simple.

Examples

  • The Definition of derivative demo (from the Wolfram Demonstration Project) is an example that provides a table that shows the current values of some parameters that depend on the position of the slider.
  • The Wolfram demo Graphs of Taylor Polynomials is a good example of a demo to take home and experiment extensively with.  It gives buttons to choose different functions, a slider to choose the expansion point, another one to choose the number of Taylor polynomials, and other things.
  • On the other hand, the Wolfram demo Tangent to a Curve is very simple and differs from the one above in one respect: It shows only a finite piece of the tangent line.  That actually has a very different philosophical basis: it is representing for you the stalk of the tangent space at that point (the infinitesimal vector that contains the essence of the tangent line).
  • Brian Hayes wrote an article in American Scientist containing a moving graph (it moves only  on the website, not in the paper version!) that shows the changes of the population of the world by bars representing age groups.  This makes it much easier to visualize what happens over time.  Each age group moves up the graph — and shrinks until it disappears around age 100 — step by step.  If you have only the printed version, you have to imagine that happening.  The printed version requires more abstract visualization than the moving version.
  • Evaluating an algebraic expression requires seeing the abstract structure of the expression, which can be shown as a tree.  I would expect that if the students could automatically generate the tree (as you can in Mathematica)  they would retain the picture when working with an expression.  In my post computable algebraic expressions in tree form I show how you could turn the tree into an evaluation aid.  See also my post Syntax trees.

This blog has a category "Mathematica" which contains all the graphs (many of them interactive) that are designed as an aid to offloading abstraction.

Prechunking

The emerging theory of how the brain works gives us a new language for discussing how we teach, learn and communicate math.

Modules

Our minds have many functionalities.  They are implemented by what I called modules in Math and modules of the mind because I don’t understand very much about what cognitive scientists have learned about how these functionalities are carried out.  They talk about a particular neuron, a collection of neurons, electrical charges flowing back and forth, and so on, and it appears there is no complete agreement about these ideas.

The functions the modules implement are physical structures or activities in the brain.  At a certain level of abstraction we can ignore the mechanism.

Most modules carry out functionalities that are hidden from our consciousness.

  • When we walk, the walking is carried out by a module that operates without our paying (much) attention to it.
  • When we recognize someone, the identity of the person pops into our consciousness without us knowing how it got there.  Indeed, we cannot introspect to see how the process was carried out; it is completely hidden.

Reasoning, for example if you add 56 and 49 in your head, has part of the process visible to your introspection, but not all of it.  It uses modules such as the sum of 9 and 6 which feel like random access memory.  When you carry the addition out, you (or at least I) are conscious of the carry: you are aware of it and aware of adding it to 9 to get 10.

Good places to find detailed discussion of this hiddenness are references [2] and [4] below.

Chunking

Math ed people have talked for years about the technique of chunking in doing math.

  • You see an algebraic expression, you worry about how it might be undefined, you gray out all of it except the denominator and inspect that, and so on.  (This should be the subject of a Mathematica demo.)
  • You look at a diagram in the category of topological spaces.  Each object in the diagram stands for a whole, even uncountably infinite, space with lots of open and closed subsets and so on, but you think of it just as a little pinpoint in the diagram to discover facts about its relationship with other spaces.  You don’t look inside the space unless you have to to verify something.

Students have a hard time doing that.  When an experienced mathematician does this, they are very likely to chunk subconsciously; they don’t think, “Now I am chunking”.  Nevertheless, you can call it to their attention and they will be aware of the process.

There are modules that perform chunking whose operation you cannot be aware of even if you think about it.  Here are two examples.

Example 1. Consider these two sentences from [2], p. 137:

  • “I splashed next to the bank.”
  • “There was a run on the bank.”

When you read the first one you visualize a river bank.  When you read the second one you visualize a bank as an institution that handles money.  If these two sentences were separated by a couple of paragraphs, or even a few words, in a text you are likely not to notice that you have processed the same word in two different ways.  (When they are together as above it is kind of blatant.)

The point is that when you read each sentence your brain directly presents you with the proper image in each case (different ones as appropriate).  You cannot recover the process that did that (by introspection, anyway).

Example 2. I discussed the sentence below in the Handbook.  The sentence appears in reference [3].

…Richard Darst and Gerald Taylor investigated the
differentiability of functions $latex f^p$ (which for our
purposes we will restrict to $latex (0,1)$) defined for
each $latex p\geq1$ by

In this sentence, the identical syntax $latex (a,b)$ appears twice; the first occurrence refers to the open interval from 0 to 1 and the second refers to the GCD of integers m and n.  When I first inserted it into the Handbook’s citation list, I did not notice that (I was using it for another phenomenon, although now I have forgotten what it was).  Later I noticed it.  My mind preprocessed the two occurrences of the syntax and threw up two different meanings without my noticing it.

Of course, “restricting to (0, 1)” doesn’t make sense if (0, 1) means the GCD of 0 and 1, and saying “(m, n) = 1” doesn’t make sense if (m, n) is an interval.  This preprocessing no doubt came to its two different conclusions based on such clues, but I claim that this preprocessing operated at a much deeper level of the brain than the preprocessing that results in your thinking (for example) of a topological space as a single unstructured object in a category.

This phenomenon could be called prechunking.  It is clearly a different phenomenon than zooming in on a denominator and then zooming out on the whole expression as I described in [1].

This century’s metaphor

In the nineteenth century we came up with a machine metaphor for how we think.  In the twentieth century the big metaphor was our brain is a computer.  This century’s metaphor is that of a bunch of processes in our brain and in our body all working simultaneously, mostly out of our awareness, to enable us to live our life, learn things, and just as important (as Davidson [4] points out) to unlearn things.  But don’t think we have Finally Discovered The Last Metaphor.

References

  1. Zooming and chunking in abstractmath.org.
  2. Mark Changizi, The vision revolution.  Benbella Books, 2009.
  3. Mark Frantz, “Two functions whose powers make fractals”.  American Mathematical Monthly, v 105, pp 609–617 (1998).
  4. Cathy N. Davidson, Now you see it.  Viking Penguin, 2011.  Chapters 1 and 2.
  5. Math and modules of the mind (previous post).
  6. Cognitive science in Wikipedia.
  7. Charles Wells, The handbook of mathematical discourse, Infinity Publishing Company, 2003.

Technical meanings clash with everyday meanings

Recently (see note [a]) on MathOverflow, Colin Tan asked [1] “What does ‘kernel’ mean in ‘integral kernel’?”  He had noticed the different use of the word in referring to the kernels of morphisms.

I have long thought [2] that the clash between technical meanings and everyday meaning of technical terms (not just in math) causes trouble for learners.  I have recently returned to teaching (discrete math) and my feeling is reinforced — some students early in studying abstract math cannot rid themselves of thinking of a concept in terms of familiar meanings of the word.

One of the worst areas is logic, where “implies” causes well-known bafflement.   “How can ‘If P then Q’ be true if P is false??”  For a large minority of beginning college math students, it is useless to say, “Because the truth table says so!”.  I may write in large purple letters (see [3] for example) on the board and in class notes that The Definition of a Technical Math Concept Determines Everything That Is True About the Concept but it does not take.  Not nearly.
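
For reference, the truth table in question is short enough to print. This is a Python sketch; the function name implies is chosen here, and the definition is the standard one for material implication.

```python
from itertools import product

def implies(p, q):
    # The technical definition: false only when p is true and q is false.
    return (not p) or q

for p, q in product([True, False], repeat=2):
    print(p, q, implies(p, q))
```

The two rows with P false both come out true, which is exactly the part that clashes with the everyday meaning of "implies".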

The problem seems to be worse in logic, which changes the meaning of words used in communicating math reasoning as well as those naming math concepts. But it is bad enough elsewhere in math.

Colin’s question about “kernel” is motivated by these feelings, although in this case it is the clash of two different technical meanings given to the same English word — he wondered what the original idea was that resulted in the two meanings.  (This is discussed by those who answered his question.)

Well, when I was a grad student I made a more fundamental mistake when I was faced with two meanings of the word “domain” (in fact there are at least four meanings in math).  I tried to prove that the domain of a continuous function had to be a connected open set.  It didn’t take me all that long to realize that calculus books talked about functions defined on closed intervals, so then I thought maybe it was the interior of the domain that was a, uh, domain, but I pretty soon decided the two meanings had no relation to each other.   If I am not mistaken Colin never thought the two meanings of “kernel” had a common mathematical definition.

It is not wrong to ask about the metaphor behind the use of a particular common word for a technical concept.  It is quite illuminating to get an expert in a subject to tell about metaphors and images they have about something.  Younger mathematicians know this.  Many of the questions on MathOverflow are asking just for that.  My recollection of the Bad Old Days of Abstraction and Only Abstraction (1940-1990?) is that such questions were then strongly discouraged.

Notes

[a] The recent stock market crash has been blamed [4] on the fact that computers make buy and sell decisions so rapidly that their actions cannot be communicated around the world fast enough because of the finiteness of the speed of light.  This has affected academic exposition, too.  At the time of writing, “recently” means yesterday.

References

[1] Colin Tan, “What does ‘kernel’ mean in ‘integral kernel’?”

[2] Commonword names for technical concepts (previous blog).

[3] Definitions. (Abstractmath).

[4] John Baez, This Week’s Finds in Mathematical Physics, Week 297.

Templates in mathematical practice

This post is a first pass at what will eventually be a section of abstractmath.org. It’s time to get back to abstractmath; I have been neglecting it for a couple of years.

What I say here is based mainly on my many years of teaching discrete mathematics at Case Western Reserve University in Cleveland and more recently at Metro State University in Saint Paul.

Beginning abstract math

College students typically get into abstract math at the beginning in such courses as linear algebra, discrete math and abstract algebra. Certain problems that come up in those early courses can be grouped together under the notion of (what I call) applying templates [note 1]. These are not the problems people usually think about concerning beginners in abstract math, of which the following is an incomplete list:

The students’ problems discussed here concern understanding what a template is and how to apply it.

Templates can be formulas, rules of inference, or mini-programs. I’ll talk about three examples here.

The template for quadratic equations

The solution of a real quadratic equation of the form $latex {ax^2+bx+c=0}&fg=000000$ is given by the formula

$latex \displaystyle x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}&fg=000000$

This is a template for finding the roots of the equation. It has subtleties.

For example, the numerator is symmetric in $latex {a}&fg=000000$ and $latex {c}&fg=000000$ but the denominator isn’t. So sometimes I try to trick my students (warning them ahead of time that that’s what I’m trying to do) by asking for a formula for the solution of the equation $latex {a+bx+cx^2=0}&fg=000000$. The answer is

$latex \displaystyle x=\frac{-b\pm\sqrt{b^2-4ac}}{2c}&fg=000000$

I start writing it on the board, asking them to tell me what comes next. When we get to the denominator, often someone says “$latex {2a}&fg=000000$”.

The template is telling you that the denominator is 2 times the coefficient of the square term. It is not telling you it is “$latex {a}&fg=000000$”. Using a template (in the sense I mean here) requires pattern matching, but in this particular example, the quadratic template has a shallow incorrect matching and a deeper correct matching. In detail, the shallow matching says “match the letters” and the deep matching says “match the position of the letters”.

Most of the time the quadratic being matched has particular numbers instead of the same letters that the template has, so the trap I just described seldom occurs. But this makes me want to try a variation of the trick: Find the solution of $latex {3+5x+2x^2=0}&fg=000000$. Would some students match the textual position (getting $latex {a=3}&fg=000000$) instead of the functional position (getting $latex {a=2}&fg=000000$)? [note 2] If they did they would get the solutions $latex {(-1,-\frac{2}{3})}&fg=000000$ instead of $latex {(-1,-\frac{3}{2})}&fg=000000$.
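The difference between the two matchings can be checked by machine. The following is only a minimal sketch in Python; the function name `solve_quadratic` and its parameter names are made up for this illustration. Naming the parameters by their role (coefficient of the square term, and so on) rather than by letter forces the deep matching.

```python
import math

def solve_quadratic(square_coeff, linear_coeff, constant):
    """Real roots of square_coeff*x**2 + linear_coeff*x + constant = 0.

    Hypothetical helper: assumes square_coeff != 0 and a nonnegative
    discriminant, which holds for the example below.
    """
    root = math.sqrt(linear_coeff**2 - 4 * square_coeff * constant)
    return sorted([(-linear_coeff - root) / (2 * square_coeff),
                   (-linear_coeff + root) / (2 * square_coeff)])

# The equation 3 + 5x + 2x^2 = 0.
# Deep matching: match by role, so the square coefficient is 2.
deep = solve_quadratic(square_coeff=2, linear_coeff=5, constant=3)

# Shallow matching: take the coefficients in the order they are written,
# which really solves 3x^2 + 5x + 2 = 0, a different equation.
shallow = solve_quadratic(3, 5, 2)

print(deep)     # the roots -3/2 and -1
print(shallow)  # the roots -1 and -2/3
```

Both calls use the same template; only the matching of the numbers to the slots differs.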

Substituting in algebraic expressions has other traps, too. What sorts of mistakes would students make solving $latex {3x^2+b^2x-5=0}&fg=000000$?

Most students on the verge of abstract math don’t make the mistakes with the quadratic formula that I have described. The thing about abstract math is that it uses more sophisticated templates:

  • subject to conditions
  • with variations
  • with extra levels of abstraction

The template for proof by induction

This template gives a method of proof of a statement of the form $latex {\forall{n}\mathcal{P}(n)}&fg=000000$, where $latex {\mathcal{P}}&fg=000000$ is a predicate (presumably containing $latex {n}&fg=000000$ as a variable) and $latex {n}&fg=000000$ varies over positive integers. The template says:

Goal: Prove $latex {\forall{n}\mathcal{P}(n)}&fg=000000$.

Method:

  • Prove $latex {\mathcal{P}(1)}&fg=000000$
  • For an arbitrary integer $latex {n\geq 1}&fg=000000$, assume $latex {\mathcal{P}(n)}&fg=000000$ and deduce $latex {\mathcal{P}(n+1)}&fg=000000$.

For example, to prove $latex {\forall n (2^n+1\geq n^2)}&fg=000000$ using the template, you have to prove that $latex {2^1+1\geq 1^2}&fg=000000$, and that for any $latex {n\geq 1}&fg=000000$, if $latex {2^n+1\geq n^2}&fg=000000$, then $latex {2^{n+1}+1\geq (n+1)^2}&fg=000000$. You come up with the need to prove these statements by substituting into the template. This template has several problems that the quadratic formula does not have.
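Substituting into the induction template can be imitated in code. This is a hedged sketch in Python, not a proof: the names `P` and `check_induction_instances` are invented here, and the function only tests finitely many instances of the induction step. Notice that the template takes the predicate itself as an argument, alongside the integers it is applied to.

```python
def P(n):
    """The predicate substituted into the template: 2**n + 1 >= n**2."""
    return 2**n + 1 >= n**2

def check_induction_instances(predicate, start=1, up_to=50):
    """Instantiate the two parts of the template for a given predicate.

    Hypothetical helper. It checks the base case at `start`, and checks
    the implication predicate(n) -> predicate(n + 1) only for
    n = start, ..., up_to. A finite check like this is evidence, not
    a proof of the universally quantified statement.
    """
    base_case = predicate(start)
    step_instances = all((not predicate(n)) or predicate(n + 1)
                         for n in range(start, up_to + 1))
    return base_case, step_instances

base_ok, step_ok = check_induction_instances(P)
print(base_ok, step_ok)
```

A real proof still has to deduce the step for an arbitrary integer; the code only shows which statements the substitution produces.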

Variables of different types

The variable $latex {n}&fg=000000$ is of type integer and the variable $latex {\mathcal{P}}&fg=000000$ is of type predicate [note 3]. Having to deal with several types of variables comes up already in multivariable calculus (vectors vs. numbers, cross product vs. numerical product, etc.) and they multiply like rabbits in beginning abstract math classes. Students sometimes write things like “Let $latex {\mathcal{P}=n+1}&fg=000000$”. Dealing with multiple types is a big problem that math ed people don’t seem to discuss much (correct me if I am wrong).

Free and bound

The variable $latex {n}&fg=000000$ occurs as a bound variable in the Goal and a free variable in the Method. This happens in this case because the induction step in the Method originates as the requirement to prove $latex {\forall n(\mathcal{P}(n)\rightarrow\mathcal{P}(n+1))}&fg=000000$, but as I have presented it (which seems to be customary) I have translated this into a requirement based on modus ponens. This causes students problems, if they notice it. (“You are assuming what you want to prove!”) Many of them apparently go ahead and produce competent proofs without noticing the dual role of $latex {n}&fg=000000$. I say more power to them. I think.

The template has variations

  • You can start the induction at other places.
  • You may have to have two starting points and a double induction hypothesis (for $latex {n-1}&fg=000000$ and $latex {n}&fg=000000$). In fact, you will have to have two starting points, because it seems to be a Fundamental Law of Discrete Math Teaching that you have to talk about the Fibonacci function ad nauseam.
  • Then there is strong induction.
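The variation with two starting points has a familiar counterpart in programming. The sketch below, in Python, is only an illustration under the usual convention that the Fibonacci function has the value 1 at both 1 and 2: the recursive definition needs two base cases, just as a proof about it needs two starting points and a hypothesis about both $latex {n-1}&fg=000000$ and $latex {n}&fg=000000$.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    """Fibonacci function on positive integers, with fib(1) = fib(2) = 1."""
    if n in (1, 2):
        # Two base cases, mirroring the two starting points of the induction.
        return 1
    # The recursive call uses the two previous values, mirroring the
    # double induction hypothesis.
    return fib(n - 1) + fib(n - 2)

# A finite spot check (not a proof) of a typical claim proved this way:
# fib(n) < 2**n for every positive integer n.
claim_holds = all(fib(n) < 2**n for n in range(1, 60))
print(fib(10), claim_holds)
```

Leaving out either base case makes the recursion run past 1 without stopping, which is the programming version of an induction proof with a missing starting point.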

It’s like you can go to the store and buy one template for quadratic equations, but you have to buy a package of templates for induction, like highway engineers used to buy packages of plastic French curves to draw highway curves without discontinuous curvature.

The template for row reduction

I am running out of time and won’t go into as much detail on this one. Row reduction is an algorithm. If you write it up as a proper computer program there have to be all sorts of if-thens depending on what you are doing it for. For example, if you want solutions to the simultaneous equations

2x+4y+z = 1
x+2y = 0
x+2y+4z = 5

you must row reduce the matrix

2 4 1 1
1 2 0 0
1 2 4 5

(I haven’t yet figured out how to wrap this in parentheses) which gives you

1 2 0 0
0 0 1 0
0 0 0 1

This introduces another problem with templates: They come with conditions. In this case the condition is “a row of three 0s followed by a nonzero number means the equations have no solutions”. (There is another condition when there is a row of all 0s.)
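The row reduction and its condition can be sketched as a short program. This is an assumption-laden illustration in Python, using exact fractions from the standard library; the names `rref` and `is_inconsistent` are invented here. It covers only the one condition quoted above, not every if-then a full treatment would need.

```python
from fractions import Fraction

def rref(matrix):
    """Return the reduced row echelon form of a matrix (list of rows)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivot_row = 0
    for col in range(n_cols):
        if pivot_row == n_rows:
            break
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, n_rows)
                      if rows[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        lead = rows[pivot_row][col]
        rows[pivot_row] = [x / lead for x in rows[pivot_row]]
        # Clear the rest of the column, above and below the pivot.
        for r in range(n_rows):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y
                           for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    return rows

def is_inconsistent(reduced):
    """The condition: a row of 0s followed by a nonzero last entry."""
    return any(all(x == 0 for x in row[:-1]) and row[-1] != 0
               for row in reduced)

augmented = [[2, 4, 1, 1],
             [1, 2, 0, 0],
             [1, 2, 4, 5]]
reduced = rref(augmented)
print(reduced)
print(is_inconsistent(reduced))  # True: the equations have no solutions
```

The calculation and the reading of the result are separate functions on purpose: getting `reduced` right is the leaning in, and calling `is_inconsistent` is the leaning back.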

It is very easy for the new student to get the calculation right but to never sit back and see what they have — which conditions apply or whatever.

When you do math you have to repeatedly lean in and focus on the details and then lean back and see the Big Picture. This is something that has to be learned.

What to do, what to do

I have recently experimented with being explicit about templates, in particular going through examples of the use of a template after explicitly stating the template. It is too early to say how successful this is. But I want to point out that even though it might not help to be explicit with students about templates, the analysis in this post of a phenomenon that occurs in beginning abstract math courses

  • may still be accurate (or not), and
  • may help teachers teach such things if they are aware of the phenomenon, even if the students are not.

Notes

  1. Many years ago, I heard someone use the word “template” in the way I am using it now, but I don’t recollect who it was. Applied mathematicians sometimes use it with a meaning similar to mine to refer to soft algorithms: recipes for computation that are not formal algorithms but close enough to be easily translated into a sufficiently high level computer language.
  2. In the formula $latex {ax^2+bx+c}&fg=000000$, the “$latex {a}&fg=000000$” has the first textual position but the functional position as the coefficient of the quadratic term. This name “functional position” has nothing to do with functions. Can someone suggest a different name that won’t confuse people?
  3. I am using “variable” the way logicians do. Mathematicians would not normally refer to “$latex {\mathcal{P}}&fg=000000$” as a variable.
  4. I didn’t say anything about how templates can involve extra layers of abstraction.  That will have to wait.