Metaphors in computing science I

Michael Barr recently told me of a transcription of a talk by Edsger Dijkstra dissing the use of metaphors in teaching programming and advocating that every program be written together with a proof that it works.  This led me to think about the metaphors used in computing science, and that is what this post is about.  It is not a direct answer to what Dijkstra said. 

We understand almost anything by using metaphors.  This is a broader sense of metaphor than that thing in English class where you had to say "my love is a red red rose" instead of "my love is like a red red rose".  Here I am talking about conceptual metaphors (see references at the end of the post).  

Metaphor: A program is a set of instructions

You can think of a program as a list of instructions that you can read and, if it is not very complicated, understand how to carry them out.  This metaphor comes from your experience with directions on how to do something (like directions from Google Maps or for assembling a toy).   In the case of a program, you can visualize doing what the program says to do and coming out with the expected output. This is one of the fundamental metaphors for programs. 

Such a program may be informal text or it may be written in a computer language.

Example

A description of how to calculate $n!$ in English could be:  "Multiply the integers $1$ through $n$".  In Mathematica, you could define the factorial function this way:

fac[n_] := Apply[Times, Table[i, {i, 1, n}]]

This more or less directly copies the English definition, which could have been reworded as "Apply the Times function to the integers from $1$ to $n$ inclusive."  Mathematica programmers customarily use the abbreviation "@@" for Apply because it is more convenient:

Fac[n_]:=Times @@ Table[i, {i, 1, 6}]

As far as I know, C does not have list operations built in.  This simple program gives you the factorial function evaluated at $n$:

 j=1;  for (i=2; i<=n; i++)   j=j*i; return j;  

This does the calculation in a different way: it goes through the numbers $1, 2,\ldots,n$ and multiplies the result-so-far by the new number.  If you are old enough to remember Pascal or Basic, you will see that there you could use a DO loop to accomplish the same thing.

What this metaphor makes you think of

Every metaphor suggests both correct and incorrect ideas about the concept.  

  • If you think of a list of instructions, you typically think that you should carry out the instructions in order.  (If they are Ikea instructions, your experience may have taught you that you must carry out the instructions in order.)  
  • In fact, you don't have to "multiply the numbers from $1$ to $n$" in order at all: You could break the list of numbers into several lists and give each one to a different person to do, and they would give their answers to you and you would multiply them together.
  • The instructions for calculating the factorial can be translated directly into Mathematica instructions, which does not specify an order.   When $n$ is large enough, Mathematica would in fact do something like the process of giving it to several different people (well, processors) to speed things up.
  • I had hoped that Wolfram alpha would answer "720" if I wrote "multiply the numbers from $1$ to $6$" in its box, but it didn't work.  If it had worked, the instruction in English would not be translated at all.
  • The example program for C that I gave above explicitly multiplies the numbers together in order from little to big.  That is the way it is usually taught in class.  In fact, you could program a package for lists using pointers (a process taught in class!) and then use your package to write a C program that looks like the  "multiply the numbers from $1$ to $n$" approach.  I don't know much about C; a reader could probably tell me other better ways to do it.

So notice what happened:

  • You can translate the "multiply the numbers from $1$ to $n$" directly into Mathematica.
  •  For C, you have to write a program that implements multiplying the numbers from $1$ to $n$. Implementation in this sense doesn't seem to come up when we think about instruction sets for putting furniture together.  It is sort of like: Build a robot to insert & tighten all the screws.

Thus the concept of program in computing science comes with the idea of translating the program instruction set into another instruction set.

  • The translation provided above for Mathematica resembles translating the instruction set into another language. 
  • The two translations I suggested for C (the program and the definition of a list package to be used in the translation) are not like translating from English to another language.  They involve a conceptual reconstruction of the set of instructions.

Similarly, a compiler translates a program in a computer language into machine code, which involves automated conceptual reconstruction on a vast scale.

Other metaphors

In writing about this, I have brought in other metaphors, for example:

  • C or Mathematica as like a natural language in some ways 
  • Compiling (or interpreting) as translation

Computing science has used other VIM's (Very Important Metaphors) that I need to write about later:

  • Semantics (metaphor: meaning)
  • Program as text – this allows you to treat the program as a mathematical object
  • Program as machine, with states and actions like automata and Turing machines.
  • Specification of a program.  You can regard  "the product of the numbers from $1$ to $n$" as a specification.  Notice that saying "the product" instead of "multiply" changes the metaphor from "instruction" to "specification".

References

Conceptual metaphors (Wikipedia)

Images and Metaphors (article in abstractmath)

Images and Metaphors for Sets (article in abstractmath)

Images and Metaphors for Functions (incomplete article in abstractmath)

 

 

Improved clouds

In my post Riemann Clouds Improved I showed examples of clouds of values of Riemann sums in such a way that you could see the convergence to the value, the efficiency of the midpoint rule, and other things.  Here I include two Riemann sums that are shown

  • as manipulable graphs,
  • in clouds in an animated form.

To see and work with these demos, you must have Wolfram CDF Player installed on your computer. It is available free from the Wolfram website.

Each manipulable graph (see Elaborate Riemann Sums Demo) has a slider to choose the mesh (1/n) of the partitions.  The small plus sign besides the slider gives you additional options. The buttons allow you to choose the type of partition and the type of evaluation points.

Each cloud shows a collection of values of random Riemann sums of the function, plotted by size of mesh (an upper bound on the width of the largest subdivision) and the value of the sum.  The cloud shows how the sums converge to the value of the integral. 

Every dot represents a random partition.  The sums with blue dots have random valuation points, the green dots use the left side of the subdivision, the brown dots the right side, and the red dots the midpoint.  The clouds may be suitable for students to study.  Some possible questions they could be asked to do are listed at the end.

Pressing the starter shows many clouds in rapid succession.  I don't know how much educational value that has but I think it is fun, and fun is worthwhile in itself.

Quarter Circle

Manipulable graph:

Animated cloud

 

Sine wave

Manipulable graph:

Animated cloud

Questions

I am not sure of the answers to some of these myself.

  • Why is the accuracy generally better for the sine wave than for the quarter circle?  
  • Why are the green dots above all the others and the brown dots below all the others in the quarter circle?
  • Why are they mixed in with the others for the sine curve?  In fact why do they tend upward? (Going from right to left, in other words in the direction of more accuracy).
  • Why are the midpoint sums so much more accurate?
  • Why do they tend downward for the sine wave?
  • Is it an optical illusion or do they also tend downward for the quarter circle? 

Notes:

The source code for these demos is Animated Riemann.nb at my Mathematica site.

The animated clouds show two hundred precalculated clouds for each picture, so you get the same clouds each time you run the animation.  It would have taken too long to generate the random clouds on the fly.  Each list of two hundred took about seven minutes to create on my computer.

An Elaborate Riemann Sums Demo

Note

To manipulate the demo in this post, you must have Wolfram CDF Player installed on your computer. It is available free from the Wolfram website.

The demo currently shows a banner that says "This file contains potentially unsafe dynamic content".  You can view the diagram by clicking on the "Enable Dynamics" button.  If and when I figure out how to get rid of the banner, this paragraph will disappear from the post!

Riemann Sums

The Riemann Sum is a complicated idea.  The integral \[\int_a^b f(x)\,dx\] involves three parameters: two numbers $a$ and $b$ and the function $x\mapsto f(x)$.  These are not freely varying parameters: They are subject to the requirements

  • The function $x\mapsto f(x)$  must be defined on the closed interval $[a,b]$ (let's pretend improper integrals don't exist).
  • The function must be Riemann integrable (continuous will do).

A particular Riemann Sum for this integral looks like \[\sum_{i=1}^n f(p_i)(x_i-x_{i-1})\]

It has three more parameters, a number and two lists of numbers satisfying some complicated conditions:

  • The number $n$ of subdivisions. 
  • The partition, which
    • is a list of $n+1$ numbers $\{x_0,x_1,\ldots,x_n\}$
    • satisfies the conditions
      •  $x_0<x_1<\ldots<x_n$
      • $x_0=a$
      • $x_n=b$
  • The list of evaluation points, which
    • is a list of $n$ numbers $\{p_1,\ldots,p_n\}$
    • satisfies the condition $x_{i-1}\leq p_i \leq x_i$ for $i=1,\ldots,n$.

A Riemann sum may or may not have various important properties.

  • The partition can be
    • uniform
    • random
    • chosen by a rule (increase the number of points as the derivative increases, for example)
  • The evaluation points can be chosen
    • randomly
    • at the midpoint
    • at the left end
    • at the right end
    • at the lowest point
    • at the highest point.

So the concept is complex, with several constituents and interrelationships to hold in your head all at once.  Experienced math people learn concepts like this all the time.  Math students have a harder time.  Manipulable diagrams can help.  Here is an example:

The Demo

 

In a class where students use computers with CDF Player installed, you could give them this demo along with instructions about how to use it and a list of questions that they must answer.  

Examples of instructions

  • Click on the big plus sign in the upper right corner for some options.
  • Move the slide labeled $n$ to make more or fewer subdivisions.
  • Click on the little plus sign besides the slide for some options such as allowing $n$ to increase automatically.
  • The buttons allow you to choose the type of partition, the type of evaluation points, and five functions to play with.

Sample questions

  1. Set $n=1$, uniform partition and midpoint and look at the results for each function.  Explain what you see.
  2. Set $n=4$,  uniform partition and midpoint and look at the results for each function.  Explain each of the following by referring to the picture:
    • For $x\mapsto x$, the estimate is exact.
    • For $x\mapsto x^2$, the estimate is less than the value of the integral.
    • For $x\mapsto x^5$, the error in the estimate is much worse than for $x^2$.
    • For $x\mapsto \sqrt{1-x^2}$ , the estimate is greater than the value of the integral.
  3. Go through the examples in 2. and check that when you make $n$ bigger the properties stated continue to be true.  Can you explain this?
  4. Starting with $n=4$, uniform and midpoint and then using bigger values, note that the error for  $x\mapsto \sqrt{1-x^2}$ is always bigger than the error for  $x\mapsto \sin \pi x$.  Try to explain this.  (Don't ask the students to prove it in freshman calculus).
  5. For $n=4$, uniform and midpoint (and then try bigger $n$), for $x\mapsto x^5$, the LeftSide error is always less than the RightSide error.  Explain using the picture.
  6. For which curves is the LeftSide estimate always the Lower Sum?  Always the Upper Sum?  Neither?  Does using Random instead of Uniform change these answers?

There are many other questions like this you can ask. After answering some of them, I claim (without proof) that the students will have a much better understanding of Riemann sums.

Note that teachers can use this Demo without knowing anything at all about Mathematica.  There are hundreds of Demos available in the cloud that can be used in the same way; many of the best are on the Wolfram Demonstration Project.

If you can program some in Mathematica, you can take the source code for this demo and modify it, for example to use other functions, to provide functions with changeable parameters and to use partitions following dynamic rules.

You could also have this up on a screen in your classroom for class discussion.  But I doubt that is the best use.  For classroom demos you probably need simple on-off demos that you prepare ahead or even write on the spot.  An example of a simple demo is in the post Offloading Abstraction.  I will talk about simple demos more in a later post.

Rant about why math teachers should use manipulable diagrams

A teacher in the past would draw an example of a RIemann sum on the blackboard and talk about a few features as they point at the board.  Nowadays, teachers have slides with accurately drawn Riemann sums and books have pictures of them.  This sort of thing gives the student a picture which (hopefully) stays in their head.  That picture is a kind of metaphor which enables you to think of the sum in terms of something that you are familiar with, just as you can think of a function as position and its derivative as velocity.  (Position and velocity are familiar from driving or any other kind of moving.  The picture of a Riemann sum is not something you knew before you studied them, but your brain has remarkable abilities to absorb a picture and the relations between parts of the picture, so once you have seen it you can call it up whenever you think of Riemann sums.)

But there are a lot of aspects of Riemann sums that cannot be demonstrated by a still picture.  When the mesh gets finer, the value of the sum tends to be closer to the exact value of the integral.  You can stare at the still picture and sort of visualize this.  Can you visualize a situation where changing to a finer mesh could make the error worse?  If someone suggests a high-frequency sine wave, can you visualize in your head why a finer mesh might make it worse?

An elaborate demo with lots of push buttons is something for students to play with on their own time and thereby gain a better understanding of the topic.  Before manipulable diagrams the only way you could do this was produce physical models.  I don't know of anyone who produced a physical model of a Riemann sum.  It is possible to do so with some parameters changeable but it would be difficult and not as flexible as the demo given here.

The world has more possibilities.  Use them.

Related posts

An elaborate Riemann Sum Demo (Mathematica notebook, source of the demo in this post)

Freezing a family of functions (previous post)

Images and Metaphors (in abstractmath.org)

Offloading abstraction (previous post)

More about the definition of function

Maya Incaand commented on my post Definition of "function":

Why did you decide against "two inequivalent descriptions in common use"?  Is it no longer true?

This question concerns [1], which is a draft article.  I have not promoted it to the standard article in abstractmath because I am not satisfied with some things in it. 

More specifically, there really are two inequivalent descriptions in common use.  This is stated by the article, buried in the text, but if you read the beginning, you get the impression that there is only one specification.  I waffled, in other words, and I expect to rewrite the beginning to make things clearer.

Below are the two main definitions you see in university courses taken by math majors and grad students.  A functional relation has the property that no two distinct ordered pairs have the same first element.

Strict definition: A function consists of a functional relation with specified codomain (the domain is then defined to be the set of first elements of pairs in the relation).  Thus if $A$ and $B$ are sets and $A\subseteq B$, then the identity function $1_A:A\to A$ and the inclusion function $i:A\to B$  are two different functions.

Relational definition: A function is a functional relation.  Then the identity and inclusion functions are the same function.  This means that a function and its graph are the same thing (discussed in the draft article).

These definitions are subject to variations:

Variations in the strict definition: Some authors use "range" for "codomain" in the definition, and some don't make it clear that two functions with the same functional relation but different codomains are different functions.

Variations in the relational definition: Most such definitions state explicitly that the domain and range are determined by the relation (the set of first coordinates and the set of second coordinates). 

Formalism

There are many other variations in the formalism used in the definition.  For example, the strict definition can be formalized (as in Wikipedia) as an ordered triple $(A, B, f)$ where $A$ and $B$ are sets and $f$ is a functional relation with the property thar every element of $A$ is the first element of an ordered pair in the relation.  

You could of course talk about an ordered triple $(A,f,B)$ blah blah.  Such definitions introduce arbitrary constructions that have properties irrelevant to the concept of function.  Would you ever say that the second element of the function $f(x)=x+1$ on the reals is the set of real numbers?  (Of course, if you used the formalism $(A,f,B)$ you would have to say the second element of the function is its graph! )

It is that kind of thing that led me to use a specification instead of a definition.  If you pay attention to such irrelevant formalism there seems to be many definitions of function.  In fact, at the university level there are only two, the strict definition and the relational definition.  The usage varies by discipline and age.  Younger mathematicians are more likely to use the strict definition.  Topologists use the strict definition more often than analysts (I think).

Usage

There is also variation in usage.

  • Most authors don't tell you which definition they use, and it often doesn't matter anyway. 
  • If an author defines a function using a formula, there is commonly an implicit assumption that the domain includes everything for which the formula is well-defined.  (The "everything" may be modified by referring to it as an integer, real, or complex function.)

Definitions of function on the web

Below are some definitions of function that appear on the web.  I have excluded most definitions aimed at calculus students or below; they often assume you are talking about numbers and formulas.  I have not surveyed textbooks and research papers.  That would have to be done for a proper scholarly article about mathematical usage of "function". But most younger people get their knowledge from the web anyway.

  1. Abstractmath draft article: Functions: Specification and Definition.  (Note:  Right now you can't get to this from the Table of Contents; you have to click the preceding link.) 
  2. Gyre&Gimble post: Definition of "function"
  3. Intmath discussion of function  Function as functional relation between numbers, with induced domain and range.
  4. Mathworld definition of function Functional-relation definition.  Defines $F:A\to B$ in a way that requires $B$ to be the image.
  5. Planet Math definition of function Strict definition.
  6. Prime Encyclopedia of Mathematics Functional-relation definition.
  7. Springer Encyclopedia of Math definition of function  Strict definition, except not clear if different codomains mean different functions.
  8. Wikipedia definition of function Discusses both definitions.
  9. Wisconsin Department of Public Instruction Definition of function  Function as functional relation.

The meaning of the word “superposition”

This is from the Wikipedia article on Hilbert's 13th Problem as it was on 31 March 2012:

[Hilbert's 13th Problem suggests this] question: can every continuous function of three variables be expressed as a composition  of finitely many continuous functions of two variables? The affirmative answer to this general question was given in 1957 by Vladimir Arnold, then only nineteen years old and a student of Andrey Kolmogorov. Kolmogorov had shown in the previous year that any function of several variables can be constructed with a finite number of three-variable functions. Arnold then expanded on this work to show that only two-variable functions were in fact required, thus answering Hilbert's question.  

In their paper A relation between multidimensional data compression and Hilbert’s 13th  problem,  Masahiro Yamada and Shigeo Akashi describe an example of Arnold's theorem this way: 

Let $f ( \cdot , \cdot, \cdot )$ be the function of three variable defined as \(f(x, y, z)=xy+yz+zx\), $x ,y , z\in \mathbb{C}$ . Then, we can easily prove that there do not exist functions of two variables $g(\cdot , \cdot )$ , $u(\cdot, \cdot)$ and $v(\cdot , \cdot )$ satisfying the following equality: $f(x, y, z)=g(u(x, y),v(x, z)) , x , y , z\in \mathbb{C}$ . This result shows us that $f$ cannot be represented any 1-time nested superposition constructed from three complex-valued functions of two variables. But it is clear that the following equality holds: $f(x, y, z)=x(y+z)+(yz)$ , $x,y,z\in \mathbb{C}$ . This result shows us that $f$ can be represented as a 2-time nested superposition.

The article about superposition in All about circuits says:

The strategy used in the Superposition Theorem is to eliminate all but one source of power within a network at a time, using series/parallel analysis to determine voltage drops (and/or currents) within the modified network for each power source separately. Then, once voltage drops and/or currents have been determined for each power source working separately, the values are all “superimposed” on top of each other (added algebraically) to find the actual voltage drops/currents with all sources active. 

Superposition Theorem in Wikipedia:

The superposition theorem for electrical circuits states that for a linear system the response (Voltage or Current) in any branch of a bilateral linear circuit having more than one independent source equals the algebraic sum of the responses caused by each independent source acting alone, while all other independent sources are replaced by their internal impedances.

Quantum superposition in Wikipedia:  

Quantum superposition is a fundamental principle of quantum mechanics. It holds that a physical system — such as an electron – exists partly in all its particular, theoretically possible states (or, configuration of its properties) simultaneously; but, when measured, it gives a result corresponding to only one of the possible configurations (as described in interpretation of quantum mechanics).

Mathematically, it refers to a property of solutions to the Schrödinger equation; since theSchrödinger equation is linear, any linear combination of solutions to a particular equation will also be a solution of it. Such solutions are often made to be orthogonal (i.e. the vectors are at right-angles to each other), such as the energy levels of an electron. By doing so the overlap energy of the states is nullified, and the expectation value of an operator (any superposition state) is the expectation value of the operator in the individual states, multiplied by the fraction of the superposition state that is "in" that state

The CIO midmarket site says much the same thing as the first paragraph of the Wikipedia Quantum Superposition entry but does not mention the stuff in the second paragraph.

In particular, the  Yamada & Akashi article describes the way the functions of two variables are put together as "superposition", whereas the Wikipedia article on Hilbert's 13th calls it composition.  Of course, superposition in the sense of the Superposition Principle is a composition of multivalued functions with the top function being addition.  Both of Yamada & Akashi's examples have addition at the top.  But the Arnold theorem allows any continuous function at the top (and anywhere else in the composite).  

So one question is: is the word "superposition" ever used for general composition of multivariable functions? This requires the kind of research I proposed in the introduction of The Handbook of Mathematical Discourse, which I am not about to do myself.

The first Wikipedia article above uses "composition" where I would use "composite".  This is part of a general phenomenon of using the operation name for the result of the operation; for examples, students, even college students, sometimes refer to the "plus of 2 and 3" instead of the "sum of 2 and 3". (See "name and value" in abstractmath.org.)  Using "composite" for "composition" is analogous to this, although the analogy is not perfect.  This may be a change in progress in the language which simplifies things without doing much harm.  Even so, I am irritated when "composition" is used for "composite".

Quantum superposition seems to be a separate idea.  The second paragraph of the Wikipedia article on quantum superposition probably explains the use of the word in quantum mechanics.

Offloading chunking

In my previous post I wrote about the idea of offloading abstraction, the sort of things we do with geometric figures, diagrams (that post emphasized manipulable diagrams), drawing the tree of an algebraic expression, and so on.  This post describes a way to offload chunking.  

Chunking

I am talking about chunking in the sense of encapsulation, as some math ed. people use it.  I wrote about it briefly in [1], and [2] describes the general idea.  I don't have a good math ed reference for it, but I will include references if readers supply them.  

Chunking for some educators means breaking a complicated problem down into pieces and concentrating on them one by one.  That is not really the same thing as what I am writing about.  Chunking as I mean it enables you to think more coherently and efficiently about a complicated mathematical structure by objectifying some of the data in the structure.  

Project 

This project an example of how chunking could be made visible in interactive diagrams, so that the reader grasps the idea of chunking.  I guess I am chunking chunking.  

Here is a short version of an example of chunking worked out in ridiculous detail in reference [1]. 

Let \[f(x)=.0002{{\left( \frac{{{x}^{3}}-10}{3{{e}^{-x}}+1} \right)}^{6}}\]  How do I know it is never negative?  Well, because it has the form (a positive number)(times)(something)$^6$.    Now (something)$^6$ is ((something)$^3)^2$ and a square is always nonnegative, so the function is (positive)(times)(nonnegative), so it has to be nonnegative.  

I recognized a salient fact about .0002, namely that it was positive: I grayed out (in my mind) its exact value, which is irrelevant.  I also noticed a salient fact about \[{{\left( \frac{{{x}^{3}}-10}{3{{e}^{-x}}+1} \right)}^{6}}\] namely that it was (a big mess that I grayed out)(to the 6th power).  And proceeded from there.  (And my chunking was inefficient; for example, it is more to the point that .0002 is nonnegative).

I believe you could make a movie of chunking like this using Mathematica CDF.  You would start with the formula, and then as the voiceover said "what's really important is that .0002 is nonnegative" the number would turn into a gray cloud with a thought balloon aimed at it saying "nonnegative".  The other part would turn into a gray cloud to the sixth, then the six would break into 3 times 2 as the voice comments on what is happening.  

It would take a considerable amount of work to carry this out.  Lots of decisions would need to be made.  

One problem is that Mathematica doesn't provide a way to do voiceovers directly (as far as I know).  Perhaps you could make a screen movie using screenshot software in real time while you talked and (offscreen) pushed buttons that made the various changes happen.

You could also do it with print instead of voiceover, as I did in the example in this post. In this case you need to arrange to have the printed part and the diagram simultaneously visible.  

I may someday try my hand at this.  But I would encourage others to attack this project if it interests them.  This whole blog is covered by the Creative Commons Attribution – ShareAlike 3.0 License", which means you may use, adapt and distribute the work freely provided you follow the requirements of the license.

I have other projects in mind that I will post separately.

References

  1. Abstractmath article on chunking.
  2. Wikipedia on chunking

Offloading abstraction

Note: To manipulate the diagrams in this post and in most of the files it links to, you must have Wolfram CDF Player installed on your computer. It is available free from the Wolfram website.

The diagram above shows you the tangent line to the curve $y=x^3-x$ at a specific point.  The slider allows you to move the point around, and the tangent line moves with it. You can click on one of the plus signs for options about things you can do with the slider.  (Note: This is not new.  Many other people have produced diagrams like this one.)

I have some comments to make about this very simple diagram. I hope they raise your consciousness about what is going on when you use a manipulable demonstration.

Farming out your abstraction load

A diagram showing a tangent line drawn on the board or in a paper book requires you visualize how the tangent line would look at other points.  This imposes a burden of visualization on you.  Even if you are a new student you won't find that terribly hard (am I wrong?) but you might miss some things at first:

  • There are places where the tangent line is horizontal.
  • There are places where some of the tangent lines cross the curve at another point. Many calculus students believe in the myth that the tangent line crosses the curve at only one point.  (It is not really a myth, it is a lie.  Any decent myth contains illuminating stories and metaphors.)
  • You may not envision (until you have some experience anyway) how when you move the tangent line around it sort of rocks like a seesaw.

You see these things immediately when you manipulate the slider.

Manipulating the slider reduces the load of abstract thinking in your learning process.     You have less to keep in your memory; some of the abstract thinking is offloaded onto the diagram.  This could be described as contracting out (from your head to the picture) part of the visualization process.  (Visualizing something in your head is a form of abstraction.)

Of course, reading and writing does that, too.  And even a static graph of a function lowers your visualization load.  What interactive diagrams give the student is a new tool for offloading abstraction.

You can also think of it as providing external chunking.  (I'll have to think about that more…)

Simple manipulative diagrams vs. complicated ones

The diagram above is very simple with no bells and whistles.  People have come up with much more complicated diagrams to illustrate a mathematical point.  Such diagrams:

  • May give you buttons that give you a choice of several curves that show the tangent line.
  • May give a numerical table that shows things like the slope or intercept of the current tangent line.
  • May also show the graph of the derivative, enabling you to see that it is in fact giving the value of the slope.

Such complicated diagrams are better suited for the student to play with at home, or to play with in class with a partner (much better than doing it by yourself).  When the teacher first explains a concept, the diagrams ought to be simple.

Examples

  • The Definition of derivative demo (from the Wolfram Demonstration Project) is an example that provides a table that shows the current values of some parameters that depend on the position of the slider.
  • The Wolfram demo Graphs of Taylor Polynomials is a good example of a demo to take home and experiment extensively with.  It gives buttons to choose different functions, a slider to choose the expansion point, another one to choose the number of Taylor polynomials, and other things.
  • On the other hand, the Wolfram demo Tangent to a Curve is very simple and differs from the one above in one respect: It shows only a finite piece of the tangent line.  That actually has a very different philosophical basis: it is representing for you the stalk of the tangent space at that point (the infinitesimal vector that contains the essence of the tangent line).
  • Brian Hayes wrote an article in American Scientist containing a moving graph (it moves only  on the website, not in the paper version!) that shows the changes of the population of the world by bars representing age groups.  This makes it much easier to visualize what happens over time.  Each age group moves up the graph — and shrinks until it disappears around age 100 — step by step.  If you have only the printed version, you have to imagine that happening.  The printed version requires more abstract visualization than the moving version.
  • Evaluating an algebraic expression requires seeing the abstract structure of the expression, which can be shown as a tree.  I would expect that if the students could automatically generate the tree (as you can in Mathematica)  they would retain the picture when working with an expression.  In my post computable algebraic expressions in tree form I show how you could turn the tree into an evaluation aid.  See also my post Syntax trees.

This blog has a category "Mathematica" which contains all the graphs (many of the interactive) that are designed as an aid to offloading abstraction.

Idempotents by sketches and forms

 
This post provides a detailed description of an example of a mathematical structure presented as a sketch and as a form.  It is a supplement to my article An Introduction to forms.  Most of the constructions I mention here are given in more detail in that article.
 
It helps in reading this post to be familiar with the basic ideas of category, including commutative diagram and limit cone, and of the concepts of logical theory and model in logic.
 

Sketches and forms

sketch of a mathematical structure is a collection of objects and arrows that make up a digraph (directed graph), together with some specified cones, cocones and diagrams in the digraph.  A model of the sketch is a digraph morphism from the digraph to some category that takes the cones to limit cones, the cocones to colimit cocones, and the diagrams to commutative diagrams.  A morphism of models of a sketch from one model to another in the same category is a natural transformation.  Sketches can be used to define all kinds of algebraic structures in the sense of universal algebra, and many other types of structures (including many types of categories).  

There are many structures that sketches cannot sketch.  Forms were first defined in [4].  They can define anything a sketch can define and lots of other things.  [5] gives a leisurely description of forms suitable for people who have a little bit of knowledge of categories and [1] gives a more thorough description.  

An idempotent is a very simple kind of algebraic structure.  Here I will describe both a sketch and a form for idempotents. In another post I will do the same for binops (magmas).

Idempotent

An idempotent is a unary operation $u$ for which $u^2=u$.

  • If $u$ is a morphism in a category whose morphisms are set functions, a function $u:S\to S$ is an idempotent if $u(u(x))=u(x)$ for all $x$ in the domain.  
  • Any identity element in any category is an idempotent.
  • A nontrivial example is the function $u(x,y):=(-x,0)$ on the real plane.  

Any idempotent $u$ makes the following diagram commute

and that diagram can be taken as the definition of idempotent in any category.

The diagram is in green.  In this post (and in [5]) diagrams in the category of models of a sketch or a form are shown in green.

A sketch for idempotents

The sketch for idempotents contains a digraph with one object and one arrow from that object to itself (above left) and one diagram (above right).  It has no cones or cocones.  So this is an almost trivial example.  When being expository (well, I can hardly say "when you are exposing") your first example should not be trivial, but it should be easy.  Let's call the sketch $\mathcal{S}$.

  • The diagram looks the same as the green diagram above.  It is in black, because I am showing things in syntax (things in sketches and forms) in black and semantics (things in categories of models) in green.
  • The green diagram is a commutative diagram in some category (unspecified).  
  • The black diagram is a diagram in a digraph. It doesn't make sense to say it is commutative because digraphs don't have composition of arrows.
  • Each sketch has a specific digraph and lists of specific diagrams, cones and cocones.  The left digraph above is not in the list of diagrams of $\mathcal{S}$ (see below).

The definition of sketch says that every diagram in the official list of diagrams of a given sketch must become a commutative diagram in a model.  This use of the word "become" means in this case that a model must be a digraph morphism $M:\mathcal{S}\to\mathcal{C}$ for some category $\mathcal{C}$ for which the diagram below commutes.

This sketch generates a category called the Theory ("Cattheory" in [5]) of the sketch $\mathcal{S}$, denoted by $\text{Th}(\mathcal{S})$.  It is roughly the "smallest" category containing $f$ and $C$ for which the diagrams in $\mathcal{S}$ are commutative.  
 
This theory contains the generic model $G:\mathcal{S}\to \text{Th}(\mathcal{S})$ that takes $f$ and $C$ to themselves.
  • $G$ is "generic" because anything you prove about $G$ is true of every model of $\mathcal{S}$ in any category.
  • In particular, in the category $\text{Th}(\mathcal{S})$, $G(f)\circ G(f)=G(f)$.  
  • $G$ is a universal morphism in the sense of category theory: It lifts any model $M:\mathcal{S}\to\mathcal{C}$ to a unique functor $\bar{M}=M\circ G:\text{Th}(\mathcal{S})\to\mathcal{C}$ which can therefore be regarded as the same model.  See Note [2].
SInce models are functors, morphisms between models are natural transformations.  This gives what you would normally call homomorphisms for models of almost any sketchable structure.  In [2] you can find a sketch for groups, and indeed the natural transformations between models are group homomorphisms.

Sketching categories

You can sketch categories with a sketch CatSk containing diagrams and cones, but no cocones.  This is done in detail in [3]. The resulting theory $\text{Th}(\mathbf{CatSk})$ is required to be the least category-with-finite-limits generated by $\mathcal{S}$ with the diagrams becoming commutative diagrams and the cones becoming limit cones.  This theory is the FL-Theory for categories, which I will call ThCat (suppressing mention of FL).  

Doctrines

In general the theory of a particular kind of structure contains a parameter that denotes its doctrine. The sketch $\mathcal{S}$ for idempotents didn't require cones, but you can construct theories $\text{Th}(\mathcal{S})$, $\text{Th} (\text{FP},\mathcal{S})$ and $\text{Th}(\text{FL},\mathcal{S})$ for idempotents (FP means it is a category with finite products).  

In a strong sense, all these theories have the same models, namely idempotents, but the doctrine of the theory allows you to use more mechanisms for proving properties of idempotents.  (The doctrine for $\text{Th}(\mathcal{S})$ provides for equational proofs for unary operations only, a doctrine which has no common name such as FP or FS.)  The paper [1] is devoted to explicating proof in the context of forms, using graphs and diagrams instead of formulas that are strings of symbols.

Describing composable pairs of arrows

The form for any type of structure is constructed using the FL theory for some type of category, for example category with all limits, cartesian closed category, topos, and so on.  The form for idempotents can be constructed in ThCat (no extra structure needed).  The form for reflexive function spaces (for example) needs the FL theory for cartesian closed categories (see [5]).

Such an FL theory must contain objects $\text{ob}$ and $\text{ar}$ that become the set of objects and the set of arrows of the category that a model produces.  (Since FL theories have models in any category with finite limits, I could have said "object of objects" and "object of arrows".  But in this post I will talk about only models in Set.)

ThCat contains an object  $\text{ar}_2$ that represents composable pairs of arrows.  That requires a cone to define it:

This must become a limit cone in a model.

  • I usually show cones in blue. 
  • $\text{dom}$ and $\text{cod}$ give (in a model) the domain and codomain of an arrow.
  • $\text{lfac}$ gives the left factor and $\text{rfac}$ gives the right factor. It is usually useful to give suggestive names to some of the projections in situations like this, since they will be used elsewhere (where they will be black!).
  • The objects and arrows in the diagram (including $\text{ar}_2$) are already members of the FL theory for categories.
  • This diagram is annotated in green with sample names of objects and arrows that might exist in a model.  Atish and I introduced that annotation system in [1] to help you chase the diagram and think about what it means.

This cone is a graph-based description of the object of composable arrows in a category (as opposed to a linguistic or string-based description).

Describing endomorphisms

Now an idempotent must be an endomorphism, so we provide a cone describing the object of endomorphisms in a category. This cone already exists in the FL theory for categories.

  • $\text{loop}$ is a monomorphism (in fact a regular mono because it is the mono produced by an equalizer) so it is not unreasonable to give the element annotation for $\text{endo}$ and $\text{ar}$ the same name.
  • "$\text{dc}$" takes $f$ to its domain and codomain. 
  • $\text{loop}$ and "$\text{dc}$" were not created when I produced the cone above.  They were already in the FL theory for categories.
 
Since the cone defining $\text{ar}_2$ is a limit cone (in the Theory, not in a model), if you have any other commutative cone (purple) to that cone, a unique arrow (red) $\text{diag}$ automatically is present as shown below:

This particular purple cone is the limit cone defining $\text{endo}$ just defined.  Now $\text{diag}$ is a specific arrow in the FL theory for categories. In a model of the theory (which is a category in Set or in some other category) takes an endomorphism to the corresponding pair of composable arrows.

The object of idempotents

Now using these arrows we can define the object $\text{idm}$ of idempotents using the diagram below. See Note [3].

 

 

 

 

 

Idm is an object in ThCat.  In any category, in other words in any model of ThCat, idm becomes the set of idempotent arrows in that category.

In the terminology of [5], the object idm is the form for idempotents, and the cone it is the limit of is the description of idempotent.  

Now take ThCat and adjoin an arrow $g:1\to\text{idm}$.  You get a new FL category I will call the FL-theory of the form for idempotents.  A model of the theory of the form in Set  is a category with a specified idempotent. A particular example of a model of the form idm in the category of real linear vector spaces is the map $u(x,y):=(-x,0)$ of the (set of points of) the real plane to itself (it is an idempotent endomorphism of $\textbf{R}^2$).  

This example is typical of forms and their models, except in one way:  Idempotents are also sketchable, as I described above.  Many mathematical structures can be perceived as models of forms, but not models of sketches, such as reflexive function spaces as in [5].

Notes

[1] The diagrams shown in this post were drawn in Mathematica.  The code for them is shown in the notebook SketchFormExamples.nb .  I am in the early stages of developing a package for drawing categorical diagrams in Mathematica, so this notebook shows the diagrams defined in very primitive machine-code-like Mathematica.  The package will not rival xypic for TeX any time soon.  I am doing it so I can produce diagrams (including 3D diagrams) you can manipulate.

[2] In practice I would refer to the names of the objects and arrows in the sketch rather than using the M notation:  I might write $f\circ f=f$ instead of $M(f)\circ M(f)=M(f)$ for example.  Of course this confuses syntax with semantics, which sounds like a Grievous Sin, but it is similar to what we do all the time in writing math:  "In a semigroup, $x$ is an idempotent if $xx=x$."  We use same notation for the binary operation for any semigroup and we use $x$ as an arbitrary element of most anything.  Actually, if I write $f\circ f=f$ I can claim I am talking in the generic model, since any statement true in the generic model is true in any model.  So there.

[3] In the Mathematica notebook SketchFormExamples.nb in which I drew these diagrams, this diagram is plotted in Euclidean 3-space and can be viewed from different viewpoints by running your cursor over it.

References

[1] Atish Bagchi and Charles Wells, Graph-Base Logic and Sketches, draft, September 2008, on ArXiv.

[2] Michael Barr and Charles Wells, Category Theory for Computing Science (1999). Les Publications CRM, Montreal (publication PM023).

[3] Michael Barr and Charles Wells, Toposes, Triples and Theories (2005). Reprints in Theory and Applications of Categories 1.

[4] Charles Wells, A generalization of the concept of sketch, Theoretical Computer Science 70, 1990

[5] Charles Wells, An Introduction to forms.

 

 

 

Bugs in English and in math

Everyone knows that computer programs have bugs.  In fact, languages have bugs, too, although we don't usually call them that.  

Bugs in English 

  

Right

Q: "Should I turn left at the next corner?" A: "Right".  Probably most Americans who drive now know this bug.  The answer could mean "yes" or "turn right".  So we have to stop and think how to answer this question.  That makes it a bug.  

Too, two

Comment: " We will take Route 30".  Answer: "We will take Route 30 too".  This bug is probably responsible for the survival of the word "also".  

Note that unlike the case of "right", this is a bug only of spoken English.

Subject and predicate

In Comma rule found dysfunctional, I wrote about the problem that in formal English writing there is no way to indicate where the subject ends and the predicate begins.  This causes a problem reading complicated sentences with many clauses such as academic writing often uses.  Of course, one way around this is to write short, simple sentences!  (That sounds like the subject of a future blog…) 

Bugs in the symbolic language of math

  

Fractions

In both Excel and Mathematica, "1/2*3" means 3/2. Now, I would think "1/2a" means "1/(2a)", but younger mathematicians are taught PEMDAS (see Purplemath), which says that division and multiplication have the same precedence and operations are evaluated from left to right.  

 If in Mathematica you define a function f[a_] := 1/2a, f[3] evaluates to 3/2, so Mathematica (and most other computer languages) agree with PEMDAS. (Note: When you write 1/2a in a Mathematica notebook, it automatically puts a space between the 2 and the a, and space in Mathematica means times, so it does warn you.)

Nevertheless, my ancient education would lead me to write (1/2)a for that meaning.  This means I must learn to write 1/(2a) for the other meaning instead of 1/2a.  

Questions:

  • Did the language really change or was I always "doing it wrong"?  I would like to hear from other ancient mathematicians.  (But I don't know very many who would read blogs or Purplemath.)
  • Should such a phenomenon be called a bug? 

Repeated exponentiation

In Excel, "2^2^3" means $(2^2)^3$, in other words, 64.  In Mathematica, it means $2^{(2^3)}=2^8=256$.  My impression is that most mathematicians expect it to mean $2^{(2^3)}$.  

References: This post in Walking Randomly, my post Mathematical UsageWikipedia's article.  

Exponentiation on functions is ambiguous

If $f:\mathbb{R}\to\mathbb{R}$ is a function, $f^2(x)$ can mean either $f(f(x))$ or $f(x)f(x)$, and both usages are common.  You should tell your students about this because no one is ever going to make one of the usages go away.

A far worse catastrophe is the fact that in calculus books, $\sin^2x=(\sin\,x)(\sin\,x)$ but $\sin^{-1}x=\text{arcsin}\,x$.  I betcha (lived in Minnesota four years now) we could succeed with a campaign to convince calc book publishers to always write $(\sin\,x)^2$ and $\arcsin\,x$.  

Bugs in the Mathematical Dialect of English

The mathematical dialect of English is what I call Mathematical English in the abstractmath website.  It is a different language from the symbolic language, which is not a dialect of English.

I have written about the problems with Mathematical English in a ridiculous number of places.  (See references in The Handbook of Mathematical Discourse).  It is normal for a dialect of a language to use words and grammatical structures that in the original language mean different things.  (See Dialects below).

Words with different meanings

  • A set is a group in standard English, but not in math English.  
  • The number 2+3i is a real number in standard English, but not in math English.  
  • And so on.

Use of adjectives and prefixes

  • A "noncommutative ring" has commutative addition.
  • A "semigroup" has a fully defined binary operation.

If, then

The bug that grabs math newbies by the throat and won't let go is the meaning of "If P, then Q".  

  • "If a number is divisible by 4, then it is even" in math dialect means a number not divisible by 4 might be even anyway.
  • "If you eat your broccoli you will get your dessert" in standard American Parental English does not mean you might get your dessert if you don't eat your broccoli.

And then there is the phenomenon of Vacuous Implication, which leaves students gasping and writhing.

About "dialects"

Most Americans are not familiar with dialects in the sense I am using the word here, since the only really different dialects we have are Gullah and Hawaiian Pidgin, both of which are very hard to understand; although for example Appalachian English and African-American urban vernacular [1] are dialects of a milder sort.  I grew up in Savannah and heard diluted Gullah sometimes on the street (didn't understand much).  I am also rather familiar with Züritüütsch since we lived in Zürich for a year.   

What the rest of the world call dialects have many distinctive properties:

  • They have nonstandard pronunciation to the point where they are difficult to understand. 
  • They have differences in grammar.  (Both Gullah and especially Hawaiian Creole have differences in grammar from Standard English.) 
  • They have differences in vocabulary, enough sometimes to cause misunderstanding.

I grew up speaking an Atlanta dialect, which really did have differences in all those parameters.  But what people today call a Southern accent is really just an accent (minor variations in pronunciation), not a dialect.  

Hawaiian Creole, and possibly Gullah, but not the other dialects I mentioned, are singled out by linguists as creoles because they been modified heavy influence from another language.  Züritüütsch is not a creole, but it is quite difficult for native German-speakers to understand.  The Swiss situation particularly emphasizes the distinction between "dialect" and "accent".  The typical native of Zürich speaks Züritüütsch and also speaks standard German with a Swiss accent.  

Reference

[1] What Language Is (And What It Isn't and What It Could Be) by John H. McWhorter. Gotham, 2011.

 

 

An Introduction to Forms

In 2009, I wrote a sequence of posts on this blog explaining the concept of form that I introduced in [1].  I have now updated and combined them into an article [2].  The posts no longer exist on the blog. The article contains links to other papers on forms.

[1] A generalization of the concept of sketch, Theoretical Computer Science 70, 1990.

[2] An Introduction to forms.