Category Archives: abstractmath.org

Posts about the abstractmath.org website.

Presenting math on the web

This is a long post about ways to present math on the web, in the context of what I have done with The Handbook of Mathematical Discourse and abstractmath.org (Abmath).  “Ways to present math” include both organization and production technology.

The post is motivated by and focused on my plans to reconstruct Abmath this fall, when I will not be teaching.    During the last couple of years I have experimented with several possibilities for the reconstruction (while doing precious little on the actual website) and have come to a tentative conclusion about how I will do it.  I am laying all this out here, past history and future plans, in the hope that readers will have suggestions that will help the process (or change my mind).

I set out to write both the Handbook and Abmath using ideas about how math should be presented on the web.  They came out differently.  Now I think I went wrong with some of the ways in which I organized Abmath and that I need to reconstruct it so that it is more like the Handbook.  On the other hand, I have decided to stick with the production method I used for Abmath. I will explain.

Organization

My concept for both these works was that they  would have these properties:

1) Each work would be a cloud of articles. They would have little or no hierarchy.  They would consist of lots of short articles, not organized into chapters, sections and subsections.

2) The articles would be densely hyperlinked with each other and with the rest of the web. The reader would use the links to move from article to article. The articles might occur in alphabetical order in the production file but to the reader the order would be irrelevant.

I wanted the works to be organized that way because that is what I wanted from an information-presenting website.  I want it that way because I am a grasshopper. Wikipedia and n-lab are each organized as a cloud of articles. I started writing the Handbook in the late nineties before Wikipedia began.

The Handbook exists in two forms. The web version is a hypertext PDF file that consists of short articles with extensive interlinking. The printed book has the same short articles arranged in alphabetical order. In the book form, the links are replaced by page indices (“paper hyperlinks”). In both forms some links are arranged as lists  of related topics.

Abstractmath.org is a large, interlinked collection of html pages.  They are organized in four large sections with many subsections.

Many entrances

For this cloud of articles arrangement to work, there must be many entrances into the website, so that a reader can find what they want. The Handbook has a list of entries in alphabetical order. Certain entries (for example the entries on attitudes, on behaviors, and on multiple meanings) have internal lists of links to examples of what that entry discusses.  In addition, the paper version has an index that (in theory) provides links to all important occurrences of each concept in the book.  This index is not included in the current hypertext version, although the LaTeX package hyperref would make it possible to include it.  On the other hand, the hypertext version has the PDF search capability.

Abmath has a table of contents, listing articles in hierarchical form, as well as an index, which is different from the Handbook index in that it gives only one link from each word or phrase. In addition, it has header sections that briefly describe the contents of each main section and (in some cases) subsection, and also a Diagnostic Examples section (currently fragmentary) in which each entry provides a description of a particular problem that someone may have in understanding abstract math, with links to where it is discussed. The website currently has no search capability.

The Handbook is really a cloud of articles, and Abmath is not. I made a serious mistake imposing a hierarchy on Abmath, and that is the main thing I want to correct when I reconstruct it.  Basically, I want to dissolve the hierarchy into a cloud of articles.

Production methods

The Handbook was composed using LaTeX.  It originally existed in hypertext form (in a PDF file) and lived on the web for several years, generating many useful suggestions. I wrote a LaTeX header that could be set to produce PDF output with hyperlinks or PDF output formatted as a book with paper hyperlinks; that form was eventually published as a book.

I used a number of Awk programs to gather the various kinds of links.  For example, every entry referring to a math word that has multiple meanings was marked and an Awk program gathered them into a list of links.
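As a rough illustration of that kind of bookkeeping, here is a minimal sketch in Python rather than Awk. The macros \entry{...} and \tag{...} and the sample text are hypothetical, invented for this example; they are not the markup actually used in the Handbook source. The only real LaTeX command assumed is hyperref's \hyperlink.

```python
# Hypothetical source format: each entry starts with \entry{name}, and entries
# about words with several meanings carry a \tag{multiple-meanings} marker.
# Neither macro is taken from the Handbook's actual LaTeX source.
SOURCE = r"""
\entry{field}
\tag{multiple-meanings}
A field may be an algebraic structure or a vector field.

\entry{lemma}
A theorem used mainly to prove another theorem.

\entry{domain}
\tag{multiple-meanings}
The domain of a function, or a connected open set.
"""


def entries_with_tag(source: str, tag: str) -> list[str]:
    """Return the sorted names of all entries whose body contains the tag."""
    names = []
    # Each chunk begins right after an "\entry{" and runs to the next one.
    for chunk in source.split(r"\entry{")[1:]:
        name, _, body = chunk.partition("}")
        if rf"\tag{{{tag}}}" in body:
            names.append(name)
    return sorted(names)


def link_list(names: list[str]) -> str:
    """Format entry names as one hyperref \\hyperlink per line."""
    return "\n".join(rf"\hyperlink{{{n}}}{{{n}}}" for n in names)


print(link_list(entries_with_tag(SOURCE, "multiple-meanings")))
```

The point is only that a plain-text source makes this sort of gathering a few lines of code.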

I generated the html pages for Abmath using Microsoft Word and MathType.  MathType is very easy to use and has the capability (recently acquired) of converting all math entries that it generated  into TeX. The method used for Abmath has several defects.  You can’t apply Awk (or nowadays Python) programs to a Word document since it is in a proprietary format.  Another problem is that the appearance of the result varies with browser.

But the Abmath method also has advantages.  It produces html documents which can be read in windows that you can make narrower or wider and the text will adjust.  PDF files are fixed width and rigid, and I find clicking on links requires you to be annoyingly precise with your fingers.

So my original thought was to go back to LaTeX for the new version of Abmath. There are several ways to produce html files from LaTeX, and converting the MathType entries to TeX provides a big headstart on converting the Word files into text files.  Then I could use Awk to do a lot of bookkeeping and cut the hyperlink errors, the way I did with the Handbook.

So at first I was quite nostalgic about the wonderful time I had doing the Handbook in LaTeX — until I remembered all the fussing I did to include illustrations and marginal remarks. (I couldn’t just put the illo there and leave it.) Until I remembered how slowly the resulting PDF file loads because there seems to be no way to break it into individual article files without breaking the links.

And then I found that (as far as I could determine) there is no HTMLTeX that produces a reasonable HTML file from any TeX file the way PDFTeX produces a PDF file from any TeX file, using Knuth’s  TeX program. In fact all the TeX to HTML systems I investigated don’t use Knuth’s program at all — they just have code in some programming language that reads a TeX file and interprets what the programmer felt like interpreting.  I would love to be contradicted concerning this.

So now my thought is to stick with Word and MathType.  And to do textual manipulation I will have to learn Word Basic.  I just ordered two books on Word Basic. I would rather learn Python, but I have to work with what I have already done.  Stay tuned.


Function as map

This is a first draft of an article to eventually appear in abstractmath.

Images and metaphors

To explain a math concept, you need to explain how mathematicians think about the concept. This is what in abstractmath I call the images and metaphors carried by the concept. Of course you have to give the precise definition of the concept and basic theorems about it. But without the images and metaphors most students, not to mention mathematicians from a different field, will find it hard to prove much more than some immediate consequences of the definition. Nor will they have much sense of the place of the concept in math and applications.

Teachers will often explain the images and metaphors with handwaving and pictures in a fairly vague way. That is good to start with, but it’s important to get more precise about the images and metaphors. That’s because images and metaphors are often not quite a good fit for the concept — they may suggest things that are false and not suggest things that are true. For example, if a set is a container, why isn’t the element-of relation transitive? (A coin in a coinpurse in your pocket is a coin in your pocket.)
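The coin example can be checked mechanically. This is a small sketch that models the purse and the pocket as nested sets (the names are mine, chosen for the illustration):

```python
# A coin inside a purse inside a pocket, modelled as nested sets.
coin = "coin"
purse = frozenset({coin})      # the purse contains the coin
pocket = frozenset({purse})    # the pocket contains the purse

print(coin in purse)    # True
print(purse in pocket)  # True
print(coin in pocket)   # False: set membership is not transitive
```

The container metaphor suggests the last line should print True; the definition of set membership says otherwise.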

“A metaphor is a useful way to think about something, but it is not the same thing as the same thing.” (I think I stole that from the Economist.) Here, I am going to get precise with the notion that a function is a map. I am acting like a mathematician in “getting precise”, but I am getting precise about a metaphor, not about a mathematical object.

A function is a map

A map (ordinary paper map) of Minnesota has the property that each point on the paper represents a point in the state of Minnesota. This map can be represented as a mathematical function from a subset of a 2-sphere to {{\mathbb R}^2}. The function is a mathematical idealization of the relation between the state and the piece of paper, analogous to the mathematical description of the flight of a rocket ship as a function from {{\mathbb R}} to {{\mathbb R}^3}.

The Minnesota map-as-function is probably continuous and differentiable, and as is well known it can be angle preserving or area preserving but not both.

So you can say there is a point on the paper that represents the location of the statue of Paul Bunyan in Bemidji. There is a set of points that represents the part of the Mississippi River that lies in Minnesota. And so on.

A function has an image. If you think about it you will realize that the image is just a certain portion of the piece of paper. Knowing that a particular point on the paper is in the image of the function is not the information contained in what we call “this map of Minnesota”.

This yields what I consider a basic insight about function-as-map:  The map contains the information about the preimage of each point on the paper map. So:

The map in the sense of a “map of Minnesota” is represented by the whole function, not merely by the image.

I think that is the essence of the metaphor that a function is a map. And I don’t think newbies in abstractmath always understand that relationship.
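A toy version of the distinction, as a sketch: the place names are real but the paper coordinates are invented, and a dictionary stands in for the function from places to points on the paper.

```python
# Two hypothetical "maps": each sends a place to a point on the paper.
# The coordinates are made up for illustration.
good_map = {"Bemidji": (2, 8), "Saint Paul": (6, 3), "Duluth": (8, 7)}
scrambled_map = {"Bemidji": (6, 3), "Saint Paul": (8, 7), "Duluth": (2, 8)}


def image(f):
    """The set of paper points that represent something."""
    return set(f.values())


def preimage(f, point):
    """The set of places represented by a given paper point."""
    return {place for place, p in f.items() if p == point}


print(image(good_map) == image(scrambled_map))  # True: same marks on the paper
print(good_map == scrambled_map)                # False: different maps
print(preimage(good_map, (2, 8)))               # {'Bemidji'}
print(preimage(scrambled_map, (2, 8)))          # {'Duluth'}
```

The two dictionaries have the same image, so the image alone cannot tell you where Bemidji is; the preimages can.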

A morphism is a map

The preceding discussion doesn’t really represent how we think of a paper map of Minnesota. We don’t think in terms of points at all. What we see are marks on the map showing where some particular things are. If it is a road map it has marks showing a lot of roads, a lot of towns, and maybe county boundaries. If it is a topographical map it will show level curves showing elevation. So a paper map of a state should be represented by a structure preserving map, a morphism. Road maps preserve some structure, topographical maps preserve other structure.

The things we call “maps” in math are usually morphisms. For example, you could say that every simple closed curve in the plane is an equivalence class of maps from the unit circle to the plane. Here “equivalence class” means that we forget the parametrization.

The very reason I have to mention forgetting the parametrization is that the commonest mathematical way to talk about morphisms is as point-to-point maps with certain properties. But we think about a simple closed curve in the plane as just a distorted circle. The point-to-point correspondence doesn’t matter. So this example is really talking about a morphism as a shape-preserving map. Mathematicians introduced points into talking about preserving shapes in the nineteenth century and we are so used to doing that that we think we have to have points for all maps.

Not that points aren’t useful. But I am analyzing the metaphor here, not the technical side of the math.

Groups are functors

People who don’t do category theory think the idea of a mathematical structure as a functor is weird. From the point of view of the preceding discussion, a particular group is a functor from the generic group to some category. (The target category is Set if the group is discrete, Top if it is a topological group, and so on.)

The generic group is a group in a category called its theory or sketch that is just big enough to let it be a group. If the theory is the category with finite products that is just big enough then it is the Lawvere theory of the group. If it is a topos that is just big enough then it is the classifying topos of groups. The theory in this sense is equivalent to some theory in the sense of string-based logic, for example the signature-with-axioms (equational theory) or the first order theory of groups. Johnstone’s Elephant book is the best place to find the translation between these ideas.

A particular group is represented by a finite-product-preserving functor on the Lawvere theory, or by a logical functor on the classifying topos, and so on; these constructions bring with them the right concept of group homomorphism as well (the homomorphisms are exactly the natural transformations).

The way we talk about groups mimics the way we talk about maps. We look at the symmetric group on five letters and say its multiplication is noncommutative. “Its multiplication” tells us that when we talk about this group we are talking about the functor, not just the values of the functor on objects. We use the same symbols of juxtaposition for multiplication in any group, “{1}” or “{e}” for the identity, “{a^{-1}}” for the inverse of {a}, and so on. That is because we are really talking about the multiplication, identity and inverse function in the generic group — they really are the same for all groups. That is because a group is not its underlying set, it is a functor. Just like the map of Minnesota “is” the whole function from the state to the paper, not just the image of the function.


Syntax Trees in Mathematicians’ Brains

Understanding the quadratic formula

In my last post I wrote about how a student’s pattern recognition mechanism can go awry in applying the quadratic formula.

The template for the quadratic formula says that the solution of a quadratic equation of the form ${ax^2+bx+c=0}$ is given by the formula

$\displaystyle x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}$

When you ask students to solve ${a+bx+cx^2=0}$ some may write

$\displaystyle x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}$

instead of

$\displaystyle x=\frac{-b\pm\sqrt{b^2-4ac}}{2c}$

That’s because they have memorized the template in terms of the letters ${a}$, ${b}$ and ${c}$ instead of in terms of their structural meaning — $ {a}$ is the coefficient of the quadratic term, ${c}$ is the constant term, etc.

The problem occurs because there is a clash between the occurrences of the letters “a”, “b”, and “c” in the template and in the equation to solve. But maybe the confusion would occur anyway, just because of the ordering of the coefficients. As I asked in the previous post, what happens if students are asked to solve $ {3+5x+2x^2=0}$ after having learned the quadratic formula in terms of ${ax^2+bx+c=0}$? Some may make the same kind of mistake, getting ${x=-1}$ and ${x=-\frac{2}{3}}$ instead of $ {x=-1}$ and $ {x=-\frac{3}{2}}$. Has anyone ever investigated this sort of thing?

People do pattern recognition remarkably well, but how they do it is mysterious. Just as mistakes in speech may give the linguist a clue as to how the brain processes language, students’ mistakes may tell us something about how pattern recognition works in parsing symbolic statements as well as perhaps suggesting ways to teach them the correct understanding of the quadratic formula.

Syntactic Structure

“Structural meaning” refers to the syntactic structure of a mathematical expression such as ${3+5x+2x^2}$. It can be represented as a tree:

[Diagram (1): syntax tree for ${3+5x+2x^2}$; image missing]

This is more or less the way a program compiler or interpreter for some language would represent the polynomial. I believe it corresponds pretty well to the organization of the quadratic-polynomial parser in a mathematician’s brain. This is not surprising: The compiler writer would have to have in mind the correct understanding of how polynomials are evaluated in order to write a correct compiler.

Linguists represent English sentences with syntax trees, too. This is a deep and complicated subject, but the kind of tree they would use to represent a sentence such as “My cousin saw a large ship” would look like this:

Parsing by mathematicians

Presumably a mathematician has constructed a parser that builds a structure in their brain corresponding to a quadratic polynomial, using the same mechanisms with which, as a child, they learned to parse sentences in their native language. The mathematician learned this mostly unconsciously, just as a child learns a language. In any case it shouldn’t be surprising that the mathematician’s syntax tree for the polynomial is similar to the compiler’s.

Students who are not yet skilled in algebra have presumably constructed incorrect syntax trees, just as young children do for their native language.

Lots of theoretical work has been done on human parsing of natural language. Parsing mathematical symbolism to be compiled into a computer program is well understood. You can get a start on both of these by reading the Wikipedia articles on parsing and on syntax trees.

There are papers on students’ misunderstandings of mathematical notation. Two articles I recently turned up in a Google search are:

Both of these papers talk specifically about the syntax of mathematical expressions. I know I have read other such papers in the past, as well.

What I have not found is any study of how the trained mathematician parses mathematical expressions.

For one thing, for my parsing of the expression $ {3+5x+2x^2}$, the branching is wrong in (1). I think of ${3+5x+2x^2}$ as “Take 3 and add $ {5x}$ to it and then add ${2x^2}$ to that”, which would require the shape of the tree to be like this:

I am saying this from introspection, which is dangerous!

Of course, a compiler may group it that way, too, although my dim recollection of the little bit I understand about compilers is that they tend to group it as in (1) because they read the expression from left to right.

This difference in compiling is well-understood.  Another difference is that the expression could be compiled using addition as an operator on a list, in this case a list of length 3.  I don’t visualize quadratics that way but I certainly understand that it is equivalent to the tree in Diagram (1).  Maybe some mathematicians do think that way.
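One way to see how at least one real parser groups the expression is to ask Python's standard ast module. This is only a data point about Python's grammar, in which + is left-associative; it says nothing about other compilers or about how Diagram (1) was drawn. The helper show is my own, written for this sketch.

```python
import ast

# Parse the polynomial as a Python expression and take the root of its tree.
tree = ast.parse("3 + 5*x + 2*x**2", mode="eval").body


def show(node):
    """Render an expression node as a fully parenthesized string."""
    if isinstance(node, ast.BinOp):
        symbol = {ast.Add: "+", ast.Mult: "*", ast.Pow: "**"}[type(node.op)]
        return f"({show(node.left)} {symbol} {show(node.right)})"
    if isinstance(node, ast.Constant):
        return str(node.value)
    if isinstance(node, ast.Name):
        return node.id
    raise TypeError(f"unexpected node: {node!r}")


grouping = show(tree)
print(grouping)  # ((3 + (5 * x)) + (2 * (x ** 2)))
```

So this parser produces the “take 3, add $5x$ to it, then add $2x^2$ to that” shape: the outer addition has the sum $3+5x$ as its left branch.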

But these observations indicate what might be learned about mathematicians’ understanding of mathematical expressions if linguists and mathematicians got together to study human parsing of expressions by trained mathematicians.

Some educational constructivists argue against the idea that there is only one correct way to understand a mathematical expression.  To have many metaphors for thinking about math is great, but I believe we want uniformity of understanding of the symbolism, at least in the narrow sense of parsing, so that we can communicate dependably.  It would be really neat if we discovered deep differences in parsing among mathematicians.  It would also be neat if we discovered that mathematicians parsed in generally the same way!



Templates in mathematical practice

This post is a first pass at what will eventually be a section of abstractmath.org. It’s time to get back to abstractmath; I have been neglecting it for a couple of years.

What I say here is based mainly on my many years of teaching discrete mathematics at Case Western Reserve University in Cleveland and more recently at Metro State University in Saint Paul.

Beginning abstract math

College students typically get into abstract math at the beginning in such courses as linear algebra, discrete math and abstract algebra. Certain problems that come up in those early courses can be grouped together under the notion of (what I call) applying templates [note 1]. These are not the problems people usually think about concerning beginners in abstract math, of which the following is an incomplete list:

The students’ problems discussed here concern understanding what a template is and how to apply it.

Templates can be formulas, rules of inference, or mini-programs. I’ll talk about three examples here.

The template for quadratic equations

The solution of a real quadratic equation of the form {ax^2+bx+c=0} is given by the formula

\displaystyle  x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}

This is a template for finding the roots of the equations. It has subtleties.

For example, the numerator is symmetric in {a} and {c} but the denominator isn’t. So sometimes I try to trick my students (warning them ahead of time that that’s what I’m trying to do) by asking for a formula for the solution of the equation {a+bx+cx^2=0}. The answer is

\displaystyle x=\frac{-b\pm\sqrt{b^2-4ac}}{2c}

I start writing it on the board, asking them to tell me what comes next. When we get to the denominator, often someone says “{2a}”.

The template is telling you that the denominator is 2 times the coefficient of the square term. It is not telling you it is “{a}”. Using a template (in the sense I mean here) requires pattern matching, but in this particular example, the quadratic template has a shallow incorrect matching and a deeper correct matching. In detail, the shallow matching says “match the letters” and the deep matching says “match the position of the letters”.

Most of the time the quadratic being matched has particular numbers instead of the same letters that the template has, so the trap I just described seldom occurs. But this makes me want to try a variation of the trick: Find the solution of {3+5x+2x^2=0}. Would some students match the textual position (getting {a=3}) instead of the functional position (getting {a=2})? [note 2] If they did they would get the solutions {(-1,-\frac{2}{3})} instead of {(-1,-\frac{3}{2})}.
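The two matchings can be compared numerically. In this sketch the function and parameter names are my own; naming the parameters by role (coefficient of the square term, coefficient of the linear term, constant term) rather than by letter is one way to make the deep matching explicit.

```python
import math


def quadratic_roots(square_coeff, linear_coeff, constant):
    """Real roots of square_coeff*x**2 + linear_coeff*x + constant = 0.

    Assumes the discriminant is nonnegative and square_coeff is nonzero.
    """
    root = math.sqrt(linear_coeff**2 - 4 * square_coeff * constant)
    return ((-linear_coeff - root) / (2 * square_coeff),
            (-linear_coeff + root) / (2 * square_coeff))


# The equation 3 + 5x + 2x^2 = 0.
# Deep matching: match each coefficient by the role it plays.
deep = quadratic_roots(square_coeff=2, linear_coeff=5, constant=3)

# Shallow matching: feed the numbers in the order they appear on the page.
shallow = quadratic_roots(3, 5, 2)

print(deep)     # (-1.5, -1.0)
print(shallow)  # -1 and -2/3, the roots of a different equation
```

The shallow matching silently solves $3x^2+5x+2=0$ instead of the equation that was asked about.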

Substituting in algebraic expressions has other traps, too. What sorts of mistakes would students have solving {3x^2+b^2x-5=0}?

Most students on the verge of abstract math don’t make the mistakes with the quadratic formula that I have described. The thing about abstract math is that it uses more sophisticated templates:

  • subject to conditions
  • with variations
  • with extra levels of abstraction

The template for proof by induction

This template gives a method of proof of a statement of the form {\forall{n}\mathcal{P}(n)}, where {\mathcal{P}} is a predicate (presumably containing {n} as a variable) and {n} varies over positive integers. The template says:

Goal: Prove {\forall{n}\mathcal{P}(n)}.

Method:

  • Prove {\mathcal{P}(1)}
  • For an arbitrary integer {n\geq 1}, assume {\mathcal{P}(n)} and deduce {\mathcal{P}(n+1)}.

For example, to prove {\forall n (2^n+1\geq n^2)} using the template, you have to prove that {2^1+1\geq  1^2}, and that for any {n\geq 1}, if {2^n+1\geq n^2}, then {2^{n+1}+1\geq  (n+1)^2}. You come up with the need to prove these statements by substituting into the template. This template has several problems that the quadratic formula does not have.

Variables of different types

The variable {n} is of type integer and the variable {\mathcal{P}} is of type predicate [note 3]. Having to deal with several types of variables comes up already in multivariable calculus (vectors vs. numbers, cross product vs. numerical product, etc) and they multiply like rabbits in beginning abstract math classes. Students sometimes write things like “Let {\mathcal{P}=n+1}”. Multiple types is a big problem that math ed people don’t seem to discuss much (correct me if I am wrong).
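A typed programming language makes the two kinds of variable visible. In the sketch below (all names are mine), the predicate is a function from integers to truth values, and the helper spot-checks the two obligations of the induction template for finitely many values of n starting at 1. It is a sanity check, not a proof.

```python
from typing import Callable

# A predicate on integers: takes an int, returns True or False.
Predicate = Callable[[int], bool]


def induction_obligations_hold(p: Predicate, upto: int) -> bool:
    """Spot-check the base case and the induction step for n = 1..upto."""
    base = p(1)
    # "P(n) implies P(n+1)" is written as "(not P(n)) or P(n+1)".
    step = all((not p(n)) or p(n + 1) for n in range(1, upto + 1))
    return base and step


def p(n: int) -> bool:
    """The predicate from the example: 2^n + 1 >= n^2."""
    return 2**n + 1 >= n**2


def q(n: int) -> bool:
    """A predicate that is true at 1 but whose induction step fails at n = 9."""
    return n < 10


print(induction_obligations_hold(p, 50))  # True
print(induction_obligations_hold(q, 50))  # False
```

In these terms, “Let $\mathcal{P}=n+1$” tries to give the name of a predicate a value of type integer, which a type checker would be expected to flag.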

Free and bound

The variable {n} occurs as a bound variable in the Goal and a free variable in the Method. This happens in this case because the induction step in the Method originates as the requirement to prove {\forall  n(\mathcal{P}(n)\rightarrow\mathcal{P}(n+1))}, but as I have presented it (which seems to be customary) I have translated this into a requirement based on modus ponens. This causes students problems, if they notice it. (“You are assuming what you want to prove!”) Many of them apparently go ahead and produce competent proofs without noticing the dual role of {n}. I say more power to them. I think.

The template has variations

  • You can start the induction at other places.
  • You may have to have two starting points and a double induction hypothesis (for {n-1} and {n}). In fact, you will have to have two starting points, because it seems to be a Fundamental Law of Discrete Math Teaching that you have to talk about the Fibonacci function ad nauseam.
  • Then there is strong induction.

It’s like you can go to the store and buy one template for quadratic equations, but you have to buy a package of templates for induction, like highway engineers used to buy packages of plastic French curves to draw highway curves without discontinuous curvature.

The template for row reduction

I am running out of time and won’t go into as much detail on this one. Row reduction is an algorithm. If you write it up as a proper computer program there have to be all sorts of if-thens depending on what you are doing it for. For example, if you want solutions to the simultaneous equations

2x+4y+z = 1
x+2y = 0
x+2y+4z = 5

you must row reduce the matrix

2 4 1 1
1 2 0 0
1 2 4 5

(I haven’t yet figured out how to wrap this in parentheses) which gives you

1 2 0 0
0 0 1 0
0 0 0 1

This introduces another problem with templates: They come with conditions. In this case the condition is “a row of three 0s followed by a nonzero number means the equations have no solutions”. (There is another condition when there is a row of all 0’s.)
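Here is a minimal sketch of the calculation followed by the “lean back” check. It is one straightforward way to write Gauss-Jordan elimination with exact fractions, not a tuned implementation, and the function names are my own.

```python
from fractions import Fraction


def rref(rows):
    """Return the reduced row echelon form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivot_row = 0
    for col in range(len(m[0])):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pivot = next((r for r in range(pivot_row, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[pivot_row], m[pivot] = m[pivot], m[pivot_row]
        lead = m[pivot_row][col]
        m[pivot_row] = [x / lead for x in m[pivot_row]]
        # Clear the rest of the column, above and below the pivot.
        for r in range(len(m)):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == len(m):
            break
    return m


def is_inconsistent(reduced):
    """True if some row is all zeros except for a nonzero last entry."""
    return any(all(x == 0 for x in row[:-1]) and row[-1] != 0 for row in reduced)


augmented = [[2, 4, 1, 1],
             [1, 2, 0, 0],
             [1, 2, 4, 5]]
reduced = rref(augmented)
print([[int(x) for x in row] for row in reduced])
# [[1, 2, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(is_inconsistent(reduced))  # True: the equations have no solution
```

The function rref is the calculation; is_inconsistent is the condition that has to be noticed afterwards.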

It is very easy for the new student to get the calculation right but to never sit back and see what they have — which conditions apply or whatever.

When you do math you have to repeatedly lean in and focus on the details and then lean back and see the Big Picture. This is something that has to be learned.

What to do, what to do

I have recently experimented with being explicit about templates, in particular going through examples of the use of a template after explicitly stating the template. It is too early to say how successful this is. But I want to point out that even though it might not help to be explicit with students about templates, the analysis in this post of a phenomenon that occurs in beginning abstract math courses

  • may still be accurate (or not), and
  • may help teachers teach such things if they are aware of the phenomenon, even if the students are not.

Notes

  1. Many years ago, I heard someone use the word “template” in the way I am using it now, but I don’t recollect who it was. Applied mathematicians sometimes use it with a meaning similar to mine to refer to soft algorithms–recipes for computation that are not formal algorithms but close enough to be easily translated into a sufficiently high level computer language.
  2. In the formula {ax^2+bx+c}, the “{a}” has the first textual position but the functional position as the coefficient of the quadratic term. This name “functional position” has nothing to do with functions. Can someone suggest a different name that won’t confuse people?
  3. I am using “variable” the way logicians do. Mathematicians would not normally refer to “{\mathcal{P}}” as a variable.
  4. I didn’t say anything about how templates can involve extra layers of abstraction. That will have to wait.

Learning by osmosis

In the Handbook, I said:

The osmosis theory of teaching is this attitude: We should not have to teach students to understand the way mathematics is written, or the finer points of logic (for example how quantifiers are negated). They should be able to figure these things on their own —“learn it by osmosis”. If they cannot do that they are not qualified to major in mathematics.

We learned our native language(s) as children by osmosis.  That does not imply that college students can or should learn mathematical reasoning that way. It does not even mean that college students should learn a foreign language that way.

I have been meaning to write a section of Understanding Mathematics that describes the osmosis theory and gives lots of examples.  There are already three links from other places in abstractmath.org that point to it.  Too bad it doesn’t exist…

Lately I have been teaching the Gauss-Jordan method using elementary row operations and found a good example.   The textbook uses the notation [m] +a[n] to mean “add a times row n to row m”.  In particular, [m] +[n] means “add row n to row m”, not “add row m to row n”. So in this notation ” [m] +[n] ” is not an expression, but a command, and in that command the plus sign is not commutative.   Similarly, “3[2]” (for example) does not mean “3 times row 2”, it means “change row 2 to 3 times row 2”.

The explanation is given in parentheses in the middle of an example:

…we add three times the first equation to the second equation.  (Abbreviation: [2] + 3[1].  The [2] means we are changing equation [2].  The expression [2] + 3[1] means that we are replacing equation 2 by the original equation plus three times equation 1.)

This explanation, in my opinion, would be incomprehensible to many students, who would understand the meaning only once it was demonstrated at the board using a couple of examples.  The phrase “The [2] means we are changing equation [2]” should have said something like “the left number, [2] in this case, denotes the equation we are changing.”  The last sentence refers to “the original equation”, meaning equation [2].  How many readers would guess that is what they mean?

In any case, better notation would be something like “[2]  3[1]”. I have found several websites that use this notation, sometimes written in the opposite direction. It is familiar to computer science students, which most of the students in my classes are.
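In programming terms the textbook's “[m] + a[n]” is an assignment to row m, which is why the order matters. The following sketch (function names are mine; rows are numbered from 1 to match the textbook) carries out “[1] + [2]” and “[2] + [1]” on the same matrix.

```python
def add_multiple(rows, m, a, n):
    """Carry out the command "[m] + a[n]": replace row m by row m plus a times row n."""
    rows[m - 1] = [x + a * y for x, y in zip(rows[m - 1], rows[n - 1])]


def scale(rows, a, m):
    """Carry out the command "a[m]": replace row m by a times row m."""
    rows[m - 1] = [a * x for x in rows[m - 1]]


first = [[1, 2], [3, 4]]
add_multiple(first, 1, 1, 2)    # "[1] + [2]": row 1 changes
print(first)                    # [[4, 6], [3, 4]]

second = [[1, 2], [3, 4]]
add_multiple(second, 2, 1, 1)   # "[2] + [1]": row 2 changes
print(second)                   # [[1, 2], [4, 6]]

third = [[1, 2], [3, 4]]
scale(third, 3, 2)              # "3[2]": row 2 becomes 3 times row 2
print(third)                    # [[1, 2], [9, 12]]
```

The same sum of rows is computed in the first two cases; what differs is which row receives it.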

Putting the definition of the notation in a parenthetical remark is also undesirable.  It should be in a separate paragraph marked “Notation”.

There is another point here:  No verbal definition of this notation, however well written, can be understood as well as seeing it carried out in an example.  This is also true of matrix multiplication, whose definition in terms of symbols such as a_ib_j is difficult to understand (if a student can figure out how you do it from this definition they should be encouraged to be a math major), whereas the process becomes immediately clear when you see someone pointing with one hand at successive entries in a row of one matrix while pointing with the other hand at successive entries in the other matrix’s columns.  This is an example of the superiority (in many cases) of pattern recognition over definitions in terms of strings of symbols to be interpreted.  I did write about pattern recognition, here.
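For readers who program, the two-handed gesture translates almost literally into two nested loops. This is a plain sketch for small matrices given as lists of rows, with names of my own choosing.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    # Every row of a must be as long as b has rows.
    assert all(len(row) == len(b) for row in a)
    result = []
    for row in a:                # one hand moves along a row of a
        out_row = []
        for col in zip(*b):      # the other hand moves down a column of b
            out_row.append(sum(x * y for x, y in zip(row, col)))
        result.append(out_row)
    return result


print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Each entry of the product is the sum of the products of the pairs of numbers the two hands point at together.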


Syntactic and semantic thinkers

A paper by Keith Weber

Reidar Mosvold’s math-ed blog recently provided a link to an article by Keith Weber (Reference [2]) about a very good university math student he referred to as a “syntactic reasoner”.  He interviewed the student in depth as the student worked on some proofs suitable to his level.  The student would “write the proofs out in quantifiers” and reason based on previous steps of the proof in a syntactic way rather than depending on an intuitive understanding of the problem, as many of us do (the author calls us semantic reasoners).  The student didn’t think about specific examples; he always tried to make them as abstract as possible while letting them remain examples (or counterexamples).

I recommend this paper if you are at all interested in math education at the university math major level — it is fascinating.  It made all sorts of connections for me with other ideas about how we think about math that I have thought about for years and which appear in the Understanding Math part of abstractmath.org.  It also raises lots of new (to me) questions.

Weber’s paper talks mostly about how the student comes up with a proof.  I suspect that the distinction between syntactic reasoners and semantic reasoners can be seen in other aspects of mathematical behavior, too, in trying to understand and explain math concepts.  Some thoughts:

Other behaviors of syntactic reasoners (maybe)

1) Many mathematicians (and good math students) explain math using conceptual and geometric images and metaphors, as described in Images and metaphors in abstractmath.org.   Some people I think of as syntactic reasoners seem to avoid such things. Some of them even deny thinking in images and metaphors, as I discussed in the post Thinking without words.   It used to be that even semantic reasoners were embarrassed to use images and metaphors when lecturing (see the post How “math is logic” ruined math for a generation).

2) In my experience, syntactic reasoners like to use first order symbolic notation, for example [formula image: eq0001MP]

and will often translate a complicated sentence in ordinary mathematical English into this notation so they can understand it better.  (Weber describes the student he interviewed as doing this.)  Furthermore they seem to think that putting a formula such as the one above on the board says it all, so they don’t need to draw pictures, wave their hands [Note 1], and so on.  When you come up with a picture of a concept or theorem that you claim explains it, their first impulse is to put it into words that can generally be translated very easily into first order symbolism, and to say that is what is going on.  It is a matter of what is primary.

The semantic reasoners among students and (I think) many mathematicians find the symbolic notation difficult to parse and would rather have it written out in English.  I am pretty good at reading such symbolic notation [Note 2] but I still prefer ordinary English.

3) I suspect the syntactic reasoners also prefer to read proofs step by step, as I described in my post Grasshoppers and linear proofs, rather than skipping around like a grasshopper.

And maybe not

Now it may very well be that syntactic thinkers do not all do all those things I mentioned in (1)-(3).  Perhaps the group is not cohesive in all those ways.  Probably really good mathematicians use both techniques, although Weyl didn’t think so (quoted in Weber’s paper).   I think of myself as an image and metaphor person but I do use syntax, and sometimes even find that a certain syntactic explanation feels like a genuinely useful insight, as in the example I discussed under conceptual in the Handbook.

Distinctions among semantic thinkers

Semantic thinkers differ among themselves.  One demarcation line is between those who use a lot of visual thinking and those who use conceptual thinking which is not necessarily visual.  I have known grad students who couldn’t understand how I could do group theory (that was in a Former Life, before category theory) because how could you “see” what was happening?  But the way I think about groups is certainly conceptual, not syntactic.  When I think of a group acting on a space I think of it as stirring the space around.  But the stirring is something I feel more than I see.  On the other hand, when I am thinking about the relationships between certain abstract objects, I “see” the different objects in different parts of an interior visual space.  For example, the group is on the right, stirring the space-acted-upon on the left, or the group is in one place, a subgroup is in another place while simultaneously being inside the group, and the cosets are grouped (sorry) together in a third place, being (guess what) stirred around by the group acting by conjugation (Note [3]).

This distinction between conceptual and visual, perhaps I should say visual-conceptual and non-visual-conceptual, both opposed to linguistic or syntactic reasoning, may or may not be as fundamental as syntactic vs semantic.   But it feels fundamental to me.

Weber’s paper mentions an intriguing sounding book (Reference [1]) by Burton which describes a three-way distinction called conceptual, visual and symbolic, that sounds like it might be the distinction I am discussing here.  I have asked for it on ILL.

Notes

  1. Handwaving is now called kinesthetic communication.  Just to keep you au courant.
  2. I took Joe Shoenfield’s course in logic when his book  Mathematical Logic [3] was still purple.
  3. Clockwise for left action, counterclockwise for right action.  Not.

References

  1. Leone L. Burton, Mathematicians as Enquirers: Learning about Learning Mathematics.  Springer, 2004.
  2. Keith Weber, How syntactic reasoners can develop understanding, evaluate conjectures, and generate counterexamples in advanced mathematics. Proof copy available from Science Direct.
  3. Joseph Shoenfield, Mathematical logic, Addison-Wesley 1967, reprinted 2001 by the Association for Symbolic Logic.

Grasshoppers and linear proofs

Below, I give a detailed example of how the context of a proof changes as you read the proof line by line. This example comes from the abstractmath article on context.  I mean something like verbal context or context in the computer science sense (see also Reference [1]): the values of all the relevant variables as specified up to the current statement in the proof.  For example, if the proof says “Suppose x = 3”, then when you read succeeding statements you know that x has the value 3, as long as it is not changed in some later statement.
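For readers who think in code, here is a rough Python sketch of context in this sense. It models the context as a chain of scopes using the standard library's `collections.ChainMap`. The variable `q` and its value 7 are invented for the illustration and do not come from the proof analyzed below.

```python
from collections import ChainMap

# The reader's context before any statement has been read.
context = ChainMap({})

# "Suppose x = 3": x is bound for the statements that follow.
context["x"] = 3
assert context["x"] == 3

# A sentence with its own scope (hypothetical): q is known only inside it.
inner = context.new_child({"q": 7})
assert inner["x"] == 3 and inner["q"] == 7

# When that scope ends we are back in the outer context, where q is unknown.
q_known_outside = "q" in context  # False
```

This is only an analogy: a real reader also tracks the status of statements (assumed, claimed, proved), not just the values of symbols.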

Here is the text I will analyze:

Definition: Divides

Let m and n be integers with m\ne 0. The statement “m divides n” means that there is an integer q for which n=qm.

Theorem

Let m, n and p be integers, with m and n nonzero, and suppose m divides n and n divides p.  Then m divides p.

Proof

By definition of divides, there are integers q and q' for which n=qm and p=q'n. We must prove that there is an integer q'' for which p=q''m. But p=q'n=q'qm, so let q''=q'q.  Then p=q''m.

0) Definition: Divides
Changes the status of the word “divides” so that it becomes the definiendum. The scope is the following paragraph.

1) Let m and n be integers
m and n are new symbols in this discourse, constrained to be integers.

2) with m\neq 0
Another constraint on m.

3) The statement “m divides n” means that
This sentence fragment gives the rest of the sentence (item 4 below) a special status.

4) there is an integer q for which n = qm.
This clause introduces q, another new symbol constrained to be an integer.  The clause imposes a constraint on m, n and q, that they satisfy the equation n = qm. But we know this only in the scope of the word Definition, which ends at the end of the sentence.  Once we read the word Theorem we no longer know that q exists, much less that it satisfies the constraint.  Indeed, the statement of the definition means that one way to prove the theorem is to find an integer q for which n = qm. This is not stated explicitly, and indeed the reader would be wrong to draw the conclusion that in what follows the theorem will be proved in this way. (In fact it will in this example, but the author could have done some other kind of proof.)

5) Theorem
The placement of the word “Theorem” here announces that the next paragraph is a mathematical statement and that the statement has been proved.  In real time the statement was proved long before this discourse was written, but in terms of reading the text in order, it has not yet been proved.

6) Let m, n and p be integers,
We are starting a new context, in which we know that m, n and p are all integers.  This changes the status of m and n, which were variables used in the preceding paragraph, but now all previous constraints are discarded. We are starting over with m, n, and p.  We are also starting what the reader must recognize as the hypotheses of a conditional sentence, since that affects the context in a very precise way.

7) with m and n nonzero,
Now m and n are nonzero.  Note that in the previous paragraph n was not constrained to be nonzero.  Between the words “Let” and “with” in the current sentence, neither was constrained to be nonzero.

8 ) and suppose m divides n and n divides p.
More new constraints:  m divides n and n divides p.

9) Then m divides p.
The word “then” signals that we are starting the conclusion of the conditional sentence.  It makes a claim that m divides p whenever the conditions in the hypothesis are true.  Because it is the conclusion, it has a different status from the assumptions that m divides n and n divides p.   We can’t treat m as if it divides p even though this sentence says it does.  All we know is that the author is claiming that m divides p if the hypotheses are true, and we expect (because the next word is “Proof”) that this claim will shortly be proved.

10) Proof
This starts a new paragraph.  It does not necessarily wipe out the context.  If the proof is going to be by the direct method (assume hypothesis, prove conclusion), as this one is, then it will still be true that m and n are nonzero integers, m divides n and n divides p.

11) By definition of divides, there are integers q and q' for which n = qm and p = q'n.
Since this proof starts by stating the hypothesis of the definition of “divides”, we now know that we are using the direct method, and that q and q' are new symbols that we are to assume satisfy the equations n = qm and p = q'n.   The phrase “by definition of divides” tells us (because the definition was given previously) that there are such integers, so in effect this sentence chooses q and q' so that n = qm and p = q'n.  The reader probably knows that there is only one choice for each of q and q' but in fact that claim is not being made here.  Note that m, n and p are not new symbols – they still fall within the scope of the previous paragraph, so we still know that m divides n and n divides p. If the proof were by contradiction, we would not know that.

12) We must prove that there is an integer q'' for which p = q''m.
q'' is introduced by this sentence and is constrained by the equation. The scope of this sentence is just this sentence. The existence of q'' and the constraint on it do not exist in the context after the sentence is finished.  However, the constraints previously imposed on m, n, p, q and q' do continue.

13) But p = q'n = q'qm,
This is a claim about p, q, q', m and n.  The equations are justified by certain preceding sentences but this justification is not made explicit.

14) so let q'' = q'q.
We are establishing a new variable q'' in the context.   Now we put another constraint on it, namely q'' = q'q.  It is significant that a variable named q'' was introduced once before, in the reference to the definition of divides.  A convention of mathematical discourse tells you to expect the author to establish that it fits the requirement of the definition. This condition is triggered by using the same symbol q'' both here and in the definition.

15) Then p = q''m.
This is an assertion about p, q'' and m, justified (but not explicitly) by the claim that p = q'n = q'qm.

16) The proof is now complete, although no statement asserts that.
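The witnesses that the proof passes around can be checked on a concrete instance. This is only a sketch: the helper name `divides_witness` and the sample values 3, 12 and 60 are my own choices, not part of the abstractmath article.

```python
def divides_witness(m, n):
    """Return the integer q with n == q * m if m divides n, else None.

    Assumes m is a nonzero integer, as in the definition of "divides".
    """
    q, remainder = divmod(n, m)
    return q if remainder == 0 else None


m, n, p = 3, 12, 60

q = divides_witness(m, n)        # n = q m
q_prime = divides_witness(n, p)  # p = q' n

# The proof's choice of witness: q'' = q' q.
q_double_prime = q_prime * q

# The conclusion of the theorem for this instance: p = q'' m.
conclusion_holds = (p == q_double_prime * m)
```

A check like this confirms one instance only; the proof is what covers all integers m, n and p satisfying the hypotheses.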

I have several comments to make about this kind of analysis that are (mostly) not included in the abstractmath article.

a) This is supposed to be what goes through an experienced mathematician’s head while they are reading the proof.  Mostly subconsciously.  Linguists (as in Reference [1]) seem to think something like this takes place in your mind when you read any text, but it gets much denser in mathematical text.  Computer scientists analyze the operation of subprograms in this way, too.

b) Comment (a) is probably off the mark.  With a short proof like that, I get a global picture of the proof as my eyes dart back and forth over the various statements in the proof.  Now, I am a grasshopper: I read math stuff by jumping back and forth trying to understand the structure of the argument.  I do this both locally in a short proof and also globally when reading a long article or book:  I page through to find the topic I want and then jump back and forth finding the meanings of words and phrases I don’t understand.

c) I think most mathematicians are either grasshoppers or they are not good readers and they simply do not learn math by reading text.  I would like feedback on this.

d) If (a) is incorrect, should I omit this example from abstractmath?  I don’t think so.  My experience in teaching tells me that

  1. some students think this is perfectly obvious and why would I spend time constructing the example?,
  2. others are not aware that this is going on in their head and they are amazed to realize that it is really happening,
  3. and still others do not understand how to read proofs and when you tell them this sort of thing goes on in your head they are terminally intimidated.  (“Terminally” in the sense that they dye their hair black and become sociology majors.  They really do.)  Is that bad?  Well, I don’t think so.  I would like to hear arguments on the other side.

e) Can you figure out why item 8 of the analysis is labeled as “8 )” instead of “8)”?

Time is running out. I have other comments to make which must wait for a later post.

References

G. Chierchia and S. McConnell-Ginet (1990), Meaning and Grammar. The MIT Press.


Distributive plurals

A statement in English such as “all squared nonzero real numbers are positive” is called a distributive plural.  This means that the statement “the square of x is positive” is true for every nonzero real number.  It can be translated directly into symbolic notation:  \forall x\,(\text{if }x\ne 0\text{ then }x^{2}>0).

Not all statements involving plurals in English are distributive plurals.  The statement “The agents are surrounding the building” does not imply that Agent James is surrounding the building.  This type of statement is called a collective plural. Such a statement cannot be translated directly into a statement involving a universal quantifier.  More about this here.  This discussion on Wordwizard suggests that there may be a difference between British and American usage.

The word “distributive” as used here is analogous to the distributive law of arithmetic.  If the set of things referred to is finite, for example the set {-2, -1, 1, 3}, then one can say that “\forall x\,(x^{2}>0)” is equivalent to “(-2)^{2}>0\text{ and }(-1)^{2}>0\text{ and }1^{2}>0\text{ and }3^{2}>0”.
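For the finite set above, the equivalence can be checked directly. In this small Python sketch the variable names are mine; `all` plays the role of the universal quantifier restricted to the set.

```python
numbers = [-2, -1, 1, 3]

# The quantified form: for all x in the set, x^2 > 0.
universal = all(x**2 > 0 for x in numbers)

# The same claim distributed over the elements, joined by "and".
conjunction = (-2)**2 > 0 and (-1)**2 > 0 and 1**2 > 0 and 3**2 > 0
```

Both expressions evaluate to the same truth value, which is what “distributive” is meant to suggest.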

I once found a report on the internet that a Quaker Oats box contained this exhortation: “Eating a good-sized bowl of Quaker Oatmeal for 30 days will actually help remove cholesterol from your body.”  This undoubtedly exhibits a confusion between distributive plurals and the other kind of plural, but I don’t understand the connection well enough to explain it.

I can no longer find the report on the internet.  This may mean the Quaker Oats box with that label never existed.


Handbook now online

I have placed an interactive version of the Handbook of Mathematical Discourse on line here. Its formatting is still a little rough, and it omits the quotations and illustrations from the printed book. It also needs the backlinks from the citations and bibliography reactivated. I will do that when I Get Around To It.

Now I can refer to the Handbook via a direct link from a blog post or from abstractmath, and you can click on a lexicographical citation and go directly to the text of the citation.

Comments and error reports are welcome.


Variables

One of the themes of abstractmath.org is that we should pay attention to how we think about mathematical objects.  This is not the same question as “What are mathematical objects?”.    This post addresses the question: How do we think about variables? What follows are extracts from newly rewritten sections from Variables and Substitution and Mathematical Objects.

Role playing

If the author says “x is a real variable” then x plays the role of a real number in whatever expression it occurs in.  It is like an actor in a play.  If the producer says Dwayne will play Polonius you know that Dwayne will hide behind a curtain at a certain point in the play.  When x occurs in the expression x^3-1  you know that if a number is substituted for x in the expression, the  expression will then denote the result of cubing the number and subtracting 1 from it.

Slot or cell

The variable x is a slot into which you can put any real number.  If you plug 3 into x in the expression x^3-1  you will get 26. 

This is like a blank cell in a spreadsheet. If you define another cell with the formula “=x^3-1” and put 3 in the cell representing x, the other cell will contain 26.

What’s wrong with this metaphor:  In Excel, a blank cell is automatically set to 0. To be a better metaphor the cell shouldn’t have a value until it is given one, and the cell with the formula “= x^3-1” should say “undefined!”.   (I am not saying this would make Excel a better spreadsheet. Excel was not invented so that I could make a point about variables.)
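Here is a rough Python sketch of the spreadsheet I would have preferred for the metaphor. The class `Cell` and the function `formula` are invented for this illustration and have nothing to do with how Excel actually works.

```python
class Cell:
    """A slot that has no value until one is put into it."""

    def __init__(self):
        self.value = None  # blank, not 0


def formula(x_cell):
    """Evaluate x^3 - 1, or report that x has not been given a value."""
    if x_cell.value is None:
        return "undefined!"
    return x_cell.value ** 3 - 1


x = Cell()
before = formula(x)  # "undefined!"
x.value = 3
after = formula(x)   # 26
```

The point of the sketch is only that the formula cell has nothing to say until the slot is filled.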

Variable mathematical object

The two metaphors above refer to the name x.  You can instead think of x as a variable mathematical object, meaning x is a genuine mathematical object, but with limitations about what you can say or think about it.  This sort of thinking works for both the symbolic language and mathematical English, and it works for any kind of mathematical structure (“Let G be an Abelian group…”), not just numbers in a symbolic expression.  There are two related points of view:

1. Some statements about the object are neither true nor false.

This means x is a genuine mathematical object and you can make assertions about it, but some of the assertions might have no truth value.  From “Let x be a real number” you know these things:

  • The assertion “Either  x > 0  or x \leq 0 ” is true.
  • The assertion “ x^2 = -1” is false.
  • The assertion “x > 0” is neither true nor false.

The assertion “x is a real number” is in a certain sense the most general true statement you can make about x.   In other words, x is a mathematical object given by an incomplete specification, so you are limited in what you can say about it or in what conclusions you can draw about it.

If you say, “Let n be an integer divisible by 4”, you cannot assume it is 8 or 12, for example.  In other words, the statement “n is divisible by 4” is true, and “n = 3” is false, but the statement “n = 8” is neither true nor false, and you can’t derive any conclusions from n being 8.
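One way to make point of view (1) concrete is to test an assertion against everything that fits the specification. The following Python sketch does this on a finite sample of multiples of 4, which is an assumption made so the code can run: “true” here only means true on the sample. The function name `status` is hypothetical.

```python
def status(assertion, candidates):
    """Classify an assertion about an incompletely specified object."""
    results = {assertion(c) for c in candidates}
    if results == {True}:
        return "true"
    if results == {False}:
        return "false"
    return "neither"


# A finite stand-in for "the integers divisible by 4".
multiples_of_4 = [n for n in range(-40, 41) if n % 4 == 0]

divisible = status(lambda n: n % 4 == 0, multiples_of_4)  # "true"
is_three = status(lambda n: n == 3, multiples_of_4)       # "false"
is_eight = status(lambda n: n == 8, multiples_of_4)       # "neither"
```

An assertion is “neither” exactly when the specification leaves room for both outcomes.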

2. The object is fixed but some things are not known about it.

If you say x is a real number, you know x is a real number (duh) and:

  • You know x is either positive or nonpositive.
  • You know x^2 is not equal to any negative number.
  • You don’t know whether x is positive or not.

    This way of looking at it involves thinking of x as a particular real number.  During the process of solving the equation x^2-5x=-6 you are thinking of x as a specific real number, but you don’t know which one.

    These points of view (1) and (2) provide genuinely different metaphors for variables.  In (1) I say certain statements are neither true nor false, but (2) suggests that all statements about the object are either true or false but you don’t know which.  However, note that when you are finished solving the equation x^2-5x=-6, you still don’t know whether x = 2 or x = 3.  This fact causes me cognitive dissonance, but the point of view that some statements are neither true nor false upsets other people.  I prefer (1) over (2) but I have to admit that (1) is much less familiar to most mathematicians.
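The two candidates can be found by a brute-force search. This Python sketch looks only at integers from -10 to 10, a range chosen arbitrarily for the illustration.

```python
# Keep every integer in the (assumed) search range that satisfies x^2 - 5x = -6.
solutions = [x for x in range(-10, 11) if x**2 - 5*x == -6]
# solutions is [2, 3]
```

The search ends with two values, and nothing in the equation itself tells you which of them x “is”.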

    View (1) is advocated by category theorists because it allows you to think of a quantity holistically as a single thing rather than as a table of values.  The height of a cannonball is different at different times but the “height” is nevertheless one continuous mathematical quantity.   People who know more about history than I do believe that that is the simple and uncomplicated way nineteenth-century mathematicians thought about variable quantities. 

    We need good tools to do math.  This means good images and metaphors as well as good tools for reasoning.  Having simple and uncomplicated ways to think about math objects (along with guidelines for the way you think about them, such as dropping the law of the excluded middle in some cases!) is every bit as important as making sure our reasoning follows carefully thought out rules that lead from truth only to truth.  

    Note:  Heyting valued logic actually provides sound but non-classical reasoning for thinking about variable objects, but most mathematicians with sound intuitions nevertheless use classical reasoning and come up with correct conclusions.  Some of us are now in the practice of using non-classical logic to study differentials and other things, and that is a Good Thing, but it would be a complete misunderstanding if you read this post as advocating that mathematicians change over to that way of doing things.  This post is about how we think about variability. 
