Category Archives: language of math

Learning by osmosis

In the Handbook, I said:

The osmosis theory of teaching is this attitude: We should not have to teach students to understand the way mathematics is written, or the finer points of logic (for example how quantifiers are negated). They should be able to figure these things on their own —“learn it by osmosis”. If they cannot do that they are not qualified to major in mathematics.

We learned our native language(s) as children by osmosis.  That does not imply that college students can or should learn mathematical reasoning that way. It does not even mean that college students should learn a foreign language that way.

I have been meaning to write a section of Understanding Mathematics that describes the osmosis theory and gives lots of examples.  There are already three links from other places in abstractmath.org that point to it.  Too bad it doesn’t exist…

Lately I have been teaching the Gauss-Jordan method using elementary row operations and found a good example.  The textbook uses the notation [m] + a[n] to mean “add a times row n to row m”.  In particular, [m] + [n] means “add row n to row m”, not “add row m to row n”.  So in this notation “[m] + [n]” is not an expression but a command, and in that command the plus sign is not commutative.  Similarly, “3[2]” (for example) does not mean “3 times row 2”; it means “change row 2 to 3 times row 2”.

The explanation is given in parentheses in the middle of an example:

…we add three times the first equation to the second equation.  (Abbreviation: [2] + 3[1].  The [2] means we are changing equation [2].  The expression [2] + 3[1] means that we are replacing equation 2 by the original equation plus three times equation 1.)

This explanation, in my opinion, would be incomprehensible to many students, who would understand the meaning only once it was demonstrated at the board using a couple of examples.  The phrase “The [2] means we are changing equation [2]” should have said something like “the left number, [2] in this case, denotes the equation we are changing.”  The last sentence refers to “the original equation”, meaning equation [2].  How many readers would guess that is what they mean?

In any case, better notation would be something like “[2] ← [2] + 3[1]”. I have found several websites that use this notation, sometimes written in the opposite direction. It is familiar to computer science students, which most of the students in my classes are.
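A minimal Python sketch of reading the textbook's notation as a command rather than an expression. The function names `add_multiple` and `scale` and the sample rows are hypothetical, chosen only for illustration; rows are numbered from 1 to match the notation.

```python
def add_multiple(rows, m, a, n):
    """Carry out the command "[m] + a[n]": replace row m by row m plus a times row n."""
    rows[m - 1] = [x + a * y for x, y in zip(rows[m - 1], rows[n - 1])]


def scale(rows, a, m):
    """Carry out the command "a[m]": replace row m by a times row m."""
    rows[m - 1] = [a * x for x in rows[m - 1]]


start = [[1, 2, 3], [-3, 1, 5]]  # made-up augmented matrix

first = [row[:] for row in start]
add_multiple(first, 2, 3, 1)     # [2] + 3[1]: only row 2 changes

second = [row[:] for row in start]
add_multiple(second, 1, 1, 2)    # [1] + [2]: row 1 changes

third = [row[:] for row in start]
add_multiple(third, 2, 1, 1)     # [2] + [1]: row 2 changes

fourth = [row[:] for row in start]
scale(fourth, 3, 2)              # 3[2]: row 2 is replaced by 3 times itself

print(first)            # [[1, 2, 3], [0, 7, 14]]
print(second == third)  # False: the "plus" in the command is not commutative
```

Writing the operation as an assignment makes the asymmetry visible: the row on the left is the one that gets overwritten.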

Putting the definition of the notation in a parenthetical remark is also undesirable.  It should be in a separate paragraph marked “Notation”.

There is another point here:  No verbal definition of this notation, however well written, can be understood as well as seeing it carried out in an example.  This is also true of matrix multiplication, whose definition in terms of symbols such as a_ib_j is difficult to understand (if a student can figure out how you do it from this definition they should be encouraged to be a math major), whereas the process becomes immediately clear when you see someone pointing with one hand at successive entries in a row of one matrix while pointing with the other hand at successive entries in the other matrix’s columns.  This is an example of the superiority (in many cases) of pattern recognition over definitions in terms of strings of symbols to be interpreted.  I did write about pattern recognition, here.
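The two-handed pointing can be written down as a short Python sketch. It is only an illustration (the function name `matmul` and the sample matrices are made up), but the innermost loop is exactly one hand moving along a row while the other moves down a column.

```python
def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    product = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # One hand walks along row i of a, the other down column j of b.
            for k in range(inner):
                product[i][j] += a[i][k] * b[k][j]
    return product


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```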


Composites of functions

In my post on automatic spelling reform, I mentioned the various attempts at spelling reform that have resulted in both the old and new systems being used, which only makes things worse.  This happens in Christian denominations, too.  Someone (Martin Luther, John Wesley) tries to reform things; result: two denominations.  But a lot of the time the reform effort simply disappears.  The Chicago Tribune tried for years to get us to write “thru” and “tho”, and failed.  Nynorsk (really a language reform rather than a spelling reform) is down to 18% of the population, and the results of allowing Nynorsk forms to be used in the standard language have mostly been nil.  (See Note 1.)

In my early years as a mathematician I wrote a bunch of papers writing functions on the right (including the one mentioned in the last post).  I was inspired by some algebraists and particularly by Beck’s Thesis (available online via TAC), which I thought was exceptionally well-written.  Writing functions on the right makes function composition read left to right and makes the pronunciation of commutative diagrams get along with notation, so when you see the diagram below you naturally write h = fg instead of h = gf.

[Figure: Composite]
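A small Python sketch of the two conventions, assuming that f is the first arrow and g the second in the diagram. The helper names `compose_right` and `compose_left` and the sample functions are hypothetical.

```python
def compose_right(f, g):
    """Functions written on the right: x(fg) = (xf)g, so "fg" means f first, then g."""
    return lambda x: g(f(x))


def compose_left(g, f):
    """Functions written on the left: (gf)(x) = g(f(x)), so "gf" also means f first, then g."""
    return lambda x: g(f(x))


def f(x):
    return x + 1   # made-up first arrow


def g(x):
    return 2 * x   # made-up second arrow


h_right = compose_right(f, g)  # written h = fg
h_left = compose_left(g, f)    # written h = gf

print(h_right(3), h_left(3))   # 8 8: the same composite under either name
print(compose_left(f, g)(3))   # 7: swapping the arguments gives g first, then f
```

The composite function is the same in both cases; only the order in which its name is written changes.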

Sadly, I gave all that up before 1980 (I just looked at some of my old papers to check).  People kept complaining.  I even completely rewrote one long paper (Reference [3]) changing from right hand to left hand (just like Samoa).  I did this in Zürich when I had the gout, and I was happy to do it because it was very complicated and I had a chance to check for errors.

Well, I adapted.  I have learned to read the arrows backward (g then f in the diagram above).  Some French category theorists write the diagram backward, thus:

[Figure: CompositeOp]

But I was co-authoring books on category theory in those days and didn’t think people would accept it. Not to mention Mike Barr (not that he is not a people, oh, never mind).

Nevertheless, we should have gone the other way.  We should have adopted the Dvorak keyboard and Betamax, too.

Notes

[1] A lifelong Norwegian friend of ours said that when her children say “boka” instead of “boken” it sounds like hillbilly talk does to Americans.  I kind of regretted this, since I grew up in north Georgia and have been a kind of hillbilly-wannabe (mostly because of the music); I don’t share that negative reaction to hillbillies.  On the other hand, you can fageddabout “ho” for “hun”.

References

[1] Charles Wells, Automorphisms of group extensions, Trans. Amer. Math. Soc. 155 (1970), 189-194.

[2] John Martino and Stewart Priddy, Group extensions and automorphism group rings, Homology, Homotopy and Applications 5 (2003), 53-70.

[3] Charles Wells, Wreath product decomposition of categories 1, Acta Sci. Math. Szeged 52 (1988), 307-319.


Grasshoppers and linear proofs

Below, I give a detailed example of how the context of a proof changes as you read the proof line by line. This example comes from the abstractmath article on context.  I mean something like verbal context or context in the computer science sense (see also Reference [1]): the values of all the relevant variables as specified up to the current statement in the proof.  For example, if the proof says “Suppose x = 3”, then when you read succeeding statements you know that x has the value 3, as long as it is not changed in some later statement.

Here is the text I will analyze:

Definition: Divides

Let m and n be integers with m\ne 0. The statement “m divides n” means that there is an integer q for which n=qm.

Theorem

Let m, n and p be integers, with m and n nonzero, and suppose m divides n and n divides p.  Then m divides p.

Proof

By definition of divides, there are integers q and q' for which n=qm and p=q'n. We must prove that there is an integer q'' for which p=q''m. But p=q'n=q'qm, so let q''=q'q.  Then p=q''m.

0) Definition: Divides

Changes the status of the word “divides” so that it becomes the definiendum. The scope is the following paragraph.

1) Let m and n be integers

m and n are new symbols in this discourse, constrained to be integers.

2) with m\neq 0

Another constraint on m.

3) The statement “m divides n” means that

This sentence fragment gives the rest of the sentence (in item 4 below) a special status.

4) there is an integer q for which n = qm.

This clause introduces q, another new symbol constrained to be an integer.  The clause imposes a constraint on m, n and q, that they satisfy the equation n = qm. But we know this only in the scope of the word Definition, which ends at the end of the sentence.  Once we read the word Theorem we no longer know that q exists, much less that it satisfies the constraint.  Indeed, the statement of the definition means that one way to prove a statement of the form “m divides n” is to find an integer q for which n = qm. This is not stated explicitly, and the reader would be wrong to conclude that the theorem that follows will be proved in this way. (In fact it will be in this example, but the author could have given some other kind of proof.)

5) Theorem

The placement of the word “Theorem” here announces that the next paragraph is a mathematical statement and that the statement has been proved.  In real time the statement was proved long before this discourse was written, but in terms of reading the text in order, it has not yet been proved.

6) Let m, n and p be integers,

We are starting a new context, in which we know that m, n and p are all integers.  This changes the status of m and n, which were variables used in the preceding paragraph, but now all previous constraints are discarded. We are starting over with m, n, and p.  We are also starting what the reader must recognize as the hypotheses of a conditional sentence, since that affects the context in a very precise way.

7) with m and n nonzero,

Now m and n are nonzero.  Note that in the previous paragraph n was not constrained to be nonzero.  Between the words “Let” and “with” in the current sentence, neither was constrained to be nonzero.

8 ) and suppose m divides n and n divides p

More new constraints:  m divides n and n divides p.

9) Then m divides p.

The word “then” signals that we are starting the conclusion of the conditional sentence.  It makes a claim that m divides p whenever the conditions in the hypothesis are true.  Because it is the conclusion, it has a different status from the assumptions that m divides n and n divides p.  We can’t treat m as if it divides p even though this sentence says it does.  All we know is that the author is claiming that m divides p if the hypotheses are true, and we expect (because the next word is “Proof”) that this claim will shortly be proved.

10) Proof

This starts a new paragraph.  It does not necessarily wipe out the context.  If the proof is going to be by the direct method (assume hypothesis, prove conclusion), as this one is, then it will still be true that m and n are nonzero integers, m divides n and n divides p.

11) By definition of divides, there are integers q and q' for which n = qm and p = q'n.

Since this proof starts by stating the hypothesis of the definition of “divides”, we now know that we are using the direct method, and that q and q' are new symbols that we are to assume satisfy the equations n = qm and p = q'n.  The phrase “by definition of divides” tells us (because the definition was given previously) that there are such integers, so in effect this sentence chooses q and q' so that n = qm and p = q'n.  The reader probably knows that there is only one choice for each of q and q', but in fact that claim is not being made here.  Note that m, n and p are not new symbols: they still fall within the scope of the previous paragraph, so we still know that m divides n and n divides p. If the proof were by contradiction, we would not know that.

12) We must prove that there is an integer q'' for which p = q''m

q'' is introduced by this sentence and is constrained by the equation. The scope of this sentence is just this sentence. The existence of q'' and the constraint on it do not exist in the context after the sentence is finished.  However, the constraints previously imposed on m, n, p, q and q' do continue.

13) But p = q'n = q'qm

This is a claim about p, q, q', m and n.  The equations are justified by certain preceding sentences but this justification is not made explicit.

14) so let q'' = q'q

We are establishing a new variable q'' in the context and putting a constraint on it, namely q'' = q'q.  It is significant that a variable named q'' was introduced once before, in the reference to the definition of divides.  A convention of mathematical discourse tells you to expect the author to establish that it fits the requirement of the definition. This expectation is triggered by using the same symbol q'' both here and in the reference to the definition.

15) Then p = q''m

This is an assertion about p, q'' and m, justified (but not explicitly) by the claim that p = q'n = q'qm.

16) The proof is now complete, although no statement asserts that.
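The bookkeeping in this analysis resembles nested scopes in a program. Here is a rough Python sketch, assuming purely for illustration that each context is a layer of a `collections.ChainMap` and each constraint is just a descriptive string. This modeling choice is not taken from the abstractmath article.

```python
from collections import ChainMap

empty = ChainMap()  # nothing is known yet

# Items 1-4: the definition introduces m, n and q in its own scope.
definition = empty.new_child()
definition.update(m="nonzero integer", n="integer", q="integer with n = qm")

# Items 6-8: the theorem starts over; nothing from the definition survives.
theorem = empty.new_child()
theorem.update(m="nonzero integer dividing n",
               n="nonzero integer dividing p",
               p="integer")

# Items 10-11: a direct proof keeps the hypotheses and adds q and q'.
proof = theorem.new_child()
proof["q"] = "integer with n = qm"
proof["q'"] = "integer with p = q'n"

# Item 12: q'' exists only inside the sentence that states the goal.
goal = proof.new_child()
goal["q''"] = "integer witnessing that m divides p"

# Item 14: only now does q'' enter the context of the proof itself.
known_before = "q''" in proof  # False: the goal's scope ended with its sentence
proof["q''"] = "the integer q'q"

print("q" in theorem)  # False: the definition's q is not visible in the theorem
print(proof["m"])      # nonzero integer dividing n (inherited from the theorem)
```

Each `new_child()` opens a scope that can see what came before it but whose own additions vanish when the scope is abandoned, which is the behavior described for the definition and for the “We must prove” sentence.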

I have several comments to make about this kind of analysis that are (mostly) not included in the abstractmath article.

a) This is supposed to be what goes through an experienced mathematician’s head while they are reading the proof.  Mostly subconsciously.  Linguists (as in Reference [1]) seem to think something like this takes place in your mind when you read any text, but it gets much denser in mathematical text.  Computer scientists analyze the operation of subprograms in this way, too.

b) Comment (a) is probably off the mark.  With a short proof like that, I get a global picture of the proof as my eyes dart back and forth over the various statements in the proof.  Now, I am a grasshopper: I read math stuff by jumping back and forth trying to understand the structure of the argument.  I do this both locally in a short proof and also globally when reading a long article or book:  I page through to find the topic I want and then jump back and forth finding the meanings of words and phrases I don’t understand.

c) I think most mathematicians are either grasshoppers or they are not good readers and they simply do not learn math by reading text.  I would like feedback on this.

d) If (a) is incorrect, should I omit this example from abstractmath?  I don’t think so.  My experience in teaching tells me that

  1. some students think this is perfectly obvious and why would I spend time constructing the example?,
  2. others are not aware that this is going on in their head and they are amazed to realize that it is really happening,
  3. and still others do not understand how to read proofs and when you tell them this sort of thing goes on in your head they are terminally intimidated.  (“Terminally” in the sense that they dye their hair black and become sociology majors.  They really do.)  Is that bad?  Well, I don’t think so.  I would like to hear arguments on the other side.

e) Can you figure out why item 8 of the analysis is labeled as “8 )” instead of “8)”?

Time is running out. I have other comments to make which must wait for a later post.

References

[1] G. Chierchia and S. McConnell-Ginet (1990), Meaning and Grammar. The MIT Press.


Commonword names for technical concepts

In a previous post I talked about the use of commonword names for technical concepts, for example, “simple group” for a group with no nontrivial proper normal subgroups.  This makes the monster group a simple group!  Lay readers on the subject might very well feel terminally put-down by such usage.  (If he calls that “simple” he must be a genius.  How could I ever understand that?  See note 1.)  Mark Ronan used “atom of symmetry” instead of “simple group” in his book Symmetry and the Monster, probably for some such reason.

Recently I had what used to be called a CAT scan and (perhaps) what used to be called a PET scan on the same day.  The medical community now refers to CT scan or nuclear imaging.  This may be because too many clients were thinking of doing sadistic testing on cats or other pets.  But I have not been able to confirm that.

The nurse called the CT scan an x-ray.  Well, of course, it is an x-ray, but it is an x-ray with tomography.  She explicitly said that calling CT scans x-rays was common usage in their lab.  In the past, other medical people have said to me, “It used to be called CAT scan but now it is CT scan.”   But no one said why.

The situation with PET scans is more complicated.  I didn’t raise the question with the nurse, and Wikipedia has separate articles about PET scans and nuclear imaging, even though they both use positrons and tomography.  The chemicals mentioned for PET are isotopes of low-atomic-number elements, whereas the nuclear medicine article mentions technetium-99 as the most commonly used isotope.  Nowhere does it explain the difference.  I wrote a querulous note in the comments section of the NM article requesting clarification.

Note 1.  “If he calls that ‘simple’ he must be a genius.  How could I ever understand that?”   Do not dismiss this as the reaction of a stupid person.  This kind of ready-to-be-intimidated attitude is very common among intelligent, educated, but non-technically-oriented people.   If mathematicians dismiss people like that we will  continue to find mathematics anathema among educated people.  We need people to feel that they understand something about what mathematicians do (I use that wording advisedly).  Even if you are an elitist who doesn’t give a damn about ordinary people, remember who funds the NSF. See co-intimidator.


Mathematical concepts

This post was triggered by John Armstrong’s comment on my last post.

We need  to distinguish two ideas: representations of a mathematical concept and the total concept.  (I will say more about terminology later.)

Example: We can construct the quotient of a group by the kernel of a group homomorphism by taking the cosets of the kernel and defining a multiplication on them.  We can construct the image of the homomorphism by taking the set of values of the homomorphism and using the multiplication induced by the codomain group.  The quotient group and the image are the same mathematical structure in the sense that anything useful you can say about one is true of the other.  For example, it may be useful to know the cardinality of the quotient (image) but it is not useful to know what its elements are.

But hold on, as the Australians say, if we knew that the codomain was an abelian group then we would know that the quotient group was abelian because the elements of the image form a subgroup of the codomain. (But the Australians I know wouldn’t say that.)

Now that kind of thinking is based on the idea that the elements of the image are “really” elements of the codomain whereas elements of the quotient are “really” subsets of the domain.  That is outmoded thinking.  The image and the quotient are the same in all important aspects because they are naturally isomorphic.  We should think of the quotient as just as much a subgroup of the codomain as the image is.  John Baez (I think) would say that to ask whether the subgroup embedding is the identity on elements or not is an evil question.

Let’s step back and look at what is going on here.  The definition of the quotient group is a construction using cosets.  The definition of the image is a construction using values of the homomorphism.  Those are two different specific  representations of the same concept.
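A small Python sketch of the two representations side by side, under the assumption (chosen only for illustration) that the homomorphism is multiplication by 3 on the integers mod 12. All names in the block are hypothetical.

```python
N = 12


def phi(x):
    """A made-up homomorphism from Z_12 to Z_12: multiplication by 3."""
    return (3 * x) % N


domain = range(N)
kernel = frozenset(x for x in domain if phi(x) == 0)

# Representation 1: the quotient, built as the set of cosets of the kernel.
cosets = {frozenset((x + k) % N for k in kernel) for x in domain}

# Representation 2: the image, built as the set of values of phi.
image = {phi(x) for x in domain}

# The natural map sends a coset to the common value of phi on its elements.
natural = {}
well_defined = True
for coset in cosets:
    values = {phi(x) for x in coset}
    well_defined = well_defined and len(values) == 1
    natural[coset] = min(values)


def coset_sum(c, d):
    """Add two cosets elementwise; the result is again a coset."""
    return frozenset((x + y) % N for x in c for y in d)


is_bijection = sorted(natural.values()) == sorted(image)
preserves_addition = all(
    natural[coset_sum(c, d)] == (natural[c] + natural[d]) % N
    for c in cosets for d in cosets
)

print(len(cosets), len(image))                         # 4 4
print(well_defined, is_bijection, preserves_addition)  # True True True
```

The elements of the two structures are different kinds of things (sets of numbers versus numbers), yet the natural map matches them up and respects the operation, which is all that matters for anything useful one wants to say.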

But what is the concept, as distinct from its representations?  Intuitively, it is

  • All the constructions made possible by the definition of the concept.
  • All the statements that are true about the concept.

(That is not precise.)

The total concept is like the clone plus the equational theory of a specific type of algebra in the sense of universal algebra.  The clone is all the operations you can construct knowing the given signature and equations and the equational theory is the set of all equations that follow from them.  That is one way of describing it.  Another is the monad in Set that gives the type of algebra — the operations are the arrows and the equations are the commutative diagrams.

Note: The preceding description of the monad is not quite right.  Also the whole discussion omits mention of the fact that we are in the world (doctrine) of universal algebra.  In the world of first order logic, for example, we need to refer to the classifying topos of the category of algebras of that type (or to its first order theory).

Terminology

We need better terminology for all this.  I am not going to propose better terminology, so this is a shaggy dog story.

Math ed people talk about a particular concept image of a concept as well as the total schema of the concept.

In categorical logic, we talk about the sketch or presentation of the concept vs. the theory. The theory is a category (of the kind appropriate to the doctrine) that contains all the possible constructions and commutative diagrams that follow from the presentation.

In this post I have used “total concept” to refer to the schema or theory.  I have referred to the particular things as “representations” (for example, constructing the image of a homomorphism by cosets or by values of the homomorphism).

“Representation” does not have the same connotations as “presentation”.  Indeed a presentation of a group and a representation of a group are mathematically  two different things.  But I suspect they are two different aspects of the same idea.

All this needs to be untangled.  Maybe we should come up with two completely arbitrary words, like “dostak” and “dosh”.


Different names for the same thing

I recommend reading the discussion (to which I contributed) of the post “Why aren’t all functions well-defined?” on Gowers’s Weblog.  It resulted in an insight I should have had a long time ago.

I have been preaching the importance of different ways of thinking about a math object (different images, metaphors, mental representations — there are too many names for this in the math ed literature).   Well, mathematicians at least occasionally use different names for a type of math object to indicate how they are thinking about it.

Examples

We talk about a relation and we talk about multivalued functions. Those are two different ways of talking about the same thing (they are the same by an adjunction).   A relation is a predicate.  A multivalued function is a function except that it can have more than one output for a given input.  But they are the same thing.
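A minimal Python sketch of passing back and forth between the two descriptions. The sample relation and the function names are made up, and the sketch assumes that a “multivalued function” may also return no outputs at all for some input.

```python
relation = {(1, "a"), (1, "b"), (2, "a")}  # a made-up relation on inputs 1, 2, 3
inputs = {1, 2, 3}


def as_multivalued(rel, inputs):
    """View a relation as a function sending each input to its set of outputs."""
    return {x: {y for (u, y) in rel if u == x} for x in inputs}


def as_relation(multivalued):
    """View a multivalued function as a set of (input, output) pairs."""
    return {(x, y) for x, ys in multivalued.items() for y in ys}


multivalued = as_multivalued(relation, inputs)
print(multivalued[1] == {"a", "b"})          # True: input 1 has two outputs
print(as_relation(multivalued) == relation)  # True: nothing is lost either way
```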

We talk about an equivalence relation and we talk about a partition of a set (or a quotient set).  The category of equivalence relations and the category of partitions of sets are naturally isomorphic, not merely equivalent.  But one is a special kind of relation and the other is a grouping.
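The passage between the two points of view can be sketched in Python. This is an illustration only: the function names and the example (congruence mod 3 on the numbers 0 to 6) are hypothetical.

```python
def partition_from_relation(elements, related):
    """Group elements into blocks of mutually related elements."""
    blocks = []
    for x in elements:
        for block in blocks:
            # For an equivalence relation, one representative per block suffices.
            if related(x, next(iter(block))):
                block.add(x)
                break
        else:
            blocks.append({x})
    return {frozenset(block) for block in blocks}


def relation_from_partition(partition):
    """Two elements are related exactly when they lie in the same block."""
    return lambda x, y: any(x in block and y in block for block in partition)


def same_mod_3(x, y):
    return (x - y) % 3 == 0


elements = range(7)
partition = partition_from_relation(elements, same_mod_3)
recovered = relation_from_partition(partition)

print(sorted(sorted(block) for block in partition))  # [[0, 3, 6], [1, 4], [2, 5]]
print(all(recovered(x, y) == same_mod_3(x, y)
          for x in elements for y in elements))       # True
```

One object is a predicate on pairs and the other is a set of sets, but each can be rebuilt from the other without loss.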

Let’s be open about what we do

We should be explicit about the way we think about and do math.  We have several different ways to think about any interesting type of math object and we should push this practice to students as being absolutely vital.  In particular we (some of us) use different names sometimes for the same object and we refuse to give them up, muttering about “reductionism” and “nothing buttery”.

Some students arrive in class already as (pedantic?)(geeky?) as many mathematicians (I am a recovering pedant myself).  We need to be up front about this phenomenon and explain the value of thinking and talking about the same thing in different ways, even using different words.

It used to be different but now it’s the same

A kind of opposite phenomenon occurs with some students and mathematicians of a certain personality type.  Consider the name “multivalued function”.  Of course a multivalued function is not (necessarily) a function.  Your mother-in-law  is not your mother, either.  I go on about this (using ideas from Lakoff) in the Handbook under “radial concept”.   Pedantic types can’t stand this kind of usage.  “A multivalued function can’t be a function”.  “Equivalence relations and partitions are not the same thing because one is a relation and the other is a set of sets.”  “The image of a homomorphism and the quotient by its kernel are not the same thing because…”

This attitude makes me tired.  Put your hands on the tv screen and think like a category theorist.


Distributive plurals

A statement in English such as “all squared nonzero real numbers are positive” is called a distributive plural.  This means that the statement “the square of x is positive” is true for every nonzero real number x.  It can be translated directly into symbolic notation:  \forall x\,(\text{if }x\ne 0\text{ then }x^{2}>0)

Not all statements involving plurals in English are distributive plurals.  The statement “The agents are surrounding the building” does not imply that Agent James is surrounding the building.  This type of statement is called a collective plural. Such a statement cannot be translated directly into a statement involving a universal quantifier.  More about this here.  This discussion on Wordwizard suggests that there may be a difference between British and American usage.

The word “distributive” as used here is analogous to the distributive law of arithmetic.  If the set of things referred to is finite, for example the set {-2, -1, 1, 3}, then one can say that “\forall x\,(x^{2}>0)” is equivalent to “(-2)^{2}>0\text{ and }(-1)^{2}>0\text{ and }1^{2}>0\text{ and }3^{2}>0”.
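For a finite set this equivalence can be checked directly. A minimal Python sketch, using the same four numbers:

```python
numbers = {-2, -1, 1, 3}

# The universally quantified statement, checked over the whole set at once.
universal = all(x ** 2 > 0 for x in numbers)

# The same statement distributed over the individual elements.
conjunction = (-2) ** 2 > 0 and (-1) ** 2 > 0 and 1 ** 2 > 0 and 3 ** 2 > 0

print(universal, conjunction)  # True True
```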

I once found a report on the internet that a Quaker Oats box contained this exhortation: “Eating a good-sized bowl of Quaker Oatmeal for 30 days will actually help remove cholesterol from your body.”  This undoubtedly exhibits a confusion between distributive plurals and the other kind of plural, but I don’t understand the connection well enough to explain it.

I can no longer find the report on the internet.  This may mean the Quaker Oats box with that label never existed.


Handbook now online

I have placed an interactive version of the Handbook of Mathematical Discourse online here. Its formatting is still a little rough, and it omits the quotations and illustrations from the printed book. It also needs the backlinks from the citations and bibliography reactivated. I will do that when I Get Around To It.

Now I can refer to the Handbook via a direct link from a blog post or from abstractmath, and you can click on a lexicographical citation and go directly to the text of the citation.

Comments and error reports are welcome.


Variables

One of the themes of abstractmath.org is that we should pay attention to how we think about mathematical objects.  This is not the same question as “What are mathematical objects?”.  This post addresses the question: How do we think about variables? What follows are extracts from newly rewritten sections from Variables and Substitution and Mathematical Objects.

Role playing

If the author says “x is a real variable” then x plays the role of a real number in whatever expression it occurs in.  It is like an actor in a play.  If the producer says Dwayne will play Polonius you know that Dwayne will hide behind a curtain at a certain point in the play.  When x occurs in the expression x^3-1  you know that if a number is substituted for x in the expression, the  expression will then denote the result of cubing the number and subtracting 1 from it.

Slot or cell

The variable x is a slot into which you can put any real number.  If you plug 3 into x in the expression x^3-1  you will get 26. 

This is like a blank cell in a spreadsheet. If you define another cell with the formula “=x^3-1” and put 3 in the cell representing x, the other cell will contain 26.

What’s wrong with this metaphor:  In Excel, a blank cell is automatically set to 0. To be a better metaphor the cell shouldn’t have a value until it is given one, and the cell with the formula “= x^3-1” should say “undefined!”.   (I am not saying this would make Excel a better spreadsheet. Excel was not invented so that I could make a point about variables.)
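A tiny Python sketch of the better-behaved cell, assuming that `None` stands for a cell that has not been given a value. The function name `cube_minus_one` is made up.

```python
def cube_minus_one(x):
    """The cell with formula "=x^3-1"; None stands for a cell with no value yet."""
    if x is None:
        return "undefined!"
    return x ** 3 - 1


print(cube_minus_one(3))     # 26
print(cube_minus_one(None))  # undefined!
print(cube_minus_one(0))     # -1, which is what treating a blank cell as 0 would give
```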

Variable mathematical object

The two metaphors above refer to the name x.  You can instead think of x as a variable mathematical object, meaning x is a genuine mathematical object, but with limitations about what you can say or think about it.  This sort of thinking works for both the symbolic language and mathematical English, and it works for any kind of mathematical structure (“Let G be an Abelian group…”), not just numbers in a symbolic expression.  There are two related points of view:

1. Some statements about the object are neither true nor false.

This means x is a genuine mathematical object and you can make assertions about it, but some of the assertions might have no truth value.  From “Let x be a real number” you know these things:

  • The assertion “Either  x > 0  or x \leq 0 ” is true.
  • The assertion “ x^2 = -1” is false.
  • The assertion “x > 0” is neither true nor false.

The assertion “x is a real number” is in a certain sense the most general true statement you can make about x.   In other words, x is a mathematical object given by an incomplete specification, so you are limited in what you can say about it or in what conclusions you can draw about it.

If you say, “Let n be an integer divisible by 4”, you cannot assume it is 8 or 12, for example.  In other words, the statement “n is divisible by 4” is true, and “n = 3” is false, but the statement “n = 8” is neither true nor false, and you can’t derive any conclusions from n being 8.

2. The object is fixed but some things are not known about it.

If you say x is a real number, you know x is a real number (duh) and:

  • You know x is either positive or nonpositive.
  • You know x^2 is not equal to any negative number.
  • You don’t know whether x is positive or not.

This way of looking at it involves thinking of x as a particular real number.  During the process of solving the equation x^2-5x=-6 you are thinking of x as a specific real number, but you don’t know which one.

These points of view (1) and (2) provide genuinely different metaphors for variables.  In (1) I say certain statements are neither true nor false, but (2) suggests that all statements about the object are either true or false but you don’t know which.  However, note that when you are finished solving the equation x^2-5x=-6, you still don’t know whether x = 2 or x = 3.  This fact causes me cognitive dissonance, but the point of view that some statements are neither true nor false upsets other people.  I prefer (1) over (2) but I have to admit that (1) is much less familiar to most mathematicians.
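One way to make view (1) concrete for the equation x^2-5x=-6 is a Python sketch under a loudly stated assumption: a statement about x counts as true if it holds for every solution, false if it holds for none, and neither otherwise. The function name `status` and the use of `None` for “neither true nor false” are choices made for this illustration only.

```python
def status(assertion, possible_values):
    """Return True, False, or None (neither) for an assertion about a variable."""
    results = {assertion(value) for value in possible_values}
    if results == {True}:
        return True
    if results == {False}:
        return False
    return None


# x^2 - 5x + 6 = (x - 2)(x - 3), so the solutions are 2 and 3.
solutions = {2, 3}

print(status(lambda x: x > 0, solutions))   # True
print(status(lambda x: x == 5, solutions))  # False
print(status(lambda x: x == 2, solutions))  # None: neither true nor false
```

Under view (2) one would instead say that “x = 2” has a definite truth value that is simply unknown; the code is the same, and only the reading of `None` changes.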

View (1) is advocated by category theorists because it allows you to think of a quantity holistically as a single thing rather than as a table of values.  The height of a cannonball is different at different times but the “height” is nevertheless one continuous mathematical quantity.  People who know more about history than I do believe that that is the simple and uncomplicated way nineteenth-century mathematicians thought about variable quantities.

We need good tools to do math.  This means good images and metaphors as well as good tools for reasoning.  Having simple and uncomplicated ways to think about math objects (along with guidelines for the way you think about them, such as dropping the law of the excluded middle in some cases!) is every bit as important as making sure our reasoning follows carefully thought out rules that lead from truth only to truth.

Note:  Heyting-valued logic actually provides sound but non-classical reasoning for thinking about variable objects, but most mathematicians with sound intuitions nevertheless use classical reasoning and come up with correct conclusions.  Some of us are now in the practice of using non-classical logic to study differentials and other things, and that is a Good Thing, but it would be a complete misunderstanding if you read this post as advocating that mathematicians change over to that way of doing things.  This post is about how we think about variability.
