Category Archives: understanding math

Thinking about mathematical objects revisited

How we think about X

It is notable that many questions posted on MathOverflow take the form “How should I think about X?”, where X can be any type of mathematical object (quotient group, scheme, fibration, cohomology and so on).  Some crotchety contributors to that site want the questions to be specific and well defined, but “how do I think about…” questions are in my opinion among the most interesting questions on the website.  (See note [a].)

Don’t confuse “How do I think about X” with “What is X really?” (pace Reuben Hersh).  The latter is a philosophical question.  As far as I am concerned, thinking about how to think about X is very important and needs lots of research by mathematicians, educators, and philosophers — for practical reasons: how you think about it helps you do it.   What it really is is no help and anyway no answer may exist.

Inert and eternal

The idea that mathematical objects should be thought of as “inert” and “eternal” has been around for a while.  (Never mind whether they really are inert and eternal.)  I believe, and have said in the past [1], that thinking about them that way clears up a lot of confusion in newbies concerning logical inference.

  • That mathematical objects are “inert” means that they do not cause anything. They have no effect on the real world or on each other.
  • That they are “eternal” means they don’t change over time.

Naturally, a function (a mathematical object) can model change over time, and it can model causation, too, in that it can describe a process that starts in one state and achieves stasis in another state (that is just one way of relating functions to causation).  But when we want to prove something about a type of math object, our metaphorical understanding of them has to lose all its life and color and go dead, like the dry bones before Ezekiel started nagging them.

It’s only mathematical reasoning if it is about dead things

The effect on logical inference can be seen in the fact that “and” is a commutative logical operator. 

  • “x > 1 and x < 3” means exactly the same thing as “x < 3 and x > 1”
  • “He picked up his umbrella and went outside” does not mean the same thing as “He went outside and picked up his umbrella”.

The most profound effect concerns logical implication.  “If x > 1 then x > 0” says nothing to suggest that x > 1 causes it to be the case that x > 0.  It is purely a statement about the inert truth sets of two predicates lying around the mathematical boneyard of objects: the second set includes the first one.  This makes vacuous implication perfectly obvious.  (The number -1 lies in neither truth set and is irrelevant to the fact of inclusion.)
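The truth-set reading can be made concrete.  Here is a minimal Python sketch (my own illustration, using a finite sample of integers in place of all numbers) that exhibits implication as inert set inclusion and “and” as a commutative operator:

```python
# A finite sample of values standing in for the integers.
sample = range(-10, 11)

gt1 = {x for x in sample if x > 1}   # truth set of "x > 1"
gt0 = {x for x in sample if x > 0}   # truth set of "x > 0"

# "If x > 1 then x > 0" is just the inclusion of one inert set in another.
assert gt1 <= gt0

# "and" is commutative: both orders carve out the same truth set.
assert {x for x in sample if x > 1 and x < 3} == \
       {x for x in sample if x < 3 and x > 1}

# -1 lies in neither truth set; it is irrelevant to the fact of inclusion.
assert -1 not in gt1 and -1 not in gt0
```

Nothing here happens over time or causes anything; the sets just lie there and one contains the other.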

Inert and eternal rethought

There are better metaphors than these.  The point about the number 3 is that you think about it as outside time. In the world where you think about 3 or any other mathematical object, all questions about time are meaningless.

  • In the sentence “3 is a prime”, we need a new tense in English like the tenses ancient (very ancient) Greek and Hebrew were supposed to have (perfect with gnomic meaning), where a fact is asserted without reference to time.
  • Since causation involves “this happens, then this happens”, all questions about causation are meaningless, too.  It is not true that 3 causes 6 to be composite, while being irrelevant to the fact that 35 is composite.

This single metaphor “outside time” thus can replace the two metaphors “inert” and “eternal” and (I think) shows that the latter two are really two aspects of the same thing.

Caveat

Thinking of math objects as outside time is a Good Thing when you are being rigorous, for example doing a proof.  The colorful, changing, full-of-life way of thinking of math that occurs when you say things like the statements below is vitally necessary for inspiring proofs and for understanding how to apply the mathematics.

  • The harmonic series goes to infinity in a very leisurely fashion.
  • A function is a machine — when you dump in a number it grinds away and spits out another number.
  • At zero, this function vanishes.

Acknowledgment

Thanks to Jody Azzouni for the italics (see [3]).

Notes

a.  Another interesting type of question is “in what setting does such and such a question (or proof) make sense?”.  An example is my question in [2].

References

1.  Proofs without dry bones

2. Where does the generic triangle live?

3. The revolution in technical exposition II.


Just-in-time foundations

Introduction

In MathOverflow, statements similar to the following two occurred in comments:

  1. Sets and functions do not form a category
  2. Categories and functors do not form a category.

I cannot find either one of them now, but I want to talk about them anyway.

If you look at the definition of categories in various works (for example references [1] through [3] below) you find that the objects and arrows of a category must each form a “collection” or “class” together with certain operations.   The authors all describe the connection with Grothendieck’s concept of “universe” and define “large categories” and “small categories” in the usual way.  So Statement 1 above is simply wrong.

Statement 2 is more problematic.  The trouble is that if the word “categories” includes large categories then the objects do not form a set even in the second universe.  You have to go to the third universe.

Now there is a way to define categories where this issue does not come up.  It allows us to think about categories without having a particular system such as ZF and universes in mind.

A syntactic definition of category

A category consists of objects and arrows, together with four methods of construction M1–M4 satisfying laws L1–L7.  I treat “object” and “arrow” as predicates:  object[f] means f is an object and arrow[a] means a is an arrow.  “=” means equals in the mathematical sense.

M1 Source If arrow[f], object[f.source].
M2 Target If arrow[f], object[f.target].
M3 Identity If object[a],  arrow[a.identity].
M4 Comp If arrow[g] and arrow[f] and  f.target = g.source, then arrow[(g,f).comp].
L1. If object[a],  a.identity.source = a.
L2. If object[a], a.identity.target = a.
L3. If arrow[g] and arrow[f] and  f.target = g.source, then (g,f).comp.source = f.source.
L4. If arrow[g] and arrow[f] and  f.target = g.source, then (g,f).comp.target = g.target.
L5. If object[a] and arrow[f] and f.source = a, then (f, a.identity).comp = f.
L6. If object[a] and arrow[g] and g.target = a, then (a.identity, g).comp = g.
L7. If arrow[h] and arrow[g] and arrow[f] and h.source = g.target and g.source = f.target, then (h, (g,f).comp).comp = ((h,g).comp, f).comp.
Remarks on this definition
1. I have deliberately made this definition look like a specification in an object oriented program (see [6]), although the syntax is not the same as any particular oo language.  It is as rigorous a mathematical definition as you could want, and it could presumably be compiled in some oo language, except that I don’t know if oo languages allow the conditional definition of a method as given in M4.
2.  I could have given the definition in mathematical English, for example “If f is an arrow then the source of f is an object”.  My point in providing the impenetrable definition above is to make a connection (admittedly incompletely established) with a part of math (the theory of oo languages) that is definitely rigorous but is not logic.  An informal definition in math English of course could also be transformed rigorously into first order logic.
3.  This definition is exactly equivalent to the FL sketch for categories given in my post [5].  That sketch has models in many categories, not just Set, as well as its generic model living in the corresponding FL-cattheory (or in the classifying topos it generates).
4.  Saunders Mac Lane defined metacategory in precisely this way in [1].  That was of course before anyone ever heard of oo languages.  I think he should have made that the definition of category.
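To make remark 1 concrete, here is a minimal Python sketch of the specification.  The class names, the four-object example, and the device of representing composites freely by name are all my own illustration, not part of the definition; the identity laws L5 and L6 are built into the composition method rather than checked afterward.

```python
class Arrow:
    """An arrow, with a source and a target object (M1, M2)."""
    def __init__(self, name, source, target):
        self.name, self.source, self.target = name, source, target

    def __eq__(self, other):
        return (self.name, self.source, self.target) == \
               (other.name, other.source, other.target)

class Obj:
    """An object; M3 supplies its identity arrow, which satisfies
    L1 and L2 by construction."""
    def __init__(self, name):
        self.name = name
        self.identity = Arrow(f"id_{name}", self, self)

def comp(g, f):
    """M4: (g, f).comp, defined only when f.target = g.source."""
    assert f.target is g.source, "composite undefined: M4's condition fails"
    if f is f.source.identity:    # L5: identity on the right
        return g
    if g is g.target.identity:    # L6: identity on the left
        return f
    # L3 and L4: the composite runs from f.source to g.target.
    return Arrow(f"{g.name}.{f.name}", f.source, g.target)

# A tiny free example on the graph a -f-> b -g-> c -h-> d.
a, b, c, d = Obj("a"), Obj("b"), Obj("c"), Obj("d")
f, g, h = Arrow("f", a, b), Arrow("g", b, c), Arrow("h", c, d)

assert comp(f, a.identity) == f                            # L5
assert comp(b.identity, f) == f                            # L6
assert comp(g, f).source is a and comp(g, f).target is c   # L3, L4
assert comp(h, comp(g, f)) == comp(comp(h, g), f)          # L7
```

Law L7 holds here because both parenthesizations flatten to the same name “h.g.f”; a serious implementation would need a more careful notion of equality of composites.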

Just-in-time foundations

Mathematicians work inside the categories Set (sets and functions) and Cat (categories and functors) all the time, including functors to or from Cat or Set. When they consider a category, they use theorems that follow from the definition above.  They do not have to have foundations in mind.

Once in a while, they are frustrated because they cannot talk about the set of objects of some category.  For example, Freyd’s solution set condition is required to prove the existence of a left adjoint because of that problem.  The solution set condition is a work-around for a familiar obstruction to an easy way to prove something.  I can imagine coming up with such a work-around without ever giving a passing thought to foundations, in particular without thinking of universes.

When you work with a mathematical object, the syntax of the definitions and theorems gives you all you need to justify the claim that something is a theorem.  You absolutely need models of the theory to think up and understand proofs, but the models could be sets or classes with structure, or functors (as in sketch theory), or you may work with generic models, which may require you to use intuitionistic reasoning.  You don’t have to have any particular kind of model in mind when you work in Set or Cat.

When you do run into something like the impossibility of forming the set of objects of some category (which happens in any model theory environment that uses classical rather than intuitionistic reasoning) then you may want to consider an approach through some theory of foundations.  That is what most mathematicians do: they use just-in-time foundations. For example, in a particular application you may be happy to work in a topos with a set-of-all-objects, particularly if you are a certain type of computer scientist who lives in Pittsburgh.  You may be happy to explicitly consider universes, although I am not aware of any category-theoretical results that do explicitly mention universes.

But my point is that most mathematicians think about foundations only when they need to, and most mathematicians never need to think about foundations in their work. Moral: Don’t think in terms of foundations unless you have to.

This point of view is related to the recent discussions of pragmatic foundations [7] [8].

Side remark

The situation that you can’t always construct a set of somethings is analogous to a problem you have in working with real numbers: you can’t name most real numbers. This may get in the way of some analyst wanting to do something, I don’t know.  But in any branch of math, there are obstructions to things you want to do that really do get in your way.  For example, in beginning linear algebra, it may have occurred to you, to your annoyance, that if you have a basis of a subspace you can extend it to a basis for the whole space, but if you have a basis of the whole space and a subspace, the basis may not contain a basis of the subspace.

References and links

  1. Saunders Mac Lane, Categories for the working mathematician. Springer-Verlag, 1971.
  2. Wikipedia article on category theory
  3. Michael Barr and Charles Wells, Category Theory for Computing Science, Third Edition (1999). Les Publications CRM, Montreal (publication PM023).
  4. Discussion of functions in abstractmath.org.
  5. Definitions into Mathematical Objects 7.
  6. Object oriented programming in Wikipedia.
  7. M. Gelfand, We Do Not Choose Mathematics as Our Profession, It Chooses Us: Interview with Yuri Manin.
  8. Discussion in n-category cafe.

Three kinds of mathematical thinkers

This is a continuation of my post Syntactic and semantic thinkers, in which I mentioned Leone Burton’s book [1] but hadn’t read it yet.  Well, now it is due back at the library so I’d better post about it!

I recommend this book for anyone interested in knowing more about how mathematicians think about and learn math.  The book is based on in-depth interviews with seventy mathematicians.  (One in-depth interview is worth a thousand statistical studies.)   On page 53, she writes

At the outset of this study, I had two conjectures with respect to thinking style.  The first was that I would find the two different thinking styles, the visual and the analytic, well recorded in the literature… The second was that research mathematicians would move flexibly between the two.  Neither of these conjectures were confirmed.

What she discovered was three styles of mathematical thinking:

Style A: Visual (or thinking in pictures, often dynamic)

Style B: Analytic (or thinking symbolically, formalistically)

Style C: Conceptual (thinking in ideas, classifying)

Style B corresponds more or less with what was called “syntactic” in [3] (based on [2]).  Styles A and C are rather like the distinctions I made in [3] that I called “conceptual” and “visual”, although I really want Style A to communicate not only “visual” but “geometric”.

I recommend jumping through the book reading the quotes from the interviews.  You get a good picture of the three styles that way.

Visual vs. conceptual

I had thought about this distinction before and have had a hard time explaining what “conceptual” means, particularly since for me it has a visual component.  I mentioned this in [3].  I think about various structures and their relationship by imagining them as each in a different part of a visual field, with the connections as near as I can tell felt rather than seen.  I do not usually think in terms of the structures’ names (see [4]).  It is the position that helps me know what I am thinking about.

When it comes time to write up the work I am doing, I have to come up with names for things and find words to describe the relationships that I was feeling. (See remark (5) below.)  Sometimes I have also written things down and come up with names, and when this happened very much I invariably got a clash of notation that didn’t bother me when I was thinking about the concepts, because the notations referred to things in different places.

I would be curious if others do math this way.  Especially people better than I am.  (Clue to a reasonable research career:  Hang around people smarter than you.)

Remarks

1) I have written a lot about images and metaphors [5], [6].  They show up in the way I think about things sometimes.  For example, when I am chasing a diagram I am thinking of each successive arrow as doing something.  But I don’t have any sense that I depend a lot on metaphors.  What I depend on is my experience with thinking about the concept!

2) Some of the questions on Math Overflow are of the “how do I think about…” type (or “what is the motivation for…”).  Some of the answers have been Absolutely Entrancing.

3) Some of the respondents in [1] mentioned intuition, most of them saying that they thought of it as an important part of doing math.  I don’t think the book mentioned any correlation between these feelings and Styles A, B, C, but then I didn’t read the book carefully.  I never read any book carefully. (My experience is that Style B thinkers of the subtype Logic Rules diss intuition.  But not analysts of the sort who estimate errors and so on.)

4) Concerning A, B, C:  I use Style C (conceptual) thinking mostly, but a good bit of Style (B) (analytic) as well.  I think geometrically when I do geometry problems, but my research has never tended in that direction.  Often the analytic part comes after most of the work has been done, when I have to turn the work into a genuine dry-bones proof.

5) As an example of how I have sometimes worked, I remember doing a paper about lifting group automorphisms (see [7]), in which I had a conceptual picture with a conceptual understanding of the calculations of doing one transformation after another which produced an exact sequence in cohomology.  When I wrote it up I thought it would be short.  But all the verifications made the paper much longer.  The paper was conceptually BigChunk BigChunk BigChunk BigChunk … but each BigChunk required a lot of Analytic work.  Even so, I missed a conceptual point (one of the groups involved was a stabilizer but I didn’t notice that.)

References

[1] Leone Burton, Mathematicians as Enquirers: Learning about Learning Mathematics.  Kluwer, 2004.

[2] Keith Weber, How syntactic reasoners can develop understanding, evaluate conjectures, and generate counterexamples in advanced mathematics. Proof copy available from Science Direct.

[3] Post on this blog: Syntactic and semantic thinkers.

[4] Post: Thinking without words.

[5] Post: Proofs without dry bones.

[6] Abstractmath.org article on Images and Metaphors.

[7] Post: Automorphisms of group extensions updated.


Naive proofs

The monk problem

A monk starts at dawn at the bottom of a mountain and goes up a path to the top, arriving there at dusk. The next morning at dawn he begins to go down the path, arriving at dusk at the place he started from on the previous day. Prove that there is a time of day at which he is at the same place on the path on both days.

Proof: Envision both events occurring on the same day, with a monk starting at the top and another starting at the bottom at the same time and doing the same thing the monk did on different days. They are on the same path, so they must meet each other. The time at which they meet is the time required.

The pons asinorum

Theorem: If a triangle has two equal angles, then it has two equal sides.

Proof: In the figure below, assume angle ABC = angle ACB. Then triangle ABC is congruent to triangle ACB since the sides BC and CB are equal and the adjoining angles are equal.

PATriangle

I considered the monk problem at length in my post Proofs Without Dry Bones.  Proofs like the one given of the pons asinorum, particularly its involvement with labeling, recently came up on the mathedu mailing list.  See also my question on Math Overflow.

Naive proofs

These proofs share a characteristic property; I propose to say they are naive, in the sense Halmos used it in his title Naive Set Theory.

The monk problem proof is naive.

For the monk problem, you can give a model of a known mathematical type (for example model the paths as  smoothly parametrized curves on a surface) and use known theorems (for example the intermediate value theorem) and facts (for example that clock time is cyclical and invariant under the appropriate mapping) to prove it.  But the proof says nothing about that.
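Such a model is easy to sketch in Python.  The particular path functions below are my own illustration; any continuous journeys with the right endpoints would do.  Parametrize both journeys by clock time on [0, 1] (dawn = 0, dusk = 1) and locate the meeting time with the intermediate value theorem, via bisection:

```python
# Position along the path as a fraction in [0, 1], as a function of
# clock time t in [0, 1].  These functions are illustrative: any
# continuous up with up(0)=0, up(1)=1 and down with down(0)=1,
# down(1)=0 would work.
def up(t):        # day 1: bottom (0) to top (1)
    return t ** 2

def down(t):      # day 2: top (1) to bottom (0)
    return 1.0 - t

def crossing(lo=0.0, hi=1.0, tol=1e-12):
    """d(t) = up(t) - down(t) is continuous, negative at dawn and
    positive at dusk, so by the intermediate value theorem it has a
    zero: a clock time at which the monk is at the same place on
    both days.  Bisection finds it."""
    d = lambda t: up(t) - down(t)
    assert d(lo) < 0 < d(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if d(mid) < 0 else (lo, mid)
    return (lo + hi) / 2

t = crossing()
assert abs(up(t) - down(t)) < 1e-6
```

The naive proof, of course, says none of this; it gets the same conclusion directly from common knowledge about paths and clock time.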

You could imagine inventing an original set of axioms for the monk problem, giving axioms for a structure that are satisfied by the monk’s journeys and their timing and that imply the result.  In principle, these could be very different from multivariable calculus ideas and still serve the purpose. (But I have not tried to come up with such a thing.)

But the proof as given simply uses directly  known facts about clock time and traveling on paths.  These are known to most people.  I have claimed in several places that this proof is still a mathematical proof.

Every proof is incomplete in the sense that it provides a mathematical model and analyzes it using facts the reader is presumed to know.  Proofs never go all the way to foundations.  A naive proof simply depends more than usual on the reader’s knowledge: the percentage of explication is lower.  Perhaps “naive” should also include the connotation that the requisite knowledge is “common knowledge”.

The pons asinorum proof is naive.

This involves some subtle issues.  When I first wrote about this proof in the Handbook I envisioned the triangle as existing independently of any embedding in the plane, as if in the Platonic world of ideals.  I applied some labels and a relabeling and used a known theorem of Euclid’s geometry.  You certainly don’t have to know where the triangle is in order to understand the proof.

That’s a clue.  The triangle in the problem does not need to be planar. The theorem is true for triangles on the sphere or on a saddle surface, because the proof does not involve the parallel axiom. But the connection with the absence of the parallel axiom is illusory.  When you imagine the triangle in your head, the proof works directly for a triangle in any suitable geometry, because you imagine the triangle as existing in and of itself, not embedded in anything.

Questions

  1. How do you give a mathematical definition of a triangle so that it is independent of embedding?  This was the origin of my question on Math Overflow, although I muddled the issue by mentioning specific ways of doing it.
  2. (This is a variant of question 1.)  Is there anything like a classifying topos or space for a generic triangle?  In other words, a category or space or something that is just big enough to include the generic triangle and from which mappings to suitable spaces or categories produce what we usually mean by triangles.
  3. Some of the people on mathedu thought a triangle obviously had to have labels and others thought it obviously didn’t.  Specifically, is triangle ABC “the same” as triangle ACB?  Of course they are congruent.  Are they the same?  This is an evil question. The proof works on the generic isosceles triangle.  That’s enough.  Isn’t it?  All three corners of the generic isosceles triangle are different points.  Aren’t they?  (I have had second, third and nth thoughts about this point.)
  4. You can define a triangle as a list of lengths of edges and connectivity data.  But the generic triangle’s sides ought to be (images of) line segments, not abstract data.  I don’t really understand how to formulate this correctly.

Note

1.  I could avoid discussion of irrelevant side issues in the monk problem by referring to specific times of day for starting and stopping, instead of dawn and dusk.  But they really are irrelevant.


Syntactic and semantic thinkers

A paper by Keith Weber

Reidar Mosvold’s math-ed blog recently provided a link to an article by Keith Weber (Reference [2]) about a very good university math student he referred to as a “syntactic reasoner”.  He interviewed the student in depth as the student worked on some proofs suitable to his level.  The student would “write the proofs out in quantifiers” and reason based on previous steps of the proof in a syntactic way rather than depending on an intuitive understanding of the problem, as many of us do (the author calls us semantic reasoners).  The student didn’t think about specific examples —  he always tried to make them as abstract as possible while letting them remain examples (or counterexamples).

I recommend this paper if you are at all interested in math education at the university math major level — it is fascinating.  It made all sorts of connections for me with other ideas about how we think about math that I have thought about for years and which appear in the Understanding Math part of abstractmath.org.  It also raises lots of new (to me) questions.

Weber’s paper talks mostly about how the student comes up with a proof.  I suspect that the distinction between syntactic reasoners and semantic reasoners can be seen in other aspects of mathematical behavior, too, in trying to understand and explain math concepts.  Some thoughts:

Other behaviors of syntactic reasoners (maybe)

1) Many mathematicians (and good math students) explain math using conceptual and geometric images and metaphors, as described in Images and metaphors in abstractmath.org.   Some people I think of as syntactic reasoners seem to avoid such things. Some of them even deny thinking in images and metaphors, as I discussed in the post Thinking without words.   It used to be that even semantic reasoners were embarrassed to use images and metaphors when lecturing (see the post How “math is logic” ruined math for a generation).

2) In my experience, syntactic reasoners like to use first order symbolic notation, for example ∀x(x > 1 ⇒ x > 0),

and will often translate a complicated sentence in ordinary mathematical English into this notation so they can understand it better.  (Weber describes the student he interviewed as doing this.)  Furthermore they seem to think that putting a formula such as the one above on the board says it all, so they don’t need to draw pictures, wave their hands [Note 1], and so on.  When you come up with a picture of a concept or theorem that you claim explains it their first impulse is to say it out in words that generally can be translated very easily into first order symbolism, and say that is what is going on.  It is a matter of what is primary.

Semantic reasoners, among students and (I think) many mathematicians, find the symbolic notation difficult to parse and would rather have it written out in English.  I am pretty good at reading such symbolic notation [Note 2] but I still prefer ordinary English.

3) I suspect the syntactic reasoners also prefer to read proofs step by step, as I described in my post Grasshoppers and linear proofs, rather than skipping around like a grasshopper.

And maybe not

Now it may very well be that syntactic thinkers do not all do all those things I mentioned in (1)-(3).  Perhaps the group is not cohesive in all those ways.  Probably really good mathematicians use both techniques, although Weyl didn’t think so (quoted in Weber’s paper).   I think of myself as an image and metaphor person but I do use syntax, and sometimes even find that a certain syntactic explanation feels like a genuinely useful insight, as in the example I discussed under conceptual in the Handbook.

Distinctions among semantic thinkers

Semantic thinkers differ among themselves.  One demarcation line is between those who use a lot of visual thinking and those who use conceptual thinking which is not necessarily visual.  I have known grad students who couldn’t understand how I could do group theory (that was in a Former Life, before category theory) because how could you “see” what was happening?  But the way I think about groups is certainly conceptual, not syntactic.  When I think of a group acting on a space I think of it as stirring the space around.  But the stirring is something I feel more than I see.  On the other hand, when I am thinking about the relationships between certain abstract objects, I “see” the different objects in different parts of an interior visual space.  For example, group is on the right, stirring the space-acted-upon on the left, or the group is in one place, a subgroup is in another place while simultaneously being inside the group, and the cosets are grouped (sorry) together in a third place, being (guess what) stirred around by the group acting by conjugation (Note [3]).

This distinction between conceptual and visual, perhaps I should say visual-conceptual and non-visual-conceptual, both opposed to linguistic or syntactic reasoning, may or may not be as fundamental as syntactic vs semantic.   But it feels fundamental to me.

Weber’s paper mentions an intriguing sounding book (Reference [1]) by Burton which describes a three-way distinction called conceptual, visual and symbolic, that sounds like it might be the distinction I am discussing here.  I have asked for it on ILL.

Notes

  1. Handwaving is now called kinesthetic communication.  Just to keep you au courant.
  2. I took Joe Shoenfield’s course in logic when his book  Mathematical Logic [3] was still purple.
  3. Clockwise for left action, counterclockwise for right action.  Not.

References

  1. Leone Burton, Mathematicians as Enquirers: Learning about Learning Mathematics.  Kluwer, 2004.
  2. Keith Weber, How syntactic reasoners can develop understanding, evaluate conjectures, and generate counterexamples in advanced mathematics. Proof copy available from Science Direct.
  3. Joseph Shoenfield, Mathematical logic, Addison-Wesley 1967, reprinted 2001 by the Association for Symbolic Logic.

Mastering a proof

In response to Grasshoppers and linear proofs, Avery Andrews said:

Maybe a related question is how much time people do/ought spend on really mastering the proofs of theorems in textbooks, ‘mastering’ being, say, able to explain it in any desired amount of detail at least 2 weeks after last looking at it.

There are two different goals:

  1. Mastering the proof of a theorem in a textbook so that you can explain it in any desired amount of detail…
  2. Mastering a proof of the theorem so that you can explain it in any desired amount of detail…

My observation is that most research mathematicians don’t attempt (1); they are satisfied with (2).  Trying to understand a written proof in detail can be quite difficult:

  • The author may use misleading language.
  • The author may jump over a piece of reasoning that to them is obvious but not to you.
  • The author may mention a previous step or a theorem that justifies the current step, but get the reference wrong.

And so on.

In my observation the typical mathematician will look at the proof, perhaps getting some idea of the overall strategy of the whole proof or a particular part, and then think about it independently until they come up with a proof or part of it.  This may or may not be what the author had in mind.  But by thinking through it the reader will solidify their understanding of the proof in a way that reading and rereading step by step is unlikely to do.

When you construct your knowledge like that you are likely to have it in a permanent, well semi-permanent, way.


Grasshoppers and linear proofs

Below, I give a detailed example of how the context of a proof changes as you read the proof line by line. This example comes from the abstractmath article on context.  I mean something like verbal context or context in the computer science sense (see also Reference [1]): the values of all the relevant variables as specified up to the current statement in the proof.  For example, if the proof says “Suppose x = 3”, then when you read succeeding statements you know that x has the value 3, as long as it is not changed in some later statement.

Here is the text I will analyze:

Definition: Divides

Let m and n be integers with m ≠ 0. The statement “m divides n” means that there is an integer q for which n=qm.

Theorem

Let m, n and p be integers, with m and n nonzero, and suppose m divides n and n divides p .  Then m divides p.

Proof

By definition of divides, there are integers q and q’ for which n=qm and p=q’n. We must prove that there is an integer q’’ for which p=q’’m. But p=q’n=q’qm, so let q’’=q’q.  Then p=q’’m.

0) Definition: Divides Changes the status of the word “divides” so that it becomes the definiendum. The scope is the following paragraph.
1) Let m and n be integers m and n are new symbols in this discourse, constrained to be integers
3) with m ≠ 0 another constraint on m
3) The statement “m divides n” means that This sentence fragment gives the rest of the sentence (in the box below it) a special status.
4) there is an integer q for which n = qm. This clause introduces q, another new symbol constrained to be an integer.  The clause imposes a constraint on m, n and q, that they satisfy the equation n = qm. But we know this only in the scope of the word Definition, which ends at the end of the sentence.  Once we read the word Theorem we no longer know that q exists, much less that it satisfies the constraint.  Indeed, the statement of the definition means that one way to prove the theorem is to find an integer q for which n = qm. This is not stated explicitly, and indeed the reader would be wrong to draw the conclusion that in what follows the theorem will be proved in this way. (In fact it will be in this example, but the author could have done some other kind of proof.)
5) Theorem The placement of the word “Theorem” here announces that the next paragraph is a mathematical statement and that the statement has been proved.  In real time the statement was proved long before this discourse was written, but in terms of reading the text in order, it has not yet been proved.
6) Let m, n and p be integers, We are starting a new context, in which we know that m, n and p are all integers.  This changes the status of m and n, which were variables used in the preceding paragraph; all previous constraints on them are discarded. We are starting over with m, n, and p.  We are also starting what the reader must recognize as the hypotheses of a conditional sentence, since that affects the context in a very precise way.
7) with m and n nonzero, Now m and n are nonzero.  Note that in the previous paragraph n was not constrained to be nonzero.  Between the words “Let” and “with” in the current sentence, neither was constrained to be nonzero.
8 ) and suppose m divides n and n divides p More new constraints:  m divides n and n divides p.
9) Then m divides p.   The word “then” signals that we are starting the conclusion of the conditional sentence.  It makes a claim that m divides p whenever the conditions in the hypothesis are true.  Because it is the conclusion, it has a different status from the assumptions that m divides n and n divides p.   We can’t treat m as if it divides p even though this sentence says it does.  All we know is that the author is claiming that m divides p if the hypotheses are true, and we expect (because the next word is “Proof”) that this claim will shortly be proved.
10) Proof

This starts a new paragraph.  It does not necessarily wipe out the context.  If the proof is going to be by the direct method (assume the hypothesis, prove the conclusion), as this one is, then it will still be true that m and n are nonzero integers, m divides n and n divides p.
11) By definition of divides, there are integers q and q' for which n = qm and p = q'n.

Since this proof starts by applying the definition of “divides” to the hypotheses, we now know that we are using the direct method, and that q and q' are new symbols that we are to assume satisfy the equations n = qm and p = q'n.   The phrase “by definition of divides” tells us (because the definition was given previously) that there are such integers, so in effect this sentence chooses q and q' so that n = qm and p = q'n.  The reader probably knows that there is only one choice for each of q and q', but in fact that claim is not being made here.  Note that m, n and p are not new symbols: they still fall within the scope of the previous paragraph, so we still know that m divides n and n divides p. If the proof were by contradiction, we would not know that.
12) We must prove that there is an integer q'' for which p = q''m q'' is introduced by this sentence and is constrained by the equation. The scope of this sentence is just this sentence. The existence of q'' and the constraint on it do not persist in the context after the sentence is finished.  However, the constraints previously imposed on m, n, p, q and q' do continue.
13) But p = q'n = q'qm This is a claim about p, q, q', m and n.  The equations are justified by certain preceding sentences, but this justification is not made explicit.
14) so let q'' = q'q We are establishing a new variable q'' in the context.   Now we put another constraint on it, namely q'' = q'q.  It is significant that a variable named q'' was introduced once before, in sentence 12's reference to the definition of divides.  A convention of mathematical discourse tells you to expect the author to establish that it fits the requirement of the definition. This expectation is triggered by using the same symbol q'' both here and in that sentence.
15) Then p = q''m This is an assertion about p, q'' and m, justified (but not explicitly) by the claim that p = q'n = q'qm.
16) The proof is now complete, although no statement asserts that.
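The construction the proof carries out can be checked mechanically. The sketch below (my own illustration, not part of the original analysis) mirrors the proof on sample values: given witnesses q and q' for m divides n and n divides p, the product q'' = q'q witnesses m divides p.

```python
# Mirror the proof's construction: find the witnesses q and q',
# then check that q'' = q' * q satisfies p = q'' * m.

def divides_witness(m, n):
    """Return q with n == q * m if m divides n, else None (m must be nonzero)."""
    q, r = divmod(n, m)
    return q if r == 0 else None

m, n, p = 3, 12, 60
q = divides_witness(m, n)    # n = q * m, so q = 4
q1 = divides_witness(n, p)   # p = q1 * n, so q1 = 5 (q1 plays the role of q')
q2 = q1 * q                  # the proof's q''
assert p == q2 * m           # 60 == 20 * 3: m divides p
```

Note that the witness here is unique, as the analysis of sentence 11 observes, because m and n are nonzero; the proof itself never uses that fact.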

I have several comments to make about this kind of analysis that are (mostly) not included in the abstractmath article.

a) This is supposed to be what goes through an experienced mathematician’s head while they are reading the proof.  Mostly subconsciously.  Linguists (as in Reference [1]) seem to think something like this takes place in your mind when you read any text, but it gets much denser in mathematical text.  Computer scientists analyze the operation of subprograms in this way, too.

b) Comment (a) is probably off the mark.  With a short proof like that, I get a global picture of the proof as my eyes dart back and forth over the various statements in the proof.  Now, I am a grasshopper: I read math stuff by jumping back and forth trying to understand the structure of the argument.  I do this both locally in a short proof and also globally when reading a long article or book:  I page through to find the topic I want and then jump back and forth finding the meanings of words and phrases I don’t understand.

c) I think most mathematicians are either grasshoppers or they are not good readers and they simply do not learn math by reading text.  I would like feedback on this.

d) If (a) is incorrect, should I omit this example from abstractmath?  I don’t think so.  My experience in teaching tells me that

  1. some students think this is perfectly obvious and wonder why I spent time constructing the example,
  2. others are not aware that this is going on in their head and they are amazed to realize that it is really happening,
  3. and still others do not understand how to read proofs and when you tell them this sort of thing goes on in your head they are terminally intimidated.  (“Terminally” in the sense that they dye their hair black and become sociology majors.  They really do.)  Is that bad?  Well, I don’t think so.  I would like to hear arguments on the other side.

e) Can you figure out why item 8 of the analysis is labeled as “8 )” instead of “8)”?

Time is running out. I have other comments to make which must wait for a later post.

References

G. Chierchia and S. McConnell-Ginet (1990), Meaning and Grammar. The MIT Press.


Introduction to Wikibook on categories

Below is my newly rewritten introduction to the Wikibook on categories.  I am posting it here because of course the Wikibook version is likely to change at any time.

== Introduction ==

This Wikibook is an introduction to category theory.  It is written for those who have some understanding of one or more branches of abstract mathematics, such as group theory, analysis or topology.  The book contains many examples drawn from various branches of math.  If you are not familiar with some of the kinds of math mentioned, don’t worry.  If practically all the examples are unfamiliar, this book may be too advanced for you.

===What is a category?===

A category is a mathematical structure, like a group or a vector space, abstractly defined by axioms.  Groups were defined in this way in order to study symmetries (of physical objects and equations, among other things).  Vector spaces are an abstraction of vector calculus.

What makes category theory different from the study of other structures is that in a sense the concept of category is an abstraction of a kind of mathematics. (This cannot be made into a precise mathematical definition!)  This makes category theory unusually self-referential and capable of treating many of the same questions that mathematical logic treats.  In particular, it provides a language that unifies many concepts in different parts of math.

In more detail, a category has objects and morphisms or arrows.  (It is best to think of the morphisms as arrows: the word “morphism” makes you think they are set maps, and they are not always set maps. The formal definition of category is given in the chapter on categories.)

  • The category of groups has groups as objects and homomorphisms as arrows.
  • The category of vector spaces has vector spaces as objects and linear maps as arrows.

The maps between categories that preserve structure are called functors.

  • The underlying set of a group determines a functor from the category of groups to the category of sets.
  • The fundamental group of a pointed space determines a functor from the category of pointed topological spaces to the category of groups.  The fact that it is a functor means that a continuous point-preserving map from a pointed space S to a pointed space T induces a group homomorphism from the fundamental group of S to the fundamental group of T.

Categories themselves form a category, with functors as arrows.  Just as fundamentally, the functors between two given categories form a category: its morphisms are called natural transformations. The fact that category theory has natural transformations is arguably the single feature that makes it so important.

===History===

Category theory was invented by Samuel Eilenberg and Saunders Mac Lane in the 1940s as a way of expressing certain constructions in algebraic topology.  It developed rapidly in the subsequent decades and has become an autonomous part of mathematics, studied for its own sake as well as being widely used as a unified language for expressing mathematical ideas that relate different fields.

For example, algebraic topology relates domains of interest in geometry to domains of interest in algebra. Algebraic geometry, on the other hand, goes in the opposite direction, associating, for example, with each commutative ring its spectrum of prime ideals.  These fields were among the earliest to be studied using tools of category theory.  Later applications came to abstract algebra, logic, computing science and physics, among others.

===Aspects of category theory===

Because the concept of a category is so general, it is to be expected that theorems provable for all categories will not usually be very deep. Consequently, many theorems of category theory are stated and proved for particular classes of categories.

  • Homological algebra is concerned with Abelian categories, which exhibit features suggested by the category of Abelian groups.
  • Logic is studied using topos theory: a topos is a category with certain properties in common with the category of sets but which allows the logic of the topos to be weaker than classical logic.  It is characteristic of the malleability of category theory that toposes were originally developed to study algebraic geometry.

An important purpose of categorical reasoning is to identify within a given argument the part which is trivial and separate it from the part which is deep and proper to the particular context. For example, in the theory of the GCD, the fact that the GCD is essentially unique simply follows from the uniqueness of the product in any category and is thus really trivial. On the other hand, the fact that the GCD of integers a and b can be expressed as a linear combination of a and b with integer coefficients (gcd(a, b) = ma + nb for some integers m and n) is a much deeper fact that is special to a much more restricted situation.
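The linear-combination fact is Bézout's identity, and the coefficients can be computed by the extended Euclidean algorithm. Here is a minimal sketch (my own illustration, not part of the Wikibook text):

```python
# Extended Euclidean algorithm: alongside the usual remainder sequence,
# track coefficients so that at every step old_r == m*a + n*b.

def extended_gcd(a, b):
    """Return (g, m, n) with g = gcd(a, b) and g == m*a + n*b."""
    old_r, r = a, b
    old_m, m = 1, 0
    old_n, n = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_m, m = m, old_m - q * m
        old_n, n = n, old_n - q * n
    return old_r, old_m, old_n

g, m, n = extended_gcd(12, 18)
assert g == m * 12 + n * 18    # Bézout: the GCD as a linear combination
```

The uniqueness half of the story needs no computation at all, which is exactly the trivial/deep split described above.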

===Note on terminology===

Most variations in terminology are discussed in the place where the terminology is defined.  Here it is important to point out one annoying terminological problem:  The adjective corresponding to “category” is “categorical”.  Since “categorical” in logic means having only one model up to isomorphism, this can cause cognitive dissonance; in any case, the use of “categorical” in this book has nothing to do with the idea of having only one model.

Some authors use “categorial” instead.  Unfortunately, this means something else in linguistics.  This book follows majority usage with “categorical”.


Mathematical concepts

This post was triggered by John Armstrong’s comment on my last post.

We need  to distinguish two ideas: representations of a mathematical concept and the total concept.  (I will say more about terminology later.)

Example: We can construct the quotient of the domain of a group homomorphism by its kernel by taking the cosets of the kernel and defining a multiplication on them.  We can construct the image of the homomorphism by taking the set of values of the homomorphism and using the multiplication induced by the codomain group.   The quotient group and the image are the same mathematical structure in the sense that anything useful you can say about one is true of the other.   For example, it may be useful to know the cardinality of the quotient (image) but it is not useful to know what its elements are.
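For concreteness, here is a small check of the correspondence (my own illustration, using the hypothetical homomorphism f(x) = x mod 2 from Z_6 to Z_2 under addition):

```python
# For f: Z_6 -> Z_2, f(x) = x % 2, the cosets of the kernel correspond
# exactly to the values of f: quotient group and image match up.

f = lambda x: x % 2
domain = range(6)                             # Z_6 under addition mod 6
kernel = {x for x in domain if f(x) == 0}     # {0, 2, 4}
cosets = {frozenset((x + k) % 6 for k in kernel) for x in domain}
image = {f(x) for x in domain}                # {0, 1}

# f is constant on each coset, giving the natural bijection coset -> value.
correspondence = {c: f(next(iter(c))) for c in cosets}
assert len(cosets) == len(image)
assert set(correspondence.values()) == image
```

The code never needs to ask whether a coset "really is" an element of the codomain; the bijection is all that matters, which is the point of the paragraph above.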

But hold on, as the Australians say, if we knew that the codomain was an Abelian group then we would know that the quotient group was abelian because the elements of the image form a subgroup of the codomain. (But the Australians I know wouldn’t say that.)

Now that kind of thinking is based on the idea that the elements of the image are “really” elements of the codomain whereas elements of the quotient are “really” subsets of the domain.  That is outmoded thinking.  The image and the quotient are the same in all important respects because they are naturally isomorphic.   We should think of the quotient as just as much a subgroup of the codomain as the image is.  John Baez (I think) would say that to ask whether the subgroup embedding is the identity on elements or not is an evil question.

Let’s step back and look at what is going on here.  The definition of the quotient group is a construction using cosets.  The definition of the image is a construction using values of the homomorphism.  Those are two different specific  representations of the same concept.

But what is the concept, as distinct from its representations?  Intuitively, it is

  • All the constructions made possible by the definition of the concept.
  • All the statements that are true about the concept.

(That is not precise.)

The total concept is like the clone plus the equational theory of a specific type of algebra in the sense of universal algebra.  The clone is all the operations you can construct knowing the given signature and equations and the equational theory is the set of all equations that follow from them.  That is one way of describing it.  Another is the monad in Set that gives the type of algebra — the operations are the arrows and the equations are the commutative diagrams.

Note: The preceding description of the monad is not quite right.  Also the whole discussion omits mention of the fact that we are in the world (doctrine) of universal algebra.  In the world of first order logic, for example, we need to refer to the classifying topos of the category of algebras of that type (or to its first order theory).

Terminology

We need better terminology for all this.  I am not going to propose better terminology, so this is a shaggy dog story.

Math ed people talk about a particular concept image of a concept as well as the total schema of the concept.

In categorical logic, we talk about the sketch or presentation of the concept vs. the theory. The theory is a category (of the kind appropriate to the doctrine) that contains all the possible constructions and commutative diagrams that follow from the presentation.

In this post I have used “total concept” to refer to the schema or theory.  I have referred to the particular things as “representations” (for example, constructing the image of a homomorphism by cosets or by values of the homomorphism).

“Representation” does not have the same connotations as “presentation”.  Indeed a presentation of a group and a representation of a group are mathematically  two different things.  But I suspect they are two different aspects of the same idea.

All this needs to be untangled.  Maybe we should come up with two completely arbitrary words, like “dostak” and “dosh”.


Different names for the same thing

I recommend reading the discussion (to which I contributed) of the post “Why aren’t all functions well-defined?” on Gowers’s Weblog.   It resulted in an insight I should have had a long time ago.

I have been preaching the importance of different ways of thinking about a math object (different images, metaphors, mental representations — there are too many names for this in the math ed literature).   Well, mathematicians at least occasionally use different names for a type of math object to indicate how they are thinking about it.

Examples

We talk about a relation and we talk about multivalued functions. Those are two different ways of talking about the same thing (they are the same by an adjunction).   A relation is a predicate.  A multivalued function is a function except that it can have more than one output for a given input.  But they are the same thing.
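The sameness is easy to exhibit concretely. This sketch (my own illustration) converts between a relation stored as a set of pairs and the corresponding multivalued function stored as a map from inputs to sets of outputs:

```python
# The same data two ways: a relation as a set of pairs, and the
# corresponding multivalued function as a map to sets of outputs.
# Converting back and forth loses nothing.

relation = {(1, 'a'), (1, 'b'), (2, 'a')}

def as_multifunction(rel):
    mf = {}
    for x, y in rel:
        mf.setdefault(x, set()).add(y)
    return mf

def as_relation(mf):
    return {(x, y) for x, ys in mf.items() for y in ys}

mf = as_multifunction(relation)      # {1: {'a', 'b'}, 2: {'a'}}
assert as_relation(mf) == relation   # the round trip recovers the relation
```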

We talk about an equivalence relation and we talk about a partition of a set (or a quotient set).  The category of equivalence relations and the category of partitions of sets are naturally isomorphic, not merely equivalent.  But one is a special kind of relation and the other is a grouping.
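The same back-and-forth works here. A small sketch (my own illustration; it assumes the predicate passed in really is an equivalence relation):

```python
# Converting between an equivalence relation and the partition it induces.

def partition_from_relation(elements, related):
    """Group elements into equivalence classes, given an equivalence
    predicate `related` (assumed reflexive, symmetric, transitive)."""
    classes = []
    for x in elements:
        for c in classes:
            if related(x, next(iter(c))):
                c.add(x)
                break
        else:
            classes.append({x})
    return classes

def relation_from_partition(classes):
    """The induced equivalence relation: x ~ y iff they share a block."""
    return lambda x, y: any(x in c and y in c for c in classes)

classes = partition_from_relation(range(6), lambda x, y: x % 3 == y % 3)
related = relation_from_partition(classes)
assert related(1, 4) and not related(1, 2)
```

One direction manipulates a predicate, the other a set of sets, yet each determines the other completely, which is the natural isomorphism mentioned above.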

Let’s be open about what we do

We should be explicit about the way we think about and do math.  We have several different ways to think about any interesting type of math object and we should push this practice to students as being absolutely vital.  In particular we (some of us) use different names sometimes for the same object and we refuse to give them up, muttering about “reductionism” and “nothing buttery”.

Some students arrive in class already as (pedantic?)(geeky?) as many mathematicians (I am a recovering pedant myself).  We need to be up front about this phenomenon and explain the value of thinking and talking about the same thing in different ways, even using different words.

It used to be different but now it’s the same

A kind of opposite phenomenon occurs with some students and mathematicians of a certain personality type.  Consider the name “multivalued function”.  Of course a multivalued function is not (necessarily) a function.  Your mother-in-law  is not your mother, either.  I go on about this (using ideas from Lakoff) in the Handbook under “radial concept”.   Pedantic types can’t stand this kind of usage.  “A multivalued function can’t be a function”.  “Equivalence relations and partitions are not the same thing because one is a relation and the other is a set of sets.”  “The image of a homomorphism and the quotient by its kernel are not the same thing because…”

This attitude makes me tired.  Put your hands on the tv screen and think like a category theorist.
