Category Archives: math

Syntax Trees in Mathematicians’ Brains

Understanding the quadratic formula

In my last post I wrote about how a student’s pattern recognition mechanism can go awry in applying the quadratic formula.

The template for the quadratic formula says that the solution of a quadratic equation of the form ${ax^2+bx+c=0}$ is given by the formula

$\displaystyle x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}$

When you ask students to solve ${a+bx+cx^2=0}$ some may write

$\displaystyle x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}$

instead of

$\displaystyle x=\frac{-b\pm\sqrt{b^2-4ac}}{2c}$

That’s because they have memorized the template in terms of the letters ${a}$, ${b}$ and ${c}$ instead of in terms of their structural meaning — $ {a}$ is the coefficient of the quadratic term, ${c}$ is the constant term, etc.

The problem occurs because there is a clash between the occurrences of the letters “a”, “b”, and “c” in the template and in the equation to solve. But maybe the confusion would occur anyway, just because of the ordering of the coefficients. As I asked in the previous post, what happens if students are asked to solve $ {3+5x+2x^2=0}$ after having learned the quadratic formula in terms of ${ax^2+bx+c=0}$? Some may make the same kind of mistake, getting ${x=-1}$ and ${x=-\frac{2}{3}}$ instead of $ {x=-1}$ and $ {x=-\frac{3}{2}}$. Has anyone ever investigated this sort of thing?

People do pattern recognition remarkably well, but how they do it is mysterious. Just as mistakes in speech may give the linguist a clue as to how the brain processes language, students’ mistakes may tell us something about how pattern recognition works in parsing symbolic statements as well as perhaps suggesting ways to teach them the correct understanding of the quadratic formula.

Syntactic Structure

“Structural meaning” refers to the syntactic structure of a mathematical expression such as ${3+5x+2x^2}$. It can be represented as a tree:

(1)

This is more or less the way a program compiler or interpreter for some language would represent the polynomial. I believe it corresponds pretty well to the organization of the quadratic-polynomial parser in a mathematician’s brain. This is not surprising: The compiler writer would have to have in mind the correct understanding of how polynomials are evaluated in order to write a correct compiler.

Linguists represent English sentences with syntax trees, too. This is a deep and complicated subject, but the kind of tree they would use to represent a sentence such as “My cousin saw a large ship” would look like this:

Parsing by mathematicians

Presumably a mathematician has constructed a parser that builds a structure in their brain corresponding to a quadratic polynomial using the same mechanisms that as a child they learned to parse sentences in their native language. The mathematician learned this mostly unconsciously, just as a child learns a language. In any case it shouldn’t be surprising that the mathematicians’s syntax tree for the polynomial is similar to the compiler’s.

Students who are not yet skilled in algebra have presumably constructed incorrect syntax trees, just as young children do for their native language.

Lots of theoretical work has been done on human parsing of natural language. Parsing mathematical symbolism to be compiled into a computer program is well understood. You can get a start on both of these by reading the Wikipedia articles on parsing and on syntax trees.

There are papers on students’ misunderstandings of mathematical notation. Two articles I recently turned up in a Google search are:

Both of these papers talk specifically about the syntax of mathematical expressions. I know I have read other such papers in the past, as well.

What I have not found is any study of how the trained mathematician parses mathematical expression.

For one thing, for my parsing of the expression $ {3+5x+2x^2}$, the branching is wrong in (1). I think of ${3+5x+2x^2}$ as “Take 3 and add $ {5x}$ to it and then add ${2x^2}$ to that”, which would require the shape of the tree to be like this:

I am saying this from introspection, which is dangerous!

Of course, a compiler may group it that way, too, although my dim recollection of the little bit I understand about compilers is that they tend to group it as in (1) because they read the expression from left to right.

This difference in compiling is well-understood.  Another difference is that the expression could be compiled using addition as an operator on a list, in this case a list of length 3.  I don’t visualize quadratics that way but I certainly understand that it is equivalent to the tree in Diagram (1).  Maybe some mathematicians do think that way.

But these observations indicate what might be learned about mathematicians’ understanding of mathematical expressions if linguists and mathematicians got together to study human parsing of expressions by trained mathematicians.

Some educational constructivists argue against the idea that there is only one correct way to understand a mathematical expression.  To have many metaphors for thinking about math is great, but I believe we want uniformity of understanding of the symbolism, at least in the narrow sense of parsing, so that we can communicate dependably.  It would be really neat if we discovered deep differences in parsing among mathematicians.  It would also be neat if we discovered that mathematicians parsed in generally the same way!


Send to Kindle

Templates in mathematical practice

This post is a first pass at what will eventually be a section of abstractmath.org. It’s time to get back to abstractmath; I have been neglecting it for a couple of years.

What I say here is based mainly on my many years of teaching discrete mathematics at Case Western Reserve University in Cleveland and more recently at Metro State University in Saint Paul.

Beginning abstract math

College students typically get into abstract math at the beginning in such courses as linear algebra, discrete math and abstract algebra. Certain problems that come up in those early courses can be grouped together under the notion of (what I call) applying templates [note 0]. These are not the problems people usually think about concerning beginners in abstract math, of which the following is an incomplete list:

The students’ problems discussed here concern understanding what a template is and how to apply it.

Templates can be formulas, rules of inference, or mini-programs. I’ll talk about three examples here.

The template for quadratic equations

The solution of a real quadratic equation of the form {ax^2+bx+c=0} is given by the formula

\displaystyle  x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}

This is a template for finding the roots of the equations. It has subtleties.

For example, the numerator is symmetric in {a} and {c} but the denominator isn’t. So sometimes I try to trick my students (warning them ahead of time that that’s what I’m trying to do) by asking for a formula for the solution of the equation {a+bx+cx^2=0}. The answer is

\displaystyle x=\frac{-b\pm\sqrt{b^2-4ac}}{2c}

I start writing it on the board, asking them to tell me what comes next. When we get to the denominator, often someone says “{2a}”.

The template is telling you that the denominator is 2 times the coefficient of the square term. It is not telling you it is “{a}”. Using a template (in the sense I mean here) requires pattern matching, but in this particular example, the quadratic template has a shallow incorrect matching and a deeper correct matching. In detail, the shallow matching says “match the letters” and the deep matching says “match the position of the letters”.

Most of the time the quadratic being matched has particular numbers instead of the same letters that the template has, so the trap I just described seldom occurs. But this makes me want to try a variation of the trick: Find the solution of {3+5x+2x^2=0}. Would some students match the textual position (getting {a=3}) instead of the functional position (getting {a=5})? [Note [0]). If they did they would get the solutions {(-1,-\frac{2}{3})} instead of {(-1,-\frac{3}{2})}.

Substituting in algebraic expressions have other traps, too. What sorts of mistakes would students have solving {3x^2+b^2x-5=0}?

Most students on the verge of abstract math don’t make mistakes with the quadratic formula that I have described. The thing about abstract math is that it uses more sophisticated templates

  • subject to conditions
  • with variations
  • with extra levels of abstraction

The template for proof by induction

This template gives a method of proof of a statement of the form {\forall{n}\mathcal{P}(n)}, where {\mathcal{P}} is a predicate (presumably containing {n} as a variable) and {n} varies over positive integers. The template says:

Goal: Prove {\forall{n}\mathcal{P}(n)}.

Method:

  • Prove {\mathcal{P}(1)}
  • For an arbitrary integer {n>1}, assume {\mathcal{P}(n)} and deduce {\mathcal{P}(n+1)}.

For example, to prove {\forall n (2^n+1\geq n^2)} using the template, you have to prove that {2^2+1\geq  1^1}, and that for any {n>1}, if {2^n+1\geq n^2}, then {2^{n+1}+1\geq  (n+1)^2}. You come up with the need to prove these statements by substituting into the template. This template has several problems that the quadratic formula does not have.

Variables of different types

The variable {n} is of type integer and the variable {\mathcal{P}} is of type predicate [note 0]. Having to deal with several types of variables comes up already in multivariable calculus (vectors vs. numbers, cross product vs. numerical product, etc) and they multiply like rabbits in beginning abstract math classes. Students sometimes write things like “Let {\mathcal{P}=n+1}”. Multiple types is a big problem that math ed people don’t seem to discuss much (correct me if I am wrong).

Free and bound

The variable {n} occurs as a bound variable in the Goal and a free variable in the Method. This happens in this case because the induction step in the Method originates as the requirement to prove {\forall  n(\mathcal{P}(n)\rightarrow\mathcal{P}(n+1))}, but as I have presented it (which seems to be customary) I have translated this into a requirement based on modus ponens. This causes students problems, if they notice it. (“You are assuming what you want to prove!”) Many of them apparently go ahead and produce competent proofs without noticing the dual role of {n}. I say more power to them. I think.

The template has variations

  • You can start the induction at other places.
  • You may have to have two starting points and a double induction hypothesis (for {n-1} and {n}). In fact, you will have to have two starting points, because it seems to be a Fundamental Law of Discrete Math Teaching that you have to talk about the Fibonacci function ad nauseam.
  • Then there is strong induction.

It’s like you can go to the store and buy one template for quadratic equations, but you have to by a package of templates for induction, like highway engineers used to buy packages of plastic French curves to draw highway curves without discontinuous curvature.

The template for row reduction

I am running out of time and won’t go into as much detail on this one. Row reduction is an algorithm. If you write it up as a proper computer program there have to be all sorts of if-thens depending on what you are doing it for. For example if want solutions to the simultaneous equations

2x+4y+z = 1
x+2y = 0
x+2y+4z = 5

you must row reduce the matrix

2 4 1 1
1 2 0 0
1 2 4 5

(I haven’t yet figured out how to wrap this in parentheses) which gives you

1 2 0 0
0 0 1 0
0 0 0 1

This introduces another problem with templates: They come with conditions. In this case the condition is “a row of three 0s followed by a nonzero number means the equations have no solutions”. (There is another condition when there is a row of all 0’s.)

It is very easy for the new student to get the calculation right but to never sit back and see what they have — which conditions apply or whatever.

When you do math you have to repeatedly lean in and focus on the details and then lean back and see the Big Picture. This is something that has to be learned.

What to do, what to do

I have recently experimented with being explicit about templates, in particular going through examples of the use of a template after explicitly stating the template. It is too early to say how successful this is. But I want to point out that even though it might not help to be explicit with students about templates, the analysis in this post of a phenomenon that occurs in beginning abstract math courses

  • may still be accurate (or not), and
  • may help teachers teach such things if they are aware of the phenomenon, even if the students are not.

Notes

  1. Many years ago, I heard someone use the word “template” in the way I am using it now, but I don’t recollect who it was. Applied mathematicians sometimes use it with a meaning similar to mine to refer to soft algorithms–recipes for computation that are not formal algorithms but close enough to be easily translated into a sufficiently high level computer language.
  2. In the formula {ax^2+bx+c}, the “{a}” has the first textual position but the functional position as the coefficient of the quadratic term. This name “functional position” has nothing to do with functions. Can someone suggest a different name that won’t confuse people?
  3. I am using “variable” the way logicians do. Mathematicians would not normally refer to “{\mathcal{P}}” as a variable.
  4. I didn’t say anything about how templates can involve extra layers of abstract.  That will have to wait.
Send to Kindle

Thinking about mathematical objects revisited

How we think about X

It is notable that many questions posted at MathOverflow are like, “How should I think about X?”, where X can be any type of mathematical object (quotient group, scheme, fibration, cohomology and so on).  Some crotchety contributors to that group want the questions to be specific and well-defined, but “how do I think about…” questions  are in my opinion among the most interesting questions on the website.  (See note [a]).

Don’t confuse “How do I think about X” with “What is X really?” (pace Reuben Hersh).  The latter is a philosophical question.  As far as I am concerned, thinking about how to think about X is very important and needs lots of research by mathematicians, educators, and philosophers — for practical reasons: how you think about it helps you do it.   What it really is is no help and anyway no answer may exist.

Inert and eternal

The idea that mathematical objects should be thought of as  “inert” and “eternal”  has been around for awhile.  (Never mind whether they really are inert and eternal.)  I believe, and have said in the past [1], that thinking about them that way clears up a lot of confusion in newbies concerning logical inference.

  • That mathematical objects are “inert” means that the do not cause anything. They have no effect on the real world or on each other.
  • That they are “eternal” means they don’t change over time.

Naturally, a function (a mathematical object) can model change over time, and it can model causation, too, in that it can describe a process that starts in one state and achieves stasis in another state (that is just one way of relation functions to causation).  But when we want to prove something about a type of math object, our metaphorical understanding of them has to lose all its life and color and go dead, like the dry bones before Ezekiel started nagging them.

It’s only mathematical reasoning if it is about dead things

The effect on logical inference can be seen in the fact that “and” is a commutative logical operator. 

  • “x > 1 and x < 3″ means exactly the same thing as “x < 3 and x > 1″
  • “He picked up his umbrella and went outside” does not mean the same thing as “He went outside and picked up his umbrella”.

The most profound effect concerns logical implication.  “If  x > 1 then x > 0″ says nothing to suggest that x > 1 causes it to be the case that x > 0.  It is purely a statement about the inert truth sets of two predicates lying around the mathematical boneyard of objects:  The second set includes the first one.  This makes vacuous implication perfectly obvious.  (The number -1 lies in neither truth set and is irrelevant to the fact of inclusion).

Inert and eternal rethought

There are better metaphors than these.  The point about the number 3 is that you think about it as outside time. In the world where you think about 3 or any other mathematical object, all questions about time are meaningless.

  • In the sentence “3 is a prime”, we need a new tense in English like the tenses ancient (very ancient) Greek and Hebrew were supposed to have (perfect with gnomic meaning), where a fact is asserted without reference to time.
  • Since causation involves this happens, then this happens, all questions about causation are meaningless, too.  It is not true that 3 causes 6 to be composite, while being irrelevant to the fact that 35 is composite.

This single metaphor “outside time” thus can replace the two metaphors “inert” and “eternal” and (I think) shows that the latter two are really two aspects of the same thing.

Caveat

Thinking of math objects as outside time is a Good Thing when you are being rigorous, for example doing a proof.  The colorful, changing, full-of-life way of thinking of math that occurs when you say things like the statements below is vitally necessary for inspiring proofs and for understanding how to apply the mathematics.

  • The harmonic series goes to infinity in a very leisurely fashion.
  • A function is a machine — when you dump in a number it grinds away and spits out another number.
  • At zero, this function vanishes.

Acknowledgment

Thanks to Jody Azzouni for the italics (see [3]).

Notes

a.  Another interesting type of question  “in what setting does such and such a question (or proof) make sense?” .  An example is my question in [2].

References

1.  Proofs without dry bones

2. Where does the generic triangle live?

3. The revolution in technical exposition II.

Send to Kindle

Logarithms mod an integer

Recently on MathOverflow the following question was asked: “9=2^x\ \text{mod}\ 11.   What is x and how do you find x?”  This question was soon closed as being inappropriate for MO.  In fact it is a very interesting question.

The general question is:  Solve a^x=b\ \text{mod}\ m (a, b, x, m all integers).  This is the discrete logarithm problem.  All known general algorithms for solving it run in exponential time.  It is used in cryptography, for example in the RSA algorithm and the Diffie-Hellman algorithm.

For example, the RSA algorithm is implement in this way:  Find two very large primes p and q.  This can be done in polynomial time.  Let m=pq.  Find integers d and e relatively prime to (p-1)(q-1) such that de = 1 \ \text{mod}\ m; this can also be done efficiently.  Note that for any integer a, a^{de}=a\ \text{mod}\ m.  You make m and e public but keep d, p and q private.  Then someone else can encode a message as an integer a and transmit a^e\ \text{mod}\ m (which can be calculated pretty fast)  to you.  Since you know the logarithm d you can decode it by calculating a^{de}=a\ \text{mod}\ m.   (What if the message is too long?  Details, details…)

The fact that you can find large primes fast but you can’t find discrete logarithms fast means that this method is safe.  The fact that no one can prove you can’t find discrete logarithms fast means cryptographers worry about this a lot.

 

Send to Kindle

Models of forms explicated, a little bit

Some miscellaneous notes about the concept of form, which I sketched (ahem) in  a series of posts TDMO1TDMO2TDMO3TDMO4,TDMO5TDMO6, TDMO7, TDMO8, and TDMO9. This series builds up to an explanation of the concept of form in the paper Graph-Based Logic and Sketches by Atish Bagchi and me.  I am now embarking on a series of posts with further explanations and comments.

More about a model of a form

For any constructor space \text{C} (which will be the sketch for a kind of category, a special case of a doctrine), take any object F in the FL Cattheory of \text{C}   and adjoin a new arrow f:1\to F where 1 is the terminal object.  f is what we call a C-form and the enhanced category is denoted by \text{C}_f.  In TDMO9 I described the node \text{rfs} for reflexive function spaces in a cartesian closed category; it is an example of a CCC-form.

A model of a \text{C}-form f  “in a C-category \text{K}” means that \text{K} is a model of \text{C}_f . In particular, in  \text{K}, M(F) is nonempty.

The connection with sketches is this:  If you have a sketch in some doctrine \text{C}, the sketch consists of a graph with some diagrams, cones and cocones.  There is a node F in the FL Cattheory of \text{C} each of whose elements in a model of \text{C} (in other words in a \text{C}-category) will be such a sketch.

Example: The FP sketch for magmas

A magma is a set with a binary operation defined on it (Note 1).  It does not have to be associative or commutative or anything.  In the FP doctrine its sketch consists of one diagram

(Diagram1)

and one cone

(Cone1)

and nothing else.  The FP-Cattheory for this sketch is (equivalent to) the Lawvere theory of magmas.

The FL-Cattheory \text{FP} for FP categories, described in some detail in TDMO8, contains a node whose inhabitants in any model of \text{FP} (in other words in any FP-category) are all such sketches (the diagram and the cone).   This means that the FP sketch for magmas corresponds to an FP-form.  In this way you can see that all sketches in Ehresmann’s sense are forms in my sense.

This node can be constructed as the limit F of a cone over a diagram in \text{FP} as was done in previous posts.  You have to make the diagram become a description of the diagram and cone above, using the arrows in the constructor space \text{FP}, for example \text{ob}, \text{ar}, \text{prod}:\text{ob}\times\text{ob}\rightarrow\text{cone} and others, and including formally commutative diagrams that say for example that Cone1’s projections go to the same object (using \text{lproj} and \text{rproj}).   Maybe someday I will produce this diagram in a post but right now I have a cold.  (Excuses, excuses…)

Adjoining a global element to this limit node will result in an FL-sketch \text{FP}_fwhich contains the FL-Cattheory for \text{FP} along with that global element.

So a model of the form for magmas in an FP category \cal{B} is a model of \text{FP}_f for which the model of the underlying cattheory \text{FP} is \cal{B}; in other words it is the category \cal{B} with a distinguished element f of the node F.  That distinguished element is a particular diagram and cone like the ones shown above for a particular object A (because the projections onto F include a particular projection to \text{ob}).  That object A with the arrows corresponding to m, p_1 and p_2 is a particular magma, a model of the sketch for magmas given above.

Notes

Note 1  “Magma” was the term used by Bourbaki for this structure.  As far as I know, very few people ever used the word until it was published in [1].  When I was a grad student in 1962-65 it was called a “groupoid”, which means something else now (something much more important than a magma in my opinion).  Now the name occurs in examples all over Wikipedia.

References

[1] M. Hazewinkel (2001), “Magma“, in Hazewinkel, Michiel, Encyclopaedia of Mathematics, Kluwer Academic Publishers, ISBN 978-1556080104

Send to Kindle

Where does the generic triangle live?

I recently posted the following question on MathOverflow.  It is a revision of an earlier question.

Motivation 1

There is such a thing as a generic group. In category theory this is done by constructing “theory” of the group, which is a category in a certain doctrine. Functors (in that doctrine) to Set, or more generally to any topos, are groups. The barest such theory (as usually seen) is the Lawverean algebraic theory of groups. This theory is a category containing an object and operations making it a group object in that category, and the theory is the smallest such category that contains all finite limits. There are fancier ones; the fanciest is the classifying topos for groups, which is in some sense the initial topos-with-group object. Since in a topos, you have full-scale first order intuitionistic logic, the classifying topos for groups allows you to reason about the generic group inside the classifying topos and the theorems you prove will be true for all groups. (This is only an approximation of the actual situation.) In particular you can’t prove it is abelian and you can’t prove it isn’t; the logic clearly does not have excluded middle.

Motivation 2

You can prove that a triangle that has two angles that are equal must be isosceles (has two sides that are equal). You can do this with Pappus’ proof: Look at the triangle, flip it over the perpendicular from the odd angle to the other side, look at it again, and the side-angle-side theorem shows you that the “two” triangles are congruent, so two sides much be equal. This appears to me to be true without requiring the parallel postulate. So the theorem and the proof must be true not only in Euclidean 2-space but in any surface of constant curvature. (Here I am getting into territory I know very little about, so this particular motivation may be totally misguided.)

The Question

So what I want is a classifying space of some sort that contains the generic triangle in such a way that maps of the correct sort to any surface of constant curvature are triangles, and so that Pappus’ proof can be carried out in the classifying space. The space doesn’t have to be a topos or a category at all. I have no clue as to what sort of structure it would be.

Note 1: Even the Lawvere theory of groups has its own internal logic — in this case equational logic. You certainly cannot prove the generic group is or is not abelian with equational logic.

Note 2: It does not seem reasonable to me that Pappus’ proof would work in a surface with variable curvature. But maybe there is some trick to define “angle mod curvature” that would make it true.

Send to Kindle

Just-in-time foundations

Introduction

In MathOverflow, statements similar to the following two occurred in comments:

  1. Sets and functions do not form a category
  2. Categories and functors do not form a category.

I cannot find either one of them now, but I want to talk about them anyway.

If you look at the definition of categories in various works (for example references [1] through [3] below) you find that the objects and arrows of a category must each form a “collection” or “class” together with certain operations.   The authors all describe the connection with Grothendieck’s concept of “universe” and define “large categories” and “small categories” in the usual way.  So Statement 1 above is simply wrong.

Statement 2 is more problematic.  The trouble is that if the word “categories” includes large categories then the objects do not form a set even in the second universe.  You have to go to the third universe.

Now there is a way to define categories where this issue does not come up.  It allows us to think about categories without having a particular system such as ZF and universes in mind.

A syntactic definition of category

A category consists of objects and arrows, together with four methods of construction M1 – M4 satisfying laws L1 -L7.  I treat “object” and “arrow” as predicates:  object[f] means f is an object and arrow[a] means a is an arrow.  “=” means equals in the mathematical sense.

M1 Source If arrow[f], object[f.source].
M2 Target If arrow [f], object[f.target].
M3 Identity If object[a],  arrow[a.identity].
M4 Comp If arrow[g] and arrow[f] and  f.target = g.source, then arrow[(g,f).comp].
L1. If object[a],  a.identity.source = a.
L2. If object[a], a.identity.target = a.
L3. If arrow[g] and arrow[f] and  f.target = g.source, then (g,f).comp.source = f.source.
L4. If arrow[g] and arrow[f] and  f.target = g.source, then (g,f).comp.target = g.target.
L5. If object[a] and arrow[f] and f.source = a, then (f, a.identity) = f.
L6.  If object[a] and arrow[g] and g.target = a, then (a.identity, g) = g.
L7.  If arrow[h] and arrow[g] and arrow[f] and h.source= g.target and g.source = f.target, then (h,(g,f).comp = ((h,g).comp, f.comp).
Remarks on this definition
1. I have deliberately made this definition look like a specification in an object oriented program (see [6]), although the syntax is not the same as any particular oo language.  It is as rigorous a mathematical definition as you could want, and it could presumably be compiled in some oo language, except that I don’t know if oo languages allow the conditional definition of a method as given in M4.
2.  I could have given the definition in mathematical English, for example “If f is an arrow then the source of f is an object”.  My point in providing the impenetrable definition above is to make a connection (admittedly incompletely established) with a part of math (the theory of oo languages) that is definitely rigorous but is not logic.  An informal definition in math English of course could also be transformed rigorously into first order logic.
3.  This definition is exactly equivalent to the FL sketch for categories given in my post [5].  That sketch has models in many categories, not just Set, as well as its generic model living in the corresponding FL-cattheory (or in the classifying topos it generates).
4.  Saunders Mac Lane defined metacategory in precisely this way in [1].  That was of course before anyone every heard of oo languages.  I think he should have made that the definition of category.

Just-in-time foundations

Mathematicians work inside the categories Set (sets and functions) and Cat (categories and functors) all the time, including functors to or from Cat or Set. When they consider a category, the use theorems that follow from the definition above.  They do not have to have foundations in mind.

Once in awhile, they are frustrated because they cannot talk about the set of objects of some category.  For example, Freyd’s solution set condition is required to prove the existence of a left adjoint because of that problem.  The ss condition is a work-around for a familiar obstruction to an easy way to prove something.  I can imagine coming up with such a work-around without ever giving a passing thought to foundations, in particular without thinking of universes.

When you work with a mathematical object, the syntax of the definitions and theorems give you all you need to justify the claim that something is a theorem.  You absolutely need models of the theory to think up and understand proofs, but the models could be sets or classes with structure, or functors (as in sketch theory), or you may work with generic models which may require you to use intuitionistic reasoning.  You don’t have to have any particular kind of model in mind when you work in Set or Cat.

When you do run into something like the impossibility of forming the set of objects of some category (which happens in any model theory environment that uses classical rather than intuitionistic reasonins) then you may want to consider an approach through some theory of foundations.  That is what most mathematicians do: they use just-in-time foundations. For example, in a particular application you may be happy to work in a topos with a set-of-all-objects, particularly if you are a certain type of computer scientists who lives in Pittsburgh.  You may be happy to explicitly consider universes, although I am not aware of any category-theoretical results that do explicitly mention universes.

But my point is that most mathematicians think about foundations only when they need to, and most mathematicians never need to think about foundations in their work. Moral: Don’t think in terms of foundations unless you have to.

This point of view is related to the recent discussions of pragmatic foundations [7] [8].

Side remark

The situation that you can’t always construct a set of somethings is analogous to the problem that you have in working with real numbers:  You can’t name most real numbers. This may get in the way of some analyst wanting to do something, I don’t know.  But in any branch of math, there are obstructions to things you want to do that really do get in your way.  For example, in beginning linear algebra, it may have occurred to you, to your annoyance, that if you have the basis of a subspace you can extend it to the basis for the whole space, but if you have a basis of the whole space, and a subspace, the basis may not contain a basis of the subspace.

References and links

  1. Saunders Mac Lane, Categories for the working mathematician. Springer-Verlag, 1971.
  2. Wikipedia article on category theory
  3. Michael Barr and Charles Wells, Category Theory for Computing Science, Third Edition (1999). Les Publications CRM, Montreal (publication PM023).
  4. Discussion of functions in abstractmath.org.
  5. Definitions into Mathematical Objects 7.
  6. Object oriented programming in Wikipedia.
  7. M. Gelfand, We Do Not Choose Mathematics as Our Profession, It Chooses Us: Interview with Yuri Manin.
  8. Discussion in n-category cafe.
Send to Kindle

Three kinds of mathematical thinkers

This is a continuation of my post Syntactic and semantic thinkers, in which I mentioned Leone Burton’s book [1] but hadn’t read it yet.  Well, now it is due back at the library so I’d better post about it!

I recommend this book for anyone interested in knowing more about how mathematicians think about and learn math.  The book is based on in-depth interviews with seventy mathematicians.  (One in-depth interview is worth a thousand statistical studies.)   On page 53, she writes

At the outset of this study, I had two conjectures with respect to thinking style.  The first was that I would find the two different thinking styles,the visual and the analytic, well recorded in the literature… The second was that research mathematicians would move flexibly between the two.  Neither of these conjectures were confirmed.

What she discovered was three styles of mathematical thinking:

Style A: Visual (or thinking in pictures, often dynamic)

Style B: Analytic (or thinking symbolically, formalistically)

Style C: Conceptual (thinking in ideas, classifying)

Style B corresponds more or less with what was called “syntactic” in [3] (based on [2]).  Styles A and C are rather like the distinctions I made in [3] that I called “conceptual” and “visual”, although I really want Style A to communicate not only “visual” but “geometric”.

I recommend jumping through the book reading the quotes from the interviews.  You get a good picture of the three styles that way.

Visual vs. conceptual

I had thought about this distinction before and have had a hard time explaining what “conceptual” means, particularly since for me it has a visual component.  I mentioned this in [3].  I think about various structures and their relationship by imagining them as each in a different part of a visual field, with the connections as near as I can tell felt rather than seen.  I do not usually think in terms of the structures’ names (see [4]).  It is the position that helps me know what I am thinking about.

When it comes time to write up the work I am doing, I have to come up with names for things and find words to describe the relationships that I was feeling. (See remark (5) below).  Sometimes I have also written things down and come up with names, and if this happened very much I invariable get a clash of notation that didn’t bother me when I was thinking about the concepts because the notations referred to things in different places.

I would be curious if others do math this way.  Especially people better than I am.  (Clue to a reasonable research career:  Hang around people smarter than you.)

Remarks

1) I have written a lot about images and metaphors [5], [6].  They show up in the way I think about things sometimes.  For example, when I am chasing a diagram I am thinking of each successive arrow as doing something.  But I don’t have any sense that I depend a lot on metaphors.  What I depend on is my experience with thinking about the concept!

2) Some of the questions on Math Overflow are of the “how do I think about…” type (or “what is the motivation for…”).  Some of the answers have been Absolutely Entrancing.

3) Some of the respondents in [1] mentioned intuition, most of them saying that they thought of it as an important part of doing math.  I don’t think the book mentioned any correlation between these feelings and the Styles A, B, C, but then I didn’t read the book carefully.  I never read any book carefully. (My experience with Style B of the subtype Logic Rules diss intuition. But not analysts of the sort who estimate errors and so on.)

4) Concerning A, B, C:  I use Style C (conceptual) thinking mostly, but a good bit of Style (B) (analytic) as well.  I think geometrically when I do geometry problems, but my research has never tended in that direction.  Often the analytic part comes after most of the work has been done, when I have to turn the work into a genuine dry-bones proof.

5) As an example of how I have sometimes worked, I remember doing a paper about lifting group automorphisms (see [7]), in which I had a conceptual picture with a conceptual understanding of the calculations of doing one transformation after another which produced an exact sequence in cohomology.  When I wrote it up I thought it would be short.  But all the verifications made the paper much longer.  The paper was conceptually BigChunk BigChunk BigChunk BigChunk … but each BigChunk required a lot of Analytic work.  Even so, I missed a conceptual point (one of the groups involved was a stabilizer but I didn’t notice that.)

References

[1] Leone Burton, Mathematicians as Enquirers: Learning about Learning Mathematics.  Kluwer, 2004.

[2] Keith Weber, Keith Weber, How syntactic reasoners can develop understanding, evaluate conjectures, and generate counterexamples in advanced mathematics. Proof copy available from Science Direct.

[3] Post on this blog: Syntactic and semantic thinkers.

[4] Post: Thinking without words.

[5] Post: Proofs without dry bones.

[6] Abstractmath.org article on Images and Metaphors.

[7] Post: Automorphisms of group extensions updated.

Send to Kindle

Naive proofs

The monk problem

A monk starts at dawn at the bottom of a mountain and goes up a path to the top, arriving there at dusk. The next morning at dawn he begins to go down the path, arriving at dusk at the place he started from on the previous day. Prove that there is a time of day at which he is at the same place on the path on both days.

Proof: Envision both events occurring on the same day, with a monk starting at the top and another starting at the bottom at the same time and doing the same thing the monk did on different days. They are on the same path, so they must meet each other. The time at which they meet is the time required.

The pons asinorum

Theorem: If a triangle has two equal angles, then it has two equal sides.

Proof: In the figure below, assume angle ABC = angle ACB. Then triangle ABC is congruent to triangle ACB since the sides BC and CB are equal and the adjoining angles are equal.

PATriangle

I considered the monk problem at length in my post Proofs Without Dry Bones.  Proofs like the one given of the pons asinorum, particularly its involvement with labeling, recently came up on the mathedu mailing list.  See also my question on Math Overflow.

Naive proofs

These proofs share a characteristic property; I propose to say they are naive, in the sense Halmos used it in his title Naive Set Theory.

The monk problem proof is naive.

For the monk problem, you can give a model of a known mathematical type (for example model the paths as  smoothly parametrized curves on a surface) and use known theorems (for example the intermediate value theorem) and facts (for example that clock time is cyclical and invariant under the appropriate mapping) to prove it.  But the proof says nothing about that.

You could imagine inventing an original set of axioms for the monk problem, giving axioms for a structure that are satisfied by the monk’s journeys and their timing and that imply the result.  In principle, these could be very different from multivariable calculus ideas and still serve the purpose. (But I have not tried to come up with such a thing.)

But the proof as given simply uses directly  known facts about clock time and traveling on paths.  These are known to most people.  I have claimed in several places that this proof is still a mathematical proof.

Every proof is incomplete in the sense that they provide a mathematical model and analyze it using facts the reader is presumed to know.  Proofs never go all the way to foundations.  A naive proof simply depends more than usual on the reader’s knowledge: the percentage of explication is lower.  Perhaps “naive” should also include the connotation that the requisite knowledge is “common knowledge”.

The pons asinorum proof is naive.

This involves some subtle issues.  When I first wrote about this proof in the Handbook I envisioned the triangle as existing independently of any embedding in the plane, as if in the Platonic world of ideals.  I applied some labels and a relabeling and used a known theorem of Euclid’s geometry.  You certainly don’t have to know where the triangle is in order to understand the proof.

That’s a clue.  The triangle in the problem does not need to be planar. It is true for triangles in the sphere or on a saddle surface, because the proof does not involve the parallel axiom. But the connection with the absence of the parallel axiom is illusory.  When you imagine the triangle in your head the proof works directly for a triangle in any suitable geometry, by imagining the triangle as existing in and of itself, and not embedded in anything.

Questions

  1. How do you give a mathematical definition of a triangle so that it is independent of embedding?  This was the origin of my question on Math Overflow, although I muddled the issue by mentioning specific ways of doing it.
  2. (This is a variant of question 1.)  Is there anything like a classifying topos or space for a generic triangle?  In other words, a category or space or something that is just big enough to include the generic triangle and from which mappings to suitable spaces or categories produce what we usually mean by triangles.
  3. Some of the people on mathedu thought a triangle obviously had to have labels and others thought it obviously didn’t.  Specifically, is triangle ABC “the same” as triangle ACB?  Of course they are congruent.  Are they the sameThis is an evil question. The proof works on the generic isosceles triangle.  That’s enough.  Isn’t it?  All three corners of the generic isosceles triangle are different points.  Aren’t they?  (I have had second, third and nth thoughts about this point.)
  4. You can define a triangle as a list of lengths of edges and connectivity data.  But the generic triangle’s sides ought to be (images of) line segments, not abstract data.  I don’t really understand how to formulate this correctly.

Note

1.  I could avoid discussion of irrelevant side issues in the monk problem by referring to specific times of day for starting and stopping, instead of dawn and dusk.  But they really are irrelevant.

Send to Kindle

Characterizing triangles unembeddedly

I just posted this question on mathoverflow (I recommend looking into this new forum):

The mathedu mailing list has a recent longish thread which discusses among other things whether we should teach triangles as labeled or unlabeled to high school students (this is a vast oversimplification of the thread).  I have long been concerned with how we think (informally and formally) about mathematical objects, as for example my unfinished article here about the many ways of thinking about function.  So naturally, I started to consider how we think about triangles.

Consider circles.   Most informal and formal descriptions involve an embedding into R^2, but they *can* be characterized as manifolds (even as Riemannian manifolds) of dimension 1 with specific properties, independent of any embedding. This sort of thing has turned out to be a major way to think about all sorts of spaces.  So can we describe triangles in a similar way?

Unfortunately, manifolds are far removed from my usual mathematical work (category theory).  What I *think* I understand is that there can be *piecewise* linear manifolds, even Riemannian ones.  So perhaps we can say a triangle is a piecewise linear manifold of dimension 1 with certain properties.  Now, I want to define a triangle so that it comes complete with information about the lengths of its sides and what the three angles are.  Riemannian manifolds have a way to specify length and angles, and I can believe you can make the sides have specific lengths.  But the angles?  It seems to me that the tangent spaces (like those on a circle) result in all angles being 0 or pi, except at the corners where they don’t exist.  But I may not understand the situation correctly.

So my question is:  Is there a known methodology that allows triangles to be characterized independent of embeddings in such a way that incorporates information about side lengths and angles?

Send to Kindle