Unless

Mark Meckes recently wrote (private communication):

I’m teaching a fairly new transition course at Case this term, which involves explicitly teaching students the basics of mathematical English along with the obvious things like logic and proof techniques.  I had a student recently ask about how to interpret “A unless B”.  After a fairly lively discussion in class today, we couldn’t agree on the truth table for this statement, and concluded in the end that “unless” is best avoided in mathematical writing.  I checked the Handbook of Mathematical Discourse to see if you had anything to say about it there, but there isn’t an entry for it.  So, are you aware of a standard interpretation of “unless” in mathematical English?

I did not consider “unless” while writing HMD.  The way to approach a subject like this is to

  • think up examples  (preferably in a bull session with other mathematicians) and try to understand what they mean logically, then
  • do extensive research in the mathematical literature to see if you can find examples that do and do not correspond with your tentative understanding.  (Usually you find other uses besides the one you thought of, and sometimes you will discover that what you came up with is completely wrong.)

What follows is an example of this process.

I can think of three possible meanings for “P unless Q”:

1.  “P if and only if not Q”,
2.  “Q implies not P”,
3.  “not P implies Q”.

Note that (1) holds exactly when (2) and (3) both hold.

An example that satisfies (1) is “$latex x^2-x$ is positive unless $latex 0 \leq x \leq 1$”.  I have said that specific thing to my classes; calculus students tend not to remember that the parabola is below the line $latex y=x$ on that interval. (And that’s the way you should show them: draw a picture, don’t merely lecture.  Indeed, make them draw a picture.)

An example of (2) that is not an example of (1) is “$latex x^2-x$ is positive unless $latex x = 1/2$”.  I don’t think anyone would say that, but they might say “$latex x^2-x$ is positive unless, for example, $latex x = 1/2$”.  I would say that is a correct statement in mathematical English.  I guess the phrase “for example” tells you that this is a statement of the form “Q implies not P”, where Q is now “$latex x = 1/2$”, with no claim that Q covers every case where P fails.  Using the contrapositive, that is equivalent to “P implies not Q”.
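The first two examples can also be checked mechanically against the candidate readings. The Python sketch below evaluates each reading at a finite set of exact rational sample points. The names `READINGS`, `EXAMPLES` and `counterexample` are illustrative only. Because it only samples points, a check like this can refute a reading but cannot prove one.

```python
from fractions import Fraction


def P(x):
    """P: "x^2 - x is positive"."""
    return x * x - x > 0


# Candidate readings of "P unless Q", as functions of the truth values p and q.
READINGS = {
    "P iff not Q": lambda p, q: p == (not q),
    "Q implies not P": lambda p, q: (not q) or (not p),
    "not Q implies P": lambda p, q: q or p,  # same as "not P implies Q"
}

# The conditions Q from the first two examples in the text.
EXAMPLES = {
    "unless 0 <= x <= 1": lambda x: 0 <= x <= 1,
    "unless x = 1/2": lambda x: x == Fraction(1, 2),
}

# Exact sample points: multiples of 1/4 from -3 to 3 (so x = 1/2 is hit exactly).
SAMPLES = [Fraction(k, 4) for k in range(-12, 13)]


def counterexample(reading, Q):
    """First sample x at which the reading fails, or None if it holds at every sample."""
    for x in SAMPLES:
        if not reading(P(x), Q(x)):
            return x
    return None


if __name__ == "__main__":
    for example, Q in EXAMPLES.items():
        for name, reading in READINGS.items():
            x = counterexample(reading, Q)
            verdict = "holds" if x is None else f"fails at x = {x}"
            print(f"{example:20} {name:16} {verdict}")
```

The first example passes all three readings. The second passes only “Q implies not P” and fails the other two at x = 0.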

An example of (3) that is not an example of (1) is “$latex x^2-x$ is positive unless $latex -1 < x \leq 1$”.  I think that anyone who said that (among math people) would be told that they are wrong, because for example $latex (\frac{-1}{2})^2-\frac{-1}{2} = \frac{3}{4}$.  That reaction amounts to saying that (3) is not a correct interpretation of “P unless Q”.

Because of examples like these, my conjecture is that “P unless Q” means “P if and only if not Q”.  But to settle this point requires searching for “unless” in the math literature and seeing if you can find instances where “P unless Q” is not equivalent to “P if and only if not Q”.  (You could also see what happens with searching for “unless” and “example” close together.)

Having a discussion such as the above, where you think up examples, can give you a clue, but you really need to search the literature.  For the Handbook, I searched JSTOR, which is available online at Case.  I have to say that several usages about which I had definite opinions were overturned during the literature search. (What “brackets” means is an example.)

My proxy server at Case isn’t working right now but when I get it repaired I will look into this question.

A tiny step towards killing string-based math

I discussed endographs of real functions in my post Endographs and cographs of real functions.  Endographs of finite functions also provide another way of thinking about functions, and I show some examples here.  This is not a new idea; endographs have appeared from time to time in textbooks, but they are not used much, even though they instantly reveal some properties of a function that cannot be seen so easily in a traditional graph or cograph.

In contrast to endographs of functions on the real line, an endograph of a finite function from a set to itself contains all the information about the function.  For real functions, only some of the arrows can be shown; you are dependent on continuity to interpolate where the infinite number of intermediate arrows would be, and of course, it is easy to produce a function, with, say, small-scale periodicity, that the arrows would miss, so to speak.  But with an endograph of a finite function, WYSIATI (what you see is all there is).

Here is the endograph of a function.  It is one function.  The graph has four connected components.

You can see immediately that it is a permutation of the set $latex \{1,2,3,4,5,6\}$, and that it is an involution (a permutation $latex f$ for which $latex f f=\text{id}$).  In cycle notation, it is the permutation $latex (1 2)(5 6)$, and the connected components of the endograph correspond to the cycle structure.

Here is another permutation:

You can see that the smallest positive $latex n$ with $latex f^n=\text{id}$ is $latex n=6$: you have to apply the 3-cycle 3 times and the transposition twice to get the identity, and 6 is the least common multiple of 3 and 2.  The cycle structure $latex (1 2 4)(0 3)$ tells you this, but you have to visualize it acting to see that.  The endograph gives the newbie a jumpstart on the visualization.  “The power to understand and predict the quantities of the world should not be restricted to those with a freakish knack for manipulating abstract symbols” (Bret Victor).  This is an argument for insisting that this permutation is the endograph, and that the abstract string of symbols $latex (1 2 4)(0 3)$ is a representation of secondary importance.  [See Note 1.]
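The cycle structure that the endograph shows can also be computed. The Python sketch below is a minimal version with illustrative function names. It assumes the second permutation acts on $latex \{0,1,2,3,4\}$, which the text does not state; extra fixed points would not change the answer. It finds the order in two ways: as the least common multiple of the cycle lengths, and by composing $latex f$ with itself until it reaches the identity. `math.lcm` requires Python 3.9 or later.

```python
from math import lcm


def cycles(f):
    """Cycles of a permutation f, given as a dict sending each element to its image."""
    seen, result = set(), []
    for start in sorted(f):
        if start in seen:
            continue
        cycle, x = [start], f[start]
        seen.add(start)
        while x != start:
            cycle.append(x)
            seen.add(x)
            x = f[x]
        result.append(tuple(cycle))
    return result


def cycle_notation(f):
    """Cycle notation, omitting fixed points (1-cycles)."""
    return "".join("(" + " ".join(map(str, c)) + ")" for c in cycles(f) if len(c) > 1)


def order(f):
    """Least n >= 1 with f^n = id: the lcm of the cycle lengths."""
    return lcm(*(len(c) for c in cycles(f)))


def order_by_iteration(f):
    """The same number found the slow way, by composing f with itself until reaching id."""
    g, n = dict(f), 1
    while any(g[x] != x for x in g):
        g = {x: f[g[x]] for x in g}  # g becomes f^(n+1)
        n += 1
    return n


involution = {1: 2, 2: 1, 3: 3, 4: 4, 5: 6, 6: 5}  # (1 2)(5 6) on {1,...,6}
second = {1: 2, 2: 4, 4: 1, 0: 3, 3: 0}            # (1 2 4)(0 3), assumed on {0,...,4}
```

Disjoint cycles commute, so `cycle_notation(second)` comes out as `(0 3)(1 2 4)`. That is the same permutation as $latex (1 2 4)(0 3)$.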

Here is the cograph of the same function.  It requires a bit of visualization or tracing arrows around to see its cycle structure.

If I had rearranged the nodes like this

the cycle structure would be easier to see.  This does not indicate as much superiority of the endograph metaphor over the cograph metaphor as you might think:  My endograph code [Note 2] uses Mathematica’s graph-displaying algorithm, which automatically shows cycles clearly.   The cograph code that I wrote specifies the placement of the nodes explicitly, so I rearranged them to obtain the second cograph above using my knowledge of the cycle structure.

The following endographs of functions that are not permutations exhibit the general fact that the endograph of a finite function from a set to itself consists of cycles with trees attached.  This structure is obvious from the endographs, and it is easy to come up with a proof of this property of finite functions by tracing your finger around the endographs.

This is the endograph of the polynomial $latex 2 n^9+5 n^8+n^7+4 n^6+9 n^5+1$ over the finite field of 11 elements.
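The cycles-with-trees structure can also be extracted mechanically. Below is a minimal Python sketch; it is not the Mathematica notebook behind the pictures, and the helper names are hypothetical. It builds the function table of the GF(11) polynomial using Horner's rule. It then splits the endograph into cycles and, for every node, records how many steps it takes to reach a cycle. The key observation is that after $latex |S|$ steps every orbit has entered its cycle, so $latex f^{|S|}(S)$ is exactly the set of nodes that lie on cycles.

```python
def poly_table(coeffs, p):
    """Function table n -> poly(n) mod p, coefficients listed from the highest degree down."""
    def evaluate(n):
        acc = 0
        for c in coeffs:  # Horner's rule
            acc = (acc * n + c) % p
        return acc
    return {n: evaluate(n) for n in range(p)}


def decompose(f):
    """Split the endograph of f: S -> S into its cycles and each node's distance to a cycle."""
    nodes = list(f)
    # f^|S|(S) is exactly the set of nodes lying on cycles.
    on_cycle = set(nodes)
    for _ in range(len(nodes)):
        on_cycle = {f[x] for x in on_cycle}
    cycles, seen = [], set()
    for start in sorted(on_cycle):
        if start in seen:
            continue
        cycle, x = [start], f[start]
        seen.add(start)
        while x != start:
            cycle.append(x)
            seen.add(x)
            x = f[x]
        cycles.append(tuple(cycle))
    # Depth 0 for nodes on cycles; otherwise the number of steps needed to reach a cycle.
    depth = {}
    for x in nodes:
        d, y = 0, x
        while y not in on_cycle:
            y, d = f[y], d + 1
        depth[x] = d
    return cycles, depth


# 2n^9 + 5n^8 + n^7 + 4n^6 + 9n^5 + 1 over GF(11); the n^4..n^1 coefficients are 0.
f11 = poly_table([2, 5, 1, 4, 9, 0, 0, 0, 0, 1], 11)
cycles11, depth11 = decompose(f11)
```

The nodes with positive depth are the tree nodes. Each one maps to a node whose depth is one less.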

Here is another endograph:

I constructed this explicitly by writing a list of rules, and then used Mathematica’s interpolating polynomial to determine that it is given by the polynomial

$latex 6 x^{16}+13 x^{15}+x^{14}+3 x^{13}+10 x^{12}+5 x^{11}+14 x^{10}+4 x^9+9 x^8+x^7+14 x^6+15 x^5+16 x^4+14 x^3+4 x^2+15 x+11$

over GF(17).

Quite a bit is known about polynomials over finite fields that give permutations.  For example, there is an easy proof using interpolating polynomials that a polynomial of degree less than $latex q$ over GF($latex q$), $latex q\geq 3$, that gives a transposition must have degree $latex q-2$.  The best reference for this stuff is Lidl and Niederreiter, Introduction to Finite Fields and their Applications.
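The text uses Mathematica's interpolating polynomial. As a stand-in, here is a from-scratch Lagrange interpolation over a prime field GF(p), written in Python; it is not the author's code, and the function names are hypothetical. It recovers the unique polynomial of degree less than p from a function table. The sketch uses it to check the degree claim for transpositions, but only for a few small primes, so it is a spot check and not a proof. `pow(d, -1, p)` needs Python 3.8 or later.

```python
def poly_mul(a, b, p):
    """Product of two polynomials over GF(p), coefficients listed lowest degree first."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out


def interpolate(values, p):
    """Lagrange interpolation over GF(p), p prime: coefficients (lowest degree first) of the
    unique polynomial of degree < p whose value at each x in 0..p-1 is values[x]."""
    coeffs = [0] * p
    for xi in range(p):
        basis, denom = [1], 1
        for xj in range(p):
            if xj != xi:
                basis = poly_mul(basis, [-xj % p, 1], p)  # multiply by (x - xj)
                denom = denom * (xi - xj) % p
        scale = values[xi] * pow(denom, -1, p) % p        # modular inverse
        for k, c in enumerate(basis):
            coeffs[k] = (coeffs[k] + scale * c) % p
    return coeffs


def degree(coeffs):
    """Index of the highest nonzero coefficient (0 for a constant)."""
    d = len(coeffs) - 1
    while d > 0 and coeffs[d] == 0:
        d -= 1
    return d


def transposition(p, a=0, b=1):
    """Function table of the transposition of GF(p) swapping a and b."""
    values = list(range(p))
    values[a], values[b] = b, a
    return values


if __name__ == "__main__":
    for p in [3, 5, 7, 11, 13, 17]:
        print(p, degree(interpolate(transposition(p), p)))
```

The GF(17) polynomial above has degree 16, which is below 17. Interpolating its function table should therefore return exactly its coefficients.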

The endographs above raise questions such as what you can say about the degree or coefficients of a polynomial that gives an idempotent function ($latex f f=f$), like the function $latex f$ below.  Students find it difficult to distinguish idempotence from involution.  Digraphs show you almost immediately what is going on.  Stare at the digraph below for a bit and you will see that if you follow $latex f$ to a node and then follow it again, you stay where you are (the function is the identity on its image).  That’s another example of the insights you can get from a new metaphor for a mathematical object.
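Here is the same contrast in code. In the Python sketch below, the helper names and the two example functions are hypothetical; they are not the functions pictured in the post. The sketch tests idempotence ($latex ff=f$) and involution ($latex ff=\text{id}$) directly. It also tests the property you read off the digraph: every node in the image is a fixed point. A brute-force pass over all 27 functions on a three-element set should print `True`, confirming that idempotence and "identity on the image" agree.

```python
def compose(g, f):
    """g after f, for functions given as dicts on the same finite set."""
    return {x: g[f[x]] for x in f}


def is_idempotent(f):
    return compose(f, f) == f


def is_involution(f):
    return compose(f, f) == {x: x for x in f}


def identity_on_image(f):
    """The property read off the digraph: every node in the image is a fixed point."""
    return all(f[y] == y for y in set(f.values()))


# Hypothetical examples.
retraction = {0: 0, 1: 0, 2: 2, 3: 2, 4: 2}  # idempotent, not an involution
swap = {0: 1, 1: 0, 2: 2}                     # involution, not idempotent

if __name__ == "__main__":
    funcs = [{0: a, 1: b, 2: c} for a in range(3) for b in range(3) for c in range(3)]
    print(all(is_idempotent(f) == identity_on_image(f) for f in funcs))
```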

The following function is not idempotent even though it has only trivial loops.  But the digraph does tell you easily that it satisfies $latex f^4=f^3$.
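The equation $latex f^4=f^3$ says that three steps along the arrows already carry every node to a fixed point. The Python sketch below (hypothetical helper names) computes the least $latex n \geq 1$ with $latex f^{n+1}=f^n$. Such an $latex n$ exists exactly when every cycle is a trivial loop, and it equals the height of the tallest tree, or 1 if there are no trees. The example `chain` is made up; it is not the pictured function, but it has the same property of being non-idempotent with only trivial loops.

```python
def compose(g, f):
    """g after f, for functions given as dicts on the same finite set."""
    return {x: g[f[x]] for x in f}


def stabilizing_power(f):
    """Least n >= 1 with f^(n+1) = f^n, or None if no such n exists.

    Such an n exists exactly when every cycle of the endograph is a trivial loop;
    the tallest tree has height at most |S| - 1, so |S| + 1 tries are enough."""
    power, n = dict(f), 1
    for _ in range(len(f) + 1):
        nxt = compose(f, power)  # f^(n+1)
        if nxt == power:
            return n
        power, n = nxt, n + 1
    return None


chain = {1: 1, 2: 1, 3: 2, 4: 3}  # 4 -> 3 -> 2 -> 1, with a loop at 1
```

Here `stabilizing_power(chain)` is 3, so $latex f^4=f^3$ but $latex f^3 \neq f^2$. An idempotent function is exactly one for which the answer is 1.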

Notes

[1] Atish Bagchi and I have contributed to this goal in Graph Based Logic and Sketches, which gives a bare glimpse of the possibility of considering that the real objects of logic are diagrams and their limits and morphisms between them, rather than hard-to-parse strings of letters and logical symbols.  Implementing this (and implementing Bret Victor’s ideas) will require sophisticated computer support.  But that support is coming into existence.  We won’t have to live with string-based math forever.

[2] The Mathematica notebook used to produce these pictures is here.  It has lots of other examples.

Just-in-time foundations

Introduction

On MathOverflow, statements similar to the following two occurred in comments:

  1. Sets and functions do not form a category
  2. Categories and functors do not form a category.

I cannot find either one of them now, but I want to talk about them anyway.

If you look at the definition of categories in various works (for example references [1] through [3] below) you find that the objects and arrows of a category must each form a “collection” or “class” together with certain operations.   The authors all describe the connection with Grothendieck’s concept of “universe” and define “large categories” and “small categories” in the usual way.  So Statement 1 above is simply wrong.

Statement 2 is more problematic.  The trouble is that if the word “categories” includes large categories then the objects do not form a set even in the second universe.  You have to go to the third universe.

Now there is a way to define categories where this issue does not come up.  It allows us to think about categories without having a particular system such as ZF and universes in mind.

A syntactic definition of category

A category consists of objects and arrows, together with four methods of construction M1 – M4 satisfying laws L1 – L7.  I treat “object” and “arrow” as predicates:  object[a] means a is an object and arrow[f] means f is an arrow.  “=” means equals in the mathematical sense.

M1 Source: If arrow[f], object[f.source].
M2 Target: If arrow[f], object[f.target].
M3 Identity: If object[a], arrow[a.identity].
M4 Comp: If arrow[g] and arrow[f] and f.target = g.source, then arrow[(g,f).comp].
L1. If object[a], a.identity.source = a.
L2. If object[a], a.identity.target = a.
L3. If arrow[g] and arrow[f] and f.target = g.source, then (g,f).comp.source = f.source.
L4. If arrow[g] and arrow[f] and f.target = g.source, then (g,f).comp.target = g.target.
L5. If object[a] and arrow[f] and f.source = a, then (f, a.identity).comp = f.
L6. If object[a] and arrow[g] and g.target = a, then (a.identity, g).comp = g.
L7. If arrow[h] and arrow[g] and arrow[f] and h.source = g.target and g.source = f.target, then (h, (g,f).comp).comp = ((h,g).comp, f).comp.
Remarks on this definition
1. I have deliberately made this definition look like a specification in an object oriented program (see [6]), although the syntax is not the same as any particular oo language.  It is as rigorous a mathematical definition as you could want, and it could presumably be compiled in some oo language, except that I don’t know if oo languages allow the conditional definition of a method as given in M4.
2.  I could have given the definition in mathematical English, for example “If f is an arrow then the source of f is an object”.  My point in providing the impenetrable definition above is to make a connection (admittedly incompletely established) with a part of math (the theory of oo languages) that is definitely rigorous but is not logic.  An informal definition in math English of course could also be transformed rigorously into first order logic.
3.  This definition is exactly equivalent to the FL sketch for categories given in my post [5].  That sketch has models in many categories, not just Set, as well as its generic model living in the corresponding FL-cattheory (or in the classifying topos it generates).
4.  Saunders Mac Lane defined metacategory in precisely this way in [1].  That was of course before anyone ever heard of oo languages.  I think he should have made that the definition of category.
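Remark 1 suggests reading the definition as an object-oriented specification. Below is a minimal Python sketch of that reading, restricted to finite categories; `FiniteCategory` and `check_laws` are illustrative names, not from any library. M1 – M4 become methods. The conditional definition in M4 is handled by raising an error when f.target ≠ g.source. Laws L1 – L7 are then checked by brute force. This only checks a particular finite presentation; it does not compile the definition in the sense of Remark 1.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class FiniteCategory:
    """A finite category presented through the constructions M1-M4."""
    objects: set
    arrows: dict      # arrow -> (source, target), giving M1 and M2
    identities: dict  # object -> its identity arrow (M3)
    table: dict       # (g, f) -> the composite arrow (M4)

    def source(self, f):  # M1
        return self.arrows[f][0]

    def target(self, f):  # M2
        return self.arrows[f][1]

    def identity(self, a):  # M3
        return self.identities[a]

    def comp(self, g, f):  # M4: defined only when f.target = g.source
        if self.target(f) != self.source(g):
            raise ValueError(f"({g}, {f}).comp is undefined")
        return self.table[(g, f)]


def check_laws(C):
    """Brute-force check of M1-M4 (well-typedness) and L1-L7; returns True or False."""
    A = list(C.arrows)
    for f in A:                                   # M1, M2
        if C.source(f) not in C.objects or C.target(f) not in C.objects:
            return False
    for a in C.objects:                           # M3, L1, L2
        i = C.identity(a)
        if i not in C.arrows or C.source(i) != a or C.target(i) != a:
            return False
    for g, f in product(A, repeat=2):             # M4, L3, L4
        if C.target(f) == C.source(g):
            gf = C.table.get((g, f))
            if gf not in C.arrows:
                return False
            if C.source(gf) != C.source(f) or C.target(gf) != C.target(g):
                return False
    for f in A:                                   # L5, L6
        if C.comp(f, C.identity(C.source(f))) != f or C.comp(C.identity(C.target(f)), f) != f:
            return False
    for h, g, f in product(A, repeat=3):          # L7
        if C.source(h) == C.target(g) and C.source(g) == C.target(f):
            if C.comp(h, C.comp(g, f)) != C.comp(C.comp(h, g), f):
                return False
    return True


# The "walking arrow": objects 0 and 1, their identities, and one arrow u: 0 -> 1.
walking_arrow = FiniteCategory(
    objects={0, 1},
    arrows={"id0": (0, 0), "id1": (1, 1), "u": (0, 1)},
    identities={0: "id0", 1: "id1"},
    table={("id0", "id0"): "id0", ("id1", "id1"): "id1",
           ("u", "id0"): "u", ("id1", "u"): "u"},
)
```

`check_laws(walking_arrow)` returns `True`. By contrast, `walking_arrow.comp("u", "u")` raises `ValueError`, because u.target ≠ u.source.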

Just-in-time foundations

Mathematicians work inside the categories Set (sets and functions) and Cat (categories and functors) all the time, including functors to or from Cat or Set. When they consider a category, they use theorems that follow from the definition above.  They do not have to have foundations in mind.

Once in a while, they are frustrated because they cannot talk about the set of objects of some category.  For example, Freyd’s solution set condition is required to prove the existence of a left adjoint because of that problem.  The solution set condition is a work-around for a familiar obstruction to an easy way to prove something.  I can imagine coming up with such a work-around without ever giving a passing thought to foundations, in particular without thinking of universes.

When you work with a mathematical object, the syntax of the definitions and theorems gives you all you need to justify the claim that something is a theorem.  You absolutely need models of the theory to think up and understand proofs, but the models could be sets or classes with structure, or functors (as in sketch theory), or you may work with generic models, which may require you to use intuitionistic reasoning.  You don’t have to have any particular kind of model in mind when you work in Set or Cat.

When you do run into something like the impossibility of forming the set of objects of some category (which happens in any model theory environment that uses classical rather than intuitionistic reasoning), then you may want to consider an approach through some theory of foundations.  That is what most mathematicians do: they use just-in-time foundations. For example, in a particular application you may be happy to work in a topos with a set-of-all-objects, particularly if you are a certain type of computer scientist who lives in Pittsburgh.  You may be happy to explicitly consider universes, although I am not aware of any category-theoretical results that do explicitly mention universes.

But my point is that most mathematicians think about foundations only when they need to, and most mathematicians never need to think about foundations in their work. Moral: Don’t think in terms of foundations unless you have to.

This point of view is related to the recent discussions of pragmatic foundations [7] [8].

Side remark

The situation that you can’t always construct a set of somethings is analogous to the problem that you have in working with real numbers:  You can’t name most real numbers. This may get in the way of some analyst wanting to do something, I don’t know.  But in any branch of math, there are obstructions to things you want to do that really do get in your way.  For example, in beginning linear algebra, it may have occurred to you, to your annoyance, that if you have a basis of a subspace you can extend it to a basis of the whole space, but if you have a basis of the whole space and a subspace, the basis need not contain a basis of the subspace: the standard basis of $latex \mathbb{R}^2$ contains no basis of the line $latex y=x$.

References and links

  1. Saunders Mac Lane, Categories for the working mathematician. Springer-Verlag, 1971.
  2. Wikipedia article on category theory
  3. Michael Barr and Charles Wells, Category Theory for Computing Science, Third Edition (1999). Les Publications CRM, Montreal (publication PM023).
  4. Discussion of functions in abstractmath.org.
  5. Definitions into Mathematical Objects 7.
  6. Object oriented programming in Wikipedia.
  7. M. Gelfand, We Do Not Choose Mathematics as Our Profession, It Chooses Us: Interview with Yuri Manin.
  8. Discussion in n-category cafe.

How "math is logic" ruined math for a generation

Mark Meckes responded to my statement

But it seems to me that this sort of thinking has mostly resulted in people thinking philosophy of math is merely a matter of logic and set theory.  That point of view has been ruinous to the practice of math.

with this comment:

I may be misreading your analysis of the second straw man, but you seem to imply that “people thinking philosophy of math is merely a matter of logic and set theory” has done great damage to mathematics. I think that’s quite an overstatement. It means that in practice, mathematicians find philosophy of mathematics to be irrelevant and useless. Perhaps philosophers of mathematics could in principle have something to say that mathematicians would find helpful but in practice they don’t; however, we’re getting along quite well without their help.

On the other hand, maybe you only meant that people who think “philosophy of math is merely a matter of logic and set theory” are handicapped in their own ability to do mathematics. Again, I think most mathematicians get along fine just not thinking about philosophy.

Mark is right that at least this aspect of philosophy of math is irrelevant and useless to mathematicians.  But my remark that the attitude that “philosophy of math is merely a matter of logic and set theory” is ruinous to math was sloppy; it was not what I should have said.  I was thinking of a related phenomenon which was ruinous to math communication and teaching.

By the 1950s many mathematicians had adopted the attitude that math is nothing but theorem and proof.  Images, metaphors and the like were regarded as misleading and as leading to incorrect proofs.  (I am not going to get into how this attitude came about.)  Teachers and colloquium lecturers suppressed intuitive insights and motivations in their talks and just stated the theorem and went through the proof.

I believe both expository and research papers were affected by this as well, but I would not be able to defend that with citations.

I was a math student 1959 through 1965.  My undergraduate calculus (and advanced calculus) teacher was a very good teacher but he was affected by this tendency.  He knew he had to give us intuitive insights but he would say things like “close the door” and “don’t tell anyone I said this” before he did.  His attitude seemed to be that that was not real math and was slightly shameful to talk about.  Most of my other undergrad teachers simply did not give us insights.

In graduate school I had courses in Lie Algebra and Mathematical Logic from the same teacher.   He was excellent at giving us theorem-proof lectures, much better than most teachers, but he never gave us any geometric insights into Lie Algebra (I never heard him say anything about differential equations!) or any idea of the significance of mathematical logic.  We went through Killing’s classification theorem and Gödel’s incompleteness theorem in a very thorough way and I came out of his courses pleased with my understanding of the subject matter.  But I had no idea what either one of them had to do with any other part of math.

I had another teacher for several courses in algebra and various levels of number theory.  He was not much for insights, metaphors, etc., but he did do well in explaining how you come up with a proof.  My teacher in point set topology was absolutely awful and turned me off the Moore method forever.  The Moore method seems to be based on: don’t give the student any insights whatever. I have to say that one of my fellow students thought the Moore method was the best thing since sliced bread and went on to get a degree under this teacher.

These dismal years in math teaching lasted through the seventies and perhaps into the eighties.  Apparently younger professors now are much more into insights, images and metaphors, and to some extent into pointing out connections with the rest of math and science.  Since I retired in 1999, I don’t have much exposure to the newer generation and I am not sure how thoroughly things have changed.

One noticeable phenomenon was that category theorists (I got into category theory in the mid seventies) were very assiduous in lectures and to some extent in papers in giving motivation and insight.  It may be that attitudes varied a lot between different disciplines.

This Dark Age of math teaching was one of the motivations for abstractmath.org.  My belief is that not only should we give the students insights, images and metaphors to think about objects, and so on, but that we should be upfront about it:  Tell them what we are doing (don’t just mutter the word “intuitive”) and point out that these insights are necessary for understanding but are dangerous when used in proofs.  Tell them these things with examples. In every class.

My other main motivation for abstractmath.org was the way math language causes difficulties.  But that is another story.

Proofs without dry bones

I have discussed images, metaphors and proofs in math in two ways:

(A) A mathematical proof

A monk starts at dawn at the bottom of a mountain and goes up a path to the top, arriving there at dusk. The next morning at dawn he begins to go down the path, arriving at dusk at the place he started from on the previous day. Prove that there is a time of day at which he is at the same place on the path on both days.

Proof: Envision both events occurring on the same day, with a monk starting at the top and another starting at the bottom at the same time and doing the same thing the monk did on different days. They are on the same path, so they must meet each other. The time at which they meet is the time required.

This example comes from Fauconnier, Mappings in Thought and Language, Cambridge Univ. Press, 1997. I discuss it in the Handbook, pages 46 and 153. See the Wikipedia article on conceptual blending.

(B) Rigor and rigor mortis

The following is quoted from a previous post here. See also the discussion in abstractmath.

When we are trying to understand or explain math, we may use various kinds of images and metaphors about the subject matter to construct a colorful and rich representation of the mathematical objects and processes involved. I described some of these briefly here. They can involve thinking of abstract things moving and changing and affecting each other.

When we set out to prove some math statement, we go into what I have called “rigorous mode”. We feel that we have to forget some of the color and excitement of the rich view. We must think of math objects as inert and static. They don’t move or change over time and they don’t interact with other objects or the real world. In other words, pretend that all math objects are dead.

We don’t always go all the way into this rigorous mode, but if we use an image or metaphor in a proof and someone challenges us about it, we may rewrite that part to get rid of the colorful representation and replace it by a calculation or line of reasoning that refers to the math objects as if they were inert and static – dead.

I didn’t contradict myself.
I want to clear up some tension between these two ideas.

The argument in (A) is a genuine mathematical proof, just as it is written. It contains hidden assumptions (enthymemes), but all math proofs contain hidden assumptions. My remarks in (B) do not mean that a proof is not a proof until everything goes dead, but that when challenged you have to abandon some of the colorful and kinetic reasoning to make sure you have it right. (This is a standard mathematical technique; see Note 1.)

One of the hidden assumptions in (A) is that two monks walking the opposite way on the path over the same interval of time will meet each other. This is based on our physical experience. If someone questions this we have several ways to get more rigorous. One many mathematicians might think of is to model the path as a curve in space and consider two different parametrizations by the unit interval that go in opposite directions. This model can then appeal to the intermediate value theorem to assert that there is a point where the two parametrizations give the same value.
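The intermediate value theorem step can be made concrete. The Python sketch below makes some simplifying assumptions. Position along the path is a single number, with 0 for the bottom and 1 for the top, rather than a point on a curve in space. Time is also normalized, with 0 for dawn and 1 for dusk. The schedules `up` and `down` are made up for illustration. Existence of a meeting time comes from the theorem, which needs only continuity and the endpoint values; the bisection merely locates a crossing.

```python
import math


def meeting_time(up, down, tol=1e-12):
    """Bisection: a time t in [0, 1] with up(t) = down(t), to within tol in t.

    Assumes up and down are continuous with up(0) <= down(0) and up(1) >= down(1)."""
    def gap(t):
        return up(t) - down(t)

    lo, hi = 0.0, 1.0
    if gap(lo) > 0 or gap(hi) < 0:
        raise ValueError("endpoint conditions fail")
    while hi - lo > tol:  # invariant: gap(lo) <= 0 <= gap(hi)
        mid = (lo + hi) / 2
        if gap(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical schedules: any continuous functions with these endpoint values would do.
def up(t):
    return t * t             # starts slowly, speeds up


def down(t):
    return 1 - math.sqrt(t)  # starts fast, slows down
```

With these schedules the crossing falls a little after midday, at about t ≈ 0.53. Neither schedule needs to be monotone: the monk may stop or even backtrack, and the argument still goes through.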

I suppose that argument goes all the way to the dead. In the original argument the monk is moving. But the parametrized curve just sits there. The parametrizations are sets of ordered pairs in $latex \mathbb{R}\times(\mathbb{R}\times\mathbb{R}\times\mathbb{R})$. Nothing is moving. All is dry bones. Ezekiel has not done his thing yet.

This technique works, I think, because it allows classical logic to be correct. It is not correct in everyday life when things are moving and changing and time is passing.

Avoid models; axiomatize directly
But it certainly is not necessary to rigorize this argument by using parametrizations involving the real numbers. You could instead look at the situation of the monk and make some axioms about the events being described. For example, you could presumably make axioms on locations on the path that treat the locations as intervals rather than as points.

The idea is to make axioms that state properties that intervals have but don’t say they are intervals. For example, there is a relation “higher than” between locations that must be reflexive and transitive but need not be antisymmetric. I have not done this, but I would propose that you could do this without recreating the classical real numbers by the axioms. (You would presumably be creating the intuitionistic real numbers.)

Of course, we commonly fall into using the real numbers because methods of modeling using real numbers have been worked out in great detail. Why start from scratch?

About the heading on this section: There is a sense in which “axiomatizing directly” is a way of creating a model. Nevertheless, there is a distinction between these two approaches, though I am too confused to say anything about it right now.

First order logic.
It is commonly held that if you rigorize a proof enough you could get it all the way down to a proof in first order logic. You could do this in the case of the proof in (A) but there is a genuine problem in doing this that people don’t pay enough attention to.

The point is that you replace the path and the monks by mathematical models (a curve in space) and their actions by parametrizations. The resulting argument calls on well-known theorems in real analysis and, I have no doubt, can be turned into a strict first order logic argument. But the resulting argument is no longer about the monk on the path.

The argument in (A) involves our understanding of a possibly real physical situation along with a metaphorical transference in time of the two walks (a transference that takes place in our brain using techniques (conceptual blending) the brain uses every minute of every day). Changing over to using a mathematical model might get something wrong. Even if the argument using parametrized curves doesn’t have any important flaws (and I don’t believe it does) it is still transferring the argument from one situation to another.

Conclusion:
Mathematical arguments are still mathematical arguments whether they refer to mathematical objects or not. A mathematical argument can be challenged and tested by uncovering hidden assumptions and making them explicit as well as by transferring the argument to a classical mathematical situation.

Note 1. Did you ever hear anyone talking about rigor requiring making images and metaphors dead? This is indeed a standard mathematical technique but it is almost always suppressed, or more likely unnoticed. But I am not claiming to be the first one to reveal it to the world. Some of the members of Bourbaki talked this way. (I have lost the reference to this.)

They certainly killed more metaphors than most mathematicians.

Note 2. This discussion about rigor and dead things is itself a metaphor, so it involves a metametaphor. Metaphors always have something misleading about them. Metametaphorical statements have the potential of being far worse. For example, the notion that mathematics contains some kind of absolute truth is the result of bad metametaphorical thinking.