Tag Archives: theorem

Rigorous proofs

Rich and rigorous

When we try to understand a math statement, we visualize what the statement says using metaphors, images and kinetic feelings to feel how it is true, or to suggest that the statement is not true.

If we are convinced that it is true, we may then want to prove it. Doing that involves pitching out all the lovely pictures and metaphors and gestures and treating the mathematical objects involved in the proof as static and inert. “Static” means the object does not change. “Inert” means that it does not affect anything else. I am saying how we think about math objects for the purpose of rigorous proof. I am not saying anything about “what math objects are”.

In this post I give a detailed example of a proof of the rigorous sort.

Example

Informal statement

First, I’ll describe this example in typical spoken mathematical English. Suppose you suspect that the following statement is true:

Claim: Let $f(x)$ be a differentiable function with $f'(a)=0$.
Going from left to right, suppose the graph of $f(x)$ goes UP before $x$ reaches $a$ and then DOWN for $x$ to the right of $a$
Then $a$ has to be a local maximum of the function.

This claim is written in informal math English. Mathematicians talk like that a lot. In this example they will probably wave their hands around in swoops.

The language used is an attempt to get a feeling for the graph going up to $(a,f(a))$ and then falling away from it. It uses two different metaphors for $x\lt a$ and $x\gt a$. I suspect that most of us would want to clean that up a bit even in informal writing.

A more formal statement

Theorem: Let $f$ be a real valued differentiable function defined on an open interval $R$. Let $a$ be a number in $R$ for which $f'(a)=0$. Suppose that for all $x\in R$, $f$ increases for $x\lt a$ and decreases for $x\gt a$. Then $f(a)$ is a maximum of $f$ in $R$.

Proof

  1. By definition of derivative, \[\lim_{x\to a}\frac{f(x)-f(a)}{x-a}=0.\]
  2. By definition of limit, then for any positive $\epsilon$ there is a positive $\delta$ for which if $0\lt|x-a|\lt\delta$ then \[\left|\frac{f(x)-f(a)}{x-a}\right|\lt\epsilon.\]
  3. By requiring that $\delta\lt 1$, it follows from (2) that for any positive $\epsilon$, there is a positive $\delta$ for which if $0\lt|x-a|\lt\delta$, then $|f(x)-f(a)|\lt\epsilon$.
  4. “$f$ increases for $x\lt a$” means that if $x$ and $y$ are numbers in $R$ and $x\lt y\lt a$, then $f(x)\lt f(y)$.
  5. “$f$ decreases for $x\gt a$” means that if $x$ and $y$ are numbers in $R$ and $a\lt x\lt y$, then $f(x)\gt f(y)$.
  6. “$f(a)$ is a maximum of $f$ in $R$” means that for $x\in R$, if $x\neq a$, then $f(x)\lt f(a)$.
  7. Suppose that $x\in R$ and $x\lt a$. (The case that $x\gt a$ has a symmetric proof.)
  8. Given $\epsilon\gt0$ with $\delta$ as given by (3), choose $y\in R$ such that $x\lt y\lt a$ and $|f(y)-f(a)|\lt\epsilon$.
  9. By (4), $f(x)\lt f(y)$. So by (8), \[\begin{align*}
    f(x)-f(a)&=
    f(x)-f(y)+f(y)-f(a)\\ &\lt f(y)-f(a)\\ &\leq|f(y)-f(a)|\lt\epsilon\end{align*}\]
    so that $f(x)\lt f(a)+\epsilon$. By inserting “$-f(y)+f(y)$” into the second formula, I am “adding zero cleverly”, an example of pulling a rabbit out of a hat. Students hate that. But you have to live with it; as long as the statements following are correct, it makes a valid proof. Rabbit-out-of-a-hat doesn’t make a proof wrong, but it does make you wonder how the author thought of it. Live with it.
  10. Since (9) is true for all positive $\epsilon$, it follows that $f(x)\leq f(a)$.
  11. By the same argument as that leading up to (10), $f(\frac{x-a}{2})\leq f(a)$.
  12. Since $f(x)\lt f(\frac{x-a}{2})$, it follows that $f(x)\lt f(a)$ as required.

About the proof

This proof is intended to be a typical “rigorous” proof. I suspect it tends to be more rigorous than most mathematicians would find necessary,

Extensionality

The point about “rigor”, about insisting that the objects be static and inert, is that this causes symbols and expression to retain the same meaning throughout the text. This is one aspect of extensionality.

Of course, some of the symbols denote variables, or variable objects. This does not mean they are “varying”. I am taking this point of view: A variable refers to a math object but you don’t know what it is. Constraints such as $x\lt a$ rule out some possible values but don’t generally tell you exactly what $x$ is. There is more about this in Variable Objects

The idea in (6), for example, is that $y$ denotes a real number. You don’t know which number it is, but you do know some facts about it: $x\lt y\lt a$, $|f(y)\lt f(a)|\lt\epsilon$ and so on. Similarly you don’t know what function $f$ is, but you do know some facts about it: It is differentiable, for example, and $f'(a)=0$.

My statement that the variables aren’t “varying” means specifically that each unbound occurrence of the variable refers to the same value as any other occurrence, unless some intervening remark changes its meaning. For example, the references to $x$ in (7) through (10) refer to the same value it has in (6), and (10), in particular, constitutes a statement that the claim about $x$ is correct.

Checkability

The elimination of metaphors that lets the proof achieve rigor is part of a plan in the back of the mind of at least some mathematicians who write proofs. The idea is that the proof be totally checkable:

  • Every statement in the proof has a semantics, a meaning, that is invariant (given the remark about variables above).
  • Each statement is justified by some of the previous statements. This justification is given by two systems that the reader is supposed to understand.
  • One system is the rules of symbol manipulation that are applied to the symbolic expressions, ordinary algebra, and higher-level manipulations used in particular branches of math.
  • The other system consists of the rules of logical reasoning that justify the claims that each statement follows logically from preceding ones.
  • These two systems are really branches of one system, the entire system of math computation and reasoning. It can be obscure which system is being used in a particular step.

Suppression of reasons

The logical and symbolic-manipulation reasons justifying the deductions may not be made completely explicit. In fact, for many steps they may not be mentioned at all, and for others, one or two phrases may be used to give a hint. This is standard practice in writing “rigorous” proofs. That is a descriptive statement, made without criticism. Giving all the reasons is essentially impossible without a computer.

I am aware that some work has been done to write proof checkers that can read a theorem like the one we are considering, stated in natural language, and correctly implement the semantics I have described in this list. I don’t know of any references to such work and would appreciate information about it.

Suppression of reasons makes it difficult to mechanically check a proof written in this standard “rigorous” writing style. Basically, you must be at at least the graduate student level to be able to make sense of what is said, and even experienced math research people find it difficult to read a paper in a very different field. Writing the proof so that it can be checked by a proof checker requires understanding of the same sort, and it typically makes the proof much longer.

One hopeful new approach is to write the proofs using homotopy type theory. The pioneers in that field report that the proofs don’t expand nearly as much as is required by first order logic.

Examples of suppression

Here are many examples of suppression in the $\epsilon$-$\delta$ proof above. This is intended to raise your consciousness concerning how nearly opaque writing in math research is to anyone but the cognoscenti.

  • The first sentence of the theorem names $R$ and $f$ and puts constraints on them that can be used to justify statements in the proof. The naming of $R$ and $f$ requires that every occurrence of $R$ in the proof refers to the same mathematical object, and similarly for $f$.

Remark: The savvy reader “knows” the facts stated in (a), possibly entirely subconsciously. For many of us there is no conscious thought of constraints and permanence of naming. My goal is to convince those who teach beginning abstract math course to become conscious of these phenomena. This remark applies to all the following items as well.

  • The second sentence gives $a$ a specific meaning that will be maintained throughout the proof. It also puts constraints on $a$ and an additional constraint on $f$.
  • The third sentence gives a constraint on $R$, $f$ and $a$. It does not give a constraint on $x$, which is a bound variable. Nor does it name $x$ as a specific number with the same meaning in the rest of the proof. (That happens later).
  • The fact that the first three sentences impose constraints on various objects is signaled by the fact that the sentences are introduced by “let” and “suppose”. The savvy reader knows this.
  • The fourth sentence announces that “$f(a)$ is a maximum of $f$ in $R$” is a consequence of the constraints imposed by the preceding three sentences. (In other words, it follows from the context.) This is signaled by the word “then”.
  • The fact that the paragraph is labeled “Theorem” informs us that the fourth sentence is therefore a statement of what is to be proved, and that every constraint imposed by the first three sentences of the Theorem may be used in the proof.
  • In the proof, statements (1), (4), (5) and (6) rewrite the statements in the theorem according to the definitions of the words involved, namely “derivative: “increases”, “decreases” and “maximum”. Rewriting statements according to the definitions of the words involved is a fundamental method for starting a proof.
  • (2) follows from (1) by rewriting using the definition of “limit”. Note that pattern-matching against the definition of limit requires understanding that there is a zero inside the absolute value signs that is not written down. Could a computer proof-checker handle that?
  • (3) follows from (2). The reader or proof-checker must:
    • Know that it is acceptable to put an upper bound on $\delta$ in the definition of limit.
    • Notice that you can move $|x-a|$ out of the denominator because $x\neq a$ by (2).
  • The conclusion in (6) that we much show that $f(x)\lt f(a)$ is now the statement we must prove.

Remark: In the following items, I mention the context of the proof. I am using the word informally here. It is used in some forms of formal logic with a related but more precise meaning. The context consists of the variables you must hold in your head as you read each part of the proof, along with their current constraints. “Current” means the “now” that you are in when considering the step of the proof you are reading right now. I give some references at the end of the post.

  • At the point between (6) and (7), our context consists of $a$, $R$ and $f$ all subject to some constraints. $x$ is not yet in the context of our proof because its previous occurrences in the theorems and in (1) through (6) have been bound, mostly by an unexpressed universal quantifier. Now we are to think of $x$ as a specific number bound by some constraints.
  • The statement in (7) that the case $x\gt a$ as a symmetric proof is a much higher-level claim than the other steps in this proof, even though in fact it is not very high level compared to statements such as “An application of Serre’s spectral sequence shows$\ldots$”. Most mathematicians with even a little experience will read this statement and accept it in the confidence that they will know how to swap “$\lt$” and “$\gt$” in the proof in the correct way (which is a bit picky) to provide a dual proof. Some students might write out the dual proof to make sure they understand it (more likely because writing it out was a class assignment). I await the day that an automated proof checker can handle a statement like this.
  • (8) introduces three new math objects $\epsilon$, $\delta$ and $y$ subject to several constraints. The symbols occur earlier but they are all bound. $\epsilon$ will be fixed in our context from now until (10). The others don’t appear later.
  • (9) consists of several steps of algebraic computation. A cognoscent (I am tired of writing “savvy”) reader first looks at the computation as a whole and notices that it deduces that $|f(x)-f(a)|\lt\epsilon$, which is almost what is to be proved. This helps the reader understand the reason for the calculation. No mention whatever is made in this step of all this stuff that should go through your mind (or the proof-checker’s “mind”).
  • The computations in (9) are are basic algebra not explained step by step, except that the remark that $f(x)\lt f(y)$ explains how you get $f(x)-f(y)+f(y)-f(a) \lt f(y)-f(a)$.
  • (10) banishes $\epsilon$ from the context by universally quantifying over it. That $f(x)\leq f(a)$ follows by the garbage-dump-in-Star-Wars trick that often baffles first year analysis students: Since for all positive $\epsilon$, $f(x)\lt f(a)+\epsilon$, then $f(x)\leq f(a)$. (See also Terry Tao’s article in Tricks Wiki.)
  • (11) “By the same argument as leading up to (10)” puts some demands on the reader, who has to discover that you have to go back to (7) and do the following steps with a new context using a value of $x$ that is halfway closer to $a$ than the “old” $x$ was. This means in particular that the choice of $\frac{x-2}{2}$ is unnecessarily specific. But it works.
  • (12) suppresses the reference to (11).
  • References

    I have written extensively on these topics. Here are some links.

    Rich-rigorous bifurcation in math thinking

The symbolic language

Math English and the language of proofs

Proofs and context

Send to Kindle

A mathematical saga

This post outlines some of the intellectual developments in the history of math. I call it a saga because it is like one:

  • It is episodic, telling one story after another.
  • It does not try to give an accurate history, but its episodes resemble what happened in math in the last 3000 years.
  • It tells about only a few of the things that happened.

Early techniques

We represented numbers by symbols.

Thousands of years ago, we figured out how to write down words and phrases in such a way that someone much later could read and understand them.

Naturally, we wanted to keep records of the number of horses the Queen owned, so we came up with various notations for numbers (number representing count). In some societies, these symbols were separate from the symbols used to represent words.

We invented algorithms

We discovered positional notation. We write $213$, which is based on a system: it means $2\times100+1\times10+3\times 1$. This notation encapsulates a particular computation of a number (its base-10 representation). (The expression $190+23$ is another piece of notation that encapsulates a computation that yields $213$.)

Compare that to the Roman notation $CCXIII$, which is an only partly organized jumble.
Try adding $CCXIII+CDXXIX$. (The answer is $DCXLII$.)

Positional notation allowed us to create the straightforward method of addition involving adding single digits and carrying:
\[\overset{\hspace{6pt}1\phantom{1}}
{\frac
{\overset{\displaystyle{213}}{429}}{642}
}
\]
Measuring land requires multiplication, which positional notation also allows us to perform easily.
The invention of such algorithms (methodically manipulating symbols) made it easy to calculate with numbers.

Geometry: Direct construction of mathematical objects

We discovered geometry in ancient times, in laying out plots of land and designing buildings. We had a bunch of names for different shapes and for some of them we knew how to calculate their area, perimeter and other things.

Euclid showed how to construct new geometric figures from given ones using specific methods (ruler and compasses) that preserve some properties.

Example

We can bisect a line segment (black) by drawing two circles (blue) centered at the endpoints with radius the length of the line segment. We then construct a line segment (red) between the points of intersection of the circle that intersects the given line segment at its midpoint. These constructions can be thought of as algorithms creating and acting on geometric figures rather than on symbols.



It is true that diagrams were drawn to represent line segments, triangles and so on.
But the diagrams are visualization helpers. The way we think about the process is that we are operating directly on the geometric objects to create new ones. We are thinking of the objects Platonically, although we don’t have to share Plato’s exact concept of their reality. It is enough to say we are thinking about the objects as if they were real.

Axioms and theorems

Euclid came up with the idea that we should write down axioms that are true of these figures and constructions, so that we can systematically use the constructions
to prove theorems about figures using axioms and previously proved theorems. This provided documented reasoning (in natural language, not in symbols) for building up a collection of true statements about math objects.

Example

After creating some tools for proving triangles are congruent, we can prove the the intersection of red and black lines in the figure really is the midpoint of the black line by constructing the four green line segments below and making appeals to congruences between the triangles that show up:



Note that the green lines have the same length as the black line.

Euclid thought about axioms and theorems as applying to geometry, but he also proved theorems about numbers by representing them as ratios of line segments.

Algebra

People in ancient India and Greece knew how to solve linear and quadratic equations using verbal descriptions of what you should do.
Later, we started using a symbolic language to express numerical problems and symbolic manipulation to solve (some of) them.

Example

The quadratic formula is an encapsulated computation that provides the roots of a quadratic equation. Newton’s method is a procedure for finding a root of an arbitrary polynomial. It is recursive in the loose sense (it does not always give an answer).

The symbolic language is a vast expansion of the symbolic notation for numbers. A major innovations was to introduce variables to represent unknowns and to state equations that are always true.

Logic

Aristotle developed an early form of logic (syllogisms) aimed at determining which arguments in rhetoric were sound. “All men are mortal. Socrates is a man. Therefore Socrates is mortal.” This was written in sentences, not in symbols.

By explicit analogy with algebra, we introduced symbolism and manipulation rules for logical reasoning, with an eye toward making mathematical reasoning sound and to some extent computable. For example, in one dialect of logical notation, modus ponens (used in the Socrates syllogism) is expressed as $(P\rightarrow Q,\,P)\,\,\vdash\,\, Q$. This formula is an encapsulated algorithm: it says that if you know $P\rightarrow Q$ and $P$ are valid (are theorems) then $Q$ is valid as well.

Crises of understanding

We struggled with the notion of function as a result of dealing with infinite series. For example, the limit of a sequence of algebraic expressions may not be an algebraic expression. It would no longer do to think of a function as the same thing as an algebraic expression.

We realized that Euclid’s axioms for geometry lacked clarity. For example, as I understand it, the original version of his axioms didn’t imply that the two circles in the proof above had to intersect each other. There were other more subtle problems. Hilbert made a big effort to spell out the axioms in more detail.

We refined our understanding of logic by trying to deal with the mysteries of calculus, limits and spaces. An example is the difference between continuity and uniform continuity.
We also created infinitesimals, only to throw up our hands because we could not make a logic that fit them. Infinitesimals were temporarily replaced by the use of epsilon-delta methods.

We began to understand that there are different kinds of spaces. For example, there were other models of some of Euclid’s axioms than just Euclidean space, and some of those models showed that the parallel axiom is independent of the other axioms. And we became aware of many kinds of topological spaces and manifolds.

We started to investigate sets, in part because spaces have sets of points. Then we discovered that a perfectly innocent activity like considering the set of all sets resulted in an impossibility.

We were led to consider how to understand the Axiom of Choice from several upsetting discoveries. For example, the Banach-Tarski “paradox” implies that you can rearrange the points in a sphere of radius $1$ to make two spheres of radius $1$.

Mathematics adopts a new covenant… for awhile

These problems caused a kind of tightening up, or rigorizing.
For a period of time, less than a century, we settled into a standard way of practicing research mathematics called new math or modern math. Those names were used mostly by math educators. Research mathematicians might have called it axiomatic math based on set theory. Although I was around for the last part of that period I was not aware of any professional mathematicians calling it anything at all; it was just what we did.

First, we would come up with a new concept, type of math object, or a new theorem. In this creative process we would freely use intuition, metaphors, images and analogies.

Example

We might come up with the idea that a function reaches its maximum when its graph swoops up from the left, then goes horizontally for an infinitesimal amount of time, then swoops down to the right. The point at which it was going horizontally would obviously have to be the maximum.

But when we came to publish a paper about the subject, all these pictures would disappear. All our visual, metaphorical/conceptual and kinetic feelings that explain the phenomenon would have to be suppressed.

Rigorizing consisted of a set of practices, which I will hint at:

Orthodox behavior among mathematicians in 1950

Definition in terms of sets and axioms

Each mathematical object had to be defined in some way that started with a set and some other data defined in terms of the set. Axioms were imposed on these data. Everything had to be defined in terms of sets, including functions and relations. (Multiple sets were used occasionally.)

Definitions done in this way omit a lot of the intuition that we have concerning the object being defined.

Examples
  • The definition of group as a set with a binary operation satisfying some particular axioms does not tell you that groups constitute the essence of symmetry.
  • The definitions of equivalence relation and of partition do not even hint that they define the same concept.

Even so, definitions done in this way have an advantage: They tend to be close to minimal in the sense that to verify that something fits the definition requires checking no more (or not much more) than necessary.

Proofs had to be clearly translatable into symbolic logic

First order logic (and other sorts of logic) were well developed and proofs were written in a way that they could in principle be reduced to arguments written in the notation of symbolic logic and following the rules of inference of logic. This resulted in proofs which did not appeal to intuition, metaphors or pictures.

Example

In the case of the theorem that the maximum of a (differentiable) function occurs only where the derivative is zero, that meant epsilon-delta proofs in which the proof appeared as a thick string of symbols. Here, “thick” means it had superscripts, subscripts, and other things that gave the string a fractal dimension of about $1.2$ (just guessing!).

Example

When I was a student at Oberlin College in 1959, Fuzzy Vance (Elbridge P. Vance) would sometimes stop in the middle of an epsilon-delta proof and draw pictures and provide intuition. Before he started that he would say “Shut the door, don’t tell anyone”. (But he told us!)

Example

A more famous example of this is the story that Oscar Zariski, when presenting a proof in algebraic geometry at the board, would sometimes remind himself of a part of a proof by hunching over the board so the students couldn’t see what he was doing and drawing a diagram which he would immediately erase. (I fault him for not telling them about the diagram.)

It doesn’t matter whether this story is true or not. It is true in the sense that any good myth is true.

Commercial

I wrote about rigor in these articles:

Rigorous view in abstractmath.org.

Dry bones, post in this blog.

Logic and sets clarify but get in the way

The orthodox method of “define it by sets and axioms” and “makes proofs at least resemble first order logic” clarified a lot of suspect proofs. But it got in the way of intuition and excessive teaching by using proofs made it harder to students to learn.

  • The definition of a concept can make you think of things that are foreign to your intuition of the concept. A function is a mapping,. The ordered pairs are a secondary construction; you should not think of ordered pairs as essential to your intuition. Even so the definition of function in terms of ordered pairs got rid of a lot of cobwebs.
  • The cartesian product of sets is obviously an associative binary operation. Except that if you define the cartesian product of sets in terms of ordered pairs then it is not associative.
  • Not only that, but if you define the ordered pair $(a,b)$ as $\{\{a,b\},a\}$ the you have to say that $a$ is an element of $(a,b)$ but $b$ is not That is not merely an inconvenient definition of ordered pair, it is wrong. It is not bad way to show that the concept of ordered pair is consistent with ZF set theory, but that is a minor point mathematicians hardly ever worry about.

Mathematical methods applied to everything

The early methods described at the beginning of this post began to be used everywhere in math.

Algorithms on symbols

Algorithms, or methodical procedures, began with the addition and multiplication algorithms and Euclid’s ruler and compass constructions, but they began to be used everywhere.

They are applied to the symbols of math, for example to describe rules for calculating derivatives and integrals and for summing infinite series.

Algorithms are used on strings, arrays and diagrams of math symbols, for example concatenating lists, multiplying matrices, and calculating binary operations on trees.

Algorithms as definitions

Algorithms are used to define the strings that make up the notation of symbolic logic. Such definitions include something like: “If $E$ and $F$ are expressions than $(E)\land (F)$ and $(\forall x)(E)$ are expressions”. So if $E$ is “$x\geq 3$” then $(\forall x)(x\geq 3)$ is an expression. This had the effect of turning an expression in symbolic logic into a mathematical object. Deduction rules such as “$E\land F\vdash E$” also become mathematical objects in this way.

We can define the symbols and expressions of algebra, calculus, and other part of math using algorithms, too. This became a big deal when computer algebra programs such as Mathematica came in.

Example

You can define the set $RP$ of real polynomials this way:

  • $0\in RP$
  • If $p\in RP$ then $p+r x^n\in RP$, where $x$ is a variable and $n$ a nonnegative integer.

That is a recursive definition. You can also define polynomials by pattern recognition:

Let $n$ be a positive integer, $a_0,\,a_1\,\ldots a_n$ be real numbers and $k_0,\,k_1\,\ldots k_n$ be nonnegative integers. Then $a_0 x^{k_0}+a_1 x^{k_1}+\ldots+ a_n x^{k_n}$ is a polynomial.

The recursive version is a way of letting a compiler discover that a string of symbols is a polynomial. That sort of thing became a Big Deal when computers arrived in our world.

Algorithms on mathematical objects

I am using the word “algorithm” in a loose sense to mean any computation that may or may not give a result. Computer programs are algorithms, but so is the quadratic formula. You might not think of a formula as an algorithm, but that is because if you use it in a computer program you just type in the formula; the language compiler has a built-in algorithm to execute calculations given by formulas.

It has not been clearly understood that mathematicians apply algorithms not only to symbols, but also directly to mathematical objects. Socrates thought that way long ago, as I described in the construction of a midpoint above. The procedure says “draw circles with center at the endpoints of the line segment.” It doesn’t say “draw pictures of circles…”

In the last section and this one, I am talking about how we think of applying an algorithm. Socrates thought he was talking about ideal lines and circles that exist in some other universe that we can access by thought. We can think about them as real things without making a metaphysical claim like Socrates did about them. Our brains are wired to think of abstract ideas in some many of the same ways we think about physical objects.

Example

The unit circle (as a topological space at least) is the quotient space of the space $\mathbb{R}$ of real numbers mod the equivalence relation defined by: $x\sim y$ if and only if $x-y$ is an integer.

Mathematicians who understand that construction may have various images in their mind when they read this. One would be something like imagining the real line $\mathbb{R}$ and then gluing all the points together that are an integer apart. This is a distinctly dizzying thing to think about but mathematicians aren’t worried because they know that taking the quotient of a space is a well-understood construction that works. They might check that by imagining the unit circle as the real line wrapped around an infinite number of times, with points an integer apart corresponding to the same point on the unit circle. (When I did that check I hastily inserted the parenthetical remark saying “as a topological space” because I realized the construction doesn’t preserve the metric.) The point of this paragraph is that many mathematicians think of this construction as a construction on math objects, not a construction on symbols.

Everything is a mathematical object

A lot of concepts start out as semi-vague ideas and eventually get defined as mathematical objects.

Examples

  • A function was originally thought of as a formula, but then get formalized in the days of orthodoxy as a set of ordered pairs with the functional property.
  • The concept of equation has been turned into a math object many times, for example in universal algebra and in logic. I suspect that some practitioners in those fields might disagree with me. This requires further research.
  • Propositions are turned into math objects by Boolean Algebra.
  • Perhaps numbers were always thought of as math objects, but much later the set $\mathbb{N}$ of all natural numbers and the set $\mathbb{R}$ of all real numbers came to be thought of explicitly as math objects, causing some mathematicians to have hissy fits.
  • Definitions are math objects. This has been done in various ways. A particular theory is a mathematical object, and it is essentially a definition by definition (!): Its models are what the theory defines. A particular example of “theory” is first-order theory which was the gold standard in the orthodox era. A classifying topos is also a math object that is essentially a definition.

Category Theory

The introduction of categories broke the orthodoxy of everything-is-a-set. It has become widely used as a language in many branches of math. It started with problems in homological algebra arising in at least these two ways:

  • Homotopy classes of continuous functions are not functions in the set theory sense. So we axiomatized the concept of function as an arrow (morphism) in a category.
  • The concept of mathematical object is axiomatized as an object in a category. This forces all properties of an object to be expressed in terms of its external relations with other objects and arrows.
  • Categories capture the idea of “kind of math”. There is a category of groups and homomorphisms, a category of topological spaces and homeomorphisms, and so on. This is a new level of abstraction. Before, if someone said “I work in finite groups”, their field was a clear idea and people knew what they were talking about, but now the category of finite groups is a mathematical object.
  • Homology maps one kind of math (topology) into another kind (algebra). Since categories capture the general notion of “kind of math”, we invented the idea of functor to capture the idea of modeling or representing one branch of math in another one. So Homology became a mathematical object.
  • The concept of functor allowed the definition of natural transformation as a mathematical object. Before categories, naturality was only an informal idea.

Advantages of category theory

  • Categories, in the form of toposes, quickly became candidates to replace set theory as a foundation system for math. They are more flexible and allow the kind of logic you want to use (classical, intuitionistic and others) to be a parameter in your foundational system.
  • “Arrow” (morphism) replaced not only the concept of function but also the concept of “element of” (“$\in$”). It allows the concept of variable elements. (This link is to a draft of a section of abstractmath.org that has not been included in the table of contents yet.) It also requires that an “element” has to be an element of one single object; for example, the maps $1\to \mathbb{N}$ and $1\to \mathbb{R}$ that pick out the number $42$ are not the same maps, although of course they are related by the canonical inclusion map $\mathbb{N}\to\mathbb{R}$.
  • Diagrams are used in proofs and give much better immediate understanding than formulas written in strings, which compress a lot of things unnecessarily into thick strings that require understanding lots of conventions and holding things in your memory.
  • Categories-as-kinds-of-math makes it easy to turn an analogy, for example between products of groups and products of spaces, into two examples of the same kind of mathematical object: Namely, a product in a category.

Disadvantages of category theory

  • Category theory requires a new way of thinking. Some people think that is a disadvantage. But genuine innovation is always disruptive. New technology breaks old technology. Of course, the new technology has to turn out to be useful to win out.
  • Category theory has several notions of “equal”. Objects can be the same or isomorphic. Categories can be isomorphic or equivalent. When you are doing category theory, you should never worry about whether two objects are equal: that is considered evil. Category theorists generally ignored the fuzziness of this problem because you can generally get away with it. Still, it was an example of something that had not been turned into a mathematical definition. Two ways of accomplishing this are anafunctors and homotopy type theory.

I expect to write about homotopy type theory soon. It may be the Next Revolution.

Send to Kindle