
A mathematical saga

This post outlines some of the intellectual developments in the history of math. I call it a saga because it is like one:

  • It is episodic, telling one story after another.
  • It does not try to give an accurate history, but its episodes resemble what happened in math in the last 3000 years.
  • It tells about only a few of the things that happened.

Early techniques

We represented numbers by symbols.

Thousands of years ago, we figured out how to write down words and phrases in such a way that someone much later could read and understand them.

Naturally, we wanted to keep records of the number of horses the Queen owned, so we came up with various notations for numbers (number representing count). In some societies, these symbols were separate from the symbols used to represent words.

We invented algorithms

We discovered positional notation. We write $213$, which is based on a system: it means $2\times100+1\times10+3\times 1$. This notation encapsulates a particular computation of a number (its base-10 representation). (The expression $190+23$ is another piece of notation that encapsulates a computation that yields $213$.)

Compare that to the Roman notation $CCXIII$, which is an only partly organized jumble.
Try adding $CCXIII+CDXXIX$. (The answer is $DCXLII$.)

Positional notation allowed us to create the straightforward method of addition involving adding single digits and carrying:
\[\overset{\hspace{6pt}1\phantom{1}}
{\frac
{\overset{\displaystyle{213}}{429}}{642}
}
\]
Measuring land requires multiplication, which positional notation also allows us to perform easily.
The invention of such algorithms (methodically manipulating symbols) made it easy to calculate with numbers.
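To make the carrying method concrete, here is a minimal sketch in Python of schoolbook addition on base-10 numerals. The function name `add_positional` is my own choice for illustration; the point is only that the procedure manipulates digit symbols column by column.

```python
def add_positional(a: str, b: str) -> str:
    """Add two base-10 numerals digit by digit, carrying as in the schoolbook method."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad the shorter numeral with leading zeros
    digits = []
    carry = 0
    # Work from the rightmost (ones) column to the leftmost.
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))  # digit written in this column
        carry = total // 10             # digit carried to the next column
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))


print(add_positional("213", "429"))  # 642
```

Nothing comparable is this simple for Roman numerals, which is the contrast the example with $CCXIII+CDXXIX$ is meant to show.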

Geometry: Direct construction of mathematical objects

We discovered geometry in ancient times, in laying out plots of land and designing buildings. We had a bunch of names for different shapes and for some of them we knew how to calculate their area, perimeter and other things.

Euclid showed how to construct new geometric figures from given ones using specific methods (ruler and compasses) that preserve some properties.

Example

We can bisect a line segment (black) by drawing two circles (blue) centered at the endpoints with radius the length of the line segment. We then construct a line segment (red) between the two points where the circles intersect; it crosses the given line segment at its midpoint. These constructions can be thought of as algorithms creating and acting on geometric figures rather than on symbols.
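As a rough check of this construction, the sketch below carries it out in coordinates. This is an assumption-laden translation: Euclid worked with the figures themselves, not with coordinates, and the helper names (`circle_intersections`, `line_intersection`, `bisect`) are mine.

```python
import math


def circle_intersections(c0, c1, r):
    """Two intersection points of circles of equal radius r centered at c0 and c1."""
    (x0, y0), (x1, y1) = c0, c1
    d = math.hypot(x1 - x0, y1 - y0)
    if d == 0 or d >= 2 * r:
        raise ValueError("circles do not intersect in exactly two points")
    a = d / 2                         # distance from c0 to the common chord
    h = math.sqrt(r * r - a * a)      # half the length of the common chord
    fx, fy = x0 + a * (x1 - x0) / d, y0 + a * (y1 - y0) / d
    nx, ny = -(y1 - y0) / d, (x1 - x0) / d   # unit vector perpendicular to c0c1
    return (fx + h * nx, fy + h * ny), (fx - h * nx, fy - h * ny)


def line_intersection(p, q, a, b):
    """Point where line pq meets line ab (assumes the lines are not parallel)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p, q, a, b
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))


def bisect(a, b):
    """Midpoint of segment ab via the two-circle construction."""
    r = math.hypot(b[0] - a[0], b[1] - a[1])   # radius = length of the segment
    p, q = circle_intersections(a, b, r)       # the two blue circles
    return line_intersection(p, q, a, b)       # red line meets black line


print(bisect((0.0, 0.0), (4.0, 2.0)))  # approximately (2.0, 1.0)
```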



It is true that diagrams were drawn to represent line segments, triangles and so on.
But the diagrams are visualization helpers. The way we think about the process is that we are operating directly on the geometric objects to create new ones. We are thinking of the objects Platonically, although we don’t have to share Plato’s exact concept of their reality. It is enough to say we are thinking about the objects as if they were real.

Axioms and theorems

Euclid came up with the idea that we should write down axioms that are true of these figures and constructions, so that we can systematically use the constructions
to prove theorems about figures using axioms and previously proved theorems. This provided documented reasoning (in natural language, not in symbols) for building up a collection of true statements about math objects.

Example

After creating some tools for proving triangles are congruent, we can prove that the intersection of the red and black lines in the figure really is the midpoint of the black line by constructing the four green line segments below and making appeals to congruences between the triangles that show up:



Note that the green lines have the same length as the black line.

Euclid thought about axioms and theorems as applying to geometry, but he also proved theorems about numbers by representing them as ratios of line segments.

Algebra

People in ancient India and Greece knew how to solve linear and quadratic equations using verbal descriptions of what you should do.
Later, we started using a symbolic language to express numerical problems and symbolic manipulation to solve (some of) them.

Example

The quadratic formula is an encapsulated computation that provides the roots of a quadratic equation. Newton’s method is a procedure for finding a root of an arbitrary polynomial. It is recursive in the loose sense (it does not always give an answer).
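The difference between the two can be sketched in Python. This is only an illustration under my own assumptions: coefficients are listed from the highest degree down, only real roots are reported, and the names `quadratic_roots`, `horner` and `newton_root` are hypothetical.

```python
import math


def quadratic_roots(a, b, c):
    """Real roots of a*x**2 + b*x + c = 0 via the quadratic formula (a != 0)."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return ()  # no real roots
    s = math.sqrt(disc)
    return ((-b - s) / (2 * a), (-b + s) / (2 * a))


def horner(coeffs, x):
    """Evaluate a polynomial given by coefficients, highest degree first."""
    value = 0.0
    for c in coeffs:
        value = value * x + c
    return value


def newton_root(coeffs, x0, tol=1e-12, max_steps=50):
    """Try to find a real root by Newton's method; return None if it fails."""
    n = len(coeffs) - 1
    deriv = [c * (n - i) for i, c in enumerate(coeffs[:-1])]  # derivative coefficients
    x = x0
    for _ in range(max_steps):
        fx = horner(coeffs, x)
        if abs(fx) < tol:
            return x
        slope = horner(deriv, x)
        if slope == 0:
            return None      # horizontal tangent: the step is undefined
        x -= fx / slope
    return None              # did not converge within max_steps


print(quadratic_roots(1, -5, 4))      # (1.0, 4.0)
print(newton_root([1, -5, 4], 0.0))   # a value very close to 1.0
print(newton_root([1, 0, 1], 0.0))    # None: x**2 + 1 has no real root
```

The formula always returns in one step, while the iterative procedure may give up, which is the loose sense of "recursive" meant here.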

The symbolic language is a vast expansion of the symbolic notation for numbers. A major innovation was to introduce variables to represent unknowns and to state equations that are always true.

Logic

Aristotle developed an early form of logic (syllogisms) aimed at determining which arguments in rhetoric were sound. “All men are mortal. Socrates is a man. Therefore Socrates is mortal.” This was written in sentences, not in symbols.

By explicit analogy with algebra, we introduced symbolism and manipulation rules for logical reasoning, with an eye toward making mathematical reasoning sound and to some extent computable. For example, in one dialect of logical notation, modus ponens (used in the Socrates syllogism) is expressed as $(P\rightarrow Q,\,P)\,\,\vdash\,\, Q$. This formula is an encapsulated algorithm: it says that if you know $P\rightarrow Q$ and $P$ are valid (are theorems) then $Q$ is valid as well.
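For readers who like to see such a rule mechanized, here is modus ponens written as a one-line statement in Lean. This is just one possible rendering in a proof assistant, not the dialect of notation used above; the hypothesis names `hPQ` and `hP` are arbitrary.

```lean
-- From a proof of P → Q and a proof of P, produce a proof of Q.
example (P Q : Prop) (hPQ : P → Q) (hP : P) : Q := hPQ hP
```

The "algorithm" is function application: the proof of $P\rightarrow Q$ is applied to the proof of $P$.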

Crises of understanding

We struggled with the notion of function as a result of dealing with infinite series. For example, the limit of a sequence of algebraic expressions may not be an algebraic expression. It would no longer do to think of a function as the same thing as an algebraic expression.

We realized that Euclid’s axioms for geometry lacked clarity. For example, as I understand it, the original version of his axioms didn’t imply that the two circles in the proof above had to intersect each other. There were other more subtle problems. Hilbert made a big effort to spell out the axioms in more detail.

We refined our understanding of logic by trying to deal with the mysteries of calculus, limits and spaces. An example is the difference between continuity and uniform continuity.
We also created infinitesimals, only to throw up our hands because we could not make a logic that fit them. Infinitesimals were temporarily replaced by the use of epsilon-delta methods.

We began to understand that there are different kinds of spaces. For example, there were other models of some of Euclid’s axioms than just Euclidean space, and some of those models showed that the parallel axiom is independent of the other axioms. And we became aware of many kinds of topological spaces and manifolds.

We started to investigate sets, in part because spaces have sets of points. Then we discovered that a perfectly innocent activity like considering the set of all sets resulted in an impossibility.

We were led to consider how to understand the Axiom of Choice from several upsetting discoveries. For example, the Banach-Tarski “paradox” implies that you can rearrange the points in a sphere of radius $1$ to make two spheres of radius $1$.

Mathematics adopts a new covenant… for a while

These problems caused a kind of tightening up, or rigorizing.
For a period of time, less than a century, we settled into a standard way of practicing research mathematics called new math or modern math. Those names were used mostly by math educators. Research mathematicians might have called it axiomatic math based on set theory. Although I was around for the last part of that period I was not aware of any professional mathematicians calling it anything at all; it was just what we did.

First, we would come up with a new concept, type of math object, or a new theorem. In this creative process we would freely use intuition, metaphors, images and analogies.

Example

We might come up with the idea that a function reaches its maximum when its graph swoops up from the left, then goes horizontally for an infinitesimal amount of time, then swoops down to the right. The point at which it was going horizontally would obviously have to be the maximum.

But when we came to publish a paper about the subject, all these pictures would disappear. All our visual, metaphorical/conceptual and kinetic feelings that explain the phenomenon would have to be suppressed.

Rigorizing consisted of a set of practices, which I will hint at:

Orthodox behavior among mathematicians in 1950

Definition in terms of sets and axioms

Each mathematical object had to be defined in some way that started with a set and some other data defined in terms of the set. Axioms were imposed on these data. Everything had to be defined in terms of sets, including functions and relations. (Multiple sets were used occasionally.)

Definitions done in this way omit a lot of the intuition that we have concerning the object being defined.

Examples
  • The definition of group as a set with a binary operation satisfying some particular axioms does not tell you that groups constitute the essence of symmetry.
  • The definitions of equivalence relation and of partition do not even hint that they define the same concept.

Even so, definitions done in this way have an advantage: They tend to be close to minimal in the sense that to verify that something fits the definition requires checking no more (or not much more) than necessary.
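The second example above can be made concrete for finite sets. The sketch below, with hypothetical function names, converts an equivalence relation into a partition and back; it assumes the relation passed in really is an equivalence relation.

```python
def partition_from_relation(elements, related):
    """Group elements into classes, assuming `related` is an equivalence relation."""
    classes = []
    for x in elements:
        for cls in classes:
            # Comparing with one representative suffices for an equivalence relation.
            if related(x, next(iter(cls))):
                cls.add(x)
                break
        else:
            classes.append({x})   # x starts a new class
    return {frozenset(cls) for cls in classes}


def relation_from_partition(blocks):
    """Return the relation 'x and y lie in the same block'."""
    def related(x, y):
        return any(x in block and y in block for block in blocks)
    return related


same_parity = lambda x, y: (x - y) % 2 == 0
blocks = partition_from_relation(range(6), same_parity)
back = relation_from_partition(blocks)
print(all(back(x, y) == same_parity(x, y) for x in range(6) for y in range(6)))  # True
```

Going from relation to partition and back returns the relation we started with, even though neither definition mentions the other.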

Proofs had to be clearly translatable into symbolic logic

First order logic (and other sorts of logic) were well developed and proofs were written in a way that they could in principle be reduced to arguments written in the notation of symbolic logic and following the rules of inference of logic. This resulted in proofs which did not appeal to intuition, metaphors or pictures.

Example

In the case of the theorem that the maximum of a (differentiable) function occurs only where the derivative is zero, that meant epsilon-delta proofs in which the proof appeared as a thick string of symbols. Here, “thick” means it had superscripts, subscripts, and other things that gave the string a fractal dimension of about $1.2$ (just guessing!).

Example

When I was a student at Oberlin College in 1959, Fuzzy Vance (Elbridge P. Vance) would sometimes stop in the middle of an epsilon-delta proof and draw pictures and provide intuition. Before he started that he would say “Shut the door, don’t tell anyone”. (But he told us!)

Example

A more famous example of this is the story that Oscar Zariski, when presenting a proof in algebraic geometry at the board, would sometimes remind himself of a part of a proof by hunching over the board so the students couldn’t see what he was doing and drawing a diagram which he would immediately erase. (I fault him for not telling them about the diagram.)

It doesn’t matter whether this story is true or not. It is true in the sense that any good myth is true.

Commercial

I wrote about rigor in these articles:

Rigorous view in abstractmath.org.

Dry bones, post in this blog.

Logic and sets clarify but get in the way

The orthodox method of “define it by sets and axioms” and “make proofs at least resemble first order logic” clarified a lot of suspect proofs. But it got in the way of intuition, and excessive teaching by means of proofs made it harder for students to learn.

  • The definition of a concept can make you think of things that are foreign to your intuition of the concept. A function is a mapping. The ordered pairs are a secondary construction; you should not think of ordered pairs as essential to your intuition. Even so, the definition of function in terms of ordered pairs got rid of a lot of cobwebs.
  • The cartesian product of sets is obviously an associative binary operation. Except that if you define the cartesian product of sets in terms of ordered pairs then it is not associative.
  • Not only that, but if you define the ordered pair $(a,b)$ as $\{\{a,b\},a\}$ then you have to say that $a$ is an element of $(a,b)$ but $b$ is not. That is not merely an inconvenient definition of ordered pair, it is wrong. It is not a bad way to show that the concept of ordered pair is consistent with ZF set theory, but that is a minor point mathematicians hardly ever worry about.
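The last two points can be checked mechanically. In this sketch Python tuples stand in for ordered pairs and `frozenset` stands in for sets; the function `pair` is a hypothetical encoding of the definition $\{\{a,b\},a\}$ quoted above.

```python
from itertools import product

A, B, C = {1}, {2}, {3}

# (A x B) x C and A x (B x C), built literally from ordered pairs.
left = set(product(set(product(A, B)), C))    # contains ((1, 2), 3)
right = set(product(A, set(product(B, C))))   # contains (1, (2, 3))
print(left == right)  # False


def pair(a, b):
    """The ordered pair (a, b) encoded as the set {{a, b}, a}."""
    return frozenset({frozenset({a, b}), a})


p = pair(1, 2)
print(1 in p, 2 in p)  # True False
```

The two products are in an obvious bijection, which is why intuition says the operation is associative, but as sets of pairs they are not equal.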

Mathematical methods applied to everything

The early methods described at the beginning of this post began to be used everywhere in math.

Algorithms on symbols

Algorithms, or methodical procedures, began with the addition and multiplication algorithms and Euclid’s ruler and compass constructions, but they began to be used everywhere.

They are applied to the symbols of math, for example to describe rules for calculating derivatives and integrals and for summing infinite series.

Algorithms are used on strings, arrays and diagrams of math symbols, for example concatenating lists, multiplying matrices, and calculating binary operations on trees.

Algorithms as definitions

Algorithms are used to define the strings that make up the notation of symbolic logic. Such definitions include something like: “If $E$ and $F$ are expressions then $(E)\land (F)$ and $(\forall x)(E)$ are expressions”. So if $E$ is “$x\geq 3$” then $(\forall x)(x\geq 3)$ is an expression. This had the effect of turning an expression in symbolic logic into a mathematical object. Deduction rules such as “$E\land F\vdash E$” also become mathematical objects in this way.

We can define the symbols and expressions of algebra, calculus, and other parts of math using algorithms, too. This became a big deal when computer algebra programs such as Mathematica came in.

Example

You can define the set $RP$ of real polynomials this way:

  • $0\in RP$
  • If $p\in RP$ then $p+r x^n\in RP$, where $r$ is a real number, $x$ is a variable and $n$ is a nonnegative integer.

That is a recursive definition. You can also define polynomials by pattern recognition:

Let $n$ be a positive integer, $a_0,\,a_1,\,\ldots,\,a_n$ be real numbers and $k_0,\,k_1,\,\ldots,\,k_n$ be nonnegative integers. Then $a_0 x^{k_0}+a_1 x^{k_1}+\ldots+ a_n x^{k_n}$ is a polynomial.

The recursive version is a way of letting a compiler discover that a string of symbols is a polynomial. That sort of thing became a Big Deal when computers arrived in our world.
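Here is a minimal sketch of how a program might use the recursive definition to recognize a polynomial. The concrete syntax is my own assumption, not anything standard: terms are written like `3x^2` or `-1.5x^0` with an explicit coefficient and exponent, and there are no spaces.

```python
import re

# Hypothetical concrete syntax for a single term r x^n.
TERM = re.compile(r"-?\d+(\.\d+)?x\^\d+")


def is_polynomial(s: str) -> bool:
    """Recognize strings built by the rules: "0" is in RP; if p is in RP so is p+rx^n."""
    if s == "0":
        return True                        # base case: 0 is in RP
    head, plus, term = s.rpartition("+")   # split off the last term
    return (
        plus == "+"
        and TERM.fullmatch(term) is not None
        and is_polynomial(head)            # recursive case: head must be in RP
    )


print(is_polynomial("0+3x^2+-1.5x^0"))  # True
print(is_polynomial("3x^2"))            # False: not built up from 0
```

The recognizer follows the two clauses of the definition one for one, which is what makes recursive definitions convenient for machines.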

Algorithms on mathematical objects

I am using the word “algorithm” in a loose sense to mean any computation that may or may not give a result. Computer programs are algorithms, but so is the quadratic formula. You might not think of a formula as an algorithm, but that is because if you use it in a computer program you just type in the formula; the language compiler has a built-in algorithm to execute calculations given by formulas.

It has not been clearly understood that mathematicians apply algorithms not only to symbols, but also directly to mathematical objects. Euclid thought that way long ago, as I described in the construction of a midpoint above. The procedure says “draw circles with center at the endpoints of the line segment.” It doesn’t say “draw pictures of circles…”

In the last section and this one, I am talking about how we think of applying an algorithm. Plato thought he was talking about ideal lines and circles that exist in some other universe that we can access by thought. We can think about them as real things without making a metaphysical claim like Plato did about them. Our brains are wired to think of abstract ideas in many of the same ways we think about physical objects.

Example

The unit circle (as a topological space at least) is the quotient space of the space $\mathbb{R}$ of real numbers mod the equivalence relation defined by: $x\sim y$ if and only if $x-y$ is an integer.

Mathematicians who understand that construction may have various images in their mind when they read this. One would be something like imagining the real line $\mathbb{R}$ and then gluing all the points together that are an integer apart. This is a distinctly dizzying thing to think about but mathematicians aren’t worried because they know that taking the quotient of a space is a well-understood construction that works. They might check that by imagining the unit circle as the real line wrapped around an infinite number of times, with points an integer apart corresponding to the same point on the unit circle. (When I did that check I hastily inserted the parenthetical remark saying “as a topological space” because I realized the construction doesn’t preserve the metric.) The point of this paragraph is that many mathematicians think of this construction as a construction on math objects, not a construction on symbols.
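The wrapping picture can be checked numerically. In this sketch the map $x\mapsto(\cos 2\pi x,\ \sin 2\pi x)$ is the usual way to wrap the line around the circle; the function names are hypothetical, and floating point makes the comparison approximate.

```python
import math


def to_circle(x: float):
    """Send a real number to the point of the unit circle it wraps onto."""
    angle = 2 * math.pi * x
    return (math.cos(angle), math.sin(angle))


def same_class(x: float, y: float) -> bool:
    """x ~ y if and only if x - y is an integer."""
    return float(x - y).is_integer()


def close(p, q, tol=1e-9):
    """Approximate equality of two points in the plane."""
    return abs(p[0] - q[0]) < tol and abs(p[1] - q[1]) < tol


print(same_class(0.25, 3.25), close(to_circle(0.25), to_circle(3.25)))  # True True
print(same_class(0.25, 0.5), close(to_circle(0.25), to_circle(0.5)))    # False False
```

Equivalent numbers land on the same point and inequivalent ones do not, so the map factors through the quotient. Note that the code, unlike the mental image, is operating on symbols (floating point numbers), not on the spaces themselves.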

Everything is a mathematical object

A lot of concepts start out as semi-vague ideas and eventually get defined as mathematical objects.

Examples

  • A function was originally thought of as a formula, but then got formalized in the days of orthodoxy as a set of ordered pairs with the functional property.
  • The concept of equation has been turned into a math object many times, for example in universal algebra and in logic. I suspect that some practitioners in those fields might disagree with me. This requires further research.
  • Propositions are turned into math objects by Boolean Algebra.
  • Perhaps numbers were always thought of as math objects, but much later the set $\mathbb{N}$ of all natural numbers and the set $\mathbb{R}$ of all real numbers came to be thought of explicitly as math objects, causing some mathematicians to have hissy fits.
  • Definitions are math objects. This has been done in various ways. A particular theory is a mathematical object, and it is essentially a definition by definition (!): Its models are what the theory defines. A particular example of “theory” is first-order theory which was the gold standard in the orthodox era. A classifying topos is also a math object that is essentially a definition.

Category Theory

The introduction of categories broke the orthodoxy of everything-is-a-set. Category theory has become widely used as a language in many branches of math. It started with problems in homological algebra arising in ways such as these:

  • Homotopy classes of continuous functions are not functions in the set theory sense. So we axiomatized the concept of function as an arrow (morphism) in a category.
  • The concept of mathematical object is axiomatized as an object in a category. This forces all properties of an object to be expressed in terms of its external relations with other objects and arrows.
  • Categories capture the idea of “kind of math”. There is a category of groups and homomorphisms, a category of topological spaces and continuous maps, and so on. This is a new level of abstraction. Before, if someone said “I work in finite groups”, their field was a clear idea and people knew what they were talking about, but now the category of finite groups is a mathematical object.
  • Homology maps one kind of math (topology) into another kind (algebra). Since categories capture the general notion of “kind of math”, we invented the idea of functor to capture the idea of modeling or representing one branch of math in another one. So Homology became a mathematical object.
  • The concept of functor allowed the definition of natural transformation as a mathematical object. Before categories, naturality was only an informal idea.

Advantages of category theory

  • Categories, in the form of toposes, quickly became candidates to replace set theory as a foundation system for math. They are more flexible and allow the kind of logic you want to use (classical, intuitionistic and others) to be a parameter in your foundational system.
  • “Arrow” (morphism) replaced not only the concept of function but also the concept of “element of” (“$\in$”). It allows the concept of variable elements. (This link is to a draft of a section of abstractmath.org that has not been included in the table of contents yet.) It also requires that an “element” has to be an element of one single object; for example, the maps $1\to \mathbb{N}$ and $1\to \mathbb{R}$ that pick out the number $42$ are not the same maps, although of course they are related by the canonical inclusion map $\mathbb{N}\to\mathbb{R}$.
  • Diagrams are used in proofs and give much better immediate understanding than formulas written in strings, which compress a lot of things unnecessarily into thick strings that require understanding lots of conventions and holding things in your memory.
  • Categories-as-kinds-of-math makes it easy to turn an analogy, for example between products of groups and products of spaces, into two examples of the same kind of mathematical object: Namely, a product in a category.

Disadvantages of category theory

  • Category theory requires a new way of thinking. Some people think that is a disadvantage. But genuine innovation is always disruptive. New technology breaks old technology. Of course, the new technology has to turn out to be useful to win out.
  • Category theory has several notions of “equal”. Objects can be the same or isomorphic. Categories can be isomorphic or equivalent. When you are doing category theory, you should never worry about whether two objects are equal: that is considered evil. Category theorists generally ignored the fuzziness of this problem because you can generally get away with it. Still, it was an example of something that had not been turned into a mathematical definition. Two ways of accomplishing this are anafunctors and homotopy type theory.

I expect to write about homotopy type theory soon. It may be the Next Revolution.


Semantics of algebra I

Note: This post uses MathJax. If you see mathematical formulas with dollar signs around them, or badly formatted formulas, try refreshing the screen. Sometimes you have to do it two or three times.

In the post Algebra is a difficult foreign language  I listed some of the difficulties of the syntax of the symbolic language of math (which includes high school algebra and precalculus).  The semantics causes difficulties as well.  Again I will list some examples without any attempt at completeness.

The status of the symbolic language as a language

There is a sharp distinction between the symbolic language of math and mathematical English, which I have written about in The languages of math and in the Handbook of mathematical discourse. Other authors do not make this sharp distinction (see the list of references at the end of this post). The symbolic language occurs embedded in mathematical English and the embedding has its own semantics which may cause great difficulty for students.

The symbolic language of math can be described as a natural formal language. Pieces of it were invented by mathematicians and others over the course of the last several hundred years. Individual pieces (notation such as "$3x+1=2y$") can be given a strictly formal syntax, but the whole system is ambiguous, inconsistent, and context-sensitive.  When you get to the research level, it has many dialects: Research mathematicians in one field may not be able to read research articles in a very different field.

Examples

I think the examples below will make these claims plausible.  This should be the subject of deep research.

Superscripts and functions

  • A superscript, as in $5^2$ or $x^3$, has a pretty standard meaning denoting a power, at least until you get to higher level stuff such as tensors.  
  • A function can be denoted by a letter, symbol, or string, and the notation $f(x)$ refers to the value of the function at input $x$.  

For functions defined on numbers, it is common in precalculus and higher to write $f^2(x)$ to denote $(f(x))^2=f(x)\,f(x)$.  Since the values of certain multiletter functions are commonly written without the parentheses (for example, $\sin\,x$), one writes $\sin^2x$ to mean $(\sin\,x)^2$.

The notation $f^n$ is also widely used to mean the $n$th iterate of $f$ (if it exists), so $f^3(x)=f(f(f(x)))$ and so on.  This leads naturally to writing $f^{-1}(x)$ for the inverse function of $f$; this notation is common whether or not $f$ is bijective (when it is not, $f^{-1}$ is set-valued).  Thus $\sin^{-1}x$ means $\arcsin\,x$.

It is notorious that words in mathematical English have different meanings in different texts.  This is an example in the symbolic language (and not just at the research level) of a systematic construction that can give expressions that have ambiguous meanings.

This phenomenon is an example of why I say the symbolic language of math is a natural formal language: I have described a natural extension of notation used with multiplication of values that has been extended to being used for the binary operation of composition.  And that leads to students thinking that $\sin^{-1}x$ means $\frac{1}{\sin\,x}$. 
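The two readings of the superscript are easy to separate once they are given different names, which is what a programming language forces on you. In this sketch `power` and `iterate` are hypothetical helper names for the two meanings of $f^n$.

```python
import math


def power(f, n):
    """f^n read as a power of the value: x -> (f(x))**n."""
    return lambda x: f(x) ** n


def iterate(f, n):
    """f^n read as n-fold composition: x -> f(f(...f(x)...))."""
    def g(x):
        for _ in range(n):
            x = f(x)
        return x
    return g


x = 0.5
print(power(math.sin, 2)(x))     # (sin x)^2
print(iterate(math.sin, 2)(x))   # sin(sin x)
print(power(math.sin, -1)(x))    # 1 / sin x, the student's reading of sin^{-1} x
print(math.asin(x))              # arcsin x, the intended reading of sin^{-1} x
```

The four printed numbers are all different, so the notation alone cannot tell you which one is meant.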

History can overtake notation, too: Mathematicians probably took to writing $\sin\,x$ instead of $\sin(x)$ because it saves writing.  That was not very misleading in the old days when mathematical variables were always single symbols.  But students see multiletter variable names all the time these days (in programming languages, Excel and elsewhere), so of course some of them think $\sin\,x$ means $\sin$ times $x$. People who do this are not idiots.

Juxtaposition

Juxtaposition of two symbols means many different things.

  • If $m$ and $n$ are numbers, $mn$ denotes the product of the two numbers.
    • Multiplication is commutative, so $mn$ and $nm$ denote the same number, but they correspond to different calculations.  
  • If $M$ and $N$ are matrices, $MN$ denotes the matrix product of the two matrices.
    • This is a binary operation but it is not the same operation denoted by juxtaposition of numbers. (In fact it involves both addition and multiplication of numbers.)
    • Now $MN$ may not be the same matrix as $NM$.
  • If $A$ and $B$ are points in a geometric drawing, $AB$ denotes the line segment from $A$ to $B$.
    • This is a function of two variables denoting points whose value is a line segment.  
    • It is not what is usually called a binary operation, although as an opinionated category theorist I would call it a multisorted binary operation.
    • It is commutative, but it doesn't make sense to ask if it is associative.

This phenomenon is called overloaded notation.  

  • In order to understand the meaning of the juxtaposition of symbols, you have to know the type of the variables.
  • The surrounding text may tell you specifically the variables denote matrices or whatever. So this is an instance of context-sensitive semantics. 
    • Students tend to expect that they know what any formula means in isolation from the text.  It may make them very sad to discover that this doesn't work — once they believe it, which can take quite a while.
  • In many cases the problem is alleviated by the use of convention.
    • Matrices are usually denoted by capital letters, numbers by lower case letters.
    • But points in geometry are usually denoted by capital letters too.  So you have to know that referring to a geometric diagram is significant to understanding the notation. This is an indirect form of context-sensitivity.  Did any teacher ever point this out to students?  Does it appear anywhere in print?
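Programmers will recognize the list above as dispatch on types. The sketch below is a toy model under my own assumptions: matrices are lists of rows, a segment is modeled as the unordered set of its endpoints, and the function `juxtapose` and class `Point` are hypothetical.

```python
from dataclasses import dataclass
from numbers import Number


@dataclass(frozen=True)
class Point:
    x: float
    y: float


def juxtapose(a, b):
    """Interpret the juxtaposition 'ab' according to the types of a and b."""
    if isinstance(a, Number) and isinstance(b, Number):
        return a * b                       # product of numbers
    if isinstance(a, Point) and isinstance(b, Point):
        return frozenset({a, b})           # segment, determined by its endpoints
    if isinstance(a, list) and isinstance(b, list):
        # Matrix product: rows of a against columns of b.
        return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]
    raise TypeError("no meaning assigned to this juxtaposition")


M, N = [[0, 1], [0, 0]], [[0, 0], [1, 0]]
P, Q = Point(0, 0), Point(1, 1)
print(juxtapose(2, 3) == juxtapose(3, 2))   # True: numbers commute
print(juxtapose(M, N) == juxtapose(N, M))   # False: these matrices do not
print(juxtapose(P, Q) == juxtapose(Q, P))   # True: AB and BA are the same segment
```

The function cannot decide what `ab` means without inspecting the types, which is the context a reader has to supply from the surrounding text.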

The earlier example of $\sin^{-1}x$ is a case which is not context-sensitive.  Knowing the types of the variables won't help.  Of course, if the author explains which meaning is meant, that explanation is within the context of the book!  That is not a lot of help for grasshoppers like me that look back and forth at different parts of a math book instead of reading it straight through.

Equations

Consider the expressions

  1. $x^2-5x+4=0$
  2. $x^2+y^2=1$
  3. $x^2+2x+1=(x+1)^2$

They are assertions that two expressions have the same value. A strictly logical view of an equation containing variables is that it puts a constraint on the variables.  It is true of some numbers (or pairs of numbers) and false of others.  That is the defining property of an equation. Equation 1 requires that $x=1$ or $x=4$.  Equation 2 imposes a constraint which is satisfied by uncountably many pairs of real numbers, and is also not true of uncountably many pairs. But equation 3 puts no constraint on the variable.  It is true of every number $x$.
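The strictly logical reading treats each equation as a predicate with a truth set. Here is a small sketch that tests the three equations on a finite sample of integers; the sample is my own choice and of course cannot exhibit the uncountably many real solutions of equation 2.

```python
eq1 = lambda x: x**2 - 5*x + 4 == 0
eq2 = lambda x, y: x**2 + y**2 == 1
eq3 = lambda x: x**2 + 2*x + 1 == (x + 1)**2

sample = range(-10, 11)

truth_set_1 = [x for x in sample if eq1(x)]
truth_set_2 = [(x, y) for x in range(-2, 3) for y in range(-2, 3) if eq2(x, y)]
always_3 = all(eq3(x) for x in sample)

print(truth_set_1)   # [1, 4]
print(truth_set_2)   # [(-1, 0), (0, -1), (0, 1), (1, 0)]
print(always_3)      # True
```

To the program the three equations are the same kind of thing, which is exactly what this view leaves out.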

A strictly logical view of symbolic notation does math a disservice.  Here, the notion that an equation is by definition a symbolic statement that has a truth set and a falsity set may be correct but it is not the important thing about any particular equation. When we read and do math we have many different metaphors and images about a concept.  The definition of a kind of object is often in terms of things that may not be the most important things to know about it.  (One of the most important facts about groups is that they are an abstraction of symmetries, which the axioms don't mention at all.)

Equation 1 is something that would make most people set out to discover the truth set.  Equation 2 calls out for drawing its graph.  Equation 3, being an identity, is useful in algebraic reasoning.  The images they call up are different and what you do with them is different.  The images and metaphors that cluster around a concept are an important part of the semantics of the symbolic language.

I expect to post separately about the semantics of variables and about the semantics of symbolic language embedded in mathematical English.

References


Characterizing triangles unembeddedly

I just posted this question on mathoverflow (I recommend looking into this new forum):

The mathedu mailing list has a recent longish thread which discusses among other things whether we should teach triangles as labeled or unlabeled to high school students (this is a vast oversimplification of the thread).  I have long been concerned with how we think (informally and formally) about mathematical objects, as for example my unfinished article here about the many ways of thinking about function.  So naturally, I started to consider how we think about triangles.

Consider circles.   Most informal and formal descriptions involve an embedding into R^2, but they *can* be characterized as manifolds (even as Riemannian manifolds) of dimension 1 with specific properties, independent of any embedding. This sort of thing has turned out to be a major way to think about all sorts of spaces.  So can we describe triangles in a similar way?

Unfortunately, manifolds are far removed from my usual mathematical work (category theory).  What I *think* I understand is that there can be *piecewise* linear manifolds, even Riemannian ones.  So perhaps we can say a triangle is a piecewise linear manifold of dimension 1 with certain properties.  Now, I want to define a triangle so that it comes complete with information about the lengths of its sides and what the three angles are.  Riemannian manifolds have a way to specify length and angles, and I can believe you can make the sides have specific lengths.  But the angles?  It seems to me that the tangent spaces (like those on a circle) result in all angles being 0 or pi, except at the corners where they don’t exist.  But I may not understand the situation correctly.

So my question is:  Is there a known methodology that allows triangles to be characterized independent of embeddings in such a way that incorporates information about side lengths and angles?
