Tag Archives: morphism

Very early difficulties II

Very early difficulties II

This is the second part of a series of posts about certain difficulties math students have in the very early stages of studying abstract math. The first post, Very early difficulties in studying abstract math, gives some background to the subject and discusses one particular difficulty: Some students do not know that it is worthwhile to try starting a proof by rewriting what is to be proved using the definitions of the terms involved.

Math StackExchange

The website Math StackExchange is open to any questions about math, even very easy ones. It is in contrast with Math OverFlow, which is aimed at professional mathematicians asking questions in their own field.

Math SE contains many examples of the early difficulties discussed in this series of posts, and I recommend to math ed people (not just RUME people, since some abstract math occurs in advanced high school courses) that they might consider reading through questions on Math SE for examples of misunderstanding students have.

There are two caveats:

  • Most questions on Math SE are at a high enough level that they don’t really concern these early difficulties.
  • Many of the questions are so confused that it is hard to pinpoint what is causing the difficulty that the questioner has.

Connotations of English words

The terms(s) defined in a definition are often given ordinary English words as names, and the beginner automatically associates the connotations of the meaning of the English word with the objects defined in the definition.

Infinite cardinals

If $A$ if a finite set, the cardinality of $A$ is simply a natural number (including $0$). If $A$ is a proper subset of another set $B$, then the cardinality of $A$ is strictly less than the cardinality of $B$.

In the nineteenth century, mathematicians extended the definition of cardinality for infinite sets, and for the most part cardinality has the same behavior as for finite sets. For example, the cardinal numbers are well-ordered. However, for infinite sets it is possible for a set and a proper subset of the set to have the same cardinality. For example, the cardinality of the set of natural numbers is the same as the cardinality of the set of rational numbers. This phenomenon causes major cognitive dissonance.

Question 1331680 on Math Stack Exchange shows an example of this confusion. I have also discussed the problem with cardinality in the abstractmath.org section Cardinality.

Morphism in category theory

The concept of category is defined by saying there is a bunch of objects called objects (sorry bout that) and a bunch of objects called morphisms, subject to certain axioms. One requirement is that there are functions from morphisms to objects choosing a “domain” and a “codomain” of each morphism. This is spelled out in Category Theory in Wikibooks, and in any other book on category theory.

The concepts of morphism, domain and codomain in a category are therefore defined by abstract definitions, which means that any property of morphisms and their domains and codomains that is true in every category must follow from the axioms. However, the word “morphism” and the talk about domains and codomains naturally suggests to many students that a morphism must be a function, so they immediately and incorrectly expect to evaluate it at an element of its domain, or to treat it as a function in other ways.

Example

If $\mathcal{C}$ is a category, its opposite category $\mathcal{C}^{op}$ is defined this way:

  • The objects of $\mathcal{C}^{op}$ are the objects of $\mathcal{C}$.
  • A morphism $f:X\to Y$ of $\mathcal{C}^{op}$ is a morphism from $Y$ to $X$ of $\mathcal{C}$ (swap the domain and codomain).

In Question 980933 on Math SE, the questioner is saying (among other things) that in $\text{Set}^{op}$, this would imply that there has to be a morphism from a nonempty set to the empty set. This of course is true, but the questioner is worried that you can’t have a function from a nonempty set to the empty set. That is also true, but what it implies is that in $\text{Set}^{op}$, the morphism from $\{1,2,3\}$ to the empty set is not a function from $\{1,2,3\}$ to the empty set. The morphism exists, but it is not a function. This does not any any sense make the definition of $\text{Set}^{op}$ incorrect.

Student confusion like this tends to make the teacher want to have a one foot by six foot billboard in his classroom saying

A MORPHISM DOESN’T HAVE TO BE A FUNCTION!

However, even that statement causes confusion. The questioner who asked Question 1594658 essentially responded to the statement in purple prose above by assuming a morphism that is “not a function” must have two distinct values at some input!

That questioner is still allowing the connotations of the word “morphism” to lead them to assume something that the definition of category does not give: that the morphism can evaluate elements of the domain to give elements of the codomain.

So we need a more elaborate poster in the classroom:

The definition of “category” makes no requirement
that an object has elements
or that morphisms evaluate elements.

As was remarked long long ago, category theory is pointless.

English words implementing logic

There are lots of questions about logic that show that students really do not think that the definition of some particular logical construction can possibly be correct. That is why in the abstractmath.org chapter on definitions I inserted this purple prose:

A definition is a totalitarian dictator.

It is often the case that you can explain why the definition is worded the way it is, and of course when you can you should. But it is also true that the student has to grovel and obey the definition no matter how weird they think it is.

Formula and term

In logic you learn that a formula is a statement with variables in it, for example “$\exists x((x+5)^3\gt2)$”. The expression “$(x+5)^3$” is not a formula because it is not a statement; it is a “term”. But in English, $H_2O$ is a formula, the formula for water. As a result, some students have a remarkably difficult time understanding the difference between “term” and “formula”. I think that is because those students don’t really believe that the definition must be taken seriously.

Exclusive or

Question 804250 in MathSE says:

“Consider $P$ and $Q$. Let $P+Q$ denote exclusive or. Then if $P$ and $Q$ are both true or are both false then $P+Q$ is false. If one of them is true and one of them is false then $P+Q$ is true. By exclusive or I mean $P$ or $Q$ but not both. I have been trying to figure out why the truth table is the way it is. For example if $P$ is true and $Q$ is true then no matter what would it be true?”

I believe that the questioner is really confused by the plus sign: $P+Q$ ought to be true if $P$ and $Q$ are both true because that’s what the plus sign ought to mean.

Yes, I know this is about a symbol instead of an English word, but I think the difficulty has the same dynamics as the English-word examples I have given.

If I have understood this difficulty correctly, it is similar to the students who want to know why $1$ is not a prime number. In that case, there is a good explanation.

Only if

The phrase “only if” simply does not mean the same thing in math as it does in English. In Question 17562 in MathSE, a reader asks the question, why does “$P$ only if $Q$” mean the same as “if $P$ then $Q$” instead of “if $Q$ then $P$”?

Many answerers wasted a lot of time trying to convince us that “$P$ only if $Q$” mean the same as “if $P$ then $Q$” in ordinary English, when in fact it does not. That’s because in English, clauses involving “if” usually connote causation, which does not happen in math English.

Consider these two pairs of examples.

  1. “I take my umbrella only if it is raining.”
  2. “If I take my umbrella, then it is raining.”
  3. “I flip that switch only if a light comes on.”
  4. “If I flip that switch, a light comes on.”

The average non-mathematical English speaker will easily believe that (1) and (4) are true, but will balk and (2) and (3). To me, (3) means that the light coming on makes me flip the switch. (2) is more problematical, but it does (to me) have a feeling of causation going the wrong way. It is this difference that causes students to balk at the equivalence in math of “$P$ only if $Q$” and “If $P$, then $Q$”. In math, there is no such thing as causation, and the truth tables for implication force us to live with the fact that these two sentences mean the same thing.

Henning Makholm’ answer to Question 17562 begins this way: “I don’t think there’s really anything to understand here. One simply has to learn as a fact that in mathematics jargon the words ‘only if’ invariably encode that particular meaning. It is not really forced by the everyday meanings of ‘only’ and’ if’ in isolation; it’s just how it is.” That is the best way to answer the question. (Other answerers besides Makholm said something similar.)

I have also discussed this difficulty (and other difficulties with logic) in the abmath section on “only if“.

References

Creative Commons License

This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.

Send to Kindle

A proof by diagram chasing



In Rigorous proofs, I went through the details of a medium-easy epsilon-delta proof in great detail as a way of showing what is hidden by the wording of the proof. In this post, I will do the same for an easy diagram-chasing proof in category theory. This theorem is stated and proved in Category Theory for Computing Science, page 365, but the proof I give here maximizes the diagram-chasing as a way of illustrating the points I want to make.

Theorem (J. Lambek) Let $F$ be a functor from a category to itself and let $\alpha:Fa\to a$ be an algebra for $F$ which is initial. Then $\alpha$ is an isomorphism.

Proof

  1. $F\alpha:FFa\to Fa$ is also an $F$-algebra.
  2. Initiality means that there is a unique algebra morphism $\eta:a\to Fa$ from $\alpha:Fa\to a$ to $F\alpha:FFa\to Fa$ for which this diagram commutes:



  3. To that diagram we can adjoin another (obviously) commutative square:



  4. Then the outside rectangle in the diagram above also commutes.
  5. This means that $\alpha\circ\eta:a\to a$ is an $F$-algebra morphism from $\alpha:Fa\to a$ to itself.
  6. Another such $F$-algebra morphism is $\text{id}_{A}$.
  7. Initiality of $\alpha$ means that the diagram below commutes:



  8. Because the upper bow and the left square both commute we are justified in inserting a diagonal arrow as below.



  9. Now we can read off the diagram that $F\alpha\circ F(\eta)=\text{id}_{Fa}$ and $\eta\circ\alpha=\text{id}_a$. By definition, then, $\eta$ is a two-sided inverse to $\alpha$, so $\alpha$ is an isomorphism.

Analysis of the proof

This is an analysis of the proof showing what is not mentioned in the proof, similar to the analysis in Rigorous proofs.

  • An $F$-algebra is any arrow of the form $\alpha:Fa\to a$. This definition directly verifies statement (1). You do need to know the definition of “functor” and that the notation $Fa$ means $F(a)$ and $FFa$ means $F(F(a))$.
  • When I am chasing diagrams, I visualize the commutativity of the diagram in (2) by thinking of the red path and the blue path as having the same composites in this graph:





    In other words, $F\alpha\circ F\eta=\eta\circ\alpha$. Notice that the diagram carries all the domain and codomain information for the arrows, whereas the formula “$F\alpha\circ F\eta=\eta\circ\alpha$” requires you to hold the domains and codomains in your head.

  • (Definition of morphism of $F$-algebra) The reader needs to know that a morphism of $F$ algebras is any arrow $\delta:c\to d$ for which




    commutes.
  • (Definition of initial $F$-algebra) $\alpha$ is an initial $F$-algebra means that for any algebra $\beta:Fb\to b$, there is a unique arrow $\delta$ for which the diagram above commutes.
  • (2) is justified by the last two definitions.
  • Pulling a “rabbit out of a hat” in a proof means introducing something that is obviously correct with no motivation, and then checking that it results in a proof. Step (9) in the proof given in Rigorous proofs has an example of adding zero cleverly. It is completely OK to pull a rabbit out of a hat in a proof, as long as the result is correct, but it makes students furious.
  • In statement (3) of the proof we are considering here, the rabbit is the trivially commutative diagram that is adjoined on the right of the diagram from (2).
  • Statement (4) uses a fact known to all diagram chasers: Two joined commutative squares make the outside rectangle commute. You can visualize this by seeing that the three red paths shown below all have the same composite. When I am chasing a complicated diagram I trace the various paths with my finger, or in my head.



    You could also show it by pointing out that $\alpha\circ F\alpha\circ F\eta=\alpha\circ\eta\circ\alpha$, but to check that I think most of us would go back and look at the diagram in (3) to see why it is true. Why not work directly with the diagram?

  • The definition of initiality requires that there be only one $F$-algebra morphism from $\alpha:Fa\to a$ to itself. This means that the upper and lower bows in (7) commute.
  • The diagonal identity arrow in (8) is justified by the fact that the upper bow is exactly the same diagram as the upper triangular diagram in (8). It follows that the upper triangle in (8) commutes. I visualize this as moving the bow down and to the left with the upper left node $Fa$ as a hinge, so that the two triangles coincide. (It needs to be flipped, too.) I should make an interactive diagram that shows this.
  • The lower triangle in (8) also commutes because the square in (2) is given to be commutative.
  • (Definition of isomorphism in a category) An arrow $f:a\to b$ in a category is an isomorphism if there is an arrow $g:b\to a$ for which these diagrams commute:


    xx


    This justifies statement (9).

Remark: I have been profligate in using as many diagrams as I want because this can be seen on a screen instead of on paper. That and the fact that much more data about domains and codomains are visible because I am using diagrams instead of equations involving composition means that the proof requires the readers to carry much less invisible data in their heads.

Send to Kindle

Function as map

This is a first draft of an article to eventually appear in abstractmath.

Images and metaphors

To explain a math concept, you need to explain how mathematicians think about the concept. This is what in abstractmath I call the images and metaphors carried by the concept. Of course you have to give the precise definition of the concept and basic theorems about it. But without the images and metaphors most students, not to mention mathematicians from a different field, will find it hard to prove much more than some immediate consequences of the definition. Nor will they have much sense of the place of the concept in math and applications.

Teachers will often explain the images and metaphors with handwaving and pictures in a fairly vague way. That is good to start with, but it’s important to get more precise about the images and metaphors. That’s because images and metaphors are often not quite a good fit for the concept — they may suggest things that are false and not suggest things that are true. For example, if a set is a container, why isn’t the element-of relation transitive? (A coin in a coinpurse in your pocket is a coin in your pocket.)

“A metaphor is a useful way to think about something, but it is not the same thing as the same thing.” (I think I stole that from the Economist.) Here, I am going to get precise with the notion that a function is a map. I am acting like a mathematician in “getting precise”, but I am getting precise about a metaphor, not about a mathematical object.

A function is a map

A map (ordinary paper map) of Minnesota has the property that each point on the paper represents a point in the state of Minnesota. This map can be represented as a mathematical function from a subset of a 2-sphere to {{\mathbb R}^2}. The function is a mathematical idealization of the relation between the state and the piece of paper, analogous to the mathematical description of the flight of a rocket ship as a function from {{\mathbb R}} to {{\mathbb R}^3}.

The Minnesota map-as-function is probably continuous and differentiable, and as is well known it can be angle preserving or area preserving but not both.

So you can say there is a point on the paper that represents the location of the statue of Paul Bunyan in Bemidji. There is a set of points that represents the part of the Mississippi River that lies in Minnesota. And so on.

A function has an image. If you think about it you will realize that the image is just a certain portion of the piece of paper. Knowing that a particular point on the paper is in the image of the function is not the information contained in what we call “this map of Minnesota”.

This yields what I consider a basic insight about function-as-map:  The map contains the information about the preimage of each point on the paper map. So:

The map in the sense of a “map of Minnesota” is represented by the whole function, not merely by the image.

I think that is the essence of the metaphor that a function is a map. And I don’t think newbies in abstractmath always understand that relationship.

A morphism is a map

The preceding discussion doesn’t really represent how we think of a paper map of Minnesota. We don’t think in terms of points at all. What we see are marks on the map showing where some particular things are. If it is a road map it has marks showing a lot of roads, a lot of towns, and maybe county boundaries. If it is a topographical map it will show level curves showing elevation. So a paper map of a state should be represented by a structure preserving map, a morphism. Road maps preserve some structure, topographical maps preserve other structure.

The things we call “maps” in math are usually morphisms. For example, you could say that every simple closed curve in the plane is an equivalence class of maps from the unit circle to the plane. Here equivalence class meaning forget the parametrization.

The very fact that I have to mention forgetting the parametrization is that the commonest mathematical way to talk about morphisms is as point-to-point maps with certain properties. But we think about a simple closed curve in the plane as just a distorted circle. The point-to-point correspondence doesn’t matter. So this example is really talking about a morphism as a shape-preserving map. Mathematicians introduced points into talking about preserving shapes in the nineteenth century and we are so used to doing that that we think we have to have points for all maps.

Not that points aren’t useful. But I am analyzing the metaphor here, not the technical side of the math.

Groups are functors

People who don’t do category theory think the idea of a mathematical structure as a functor is weird. From the point of view of the preceding discussion, a particular group is a functor from the generic group to some category. (The target category is Set if the group is discrete, Top if it is a topological group, and so on.)

The generic group is a group in a category called its theory or sketch that is just big enough to let it be a group. If the theory is the category with finite products that is just big enough then it is the Lawvere theory of the group. If it is a topos that is just big enough then it is the classifying topos of groups. The theory in this sense is equivalent to some theory in the sense of string-based logic, for example the signature-with-axioms (equational theory) or the first order theory of groups. Johnstone’s Elephant book is the best place to find the translation between these ideas.

A particular group is represented by a finite-limit-preserving functor on the algebraic theory, or by a logical functor on the classifying topos, and so on; constructions which bring with them the right concept of group homomorphisms as well (they will be any natural transformations).

The way we talk about groups mimics the way we talk about maps. We look at the symmetric group on five letters and say its multiplication is noncommutative. “Its multiplication” tells us that when we talk about this group we are talking about the functor, not just the values of the functor on objects. We use the same symbols of juxtaposition for multiplication in any group, “{1}” or “{e}” for the identity, “{a^{-1}}” for the inverse of {a}, and so on. That is because we are really talking about the multiplication, identity and inverse function in the generic group — they really are the same for all groups. That is because a group is not its underlying set, it is a functor. Just like the map of Minnesota “is” the whole function from the state to the paper, not just the image of the function.

Send to Kindle