Tag Archives: chunking

Pattern recognition in understanding math

Abstract patterns

This post is a revision of the article on pattern recognition in abstractmath.org.

When you do math, you must recognize abstract patterns that occur in

  • Symbolic expressions
  • Geometric figures
  • Relations between different kinds of math structures.
  • Your own mental representations of mathematical objects

This happens in high school algebra and in calculus, not just in the higher levels of abstract math.

Examples

Most of these examples are revisited in the section called Laws and Constraints.

At most

For real numbers $x$ and $y$, the phrase “$x$ is at most $y$” means by definition $x\le y$. To understand this definition requires recognizing the pattern “$x$ is at most $y$” no matter what expressions occur in place of $x$ and $y$, as long as they evaluate to real numbers.

Examples

  • “$\sin x$ is at most $1$” means that $\sin x\le 1$. This happens to be true for all real $x$.
  • “$3$ is at most $7$” means that $3\leq7$. You may think that “$3$ is at most $7$” is a silly thing to say, but it nevertheless means that $3\leq7$ and so is a correct statement.
  • “$x^2+(y-1)^2$ is at most $5$” means that
    $x^2+(y-1)^2\leq5$. This is true for some pairs $(x,y)$ and false for others, so it is a constraint. It defines the closed disk of radius $\sqrt{5}$ centered at $(0,1)$.

The product rule for derivatives

The product rule for differentiable functions $f$ and $g$ tells you that the derivative of $f(x)g(x)$ is \[f'(x)\,g(x)+f(x)\,g'(x)\]

Example

You recognize that the expression ${{x}^{2}}\sin x$ fits the pattern $f(x)g(x)$ with $f(x)={{x}^{2}}$ and $g(x)=\sin x$. Therefore you know that the derivative of ${{x}^{2}}\,\sin x$ is \[2x\sin x+{{x}^{2}}\cos x\]
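One way to convince yourself that the pattern was matched correctly is to compare the formula with a numerical estimate of the slope. The following is a minimal Python sketch; the helper names (`product`, `product_rule`, `central_difference`) are made up for this illustration.

```python
import math

def f(x):
    return x ** 2

def g(x):
    return math.sin(x)

def product(x):
    # The expression x^2 sin x, seen as f(x) g(x).
    return f(x) * g(x)

def product_rule(x):
    # f'(x) g(x) + f(x) g'(x) with f'(x) = 2x and g'(x) = cos x.
    return 2 * x * math.sin(x) + x ** 2 * math.cos(x)

def central_difference(func, x, h=1e-6):
    # Numerical estimate of the derivative of func at x.
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (0.0, 0.5, 1.0, 2.0, -3.0):
    print(x, product_rule(x), central_difference(product, x))
```

The two printed columns should agree to several decimal places at each sample point.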

The quadratic formula

The quadratic formula for the solutions of an equation of the form $a{{x}^{2}}+bx+c=0$ is usually given as \[r=\frac{-b\pm\sqrt{{{b}^{2}}-4ac}}{2a}\]

Example

If you are asked for the roots of $3{{x}^{2}}-2x-1=0$, you recognize that the polynomial on the left fits the pattern $a{{x}^{2}}+bx+c$ with

  • $a\leftarrow3$ (“$a$ replaced by $3$”)
  • $b\leftarrow-2$
  • and $c\leftarrow-1$.

Then
substituting those values in the quadratic formula gives you the roots $-1/3$ and $1$.
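The substitution can be carried out mechanically. This is a small Python sketch, not a robust solver; the function name `quadratic_roots` is an assumption of the example, and it uses `cmath` so that a negative discriminant would not cause an error.

```python
import cmath

def quadratic_roots(a, b, c):
    """Roots of a*x**2 + b*x + c = 0, assuming a is not zero."""
    if a == 0:
        raise ValueError("not a quadratic: a must be nonzero")
    root_of_discriminant = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + root_of_discriminant) / (2 * a),
            (-b - root_of_discriminant) / (2 * a))

# a <- 3, b <- -2, c <- -1
r_plus, r_minus = quadratic_roots(3, -2, -1)
print(r_plus, r_minus)
```

The two values printed are the roots $1$ and $-1/3$, written as complex numbers with zero imaginary part.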

Difficulties with the quadratic formula

A little problem

The quadratic formula is easy to use but it can still cause pattern recognition problems. Suppose you are asked to find the solutions of $3{{x}^{2}}-7=0$. Of course you can do this by simple algebra — but pretend that the first thing you thought of was using the quadratic formula.

  • Then you got upset because you have to apply it to $a{{x}^{2}}+bx+c$
  • and $3{{x}^{2}}-7$ has only two terms
  • but $a{{x}^{2}}+bx+c$ has three terms…
  • (Help!)
  • Do Not Be Anguished:
  • Write
    $3{{x}^{2}}-7$ as $3{{x}^{2}}+0\cdot x-7$, so $a=3$, $b=0$ and $c=-7$.
  • Then put those values into the quadratic formula and you get $x=\pm \sqrt{\frac{7}{3}}$.   
  • This is an example of the following useful principle:

    Write zero cleverly.

    I suspect that most people reading this would not have had the problem with $3{{x}^{2}}-7$ that I have just described. But before you get all insulted, remember:

    The thing about really easy examples is that they give you the point without getting you lost in some complicated stuff you don’t understand very well.

    A fiendisher problem

      Even college students may have trouble with the following problem (I know because I have tried it on them):

    What are the solutions of the equation $a+bx+c{{x}^{2}}=0$?

    The answer

    \[r=\frac{-b\pm\sqrt{{{b}^{2}}-4ac}}{2a}\]

    is wrong. The correct answer is

    \[r=\frac{-b\pm\sqrt{{{b}^{2}}-4ac}}{2c}\]

    When you remember a pattern with particular letters in it and an example has some of the same letters in it, make sure they match the pattern!
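One way to guard against this kind of mismatch is to name the coefficients by the role they play rather than by letter. The sketch below is only an illustration: the function `roots_by_role` and its parameter names are invented here, and it handles only the case of real roots.

```python
import math

def roots_by_role(quadratic, linear, constant):
    """Real roots of quadratic*x**2 + linear*x + constant = 0, smallest first."""
    discriminant = linear ** 2 - 4 * quadratic * constant
    if quadratic == 0 or discriminant < 0:
        raise ValueError("this sketch handles only quadratics with real roots")
    s = math.sqrt(discriminant)
    return sorted([(-linear - s) / (2 * quadratic),
                   (-linear + s) / (2 * quadratic)])

# The equation a + b x + c x^2 = 0 with a = 6, b = 5, c = 1.
a, b, c = 6, 5, 1
matched = roots_by_role(quadratic=c, linear=b, constant=a)      # letters matched by role
mismatched = roots_by_role(quadratic=a, linear=b, constant=c)   # letters matched by name

# The two-term example 3x^2 - 7 = 0, with zero written cleverly as 0*x.
two_term = roots_by_role(quadratic=3, linear=0, constant=-7)

print(matched, mismatched, two_term)
```

Only the `matched` values satisfy $6+5x+x^2=0$; the `mismatched` values are the roots of $6x^2+5x+1=0$, which is a different equation.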

    The substitution rule for integration

    The chain rule says that the derivative of a function of the form $f(g(x))$ is $f'(g(x))g'(x)$. From this you get the substitution rule for finding indefinite integrals:

                                      \[\int{f'(g(x))g'(x)\,dx}=f(g(x))+C\]

    Example

    To find $\int{2x\,\cos ({{x}^{2}})\,dx}$, you recognize that you can take $f(x)=\sin x$ and $g(x)={{x}^{2}}$ in the formula, getting \[\int{2x\,\cos ({{x}^{2}})\,dx}=\sin ({{x}^{2}})+C\] Note that in the way I wrote the integral, the functions occur in the opposite order from the pattern. That kind of thing happens a lot.
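The same match can be written out in the $u$-substitution notation many calculus texts use. The letter $u$ is introduced only for this sketch and does not appear in the formula above.

```latex
\begin{align*}
u &= g(x) = x^{2}, \qquad du = g'(x)\,dx = 2x\,dx,\\
\int 2x\,\cos(x^{2})\,dx &= \int \cos u\,du = \sin u + C = \sin(x^{2}) + C,\\
\frac{d}{dx}\,\sin(x^{2}) &= \cos(x^{2})\cdot 2x \qquad\text{(check by the chain rule).}
\end{align*}
```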

    Laws and constraints

    • The statement “$(x+1)^2=x^2+2x+1$” is a pattern that is true for all numbers $x$. For example, $3^2=2^2+2\times2+1$ and $(-2)^2=(-3)^2+2\times(-3)+1$, and so on. Such a pattern is a universal assertion, so it is a theorem. When the statement is an equation, as in this case, it is also called a law.
    • The statement “$\sin x\leq 1$” is also true for all $x$, and so is a theorem.
    • The statement “$x^2+(y-1)^2$ is at most $5$” is true for some real numbers and not others, so it is not a theorem, although it is a constraint.
    • The quadratic formula says that:

      The solutions of an equation
      of the form $a{{x}^{2}}+bx+c=0$ are
      given by \[r=\frac{-b\pm\sqrt{{{b}^{2}}-4ac}}{2a}\]

      This is true for all complex numbers $a$, $b$, $c$ with $a\ne 0$.
      The $x$ in the equation is not a free variable, but a “variable to be solved for” and does not appear in the quadratic formula. Theorems like the quadratic formula are usually called “formulas” rather than “laws”.

    • The product rule for derivatives

      The derivative of $f(x)g(x)$ is $f'(x)\,g(x)+f(x)\,g'(x)$

      is true for all differentiable functions $f$ and $g$. That means it is true for both of these choices of $f$ and $g$:

      • $f(x)=x$ and $g(x)=x\sin x$
      • $f(x)=x^2$ and $g(x)=\sin x$

      But both choices of $f$ and $g$ refer to the same function $x^2\sin x$, so if you apply the product rule in either case you should get the same answer. (Try it).
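For readers who want to compare with their own attempt, here is one way the two computations can go. The first uses the product rule a second time to differentiate $x\sin x$.

```latex
\begin{align*}
\text{First choice: } & f(x)=x,\quad g(x)=x\sin x,\quad f'(x)=1,\quad g'(x)=\sin x + x\cos x,\\
& f'(x)\,g(x)+f(x)\,g'(x) = x\sin x + x(\sin x + x\cos x) = 2x\sin x + x^{2}\cos x.\\[4pt]
\text{Second choice: } & f(x)=x^{2},\quad g(x)=\sin x,\quad f'(x)=2x,\quad g'(x)=\cos x,\\
& f'(x)\,g(x)+f(x)\,g'(x) = 2x\sin x + x^{2}\cos x.
\end{align*}
```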

    Some bothersome types of pattern recognition

    Dependence on conventions

    Definition: A quadratic polynomial in $x$ is an expression of the form $a{{x}^{2}}+bx+c$.

    Examples

    • $-5{{x}^{2}}+32x-5$ is a quadratic polynomial: You have to recognize that it fits the pattern in the definition by writing it as $(-5){{x}^{2}}+32x+(-5)$
    • So is ${{x}^{2}}-1$: You have to recognize that it fits the definition by writing it as ${{x}^{2}}+0\cdot x+(-1)$ (I wrote zero cleverly).

    Some authors would just say, “A quadratic polynomial is an expression of the form $a{{x}^{2}}+bx+c$” leaving you to deduce from conventions on variables that it is a polynomial in $x$ instead of in $a$ (for example).

    Note also that I have deliberately not mentioned what sorts of numbers $a$, $b$, $c$ and $x$ are. The authors may assume that you know they are using real numbers.

    An expression as an instance of substitution

    One particular type of pattern recognition that comes up all the time in math is recognizing that a given expression is an instance of a substitution into a known expression.

    Example

    Students are sometimes baffled when a proof uses the fact that ${{2}^{n}}+{{2}^{n}}={{2}^{n+1}}$ for positive integers $n$. This requires the recognition of the patterns $x+x=2x$ and $2\cdot {{2}^{n}}={{2}^{n+1}}$.

    Similarly ${{3}^{n}}+{{3}^{n}}+{{3}^{n}}={{3}^{n+1}}$.

    Example

    The assertion

    \[{{x}^{2}}+{{y}^{2}}\ge 0\ \ \ \ \ \text{(1)}\]

    has as a special case

    \[(-x^2-y^2)^2+(y^2-x^2)^2\ge
    0\ \ \ \ \ \text{(2)}\]

    which involves the substitutions $x\leftarrow -{{x}^{2}}-{{y}^{2}}$ and $y\leftarrow
    {{y}^{2}}-{{x}^{2}}$.

    Remarks
    • If you see (2) in a text and the author blithely says it is “never negative”, that is because it is of the form \[{{x}^{2}}+{{y}^{2}}\ge 0\] with certain expressions substituted for $x$ and $y$. (See substitution and The only axiom for algebra.)
    • The fact that there are minus signs in (2) and that $x$ and $y$ play different roles in (1) and in (2) are red herrings. See ratchet effect and variable clash.
    • Most people with some experience in algebra would see quickly that (2) is correct by using chunking. They would visualize (2) as

      \[(\text{something})^2+(\text{another something})^2\ge0\]
      This shows that in many cases

      chunking is a psychological inverse to substitution

    • Note that when you make these substitutions you have to insert appropriate parentheses (more here). After you make the substitution, the expression of course can be simplified a whole bunch, to

      \[2({{x}^{4}}+{{y}^{4}})\ge0\]

    • A common cause of error in doing this (a mistake I make sometimes) is to try to substitute and simplify at the same time. If the situation is complicated, it is best to

      substitute as literally as possible and then simplify
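Applied to (2), the two stages look like this: first the literal substitution with its parentheses, then the simplification as a separate step.

```latex
\begin{align*}
x^{2}+y^{2}
  &\;\longrightarrow\; (-x^{2}-y^{2})^{2}+(y^{2}-x^{2})^{2}
     && \text{substitute literally, with parentheses}\\
  &= (x^{4}+2x^{2}y^{2}+y^{4}) + (y^{4}-2x^{2}y^{2}+x^{4})
     && \text{expand each square}\\
  &= 2x^{4}+2y^{4} = 2(x^{4}+y^{4})
     && \text{collect terms}
\end{align*}
```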

    Integration by Parts

    The rule for integration by parts says that

                             \[\int{f(x)\,g'(x)\,dx=f(x)\,g(x)-\int{f'(x)\,g(x)\,dx}}\]

    Suppose you need to find $\int{\log x\,dx}$. (In abstractmath.org, “log” means ${{\log }_{e}}$.) Then we can recognize this integral as having the pattern for the left side of the parts formula with $f(x)=\log x$ and $g'(x)=1$, so that $g(x)=x$ and $f'(x)=\frac{1}{x}$. Therefore

    \[\int{\log x\,dx}=x\log x-\int{\frac{1}{x}\cdot x\,dx}=x\log x-x+c\]

    How on earth did I think to recognize $\log x$ as $1\cdot \log x$??  
    Well, to tell the truth because some nerdy guy (perhaps I should say some other nerdy guy) clued me in when I was taking freshman calculus. Since then I have used this device lots of times without someone telling me — but not the first time.

    This is an example of another really useful principle:

    Write $1$ cleverly.

    Two different substitutions give the same expression

    Some proofs involve recognizing that a symbolic expression or figure fits a pattern in two different ways. This is illustrated by the next two examples. (See also the remark about the product rule above.) I have seen students flummoxed by Example ID, and Example ISO is a proof that is supposed to have flummoxed medieval geometry students.

    Example ID

    Definition: In a set with an associative binary operation and an identity element $e$, an element $y$ is the inverse of an element $x$ if

    \[xy=e\ \ \ \ \text{and}\ \ \ \ yx=e \ \ \ \ (1)\]

    In this situation, it is easy to see that $x$ has only one inverse: If $xy=e$ and $xz=e$ and $yx=e$ and $zx=e$, then \[y=ey=(zx)y=z(xy)=ze=z\]

    Theorem: ${{({{x}^{-1}})}^{-1}}=x$.

    Proof: I am given that ${{x}^{-1}}$ is the inverse of $x$. By definition, this means that

    \[x{{x}^{-1}}=e\ \ \ \text{and}\ \ \ {{x}^{-1}}x=e \ \ \ \ (2)\]

    To prove the theorem, I must show that $x$ is the inverse of ${{x}^{-1}}$. Because $x^{-1}$ has only one inverse, all we have to do is prove that

    \[{{x}^{-1}}x=e\ \ \ \text{and}\ \ \ x{{x}^{-1}}=e\ \ \ \ (3)\]  

    But (2) and (3) are equivalent! (“And” is commutative.)

    Example ISO

    This sort of double substitution occurs in geometry, too.

    Theorem: If a triangle has two equal angles, then it has two equal sides.

    Proof: In the figure, assume $\angle ABC=\angle ACB$. Then triangle $ABC$ is congruent to triangle $ACB$ since the sides $BC$ and $CB$ are equal (they are the same line segment!) and the adjoining angles are equal by hypothesis. Therefore the corresponding sides $AB$ and $AC$ are equal.

    The point is that although triangles $ABC$ and $ACB$ are the same triangle, and sides $BC$ and $CB$ are the same line segment, the proof involves recognizing them as geometric figures in two different ways.

    This proof (not Euclid’s original proof) is hundreds of years old and is called the pons asinorum (bridge of donkeys). It became famous as the first theorem in Euclid’s books that many medieval students could not understand. I conjecture that the name comes from the fact that the triangle as drawn here resembles an ancient arched bridge. These days, isosceles triangles are usually drawn taller than they are wide.

    Technical problems in carrying out pattern matching

    Parentheses

    In matching a pattern you may have to insert parentheses. For example, if you substitute $x+1$ for $a$, $2y$ for
    $b$ and $4$ for $c$ in the expression \[{{a}^{2}}+{{b}^{2}}={{c}^{2}}\] you get \[{{(x+1)}^{2}}+4{{y}^{2}}=16\]
    If you did the substitution literally without editing the expression so that it had the correct meaning, you would get \[x+{{1}^{2}}+2{{y}^{2}}={{4}^{2}}\] which is not the result of performing the substitution in the expression ${{a}^{2}}+{{b}^{2}}={{c}^{2}}$.   
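The same trap shows up when a program substitutes text into an expression. The Python sketch below is only an illustration (the function `substitute` and its `parenthesize` flag are invented for it); it works on the left side of the equation and compares the literal substitution with the parenthesized one.

```python
import re

def substitute(expr, bindings, parenthesize):
    """Replace each variable name in expr by the expression bound to it."""
    names = "|".join(re.escape(name) for name in bindings)
    pattern = re.compile(r"\b(?:" + names + r")\b")

    def replacement(match):
        value = bindings[match.group(0)]
        return "(" + value + ")" if parenthesize else value

    return pattern.sub(replacement, expr)

left_side = "a**2 + b**2"              # Python spelling of a^2 + b^2
bindings = {"a": "x+1", "b": "2*y"}

literal = substitute(left_side, bindings, parenthesize=False)
careful = substitute(left_side, bindings, parenthesize=True)

# Evaluate both results at x = 1, y = 1 to see that they mean different things.
values = {"x": 1, "y": 1}
literal_value = eval(literal, {"__builtins__": {}}, values)
careful_value = eval(careful, {"__builtins__": {}}, values)

print(literal, "=", literal_value)
print(careful, "=", careful_value)
```

The literal version evaluates to $4$ and the parenthesized version to $8$, so the two strings are not the same expression.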

    Order switching

    You can easily get confused if the patterns involve a switch in the order of the variables.

    Notation for integer division

    • For integers $m$ and $n$, the phrase “$m$ divides $n$” means there is an integer $q$ for which $n=qm$.
    • In number theory (which in spite of its name means the theory of positive integers) the vertical bar is used to denote integer division. So $3|6$ because $6=2\times 3$ ($q$ is $2$ in this case). But “$3|7$” is false because there is no integer $q$ for which $7=q\times 3$.
    • An equivalent definition of division says that $m|n$ if and only if $n/m$ is an integer. Note that $6/3=2$, an integer, but $7/3$ is not an integer.
    • Now look at those expressions:
    • “$m|n$” means that there is an integer $q$ for which $n=qm$. In these two expressions, $m$ and $n$ occur in opposite order.
    • “$m|n$” is true only if $n/m$ is an integer. Again, they are in opposite order. Another way of writing $n/m$ is $\frac{n}{m}$. When math people pronounce “$\frac{n}{m}$” they usually say, “$n$ over $m$” using the same order.
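The order switch is easy to see if you write the definition as a small Python function. The name `divides` is chosen for this sketch, and the convention that only $0$ is divisible by $0$ follows from the definition $n=qm$.

```python
def divides(m, n):
    """True when m | n, that is, when n == q*m for some integer q."""
    if m == 0:
        return n == 0          # n = q*0 forces n to be 0
    return n % m == 0          # note the order: n comes first here, m second

print(divides(3, 6))   # 3 | 6
print(divides(3, 7))   # 3 | 7
print(divides(6, 3))   # 6 | 3, the switched order
```

The arguments of `divides(m, n)` appear in the same order as in “$m|n$”, but the test inside uses `n % m`, in the same order as $n/m$.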
  • I taught this notation in courses for computer engineering and math majors for years. Some of the students stayed hopelessly confused through several lectures and lost points repeatedly on homework and exams by getting these symbols wrong.
  • The problem was not helped by the fact that “$|$” and “$/$” are similar but have very different syntax:

    Math notation gives you no clue which symbols are operators (used to form expressions) and which are verbs (used to form assertions).

  • A majority of the students didn’t have so much trouble with this kind of syntax. I have noticed that many people have no sense of syntax and other people have good intuitive understanding of syntax. I suspect the second type of people find learning foreign languages easy.
  • Many of the articles in the references below concern syntax.
  • References

    Creative Commons License

    This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.



    Thinking about a function as a mathematical object

    A mathematician’s mental representation of a function is generally quite rich and may involve many different metaphors and images kept in mind simultaneously. The abmath article on metaphors and images for functions discusses many of these representations, although the article is incomplete. This post is a fairly thorough rewrite of the discussion in that article of the representation of the concept of “function” as a mathematical object. You must think of functions as math objects when you are taking the rigorous view, which happens when you are trying to prove something about functions (or large classes of functions) in general.

    What often happens is that you visualize one of your functions in many of the ways described in this article (it is a calculation, it maps one space to another, its graph is bounded, and so on) but those images can mislead you. So when you are completely stuck, you go back to thinking of the function as an axiomatically-defined mathematical structure of some sort that just sits there, like a complicated machine where you can see all the parts and how they relate to each other. That enables you to prove things by strict logical deduction. (Mathematicians mostly only go this far when they are desperate. We would much rather quote somebody’s theorem.) This is what I have called the dry bones approach.

    The “mathematical structure” is most commonly a definition of function in terms of sets and axioms. The abmath article Specification and definition of “function” discusses the usual definitions of “function” in detail.

    Example

    This example is intended to raise your consciousness about the possibilities for functions as objects.

    Consider the function $f:\mathbb{R}\to\mathbb{R}$ defined by $f(x)=2{{\sin }^{2}}x-1$. Its value can be computed at many different numbers but it is a single, static math object.

    You can apply operators to it

    • Just as you can multiply a number by $2$, you can multiply $f$ by $2$. You can say “Let $g(x)=2f(x)$” or “Let $g=2f$”. Multiplying a numerical function by $2$ is an operator that takes the function $f$ to $2f$. Its input is a function and its output is another function. Then the value of $g$ (which is $2f$) at any real $x$ is $g(x)=2f(x)=4{{\sin }^{2}}x-2$. The notation “$g=2f$” reveals that mathematicians think of $f$ as a single math object just as the $3$ in the expression “$2\times 3$” represents the number $3$ as a single object.
    • But you can’t do arithmetic operations to functions that don’t have numerical output, such as the function $\text{FL}$ that takes an English word to its first letter, so $\text{FL}(`\text{wolf’})=`\text{w’}$. (The quotes mean that I am writing about the word ‘wolf’ and the letter ‘w’.) The expression $2\times \text{FL}(`\text{wolf’})$ doesn’t make sense because ‘w’ is a letter, not a number.
    • You can find the derivative. The derivative operator is a function from differentiable functions to functions. Such a thing is usually called an operator. The derivative operator is sometimes written as $D$, so $Df$ is the function defined by: “$(Df)(x)$ is the slope of the tangent line to the graph of $f$ at the point $(x,f(x))$.” That is a perfectly good definition. In calculus class you learn formulas that allow you to calculate $(Df)(x)$ (usually called “$f'(x)$”) to be $4 \sin (x) \cos (x)$.
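Programming languages that let you pass functions around make the “function as object” idea quite concrete. In the Python sketch below, `scale` and `D` are operators in the sense just described: each takes a function and returns a function. Both names are chosen for this illustration, and `D` is only a numerical approximation to the derivative operator.

```python
import math

def f(x):
    return 2 * math.sin(x) ** 2 - 1

def scale(k, func):
    """Operator taking func to the function k*func."""
    def scaled(x):
        return k * func(x)
    return scaled

def D(func, h=1e-6):
    """Operator taking func to a numerical approximation of its derivative."""
    def derivative(x):
        return (func(x + h) - func(x - h)) / (2 * h)
    return derivative

g = scale(2, f)     # g = 2f, a new function object
Df = D(f)           # Df, another function object

x = 1.0
print(g(x), 4 * math.sin(x) ** 2 - 2)
print(Df(x), 4 * math.sin(x) * math.cos(x))
```

Notice that `scale(2, f)` and `D(f)` are built without mentioning any particular input value; the value $x$ enters only when the resulting function is applied.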

    Like all math objects, functions may have properties

    • The function defined by $f(x)=2{{\sin}^{2}}x-1$ is differentiable, as noted above. It is also continuous.
    • But $f$ is not injective. This means that two different inputs can give the same output. For example, $f(\frac{\pi}{3})=f(\frac{4\pi}{3})=\frac{1}{2}$. This is a property of the whole function, not individual values. It makes no sense to say that $f(\frac{\pi}{3})$ is injective.
    • The function $f$ is periodic with period $2\pi$, meaning that for any $x$, $f(x+2\pi)=f(x)$. (Its smallest period is actually $\pi$, since $f(x)=-\cos 2x$.) It is the function itself that has period $2\pi$, not any particular value of it.
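Properties like these are statements about the whole function, so a program can only spot-check them at sample points. A rough Python sketch follows; the helper `looks_periodic` and the choice of sample points are assumptions of the example, and passing the check is evidence, not a proof.

```python
import math

def f(x):
    return 2 * math.sin(x) ** 2 - 1

def looks_periodic(func, period, samples, tolerance=1e-9):
    """Spot-check func(x + period) == func(x) at the given sample points."""
    return all(abs(func(x + period) - func(x)) < tolerance for x in samples)

samples = [k / 10 for k in range(-50, 51)]

# Two different inputs with the same output, so f is not injective.
print(f(math.pi / 3), f(4 * math.pi / 3))

print(looks_periodic(f, 2 * math.pi, samples))
print(looks_periodic(f, 1.0, samples))
```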

    As a math object, a function can be an element of a set

    • For example, $f$ is an element of the set ${{C}^{\infty }}(\mathbb{R})$ of real-valued functions that have derivatives of all orders.
    • On ${{C}^{\infty }}(\mathbb{R})$, differentiation is an operator that takes a function in that set to another function in the set.   It takes $f(x)$ to the function $4\sin x\cos x$.
    • If you restrict $f$ to the unit interval, it is an element of the function space ${{\text{L}}^{2}}[0,1]$. As such it is convenient to think of it as a point in the space (the whole function is the point, not just values of it). In this particular space, you can think of the points as vectors in an uncountably-infinite-dimensional space. (Ideas like that weird some people out. Do not worry if you are one of them. If you keep on doing math, function spaces will seem ordinary. They are OK by me, except that I think they come in entirely too many different kinds which I can never keep straight.) As a vector, $f$ has a norm, which you can think of as its length. The norm of $f$ is $\sqrt{\int_0^1 f(x)^2\,dx}$, which is about $0.64$.
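The norm can be estimated numerically. The sketch below assumes the standard ${{\text{L}}^{2}}$ norm $\sqrt{\int_0^1 f(x)^2\,dx}$ and approximates the integral with a midpoint sum; the function name `l2_norm` and the number of subintervals are choices made for this example. Since $f(x)=-\cos 2x$, the integral also has the closed form $\frac12+\frac{\sin 4}{8}$, which the code uses as a cross-check.

```python
import math

def f(x):
    return 2 * math.sin(x) ** 2 - 1

def l2_norm(func, a=0.0, b=1.0, n=100_000):
    """Midpoint-rule estimate of the L^2 norm of func on [a, b]."""
    h = (b - a) / n
    total = sum(func(a + (i + 0.5) * h) ** 2 for i in range(n))
    return math.sqrt(total * h)

numerical = l2_norm(f)
closed_form = math.sqrt(0.5 + math.sin(4) / 8)   # from f(x)^2 = cos^2(2x)

print(numerical, closed_form)
```

Both numbers come out at roughly $0.637$.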

    The discussion above shows many examples of thinking of a function as an object. You are thinking about it as an undivided whole, as a chunk, just as you think of the number $3$ (or $\pi$) as just a thing. You think the same way about your bicycle as a whole when you say, “I’ll ride my bike to the library”. But if the transmission jams, then you have to put it down on the grass and observe its individual pieces and their relation to each other (the chain came off a gear or whatever), in much the same way as noticing that the function $g(x)=x^3$ goes through the origin and looks kind of flat there, but at $(2,8)$ it is really rather steep. Phrases like “steep” and “goes through the origin” are a clue that you are thinking of the function as a curve that goes left to right and levels off in one place and goes up fast in another — you are thinking in a dynamic, not a static way like the dry bones of a math object.


    Offloading chunking

    In my previous post I wrote about the idea of offloading abstraction, the sort of things we do with geometric figures, diagrams (that post emphasized manipulable diagrams), drawing the tree of an algebraic expression, and so on.  This post describes a way to offload chunking.  

    Chunking

    I am talking about chunking in the sense of encapsulation, as some math ed. people use it.  I wrote about it briefly in [1], and [2] describes the general idea.  I don't have a good math ed reference for it, but I will include references if readers supply them.  

    Chunking for some educators means breaking a complicated problem down into pieces and concentrating on them one by one.  That is not really the same thing as what I am writing about.  Chunking as I mean it enables you to think more coherently and efficiently about a complicated mathematical structure by objectifying some of the data in the structure.  

    Project 

    This project is an example of how chunking could be made visible in interactive diagrams, so that the reader grasps the idea of chunking.  I guess I am chunking chunking.

    Here is a short version of an example of chunking worked out in ridiculous detail in reference [1]. 

    Let \[f(x)=.0002{{\left( \frac{{{x}^{3}}-10}{3{{e}^{-x}}+1} \right)}^{6}}\]  How do I know it is never negative?  Well, because it has the form (a positive number)(times)(something)$^6$.    Now (something)$^6$ is ((something)$^3)^2$ and a square is always nonnegative, so the function is (positive)(times)(nonnegative), so it has to be nonnegative.  

    I recognized a salient fact about .0002, namely that it was positive: I grayed out (in my mind) its exact value, which is irrelevant.  I also noticed a salient fact about \[{{\left( \frac{{{x}^{3}}-10}{3{{e}^{-x}}+1} \right)}^{6}}\] namely that it was (a big mess that I grayed out)(to the 6th power).  And proceeded from there.  (And my chunking was inefficient; for example, it is more to the point that .0002 is nonnegative).
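In symbols, the chunking amounts to naming the grayed-out pieces. The letters $c$ and $u$ below are introduced only for this sketch.

```latex
\begin{align*}
f(x) &= c\,\bigl(u(x)\bigr)^{6}
  \qquad\text{with } c = .0002 \text{ and } u(x)=\frac{x^{3}-10}{3e^{-x}+1},\\
\bigl(u(x)\bigr)^{6} &= \Bigl(\bigl(u(x)\bigr)^{3}\Bigr)^{2} \ \ge\ 0
  \qquad\text{(a square is never negative)},\\
f(x) &= c\cdot\Bigl(\bigl(u(x)\bigr)^{3}\Bigr)^{2} \ \ge\ 0
  \qquad\text{(because } c \ge 0\text{)}.
\end{align*}
```

The only facts used about $c$ and $u(x)$ are that $c$ is nonnegative and that $u(x)$ is a real number, which holds for every real $x$ because the denominator $3e^{-x}+1$ is always positive.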

    I believe you could make a movie of chunking like this using Mathematica CDF.  You would start with the formula, and then as the voiceover said "what's really important is that .0002 is nonnegative" the number would turn into a gray cloud with a thought balloon aimed at it saying "nonnegative".  The other part would turn into a gray cloud to the sixth, then the six would break into 3 times 2 as the voice comments on what is happening.  

    It would take a considerable amount of work to carry this out.  Lots of decisions would need to be made.  

    One problem is that Mathematica doesn't provide a way to do voiceovers directly (as far as I know).  Perhaps you could make a screen movie using screenshot software in real time while you talked and (offscreen) pushed buttons that made the various changes happen.

    You could also do it with print instead of voiceover, as I did in the example in this post. In this case you need to arrange to have the printed part and the diagram simultaneously visible.  

    I may someday try my hand at this.  But I would encourage others to attack this project if it interests them.  This whole blog is covered by the Creative Commons Attribution – ShareAlike 3.0 License, which means you may use, adapt and distribute the work freely provided you follow the requirements of the license.

    I have other projects in mind that I will post separately.

    References

    1. Abstractmath article on chunking.
    2. Wikipedia on chunking

    Offloading abstraction

    The interactive examples in this post require installing Wolfram CDF Player, which is free and works on most desktop computers using Firefox, Safari and Internet Explorer, but not Chrome. The source code is the Mathematica Notebook Tangent Line.nb, which is available for free use under a Creative Commons Attribution-ShareAlike 2.5 License. The notebook can be read by CDF Player if you cannot make the embedded versions in this post work.


    The diagram above shows you the tangent line to the curve $y=x^3-x$ at a specific point.  The slider allows you to move the point around, and the tangent line moves with it. You can click on one of the plus signs for options about things you can do with the slider.  (Note: This is not new.  Many other people have produced diagrams like this one.)

    I have some comments to make about this very simple diagram. I hope they raise your consciousness about what is going on when you use a manipulable demonstration.

    Farming out your abstraction load

    A diagram showing a tangent line drawn on the board or in a paper book requires you to visualize how the tangent line would look at other points.  This imposes a burden of visualization on you.  Even if you are a new student you won't find that terribly hard (am I wrong?) but you might miss some things at first:

    • There are places where the tangent line is horizontal.
    • There are places where some of the tangent lines cross the curve at another point. Many calculus students believe in the myth that the tangent line crosses the curve at only one point.  (It is not really a myth, it is a lie.  Any decent myth contains illuminating stories and metaphors.)
    • You may not envision (until you have some experience anyway) how when you move the tangent line around it sort of rocks like a seesaw.

    You see these things immediately when you manipulate the slider.
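If you cannot run the interactive diagram, a few lines of Python can stand in for the slider. This is a sketch written for this post's curve $y=x^3-x$ only; the function names are made up. It uses the fact that the tangent at $x=a$ meets this particular curve again at $x=-2a$, because the difference between the curve and the tangent line factors as $(x-a)^2(x+2a)$.

```python
import math

def curve(x):
    return x ** 3 - x

def slope(a):
    # Derivative of x^3 - x at x = a.
    return 3 * a ** 2 - 1

def tangent_line(a):
    """Return the tangent line at x = a as a function of x."""
    m, y0 = slope(a), curve(a)
    def line(x):
        return m * (x - a) + y0
    return line

def second_crossing(a):
    # Valid for this curve only: the tangent at a meets the curve again at -2a.
    return -2 * a

for a in (-1.0, -1 / math.sqrt(3), 0.5, 1.0):
    line = tangent_line(a)
    x2 = second_crossing(a)
    print(a, slope(a), x2, curve(x2), line(x2))
```

The slope is (numerically) zero at $a=\pm 1/\sqrt{3}$, where the tangent line is horizontal, and for every $a\neq 0$ the last two columns agree, showing that the tangent line crosses the curve at a second point.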

    Manipulating the slider reduces the load of abstract thinking in your learning process.     You have less to keep in your memory; some of the abstract thinking is offloaded onto the diagram.  This could be described as contracting out (from your head to the picture) part of the visualization process.  (Visualizing something in your head is a form of abstraction.)

    Of course, reading and writing does that, too.  And even a static graph of a function lowers your visualization load.  What interactive diagrams give the student is a new tool for offloading abstraction.

    You can also think of it as providing external chunking.  (I'll have to think about that more…)

    Simple manipulative diagrams vs. complicated ones

    The diagram above is very simple with no bells and whistles.  People have come up with much more complicated diagrams to illustrate a mathematical point.  Such diagrams:

    • May give you buttons that give you a choice of several curves that show the tangent line.
    • May give a numerical table that shows things like the slope or intercept of the current tangent line.
    • May also show the graph of the derivative, enabling you to see that it is in fact giving the value of the slope.

    Such complicated diagrams are better suited for the student to play with at home, or to play with in class with a partner (much better than doing it by yourself).  When the teacher first explains a concept, the diagrams ought to be simple.

    Examples

    • The Definition of derivative demo (from the Wolfram Demonstration Project) is an example that provides a table that shows the current values of some parameters that depend on the position of the slider.
    • The Wolfram demo Graphs of Taylor Polynomials is a good example of a demo to take home and experiment extensively with.  It gives buttons to choose different functions, a slider to choose the expansion point, another one to choose the number of Taylor polynomials, and other things.
    • On the other hand, the Wolfram demo Tangent to a Curve is very simple and differs from the one above in one respect: It shows only a finite piece of the tangent line.  That actually has a very different philosophical basis: it is representing for you the stalk of the tangent space at that point (the infinitesimal vector that contains the essence of the tangent line).
    • Brian Hayes wrote an article in American Scientist containing a moving graph (it moves only  on the website, not in the paper version!) that shows the changes of the population of the world by bars representing age groups.  This makes it much easier to visualize what happens over time.  Each age group moves up the graph — and shrinks until it disappears around age 100 — step by step.  If you have only the printed version, you have to imagine that happening.  The printed version requires more abstract visualization than the moving version.
    • Evaluating an algebraic expression requires seeing the abstract structure of the expression, which can be shown as a tree.  I would expect that if the students could automatically generate the tree (as you can in Mathematica)  they would retain the picture when working with an expression.  In my post computable algebraic expressions in tree form I show how you could turn the tree into an evaluation aid.  See also my post Syntax trees.
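Mathematica is not the only tool that can produce such a tree. As a rough illustration, Python's standard `ast` module parses an expression written in Python syntax, and a short recursive function can print its structure. The function `show_tree` is invented for this sketch and handles only binary operations, unary operations, names and constants.

```python
import ast

def show_tree(node, depth=0):
    """Return the lines of an indented tree for a parsed arithmetic expression."""
    pad = "  " * depth
    if isinstance(node, ast.BinOp):
        lines = [pad + type(node.op).__name__]
        lines += show_tree(node.left, depth + 1)
        lines += show_tree(node.right, depth + 1)
        return lines
    if isinstance(node, ast.UnaryOp):
        return [pad + type(node.op).__name__] + show_tree(node.operand, depth + 1)
    if isinstance(node, ast.Name):
        return [pad + node.id]
    if isinstance(node, ast.Constant):
        return [pad + repr(node.value)]
    raise ValueError("unsupported expression node: " + type(node).__name__)

# a x^2 + b x + c in Python syntax
expression = ast.parse("a*x**2 + b*x + c", mode="eval").body
tree_lines = show_tree(expression)
print("\n".join(tree_lines))
```

The root of the printed tree is the last addition to be performed, and the leaves are the variables and constants, which is the structure you have to see in order to evaluate the expression.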

    This blog has a category "Mathematica" which contains all the graphs (many of them interactive) that are designed as an aid to offloading abstraction.


    Prechunking

    The emerging theory of how the brain works gives us a new language for discussing how we teach, learn and communicate math.

    Modules

    Our minds have many functionalities.  They are implemented by what I called modules in Math and modules of the mind because I don’t understand very much about what cognitive scientists have learned about how these functionalities are carried out.  They talk about a particular neuron, a collection of neurons, electrical charges flowing back and forth, and so on, and it appears there is no complete agreement about these ideas.

    The functions the modules implement are physical structures or activities in the brain.  At a certain level of abstraction we can ignore the mechanism.

    Most modules carry out functionalities that are hidden from our consciousness.

    • When we walk, the walking is carried out by a module that operates without our paying (much) attention to it.
    • When we recognize someone, the identity of the person pops into our consciousness without us knowing how it got there.  Indeed, we cannot introspect to see how the process was carried out; it is completely hidden.

    Reasoning, for example if you add 56 and 49 in your head, has part of the process visible to your introspection, but not all of it.  It uses modules such as the sum of 9 and 6 which feel like random access memory.  When you carry the addition out, you (or at least I) are conscious of the carry: you are aware of it and aware of adding it to 9 to get 10.

    Good places to find detailed discussion of this hiddenness are references [2] and [4] below.

    Chunking

    Math ed people have talked for years about the technique of chunking in doing math.

    • You see an algebraic expression, you worry about how it might be undefined, you gray out all of it except the denominator and inspect that, and so on.  (This should be the subject of a Mathematica demo.)
    • You look at a diagram in the category of topological spaces.  Each object in the diagram stands for a whole, even uncountably infinite, space with lots of open and closed subsets and so on, but you think of it just as a little pinpoint in the diagram to discover facts about its relationship with other spaces.  You don’t look inside the space unless you have to to verify something.

    Students have a hard time doing that.  When an experienced mathematician does this, they are very likely to chunk subconsciously; they don’t think, “Now I am chunking”.  Nevertheless, you can call it to their attention and they will be aware of the process.

    There are modules that perform chunking whose operation you cannot be aware of even if you think about it.  Here are two examples.

    Example 1. Consider these two sentences from [2], p. 137:

    • “I splashed next to the bank.”
    • “There was a run on the bank.”

    When you read the first one you visualize a river bank.  When you read the second one you visualize a bank as an institution that handles money.  If these two sentences were separated by a couple of paragraphs, or even a few words, in a text you are likely not to notice that you have processed the same word in two different ways.  (When they are together as above it is kind of blatant.)

    The point is that when you read each sentence your brain directly presents you with the proper image in each case (different ones as appropriate).  You cannot recover the process that did that (by introspection, anyway).

    Example 2. I discussed the sentence below in the Handbook.  The sentence appears in reference [3].

    …Richard Darst and Gerald Taylor investigated the
    differentiability of functions $f^p$ (which for our
    purposes we will restrict to $(0,1)$) defined for
    each $p\geq1$ by

    In this sentence, the identical syntax (a,b) appears twice; the first occurrence refers to the open interval from 0 to 1 and the second refers to the GCD of integers m and n.  When I first inserted it into the Handbook’s citation list, I did not notice that (I was using it for another phenomenon, although now I have forgotten what it was).  Later I noticed it.  My mind preprocessed the two occurrences of the syntax and threw up two different meanings without my noticing it.

    Of course, “restricting to (0, 1)” doesn’t make sense if (0, 1) means the GCD of 0 and 1, and saying “(m, n) = 1” doesn’t make sense if (m, n) is an interval.  This preprocessing no doubt came to its two different conclusions based on such clues, but I claim that this preprocessing operated at a much deeper level of the brain than the preprocessing that results in your thinking (for example) of a topological space as a single unstructured object in a category.

    This phenomenon could be called prechunking.  It is clearly a different phenomenon than zooming in on a denominator and then zooming out on the whole expression as I described in [1].

    This century’s metaphor

    In the nineteenth century we came up with a machine metaphor for how we think.  In the twentieth century the big metaphor was that our brain is a computer.  This century’s metaphor is that of a bunch of processes in our brain and in our body all working simultaneously, mostly out of our awareness, to enable us to live our life, learn things, and just as important (as Davidson [4] points out) to unlearn things.  But don’t think we have Finally Discovered The Last Metaphor.

    References

    1. Zooming and chunking in abstractmath.org.
    2. Mark Changizi, The vision revolution.  Benbella Books, 2009.
    3. Mark Frantz, “Two functions whose powers make fractals”.  American Mathematical Monthly, v 105, pp 609–617 (1998).
    4. Cathy N. Davidson, Now you see it.  Viking Penguin, 2011.  Chapters 1 and 2.
    5. Math and modules of the mind (previous post).
    6. Cognitive science in Wikipedia.
    7. Charles Wells, The handbook of mathematical discourse, Infinity Publishing Company, 2003.

    Playing with Riemann Sums

    I had a satori [Note 2].  I felt like the guy in the ads who sits in front of his new ultrafast computer with the wind blowing his hair back and bracing himself by holding onto the desk.  (My hair was dark then but I certainly was not wearing a tie.)

    That convergence theorem was talking about something BIG.

    I visualized a Cloud of Riemann Sums floating around and swerving closer to the Right Answer as their meshes decreased.

    A Riemann Sum has a lot of parameters:

    • Its mesh.  This can be any positive real number.
    • Its choice of subintervals. Any positive integer!  There can be billions of subintervals.
    • And, ye gods, the individual choice of each evaluation point for each subinterval in each Riemann Sum

    Those are three independent parameters, except for the constraint imposed by the mesh on each choice of subintervals [Note 3].  This means there are uncountably infinitely many of these sums.
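
    The parameters listed above can be made concrete in a short Python sketch.  The names riemann_sum and mesh are illustrative assumptions, not taken from the Mathematica notebook, and the mesh is taken here to be the width of the widest subinterval (see Note 3 for the other convention).

```python
import math


def riemann_sum(f, cuts, tags):
    """Riemann sum of f for the partition `cuts` with evaluation points `tags`.

    cuts: increasing list [x_0, ..., x_n] of subinterval endpoints
    tags: list [t_1, ..., t_n] with x_{i-1} <= t_i <= x_i
    """
    if len(tags) != len(cuts) - 1:
        raise ValueError("need exactly one evaluation point per subinterval")
    total = 0.0
    for left, right, t in zip(cuts, cuts[1:], tags):
        if not (left <= t <= right):
            raise ValueError("evaluation point outside its subinterval")
        total += f(t) * (right - left)
    return total


def mesh(cuts):
    """Width of the widest subinterval of the partition."""
    return max(right - left for left, right in zip(cuts, cuts[1:]))


# One particular choice: four equal subintervals of [0, pi], evaluated at midpoints.
cuts = [i * math.pi / 4 for i in range(5)]
tags = [(a + b) / 2 for a, b in zip(cuts, cuts[1:])]
value = riemann_sum(math.sin, cuts, tags)
print(mesh(cuts), value)  # the exact integral of sin over [0, pi] is 2
```

    Changing cuts changes both the number of subintervals and the mesh; changing tags alone leaves the partition fixed but still moves the value of the sum.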

    I tell my students that we have to zoom in and zoom out [Reference 2] from a problem.  When we zoom out a complicated structure is thought of as a point in a certain relationship with other structures-as-points.  Then to understand something we zoom in (selectively) to see the details that make it work.  What I remember from my satori is that I didn’t visualize them as points but rather as little blurs, sort of like the blurs in Mumford’s red book [Reference 3], which I think was the first non-constipated math text I had ever seen.

    Riemann Sums in Mathematica

    In the nineties, I was on a grant to create Mathematica programs for students, and one of the notebooks I created allowed you to easily exhibit Riemann sums with various parameters.  I also included code that would show a cloud.

    Below is a cloud.  It is a plot of the values of 300 Riemann sums for $\int_0^{\pi} \sin x \,dx$.  They have randomly chosen meshes from $0$ to $\pi/2$ and the subintervals and individual evaluation points for each subinterval are also chosen randomly.
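
    A cloud of this kind can be approximated in a few lines of Python.  This is a sketch under stated assumptions, not the code in the Mathematica notebook: the function random_riemann_sum, the way subinterval widths are drawn, and the lower cutoff on the mesh (which keeps the number of subintervals manageable) are all choices made for this example.

```python
import math
import random


def random_riemann_sum(f, a, b, max_mesh, rng):
    """Return (mesh, value) for one randomly generated Riemann sum of f on [a, b]."""
    # Assumption: pick a target mesh, bounded away from 0 so partitions stay small.
    target = rng.uniform(0.05 * max_mesh, max_mesh)
    # Build the partition from left to right with random widths no bigger than target.
    cuts = [a]
    while cuts[-1] < b:
        step = rng.uniform(0.5 * target, target)
        cuts.append(min(b, cuts[-1] + step))
    pairs = list(zip(cuts, cuts[1:]))
    # Evaluate f at a random point of each subinterval.
    value = sum(f(rng.uniform(left, right)) * (right - left) for left, right in pairs)
    actual_mesh = max(right - left for left, right in pairs)
    return actual_mesh, value


rng = random.Random(0)  # fixed seed so the cloud is reproducible
cloud = [random_riemann_sum(math.sin, 0.0, math.pi, math.pi / 2, rng) for _ in range(300)]

# Each pair is one point of the cloud: mesh on the horizontal axis, value on the vertical.
worst = max(abs(v - 2.0) for _, v in cloud)
print(len(cloud), "sums; largest distance from the exact value 2:", worst)
```

    Plotting the pairs in cloud with mesh on the horizontal axis should show the points crowding toward the exact value 2 as the mesh shrinks, since each sum differs from 2 by at most the mesh times $\pi$.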

    The cloud below is a plot of the values of 300 Riemann sums for the area of the upper right quarter circle of radius 2 with center at origin.  Its meshes range from 0 to 1, and other properties are similar to the one above.  The vertical spread of the points is considerably bigger, presumably because of the vertical tangent line at the right-hand end of the interval of integration.
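
    A rough calculation is consistent with that guess, assuming the quarter circle is integrated as the graph of $\sqrt{4-x^2}$ on $[0,2]$ (an assumption about the setup, since the notebook code is not shown here).  On a last subinterval of width $h$, the choice of evaluation point can move that one term by about $2h^{3/2}$, whereas for $\sin x$ a subinterval of width $h$ can move its term by at most $h^2$.

```latex
% Exact value: one quarter of the area of a circle of radius 2.
\[ \int_0^2 \sqrt{4-x^2}\,dx = \tfrac14\,\pi\cdot 2^2 = \pi \]

% The slope is unbounded at the right-hand endpoint (vertical tangent).
\[ f(x)=\sqrt{4-x^2}, \qquad f'(x) = \frac{-x}{\sqrt{4-x^2}} \to -\infty \quad\text{as } x\to 2^- \]

% On the last subinterval [2-h, 2] the integrand ranges from f(2-h) down to 0,
% so that term of the Riemann sum lies anywhere between 0 and h f(2-h).
\[ f(2-h) = \sqrt{4h-h^2}, \qquad 0 \le f(t)\,h \le h\sqrt{4h-h^2} \approx 2h^{3/2} \quad (t\in[2-h,2]) \]

% For sin x the slope is at most 1 in absolute value, so a subinterval of width h
% changes its term by at most h^2, which is much smaller than h^{3/2} for small h.
\[ |\sin s - \sin t|\,h \le |s-t|\,h \le h^2 \]
```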

    When you click on the code for either of these you get a different cloud with the same parameters.

    You can access the notebook containing the code for this via Abmath Gate.    Be sure to read the ReadMe file.

    Notes

    [1] This was 1961.  Of course the book didn’t say things such as “with any choice of points-to-evaluate-at”.  It said what it had to say in stilted academic prose which required reading it two or three times before understanding it.  Academic prose is much better these days.  See Reference [1].

    I was quite good at reading complicated prose. My ACT scores were a tad higher in English or Language or whatever it is called than they were in Math.  With the Internet, math exposition should do much more with pictures, interactive things, and lots of examples (which don’t waste paper now).  But that is another diatribe…

    [2] This is a snooty word for lightbulb flashing over your head.  Every once in a while I give in to the temptation to use some obscure word to impress people as to the variety of things I know about.  Teachers, don’t do this to your students.  Other professors are fair game.

    [3] The same choice of subinterval can correspond to many different meshes, if your definition of mesh requires only that each subinterval be narrower than the mesh, rather than requiring that the mesh be the size of the biggest subinterval.  (I had never thought about that until I wrote this.)

    [4] The Mathematica Demonstrations website has several other notebooks that exhibit Riemann Sums.

    References

    [1]  The Revolution in Technical Exposition II, post on this blog.

    [2]  Zooming and Chunking in abmath.

    [3] D. Mumford, The Red Book of Varieties and Schemes (second expanded ed.), Springer Lecture Notes in Math 1358, Springer-Verlag, Berlin, 1999.   (I have not seen this edition.  What I remember is the Red Book as it was in the 1967 Algebraic Geometry Summer School at Bowdoin.  I hope the smudges survive in the new version.  As I remember the smudges were bigger for points that were more generic or something like that.  Those smudges caused me a kind of satori, too.)
