My Summation                      



A ‘result’ could be a lemma, proposition, theorem, or corollary. Hence, this term refers to a proven statement — the outcome of a successful mathematical proof. To prove a statement, there is one potentially trivial prerequisite that is often taken for granted: you must understand the statement. This understanding can easily be checked by restating the potential result statement using quantificational logic. If you find that you cannot restate the result statement in a purely logical form, stop — it makes no sense to try to prove a statement that you cannot clearly restate; you must understand what you are trying to prove…

A Note from Blog Author Michelle Blair
You’ve waited ever so patiently… And for that, I am (infinitely) grateful. My Summation took a hiatus, induced by that ultimate annual partitioning of the calendar year, otherwise regarded as the Fall and Spring semesters. Eventually, I will learn to walk and prove theorems at the same time. In the meanwhile, do not fret. The next full blog post is on the not-too-distant horizon. It will tackle elusive mathematical phraseology — a challenge to restating ‘result’ statements.

I want to give a special shout out to you: all of my blog followers. Just a few days ago, you became over 100 strong!

The mathematical journey continues. Buckle your seat belts. It should prove to be a fun ride.

    Jun 1    
Advice on Achieving Results (and a Note from the Blog Author)

The tradition continues… In a hallowed honor, this blog post carries the torch to the 102nd episode of the Carnival of Mathematics, a monthly tradition of highlighting noteworthy contributions to the mathematical blogosphere. Without loss of respectability (or generality) via mild literary jest, let’s talk blog, my fellow math lovers…

Disclaimer: The attempted humor herein represents the sole effort of My Summation blog creator Michelle F. Blair. Any evidence of lameness, mistakes, or misrepresentations should be blamed on her only.

To be or knot to be and other such questions are tackled in a disentangling post by Richard Elwes titled “The Revenge of the Perko Pair.” Here, Elwes resurrects the twisted but enchanting tale of the Perko pair from knot theory topology in an effort to correct repeated misrepresentations (aka errors) by other expositors (a group that includes some fairly high-profile websites).

After reading a classic piece of American literature with a number in the title, blog writer Evelyn Lamb asks herself “What Is the Funniest Number?” Well, what do you think? If you cannot decide, take a tour through some interesting numbers that may tickle you to a giggle. And if you cannot decide because you don’t think numbers can be funny, I triple dare you to read this article.

Now, if you do not find yourself cracking up with the previous blog entry highlighted, I guarantee you’ll ROTFLYBO (i.e., roll on the floor laughing your butt off) with blog writer Ben Orlin’s hilarious skit titled “Math Experts Split the Check.” Have you ever seen a mathematician, an engineer, a computer scientist, an economist, and a physicist split a dinner check?

It’s a curve. It’s a plane. No. It’s an ellipse! Or is it? In “Diana the Huntress and Curve Fitting,” blog author Christian Perfect takes on a huntress to determine whether her protector (aka a guardrail) meets the definition of an ellipse. With a photo, a graph, and a definition, this is safe enough to actually try at home! Check it out!

In a tribute, math blogger A.P. Goucher niftily projects history into a needed direction by highlighting inspiring and “Influential Mathematicians.” Without spoiling the unusual surprise elegantly embedded in this article, I shall only remark: these mathematicians took on many more challenges both inside and outside of their respective fields.

Rob Eastaway discusses pitfalls in formulations of the Monty Hall puzzle in “Monty Hall - Beware the Empty Box.” He also warns that “a little learning can be a dangerous thing.” (Any math lover can appreciate knowing too much… But what an exhilarating danger, eh?)

And in a sobering case study of applied geometry, Nicola Twilley’s post “Squaring the Circle” discusses how agricultural practice has involved circular plots, resulting in large swaths of essentially wasted corners of land. Oh, the irony.

Take a thought-provoking break from studying (or whatever work you’re doing) to be more mathematically observant of your surroundings. Need a role model? Visit Kate Degner’s blog article “Talking the Talking and Walking the (Math) Walk” to get you started in exercising your math vision.

Ever heard of the annual Bridges conference held in conjunction with the Joint Mathematics Meetings? No? Google it! And check out how blogger Kaz Maslanka contributed in “Bridges Enschede 2013.”

Further, no Carnival would be complete without a PSA (public service announcement) that truly enriches our respective capacities to better engage with mathematics. And so I introduce blogger Peter Rowlett, who has collected and organized “Your Suggestions of iPad Apps for University Mathematics Teaching.”

Finally, incredulously inventive Patrick Stevens proves that a proof formulated as a poem does exist. This Carnival of Mathematics episode fittingly closes with a proof-poem… Visit “Slightly Silly Sylow Pseudo-Sonnets” to see the synergistically literary and mathematical antics of Patrick Stevens in all of their grandeur.

Happy math blogging and math blog reading! And join the carnival. Wow and be wowed.

*To learn more about the Carnival of Mathematics, visit

    Sep 17    
Carnival of Mathematics 102: My Summation of Other Blogs

Faith in the existence of infinity is a prerequisite for appreciating the beauty of calculus. For some, this faith comes with little effort, but the longer it takes to brew your faith, the richer your appreciation.

For the uninitiated, calculus is all about limits, and limits are based on the concept of convergence while approaching infinity. Here’s an abbreviated tour of three central concepts in calculus: 

(1) The Limit

(2) The Derivative (A Limit)

(3) The Integral (Another Limit)

Remark: This is all about limits…

The Limit

We will denote a sequence of values indexed by natural numbers by { an }, and we will say that the sequence converges if and only if, for any distance we can come up with, no matter how small (we’ll call this distance ε, which is just any positive real number), there is an index value N such that when n ≥ N, the distance between an and some value (we’ll call this special value a) is smaller than ε. If such an N exists for every ε (and we’re lucky enough to have identified the special convergent value a), we will say:

$$\lim_{n \to \infty} a_n = a$$

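To restate this definition using quantificational logic:

$$\forall \varepsilon > 0 \;\; \exists N \in \mathbb{N} \;\; \forall n \ge N : \; |a_n - a| < \varepsilon$$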

If you’re not lucky enough to identify what this special a is, there is an alternate definition that does not require luck thanks to mathematician Augustin-Louis Cauchy. This alternate definition tells you whether or not the sequence converges, in which case, you will be aware that a special value a exists (even if you don’t know what the special value a is). In quantificational logic:

$$\forall \varepsilon > 0 \;\; \exists N \in \mathbb{N} \;\; \forall m, n \ge N : \; |a_n - a_m| < \varepsilon$$

These 2 definitions actually imply each other. Proving these implications is a great exercise to practice proving convergence (and invoking a dear friend to all students of mathematical analysis, named the “Triangle Inequality”).
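
To make the hunt for N concrete, here is a minimal sketch using the sequence an = 1/n with limit a = 0. The sequence, the function name first_index_within, and the sampled values of ε are all assumptions chosen for illustration; the spot-check only looks at a thousand terms, so it illustrates the definition rather than proving convergence.

```python
import math


def first_index_within(epsilon):
    """Return an index N such that |1/n - 0| < epsilon for every n >= N.

    For a_n = 1/n, the condition 1/n < epsilon holds exactly when
    n > 1/epsilon, so N = floor(1/epsilon) + 1 works.
    """
    return math.floor(1 / epsilon) + 1


for epsilon in (0.5, 0.1, 0.001):
    N = first_index_within(epsilon)
    # Spot-check the next thousand terms from index N onward.
    assert all(abs(1 / n - 0) < epsilon for n in range(N, N + 1000))
    print(f"epsilon = {epsilon}: N = {N} works for the sampled terms")
```

Notice that a smaller ε forces a larger N, which is exactly the dependence the quantifier order ("for every ε there is an N") expresses.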

We can also define convergence of a function without involving sequences; we can use  the concept of distance alone to show that a function converges to a given value. 

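For any distance between f( x ) and f( x0 ), which we’ll again denote as ε:

If we can always identify a corresponding distance between x and x0 (which we’ll denote δ) in the domain of f such that the distance between f( x ) and f( x0 ) is less than ε whenever the distance between x and x0 is less than δ, then the limit of f exists at x0. Restating this definition using quantificational logic:

$$\forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x \in \mathrm{dom}(f) : \; |x - x_0| < \delta \implies |f(x) - f(x_0)| < \varepsilon$$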

With the limit defined this way (as f( x ) approaching the value f( x0 ) itself), the limit of a function exists at x0 if and only if the function is continuous at x0. So, if you can show that f( x ) approaches f( x0 ) at a given point, you also know the function is continuous at that point.

The Derivative (A Limit)

Supposing we want to find out how fast something is changing in the smallest time increment we can identify, the derivative rescues us. We’ll call this small time increment h, and we name the following limit (providing that it exists) the derivative:

$$\lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}$$

Note that evaluating the argument of this limit at h = 0 places us in an untenable situation because division by 0 is undefined. Functionally, the limit can be rewritten equivalently (writing x for x0 + h) as

$$\lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}$$

If this limit exists, the function is differentiable at x0. And with this rewriting and subsequent algebraic gyrations, we can often reach a simpler expression that helps us to just employ properties (i.e., laws) of limits, such as linearity. However, it may be that neither form of the derivative yields an easily identifiable limit. In any case, we can denote the derivative with an accent mark:

$$f'(x_0)$$

This notation raises an important observation: the derivative itself is also a function. So, you could graph it for different values of x0. And so, clearly, you could examine whether the derivative of the derivative exists:

$$f''(x_0)$$

This would be the second derivative. For some functions (polynomials, for instance), you could keep doing this until you eventually get a derivative that equals zero. But for other infinitely differentiable functions, no matter how many times you differentiate the function, you will never get zero. One example: g( x ) = e^x.
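
To watch the derivative behave like a limit, here is a small numerical sketch of the difference quotient as h shrinks. The helper name difference_quotient, the choice of the exponential function, and the point x0 = 1 are assumptions made for illustration; floating-point arithmetic only approximates the limit, so treat the printed errors as suggestive rather than exact.

```python
import math


def difference_quotient(f, x0, h):
    """The quantity (f(x0 + h) - f(x0)) / h whose limit as h -> 0 is the derivative."""
    return (f(x0 + h) - f(x0)) / h


x0 = 1.0
exact = math.exp(x0)  # the exponential function is its own derivative
for h in (1e-1, 1e-2, 1e-3, 1e-4):
    q = difference_quotient(math.exp, x0, h)
    print(f"h = {h:g}: quotient = {q:.6f}, distance from e = {abs(q - exact):.2e}")
```

We never plug in h = 0 (that would divide by zero); we only let h get closer and closer to it.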

The Integral (Another Limit)

The geometric interpretation of an integral in single variable calculus is “area.” So, if we want to find the area between the graphed curve or line representing a function’s trajectory and the x-axis, the integral becomes our hero. Considering only continuous functions on a closed interval leaves us comfortably with mathematician Bernhard Riemann’s strategy for integration, since every such function is Riemann integrable. There are other strategies that handle less well-behaved functions. But (for now) we will remain in the realm of introductory calculus, which uses Riemann’s method to motivate the first and second fundamental theorems of calculus.

For any interval [ a,b ], we can partition the interval into disjoint subintervals, using partition points xi where the index i is a natural number:


With n partition points, we’ll denote the set of these points as Pn.

With these partition points, we can think of a bunch of rectangles with the following area:




Assuming each of these functional values exists (for simplicity, we use this assumption; for rigor, we need to involve infima and suprema), we can just sum these areas to determine the area value we seek. With a given partition Pn, the former can be regarded as the “lower sum” which we’ll denote L( f, Pn ), and the latter can be regarded as the “upper sum” which we’ll denote U( f, Pn ). But if f is not a constant function, then clearly, L( f, Pn ) will not equal U( f, Pn ), that is,


Enter the limit. When the limit of the lower sums and the limit of the upper sums are equal, that is, when

$$\lim_{n \to \infty} L(f, P_n) = \lim_{n \to \infty} U(f, P_n)$$

We say that f is integrable, and we call this limit the integral, and so

$$\int_a^b f(x)\,dx = \lim_{n \to \infty} L(f, P_n) = \lim_{n \to \infty} U(f, P_n)$$

Via the fundamental theorems of calculus, we can even see that we can evaluate the integral (a limit) of a derivative (a limit) using the “antiderivative” (an undoing of the derivative).
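
Here is a minimal sketch of lower and upper sums squeezing toward the same value. It assumes an increasing function (so the smallest value on each subinterval sits at the left endpoint and the largest at the right endpoint) and equally spaced partition points; the function name lower_upper_sums and the example f( x ) = x² on [ 0,1 ] are choices made for illustration only.

```python
def lower_upper_sums(f, a, b, n):
    """Lower and upper sums for an increasing f on [a, b] using n equal subintervals."""
    width = (b - a) / n
    points = [a + i * width for i in range(n + 1)]
    # For an increasing function, the minimum on each subinterval is at its
    # left endpoint and the maximum is at its right endpoint.
    lower = sum(f(points[i]) * width for i in range(n))
    upper = sum(f(points[i + 1]) * width for i in range(n))
    return lower, upper


def square(x):
    return x * x


for n in (10, 100, 1000):
    low, up = lower_upper_sums(square, 0.0, 1.0, n)
    print(f"n = {n}: L = {low:.6f}, U = {up:.6f}, U - L = {up - low:.6f}")
```

The gap U - L shrinks as n grows, and both sums close in on 1/3, the value the antiderivative x³/3 gives for this integral.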

Hopefully, you are now convinced: 

If you are studying calculus, you are surrounded by limits. But don’t worry, you’ll only see them when you reach infinity (insert laughter here; this is impossible).

    Aug 19    
Surrounded by (Unreachable) Limits

The 100th edition of the Carnival of Mathematics is being hosted by Richard at his math blog Simple City. And as usual, the carnival features many notable June 2013 contributions to the mathematical blogosphere, including Empowering Determinant Motivation from My Summation!

Thanks to Richard and other Carnival of Mathematics organizers for the feature! Thanks also to all My Summation supporters (old, new, and future) on Tumblr!

Other bloggeriffic announcements: 

My Summation will host the Carnival of Mathematics 102 installment, summing up cool math blog happenings that take place in August 2013.

Posts from My Summation are catalogued at, the My Summation page on FaceBook, and also via @MySummation on Twitter.

    Jul 3    
Bursting into the Mathematical Blogosphere

Tutelage purveyed via rote rules can easily bury the hidden intricacy, centrality, and utility of continuity. Continuity is a desirable property of functions because it facilitates computation of derivatives and integrals, which are both fundamental in describing other properties and implications of rates of change. 

The theoretical backbone of continuity is usually first presented via the concept of the limit, which is essentially the concept of convergence. If the limit exists for every element in the domain of the function f, hereafter dom( f ), then the function is said to be continuous. This is proven by selecting an arbitrary element of dom( f ), and proving the limit exists for this element. To be thorough, the endpoints warrant special treatment; the right-hand limit must exist for the left endpoint, and the left-hand limit must exist for the right endpoint.

Three definitions of continuity are particularly important in early studies of theoretical calculus. Being able to prove continuity via each definition nurtures mastery and empowers students to choose whichever method is preferred. It is often said that the set property method is the least cumbersome, but this method is arguably the most abstract.

So, here we define each of these 3 perspectives conceptually and rigorously, while also demonstrating how we can use these definitions. We conclude by proving why each perspective in this trilogy helps us reach the same goal: 

To determine whether a function is continuous.

The Sequential Perspective

Sequences essentially map each natural number (i.e., positive integer) to a value. Natural numbers go from 1 through infinity, which means that sequences yield an infinite set of values. 

Let { xn } be a sequence in dom( f ) that converges to x0. If for every such sequence { xn } we also see that { f( xn ) } converges to f( x0 ), then f is continuous at x0.

Note x0 can be regarded as the limit of the sequence { xn } as n approaches infinity (where n is considered an index value for the sequence, and n takes on only positive integer values). And so we can write this as

$$\lim_{n \to \infty} x_n = x_0$$

Or equivalently we can write this as

$$x_n \to x_0 \text{ as } n \to \infty$$

Also, observe that this definition can be regarded as a criterion that a function must pass to be considered continuous; for a function to be continuous at x0, the function must map every sequence in the domain that converges to x0 to a sequence that converges to the function value at x0. To restate this criterion for continuity using quantificational logic:

$$\forall \{x_n\} \subseteq \mathrm{dom}(f) : \; \lim_{n \to \infty} x_n = x_0 \implies \lim_{n \to \infty} f(x_n) = f(x_0)$$

To see how we can apply this criterion to prove continuity, consider the function f( x ) = 2x - 3. We will show f is continuous.

Define a function f mapping from the reals to the reals such that f( x ) = 2x - 3. Let { xn } be a sequence in dom( f ) that converges to x0, which is also in dom( f ). To prove f is continuous at x0, we need to show that { f( xn ) } converges to f( x0 ). That is, for any distance ε (as a distance, clearly ε is a positive real number), we need an index value N such that n > N implies that the distance between f( xn ) and f( x0 ) is less than ε. So, let ε > 0. Computing the distance yields | f( xn ) - f( x0 ) | = | ( 2xn - 3 ) - ( 2x0 - 3 ) | = | 2xn - 3 - 2x0 + 3 | = | 2xn - 2x0 | = 2*| xn - x0 |. Since { xn } converges to x0, the definition of convergence applies to any positive distance we like, and in particular to ε/2. So, there is an index value N such that when n > N, we see | xn - x0 | < ε/2. Now, assume n > N. Then | f( xn ) - f( x0 ) | = 2*| xn - x0 | < 2ε/2 = ε as desired. And since x0 was an arbitrary element of dom( f ), it follows that f is continuous on all of dom( f ). ■
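
As a numerical companion to this proof, the sketch below follows one particular sequence, xn = x0 + 1/n, and prints both distances. The starting point x0 = 1.5 and this specific sequence are assumptions for illustration; checking one sequence does not prove continuity, since the criterion quantifies over every convergent sequence.

```python
def f(x):
    return 2 * x - 3


x0 = 1.5
for n in (1, 10, 100, 1000):
    xn = x0 + 1 / n  # one sequence in dom(f) converging to x0
    input_distance = abs(xn - x0)
    output_distance = abs(f(xn) - f(x0))
    # The proof shows output_distance = 2 * input_distance for this f.
    print(f"n = {n}: |xn - x0| = {input_distance:.6f}, |f(xn) - f(x0)| = {output_distance:.6f}")
```

The output distance is always twice the input distance, which is why asking for | xn - x0 | < ε/2 is enough.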

The Distance Perspective

As we have already started to see in our proof above, the absolute value of the difference between two numbers is essentially the distance between those two numbers. And so, the idea of infinitesimally small distances (denoted ε) is central to the “distance” perspective of continuity.

Let ε be a positive real number representing a distance between f( x ) and f( x0 ). If we can always identify a corresponding distance between x and x0 in dom( f ) such that the distance between f( x ) and f( x0 ) is less than ε, then f is continuous at x0.

The distance perspective on continuity says that (for continuous functions) however close we want to get to f( x0 ), we can get there via a carefully selected closeness to x0 in dom( f ), and we will represent this closeness by δ. However, identifying this appropriate closeness (i.e., δ) usually takes some scratch work, and it may even involve some hunting. As a result, how you identify δ may be a matter of personal style. The “bottom line” (i.e., key) in this criterion for continuity is that we must be able to find δ, that is: for a continuous function, δ must exist.

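Restating this continuity criterion using quantificational logic:

$$\forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x \in \mathrm{dom}(f) : \; |x - x_0| < \delta \implies |f(x) - f(x_0)| < \varepsilon$$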

Again, we apply this criterion to prove continuity, by considering the same function f( x ) = 2x - 3. We will show f is continuous.

Let ε be a positive real number. Define a function f mapping from the reals to the reals such that f( x ) = 2x - 3. Let x0 be in dom( f ). To prove f is continuous, we need to find a δ > 0 such that | x - x0 | < δ implies that | f( x ) - f( x0 ) | < ε. Manipulating the inequality | f( x ) - f( x0 ) | < ε will help us to identify the δ we seek. Proceeding with our manipulation (i.e., scratch work) we see that we need | f( x ) - f( x0 ) | = | 2x - 3 - ( 2x0 - 3 ) | = | 2x - 3 - 2x0 + 3 | = | 2x - 2x0 | = 2*| x - x0 | < ε. Note that we achieve our desired result if we set δ equal to ε/2 (which is totally valid since all we need is for δ to be a positive real number and ε/2 is a positive real number). So, letting δ = ε/2, whenever | x - x0 | < δ we get | f( x ) - f( x0 ) | = 2*| x - x0 | < 2δ = 2ε/2 = ε as desired. And since x0 was an arbitrary element of dom( f ), it follows that f is continuous on all of dom( f ). ■
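
The choice δ = ε/2 can also be spot-checked numerically. The sketch below samples points inside ( x0 - δ, x0 + δ ) and confirms each one lands within ε of f( x0 ). The helper names delta_for and check, the sample size, and the fixed random seed are all assumptions for illustration; random sampling is evidence, not a proof.

```python
import random


def f(x):
    return 2 * x - 3


def delta_for(epsilon):
    """The delta found in the scratch work for f(x) = 2x - 3."""
    return epsilon / 2


def check(x0, epsilon, trials=10_000, seed=0):
    """Return True if every sampled x with |x - x0| < delta has |f(x) - f(x0)| < epsilon."""
    rng = random.Random(seed)
    delta = delta_for(epsilon)
    for _ in range(trials):
        # Sample x strictly inside (x0 - delta, x0 + delta).
        x = x0 + delta * rng.uniform(-0.999, 0.999)
        if not abs(f(x) - f(x0)) < epsilon:
            return False
    return True


print(check(4.0, 0.01))
```

Trying a δ larger than ε/2 in delta_for is a quick way to see the criterion fail for sampled points near the edge of the interval.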

The Set Property Perspective

Topology can be described as a branch of mathematics wherein properties of the domain and range are used to determine the behavior of a continuous function. The domain of a function is a set, and the range of a function is a set. So, in topology, the properties of sets are critically important. Open sets are the star witnesses for continuity. A set S is open if for every element (say x) in the set, there exists an open interval containing x that is a subset of S. In other words, a set is open only if every element of the set has “breathing room” in the set. To exploit open sets to prove continuity requires a careful dance between subsets of the image of f, and preimages of f (obtained via the inverse of f). (The image of a function is its range, and the preimage of a function is the subset of the domain needed to map to a selected image.)

Definition (Theorem): 
Let S be a subset of the image (range) of f. The preimage f⁻¹( S ) is open whenever S is open if and only if f is continuous.

Restating this continuity criterion symbolically yields:

$$f \text{ is continuous} \iff \left( S \text{ open} \implies f^{-1}(S) \text{ open} \right)$$

Again, consider the same function f( x ) = 2x - 3. We will show f is continuous.

Define a function f mapping from the reals to the reals such that f( x ) = 2x - 3. Let S be an open subset of the reals (which is the image of f). Let x0 be in f⁻¹( S ). To prove f is continuous, we need to show f⁻¹( S ) is open, which means we need to show that for x0 in f⁻¹( S ), we can always identify some breathing space for x0 within f⁻¹( S ). First, note that 2x0 - 3 = f( x0 ), which is in S. And recall S is open. So by definition, there exists ε > 0 such that the interval ( 2x0 - 3 - ε, 2x0 - 3 + ε ) is a subset of S. Rewriting, ( 2x0 - 3 - ε, 2x0 - 3 + ε ) = ( 2( x0 - ε/2 ) - 3, 2( x0 + ε/2 ) - 3 ). And note that because f is increasing, any x in the interval ( x0 - ε/2, x0 + ε/2 ) has f( x ) = 2x - 3 in ( 2x0 - 3 - ε, 2x0 - 3 + ε ), which is a subset of S, so the interval ( x0 - ε/2, x0 + ε/2 ) is in fact within f⁻¹( S ). Thus, we have identified the “breathing space” we sought for x0, and so we conclude f⁻¹( S ) is open. Therefore, f is continuous. ■
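
For this particular f, the preimage of an open interval can be written down directly by solving c < 2x - 3 < d for x. The sketch below does that and samples points to confirm they map back inside the interval. The function name preimage_of_interval and the interval ( -1, 5 ) are assumptions for illustration, and the formula is specific to f( x ) = 2x - 3.

```python
def f(x):
    return 2 * x - 3


def preimage_of_interval(c, d):
    """Preimage under f(x) = 2x - 3 of the open interval (c, d).

    Solving c < 2x - 3 < d for x gives (c + 3)/2 < x < (d + 3)/2,
    which is again an open interval.
    """
    return ((c + 3) / 2, (d + 3) / 2)


c, d = -1.0, 5.0
left, right = preimage_of_interval(c, d)
# Sample points strictly inside the preimage and confirm they map into (c, d).
samples = [left + (right - left) * k / 100 for k in range(1, 100)]
assert all(c < f(x) < d for x in samples)
print(f"The preimage of ({c}, {d}) is ({left}, {right})")
```

An open interval pulls back to an open interval here, which is the set property criterion at work for this one function.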

Equivalence of the Criteria

These three criteria for continuity are equivalent, which means the sequential definition holds if and only if the distance definition holds, and the distance definition holds if and only if the set property definition holds. And by implication, the set property definition holds if and only if the sequential definition holds. Let’s prove it.

First, we will show that a function meets the sequential criterion if and only if the function meets the distance criterion.

Assume a function meets the sequential criterion at x0. We need to show this function meets the distance criterion at x0. Assume to the contrary that this function did not meet the distance criterion. Then there would exist some ε > 0 such that for all δ > 0, there is some x in dom( f ) with | x - x0 | < δ and yet | f( x ) - f( x0 ) | ≥ ε. We denote this special ε as ε0. To garner a contradiction, let’s let n be a natural number and set δ = 1/n. Well, for each 1/n there would be some x in dom( f ) with | x - x0 | < 1/n and | f( x ) - f( x0 ) | ≥ ε0. Choose such an x for each 1/n and respectively label each selected x as xn. Note this gets us a sequence { xn } in dom( f ), and because | xn - x0 | < 1/n, this sequence converges to x0, which is also in dom( f ). And so recall that via the sequential criterion { f( xn ) } converges to f( x0 ), which means there is an index value N such that n > N implies | f( xn ) - f( x0 ) | < ε0. But we chose every xn so that | f( xn ) - f( x0 ) | ≥ ε0. Thus, we won our contradiction: it cannot be true that there exists some ε > 0 such that for all δ > 0, some x with | x - x0 | < δ has | f( x ) - f( x0 ) | ≥ ε, and so, for any possible ε > 0, there is some δ > 0 such that | x - x0 | < δ implies | f( x ) - f( x0 ) | < ε, thereby satisfying the distance criterion.

Now, assume a function meets the distance criterion at x0. To complete this proof, we need to show this function meets the sequential criterion at x0. Define a sequence { xn } in dom( f ) that converges to x0 (which is also in dom( f )). And let ε > 0. To show that { f( xn ) } converges to f( x0 ), we must find a natural number N such that for index values n > N, we see that | f( xn ) - f( x0 ) | < ε. Note that via the distance criterion, there exists a δ > 0 such that for any x in dom( f ), | x - x0 | < δ implies | f( x ) - f( x0 ) | < ε. And since { xn } converges to x0, there is a natural number N such that | xn - x0 | < δ whenever n > N. So, since each xn is in dom( f ), we know | f( xn ) - f( x0 ) | < ε whenever n > N. Thus, we have satisfied the sequential criterion. ■

Second, we will show that a function meets the distance criterion if and only if the function meets the set property criterion.

Assume a function meets the distance criterion at x0. We need to show this function meets the set property criterion at x0. Let S be an open subset of the reals. To show f⁻¹( S ) is open, we seek a number δ > 0 such that the interval ( x0 - δ, x0 + δ ) is inside f⁻¹( S ). Let x0 be in f⁻¹( S ). Then clearly f( x0 ) is in S (by the definition of the preimage). And since S is open, we know there is some ε > 0 such that the interval ( f( x0 ) - ε, f( x0 ) + ε ) is in S. Because f meets the distance criterion at x0, we know there is in fact a δ > 0 such that for any x in the interval ( x0 - δ, x0 + δ ) we see that f( x ) is in ( f( x0 ) - ε, f( x0 ) + ε ). And we have already established that this interval is inside S, so every such x is in f⁻¹( S ); that is, ( x0 - δ, x0 + δ ) is inside f⁻¹( S ). So, f⁻¹( S ) is open.

Now, assume a function meets the set property criterion at x0. To complete this proof, we need to show this function meets the distance criterion at x0. Let x0 be in dom( f ) and let ε > 0. Again, we seek a number δ > 0, but this time, we need δ such that | x - x0 | < δ implies | f( x ) - f( x0 ) | < ε. In other words, we need δ such that for any x in ( x0 - δ, x0 + δ ) we see that f( x ) is in ( f( x0 ) - ε, f( x0 ) + ε ). Note that ( f( x0 ) - ε, f( x0 ) + ε ) is open. And so because (via our assumption) we know the function meets the set property at x0, we also know that the entire preimage for this interval is open. Note also that x0 is in this preimage, since f( x0 ) is in the interval. Because the preimage is open, we know there is certainly a δ > 0 such that ( x0 - δ, x0 + δ ) is inside the preimage, which means that for any x in ( x0 - δ, x0 + δ ) we see that f( x ) is in ( f( x0 ) - ε, f( x0 ) + ε ). Choosing this δ, we can see that | x - x0 | < δ implies | f( x ) - f( x0 ) | < ε. And so the function meets the distance criterion. ■

Lastly, we will show that a function meets the set property criterion if and only if the function meets the sequential criterion.

Assume a function meets the set property criterion at x0. To show this function meets the sequential criterion at x0 we invoke our previous results: a function meets the set property criterion at x0 if and only if it meets the distance criterion at x0 if and only if it meets the sequential criterion at x0.

Now, assume a function meets the sequential criterion at x0. To complete this proof, we just show this function meets the set property criterion at x0, again via our previous results: a function meets the sequential criterion at x0 if and only if it meets the distance criterion at x0 if and only if it meets the set property criterion at x0.

And there you have it. We have unified the perspectives; the criteria are proven equivalent.


Crossley, Martin D. Essential Topology. 2005.

Fitzpatrick, Patrick M. Advanced Calculus: Second Edition. 2009.

    Jun 30    
Perspectives on Continuous Maps

Unmotivated mathematics slays enthusiasm, similar to announcing a fairy tale ending without the plot. Even when mathematical concepts are introduced in a motivated manner, we can sense an unstated truth: not all motivations are equally titillating.

Besides demonstrating utility, historical context can provide the final intuitive glue that makes it all finally “click” into our understanding. For instance, in the quest to solve systems of linear equations, the determinant was born.

Organizing Coefficients into a Matrix

A system of linear equations is a set of equations with coefficients attached to unknown variables that we essentially want to discover. We can denote this most concisely via the summation operation so that the n total equations in the system would show up as


In this system of equations, there are a total of m unknowns {x1, x2, …, xm} that appear in each of the n equations. There are a total of n x m coefficients in the system, and these coefficients may or may not have the same value. These coefficients are considered “given.” Further, these coefficients can be arranged into an array that would be a matrix with n rows and m columns.

A “Hint” from the Matrix

What if you wanted to know whether a solution to this system exists? Who wants to search for something if in fact it does not exist? Enter the determinant (which is defined when the system has as many equations as unknowns): if the determinant is zero, you can stop the hunt for a unique solution.

The reason the determinant provides this hint is that it communicates information about linear independence (or lack thereof) of the rows in the matrix (where each row represents the coefficients of one linear equation in a system). Linear independence is a necessary condition for finding a unique solution because conceptually linear independence tells us that each equation’s set of coefficients provides “new” information that can be used to identify the solution to the system. Linear dependence tells us we have essentially redundant pieces of information. As it turns out, to pin down a unique solution to a system with m unknowns we need at least m pieces of “new” information.
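
To see the hint in action, here is a minimal sketch for the smallest interesting case, a 2 x 2 coefficient matrix, using the familiar formula ad - bc. The function name det2 and both example matrices are assumptions for illustration.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return a * d - b * c


independent = [[1, 2], [3, 4]]  # neither row is a multiple of the other
dependent = [[1, 2], [2, 4]]    # second row is twice the first: redundant information

print(det2(independent))  # nonzero, so the rows carry "new" information
print(det2(dependent))    # zero, so the rows are linearly dependent
```

In the dependent case, the second equation just repeats the first at a different scale, so it cannot help pin down a unique solution.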

Defining the Determinant

Whatever way the definition of the determinant is presented, the definition involves other terms also requiring definition. 

The determinant of an n x n matrix (which we’ll call A) is a sum over all the possible permutations: each permutation picks out one entry from each row (and each column) of the matrix, these entries are multiplied together, and each such product is respectively multiplied by its permutation’s “sign.”

So, the determinant denoted det(A) can be summarized as


In this compact notation, the determinant will always involve a sum of n! terms because this is the number of possible permutations of n items. To see how this works when n = 3, for example (that is, a matrix with just 3 rows and 3 columns), observe that


represents a sum of 6 terms (summands). Rewriting this without the summation operation, we get


Note the 6 summands correspond to 6 possible permutations, shown respectively as the second row in each array below each summand. Each permutation is an element of the set S3. To get the final determinant calculation, we need to multiply each summand (for each permutation) by its respective sign. The sign of each “permutation” (embedded in the subscripts) is denoted


and will have one of 3 possible values -1, 0, or 1. If these arrangements include any repeated entries (e.g., a11*a22*a22), we will not call these true permutations; the sign of such an arrangement will be recorded as zero. If an arrangement includes only distinct entries, we’re dealing with an actual permutation and so the sign will map to -1 or 1, depending on the number of position switches (k) needed to change the trivial permutation to the permutation we are considering. The trivial permutation is regarded as such because it simply maps 1 to 1, 2 to 2, 3 to 3, and so forth; that is, it represents an identity mapping. To summarize the “switches” required to form each of the 6 permutations:


So, in this example, the determinant equals


Determinant Formula

Recall the compact notation from above


To link this to the popular cofactor expansion formula, we need only define cofactor and recognize how it relates to this compact notation. For a given matrix entry aij, we define the cofactor with respect to this entry as


which is equal to


Fix i and then sum up all cofactors (that is, sum over all possible j) and we have the same result yielded from our initial compact notation.
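
The agreement between the permutation sum and the cofactor expansion can be checked on a small example. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the sign is computed by counting inversions (whose parity matches the parity of the number of position switches), the expansion runs along the first row, and the matrix A is made up.

```python
from itertools import permutations


def sign(perm):
    """Sign of a permutation (tuple of 0-based indices), via the parity of its inversions."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1


def det_permutations(a):
    """Sum over all permutations of signed products, one entry from each row and column."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        product = 1
        for row, col in enumerate(perm):
            product *= a[row][col]
        total += sign(perm) * product
    return total


def det_cofactor(a):
    """Cofactor expansion along the first row (the fixed i), summing over all columns j."""
    n = len(a)
    if n == 1:
        return a[0][0]
    total = 0
    for j in range(n):
        # Delete the first row and column j to form the minor.
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * a[0][j] * det_cofactor(minor)
    return total


A = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
print(det_permutations(A), det_cofactor(A))
```

Both routes visit the same 3! = 6 signed products for a 3 x 3 matrix; the cofactor expansion just groups them by the entry chosen from the fixed row.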

Dissecting the Summands via Multilinear Maps

As already discussed, each summand is essentially a product of a subset of coefficients multiplied by either -1 or 1. Returning to the seed of this discussion, a question commanding an answer is: what is each summand and what is it telling us about the coefficient matrix? 

Conceptualizing the determinant via abstract algebra yields the following (equivalent) definition of a determinant: 

The determinant of matrix A denoted det (A) is an alternating n-linear mapping into a field (of scalars). 

Considering again matrix A, let’s denote the set of coefficients for the ith equation with the m-component vector ai = [ai1, ai2, … , aim]. (To enable the determinant, assume m = n and recall we have n total equations hence matrix A is an n x n matrix.) As a mapping (i.e., a function), let’s denote the determinant as D. With this notation and this abstract algebra definition, we can now claim the following useful propositions:


(i) As a mapping, D maps (a1, a2, … , an) to a scalar.


(ii) When we switch the positions of any 2 vectors ai and aj,

D (… , ai, …, aj,…) = - D (… , aj, …, ai,…)


(iii) When vectors repeat (that is, when ai = aj for some i ≠ j),

D (… , ai, …, aj,…) = 0


(iv) With 2 scalars s,t and with 2 selected vectors ai and aj,

D (a1, … , s*ai + t*aj, … , an) = D (a1, …, s*ai, … , an) + D (a1, …,t*aj, … , an)


(v) The determinant of the identity matrix is 1 (i.e., when you plug the standard basis vectors e1, e2, … , en into D in order, D will map to 1).



This is a consequence of proposition (ii).

All of these 6 propositions are part of the abstract algebra definition; they just show what it means for D to be an “alternating n-linear mapping into a field of scalars.” 
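
Several of these propositions can be spot-checked on integer row vectors, where the arithmetic is exact. The sketch below is an illustration only: the function det is a hypothetical helper that uses cofactor expansion along the first row, and the vectors a1, a2, a3 and scalars s, t are made up. Passing checks on one example do not prove the propositions.

```python
def det(a):
    """Determinant by cofactor expansion along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum(
        (-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
        for j in range(len(a))
    )


a1, a2, a3 = [2, 0, 1], [1, 3, 2], [1, 1, 4]

# Switching the positions of two vectors flips the sign.
assert det([a2, a1, a3]) == -det([a1, a2, a3])

# A repeated vector sends the determinant to zero.
assert det([a1, a1, a3]) == 0

# Additivity in one slot: D(s*a1 + t*a2, a2, a3) splits into two pieces.
s, t = 3, -2
combined = [s * x + t * y for x, y in zip(a1, a2)]
scaled_a1 = [s * x for x in a1]
scaled_a2 = [t * y for y in a2]
assert det([combined, a2, a3]) == det([scaled_a1, a2, a3]) + det([scaled_a2, a2, a3])

# The identity matrix maps to 1.
assert det([[1, 0, 0], [0, 1, 0], [0, 0, 1]]) == 1

print("All sampled properties hold for this example.")
```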

With these propositions in mind, we can start to see the inner machinery of the summands because, in the coordinate representation of any matrix, we can write each row vector as a sum of appropriately scaled standard basis vectors. So, for a given i

ai = ai1*e1 + ai2*e2 + … + ain*en

which means we can go back to proposition (i) above and rewrite it as

D (a1, a2, … , an) = sum over all choices of j1, j2, … , jn (each from 1 to n) of a1j1 * a2j2 * … * anjn * D (ej1, ej2, … , ejn)
= sum over all permutations (j1, j2, … , jn) of (1, 2, … , n) of a1j1 * a2j2 * … * anjn * D (ej1, ej2, … , ejn)

The last equality holds because if, for instance, the standard basis vectors ej1 and ej2 are equal in D (ej1, ej2, … , ejn), the respective summand zeroes out. So, we are left only with the summands where the subscripts j1, j2, … , jn are all distinct, that is, where they form a permutation of 1, 2, … , n.
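The sum over permutations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `sign` and `det_by_permutations` are made up here, and `sign` stands in for the value of D on a rearranged list of standard basis vectors, which is +1 or -1 depending on how many swaps are needed to restore the original order.

```python
from itertools import permutations


def sign(perm):
    """Return +1 if perm has an even number of inversions, -1 if odd."""
    inversions = sum(
        1
        for i in range(len(perm))
        for k in range(i + 1, len(perm))
        if perm[i] > perm[k]
    )
    return -1 if inversions % 2 else 1


def det_by_permutations(a):
    """Add up one signed product of n coefficients per permutation of the columns."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        product = 1
        for i in range(n):
            product *= a[i][perm[i]]   # one coefficient from each row
        total += sign(perm) * product
    return total


print(det_by_permutations([[1, 2], [3, 4]]))   # 1*4 - 2*3 = -2
```

Each pass through the outer loop builds exactly one summand: a product of one coefficient per row, multiplied by either -1 or 1.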

In the special case where our matrix has only 2 rows and 2 columns, we could even show

So, in this case we can represent the determinant with matrix multiplication!

Applications of the Determinant

The determinant provides quick hints about the linear independence of a matrix’s rows: the determinant is nonzero exactly when the rows are linearly independent. Its relationship to the trace of a matrix and the inverse of a matrix yields useful shortcuts. It is even the vehicle for defining the characteristic equation, unleashing the wide applications of eigenvalues. Grasping the meaning of the determinant unlocks far-reaching analytical tools.

Katznelson, Yitzhak and Katznelson, Yonatan R.  A (Terse) Introduction to Linear Algebra. 2008.

Muir, Thomas.  A Treatise on the Theory of Determinants. 1882.

    Jun 10    
Empowering Determinant Motivation

Vector is a Latin word that means carrier, and fittingly in mathematics, vectors are defined as mathematical elements that have both magnitude and direction. Hence, a vector conveys information about the direction in which something will be carried as well as the distance over which it will be carried, and this information is central in many academic disciplines, from physics to statistics.

In terms of notation, a vector looks like a string of numbers separated by commas, so a 3-dimensional vector has 3 components, for example: [2,4,6]. A 2-dimensional vector has 2 components, as in: [1,5]. And a 1-dimensional vector has just one component, so we often refer to these vectors as “scalars” since we can change the scale of any n-dimensional vector (making it smaller or larger) when we multiply it by a 1-dimensional vector.

The magnitude (or length) of a vector v can be denoted

||v||


This is also regarded as the “norm” of a vector since we can “normalize” the vector v by dividing each component of the vector by its length ||v||. A normalized vector has a length of 1. So, for any vector v, the new vector

v / ||v||


will have a length equal to 1, and so we can also think of this as the unit vector. Unit vectors always have a length of 1. And so if we think of this vector protruding from the point (0,0) in a coordinate plane, we can see that unit vectors also tell us in what direction the vector v is going. Fittingly, when we multiply the magnitude of a vector by its direction

||v|| * (v / ||v||)

we get the same vector we started with since the norms cancel out. Mathematically then, we can easily see that vectors are composed of both their magnitude and their direction.

Calculating the magnitude of an n-dimensional vector is essentially taking the square root of the inner product <v, v> of the vector with itself. So, for an n-dimensional vector v with components [v1, v2, … , vn], we get the magnitude as

||v|| = sqrt (<v, v>) = sqrt (v1^2 + v2^2 + … + vn^2)


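The distance between two vectors is calculated by taking the square root of the inner product of the difference between two n-dimensional vectors, say v and w, with itself, as in

||v - w|| = sqrt (<v - w, v - w>) = sqrt ((v1 - w1)^2 + (v2 - w2)^2 + … + (vn - wn)^2)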


Intuitively, this is similar to the absolute value of the difference between two numbers, which yields the “distance” between those two numbers.
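The magnitude, unit vector, and distance calculations can be tried out with a short Python sketch. The helper names `inner`, `norm`, `normalize`, and `distance` are assumptions made for this illustration, and `normalize` assumes the vector is not the zero vector.

```python
import math


def inner(v, w):
    """Inner product: multiply matching components and add the results."""
    return sum(x * y for x, y in zip(v, w))


def norm(v):
    """Magnitude of v: square root of the inner product of v with itself."""
    return math.sqrt(inner(v, v))


def normalize(v):
    """Unit vector in the direction of v (assumes v is not the zero vector)."""
    length = norm(v)
    return [x / length for x in v]


def distance(v, w):
    """Distance between v and w: the magnitude of their difference."""
    diff = [x - y for x, y in zip(v, w)]
    return norm(diff)


v = [3.0, 4.0]
print(norm(v))                             # 5.0
print(normalize(v))                        # [0.6, 0.8]
print(distance([1.0, 5.0], [4.0, 1.0]))    # 5.0
```

Multiplying each component of `normalize(v)` by `norm(v)` returns the original vector, which is the “magnitude times direction” idea in code form.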

Now, if we connect these 3 vectors v, w, and v - w, we have a triangle which enables us to dust off (i.e., use) our trigonometry facts…


The Law of Cosines tells us

||v - w||^2 = ||v||^2 + ||w||^2 - 2 * ||v|| * ||w|| * cos (theta)


where theta is the angle opposite v - w. 
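As a quick numerical sanity check, the following Python sketch tests the Law of Cosines in its usual form, ||v - w||^2 = ||v||^2 + ||w||^2 - 2 * ||v|| * ||w|| * cos (theta), on one pair of 2-dimensional vectors. The sample vectors are arbitrary, and measuring theta with `math.atan2` is an assumption of this sketch so that the angle is obtained independently of the formula being tested.

```python
import math


def norm(v):
    """Magnitude of a vector."""
    return math.sqrt(sum(x * x for x in v))


v = [3.0, 4.0]
w = [4.0, 0.0]

# Angle between v and w, measured from each vector's direction in the plane.
theta = math.atan2(v[1], v[0]) - math.atan2(w[1], w[0])

diff = [x - y for x, y in zip(v, w)]     # the third side of the triangle, v - w
lhs = norm(diff) ** 2
rhs = norm(v) ** 2 + norm(w) ** 2 - 2 * norm(v) * norm(w) * math.cos(theta)

print(math.isclose(lhs, rhs))   # True
```

Here both sides come out to 17 (up to floating-point rounding), since v - w = [-1, 4].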

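Rearranging and consolidating terms where possible (using ||v - w||^2 = ||v||^2 + ||w||^2 - 2 * <v, w>, where <v, w> is the inner product of v and w), we see that

cos (theta) = <v, w> / (||v|| * ||w||)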


Projecting Vectors

Say we wanted to essentially cast the shadow of v onto the line through w. We would call this a “projection” of v along w, denoted v* below.


Observe that we now have a right triangle with sides v, v*, and v - v*, which means we also know

cos (theta) = ||v*|| / ||v||


Rewriting this and then including the previous information that we gathered about cosine of theta, we have the magnitude of v*

||v*|| = ||v|| * cos (theta) = <v, w> / ||w||


And so we derive v* by multiplying this magnitude by the direction of w

v* = (<v, w> / ||w||) * (w / ||w||) = (<v, w> / <w, w>) * w


Projections as Linear Transformations

Projections are linear transformations. In other words, if we think of a projection as a mapping T, then by the linearity of T, for any two vectors x and y in the domain and for any two real number scalars a and b we know

T(ax + by) = aT(x) + bT(y)

Projections are also idempotent linear transformations. In other words, by the idempotence of T for any vector x in the domain of T we have

T( x ) = T( T( x ))

Hence, repeated compositions of the mapping T result in the same mapping.
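Both properties can be checked on a small example. The Python sketch below assumes the standard formula for projecting v along a nonzero vector w, namely (<v, w> / <w, w>) * w, where <v, w> is the inner product. The names `inner`, `project`, and `T`, along with the sample vectors and scalars, are illustrative choices rather than anything fixed by the text.

```python
from fractions import Fraction


def inner(v, w):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(v, w))


def project(v, w):
    """Projection of v along w: (<v, w> / <w, w>) * w. Assumes w is nonzero."""
    scale = Fraction(inner(v, w)) / inner(w, w)   # exact rational arithmetic
    return [scale * x for x in w]


w = [2, 1]              # the vector we project along (sample value)
x = [3, 4]
y = [-1, 5]
a, b = 2, -3


def T(v):
    """The projection mapping for this fixed w."""
    return project(v, w)


# Linearity: T(ax + by) = aT(x) + bT(y)
combo = [a * xi + b * yi for xi, yi in zip(x, y)]
assert T(combo) == [a * p + b * q for p, q in zip(T(x), T(y))]

# Idempotence: T(x) = T(T(x))
assert T(T(x)) == T(x)
```

Using `Fraction` instead of floating-point numbers lets the two sides be compared with exact equality.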

Bridging to Direct Sum Decompositions

In early studies of advanced linear algebra, the level of abstraction is raised by merging these definitions with additional concepts such as the direct sum decomposition of a vector space, say U, wherein each vector x in U has a unique representation as a sum of two other vectors (from two respective subspaces, say V and W). That is, for a 2-dimensional vector [3, 4] in U for example, there is only one v in V and only one w in W that will sum to this 2-dimensional vector. To map a vector x = v + w to its component w in W using this direct sum decomposition of U, we in effect replace v with the zero vector, so that x is sent to 0 + w = w. This mapping is regarded then as the projection of x onto W along V. With this concept in mind, it can be shown that an idempotent linear transformation on a vector space (say U) is a projection onto its range in the direction of its kernel (also called the null space). In this case, the direct sum decomposition of U is composed of the range of this mapping (which is considered one vector space) and the kernel of this mapping (which is considered to be another vector space).
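A tiny example may help ground this. In the Python sketch below, the 2 x 2 matrix P is a hypothetical idempotent mapping chosen for illustration (it is not taken from any of the references), and `apply` is an assumed helper for multiplying a matrix by a vector. The vector [3, 4] is split into a piece in the range of P and a piece in the kernel of P.

```python
def apply(matrix, vec):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


# A hypothetical idempotent mapping: applying it twice equals applying it once.
P = [[1, 1],
     [0, 0]]

x = [3, 4]
w = apply(P, x)                           # the piece of x in the range of P
v = [xi - wi for xi, wi in zip(x, w)]     # the leftover piece, x - P(x)

assert apply(P, w) == w                             # w is left fixed by P
assert apply(P, v) == [0, 0]                        # v lies in the kernel of P
assert [vi + wi for vi, wi in zip(v, w)] == x       # x = v + w

print(w, v)   # [7, 0] [-4, 4]
```

So [3, 4] = [-4, 4] + [7, 0], and P sends it to [7, 0]: the kernel piece is dropped and the range piece is kept.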

Abadir, Karim M. and Magnus, Jan R. Econometric Exercises 1: Matrix Algebra. 2005.

Katznelson, Yitzhak and Katznelson, Yonatan R. A (Terse) Introduction to Linear Algebra. 2008.

Lipschutz, Seymour and Lipson, Marc Lars. Schaum’s Outlines: Linear Algebra. 2009.

    May 11    
Project Vector

Elementary analysis (i.e., introductions to theoretical calculus—see last blog post) starts with revelations of the previously “unforeseen” or perhaps “unappreciated” intricacies of the number line. The first conceptual diamond mined for our awe-filled pleasure is:

we cannot rely merely on whole numbers and fractions (also known as integers and rationals) to unleash the full precision offered by real numbers (also known as the reals).

This gem is usually presented by way of the square root of 2. This is the first positive integer whose square root is not merely another integer. So, it seems like a great candidate for further “analysis.” As it turns out, the square root of 2 is not a rational number, but the proof is not as straightforward as you might expect. As a matter of fact, most textbooks present an indirect proof (a proof by contradiction) perhaps for the simplicity. But indirect proofs have a way of hiding the reasons behind the “truth.”

Indirect Proof

To show that the square root of 2 is not rational, we need to identify the properties of the solution to the following equation: x = sqrt (2). And rewriting we have x^2 = 2. Assume to the contrary, that the solution to this equation were in fact a rational number which we denote as ( p/q ) in lowest terms where p and q are both integers (with q different from zero). Then, p/q = sqrt (2). And rewriting we have ( p/q )^2 = 2. Rewriting again we have p^2 = 2q^2. So, p^2 is even since we can write it as a multiple of 2. Because p^2 is even, we also know that p is even. (This can be proved separately, and to exercise your proof writing skill, you should prove it to yourself.) But since p is even, we can write it as 2m where m is some integer. And so p^2 = 4m^2. Hence, we know 4 is a factor of p^2 which means 4 is also a factor of 2q^2 because remember as we first said p^2 = 2q^2. So, 2q^2 can be written as 4n where n is some integer. Putting this together, we see 2q^2 = 4n. And so dividing both sides of this equation by 2, we see q^2 = 2n. And so q^2 is even since we can write it as a multiple of 2, which implies q is even. But p/q is in lowest terms by our initial assumption. And if both p and q are even, p/q could be further reduced and so this violates our initial assumption. Therefore, if the solution to the equation x = sqrt (2) is a rational number in lowest terms, we reach a contradiction. Hence, the solution to this equation cannot be rational. ■

Direct Proof

For the direct proof, we need the assistance of the rational zeroes theorem for polynomials (also called the “rational root theorem”). This theorem says that if the coefficients in a polynomial are integers and if a solution to the polynomial equation (a zero or root) is a rational number (in lowest terms), then the numerator of this rational number divides the constant term and the denominator divides the leading coefficient (i.e., the coefficient on the highest degree term of the polynomial). In other words, the only possible rational number solutions ( p/q ) to the equation

an*x^n + a(n-1)*x^(n-1) + … + a1*x + a0 = 0

are such that p is a factor of a0 and q is a factor of an. Hence, to show that the square root of 2 is not rational, we need to identify the properties of the solution to the following equation: x = sqrt (2) and rewriting we have x^2 = 2, and rewriting this again as x^2 - 2 = 0, we can easily see that we are dealing with a polynomial of degree 2, and a0 = -2 while an = a2 = 1. (If you’re wondering where a1 is, that term is zero, which is why you do not see it.) So, our candidates for p are 2, -2, 1, and -1. Meanwhile, our candidates for q are 1 and -1. Trying all possible combinations of these for p/q, we have 2, -2, 1, and -1. So, if the solution to the equation x^2 - 2 = 0 is a rational number, it MUST be one of these. But after trying each of these, one at a time, we see that none of them satisfies the equation: 2^2 - 2 = 2 and (-2)^2 - 2 = 2, while 1^2 - 2 = -1 and (-1)^2 - 2 = -1. Therefore, the solution is not rational. ■
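The “list the candidates, then try each one” step lends itself to a short Python sketch. The function names `divisors`, `rational_root_candidates`, and `evaluate` are assumptions made for this illustration, and the coefficient list is written with the constant term a0 first.

```python
from fractions import Fraction


def divisors(n):
    """Positive divisors of a nonzero integer n."""
    n = abs(n)
    return [d for d in range(1, n + 1) if n % d == 0]


def rational_root_candidates(coeffs):
    """All p/q with p dividing a0 and q dividing an.

    coeffs lists a0, a1, ..., an (constant term first); a0 and an must be nonzero.
    """
    a0, an = coeffs[0], coeffs[-1]
    candidates = set()
    for p in divisors(a0):
        for q in divisors(an):
            candidates.add(Fraction(p, q))
            candidates.add(Fraction(-p, q))
    return sorted(candidates)


def evaluate(coeffs, x):
    """Value of the polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


coeffs = [-2, 0, 1]                        # x^2 - 2
candidates = rational_root_candidates(coeffs)
roots = [c for c in candidates if evaluate(coeffs, c) == 0]

print(candidates)
print(roots)                               # []
```

For x^2 - 2 the leading coefficient is 1, so every candidate is an integer, and the empty list of roots matches the conclusion of the proof. Swapping in `[-4, 0, 1]` (that is, x^2 - 4) shows the contrast: the same search then finds the rational roots -2 and 2.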

Comparison of the Direct and Indirect Proofs

Here, we have 2 of many possible ways to prove that the square root of 2 is not a rational number. The indirect proof relies on principles of factorization, making this a mere exercise in number theory: we use the known factor of all even numbers, which is 2, to show that if ( p/q ) is in lowest terms we reach an impossible scenario. The direct proof also relies on principles of factorization, but in this proof, we also invoke properties of polynomials: we generate a list of all the possible rational solutions to the polynomial equation, and then by plugging in one at a time, we see that none of them works.

Grasping the indirect proof begs you to acquaint yourself with some general rules from number theory, while grasping the direct proof begs you to understand an important theorem about polynomials. In either case, you are introduced to real analysis, by seeing that the real number line includes irrational numbers—numbers that cannot be written as a ratio of two integers.


Ross, Kenneth A. Elementary Analysis: The Theory of Calculus. 1980.

Rudin, Walter. Principles of Mathematical Analysis. 1953.

    Mar 31    
Gateway to the Reals

The quest to build computational ability with functions involving single and multiple variables across differential and integral calculus is often broken into: Calculus I, Calculus II, and Calculus III. Beyond this famous trilogy lies analysis, a neat euphemism capturing the theoretical underpinnings of calculus. 

Analysis is further distinguished by two large groupings, real analysis and complex analysis. These titles hint at the two sets of numbers involved: real numbers and complex numbers. If you learn analysis, you essentially develop a thorough understanding of why Calculus “works.” 

Whether it’s a graduation requirement or a graduate school prerequisite (implicit or explicit), a student’s first encounter with analysis is rarely a breeze. One of the key reasons analysis differs from the calculus trilogy is that practically all the facts and techniques you need to learn are communicated through proofs. In the calculus trilogy, the proofs are largely skipped over. Analysis revisits all those skipped proofs in gory detail. 

If you only have a little exposure to proofs, a first course in analysis can be uncomfortable (to say the least). To smooth the transition, there are entire courses devoted strictly to teaching proof writing. But some colleges embed introductions to proofs in other courses, such as number theory, linear algebra, or even “advanced” calculus. 

To support the highest chance of success on the first attempt, a first analysis course should be focused on “elementary real analysis,” i.e., the textbook should be a variant of “baby” Rudin (Principles of Mathematical Analysis) for undergraduates and advanced undergraduates, which is not to be confused with variants of “daddy” Rudin (Real and Complex Analysis) for graduate students. But different colleges use different course titles and textbooks, so doing some advance research can help you to have a better sense of what to expect. The relevant Schaum Outline, by the way, is under the heading Advanced Calculus.

    Mar 15    
To and Through Analysis: Code Word for Theoretical Calculus