The absurd premise of basically all types of statistical analysis

Statistics. A rather dirty and often absent term in mainstream mathematics education, from K-12 all the way to post-secondary, and one considered to lie outside the realm of formal mathematics. (Its cousin, probability, has made significant strides towards being accepted as ‘mainstream’, as evidenced, for example, by S. R. Srinivasa Varadhan receiving the Abel Prize in 2007.) This is part of a larger issue known as ‘the dominance of calculus’, which can be discussed separately. The upshot is that statistical methods and mathematical theorems are often applied incorrectly in real-world settings.

To keep things terse, I will discuss just two instances of improperly applied statistical methods by people who really should know better. The first is the insistence on applying the ‘bell curve’ to assigning grades, regardless of class size or the shape of the underlying distribution. To be fair, there is theoretical work supporting the ubiquity of the bell curve (or normal distribution): it is indeed the limiting distribution of the arithmetic mean of a sequence of independent, identically distributed random variables (the Wikipedia page on the central limit theorem gives a reasonable account of the result: http://en.wikipedia.org/wiki/Central_limit_theorem). But the theorem is an asymptotic statement, and here is an example where it remains correct yet gives a very poor description of what actually happens. Suppose an exam, taken by a large number of students, consists of only two questions, each worth half of the marks. The scores from such a class will be trimodal: there will be a cluster of students who answered both questions correctly, another cluster who answered exactly one question correctly, and a final cluster who answered neither. This data set looks nothing like a bell curve, yet even in cases like this teachers will try to force-fit one.
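To make the point concrete, here is a minimal simulation sketch. The per-question success probabilities and class size below are made-up assumptions for illustration, not data from any real class; any probabilities strictly between 0 and 1 give the same qualitative picture.

```python
import random
from collections import Counter

# Hypothetical parameters (assumptions for illustration only).
P_Q1 = 0.7        # chance a student answers question 1 correctly
P_Q2 = 0.6        # chance a student answers question 2 correctly
N_STUDENTS = 1000

random.seed(42)

scores = []
for _ in range(N_STUDENTS):
    # Each question is worth 50 points, so every score is 0, 50, or 100.
    score = 50 * (random.random() < P_Q1) + 50 * (random.random() < P_Q2)
    scores.append(score)

# The distribution concentrates on exactly three values: trimodal,
# not remotely bell-shaped.
for value, count in sorted(Counter(scores).items()):
    print(f"score {value:3d}: {'#' * (count // 20)} ({count})")
```

Rerunning the same sketch with, say, a hundred questions instead of two does produce a bell-shaped histogram, which is precisely the many-summands regime the central limit theorem speaks about; a two-question exam is nowhere near that limit.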

The second occurs in economics. One of the first things taught in every economics textbook is that demand is governed by utility maximization and supply is driven by demand. On a personal level, this basic premise is already difficult to accept. Is it really typical human behavior to reason about demand in the following manner: that you would pay 100 dollars for a pair of skates, but only 90 dollars for a second (identical) pair, and perhaps 80 dollars for a third? This seems counterintuitive, since usually people decide they need something and then find the one option that best meets their needs and price range. It is thus very hard to justify that individual demand ‘curves’ should be continuous, let alone smooth. Finally, even if this somewhat outlandish (yet universally assumed in economics circles) premise is accepted without challenge, it still does not yield the elusive aggregate demand curve that supposedly exists at the macro scale. This follows from a simple result of elementary analysis: the limit of a sequence of continuous functions (in this case, the averages of the demand curves of more and more people) need not be continuous. In fact, such a limit can be discontinuous on a dense set of points.
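As a textbook illustration of that last analytic fact (this is the standard example from real analysis, not tied to any particular economic data), consider:

```latex
% A sequence of continuous functions whose pointwise limit is not continuous.
\[
  f_n(x) = x^n, \qquad x \in [0,1],
\]
\[
  \lim_{n \to \infty} f_n(x) =
  \begin{cases}
    0, & 0 \le x < 1,\\
    1, & x = 1.
  \end{cases}
\]
% Each f_n is continuous on [0,1], yet the limit jumps at x = 1.
% More elaborate constructions (of Thomae type) give pointwise limits
% that are discontinuous at every rational point, i.e. on a dense set.
```

So even granting every individual a perfectly continuous demand curve, nothing in the mathematics forces the aggregated curve to be the smooth object drawn in the textbooks.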

I consider it very important that flaws in the application of statistical methods be exposed and discussed, since such misapplications have been a driving force behind some of the biggest human-caused catastrophes of the last century, for instance the 2008 financial crisis and the decades of economic ideology that led the USA to that point.
