This essay (serialized here across 24 separate posts) uses words and numbers to discuss the uses of words and numbers — particularly examining evaluations of university degrees that employ statistical data to substantiate competing claims. Statistical analyses are crudely introduced as the mode du jour of popular logic, but any ratiocinative technique could likely be inserted in this re-fillable space and applied to create and defend categories of meaning with or without quantitative support. Questions posed across the series include: Is the data informing or affirming what we believe? What are the implications of granting this approach broader authority? The author, Melanie Williams, graduated from UA in 2006, with a B.A. in Anthropology and Religious Studies.
Statistical inference and probability can employ as masochistic a level of computation as the user wishes to pursue, but we can look at some basic principles that will move the conversation along without exceeding the ten-digit limit of nature’s abacus. Statistics, broadly defined, is a branch of the formal sciences that deals with the collection, organization, and analysis of data. Data, for our purposes, may be anything we wish to define as objects of our attention. When you sit on a curb licking a Push Pop and counting blue cars vs. red cars, you are gathering and cataloging statistical data. We often use this data to infer what we may not directly observe – that is, to use our sample to posit a broader statement: “I think there are more red cars than blue cars in this town.” Our conclusion is an example of inductive reasoning, in which bits of information are collected and used to formulate a general proposition. This is where probability comes into play, to gauge the likelihood of our hypothesis holding up in the face of new information – or, just as often, to turn a profit on a more heuristic, casually calculated gamble: “I bet you two Skittles there are more red cars than blue cars.” The confidence you place on such a bet can be assigned a numeric value based on the colors of various cars you have already tallied – kindling, in turn, any spark of compunction in your pals. Probability, in a textbook sense, is an expression of the chance ascribed to a specific event against all possible events within a fixed set of conditions: rolling a die, for example, which has a 1 in 6 chance of landing on any given face.
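The curb-side tally and the textbook die can be sketched in a few lines of Python. The car colors below are an invented sample, not data from the essay; the point is only the shape of the inductive step (sample proportion as an estimate) next to the textbook probability (1 in 6 per face).

```python
from fractions import Fraction

# Hypothetical curb-side tally: the colors of the cars we happened to see.
observed = ["red", "blue", "red", "red", "blue", "red"]

red = observed.count("red")
blue = observed.count("blue")

# Inductive step: treat the sample proportion as an estimate of the
# town-wide share of red cars.
estimated_red_share = red / len(observed)
print(red, blue)  # 4 of the 6 observed cars were red

# Textbook probability: a fair die gives each face a 1-in-6 chance.
p_four = Fraction(1, 6)
print(p_four)
```

The more cars we count, the more weight the estimate can bear; with six cars, it is barely a hunch worth two Skittles.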
Conditional probability – the chance that an event will occur given the occurrence of another event – is a little more complicated, and involves calculating the odds of a particular outcome when a bit of data (a condition) limits the number of total possible outcomes. Back on the curb, with our dice and Push Pop, an example of conditional probability might be: If you roll three dice and the first of them settles on four, what are the odds all three will land on four? You can still perform the calculation on your fingers if you enlist the help of your three closest friends, since the answer will be the product of the known outcome (6/6, or 1) and the individual odds of the specified outcomes (1/6 and 1/6): 1 x 1/6 x 1/6, or 1/36 – that is, the only instance among all possible combinations in which both of the remaining two dice land four-up. This form of deductive reasoning relies on the assumption of a rule (limits or influences, if any, over the outcome) that can be applied to a variety of events (all possible outcomes).
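The fingers-and-friends arithmetic above can be checked by brute enumeration, a minimal sketch rather than anything from the essay: with the first die fixed at four, list every way the two remaining dice can land and count the single favorable case.

```python
from fractions import Fraction
from itertools import product

# The first die has already settled on four (probability 6/6, i.e., 1),
# so only the two dice still in the air matter.
outcomes = list(product(range(1, 7), repeat=2))  # 36 combinations
favorable = [o for o in outcomes if o == (4, 4)]  # both land four-up

p = Fraction(len(favorable), len(outcomes))
print(p)  # 1/36
```

Enumerating outcomes and counting favorable cases is the same deductive move as multiplying 1 x 1/6 x 1/6; the computer simply has more fingers.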
Statistics, in short, deals with collections of data sets in which limitations may not be known, and in which outcomes associated with that data may be uncertain. Probability, by contrast, is used to calculate the odds of any given outcome once we have drawn limits around the possibilities. Thus, an important difference of method between statistical inference and applications of probability can be highlighted – in calculating probability, we assume operational boundaries within which we dole out the odds; in gathering statistical data, we construct notions of limits and trends, but we don’t know in advance how many blue cars vs. red cars we may count. We can use the interplay of (inductive) statistics and (deductive) probability to move forward, leapfrog-like, through an inquiry on any given topic. Statistical data is often used to calculate probability by providing a set of precedents upon which future projections may be based; probability, in turn, can be applied to measure the potential relevance of small statistical samples, such as our red and blue cars. Despite the mathematical precision in these applications, hypotheses subject to them are just as susceptible to the pitfalls of logic as any other form of reasoning, since our starting point is some combination of observation and creative conjecture. A common snag, for example, is the post hoc fallacy: when sequential events are deemed causal rather than coincidental. Returning to our three dice, were you to bet a Skittle that all three dice will land on four, you should do so with the understanding that the condition we have placed on the range of possibilities (at least one die has already landed on four) limits the total possible outcomes, but does not influence the behavior of the other two dice, each of which has an independent 1 in 6 chance of landing on any face.
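That last point – conditioning shrinks the sample space without nudging the dice – can also be verified by enumeration. This sketch (mine, not the essay's) lists all 216 ways three dice can land, keeps only the triples whose first die shows four, and confirms that each remaining die still lands on four exactly one time in six.

```python
from fractions import Fraction
from itertools import product

# All 216 ways three dice can land.
triples = list(product(range(1, 7), repeat=3))

# Condition: the first die has landed on four. This trims the sample
# space from 216 triples to 36, but leaves the other dice untouched.
conditioned = [t for t in triples if t[0] == 4]

p_all_four = Fraction(sum(1 for t in conditioned if t == (4, 4, 4)),
                      len(conditioned))
p_second_four = Fraction(sum(1 for t in conditioned if t[1] == 4),
                         len(conditioned))

print(p_all_four)     # 1/36
print(p_second_four)  # 1/6 – the second die is unmoved by the first
```

The second die's odds come out to 1/6 whether or not we condition on the first – which is exactly why the post hoc bettor's Skittle is at risk.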
This is a good place to insert the leverage of superstition – just ask any baseball player who pees on his hands before the first pitch of 14 straight winning games.
Practical applications of conditional probability can be much more complicated, often surveying diverse arrays of statistical data to explore future outcomes of complex, interconnected processes such as weather or stock market futures. But political elections, as Nate Silver explains, lend themselves neatly to conditional probability, since the number of contenders is public information, reams of statistics in the form of polling data are regularly generated, and the election results will yield a single, definitive winner in each race.
Part 4 coming today at noon…