“Oh, people can come up with statistics to prove anything, Kent. Forfty percent of people know that.” – Homer Simpson, on Smartline
We are surrounded by statistics. Read a few news stories or watch a couple of ads, and there’s a 38% chance you’ll be exposed to at least one statistic. In many ways, our lives are governed by statistics: salary negotiations frequently have the employer citing industry and sector averages; insurance rates are determined by actuaries; politicians play fast and loose with stats to push for new laws and regulations; municipal tax rates often rely on housing averages, and so on. In fact, many years ago H. G. Wells suggested that, “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write”.
We’ve been warned to be wary of statistics (one of the more famous quotes of all time is “There are three kinds of lies: lies, damned lies, and statistics”, popularized by Mark Twain and attributed to Benjamin Disraeli), and while some people have a healthy skepticism of statistics, others have blind faith. In reality, most of us probably fall somewhere in between because, despite the importance and prevalence of statistics, most of us lack the mathematical training to question any of the stats with which we are presented day in and day out. As a consequence, we let them fly into our heads unchallenged.
But fear not (!), for many years ago a tome was written that makes statistical lies recognizable to the general population: How to Lie with Statistics, by Darrell Huff.
Published in 1954, it remains as relevant today as it was in the post-World War II years; after all, politicians, advertisers, employers, and many other folks have the same incentives to deliberately mislead today as they did back then.
After training us on how to produce wonderful statistical lies, Huff evens out the karma by concluding the book with a chapter called “How to Talk Back to a Statistic”, and in it he arms us with some questions that might make our lives easier. In his words, “Not all the statistical information that you may come upon can be tested with the sureness of chemical analysis or of what goes on in an assayer’s laboratory. But you can prod the stuff with five simple questions, and by finding the answers avoid learning a lot that isn’t so.” (p124)
Question #1: Who says so?
“About the first thing to look for is bias – the laboratory with something to prove for the sake of a theory, or a reputation, or a fee; the newspaper whose aim is a good story; labor or management with a wage level at stake.” (p125)
Huff advises us to look out for conscious bias; that is, a bias that is deliberate and manifests in outright deception, selective presentation of information, suppression of inconvenient data, and so on. Importantly, he also warns against unconscious bias, suggesting that “it is often more dangerous” (p125). An individual’s generally rosy outlook might cause them to inadvertently overlook negative data, for instance. Accidental or not, the impact is the same as a conscious decision.
Huff spends quite a bit of space outlining another common tactic, the use of what he calls “the OK name” (p125). By this, he means to watch out for any stat that purportedly comes from an institute with a fancy or recognizable name. But surely data from such an organization is good, you might be thinking. Huff doesn’t disagree. Rather, he warns us to watch out for cases where a stat might originally come from an OK name, but the presentation of the fact twists and turns and deviates from the original study or finding – the result being that the reader attributes to the organization something that was not actually found or intended. In other words, Harvard might have found X, but the newspaper article might have wandered far off course in order to sensationalize; the reader, meanwhile, thinks “Wow, Harvard! I will now suppress all skepticism.”
Question #2: How does he know?
In this section, Huff wants us to learn to ask how data was gathered. Was it a survey? A random sample? A mathematical calculation? He tells us, “Watch out for evidence of a biased sample, one that has been selected improperly…Ask the question: is the sample large enough to permit any reliable conclusion? Similarly with a reported correlation: is it big enough to mean anything?” (p128)
You might be wondering how, as normal people, we can determine if a sample is large enough, or if a correlation is significant. While there’s no perfect mechanism available most of the time, Huff’s confident that we will do alright: “You cannot, as a casual reader, apply tests of significance or come to exact conclusions as to the adequacy of a sample. On a good many of the things you see reported, however, you will be able to tell at a glance – a good long glance, perhaps – that there just weren’t enough cases to convince any reasoning person of anything.” (p128)
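We can’t run significance tests at a glance, but a quick simulation (my own toy illustration, not something from Huff’s book) shows why tiny samples deserve that good long glance. The code repeatedly polls a population whose true rate is exactly 50% and reports the extremes it sees:

```python
import random

# Hypothetical simulation: the true rate of some preference is exactly 50%.
TRUE_RATE = 0.50
TRIALS = 2_000  # how many independent polls we simulate per sample size

def observed_range(n: int) -> tuple[float, float]:
    """Lowest and highest percentage reported across many polls of size n."""
    results = []
    for _ in range(TRIALS):
        hits = sum(random.random() < TRUE_RATE for _ in range(n))
        results.append(100 * hits / n)
    return min(results), max(results)

for n in (10, 100, 1_000):
    lo, hi = observed_range(n)
    print(f"n={n:>5}: reported rates ranged from {lo:.1f}% to {hi:.1f}%")

# Typical output: polls of 10 people can easily report 10% or 90% when the
# truth is 50%; polls of 1,000 rarely stray outside roughly 45-55%.
```

A headline built on one of those ten-person polls could claim almost anything, which is exactly the sort of thing a good long glance should catch.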
Question #3: What’s missing?
In a nutshell, Huff wants us to find out what has been omitted from the presentation of the statistic. While there is no single way of finding out, there are a number of things that are commonly and conveniently forgotten, and all are red flags sufficiently large to cast a shadow of suspicion:
- Information about sample sizes
- When dealing with a correlation, a reliability figure (standard error or probable error)
- When dealing with an average, an explanation of what type: mean, median, or mode (see the sketch after this list)
- Related figures that would make a comparison possible
- When dealing with an index, information about the base
- Percentages without raw figures
- When dealing with an average, something to indicate the distribution
- In any situation, an explanation of the measurement technique and consideration of how that technique might have changed over time (extremely useful when examining statistics around diagnosis rates for diseases over time)
A source with nothing to hide shouldn’t balk at being asked for any of the above.
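To make the “what type of average” item concrete, here’s a small hypothetical example (mine, not Huff’s; the salaries are invented) showing how far the mean, median, and mode can drift apart on skewed data:

```python
from statistics import mean, median, mode

# Hypothetical salaries at a small firm: one executive drags the mean upward.
salaries = [30_000, 32_000, 32_000, 35_000, 38_000, 40_000, 250_000]

print(f"mean:   {mean(salaries):>9,.0f}")    # ~65,286 -- what a press release might quote
print(f"median: {median(salaries):>9,.0f}")  # 35,000  -- what the middle worker earns
print(f"mode:   {mode(salaries):>9,.0f}")    # 32,000  -- the most common salary

# All three are legitimately "the average salary" here, which is exactly why
# Huff insists we ask which average a statistic is using.
```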
Question #4: Did somebody change the subject?
In Huff’s words, “When assaying a statistic, watch out for a switch somewhere between the raw figure and the conclusion. One thing is all too often reported as another.” (p133)
For instance, more reported cases of crime are not the same thing as more incidences of crime. Perhaps the mechanism of reporting was made easier. Consider as well the difference between the amount of money people donate to charity per year and the amount that people say they donate.
Some other common tricks include: comparing apples to oranges, stating cause and effect conclusively when dealing with a correlation, and any claims about being first in some category.
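That correlation-versus-causation switch is easy to demonstrate with a toy simulation (my own, with entirely made-up numbers): two quantities that never influence each other will still correlate strongly when a third factor drives both.

```python
import random

random.seed(42)  # reproducible made-up data

def pearson(xs: list[float], ys: list[float]) -> float:
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# A year of daily temperatures drives both quantities; neither causes the other.
temps = [random.uniform(0, 35) for _ in range(365)]
ice_cream_sales = [2.0 * t + random.gauss(0, 5) for t in temps]
drownings = [0.5 * t + random.gauss(0, 3) for t in temps]

print(f"r = {pearson(ice_cream_sales, drownings):.2f}")  # strongly positive, ~0.8

# The correlation is real, but "ice cream causes drowning" is a changed
# subject: hot weather drives both. The coefficient alone can't tell you that.
```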
Question #5: Does it make sense?
Maybe Huff could have led with this one and saved us a lot of work, as he readily admits that “‘Does it make sense?’ will often cut a statistic down to size when the whole rigamarole is based on an unproved assumption.” (p139)
As powerful as this simple question is, all too many of us fail to ask it. We suspend our disbelief because of an OK name, or we see some incredible precision in a percentage and then assign some untouchable aura to the whole conclusion.
But we should stop being so gullible and polite! As Huff tells us plainly, “Many a statistic is false on its face. It gets by only because the magic of numbers brings about a suspension of common sense.” (p140)
Extreme precision and anything involving extrapolation (shout out to anyone who’s read Confessions of an Economic Hitman) are other red flags billowing in the wind: just stop and ask if it’s reasonable to know something that nebulous to that degree of precision, or to extrapolate a trend (one that might itself be based on questionable data) over a long period.
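And if you want to see how quickly naive extrapolation turns absurd, here’s a hypothetical sketch (invented figures, not from the book): fit a straight line to five years of growth and the “trend” soon promises more than 100% of everything.

```python
# Hypothetical adoption figures: 10% growing to 50% over five years.
years = [2018, 2019, 2020, 2021, 2022]
pct = [10, 20, 30, 40, 50]

# Ordinary least-squares fit of a straight line: pct = slope * year + intercept.
n = len(years)
mean_x = sum(years) / n
mean_y = sum(pct) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, pct))
         / sum((x - mean_x) ** 2 for x in years))
intercept = mean_y - slope * mean_x

for year in (2023, 2028, 2040):
    print(f"{year}: predicted {slope * year + intercept:.0f}% adoption")

# Output: 60%, 110%, 230% -- more than everyone. The trend line knows nothing
# about the ceiling it is about to crash into.
```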
Final Remarks
And there we have it – five simple questions to ask, in place of statistical calculations, that will help you challenge all the advertisers, politicians, employers, think-tanks, and scienticians out there who are trying to pull the wool over our eyes with the magic of numbers. Seriously, try applying those for even just a day or two, each time you’re presented with a stat, and you’ll likely increase your level of skepticism and be the better for it.
But hey, if five questions are too many to remember, then you can just heed this advice:
“Round numbers are always false.” – Samuel Johnson