“The fact is that, despite its mathematical base, statistics is as much an art as it is a science. A great many manipulations and even distortions are possible within the bounds of propriety. Often the statistician must choose among methods, a subjective process, and find the one that he will use to represent the facts. This suggests giving statistical material, the facts and figures in newspapers and books, and magazines and advertising, a very sharp second look before accepting any of them. Sometimes a careful squint will sharpen the focus.” (How to Lie with Statistics – p122 and p123)
Title: How to Lie with Statistics
Author: Darrell Huff
Publication Date: 1954
Origin: If I remember correctly, I found out about How to Lie with Statistics when I was purchasing How to Lie with Maps online: the “you liked this so you might like that” engine suggested it. As it happens, I have quite a bit of experience in providing the world with statistical tidbits (if you’ve ever heard that Netflix is one-third of Internet traffic, then you’re welcome), so I was intrigued.
Summary: Early on, Huff warns us that “The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify. Statistical terms and statistical methods are necessary in reporting the mass data of social and economic trends, business conditions, ‘opinion’ polls, the census. But without writers who use the words with honesty and understanding and readers who know what they mean, the result can only be semantic nonsense.” (p10).
While the book is presented as an entertaining “how to” guide, Huff is really arming us, the readers, with the skills and awareness needed to see through the many questionable stats with which we are bombarded. In Huff’s words, “The crooks already know these tricks; honest men must learn them in self-defense.” (p11) These tricks include biased samples, selective averages, omitting pertinent details, deceptive graphing and imagery, comparing unlike things, misrepresenting correlation as causation, overstating correlations, and others.
He concludes with a chapter that explains how to systematically challenge any statistic.
My Take: In full disclosure, I’m kind of a nerd: I’ve read the entire xkcd catalogue, and frequently pass links to friends and reference different strips with colleagues (I’ve been told that I have an xkcd for every situation); one of my favourite Dilbert strips involves stats; I subscribe to Vi Hart’s YouTube channel; and I have many books on math and the great mathematicians.
Additionally, I’ve quite a bit of mathematical training, and I actually quite like math…
Finally, as I alluded to earlier, and have referenced in other posts, I’ve actually authored quite a few comprehensive reports that are full of statistics. Indeed, those reports have spawned the very same sensationalized articles that Huff warns against. In those reports, I (and the author who has now taken them over) took great pains to provide complete information, explain our methodology, and only publish real findings – in our profession, it was crucial that our reports stood up to intense scrutiny (joyfully provided by international engineering organizations, government commissions, large corporations, etc.).
So, to put it lightly, I was excited to read this book. And I was not disappointed!
Huff’s writing is funny, the examples are illustrative, and the illustrations are exemplary. I’m happy to say that I’ve never knowingly made any of the mistakes (or told any of the deliberate lies) that he warns against. My only disappointment was the lack of a chapter or discussion on how easy it is to deliberately create skewed surveys (like the kind you might get from your elected representatives, in which three or four of the five possible answers are positive).
Plus, despite my experience with the subject, Huff opened my eyes to a few other bits of trickery for which I will now stay vigilant.
Read This Book If: You don’t want to be manipulated by our stat-crazy media, politicians, employers, etc. Even if you don’t aspire to critically examine every stat with which you are presented, simply seeing how easily we can be fooled (or at least seeing the tricks that are used to fool us) might develop a healthy skepticism that will cause you to think twice.
Notes and Quotes:
“The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify. Statistical terms and statistical methods are necessary in reporting the mass data of social and economic trends, business conditions, ‘opinion’ polls, the census. But without writers who use the words with honesty and understanding and readers who know what they mean, the result can only be semantic nonsense.” (p10)
- p11 hedges his karma: “The crooks already know these tricks; honest men must learn them in self-defense.”
- p24…so basically never trust, entirely, a poll: “The operation of a poll comes down in the end to a running battle against sources of bias…What the reader of the reports must remember is that the battle is never won.”
- p28 on why the pollster might have behaved in an upstanding manner and still produce a biased result: “It is not necessary that a poll be rigged – that is, that the results be deliberately twisted in order to create a false impression. The tendency of the sample to be biased in this consistent direction can rig it automatically.”
- p35-37 demonstrate through use of selective averages why we should not trust salary figures
- p48 argues a bit for disclosure of your interests: “It is dangerous to mention any subject having high emotional content without hastily saying whether you are for or agin it.”
- p65 shows how a simple manipulation of a graph’s Y-axis is equivalent to using sensationalized language: “It is a subtler equivalent of editing ‘National income rose ten percent’ into ‘…climbed a whopping ten percent.’ It is vastly more effective, however, because it contains no adjectives or adverbs to spoil the illusion of objectivity. There’s nothing anyone can pin on you.”
- p71, explaining the goal of any statistical misrepresentation while specifically introducing the one-dimensional picture: “I want you to infer something, to come away with an exaggerated impression, but I don’t want to be caught at my tricks.”
- p72, on what happens when you scale an image in two or three directions even though it is only representing a single dimension: “To say ‘almost one and one-half’ and to be heard as ‘three’ – that’s what the one-dimensional picture can accomplish.”
- p73: “Some of this may be no more than sloppy draftsmanship. But it is rather like being short-changed: When all the mistakes are in the cashier’s favour, you can’t help wondering.”
- p76, explaining the approach of the semi-attached figure: “If you can’t prove what you want to prove, demonstrate something else and pretend that they are the same thing. In the daze that follows the collision of statistics with the human mind, hardly anybody will notice the difference.”
- p86 provides a nice example of a this-isn’t-quite-that number: “It is an interesting fact that the death rate or number of deaths often is a better measure of the incidence of an ailment than direct incidence figures – simply because the quality of reporting and record-keeping is so much higher on fatalities.”
- p91: “…when there are many reasonable explanations you are hardly entitled to pick one that suits your taste and insist on it. But many people do.”
- p93: “Another thing to watch out for is a conclusion in which a correlation has been inferred to continue beyond the data with which it has been demonstrated.”
- p100, on why we should all be familiar with the common mistake (or deliberate misdirection), “post hoc ergo propter hoc”: “Permitting statistical treatment and the hypnotic presence of numbers and decimal points to befog causal relationships is little better than superstition. And it is often more seriously misleading.”
- p103 highlights something that is no surprise to readers of this site: “One of the trickiest ways to misrepresent statistical data is by means of a map.”
- This excerpt from p117 reminded me of a discussion I had with a reporter years ago, who seemed not to know the difference: “Another fertile field for being fooled lies in the confusion between percentage and percentage points. If your profits should climb from three percent on investment one year to six percent the next, you can make it sound quite modest by calling it a rise of three percentage points. With equal validity you can describe it as a one hundred percent increase. For loose handling of this confusing pair watch particularly the public-opinion pollers.”
- p117 gives us an important proviso about using percentiles, for which we can thank the characteristics of the normal/Gaussian distribution: “The odd thing about percentiles is that a student with a 99-percentile rating is probably quite a bit superior to one standing at 90, while those at the 40 and 60 percentiles may be of almost equal achievement.”
- p118: “You could take your choice of conclusions. Or, perhaps better, you could easily see that neither element could properly be singled out as the guilty one. It is sometimes a substantial service simply to point out that a subject in controversy is not as open-and-shut as it has been made to seem.”
- p122: “The fact is that, despite its mathematical base, statistics is as much an art as it is a science. A great many manipulations and even distortions are possible within the bounds of propriety. Often the statistician must choose among methods, a subjective process, and find the one that he will use to represent the facts.”
“In commercial practice he is about as unlikely to select an unfavorable (statistical) method as a copywriter is to call his sponsor’s product flimsy and cheap when he might as well say light and economical.” (p122)
- p123: “This suggests giving statistical material, the facts and figures in newspapers and books, and magazines and advertising, a very sharp second look before accepting any of them. Sometimes a careful squint will sharpen the focus. But arbitrarily rejecting statistical methods makes no sense, either. That is like refusing to read because writers sometimes use words to hide facts and relationships rather than to reveal them.”
- p124+ is summarized in How to smell and challenge a statistical rat
“Many a statistic is false on its face. It gets by only because the magic of numbers brings about a suspension of common sense.” (p140)
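A few of the tricks in the notes above come down to simple arithmetic, so here is a minimal Python sketch of four of them: selective averages, percent versus percentage points, the one-dimensional picture, and the percentile proviso. This is my own illustration, not anything taken from the book. The payroll figures are made up, the function names are hypothetical, and the percentile comparison assumes that scores follow a normal distribution.

```python
from statistics import NormalDist, mean, median

# Hypothetical payroll (invented for illustration): one owner, nine employees.
salaries = [250_000, 50_000, 40_000, 30_000, 30_000,
            25_000, 25_000, 20_000, 20_000, 10_000]


def percent_vs_points(old_pct, new_pct):
    """Return (change in percentage points, relative change in percent)."""
    points = new_pct - old_pct
    relative = (new_pct - old_pct) / old_pct * 100
    return points, relative


def scaled_impression(height_ratio):
    """Scaling a picture in every direction by height_ratio multiplies
    the drawn area by its square and the implied volume by its cube."""
    return height_ratio ** 2, height_ratio ** 3


def percentile_gap(low, high):
    """Gap, in standard deviations, between two percentiles,
    assuming normally distributed scores."""
    z = NormalDist().inv_cdf
    return z(high / 100) - z(low / 100)


if __name__ == "__main__":
    # Selective averages: both numbers are a legitimate "average" salary.
    print("mean:", mean(salaries), "median:", median(salaries))

    # Three percent to six percent: three points, or a 100 percent increase.
    print("points, relative %:", percent_vs_points(3, 6))

    # A figure drawn 1.5 times as tall, and scaled in every direction.
    print("area, volume ratios:", scaled_impression(1.5))

    # 90th vs 99th percentile compared with 40th vs 60th.
    print("top gap:", percentile_gap(90, 99),
          "middle gap:", percentile_gap(40, 60))
```

With this invented payroll the mean is 50,000 while the median is 27,500, so whoever picks the "average" picks the story. Under the normality assumption, the nine-percentile gap at the top of the scale also works out larger than the twenty-percentile gap in the middle.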