Monday, October 17, 2022

My Notes from "Calling Bullshit: The Art of Skepticism in a Data-Driven World"

Calling Bullshit: The Art of Skepticism in a Data-Driven World by Carl T. Bergstrom
My rating: 5 of 5 stars

A good review, for scientists and engineers, of how numbers, statistics, and charts are currently used to mislead readers. It is probably news to the liberal arts culture (see C. P. Snow's "The Two Cultures").

The final chapter "Refuting Bullshit" is the best chapter of the book. Read this chapter first if you are in a hurry. Then fill any gaps in understanding by reviewing the previous chapters and sections as needed.

Page 232 says, "A single study doesn't tell you much about what the world is like. It has little value unless you know the rest of the literature and have a sense about how to integrate these findings with previous ones. Researchers weigh the evidence across multiple studies and try to understand why multiple studies often produce seemingly inconsistent results."

They quote Jonathan Swift, "Falsehood flies, and truth comes limping after it."


In chapter 9 the authors explain a principle called the "base rate fallacy" in connection with the statistical concept of the "p-value."

They use several examples. One is a suspect whose fingerprint matches a fingerprint on file with the police. The reported odds of such a match happening by chance are one in 10 million. But the probability that the suspect is guilty depends on how many other people also have a matching fingerprint. In a database of 50 million, a one-in-10-million match rate means that about 5 people would be expected to match.

Therefore the odds of the person being guilty are roughly one in five, not the near certainty that a one-in-10-million figure seems to imply!

When the suspect comes up with a matching fingerprint, the probability must be evaluated against all the others who also have a matching fingerprint.
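
The arithmetic behind the fingerprint example can be sketched in a few lines of Python. This is only a rough illustration, not the book's own calculation. The function name is made up for this note, and the sketch assumes that the one-in-10-million figure is the chance that a random person matches, and that every matching person in the database is equally likely to be the source of the print.

```python
def expected_chance_matches(database_size, one_in):
    """Expected number of database entries that match purely by chance.

    `one_in` is the denominator of the match odds (10_000_000 means 1 in 10 million).
    """
    return database_size / one_in


# Figures from the example above.
database_size = 50_000_000
one_in = 10_000_000

matches = expected_chance_matches(database_size, one_in)

# Assumption: the suspect is just one of the matching people, with no other evidence.
chance_suspect_is_source = 1 / matches

print(f"Expected matches in the database: {matches:.0f}")
print(f"Chance the suspect is the source: about 1 in {matches:.0f}")
```

Any additional evidence (motive, location, and so on) would change the result. The point is only that the one-in-10-million figure by itself does not give the probability of guilt.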

They provide more examples: Lyme disease testing, testing for ESP with playing cards, and why so many "proven" scientific results cannot be reproduced by other scientists. If you test positive for Lyme disease, the probability that you actually have it must be judged against the whole population of positive results. How many false positives for Lyme exist? At the time of this book, it was a surprisingly large percentage.
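
The same reasoning for a medical test can be sketched with Bayes' rule. The numbers below are hypothetical and chosen only for illustration. They are NOT the book's figures and NOT real Lyme disease statistics. The function and parameter names are likewise my own.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Probability of having the disease given a positive test (Bayes' rule)."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical inputs (assumptions, not data from the book):
# 1% of those tested have the disease, and the test is 95% sensitive and 95% specific.
ppv = positive_predictive_value(prevalence=0.01, sensitivity=0.95, specificity=0.95)

print(f"Chance that a positive result is a true positive: {ppv:.0%}")
```

With these made-up inputs, only about 16% of positive results are true positives, even though the test sounds "95% accurate." The rarer the condition, the more the false positives dominate.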

My takeaway is that the "p-value" error may be like finding theories to fit the data, which is about as bad as selecting data to fit the theory. For theories based on research into huge amounts of data, data can always be found to support many spurious correlations.
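
One way to see why spurious findings are so easy to come by is to ask how likely at least one "significant" result is when many hypotheses are tested and none of them is actually true. The sketch below is my own illustration, not something from the book. It assumes the tests are independent and uses the conventional 0.05 cutoff, and the function name is made up.

```python
def chance_of_spurious_hit(num_tests, alpha=0.05):
    """Probability that at least one of `num_tests` independent tests
    gives p < alpha when there is no real effect in any of them."""
    return 1 - (1 - alpha) ** num_tests


for n in (1, 20, 100):
    print(f"{n:>3} tests: {chance_of_spurious_hit(n):.2f}")
```

Under these assumptions the chance of at least one false "discovery" rises from 0.05 with a single test to about 0.64 with 20 tests and about 0.99 with 100.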

[Please note that they say a "matching" fingerprint, as distinct from the "same" fingerprint.]