Monday, September 10, 2012

Statistical significance vs. significance

Alright, it's time to start plowing through some of my unfinished drafts here, in no particular order. I've been sitting on this one for a while, and it follows in the theme of this post and this post, both of which discussed the questionable validity of study results. From the Freakonomics blog... 
A new paper by psychologists E.J. Masicampo and David Lalande finds that an uncanny number of psychology findings just barely qualify as statistically significant.  From the abstract:
We examined a large subset of papers from three highly regarded journals. Distributions of p were found to be similar across the different journals. Moreover, p values were much more common immediately below .05 than would be expected based on the number of p values occurring in other ranges. This prevalence of p values just below the arbitrary criterion for significance was observed in all three journals.
Alright, yeah, I know that's a little stat-wonky/jargony, but the basic point is that a large number of published psychology studies that report "significant" results are in fact barely scraping past the statistical threshold.
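
If you want to see what that kind of pile-up would look like, here's a minimal sketch in Python. The p-values below are made up purely for illustration (they are not the data from the Masicampo and Lalande paper); the idea is just to bin a list of reported p-values and flag the bin sitting immediately under .05.

```python
# Toy illustration of checking a collection of p-values for an excess
# just below the .05 cutoff. The "reported" p-values here are simulated,
# not real data.
import numpy as np

rng = np.random.default_rng(0)

# A smooth background spread of p-values, plus an artificial extra pile
# crammed in just under .05.
reported_p = np.concatenate([
    rng.uniform(0.001, 0.10, size=500),   # background distribution
    rng.uniform(0.045, 0.050, size=60),   # suspicious bump below .05
])

bins = np.arange(0.0, 0.105, 0.005)       # 0.005-wide bins from 0 to 0.10
counts, _ = np.histogram(reported_p, bins=bins)

for lo, n in zip(bins[:-1], counts):
    flag = "  <-- just under .05" if np.isclose(lo, 0.045) else ""
    print(f"p in [{lo:.3f}, {lo + 0.005:.3f}): {n}{flag}")
```

Run that and the 0.045-0.050 bin stands out from its neighbors, which is essentially the pattern the paper reports across three journals.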

In any statistical study, the "goal" is to show a result that is too extreme to be plausibly explained by random chance alone. A "p-value" of .05 means that, if there were actually no real effect, there would be only a 5% chance of seeing a result at least this extreme purely by luck of the draw: unlikely, but hardly impossible. What we're seeing here is that a large number of "statistically significant" studies are scraping by in the little window just on the "right" side of that .05 cutoff. Hence, there's a pretty decent chance that some of those studies are reporting as significant something that is really just dumb luck; after all, if there were truly nothing to find, roughly one test in twenty would still clear that bar by chance alone.
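
If that "one in twenty by chance" idea feels abstract, here's a quick simulation sketch. It assumes the simplest possible setup, two groups drawn from the exact same distribution (so there is genuinely no effect to find), and counts how often an ordinary t-test still comes back "significant" at the .05 level.

```python
# Simulate many experiments where there is no real effect and count how
# often a t-test nevertheless reports p < .05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_experiments = 10_000
false_positives = 0

for _ in range(n_experiments):
    group_a = rng.normal(loc=0.0, scale=1.0, size=30)  # same distribution
    group_b = rng.normal(loc=0.0, scale=1.0, size=30)  # for both groups
    _, p = stats.ttest_ind(group_a, group_b)
    if p < 0.05:
        false_positives += 1

# Expect roughly 5% "significant" results despite there being nothing to find.
print(f"{false_positives / n_experiments:.1%} of null experiments hit p < .05")
```

The printed share lands right around 5%, which is exactly what the .05 threshold promises: one false alarm in twenty when nothing is really there.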


Now, I don't really want to go too far down a road talking about bell curves and standard deviations on normal distributions, so I won't. But the heart of the matter is this: the incentives to report a "statistically significant" result are typically pretty strong, and so we should take a lot of the study results that we read (you know, stuff like "Coffee causes cancer! Also, it prevents cancer and cures cancer, but only when taken in specific doses at pre-determined times over several decades! So drink coffee, and also, don't drink coffee!") with an enormous grain of salt.

A lot of the time, what we're reading is just statistical noise and random chance with a catchy headline attached. So please, people, don't fall prey to those who want to confuse us with numbers; they're seriously everywhere these days, especially in an election year. Know the statistical background, and you'll be better able to judge for yourself whether a study result is actually significant, or just statistically significant.

[Freakonomics]
