A recent paper in the prestigious journal Science entitled “Analytic Thinking Promotes Religious Disbelief” by cognitive psychologists Will Gervais and Ara Norenzayan (2012) claimed that subtle prompts designed to elicit analytic thinking cause subjects to reduce their belief in religion. In the paper, the investigators report the results of four experiments. In each experiment, subjects were randomized either to a prompt designed to elicit analytical thinking or to a neutral, control prompt. After being exposed to the prompt, subjects completed a questionnaire that measured their degree of religious belief.
The four experiments differed mainly in the type of prompts used. In the first experiment, subjects were exposed to either a picture of Rodin’s The Thinker (the analytic prompt) or Myron’s Discobolus (the control prompt; Figure 1). In the second experiment, subjects performed a sentence-forming task that involved either words suggestive of analytic thinking (e.g., analyze, reason, ponder) or control words. The third experiment used the same set of prompts as the second, but scored subjects’ degree of religious belief using a different scale. In the final experiment, the prompt was the questionnaire about religious belief itself. Subjects rated their religious belief using a questionnaire printed in either a difficult-to-read font or a normal font, the hypothesis being that the difficult-to-read font prompts analytic thinking.

Figure 1. Left: The Thinker. Right: Discobolus.
Table 1 shows the results of the experiments. In all four experiments, the null hypothesis was rejected by a small margin—a p-value of either .03 or .04—and the authors thus concluded that analytical thinking promotes religious disbelief.
| Experiment | Condition | N | Mean | SD | t | P-value | Effect size |
|---|---|---|---|---|---|---|---|
| 1. Art | Control | 31 | 61.55 | 35.68 | 2.24 | .03 | .59 |
| | Analytic | 26 | 41.42 | 31.47 | | | |
| 2. Word task | Control | 43 | 12.65 | 5.29 | 2.11 | .04 | .44 |
| | Analytic | 50 | 10.12 | 6.13 | | | |
| 3. Word task | Control | 75 | 40.16 | 16.73 | 2.20 | .03 | .36 |
| | Analytic | 70 | 34.39 | 14.77 | | | |
| 4. Font | Control | 88 | 12.16 | 5.99 | 2.06 | .04 | .31 |
| | Analytic | 91 | 10.40 | 5.44 | | | |
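The t statistics in Table 1 can be checked from the reported group sizes, means, and standard deviations alone. The sketch below assumes an ordinary pooled-variance two-sample t-test (the source does not state which test was used) and also computes an uncorrected standardized mean difference; the names `TABLE_1` and `pooled_t_and_d` are illustrative only.

```python
from math import sqrt

# Summary statistics copied from Table 1:
# (n_control, mean_control, sd_control, n_analytic, mean_analytic, sd_analytic)
TABLE_1 = {
    "1. Art": (31, 61.55, 35.68, 26, 41.42, 31.47),
    "2. Word task": (43, 12.65, 5.29, 50, 10.12, 6.13),
    "3. Word task": (75, 40.16, 16.73, 70, 34.39, 14.77),
    "4. Font": (88, 12.16, 5.99, 91, 10.40, 5.44),
}


def pooled_t_and_d(n1, m1, s1, n2, m2, s2):
    """Return (t, d) for two independent groups, assuming equal variances."""
    df = n1 + n2 - 2
    # Pooled standard deviation across the two groups
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    d = (m1 - m2) / pooled_sd          # standardized mean difference
    t = d / sqrt(1 / n1 + 1 / n2)      # pooled-variance t statistic
    return t, d


recomputed = {name: pooled_t_and_d(*stats) for name, stats in TABLE_1.items()}
for name, (t, d) in recomputed.items():
    print(f"{name}: t = {t:.2f}, d = {d:.2f}")
```

Under this assumption the recomputed t values should agree with the table to rounding. The uncorrected d can differ from the Effect size column in the last digit, which would be consistent with the table reporting a small-sample-corrected estimate, although the source does not say which estimator it used.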
When multiple independent experiments each reject the same null hypothesis, we normally think of this as strong evidence against the null hypothesis. However, this is only true if the experiments have sufficient statistical power to detect the observed effect. If they do not, and they reject the null hypothesis anyway, then the results suggest some form of systematic error leading to inflated Type I error rates, such as publication bias, selective reporting of results, previewing the results and stopping data collection when significance is attained, or improper handling of outliers. Furthermore, having four experiments out of four reject the null hypothesis by a small margin is itself unusual, as it suggests that the investigators knew almost exactly how many subjects would be needed for each experiment to attain statistical significance. Finally, the hypothesis itself is unintuitive. Do we really think that looking at a picture of The Thinker or reading a difficult font would significantly induce disbelief in religion? Even if the hypothesis is correct, the magnitude of the effect in some of the experiments seems implausible. In the first experiment, for instance, viewing the analytic artwork reduced subjects’ average scores from 62 out of a possible 100 points to just 41.
Because of these suspicions, I decided to test the results in Gervais and Norenzayan (2012) for inflated significance using the test derived by Ioannidis and Trikalinos (2007). The test proceeds by computing the power of each experiment to detect the common effect estimated jointly by the studies. The expected number of significant studies is then computed and compared to the observed number of significant studies using a chi-squared or exact binomial test. Since the test is biased against finding inflated significance, the result is usually considered significant if p < .10 (Ioannidis and Trikalinos 2007).
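In symbols, the comparison can be written as follows. The notation is an assumption introduced here for illustration and does not appear in the original text: $n$ is the number of studies, $1-\beta_i$ is the power of study $i$ to detect the pooled effect, $O$ is the observed number of significant studies, and $E$ is the expected number.

$$
E = \sum_{i=1}^{n} (1-\beta_i), \qquad
A = \frac{(O-E)^2}{E} + \frac{(O-E)^2}{n-E}
$$

Under the hypothesis of no excess significance, $A$ is referred to a chi-squared distribution with one degree of freedom. When $n$ is small, the chi-squared approximation is unreliable and the exact probability of observing at least $O$ significant studies, given the individual power values, is used instead.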
Results
A test for homogeneity of the results of the four experiments indicates essentially perfect homogeneity (I² = 0%). Therefore, we are justified in computing a common, pooled effect size. The pooled effect size (Hedges and Olkin 1985) for the four experiments is g* = .3823. Table 2 shows the power of each experiment to detect the pooled effect size.
| Experiment | Power to detect pooled effect size |
|---|---|
| 1 | .292 |
| 2 | .444 |
| 3 | .627 |
| 4 | .720 |
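The homogeneity check, pooled effect size, and power values above can be approximated from the Table 1 summary statistics. The sketch below rests on several assumptions not stated in the source: a fixed-effect, inverse-variance pooling of Hedges' g with the usual small-sample correction factor, a common large-sample variance formula for g, a two-sided alpha of .05, and a normal approximation to power. An exact calculation would use the noncentral t distribution, so the approximate power values can be expected to differ slightly from Table 2. All function and variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# (n_control, mean_control, sd_control, n_analytic, mean_analytic, sd_analytic)
STUDIES = [
    (31, 61.55, 35.68, 26, 41.42, 31.47),
    (43, 12.65, 5.29, 50, 10.12, 6.13),
    (75, 40.16, 16.73, 70, 34.39, 14.77),
    (88, 12.16, 5.99, 91, 10.40, 5.44),
]


def hedges_g(n1, m1, s1, n2, m2, s2):
    """Return (g, var_g): bias-corrected standardized mean difference."""
    df = n1 + n2 - 2
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    d = (m1 - m2) / pooled_sd
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    var_d = (n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2))  # assumed formula
    return j * d, j**2 * var_d


effects = [hedges_g(*study) for study in STUDIES]
weights = [1 / var for _, var in effects]

# Fixed-effect (inverse-variance weighted) pooled estimate
g_pooled = sum(w * g for w, (g, _) in zip(weights, effects)) / sum(weights)

# Cochran's Q and the I-squared heterogeneity index (in percent)
q = sum(w * (g - g_pooled) ** 2 for w, (g, _) in zip(weights, effects))
df_q = len(STUDIES) - 1
i_squared = max(0.0, (q - df_q) / q) * 100 if q > 0 else 0.0


def approx_power(effect, n1, n2, alpha=0.05):
    """Normal approximation to the power of a two-sided two-sample test."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ncp = effect * sqrt(n1 * n2 / (n1 + n2))  # approximate noncentrality
    return NormalDist().cdf(ncp - z_crit) + NormalDist().cdf(-ncp - z_crit)


powers = [approx_power(g_pooled, s[0], s[3]) for s in STUDIES]

print(f"pooled g = {g_pooled:.4f}, Q = {q:.3f}, I^2 = {i_squared:.0f}%")
for i, p in enumerate(powers, start=1):
    print(f"experiment {i}: approximate power = {p:.3f}")
```

Because Q falls below its degrees of freedom here, I² is truncated to zero, which matches the homogeneity result reported above.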
The expected number of significant studies E is the sum of the power values in Table 2: E = 2.08. Since the total number of studies is small, we use an exact binomial test to obtain the probability p that all four studies would reject the null hypothesis. This is simply the product of the power values: p = .0586. This is less than the recommended level of significance (.10). Therefore, we conclude that there is evidence of inflated significance (due to publication bias, selective reporting, etc.) in the reported results. This may mean that the data do not support the study’s conclusions or that the reported effect size is exaggerated. We should therefore be skeptical of the study’s findings.
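The arithmetic in this step can be reproduced directly from the rounded power values in Table 2. The sketch below also includes a small helper, `prob_at_least` (a hypothetical name), that gives the exact probability of observing at least a given number of significant studies when the power values differ across studies; with four significant results out of four it reduces to the product of the power values. Because the inputs are rounded to three decimals, the product may differ from the reported p in the last digit.

```python
from math import prod

# Power of each experiment to detect the pooled effect size (Table 2)
POWERS = [0.292, 0.444, 0.627, 0.720]
OBSERVED_SIGNIFICANT = 4  # all four experiments rejected the null


def prob_at_least(k, powers):
    """Exact P(at least k of the studies are significant), unequal powers."""
    # dist[j] = probability that exactly j of the studies seen so far are significant
    dist = [1.0]
    for p in powers:
        nxt = [0.0] * (len(dist) + 1)
        for j, mass in enumerate(dist):
            nxt[j] += mass * (1 - p)   # this study is not significant
            nxt[j + 1] += mass * p     # this study is significant
        dist = nxt
    return sum(dist[k:])


expected_significant = sum(POWERS)
p_all_significant = prod(POWERS)
p_exact = prob_at_least(OBSERVED_SIGNIFICANT, POWERS)

print(f"E = {expected_significant:.2f}")
print(f"P(all four significant) = {p_all_significant:.4f}")
print(f"below the .10 threshold: {p_exact < 0.10}")
```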
References
Gervais W. M. and A. Norenzayan (2012). Analytic thinking promotes religious disbelief. Science 336:493–96.
Francis G. (2012). The same old New Look: Publication bias in a study of wishful seeing. i-Perception 3:176–78.
Francis G. (2012). Too good to be true: Publication bias in two prominent studies from experimental psychology. Psychonomic Bulletin & Review 19:151–56.
Francis G. (in press). Publication bias in “Red, Rank, and Romance in Women Viewing Men” by Elliot et al. (2010). Journal of Experimental Psychology: General.
Hedges L. V. and I. Olkin (1985). Statistical methods for meta-analysis. New York, NY: Academic Press.
Ioannidis J. P. A. and T. A. Trikalinos (2007). An exploratory test for an excess of significant findings. Clinical Trials 4:245–53.