# Evidence of Publication Bias in “Analytic Thinking Promotes Religious Disbelief”

A recent paper in the prestigious journal Science entitled “Analytic Thinking Promotes Religious Disbelief,” by cognitive psychologists Will Gervais and Ara Norenzayan (2012), claimed that subtle prompts designed to elicit analytic thinking cause subjects to reduce their religious belief. In the paper, the investigators report the results of four experiments. In each experiment, subjects were randomized either to a prompt designed to elicit analytic thinking or to a neutral control prompt. After being exposed to the prompt, subjects completed a questionnaire that measured their degree of religious belief.

The four experiments differed mainly in the type of prompt used. In the first experiment, subjects were exposed to either a picture of Rodin’s The Thinker (the analytic prompt) or Myron’s Discobolus (Figure 1). In the second experiment, subjects performed a sentence-forming task that involved either words suggestive of analytic thinking (e.g., analyze, reason, ponder) or control words. The third experiment used the same set of prompts as the second but scored subjects’ degree of religious belief using a different scale. In the final experiment, the prompt was the questionnaire about religious belief itself: subjects rated their religious belief using a questionnaire printed in either a difficult-to-read font or a normal font, the hypothesis being that the difficult-to-read font prompts analytic thinking.

Table 1 shows the results of the experiments. In all four experiments, the null hypothesis was rejected by a small margin—a p-value of either .03 or .04—and the authors thus concluded that analytical thinking promotes religious disbelief.

Table 1. Results of the four reported experimental studies
(Gervais & Norenzayan 2012)
| Experiment | Condition | N | Mean | SD | t | P-value | Effect Size |
|------------|-----------|----|-------|-------|------|---------|-------------|
| 1. Art | Control | 31 | 61.55 | 35.68 | 2.24 | .03 | .59 |
| | Analytic | 26 | 41.42 | 31.47 | | | |
| 2. Word task | Control | 43 | 12.65 | 5.29 | 2.11 | .04 | .44 |
| | Analytic | 50 | 10.12 | 6.13 | | | |
| 3. Word task | Control | 75 | 40.16 | 16.73 | 2.20 | .03 | .36 |
| | Analytic | 70 | 34.39 | 14.77 | | | |
| 4. Font | Control | 88 | 12.16 | 5.99 | 2.06 | .04 | .31 |
| | Analytic | 91 | 10.40 | 5.44 | | | |

When multiple independent experiments each reject the same null hypothesis, we normally regard this as strong evidence against the null hypothesis. However, this is only true if the experiments have sufficient statistical power to detect the observed effect. If they do not, and they reject the null hypothesis anyway, then the results suggest some form of systematic error leading to inflated Type I error rates: publication bias, selective reporting of results, monitoring the results and stopping data collection once significance is attained, improper handling of outliers, and so on. Furthermore, having four experiments out of four reject the null hypothesis by a small margin is itself unusual, as it suggests that the investigators knew almost exactly how many subjects each experiment would need to attain statistical significance. Finally, the hypothesis itself is unintuitive. Do we really believe that looking at a picture of The Thinker or reading a difficult font would significantly induce disbelief in religion? Even if the hypothesis is correct, the magnitude of the effect in some of the experiments seems implausible. In the first experiment, for instance, viewing the analytic artwork reduced subjects’ average scores from 62 out of a possible 100 points to just 41.

Because of these suspicions, I decided to test the results in Gervais and Norenzayan (2012) for inflated significance using the test derived by Ioannidis and Trikalinos (2007). The test proceeds by computing the power of each experiment to detect the common effect estimated jointly by the studies. The expected number of significant studies is then computed and compared with the observed number of significant studies using a chi-squared or exact binomial test. Since the test is biased against finding inflated significance, it is usually considered significant if p < .10 (Ioannidis and Trikalinos 2007).

## Results

A test for homogeneity of the results of the four experiments indicates essentially perfect homogeneity (I² = 0%). Therefore, we are justified in computing a common, pooled effect size. The pooled effect size (Hedges and Olkin 1985) for the four experiments is g* = .3823. Table 2 shows the power of each experiment to detect the pooled effect size.

Table 2. Power to detect the pooled effect size
| Experiment | Power to detect pooled effect size |
|------------|------------------------------------|
| 1 | .292 |
| 2 | .444 |
| 3 | .627 |
| 4 | .720 |

The expected number of significant studies E is the sum of the power values in Table 2: E = 2.08. Since the total number of studies is small, we use an exact binomial test to obtain the probability p that all four studies would reject the null hypothesis. This is simply the product of the power values: p = .0586. This is less than the recommended level of significance (.10). Therefore, we conclude that there is evidence of inflated significance (due to publication bias, selective reporting, etc.) in the reported results. This may mean that the data do not support the study’s conclusions or that the reported effect size is exaggerated. We should therefore be skeptical of the study’s findings.
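As a rough check, the power calculations and the probability of four-for-four significance can be sketched in a few lines of Python. This sketch uses a normal approximation to the power of a two-sided, two-sample t-test at alpha = .05, so its values differ slightly from the exact noncentral-t powers in Table 2; the pooled effect size and sample sizes are taken from Tables 1 and 2.

```python
import math

def norm_cdf(x):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

g = 0.3823  # pooled effect size (Hedges' g) across the four experiments
samples = [(31, 26), (43, 50), (75, 70), (88, 91)]  # (control N, analytic N)

# Approximate power of each test via the normal approximation:
# power ~ Phi(noncentrality - 1.96).
powers = []
for n1, n2 in samples:
    ncp = g * math.sqrt(n1 * n2 / (n1 + n2))  # noncentrality parameter
    powers.append(norm_cdf(ncp - 1.96))

expected_significant = sum(powers)  # E, roughly 2.1
p_all_four = math.prod(powers)      # P(all four reject the null)
print([round(p, 3) for p in powers], round(expected_significant, 2),
      round(p_all_four, 4))
```

The approximate powers land within about .01 of the exact values in Table 2, and the product comes out near .06, in line with the exact result of .0586.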

## References

Gervais W. M. and A. Norenzayan (2012). Analytic thinking promotes religious disbelief. Science 336:493–96.

Francis G. (2012). The same old New Look: Publication bias in a study of wishful seeing. i-Perception 3:176–78.

Francis G. (2012). Too good to be true: Publication bias in two prominent studies from experimental psychology. Psychonomic Bulletin & Review 19:151–56.

Francis G. (in press). Publication bias in “Red, Rank, and Romance in Women Viewing Men” by Elliot et al. (2010). Journal of Experimental Psychology: General.

Hedges L. V. and I. Olkin (1985). Statistical methods for meta-analysis. New York, NY: Academic Press.

Ioannidis J. P. A. and T. A. Trikalinos (2007). An exploratory test for an excess of significant findings. Clinical Trials 4:245–53.

# How to Lose Weight to Improve Your Climbing

The bottom line for losing weight is that you must create a caloric deficit. That means you must consume fewer calories than you burn. You can do this in two ways: reduce the number of calories in your diet or increase the amount of exercise you do. The most effective way is to do both simultaneously.

## How to reduce your caloric intake

In principle, you could reduce your total caloric intake in one of two ways: by eating the same foods you currently do and reducing portion sizes, or by reducing your intake of certain macronutrients (i.e., fat, carbohydrate, or protein). However, one of the challenges of dieting is that when your body senses it is receiving fewer calories than it is burning, it responds by breaking down both body fat and muscle. For an athlete (and if you are a rock climber, you should start thinking of yourself as an athlete), this is disastrous, because losing muscle means losing strength. Since we want to increase our strength-to-weight ratio, we want to maintain our muscle mass and lose weight in the form of body fat. The way to accomplish this through diet is to maintain carbohydrate intake, increase protein intake, and reduce fat intake enough to produce a caloric deficit.

We want to increase protein intake because the additional protein offsets the body’s increased rate of muscle breakdown while dieting. The reason it is important to maintain high carbohydrate intake is that the higher the carbohydrate intake, the less muscle tissue is broken down for energy (that is, dietary carbohydrate is muscle sparing). Dietary fat, on the other hand, is not muscle sparing; consequently, your entire reduction in calorie intake should come from reducing your intake of fats.

Let’s assume you are doing aerobic exercise for a half-hour 3 days a week and climbing indoors or out 3 sessions per week. (If you are not getting at least this much exercise, you should start. You will find it much easier to lose weight by a combination of diet and exercise than by just dieting.) The average female at this level of exercise will probably require about 2000 calories/day to maintain her body weight, while the average male will require about 2500 calories. You should try to consume about 500 to 750 calories per day less than you burn. This should result in losing 1–1½ pounds per week. This may seem too slow to some; however, more drastic diets do not work—they are virtually impossible to maintain.

OK, so now you have an idea of how many total calories to eat each day. The next question is how these calories should be distributed among protein, carbohydrate, and fat. My recommendations are the following: 25%–30% of the total calories in your diet should come from protein, 10%–20% from fat, and the remainder from carbohydrate. This is a low-fat diet that is relatively high in both protein and carbohydrate, as required to promote retention of muscle tissue. To put this diet into practice, you need to become savvy at reading nutrition labels and know that protein and carbohydrate contain 4 calories per gram and that fat contains 9 calories per gram (for those who need to know, it’s 7 calories per gram for alcohol).
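To make the bookkeeping concrete, here is a minimal Python sketch that converts a daily calorie target into grams of each macronutrient. The default percentages are simply midpoints of the recommended ranges above, and the function name is mine:

```python
def macro_grams(calories, protein_pct=0.275, fat_pct=0.15):
    """Split a daily calorie target into grams of protein, fat, and
    carbohydrate, using 4 kcal/g for protein and carbohydrate and
    9 kcal/g for fat."""
    carb_pct = 1.0 - protein_pct - fat_pct
    return {
        "protein_g": round(calories * protein_pct / 4),
        "fat_g": round(calories * fat_pct / 9),
        "carb_g": round(calories * carb_pct / 4),
    }

# A climber maintaining on ~2000 kcal/day, eating at a 500-kcal deficit:
print(macro_grams(2000 - 500))
```

For a 1500-calorie day this works out to roughly 103 g of protein, 25 g of fat, and 216 g of carbohydrate, which gives you concrete daily targets to check your food diary against.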

## So, what to eat

The challenge in this diet is keeping the protein intake high and the fat intake low. Therefore, you need to look for foods that are very low in fat and high in protein. Ideal foods include beans, white-meat poultry, low-fat fish (e.g., halibut), canned tuna, and soy-based non-fat mock meats (hint: think Trader Joe’s). You can eat essentially unlimited vegetables, since they are very low in calories. Fruits are essentially all carbohydrate and water, are low in total calories, and can (and should) be eaten in moderation. Any grain products you eat should be whole grain, since whole grains are higher in protein, fiber, and micronutrients than their processed counterparts.

Try not to add fat to anything. Throw away your mayonnaise, margarine, butter, and cooking oils. Pure oils such as these contain 120 calories per tablespoon. It is all too easy to turn a healthy, low-calorie salad into an abomination by adding excessive dressing. Instead of mayonnaise on sandwiches, substitute mustard (which is virtually calorie free), or just go without.

Keep a diary of everything you eat. Specifically note the total calories you consume and the total grams of protein in each meal. If, at the end of the day, you didn’t consume enough protein, have a blended shake made from a protein supplement and a piece of fruit in the evening. Buy the cheap soy-protein powder. Let the muscle heads waste their money on designer whey peptides.

I realize that this diet is more quantitative than some people would like. However, in my judgment, counting calories and protein grams is the only way to ensure adequate protein intake while maintaining a low-calorie diet. This is critical for athletes.

# A Brief Introduction to the Tuesday Birthday Problem

If a man has two children, and one of them is a son who was born on a Tuesday, what is the probability that his other child is also a son?

The question posed above is known as the Tuesday Birthday Problem, and in this post I will attempt to briefly explain its solution. The main purpose of this post is to propose the Tuesday Birthday Problem as a topic for an article for the skeptical website The Odds Must Be Crazy. If they like the idea, I’ll write a full article on the problem for their website.

Our intuition, of course, is that the day of the week that the son was born on is superfluous information, and therefore, since the probability of the sex of one child is independent of the sex of the other child, the probability that the second child is a son must be 1/2. As obvious as that answer seems, it is wrong. To see why, let’s list all the possible ways, taking the day of the week into account, that a person can have two children, one of whom is a son born on a Tuesday. First we’ll list all the possibilities in which the older child is a son born on a Tuesday; then we’ll list all the possibilities in which the younger child is a son born on a Tuesday.

Here is the first list (older child is a son born on a Tuesday):

| Older Child’s Sex | Older Child’s Birth Day | Younger Child’s Sex | Younger Child’s Birth Day |
|---|---|---|---|
| Boy | Tue | Boy | Mon |
| Boy | Tue | Boy | Tue |
| Boy | Tue | Boy | Wed |
| Boy | Tue | Boy | Thu |
| Boy | Tue | Boy | Fri |
| Boy | Tue | Boy | Sat |
| Boy | Tue | Boy | Sun |
| Boy | Tue | Girl | Mon |
| Boy | Tue | Girl | Tue |
| Boy | Tue | Girl | Wed |
| Boy | Tue | Girl | Thu |
| Boy | Tue | Girl | Fri |
| Boy | Tue | Girl | Sat |
| Boy | Tue | Girl | Sun |

If you count the rows in the above table, you’ll see that there are 14 possibilities total, in 7 of which the younger child is a son and in 7 of which the younger child is a daughter.

Now here’s the other list (younger child is a son born on a Tuesday):

| Older Child’s Sex | Older Child’s Birth Day | Younger Child’s Sex | Younger Child’s Birth Day |
|---|---|---|---|
| Boy | Mon | Boy | Tue |
| Boy | Tue | Boy | Tue |
| Boy | Wed | Boy | Tue |
| Boy | Thu | Boy | Tue |
| Boy | Fri | Boy | Tue |
| Boy | Sat | Boy | Tue |
| Boy | Sun | Boy | Tue |
| Girl | Mon | Boy | Tue |
| Girl | Tue | Boy | Tue |
| Girl | Wed | Boy | Tue |
| Girl | Thu | Boy | Tue |
| Girl | Fri | Boy | Tue |
| Girl | Sat | Boy | Tue |
| Girl | Sun | Boy | Tue |

Again we see 14 possibilities, in 7 of which the older child is a son and in 7 of which the older child is a daughter. So, if we add up all the possibilities in the two tables, we have a total of 28 possibilities in 14 of which both children are sons, giving a probability of 1/2, right? Wrong! Looking closely at the two tables, we can see a problem: the second row of each table is the same, indicating that we’ve double-counted one possibility—the case of two sons each born on a Tuesday. So there aren’t actually 28 distinct possibilities, but only 27. Here is a correct table, listing each possibility only once:

| Older Child’s Sex | Older Child’s Birth Day | Younger Child’s Sex | Younger Child’s Birth Day |
|---|---|---|---|
| Boy | Tue | Boy | Mon |
| Boy | Tue | Boy | Tue |
| Boy | Tue | Boy | Wed |
| Boy | Tue | Boy | Thu |
| Boy | Tue | Boy | Fri |
| Boy | Tue | Boy | Sat |
| Boy | Tue | Boy | Sun |
| Boy | Tue | Girl | Mon |
| Boy | Tue | Girl | Tue |
| Boy | Tue | Girl | Wed |
| Boy | Tue | Girl | Thu |
| Boy | Tue | Girl | Fri |
| Boy | Tue | Girl | Sat |
| Boy | Tue | Girl | Sun |
| Boy | Mon | Boy | Tue |
| Boy | Wed | Boy | Tue |
| Boy | Thu | Boy | Tue |
| Boy | Fri | Boy | Tue |
| Boy | Sat | Boy | Tue |
| Boy | Sun | Boy | Tue |
| Girl | Mon | Boy | Tue |
| Girl | Tue | Boy | Tue |
| Girl | Wed | Boy | Tue |
| Girl | Thu | Boy | Tue |
| Girl | Fri | Boy | Tue |
| Girl | Sat | Boy | Tue |
| Girl | Sun | Boy | Tue |

Counting the rows confirms that there are only 27 unique (and equally likely) possibilities. In 13 of these, both children are sons. Therefore, the probability that the man has two sons, given that he has one son born on a Tuesday, is 13/27 ≈ .481.
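The count can also be verified by brute-force enumeration. Here is a short Python sketch that enumerates all equally likely (sex, weekday) combinations for two children and applies the condition directly:

```python
from itertools import product

sexes = ("boy", "girl")
days = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

# All 196 equally likely families:
# (older sex, older day, younger sex, younger day)
families = list(product(sexes, days, sexes, days))

# Condition: at least one child is a boy born on a Tuesday.
qualifying = [f for f in families
              if ("boy", "Tue") in ((f[0], f[1]), (f[2], f[3]))]

both_boys = [f for f in qualifying if f[0] == "boy" and f[2] == "boy"]
print(len(both_boys), "of", len(qualifying))  # 13 of 27
```

The enumeration recovers the same 27 qualifying possibilities and the same 13 two-son cases as the table above.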

# Calculating the Bayes Factor from a Routinely Reported Statistic: The Hazard Ratio

Most published research in epidemiology and related biomedical disciplines (as well as the social sciences) reports hypothesis test results in the form of p-values, which quantify the probability that an observed effect (or a more extreme one) would have occurred if the null hypothesis were true. However, p-values, and null hypothesis significance tests in general, have been widely criticized. First, a “significant” p-value is often misinterpreted as meaning that the null hypothesis is false, or at least probably false. Second, small p-values overstate the strength of the evidence that the data provide against the null hypothesis. Third, although a small p-value indicates that the data are unlikely under the null hypothesis, the data may be even less likely under the alternative hypothesis. In this situation, known as the Jeffreys-Lindley paradox, the null hypothesis will be wrongly rejected. Fourth, and most importantly, p-values don’t tell us what we really want to know: the probability that a hypothesis is true in light of the data.

In contrast to a classical hypothesis test, which results in a p-value, the result of a Bayesian hypothesis test is a Bayes factor: the ratio of the probability of the observed data under the null hypothesis to the probability of the data under the alternative hypothesis. The Bayes factor, denoted $$BF_{01}$$, is defined as follows:

BF_{01} = \frac{P(D|H_0)}{P(D|H_1)} .
\label{eq:bf01}

The Bayes factor quantifies how much more (or less) the data favor the null hypothesis compared with the alternative hypothesis. A Bayes factor greater than 1 indicates that the data favor the null hypothesis over the alternative; a Bayes factor less than 1 means the opposite. Additionally, the Bayes factor multiplied by our prior odds in favor of the null hypothesis equals our posterior odds in favor of the null hypothesis, a relation known as the odds form of Bayes’ Theorem:

\frac{P(H_0|D)}{P(H_1|D)} = BF_{01} \times \frac{P(H_0)}{P(H_1)} .
\label{bayes_thm}
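As a toy illustration of the odds form (the numbers here are hypothetical and the helper name is mine), a small function can convert a Bayes factor and a prior probability of the null into a posterior probability:

```python
def posterior_prob_null(bf01, prior_null=0.5):
    """Posterior probability of H0, given BF01 and a prior probability
    of H0, via the odds form of Bayes' Theorem."""
    prior_odds = prior_null / (1.0 - prior_null)
    posterior_odds = bf01 * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# With 50/50 prior odds, a Bayes factor of 3 in favor of the null
# yields a posterior probability of H0 of 0.75.
print(posterior_prob_null(3.0))
```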

## The Cox Proportional Hazards Model

In epidemiology, we are often interested in assessing the effect that an exposure has on the risk rate for an event, such as death or the onset of a disease, after controlling for a number of potential confounding exposures. Epidemiologists frequently employ Cox proportional hazards regression for this purpose. We can write the Cox proportional hazards model as follows:

\lambda(t|x,\mathbf{z}) = \lambda_0(t) \exp(\beta x +
\boldsymbol{\gamma}'\mathbf{z}) \,,
\label{cox_model}

where $$\lambda(t|x,\mathbf{z})$$ is the risk rate, or hazard rate, (given $$x$$ and $$\mathbf{z}$$) at time $$t$$ in the study, $$\lambda_0(t)$$ is the baseline hazard rate at time $$t$$, $$\beta$$ is the effect on the hazard rate of $$x$$ the exposure of interest, and $$\boldsymbol{\gamma}$$ is a vector of regression coefficients for $$\mathbf{z}$$ the vector of potential confounding effects that the model controls for. (Note that the effects of $$x$$ and $$\mathbf{z}$$ are constant with respect to $$t$$.)

Typically, we wish to estimate the hazard ratio—the hazard rate at one level of the exposure $$x_1$$ versus another level $$x_2$$, controlling for (i.e., holding constant) the other covariates in the model. From Eq. (\ref{cox_model}), we see that the hazard ratio $$HR$$ is

\begin{align}
HR &= \frac{\lambda(t|x_1,\mathbf{z})}{\lambda(t|x_2,\mathbf{z})} \\
&= \frac{\lambda_0(t) \exp(\beta x_1 +
\boldsymbol{\gamma}'\mathbf{z})}
{\lambda_0(t) \exp(\beta x_2 + \boldsymbol{\gamma}'\mathbf{z})} \\
&= e^{\beta(x_1 - x_2)} = e^{\beta\,\Delta x} .
\label{HRx}
\end{align}

For a 1-unit difference in the exposure, we have simply

HR = e^{\,\beta} .
\label{HR}

## Calculating the Bayes Factor from the Reported Hazard Ratio

The Bayesian Information Criterion (BIC) is a large-sample approximation to 2 times the logarithm of the Bayes factor. Unlike the Bayes factor itself, the BIC can be calculated without numerical integration or the specification of a prior distribution on the alternative hypothesis. Rather, it can be calculated from statistics that are routinely reported in journal articles. Furthermore, the BIC implicitly places a unit-information prior on the alternative hypothesis, which has statistical properties desirable in a default prior (Kass and Wasserman 1995; Rouder et al. 2009). The BIC approximation to the Bayes factor is good for sample sizes as small as 25 or 30 (Kass and Wasserman 1995; Rouder et al. 2009, Figure 4); for sample sizes on the order of those in epidemiology (1,000–1,000,000), it is essentially perfect.

We wish to calculate the Bayes factor for the null hypothesis $$H_0$$ that the exposure $$x$$ has no effect on the hazard rate, versus the alternative hypothesis $$H_1$$ that the exposure does affect the hazard rate. The null hypothesis implies that in Eq. (\ref{HR}) $$HR = 1$$, and thus $$\beta = 0$$. Therefore, we can restate our null and alternative hypotheses in terms of $$\beta$$:

H_0\!: \beta = 0 \quad \text{versus} \quad H_1\!: \beta \ne 0 \,.
\label{H01}

The BIC corresponding to a test of the above hypotheses is

BIC = -2[\ell_1(\hat{\beta}_1) - \ell_0(\hat{\beta}_0)] + \log(n) \,,
\label{BIC}

where $$\ell_1(\hat{\beta}_1)$$ and $$\ell_0(\hat{\beta}_0)$$ are the maximum log-likelihoods under $$H_1$$ and $$H_0$$, respectively, and $$n$$ is the number of uncensored events (e.g., the number of observed deaths) in the study population (Volinsky and Raftery 1999).

Once we have BIC, $$BF_{01}$$ is simply

BF_{01} = \exp\left(\frac{BIC}{2}\right) \,.
\label{bf_bic}

The maximum log-likelihoods $$\ell_1(\hat{\beta}_1)$$ and $$\ell_0(\hat{\beta}_0)$$ in Eq. (\ref{BIC}) are not ordinarily reported in published papers; however, the quantity $$2[\ell_1(\hat{\beta}_1) - \ell_0(\hat{\beta}_0)]$$ can readily be computed from data in the paper, because, for large $$n$$,

2[\ell_1(\hat{\beta}_1) - \ell_0(\hat{\beta}_0)] \approx \left[
\frac{\hat{\beta}}{\text{se}(\hat{\beta})}\right]^2
\label{z}

(for a proof, see Zhang 2005), and we can compute $$\hat{\beta}$$ and its standard error $$\text{se}(\hat{\beta})$$ from the reported hazard ratio $$\widehat{HR}$$ and its confidence interval, respectively. From Eq. (\ref{HR}),

\hat{\beta} = \log (\widehat{HR}) \,.
\label{beta}

To calculate $$\text{se}(\hat{\beta})$$, we first need to calculate the confidence interval for $$\hat{\beta}$$. To do this, we apply Eq. (\ref{beta}) to the reported confidence limits of the hazard ratio. If $$\widehat{HR}_L$$ and $$\widehat{HR}_U$$ are the lower and upper confidence limits of $$\widehat{HR}$$, then $$\hat{\beta}_L$$ and $$\hat{\beta}_U$$, the lower and upper confidence limits of $$\hat{\beta}$$, are

\hat{\beta}_L = \log(\widehat{HR}_L) \qquad \hat{\beta}_U = \log(\widehat{HR}_U) \,.
\label{CLs}

Since $$\hat{\beta}$$ has an approximate large-sample normal distribution, the width of its $$(1-\alpha)\times100\%$$ confidence interval is $$2z_{1-\alpha/2} \text{se}(\hat{\beta})$$, where $$z_{1-\alpha/2}$$ is the $$1-\frac{\alpha}{2}$$ quantile of the standard normal distribution. Therefore,

\text{se}(\hat{\beta}) = \frac{\hat{\beta}_U - \hat{\beta}_L}{2z_{1-\alpha/2}} .
\label{width}

For the ubiquitous $$95\%$$ confidence interval, $$z_{1-\alpha/2}=1.96$$.

Combining Eqs. (\ref{BIC}), (\ref{bf_bic}), and (\ref{z}), we have

BF_{01} \approx \exp\,\left\{\frac{1}{2}\left(- \left[
\frac{\hat{\beta}}{\text{se}(\hat{\beta})}\right]^2 +
\log(n)\right)\,\right\} \,,
\label{final}

where $$\hat{\beta}$$ is computed from Eq. (\ref{beta}) and $$\text{se}(\hat{\beta})$$ is computed from Eqs. (\ref{CLs}) and (\ref{width}).

## Example: Red Meat and Mortality

Pan et al. (2012) investigated the effect of red meat intake on total mortality in a combined cohort of 121,342 adults enrolled in either the Nurses’ Health Study or the Health Professionals Follow-up Study. Intakes of processed and unprocessed red meat, as well as numerous dietary and non-dietary potentially confounding variables, were assessed for each participant at baseline and every four years thereafter using a detailed, validated food-frequency questionnaire. After 2,958,146 person-years of follow-up, the investigators observed 23,926 deaths in the combined cohort. Using Cox proportional hazards regression, they found a highly statistically significant relationship between the amount of unprocessed red meat consumed and the risk of death: the hazard ratio for a 1-serving-per-day increment in red meat intake was 1.13 [95% CI (1.07, 1.20); p-value < .001]. We wish to calculate the Bayes factor for $$H_0\!\!:HR=1$$ vs. $$H_1\!\!:HR\ne1$$.

From Eq. (\ref{beta}),

\hat{\beta} = \log(1.13) = .1222\,.
\label{ex:beta}

From Eq. (\ref{CLs}),

\hat{\beta}_L = \log(1.07) = .06766 \qquad \hat{\beta}_U = \log(1.20) = .18232\,.
\label{ex:CLs}

And thus from Eq. (\ref{width}),

\text{se}(\hat{\beta}) = \frac{.18232 - .06766}{2(1.96)} = .02925\,.
\label{ex:width}

And therefore from Eq. (\ref{final}),

\begin{align}
BF_{01} &\approx \exp\left\{\frac{1}{2} \left(-
\left[\frac{.1222}{.02925}\right]^2 + \log(23,\!926)\right)\right\}
\nonumber \\
&=.025\,.
\label{ex:final}
\end{align}

Thus the data favor the alternative hypothesis over the null hypothesis by a factor of 40 (1/.025). Although this is strong evidence against the null hypothesis, notice that it is not nearly as strong as the reported p-value (p<.001) seems to imply.
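The whole calculation is easy to script. Here is a minimal Python sketch of the procedure above (the function name is mine), applied to the Pan et al. (2012) numbers:

```python
import math

def bf01_from_hazard_ratio(hr, ci_lower, ci_upper, n_events, z=1.96):
    """Approximate BF01 from a reported hazard ratio, its confidence
    interval, and the number of uncensored events, using the BIC
    approximation. z is the normal quantile matching the interval
    (1.96 for a 95% CI)."""
    beta = math.log(hr)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    bic = -(beta / se) ** 2 + math.log(n_events)
    return math.exp(bic / 2)

# Pan et al. (2012): HR = 1.13, 95% CI (1.07, 1.20), 23,926 deaths.
bf01 = bf01_from_hazard_ratio(1.13, 1.07, 1.20, 23926)
print(round(bf01, 3), round(1 / bf01))  # ~ .025 and 40
```

This reproduces the hand calculation: BF01 of about .025, i.e., the data favor the alternative by a factor of about 40.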

Of course, we should bear in mind that the foregoing calculations do not take into account potential sources of systematic error, such as error in exposure measurement, residual confounding, etc., that could have affected the results reported by Pan et al (2012). Our posterior probability of the null hypothesis should be tempered by consideration of these factors.

## References

Kass, R. E and L. Wasserman (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association 90(431):928–34.

Pan A., Q. Sun, A. M. Bernstein, et al. (2012). Red meat consumption and mortality: results from 2 prospective cohort studies. Archives of Internal Medicine. Published online March 12, 2012.

Rouder, J. N., P. L. Speckman, D. Sun, R. D. Morey, and G. Iverson (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review 16(2):225–37.

Volinsky, C. T. and A. E. Raftery (1999). Bayesian information criterion for censored survival models. Technical Report 349, Department of Statistics, University of Washington. (http://www.stat.washington.edu/tech.reports/tr349.ps).

Zhang D. (2005). Chapter 7 in ST 745. Analysis of Survival Data. Lecture Notes. Department of Statistics, North Carolina State University. (http://www4.stat.ncsu.edu/%7Edzhang2/st745/chap7.ps).