# Calculating the Bayes Factor from a Routinely Reported Statistic: The Hazard Ratio

Most published research in epidemiology and related biomedical disciplines (as well as the social sciences) reports hypothesis test results in the form of p-values, which quantify the probability that an observed effect (or a more extreme one) would have occurred if the null hypothesis were true. However, p-values, and null hypothesis significance tests in general, have been widely criticized. First, a “significant” p-value is often misinterpreted as meaning that the null hypothesis is false, or at least probably false. Second, small p-values overstate the strength of the evidence that the data provide against the null hypothesis. Third, although a small p-value indicates that the data are unlikely under the null hypothesis, the data may be even less likely under the alternative hypothesis; in this situation, known as the Jeffreys-Lindley paradox, the null hypothesis will be wrongly rejected. Fourth, and most importantly, p-values don’t tell us what we really want to know: the probability that a hypothesis is true in light of the data.

In contrast to a classical hypothesis test, which results in a p-value, the result of a Bayesian hypothesis test is a Bayes factor: the ratio of the probability of the observed data under the null hypothesis to the probability of the data under the alternative hypothesis. The Bayes factor, denoted $$BF_{01}$$, is defined as follows:

BF_{01} = \frac{P(D|H_0)}{P(D|H_1)} .
\label{eq:bf01}

The Bayes factor quantifies how much more (or less) the data favor the null hypothesis compared with the alternative hypothesis. A Bayes factor greater than 1 indicates that the data favor the null hypothesis over the alternative; a Bayes factor less than 1 means the opposite. Additionally, the Bayes factor multiplied by our prior odds in favor of the null hypothesis equals our posterior odds in favor of the null hypothesis, a relation known as the odds form of Bayes’ Theorem:

\frac{P(H_0|D)}{P(H_1|D)} = BF_{01} \times \frac{P(H_0)}{P(H_1)} .
\label{bayes_thm}
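To make the odds form concrete, here is a minimal Python sketch (the function name is ours, for illustration) that converts a prior probability for $$H_0$$ into prior odds, applies the Bayes factor, and converts the result back to a posterior probability:

```python
def posterior_prob_h0(bf01, prior_prob_h0):
    """Posterior P(H0 | D) via the odds form of Bayes' Theorem."""
    prior_odds = prior_prob_h0 / (1 - prior_prob_h0)  # P(H0) / P(H1)
    posterior_odds = bf01 * prior_odds                # odds form of Bayes' Theorem
    return posterior_odds / (1 + posterior_odds)      # odds back to probability

posterior_prob_h0(3.0, 0.5)  # a BF01 of 3 turns even prior odds into P(H0|D) = 0.75
```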

## The Cox Proportional Hazards Model

In epidemiology, we are often interested in assessing the effect that an exposure has on the risk rate for an event, such as death or the onset of a disease, after controlling for a number of potential confounding exposures. Epidemiologists frequently employ Cox proportional hazards regression for this purpose. We can write the Cox proportional hazards model as follows:

\lambda(t|x,\mathbf{z}) = \lambda_0(t) \exp(\beta x +
\boldsymbol{\gamma}'\mathbf{z}) \,,
\label{cox_model}

where $$\lambda(t|x,\mathbf{z})$$ is the risk rate, or hazard rate, at time $$t$$ in the study (given $$x$$ and $$\mathbf{z}$$), $$\lambda_0(t)$$ is the baseline hazard rate at time $$t$$, $$\beta$$ is the effect on the hazard rate of the exposure of interest $$x$$, and $$\boldsymbol{\gamma}$$ is a vector of regression coefficients for $$\mathbf{z}$$, the vector of potential confounders that the model controls for. (Note that the effects of $$x$$ and $$\mathbf{z}$$ are constant with respect to $$t$$.)

Typically, we wish to estimate the hazard ratio—the hazard rate for one level of the exposure $$x_1$$ versus another level $$x_2$$, controlling for (ie, holding constant) the other covariates in the model. From Eq. (\ref{cox_model}), we see that the hazard ratio $$HR$$ is

\begin{align}
HR &= \frac{\lambda(t|x_1,\mathbf{z})}{\lambda(t|x_2,\mathbf{z})} \\
&= \frac{\lambda_0(t) \exp(\beta x_1 +
\boldsymbol{\gamma}'\mathbf{z})}
{\lambda_0(t) \exp(\beta x_2 + \boldsymbol{\gamma}'\mathbf{z})} \\
&= e\,^{\beta(x_1 - x_2)} = e\,^{\beta(\Delta x)} .
\label{HRx}
\end{align}

For a 1-unit difference in the exposure, we have simply

HR = e^{\,\beta} .
\label{HR}

## Calculating the Bayes Factor from the Reported Hazard Ratio

The Bayesian Information Criterion (BIC) is a large-sample approximation to 2 times the logarithm of the Bayes factor. Unlike the Bayes factor itself, calculation of the BIC does not require numerical integration or the specification of a prior distribution on the alternative hypothesis. Rather, it can be calculated from statistics that are routinely reported in journal articles. Furthermore, the BIC implies that the prior distribution on the alternative hypothesis is the unit-information prior, which has statistical properties desirable in a default prior (Kass and Wasserman 1995; Rouder et al 2009). The BIC approximation to the Bayes factor is good for sample sizes as small as 25 or 30 (Kass and Wasserman 1995; Rouder et al 2009, Figure 4); for sample sizes on the order of those in epidemiology (1000–1,000,000), it is essentially perfect.

We wish to calculate the Bayes factor for the null hypothesis $$H_0$$ that the exposure $$x$$ has no effect on the hazard rate, versus the alternative hypothesis $$H_1$$ that the exposure does affect the hazard rate. The null hypothesis implies that in Eq. (\ref{HR}) $$HR = 1$$, and thus $$\beta = 0$$. Therefore, we can restate our null and alternative hypotheses in terms of $$\beta$$:

H_0\!: \beta = 0 \quad \text{vs.} \quad H_1\!: \beta \ne 0 \,.
\label{H01}

The BIC corresponding to a test of the above hypotheses is

BIC = -2[\ell_1(\hat{\beta}_1) - \ell_0(\hat{\beta}_0)] + \log(n) \,,
\label{BIC}

where $$\ell_1(\hat{\beta}_1)$$ and $$\ell_0(\hat{\beta}_0)$$ are the maximum log-likelihoods under $$H_1$$ and $$H_0$$ respectively, and $$n$$ is the number of uncensored events (eg, the number of observed deaths) in the study population (Volinsky and Raftery 1999).

Once we have BIC, $$BF_{01}$$ is simply

BF_{01} = \exp\left(\frac{BIC}{2}\right) \,.
\label{bf_bic}

The maximum log-likelihoods $$\ell_1(\hat{\beta}_1)$$ and $$\ell_0(\hat{\beta}_0)$$ in Eq. (\ref{BIC}) are not ordinarily reported in published papers; however, the quantity $$2[\ell_1(\hat{\beta}_1) - \ell_0(\hat{\beta}_0)]$$ can readily be computed from data in the paper, because, for large $$n$$,

2[\ell_1(\hat{\beta}_1) - \ell_0(\hat{\beta}_0)] \approx \left[
\frac{\hat{\beta}}{\text{se}(\hat{\beta})}\right]^2
\label{z}

(for a proof, see Zhang 2005), and we can compute $$\hat{\beta}$$ and its standard error $$\text{se}(\hat{\beta})$$ from the reported hazard ratio $$\widehat{HR}$$ and its confidence interval, respectively. From Eq. (\ref{HR}),

\hat{\beta} = \log (\widehat{HR}) \,.
\label{beta}

To calculate $$\text{se}(\hat{\beta})$$, we first need to calculate the confidence interval for $$\hat{\beta}$$. To do this, we apply Eq. (\ref{beta}) to the reported confidence limits of the hazard ratio. If $$\widehat{HR}_L$$ and $$\widehat{HR}_U$$ are the lower and upper confidence limits of $$\widehat{HR}$$, then $$\hat{\beta}_L$$ and $$\hat{\beta}_U$$, the lower and upper confidence limits of $$\hat{\beta}$$, are

\hat{\beta}_L = \log(\widehat{HR}_L) \quad \text{and} \quad
\hat{\beta}_U = \log(\widehat{HR}_U) \,.
\label{CLs}

Since $$\hat{\beta}$$ has an approximate large-sample normal distribution, the width of its $$(1-\alpha)\times100\%$$ confidence interval is $$2z_{1-\alpha/2} \text{se}(\hat{\beta})$$, where $$z_{1-\alpha/2}$$ is the $$1-\frac{\alpha}{2}$$ quantile of the standard normal distribution. Therefore,

\text{se}(\hat{\beta}) = \frac{\hat{\beta}_U - \hat{\beta}_L}{2z_{1-\alpha/2}} .
\label{width}

For the ubiquitous $$95\%$$ confidence interval, $$z_{1-\alpha/2}=1.96$$.
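For other confidence levels, $$z_{1-\alpha/2}$$ can be obtained from the standard normal quantile function. A quick sketch in Python (the function name is ours), using the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

def z_quantile(conf_level):
    """z_{1 - alpha/2} for a (conf_level x 100)% confidence interval."""
    alpha = 1 - conf_level
    return NormalDist().inv_cdf(1 - alpha / 2)  # standard normal quantile

z_quantile(0.95)  # ≈ 1.96
```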

Combining Eqs. (\ref{BIC}), (\ref{bf_bic}), and (\ref{z}), we have

BF_{01} \approx \exp\,\left\{\frac{1}{2}\left(- \left[
\frac{\hat{\beta}}{\text{se}(\hat{\beta})}\right]^2 +
\log(n)\right)\,\right\} \,,
\label{final}

where $$\hat{\beta}$$ is computed from Eq. (\ref{beta}) and $$\text{se}(\hat{\beta})$$ is computed from Eqs. (\ref{CLs}) and (\ref{width}).
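The whole calculation is easy to script. The following Python sketch (the function name is ours, for illustration) implements the steps above: log-transform the hazard ratio, recover the standard error from the confidence limits, and apply the BIC approximation:

```python
from math import exp, log

def bf01_from_hr(hr, hr_lower, hr_upper, n_events, z=1.96):
    """Approximate BF01 from a reported hazard ratio, its confidence
    limits, and the number of uncensored events, via the BIC."""
    beta = log(hr)                                  # beta-hat = log(HR-hat)
    se = (log(hr_upper) - log(hr_lower)) / (2 * z)  # se from the CI width
    bic = -((beta / se) ** 2) + log(n_events)       # BIC approximation
    return exp(bic / 2)                             # BF01 = exp(BIC / 2)
```

For the red-meat example worked out below, `bf01_from_hr(1.13, 1.07, 1.20, 23926)` returns approximately .025.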

## Example: Red Meat and Mortality

Pan et al (2012) investigated the effect of red meat intake on total mortality in a combined cohort comprising 121,342 adults who were enrolled in either the Nurses' Health Study or the Health Professionals Follow-up Study. Intakes of processed and unprocessed red meat, as well as numerous dietary and non-dietary potentially confounding variables, were assessed for each study participant at baseline and every four years thereafter using a detailed, validated food-frequency questionnaire. After 2,958,146 person-years of follow-up, the investigators observed 23,926 deaths in the combined cohort. Using Cox proportional hazards regression, the investigators found a highly statistically significant relationship between the amount of unprocessed red meat consumed and the risk of death: the hazard ratio for a 1-serving-per-day increment in red meat intake was 1.13 [95% CI (1.07, 1.20); p-value<.001]. We wish to calculate the Bayes factor for $$H_0\!\!:HR=1$$ vs. $$H_1\!\!:HR\ne1$$.

From Eq. (\ref{beta}),

\hat{\beta} = \log(1.13) = .1222\,.
\label{ex:beta}

From Eq. (\ref{CLs}),

\hat{\beta}_L = \log(1.07) = .06766 \quad \text{and} \quad
\hat{\beta}_U = \log(1.20) = .18232\,.
\label{ex:CLs}

And thus from Eq. (\ref{width}),

\text{se}(\hat{\beta}) = \frac{.18232 - .06766}{2(1.96)} = .02925\,.
\label{ex:width}

And therefore from Eq. (\ref{final}),

\begin{align}
BF_{01} &\approx \exp\left\{\frac{1}{2} \left(-
\left[\frac{.1222}{.02925}\right]^2 + \log(23,\!926)\right)\right\}
\nonumber \\
&=.025\,.
\label{ex:final}
\end{align}

Thus the data favor the alternative hypothesis over the null hypothesis by a factor of 40 (1/.025). Although this is strong evidence against the null hypothesis, notice that it is not nearly as strong as the reported p-value (p<.001) seems to imply.
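Applying the odds form of Bayes' Theorem (Eq. (\ref{bayes_thm})), and assuming for illustration equal prior odds on the two hypotheses, this Bayes factor corresponds to a posterior probability of only about 2.4% for the null:

```python
bf01 = 0.025                 # the Bayes factor calculated above
posterior_odds = bf01 * 1.0  # equal prior odds: P(H0)/P(H1) = 1
p_h0 = posterior_odds / (1 + posterior_odds)
print(round(p_h0, 3))        # 0.024
```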

Of course, we should bear in mind that the foregoing calculations do not take into account potential sources of systematic error, such as error in exposure measurement, residual confounding, etc., that could have affected the results reported by Pan et al (2012). Our posterior probability of the null hypothesis should be tempered by consideration of these factors.

## References

Kass, R. E. and L. Wasserman (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association 90(431):928–34.

Pan A., Q. Sun, A. M. Bernstein, et al. (2012). Red meat consumption and mortality: results from 2 prospective cohort studies. Archives of Internal Medicine. Published online March 12, 2012.

Rouder, J. N., P. L. Speckman, D. Sun, R. D. Morey, and G. Iverson (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review 16(2):225–37.

Volinsky, C. T. and A. E. Raftery (1999). Bayesian information criterion for censored survival models. Technical Report 349, Department of Statistics, University of Washington. (http://www.stat.washington.edu/tech.reports/tr349.ps).

Zhang D. (2005). Chapter 7 in ST 745. Analysis of Survival Data. Lecture Notes. Department of Statistics, North Carolina State University. (http://www4.stat.ncsu.edu/%7Edzhang2/st745/chap7.ps).