## Letters in Response to the Underpowered Finkelstein et al. Oregon Medicaid Study: The First One Makes Me Especially Sad

I used to think that Deirdre McCloskey was making too much of the failures of statistical rhetoric surrounding the concept of "statistical significance". No more. She's right.

The first NEJM letter below, in particular, makes me sad. The misconceptions created by the failure of even highly-regarded people to accurately communicate statistical significance, substantive significance, and the difference between them are mammoth. And I find them very depressing:

Effects of Medicaid on Clinical Outcomes: N Engl J Med 2013; 369:581-583 August 8, 2013 DOI: 10.1056/NEJMc1306867

To the Editor:

Baicker et al. provide high-quality evidence of the failure of insurance to promote physical health even among the insured. One cause may be that insurers do not cover cost-effective programs that reach beyond the threshold of the physician's office to change patient behavior, support patients at home, and alter systems of delivery. Even if they suggest that they support such interventions, insurers underinvest in them. For instance, for persons with diabetes, my own (generous) insurance covers only a handful of visits with a dietitian, and it offers neither telephone-based reminders nor counseling on diet or exercise — far less than effective methods require.

Public and private payers have tried to compensate hospitals and physicians according to performance. Results have been decidedly mixed. The reason may be not what they pay or how they pay it, but to whom. Insurers should be rewarded and penalized on the basis of comprehensive health outcomes (e.g., mortality, not just blood-pressure levels). Insurers determine where funds flow. If we want the health system to care more about population health, we should make sure that insurers do too.

Ari B. Friedman, B.A.
abfriedman@gmail.com

The reason that it makes me sad is that Ari Friedman misread Baicker et al. in the natural way--he takes the result of their study to be that they found a "failure of insurance to promote physical health even among the insured". That is simply not correct: they failed to find a statistically significant promotion of physical health. And that is not a call to conclude that insurance is ineffective, but rather a call to gather more data.

And they enabled this misreading when they wrote, in their abstract, that:

Medicaid coverage generated no significant improvements in measured physical health.

They really should not have written "significant" without "statistically" in front of it.

More:

To the Editor:

The abstract in the article by Baicker et al. states that “Medicaid coverage generated no significant improvements in measured physical health.” This is a misleading summary of the data reported in their article. The best estimates are that the Medicaid group had better outcomes than the control group according to most measures (see Table 2 of the article). The problem is that these findings are not statistically significant.

So, the effects might have been zero. That is not the same as saying that they were zero, or even that they were small. Buried toward the end of the article is the statement, “The 95% confidence intervals for many of the estimates of effects . . . include changes that would be considered clinically significant.” Nevertheless, almost all the article, the related editorial, and related news reports, opinion pieces, and online discussions proceeded as if the effects had been found to be zero.

If one objects, on the basis of a lack of statistical certainty, to the simple summary that the Medicaid group had better outcomes, then one should describe the substantive meaning of the confidence interval. An honest summary is that it is quite likely there were positive effects, though it is possible that they were zero or negative.

Ross Boylan, Ph.D.
University of California, San Francisco, San Francisco, CA
ross@biostat.ucsf.edu
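Boylan's distinction between "not statistically significant" and "zero" is easy to make concrete. A minimal numerical sketch, with made-up numbers that are not the study's actual estimates:

```python
# Hypothetical estimate and standard error, for illustration only --
# these are NOT numbers from the Oregon study.
estimate = 0.8          # e.g., percentage-point improvement in an outcome
se = 0.9

# 95% confidence interval under a normal approximation
lo, hi = estimate - 1.96 * se, estimate + 1.96 * se
print(f"95% CI: ({lo:.2f}, {hi:.2f})")

# "Not statistically significant" means only that the interval includes 0;
# the very same interval also includes effects large enough to matter.
clinically_meaningful = 1.5     # hypothetical threshold
includes_zero = lo <= 0 <= hi
includes_meaningful = lo <= clinically_meaningful <= hi
print(includes_zero, includes_meaningful)  # True True
```

A confidence interval that straddles zero typically straddles clinically meaningful values too; "no significant effect" summarizes only the first fact.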

And:

To the Editor:

In a comprehensive and careful follow-up to their previous analysis, Baicker et al. (May 2 issue) report on the effects of insurance coverage on health care and health outcomes in the Oregon Medicaid lottery experiment after approximately 2 years. Their instrumental-variable analysis is the next best thing to a randomized, controlled trial, since the instrument — in this case, winning a lottery for Medicaid coverage — satisfies the large-sample properties of being correlated to treatment and not being correlated to the outcomes of interest (e.g., health care utilization and outcomes) except through its effect on treatment.

The financial effects on the lottery winners were not trivial. They received an in-kind benefit valued at one third to two thirds of their household income, their out-of-pocket spending was reduced by 39% ($215), and catastrophic expenditures were reduced by 81%. These financial consequences could have a direct effect on self-reported depression, other mental health conditions, and possibly other outcomes. It is telling that self-reported happiness increased in the first year after winning the lottery, but not in the second.1,2,4 Awarding lottery winners equivalents of cash prizes (worth approximately $6,600 each) rather than Medicaid might have improved their health outcomes and well-being even more.

Joel W. Hay, Ph.D.
University of Southern California, Los Angeles, CA
jhay@usc.edu
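With a binary instrument like the lottery, the instrumental-variable logic Hay describes reduces to the Wald estimator: the intent-to-treat difference in outcomes divided by the first-stage difference in enrollment. A toy simulation (all numbers invented, not from the study):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Winning the lottery (Z) raises Medicaid enrollment (D) but affects the
# outcome (Y) only through enrollment -- the exclusion restriction.
Z = rng.integers(0, 2, n)                  # lottery win
D = rng.random(n) < 0.15 + 0.25 * Z        # enrollment, imperfect take-up
true_effect = 2.0
Y = true_effect * D + rng.normal(0, 5, n)  # outcome with noise

# Wald/IV estimator: intent-to-treat effect on the outcome, scaled by
# the first-stage effect of winning on enrollment.
itt = Y[Z == 1].mean() - Y[Z == 0].mean()
first_stage = D[Z == 1].mean() - D[Z == 0].mean()
iv_estimate = itt / first_stage
print(round(iv_estimate, 2))  # recovers roughly the true effect of 2.0
```

Note how the imperfect take-up shrinks the intent-to-treat contrast: the IV estimate rescales it back up, but at a cost in precision, which is exactly the power problem discussed below.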

The authors reply: We appreciate the attention given to our study and agree with many of the comments. Our study indeed does not speak directly to the effects of different types of insurance or of alternatives such as cash grants; these are important topics for future research.

Statistical precision is crucial to interpretation of the findings. The reported confidence intervals do not allow us to reject the null hypothesis that there was no effect of Medicaid on blood-pressure, cholesterol, or glycated hemoglobin levels — but they are also consistent with Medicaid improving (or harming) these outcomes. Empirical estimates always come with uncertainty. A key question is what effect sizes our findings rule out, and how these compare with findings from previous studies of the effect of health insurance or with expectations based on available treatments.

In some cases, we can reject effect sizes seen in previous studies. For example, we can reject decreases in diastolic blood pressure of more than 2.7 mm Hg (or 3.2 mm Hg in patients with a preexisting diagnosis of hypertension) with 95% confidence. Quasi-experimental studies of the 1-year effect of Medicaid showed decreases in diastolic blood pressure of 6 to 9 mm Hg.1,2

In other cases, our confidence intervals do not rule out the health improvements one might expect given our estimate of the effect of Medicaid on medication use. For example, as noted in our article, given our estimate of the increase in the use of diabetes medication because of Medicaid, the clinical literature would predict a decrease in the average glycated hemoglobin level of 0.05 percentage points, an effect that is well within our 95% confidence interval.

We assessed conditions for which treatments exist that were effective within 2 years (our study period), but power is always constrained by sample size and is further reduced here by the imperfect take-up rates of Medicaid and lack of baseline clinical measures. We took several steps to increase power. Our study examined subgroups in which one might expect larger Medicaid effects (older persons and those with preexisting conditions) and we combined measures using the composite Framingham risk score. In none of these cases did we find statistically significant effects of Medicaid on physical health. We did find substantial and significant improvements in depression and financial well-being, as well as an increased use of health care.

Katherine Baicker, Ph.D.
Harvard School of Public Health, Boston, MA
kbaicker@hsph.harvard.edu

Amy N. Finkelstein, Ph.D.
Massachusetts Institute of Technology, Cambridge, MA

I think the NEJM missed an important teachable moment when it refused to publish this letter by Sam Richardson, Aaron Carroll, and Austin Frakt: questions of experimental design for sufficient power are undertaught and not well known:

More Medicaid study power calculations (our rejected NEJM letter): The Oregon Health Insurance Experiment (OHIE), a randomized controlled trial (RCT) of Medicaid, failed to show statistically significant improvements in physical health; some have argued that this rules out the possibility of large effects. However, the results are not as precisely estimated as expected from an RCT of its size (12,229 individuals) because of large crossover between treatment and control groups.

The Experiment’s low precision is apparent in the wide confidence intervals reported. For example, the 95% confidence interval around the estimated effect of Medicaid on the probability of elevated blood pressure spans a reduction of 44% to an increase of 28%.

We simulated the Experiment’s power to detect physical health effects of various sizes and the sample size required to detect effect sizes with 80% power. As shown in the table below, it is very underpowered to detect clinically meaningful effects of Medicaid on the reported physical health outcomes. For example, the study had only 39.3% power to detect a 30% reduction in subjects with elevated blood pressure. It would have required 36,100 participants to detect it at 80% power. Moreover, even a reduction that large is substantially more than could be expected from the application of health insurance.

To estimate power levels… we ran 10,000 simulations of a dataset with 5406 treatments and 4786 controls (the study’s reported effective sample sizes given survey weighting). We took random draws for Medicaid enrollment based on the probabilities reported in the study. We took random draws for each outcome: probabilities for the non-Medicaid population are given by the control group means from the study, adjusted for the 18.5% crossover of controls into Medicaid; the probability of the outcome for those on Medicaid is X% lower than the probability for those not on Medicaid, where X% is the postulated effect size.

For each simulated dataset, we regressed the outcome on the indicator for treatment (winning the lottery), and the power is the percentage of the 10,000 iterations for which we rejected at p = 0.05 the hypothesis that winning the lottery had no effect on the outcome. To estimate the total sample size required for 80% power, we conducted a grid search for the lowest sample size that provided 80% probability of rejecting the null hypothesis, running 1000 simulations for each sample size. Our required sample sizes account for sampling weights, and are therefore comparable to the 12,229 total subjects from the study. We do not account for clustering at the household level or controls for household size (and demographic controls from the blood pressure analysis).

Simulations were validated by comparing a subset of results to results that were computed analytically based on the 24.1 percentage point increase of Medicaid enrollment among treatments. Our simulation Stata code is available for download here. The analytic method is described here.

The Experiment was carefully conducted and provides a wealth of new information about the effects of Medicaid on household finances, mental health, and healthcare utilization. However, it was underpowered to provide much insight into the physical health effects of Medicaid.
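The simulation procedure Richardson, Carroll, and Frakt describe can be sketched roughly as follows. This is a simplified illustration, not their Stata code: it uses a plain two-proportion z-test on the intent-to-treat comparison and ignores the survey weights, household clustering, and covariates they account for. The sample sizes, 18.5% crossover, and 24.1-point enrollment gain are the figures quoted above; the ~16% control-group prevalence of elevated blood pressure is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_power(n_treat, n_ctrl, p_ctrl, effect, enroll_gain,
                   crossover=0.185, n_sims=500):
    """Fraction of simulations in which a two-proportion z-test on the
    intent-to-treat comparison rejects at p = 0.05."""
    p_medicaid = p_ctrl * (1 - effect)   # outcome probability if on Medicaid
    rejections = 0
    for _ in range(n_sims):
        # Medicaid enrollment: controls cross over at the base rate,
        # lottery winners enroll ~24 points more often.
        on_medicaid_t = rng.random(n_treat) < crossover + enroll_gain
        on_medicaid_c = rng.random(n_ctrl) < crossover
        # binary outcome, e.g. elevated blood pressure
        y_t = rng.random(n_treat) < np.where(on_medicaid_t, p_medicaid, p_ctrl)
        y_c = rng.random(n_ctrl) < np.where(on_medicaid_c, p_medicaid, p_ctrl)
        # pooled two-proportion z-test of winners vs. non-winners
        p_pool = (y_t.sum() + y_c.sum()) / (n_treat + n_ctrl)
        se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_treat + 1 / n_ctrl))
        z = (y_t.mean() - y_c.mean()) / se
        rejections += abs(z) > 1.96
    return rejections / n_sims

# power to detect a 30% reduction in elevated blood pressure, using the
# effective sample sizes, crossover, and enrollment gain quoted above
power = simulate_power(5406, 4786, p_ctrl=0.16, effect=0.30, enroll_gain=0.241)
print(power)
```

With these inputs the simulated power lands in the same neighborhood as the roughly 39% the letter reports for this scenario, which shows how crossover between arms, not sample size per se, drains the design's precision.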

And they make what appear to me to be correct critical comments on Baicker and Finkelstein's reply:

One excerpt:

In some cases, we can reject effect sizes seen in previous studies. For example, we can reject decreases in diastolic blood pressure of more than 2.7 mm Hg (or 3.2 mm Hg in patients with a preexisting diagnosis of hypertension) with 95% confidence. Quasi-experimental studies of the 1-year effect of Medicaid showed decreases in diastolic blood pressure of 6 to 9 mm Hg.

Of course it is true that the study results reject, with 95% confidence, decreases in diastolic blood pressure mentioned in this quote. However, as Aaron wrote here and here, the prior work cited by the authors that suggests a 6-9 mm Hg drop in diastolic blood pressure was on a population of patients with hypertension. As he explained, and as I did again here, only a small fraction of the Oregon Health Study sample had high blood pressure:

A key point is that blood pressure reduction should only be expected in a population with initially elevated blood pressure, which was the focus of the prior literature referenced above. In contrast, the headline OHIE result is for all study subjects, only a small percentage of whom had elevated blood pressure at baseline. Unfortunately, there is no reported OHIE subanalysis focused exclusively on subjects with hypertension at time of randomization. Depending on which metrics from the published results you examine, between 3% and 16% of the sample had elevated blood pressure at baseline. Taking the high end, 16% x 5 mm Hg = 0.8 mm Hg is in the ballpark of a reasonable expectation of the reduction in diastolic blood pressure the OHIE could have found (it was also the study’s point estimate) were it adequately powered to do so. Was it?

No, which you can read about in full here. (And, no, power would still not be adequate even at twice this reasonable expectation.)
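The back-of-the-envelope expectation in the excerpt is just a weighted average across the baseline-hypertensive share of the sample:

```python
# Weighted-average expectation for the full sample, per the excerpt:
# only the share with elevated blood pressure at baseline could
# plausibly see the blood-pressure drop found in prior studies.
share_elevated = 0.16      # high-end share with elevated BP at baseline
drop_if_elevated = 5.0     # mm Hg reduction among hypertensives
expected_drop = share_elevated * drop_if_elevated
print(expected_drop)       # 0.8 mm Hg -- near the study's point estimate
```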

We have high regard for the study and its authors. The limitations of power are functions of the sample well beyond their control. Nevertheless, we believe they need to be kept in mind for a complete understanding of the study’s findings.