A Short History of an Astonishing Claim

I was reading Laurie Lewis’s 2006 review of communication research about collaborative interaction. The article is excellent overall: a thorough, well-written, and interesting review with actionable calls for improving collaboration research. However, an empirical claim cited near the top of the third page caught my attention:

According to Tschannen, patient mortality in hospitals using collaborative communication was reported to be 41% lower than the predicted number of patient deaths, whereas hospitals recognized for lack of collaborative communication exceeded their predicted number of patient deaths by 58%.

WOW! Collaborative communication can account for a huge number of hospital deaths! This struck me as implausible. Could most patients who die in low collaborative communication hospitals really have survived if they had instead been in high collaborative communication hospitals?

While I was astonished when I read this claim, I also supposed that it might turn out to be true. After all, it’s easy to believe that effective medical care requires collaboration in diverse teams of patients, nurses, doctors of various specialties, administrators, assistants, and so on. The article was peer reviewed, so several established and respected scholars at least read the statement without raising objections. Even so, I was skeptical that a researcher could execute a study that would accurately measure enough variation in collaborative practices in a large enough sample of hospitals to defend these particular numbers. Curious about the empirical support for this claim, I followed the citation to Dana Tschannen’s 2004 article about collaboration in nursing:

In a study by Knaus, Draper, Wagner, and Zimmerman (1986), a significant relationship between the presence of excellent communication and collaboration and patient mortality was found. Specifically, hospitals where collaboration was present reported a mortality rate 41% lower than the predicted number of patient death (p=0.001). Conversely, hospitals noted for poor communication (for example, little-to-no collaboration among health care professionals exceeded their predicted number of patient deaths by 58%.

So Lewis’s sentence faithfully reproduced Tschannen’s account of the research. However, Tschannen’s article doesn’t tell us much about the data or methods used. How many hospitals were in the sample? How was collaboration measured? What was the baseline model? In search of answers, I found the 1986 Knaus et al. article. Here’s the abstract:

We prospectively studied treatment and outcome in 5030 patients in intensive care units at 13 tertiary care hospitals. We stratified each hospital’s patients by individual risk of death using diagnosis, indication for treatment, and Acute Physiology and Chronic Health Evaluation (APACHE) II score. We then compared actual and predicted death rates using group results as the standard. One hospital had significantly better results with 69 predicted but 41 observed deaths (p less than 0.0001). Another hospital had significantly inferior results with 58% more deaths than expected (p less than 0.0001). These differences occurred within specific diagnostic categories, for medical patients alone and for medical and surgical patients combined, and were related more to the interaction and coordination of each hospital’s intensive care unit staff than to the unit’s administrative structure, amount of specialized treatment used, or the hospital’s teaching status. Our findings support the hypothesis that the degree of coordination of intensive care significantly influences its effectiveness.

The paper is not really about collaborative interaction in any deep sense. The main contribution of the paper is the method for comparing the performance of hospital ICUs. Note that the paper is about death rates in ICUs specifically, not in hospitals overall. There were 13 hospitals in the sample. However, the cited association between collaboration and death rate was not based on any measure of collaboration at all. It simply reflects the performance of the best and worst of the 13 hospitals relative to the model. The 41% figure, for instance, corresponds to the best hospital’s 41 observed deaths against 69 predicted, which is about 41% fewer than expected. The p-values refer not to statistical tests of the effect of collaboration, but to the difference between the hospitals and the model, where the sample size is the number of patients. Knaus et al. do attribute the difference to coordination practices, but they perform no statistical analysis of collaboration. They probably do not have enough variation in their sample of 13 hospitals to do so. Instead, they gesture to qualitative differences between the hospitals. A more accurate claim would be “Relative to a model based on patient diagnostics, an ICU recognized for effective coordination achieved a 41% lower death rate and an ICU with poor coordination suffered a 58% greater death rate.”
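To make the arithmetic behind the "41% lower" figure concrete, here is a minimal sketch that recomputes it from the two counts quoted in the Knaus et al. abstract (69 predicted deaths, 41 observed). The function name `relative_difference` is my own, purely for illustration. The abstract does not give raw counts for the worst-performing hospital, so only the best-performing one is reproduced here.

```python
def relative_difference(observed: float, predicted: float) -> float:
    """Return (observed - predicted) / predicted as a fraction.

    Negative values mean fewer deaths than the model predicted.
    """
    if predicted <= 0:
        raise ValueError("predicted must be positive")
    return (observed - predicted) / predicted


# Counts quoted in the Knaus et al. (1986) abstract for the best-performing ICU.
best = relative_difference(observed=41, predicted=69)
print(f"{best:+.1%}")  # prints -40.6%
```

Note that this is a comparison of a single ICU against a risk model. Nothing in the calculation involves a measurement of collaboration.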

Does this really make a difference? Lewis reproduced Tschannen’s misrepresentation, but the major ideas and takeaways of her article aren’t seriously threatened by this error. The correct, more modest, claim still suggests that collaborative interaction is important for healthcare outcomes. Perhaps the urgency is somewhat lessened. Instead of “inferior collaboration is the main cause of unnecessary patient death” we get “evidence suggests that collaboration is important for ICU outcomes.”

This accidental reproduction of errors seems surprisingly common in academic writing. It resembles a game of telephone, except that in telephone the only available information is a private whisper, whereas empirical claims can usually be traced back to primary sources. Ole Bjørn Rekdal wrote a fascinating account of a sort of meta-urban-legend about the iron content of spinach. It is commonly believed that spinach has a very high iron content compared to other greens, but this is false. It is also commonly believed that a misplaced decimal point is the origin of the mistake. Ironically, there is no primary source for the claim that a decimal point was to blame! Academics advocating for the importance of rigorous scholarship went on citing the decimal point mistake, but in doing so made the very sort of mistake they were criticizing.

(Thanks to Mako for recommending Rekdal’s article.)