
August 30, 2008

Clinical and Actuarial Judgment

Comments


Reading this post almost immediately recalled to mind Ernst Mayr's summation of the changes he helped initiate in the field of biology:

"The replacement of typological by population thinking is perhaps the greatest conceptual revolution that has taken place in biology ...For the typologist, the type (eidos) is real and the variation an illusion, while for the populationist the type (average) is an abstraction and only the variation is real. No two ways of looking at nature could be more different."

And that in turn brought to mind Stephen J. Gould's compelling article on his successful fight against a rare, terminal cancer, "The Median Isn't The Message" (e.g., http://tinyurl.com/6nwg7).

Read Gould's article and it should become apparent that Shalizi's resolution #1 is a very plausible alternative; that #2 should probably be phrased more carefully, because at least in the patient's case the fast and frugal heuristic is clearly to ignore the actuarial result and seek the long tail; and that #3 undoubtedly contains some truth, but since brains are pattern-detection engines rather than computational engines, the comparison could be misleading: brains, and presumably the experts who have them, do not compute, they recognize.*

*Note: recognition is also a central characteristic of immune system function

In the studies referred to, do the clinicians get to see the patient, ask questions, etc.? It's not surprising that computers do better at pure data analysis, but what happens when the clinician has access to this extra information?

Is this an example of Shalizi's explanation #1?

There are different situations here, and different considerations apply. Where there's enough data to construct good statistical models, human judgment is usually inferior to an algorithm. But as noted, there are situations with a lot of side information where human flexibility seems to give an edge.

Medical triage is an example. The lore is that good pediatricians can see 200 patients a day, and pick out the one child whose fever and irritability are early signs of something serious like meningitis. The same is said of ER docs.

Traditionally, physicians were supposed to approach a new patient afresh, take a detailed history and perform a thorough physical examination. This resulted in an enormous amount of raw data, if one were to digitize it. This pertains to Shalizi's point #1. But a lot of MD-patient contact is fragmentary and the physician arrives at the interview with a lot of information from the chart, the lab, etc.

It might be useful to classify the fallacies and biases that produce bad clinical judgment. Some are simple, like the physician who will never use an otherwise appropriate drug because of one bad outcome, or the thousands of unnecessary tonsillectomies that were performed back in the day because of the undoubted benefit to a minority of patients. Some involve intrusions of non-biological considerations, like race, gender, and class (especially class), into the process. But these things are supposed to be taken into account in judging risks of various diseases, explaining things to the patient, choosing affordable care, etc. And there are factors involving the clinician's personality type (rigidity, openness to novel diagnoses, etc.).

A lot to think about. Ars longa vita brevis.

The idea that a computer analysis can outperform a good clinician may be true under specific conditions, but it is certainly not true in diagnosing heart disease. ECGs from a patient in critical condition often contain a great deal of noise, and the "electronic analysis" these systems perform may read a tracing as normal when it is highly abnormal. Now, if you train physicians to rely on that computer analysis, they will lack (or lose) the skills necessary to make that catch. I have seen it happen, and there is no substitute for good training. It might be helpful to have a computer perform a secondary analysis and have disagreements resolved by a committee of experts (in non-critical situations).

Is this the EMT theologians trying to apply their concepts elsewhere, now that we know there is nothing efficient or rational about the markets?

People can manage seven facts at once, plus or minus two.

For any application where the relevant number of facts is more than five, results will be poor.

Pretty much any disease will involve more facts than that in the diagnosis; this gets coped with using heuristics. The heuristics aren't very good because the choice space is too big. (Just the desk reference has more stuff in it than someone can possibly accurately remember.)

Familiar things with relatively small choice spaces allow people to develop good heuristics, which then fail horribly under conditions of change. People will typically fight to reverse the change rather than adopt new heuristics.

The choice space for diagnosing pretty much any disease is not stable, because the next patient has a completely different but still significant medical history. So the heuristics tend to fail.

Prompter systems, with patient information, a standard test battery, and "symptoms are consistent with condition with x% confidence; symptoms are consistent with condition with y% confidence" and so on listings, would almost have to greatly improve the process of diagnosis. They could certainly converge better on accuracy than the physicians, unassisted, could do.
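The kind of prompter listing described above can be sketched with a naive Bayes calculation. This is a minimal illustration only: the three conditions, the symptoms, and every probability below are hypothetical numbers invented for the example, and the assumption that symptoms are independent given the condition is a simplification a real system would have to justify.

```python
import math

# Hypothetical priors and symptom likelihoods, invented purely for illustration;
# a real system would estimate these from clinical data.
PRIORS = {"viral infection": 0.70, "bacterial pneumonia": 0.25, "meningitis": 0.05}
LIKELIHOODS = {
    # P(symptom present | condition)
    "viral infection":     {"fever": 0.80, "stiff neck": 0.05, "cough": 0.60},
    "bacterial pneumonia": {"fever": 0.90, "stiff neck": 0.05, "cough": 0.90},
    "meningitis":          {"fever": 0.90, "stiff neck": 0.80, "cough": 0.10},
}


def rank_conditions(findings):
    """Return (condition, posterior) pairs, most probable first.

    `findings` maps each symptom to True (present) or False (absent).
    Symptoms are assumed conditionally independent given the condition (naive Bayes).
    """
    log_scores = {}
    for condition, prior in PRIORS.items():
        score = math.log(prior)
        for symptom, present in findings.items():
            p = LIKELIHOODS[condition][symptom]
            score += math.log(p if present else 1.0 - p)
        log_scores[condition] = score
    # Normalize with the log-sum-exp trick so the posteriors sum to 1.
    top = max(log_scores.values())
    total = sum(math.exp(s - top) for s in log_scores.values())
    posteriors = {c: math.exp(s - top) / total for c, s in log_scores.items()}
    return sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    findings = {"fever": True, "stiff neck": True, "cough": False}
    for condition, p in rank_conditions(findings):
        print(f"symptoms are consistent with {condition} with {p:.0%} confidence")
```

With fever and a stiff neck but no cough, the rare condition moves to the top of the list despite its small prior, which is the sort of catch a prompter is meant to surface.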

I'm working on explanation 1 (as is Bernard Yomtov). The clinicians who raised me stress that the key diagnostic tool is the history, not these newfangled diagnostic tests (which they use -- one had over 18,000 citations last time I checked). Even if the experimenters provided clinicians with patient records, they couldn't reproduce an examination in which the clinician asks questions based on earlier answers. Either the clinicians were not given access to the patients, and so were fighting with one ear tied behind their backs, or the programs were given a summary of the data from a patient interview which a clinician considered key to the case, and were doing the easy part after a person had done the hard part.

There really isn't another possibility as far as I can see. Now it is alleged that young physicians are trained to make diagnoses in a way comprehensible and convincing to health insurance company employees, and therefore interrupt the patient to follow a standard diagnostic flow chart which could be mechanized. (The part about interrupting is a fact; the rest is what the grey-bearded Arnold Relman, who has been criticized a bit in comment threads below, argues.)

Given my total ignorance, I can't rule out any possibilities, including the possibility that insurance companies' desire not to pay for unnecessary care has led bright young physicians to do a worse job than could be done by an '80s-era PC.

homer,

One study I've seen cited involved EKGs. A large sample was presented to a computer algorithm and to an eminent Swedish cardiologist. The cardiologist took it seriously and spent a lot of time on the project. The result was a small but significant edge for the algorithm.

I saw it in a paper concerning the cognitive psychology of expertise. It has nothing to do with traditional economics.

There's a typology of this kind of thing in a collection of essays by a guy named Mauboussin. Depending on the problem area, the best performance may come from an expert, an algorithm, or the collective judgment of a crowd.

Many of these questions can be answered by following Brad's link to my page, where you'll find citations to the actual literature. (I particularly recommend the paper by Dawes, Faust and Meehl.) If anyone has pointers to studies which compare the accuracy of clinical judgment to statistical decision rules in natural settings (e.g., hospitals), I'd be extremely interested. This set of ideas certainly does _not_ come from efficient-market theorists (who I have little time for) or other economists, but rather is a quite solid part of cognitive psychology. If anything, it suggests yet another reason (as if there weren't enough already) why the economists' notions of rational behavior and rational expectations are bunk.
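As one concrete picture of what a "statistical decision rule" can look like, here is a sketch of a unit-weighted linear rule, the kind of deliberately simple "improper" linear model Dawes has written about: standardize each predictor against a reference sample, add the z-scores with fixed signs, and compare the total to a cutoff. The predictors, reference data, signs, and cutoff are all hypothetical, chosen only so the example runs; nothing here reproduces a rule from the cited studies.

```python
from statistics import mean, pstdev


def fit_standardizer(reference_rows):
    """Per-predictor (mean, population standard deviation) from a reference sample.

    Only the scales are estimated; the weights stay fixed at +1 or -1,
    which is what makes the model "improper".
    """
    columns = zip(*reference_rows)
    return [(mean(col), pstdev(col)) for col in columns]


def unit_weighted_score(case, standardizer, signs):
    """Sum of z-scores, each multiplied by +1 or -1 according to `signs`."""
    return sum(
        sign * (value - m) / sd
        for value, (m, sd), sign in zip(case, standardizer, signs)
    )


def decide(case, standardizer, signs, cutoff=0.0):
    """Flag the case when its unit-weighted score exceeds the (hypothetical) cutoff."""
    return unit_weighted_score(case, standardizer, signs) > cutoff


if __name__ == "__main__":
    # Hypothetical predictors: age, systolic blood pressure, weekly exercise hours.
    reference = [(50, 120, 5), (60, 140, 3), (40, 110, 6), (70, 150, 1)]
    signs = (+1, +1, -1)  # assumption: more exercise lowers risk
    scales = fit_standardizer(reference)
    for case in [(72, 155, 1), (38, 105, 7)]:
        score = unit_weighted_score(case, scales, signs)
        print(case, round(score, 2), decide(case, scales, signs))
```

The point of fixing the weights is that the rule applies the same arithmetic to every case, which is exactly what a tired or distracted human judge does not do.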

Upon reading this, I recalled the last literature I read in this area. I work on machine learning and public policy and my wife is a doctor, so I have a natural interest. In the literature, doctors were misdiagnosing approximately 4% of patients that eventually had heart attacks. The miss rate hadn't changed in 20 years. Most doctors admitted that they could use computer assistance catching "rare events" (unusual symptoms they were not accustomed to seeing which indicated a heart attack).

The big difference between 20 years ago and today isn't the change in mistakes, it's the doctor response. Today, everyone who presents with chest pain is admitted to the hospital. 20 years ago, they weren't admitted and the technology might have actually helped catch a few that should have been admitted. The public policy question is: "what are we going to do with the mistakes?" Whether the machine makes 4% mistakes or 2% (if the machine learning system is capable of reducing the error rate by half), we still have to change the doctor response in order to save money. What error rate are we going to be willing to tolerate without punishing the doctor through malpractice lawsuits?

Right now, the medical malpractice environment makes the use of a computer system to reduce the mistakes a waste of time. If the standard of care is to admit anyway, who cares if we have the technology to cut the mistake rate in half or by 2/3rds? In the end, we spend money on technology which only increases the cost of care.

Leaving aside the malpractice suits for analytical purposes, in the real world we want to know the relative cost of false positives and false negatives (Type 1 and Type 2 errors). If the cost of the hospitalization is low compared with the cost of foregone effective treatment, we want overdiagnosis. And conversely. These costs are difficult to estimate. Some of them involve putting a dollar value on a year of life, which makes people uncomfortable. Studies of clinical versus algorithmic performance are worthwhile, but in the medical area very difficult to translate into policy.
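Setting malpractice aside as the comment does, the false-positive/false-negative tradeoff reduces to an expected-cost rule: admit when p * C_FN > (1 - p) * C_FP, that is, when the probability of disease exceeds C_FP / (C_FP + C_FN). The sketch below assumes each cost is already known as a single dollar figure, which is precisely the hard part the comment points to; the figures in the example are hypothetical.

```python
def admission_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Probability of disease above which admitting has the lower expected cost.

    Admitting a patient who turns out to be healthy costs cost_false_positive
    (a Type 1 error); sending home a patient who turns out to be sick costs
    cost_false_negative (a Type 2 error). Admitting is cheaper in expectation when
        p * cost_false_negative > (1 - p) * cost_false_positive,
    which rearranges to p > cost_false_positive / (cost_false_positive + cost_false_negative).
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def should_admit(p_disease: float, cost_false_positive: float, cost_false_negative: float) -> bool:
    """Admit when the estimated probability of disease exceeds the threshold."""
    return p_disease > admission_threshold(cost_false_positive, cost_false_negative)


if __name__ == "__main__":
    # Hypothetical dollar figures, for illustration only.
    c_fp, c_fn = 5_000, 245_000
    print(f"threshold = {admission_threshold(c_fp, c_fn):.1%}")
    for p in (0.01, 0.03, 0.20):
        print(f"p = {p:.0%}: admit = {should_admit(p, c_fp, c_fn)}")
```

With a hypothetical $5,000 cost for an unnecessary admission and $245,000 for a missed case, the threshold works out to 2%, so anyone whose estimated risk exceeds 2% is admitted; if the two costs were equal, the threshold would be 50%.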

I understand an *insurance analyst's* desire to know the true costs, but in most cases it is almost a no-brainer for the hospital to over-diagnose. In most states, the hospital gets increased revenue at every stage of the diagnosis and treatment process for the over-diagnosis, and it has the added benefit of simultaneously protecting itself from malpractice litigation over a rare event!

If you're a payer, which we all are, you should be horrified at the use of our money. So, while I expect to see insurance companies use the computer estimates to whack hospitals or specific doctors into accepting lower reimbursements for a diagnosis/treatment regime which could have been better predicted by a discriminative modeling system, the problem is that the machine has a restricted training and testing set and, if it models cost, it is probably modeling costs from the institutional viewpoint. The doctor is considering cost to the patient, to the institution, and to herself/himself. And there may be an unknown prior which does significantly skew the expected cost calculation in the doctor's mind.

From a public policy standpoint, I'm more concerned that this cycle is broken in the real world. All real-world evidence points to analysts pressing areas where they can get traction instead of addressing the elephants in the room.

I agree, there are huge institutional and economic distortions in what's supposed to be an objective process.

It's certainly of interest to consider some medical diagnostic and decision-making situations from the standpoint of cognitive psychology. It looks like, for situations where there's a well-defined input and a small number of choices, both humans and algorithms perform well, with perhaps an edge to the machine. There's a classification of biases and fallacies that could be applied to medical situations with possible benefit, despite the difficulty of major reform.

My impression is that malpractice suits are randomly related to medical behavior. There are many suits that have no real basis. There are huge screwups that don't lead to legal action because families don't know what happened or don't want to sue for various rational and irrational reasons.

