Sam Wang: Why the Polling Industry Sucks
Matthew Yglesias sends us to Brendan Nyhan who sends us to Sam Wang:
The economics of reporting polls: The only thing happening in the Meta-Analysis is a slight, slow widening of Obama’s lead. Some of you want to know about individual polls, such as a recent Gallup national poll showing Obama ahead by only +2% (standard likely-voter model) or +6% (high-turnout model). I confess that I tend to ignore individual polls because of the statistical variability. So it didn’t occur to me to care about this particular data point. Obama is still crushing McCain, period.
But there is a lesson to be learned here: It is not in the interest of individual pollsters or media organizations for you to have the most accurate possible picture of the horserace.
Here is why.
Uncertainties such as the margin of error can be reduced by taking more samples. An individual pollster can halve the margin of error by surveying 4 times as many people. It’s a square-root relationship: surveying N times as many people gives a sqrt(N)-fold reduction in uncertainty. The same is true for combining polls, with the added advantage of reducing the effects of methodological variation. Thus the value of poll-aggregation sites like this one. Meta-Analysis worked extremely well in 2004 and 2006, and is likely to do so again this year.
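To make the square-root relationship concrete, here is a minimal sketch of the textbook 95% margin of error for one candidate's share, MOE = 1.96 * sqrt(p(1 - p)/n). It assumes simple random sampling, the normal approximation, and the worst case p = 0.5; real pollsters weight their samples, which makes the true error somewhat larger.

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error (as a proportion) for a simple random sample of size n.

    Assumes simple random sampling and the normal approximation; weighting and
    design effects in real polls make the actual error larger than this.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Quadrupling the sample size halves the margin of error.
for n in (250, 1000, 4000):
    print(f"n={n:5d}  MOE = +/-{100 * margin_of_error(n):.1f} points")
```

Going from 1,000 to 4,000 respondents takes the margin from about 3 points to about 1.5: four times the interviews for half the error.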
So why don’t more pollsters or media organizations aggregate polls? The CNN Poll of Polls is a start, but it’s an exception. Two forces encourage bad horserace reporting:
Competition among pollsters. It’s not in the interest of individual pollsters to say “average my results with the others.” It’s also not advantageous to collect a larger sample once the margin of error meets industry standards.
The hungry media beast. With news budgets on the decline, it’s costly to report real news. Why pay for investigative reporting when you can buy a poll and report the horserace? Within the area of poll reporting, market forces discourage high accuracy. For example, commissioning a survey of 4 times as many people would reduce uncertainty by a factor of two. But why pay 4 times as much for data that generate a lower likelihood of an apparent - and reportable - swing?
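The "lower likelihood of an apparent swing" can be quantified. A rough sketch, assuming nothing has really changed between two independent polls (same true share p in both), simple random sampling, and the normal approximation: how often will the two polls still differ by more than 2 points in one candidate's share?

```python
import math

def prob_spurious_swing(n: int, threshold: float, p: float = 0.5) -> float:
    """Chance that two independent polls of size n differ by more than
    `threshold` in a candidate's share when the true share is p in both.

    Normal approximation, simple random sampling; ignores house effects.
    """
    se_diff = math.sqrt(2 * p * (1 - p) / n)  # standard error of the difference of two shares
    z = threshold / se_diff
    return math.erfc(z / math.sqrt(2))       # two-sided tail probability P(|Z| > z)

for n in (1000, 4000):
    print(f"n={n}: P(apparent 2-point move) = {prob_spurious_swing(n, 0.02):.2f}")
```

With 1,000 respondents per poll, a "news-making" 2-point move shows up by chance more than a third of the time; with 4,000 it drops to well under one time in ten.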
For these reasons, media organizations aren’t motivated to report polling results with the maximum possible statistical power. The Meta-Analysis of State Polls is pure data reduction, basically a more general version of averaging. As a result, the top-line result is very steady. This is a case where a blogger-hobbyist can add value. We use the polling/media system to provide added value - for cheap.
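As an illustration of why averaging steadies the top line, here is a sketch of the simplest possible aggregation: a sample-size-weighted average of several polls of the same race. This is not the Meta-Analysis's actual method (which works on state polls and electoral votes); it is only the plain averaging that method generalizes, and the poll numbers below are made up for illustration.

```python
import math

def pool_polls(polls):
    """Combine (share, sample_size) pairs into one estimate.

    Weights each poll by its sample size and treats the result as one big
    simple random sample, so it ignores house effects and timing differences.
    """
    total_n = sum(n for _, n in polls)
    pooled = sum(share * n for share, n in polls) / total_n
    se = math.sqrt(pooled * (1 - pooled) / total_n)  # standard error of the pooled share
    return pooled, se, total_n

# Illustrative numbers only, not real 2008 polls.
polls = [(0.52, 900), (0.49, 1100), (0.53, 800), (0.51, 1200)]
est, se, n = pool_polls(polls)
print(f"pooled share {est:.3f} +/- {1.96 * se:.3f} (95%), effective n = {n}")
```

Four ordinary-sized polls pooled together behave like one 4,000-person poll, with roughly half the margin of error of any single one.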
Which brings us back to costs...

www.electoral-vote.com
Run by Andy Tanenbaum, an MIT/UC Berkeley-trained computer scientist and mathematician. This is the poll aggregator you may be looking for.
Posted by: Matt Goldstein | October 21, 2008 at 10:03 AM
Increasing the sample size will only decrease random error.
I suspect the largest source of error is systematic error, e.g. likely voters vs. registered voters, the time of day the sampling is done, etc.
In any case, people change their minds, so the final result (which is what we want to know) will still be obscured.
Posted by: NeilS | October 21, 2008 at 11:21 AM
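The point about systematic error can be sketched with the standard decomposition of mean squared error into squared bias plus sampling variance. The 2-point bias below is a hypothetical value (say, from a likely-voter screen), not a measured figure; the point is that only the variance term shrinks as n grows.

```python
import math

def total_error(n: int, bias: float, p: float = 0.5) -> float:
    """Root-mean-square error of a poll with a fixed systematic bias.

    RMSE^2 = bias^2 + sampling variance; only the sampling term shrinks with n.
    """
    sampling_var = p * (1 - p) / n
    return math.sqrt(bias ** 2 + sampling_var)

# Hypothetical 2-point systematic bias.
for n in (1000, 4000, 100_000):
    print(f"n={n:6d}: RMSE = {100 * total_error(n, bias=0.02):.2f} points")
```

Past a few thousand respondents the error is dominated by the bias, so bigger samples buy very little.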
"Why pay for investigative reporting when you can buy a poll and report the horserace?"
I'd like to see them hire somebody to report on the issues, using what's right there on the candidates' websites. How expensive would that be?
Posted by: low-tech cyclist | October 21, 2008 at 02:46 PM
It doesn't seem to make much sense to sample four times as many folks, at less than four times the cost but still a considerable one, when
- the relative robustness of your number, ceteris paribus, will not make your number shinier or bigger in any news story, and
- because it's a horse race and a bunch of other folks are doing similar polls, the value added from knowing slightly better who's leading in the middle of the clubhouse turn isn't worth it, and
- there are hundreds of polls conducted over the course of the political races, and the marginal cost of bigger samples might add up to real money (noting again the iffy marginal value from an individual big poll), and
- the market has arrived at the current sample size as the preferred one, as one providing adequate information for decisionmaking, and being cheap enough to do again and again over the course of the race, and news organizations eat 'em like popcorn and use 'em as props for their narratives, learning more from the flow of results than from any one poll.
[apologies for any real or apparent redundification in the above points.]
Posted by: MaryCh | October 21, 2008 at 11:41 PM
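The "less than four times as much" part follows if a poll has fixed costs (questionnaire design, setup, analysis) on top of a per-interview cost. A sketch under a hypothetical linear cost model; the dollar figures are invented for illustration only.

```python
def cost_ratio(fixed: float, per_interview: float, n: int, factor: int = 4) -> float:
    """Cost of a poll `factor` times larger divided by the cost of the base poll,
    under a hypothetical linear model: cost = fixed + per_interview * n."""
    base = fixed + per_interview * n
    bigger = fixed + per_interview * factor * n
    return bigger / base

# Hypothetical figures: $10,000 fixed cost, $20 per interview, 1,000 interviews.
print(cost_ratio(fixed=10_000, per_interview=20, n=1000))
```

With any fixed cost the ratio falls below 4, but it stays large, and it is paid on every one of the hundreds of polls run over a campaign.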
Here is a new way of looking at poll data. This takes the national average of the poll data from realclearpolitics.com and annotates it with the news headlines from the time of each update. See how Joe the Plumber and Colin Powell stack up in helping their candidates.
http://hooverlaw.com/charlie/polldata.php
Posted by: Charlie Hoover | October 22, 2008 at 09:30 AM