I heard a statistic today that makes it plain that the polls are simply junk. Michael Barone was being interviewed by Michael Medved, and Barone stated that, according to Pew, only 9% of calls made by the polling firm result in a usable interview. I was stunned, so I looked it up – sure enough, that is true. Barone also said 10% of homes now are “cell phone only.” Most polls utilize land lines only, which means cell-phone-only households are not available for sampling. Interestingly, this report says that 25% of the population is now cell phone only. For purposes of this post, let’s use Barone’s 10% number because it is more conservative.
If you put these two numbers together (9% of the 90% of households that can be reached by land line at all), we can conclude that only about 8% of the population is even available for sampling. Let me turn that into a visual for you. We’ll base it on something that I do professionally – sample dirt to see if it is contaminated. Let’s suppose I need to determine if a 10 acre property is contaminated. How do I do that? Believe me, it’s not easy. Dirt is not a homogeneous material. Let me explain that. Water in a rail car, for example, is easy to sample because I can stir it up. After stirring, the water near the top of the rail car is going to be pretty much like the water from the bottom of the rail car. So any sample I draw is going to be representative of the entire contents of the rail car, provided I stir it up before I take my sample. After stirring, the water in the rail car is homogeneous.
Well, you cannot do that with dirt – it does not stir – kind of like people around the country. So what you do is you take your 10 acres, you draw a grid on it, you take a sample from each grid square, and you hope that the combined results represent the totality of the 10 acres. But this is where things start to get really complex – how fine do you make your grid? In the picture at the left here, we have divided the square ten acre plot into a 10 X 10 grid. That results in 100 samples – that’s a lot of samples. Samples are very expensive, so is that too much or not enough?
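For the curious, here is a rough sketch of what laying out that grid amounts to. It assumes a perfectly square plot and one sample at the center of each grid square; the function name and the choice of cell centers are mine for illustration, not a field protocol.

```python
import math

SQ_FT_PER_ACRE = 43_560  # one acre in square feet


def grid_sample_points(acres=10.0, cells_per_side=10):
    """Return the center (x, y), in feet, of every square in the grid.

    Assumes a perfectly square plot (an assumption made for illustration).
    """
    side = math.sqrt(acres * SQ_FT_PER_ACRE)  # length of one side of the plot
    cell = side / cells_per_side              # length of one side of a grid square
    return [
        ((col + 0.5) * cell, (row + 0.5) * cell)
        for row in range(cells_per_side)
        for col in range(cells_per_side)
    ]


points = grid_sample_points()
print(len(points))  # 100 sample locations for a 10 X 10 grid
print(points[0])    # center of the first grid square
```

A square ten acre plot is 660 feet on a side, so each square in a 10 X 10 grid is 66 feet across. Doubling the grid to 20 X 20 quadruples the sample count, which is why the fineness of the grid is a cost question as much as a science question.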
Well that is where judgement comes in. In our example case, a geologist is going to look at the ten acres and see how much geological variation there is. The more variation, the more samples you will need to make sure you have checked all the different kinds of geological situations. Likewise, the ten acres will be reviewed by someone that knows how to look for external/surface signs of contamination (staining of the soil, dead vegetation and so forth). If there are some, you may want to sample where those indications exist rather than follow the grid rigorously since it is more likely the contamination is at those locations and you are trying to find contamination.
Now let’s analogize this to political sampling. How do the pollsters exercise judgement – like the geologist and chemist above? You hear a lot about “random” sampling. What random sampling is designed to do is almost the exact opposite of what the chemist is doing above. The chemist is looking for a specific result – is there contamination? Political pollsters do not want a specific result, so they use random sampling to exclude the possibility of there being any bias for a specific answer that they might be looking for. Rather than look for where things are likely to be contaminated, as I do when sampling dirt, they just take any ol’ sample in any ol’ place. But, as you can see, that is a problem from the standpoint of the geologist because you cannot promise in such a situation that you have sampled each geological type.
In polling, this is where “cross tabs” that you have heard so much about come into play. Cross tabs are demographic data – all those questions you are asked at the end of a survey about where you went to school, how much money you make and what kind of car you drive are kind of the same as collecting geological data. So in our example of ten acres, our actual sampling is going to end up looking kind of like our picture at the right here. However, with each sample we are going to collect a lot of geological data. The geologist is then going to compare the data you collect with what is known about the region generally, and he will then be able to determine if your sample represents the whole ten acres very well, or not so well. Thus someone reading the data can “weight” it accordingly. If the geologist determines that the samples are well correlated with the region generally (meaning most of the geological variances are accounted for), then one can rely on the data very heavily – the data represents the acres well. If the correlation is not so good (you ended up only sampling dirt of type X and got little or none of types A-F), then you must view the data received with “a grain of salt” – the data does not represent those acres very well.
Cross tabs are supposed to help the consumer of polling data weight the results. If the cross tabs match the country well, then the data is reliable; if the match is bad, then the data is not so reliable. But note something very important here – to perform the weighting, we have to know about the nation as a whole. How many African-Americans are there in the sample versus how many African-Americans are there in the nation? This is an important point that we will return to in a moment.
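To make the weighting idea concrete, here is a minimal sketch of the simplest version of it: each group is weighted by its share of the nation divided by its share of the sample. The group labels and every number below are hypothetical, made up purely for illustration, and real pollsters use more elaborate schemes than this.

```python
def weights_by_group(sample_counts, population_shares):
    """Weight for each group = population share / sample share.

    Every group must have at least one respondent, or the division fails.
    """
    total = sum(sample_counts.values())
    return {
        group: population_shares[group] / (count / total)
        for group, count in sample_counts.items()
    }


def weighted_support(sample_counts, support_rates, population_shares):
    """Re-weight per-group support so the sample mirrors the population."""
    weights = weights_by_group(sample_counts, population_shares)
    numerator = sum(
        sample_counts[g] * weights[g] * support_rates[g] for g in sample_counts
    )
    denominator = sum(sample_counts[g] * weights[g] for g in sample_counts)
    return numerator / denominator


# Hypothetical poll: group A is badly over-represented in the sample.
sample_counts = {"A": 800, "B": 200}
support_rates = {"A": 0.60, "B": 0.30}     # share backing some candidate
population_shares = {"A": 0.50, "B": 0.50}  # the background data we must already know

raw = sum(sample_counts[g] * support_rates[g] for g in sample_counts) / 1000
adjusted = weighted_support(sample_counts, support_rates, population_shares)
print(round(raw, 3), round(adjusted, 3))
```

Notice that the adjustment is only as good as `population_shares`. If that background data is stale or wrong, the weighted number is wrong too, no matter how carefully the sample was drawn.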
With geology it is fairly easy to get the data – it is a well-established set of characteristics that do not vary. Thus the regional geological data that one will compare to does not change over time. That is not true with political polling. Things like voter registration, party affiliation, income and social status, and many others change at a pretty rapid pace. So your weighting is only as good as your background data. In the analogy, should geological characteristics vary like political/demographic ones, then you would need a lot more samples just to make sure you had sufficient geological data about the region, and we are kind of back to the grid in the first picture.
But now let’s return to the statistics I opened the post with, and apply them to our ten acres. Now, my sampling map is going to look like the drawing on the left here – where I can take samples in only 8% of the total area I am supposed to represent. Normally, weighting would allow me to draw some conclusions about the data collected; however, since what we are really talking about is political polling, weighting won’t help much. Since I cannot collect the background data from the 92% of the population unavailable to me, I have no way of knowing just how reliable my data is.
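Here is the arithmetic behind that 8%, as a quick sketch. It assumes the two figures simply multiply, that is, that the 9% response rate applies evenly to the households a land-line poll can reach. That is my simplifying assumption, not something Pew or Barone stated.

```python
def reachable_share(response_rate, cell_only_share):
    """Share of the population a land-line poll can actually interview.

    Assumes the response rate applies uniformly to land-line households.
    """
    return response_rate * (1.0 - cell_only_share)


RESPONSE_RATE = 0.09  # usable interviews per call, per the Pew figure

with_barone_figure = reachable_share(RESPONSE_RATE, 0.10)  # 10% cell phone only
with_report_figure = reachable_share(RESPONSE_RATE, 0.25)  # 25% cell phone only

print(f"{with_barone_figure:.1%} reachable using the 10% figure")
print(f"{with_report_figure:.1%} reachable using the 25% figure")
```

Using the 25% cell-phone-only figure instead of Barone’s 10% pushes the reachable share below 7%, so the 8% used throughout this post is the generous number.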
In polling it is these considerations that result in the “margin of error” that you hear about. But in this instance that is a bit misleading. Such margins are determined strictly by sample size (3 samples out of 1,000, let’s say) and not by the weighting factors that we have been discussing. For more complex statistical situations there are other measures, like the “confidence interval.” But note this:
Certain factors may affect the confidence interval size including size of sample, level of confidence, and population variability. A larger sample size normally will lead to a better estimate of the population parameter.
And note that when only 8% of the population is available to us, we cannot even adequately measure the “population variability,” and therefore cannot even get a good handle on more sophisticated means of statistical evaluation of the data quality.
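To see why the reported margin of error says nothing about response rates, here is a sketch of the usual textbook formula for a proportion from a simple random sample. The formula is the standard one, not something taken from any particular pollster; the 95% confidence level (z = 1.96) and the worst-case 50/50 split are my assumptions.

```python
import math


def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Textbook margin of error for a proportion from a simple random sample.

    z=1.96 corresponds to roughly 95% confidence; proportion=0.5 is the
    worst case. Nothing here depends on how many people refused to answer.
    """
    return z * math.sqrt(proportion * (1.0 - proportion) / sample_size)


for n in (500, 1000, 2000):
    print(n, f"{margin_of_error(n):.1%}")
```

The only input that matters is the number of completed interviews. A poll that reached 1,000 people out of 1,100 called and a poll that reached 1,000 out of 11,000 called would report the same margin.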
So let me bottom line this for you. In a world where the vast majority (92%) of people self-select out of polling data, there are no meaningful conclusions that can be drawn from polling data, at least not given the state of the science as it exists today. Any accuracy that does come out of the polls is a result almost entirely of some extraordinarily shrewd judgement on the part of the pollsters. But on the shifting sands of political opinion this year’s shrewd judgement can very often be next year’s total misread of the situation. In other words, these incredibly low response rates mean that even if they precisely characterize the 8% that is available to them, something could be going on out there “in the black” that they simply know nothing about. (Then there is the factor of bias….)
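For a sense of how much room there is “in the black,” here is a small worst-case sketch. It assumes we know nothing at all about the people we could not reach, so they could all lean one way or all lean the other. That is the most pessimistic assumption possible, not a claim about how non-respondents actually behave, and the 50% observed figure is hypothetical.

```python
def worst_case_bounds(observed_rate, coverage):
    """Range of possible true support when the unreached are a total unknown.

    observed_rate: support measured among the people actually reached.
    coverage: share of the population that was reachable (0.08 here).
    """
    lower = coverage * observed_rate                     # nobody unreached agrees
    upper = coverage * observed_rate + (1.0 - coverage)  # everybody unreached agrees
    return lower, upper


low, high = worst_case_bounds(observed_rate=0.50, coverage=0.08)
print(f"true support could be anywhere from {low:.0%} to {high:.0%}")
```

Nobody believes the real range is that wide, but narrowing it takes an assumption that the unreached resemble the reached, and that assumption is exactly the pollster’s judgement call.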
Polling has always been a bit of an art form. But the ever-decreasing response rates have reduced it to nothing more than just another way to express an opinion that may or may not have any real correlation to reality.
I expect some very big surprises in November.