Can We Trust Opinion Polls?

Introduction

The election season is here in my home state of Tamil Nadu (India). Apart from the intense campaigning, mudslinging and catchy party propaganda songs, the election season is also a time for statistics. Many leading survey agencies have hit the road, aiming to understand the ‘pulse of the people’ and to determine which political party or leader has the people’s trust.

The failure of polls to anticipate Trump’s victory in 2016 has been the most criticised polling miss in the recent past.

People Land

People Land is a parliamentary democracy, where the political party with the largest number of members in parliament forms the government. There are 100 members of parliament representing the 100 constituencies in the country. Each constituency has a voter population of 100,000. There are five political parties in the country.
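To make the setup concrete, here is a minimal sketch of how seats could be tallied in People Land. The party labels and the first-past-the-post rule within each constituency are my assumptions for illustration; the description above only fixes the number of constituencies, voters and parties.

```python
from collections import Counter

# Figures taken from the description of People Land.
N_CONSTITUENCIES = 100
VOTERS_PER_CONSTITUENCY = 100_000
# Hypothetical party labels; the text only says there are five parties.
PARTIES = ["A", "B", "C", "D", "E"]


def constituency_winner(votes):
    """Return the party with the most votes in one constituency.

    Assumes a first-past-the-post rule (an assumption of this sketch).
    Ties go to the party listed first in the dict.
    """
    return max(votes, key=votes.get)


def governing_party(winners):
    """Return (party with the most seats, seat tally) for a list of seat winners."""
    seats = Counter(winners)
    return seats.most_common(1)[0][0], seats


if __name__ == "__main__":
    # Made-up seat outcome: 40 seats for A, 35 for B, 25 for C.
    winners = ["A"] * 40 + ["B"] * 35 + ["C"] * 25
    party, seats = governing_party(winners)
    print(party, dict(seats))
```

Note that the governing party only needs the largest number of seats, not a majority, which is why the made-up example with 40 seats still produces a government.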

The Voting Pattern of People Land

Let us begin by exploring how citizens decide who to vote for in People Land. In the simplest of scenarios (call this Scenario 1), let us assume that people vote solely based on their income and age.
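One way to picture Scenario 1 is a small ‘vote calculator’ that turns income and age into a score and then looks up the party for that score. The sketch below is only illustrative: the equal weights, the score cutoffs and the party labels are hypothetical and are not the values used in the article.

```python
import bisect

# Hypothetical party labels, ordered from lowest to highest score band.
PARTIES = ["A", "B", "C", "D", "E"]
# Hypothetical cutoffs splitting the score line into five bands.
CUTOFFS = [-1.5, -0.5, 0.5, 1.5]


def voter_score(income, age, w_income=0.5, w_age=0.5):
    """Combine standardised income and age into one score.

    The equal weights are placeholders, not the article's values.
    """
    return w_income * income + w_age * age


def preferred_party(score):
    """Map a score to the party whose band contains it."""
    return PARTIES[bisect.bisect_right(CUTOFFS, score)]


if __name__ == "__main__":
    for income, age in [(-4.0, -2.0), (0.0, 0.0), (3.0, 3.0)]:
        score = voter_score(income, age)
        print(income, age, score, preferred_party(score))
```

Adding more inputs to `voter_score` (education, for example) is one way to move from a simple scenario towards a more complex one.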

The Voter Score Chart

How do Opinion Polls Perform in the Two Scenarios?

Naturally, we would expect opinion polls to work better in the first scenario, where fewer factors affect people’s choices, than in the second.

Accuracy of Opinion Polls in the Two Scenarios
Voter Score Distribution in the Two Scenarios
Popular Votes Cast in Scenario 1
Popular Votes Cast in Scenario 2

The Takeaways

Even in a simple case with made-up data, where people choose their preferred party based on their score from a ‘vote calculator’, we find that a sample size that worked perfectly well in one scenario fails in another. Hence, the success of polls is not merely a function of the sample size but also of the underlying homogeneity or heterogeneity in people’s opinions.
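The point about sample size can be illustrated with a small simulation. This is a simplified sketch and not the simulation behind the charts above: the vote shares are made up, the party labels are hypothetical, and voters are drawn with replacement, which is a reasonable approximation when the sample is small relative to 100,000 voters.

```python
import random
from collections import Counter

PARTIES = ["A", "B", "C", "D", "E"]  # hypothetical labels

# Made-up vote shares for a single constituency.
LOPSIDED = [0.60, 0.20, 0.10, 0.05, 0.05]  # opinions cluster around one party
CLOSE = [0.22, 0.21, 0.20, 0.19, 0.18]     # opinions spread across all parties


def poll_hit_rate(true_shares, sample_size, trials=200, seed=0):
    """Fraction of simulated polls that name the true winner."""
    rng = random.Random(seed)
    true_winner = PARTIES[true_shares.index(max(true_shares))]
    hits = 0
    for _ in range(trials):
        # Draw respondents with replacement according to the true shares.
        sample = rng.choices(PARTIES, weights=true_shares, k=sample_size)
        predicted = Counter(sample).most_common(1)[0][0]
        hits += predicted == true_winner
    return hits / trials


if __name__ == "__main__":
    for name, shares in [("lopsided", LOPSIDED), ("close", CLOSE)]:
        print(name, poll_hit_rate(shares, sample_size=500))
```

With the same 500 respondents, the poll should call the lopsided contest almost every time and miss the close contest far more often, even though nothing about the polling method has changed.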

Footnotes (Technical Details)

[1] I generate the average age for constituencies from a normal distribution with mean zero and variance two. I generate the average years of education and average income for constituencies from a bivariate normal distribution with mean (0, 0) and the following covariance matrix,
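A rough sketch of this data-generating step, using only the Python standard library, might look like the following. The covariance matrix is not reproduced in this text, so the sketch substitutes a placeholder with unit variances and a hypothetical correlation `rho`; those values are assumptions, not the ones used for the article. The function name is likewise hypothetical.

```python
import math
import random


def constituency_features(n=100, rho=0.5, seed=0):
    """Return n tuples of (average age, average education, average income).

    Age is drawn from a normal with mean 0 and variance 2, as in the footnote.
    Education and income are drawn from a bivariate normal with mean (0, 0)
    and the PLACEHOLDER covariance [[1, rho], [rho, 1]]; the real matrix is
    not available here, so rho is purely an assumption.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Variance 2 means a standard deviation of sqrt(2).
        age = rng.gauss(0.0, math.sqrt(2.0))
        # Build correlated draws from two independent standard normals.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        education = z1
        income = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2
        rows.append((age, education, income))
    return rows


if __name__ == "__main__":
    rows = constituency_features()
    print(len(rows), rows[0])
```

With only 100 constituencies the sample variance and correlation will wander around their targets; drawing many more rows is an easy way to check that the generator behaves as intended.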

Applied Econometrician (M.A. from University of Chicago)