It is election time and this time, more than ever before, we have had a large number of opinion polls going on the air on various channels. I, together with the Centre for the Study of Developing Societies (CSDS) and CNN-IBN, have already done three polls and will do a fourth one. There have been demands for banning opinion polls and then we have also had a sting operation.

Leaving controversies aside, let me try and discuss the broader questions involved in opinion polls, namely: 1) How can the opinion of a very small fraction of the electorate give the psephologist any insight into the mood of the nation? 2) How do we convert votes to seats? 3) Can opinion polls done weeks ahead of polling day accurately predict what the outcome will be? 4) Do opinion polls have any feedback effect, namely, do they influence voting behaviour?

As for the first question, consider a teacher who walks into her class and tells the students that she has a box with 1,000 chits of paper, all identical, each carrying the name Sachin or Sourav, with 99 per cent of them having the name of her favourite cricketer. She then invites one student to come forward, shake the box, draw one chit, unfold the same and read out the name on the chit. It turns out to be Sourav. She then asks the class to guess who her favourite cricketer is, Sachin or Sourav. She adds that those who guess correctly will get three extra days to submit the next assignment, while those who get it wrong have no penalty.

If you were one of the students, what would your answer be? I have yet to come across someone who would say the answer is Sachin. This is common sense and that is all that is required to understand opinion poll-based projections. One can argue, with 99 per cent confidence, that the answer will be correct. What if only 90 per cent of the chits had the name of the teacher’s favourite player? Then, the confidence level would be only 90 per cent. Can we improve the accuracy? If the teacher allows five students to come and draw one chit each and read out the name, the students can go with the name that appears on three or more of the chits. And they will again have over 99 per cent confidence. You can ask any class 12 mathematics student to do this calculation. And the answer will be the same if the total number is 10,000 or even one lakh instead of 1,000.
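The arithmetic behind the five-chit figure can be checked with a short calculation. This is a minimal sketch that assumes each draw is independent with the same chance of showing the favourite's name, which is a close approximation when five chits are drawn from a box of 1,000. The function name `majority_confidence` is made up for illustration.

```python
from math import comb


def majority_confidence(n_draws: int, p: float) -> float:
    """Chance that the favourite's name shows up on a strict majority of draws.

    Assumes independent draws, each showing the favourite with probability p.
    """
    need = n_draws // 2 + 1  # e.g. 3 out of 5
    return sum(
        comb(n_draws, k) * p**k * (1 - p) ** (n_draws - k)
        for k in range(need, n_draws + 1)
    )


one_chit = majority_confidence(1, 0.90)    # a single draw: just p itself
five_chits = majority_confidence(5, 0.90)  # go with the name on 3 or more chits
print(f"one chit: {one_chit:.4f}, five chits: {five_chits:.4f}")
```

With 90 per cent of the chits carrying the favourite's name, the five-chit rule comes out at about 0.9914, in line with the "over 99 per cent" figure above. The size of the box does not enter the formula, which is why the answer barely changes for 10,000 or one lakh chits.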

Now, consider a scenario where the total number of chits is over 10 lakh but only 52 per cent of them have the name of the teacher’s favourite player. If the students are allowed to draw 4,001 chits and if they go with whichever name appears in 2,001 or more chits, once again they can guess with over 99 per cent confidence.
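The same calculation can be repeated for 4,001 draws. The sketch below again assumes independent draws (reasonable when the sample is tiny compared with 10 lakh chits) and sums the binomial probabilities in log space, since the individual terms are too small for ordinary floating-point multiplication. The function name `majority_probability` is illustrative only.

```python
from math import exp, lgamma, log


def majority_probability(n: int, p: float) -> float:
    """P(a name held by share p appears on more than half of n independent draws)."""
    need = n // 2 + 1  # 2,001 when n is 4,001
    total = 0.0
    for k in range(need, n + 1):
        # log of C(n, k) * p^k * (1 - p)^(n - k), computed without underflow
        log_pmf = (
            lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p)
        )
        total += exp(log_pmf)
    return total


right = majority_probability(4001, 0.52)  # favourite's name wins the sample
wrong = majority_probability(4001, 0.48)  # the other name wins the sample
print(f"correct guess: {right:.4f}, wrong guess: {wrong:.4f}")
```

Under these assumptions the correct name wins the sample a little over 99 per cent of the time, and the other name wins less than 1 per cent of the time.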

**Predicting elections**

In a constituency with 10 lakh voters, let us imagine that we write out all possible lists of voters, with each list having 4,001 names. Suppose there are only two candidates, with one of them having 52 per cent support and the other 48 per cent. Suppose each list is written on magic paper that turns red when dipped in water if 2,001 or more voters on that list support candidate A and turns blue if 2,001 or more support candidate B. We can compute mathematically the percentage of lists that would turn red or blue when dipped in water. It turns out that if A has 52 per cent support, then over 99 per cent of the lists would turn red and, since only 48 per cent support B, less than 1 per cent of the lists would turn blue. Thus, if we can mix these lists well and draw one randomly, get the opinion of the people on that list (called a sample) and determine who has the majority among the voters in the sample (like dipping the magic paper in water!), then we can predict, with 99 per cent confidence, the candidate with majority support as the winner.

Indeed, if there are more than two candidates, as long as the gap between the winner and the candidate coming second is at least 4 per cent, we can determine the winner with 99 per cent confidence. With a 4,001 sample size, one can show that the percentage of support for the candidate in the whole population and the percentage of support in the sample differ by less than 2 per cent with 99 per cent probability if the sample is chosen randomly.
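The link between sample size and error can be sketched with the usual normal approximation for a simple random sample. This is an illustration under stated assumptions, not the survey's actual computation: it takes the worst case of 50 per cent support, ignores any design effect from multistage sampling, and the function name `margin_of_error` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n: int, confidence: float = 0.99, p: float = 0.5) -> float:
    """Approximate half-width of the confidence interval for a vote share.

    Assumes a simple random sample of size n and a true share p
    (p = 0.5 gives the widest interval).
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return z * sqrt(p * (1 - p) / n)


for n in (4001, 16001):
    print(f"n = {n}: about +/- {100 * margin_of_error(n):.2f} percentage points")
```

The approximation gives roughly 2 percentage points for 4,001 respondents and roughly 1 for 16,001. Because the error shrinks with the square root of the sample size, halving it takes four times as many respondents.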

Thus, coming to the context of the Lok Sabha polls, if the aim was to estimate the vote shares for the major parties at the national level, a survey with randomly chosen 4,001 voters nationwide would give an estimate that is within the true value, plus or minus 2 per cent. And a sample with 16,001 voters nationwide would give an estimate that is within the true value, plus or minus 1 per cent. However, the public and the media are interested in the prediction of seats and not votes. In other words, they are interested in the prediction of the composition of Parliament—which party/alliance, if any, will get a majority. If no one is getting a majority, then which will be the leading party/alliance and how far is it from the majority mark?

There is no magic formula that will convert vote shares into seats. The seats depend on how the votes for the party (and for the other parties) are distributed in a State. Now, if we were to choose a sample of 4,001 voters randomly in each of the 543 constituencies in the country, then we can be sure that in most of them (at least in the constituencies where the contest is not very close), we will be able to predict the winner. But this translates into a sample size of over 21 lakh. We simply do not have the resources to conduct a survey on this scale since both money and trained and reliable manpower are in short supply.

So, to do a nationwide poll, one has to use other methods. The methods used in the United States cannot be used here as our parliamentary system is different from theirs. Though our parliamentary system is similar to the one in the United Kingdom, we cannot use the methods from the U.K. as the socio-economic profiles of voters in each polling booth are not available in India (they are not available even at the constituency level). These data are only available at the district level since Census data are organised by districts. Also, unlike in the U.K., public opinion is very volatile in India. When parties change alliances and leaders hop parties, how can we expect voters to be loyal to parties? These two factors ruled out using the U.K. model in India and we had to develop the model afresh. We can build a model of voting behaviour that incorporates the socio-economic profile of a voter. Given the Indian reality, we would need a separate model for each State because voters with similar socio-economic profiles seem to vote differently even in neighbouring States such as Tamil Nadu and Karnataka or Uttar Pradesh and Bihar. So we would need a different set of parameters for each State. This would make the number of parameters very large and to estimate these we would need large sample sizes. Moreover, as mentioned earlier, socio-economic profiles of constituencies are also not available.

**Simplified model**

So we use a crude model that assumes that the swing for a party is constant all over a State. Swing is the change in the vote share of a party from the last election to the current one. With this model, we need to estimate the vote shares of the major parties (or alliances) in a State via opinion polls and then use past data to come up with an estimate of the vote shares of these parties in each constituency. In order to get a good estimate of the vote shares, we do a multistage probability-proportional-to-size circular sampling: in the first stage we select the constituencies, in the second stage we select polling booths and in the third stage we select respondents from the voters list randomly, and the enumerators have to go door to door and meet the selected voters and get their opinion. This methodology has generated fairly representative samples over the years (as seen by comparing the State-level sample profile on socio-economic variables with the population profile available from Census data). In order to convert these vote estimates into seat shares, we need to take into account the difference in vote estimates for the two leading parties. If the difference is large, we can be reasonably sure that the party leading will also win. What if the estimates are not too far apart? Or if a third party also has a vote share very close to those of the first two?
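The uniform-swing idea can be illustrated with a toy calculation. The sketch below is a simplification under stated assumptions: the party names, seat names and all vote shares are invented, the swing is applied additively in percentage points, and a seat is simply credited to whichever party leads the projection. The function names are hypothetical.

```python
from collections import Counter


def uniform_swing_projection(prev_state, est_state, prev_by_seat):
    """Add each party's State-level swing to its previous share in every seat."""
    swing = {party: est_state[party] - prev_state[party] for party in est_state}
    projected = {}
    for seat, shares in prev_by_seat.items():
        projected[seat] = {
            # a share cannot fall below zero, so clip after adding the swing
            party: max(0.0, shares.get(party, 0.0) + swing[party])
            for party in swing
        }
    return projected


def tally_leaders(projected):
    """Count seats by the party with the highest projected share."""
    return Counter(max(shares, key=shares.get) for shares in projected.values())


# Invented figures, in per cent, purely for illustration.
prev_state = {"A": 40.0, "B": 45.0}   # last election, whole State
est_state = {"A": 44.0, "B": 42.0}    # current opinion poll, whole State
prev_by_seat = {
    "seat1": {"A": 38.0, "B": 48.0},
    "seat2": {"A": 42.0, "B": 44.0},
    "seat3": {"A": 45.0, "B": 40.0},
}

projected = uniform_swing_projection(prev_state, est_state, prev_by_seat)
seats = tally_leaders(projected)
print(projected)
print(dict(seats))
```

Here party A gains 4 points and party B loses 3 everywhere, which flips the second seat. Crediting each seat to the projected leader is the crudest possible conversion; it says nothing about how safe a narrow lead is, which is the question raised at the end of the paragraph above.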

I have developed a model for calculating the probability of a win based on vote shares: the higher the gap, the higher the probability of a win. It has the effect of giving the best chance consistent with the difference in vote shares to the candidate coming third, then to the second.
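The article does not spell out this model, so the following is not the author's method. It is only a generic sketch of how a gap in estimated vote shares might be turned into a win probability: it assumes a two-way contest, treats the error in the estimated gap as normally distributed, and uses an arbitrary standard deviation of 2 percentage points. All names and numbers are hypothetical.

```python
from statistics import NormalDist


def win_probability(gap_points: float, sigma_points: float = 2.0) -> float:
    """Chance that the party ahead by gap_points in the estimate is truly ahead.

    Assumes the error in the estimated gap is normal with standard
    deviation sigma_points (an illustrative value, not a fitted one).
    """
    return NormalDist(mu=0.0, sigma=sigma_points).cdf(gap_points)


def expected_seats(gaps, sigma_points: float = 2.0) -> float:
    """Sum of win probabilities across seats, instead of counting leaders."""
    return sum(win_probability(gap, sigma_points) for gap in gaps)


# Invented gaps (party's estimated share minus its nearest rival's), in points.
gaps = [6.0, 1.0, -0.5, -8.0]
for gap in gaps:
    print(f"gap {gap:+.1f}: win probability {win_probability(gap):.2f}")
print(f"expected seats: {expected_seats(gaps):.2f}")
```

Summing probabilities rather than counting leaders means a seat with a one-point lead contributes a little over half a seat instead of a full one, which is the general behaviour described above: the higher the gap, the higher the probability of a win.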

I do not want to go into the details since they are technical in nature. If our objective is to predict the seat share at the national level, this model works fine. I have done what is called back-testing the model: using data from the 2004 general election, we tried to calculate the vote shares of the major parties in each State and then compared them with the actual results of the 2009 election. While the results at the national level were good, the results at the State level were quite off the mark, but the errors cancelled out mostly.

A similar model for State Assembly elections has been used several times and we have a good track record when we compare our predictions and the results.

Coming to the third question, I strongly believe (based on surveys we have done) that the predictive power of an opinion poll done weeks ahead of the polls is rather poor. For one, there is a huge volatility in public opinion as elections come nearer. This was observed by us in 1998 and in our series of tracker polls. Moreover, the pre-election opinion poll samples the opinion of the entire set of registered voters, but what counts for results is the subset of those who go and vote. And this percentage is usually between 55 and 65. These two observations make any projection made weeks ahead highly suspect. What an opinion poll can do at best is to gauge the mood of the nation at the time the data were collected.

Both these issues are addressed by the exit poll, where the voters are interviewed as they exit the booth. However, in an exit poll, it is difficult, or rather impossible, to choose respondents from a pre-chosen list. We have to leave the choice of respondents to the enumerator and the few times when we tried this, we got sample profiles which were very different from the population profile. The attribute where we saw maximum distortion was the gender.

So, we have stopped conducting exit polls and instead, given the multiphase elections in India, we conduct a door-to-door, proper, methodological poll two or three days after the elections are held in a particular constituency. On the last day of the poll, we go on the air with our findings for all the areas/States that had gone to poll earlier. For the constituencies that had elections on the last day, we publish our findings after two or three days.

Coming to the last question, about the demand to ban opinion polls, I believe that opinion polls do have a feedback effect on the electorate. However, various other things that happen during the campaigning have an effect too. So, instead of leaving it to political analysts to say who is winning, opinion polls give a scientific basis for measuring public opinion. Of course, it is a tool and its use or misuse depends on the user. I do not think that banning opinion polls is the solution.

*Rajeeva Karandikar, a mathematician, statistician and psephologist, is Director of the Chennai Mathematical Institute, Chennai. This article is based on a research paper written for The Hindu Centre for Politics and Public Policy.*
