The Indian summer monsoon is governed by an intrinsically unpredictable component that cannot be captured by statistical methods.
Do we understand the monsoon as a meteorological system? Do we have a good idea of its underlying physical processes at all? Increasingly, the answer seems to be no. Even the key forces believed to create the monsoon mechanism, such as the differential heating of the land in the Tibetan plateau and the oceans surrounding the subcontinent, may not be as important as they are thought to be.
This is surprising considering the accumulated data of nearly 120 years and the level of research worldwide that has gone into gleaning the causative variables that drive the monsoon. Given the unfailing regularity with which the summer monsoon visits the subcontinent and the fact that seven out of 10 times the rainfall over the country as a whole is normal, the phenomenon seems to be a fairly stable one.
The intractable complexity of the monsoon system was brought home once again when the India Meteorological Department (IMD) issued its annual forecast for the total Indian Summer Monsoon Rainfall (ISMR) for 2007 - from June 1 to September 30 - on April 19. Stung by the failed forecast of 2004, when another drought struck the country after the IMD predicted a normal monsoon, the department sought to improve its long-range forecast model with "a new set of predictor variables, a new model development method and new statistical tools".
According to the IMD, the new long-range forecast method was developed in 2005 in consultation with the scientific community, including atmospheric scientists and experts in statistics from the Indian Institute of Science (IISc) and the Indian Statistical Institute (ISI), Bangalore. It was also published in 2006 in the highly regarded and peer-reviewed journal Climate Dynamics. Experimental forecasts using the new model were made for the 2005 and 2006 monsoons and these proved correct. It should, however, be noted that while the current 2007 forecast has used eight parameters, nine parameters were discussed in the published paper. The ninth parameter (North West Europe Surface Pressure) could not be used for the 2007 forecast because of poor data, according to M. Rajeevan, Additional Director-General of Meteorology (Research), IMD, Pune, who is one of the brains behind the model.
This is the third time in the last eight years that the number and nature of the parameters in the long-range forecast model have been changed. The IMD began its operational forecasts in 1988 using a 16-parameter `power regression model'. This used six regional and 10 global land, atmosphere and ocean variables, which were chosen for the observed correlations between the predictand (the Indian summer monsoon rainfall) and the predictors (data available from the turn of the 20th century).
The use of both regional and global variables seemed reasonable - while regional factors had the greatest influence on the monsoon, global variables were also significant because of what in meteorological parlance is known as `teleconnection'. The most important of these global variables are associated with the Pacific Ocean, in particular El Nino - the unusual warming of waters in East Pacific off the Peruvian coast from the preceding winter up to June - and its inverse, La Nina.
Despite the run of successes the 1988 model had for 12 consecutive years - its prediction of `normal' monsoons turned out right every time - the 16-parameter model had problems predicting the quantity of rain (see Frontline, July 8, 2000, and November 9, 2001).
In the context of the Indian monsoon, predicting a normal monsoon does not say much about the merits of any statistical model because nearly three-fourths of the time the monsoon rainfall is normal. But in quantitative terms, errors greater than the model error of plus or minus 4 per cent occurred as many as nine times using the 1988 model. More pertinently, there was also widespread criticism of the model's methodology. The use of so many parameters was problematic because each one might be influenced by the others and therefore could not be considered an independent parameter. Further, there could be a problem of overfitting because the parameters were identified by finding correlations in the limited data sample of past years. As a result, the model would have limited predictive power.
Faced with these criticisms, the IMD replaced four parameters that had declining correlations with the rainfall. One of the regional parameters dropped as a result of this exercise was the latitude of the high-pressure ridge where it intersects longitude 75 E at the 500 millibar level (at a height of 6 km). After its discovery in 1978, this parameter emerged as the most important single variable for monsoon forecast in many statistical models. The ridge governs the mid-tropospheric (at a height of 6-7 km) wind circulation pattern - a southward displacement of the ridge compared to its normal position in April (at about 15 N) was considered unfavourable for a good monsoon and a northward displacement favourable.
But those who constructed the model found that the influence of this parameter was weakening over the years. Time series analyses of strengths of correlations (correlation coefficients, or CCs) showed that while the influence of some parameters was stable over the years, that of others was unstable. Indeed, according to Rajeevan, the CC of the 500 mb ridge parameter had, from a strongly positive value in the past, become slightly negative. Therefore, the IMD altered four of its parameters in 2000. The model, however, still included six regional parameters, though some were different from the original.
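The kind of time series analysis described above can be illustrated with a sliding-window correlation. What follows is only a minimal sketch in Python: the data are invented, the window length is arbitrary, and the function names (`pearson`, `sliding_cc`) are hypothetical. It is not the IMD's actual procedure.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def sliding_cc(predictor, rainfall, window):
    """CC between predictor and rainfall over each run of `window` years."""
    last_start = len(predictor) - window
    return [
        pearson(predictor[i:i + window], rainfall[i:i + window])
        for i in range(last_start + 1)
    ]

# Invented data (assumption): rainfall follows the predictor in the
# first five "years" and moves against it in the last five.
predictor = [1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0]
rainfall = [2.0, 4.0, 6.0, 8.0, 10.0, 10.0, 8.0, 6.0, 4.0, 2.0]

ccs = sliding_cc(predictor, rainfall, window=5)
for start, cc in enumerate(ccs):
    print(f"window starting at year {start}: CC = {cc:+.2f}")
```

In this toy series the CC starts strongly positive and ends strongly negative, which is the kind of drift that would mark a predictor as unstable.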
Statistical models are based on the assumption that the correlation between a predictor and the predictand will be sustained into the future. However, predictors have been found to display variations in these correlations every 10 years. These variations are probably linked to changes in the regional and global circulation patterns. For instance, in the past 30-40 years it has become clear that the El Nino Southern Oscillation has had a significant impact on the monsoon, particularly through the correlation between deficient monsoon rainfall and El Nino events. This suggests a strong coupling of land, ocean and atmospheric variables. It is not clear why this phenomenon was not dominant in earlier years.
Meteorologists now debate whether the influence of El Nino has declined. While some believe that it has weakened - particularly since a normal monsoon occurred after the strongest El Nino event of the century in 1997 - others, including Sulochana Gadgil and other scientists at the IISc, believe that extreme events can be satisfactorily explained if one considers the combined effect of El Nino (in the Central Pacific) and the so-called Indian Ocean Dipole (IOD), which refers to the contrast in sea-surface temperature anomalies between the western and eastern equatorial Indian Ocean.
After the drought of 2002, when 19 per cent of the Indian monsoon rain never fell, the IMD was forced to develop a new model. According to the IMD, at least eight to 10 parameters are required to explain 70-75 per cent of the variance in the past rainfall data used to extract the rainfall parameters and to limit errors in forecast to a minimum. Detailed time series analyses of the strengths of correlations showed that six April-May parameters and four winter-spring parameters were weakening significantly; these 10 parameters were dropped from the original 16. Further, through extensive data analysis, the IMD developed four new stable and physically related parameters - the new set of 10 parameters included only three regional parameters.
The new model proved successful in 2003. But the 2004 forecast was off the mark by a long way; what was forecast as a normal monsoon turned out to be a drought with a total rainfall deficit of 13 per cent. The IMD reanalysed the model and adopted an entirely new approach to identifying suitable and stable predictors. This exercise was completed in 2005, and after successful experimental runs in 2005 and 2006, the model was used to forecast the monsoon this year. In all, nine parameters were identified and eight of them were used for the 2007 forecast.
From 2003, the IMD adopted a two-stage forecast strategy. The first forecast (based on eight parameters) was issued in April, and an update was given in June using two additional parameters to adjust for developments in key variables such as El Nino in June. Thus, rainfall in July, which accounts for nearly 50 per cent of the seasonal rainfall, could be forecast more accurately. Following the same strategy, the present model uses one set of five parameters for the April forecast and another set for the June forecast, with three common ones in both sets.
Given that the forecasts for 2005 and 2006 (based on the 10-parameter 2003 model) were fairly correct, why did the IMD change the model for 2007? Perhaps it was greater confidence in a model developed using rigorous statistical methods; it was the first time an ensemble method was used to identify the optimum subset of predictors. In the ensemble approach, all possible subsets of parameters have to be considered to develop a model. For five parameters, the number of possible non-empty subsets, and hence of possible models, works out to 31. Five of these were selected on the basis of the correctness of their forecasts over the period 1981-2004.
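The count of 31 follows from simple combinatorics: five parameters can be combined into 2^5 - 1 = 31 non-empty subsets. The short Python sketch below lists them and then keeps the five with the smallest error. The parameter labels and the error function are placeholders invented for illustration; the article does not say how the IMD scored its candidate models.

```python
from itertools import combinations

# Hypothetical labels standing in for the five April predictors.
predictors = ["P1", "P2", "P3", "P4", "P5"]

def all_subsets(names):
    """Every non-empty subset of `names`; each one defines a candidate model."""
    subsets = []
    for size in range(1, len(names) + 1):
        subsets.extend(combinations(names, size))
    return subsets

def select_best(models, error, k=5):
    """Keep the k candidate models with the smallest error."""
    return sorted(models, key=error)[:k]

candidate_models = all_subsets(predictors)
print(len(candidate_models))  # 31, i.e. 2**5 - 1

# Placeholder error (assumption): a real score would compare each
# model's hindcasts with observed rainfall over a training period.
def toy_error(subset):
    return abs(len(subset) - 3)

best_five = select_best(candidate_models, toy_error)
print(best_five)
```

With a real error measure, such as hindcast error over a fixed period of past years, the same two steps would give the short list of models that go into the final forecast.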
The final forecast method was based on a weighted average of the five predictions, with the weight depending on the correctness of the forecast. According to the developers of the model, the final method reproduced excess and drought years fairly well. While the magnitude of the drought of 2002 could not be reproduced even in this model, the sign of departure from the long-period average was reproduced accurately. The most curious aspect of the new long-range forecast model, however, is that it has done away with regional parameters altogether.
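A weighted average of this kind can be sketched in a few lines of Python. The weighting rule used here, the inverse of each model's past error, is an assumption made for illustration; the article says only that the weight depends on the correctness of the forecast. The numbers are invented, and `weighted_forecast` is a hypothetical name.

```python
def weighted_forecast(predictions, past_errors):
    """Weighted mean of model predictions.

    Assumption: each weight is the inverse of that model's past error,
    so models that were more accurate in the past count for more.
    """
    weights = [1.0 / err for err in past_errors]
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, predictions)) / total

# Invented example: five model forecasts, as a percentage of the
# long-period average, with each model's past error in the same unit.
predictions = [96.0, 98.0, 100.0, 94.0, 102.0]
past_errors = [2.0, 4.0, 4.0, 8.0, 8.0]

final = weighted_forecast(predictions, past_errors)
print(f"ensemble forecast: {final:.1f} per cent of the long-period average")
```

Because the first model has the smallest past error, the combined figure sits closer to its forecast than a plain average of the five would.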
Statistical models based on empirically derived predictor-predictand relations are expected to give a clue to, if not capture, the underlying driving mechanism of the monsoon. While the monsoon is a complex interplay of several factors, what is really intriguing is that, even though some factors weaken and new ones come into play, the combined effect of key parameters is somehow still the same. Thus, we have a stable meteorological system. It is, however, hard to accept that regional parameters are not of use at all and that the new parameters have no physical relationship with the regional parameters that have been replaced. If the statistical relationship matches the underlying physics, this would mean that regional influences have lost their grip on the monsoon completely and the monsoon has become truly globalised - a phenomenon that is hard to believe.
According to Rajeevan, regional parameters are no longer relevant predictors because they are linked to El Nino, whose influence on the monsoon is considered to be declining. However, if the impact of El Nino is still significant, as some believe, this is not a good enough explanation. So what are the systems that drive the monsoon?
"Statistical models are not based on physics. They are based only on some empirical correlations, which may have nothing to do with the actual physical phenomena. So I have little faith in them," says J. Srinivasan of the IISc. "Moreover, they can model only the average behaviour well and not the extremes of floods and droughts. That requires a proper understanding of the underlying physics. Unfortunately, despite the hope one had 20 years ago, we are still far from developing a satisfactory dynamic model to make predictions because we lack sufficient data from the ocean depths. The oceans have long memories compared with the land or the atmosphere and they hold the key to monsoon dynamics."
None of the existing approaches that use regional or global circulation models has been able to explain the inter-annual variability of the monsoon satisfactorily. Indeed, the predictability of the monsoon is limited by the fact that the mean monsoon circulation may be governed by an intrinsically unpredictable component - some dynamic processes in the Indian summer monsoon area are chaotic when compared with other tropical regions.
In such a situation, given the onerous responsibility of issuing an official forecast year after year, the best the IMD can do is to keep fine-tuning its statistical models without worrying about the real mechanisms that drive the monsoon and be correct 70 per cent of the time.
To be fair to IMD scientists, they seem to have done their best to develop a new model without any serious methodological flaws. Any extreme monsoon failure, which even such models cannot be expected to foretell, should be seen as part of the inherent inter-annual variability of the monsoon and not as an occasion to change the model once again.