Flaws in a forecast model

The India Meteorological Department's monsoon forecast model, which is based on 16 parameters in an empirical relationship with the total quantum of monsoon rainfall over the entire country, is flawed in certain respects.

R. RAMACHANDRAN

FOR the 12th consecutive year, the India Meteorological Department (IMD) has issued its operational Long Range Forecast (LRF), more than 10 days ahead, for a normal monsoon. A normal monsoon, in the IMD's parlance, is one in which the total rainfall for the country as a whole between June 1 and September 30, the four months of the southwest monsoon, is within 10 per cent of the Long Term Average (LTA) rainfall. The LTA is taken by the IMD to be 88 cm, based on the average over a 70-year period (1901-1970). That is, a monsoon is termed normal if the seasonal rainfall is between roughly 79 and 97 cm.
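The definition of a normal monsoon can be written as a short check. This is only a sketch: the function name and the labels "deficient" and "excess" are illustrative assumptions, not IMD terminology taken from this article; the 88 cm LTA and the 10 per cent window are the figures quoted above.

```python
def classify_season(rainfall_cm, lta_cm=88.0, window=0.10):
    """Label a season's all-India rainfall relative to the LTA.

    The labels other than "normal" are illustrative assumptions.
    """
    lower = lta_cm * (1.0 - window)  # about 79.2 cm for an 88 cm LTA
    upper = lta_cm * (1.0 + window)  # about 96.8 cm for an 88 cm LTA
    if rainfall_cm < lower:
        return "deficient"
    if rainfall_cm > upper:
        return "excess"
    return "normal"


# Hypothetical seasonal totals, in cm.
for total in (75.0, 85.0, 100.0):
    print(total, classify_season(total))
```

With a window this wide, any seasonal total between about 79 and 97 cm counts as normal.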

The IMD's forecast is based on a statistical "power regression" model that uses 16 parameters in an empirical relationship with the total quantum of monsoon rainfall over the entire country. Historical data over a sample period (1951-1987) have been used to identify the 16 atmosphere-ocean-land variables that have a significant influence on monsoon behaviour (see table 1). These factors, the "predictors", are both regional (such as the Himalayan snow cover) and global (such as the El Nino, the anomalous warming of the ocean off the Peruvian coast in the eastern Pacific), the latter's influences being termed "teleconnections". Measurements of their stabilised values, which occur from December to May, are used in establishing a quantitative relationship with the total monsoon rainfall, the "predictand". A quantitative prediction based on this model, which gives the quantum of rainfall expected, has been issued since 1988. The model is stated to have an inherent error window of 4 per cent on either side because of its statistical nature.

Table 2 gives the performance of this statistical LRF over the last 11 years which, on the face of it, seems impressive. While the IMD's claim of this apparent success has become politically acceptable, it cannot hold ground scientifically. On six out of 11 occasions the forecast has gone wrong in the sense that the actual rainfall was outside the inherent error margin of plus or minus 4 per cent. The forecast has gone awry for the last four consecutive years, with the error during 1994, 1997 and 1999 being particularly large. Fortunately for the IMD, this failure of the model has not impacted the operational aspect of the forecast adversely because, in the gross, the monsoon has been "normal" all these years.
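The two yardsticks in play here, the model's own error margin and the much wider "normal" window, can be compared in a few lines. The forecast and actual values below are hypothetical and are not the Table 2 figures; the function name is likewise an assumption made for illustration.

```python
def within_model_error(forecast_pct, actual_pct, model_error_pct=4.0):
    """True if actual rainfall falls inside the model's stated error margin."""
    return abs(actual_pct - forecast_pct) <= model_error_pct


# Hypothetical (forecast, actual) pairs, as per cent of the LTA.
# These are NOT the values from Table 2.
seasons = [(100.0, 101.0), (99.0, 92.0), (103.0, 108.0), (98.0, 96.0)]

misses = [(f, a) for f, a in seasons if not within_model_error(f, a)]

# A quantitative miss can still land inside the 90-110 per cent "normal" band.
still_normal = [a for _, a in misses if 90.0 <= a <= 110.0]

print(len(misses), "misses;", len(still_normal), "of them still 'normal'")
```

In this made-up sample both misses still count as normal seasons, which is the situation the paragraph above describes.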

That a statistical model should be able to predict monsoon behaviour successfully at least in this gross sense is hardly surprising. Scientists like P.K. Das, a former IMD Director-General, point out that if one looks at the history of monsoons, rainfall has been within the "normal" window of plus or minus 10 per cent (which is quite substantial) roughly 70 per cent of the time. This is so because, given the large geographical area of the country, there are enormous spatial variations, and drought over any region tends to get offset by an excess in another region to bring the overall rainfall within the large window. A drought and a flood in different parts during the same monsoon is not uncommon. The rainfall in 1994 was particularly skewed. Similarly, in terms of rainfall distribution, 1989 and 1991 were not good monsoon years, even though normal. Given this, the entire country cannot be used as a single unit for rainfall prediction and more so if the forecast is to be of some utility for agriculture.
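The offsetting effect is easy to see with a toy calculation. The three regions, their area weights and their departures below are invented for illustration, and weighting percentage departures by area is a simplification of real area averaging.

```python
def all_india_departure(regions):
    """Area-weighted mean of regional departures from normal (per cent)."""
    total_weight = sum(weight for weight, _ in regions)
    return sum(weight * dep for weight, dep in regions) / total_weight


# Hypothetical (area weight, departure in per cent) pairs:
# a severe drought in one region, an excess in another.
regions = [(0.3, -30.0), (0.3, 20.0), (0.4, 5.0)]

national = all_india_departure(regions)
print(round(national, 1))
```

A 30 per cent regional deficit all but disappears in the national figure, which stays well inside the 10 per cent window.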

In a sense, the present model is retrograde in approach, both in scientific and economic terms. It is a throwback of 75 years. The initial forecasts in the late 19th century, which were initiated by H.F. Blanford in 1886, were made for the whole of British India. Later, Gilbert W. Walker, who discovered the phenomenon of the southern oscillation and its influence on the monsoon, identified a number of regions with similar rainfall characteristics. He started separate forecasts for northwestern India, northeastern India and peninsular India in 1924. The forecasts for northeastern India were discontinued after 1935 since the year-to-year variability was small and well within the accuracy limits. There were separate forecasts for northwestern and peninsular India until 1988, when the practice was changed to an all-India forecast.

The earlier practice of region-wise forecast is meaningful as rainfall variability is as high as 40 per cent in northwestern India and as low as 7 per cent in northeastern India. Therefore, the attempt to forecast for the country as a whole, including the highly unpredictable northwest and the least variable northeast (which offset each other most of the time), is much easier than what was attempted by Walker (and several other researchers later) and marks a step back from the practice begun in 1924. Since last year, however, the IMD has attempted to give forecasts for these "broad homogeneous rainfall" regions individually but the forecasts, not surprisingly given the model's shortcomings, were much more in error than those for the whole country.

But there are other fundamental methodological flaws in the model, as Das points out. Some of the serious criticisms are: (a) The predictors, especially pressures and temperatures, are not independent of one another. For example, the Southern Oscillation Index (SOI), the El Nino in the present year and the El Nino in the previous year are not independent of one another; (b) The number of predictors (16) is too large and this could lead to cross-correlations in the regression analysis, especially if the sample size is small (as in this case); (c) The screening procedure for the selection of predictors and the relative importance of each predictor are unclear. An attempt to see how much of the variance each parameter accounts for and then to whittle down the number does not seem to have been made; (d) The claim of capturing the non-linear interactions among the various climatic forcings of the monsoon by a "power regression analysis" is questionable because the 16 predictors of the model were initially chosen by a linear regression fit with the historical rainfall data.
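Criticism (b) can be illustrated with a small experiment. The sketch below fits an ordinary linear least-squares regression, not the IMD's power regression, to pure random noise over 37 "years" (the length of the 1951-1987 sample). All data are synthetic and the function names are assumptions made for this example.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def in_sample_r2(predictors, target):
    """R^2 of a least-squares fit (with intercept) on its own training data."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(target) / len(target)
    ss_res = sum((y - f) ** 2 for y, f in zip(target, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in target)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)  # fixed seed so the run is repeatable
n_years = 37            # same length as the 1951-1987 sample
rainfall = [rng.gauss(0, 1) for _ in range(n_years)]
noise = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(n_years)]

r2_3 = in_sample_r2([row[:3] for row in noise], rainfall)   # 3 random predictors
r2_16 = in_sample_r2(noise, rainfall)                       # 16 random predictors
print(round(r2_3, 2), round(r2_16, 2))
```

Because the three-predictor fit is nested inside the 16-predictor one, the in-sample fit can only improve as predictors are added, even though none of them carries any information about the target. A good fit over the sample period is therefore weak evidence of forecasting skill.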

Even as regards (b), it is a step backwards because Walker himself had noticed that an increase in the number of parameters does not enhance the accuracy of a forecast using regression analysis. This was shown more quantitatively by E.N. Lorenz of the Massachusetts Institute of Technology (MIT), the man who discovered the phenomenon of chaos in physical systems, in a classic paper way back in 1956. What Lorenz pointed out is probably what is happening to the model, resulting in increasing errors in forecast.

"The 16 parameter model was only marginally accepted by the scientific community," points out T.N. Krishnamurti, a monsoon expert at Florida State University, United States. "There were problems with the validation methodology. Somehow more often it seemed to reach a conclusion that seasonal monsoon rains were verified (by this method) to be near normal when many expressed doubts about the way that conclusion was arrived at. Personally, I have some strong doubts about the validity of this entire procedure. Many a time we have felt that those repeated normal seasonal rainfall forecasts were somewhat suspect," he says.

According to Krishnamurti, doubts over the validation procedure arise from the following. All-India rainfall may not be a sufficient index for monsoon rain when the systems producing the rain have a broader extent covering the adjacent oceans as well. The monsoon rainfall may be better expressed by an average rain over land (as measured by rain gauges) plus rain over adjacent oceans (estimated using remote sensing). The 16 parameters do not just see the land but the depressions over the ocean basins as well, Krishnamurti points out. However, in a statistical sense this objection may not be as serious as what D.R. Sikka, former Director of the Indian Institute of Tropical Meteorology (IITM), Pune, who was once associated with this model, points out.

Although the IMD uses 88 cm as the mean all-India rainfall or the LTA based on its analysis of the rainfall time series for the period 1901-70, there is no one figure for the LTA on which there is a consensus among monsoon researchers. The problem arises because of the inconsistency in the number of rain gauges used for collecting rainfall data, and the corresponding area averaging that is done, from year to year. The entire country has a rain gauge network of about 5,000 stations. However, because of the non-availability of rainfall data over a consistent set of rain gauges, different rainfall series have been constructed based on different rain gauge networks giving rise to different LTAs ranging from 84.6 cm to 90.3 cm.

According to researchers, the most reliable rainfall series that has been constructed to date is the one by B. Parthasarathy and associates using a consistent set of 306 well-distributed rain gauge stations, one for each district, and covering 29 meteorological sub-divisions, for the period between 1871 and 1994. The LTA, according to this time series, is 85.2 cm. This differs significantly from the IMD figure of 88 cm. One of the reasons is that the former excludes six sub-divisions (basically hilly regions and islands). However, researchers believe that the IMD data are not over a consistent and representative set of rain gauges for the entire 70-year period.

Sikka feels that the IMD figure needs to be re-assessed and updated according to criteria laid down by the World Meteorological Organisation (WMO) for any meaningful evaluation of forecasts because the difference between different values of LTAs in use is as much as the observed variance in the monsoon rainfall itself (which is the same as the window for normal rainfall). Also, more important, it is well known that measured rainfall differs significantly depending on the number of rain gauges used, which varies from year to year. It has often been pointed out that rainfall occurring over the mountains or during huge floods is not adequately measured. Therefore, for comparing the performance of a forecast model, the rainfall data used should be based on a consistent set of rain gauges and a consistent method of area weightage and area averaging. It is not clear whether the IMD uses a consistent rainfall measurement for comparing with the forecast. This basic flaw in validation has not been addressed by the scientific community, Sikka says.
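How much the choice of LTA matters for validation can be seen by scoring one season against the different averages quoted above. The 78 cm seasonal total is a hypothetical figure chosen for illustration; the four LTA values are the ones cited in this article.

```python
def departure_pct(rainfall_cm, lta_cm):
    """Departure of seasonal rainfall from the LTA, in per cent of the LTA."""
    return 100.0 * (rainfall_cm - lta_cm) / lta_cm


season_cm = 78.0  # hypothetical seasonal total, not an observed value

labels = {}
for lta in (84.6, 85.2, 88.0, 90.3):
    dep = departure_pct(season_cm, lta)
    labels[lta] = "normal" if abs(dep) <= 10.0 else "outside normal"
    print(f"LTA {lta} cm: departure {dep:+.1f} per cent, {labels[lta]}")
```

The same season is normal against the 84.6 cm and 85.2 cm averages and outside the normal window against the 88 cm and 90.3 cm ones, so a forecast of "normal" would be scored a success or a failure depending only on the baseline used.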

Notwithstanding the above, the other methodological problems would seem to be slowly catching up with the IMD model. Given the string of off-the-mark forecasts, the IMD has finally admitted to failure of the model, albeit indirectly. In the forecast for the current year, with the aim of limiting the forecast error, the IMD has dropped four parameters from its original set and replaced them with four new ones, "which", to quote this year's press release on the forecast, "have a stronger statistical relationship to monsoon rainfall." The IMD reasoned: "The quantitative forecast error has been larger than the model error in some years, primarily because the statistical relationship of some of its predictors has been weakening with time."

The IMD's reasoning may or may not be true but forecast failure could just be a consequence of the model's inherent flaws. This apparent weakening of some of the predictors may well be an artefact of the model's inadequacies, and not a real one, according to some scientists. "Adding new parameters and dropping some of the old ones is as ad hoc as the initial choice of some of those," feels Krishnamurti.

One of the arguments forwarded by the main author of the 16-parameter model, V. Thapliyal of the IMD, Pune, is that the correlation coefficients (C.C.s) - a measure of the statistical relationship between two quantities - of different predictors with the monsoon rainfall change with time and this "epochal variation" of the C.C.s limits the accuracy of statistical models based on a few parameters or predictors. According to him, if a large number of predictors are collectively used, the resultant signal from a group of predictors may improve the accuracy of the forecast. This is contrary to what conventional wisdom of statistical analysis a la Lorenz would dictate.
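The "epochal variation" of a C.C. can be pictured as a correlation computed over a moving window of years. The sketch below uses an invented predictor series whose relationship with rainfall flips sign halfway through; neither the data nor the six-year window comes from Thapliyal's analysis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def sliding_cc(x, y, window):
    """C.C. over each run of `window` consecutive years."""
    return [
        pearson(x[i:i + window], y[i:i + window])
        for i in range(len(x) - window + 1)
    ]


# Synthetic series: the link between predictor and rainfall reverses
# after the sixth year. Purely illustrative values.
predictor = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0, 2.0, 7.0, 3.0, 8.0, 5.0, 9.0]
rainfall = predictor[:6] + [-v for v in predictor[6:]]

ccs = sliding_cc(predictor, rainfall, window=6)
print([round(c, 2) for c in ccs])
```

The windowed C.C. starts at +1 and ends at -1. Short windows like this are also noisy, so an apparent weakening of a predictor may reflect the sample as much as the atmosphere.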

In 1997, Thapliyal showed that in the context of the 16-parameter IMD model, while some parameters were stable over the 10 years of forecasts, others were unstable. Their C.C.s fluctuated from year to year, rendering the current influence of some of them on rainfall forecast weak. In that analysis he had argued that operational models should be revised every year by replacing those parameters that have been less stable during recent years. It is from this perspective that this year the IMD altered four of its parameters. But this goes against the grain of the basic understanding of the monsoon phenomenon as a stable and consistently recurring meteorological process governed by a stable set of atmospheric factors. The power of a good statistical model lies in being able to identify these forcing parameters correctly.

Significantly, the set of parameters replaced includes the position of the 500 mb high pressure ridge that runs along 75°E longitude in April. This is something that has surprised experts since the 500 mb ridge position in April was known to be strongly correlated to the monsoon rainfall. There have been many statistical forecast models with fewer parameters and all of them invariably had the 500 mb ridge as the key predictor, accounting for a large part of rainfall variability, ever since its influence was discovered in 1978 by A.K. Banerjee and associates.

The ridge basically governs the mid-tropospheric (about 6 km above the surface) wind circulation pattern, and its empirical relationship with rainfall behaviour suggested that it had a strong influence on the monsoon. It was found that a southward displacement of the ridge compared to its normal position in April (at about 15°N) is unfavourable for a good monsoon and a northward displacement is favourable. The parameter has always been considered to be stable and robust with a very large C.C. with the monsoon rainfall. In a detailed analysis of the influence of the April 500 mb ridge in 1987, J. Shukla and D.A. Mooley found that a two-parameter statistical model, which used the ridge position and Darwin Pressure Tendency - defined as the pressure difference at Darwin between winter and spring - could explain the rainfall behaviour during the period between 1939 and 1984 to a large extent (about 80 per cent). Their analysis too showed that the April ridge position had a very large C.C. with the rainfall.

Darwin pressure tendency is what is known to drive the southern oscillation phenomenon, which, combined with the El Nino phenomenon (together called the ENSO phenomenon), is found to have a strong influence on monsoon behaviour. Interestingly, in the changed set of parameters, the IMD has now used Darwin pressure tendency (instead of only pressure at Darwin), the importance of which had been demonstrated by Shukla and Mooley in 1987. But, equally interestingly, the IMD has dropped the 500 mb ridge position altogether, contrary to the finding of Shukla and Mooley. This defies logic because, even assuming that its influence has weakened over the years, it is unlikely that an atmospheric feature that had a strong influence on the monsoon would suddenly lose its influence completely. On the contrary, there have been suggestions in recent times that there is perhaps a global scale teleconnection between the April 500 mb ridge and the pressure anomaly at Darwin in February.

But more pertinently, Thapliyal himself, using a different model called the Dynamic Stochastic Transfer Model, found that rainfall predicted using the 500 mb ridge position as the leading input parameter to the model was close to observed data. Therefore, experts argue that the apparent weakening of the influence of the April 500 mb ridge position in the 16-parameter model is more likely to be an artefact of the model itself and thus makes the entire methodology of the IMD model, its forecast and the validation suspect.

There have been many research efforts, some of them still under way, which try to avoid these shortcomings of statistical forecasts. It has been found, for example, that there is considerable similarity in the rainfall over 14 meteorological sub-divisions of northwestern and central India and it is possible to forecast the monsoon with as few as four predictors for this homogeneous rainfall region. Another work has demonstrated that three properly identified predictors can predict the rainfall with reasonable accuracy.

Some of the key predictors in all these analyses are: (1) the latitude of the subtropical 500 mb high-pressure ridge in April; (2) the SOI; and (3) the winter mean temperature of the northern hemisphere. These, unlike the 16 parameters, are known to be fairly independent of each other. Unfortunately, while there is growing scientific opinion that monsoon rainfall over India is region-specific and a large number of predictors are unnecessary, the IMD's 16 parameters have become politically acceptable.

"I feel strongly that a non-IMD group should be asked to re-evaluate carefully this entire procedure totally independently," says Krishnamurti. Whether such a thing will happen in this market-sensitive and politically sensitive matter of monsoon forecast is doubtful. But it is time scientists came out more openly with their criticisms of the model instead of airing them behind the closed doors of their offices.
