
Study identifies new predictors for Indian monsoon through machine learning

Photo Credits: Gururaja K V / Research Matters

A recent collaborative study between the Indian Institute of Science, Bangalore, and the Indian Institute of Technology, Kharagpur, has employed machine-learning techniques to reveal new predictors for the Indian monsoon, making monsoon predictions more reliable. The team consisted of Ms. Moumita Saha and Prof. Pabitra Mitra from the Department of Computer Science and Engineering, IIT–Kharagpur, and Prof. Ravi S. Nanjundiah from the Centre for Atmospheric and Oceanic Sciences as well as the Divecha Centre for Climate Change, IISc. Using global climate data from 1948-2000 and machine learning algorithms, the team derived a set of reliable predictors for monsoon rainfall of the sub-continent.

Predicting the monsoon in India has always been a challenge due to the influence of climate change and India's humid tropical climate. Typically, monsoon forecast models use sea surface temperature (SST) and sea level pressure (SLP) as predictors because they are important regulators of the monsoon. Existing models consider predictors from certain influencing regions of the Atlantic and Indian Oceans.

However, global climatic changes (natural as well as those due to human activity) lead to fluctuations on multiple scales. These could alter the effect of existing predictors on the monsoon, and new climatic relationships could evolve. Hence, it is necessary to continuously search for new predictors that influence the Indian monsoon and include them in building new deep-learning computational models.

The researchers tried to address this issue by collating SST and SLP values from all across the world from 1948 to 2000. They employed a deep learning algorithm named “stacked autoencoder” that could process such a massive amount of data and identify global predictors for developing monsoon forecast models.

A stacked autoencoder has multiple layers of calculations. Every layer learns patterns in its input data by combining all the data points non-linearly. These patterns are learned and stored at the nodes of each internal layer. The outputs of the first layer's nodes serve as input to the second layer, and so the process continues. The greater the number of layers in the designed stack, the greater the complexity of the patterns revealed by the algorithm.
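To make the layer-by-layer idea concrete, here is a minimal sketch of a stacked autoencoder in plain Python. It is an illustration only, not the researchers' implementation: the class and function names, the layer sizes, the learning rate, the use of greedy layer-wise training, and the random toy data standing in for SST and SLP fields are all assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class AutoencoderLayer:
    """One autoencoder: a non-linear encoder and a linear decoder (illustrative)."""

    def __init__(self, n_in, n_hidden, rng):
        self.W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.V = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.c = [0.0] * n_in

    def encode(self, x):
        # Each hidden node combines all of its inputs non-linearly.
        return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h):
        # Reconstruct the layer's input from the hidden nodes.
        return [sum(v * hj for v, hj in zip(row, h)) + ci
                for row, ci in zip(self.V, self.c)]

    def train(self, data, epochs=50, lr=0.1):
        # Stochastic gradient descent on the squared reconstruction error.
        for _ in range(epochs):
            for x in data:
                h = self.encode(x)
                err = [r - xi for r, xi in zip(self.decode(h), x)]
                # Backpropagate the error to the hidden nodes before updating V.
                dh = [sum(err[i] * self.V[i][j] for i in range(len(x))) * h[j] * (1.0 - h[j])
                      for j in range(len(h))]
                for i in range(len(x)):
                    for j in range(len(h)):
                        self.V[i][j] -= lr * err[i] * h[j]
                    self.c[i] -= lr * err[i]
                for j in range(len(h)):
                    for i in range(len(x)):
                        self.W[j][i] -= lr * dh[j] * x[i]
                    self.b[j] -= lr * dh[j]


def train_stack(data, hidden_sizes, seed=0):
    """Greedy layer-wise training: each layer learns from the previous layer's nodes."""
    rng = random.Random(seed)
    layers, current = [], data
    for n_hidden in hidden_sizes:
        layer = AutoencoderLayer(len(current[0]), n_hidden, rng)
        layer.train(current)
        current = [layer.encode(x) for x in current]  # input to the next layer
        layers.append(layer)
    return layers, current


# Toy stand-in for climate data (assumption): 20 samples of 6 values in [0, 1].
toy_rng = random.Random(1)
toy_data = [[toy_rng.random() for _ in range(6)] for _ in range(20)]
layers, features = train_stack(toy_data, hidden_sizes=[4, 3, 2])
print(len(layers), len(features), len(features[0]))  # 3 20 2
```

Each row of `features` holds the node values of the last layer for one sample. A real application would use far larger inputs and an optimized numerical library rather than nested Python lists.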

"We tried implementing a couple of neural networks," recalls Ms. Saha, the lead researcher. "But the stacked autoencoder performed better in terms of accuracy and catching the extremes," she explains. The team's 3-layer stacked autoencoder had multiple hidden nodes that represented complex relationships between global SSTs and SLPs. Nodes that exhibited the highest correlation with the Indian monsoon in the same period (1948-2000) were then chosen as the final set of predictors. Their study is the first report of a stacked autoencoder being used to identify predictors for the Indian monsoon.
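The selection step described above, keeping the nodes most correlated with monsoon rainfall, could be sketched as follows. This is a hedged illustration: the function names, the use of Pearson correlation, ranking by absolute value, and the tiny made-up numbers are assumptions, not details taken from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def select_top_nodes(activations, rainfall, k):
    """Return indices of the k nodes whose yearly values correlate most
    strongly (in absolute value, an assumption) with yearly rainfall."""
    n_nodes = len(activations[0])
    scores = []
    for j in range(n_nodes):
        column = [row[j] for row in activations]  # one node across all years
        scores.append((abs(pearson(column, rainfall)), j))
    scores.sort(reverse=True)
    return sorted(j for _, j in scores[:k])


# Hypothetical data: 5 years of rainfall and 3 hidden-node values per year.
rainfall = [1.0, 2.0, 3.0, 4.0, 5.0]
activations = [
    [2.0, 5.0, -1.0],
    [4.0, 1.0, -2.0],
    [6.0, 4.0, -3.0],
    [8.0, 2.0, -4.0],
    [10.0, 3.0, -5.0],
]
selected = select_top_nodes(activations, rainfall, k=2)
print(selected)  # [0, 2]
```

In this toy case, node 0 rises with rainfall and node 2 falls with it, so both are kept, while the weakly related node 1 is dropped.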

The researchers tested the newly identified predictors against rainfall values for the period 2001-2014. The tests revealed greater prediction accuracy than existing models for both phases of the Indian monsoon: early (June-July) and late (August-September). The new model forecast the early monsoon with a mean absolute error of 6.8% in January, and the late monsoon with a mean absolute error of 4.9% in March. Overall, their predicted rainfall values were closer to the real long period average (LPA) of rainfall than the India Meteorological Department’s (IMD) predictions. In the years 2011 and 2012, IMD had predicted a deficit, but actual rainfall was in excess of the LPA. In contrast, predictions by this new model matched the deficit and excess trend of actual rainfall in every year of the test period.
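As a rough illustration of the two measures mentioned above, the sketch below computes a mean absolute error and counts the years in which a forecast falls on the same side of the LPA as the observed rainfall. It assumes rainfall is expressed as a percentage of the LPA, so that 100 means exactly average; the function names and all numbers are hypothetical and are not the study's data.

```python
def mean_absolute_error(predicted, observed):
    """Average absolute gap between forecast and observation,
    in the same units as the inputs (here: % of LPA)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def trend_matches(predicted, observed, lpa=100.0):
    """Count years where forecast and observation are both above or both
    below the LPA. A value exactly equal to the LPA counts as no match."""
    return sum(1 for p, o in zip(predicted, observed) if (p - lpa) * (o - lpa) > 0)


# Hypothetical three-year example, values in % of LPA.
observed = [104.0, 92.0, 101.0]
predicted = [101.0, 96.0, 99.0]

mae = mean_absolute_error(predicted, observed)
matches = trend_matches(predicted, observed)
print(mae, matches)  # 3.0 2
```

Here the forecast is off by 3 percentage points on average and gets the excess or deficit direction right in two of the three years.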

Even for the present year (2016), IMD predicted a high probability of higher-than-normal rainfall. This new technique, on the other hand, predicted slightly below-average rainfall. Prof. Nanjundiah points out, “Current trends of monsoon behaviour indicate that predictions based on our technique could possibly be nearer to the actual value.”

The team also designed separate stacked autoencoders that used global sea level pressures and sea surface temperatures independently. From the patterns obtained through these stacks, they derived distinct sets of monsoon predictions and found that the SLP-based model improved early monsoon prediction, whereas the SST-based model was most accurate in predicting the extreme cases: drought and excess rain.

Introducing deep learning techniques to solve challenges in atmospheric sciences is an interesting endeavour. "As computer scientists, we see this process in terms of data - we have huge amounts of data and are trying to find patterns in it", states Ms. Saha. This is a vastly different approach from that of IMD, which typically works with finite physics-based models because its meteorologists know the basic physical processes behind the monsoon. Hybrid models that combine the strengths of both are the need of the hour. "We would like to collaborate with meteorologists and try to implement a hybridization of the two approaches", concludes Ms. Saha on the next steps.

Prof. Nanjundiah proposes to extend this work to improve predictions on other scales as well. “This appears to be just the beginning. We expect the technique to give us many more interesting results”, he says while signing off.

About the authors:

Moumita Saha is a Ph.D. research scholar at the Department of Computer Science and Engineering, Indian Institute of Technology – Kharagpur.


Dr. Pabitra Mitra is an Associate Professor at the Department of Computer Science and Engineering, Indian Institute of Technology – Kharagpur.


Prof. Ravi S. Nanjundiah is a Professor and Chairman of the Centre for Atmospheric & Oceanic Sciences, and Divecha Centre for Climate Change; both at the Indian Institute of Science, Bangalore.


About the paper:

This work was recently published in Procedia Computer Science as a part of the International Conference on Computational Science (ICCS) in 2016.