In a recent IMF working paper, “Overfitting in Judgment-based Economic Forecasts: The Case of IMF Growth Projections”, economist Klaus-Peter Hellwig examined the IMF’s World Economic Outlook (WEO) forecasts and checked whether they suffer from the problem of overfitting.
Overfitting occurs when forecasters attempt to construct stories about the future based on past experience, not taking into account that this experience is limited. That is to say, the forecasters pay too much attention to details that explain the data well in a limited sample but turn out to be less informative in a larger sample. As a result, overfitted models (and human forecasters) respond to noise rather than relevant information.
Or, in more formal terms, the expected squared forecast error (MSE) can be separated into three components: a bias term, a variance term, and an irreducible error term that no model can remove.
The bias term arises when the model is misspecified, such that the parameter estimates of the model are likely to be biased. The variance term refers to the sampling error of the model in a small sample.
The problem is that there is a trade-off between the bias term and the variance term: simple models have parameter bias (i.e. the bias term is large) and don’t provide enough explanatory power to fit the data well, while complex models fit the sample too well, so the variance term becomes too large.
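For readers who want the trade-off written out, the textbook form of the decomposition is sketched below. This is the standard bias-variance identity, not a quotation of Hellwig’s own notation. The symbols are assumptions introduced here: y is realized growth, f(x) is the true conditional mean, \hat{f}(x) is the forecast built from a limited sample, and \sigma^2 is the variance of the unpredictable shock \varepsilon.

```latex
% Assumes y = f(x) + \varepsilon with E[\varepsilon] = 0 and Var(\varepsilon) = \sigma^2,
% and that \varepsilon is independent of the sample used to build \hat{f}.
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{squared bias}}
  + \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

Adding complexity to a forecast usually lowers the first term and raises the second, which is why a model that fits the historical sample very closely can still forecast badly.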
Hellwig asks whether the WEO forecasts suffer from overfitting because forecasters rely too heavily on their own judgment rather than on a formal statistical model. Human judgment is often said to be better equipped than an abstract empirical model to aggregate complex information, but that same flexibility can introduce overfitting into the forecasts.
In general, Hellwig found little evidence of overfitting for short-term forecast horizons. However, when looking at projections over longer horizons, he found strong symptoms of overfitting.
Here is one of his testing procedures. In this so-called “Null model” setting, Hellwig measures the accuracy of the WEO forecasts by comparing them to forecasts that rely entirely on the sample mean and ignore any information gained from potential predictors, i.e. the Null model.
The Null model’s forecast for any horizon and for any country is the unweighted average growth rate across all countries between 1970 and year t-1, where t is the year in which the forecast is made. It is updated every year to reflect new information about average growth in the historical sample.
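The Null model described above is simple enough to sketch in a few lines. The snippet below is a minimal illustration, not Hellwig’s code: the function name null_forecast, the dictionary layout, and the growth numbers are all made up for this example, and pooling every country-year observation with equal weight is an assumption about how “unweighted average” is computed.

```python
from statistics import mean


def null_forecast(growth, forecast_year, start_year=1970):
    """Return the Null-model forecast made in `forecast_year`.

    `growth` maps (country, year) -> growth rate in percent.
    The forecast is the plain average of every observation from
    `start_year` through `forecast_year - 1`, and it is the same
    for every country and every horizon.
    """
    history = [
        rate
        for (_country, year), rate in growth.items()
        if start_year <= year <= forecast_year - 1
    ]
    if not history:
        raise ValueError("no observations before forecast_year")
    return mean(history)


# Made-up growth rates for two hypothetical countries, A and B.
toy_growth = {
    ("A", 1970): 2.0,
    ("B", 1970): 4.0,
    ("A", 1971): 1.0,
    ("B", 1971): 5.0,
    ("A", 1972): 8.0,
}

# The forecast is recomputed each year as the history grows.
forecast_made_in_1972 = null_forecast(toy_growth, 1972)
forecast_made_in_1973 = null_forecast(toy_growth, 1973)
```

Because the function uses nothing but the historical mean, any forecast that cannot beat it at a given horizon is adding no usable information at that horizon.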
The figure below shows the MSE for WEO forecasts and forecasts generated by the Null model. In panel (a), one can see that while WEO forecasts (blue line) are substantially better (lower forecast error) than the Null model (black dashed line) in the short run, these differences in performance become insignificant as the horizon widens.
For five-year-ahead forecasts of growth rates, the Null model is even more accurate than the WEO forecast, though not significantly. This means the Null model is not significantly worse at fitting the data than the WEO forecasts in the long run!
Why is that? Hellwig then separates the sample into three groups – advanced economies, emerging market economies, and low-income countries. As you can see from the figure, in the advanced economies the WEO forecast performed much better than the Null model in the short run, and slightly less so in the longer run.
However, in emerging economies and low-income countries, the WEO forecasts generate almost as much forecast error as the Null model, even in the short run, except for the forecast of the nearest year.
Hellwig runs several more tests in the paper, and the results are similar: while WEO forecasts for advanced economies seem to perform quite well, for emerging markets and low-income economies there are signs that the forecasts suffer from overfitting.
The reason that forecasts for advanced economies have less forecast error is that their economic experience is well documented and less volatile, so that forecasters have a larger amount of training data. Also, forecasts for advanced economies are informed by a large set of competing forecasts produced by other organizations, so that forecasters are likely to benefit from crowd wisdom. And the IMF’s Research Department produces model-based forecasts that can inform forecasters’ judgment.
On the other hand, the forecasts for emerging markets and low-income economies might rely too heavily on judgment. The informal models underlying that judgment may be highly complex (too complex to be formalized), whereas the data that inform these informal models are relatively limited, and this combination introduces the problem of overfitting.
But the WEO forecasts for advanced economies are not perfect. Using a machine learning method called LASSO, Hellwig identified the variables that the algorithm selected as the most “powerful” predictors. The results suggest that population growth, the current pace of fiscal consolidation, and the real exchange rate are predictors of growth that IMF forecasts should have paid more attention to.
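LASSO is a regression with an L1 penalty that shrinks weak coefficients all the way to zero, which is why it can be used to pick out a handful of predictors. The sketch below shows that selection effect with a plain coordinate-descent solver. It is an illustration under stated assumptions, not the paper’s implementation: the function names, the penalty value, and the tiny data set (one informative predictor, one noise predictor) are all invented for this example.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty`; return 0.0 if it is within the penalty."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso(X, y, penalty, sweeps=200):
    """Minimize (1/2n) * sum((y - X b)^2) + penalty * sum(|b|) by coordinate descent.

    X is a list of rows, y a list of targets. No intercept is fitted,
    so the data are assumed to be centered. Every column of X must
    contain at least one non-zero entry.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            rho = 0.0   # covariance of column j with the partial residual
            norm = 0.0  # squared length of column j
            for i in range(n):
                fitted_others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fitted_others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, penalty) / (norm / n)
    return beta


# Made-up data: column 0 drives y, column 1 is almost pure noise.
toy_X = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
toy_y = [2.1, 1.9, -1.9, -2.1]

toy_beta = lasso(toy_X, toy_y, penalty=0.5)
# The coefficient on the noise column is set exactly to zero,
# while the informative column keeps a shrunken, non-zero coefficient.
```

With real data one would normally standardize the predictors and choose the penalty by cross-validation; both steps are left out here to keep the selection mechanism visible.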
In sum, the research suggests that IMF growth projections have strong symptoms of overfitting, particularly at longer horizons. The problem can be explained by the human judgment used in the forecast.
As Hellwig notes in his paper, “while statistical models typically deliver better forecast accuracy than economic models, communicating forecasts to decision makers often requires a narrative.” The IMF will have to resolve this dilemma, and in the meantime, investors should treat the WEO forecasts with caution, or at least with a grain of salt.
Cover photo credit: IMF Flickr