
Many observers of machine learning's takeover of data science are starting to caution against the misuse and overuse of deep learning algorithms. From concerns that the “black box” nature of deep learning makes models hard to interpret or explain, to the large amounts of data that deep learning algorithms need to be effective, to sceptical suggestions that deep learning approaches may have hit their limits, the original hype over deep learning that fuelled the recent mass interest in AI is giving way to a trough of disillusionment over the practice.
At Insite.ai, we are convinced that only by utilising deep learning approaches will we see a paradigm shift in the accuracy of demand forecasting. This is the first of two articles which explain our position. In this post I summarise deep learning and give two examples of how it was recently used to beat traditional forecasting methods. In the second part I list the reasons why we believe that deep learning is poised to dominate forecasting applications from here on.
In neural networks (computerised networks composed of nodes, intended to mimic the human brain’s neural structure), data is transformed from an input to an output via a series of transformations through layers. When there are many layers between the input and the output, the neural network is said to be deep. Deep learning occurs when such a network is trained: its outputs are compared with known outcomes and its internal parameters are adjusted to reduce the error, so that the model is influenced by (i.e. learns from) the quality of its previous outputs.
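To make the idea of layered transformations concrete, here is a minimal sketch of a forward pass through a small network in plain Python. The layer sizes, the hand-picked weights and the choice of ReLU as the activation function are all assumptions made for illustration; a real network would learn its weights from data during training, a step this sketch leaves out.

```python
def dense(inputs, weights, biases):
    """One fully connected layer: a weighted sum of the inputs plus a bias, per node."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """A common activation function: negative values are set to zero."""
    return [max(0.0, v) for v in values]


def forward(inputs, layers):
    """Pass the input through every layer in turn, applying ReLU to all but the last."""
    activations = inputs
    for i, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if i < len(layers) - 1:
            activations = relu(activations)
    return activations


# Hypothetical weights: 2 inputs -> 3 hidden nodes -> 2 hidden nodes -> 1 output.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.2, 0.7, -0.5], [0.6, -0.1, 0.3]], [0.05, 0.0]),
    ([[1.0, -1.0]], [0.2]),
]

prediction = forward([1.0, 2.0], layers)
print(prediction)
```

Adding more entries to `layers` is what makes the network deeper; the code that moves data from input to output does not change.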
Forecasting is an area that is particularly ripe for improvement through deep learning models. Historically (right up to the last 18 months), major forecasting competitions have thrown up the somewhat surprising finding that complicated models aren’t necessarily more accurate than simple ones, particularly when forecasting a general set of time series. Until last year, this was the established logic in the M-competitions, the world’s most renowned forecasting competitions (M for Spyros Makridakis, their organiser and a godfather of the forecasting industry generally). In the most recent previous competition, the M3 competition in 2000, a basic model composed of a combination of three common exponential smoothing methods (Comb) outperformed all of the more complicated entrants, with a lone exception that only just beat Comb.
Time and again, econometricians and forecasting practitioners had to head home with their tails between their legs in the knowledge that their complicated forecasting methods were inferior to basic ones.
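For readers who have not met the Comb benchmark, the sketch below shows the idea in plain Python: forecast with simple, Holt and damped exponential smoothing, then take the plain average of the three. The smoothing parameters (`alpha`, `beta`, `phi`) are fixed, hypothetical values chosen for illustration, and the initialisation is deliberately naive; the competition benchmark fits its parameters to each series, so treat this as an outline rather than a reproduction.

```python
def ses(series, horizon, alpha):
    """Simple exponential smoothing: a flat forecast at the last smoothed level."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def damped_holt(series, horizon, alpha, beta, phi):
    """Holt's linear trend method with damping factor phi (phi = 1.0 gives plain Holt).

    Needs at least two observations to initialise the trend.
    """
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (previous_level + phi * trend)
        trend = beta * (level - previous_level) + (1 - beta) * phi * trend
    forecasts = []
    damping = 0.0
    for h in range(1, horizon + 1):
        damping += phi ** h  # phi + phi^2 + ... + phi^h
        forecasts.append(level + damping * trend)
    return forecasts


def comb(series, horizon, alpha=0.5, beta=0.1, phi=0.9):
    """Average the simple, Holt and damped forecasts for each step ahead."""
    simple = ses(series, horizon, alpha)
    holt = damped_holt(series, horizon, alpha, beta, phi=1.0)
    damped = damped_holt(series, horizon, alpha, beta, phi)
    return [(s + h + d) / 3 for s, h, d in zip(simple, holt, damped)]


history = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0]
print(comb(history, horizon=3))
```

Nothing here is more sophisticated than a handful of weighted averages, which is what made its competition record so humbling for more elaborate methods.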
This all changed in 2018 when the fourth M-competition was run, aptly named the M4 competition. In it, seventeen models outperformed the Comb benchmark, with most making use of machine learning in some way. The top two models beat Comb by 9.4% and 6.6% respectively, a large margin. Both relied on machine learning to do so.
The second-best-performing model was developed by Rob Hyndman of Monash University, an expert in forecasting (Hyndman edited the International Journal of Forecasting between 2005 and 2018), along with Pablo Montero-Manso and George Athanasopoulos. Hyndman’s model effectively took nine reasonably common time series methods (all available through the forecast package for R) and combined them to generate a forecast. A machine learning technique known as gradient boosting (an ensemble of decision trees) was used to calculate an optimised weighting for each of the nine models for each time series. By using gradient boosting, Hyndman was able to leverage peripheral features of the data that aren’t utilised by traditional time series methods. He was also able to predict which time series models would work well for each time series, and use this knowledge to optimise the weighting each model received.
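The combination step described above can be sketched in a few lines of plain Python. The sketch assumes that a trained gradient boosting model has already produced one score per forecasting method for the series in question; that model is not shown, and the scores below are hypothetical numbers. Turning the scores into weights with a softmax is also an assumption made for illustration, chosen because it guarantees positive weights that sum to one.

```python
import math


def softmax(scores):
    """Convert arbitrary scores into positive weights that sum to one."""
    peak = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine(forecasts_by_method, scores):
    """Weighted average of several methods' forecasts, one value per step ahead."""
    weights = softmax(scores)
    horizon = len(forecasts_by_method[0])
    return [sum(w * forecast[h] for w, forecast in zip(weights, forecasts_by_method))
            for h in range(horizon)]


# Hypothetical three-step forecasts from three methods for a single series.
forecasts_by_method = [
    [100.0, 101.0, 102.0],
    [104.0, 108.0, 112.0],
    [98.0, 97.0, 96.0],
]
# Hypothetical scores; a higher score means the method is expected to suit this series.
scores = [2.0, 0.5, -1.0]

print(combine(forecasts_by_method, scores))
```

Because the scores are produced per series, each time series ends up with its own blend of methods rather than a single fixed recipe.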
The winner of the competition, Slawek Smyl of Uber Labs, leaned even further on deep neural networks in his solution, developing a hybrid model that combined an exponential smoothing model (in this case the Holt-Winters method with multiplicative seasonality) with a recurrent neural network. Per Smyl:
This allowed for cross-learning across time series in extracting time series features (particularly seasonality). Ultimately, this method far outperformed the benchmark. In part two I provide eight reasons why we believe that deep-learning-based forecasting methods will become the de facto standard for serious forecasting problems.
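As a rough illustration of how an exponential smoothing model and a neural network can share the work in a hybrid like the one described above, the sketch below uses multiplicative level and seasonality smoothing to normalise a series, hands the normalised values to a network, and then rescales the network's output. This is a simplified outline and not Smyl's implementation: the recurrent neural network is replaced by a hypothetical stand-in function (`flat_network`), the smoothing parameters are fixed rather than learned, and the function names are invented for this example.

```python
def smooth(series, period, alpha, gamma):
    """Track a multiplicative level and seasonal index for every observation."""
    first_cycle_mean = sum(series[:period]) / period
    level = first_cycle_mean
    seasonal = [y / first_cycle_mean for y in series[:period]]
    levels = []
    for t, y in enumerate(series):
        index = seasonal[t]  # seasonal index estimated one period earlier
        level = alpha * (y / index) + (1 - alpha) * level
        seasonal.append(gamma * (y / level) + (1 - gamma) * index)
        levels.append(level)
    return levels, seasonal


def flat_network(normalised_inputs, horizon):
    """Stand-in for a trained recurrent network: predicts no change from the level."""
    return [1.0] * horizon


def hybrid_forecast(series, period, horizon, network, alpha=0.3, gamma=0.1):
    """Normalise with smoothing, forecast with the network, then rescale.

    Assumes horizon <= period so that a seasonal index exists for every step.
    """
    levels, seasonal = smooth(series, period, alpha, gamma)
    n = len(series)
    # Remove level and seasonality so the network sees values close to 1.
    normalised = [y / (levels[t] * seasonal[t]) for t, y in enumerate(series)]
    outputs = network(normalised, horizon)
    # Put the latest level and the upcoming seasonal indices back in.
    return [outputs[h] * levels[-1] * seasonal[n + h] for h in range(horizon)]


series = [5.0, 10.0, 15.0, 5.0, 10.0, 15.0]
print(hybrid_forecast(series, period=3, horizon=3, network=flat_network))
```

The design point is the division of labour: the smoothing step handles scale and seasonality for each series individually, which leaves the network free to learn patterns that are shared across many normalised series.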