We won’t call it debunking, but not all investing tips hold up
For nearly 3,000 years, bloodletting was an accepted medical practice for all types of maladies. It was only in the early 1800s when some doctors carefully reviewed data on the practice that they realized bloodletting didn’t improve patients’ health, and may sometimes be harmful.
Such review of accepted theories is currently a growing field among social and natural scientists. Peer-reviewed research is increasingly being thrown back into the review process to see if it stands up.
A working paper by the University of Lausanne’s Amit Goyal, UCLA Anderson’s Ivo Welch and Athanasse Zafirov, a Ph.D. student, seeks to prevent the financial equivalent of bloodletting. Their meta-research — the term given for research on research — on papers published in top academic journals finds that many investing factors don’t hold up. To be precise, the 46 variables aren’t full-blown market strategies, but rather observed correlations that could form the basis for a strategy.
Past Performance May Not Be Indicative of Future Results
Building on Goyal and Welch’s 2008 paper that studied the predictive success of 17 variables, the researchers survey 26 papers identifying 29 variables considered useful in predicting the equity premium — the total rate of return on the stock market minus the prevailing short-term interest rate. The 17 variables from the 2008 paper are also reexamined. The researchers’ findings suggest that most of the variables have lost their predictive ability when tested on datasets extended to the end of 2020. A few variables do show flickers of promise but not overwhelming success across the researchers’ evaluation metrics.
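The equity premium defined above is a simple difference, which a few lines of Python can make concrete. This is a minimal sketch: the function name and the figures are hypothetical and are not taken from the paper.

```python
def equity_premium(market_return: float, short_rate: float) -> float:
    """Total stock-market return minus the short-term interest rate.

    Both inputs must cover the same period (e.g. one year) and be
    expressed as decimals (0.10 means 10%).
    """
    return market_return - short_rate


# Hypothetical figures for illustration only (not from the paper).
premium = equity_premium(market_return=0.10, short_rate=0.02)
print(f"Equity premium: {premium:.2%}")
```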
The researchers’ first goal was to replicate the original findings of the papers’ authors. This involved recreating the variables and recalculating the reported statistics on the variables’ ability to predict the equity premium. Goyal, Welch and Zafirov were able to confirm the papers’ original findings, using the original dataset, on all but two of the papers. (The two remaining papers had data issues.)
The datasets to create the variables were then extended through December 2020, and the predictions for each of the 29 variables from the papers and the original 17 variables from the 2008 paper were retested.
The datasets in the papers ended between 2000 and 2017 and began as early as 1926. When building a predictive model, a researcher will typically split a dataset into at least two samples: one to train the model and another, typically the data from the latest years, to test it. By extending the original datasets with data to the end of 2020 and starting the test sample 20 years after the start of the training sample, the researchers ended up with samples slightly different from those used in the papers. It’s worth noting that the new data made up only a small percentage of the overall datasets.
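The train/test split described above can be sketched in Python. The article does not spell out the exact statistics the researchers computed, so this example assumes one common choice: an expanding training window, with each forecast compared against the historical average premium (an "out-of-sample R-squared"). The data are synthetic noise, and every name here is hypothetical.

```python
import random


def ols_fit(x, y):
    """Return (intercept, slope) of a one-variable least-squares fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx if sxx else 0.0
    return mean_y - slope * mean_x, slope


def oos_r2(predictor, premium, first_test):
    """Out-of-sample R^2 of a predictor versus the historical-mean forecast.

    predictor[t] is known at the end of period t and is used to forecast
    premium[t + 1]. Each forecast uses only data available before it.
    """
    sse_model = 0.0
    sse_mean = 0.0
    for t in range(first_test, len(premium)):
        # Training pairs: (predictor[s - 1], premium[s]) for s = 1 .. t - 1
        intercept, slope = ols_fit(predictor[: t - 1], premium[1:t])
        forecast = intercept + slope * predictor[t - 1]
        historical_mean = sum(premium[:t]) / t
        sse_model += (premium[t] - forecast) ** 2
        sse_mean += (premium[t] - historical_mean) ** 2
    return 1.0 - sse_model / sse_mean


# Synthetic annual data: the predictor is unrelated to the premium here.
random.seed(0)
n_years = 95  # e.g. 1926 through 2020
predictor = [random.gauss(0.0, 1.0) for _ in range(n_years)]
premium = [0.06 + random.gauss(0.0, 0.18) for _ in range(n_years)]

# Start testing 20 periods after the start of the sample.
print(f"Out-of-sample R^2: {oos_r2(predictor, premium, first_test=20):.3f}")
```

A value above zero means the predictor beat the historical average over the test sample; a value at or below zero means it added nothing.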
“Because our paper reuses the data that the authors themselves had originally used to discover and validate their variables and theories, all that the predictors had to do in the few added years was not to ‘screw up’ badly.”
Nonetheless, of the 46 variables, only five managed to predict at a statistically significant level on the samples in the extended dataset.
But statistics are one thing, and investment performance is another. As a second test, Goyal, Welch and Zafirov devised simple investment strategies using the variables’ predictions to time investments by determining whether to go long or short the market and weighting the investments. The results of the investment strategies were compared with a buy-and-hold strategy. None of the five variables was able to significantly outperform the buy-and-hold approach in any of the investment strategies. Across all of the variable predictors, half lost money in the simplest investment strategy that used the variable to determine whether to go long or short.
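The simplest of these strategies, going long or short depending on the prediction, might look like the following sketch. It is an illustration under stated assumptions, not the researchers' implementation: positions are unweighted (fully long or fully short), returns are compounded directly, and the forecasts and returns are made-up numbers.

```python
def timing_vs_buy_and_hold(forecasts, realized_returns):
    """Compare a sign-based market-timing rule with buy-and-hold.

    forecasts[i] is the predicted return for period i, made beforehand;
    realized_returns[i] is what the market actually returned.
    Returns the final value of 1 unit invested under each approach.
    """
    timed_value = 1.0
    held_value = 1.0
    for forecast, realized in zip(forecasts, realized_returns):
        # Go long when the forecast is positive, short otherwise.
        position = 1.0 if forecast > 0 else -1.0
        timed_value *= 1.0 + position * realized
        held_value *= 1.0 + realized
    return timed_value, held_value


# Hypothetical forecasts and returns for four periods.
forecasts = [0.02, -0.01, 0.03, -0.02]
realized = [0.05, -0.10, 0.08, 0.04]

timed, held = timing_vs_buy_and_hold(forecasts, realized)
print(f"Timing strategy: {timed:.4f}  Buy-and-hold: {held:.4f}")
```

In this toy example the timing rule comes out ahead because three of its four calls are right; the researchers' finding is that real predictors rarely managed that on the extended data.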
Why Does the Performance Degrade?
The researchers suggest that the deterioration in predictive performance is at least partly explained by the market showing a greater variety of regimes over the last 20 years, including many steep downturns. Campbell R. Harvey of Duke University and Yan Liu of Purdue University, who have performed similar meta-research, suggest that overfitting a model to a particular dataset may also be a factor, because authors run numerous backtests (simulations over historical data). They further suggest raising the required performance threshold as the number of backtests increases. Finally, a more generous explanation may be that as the predictive variables become well known to market practitioners, they lose their edge, just like a stock tip: when those tipped off start buying, the stock price rises and the tip loses its value.
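To see how the bar can rise with the number of backtests, consider the sketch below. It uses the Bonferroni correction, a simple textbook adjustment chosen here for illustration; it is not necessarily the adjustment Harvey and Liu propose. The function name is hypothetical, and the thresholds rely on a normal approximation.

```python
from statistics import NormalDist


def bonferroni_threshold(num_backtests: int, alpha: float = 0.05) -> float:
    """Two-sided test-statistic threshold after a Bonferroni correction.

    The significance level alpha is divided by the number of backtests,
    so each individual test must clear a higher bar.
    """
    adjusted_alpha = alpha / num_backtests
    return NormalDist().inv_cdf(1.0 - adjusted_alpha / 2.0)


for tests in (1, 10, 100):
    print(f"{tests:>3} backtests -> threshold {bonferroni_threshold(tests):.2f}")
```

With a single test the threshold is the familiar value of about 1.96; the more variations an author tries, the larger the test statistic needed before a result counts as significant.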
Looking at the table below, the variables that were found to remain statistically significant on the extended dataset were those with the fewest citations and likely less well known among market participants.
The Five Best Variables on a Statistical Basis
Fourth-Quarter Growth Rate in Personal Consumption Expenditures (gpce): This macroeconomic variable from researchers Møller and Rangvid posits that high personal consumption growth rates at the end of the year predict poor stock-market gains in the following year. The researchers found it to be the best, and most consistent, variable in the investment strategies. It outperformed a buy-and-hold approach with three of the four strategies tested. However, the outperformance was only marginal.
Aggregate Accruals (accru): This is a sentiment-based variable introduced by Hirshleifer, Hou and Teoh and uses aggressive corporate accounting to predict future stock returns — more aggressive accruals lead to lower future returns. The variable also marginally beat buy-and-hold returns in three out of four approaches. Most of its performance came from its prediction of the post-tech market crash in 2000-2002.
Credit Standards (crdstd): This is another macroeconomic variable and was introduced by Chava, Gallmeyer and Park. Drawn from survey data by the Fed, it posits that optimistic (loose) credit standards predict poor market returns. This variable did well in the researchers’ investment strategies and had good performance on test sample data, but statistical measures of the variable on the training sample data were not as convincing, and much of its performance comes from the first four years in that sample.
The Investment Capital Ratio (i/k): This is a financial ratio introduced by Cochrane all the way back in 1991 and was also included in the 2008 paper from Goyal and Welch. It posits that high capital investment in the current quarter predicts poor stock-market returns in the next quarter. While it was a poor predictor from 1975 to 1998, its performance has since improved, yet it was not able to outperform a buy-and-hold strategy in three of the researchers’ four timing strategies.
Treasury-bill Rates (tbl): This is another variable examined in the 2008 paper. It does well statistically but had poor performance in the investment strategies.
Oft-Cited Papers With Poor-Performing Variables
Variance Risk Premium (vrp): This variable was introduced by Bollerslev, Tauchen and Zhou and has the most citations. The variable had poor statistical performance, as well as poor performance in all four of the investment strategies.
Share of Housing Consumption (house): This macroeconomic variable introduced by Piazzesi, Schneider and Tuzel has the second-highest number of citations. It uses housing share of consumer spending to forecast the excess return of stocks. (The higher the spending on housing, the higher the excess returns in the stock market.) The variable had poor statistical performance on the extended dataset and poor performance in the investment strategies.
The Price of West-Texas Intermediate Crude Oil (wtexas): This was the only commodity-based variable and was introduced by Driesprong, Jacobsen and Maat. The paper posits that changes in the price of oil predict stock returns — higher oil prices lead to lower stock returns — with lags. The variable had poor statistical performance for the extended dataset and inconsistent performance in the investment strategies.
The First Principal Component of 14 Technical Indicators (tchi): This variable was introduced by Neely, Rapach, Tu and Zhou and is a linear combination of technical indicators including moving price averages, momentum and volume. It only had marginal statistical performance and inconsistent performance in the trading strategies.
About the Research
Goyal, A., Welch, I., & Zafirov, A. (2021). A Comprehensive Look at the Empirical Performance of Equity Premium Prediction II. http://dx.doi.org/10.2139/ssrn.3929119