Integrated Prediction and Multi-period Portfolio Optimization

Reading time: 6 minutes

📝 Abstract

Multi-period portfolio optimization is important for real portfolio management, as it accounts for transaction costs, path-dependent risks, and the intertemporal structure of trading decisions that single-period models cannot capture. Classical methods usually follow a two-stage framework: machine learning algorithms are employed to produce forecasts that closely fit the realized returns, and the predicted values are then used in a downstream portfolio optimization problem to determine the asset weights. This separation leads to a fundamental misalignment between predictions and decision outcomes, while also ignoring the impact of transaction costs. To bridge this gap, recent studies have proposed the idea of end-to-end learning, integrating the two stages into a single pipeline. This paper introduces IPMO (Integrated Prediction and Multi-period Portfolio Optimization), a model for multi-period mean-variance portfolio optimization with turnover penalties. The predictor generates multi-period return forecasts that parameterize a differentiable convex optimization layer, which in turn drives learning via portfolio performance. For scalability, we introduce a mirror-descent fixed-point (MDFP) differentiation scheme that avoids factorizing the Karush-Kuhn-Tucker (KKT) systems, yielding stable implicit gradients and a nearly scale-insensitive runtime as the decision horizon grows. In experiments with real market data and two representative time-series prediction models, the IPMO method consistently outperforms two-stage benchmarks in risk-adjusted performance net of transaction costs and achieves more coherent allocation paths. Our results show that integrating machine learning prediction with optimization in the multi-period setting improves financial outcomes and remains computationally tractable.
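The paper does not spell out the MDFP scheme here, but the mirror-descent primitive it builds on is easy to illustrate: entropic mirror descent (exponentiated gradient) keeps iterates strictly inside the probability simplex, so no projection or KKT factorization is ever needed, and its fixed point satisfies the simplex optimality conditions. The objective and all numbers below are toy illustrations, not the paper's model.

```python
import math

def mirror_descent_simplex(grad, w0, eta=2.0, iters=3000):
    """Entropic mirror descent (exponentiated gradient) on the simplex.

    Each step multiplies weights by exp(-eta * g_i) and renormalizes, so
    iterates stay strictly positive and sum to one -- no projection step
    and no KKT factorization are needed.  At a fixed point the gradient
    is constant across held assets, which is the simplex KKT condition.
    """
    w = list(w0)
    for _ in range(iters):
        g = grad(w)
        w = [wi * math.exp(-eta * gi) for wi, gi in zip(w, g)]
        s = sum(w)
        w = [wi / s for wi in w]
    return w

# Toy mean-variance loss f(w) = -mu'w + (gamma/2) * w' diag(v) w
mu = [0.10, 0.05, 0.02]   # illustrative expected returns
v = [0.04, 0.02, 0.01]    # illustrative (diagonal) variances
gamma = 5.0               # risk-aversion coefficient
grad = lambda w: [-mu[i] + gamma * v[i] * w[i] for i in range(3)]

w_star = mirror_descent_simplex(grad, [1 / 3, 1 / 3, 1 / 3])
# w_star ≈ [0.443, 0.386, 0.171], matching the closed-form simplex optimum
```

Because every iterate is produced by smooth multiplicative updates, the whole trajectory is differentiable, which is what makes fixed-point (rather than KKT-based) implicit gradients possible.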

📄 Content

The mean-variance optimization of Markowitz [33] establishes the foundation of modern portfolio theory. This framework has been widely applied to portfolio problems, where the optimal portfolio often maximizes a one-step-ahead objective. Despite its simplicity, this greedy strategy may lead to suboptimal solutions in the long run. A large out-of-sample study by DeMiguel et al. [15] demonstrates that iteratively applying single-period mean-variance optimization underperforms a simple equal-weights rule, indicating the instability of this paradigm in real-world settings.
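As a concrete toy instance of the one-step-ahead objective above: with no constraints, maximizing mu'w - (gamma/2) w'Sigma w has the closed-form solution w* = Sigma^{-1} mu / gamma. The two-asset numbers below are purely illustrative.

```python
def single_period_mv(mu, sigma, gamma):
    """Unconstrained single-period mean-variance optimum for 2 assets:
    maximize mu'w - (gamma/2) w' Sigma w  =>  w* = Sigma^{-1} mu / gamma."""
    (a, b), (c, d) = sigma
    det = a * d - b * c                      # 2x2 inverse by hand
    inv = [[d / det, -b / det], [-c / det, a / det]]
    return [(inv[0][0] * mu[0] + inv[0][1] * mu[1]) / gamma,
            (inv[1][0] * mu[0] + inv[1][1] * mu[1]) / gamma]

mu = [0.08, 0.04]                            # illustrative expected returns
sigma = [[0.04, 0.01], [0.01, 0.02]]         # illustrative covariance matrix
w = single_period_mv(mu, sigma, gamma=4.0)
# w ≈ [0.429, 0.286]
```

Re-solving this problem at every rebalancing date is exactly the greedy strategy the studies above find unstable: nothing in the formula looks past the next period or prices the cost of moving between solutions.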

One extension of the Markowitz model is multi-period mean-variance optimization, which addresses multi-stage allocation decisions over a finite horizon. Unlike single-period models, multi-period formulations incorporate transaction costs and path-dependent risk, making the problem both more realistic and significantly more challenging. Classical work approaches this setting through stochastic programming. Dantzig and Infanger [14] formulate the multi-period portfolio problem as a multi-stage stochastic linear program and develop decomposition-based algorithms to handle the resulting scenario tree. However, these methods can become computationally intractable without distributional assumptions, as the number of branches grows exponentially with the horizon. Boyd et al. [10] propose the model predictive control (MPC) approach for multi-period portfolio optimization, where the unknown parameters are replaced with their forecasts over the planning horizon.

Single-period formulations, such as the Black-Litterman model [7] and the mean-semivariance model [17], remain popular among practitioners and academics. These frameworks maximize the risk-return trade-off under different metrics and constraints by selecting asset weights. In most empirical studies, the optimal portfolio is re-estimated periodically as the parameters change over time [20,11,26]. Nonetheless, it has been shown that greedy single-period optimization may lead to suboptimal solutions over the long run [15]. Gârleanu and Pedersen [18] show that in the presence of transaction costs, the optimal strategy is to rebalance gradually towards a weighted average of the current target portfolio and the expected future target portfolios, highlighting the necessity of multi-stage planning.
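The intertemporal coupling described above can be made concrete by writing out a horizon objective of the form sum_t [ mu_t'w_t - (gamma/2) w_t' Sigma_t w_t - lam * ||w_t - w_{t-1}||_1 ]. The L1 turnover penalty here is an assumed stand-in for a transaction-cost term, not the paper's exact formulation; all numbers are illustrative.

```python
def multi_period_objective(weights, mus, sigmas, w_prev, gamma=5.0, lam=0.002):
    """Evaluate sum_t [ mu_t'w_t - (gamma/2) w_t' Sigma_t w_t
                        - lam * ||w_t - w_{t-1}||_1 ].
    The L1 turnover term couples consecutive rebalancing dates: a path
    that churns pays transaction costs a greedy one-period optimizer
    never sees."""
    n = len(w_prev)
    total = 0.0
    for w, mu, sigma in zip(weights, mus, sigmas):
        ret = sum(mu[i] * w[i] for i in range(n))
        risk = sum(w[i] * sigma[i][j] * w[j]
                   for i in range(n) for j in range(n))
        turnover = sum(abs(w[i] - w_prev[i]) for i in range(n))
        total += ret - 0.5 * gamma * risk - lam * turnover
        w_prev = w
    return total

# Two candidate paths over a 2-step horizon, identical per-period forecasts:
mus = [[0.05, 0.05]] * 2
sigmas = [[[0.04, 0.0], [0.0, 0.04]]] * 2
steady = multi_period_objective([[0.5, 0.5], [0.5, 0.5]], mus, sigmas, [0.5, 0.5])
churn = multi_period_objective([[1.0, 0.0], [0.0, 1.0]], mus, sigmas, [0.5, 0.5])
# steady > churn: the concentrated, churning path loses to risk and turnover
```

This is the quantity a multi-period optimizer maximizes jointly over all dates, whereas a greedy scheme maximizes each summand in isolation and ignores the w_{t-1} coupling.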

There are various methods to approach multi-period portfolio optimization. Building on the classical stochastic programming formulation [14], subsequent work optimizes expected terminal wealth or the utility of terminal performance over the investment horizon [41]. However, such objectives lead to time-inconsistent decisions, as the allocation may not remain optimal as time progresses [29]. Instead of applying a static pre-commitment strategy, later studies examine multi-period mean-variance optimization in a dynamic setting [4]. Yu and Chang [40] fit a neural network to produce multi-period return forecasts from simulations of economic factors; the predictions are then passed into a Mean-CVaR optimizer to obtain portfolio weights. Building on the model predictive control (MPC) formulation [10], several studies adopt related “predict-then-optimize” schemes for multi-period allocation [36,30], where forecasts of returns and risks are passed into a deterministic optimization layer at each rebalancing date. On the other hand, some studies rely on methods such as deep reinforcement learning to generate portfolio weights directly, without an explicit optimization step [13,21]. Motivated by the limitations of both pipelines, a complementary direction makes the optimizer part of the learning model, so that forecasts are shaped directly by the portfolio objective.

Decision-focused learning. The decision-focused line in portfolio optimization builds on differentiable optimization. Amos and Kolter [2] introduce OptNet, which embeds quadratic programs as neural layers via implicit differentiation of the KKT system, and Agrawal et al. [1] generalize this capability to disciplined convex programs through CvxpyLayer. Blondel et al. [8] present a general method for applying automatic implicit differentiation to a broad range of optimization problems. More recent studies further improve the computational tractability of large-scale end-to-end learning. BPQP [37] simplifies the backward pass into a QP and adopts efficient solvers for the forward and backward passes separately; this decoupling provides greater flexibility in solver choice and accelerates the optimization. dQP [32] identifies the active constraint set to prune the KKT system, reducing the cost of large-scale sparse problems. Building on these tools, finance studies move the portfolio optimization objective inside the learning loop in diverse settings. Lee et al. [27] study how decision-focused learning reshapes return predictors to improve decision quality, and complementary end-to-end GMV results are reported by Bongiorno et al. [9]. Uysal et al. [38] optimize risk-budgeting objectives in an end-to-end manner.
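The implicit-differentiation idea behind these layers can be shown on a QP small enough to solve in closed form: min_w (1/2) sum_i q_i w_i^2 - p'w subject to sum_i w_i = 1. Differentiating the KKT conditions yields the Jacobian dw*/dp without unrolling any solver iterations. This is an illustrative sketch of the mechanism, not the API of any library above; all names and numbers are ours.

```python
def qp_layer(p, q):
    """Forward pass: solve  min_w 0.5 * sum(q_i * w_i**2) - p'w
    s.t. sum(w) = 1  in closed form.  KKT stationarity gives
    q_i * w_i - p_i + nu = 0, so w_i = (p_i - nu) / q_i, with the
    multiplier nu pinned down by the budget constraint."""
    s = sum(1.0 / qi for qi in q)
    nu = (sum(pi / qi for pi, qi in zip(p, q)) - 1.0) / s
    return [(pi - nu) / qi for pi, qi in zip(p, q)]

def qp_layer_jacobian(q):
    """Backward pass: differentiate the KKT system implicitly.
    From  q_i * dw_i + dnu = dp_i  and  sum(dw) = 0  we get
    dnu/dp_j = (1/q_j) / sum_k(1/q_k)  and
    dw_i/dp_j = (delta_ij - dnu/dp_j) / q_i."""
    s = sum(1.0 / qi for qi in q)
    n = len(q)
    return [[((1.0 if i == j else 0.0) - (1.0 / q[j]) / s) / q[i]
             for j in range(n)] for i in range(n)]

p, q = [0.10, 0.05, 0.02], [0.2, 0.1, 0.05]
w = qp_layer(p, q)        # optimal weights; they sum to exactly 1
J = qp_layer_jacobian(q)  # each column of dw*/dp sums to 0, since the
                          # budget constraint is preserved under perturbation
```

A finite-difference check of `qp_layer` against `qp_layer_jacobian` confirms the two passes agree; OptNet-style layers do the same thing for general QPs by factorizing the full KKT matrix instead of using a closed form.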

This content is AI-processed based on ArXiv data.
