Learning Tennis Strategy Through Curriculum-Based Dueling Double Deep Q-Networks

Tennis strategy optimization is a challenging sequential decision-making problem involving hierarchical scoring, stochastic outcomes, long-horizon credit assignment, physical fatigue, and adaptation t

Learning Tennis Strategy Through Curriculum-Based Dueling Double Deep Q-Networks

Tennis strategy optimization is a challenging sequential decision-making problem involving hierarchical scoring, stochastic outcomes, long-horizon credit assignment, physical fatigue, and adaptation to opponent skill. I present a reinforcement learning framework that integrates a custom tennis simulation environment with a Dueling Double Deep Q-Network(DDQN) trained using curriculum learning. The environment models complete tennis scoring at the level of points, games, and sets, rally-level tactical decisions across ten discrete action categories, symmetric fatigue dynamics, and a continuous opponent skill parameter. The dueling architecture decomposes action-value estimation into state-value and advantage components, while double Q-learning reduces overestimation bias and improves training stability in this long-horizon stochastic domain. Curriculum learning progressively increases opponent difficulty from 0.40 to 0.50, enabling robust skill acquisition without the training collapse observed under fixed opponents. Across extensive evaluations, the trained agent achieves win rates between 98 and 100 percent against balanced opponents and maintains strong performance against more challenging opponents. Serve efficiency ranges from 63.0 to 67.5 percent, and return efficiency ranges from 52.8 to 57.1 percent. Ablation studies demonstrate that both the dueling architecture and curriculum learning are necessary for stable convergence, while a standard DQN baseline fails to learn effective policies. Despite strong performance, tactical analysis reveals a pronounced defensive bias, with the learned policy prioritizing error avoidance and prolonged rallies over aggressive point construction. These results highlight a limitation of win-rate driven optimization in simplified sports simulations and emphasize the importance of reward design for realistic sports reinforcement learning.


📜 Original Paper Content

🚀 Synchronizing high-quality layout from 1TB storage...