This paper investigates the limit behavior of Markov Decision Processes (MDPs) made of independent particles evolving in a common environment, when the number of particles goes to infinity. In the finite horizon case or with a discounted cost and an infinite horizon, we show that when the number of particles becomes large, the optimal cost of the system converges almost surely to the optimal cost of a discrete deterministic system (the “optimal mean field”). Convergence also holds for optimal policies. We further provide insights on the speed of convergence by proving several central limit theorems for the cost and the state of the Markov decision process, with explicit formulas for the variance of the limit Gaussian laws. Then, our framework is applied to a brokering problem in grid computing. The optimal policy for the limit deterministic system is computed explicitly. Several simulations with growing numbers of processors are reported. They compare the performance of the optimal policy of the limit system used in the finite case with classical policies (such as Join the Shortest Queue) by measuring its asymptotic gain as well as the threshold above which it starts outperforming classical policies.
A Mean Field Approach for Optimization in Particles Systems and Applications
A mean field approach for optimization in particle systems and its applications
Abstract: This article examines the limit behavior of Markov decision processes made up of independent particles evolving in a common environment, as the number of particles tends to infinity. For a finite-horizon cost, or for an infinite-horizon discounted cost, we show that as the number of particles becomes large, the optimal cost of the system converges almost surely to the optimal cost of the deterministic system. Convergence also holds for the optimal policies.
In addition, we give insight into the speed of convergence by proving several central limit theorems for the cost and for the mean state of the process, with explicit formulas for the variance of the limit Gaussian laws.
Finally, this model is applied to a resource broker problem in computational grids. We give an explicit algorithm to compute the optimal policy of the limit, and then study several simulations with a varying number of processors. We compare the performance of the optimal policy of the limit, applied to the original system, with several classical policies (such as Join the Shortest Queue). We measure the asymptotic gain, as well as the threshold above which it outperforms the classical policies.
Keywords: Markov decision process, Mean field, Optimization, Particle systems, Resource broker
The general context of this paper is the optimization of the behavior of controlled Markovian systems, namely Markov Decision Processes composed of a large number of particles evolving in a common environment.
Consider a discrete time system made of N particles, N being large, that evolve randomly and independently (according to a transition probability kernel K). At each step, the state of each particle changes according to a probability kernel, depending on the environment. The evolution of the environment only depends on the number of particles in each state. Furthermore, at each step, a central controller makes a decision that changes the transition probability kernel. The problem addressed in this paper is to study the limit behavior of such systems when N becomes large and the speed of convergence to the limit.
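The dynamics described above can be illustrated with a minimal sketch of one time step. This is not the paper's formal model: the function names (`step`, `toy_kernel`, `toy_env_update`), the two-state kernel, the scalar environment and the binary action are all hypothetical choices made for illustration. The only features taken from the text are that particles move independently according to a kernel selected by the controller's decision and the environment, and that the environment is updated from the fraction of particles in each state.

```python
import numpy as np

def toy_kernel(action, env):
    """Hypothetical 2-state kernel: row s is the law of the next state from s."""
    p = 0.2 + 0.6 * action * env          # stays in [0.2, 0.8] for env in [0, 1]
    return np.array([[1.0 - p, p],
                     [0.5, 0.5]])

def toy_env_update(env, occupancy):
    """Hypothetical environment rule: depends only on the occupancy vector."""
    return 1.0 - occupancy[1]

def step(states, env, action, kernel, env_update, rng):
    """Advance the N-particle system by one time step.

    states : integer array of length N holding the state of each particle
    env    : current environment (a float in this toy example)
    action : decision taken by the central controller for this step
    """
    P = kernel(action, env)               # row-stochastic matrix, shape (S, S)
    n_states = P.shape[0]
    new_states = np.empty_like(states)
    for s in range(n_states):
        idx = np.flatnonzero(states == s)
        # Particles currently in state s move independently according to P[s].
        new_states[idx] = rng.choice(n_states, size=idx.size, p=P[s])
    # Fraction of particles in each state after the move.
    occupancy = np.bincount(new_states, minlength=n_states) / states.size
    new_env = env_update(env, occupancy)
    return new_states, new_env, occupancy
```

Calling `step(np.zeros(1000, dtype=int), 1.0, 1, toy_kernel, toy_env_update, np.random.default_rng(0))` advances 1000 particles that all start in state 0. Note that the controller's decision enters only through the choice of kernel, so the per-particle randomness stays independent given the environment.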
Several papers ([3], [6]) study the limit behavior of Markovian systems in the case of vanishing intensity (the expected number of transitions per time slot is o(N)). In these cases, the system converges to a differential system in continuous time. In the case considered here, time remains discrete at the limit. This requires a rather different approach to construct the limit.
In [8], discrete time systems are considered and the authors show that under certain conditions, as N grows large, a Markovian system made of N particles converges to a deterministic system. Since a Markov decision process can be seen as a family of Markovian kernels, the class of systems studied in [8] corresponds to the case where this family is reduced to a unique kernel and no decision can be made. Here, we show that under conditions similar to those in [8], a Markov decision process also converges to a deterministic one. More precisely, we show that the optimal costs (as well as the corresponding states) converge almost surely to the optimal costs (resp. the corresponding states) of a deterministic system (the “optimal mean field”).
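To make the convergence statement concrete, the following sketch compares the occupancy of a simulated N-particle system with a deterministic recursion, for a fixed sequence of decisions. Everything here is an assumption made for illustration: the toy two-state `kernel`, the rule `env_update`, and the recursion m(t+1) = m(t) P(a(t), c(t)) used as the candidate deterministic limit. It only illustrates the kind of convergence claimed in the text, not the paper's proof or its exact model.

```python
import numpy as np

def kernel(action, env):
    """Hypothetical 2-state kernel selected by the action and the environment."""
    p = 0.2 + 0.6 * action * env
    return np.array([[1.0 - p, p],
                     [0.5, 0.5]])

def env_update(env, occupancy):
    """Hypothetical environment rule driven by the occupancy vector only."""
    return 1.0 - occupancy[1]

def mean_field_path(m0, env0, actions):
    """Deterministic recursion: push the occupancy vector through the kernel."""
    m, env = np.asarray(m0, dtype=float), env0
    path = [m]
    for a in actions:
        m = m @ kernel(a, env)
        env = env_update(env, m)
        path.append(m)
    return np.array(path)

def empirical_path(n, m0, env0, actions, rng):
    """Occupancy of n independent particles under the same decisions."""
    counts = rng.multinomial(n, m0)       # initial number of particles per state
    env = env0
    path = [counts / n]
    for a in actions:
        P = kernel(a, env)
        # Particles in state s spread over next states with probabilities P[s].
        counts = sum(rng.multinomial(c, P[s]) for s, c in enumerate(counts))
        env = env_update(env, counts / n)
        path.append(counts / n)
    return np.array(path)

def max_gap(n, seed, actions=(1, 0, 1, 1, 0)):
    """Largest coordinate-wise gap between the simulated and deterministic paths."""
    rng = np.random.default_rng(seed)
    sim = empirical_path(n, [1.0, 0.0], 1.0, actions, rng)
    det = mean_field_path([1.0, 0.0], 1.0, actions)
    return float(np.abs(sim - det).max())
```

Evaluating `max_gap(n, seed=0)` for increasing `n` (for instance 100, 10_000 and 1_000_000) should show the gap shrinking. Given the central limit theorems mentioned in the abstract, one would expect a decay of order 1/sqrt(N), although this toy experiment does not establish it.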
From a practical point of view, this allows one to compute the optimal policy in a deterministic system, which can often be done very efficiently, and then to use this policy in the original random system as a good approximation of the optimal policy, which cannot be computed efficiently because of the curse of dimensionality. This is illustrated by an application of our framework to optimal brokering in computational grids. We consider a set of multi-processor clusters (forming a computational grid, like EGEE [1]) and a set of users submitting tasks to be executed. A central broker assigns the tasks to the clusters (where tasks are buffered and served in FIFO order) and tries to minimize the average processing time of all tasks. Computing the optimal policy (solving the associated MDP) is known to be hard [13]. Numerical computations can only be carried out up to a total of 10 processors and two users. However, our approach shows that when the number of processors per cluster and the number of users submitting tasks grow, the system converges to a mean field deterministic system. For this deterministic mean field system, the optimal brokering policy can be explicitly computed. Simulations reported in Section 4 show that using this policy over a grid with a growing number of processors makes performance converge to the optimal sojourn time in a deterministic system, as expected. Also, simulations show that this deterministic static poli
…(Full text truncated)…