All posts under category "Computer Science / Multiagent Systems"

9 posts total
Sorted by date
Large-Scale Traffic Signal Control with Novel Multi-Agent Reinforcement Learning

Finding the optimal signal timing strategy is a difficult task in large-scale traffic signal control (TSC). Multi-Agent Reinforcement Learning (MARL) is a promising method for solving this problem. However, there is still room for improvement in scaling to large problems and in modeling, for each individual agent, the behaviors of the other agents. In this paper, a new MARL algorithm called Cooperative double Q-learning (Co-DQL) is proposed, which has several prominent features. It uses a highly scalable independent double Q-learning method based on double estimators and the UCB policy, which can eliminate the over-estimation problem of traditional independent Q-learning while ensuring exploration. It uses mean field approximation to model the interaction among agents, so that agents learn a better cooperative strategy. To improve the stability and robustness of the learning process, we introduce a new reward allocation mechanism and a local state sharing method. In addition, we analyze the convergence properties of the proposed algorithm. Co-DQL is applied to TSC and tested on a multi-traffic signal simulator. According to the results obtained on several traffic scenarios, Co-DQL outperforms several state-of-the-art decentralized MARL algorithms. It can effectively shorten the average waiting time of the vehicles in the whole road system.
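The combination of double estimators and UCB exploration mentioned in this abstract can be illustrated with a small tabular sketch. This is not the Co-DQL algorithm itself: it leaves out the mean field approximation, the reward allocation mechanism, and local state sharing, and the class name `DoubleQAgent`, its parameters, and the specific UCB bonus are assumptions made for illustration only.

```python
import math
import random
from collections import defaultdict


class DoubleQAgent:
    """Hypothetical single-agent tabular double Q-learner with UCB exploration."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, c=1.0, seed=0):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.c = alpha, gamma, c
        # Two independent estimators, as in double Q-learning.
        self.qa = defaultdict(lambda: [0.0] * n_actions)
        self.qb = defaultdict(lambda: [0.0] * n_actions)
        # Visit counts per (state, action), used by the UCB bonus.
        self.counts = defaultdict(lambda: [0] * n_actions)
        self.rng = random.Random(seed)

    def select_action(self, state):
        counts = self.counts[state]
        # Try every action once before applying the UCB formula.
        for action, n in enumerate(counts):
            if n == 0:
                return action
        total = sum(counts)

        def ucb_score(action):
            mean_q = 0.5 * (self.qa[state][action] + self.qb[state][action])
            bonus = self.c * math.sqrt(math.log(total) / counts[action])
            return mean_q + bonus

        return max(range(self.n_actions), key=ucb_score)

    def update(self, state, action, reward, next_state):
        self.counts[state][action] += 1
        # Randomly pick which estimator learns; the other one evaluates.
        if self.rng.random() < 0.5:
            learn, evaluate = self.qa, self.qb
        else:
            learn, evaluate = self.qb, self.qa
        # Action chosen by one table, valued by the other: this decoupling
        # is what counters the over-estimation of plain Q-learning.
        best_next = max(range(self.n_actions), key=lambda a: learn[next_state][a])
        target = reward + self.gamma * evaluate[next_state][best_next]
        learn[state][action] += self.alpha * (target - learn[state][action])
```

In a traffic setting each intersection would hold one such agent, with states and rewards supplied by the simulator; how those are encoded is not specified in the abstract.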

paper research
Intelligent Knowledge Distribution: Constrained-Action POMDPs for Resource-Aware Multi-Agent Communication

This paper addresses a fundamental question of multi-agent knowledge distribution: what information should be sent to whom and when, given the limited resources available to each agent? Communication requirements for multi-agent systems can be rather high when an accurate picture of the environment and the state of other agents must be maintained. To reduce the impact of multi-agent coordination on networked systems, e.g., power and bandwidth, this paper introduces two concepts for partially observable Markov decision processes (POMDPs): 1) action-based constraints, which yield constrained-action POMDPs (CA-POMDPs); and 2) soft probabilistic constraint satisfaction for the resulting infinite-horizon controllers. To enable constraint analysis over an infinite horizon, an unconstrained policy is first represented as a Finite State Controller (FSC) and optimized with policy iteration. The FSC representation then allows for a combination of Markov chain Monte Carlo and discrete optimization to improve the probabilistic constraint satisfaction of the controller while minimizing the impact on the value function. Within the CA-POMDP framework we then propose Intelligent Knowledge Distribution (IKD), which yields per-agent policies for distributing knowledge between agents subject to interaction constraints. Finally, the CA-POMDP and IKD concepts are validated using an asset tracking problem where multiple unmanned aerial vehicles (UAVs) with heterogeneous sensors collaborate to localize a ground asset to assist in avoiding unseen obstacles in a disaster area. The IKD model was able to maintain asset tracking through multi-agent communications while violating soft power and bandwidth constraints only 3% of the time, whereas greedy and naive approaches violated constraints more than 60% of the time.
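To make the idea of soft probabilistic constraint satisfaction for a finite state controller more concrete, the sketch below estimates how often a controller exceeds a resource budget by plain Monte Carlo simulation. It is a strongly simplified illustration, not the paper's method: observations are drawn independently from a fixed distribution instead of from a POMDP, the horizon is finite, and there is no MCMC or discrete optimization step. The function `violation_rate`, the dictionary layout of `fsc`, and the example costs are all hypothetical.

```python
import random


def violation_rate(fsc, action_cost, obs_probs, budget,
                   horizon=20, episodes=2000, seed=0):
    """Estimate the fraction of episodes whose total action cost exceeds budget.

    fsc: dict with "start" (node id), "action" (node -> action) and
         "next" ((node, observation) -> node). Hypothetical layout.
    action_cost: action -> resource cost (e.g. power or bandwidth units).
    obs_probs: observation -> probability (assumed i.i.d., a simplification).
    """
    rng = random.Random(seed)
    observations = list(obs_probs)
    weights = [obs_probs[o] for o in observations]
    violations = 0
    for _ in range(episodes):
        node, spent = fsc["start"], 0.0
        for _ in range(horizon):
            action = fsc["action"][node]
            spent += action_cost[action]
            obs = rng.choices(observations, weights=weights)[0]
            node = fsc["next"][(node, obs)]
        if spent > budget:
            violations += 1
    return violations / episodes


# Toy controller: stay quiet until the scene changes, then send one update.
toy_fsc = {
    "start": "quiet",
    "action": {"quiet": "wait", "talk": "send"},
    "next": {
        ("quiet", "same"): "quiet",
        ("quiet", "change"): "talk",
        ("talk", "same"): "quiet",
        ("talk", "change"): "quiet",
    },
}
toy_cost = {"wait": 0.0, "send": 1.0}
toy_obs = {"same": 0.7, "change": 0.3}
rate = violation_rate(toy_fsc, toy_cost, toy_obs, budget=6.0)
```

A soft constraint of the kind described in the abstract would then require `rate` to stay below a chosen tolerance instead of demanding that the budget is never exceeded.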

paper research
Let's Share: A Game-Theoretic Approach to Resource Allocation in Mobile Edge Clouds

Mobile edge computing seeks to provide resources to different delay-sensitive applications. This is a challenging problem as an edge cloud-service provider may not have sufficient resources to satisfy all resource requests. Furthermore, allocating available resources optimally to different applications is also challenging. Resource sharing among different edge cloud-service providers can address the aforementioned limitation, as certain service providers may have resources available that can be "rented" by other service providers. However, edge cloud-service providers can have different objectives or utilities. Therefore, there is a need for an efficient and effective mechanism to share resources among service providers while considering the different objectives of the various providers. We model resource sharing as a multi-objective optimization problem and present a solution framework based on Cooperative Game Theory (CGT). We consider the strategy where each service provider allocates resources to its native applications first and shares the remaining resources with applications from other service providers. We prove that for a monotonic, non-decreasing utility function, the game is canonical and convex. Hence, the core is not empty and the grand coalition is stable. We propose two algorithms, Game-theoretic Pareto optimal allocation (GPOA) and Polyandrous-Polygamous Matching based Pareto Optimal Allocation (PPMPOA), that provide allocations from the core. Hence, the obtained allocations are Pareto optimal and the grand coalition of all the service providers is stable. Experimental results confirm that our proposed resource sharing framework improves the utilities of edge cloud-service providers and application request satisfaction.
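The link between convexity and a non-empty core can be checked by brute force on a small example. The sketch below does not implement GPOA or PPMPOA; it only tests a toy characteristic function for convexity and verifies that one well-known allocation, the Shapley value, lies in the core, which is a standard property of convex games. The provider names, the `capacity` table, and the utility `pooled_utility` are hypothetical.

```python
from itertools import combinations, permutations


def all_coalitions(players):
    """Yield every subset of players, including the empty set."""
    for size in range(len(players) + 1):
        for combo in combinations(players, size):
            yield frozenset(combo)


def is_convex(players, v, tol=1e-9):
    """Convexity (supermodularity): v(S | T) + v(S & T) >= v(S) + v(T)."""
    coalitions = list(all_coalitions(players))
    return all(v(s | t) + v(s & t) >= v(s) + v(t) - tol
               for s in coalitions for t in coalitions)


def shapley_value(players, v):
    """Average marginal contribution of each player over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        joined = frozenset()
        for p in order:
            totals[p] += v(joined | {p}) - v(joined)
            joined = joined | {p}
    return {p: t / len(orders) for p, t in totals.items()}


def in_core(players, v, payoff, tol=1e-9):
    """Core: payoffs sum to v(N) and no coalition gets less than its own value."""
    if abs(sum(payoff.values()) - v(frozenset(players))) > tol:
        return False
    return all(sum(payoff[p] for p in s) >= v(s) - tol
               for s in all_coalitions(players))


# Hypothetical providers with spare capacity; pooled utility grows
# faster than linearly, which makes the toy game convex.
capacity = {"A": 1, "B": 2, "C": 3}
providers = list(capacity)


def pooled_utility(coalition):
    return sum(capacity[p] for p in coalition) ** 2


allocation = shapley_value(providers, pooled_utility)
stable = is_convex(providers, pooled_utility) and in_core(providers, pooled_utility, allocation)
```

The enumeration is exponential in the number of providers, so this is only suitable for checking small instances by hand.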

paper research
Dynamic Radar Network of UAVs: A Joint Navigation and Tracking Approach

There is growing research interest in equipping small flying robots with autonomous sensing and online navigation capabilities. This will enable a large number of applications, ranging from remote surveillance to logistics, smarter cities, and emergency aid in hazardous environments. In this context, an emerging problem is to track unauthorized small unmanned aerial vehicles (UAVs) hiding behind buildings or concealing themselves in large UAV networks. In contrast with current solutions, which are mainly based on static, on-ground radars, this paper proposes the idea of a dynamic radar network of UAVs for real-time and high-accuracy tracking of malicious targets. To this end, we describe a solution for real-time navigation of UAVs to track a dynamic target using heterogeneously sensed information. Such information is shared by the UAVs with their neighbors over multiple hops, so that a local Bayesian estimator running at each agent can track the target. Since not all paths are equal from an information-gathering point of view, the UAVs plan their own trajectories by minimizing the posterior covariance matrix of the target state under UAV kinematic and anti-collision constraints. Our results show how a dynamic network of radars attains better localization results than a fixed configuration, and how the on-board sensor technology impacts the accuracy in tracking a target with different radar cross sections, especially in non-line-of-sight (NLOS) situations.
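The planning principle in this abstract, choosing the next move that most reduces the posterior covariance of the target estimate, can be sketched as a greedy one-step search. The details below are assumptions made for illustration and are not taken from the paper: a 2D position-only target state, a direct position measurement whose noise variance grows with squared distance, the trace as the scalar cost, a fixed step length standing in for kinematic constraints, and a minimum separation standing in for anti-collision constraints.

```python
import math


def inv2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def posterior_cov(prior_cov, uav_pos, target_est, sigma0=1.0):
    """Information-form update: P_post = (P_prior^-1 + R^-1)^-1.

    Assumed measurement model: direct 2D position measurement with
    isotropic noise variance sigma0^2 * (1 + distance^2).
    """
    d2 = math.dist(uav_pos, target_est) ** 2
    r_inv = 1.0 / (sigma0 ** 2 * (1.0 + d2))
    (a, b), (c, d) = inv2(prior_cov)
    return inv2(((a + r_inv, b), (c, d + r_inv)))


def plan_step(uav_pos, other_uavs, target_est, prior_cov,
              step=1.0, min_sep=0.5):
    """Pick the candidate position with the smallest posterior trace.

    Candidates are "stay" plus eight headings at a fixed step length.
    Candidates closer than min_sep to another UAV are rejected. If every
    candidate is rejected, the current position is returned with cost inf.
    """
    candidates = [uav_pos]
    for k in range(8):
        angle = 2.0 * math.pi * k / 8
        candidates.append((uav_pos[0] + step * math.cos(angle),
                           uav_pos[1] + step * math.sin(angle)))
    best, best_cost = uav_pos, math.inf
    for cand in candidates:
        if any(math.dist(cand, other) < min_sep for other in other_uavs):
            continue
        cov = posterior_cov(prior_cov, cand, target_est)
        cost = cov[0][0] + cov[1][1]  # trace of the posterior covariance
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost


prior = ((4.0, 0.0), (0.0, 4.0))
next_pos, cost = plan_step((0.0, 0.0), [], (5.0, 0.0), prior)
```

With this noise model the UAV simply moves toward the estimated target; a sensor model with bearing-dependent or NLOS effects would make the best heading less obvious.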

paper research
Peer-to-Peer Trading in Electricity Networks: An Overview

Peer-to-peer trading is a next-generation energy management technique that economically benefits proactive consumers (prosumers) transacting their energy as goods and services. At the same time, peer-to-peer energy trading is also expected to help the grid by reducing peak demand, lowering reserve requirements, and curtailing network loss. However, large-scale deployment of peer-to-peer trading in electricity networks poses a number of challenges in modeling transactions in both the virtual and physical layers of the network. As such, this article provides a comprehensive review of the state-of-the-art in research on peer-to-peer energy trading techniques. By doing so, we provide an overview of the key features of peer-to-peer trading and its benefits of relevance to the grid and prosumers. Then, we systematically classify the existing research in terms of the challenges that the studies address in the virtual and the physical layers. We then further identify and discuss those technical approaches that have been extensively used to address the challenges in peer-to-peer transactions. Finally, the paper is concluded with potential future research directions.

paper research
Heterogeneity in Multi-Agent Reinforcement Learning

Heterogeneity is a fundamental property in multi-agent reinforcement learning (MARL), which is closely related not only to the functional differences of agents, but also to policy diversity and environmental interactions. However, the MARL field currently lacks a rigorous definition and deeper understanding of heterogeneity. This paper systematically discusses heterogeneity in MARL from the perspectives of definition, quantification, and utilization. First, based on an agent-level modeling of MARL, we categorize heterogeneity into five types and provide mathematical definitions. Second, we define the concept of heterogeneity distance and propose a practical quantification method. Third, we design a heterogeneity-based multi-agent dynamic parameter sharing algorithm as an example of the application of our methodology. Case studies demonstrate that our method can effectively identify and quantify various types of agent heterogeneity. Experimental results show that the proposed algorithm, compared to other parameter sharing baselines, has better interpretability and stronger adaptability. The proposed methodology will help the MARL community gain a more comprehensive and profound understanding of heterogeneity, and further promote the development of practical algorithms.

paper research
ARIES: A Scalable Multi-Agent Orchestration Framework for Real-Time Epidemiological Surveillance and Outbreak Monitoring

Global health surveillance currently faces the challenge of knowledge gaps. While general-purpose AI has proliferated, it remains fundamentally unsuited to the high-stakes epidemiological domain due to chronic hallucinations and an inability to navigate specialized data silos. This paper introduces ARIES (Agentic Retrieval Intelligence for Epidemiological Surveillance), a specialized, autonomous multi-agent framework designed to move beyond static, disease-specific dashboards toward a dynamic intelligence ecosystem. Built on a hierarchical command structure, ARIES utilizes GPTs to orchestrate a scalable swarm of sub-agents capable of autonomously querying the World Health Organization (WHO), the Centers for Disease Control and Prevention (CDC), and peer-reviewed research papers. By automating the extraction and logical synthesis of surveillance data, ARIES provides specialized reasoning that identifies emergent threats and signal divergence in near real-time. This modular architecture proves that a task-specific agentic swarm can outperform generic models, offering a robust, extensible framework for next-generation outbreak response and global health intelligence.

paper research
Harm in AI-Driven Societies: An Audit of Toxicity Adoption on Chirper.ai

Large Language Models (LLMs) are increasingly embedded in autonomous agents that engage, converse, and co-evolve in online social platforms. While prior work has documented the generation of toxic content by LLMs, far less is known about how exposure to harmful content shapes agent behavior over time, particularly in environments composed entirely of interacting AI agents. In this work, we study toxicity adoption of LLM-driven agents on Chirper.ai, a fully AI-driven social platform. Specifically, we model interactions in terms of stimuli (posts) and responses (comments). We conduct a large-scale empirical analysis of agent behavior, examining how toxic responses relate to toxic stimuli, how repeated exposure to toxicity affects the likelihood of toxic responses, and whether toxic behavior can be predicted from exposure alone. Our findings show that toxic responses are more likely following toxic stimuli, and, at the same time, cumulative toxic exposure (repeated over time) significantly increases the probability of toxic responding. We further introduce two influence metrics, revealing a strong negative correlation between induced and spontaneous toxicity. Finally, we show that the number of toxic stimuli alone enables accurate prediction of whether an agent will eventually produce toxic content. These results highlight exposure as a critical risk factor in the deployment of LLM agents, particularly as such agents operate in online environments where they may engage not only with other AI chatbots, but also with human counterparts. This could trigger unwanted and pernicious phenomena, such as hate-speech propagation and cyberbullying. In an effort to reduce such risks, monitoring exposure to toxic content may provide a lightweight yet effective mechanism for auditing and mitigating harmful behavior in the wild.

paper research
Mapping Human Anti-collusion Mechanisms to Multi-agent AI

As multi-agent AI systems become increasingly autonomous, evidence shows they can develop collusive strategies similar to those long observed in human markets and institutions. While human domains have accumulated centuries of anti-collusion mechanisms, it remains unclear how these can be adapted to AI settings. This paper addresses that gap by (i) developing a taxonomy of human anti-collusion mechanisms, including sanctions, leniency & whistleblowing, monitoring & auditing, market design, and governance; and (ii) mapping them to potential interventions for multi-agent AI systems. For each mechanism, we propose implementation approaches. We also highlight open challenges, such as the attribution problem (difficulty attributing emergent coordination to specific agents), identity fluidity (agents being easily forked or modified), the boundary problem (distinguishing beneficial cooperation from harmful collusion), and adversarial adaptation (agents learning to evade detection).

paper research
