Electrical Engineering and Systems Science

All posts under category "Electrical Engineering and Systems Science"

36 posts total
Sorted by date
No Image

Event-Triggered Stabilization Using Lyapunov Functions with Guaranteed Convergence Rate

A constructive tool of nonlinear control systems design, the method of Control Lyapunov Functions (CLF) has found numerous applications in stabilization problems for continuous time, discrete-time and hybrid systems. In this paper, we address the fundamental question given a CLF, corresponding to the continuous-time controller with some predefined (e.g. exponential) convergence rate, can the same convergence rate be provided by an event-triggered controller? Under certain assumptions, we give an affirmative answer to this question and show that the corresponding event-based controllers provide positive dwelltimes between the consecutive events. Furthermore, we prove the existence of self-triggered and periodic event-triggered controllers, providing stabilization with a known convergence rate.

paper research
Clock Synchronization Over Networks  Identifiability of the Sawtooth Model

Clock Synchronization Over Networks Identifiability of the Sawtooth Model

In this paper, we analyze the two-node joint clock synchronization and ranging problem. We focus on the case of nodes that employ time-to-digital converters to determine the range between them precisely. This specific design choice leads to a sawtooth model for the captured signal, which has not been studied before from an estimation theoretic standpoint. In the study of this model, we recover the basic conclusion of a well-known article by Freris, Graham, and Kumar in clock synchronization. More importantly, we discover a surprising identifiability result on the sawtooth signal model noise improves the theoretical condition of the estimation of the phase and offset parameters. To complete our study, we provide performance references for joint clock synchronization and ranging using the sawtooth signal model by presenting an exhaustive simulation study on basic estimation strategies under different realistic conditions. With our contributions in this paper, we enable further research in the estimation of sawtooth signal models and pave the path towards their industrial use for clock synchronization and ranging.

paper research
A Review of Autonomous Driving  Current Practices and Emerging Technologies

A Review of Autonomous Driving Current Practices and Emerging Technologies

Automated driving systems (ADSs) promise a safe, comfortable and efficient driving experience. However, fatalities involving vehicles equipped with ADSs are on the rise. The full potential of ADSs cannot be realized unless the robustness of state-of-the-art improved further. This paper discusses unsolved problems and surveys the technical aspect of automated driving. Studies regarding present challenges, high-level system architectures, emerging methodologies and core functions localization, mapping, perception, planning, and human machine interface, were thoroughly reviewed. Furthermore, the state-of-the-art was implemented on our own platform and various algorithms were compared in a real-world driving setting. The paper concludes with an overview of available datasets and tools for ADS development.

paper research
A Local Direct Method for Identifying Modules in Dynamic Networks with Correlated Noise

A Local Direct Method for Identifying Modules in Dynamic Networks with Correlated Noise

The identification of local modules in dynamic networks with known topology has recently been addressed by formulating conditions for arriving at consistent estimates of the module dynamics, under the assumption of having disturbances that are uncorrelated over the different nodes. The conditions typically reflect the selection of a set of node signals that are taken as predictor inputs in a MISO identification setup. In this paper an extension is made to arrive at an identification setup for the situation that process noises on the different node signals can be correlated with each other. In this situation the local module may need to be embedded in a MIMO identification setup for arriving at a consistent estimate with maximum likelihood properties. This requires the proper treatment of confounding variables. The result is a set of algorithms that, based on the given network topology and disturbance correlation structure, selects an appropriate set of node signals as predictor inputs and outputs in a MISO or MIMO identification setup. Three algorithms are presented that differ in their approach of selecting measured node signals. Either a maximum or a minimum number of measured node signals can be considered, as well as a preselected set of measured nodes.

paper research
Achieving State Synchronization in Homogeneous Networks of Non-Introspective Agents with Input Saturation  A Scale-Free Protocol Design Approach

Achieving State Synchronization in Homogeneous Networks of Non-Introspective Agents with Input Saturation A Scale-Free Protocol Design Approach

This paper studies global and semi-global regulated state synchronization of homogeneous networks of non-introspective agents in presence of input saturation based on additional information exchange where the reference trajectory is given by a so-called exosystem which is assumed to be globally reachable. Our protocol design methodology does not need any knowledge of the directed network topology and the spectrum of the associated Laplacian matrix. Moreover, the proposed protocol is scalable and achieves synchronization for any arbitrary number of agents.

paper research
A Novel Approach to Distributed Hypothesis Testing and Non-Bayesian Learning  Enhancing Learning Speed and Byzantine Resilience

A Novel Approach to Distributed Hypothesis Testing and Non-Bayesian Learning Enhancing Learning Speed and Byzantine Resilience

We study a setting where a group of agents, each receiving partially informative private signals, seek to collaboratively learn the true underlying state of the world (from a finite set of hypotheses) that generates their joint observation profiles. To solve this problem, we propose a distributed learning rule that differs fundamentally from existing approaches, in that it does not employ any form of belief-averaging . Instead, agents update their beliefs based on a min-rule. Under standard assumptions on the observation model and the network structure, we establish that each agent learns the truth asymptotically almost surely. As our main contribution, we prove that with probability 1, each false hypothesis is ruled out by every agent exponentially fast at a network-independent rate that is strictly larger than existing rates. We then develop a computationally-efficient variant of our learning rule that is provably resilient to agents who do not behave as expected (as represented by a Byzantine adversary model) and deliberately try to spread misinformation.

paper research
S&CNet  Monocular Depth Completion for Autonomous Systems and 3D Reconstruction

S&CNet Monocular Depth Completion for Autonomous Systems and 3D Reconstruction

Dense depth completion is essential for autonomous systems and 3D reconstruction. In this paper, a lightweight yet efficient network (S &CNet) is proposed to obtain a good trade-off between efficiency and accuracy for the dense depth completion. A dual-stream attention module (S &C enhancer) is introduced to measure both spatial-wise and the channel-wise global-range relationship of extracted features so as to improve the performance. A coarse-to-fine network is designed and the proposed S &C enhancer is plugged into the coarse estimation network between its encoder and decoder network. Experimental results demonstrate that our approach achieves competitive performance with existing works on KITTI dataset but almost four times faster. The proposed S &C enhancer can be plugged into other existing works and boost their performance significantly with a negligible additional computational cost.

paper research
Multi-Task Regression-Based Learning for Autonomous Drone Flight Control in Unstructured Outdoor Environments

Multi-Task Regression-Based Learning for Autonomous Drone Flight Control in Unstructured Outdoor Environments

Increased growth in the global Unmanned Aerial Vehicles (UAV) (drone) industry has expanded possibilities for fully autonomous UAV applications. A particular application which has in part motivated this research is the use of UAV in wide area search and surveillance operations in unstructured outdoor environments. The critical issue with such environments is the lack of structured features that could aid in autonomous flight, such as road lines or paths. In this paper, we propose an End-to-End Multi-Task Regression-based Learning approach capable of defining flight commands for navigation and exploration under the forest canopy, regardless of the presence of trails or additional sensors (i.e. GPS). Training and testing are performed using a software in the loop pipeline which allows for a detailed evaluation against state-of-the-art pose estimation techniques. Our extensive experiments demonstrate that our approach excels in performing dense exploration within the required search perimeter, is capable of covering wider search regions, generalises to previously unseen and unexplored environments and outperforms contemporary state-of-the-art techniques.

paper research
An Introduction to Decentralized Stochastic Optimization with Gradient Tracking

An Introduction to Decentralized Stochastic Optimization with Gradient Tracking

Decentralized solutions to finite-sum minimization are of significant importance in many signal processing, control, and machine learning applications. In such settings, the data is distributed over a network of arbitrarily-connected nodes and raw data sharing is prohibitive often due to communication or privacy constraints. In this article, we review decentralized stochastic first-order optimization methods and illustrate some recent improvements based on gradient tracking and variance reduction, focusing particularly on smooth and strongly-convex objective functions. We provide intuitive illustrations of the main technical ideas as well as applications of the algorithms in the context of decentralized training of machine learning models.

paper research
Controlling the Painlevé Paradox in a Robotic System

Controlling the Painlevé Paradox in a Robotic System

The Painlev e paradox is a phenomenon that causes instability in mechanical systems subjects to unilateral constraints. While earlier studies were mostly focused on abstract theoretical settings, recent work confirmed the occurrence of the paradox in realistic set-ups. In this paper, we investigate the dynamics and presence of the Painlev e phenomenon in a twolinks robot in contact with a moving belt, through a bifurcation study. Then, we use the results of this analysis to inform the design of control strategies able to keep the robot sliding on the belt and avoid the onset of undesired lift-off. To this aim, through numerical simulations, we synthesise and compare a PID strategy and a hybrid force/motion control scheme, finding that the latter is able to guarantee better performance and avoid the onset of bouncing motion due to the Painlev e phenomenon.

paper research
A Multi-Objective Evolutionary Approach for Grey-Box Modeling of a Buck Converter

A Multi-Objective Evolutionary Approach for Grey-Box Modeling of a Buck Converter

The present study proposes a simple grey-box identification approach to model a real DC-DC buck converter operating in continuous conduction mode. The problem associated with the information void in the observed dynamical data, which is often obtained over a relatively narrow input range, is alleviated by exploiting the known static behavior of buck converter as a priori knowledge. A simple method is developed based on the concept of term clusters to determine the static response of the candidate models. The error in the static behavior is then directly embedded into the multi-objective framework for structure selection. In essence, the proposed approach casts grey-box identification problem into a multi-objective framework to balance bias-variance dilemma of model building while explicitly integrating a priori knowledge into the structure selection process. The results of the investigation, considering the case of practical buck converter, demonstrate that it is possible to identify parsimonious models which can capture both the dynamic and static behavior of the system over a wide input range.

paper research
Storage or No Storage  Duopoly Competition Among Renewable Energy Suppliers in a Local Market

Storage or No Storage Duopoly Competition Among Renewable Energy Suppliers in a Local Market

This paper studies the duopoly competition between renewable energy suppliers with or without energy storage in a local energy market. The storage investment brings the benefits of stabilizing renewable energy suppliers outputs, but it also leads to substantial investment costs as well as some surprising changes in the market outcome. To study the equilibrium decisions of storage investment in the renewable energy suppliers competition, we model the interactions between suppliers and consumers using a three-stage game-theoretic model. In Stage I, at the beginning of the investment horizon, suppliers decide whether to invest in storage. Once such decisions have been made, in the day-ahead market of each day, suppliers decide on their bidding prices and quantities in Stage II, based on which consumers decide the electricity quantity purchased from each supplier in Stage III. In the real-time market, a supplier is penalized if his actual generation falls short of his commitment. We characterize a price-quantity competition equilibrium of Stage II, and we further characterize a storage-investment equilibrium in Stage I incorporating electricity-selling revenue and storage cost. Counter-intuitively, we show that the uncertainty of renewable energy without storage investment can lead to higher supplier profits compared with the stable generations with storage investment due to the reduced market competition under random energy generation. Simulations further illustrate results due to the market competition. For example, a higher penalty for not meeting the commitment, a higher storage cost, or a lower consumer demand can sometimes increase a supplier s profit. We also show that although storage investment can increase a supplier s profit, the first-mover supplier who invests in storage may benefit less than the free-rider competitor who chooses not to invest.

paper research
Real-Time Casing Collar Recognition System Using Neural Networks for Downhole Instruments

Real-Time Casing Collar Recognition System Using Neural Networks for Downhole Instruments

Accurate downhole positioning is critical in oil and gas operations but is often compromised by signal degradation in traditional surface-based Casing Collar Locator (CCL) monitoring. To address this, we present an in-situ, real-time collar recognition system using embedded neural network. We introduce lightweight Collar Recognition Nets (CRNs) optimized for resource-constrained ARM Cortex-M7 microprocessors. By leveraging temporal and depthwise separable convolutions, our most compact model reduces computational complexity to just 8,208 MACs while maintaining an F1 score of 0.972. Hardware validation confirms an average inference latency of 343.2 μs, demonstrating that robust, autonomous signal processing is feasible within the severe power and space limitations of downhole instrumentation.

paper research
Hear the Heartbeat in Stages  Physiologically-Informed Phase-Sensitive ECG Biometrics

Hear the Heartbeat in Stages Physiologically-Informed Phase-Sensitive ECG Biometrics

Electrocardiography (ECG) is adopted for identity authentication in wearable devices due to its individual-specific characteristics and inherent liveness. However, existing methods often treat heartbeats as homogeneous signals, overlooking the phase-specific characteristics within the cardiac cycle. To address this, we propose a Hierarchical Phase-Aware Fusion~(HPAF) framework that explicitly avoids cross-feature entanglement through a three-stage design. In the first stage, Intra-Phase Representation (IPR) independently extracts representations for each cardiac phase, ensuring that phase-specific morphological and variation cues are preserved without interference from other phases. In the second stage, Phase-Grouped Hierarchical Fusion (PGHF) aggregates physiologically related phases in a structured manner, enabling reliable integration of complementary phase information. In the final stage, Global Representation Fusion (GRF) further combines the grouped representations and adaptively balances their contributions to produce a unified and discriminative identity representation. Moreover, considering ECG signals are continuously acquired, multiple heartbeats can be collected for each individual. We propose a Heartbeat-Aware Multi-prototype (HAM) enrollment strategy, which constructs a multi-prototype gallery template set to reduce the impact of heartbeat-specific noise and variability. Extensive experiments on three public datasets demonstrate that HPAF achieves state-of-the-art results in the comparison with other methods under both closed and open-set settings.

paper research
An Adaptive, Disentangled Representation for Multidimensional MRI Reconstruction

An Adaptive, Disentangled Representation for Multidimensional MRI Reconstruction

We present a new approach for representing and reconstructing multidimensional magnetic resonance imaging (MRI) data. Our method builds on a novel, learned feature-based image representation that disentangles different types of features, such as geometry and contrast, into distinct low-dimensional latent spaces, enabling better exploitation of feature correlations in multidimensional images and incorporation of pre-learned priors specific to different feature types for reconstruction. More specifically, the disentanglement was achieved via an encoderdecoder network and image transfer training using large public data, enhanced by a style-based decoder design. A latent diffusion model was introduced to impose stronger constraints on distinct feature spaces. New reconstruction formulations and algorithms were developed to integrate the learned representation with a zero-shot selfsupervised learning adaptation and subspace modeling. The proposed method has been evaluated on accelerated T1 and T2 parameter mapping, achieving improved performance over state-of-the-art reconstruction methods, without task-specific supervised training or fine-tuning. This work offers a new strategy for learning-based multidimensional image reconstruction where only limited data are available for problem-specific or task-specific training.

paper research
Probability-Aware Parking Choice

Probability-Aware Parking Choice

Current parking navigation systems often underestimate total travel time by failing to account for the time spent searching for a parking space, which significantly affects user experience, mode choice, congestion, and emissions. To address this issue, this paper introduces the probability-aware parking selection problem, which aims to direct drivers to the best parking location rather than straight to their destination. An adaptable dynamic programming framework is proposed for decision-making based on probabilistic information about parking availability at the parking lot level. Closed-form analysis determines when it is optimal to target a specific parking lot or explore alternatives, as well as the expected time cost. Sensitivity analysis and three illustrative cases are examined, demonstrating the model s ability to account for the dynamic nature of parking availability. Acknowledging the financial costs of permanent sensing infrastructure, the paper provides analytical and empirical assessments of errors incurred when leveraging stochastic observations to estimate parking availability. Experiments with real-world data from the US city of Seattle indicate this approach s viability, with mean absolute error decreasing from 7% to below 2% as observation frequency grows. In data-based simulations, probability-aware strategies demonstrate time savings up to 66% relative to probability-unaware baselines, yet still take up to 123% longer than direct-to-destination estimates.

paper research
No Image

Indian EmoSpeech Command Dataset A Dataset for Emotion-Based Speech Recognition in Real-World Scenarios

Speech emotion analysis is an important task which further enables several application use cases. The non-verbal sounds within speech utterances also play a pivotal role in emotion analysis in speech. Due to the widespread use of smartphones, it becomes viable to analyze speech commands captured using microphones for emotion understanding by utilizing on-device machine learning models. The non-verbal information includes the environment background sounds describing the type of surroundings, current situation and activities being performed. In this work, we consider both verbal (speech commands) and non-verbal sounds (background noises) within an utterance for emotion analysis in real-life scenarios. We create an indigenous dataset for this task namely Indian EmoSpeech Command Dataset . It contains keywords with diverse emotions and background sounds, presented to explore new challenges in audio analysis. We exhaustively compare with various baseline models for emotion analysis on speech commands on several performance metrics. We demonstrate that we achieve a significant average gain of 3.3% in top-one score over a subset of speech command dataset for keyword spotting.

paper research
No Image

Game-Based Coalescence in Multi-Agent Systems

Coalescence, as a kind of ubiquitous group behavior in the nature and society, means that agents, companies or other substances keep consensus in states and act as a whole. This paper considers coalescence for n rational agents with distinct initial states. Considering the rationality and intellectuality of the population, the coalescing process is described by a bimatrix game which has the unique mixed strategy Nash equilibrium solution. Since the process is not an independent stochastic process, it is difficult to analyze the coalescing process. By using the first Borel-Cantelli Lemma, we prove that all agents will coalesce into one group with probability one. Moreover, the expected coalescence time is also evaluated. For the scenario where payoff functions are power functions, we obtain the distribution and expected value of coalescence time. Finally, simulation examples are provided to validate the effectiveness of the theoretical results.

paper research
Real-Time Trajectory Planning for Feedback Linearizable Systems Using Time-Varying Optimization

Real-Time Trajectory Planning for Feedback Linearizable Systems Using Time-Varying Optimization

We develop an optimization-based framework for joint real-time trajectory planning and feedback control of feedback-linearizable systems. To achieve this goal, we define a target trajectory as the optimal solution of a time-varying optimization problem. In general, however, such trajectory may not be feasible due to , e.g., nonholonomic constraints. To solve this problem, we design a control law that generates feasible trajectories that asymptotically converge to the target trajectory. More precisely, for systems that are (dynamic) full-state linearizable, the proposed control law implicitly transforms the nonlinear system into an optimization algorithm of sufficiently high order. We prove global exponential convergence to the target trajectory for both the optimization algorithm and the original system. We illustrate the effectiveness of our proposed method on multi-target or multi-agent tracking problems with constraints.

paper research
Automated Decision-Making for Electric Power Network Recovery

Automated Decision-Making for Electric Power Network Recovery

Critical infrastructure systems such as electric power networks, water networks, and transportation systems play a major role in the welfare of any community. In the aftermath of disasters, their recovery is of paramount importance; orderly and efficient recovery involves the assignment of limited resources (a combination of human repair workers and machines) to repair damaged infrastructure components. The decision maker must also deal with uncertainty in the outcome of the resource-allocation actions during recovery. The manual assignment of resources seldom is optimal despite the expertise of the decision maker because of the large number of choices and uncertainties in consequences of sequential decisions. This combinatorial assignment problem under uncertainty is known to be mbox{NP-hard}. We propose a novel decision technique that addresses the massive number of decision choices for large-scale real-world problems; in addition, our method also features an experiential learning component that adaptively determines the utilization of the computational resources based on the performance of a small number of choices. Our framework is closed-loop, and naturally incorporates all the attractive features of such a decision-making system. In contrast to myopic approaches, which do not account for the future effects of the current choices, our methodology has an anticipatory learning component that effectively incorporates emph{lookahead} into the solutions. To this end, we leverage the theory of regression analysis, Markov decision processes (MDPs), multi-armed bandits, and stochastic models of community damage from natural disasters to develop a method for near-optimal recovery of communities. Our method contributes to the general problem of MDPs with massive action spaces with application to recovery of communities affected by hazards.

paper research
Frequency Stability with MPC-based Inverter Power Control in Low-Inertia Power Systems

Frequency Stability with MPC-based Inverter Power Control in Low-Inertia Power Systems

The electrical grid is evolving from a network consisting of mostly synchronous machines to a mixture of synchronous machines and inverter-based resources such as wind, solar, and energy storage. This transformation has led to a decrease in mechanical inertia, which necessitate a need for the new resources to provide frequency responses through their inverter interfaces. In this paper we proposed a new strategy based on model predictive control to determine the optimal active-power set-point for inverters in the event of a disturbance in the system. Our framework explicitly takes the hard constraints in power and energy into account, and we show that it is robust to measurement noise, limited communications and delay by using an observer to estimate the model mismatches in real-time. We demonstrate the proposed controller significantly outperforms an optimally tuned virtual synchronous machine on a standard 39-bus system under a number of scenarios. In turn, this implies optimized inverter-based resources can provide better frequency responses compared to conventional synchronous machines.

paper research
No Image

Peer-to-Peer Trading in Electricity Networks An Overview

Peer-to-peer trading is a next-generation energy management technique that economically benefits proactive consumers (prosumers) transacting their energy as goods and services. At the same time, peer-to-peer energy trading is also expected to help the grid by reducing peak demand, lowering reserve requirements, and curtailing network loss. However, large-scale deployment of peer-to-peer trading in electricity networks poses a number of challenges in modeling transactions in both the virtual and physical layers of the network. As such, this article provides a comprehensive review of the state-of-the-art in research on peer-to-peer energy trading techniques. By doing so, we provide an overview of the key features of peer-to-peer trading and its benefits of relevance to the grid and prosumers. Then, we systematically classify the existing research in terms of the challenges that the studies address in the virtual and the physical layers. We then further identify and discuss those technical approaches that have been extensively used to address the challenges in peer-to-peer transactions. Finally, the paper is concluded with potential future research directions.

paper research
Resilience of Dynamic Routing Against Recurrent and Random Sensing Faults

Resilience of Dynamic Routing Against Recurrent and Random Sensing Faults

Feedback dynamic routing is a commonly used control strategy in transportation systems. This class of control strategies relies on real-time information about the traffic state in each link. However, such information may not always be observable due to temporary sensing faults. In this article, we consider dynamic routing over two parallel routes, where the sensing on each link is subject to recurrent and random faults. The faults occur and clear according to a finite-state Markov chain. When the sensing is faulty on a link, the traffic state on that link appears to be zero to the controller. Building on the theories of Markov processes and monotone dynamical systems, we derive lower and upper bounds for the resilience score, i.e. the guaranteed throughput of the network, in the face of sensing faults by establishing stability conditions for the network. We use these results to study how a variety of key parameters affect the resilience score of the network. The main conclusions are (i) Sensing faults can reduce throughput and destabilize a nominally stable network; (ii) A higher failure rate does not necessarily reduce throughput, and there may exist a worst rate that minimizes throughput; (iii) Higher correlation between the failure probabilities of two links leads to greater throughput; (iv) A large difference in capacity between two links can result in a drop in throughput.

paper research
CALPA-NET  A Channel Pruning Assisted Deep Residual Network for Digital Image Steganalysis

CALPA-NET A Channel Pruning Assisted Deep Residual Network for Digital Image Steganalysis

Over the past few years, detection performance improvements of deep-learning based steganalyzers have been usually achieved through structure expansion. However, excessive expanded structure results in huge computational cost, storage overheads, and consequently difficulty in training and deployment. In this paper we propose CALPA-NET, a ChAnneL-Pruning-Assisted deep residual network architecture search approach to shrink the network structure of existing vast, over-parameterized deep-learning based steganalyzers. We observe that the broad inverted-pyramid structure of existing deep-learning based steganalyzers might contradict the well-established model diversity oriented philosophy, and therefore is not suitable for steganalysis. Then a hybrid criterion combined with two network pruning schemes is introduced to adaptively shrink every involved convolutional layer in a data-driven manner. The resulting network architecture presents a slender bottleneck-like structure. We have conducted extensive experiments on BOSSBase+BOWS2 dataset, more diverse ALASKA dataset and even a large-scale subset extracted from ImageNet CLS-LOC dataset. The experimental results show that the model structure generated by our proposed CALPA-NET can achieve comparative performance with less than two percent of parameters and about one third FLOPs compared to the original steganalytic model. The new model possesses even better adaptivity, transferability, and scalability.

paper research
An Explainable Agentic AI Framework for Uncertainty-Aware and Abstention-Enabled Acute Ischemic Stroke Imaging Decisions

An Explainable Agentic AI Framework for Uncertainty-Aware and Abstention-Enabled Acute Ischemic Stroke Imaging Decisions

Artificial intelligence models have shown strong potential in acute ischemic stroke imaging, particularly for lesion detection and segmentation using computed tomography and magnetic resonance imaging. However, most existing approaches operate as black box predictors, producing deterministic outputs without explicit uncertainty awareness or structured mechanisms to abstain under ambiguous conditions. This limitation raises serious safety and trust concerns in high risk emergency radiology settings. In this paper, we propose an explainable agentic AI framework for uncertainty aware and abstention enabled decision support in acute ischemic stroke imaging. The framework follows a modular agentic pipeline in which a perception agent performs lesion aware image analysis, an uncertainty estimation agent computes slice level predictive reliability, and a decision agent determines whether to issue a prediction or abstain based on predefined uncertainty thresholds. Unlike prior stroke imaging systems that primarily focus on improving segmentation or classification accuracy, the proposed framework explicitly prioritizes clinical safety, transparency, and clinician aligned decision behavior. Qualitative and case based analyses across representative stroke imaging scenarios demonstrate that uncertainty driven abstention naturally emerges in diagnostically ambiguous regions and low information slices. The framework further integrates visual explanation mechanisms to support both predictive and abstention decisions, addressing a key limitation of existing uncertainty aware medical imaging systems. Rather than introducing a new performance benchmark, this work presents agentic control, uncertainty awareness, and selective abstention as essential design principles for developing safe and trustworthy medical imaging AI systems.

paper research
Extreme Generative Compression  Pushing Video Rates Below 0.02%

Extreme Generative Compression Pushing Video Rates Below 0.02%

Whether a video can be compressed at an extreme compression rate as low as 0.01%? To this end, we achieve the compression rate as 0.02% at some cases by introducing Generative Video Compression (GVC), a new framework that redefines the limits of video compression by leveraging modern generative video models to achieve extreme compression rates while preserving a perception-centric, task-oriented communication paradigm, corresponding to Level C of the Shannon-Weaver model. Besides, How we trade computation for compression rate or bandwidth? GVC answers this question by shifting the burden from transmission to inference it encodes video into extremely compact representations and delegates content reconstruction to the receiver, where powerful generative priors synthesize high-quality video from minimal transmitted information. Is GVC practical and deployable? To ensure practical deployment, we propose a compression-computation trade-off strategy, enabling fast inference on consume-grade GPUs. Within the AI Flow framework, GVC opens new possibility for video communication in bandwidth- and resource-constrained environments such as emergency rescue, remote surveillance, and mobile edge computing. Through empirical validation, we demonstrate that GVC offers a viable path toward a new effective, efficient, scalable, and practical video communication paradigm.

paper research
Improving Code-Switching Speech Recognition with TTS Data Augmentation

Improving Code-Switching Speech Recognition with TTS Data Augmentation

Automatic speech recognition (ASR) for conversational code-switching speech remains challenging due to the scarcity of realistic, high-quality labeled speech data. This paper explores multilingual text-to-speech (TTS) models as an effective data augmentation technique to address this shortage. Specifically, we fine-tune the multilingual CosyVoice2 TTS model on the SEAME dataset to generate synthetic conversational Chinese-English code-switching speech, significantly increasing the quantity and speaker diversity of available training data. Our experiments demonstrate that augmenting real speech with synthetic speech reduces the mixed error rate (MER) from 12.1 percent to 10.1 percent on DevMan and from 17.8 percent to 16.0 percent on DevSGE, indicating consistent performance gains. These results confirm that multilingual TTS is an effective and practical tool for enhancing ASR robustness in low-resource conversational code-switching scenarios.

paper research
MORE  Multi-Objective Adversarial Attacks on Speech Recognition

MORE Multi-Objective Adversarial Attacks on Speech Recognition

The emergence of large-scale automatic speech recognition (ASR) models such as Whisper has greatly expanded their adoption across diverse real-world applications. Ensuring robustness against even minor input perturbations is therefore critical for maintaining reliable performance in real-time environments. While prior work has mainly examined accuracy degradation under adversarial attacks, robustness with respect to efficiency remains largely unexplored. This narrow focus provides only a partial understanding of ASR model vulnerabilities. To address this gap, we conduct a comprehensive study of ASR robustness under multiple attack scenarios. We introduce MORE, a multi-objective repetitive doubling encouragement attack, which jointly degrades recognition accuracy and inference efficiency through a hierarchical staged repulsion-anchoring mechanism. Specifically, we reformulate multi-objective adversarial optimization into a hierarchical framework that sequentially achieves the dual objectives. To further amplify effectiveness, we propose a novel repetitive encouragement doubling objective (REDO) that induces duplicative text generation by maintaining accuracy degradation and periodically doubling the predicted sequence length. Overall, MORE compels ASR models to produce incorrect transcriptions at a substantially higher computational cost, triggered by a single adversarial input. Experiments show that MORE consistently yields significantly longer transcriptions while maintaining high word error rates compared to existing baselines, underscoring its effectiveness in multi-objective adversarial attack.

paper research
No Image

Parametrized Sharing for Multi-Agent Hybrid DRL for Multiple Multi-Functional RISs-Aided Downlink NOMA Networks

Multi-functional reconfigurable intelligent surface (MF-RIS) is conceived to address the communication efficiency thanks to its extended signal coverage from its active RIS capability and self-sustainability from energy harvesting (EH). We investigate the architecture of multi-MF-RISs to assist non-orthogonal multiple access (NOMA) downlink networks. We formulate an energy efficiency (EE) maximization problem by optimizing power allocation, transmit beamforming and MF-RIS configurations of amplitudes, phase-shifts and EH ratios, as well as the position of MF-RISs, while satisfying constraints of available power, user rate requirements, and self-sustainability property. We design a parametrized sharing scheme for multi-agent hybrid deep reinforcement learning (PMHRL), where the multi-agent proximal policy optimization (PPO) and deep-Q network (DQN) handle continuous and discrete variables, respectively. The simulation results have demonstrated that proposed PMHRL has the highest EE compared to other benchmarks, including cases without parametrized sharing, pure PPO and DQN. Moreover, the proposed multi-MF-RISs-aided downlink NOMA achieves the highest EE compared to scenarios of no-EH/amplification, traditional RISs, and deployment without RISs/MF-RISs under different multiple access.

paper research
Placenta Accreta Spectrum Detection using Multimodal Deep Learning

Placenta Accreta Spectrum Detection using Multimodal Deep Learning

Placenta Accreta Spectrum (PAS) is a life-threatening obstetric complication involving abnormal placental invasion into the uterine wall. Early and accurate prenatal diagnosis is essential to reduce maternal and neonatal risks. This study aimed to develop and validate a deep learning framework that enhances PAS detection by integrating multiple imaging modalities. A multimodal deep learning model was designed using an intermediate feature-level fusion architecture combining 3D Magnetic Resonance Imaging (MRI) and 2D Ultrasound (US) scans. Unimodal feature extractors, a 3D DenseNet121-Vision Transformer for MRI and a 2D ResNet50 for US, were selected after systematic comparative analysis. Curated datasets comprising 1,293 MRI and 1,143 US scans were used to train the unimodal models and paired samples of patient-matched MRI-US scans was isolated for multimodal model development and evaluation. On an independent test set, the multimodal fusion model achieved superior performance, with an accuracy of 92.5% and an Area Under the Receiver Operating Characteristic Curve (AUC) of 0.927, outperforming the MRI-only (82.5%, AUC 0.825) and US-only (87.5%, AUC 0.879) models. Integrating MRI and US features provides complementary diagnostic information, demonstrating strong potential to enhance prenatal risk assessment and improve patient outcomes.

paper research
Predicting Ground Shifts  A Multimodal Approach Across Europe

Predicting Ground Shifts A Multimodal Approach Across Europe

Near-real-time regional-scale monitoring of ground deformation is increasingly required to support urban planning, critical infrastructure management, and natural hazard mitigation. While Interferometric Synthetic Aperture Radar (InSAR) and continental-scale services such as the European Ground Motion Service (EGMS) provide dense observations of past motion, predicting the next observation remains challenging due to the superposition of long-term trends, seasonal cycles, and occasional abrupt discontinuities (e.g., co-seismic steps), together with strong spatial heterogeneity. In this study we propose a multimodal patch-based Transformer for single-step, fixed-interval next-epoch nowcasting of displacement maps from EGMS time series (resampled to a 64x64 grid over 100 km x 100 km tiles). The model ingests recent displacement snapshots together with (i) static kinematic indicators (mean velocity, acceleration, seasonal amplitude) computed in a leakage-safe manner from the training window only, and (ii) harmonic day-of-year encodings. On the eastern Ireland tile (E32N34), the STGCN is strongest in the displacement-only setting, whereas the multimodal Transformer clearly outperforms CNN-LSTM, CNN-LSTM+Attn, and multimodal STGCN when all models receive the same multimodal inputs, achieving RMSE = 0.90 mm and $R^2$ = 0.97 on the test set with the best threshold accuracies.

paper research
Reliable Grid Forecasting  State Space Models for Safety-Critical Energy Systems

Reliable Grid Forecasting State Space Models for Safety-Critical Energy Systems

Accurate grid load forecasting is safety-critical under-predictions risk supply shortfalls, while symmetric error metrics can mask this operational asymmetry. We introduce an operator-legible evaluation framework -- Under-Prediction Rate (UPR), tail Reserve$_{99.5}^{ %}$ requirements, and explicit inflation diagnostics (Bias$_{24h}$/OPR) -- to quantify one-sided reliability risk beyond MAPE. Using this framework, we evaluate state space models (Mamba variants) and strong baselines on a weather-aligned California Independent System Operator (CAISO) dataset spanning Nov 2023--Nov 2025 (84,498 hourly records across 5 regional transmission areas) under a rolling-origin walk-forward backtest. We develop and evaluate thermal-lag-aligned weather fusion strategies for these architectures. Our results demonstrate that standard accuracy metrics are insufficient proxies for operational safety models with comparable MAPE can imply materially different tail reserve requirements (Reserve$_{99.5}^{ %}$). We show that explicit weather integration narrows error distributions, reducing the impact of temperature-driven demand spikes. Furthermore, while probabilistic calibration reduces large-error events, it can induce systematic schedule inflation. We introduce Bias/OPR-constrained objectives to enable auditable trade-offs between minimizing tail risk and preventing trivial over-forecasting.

paper research
Scalable Data-Driven Reachability Analysis and Control via Koopman Operators with Conformal Coverage Guarantees

Scalable Data-Driven Reachability Analysis and Control via Koopman Operators with Conformal Coverage Guarantees

We propose a scalable reachability-based framework for probabilistic, data-driven safety verification of unknown nonlinear dynamics. We use Koopman theory with a neural network (NN) lifting function to learn an approximate linear representation of the dynamics and design linear controllers in this space to enable closed-loop tracking of a reference trajectory distribution. Closed-loop reachable sets are efficiently computed in the lifted space and mapped back to the original state space via NN verification tools. To capture model mismatch between the Koopman dynamics and the true system, we apply conformal prediction to produce statistically-valid error bounds that inflate the reachable sets to ensure the true trajectories are contained with a user-specified probability. These bounds generalize across references, enabling reuse without recomputation. Results on high-dimensional MuJoCo tasks (11D Hopper, 28D Swimmer) and 12D quadcopters show improved reachable set coverage rate, computational efficiency, and conservativeness over existing methods.

paper research
Scale-aware Adaptive Supervised Network with Limited Medical Annotations

Scale-aware Adaptive Supervised Network with Limited Medical Annotations

Medical image segmentation faces critical challenges in semi-supervised learning scenarios due to severe annotation scarcity requiring expert radiological knowledge, significant inter-annotator variability across different viewpoints and expertise levels, and inadequate multi-scale feature integration for precise boundary delineation in complex anatomical structures. Existing semi-supervised methods demonstrate substantial performance degradation compared to fully supervised approaches, particularly in small target segmentation and boundary refinement tasks. To address these fundamental challenges, we propose SASNet (Scale-aware Adaptive Supervised Network), a dual-branch architecture that leverages both low-level and high-level feature representations through novel scale-aware adaptive reweight mechanisms. Our approach introduces three key methodological innovations, including the Scale-aware Adaptive Reweight strategy that dynamically weights pixel-wise predictions using temporal confidence accumulation, the View Variance Enhancement mechanism employing 3D Fourier domain transformations to simulate annotation variability, and segmentation-regression consistency learning through signed distance map algorithms for enhanced boundary precision. These innovations collectively address the core limitations of existing semi-supervised approaches by integrating spatial, temporal, and geometric consistency principles within a unified optimization framework. Comprehensive evaluation across LA, Pancreas-CT, and BraTS datasets demonstrates that SASNet achieves superior performance with limited labeled data, surpassing state-of-the-art semi-supervised methods while approaching fully supervised performance levels. The source code for SASNet is available at https //github.com/HUANGLIZI/SASNet.

paper research
Seamlessly Natural  Image Stitching with Natural Appearance Preservation

Seamlessly Natural Image Stitching with Natural Appearance Preservation

This paper introduces SENA (SEamlessly NAtural), a geometry-driven image stitching approach that prioritizes structural fidelity in challenging real-world scenes characterized by parallax and depth variation. Conventional image stitching relies on homographic alignment, but this rigid planar assumption often fails in dual-camera setups with significant scene depth, leading to distortions such as visible warps and spherical bulging. SENA addresses these fundamental limitations through three key contributions. First, we propose a hierarchical affine-based warping strategy, combining global affine initialization with local affine refinement and smooth free-form deformation. This design preserves local shape, parallelism, and aspect ratios, thereby avoiding the hallucinated structural distortions commonly introduced by homography-based models. Second, we introduce a geometry-driven adequate zone detection mechanism that identifies parallax-minimized regions directly from the disparity consistency of RANSAC-filtered feature correspondences, without relying on semantic segmentation. Third, building upon this adequate zone, we perform anchor-based seamline cutting and segmentation, enforcing a one-to-one geometric correspondence across image pairs by construction, which effectively eliminates ghosting, duplication, and smearing artifacts in the final panorama. Extensive experiments conducted on challenging datasets demonstrate that SENA achieves alignment accuracy comparable to leading homography-based methods, while significantly outperforming them in critical visual metrics such as shape preservation, texture integrity, and overall visual realism.

paper research
UniCrop  A Universal, Multi-Source Data Engineering Pipeline for Scalable Crop Yield Prediction

UniCrop A Universal, Multi-Source Data Engineering Pipeline for Scalable Crop Yield Prediction

Accurate crop yield prediction relies on diverse data streams, including satellite, meteorological, soil, and topographic information. However, despite rapid advances in machine learning, existing approaches remain crop- or region-specific and require data engineering efforts. This limits scalability, reproducibility, and operational deployment. This study introduces UniCrop, a universal and reusable data pipeline designed to automate the acquisition, cleaning, harmonisation, and engineering of multi-source environmental data for crop yield prediction. For any given location, crop type, and temporal window, UniCrop automatically retrieves, harmonises, and engineers over 200 environmental variables (Sentinel-1/2, MODIS, ERA5-Land, NASA POWER, SoilGrids, and SRTM), reducing them to a compact, analysis-ready feature set utilising a structured feature reduction workflow with minimum redundancy maximum relevance (mRMR). To validate, UniCrop was applied to a rice yield dataset comprising 557 field observations. Using only the selected 15 features, four baseline machine learning models (LightGBM, Random Forest, Support Vector Regression, and Elastic Net) were trained. LightGBM achieved the best single-model performance (RMSE = 465.1 kg/ha, $R^2 = 0.6576$), while a constrained ensemble of all baselines further improved accuracy (RMSE = 463.2 kg/ha, $R^2 = 0.6604$). UniCrop contributes a scalable and transparent data-engineering framework that addresses the primary bottleneck in operational crop yield modelling the preparation of consistent and harmonised multi-source data. By decoupling data specification from implementation and supporting any crop, region, and time frame through simple configuration updates, UniCrop provides a practical foundation for scalable agricultural analytics. The code and implementation documentation are shared in https //github.com/CoDIS-Lab/UniCrop.

paper research

< Category Statistics (Total: 566) >

Computer Science (514) Machine Learning (117) Artificial Intelligence (89) Computer Vision (71) Computation and Language (NLP) (62) Electrical Engineering and Systems Science (36) Cryptography and Security (24) Robotics (22) Systems and Control (22) Software Engineering (20) Mathematics (18) Statistics (17) Economics (16) Information Retrieval (15) Distributed, Parallel, and Cluster Computing (14) Human-Computer Interaction (14) Neural and Evolutionary Computing (13) Computer Science and Game Theory (11) Econometrics (11) Image and Video Processing (10) Physics (10) Sound (10) Multiagent Systems (9) Optimization and Control (8) Computational Geometry (7) Databases (7) Graphics (6) Networking and Internet Architecture (6) Quantitative Biology (6) Quantum Physics (5) Theoretical Economics (5) Computational Complexity (4) Computational Engineering, Finance, and Science (4) Computers and Society (4) Emerging Technologies (4) Information Theory (4) Methodology (4) Multimedia (4) Programming Languages (4) Quantitative Finance (4) Signal Processing (4) Audio and Speech Processing (3) Data Structures and Algorithms (3) Hardware Architecture (3) History and Philosophy of Physics (3) Logic in Computer Science (3) Neurons and Cognition (3) Social and Information Networks (3) Statistics Theory (3) Computation (2) Condensed Matter (2) Dynamical Systems (2) Formal Languages and Automata Theory (2) General Finance (2) Operating Systems (2) Optics (2) Quantitative Methods (2) Applications (1) Astrophysics (1) Combinatorics (1) Computational Physics (1) Digital Libraries (1) Disordered Systems and Neural Networks (1) General Economics (1) Genomics (1) Geophysics (1) Instrumentation and Methods for Astrophysics (1) Logic (1) Mathematical Finance (1) Mathematical Software (1) Medical Physics (1) Mesoscale and Nanoscale Physics (1) Metric Geometry (1) Other Statistics (1) Performance (1) Physics and Society (1) Plasma Physics (1) Probability (1) Trading and Market Microstructure (1)

Start searching

Enter keywords to search articles

↑↓
ESC
⌘K Shortcut