arXiv 2511.19941

Reading time: 5 minutes
...

📝 Original Info

  • Title: arXiv 2511.19941
  • ArXiv ID: 2511.19941
  • Date: Pending
  • Authors: Author information is not provided in the available data and cannot currently be confirmed; please check the arXiv 2511.19941 page for the author names.

📝 Abstract

Magnetic Resonance Fingerprinting (MRF) leverages transient-state signal dynamics generated by tunable acquisition parameters, which makes designing an optimal, robust sequence a complex, high-dimensional sequential decision problem; optimizing the flip angle, one of the key parameters, is a representative case. Reinforcement learning (RL) offers a promising way to automate parameter selection and to optimize pulse sequences that maximize the distinguishability of fingerprints across the parameter space. In this work, we introduce an RL framework for optimizing the flip-angle schedule in MRF and demonstrate a learned schedule that exhibits non-periodic patterns and enhances fingerprint separability. We also observe that the RL-optimized schedule may enable a reduction in the number of repetition times (TRs), potentially accelerating MRF acquisitions.

💡 Deep Analysis

📄 Full Content

The acquisition phase of Magnetic Resonance Fingerprinting (MRF), characterized by the pseudo-random variation of acquisition parameters, presents a sophisticated optimal control problem. Some works rely on empirical sequence design for MRF acquisition, such as QALAS, which uses an interleaved Look-Locker sequence with a T2-preparation pulse. Sequence design for MRF has also been optimized more systematically, mostly via Cramér-Rao lower bound (CRLB) optimization [1]. Zhao et al. provide a general framework for flip angle (FA) and repetition time (TR) optimization in MRF [2]; Asslander et al. proposed an optimal-control-style framework that designs FA and TR patterns to minimize noise via the CRLB [3]; Lee et al. applied automatic differentiation to Bloch simulations to compute gradients of the CRLB with respect to the FA schedule and TR [4]. More recently, Slioussarenko et al. optimized the FA and TE parameters of a FLASH-based MRF sequence and achieved a 30% reduction in scan duration [5].

Reinforcement learning (RL) is uniquely positioned to address this by automating the selection of parameters [6], e.g., the flip angle, to generate an optimized pulse sequence that maximizes the distinctness of fingerprints across the parameter space. In this work, we propose a reinforcement learning framework that interacts with a GPU-accelerated Extended Phase Graph (EPG) [7] simulator to learn the FA schedule.

The contributions of this work are: (1) we propose, for the first time, a reinforcement learning framework to optimize the flip-angle schedule in MR fingerprinting; (2) we demonstrate a learned flip-angle schedule that exhibits non-periodic patterns and improves fingerprint separability; (3) our results indicate that the RL-optimized schedule may allow a reduction in the number of TRs, potentially accelerating MRF acquisition.

We implement a multi-level parallelism strategy, enabling intra-simulation parallelism (parallelizing the execution of a single EPG simulation) and inter-simulation parallelism (running multiple simulations simultaneously). Within this framework, we implemented a Steady-State Free Precession (SSFP) sequence [8] using the accelerated EPG implementation for validation purposes.
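
As an illustration of this two-level parallelism, the sketch below shows, under our own assumptions, how a batched gradient-spoiled (FISP-style) SSFP EPG simulation can be written in PyTorch: tissue parameters are stacked along the batch dimension (inter-simulation parallelism), while each per-TR state update is vectorized over the dephasing orders (intra-simulation parallelism). The function name `epg_fisp_batch`, the fixed pulse phase, and the omission of TE relaxation are simplifications of ours, not the paper's implementation.

```python
import torch

def epg_fisp_batch(flip_deg, tr_ms, t1_ms, t2_ms, n_states=21):
    """Batched EPG simulation of a gradient-spoiled (FISP-like) SSFP sequence.

    flip_deg     : (N,) flip-angle schedule in degrees, shared by the whole batch
    tr_ms        : repetition time in ms (held constant in this sketch)
    t1_ms, t2_ms : (B,) tissue parameters -- one EPG simulation per entry
    returns      : (B, N) echo magnitudes (|F0| right after each pulse)
    """
    B, dev = t1_ms.shape[0], t1_ms.device
    # Configuration states: rows = batch entries, columns = dephasing orders k
    Fp = torch.zeros(B, n_states, dtype=torch.cfloat, device=dev)
    Fm = torch.zeros(B, n_states, dtype=torch.cfloat, device=dev)
    Z = torch.zeros(B, n_states, dtype=torch.cfloat, device=dev)
    Z[:, 0] = 1.0                                  # thermal equilibrium
    e1 = torch.exp(-tr_ms / t1_ms).unsqueeze(1)    # (B, 1) longitudinal decay per TR
    e2 = torch.exp(-tr_ms / t2_ms).unsqueeze(1)    # (B, 1) transverse decay per TR
    echoes = []
    for alpha in torch.deg2rad(flip_deg):
        ca, sa = torch.cos(alpha), torch.sin(alpha)
        c2, s2 = torch.cos(alpha / 2) ** 2, torch.sin(alpha / 2) ** 2
        # RF mixing of the three state families (pulse phase fixed to 0 for brevity)
        Fp, Fm, Z = (c2 * Fp + s2 * Fm - 1j * sa * Z,
                     s2 * Fp + c2 * Fm + 1j * sa * Z,
                     -0.5j * sa * Fp + 0.5j * sa * Fm + ca * Z)
        echoes.append(Fp[:, 0].abs())              # echo amplitude for this TR
        # Relaxation over one TR, with regrowth of the k = 0 longitudinal state
        Fp, Fm, Z = e2 * Fp, e2 * Fm, e1 * Z
        Z[:, 0] = Z[:, 0] + (1.0 - e1[:, 0])
        # Gradient spoiler: shift transverse states by one dephasing order
        Fp = torch.roll(Fp, 1, dims=1)
        Fm = torch.roll(Fm, -1, dims=1)
        Fm[:, -1] = 0.0
        Fp[:, 0] = Fm[:, 0].conj()
    return torch.stack(echoes, dim=1)

# Example: simulate 400 TRs for 1000 (T1, T2) pairs in one batched call
# t1 = torch.linspace(300.0, 3000.0, 1000); t2 = torch.linspace(20.0, 300.0, 1000)
# fingerprints = epg_fisp_batch(torch.full((400,), 30.0), 12.0, t1, t2)
```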

To enable integration with reinforcement learning, the EPG simulation must be interactive. In this work, the accelerated EPG simulator is encapsulated as a Gym environment, providing a standardized interface for interaction with the training process.

Environment Setup: The interactive environment is defined as a Gymnasium environment for RL-based flip-angle optimization in MR fingerprinting. The agent adjusts flip angles over multiple TRs, and the environment simulates parallel EPG sequences with varied T1/T2 parameters sampled from a dictionary range. Observations are echo density matrices, and rewards come from margin-based functions that encourage feature diversity/dissimilarity. The actions correspond to small adjustments of the previous flip angle; in this case, only a single-degree adjustment is allowed per TR.
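
A minimal Gymnasium-style sketch of such an environment is shown below. The class name `MRFFlipAngleEnv`, the flip-angle bounds, the tissue-sampling ranges, and the injected `simulator` / `reward_fn` callables are placeholders of ours rather than the paper's implementation.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class MRFFlipAngleEnv(gym.Env):
    """Hypothetical Gymnasium wrapper around a batched EPG simulator.

    One episode builds one flip-angle schedule: at each TR the agent nudges the
    previous flip angle by -1, 0, or +1 degree, and the observation is the matrix
    of echoes simulated so far for a batch of (T1, T2) pairs."""

    def __init__(self, simulator, reward_fn, n_tr=400, n_tissues=64,
                 fa_min=5.0, fa_max=70.0):
        super().__init__()
        self.simulator, self.reward_fn = simulator, reward_fn
        self.n_tr, self.n_tissues = n_tr, n_tissues
        self.fa_min, self.fa_max = fa_min, fa_max
        self.action_space = spaces.Discrete(3)          # {-1, 0, +1} degree steps
        self.observation_space = spaces.Box(
            low=0.0, high=np.inf, shape=(n_tissues, n_tr), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.fa = np.full(self.n_tr, 15.0, dtype=np.float32)   # arbitrary initial angle
        # Sample a fresh batch of tissue parameters from the dictionary range
        self.t1 = self.np_random.uniform(300.0, 3000.0, size=self.n_tissues)
        self.t2 = self.np_random.uniform(20.0, 300.0, size=self.n_tissues)
        return np.zeros((self.n_tissues, self.n_tr), dtype=np.float32), {}

    def step(self, action):
        delta = float(action) - 1.0                     # map {0, 1, 2} -> {-1, 0, +1}
        prev = self.fa[self.t - 1] if self.t > 0 else self.fa[0]
        self.fa[self.t] = np.clip(prev + delta, self.fa_min, self.fa_max)
        self.t += 1
        # Re-simulate the echoes for the schedule chosen so far (an incremental
        # simulator would avoid this recomputation)
        echoes = self.simulator(self.fa[: self.t], self.t1, self.t2)
        obs = np.zeros((self.n_tissues, self.n_tr), dtype=np.float32)
        obs[:, : self.t] = echoes
        reward = float(self.reward_fn(echoes))          # margin-based dissimilarity
        terminated = self.t >= self.n_tr
        return obs, reward, terminated, False, {}
```

Passing the simulator and reward function in as callables keeps the environment agnostic to the EPG backend, matching the encapsulation described in the previous paragraph.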

Policy Network Architecture: The policy network uses a Transformer encoder as the feature extractor, processing the time-series echo signals. The encoder has configurable parameters (embedding dimension, number of attention heads, and number of layers). The extracted features are passed through a multi-layer perceptron (MLP) that feeds separate policy and value heads for the actor-critic architecture.
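
A PyTorch sketch that mirrors this structure follows; the layer sizes, the mean pooling over time, and the treatment of each TR as one token are assumptions, since only the overall architecture is described.

```python
import torch
import torch.nn as nn

class TransformerActorCritic(nn.Module):
    """Sketch of a Transformer-based actor-critic matching the description above."""

    def __init__(self, n_tissues=64, n_actions=3,
                 d_model=64, n_heads=4, n_layers=2, mlp_dim=128):
        super().__init__()
        # Each TR contributes one token: the vector of echoes across the tissue batch
        self.embed = nn.Linear(n_tissues, d_model)
        enc_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        self.trunk = nn.Sequential(nn.Linear(d_model, mlp_dim), nn.ReLU())
        self.policy_head = nn.Linear(mlp_dim, n_actions)    # logits over FA adjustments
        self.value_head = nn.Linear(mlp_dim, 1)              # state-value estimate

    def forward(self, obs):
        # obs: (batch, n_tissues, n_tr) echo matrix; attend over the TR axis
        x = self.embed(obs.transpose(1, 2))     # (batch, n_tr, d_model)
        x = self.encoder(x).mean(dim=1)         # pool the encoded sequence over time
        x = self.trunk(x)
        return self.policy_head(x), self.value_head(x)
```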

The reward function is designed to maximize the dissimilarity between MRF signal trajectories. We introduce a margin-based reward function that maximizes pairwise dissimilarity across all time series in a batch. The reward combines a positive term for pairs exceeding a margin threshold and a negative term for pairs below it, normalized by the number of pairs. Given a batch of time-series features F = {f_1, f_2, ..., f_N}, we compute a pairwise dissimilarity matrix D ∈ R^{N×N}, using the normalized dot product as the similarity measure. The reward is computed as follows.
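
One expression consistent with this description and with the explanation below is (our reconstruction; the paper's exact weighting of the two terms may differ):

$$
R \;=\; \frac{1}{|P|} \sum_{(i,j)\in P} \Big[\, \mathbf{1}\{D_{ij} > \tau\}\,\big(D_{ij} - \tau\big) \;-\; \mathbf{1}\{D_{ij} \le \tau\}\,\big(\tau - D_{ij}\big) \Big]
$$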

where τ is the margin threshold, P is the set of all unique pairs (excluding self-comparisons), and |P| = N(N-1)/2.

The first term encourages pairs to exceed the margin (dissimilarity), while the second penalizes pairs below the margin (similarity). Normalization by |P| stabilizes the reward across batch sizes. This formulation promotes diversity across all pairs, unlike approaches that only reward pairs above the threshold, ensuring that the learned sequences are maximally dissimilar.
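
A NumPy sketch of this computation is given below; taking one minus the cosine similarity as the dissimilarity and weighting the two margin terms equally follow the reconstruction above, not a published implementation.

```python
import numpy as np

def margin_reward(signals, tau=0.3):
    """Margin-based pairwise dissimilarity reward over a batch of fingerprints.

    signals : (N, T) array, one signal trajectory per row
    tau     : margin threshold on pairwise dissimilarity
    """
    # Normalized dot product as the similarity measure, as described in the text
    unit = signals / (np.linalg.norm(signals, axis=1, keepdims=True) + 1e-12)
    dis = 1.0 - unit @ unit.T                    # pairwise dissimilarity matrix D
    # Unique pairs excluding self-comparisons: |P| = N(N-1)/2
    d = dis[np.triu_indices(signals.shape[0], k=1)]
    above = d > tau
    # Positive credit above the margin, penalty below it, normalized by |P|
    return (np.sum(d[above] - tau) - np.sum(tau - d[~above])) / d.size
```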

The agent is trained with Proximal Policy Optimization (PPO) and a Transformer-based actor-critic policy [9]. PPO [10] is an on-policy algorithm that updates the policy using collected trajectories. The agent collects n timesteps of experience by interacting with the EPG simulation environment. At each step, the agent observes the current echo-signal state, selects an action (a flip-angle adjustment) from a discrete action space, and receives a reward based on the dissimilarity between the time series in the batch.
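
The sketch below shows the corresponding clipped-surrogate PPO update for a discrete action space, assuming the `TransformerActorCritic` sketched earlier and standard PPO hyperparameters; it is illustrative rather than the paper's training code.

```python
import torch
import torch.nn.functional as F

def ppo_update(policy, optimizer, obs, actions, old_logp, advantages, returns,
               clip_eps=0.2, value_coef=0.5, entropy_coef=0.01):
    """One clipped-surrogate PPO update on a minibatch of on-policy experience.

    policy : actor-critic module returning (action logits, state values)
    obs, actions, old_logp, advantages, returns : tensors collected during rollout
    """
    logits, values = policy(obs)
    dist = torch.distributions.Categorical(logits=logits)
    logp = dist.log_prob(actions)
    ratio = torch.exp(logp - old_logp)                 # new / old action probability
    # Clipping keeps the updated policy close to the one that collected the data
    surrogate = torch.min(ratio * advantages,
                          torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages)
    policy_loss = -surrogate.mean()
    value_loss = F.mse_loss(values.squeeze(-1), returns)
    entropy = dist.entropy().mean()                    # exploration bonus
    loss = policy_loss + value_coef * value_loss - entropy_coef * entropy
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```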

Training: each batch is randomly sampled from 82×82 T1-T2 pairs, w


This content is AI-processed based on open access ArXiv data.
