PETAR: Localized Findings Generation with Mask-Aware Vision-Language Modeling for PET Automated Reporting
📝 Original Info
- Title: PETAR: Localized Findings Generation with Mask-Aware Vision-Language Modeling for PET Automated Reporting
- ArXiv ID: 2510.27680
- Date: 2025-10-31
- Authors: **Not specified in the provided source (verify against the original arXiv entry if possible).**
📝 Abstract
Generating automated reports for 3D positron emission tomography (PET) is an important and challenging task in medical imaging. PET plays a vital role in oncology, but automating report generation is difficult due to the complexity of whole-body 3D volumes, the wide range of potential clinical findings, and the limited availability of annotated datasets. To address these challenges, we first introduce PETARSeg-11K, the first large-scale, publicly available dataset that provides lesion-level correspondence between 3D PET/CT volumes and free-text radiological findings. It comprises 11,356 lesion descriptions paired with 3D segmentations. Second, we propose PETAR-4B, a 3D vision-language model designed for mask-aware, spatially grounded PET/CT reporting. PETAR-4B jointly encodes PET, CT, and 3D lesion segmentation masks, using a 3D focal prompt to capture fine-grained details of lesions that normally comprise less than 0.1% of the volume. Evaluations using automated metrics show PETAR-4B substantially outperforming all 2D and 3D baselines. A human study involving five physicians -- the first of its kind for automated PET reporting -- confirms the model's clinical utility and establishes correlations between automated metrics and expert judgment. This work provides a foundational dataset and a novel architecture, advancing 3D medical vision-language understanding in PET.
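The abstract's key architectural idea is that lesions occupy under 0.1% of a whole-body volume, so the model uses a 3D focal prompt guided by the segmentation mask. As a minimal illustration of this localization idea (a hypothetical helper, not the paper's actual implementation), one could crop each modality to the mask's bounding box plus a margin before encoding:

```python
import numpy as np

def focal_crop(volume, mask, margin=4):
    """Crop a 3D volume to the lesion mask's bounding box plus a margin.

    Illustrates the focal-prompt intuition: because lesions are a tiny
    fraction of the whole-body scan, attention is restricted to a small
    region around the 3D segmentation mask. Hypothetical sketch only.
    """
    coords = np.argwhere(mask > 0)
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + margin + 1, mask.shape)
    slices = tuple(slice(l, h) for l, h in zip(lo, hi))
    return volume[slices]

# Toy example: a 64^3 PET volume with a small cubic lesion mask.
pet = np.random.rand(64, 64, 64)
mask = np.zeros_like(pet)
mask[30:33, 30:33, 30:33] = 1  # lesion voxels are ~0.01% of the volume
crop = focal_crop(pet, mask)
print(crop.shape)  # → (11, 11, 11)
```

The same crop would be applied consistently to PET, CT, and the mask itself so all three inputs stay spatially aligned.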
Reference
This content is AI-processed based on open access ArXiv data.