3D Blood Pulsation Maps


We present Pulse3DFace, the first dataset of its kind for estimating 3D blood pulsation maps. These maps can be used to develop models of dynamic facial blood pulsation, enabling the creation of synthetic video data to improve and validate remote pulse estimation methods via photoplethysmography imaging. Additionally, the dataset facilitates research into novel multi-view approaches for mitigating illumination effects in blood pulsation analysis. Pulse3DFace consists of raw videos of 15 subjects recorded at 30 Hz with an RGB camera from 23 viewpoints, blood pulse reference measurements, and facial 3D scans generated using monocular structure-from-motion techniques. It also includes processed 3D pulsation maps compatible with the texture space of the FLAME 3D head model. These maps provide signal-to-noise ratio, local pulse amplitude, phase information, and supplementary data. We offer a comprehensive evaluation of the dataset's illumination conditions, the consistency of its maps, and its ability to capture physiologically meaningful features in the facial and neck skin regions.


💡 Research Summary

The paper introduces Pulse3DFace, the first publicly available dataset designed for the estimation of three‑dimensional (3D) blood‑pulse maps of the human face. The authors recorded raw RGB videos of 15 subjects from 23 distinct viewpoints at 30 Hz (resolution 1224 × 1024) while simultaneously acquiring contact photoplethysmography (PPG) reference signals. In addition to the videos, a set of photographs covering the full facial surface was captured for each subject; these images were processed with a structure‑from‑motion (SfM) pipeline to obtain camera poses, intrinsics, and a textured surface mesh. The mesh was then fitted to the FLAME 3D morphable model, providing a unified shape space and texture coordinates for all subjects.

The processing pipeline consists of two main stages. First, 2D pulse maps are computed from each video. A reference BVP signal (S_ref) is derived by averaging skin-segmented RGB pixels across the whole face and applying the Plane-Orthogonal-to-Skin (POS) algorithm. The reference heart rate (HR_ref) is identified as the frequency with maximal power in the spectrum of S_ref and validated against the contact PPG. The video is divided into overlapping 20-second segments; within each segment, a sliding spatial window (size k = 3–17 px) extracts local RGB time series. POS is applied to each window to obtain a local BVP signal, from which a set of local parameters is derived: signal-to-noise ratio (SNR), the phase of the POS-derived BVP, phase and amplitude for each colour channel (R, G, B), and heart rate. SNR is calculated as the log-ratio of the power around HR_ref (and its first harmonic) to the power in the remainder of the physiological band (30–200 bpm). Phase is taken from the FFT bin corresponding to HR_ref, while colour-channel amplitudes are obtained from the Hilbert-transform-based analytic signal.
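
To make the 2D stage concrete, the following Python sketch implements the core signal steps described above: the POS projection of Wang et al., the band-power SNR, and the FFT-bin phase. The function names, the 1.6 s POS window, and the 6 bpm signal bandwidth are illustrative assumptions, not the authors' exact implementation; in the paper these steps run per spatial window as well as on the whole-face average.

```python
import numpy as np

def pos_bvp(rgb, fs=30.0, win_sec=1.6):
    """Plane-Orthogonal-to-Skin (POS, Wang et al. 2017) pulse extraction.

    rgb: (N, 3) array of spatially averaged R, G, B values over time
         (whole face for S_ref, or one k x k window for a local signal).
    Returns a 1D BVP estimate of length N.
    """
    n = rgb.shape[0]
    w = int(win_sec * fs)                        # sliding-window length (frames)
    proj = np.array([[0.0, 1.0, -1.0],           # plane orthogonal to the
                     [-2.0, 1.0, 1.0]])          # skin-tone direction
    bvp = np.zeros(n)
    for t in range(n - w + 1):
        block = rgb[t:t + w]
        cn = block / (block.mean(axis=0) + 1e-9)  # temporal normalisation
        s = proj @ cn.T                           # (2, w) projected signals
        h = s[0] + (s[0].std() / (s[1].std() + 1e-9)) * s[1]  # alpha tuning
        bvp[t:t + w] += h - h.mean()              # overlap-add
    return bvp

def snr_db(bvp, fs, hr_ref_bpm, half_bw_bpm=6.0):
    """Log-ratio of power near HR_ref (and its first harmonic) to the power
    in the rest of the 30-200 bpm physiological band."""
    freqs_bpm = np.fft.rfftfreq(len(bvp), d=1.0 / fs) * 60.0
    power = np.abs(np.fft.rfft(bvp * np.hanning(len(bvp)))) ** 2
    band = (freqs_bpm >= 30.0) & (freqs_bpm <= 200.0)
    near_hr = np.zeros_like(band)
    for f0 in (hr_ref_bpm, 2.0 * hr_ref_bpm):     # fundamental + 1st harmonic
        near_hr |= np.abs(freqs_bpm - f0) <= half_bw_bpm
    near_hr &= band
    return 10.0 * np.log10(power[near_hr].sum() / power[band & ~near_hr].sum())

def phase_at_hr(bvp, fs, hr_ref_bpm):
    """Phase of the FFT bin closest to HR_ref, as used for the phase maps."""
    spec = np.fft.rfft(bvp)
    freqs_bpm = np.fft.rfftfreq(len(bvp), d=1.0 / fs) * 60.0
    return np.angle(spec[np.argmin(np.abs(freqs_bpm - hr_ref_bpm))])
```

The per-channel amplitudes would come from the envelope of `scipy.signal.hilbert` applied to each band-passed colour trace; that step is omitted here for brevity.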

Second, the 2D maps are lifted into 3D space. SfM provides the camera extrinsics for each video frame; the textured mesh is cleaned, centred, and registered to FLAME. The FLAME texture space serves as a common canvas: each pixel's 2D parameters are projected onto the corresponding UV coordinate, yielding full-face 3D maps of SNR, amplitude, and phase. A skin mask removes hair and boundary artefacts. Across colour channels, the green channel consistently shows the highest SNR, confirming prior findings that the green band carries the strongest pulsatile information.
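
A minimal sketch of the lifting step, under simplifying assumptions: instead of full texture rasterisation with occlusion handling (as a production pipeline would need), each FLAME vertex is projected into the view with the SfM camera, the 2D parameter map is sampled there, and the value is splatted onto the vertex's UV texel. The function name, texture size, and optional visibility mask are hypothetical.

```python
import numpy as np

def lift_to_uv(param_map, verts, uvs, K, R, t, tex_size=512, visible=None):
    """Back-project one view's 2D parameter map (e.g. SNR) into FLAME UV space.

    param_map: (H, W) per-pixel parameter map for this viewpoint.
    verts:     (V, 3) mesh vertices registered to FLAME (world coordinates).
    uvs:       (V, 2) per-vertex texture coordinates in [0, 1].
    K, R, t:   SfM intrinsics (3, 3), rotation (3, 3), translation (3,).
    visible:   optional (V,) boolean mask, e.g. from a z-buffer pass.
    """
    h, w = param_map.shape
    cam = R @ verts.T + t.reshape(3, 1)          # world -> camera coordinates
    pix = K @ cam
    pix = pix[:2] / pix[2]                       # perspective divide -> (2, V)
    tex = np.full((tex_size, tex_size), np.nan)  # NaN marks unobserved texels
    for v in range(verts.shape[0]):
        if visible is not None and not visible[v]:
            continue                             # skip self-occluded vertices
        x, y = int(round(pix[0, v])), int(round(pix[1, v]))
        if 0 <= x < w and 0 <= y < h:
            u_px = int(uvs[v, 0] * (tex_size - 1))
            v_px = int((1.0 - uvs[v, 1]) * (tex_size - 1))  # UV origin bottom-left
            tex[v_px, u_px] = param_map[y, x]    # nearest-neighbour splat
    return tex
```

Maps from several viewpoints can then be fused in UV space, for example by SNR-weighted averaging of valid texels; the paper's exact fusion rule is not restated here.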

The authors evaluate the dataset along three axes. (1) Illumination analysis shows that multi-view capture mitigates lighting bias: SNR distributions across different lighting conditions are more uniform than in single-view datasets. (2) Consistency across viewpoints is quantified by Pearson correlations between 3D maps of the same subject captured from different angles; average correlations above 0.85 indicate high spatial stability of the pulse signal. (3) Physiological validity is demonstrated by correlating the 3D amplitude and phase maps with the contact PPG: average correlation coefficients reach 0.78, and phase delays observed in the neck region correspond to known pulse transit times.
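
For reference, the viewpoint-consistency metric reduces to a masked Pearson correlation in texture space. A minimal sketch (function name assumed), operating on the NaN-marked UV maps from the previous snippet:

```python
import numpy as np

def map_consistency(tex_a, tex_b):
    """Pearson correlation between two UV-space maps of the same subject,
    restricted to texels that are valid (observed and skin-masked) in both."""
    valid = ~np.isnan(tex_a) & ~np.isnan(tex_b)
    return np.corrcoef(tex_a[valid], tex_b[valid])[0, 1]
```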

Key contributions are: (i) a complete pipeline for generating high-resolution 3D pulse maps compatible with the FLAME model, (ii) a multi-view, multi-illumination dataset that enables systematic study of illumination effects on remote PPG, (iii) quantitative metrics (SNR, phase, amplitude) for benchmarking pulse-extraction algorithms, and (iv) a resource for creating synthetic avatars with realistic blood-volume pulse dynamics, which can be used to train end-to-end deep learning models for remote heart-rate and heart-rate-variability estimation.

In conclusion, Pulse3DFace fills a critical gap in remote photoplethysmography research by providing the first 3D pulse-map dataset, a robust processing pipeline, and a comprehensive evaluation. It opens avenues for multi-view pulse analysis, illumination-invariant algorithm development, and realistic synthetic data generation for training and validation of next-generation contact-free cardiovascular monitoring systems. Future work may expand the subject pool, incorporate diverse skin tones and ages, and explore real-time 3D pulse-map reconstruction.

