MATTER: Multiscale Attention for Registration Error Regression
Point cloud registration (PCR) is crucial for many downstream tasks, such as simultaneous localization and mapping (SLAM) and object tracking. This makes detecting and quantifying registration misalignment, i.e., PCR quality validation, an important task. All existing methods treat validation as a classification task, aiming to assign the PCR quality to a few classes. In this work, we instead use regression for PCR validation, allowing for a more fine-grained quantification of the registration quality. We also extend previously used misalignment-related features by using multiscale extraction and attention-based aggregation. This leads to accurate and robust registration error estimation on diverse datasets, especially for point clouds with heterogeneous spatial densities. Furthermore, when used to guide a mapping downstream task, our method significantly improves the mapping quality for a given amount of re-registered frames, compared to the state-of-the-art classification-based method.
💡 Research Summary
The paper introduces MATTER (Multiscale Attention for registration Error Regression), a novel framework that moves point‑cloud registration quality assessment from a coarse classification problem to a fine‑grained regression task. Traditional approaches treat misalignment detection as a categorical decision (e.g., “good”, “moderate”, “bad”), which limits the ability to quantify the actual registration error. MATTER instead predicts a continuous scalar error (the mean point‑wise distance between the source cloud transformed by the estimated pose and the same cloud transformed by the ground‑truth pose).
The core of MATTER is a multiscale feature extraction pipeline combined with a multi‑head attention mechanism that learns to weight features from different neighborhood radii on a per‑point basis. For each anchor point (selected by farthest‑point sampling from both source and reference clouds), three concentric neighborhoods are built with radii of roughly 7.5 m, 4.0 m, and 2.5 m. Within each radius the method computes five geometric descriptors: joint differential entropy, separate differential entropy, Sinkhorn divergence, joint coverage ratio, and separate coverage ratio. Three global descriptors (visibility score, sensor distance, and a binary source flag) are appended, yielding a (5 S + 3)‑dimensional feature vector (S = 3).
A query‑learning MLP maps this vector to a query vector q. For each scale s, the same vector is used as key k(s) and value v(s). Multi‑head attention (four heads) computes softmax‑scaled similarity scores α(s)ᵢ between q and each k(s), controlled by a temperature τ that determines how sharply the model selects a scale. The attention‑weighted sum of the values produces a compact 8‑dimensional embedding (\tilde{f}), which is then processed by a PointTransformer followed by a three‑layer ReLU‑MLP to output the predicted alignment error E_align. The network is trained with an ℓ₂ loss on ground‑truth errors.
MATTER addresses two fundamental trade‑offs in point‑cloud registration error estimation. Small neighborhoods capture fine‑grained local geometry, making entropy‑based descriptors reliable, but they fail when the initial pose is far from the true alignment because corresponding structures fall outside the neighborhood. Large neighborhoods guarantee overlap even under poor initial poses but introduce non‑overlapping regions and heterogeneous point distributions, violating the isotropic Gaussian assumption underlying the closed‑form entropy estimates. By learning attention weights, MATTER dynamically selects the most informative scale for each point, effectively balancing these opposing effects.
The authors evaluate MATTER on three realistic benchmarks: (1) nuScenes‑ICP (high overlap, ICP registration), (2) nuScenes‑ICP with synthetic GNSS‑style noise (perturbed initial poses), and (3) KITTI‑GeoTransformer (low overlap, GeoTransformer registration). They adapt several state‑of‑the‑art classification‑based misalignment detectors (F‑ACT, CorAl, KPConv‑based, NDT Score) to regression by replacing their classification heads with a scalar predictor and training them end‑to‑end. Across all datasets, MATTER achieves the lowest root‑mean‑square error (RMSE), mean absolute error (MAE), and highest coefficient of determination (R²). For example, on KITTI it reduces RMSE from 0.49 m (F‑ACT) to 0.34 m, and R² improves from 0.969 to 0.985.
Ablation studies show that a single‑scale model must pick a dataset‑specific radius (large for high‑overlap, small for low‑overlap) and therefore cannot generalize well. Simple concatenation of multiscale features already improves performance, but the attention‑based selector yields the best results with only a negligible increase in parameters (≈0.06 %). Visualizations (Fig. 2) illustrate that the attention mechanism prefers the smallest scale in dense, well‑overlapping regions and switches to larger scales in sparse or non‑overlapping areas, confirming the intended adaptive behavior.
To demonstrate downstream impact, the authors integrate MATTER into a SLAM‑style mapping pipeline on KITTI sequence 10. They register every fifth scan with GeoTransformer, then use the predicted errors to decide which frames to re‑register under a fixed budget (≈39 % of frames). MATTER’s selections lead to a final frame alignment error of 18.35 m, compared to 21.46 m when using F‑ACT. Varying the re‑registration rate shows that MATTER consistently yields lower final errors, especially at low re‑registration percentages (7 %–40 %).
In summary, MATTER contributes three major advances: (1) reframing point‑cloud misalignment assessment as a regression problem, (2) introducing a multiscale geometric feature set fused via learned attention to adaptively choose neighborhood scales per point, and (3) demonstrating that accurate error prediction can directly improve downstream mapping without additional computational cost. The approach is lightweight, robust to heterogeneous point densities and poor initial poses, and opens the door to more nuanced quality‑aware SLAM and 3D reconstruction systems. Future work may explore extending the multiscale attention to non‑rigid deformations, dynamic temperature scheduling, or integrating learned descriptors beyond entropy‑based statistics.
Comments & Academic Discussion
Loading comments...
Leave a Comment