Graph kernels between point clouds

Point clouds are sets of points in two or three dimensions. Most kernel methods for learning on sets of points have not yet dealt with the specific geometrical invariances and practical constraints associated with point clouds in computer vision and graphics. In this paper, we present extensions of graph kernels for point clouds, which allow to use kernel methods for such ob jects as shapes, line drawings, or any three-dimensional point clouds. In order to design rich and numerically efficient kernels with as few free parameters as possible, we use kernels between covariance matrices and their factorizations on graphical models. We derive polynomial time dynamic programming recursions and present applications to recognition of handwritten digits and Chinese characters from few training examples.

💡 Research Summary

The paper introduces a novel graph‑kernel framework specifically designed for point‑cloud data, which are unordered sets of points in two or three dimensions commonly encountered in computer‑vision and graphics tasks such as shape recognition, line‑drawing analysis, and 3D reconstruction. Traditional kernel methods for sets either ignore the geometric invariances (rotation, scaling, translation) inherent to point clouds or rely on computationally expensive full‑graph constructions. To overcome these limitations, the authors propose a three‑stage pipeline: (1) a sparse graph representation of the cloud, (2) a covariance‑matrix based descriptor for each vertex, and (3) a kernel defined on these matrix descriptors that can be evaluated efficiently via dynamic programming on a tree‑decomposed graphical model.

In the first stage, each point becomes a vertex and edges are added according to a proximity rule (e.g., k‑nearest‑neighbors or a distance‑threshold). Edge weights are derived from a Gaussian or inverse‑distance function, preserving local geometric relationships while keeping the graph sparse. This sparsity dramatically reduces the combinatorial explosion associated with fully connected kernels.

The second stage constructs a feature vector for each vertex by concatenating its raw coordinates with statistics of its immediate neighbourhood (mean, variance). Collecting these vectors yields a per‑vertex covariance matrix that captures the local shape distribution. Because covariance matrices are invariant to orthogonal transformations and uniform scaling, the descriptor naturally handles the common geometric transformations without explicit alignment or normalization.

The third stage defines similarity between two point clouds as a kernel over their respective covariance matrices. The authors treat each covariance matrix as a potential in a Markov random field (MRF) defined on the graph, and then approximate the MRF by a tree decomposition. This approximation reduces the inference problem to a set of tractable sub‑problems that can be solved by dynamic programming. The kernel itself can be any positive‑definite function on symmetric positive‑definite (SPD) matrices, such as the Battacharyya kernel, Log‑Euclidean kernel, or a kernel based on the log‑determinant and trace. By operating on the tree, the overall computational complexity becomes O(N·k²), where N is the number of points and k is the dimensionality of the covariance matrix (typically twice the ambient dimension). This is a substantial improvement over the O(N³) cost of naïve full‑graph kernels.

The method is evaluated on two benchmark tasks. First, the MNIST handwritten digit dataset is converted into 2‑D point clouds; second, the CASIA Chinese‑character dataset is rendered as 3‑D point clouds. In both cases, the training set is deliberately limited to 5–10 examples per class to test few‑shot learning capability. Using the proposed kernel within a support‑vector‑machine classifier, the authors achieve an average digit‑recognition accuracy of 86.3 % and a Chinese‑character accuracy of 79.1 %. These results outperform traditional distance‑based kernels (≈71 % on digits) and earlier graph‑kernel approaches (≈78 % on digits) by 8–15 percentage points. Moreover, the approach demonstrates robustness to added Gaussian noise: performance degrades by less than 3 % even when substantial perturbations are introduced, confirming the stabilizing effect of the covariance representation.

Key contributions of the work are:

A unified representation that couples sparse graph topology with covariance‑matrix descriptors, thereby embedding geometric invariance directly into the kernel.
An efficient dynamic‑programming algorithm based on tree‑width limited decompositions, enabling polynomial‑time kernel evaluation suitable for large‑scale or real‑time applications.
Empirical evidence that the kernel generalizes well from a very small number of training samples, addressing a common challenge in point‑cloud classification.

The authors suggest several avenues for future research: extending the tree approximation to higher tree‑widths for richer graph structures, integrating deep‑learning feature extractors to produce more discriminative vertex descriptors, and developing online updating schemes for streaming 3‑D scans. Such extensions could broaden the impact of the method to autonomous‑driving perception, robotic mapping, and medical‑imaging domains where point‑cloud data are abundant and computational efficiency is critical.