Wedge Sampling: Efficient Tensor Completion with Nearly-Linear Sample Complexity

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We introduce Wedge Sampling, a new non-adaptive sampling scheme for low-rank tensor completion. We study recovery of an order-$k$ low-rank tensor of dimension $n \times \cdots \times n$ from a subset of its entries. Unlike the standard uniform entry model (i.e., i.i.d. samples from $[n]^k$), wedge sampling allocates observations to structured length-two patterns (wedges) in an associated bipartite sampling graph. By directly promoting these length-two connections, the sampling design strengthens the spectral signal that underlies efficient initialization, in regimes where uniform sampling is too sparse to generate enough informative correlations. Our main result shows that this change in sampling paradigm enables polynomial-time algorithms to achieve both weak and exact recovery with nearly linear sample complexity in $n$. The approach is also plug-and-play: wedge-sampling-based spectral initialization can be combined with existing refinement procedures (e.g., spectral or gradient-based methods) using only an additional $\tilde{O}(n)$ uniformly sampled entries, substantially improving over the $\tilde{O}(n^{k/2})$ sample complexity typically required under uniform entry sampling for efficient methods. Overall, our results suggest that the statistical-to-computational gap highlighted in Barak and Moitra (2022) is, to a large extent, a consequence of the uniform entry sampling model for tensor completion, and that alternative non-adaptive measurement designs that guarantee a strong initialization can overcome this barrier.


💡 Research Summary

This paper introduces Wedge Sampling, a novel non‑adaptive measurement scheme for low‑rank tensor completion that dramatically reduces the sample complexity required for both weak and exact recovery. Traditional tensor completion analyses assume uniform entry sampling, where each entry of an order‑k tensor is observed independently with probability p. When the tensor is unfolded into a highly rectangular n × n^{k−1} matrix A, efficient algorithms rely on estimating the left singular subspace of A via the off‑diagonal part of AAᵀ. An off‑diagonal entry is non‑zero only when a length‑two path (a “wedge”) connects the two corresponding left vertices in the associated bipartite sampling graph. Under uniform sampling, each potential wedge appears with probability p², so for p ≪ n^{‑k/2} the graph on the left vertices remains essentially disconnected. Consequently, existing polynomial‑time methods need ˜O(n^{k/2}) samples, matching the conjectured statistical‑to‑computational barrier of Barak and Moitra (2022).
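The p² scaling can be sanity-checked with a short back-of-the-envelope sketch (our own illustration, not code from the paper; the values of n, k, and the sampling rates are arbitrary): it counts the expected number of wedges in the n × n^{k−1} unfolding under uniform entry sampling.

```python
n, k = 100, 3
cols = n ** (k - 1)                 # columns of the n x n^(k-1) unfolding
pairs = n * (n - 1) // 2            # row pairs a wedge could connect

def expected_wedges(p):
    # a wedge needs both of its two entries observed, hence the p**2 factor
    return cols * pairs * p ** 2

p_barrier = n ** (-k / 2)           # the conjectured computational threshold
p_linear = n ** (1 - k)             # a budget of only ~n observed entries

print(f"expected wedges at p = n^(-k/2): {expected_wedges(p_barrier):.1f}")
print(f"expected wedges with ~n samples: {expected_wedges(p_linear):.4f}")
```

At the threshold p ≈ n^{−k/2} the whole unfolding contains only about n/2 wedges, and with a near-linear entry budget it is expected to contain less than one, which is exactly the disconnection phenomenon described above.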

Wedge Sampling directly addresses this bottleneck by sampling wedges themselves rather than individual edges. The sampling space consists of all triples (i, ℓ, j) where i and j are left vertices and ℓ is a right vertex. Each sampled wedge yields both entries A_{iℓ} and A_{jℓ}. Because the number of observed wedges is proportional to the total sample budget, a near‑linear number of wedges (˜O(n)) suffices to make the left‑vertex graph well‑connected, thereby strengthening the spectral signal needed for initialization.
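The sampling step itself is simple to sketch. Below is a minimal illustration under our own conventions (the variable names, the budget constant, and the rank‑1 stand‑in for the unknown tensor are assumptions, not the authors' code): each draw picks two left vertices i, j and one shared right vertex ℓ, and reveals both entries of the unfolding.

```python
import math
import random

random.seed(1)
n, k = 50, 3
num_cols = n ** (k - 1)                  # right vertices = columns of unfolding
budget = int(3 * n * math.log(n))        # nearly linear number of wedges

# Stand-in for the unknown low-rank tensor: a rank-1 unfolding u_i * v_l.
u = [random.gauss(0, 1) for _ in range(n)]
v = [random.gauss(0, 1) for _ in range(num_cols)]

observed = {}                            # (row, col) -> revealed entry
wedge_graph = [set() for _ in range(n)]  # wedge-induced graph on left vertices

for _ in range(budget):
    i, j = random.sample(range(n), 2)    # two distinct left vertices
    l = random.randrange(num_cols)       # one shared right vertex
    observed[(i, l)] = u[i] * v[l]       # the wedge reveals both entries
    observed[(j, l)] = u[j] * v[l]
    wedge_graph[i].add(j)
    wedge_graph[j].add(i)

touched = sum(1 for nbrs in wedge_graph if nbrs)
print(f"{budget} wedges revealed {len(observed)} entries; "
      f"{touched}/{n} left vertices touched")
```

Note that every sampled wedge contributes an edge between two left vertices, so ˜O(n) wedges already give the left-vertex graph an average degree of order log n, in contrast to the uniform model where such edges appear only with probability p².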

The paper’s technical contributions are fourfold:

  1. Concentration under wedge sampling. The authors prove a new matrix concentration bound (Theorem 5) for the highly rectangular unfolded matrix formed from wedge samples. The bound holds with only ˜O(n) observations by exploiting the incoherence of the tensor unfolding, a regime beyond standard matrix‑completion inequalities.

  2. Leave‑one‑out ℓ₂,∞ analysis. To obtain fine‑grained control of the left singular subspace, a bespoke leave‑one‑out argument is developed (Theorem 6). This yields ℓ₂,∞ error guarantees that are stronger than existing ℓ₂ results and are crucial for the subsequent refinement steps.

  3. Two concrete algorithms.

    • Spectral weak recovery: perform wedge‑sampling spectral initialization with O(n log n) samples, then apply a spectral denoising step using an additional O(log n) uniform samples.
    • Gradient‑descent exact recovery: use wedge‑sampling initialization with ˜O(n) samples, followed by gradient‑descent refinement that requires only ˜O(n) additional uniform samples.

    Both algorithms achieve high‑probability success with total sample complexity ˜O(n), a substantial improvement over the ˜O(n^{k/2}) required under uniform sampling.
  4. Plug‑and‑play compatibility. The wedge‑sampling initialization can be combined with existing state‑of‑the‑art tensor completion pipelines (e.g., Montanari–Sun, Cai et al.) by simply augmenting them with ˜O(n) uniform samples, preserving their convergence guarantees while drastically lowering the overall sample budget.
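To make the initialization step concrete, here is a hedged sketch of the spectral procedure on a rank‑1 example (pure‑Python power iteration; the factor vectors, constants, and iteration count are our own choices, not the paper's algorithm verbatim): each wedge (i, ℓ, j) contributes the product A_{iℓ}A_{jℓ} to an off‑diagonal estimate of AAᵀ, and the top eigenvector of that estimate approximates the left factor.

```python
import math
import random

random.seed(2)
n, k = 40, 3
num_cols = n ** (k - 1)
budget = int(40 * n * math.log(n))           # generous near-linear budget

# Fixed, incoherent rank-1 factors standing in for the unknown tensor.
u = [math.sin(i + 1) for i in range(n)]
norm_u = math.sqrt(sum(x * x for x in u))
u = [x / norm_u for x in u]
v = [math.cos(l + 1) for l in range(num_cols)]

# Accumulate the off-diagonal Gram estimate B ~ A A^T from wedge samples.
B = [[0.0] * n for _ in range(n)]
for _ in range(budget):
    i, j = random.sample(range(n), 2)
    l = random.randrange(num_cols)
    prod = (u[i] * v[l]) * (u[j] * v[l])     # A[i, l] * A[j, l]
    B[i][j] += prod
    B[j][i] += prod

# Power iteration for the top eigenvector of B.
x = [1.0 / math.sqrt(n)] * n
for _ in range(200):
    y = [sum(B[r][c] * x[c] for c in range(n)) for r in range(n)]
    norm = math.sqrt(sum(t * t for t in y)) or 1.0
    x = [t / norm for t in y]

alignment = abs(sum(a * b for a, b in zip(x, u)))
print(f"|<x, u>| = {alignment:.3f}")
```

Because the diagonal of AAᵀ is never formed, the estimate is unbiased on the off‑diagonal signal that the wedges probe directly; in this toy run the recovered direction aligns closely with the true factor, and in the pipelines above this estimate would be handed to a refinement stage (spectral denoising or gradient descent) on extra uniform samples.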

The authors also develop new concentration tools for sparse random tensors under an incoherent norm, overcoming the spectral‑norm barrier at p ≈ n^{‑3/2} that hampers prior analyses. Their leave‑one‑out technique provides the necessary ℓ₂,∞ control to handle ultra‑sparse regimes where only ˜O(n) samples are available.

Experimental results on synthetic tensors of various orders (k = 3, 4) and ranks confirm the theory. Wedge‑sampling‑based methods achieve accurate recovery with near‑linear samples, whereas uniform‑sampling baselines fail or require orders of magnitude more observations. Moreover, adding a modest number of uniform samples after wedge initialization enables existing algorithms to reach exact recovery with the same near‑linear budget.

In summary, the paper demonstrates that the statistical‑to‑computational gap observed in tensor completion is largely an artifact of the uniform entry sampling model. By redesigning the measurement process to directly capture informative length‑two patterns, wedge sampling provides a powerful, non‑adaptive framework that attains almost optimal sample complexity while remaining compatible with current polynomial‑time algorithms. This work opens a new direction for designing measurement schemes that bridge the gap between information‑theoretic limits and efficient computation in high‑dimensional tensor problems.

