KOINEU

February 10, 2026


📝 Original Info

Title:
ArXiv ID: 2512.20325
Date:
Authors: Unknown

📝 Abstract

Exterior powers play an important role in persistent homology in computational geometry. In this paper we study the problem of extracting the $K$ longest intervals of the exterior-power layers $\Lambda^i M$ of a tame persistence module $M$ directly from the barcode $B(M)$, without enumerating the entire $B(\Lambda^i M)$. We prove a structural decomposition theorem that organizes $B(\Lambda^i M)$ into monotone per-anchor streams with explicit multiplicities, enabling a best-first algorithm. We provide an $O((M + K) \log M)$-time algorithm for any fixed $i \ge 2$, obtained via a grouped best-first search. We also show that the Top-K length vector is 2-Lipschitz under bottleneck perturbations of the input barcode, and prove a comparison-model lower bound implying that the $O(M \log M)$ preprocessing is information-theoretically unavoidable. Our experiments confirm the theory, showing speedups over full enumeration in high-overlap cases. By enabling efficient extraction of the most prominent features, our approach makes higher-order persistence feasible for large datasets and thus broadly applicable to machine learning, data science, and scientific computing.

📄 Full Content

Exterior powers $\Lambda^i$ in persistent homology capture higher-order interactions among topological features that ordinary persistence cannot record [14,9,25].¹ While standard persistence tracks the lifetimes of individual cycles, exterior powers $\Lambda^i$ encode how groups of cycles coexist, providing richer invariants that are stable and computable [7,4,5]. These higher-order signatures are increasingly important in applications ranging from theoretical topology to machine learning, where concise and discriminative summaries are essential.

However, computing the full exterior-power barcode quickly becomes infeasible: even for $\Lambda^2$, its size can be quadratic in the input. Straightforward algorithms that enumerate all intervals therefore waste work when only the most significant features are needed. This motivates the Top-K problem: extracting just the $K$ longest intervals of $B(\Lambda^i M)$ without enumerating the entire structure. Such a view aligns with practice, where users seek concise visual summaries, robust statistics, or fixed-length features for downstream learning.

This work shows that Top-K for exterior powers admits an efficient, stable solution, bridging classical persistent homology algorithms [10,26,1] with techniques from selection algorithms [11,13] and persistent data structures [8]. This makes higher-order persistence more scalable and broadly applicable across computational geometry and topological data analysis.

The main contributions of this paper are as follows: (i) We give a structural decomposition of $B(\Lambda^i M)$ into simple monotone streams, making explicit where higher-order intervals originate and how their multiplicities arise; (ii) We design a best-first algorithm that extracts the exact Top-K intervals in near output-sensitive time, avoiding the cost of full enumeration; (iii) We show that the Top-K length vector is stable under bottleneck perturbations, providing a concise and noise-robust summary; (iv) We establish an $\Omega(M \log M)$ lower bound in the comparison model, proving that our preprocessing cost is optimal up to constants.

In the rest of the paper, we develop mathematical foundations first, and provide structure theorems and algorithms with complexity results. We then prove stability and optimality results. Finally, we give an experimental verification.

Let i ≥ 2 be a fixed constant throughout the paper.

Let $M$ be a tame, pointwise finite-dimensional persistence module over a field $k$, indexed by an interval $I \subseteq \mathbb{R}$, with barcode

$$B(M) = \{\, [b_r, d_r) : r = 1, \dots, M \,\},$$

where, for convenience of notation, we use the same symbol $M$ for the module and the corresponding number of intervals (for persistence module basics, see [25,5]). We write $M_t$ as usual for the value of $M$ at $t \in I$. $\Lambda^i M$ is defined pointwise by $(\Lambda^i M)_t = \Lambda^i M_t$.² Unless otherwise stated, we assume all bars are finite ($d_r < \infty$); this covers common filtrations on finite complexes/graphs. In some filtrations (e.g. $H_0$ of Vietoris-Rips or Čech complexes), some bars extend to $+\infty$. Our algorithm and decomposition extend unchanged if such bars are handled in either of the following standard ways: (i) Truncation: fix a global time horizon $t_{\max}$ and replace each infinite bar $[b_r, \infty)$ by $[b_r, t_{\max})$, so that $\Lambda^i$-intervals respect the finite horizon; (ii) Relative formulation: regard an infinite bar as persisting until a formal symbol $\infty$, and observe that in the exterior-power interval calculus (Theorem 1 below) only $\min\{d_r, \dots\}$ appears, so truncating to any sufficiently large finite cutoff yields the same Top-K results. Thus we may assume without loss of generality that all bars are finite.
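The truncation option (i) can be sketched in a few lines. This is only an illustration: the function name `truncate_infinite_bars` and the representation of a bar as a `(birth, death)` tuple with `math.inf` for infinite deaths are assumptions made here, not part of the paper.

```python
import math

def truncate_infinite_bars(bars, t_max):
    """Replace each infinite bar [b, inf) by [b, t_max); finite bars are kept.

    Assumption: bars are (birth, death) tuples and t_max is chosen strictly
    beyond every birth and every finite death.
    """
    latest = max((b if math.isinf(d) else d for b, d in bars), default=-math.inf)
    if t_max <= latest:
        raise ValueError("t_max must exceed every birth and finite death")
    return [(b, t_max if math.isinf(d) else d) for b, d in bars]
```

Requiring the horizon to lie beyond all finite endpoints keeps every truncated bar non-empty and leaves the finite bars untouched.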

We adopt the closed-open convention and process a global event list of all $b_r$ and $d_r$ sorted by time, breaking ties by handling deaths before births [9,5].³ Among births at the same time we fix any total order that is consistent across the sweep (cf. [9,10]). For an event sweep from $-\infty$ to $+\infty$, just before the birth of bar $r$ at time $b_r$, define the alive set $A_r := \{\, s : [b_s, d_s) \text{ is alive just before } b_r \,\}$ and $c_r := |A_r|$. Order $A_r$ by non-increasing death times; write these as $d_r(1) \ge d_r(2) \ge \cdots \ge d_r(c_r)$, and define

$$\ell_r(j) := \max\{\, 0,\ \min\{d_r, d_r(j)\} - b_r \,\}, \qquad 1 \le j \le c_r, \tag{1}$$

so $j \mapsto \ell_r(j)$ is non-increasing.
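As a concrete illustration of the sweep conventions, the following minimal sketch computes $c_r$ and the anchored lengths for every bar. It assumes the length formula $\ell_r(j) = \max\{0, \min\{d_r, d_r(j)\} - b_r\}$ (the form used later in the proof of Proposition 1), breaks ties among equal births by input index, and uses a quadratic scan instead of the persistent tree. The function name `alive_sets` is hypothetical.

```python
def alive_sets(bars):
    """Map each bar index r to (c_r, [l_r(1), l_r(2), ...]).

    Bars are closed-open (birth, death) tuples. A bar s is alive just before
    r is born if s was processed earlier and d_s > b_r (deaths are handled
    before births at equal times). Quadratic on purpose: a readable sketch.
    """
    order = sorted(range(len(bars)), key=lambda r: (bars[r][0], r))
    processed = []
    result = {}
    for r in order:
        b_r, d_r = bars[r]
        # Death times of the alive set A_r, in non-increasing order.
        deaths = sorted((bars[s][1] for s in processed if bars[s][1] > b_r),
                        reverse=True)
        lengths = [max(0, min(d_r, d) - b_r) for d in deaths]
        result[r] = (len(deaths), lengths)
        processed.append(r)
    return result
```

For example, with bars `[(0, 4), (1, 3), (2, 5)]` the third bar sees two alive bars, whose deaths 4 and 3 are ranked in that order.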

We first prove a fundamental theorem clarifying the barcode structure of exterior powers.

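Theorem 1 (Exterior-power interval calculus). For any $i \ge 1$, the barcode of $\Lambda^i M$ is the multiset

$$B(\Lambda^i M) = \Big\{\, \big[\max_j b_{\ell_j},\ \min_j d_{\ell_j}\big) \;:\; 1 \le \ell_1 < \cdots < \ell_i \le M,\ \max_j b_{\ell_j} < \min_j d_{\ell_j} \,\Big\}.$$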

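Proof. By the barcode decomposition for tame pointwise finite-dimensional modules (see [25,5]), there is a (noncanonical) isomorphism $M \cong \bigoplus_{r=1}^{M} I_{[b_r, d_r)}$, where $I_{[b_r, d_r)}$ is the (one-dimensional) interval module supported on $[b_r, d_r)$.

Fix $i \ge 1$. Apply the exterior-power functor pointwise in $t \in \mathbb{R}$. For any finite family of vector spaces $V_1, \dots, V_M$ of dimension at most one,

$$\Lambda^i\Big(\bigoplus_{r=1}^{M} V_r\Big) \;\cong\; \bigoplus_{1 \le \ell_1 < \cdots < \ell_i \le M} V_{\ell_1} \otimes \cdots \otimes V_{\ell_i}.$$

Taking $V_r = (I_{[b_r, d_r)})_t$ (which is $k$ if $t \in [b_r, d_r)$ and $0$ otherwise), the summand for $\{\ell_1, \dots, \ell_i\}$ is $k$ precisely when $t$ lies in the intersection $\bigcap_{j=1}^{i} [b_{\ell_j}, d_{\ell_j})$, and is $0$ otherwise. Hence, as $t$ varies, the subfunctor generated by this summand is the interval module supported on $\bigcap_{j=1}^{i} [b_{\ell_j}, d_{\ell_j}) = [\max_j b_{\ell_j}, \min_j d_{\ell_j})$, which is nonzero exactly when $\max_j b_{\ell_j} < \min_j d_{\ell_j}$.

Naturality of the above isomorphisms with respect to the structure maps of $M$ shows that these pointwise decompositions assemble to an isomorphism of persistence modules

$$\Lambda^i M \;\cong\; \bigoplus_{1 \le \ell_1 < \cdots < \ell_i \le M} I_{\bigcap_{j=1}^{i} [b_{\ell_j}, d_{\ell_j})},$$

with the convention that empty intersections contribute the zero module and hence no bar. Therefore the barcode of $\Lambda^i M$ is precisely the stated multiset of intervals.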
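The interval calculus translates directly into a brute-force enumeration, which is useful as a reference implementation on small inputs. The sketch below is illustrative only; the function name `exterior_power_barcode` is an assumption, and its cost grows like $\binom{M}{i}$, which is exactly the cost the Top-K approach is meant to avoid.

```python
from itertools import combinations

def exterior_power_barcode(bars, i):
    """Enumerate the barcode of the i-th exterior power by brute force.

    Each i-subset of bars contributes [max birth, min death) when that
    interval is non-empty; empty intersections contribute no bar.
    """
    out = []
    for combo in combinations(bars, i):
        birth = max(b for b, _ in combo)
        death = min(d for _, d in combo)
        if birth < death:
            out.append((birth, death))
    return sorted(out)
```

With `bars = [(0, 4), (1, 3), (2, 5)]` and `i = 2`, the three pairs all overlap and each contributes one interval.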

Let the multiset of lengths of $B(\Lambda^i M)$ (counted with multiplicity) be sorted in non-increasing order as $L_1 \ge L_2 \ge \cdots$. For $K \ge 1$, the Top-K multiset is $\{L_1, \dots, L_K\}$ (with multiplicity), and the Top-K length vector is

$$L_K(M, i) := (L_1, L_2, \dots, L_K),$$

padded with zeros if necessary.⁴ For $j \ge i-1$ define the binomial weight $w_i(j) := \binom{j-1}{i-2}$. In Section 3 we prove that, for fixed anchor $r$, all $\Lambda^i$ intervals whose largest chosen rank equals $j$ have common length $\ell_r(j)$ and total multiplicity $w_i(j)$; moreover, the union over all anchors gives the full multiset $B(\Lambda^i M)$.
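Both objects defined here are easy to compute once a list of lengths is available. A minimal sketch, with the hypothetical names `binomial_weight` and `top_k_length_vector`:

```python
from math import comb

def binomial_weight(i, j):
    """w_i(j) = C(j-1, i-2), defined for j >= i-1 and i >= 2."""
    return comb(j - 1, i - 2)

def top_k_length_vector(lengths, k):
    """Sort lengths in non-increasing order, keep k, pad with zeros."""
    top = sorted(lengths, reverse=True)[:k]
    return top + [0] * (k - len(top))
```

Note that `binomial_weight(2, j)` is always 1, so for $i = 2$ every rank contributes a single interval.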

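Let $\Delta = \{(t, t) \in \mathbb{R}^2 : t \in \mathbb{R}\}$ denote the diagonal. For barcodes $X, Y$ (finite multisets of points $(b, d)$ with $b < d$), an $\varepsilon$-matching is a partial matching between $X \cup \Delta$ and $Y \cup \Delta$ such that matched pairs are within $L_\infty$-distance $\le \varepsilon$ and unmatched points lie within $\varepsilon$ of the diagonal. The bottleneck distance is

$$d_B(X, Y) := \inf\{\, \varepsilon \ge 0 : \text{there exists an } \varepsilon\text{-matching between } X \cup \Delta \text{ and } Y \cup \Delta \,\}.$$

Theorem 2 (Stability of exterior powers). For $i \ge 1$ and tame persistence modules $M, M'$,

$$d_B\big(B(\Lambda^i M),\, B(\Lambda^i M')\big) \;\le\; d_B\big(B(M),\, B(M')\big).$$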

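Proof. Suppose $M$ and $M'$ are $\varepsilon$-interleaved, i.e. there exist linear maps $f_t : M_t \to M'_{t+\varepsilon}$ and $g_t : M'_t \to M_{t+\varepsilon}$ commuting with structure maps and satisfying the usual zig-zag relations up to shift $2\varepsilon$. Because $\Lambda^i$ is a functor on vector spaces that preserves linear maps, applying $\Lambda^i$ to each $f_t, g_t$ yields natural transformations

$$\Lambda^i f_t : \Lambda^i M_t \to \Lambda^i M'_{t+\varepsilon}, \qquad \Lambda^i g_t : \Lambda^i M'_t \to \Lambda^i M_{t+\varepsilon}.$$

These commute with the induced structure maps of $\Lambda^i M$ and $\Lambda^i M'$, since functors preserve commutative diagrams. Moreover, the zig-zag identities are preserved under $\Lambda^i$, because if $g_{t+\varepsilon} \circ f_t$ equals the shift map $M_t \to M_{t+2\varepsilon}$, then $\Lambda^i g_{t+\varepsilon} \circ \Lambda^i f_t$ equals the shifted map on $\Lambda^i M_t$. Thus $\Lambda^i M$ and $\Lambda^i M'$ are also $\varepsilon$-interleaved. By the fundamental isometry theorem of persistence ([4,5]), interleaving distance equals bottleneck distance on barcodes. Hence

$$d_B\big(B(\Lambda^i M),\, B(\Lambda^i M')\big) \;\le\; d_B\big(B(M),\, B(M')\big).$$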

This result shows that exterior powers preserve the classical stability of persistence, ensuring that higher-order interaction features remain robust under perturbations of the input data.

We analyze running time in the RAM (Random Access Machine) model with comparisons; sorting $O(M)$ endpoints costs $O(M \log M)$, which is information-theoretically unavoidable in this model (cf. Section 6; see also [15,6]). We use coordinate compression of distinct death times to $\{1, \dots, N\}$.

Persistent order-statistics (OS) tree. We will use a standard persistent segment tree over {1, . . . , N } storing counts of alive bars at each compressed death coordinate (cf. [8]). It supports:

-Update(root, pos, ±1) in $O(\log M)$ time, returning a new root and keeping the old root immutable;

-KthFromRight(root, k) in $O(\log M)$ time: returns the death value of the k-th alive bar in non-increasing order, counting with multiplicity. Equivalently, the tree maintains cumulative counts of alive bars at each coordinate, and the query walks these counts from the right to locate the k-th bar;

-Size(root) in $O(1)$ time.

During the sweep we store, for each birth of r, the snapshot root T r before inserting r (encoding A r ) and the integer c r = Size(T r ). The total space is O(M ) for heap buffers plus O(M log M ) nodes for persistence.
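The three operations can be realised with path copying on a segment tree over the compressed coordinates. The sketch below is one possible realisation under stated assumptions, not the authors' implementation: the names `Node`, `size`, `update` and `kth_from_right` are hypothetical, missing subtrees are represented by `None`, and the query returns the compressed coordinate (mapping it back to the death value is a table lookup).

```python
class Node:
    """Immutable segment-tree node; count = alive bars in its coordinate range."""
    __slots__ = ("left", "right", "count")

    def __init__(self, left, right, count):
        self.left, self.right, self.count = left, right, count

def size(node):
    """Number of stored items under this root (O(1))."""
    return node.count if node is not None else 0

def update(node, lo, hi, pos, delta):
    """Return a NEW root with the count at pos changed by delta.

    Only the O(log N) nodes on the root-to-leaf path are copied, so every
    earlier root remains a valid snapshot.
    """
    if lo == hi:
        return Node(None, None, size(node) + delta)
    mid = (lo + hi) // 2
    left = node.left if node is not None else None
    right = node.right if node is not None else None
    if pos <= mid:
        left = update(left, lo, mid, pos, delta)
    else:
        right = update(right, mid + 1, hi, pos, delta)
    return Node(left, right, size(left) + size(right))

def kth_from_right(node, lo, hi, k):
    """Coordinate of the k-th largest stored item (1-indexed, with multiplicity)."""
    if not 1 <= k <= size(node):
        raise IndexError("rank out of range")
    while lo < hi:
        mid = (lo + hi) // 2
        right_count = size(node.right)
        if k <= right_count:          # answer lies in the upper half
            node, lo = node.right, mid + 1
        else:                         # skip the upper half entirely
            k -= right_count
            node, hi = node.left, mid
    return lo
```

In the sweep, the root held just before inserting bar $r$ plays the role of the snapshot $T_r$, and `size` of that root gives $c_r$.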

We now derive a birth-anchored, rank-grouped description of B(Λ i M ) that will drive our best-first algorithm. Throughout this section, the sweep/tie conventions of Section 2 apply (see also [14,25] for background on barcode manipulations).

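Given an $i$-tuple $I = \{\ell_1, \dots, \ell_i\} \subseteq \{1, \dots, M\}$ with $\max_j b_{\ell_j} < \min_j d_{\ell_j}$, let

$$t^\star := \max_j b_{\ell_j}, \qquad A^\star := \{\, \ell_j \in I : b_{\ell_j} = t^\star \,\}.$$

By our event order (deaths before births, and a fixed total order among equal-time births), there is a unique index $r^\star \in A^\star$ that is processed last at time $t^\star$.

We call $r^\star$ the anchor of $I$. At the moment just before $b_{r^\star} = t^\star$, all elements of $I \setminus \{r^\star\}$ are alive; hence $I \setminus \{r^\star\} \subseteq A_{r^\star}$, where $A_{r^\star}$ is the alive set from Section 2. Ordering $A_{r^\star}$ by non-increasing death times, write the deaths as

$$d_{r^\star}(1) \ge d_{r^\star}(2) \ge \cdots \ge d_{r^\star}(c_{r^\star}).$$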

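Lemma 1. Assume the sweep order processes deaths before births at equal times and fixes a total order among simultaneous births. For any $\Lambda^i$ interval arising from

$$I = \{\ell_1, \dots, \ell_i\}, \qquad t^\star := \max_j b_{\ell_j} < \min_j d_{\ell_j},$$

there is a unique anchor $r^\star$, namely the index processed last among the births at $t^\star$, and

$$I \setminus \{r^\star\} \subseteq A_{r^\star}, \qquad \big[\max_j b_{\ell_j},\ \min_j d_{\ell_j}\big) = \big[\, b_{r^\star},\ \min\{ d_{r^\star},\ d_s : s \in I \setminus \{r^\star\} \} \,\big).$$

Conversely, for any $r$ and any $(i-1)$-subset $S$ of $A_r$, the $i$-tuple $\{r\} \cup S$ yields a (possibly zero-length) $\Lambda^i$ interval with birth $b_r$.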

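Proof. By Theorem 1, the $\Lambda^i$ interval from $I$ is

$$\big[\max_j b_{\ell_j},\ \min_j d_{\ell_j}\big).$$

Let $t^\star = \max_j b_{\ell_j}$. Among the indices with birth $t^\star$, exactly one is processed last under the tie rule; call it $r^\star$. At time $b_{r^\star}$, all other $\ell_j$ are alive, so

$$I \setminus \{r^\star\} \subseteq A_{r^\star} \quad \text{and} \quad \big[\max_j b_{\ell_j},\ \min_j d_{\ell_j}\big) = \big[\, b_{r^\star},\ \min\{ d_{r^\star},\ d_s : s \in I \setminus \{r^\star\} \} \,\big).$$

Conversely, for any anchor $r$ and any $(i-1)$-subset $S \subseteq A_r$, the $i$-tuple $\{r\} \cup S$ yields the interval $[b_r, \min\{d_r, d_s : s \in S\})$ by the same formula.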

Fix an anchor $r$ with alive set $A_r$ of size $c_r$. Let $S = \{j_1 < \cdots < j_{i-1}\} \subseteq \{1, \dots, c_r\}$ be the ranks of the chosen neighbors (so the corresponding death times are $d_r(j_1), \dots, d_r(j_{i-1})$). The $\Lambda^i$ interval produced by $(r, S)$ is

$$I(r, S) := \big[\, b_r,\ \min\{ d_r, d_r(j_1), \dots, d_r(j_{i-1}) \} \,\big) = \big[\, b_r,\ \min\{ d_r, d_r(\max S) \} \,\big),$$

whose length equals, by definition (1), $|I(r, S)| = \ell_r(\max S)$. Hence the length depends only on the largest rank in $S$.

Proposition 1. Fix $r$ and a rank $j \in \{i-1, \dots, c_r\}$. The number of $(i-1)$-subsets $S$ of ranks with $\max S = j$ equals $w_i(j) = \binom{j-1}{i-2}$. All such subsets yield the same length $\ell_r(j)$, truncated below by 0 as in (1). In particular, when $d_r(j) \le b_r$ the resulting value is $\ell_r(j) = 0$, which contributes nothing to $B(\Lambda^i M)$.

Proof. To have $\max S = j$, one must include rank $j$ and choose the remaining $i-2$ ranks from $\{1, \dots, j-1\}$, giving $\binom{j-1}{i-2}$ choices. The shared length follows because $|I(r, S)| = \min\{d_r, d_r(j)\} - b_r$ depends only on the largest rank (and is truncated below by 0 as in (1)).

Theorem 3. Fix $i \ge 2$ and set $J_r := \{\, j \in \{i-1, \dots, c_r\} : \ell_r(j) > 0 \,\}$. For each anchor $r$, the lengths of the $\Lambda^i$ intervals anchored at $r$ form the multiset

$$\{\, \ell_r(j) \text{ with multiplicity } w_i(j) = \tbinom{j-1}{i-2} : j \in J_r \,\},$$

with $j \mapsto \ell_r(j)$ non-increasing, and globally the multiset of lengths of $B(\Lambda^i M)$ is

$$\bigsqcup_{r=1}^{M} \{\, \ell_r(j) \text{ with multiplicity } w_i(j) : j \in J_r \,\},$$

a disjoint union of multisets of finite (positive-length) intervals.

Proof. Fix $i \ge 2$. By the interval calculus (Theorem 1), any $i$-tuple $I = \{\ell_1, \dots, \ell_i\}$ with $\max_j b_{\ell_j} < \min_j d_{\ell_j}$ produces the $\Lambda^i$-interval $[\max_j b_{\ell_j}, \min_j d_{\ell_j})$. Let $t^\star = \max_j b_{\ell_j}$ and choose the unique index $r^\star$ that is processed last among those with $b_{\ell_j} = t^\star$. Then $I \setminus \{r^\star\} \subseteq A_{r^\star}$, and $r^\star$ is the anchor of $I$. Conversely, for any anchor $r$ and $(i-1)$-subset $S \subseteq A_r$, the $i$-tuple $\{r\} \cup S$ yields

$$I(r, S) = \big[\, b_r,\ \min\{ d_r,\ d_s : s \in S \} \,\big).$$

Thus every element of $B(\Lambda^i M)$ arises uniquely from some pair $(r, S)$. Now order $A_r$ by non-increasing death times $d_r(1) \ge d_r(2) \ge \cdots \ge d_r(c_r)$. If $S = \{j_1 < \cdots < j_{i-1}\}$ are the ranks of the chosen neighbors, then the length of $I(r, S)$ depends only on the largest rank: $|I(r, S)| = \ell_r(\max S)$. Therefore all $(i-1)$-subsets with the same maximal rank $j$ yield the same length $\ell_r(j)$.

At this point Proposition 1 applies: it tells us that the number of such subsets is exactly $w_i(j) = \binom{j-1}{i-2}$, and that they all contribute the same value $\ell_r(j)$ (truncated at 0). Hence for each anchor $r$, the multiset of anchored lengths is $\{\, \ell_r(j) \text{ with multiplicity } w_i(j) : j \in \{i-1, \dots, c_r\} \,\}$.

Finally, define $J_r = \{\, j : \ell_r(j) > 0 \,\}$. Restricting to $j \in J_r$ removes the zero-length intervals, which do not belong to $B(\Lambda^i M)$. Because each interval has a unique anchor, the global barcode is the disjoint multiset union over anchors:

$$\bigsqcup_{r=1}^{M} \{\, \ell_r(j) \text{ with multiplicity } w_i(j) : j \in J_r \,\}.$$

Monotonicity of $j \mapsto \ell_r(j)$ follows directly from the ordering $d_r(1) \ge d_r(2) \ge \cdots$. This proves the theorem.

When $i = 2$, $w_2(j) = \binom{j-1}{0} = 1$, so each rank contributes exactly one element and the anchored stream becomes a simple non-increasing sequence $\ell_r(1) \ge \ell_r(2) \ge \cdots \ge \ell_r(c_r)$.

The above theorem gives a complete and nonredundant decomposition of B(Λ i M ) into per-anchor monotone streams with closed-form multiplicities, providing the structural foundation for efficient Top-K algorithms and showing exactly how higher-order intervals are organized.
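The decomposition can be checked numerically on small inputs by comparing the anchored, rank-grouped multiset against a brute-force enumeration of all $i$-subsets. The sketch below is such a sanity check and makes several assumptions: bars are `(birth, death)` tuples with exact (e.g. integer) endpoints, equal births are ordered by input index, and the names `brute_force_lengths` and `anchored_lengths` are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb

def brute_force_lengths(bars, i):
    """Multiset of positive interval lengths over all i-subsets of bars."""
    counts = Counter()
    for combo in combinations(bars, i):
        length = min(d for _, d in combo) - max(b for b, _ in combo)
        if length > 0:
            counts[length] += 1
    return counts

def anchored_lengths(bars, i):
    """Same multiset, assembled per anchor r and rank j with weight C(j-1, i-2)."""
    counts = Counter()
    order = sorted(range(len(bars)), key=lambda r: (bars[r][0], r))
    for pos, r in enumerate(order):
        b_r, d_r = bars[r]
        # Deaths of bars alive just before r is born, largest first.
        deaths = sorted((bars[s][1] for s in order[:pos] if bars[s][1] > b_r),
                        reverse=True)
        for j in range(i - 1, len(deaths) + 1):
            length = min(d_r, deaths[j - 1]) - b_r
            if length > 0:
                counts[length] += comb(j - 1, i - 2)
    return counts
```

On any small barcode the two counters should coincide for every fixed $i \ge 2$, which is the content of the structural decomposition.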

We now give a best-first algorithm that outputs the K longest elements of B(Λ i M ) without enumerating the entire multiset.

Build the global event list of all births and deaths, sorted by time, with deaths before births at ties (Section 2; cf. [10,9]). Coordinate-compress distinct death times to {1, . . . , N } and maintain a persistent order-statistics tree over this axis, storing counts of alive deaths. During the sweep:

-On a death of bar x, perform an update -1 at the index of d x .

-On a birth of bar r at time b r , before inserting r: store the current snapshot root T r encoding A r , and record c r = Size(T r ); then insert +1 at the index of d r so that r is alive for later anchors.

This costs O(M log M ) time and O(M log M ) persistent nodes.

By Theorem 3, $B(\Lambda^i M)$ is the multiset union of rank-grouped streams. We run a grouped best-first search where each heap entry represents the current head $(r, j)$ of anchor $r$'s stream at rank $j$ with key $\ell_r(j)$ and weight $w_i(j) = \binom{j-1}{i-2}$. This mirrors classic best-first paradigms in top-K aggregation and selection over structured sets (cf. [11,13]).

The entire procedure is given in Algorithm 1 below. This algorithm leverages the rank-grouped structure of $B(\Lambda^i M)$ to compute the exact Top-K intervals in near output-sensitive time, avoiding full enumeration when $K$ is small relative to $|B(\Lambda^i M)|$.

Interval identities. Algorithm 1 outputs the Top-K length multiset directly. If actual interval identities are required, each bulk emission at line 12 can be expanded into the explicit $(i-1)$-subsets of ranks that realize the multiplicity $w_i(j)$, truncated once $K$ intervals are produced. This refinement preserves the asymptotic complexity bound for fixed $i$.

Unbundled (colex) variant. Alternatively, one can represent states as strictly increasing (i-1)-tuples of ranks and expand at most i colex neighbors per pop; this yields the same outputs with an i factor in the loop cost (cf. [13,11]). We focus on the grouped variant for the sharpest bound.
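The grouped best-first search can be sketched as follows. This is a simplified illustration rather than Algorithm 1 itself: for readability each anchor's sorted death list is materialised by a quadratic scan, standing in for the persistent-tree `KthFromRight` queries, so only the heap logic reflects the stated complexity. The function name `top_k_exterior_lengths` and the index-based tie order among equal births are assumptions.

```python
import heapq
from math import comb

def top_k_exterior_lengths(bars, i, k):
    """Return up to k longest interval lengths of the i-th exterior power (i >= 2)."""
    order = sorted(range(len(bars)), key=lambda r: (bars[r][0], r))
    streams = {}   # anchor r -> [l_r(1), l_r(2), ...], non-increasing
    heap = []      # entries (-l_r(j), r, j): the current head of each stream
    for pos, r in enumerate(order):
        b_r, d_r = bars[r]
        deaths = sorted((bars[s][1] for s in order[:pos] if bars[s][1] > b_r),
                        reverse=True)
        lengths = [min(d_r, d) - b_r for d in deaths]
        if len(lengths) >= i - 1:
            streams[r] = lengths
            # The first usable rank is j = i-1, stored at list index i-2.
            heapq.heappush(heap, (-lengths[i - 2], r, i - 1))
    out = []
    while heap and len(out) < k:
        neg_len, r, j = heapq.heappop(heap)
        # Bulk emission: all C(j-1, i-2) copies share this length.
        copies = min(comb(j - 1, i - 2), k - len(out))
        out.extend([-neg_len] * copies)
        if j < len(streams[r]):
            heapq.heappush(heap, (-streams[r][j], r, j + 1))
    return out
```

Because every stream is non-increasing and every stream head is in the heap, the popped head is always a global maximum of the elements not yet emitted, which is why the output is already sorted.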

Theorem 4 (Correctness and complexity). For fixed $i \ge 2$, Algorithm 1 (the grouped variant) outputs exactly the $K$ longest elements of $B(\Lambda^i M)$ (with multiplicity) in non-increasing order in $O((M + K) \log M)$ time, using $O(M \log M)$ space.

Proof. We first prove correctness. By Theorem 3, the global multiset is the disjoint union of monotone streams $\{\ell_r(j)\}$ with weights $w_i(j)$. The heap stores precisely the current heads of all nonempty streams. Because each stream is non-increasing and the heap key is the head length, once $\ell_r(j)$ is popped no unseen element can exceed it, since every remaining element is bounded by its stream head and every head is in the heap. The algorithm therefore emits all $w_i(j)$ copies at once; ties may thus be grouped per anchor, consistent with the tie policy in Section 2. Since the Top-K vector is invariant under permutations of equal values, bulk emission is safe. After advancing that stream, the invariant is preserved. By induction, the outputs are exactly the global Top-K in order.

We now prove the complexity statements. Preprocessing costs $O(M \log M)$. During initialization, each anchor with $c_r \ge i-1$ contributes at most one heap entry, obtained by a single KthFromRight query at $j = i-1$. Anchors with $c_r < i-1$ contribute none, so the number of initial heap entries is at most $M$, giving $O(M \log M)$ time overall. Each pop outputs at least one item, so there are at most $K$ pops (or fewer if the heap $H$ becomes empty when the total output is $< K$). A pop performs $O(1)$ heap operations and a single order-statistics query, each $O(\log M)$, giving $O(K \log M)$ for the loop in the grouped variant and $O(iK \log M)$ in the unbundled variant. Space bounds follow from the heap size $O(M)$ and the persistent tree. All $\log M$ factors are under the RAM model, where basic arithmetic and memory accesses take $O(1)$ time.

In the special case $i = 2$: since $w_2(j) = 1$ for all $j$, every pop outputs a single element and advances $j \to j+1$, yielding the stated $O((M + K) \log M)$ bound.

We show that the Top-K length vector of B(Λ i M ) varies Lipschitz-continuously (with constant 2) under bottleneck perturbations of the input barcode B(M ). Throughout this section, i ≥ 1 is fixed.

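Recall from Section 2 that

$$L_K(M, i) = (L_1, L_2, \dots, L_K)$$

denotes the non-increasing Top-K length vector of $B(\Lambda^i M)$ (with multiplicity), padded with zeros if necessary.

Theorem 5. Let $M, M'$ be tame persistence modules with $d_B(B(M), B(M')) \le \varepsilon$. Then for every $K \ge 1$,

$$\| L_K(M, i) - L_K(M', i) \|_\infty \le 2\varepsilon.$$

Proof. Let $X = B(\Lambda^i M)$ and $Y = B(\Lambda^i M')$. By the stability of exterior powers (Theorem 2), we have $d_B(X, Y) \le \varepsilon$. Hence there exists an $\varepsilon$-matching $\pi$ between $X \cup \Delta$ and $Y \cup \Delta$ (cf. [7,5]). Let $S$ and $T$ be the multisets of interval lengths of $X$ and $Y$, where a point matched to the diagonal is paired with length 0. Matched points are within $L_\infty$-distance $\varepsilon$, so each endpoint moves by at most $\varepsilon$ and matched lengths differ by at most $2\varepsilon$. Let $(s_1 \ge s_2 \ge \cdots)$ and $(t_1 \ge t_2 \ge \cdots)$ be the non-increasing rearrangements of $S$ and $T$ (padded with zeros to equal length). We claim that $|s_k - t_k| \le 2\varepsilon$ for all $k$. Suppose, for contradiction, that $s_k > t_k + 2\varepsilon$. Then $S$ has at least $k$ elements $\ge s_k$, so their images under $\pi$ are $\ge s_k - 2\varepsilon > t_k$, implying that $T$ has at least $k$ elements strictly greater than $t_k$, which contradicts the definition of $t_k$. The reverse inequality $t_k > s_k + 2\varepsilon$ is symmetric. Hence $|s_k - t_k| \le 2\varepsilon$ for all $k$. Finally, restricting to the first $K$ coordinates gives $\| L_K(M, i) - L_K(M', i) \|_\infty \le 2\varepsilon$.

Proposition 2. Any algorithm that computes the Top-1 $\Lambda^2$ length of a barcode with $M$ bars requires $\Omega(M \log M)$ comparisons in the algebraic decision tree/comparison model.

Proof. We reduce Element Uniqueness (or, equivalently, the decision version of 1D minimum gap) to computing the Top-1 $\Lambda^2$ length. Consider an input set $\{x_1, \dots, x_M\} \subset [0, \tfrac{1}{2}]$ of real numbers (not necessarily distinct). Construct a barcode with bars $J_r := [x_r, x_r + 1)$ ($r = 1, \dots, M$). For any pair $(r, s)$, the $\Lambda^2$ intersection length equals $|J_r \cap J_s| = \max\{0, 1 - |x_r - x_s|\}$. Since all $|x_r - x_s| \le \tfrac{1}{2} < 1$, every pair intersects and $\max_{r \ne s} |J_r \cap J_s| = 1 - \min_{r \ne s} |x_r - x_s|$. Hence the Top-1 $\Lambda^2$ length equals 1 iff there exists a duplicate ($\min_{r \ne s} |x_r - x_s| = 0$). Therefore any algorithm that computes the Top-1 $\Lambda^2$ length can decide Element Uniqueness. The latter requires $\Omega(M \log M)$ comparisons in the algebraic decision tree/comparison model; thus computing the Top-1 $\Lambda^2$ length also requires $\Omega(M \log M)$ comparisons.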
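The reduction in this proof is short enough to run. The sketch below builds the bars $[x_r, x_r + 1)$ and decides Element Uniqueness from the Top-1 $\Lambda^2$ length. It only illustrates the reduction, not an efficient algorithm: the Top-1 length is found by a quadratic scan, exact `Fraction` arithmetic is used so that the comparison with 1 is not affected by rounding, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def top1_lambda2_length(bars):
    """Longest pairwise intersection length (0 if there are fewer than two bars)."""
    best = Fraction(0)
    for (b1, d1), (b2, d2) in combinations(bars, 2):
        best = max(best, min(d1, d2) - max(b1, b2))
    return best

def has_duplicate(xs):
    """Decide Element Uniqueness for inputs in [0, 1/2] via the Top-1 length."""
    bars = [(Fraction(x), Fraction(x) + 1) for x in xs]
    # Two bars overlap in length exactly 1 iff their left endpoints coincide.
    return top1_lambda2_length(bars) == 1
```

Any faster routine for the Top-1 $\Lambda^2$ length could be substituted for `top1_lambda2_length`, which is how the lower bound for Element Uniqueness transfers.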

We evaluate the proposed best-first algorithm on synthetic barcodes where we can directly control overlap (i.e., expected concurrency), which in turn controls the output size $K_{\text{all}} = |B(\Lambda^2 M)|$. Our goals are to (i) validate exactness against a full enumeration baseline, and (ii) quantify wall-clock improvements as a function of $K$ and overlap.

For each trial we sample $M$ bars on $[0, 1]$ as follows: birth times $b \sim \mathrm{Unif}[0, 1)$ and independent exponential lengths $L \sim \mathrm{Exp}(\lambda)$ truncated to $[0, 1-b]$, with mean parameter set by $E[L] = L_{\text{mean}} \in \{0.03, 0.05\}$. This yields expected concurrency $\approx M \cdot L_{\text{mean}}$, so larger $L_{\text{mean}}$ produces heavier overlap and larger $K_{\text{all}}$.
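A generator for such synthetic barcodes might look like the sketch below. Two points are assumptions made for illustration: "truncated to $[0, 1-b]$" is interpreted as clipping the sampled length at $1-b$ (rather than resampling from a truncated distribution), and the function name `sample_barcode` and its `seed` parameter are hypothetical.

```python
import random

def sample_barcode(m, l_mean, seed=0):
    """Sample m bars on [0, 1]: uniform births, exponential lengths with mean l_mean."""
    rng = random.Random(seed)
    bars = []
    for _ in range(m):
        b = rng.random()                           # birth ~ Unif[0, 1)
        length = rng.expovariate(1.0 / l_mean)     # rate = 1 / mean
        length = min(length, 1.0 - b)              # assumption: clip at 1 - b
        bars.append((b, b + length))
    return bars
```

With `m = 3000` and `l_mean = 0.05`, roughly `m * l_mean = 150` bars are alive at a typical time, which is the heavy-overlap regime described above.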

We focus on $i = 2$ ($\Lambda^2$), where the grouped best-first bound is tightest.⁵ Baseline = Enum-$\Lambda^2$ (full pairwise enumeration) + TopK-Select (heap-based selection; cf. classical PH pipelines [10,26] and fast toolchains such as [1]). Ours = TopK-$\Lambda^2$ (Algorithm 1). We measure total wall-clock time per query and report the factor speedup := (baseline time) / (our time). Correctness is checked by exact multiset equality of the top-K lengths up to numerical rounding.
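For reference, a baseline of this shape (enumerate every pair, then select with a heap) can be sketched as follows. This is an assumed reconstruction of what Enum-$\Lambda^2$ plus TopK-Select might look like, not the authors' code; the function name `baseline_top_k` is hypothetical.

```python
import heapq
from itertools import combinations

def baseline_top_k(bars, k):
    """Full pairwise enumeration followed by heap-based selection of the k longest."""
    lengths = (min(d1, d2) - max(b1, b2)
               for (b1, d1), (b2, d2) in combinations(bars, 2))
    # Keep only non-empty intersections; nlargest returns them sorted descending.
    return heapq.nlargest(k, (x for x in lengths if x > 0))
```

Its running time is dominated by the $\binom{M}{2}$ pairs regardless of $K$, which is the fixed cost that the speedup ratio measures against.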

We use $M \in \{3000, 5000\}$, $L_{\text{mean}} \in \{0.03, 0.05\}$, and $K = 10{,}000$. For each setting, we report a single representative run; in all runs the outputs of TopK-$\Lambda^2$ matched Enum-$\Lambda^2$ exactly.

Table 1 summarizes the results (wall times in seconds). In the heavy overlap case, when $K_{\text{all}}$ is large, TopK-$\Lambda^2$ consistently wins (2.1× at $K = 10^4$ for $M = 3000$, $L_{\text{mean}} = 0.05$), because it avoids the baseline's fixed "enumerate everything" cost. In the moderate overlap case, we have steady gains (1.45×) for $M = 5000$, $L_{\text{mean}} = 0.03$. All runs matched baseline Top-K lengths exactly.

We introduced a birth-anchored, rank-grouped structural decomposition of $B(\Lambda^i M)$ (Theorem 3) and leveraged it to design a best-first Top-K algorithm (Theorem 4) that returns the exact Top-K multiset without full enumeration. For fixed $i \ge 2$ the grouped running time is $O((M + K) \log M)$, while an unbundled colex variant runs in $O((M + iK) \log M)$. We established that the Top-K length vector is 2-Lipschitz with respect to bottleneck perturbations (Theorem 5), and proved a comparison-model lower bound (Proposition 2) showing the $O(M \log M)$ preprocessing is information-theoretically unavoidable. Experiments confirmed the theory: up to 2.1× speedups in high-overlap regimes, with exact outputs. Beyond its theoretical contributions, this work shows that higher-order persistence constructions can be made practically scalable, enabling their systematic use as stable, discriminative features in data analysis, machine learning, and computational geometry (e.g., for molecular graph classification tasks). We also plan to develop variants of our method for applications at the intersection of logic, category theory and machine learning [18,19,20,21,22,23,24,2].

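Algorithm 1 (initialization, for each anchor $r$ with $c_r \ge i-1$): $j \leftarrow i-1$; query $d_r(j)$ on $T_r$; set $L \leftarrow \ell_r(j)$ via (1).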

¹ $\Lambda$ denotes the exterior product/power. $\Lambda$ is defined pointwise for persistence modules (i.e., take the exterior product at each element of the domain interval; more detail below).

² Note that $\Lambda^i M_t$ on the right-hand side denotes the $i$-th exterior power of the vector space $M_t$. On morphisms, for each $s \le t$ in $I$, we set $(\Lambda^i M)_{s \le t} = \Lambda^i(M_{s \le t}) : \Lambda^i(M_s) \to \Lambda^i(M_t)$. For persistent homology basics, see also [9,25].

³ In persistent homology, the birth of a bar is the parameter value at which a homology class first appears, and its death is the value at which that class disappears [9,5].

⁴ When multiple intervals have equal length, any ordering of the ties is acceptable, since our results concern the Top-K multiset and the sorted length vector $L_K(M, i)$, which are invariant under tie-breaking (cf. top-K aggregation [11]).

⁵ Empirical behavior for $i = 3$ (and higher) matches that for $i = 2$, providing evidence that the observed gains are not special to the case $i = 2$.

📄 Read Full PDF on ArXiv

Reference

This content is AI-processed based on open access ArXiv data.
