Improved Lower Bounds on the Compatibility of Multi-State Characters

We study a long-standing conjecture on the necessary and sufficient conditions for the compatibility of multi-state characters: There exists a function $f(r)$ such that, for any set $C$ of $r$-state characters, $C$ is compatible if and only if every subset of $f(r)$ characters of $C$ is compatible. We show that for every $r \ge 2$, there exists an incompatible set $C$ of $\lfloor\frac{r}{2}\rfloor\cdot\lceil\frac{r}{2}\rceil + 1$ $r$-state characters such that every proper subset of $C$ is compatible. Thus, $f(r) \ge \lfloor\frac{r}{2}\rfloor\cdot\lceil\frac{r}{2}\rceil + 1$ for every $r \ge 2$. This improves the previous lower bound of $f(r) \ge r$ given by Meacham (1983), and generalizes the construction showing that $f(4) \ge 5$ given by Habib and To (2011). We prove our result via a result on quartet compatibility that may be of independent interest: For every integer $n \ge 4$, there exists an incompatible set $Q$ of $\lfloor\frac{n-2}{2}\rfloor\cdot\lceil\frac{n-2}{2}\rceil + 1$ quartets over $n$ labels such that every proper subset of $Q$ is compatible. We contrast this with a result on the compatibility of triplets: For every $n \ge 3$, if $R$ is an incompatible set of more than $n-1$ triplets over $n$ labels, then some proper subset of $R$ is incompatible. We show this upper bound is tight by exhibiting, for every $n \ge 3$, a set $R$ of $n-1$ triplets over $n$ labels such that $R$ is incompatible, but every proper subset of $R$ is compatible.


💡 Research Summary

The paper tackles a long‑standing conjecture concerning the compatibility of multi‑state characters in phylogenetics. The conjecture posits the existence of a function f(r) such that a set C of r‑state characters is globally compatible if and only if every subset of C of size at most f(r) is compatible. In other words, checking only small subsets would guarantee compatibility of the whole set. Whether such a function exists for every r, and what its value would be, has been an open problem. The earliest known lower bound, due to Meacham (1983), is f(r) ≥ r, and a later construction by Habib and To (2011) showed that f(4) ≥ 5. No stronger general bound was known.

The authors first establish a new lower bound for the related problem of quartet compatibility. For any integer n ≥ 4 they construct a set Q of ⌊(n‑2)/2⌋·⌈(n‑2)/2⌉ + 1 quartets on n taxa that is globally incompatible, yet every proper subset of Q is compatible. The construction uses a “crossing pattern” that alternates two complementary configurations, each occupying roughly half of the taxa. Since ⌊m/2⌋·⌈m/2⌉ = ⌊m²/4⌋ for every integer m, the size of this minimal incompatible quartet set is exactly ⌊(n‑2)²/4⌋ + 1, which grows quadratically in n.

Having established this quartet result, the authors translate it into the language of r‑state characters. By encoding each quartet as an r‑state character on the taxa (the quartet and character bounds coincide when r = n − 2), they obtain a set C of ⌊r/2⌋·⌈r/2⌉ + 1 r‑state characters that mirrors the incompatibility structure of Q. The set C is minimally incompatible: the whole set cannot be displayed on a single phylogenetic tree, but any proper subset can. Therefore they prove the general lower bound

  f(r) ≥ ⌊r/2⌋·⌈r/2⌉ + 1 for every r ≥ 2.

This improves the previous linear bound f(r) ≥ r and shows that f(r), if it exists, must grow at least quadratically in r (roughly r²/4).
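The arithmetic behind the two bounds can be checked with a short sketch. The function names below are illustrative, and the quartet-to-character encoding shown (the two sides of the quartet get states 0 and 1, every other taxon gets its own state) is an assumption: it is one common way to turn a quartet on n taxa into a character with n − 2 states, and the summary does not spell out the encoding the authors actually use.

```python
def character_bound(r: int) -> int:
    """Lower bound on f(r) stated in the abstract: floor(r/2) * ceil(r/2) + 1."""
    return (r // 2) * ((r + 1) // 2) + 1


def quartet_bound(n: int) -> int:
    """Size of the minimal incompatible quartet set on n labels (n >= 4)."""
    m = n - 2
    return (m // 2) * ((m + 1) // 2) + 1


def quartet_to_character(quartet, labels):
    """Assumed encoding: quartet ab|cd becomes a character with n - 2 states.

    a, b share state 0; c, d share state 1; each remaining label gets a
    fresh state of its own.
    """
    (a, b), (c, d) = quartet
    states = {a: 0, b: 0, c: 1, d: 1}
    next_state = 2
    for x in labels:
        if x not in states:
            states[x] = next_state
            next_state += 1
    return states


labels = list("abcdef")                      # n = 6 taxa
chi = quartet_to_character((("a", "b"), ("c", "d")), labels)
num_states = len(set(chi.values()))          # n - 2 = 4 states
```

Under this encoding a quartet on n = 6 taxa yields a 4‑state character, and `quartet_bound(6)` and `character_bound(4)` both evaluate to 5, which agrees with the Habib and To bound f(4) ≥ 5 quoted above.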

The paper also investigates the opposite side of the spectrum by studying triplet compatibility. For n ≥ 3, the authors prove an upper bound: if an incompatible set R of triplets on n taxa contains more than n − 1 elements, then some proper subset of R must already be incompatible. They demonstrate that this bound is tight by explicitly constructing, for each n, an incompatible set of exactly n − 1 triplets in which every proper subset is compatible. The proof relies on analyzing the structure of “caterpillar” trees and showing that each additional triplet forces a new incompatibility constraint.
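The tightness claim can be explored on small cases with a minimal sketch of the classical BUILD procedure of Aho et al. for rooted-triplet compatibility. This is a standard algorithm and is not taken from the paper. The family produced by `path_family` is a hypothetical illustration chosen for this sketch, not necessarily the authors' construction. A triplet `(a, b, c)` stands for ab|c.

```python
from collections import defaultdict


def is_compatible(labels, triplets):
    """Return True if some rooted tree on `labels` displays every triplet."""
    labels = set(labels)
    # Keep only triplets whose three labels all lie in the current label set.
    inside = [t for t in triplets if set(t) <= labels]
    if not inside:
        return True  # nothing left to satisfy

    # Union-find over the labels; each triplet ab|c joins a and b.
    parent = {x: x for x in labels}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, _ in inside:
        parent[find(a)] = find(b)

    components = defaultdict(set)
    for x in labels:
        components[find(x)].add(x)

    if len(components) == 1:
        return False  # connected graph with a pending triplet: no root split
    # Each component becomes a child subtree and is checked recursively.
    return all(is_compatible(comp, inside) for comp in components.values())


def path_family(n):
    """Illustrative family of n - 1 triplets on labels 0..n-1 (n >= 3)."""
    chain = [(i, i + 1, i + 2) for i in range(n - 2)]
    return chain + [(n - 2, n - 1, 0)]
```

For n = 4 the family is 01|2, 12|3, 23|0. The pairs joined by these triplets form a path through all four labels, so the top-level graph is connected and the set is rejected. Dropping any one triplet breaks the path and lets the recursion succeed.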

Methodologically, the authors employ combinatorial tools from matroid theory and graph coloring. They define a “crossing matroid” whose independent sets correspond to compatible subsets of quartets; the rank of this matroid yields the size of the maximal compatible subset, and the constructed Q attains the matroid’s rank plus one, establishing minimal incompatibility. The translation from quartets to multi‑state characters is achieved via a carefully designed homomorphism that preserves the independence structure while expanding the alphabet size from binary (quartet) to r‑ary (character) states.

In summary, the paper makes three major contributions: (1) a substantially stronger lower bound for f(r), namely ⌊r/2⌋·⌈r/2⌉ + 1; (2) a novel construction of minimal incompatible quartet sets of size Θ(n²); and (3) a tight upper bound for triplet compatibility: a minimal incompatible set of triplets on n taxa contains at most n − 1 triplets, and sets of exactly that size exist for every n ≥ 3. These results deepen our theoretical understanding of character compatibility, provide benchmarks for the worst‑case behavior of compatibility‑testing algorithms, and open avenues for designing more efficient phylogenetic reconstruction methods that exploit the identified combinatorial limits.