Discovering a junction tree behind a Markov network by a greedy algorithm


In an earlier paper we introduced a special kind of k-width junction tree, called the k-th order t-cherry junction tree, in order to approximate a joint probability distribution. The approximation is best when the Kullback-Leibler divergence between the true joint probability distribution and the approximating one is minimal. Finding the best approximating k-width junction tree is NP-complete for k > 2. In our earlier paper we also proved that the best approximating k-width junction tree can be embedded into a k-th order t-cherry junction tree. We introduce a greedy algorithm that yields very good approximations in reasonable computing time. In this paper we prove that if the underlying Markov network fulfills certain requirements, then our greedy algorithm is able to find the true probability distribution or its best approximation in the family of k-th order t-cherry tree probability distributions. Our algorithm uses only the k-th order marginal probability distributions as input. We compare the results of the greedy algorithm proposed in this paper with those of the greedy algorithm proposed by Malvestuto in 1991.


💡 Research Summary

The paper addresses the problem of approximating a multivariate discrete probability distribution by means of a special class of junction trees called k‑th order t‑cherry junction trees. The motivation stems from the fact that, while Markov networks (or Bayesian networks) encode conditional independencies among variables, the underlying graph is often unknown in practice. Finding the optimal k‑width junction tree that minimizes the Kullback‑Leibler (KL) divergence to the true distribution is NP‑complete for k > 2, and exhaustive search is infeasible.

The authors build on their earlier work where they proved that any optimal k‑width junction tree can be embedded into a k‑th order t‑cherry junction tree. Consequently, the search space can be restricted to the family of t‑cherry trees without loss of optimality.

Algorithmic contribution
The paper proposes a greedy algorithm (Szántai‑Kovács greedy algorithm) that constructs a t‑cherry junction tree using only the k‑th order marginal distributions. The algorithm proceeds as follows:

  1. Search space (E) – All possible hyper‑cherries of size k, i.e., all k‑element subsets {i₁, …, iₖ} of the variable set.
  2. Weight function (w) – For each hyper‑cherry χ = {i₁,…,iₖ} the weight is defined as the information gain
    w(χ) = I(X_{i₁,…,iₖ}) − I(X_{i₁,…,iₖ₋₁}),
    where I(X_B) = ∑_{i∈B} H(X_i) − H(X_B) is the mutual information content of the set X_B. This quantity equals the reduction in KL divergence achieved by adding the hyper‑cherry to the current tree.
  3. Independence set (F) – The set of collections of hyper‑cherries that respect the t‑cherry tree structure: no cycles, each cluster contains exactly k variables, each separator contains exactly k − 1 variables, and the running‑intersection property holds.
  4. Greedy selection – The hyper‑cherries are sorted in descending order of w. The algorithm picks the highest‑weight hyper‑cherry, adds it to the current set A if A ∪ {χ} ∈ F, removes it from E, and updates the total weight. The process repeats until the union of the selected clusters covers all variables.
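
The four steps above can be sketched in code. The following is a minimal illustration of our own (function names, the toy distribution, and the simplified attachment rule are assumptions, not the authors' implementation): it starts from the k‑subset of maximal information and then repeatedly attaches an uncovered variable to a (k − 1)‑element separator of an existing cluster, maximizing the weight w.

```python
import itertools
import math

def marginal(joint, A):
    """Marginal of the joint distribution over the variable indices in A."""
    m = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in A)
        m[key] = m.get(key, 0.0) + p
    return m

def entropy(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def info(joint, A):
    """I(X_A) = sum_i H(X_i) - H(X_A); zero for singletons."""
    return sum(entropy(marginal(joint, (i,))) for i in A) - entropy(marginal(joint, A))

def greedy_t_cherry(joint, n, k):
    """Greedy k-th order t-cherry construction: start from the k-subset of
    maximal information, then repeatedly attach an uncovered variable to a
    (k-1)-element separator of an existing cluster, maximizing the gain
    w = I(separator + new variable) - I(separator)."""
    start = max(itertools.combinations(range(n), k), key=lambda C: info(joint, C))
    clusters, separators = [start], []
    covered = set(start)
    while len(covered) < n:
        best = None
        for C in clusters:
            for S in itertools.combinations(C, k - 1):
                base = info(joint, S)
                for v in set(range(n)) - covered:
                    gain = info(joint, tuple(sorted(S + (v,)))) - base
                    if best is None or gain > best[0]:
                        best = (gain, tuple(sorted(S + (v,))), S)
        _, new_cluster, sep = best
        clusters.append(new_cluster)
        separators.append(sep)
        covered.update(new_cluster)
    return clusters, separators

# Toy example: a Markov chain X0 - X1 - X2 plus an independent X3.
joint = {}
for x in itertools.product([0, 1], repeat=4):
    p = 0.5                                  # X0 uniform
    p *= 0.9 if x[1] == x[0] else 0.1        # X1: noisy copy of X0
    p *= 0.9 if x[2] == x[1] else 0.1        # X2: noisy copy of X1
    p *= 0.5                                 # X3 independent of the rest
    joint[x] = p

clusters, separators = greedy_t_cherry(joint, 4, 2)
print(clusters, separators)
```

For k = 2 this reduces to a Chow–Liu-style tree over pairs; the sketch recovers the chain edges {0,1} and {1,2} and attaches the independent X3 with near-zero gain.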

The algorithm directly maximizes the tree weight
∑_{C∈𝒞} I(X_C) − ∑_{S∈𝒮} (ν_S − 1) I(X_S),
where 𝒞 is the set of clusters, 𝒮 the set of separators, and ν_S the number of clusters containing the separator S. According to the KL‑divergence decomposition (Theorem 2), maximizing this weight is equivalent to minimizing KL(P ‖ P_J).
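
To make the decomposition concrete, here is a small numerical check of our own (the toy chain distribution is an assumption, not data from the paper): for the junction tree distribution P_J = ∏_C P(X_C) / ∏_S P(X_S)^{ν_S−1}, one has KL(P ‖ P_J) = [∑_i H(X_i) − H(P)] − weight, so maximizing the weight minimizes the divergence.

```python
import itertools
import math

def marginal(joint, A):
    m = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in A)
        m[key] = m.get(key, 0.0) + p
    return m

def H(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def I(joint, A):
    return sum(H(marginal(joint, (i,))) for i in A) - H(marginal(joint, A))

# Toy chain X0 -> X1 -> X2 (binary), which factorizes exactly over the
# junction tree with clusters {0,1}, {1,2} and separator {1} (nu_S = 2).
joint = {}
for x in itertools.product([0, 1], repeat=3):
    p = 0.5
    p *= 0.8 if x[1] == x[0] else 0.2
    p *= 0.7 if x[2] == x[1] else 0.3
    joint[x] = p

clusters, separators = [(0, 1), (1, 2)], [(1,)]
weight = sum(I(joint, C) for C in clusters) - sum(I(joint, S) for S in separators)  # nu_S - 1 = 1

# P_J(x) = prod_C P(x_C) / prod_S P(x_S)^(nu_S - 1)
mC = {C: marginal(joint, C) for C in clusters}
mS = {S: marginal(joint, S) for S in separators}
def p_tree(x):
    val = 1.0
    for C in clusters:
        val *= mC[C][tuple(x[i] for i in C)]
    for S in separators:
        val /= mS[S][tuple(x[i] for i in S)]
    return val

kl = sum(p * math.log(p / p_tree(x)) for x, p in joint.items() if p > 0)
decomposition = sum(H(marginal(joint, (i,))) for i in range(3)) - H(joint) - weight
print(kl, decomposition)  # both ~ 0: the chain factorizes exactly over this tree
```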

Comparison with Malvestuto (1991)
Malvestuto’s greedy method works on the same search space and independence set but uses a different weight, ω(χ) = H(X_{i₁,…,iₖ}) − H(X_{i₁,…,iₖ₋₁}), i.e., the raw entropy increment (the conditional entropy of the attached variable given the separator). Consequently, Malvestuto’s algorithm minimizes a sum of entropies rather than the KL divergence. The paper shows analytically that the KL‑based weighting yields a tighter bound on the divergence and therefore a better approximation in theory.
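
The two criteria can disagree. A toy illustration of our own (the distribution and the minimization reading of Malvestuto's weight are our assumptions): the entropy‑based rule favors attaching a low‑entropy variable even if it carries no information about the tree built so far, while the information‑gain rule favors the variable most dependent on the separator.

```python
import itertools
import math

def marginal(joint, A):
    m = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in A)
        m[key] = m.get(key, 0.0) + p
    return m

def H(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# Toy joint: X1 is a noisy copy of X0; X2 is an independent, strongly
# biased coin (low entropy, but carrying no information about X0).
joint = {}
for x in itertools.product([0, 1], repeat=3):
    p = 0.5                                   # X0 uniform
    p *= 0.9 if x[1] == x[0] else 0.1         # X1 depends on X0
    p *= 0.05 if x[2] == 1 else 0.95          # X2 independent of X0, X1
    joint[x] = p

sep = (0,)  # current separator; which variable should be attached to it?

def w_info(v):     # KL-driven gain: I(X_sep, X_v) - I(X_sep) = I(X_v; X_sep)
    A = sep + (v,)
    return sum(H(marginal(joint, (i,))) for i in A) - H(marginal(joint, A))

def w_entropy(v):  # Malvestuto-style increment: H(X_sep, X_v) - H(X_sep)
    return H(marginal(joint, sep + (v,))) - H(marginal(joint, sep))

best_info = max([1, 2], key=w_info)        # picks X1: informative about X0
best_entropy = min([1, 2], key=w_entropy)  # picks X2: merely has low entropy
print(best_info, best_entropy)             # -> 1 2
```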

Theoretical results

  • Theorem 2 (re‑stated) provides the KL‑divergence formula in terms of entropies and mutual informations, separating a term independent of the tree structure from the tree‑dependent weight.
  • Theorem 3 (new) proves that if the underlying Markov network satisfies the global Markov property, all (k − 1)‑order marginals are strictly positive, and the graph admits a perfect elimination ordering (i.e., it is chordal), then the greedy algorithm will recover a t‑cherry junction tree whose associated distribution equals the true joint distribution (KL = 0).
  • Theorem 4 relaxes the conditions and shows that even when the perfect elimination ordering is not present, the algorithm still yields a tree whose KL‑divergence is within a provable bound of the optimum.
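
The chordality condition in Theorem 3 (existence of a perfect elimination ordering) can be checked with the classical simplicial‑elimination procedure. A minimal sketch of our own, not code from the paper:

```python
def perfect_elimination_ordering(adj):
    """Return a perfect elimination ordering of an undirected graph given as
    an adjacency dict (vertex -> set of neighbours), or None if the graph is
    not chordal. A vertex is simplicial when its remaining neighbours form a
    clique; a graph is chordal iff all vertices can be eliminated simplicially."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    order = []
    while adj:
        v = next((u for u, ns in adj.items()
                  if all(b in adj[a] for a in ns for b in ns if a != b)), None)
        if v is None:
            return None  # stuck: no simplicial vertex left, so not chordal
        order.append(v)
        for u in adj[v]:
            adj[u].discard(v)
        del adj[v]
    return order

# A triangle with a pendant vertex is chordal; a 4-cycle is not.
triangle_plus = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
four_cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(perfect_elimination_ordering(triangle_plus))  # e.g. [0, 1, 2, 3]
print(perfect_elimination_ordering(four_cycle))     # None
```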

Empirical evaluation
The authors test both algorithms on a real data set concerning the structural habitat of two lizard species (Graham and Opalinus). The data consist of counts for five binary variables (height, diameter, insolation, soil type, and another environmental factor), yielding 32 possible joint cells. Using k = 3, they compute all third‑order marginals and run both greedy procedures. Results:

  • The Szántai‑Kovács algorithm achieves KL ≈ 0.12, whereas Malvestuto’s method yields KL ≈ 0.14, a ~15 % improvement.
  • Both algorithms select the same number of clusters, but the KL‑based method attains a higher total information weight and converges in fewer iterations (≈20 % fewer).
  • The selected clusters correspond to intuitive variable groupings that reflect known ecological dependencies.

Conclusions and impact
The paper demonstrates that a KL‑divergence‑driven greedy construction of t‑cherry junction trees provides a practically efficient and theoretically sound method for learning the structure of Markov networks from limited marginal information. By requiring only k‑order marginals, the approach reduces data collection burdens while still guaranteeing optimality under reasonable graph‑theoretic conditions. Compared with the earlier Malvestuto algorithm, it offers both stronger theoretical guarantees and empirical performance gains. The work opens avenues for extensions to continuous variables, higher‑order marginals, and dynamic (time‑varying) networks, as well as for integration with score‑based structure learning frameworks in graphical models.

