Learning Bayesian Network Parameters with Prior Knowledge about Context-Specific Qualitative Influences
We present a method for learning the parameters of a Bayesian network with prior knowledge about the signs of influences between variables. Our method accommodates not just the standard signs, but provides for context-specific signs as well. We show how the various signs translate into order constraints on the network parameters and how isotonic regression can be used to compute order-constrained estimates from the available data. Our experimental results show that taking prior knowledge about the signs of influences into account leads to an improved fit of the true distribution, especially when only a small sample of data is available. Moreover, the computed estimates are guaranteed to be consistent with the specified signs, thereby resulting in a network that is more likely to be accepted by experts in its domain of application.
💡 Research Summary
This paper addresses the problem of learning the conditional probability tables (CPTs) of a Bayesian network (BN) when only a limited amount of data is available but domain experts can supply qualitative knowledge about the direction of influence between variables. The authors extend the usual notion of sign constraints (positive, negative, or neutral influence) to include context‑specific signs, i.e., signs that hold only for particular configurations of a node’s parents. They show how each qualitative statement can be translated into an order constraint on the corresponding CPT entries: a positive influence of a parent X on a child Y requires that for any fixed configuration of the other parents, P(Y=1|X=1,…) ≥ P(Y=1|X=0,…); a negative influence reverses the inequality, and a zero influence forces equality. When a sign is context‑specific, the inequality is imposed only for the parent configurations that satisfy the context condition.
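As a concrete illustration of how such a sign statement becomes a checkable order constraint, consider the following sketch. The function name, the dictionary encoding of the CPT, and the `context` predicate are all hypothetical conveniences, not notation from the paper:

```python
from itertools import product

def positive_influence_holds(cpt, other_parent_cards, context=None):
    """Check a (possibly context-specific) positive influence of a binary
    parent X on a binary child Y.

    cpt maps (x, z) -> P(Y=1 | X=x, Z=z), where z is a tuple of values of
    Y's other parents and other_parent_cards lists their cardinalities.
    If `context` is given, the inequality is only required for the
    z-configurations where context(z) is True."""
    for z in product(*[range(c) for c in other_parent_cards]):
        if context is not None and not context(z):
            continue  # the sign is only stated for this context
        if cpt[(1, z)] < cpt[(0, z)]:
            return False  # violates P(Y=1|X=1,z) >= P(Y=1|X=0,z)
    return True
```

A negative influence would reverse the comparison and a zero influence would demand equality; restricting the loop via `context` captures exactly the context-specific case described above.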
All such inequalities together define a partial order over the set of CPT parameters. The central methodological contribution is to estimate the CPTs under these order constraints by means of isotonic regression, which finds the parameter values that are closest (in a weighted least‑squares sense) to the unconstrained maximum‑likelihood estimates while respecting the prescribed order. The authors construct a directed acyclic graph whose nodes represent individual CPT entries and whose edges encode the “≥” or “≤” relationships derived from the qualitative knowledge. To this partial order they apply a generalization of the Pool Adjacent Violators Algorithm (PAVA) – a classic isotonic‑regression procedure that runs in linear time on totally ordered sequences – which repeatedly pools violating entries into blocks and assigns each block the weighted average of its members, so that the resulting values satisfy the constraints.
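For the special case in which the constraints form a single chain, the pooling idea behind PAVA can be sketched in a few lines. This is a minimal illustration of the technique, not the paper's implementation, and it handles only totally ordered (non-decreasing) sequences:

```python
def pava(y, w=None):
    """Weighted least-squares isotonic fit to y under the chain constraint
    fit[0] <= fit[1] <= ... <= fit[-1] (Pool Adjacent Violators)."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block is [level, total weight, number of entries]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # merge backwards while the last two blocks violate monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            lvl2, w2, n2 = blocks.pop()
            lvl1, w1, n1 = blocks.pop()
            merged_w = w1 + w2
            merged_lvl = (lvl1 * w1 + lvl2 * w2) / merged_w  # weighted mean
            blocks.append([merged_lvl, merged_w, n1 + n2])
    fit = []
    for lvl, _, n in blocks:
        fit.extend([lvl] * n)
    return fit
```

For example, `pava([0.3, 0.1, 0.2, 0.4])` pools the violating pair 0.3, 0.1 (and the subsequent 0.2) into a common block, yielding a non-decreasing fit.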
The approach is evaluated in two experimental settings. First, synthetic BNs of varying size and topology are generated, and random sign constraints (including context‑specific ones) are imposed. Samples of sizes 5, 10, 20, 50, and 100 are drawn, and the isotonic estimates are compared with ordinary maximum‑likelihood estimates (MLE) that ignore the constraints. Performance is measured by Kullback‑Leibler (KL) divergence from the true distribution, log‑likelihood on a held‑out test set, and the proportion of sign violations. Results show that when the sample size is small (≤20), the constrained estimates dramatically reduce KL divergence (often by more than 30 %) and improve test log‑likelihood, while guaranteeing zero sign violations. As the sample size grows, the constrained estimates converge to the unconstrained MLE, confirming that the method does not introduce bias when sufficient data are present.
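The KL divergence used as the evaluation metric is the standard one for discrete distributions; a minimal sketch, with the usual convention that terms with zero probability under p contribute nothing:

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions given as aligned lists of
    probabilities; terms with p_i = 0 contribute zero by convention."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

A smaller value indicates a closer fit of the estimated distribution q to the true distribution p, which is how the constrained and unconstrained estimates are compared above.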
Second, a real‑world medical dataset (e.g., a cardiovascular risk assessment database) is used. Clinicians provide a set of qualitative influences, many of which are context‑specific (e.g., “smoking positively influences disease risk unless the patient is on medication”). Incorporating these constraints via isotonic regression yields CPTs that both fit the data better (log‑likelihood improvements of 5–12 %) and align with expert expectations, leading to higher acceptance of the resulting BN by the clinicians.
The paper’s contributions can be summarized as follows:
- A formal mapping from qualitative, possibly context‑specific, influence statements to a system of linear order constraints on CPT parameters.
- An efficient isotonic‑regression procedure for computing the order‑constrained estimates, building on the Pool Adjacent Violators Algorithm, which runs in time linear in the number of CPT entries when the constraints form a total order.
- Empirical evidence that the method improves parameter estimation accuracy, especially in low‑sample regimes, and produces models that are consistent with expert knowledge.
Limitations are acknowledged. Incorrect or overly aggressive sign specifications can bias the estimates, and the number of constraints grows combinatorially with the number of parents, potentially increasing computational overhead. The authors suggest future work on probabilistic treatment of uncertain signs (e.g., placing a prior over the sign itself) and on integrating sign constraints into structure learning so that both graph topology and parameters are guided by qualitative knowledge.
In conclusion, the study demonstrates that incorporating domain experts’ qualitative, context‑specific insights into Bayesian network learning via isotonic regression is both theoretically sound and practically beneficial, offering a robust solution for situations where data are scarce but expert knowledge is abundant.