EM Algorithm for Estimation of the Offspring Distribution in Multitype Branching Processes with Terminal Types


Multitype branching processes (MTBPs) model branching structures in which the nodes of the resulting tree are objects of different types. In biology, one application of such models is the study of cell proliferation. A sampling scheme that appears frequently is observing the cell count in several independent colonies at discrete time points (sometimes only one). Thus the whole tree is not observed, only the “generation” at a given moment in time, which consists of the number of cells of every type. This requires an EM-type algorithm to obtain a maximum likelihood (ML) estimate of the parameters of the branching process. A computational approach for obtaining such an estimate of the offspring distribution is presented for the class of Markov branching processes with terminal types.


💡 Research Summary

The paper addresses the problem of estimating the offspring distribution in multitype branching processes (MTBPs) that include terminal types, using only generation‑size observations rather than full tree data. Such a situation is common in cell‑proliferation experiments where researchers can count cells of each type at one or several discrete time points, but the underlying genealogical tree remains hidden. The authors formulate a maximum‑likelihood (ML) estimation problem under this incomplete‑data setting and propose an Expectation‑Maximization (EM) algorithm tailored to MTBPs with terminal types.

First, the authors define the MTBP model in both discrete and continuous time. Each particle of type (T_v) produces a random multiset of offspring according to a (d)-variate distribution (p_v(\cdot)). Terminal types (T_{T_j}) are defined as absorbing: once a particle becomes terminal it never reproduces, yet it continues to be observed in subsequent generations. This captures the biological reality that dead cells remain visible under a microscope.
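The model just described can be illustrated with a minimal discrete-time simulation. This is only a sketch under stated assumptions, not the authors' implementation: the two types, the rule probabilities in `OFFSPRING`, and the names `TERMINAL`, `step`, and `simulate` are all hypothetical and chosen for illustration.

```python
import random

# Hypothetical 2-type process (illustrative values, not from the paper):
# type 0 is a living cell, type 1 is a terminal (dead) cell.
# Each rule maps an offspring tuple (n_type0, n_type1) to its probability.
OFFSPRING = {
    0: {(2, 0): 0.6, (0, 1): 0.3, (1, 0): 0.1},  # divide, die, or persist
}
TERMINAL = {1}  # terminal particles never reproduce but stay in the count


def step(counts, rng):
    """Advance one generation; counts[v] is the number of type-v particles."""
    new_counts = [0] * len(counts)
    for v, n in enumerate(counts):
        if v in TERMINAL:
            new_counts[v] += n  # terminal particles carry over unchanged
            continue
        rules = list(OFFSPRING[v].keys())
        weights = list(OFFSPRING[v].values())
        # Draw one production rule per particle of type v.
        for offspring in rng.choices(rules, weights=weights, k=n):
            for w, m in enumerate(offspring):
                new_counts[w] += m
    return new_counts


def simulate(initial, generations, seed=0):
    """Return the vector of type counts after the given number of generations."""
    rng = random.Random(seed)
    counts = list(initial)
    for _ in range(generations):
        counts = step(counts, rng)
    return counts
```

Calling `simulate([1, 0], 5)` returns only the final vector of type counts, which mirrors the sampling scheme in which the tree that produced those counts stays hidden.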

The observable data consist of the vector of type counts at a fixed time (t) (or after a fixed number of generations). The hidden data are the full branching trees that generated those counts. The complete‑data log‑likelihood is a simple sum of log‑probabilities of each production rule weighted by its occurrence count in the tree. The incomplete‑data likelihood is obtained by marginalising over all possible trees, which is computationally intractable.
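As a rough illustration of the complete-data log-likelihood described above, the sketch below sums the occurrence count of each production rule times the log of its probability. The dictionary layout keyed by `(parent_type, offspring_tuple)` and the function name `complete_data_log_likelihood` are assumptions made for this example, and the numbers are invented.

```python
import math


def complete_data_log_likelihood(rule_counts, probs):
    """Sum of count * log(probability) over all production rules.

    Both arguments are dicts keyed by (parent_type, offspring_tuple);
    this layout is a hypothetical choice for the sketch.
    """
    total = 0.0
    for rule, count in rule_counts.items():
        if count == 0:
            continue  # unused rules contribute nothing
        p = probs.get(rule, 0.0)
        if p <= 0.0:
            return float("-inf")  # an observed rule with zero probability
        total += count * math.log(p)
    return total


# Invented example: in a fully observed tree, type 0 divided three times
# and died (became terminal type 1) once.
counts = {(0, (2, 0)): 3, (0, (0, 1)): 1}
probs = {(0, (2, 0)): 0.6, (0, (0, 1)): 0.3, (0, (1, 0)): 0.1}
log_lik = complete_data_log_likelihood(counts, probs)
```

With generation-size data the rule counts are not observed, so this quantity cannot be evaluated directly; it is the sum over all compatible trees that makes the incomplete-data likelihood expensive.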

The EM framework circumvents this difficulty. In the E‑step, given current parameter estimates (p^{(i)}), the algorithm computes the expected number of times each production rule (T_v\to A) occurs, denoted (E_{p^{(i)}}

