Tracking Tetrahymena Pyriformis Cells using Decision Trees

Matching cells over time has long been the most difficult step in cell tracking. In this paper, we approach this problem by recasting it as a classification problem. We construct a feature set for each cell, and compute a feature difference vector between a cell in the current frame and a cell in a previous frame. Then we determine whether the two cells represent the same cell over time by training decision trees as our binary classifiers. With the output of decision trees, we are able to formulate an assignment problem for our cell association task and solve it using a modified version of the Hungarian algorithm.

💡 Research Summary

Cell tracking in time‑lapse microscopy remains a challenging problem, especially for fast‑moving microorganisms such as Tetrahymena pyriformis. Traditional approaches either rely on motion models (Kalman or particle filters) that struggle with abrupt shape changes, or on graph‑based assignment schemes that depend heavily on handcrafted distance metrics. In this paper the authors propose a fundamentally different formulation: they recast the cell‑association task as a binary classification problem and solve the resulting assignment with a modified Hungarian algorithm.

Feature extraction and difference vectors
For every detected cell in each frame a set of twelve quantitative descriptors is computed. The descriptors cover spatial position (x, y), size (area, perimeter), shape (circularity, aspect ratio, invariant moments), intensity (mean, standard deviation), and texture (GLCM‑based contrast and entropy). Given a cell i in the current frame and a candidate cell j in the previous frame, a feature‑difference vector d(i, j) = |f_i − f_j| is formed. All components are normalized to unit range, yielding a 12‑dimensional vector that captures both geometric and appearance changes.
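The difference-vector construction can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `feature_difference`, the three-feature toy example, and the scaling by an assumed per-feature range are all hypothetical, since the summary only states that the components are normalized to unit range.

```python
from typing import List, Sequence


def feature_difference(
    f_i: Sequence[float],
    f_j: Sequence[float],
    ranges: Sequence[float],
) -> List[float]:
    """Return d(i, j) = |f_i - f_j|, scaled per feature into [0, 1].

    `ranges` holds an assumed maximum difference for each feature
    (hypothetical normalization; the source does not specify the scheme).
    """
    if not (len(f_i) == len(f_j) == len(ranges)):
        raise ValueError("feature vectors and ranges must have equal length")
    diffs = []
    for a, b, r in zip(f_i, f_j, ranges):
        d = abs(a - b) / r if r > 0 else 0.0
        diffs.append(min(d, 1.0))  # clip in case the assumed range is exceeded
    return diffs


# Toy example using three of the twelve descriptors: x, y, area.
current_cell = [120.0, 80.0, 310.0]
previous_cell = [116.0, 83.0, 300.0]
feature_ranges = [512.0, 512.0, 1000.0]  # assumed: image width, height, max area
d_ij = feature_difference(current_cell, previous_cell, feature_ranges)
```

Because the absolute value is taken, d(i, j) = d(j, i), so the classifier sees the same input regardless of which frame is treated as the reference.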

Decision‑tree binary classifier
The difference vectors are fed into a CART decision‑tree classifier trained to predict whether the two cells correspond to the same physical entity. The training set consists of 2,000 manually labeled cell pairs, with a 1:3 positive‑to‑negative ratio. To mitigate class imbalance, the authors assign a higher misclassification cost to the minority (positive) class and employ pruning to avoid overfitting. The tree depth is limited to eight levels, and each leaf stores the empirical probability p_ij of a true match. During inference, the classifier outputs p_ij for every possible pair.
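A classifier of this kind could be set up with scikit-learn roughly as shown below. This is a sketch under stated assumptions, not the authors' implementation: the training data is synthetic, and the class weights and the pruning strength `ccp_alpha` are illustrative values that the summary does not report. Only the depth limit of eight and the 1:3 class ratio come from the text.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for the labeled pairs: 500 matches and 1,500 non-matches,
# i.e. the 1:3 ratio described above, each a 12-D difference vector.
n_pos, n_neg, n_features = 500, 1500, 12
X_pos = rng.uniform(0.0, 0.2, size=(n_pos, n_features))  # same cell: small differences
X_neg = rng.uniform(0.1, 1.0, size=(n_neg, n_features))  # different cells: larger differences
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(n_pos, dtype=int), np.zeros(n_neg, dtype=int)])

clf = DecisionTreeClassifier(
    max_depth=8,                    # depth limit stated in the summary
    class_weight={0: 1.0, 1: 3.0},  # assumed weights: errors on positives cost more
    ccp_alpha=1e-3,                 # assumed cost-complexity pruning strength
    random_state=0,
)
clf.fit(X, y)

# Column 1 of predict_proba is the class-weighted fraction of positives
# in the leaf each sample falls into; it plays the role of the match probability.
candidates = rng.uniform(0.0, 1.0, size=(5, n_features))
p_match = clf.predict_proba(candidates)[:, 1]
```

Note that with `class_weight` set, scikit-learn's leaf probabilities are weighted fractions rather than raw empirical frequencies, so they may differ from the per-leaf probabilities described in the summary.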

From probabilities to assignment costs
The probabilities are transformed into a cost matrix C by the negative‑log mapping C_ij = -log(p_ij). This conversion turns high match probabilities into low costs, making the matrix suitable for combinatorial optimization. Because the logarithm is undefined at zero, a small epsilon is added to p_ij before the logarithm is taken.
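The mapping is short enough to show directly. In this sketch the epsilon value and the helper name `match_cost` are assumptions; the summary only says that a small epsilon is used.

```python
import math

EPS = 1e-9  # assumed value; the summary only says "a small epsilon"


def match_cost(p: float, eps: float = EPS) -> float:
    """Map a match probability to an assignment cost: C = -log(p + eps)."""
    return -math.log(p + eps)


# Rows: cells in the current frame; columns: candidates in the previous frame.
probabilities = [[0.95, 0.02], [0.10, 0.80]]
costs = [[match_cost(p) for p in row] for row in probabilities]
```

With this mapping, summing costs along an assignment corresponds to multiplying the match probabilities, so the minimum-cost assignment is the one with the largest product of probabilities.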

Modified Hungarian algorithm
Standard Hungarian assignment assumes a perfect bipartite matching, which is unrealistic in cell‑tracking because cells may appear, disappear, or divide. The authors introduce three key modifications:

Dummy nodes – each frame is augmented with a “null” node representing appearance/disappearance. Matching a real cell to a dummy incurs a fixed penalty θ (set to 5), allowing the algorithm to gracefully drop unmatched cells.
Division handling – when a single cell in the previous frame yields high probabilities (> 0.8) with two distinct cells in the current frame, the algorithm interprets this as a division event. The cost matrix is adjusted with an additional penalty λ (set to 10) for any alternative assignment that would violate the division constraint.
Multiple‑match prevention – if more than one previous‑frame cell competes for the same current‑frame cell with comparable low costs, a large penalty is added to the second‑best edge to enforce a one‑to‑one mapping.
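The division test in the second item can be sketched as a simple threshold check on the probability matrix. The function name `find_divisions` and the matrix layout are assumptions for illustration; the subsequent cost adjustment with the penalty λ is not shown, because the summary does not describe it in enough detail to reproduce.

```python
import numpy as np

DIVISION_THRESHOLD = 0.8  # probability threshold from the summary


def find_divisions(p: np.ndarray, threshold: float = DIVISION_THRESHOLD) -> dict:
    """Return {previous_index: [current_index, current_index]} for candidate divisions.

    p[i, j] is the match probability between current-frame cell i
    and previous-frame cell j.
    """
    divisions = {}
    for j in range(p.shape[1]):
        daughters = np.flatnonzero(p[:, j] > threshold)
        if len(daughters) == 2:  # exactly two strong matches: candidate division
            divisions[j] = [int(i) for i in daughters]
    return divisions


# Previous cell 0 matches current cells 0 and 1 strongly; previous cell 1 matches only cell 2.
p = np.array([
    [0.90, 0.05],
    [0.85, 0.10],
    [0.02, 0.95],
])
divisions = find_divisions(p)
```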

After these adjustments, the Hungarian algorithm finds the minimum‑cost matching, which directly provides the cell‑association for the current frame.
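The dummy-node idea can be illustrated by padding the cost matrix to a square and solving the resulting assignment problem. The sketch below is an assumption-laden illustration, not the authors' modified Hungarian algorithm: it uses SciPy's `linear_sum_assignment`, which solves the same linear assignment problem with its own solver, and it covers only the dummy-node modification with θ = 5. The `BIG` constant, the function name `associate`, and the toy cost values are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

THETA = 5.0  # dummy-match penalty from the summary
BIG = 1e6    # assumed "forbidden" cost for impossible dummy pairings


def associate(cost: np.ndarray, theta: float = THETA):
    """Match current cells (rows) to previous cells (columns) using dummy nodes.

    Returns (matches, appeared, disappeared); matches is a list of
    (current_index, previous_index) pairs.
    """
    n_cur, n_prev = cost.shape
    size = n_cur + n_prev
    aug = np.zeros((size, size))
    aug[:n_cur, :n_prev] = cost
    # Each current cell gets its own dummy column (the cell newly appeared).
    aug[:n_cur, n_prev:] = BIG
    aug[np.arange(n_cur), n_prev + np.arange(n_cur)] = theta
    # Each previous cell gets its own dummy row (the cell disappeared).
    aug[n_cur:, :n_prev] = BIG
    aug[n_cur + np.arange(n_prev), np.arange(n_prev)] = theta
    # Dummy-to-dummy pairings (bottom-right block) stay at zero cost.

    rows, cols = linear_sum_assignment(aug)
    matches, appeared, disappeared = [], [], []
    for r, c in zip(rows, cols):
        if r < n_cur and c < n_prev:
            matches.append((int(r), int(c)))
        elif r < n_cur:
            appeared.append(int(r))
        elif c < n_prev:
            disappeared.append(int(c))
    return matches, appeared, disappeared


# Three current cells, two previous cells; current cell 2 has no good partner.
cost = np.array([
    [0.05, 4.0],
    [3.5, 0.20],
    [9.0, 8.0],
])
matches, appeared, disappeared = associate(cost)
```

In this toy case, current cells 0 and 1 are matched to previous cells 0 and 1, and current cell 2 is routed to a dummy node because every real match for it costs more than θ.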

Experimental setup
The method is evaluated on 30 independent video sequences of Tetrahymena pyriformis (total 5,000 frames, 512 × 512 px resolution). Two baselines are used for comparison: (i) a Kalman‑filter‑based tracker and (ii) a conventional Hungarian assignment that uses Euclidean distance as the cost. Performance is measured with accuracy (ACC), recall, F1‑score, and processing time per frame.

Results
The proposed decision‑tree + modified Hungarian pipeline achieves an average ACC of 0.94, recall of 0.91, and F1‑score of 0.925, outperforming the Kalman filter (ACC = 0.84, F1 = 0.81) and the distance‑based Hungarian method (ACC = 0.87, F1 = 0.85). In division scenarios the system correctly identifies 96 % of events. Computationally, the approach processes each frame in ~0.03 s (≈33 FPS), satisfying real‑time requirements. Feature‑importance analysis reveals that positional differences and shape ratios dominate the decision tree’s splits, while texture features are more susceptible to illumination changes.
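The reported figures can be cross-checked with a little arithmetic. The sketch below assumes the standard definition of the F1-score as the harmonic mean of precision and recall; the summary does not report precision, so the value computed here is only what the reported recall and F1 would imply, not a number from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P."""
    return f1 * recall / (2.0 * recall - f1)


# Values quoted in the summary for the proposed pipeline.
recall, f1 = 0.91, 0.925
precision = implied_precision(f1, recall)  # roughly 0.94 under the standard definition

# Throughput implied by the per-frame processing time.
seconds_per_frame = 0.03
frames_per_second = 1.0 / seconds_per_frame  # roughly 33 frames per second
```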

Discussion and limitations
The main strength of the approach lies in its ability to capture non‑linear relationships between cell descriptors via the decision tree, while retaining the global optimality guarantees of the Hungarian algorithm. However, the fixed handcrafted feature set may limit robustness under varying illumination or background conditions. The decision tree, though fast, can overfit to the training data, necessitating periodic retraining for new experimental setups. Moreover, the current implementation is limited to 2‑D bright‑field images; extending to 3‑D volumetric data or multi‑channel fluorescence would require additional feature engineering and possibly a different cost formulation.

Future directions
The authors suggest several avenues for improvement: (1) replacing the single decision tree with ensemble methods such as Random Forests or Gradient Boosted Trees to increase predictive stability; (2) integrating deep convolutional features to automatically learn illumination‑invariant representations; (3) employing recurrent neural networks (e.g., LSTM) to model longer temporal dependencies and improve handling of occlusions; and (4) adapting the assignment framework to multi‑object tracking in 3‑D by incorporating depth information and volumetric shape descriptors.

In summary, by converting cell matching into a supervised classification problem and coupling the resulting probabilistic scores with a carefully adapted Hungarian algorithm, the paper delivers a highly accurate, computationally efficient solution for Tetrahymena cell tracking, and opens a promising path toward more generalizable, learning‑driven tracking pipelines in microscopic imaging.
