Separate Training for Conditional Random Fields Using Co-occurrence Rate Factorization

The standard training method of Conditional Random Fields (CRFs) is very slow for large-scale applications. As an alternative, piecewise training divides the full graph into pieces, trains them independently, and combines the learned weights at test time. In this paper, we present separate training for undirected models based on the novel Co-occurrence Rate Factorization (CR-F). Separate training is a local training method. In contrast to MEMMs, separate training is unaffected by the label bias problem. Experiments show that separate training (i) is unaffected by the label bias problem; (ii) reduces the training time from weeks to seconds; and (iii) obtains results competitive with standard and piecewise training on linear-chain CRFs.

💡 Research Summary

The paper tackles the well-known scalability bottleneck of Conditional Random Fields (CRFs) when applied to large-scale sequence labeling problems. Traditional maximum-likelihood training of CRFs requires computing a global partition function that sums over all possible label sequences. Enumerated naively, that sum grows exponentially with sequence length; for linear-chain CRFs the forward-backward algorithm reduces the cost to quadratic in the number of labels and linear in sequence length, but it must be rerun for every training sequence at every optimization step, which makes training prohibitively slow for datasets with millions of instances or hundreds of label types. Piecewise training, a popular workaround, splits the graph into small sub-graphs (typically edges), trains each independently, and later combines the learned parameters. While piecewise training reduces computational cost, it introduces approximation errors because interactions between sub-graphs are ignored, and it does not resolve the label-bias problem that plagues models such as MEMMs.
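To make the cost of the global partition function concrete, here is a minimal sketch for a linear-chain model. It compares a brute-force sum over every label sequence with the standard forward recursion, which needs on the order of T·K² operations for T positions and K labels. The function names, the score layout (`unary[t][k]`, `pairwise[j][k]`), and the toy numbers are illustrative assumptions, not taken from the paper.

```python
import itertools
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition_forward(unary, pairwise):
    """Log partition function via the forward recursion.

    unary[t][k]    : score of label k at position t (hypothetical layout)
    pairwise[j][k] : score of the transition from label j to label k
    """
    num_labels = len(unary[0])
    alpha = list(unary[0])
    for t in range(1, len(unary)):
        alpha = [
            unary[t][k]
            + log_sum_exp([alpha[j] + pairwise[j][k] for j in range(num_labels)])
            for k in range(num_labels)
        ]
    return log_sum_exp(alpha)


def log_partition_brute_force(unary, pairwise):
    """Same quantity by enumerating all K**T label sequences."""
    num_labels = len(unary[0])
    scores = []
    for labels in itertools.product(range(num_labels), repeat=len(unary)):
        score = sum(unary[t][y] for t, y in enumerate(labels))
        score += sum(pairwise[a][b] for a, b in zip(labels, labels[1:]))
        scores.append(score)
    return log_sum_exp(scores)


# Toy scores: 4 positions, 3 labels (made-up numbers).
unary = [[0.5, -1.0, 0.2], [1.5, 0.3, -0.7], [0.0, 0.9, 0.4], [-0.2, 0.1, 1.1]]
pairwise = [[0.2, -0.5, 0.0], [0.7, 0.1, -0.3], [-0.4, 0.6, 0.3]]

log_z_fast = log_partition_forward(unary, pairwise)
log_z_slow = log_partition_brute_force(unary, pairwise)
```

Both functions return the same value on this toy input. Even the fast version has to be repeated for every training sequence at every gradient step, which is the cost that local training methods try to avoid.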

To overcome these limitations, the authors introduce a novel factorization called Co-occurrence Rate Factorization (CR-F). The key idea is to express the joint distribution of any two variables X and Y through their co-occurrence rate.
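As a rough illustration, assume the co-occurrence rate is defined as CR(X; Y) = P(X, Y) / (P(X) P(Y)), that is, the exponential of pointwise mutual information. This definition is an assumption here, since the summary text does not reproduce it. Under that assumption the joint distribution factorizes as P(X, Y) = CR(X; Y) P(X) P(Y), and each factor can be estimated locally from counts. The helper name `co_occurrence_rates` and the toy tag pairs are hypothetical.

```python
from collections import Counter


def co_occurrence_rates(pairs):
    """Estimate CR(x; y) = P(x, y) / (P(x) * P(y)) from observed (x, y) pairs.

    Assumed definition; probabilities are plain relative frequencies.
    """
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(x for x, _ in pairs)
    right = Counter(y for _, y in pairs)
    return {
        (x, y): (count / n) / ((left[x] / n) * (right[y] / n))
        for (x, y), count in joint.items()
    }


# Toy adjacent-label pairs (made-up data).
pairs = [("DET", "NOUN"), ("DET", "NOUN"), ("NOUN", "VERB"), ("DET", "ADJ")]
rates = co_occurrence_rates(pairs)

# Reconstruct one joint probability from its local factors:
# P(DET, NOUN) = CR(DET; NOUN) * P(DET) * P(NOUN)
p_det = 3 / 4
p_noun_second = 2 / 4
p_joint = rates[("DET", "NOUN")] * p_det * p_noun_second
```

A rate above 1 means the two values co-occur more often than independence would predict, and a rate below 1 means less often. No global normalization is needed to compute any of these quantities, which is what makes purely local training possible in this sketch.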

📜 Original Paper Content