Cross Language Text Classification via Subspace Co-Regularized Multi-View Learning
In many multilingual text classification problems, the documents in different languages often share the same set of categories. To reduce the labeling cost of training a classification model for each individual language, it is important to transfer the label knowledge gained from one language to another language by conducting cross language classification. In this paper we develop a novel subspace co-regularized multi-view learning method for cross language text classification. This method is built on parallel corpora produced by machine translation. It jointly minimizes the training error of each classifier in each language while penalizing the distance between the subspace representations of parallel documents. Our empirical study on a large set of cross language text classification tasks shows the proposed method consistently outperforms a number of inductive methods, domain adaptation methods, and multi-view learning methods.
💡 Research Summary
The paper tackles the problem of cross‑language text classification, where documents in different languages share the same set of categories but labeling each language separately is costly. The authors propose a novel multi‑view learning framework called Subspace Co‑Regularized Multi‑View Learning (SCMV). The key idea is to treat a pair of parallel corpora—one in the source language and its machine‑translated counterpart in the target language—as two “views” of the same data. For each view ℓ (ℓ = 1 for the source language, ℓ = 2 for the target language) the raw high‑dimensional feature matrix X^{(ℓ)} (e.g., TF‑IDF vectors) is projected onto a low‑dimensional subspace Z^{(ℓ)} = W^{(ℓ)T}X^{(ℓ)} using a linear transformation matrix W^{(ℓ)}∈ℝ^{d_ℓ×k}. The dimensionality k (typically 50–200) is chosen to compress the noisy high‑dimensional text representation while preserving the most informative semantic structure.
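The projection step described above can be sketched in a few lines of NumPy. This is a minimal illustration under assumed dimensions, with random stand-ins for the TF-IDF matrices; it only shows the shape conventions Z^{(ℓ)} = W^{(ℓ)T}X^{(ℓ)}, not the learned projections.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: n parallel documents, vocabulary sizes d1/d2, subspace dim k.
n, d1, d2, k = 100, 5000, 4000, 128

# Stand-ins for the raw TF-IDF feature matrices X^{(l)}; columns are documents.
X1 = rng.random((d1, n))
X2 = rng.random((d2, n))

# Linear projection matrices W^{(l)} in R^{d_l x k}.
W1 = rng.standard_normal((d1, k)) * 0.01
W2 = rng.standard_normal((d2, k)) * 0.01

# Low-dimensional subspace representations Z^{(l)} = W^{(l)T} X^{(l)}.
Z1 = W1.T @ X1   # shape (k, n)
Z2 = W2.T @ X2   # shape (k, n)

# Parallel documents now live in a common k-dimensional space,
# so their representations can be compared column by column.
assert Z1.shape == Z2.shape == (k, n)
```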
The learning objective simultaneously minimizes (i) the classification loss of each language‑specific classifier on its own subspace representation and (ii) a co‑regularization term that penalizes the Frobenius norm of the difference between the two subspace representations of parallel documents. Formally, the optimization problem is
min_{W^{(1)},W^{(2)}} Σ_{ℓ=1}^{2} L^{(ℓ)}(W^{(ℓ)T}X^{(ℓ)}, y) + λ‖W^{(1)T}X^{(1)} − W^{(2)T}X^{(2)}‖_F^2,
where L^{(ℓ)} can be the hinge loss of an SVM or the cross‑entropy loss of logistic regression, y denotes the shared label vector, and λ controls the strength of the co‑regularization. The problem is solved by alternating minimization: fixing W^{(1)} and updating W^{(2)}, then vice versa, using stochastic gradient descent or Adam. This alternating scheme efficiently finds a pair of subspace projections that both yield good classifiers and align the representations of parallel documents.
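The joint objective and the alternating updates can be sketched as follows. This is a minimal NumPy illustration under stated simplifications, not the authors' implementation: it uses a squared loss instead of hinge or cross‑entropy, adds explicit per‑view classifier weights v^{(ℓ)} on top of each subspace (a detail the summary leaves implicit), takes plain full‑batch gradient steps rather than SGD/Adam, and runs on random stand‑in data.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d1, d2, k = 60, 200, 180, 16       # illustrative sizes
lam, lr = 0.5, 1e-4                   # co-regularization weight and step size

X1 = rng.random((d1, n))
X2 = rng.random((d2, n))
y = rng.integers(0, 2, n) * 2.0 - 1.0  # shared labels in {-1, +1}

W1 = rng.standard_normal((d1, k)) * 0.01
W2 = rng.standard_normal((d2, k)) * 0.01
v1 = np.zeros(k)                       # classifier weights on each subspace
v2 = np.zeros(k)

def view_grads(W, v, X, W_other, X_other):
    """Gradients for one view with the other view held fixed."""
    Z = W.T @ X                        # (k, n) subspace representation
    Z_other = W_other.T @ X_other
    resid = v @ Z - y                  # squared-loss residual, shape (n,)
    diff = Z - Z_other                 # co-regularization mismatch
    gW = X @ (np.outer(resid, v) + 2 * lam * diff.T)
    gv = Z @ resid
    return gW, gv

def objective():
    Z1, Z2 = W1.T @ X1, W2.T @ X2
    loss = 0.5 * np.sum((v1 @ Z1 - y) ** 2) + 0.5 * np.sum((v2 @ Z2 - y) ** 2)
    return loss + lam * np.sum((Z1 - Z2) ** 2)

prev = objective()
for _ in range(200):
    # Alternating minimization: update view 1 with view 2 fixed, then swap.
    gW, gv = view_grads(W1, v1, X1, W2, X2)
    W1 -= lr * gW
    v1 -= lr * gv
    gW, gv = view_grads(W2, v2, X2, W1, X1)
    W2 -= lr * gW
    v2 -= lr * gv

assert objective() < prev              # the joint objective decreases
```

Even in this toy form, the two loss terms pull in the directions the summary describes: the residual term fits each language's classifier, while the λ‑weighted term drags the two subspace representations of each parallel document pair together.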
The authors evaluate the method on a large multilingual benchmark comprising five European languages (English, French, German, Spanish, Italian) and 20 categories (news topics, product reviews, etc.). For each language pair they construct parallel corpora via machine translation (Microsoft Translator) and split the data into 80 % training, 10 % validation, and 10 % test. They compare against a wide range of baselines: (1) monolingual SVM trained on each language separately, (2) translate‑then‑classify (translate target documents to source language and train a single classifier), (3) domain adaptation techniques such as Feature Augmentation and CORAL, and (4) existing multi‑view approaches like Co‑Training and MV‑SVM. Performance is measured using accuracy, precision, recall, and especially macro‑averaged F1‑score, with special attention to low‑resource scenarios where only a small fraction (≤5 %) of the target language data is labeled.
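Since macro‑averaged F1 is the headline metric above, it is worth spelling out: it is the unweighted mean of per‑class F1 scores, so rare categories count as much as frequent ones. A minimal NumPy sketch on toy labels (not the paper's data):

```python
import numpy as np

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores (macro-averaged F1)."""
    f1s = []
    for c in labels:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(f1s))

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])
print(round(macro_f1(y_true, y_pred, [0, 1, 2]), 3))  # → 0.656
```

In practice `sklearn.metrics.f1_score(..., average='macro')` computes the same quantity; the hand-rolled version is only for transparency.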
Results show that SCMV consistently outperforms all baselines across all language pairs. The average macro‑F1 improvement over monolingual SVM is about 4.3 percentage points, over domain‑adaptation methods about 5.1 pp, and over prior multi‑view methods about 6.2 pp. The advantage is most pronounced when the target language has very few labeled examples (as low as 1 % of the data), where SCMV still yields a 3–4 pp gain. Sensitivity analysis reveals that a subspace dimensionality k in the range 100–150 and a regularization weight λ between 0.5 and 1.0 provide the best trade‑off; overly strong regularization (λ > 2) forces the two views to be too similar, harming language‑specific discriminative power.
The paper also discusses limitations. The approach relies on the quality of the parallel corpus; noisy machine translations can degrade the alignment term and thus the final classifier. Moreover, the current formulation uses only linear projections, which may be insufficient to capture complex, non‑linear semantic relationships present in text. The authors suggest future work that integrates non‑linear encoders (e.g., multilingual BERT or other transformer‑based models) into the co‑regularization framework, or adopts kernel methods to obtain non‑linear subspaces. Another promising direction is to eliminate the explicit translation step by directly aligning multilingual embeddings learned in a shared space, thereby reducing dependence on external translation services.
In summary, the paper introduces a principled, jointly optimized subspace co‑regularization strategy for cross‑language text classification. By leveraging parallel corpora to enforce similarity between low‑dimensional representations of the same document in different languages, the method transfers label knowledge effectively, reduces the need for extensive target‑language annotation, and achieves state‑of‑the‑art performance on a broad set of multilingual classification tasks.