GemDetox at TextDetox CLEF 2025: Enhancing a Massively Multilingual Model for Text Detoxification on Low-resource Languages


As social-media platforms emerge and evolve faster than the regulations meant to oversee them, automated detoxification can serve as a timely tool for moderators to enforce safe discourse at scale. We describe our submission to the PAN 2025 Multilingual Text Detoxification Challenge, which rewrites toxic single-sentence inputs into neutral paraphrases across 15 typologically diverse languages. Building on a 12B-parameter Gemma-3 multilingual transformer, we apply parameter-efficient LoRA supervised fine-tuning (SFT) together with prompting techniques such as few-shot examples and Chain-of-Thought. Our multilingual training corpus combines 3,600 human-authored parallel pairs, 21,600 machine-translated synthetic pairs, and model-generated pairs filtered by Jaccard thresholds. At inference, inputs are enriched with three LaBSE-retrieved neighbors and explicit toxic-span annotations. Evaluated via Style Transfer Accuracy, LaBSE-based semantic preservation, and xCOMET fluency, our system ranks first on both the high-resource and low-resource language tracks. Ablations show a +0.081 joint-score gain from few-shot examples and +0.088 from basic CoT prompting. An ANOVA analysis identifies language resource status as the strongest predictor of performance (η² = 0.667, p < 0.01).


💡 Research Summary

GemDetox presents a comprehensive solution to the PAN 2025 Multilingual Text Detoxification Challenge, which requires rewriting toxic single‑sentence inputs into neutral paraphrases across 15 typologically diverse languages. The authors build upon the 12 B‑parameter Gemma‑3 multilingual transformer, applying parameter‑efficient LoRA (Low‑Rank Adaptation) fine‑tuning and sophisticated prompting techniques, namely few‑shot in‑context examples and Chain‑of‑Thought (CoT) prompting.

Data Construction
The training corpus merges three sources: (i) the organizer‑provided ParaDetox‑9 parallel data (3 600 human‑authored toxic‑neutral pairs across nine high‑resource languages), (ii) machine‑translated extensions for six low‑resource languages (Italian, French, Hebrew, Hinglish, Japanese, Tatar) using NLLB‑200 (3.3 B) and a dedicated Hinglish translator, yielding 21 600 synthetic pairs, and (iii) synthetic pairs generated from a multilingual toxicity dataset. For the synthetic component, a strong detoxification model creates neutral candidates for toxic sentences; these are filtered by character‑level Jaccard similarity (5‑gram) and LaBSE semantic similarity thresholds (≥ 0.85 for MT pairs, ≥ 0.80 for synthetic pairs). After deduplication, script filtering (Hinglish), and semantic preservation checks, the final corpus contains roughly 18 000 high‑quality toxic‑neutral pairs covering all 15 languages. Each toxic sentence is enriched with three nearest LaBSE neighbors and explicit toxic‑span annotations.
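The surface-similarity filter can be sketched in a few lines. The character-level 5-gram Jaccard function below follows the description above; the `keep_pair` helper and its near-copy cutoff are illustrative assumptions (the LaBSE cosine similarity would be computed separately and passed in as `semantic_sim`).

```python
def char_ngrams(text: str, n: int = 5) -> set:
    """Character-level n-grams of a string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a: str, b: str, n: int = 5) -> float:
    """Character 5-gram Jaccard similarity between two sentences."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0

def keep_pair(toxic: str, neutral: str, semantic_sim: float,
              sim_threshold: float = 0.80) -> bool:
    """Keep a pair if semantic similarity (e.g. LaBSE cosine) clears the
    threshold (0.85 for MT pairs, 0.80 for synthetic pairs) and the rewrite
    is not a near-verbatim copy of the toxic input. The 0.95 Jaccard cutoff
    is an assumed value for illustration."""
    return semantic_sim >= sim_threshold and jaccard(toxic, neutral) < 0.95
```

A pair that preserves meaning but barely changes the surface form would be rejected by the Jaccard check, while a semantically drifting rewrite fails the similarity threshold.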

Model Adaptation
Gemma‑3‑12B‑Instruct is quantized to 4‑bit integers and run with BF16 activations on Hopper‑class GPUs, keeping memory under 24 GB. LoRA adapters (rank = 16, scaling = 16, no dropout) are inserted into every attention and MLP sub‑module; the remaining 99.45 % of parameters are frozen, so only about 65 M parameters (0.55 % of the model) are updated.
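Under these hyperparameters, the adapter setup can be sketched with Hugging Face `transformers` and `peft`. This is a minimal configuration sketch, not the authors' training script; the checkpoint name and the projection names in `target_modules` follow common conventions for Gemma-style models and are assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit weight quantization with BF16 compute, as described above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-12b-it",  # assumed checkpoint name
    quantization_config=bnb_config,
)

# Rank-16 adapters on every attention and MLP projection; base weights frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # roughly 0.55 % of weights trainable
```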

Prompt Engineering
A four‑step CoT system message guides the model to: (1) identify toxic elements, (2) retrieve the core semantic content, (3) rewrite using neutral vocabulary, and (4) verify non‑toxicity. The base prompt is authored in English, manually verified, then translated into each target language using OpenAI’s o4‑mini‑high model. Training instances are formatted as a three‑turn dialogue (system, user with language tag and toxic sentence, assistant output). To provide stronger supervision, the prompt is prefixed with three language‑specific few‑shot examples selected via LaBSE similarity. Model outputs are emitted in a standardized JSON structure; during training, system and user tokens are masked so loss is computed only on the assistant span.
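The loss-masking scheme can be illustrated with a small helper. This is a hypothetical sketch: the tag strings and the `tokenizer` interface are generic stand-ins rather than the exact Gemma-3 chat template; only the `-100` label convention (the ignore index of PyTorch cross-entropy) reflects the description above.

```python
def build_training_example(tokenizer, system_msg: str, user_msg: str,
                           assistant_msg: str) -> dict:
    """Format a dialogue and mask loss off the prompt tokens.

    Hypothetical helper: the <system>/<user>/<assistant> tags are generic
    placeholders, not the model's real chat template.
    """
    prompt = (f"<system>{system_msg}</system>\n"
              f"<user>{user_msg}</user>\n<assistant>")
    target = assistant_msg + "</assistant>"
    prompt_ids = tokenizer(prompt)["input_ids"]
    target_ids = tokenizer(target)["input_ids"]
    # -100 tells the cross-entropy loss to ignore system/user tokens,
    # so gradients come only from the assistant span.
    return {
        "input_ids": prompt_ids + target_ids,
        "labels": [-100] * len(prompt_ids) + target_ids,
    }
```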

Inference Procedure
At test time, the input is concatenated with its three LaBSE neighbors and toxic‑span tags. The model generates three candidate neutralizations; each candidate is scored against the reference neutral sentence using Jaccard similarity, and the highest‑scoring candidate is selected as the final output.
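The candidate-selection step can be sketched as follows, assuming the same character-level 5-gram Jaccard scorer used in data filtering; the function names are illustrative.

```python
def char_ngrams(text: str, n: int = 5) -> set:
    """Character-level n-grams of a string."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def jaccard(a: str, b: str, n: int = 5) -> float:
    """Character 5-gram Jaccard similarity between two sentences."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 1.0

def select_best(candidates: list, reference: str) -> str:
    """Return the candidate with the highest Jaccard similarity
    to the reference string."""
    return max(candidates, key=lambda c: jaccard(c, reference))
```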

Evaluation
The shared‑task evaluation uses three normalized metrics: Style Transfer Accuracy (STA), Content Preservation (SIM) based on a weighted LaBSE cosine similarity, and Fluency (FL) measured by xCOMET. The joint score is the average of the three. GemDetox achieves first place on both high‑resource and low‑resource language tracks. Ablation studies reveal that adding few‑shot examples improves the joint score by +0.081, while basic CoT prompting adds +0.088. An ANOVA analysis shows language resource status explains 66.7 % of performance variance (η² = 0.667, p < 0.01), underscoring the importance of data augmentation for low‑resource languages.
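The joint score described above is simply the unweighted mean of the three normalized metrics, a trivial sketch:

```python
def joint_score(sta: float, sim: float, fl: float) -> float:
    """Joint score: unweighted average of Style Transfer Accuracy (STA),
    content preservation (SIM), and fluency (FL), each normalized to [0, 1]."""
    return (sta + sim + fl) / 3.0
```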

Limitations and Future Work
The authors acknowledge that subtle, culturally‑specific toxicity may still evade detection, and that Jaccard‑based filtering could limit lexical diversity. Future directions include richer semantic filters (potentially multimodal), incorporation of human feedback via RLHF, and exploration of larger context windows to better capture nuanced toxicity.

Conclusion
By combining a massive multilingual LLM with lightweight LoRA fine‑tuning, extensive multilingual synthetic data, and carefully crafted CoT/few‑shot prompting, GemDetox delivers state‑of‑the‑art performance on multilingual text detoxification. The work demonstrates that even for low‑resource languages, strategic data augmentation and prompt design can close the performance gap, offering a practical, scalable pipeline for real‑world moderation systems.

