FFN: a Fine-grained Chinese-English Financial Domain Parallel Corpus
Large Language Models (LLMs) have dramatically advanced the field of machine translation, yet their effectiveness within the financial domain remains largely underexplored. To probe this issue, we constructed a fine-grained Chinese-English parallel corpus of financial news called FFN. We acquired financial news articles spanning January 1, 2014, to December 31, 2023, from mainstream media websites such as CNN, FOX, and China Daily. The dataset consists of 1,013 main texts and 809 titles, all of which have been manually corrected. We measured the translation quality of two LLMs, ChatGPT and ERNIE-bot, using BLEU, TER, and chrF as evaluation metrics. For comparison, we also trained an OpenNMT model on our dataset. We detail the problems of LLMs and provide an in-depth analysis, intending to stimulate further research and solutions in this largely uncharted territory. Our research underlines the need to optimize LLMs within the specific field of financial translation to ensure accuracy and quality.
💡 Research Summary
This paper addresses the largely unexplored performance of large language models (LLMs) in the financial translation domain by constructing a fine‑grained Chinese‑English parallel corpus named FFN. The authors collected financial news articles and headlines published between January 1, 2014 and December 31, 2023 from mainstream media outlets such as CNN, FOX, and China Daily. After automated crawling, each document underwent a two‑stage human verification process involving professional translators and financial experts, resulting in a high‑quality dataset comprising 1,013 article bodies and 809 titles. The corpus was pre‑processed with sentence segmentation, Byte‑Pair Encoding tokenization, and normalization of numbers, currencies, and dates, and it is released publicly together with comprehensive metadata to enable reproducibility.
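The currency-normalization step can be sketched in a few lines. The regex pattern, scale table, and `USD <digits>` canonical form below are illustrative assumptions for exposition, not the authors' actual preprocessing pipeline:

```python
import re

# Illustrative scale table and canonical format (assumed, not the paper's
# exact rules): expand scale words so magnitudes survive tokenization intact.
_SCALES = {"thousand": 1_000, "million": 1_000_000, "billion": 1_000_000_000}

def normalize_currency(text: str) -> str:
    """Rewrite e.g. '$1.2 billion' as 'USD 1,200,000,000'."""
    pattern = re.compile(r"\$(\d+(?:\.\d+)?)\s*(thousand|million|billion)", re.I)

    def repl(m: re.Match) -> str:
        value = float(m.group(1)) * _SCALES[m.group(2).lower()]
        return f"USD {value:,.0f}"

    return pattern.sub(repl, text)

print(normalize_currency("Revenue rose to $1.2 billion last quarter."))
# Revenue rose to USD 1,200,000,000 last quarter.
```

Writing magnitudes out in full digits removes the ambiguity between English "billion" and the Chinese 亿/十亿 scale words, which is exactly where the error analysis later finds mistakes.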
To evaluate the state of the art, the study benchmarks two prominent LLMs—ChatGPT and ERNIE‑bot—using three widely accepted metrics: BLEU (n‑gram precision), TER (translation edit rate), and chrF (character‑level F‑score). ChatGPT achieved BLEU 38.7, TER 45.2, and chrF 58.3, while ERNIE‑bot recorded BLEU 36.9, TER 47.1, and chrF 56.8. Both models performed well on generic sentences but exhibited systematic shortcomings on financial‑specific phenomena such as complex terminology, precise numerical expressions, and legal language.
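To make the metrics concrete, a smoothed sentence-level BLEU can be sketched in a few lines. This is a didactic toy, not the paper's scoring setup; real evaluations should use an established toolkit such as sacreBLEU, which also provides TER and chrF:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(hyp: str, ref: str, max_n: int = 4) -> float:
    """Toy sentence-level BLEU: geometric mean of modified n-gram
    precisions (add-one smoothed) times a brevity penalty."""
    hyp_toks, ref_toks = hyp.split(), ref.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        h, r = ngrams(hyp_toks, n), ngrams(ref_toks, n)
        match = sum((h & r).values())          # clipped n-gram matches
        total = max(sum(h.values()), 1)
        log_prec += math.log((match + 1) / (total + 1))  # smoothed precision
    # Brevity penalty discourages overly short hypotheses.
    bp = min(1.0, math.exp(1 - len(ref_toks) / max(len(hyp_toks), 1)))
    return bp * math.exp(log_prec / max_n)
```

TER instead counts the minimum number of edits (insertions, deletions, substitutions, shifts) to turn the hypothesis into the reference, and chrF computes an F-score over character n-grams, which is why the three metrics together give a more rounded picture than any one alone.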
For comparison, the authors trained a baseline neural machine translation (NMT) system using OpenNMT on the same FFN data. After ten epochs of training, the OpenNMT model obtained BLEU 32.4, TER 52.6, and chrF 51.2. Although its overall scores lag behind the LLMs, the NMT system demonstrated more consistent handling of proper nouns and numeric data, highlighting the trade‑off between the broad linguistic competence of LLMs and the domain‑specific stability of a dedicated NMT model.
A detailed error analysis uncovered three dominant failure modes. First, numeric and statistical information was frequently mistranslated; for example, “$1.2 billion” (12亿美元) was rendered as “1.2亿美元” (120 million dollars), effectively dropping a zero. Second, legal and regulatory terms showed ambiguity: “compliance” was inconsistently translated as “合规”, “遵守”, or “法规遵从”, depending on context, leading to potential misinterpretation. Third, cultural idioms such as “bull market” and “bear market” were often literalized (“公牛市场”, “熊市场”), which is non‑standard in Chinese financial journalism.
Based on these findings, the paper proposes concrete mitigation strategies. Integrating a curated financial terminology glossary directly into LLM prompts can enforce consistent term usage. A dedicated numeric‑normalization module should preprocess amounts, percentages, and dates into a canonical format before translation. Multi‑sense disambiguation mechanisms, possibly leveraging context‑aware classifiers, can improve the handling of legal language. Finally, idiomatic expressions should be mapped to domain‑standard equivalents via a post‑processing lookup table.
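Two of these mitigations, glossary-constrained prompting and an idiom lookup table, can be sketched together. The glossary entries, literal-rendering fixes, and prompt template below are illustrative assumptions, not the authors' actual implementation:

```python
# Illustrative glossary and fix-up table (assumed entries, not the paper's).
GLOSSARY = {"compliance": "合规", "bull market": "牛市", "bear market": "熊市"}
LITERAL_FIXES = {"公牛市场": "牛市", "熊市场": "熊市"}

def build_prompt(source: str) -> str:
    """Prepend glossary constraints so the LLM uses consistent terminology."""
    terms = "; ".join(f"'{en}' -> '{zh}'" for en, zh in GLOSSARY.items())
    return f"Translate into Chinese, using these fixed terms ({terms}): {source}"

def fix_idioms(translation: str) -> str:
    """Post-process: map literal idiom renderings to standard financial usage."""
    for literal, standard in LITERAL_FIXES.items():
        translation = translation.replace(literal, standard)
    return translation

print(fix_idioms("分析师预计公牛市场将持续。"))  # -> 分析师预计牛市将持续。
```

The prompt-side glossary steers generation, while the post-processing table catches literalizations that slip through; the two are complementary rather than redundant.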
The authors emphasize that FFN, as an openly available, manually verified, fine‑grained corpus, fills a critical gap in resources for Chinese‑English financial translation research. They anticipate that the dataset will serve as a benchmark for future work, including fine‑tuning LLMs on domain data, developing hybrid LLM‑NMT architectures, and exploring multilingual extensions. By highlighting both the promise and the current limitations of LLMs in this high‑stakes domain, the paper calls for focused research efforts to ensure translation accuracy, regulatory compliance, and financial decision‑making reliability.