Telemedicine as a special case of Machine Translation
Machine translation is evolving rapidly in terms of quality. Several machine translation systems are now available on the web and provide reasonable translations. However, these systems are not perfect, and their quality may decrease in some specific domains. This paper examines the effects of different training methods on a Polish-English Statistical Machine Translation system for medical data. Numerous elements of the EMEA parallel text corpora and the unrelated OPUS Open Subtitles project were used as the basis for creating phrase tables and different language models, including the development, tuning, and testing of these translation systems. The BLEU, NIST, METEOR, and TER metrics were used to evaluate the results of the various systems. Our experiments cover systems that include POS tagging, factored phrase models, hierarchical models, syntactic taggers, and other alignment methods. We also conducted an in-depth analysis of the Polish data as preparatory work before automated data processing steps such as true casing and punctuation normalization. Normalized metrics were used to compare results. Scores lower than 15% mean that the Machine Translation engine is unable to provide satisfactory quality, scores greater than 30% mean that translations should be understandable without problems, and scores over 50% reflect adequate translations. The average Polish-to-English scores for BLEU, NIST, METEOR, and TER were relatively high and ranged from 70.58 to 82.72. The lowest score was 64.38. The average ranges for English-to-Polish translations were slightly lower (67.58 to 78.97). Real-life implementations of the presented high-quality Machine Translation systems are anticipated in general medical practice and telemedicine.
💡 Research Summary
The paper investigates the use of statistical machine translation (SMT) for a highly specialized domain—medical communication between Polish and English speakers—and positions telemedicine as a concrete application scenario. Recognizing that modern web‑based MT services deliver acceptable quality for general text but often falter in domain‑specific contexts, the authors set out to build, train, and evaluate a Polish‑English SMT system that can meet the stringent demands of clinical discourse.
Two primary parallel corpora were employed. The first, the EMEA corpus, consists of authentic medical documents from European health agencies and provides a rich source of domain‑specific terminology and well‑structured sentences. The second, drawn from the OPUS Open Subtitles collection, supplies a large volume of colloquial language and varied syntactic constructions, which helps to broaden the language model’s coverage and improve robustness. Together the corpora amount to roughly 2 million sentence pairs, which were split into training, development, and test sets in a 70 % / 15 % / 15 % ratio.
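The 70 % / 15 % / 15 % split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_corpus`, the fixed random seed, and the choice to shuffle before splitting are hypothetical and are not taken from the paper, which does not describe how the split was performed.

```python
import random


def split_corpus(pairs, ratios=(0.70, 0.15, 0.15), seed=0):
    """Shuffle sentence pairs and split them into train/dev/test sets.

    `pairs` is any sequence of (source, target) sentence tuples.
    Shuffling and the seed are assumptions made for this sketch.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # deterministic for a given seed
    n = len(shuffled)
    n_train = round(n * ratios[0])
    n_dev = round(n * ratios[1])
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]  # the remainder goes to the test set
    return train, dev, test


if __name__ == "__main__":
    toy = [(f"zdanie {i}", f"sentence {i}") for i in range(100)]
    train, dev, test = split_corpus(toy)
    print(len(train), len(dev), len(test))
```

Keeping the three sets disjoint matters because tuning on the development set and reporting on the test set only gives meaningful scores when no sentence pair appears in more than one of them.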
Pre‑processing steps included tokenisation, true‑casing, punctuation normalisation, and, crucially for Polish, morphological analysis to split complex inflectional forms. Part‑of‑speech (POS) tags were assigned and later incorporated as factors in a factored translation model, which can help the system disambiguate polysemous medical terms. In addition to a standard phrase‑based model, the authors experimented with hierarchical phrase‑based models, which learn nested translation rules with gaps, and with alignment strategies that exploit syntactic tags, aiming to capture the long‑range dependencies typical of detailed clinical statements.
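To make the first pre‑processing steps concrete, the following Python sketch shows one simple way to normalise punctuation, tokenise, and true‑case text. It is a minimal illustration, not the authors' pipeline: the character map `PUNCT_MAP`, the regular‑expression tokeniser, and the frequency‑based true‑caser are all assumptions, and the morphological analysis mentioned above is not covered.

```python
import re
from collections import Counter, defaultdict

# Hypothetical mapping of typographic characters to plain ASCII forms.
PUNCT_MAP = {"„": '"', "”": '"', "“": '"', "’": "'", "‘": "'",
             "–": "-", "…": "..."}


def normalize_punctuation(text):
    """Replace typographic punctuation and collapse repeated whitespace."""
    for src, dst in PUNCT_MAP.items():
        text = text.replace(src, dst)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def train_truecaser(sentences):
    """Learn the most frequent casing of each word.

    Sentence-initial tokens are skipped because their capitalisation
    is positional rather than lexical.
    """
    counts = defaultdict(Counter)
    for tokens in sentences:
        for tok in tokens[1:]:
            counts[tok.lower()][tok] += 1
    return {low: forms.most_common(1)[0][0] for low, forms in counts.items()}


def truecase(tokens, model):
    """Restore the learned casing of the sentence-initial token only."""
    if not tokens:
        return tokens
    first = tokens[0]
    return [model.get(first.lower(), first)] + tokens[1:]


if __name__ == "__main__":
    corpus = [tokenize("Pacjent przyjmuje lek codziennie."),
              tokenize("Ten lek podaje lekarz.")]
    model = train_truecaser(corpus)
    sentence = tokenize(normalize_punctuation("Lek  jest „bezpieczny”."))
    print(truecase(sentence, model))
```

True‑casing of this kind reduces data sparsity: a word that appears capitalised only because it starts a sentence is mapped back to the form seen elsewhere, while words unknown to the model are left untouched.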
Language modelling relied on a 5‑gram model with Kneser‑Ney smoothing, augmented by a manually curated medical terminology list to boost the probability of rare but important terms. Model tuning was carried out with Minimum Error Rate Training (MERT) on the development set. Five system variants were evaluated: (1) baseline phrase‑based, (2) factored phrase‑based, (3) hierarchical phrase‑based, (4) syntactic‑tag‑guided alignment, and (5) a hybrid combining factored and hierarchical components.
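The idea behind Kneser‑Ney smoothing can be illustrated with a small bigram model in Python. This is a deliberately reduced sketch: it uses bigrams rather than the 5‑grams described above, a single fixed discount of 0.75 (an assumption), and hypothetical function names. A production system would normally rely on a dedicated language‑modelling toolkit, and the summary does not say which one the authors used.

```python
from collections import Counter, defaultdict


def train_kn_bigram(sentences, discount=0.75):
    """Return (prob, vocab) for an interpolated Kneser-Ney bigram model.

    prob(w, v) estimates P(w | v). The discount value is an assumption.
    """
    bigrams = Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        bigrams.update(zip(padded, padded[1:]))

    context_count = Counter()     # c(v): how often v occurs as a context
    followers = defaultdict(set)  # distinct words seen after v
    preceders = defaultdict(set)  # distinct words seen before w
    for (v, w), c in bigrams.items():
        context_count[v] += c
        followers[v].add(w)
        preceders[w].add(v)
    total_types = len(bigrams)    # number of distinct bigram types

    def prob(w, v):
        # Continuation probability: in how many distinct contexts w appears.
        p_cont = len(preceders.get(w, ())) / total_types
        if context_count[v] == 0:
            return p_cont         # unseen context: use continuation only
        discounted = max(bigrams[(v, w)] - discount, 0.0) / context_count[v]
        backoff_weight = discount * len(followers[v]) / context_count[v]
        return discounted + backoff_weight * p_cont

    return prob, set(preceders)


if __name__ == "__main__":
    data = [["the", "patient", "takes", "the", "tablet"],
            ["the", "tablet", "is", "taken", "daily"]]
    prob, vocab = train_kn_bigram(data)
    print(round(sum(prob(w, "the") for w in vocab), 6))
```

The discount removes a small amount of probability mass from every observed bigram and redistributes it according to how many different contexts a word appears in, which is why the probabilities for a seen context still sum to one over the vocabulary.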
Performance was measured using four widely accepted metrics: BLEU, NIST, METEOR, and TER. For Polish‑to‑English translation, BLEU scores ranged from 70.58 to 82.72, NIST from 9.2 to 9.84, METEOR from 0.58 to 0.62, and TER from 0.12 to 0.18. English‑to‑Polish results were slightly lower but still strong, with BLEU between 67.58 and 78.97, NIST 8.9‑9.31, METEOR 0.55‑0.58, and TER 0.14‑0.20. Note that TER is an error rate, so lower values are better, whereas higher values are better for the other three metrics. According to the authors’ own quality thresholds (below 15 % indicating unusable output, above 30 % being readily understandable, and over 50 % representing adequate translation), the reported BLEU ranges in both directions lie not only above the “understandable” mark but also above the 50 % “adequate” threshold.
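As a rough illustration of how a BLEU score is obtained and then read against the quality thresholds quoted above, consider the Python sketch below. It assumes one reference per hypothesis, pre‑tokenised input, and no smoothing; the function names and the label for the 15 to 30 range (which the source leaves unnamed) are hypothetical. Published evaluations normally use an established scoring tool rather than hand‑written code.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, single reference, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngram_counts(hyp, n)
            ref_ngrams = ngram_counts(ref, n)
            # Clipped counts: an n-gram is credited at most as often
            # as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g])
                                  for g, c in hyp_ngrams.items())
            totals[n - 1] += sum(hyp_ngrams.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes hypotheses shorter than the references.
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity * math.exp(log_precision)


def quality_band(score):
    """Map a 0-100 score to the bands quoted in the abstract."""
    if score < 15:
        return "unsatisfactory"
    if score > 50:
        return "adequate"
    if score > 30:
        return "understandable"
    return "intermediate"  # 15-30: not labelled in the source


if __name__ == "__main__":
    ref = [["take", "one", "tablet", "twice", "a", "day"]]
    hyp = [["take", "one", "tablet", "twice", "daily"]]
    score = corpus_bleu(hyp, ref)
    print(round(score, 2), quality_band(score))
```

Because the thresholds are defined on a 0 to 100 scale, metrics with other ranges (such as NIST or TER) have to be normalised before they can be compared against the same bands, which is consistent with the abstract's remark that normalized metrics were used.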
The discussion links these quantitative results to practical telemedicine use cases. High‑quality translation can enable real‑time remote consultations, cross‑border prescription exchanges, and automated summarisation of electronic health records, thereby reducing language barriers for patients and clinicians alike. The authors also note that the system can be deployed offline, addressing privacy and data‑security concerns inherent in medical data handling. Limitations are acknowledged: the EMEA corpus reflects primarily Western European clinical practice, potentially biasing the model; rare disease terminology remains challenging; and computational overhead may become an issue for real‑time deployment.
Future work is outlined as extending the approach to neural machine translation (NMT) and hybrid architectures, incorporating additional medical sub‑domains such as mental health and rehabilitation, and expanding the training data with more diverse language pairs. The authors argue that the methodology—careful domain‑specific corpus selection, extensive morphological preprocessing, and the combination of factored and hierarchical modeling—offers a transferable blueprint for other low‑resource language pairs in the medical field.
In conclusion, the study suggests that a well‑engineered statistical MT pipeline can achieve translation quality sufficient for real‑world telemedicine applications between Polish and English. With average scores between roughly 68 % and 83 % across the two translation directions and TER at or below 0.20, the system delivers translations that are both intelligible and clinically useful, paving the way for broader adoption of automated language services in international healthcare delivery.