A technical curriculum on language-oriented artificial intelligence in translation and specialised communication
This paper presents a technical curriculum on language-oriented artificial intelligence (AI) in the language and translation (L&T) industry. The curriculum aims to foster domain-specific technical AI literacy among stakeholders in the fields of translation and specialised communication by exposing them to the conceptual and technical/algorithmic foundations of modern language-oriented AI in an accessible way. The core curriculum focuses on 1) vector embeddings, 2) the technical foundations of neural networks, 3) tokenization, and 4) transformer neural networks. It is intended to help users develop computational thinking as well as algorithmic awareness and algorithmic agency, ultimately contributing to their digital resilience in AI-driven work environments. The didactic suitability of the curriculum was tested in an AI-focused MA course at the Institute of Translation and Multilingual Communication at TH Koeln. Results suggest that the curriculum is didactically effective, but participant feedback indicates that it should be embedded in higher-level didactic scaffolding, for example in the form of lecturer support, to enable optimal learning conditions.
💡 Research Summary
The paper proposes a practical, open‑source curriculum designed to build “technical AI literacy” among stakeholders in the language and translation (L&T) industry—students, translators, and specialised communication professionals. Recognising that large language models (LLMs) such as GPT‑4 are reshaping translation workflows, the author argues that domain‑specific technical understanding is essential for maintaining agency, consulting competence, and digital resilience.
The curriculum is delivered as a series of four Jupyter notebooks hosted on GitHub and executed in Google Colab. Each notebook focuses on a core component of modern language‑oriented AI and combines explanatory text with runnable code, thereby embodying the “literate computing” paradigm.
- Vector Embeddings – Introduces static embeddings (e.g., Word2Vec, GloVe) and contextualised embeddings (BERT). Learners load pre-trained models, train a tiny embedding from scratch, visualise vectors, and compute Euclidean distance and cosine similarity. The notebook also shows how BERT transforms the static vectors produced by its initial embedding layer into contextualised representations, linking directly to the later transformer notebook.
- Technical Foundations of Neural Networks – Covers neurons, weights, biases, activation functions, and the forward- and backward-propagation processes. Students build a minimal neural network in pure Python, simulate a forward pass that "translates" a short sentence, and step through a simplified back-propagation update. This hands-on module reinforces the matrix/tensor arithmetic, softmax, and non-linear transformations that reappear in later sections.
- Tokenization – Explains why tokenisation reduces vocabulary size, compares word-level and character-level approaches, then dives into sub-word algorithms: Byte-Pair Encoding (BPE), WordPiece, and Unigram. Learners apply each tokenizer to the same sentence, inspect token IDs, explore GPT-2's vocabulary, and trace the pipeline from sub-word tokens to embeddings. Positioning this notebook after embeddings and neural-network basics ensures that the added complexity of sub-word tokenisation is manageable.
- Transformer Neural Networks – Builds on the previous three notebooks to unpack the full transformer architecture. It distinguishes encoder-decoder, encoder-only, and decoder-only models, linking each to typical tasks (translation, natural language understanding, text generation). Using the original Vaswani et al. (2017) design, the notebook visualises positional encodings, multi-head self-attention (query/key/value, scaling, masking, softmax), and the decoder's masked attention and generation steps. Interactive tools such as BertViz, the HuggingFace Decoding Visualiser, and a Beam Search Visualiser let learners watch attention maps and probability distributions in real time and experiment with greedy versus beam-search decoding on GPT-2.
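As a sketch of the distance metrics introduced in the embeddings notebook, the snippet below computes Euclidean distance and cosine similarity on invented three-dimensional vectors (real Word2Vec or GloVe embeddings have hundreds of dimensions):

```python
import numpy as np

# Hypothetical 3-dimensional "embeddings", for illustration only.
king = np.array([0.8, 0.6, 0.1])
queen = np.array([0.7, 0.7, 0.2])
apple = np.array([0.1, 0.2, 0.9])

def euclidean_distance(a, b):
    # Straight-line distance between the two points.
    return float(np.linalg.norm(a - b))

def cosine_similarity(a, b):
    # Angle-based similarity: 1 = same direction, 0 = orthogonal.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(euclidean_distance(king, queen))   # small: the vectors are close
print(cosine_similarity(king, queen))    # near 1: similar direction
print(cosine_similarity(king, apple))    # lower: dissimilar words
```

With vectors like these, semantically related words end up closer in both metrics, which is exactly the intuition the notebook's visualisations build on.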
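The forward pass described in the neural-network notebook can be sketched in pure Python; the input, weights, and bias below are arbitrary stand-ins, not values from the curriculum:

```python
import math

def forward(x, W, b):
    """One dense layer followed by softmax, written out explicitly."""
    # z_j = sum_i x_i * W[i][j] + b_j  (matrix-vector product plus bias)
    z = [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
         for j in range(len(b))]
    # Softmax turns the raw scores into a probability distribution.
    m = max(z)                               # subtract max for stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

x = [1.0, 0.5]                    # toy input "embedding"
W = [[0.2, -0.4, 0.1],            # 2x3 weight matrix (made up)
     [0.7,  0.3, -0.5]]
b = [0.0, 0.1, 0.0]               # bias vector (made up)
probs = forward(x, W, b)
print(probs)                      # three probabilities summing to 1
```

A training step would then compare `probs` to a target distribution and nudge `W` and `b` via back-propagation, the update the notebook steps through in simplified form.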
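The sub-word idea from the tokenization notebook can be illustrated with a toy greedy longest-match tokenizer in the WordPiece style. The vocabulary below is invented for the example; real BPE, WordPiece, and Unigram tokenizers learn their vocabularies from large corpora:

```python
# Invented toy vocabulary; "##" marks a continuation piece, as in WordPiece.
VOCAB = {"trans", "##lat", "##ion", "token", "##ize", "##r"}

def wordpiece_tokenize(word):
    """Greedy longest-match segmentation of a single word."""
    tokens, start = [], 0
    while start < len(word):
        end, match = len(word), None
        # Try the longest remaining span first, then shrink it.
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece       # mark word-internal pieces
            if piece in VOCAB:
                match = piece
                break
            end -= 1
        if match is None:
            return ["[UNK]"]               # no sub-word covers this span
        tokens.append(match)
        start = end
    return tokens

print(wordpiece_tokenize("translation"))   # ['trans', '##lat', '##ion']
print(wordpiece_tokenize("tokenizer"))     # ['token', '##ize', '##r']
```

This shows why sub-word schemes keep the vocabulary small: unseen words decompose into known pieces instead of each needing their own entry, and each resulting token ID then indexes a row of the embedding matrix.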
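Finally, the scaled dot-product attention at the heart of the transformer notebook, softmax(QK^T / sqrt(d_k)) V with optional decoder-style causal masking, can be sketched with NumPy; the matrices here are random stand-ins for learned projections:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V, causal=False):
    """Scaled dot-product attention as in Vaswani et al. (2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)        # query-key similarities
    if causal:
        # Decoder-style mask: each position may not attend to the future.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = softmax(scores, axis=-1)     # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))                # 4 tokens, d_k = 8
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = attention(Q, K, V, causal=True)
print(w.round(2))                          # lower-triangular attention map
```

The attention matrix `w` is exactly what tools like BertViz render as coloured maps: row i shows how strongly token i attends to every earlier token.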
Pedagogically, the author frames each notebook as a “zone of proximal development” activity: the material is deliberately placed just beyond the learner’s independent capability, requiring expert scaffolding. While the notebooks embed some guidance, the study’s pilot implementation in a Master’s‑level AI‑focused course at TH Köln revealed that additional lecturer support dramatically improves comprehension, especially for the more abstract attention mechanisms.
Evaluation relied on post‑course surveys. Participants reported high satisfaction with conceptual clarity and the hands‑on nature of the notebooks, noting that visualising attention helped demystify the “black‑box” perception of LLMs. However, many indicated that without an instructor’s explanations, certain sections felt overwhelming. The author therefore recommends embedding the notebooks within instructor‑led sessions, providing pre‑lecture primers and post‑lecture debriefs.
Limitations include the use of relatively small models (BERT, GPT‑2) for tractability in Colab, which may not fully capture the scale and architectural nuances of contemporary LLMs, and the reliance on self‑reported learning outcomes rather than objective performance metrics. Future work will extend the curriculum with notebooks covering model fine‑tuning, multimodal LLMs, retrieval‑augmented generation, and knowledge‑graph integration, as well as incorporate pre‑ and post‑tests to quantify learning gains.
In sum, the paper delivers a concrete, modular, and openly accessible curriculum that bridges the gap between abstract AI theory and the practical needs of translation professionals. Its strength lies in the step‑wise progression from embeddings to full transformer models, enriched by interactive visualisations. Successful deployment, however, hinges on complementary instructor scaffolding to guide learners through the more mathematically intensive components.