Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque
📝 Original Info
- Title: Multimodal Large Language Models for Low-Resource Languages: A Case Study for Basque
- ArXiv ID: 2511.09396
- Date: 2025-11-12
- Authors: Not provided (no author names were specified in the paper or the source data).
📝 Abstract
Current Multimodal Large Language Models (MLLMs) exhibit very strong performance on several demanding tasks. While commercial MLLMs deliver acceptable performance in low-resource languages, comparable results have not yet been attained within the open science community. In this paper, we aim to develop a strong MLLM for a low-resource language, namely Basque. For that purpose, we build our own training and evaluation image-text datasets. Using two different Large Language Models as backbones, the Llama-3.1-Instruct model and a Basque-adapted variant called Latxa, we explore several data mixtures for training. We show that: i) low ratios of Basque multimodal data (around 20%) are already enough to obtain solid results on Basque benchmarks, and ii) contrary to expectations, a Basque-instructed backbone LLM is not required to obtain a strong MLLM in Basque. By openly releasing our resources, our results pave the way for developing MLLMs for other low-resource languages.
💡 Deep Analysis
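As an illustration of the data-mixture idea described in the abstract, the minimal sketch below shows one way a training set with a target share (around 20%) of Basque image-text examples could be assembled. This is not the paper's code: the function name `build_mixture`, the placeholder pools, and the pool sizes are hypothetical and chosen only for the example.

```python
# Hypothetical sketch (not the paper's pipeline): build an image-text training
# mixture in which roughly 20% of the examples are Basque, mirroring the ratio
# the abstract reports as sufficient. Dataset names and sizes are illustrative.
import random

def build_mixture(basque_examples, english_examples, basque_ratio=0.20, seed=0):
    """Sample a combined training set with the requested Basque proportion.

    The total size is limited by whichever pool runs out first given the ratio.
    """
    rng = random.Random(seed)
    # Largest total that respects the ratio with the pools we have.
    total = min(int(len(basque_examples) / basque_ratio),
                int(len(english_examples) / (1.0 - basque_ratio)))
    n_basque = int(total * basque_ratio)
    n_english = total - n_basque
    mixture = (rng.sample(basque_examples, n_basque)
               + rng.sample(english_examples, n_english))
    rng.shuffle(mixture)
    return mixture

# Illustrative usage with placeholder records standing in for image-text pairs.
basque_pool = [{"lang": "eu", "id": i} for i in range(10_000)]
english_pool = [{"lang": "en", "id": i} for i in range(100_000)]
train_set = build_mixture(basque_pool, english_pool, basque_ratio=0.20)
print(len(train_set),
      sum(ex["lang"] == "eu" for ex in train_set) / len(train_set))
```

The only design point the sketch tries to convey is that the Basque ratio is a tunable parameter of the mixture, which is the variable the paper reports experimenting with.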
📄 Full Content
Reference
This content is AI-processed based on open access ArXiv data.