Malagasy Dialects and the Peopling of Madagascar

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Malagasy DNA is half African and half Indonesian in origin; nevertheless, the Malagasy language, spoken by the entire population, belongs to the Austronesian family. The language most closely related to Malagasy is Maanyan (Greater Barito East group of the Austronesian family), but related languages are also found in Sulawesi, Malaysia and Sumatra. For this reason, and because Maanyan is spoken by a population which lives along the Barito river in Kalimantan and which does not possess the skills necessary for long maritime navigation, the ethnic composition of the Indonesian colonizers is still unclear. There is a general consensus that Indonesian sailors reached Madagascar by a maritime trek, but the time, the path and the landing area of the first colonization are all disputed. In this research we try to answer these questions, together with others such as the historical configuration of Malagasy dialects, using analyses related to lexicostatistics and glottochronology which draw upon the automated method recently proposed by the authors \cite{Serva:2008, Holman:2008, Petroni:2008, Bakker:2009}. The data were collected by the first author at the beginning of 2010 with the invaluable help of Joselinà Soafara Néré and consist of Swadesh lists of 200 items for 23 dialects covering all areas of the island.


💡 Research Summary

The paper investigates the internal classification, historical configuration, and settlement chronology of Malagasy dialects using a quantitative linguistic approach. The authors collected Swadesh lists of 200 items from 23 Malagasy dialects covering the entire island, a dataset that appears to be the most extensive of its kind. They applied a normalized Levenshtein distance (LDN) to each pair of words with the same meaning, counting the insertions, deletions, and substitutions needed to turn one word into the other, and then averaged the 200 LDN values to obtain a single distance measure for each dialect pair.
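The distance computation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the edit distance is normalized by the length of the longer word, and the function names and the toy strings in the usage note are invented for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def ldn(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def dialect_distance(list_a, list_b) -> float:
    """Average LDN over two word lists aligned by meaning."""
    if len(list_a) != len(list_b):
        raise ValueError("word lists must have the same number of items")
    return sum(ldn(x, y) for x, y in zip(list_a, list_b)) / len(list_a)
```

For instance, `dialect_distance(["abcd", "xy"], ["abcf", "xy"])` averages one substitution in a four-letter word (0.25) with an identical pair (0.0). With real data each list would hold the 200 Swadesh items of one dialect, in the same meaning order.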

Using these pairwise distances, two standard phylogenetic algorithms were employed: UPGMA, which assumes equal evolutionary rates across branches, and Neighbor‑Joining (NJ), which allows rate heterogeneity. Both methods produced remarkably consistent high‑level results: the dialects split into two major clusters, one located in the central‑north‑east and the other in the south‑west of Madagascar. Minor differences between the trees (e.g., the placement of Majunga, Antandroy, and Ambovombe) reflect the geographic position of those varieties near the boundary between the two regions and suggest localized contact or mixed ancestry.
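To make the tree-building step concrete, here is a small pure-Python sketch of UPGMA (average-linkage clustering) on a distance matrix. It is an illustration under assumptions rather than the authors' implementation: the labels and distances in the usage note are invented, and Neighbor-Joining, which needs a different merge criterion, is not shown.

```python
from itertools import combinations


def upgma(labels, matrix):
    """Return a nested-tuple tree built by repeatedly merging the closest clusters."""
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    # Pairwise distances keyed by (smaller id, larger id)
    dist = {(i, j): matrix[i][j] for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of active clusters
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        # Distance from the merged cluster to each remaining one is the
        # size-weighted mean of the two old distances (the UPGMA rule).
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            new_dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop every entry that involves the two merged clusters
        dist = {p: v for p, v in dist.items() if i not in p and j not in p}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree
```

With four hypothetical varieties where A and B are close, C and D are close, and the two pairs are far apart, `upgma(["A", "B", "C", "D"], m)` groups them as `(("A", "B"), ("C", "D"))`, which mirrors the kind of two-cluster split reported above.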

To locate the linguistic homeland, the authors performed a Structural Component Analysis (SCA), a geometric representation akin to principal component analysis, on the distance matrix. The area exhibiting the greatest lexical diversity, interpreted as the point of origin, coincides with the present‑day central‑north‑east region. By integrating the SCA‑derived time scale with a generalized glottochronological formula (a modification of the classic Swadesh‑based dating equation), they estimated the date of the initial settlement at around AD 650 ± 50 years. This date lies in the more recent half of the range covered by earlier proposals, which placed the colonisation between 1000 and 2000 years ago, and it aligns with the hypothesis of a relatively rapid maritime migration.
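The summary does not reproduce the generalized formula itself. As a rough illustration of how a lexical distance can be turned into a date, the sketch below assumes a logarithmic form in which the distance D between two varieties approaches 1 as 1 - exp(-T / tau), so that T = -tau * ln(1 - D). Both this functional form and the calibration constant `tau_years` are assumptions made for the example, not values taken from the paper.

```python
import math


def separation_time(distance: float, tau_years: float) -> float:
    """Time T implied by a lexical distance D, assuming D = 1 - exp(-T / tau).

    tau_years is a hypothetical calibration constant; in practice it would
    be fixed from language pairs whose divergence date is known historically.
    """
    if not 0.0 <= distance < 1.0:
        raise ValueError("distance must lie in [0, 1)")
    return -tau_years * math.log(1.0 - distance)
```

Under this assumed form, identical word lists (D = 0) give T = 0, and T grows without bound as D approaches 1, which is why small measurement errors matter more for large distances.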

External comparisons were made with other Austronesian languages using the Austronesian Basic Vocabulary Database and the Automated Similarity Judgment Program (ASJP). The strongest lexical affinity was found with Maanyan (≈45 % shared basic vocabulary), confirming the long‑standing view that Maanyan is the closest relative. Nevertheless, notable loanwords from Malay and Javanese were also detected, indicating that the founding population likely comprised not only Maanyan speakers but also other Indonesian seafarers.

The authors contrast their findings with the classic study by Vérin et al. (1969), which used a 100‑item list and traditional cognate counting. While Vérin reported a primary split between the northern tip and the rest of the island, the present analysis, with a larger lexical sample and a more sensitive distance metric, reveals a north‑east versus south‑west division that aligns closely with geographic realities. The paper also notes that Antandroy dialects show the highest average LDN to all others, confirming their status as the most divergent varieties, whereas the Merina dialect (the official standard) has the lowest average distance, reflecting convergence toward the prestige norm.
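The "average LDN to all others" used to rank varieties as divergent or central is a row mean of the distance matrix that leaves out the diagonal. A minimal sketch, with invented labels and values in the usage note:

```python
def mean_distance_to_others(labels, matrix):
    """Map each variety to its mean distance from every other variety."""
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two varieties")
    return {
        labels[i]: sum(matrix[i][j] for j in range(n) if j != i) / (n - 1)
        for i in range(n)
    }
```

Given the result `means`, `max(means, key=means.get)` names the most divergent variety and `min(means, key=means.get)` the most central one.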

In conclusion, the study provides robust quantitative evidence that Malagasy dialects form two major groups, central‑north‑east and south‑west, divided by a boundary running from south‑east to north‑west, that the linguistic homeland lies in the central‑north‑east, and that the first settlement likely occurred around AD 650. The methodology, which combines normalized Levenshtein distances, dual phylogenetic reconstructions, and SCA‑based dating, offers a powerful template for future linguistic, archaeological, and genetic investigations of island colonisation. Further work is suggested to disentangle lexical borrowing from inherited vocabulary, to incorporate semantic change analyses, and to integrate the linguistic timeline with archaeological and genomic data for a fuller picture of Madagascar’s peopling.

