The origin of Mayan languages from Formosan language group of Austronesian

Basic body-part names (BBPNs) are defined here as the body-part names in the Swadesh list of 200 basic words. Non-Mayan cognates of Mayan (MY) BBPNs were searched for extensively by comparing them with non-MY vocabulary, including the ca. 1300 basic words of 82 Austronesian (AN) languages listed by Tryon (1985), among other sources. The cognates (CGs) found in non-MY languages are listed in Table 1, classified by the language group to which the most similar cognates (MSCs) of the MY BBPNs belong. The CGs of MY fall into 23 mutually unrelated CG-items, of which 17.5 have their MSCs in AN, giving a closest similarity score (CSS) of CSS(AN) = 17.5. This score consists of 10.33 MSCs in Formosan (FORM), 1.83 in Western Malayo-Polynesian (W.MP), 0.33 in Central MP, 0.0 in SHWNG, and 5.0 in Oceanic (OC) [i.e., CSS(FORM) = 10.33, CSS(W.MP) = 1.83, …, CSS(OC) = 5.0]. These CSSs for language (sub)groups are also listed in the underlined portion of each section (Sections 1-6) of Table 1. A chi-square test (one degree of freedom) using [Eq. 1] and [Eqs. 2] revealed that MSCs of MY BBPNs occur in Formosan at a significantly higher frequency (P < 0.001) than in the other subgroups of AN or in non-AN languages. MY is thus concluded to have been derived from Formosan of AN. Eskimo shows some BBPN similarities to FORM and MY.

💡 Research Summary

The paper puts forward a provocative hypothesis: that the Mayan language family (MY) originated from the Formosan branch of the Austronesian (AN) language family. The author’s methodology hinges on a narrow lexical set, the “basic body‑part names” (BBPNs), defined as those body‑part terms that appear in the Swadesh 200‑word list. From the Mayan corpus, 23 mutually independent BBPN items were extracted. For each Mayan BBPN, the author searched a large comparative database that includes roughly 1,300 basic vocabulary items from 82 Austronesian languages compiled by Tryon (1985), as well as a selection of non‑Austronesian languages. The most similar cognate (MSC) in each language was identified, and a “Closest Similarity Score” (CSS) was assigned to each language group by summing the contributions of the BBPN items whose MSCs fall in that group. As summarized here, an item contributes 0.5 points (or 1 point if two or more cognates are found), although reported totals such as 10.33 and 0.33 indicate that an item's credit can also be divided in thirds.
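The abstract does not spell out the scoring arithmetic. As an illustration only, the sketch below tallies a CSS under one assumed rule: each CG-item carries one point, shared equally among the language groups tied for its most similar cognate. Under that assumption, totals in thirds such as 10.33 arise naturally. The function name `tally_css`, the item names, and the group assignments are hypothetical placeholders, not data from Table 1.

```python
from collections import defaultdict
from fractions import Fraction


def tally_css(msc_groups_by_item):
    """Sum per-group scores, splitting each item's single point among tied groups.

    ASSUMED rule (not confirmed by the paper): one point per item, divided
    equally among the groups listed for it. Each group is assumed to appear
    at most once per item. Items with no match contribute nothing.
    """
    scores = defaultdict(Fraction)
    for groups in msc_groups_by_item.values():
        if not groups:
            continue  # no cognate found for this item
        share = Fraction(1, len(groups))
        for group in groups:
            scores[group] += share
    return dict(scores)


# Hypothetical placeholder data: which groups hold the most similar cognate.
hypothetical_items = {
    "item_a": ["Formosan"],
    "item_b": ["Formosan", "W.MP"],
    "item_c": ["Formosan", "W.MP", "Central MP"],
    "item_d": ["Oceanic"],
    "item_e": [],  # unmatched item
}

scores = tally_css(hypothetical_items)
for group, score in scores.items():
    print(f"CSS({group}) = {float(score):.2f}")
```

Exact fractions are used so that the group scores always sum to the number of matched items, which mirrors how the reported subgroup scores add up to CSS(AN).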

The results show that 17.5 of the 23 BBPN items have their MSCs in Austronesian languages, with a particularly strong concentration in the Formosan subgroup (CSS = 10.33). The other Austronesian sub‑groups contribute much smaller amounts (Western Malayo‑Polynesian = 1.83, Central MP = 0.33, SHWNG = 0, Oceanic = 5.0), and the five subgroup scores sum to the reported 17.5. A chi‑square test (df = 1) was applied to compare the observed distribution of MSCs across language groups with the expected distribution based on the proportion of languages in each group. The test yields a highly significant result (P < 0.001), indicating that the frequency of Formosan MSCs is far greater than would be expected by chance, both relative to other Austronesian sub‑groups and to the non‑Austronesian sample.
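The arithmetic behind these figures can be checked with a short sketch. It first confirms that the subgroup scores add up to CSS(AN), then runs a two-cell goodness-of-fit chi-square test (Formosan versus everything else, df = 1). The paper's [Eq 1] and [Eqs.2] are not reproduced in the abstract, so the expected Formosan share used here (`HYPOTHETICAL_FORMOSAN_SHARE`) is a placeholder assumption, and the function `chi_square_df1` is illustrative rather than the author's procedure.

```python
import math


def chi_square_df1(observed_in, total, expected_share):
    """Pearson goodness-of-fit statistic and p-value for a two-cell split (df = 1)."""
    expected_in = total * expected_share
    expected_out = total - expected_in
    observed_out = total - observed_in
    stat = ((observed_in - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # For df = 1 the chi-square upper tail equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Subgroup scores as reported in the abstract.
css = {"Formosan": 10.33, "W.MP": 1.83, "Central MP": 0.33,
       "SHWNG": 0.0, "Oceanic": 5.0}
css_an = sum(css.values())
print(f"Sum of subgroup scores: {css_an:.2f} (reported CSS(AN) = 17.5)")

TOTAL_ITEMS = 23  # CG-items, from the abstract
# ASSUMPTION: share of matches expected in Formosan under the null hypothesis.
HYPOTHETICAL_FORMOSAN_SHARE = 0.10

stat, p = chi_square_df1(css["Formosan"], TOTAL_ITEMS, HYPOTHETICAL_FORMOSAN_SHARE)
print(f"chi-square = {stat:.2f}, p = {p:.2e}")
```

The outcome depends heavily on the assumed expected share. With only 23 items, expected cell counts can fall below 5, where the chi-square approximation is generally considered rough.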

From these quantitative findings the author concludes that Mayan must have been derived from the Formosan branch of Austronesian. An ancillary observation notes that some Eskimo terms also show similarity to both Formosan and Mayan BBPNs, but this is not explored in depth.

While the study is ambitious in its attempt to apply a statistical framework to long‑range linguistic comparison, several methodological concerns arise. First, the restriction to BBPNs ignores the broader lexical and grammatical evidence that is usually required for establishing deep genetic relationships. Body‑part terminology can be subject to cultural borrowing, semantic shift, or independent innovation, and the paper does not address how such factors were controlled. Second, the comparative sample is heavily weighted toward Austronesian languages, especially Formosan, while the non‑Austronesian control group is relatively small and heterogeneous, potentially inflating the apparent concentration of matches in Formosan. Third, the CSS calculation treats all form‑based resemblances equally, without accounting for systematic sound correspondences, regular phonological changes, or semantic drift; consequently, chance resemblances may be over‑counted. Fourth, the chi‑square analysis lacks a transparent description of how expected frequencies were derived, and the use of a single degree of freedom may oversimplify the complex distribution of language families. Finally, the brief mention of Eskimo similarities is not substantiated with data, leaving open the possibility of alternative contact scenarios.

In sum, the paper provides an intriguing data set and a novel quantitative approach, but its conclusion that Mayan derives from Formosan Austronesian remains tentative. Future work should broaden the lexical scope beyond body‑part terms, incorporate rigorous phonological reconstruction, expand the non‑Austronesian comparative pool, and employ more sophisticated statistical models that reflect the hierarchical nature of language families. Only with such refinements can the claim of a Formosan origin for Mayan be evaluated convincingly.