Machine Learning: When and Where the Horses Went Astray?
Machine Learning is usually defined as a subfield of AI concerned with extracting information from raw data sets. Despite its common acceptance and widespread recognition, this definition is wrong and groundless. Meaningful information does not belong to the data that bear it; it belongs to the observers of the data, and it is a shared agreement and convention among them. Therefore, this private information cannot be extracted from the data by any means, and all further attempts of Machine Learning apologists to justify their funny business are inappropriate.
💡 Research Summary
The paper sets out to dismantle the widely‑accepted definition of Machine Learning (ML) as a discipline that extracts “meaningful information” directly from raw data. It argues that this definition is philosophically unsound because the notion of “meaning” does not reside in the data itself but in the observers who interpret the data. Meaning, according to the authors, is a socially constructed convention—a shared agreement among a community of observers—rather than an intrinsic property of the data. Consequently, any algorithm that operates solely on data cannot, by itself, uncover or generate that meaning.
The authors begin by distinguishing between Shannon‑type information (statistical regularities, entropy reduction) and semantic meaning (interpretive content). They invoke Wittgenstein’s language‑game theory and Heidegger’s existential phenomenology to illustrate that meaning emerges only through use, context, and communal practices. In this view, data are merely carriers of potential signals; they do not contain the interpretive frameworks required to turn those signals into meaningful concepts.
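The distinction the authors draw is concrete on one side: Shannon-type information is a measurable property of the data alone, requiring no interpretive framework. A minimal sketch (the byte strings are illustrative, not from the paper):

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per symbol: H = -sum(p * log2(p))."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Statistical regularity is intrinsic to the data: no observer needed.
print(shannon_entropy(b"aaaa"))  # single repeated symbol -> 0.0 bits
print(shannon_entropy(b"abcd"))  # four equiprobable symbols -> 2.0 bits
```

Nothing analogous exists for semantic meaning: there is no function of the bytes alone that returns what they *mean*, which is precisely the asymmetry the authors exploit.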
A central concept introduced is “private information,” defined as the meaning that is exclusive to a particular observer group. For example, medical diagnostic labels are the product of expert consensus, not of the raw imaging data alone. The paper contends that supervised learning, which relies on data‑label pairs, merely reproduces this pre‑existing private information. When a model predicts a label with high accuracy, it is not discovering new meaning but faithfully reproducing a human‑defined mapping that already embodies a social agreement.
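The claim that supervised learning reproduces rather than discovers meaning can be made tangible with a toy classifier. In this hypothetical sketch (points and labels invented for illustration), a nearest-neighbour model's output changes when the human-assigned labels change, even though the data are untouched:

```python
# Hypothetical training set: the labels encode a human convention,
# not a property of the coordinate values themselves.
train = [((0.0, 0.0), "cat"), ((1.0, 1.0), "dog")]

def predict(x):
    """1-nearest-neighbour: return the human-assigned label of the closest point."""
    return min(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[1]

print(predict((0.1, 0.2)))  # -> "cat"

# Rename the same point under a different convention ("cat" -> "pet"):
# identical data, identical geometry, different "meaning" out.
train = [((0.0, 0.0), "pet"), ((1.0, 1.0), "dog")]
print(predict((0.1, 0.2)))  # -> "pet"
```

High accuracy here measures fidelity to the labelling convention, not the discovery of any meaning latent in the coordinates, which is the paper's point about data-label pairs.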
The critique extends to contemporary trends such as self‑supervised learning, transfer learning, and meta‑learning, which are often marketed as “learning meaning from data alone.” The authors point out that these methods still depend on pre‑text tasks, large pseudo‑labeled corpora, or fine‑tuning on human‑curated datasets—each of which embeds human‑chosen objectives and conventions. Thus, the claim that meaning can be autonomously distilled from raw inputs is, in the authors’ view, a mischaracterization.
Empirical illustrations reinforce the argument. An image classifier that achieves 99 % top‑1 accuracy on ImageNet is effectively reproducing the label “cat,” a concept that was defined, agreed upon, and encoded by the dataset creators. The same visual stimulus could be labeled “animal,” “pet,” or even “spiritual symbol” in different cultural contexts, underscoring that meaning is contingent on the observer’s linguistic and cultural background, not on the pixel values themselves.
From these analyses, the paper derives two practical implications for the ML community. First, evaluation metrics must go beyond raw accuracy and incorporate assessments of label quality, cultural bias, and the transparency of the meaning‑making process. Second, research should shift toward interactive, human‑in‑the‑loop paradigms that treat meaning as a negotiated construct rather than a hidden variable to be uncovered. Approaches such as interactive labeling, explainable AI (XAI), and collaborative decision‑making frameworks are highlighted as promising directions.
In conclusion, the authors assert that the prevailing definition of ML as “information extraction from data” is fundamentally flawed. Meaning is not an emergent property of data but a product of shared human conventions. Recognizing this limitation forces the field to acknowledge its semantic boundaries and to develop new methodologies that explicitly address the co‑construction of meaning between humans and machines. This philosophical re‑orientation, they argue, is essential for the responsible and realistic advancement of artificial intelligence.