When you talk about "Information processing" what actually do you have in mind?

“Information Processing” is a recently coined buzzword whose meaning remains vague and obscure even to the majority of its users. The reason for this is the lack of a suitable definition of the term “information”. In my attempt to amend this bizarre situation, I have realized that, following the insights of Kolmogorov’s complexity theory, information can be defined as a description of structures observable in a given data set. Two types of structures can be readily distinguished in every data set, and accordingly two types of information (information descriptions) should be designated: physical information and semantic information. Kolmogorov’s theory also posits that information descriptions should be provided as a linguistic text structure. This inevitably leads us to the assertion that information processing has to be seen as a kind of text processing. The idea is not new: inspired by the observation that human information processing is deeply rooted in natural-language handling customs, Lotfi Zadeh and his followers introduced the so-called “Computing With Words” paradigm. Despite promotional efforts, the idea has not taken off yet. The reason is a lack of a coherent understanding of what should be called “information”, and, as a result, misleading research roadmaps and objectives. I hope my humble attempt to clarify these issues will be helpful in avoiding common traps and pitfalls.


💡 Research Summary

The paper opens by observing that the term “information processing” has become a fashionable buzzword, yet its meaning remains vague even among practitioners. The author attributes this ambiguity to the lack of a precise definition of “information” itself. To remedy this, the paper draws on Kolmogorov’s algorithmic complexity theory, which defines the information content of an object as the length of the shortest program that can reproduce it. Crucially, Kolmogorov’s notion of “description” is inherently linguistic: it is a text that encodes the structure of the data.
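Kolmogorov complexity itself is uncomputable, but the length of a compressed encoding gives a computable upper bound that conveys the same intuition: regular data admits a short description, patternless data does not. A minimal sketch, using `zlib` compression as the proxy (the two data sets are my own illustration, not from the paper):

```python
import hashlib
import zlib

def proxy_complexity(data: bytes) -> int:
    """Compressed length: a computable upper-bound proxy for the
    (uncomputable) Kolmogorov complexity of the data."""
    return len(zlib.compress(data, 9))

# Highly regular data: a short program ("repeat 'ab' 500 times") describes it.
structured = b"ab" * 500

# Deterministic but pattern-free data: chained SHA-256 digests.
chunks, seed = [], b"seed"
for _ in range(32):
    seed = hashlib.sha256(seed).digest()
    chunks.append(seed)
noisy = b"".join(chunks)  # 1024 bytes with no simple regularity

# The regular data compresses far better, i.e. has a much shorter description.
assert proxy_complexity(structured) < proxy_complexity(noisy)
```

The compressed sizes differ by well over an order of magnitude here, which is the point of the definition: information content is measured by description length, not raw size.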

Applying this insight to any data set, the author distinguishes two kinds of observable structures. The first, “physical information,” consists of the objective, statistical or mathematical patterns that can be directly extracted from the raw data—pixel arrangements in an image, spectral components of a signal, temporal correlations in sensor streams, and so forth. The second, “semantic information,” is the interpretation that a human observer attaches to those patterns: categories, concepts, contextual meanings, and narratives. Both types of information are, according to the author, best represented as textual descriptions—one that narrates the physical regularities, another that conveys the human‑assigned meaning.

From this dual‑text perspective the paper makes a bold claim: information processing is essentially text processing. Human cognition, after all, operates through language; therefore any artificial system that aspires to process information in a human‑like way should manipulate linguistic representations. This view aligns with Lotfi Zadeh’s “Computing With Words” (CWW) paradigm, which attempts to perform computation directly on linguistic terms. However, the author argues that CWW has failed to gain traction because it lacks a coherent definition of information and because it treats words as isolated symbols without grounding them in the underlying physical data. Moreover, CWW provides no systematic method for handling the hierarchical complexity of semantic information.
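Zadeh’s CWW paradigm rests on fuzzy linguistic variables: a word such as “warm” is grounded in numeric data through a membership function that returns the degree, between 0 and 1, to which a value fits the term. A minimal sketch with an illustrative triangular membership function (the parameters are my own, not from the paper):

```python
def membership_warm(t: float) -> float:
    """Triangular fuzzy membership for the linguistic term 'warm':
    0 below 15 and above 30, peaking at 1.0 around 22.5 degrees."""
    if t <= 15.0 or t >= 30.0:
        return 0.0
    if t <= 22.5:
        return (t - 15.0) / 7.5   # rising edge
    return (30.0 - t) / 7.5       # falling edge

# Degree to which 20 degrees counts as 'warm' (a value strictly
# between 0 and 1, unlike a crisp yes/no category).
print(membership_warm(20.0))
```

The critique summarized above is precisely that such functions treat words in isolation: nothing in the membership function itself connects “warm” to the physical structure of the data it describes.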

To overcome these shortcomings, the paper proposes a concrete framework. First, raw data are transformed into a “technical text” that describes their statistical structure (e.g., “the histogram of pixel intensities is approximately uniform”). Second, human experts generate an “interpretive text” that captures the semantic layer (e.g., “this photograph depicts a sunset over a beach”). Both texts are then fed into modern natural‑language‑processing pipelines: tokenization, syntactic and semantic parsing, relation extraction, summarization, and generative modeling. By doing so, the system can integrate low‑level quantitative cues with high‑level conceptual knowledge in a unified linguistic space.
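The first step of this framework, turning raw data into a “technical text” describing its statistical structure, can be sketched as follows. The function name and the phrasing of the generated sentence are my own illustration of the idea, not the paper’s specification:

```python
import statistics

def physical_text(values: list[float]) -> str:
    """Render the basic statistical structure of raw data as a short
    'technical text' (the paper's physical information layer)."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    return (f"The sample has {len(values)} values; "
            f"mean {mean:.2f}, standard deviation {stdev:.2f}.")

# The semantic layer is an interpretive text supplied by a human expert.
semantic_text = "Readings from a temperature sensor during one afternoon."

values = [21.0, 21.4, 22.1, 23.0, 22.6, 21.9]
print(physical_text(values))
print(semantic_text)
```

Both strings then enter the same NLP pipeline, which is what lets quantitative cues and human-assigned meaning meet in one linguistic space.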

The advantages of this approach are threefold. (1) Since all information resides in textual form, it can be stored, queried, and linked using existing database and knowledge‑graph technologies. (2) Contemporary deep‑learning language models (e.g., Transformers) can be directly applied, enabling the learning of complex semantic relationships and the generation of novel interpretations. (3) Explicit separation of physical and semantic texts allows for stage‑wise validation: errors in the physical‑information extraction can be detected and corrected at the textual level before they propagate to the semantic reasoning stage.
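Advantage (3), catching extraction errors at the textual level, can be illustrated by re-deriving a claimed statistic from the raw data and comparing it against the figure stated in the physical text. A hypothetical sketch (function name and text format are my own assumptions):

```python
import re
import statistics

def validate_physical_text(text: str, values: list[float]) -> bool:
    """Stage-wise check: parse the mean claimed in a physical text and
    verify it against the mean recomputed from the raw data."""
    match = re.search(r"mean ([0-9.]+)", text)
    if match is None:
        return False  # no checkable claim found
    return abs(float(match.group(1)) - statistics.mean(values)) < 0.01

values = [2.0, 4.0, 6.0]
assert validate_physical_text("mean 4.00 over three samples", values)
assert not validate_physical_text("mean 9.99 over three samples", values)
```

Because the claim is explicit text rather than an opaque feature vector, a faulty extraction can be rejected before it ever reaches the semantic reasoning stage.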

The paper illustrates potential applications across domains. In medical imaging, for instance, the pixel‑level description of a scan would be the physical text, while the radiologist’s diagnostic narrative would be the semantic text; an NLP‑driven system could then fuse both to assist diagnosis. Similar benefits are envisaged for autonomous vehicles, multimedia retrieval, and human‑computer interaction, where grounding language in perceptual data is essential.

In conclusion, the author asserts that “information processing = text processing” and outlines a research roadmap that starts from a rigorous definition of information, proceeds through dual‑text generation, and culminates in the deployment of state‑of‑the‑art language technologies. By clarifying the concept of information and aligning computational methods with linguistic representations, the paper aims to resolve the current conceptual confusion and to provide a solid foundation for future advances in information science, artificial intelligence, and related fields.