Analogy perception applied to seven tests of word comprehension
It has been argued that analogy is the core of cognition. In AI research, algorithms for analogy are often limited by the need for hand-coded high-level representations as input. An alternative approach is to use high-level perception, in which high-level representations are automatically generated from raw data. Analogy perception is the process of recognizing analogies using high-level perception. We present PairClass, an algorithm for analogy perception that recognizes lexical proportional analogies using representations that are automatically generated from a large corpus of raw textual data. A proportional analogy is an analogy of the form A:B::C:D, meaning “A is to B as C is to D”. A lexical proportional analogy is a proportional analogy with words, such as carpenter:wood::mason:stone. PairClass represents the semantic relations between two words using a high-dimensional feature vector, in which the elements are based on frequencies of patterns in the corpus. PairClass recognizes analogies by applying standard supervised machine learning techniques to the feature vectors. We show how seven different tests of word comprehension can be framed as problems of analogy perception and we then apply PairClass to the seven resulting sets of analogy perception problems. We achieve competitive results on all seven tests. This is the first time a uniform approach has handled such a range of tests of word comprehension.
💡 Research Summary
The paper introduces PairClass, an algorithm that treats lexical proportional analogies (A : B :: C : D) as a supervised classification problem over word‑pair vectors. The authors argue that analogy is central to cognition and that traditional AI approaches to analogy rely on hand‑crafted high‑level representations, which limits scalability. PairClass instead performs “high‑level perception” by automatically constructing representations from raw textual data.
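The classification step described above can be sketched in miniature. The following is a hypothetical illustration, not the paper's implementation: word pairs are represented as small vectors of pattern frequencies (the values and class labels here are invented), and a nearest-centroid rule with cosine similarity stands in for the standard supervised learners the paper applies.

```python
import math

def cosine(u, v):
    # Cosine similarity between two feature vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy training data: each word pair maps to a vector of pattern
# frequencies and a relation label. All numbers are made up.
train = {
    ("carpenter", "wood"): ([4.0, 1.0, 0.0], "worker:material"),
    ("mason", "stone"): ([3.0, 2.0, 0.0], "worker:material"),
    ("dog", "puppy"): ([0.0, 0.0, 5.0], "adult:young"),
}

def classify(vec):
    # Average the training vectors per class, then pick the class
    # whose centroid is most similar to the query vector.
    centroids = {}
    for v, label in train.values():
        sums, n = centroids.setdefault(label, ([0.0] * len(v), 0))
        centroids[label] = ([s + x for s, x in zip(sums, v)], n + 1)
    best_label, best_sim = None, -1.0
    for label, (sums, n) in centroids.items():
        c = [s / n for s in sums]
        sim = cosine(vec, c)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label

print(classify([3.5, 1.5, 0.0]))  # → worker:material
```

A new word pair is classified by comparing its pattern-frequency vector against vectors of labeled pairs; this is the sense in which analogy recognition becomes ordinary supervised learning.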
The pipeline begins with morphological processing of each word pair to generate variants (e.g., plural forms). Using a massive web‑crawled corpus of roughly 5 × 10¹⁰ tokens (≈280 GB), the system then applies two search templates to extract sentences that contain the two words (or their variants) in either order, allowing up to three intervening words.
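The extraction step can be illustrated with a toy sketch. This is an assumption-laden stand-in, not the paper's code: a regular expression finds co-occurrences of a word pair (in either order) with at most three intervening words, and each matched phrase is counted as a pattern feature. The corpus string is invented.

```python
import re
from collections import Counter

def pattern_features(corpus, w1, w2, max_gap=3):
    # Count phrases where w1 and w2 co-occur, in either order,
    # separated by at most max_gap intervening word tokens.
    feats = Counter()
    for a, b in ((w1, w2), (w2, w1)):
        rx = re.compile(
            rf"\b{re.escape(a)}\b((?:\s+\w+){{0,{max_gap}}})\s+\b{re.escape(b)}\b",
            re.IGNORECASE,
        )
        for m in rx.finditer(corpus):
            gap = m.group(1).split()
            phrase = f"{a} {' '.join(gap)} {b}".replace("  ", " ")
            feats[phrase] += 1
    return feats

corpus = "the carpenter cuts the wood while the mason shapes the stone"
print(pattern_features(corpus, "carpenter", "wood"))
# → Counter({'carpenter cuts the wood': 1})
```

In the full system such counts, accumulated over the entire corpus and generalized into patterns, become the elements of the high-dimensional feature vector for each word pair.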