A Feature-based Generalizable Prediction Model for Both Perceptual and Abstract Reasoning
A hallmark of human intelligence is the ability to infer abstract rules from limited experience and apply these rules to unfamiliar situations. This capacity is widely studied in the visual domain using Raven’s Progressive Matrices. Recent advances in deep learning have led to multiple artificial neural network models matching or even surpassing human performance. However, while humans can identify and express the rule underlying these tasks with little to no exposure, contemporary neural networks often rely on massive pattern-based training and cannot express or extrapolate the rule inferred from the task. Furthermore, most Raven’s Progressive Matrices or Raven-like tasks used for neural network training employ symbolic representations, whereas humans can flexibly switch between symbolic and continuous perceptual representations. In this work, we present an algorithmic approach to rule detection and application using feature detection, affine transformation estimation, and search. We applied our model to a simplified Raven’s Progressive Matrices task, previously designed for behavioral testing and neuroimaging in humans. The model exhibited one-shot learning and achieved near human-level performance in the symbolic reasoning condition of the simplified task. Furthermore, the model can express the relationships discovered and generate multi-step predictions in accordance with the underlying rule. Finally, the model can reason using continuous patterns. We discuss our results and their relevance to studying abstract reasoning in humans, as well as their implications for improving intelligent machines.
💡 Research Summary
The paper presents a novel algorithmic framework for abstract reasoning that bridges the gap between human fluid intelligence and current deep‑learning approaches to Raven’s Progressive Matrices (RPM). Rather than relying on massive pattern‑based training, the authors propose a feature‑based, one‑shot learning system that can infer and explicitly express the underlying rule of a matrix problem. The core pipeline consists of three stages:

1. Feature detection. Scale‑invariant keypoint descriptors (SIFT, with ORB as an alternative) are applied separately to three “attention windows” extracted from the matrix stimuli.
2. Transformation estimation via repeated Random Sample Consensus (RANSAC). The first RANSAC pass finds the largest set of inlier correspondences and estimates a basic affine transformation (rotation, translation, scaling). Subsequent passes re‑sample outliers together with a few inliers to discover additional distinct transformations, thereby building a sequence of affine operations that best maps cue A to cue B and cue B to cue C. Each candidate transformation is evaluated locally by warping cue B toward cue C and computing the mean‑squared error (MSE); the transformation with the lowest local MSE is retained. This local search is repeated ten times to mitigate stochastic sampling effects.
3. Image combination, thresholding, and global similarity assessment. All discovered transformations are applied sequentially to the input image, the intermediate results are summed, and a randomly selected threshold (one‑half, two‑thirds, or one‑third of the maximum pixel value) is used to binarize the composite image. The final output is compared to the target answer image using global MSE, and the transformation‑threshold configuration that yields the smallest global error is selected as the inferred rule.
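The RANSAC stage above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `fit_affine` and `ransac_affine` are hypothetical, and the sketch assumes keypoint correspondences (e.g., matched SIFT descriptors) have already been extracted as two aligned point arrays.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares affine fit dst ≈ A @ src + t from >= 3 correspondences."""
    n = src.shape[0]
    # Design matrix for the six affine parameters [a11, a12, a21, a22, tx, ty].
    M = np.zeros((2 * n, 6))
    M[0::2, 0:2] = src
    M[0::2, 4] = 1.0
    M[1::2, 2:4] = src
    M[1::2, 5] = 1.0
    p, *_ = np.linalg.lstsq(M, dst.reshape(-1), rcond=None)
    A = np.array([[p[0], p[1]], [p[2], p[3]]])
    t = np.array([p[4], p[5]])
    return A, t

def ransac_affine(src, dst, iters=500, tol=2.0, seed=None):
    """RANSAC: repeatedly fit an affine map to 3 random correspondences,
    keep the model with the most inliers, then refit on all inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        A, t = fit_affine(src[idx], dst[idx])
        # Reprojection error of every correspondence under this candidate model.
        err = np.linalg.norm(src @ A.T + t - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    A, t = fit_affine(src[best_inliers], dst[best_inliers])
    return A, t, best_inliers
```

In the paper's scheme, the outliers left over after one such pass would be re-sampled (together with a few inliers) to search for additional distinct transformations.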
The method was evaluated on a simplified RPM task designed for neuroimaging studies (Morin et al., 2023), which includes four conditions: Perceptual Matching, Perceptual Reasoning, Symbolic Matching, and Symbolic Reasoning. Each condition presents 24 unique cue stimuli, each shown four times (including left‑right flips), for a total of 384 trials across the four conditions. The algorithm automatically detects the rectangular “blank” region, extracts the three cue windows, and runs the full pipeline on each trial. Performance results show near‑human accuracy in the Symbolic Reasoning condition and a substantial advantage over chance in the Perceptual conditions. Crucially, the model can generate an explicit description of the rule (e.g., “rotate 45° and scale by 1.2”) after a single exposure, something that contemporary deep neural networks cannot do.
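The final selection step of the pipeline (applying each candidate transformation sequence, summing the intermediates, binarizing at a fraction of the maximum pixel value, and keeping the configuration with the lowest global MSE) can be sketched as follows. The function names and the nearest-neighbour warp are illustrative assumptions, not the authors' code, which would more plausibly use a library warp such as OpenCV's.

```python
import numpy as np

def apply_affine(img, A, t):
    """Nearest-neighbour inverse warp of a 2-D image under x ↦ A @ x + t."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    src = (coords - t) @ np.linalg.inv(A).T      # map each output pixel back
    sx = np.rint(src[:, 0]).astype(int)
    sy = np.rint(src[:, 1]).astype(int)
    ok = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(h * w)
    out[ok] = img[sy[ok], sx[ok]]
    return out.reshape(h, w)

def select_rule(img, target, candidate_seqs, fractions=(1/2, 2/3, 1/3)):
    """Score every (transform sequence, threshold fraction) pair by global MSE
    against the target answer image; return the best score and configuration."""
    best = (np.inf, None)
    for seq in candidate_seqs:
        acc, cur = np.zeros_like(img, dtype=float), img.astype(float)
        for A, t in seq:                 # apply transforms sequentially,
            cur = apply_affine(cur, A, t)
            acc += cur                   # summing the intermediate results
        for f in fractions:
            binary = (acc >= f * acc.max()) * 1.0 if acc.max() > 0 else acc
            mse = np.mean((binary - target) ** 2)
            if mse < best[0]:
                best = (mse, (seq, f))
    return best
```

The paper describes the threshold fraction as randomly selected per attempt; here all three fractions are simply scored exhaustively, which is equivalent for choosing the minimum-error configuration.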
Beyond performance, the authors draw a compelling link to human neurobiology. Prior fMRI work (Morin et al., 2023) reported dynamic reconfiguration of frontoparietal networks during abstract reasoning. The authors argue that this reconfiguration may reflect an iterative search for a generalizable sequence of transformations—precisely what their algorithm implements. Thus, the model offers a computational hypothesis for how the brain might discover and apply affine‑like transformations during RPM solving.
Key contributions include: (i) abstracting away from pixel‑level operations to scale‑invariant features, reducing computational load and aligning with known IT‑cortex response properties; (ii) using RANSAC‑based affine transformation discovery to capture both simple (identity, similarity) and complex (rotation + scaling) relations; (iii) providing an interpretable rule output, satisfying explainable‑AI criteria; (iv) demonstrating that a single‑shot, feature‑driven approach can handle both symbolic and continuous visual reasoning tasks.
Limitations are acknowledged: reliance on SIFT/ORB may struggle with highly textured or color‑varying stimuli; the greedy local‑search and random threshold selection introduce heuristic elements that could be replaced by more principled optimization (e.g., Bayesian model selection). Future work could integrate deep‑learned feature extractors, explore systematic threshold optimization, and perform quantitative comparisons with human neuroimaging data to validate the proposed mechanistic analogy.
Overall, the paper offers a promising direction for building AI systems that reason more like humans—flexibly switching between perceptual and symbolic domains, learning rules from minimal exposure, and articulating those rules in an understandable form.