
Original Info

  • ArXiv ID: 2512.18412

Abstract

We propose a structural-graph approach to classifying contour images in a few-shot regime without using backpropagation. The core idea is to make structure the carrier of explanations: an image is encoded as an attributed graph (critical points and lines represented as nodes with geometric attributes), and generalization is achieved via the formation of concept attractors (class-level concept graphs). Purpose. To design and experimentally validate an architecture in which class concepts are formed from a handful of examples (5-6 per class) through structural and parametric reductions, providing transparent decisions and eliminating backpropagation. Objectives. (1) Define a vocabulary of node/edge types and an attribute set for contour graphs; (2) specify normalization and invariances; (3) develop structural and parametric reduction operators as monotonic structural simplifications; (4) describe a procedure for aggregating examples into stable concepts; (5) perform classification via graph edit distance (GED) with practical approximations; (6) compare with representative few-shot approaches. Methods. Contour vectorization is followed by constructing a bipartite graph (Point/Line as nodes) with normalized geometric attributes such as coordinates, length, angle, and direction; reductions include the elimination of unstable substructures or noise and the alignment of paths between critical points. Concepts are formed by iterative composition of samples, and classification is performed by selecting the best graph-to-concept match (using approximated GED). Results. On an MNIST subset with 5-6 base examples per class (single epoch), we obtain a consistent accuracy of around 82% with full traceability of decisions: misclassifications can be explained by explicit structural similarities. An indicative comparison with SVM, MLP, CNN, as well as metric and meta-learning baselines, is provided. Conclusions. The structural-graph scheme with concept attractors enables few-shot learning without backpropagation and offers built-in explanations through the explicit graph structure. Limitations concern the computational cost of GED and the quality of skeletonization; promising directions include classification-algorithm optimization, work with static scenes, and associative recognition.

Full Content

Recent advances in Artificial Intelligence (AI), particularly in Deep Learning and Artificial Neural Networks (ANN), have led to significant progress in solving complex tasks [1][2][3]. However, the widespread application of these technologies has revealed a number of fundamental limitations that question the possibility of creating truly autonomous and adaptive systems [4][5][6]. These limitations include: the need for massive amounts of data for training, which requires significant time, computational, and energy resources [7,8]; fundamental problems of generative models related to information trust, "hallucinations," and the "entropy gap" phenomenon [4,7,9]; and model degradation when training on recursively generated data (model autophagy disorder, MAD) [10,11].

In this work, we proceed from the assumption that these problems are of a fundamental nature, stemming from the current conceptual paradigm. Modern ANNs are based primarily on the statistical nature of learning and a rigid architecture, which is optimized using the backpropagation algorithm [2,3,6]. Even specialized approaches for few-shot learning, such as meta-learning (MAML, Prototypical Networks) [12][13][14], are essentially complex methods of statistical optimization. They do not eliminate the fundamental dependence on statistics and cannot learn truly “from scratch” on a few examples, as they rely on models pre-trained on large data or require a complex meta-learning stage.

The paper considers an alternative approach based on abandoning backpropagation in favor of biologically grounded structural generalizations. This work presents a practical computational implementation of such an approach. We demonstrate how visual patterns (contour images) can be represented as attributed graphs [15][16][17], where nodes (critical points, lines) and edges (spatial connections) encode the topological and geometric properties of the object. The learning process is implemented as single-pass few-shot learning without backpropagation. It is based on the application of structural and parametric reduction operators, which act through monotonic structural simplification. Iterative application of these operators on 5-6 unique samples forces the system to converge to a stable, generalized state with minimal structural complexity: a generalized concept graph (or prototype graph).

The development of structural-graph models for few-shot learning lies at the intersection of several key research directions: Few-shot learning [14], Explainable AI (XAI) methods [18,19], Graph representations (GED) [20,21], and alternative architectures (OvA/OvO) [22]. Analysis of the literature in these areas reveals fundamental conceptual limitations that the proposed approach aims to address [7, 23-25].

A. Few-shot/Meta-learning

Dominant deep learning models (CNN, MLP, Transformer) are fundamentally statistical and demonstrate low efficiency when training on critically small datasets, requiring thousands of examples and many training epochs to achieve acceptable accuracy. To address this problem, few-shot and meta-learning methods have been proposed [2,7,14,26]. Prototypical Networks learn to identify class prototypes based on a distance metric in the embedding space [13]. MAML (Model-Agnostic Meta-Learning) attempts to find an optimal initial weight initialization for fast adaptation [12]. Although both methods significantly improve accuracy on small samples, they do not eliminate the fundamental dependence on statistics and backpropagation. They require a complex and resource-intensive meta-learning stage on large auxiliary datasets [14,26]. Thus, this is a transfer of knowledge obtained statistically, rather than true “from scratch” single-pass learning.

B. Explainability and Graph Representations

Simultaneously with the increasing complexity of models, the problem of their interpretability has intensified. Deep learning models function as “black boxes”. Popular XAI methods such as LIME and SHAP are post-hoc techniques: they attempt to approximate the behavior of an already trained model rather than explain its actual decision-making process [27,28]. Studies have shown that such explanations can be unreliable, contradictory, and vulnerable to adversarial attacks [18,19,29,30]. An alternative is “explainability by design,” where the internal representation of the model is semantically meaningful [16,18,19]. Graph structures are an ideal candidate for this, as they allow explicit encoding of semantics in nodes and edges. Graph Edit Distance (GED) [20,21] is used to compare such structures. However, GED is an NP-hard problem, which remains a challenge for practical application [31,32].

C. Alternative Architectures (OvA/OvO) and Feature Generalization Problems

Alternative ANN architectures, “One-vs-All” (OvA) and “One-vs-One” (OvO), have long been considered for classification tasks [22]. In this approach, instead of one large network, specialized networks are used (e.g., one per class). This is conceptually close to ours, where we build one separate “neuron” (concept graph) for each class. However, in classical implementations of OvA/OvO architectures relying on backpropagation, noticeable limitations are observed regarding Out-of-Distribution Detection (OOD) [33,34]. Networks trained on limited examples do not form stable class separation boundaries. This is because traditional ANNs generalize only local recognition features (e.g., individual textures or angles) and cannot generalize features at the level of the entire structure [23,24]. Their fully connected and combinatorial nature with stochastic initialization makes generalization of global, topological properties impossible. Our approach solves this problem because generalization occurs not through stochastic optimization of local weights, but through deterministic structural graph reduction, which captures global topological features.

Literature analysis reveals three distinct but interconnected problems:

  1. Identifying the subgraphs/attributes supporting each decision; comparing with post-hoc explanations and discussing validity limits (when structure “does not explain”).

This section details the methodological pipeline used to convert 2D contour images into stable concept graphs and their subsequent classification. The methodology is based on principles of structural generalization and abandons gradient optimization.

To achieve transparency and move away from “opaque” weight matrices inherent in traditional neural networks, a representation is proposed where “structure is the carrier of explanations.” The input contour image, obtained after binarization and skeletonization stages, is transformed into an attributed graph. The system encodes contours as bipartite graphs, whose structure strictly alternates between nodes of type Point and nodes of type Line. This architectural differentiation is fundamental as it allows clear separation of topological structure (critical points) from geometric properties (segments connecting them).

Point Nodes: Represent topological structure and critical contour points. They are ontologically classified into four main types:

• EndPoint: Terminal nodes marking the start or end of an open contour.

• StartPoint: Nodes marking the beginning of contour traversal.

• CornerPoint: Nodes marking sharp changes in direction (corners).

• IntersectionPoint: Nodes where three or more segments connect.

Line Nodes: Represent the geometric segments connecting critical points. Geometric attributes attached to nodes include length (segment length), angle (for CornerPoint), quadrant (discretized direction), horizontal_direction, and vertical_direction.
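To make this representation concrete, here is a minimal sketch (in Python with networkx) of how such a Point/Line bipartite attributed graph might be assembled; the node types and attribute names follow the description above, while the concrete values, node identifiers, and the use of networkx are illustrative assumptions rather than the authors' implementation.

```python
import networkx as nx

# Illustrative sketch: encode a short open contour (a "1"-like stroke) as a
# bipartite attributed graph that strictly alternates Point and Line nodes.
# Attribute names follow the paper; the concrete values are made up.
g = nx.Graph()

# Point nodes carry topological roles and normalized coordinates.
g.add_node("p0", kind="Point", ptype="StartPoint", x=-0.1, y=0.9)
g.add_node("p1", kind="Point", ptype="CornerPoint", x=0.0, y=0.7, angle=145.0)
g.add_node("p2", kind="Point", ptype="EndPoint", x=0.0, y=-0.9)

# Line nodes carry geometric attributes of the segments between critical points.
g.add_node("l0", kind="Line", length=0.25, quadrant=2,
           horizontal_direction="Left", vertical_direction="Up")
g.add_node("l1", kind="Line", length=1.60, quadrant=3,
           horizontal_direction="None", vertical_direction="Down")

# Edges encode adjacency; the structure alternates Point -> Line -> Point.
g.add_edges_from([("p0", "l0"), ("l0", "p1"), ("p1", "l1"), ("l1", "p2")])

print(g.number_of_nodes(), g.number_of_edges())  # 5 nodes, 4 edges
```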

To ensure representation invariance to scale and shift, a necessary condition for stable attractor formation, all coordinates and related metrics (e.g., length) undergo normalization. Point coordinates are transformed into a centered system with range [-1, 1] using the formula:

x_norm = (x - center_x) / center_x

A similar formula applies to y. This process is the first step of parametric reduction (R_uc), translating absolute, instance-specific values into relative, generalized parameters.
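As a minimal sketch of this step, the following function applies the centering formula, under the assumption that center_x and center_y are the image half-width and half-height (the paper states only the formula and the target [-1, 1] range):

```python
def normalize_points(points, width, height):
    """Map pixel coordinates to a centered [-1, 1] range: x' = (x - cx) / cx.

    Assumes cx, cy are the image half-width/half-height, so pixel 0 maps to -1
    and pixel `width` maps to +1; how the center is chosen is an assumption,
    since the paper gives only the formula and the target range.
    """
    cx, cy = width / 2.0, height / 2.0
    return [((x - cx) / cx, (y - cy) / cy) for (x, y) in points]

# Example: on a 28x28 MNIST crop, pixel (7, 21) maps to (-0.5, 0.5).
print(normalize_points([(7, 21)], 28, 28))
```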

The learning process (concept formation) in this work fundamentally differs from traditional statistical optimization (e.g., gradient descent on a loss function). It is viewed as a deterministic process of structural generalization striving towards a state of minimal structural complexity. This most stable, generalized system state representing the invariant essence of a class (e.g., all variants of writing the digit “3”) is called the generalized concept graph.

The transition from a set of individual sample graphs (G 1 , . . . , G n ) to a single concept graph C is a process of controlled structure simplification (reduction). This process is governed by a set of Custom Reduction Operations (CRO), which act by reducing structural complexity or parametric variability, attempting to simplify the graph to a stable prototype in a finite number of steps.

The general reduction process can be described as a composition of three classes of operators:

G' = (R_w ∘ R_sp ∘ R_uc)(G_input),

where G_input is the initial graph, and R_uc, R_sp, R_w are theoretical reduction operators. A key aspect of our methodology is the direct mapping of these theoretical operators to specific CRO algorithms implemented in the system, as detailed in Table 1.

The learning process is one-pass and does not require backpropagation. It iteratively builds an attractor based on an ultra-small sample consisting of 5-6 unique training samples per class. The concept is initialized with the first sample graph: C_0 = G_1. This sample acts as an initial hypothesis about the class structure. Each subsequent sample G_{i+1} is integrated into the current concept C_i using the reduction operation:

C_{i+1} = CRO(C_i + G_{i+1})

Each CRO operation is a five-stage process applying the reduction operators from Table 1. This iterative process is path-dependent: the order of sample presentation affects the final concept graph. This mimics a process in which an initial hypothesis (C_0) is iteratively refined under the influence of new data (G_{i+1}), which acts as a reduction force, eliminating sample-specific variations (noise) and leaving only the generalized core.
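The overall single-pass training loop can be summarized as follows; apply_cro is a hypothetical placeholder for the five-stage CRO procedure, which the paper specifies via Table 1 rather than as code:

```python
def form_concept(sample_graphs, apply_cro):
    """Single-pass concept formation: no gradients, no backpropagation.

    `sample_graphs` are the 5-6 attributed graphs of one class;
    `apply_cro(concept, sample)` is a (hypothetical) stand-in for the
    five-stage reduction that merges a new sample into the current concept
    and simplifies it. The result is path-dependent: it depends on the
    order in which samples are presented.
    """
    concept = sample_graphs[0]            # C_0 = G_1, the initial hypothesis
    for sample in sample_graphs[1:]:      # integrate G_2 ... G_n one by one
        concept = apply_cro(concept, sample)
    return concept                        # stable attractor / prototype graph
```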

The classification (inference) process consists of comparing a graph G_test, obtained from an unknown input image, with each concept graph C_k from the trained library and selecting the concept that minimizes the Graph Edit Distance (GED) to the input graph G_test:

k* = argmin_k GED(G_test, C_k)

GED is defined as the minimum cost of a sequence of operations (insertion, deletion, substitution of nodes/edges) required to transform G_test into C_k. To ensure that GED correctly accounts for the generalized nature of concepts, we use custom cost functions.

Node Substitution Cost: The cost of replacing node u ∈ G_test with node v ∈ C_k is calculated based on “range-based cost functions”:

• For numerical attributes (e.g., length, angle): if the attribute value of u (e.g., u.length) falls within the learned range of attribute v (e.g., v.length {min, max}), the substitution cost for this attribute is 0. If the value is outside the range, the cost is proportional to the distance to the nearest range boundary.

• For categorical attributes: the cost is 0 for an exact match and infinite (high) for a mismatch.

• Label compatibility: the substitution cost is infinite if base node types are incompatible (e.g., Line to Point).

Edge Edit Cost: Reduced cost, to prioritize topological differences (presence/absence of nodes) over connectivity differences.
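A minimal sketch of such a range-based substitution cost, assuming concept nodes store numeric attributes as (min, max) ranges, categorical attributes as single values, and a base type under a hypothetical "kind" key; the attribute layout and the large finite stand-in for "infinite" cost are illustrative assumptions:

```python
INCOMPATIBLE = 1e6  # large finite stand-in for the "infinite" mismatch cost

def node_subst_cost(u, v):
    """Range-based cost of substituting test node u by concept node v.

    u: dict of concrete attributes from G_test, e.g. {"kind": "Line", "length": 0.42}
    v: dict from the concept graph, numeric attributes stored as (min, max) ranges,
       e.g. {"kind": "Line", "length": (0.30, 0.55), "quadrant": 2}
    """
    if u.get("kind") != v.get("kind"):           # label compatibility check
        return INCOMPATIBLE

    cost = 0.0
    for name, rng in v.items():
        if name == "kind" or name not in u:
            continue
        if isinstance(rng, tuple):               # numeric attribute: (min, max) range
            lo, hi = rng
            val = u[name]
            if val < lo:
                cost += lo - val                 # distance to nearest boundary
            elif val > hi:
                cost += val - hi
        else:                                    # categorical attribute: exact match only
            if u[name] != rng:
                return INCOMPATIBLE
    return cost
```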

Calculating the exact GED is an NP-hard problem. To ensure practical applicability, an approximation is used via a strict 60-second timeout for each individual comparison GED(G_test, C_k). This timeout acts as a heuristic approximation, interrupting the search for the optimal edit path if it takes too long and returning the best distance found so far.

The proposed architecture implements an approach conceptually close to One-vs-All, where each class k is represented by a separate “neuron”, which is the generalized concept graph C_k. The classification (inference) process consists of comparing the contour graph G_test with each concept graph C_k from the trained library. Unlike stochastic networks, where neuron “excitation” is a numerical output (e.g., softmax), in our system the “excitation” of the k-th neuron is the process of calculating the edit distance GED(G_test, C_k). To select the final classification result, we apply the Winner-Takes-All concept: the winner is the class (concept) C_k whose edit distance to the input graph G_test is minimal.

If the distance is equal for multiple classes, a conflict resolution rule applies. The class that is structurally more complex is chosen. Complexity is calculated as the sum of nodes and edges of the graph (Fig. 1).
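Combining the pieces, a winner-takes-all inference loop might look like the sketch below. It assumes concepts are stored as networkx graphs and uses networkx's graph_edit_distance, whose timeout argument plays the role of the 60-second cutoff described above; the tie-break prefers the structurally more complex concept (nodes plus edges). This is an illustrative approximation, not the authors' code.

```python
import networkx as nx

def classify(g_test, concepts, node_subst_cost, timeout_s=60.0):
    """Winner-Takes-All classification over a library of concept graphs.

    `concepts` maps class labels to concept graphs C_k. The per-comparison
    timeout turns the NP-hard GED search into a best-so-far approximation.
    Ties are broken in favor of the structurally more complex concept
    (number of nodes plus number of edges).
    """
    best = None
    for label, c_k in concepts.items():
        dist = nx.graph_edit_distance(
            g_test, c_k,
            node_subst_cost=node_subst_cost,
            timeout=timeout_s,               # heuristic cutoff per comparison
        )
        complexity = c_k.number_of_nodes() + c_k.number_of_edges()
        # Sort key: smaller distance wins; on equal distance, higher complexity wins.
        key = (dist, -complexity)
        if best is None or key < best[0]:
            best = (key, label)
    return best[1]
```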

This section presents empirical validation of the proposed graph approach to concept formation. The goal is not to optimize absolute accuracy, but to demonstrate that stable, explainable concept attractors can be formed from extremely limited data (few-shot learning) and that their performance and error patterns stem directly from their topological and parametric structure. Experiments are conducted on the MNIST-6 subset (classes ‘1’, ‘2’, ‘3’, ‘6’, ‘7’, ‘9’), using 5-6 unique training samples per subclass.

The system was trained on 8 concepts covering 6 classes (some classes, like ‘1’ and ‘2’, had two concepts to represent different writing styles). Training consisted of iterative structural reduction of 5-6 base samples (with 10 augmentation variants per sample, totaling about 350 examples) for each concept. Evaluation was performed on a test set of 5467 images not involved in concept formation. General performance metrics are presented in Table 2. These results are conceptually significant: an accuracy of 82.35% demonstrates that the approach based on forming canonical structural attractors without gradient optimization is viable and provides meaningful classification. The processing pipeline proved highly reliable; only 10 images (0.18%) failed processing, due to skeletonization errors that produced disconnected graphs.

In-depth analysis of metrics for each class (Table 3) reveals a direct dependence of performance on the structural uniqueness of digits.

The confusion matrix (Figure 2) provides deep insight into how the model makes decisions, visualizing systematic errors that are a direct consequence of structural and topological similarity.

Primary confusion occurs between digits 7 and 1 (angular open contours), and secondary confusion between digits 2 and 3 (curved open contours). Digits with a closed contour (6, 9) demonstrate strong discrimination.

• Main confusion: 152 samples of digit ‘2’ were classified as ‘3’; 28 samples of ‘3’ were classified as ‘2’.

• Secondary confusion: 118 samples of digit ‘7’ were classified as ‘1’.

Classes ‘6’ and ‘9’ demonstrate minimal confusion between themselves and with other open contours (e.g., only 48 samples of ‘6’ were erroneously classified as ‘9’). Unlike “black boxes,” where error causes are hidden in millions of weights, errors in this model are fully interpretable. Analysis shows that errors concentrate along structurally similar pairs:

  1. ‘2’ vs ‘3’: Both digits have a similar “curved morphology”; they are open contours starting from one side.

This subsection analyzes the final result of the learning process: stable concept attractors, which are the carriers of explanations in the system. The structural reduction process transforms multiple training graphs into single canonical structures. Their metrics (Table 4) quantitatively define the “ideal” shape of each digit.

Analysis of Table 4 demonstrates a direct correlation between digit topology and attractor complexity.

• Concepts ‘1_1’ and ‘1_3’ are minimal, consisting of only 3 nodes (StartPoint, Line, EndPoint). This ideally reflects their topology as a simple, unbranched path.

• Concepts ‘6_1’ and ‘9_2’ have a higher average degree (2.00), indicating the presence of cycles. Importantly, they contain no EndPoint (EP=0) but contain an IntersectionPoint (IP=1) where the cycle closes.

• Concepts ‘2_1’, ‘2_2’, ‘3_1’, and ‘7_1’ have intermediate complexity (5-12 nodes). All contain exactly one EndPoint (EP=1), topologically marking them as open contours. The number of CornerPoints (CP) encodes the number of bends (e.g., ‘7_1’ has 1 CP, ‘2_1’ has 2 CP).

This table is essentially a dictionary for XAI. The explanation for classifying ‘9’ is that the input image graph successfully matched concept ‘9_2’, which is canonically defined as an 8-node structure with 1 IntersectionPoint (cycle) and 0 EndPoints (no free ends).

The concept formation process (Figures 3a-d) is an empirical demonstration of theoretical reduction operators.

Step 1 (C_0 = G_1): The first sample (G_1) establishes the initial concept C_0. It is overly specific and contains all structural details and noise of the initial sample (Fig. 3a).

Step 2 (C_1 = CRO(C_0 + G_2)): Integrating the second sample (G_2) reveals a mismatch: a “redundant endpoint branch.” The structural reduction operator (endpoint removal) is applied, removing this G_1-specific noise. This is a practical implementation of the R_w operator finding a common substructure (Fig. 3b).

Steps 3 and 4 (C_2, C_3): Subsequent iterations continue this process, removing a “redundant corner point” (Fig. 3c) and other “noise substructure” (Fig. 3d). The final concept C_3 (Fig. 3d) is a stable attractor representing the most general topological structure (a “curved S-shape”) common to all training samples. This process is a form of learning without backpropagation, in which the representation structure itself is optimized, rather than a weight vector.

Structural reduction determines which nodes remain, while parametric generalization determines how their attributes are generalized to encode variability. Using concept ‘3_1’ (formed from 3 samples) as an example:

Numerical Properties: Attributes like coordinates are not averaged but converted into ranges ({min, max, center}). This creates flexible decision boundaries.

• x_normalized: [-0.7, 0.2] (center -0.33)

• y_normalized: [0.3, 0.9] (center 0.63)

Numeric Counters (Count Properties): Topological variations are also encoded as ranges.

• endpoint_counts: {min: 2, max: 4, center: 2.67}

• intersection_point_counts: {min: 0, max: 2, center: 0.67}

Categorical Properties: Preserved only if they match across 100% of samples.

• contour_type: “OPEN” (all samples were open).

• horizontal_direction: Removed (values were inconsistent, e.g., “Left”, “Right”).

This process is itself a powerful XAI tool: the surviving attributes and their ranges document exactly what the concept considers essential and how much variation it tolerates.
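A sketch of this parametric generalization, assuming per-sample attributes arrive as plain dictionaries; the {min, max, center} bookkeeping and the drop-if-inconsistent rule mirror the examples above, while the function and key names are illustrative:

```python
def generalize_attributes(samples):
    """Merge per-sample attribute dicts into a concept-level description.

    Numeric attributes become {"min", "max", "center"} ranges (center = mean);
    categorical attributes are kept only if identical in 100% of samples.
    `samples` is a list of dicts, one per training example of the class.
    """
    concept = {}
    keys = set().union(*samples)
    for key in keys:
        values = [s[key] for s in samples if key in s]
        if len(values) < len(samples):
            continue                                   # missing in some sample: drop
        if all(isinstance(v, (int, float)) for v in values):
            concept[key] = {"min": min(values), "max": max(values),
                            "center": sum(values) / len(values)}
        elif len(set(values)) == 1:
            concept[key] = values[0]                   # consistent categorical value
        # inconsistent categorical values (e.g. "Left" vs "Right") are removed
    return concept

# Example loosely mirroring the '3_1' concept above (values are illustrative):
print(generalize_attributes([
    {"endpoint_counts": 2, "contour_type": "OPEN", "horizontal_direction": "Left"},
    {"endpoint_counts": 4, "contour_type": "OPEN", "horizontal_direction": "Right"},
    {"endpoint_counts": 2, "contour_type": "OPEN", "horizontal_direction": "Left"},
]))
```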

To evaluate the effectiveness of the proposed approach (referred to as ComAN in experimental materials), its results are compared with other machine learning models under strictly limited data conditions (few-shot). Data for comparison is taken from experimental reports.

Analysis of this comparison reveals three key conclusions:

Recent advances in AI, particularly in Deep Learning and ANNs, have led to significant progress. However, the widespread application of these technologies has revealed fundamental limitations questioning the viability of the current approach. Current ANN paradigms face several conceptual crises. They require huge amounts of data for training, as well as significant time, computational, and energy resources. Beyond the high cost, these models, especially generative ones, show significant reliability problems, generating errors and “hallucinations” that substantially reduce trust in their results. This leads directly to the phenomenon of “data inbreeding,” also known as “Model Autophagy Disorder” (MAD): when models trained to prefer the statistically probable start learning on synthetic data generated by themselves, they enter a recursive loop. This process inevitably leads to rapid “information degradation.”

Conclusions and Future Research Directions

This study presents a comprehensive approach to AI departing from purely statistical methods in favor of biologically grounded principles of structural generalization. The work successfully presents and experimentally validates a unified theoretical and practical framework. This framework combines structural generalization principles with a practical, transparent, and high-performance XAI system based on generalized graph concepts (prototypes). The main contribution lies in demonstrating that abandoning statistical optimization (the backpropagation algorithm) in favor of deterministic graph reduction allows:

  1. Achieving competitive classification accuracy (82.35%).

  2. Operating in few-shot learning mode (5-6 samples per class).

  3. Performing single-pass learning without backpropagation.

  4. Ensuring full, internal explainability and transparency of decision-making.

Despite successful concept validation, the current implementation has clear bottlenecks that outline directions for future research.

• Computational Complexity (Inference). The classification (inference) process relies on graph matching, generally using Graph Edit Distance (GED), which is NP-hard. This creates significant computational load at the inference stage, leading to an average processing time of ~3.5 seconds per image and the need for timeouts (e.g., 60 seconds). Effectively, a trade-off occurred: the computational complexity of learning (backpropagation) was replaced by the combinatorial complexity of inference (GED).

• Sensory Limitation (Preprocessing). The model is “brittle” and depends on the quality of the input “sensory” data: (1) preprocessing errors lead to complete processing failure, as the model cannot build a correct graph; (2) invariance is limited by the range used in augmentation; significant rotations ruin structural matching, as they change line node attributes (e.g., quadrants).

• Representational Limitation. The model is “blind” to any non-shape information. The current approach “discards texture and gradient information,” limiting its application exclusively to shape and contour recognition tasks.

Identified limitations directly point to future research perspectives:

  1. Short-term perspectives include solving immediate engineering problems: researching fast GED approximation algorithms to speed up inference; developing more robust skeletonization methods; and extending the graph representation to include texture and gradient attributes, making the model multimodal (in the sense of physical parameters).

  2. The long-term vision concerns the most fundamental limitation of the current study: the “lack of modeling evolutionary biological inter-neuronal connections.” The current ComAN model successfully implements the “grandmother cell” concept: one static concept (neuron) corresponds to one class. The next fundamental step is the transition from modeling individual neurons to modeling dynamic networks of these neurons. This will require developing mechanisms by which these graph concepts can dynamically interact, compete (e.g., via “Winner-Takes-All” mechanisms), and form more complex, hierarchical “world models.” This is the path to creating AI systems that not only mimic biological efficiency but also approach true biological plausibility.

