Analogical Dissimilarity: Definition, Algorithms and Two Experiments in Machine Learning

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

This paper defines the notion of analogical dissimilarity between four objects, with a special focus on objects structured as sequences. Firstly, it studies the case where the four objects have a null analogical dissimilarity, i.e. are in analogical proportion. Secondly, when one of these objects is unknown, it gives algorithms to compute it. Thirdly, it tackles the problem of defining analogical dissimilarity, which is a measure of how far four objects are from being in analogical proportion. In particular, when objects are sequences, it gives a definition and an algorithm based on an optimal alignment of the four sequences. It also gives learning algorithms, i.e. methods to find the triple of objects in a learning sample which has the least analogical dissimilarity with a given object. Two practical experiments are described: the first is a classification problem on benchmarks of binary and nominal data; the second shows how the generation of sequences by solving analogical equations enables a handwritten character recognition system to be rapidly adapted to a new writer.


💡 Research Summary

The paper introduces a novel quantitative measure, “analogical dissimilarity”, between four objects, extending the classic analogical proportion a : b :: c : d to cases where the proportion is not exact. After formalizing the conditions under which the dissimilarity is zero (a perfect analogical proportion), the authors address the inverse problem of solving an analogical equation a : b :: c : x when one object is unknown, providing an algorithm that generates candidate solutions from the three known objects and selects the one minimizing the dissimilarity.
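For binary (or nominal) features, equation solving of this kind can be done componentwise. The sketch below is illustrative, not the paper's algorithm: it assumes objects are binary vectors and uses the two standard resolution rules (if a = b then x = c; if a = c then x = b), returning `None` when a component such as 0 : 1 :: 1 : x admits no boolean solution. The function name is an assumption of this sketch.

```python
def solve_analogical_equation(a, b, c):
    """Solve a : b :: c : x componentwise for binary vectors.

    Returns the vector x completing the analogical proportion,
    or None when some component has no solution.
    """
    x = []
    for ai, bi, ci in zip(a, b, c):
        if ai == bi:      # a : a :: c : x  =>  x = c
            x.append(ci)
        elif ai == ci:    # a : b :: a : x  =>  x = b
            x.append(bi)
        else:             # e.g. 0 : 1 :: 1 : x has no boolean solution
            return None
    return x

print(solve_analogical_equation([0, 0, 1], [0, 1, 1], [1, 0, 1]))  # [1, 1, 1]
```

The same two rules apply to nominal values; the dissimilarity-minimizing selection described above only matters when no exact solution exists.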

A major contribution is the definition of analogical dissimilarity for sequences. The authors design a dynamic‑programming algorithm that simultaneously aligns four sequences, assigning insertion, deletion, and substitution costs so that the total alignment cost equals the analogical dissimilarity. This multi‑sequence alignment captures the joint structure of the four strings, unlike traditional pairwise edit distances.
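A minimal sketch of the alignment idea, under simplifying assumptions: unit substitution costs (the paper allows a general cost matrix on symbols), gaps treated as ordinary symbols, and a gap character assumed absent from the alphabet. Each cell of a 4-dimensional dynamic-programming table is reached by one of the 15 possible column moves (every non-empty subset of the sequences advances one symbol), and each move pays the symbol-level analogical dissimilarity of the column it emits. Function names are this sketch's, not the paper's.

```python
from itertools import product

GAP = "-"  # assumed not to occur in the input alphabet

def symbol_ad(a, b, c, d):
    # Minimal number of the four symbols to substitute so that
    # a : b :: c : d becomes an exact proportion (unit costs).
    if (a == b and c == d) or (a == c and b == d):
        return 0
    if a == b or c == d or a == c or b == d:
        return 1
    return 2

def sequence_ad(s1, s2, s3, s4):
    """Analogical dissimilarity of four sequences via an optimal
    joint alignment (simplified unit-cost variant)."""
    seqs = (s1, s2, s3, s4)
    dims = tuple(len(s) + 1 for s in seqs)
    INF = float("inf")
    M = {idx: INF for idx in product(*(range(d) for d in dims))}
    M[0, 0, 0, 0] = 0
    # 15 moves: a non-empty subset of sequences advances one symbol;
    # the others contribute a gap to the alignment column.
    moves = [m for m in product((0, 1), repeat=4) if any(m)]
    for idx in sorted(M):  # lexicographic order respects DP dependencies
        if idx == (0, 0, 0, 0):
            continue
        for m in moves:
            prev = tuple(i - d for i, d in zip(idx, m))
            if min(prev) < 0:
                continue
            col = tuple(seqs[k][prev[k]] if m[k] else GAP for k in range(4))
            M[idx] = min(M[idx], M[prev] + symbol_ad(*col))
    return M[tuple(d - 1 for d in dims)]
```

For instance, `sequence_ad("ab", "ab", "cd", "cd")` is 0 (an exact proportion), while `sequence_ad("a", "a", "b", "c")` is 1. The table has (n+1)^4 cells with 15 transitions each, so the cost grows quickly with sequence length.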

Building on this measure, the paper proposes learning algorithms that, given a new instance y, search the training set for the triple (a, b, c) that yields the smallest analogical dissimilarity with y. Because exhaustive search over triples is cubic in the sample size, the authors introduce heuristic pruning, histogram-based filtering, and greedy strategies to make the search tractable while preserving high-quality matches.
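The exhaustive form of this search can be sketched as follows, for binary feature vectors with a componentwise dissimilarity; the paper's pruning and filtering heuristics are omitted, and `classify_by_analogy` is a hypothetical helper name. The predicted label is the one that completes the label proportion of the least-dissimilar triple.

```python
def bit_ad(p, q, r, s):
    # Symbol-level analogical dissimilarity for binary values:
    # 0 for an exact proportion, otherwise the number of flips needed.
    if (p == q and r == s) or (p == r and q == s):
        return 0
    if p == q or r == s or p == r or q == s:
        return 1
    return 2

def vector_ad(a, b, c, d):
    # Dissimilarity of binary feature vectors: sum of componentwise costs.
    return sum(bit_ad(*bits) for bits in zip(a, b, c, d))

def classify_by_analogy(x, training_set):
    """Predict a label for x from the least-dissimilar triple (a, b, c).

    training_set is a list of (vector, label) pairs. Exhaustive O(n^3)
    search; the paper's pruning/filtering heuristics are omitted here.
    """
    best_cost, best_label = float("inf"), None
    for a, la in training_set:
        for b, lb in training_set:
            for c, lc in training_set:
                # The labels must themselves form a solvable proportion.
                if la == lb:
                    label = lc   # la : la :: lc : x  =>  x = lc
                elif la == lc:
                    label = lb   # la : lb :: la : x  =>  x = lb
                else:
                    continue
                cost = vector_ad(a, b, c, x)
                if cost < best_cost:
                    best_cost, best_label = cost, label
    return best_label

train = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "B")]
print(classify_by_analogy((1, 1), train))  # prints B
```

Here (0, 0) : (0, 1) :: (1, 0) : (1, 1) is an exact proportion, so the triple with zero dissimilarity votes for label B.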

Two empirical studies validate the approach. In the first, binary and nominal benchmark datasets (12 in total) are classified using the minimal-dissimilarity triples. Compared with k-NN, SVM, and other distance-based classifiers, the analogical method improves overall accuracy by 2–5% and markedly boosts recall on minority classes. In the second study, a handwritten character recognizer is adapted to a new writer: by solving analogical equations to generate synthetic character sequences from a few labeled samples, the system's recognition rate rises by more than 10%, demonstrating rapid domain adaptation with minimal annotation effort.

Overall, the work establishes analogical dissimilarity as a robust, versatile metric for structured data, offers efficient algorithms for its computation and for learning with it, and shows that it can enhance classification performance especially in sparse‑label or transfer‑learning scenarios. Future directions include extending the metric to graphs or trees and integrating it with deep neural architectures for end‑to‑end training.

