Semantic Graph for Zero-Shot Learning


Zero-shot learning aims to classify visual objects without any training data via knowledge transfer between seen and unseen classes. This is typically achieved by exploring a semantic embedding space in which the seen and unseen classes can be related. Previous works differ in what embedding space is used and in how different classes and a test image are related. In this paper, we utilize the annotation-free semantic word space for the former and focus on the latter issue of modeling relatedness. Specifically, in contrast to previous work, which ignores the semantic relationships between seen classes and focuses merely on those between seen and unseen classes, we propose a novel approach based on a semantic graph that represents the relationships among all the seen and unseen classes in a semantic word space. Based on this semantic graph, we design a special absorbing Markov chain process in which each unseen class is viewed as an absorbing state. After incorporating a test image into the semantic graph, the absorbing probabilities from the test data to each unseen class can be computed effectively, and zero-shot classification is achieved by selecting the class label with the highest absorbing probability. The proposed model has a closed-form solution whose cost is linear in the number of test images. We demonstrate the effectiveness and computational efficiency of the proposed method over the state-of-the-art on the AwA (Animals with Attributes) dataset.


💡 Research Summary

The paper addresses the problem of zero‑shot learning (ZSL), where the goal is to recognize objects from classes that have no training images. The authors focus on the semantic relatedness modeling stage and propose a novel framework that combines a k‑nearest‑neighbor (k‑nn) semantic graph with an absorbing Markov chain.

First, they adopt an annotation‑free word embedding space (e.g., Word2Vec or GloVe) to represent every class—both seen (source) and unseen (target)—as a high‑dimensional vector. Pairwise cosine similarity between these vectors defines a semantic similarity measure. Using this measure, they construct a sparse k‑nn graph in which each class node is connected to its k most similar classes. Edge weights are the cosine similarities. Crucially, unseen class nodes are never directly linked to each other; they can only be reached through seen class nodes. This design enables the later Markov‑chain formulation.
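The graph construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the value of k, the embedding dimensionality, and the function name `build_knn_graph` are all illustrative assumptions.

```python
import numpy as np

def build_knn_graph(class_vecs, unseen_idx, k=5):
    """Sketch of a sparse k-nn semantic graph over class word vectors.

    class_vecs: (C, d) array, one word-embedding row per class.
    unseen_idx: set of indices of unseen (target) classes.
    Edges between two unseen classes are forbidden, so unseen nodes
    can only be reached through seen-class nodes.
    """
    # Cosine similarity between all class pairs.
    unit = class_vecs / np.linalg.norm(class_vecs, axis=1, keepdims=True)
    sim = unit @ unit.T

    C = class_vecs.shape[0]
    W = np.zeros((C, C))
    for i in range(C):
        # Candidate neighbours: no self-loops, no unseen-unseen links.
        cand = [j for j in range(C)
                if j != i and not (i in unseen_idx and j in unseen_idx)]
        nbrs = sorted(cand, key=lambda j: sim[i, j], reverse=True)[:k]
        for j in nbrs:
            W[i, j] = max(sim[i, j], 0.0)  # keep edge weights nonnegative
    return np.maximum(W, W.T)  # symmetrize the adjacency matrix
```

Because neither endpoint of an unseen-unseen pair ever adds an edge, the returned adjacency matrix stays zero between any two unseen classes, which is exactly what the later Markov-chain formulation requires.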

A multi‑class SVM is trained on the visual features of the seen classes. For any test image x, the SVM yields posterior probabilities p(y_j|x) for all seen classes y_j. The test image is then inserted into the graph as an additional transient node. It is connected only to a small subset of seen class nodes with the highest posterior probabilities, and the connection weights are exactly those probabilities.
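Attaching the test image to the graph can be sketched with a probabilistic SVM. The feature dimensions, class count, and the top-m cutoff below are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X_train = rng.normal(size=(200, 32))    # stand-in visual features (seen classes)
y_train = rng.integers(0, 5, size=200)  # 5 seen classes, synthetic labels

# probability=True enables Platt-scaled posteriors p(y_j | x).
svm = SVC(probability=True).fit(X_train, y_train)

x_test = rng.normal(size=(1, 32))
post = svm.predict_proba(x_test)[0]     # posterior over the seen classes

# Connect the test-image node only to the top-m seen classes,
# using the posteriors themselves as edge weights.
m = 3
top = np.argsort(post)[::-1][:m]
edge_weights = {int(j): float(post[j]) for j in top}
```

Keeping only the top-m posteriors keeps the test node sparsely connected, matching the sparse k-nn structure of the rest of the graph.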

The transition matrix of the resulting absorbing Markov chain is partitioned into blocks: Q (transient‑to‑transient, i.e., seen‑to‑seen), R (transient‑to‑absorbing, i.e., seen‑to‑unseen), a zero block, and an identity block for the absorbing states (the unseen classes). The standard absorbing‑chain theory gives the matrix of absorption probabilities B = (I – Q)^{-1} R. When the test image node is added, the extended Q and R matrices acquire one extra row and column, but the same closed‑form solution applies. Consequently, the absorption probability b_{ij}—the probability that a random walk starting from test image i is eventually absorbed by unseen class j—can be computed efficiently for all test images simultaneously.
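The closed-form absorption computation is standard absorbing-chain theory and can be verified on a toy chain. The block values below are made up for illustration; only the formula B = (I - Q)^{-1} R comes from the text.

```python
import numpy as np

def absorption_probs(Q, R):
    """B = (I - Q)^{-1} R: probability that a walk started at each
    transient state is eventually absorbed by each absorbing state."""
    n = Q.shape[0]
    # Solving the linear system is cheaper and more stable than inverting.
    return np.linalg.solve(np.eye(n) - Q, R)

# Toy chain: 3 transient states (seen classes + the test-image node)
# and 2 absorbing states (unseen classes). Rows of [Q | R] sum to 1.
Q = np.array([[0.0, 0.3, 0.2],
              [0.3, 0.0, 0.2],
              [0.1, 0.4, 0.0]])
R = np.array([[0.5, 0.0],
              [0.1, 0.4],
              [0.2, 0.3]])

B = absorption_probs(Q, R)
# Classify the test node (last transient state) by the most likely absorber.
label = int(np.argmax(B[2]))
```

Since absorption is certain in such a chain, each row of B sums to one, so the rows are valid probability distributions over the unseen classes.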

Classification is performed by assigning each test image to the unseen class with the highest absorption probability. This approach leverages indirect, multi‑step semantic paths: even if a test image is not directly similar to a particular unseen class, a short chain of high‑similarity seen classes can still yield a large absorption probability, making the method robust to noisy visual predictions.

The authors evaluate the method on the Animals with Attributes (AwA) dataset, using 40 seen and 10 unseen animal categories. Their approach outperforms several strong baselines, including Direct Attribute Prediction, Indirect Attribute Prediction, and a bipartite-graph based method.

