Exponential Family Graph Matching and Ranking

📝 Original Info

  • Title: Exponential Family Graph Matching and Ranking
  • ArXiv ID: 0904.2623
  • Date: 2009-06-05
  • Authors: Researchers from original ArXiv paper

📝 Abstract

We present a method for learning max-weight matching predictors in bipartite graphs. The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. Although inference is in general hard, we show that for one very relevant application - web page ranking - exact inference is efficient. For general model instances, an appropriate sampler is readily available. Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning web page ranking, in which we obtain state-of-the-art results, in particular improving on max-margin variants. The drawback of this method with respect to max-margin alternatives is its runtime for large graphs, which is comparatively high.

📄 Full Content

The Maximum-Weight Bipartite Matching Problem (henceforth 'matching problem') is a fundamental problem in combinatorial optimization [26]. This is the problem of finding the 'heaviest' perfect match in a weighted bipartite graph. An exact optimal solution can be found in cubic time by standard methods such as the Hungarian algorithm.
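To make the problem statement concrete, here is a minimal sketch that finds the heaviest perfect match in a tiny weighted bipartite graph by enumerating all permutations. This is not the Hungarian algorithm mentioned above: brute force costs n! evaluations and is only meant to illustrate the objective. The function name `max_weight_matching` and the example weight matrix are illustrative assumptions, not part of the original paper.

```python
from itertools import permutations


def max_weight_matching(weights):
    """Return (best_perm, best_total) for a square weight matrix.

    best_perm[i] is the right-hand node matched to left-hand node i.
    Brute force over all n! permutations, so only usable for tiny graphs;
    the Hungarian algorithm solves the same problem in cubic time.
    """
    n = len(weights)
    best_perm, best_total = None, float("-inf")
    for perm in permutations(range(n)):
        total = sum(weights[i][perm[i]] for i in range(n))
        if total > best_total:
            best_perm, best_total = perm, total
    return best_perm, best_total


# Hypothetical 3x3 weight matrix: weights[i][j] is the weight of edge (i, j).
weights = [
    [4.0, 1.0, 3.0],
    [2.0, 0.0, 5.0],
    [3.0, 2.0, 2.0],
]
perm, total = max_weight_matching(weights)
print(perm, total)
```

In this example the heaviest perfect match pairs node 0 with 0, node 1 with 2, and node 2 with 1, for a total weight of 11.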

This problem is of practical interest because it can nicely model real-world applications. For example, in computer vision the crucial problem of finding a correspondence between sets of image features is often modeled as a matching problem [2,3]. Ranking algorithms can be based on a matching framework [19], as can clustering algorithms [14,11].

When modeling a problem as one of matching, one central question is the choice of the weight matrix. The problem is that in real applications we typically observe edge feature vectors, not edge weights. Consider a concrete example in computer vision: it is difficult to tell what the ‘similarity score’ is between two image feature points, but it is straightforward to extract feature vectors (e.g. SIFT) associated with those points.

In this setting, it is natural to ask whether we could parameterize the features, and use labeled matches in order to estimate the parameters such that, given graphs with ‘similar’ features, their resulting max-weight matches are also ‘similar’. This idea of ‘parameterizing algorithms’ and then optimizing for agreement with data is called structured estimation [31,33]. [31] and [3] describe max-margin structured estimation formalisms for this problem. Max-margin structured estimators are appealing in that they try to minimize the loss that one really cares about (‘structured losses’, of which the Hamming loss is an example). However, structured losses are typically piecewise constant in the parameters, which eliminates any hope of using smooth optimization directly. Max-margin estimators instead minimize a surrogate loss that is easier to optimize, namely a convex upper bound on the structured loss [33]. In practice the results are often good, but known convex relaxations produce estimators that are statistically inconsistent [22], i.e., the algorithm in general fails to obtain the best attainable model in the limit of infinite training data. The inconsistency of multiclass support vector machines is a well-known issue in the literature that has recently received careful examination [8,7].
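The claim that structured losses are piecewise constant in the parameters can be illustrated with a small sketch. It assumes, purely for illustration, that each edge weight is the inner product of the edge feature vector with a parameter vector `theta`; the excerpt above does not state the parameterization, so this linear form, the helper names `predict` and `hamming_loss`, and the toy features are all hypothetical.

```python
from itertools import permutations


def predict(features, theta):
    """Max-weight match when edge (i, j) has weight <features[i][j], theta>.

    The linear weight is an assumption made for this illustration.
    Brute force over permutations; fine for tiny graphs only.
    """
    n = len(features)

    def score(perm):
        return sum(
            sum(f * t for f, t in zip(features[i][perm[i]], theta))
            for i in range(n)
        )

    return max(permutations(range(n)), key=score)


def hamming_loss(pred, truth):
    """Fraction of left-hand nodes assigned to the wrong partner."""
    return sum(p != t for p, t in zip(pred, truth)) / len(truth)


# Toy 2x2 graph with 2-dimensional edge features (hypothetical data).
features = [
    [(1.0, 0.0), (0.0, 1.0)],
    [(0.0, 1.0), (1.0, 0.0)],
]
truth = (0, 1)

# Sweep the second parameter: the loss jumps from 0 to 1 and is flat elsewhere,
# so its gradient with respect to theta is zero almost everywhere.
losses = [hamming_loss(predict(features, (1.0, t)), truth)
          for t in (0.0, 0.5, 1.5, 2.0)]
print(losses)
```

Because the loss only changes when the predicted match changes, gradient-based optimizers receive no signal from it, which is why surrogate objectives are used instead.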

Motivated by the inconsistency issues of max-margin structured estimators as well as by the well-known benefits of having a full probabilistic model, in this paper we present a maximum a posteriori (MAP) estimator for the matching problem. The observed data are the edge feature vectors and the labeled matches provided for training. We then maximize the conditional posterior likelihood of matches given the observed data. We build an exponential family model where the sufficient statistics are such that the mode of the distribution (the prediction) is the solution of a max-weight matching problem. The resulting partition function is #P-complete to compute exactly. However, we show that for learning-to-rank applications the model instance is tractable. We then compare the performance of our model instance against a large number of state-of-the-art ranking methods, including DORM [19], an approach that differs from our model instance only in using a max-margin instead of a MAP formulation. We show very competitive results on standard webpage ranking datasets, and in particular we show that our model performs better than or on par with DORM. For intractable model instances, we show that the problem can be approximately solved using sampling, and we provide experiments from the computer vision domain. However, the fastest suitable sampler is still quite slow for large models, in which case max-margin matching estimators like those of [3] and [31] are likely to be preferable despite their potentially inferior accuracy.
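The following sketch illustrates why the mode of such a model is a max-weight match and why the partition function is expensive. It assumes a distribution over perfect matches proportional to the exponential of the total matched weight, with edge weights already computed from features and parameters; this specific form and the name `matching_distribution` are assumptions for illustration, since the excerpt does not spell out the sufficient statistics.

```python
import math
from itertools import permutations


def matching_distribution(weights):
    """Return ({perm: probability}, log_partition) by exhaustive enumeration.

    Assumed model: p(perm) is proportional to exp(sum_i weights[i][perm[i]]).
    The normalizer sums over all n! permutations (it equals the permanent of
    the matrix with entries exp(weights[i][j])), so this only scales to tiny n.
    """
    n = len(weights)
    scores = {
        perm: sum(weights[i][perm[i]] for i in range(n))
        for perm in permutations(range(n))
    }
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    probs = {perm: math.exp(s - log_z) for perm, s in scores.items()}
    return probs, log_z


# Hypothetical 3x3 weight matrix.
weights = [
    [4.0, 1.0, 3.0],
    [2.0, 0.0, 5.0],
    [3.0, 2.0, 2.0],
]
probs, log_z = matching_distribution(weights)
mode = max(probs, key=probs.get)
print(mode, round(probs[mode], 3))
```

Since the exponential is monotone, the most probable permutation is exactly the one with the largest total weight, so prediction reduces to max-weight matching, while learning needs the normalizer and therefore the sum over all permutations.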

In recent years, great attention has been devoted in Machine Learning to so-called structured predictors, which are predictors of the kind

$$g_\theta : X \to Y,$$

where X is an arbitrary input space and Y is an arbitrary discrete space, typically exponentially large. Y may be, for example, a space of matrices, trees, graphs, sequences, strings, matches, etc. This structured nature of Y is what structured prediction refers to.

In the setting of this paper, X is the set of vector-weighted bipartite graphs (i.e., each edge has a feature vector associated with it), and Y is the set of perfect matches induced by X. If N graphs are available, along with corresponding annotated matches (i.e., a set $\{(x^n, y^n)\}_{n=1}^N$), our task will be to estimate $\theta$ such that when we apply the predictor $g_\theta$ to a new graph it produces a match that is similar to matches of similar graphs from the annotated set. Structured learning or structured estimation refers to the process of estimating a vector $\theta$ for predictor $g_\theta$ when data {(x^1, y^1), ..., (x^N,

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
