Learning View Generalization Functions

Reading time: 6 minutes

📝 Original Info

  • Title: Learning View Generalization Functions
  • ArXiv ID: 0712.0136
  • Date: 2007-12-02
  • Authors: Thomas M. Breuel

📝 Abstract

Learning object models from views in 3D visual object recognition is usually formulated either as a function approximation problem of a function describing the view-manifold of an object, or as that of learning a class-conditional density. This paper describes an alternative framework for learning in visual object recognition, that of learning the view-generalization function. Using the view-generalization function, an observer can perform Bayes-optimal 3D object recognition given one or more 2D training views directly, without the need for a separate model acquisition step. The paper shows that view generalization functions can be computationally practical by restating two widely-used methods, the eigenspace and linear combination of views approaches, in a view generalization framework. The paper relates the approach to recent methods for object recognition based on non-uniform blurring. The paper presents results both on simulated 3D "paperclip" objects and real-world images from the COIL-100 database, showing that useful view-generalization functions can realistically be learned from a comparatively small number of training examples.


📄 Full Content

Learning view-based or appearance-based models of objects has been a major area of research in visual object recognition (see [5] for reviews). One direction of research has focused on treating the problem of learning appearance based models as an interpolation problem [16,14]. Another approach is to treat the problem of learning object models as a classification problem.

Both approaches have some limitations. For example, acquiring a novel object may involve fairly complex computations or model building. They also do not easily explain how an observer can transfer their skill at recognizing existing objects to generalizing from single or multiple views of novel objects; to explain such transfer, a variety of additional methods have been explored in the literature, including the use of object classes or categories, the acquisition and use of object parts, and the adaptation and sharing of features or feature hierarchies.

This paper describes an approach to learning appearance-based models that addresses these issues in a unified framework: the visual learning problem is reformulated as that of learning view generalization functions. The paper shows that knowledge of the view generalization function is equivalent to being able to carry out Bayes-optimal 3D object recognition for an arbitrary collection of objects presented to the system as training views. Model acquisition reduces to storing 2D views and does not involve learning or model building.

This represents a significant paradigm shift relative to previous approaches to learning in visual object recognition, which have treated the problem of acquiring models as a separate learning problem. While previous models of visual object recognition can be reinterpreted within the framework of this paper (and we will do so for two such methods), the formulation in terms of view generalization functions makes it easy to apply any of a wide variety of standard statistical models and classifiers to the problem of generalization to novel objects.
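To make the paradigm shift concrete, here is a hypothetical toy sketch (not code from the paper): model acquisition is literally just storing 2D views in memory, and recognition applies a single function g(B, T) shared across all objects, scoring how well one object's stored views T explain a target view B. In the paper this function is learned; the placeholder below simply uses negative squared distance to the nearest stored view.

```python
import numpy as np

# Hypothetical illustration of recognition via a view-generalization function.
# `view_generalization` stands in for a learned g(B, T); here it is a toy
# placeholder: negative squared distance to the closest stored training view.

def view_generalization(target, training_views):
    return -min(float(np.sum((target - t) ** 2)) for t in training_views)

def recognize(target, memory):
    """memory maps object id -> list of stored 2D views; no model building."""
    return max(memory, key=lambda w: view_generalization(target, memory[w]))

# Made-up "views" as 2D feature vectors, purely for illustration.
memory = {
    "mug":  [np.array([0.0, 1.0]), np.array([0.2, 0.9])],
    "book": [np.array([1.0, 0.0]), np.array([0.9, 0.2])],
}
print(recognize(np.array([0.1, 0.95]), memory))  # closest stored views -> mug
```

Acquiring a novel object amounts to adding one more entry to `memory`; the generalization function itself is unchanged, which is the transfer property the text emphasizes.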

In this paper, I will first express Bayes-optimal 3D object recognition in terms of training and target views and prior distributions on object models and viewpoints. Then, I will describe the statistical basis of learning view generalization functions. Finally, I will demonstrate, both on the standard “paperclip” model and on the COIL-100 database, that learning view generalization functions is feasible.

This section will review 3D object recognition from a Bayesian perspective and establish notation. Let us look at the question of how an observer can recognize 3D objects from their 2D views. Let ω identify an object and B be an unknown 2D view (we will refer to B also as the target view). Then, classifying B according to ω(B) = arg max_ω P(ω|B) is well known to result in minimum-error classification [4]. Using Bayes rule, and noting that P(B) does not depend on ω, we can rewrite this as

arg max_ω P(ω|B) = arg max_ω P(B|ω) P(ω) / P(B) = arg max_ω P(B|ω) P(ω)

Here, P(ω) is simply the frequency with which object ω occurs in the world. Let us try to express P(B|ω) in terms of models and/or training views. Assume that we are given a 3D object model M_ω. In the absence of noise, the projection of this 3D model into a 2D image is determined by some function f of the viewing parameters φ ∈ Φ, so that B = f(M_ω, φ). The function f is usually a rigid-body transformation followed by orthographic or perspective projection.
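A minimal numeric sketch of this decision rule (the likelihood and prior values below are made-up placeholders, not from the paper):

```python
import numpy as np

# Bayes-optimal decision: pick w maximizing P(B|w) * P(w).
# Dividing by P(B) would not change the arg max, so it is omitted.

def bayes_classify(likelihoods, priors):
    """Return the index w maximizing P(B|w) * P(w)."""
    posterior_scores = np.asarray(likelihoods) * np.asarray(priors)
    return int(np.argmax(posterior_scores))

# Two objects: object 1 explains the view slightly better,
# but object 0 is much more common in the world.
likelihoods = [0.30, 0.40]   # P(B|w) for w = 0, 1
priors      = [0.70, 0.30]   # P(w)
print(bayes_classify(likelihoods, priors))  # 0.21 > 0.12, so it prints 0
```

Note how the prior P(ω) can override a higher likelihood, which is exactly why it appears in the decision rule.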

In the presence of additive noise, B = f(M_ω, φ) + N for some noise N distributed according to a prior noise distribution P(N). With this, marginalizing over the unknown viewing parameters gives the view likelihood (Equation 3):

P(B|ω) = ∫_Φ P_N(B − f(M_ω, φ)) P(φ) dφ
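Under the assumptions in this paragraph (additive noise with a known density, a known projection function f, a prior over viewpoints), the viewpoint-marginalized likelihood can be approximated by Monte Carlo sampling of φ. The code below is an illustrative sketch only: a 2D rotation of a toy point "model" stands in for the rigid-transform-plus-projection f, and the noise is isotropic Gaussian.

```python
import numpy as np

def gaussian_noise_density(residual, sigma):
    """Isotropic Gaussian density P_N of the residual B - f(M, phi)."""
    r = np.asarray(residual, dtype=float).ravel()
    d = r.size
    return np.exp(-0.5 * np.dot(r, r) / sigma**2) / (2 * np.pi * sigma**2) ** (d / 2)

def view_likelihood(B, project, model, phi_samples, sigma=0.1):
    """Monte Carlo estimate of P(B|model): average P_N over viewpoint samples."""
    densities = [gaussian_noise_density(B - project(model, phi), sigma)
                 for phi in phi_samples]
    return float(np.mean(densities))

def project(model, phi):
    """Toy stand-in for f: rotate a 2D point set by phi and flatten it."""
    c, s = np.cos(phi), np.sin(phi)
    R = np.array([[c, -s], [s, c]])
    return (model @ R.T).ravel()

rng = np.random.default_rng(0)
model = np.array([[1.0, 0.0], [0.0, 1.0]])
B = project(model, 0.3)                       # a target view at phi = 0.3
phis = rng.uniform(0.0, 2 * np.pi, size=2000) # samples from a uniform P(phi)
p = view_likelihood(B, project, model, phis, sigma=0.1)
```

A view actually generated by the model receives a much higher estimated likelihood than an arbitrary image, which is what makes the quantity usable in the decision rule above.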

By construction, Equation 3 represents Bayes-optimal 3D model-based recognition, assuming perfect knowledge of the 3D model M_ω for a given object ω.

In real-world recognition problems, the observer is rarely given a correct 3D model M_ω prior to recognition. Instead, the observer needs to infer the model from a set of training views T_ω = {T_ω,1, ..., T_ω,r}. Therefore, an observer is faced with the problem of determining P(B|ω) as P(B|T_ω). In a model-based framework, this means that the observer attempts to reconstruct the object model M from the training views T_ω and then uses the resulting distribution of probabilities over possible models for recognition. If we put this together with Equation 3, we obtain for P(B|ω) = P(B|T_ω):
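Written out, the marginalization this paragraph describes takes the following form (a reconstruction in the text's own notation, since the displayed equation does not survive in this excerpt):

```latex
P(B \mid T_\omega) \;=\; \int P(B \mid M)\, P(M \mid T_\omega)\, dM
```

where P(B|M) is the viewpoint-marginalized view likelihood of Equation 3 and P(M|T_ω) is the posterior over 3D models given the training views.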

By construction, P(B|T_ω) represents the density of target views B given a set of training views T_ω. Now that we have derived Bayes-optimal 3D object recognition, let us look at some approaches that have been proposed in the literature for solving the 3D object recognition problem and how they relate to Bayes-optimal recognition. 3D Model-Based Maximum Likelihood Methods. Traditional approaches to model-based 3D computer vision (e.g., [6]) generally divide recognition into two phases. During a model acquisition phase, the recognition system attempts to optimally reconstruct 3D models from 2D training data.

…(Full text truncated)…


Reference

This content is AI-processed based on ArXiv data.
