Blind Recovery of Spatially Varying Reflectance from a Single Image

Kevin Karsch    David Forsyth
University of Illinois

[Figure 1 panels: Input, Synthesized material, Material 1, Material 2, Mixing weights]

Figure 1: From a single photograph, our method estimates spatially varying materials (diffuse reflectance and specular parameters). The input image is decomposed into k low-order, parametric materials (Material 1 and 2) and a set of per-pixel material mixing coefficients (Mixing weights); shape and illumination are jointly inferred. This decomposition can be transferred to new shapes (Synthesized material) and also used to generate new materials.

Abstract

We propose a new technique for estimating spatially varying parametric materials from a single image of an object with unknown shape in unknown illumination. Our method uses a low-order parametric reflectance model and incorporates strong assumptions about lighting and shape. We develop new priors about how materials mix over space, and jointly infer all of these properties from a single image. This produces a decomposition of an image which corresponds, in one sense, to microscopic features (material reflectance) and macroscopic features (weights defining the mixing properties of materials over space). We have built a large dataset of real objects rendered with different material models under different illumination fields for training and ground-truth evaluation. Extensive experiments on both our synthetic dataset images and real images show that (a) our method recovers parameters with reasonable accuracy; (b) material parameters recovered by our method give accurate predictions of new renderings of the object; and (c) our low-order reflectance model still provides a good fit to many real-world reflectances.
CR Categories: I.2.10.d [Artificial Intelligence]: Vision and Scene Understanding—Modeling and recovery of physical attributes; I.3.8 [Computer Graphics]: Applications; I.4.8.c [Image Processing and Computer Vision]: Image Models

Keywords: reflectance estimation, shape from shading, material transfer, material modeling

1 Introduction

Humans are quite good at guessing an object's material based on appearance alone [Adelson 2000]. However, material¹ estimation from a single photograph remains a challenging and unsolved problem in computer vision. Appearance is often considered a function of object shape, incident illumination, and surface reflectance, and many solutions have been proposed for material estimation from a single image when shape and/or illumination are known precisely.

Romeiro et al. first showed how to estimate reflectance under known shape and illumination [Romeiro et al. 2008], and Romeiro and Zickler later extended this work by marginalizing over illumination [Romeiro and Zickler 2010]. Generalizing further, Lombardi and Nishino [Lombardi and Nishino 2012] recover reflectance and illumination from an image assuming only that the object's shape is known, and Oxholm and Nishino [Oxholm and Nishino 2012] estimate reflectance and shape under exact lighting. If multiple images are available, it is also possible to recover shape and spatially varying reflectance [Alldrin et al. 2008; Goldman et al. 2010]. These techniques provide valuable intuition, yet they hinge on knowing exact shape or exact illumination, or have strict setup requirements (directional light, multiple photos, etc.); a fundamentally different approach is required when such additional information is not available. Such approaches have been proposed by Barron and Malik [Barron and Malik 2012a; Barron and Malik 2012b], who use strict priors to jointly recover shape, diffuse albedo, and illumination.
However, as in many shape-from-shading algorithms, all surfaces are assumed to be Lambertian. Glossy surfaces thus cannot be recovered and may cause errors in estimation. Furthermore, Lambertian models of material are not suitable for describing a large percentage of real-world surfaces, limiting the applicability of these techniques.

A major concern of prior work is recovering real-world BRDFs and high-frequency illumination [Lombardi and Nishino 2012; Oxholm and Nishino 2012; Romeiro and Zickler 2010], or ensuring that recovered shapes are integrable and reconstructions are exact (image and re-rendered image match exactly) [Barron and Malik 2012a; Barron and Malik 2012b]. However, it is well known that recovering such high-parameter solutions is ill-posed in many cases, and precise conditions must be met to estimate them robustly. For example, real-world BRDFs can be extracted from a curved shape and a single directional light (known a priori) [Chandraker and Ramamoorthi 2011], and surface normals can be found given enough pictures with particular lighting conditions and isotropic (yet unknown) BRDFs [Alldrin and Kriegman 2007; Shi et al. 2012].

We opt for lower-order representations of reflectance and illumination. Our idea is to reduce the number of parameters we recover, relax the constraints imposed by prior methods, and attempt to recover materials from a more practical perspective. Our main goal is material inference, but we must jointly optimize over shape and illumination since these are unknown to our algorithm. We consider simple models of reflectance and lighting (models often used by artists, where perception is the only metric that matters), and impose only soft constraints on shape reconstruction.

¹ We abbreviate “material reflectance” as “material.”
Our material model is low-order (only five parameters), allowing us to tease good estimates of materials from images, even if our recovered shape and illumination estimates are somewhat inaccurate. We also show that our model can be extended to spatially varying materials by inferring mixture coefficients (linearly combining 5-parameter materials) at each pixel. Figure 1 demonstrates the results of our estimation technique on a marble mortar and pestle.

Most similar to our work is the method of Goldman et al. [Goldman et al. 2010]. They estimate per-pixel mixture weights and a set of parametric materials, but require multiple HDR images under known lighting and impose limiting priors on mixture weights. Our method is applicable to single LDR images (lighting is jointly estimated), and we also develop new priors for better mixture weights.

Contributions. Our primary contribution is a technique for extracting spatially varying material reflectance (beyond diffuse parameters) directly from an object's appearance in a single photograph, without requiring any knowledge of the object's shape or scene illumination. We use a low-order parameterization of material and develop a new model of illumination that can also be described with only a few parameters, allowing for efficient rendering. Because our model has few parameters, we tend to get low variance and thus robustness in our material estimates (i.e., the bias-variance tradeoff). By design, our material model is the same one used throughout the 3D art/design community, and it describes a large class of real-world materials (Sec 2). We show how to efficiently estimate materials from plausible initializations of lighting and shape, and propose novel priors that are crucial for estimating material robustly (Sec 3). We extend this formulation to spatially varying materials in Section 4.
Our material estimates compare favorably with baseline methods and agree well with ground truth, and we demonstrate results for both synthetic and real images (Sec 5). We show applications in relighting (Figs 6, 8), material transfer (Fig 1), and material generation (Fig 12).

Limitations. Since we use low-order material models that are isotropic and have monochromatic specular components, we cannot hope to estimate BRDFs of arbitrary shape (e.g. as measured by a gonioreflectometer), and some materials are not encoded by our representation. Our recovered lighting and shape are not necessarily correct with respect to the true lighting/shape, although they are consistent with one another and sometimes give good estimates; as such, we only make claims about the accuracy of our material estimates. We use infinitely distant spherical lighting without considering interreflections, and we do not attempt to solve color constancy issues; lighting environments in our dataset integrate to the same value (per channel). Since we only have a single view of the object, certain material properties (e.g. specularities) may not be visible, depending especially on the coverage of the normals (flat surfaces provide much less information than curved surfaces). Due to our low-order model, and perhaps our mixture priors, shading effects can sometimes manifest in the spatial mixture map (Fig 11).

[Figure 2 panels: (a) Measured BRDF; (b) Our model; (c) Environment map; (d) Our illumination]

Figure 2: A general BRDF can be made up of numerous reflection “lobes” (a), but in practice (e.g. surface modeling), a simple BRDF with one diffuse and one specular lobe tends to suffice (b). We use this representation as well as a low-order parameterization of illumination. Our illumination considers a real-world, omnidirectional lighting environment (c), and approximates it with a mixture of Gaussians and spherical harmonics; (d) shows our model fit to (c). We observe only slight perceptual differences when rendering with different combinations of the high- and low-order parameterizations (bottom rows).

2 Low-order reflectance and illumination

Many previous methods have attempted to use high-order models of material (e.g. a linear combination of basis functions learned from measured BRDFs [Romeiro and Zickler 2010]) and illumination (parameterized with wavelets [Romeiro and Zickler 2010] or even on an image grid [Lombardi and Nishino 2012; Oxholm and Nishino 2012]), consisting of hundreds to thousands of parameters or more. We propose the use of more rigid models of material and illumination, which can still describe the appearance of most real-world objects, and which provide the rigidity needed to estimate materials when neither shape nor illumination is exactly known.

Representing material. We represent materials using an isotropic diffuse and specular BRDF model consisting of only five parameters: diffuse albedo in the red, green, and blue channels (R_d), monochromatic specular albedo (R_s), and the isotropic “roughness” value (r). Roughness describes how concentrated the light scattered in the specular direction is, and can be considered (roughly) to be the size of the specular “lobe”: a smaller roughness value indicates a smaller specular lobe, and r = 0 encodes a perfect specular reflector. This type of material model is surprisingly general for its low number of parameters. Ngan et al. have previously shown that such parameterizations provide very good fits to real, measured BRDFs [Ngan et al. 2005]. Perceptually, this model can also encode a family of isotropic, dielectric surfaces (mattes, plastics, and many other materials in the continuum from perfectly diffuse to near-perfect specular) [Pharr and Humphreys 2010].
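To make the five-parameter material concrete, the Python sketch below evaluates a diffuse-plus-specular BRDF from M = (R_d^(r), R_d^(g), R_d^(b), R_s, r). It is a simplified stand-in, not the PBRT substrate model used in the paper: we assume a Lambertian term R_d/π, a GGX microfacet lobe whose width is set by r, Schlick's Fresnel approximation with R_s as the normal-incidence reflectance, and the implicit masking term G = (n·ω_i)(n·ω_o). The function names (`brdf`, `ggx_ndf`, `schlick_fresnel`, `normalize`) are hypothetical.

```python
import numpy as np


def normalize(v):
    """Return v scaled to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def schlick_fresnel(f0, cos_theta):
    """Schlick's approximation to Fresnel reflectance; f0 is reflectance at normal incidence."""
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5


def ggx_ndf(cos_h, alpha):
    """GGX (Trowbridge-Reitz) microfacet distribution; alpha stands in for roughness r."""
    a2 = alpha * alpha
    d = cos_h * cos_h * (a2 - 1.0) + 1.0
    return a2 / (np.pi * d * d)


def brdf(material, n, wi, wo, min_roughness=1e-3):
    """Five-parameter diffuse + specular BRDF (illustrative stand-in, not PBRT's substrate model).

    material = (Rd_r, Rd_g, Rd_b, Rs, r), each in [0, 1]. Returns an RGB array.
    """
    rd = np.asarray(material[:3], dtype=float)
    rs = float(material[3])
    r = max(float(material[4]), min_roughness)  # r = 0 would be a delta (mirror) lobe
    n, wi, wo = normalize(n), normalize(wi), normalize(wo)
    cos_i, cos_o = float(n @ wi), float(n @ wo)
    if cos_i <= 0.0 or cos_o <= 0.0:
        return np.zeros(3)  # light or viewer below the surface
    h = normalize(wi + wo)  # half vector
    diffuse = rd / np.pi
    # Implicit masking G = cos_i * cos_o cancels the 4 * cos_i * cos_o denominator.
    specular = ggx_ndf(float(n @ h), r) * schlick_fresnel(rs, float(wi @ h)) / 4.0
    return diffuse + specular  # monochromatic specular is added to every channel
```

Because the specular term is a single scalar added to all three channels, highlights take on the color of the light, as with the monochromatic R_s described above.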
There is also compelling evidence that such a material model suffices for photorealistic rendering, as this is the same material parameterization found most commonly in 3D modeling and rendering packages (such as Blender², which only considers diffuse and specular reflection for opaque objects), and it is used extensively throughout the 3D artist/designer community³. We write our BRDF following the isotropic substrate model described in Physically Based Rendering [Pharr and Humphreys 2010], which uses a microfacet reflectance model and the Schlick approximation to the Fresnel effect [Schlick 1994]. Figure 2 compares what a measured BRDF (a) might look like with our material parameterization (b). In the bottom row (left two columns), we compare measured BRDFs with our material model (fit using the procedure described in Sec 5), rendered in natural illumination.

Representing illumination. Consider a single point within a scene and the omnidirectional light incident to that point. This incident illumination can be conceptually decomposed into luminaires (light emitters) and non-emitting objects. We consider these two separately, since they tend to produce visually distinct patterns in object appearance (depending, of course, on the material). Luminaires generally cause large, high-frequency changes in appearance (e.g. specular highlights), whereas non-emitters usually produce small, low-frequency changes.

Using this intuition, we parameterize each luminaire as a two-dimensional Gaussian in the 2-sphere domain (sometimes known as the Kent distribution), and approximate all other (non-emitted) incident light with low-order functions on the sphere, using 2nd-order spherical harmonics⁴.
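A single luminaire lobe can be evaluated with a Kent-style density on the sphere. The sketch below is an illustration under our own assumptions: it uses the six per-light parameters the paper lists (two direction angles, intensity, concentration κ, ellipticalness β, and rotation γ about the direction), drops the Kent normalization constant, and rescales the lobe so that its value at the center direction equals the intensity. The helper names (`direction`, `kent_frame`, `kent_lobe`) are hypothetical.

```python
import numpy as np


def direction(theta, phi):
    """Unit vector from polar angle theta and azimuth phi."""
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])


def kent_frame(theta, phi, gamma):
    """Orthonormal frame (g1, g2, g3): g1 is the lobe center, (g2, g3) the ellipse axes rotated by gamma."""
    g1 = direction(theta, phi)
    # Any seed vector not parallel to g1 gives a valid tangent basis.
    seed = np.array([0.0, 0.0, 1.0]) if abs(g1[2]) < 0.9 else np.array([1.0, 0.0, 0.0])
    t = np.cross(g1, seed)
    t /= np.linalg.norm(t)
    b = np.cross(g1, t)
    g2 = np.cos(gamma) * t + np.sin(gamma) * b  # rotate the tangent pair about g1
    g3 = np.cross(g1, g2)
    return g1, g2, g3


def kent_lobe(x, theta, phi, intensity, kappa, beta, gamma):
    """Unnormalized Kent-style lobe, scaled so its value at the center direction equals `intensity`.

    x: unit directions with shape (..., 3). For a unimodal lobe the Kent distribution
    requires 0 <= 2 * beta < kappa.
    """
    g1, g2, g3 = kent_frame(theta, phi, gamma)
    x = np.asarray(x, dtype=float)
    exponent = kappa * (x @ g1 - 1.0) + beta * ((x @ g2) ** 2 - (x @ g3) ** 2)
    return intensity * np.exp(exponent)
```

Larger κ narrows the lobe, while β stretches it along g2 relative to g3, which is what lets a single lobe approximate an elongated light source such as a window.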
Such a parameterization has very few parameters relative to a full illumination environment (or environment map): six per light source (two for the direction L^(d), and one each for the intensity L^(I), concentration κ, ellipticalness β, and rotation about the direction γ) and 27 spherical harmonic coefficients (nine per color channel). More importantly, this parameterization still enables realistic rendering at much higher efficiency. Rendering efficiency is crucial to our procedure, as each function evaluation in our optimization method (Sec 3) requires rendering.

Our lighting environments retain high frequencies only in regions of emitting sources, and are encoded by low-frequency spherical harmonics everywhere else. Nonetheless, rendering with full versus approximate (our) lighting produces similar results (Fig 2, bottom middle vs. bottom right). For additional discussion, see Sec 3.

² http://wiki.blender.org/index.php/Doc:2.6/Manual/Materials
³ http://www.luxrender.net/forum
⁴ We assume all lighting comes from an infinite-radius sphere surrounding the object, as done in previous methods [Barron and Malik 2012a; Lombardi and Nishino 2012; Oxholm and Nishino 2012; Romeiro and Zickler 2010].

3 Estimating specular reflectance

Our idea is to jointly recover material, shape, and illumination such that a rendering of these estimates produces an image similar to the input image. Following Barron and Malik [Barron and Malik 2012a], we also enforce a strong set of priors to bias our material estimate towards plausible results.

Our goal is to recover a five-dimensional set of material parameters M = (R_d^(r), R_d^(g), R_d^(b), R_s, r), while jointly optimizing over illumination ℒ and surface normals N. Following the notation in Section 2, we denote by R_d the RGB diffuse reflectance, by R_s the monochromatic specular reflectance, and by r the roughness coefficient (smaller r ⇒ narrower lobe ⇒ shinier material).
Illumination ℒ = {L, s} is parameterized as a mixture of m Gaussians in the 2-sphere domain, L = {L_1, ..., L_m}, with direction L_i^(d), intensity L_i^(I), concentration L_i^(κ), ellipticalness L_i^(β), and rotation L_i^(γ) (i ∈ {1, ..., m}). Statistics of real illumination environments are nonstationary and can contain concentrated bright points [Dror et al. 2004]; the Gaussian mixture aims to represent these peaks. Indirect light s is represented as a 9 × 3 matrix of 2nd-order spherical harmonic coefficients (9 per color channel). N is simply a vector of per-pixel surface normals, parameterized by azimuth and elevation.

We phrase our problem as a continuous optimization, solving for the parameters that minimize

$$\operatorname*{argmin}_{M,\,N,\,\mathcal{L}}\; E_{\mathrm{rend}}(M, N, \mathcal{L}) + E_{\mathrm{mat}}(M) + E_{\mathrm{illum}}(\mathcal{L}) + E_{\mathrm{shape}}(N) \quad \text{subject to } 0 \le M^{(i)} \le 1,\; i \in \{1, \dots, 5\}, \tag{1}$$

where E_rend is the error between a rendering of our estimates and the input image, and E_mat, E_illum, and E_shape are priors that we place on material, illumination, and shape, respectively. In the remainder of this section, we discuss the rendering term and the prior terms. Figure 3 shows the result of our optimization technique at various stages for a given input.

Rendering error. Our optimization is guided primarily by a term that penalizes pixel error between the input image and a rendering of our estimates. The term itself is quite simple, but efficiently optimizing an objective function that includes the rendering equation can be challenging. Writing I for the input image, we define the term as a weighted sum of squared per-pixel errors:

$$E_{\mathrm{rend}}(M, N, \mathcal{L}) = \sum_{i \in \text{pixels}} \sigma^{\mathrm{rend}}_i \,\big\| I_i - f(M, N_i, \mathcal{L}) \big\|^2, \tag{2}$$

where f(M, N, ℒ) is our rendering function, and σ_i^rend = I_i² reweights the error to place more importance on brighter points (primarily specularities).
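A minimal sketch of the rendering term in Eq 2 and the box constraint in Eq 1 follows. How σ_i = I_i² applies to an RGB pixel is not spelled out, so the code assumes the squared norm of the input color; the optional object mask, the clipping-based projection, and the names `rendering_error` and `project_material` are our own illustrative choices.

```python
import numpy as np


def rendering_error(image, rendered, mask=None):
    """Weighted rendering error in the spirit of Eq 2.

    image, rendered: (H, W, 3) arrays. The per-pixel weight sigma_i = I_i^2 is taken here as the
    squared norm of the RGB input pixel (an assumption about how I_i^2 applies to color pixels).
    mask: optional (H, W) boolean array selecting object pixels.
    """
    image = np.asarray(image, dtype=float)
    rendered = np.asarray(rendered, dtype=float)
    sq_err = np.sum((image - rendered) ** 2, axis=-1)  # ||I_i - f_i||^2 per pixel
    sigma = np.sum(image ** 2, axis=-1)                # brighter pixels weigh more
    terms = sigma * sq_err
    if mask is not None:
        terms = terms[mask]
    return float(terms.sum())


def project_material(M):
    """Enforce the box constraint 0 <= M^(i) <= 1 of Eq 1 by clipping (one simple choice)."""
    return np.clip(np.asarray(M, dtype=float), 0.0, 1.0)
```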
Notice that we do not strictly enforce equality (as in [Barron and Malik 2012a; Barron and Malik 2012b]): this soft constraint allows more flexibility during the optimization, and our parameterizations are too stiff for equality to hold anyway. This has the added benefit of reducing variance in our estimates.

As in any iterative optimization scheme, each iteration requires a function evaluation (and usually a gradient, or even a Hessian, depending on the method). If chosen naïvely, f can take hours or longer to evaluate and differentiate; here we describe how to construct f so that this optimization becomes computationally tractable.

The key to efficiency is our low-order parameterization of illumination. By considering emitting and non-emitting sources separately, we treat our rendering as two sub-renders: one is a “full” render using the emitting luminaires (which are purely directional since we assume the light is at infinity), and the other is a diffuse-only render using all other incident light (reflected by non-emitters).

[Figure 3 panels: True (input image, normals); Starting point (rendered, normals); Optimized estimates (rendered, normals); Final result (true M (input), recovered M; original illumination, novel illumination)]

Figure 3: Results from our optimization procedure. On the left is the input image (top left), true surface normals (top right), and true illumination (below), followed to the right by the estimates with which we begin our optimization. The estimated rendering (using Eq 3), estimated surface normals, and estimated illumination are displayed in the third column. The rightmost column shows the true material (left) and our estimated material (right) rendered onto the true shape in the original lighting environment (top), and rendered in novel lighting (bottom). Our initialization is described in Sec 3.1 and uses no prior information about the input.
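The two sub-renders can be sketched per pixel as below. The diffuse-only part uses the closed-form irradiance from nine spherical harmonic coefficients (Ramamoorthi and Hanrahan's 2001 irradiance formula and constants); the emitting part is approximated by a weighted sum over sampled emitting directions, a discretization we assume here for illustration. As in the text, no explicit 1/π factor is applied before scaling by R_d. The names `sh_irradiance` and `render_pixel` are hypothetical.

```python
import numpy as np

# Constants from Ramamoorthi and Hanrahan, "An Efficient Representation for Irradiance
# Environment Maps" (2001), for irradiance from 9 SH coefficients.
C1, C2, C3, C4, C5 = 0.429043, 0.511664, 0.743125, 0.886227, 0.247708


def sh_irradiance(n, L):
    """Irradiance at unit normal n from lighting with SH coefficients L of shape (9, 3).

    Coefficient order: L00, L1-1, L10, L11, L2-2, L2-1, L20, L21, L22 (one column per RGB channel).
    """
    x, y, z = n
    return (C1 * L[8] * (x * x - y * y) + C3 * L[6] * z * z + C4 * L[0] - C5 * L[6]
            + 2.0 * C1 * (L[4] * x * y + L[7] * x * z + L[5] * y * z)
            + 2.0 * C2 * (L[3] * x + L[1] * y + L[2] * z))


def render_pixel(brdf_fn, normal, view, emit_dirs, emit_radiance, emit_weights, sh_L, rd):
    """Two sub-renders for one pixel: f = f_e + R_d * f_n.

    brdf_fn(wi, wo) -> RGB BRDF value (e.g. a material's BRDF at this pixel's normal).
    emit_dirs (K, 3), emit_radiance (K, 3), emit_weights (K,): quadrature samples of the emitting
    directions and their solid-angle weights (a hypothetical discretization of the luminaires).
    """
    f_e = np.zeros(3)
    for w, radiance, d_omega in zip(emit_dirs, emit_radiance, emit_weights):
        cos = max(float(np.dot(w, normal)), 0.0)
        if cos > 0.0:
            f_e += brdf_fn(w, view) * radiance * cos * d_omega  # full render over emitters
    f_n = sh_irradiance(normal, sh_L)                      # diffuse-only render from SH
    return f_e + np.asarray(rd, dtype=float) * f_n          # no explicit 1/pi, as in the text
```

For constant unit radiance (only L00 = 2√π nonzero) the SH irradiance evaluates to approximately π, which is a handy sanity check.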
Denoting by Ω_e and Ω_n the sets of “emitting” and “non-emitting” light directions respectively, by l(ω) the light traveling along direction ω, and by f_M the BRDF defined by material M, we write our rendering function as

$$\begin{aligned} f^{(e)}_i &= \int_{\Omega_e} f_M(\omega, v)\, l(\omega) \max(\omega \cdot N_i, 0)\, d\omega, \\ f^{(n)}_i &= \int_{\Omega_n} l(\omega) \max(\omega \cdot N_i, 0)\, d\omega, \\ f(M, N, \mathcal{L})_i &= f^{(e)}_i + R_d\, f^{(n)}_i, \end{aligned} \tag{3}$$

for the i-th image pixel and a particular view direction v. Notice that f_i^(n) is simply the irradiance over the non-emitting regions of the sphere, and is modulated by the diffuse reflectance (R_d) since Lambertian BRDFs are constant.

We can compute both of these efficiently, because Ω_e is typically small (most lighting directions are occupied by negligible Gaussian components), and it is well known that diffuse objects can be efficiently rendered through spherical harmonic projection [Ramamoorthi and Hanrahan 2001a]. In terms of previous notation, the directional sources L are used in the full render (f_i^(e)), and s is used for the diffuse-only render (f_i^(n)).

The intuition behind such a model is that indirect light contributes relatively low-frequency effects to an object's appearance, and approximating these effects leads to only slight perceivable differences [Ramamoorthi and Hanrahan 2001b]. A variation of this intuition is used for efficiently choosing samples in Monte Carlo ray tracing (e.g. importance sampling [Pharr and Humphreys 2010]), but such stochastic sampling causes problems for continuous optimization techniques, since rendering is then non-deterministic.

Material prior. The rigidity of our material model (5 parameters to describe the entire surface) is a strong implicit prior in itself, but we must also deal with the ambiguity that can exist between the diffuse and specular terms. For example, if the specular lobe (r) is large enough, then the specular albedo and diffuse albedo can be confused (e.g.
dark specular albedo/bright diffuse albedo may look the same as bright specular albedo/dark diffuse albedo). Thus, we add a simple term to discourage large specular lobes, persuading the diffuse component to pick up any ambiguity between it and the specular terms:

$$E_{\mathrm{mat}}(M) = \lambda_m r^2. \tag{4}$$

The only material parameter that is constrained is the specular lobe size, and we use λ_m = 1.

Illumination prior. We develop our illumination prior by collecting statistics from spherical HDR imagery found across the web (more details in Sec 5). Each image gives us a sample of real-world illumination, and to see how each sample relates to our illumination parameters, we fit our lighting model (SO(2) Gaussians + 2nd-order spherical harmonics) to each spherical image. Fitting is done using constrained non-linear least squares, and the number of Gaussians (corresponding roughly to luminaires) is determined by the number of peaks in the HDR image (smoothed to suppress noise). Priors are developed by clustering the Gaussian parameters and through principal component analysis on the spherical harmonic coefficients.

Denote by κ̄_j and β̄_j the means of the j-th clusters (clustered independently using k-means) of the concentration and ellipticalness parameters of the Gaussians from our fitting process. Intuitively, these cluster centers give a reasonable sense of the shape of luminaires found in typical lighting environments, and we encourage our estimated sources to have shape parameters similar to these:

$$E^{\mathrm{means}}_{\mathrm{illum}}(L_i) = S\big(\{|L^{(\kappa)}_i - \bar\kappa_j|\}_{j=1}^{k}\big) + S\big(\{|L^{(\beta)}_i - \bar\beta_j|\}_{j=1}^{k}\big), \tag{5}$$

where S is the softmin function (a differentiable approximation of min) and |·| is a differentiable approximation to the absolute value (e.g. √(x² + ε)).

We also find the principal components (per channel) of the spherical harmonic coefficients fit to our data.
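The material prior (Eq 4) and the cluster-means prior (Eq 5) are cheap to evaluate. The sketch below implements the softmin as a temperature-scaled log-sum-exp and |x| as √(x² + ε); the temperature and ε values, and the function names, are hypothetical defaults rather than values from the paper.

```python
import numpy as np


def smooth_abs(x, eps=1e-6):
    """Differentiable |x| approximation sqrt(x^2 + eps)."""
    return np.sqrt(np.asarray(x, dtype=float) ** 2 + eps)


def softmin(values, temperature=0.05):
    """Differentiable min: -t * log(sum(exp(-v / t))), computed stably; never exceeds min(values)."""
    z = -np.asarray(values, dtype=float) / temperature
    z_max = z.max()
    return -temperature * (z_max + np.log(np.exp(z - z_max).sum()))


def material_prior(r, lam_m=1.0):
    """Eq 4: penalize large specular lobes."""
    return lam_m * r ** 2


def means_prior(kappa_i, beta_i, kappa_centers, beta_centers, temperature=0.05):
    """Eq 5: pull a luminaire's (kappa, beta) toward the nearest k-means cluster center."""
    return (softmin(smooth_abs(kappa_i - np.asarray(kappa_centers)), temperature)
            + softmin(smooth_abs(beta_i - np.asarray(beta_centers)), temperature))
```

Lower temperatures make the softmin track the true minimum more closely but give sharper gradients, which is the usual tradeoff when smoothing a min for gradient-based optimization.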
During estimation, we reparameterize the estimated SH coefficients using weight vectors w_{r,g,b}, principal component matrices S_{r,g,b}, and the means of all fit SH components μ_{r,g,b}:

$$s(w) = [\,\mu_r + S_r w_r,\; \mu_g + S_g w_g,\; \mu_b + S_b w_b\,].$$

We impose a Laplacian prior on the weight vector:

$$E^{\mathrm{pca}}_{\mathrm{illum}}(w) = \sum_{i \in \text{weights}} |w_i|. \tag{6}$$

This coerces the recovered SH components to lie near the dataset mean and to slide along prominent directions in the data. We found that seven principal components (per channel) explained over 95% of the variance in our data (their eigenvalues account for more than 95% of the eigenvalue sum), so we discard the two components corresponding to the smallest eigenvalues (hence w_{r,g,b} ∈ ℝ^7 and S_{r,g,b} ∈ ℝ^{9×7}).

We also impose a gray world assumption, namely that each color channel should integrate (over the sphere of directions) to roughly the same value. Because we only have a single view of an object, some portions of the lighting sphere have significantly more influence than others; e.g. the hemisphere behind the object is mostly unseen and has smaller influence than the hemisphere in front. We weight the integration so that the dominant hemisphere has more influence, using W_θ = cos(θ − π/2), where θ is the angle between the view direction and the direction of integration. Because the spherical harmonic expansion is linear in its coefficients, this integration reduces to a simple inner product, making the prior easy to compute:

$$E^{\mathrm{gray}}_{\mathrm{illum}}(s) = \|G^\top s_r - G^\top s_g\| + \|G^\top s_g - G^\top s_b\| + \|G^\top s_r - G^\top s_b\|, \tag{7}$$

where G is the precomputed integral of the 2nd-order spherical harmonic basis functions (weighted by W_θ), and s_{r,g,b} are the current estimates of the spherical harmonic coefficients.
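The per-channel PCA reparameterization, the Laplacian prior of Eq 6, and the gray-world prior of Eq 7 can be sketched with NumPy as below. PCA is computed here via an SVD of mean-centered coefficients; G is assumed to be precomputed elsewhere, since the W_θ-weighted integrals are not reproduced, and all function names are hypothetical.

```python
import numpy as np


def fit_sh_pca(coeffs, n_components=7):
    """PCA of one channel's SH coefficients.

    coeffs: (num_environments, 9). Returns the mean (9,) and basis S (9, n_components)
    whose columns are the leading principal directions.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    mu = coeffs.mean(axis=0)
    _, _, vt = np.linalg.svd(coeffs - mu, full_matrices=False)
    return mu, vt[:n_components].T


def sh_from_weights(w, mus, bases):
    """s(w) = [mu_c + S_c w_c for c in r, g, b], returned as a (9, 3) matrix."""
    return np.stack([mus[c] + bases[c] @ w[c] for c in "rgb"], axis=1)


def pca_prior(w):
    """Eq 6: Laplacian (L1) prior on the PCA weights of all channels."""
    return float(sum(np.abs(w[c]).sum() for c in "rgb"))


def gray_world_prior(s, G):
    """Eq 7: G is a precomputed (9,) vector of W_theta-weighted SH basis integrals; s is (9, 3)."""
    r, g, b = (float(G @ s[:, k]) for k in range(3))
    return abs(r - g) + abs(g - b) + abs(r - b)
```

With w = 0 every channel sits at the dataset mean, so the Laplacian prior is zero there and grows only as the estimate moves along the principal directions.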
Our prior is then a weighted sum of these three terms:

$$E_{\mathrm{illum}}(\mathcal{L}) = \lambda_m \sum_{i=1}^{m} E^{\mathrm{means}}_{\mathrm{illum}}(L_i) + \lambda_p E^{\mathrm{pca}}_{\mathrm{illum}}(w) + \lambda_g E^{\mathrm{gray}}_{\mathrm{illum}}(s), \tag{8}$$

keeping in mind that ℒ = {L, s(w)}, with w the PCA weights described above. We set λ_m = λ_p = λ_g = 0.1.

Shape prior. We also optimize over a grid of surface normals, and impose typical shape-from-contour constraints: smoothness, integrable shape, and boundary normals perpendicular to the view direction. Let N_i = (N_i^x, N_i^y, N_i^z), and let N̂ be the set of normals perpendicular to the occluding contour and view direction. We write the prior as:

$$E_{\mathrm{shape}}(N) = \sum_{i \in \text{pixels}} \Big( \lambda_s\, \eta^s_i \|\nabla N_i\| + \lambda_I \Big\| \nabla_y \frac{N^x_i}{N^z_i} - \nabla_x \frac{N^y_i}{N^z_i} \Big\| \Big) + \lambda_c \sum_{c \in \text{boundary pixels}} \|N_c - \hat N_c\|, \tag{9}$$

where the first term encodes smoothness where the input is also smooth (modulated by image-dependent weights η), the second term enforces integrability, and the third ensures that boundary normals are perpendicular to the viewing direction. We set the weights to λ_s = 1 and λ_I = λ_c = 0.1.

3.1 Initialization

Initial estimates of shape come from a naïve shape-from-contour algorithm (surface assumed tangent to the view direction at the silhouette, and smooth elsewhere), and light is initialized with the mean of our dataset (if applicable, leaving out the illumination that generated the input image). We estimate an initial R_d by rendering irradiance with the initial estimates of shape and lighting, dividing the input image by this irradiance to get per-pixel albedo estimates, and averaging these estimates within each RGB channel; R_s and r are set to small constants (0.01 for our results). Full details of our initialization procedure can be found in the supplemental material.

3.2 Undoing estimation bias

Our low-parametric models tend to introduce bias into our estimates, but at the same time reduce estimation variance (i.e., the bias-variance tradeoff).
However, we have found that our priors produce a consistent estimation bias: we typically see a smaller specular lobe and specular albedo, most likely due to our material prior (Eq 4). We may also observe omitted-variable bias for images with materials not encoded by our model, which we do not address here.

Past methods point out that there are clear visual distinctions between different types and levels of gloss [Fleming et al. 2003; Sharan et al. 2008; Wills et al. 2009], and we use the input image coupled with our estimates to develop an “un-biasing” function. We develop a simple regression method (simple methods should suffice since the bias appears to be consistent), which works well for removing bias and produces improved results. Our goal is to find a linear prediction function that maps a vector of features to unbiased estimates of R_d, R_s, and r. Our features consist of our estimates of specular albedo and specular lobe size, as well as histogram features computed on the resulting rendered image, normal map, input image, and error image (rendered minus input); features are computed for both raw and gradient images. Given a set of results from our optimization technique with ground-truth material parameters (obtained, e.g., from our dataset in Sec 5), we compute a bias prediction function by solving an L1 regression problem (with L2 regularization). For more details, see the supplemental material.

4 Recovering spatially varying reflectance

We propose an extension of Eq 1 for estimating spatial mixtures of materials. First, we define our appearance model simply as a spatially varying linear combination of renderings. Radiance at pixel i is defined as

$$\sum_{j \in \text{materials}} m_{i,j}\, f(M_j, N_i, \mathcal{L}), \tag{10}$$

where m_{i,j} is the j-th mixture weight at pixel i, and M_j is the j-th material. The rendering error term for estimating spatial materials then becomes

$$E^{\mathrm{mix}}_{\mathrm{rend}}(M_1, \dots, M_k, N, \mathcal{L}, m) = \sum_{i \in \text{pixels}} \sigma^{\mathrm{rend}}_i \Big\| I_i - \sum_{j \in \text{materials}} m_{i,j}\, f(M_j, N_i, \mathcal{L}) \Big\|^2. \tag{11}$$

We define three properties that the spatial maps (m) must adhere to: unity, firmness⁵, and smoothness. First, the unity prior ensures that the mixture weights are nonnegative and sum to one at every pixel:

$$\forall i, j:\; m_{i,j} \ge 0, \qquad \forall i:\; \sum_j m_{i,j} = 1. \tag{12}$$

As noted by Goldman et al. [Goldman et al. 2010], this prevents overfitting and removes certain ambiguities during estimation.

We place another prior on the “firmness” of our mixture maps. For certain objects, many patches on the surface are dominated by a single material (e.g. a checkerboard); for others, the surface is roughly uniform over space (e.g. soap can be made of a diffuse layer and a glossy film that are both present over the whole surface); there are even materials ranging in between (e.g. marble). We would like a structured way of controlling which type of spatial mixture we produce, and we do so by imposing an exponential prior on each mixture element:

$$E^{\mathrm{mix}}_{\mathrm{firm}}(m) = \sum_{i,j} m_{i,j}^{\alpha}, \tag{13}$$

where α > 0 controls how firm a mixture will be. For example, with the unity constraint, α > 1 encourages uniform mixture weights (not firm, e.g. soap), and α < 1 encourages mixture weights to be near zero or one (firm, e.g. checkerboard). For results in this paper, we use α = 0.5. Notice that for α < 1 this function is no longer convex, although in practice our optimization still seems to fare well. Our prior is more general (and controllable) than that of Goldman et al. [Goldman et al. 2010], which assumes that each pixel is a linear combination of at most two materials.
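The mixture appearance model (Eq 10), the unity constraint (Eq 12), and the firmness prior (Eq 13) can be sketched as follows. The paper handles the constraints inside a constrained quasi-Newton solver; the sort-based Euclidean projection onto the probability simplex used here is a standard alternative we swap in for illustration, and the names `mix_radiance`, `project_to_simplex`, and `firmness` are hypothetical.

```python
import numpy as np


def mix_radiance(m, renders):
    """Eq 10: m (P, k) mixture weights, renders (k, P, 3) per-material renderings -> (P, 3)."""
    return np.einsum("pk,kpc->pc", m, renders)


def project_to_simplex(m):
    """Euclidean projection of each row of m onto {x >= 0, sum(x) = 1} (the unity constraint, Eq 12)."""
    m = np.asarray(m, dtype=float)
    u = np.sort(m, axis=-1)[..., ::-1]                          # each row in descending order
    css = np.cumsum(u, axis=-1) - 1.0
    idx = np.arange(1, m.shape[-1] + 1)
    rho = np.sum(u - css / idx > 0, axis=-1, keepdims=True)     # size of the positive support
    theta = np.take_along_axis(css, rho - 1, axis=-1) / rho     # per-row shift
    return np.maximum(m - theta, 0.0)


def firmness(m, alpha=0.5):
    """Eq 13: sum of m_ij^alpha; alpha < 1 favors weights near 0 or 1."""
    return float(np.sum(np.asarray(m, dtype=float) ** alpha))
```

With α = 0.5, a decisive pixel [1, 0] scores 1 while an even split [0.5, 0.5] scores about 1.41, so minimizing the prior pushes weights toward 0 or 1.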
Finally, we encourage spatial smoothness of the mixtures, as nearly all mixed materials contain spatial structure:

$$E^{\mathrm{mix}}_{\mathrm{smooth}}(m) = \sum_{i,j} \big( \|\nabla_x m_{i,j}\| + \|\nabla_y m_{i,j}\| \big), \tag{14}$$

where ∇_x and ∇_y are spatial gradient operators in the image domain.

⁵ We define the firmness prior as the decisiveness of mixture weights to snap to 0 or 1; in this sense it has no relation to tactile properties.

By inserting our new rendering term and mixture priors into the objective function for single materials (Eq 1), we define a new optimization problem for estimating spatially varying materials:

$$\begin{aligned} \operatorname*{argmin}_{M_1, \dots, M_k,\, N,\, \mathcal{L},\, m}\;& E^{\mathrm{mix}}_{\mathrm{rend}}(M_1, \dots, M_k, N, \mathcal{L}, m) + E_{\mathrm{mat}}(M) + E_{\mathrm{illum}}(\mathcal{L}) + E_{\mathrm{shape}}(N) + E^{\mathrm{mix}}_{\mathrm{firm}}(m) + E^{\mathrm{mix}}_{\mathrm{smooth}}(m), \\ \text{subject to: }\;& 0 \le M^{(i)}_j \le 1,\; i \in \{1, \dots, 5\},\; \forall j; \qquad \forall i, j:\; m_{i,j} \ge 0; \qquad \forall i:\; \sum_j m_{i,j} = 1. \end{aligned} \tag{15}$$

Solving this objective function can be difficult, but we have had success using constrained quasi-Newton methods (L-BFGS Hessian approximation). Our optimization results in a decomposition of the input image into k materials M, a set of per-pixel weights m for each material, per-pixel surface normals N, and illumination parameters ℒ. In this work, we focus on the correctness of our mixture materials and their applications.

5 Experiments

We evaluate the results of our method for objects with homogeneous (spatially uniform) materials in Section 5.1, and our inhomogeneous (spatially varying) material results in Section 5.2. We report results both for a dataset we collected containing ground-truth material information and for the Drexel Natural Illumination dataset.

5.1 Homogeneous materials

For evaluation and for training our bias predictors (Sec 3.2), we have collected a dataset consisting of 400 images rendered with real-world shapes, materials, and illumination environments (all chosen from well-established benchmark datasets).
We use the 20 ground truth shapes available in the MIT Intrinsic Image dataset [Grosse et al. 2009], and render each of these objects with 20 of the materials approximated from the MERL BRDF dataset [Matusik et al. 2003], for a total of 400 images. We use 100 different illumination environments (50 indoor, 50 outdoor) found across the web, primarily from the well-known ICT light probe gallery⁶ and the sIBL archive⁷. We ensure that each object is rendered in 10 unique indoor and 10 unique outdoor lighting environments, permuted such that each illumination environment is used exactly four times throughout the dataset (400 renderings spread over 100 environments). Each lighting environment is white balanced and has the same mean (per channel).

Our dataset has two “versions.” The first version of our dataset (fit dataset) is rendered using our low-order reflectance model (we approximate MERL BRDFs by fitting our own 5-parameter material model to the measured data, and render using our fits). The resulting images are highly realistic, and allow us both to compare our material estimates with ground truth and to regress bias prediction functions (as in Sec 3.2). The second version (measured dataset) is rendered using only measured BRDFs (from the MERL dataset); these images are truly realistic, as the shape, material, and lighting are all sampled directly from real-world data. Furthermore, these images are synthesized using a physical renderer and thus include shadows and bounced light. This dataset gauges how well our method can generalize to real images and reflectances not encoded by our model.

⁶ http://gl.ict.usc.edu/Data/HighResProbes
⁷ http://www.hdrlabs.com/sibl/archive.html

Results. We generate results using the optimization procedure described in Sec 3, followed by our bias regression method as in Sec 3.2. Bias prediction functions are learned through leave-one-out cross validation.
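As a rough illustration of leave-one-out training of bias predictors, the sketch below corrects one material parameter (say R_s) across a set of images. Sec 3.2, which defines the actual bias predictors, is not part of this section, so the affine least-squares form used here is an assumption chosen for simplicity; the clip to [0, 1] mirrors the parameter bounds in Eq 15, and the data are toy values.

```python
import numpy as np


def loo_bias_correct(estimates, truths):
    """Leave-one-out bias correction for a single material parameter.

    For each image n, an affine map truth ~ a * estimate + b is fit by least
    squares on every other image and then applied to image n's estimate, so no
    image is corrected by a predictor trained on its own ground truth.
    """
    estimates = np.asarray(estimates, dtype=float)
    truths = np.asarray(truths, dtype=float)
    corrected = np.empty_like(estimates)
    for n in range(len(estimates)):
        keep = np.arange(len(estimates)) != n  # hold out image n
        design = np.column_stack([estimates[keep], np.ones(keep.sum())])
        (a, b), *_ = np.linalg.lstsq(design, truths[keep], rcond=None)
        corrected[n] = a * estimates[n] + b
    return np.clip(corrected, 0.0, 1.0)  # material parameters are bounded to [0, 1]


# Toy data: the optimizer systematically overestimates a specular parameter.
truths = np.array([0.02, 0.05, 0.08, 0.11, 0.14])
estimates = 1.5 * truths + 0.01
corrected = loo_bias_correct(estimates, truths)
print(np.abs(estimates - truths).mean(), np.abs(corrected - truths).mean())
```

Because the toy bias is exactly affine, each held-out estimate is mapped back onto its true value; with real estimates the held-out error would only shrink to the extent that the bias is predictable from the other images.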
In this section, we report results from our optimization technique (blind optimized) and after bias regression (optimized+regressed). For comparison, we compute a baseline material estimate that sets R_d by averaging the image pixels in each channel and sets R_s and r to their averages over our material dataset; we then regress and apply bias predictors to these baseline estimates (baseline+regressed). We also compare to materials obtained by our optimization assuming the shape and illumination are known⁸ and fixed (known shape+light); hence only the material is optimized. Results using this procedure gauge the difficulty of our optimization problem, and show how much our optimization could improve with more sophisticated initialization procedures.

On our fit dataset, our full method (optimized+regressed) is capable of recovering highly accurate material parameters. Figure 4 plots the true material from our “fit” dataset against our estimated parameters for each of the 400 images. A perfect material estimate would lie along the diagonal (dashed line). Overall, we see a linear trend in our diffuse results, and our bias regression significantly improves our optimized estimates of specular reflectance and specular lobe size (even beyond known shape+light).

We also develop two ways of measuring visual error in our materials. We define original illumination error as the average pixel error from comparing the input image with the image produced by rendering our estimated material onto the true shape in the true lighting (which are known for all images in our dataset). This is a harsh test, as any errors in material must manifest themselves once rendered with the true shape and light.
The second metric (cross rendered) is even more telling: we compare renderings of the input object with (a) the true material and (b) our estimated material in six novel illumination environments not present in our dataset, and compute the average pixel error. This measure exposes material errors across unique, unseen illumination.

Using these measures, our full method achieves low error for both versions of our dataset. Figure 5 shows these error measures for three different metrics (per-pixel L2 and L1 norms, and absolute log difference), and optimized+regressed performs best overall for both datasets. This indicates that both our optimization and regression are crucial components, and that neither one dominates inference, since optimized+regressed consistently outperforms baseline+regressed. Known shape+light also performs well, indicating that our optimization procedure might improve if better initializations are available.

In many cases our method does very well at visually reproducing both measured and fit reflectances, even in novel illuminations. We show qualitative results for both versions of our dataset in Figs 6 and 7; these are some of our best and median results. Our material estimates are typically visually accurate in original and novel illumination, even for many of the measured BRDFs in our measured dataset. We also observe that our regression generally helps for both datasets, indicating that our learned bias predictors may generalize to complex materials and real-world images. However, our results clearly degrade for complex reflectance functions that lie well outside our model (Fig 7, measured dataset, columns 1 and 4). Finally, we demonstrate our method's capability on real images from the Drexel Natural Illumination dataset in Fig 8.
Our model appears somewhat robust to spatially varying reflectance in these images, but suffers from the complexity of the imaged reflectances and from our assumption that only a single material is present; this suggests ideas for future work.

⁸ Known lighting is fit to our parameterization and may still be some distance from ground truth.

[Figure 4: three scatter plots (Diffuse reflectance, Specular reflectance, Specular lobe size) of estimated against true values; series: baseline, known shape+light, blind optimized, optimized+regressed, true values.]
Figure 4: Errors in material estimates for each image in our dataset. Each plot shows the true material value on the horizontal axis plotted against our estimate of diffuse reflectance (R_d), specular reflectance (R_s), and specular lobe size (r) (left to right). We show the results for our baseline, the material produced given accurate initial shape and lighting (known shape+light), our blind optimization technique (blind optimized), and the material regressed by un-biasing our optimization results (optimized+regressed); details in Sec 5.

[Figure 5: mean pixelwise error (L2, L1, log) for the fit dataset and the measured dataset; series: baseline, baseline+regressed, blind optimized, optimized+regressed, and known shape+light, each in original (orig) and cross-rendered (cross) illumination.]
Figure 5: We compare the average per-pixel error between the input image and a re-rendered image with estimated material (but with the true shape and true lighting that produced the input image) for various techniques and for both versions of our dataset; see Sec 5 for details. We compute errors in the original illumination (orig), and averaged over six novel illumination environments (cross), for three different metrics: L2 and L1 norm, and the absolute log difference, and show the mean over the dataset (error bars indicate one standard deviation). Our full method (optimized+regressed) achieves low error relative to the others. We also observe similar (yet slightly worse) error on our measured dataset, indicating that, for a variety of cases, (a) our method can handle real-world materials, and (b) our material model is capable of visually reproducing complex reflectance functions.

5.2 Inhomogeneous materials

For ground truth evaluation, we again use the measured dataset. We use our mixture estimation procedure to estimate k ∈ {2, 3} materials⁹ for each dataset image, and compare to the results of our method for k = 1.
For additional comparison, we compute a baseline material estimate by clustering the image into k components (using k-means), computing the diffuse albedo of each component by averaging its pixels in each channel, and fixing the specular components to a small yet reasonable value. We measure error by rendering our estimated material onto the true shape in the true lighting (which are known for all images in the dataset) and comparing this rendering to the input image. We perform the same test for six novel lighting environments not found in the dataset (i.e., estimated material versus true material in novel light). We denote these as “orig” and “cross” lighting, respectively. These are harsh tests of generalization, as any errors in material must manifest themselves once rendered with the true shape and light, and the “cross” measure exposes material errors across unique and unseen illumination. Fig 10 shows quantitative results averaged over the entire dataset for L2, L1, and absolute log difference error metrics. Our mixture materials (optimized-{2, 3}) consistently outperform single material estimation (optimized-1), and are always better than the baseline estimates.

We observe a similar trend in our qualitative results (Fig 9). Because we are attempting to estimate true, measured BRDFs which may lie outside of our 5-parameter material model, estimation may not work well with a single material. However, by adding multiple materials, we typically get improved results, even in novel illumination. This indicates that our mixture weights are typically robust to shading artifacts such as shadows and specularities.

⁹ We use a spatial mixture even for homogeneous materials because our mixture maps generalize the current literature: they capture spatial variation in material (as in [Goldman et al. 2010]), but we also use them to encode any kind of surface variation not well captured due to long-standing shape-from-shading (SFS) assumptions.
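The k-means baseline and the error measures described above can be sketched as follows. The fixed specular values (R_s = 0.05, r = 0.01), the brightness-sorted initialization, and the exact form of each metric (per-pixel Euclidean norm over RGB for L2, sum of absolute channel differences for L1, and a mean absolute difference of logarithms with a small ε) are assumptions of this sketch; the text specifies only the metric names. With k = 1 the baseline reduces to per-channel averaging, as in the homogeneous baseline of Sec 5.1 (where R_s and r instead come from dataset averages).

```python
import numpy as np


def _nearest(pixels, centers):
    """Index of the closest center (Euclidean distance in RGB) for every pixel."""
    dists = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=-1)
    return dists.argmin(axis=1)


def kmeans_baseline(pixels, k, iters=20, specular=(0.05, 0.01)):
    """Cluster object pixels (P, 3) into k groups; return labels and (k, 5) materials.

    Each material row is (R_d red, R_d green, R_d blue, R_s, r); the specular
    pair is a hypothetical placeholder for the "small yet reasonable" value.
    """
    pixels = np.asarray(pixels, dtype=float)
    order = np.argsort(pixels.sum(axis=1), kind="stable")  # darkest to brightest
    seeds = order[np.linspace(0, len(pixels) - 1, k).astype(int)]
    centers = pixels[seeds].copy()
    for _ in range(iters):  # Lloyd iterations
        labels = _nearest(pixels, centers)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    labels = _nearest(pixels, centers)
    materials = np.column_stack([centers, np.tile(specular, (k, 1))])
    return labels, materials


def pixel_errors(rendered, reference, eps=1e-6):
    """Mean per-pixel L2, L1, and absolute log-difference errors between two RGB images."""
    rendered = np.asarray(rendered, dtype=float)
    reference = np.asarray(reference, dtype=float)
    diff = rendered - reference
    l2 = float(np.linalg.norm(diff, axis=-1).mean())
    l1 = float(np.abs(diff).sum(axis=-1).mean())
    log_diff = float(np.abs(np.log(rendered + eps) - np.log(reference + eps)).mean())
    return l2, l1, log_diff


# Toy object with a reddish and a bluish region.
pixels = np.array([[0.8, 0.1, 0.1]] * 3 + [[0.1, 0.1, 0.4]] * 3)
labels, materials = kmeans_baseline(pixels, k=2)
print(labels)
print(materials)
reference = np.full((2, 2, 3), 0.5)
print(pixel_errors(reference + 0.1, reference))
```

In the full evaluation, `pixel_errors` would be applied to renderings of the estimated and true materials on the true shape, once in the original lighting and once per novel environment.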
It is clear that adding more components helps, although the distinction between k = 2 and k = 3 is subtle (both qualitatively and quantitatively).

[Figure 6: fit dataset and measured dataset, best and median results, rendered in original and novel illumination; columns true, opt+reg, opt.]
Figure 6: Qualitative results on both versions of our dataset. Materials are estimated using our blind optimized (opt) and optimized+regressed (reg) methods, and compared to ground truth (true). The true original illumination image is also the input for estimating material. Notice that our technique can recover both glossy and matte materials, and performs well even for these complex shapes. Our method attains visually pleasing results even for complex reflectance functions not encoded by our model (e.g. measured dataset), even in new lighting conditions.

[Figure 7: median, true, and best renderings (opt and reg) for fit dataset materials and measured dataset materials.]
Figure 7: Comparison of estimated materials rendered in novel lighting. The true materials lie on the middle row alongside our per-material best and median optimized (opt) and regressed (reg) results; arrows indicate the direction in which materials should improve. We achieve very good results for input images that are well described by our model in the fit dataset (rows 2 and 4 generally look like row 3), and even in many cases for measured BRDFs. However, low-order model bias prevents our method from capturing certain materials well (e.g. column 4, measured dataset).

[Figure 8: true and estimated renderings in the original light and in a novel light for several objects.]
Figure 8: Results on real data from the Drexel Natural Illumination dataset. This dataset contains real images and corresponding ground truth shape and lighting information. We estimate materials from one picture, and render the material using the true shape and light for the original illumination and for another illumination from the dataset (novel light); we compare to the real picture of the object in both scenes (original and novel). Even in the presence of slight spatial variation (e.g. top left; apple) and complex reflectance (top middle), our method can still recover decent estimates. Still, addressing these issues is key to generalizing our method's applicability.

[Figure 9: original-light and novel-light renderings with 1, 2, and 3 materials, alongside the true image.]
Figure 9: Best (top row) and median (bottom row) results from the “measured dataset”, which contains physically rendered objects with measured BRDFs. Typically these materials are not well encoded by our low-order material model with 1 mixture component, but increasing the number of mixture components improves re-rendering error. We show our estimated materials for one, two, and three mixture components, and compare these to the ground truth result (also the input image) in both the original and novel illumination environments.

[Figure 10: pixelwise error (L2, L1, log) for an increasing number of components; series: baseline-1, baseline-2, baseline-3, optimized-1, optimized-2, optimized-3, each in original (orig) and cross-rendered (cross) illumination.]
Figure 10: Quantitative results on our “measured” dataset. Our mixture materials (optimized-{2, 3}) consistently outperform single material estimation (optimized-1); see text for details.

6 Applications

Once we have decomposed an image into its materials and spatial mixing weights, we can apply this intrinsic material information to new surfaces as in Fig 1. Applying the materials (microstructure) to a novel object is straightforward, but transferring the mixture weights (macrostructure) can be challenging in certain cases (e.g. when a mapping from one surface to another is not easily computed). We propose a straightforward solution: choose a small patch of the mixture-weight image that is nearly fronto-parallel (determined from our predicted surface normals, to avoid foreshortening), and synthesize a larger texture (seeded with the small patch) using existing methods, e.g. [Efros and Leung 1999]. Then, map the surface of the object that the material will be transferred to onto a plane (also using existing methods, e.g. [Sheffer et al. 2006]); this mapping defines correspondences between the synthesized mixture weights and the new mesh. We generate all of our transfer/generation results using this technique, and more sophisticated methods are clear directions for future work.

We also propose a generative material modeling strategy: besides transferring a complete mixture material, we can combine estimates from multiple images to create new materials (e.g. materials from one image and mixture weights from another, and so on). Generative results (as well as direct transfer results) are shown in Fig 12. We have decomposed four swatches from our dataset (all unique colors and mediums, and spanning the three illumination environments in our dataset) using k = 2 mixture components. We apply each set of materials to each synthesized mixture, and render the result onto spheres. We assert that our estimated materials correspond to microstructure and mixing weights correspond to macrostructure, which appears correct for these results (microstructure varies vertically, macrostructure horizontally).
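The patch-selection step of the transfer procedure above can be sketched as follows. The text only says the patch should be nearly fronto-parallel according to the predicted normals; scoring each square patch by its mean n_z (the cosine between the normal and an assumed +z viewing direction), requiring the patch to lie inside an object mask, and the exhaustive search are assumptions of this sketch. The texture synthesis [Efros and Leung 1999] and surface parameterization [Sheffer et al. 2006] steps are not shown.

```python
import numpy as np


def most_frontal_patch(normals, mask, size):
    """Return ((y, x), score) for the size x size patch with the largest mean n_z.

    normals: (H, W, 3) unit surface normals in camera coordinates; the viewing
             direction is assumed to be +z, so n_z = 1 means facing the camera.
    mask:    (H, W) boolean object mask; candidate patches must lie fully inside it.
    """
    H, W, _ = normals.shape
    best, best_score = None, -np.inf
    for y in range(H - size + 1):
        for x in range(W - size + 1):
            if not mask[y:y + size, x:x + size].all():
                continue  # skip patches that leave the object
            score = float(normals[y:y + size, x:x + size, 2].mean())
            if score > best_score:
                best, best_score = (y, x), score
    return best, best_score


# Hypothetical input: the normal map of a sphere seen head-on.
n = 21
v, u = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n), indexing="ij")
mask = u ** 2 + v ** 2 < 1
nz = np.sqrt(np.clip(1 - u ** 2 - v ** 2, 0, None))
normals = np.stack([u, v, nz], axis=-1)
corner, score = most_frontal_patch(normals, mask, size=5)
print(corner, round(score, 3))
```

On the sphere, the selected patch is the one centered on the point facing the camera, where foreshortening is smallest. For large images, the inner loop could be replaced by a box filter or integral image over n_z without changing the result.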
7 Conclusion

We have demonstrated a new technique for estimating spatially varying parametric materials from an image of a single object of unknown shape in unknown illumination, going beyond the typical Lambertian assumptions made by existing shape-from-shading techniques. Strong priors and low-order parameterizations of lighting and material are key in providing enough constraints to make this inference tractable. Such rigid parameterizations often lead to estimation bias, and we also present a simple yet powerful technique for removing this bias.

[Figure 11: top row: input, rendered result, and mixing weights; bottom row: input, synthesized material, Material 1, Material 2, and mixing weights.]
Figure 11: Failure examples. The top row demonstrates an incorrect mixture map estimate: specularities have been detected as a separate material. A material transfer result is shown on the bottom, but our material model contains no mesostructure and the result appears flat.

Our results suggest that material recovery is not necessarily dependent upon the joint recovery of accurate shape and illumination; as long as the shape and illumination are consistent with each other, materials can still be robustly estimated. This is encouraging from a material inference standpoint, as even the best shape-from-shading algorithms still produce flawed estimates in many scenarios. As far as we know, our method is the first to estimate parametric material models without assuming shape or illumination is known a priori. We believe that our method provides good initial evidence that solving this problem is in fact feasible, and provides a foundation for estimating materials from photographs alone.

Our decompositions can be transferred to new shapes, imbuing them with an appearance similar to the input image. Furthermore, our decompositions are also generative, and can be used to create new materials by simultaneously transferring decompositions from multiple objects (e.g. mixing weights from one, materials from another).
Our re-rendering results do not incorporate any information from our estimated surface normals, and the spatial frequency of our mixture weights is defined by the input image resolution (some artifacts are visible in Fig 1); intelligently incorporating and upsampling these estimates are reasonable directions for future work.

[Figure 12: grid of sphere renderings; swatch labels glass, base, texture, fiber, gloss; material labels blue-glass, gray-base, red-texture, red, gray, blue.]
Figure 12: Material transfer and generation for several material “swatches” (hemispheres painted with different colors/mediums/coats). We decompose single images (on left) into two material components and a spatial mixture map. Then, we synthesize new materials by taking all combinations of the inferred materials and the derived mixture weights, and render these combinations onto spheres in novel illumination (using LuxRender: http://luxrender.net). Images along the diagonal show a transfer material result for a given picture on the left. The off-diagonals show the generative capabilities of our algorithm: by combining multiple decompositions (materials + mixing weights), we can generate new, unseen materials. We expect that full 3D textures will give better results, but it is currently impossible to estimate 3D textures from a single picture. Best viewed in color at high resolution.

References

Adelson, E. H. 2000. Lightness perception and lightness illusion. In New Cognitive Neurosciences, 339–351.
Alldrin, N., and Kriegman, D. 2007. Toward reconstructing surfaces with arbitrary isotropic reflectance: A stratified photometric stereo approach. In ICCV.
Alldrin, N. G., Zickler, T., and Kriegman, D. 2008. Photometric stereo with non-parametric and spatially-varying reflectance. In CVPR.
Barron, J. T., and Malik, J. 2012. Color constancy, intrinsic images, and shape estimation. In ECCV.
Barron, J. T., and Malik, J. 2012. Shape, albedo, and illumination from a single image of an unknown object. In CVPR.
Chandraker, M., and Ramamoorthi, R. 2011. What An Image Reveals About Material Reflectance. In ICCV.
Dror, R. O., Willsky, A. S., and Adelson, E. H. 2004. Statistical characterization of real-world illumination. J Vis 4, 9, 821–837.
Efros, A. A., and Leung, T. K. 1999. Texture synthesis by non-parametric sampling. In ICCV.
Fleming, R. W. W., Dror, R. O., and Adelson, E. H. 2003. Real-world illumination and the perception of surface reflectance properties. J Vis 3, 5, 347–368.
Goldman, D., Curless, B., Hertzmann, A., and Seitz, S. 2010. Shape and spatially-varying BRDFs from photometric stereo. IEEE TPAMI 32, 6, 1060–1071.
Grosse, R., Johnson, M. K., Adelson, E. H., and Freeman, W. T. 2009. Ground-truth dataset and baseline evaluations for intrinsic image algorithms. In ICCV.
Lombardi, S., and Nishino, K. 2012. Reflectance and Natural Illumination from a Single Image. In ECCV.
Matusik, W., Pfister, H., Brand, M., and McMillan, L. 2003. A data-driven reflectance model. ACM Transactions on Graphics 22, 3 (July), 759–769.
Ngan, A., Durand, F., and Matusik, W. 2005. Experimental analysis of BRDF models. In Proceedings of the Eurographics Symposium on Rendering, 117–226.
Oxholm, G., and Nishino, K. 2012. Shape and Reflectance from Natural Illumination. In ECCV.
Pharr, M., and Humphreys, G. 2010. Physically Based Rendering, Second Edition: From Theory To Implementation, 2nd ed. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
Ramamoorthi, R., and Hanrahan, P. 2001. An efficient representation for irradiance environment maps. In ACM SIGGRAPH.
Ramamoorthi, R., and Hanrahan, P. 2001. A signal-processing framework for inverse rendering. In ACM SIGGRAPH.
Romeiro, F., and Zickler, T. 2010. Blind reflectometry. In ECCV.
Romeiro, F., Vasilyev, Y., and Zickler, T. 2008. Passive reflectometry. In ECCV, 859–872.
Schlick, C. 1994. An Inexpensive BRDF Model for Physically-based Rendering. Computer Graphics Forum 13, 233–246.
Sharan, L., Li, Y., Motoyoshi, I., Nishida, S., and Adelson, E. H. 2008. Image statistics for surface reflectance perception. J. Opt. Soc. Am. A 25, 4 (Apr), 846–865.
Sheffer, A., Praun, E., and Rose, K. 2006. Mesh parameterization methods and their applications. Found. Trends. Comput. Graph. Vis. 2, 2 (Jan.), 105–171.
Shi, B., Tan, P., Matsushita, Y., and Ikeuchi, K. 2012. Elevation angle from reflectance monotonicity: photometric stereo for general isotropic reflectances. In ECCV, 455–468.
Wills, J., Agarwal, S., Kriegman, D., and Belongie, S. 2009. Toward a perceptual space for gloss. ACM Transactions on Graphics 28, 4, 1–15.
