Convolutional Sparse Coding for High Dynamic Range Imaging


Authors: Ana Serrano, Felix Heide, Diego Gutierrez, Gordon Wetzstein, Belen Masia

EUROGRAPHICS 2016 / J. Jorge and M. Lin (Guest Editors), Volume 35 (2016), Number 2

Ana Serrano¹, Felix Heide², Diego Gutierrez¹, Gordon Wetzstein², Belen Masia¹,³
¹Universidad de Zaragoza  ²Stanford University  ³MPI Informatik

Figure 1: High dynamic range image (HDRI) recovered from a single, coded, 8-bit low dynamic range (LDR) image using the proposed sparse reconstruction method. Left: HDR image recovered with our framework, tonemapped for display purposes. The inset shows a cropped region of the coded LDR image used as input to the reconstruction algorithm. Center left: close-up of two exposures of the reconstructed HDR image, showing the ability of our method to reconstruct an extended dynamic range. Center right: normalized luminance plots of the marked scanline (yellow line, rotated by 90°) for the reconstructed image (green curve) and the ground truth image (blue curve). Right: false color image of the reconstructed HDR scene (scale is in stops), showing the extremely large dynamic range that the original scene had and that our technique is able to recover.

Abstract

Current HDR acquisition techniques are based on either (i) fusing multibracketed, low dynamic range (LDR) images, (ii) modifying existing hardware and capturing different exposures simultaneously with multiple sensors, or (iii) reconstructing a single image with spatially-varying pixel exposures. In this paper, we propose a novel algorithm to recover high-quality HDR images from a single, coded exposure. The proposed reconstruction method builds on recently-introduced ideas of convolutional sparse coding (CSC); this paper demonstrates how to make CSC practical for HDR imaging.
We demonstrate that the proposed algorithm achieves higher-quality reconstructions than alternative methods, we evaluate optical coding schemes, analyze algorithmic parameters, and build a prototype coded HDR camera that demonstrates the utility of convolutional sparse HDRI coding with a custom hardware platform.

Categories and Subject Descriptors (according to ACM CCS): I.4.1 [Image Processing and Computer Vision]: Digitization and Image Capture

© 2016 The Author(s). Computer Graphics Forum © 2016 The Eurographics Association and John Wiley & Sons Ltd. Published by John Wiley & Sons Ltd.

1. Introduction

One of the fundamental characteristics of a sensor is its dynamic range: the interplay of full-well capacity, noise, and analog-to-digital conversion. The ability to simultaneously record and distinguish very low signals alongside extremely bright scene parts is critical for many applications in scientific imaging, microscopy, and also consumer photography. Unfortunately, the hardware capabilities of available image sensors are insufficient to capture the wide range of intensities observed in natural scenes. This has motivated researchers to develop computational imaging techniques to overcome the dynamic range constraints of sensor hardware by co-designing image capture mechanisms and post-processing algorithms.

Today, high dynamic range (HDR) photography is well-established and usually done via one of three general approaches: sequentially capturing and subsequently fusing multiple different exposures (e.g., [MP95, DM97]), capturing different exposures simultaneously with multiple sensors (e.g., [TKTS11]), or coding per-pixel or per-scanline exposures within a single image with appropriate reconstruction algorithms [NM00, NN02, GHMN10, WIH10, KGBU13, HST∗14, ZSFC∗15].
Whereas sequential image capture is easily afforded by existing cameras, this method makes it challenging to capture dynamic scenes and usually requires additional motion stabilization and de-ghosting techniques. Multi-sensor solutions are elegant, but more expensive, and they require precise calibration. In this paper, we advocate for coded pixel exposure techniques and propose a new reconstruction algorithm for this class of computational cameras. Our approach builds on recent advances in convolutional sparse coding and reconstruction techniques. We show that a naïve application of traditional, patch-based (i.e., non-convolutional) sparse reconstruction techniques [LBRN07, CENR10] struggles to deliver high image quality for high-contrast scenes. We make the key observation that convolutional sparse coding (CSC) (e.g., [KSlB∗10]) is particularly well-suited for the type of high-contrast signals present in HDR images. Therefore, we pose the HDR recovery problem as a convolutional sparse coding problem and derive the necessary formulations to solve it efficiently. We make the following contributions:

• We introduce convolutional sparse coding (CSC) for high dynamic range image reconstruction.
• We propose forward and inverse methods that are tailored to recovering a high-contrast (HDR) image from a single, coded exposure photograph.
• We demonstrate improved image quality over other existing approaches and over a naïve application of sparse reconstruction techniques to HDRI. We also evaluate algorithmic parameters, analyze different exposure coding schemes, and interpret HDR image features.
• We build a prototype coded exposure camera and demonstrate the utility of our algorithm using data captured with this prototype.

2. Related Work

One of the most common techniques to compute HDR images is exposure bracketing.
This technique, also known as multi-bracketing, merges several LDR images of the scene, taken with different bracketed exposures, into the final HDR image [MP95, DM97]. One of the main drawbacks of this technique is that, if either the camera or some scene elements move during the extended capture process, ghosting artifacts appear. Many algorithms have been designed to remove these artifacts by means of alignment and de-ghosting [SS12]. Some recent works include the use of optical flow [ZBW11], patch-based reconstruction [GGC∗09, SKY∗12], or modeling the noise distribution of color values [GKTT13]. The problem is further aggravated for HDR video (e.g., [KSB∗13, MG11]): on the one hand, optical flow solutions fail in the presence of complex motion; on the other hand, patch-based methods lack built-in temporal coherence. In contrast, the proposed convolutional sparse coding approach can produce an HDR image from a single shot, thus removing the need for alignment, motion estimation or, in general, any de-ghosting strategy.

Other works rely on multiple cameras [SBB14, BRG∗14], enhanced sensor control electronics performed in simulation [PZJ13], or otherwise highly modified hardware designs [MRK∗13, ZSFC∗15]. For instance, Tocci et al. [TKTS11] and Kronander et al. [KGBU13] achieve single-shot HDR by acquiring several LDR images with different sensors using a beam splitter. Our method uses an off-the-shelf camera with a simple mask on the sensor or a per-pixel coded exposure, which greatly reduces complexity, size, and overall cost.

Previously proposed single-shot approaches rely on exposures that vary per image scanline, for example implemented with coded electronic shutters [CKL14], or sensors which allow different gain settings simultaneously for alternating pixel rows [GHMN10, HKU14, HST∗14].
In all of these cases, an image is reconstructed using sophisticated interpolation methods, often relying on additional image priors. These methods present a trade-off between the dynamic range that can be recovered with only two different exposures and the quality of the final reconstruction, determined by how far apart the exposures are chosen. Other spatially-varying gain methods aim at capturing increased dynamic range from a single image using per-pixel coded exposures. Nayar and colleagues [NM00, NN02] place a mask of spatially varying neutral density filters on the sensor, effectively coding different exposures for adjacent pixels according to the optical pattern of the mask. However, this method is limited by interpolation artifacts and aliasing resulting from the regular pattern of the mask. The work by Aguerrebere and colleagues [AAD∗14] leverages recent advances in solving inverse problems [YSM12] together with a spatially-varying mask, but still relies on a complex MAP Expectation-Maximization optimization framework, which can lead to artifacts in scenes of high dynamic range.

In this paper, we propose a sparse reconstruction framework that takes advantage of the compressibility of visual information to reconstruct a high dynamic range image from a single shot with pixel-coded exposure. Sparse reconstruction has been used before in the context of rendering [SD11], and image reconstruction and acquisition [SD09b, MKU15], including high-speed video [LGH∗13, SGM15], dual photography [SCG∗05, SD09a] and light transport acquisition [PML∗09], light field capture [MWBR13], hyperspectral imaging [LLWD14, JCK16], or even extended dynamic range imaging using a Fourier basis [SBN∗12, SKZ∗13]. However, we do not rely on a conventional, patch-based learning and reconstruction method as most of these works do, because it has certain limitations for the recovery of HDR images.
Instead, we propose a novel formulation based on convolutional sparse coding (CSC). CSC has been used for learning hierarchical image representations [KSlB∗10, ZTF11, CPS∗13] and to solve transient imaging problems [HXK∗14, HDL∗14]. We build on the basic idea of convolutional sparse coding and make it practical for coded, single-shot HDR image acquisition.

3. CSC framework for HDR reconstruction

In this section, we offer a brief review of sparse coding techniques and introduce a new formulation of convolutional sparse coding tailored to the problem of high dynamic range image reconstruction from a single image with spatially-varying pixel exposures.

3.1. Review of sparse coding and reconstruction

The traditional problem faced in sparse reconstruction is that of solving an underdetermined system of linear equations $\mathbf{y} = \boldsymbol{\Phi}\boldsymbol{\alpha}$, in which $\boldsymbol{\alpha} \in \mathbb{R}^n$ is the signal we are interested in, $\mathbf{y} \in \mathbb{R}^m$ is the signal we actually can measure, and $\boldsymbol{\Phi} \in \mathbb{R}^{m \times n}$ is the sensing matrix, such that $m < n$.

Solving the sparse reconstruction problem relies on the assumption that the signal is sufficiently compressible in some basis or dictionary $\boldsymbol{\Lambda} \in \mathbb{R}^{n \times l}$. This implies that $\boldsymbol{\alpha} = \boldsymbol{\Lambda}\mathbf{s}$, with most coefficients of $\mathbf{s} \in \mathbb{R}^l$ being zero or close to zero. This dictionary is often learned from a training set representative of the images of interest† [AEB06, MBPS09]. We can then recover $\boldsymbol{\alpha}$ under certain conditions by solving the following minimization problem [Ela10]:

$$\min_{\mathbf{s}} \|\mathbf{s}\|_1 \quad \text{subject to} \quad \|\mathbf{y} - \boldsymbol{\Phi}\boldsymbol{\Lambda}\mathbf{s}\|_2 \leq \varepsilon \qquad (1)$$

where $\varepsilon$ represents uncertainties in the measurements, such as sensor noise.
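As an illustrative aside (not the solver used in this paper), Eq. 1 is commonly attacked in its unconstrained Lagrangian form, $\min_{\mathbf{s}} \frac{1}{2}\|\mathbf{y} - \boldsymbol{\Phi}\boldsymbol{\Lambda}\mathbf{s}\|_2^2 + \beta\|\mathbf{s}\|_1$, for which iterative soft-thresholding (ISTA) is a minimal solver. The sketch below uses a random matrix as a stand-in for $\boldsymbol{\Phi}\boldsymbol{\Lambda}$; all sizes and values are toy choices.

```python
import numpy as np

def ista(A, y, beta=0.1, step=None, iters=200):
    """Minimize 0.5*||y - A s||^2 + beta*||s||_1 via iterative
    soft-thresholding (ISTA). A plays the role of Phi @ Lambda."""
    if step is None:
        step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, L = Lipschitz constant
    s = np.zeros(A.shape[1])
    for _ in range(iters):
        g = s - step * A.T @ (A @ s - y)         # gradient step on data term
        s = np.sign(g) * np.maximum(np.abs(g) - step * beta, 0.0)  # shrink
    return s

# Toy example: recover a 3-sparse vector from m < n random measurements.
rng = np.random.default_rng(0)
m, n = 40, 100
A = rng.standard_normal((m, n)) / np.sqrt(m)
s_true = np.zeros(n)
s_true[[5, 37, 80]] = [2.0, -1.5, 3.0]
y = A @ s_true
s_hat = ista(A, y, beta=0.01, iters=2000)
```

With a small $\beta$, the recovered support matches the true one and the coefficients are only slightly shrunk, which is the behavior the patch-based baseline in Section 4.2 relies on.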
This minimization is solved in a patch-based manner; that is, the image is divided into a series of overlapping patches and each patch is reconstructed individually using Eq. 1. All the reconstructed patches are subsequently merged, for example by computing a per-pixel average, to yield the final result.

A drawback of dictionary-based sparse coding approaches is that important spatial structures of the signal of interest can be lost due to the subdivision into mutually-independent patches. Further, patches (atoms) of the dictionaries learned with this approach are often redundant and contain shifted versions of the same features. This can be seen in Figure 2 (left), which shows sample atoms of a dictionary learned from HDR images. Moreover, as we show in Section 4.2 and Figure 5, due to the nature of the mathematical formulation (a linear combination of learned patches), these patch-based approaches can fail to adequately represent high-frequency, high-contrast image features, which are particularly important in HDR images.

An alternative to patch-based approaches is CSC, which instead is based on an image decomposition into spatially-invariant convolutional features, as explained in the following. Compared to the atoms of a dictionary, the learned filters of our CSC scheme (Figure 2, right) show a much richer variance (e.g., they span a larger range of orientations), which leads to better reconstructions.

Convolutional sparse coding models the signal of interest $\boldsymbol{\alpha} \in \mathbb{R}^n$ as a sum of sparsely-distributed convolutional features [HHW15]; that is, $\boldsymbol{\alpha}$ is modeled as:

$$\boldsymbol{\alpha} = \sum_{k=1}^{K} \mathbf{d}_k \ast \mathbf{z}_k, \qquad (2)$$

† Alternatively, well-explored sparsity bases, such as the DCT or wavelets, could be used.

Figure 2: Left: sample atoms of a learned dictionary trained on HDR images (patches are tonemapped for display). Right: sample filters learned with a convolutional sparse coding framework.
The convolutional filter bank shows less redundancy, crisper features, and a larger range of feature orientations.

In this case, the dictionary is a convolutional filter bank formed by filters $\mathbf{d}_k$ of fixed spatial support $\sqrt{p} \times \sqrt{p}$, while the $\mathbf{z}_k$ are sparse feature maps of size $\sqrt{n} \times \sqrt{n}$. Consequently, the signal recovery can be performed by solving

$$\underset{\mathbf{d},\mathbf{z}}{\operatorname{argmin}} \; \frac{1}{2} \Big\| \mathbf{x} - \sum_{k=1}^{K} \mathbf{d}_k \ast \mathbf{z}_k \Big\|_2^2 + \beta \sum_{k=1}^{K} \|\mathbf{z}_k\|_1 \quad \text{subject to} \quad \|\mathbf{d}_k\|_2^2 \leq 1 \;\; \forall k \in \{1, \ldots, K\}. \qquad (3)$$

Heide and colleagues [HHW15] generalized this formulation to handle incomplete data, as modeled by a general linear operator $\mathbf{M}$:

$$\underset{\mathbf{d},\mathbf{z}}{\operatorname{argmin}} \; \frac{1}{2} \Big\| \mathbf{x} - \mathbf{M} \sum_{k=1}^{K} \mathbf{d}_k \ast \mathbf{z}_k \Big\|_2^2 + \beta \sum_{k=1}^{K} \|\mathbf{z}_k\|_1 \quad \text{subject to} \quad \|\mathbf{d}_k\|_2^2 \leq 1 \;\; \forall k \in \{1, \ldots, K\}. \qquad (4)$$

They also proposed a technique for efficiently solving this problem via splitting of the objective function.

3.2. HDR image formation model

Based on the film reciprocity equation [DM97], we can describe the image formation model at the sensor as:

$$\mathbf{y} = f(\mathbf{p} \ast \Delta t\, \mathbf{L}) \qquad (5)$$

where $\mathbf{y} \in \mathbb{R}^n$ is the vectorized image captured at the sensor, $\Delta t$ is the exposure time, $\mathbf{L} \in \mathbb{R}^n$ represents radiance values, and the function $f$ models the camera response. The convolution by $\mathbf{p}$ models the effect of the point spread function (PSF) of the optical system, which can also be expressed as a multiplication by a convolution matrix $\mathbf{P}$. Note that we use radiance $\mathbf{L}$ instead of irradiance, since almost all modern cameras provide a nearly constant mapping between both magnitudes, compensating for angular effects [DM97, KMH95]. We optically modulate the light arriving at each pixel by placing a coded transmissivity mask $\boldsymbol{\Omega}$ on the sensor or by applying a spatially-coded exposure readout. This can be formulated as

$$\mathbf{y} = f(\boldsymbol{\Omega} \mathbf{P} \Delta t\, \mathbf{L}) \qquad (6)$$

where $\boldsymbol{\Omega} \in \mathbb{R}^{n \times n}$ is a diagonal matrix containing the modulation code of the mask.
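The forward model of Eq. 6 is straightforward to simulate. The sketch below uses a 3×3 box blur as a stand-in for the PSF $\mathbf{P}$ and a simple clipping response as a stand-in for $f$ (the paper's $f$ is the measured camera response); image sizes and radiance values are illustrative only.

```python
import numpy as np

def box_psf(img):
    """3x3 box blur: a simple stand-in for the optical PSF P."""
    padded = np.pad(img, 1, mode="edge")
    rows, cols = img.shape
    return sum(padded[i:i + rows, j:j + cols]
               for i in range(3) for j in range(3)) / 9.0

def coded_capture(L, mask, dt=1.0):
    """Simulate Eq. (6), y = f(Omega P dt L): blur by the PSF, modulate
    per pixel by the mask, and clip at saturation (a hypothetical f; the
    paper assumes a linear RAW response up to saturation)."""
    return np.clip(mask * dt * box_psf(L), 0.0, 1.0)

rng = np.random.default_rng(1)
L = rng.uniform(0.0, 50.0, (64, 64))     # toy relative-radiance map
mask = rng.uniform(0.0, 1.0, L.shape)    # uniform mask Omega_U
y = coded_capture(L, mask, dt=0.05)
```

Because the mask assigns different transmissivities to adjacent pixels, bright regions saturate only at high-transmissivity pixels while their low-transmissivity neighbors still record usable values — the property the reconstruction in Section 3.3 exploits.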
For RAW images, we can assume a linear response of the digital sensor with respect to irradiance for all non-saturated pixels [LMS∗13]. Thus, we can rewrite Equation 6 as $\mathbf{y} = \zeta \boldsymbol{\Omega} \mathbf{P} \mathbf{L}$, where $\zeta$ is a scale factor modeling the linear response of the sensor and the influence of the exposure time $\Delta t$. This scaling factor (and thus absolute radiance values) could be recovered by imaging a calibrated light source and scaling all radiance values accordingly. In our context, we aim at obtaining relative radiance values; therefore we can remove $\zeta$ and write the normalized form:

$$\mathbf{y} = \boldsymbol{\Omega} \mathbf{P} \mathbf{L}^* \qquad (7)$$

where $\mathbf{L}^*$ represents relative radiance values. The mask $\boldsymbol{\Omega}$ will ensure that pixels are sampled with effectively different exposure values, so that in all image regions at least some of the pixels properly sample the dynamic range. The sparse reconstruction step described next will be in charge of obtaining the radiance values from these differently sampled pixels.

3.3. Convolutional sparse HDRI coding

Equation 4 allows for the recovery of contrast-normalized images in which part of the data is missing or unreliable, as given by the matrix $\mathbf{M}$. In the case of HDR reconstruction, however, our captured image $\mathbf{y}$ (as given by Eq. 7) does not only have missing or unreliable data, but also differently exposed pixels due to the matrix $\boldsymbol{\Omega}$. In the case of HDR imaging, the unreliable data $\mathbf{M}$ corresponds to both saturated and noisy pixels. Incorporating the varying exposures $\boldsymbol{\Omega}$, we pose the convolutional reconstruction of radiance values as:

$$\underset{\mathbf{z}}{\operatorname{argmin}} \; \frac{1}{2} \Big\| \mathbf{y} - \boldsymbol{\Omega} \mathbf{M} \mathbf{P} \sum_{k=1}^{K} \mathbf{d}_k \ast \mathbf{z}_k \Big\|_2^2 + \beta \sum_{k=1}^{K} \|\mathbf{z}_k\|_1 \qquad (8)$$

where $\beta$ controls the relative weight of the sparsity term. Note that, in contrast to Eqs. 3 and 4, we optimize only for $\mathbf{z}$, since we assume that we have already learned a dictionary of filters $\mathbf{d}$.

The dictionary of filters $\mathbf{d}$ is learned using Eq. 4, and some of the learned filters are shown in Figure 2 (right). We learn the filters from a set of LDR images, after performing a local contrast normalization on these images. This amounts to learning from whitened data (normalized sigma and mean). As a consequence of this normalization, the formulation cannot be used directly in a generative model: while the correct scaling for recovery can be obtained during the optimization by finding the correct values in the sparse maps $\mathbf{z}$, the offset cannot. To solve this, we introduce an offset term $\mathbf{o}$ that we jointly estimate with the sparse feature maps. Its smoothness is ensured by a quadratic smoothness constraint, leading to:

$$\underset{\mathbf{z}}{\operatorname{argmin}} \; \frac{1}{2} \Big\| \mathbf{y} - \boldsymbol{\Omega} \mathbf{M} \mathbf{P} \Big( \sum_{k=1}^{K} \mathbf{d}_k \ast \mathbf{z}_k + \mathbf{o} \Big) \Big\|_2^2 + \beta \sum_{k=1}^{K} \|\mathbf{z}_k\|_1 + \lambda_s \|\nabla \mathbf{o}\|_2^2 \qquad (9)$$

Thanks to this normalization, the filters generalize to different means and scales (which are obtained during the optimization), and they are independent of dynamic range. We additionally observe that the learned filters have fewer data-specific features and are more general this way, and the learning converges in fewer iterations. Specific implementation details on the filter dictionary learning are given in Section 5.

We can elegantly fit this additional offset into the proposed optimization framework by expressing it as the convolution $\mathbf{o} = \mathbf{d}_{K+1} \ast \mathbf{z}_{K+1}$, where $\mathbf{d}_{K+1}$ is a Dirac delta; Equation 9 thus becomes:

$$\underset{\mathbf{z}}{\operatorname{argmin}} \; \frac{1}{2} \Big\| \mathbf{y} - \boldsymbol{\Omega} \mathbf{M} \sum_{k=1}^{K+1} \mathbf{P}\mathbf{d}_k \ast \mathbf{z}_k \Big\|_2^2 + \beta \sum_{k=1}^{K} \|\mathbf{z}_k\|_1 + \lambda_s \|\nabla \mathbf{z}_{K+1}\|_2^2 \qquad (10)$$

where $\lambda_s$ controls the relative weight of the smoothness term. Note that only smoothness, and not sparsity, is enforced for this $\mathbf{z}_{K+1}$.
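The Dirac-delta reformulation can be checked numerically: convolving a map with a centered discrete delta leaves it unchanged, so the smooth offset passes through the same convolutional model as the other filters. A one-dimensional sketch (illustrative only; the same holds in 2-D):

```python
import numpy as np

# o = d_{K+1} * z_{K+1}, with d_{K+1} a Dirac delta: convolution with a
# centered discrete delta is the identity, so the offset map is
# represented exactly within the convolutional model of Eq. (10).
dirac = np.zeros(11)
dirac[5] = 1.0                         # centered discrete delta d_{K+1}
z_offset = np.linspace(0.0, 1.0, 32)   # a smooth offset map z_{K+1}
o = np.convolve(z_offset, dirac, mode="same")
assert np.allclose(o, z_offset)        # o equals z_{K+1} exactly
```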
Finally, if we rewrite Equation 10 by substituting $\hat{\mathbf{M}} = \boldsymbol{\Omega} \mathbf{M}$ and $\hat{\mathbf{d}}_k = \mathbf{P}\mathbf{d}_k$, our problem can be written as the CSC problem shown in Equation 4, with the exception of the quadratic smoothness term:

$$\underset{\mathbf{z}}{\operatorname{argmin}} \; \frac{1}{2} \Big\| \mathbf{y} - \hat{\mathbf{M}} \sum_{k=1}^{K+1} \hat{\mathbf{d}}_k \ast \mathbf{z}_k \Big\|_2^2 + \beta \sum_{k=1}^{K} \|\mathbf{z}_k\|_1 + \lambda_s \|\nabla \mathbf{z}_{K+1}\|_2^2 \qquad (11)$$

We solve this problem using a modification of the ADMM algorithm [BPC∗11]. To do so, we need to reformulate Equation 11 to express the first two terms as a sum of functions, in the following form:

$$\underset{\mathbf{z}}{\operatorname{argmin}} \; \sum_{i=1}^{I} f_i(\mathbf{K}_i \mathbf{z}) + \lambda_s \|\nabla \mathbf{z}_{K+1}\|_2^2. \qquad (12)$$

For more details on this transformation, please refer to [HHW15, Sec. 2.1 and 2.2]. Once this is done, the modified ADMM algorithm to solve for $\mathbf{z}$ in our case is shown in Algorithm 1. The update in line 2 of the algorithm is solved in the spectral domain, and thus the additional smoothness constraint does not increase the computational cost significantly w.r.t. the original formulation [HHW15]. Also, the filter size does not matter in our case, since we are performing the filter inversion in the frequency domain; this would not be computationally efficient with traditional CSC methods such as that of Szlam et al. [SKL10]. Finally, $\operatorname{prox}_{\phi}$ refers to the proximal operator of a function $\phi$, as described in Parikh and Boyd's work [PB14].

Algorithm 1 ADMM for HDR recovery
1: for $k = 1$ to $V$ do
2:   $\mathbf{y}^{k+1} = \operatorname{argmin}_{\mathbf{y}} \|\mathbf{K}\mathbf{y} - \mathbf{z} + \boldsymbol{\lambda}^{k}\|_2^2 + \lambda_s \|\nabla \mathbf{z}_{K+1}\|_2^2$
3:   $\mathbf{z}^{k+1}_i = \operatorname{prox}_{f_i/\rho}(\mathbf{K}_i \mathbf{y}^{k+1}_i + \boldsymbol{\lambda}^{k}_i)$
4:   $\boldsymbol{\lambda}^{k+1} = \boldsymbol{\lambda}^{k} + (\mathbf{K}\mathbf{y}^{k+1} - \mathbf{z}^{k+1})$
5: end for

4. Analyzing convolutional sparse HDRI coding

In this section, we provide an analysis of the proposed framework, including the choice of coded exposure patterns and algorithmic parameters. We also show the advantages of this formulation over traditional, patch-based sparse reconstruction for HDR capture.
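For concreteness, the proximal operators applied in line 3 of Algorithm 1 have simple closed forms for the terms of Eq. 11: elementwise soft-thresholding for the ℓ1 penalty, and a weighted average for a quadratic data-fitting term. A minimal sketch following Parikh and Boyd's definitions (this is not the paper's full solver, and the penalty $\rho$ is used abstractly here):

```python
import numpy as np

def prox_l1(v, tau):
    """prox of tau*||.||_1: elementwise soft-thresholding, the update
    applied to the sparse feature maps z_k."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def prox_quadratic(v, y, rho):
    """prox_{f/rho} for a quadratic data term f(x) = 0.5*||x - y||^2:
    a weighted average of the ADMM iterate v and the measurements y."""
    return (rho * v + y) / (rho + 1.0)

v = np.array([-2.0, -0.3, 0.0, 0.7, 3.0])
out = prox_l1(v, 0.5)   # -> [-1.5, 0.0, 0.0, 0.2, 2.5]
```

Soft-thresholding is what produces exact zeros in the feature maps, i.e., the sparsity that the $\beta$-weighted term in Eq. 11 encourages.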
4.1. Design of coded exposure patterns

There are several factors to take into account when designing the optical mask $\boldsymbol{\Omega}$. First, it needs to have a high light throughput, to avoid noise and reduce the required exposure time; second, its per-pixel transmissivity values $e_i$ should cover a wide range of exposures (that is, $e_{max}/e_{min}$ should be large); and third, it should facilitate practical implementation. We tested several configurations for the mask over a set of seven different images; in particular, these configurations were: binary, Gaussian, uniform, uniform with four fixed exposures, fixed pattern with four exposures, and interleaved exposure. In the following we detail the formulation for each mask, the motivation behind its testing, and its performance.

We initially tested and compared the performance of three optical masks: a binary mask, a mask where exposure values are drawn from a Gaussian distribution ($\boldsymbol{\Omega}_G = \{e_i;\; e_i \sim \mathcal{N}(0.6, 0.1)\}$), and a mask obtained by drawing values from a uniform distribution ($\boldsymbol{\Omega}_U = \{e_i;\; e_i \sim \mathcal{U}(0, 1)\}$). The reconstruction results are shown in Figure 3. The binary mask is limited when modulating the incoming light and, as a result, is very limited in terms of the recovered dynamic range; large saturated areas, for instance, will be impossible to recover, since all the pixels will be degraded due to the binary sampling. Both the uniform and the Gaussian masks yield good results, and choosing between them represents a trade-off between transmissivity and dynamic range. The Gaussian mask offers better light throughput, but a more limited recoverable dynamic range: most of the values of the Gaussian distribution will be close to the mean, with few very low values.
As a result, large bright areas (such as in Figure 3, around the sun) may still remain saturated. A uniform mask allows recovery of a larger dynamic range because it more uniformly samples the range of exposures, minimizing the risk of large under- or over-exposed areas even in scenes of very high dynamic range.

While a uniform mask works well in practice, for a practical hardware implementation having a low number of discrete exposure values is beneficial. We therefore compare the uniform mask $\boldsymbol{\Omega}_U$ with a uniform 4-exposure mask $\boldsymbol{\Omega}_F$, that is, one in which each pixel randomly takes one of four exposure values $\{e_1 \ldots e_4\}$. We choose the exposure values such that the ratio $e_{max}/e_{min}$ covers 6 f-stops, i.e., $e_4/e_1 = 2^6$; this, combined with the 1000:1 dynamic range of a standard CMOS sensor [EG02], allows us to recover up to 16 stops of dynamic range. Figure 4 shows the quality of the resulting reconstructions for $\boldsymbol{\Omega}_U$ and $\boldsymbol{\Omega}_F$, which are very similar. Thus, $\boldsymbol{\Omega}_F$ allows us to recover a very similar range to the uniform mask, without artifacts, and has an easier implementation. Consequently, in the remainder of the paper we opt for a uniform, 4-exposure pattern ($\boldsymbol{\Omega} = \boldsymbol{\Omega}_F$), since it offers the best trade-off between the quality of the results, in terms of recovered dynamic range and absence of artifacts, and ease of implementation in hardware. The exception to this is our hardware prototype (Section 5.1): since it exhibits significant light loss (mainly due to the LCoS and the beamsplitter), we use a Gaussian mask to minimize the impact of the reduced light throughput. However, future chip designs with built-in per-pixel exposure will overcome this prototype's limitations; taking this into account, the best option among the configurations we tested is $\boldsymbol{\Omega}_F$.
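The three mask families compared above are easy to reproduce. A sketch follows, in which the four levels of $\boldsymbol{\Omega}_F$ are spaced uniformly in stops; the text fixes only the ratio $e_4/e_1 = 2^6$, so the intermediate spacing here is an assumption.

```python
import numpy as np

def make_masks(shape, rng):
    """Per-pixel transmissivity masks from Section 4.1, values in [0, 1].
    Omega_F uses four exposure levels spanning 6 f-stops (e4/e1 = 2^6);
    the uniform-in-stops spacing of the levels is an assumption."""
    omega_g = np.clip(rng.normal(0.6, 0.1, shape), 0.0, 1.0)  # Gaussian
    omega_u = rng.uniform(0.0, 1.0, shape)                     # uniform
    levels = 2.0 ** np.array([-6.0, -4.0, -2.0, 0.0])          # 6 f-stops
    omega_f = rng.choice(levels, size=shape)                   # 4-exposure
    return omega_g, omega_u, omega_f

rng = np.random.default_rng(2)
omega_g, omega_u, omega_f = make_masks((128, 128), rng)
```

Note how the trade-off discussed above is visible directly in the samples: most values of `omega_g` cluster near 0.6 (high throughput, limited range), while `omega_f` spans exactly a 64:1 transmissivity ratio with only four discrete levels.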
Figure 3: HDR images in false color (color scale shows f-stops) showing (from left to right): ground truth radiance, radiance recovered using a binary mask $\boldsymbol{\Omega}_B$ as optical code, a Gaussian mask $\boldsymbol{\Omega}_G$, and a uniform mask $\boldsymbol{\Omega}_U$ (more details in the text). The first two masks clearly fall short when recovering dynamic range, while the uniform one offers results very close to the original. The tonemapped ground truth image can be seen in Figure 4, left.

Figure 4: Left: a tonemapped HDR ground truth image. Right: quality of different optical masks when attempting to recover the ground truth scene radiance. From left to right: a uniform mask $\boldsymbol{\Omega}_U$, a 4-exposure mask $\boldsymbol{\Omega}_F$, an interleaved mask $\boldsymbol{\Omega}_I$, and a fixed pattern mask $\boldsymbol{\Omega}_P$. For each one, the left part shows the reconstructed image, and the right part the error with respect to the ground truth, displayed as $(1 - \mathrm{SSIM})$ [WBSS04]. The top row shows a sample region of the corresponding mask. We choose $\boldsymbol{\Omega}_F$ for its ability to faithfully recover a wide dynamic range and its ease of implementation. Please refer to the text for more details.

Additionally, to highlight the versatility of our reconstruction framework, we tested two additional exposure patterns which have been used before in the context of HDR imaging. Their results are also shown in Figure 4 (in the two rightmost images). In particular, we show a reconstruction result for a fixed pattern $\boldsymbol{\Omega}_P$ using four exposures (that is, the mask shows a repeating, fixed 2×2 pattern), and a result for an interleaved exposure pattern. The former has been proposed before for HDR imaging, but with the reconstruction done by means of interpolation [NM00], which can lead to aliasing effects.
The latter is inspired by the Magic Lantern software package, which offers a firmware upgrade for some off-the-shelf cameras to capture an interleaved exposure $\boldsymbol{\Omega}_I$ consisting of alternating rows with two different exposures. Our framework allows for a plausible result even with these exposure patterns.

4.2. Advantage of CSC HDRI over patch-based approaches

Patch-based sparse reconstruction approaches have been widely used in computational imaging problems [LGH∗13, MWBR13, LLWD14]. In this section, we illustrate and explain how directly applying such approaches to the problem of HDR reconstruction from a single, exposure-coded image would produce undesired results in a number of cases.

Figure 5: Detail of an HDR image reconstructed using a patch-based sparse reconstruction approach (left) and our convolutional sparse coding framework (right). The former is unable to recover very high-contrast sharp edges, while the latter offers good results in this case. The images are tonemapped for display using [MDK08].

We have already seen how the filters in our framework show a richer variance (less redundancy and a larger range of orientations) compared to traditional atoms in a learned dictionary (Figure 2). Consequently, CSC dictionaries are more descriptive and better capture the essence of the signals (e.g., they avoid the need to have shifted versions of the patches), which results in better reconstructions. More importantly, in patch-based approaches, the signal (a given image patch that is to be reconstructed) is represented as a linear combination of dictionary patches with their associated coefficients.
This is problematic when attempting to reconstruct patches which contain very large contrast edges (common in HDR images), because an extremely large number of patches with high-valued coefficients is needed to properly reconstruct the edge. This is of course not only the case with learned dictionary patches, but also if any other basis (e.g., DCT) is used. As such, this problem was also encountered in the past in HDR image compression [MKMS04, Fig. 5]. Consequently, when reconstructing HDR images with a patch-based approach, the reconstruction fails in the presence of very high contrast edges, yielding artifacts as shown in Figure 5, left. CSC, in contrast, can naturally handle these large contrast edges, as shown in Figure 5 (right), thanks to the formulation of the signal as a sum of convolutions of the filters by sparse feature maps, as opposed to a linear combination of dictionary elements.

Moreover, the convolutional sparse coding framework converges significantly faster than the patch-based approach (for which we use the well-known OMP algorithm [TG07]). Specifically, on an Intel Xeon E5-1620 @ 3.50 GHz with 16 GB RAM, our CSC approach is around 2.5× faster.

4.3. Optimization parameters

As explained in Section 3.3, $\beta$ controls the relative weight of the sparsity term with respect to the data term (see Equation 11). Increasing the value of $\beta$ will therefore result in a degradation of the high frequencies in the reconstructed scene, since the feature maps $\mathbf{z}$ will be too sparse to represent fine details. Decreasing $\beta$, on the contrary, will lead to an excessive relative weight of the data term, which can result in artifacts due to approximations of non-linearities of the process (such as the quantization).

Figure 6: Reconstructed HDR image (tonemapped for display) showing the effect of $\beta$, the relative weight of the sparsity term, in the optimization. Please refer to the text for details.

Figure 6 shows this behavior. We choose an intermediate value of $\beta$, $\beta_{chosen} = 1.5 \cdot 10^{-5}$, which we use in all the reconstructions shown in this work. The other relevant parameter in the optimization is the relative weight of the quadratic smoothness term, $\lambda_s$ in Equation 11; we choose $\lambda_s = 0.5 \cdot 10^{-5}$. In this case, it is important that a good estimate of the offset term $\mathbf{z}_{K+1}$ is given as the initial value to the optimization. We provide a blurred version of the captured LDR image divided by the optical mask, which yields good results and fast convergence.

5. Results

We show here reconstruction results using both existing HDR images‡ and data captured with our prototype camera. All results shown have been reconstructed using our single-shot method described in this paper, with the same optical mask $\boldsymbol{\Omega}_F$ described in Section 4.1, consisting of four randomly sampled exposure values with $e_{max}/e_{min} = 2^6$, except where otherwise indicated. The filter bank $\mathbf{d}_k$ used for the reconstruction is learned from a collection of ten natural LDR images using the method proposed by Heide et al. [HHW15]; a representative sample of these learned filters is shown in Figure 2 (right). When choosing the training images we learn the filters from, we found our framework robust enough to provide similar results when learned from different sets of images: learning the filter bank from a dataset of images used in the work of Heide et al., or learning from tonemapped images from Fairchild's database (on a set not used for testing), yielded reconstructions which differed by less than 0.5 dB in PSNR. The size of the filters is determined by the resolution of the training data; the filters need to be large enough so that they contain useful information, yet small enough not to overfit to specific features of the training data.
We find that learning K = 100 filters of size 11×11 pixels fulfills these conditions for our data and works well for all the images tested. All HDR results shown have been tonemapped using the same algorithm [MDK08]. We additionally compare our results to two other spatially varying exposure methods [NM00, HKU14].

‡ We use images from the HDR Photographic Survey (http://rit-mcsl.org/fairchild/HDR.html), and the EMPA HDR Image Database (http://www.empamedia.ethz.ch/hdrdatabase/index.php).

© 2016 The Author(s) Computer Graphics Forum © 2016 The Eurographics Association and John Wiley & Sons Ltd.

A. Serrano, F. Heide, D. Gutierrez, G. Wetzstein & B. Masia / Convolutional Sparse Coding for Single-shot HDR Imaging

Figure 7: Top row: Recovered HDR images from a single-shot coded image (tonemapped using [MDK08]), and PSNR values. The insets show the squared, per-pixel difference with respect to the ground truth luminance. Bottom row: False color (split) images depicting luminance of the original scene, and of our reconstructed scene; we use a base-2 logarithm to properly display the extremely large dynamic range.

For results using existing HDR images as input, we simulate the process of capturing the coded LDR image as follows: We first apply a convolution kernel p simulating the optical PSF of the camera, and modulate light arriving at the sensor by multiplying the radiance values of the input HDR by our coded mask. We bracket these values taking into account that a typical CMOS sensor has a dynamic range of around 1000:1. In doing so, we assume a reasonably well-exposed LDR image, but nevertheless we simulate the metering of a camera and take into account saturation and under-exposure by placing the sensor range so that the number of saturated and under-exposed pixels is minimized. Then we normalize these bracketed values and apply a camera response function§.
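The capture-simulation steps above, together with the final quantization described next, can be sketched as follows. This is a simplified sketch: the gamma curve stands in for the measured camera response function, and the coarse log-spaced exposure search stands in for camera metering; neither is the paper's exact implementation.

```python
import numpy as np

def simulate_coded_ldr(hdr, mask, psf, bits=8, sensor_dr=1000.0, gamma=1 / 2.2):
    """Simulate a single coded LDR capture from HDR radiance.

    Steps follow the text: PSF blur, per-pixel mask modulation, bracketing
    to a ~1000:1 sensor range placed to minimize clipped pixels, a camera
    response curve, and quantization. `psf` is assumed centered and the
    same size as `hdr`.
    """
    # 1. Optical PSF (circular convolution via FFT).
    blurred = np.real(np.fft.ifft2(np.fft.fft2(hdr) *
                                   np.fft.fft2(np.fft.ifftshift(psf))))
    # 2. Per-pixel exposure modulation by the coded mask.
    coded = blurred * mask
    # 3. Bracket: place a sensor_dr:1 range so that the fewest pixels
    #    saturate or under-expose (coarse log-spaced search).
    best_lo, best_clipped = coded.max() / sensor_dr, np.inf
    for lo in np.geomspace(coded.max() / 1e6 + 1e-12, coded.max(), 50):
        clipped = np.count_nonzero((coded < lo) | (coded > lo * sensor_dr))
        if clipped < best_clipped:
            best_lo, best_clipped = lo, clipped
    bracketed = np.clip(coded, best_lo, best_lo * sensor_dr)
    # 4. Normalize and apply the (stand-in) camera response function.
    response = ((bracketed - best_lo) / (best_lo * sensor_dr - best_lo)) ** gamma
    # 5. Quantize to the sensor bit depth (8 bits here).
    return np.round(response * (2 ** bits - 1)).astype(np.uint8)
```

The resulting 8-bit image plays the role of the coded LDR input y to the reconstruction.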
Last, we quantize the resulting values to store the LDR image which will be used as input for the reconstruction.

Figure 7 shows four of our reconstructed HDR images. In addition to our reconstruction (top row), we show, for each scene, a false color image of the ground truth scene and our reconstruction (bottom row, split images) to show our ability to recover the large dynamic range present in the original scene. Since we recover relative radiance, and given the large dynamic range, we plot in false color log₂ radiance normalized to the ground truth. Further, the insets in the top row show the error, computed as the square of the per-pixel difference between ground truth and our reconstruction, scaled for visualization purposes. We also report the PSNR for each one, which is always above 40 dB. This figure shows how our

§ http://www1.cs.columbia.edu/CAVE/software/softlib/dorf.php

Figure 8: Additional results obtained by our technique for two HDR scenes. Top row: tonemapped HDR image (using [MDK08]). Middle row: Normalized luminance plots for the corresponding marked scanlines for our recovered image (green curve) and the ground truth image (blue curve). Bottom row: Close-up of two exposures of the corresponding highlighted regions, displaying very high-contrast edges.

Figure 9: Comparison with two representative spatially varying exposure methods [NM00, HKU14]. The inherent interpolation step in such methods leads to visible artifacts in areas of high contrast or very fine detail.

Figure 10: HDR reconstruction of an animated scene. Left: Coded sensor images using our optical code Ω_F. Center and right: Two example frames exhibiting temporal coherence in the reconstruction.
Input video from the LiU HDR video repository (http://www.hdrv.org).

method is able to recover scenes with very high dynamic range, faithfully reproducing contrast in the original scene. More reconstructed scenes can be found in Figure 8, in which we show our reconstructed HDR image (top row), normalized luminance of sample scanlines, both recovered and ground truth (middle row), and two exposures of the reconstructed scene to better show the quality of the reconstruction across the dynamic range, including the challenging case of high-contrast sharp edges (bottom row).

Different from other common spatially varying exposure methods, our approach does not rely on interpolation of the captured samples to reconstruct the image. Instead, it exploits information about the structure of natural images through the learned convolutional filter bank, which greatly minimizes the presence of visible artifacts in areas of high contrast or very fine detail. We show this by explicitly comparing our results against the spatially varying exposure methods of Nayar et al. [NM00] and Hajisharif et al. [HKU14], the latter of which makes use of the Magic Lantern software to capture interlaced, dual-ISO images. Our method preserves edges better, minimizing the aliasing artifacts that arise from the trade-off between spatial resolution and dynamic range in Nayar's method, while Hajisharif's method has difficulties recovering thin structures, such as the small branches of the tree (Figure 9).

Our technique can be applied to the reconstruction of HDR animated scenes as well, using the same optical code for each frame. Our reconstruction framework yields a very faithful recovery of the original signal, naturally leading to temporal coherence, without the need for explicit enforcement. We show this in Figure 10, using an existing HDR video from the LiU HDR video repository¶, and also include this video in the supplementary material.
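The PSNR values reported in Figure 7 can be computed with a few lines; the choice of the reference maximum as the peak value is an assumption, since the text does not state the exact PSNR convention used:

```python
import numpy as np

def psnr_db(reference, reconstruction):
    """PSNR in dB between a ground-truth image and a reconstruction,
    using the reference maximum as the peak value (an assumption)."""
    reference = np.asarray(reference, dtype=float)
    reconstruction = np.asarray(reconstruction, dtype=float)
    mse = np.mean((reference - reconstruction) ** 2)
    return 10.0 * np.log10(reference.max() ** 2 / mse)
```

Since the method recovers relative radiance, both images would be normalized to a common scale before comparison.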
The HDR video recovery is performed frame by frame, from LDR capture simulations of the aforementioned HDR video.

Finally, our framework can also be used for compression of HDR images. Traditional techniques used for compression of images can fail when applied to HDR images, due to the high-contrast sharp edges that can be present in them. Consequently, techniques have been developed to compress this type of content [MKMS04]. Our framework allows for compression of HDR images, since we can represent them with a set of sparse feature maps. We have shown in Figure 5 how, for HDR content, we avoid artifacts that appear when encoding and reconstruction with patch-based schemes is used. Note that DCT was also shown not to work well by Mantiuk et al. [MKMS04], requiring more complex processing for compression.

5.1. Hardware prototype implementation

Per-pixel exposure cameras are not commercially available yet, although a per-pixel exposure patent has already been filed by Sony Corporation [Jo14]. We have built a prototype that simulates this feature to demonstrate our method with real scenes. To this end, we have implemented a capture system based on a liquid crystal on silicon (LCoS) display (Figure 11, left). This device, together with a beamsplitter and relay optics, simulates a Gaussian attenuation mask placed before the sensor. In this setup, the SLR camera lens (Canon EF-S 60 mm f/2.8 Macro USM) is focused on the LCoS, virtually placing the mask at the sensor. Our imaging lens is a Canon EF 50 mm f/1.8 II, focused at 50 cm; scenes are placed at 80–100 cm. The f-number of the system is f/2.8, the maximum of both lenses. Since a single pixel of the LCoS cannot be well-resolved with this setup, we treat LCoS pixels in blocks of 8×8 pixels, resulting in a mask with a resolution of 240×135. Figure 11 (right) shows results with real scenes captured with our prototype optical setup.
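A mask of this kind (four exposure values spanning e_max/e_min = 2⁶, assigned per 8×8-pixel block at the 240×135 block resolution given above) can be sketched as follows; the even log-spacing of the levels and the uniform random assignment are illustrative assumptions about "randomly sampled exposure values":

```python
import numpy as np

def make_block_mask(blocks_x=240, blocks_y=135, block=8,
                    n_levels=4, stops=6, seed=0):
    """Random per-block exposure mask, as displayed on the LCoS.

    The 240x135 block resolution and the 8x8-pixel block size come from
    the text; the sampling scheme for the levels is an assumption.
    """
    rng = np.random.default_rng(seed)
    # n_levels exposure values evenly spaced in log2 between 2**-stops and 1.
    levels = 2.0 ** np.linspace(-stops, 0, n_levels)
    block_values = rng.choice(levels, size=(blocks_y, blocks_x))
    # Replicate each value over a block x block region of LCoS pixels.
    return np.kron(block_values, np.ones((block, block)))
```

The resulting array is the per-pixel attenuation by which the simulated sensor irradiance is multiplied.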
The figure includes a close-up of the LDR coded image captured at the sensor, the final tonemapped HDR reconstruction, and several details with varying exposure levels. Our lab prototype is not artifact-free, although it demonstrates the viability of our approach. The LCoS displays some birefringence, decreased light throughput, and a severe loss of contrast, all of which degrade the captured LDR signal. Future chip designs such as the Sony patent could overcome these limitations. Nevertheless, our reconstruction does not introduce additional degradation in the results, as Figures 7 and 8 show.

Additionally, we have applied our technique to an image captured using an interlaced exposure with dual ISO 100/800 on a Canon EOS 500D camera with the Magic Lantern software. The result is shown in Figure 12.

¶ http://www.hdrv.org

Figure 11: Left: Our prototype hardware implementation. Our optical system is made up of an imaging lens, a beamsplitter, an LCoS, and an SLR camera. Objects are placed for illustration purposes only; when photographing the scene, they are placed at a distance of 80–100 cm from the imaging lens. Middle and right: Two reconstructions of real scenes. For each scene we show the tonemapped HDR reconstruction (top), two different exposures of the highlighted areas revealing the dynamic range (bottom), as well as a partial detail of the LDR coded image captured at the sensor (inset).

Figure 12: Reconstruction of an HDR image captured with dual ISO 100/800 with a Canon EOS 500D: original scene (left), and close-ups of coded and reconstructed regions, the latter tonemapped using [MDK08] (right).

6. Discussion and conclusion

Limitations. In some cases, it is possible that the image y captured with the optical mask contains large saturated areas despite the presence of the mask; the low-transmissivity pixels of the mask typically prevent this, but in images with extremely large dynamic range it can happen. In these cases, when no information at all is captured, the recovery may have some artifacts. An example of this is shown in the inset figure with a light bulb, a close-up region of the scene in Figure 8 (right column). This scene has a very large dynamic range (over 17 stops), since it captures both the very dark inside of the room and the bright light bulb outside. Therefore, if the inside is to be recovered, there is a saturated area in the captured image y. Nevertheless, as we show in the paper, we are able to faithfully reconstruct scenes of very large dynamic range.

Benefits. We have presented a framework for convolutional sparse coding of HDR images. From a single, optically coded image, we reconstruct dynamic range using a trained convolutional filter bank. Our approach follows a current trend in computational photography, leveraging the joint design of optical elements and processing algorithms. Once trained, the obtained filter bank can be used to reconstruct a wide variety of HDR images greatly differing from the training set. Since our reconstruction is based on a convolutional approach, it does not rely on the linear combination of patches common in sparse reconstruction methods; this greatly reduces reconstruction artifacts, in particular at the high-contrast sharp edges present in HDR images. We are not limited to a restricted number of captured exposures, nor do we face the implicit trade-off between captured dynamic range and interpolation quality that other methods based on spatially-varying exposures face. In comparison to other CSC approaches, the algorithm our formulation builds on has been shown (see [HHW15, Sec. 3]) to have lower complexity and better convergence than previously proposed CSC methods [ZKTF10, BEL13, BL14], benefits which directly carry over to our method.

As an additional advantage, our framework naturally accounts for the optical PSF of the system, since we incorporate it in our model (P in Equation 10). Moreover, it can be easily extended to perform demosaicking, by properly designing the matrix M in Equation 10, which models missing pixels. Last, we have not only built a physical prototype, but have also shown how our approach can yield good results with off-the-shelf consumer hardware that captures interleaved exposures using the Magic Lantern software.

Future work. The development of patents like Sony's per-pixel, double-exposure method will progressively introduce varying-exposure and optically modulated systems, thus extending the capabilities of commercial cameras. Our optimization could incorporate explicit modeling of image noise to perform denoising in particularly noisy images. Finally, an exciting avenue of future work lies at the convergence between acquisition and display technologies, covering the full plenoptic function and taking perceptual considerations into account [MWDG13]; compressive sensing and sparse coding techniques may be able to handle the high dimensionality of this challenging problem.

7. Acknowledgements

The authors would like to thank Karol Myszkowski, as well as Jose Echevarria and Adrian Jarabo, for fruitful insights and discussion.
We would also like to thank Saghi Hajisharif and Jonas Unger for sharing their results and for their assistance with them; Nicolas Landa for preliminary testing of traditional compressive sensing on HDR; and Maria Angeles Losada and the Photonic Technologies Group at Universidad de Zaragoza for their optical instrumentation. Ana Serrano was supported by an FPI grant from the Spanish Ministry of Economy and Competitivity (project Lightslice). Felix Heide was supported by a Four-year Fellowship from the University of British Columbia. Diego Gutierrez would like to acknowledge support from the BBVA Foundation and project Lightslice. Gordon Wetzstein was supported by a Terman Faculty Fellowship and by the Intel Strategic Research Alliance on Compressive Sensing. Belen Masia was partially supported by the Max Planck Center on Visual Computing and Communication.

References

[AAD∗14] Aguerrebere C., Almansa A., Delon J., Gousseau Y., Muse P.: Single shot high dynamic range imaging using piecewise linear estimators. In ICCP (2014).
[AEB06] Aharon M., Elad M., Bruckstein A.: K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. Trans. Sig. Proc. 54, 11 (Nov. 2006), 4311–4322.
[BEL13] Bristow H., Eriksson A., Lucey S.: Fast convolutional sparse coding. In Proc. CVPR (2013), pp. 391–398.
[BL14] Bristow H., Lucey S.: Optimization methods for convolutional sparse coding. arXiv:1406.2407 (2014).
[BPC∗11] Boyd S., Parikh N., Chu E., Peleato B., Eckstein J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning 1, 3 (2011), 127–239.
[BRG∗14] Batz M., Richter T., Garbas J.-U., Papst A., Seiler J., Kaup A.: High dynamic range video reconstruction from a stereo camera setup. Elsevier 29, 2 (2014).
[CENR10] Candes E., Eldar Y., Needell D., Randall P.: Compressed sensing with coherent and redundant dictionaries. Applied and Computational Harmonic Analysis 31, 1 (2010).
[CKL14] Cho H., Kim S. J., Lee S.: Single-shot high dynamic range imaging using coded electronic shutter. Computer Graphics Forum (2014).
[CPS∗13] Chen B., Polatkan G., Sapiro G., Blei D., Dunson D., Carin L.: Deep learning with hierarchical convolutional factor analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 8 (Aug. 2013), 1887–1901.
[DM97] Debevec P. E., Malik J.: Recovering high dynamic range radiance maps from photographs. In Proceedings of SIGGRAPH '97 (1997), pp. 369–378.
[EG02] El Gamal A.: High dynamic range image sensors. In International Solid-State Circuits Conference Tutorials (2002).
[Ela10] Elad M.: Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, 1st ed. Springer Publishing Company, Inc., 2010.
[GGC∗09] Gallo O., Gelfand N., Chen W., Tico M., Pulli K.: Artifact-free high dynamic range imaging. In ICCP (2009).
[GHMN10] Gu J., Hitomi Y., Mitsunaga T., Nayar S.: Coded rolling shutter photography: Flexible space-time sampling. In ICCP (2010).
[GKTT13] Granados M., Kim K. I., Tompkin J., Theobalt C.: Automatic noise modeling for ghost-free HDR reconstruction. ACM Trans. Graph. 32, 6 (Nov. 2013), 201:1–201:10.
[HDL∗14] Hu X., Deng Y., Lin X., Suo J., Dai Q., Barsi C., Raskar R.: Robust and accurate transient light transport decomposition via convolutional sparse coding. Opt. Lett. 39, 11 (Jun. 2014), 3177–3180.
[HHW15] Heide F., Heidrich W., Wetzstein G.: Fast and flexible convolutional sparse coding. In Proc. CVPR (2015).
[HKU14] Hajisharif S., Kronander J., Unger J.: HDR reconstruction for alternating gain (ISO) sensor readout. In Eurographics 2014 Short Papers (May 2014).
[HST∗14] Heide F., Steinberger M., Tsai Y.-T., Rouf M., Pajak D., Reddy D., Gallo O., Liu J., Heidrich W., Egiazarian K., Kautz J., Pulli K.: FlexISP: A flexible camera image processing framework. ACM Transactions on Graphics 33, 6 (2014).
[HXK∗14] Heide F., Xiao L., Kolb A., Hullin M. B., Heidrich W.: Imaging in scattering media using correlation image sensors and sparse convolutional coding. Opt. Express 22, 21 (Oct. 2014), 26338–26350.
[JCK16] Jeon D. S., Choi I., Kim M. H.: Multisampling compressive video spectroscopy. Computer Graphics Forum 35, 2 (2016).
[Jo14] Jo K.: Image processing apparatus, image processing method, and program. US patent 2014/0321766A1, 10 2014.
[KGBU13] Kronander J., Gustavson S., Bonnet G., Unger J.: Unified HDR reconstruction from raw CFA data. In ICCP (2013), pp. 1–9.
[KMH95] Kolb C., Mitchell D., Hanrahan P.: A realistic camera model for computer graphics. In SIGGRAPH (1995).
[KSB∗13] Kalantari N. K., Shechtman E., Barnes C., Darabi S., Goldman D. B., Sen P.: Patch-based high dynamic range video. ACM Trans. Graph. 32, 6 (Nov. 2013).
[KSlB∗10] Kavukcuoglu K., Sermanet P., Boureau Y.-L., Gregor K., Mathieu M., LeCun Y.: Learning convolutional feature hierarchies for visual recognition. In Advances in Neural Information Processing Systems 23, Lafferty J., Williams C., Shawe-Taylor J., Zemel R., Culotta A. (Eds.). Curran Associates, Inc., 2010, pp. 1090–1098.
[LBRN07] Lee H., Battle A., Raina R., Ng A. Y.: Efficient sparse coding algorithms. In Advances in Neural Information Processing Systems 19 (2007), pp. 801–808.
[LGH∗13] Liu D., Gu J., Hitomi Y., Gupta M., Mitsunaga T., Nayar S.: Efficient space-time sampling with pixel-wise coded exposure for high speed imaging. IEEE Transactions on Pattern Analysis and Machine Intelligence 99 (2013), 1.
[LLWD14] Lin X., Liu Y., Wu J., Dai Q.: Spatial-spectral encoded compressive hyperspectral imaging. ACM Trans. Graph. (SIGGRAPH Asia) 33, 6 (2014).
[LMS∗13] Lee J.-Y., Matsushita Y., Shi B., Kweon I. S., Ikeuchi K.: Radiometric calibration by rank minimization. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 1 (Jan. 2013), 144–156.
[MBPS09] Mairal J., Bach F., Ponce J., Sapiro G.: Online dictionary learning for sparse coding. In Proceedings of the 26th Annual International Conference on Machine Learning (2009), ICML '09, pp. 689–696.
[MDK08] Mantiuk R., Daly S., Kerofsky L.: Display adaptive tone mapping. ACM Trans. Graph. 27, 3 (Aug. 2008), 68:1–68:10.
[MG11] Mangiat S., Gibson J. D.: Spatially adaptive filtering for registration artifact removal in HDR video. In ICIP (2011), IEEE, pp. 1317–1320.
[MKMS04] Mantiuk R., Krawczyk G., Myszkowski K., Seidel H.-P.: Perception-motivated high dynamic range video encoding. ACM Trans. Graph. 23, 3 (Aug. 2004), 733–741.
[MKU15] Miandji E., Kronander J., Unger J.: Compressive image reconstruction in reduced union of subspaces. In Eurographics 2015 (May 2015).
[MP95] Mann S., Picard R. W.: On being undigital with digital cameras: Extending dynamic range by combining differently exposed pictures. In Proceedings of IST (1995), pp. 442–448.
[MRK∗13] Manakov A., Restrepo J. F., Klehm O., Hegedüs R., Eisemann E., Seidel H.-P., Ihrke I.: A reconfigurable camera add-on for high dynamic range, multi-spectral, polarization, and light-field imaging. ACM Trans. Graph. 32, 4 (July 2013), 47:1–47:14.
[MWBR13] Marwah K., Wetzstein G., Bando Y., Raskar R.: Compressive light field photography using overcomplete dictionaries and optimized projections. ACM Trans. Graph. 32, 4 (2013), 1–11.
[MWDG13] Masia B., Wetzstein G., Didyk P., Gutierrez D.: A survey on computational displays: Pushing the boundaries of optics, computation, and perception. Computers & Graphics 37, 8 (2013), 1012–1038.
[NM00] Nayar S., Mitsunaga T.: High dynamic range imaging: Spatially varying pixel exposures. In CVPR (2000), vol. 1, pp. 472–479.
[NN02] Nayar S., Narasimhan S.: Assorted pixels: Multi-sampled imaging with structural models. In ECCV (May 2002), vol. IV, pp. 636–652.
[PB14] Parikh N., Boyd S.: Proximal algorithms. Found. Trends Optim. 1, 3 (Jan. 2014), 127–239.
[PML∗09] Peers P., Mahajan D. K., Lamond B., Ghosh A., Matusik W., Ramamoorthi R., Debevec P.: Compressive light transport sensing. ACM Trans. Graph. 28, 1 (Feb. 2009), 3:1–3:18.
[PZJ13] Portz T., Zhang L., Jiang H.: Random coded sampling for high-speed HDR video. In ICCP (2013), pp. 1–8.
[SBB14] Schedl D. C., Birklbauer C., Bimber O.: Coded exposure HDR light-field video recording. Computer Graphics Forum 33, 2 (2014), 33–42.
[SBN∗12] Schöberl M., Belz A., Nowak A., Seiler J., Kaup A., Foessel S.: Building a high dynamic range video sensor with spatially nonregular optical filtering. In Proc. SPIE (2012), vol. 8499, pp. 84990C–84990C–11.
[SCG∗05] Sen P., Chen B., Garg G., Marschner S. R., Horowitz M., Levoy M., Lensch H.: Dual photography. ACM Transactions on Graphics 24, 3 (2005), 745–755.
[SD09a] Sen P., Darabi S.: Compressive dual photography. Computer Graphics Forum 28, 2 (2009).
[SD09b] Sen P., Darabi S.: A novel framework for imaging using compressed sensing. In Proceedings of the 16th IEEE International Conference on Image Processing (2009), ICIP '09, pp. 2109–2112.
[SD11] Sen P., Darabi S.: Compressive rendering: A rendering application of compressed sensing. IEEE Transactions on Visualization and Computer Graphics 17, 4 (2011), 487–499.
[SGM15] Serrano A., Gutierrez D., Masia B.: Compressive high-speed video acquisition. In Proc. of CEIG (2015).
[SKL10] Szlam A., Kavukcuoglu K., LeCun Y.: Convolutional matching pursuit and dictionary training. arXiv:1010.0422 (2010).
[SKY∗12] Sen P., Kalantari N. K., Yaesoubi M., Darabi S., Goldman D. B., Shechtman E.: Robust patch-based HDR reconstruction of dynamic scenes. ACM Trans. Graph. 31, 6 (Nov. 2012), 203:1–203:11.
[SKZ∗13] Schöberl M., Keinert J., Ziegler M., Seiler J., Niehaus M., Schuller G., Kaup A., Foessel S.: Evaluation of a high dynamic range video camera with non-regular sensor. In Proc. SPIE (2013), vol. 8660, pp. 86600M–86600M–12.
[SS12] Srikantha A., Sidibé D.: Ghost detection and removal for high dynamic range images: Recent advances. Image Commun. 27, 6 (July 2012), 650–662.
[TG07] Tropp J. A., Gilbert A. C.: Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theor. 53, 12 (Dec. 2007), 4655–4666.
[TKTS11] Tocci M. D., Kiser C., Tocci N., Sen P.: A versatile HDR video production system. ACM Trans. Graph. 30, 4 (July 2011), 41:1–41:10.
[WBSS04] Wang Z., Bovik A. C., Sheikh H. R., Simoncelli E. P.: Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 13, 4 (2004), 600–612.
[WIH10] Wetzstein G., Ihrke I., Heidrich W.: Sensor saturation in Fourier multiplexed imaging. In Proc. CVPR (2010).
[YSM12] Yu G., Sapiro G., Mallat S.: Solving inverse problems with piecewise linear estimators: From Gaussian mixture models to structured sparsity. IEEE Transactions on Image Processing 21, 5 (2012), 2481–2499.
[ZBW11] Zimmer H., Bruhn A., Weickert J.: Freehand HDR imaging of moving scenes with simultaneous resolution enhancement. Computer Graphics Forum 30, 2 (2011), 405–414.
[ZKTF10] Zeiler M. D., Krishnan D., Taylor G. W., Fergus R.: Deconvolutional networks. In Proc. of CVPR (2010), pp. 2528–2535.
[ZSFC∗15] Zhao H., Shi B., Fernandez-Cull C., Yeung S.-K., Raskar R.: Unbounded high dynamic range photography using a modulo camera. In ICCP (2015).
[ZTF11] Zeiler M. D., Taylor G. W., Fergus R.: Adaptive deconvolutional networks for mid and high level feature learning. In IEEE International Conference on Computer Vision, ICCV 2011, Barcelona, Spain, November 6-13, 2011 (2011), pp. 2018–2025.
