Training Auto-encoder-based Optimizers for Terahertz Image Reconstruction

A Preprint

Tak Ming Wong 1,2, Matthias Kahl 1,3, Peter Haring Bolívar 1,3, Andreas Kolb 1,2, and Michael Möller 1,4

1 Center for Sensor Systems (ZESS), University of Siegen, 57076 Siegen, Germany
2 Computer Graphics and Multimedia Systems Group, University of Siegen, 57076 Siegen, Germany
3 Institute for High Frequency and Quantum Electronics (HQE), University of Siegen, 57068 Siegen, Germany
4 Computer Vision Group, University of Siegen, 57076 Siegen, Germany

October 30, 2019

Abstract

Terahertz (THz) sensing is a promising imaging technology for a wide variety of different applications. Extracting the interpretable and physically meaningful parameters for such applications, however, requires solving an inverse problem in which a model function determined by these parameters needs to be fitted to the measured data. Since the underlying optimization problem is nonconvex and very costly to solve, we propose learning the prediction of suitable parameters from the measured data directly. More precisely, we develop a model-based autoencoder in which the encoder network predicts suitable parameters and the decoder is fixed to a physically meaningful model function, such that we can train the encoding network in an unsupervised way. We illustrate numerically that the resulting network is more than 140 times faster than classical optimization techniques while making predictions with only slightly higher objective values. Using such predictions as starting points of local optimization techniques allows us to converge to better local minima about twice as fast as optimizing without the network-based initialization.
1 Introduction

Terahertz (THz) imaging is an emerging sensing technology with great potential for hidden object imaging, contact-free analysis, non-destructive testing and stand-off detection in various application fields, including the semiconductor industry, biological and medical analysis, material and quality control, and safety and security [1–3]. The physically interpretable quantities relevant to the aforementioned applications, however, cannot always be measured directly. Instead, in THz imaging systems, each pixel contains implicit information about such quantities, making the inverse problem of inferring these physical quantities a challenging problem with high practical relevance.

As we will discuss in Sec. 2, at each pixel location $\vec{x}$ the relation between the desired (unknown) parameters $\vec{p}(\vec{x}) = (\hat{e}(\vec{x}), \sigma(\vec{x}), \mu(\vec{x}), \phi(\vec{x})) \in \mathbb{R}^4$, i.e., the electric field amplitude $\hat{e}$, the position of the surface $\mu$, the width of the reflected pulse $\sigma$, and the phase $\phi$, and the actual measurements $g(\vec{x}) \in \mathbb{R}^{n_z}$ can be modelled via the equation $g(\vec{x}, z) = \left(f_{\hat{e},\sigma,\mu,\phi}(z_i)\right)_{i \in \{1,\dots,n_z\}} + \text{noise}$, where

$$f_{\hat{e},\sigma,\mu,\phi}(z) = \hat{e}\,\operatorname{sinc}(\sigma(z - \mu))\,\exp(-i(\omega z - \phi)), \tag{1}$$

$$\operatorname{sinc}(t) = \begin{cases} \dfrac{\sin(\pi t)}{\pi t} & t \neq 0, \\ 1 & t = 0, \end{cases} \tag{2}$$

and $(z_i)_{i \in \{1,\dots,n_z\}}$ is a device-dependent sampling grid $z_{grid}$. More details of the THz model are described in [4].

Thus, the crucial step in THz imaging is the solution of optimization problems of the form

$$\min_{\hat{e},\sigma,\mu,\phi} \operatorname{Loss}\left(f_{\hat{e},\sigma,\mu,\phi}(z_{grid}),\, g(\vec{x})\right), \tag{3}$$

at each pixel $\vec{x}$, possibly along with additional regularizers on the unknown parameters. Even with simple choices of the loss function such as an $\ell_2$-squared loss, the resulting fitting problem is highly nonconvex and global solutions become rather expensive.
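To make the per-pixel fitting problem concrete, the following is a minimal sketch of the model function (1), the sinc definition (2), and an $\ell_2$-squared instance of the loss in (3) for a single pixel. It is an illustration only, not the authors' implementation: the value of `OMEGA`, the sampling grid `Z_GRID`, and the parameter values in `TRUE_PARAMS` are hypothetical and were chosen purely for demonstration.

```python
import cmath
import math


def sinc(t):
    """Normalized sinc from Eq. (2): sin(pi t) / (pi t), with sinc(0) = 1."""
    if t == 0.0:
        return 1.0
    return math.sin(math.pi * t) / (math.pi * t)


def thz_model(z, e_hat, sigma, mu, phi, omega):
    """Complex model value f(z) from Eq. (1) at a single depth z."""
    return e_hat * sinc(sigma * (z - mu)) * cmath.exp(-1j * (omega * z - phi))


def l2_squared_loss(params, z_grid, g, omega):
    """l2-squared instance of the loss in Eq. (3) for one pixel."""
    e_hat, sigma, mu, phi = params
    return sum(
        abs(thz_model(z, e_hat, sigma, mu, phi, omega) - g_i) ** 2
        for z, g_i in zip(z_grid, g)
    )


# Hypothetical values, used only for illustration (not taken from the paper).
OMEGA = 0.5
Z_GRID = [float(i) for i in range(91)]
TRUE_PARAMS = (2.0, 0.1, 45.0, 0.3)  # (e_hat, sigma, mu, phi)

# Noise-free synthetic measurement generated from the model itself.
G_CLEAN = [thz_model(z, *TRUE_PARAMS, OMEGA) for z in Z_GRID]
```

Evaluating `l2_squared_loss(TRUE_PARAMS, Z_GRID, G_CLEAN, OMEGA)` gives zero by construction, whereas any other surface position `mu` gives a strictly positive value; a solver for (3) searches over all four parameters at once.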
Considering that the number ($n_x \cdot n_y$) of pixels, i.e., of instances of optimization problem (3) to be solved, typically is in the order of hundreds of thousands to millions, even local first-order or quasi-Newton methods become quite costly: for example, running the built-in Trust-Region solver of MATLAB® to reconstruct a 446 × 446 THz image takes over 170 minutes.

In this paper, we propose to train a neural network to solve the per-pixel optimization problem (3) directly. We formulate the training of the network as a model-based autoencoder (AE), which allows us to train the corresponding network with real data in an unsupervised way, i.e., without ground truth. We demonstrate that the resulting optimization network yields parameters $(\hat{e}, \sigma, \mu, \phi)$ that result in only slightly higher losses than actually running an optimization algorithm, while being more than 140 times faster. Moreover, we demonstrate that our network can serve as an excellent initialization scheme for classical optimizers. By using the network's prediction as a starting point for a gradient-based optimizer, we obtain lower losses and converge more than 2x faster than classical optimization approaches, while benefiting from all theoretical guarantees of the respective minimization algorithm.

This paper is organized as follows: Sec. 2 gives more details on how THz imaging systems work. Sec. 3 summarizes the related work on learning optimizers, machine learning for THz imaging techniques, and model-based autoencoders. Sec. 4 describes model-based AEs in contrast to classical supervised learning approaches in detail, before Sec. 5 summarizes our implementation. Sec. 6 compares the proposed approaches to classical (optimization-based) reconstruction techniques in terms of speed and accuracy before Sec. 7 draws conclusions.

2 THz Imaging Systems

There are several approaches to realizing THz imaging, e.g.
femtosecond laser-based scanning systems [5, 6], synthetic aperture systems [7, 8], and hybrid systems [9]. A typical approach to THz imaging is based on the Frequency Modulated Continuous Wave (FMCW) concept [8], which uses active frequency-modulated THz signals to sense reflected signals from the object. The reflected energy and phase shifts due to the signal path length make 3D THz imaging possible. In Figure 1, the setup of our electronic FMCW-THz 3D imaging system is shown. More details on the THz imaging system are described in [8].

Figure 1: THz 3D imaging geometry. Both transmitter (Tx) and receiver (Rx) are mounted on the same platform. The imaging unit, consisting of Tx, Rx and optical components, is moved along the x and y directions using stepper motors and linear stages. This imaging unit takes a depth profile of the object at each lateral position, in order to acquire a full THz 3D image.

In this paper, we denote by $g_t(\vec{x}, t)$ the measured demodulated time domain signal of the reflected electric field amplitude of the FMCW system at lateral position $\vec{x} \in \mathbb{R}^2$. In FMCW radar signal processing, this continuous wave temporal signal is converted into the frequency domain by a Fourier transform [10, 11]. Since the linear frequency sweep has a unique frequency at each spatial position in z-direction, the converted frequency domain signal directly relates to the spatial azimuth (z-direction) domain signal

$$g_c(\vec{x}, z) = \mathcal{F}\{g_t(\vec{x}, t)\}. \tag{4}$$

The resulting 3D image $g_c \in \mathbb{C}^{n_x \times n_y \times n_z}$ is complex data in the spatial domain, representing the per-pixel complex reflectivity of THz energy. The quantities $n_x$, $n_y$, $n_z$ represent the discretization in vertical, horizontal and depth direction, respectively.
Equivalently, we may represent $g_c$ by considering the real and imaginary parts as two separate channels, resulting in a 4D real data tensor $g \in \mathbb{R}^{n_x \times n_y \times n_z \times 2}$.

Since the system is calibrated by amplitude normalization with respect to an ideal metallic reflector, a rectangular frequency signal response is ensured for the FMCW frequency dependence [8]. After the FFT in (4), the z-direction signal envelope is an ideal sinc function as continuous spatial signal amplitude, giving rise to the physical model given in (1) in the introduction.

In (1), the electric field amplitude $\hat{e}$ is the reflection coefficient for the material, which is dependent on the complex dielectric constant of the material and helps to identify and classify materials. The depth position $\mu$ is the position at which maximum reflection occurs, i.e., the position of the surface reflecting the THz energy. $\sigma$ is the width of the reflected pulse, which includes information on the dispersion characteristics of the material. The phase $\phi$ of the reflected wave depends on the ratio of real to imaginary parts of the dielectric properties of the material. Thus, the parameters $\vec{p} = (\hat{e}, \sigma, \mu, \phi)$ contain important information about the geometry as well as the material of the imaged object, which is of interest in a wide variety of applications.

3 Related Work

Due to the revolutionary success (convolutional) neural networks have had on computer vision problems over the last decade, researchers have extended the fields of applications of neural networks significantly. A particularly interesting concept is to learn the solution of complex, possibly nonconvex, optimization problems. Different lines of research have considered directly learning the optimizer itself, e.g. modelled as a recurrent neural network [12], or rolling out optimization algorithms and learning the incremental steps, e.g. in the form of parameterized proximal operators in [13].
Further hybrid approaches include optimization problems in the networks' architecture, e.g. [14], or combining optimizers with networks that have been trained individually [15, 16]. The recent work of Moeller et al. [17] trains a network to predict descent directions to a given energy in order to give provable convergence results on the learned optimizer.

Objectives similar to the one arising in the training of our model-based AEs are considered, for instance, for solving inverse problems with deep image priors [18] or deep decoders [19]. These works, however, consider the input to the networks being fixed random noise and have to solve an optimization problem for the network's weights for each inverse problem, such that they are regularization-by-parametrization approaches rather than learned optimizers.

The most related prior work is the 3D face reconstruction network from Tewari et al. [20]. They aimed at finding a semantic code vector from a given facial image such that feeding this code vector into a rendering engine yields an image similar to the input image itself. While this problem had been addressed using optimization algorithms a long time ago [21] (also known under the name of analysis-by-synthesis approaches), the approach by Tewari et al. [20] replaced the optimizer with a neural network and kept the original cost function to train the network in an unsupervised way. The resulting structure resembles an AE in which the decoder is fixed to the forward model and was therefore coined model-based AE. As we will discuss in the next section, the idea of model-based AEs generalizes far beyond 3D face reconstruction and can be used to boost the THz parameter identification problem significantly.
Finally, a recent work has exploited deep learning techniques in Terahertz imaging [22], but the considered application of super-resolving the THz amplitude image by training a convolutional neural network on synthetically blurred images is not directly related to our proposed approach.

4 A Model-Based Autoencoder for THz Image Reconstruction

Let us denote the THz input data by $g \in \mathbb{R}^{n_x \times n_y \times n_z \times 2}$, and consider our four unknown parameters $(\hat{e}, \sigma, \mu, \phi)$ to be $\mathbb{R}^{n_x \times n_y}$ matrices, allowing each parameter to change at each pixel. Under slight abuse of notation we can interpret all operations in (1) to be pointwise and again identify complex values with two real values in order to have $f_{\hat{e},\sigma,\mu,\phi}(z_{grid}) \in \mathbb{R}^{n_x \times n_y \times n_z \times 2}$, where $z_{grid} = (z_i)_{i \in \{1,\dots,n_z\}}$ denotes the depth sampling grid. Concatenating all four matrix-valued parameters into a single parameter tensor $\vec{P} \in \mathbb{R}^{n_x \times n_y \times 4}$, our goal can be formalized as finding $\vec{P}$ such that $f_{\vec{P}}(z_{grid}) \approx g$.

Figure 2: Classical supervised learning strategy with simulated data: The forward model $f_P$ (e.g. from (1)) is used to simulate data $g$ from randomly drawn parameters $P = (\hat{e}, \sigma, \mu, \phi)$, which can subsequently be fed into a network to be trained to reproduce the simulation parameters in a supervised way.

A classical supervised machine learning approach to problems with known forward operator is illustrated in Figure 2 for the example of THz image reconstruction: The explicit forward model $f$ is used to simulate a large set of images $g$ from known parameters $P$, which can subsequently be used as training data for predicting $P$ via a neural network $G(g; \theta)$ depending on weights $\theta$. Such supervised approaches with simulated training data are frequently used in other image reconstruction areas, e.g.
super resolution [23, 24], or image deblurring [25, 26]. The accuracy of networks trained on simulated data, however, crucially relies on precise knowledge of the forward model and the simulated noise. Slight deviations thereof can significantly degrade a network's performance, as demonstrated in [27], where deep denoising networks trained on Gaussian noise were outperformed by BM3D when applied to realistic sensor noise.

Instead of pursuing the supervised learning approach described above, we replace $\vec{p} = (\hat{e}, \sigma, \mu, \phi)$ in the optimization approach (3) by a suitable network $G(g; \theta)$ that depends on the raw input data $g$ and learnable parameters $\theta$, and that can be trained in an unsupervised way on real data. Assuming we have multiple examples $g_k$ of THz data, and choosing the loss function in (3) as an $\ell_2$-squared loss, gives rise to the unsupervised training problem

$$\min_{\theta} \sum_{\text{training examples } k} \left\| f_{G(g_k;\theta)}(z_{grid}) - g_k \right\|_F^2. \tag{5}$$

As we have illustrated in Figure 3, this training resembles an AE architecture: The input to the network is data $g_k$, which gets mapped to parameters $P$ that, when fed into the model function $f$, ought to reproduce $g_k$ again. As opposed to the straightforward supervised learning approach, the proposed approach (5) has two significant advantages:

• It allows us to train the network in an unsupervised way, i.e., on real data, and therefore learn to deal with measurement-specific distortions.
• The cost function in (5) implicitly handles the scaling of different parameters, and therefore circumvents the problem of defining meaningful cost functions on the parameter space: Simple parameter discrepancies such as $\|P_1 - P_2\|_2^2$ for two different parameter sets $P_1$ and $P_2$ largely depend on the scaling of the individual parameters and might even be meaningless, e.g. for cyclic parameters such as the phase offset $\phi$.
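The second advantage can be illustrated with a small sketch of the objective (5) in which the decoder is the fixed model (1) and the encoder is an arbitrary callable. This is an assumption-laden toy, not the authors' PyTorch code: the names `decoder`, `autoencoder_loss` and `parameter_distance`, the value of `OMEGA`, the grid, and all parameter values are hypothetical. A constant function stands in for the network purely to show that two parameter sets whose phases differ by $2\pi$ have a large distance in parameter space but produce the same reconstruction loss.

```python
import cmath
import math


def sinc(t):
    """Normalized sinc, Eq. (2)."""
    return 1.0 if t == 0.0 else math.sin(math.pi * t) / (math.pi * t)


def decoder(params, z_grid, omega):
    """Fixed, non-learnable decoder: the physical model of Eq. (1)."""
    e_hat, sigma, mu, phi = params
    return [
        e_hat * sinc(sigma * (z - mu)) * cmath.exp(-1j * (omega * z - phi))
        for z in z_grid
    ]


def autoencoder_loss(encoder, signals, z_grid, omega):
    """Objective of Eq. (5): reconstruction error summed over all examples."""
    total = 0.0
    for g in signals:
        reconstruction = decoder(encoder(g), z_grid, omega)
        total += sum(abs(r - g_i) ** 2 for r, g_i in zip(reconstruction, g))
    return total


def parameter_distance(p1, p2):
    """Naive squared Euclidean distance in parameter space."""
    return sum((a - b) ** 2 for a, b in zip(p1, p2))


# Hypothetical values, for illustration only.
OMEGA = 0.5
Z_GRID = [float(i) for i in range(91)]
P_TRUE = (2.0, 0.1, 45.0, 0.3)
P_SHIFTED = (2.0, 0.1, 45.0, 0.3 + 2.0 * math.pi)  # same phase modulo 2*pi
SIGNALS = [decoder(P_TRUE, Z_GRID, OMEGA)]

# A stand-in "encoder" that always predicts the phase-shifted parameters.
loss_shifted = autoencoder_loss(lambda g: P_SHIFTED, SIGNALS, Z_GRID, OMEGA)
dist_shifted = parameter_distance(P_TRUE, P_SHIFTED)
```

Here `dist_shifted` equals $(2\pi)^2$ although both parameter sets describe the same signal, while `loss_shifted` vanishes up to floating-point rounding. A supervised loss on the parameters would penalize the shifted prediction; the model-based loss does not.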
Figure 3: A model-based AE for THz image reconstruction: The input data $g$ is fed into a network $G$ whose parameters $\theta$ are trained in such a way that feeding the network's prediction $G(g; \theta)$ into a model function $f$ again reproduces the input data $g$. Such an architecture resembles an AE with a learnable encoder and a model-based decoder and allows unsupervised training on real data.

5 Encoder Network Architecture and Training

5.1 Data Preprocessing

As illustrated in the plot of the magnitude of an exemplary measured THz signal shown in Figure 4, the THz energy is mainly focused in the main lobe and first side-lobes of the sinc function. Because the physical model remains valid in close proximity of the main lobe only, we preprocess the data to reduce the large range of 12600 measurements per pixel. We therefore crop out 91 measurements per pixel centered around the main lobe, whose position is related to the object distance and to the parameter $\mu$. Details of the cropping window are described in [4]. We represent the THz data in a 4D real tensor $g \in \mathbb{R}^{n_x \times n_y \times n_z \times 2}$, where $n_x = n_y = 446$, and $n_z$ is the size of the cropping window, i.e. 91 in our case.

Figure 4: Magnitude of a sample point of the measured THz signal (signal magnitude plotted over $z$). The main lobe and major side-lobes are included in the grid window $z_{grid}$, which is colored in gray.

5.2 Encoder Architecture and Training

For the encoder network $G(g; \theta)$ we pick a spatially decoupled architecture using 1 × 1 convolutions on $g$ only, leading to a signal-by-signal reconstruction mechanism that allows a high level of parallelism and therefore maximizes the reconstruction speed on a GPU.
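The cropping step of Sec. 5.1 can be sketched as follows. The exact construction of the cropping window is given in [4] and is not reproduced here; this sketch simply assumes that the window is centered on the sample of largest magnitude and clamped to the signal boundaries. The function name `crop_around_main_lobe` and the synthetic depth profile are hypothetical.

```python
def crop_around_main_lobe(signal, window=91):
    """Return (start index, `window` consecutive samples) centred on the peak.

    Assumption: the main lobe is located at the sample of largest magnitude.
    """
    if len(signal) < window:
        raise ValueError("signal is shorter than the cropping window")
    peak = max(range(len(signal)), key=lambda i: abs(signal[i]))
    half = window // 2
    # Clamp the start index so the window stays inside the signal.
    start = min(max(peak - half, 0), len(signal) - window)
    return start, signal[start:start + window]


# Hypothetical depth profile: 12600 complex samples with one peak at index 6300.
profile = [0j] * 12600
profile[6300] = 1500 + 0j
start, cropped = crop_around_main_lobe(profile)
```

With an odd window of 91 samples the peak ends up at the central index 45 of the cropped signal, with 45 samples kept on either side.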
The specific architecture (illustrated in Figure 5) applies a first set of convolutional filters on the real and imaginary part separately, before concatenating the activations, and applying three further convolutional filters on the concatenated structure. We apply batch normalization (BN) [28] after each convolution and use leaky rectified linear units (LeReLU) [29] as activations. Finally, a fully connected layer reduces the dimension to the desired size of four output parameters per pixel. To ensure that the amplitude is physically meaningful, i.e., non-negative, we apply an absolute value function on the first component. Interestingly, this choice compared favorably to a plain rectified linear unit when the network is trained.

Figure 5: Architecture of the encoding network $G(g; \theta)$ that predicts the parameters: At each pixel the real and imaginary parts are extracted, convolved, concatenated and processed via three convolutional layers and one fully connected layer. To obtain physically meaningful (non-negative) amplitudes, we apply an absolute value function to the first component.

Figure 6: The average losses of the training and validation sets over 1200 epochs on a decibel scale illustrate that there is almost no generalization gap between training and validation.

We train our model by optimizing (5) using the Adam optimizer [30] on 80% of the 446 × 446 pixels from a real (measured) THz image for 1200 epochs. The remaining 20% of the pixels serve as a validation set. The batch size is set to 4096. The initial learning rate is set to 0.005, and is reduced by a factor of 0.99 every 20 epochs.
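The training schedule described above can be written down as a short sketch. It assumes a step-wise decay (the rate is multiplied by 0.99 once every 20 completed epochs), a training set size obtained by rounding down, and a final partial batch that is kept; none of these details are stated in the text, and the helper names are hypothetical.

```python
def learning_rate(epoch, initial=0.005, factor=0.99, step=20):
    """Step decay: multiply the rate by `factor` once every `step` epochs."""
    return initial * factor ** (epoch // step)


def split_counts(n_pixels, train_fraction=0.8):
    """Number of training and validation pixels (training count rounded down)."""
    n_train = int(n_pixels * train_fraction)
    return n_train, n_pixels - n_train


n_train, n_val = split_counts(446 * 446)
batches_per_epoch = -(-n_train // 4096)  # ceiling division, keeps last partial batch
final_lr = learning_rate(1199)           # rate used during the last of 1200 epochs
```

Under these assumptions the 198916 pixels split into 159132 training and 39784 validation pixels, each epoch consists of 39 batches, and the learning rate decays only mildly, from 0.005 to roughly 0.0028 in the final epoch.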
Figure 6 illustrates the decay of the training and validation losses over 1200 epochs. As we can see, the validation loss closely follows the training loss with almost no generalization gap.

6 Numerical Experiments

We evaluate the proposed model-based AE on two datasets, which are acquired using the setup described in Sec. 2, namely the MetalPCB dataset and the StepChart dataset. The MetalPCB dataset is measured from a nearly planar copper target etched on a circuit board (Figure 7a), which includes metal and PCB material regions, in the standard size scale of the USAF target MIL-STD-150A [31]. After the preprocessing described in Sec. 5.1, the MetalPCB dataset has 446 × 446 × 91 sample points. The StepChart dataset is based on an aluminum object (Figure 7b) with sharp edges to evaluate the distance measurement accuracy using a 3D object. The StepChart dataset has 113 × 575 × 91 sample points after preprocessing.

Figure 7: Objects of the evaluated datasets: (a) MetalPCB dataset; (b) StepChart dataset.

In order to evaluate the optimization quality on different materials and structures, the MetalPCB dataset is evaluated in three regions: the PCB region is a local region that contains PCB material only, the Metal region is a local region that contains copper material only, and the All region is the entire image area. Similarly, the StepChart dataset is evaluated in three regions: the Edge region is the region that contains physical edges, the Steps region is the center planar region of each step, and the All region is the entire image area. This segmentation is done because the THz measurements of the highly specular aluminum target result in strong multi-path interference artifacts at the edges that should be investigated separately.

The proposed model-based AE is trained on the MetalPCB dataset only, while the parameter inference is made for both the MetalPCB and StepChart datasets.
This cross-referencing between two datasets can verify whether the proposed AE method is modelling the physical behavior of the system without overfitting to a specific dataset or recorded material.

To compare with the classical optimization methods, the parameters are estimated using the Trust-Region Algorithm (TRA) [32], which is implemented in MATLAB®. The TRA optimization requires a proper definition of the parameter ranges. Furthermore, it is very sensitive with respect to the initial parameter set. We therefore carefully select the initial parameters by sequentially estimating them from the source data (see [4] for more details). Still, the optimization may result in a parameter set with significant loss values; see Sec. 6.2.

The trained encoder network is independent of any initialization scheme as it tries to directly predict optimal parameters from the input data. While the network alone gives remarkably good results with significantly lower runtimes than the optimization method, there is no guarantee that the network's predictions are critical points of the energy to be minimized. This motivates the use of the encoder network as an initialization scheme to the TRA, specifically because the TRA guarantees the monotonic decrease of the objective function, such that using the TRA on top of the network can only improve the results. We abbreviate this approach as AE+TRA for the rest of this paper.

To fairly compare all three approaches, the optimization time of the TRA and the inference time of the AE are both measured on an Intel® i7-8700K CPU, while the AE is trained on an NVIDIA® GTX 1080 GPU. The PyTorch source code is available at https://github.com/tak-wong/THz-AutoEncoder .
6.1 Loss and timing

Table 1: Loss and timing enhancement based on the proposed model-based AE

| Dataset (Region) | Measurement | TRA | AE | AE+TRA |
| MetalPCB (All) | Average loss | 693.9 | 886.3 | 442.2 |
| MetalPCB (PCB) | Average loss | 589.0 | 872.6 | 589.0 |
| MetalPCB (Metal) | Average loss | 519.6 | 446.1 | 115.7 |
| StepChart (All) | Average loss | 3815.1 | 5148.3 | 3675.3 |
| StepChart (Edges) | Average loss | 4860.4 | 6309.1 | 2015.7 |
| StepChart (Steps) | Average loss | 1152.5 | 2015.7 | 1150.3 |
| MetalPCB | Training time (sec.) | none | 9312.8 | 9312.8 |
| MetalPCB | Run time (sec.) | 10391.2 | 73.5 † | 4854.7 ∗ |
| StepChart | Run time (sec.) | 3463.9 | 22.8 † | 1712.4 ∗ |

† Inference time
∗ Run time is the sum of AE inference and TRA optimization time

In Table 1, the average loss in (5) and the timing are shown for the Trust-Region Algorithm (TRA), the Autoencoder (AE) and the joint AE+TRA approaches, respectively. We can see that, although the proposed encoder network achieves a lower average loss than the TRA method in the metal region of the MetalPCB dataset, it yields higher average losses than the TRA over the entire image area of both datasets. It is encouraging to see that although the AE was trained on the MetalPCB dataset, the relative performance in comparison to the TRA does not decay too significantly when changing to an entirely unseen dataset with a different material, with the TRA loss being 21.7% and 25.9% lower than the AE loss on the MetalPCB and StepChart datasets, respectively (equivalently, the AE loss is about 27.7% and 34.9% higher than the TRA loss). If such a sacrifice in accuracy is acceptable, the speed-up in runtime is tremendous, with the AE being over 140 times faster than the TRA (for both methods being evaluated on a CPU). Note that even the sum of training and inference time is smaller for the proposed AE than the runtime of the TRA on the MetalPCB dataset.

Interestingly, the combined AE+TRA approach of initializing the TRA with the encoder network's prediction leads to losses that are no higher than those of the TRA alone in all regions, and lower in all regions except the PCB region, where the two coincide at the reported precision.
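The headline figures quoted in this section follow directly from Table 1. The short calculation below reproduces them from the tabulated values; only the variable and function names are introduced here. Note that the 21.7% and 25.9% loss gaps are obtained when the difference between the AE and TRA losses is divided by the AE loss.

```python
# Values copied from Table 1 (run and training times in seconds).
run_time = {
    "MetalPCB": {"TRA": 10391.2, "AE": 73.5, "AE+TRA": 4854.7},
    "StepChart": {"TRA": 3463.9, "AE": 22.8, "AE+TRA": 1712.4},
}
average_loss_all = {
    "MetalPCB": {"TRA": 693.9, "AE": 886.3},
    "StepChart": {"TRA": 3815.1, "AE": 5148.3},
}
AE_TRAINING_TIME = 9312.8


def speedup(dataset, method):
    """How many times faster `method` runs than the plain TRA."""
    return run_time[dataset]["TRA"] / run_time[dataset][method]


def loss_gap_percent(dataset, relative_to):
    """Gap between AE and TRA average loss, in percent of `relative_to`."""
    losses = average_loss_all[dataset]
    return 100.0 * (losses["AE"] - losses["TRA"]) / losses[relative_to]


for name in run_time:
    print(
        name,
        f"AE speed-up: {speedup(name, 'AE'):.1f}x,",
        f"AE+TRA speed-up: {speedup(name, 'AE+TRA'):.2f}x,",
        f"loss gap: {loss_gap_percent(name, 'AE'):.1f}% of AE loss,",
        f"{loss_gap_percent(name, 'TRA'):.1f}% of TRA loss",
    )

# Training plus inference of the AE versus a single TRA run on MetalPCB.
ae_total_metalpcb = AE_TRAINING_TIME + run_time["MetalPCB"]["AE"]
```

The AE speed-up exceeds 140 on both datasets, the AE+TRA combination is slightly more than twice as fast as the plain TRA, and `ae_total_metalpcb` stays below the TRA run time of 10391.2 seconds.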
Additionally, the AE-initialized TRA converged more than 2 times faster due to the stopping criterion being reached earlier.

We note that the losses of all approaches are significantly higher for the StepChart dataset than they are for the MetalPCB dataset. This is because the aluminum StepChart object (Figure 7b) has a more complex physical structure than the MetalPCB object, which results in a mixture of scattered THz pulses by multi-path interference effects in all object regions. Incorporating such effects in the reflection model of (1) could therefore be an interesting aspect of future research for improving the explainability of the measured data with the physical model.

6.2 Quality Assessment of THz Images

In THz imaging, the intensity image $I$, which is equal to the squared amplitude, i.e. $I = \hat{e}^2$, is the most important criterion for quality assessment. Note that the intensity could be inferred directly from the data by considering that (1) yields

$$f_{\hat{e},\sigma,\mu,\phi}(\mu) \cdot f^{*}_{\hat{e},\sigma,\mu,\phi}(\mu) = \hat{e}^2 \cdot \operatorname{sinc}^2(0) = \hat{e}^2 = I, \tag{6}$$

where $f^{*}$ is the complex conjugate of $f$.

As we illustrate in Figure 8, the model-based approach is not only capable of extracting all relevant parameters, i.e., $\hat{e}$, $\mu$, $\sigma$ and $\phi$, but, compared to values directly extracted from the source data, the resulting intensity $I$ is more homogeneous in homogeneous material regions. The inhomogeneity of the directly extracted intensity results from the very low depth of field of THz imaging systems in general, combined with the slight non-planarity of the MetalPCB target.

As depicted in Figure 8c, the intensity variations along the selected line in the homogeneous copper region are reduced using the three model-based methods, i.e. TRA, AE, and AE+TRA. However, due to the crucial selection of the initial parameters (see discussion at the beginning of Sec.
6), the TRA optimization results exhibit significant amplitude fluctuations and loss values (Figure 8d) in the two horizontal sub-regions $x \in [150, 200]$ and $x > 430$. The proposed AE and AE+TRA methods, however, deliver superior results with respect to the main quality measures applied in THz imaging, i.e. the intensity homogeneity and the loss in model fitting. Still, the AE approach shows very few extreme loss values, while the AE+TRA method's loss values are consistently low along the selected line in the homogeneous copper region.

7 Conclusions and Future Work

In this paper, we propose a model-based autoencoder for THz image reconstruction. Compared to a classical Trust-Region optimizer, the proposed autoencoder gets within a 25% margin of the objective value of the optimizer, while being more than 140 times faster. Using the network's prediction as an initialization to a gradient-based optimization scheme improves the result over a plain optimization scheme in terms of objective values while still being two times faster. We believe that these are very promising results for training optimizers/initialization schemes for parameter identification problems in general by exploiting the idea of model-based autoencoders for unsupervised learning.

Future research will include exploiting spatial information during the reconstruction as well as considering joint parameter identification and reconstruction problems such as denoising, sharpening, and super-resolving parameter images such as the amplitude images shown in Figure 8b.

Acknowledgement

This is a pre-print of a conference proceeding article published in German Conference on Pattern Recognition.
The final authenticated version is available online at: https://doi.org/10.1007/978-3-030-33676-9_7

Figure 8: Comparison of the THz intensity for the MetalPCB dataset: (a) intensity image extracted from the source data without any model-based processing (in red: the pixel line for plots (c) and (d)); (b) image extracted by the proposed AE+TRA approach (in red: the pixel line for plots (c) and (d)); (c) plot of the intensity extracted along the horizontal line in the copper region; (d) plot of the per-pixel loss by TRA, AE, and AE+TRA approaches along the horizontal line in the copper region.

References

[1] Wai Lam Chan, Jason Deibel, and Daniel M Mittleman. Imaging with terahertz radiation. Reports on Progress in Physics, 70(8):1325, 2007.
[2] Christian Jansen, Steffen Wietzke, Ole Peters, Maik Scheller, Nico Vieweg, Mohammed Salhi, Norman Krumbholz, Christian Jördens, Thomas Hochrein, and Martin Koch. Terahertz imaging: applications and perspectives. Appl. Opt., 49(19):E48–E57, 2010.
[3] Peter H Siegel. Terahertz technology. IEEE Transactions on Microwave Theory and Techniques, 50(3):910–928, 2002.
[4] Tak Ming Wong, Matthias Kahl, Peter Haring Bolívar, and Andreas Kolb. Computational image enhancement for frequency modulated continuous wave (FMCW) THz image. Journal of Infrared, Millimeter, and Terahertz Waves, 40(7):775–800, 2019.
[5] Ken B Cooper, Robert J Dengler, Nuria Llombart, Bertrand Thomas, Goutam Chattopadhyay, and Peter H Siegel. THz imaging radar for standoff personnel screening.
IEEE Transactions on Terahertz Science and Technology, 1(1):169–182, 2011.
[6] Binbin B Hu and Martin C Nuss. Imaging with terahertz waves. Optics Letters, 20(16):1716–1718, 1995.
[7] K McClatchey, MT Reiten, and RA Cheville. Time resolved synthetic aperture terahertz impulse imaging. Applied Physics Letters, 79(27):4485–4487, 2001.
[8] Jinshan Ding, Matthias Kahl, Otmar Loffeld, and Peter Haring Bolívar. THz 3-D image formation using SAR techniques: simulation, processing and experimental results. IEEE Transactions on Terahertz Science and Technology, 3(5):606–616, 2013.
[9] M. Kahl, A. Keil, J. Peuser, T. Löffler, M. Pätzold, A. Kolb, T. Sprenger, B. Hils, and P. Haring Bolívar. Stand-off real-time synthetic imaging at mm-wave frequencies. In Passive and Active Millimeter-Wave Imaging XV, volume 8362, page 836208, 2012.
[10] David C Munson and Robert L Visentin. A signal processing view of strip-mapping synthetic aperture radar. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(12):2131–2147, 1989.
[11] Merrill Ivan Skolnik. Radar handbook. 1970.
[12] Marcin Andrychowicz, Misha Denil, Sergio Gomez Colmenarejo, Matthew W. Hoffman, David Pfau, Tom Schaul, and Nando de Freitas. Learning to learn by gradient descent by gradient descent. In Proc. Int. Conf. on Neural Information Processing Systems (NIPS), 2016.
[13] Erich Kobler, Teresa Klatzer, Kerstin Hammernik, and Thomas Pock. Variational networks: Connecting variational methods and deep learning. In Proc. German Conf. Pattern Recognition (GCPR), 2017.
[14] Brandon Amos and J. Zico Kolter. OptNet: Differentiable optimization as a layer in neural networks. In Proc. Int. Conf. on Machine Learning, 2017.
[15] J-H. Chang, C-L. Li, B. Poczos, B.V.K. Vijaya Kumar, and A.C. Sankaranarayanan. One network to solve them all: solving linear inverse problems using deep projection models. In Proc. IEEE Int. Conf. on Computer Vision, 2017.
[16] T. Meinhardt, M. Moeller, C. Hazirbas, and D. Cremers. Learning proximal operators: Using denoising networks for regularizing inverse imaging problems. In Proc. IEEE Int. Conf. on Computer Vision, 2017.
[17] M. Moeller, T. Möllenhoff, and D. Cremers. Controlling neural networks via energy dissipation, 2019. Online at https://arxiv.org/abs/1904.03081 .
[18] D. Ulyanov, A. Vedaldi, and V.S. Lempitsky. Deep image prior. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2018.
[19] R. Heckel and P. Hand. Deep decoder: Concise image representations from untrained non-convolutional networks. In Int. Conf. on Learning Representations, 2019.
[20] Ayush Tewari, Michael Zollhöfer, Hyeongwoo Kim, Pablo Garrido, Florian Bernard, Patrick Perez, and Christian Theobalt. MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction. In Proc. IEEE Int. Conf. on Computer Vision, 2017.
[21] Volker Blanz and Thomas Vetter. A morphable model for the synthesis of 3D faces. In Proc. SIGGRAPH, pages 187–194, New York, NY, USA, 1999. ACM Press/Addison-Wesley Publishing Co.
[22] Zhenyu Long, Tianyi Wang, ChengWu You, Zhengang Yang, Kejia Wang, and Jinsong Liu. Terahertz image super-resolution based on a deep convolutional neural network. Applied Optics, 58(10):2731–2735, 2019.
[23] Chao Dong, Chen Change Loy, Kaiming He, and Xiaoou Tang. Learning a deep convolutional network for image super-resolution. In Proc. Europ. Conf. Computer Vision, pages 184–199. Springer, 2014.
[24] Jiwon Kim, Jung Kwon Lee, and Kyoung Mu Lee. Accurate image super-resolution using very deep convolutional networks. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 1646–1654, 2016.
[25] Seungjun Nah, Tae Hyun Kim, and Kyoung Mu Lee. Deep multi-scale convolutional neural network for dynamic scene deblurring. In Proc. IEEE Conf.
Computer Vision and Pattern Recognition, pages 3883–3891, 2017.
[26] Christian J Schuler, Michael Hirsch, Stefan Harmeling, and Bernhard Schölkopf. Learning to deblur. IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), 38(7):1439–1451, 2016.
[27] T. Plötz and S. Roth. Benchmarking denoising algorithms with real photographs. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017.
[28] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proc. Int. Conf. on Machine Learning, 2015.
[29] Xavier Glorot, Antoine Bordes, and Yoshua Bengio. Deep sparse rectifier neural networks. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 315–323, 2011.
[30] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
[31] Military Standard. Photographic lenses, 1959.
[32] Thomas F Coleman and Yuying Li. An interior trust region approach for nonlinear minimization subject to bounds. SIAM Journal on Optimization, 6(2):418–445, 1996.
