Unsupervised and Unregistered Hyperspectral Image Super-Resolution with Mutual Dirichlet-Net
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING

Ying Qu, Member, IEEE, Hairong Qi, Fellow, IEEE, Chiman Kwan, Senior Member, IEEE, Naoto Yokoya, Member, IEEE, and Jocelyn Chanussot, Fellow, IEEE

Abstract—(Please find the final version from IEEE Transactions on Geoscience and Remote Sensing on IEEE Xplore. The code has been released on GitHub at https://github.com/yingutk/u2MDN.) Hyperspectral images (HSI) provide rich spectral information that has contributed to the successful performance improvement of numerous computer vision and remote sensing tasks. However, this can only be achieved at the expense of the images' spatial resolution. Hyperspectral image super-resolution (HSI-SR) thus addresses this problem by fusing a low-resolution (LR) HSI with a multispectral image (MSI) carrying much higher spatial resolution (HR). Existing HSI-SR approaches require the LR HSI and HR MSI to be well registered, and the reconstruction accuracy of the HR HSI relies heavily on the registration accuracy of the different modalities. In this paper, we propose an unregistered and unsupervised mutual Dirichlet-Net (u2-MDN) to exploit the uncharted problem domain of HSI-SR without the requirement of multi-modality registration. The success of this endeavor would largely facilitate the deployment of HSI-SR, since the registration requirement is difficult to satisfy in real-world sensing devices. The novelty of this work is three-fold. First, to stabilize the fusion procedure of two unregistered modalities, the network is designed to extract the spatial and spectral information of two modalities with different dimensions through a shared encoder-decoder structure. Second, mutual information (MI) is further adopted to capture the non-linear statistical dependencies between the representations from the two modalities (carrying spatial information) and their raw inputs.
By maximizing the MI, the spatial correlations between different modalities can be well characterized to further reduce the spectral distortion. We assume the representations follow a similar Dirichlet distribution for its inherent sum-to-one and non-negative properties. Third, a collaborative $\ell_{2,1}$ norm is employed as the reconstruction error instead of the more common $\ell_2$ norm to better preserve the spectral information. Extensive experimental results demonstrate the superior performance of u2-MDN as compared to the state-of-the-art.

Index Terms—Hyperspectral image, unregistered, super-resolution, mutual information, unsupervised deep learning

I. INTRODUCTION

(Ying Qu and Hairong Qi are with the Advanced Imaging and Collaborative Information Processing Group, Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN 37996 USA (e-mail: yqu3@vols.utk.edu; hqi@utk.edu). Chiman Kwan is with Applied Research LLC, Rockville, MD 20850 USA (e-mail: chiman.kwan@arllc.net). Naoto Yokoya is with the RIKEN Center for Advanced Intelligence Project (AIP), Tokyo 103-0027, Japan (e-mail: naoto.yokoya@riken.jp). Jocelyn Chanussot is with Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, Grenoble 38000, France (e-mail: jocelyn@hi.is).)

Fig. 1. Unregistered hyperspectral image super-resolution. (a) First band of the 20-degree rotated and cropped LR HSI with 38% information missing. (b) First band of the HR MSI. (c) First band of the reconstructed HR HSI by the proposed method. (d) First band of the reference HR HSI.

Hyperspectral image (HSI) collects hundreds of contiguous spectral representations of objects, which demonstrates advantages over the conventional multispectral image (MSI) or RGB image with much less spectral information [1], [2]. Compared
to conventional images, the rich spectral information of HSI can effectively distinguish visually similar objects that actually consist of different materials. Thus, HSI has been shown to enhance the performance of a wide range of computer vision and remote sensing tasks, such as object recognition and classification [3]–[5], segmentation [6], tracking [7], environmental monitoring [8], and change detection [9]. During the HSI acquisition process, the finer the spectral resolution, the smaller the radiation energy that can reach the sensor for a particular spectral band within a narrow wavelength range. Thus, the high spectral resolution of HSI can only be achieved at the cost of its spatial resolution due to hardware limitations [10], [11]. On the contrary, we can obtain a conventional MSI or RGB image with much higher spatial resolution by integrating the radiation energy over broad spectral bands, which inevitably reduces the spectral resolution significantly [12]. To improve the spatial resolution of HSI for better application performance, a natural way is to fuse the high spectral information extracted from the HSI with the high-resolution spatial information extracted from conventional images to yield images of high resolution in both the spatial and spectral domains [4], [13]. This procedure is referred to as hyperspectral image super-resolution (HSI-SR) [11], [12]. HSI-SR methods can be broadly divided into three categories: traditional component substitution (CS) [14], [15] and multi-resolution analysis (MRA) based methods [16], matrix factorization based methods, and Bayesian-based approaches [4], [17]. Although HSI-SR has been intensively studied, spectral distortion can easily be introduced during the optimization procedure of methods from these categories.
Recently, there have been several attempts to address the HSI-SR problem with deep learning, where the mapping function between the LR HSI and HR HSI is learned using different frameworks [18], [19]. However, the deep learning-based approaches are generally limited in handling image pairs with large spatial-scale differences, and the learned mapping function may not be readily adapted to reconstruct an HR HSI possessing different spectral characteristics or acquired from different sensors. Despite a plethora of works on HSI-SR, all current approaches have at least one prerequisite to solving the problem of HSI-SR, i.e., the two input modalities (HSI and MSI) must be well registered, and the quality of the reconstructed HR HSI relies heavily on the registration accuracy [2], [4], [20]–[22]. According to previous works, there are a few methods that introduce registration as a pre-step before data fusion [17], [23], [24]. However, these pre-steps can only handle small-scale differences, e.g., a two-pixel offset in the LR HSI or an eight-pixel offset in the HR MSI [20]. Moreover, even in the registration community, HSI and MSI registration is a challenging problem in itself, as one pixel in the LR HSI may cover hundreds of pixels in the corresponding HR MSI. The spectral difference is also so large that both the spectral response function (SRF) and multi-band images have to be taken into consideration during registration [20], [25]–[28]. In this paper, an unsupervised network structure is proposed, aiming to solve the HSI-SR problem directly without multi-modality registration. An example is shown in Fig. 1.
We address the problem based on the assumption that the pixels in the overlapped region of the HR HSI and HR MSI can be approximated by a linear combination of the same spectral information (spectral bases) with the same corresponding spatial information (representations), which indicates how the spectral basis is constructed for each pixel. Since the LR HSI is the down-sampled version of the HR HSI, ideally, its representations should be correlated with those of the HR MSI and HR HSI, i.e., they should follow similar patterns and distributions although possessing different resolutions, as shown in Fig. 2. Therefore, to reconstruct the HR HSI with minimum spectral distortion, the network is designed to decouple both the LR HSI and HR MSI into spectral bases and representations, such that their spectral bases are shared and their representations are correlated with each other.

Fig. 2. Learned hidden representations from unregistered (a) low-resolution HSI and (b) high-resolution MSI, respectively, as shown in Fig. 1.

The novelty of this work is three-fold. First, to stabilize the fusion procedure for two unregistered modalities, the network extracts both the spectral and spatial information of the multi-modalities through the same encoder-decoder structure, by projecting the LR HSI onto the same statistical space as the HR MSI, as illustrated in Fig. 3. The representations of the network are encouraged to follow a Dirichlet distribution to naturally meet the non-negative and sum-to-one physical constraints. Second, to prevent spectral distortion, we further adopt mutual information (MI) to extract optimal and correlated representations from the multi-modalities. Since the two modalities are unregistered, the correlated representations are learned by maximizing the MI between the representations and their own inputs during the network optimization.
Third, a collaborative $\ell_{2,1}$ norm is employed as the reconstruction error instead of the commonly used $\ell_2$ loss, so that the network is able to reconstruct individual pixels as accurately as possible. In this way, the network preserves the spectral information better. With the above design, the proposed network is able to work directly on unregistered images, and the spectral distortion of the reconstructed HR HSI can be largely reduced. The proposed method is referred to as unregistered and unsupervised mutual Dirichlet-Net, or u2-MDN for short. u2-MDN is an extension of our previous work uSDN [21]. However, uSDN is only effective on the general HSI-SR problem with well-registered LR HSI and HR MSI. Here, we have made substantial extensions to address the challenges of HSI-SR with unregistered multi-modalities. To the best of our knowledge, this is the first effort to solve the HSI-SR problem directly on unregistered image pairs with unsupervised deep learning. The major improvements can be summarized from three perspectives. First, the network structure is different from that of uSDN. Instead of adopting two deep learning networks as in uSDN, the proposed u2-MDN is specifically designed to extract the representations of the multi-modalities with only one encoder-decoder structure, which largely stabilizes the information extraction and fusion procedure given the unregistered multi-modalities. Second, uSDN minimizes the spectral distortion of the reconstructed HR HSI by reducing the angular difference of the representations from multiple modalities, which fails to deal with unregistered cases, while the proposed u2-MDN is able to handle both well-registered and unregistered cases by extracting correlated representations with mutual information through the mutual discriminative network.
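As a small illustration of the collaborative $\ell_{2,1}$ reconstruction error mentioned above (a numpy sketch with our own variable names, not the paper's implementation): the $\ell_{2,1}$ norm of a residual matrix sums the $\ell_2$ norms of its rows, so each pixel (row) contributes its own un-squared error magnitude instead of being averaged away as in an $\ell_2$ loss.

```python
import numpy as np

def l21_norm(residual):
    """Collaborative l2,1 norm: sum of the l2 norms of the rows.

    Each row of `residual` is the reconstruction error of one pixel's
    spectrum, so every pixel is penalized individually.
    """
    return np.linalg.norm(residual, axis=1).sum()

# Toy residual: two pixels, three spectral bands.
R = np.array([[3.0, 4.0, 0.0],   # row norm 5
              [0.0, 0.0, 2.0]])  # row norm 2
print(l21_norm(R))               # 7.0, vs. the Frobenius (l2) norm ~5.39
```

Because the row norms are not squared, a single badly reconstructed pixel cannot be hidden by many well-reconstructed ones, which is consistent with the paper's goal of reconstructing individual pixels as accurately as possible.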
Third, instead of the commonly used $\ell_2$ loss adopted by uSDN, the collaborative $\ell_{2,1}$ norm is introduced in the proposed u2-MDN to better preserve the spectral information.

II. RELATED WORK

A. Hyperspectral Image Super-Resolution

The problem of HSI-SR originates from multispectral image super-resolution (MSI-SR) in the remote sensing field, where the spatial resolution of MSI is further improved by a high-resolution panchromatic image (PAN). Traditional widely utilized MSI-SR methods can be roughly categorized into two groups: the component substitution (CS) and the multi-resolution analysis (MRA) based approaches. Generally, CS-based approaches [14] project the given data onto a predefined space where the spectral information and spatial information are separated. Subsequently, the spatial component is substituted with the one extracted from the PAN [15]. Several methods based on CS have been proposed to address the problem of hyper-sharpening and achieved promising results with different criteria [29]–[31]. MRA-based approaches obtain the spatial details by first applying a spatial filter to the HR images. Then the spatial details are injected into the LR HSI [16], [17], [32]–[34]. Although these traditional pan-sharpening approaches can be extended to solve the HSI-SR problem, they usually suffer from severe spectral distortions [11], [17], [35]. Recent approaches consist of Bayesian-based and matrix factorization-based methods [4], [17]. Bayesian approaches estimate the posterior distribution of the HR HSI given the LR HSI and HR MSI. The Bayesian framework offers a convenient way to regularize the solution space of the HR HSI by employing a proper prior distribution such as a Gaussian. Different methods vary according to the different prior distributions adopted. Wei et al.
proposed a Bayesian Naive method [36] based on the assumption that the representation coefficients of the HR HSI follow a Gaussian distribution. However, this assumption does not always hold, especially when the ground truth HR HSI contains complex textures. Instead of using a Gaussian prior, dictionary-based approaches solve the problem under the assumption that the HR HSI is a linear combination of a properly chosen over-complete dictionary and sparse coefficients [37]. Simoes et al. proposed HySure [38], which takes into account both the spatial and spectral characteristics of the given data. This approach solves the problem through vector-based total variation regularization. Akhtar et al. [11] introduced a non-parametric Bayesian strategy to solve the HSI-SR problem. The method first learns a spectral dictionary from the LR HSI under the Bayesian framework. Then it estimates the spatial coefficients of the HR MSI by Bayesian sparse coding. Eventually, the HR HSI is generated by combining the spectral dictionary with the spatial coefficients. However, the spectral information extracted from the LR HSI may not provide the optimal spectral bases for the MSI, since the MSI is not utilized during the optimization procedure. Matrix factorization-based approaches have been actively studied recently [10], [12], [39], [40], with Kawakami et al. [10] being the first to introduce matrix factorization to solve the HSI-SR problem. The method learns a spectral basis from the LR HSI and then uses this basis to extract sparse coefficients from the HR MSI with non-negative constraints. Similar to the Bayesian-based approaches, the HR HSI is generated by linearly combining the estimated bases with the coefficients. Yokoya et al. [39] decomposed both the LR HSI and HR MSI alternately to achieve the optimal non-negative bases and coefficients that are used to generate the HR HSI. Wycoff et al. [41] solved the problem with the alternating direction method of multipliers (ADMM). Lanaras et al.
[12] further improved the fusion results by introducing a sparse constraint. However, most methods [12], [39], [41] are based on the same assumption that the down-sampling function between the spatial coefficients of the HR HSI and LR HSI is known beforehand. In practice, this assumption is not always true due to complex environmental conditions. Most of these approaches focus on the spectral characteristics of the HSI, where the spectral information of the HSI is extracted while the spatial relationship between pixels is left untouched. Recently, a few approaches have been proposed to address the HSI-SR problem based on tensor decomposition [42]–[45], which explore both the spectral and spatial correlations of the HSI by learning a core tensor and dictionaries along three dimensions, i.e., the spectral dimension and the two spatial dimensions. In this way, the information of each dimension can be represented with its own dictionary, while the core tensor is shared among the multi-modalities. Although this formulation works well on well-registered images, it is problematic on unregistered image pairs, since the core tensor cannot be shared between HSI and MSI with large displacements. In addition, it might limit the reconstruction ability of the method on remote sensing images, which have only a few redundant structures in the spatial domain of the HSI. Chen et al. [46] proposed to simultaneously register images during the fusion process. However, it only works on panchromatic images and MSI. Zhou et al. [47] proposed an integrated approach for registration and fusion, which addressed the problem of HSI-SR on unregistered image pairs. However, registration is still a required step, and the fusion and registration are performed independently, which can introduce additional errors during optimization.

B. Deep Learning Based Super-Resolution

Deep learning has attracted increasing attention for natural image super-resolution since 2014, when Dong et al.
first introduced a convolutional neural network (CNN) to solve the problem of natural image super-resolution and demonstrated state-of-the-art restoration quality [48]. Ledig et al. proposed a method based on a generative adversarial network and a deep residual network with skip connections. The method employed a perceptual loss through the VGG network, which can recover photo-realistic textures from heavily down-sampled images [49]. Usually, natural image SR methods only work up to 8 times upscaling. There have been several attempts to address MSI-SR or HSI-SR with deep learning in a supervised fashion. In 2015, a modified sparse tied-weights denoising autoencoder was proposed by Huang et al. [50] to enhance the resolution of MSI. The method assumes that the mapping function between the LR and HR PAN is the same as the one between the LR and HR MSI. Masi et al. proposed a supervised three-layer SRCNN [51] to learn the mapping function between the LR MSI and HR MSI. Similar to [51], Wei et al. [52] learned the mapping function with a deep residual network. Li et al. [53] solved the HSI-SR problem by learning a mapping function with a spatial constraint strategy and a CNN. Dian et al. [54] initialized the HR HSI from the fusion framework via the Sylvester equation. Then, the mapping function is trained between the initialized HR HSI and the reference HR HSI through deep residual learning. Xie et al. [55] reduced spectral distortions of the reconstructed HR HSI by exploiting an approximate low-rankness prior along the spectral domain of the HSI. However, these supervised deep learning-based methods cannot be readily adopted for HSI-SR in real applications for three reasons. First, the scale difference between the LR HSI and HR MSI can be as large as 10, i.e., one pixel in the HSI covers 100 pixels in the MSI. In some applications, the scale difference can even be 25 [56] or 30 [57].
But most existing super-resolution methods only work up to 8 times upscaling. Second, they are designed to find an end-to-end mapping function between the LR images and HR images under the assumption that the mapping function is the same for different images. However, the mapping function may not remain the same for images acquired with different sensors. Even for data collected from the same sensor, the mapping function for different spectral bands may not be the same. Thus the assumption may cause severe spectral distortion. Third, training a mapping function is a supervised problem that requires a large dataset, the down-sampling function, and the availability of the HR HSI, making supervised learning unrealistic for HSI. Recently, we proposed the unsupervised uSDN [21], which addressed the problem of HSI-SR with deep network models. Specifically, it extracts the spectral and spatial information through two encoder-decoder networks from the two modalities. The angular difference between the LR HSI and HR MSI representations is minimized every ten iterations to reduce the spectral distortion. Fu et al. [58] proposed an unsupervised CNN-based method for HSI super-resolution, which learns a mapping function between the RGB space and the spectral space with a spatial constraint for the HR HSI. Zheng et al. [59] proposed an unsupervised method with a learnable downsampling function based on the theory of linear unmixing. These methods can achieve promising results for different HSI datasets. However, they are specifically designed for well-registered image pairs.

III.
PROBLEM FORMULATION

Given the LR HSI, $\bar{Y}_h \in \mathbb{R}^{m \times n \times L}$, where $m$, $n$ and $L$ denote its width, height and number of spectral bands, respectively, and the unregistered HR MSI with overlapped region, $\bar{Y}_m \in \mathbb{R}^{M \times N \times l}$, where $M$, $N$ and $l$ denote its width, height and number of spectral bands, respectively, the goal is to reconstruct the HR HSI $\bar{X} \in \mathbb{R}^{M \times N \times L}$ based on the content of the HR MSI. In general, MSI has much higher spatial resolution than HSI, and HSI has much higher spectral resolution than MSI, i.e., $M \gg m$, $N \gg n$ and $L \gg l$. To facilitate the subsequent processing, we unfold the 3D images into 2D matrices, $Y_h \in \mathbb{R}^{mn \times L}$, $Y_m \in \mathbb{R}^{MN \times l}$ and $X \in \mathbb{R}^{MN \times L}$, such that each row represents the spectral reflectance of a single pixel. Since each pixel in both the LR HSI and HR MSI can be approximated by a linear combination of $c$ spectral bases $D$ [11], [12], [21], the matrices can be further decomposed as

$$Y_h = S_h D_h, \quad (1)$$
$$Y_m = S_m D_m, \quad (2)$$
$$X = S_m D_h, \quad (3)$$

where $D_h \in \mathbb{R}^{c \times L}$ and $D_m \in \mathbb{R}^{c \times l}$ denote the spectral bases of the LR HSI and HR MSI, respectively, and $S_h \in \mathbb{R}^{mn \times c}$ and $S_m \in \mathbb{R}^{MN \times c}$ denote the coefficients of the LR HSI and HR MSI, respectively. Since $S_h$ and $S_m$ indicate how the spectral bases are combined for individual pixels at specific locations, they preserve the spatial structure of the HSI. Note that the benefit of unfolding the data into 2D matrices is that the extraction procedure can decouple each pixel without changing the relationship between the pixel and its neighborhood pixels, thus the reconstructed image has fewer artifacts [11], [12], [21]. In real applications, although the areas captured by the LR HSI and HR MSI might not be registered well, they always have overlapping regions, and the LR HSI includes all the spectral bases of the HR MSI, i.e., they share the same types of materials carrying specific spectral signatures.
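The unfolding and the decomposition of Eq. (1) can be sketched numerically. The following numpy snippet (toy sizes and variable names are ours) builds a synthetic LR HSI that satisfies the model exactly, with coefficient rows that are non-negative and sum to one:

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, L, c = 4, 4, 31, 3   # toy LR HSI size and number of spectral bases

# Unfold the 3D cube (m, n, L) into a 2D matrix (mn, L): one row per pixel.
Y_bar = rng.random((m, n, L))
Y_h = Y_bar.reshape(m * n, L)

# Eq. (1): Y_h = S_h D_h. Here we synthesize a pair that obeys the
# physical constraints on S_h: non-negative rows summing to one.
S_h = rng.dirichlet(np.ones(c), size=m * n)   # spatial coefficients
D_h = rng.random((c, L))                      # spectral basis
Y_syn = S_h @ D_h

print(Y_h.shape, Y_syn.shape)  # (16, 31) (16, 31)
```

In practice $S_h$ and $D_h$ are of course estimated from the data rather than drawn at random; the sketch only verifies the shapes and constraints the formulation imposes.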
The relationship between the LR HSI and HR MSI can be expressed as

$$C_h \neq C_m, \qquad C_h \cap C_m \neq \emptyset, \qquad D_m = D_h R, \quad (4)$$

where $C_h$ and $C_m$ denote the contents of the LR HSI and HR MSI, respectively. $R \in \mathbb{R}^{L \times l}$ is the prior transformation matrix of the sensor [10], [12], [13], [17], [21], [35], [37]–[39], which describes the relationship between the HSI and MSI bases. With $D_h \in \mathbb{R}^{c \times L}$ carrying the high-resolution spectral information and $S_m \in \mathbb{R}^{MN \times c}$ carrying the high-resolution spatial information, the desired HR HSI, $X$, is generated by Eq. (3). The challenges in solving this problem are that 1) the ground truth $X$ is not available, and 2) the LR HSI and HR MSI do not cover the same region. To solve this unsupervised and unregistered HSI-SR problem, the key is to take advantage of the spectral information $D_h$ shared among the different modalities. In addition, the representations of both modalities, specifying the spatial information of the scene, should meet the non-negative and sum-to-one physical constraints. Moreover, in the ideal case, for the pixels in the overlapped region between the LR HSI and HR MSI, their spatial information should follow similar patterns, because they carry the information of how the reflectances of the shared materials (spectral bases) are mixed at each location. Therefore, the network should have the ability to learn correlated spatial and spectral information from unregistered multi-modality images to maximize its ability to prevent spectral distortion.

IV. PROPOSED APPROACH

We propose an unsupervised architecture for unregistered LR HSI and HR MSI as shown in Fig. 3. Here, we highlight the structural uniqueness of the network. To extract correlated spectral and spatial information from the unregistered multi-modalities, the network projects the LR HSI into the same statistical space as the HR MSI, so that the two modalities can share the same encoder and decoder.
The encoder enforces the representations (carrying spatial information) of both modalities to follow a Dirichlet distribution, to naturally meet the non-negative and sum-to-one physical properties. In order to prevent spectral distortion, mutual information is introduced during optimization to maximize the correlation between the representations of the LR HSI and HR MSI. And the collaborative $\ell_{2,1}$ loss is adopted to encourage the network to extract accurate spectral and spatial information from both modalities.

A. Network Architecture

As shown in Fig. 3, the network reconstructs both the LR HSI $Y_h$ and the HR MSI $Y_m$ by sharing the same encoder and decoder network structure. Since the number of spectral bands $L$ of the HSI $Y_h$ is much larger than the number of spectral bands $l$ of the MSI $Y_m$, we project $Y_h$ into an $l$-dimensional space by $\tilde{Y}_h = Y_h R$, such that $\tilde{Y}_h$ represents an LR MSI lying in the same space as the HR MSI.

Fig. 3. Simplified architecture of u2-MDN.

In this way, both modalities are linked to share the same encoder structure without additional parameters. On the other hand, the spectral information $D_m$ of the MSI is highly compressed relative to that of the HSI, i.e., $D_m = D_h R$. Thus, it is very unstable and difficult to directly extract $D_h$, which carries high spectral resolution, from the MSI with low spectral resolution. But the spectral basis of the HR MSI can be transformed from that of the LR HSI, which possesses more spectral information, i.e., $\hat{Y}_m = S_m D_m = S_m D_h R = X R$. Therefore, in the network design, both modalities share the same decoder structure $D_h$. The transformation matrix $R$ is added as fixed weights to reconstruct the HR MSI $\hat{Y}_m$. Then the output of the layer before the fixed weights is actually $X$, according to Eq. (3).
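The shared-decoder idea above can be sketched with plain matrix products (a numpy sketch under our own toy shapes, not the paper's implementation): the decoder output is the desired HR HSI $X = S_m D_h$, and appending the fixed SRF weights $R$ yields the HR MSI reconstruction $\hat{Y}_m = X R = S_m (D_h R)$.

```python
import numpy as np

rng = np.random.default_rng(3)

MN, c, L, l = 64, 4, 31, 3     # toy sizes: pixels, bases, HSI bands, MSI bands

S_m = rng.dirichlet(np.ones(c), size=MN)   # HR representations (rows sum to one)
D_h = rng.random((c, L))                   # decoder weights = HSI spectral basis
R = rng.random((L, l))                     # fixed SRF weights, never trained

X_hat = S_m @ D_h          # decoder output: the desired HR HSI, Eq. (3)
Y_m_hat = X_hat @ R        # fixed-weight layer: HR MSI reconstruction

# Equivalently, go through the compressed MSI basis D_m = D_h R (Eq. (4)).
assert np.allclose(Y_m_hat, S_m @ (D_h @ R))
```

The equality checked in the last line is why the HR HSI can be read off the layer just before the fixed $R$ weights: supervising the MSI reconstruction simultaneously constrains $X$.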
Let us define the input domain as $\mathcal{Y} = \{\tilde{Y}_h, Y_m\}$, the output domain as $\hat{\mathcal{Y}} = \{\hat{Y}_h, X\}$, and the representation domain as $\mathcal{S} = \{S_h, S_m\}$. The encoder of the network, $E_\phi: \mathcal{Y} \to \mathcal{S}$, maps the input data to low-dimensional representations (latent variables on the bottleneck hidden layer), i.e., $p_\phi(\mathcal{S}|\mathcal{Y})$, and the decoder, $D_\psi: \mathcal{S} \to \hat{\mathcal{Y}}$, reconstructs the data from the representations, i.e., $p_\psi(\hat{\mathcal{Y}}|\mathcal{S})$. Note that the bottleneck hidden layer $\mathcal{S}$ behaves as the representation layer that reflects the spatial information, and the weights $\psi$ of the decoder $D_\psi$ serve as $D_h$ in Eq. (1). This correspondence is further elaborated below. Take the procedure of training on the LR HSI as an example. The LR HSI is reconstructed by $\hat{Y}_h = D_\psi(S_h)$, where $S_h = E_\phi(Y_h)$. Since $Y_h$ carries the high-resolution spectral information, to better extract the spectral basis, part of the network should simulate the prior relationship described in Eq. (1). That is, the representation layer $S_h$ acts as the proportional coefficients, and the weights $\psi$ of the decoder correspond to the spectral basis $D_h$ in Eq. (1). Therefore, in the network structure, we define $\psi = W_1 W_2 \cdots W_k = D_h$ with the identity activation function and without bias, where $W_k$ denotes the weights in the $k$-th layer. In this way, $D_h$ preserves the spectral information of the LR HSI, and the latent variables $S_h$ preserve the spatial information effectively. More implementation details will be described in Sec. IV-B. Eventually, the desired HR HSI is generated directly by $X = S_m D_h$. Note that the dashed lines in Fig. 3 show the path of back-propagation, which will be elaborated in Sec. IV-C.

Fig. 4. Details of the encoder-decoder structure.

B.
Mutual Dirichlet Network with Collaborative Constraint

To extract better spectral information and naturally incorporate the physical requirements on the spatial information, i.e., non-negativity and sum-to-one, the representations $\mathcal{S}$ are encouraged to follow a Dirichlet distribution. In addition, the network should have the ability to learn the correlated and optimized representations generated from the encoder $E_\phi$ for both modalities. Thus, in the network design, we maximize the mutual information (MI) between the representations of the LR HSI, $S_h$, and the HR MSI, $S_m$, by maximizing the MI between the input images and their own representations. To further reduce the spectral distortion, the collaborative $\ell_{2,1}$ loss is incorporated into the network instead of the traditional $\ell_2$ reconstruction loss. The detailed encoder-decoder structure and the MI structure are shown in Fig. 4 and Fig. 5, respectively.

1) Dirichlet Structure: To generate representations with a Dirichlet distribution, we incorporate the stick-breaking structure between the encoder and representation layers. The stick-breaking process was first proposed by Sethuraman [60] back in 1994. It can be illustrated as breaking a unit-length stick into $c$ pieces, the lengths of which follow a Dirichlet distribution. Nalisnick and Smyth, and Qu et al. successfully coupled the expressiveness of networks with the Bayesian nonparametric model through a stick-breaking process [21], [61]. Here, we follow the work of [21], [61], which draws the samples of $\mathcal{S}$ from the Kumaraswamy distribution [62]. The stick-breaking process is integrated into the network between the encoder $E_\phi$ and the decoder $D_\psi$, as shown in Fig. 3. Assuming that the generated representation row vector is denoted as $s_i = \{s_{ij}\}_{1 \le j \le c}$, we have $0 \le s_{ij} \le 1$ and $\sum_{j=1}^{c} s_{ij} = 1$. Each variable $s_{ij}$ can be defined as

$$s_{ij} = \begin{cases} v_{i1} & \text{for } j = 1, \\ v_{ij} \prod_{k<j} (1 - v_{ik}) & \text{for } j > 1, \end{cases} \quad (5)$$

where $v_{ik} \sim \mathrm{Beta}(u, \alpha, \beta)$.
Since it is difficult to draw samples directly from the Beta distribution, we draw samples via the inverse transform of the Kumaraswamy distribution. The benefit of the Kumaraswamy distribution is that it has a closed-form CDF, and it is equivalent to the Beta distribution when $\alpha = 1$ or $\beta = 1$. Letting $\alpha = 1$, we have

$$v_{ik} \sim 1 - (1 - u_{ik})^{1/\beta_i}. \quad (6)$$

Both parameters $u_{ik}$ and $\beta_i$ are learned through the network for each row vector, as illustrated in Fig. 3. Because $\beta > 0$, a softplus is adopted as the activation function [63] at the $\beta$ layer. Similarly, a sigmoid [64] is used to map $u$ into the $(0, 1)$ range at the $u$ layer. Due to the fact that the spectral signatures of the data are different for each image pair, the network only trains on one group of data, i.e., the LR HSI $Y_h$ and HR MSI $Y_m$, to reconstruct its own HR HSI $X$. Therefore, to increase the representation power of the network, the encoder of the network is densely connected, i.e., each layer is fully connected with all its subsequent layers [65].

2) Mutual Dirichlet Network: Before further describing the details of the network, we first explain the reasoning that motivates this design. We are given the unregistered multi-modalities, the LR HSI $Y_h$ and the HR MSI $Y_m$, and the desired HR HSI $X$, each pixel of which indicates the mixed spectral reflection of the captured area. The overlapped region of the three modalities is defined by $C$. Ideally, each pixel in the overlapped region of these three modalities should possess the same spectral signatures. In addition, the corresponding proportional coefficients of $X$ and $Y_m$ should be the same for a given pixel within $C$. Since $Y_h$ is a down-sampled and transformed version of $X$, its proportional coefficients (representations) should follow the same pattern as those of $X$ and $Y_m$, i.e., $S_h$ and $S_m$ should be highly correlated although of different resolutions. One example is shown in Fig. 1.
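A minimal numpy sketch of the stick-breaking sampler of Eqs. (5)–(6) (variable names are ours; in the network, `u` and `beta` come from learned layers, whereas here they are random inputs):

```python
import numpy as np

rng = np.random.default_rng(1)

def stick_breaking(u, beta):
    """Stick-breaking with Kumaraswamy(alpha=1, beta) fractions, Eqs. (5)-(6).

    u:    (n, c-1) samples in (0, 1)
    beta: (n, 1)   positive shape parameters
    Returns (n, c) rows that are non-negative and sum to one; the last
    piece takes whatever is left of the unit-length stick.
    """
    v = 1.0 - (1.0 - u) ** (1.0 / beta)          # Eq. (6): inverse-CDF sample
    remaining = np.cumprod(1.0 - v, axis=1)      # prod_{k<=j} (1 - v_ik)
    pieces = v * np.concatenate(
        [np.ones((v.shape[0], 1)), remaining[:, :-1]], axis=1)  # Eq. (5)
    return np.concatenate([pieces, remaining[:, -1:]], axis=1)

n, c = 5, 4
S = stick_breaking(rng.uniform(size=(n, c - 1)), 1.0 + rng.random((n, 1)))
print(np.all(S >= 0), np.allclose(S.sum(axis=1), 1.0))  # True True
```

The telescoping product guarantees that the pieces sum to one exactly, so the sum-to-one constraint is met by construction rather than enforced by a penalty term.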
Therefore, to generate HR HSI with low spectral distortion, it is necessary to encourage the representations $S_h$ and $S_m$ to follow similar patterns. However, traditional constraints such as correlation may not work properly, because the input LR HSI and HR MSI are not registered with each other and the mapping function $E_\phi$ between the input $Y$ and the representations $S$ is non-linear. Therefore, we introduce MI, which captures the non-linear statistical dependencies between variables [66], to encourage the representations of the LR HSI and HR MSI to follow statistically similar patterns.

Mutual information has been widely used for multi-modality registration [67], [68]. It is a Shannon-entropy-based measure of the mutual dependence between two random variables, e.g., $S_h$ and $S_m$. The mutual information $I(S_h; S_m)$ measures how much the uncertainty of one variable ($S_h$ or $S_m$) is reduced given the other variable ($S_m$ or $S_h$). Mathematically, it is defined as

$$I(S_h; S_m) = H(S_h) - H(S_h \mid S_m) = \int_{\mathcal{S}_h \times \mathcal{S}_m} \log \frac{d\mathbb{P}_{S_h S_m}}{d\mathbb{P}_{S_h} \otimes d\mathbb{P}_{S_m}} \, d\mathbb{P}_{S_h S_m}, \quad (7)$$

where $H$ denotes the Shannon entropy, and $H(S_h \mid S_m)$ is the conditional entropy of $S_h$ given $S_m$; $\mathbb{P}_{S_h S_m}$ is the joint probability distribution, and $\mathbb{P}_{S_h}$, $\mathbb{P}_{S_m}$ denote the marginals. Belghazi et al. [69] introduced an MI estimator that allows a neural network to estimate MI through back-propagation by adopting the Donsker-Varadhan representation [70]. In order to maximally preserve the spectral information of the reconstructed HR HSI, our goal is to encourage the two representations $S_h$ and $S_m$ to follow similar patterns by maximizing their MI, $I(S_h; S_m)$, during the optimization procedure. Since $S_h = E_\phi(Y_h)$ and $S_m = E_\phi(Y_m)$, the MI can also be expressed as $I(E_\phi(Y_h); E_\phi(Y_m))$.
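For discrete variables, Eq. (7) reduces to a finite sum, which makes the two limiting cases easy to verify: MI equals the full entropy $H(X)$ when the variables determine each other, and zero when they are independent. A small self-contained check:

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) = sum_{x,y} p(x,y) log( p(x,y) / (p(x) p(y)) ), Eq. (7),
    i.e. the KL divergence between the joint and the product of marginals."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal over rows
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal over columns
    mask = p_xy > 0                          # skip zero-probability cells
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (p_x * p_y)[mask])))

# Perfectly dependent binary variables: I(X;Y) = H(X) = log 2 nats.
p_dep = np.array([[0.5, 0.0], [0.0, 0.5]])
# Independent binary variables: I(X;Y) = 0.
p_ind = np.array([[0.25, 0.25], [0.25, 0.25]])
print(mutual_information(p_dep), mutual_information(p_ind))  # log 2, 0.0
```

This is the quantity the network maximizes; since $S_h$ and $S_m$ are continuous and high-dimensional, the paper estimates it with a neural lower bound rather than a table like this.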
However, it is difficult to maximize such MI directly with neural networks, because the two modalities do not match each other in our scenario. Therefore, we maximize the average MI between the representations and their own inputs, i.e., $I(Y_h, E_\phi(Y_h))$ and $I(Y_m, E_\phi(Y_m))$. The benefit of doing so is two-fold. First, by optimizing the encoder weights $E_\phi$, it greatly improves the quality of the individual representations [71], which helps the network better preserve the spectral and spatial information. Second, since the multi-modalities $Y_h$ and $Y_m$ are correlated, and the dependencies (MI) between the representations and the multi-modalities are maximized, this also maximizes the MI between the different modalities, $I(S_h; S_m)$, such that $S_h$ and $S_m$ are encouraged to follow similar patterns.

Fig. 5. Details of the MI structure.

Let us explain this with a toy example. We assume that both $Y_h$ and $Y_m$ cover the same material, 'brick', whose spectral pixels in the image pair are denoted by $y_h$ and $y_m$, respectively, with $\tilde{y}_h = y_h R$. $\tilde{y}_h$ and $y_m$ may not be identical to each other in real applications, but they are correlated and should possess similar spectral information. By maximizing the MI between the images and their representations, we are able to find a better representation $s_h$, which reduces the uncertainty of $\tilde{y}_h$ to a large extent, and also a better representation $s_m$, which reduces the uncertainty of $y_m$ to a large extent. Since $\tilde{y}_h$ and $y_m$ are similar, $s_h$ and $s_m$ should also be similar. In this way, the MI regularizes the solution space such that $S_h$ and $S_m$ have similar patterns.

Take $I(Y_h, E_\phi(Y_h))$ as an example. It is equivalent to the Kullback-Leibler (KL) divergence [69] between the joint distribution $\mathbb{P}_{Y_h E_\phi(Y_h)}$ and the product of the marginals $\mathbb{P}_{Y_h} \otimes \mathbb{P}_{E_\phi(Y_h)}$.
Let $\mathbb{P} = \mathbb{P}_{Y_h E_\phi(Y_h)}$ and $\mathbb{Q} = \mathbb{P}_{Y_h} \otimes \mathbb{P}_{E_\phi(Y_h)}$; we can then express the MI as

$$I(Y_h, E_\phi(Y_h)) = \mathbb{E}_{\mathbb{P}}\left[\log \frac{d\mathbb{P}}{d\mathbb{Q}}\right] = D_{KL}(\mathbb{P} \,\|\, \mathbb{Q}). \quad (8)$$

Such MI can be maximized by maximizing a lower bound of the KL divergence based on the Donsker-Varadhan (DV) representation [70]. Since we do not need to compute the exact MI, we introduce an alternative lower bound based on the Jensen-Shannon divergence, which works better than the DV-based objective function [71]. In the network design, an additional network $T_w: \mathcal{Y} \times \mathcal{S} \to \mathbb{R}$ is built with two fully-connected layers, whose weights are denoted as $w$. During the training procedure, the raw image and the extracted representations are stacked and fed into this network, as shown in Fig. 5. The estimator can then be defined as

$$I_{\phi,w}(Y_h, E_\phi(Y_h)) := \mathbb{E}_{\mathbb{P}}\left[-\mathrm{sp}\left(-T_{w,\phi}(Y_h, E_\phi(Y_h))\right)\right], \quad (9)$$

where $\mathrm{sp}(x) = \log(1 + e^x)$. Note that we ignore the negative samples in the DV-based objective function [71], which are usually generated by shuffling the input data, because it is unstable to train the network with randomly shuffled input data given only two input data pairs. Since both $E_\phi$ and $T_w$ are used to find the optimal representations, they are updated together. Combined with the MSI MI, the objective function is defined as

$$\mathcal{L}_I(\phi, w) = I_{\phi,w}(Y_h, E_\phi(Y_h)) + I_{\phi,w}(Y_m, E_\phi(Y_m)). \quad (10)$$

Since the encoder $E_\phi$ and the MI estimation network $T_w$ for both the LR HSI and the HR MSI share the same weights $\phi$ and $w$, their optimized representations follow similar patterns. More optimization details are described in Sec. IV-C. In order to extract better spectral information, we adopt a collaborative reconstruction loss with the $l_{2,1}$ norm [72] instead of the traditional $l_2$ norm for both the LR HSI and the HR MSI.
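The negatives-free surrogate of Eq. (9) can be sketched as follows. Here the statistics network $T_w$ is replaced by precomputed scores (a stand-in for its output on matched pairs); the point is only the shape of the objective: $-\mathrm{sp}(-t)$ is increasing in $t$ and bounded above by zero, so maximizing it pushes $T_w$ to score matched (input, representation) pairs highly.

```python
import numpy as np

def softplus(x):
    # numerically stable sp(x) = log(1 + e^x)
    return np.logaddexp(0.0, x)

def mi_objective(t_scores):
    """Negatives-free Jensen-Shannon MI surrogate, Eq. (9):
    I = E_P[ -sp( -T(y, E_phi(y)) ) ], averaged over matched pairs only."""
    return float(np.mean(-softplus(-t_scores)))

# Stand-in T_w outputs: high scores for matched (y, s) pairs, low for mismatched.
rng = np.random.default_rng(2)
good = rng.normal(loc=3.0, size=1000)
bad = rng.normal(loc=-3.0, size=1000)
print(mi_objective(good) > mi_objective(bad))  # True
```

In the paper, both $\phi$ and $w$ are updated by back-propagating through this objective; the sketch only demonstrates why larger $T_w$ scores on matched pairs raise the bound.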
The objective function for the $l_{2,1}$ loss is defined as

$$\mathcal{L}_{2,1}(\phi, \psi) = \| D_\psi(E_\phi(Y_h)) - Y_h \|_{2,1} + \| D_\psi(E_\phi(Y_m)) - Y_m \|_{2,1}, \quad (11)$$

where $\|X\|_{2,1} = \sum_{i=1}^{m} \sqrt{\sum_{j=1}^{n} X_{i,j}^2}$. The $l_{2,1}$ norm can be viewed as the sequential application of the $l_2$ norm on each pixel vector, followed by the $l_1$ norm over the image, which enforces the reconstruction errors of the entire image to be sparse; that is, most of the reconstruction errors of individual pixels are driven to zero, so that the individual pixels are reconstructed as accurately as possible. In this way, it extracts better spectral information and further reduces the spectral distortion.

C. Optimization and Implementation Details

The objective function of the proposed network architecture can then be expressed as

$$\mathcal{L}(\phi, \psi, w) = \mathcal{L}_{2,1}(\phi, \psi) - \lambda \mathcal{L}_I(\phi, w) + \mu \|\psi\|_F^2, \quad (12)$$

where the $l_2$ norm is applied to the decoder weights $\psi$ to prevent over-fitting, and $\lambda$ and $\mu$ are the parameters that balance the trade-off between the reconstruction error, the negative of the mutual information, and the weight penalty. Before being fed into the network, the spectral vectors in the LR HSI and HR MSI are transformed to zero-mean vectors by subtracting the mean vector of their own image. Since the spectral information of the MSI is heavily compressed (e.g., the HSI has 31 bands, but the MSI has only 3), the decoder of the network is only updated with LR HSI data to stabilize the network. The number of input nodes equals the number of HR MSI bands, $l$. The LR HSI $Y_h$ is projected into an $l$-dimensional space by $\tilde{Y}_h = Y_h R$ before being fed into the network, while the HR MSI is fed in directly. The number of output nodes is chosen according to the number of LR HSI bands, $L$. When the input of the network is $Y_h$, the output of the decoder is $\hat{Y}_h$. When the input of the network is $Y_m$, the reconstruction $\hat{Y}_m$ is generated by multiplying the output of the decoder with the fixed weights $R$.
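The $l_{2,1}$ norm in Eq. (11) can be sketched directly. The example below contrasts two residual maps with identical total energy (equal Frobenius norm): the $l_{2,1}$ norm is smaller when the error is concentrated on a few pixels, which is why minimizing it drives most per-pixel errors toward zero.

```python
import numpy as np

def l21_norm(residual):
    """||X||_{2,1} = sum_i sqrt(sum_j X_ij^2), Eq. (11):
    l2 norm over each pixel's spectral vector (rows), then l1 over pixels."""
    return float(np.sum(np.sqrt(np.sum(residual ** 2, axis=1))))

# Two residual maps (4 pixels x 3 bands) with the same Frobenius norm:
concentrated = np.zeros((4, 3))
concentrated[0] = 2.0                 # all error on one pixel
spread = np.full((4, 3), 1.0)         # same energy spread over all pixels
print(l21_norm(concentrated), l21_norm(spread))
```

Under an $l_2$ (Frobenius) loss the two maps would be penalized equally; the $l_{2,1}$ loss prefers the concentrated case, i.e., sparse pixel-wise errors.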
The encoder-decoder is constructed with fully-connected layers, and the detailed structure is shown in Fig. 4. The input of the encoder has $l$ neurons carrying each pixel of the image, and is densely connected by stacking with all its subsequent layers. Take $l = 8$ as an example: the input layer has 8 neurons, and assume that the second and the third layers have 3 neurons each. The input layer is passed to the second layer by stacking the first layer on top of the second layer. The stacked layers are then passed to the third layer by stacking 11 neurons on top of the third layer. In this way, the encoder is densely connected. The layer $v$ is drawn with Eq. (6) given the layers $u$ and $\beta$, which are learned by back-propagation. $\beta$ has only one node, learned by a two-layer densely-connected fully-connected neural network; it denotes the distribution parameter of each pixel. $u$ has 15 nodes, learned by a four-layer densely-connected neural network. The representation layer $S$, with 15 nodes, is constructed from $v$ and $\beta$ according to Eq. (5). The decoder has two fully-connected layers. The number of nodes and the activation functions of the different layers are shown in Table I.

TABLE I
THE NUMBER OF LAYERS AND NODES IN THE PROPOSED NETWORK.

             u/β encoder         u/β/v layers                  T_w       decoder
# layers     4 / 2               1 / 1 / 1                     2         2
# nodes      [3,3,3,3] / [3,3]   15 / 1 / 15                   [18,1]    [15,15]
activation   linear              sigmoid / softplus / linear   sigmoid   linear

The training is done in an unsupervised fashion without ground-truth HR HSI. Given the multi-modality LR HSI and HR MSI, the network is optimized with back-propagation to extract their correlated spectral bases and representations, as illustrated in Fig. 3 with red-dashed lines. The training process stops when the reconstruction error of the network no longer decreases. We can then feed the HR MSI into the trained network and obtain the reconstructed HR HSI, $X$, from the output of the decoder.
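The dense connectivity described above (each layer sees the input stacked with all previous outputs) can be sketched as follows. This is a schematic forward pass only, with linear activations and arbitrary widths; the actual network additionally contains the stick-breaking layers of Eqs. (5)-(6).

```python
import numpy as np

def dense_forward(x, weights):
    """Densely connected forward pass: each layer receives the concatenation
    of the input and all previous layer outputs (cf. Fig. 4)."""
    features = [x]
    for w in weights:
        h = np.concatenate(features, axis=-1) @ w   # stack, then project
        features.append(h)
    return features[-1]

rng = np.random.default_rng(3)
l = 8                                 # input width (e.g. the MSI band count)
w1 = rng.normal(size=(l, 3))          # second layer sees the 8 input neurons
w2 = rng.normal(size=(l + 3, 3))      # third layer sees 8 + 3 = 11 stacked neurons
out = dense_forward(rng.normal(size=(1, l)), [w1, w2])
print(out.shape)  # (1, 3)
```

The 11-neuron stack feeding the third layer matches the $l = 8$ walk-through in the text; with more layers, each weight matrix simply grows by the width of every preceding output.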
V. EXPERIMENTS AND RESULTS

A. Datasets

The proposed $u^2$-MDN has been extensively evaluated on two widely used benchmark datasets, CAVE [73] and Harvard [1], and five remote sensing datasets: Hyperspec Chikusei, CASI University of Houston, ROSIS-3 University of Pavia, HYDICE Washington DC Mall [4], and real data without simulation, as summarized in Table II.

1) CAVE dataset: The CAVE dataset consists of 32 HR HSI images, each of which has a dimension of 512 × 512 with 31 spectral bands taken within the wavelength range 400–700 nm at an interval of 10 nm.

2) Harvard dataset: The Harvard dataset includes 50 HR HSI images of both indoor and outdoor scenes. The images are cropped to 1024 × 1024, with 31 bands taken at an interval of 10 nm within the wavelength range of 420–720 nm.

3) Hyperspec Chikusei dataset: This dataset was taken by Headwall's Hyperspec-VNIR-C sensor over Chikusei, Ibaraki, Japan. The image has a ground sampling distance (GSD) of 2.5 m and was cropped to 540 × 420 with 128 bands, covering the wavelength range from 363 to 1018 nm. Please refer to [74] for more details.

4) University of Houston dataset: This dataset was acquired by the ITRES CASI-1500 sensor over the University of Houston campus with a GSD of 2.5 m [75]. It was cropped to 320 × 540 with 144 bands taken within the wavelength range 364–1046 nm.

5) University of Pavia dataset: This dataset was taken by the reflective optics spectrographic imaging system (ROSIS-3) sensor over the University of Pavia, Italy, with a GSD of 1.3 m. It was cropped to 560 × 320 with 103 spectral bands taken within the wavelength range 430–830 nm.

6) Washington DC Mall dataset: This dataset was acquired by the hyperspectral digital imagery collection experiment (HYDICE) sensor over the Mall in Washington DC, USA, at a GSD of 2.5 m.
The image was cropped to 420 × 300 with 191 bands covering the wavelength range from 400 to 2500 nm.

7) Real dataset without simulation: The LR HSI over the Cuprite mining district, Nevada, USA, was acquired by Hyperion with a GSD of 30 m; the image size is 100 × 153 with 167 bands taken within the wavelength range from 426 to 2355 nm. The HR MSI is the SWIR data of WorldView-3 with a GSD of 7.5 m; the image size is 460 × 670 with 8 bands covering the wavelength range from 1209 to 2329 nm. Both rigid and nonrigid deformations exist, as shown in Figs. 14a and 14b.

B. Experimental Setup

In real applications, the mis-registration of the two modalities is crucial for HSI-SR [20], [22], [47]. To demonstrate how mis-registration influences the performance of HSI-SR, we conduct two groups of experiments to evaluate the various approaches: experiments on well-registered image pairs, and experiments on unregistered image pairs. By conducting experiments in these two scenarios, we intend to show that mis-registration significantly degrades the performance of HSI-SR; it is therefore very important to develop algorithms that can work directly on unregistered image pairs.

The well-registered image pairs are generated in two different ways, following the widely-used protocols for the benchmark datasets [12], [42], [76] and Wald's protocol [4], [77] for the remote sensing datasets.

• For the benchmark HSI datasets, CAVE [73] and Harvard [1], the image pairs are generated with the extreme super-resolution (SR) ratio of 32, where the LR HSI $Y_h$ is obtained by averaging the HR HSI over 32 × 32 disjoint blocks. The HR MSI with 3 bands is generated by multiplying the HR HSI with the given spectral response matrix $R$ of a Nikon D700 [11], [12], [21]. Note that we adopt this setting because it is the same protocol used by state-of-the-art methods [11], [12], [42], [76] on general hyperspectral images.
In addition, for remote sensing applications, the scale difference can be as large as 25 [56] or even 30 [57]. With such settings, we are able to evaluate the proposed method in extreme scenarios.

• For the remote sensing datasets, the image pairs are simulated with Wald's protocol [77], where the LR HSI is generated by applying a Gaussian filter, with its full width at half maximum (FWHM) equal to the SR ratio, to match a plausible system modulation transfer function (MTF) [4], [17], [31]. The MSI is generated by degrading the HR HSI in the spectral domain using the MSI spectral response functions (SRFs) of different sensors as filters. The datasets are listed in Table II; please refer to [4] for more details. Note that since the scales between the real LR HSI and HR MSI differ across sensors [4], [56], [57], the SR ratio is set to 4, 5, 6, and 8 to evaluate the robustness of the proposed method. Noise is added to the images with a signal-to-noise ratio (SNR) of 30 dB in all bands.

The unregistered image pairs are generated in the same way as the well-registered image pairs, except that the LR HSI images are further distorted with rigid or nonrigid deformations.

• For the benchmark HSI datasets, CAVE [73] and Harvard [1], it is easier to introduce rigid deformation. Thus, the LR HSI is further rotated by 5° and cropped by 15% of its surrounding pixels; e.g., for images in the CAVE dataset, 39,322 pixels of the MSI are not covered by the LR HSI, and for images in the Harvard dataset, 157,290 pixels of the MSI are not covered by the LR HSI.

• For the remote sensing datasets, it is usually unavoidable to introduce nonrigid deformation [22]. Thus, following the protocol in [47], [78], the nonrigid distortion is emulated by introducing random pixel shifts.

• For the real data, the LR HSI is directly captured by Hyperion and the HR MSI by WorldView-3. Both rigid and nonrigid deformations exist, as shown in Figs. 14a and 14b.
TABLE II
DATASET PAIRS FROM DIFFERENT SENSORS USED IN THE EXPERIMENTS.

Dataset      HSI sensor         MSI sensor     SR Ratio
CAVE         Apogee Alta U260   Nikon          32
Harvard      Nuance FX          Nikon          32
Chikusei     Hyperspec          WorldView-2    6
Houston      CASI               Sentinel-2     5
Pavia        ROSIS-3            QuickBird      8
Washington   HYDICE             QuickBird      4
Real data    Hyperion           WorldView-3    4

The results of the proposed method on the individual images in Fig. 6 are compared with nine state-of-the-art methods covering the different categories of HSI-SR: traditional methods such as the CS-based GSA [15] and the MRA-based SFIM [33]; matrix-factorization-based methods such as CNMF [39] and Lanaras' CSU [12]; the Bayesian-based HySure [38]; the sparse-coding-based NSSR [76]; the tensor-based method CSTF [42]; the integrated registration and fusion method [47]; and the uSDN method [21]. These methods have reported the best performance [12], [17], [21], and their original code was made available by the authors. Note that the proposed $u^2$-MDN is unsupervised, i.e., the HR HSI is not available during the training procedure. Thus, for a fair comparison, only unsupervised methods are included in the experiments. The average results on the datasets are also reported to evaluate the robustness of the proposed method.

For rigid deformation, since the resolution of the HSI does not match that of the degraded MSI, i.e., there exists a large displacement between the two modalities, only five methods can reconstruct the HR HSI from unregistered images without large errors. Thus, the proposed method is compared with these five state-of-the-art methods, i.e., GSA [15], SFIM [33], CNMF [39], NSSR [76], and the integrated registration and fusion method [47], on unregistered image pairs. Note that, as discussed in Sec. III, in order to work on unregistered image pairs, the LR HSI should include all the spectral bases of the HR MSI.
For the CAVE and Harvard datasets, not all the image pairs meet this requirement after rotation and cropping. Thus, we choose seven commonly used image pairs from the benchmark datasets, where the LR HSI includes all the spectral bases of the HR MSI even after rotation and cropping. The chosen image pairs are shown in Fig. 6. The remote sensing images are shown in Fig. 7.

(a) balloon (b) cloth (c) pompoms (d) spool (e) img1 (f) imgb5 (g) imgc5
Fig. 6. The HR MSI of the individual test images from the CAVE [73] (top row) and Harvard [1] (bottom row) datasets.

C. Evaluation Metrics

For quantitative comparison, the erreur relative globale adimensionnelle de synthèse (ERGAS), the peak signal-to-noise ratio (PSNR), and the spectral angle mapper (SAM) are applied to evaluate the quality of the reconstructed HSI.

ERGAS provides a measurement of the band-wise normalized root mean square error (RMSE) between the reference HSI, $X$, and the reconstructed HSI, $\hat{X}$, with the best value at 0 [77]. It is defined as

$$\mathrm{ERGAS}(X, \hat{X}) = \frac{100}{sr} \sqrt{\frac{1}{L} \sum_{i=1}^{L} \frac{\operatorname{mean} \|X_i - \hat{X}_i\|_2^2}{(\operatorname{mean} X_i)^2}}, \quad (13)$$

where $sr$ denotes the SR factor between the HR MSI and the LR HSI, and $L$ denotes the number of spectral bands of the reconstructed $\hat{X}$.

PSNR is the average ratio between the maximum power of the image and the power of the residual errors over all the spectral bands. A larger PSNR indicates a higher spatial quality of the reconstructed HSI. For each image band of the HSI, the PSNR is defined as

$$\mathrm{PSNR}(X_i, \hat{X}_i) = 10 \cdot \log_{10} \frac{\max(X_i)^2}{\operatorname{mean} \|X_i - \hat{X}_i\|_2^2}. \quad (14)$$

SAM [79] is commonly used to quantify the spectral distortion of the reconstructed HSI. The larger the SAM, the worse the spectral distortion of the reconstructed HSI.

(a) Chikusei (b) Houston (c) Pavia (d) Washington
Fig. 7. Color composites of the remote sensing datasets from [4]: the reference HR HSI of the (a) Chikusei, (b) Houston, (c) Pavia, and (d) Washington datasets.
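The three metrics of this subsection can be sketched in NumPy as follows; the array layouts (bands × pixels for ERGAS/PSNR, pixels × bands for SAM) and function names are ours, and ERGAS is written with the $100/sr$ scaling of Eq. (13).

```python
import numpy as np

def ergas(ref, est, sr):
    """ERGAS, Eq. (13): (100/sr) * sqrt(mean over bands of MSE_i / mean_i^2).
    ref, est: (bands, pixels). Lower is better; 0 means a perfect reconstruction."""
    mse = np.mean((ref - est) ** 2, axis=1)
    mu = np.mean(ref, axis=1)
    return float(100.0 / sr * np.sqrt(np.mean(mse / mu ** 2)))

def psnr(ref_band, est_band):
    """Per-band PSNR, Eq. (14), in dB; higher is better."""
    mse = np.mean((ref_band - est_band) ** 2)
    return float(10.0 * np.log10(ref_band.max() ** 2 / mse))

def sam_deg(ref, est):
    """Spectral angle mapper, Eq. (15), averaged over pixels, in degrees.
    ref, est: (pixels, bands). Per-pixel angles are invariant to scaling."""
    dot = np.sum(ref * est, axis=1)
    denom = np.linalg.norm(ref, axis=1) * np.linalg.norm(est, axis=1)
    return float(np.degrees(np.mean(np.arccos(np.clip(dot / denom, -1.0, 1.0)))))

rng = np.random.default_rng(4)
ref = rng.uniform(0.2, 1.0, size=(31, 100))   # toy cube: 31 bands, 100 pixels
est = ref + 0.01                              # small constant reconstruction bias
print(ergas(ref, ref, sr=32))                 # 0.0 for a perfect reconstruction
```

Note the complementary roles: PSNR is driven by per-band spatial error magnitudes, while SAM ignores per-pixel scaling entirely and reacts only to the shape of the spectral signature.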
For each HSI pixel $\hat{X}^j$, the SAM is defined as

$$\mathrm{SAM}(X^j, \hat{X}^j) = \arccos\left(\frac{(X^j)^T \hat{X}^j}{\|X^j\|_2 \, \|\hat{X}^j\|_2}\right). \quad (15)$$

The global SAM is estimated by averaging the SAM over all the pixels in the entire image.

D. Experimental Results on Registered Image Pairs

For a fair comparison, we first perform experiments on the general case where the LR HSI and HR MSI are well registered. Table III shows the experimental results of the 7 groups of commonly benchmarked images from the CAVE and Harvard datasets [11], [12], [21], [76]. Table IV shows the experimental results on the remote sensing images. The average results over the datasets are shown in Table V. Note that, in order to show how the method works in different scenarios, the data are not normalized for evaluation. Since the intensities of the Harvard dataset are quite small, the ERGAS of its reconstructed images is generally smaller than those of the CAVE dataset and the remote sensing datasets.

We observe that the CS-based GSA [15] is stable on both the benchmark and remote sensing datasets. However, it does not preserve the spectral information well, especially on the benchmark datasets. The matrix-factorization-based CSU [12] works better than CNMF [39] on the benchmark CAVE and Harvard datasets. However, its performance is worse than that

TABLE III
BENCHMARK RESULTS IN TERMS OF ERGAS (E), PSNR (P) AND SAM (S) ON WELL-REGISTERED IMAGE PAIRS.
(CAVE images: balloon, cloth, pompoms, spool; Harvard images: img1, imgb5, imgc5; each cell: E/P/S.)

Methods      balloon           cloth             pompoms           spool             img1              imgb5             imgc5
GSA          0.19/41.89/4.07   0.40/32.51/5.95   0.37/34.78/7.39   0.41/39.61/9.53   0.12/40.41/2.19   0.16/39.07/2.19   0.12/38.82/1.67
SFIM         0.59/33.52/8.45   0.54/30.59/5.25   3.76/25.39/11.89  2.93/28.63/19.71  0.23/32.62/2.10   0.29/33.15/3.52   0.23/35.62/2.84
CNMF         0.26/39.27/9.71   0.54/30.52/6.55   0.31/35.45/6.32   0.54/37.28/16.77  0.15/37.25/2.86   0.17/39.06/2.14   0.13/38.49/2.64
CSU          0.19/41.52/4.68   0.40/33.47/5.52   0.28/36.81/6.01   0.45/39.64/6.84   0.12/39.12/2.30   0.18/39.01/2.37   0.12/39.05/2.38
HySure       0.34/37.08/9.92   0.53/30.22/7.13   0.52/31.68/10.97  0.55/37.47/15.54  0.18/35.82/4.27   0.34/35.52/3.45   0.19/36.75/2.34
NSSR         0.16/43.2/3.35    0.31/33.3/4.58    0.26/37.71/5.31   0.45/39.41/6.91   0.14/39.91/2.24   0.17/39.12/2.17   0.12/38.87/1.87
CSTF         0.14/44.71/3.97   0.39/32.51/5.25   0.27/36.72/6.09   0.38/42.06/8.61   0.21/33.73/2.77   0.25/34.98/2.46   0.22/32.48/1.96
Integrated   0.28/37.75/2.64   1.47/21.55/8.73   0.52/30.29/5.99   1.03/30.94/6.77   0.32/29.81/2.68   0.63/26.29/2.31   0.27/30.47/1.79
uSDN         0.20/41.54/4.56   0.35/33.48/4.16   0.25/37.84/5.43   0.40/38.49/13.01  0.12/39.30/2.27   0.16/39.72/2.10   0.11/39.12/2.58
u2-MDN       0.16/43.59/1.93   0.30/34.85/4.31   0.19/39.12/3.46   0.37/40.08/4.47   0.11/40.97/2.06   0.15/39.76/2.08   0.11/39.19/1.77

TABLE IV
REMOTE SENSING RESULTS IN TERMS OF ERGAS, PSNR AND SAM ON WELL-REGISTERED IMAGE PAIRS.
Methods      Chikusei                   Houston                    Pavia                      Washington
             ERGAS   PSNR     SAM      ERGAS   PSNR     SAM      ERGAS   PSNR     SAM      ERGAS    PSNR     SAM
GSA          1.432   42.1264  1.4478   2.7859  34.133   1.8443   1.0661  38.7949  3.5647   3.518    37.2308  2.187
SFIM         1.2284  47.4358  0.9379   2.9415  33.9958  0.9938   0.7274  42.9283  2.312    3.0356   39.2045  1.2382
CNMF         1.479   47.8427  1.1602   2.9896  33.1454  1.3882   0.7712  43.2417  2.3623   3.0341   39.1491  1.388
CSU          2.4705  35.8506  1.9208   3.2773  32.3793  2.0193   1.7283  33.9385  3.5754   4.2854   34.1841  1.9706
HySure       1.2216  48.7601  1.0934   2.9619  34.5328  1.7281   0.7767  43.2719  2.6094   3.3232   39.0     1.6808
NSSR         2.6427  33.5161  2.5263   4.7663  29.2931  5.3182   3.7068  28.8702  5.7786   9.1737   29.9297  4.0385
CSTF         1.9024  38.1548  1.7884   4.0207  29.5598  6.4      1.1877  37.3     4.0719   22.4659  20.4012  20.1433
Integrated   1.3854  43.4116  1.4104   4.0627  28.9168  3.9936   1.1773  37.9724  3.5066   5.8183   29.5106  4.3237
uSDN         1.7861  42.8702  1.3035   3.6198  32.5059  5.698    1.0221  39.4535  3.1874   6.7819   30.1769  5.3259
Proposed     1.4717  50.2839  1.0578   2.8659  34.0584  0.8865   0.715   43.8022  2.3053   3.8988   39.2144  1.2298

TABLE V
THE AVERAGE ERGAS, PSNR AND SAM SCORES OVER WELL-REGISTERED BENCHMARK AND REMOTE SENSING DATASETS.

Methods      CAVE                 Harvard              Remote Sensing
             ERGAS  PSNR   SAM    ERGAS  PSNR   SAM    ERGAS   PSNR     SAM
GSA          0.34   37.2   6.74   0.13   39.43  2.02   2.2005  38.0713  2.261
SFIM         1.96   29.53  11.33  0.25   33.8   2.82   1.9832  40.8911  1.3705
CNMF         0.41   35.63  9.84   0.15   38.27  2.55   2.0685  40.8447  1.5747
CSU          0.33   37.86  5.76   0.14   39.06  2.35   2.9404  34.0881  2.3715
HySure       0.49   34.11  10.89  0.24   36.03  3.35   2.0709  41.3912  1.7779
NSSR         0.30   38.41  5.04   0.14   39.3   2.09   5.0724  30.4023  4.4154
CSTF         0.30   39.00  5.98   0.41   33.73  2.4    7.3942  31.3540  8.1009
Integrated   0.83   30.13  6.03   1.09   28.86  2.26   2.8672  33.3008  3.9413
uSDN         0.30   37.84  6.79   0.13   39.38  2.32   3.3025  36.2516  3.8787
u2-MDN       0.26   39.41  3.54   0.12   39.97  1.97   2.2379  41.8397  1.3699

of CNMF on the remote sensing datasets, whose number of spectral bands is higher than that of the benchmark datasets.
The MRA-based SFIM [33], the Bayesian-based HySure [38], and the integrated fusion approach [47] achieve relatively good performance on the remote sensing datasets, but their performance drops significantly on the benchmark CAVE and Harvard datasets. In contrast, the sparse-coding-based NSSR [76] and the tensor-based CSTF [42] achieve much more competitive performance on the benchmark datasets than on the remote sensing datasets. Note that for NSSR, the most effective step on the CAVE dataset is a post-processing step from [41], which actually degrades the performance on remote sensing datasets with larger numbers of spectral bands; thus, the post-processing step is disabled on the remote sensing datasets to improve the reconstruction accuracy. The tensor-based CSTF achieves competitive results on the CAVE dataset, which has redundant spatial structure; however, its performance drops on the remote sensing datasets, whose spatial structure is less redundant.

The deep-learning-based uSDN [21] preserves spectral information well on both the benchmark and remote sensing datasets. However, it can only work on well-registered images due to its network design with angular-difference regularization. Based on the average results shown in Table V, the proposed $u^2$-MDN network, powered by the mutual information and the collaborative $l_{2,1}$ loss, shows comparable, if not better, performance compared to the state-of-the-art approaches in terms of ERGAS, PSNR, and SAM, and is quite stable for different types of input images regardless of the number of spectral bands and SR ratios. In addition, it is very effective in preserving the spectral signature of the reconstructed HR HSI, showing much-improved performance especially as measured by SAM on the CAVE data. This further demonstrates the robustness of the proposed $u^2$-MDN.
TABLE VI
RESULTS ON UNREGISTERED (RIGID DISTORTED) BENCHMARK IMAGES IN TERMS OF ERGAS (E), PSNR (P) AND SAM (S). (Each cell: E/P/S.)

Methods      balloon           cloth             pompoms           spool             img1              imgb5             imgc5
GSA          0.82/27.71/14.35  0.76/27.59/9.75   1.18/23.54/22.13  1.07/29.92/17.16  1.65/23.60/10.66  0.40/20.07/4.90   0.69/22.18/5.32
SFIM         1.51/22.47/12.69  1.01/24.74/9.65   1.82/19.50/14.89  1.88/25.30/21.02  1.33/17.41/3.28   0.68/25.38/4.44   0.89/19.93/3.96
CNMF         0.71/29.18/10.63  0.69/27.84/8.12   0.83/26.67/11.88  0.63/34.62/17.03  0.74/22.29/3.85   0.34/31.39/3.97   0.48/25.34/3.16
NSSR         0.52/32.59/8.07   0.72/27.16/8.05   0.76/27.45/10.22  1.03/32.80/15.94  0.61/25.83/5.29   0.50/29.72/6.80   0.35/28.66/2.64
Integrated   1.14/24.68/9.34   1.68/19.84/11.25  1.82/19.38/17.60  1.65/29.81/13.60  1.00/19.80/4.05   0.68/25.40/3.24   0.87/20.25/2.40
u2-MDN       0.30/38.61/3.48   0.40/32.89/6.08   0.37/33.64/4.87   0.56/36.25/6.78   0.13/39.42/2.32   0.25/36.90/2.73   0.14/36.29/2.26

TABLE VII
RESULTS ON UNREGISTERED (NONRIGID DISTORTED) REMOTE SENSING IMAGES IN TERMS OF ERGAS, PSNR AND SAM.

Methods      Chikusei                   Houston                     Pavia                      Washington
             ERGAS   PSNR     SAM      ERGAS    PSNR     SAM      ERGAS   PSNR     SAM      ERGAS    PSNR     SAM
GSA          4.4617  27.3028  7.6554   5.2167   27.6816  6.7581   3.5149  27.161   12.6803  7.1542   28.0469  7.8453
SFIM         6.8057  23.8192  5.8405   11.1495  22.5494  8.3767   5.2757  23.8248  9.2951   10.7509  23.9122  8.7243
CNMF         6.5318  25.5878  4.7582   5.6255   28.0016  5.5382   4.4232  25.2803  7.9551   25.8519  29.4765  5.9344
NSSR         5.8158  25.8475  5.4245   7.9435   23.7692  8.4696   5.0728  25.5778  8.2277   18.4162  22.3786  9.6114
Integrated   8.9107  21.4644  7.5126   14.4247  18.7631  10.2734  7.4938  20.7214  11.9717  15.0896  20.9486  11.0361
u2-MDN       1.5843  45.4919  1.0899   2.9458   33.6519  1.0497   0.8898  40.4446  2.4371   4.0777   38.6361  1.2757

E.
Experimental Results on Unregistered Image Pairs

In this section, two unregistered scenarios are studied: the rigid-distorted benchmark datasets and the nonrigid-distorted remote sensing datasets, as described in Sec. V-B. Note that, since the pixels in the HSI and MSI do not match each other, the reconstruction errors are expected to increase.

Fig. 8. The average PSNR at different wavelengths for the reconstructed HSI from the unregistered rigid-distorted (a) CAVE and (b) Harvard datasets, respectively.

The performance of the different methods on unregistered image pairs is reported in Tables VI and VII. Note that only the methods that are able to work with unregistered image pairs are chosen for this group of experiments; thus, five state-of-the-art methods are compared in the tables. The traditional CS-based GSA [15] and MTF-based SFIM [33] fail in this scenario, because when the two given modalities are unregistered, the spatial details cannot be directly added to improve the spatial resolution of the LR HSI. The matrix-factorization-based CNMF and the sparse-coding-based NSSR are more robust than the traditional methods. However, their performance also drops on both the benchmark and remote sensing datasets, because the adopted predefined down-sampling function introduces significant spectral distortion when the LR HSI and HR MSI are unregistered. The integrated fusion method achieves good performance on remote sensing images with small distortion; however, its performance drops on images with large distortion, because it performs registration before fusion, which may introduce additional distortion during optimization.

Fig. 9. The PSNR at different wavelengths for the reconstructed HSI from the unregistered nonrigid-distorted (a) Chikusei, (b) Houston, (c) Pavia and (d) Washington datasets, respectively.
The proposed $u^2$-MDN is able to handle these challenging scenarios much better than the state-of-the-art. The main reason for the success of the proposed approach is that the network is able to extract optimal and correlated spatial representations from the two modalities through the mutual information and collaborative loss. In this way, both the spatial and, especially, the spectral information are effectively preserved. This demonstrates the representation capacity of the proposed structure.

TABLE VIII
THE AVERAGE ERGAS, PSNR AND SAM SCORES OVER UNREGISTERED BENCHMARK AND REMOTE SENSING DATASETS.

Methods      CAVE                 Harvard              Remote Sensing
             ERGAS  PSNR   SAM    ERGAS  PSNR   SAM    ERGAS    PSNR     SAM
GSA          0.96   27.19  15.85  0.91   21.95  6.96   5.09     27.5481  8.7348
SFIM         1.56   23.00  14.56  0.97   20.91  3.89   8.50     23.5264  8.0592
CNMF         0.72   29.58  11.92  0.52   26.34  3.66   10.61    27.0866  6.0465
NSSR         0.76   30     10.57  0.49   28.07  4.91   9.31     24.3933  7.9333
Integrated   1.57   23.43  12.95  0.85   21.82  3.23   11.48    20.4744  10.1985
u2-MDN       0.41   35.35  5.30   0.17   37.54  2.44   2.3744   39.5561  1.4631

To demonstrate the reconstruction performance in different spectral bands, the average PSNR of the benchmark datasets at each wavelength is shown in Fig. 8. Since the numbers of spectral bands of the remote sensing datasets differ, we show their individual per-band PSNR in Fig. 9. We can observe that, regardless of the type of dataset, the proposed method consistently outperforms the other methods across all spectral bands on unregistered image pairs.

To visualize the reconstructed results for unregistered image pairs, we show the color composition of the reconstructed HR HSI in Figs. 10-13, among which Figs. 10 and 11 demonstrate the results on the rigid-distorted image pairs, while Figs. 12 and 13 demonstrate the results on the nonrigid-distorted image pairs.
The first column of each figure presents the reference HR HSI, the distorted LR HSI, and the original LR HSI in (a), (h), and (o), respectively. The first through third rows show the reconstructed images, the absolute difference, and the spectral map of the results from different methods. We can observe that most approaches could not handle unregistered image pairs with large displacement well. The reconstructed results from SFIM have some blocking artifacts in most scenarios. The integrated fusion method has some smear effects on the reconstructed images due to the large displacement, as shown in Figs. 10f and 13f. NSSR fails on the remote sensing datasets, as shown in Figs. 12e and 13e, but it suffers relatively smaller spatial distortion on the benchmarked datasets. GSA could produce clear reconstructed images in most cases even though the images are unregistered, as shown in Figs. 10b, 12b and 13b. This observation is consistent with the conclusions drawn in [22]. However, we observe from the SAM maps that it suffers from spectral distortion. The CNMF method handles unregistered image pairs better than the other approaches, as shown in Figs. 10d, 11d, 12d and 13d, but its performance is limited by the predefined down-sampling function. The effectiveness of the proposed method can be readily observed from the difference images shown in Figs. 10g, 11g, 12g and 13g, where the proposed approach has much less spectral and spatial distortion as compared to the state-of-the-art, regardless of the type of input images.

F. Experimental Results on Unregistered Real Image Pairs

We further evaluate the proposed method on real unregistered image pairs with both rigid and nonrigid distortions. Since there is no ground truth HR HSI in real applications, we provide a visual inspection of the reconstructed results in Fig. 14.
We can observe that, as long as the LR HSI includes all the spectral bases of the HR MSI, the proposed method powered with mutual information is able to increase the spatial resolution of the LR HSI while preserving its spectral resolution well, even when the LR HSI and HR MSI have large pixel displacement.

G. Ablation and Parameter Study

Taking the challenging rotated 'pompom' image from the CAVE dataset as an example, we further evaluate 1) the necessity of maximizing the mutual information between representations and input images and 2) the usage of the collaborative l2,1 loss. Since they are both designed to reduce the spectral distortion of the reconstructed image, we use SAM as the evaluation metric.

Fig. 15 illustrates the SAM of the reconstructed HR HSI when increasing the mutual information parameter λ in Eq. 12. We can observe that, if there is no mutual information maximization, i.e., λ = 0, the spectral information is not preserved well. When we gradually increase λ, the reconstructed HR HSI preserves better spectral information, i.e., the SAM is largely reduced. The reason is that, when we maximize the MI between the representations and their own inputs, we in effect maximize the mutual information between the representations of the two modalities. Therefore, the network is able to correlate the extracted spectral and spatial information from the unregistered HR MSI and LR HSI in an effective way, largely reducing the spectral distortion. However, when the parameter is too large, it may hinder the reconstruction procedure of the image pairs. Therefore, we need to choose proper parameters for the network. In our experiments, we keep μ = 1 × 10^-4 throughout to reduce over-fitting. We set λ = 1 × 10^-5 for general HSI datasets with fewer spectral bands and λ = 1 × 10^-1 for remote sensing HSI with more spectral bands. The effectiveness evaluation of the collaborative l2,1 norm is demonstrated in Fig. 16.
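SAM, the metric used throughout this ablation, measures the angle between the reference and reconstructed spectrum at each pixel; averaging the per-pixel angles gives the scalar scores in the tables. The sketch below implements the standard definition; the array layout and the eps guard against zero-norm pixels are my assumptions, not details of the authors' code:

```python
import numpy as np

def sam_map(ref, rec, eps=1e-12):
    """Per-pixel Spectral Angle Mapper (degrees) between two HSIs.

    ref, rec: arrays of shape (H, W, B). Returns an (H, W) map of the
    angle between the two B-dimensional spectra at each pixel.
    """
    dot = np.sum(ref * rec, axis=-1)
    norm = np.linalg.norm(ref, axis=-1) * np.linalg.norm(rec, axis=-1)
    cos = np.clip(dot / (norm + eps), -1.0, 1.0)   # guard rounding past ±1
    return np.degrees(np.arccos(cos))

# Identical spectra give (near-)zero angle; scaling a spectrum leaves
# SAM unchanged, so it captures spectral shape rather than magnitude.
x = np.random.default_rng(1).random((4, 4, 31))
print(float(sam_map(x, x).max()))        # ~0
print(float(sam_map(x, 2.0 * x).max()))  # ~0: SAM is scale-invariant
```

The scale invariance is why SAM isolates spectral distortion from intensity errors, which PSNR would conflate.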
We can observe that, with the l1 norm, the network converges much slower as compared to those using the l2 norm and the l2,1 norm, and the l2,1 norm converges to smaller spectral distortion than either the l2 norm or the l1 norm. Thus, the l2,1 norm can preserve the spectral information better and significantly reduce the spectral distortion of the restored HR HSI.

Fig. 10. Reconstructed results given unregistered rigid distorted image pairs from the CAVE dataset. (a) Color composite of the reference HR HSI. (h) Color composite of the distorted LR HSI. (o) Color composite of the LR HSI. (b)-(g): reconstructed results. (i)-(n): average absolute difference between the reconstructed HSI and reference HSI over different spectral bands, from different methods. (p)-(u): SAM of each pixel between the reconstructed HSI and reference HSI, from different methods.

Fig. 11. Reconstructed results given unregistered rigid distorted image pairs from the Harvard dataset. (a) Color composite of the reference HR HSI. (h) Color composite of the distorted LR HSI. (o) Color composite of the LR HSI. (b)-(g): reconstructed results.
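For reference, the l2,1 norm compared in this ablation contrasts with the l1 and l2 (Frobenius) norms as follows. The sketch assumes the residual is arranged as a pixels-by-bands matrix, an illustrative choice rather than the paper's exact formulation:

```python
import numpy as np

def l21_norm(E):
    """l_{2,1} norm of a residual matrix E (pixels x bands):
    the l2 norm is taken over each row (one pixel's spectral error),
    then the row norms are summed (l1 across pixels). Each pixel's
    spectrum is thus penalized jointly, which favors preserving
    spectral shape over a plain l1 or Frobenius penalty.
    """
    return np.sum(np.linalg.norm(E, axis=1))

E = np.array([[3.0, 4.0],     # row norm 5
              [0.0, 0.0],     # row norm 0
              [6.0, 8.0]])    # row norm 10
print(l21_norm(E))            # 15.0
print(np.abs(E).sum())        # l1: 21.0
print(np.linalg.norm(E))      # l2 (Frobenius): ~11.18
```

Because the l1 step acts across pixels, the penalty also encourages row-sparsity of the residual, i.e., many pixels with exactly correct spectra.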
(i)-(n): average absolute difference between the reconstructed HSI and reference HSI over different spectral bands, from different methods. (p)-(u): SAM of each pixel between the reconstructed HSI and reference HSI, from different methods.

Fig. 12. Reconstructed results given unregistered nonrigid distorted image pairs from the Chikusei dataset. (a) Color composite of the reference HR HSI. (h) Color composite of the distorted LR HSI. (o) Color composite of the LR HSI. (b)-(g): reconstructed results. (i)-(n): average absolute difference between the reconstructed HSI and reference HSI over different spectral bands, from different methods. (p)-(u): SAM of each pixel between the reconstructed HSI and reference HSI, from different methods.

H. Tolerance Study

At last, we would like to examine how much spectral information can be preserved when the network deals with unregistered images. To preserve spectral information, the input LR HSI should cover all the spectral signatures of the HR MSI. Thus, we choose the image in Fig. 1 from the Harvard dataset, which has most of the spectral signatures centered in the image. The results are shown in Fig. 17. The image is rotated from 5 degrees to 30 degrees, with 15% to 48% of the information missing. We can observe that, as long as the spectral bases are included in the LR HSI, no matter how small the overlapped region between the LR HSI and HR MSI is, we can always achieve a reconstructed image with small spectral distortion, even for unregistered input images.

VI.
CONCLUSION

We proposed an unsupervised encoder-decoder network, u2-MDN, to solve the problem of hyperspectral image super-resolution without multi-modality registration. The unique structure stabilizes the network training by projecting both modalities into the same space and simultaneously extracting the spectral basis from the LR HSI with rich spectral information and the spatial representations from the HR MSI with high-resolution spatial information. The network learns correlated spatial information from two unregistered modalities by maximizing the mutual information between the representations and their own raw inputs. In this way, it maximizes the MI between the two representations, which largely reduces the spectral distortion. In addition, the collaborative l2,1 norm is adopted to encourage the network to further preserve spectral information. Extensive experiments on two benchmark datasets demonstrated the superiority of the proposed approach over the state-of-the-art.

ACKNOWLEDGMENT

The authors would like to thank all the developers of the evaluated methods who kindly offered their codes, and Dr. Danfeng Hong and Dr. Ke Zhang, who provided suggestions on synthetic data generation. This publication was made possible by NASA grants NNX12CB05C and NNX16CP38P.

REFERENCES

[1] A. Chakrabarti and T. Zickler, "Statistics of real-world hyperspectral images," The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 193-200, 2011.

Fig. 13.
Reconstructed results given unregistered nonrigid distorted image pairs from the Pavia dataset. (a) Color composite of the reference HR HSI. (h) Color composite of the distorted LR HSI. (o) Color composite of the LR HSI. (b)-(g): reconstructed results. (i)-(n): average absolute difference between the reconstructed HSI and reference HSI over different spectral bands, from different methods. (p)-(u): SAM of each pixel between the reconstructed HSI and reference HSI, from different methods.

Fig. 14. Color composite of (a) the LR HSI of the real data from Hyperion, (b) the HR MSI of the real data from WorldView-3 (images courtesy Maxar), and (c) the reconstructed HR HSI from the proposed method.

Fig. 15. Influence of MI.
Fig. 16. The effect of the l2,1 norm.
Fig. 17. Tolerance study.

[2] J. M. Bioucas-Dias, A. Plaza, N. Dobigeon, M. Parente, Q. Du, P. Gader, and J. Chanussot, "Hyperspectral unmixing overview: Geometrical, statistical, and sparse regression-based approaches," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, no. 2, 2012.
[3] C. Kwan, B. Ayhan, G. Chen, J. Wang, B. Ji, and C.-I. Chang, "A novel approach for spectral unmixing, classification, and concentration estimation of chemical and biological agents," IEEE Transactions on Geoscience and Remote Sensing, vol. 44, no. 2, pp. 409-419, 2006.
[4] N. Yokoya, C. Grohnfeldt, and J. Chanussot, "Hyperspectral and multispectral data fusion: A comparative review of the recent literature," IEEE Geoscience and Remote Sensing Magazine, vol. 5, no. 2, pp. 29-56, 2017.
[5] J. M. Haut, M. E. Paoletti, J. Plaza, J. Li, and A. Plaza, "Active learning with convolutional neural networks for hyperspectral image classification using a new Bayesian approach," IEEE Transactions on Geoscience and Remote Sensing, no. 99, pp. 1-22, 2018.
[6] P. S. S.
Aydav and S. Minz, "Classification of hyperspectral images using self-training and a pseudo validation set," Remote Sensing Letters, vol. 9, no. 11, pp. 1109-1117, 2018.
[7] B. Uzkent, A. Rangnekar, and M. Hoffman, "Aerial vehicle tracking by adaptive fusion of hyperspectral likelihood maps," The IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), July 2017.
[8] A. Plaza, Q. Du, J. M. Bioucas-Dias, X. Jia, and F. A. Kruse, "Foreword to the special issue on spectral unmixing of remotely sensed data," IEEE Transactions on Geoscience and Remote Sensing, vol. 49, no. 11, pp. 4103-4110, 2011.
[9] M. Borengasser, W. S. Hungate, and R. Watkins, Hyperspectral Remote Sensing: Principles and Applications, 2007.
[10] R. Kawakami, Y. Matsushita, J. Wright, M. Ben-Ezra, Y.-W. Tai, and K. Ikeuchi, "High-resolution hyperspectral imaging via matrix factorization," The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2329-2336, 2011.
[11] N. Akhtar, F. Shafait, and A. Mian, "Bayesian sparse representation for hyperspectral image super resolution," The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3631-3640, 2015.
[12] C. Lanaras, E. Baltsavias, and K. Schindler, "Hyperspectral super-resolution by coupled spectral unmixing," The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3586-3594, 2015.
[13] G. Vivone, L. Alparone, J. Chanussot, M. Dalla Mura, A. Garzelli, G. A. Licciardi, R. Restaino, and L. Wald, "A critical comparison among pansharpening algorithms," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 5, 2015.
[14] C. Thomas, T. Ranchin, L. Wald, and J. Chanussot, "Synthesis of multispectral images to high spatial resolution: A critical review of fusion methods based on remote sensing physics," IEEE Transactions on Geoscience and Remote Sensing, vol. 46, no. 5, pp. 1301-1312, 2008.
[15] B. Aiazzi, S.
Baronti, and M. Selva, "Improving component substitution pansharpening through multivariate regression of MS+Pan data," IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 10, 2007.
[16] B. Aiazzi, L. Alparone, S. Baronti, A. Garzelli, and M. Selva, "MTF-tailored multiscale fusion of high-resolution MS and Pan imagery," Photogrammetric Engineering & Remote Sensing, vol. 72, no. 5, pp. 591-596, 2006.
[17] L. Loncan, L. B. de Almeida, J. M. Bioucas-Dias, X. Briottet, J. Chanussot, N. Dobigeon, S. Fabre, W. Liao, G. A. Licciardi, M. Simoes et al., "Hyperspectral pansharpening: a review," IEEE Geoscience and Remote Sensing Magazine, vol. 3, no. 3, 2015.
[18] Y. Chang, L. Yan, H. Fang, S. Zhong, and W. Liao, "HSI-DeNet: Hyperspectral image restoration via convolutional neural network," IEEE Transactions on Geoscience and Remote Sensing, no. 99, pp. 1-16, 2018.
[19] P. Arun, K. M. Buddhiraju, A. Porwal, and J. Chanussot, "CNN-based super-resolution of hyperspectral images," IEEE Transactions on Geoscience and Remote Sensing, 2020.
[20] Y. Zhou, A. Rangarajan, and P. D. Gader, "Nonrigid registration of hyperspectral and color images with vastly different spatial and spectral resolutions for spectral unmixing and pansharpening," 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1571-1579, 2017.
[21] Y. Qu, H. Qi, and C. Kwan, "Unsupervised sparse Dirichlet-Net for hyperspectral image super-resolution," The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2511-2520, 2018.
[22] S. Baronti, B. Aiazzi, M. Selva, A. Garzelli, and L. Alparone, "A theoretical analysis of the effects of aliasing and misregistration on pansharpened imagery," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 3, pp. 446-453, 2011.
[23] F. D. Van der Meer, H. M. Van der Werff, F. J. Van Ruitenbeek, C. A. Hecker, W. H. Bakker, M. F. Noomen, M.
Van Der Meijde, E. J. M. Carranza, J. B. De Smeth, and T. Woldai, "Multi- and hyperspectral geologic remote sensing: A review," International Journal of Applied Earth Observation and Geoinformation, vol. 14, no. 1, pp. 112-128, 2012.
[24] Q. Wei, N. Dobigeon, and J.-Y. Tourneret, "Fast fusion of multi-band images based on solving a Sylvester equation," IEEE Transactions on Image Processing, vol. 24, no. 11, pp. 4109-4121, 2015.
[25] H. Chui and A. Rangarajan, "A new point matching algorithm for non-rigid registration," Computer Vision and Image Understanding, vol. 89, no. 2-3, pp. 114-141, 2003.
[26] X. Fan, H. Rhody, and E. Saber, "A spatial-feature-enhanced MMI algorithm for multimodal airborne image registration," IEEE Transactions on Geoscience and Remote Sensing, vol. 48, no. 6, pp. 2580-2589, 2010.
[27] A. Myronenko and X. Song, "Point set registration: Coherent point drift," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 12, pp. 2262-2275, 2010.
[28] J. Ma, H. Zhou, J. Zhao, Y. Gao, J. Jiang, and J. Tian, "Robust feature matching for remote sensing image registration via locally linear transforming," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 12, pp. 6469-6481, 2015.
[29] M. E. Schaepman, M. Jehle, A. Hueni, P. D'Odorico, A. Damm, J. Weyermann, F. D. Schneider, V. Laurent, C. Popp, F. C. Seidel et al., "Advanced radiometry measurements and earth science applications with the Airborne Prism Experiment (APEX)," Remote Sensing of Environment, vol. 158, pp. 207-219, 2015.
[30] M. Selva, B. Aiazzi, F. Butera, L. Chiarantini, and S. Baronti, "Hyper-sharpening: A first approach on SIM-GA data," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 8, no. 6, pp. 3008-3024, 2015.
[31] M. Selva, L. Santurri, and S. Baronti, "Improving hypersharpening for WorldView-3 data," IEEE Geoscience and Remote Sensing Letters, vol. 16, no.
6, pp. 987-991, 2019.
[32] S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, 1989.
[33] J. Liu, "Smoothing filter-based intensity modulation: A spectral preserve image fusion technique for improving spatial details," International Journal of Remote Sensing, vol. 21, no. 18, pp. 3461-3472, 2000.
[34] P. Burt and E. Adelson, "The Laplacian pyramid as a compact image code," IEEE Transactions on Communications, vol. 31, no. 4, pp. 532-540, 1983.
[35] R. Dian, L. Fang, and S. Li, "Hyperspectral image super-resolution via non-local sparse tensor factorization," The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5344-5353, 2017.
[36] Q. Wei, N. Dobigeon, and J.-Y. Tourneret, "Bayesian fusion of hyperspectral and multispectral images," 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014.
[37] Q. Wei, J. Bioucas-Dias, N. Dobigeon, and J.-Y. Tourneret, "Hyperspectral and multispectral image fusion based on a sparse representation," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 7, 2015.
[38] M. Simões, J. Bioucas-Dias, L. B. Almeida, and J. Chanussot, "A convex formulation for hyperspectral image superresolution via subspace-based regularization," IEEE Transactions on Geoscience and Remote Sensing, vol. 53, no. 6, pp. 3373-3388, 2015.
[39] N. Yokoya, T. Yairi, and A. Iwasaki, "Coupled nonnegative matrix factorization unmixing for hyperspectral and multispectral data fusion," IEEE Transactions on Geoscience and Remote Sensing, vol. 50, no. 2, pp. 528-537, 2012.
[40] M. A. Veganzones, M. Simoes, G. Licciardi, N. Yokoya, J. M. Bioucas-Dias, and J.
Chanussot, "Hyperspectral super-resolution of locally low rank images from complementary multisource data," IEEE Transactions on Image Processing, vol. 25, no. 1, pp. 274-288, 2016.
[41] E. Wycoff, T.-H. Chan, K. Jia, W.-K. Ma, and Y. Ma, "A non-negative sparse promoting algorithm for high resolution hyperspectral imaging," 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1409-1413, 2013.
[42] S. Li, R. Dian, L. Fang, and J. M. Bioucas-Dias, "Fusing hyperspectral and multispectral images via coupled sparse tensor factorization," IEEE Transactions on Image Processing, vol. 27, no. 8, pp. 4118-4130, 2018.
[43] Y. Chang, L. Yan, X.-L. Zhao, H. Fang, Z. Zhang, and S. Zhong, "Weighted low-rank tensor recovery for hyperspectral image restoration," IEEE Transactions on Cybernetics, 2020.
[44] Y. Xu, Z. Wu, J. Chanussot, P. Comon, and Z. Wei, "Nonlocal coupled tensor CP decomposition for hyperspectral and multispectral image fusion," IEEE Transactions on Geoscience and Remote Sensing, vol. 58, no. 1, pp. 348-362, 2019.
[45] Y. Xu, Z. Wu, J. Chanussot, and Z. Wei, "Hyperspectral images super-resolution via learning high-order coupled tensor ring representation," IEEE Transactions on Neural Networks and Learning Systems, 2020.
[46] C. Chen, Y. Li, W. Liu, and J. Huang, "SIRF: Simultaneous satellite image registration and fusion in a unified framework," IEEE Transactions on Image Processing, 2015.
[47] Y. Zhou, A. Rangarajan, and P. D. Gader, "An integrated approach to registration and fusion of hyperspectral and multispectral images," IEEE Transactions on Geoscience and Remote Sensing, vol. 58, no. 5, pp. 3020-3033, 2020.
[48] C. Dong, C. C. Loy, K. He, and X. Tang, "Image super-resolution using deep convolutional networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 38, no. 2, pp.
295-307, 2016.
[49] C. Ledig, L. Theis, F. Huszár, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang et al., "Photo-realistic single image super-resolution using a generative adversarial network," arXiv preprint arXiv:1609.04802, 2016.
[50] W. Huang, L. Xiao, Z. Wei, H. Liu, and S. Tang, "A new pan-sharpening method with deep neural networks," IEEE Geoscience and Remote Sensing Letters, vol. 12, no. 5, pp. 1037-1041, 2015.
[51] G. Masi, D. Cozzolino, L. Verdoliva, and G. Scarpa, "Pansharpening by convolutional neural networks," Remote Sensing, vol. 8, no. 7, p. 594, 2016.
[52] Y. Wei, Q. Yuan, H. Shen, and L. Zhang, "Boosting the accuracy of multi-spectral image pan-sharpening by learning a deep residual network," arXiv preprint, 2017.
[53] Y. Li, J. Hu, X. Zhao, W. Xie, and J. Li, "Hyperspectral image super-resolution using deep convolutional neural network," Neurocomputing, vol. 266, pp. 29-41, 2017.
[54] R. Dian, S. Li, A. Guo, and L. Fang, "Deep hyperspectral image sharpening," IEEE Transactions on Neural Networks and Learning Systems, 2018.
[55] Q. Xie, M. Zhou, Q. Zhao, D. Meng, W. Zuo, and Z. Xu, "Multispectral and hyperspectral image fusion by MS/HS fusion net," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[56] C. Kwan, B. Budavari, A. C. Bovik, and G. Marchisio, "Blind quality assessment of fused WorldView-3 images by using the combinations of pansharpening and hypersharpening paradigms," IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 10, pp. 1835-1839, 2017.
[57] C. Kwan, C. Haberle, A. Echavarren, B. Ayhan, B. Chou, B. Budavari, and S. Dickenshied, "Mars surface mineral abundance estimation using THEMIS and TES images," IEEE Ubiquitous Computing, Electronics and Mobile Communication Conference, New York City, November 2018.
[58] Y. Fu, T. Zhang, Y. Zheng, D. Zhang, and H.
Huang, "Hyperspectral image super-resolution with optimized RGB guidance," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11661-11670, 2019.
[59] K. Zheng, L. Gao, W. Liao, D. Hong, B. Zhang, X. Cui, and J. Chanussot, "Coupled convolutional neural network with adaptive response function learning for unsupervised hyperspectral super resolution," IEEE Transactions on Geoscience and Remote Sensing, 2020.
[60] J. Sethuraman, "A constructive definition of Dirichlet priors," Statistica Sinica, pp. 639-650, 1994.
[61] E. Nalisnick and P. Smyth, "Deep generative models with stick-breaking priors," International Conference on Machine Learning (ICML), 2017.
[62] P. Kumaraswamy, "A generalized probability density function for double-bounded random processes," Journal of Hydrology, vol. 46, no. 1-2, pp. 79-88, 1980.
[63] C. Dugas, Y. Bengio, F. Bélisle, C. Nadeau, and R. Garcia, "Incorporating second-order functional knowledge for better option pricing," Advances in Neural Information Processing Systems, pp. 472-478, 2001.
[64] J. Han and C. Moraga, "The influence of the sigmoid function parameters on the speed of backpropagation learning," From Natural to Artificial Neural Computation, pp. 195-201, 1995.
[65] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, "Densely connected convolutional networks," arXiv preprint, 2016.
[66] J. B. Kinney and G. S. Atwal, "Equitability, mutual information, and the maximal information coefficient," Proceedings of the National Academy of Sciences, p. 201309933, 2014.
[67] B. Zitova and J. Flusser, "Image registration methods: a survey," Image and Vision Computing, vol. 21, no. 11, pp. 977-1000, 2003.
[68] J. Woo, M. Stone, and J. L. Prince, "Multimodal registration via mutual information incorporating geometric and spatial context," IEEE Transactions on Image Processing, vol. 24, no. 2, pp. 757-769, 2015.
[69] I. Belghazi, S. Rajeswar, A.
Baratin, R. D. Hjelm, and A. Courville, "MINE: mutual information neural estimation," arXiv preprint arXiv:1801.04062, 2018.
[70] M. D. Donsker and S. S. Varadhan, "Asymptotic evaluation of certain Markov process expectations for large time. IV," Communications on Pure and Applied Mathematics, vol. 36, no. 2, pp. 183-212, 1983.
[71] R. D. Hjelm, A. Fedorov, S. Lavoie-Marchildon, K. Grewal, A. Trischler, and Y. Bengio, "Learning deep representations by mutual information estimation and maximization," arXiv preprint arXiv:1808.06670, 2018.
[72] F. Nie, H. Huang, X. Cai, and C. H. Ding, "Efficient and robust feature selection via joint l2,1-norms minimization," Advances in Neural Information Processing Systems, pp. 1813-1821, 2010.
[73] F. Yasuma, T. Mitsunaga, D. Iso, and S. K. Nayar, "Generalized assorted pixel camera: postcapture control of resolution, dynamic range, and spectrum," IEEE Transactions on Image Processing, vol. 19, no. 9, pp. 2241-2253, 2010.
[74] N. Yokoya and A. Iwasaki, "Airborne hyperspectral data over Chikusei," Space Appl. Lab., Univ. Tokyo, Tokyo, Japan, Tech. Rep. SAL-2016-05-27, 2016.
[75] C. Debes, A. Merentitis, R. Heremans, J. Hahn, N. Frangiadakis, T. van Kasteren, W. Liao, R. Bellens, A. Pižurica, S. Gautama et al., "Hyperspectral and LiDAR data fusion: Outcome of the 2013 GRSS data fusion contest," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 6, pp. 2405-2418, 2014.
[76] W. Dong, F. Fu, G. Shi, X. Cao, J. Wu, G. Li, and X. Li, "Hyperspectral image super-resolution via non-negative structured sparse representation," IEEE Transactions on Image Processing, vol. 25, no. 5, pp. 2337-2352, 2016.
[77] L. Wald, T. Ranchin, and M. Mangolini, "Fusion of satellite images of different spatial resolutions: Assessing the quality of resulting images," Photogrammetric Engineering & Remote Sensing, vol. 63, no. 6, pp. 691-699, 1997.
[78] R. C.
Hardie, M. A. Rucci, A. J. Dapore, and B. K. Karch, "Block matching and Wiener filtering approach to optical turbulence mitigation and its application to simulated and real imagery with quantitative error analysis," Optical Engineering, vol. 56, no. 7, p. 071503, 2017.
[79] F. A. Kruse, A. Lefkoff, J. Boardman, K. Heidebrecht, A. Shapiro, P. Barloon, and A. Goetz, "The spectral image processing system (SIPS) - interactive visualization and analysis of imaging spectrometer data," Remote Sensing of Environment, vol. 44, no. 2-3, pp. 145-163, 1993.

Ying Qu (S'16-M'18) received the B.S. degree in automatics and the M.S. degree in pattern recognition & artificial intelligence from Northeastern University, Shenyang, China, in 2008 and 2010, respectively, and the Ph.D. degree in computer engineering from the University of Tennessee, Knoxville, in 2017. She is currently a research associate with the Department of Electrical Engineering and Computer Science at the University of Tennessee, Knoxville. Her current research interests are remote sensing, artificial intelligence, and computer vision. She was the recipient of the Best Student Paper Award at the International Geoscience and Remote Sensing Symposium (IGARSS) in 2016.

Hairong Qi (IEEE Fellow since 2017) received the B.S. and M.S. degrees in computer science from Northern JiaoTong University, Beijing, China, in 1992 and 1995, respectively, and the Ph.D. degree in computer engineering from North Carolina State University, Raleigh, in 1999. She is currently the Gonzalez Family Professor with the Department of Electrical Engineering and Computer Science at the University of Tennessee, Knoxville. Her current research interests are in advanced imaging and collaborative processing in resource-constrained distributed environments, hyperspectral image analysis, and automatic target recognition. Dr.
Qi's research is supported by the National Science Foundation (NSF), DARPA, the Office of Naval Research (ONR), the Department of Homeland Security (DHS), the U.S. Army Space and Missile Defense Command, and the U.S. Army Medical Research and Materiel Command. Dr. Qi is the recipient of the NSF CAREER Award. She also received the Best Paper Awards at the 18th International Conference on Pattern Recognition (ICPR) in 2006, the 3rd ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC) in 2009, and the IEEE Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS) in 2015. She was awarded the Highest Impact Paper award from the IEEE Geoscience and Remote Sensing Society in 2012.

Chiman Kwan (S'85-M'93-SM'98) received his B.S. with honors in electronics from the Chinese University of Hong Kong in 1988, and M.S. and Ph.D. degrees in electrical engineering from the University of Texas at Arlington in 1989 and 1993, respectively. Currently, he is the Chief Technology Officer of Signal Processing, Inc. and Applied Research LLC, leading research and development efforts in chemical agent detection, biometrics, speech processing, image fusion, mission planning, and fault diagnostics and prognostics. His primary research areas include robust and adaptive control methods, signal and image processing, communications, neural networks, and pattern recognition applications. From April 1991 to February 1994, he worked in the Beam Instrumentation Department of the SSC (Superconducting Super Collider Laboratory) in Dallas, Texas, where he was heavily involved in the modeling, simulation, and design of modern digital controllers and signal processing algorithms for the beam control and synchronization system. He later joined the Automation and Robotics Research Institute in Fort Worth, where he applied intelligent control methods such as neural networks and fuzzy logic to the control of power systems, robots, and motors.
Between July 1995 and April 2006, he was with Intelligent Automation, Inc. in Rockville, Maryland. He has served as Principal Investigator/Program Manager for more than 120 different projects. Dr. Kwan has 15 patents, 65 invention disclosures, more than 120 papers in archival journals, and more than 250 additional papers published in major conference proceedings. He is listed in the New Millennium edition of Who's Who in Science and Engineering and is a member of Tau Beta Pi. He also received several awards from IEEE related to fault diagnostics and prognostics, and a certificate of recognition from NASA for the health monitoring of Auxiliary Power Units in the Space Shuttle.

Naoto Yokoya (S'10-M'13) received the M.Eng. and Ph.D. degrees from the Department of Aeronautics and Astronautics, the University of Tokyo, Tokyo, Japan, in 2010 and 2013, respectively. He is currently a Lecturer at the University of Tokyo and a Unit Leader at the RIKEN Center for Advanced Intelligence Project, Tokyo, Japan, where he leads the Geoinformatics Unit. He was an Assistant Professor at the University of Tokyo from 2013 to 2017. In 2015-2017, he was an Alexander von Humboldt Fellow, working at the German Aerospace Center (DLR), Oberpfaffenhofen, and the Technical University of Munich (TUM), Munich, Germany. His research is focused on the development of image processing, data fusion, and machine learning algorithms for understanding remote sensing images, with applications to disaster management. Dr. Yokoya won first place in the 2017 IEEE Geoscience and Remote Sensing Society (GRSS) Data Fusion Contest organized by the Image Analysis and Data Fusion Technical Committee (IADF TC). He is the Chair (2019-2021) and was a Co-Chair (2017-2019) of the IEEE GRSS IADF TC, and he has also been the secretary of the IEEE GRSS All Japan Joint Chapter since 2018.
He has been an Associate Editor for the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (JSTARS) since 2018. He is/was a Guest Editor for the IEEE JSTARS in 2015–2021, for Remote Sensing in 2016–2021, and for the IEEE Geoscience and Remote Sensing Letters (GRSL) in 2018–2019.

Jocelyn Chanussot (M’04–SM’04–F’12) received the M.Sc. degree in electrical engineering from the Grenoble Institute of Technology (Grenoble INP), Grenoble, France, in 1995, and the Ph.D. degree from the Université de Savoie, Annecy, France, in 1998. Since 1999, he has been with Grenoble INP, where he is currently a Professor of signal and image processing. His research interests include image analysis, hyperspectral remote sensing, data fusion, machine learning, and artificial intelligence. He has been a visiting scholar at Stanford University (USA), KTH (Sweden), and NUS (Singapore). Since 2013, he has been an Adjunct Professor at the University of Iceland. In 2015–2017, he was a visiting professor at the University of California, Los Angeles (UCLA). He holds the AXA chair in remote sensing and is an Adjunct Professor at the Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing. Dr. Chanussot is the founding President of the IEEE Geoscience and Remote Sensing French chapter (2007–2010), which received the 2010 IEEE GRSS Chapter Excellence Award. He has received multiple outstanding paper awards. He was the Vice-President of the IEEE Geoscience and Remote Sensing Society, in charge of meetings and symposia (2017–2019). He was the General Chair of the first IEEE GRSS Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing (WHISPERS). He was the Chair (2009–2011) and Co-Chair (2005–2008) of the GRSS Data Fusion Technical Committee.
He was a member of the Machine Learning for Signal Processing Technical Committee of the IEEE Signal Processing Society (2006–2008) and the Program Chair of the IEEE International Workshop on Machine Learning for Signal Processing (2009). He is an Associate Editor for the IEEE Transactions on Geoscience and Remote Sensing, the IEEE Transactions on Image Processing, and the Proceedings of the IEEE. He was the Editor-in-Chief of the IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (2011–2015). In 2014, he served as a Guest Editor for the IEEE Signal Processing Magazine. He is a Fellow of the IEEE, a member of the Institut Universitaire de France (2012–2017), and a Highly Cited Researcher (Clarivate Analytics/Thomson Reuters, since 2018).