Consensus Neural Network for Medical Imaging Denoising with Only Noisy Training Samples


Authors: Dufan Wu, Kuang Gong, Kyungsang Kim, Quanzheng Li

Dufan Wu 1,2, Kuang Gong 1,2, Kyungsang Kim 1,2, and Quanzheng Li 1,2

1 Center for Advanced Medical Computing and Analysis, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
2 Gordon Center for Medical Imaging, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA

Abstract. Deep neural networks have proved efficient for medical image denoising. Current training methods require both noisy and clean images. However, clean images cannot be acquired for many practical medical applications due to naturally noisy signals, such as dynamic imaging, spectral computed tomography, arterial spin labeling magnetic resonance imaging, etc. In this paper we propose a training method which learns denoising neural networks from noisy training samples only. Training data in the acquisition domain were split into two subsets and the network was trained to map one noisy set to the other. A consensus loss function was further proposed to efficiently combine the outputs from both subsets. A mathematical proof is provided that the proposed training scheme is equivalent to training with noisy and clean samples when the noise in the two subsets is uncorrelated and zero-mean. The method was validated on the Low-dose CT Challenge dataset and the NYU MRI dataset, and achieved improved performance compared to existing unsupervised methods.

Keywords: Image denoising · Neural network · Unsupervised learning

1 Introduction

Deep neural networks have proved efficient for noise and artifact reduction in medical image reconstruction [1][2]. Despite the superior image quality achieved by neural networks compared to handcrafted prior functions [3], almost all neural-network-based methods require both noisy and clean images during training.
However, such clean images are not always accessible due to the naturally noisy signals acquired in many medical applications. Dynamic imaging, including dynamic positron emission tomography (PET), dynamic magnetic resonance (MR), and computed tomography (CT) perfusion, acquires signals with rapid temporal change, and the signal quality is limited by the short acquisition time. Spectral CT has very noisy material images due to the ill-posed decomposition procedure [4]. Arterial spin labeling (ASL) MR also has noisy images due to the low efficiency of labeling arterial blood with a magnetic field, which results in noisy signals emitted by the labeled blood [5].

A recent work proposed in 2018, Noise2noise [6], demonstrated that denoising networks can be learned by mapping a noisy image to another noisy realization of the same image, with performance similar to that obtained using noisy-clean pairs. However, it requires at least two noise realizations for each training sample, which are not readily available for most medical images. Even when two noise realizations are given, the Noise2noise framework can only effectively use one of them, which degrades the achievable image quality. Last but not least, [6] did not clarify the conditions for Noise2noise to work, which can be problematic for medical imaging due to the complicated noise characteristics.

In this work we propose a consensus network for medical imaging which requires only noisy data for training. Our consensus network was inspired by Noise2noise but with major improvements for medical imaging. The Noise2noise framework is first analyzed with a newly proposed mathematical theorem, which clarifies its applicable conditions. Based on the conditions derived from the theorem, the acquired signals are split into two sets to reconstruct images with different noise realizations.
The denoising network is then trained to map one noise realization to the other, with a novel loss function which efficiently aggregates both noise realizations at testing time. The proposed method was evaluated on the Low-dose CT (LDCT) Challenge dataset [7] for quarter-dose CT image denoising, and the New York University (NYU) MR dataset [8] for 4×-undersampled parallel imaging. Results from both datasets demonstrated improved performance compared to the original Noise2noise framework and iterative reconstruction methods.

2 Methodology

We will first provide a proof for Noise2noise training, then derive the loss function for the proposed consensus network based on our theorem.

2.1 Noise2noise Training

Given paired clean and noisy images x_i, x_i + n_i ∈ R^n, the conventional method to train a denoising network under L2 loss is:

\[
\Theta_c = \arg\min_{\Theta} \frac{1}{N} \sum_{i=1}^{N} \left\| f(x_i + n_i; \Theta) - x_i \right\|_2^2, \tag{1}
\]

where f(x; Θ): R^n → R^n is the denoising neural network and Θ ∈ R^m are the parameters to be trained. Equation (1) is referred to as Noise2clean training. Noise2noise training uses two independent noise realizations of each sample:

\[
\Theta_n = \arg\min_{\Theta} \frac{1}{N} \sum_{i=1}^{N} \left\| f(x_i + n_{i1}; \Theta) - (x_i + n_{i2}) \right\|_2^2, \tag{2}
\]

where n_{i1}, n_{i2} ∈ R^n are two independent noise samples. The equivalence between Noise2noise (2) and Noise2clean (1) is guaranteed by Theorem 1, which is one of our main contributions:

Theorem 1. If the conditional expectation E{n_{i2} | x_i + n_{i1}} = 0 for all Θ ∈ R^m and all i, then lim_{N→∞} Θ_n = Θ_c.

Proof. Let y_i := f(x_i + n_{i1}; Θ) and expand the loss function in (2):

\[
\frac{1}{N} \sum_{i=1}^{N} \left\| y_i - (x_i + n_{i2}) \right\|_2^2
= \frac{1}{N} \sum_{i=1}^{N} \left\| y_i - x_i \right\|_2^2
- \frac{1}{N} \sum_{i=1}^{N} 2 n_{i2}^T y_i
+ \frac{1}{N} \sum_{i=1}^{N} \left( n_{i2}^T n_{i2} + 2 n_{i2}^T x_i \right). \tag{3}
\]

The first term is the Noise2clean loss (1) and the last term is irrelevant to Θ.
For the second term, by the Lindeberg–Lévy central limit theorem:

\[
\sqrt{N} \left( \frac{1}{N} \sum_{i} 2 n_{i2}^T y_i - E\{ 2 n_{i2}^T y_i \} \right) \xrightarrow{d} \mathcal{N}\left(0, \mathrm{Var}\{ 2 n_{i2}^T y_i \}\right), \tag{4}
\]

where Var{x} is the variance of x and N(0, σ²) is a normal distribution with variance σ². As Var{2 n_{i2}^T y_i} is finite, the second term in (3) converges to E{2 n_{i2}^T y_i} as N → ∞. The expectation can be written as a conditional expectation:

\[
E\{ 2 n_{i2}^T y_i \} = 2 E\left\{ E^T\{ n_{i2} \mid y_i \} \, y_i \right\} = 2 E\left\{ E^T\{ n_{i2} \mid x_i + n_{i1} \} \, f(x_i + n_{i1}; \Theta) \right\}, \tag{5}
\]

where the last equality holds because y_i = f(x_i + n_{i1}; Θ) is deterministic given x_i + n_{i1}. Equation (5) equals 0 under the theorem's assumption, so the second term in (3) vanishes as N → ∞. As the third term is irrelevant to Θ, we have:

\[
\arg\min_{\Theta} \frac{1}{N} \sum_{i=1}^{N} \left\| y_i - x_i \right\|_2^2
= \arg\min_{\Theta} \frac{1}{N} \sum_{i=1}^{N} \left\| y_i - (x_i + n_{i2}) \right\|_2^2 \tag{6}
\]

as N → ∞, which implies Θ_n = Θ_c. ∎

2.2 Consensus Loss Function

The key to Noise2noise training is finding independent, zero-mean noise realizations n_{i1} and n_{i2}. For medical imaging, this can be achieved by splitting the measurements into independent sets which are reconstructed separately. For example, in low-dose CT, x_i + n_{i1} and x_i + n_{i2} can be images reconstructed from the odd and even projections, respectively.

The major drawback of this "data splitting" method is that both x_i + n_{i1} and x_i + n_{i2} are noisier than the original noisy image, which restricts the quality of the denoised images. To efficiently aggregate both noise realizations, they should be taken into consideration in the same loss function. According to Theorem 1, such a loss function can be designed under the Noise2clean framework first, and the clean images x_i can then be substituted with their noisy versions where appropriate.
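The zero-mean condition that such a splitting must satisfy can be checked numerically: the cross term in Eq. (3) vanishes for large N when n_{i2} is zero-mean and independent of x_i + n_{i1}. Below is a synthetic sketch of this check; the variable names and the toy mapping `np.tanh` are ours, purely for illustration.

```python
import numpy as np

# Synthetic check of the key step in the proof of Theorem 1: the cross term
# (1/N) * sum_i 2 * n_i2^T y_i in Eq. (3) converges to E{2 n_i2^T y_i} = 0
# when n_i2 is zero-mean and independent of x_i + n_i1.
rng = np.random.default_rng(0)
N, d = 100_000, 4
x = rng.uniform(-1.0, 1.0, (N, d))    # clean signals x_i
n1 = rng.normal(0.0, 0.1, (N, d))     # first noise realization n_i1
n2 = rng.normal(0.0, 0.1, (N, d))     # second realization, independent of n1
y = np.tanh(x + n1)                   # y_i = f(x_i + n_i1) for a fixed toy f
cross_term = np.mean(2.0 * np.sum(n2 * y, axis=1))
print(abs(cross_term))                # vanishes as N grows
```

If n2 were instead correlated with n1 (for example, shared streak artifacts), the cross term would not vanish, which is what motivates the regularization in Sec. 2.3.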
The following Noise2clean model was considered to derive our consensus loss:

\[
L_c = \frac{1}{N} \sum_{i=1}^{N} \left\| \frac{f(x_i + n_{i1}; \Theta_1) + f(x_i + n_{i2}; \Theta_2)}{2} - x_i \right\|_2^2
= \frac{1}{N} \sum_{i=1}^{N} \left( \frac{1}{2} \left\| y_{i1} - x_i \right\|_2^2 + \frac{1}{2} \left\| y_{i2} - x_i \right\|_2^2 - \frac{1}{4} \left\| y_{i1} - y_{i2} \right\|_2^2 \right), \tag{7}
\]

where y_{i1} := f(x_i + n_{i1}; Θ_1), y_{i2} := f(x_i + n_{i2}; Θ_2), and Θ_1, Θ_2 ∈ R^m are the trainable variables. The second equality is simple factorization. This model effectively aggregates x_i + n_{i1} and x_i + n_{i2} by taking both of them into consideration in the same loss function. Similar to Theorem 1, x_i can be substituted with x_i + n_{i1} or x_i + n_{i2} during training, and our Noise2noise consensus loss is given by:

\[
L_n = \frac{1}{N} \sum_{i=1}^{N} \left( \frac{1}{2} \left\| f(x_i + n_{i1}; \Theta_1) - (x_i + n_{i2}) \right\|_2^2
+ \frac{1}{2} \left\| f(x_i + n_{i2}; \Theta_2) - (x_i + n_{i1}) \right\|_2^2
- \frac{1}{4} \left\| f(x_i + n_{i1}; \Theta_1) - f(x_i + n_{i2}; \Theta_2) \right\|_2^2 \right), \tag{8}
\]

and the denoised image is given by:

\[
z_i = \frac{f(x_i + n_{i1}; \Theta_1) + f(x_i + n_{i2}; \Theta_2)}{2}. \tag{9}
\]

2.3 Regularization

In practice, n_{i1} and n_{i2} can be correlated due to structure-related aliasing, and N may not be large enough in mini-batch training. Two regularization terms were therefore added to L_n for artifact reduction and detail preservation. The first term is weight decay:

\[
L_w = \left\| \Theta_1 \right\|_2^2 + \left\| \Theta_2 \right\|_2^2, \tag{10}
\]

which is effective at eliminating artifacts caused by noise correlation. The second term is image consistency:

\[
L_r = \frac{1}{N} \sum_{i=1}^{N} \left\| z_i - x_i^{\mathrm{est}} \right\|_2^2, \tag{11}
\]

where z_i is given by (9) and x_i^{est} is an estimate of x_i; it can be the original noisy image or a reconstruction from an artifact-suppressing algorithm. The final loss function for training is

\[
L = L_n + \beta_w L_w + \beta_r L_r, \tag{12}
\]

where β_w can be tuned first with β_r = 0 to remove artifacts, and β_r then tuned for image details with β_w fixed. The denoised image is given by (9).
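The consensus loss (8) and the test-time output (9) are straightforward to implement. The following is a minimal NumPy sketch under our own naming conventions; `f1` and `f2` stand in for the two trained networks, and each row of the inputs is one sample.

```python
import numpy as np

def consensus_loss(f1, f2, x_n1, x_n2):
    # Noise2noise consensus loss of Eq. (8) over one batch (rows = samples).
    y1, y2 = f1(x_n1), f2(x_n2)
    sq = lambda a: np.mean(np.sum(a ** 2, axis=-1))  # mean squared L2 norm
    return 0.5 * sq(y1 - x_n2) + 0.5 * sq(y2 - x_n1) - 0.25 * sq(y1 - y2)

def denoised(f1, f2, x_n1, x_n2):
    # Test-time output of Eq. (9): average of the two networks' outputs.
    return 0.5 * (f1(x_n1) + f2(x_n2))
```

Note how the negative third term rewards agreement between the two networks, which is where the "consensus" in the name comes from.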
3 Experiments

3.1 Data Preparation

We validated the proposed methods on two datasets: the LDCT Grand Challenge [7] and NYU MR images [8]. Both datasets have high-quality images which provide references for evaluation.

The LDCT dataset consists of abdominal CT scans from 10 patients. Quarter-dose raw data was synthesized from the acquisitions by realistic noise insertion. The raw data was rebinned to multi-slice fan-beam sinograms for image reconstruction. We randomly chose 50 slices from each patient for our study, and used 8 patients for training with the other 2 patients for testing. For each slice, we split the 2304 quarter-dose projections into odd and even sets and used filtered backprojection (FBP) with a Hann filter to reconstruct the x_i + n_{i1} and x_i + n_{i2} required for training the consensus network.

The NYU MR dataset used in this study consists of sagittal knee MR scans acquired with a Turbo Spin Echo sequence from 20 patients. 4× Cartesian downsampling was synthesized by downsampling the original k-space data with a different random mask for each slice. The central 48 lines were kept for aliasing reduction and sensitivity map estimation. Each patient had 31 to 35 slices, and we randomly chose 16 patients for training with 4 patients for testing. For each patient, the sampled lines were further randomly split into two sets with 8× downsampling each. The central 48 lines were kept for both sets. Zero-filling was used to reconstruct x_i + n_{i1} and x_i + n_{i2} because its linearity makes the artifacts zero-mean. The randomly sampled part of the k-space data was also amplified by a factor of 8 to enforce the zero-mean property.

3.2 Parameters

We used a UNet [1][9] as f(x; Θ) in all the studies. 32 and 64 basic feature maps were used for the LDCT and NYU MR datasets, respectively. For the MR study, the real and imaginary parts of the images were fed to the network as two channels.
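The random k-space splitting described in Sec. 3.1 can be sketched as follows. This is a hypothetical implementation under our own conventions: the mask is a 1-D boolean array over phase-encoding lines, the function name is ours, and the 8× amplification of the randomly sampled lines is omitted.

```python
import numpy as np

def split_kspace_mask(mask, n_center=48, seed=0):
    """Split the sampled phase-encoding lines into two disjoint random sets,
    keeping the central n_center lines in both (cf. Sec. 3.1)."""
    rng = np.random.default_rng(seed)
    n = len(mask)
    center = np.zeros(n, dtype=bool)
    center[n // 2 - n_center // 2 : n // 2 + n_center // 2] = True
    outer = np.flatnonzero(mask & ~center)   # sampled lines outside the center
    rng.shuffle(outer)
    half = len(outer) // 2
    m1, m2 = center.copy(), center.copy()
    m1[outer[:half]] = True                  # first half of the random lines
    m2[outer[half:]] = True                  # second half of the random lines
    return m1, m2
```

Each returned mask covers roughly half of the non-central sampled lines plus the shared center, so zero-filled reconstructions from m1 and m2 give the two noise realizations x_i + n_{i1} and x_i + n_{i2}.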
The x_i^{est} in the CT and MR studies were the quarter-dose FBP results and the SENSE [10] results, respectively. The hyperparameters β_w and β_r were tuned according to Section 2.3. The LDCT study used β_w = 5 × 10⁻⁶ and β_r = 0.5; the MR study used β_w = 1 × 10⁻⁶ and β_r = 5.

The networks were trained on 96 × 96 local patches with a mini-batch size of 40. At each epoch, 40 patches were randomly extracted from each training slice. The networks were trained with the Adam algorithm at a learning rate of 10⁻⁴ for 100 epochs. CT images were normalized to HU / 1000 and MR images were normalized to [-1, 1] before being fed to the networks.

Fig. 1. A testing CT slice processed with different methods (full-dose, FBP, TV, Noise2noise, proposed, Noise2clean). A small lesion is marked with a blue arrow on the full-dose image. The display window is [-160, 240] HU.

Table 1. RMSEs and SSIMs of the testing CT images. SSIMs were calculated within the liver window [-160, 240] HU. Noise2clean required clean images for training.

| Index     | FBP        | TV         | Noise2noise | Proposed   | Noise2clean |
|-----------|------------|------------|-------------|------------|-------------|
| RMSE (HU) | 27.6 ± 5.3 | 36.9 ± 4.4 | 17.4 ± 3.2  | 18.4 ± 2.9 | 14.7 ± 2.8  |
| SSIM      | 81.5 ± 4.7 | 81.8 ± 3.5 | 84.2 ± 4.6  | 87.1 ± 3.4 | 87.7 ± 3.3  |

4 Results

The root mean square errors (RMSE) and structural similarity indices (SSIM) on the testing quarter-dose CT dataset are given in Table 1, and one of the testing slices with two lesions is shown in Figure 1. Besides the proposed method, we also give the results of quarter-dose FBP, iterative reconstruction with total variation (TV) minimization [3], the original Noise2noise (2), and Noise2clean (1). None of the methods required clean images for training or testing except Noise2clean, which is supervised learning. The proposed method achieved the best SSIM among all the unsupervised methods.
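As a note on units: given the HU / 1000 normalization described in Sec. 3.2, the RMSE values in Table 1 correspond to the following computation (a sketch; the function name is ours).

```python
import numpy as np

def rmse_hu(denoised, full_dose):
    # Root mean square error in Hounsfield units, assuming both images are
    # stored as HU / 1000 (the normalization used in Sec. 3.2).
    return 1000.0 * np.sqrt(np.mean((denoised - full_dose) ** 2))
```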
Although Noise2noise had lower RMSE than the proposed method, its images were significantly oversmoothed, as shown in Figure 1. The higher RMSE of the proposed method was mainly due to the mismatch between the noise patterns of the reference images and the proposed results. The proposed method preserved both the small lesion structure and the textures compared to TV and Noise2noise. The structural details recovered by the proposed method were very similar to those of Noise2clean training, and the two achieved close SSIMs.

Fig. 2. Amplitude images of a testing MR slice processed with different methods (fully sampled, zero-filling, SENSE, Noise2noise, proposed, Noise2clean). Structures are marked with blue arrows on the fully sampled image.

Table 2. RMSEs and SSIMs of the testing MR images. SSIMs were calculated for the amplitude images. Noise2clean required clean images for training.

| Index       | Zero-filling | SENSE      | Noise2noise | Proposed   | Noise2clean |
|-------------|--------------|------------|-------------|------------|-------------|
| RMSE (10⁻³) | 36.3 ± 7.6   | 38.4 ± 4.6 | 27.7 ± 6.3  | 23.4 ± 4.2 | 21.0 ± 4.5  |
| SSIM        | 90.1 ± 1.9   | 86.5 ± 1.1 | 94.6 ± 1.3  | 95.2 ± 0.6 | 96.4 ± 0.8  |

The 4× downsampling MR testing results are given in Table 2 and Figure 2. Results from zero-filling, SENSE, Noise2noise, and Noise2clean are given besides the proposed method. In the MR study, the proposed method achieved the best RMSE and SSIM among all the unsupervised methods. Noise2noise results were still oversmoothed.

The proposed method preserved the detailed structures of the knee compared to Noise2noise, and significantly reduced artifacts and noise compared to zero-filling and SENSE. Compared to the Noise2clean result, the proposed method recovered almost the same amount of structure, despite some slight loss in contrast. The artifact and noise levels were also similar between the two results.
5 Conclusion and Discussion

In this paper we proposed an unsupervised learning method for medical image denoising which requires only noisy samples during training. A novel theorem was proposed for the Noise2noise framework, and our consensus network was built on this theorem, with a novel framework and loss functions designed for medical imaging. The proposed loss function efficiently utilized both noise realizations of the same object and achieved improved performance compared to Noise2noise under the same framework. The proposed method achieved better performance than other unsupervised methods, and its image quality was close to that of supervised denoising networks.

The proposed method makes weak assumptions on the properties of the noise and images: it worked for both local noise in CT and non-local undersampling artifacts in MR. The noise is only required to be zero-mean and independent between the two realizations. Whereas the zero-mean property can be guaranteed with appropriate reconstruction algorithms, the independence property can be breached by various factors. The artifacts caused by such correlations were successfully compensated with the proposed regularizations.

Although the proposed method needs to split the raw data to create the training dataset, its deployment can work in the image domain only. After the consensus network is trained, another network can be trained to map noisy images to the output of the consensus network in a conventional supervised manner. Hence, the deployed networks do not need to interfere with the existing workflow of scanners.

We achieved promising results on existing public datasets where high-quality reference images were available, which provided a reliable evaluation of the method's performance. The method can be applied to more challenging applications where only noisy images are available, such as dynamic PET, ASL MR, and spectral CT.
Furthermore, the method itself could also be improved, for example by replacing the image consistency (11) with data consistency, or by replacing the UNet with reconstruction networks.

References

1. Jin, K.H., et al.: Deep convolutional neural network for inverse problems in imaging. IEEE Trans. Image Process. 26(9), 4509–4522 (2017)
2. Kang, E., et al.: A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction. Med. Phys. 44(10), e360–e375 (2017)
3. Sidky, E.Y., Pan, X.: Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization. Phys. Med. Biol. 53(17), 4777–4807 (2008)
4. Niu, T., et al.: Iterative image-domain decomposition for dual-energy CT. Med. Phys. 41, 041901 (2014)
5. Bibic, A., et al.: Denoising of arterial spin labeling data: wavelet-domain filtering compared with Gaussian smoothing. Magn. Reson. Mater. Phys., Biol. Med. 23(3), 125–137 (2010)
6. Lehtinen, J., et al.: Noise2Noise: Learning image restoration without clean data. Proc. Mach. Learn. Res. 80, 2965–2974 (2018)
7. McCollough, C.H., et al.: Low-dose CT for the detection and classification of metastatic liver lesions: Results of the 2016 Low Dose CT Grand Challenge. Med. Phys. 44(10), e339–e352 (2017)
8. Hammernik, K., et al.: Learning a variational network for reconstruction of accelerated MRI data. Magn. Reson. Med. 79(6), 3055–3071 (2018)
9. Ronneberger, O., et al.: U-Net: convolutional networks for biomedical image segmentation. Med. Image Comput. Comput. Assist. Interv., 234–241 (2015)
10. Pruessmann, K.P., et al.: SENSE: sensitivity encoding for fast MRI. Magn. Reson. Med. 42(5), 952–962 (1999)
