Signed Laplacian Deep Learning with Adversarial Augmentation for Improved Mammography Diagnosis


Authors: Heyi Li, Dongdong Chen, William H. Nailon

Signed Laplacian Deep Learning with Adversarial Augmentation for Improved Mammography Diagnosis

Heyi Li¹*, Dongdong Chen¹*, William H. Nailon², Mike E. Davies¹, and David I. Laurenson¹

¹ Institute for Digital Communications, University of Edinburgh, Edinburgh, UK
{Heyi.Li, d.chen, dave.laurenson, mike.davies}@ed.ac.uk
² Oncology Physics Department, Edinburgh Cancer Centre, Western General Hospital, Edinburgh
bill.nailon@luht.scot.nhs.uk

Abstract. Computer-aided breast cancer diagnosis in mammography is limited by inadequate data and the similarity between benign and cancerous masses. To address this, we propose a signed graph regularized deep neural network with adversarial augmentation, named DiagNet. Firstly, we use adversarial learning to generate positive and negative mass-contained mammograms for each mass class. After that, a signed similarity graph is built upon the expanded data to further highlight the discrimination. Finally, a deep convolutional neural network is trained by jointly optimizing the signed graph regularization and classification loss. Experiments show that the DiagNet framework outperforms the state-of-the-art in breast mass diagnosis in mammography.

Keywords: Deep learning · Mammography Diagnosis · Adversarial learning · Graph regularization

1 Introduction

Breast cancer is one of the most frequently diagnosed mortal diseases for women all over the world [6]. Mammography is extensively applied, and computer-aided diagnosis systems (CADs) are often employed as a second reader. Leveraging the recent success of deep neural networks in representation learning, deep learning based CADs [7, 11, 12, 16, 17, 20] outperform traditional methods, which rely heavily on handcrafted features. However, two major challenges in mammographic CADs remain: (1) limited access to well-annotated data [7], and (2) the similarity between benign and cancerous masses.
To alleviate the impact of inadequate data, [7, 11, 12, 20] applied classical geometric transformations for data augmentation (e.g. flips, rotations, random crops, etc.), and more recently, [16, 17] generated synthetic images on the manifold of real mammograms using adversarial learning [9], which enjoys a powerful ability to learn the unknown underlying distribution. Unfortunately, the following questions remain unanswered: What kind of data augmentation is most helpful for CADs in mammography? How can we alleviate the impact of the similarity between data, i.e., how can we maximize the margin between manifolds with a small difference?

In this paper, we propose a new deep learning framework that improves mammography diagnosis as follows. Firstly, we propose an adversarial data augmentation strategy in which both positive and negative samples of specific classes are generated in an unsupervised manner, in order to make the boundaries between different classes more distinct. After that, we build a signed graph Laplacian over the augmented data to quantitatively capture the geometric structure of the data. Finally, we train a deep neural network by jointly optimizing the graph regularization and classification loss, by which the intra-class difference is minimized and, more importantly, the inter-class manifold margin is maximized in the deep representation space. Extensive experiments show that the proposed DiagNet outperforms the state-of-the-art in breast mass diagnosis in mammography.

* These authors contributed equally to this work.

2 Preliminary

2.1 Adversarial learning

Adversarial learning is a technique that attempts to fool models through malicious input [10] and has achieved impressive results in representation learning. The key idea behind the success of a Generative Adversarial Network (GAN) is to force the output of the generator to be indistinguishable from the real input [9].
Adversarial training is particularly powerful for image generation and for learning unknown and complicated distributions from the training data. In this paper, we propose to use adversarial learning to generate off-distribution instances along with on-distribution instances in order to enlarge the medical image data.

2.2 Manifold Learning

In real applications, data typically reside on a low-dimensional manifold embedded in a high-dimensional ambient space [15]. Manifold learning is extensively explored because of its effectiveness in preserving topological locality, which relies on the assumption that neighbors tend to have the same labels [3]. In this paper, we aim to incorporate graph embedding into a deep neural network as a regularizer in the latent space. In addition, preserving the local data manifold structure within the hidden representations of deep neural networks offers the possibility of improving the performance of the classifier [4].

3 Proposed Method

In this section, we formally introduce the details of DiagNet, which is composed of three steps as shown in Fig. 1: (1) adversarial augmentation, (2) a signed graph Laplacian built upon the augmented data, and (3) joint optimization of the classifier loss and signed graph regularizer.

Fig. 1: The proposed DiagNet for Breast Mass Diagnosis. (a) The framework of the proposed algorithm, which consists of three steps. $\{x_1, x_2\}$ and $\{x_3, x_4\}$ are samples on the benign manifold $\mathcal{M}_b$ and the malignant manifold $\mathcal{M}_m$, respectively. In the first step, i.e. adversarial data augmentation, positive neighbors $\{x_5, x_7\}$ and negative neighbors $\{x_6, x_8\}$ are generated with (1) and (2), respectively. Then a signed graph is built upon both original and augmented samples as in (3). Finally, a joint loss (6) is optimized in the deep latent space, maximizing the data manifold margin. (b) The utilized deep network architecture: "DC block" represents a down-sampling convolutional block, "RC block" is a residual convolutional block, and "SConv" denotes separable convolutions.

We first define the notation applied throughout the paper. Let $\{X, Y\} = \{x_i, y_i\}_{i=1}^{n}$ be the $n$ mammograms with corresponding labels, where $x_i \in \mathbb{R}^{H \times W}$ is an image sample and $y_i \in \{y_c\}_{c=1}^{C}$ is the class label. Let $\{X_c, y_c\}$ denote the $c$-th class data.

3.1 Adversarial Augmentation

As mentioned in Section 1, inadequate data and the similarity between benign and cancerous masses [7] are two main causes of high false positives in mammographic CADs. Recently, [1, 16, 17] employed GANs to create new instances. Even though they generated on-distribution samples that are not separable by discriminators, they ignored the importance of distinguishable but similar instances, which tend to improve the discriminative ability. To overcome this shortcoming, as shown in Fig. 1(a), we propose to use adversarial learning to generate more instances of both positive neighbors (i.e. instances on the manifold, e.g. $x_5$ and $x_7$) and negative neighbors (i.e. instances off the manifold, e.g. $x_6$ and $x_8$). Here, two manifolds are defined: $\mathcal{M}_b$ for benign images and $\mathcal{M}_m$ for malignant images.
In particular, inspired by [19], we generate neighboring instances one by one for a certain data class $\{X_c, y_c\}$, $c = 1, 2, \dots, C$, where $C = 2$ in this paper. Specifically, both positive and negative neighbors are generated based on noise-corrupted seed points (a number of randomly selected samples in $X_c$), and they are both close to the original data points. In particular, the positive neighbors $X_c^+$ are the generated samples that cannot be separated from $X_c$ by a discriminator, while the negative neighbors $X_c^-$ are the ones that can be separated. Finally, the expanded dataset for class $c$ is of the form $\mathcal{X}_c = \{X_c \cup X_c^+ \cup X_c^-\}$, and the whole dataset is $\mathcal{X} = \bigcup_c \mathcal{X}_c$.

Let $x$ be a desired new sample and $P(x; X_c, X_c^+)$ be the probability that $x$ is classified as class $c$ by a discriminator trained on $\{X_c, X_c^+\}$. Similarly, $P(x; X_c, X_c^-)$ corresponds to a discriminator trained on $\{X_c, X_c^-\}$. Note that $X_c^+$ and $X_c^-$ are initialized as empty. In this paper, we trained two SVM classifiers as the discriminators, and the corresponding output probability is obtained with a logistic sigmoid of the output signed distance. Accordingly, a set of neighboring instances $\{x_t\}_{t=1}^{T}$ of $X_c$ is iteratively generated. In each iteration $t$, the discriminator is learned and the weights are updated. After $T$ iterations of training, we select one desired positive neighbor $x$:

$$\arg\max_x \; P\big(x; X_c, X_c^+ \cup \{x_t\}_{t=1}^{T}\big) - \gamma \max\Big\{0, \; r_1 - \min_{x_i \in X_c^+} d(x, x_i)\Big\}, \quad (1)$$

where $d(\cdot)$ is a distance measure and $\gamma$ weights the distance regularization, forcing generated points to differ by a minimum distance $r_1$.
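The positive-neighbor selection objective (1) can be sketched as a scoring function. The sketch below is an assumption about how the pieces fit together: it uses scikit-learn's `SVC` as the discriminator (the paper's SVM, with a logistic sigmoid over the signed distance), and the function name `positive_neighbor_score` is ours, not the authors'.

```python
import numpy as np

def cosine_dist(a, b):
    # angular cosine distance, the d(.) used in the paper
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def positive_neighbor_score(x, svm, X_pos, gamma=1e-2, r1=0.1):
    """Objective (1): discriminator probability for candidate x minus a
    hinge penalty that keeps x at least r1 away from existing positive
    neighbors X_pos. A candidate maximizing this score is selected."""
    # probability via logistic sigmoid of the SVM signed distance
    p = 1.0 / (1.0 + np.exp(-svm.decision_function(x[None, :])[0]))
    if len(X_pos) == 0:
        penalty = 0.0  # X_c^+ is initialized empty: no distance term yet
    else:
        dmin = min(cosine_dist(x, xi) for xi in X_pos)
        penalty = max(0.0, r1 - dmin)
    return p - gamma * penalty
```

In the paper this score is maximized over candidates by the derivative-free RACOS algorithm [18] rather than by gradient ascent.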
Similarly, we select one desired negative neighbor $x$, with an added distance restriction to force new points to be scattered close to $X_c$:

$$\arg\min_x \; P\big(x; X_c, X_c^- \cup \{x_t\}_{t=1}^{T}\big) + \gamma \max\Big\{0, \; r_2 - \min_{x_j \in X_c^-} d(x, x_j)\Big\} + \gamma \max\Big\{0, \; \min_{x_i \in X_c} d(x, x_i) - r_3\Big\}, \quad (2)$$

where the distance regularization forces generated points to keep a minimum distance $r_2$ and a maximum distance $r_3$.

3.2 Signed graph Laplacian regularizer

Graph embedding trained with distributional context can boost performance in various pattern recognition tasks. In this paper, we aim to incorporate the signed graph Laplacian regularizer [2] to learn a discriminative datum representation $H(\mathcal{X})$ with a deep neural network, where discriminative here means that the intra-class data manifold structure is preserved in the latent space and the inter-manifold (slightly different) margins are maximized.

Using the supervision of the adversarial augmentation in Section 3.1, we build a signed graph upon the expanded data $\mathcal{X}$. Given $\mathcal{X}_c = \{X_c, X_c^+, X_c^-\}$ for class $c$, and all other classes' data $\mathcal{X}_{-c} = \bigcup_{t=1,\dots,C;\, t \neq c} \{X_t, X_t^+, X_t^-\}$, for $\forall x_i \in X_c$ the corresponding elements of the signed graph are built as follows:

$$\phi_{ij} = \begin{cases} +1, & x_j \in \{X_c \cup X_c^+\}_{n_i^+}, \\ -1, & x_j \in \{\mathcal{X}_{-c} \cup X_c^-\}_{n_i^-}, \end{cases} \quad (3)$$

where $\{\cdot\}_{n_i^+}$ ($\{\cdot\}_{n_i^-}$) denotes the corresponding $n^+$ ($n^-$) nearest neighborhood of $x_i$, approximating the locality of the manifold. Then, we compute the structure preservation in the deep representation space (directly behind the softmax layer as shown in Fig. 1(b)) $H = \{h(x_i)\}_{i=1}^{N}$, where $N = |\mathcal{X}|$.
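The construction in (3) can be sketched in a few lines. This is a minimal sketch under our own assumptions: `feats` stands in for the samples, `same_side[i, j]` is a boolean matrix marking whether $x_j$ belongs to $\{X_c \cup X_c^+\}$ for the class of $x_i$, and the angular cosine distance matches the paper's choice of metric.

```python
import numpy as np

def build_signed_graph(feats, same_side, n_pos=1, n_neg=4):
    """Sketch of eq. (3): phi_ij = +1 for the n_pos nearest samples on the
    same side (class c and its positive neighbors), -1 for the n_neg
    nearest samples on the opposite side (other classes and negative
    neighbors). Defaults match the paper's grid-searched n+ = 1, n- = 4."""
    X = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    D = 1.0 - X @ X.T            # pairwise angular cosine distance
    np.fill_diagonal(D, np.inf)  # a node is not its own neighbor
    n = len(feats)
    Phi = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(D[i])
        same = [j for j in order if same_side[i, j]][:n_pos]
        diff = [j for j in order if not same_side[i, j]][:n_neg]
        Phi[i, same] = +1.0
        Phi[i, diff] = -1.0
    return Phi
```

Each row of the resulting $\Phi$ then carries exactly $n^+$ attractive and $n^-$ repulsive edges, which is the supervision the regularizer (4) consumes.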
The signed graph Laplacian regularizer is defined as follows:

$$J_g(\mathcal{X}, \Phi) = \sum_{i,j} \begin{cases} \phi_{ij} \cdot \mathrm{dist}\big(h(x_i), h(x_j)\big), & \text{if } \phi_{ij} > 0, \\ \max\Big\{0, \; m + \phi_{ij} \cdot \mathrm{dist}\big(h(x_i), h(x_j)\big)\Big\}, & \text{if } \phi_{ij} < 0, \end{cases} \quad (4)$$

where $\mathrm{dist}(\cdot)$ is a distance metric for the dissimilarity between $h(x_i)$ and $h(x_j)$. It encourages similar examples to be close, and dissimilar ones to keep a distance of at least $m$ from each other, where $m > 0$ is a margin.

Note that instead of calculating the manifold embedding by solving an eigenvalue decomposition, we learn the embedding $H$ with a deep neural network. Specifically, inspired by the depth-wise separable convolutions [5] that are extensively employed to learn mappings with a series of factoring filters, we build stacks of depth-wise separable convolutions with a topological architecture similar to that in [5] to learn such deep representations (Fig. 1(b)). Therefore, by minimizing (4), it is expected that if two connected nodes $x_i$ and $x_j$ are from the same class (i.e. $\phi_{ij}$ is positive), $h(x_i)$ and $h(x_j)$ are also close to each other, and vice versa.

Benefiting from such learned discriminativity, we train a simple softmax classifier to predict the class label, i.e.,

$$J_l = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} \delta_c(y_i) \log P\big(y_i \,|\, x_i; \theta\big), \quad (5)$$

where $\delta_c(y_i) = 1$ when $y_i = c$, and $0$ otherwise; $\theta$ is the parameter set of the neural network.

Finally, by incorporating the signed Laplacian regularizer (4) and the classification loss (5), the total objective of DiagNet is defined as:

$$J = J_l + \lambda J_g, \quad (6)$$

where $\lambda \geq 0$ is the regularization trade-off parameter, which controls the smoothness of the hidden representations.

4 Experiments

4.1 Datasets and ROIs selection

The DiagNet is evaluated on the most frequently used full-field digital mammographic dataset, INbreast [13]. 107 mass-contained mammograms are divided
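Losses (4), (5) and (6) can be written down directly. The sketch below is a NumPy illustration under our own assumptions (the paper trains these inside a deep network with backpropagation; here `H` is a matrix of latent representations, `Phi` the signed graph, and `probs` the softmax outputs), using the paper's angular cosine distance for $\mathrm{dist}(\cdot)$.

```python
import numpy as np

def signed_graph_loss(H, Phi, m=1.0):
    """Eq. (4): positive pairs (phi_ij > 0) are pulled together; negative
    pairs (phi_ij < 0) are pushed to at least margin m apart."""
    Xn = H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-12)
    D = 1.0 - Xn @ Xn.T                    # dist(h(x_i), h(x_j))
    pos, neg = Phi > 0, Phi < 0
    attract = (Phi * D)[pos].sum()         # phi_ij * dist for phi_ij > 0
    repel = np.maximum(0.0, m + (Phi * D)[neg]).sum()
    return attract + repel

def total_loss(H, Phi, probs, labels, lam=1.0):
    """Eq. (6): cross-entropy (5) plus lambda times the regularizer (4)."""
    n = len(labels)
    ce = -np.log(probs[np.arange(n), labels] + 1e-12).mean()
    return ce + lam * signed_graph_loss(H, Phi)
```

With $\phi_{ij} = -1$ the repulsion term reduces to $\max\{0, m - \mathrm{dist}\}$, so a negative pair already separated by more than the margin contributes nothing, which is what maximizes the inter-manifold margin without pushing distant pairs further apart.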
into a training set and a test set containing 80% and 20% of the images, respectively. As for ROI selection, rectangular mass-contained boxes are selected with proportional padding (1.6 times) upon the original ROI bounding boxes. The selected ROIs are augmented with flips and further adversarially augmented by 40% more (20% positive neighbors and 20% negative neighbors).

4.2 Implementation Details

We first solve the proposed adversarial augmentation in (1) and (2) with the derivative-free optimization approach, the RACOS algorithm [18]. The distance measure $d(\cdot)$ in (1) and (2) is set to the angular cosine distance because of its superior discriminative information [14]. Let $\rho = \min_{x_i, x_j \in X_c} d(x_i, x_j)$; then we set the radius parameters $r_1, r_2 = \rho$ and $r_3 = 3\rho$ for $X_c$. Further, $T = 200$ and $\gamma$ is $10^{-2}$.

Secondly, the signed graph is built upon the augmented data $\mathcal{X}$. For each graph node, $n^+$ and $n^-$ in (3) are optimally chosen as 1 and 4, respectively, using grid search. In addition, the metric $\mathrm{dist}(\cdot)$ in (4) is also the angular cosine distance, and $m$ is 1.

Finally, the deep neural network is built with stacks of 3×3 kernel-sized separable convolutional layers. The first three blocks are equipped with increasing numbers of feature maps (128, 256, 728) and decreasing square spatial sizes (224, 112, 56), and the consecutive seven blocks keep the same feature map size of 28. After global averaging and three fully connected layers of 1024 neurons, a softmax layer is appended for label prediction. Dropout layers with a 50% dropout rate and weight decay with $\ell_2$ norm rate $10^{-4}$ are used to prevent over-fitting. Residual skips are added to address gradient divergence and vanishing problems. The regularization parameter $\lambda$ in (6) is optimally chosen as 1.
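The proportional 1.6× ROI padding described above can be sketched as follows. The function name and the (x, y, w, h) box convention are our assumptions; the paper does not specify its exact cropping code.

```python
def padded_roi(x, y, w, h, img_w, img_h, scale=1.6):
    """Expand a mass bounding box proportionally (1.6x in the paper)
    around its center, clipping the result to the image borders."""
    cx, cy = x + w / 2.0, y + h / 2.0          # box center
    nw, nh = w * scale, h * scale              # padded width and height
    x0 = max(0, int(round(cx - nw / 2.0)))
    y0 = max(0, int(round(cy - nh / 2.0)))
    x1 = min(img_w, int(round(cx + nw / 2.0)))
    y1 = min(img_h, int(round(cy + nh / 2.0)))
    return x0, y0, x1, y1

# e.g. a 50x50 box at (100, 100) in a 1000x1000 image
# -> an 80x80 crop centered on the same point
```

Clipping at the borders means masses near the breast edge get slightly less context than interior ones, a common trade-off in ROI extraction.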
4.3 Results and analysis

Adversarial Augmentation: To examine the quality of the images generated by the proposed adversarial augmentation strategy, we carry out the experiment on the INbreast dataset. Fig. 2 visually shows the augmented examples. It can be seen that, for either mass type, the generated positive and negative neighbors are both similar to the original data, but the negative neighbors are more different.

Comparison with the state of the art: We validate DiagNet's performance with accuracy and AUC (area under the ROC curve) scores. Table 1 compares the state-of-the-art algorithms, in which [11] is re-implemented and the results of the remaining ones are taken from the original papers. It shows that DiagNet has achieved the state of the art with a mean accuracy of 93.4% and an AUC score of 0.950. When compared with the second best algorithm [16], DiagNet's AUC score is significantly higher, with experiments on the whole dataset and without any pre-processing, post-processing, or transfer learning. In addition, empirical observations show that our model is robust to noise and geometric transforms; these results are omitted due to space limitations.

Fig. 2: Generated mammogram examples by the proposed adversarial augmentation strategy. (a) Benign Masses. (b) Malignant Masses. The masses in the first row of both (a) and (b) are original data; the second and third rows are generated positive and negative neighbors, respectively.

Table 1: Breast mass diagnosis performance comparison of the proposed DiagNet and related state-of-the-art methods on the INbreast test set.

Methodology                  | End-to-end | Accuracy     | AUC
(2012) Domingues et al. [8]  | no         | 89%          | N/A
(2016) Dhungel et al. [7]    | yes        | 91%          | 0.76
(2017) Zhu et al. [20]       | yes        | 90%          | 0.89
(2018) Shams et al. [16]     | yes        | 93%          | 0.92
(2019) Li et al. [11]        | yes        | 88%          | 0.92
Proposed DiagNet             | yes        | 93.4 ± 1.9%  | 0.950 ± 0.02

Importance of the Signed Graph Laplacian regularizer: Determining the optimal values of hyper-parameters is a big challenge in deep learning. To explore DiagNet's performance with different signed graph configurations, the values of $n^+$ and $n^-$ are first grid searched with fixed regularization parameter $\lambda = 1$, as shown in Fig. 3(a). The best performance occurs when $n^+ = 1$ and $n^- = 4$, which increases the accuracy rate by at least 8% and the AUC score by 12% compared to the baseline (no graph regularization, $n^+ = n^- = 0$). This confirms the effectiveness of the signed graph regularization. In addition, the results show that DiagNet achieves good performance only when both $n^+$ and $n^-$ are considered in the corresponding signed graph construction. Fig. 3(b) shows the performance with various values of $\lambda$, where the best result occurs at $\lambda = 1$.

5 Conclusions

In this paper, we proposed DiagNet for improved mammogram image analysis. By integrating the signed graph regularizer and the adversarial sampling augmentation, DiagNet works in a simple but effective way to learn discriminative features. Extensive experiments show that our method outperforms the state-of-the-art on breast mass diagnosis in mammography.

Fig. 3: Performance of DiagNet on INbreast with varying parameters. Classification accuracy and AUC score versus (a) different configurations of $(n^+, n^-)$ positive and negative neighbors and (b) various values of the regularization parameter $\lambda$.

References

1. Antoniou, A., Storkey, A., Edwards, H.: Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340 (2017)
2. Chen, D., Lv, J., Davies, M.E.: Learning discriminative representation with signed Laplacian restricted Boltzmann machine. arXiv preprint arXiv:1808.09389 (2018)
3. Chen, D., Lv, J., Yi, Z.: Unsupervised multi-manifold clustering by learning deep representation. In: Workshops at the 31st AAAI Conference on Artificial Intelligence (AAAI). pp. 385–391 (2017)
4. Chen, D., Lv, J., Yi, Z.: Graph regularized restricted Boltzmann machine. IEEE Transactions on Neural Networks and Learning Systems 29(6), 2651–2659 (2018)
5. Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1251–1258 (2017)
6. DeSantis, C., Ma, J., Bryan, L., Jemal, A.: Breast cancer statistics, 2013. CA: A Cancer Journal for Clinicians 64(1), 52–62 (2014)
7. Dhungel, N., Carneiro, G., Bradley, A.P.: The automated learning of deep features for breast mass classification from mammograms. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 106–114. Springer (2016)
8. Domingues, I., Sales, E., Cardoso, J., Pereira, W.: INbreast-database masses characterization. XXIII CBEB (2012)
9. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems. pp. 2672–2680 (2014)
10. Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial machine learning at scale (2017)
11. Li, H., Chen, D., Nailon, W.H., Davies, M.E., Laurenson, D.: A deep dual-path network for improved mammogram image processing. In: International Conference on Acoustics, Speech and Signal Processing (2019)
12. Li, H., Chen, D., Nailon, W.H., Davies, M.E., Laurenson, D.: Improved breast mass segmentation in mammograms with conditional residual U-Net. In: Image Analysis for Moving Organ, Breast, and Thoracic Images, pp. 81–89. Springer (2018)
13. Moreira, I.C., Amaral, I., Domingues, I., Cardoso, A., Cardoso, M.J., Cardoso, J.S.: INbreast: toward a full-field digital mammographic database. Academic Radiology 19(2), 236–248 (2012)
14. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10). pp. 807–814 (2010)
15. Seung, H.S., Lee, D.D.: The manifold ways of perception. Science 290(5500), 2268–2269 (2000)
16. Shams, S., Platania, R., Zhang, J., Kim, J., Park, S.J.: Deep generative breast cancer screening and diagnosis. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 859–867. Springer (2018)
17. Wu, E., Wu, K., Cox, D., Lotter, W.: Conditional infilling GANs for data augmentation in mammogram classification. In: Image Analysis for Moving Organ, Breast, and Thoracic Images, pp. 98–106. Springer (2018)
18. Yu, Y., Qian, H., Hu, Y.Q.: Derivative-free optimization via classification. In: Thirtieth AAAI Conference on Artificial Intelligence (2016)
19. Yu, Y., Qu, W.Y., Li, N., Guo, Z.: Open-category classification by adversarial sample generation. In: International Joint Conference on Artificial Intelligence (2017)
20. Zhu, W., Lou, Q., Vang, Y.S., Xie, X.: Deep multi-instance networks with sparse label assignment for whole mammogram classification. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 603–611. Springer (2017)
