Signed Laplacian Deep Learning with Adversarial Augmentation for Improved Mammography Diagnosis


Authors: Heyi Li, Dongdong Chen, William H. Nailon

Signed Laplacian Deep Learning with Adversarial Augmentation for Improved Mammography Diagnosis

Heyi Li¹*, Dongdong Chen¹*, William H. Nailon², Mike E. Davies¹, and David I. Laurenson¹

¹ Institute for Digital Communications, University of Edinburgh, Edinburgh, UK
{Heyi.Li, d.chen, dave.laurenson, mike.davies}@ed.ac.uk
² Oncology Physics Department, Edinburgh Cancer Centre, Western General Hospital, Edinburgh
bill.nailon@luht.scot.nhs.uk

Abstract. Computer-aided breast cancer diagnosis in mammography is limited by inadequate data and the similarity between benign and cancerous masses. To address this, we propose a signed graph regularized deep neural network with adversarial augmentation, named DiagNet. Firstly, we use adversarial learning to generate positive and negative mass-contained mammograms for each mass class. After that, a signed similarity graph is built upon the expanded data to further highlight the discrimination. Finally, a deep convolutional neural network is trained by jointly optimizing the signed graph regularization and classification loss. Experiments show that the DiagNet framework outperforms the state-of-the-art in breast mass diagnosis in mammography.

Keywords: Deep learning · Mammography Diagnosis · Adversarial learning · Graph regularization

1 Introduction

Breast cancer is one of the most frequently diagnosed mortal diseases for women all over the world [6]. Mammography is extensively applied, and computer-aided diagnosis systems (CADs) are often employed as a second reader. Leveraging the recent success of deep neural networks in representation learning, deep learning based CADs [7, 11, 12, 16, 17, 20] outperform traditional methods, which rely heavily on handcrafted features. However, two major challenges in mammographic CADs remain: (1) limited access to well-annotated data [7], and (2) the similarity between benign and cancerous masses.
To alleviate the impact of inadequate data, [7, 11, 12, 20] applied classical geometric transformations for data augmentation (e.g. flips, rotations, random crops, etc.), and more recently, [16, 17] generated synthetic images on the manifold of real mammograms using adversarial learning [9], which enjoys a powerful ability to learn the unknown underlying distribution. Unfortunately, the following questions remain unanswered: What kind of data augmentation is most helpful for CADs in mammography? How can we alleviate the impact of the similarity between data, i.e., how can we maximize the margin between manifolds with a small difference?

In this paper, we propose a new deep learning framework that improves mammography diagnosis as follows. Firstly, we propose an adversarial data augmentation strategy in which both positive and negative samples of specific classes are generated in an unsupervised manner, in order to make the boundaries between different classes more distinct. After that, we build a signed graph Laplacian over the augmented data to quantitatively capture the geometric structure of the data. Finally, we train a deep neural network by jointly optimizing the graph regularization and classification loss, by which the intra-class difference is minimized and, more importantly, the inter-class manifold margin is maximized in the deep representation space. Extensive experiments show that the proposed DiagNet outperforms the state-of-the-art in breast mass diagnosis in mammography.

* These authors contributed equally to this work.

2 Preliminary

2.1 Adversarial learning

Adversarial learning is a technique that attempts to fool models through malicious input [10] and has achieved impressive results in representation learning. The key idea behind the success of a Generative Adversarial Network (GAN) is to force the output of the generator to be indistinguishable from the real input [9].
Adversarial training is particularly powerful for image generation and for learning unknown and complicated distributions from the training data. In this paper, we propose to use adversarial learning to generate off-distribution instances along with on-distribution instances in order to enlarge the medical image data.

2.2 Manifold Learning

In real applications, data typically reside on a low-dimensional manifold embedded in a high-dimensional ambient space [15]. Manifold learning is extensively explored because of its effectiveness in preserving topological locality, which relies on the assumption that neighbors tend to have the same labels [3]. In this paper, we aim to incorporate graph embedding into a deep neural network as a regularizer in the latent space. In addition, preserving the local data manifold structure within the hidden representations of deep neural networks offers the possibility of improving the performance of the classifier [4].

3 Proposed Method

In this section, we formally introduce the details of DiagNet, which is composed of three steps as shown in Fig. 1: (1) adversarial augmentation, (2) a signed graph Laplacian built upon the augmented data, and (3) joint optimization of the classifier loss and signed graph regularizer.

Fig. 1: The proposed DiagNet for Breast Mass Diagnosis. (a) The framework of the proposed algorithm, which consists of three steps. $\{x_1, x_2\}$ and $\{x_3, x_4\}$ are samples on the benign manifold $\mathcal{M}_b$ and the malignant manifold $\mathcal{M}_m$, respectively. In the first step, i.e. adversarial data augmentation, positive neighbors $\{x_5, x_7\}$ and negative neighbors $\{x_6, x_8\}$ are generated with (1) and (2), respectively. Then a signed graph is built upon both original and augmented samples as in (3). Finally, a joint loss (6) is optimized in the deep latent space, maximizing the data manifold margin. (b) The utilized deep network architecture: "DC block" represents a down-sampling convolutional block, "RC block" is a residual convolutional block, and "SConv" denotes separable convolutions.

We first define the notation applied throughout the paper. Let $\{X, Y\} = \{x_i, y_i\}_{i=1}^{n}$ be the $n$ mammograms with corresponding labels, where $x_i \in \mathbb{R}^{H \times W}$ is an image sample and $y_i \in \{y_c\}_{c=1}^{C}$ is the class label. Let $\{X_c, y_c\}$ denote the $c$-th class data.

3.1 Adversarial Augmentation

As mentioned in Section 1, inadequate data and the similarity between benign and cancerous masses [7] are two main causes of high false positives in mammographic CADs. Recently, [1, 16, 17] employed GANs to create new instances. Even though they generated on-distribution samples that are not separable by discriminators, they ignored the importance of distinguishable but similar instances, which tend to improve the discriminative ability. To overcome this shortcoming, as shown in Fig. 1(a), we propose to use adversarial learning to generate more instances of both positive neighbors (i.e. instances on the manifold, e.g. $x_5$ and $x_7$) and negative neighbors (i.e. instances off the manifold, e.g. $x_6$ and $x_8$). Here, two manifolds are defined: $\mathcal{M}_b$ for benign images and $\mathcal{M}_m$ for malignant images.
In particular, inspired by [19], we generate neighboring instances one by one for a certain data class $\{X_c, y_c\}$, $c = 1, 2, \dots, C$, where $C = 2$ in this paper. Specifically, both positive and negative neighbors are generated based on noise-corrupted seed points (a number of randomly selected samples in $X_c$), and they are both close to the original data points. In particular, the positive neighbors $X_c^+$ are the generated samples that cannot be separated from $X_c$ by a discriminator, while the negative neighbors $X_c^-$ are the ones that can be separated. Finally, the expanded dataset for class $c$ is of the form $\mathcal{X}_c = \{X_c \cup X_c^+ \cup X_c^-\}$, and the whole dataset is $\mathcal{X} = \bigcup_c \mathcal{X}_c$.

Let $x$ be a desired new sample and $P(x; X_c, X_c^+)$ be the probability that $x$ is classified as class $c$ by a discriminator trained on $\{X_c, X_c^+\}$. Similarly, $P(x; X_c, X_c^-)$ corresponds to a discriminator trained on $\{X_c, X_c^-\}$. Note that $X_c^+$ and $X_c^-$ are initialized as empty. In this paper, we trained two SVM classifiers as the discriminators, and the corresponding output probability is obtained with a logistic sigmoid of the output signed distance. Accordingly, a set of neighboring instances $\{x_t\}_{t=1}^{T}$ of $X_c$ is iteratively generated. In each iteration $t$, the discriminator is learned and the weights are updated. After $T$ iterations of training, we select one desired positive neighbor $x$:

$$\arg\max_x \; P\big(x; X_c, X_c^+ \cup \{x_t\}_{t=1}^{T}\big) - \gamma \max\Big\{0, \; r_1 - \min_{x_i \in X_c^+} d(x, x_i)\Big\}, \quad (1)$$

where $d(\cdot)$ is a distance measure and $\gamma$ weights the distance regularization, forcing generated points to differ by a minimum distance $r_1$.
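The positive-neighbor selection objective (1) can be sketched as a scoring function. The sketch below is an assumption about how the pieces fit together: it uses scikit-learn's `SVC` as the discriminator (the paper's SVM, with a logistic sigmoid over the signed distance), and the function name `positive_neighbor_score` is ours, not the authors'.

```python
import numpy as np

def cosine_dist(a, b):
    # angular cosine distance, the d(.) used in the paper
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def positive_neighbor_score(x, svm, X_pos, gamma=1e-2, r1=0.1):
    """Objective (1): discriminator probability for candidate x minus a
    hinge penalty that keeps x at least r1 away from existing positive
    neighbors X_pos. A candidate maximizing this score is selected."""
    # probability via logistic sigmoid of the SVM signed distance
    p = 1.0 / (1.0 + np.exp(-svm.decision_function(x[None, :])[0]))
    if len(X_pos) == 0:
        penalty = 0.0  # X_c^+ is initialized empty: no distance term yet
    else:
        dmin = min(cosine_dist(x, xi) for xi in X_pos)
        penalty = max(0.0, r1 - dmin)
    return p - gamma * penalty
```

In the paper this score is maximized over candidates by the derivative-free RACOS algorithm [18] rather than by gradient ascent.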
Similarly, we select one desired negative neighbor $x$, with an added distance restriction to force new points to be scattered close to $X_c$:

$$\arg\min_x \; P\big(x; X_c, X_c^- \cup \{x_t\}_{t=1}^{T}\big) + \gamma \max\Big\{0, \; r_2 - \min_{x_j \in X_c^-} d(x, x_j)\Big\} + \gamma \max\Big\{0, \; \min_{x_i \in X_c} d(x, x_i) - r_3\Big\}, \quad (2)$$

where the distance regularization forces generated points to keep a minimum distance $r_2$ and a maximum distance $r_3$.

3.2 Signed graph Laplacian regularizer

Graph embedding trained with distributional context can boost performance in various pattern recognition tasks. In this paper, we aim to incorporate the signed graph Laplacian regularizer [2] to learn a discriminative datum representation $H(\mathcal{X})$ with a deep neural network, where discriminative here means that the intra-class data manifold structure is preserved in the latent space and the inter-manifold (slightly different) margins are maximized.

Using the supervision of the adversarial augmentation in Section 3.1, we build a signed graph upon the expanded data $\mathcal{X}$. Given $\mathcal{X}_c = \{X_c, X_c^+, X_c^-\}$ for class $c$, and all other classes' data $\mathcal{X}_{-c} = \bigcup_{t=1,\dots,C;\, t \neq c} \{X_t, X_t^+, X_t^-\}$, for $\forall x_i \in X_c$ the corresponding elements of the signed graph are built as follows:

$$\phi_{ij} = \begin{cases} +1, & x_j \in \{X_c \cup X_c^+\}_{n_i^+}, \\ -1, & x_j \in \{\mathcal{X}_{-c} \cup X_c^-\}_{n_i^-}, \end{cases} \quad (3)$$

where $\{\cdot\}_{n_i^+}$ ($\{\cdot\}_{n_i^-}$) denotes the corresponding $n^+$ ($n^-$) nearest neighborhood of $x_i$, approximating the locality of the manifold. Then, we compute the structure preservation in the deep representation space (directly behind the softmax layer as shown in Fig. 1(b)) $H = \{h(x_i)\}_{i=1}^{N}$, where $N = |\mathcal{X}|$.
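The construction in (3) can be sketched in a few lines. This is a minimal sketch under our own assumptions: `feats` stands in for the samples, `same_side[i, j]` is a boolean matrix marking whether $x_j$ belongs to $\{X_c \cup X_c^+\}$ for the class of $x_i$, and the angular cosine distance matches the paper's choice of metric.

```python
import numpy as np

def build_signed_graph(feats, same_side, n_pos=1, n_neg=4):
    """Sketch of eq. (3): phi_ij = +1 for the n_pos nearest samples on the
    same side (class c and its positive neighbors), -1 for the n_neg
    nearest samples on the opposite side (other classes and negative
    neighbors). Defaults match the paper's grid-searched n+ = 1, n- = 4."""
    X = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-12)
    D = 1.0 - X @ X.T            # pairwise angular cosine distance
    np.fill_diagonal(D, np.inf)  # a node is not its own neighbor
    n = len(feats)
    Phi = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(D[i])
        same = [j for j in order if same_side[i, j]][:n_pos]
        diff = [j for j in order if not same_side[i, j]][:n_neg]
        Phi[i, same] = +1.0
        Phi[i, diff] = -1.0
    return Phi
```

Each row of the resulting $\Phi$ then carries exactly $n^+$ attractive and $n^-$ repulsive edges, which is the supervision the regularizer (4) consumes.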
The signed graph Laplacian regularizer is defined as follows:

$$J_g(\mathcal{X}, \Phi) = \sum_{i,j} \begin{cases} \phi_{ij} \cdot \mathrm{dist}\big(h(x_i), h(x_j)\big), & \text{if } \phi_{ij} > 0, \\ \max\Big\{0, \; m + \phi_{ij} \cdot \mathrm{dist}\big(h(x_i), h(x_j)\big)\Big\}, & \text{if } \phi_{ij} < 0, \end{cases} \quad (4)$$

where $\mathrm{dist}(\cdot)$ is a distance metric for the dissimilarity between $h(x_i)$ and $h(x_j)$. It encourages similar examples to be close, and dissimilar ones to keep a distance of at least $m$ from each other, where $m > 0$ is a margin.

Note that instead of calculating the manifold embedding by solving an eigenvalue decomposition, we learn the embedding $H$ with a deep neural network. Specifically, inspired by the depth-wise separable convolutions [5] that are extensively employed to learn mappings with a series of factoring filters, we build stacks of depth-wise separable convolutions with a topological architecture similar to that in [5] to learn such deep representations (Fig. 1(b)). Therefore, by minimizing (4), it is expected that if two connected nodes $x_i$ and $x_j$ are from the same class (i.e. $\phi_{ij}$ is positive), $h(x_i)$ and $h(x_j)$ are also close to each other, and vice versa.

Benefiting from such learned discriminativity, we train a simple softmax classifier to predict the class label, i.e.,

$$J_l = -\frac{1}{N} \sum_{i=1}^{N} \sum_{c=1}^{C} \delta_c(y_i) \log P\big(y_i \,|\, x_i; \theta\big), \quad (5)$$

where $\delta_c(y_i) = 1$ when $y_i = c$, and $0$ otherwise; $\theta$ is the parameter set of the neural network.

Finally, by incorporating the signed Laplacian regularizer (4) and the classification loss (5), the total objective of DiagNet is defined as:

$$J = J_l + \lambda J_g, \quad (6)$$

where $\lambda \geq 0$ is the regularization trade-off parameter, which controls the smoothness of the hidden representations.

4 Experiments

4.1 Datasets and ROIs selection

The DiagNet is evaluated on the most frequently used full-field digital mammographic dataset, INbreast [13]. 107 mass-contained mammograms are divided
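Losses (4), (5) and (6) can be written down directly. The sketch below is a NumPy illustration under our own assumptions (the paper trains these inside a deep network with backpropagation; here `H` is a matrix of latent representations, `Phi` the signed graph, and `probs` the softmax outputs), using the paper's angular cosine distance for $\mathrm{dist}(\cdot)$.

```python
import numpy as np

def signed_graph_loss(H, Phi, m=1.0):
    """Eq. (4): positive pairs (phi_ij > 0) are pulled together; negative
    pairs (phi_ij < 0) are pushed to at least margin m apart."""
    Xn = H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-12)
    D = 1.0 - Xn @ Xn.T                    # dist(h(x_i), h(x_j))
    pos, neg = Phi > 0, Phi < 0
    attract = (Phi * D)[pos].sum()         # phi_ij * dist for phi_ij > 0
    repel = np.maximum(0.0, m + (Phi * D)[neg]).sum()
    return attract + repel

def total_loss(H, Phi, probs, labels, lam=1.0):
    """Eq. (6): cross-entropy (5) plus lambda times the regularizer (4)."""
    n = len(labels)
    ce = -np.log(probs[np.arange(n), labels] + 1e-12).mean()
    return ce + lam * signed_graph_loss(H, Phi)
```

With $\phi_{ij} = -1$ the repulsion term reduces to $\max\{0, m - \mathrm{dist}\}$, so a negative pair already separated by more than the margin contributes nothing, which is what maximizes the inter-manifold margin without pushing distant pairs further apart.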
into a training set and a test set containing 80% and 20% of the images, respectively. As for ROI selection, rectangular mass-contained boxes are selected with proportional padding (1.6 times) upon the original ROI bounding boxes. The selected ROIs are augmented with flips and further adversarially augmented by 40% more (20% positive neighbors and 20% negative neighbors).

4.2 Implementation Details

We first solve the proposed adversarial augmentation in (1) and (2) with the derivative-free optimization approach, the RACOS algorithm [18]. The distance measure $d(\cdot)$ in (1) and (2) is set to the angular cosine distance because of its superior discriminative information [14]. Let $\rho = \min_{x_i, x_j \in X_c} d(x_i, x_j)$; then we set the radius parameters $r_1, r_2 = \rho$ and $r_3 = 3\rho$ for $X_c$. Further, $T = 200$ and $\gamma$ is $10^{-2}$.

Secondly, the signed graph is built upon the augmented data $\mathcal{X}$. For each graph node, $n^+$ and $n^-$ in (3) are optimally chosen as 1 and 4, respectively, using grid search. In addition, the metric $\mathrm{dist}(\cdot)$ in (4) is also the angular cosine distance, and $m$ is 1.

Finally, the deep neural network is built with stacks of 3×3 kernel-sized separable convolutional layers. The first three blocks are equipped with increasing numbers of feature maps (128, 256, 728) and decreasing square spatial sizes (224, 112, 56), and the consecutive seven blocks keep the same feature map size of 28. After global averaging and three fully connected layers of 1024 neurons, a softmax layer is appended for label prediction. Dropout layers with a 50% dropout rate and weight decay with $\ell_2$ norm rate $10^{-4}$ are used to prevent over-fitting. Residual skips are added to address gradient divergence and vanishing problems. The regularization parameter $\lambda$ in (6) is optimally chosen as 1.
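The proportional 1.6× ROI padding described above can be sketched as follows. The function name and the (x, y, w, h) box convention are our assumptions; the paper does not specify its exact cropping code.

```python
def padded_roi(x, y, w, h, img_w, img_h, scale=1.6):
    """Expand a mass bounding box proportionally (1.6x in the paper)
    around its center, clipping the result to the image borders."""
    cx, cy = x + w / 2.0, y + h / 2.0          # box center
    nw, nh = w * scale, h * scale              # padded width and height
    x0 = max(0, int(round(cx - nw / 2.0)))
    y0 = max(0, int(round(cy - nh / 2.0)))
    x1 = min(img_w, int(round(cx + nw / 2.0)))
    y1 = min(img_h, int(round(cy + nh / 2.0)))
    return x0, y0, x1, y1

# e.g. a 50x50 box at (100, 100) in a 1000x1000 image
# -> an 80x80 crop centered on the same point
```

Clipping at the borders means masses near the breast edge get slightly less context than interior ones, a common trade-off in ROI extraction.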
4.3 Results and analysis

Adversarial Augmentation: To examine the quality of the images generated by the proposed adversarial augmentation strategy, we carry out the experiment on the INbreast dataset. Fig. 2 visually shows the augmented examples. It can be seen that, for either mass type, the generated positive and negative neighbors are both similar to the original data, but the negative neighbors are more different.

Comparison with the state of the art: We validate DiagNet's performance with accuracy and AUC (area under the ROC curve) scores. Table 1 compares the state-of-the-art algorithms, in which [11] is re-implemented and the results of the remaining ones are taken from the original papers. It shows that DiagNet has achieved the state of the art with a mean accuracy of 93.4% and an AUC score of 0.950. When compared with the second best algorithm [16], DiagNet's AUC score is significantly higher, with experiments on the whole dataset and without any pre-processing, post-processing, or transfer learning. In addition, empirical observations show that our model is robust to noise and geometric transforms; these results are omitted due to space limitations.

Fig. 2: Generated mammogram examples by the proposed adversarial augmentation strategy. (a) Benign Masses. (b) Malignant Masses. The masses in the first row of both (a) and (b) are original data; the second and third rows are generated positive and negative neighbors, respectively.

Table 1: Breast mass diagnosis performance comparison of the proposed DiagNet and related state-of-the-art methods on the INbreast test set.

Methodology                  | End-to-end | Accuracy     | AUC
(2012) Domingues et al. [8]  | no         | 89%          | N/A
(2016) Dhungel et al. [7]    | yes        | 91%          | 0.76
(2017) Zhu et al. [20]       | yes        | 90%          | 0.89
(2018) Shams et al. [16]     | yes        | 93%          | 0.92
(2019) Li et al. [11]        | yes        | 88%          | 0.92
Proposed DiagNet             | yes        | 93.4 ± 1.9%  | 0.950 ± 0.02

Importance of the Signed Graph Laplacian regularizer: Determining the optimal values of hyper-parameters is a big challenge in deep learning. To explore DiagNet's performance with different signed graph configurations, the values of $n^+$ and $n^-$ are first grid searched with fixed regularization parameter $\lambda = 1$, as shown in Fig. 3(a). The best performance occurs when $n^+ = 1$ and $n^- = 4$, which increases the accuracy rate by at least 8% and the AUC score by 12% compared to the baseline (no graph regularization, $n^+ = n^- = 0$). This confirms the effectiveness of the signed graph regularization. In addition, the results show that DiagNet achieves good performance only when both $n^+$ and $n^-$ are considered in the corresponding signed graph construction. Fig. 3(b) shows the performance with various values of $\lambda$, where the best result occurs at $\lambda = 1$.

5 Conclusions

In this paper, we proposed DiagNet for improved mammogram image analysis. By integrating the signed graph regularizer and the adversarial sampling augmentation, DiagNet works in a simple but effective way to learn discriminative features. Extensive experiments show that our method outperforms the state-of-the-art on breast mass diagnosis in mammography.

Fig. 3: Performance of DiagNet on INbreast with varying parameters. Classification accuracy and AUC score versus (a) different configurations of $(n^+, n^-)$ positive and negative neighbors and (b) various values of the regularization parameter $\lambda$.

References

1. Antoniou, A., Storkey, A., Edwards, H.: Data augmentation generative adversarial networks. arXiv preprint arXiv:1711.04340 (2017)
2. Chen, D., Lv, J., Davies, M.E.: Learning discriminative representation with signed Laplacian restricted Boltzmann machine. arXiv preprint arXiv:1808.09389 (2018)
3. Chen, D., Lv, J., Yi, Z.: Unsupervised multi-manifold clustering by learning deep representation. In: Workshops at the 31st AAAI Conference on Artificial Intelligence (AAAI). pp. 385–391 (2017)
4. Chen, D., Lv, J., Yi, Z.: Graph regularized restricted Boltzmann machine. IEEE Transactions on Neural Networks and Learning Systems 29(6), 2651–2659 (2018)
5. Chollet, F.: Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 1251–1258 (2017)
6. DeSantis, C., Ma, J., Bryan, L., Jemal, A.: Breast cancer statistics, 2013. CA: A Cancer Journal for Clinicians 64(1), 52–62 (2014)
7. Dhungel, N., Carneiro, G., Bradley, A.P.: The automated learning of deep features for breast mass classification from mammograms. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 106–114. Springer (2016)
8. Domingues, I., Sales, E., Cardoso, J., Pereira, W.: INbreast-database masses characterization. XXIII CBEB (2012)
9. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial nets. In: Advances in Neural Information Processing Systems. pp. 2672–2680 (2014)
10. Kurakin, A., Goodfellow, I.J., Bengio, S.: Adversarial machine learning at scale (2017)
11. Li, H., Chen, D., Nailon, W.H., Davies, M.E., Laurenson, D.: A deep dual-path network for improved mammogram image processing. In: International Conference on Acoustics, Speech and Signal Processing (2019)
12. Li, H., Chen, D., Nailon, W.H., Davies, M.E., Laurenson, D.: Improved breast mass segmentation in mammograms with conditional residual U-Net. In: Image Analysis for Moving Organ, Breast, and Thoracic Images, pp. 81–89. Springer (2018)
13. Moreira, I.C., Amaral, I., Domingues, I., Cardoso, A., Cardoso, M.J., Cardoso, J.S.: INbreast: toward a full-field digital mammographic database. Academic Radiology 19(2), 236–248 (2012)
14. Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the 27th International Conference on Machine Learning (ICML-10). pp. 807–814 (2010)
15. Seung, H.S., Lee, D.D.: The manifold ways of perception. Science 290(5500), 2268–2269 (2000)
16. Shams, S., Platania, R., Zhang, J., Kim, J., Park, S.J.: Deep generative breast cancer screening and diagnosis. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 859–867. Springer (2018)
17. Wu, E., Wu, K., Cox, D., Lotter, W.: Conditional infilling GANs for data augmentation in mammogram classification. In: Image Analysis for Moving Organ, Breast, and Thoracic Images, pp. 98–106. Springer (2018)
18. Yu, Y., Qian, H., Hu, Y.Q.: Derivative-free optimization via classification. In: Thirtieth AAAI Conference on Artificial Intelligence (2016)
19. Yu, Y., Qu, W.Y., Li, N., Guo, Z.: Open-category classification by adversarial sample generation. In: International Joint Conference on Artificial Intelligence (2017)
20. Zhu, W., Lou, Q., Vang, Y.S., Xie, X.: Deep multi-instance networks with sparse label assignment for whole mammogram classification. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 603–611. Springer (2017)
