Collaborative Multi-agent Learning for MR Knee Articular Cartilage Segmentation

Chaowei Tan¹, Zhennan Yan², Shaoting Zhang², Kang Li¹,³, and Dimitris N. Metaxas¹

¹ Department of Computer Science, Rutgers University, Piscataway, USA
² SenseTime Research
³ Department of Orthopaedics, New Jersey Medical School, Rutgers University, Newark, USA

Abstract. The 3D morphology and quantitative assessment of the knee articular cartilages (i.e., femoral, tibial, and patellar cartilage) in magnetic resonance (MR) imaging is of great importance for knee radiographic osteoarthritis (OA) diagnostic decision making. However, effective and efficient delineation of all the knee articular cartilages in large-sized, high-resolution 3D MR knee data remains an open challenge. In this paper, we propose a novel framework for the MR knee cartilage segmentation task. The key contribution is an adversarial-learning-based collaborative multi-agent segmentation network. In the proposed network, three parallel segmentation agents label the cartilages in their respective regions of interest (ROIs), and the three cartilages are then fused by a novel ROI-fusion layer. The collaborative learning is driven by an adversarial sub-network. The ROI-fusion layer not only fuses the individual cartilages from the multiple agents, but also backpropagates the training loss from the adversarial sub-network to each agent, enabling joint learning of shape and spatial constraints. Extensive evaluations are conducted on a dataset of hundreds of MR knee volumes from diverse populations, and the proposed method shows superior performance.

Keywords: Collaborative multi-agent learning · Cartilage segmentation

1 Introduction

Osteoarthritis (OA) is the most common chronic health problem of human joints, and the knee has the highest risk of developing OA over a human lifetime. The knee articular cartilages (i.e., femoral, tibial, and patellar cartilage) are essential tissues for knee radiographic OA diagnosis. Eckstein et al. [2] indicated that cartilage morphology outcomes (e.g., cartilage thickness and surface area) measured from 3D magnetic resonance (MR) data of the knee joint can help identify the symptomatic and structural severity of knee OA. Hunter et al. [4] investigated knee cartilage defects/losses in MR imaging as one important factor of knee OA. In order to capture the wide extent and thin structure of the cartilages in detail, MR data are usually large (millions of voxels) and of high resolution. Fig. 1 exhibits a 3D MR knee image from the Osteoarthritis Initiative (OAI) database (http://www.oai.ucsf.edu/), which has high resolution (0.365 mm × 0.365 mm × 0.7 mm) and large size (384 × 384 × 160).

Fig. 1. (a) and (b) show the coronal and sagittal slices of a 3D MR knee image. The red, green and blue contours indicate the femoral cartilage (FC), tibial cartilage (TC) and patellar cartilage (PC), respectively. (c) demonstrates the cartilage labels in 3D.

Effective and efficient segmentation of all articular cartilages in such high-resolution, large-sized data is challenging. Furthermore, the radiographic appearance of the cartilages may vary considerably across individuals of different age and pathology. Although off-the-shelf deep learning methods such as VNet [6] have shown superior performance in many segmentation tasks, simply applying VNet to the MR knee data may yield low accuracy and cause training to crash due to huge GPU memory consumption.
Besides, the multi-cartilage labeling task suffers from a severe class imbalance problem. Xu et al. [10] presented a contextual additive network focusing on improving memory efficiency for cartilage segmentation. The approach is based on small overlapping patches (a patch may capture only part of the target), which may sacrifice some accuracy. Some previous methods [3,7] present multi-task networks that introduce distinctive organ boundary features to improve accuracy. However, cartilage is a very thin tissue and its topology may change under degenerative conditions. Xu et al. [9] segmented thin objects in 2D images through a myocardial infarction segmentation task, yet this 2D task-specific strategy may still suffer from the memory issue when applied to 3D knee data.

Fig. 2. Flow chart of the collaborative multi-agent learning for cartilage segmentation.

In this paper, we propose a novel segmentation framework with collaborative multi-agent learning (shown in Fig. 2) for the task of knee cartilage labeling in large-sized, high-resolution 3D MR data. Through region-of-interest (ROI) extraction, three high-resolution cartilage ROIs are fed into different segmentation agents. The multiple agents collaborate with the help of a discriminator and produce the cartilage labels at the end. The ROI-fusion layer not only fuses the individual cartilages from the multiple agents for the discriminator, but also backpropagates the training errors from the adversarial sub-network to each agent to enable joint learning of shape and spatial constraints. Such a collaborative multi-agent framework can obtain fine-grained segmentation in each ROI and enforce the spatial constraints between the different cartilages. It satisfies the limits of GPU resources and enables smooth training on this challenging data. The experimental results show that the proposed method can extract all cartilages accurately.

2 Methods

The overview of the proposed framework is shown in Fig. 2. The coarse cartilage segmentor and the ROI extraction step aim to efficiently localize and extract the three local regions of FC, TC and PC, and feed the ROIs to the respective segmentation agents. The blue dashed box shows the collaborative multi-agent cartilage segmentation module, which consists of three segmentation agents, one ROI-fusion layer F, and one joint-label discriminator.

Fig. 3. Overview of the multi-cartilage ROI extraction (only the sagittal view is shown). The number of feature maps in the network is displayed under each block.

ROI extraction. In order to initialize the collaborative multi-agent learning, we first extract the ROIs of the three cartilages. As shown in Fig. 3, by utilizing the location information of the multi-cartilage masks from the coarse segmentor, the image and label ROIs of FC, TC and PC are extracted from the original data. The segmentor has a VNet-like [6] encoding-decoding structure. The encoding part contains 3 down-samplings (by convolutions of filter size 2 and stride 2) to obtain 3 different scales of feature maps. The decoding part has 3 up-samplings (by deconvolutions of filter size 2 and stride 2) to restore the feature maps to the original input size. The blue block in Fig. 3 represents a residual block followed by one of the down-sampling or up-sampling layers mentioned above when the resolution changes. All the convolutional layers in the residual blocks have filter size 3, stride 1 and zero-padding 1. PReLU activation and batch normalization follow the convolutional and deconvolutional layers. The coarse cartilage segmentor is trained with a multi-class cross-entropy loss ℓ_mce to obtain cartilage masks from the down-sampled MR data (e.g., 192 × 192 × 160).
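The paper does not include reference code; the snippet below is a minimal NumPy sketch of how such an ROI-extraction step could look. The 1/2/3 label convention for FC/TC/PC, the function name, the example ROI size and the assumption that array axes follow the 384 × 384 × 160 ordering quoted above are all illustrative, not details from the original implementation. The recorded ROI origin is kept because the ROI-fusion layer described later needs it to paste each agent's output back into the original space.

```python
# Illustrative sketch of the ROI-extraction step (label convention and names
# are assumptions, not from the paper).
import numpy as np

def extract_roi(image, label, coarse_mask, class_id, roi_size, scale):
    """Crop a fixed-size ROI around one cartilage from the full-resolution data.

    image, label : full-resolution volumes (e.g., 384 x 384 x 160)
    coarse_mask  : multi-class prediction from the coarse segmentor (down-sampled grid)
    class_id     : 1 = FC, 2 = TC, 3 = PC (assumed convention)
    roi_size     : pre-defined fixed ROI size for this cartilage type
    scale        : per-axis down-sampling factor of the coarse stage
    """
    coords = np.argwhere(coarse_mask == class_id)             # voxels of this cartilage
    center = coords.mean(axis=0) * np.asarray(scale)           # centroid in full-resolution coordinates
    start = np.round(center - np.asarray(roi_size) / 2).astype(int)
    start = np.clip(start, 0, np.asarray(image.shape) - roi_size)   # keep the ROI inside the volume
    sl = tuple(slice(s, s + r) for s, r in zip(start, roi_size))
    return image[sl], label[sl], start                         # `start` is reused by the ROI-fusion layer

# Example (hypothetical ROI size) for the femoral cartilage, with the coarse
# stage run on a 192 x 192 x 160 grid:
# fc_img, fc_lbl, fc_origin = extract_roi(img, lbl, coarse, class_id=1,
#                                         roi_size=(192, 192, 80), scale=(2, 2, 1))
```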
Fig. 4. Demonstration of the collaborative multi-agent learning framework for fine-grained cartilage segmentation. The agents yield binary labels and the spatial fusion operation outputs a 4-channel result (FC, TC, PC and background).

Collaborative multi-agent learning. In this learning stage (shown in Fig. 4), we construct one large network from three individual segmentation agents, one ROI-fusion layer, and one adversarial sub-network. Each segmentation agent A_c, c ∈ {f, t, p} (where f, t and p stand for FC, TC and PC, respectively) aims to generate a fine cartilage binary mask A_c(x_{i,c}) in its respective ROI x_{i,c} (the corresponding ground-truth (GT) ROI is y_{i,c}, and i is the data index). Each ROI is small enough to cover only one cartilage. Since the large portion of background and the other cartilages are excluded, the class imbalance problem is relieved significantly. The small ROIs also reduce the requirement for computational resources (i.e., GPU memory) and enable fine-grained segmentation of the high-resolution data. All the segmentation agents have a VNet-like structure similar to the coarse segmentor. To balance the receptive field of the neurons against the GPU memory consumption, we further reduce the number of down- and up-sampling operations to 2. Considering the thin structure and unclear boundaries of cartilage, we need to better utilize multi-resolution contextual features to capture its fine details. In VNet, a skip connection merges the up-sampled high-level features I_h^up in the decoding path with the equivalent-resolution low-level features I_l in the symmetric encoding path by simple concatenation. Here, we apply an attention mechanism [5] to extend the skip connections. Formally, the connecting operation becomes o(α ⊙ I_l, I_h^up), where o denotes concatenation along the channel dimension and ⊙ is element-wise multiplication. The attention mask α = m(σ_r(c_l(I_l) + c_h(I_h^up))) serves as a weight map that guides the learning to focus on the desired region. Here, c_h and c_l are two convolutions of filter size 1 and stride 1; σ_r is an activation function (e.g., ReLU); and m is another convolution of filter size 1 and stride 1 followed by a sigmoid that contracts the features to a single-channel mask. The light blue block in Fig. 4 represents this attention-based concatenation.
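For illustration, the attention-extended skip connection o(α ⊙ I_l, I_h^up) can be written as a small PyTorch module. The intermediate channel count and the class name are assumptions, since the paper only specifies the 1×1×1 convolutions c_l, c_h and m, a ReLU and a sigmoid; this is a sketch, not the authors' implementation.

```python
# Minimal sketch of the attention-based skip connection described above.
import torch
import torch.nn as nn

class AttentionSkip(nn.Module):
    """o(alpha * I_l, I_h_up), with alpha = m(ReLU(c_l(I_l) + c_h(I_h_up)))."""

    def __init__(self, low_ch, high_ch, inter_ch):
        super().__init__()
        self.c_l = nn.Conv3d(low_ch, inter_ch, kernel_size=1, stride=1)   # on low-level features I_l
        self.c_h = nn.Conv3d(high_ch, inter_ch, kernel_size=1, stride=1)  # on up-sampled high-level features
        self.m = nn.Conv3d(inter_ch, 1, kernel_size=1, stride=1)          # contracts to a single-channel mask
        self.relu = nn.ReLU(inplace=True)

    def forward(self, low, high_up):
        # alpha acts as a weight map that re-weights the low-level features
        alpha = torch.sigmoid(self.m(self.relu(self.c_l(low) + self.c_h(high_up))))
        return torch.cat([alpha * low, high_up], dim=1)                    # concatenation along channels

# Usage with hypothetical channel counts:
# skip = AttentionSkip(low_ch=32, high_ch=64, inter_ch=32)
# merged = skip(encoder_features, upsampled_decoder_features)
```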
Although each individual agent can obtain a fine segmentation in its own ROI, such individual learning loses the mutual constraints between the cartilages. In order to make the agents collaborate and exploit the mutual position and shape priors of all the cartilages for better delineation, we propose a collaborative learning strategy. This strategy utilizes a ROI-fusion layer F to restore the single-cartilage output of each agent back to the original knee-joint space, where the mutual constraints and priors can be encoded. F(A_f, A_t, A_p) is implemented by using the location information of the three input ROIs to fuse the fine cartilage masks back into the original space. The multi-cartilage priors are then learned implicitly by an adversarial learning strategy. We utilize a discriminator sub-network D to classify the fused multi-cartilage mask as "fake" and the whole GT label y_i as "real". In adversarial learning, the agents and the discriminator are trained alternately: the parameters of the agents are fixed when training the discriminator, and vice versa. In this way, the discriminator sub-network can learn joint priors over the multiple cartilages and guide the agents to produce better segmentations. It is important to note that the layer F not only fuses the ROIs by their coordinates, but also passes the gradient updates from the discriminator to the agents during backpropagation, so that the two parts can be optimized in this alternating fashion. Since it is not intuitive to judge labels without seeing the input image, we borrow the idea of conditional generative adversarial nets and treat the input MR knee image x_i as the conditioning variable. Fig. 4 shows that the discriminator sub-network consists of 4 down-sampling convolutional layers, with the same residual block as in the agents employed at each resolution level for contextual feature learning. The input to the discriminator is a pair of an MR knee image x_i and a multi-label cartilage mask (either the GT label y_i or F(A_f, A_t, A_p)). A global average layer at the end generates a probability value for fake/real mask discrimination.

The loss functions of the discriminator and the agents are defined in Eq. (1) and Eq. (2), where ℓ_b denotes the binary cross-entropy loss. In Eq. (2), the first term L_s = ℓ_b[A_c(x_{i,c}), y_{i,c}] trains each single segmentation agent, while the second term L_m = ℓ_mce[F(A_f, A_t, A_p), y_i] and the third term are applied to the fused multi-cartilage mask for joint-label learning. The discriminator D and the segmentation agents A_c, c ∈ {f, t, p}, are trained alternately by minimizing Eq. (1) and Eq. (2), respectively:

\sum_i \left\{ \ell_b\left[D(x_i, y_i), 1\right] + \ell_b\left[D(x_i, F(A_f, A_t, A_p)), 0\right] \right\}    (1)

\sum_i \left( \sum_{c=\{f,t,p\}} L_s(x_{i,c}, y_{i,c}) + L_m + \ell_b\left[D(x_i, F(A_f, A_t, A_p)), 1\right] \right)    (2)
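The following is a hedged PyTorch sketch of the ROI-fusion layer F and the two losses of Eq. (1) and Eq. (2). The 4-channel fusion convention (background, FC, TC, PC), the use of soft agent outputs so that gradients can flow through the fusion, the assumption that D(x, mask) returns a probability, and all function names are illustrative rather than taken from the paper's implementation.

```python
# Sketch of the fusion operator F(A_f, A_t, A_p) and the alternating losses.
import torch
import torch.nn.functional as Fn   # "Fn" avoids clashing with the fusion layer F

def roi_fusion(roi_probs, roi_origins, full_shape):
    """Paste each agent's foreground probabilities back into the original
    knee-joint space as a (B, 4, D, H, W) soft multi-label map."""
    B = roi_probs[0].shape[0]
    fused = torch.zeros((B, 4, *full_shape), device=roi_probs[0].device)
    for ch, (p, o) in enumerate(zip(roi_probs, roi_origins), start=1):
        d, h, w = p.shape[2:]                                   # ROI size of this cartilage
        fused[:, ch, o[0]:o[0] + d, o[1]:o[1] + h, o[2]:o[2] + w] = p[:, 0]
    fused[:, 0] = (1 - fused[:, 1:].sum(dim=1)).clamp(min=0)    # background channel
    return fused                                                # gradients flow back to each agent

def discriminator_loss(D, x, y_onehot, fused):                  # Eq. (1): agents are frozen
    real = D(x, y_onehot)
    fake = D(x, fused.detach())
    return Fn.binary_cross_entropy(real, torch.ones_like(real)) + \
           Fn.binary_cross_entropy(fake, torch.zeros_like(fake))

def agents_loss(D, x, fused, roi_probs, roi_gts, y_index):      # Eq. (2): D is frozen
    # y_index: integer GT label volume of shape (B, D, H, W)
    L_s = sum(Fn.binary_cross_entropy(p, g) for p, g in zip(roi_probs, roi_gts))
    L_m = Fn.nll_loss(torch.log(fused.clamp(min=1e-6)), y_index)   # multi-class CE on the fused map
    adv = D(x, fused)
    return L_s + L_m + Fn.binary_cross_entropy(adv, torch.ones_like(adv))
```

A training loop would alternately minimize discriminator_loss and agents_loss, matching the alternating optimization described above.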
3 Experiments

Experimental settings. We validate the proposed method on the iMorphics dataset from the OAI database. This set includes 176 3D MR (sagittal DESS sequences) knee images, split into 120 for training, 26 for validation and 30 for testing. Patients are assigned randomly and exclusively to the three subsets. A fixed ROI size for each type of cartilage is pre-defined based on adequate evaluation on the training data. We compare the proposed method with the state-of-the-art dense atrous spatial pyramid pooling (DenseASPP) for semantic segmentation [11], which integrates the ASPP architecture in a densely connected manner and is able to generate large receptive fields and multi-scale features for segmentation tasks. We also evaluate the performance of the proposed coarse segmentor and of the individual agents to show the effectiveness of the collaborative learning. The Dice similarity coefficient (DSC), volumetric overlap error (VOE) and average surface distance (ASD) between the GT labels and the segmented results are reported. In training (no pre-trained weights are used), we set the batch size to 1 and multiply the learning rate (LR) by a factor of 0.95 every 10 epochs. The Adam solver (initial LR 0.001) is used for each agent and stochastic gradient descent (SGD, initial LR 0.0002) for the discriminator. All the networks are trained and tested on a 12 GB Titan X GPU.

Table 1. Quantitative comparisons of approaches: mean ± std of the evaluation metrics (DSC; VOE in %; ASD in mm) for femoral cartilage (FC), tibial cartilage (TC), patellar cartilage (PC) and all cartilages.

Method  Cartilage  DSC            VOE            ASD
D1      FC         0.862 ± 0.024  24.15 ± 3.621  0.103 ± 0.042
D1      TC         0.869 ± 0.034  22.93 ± 5.184  0.104 ± 0.061
D1      PC         0.844 ± 0.052  26.65 ± 7.429  0.107 ± 0.049
D1      All        0.866 ± 0.023  23.59 ± 3.475  0.095 ± 0.026
D2      FC         0.832 ± 0.025  28.64 ± 3.618  0.131 ± 0.059
D2      TC         0.879 ± 0.038  21.38 ± 5.972  0.088 ± 0.055
D2      PC         0.861 ± 0.040  23.69 ± 6.027  0.091 ± 0.051
D2      All        0.851 ± 0.023  25.94 ± 3.393  0.111 ± 0.036
C0      FC         0.814 ± 0.029  31.30 ± 4.155  0.205 ± 0.095
C0      TC         0.806 ± 0.033  32.42 ± 4.577  0.199 ± 0.055
C0      PC         0.771 ± 0.132  35.74 ± 14.56  0.350 ± 0.129
C0      All        0.809 ± 0.031  31.99 ± 4.350  0.213 ± 0.095
P1      FC         0.868 ± 0.023  23.19 ± 3.514  0.108 ± 0.067
P1      TC         0.854 ± 0.029  25.17 ± 4.173  0.126 ± 0.059
P1      PC         0.824 ± 0.104  28.78 ± 12.45  0.201 ± 0.439
P1      All        0.862 ± 0.023  24.24 ± 3.457  0.110 ± 0.048
P2      FC         0.900 ± 0.037  18.82 ± 6.006  0.074 ± 0.041
P2      TC         0.889 ± 0.038  19.81 ± 6.072  0.082 ± 0.051
P2      PC         0.880 ± 0.043  21.19 ± 6.594  0.075 ± 0.038
P2      All        0.893 ± 0.034  19.19 ± 5.434  0.073 ± 0.034

Experimental results. Quantitative comparisons are shown in Table 1. C0 represents the coarse cartilage extraction by the segmentor in Fig. 3. P1 denotes the fused results generated by the proposed segmentation agents without the joint learning by the adversarial sub-network. P2 represents the results of the proposed method employing the full collaborative multi-agent learning framework of Fig. 4. For comparison, we integrate two variants of DenseASPP into the collaborative multi-agent framework: in the first variant D1, the residual blocks and skip connections are replaced by DenseASPP blocks in the two down-sampled levels of the agent network, while in the second variant D2, only the deepest level is replaced with a DenseASPP block.
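As a reference for the metrics reported in Table 1, a minimal NumPy/SciPy sketch of DSC, VOE and ASD is given below. The binary-mask inputs, the symmetric ASD definition and the default voxel spacing (matching the OAI data described in Section 1) are assumptions; this is an illustration, not the paper's evaluation code.

```python
# Illustrative metric computations for a single binary cartilage mask pair.
import numpy as np
from scipy import ndimage

def dsc(pred, gt):
    """Dice similarity coefficient."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

def voe(pred, gt):
    """Volumetric overlap error, in percent."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return 100.0 * (1.0 - inter / union)

def asd(pred, gt, spacing=(0.365, 0.365, 0.7)):
    """Average symmetric surface distance, in mm (spacing order is an assumption)."""
    def surface(mask):
        return np.logical_and(mask, np.logical_not(ndimage.binary_erosion(mask)))
    s_pred, s_gt = surface(pred.astype(bool)), surface(gt.astype(bool))
    # distance of every voxel to the nearest surface voxel of the other mask
    d_to_gt = ndimage.distance_transform_edt(np.logical_not(s_gt), sampling=spacing)
    d_to_pred = ndimage.distance_transform_edt(np.logical_not(s_pred), sampling=spacing)
    return (d_to_gt[s_pred].sum() + d_to_pred[s_gt].sum()) / (s_pred.sum() + s_gt.sum())
```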
From the table, we can see that the proposed segmentation P2 achieves the best performance on all metrics. The mean results of C0 (i.e., a similar implementation of VNet) are relatively good, with no gross failures in our experiments, which shows that the coarse stage is a reliable initialization. P2 clearly outperforming P1 shows that the segmentation agents are improved by the proposed collaborative learning strategy. The overall performance of the DenseASPP-based variants D1 and D2 is close to that of P1, which indicates that the proposed agent network with the attention-based concatenation is sufficiently effective compared with the more complicated DenseASPP blocks. In addition, the results of the proposed method are comparable to those reported in recent studies [1,10]. Xu et al. [10] reported a total DSC of 0.887 ± 0.024 over FC and TC. Ambellan et al. [1] utilized both 2D and 3D deep-learning-based segmentation with statistical shape models as a shape-refinement postprocessing step for femoral and tibial cartilage extraction. Using a similar set from the OAI, they achieved DSC 0.893 ± 0.024, VOE 19.4 ± 3.87 and ASD 0.19 ± 0.09 for FC, and DSC 0.881 ± 0.038, VOE 21.05 ± 5.808 and ASD 0.223 ± 0.143 for TC (since [1] separately reports results for FC, medial TC and lateral TC at two time points, we average these results to obtain approximate mean/std values). Without such a sophisticated shape-adjustment step, the proposed method acquires comparable DSC and VOE scores and much lower surface distance errors. Hence, the proposed framework can be used to automatically generate reliable assessments of all important articular cartilages in quantitative analysis for knee OA.

Fig. 5. Results of subject 1. (a) and (b) show the segmentation and GT labels for FC (red), TC (green) and PC (blue) in the sagittal view. (c) is the segmented 3D cartilages.

Fig. 6. Results of subject 2. (a) shows the segmented cartilages in the sagittal view. (b) and (c) demonstrate the GT and segmentation results in the 3D view.

Visualization results (two examples) of the proposed method are shown in Fig. 5 and Fig. 6. The two patients have obvious shape variance of the cartilages. In Fig. 5 (a)-(c), the proposed method accurately extracts most of the cartilage regions and obtains smooth tissue boundaries. Furthermore, as indicated by the green dashed circles in Fig. 5 (a) and (c), our method can effectively capture a small cartilage defect. The green dashed circles in Fig. 6 (a) and (c) indicate a possible cartilage damage/loss symptom that is also well captured by our method. The 3D view, which exhibits the accurate 3D pattern of cartilage defects, could be very useful in visual studies of cartilage-related diseases. The yellow arrows in Fig. 6 (c) show some minor errors occurring in neighboring areas due to unclear boundaries.

4 Conclusions

In this paper, we present a novel fully automatic method to segment three knee cartilages in 3D MR images based on a collaborative multi-agent learning architecture. Each segmentation agent depicts the high-resolution cartilage mask in its coarsely (but efficiently) located ROI. A novel skip connection with a multi-resolution attention mechanism is introduced to enhance the feature extraction of the target while suppressing confusing information in neighboring areas. The depicted multiple ROIs are then spatially fused into the original space to form a multi-cartilage label image for collaborative learning. The collaboration of the agents is implemented by the novel ROI-fusion layer followed by an adversarial discriminator that enforces the shape and position constraints.
The learning of the agents and the discriminator is conducted in an alternating fashion. In our experiments, the proposed method achieves robust and accurate segmentation of all important articular cartilages in high-resolution, large 3D MR knee data. In future work we will apply the method to quantify cartilage biomarkers (e.g., volume, thickness, surface area) in large-scale studies and to detect cartilage defects for lesion estimation [2,4]. Beyond cartilages, the proposed framework could also be extended to other multi-organ segmentation tasks [8].

References

1. Ambellan, F., Tack, A., Ehlke, M., Zachow, S.: Automated segmentation of knee bone and cartilage combining statistical shape knowledge and convolutional neural networks: Data from the Osteoarthritis Initiative. Medical Image Analysis (2018)
2. Eckstein, F., Wirth, W.: Quantitative cartilage imaging in knee osteoarthritis. Arthritis 2011 (2010)
3. He, K., Cao, X., Shi, Y., Nie, D., Gao, Y., Shen, D.: Pelvic organ segmentation using distinctive curve guided fully convolutional networks. IEEE Transactions on Medical Imaging 38(2), 585–595 (2019)
4. Hunter, D.J., Guermazi, A., Lo, G.H., Grainger, A.J., Conaghan, P.G., Boudreau, R.M., Roemer, F.W.: Evolution of semi-quantitative whole joint assessment of knee OA: MOAKS (MRI Osteoarthritis Knee Score). Osteoarthritis and Cartilage 19(8), 990–1002 (2011)
5. Jetley, S., Lord, N.A., Lee, N., Torr, P.H.: Learn to pay attention. arXiv preprint arXiv:1804.02391 (2018)
6. Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: 3D Vision (3DV), 2016 Fourth International Conference on. pp. 565–571. IEEE (2016)
7. Tan, C., Zhao, L., Yan, Z., Li, K., Metaxas, D., Zhan, Y.: Deep multi-task and task-specific feature learning network for robust shape preserved organ segmentation. In: ISBI. pp. 1221–1224. IEEE (2018)
8. Uzunbaş, M.G., Chen, C., Zhang, S., Pohl, K.M., Li, K., Metaxas, D.: Collaborative multi organ segmentation by integrating deformable and graphical models. In: MICCAI. pp. 157–164. Springer (2013)
9. Xu, C., Xu, L., Brahm, G., Zhang, H., Li, S.: MuTGAN: Simultaneous segmentation and quantification of myocardial infarction without contrast agents via joint adversarial learning. In: MICCAI. pp. 525–534. Springer (2018)
10. Xu, Z., Shen, Z., Niethammer, M.: Contextual additive networks to efficiently boost 3D image segmentations. In: Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, pp. 92–100. Springer (2018)
11. Yang, M., Yu, K., Zhang, C., Li, Z., Yang, K.: DenseASPP for semantic segmentation in street scenes. In: CVPR. pp. 3684–3692 (2018)
