Automatic Segmentation of Vestibular Schwannoma from T2-Weighted MRI by Deep Spatial Attention with Hardness-Weighted Loss

Guotai Wang 1,2,3, Jonathan Shapey 2,3,7, Wenqi Li 4, Reuben Dorent 2, Alex Demitriadis 5, Sotirios Bisdas 6, Ian Paddick 5, Robert Bradford 5,7, Sébastien Ourselin 2, and Tom Vercauteren 2

1 School of Mechanical and Electrical Engineering, University of Electronic Science and Technology of China, Chengdu, China
2 School of Biomedical Engineering and Imaging Sciences, King's College London, London, UK
3 Wellcome/EPSRC Centre for Interventional and Surgical Sciences, University College London, London, UK
4 NVIDIA, Cambridge, UK
5 Queen Square Radiosurgery Centre (Gamma Knife), National Hospital for Neurology and Neurosurgery, London, UK
6 Neuroimaging Analysis Centre, Queen Square, London, UK
7 Department of Neurosurgery, National Hospital for Neurology and Neurosurgery, London, UK
guotai.wang@uestc.edu.cn

Abstract. Automatic segmentation of vestibular schwannoma (VS) tumors from magnetic resonance imaging (MRI) would facilitate efficient and accurate volume measurement to guide patient management and improve clinical workflow. Accuracy and robustness are challenged by low contrast, a small target region, and low through-plane resolution. We introduce a 2.5D convolutional neural network (CNN) able to exploit the different in-plane and through-plane resolutions encountered in standard-of-care imaging protocols. We use an attention module to enable the CNN to focus on the small target and propose supervision of the learning of attention maps for more accurate segmentation. Additionally, we propose a hardness-weighted Dice loss function that gives higher weights to harder voxels to boost the training of CNNs. Experiments with ablation studies on the VS tumor segmentation task show that: 1) the proposed 2.5D CNN outperforms its 2D and 3D counterparts, 2) our supervised attention mechanism outperforms unsupervised attention, and 3) the voxel-level hardness-weighted Dice loss can improve the performance of CNNs. Our method achieved an average Dice score and ASSD of 0.87 and 0.43 mm respectively. This will facilitate patient management decisions in clinical practice.

1 Introduction

Vestibular schwannoma (VS) is a benign tumor arising from one of the balance nerves connecting the brain and inner ear. The incidence of VS has risen significantly in recent years and is now estimated to be between 14 and 20 cases per million per year [4]. High-quality magnetic resonance imaging (MRI) is required for diagnosis, and expectant management with serial imaging is usually advised for smaller tumors. Current MR protocols include contrast-enhanced T1-weighted (ceT1) and high-resolution T2-weighted (hrT2) images, but there is increasing concern about the potentially harmful cumulative side-effects of gadolinium contrast agents. Accurate measurement of VS tumor volume from MRI is desirable for growth detection and for guiding management of the tumor. However, current clinical practice relies on labor-intensive manual segmentation.

[Fig. 1. An example of a VS tumor. (a): contrast-enhanced T1-weighted MRI (axial). (b)-(d): T2-weighted MRI (axial, sagittal, and coronal). Note the small target region, the low contrast in T2, and the low resolution in the sagittal and coronal views.]
This paper aims at automatic segmentation of the VS tumor from high-resolution T2-weighted MRI. This would improve the clinical workflow and enable patients to undergo surveillance imaging without the need for gadolinium contrast, thus improving patient safety. However, the task is challenging for several reasons. First, T2 images have a relatively low contrast, and the exact boundary of the tumor is hard to detect. Second, the VS tumor is a relatively small structure with large shape variations within the whole brain image. Additionally, the image is often acquired with low through-plane resolution, as shown in Fig. 1.

In the literature, a Bayesian model was proposed for automatic VS tumor segmentation from ceT1 MRI [9], but it can hardly be applied to T2 images with much lower contrast. Semi-automated tools for this task suffer from inter-operator variations [8]. In recent years, convolutional neural networks (CNNs) have achieved state-of-the-art performance for many segmentation tasks [1,2,6]. However, most of them were proposed to segment images with isotropic resolution and are not readily applicable to our VS images with high in-plane resolution and low through-plane resolution. To segment small structures from large image contexts, Yu et al. [12] used a coarse-to-fine approach with recurrent saliency transformation. Oktay et al. [5] learned an attention map to enable the CNN to focus on target structures. However, that attention map was not learned with explicit supervision during training and may not be well aligned with the target region, which can limit the segmentation accuracy. We therefore hypothesize that end-to-end supervision of the learning of attention maps will lead to better results. Complementary approaches to dealing with small structures include adapted loss functions such as the Dice loss [3] and the generalized Dice loss [7]. They can mitigate the class imbalance between foreground and background by image-level weighting during training. Considering that some voxels are harder than others to learn during training, we propose a voxel-level hardness-weighted Dice loss function to further improve the segmentation accuracy.

The contribution of this paper is three-fold. First, to the best of our knowledge, this is the first work on automatic VS tumor segmentation using deep learning. We propose a 2.5D CNN combining 2D and 3D convolutions to deal with the low through-plane resolution. Second, we propose an attention module to enable the CNN to focus on the target region. Unlike previous works [5], we explicitly supervise the learning of attention maps so that they highlight the target structure better. Finally, we propose a voxel-level hardness-weighted Dice loss function to boost the performance of CNNs. The proposed method was validated with T2-weighted MR images of 245 patients with VS tumors.

2 Methods

[Fig. 2. The proposed 2.5D U-Net with spatial attention for VS tumor segmentation from anisotropic MRI: an encoder-decoder with skip connections in which levels L1-L2 use 2D (conv, BN, pReLU) x 2 blocks with 2D max-pooling/deconvolution and levels L3-L5 use 3D ones, followed by a conv2D + softmax output. The attention module (conv + ReLU, conv + sigmoid, multiplication with a residual connection) is depicted in the bottom right corner.]

2.5D CNN for Segmentation of Images with Anisotropic Resolutions.
For our images with high in-plane resolution and low through-plane resolution, 2D CNNs applied slice-by-slice ignore inter-slice correlation. Isotropic 3D CNNs may need to upsample the image to an isotropic 3D resolution to balance the physical receptive field (in terms of mm rather than voxels) along each axis, which requires more memory and may limit the depth or the number of features of the CNN. Therefore, it is desirable to design a 2.5D CNN that can not only use inter-slice features but also be more efficient than 3D CNNs. In addition, to make the receptive field isotropic in terms of physical dimensions, the number of convolutions along each axis should be different when dealing with such images. In [10], a 2.5D CNN was proposed for brain tumor segmentation. However, it was designed for isotropically resampled 3D images and limited by a small physical receptive field along the through-plane axis.

We propose a novel attention-based 2.5D CNN combining 2D and 3D convolutions. As shown in Fig. 2, the main structure follows the typical encoder-decoder design of U-Net [6]. The encoder contains five levels of convolutions. The first two levels (L1-L2) and the other three levels (L3-L5) use 2D and 3D convolutions/max-poolings, respectively. This is motivated by the fact that the in-plane resolution of our VS tumor images is about 4 times the through-plane resolution. After the first two max-pooling layers, which downsample the feature maps only in 2D, the feature maps in L3 and beyond have a near-isotropic 3D resolution. At each level, we use a block of layers containing two convolution layers, each followed by batch normalization (BN) and a parametric rectified linear unit (pReLU). The number of output feature channels at level l is denoted as N_l and is set to 16l in our experiments. The decoder contains similar blocks of 2D and 3D layers. Additionally, to deal with the small target region, we add a spatial attention module to each level of the decoder, which is depicted in Fig. 2 and detailed in the following. A sketch of the mixed 2D/3D encoder is given below.
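To make the mixed 2D/3D design concrete, the following is a minimal PyTorch sketch of the encoder just described. Note that the paper's own implementation used TensorFlow and NiftyNet [2], so this is a re-interpretation, not the reference code: the 2D convolutions and poolings of L1-L2 are expressed as 3D operations with singleton kernels along the through-plane axis, the channel schedule N_l = 16l follows the text, and everything else (kernel sizes, padding, the mirrored decoder, which is omitted) is our assumption based on Fig. 2.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, kernel):
    """Two convolution layers, each followed by batch norm and pReLU.
    A (1, 3, 3) kernel acts as a 2D convolution applied slice by slice."""
    pad = tuple(k // 2 for k in kernel)
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, kernel, padding=pad),
        nn.BatchNorm3d(out_ch), nn.PReLU(),
        nn.Conv3d(out_ch, out_ch, kernel, padding=pad),
        nn.BatchNorm3d(out_ch), nn.PReLU())

class Encoder25D(nn.Module):
    """Five-level encoder: L1-L2 are 2D (in-plane only), L3-L5 are 3D."""
    def __init__(self, in_ch=1):
        super().__init__()
        chans = [16 * l for l in range(1, 6)]        # N_l = 16l -> [16, 32, 48, 64, 80]
        kernels = [(1, 3, 3)] * 2 + [(3, 3, 3)] * 3  # 2D convolutions first, then 3D
        pools = [(1, 2, 2)] * 2 + [(2, 2, 2)] * 2    # downsample only in-plane at L1-L2
        self.blocks = nn.ModuleList(
            conv_block(i, o, k)
            for i, o, k in zip([in_ch] + chans[:-1], chans, kernels))
        self.pools = nn.ModuleList(nn.MaxPool3d(p) for p in pools)

    def forward(self, x):          # x: (batch, channel, slices, height, width)
        skips = []                 # feature maps passed to the decoder via skip connections
        for i, block in enumerate(self.blocks):
            x = block(x)
            skips.append(x)
            if i < len(self.pools):
                x = self.pools[i](x)
        return skips
```

As a sanity check on the design, after the two in-plane poolings a 0.4 mm x 0.4 mm x 1.5 mm voxel covers roughly 1.6 mm x 1.6 mm x 1.5 mm, which is why the feature maps from L3 onward can be treated as near-isotropic.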
Multi-Scale Supervised Spatial Attention. Previous works have shown that spatial attention can be learned automatically in CNNs to enable the network to focus on the target region within a large image context [5]. Building upon these works, we further introduce explicit supervision of the learning of attention to improve its accuracy. A spatial attention map can be seen as a single-channel image of attention coefficients α_i ∈ [0, 1], where α_i scores the relative importance of spatial position i. As shown in Fig. 2, the proposed attention module consists of two convolution layers. For an input feature map at level l with N_l channels, the first convolution layer reduces the channel number to N_l/2 and is followed by a ReLU. The second convolution layer further reduces the channel number to 1 and is followed by a sigmoid to generate the spatial attention map A_l at level l. A_l is then multiplied with the input feature map. We also use a residual connection in the attention module, as depicted in Fig. 2; a sketch of the module is given below.
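The following is a small PyTorch sketch of the PA module as described above: two convolutions reduce the channel number from N_l to N_l/2 (with a ReLU) and then to 1 (with a sigmoid), and the resulting map gates the input features with a residual connection. We use 1x1x1 convolutions and read the residual connection as adding the input back after gating; both details are our assumptions, as the paper does not specify them.

```python
import torch
import torch.nn as nn

class PAModule(nn.Module):
    """Proposed attention (PA) module for a decoder level with `channels` = N_l."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels // 2, kernel_size=1)  # N_l -> N_l/2
        self.conv2 = nn.Conv3d(channels // 2, 1, kernel_size=1)         # N_l/2 -> 1
        self.relu = nn.ReLU()

    def forward(self, x):
        att = torch.sigmoid(self.conv2(self.relu(self.conv1(x))))  # A_l in [0, 1]
        out = x * att + x     # gated features plus a residual connection
        return out, att       # A_l is returned as well so it can be supervised
```

In the network, the module takes the concatenation of encoder and decoder features at each decoder level, and the returned A_l is what the supervision described next is applied to.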
We propose an attention loss to supervise the learning of spatial attention explicitly during training. Let G denote the multi-channel one-hot ground truth segmentation of an image and G^f denote the single-channel binary foreground mask. For the attention map A_l at level l, let G^f_l denote the average-pooled version of G^f with the same resolution as A_l. Our loss function for training is:

\mathcal{L} = \frac{1}{L}\sum_{l} \ell(A_l, G^f_l) + \ell(P, G)    (1)

where L is the number of resolution levels (L = 5 in our case) and \ell(A_l, G^f_l) measures the difference between A_l and G^f_l. It drives the attention maps to be as close to the foreground mask as possible. P denotes the prediction output of the CNN, i.e., the probability of each voxel belonging to each class, and \ell(P, G) is the segmentation loss. The multi-scale supervision in Eq. (1) is similar to the holistic loss [11]; however, here we apply it to multi-scale attention maps rather than to the network's final prediction output. The two terms in Eq. (1) share the same underlying loss function \ell, as discussed in the following.

Voxel-Level Hardness-Weighted Dice Loss. A good choice of \ell is the Dice loss [3], which was proposed to train CNNs for binary segmentation and has shown good performance in dealing with imbalanced foreground and background classes. For the segmentation of small structures with low contrast, some voxels are harder than others to learn. Treating all voxels of a given class equally, as in [3], may limit the performance of CNNs on hard voxels. Therefore, we propose automatic hard-voxel weighting in the loss function by defining a voxel-level weight:

w_{ci} = \lambda \cdot |p_{ci} - g_{ci}| + (1 - \lambda)    (2)

where p_{ci} is the probability of voxel i belonging to class c as predicted by the CNN, g_{ci} is the corresponding ground truth value, and \lambda \in [0, 1] controls the degree of hard-voxel weighting. Our proposed hardness-weighted Dice loss (HDL) is defined as:

\ell_{HDL}(P, G) = 1 - \frac{1}{C}\sum_{c} \frac{2\sum_{i} w_{ci}\, p_{ci}\, g_{ci} + \epsilon}{\sum_{i} w_{ci}\,(p_{ci} + g_{ci}) + \epsilon}    (3)

where C is the channel number of P and G, and \epsilon = 10^{-5} is a small number for numerical stability. Similarly to [3], the gradient of \ell_{HDL} with respect to p_{ci} can easily be computed. Note that for the first term \ell_{HDL}(A_l, G^f_l) in Eq. (1), which deals with attention maps, the channel number is one. A sketch of the resulting loss computation is given below.
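As a minimal sketch, the two loss terms of Eqs. (1)-(3) can be written as follows (PyTorch again; `pred` is assumed to be post-softmax probabilities and `fg_mask` a float binary foreground mask). Whether gradients should flow through the hardness weights w_{ci} is not spelled out in the paper; here we treat them as constants by detaching them, which matches the reading of w_{ci} as a per-voxel weight.

```python
import torch
import torch.nn.functional as F

def hardness_weighted_dice(pred, target, lam=0.6, eps=1e-5):
    """Eqs. (2)-(3): pred and target are (batch, C, D, H, W); target is one-hot."""
    w = lam * (pred - target).abs() + (1.0 - lam)  # Eq. (2): harder voxels -> larger weights
    w = w.detach()                                 # assumption: weights act as constants
    dims = (0, 2, 3, 4)                            # sum over the batch and all voxels i
    numer = 2.0 * (w * pred * target).sum(dims) + eps
    denom = (w * (pred + target)).sum(dims) + eps
    return 1.0 - (numer / denom).mean()            # average the Dice term over C channels

def total_loss(pred, target, attention_maps, fg_mask, lam=0.6):
    """Eq. (1): segmentation loss plus multi-scale attention supervision.
    attention_maps: list of A_l tensors; fg_mask: (batch, 1, D, H, W) mask G^f."""
    seg = hardness_weighted_dice(pred, target, lam)
    att = 0.0
    for a in attention_maps:
        g_l = F.adaptive_avg_pool3d(fg_mask, a.shape[2:])  # average-pool G^f to A_l's size
        att = att + hardness_weighted_dice(a, g_l, lam)
    return seg + att / len(attention_maps)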
3 Experiments and Results

Data and Implementation. T2-weighted MR images of 245 patients with a single sporadic VS tumor were acquired in the axial view before radiosurgery treatment, with a high in-plane resolution of around 0.4 mm x 0.4 mm, an in-plane size of 512 x 512, a slice thickness and inter-slice spacing of 1.5 mm, and 19 to 118 slices. The ground truth was manually annotated by an experienced neurosurgeon and a physicist. We randomly split the images into 178, 20, and 47 for training, validation, and testing respectively. Each image was manually cropped with a cubic box of size 100 mm x 50 mm x 50 mm and normalized by its intensity mean and standard deviation. The CNNs were implemented in TensorFlow and NiftyNet [2] on a Ubuntu desktop with an NVIDIA GTX 1080 Ti GPU. For training, we used the Adam optimizer with a weight decay of 10^{-7} and a batch size of 2. The learning rate was initialized to 10^{-4} and halved every 10k iterations. Training was ended when performance on the validation set stopped increasing. For quantitative evaluation, we measured the Dice score, average symmetric surface distance (ASSD), and relative volume error (RVE) between segmentation results and the ground truth.

Comparison of Different Networks. First, we evaluate the performance of our 2.5D network, referring to our CNN without the attention module as 2.5D U-Net. Its 2D and 3D counterparts with the same configuration, except for the dimension of the convolution/deconvolution and max-pooling operations, are referred to as 2D U-Net and 3D U-Net respectively. For 3D U-Net, the images were resampled to an isotropic resolution of 0.4 mm x 0.4 mm x 0.4 mm. The performance of these networks trained with the Dice loss is shown in Table 1. It can be observed that our 2.5D U-Net achieves higher accuracy than its 2D and 3D counterparts. In addition, it is more efficient than the other two. Its lower inference time compared with the slice-by-slice 2D U-Net is due to the 3D down-sampled feature maps in L3-L5.

[Fig. 3. Visual comparison of different attention mechanisms for VS tumor segmentation (2.5D U-Net, 2.5D U-Net + AG [5], 2.5D U-Net + PA, and 2.5D U-Net + SpvPA; in-plane and through-plane views). Odd columns: segmentation results (green curves) and the ground truth (yellow curves). Even columns: attention maps at the highest resolution level (L1) of the decoder, where a warmer color represents higher attention.]

Table 1. Quantitative evaluation of different networks for VS tumor segmentation. The Dice loss was used for training. AG: the attention gate proposed in [5]. PA: our proposed attention module. SpvPA: the proposed attention with supervision. * denotes significant improvement over 2.5D U-Net based on a paired t-test (p < 0.05).

Network              | Dice (%)      | ASSD (mm)    | RVE (%)        | Time (s)
2D U-Net             | 80.38 ± 10.42 | 0.92 ± 0.68  | 18.01 ± 17.23  | 3.56 ± 0.36
3D U-Net             | 83.61 ± 13.69 | 0.84 ± 0.62  | 18.01 ± 17.48  | 3.90 ± 0.49
2.5D U-Net           | 85.69 ± 7.07  | 0.67 ± 0.45  | 16.02 ± 14.71  | 3.49 ± 0.39
2.5D U-Net + AG [5]  | 85.93 ± 6.96  | 0.58 ± 0.41* | 15.45 ± 12.37  | 3.51 ± 0.34
2.5D U-Net + PA      | 86.09 ± 6.94  | 0.55 ± 0.32* | 14.87 ± 12.19  | 3.52 ± 0.37
2.5D U-Net + SpvPA   | 86.71 ± 4.99* | 0.53 ± 0.29* | 13.40 ± 9.34*  | 3.52 ± 0.37

Effect of Supervised Attention. We further investigated the effect of our proposed attention (PA) module and supervised attention (SpvPA). PA was compared with the attention gate (AG) module proposed in [5]. We combined each of these modules with our 2.5D U-Net. AG was used to calibrate features from the encoder, as implemented in [5], while our PA and SpvPA were designed to calibrate the concatenation of encoder and decoder features, as shown in Fig. 2. These variants were trained with the Dice loss, and their performance is shown in Table 1. It can be observed that both AG [5] and PA lead to a better segmentation accuracy than the 2.5D U-Net without attention, and that our proposed PA performs slightly better than AG [5]. With our SpvPA, the segmentation accuracy is further improved over that of PA. Fig. 3 shows a visual comparison of these three attention methods. It can be observed that the attention map of AG [5] successfully suppresses most of the background region, but its magnitude in the target region is lower than that of PA and SpvPA. The attention map of PA highlights the target region but also assigns high attention coefficients to strong edges in the input image. This is mainly because the input of the PA module is a concatenation of high-level and low-level features. Benefiting from our explicit supervision of the learning of attention, the attention map of SpvPA focuses more on the target region and is less blurry than that of AG [5].

[Fig. 4. Performance (Dice (%), ASSD (mm), and RVE (%) as functions of λ ∈ [0, 1]) of 2.5D U-Net and 2.5D U-Net + SpvPA trained with the proposed voxel-level hardness-weighted Dice loss (HDL) for different values of λ.]

Performance of the Voxel-Level Hardness-Weighted Dice Loss. We additionally used HDL to train 2.5D U-Net and 2.5D U-Net + SpvPA. The average Dice, ASSD, and RVE of these two networks for different values of λ are shown in Fig. 4. Note that λ = 0.0 is the baseline without hard-voxel weighting, and a higher value of λ corresponds to assigning higher weights to harder voxels during training. The figure shows that our HDL leads to higher segmentation performance across a range of λ values. An improvement in accuracy is observed as λ increases from 0.0 to 0.4. Interestingly, when λ is higher than 0.6, the segmentation accuracy decreases, as shown by the Dice curves in Fig. 4. This indicates that giving too much emphasis to hard voxels may reduce the generalization ability of the CNNs. As a result, we suggest [0.4, 0.6] as a proper range for λ. A quantitative comparison between the Dice loss [3] and our proposed HDL with λ = 0.6 is presented in Table 2. It shows that our proposed HDL outperforms the Dice loss [3] for both 2.5D U-Net and 2.5D U-Net + SpvPA.

Table 2. Performance of two CNNs trained with different loss functions. * denotes significant improvement over the Dice loss [3] based on a paired t-test (p < 0.05).

Network              | Training loss   | Dice (%)      | ASSD (mm)    | RVE (%)
2.5D U-Net           | Dice loss [3]   | 85.69 ± 7.07  | 0.67 ± 0.45  | 16.02 ± 14.71
2.5D U-Net           | HDL (λ = 0.6)   | 86.66 ± 6.01* | 0.56 ± 0.37* | 14.48 ± 12.30*
2.5D U-Net + SpvPA   | Dice loss [3]   | 86.71 ± 4.99  | 0.53 ± 0.29  | 13.40 ± 9.34
2.5D U-Net + SpvPA   | HDL (λ = 0.6)   | 87.27 ± 4.91  | 0.43 ± 0.31* | 12.14 ± 8.94

4 Discussion and Conclusion

In this work, we propose a 2.5D CNN for automatic VS tumor segmentation from high-resolution T2-weighted MRI. Our network is a trade-off between standard 2D and 3D CNNs and is specifically designed for images with high in-plane resolution and low through-plane resolution. Experiments show that it outperforms its 2D and 3D counterparts in terms of segmentation accuracy and efficiency. To deal with the small target region, we propose a multi-scale spatial attention mechanism with explicit supervision of the learning of attention maps. Experimental results demonstrate that the supervised attention can guide the network to focus more accurately on the target region, leading to higher accuracy of the final segmentation. We also combine automatic hard-voxel weighting with the existing Dice loss [3], and the proposed voxel-level hardness-weighted Dice loss leads to a further performance improvement. This will facilitate the rapid adoption of these techniques into clinical practice, providing clinicians with accurate automatically-generated segmentations to inform patient management decisions. Though our methods can also be applied to ceT1 images, this work on T2 image segmentation improves patient safety by enabling patients to undergo serial imaging without the need for potentially harmful contrast agents.

Acknowledgements.
This work was supported by the Wellcome Trust [203145Z/16/Z; 203148/Z/16/Z; WT106882] and EPSRC [NS/A000050/1; NS/A000049/1] funding. TV is supported by a Medtronic / Royal Academy of Engineering Research Chair [RCSRF1819/7/34].

References

1. Abdulkadir, A., Lienkamp, S.S., Brox, T., Ronneberger, O.: 3D U-Net: Learning dense volumetric segmentation from sparse annotation. In: MICCAI. pp. 424-432 (2016)
2. Gibson, E., Li, W., Sudre, C., Fidon, L., Shakir, D.I., Wang, G., Eaton-Rosen, Z., Gray, R., Doel, T., Hu, Y., Whyntie, T., Nachev, P., Modat, M., Barratt, D.C., Ourselin, S., Cardoso, M.J., Vercauteren, T.: NiftyNet: A deep-learning platform for medical imaging. Comput. Methods Programs Biomed. 158, 113-122 (2018)
3. Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In: 3DV. pp. 565-571 (2016)
4. Moffat, D.A., Hardy, D.G., Irving, R.M., Viani, L., Beynon, G.J., Baguley, D.M.: Referral patterns in vestibular schwannomas. Clin. Otolaryngol. Allied Sci. 20(1), 80-83 (1995)
5. Oktay, O., Schlemper, J., Le Folgoc, L., Lee, M., Heinrich, M., Misawa, K., Mori, K., McDonagh, S., Hammerla, N.Y., Kainz, B., Glocker, B., Rueckert, D.: Attention U-Net: Learning where to look for the pancreas. arXiv preprint arXiv:1804.03999 (2018)
6. Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation. In: MICCAI. pp. 234-241 (2015)
7. Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Cardoso, M.J.: Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. In: Deep Learn. Med. Image Anal. Multimodal Learn. Clin. Decis. Support. pp. 240-248 (2017)
8. Tysome, J., Patterson, A., Das, T., Donnelly, N., Mannion, R., Axon, P., Graves, M., MacKeith, S.: A comparison of semi-automated volumetric vs linear measurement of small vestibular schwannomas. Eur. Arch. Oto-Rhino-Laryn. 275(4), 867-874 (2018)
9. Vokurka, E.A., Herwadkar, A., Thacker, N.A., Ramsden, R.T., Jackson, A.: Using Bayesian tissue classification to improve the accuracy of vestibular schwannoma volume and growth measurement. Am. J. Neuroradiol. 23(3), 459-467 (2002)
10. Wang, G., Li, W., Ourselin, S., Vercauteren, T.: Automatic brain tumor segmentation using cascaded anisotropic convolutional neural networks. In: Int. MICCAI Brainlesion Workshop. pp. 178-190 (2017)
11. Xie, S., Tu, Z.: Holistically-nested edge detection. In: ICCV. pp. 1395-1403 (2015)
12. Yu, Q., Xie, L., Wang, Y., Zhou, Y., Fishman, E.K., Yuille, A.L.: Recurrent saliency transformation network: Incorporating multi-stage visual cues for small organ segmentation. In: CVPR. pp. 8280-8289 (2018)