Deep Attentive Features for Prostate Segmentation in 3D Transrectal Ultrasound


Authors: Yi Wang, Haoran Dou, Xiaowei Hu

SUBMITTED TO IEEE TRANS. ON MEDICAL IMAGING, VOL. XX, NO. XX, XX XX

Yi Wang*, Haoran Dou, Xiaowei Hu, Lei Zhu, Xin Yang, Ming Xu, Jing Qin, Pheng-Ann Heng, Tianfu Wang, and Dong Ni

Abstract—Automatic prostate segmentation in transrectal ultrasound (TRUS) images is of essential importance for image-guided prostate interventions and treatment planning. However, developing such automatic solutions remains very challenging due to the missing/ambiguous boundary and inhomogeneous intensity distribution of the prostate in TRUS, as well as the large variability in prostate shapes. This paper develops a novel 3D deep neural network equipped with attention modules for better prostate segmentation in TRUS by fully exploiting the complementary information encoded in different layers of the convolutional neural network (CNN). Our attention module utilizes the attention mechanism to selectively leverage the multi-level features integrated from different layers to refine the features at each individual layer, suppressing the non-prostate noise at shallow layers of the CNN and incorporating more prostate details into the features at deep layers. Experimental results on challenging 3D TRUS volumes show that our method attains satisfactory segmentation performance. The proposed attention mechanism is a general strategy to aggregate multi-level deep features and has the potential to be used for other medical image segmentation tasks. The code is publicly available at https://github.com/wulalago/DAF3D.

Index Terms—Attention mechanisms, deep features, feature pyramid network, 3D segmentation, transrectal ultrasound.

I. INTRODUCTION

Prostate cancer is the most common noncutaneous cancer and the second leading cause of cancer-related deaths in men [1]. Early detection and intervention are crucial to the cure of progressive prostate cancer [2].
This work was supported in part by the National Natural Science Foundation of China under Grants 61701312 and 61571304, in part by the Natural Science Foundation of SZU (No. 2018010), in part by the Shenzhen Peacock Plan (KQTD2016053112051497), and in part by a grant from the Research Grants Council of HKSAR (No. 14225616). (Yi Wang and Haoran Dou contributed equally to this work.) (Corresponding author: Yi Wang.)

Y. Wang, H. Dou, T. Wang and D. Ni are with the National-Regional Key Technology Engineering Laboratory for Medical Ultrasound, Guangdong Key Laboratory for Biomedical Measurements and Ultrasound Imaging, School of Biomedical Engineering, Health Science Center, Shenzhen University, Shenzhen, China, and also with the Medical UltraSound Image Computing (MUSIC) Lab, Shenzhen, China (e-mail: onewang@szu.edu.cn). X. Hu, X. Yang and P.A. Heng are with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China. L. Zhu and J. Qin are with the Centre for Smart Health, School of Nursing, The Hong Kong Polytechnic University, Hong Kong, China; L. Zhu is also with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China. M. Xu is with the Department of Medical Ultrasonics, the First Affiliated Hospital, Institute of Diagnostic and Interventional Ultrasound, Sun Yat-Sen University, Guangzhou, China.

Copyright (c) 2019 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

Fig. 1. Example TRUS images. Red contour denotes the prostate boundary. There are large prostate shape variations, and the prostate tissues present inhomogeneous intensity distributions. Orange arrows indicate missing/ambiguous boundaries.
Transrectal ultrasound (TRUS) has long been a routine imaging modality for image-guided biopsy and therapy of prostate cancer [3]. Accurate boundary delineation from TRUS images is of essential importance for treatment planning [4], biopsy needle placement [5], brachytherapy [6] and cryotherapy [7], and can help surface-based registration between TRUS and preoperative magnetic resonance (MR) images during image-guided interventions [8], [9]. Currently, prostate boundaries are routinely outlined manually in a set of transverse cross-sectional 2D TRUS slices; the shape and volume of the prostate can then be derived from the boundaries for subsequent treatment planning. However, manual outlining is tedious, time-consuming and often irreproducible, even for experienced physicians.

Automatic prostate segmentation in TRUS images has become a considerable research area [10], [11]. Nevertheless, even though there have been a number of methods in this area, accurate prostate segmentation in TRUS remains very challenging due to (a) the ambiguous boundary caused by poor contrast between the prostate and surrounding tissues, (b) missing boundary segments resulting from acoustic shadow and the presence of other structures (e.g., the urethra), (c) the inhomogeneous intensity distribution of the prostate tissue in TRUS images, and (d) the large shape variations of different prostates (see Fig. 1).

Fig. 2. Visual comparisons of TRUS segmentations using conventional multi-level features (rows 1 and 3) and the proposed attentive features (rows 2 and 4). (a) the input TRUS images; (b)-(e) the output feature maps from layer 1 (shallow layer) to layer 4 (deep layer) of the convolutional networks; (f) the segmentation results predicted by the corresponding features; (g) the ground truths.
We can observe that directly applying multi-level features without distinction for TRUS segmentation may suffer from poor localization of prostate boundaries. In contrast, our proposed attentive features are more powerful for better representation of prostate characteristics.

A. Relevant Work

The problem of automatic prostate segmentation in TRUS images has been extensively explored in the literature [5], [12]–[30]. One main methodological stream utilizes shape statistics for prostate segmentation. Ladak et al. [12] proposed a semi-automatic segmentation of 2D TRUS images based on shape-based initialization and the discrete dynamic contour (DDC) for refinement. Wang et al. [16] further employed the DDC method to segment series of contiguous 2D slices from 3D TRUS data, thus obtaining 3D TRUS segmentation. Pathak et al. [13] proposed an edge-guided boundary delineation algorithm with built-in a priori shape knowledge to detect the most probable edges describing the prostate. Shen et al. [15] presented a statistical shape model equipped with Gabor descriptors for prostate segmentation in ultrasound images. Inspired by [15], a robust active shape model has been proposed to discard displacement outliers during the model fitting procedure, and has further been applied to ultrasound segmentation [25]. Tutar et al. [20] defined the prostate segmentation task as fitting the best surface to the underlying images under shape constraints learned from statistical analysis. Yan et al. [5] developed a partial active shape model to address the missing boundary issue in ultrasound shadow areas. Yan et al. [22] used both global population-based and patient-specific local shape statistics as shape constraints for TRUS segmentation. All the aforementioned methods have incorporated prior shape information to provide segmentation that is robust against image noise and artifacts.
However, due to the large variability in prostate shapes, such methods may lose specificity and are generally not sufficient to faithfully delineate boundaries in some cases [11].

In addition to shape statistics based methods, many other approaches resolve prostate segmentation by formulating it as a foreground classification task. Zhan et al. [21] utilized a set of Gabor-support vector machines to analyse texture features for prostate segmentation. Ghose et al. [23] performed supervised soft classification with random forests to identify the prostate. Yang et al. [28] extracted patch-based features (e.g., Gabor wavelet, histogram of gradient, local binary pattern) and employed a trained kernel support vector machine to locate prostate tissues. In general, all the above methods used hand-crafted features for segmentation, which are ineffective at capturing high-level semantic knowledge, and thus tend to fail in generating high-quality segmentations when there are ambiguous/missing boundaries in TRUS images.

Recently, deep neural networks have been demonstrated to be a very powerful tool for learning multi-level features for object segmentation [31]–[37]. Guo et al. [38] presented a deep network for the segmentation of the prostate in MR images. Motivated by [38], Ghavami et al. [39], [40] employed convolutional neural networks (CNNs) built upon the U-net architecture [34] for automatic prostate segmentation in 2D TRUS slices. To tackle the missing boundary issue in TRUS images, Yang et al. [41] proposed to learn the shape prior with biologically plausible recurrent neural networks (RNNs) and bridged boundary incompleteness. Karimi et al. [42] employed an ensemble of multiple CNN models and a statistical shape model to segment TRUS images for prostate brachytherapy. Anas et al. [43] employed a deep residual neural net with an exponential weight map to delineate 2D TRUS images for low-dose prostate brachytherapy treatment. Anas et al.
[44] further developed an RNN-based architecture with a gated recurrent unit as the core of the recurrent connection to segment the prostate in freehand ultrasound guided biopsy.

Compared to traditional machine learning methods with hand-crafted features, one of the main advantages of deep neural networks is the ability to generate multi-level features consisting of abundant semantic and fine-grained information. However, directly applying multi-level convolutional features without distinction for TRUS segmentation may suffer from poor localization of prostate boundaries, due to the distraction from redundant features (see the 1st and 3rd rows of Fig. 2): the integrated multi-level features tend to include non-prostate regions (due to low-level details from shallow layers) or lose details of prostate boundaries (due to high-level semantics from deep layers) when generating segmentation results. Our preliminary study on 2D TRUS images [45] has demonstrated that it is essential to leverage the complementary advantages of features at multiple levels and to learn more discriminative features for accurate and robust segmentation. However, the work [45] only realizes 2D segmentation, which could be very limiting for its application.

In this study, we develop a novel 3D feature pyramid network equipped with attention modules to generate deep attentive features (DAF) for better prostate segmentation in 3D TRUS volumes.

Fig. 3. The schematic illustration of our prostate segmentation network equipped with attention modules. FPN: feature pyramid network; SLF: single-layer features; MLF: multi-layer features; AM: attention module; ASPP: atrous spatial pyramid pooling.
The DAF is generated at each individual layer by learning the complementary information of the low-level detail and high-level semantics in the multi-layer features (MLF), and is thus more powerful for better representation of prostate characteristics (see the 2nd and 4th rows of Fig. 2). Experiments on 3D TRUS volumes demonstrate that our segmentation using deep attentive features achieves satisfactory performance.

B. Contributions

The main contributions of our work are twofold. 1) We propose to fully exploit the useful complementary information encoded in the multi-level features to refine the features at each individual layer. Specifically, we achieve this by developing an attention module, which can automatically learn a set of weights to indicate the importance of the features in the MLF for each individual layer. 2) We develop a 3D attention guided network with a novel scheme for TRUS prostate segmentation by harnessing the spatial contexts across deep and shallow layers. To the best of our knowledge, we are the first to utilize attention mechanisms to refine multi-layer features for better 3D TRUS segmentation. In addition, the proposed attention mechanism is a general strategy to aggregate multi-level features and has the potential to be used in other segmentation applications.

The remainder of this paper is organized as follows. Section II presents the details of the attention guided network, which generates attentive features by effectively leveraging the complementary information encoded in multi-level features. Section III presents the experimental results of the proposed method for the application of 3D TRUS segmentation. Section IV elaborates the discussion of the proposed attention guided network, and the conclusion of this study is given in Section V.

II. DEEP ATTENTIVE FEATURES FOR 3D SEGMENTATION

Segmenting the prostate from TRUS images is a challenging task, especially due to the ambiguous/missing boundary and inhomogeneous intensity distribution of the prostate in TRUS. Directly using low-level or high-level features, or even their combinations, to conduct prostate segmentation may often fail to produce satisfactory results. Therefore, leveraging various factors such as multi-scale contextual information, region semantics and boundary details to learn more discriminative prostate features is essential for accurate and robust prostate segmentation.

To address the above issues, we present deep attentive features for better representation of the prostate. The following subsections present the details of the proposed scheme and elaborate the novel attention module.

A. Network Architecture

Fig. 3 illustrates the proposed prostate segmentation network with deep attentive features. Our network takes the TRUS images as input and outputs the segmentation result in an end-to-end manner. It first produces a set of feature maps with different resolutions. The feature maps at shallow layers have high resolutions with rich detail information, while the feature maps at deep layers have low resolutions but high-level semantic information. We implement the 3D ResNeXt [46] as the feature extraction layers (the gray parts in the left of Fig. 3). Specifically, to alleviate the issue of large scale variability of prostate shapes in different TRUS slices (e.g., mid-gland slices show a much larger prostate region than base/apex slices do), we employ dilated convolution [47] in the backbone ResNeXt to systematically aggregate multi-scale contextual information.

Fig. 4. The schematic illustration of the atrous spatial pyramid pooling (ASPP) with dilated convolution and group normalization (GN).
We use 3×3×3 dilated convolutions with a rate of 2 to substitute the conventional 3×3×3 convolutions in layer3 and layer4, increasing the receptive field without loss of resolution. In addition, considering that the TRUS data is a "thin" volume (the slice number (L) is relatively smaller than the slice width (W)/height (H)), we set the downsampling of layer0 to stride (2, 2, 2), and set layer1 and layer2 to stride (2, 2, 1) to retain useful information in different slices.

To naturally leverage the feature hierarchy computed by the convolutional network, we further utilize the feature pyramid network (FPN) architecture [48] to combine multi-level features via a top-down pathway and lateral connections (see Fig. 3, 3D-FPN). The top-down pathway upsamples spatially coarser, but semantically stronger, feature maps from higher pyramid levels. These feature maps are then merged with the correspondingly same-sized bottom-up maps via lateral connections. Each lateral connection merges feature maps by element-wise addition. The enhanced feature maps at each layer are obtained by using the deeply supervised mechanism [49], which imposes supervision signals at multiple layers. The deeply supervised mechanism can reinforce the propagation of gradient flows within the 3D network and hence help to learn more representative features [50]. Note that the feature maps at layer0 are excluded from the pyramid due to the memory limitation.

After obtaining the enhanced feature maps with different levels of information via the FPN, we enlarge these feature maps with different resolutions to the same size as layer1's feature map by trilinear interpolation. The enlarged feature maps at each individual layer are denoted as "single-layer features (SLF)", and the multiple SLFs are combined together, followed by convolution operations, to generate the "multi-layer features (MLF)".
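As an illustration of the top-down merge just described, the following PyTorch sketch upsamples a deeper pyramid map and adds it element-wise to a lateral 1×1×1 projection of the bottom-up map. The class name and channel widths are illustrative assumptions, not the paper's exact configuration.

```python
# A minimal sketch of one 3D-FPN lateral merge step (assumed channel sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class LateralMerge3D(nn.Module):
    def __init__(self, bottom_up_channels: int, pyramid_channels: int = 64):
        super().__init__()
        # 1x1x1 lateral connection projects bottom-up features to the
        # common pyramid channel width.
        self.lateral = nn.Conv3d(bottom_up_channels, pyramid_channels, kernel_size=1)

    def forward(self, top_down: torch.Tensor, bottom_up: torch.Tensor) -> torch.Tensor:
        # Upsample the semantically stronger top-down map to the spatial
        # size of the bottom-up map (trilinear, matching 3D data).
        up = F.interpolate(top_down, size=bottom_up.shape[2:],
                           mode="trilinear", align_corners=False)
        # Element-wise addition merges the two pathways.
        return up + self.lateral(bottom_up)

merge = LateralMerge3D(bottom_up_channels=32, pyramid_channels=64)
deep = torch.randn(1, 64, 4, 8, 8)       # coarse, deep pyramid level
shallow = torch.randn(1, 32, 8, 16, 16)  # finer bottom-up level
out = merge(deep, shallow)
assert out.shape == (1, 64, 8, 16, 16)
```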
Although the MLF encodes the low-level detail information as well as the high-level semantic information of the prostate, it also inevitably incorporates noise from the shallow layers and loses some subtle parts of the prostate due to the coarse features at deep layers. In order to refine the features of the prostate ultrasound image, we present an attention module that generates deep attentive features at each layer on the principle of the attention mechanism. The proposed attention module takes the MLF and the SLF as inputs and produces the refined attentive feature maps; please refer to Section II-B for the details of our attention module.

Then, instead of directly averaging the obtained multi-scale attentive feature maps for the prediction of the prostate region, we employ a 3D atrous spatial pyramid pooling (ASPP) [51] module to resample attentive features at different scales for a more accurate prostate representation. As shown in Fig. 3, the multiple attentive feature maps generated by the attention modules are combined together, followed by convolution operations, to form an attentive feature map. Four parallel convolutions with different atrous rates are then applied on top of this attentive feature map to capture multi-scale information. Specifically, the schematic illustration of our 3D ASPP with dilated convolution and group normalization (GN) [52] is shown in Fig. 4. Our 3D ASPP consists of (a) one 1×1×1 convolution and three 3×3×3 dilated convolutions with rates of (6, 12, 18), and (b) group normalization right after the convolutions. We choose GN instead of batch normalization because GN's accuracy is considerably stable over a wide range of batch sizes [52], which makes it more suitable for 3D data computation. Our GN is along the channel direction and the number of groups is 32.
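The 3D ASPP head described above can be sketched as follows: one 1×1×1 convolution plus three 3×3×3 dilated convolutions (rates 6, 12, 18), each followed by group normalization, with the branch outputs concatenated. Channel and group counts here are scaled-down assumptions for illustration (the paper uses 32 groups).

```python
# A minimal 3D ASPP sketch (assumed small channel/group counts).
import torch
import torch.nn as nn

class ASPP3D(nn.Module):
    def __init__(self, in_ch: int = 16, out_ch: int = 16, groups: int = 4):
        super().__init__()
        def branch(kernel: int, rate: int) -> nn.Sequential:
            pad = 0 if kernel == 1 else rate  # keeps the spatial size unchanged
            return nn.Sequential(
                nn.Conv3d(in_ch, out_ch, kernel, padding=pad, dilation=rate),
                nn.GroupNorm(groups, out_ch),
            )
        # Four parallel branches capture multi-scale context.
        self.branches = nn.ModuleList([
            branch(1, 1), branch(3, 6), branch(3, 12), branch(3, 18),
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenate branch outputs along the channel axis.
        return torch.cat([b(x) for b in self.branches], dim=1)

aspp = ASPP3D()
feat = torch.randn(1, 16, 8, 40, 40)
out = aspp(feat)
assert out.shape == (1, 64, 8, 40, 40)  # 4 branches x 16 channels
```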
Finally, we combine the multi-scale attentive features together and obtain the prostate segmentation result by using the deeply supervised mechanism [49].

B. Deep Attentive Features

As presented in Section II-A, the feature maps at shallow layers contain the detail information of the prostate but also include non-prostate regions, while the feature maps at deep layers are able to capture the highly semantic information indicating the location of the prostate but may lose the fine details of the prostate's boundaries. In order to refine the features at each layer, here we present a deep attentive module (see Fig. 5) to generate the refined attentive features by utilizing the proposed attention mechanism.

Attention models are widely used for various tasks, including image segmentation. Several attention mechanisms, e.g., channel-wise attention [53] and pixel-wise attention [54], have been proposed to boost a network's representational power. In this study, we explore a layer-wise attention mechanism to selectively leverage the complementary features across all scales to refine the features of individual layers.

Fig. 5. The schematic illustration of the proposed attention module.

Specifically, as shown in Fig. 5, we feed the MLF and the SLF at each layer into the proposed attention module and generate the refined SLF through the following three steps. The first step is to generate an attentive map at each layer, which indicates the importance of the features in the MLF for each specific individual layer. Given the single-layer feature maps at each layer, we concatenate them with the multi-layer feature maps as F_x, and then produce the unnormalized attention weights W_x (see Fig. 5):

W_x = f_a(F_x; \theta), (1)

where \theta represents the parameters learned by f_a, which contains three convolutional layers.
The first two convolutional layers use 3×3×3 kernels, and the last convolutional layer applies 1×1×1 kernels. It is worth noting that in our implementation, each convolutional layer consists of one convolution, one group normalization, and one parametric rectified linear unit (PReLU) [55]. These convolutional operations are employed to choose the useful multi-level information with respect to the features of each individual layer. After that, our attention module computes the attentive map A_x by normalizing W_x with a sigmoid function. In the second step, we multiply the attentive map A_x with the MLF in an element-wise manner to weight the features in the MLF for each SLF. Third, the weighted MLF is merged with the corresponding features of each SLF by applying two 3×3×3 and one 1×1×1 convolutional layers, which is capable of automatically refining the layer-wise SLF and producing the final attentive features for the given layer (see Fig. 5).

In general, our attention mechanism leverages the MLF as a rich feature pool to refine the features of each SLF. Specifically, as the SLFs at shallow layers are responsible for discovering detailed information but lack semantic information of the prostate, the MLF can guide them to gradually suppress details that are not located in the semantic saliency regions while capturing more details within the semantic saliency regions. Meanwhile, as the SLFs at deep layers are responsible for capturing cues of the whole prostate and may lack detailed boundary features, the MLF can enhance their boundary details. By refining the features at each layer using the proposed attention mechanism, our network can learn to select more discriminative features for accurate and robust TRUS segmentation.

Fig. 6. The learning curve of our attention guided network.

C. Implementation Details

Our proposed framework was implemented in PyTorch and used the 3D ResNeXt [46] as the backbone network.
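The three-step attention module of Section II-B can be sketched roughly as below: concatenate the SLF and MLF, predict W_x with three convolutions, normalize with a sigmoid to obtain A_x, weight the MLF element-wise, and fuse the weighted MLF with the SLF through two 3×3×3 and one 1×1×1 convolutions. Channel sizes are assumed, and the per-layer group normalization is omitted for brevity.

```python
# A simplified sketch of the layer-wise attention module (assumed channels,
# GN omitted; not the paper's exact implementation).
import torch
import torch.nn as nn

class AttentionModule3D(nn.Module):
    def __init__(self, ch: int = 16):
        super().__init__()
        # f_a: three convolutional layers producing unnormalized weights W_x.
        self.f_a = nn.Sequential(
            nn.Conv3d(2 * ch, ch, 3, padding=1), nn.PReLU(),
            nn.Conv3d(ch, ch, 3, padding=1), nn.PReLU(),
            nn.Conv3d(ch, ch, 1),
        )
        # Fusion of the weighted MLF with the SLF (two 3x3x3, one 1x1x1).
        self.refine = nn.Sequential(
            nn.Conv3d(2 * ch, ch, 3, padding=1), nn.PReLU(),
            nn.Conv3d(ch, ch, 3, padding=1), nn.PReLU(),
            nn.Conv3d(ch, ch, 1),
        )

    def forward(self, slf: torch.Tensor, mlf: torch.Tensor) -> torch.Tensor:
        f_x = torch.cat([slf, mlf], dim=1)      # step 1: concatenate as F_x
        a_x = torch.sigmoid(self.f_a(f_x))      # attentive map A_x
        weighted_mlf = a_x * mlf                # step 2: weight the MLF
        # step 3: merge the weighted MLF with the SLF
        return self.refine(torch.cat([slf, weighted_mlf], dim=1))

am = AttentionModule3D(ch=16)
slf = torch.randn(1, 16, 8, 16, 16)
mlf = torch.randn(1, 16, 8, 16, 16)
refined = am(slf, mlf)
assert refined.shape == (1, 16, 8, 16, 16)
```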
a) Loss Function: During the training process, the Dice loss L_{dice} and the binary cross-entropy loss L_{bce} are used for each output of this network:

L_{dice} = 1 - \frac{2\sum_{i=1}^{N} p_i g_i}{\sum_{i=1}^{N} p_i^2 + \sum_{i=1}^{N} g_i^2}, (2)

L_{bce} = -\sum_{i=1}^{N} [g_i \log p_i + (1 - g_i) \log(1 - p_i)], (3)

where N is the voxel number of the input TRUS volume, p_i \in [0.0, 1.0] represents the voxel value of the predicted probabilities, and g_i \in \{0, 1\} is the voxel value of the binary ground truth volume. The binary cross-entropy loss L_{bce} is a conventional loss in segmentation tasks. It is preferred for preserving boundary details but may cause over-/under-segmentation due to the class-imbalance issue. In order to alleviate this problem, we combine the Dice loss L_{dice} with L_{bce}. The Dice loss emphasizes global shape similarity to generate compact segmentation, and its differentiability has been illustrated in [56]. The combined loss is helpful to consider both local detail and global shape similarity. We define each supervised signal L_{signal} as the summation of L_{dice} and L_{bce}:

L_{signal} = L_{dice} + L_{bce}. (4)

Therefore the total loss L_{total} is defined as the summation of the losses on all supervised signals:

L_{total} = \sum_{i=1}^{n} w_i L^i_{signal} + \sum_{j=1}^{n} w_j L^j_{signal} + w_f L^f_{signal}, (5)

where w_i and L^i_{signal} represent the weight and loss of the i-th layer, while w_j and L^j_{signal} represent the weight and loss of the j-th layer after refining features using our attention modules; n is the number of layers of our network; w_f and L^f_{signal} are the weight and loss for the output layer. We empirically set the weights (w_{i=1,2,3,4}, w_{j=1,2,3,4} and w_f) as (0.4, 0.5, 0.7, 0.8, 0.4, 0.5, 0.7, 0.8, 1).

b) Training Process: Our framework is trained end-to-end. We adopt Adam [57] with an initial learning rate of 0.001 and a mini-batch size of 1 on a single TITAN Xp GPU to train the whole framework. Fig. 6 shows the learning curve of the proposed framework. It can be observed that the training converges after 14 epochs. Training the whole framework for 20 epochs takes about 54 hours on our experimental data. The code is publicly available at https://github.com/wulalago/DAF3D.

Fig. 7. One example to illustrate the effectiveness of the proposed attention module for feature refinement. (a) the input TRUS image and its ground truth; (b)-(e) the features from layer 1 (shallow layer) to layer 4 (deep layer); rows 1-3 show the single-layer features (SLFs), the corresponding attentive maps and the attention-refined SLFs, respectively; (f) the multi-layer features (MLF) and the attention-refined MLF. We can observe that our proposed attention module provides a feasible solution to effectively incorporate details at low levels and semantics at high levels for better feature representation.
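A minimal NumPy sketch of the combined loss in Eqs. (2)-(4), assuming predicted probabilities and binary ground truth flattened to 1D arrays (array values are illustrative):

```python
# Plain-NumPy sketch of the Dice + binary cross-entropy loss (Eqs. (2)-(4)).
import numpy as np

def dice_loss(p: np.ndarray, g: np.ndarray) -> float:
    # Eq. (2): 1 - 2*sum(p*g) / (sum(p^2) + sum(g^2))
    return float(1.0 - 2.0 * np.sum(p * g) / (np.sum(p ** 2) + np.sum(g ** 2)))

def bce_loss(p: np.ndarray, g: np.ndarray, eps: float = 1e-7) -> float:
    # Eq. (3): negative log-likelihood summed over voxels.
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.sum(g * np.log(p) + (1.0 - g) * np.log(1.0 - p)))

def signal_loss(p: np.ndarray, g: np.ndarray) -> float:
    # Eq. (4): each supervised signal combines both terms.
    return dice_loss(p, g) + bce_loss(p, g)

g = np.array([1.0, 1.0, 0.0, 0.0])       # binary ground truth
p_good = np.array([0.9, 0.8, 0.1, 0.2])  # confident, mostly correct
p_bad = np.array([0.2, 0.3, 0.8, 0.7])   # mostly wrong
assert signal_loss(p_good, g) < signal_loss(p_bad, g)
assert abs(dice_loss(g, g)) < 1e-9  # perfect prediction gives zero Dice loss
```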
This clinician delineated each slice by considering the 3D information of its neighboring slices. Then all the manually delineated boundaries were further refined/confirmed by another clinician for correctness assurance. We adopted data augmentation (i.e., rotation and flipping) for training.

B. Experimental Methods

To demonstrate the advantages of the proposed method on TRUS segmentation, we compared our attention guided network with three other state-of-the-art segmentation networks: 3D Fully Convolutional Network (FCN) [33], 3D U-Net [39], and Boundary Completion Recurrent Neural Network (BCRNN) [41]. It is worth noting that the works [41] and [39] were proposed specializing in TRUS segmentation. (The work [39] adopted a 2D U-Net architecture [34] as the backbone network; here we extend [39] to a 3D architecture for a fair comparison.) For a fair comparison, we re-trained all three compared models using the public implementations and adjusted the training parameters to obtain the best segmentation results.

In addition to the aforementioned compared methods, we also performed an ablation analysis to directly show the numerical gains of the attention module design. We discarded the attention modules in our framework, and directly sent the MLF (the yellow layer in Fig. 3) through the ASPP module for the final prediction. We denote this model as the 3D customized FPN (cFPN). Four-fold cross-validation was conducted to evaluate the segmentation performance of the different models. (To ensure a fair comparison, the same hyper-parameter tuning was conducted for each network in the cross-validation. More specifically, we sampled over a range of hyper-parameters and trained each network; each network's performance shown in this paper was obtained with the hyper-parameters that produced on average the best performance over all four folds.)

The metrics employed to quantitatively evaluate segmentation included Dice Similarity Coefficient (Dice), Jaccard Index, Conformity Coefficient (CC), Average Distance of Boundaries (ADB, in voxels), 95% Hausdorff Distance (95HD, in voxels), Precision, and Recall [58], [59]. The metrics Dice, Jaccard and CC were used to evaluate the similarity between the segmented volume and the ground truth. The ADB measures the average over the shortest voxel distances between the segmented volume and the ground truth. The HD is the longest distance over the shortest distances between the segmented volume and the ground truth.

TABLE I
METRIC RESULTS OF DIFFERENT METHODS (MEAN ± SD, BEST RESULTS HIGHLIGHTED IN BOLD)

Metric    | 3D FCN [33]  | 3D U-Net [39] | BCRNN [41]   | 3D cFPN      | Ours
Dice      | 0.82 ± 0.04  | 0.84 ± 0.04   | 0.82 ± 0.04  | 0.88 ± 0.04  | 0.90 ± 0.03
Jaccard   | 0.70 ± 0.06  | 0.73 ± 0.06   | 0.70 ± 0.05  | 0.78 ± 0.06  | 0.82 ± 0.04
CC        | 0.56 ± 0.12  | 0.63 ± 0.11   | 0.56 ± 0.11  | 0.72 ± 0.10  | 0.78 ± 0.08
ADB       | 9.58 ± 2.65  | 8.27 ± 2.03   | 5.13 ± 1.13  | 6.12 ± 1.88  | 3.32 ± 1.15
95HD      | 25.11 ± 7.83 | 20.39 ± 4.74  | 11.57 ± 2.64 | 15.11 ± 5.03 | 8.37 ± 2.52
Precision | 0.81 ± 0.09  | 0.83 ± 0.08   | 0.87 ± 0.07  | 0.85 ± 0.08  | 0.90 ± 0.06
Recall    | 0.85 ± 0.09  | 0.88 ± 0.08   | 0.79 ± 0.08  | 0.92 ± 0.06  | 0.91 ± 0.04

TABLE II
P-VALUES FROM WILCOXON RANK-SUM TESTS BETWEEN OUR METHOD AND OTHER COMPARED METHODS ON DIFFERENT METRICS

Metric    | 3D FCN vs. Ours | 3D U-Net vs. Ours | BCRNN vs. Ours | 3D cFPN vs. Ours
Dice      | 10^-12          | 10^-10            | 10^-12         | 10^-3
Jaccard   | 10^-12          | 10^-10            | 10^-12         | 10^-3
CC        | 10^-12          | 10^-10            | 10^-12         | 10^-3
ADB       | 10^-14          | 10^-14            | 10^-8          | 10^-10
95HD      | 10^-14          | 10^-14            | 10^-7          | 10^-11
Precision | 10^-6           | 10^-6             | 0.03           | 10^-3
Recall    | 0.01            | 0.11              | 10^-8          | 0.08
Because HD is sensitive to outliers, we used the 95th percentile of the asymmetric HD instead of the maximum. Precision and Recall evaluate segmentations from the aspect of voxel-wise classification accuracy. All evaluation metrics were calculated in 3D. A better segmentation shall have smaller ADB and 95HD, and larger values of all other metrics.

C. Segmentation Performance

We first qualitatively illustrate the effectiveness of the proposed attention module for feature refinement. From Fig. 7, we can observe that our attentive map can indicate how much attention should be paid to the MLF for each SLF, and thus is able to select the useful complementary information from the MLF to refine each SLF correspondingly.

Table I summarizes the numerical results of all compared methods. It can be observed that our method consistently outperforms the others on almost all the metrics. Specifically, our method yielded a mean Dice value of 0.90, Jaccard of 0.82, CC of 0.78, ADB of 3.32 voxels, 95HD of 8.37 voxels, and Precision of 0.90. (Dice = 2(G∩S)/(G+S), Jaccard = (G∩S)/(G∪S), CC = 2 − (G∪S)/(G∩S), where S and G denote the segmented volume and the ground truth, respectively.) All these results are the best among all compared methods. Note that our method had the second best mean Recall value among all methods; our customized feature pyramid network achieved the best Recall value. However, except for the mean Recall value, our attention guided network outperforms the ablation model (i.e., the 3D cFPN) with regard to all the other metrics. Specifically, as shown in Table I, the mean Dice, Jaccard, CC, ADB, 95HD, and Precision values of the proposed attention guided network are approximately 2.57%, 4.58%, 8.18%, 45.74%, 44.61%, and 5.85% better than those of the ablation model without attention modules, respectively.
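As a concrete check of the overlap metrics defined above, the following NumPy example computes Dice, Jaccard and CC for two small binary volumes (array sizes and contents are arbitrary illustrations):

```python
# NumPy sketch of the overlap metrics: Dice = 2|G∩S|/(|G|+|S|),
# Jaccard = |G∩S|/|G∪S|, CC = 2 - |G∪S|/|G∩S|.
import numpy as np

def overlap_metrics(seg: np.ndarray, gt: np.ndarray):
    s, g = seg.astype(bool), gt.astype(bool)
    inter = np.logical_and(s, g).sum()
    union = np.logical_or(s, g).sum()
    dice = 2.0 * inter / (s.sum() + g.sum())
    jaccard = inter / union
    cc = 2.0 - union / inter  # conformity coefficient
    return dice, jaccard, cc

gt = np.zeros((4, 4, 4), dtype=bool)
gt[0:3, 0:3, 0:3] = True    # 27-voxel ground-truth cube
seg = np.zeros((4, 4, 4), dtype=bool)
seg[1:4, 1:4, 1:4] = True   # 27-voxel prediction, shifted by one voxel
dice, jaccard, cc = overlap_metrics(seg, gt)
# intersection = 2*2*2 = 8 voxels, union = 27 + 27 - 8 = 46 voxels
assert abs(dice - 16.0 / 54.0) < 1e-12
assert abs(jaccard - 8.0 / 46.0) < 1e-12
assert abs(cc - (2.0 - 46.0 / 8.0)) < 1e-12
```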
These comparisons between our method and the 3D cFPN demonstrate that the proposed attention module contributes to the improvement of TRUS segmentation. Although our customized 3D FPN architecture already consistently outperforms existing state-of-the-art segmentation methods on most metrics by leveraging useful multi-level features, the proposed attention module can more effectively exploit the complementary information encoded in the multi-level features to refine them for even better segmentation.

To investigate the statistical significance of the proposed method over the compared methods on each metric, a series of statistical analyses was conducted. First, a one-way analysis of variance (ANOVA) [60] was performed to evaluate whether the metric results of the different methods are statistically different. The resulting F-values were F_Dice = 34.85, F_Jaccard = 36.71, F_CC = 32.22, F_ADB = 71.73, F_95HD = 73.83, F_Precision = 7.88, and F_Recall = 18.80; all exceed the critical value F_critical = 2.42, indicating that for each metric the differences among the five methods are statistically significant. Based on the ANOVA results, the Wilcoxon rank-sum test was further employed to compare the segmentation performance of our method against each compared method. Table II lists the resulting p-values. From Table II, the null hypotheses for the four comparison pairs on the metrics of Dice, Jaccard, CC, ADB, 95HD, and Precision are rejected at the 0.05 level; our method can therefore be regarded as significantly better than the other four compared methods on these metrics. It is worth noting that the p-values of 3D U-Net vs. Ours and 3D cFPN vs. Ours on Recall are above the 0.05 level, which indicates that our method, 3D U-Net and 3D cFPN achieve similar performance with regard to Recall. In general, the results in Tables I and II demonstrate the effectiveness of our attention guided network for accurate TRUS segmentation.

Fig. 8. 2D visual comparisons of segmented slices from 3D TRUS volumes. Left: prostate TRUS slices with orange arrows indicating missing/ambiguous boundaries. Right: corresponding segmented prostate boundaries using our method (green), 3D FCN [33] (cyan), 3D U-Net [39] (gray), BCRNN [41] (purple) and 3D cFPN (red), respectively. Blue contours are ground truths extracted by an experienced clinician. Our method has the most similar segmented boundaries to the ground truths. Specifically, compared to our ablation study (red contours), the proposed attention module is beneficial for learning more discriminative features indicating the real prostate region and boundary. (We encourage you to zoom in for better visualization.)

Figs. 8, 9 and 10 visualize segmentation results in 2D and 3D. Fig. 8 compares segmented boundaries by the different methods in 2D TRUS slices. Our method obtains the segmented boundaries (green contours) most similar to the ground truths (blue contours). Furthermore, as shown in Fig. 8, our method can successfully infer missing/ambiguous boundaries, whereas the other compared methods, including 3D cFPN, tend to fail to generate high-quality segmentations when boundaries in the TRUS images are ambiguous or missing. These comparisons demonstrate that the proposed deep attentive features can efficiently aggregate complementary multi-level information for an accurate representation of the prostate tissue. Figs. 9 and 10 visualize 3D segmentation results by the different methods on two TRUS volumes. As shown in Fig.
9, our method produces the segmented surfaces most similar to the ground truths (blue surfaces). Fig. 10 further depicts the corresponding surface distances between the segmented surfaces and the ground truths. It can be observed that our method consistently achieves accurate and robust segmentation over the whole prostate region.

Given an input 3D TRUS volume of 170 × 132 × 80 voxels, the average computational times needed to perform a whole prostate segmentation with 3D FCN, 3D U-Net, BCRNN, 3D cFPN and our method are 1.10, 0.34, 31.09, 0.24 and 0.30 seconds, respectively. Our method is faster than 3D FCN, 3D U-Net and BCRNN.

IV. DISCUSSION

In this paper, an attention guided neural network which generates attentive features for the segmentation of 3D TRUS volumes is presented. Accurate and robust prostate segmentation in TRUS images remains very challenging, mainly due to the missing/ambiguous boundary of the prostate in TRUS. Conventional methods mainly employ prior shape information to constrain the segmentation, or design hand-crafted features to identify prostate regions; these generally fail to faithfully delineate boundaries when boundaries are missing or ambiguous in TRUS images [11]. Because convolutional neural network approaches have proven very powerful at learning multi-level features for effective object segmentation [37], we are motivated to develop a CNN based method to tackle these challenging issues in TRUS segmentation. To the best of our knowledge, ours is the first work to utilize a 3D CNN with attention mechanisms to refine multi-level features for better TRUS segmentation.

Deep convolutional neural networks have achieved superior performance in many image computing and vision tasks, owing to their ability to generate multi-level features that contain abundant semantic and fine-grained information.
However, how to leverage the complementary advantages of multi-level features and learn more discriminative features for image segmentation remains a key open issue. As shown in Figs. 2, 7 and 8, directly applying multi-level convolutional features without distinction for TRUS segmentation tends to include non-prostate regions (due to low-level details from shallow layers) or to lose details of prostate boundaries (due to high-level semantics from deep layers). To address this issue, we propose an attention guided network to select more discriminative features for TRUS segmentation. Our attention module leverages the MLF as a rich feature pool to refine each SLF, by learning a set of weights that indicate the importance of the MLF for the specific SLF. Table I and Figs. 8, 9 and 10 all demonstrate that our attention module is useful for improving multi-level features for 3D TRUS segmentation.

Fig. 9. 3D visualization of the segmentation results on two TRUS volumes. Rows indicate segmentation results on different TRUS data. Columns indicate the comparisons between the ground truth (blue surface) and segmented surfaces (red) using (a) 3D FCN [33], (b) 3D U-Net [39], (c) BCRNN [41], (d) 3D cFPN, and (e) our method, respectively. Our method has the most similar segmented surfaces to the ground truths.

Fig. 10. 3D visualization of the surface distance (in voxels) between segmented surface and ground truth. Different colors represent different surface distances. Rows indicate segmented surfaces on different TRUS data. Columns indicate the segmented surfaces obtained by (a) 3D FCN [33], (b) 3D U-Net [39], (c) BCRNN [41], (d) 3D cFPN, and (e) our method, respectively. Our method consistently performs well on the whole prostate surface.
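The weighting idea behind the attention module can be caricatured in a few lines of NumPy. This is an illustrative sketch only, not the paper's implementation: the learned convolutions that produce the attention logits in the actual network are replaced here by a simple SLF–MLF similarity score, and the feature maps are assumed to share one spatial shape.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def refine_slf(slf, mlf_levels):
    """Refine a single-layer feature (SLF) map with multi-level features (MLF).

    Each candidate level receives an attention weight (here derived from its
    inner product with the SLF); the weighted sum of levels is added back to
    the SLF as the selected complementary information.
    """
    scores = np.array([float((slf * m).sum()) for m in mlf_levels])
    weights = softmax(scores)                 # importance of each MLF level
    refined = slf + sum(w * m for w, m in zip(weights, mlf_levels))
    return refined, weights
```

The design point this illustrates is that the weights are computed per SLF, so each layer draws a different mixture of details and semantics from the shared MLF pool.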
More generally, the proposed attention module provides a feasible way to incorporate low-level details and high-level semantics for better feature representation. As a generic feature refinement architecture, it could therefore serve as a beneficial component of other segmentation/detection networks to improve their performance.

Considering the issue of missing/ambiguous prostate boundaries in TRUS images, we adopt a hybrid loss function that combines the binary cross-entropy loss and the Dice loss for our segmentation network. The binary cross-entropy loss is preferred for preserving boundary details, while the Dice loss emphasizes global shape similarity to generate compact segmentations. The hybrid loss is therefore beneficial for leveraging both local and global shape similarity. This hybrid loss could be useful for other segmentation tasks, and we will further explore it in future work.

Although the proposed method achieves satisfactory performance in the experiments, there is still one important limitation in this study. The experiments were based on a four-fold cross-validation study with only forty TRUS volumes. In each fold, test data were held out while the data from the remaining patients were used for training. The cross-validation was also used to identify hyper-parameters that generalize well across the samples learned from in each fold. Such a cross-validation approach on forty samples may have caused over-fitting to the training samples. Future studies will therefore focus on evaluating the generalizability of the approach on a larger dataset, with data properly divided into mutually exclusive training, validation and test subsets.

V. CONCLUSION

This paper develops a 3D attention guided neural network with a novel scheme for prostate segmentation in 3D transrectal ultrasound images by harnessing deep attentive features. Our key idea is to select useful complementary information from the multi-level features to refine the features at each individual layer. We achieve this by developing an attention module, which automatically learns, via an attention mechanism, a set of weights indicating the importance of the features in the MLF for each individual layer. To the best of our knowledge, we are the first to utilize attention mechanisms to refine multi-level features for better 3D TRUS segmentation. Experiments on challenging TRUS volumes show that our segmentation using deep attentive features achieves satisfactory performance. In addition, the proposed attention mechanism is a general strategy for aggregating multi-level features and has the potential to be used for other medical image segmentation and detection tasks.

ACKNOWLEDGMENT

The authors would like to thank the Associate Editor and the anonymous reviewers for their constructive comments.

REFERENCES

[1] R. L. Siegel, K. D. Miller, and A. Jemal, "Cancer statistics, 2018," CA: A Cancer Journal for Clinicians, vol. 68, no. 1, pp. 7–30, 2018.
[2] F. Pinto, A. Totaro, A. Calarco, E. Sacco, A. Volpe, M. Racioppi, A. D'Addessi, G. Gulino, and P. Bassi, "Imaging in prostate cancer diagnosis: present role and future perspectives," Urologia Internationalis, vol. 86, no. 4, pp. 373–382, 2011.
[3] H. Hricak, P. L. Choyke, S. C. Eberhardt, S. A. Leibel, and P. T. Scardino, "Imaging prostate cancer: a multidisciplinary perspective," Radiology, vol. 243, no. 1, pp. 28–53, 2007.
[4] Y. Wang, J.-Z. Cheng, D. Ni, M. Lin, J. Qin, X. Luo, M. Xu, X. Xie, and P. A. Heng, "Towards personalized statistical deformable model and hybrid point matching for robust MR-TRUS registration," IEEE Transactions on Medical Imaging, vol. 35, no. 2, pp. 589–604, 2016.
[5] P. Yan, S. Xu, B. Turkbey, and J. Kruecker, "Discrete deformable model guided by partial active shape model for TRUS image segmentation," IEEE Transactions on Biomedical Engineering, vol. 57, no. 5, pp. 1158–1166, 2010.
[6] B. J. Davis, E. M. Horwitz, W. R. Lee, J. M. Crook, R. G. Stock, G. S. Merrick, W. M. Butler, P. D. Grimm, N. N. Stone, L. Potters et al., "American brachytherapy society consensus guidelines for transrectal ultrasound-guided permanent prostate brachytherapy," Brachytherapy, vol. 11, no. 1, pp. 6–19, 2012.
[7] D. K. Bahn, F. Lee, R. Badalament, A. Kumar, J. Greski, and M. Chernick, "Targeted cryoablation of the prostate: 7-year outcomes in the primary treatment of prostate cancer," Urology, vol. 60, no. 2, pp. 3–11, 2002.
[8] Y. Hu, H. U. Ahmed, Z. Taylor, C. Allen, M. Emberton, D. Hawkes, and D. Barratt, "MR to ultrasound registration for image-guided prostate interventions," Medical Image Analysis, vol. 16, no. 3, pp. 687–703, 2012.
[9] Y. Wang, Q. Zheng, and P. A. Heng, "Online robust projective dictionary learning: Shape modeling for MR-TRUS registration," IEEE Transactions on Medical Imaging, vol. 37, no. 4, pp. 1067–1078, 2018.
[10] J. A. Noble and D. Boukerroui, "Ultrasound image segmentation: a survey," IEEE Transactions on Medical Imaging, vol. 25, no. 8, pp. 987–1010, 2006.
[11] S. Ghose, A. Oliver, R. Martí, X. Lladó, J. C. Vilanova, J. Freixenet, J. Mitra, D. Sidibé, and F. Meriaudeau, "A survey of prostate segmentation methodologies in ultrasound, magnetic resonance and computed tomography images," Computer Methods and Programs in Biomedicine, vol. 108, no. 1, pp. 262–287, 2012.
[12] H. M. Ladak, F. Mao, Y. Wang, D. B. Downey, D. A. Steinman, and A. Fenster, "Prostate boundary segmentation from 2D ultrasound images," Medical Physics, vol. 27, no. 8, pp. 1777–1788, 2000.
[13] S. D. Pathak, D. Haynor, and Y. Kim, "Edge-guided boundary delineation in prostate ultrasound images," IEEE Transactions on Medical Imaging, vol. 19, no. 12, pp. 1211–1219, 2000.
[14] A. Ghanei, H. Soltanian-Zadeh, A. Ratkewicz, and F.-F. Yin, "A three-dimensional deformable model for segmentation of human prostate from ultrasound images," Medical Physics, vol. 28, no. 10, pp. 2147–2153, 2001.
[15] D. Shen, Y. Zhan, and C. Davatzikos, "Segmentation of prostate boundaries from ultrasound images using statistical shape model," IEEE Transactions on Medical Imaging, vol. 22, no. 4, pp. 539–551, 2003.
[16] Y. Wang, H. N. Cardinal, D. B. Downey, and A. Fenster, "Semiautomatic three-dimensional segmentation of the prostate using two-dimensional ultrasound images," Medical Physics, vol. 30, no. 5, pp. 887–897, 2003.
[17] N. Hu, D. B. Downey, A. Fenster, and H. M. Ladak, "Prostate boundary segmentation from 3D ultrasound images," Medical Physics, vol. 30, no. 7, pp. 1648–1659, 2003.
[18] L. Gong, S. D. Pathak, D. R. Haynor, P. S. Cho, and Y. Kim, "Parametric shape modeling using deformable superellipses for prostate segmentation," IEEE Transactions on Medical Imaging, vol. 23, no. 3, pp. 340–349, 2004.
[19] S. Badiei, S. E. Salcudean, J. Varah, and W. J. Morris, "Prostate segmentation in 2D ultrasound images using image warping and ellipse fitting," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2006, pp. 17–24.
[20] I. B. Tutar, S. D. Pathak, L. Gong, P. S. Cho, K. Wallner, and Y. Kim, "Semiautomatic 3-D prostate segmentation from TRUS images using spherical harmonics," IEEE Transactions on Medical Imaging, vol. 25, no. 12, pp. 1645–1654, 2006.
[21] Y. Zhan and D. Shen, "Deformable segmentation of 3-D ultrasound prostate images using statistical texture matching method," IEEE Transactions on Medical Imaging, vol. 25, no. 3, pp. 256–272, 2006.
[22] P. Yan, S. Xu, B. Turkbey, and J. Kruecker, "Adaptively learning local shape statistics for prostate segmentation in ultrasound," IEEE Transactions on Biomedical Engineering, vol. 58, no. 3, pp. 633–641, 2011.
[23] S. Ghose, A. Oliver, J. Mitra, R. Martí, X. Lladó, J. Freixenet, D. Sidibé, J. C. Vilanova, J. Comet, and F. Meriaudeau, "A supervised learning framework of statistical shape and probability priors for automatic prostate segmentation in ultrasound images," Medical Image Analysis, vol. 17, no. 6, pp. 587–600, 2013.
[24] W. Qiu, J. Yuan, E. Ukwatta, Y. Sun, M. Rajchl, and A. Fenster, "Prostate segmentation: an efficient convex optimization approach with axial symmetry using 3-D TRUS and MR images," IEEE Transactions on Medical Imaging, vol. 33, no. 4, pp. 947–960, 2014.
[25] C. Santiago, J. C. Nascimento, and J. S. Marques, "2D segmentation using a robust active shape model with the EM algorithm," IEEE Transactions on Image Processing, vol. 24, no. 8, pp. 2592–2601, 2015.
[26] P. Wu, Y. Liu, Y. Li, and B. Liu, "Robust prostate segmentation using intrinsic properties of TRUS images," IEEE Transactions on Medical Imaging, vol. 34, no. 6, pp. 1321–1335, 2015.
[27] X. Li, C. Li, A. Fedorov, T. Kapur, and X. Yang, "Segmentation of prostate from ultrasound images using level sets on active band and intensity variation across edges," Medical Physics, vol. 43, no. 6Part1, pp. 3090–3103, 2016.
[28] X. Yang, P. J. Rossi, A. B. Jani, H. Mao, W. J. Curran, and T. Liu, "3D transrectal ultrasound (TRUS) prostate segmentation based on optimal feature learning framework," in Medical Imaging 2016: Image Processing, vol. 9784. International Society for Optics and Photonics, 2016, p. 97842F.
[29] L. Zhu, C.-W. Fu, M. S. Brown, and P.-A. Heng, "A non-local low-rank framework for ultrasound speckle reduction," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5650–5658.
[30] L. Ma, R. Guo, Z. Tian, and B. Fei, "A random walk-based segmentation framework for 3D ultrasound images of the prostate," Medical Physics, vol. 44, no. 10, pp. 5128–5142, 2017.
[31] D. Ciresan, A. Giusti, L. M. Gambardella, and J. Schmidhuber, "Deep neural networks segment neuronal membranes in electron microscopy images," in Advances in Neural Information Processing Systems, 2012, pp. 2843–2851.
[32] J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85–117, 2015.
[33] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3431–3440.
[34] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.
[35] P. Liskowski and K. Krawiec, "Segmenting retinal blood vessels with deep neural networks," IEEE Transactions on Medical Imaging, vol. 35, no. 11, pp. 2369–2380, 2016.
[36] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.-M. Jodoin, and H. Larochelle, "Brain tumor segmentation with deep neural networks," Medical Image Analysis, vol. 35, pp. 18–31, 2017.
[37] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2018.
[38] Y. Guo, Y. Gao, and D. Shen, "Deformable MR prostate segmentation via deep feature learning and sparse patch matching," IEEE Transactions on Medical Imaging, vol. 35, no. 4, pp. 1077–1089, 2016.
[39] N. Ghavami, Y. Hu, E. Bonmati, R. Rodell, E. Gibson, C. Moore, and D. Barratt, "Automatic slice segmentation of intraoperative transrectal ultrasound images using convolutional neural networks," in Medical Imaging 2018: Image-Guided Procedures, Robotic Interventions, and Modeling, vol. 10576. International Society for Optics and Photonics, 2018, p. 1057603.
[40] N. Ghavami, Y. Hu, E. Bonmati, R. Rodell, E. Gibson et al., "Integration of spatial information in convolutional neural networks for automatic segmentation of intraoperative transrectal ultrasound images," Journal of Medical Imaging, vol. 6, no. 1, p. 011003, 2018.
[41] X. Yang, L. Yu, L. Wu, Y. Wang, D. Ni, J. Qin, and P.-A. Heng, "Fine-grained recurrent neural networks for automatic prostate segmentation in ultrasound images," in AAAI Conference on Artificial Intelligence, 2017, pp. 1633–1639.
[42] D. Karimi, Q. Zeng, P. Mathur, A. Avinash, S. Mahdavi, I. Spadinger, P. Abolmaesumi, and S. Salcudean, "Accurate and robust segmentation of the clinical target volume for prostate brachytherapy," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 531–539.
[43] E. M. A. Anas, S. Nouranian, S. S. Mahdavi, I. Spadinger, W. J. Morris, S. E. Salcudean, P. Mousavi, and P. Abolmaesumi, "Clinical target-volume delineation in prostate brachytherapy using residual neural networks," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2017, pp. 365–373.
[44] E. M. A. Anas, P. Mousavi, and P. Abolmaesumi, "A deep learning approach for real time prostate segmentation in freehand ultrasound guided biopsy," Medical Image Analysis, vol. 48, pp. 107–116, 2018.
[45] Y. Wang, Z. Deng, X. Hu, L. Zhu, X. Yang, X. Xu, P.-A. Heng, and D. Ni, "Deep attentional features for prostate segmentation in ultrasound," in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2018, pp. 523–530.
[46] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated residual transformations for deep neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2017, pp. 5987–5995.
[47] F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions," arXiv preprint arXiv:1511.07122, 2015.
[48] T.-Y. Lin, P. Dollár, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie, "Feature pyramid networks for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117–2125.
[49] S. Xie and Z. Tu, "Holistically-nested edge detection," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1395–1403.
[50] Q. Dou, L. Yu, H. Chen, Y. Jin, X. Yang, J. Qin, and P.-A. Heng, "3D deeply supervised network for automated segmentation of volumetric medical images," Medical Image Analysis, vol. 41, pp. 40–54, 2017.
[51] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, "Rethinking atrous convolution for semantic image segmentation," arXiv preprint arXiv:1706.05587, 2017.
[52] Y. Wu and K. He, "Group normalization," in Proceedings of the European Conference on Computer Vision, September 2018.
[53] J. Hu, L. Shen, and G. Sun, "Squeeze-and-excitation networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7132–7141.
[54] H. Zhao, Y. Zhang, S. Liu, J. Shi, C. Change Loy, D. Lin, and J. Jia, "PSANet: Point-wise spatial attention network for scene parsing," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 267–283.
[55] K. He, X. Zhang, S. Ren, and J. Sun, "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.
[56] F. Milletari, N. Navab, and S.-A. Ahmadi, "V-net: Fully convolutional neural networks for volumetric medical image segmentation," in 2016 Fourth International Conference on 3D Vision (3DV). IEEE, 2016, pp. 565–571.
[57] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[58] H.-H. Chang, A. H. Zhuang, D. J. Valentino, and W.-C. Chu, "Performance measure characterization for evaluating neuroimage segmentation algorithms," NeuroImage, vol. 47, no. 1, pp. 122–135, 2009.
[59] G. Litjens, R. Toth, W. van de Ven, C. Hoeks, S. Kerkstra, B. van Ginneken, G. Vincent, G. Guillard, N. Birbeck, J. Zhang et al., "Evaluation of prostate segmentation algorithms for MRI: the PROMISE12 challenge," Medical Image Analysis, vol. 18, no. 2, pp. 359–373, 2014.
[60] J. Neter, W. Wasserman, and M. H. Kutner, Applied Linear Statistical Models: Regression, Analysis of Variance, and Experimental Designs, 3rd ed. R. D. Irwin, 1985.
