A Deep Learning Framework for Single-Sided Sound Speed Inversion in Medical Ultrasound
Authors: Micha Feigin, Daniel Freedman, Brian W. Anthony
Abstract—Objective: Ultrasound elastography is gaining traction as an accessible and useful diagnostic tool for applications such as cancer detection and differentiation and thyroid disease diagnostics. Unfortunately, state-of-the-art shear wave imaging techniques, essential to promote this goal, are limited to high-end ultrasound hardware due to high power requirements; are extremely sensitive to patient and sonographer motion; and generally suffer from low frame rates. Motivated by research and theory showing that longitudinal-wave sound speed carries similar diagnostic abilities to shear wave imaging, we present an alternative approach using single-sided pressure-wave sound speed measurements from channel data. Methods: In this paper, we present a single-sided sound speed inversion solution using a fully convolutional deep neural network. We use simulations for training, allowing the generation of limitless ground truth data. Results: We show that it is possible to invert for longitudinal sound speed in soft tissue at high frame rates. We validate the method on simulated data and present highly encouraging results on limited real data. Conclusion: Sound speed inversion on channel data has significant potential, made possible in real time with deep learning technologies. Significance: Specialized shear wave ultrasound systems remain inaccessible in many locations. Longitudinal sound speed and deep learning technologies enable an alternative approach to diagnosis based on tissue elasticity. High frame rates are possible.

Index Terms—deep learning, inverse problems, ultrasound, sound speed inversion

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. M. Feigin and B. W. Anthony are with the Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA. D. Freedman is with Google Research, Haifa, Israel.

Fig. 1. Goal: the target of this work is to take raw ultrasound channel data (a) and, in addition to the standard B-mode image (b), also produce the corresponding tissue sound speed map (c).

I. INTRODUCTION

Mechanical tissue properties, tissue structures, and the spatial arrangement of properties and structures are useful in disease diagnosis in various organs, including the kidneys [1], [2], thyroid, muscle, breast [3], [4], liver [5], [6], and prostate. Tracking changes in tissue properties, tissue structure, and the spatial distribution of both is useful for monitoring disease progression as well as response to therapeutic interventions.

As a clinical imaging modality, ultrasound differs from modalities such as CT and MRI in that it uses non-ionizing radiation, is mobile, and has significantly lower purchase and operating costs than most other medical imaging alternatives. The mode of operation is also quite different, as an interactive exploratory approach is taken.
The operator can move the probe around, vary applied pressure, and adapt to findings in real time, making real-time quantitative diagnosis techniques that much more important. On the downside, different tissue types are not easily differentiated in the images, so more experience is required to interpret them.

Embedded in ultrasound signals is information about the mechanical and acoustic properties of the tissue through which the ultrasound waves have propagated or from which they have been reflected. These properties include the longitudinal-wave speed of sound, shear-wave speed of sound, tissue density, attenuation, shear modulus, and bulk modulus. As part of the classical B-mode imaging process, however, significant parts of this information are discarded through the application of beamforming (delay-and-sum focusing) and envelope detection.

In this work, we exploit the information embedded in the raw ultrasound channel data signal. As depicted in Fig. 1, we take ultrasound channel data and generate the sound speed map of the tissue, without going through an explicit imaging or beamforming step. The resulting information is useful both directly, for diagnostic purposes, and indirectly, for improving the image formation process and final image quality and for correcting refraction errors. Our approach relies on the power of deep convolutional neural networks. We present validation results on simulation data as well as encouraging initial results on real data.

The past several years have seen a burst of interest in the use of artificial intelligence (AI) for improving the physician's workflow, from automatic analysis and improvement of medical images to the incorporation of medical data and physician notes into diagnostics. Considerable research has gone into analyzing image attributes for disease biomarkers. However, the majority of this research effort has taken the imaging process itself as a given and has focused on processing the images coming from a fixed process. There has been considerably less work on the use of deep learning for the direct analysis and processing of raw signals. In ultrasound, the raw signals are the waveforms coming from the individual transducer elements, called channel data. This work is a first step towards learning a full waveform solver for recovering elastic and viscoelastic tissue parameters using deep learning, and it shows the viability of this approach. Despite numerous applications of deep learning to inverse problems in various image domains, this is the first work we are aware of that applies a deep learning framework to the analysis of raw time-domain RF-type signals.

An advantage of our approach is that it can work at real-time frame rates, with the current implementation running at over 150 fps on a single NVIDIA 1080Ti GPU. It requires only a small number of transmits and can therefore run in parallel to standard imaging. The physical limitation on frame rates is a function of tissue depth, but is on the order of thousands of frames per second. This opens the door to dynamic functional imaging.
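To make that frame-rate ceiling concrete, the short sketch below estimates the acoustic limit from the round-trip time of flight. It is a back-of-the-envelope illustration only; the imaging depth, sound speed, and number of transmits per frame are assumptions chosen to match the values discussed in this paper, not quantities fixed by it.

```python
# Rough estimate of the acoustic frame-rate ceiling (illustrative values, not from the paper).
def max_frame_rate(depth_m: float, c_m_s: float = 1540.0, n_transmits: int = 3) -> float:
    """Frames per second limited only by the two-way time of flight per transmit."""
    round_trip_s = 2.0 * depth_m / c_m_s          # time for one transmit to reach depth and return
    return 1.0 / (n_transmits * round_trip_s)     # one frame requires n_transmits such round trips

if __name__ == "__main__":
    # 4 cm imaging depth, 1540 m/s, 3 plane waves per frame -> roughly 6.4 kHz,
    # consistent with the "thousands of frames per second" quoted above.
    print(f"{max_frame_rate(0.04):.0f} fps")
```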
II. BACKGROUND AND PRIOR WORK

A. General Background

Our goal in this work is to measure physical tissue properties with diagnostic relevance. The currently deployed approach in medical ultrasound is shear wave elastography [7], [8], [9]. Shear wave imaging is based on measuring the speed at which the shear wave front propagates in tissue. The shear wave speed is directly dependent on the shear modulus; it is used to approximate Young's modulus by employing several assumptions, mainly isotropy, incompressibility, and a known density. Young's modulus is the value most closely related to what we intuitively perceive as material stiffness, and is thus related to the physician's notion of palpation-based diagnostics.

An alternative approach, used mostly in the field of breast imaging, is travel time tomography [10], [11]. Here, the speed of compression (longitudinal) waves in tissue is measured based on the acoustic travel times between known location pairs. Longitudinal speed of sound measurements can be viewed, depending on the case at hand, as either an alternative or a complementary approach to shear wave imaging, as the longitudinal sound speed is also related to Young's modulus. Among other things, variations of longitudinal sound speed in fat are more distinct than variations of shear sound speed, making it relevant for diagnosing liver and kidney diseases, such as non-alcoholic fatty liver disease (NAFLD), as well as degenerative muscle diseases such as Duchenne's muscular dystrophy.

In this work, we take the approach of recovering speed of sound information. We, however, bypass tomographic imaging and take the path of single-sided sound speed recovery, similar to seismic imaging [12]. We use numerical simulations to train a deep learning network to extract the speed of sound information present in the raw reflected pressure waves. Examples of information regarding sound speed, independent of comparing geometric travel distances to travel times, include the geometry and wavefront deformation due to sound speed and refraction, the presence and appearance of head waves (waves traveling along the interface with a higher velocity medium), the critical reflection angle, the reflection amplitude and sign (positive or negative reflection), as well as the variation of the reflection coefficient as a function of the angle of incidence, called amplitude versus angle (AVA) or amplitude versus offset (AVO). Probably the easiest of these to understand is the wavefront geometry of a point scatterer. These wavefronts appear as hyperbolas in the space/time plot, similar to those seen in Fig. 1a; a short sketch of this relation appears at the end of this subsection. The angle between the asymptotes depends only on the sound speed, while the structure of the apex also depends on the distance from the probe to the scatterer. Refracted waves deform the wavefront, providing both a source of information on sound speed and motivation for being able to correct for sound speed variations.

Next, we highlight specific aspects of prior work. We begin with an overview of the physics and techniques for shear wave elastography and tomographic ultrasound elastography. We summarize the clinical motivation for using these metrics. Finally, we give an overview of deep learning in the context of medical imaging.
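The following minimal sketch illustrates the hyperbolic echo signature of a point scatterer mentioned above, for the simple monostatic (co-located transmit/receive) case; the geometry and numerical values are assumptions made purely for illustration, not parameters of this work.

```python
import numpy as np

# Two-way travel time of a point-scatterer echo versus lateral element position
# (monostatic approximation): t(x) = 2*sqrt(z^2 + (x - x0)^2)/c.
# Far from the apex, t(x) ~ 2*|x - x0|/c, so the asymptote slope depends only on c,
# while the apex time (t = 2*z/c) also encodes the scatterer depth z.
def echo_time(x_m, x0_m=0.0, z_m=0.02, c_m_s=1540.0):
    return 2.0 * np.sqrt(z_m**2 + (x_m - x0_m)**2) / c_m_s

x = np.linspace(-0.02, 0.02, 5)            # element positions across a ~4 cm aperture
print(np.round(echo_time(x) * 1e6, 2))     # times in microseconds, tracing out the hyperbola
```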
B. Elastography and Full Waveform Inversion

The prevailing model for ultrasound elastography of soft tissue is that of a linear isotropic elastic material [7]. While this model does not account for non-linear effects, it is still useful for diagnostic purposes for many soft tissues. Under this model, tissue properties can be described using density and two independent elastic coefficients. Some of the commonly used pairs are Young's modulus (E) and the Poisson ratio (ν), the bulk modulus (K) and the shear modulus (G), and the two Lamé parameters (λ and µ, where µ is the shear modulus). Young's modulus is most often used to describe tissue stiffness. The complementary elastic parameter, either the Poisson ratio or the bulk modulus, as well as the density, are often assumed to be constant in soft tissue imaging, dominated by the tissue's high water content.

The pressure wave (also known as the p-wave, primary wave, or longitudinal wave) is the acoustic wave used for ultrasound imaging and travels at 1540 m/s ± 10% on average in soft tissue. The shear wave (also known as the s-wave, secondary wave, or transverse wave) is measured indirectly in ultrasound elastography, using pressure waves, and is much slower: it travels at velocities on the order of 3 m/s in healthy tissue, and up to 60 m/s in highly pathological tissue. When working in the linear acoustics regime in soft tissue, as is the case for medical ultrasound imaging, the pressure waves and the two orthogonally polarized shear waves are independent, and only distinctly couple at strong discontinuities. Note, however, that this independence is only partially correct, as the pressure waves actually "see" the displacement caused by the shear waves. This, together with the vast sound speed difference between the two wave types, allows the use of pressure waves to image shear wave propagation and is the basis for shear wave imaging. Common methods of shear wave generation include acoustic radiation force impulse (ARFI) [8] and supersonic shear wave imaging [13], [14]. A mechanical shear wave is generated in the tissue and its propagation speed is tracked using pressure waves. These methods, however, are limited to high-end devices due to high power and probe requirements. They also generally suffer from low frame rates, long settling times, and high sensitivity to sonographer and subject motion.

Tomographic ultrasound imaging for travel time tomography and full waveform inversion (FWI) are related and actively researched techniques. Travel time tomography measures first arrival times between a set of known transmitter-receiver pairs. This travel time depends on the integral of slowness (the reciprocal of the sound speed) along the geodesic. FWI performs optimization on a tissue model to minimize the residual between the measured signal and the simulation, and is not dependent on knowing travel distances. This research is currently mostly focused on breast [10], [11], [15], [16], [17], [18], [19] and musculoskeletal [20], [21] imaging, both showing promising prospects. Current in vivo implementations require a full circumferential field of view, limiting them to small body parts. Both are computationally expensive, with FWI being more costly. FWI is sensitive to noise and to the choice of initial conditions and has a harder time with piecewise constant velocities, while travel time tomography is sensitive to lensing effects. One example is limb imaging, where the first arrival travels in the bone layer around the marrow, shadowing the signal traveling through the marrow and making it difficult to impossible to image the bone marrow.
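As a small illustration of the slowness integral used in travel time tomography, the sketch below computes the first-arrival time along a straight ray through piecewise-constant layers; the layer thicknesses and speeds are made-up values for illustration only, and real tomography must of course also account for refraction along curved rays.

```python
# Straight-ray travel time through piecewise-constant layers:
# t = sum_i (thickness_i / c_i), i.e. the integral of slowness (1/c) along the path.
layers = [  # (thickness in meters, sound speed in m/s) -- illustrative values only
    (0.010, 1450.0),   # subcutaneous fat
    (0.020, 1580.0),   # muscle
    (0.010, 1480.0),   # deeper soft tissue
]

travel_time_s = sum(thickness / c for thickness, c in layers)
print(f"first arrival ~ {travel_time_s * 1e6:.1f} microseconds")
```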
Single-sided techniques are in use in the seismic domain, but often require operator intervention for good results. Their use in medical ultrasound imaging is limited, and is generally based on focusing techniques or PSF analysis on the final B-mode image rather than on an inversion method [22], [23], [24], [25], [26]. One of the few exceptions is the CUTE method, which uses the wavefront deformation seen in the reflection pattern of scatterers [27], [28].

The motivation for these approaches can be understood by looking at the dependence of the shear wave and longitudinal wave sound speeds on the underlying physical properties:

$$C_{\mathrm{longitudinal}} = \sqrt{\frac{K + \frac{4}{3}G}{\rho}} \qquad (1)$$

$$C_{\mathrm{shear}} = \sqrt{\frac{G}{\rho}} \qquad (2)$$

where C denotes the appropriate speed of sound. Under the previously stated assumption of a constant bulk modulus and density, at least to a first order approximation, both squared velocities depend linearly on the same single value, the shear modulus, which is in turn directly related to the value of interest, Young's modulus.
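To make the scale difference implied by Eqs. (1) and (2) concrete, the sketch below evaluates both speeds for representative soft-tissue values; the specific moduli and density are textbook-order-of-magnitude assumptions, not measurements from this work, and simply reproduce the roughly three-orders-of-magnitude gap between the ~1540 m/s and ~3 m/s figures quoted above.

```python
import math

def wave_speeds(K_pa: float, G_pa: float, rho_kg_m3: float):
    """Longitudinal and shear wave speeds per Eqs. (1) and (2)."""
    c_long = math.sqrt((K_pa + 4.0 / 3.0 * G_pa) / rho_kg_m3)
    c_shear = math.sqrt(G_pa / rho_kg_m3)
    return c_long, c_shear

# Assumed representative soft-tissue values: K ~ 2.2 GPa, G ~ 3 kPa, rho ~ 1000 kg/m^3.
c_l, c_s = wave_speeds(2.2e9, 3e3, 1000.0)
print(f"c_longitudinal ~ {c_l:.0f} m/s, c_shear ~ {c_s:.1f} m/s")
```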
C. Clinical Motivation

Short of attaining fully automated diagnostics, the next best thing is to solve the inverse problem of measuring physical tissue properties. Focus is given to properties that can be used directly by the physician as reliable disease biomarkers. Achieving that end in an accessible and easily undertaken way can greatly improve the physician's workflow as well as make quality health care much more accessible. Again, current research is split in two main directions: shear wave elastography and ultrasound tomography, the latter comprising both travel time tomography and full waveform inversion.

Table I presents Young's modulus values for several healthy and pathological types of breast tissue, as given by [29], [30]. Young's modulus has a strong predictive value for detecting and differentiating pathological tissue. Researchers have also shown that longitudinal wave sound speed has similar diagnostic ability to shear wave imaging [11], [31], [32], [33], [34], [35]. Some of these results, taken on breast tissue, are presented in Table II (from [33]).

TABLE I: Young's modulus as a disease biomarker for various breast tissue types [29], [30]

Breast tissue type            | # of samples | Young's modulus (kPa), mean ± STD
Normal fat                    | 71           | 3.25 ± 0.91
Normal fibroglandular tissue  | 26           | 3.24 ± 0.61
Fibroadenoma                  | 16           | 6.41 ± 2.86
Low-grade IDC                 | 12           | 10.40 ± 2.60
ILC                           | 4            | 15.62 ± 2.64
DCIS                          | 4            | 16.38 ± 1.55
Fibrocystic disease           | 4            | 17.11 ± 7.35
Intermediate-grade IDC        | 21           | 19.99 ± 4.2
High-grade IDC                | 9            | 42.52 ± 12.47
IMC                           | 1            | 20.21
Fat necrosis                  | 1            | 4.45

TABLE II: Longitudinal sound speed as a disease biomarker for various breast tissue types [33]

Breast tissue type         | Sound speed (m/s)
Normal fat                 | 1442 ± 9
Breast parenchyma          | 1487 ± 21
Benign breast lesions      | 1513 ± 27
Malignant breast lesions   | 1548 ± 17

Using the longitudinal speed of sound as a substitute for the transverse speed of sound presents several potential advantages. Longitudinal waves travel significantly faster in tissue than transverse waves, allowing for much higher frame rates. Transverse waves cannot be detected directly by the probe, due to their strong attenuation in tissue along with the low sensitivity of the sensor elements to transverse motion. As a result, they are only imaged indirectly, based on their effect on longitudinal waves. The particle motion that is detected is on the order of 10 microns, about 1/30 of a wavelength, resulting in measurements that are highly sensitive to probe and subject motion. The amount of energy required to generate shear waves using acoustic radiation force is also high, requiring correspondingly high-powered devices. This, in turn, limits the technology to high-end ultrasound machines; furthermore, frame rates must be lowered due to FDA limitations on transmission power, tissue heating, and long settling times.

D. Deep Learning

The astounding success of deep learning in fields including computer vision, speech recognition, and natural language processing is by now widely known. Neural networks have achieved state-of-the-art results on many benchmarks within each of these fields. In most cases, the networks in question are relatively deep (tens or hundreds of layers) and are trained by stochastic gradient descent or related techniques, as implemented by the standard backpropagation algorithm.

Deep learning has also achieved great success in medical imaging on standard computer vision tasks, such as classification [36], detection [37], and segmentation [38]. However, only recently has deep learning been applied to problems in sensing and image reconstruction. The growing popularity of this trend is exemplified by the recent special issue (June 2018) of TMI, which was devoted to this topic. The issue contained many papers related to both CT and MRI. Within the realm of CT, a variety of topics were examined, including artifact reduction [39], denoising [40], [41], and sparse-view [42], [43] and low-dose [44], [45] reconstruction. Papers on MRI tended to focus on deep learning approaches to compressive sensing [46], [47], [48]. Deep learning has also been applied, though not quite as widely, to PET [49], [50] and photoacoustic tomography [51], [52]. Furthermore, we note that in the broader signal processing community, work has been devoted to applying deep learning techniques to general reconstruction problems, such as compressive sensing [53] and phase retrieval [54].

Within the field of ultrasound, deep learning has been successfully employed in a few areas. Vedula et al. [55] train a multi-resolution CNN to generate CT-quality images from raw ultrasound measurements. Yoon et al. [56] apply deep learning to produce B-mode images from a small number of measurements, which can circumvent the heavy computational load imposed by competing compressed sensing algorithms. Tom and Sheet [57] propose a method for generating B-mode images based on generative adversarial networks, which obviates the need for running expensive wave equation solvers. Luchies and Byram [58] apply deep learning to the problem of beamforming in ultrasound, to minimize off-axis scattering. Reiter and Bell [59] use a CNN to identify a common type of ultrasound artifact, namely the reflection artifact that arises when a small point-like target is embedded in the middle of an echogenic structure. Finally, we note that deep learning has also been applied to non-reconstruction tasks in ultrasound, including classification [60], [61] and segmentation [62].
III. SIMULATIONS

Collecting real-world ground truth ultrasound data in quantities sufficient for training a neural network is practically impossible. This leaves us with the alternative of using simulated data.

For our simulations, the sensor is modeled based on our physical system, a Cephasonics cQuest Cicada ultrasound system capable of transmitting and receiving on 64 channels at a time. The ultrasound probe is a 1D linear array transmitting at a central frequency of 5 MHz. The probe face is 3.75 cm wide and contains 128 elements. We locate the probe plane in the simulations just outside the perfectly matched layer (PML, used to numerically absorb waves incident on the boundary), with four grid points per piezo element and an extra four grid points for the kerf (spacing between elements). The total grid dimension is 4.24 cm by 4.24 cm, or 1152 by 1152 elements. The simulation is run in 2D.

Fig. 2. Simulation setup. Reflecting objects are defined in the sound speed domain (a). Ultrasound speckle is defined in the density domain (b). The probe face is at the top end of the domain, marked by hash marks, and is outside the PML; propagation is towards the bottom. The recovered sound speed domain is marked by a red dashed line in (a).

To generate the training data, we use a simplified soft tissue model for organs and lesions. The emphasis is on a model that can generate a large and diverse set of random samples. We model organs in tissue as uniform ellipses over a homogeneous background. The mass density is set to 0.9 g/cm³. Between one and five ellipses (organs) are randomly placed. The sound speed for the background and for each of the ellipses is randomly selected from a uniform distribution over the range of 1300 m/s to 1800 m/s. Random speckle is generated in the density domain, with uniformly distributed mass density variations between −3% and +6% and a mean distribution density of 2 reflectors per wavelength (λ) squared. Attenuation is fixed at 0.5 dB/(MHz·cm), or 2.5 dB/cm at the center frequency of 5 MHz. For the recovered domain we chose the central section, 1.875 cm wide by 3.75 cm deep, with a 3.75 cm wide probe. This is guided by two considerations: (1) coverage limitations due to the maximum aperture size; and (2) the desire to show that our method can handle signals arriving from outside the recovered domain. The setup is depicted in Figure 2. A sketch of this random phantom generation appears at the end of this section.

The numerical solver we work with is the k-Wave toolbox for MATLAB [63], [64]. It presents a compromise that can deal with both discontinuities and speckle noise over non-uniform domains while maintaining decent run times on an NVIDIA GPU.

For the transmit pattern we are limited by three parameters: simulation time, network resources, and signal-to-noise ratio (SNR). Simulation time, as well as network run time, training time, and resources, are all controlled by the number of pulses. In this work, we investigate the sound speed recovery quality using either one or three transmit pulses. This prevents us from using the classic scanning focused beam imaging approach. Due to SNR issues, using point source transmits, i.e., transmitting from a single element, is also problematic. As a result, we choose to work with three plane waves: one direct plane wave transmitted from the center of the probe, and two diagonal plane waves from the edges. These plane waves are depicted in Figure 3. The plane wave angle is chosen to best cover the full domain.

Fig. 3. The three plane waves (left, middle, and right) generated in k-Wave as well as by the real probe. Each plane wave is generated by 64 elements (half the probe), the limit of our current system.
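The following minimal sketch illustrates the kind of random phantom generation described above (elliptical inclusions over a homogeneous background plus density speckle). It is not the authors' k-Wave pipeline: the grid size and random ranges are taken from the text where stated, and the ellipse sizes are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_phantom(n=1152, extent_m=0.0424):
    """Sound speed and density maps for one random 2D training sample (sketch only)."""
    y, x = np.mgrid[0:n, 0:n] * (extent_m / n)               # grid coordinates in meters
    c = np.full((n, n), rng.uniform(1300.0, 1800.0))          # background sound speed [m/s]
    for _ in range(rng.integers(1, 6)):                        # one to five elliptical "organs"
        cx, cy = rng.uniform(0, extent_m, 2)                   # ellipse center
        ax, ay = rng.uniform(0.002, 0.01, 2)                   # semi-axes (assumed range, meters)
        mask = ((x - cx) / ax) ** 2 + ((y - cy) / ay) ** 2 <= 1.0
        c[mask] = rng.uniform(1300.0, 1800.0)                  # uniform speed inside the ellipse
    rho = np.full((n, n), 900.0)                               # 0.9 g/cm^3 = 900 kg/m^3
    # Sparse speckle: roughly 2 reflectors per wavelength^2, density perturbed by -3% to +6%.
    wavelength = 1540.0 / 5e6                                  # ~0.3 mm at 5 MHz
    p_reflector = 2.0 * (extent_m / n) ** 2 / wavelength**2    # probability per grid cell
    speckle = rng.random((n, n)) < p_reflector
    rho[speckle] *= 1.0 + rng.uniform(-0.03, 0.06, speckle.sum())
    return c, rho

c_map, rho_map = random_phantom()
```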
IV. NETWORK SETUP

We now describe the structure of our neural network. We wish to map signals to signals, hence we use a type of fully convolutional neural network (FCNN); that is, there are no fully-connected layers, only convolutional layers, in addition to various non-linear activations. However, most FCNNs assume the input and output sizes are identical, which is not the case in our setup. Therefore, we use striding within the convolutions to effectively achieve decimation.

Our base network architecture is depicted in Figure 4 and possesses an encoder-decoder or "hourglass" structure. In examining Figure 4, note that the C × H × W convention is used to describe the size of each layer's output: that is, number of channels by height by width. The structure is referred to as an hourglass due to its shape: in the initial "encoding" layers (shown in orange in Figure 4), the resolution decreases, i.e., H and W both decrease, while the number of channels increases from the initial 1 to 32, 64, 128, and finally 512. Thus, the layers get smaller but longer. In the second, "decoding" half of the network (shown in blue in Figure 4), the process is reversed: the resolution increases, while the number of channels goes down, finally reaching a single channel at the output. Thus, the layers get larger but shorter. Note that due to the input geometry and output aspect ratio, one linear interpolation step is required. This results from the resolution increase not being a factor of two, going from a vertical resolution of 152 to 256.

Fig. 4. Base network setup, for handling a single transmitted plane wave signal. Layer outputs (C × H × W) run from 1 × 64 × 2462 at the input through 32 × 64 × 1231, 32 × 64 × 616, 32 × 64 × 308, 32 × 64 × 154, 32 × 32 × 77, 64 × 16 × 38, and 128 × 8 × 19 on the encoding path, then 512 × 16 × 38, 128 × 32 × 76, 64 × 64 × 152, 32 × 128 × 256, 32 × 128 × 256, and 1 × 128 × 256 on the decoding path. The green layer denotes the input layer. Orange layers are the encoding/downsampling layers. Blue layers are the decoding/upsampling layers. Most steps involve a decrease or increase of resolution by a factor of two, except for the last upsampling step, which is a linear interpolation stage to adapt the aspect ratio.

On the encoding/downsampling path, the first four stages consist of a strided 3 × 3 convolution followed by batch normalization and a ReLU operation. Note that the stride used is 2 in the width dimension, which has the effect of downsampling that dimension by a factor of 2. (In fact, the downsampling factor is sometimes not exactly 2; this is due to the nature of the convolutional padding used in the particular layer.) The following three stages consist of an ordinary (non-strided) 3 × 3 convolution followed by batch normalization, ReLU, and a 2 × 2 max pool. The latter operation has the effect of reducing the resolution by a factor of 2 in both the height and width dimensions (again, approximately, depending on the convolutional padding used). For the decoding/upsampling path, each of the first three stages consists of a 3 × 3 convolution followed by batch normalization, ReLU, and ×2 up-sampling. The fourth stage involves a 3 × 3 convolution followed by batch normalization, ReLU, and linear interpolation. The fifth stage is a 3 × 3 convolution followed by batch normalization and ReLU (and no upsampling/interpolation). The final stage is a 1 × 1 convolution, which reduces the number of channels to one, and generates the output.
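As a concrete reference, here is a schematic PyTorch sketch of the base single-plane-wave network, written to be consistent with the stage-by-stage description above and the layer sizes listed in the Fig. 4 caption. It is not the authors' released implementation: the padding scheme, the upsampling mode for the ×2 steps, and other unstated details are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def block(c_in, c_out, stride=(1, 1)):
    """3x3 convolution + batch norm + ReLU, as used in every encoder/decoder stage."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, stride=stride, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
    )

class HourglassSoundSpeedNet(nn.Module):
    """Encoder-decoder FCNN mapping one plane-wave channel-data frame (1 x 64 x 2462)
    to a sound speed map (1 x 128 x 256), following the description in Section IV."""

    def __init__(self):
        super().__init__()
        # Encoding path: four width-strided conv stages, then three conv + 2x2 max-pool stages.
        self.enc_strided = nn.Sequential(
            block(1, 32, stride=(1, 2)),
            block(32, 32, stride=(1, 2)),
            block(32, 32, stride=(1, 2)),
            block(32, 32, stride=(1, 2)),
        )
        self.enc_pooled = nn.Sequential(
            block(32, 32), nn.MaxPool2d(2),
            block(32, 64), nn.MaxPool2d(2),
            block(64, 128), nn.MaxPool2d(2),
        )
        # Decoding path: three conv + x2 upsample stages, one conv + interpolation stage,
        # one plain conv stage, and a final 1x1 conv down to a single output channel.
        self.dec1 = block(128, 512)
        self.dec2 = block(512, 128)
        self.dec3 = block(128, 64)
        self.dec4 = block(64, 32)
        self.dec5 = block(32, 32)
        self.head = nn.Conv2d(32, 1, kernel_size=1)

    def forward(self, x):
        x = self.enc_pooled(self.enc_strided(x))              # -> 128 x 8 x 19
        x = F.interpolate(self.dec1(x), scale_factor=2)       # -> 512 x 16 x 38
        x = F.interpolate(self.dec2(x), scale_factor=2)       # -> 128 x 32 x 76
        x = F.interpolate(self.dec3(x), scale_factor=2)       # -> 64 x 64 x 152
        x = F.interpolate(self.dec4(x), size=(128, 256),
                          mode="bilinear", align_corners=False)  # aspect-ratio fix -> 32 x 128 x 256
        x = self.dec5(x)
        return self.head(x)                                   # -> 1 x 128 x 256

if __name__ == "__main__":
    net = HourglassSoundSpeedNet()
    out = net(torch.randn(1, 1, 64, 2462))
    print(out.shape)  # torch.Size([1, 1, 128, 256])
```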
For the loss function, we use an L2 loss comparing the output of the network to the expected sound speed map, obtained by downsampling and cropping the map that was used to generate the signal.

The base network, shown in Figure 4, has the capability of dealing with a single plane wave. We would like to use a variant of this base network for dealing with three plane waves. To that end, we test three different possibilities, as depicted in Figure 5. In the "Start Network", the three plane waves are simply concatenated into a 3-channel image, and the remainder of the base network is identical. In the "Middle Network", the three plane waves are each passed into identical subnetworks for the encoding/downsampling part of the base network; the results are then concatenated channel-wise, and the remainder of the decoding/upsampling part of the base network is the same. In the "End Network", the same idea is used, but the channel-wise concatenation only happens at the very end, before the 1 × 1 convolution. Note that for both the Middle and End Networks, weight-sharing in the training phase ensures that each plane wave is treated identically.

Fig. 5. Network configurations for dealing with the data from multiple plane waves (multiple transmissions): (a) Start, (b) Middle, (c) End. The green layer denotes the data concatenation layer. Orange layers are the encoding/downsampling layers. Blue layers are the decoding/upsampling layers.

V. EXPERIMENTAL RESULTS

We present results for both synthetic data as well as initial results for real data. For the training of the neural network, we used the k-Wave toolbox for MATLAB to generate 6026 random training samples and 800 test samples using the procedure described in Sec. III. This took roughly two weeks using two NVIDIA GTX 1080Ti GPUs. Before feeding the data into the network, gain correction is applied at a rate of 0.48 dB/µs (2.5 dB/cm at 1540 m/s). The channel data signals are then cropped in time to remove the transmit pulse, as depicted in Figure 1a. This is done because the transmit pulse is several orders of magnitude stronger than the back-scattered signal, skewing signal statistics and results. Our physical system also suffers from electrical cross-talk in this temporal range during transmit, corrupting the data further. All results presented were generated using the same network, trained on the full simulated dataset. For training, random Gaussian noise and quantization noise were injected into the signal, which proved essential to avoid overtraining. Training was executed for 200 epochs, based on the convergence of the loss on the test set.
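A minimal sketch of the preprocessing and noise injection described above is given below: time-gain compensation at 0.48 dB/µs, cropping of the transmit-pulse region, and additive Gaussian plus quantization noise during training. The sampling rate, crop length, and noise levels are illustrative assumptions rather than values reported in the paper.

```python
import numpy as np

def preprocess(channel_data, fs_hz=40e6, gain_db_per_us=0.48, crop_us=2.0, train=True, rng=None):
    """channel_data: (n_elements, n_samples) raw RF traces for one transmit (sketch only)."""
    rng = rng or np.random.default_rng()
    n_samples = channel_data.shape[1]
    t_us = np.arange(n_samples) / fs_hz * 1e6                  # sample times in microseconds
    x = channel_data * 10.0 ** (gain_db_per_us * t_us / 20.0)  # time-gain compensation (amplitude dB)
    x = x[:, t_us >= crop_us]                                   # drop the transmit-pulse region
    if train:                                                   # noise injection used only for training
        x = x + rng.normal(0.0, 0.01 * np.abs(x).max(), x.shape)  # Gaussian noise (assumed level)
        q = np.abs(x).max() / 2 ** 11                            # emulate roughly 12-bit quantization
        x = np.round(x / q) * q
    return x

rf = np.random.randn(64, 2462)          # placeholder channel data: 64 elements
print(preprocess(rf).shape)
```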
A. Results: Synthetic Data

Fig. 6 presents reconstruction results on several samples from the test data. Recovery works well on larger objects but can miss fine details, as can be seen, for example, in image 16. Fig. 7 shows absolute error values for the samples shown in Fig. 6, with a threshold at 50 m/s. As can be seen in frames 7 and 12, the system manages to recover sound speed also in the case of an empty domain, so information from speckle is used as well, and not just specular reflections.

Fig. 6. Sound speed recovery maps on 16 test samples. Image (a) shows the ground truth data. Image (b) shows the sound speed maps recovered by the trained network using three plane waves and the "Middle" network (see Figure 5). Gray scale values are in the range of 1300 m/s (black) to 1800 m/s (white).

Fig. 7. Absolute error on 16 test samples. Error has been cropped at 50 m/s (white). Image (a) shows the results for the reconstruction using the single central plane wave. Image (b) shows the reconstruction using three plane waves and the "Middle" network (see Figure 5).

There is a slight misalignment at the edges, which is to be expected, as even a tiny error in the location of a discontinuity, or pixelization effects, will cause a misalignment. Consequently, although we do show the classic root mean square error (RMSE) value, it does not convey the full story. The RMSE is an L2 norm, making it sensitive to outliers. We therefore also report mean and median absolute errors, which are less sensitive to outliers. Furthermore, to present error numbers that de-emphasize the issues due to localization around discontinuities, we report the following modified error value: for each pixel, we take the minimum absolute error within a window with a radius of 5 pixels, for both the mean and median cases. For the mean error, in both cases, both the mean absolute error (µ) and the standard deviation (σ) are reported. Results for all error measures are presented in Table III. Available research suggests that for clinically relevant results, measurement accuracy on the order of 30 m/s is useful.
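A small NumPy sketch of these error measures is given below. The handling of image borders and the exact pairing of predicted and reference pixels inside the window reflect our own reading of the description above, not a specification from the paper.

```python
import numpy as np

def error_stats(pred, ref, radius=5):
    """RMSE, mean/median absolute error, and the window-minimum ("starred") variants.
    For the starred values, each predicted pixel is compared against all reference values
    within a (2*radius+1)^2 window and the smallest absolute difference is kept."""
    err = np.abs(pred - ref)
    pad = np.pad(ref, radius, mode="edge")
    min_err = np.full_like(err, np.inf)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = pad[dy:dy + ref.shape[0], dx:dx + ref.shape[1]]
            min_err = np.minimum(min_err, np.abs(pred - shifted))
    return {
        "rmse": float(np.sqrt(np.mean((pred - ref) ** 2))),
        "mean": float(err.mean()), "median": float(np.median(err)),
        "mean*": float(min_err.mean()), "median*": float(np.median(min_err)),
    }

# Illustrative check: a one-pixel shift of a sharp boundary gives a large plain error
# but a near-zero starred error, which is the localization effect discussed above.
gt = np.full((128, 256), 1500.0); gt[:, 100:] = 1700.0
pred = np.full((128, 256), 1500.0); pred[:, 101:] = 1700.0
print(error_stats(pred, gt))
```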
TABLE III: Reconstruction error (m/s) for the train and test sets for our six recovery cases: single plane wave reconstruction for each of the three plane waves, and three plane wave reconstruction for the three joint reconstruction networks (Start, Middle, End). RMSE is the root mean square error; µ and σ denote the mean and standard deviation of the absolute error; Median is the median absolute error. Starred values use our modified error measure, taking the minimum absolute error over a window with a radius of 5 pixels, for both the mean and median error measures.

          |                     Train                     |                     Test
Network   | RMSE   µ     σ    Median   µ*    σ*   Median* | RMSE   µ     σ    Median   µ*    σ*   Median*
Left      | 22.4  14.6  17.0   10.6    2.0   4.6   0.18   | 24.8  16.3  18.7   11.8    2.6   6.0   0.22
Center    | 23.3  15.1  17.8   10.7    2.5   5.7   0.19   | 25.2  16.2  19.3   11.4    3.1   7.0   0.24
Right     | 19.2  12.2  14.8    8.9    1.9   4.1   0.16   | 22.2  14.4  16.9   10.5    2.3   5.4   0.19
Start     | 21.8  14.2  16.9   10.0    2.4   5.4   0.19   | 24.3  15.6  18.6   11.0    2.9   6.5   0.23
Middle    | 18.8  11.5  14.9    8.2    2.1   4.3   0.17   | 20.5  12.5  16.1    8.7    2.6   5.2   0.21
End       | 18.9  11.9  14.8    8.5    1.6   3.7   0.14   | 20.8  12.9  16.3    9.0    2.0   5.0   0.16

Fig. 8. Polyurethane phantom with inclusion. (a) shows the B-mode image. (b) shows the ground truth sound speed map, measured at 1440 m/s for the background and 1750 m/s for the inclusion. (c) shows the sound speed recovery from a single plane wave. (d) shows the sound speed recovery for three plane waves with the "Middle" network.

All results are well within that range, and error measures that account for outliers at edges are an order of magnitude better. While more work is required to improve results on real data, we see very strong potential with our proposed technology.

We now turn to the issue of multiple inputs, that is, the use of multiple plane waves in image formation. While combining multiple inputs at the first layer ("Start Network") does not improve reconstruction results, combining in both the middle ("Middle Network") and the end ("End Network") does provide some improvement, as can be seen in Table III and Figure 7. However, all cases are close to the recovery limit, so we expect more value in terms of stability to noise when dealing with real data.

B. Results: Real Data

For the case of real data, we look at three data sets: (1) a polyurethane phantom with an inclusion (Fig. 8), (2) a cross section of the neck (Fig. 10), and (3) an image of the calf muscles (gastrocnemius and soleus, Fig. 12). All data were collected using a Cephasonics ultrasound system with a 128-element linear probe transmitting at 5 MHz. Both human scans were taken as part of an MIT Committee on the Use of Humans as Experimental Subjects (COUHES) approved protocol. In all cases we show the results for the sound speed map reconstruction using a single plane wave, as well as three plane waves using the "Middle" network. Results using the "Middle" and "End" networks are very similar, so for the sake of brevity we omit the output of the "End" network. In addition to the pressure wave sound speed images, we also collected shear wave sound speeds at the same locations for comparison. Results for the polyurethane phantom are given in Fig. 9, the neck in Fig. 11, and the calf in Fig. 13. Shear wave data were collected using a GE LOGIQ E9 system with a GE 9L 192-element linear probe.

Fig. 9. Shear wave imaging of the polyurethane phantom presented in Fig. 8. Image (a) shows the B-mode image and image (b) shows the overlaid shear wave sound speed map.

For the polyurethane phantom we have a ground truth sound speed map, measured based on transmission travel time and presented in Fig. 8b. The background sound speed is 1440 m/s and the inclusion sound speed is 1750 m/s. Comparing to the sound speed field reconstruction using a single plane wave (Fig. 8c), we see that the near (top) side of the inclusion is detected correctly (with a slight sound speed overshoot) but the bottom half is not.
Sound speed close to the probe is underestimated. Closer to the inclusion, sound speed is overestimated, with large artifacts deeper into the phantom. In contrast, the three plane wave reconstruction shows significantly better results. The inclusion is fully detected, with an accuracy that is barely possible on the B-mode image. There are significantly fewer artifacts as well, but sound speed is still underestimated closer to the probe and overestimated close to the inclusion, although not as much as for the single plane wave case.

The shear wave sound speed map presented in Fig. 9 shows that shear wave imaging does not fare well with this phantom. Although we do not have ground truth shear wave sound speed maps, it is easy to see that the inclusion is not detected at all, and the sound speed map suffers from vertical artifacts, making the quality of these results questionable.

Fig. 10. Sound speed recovery for the neck. Image (a) shows a B-mode US image reconstruction. Image (b) shows an anatomical drawing of a cross section of the neck [65], with the anatomical landmarks appearing in (a) highlighted. Image (c) shows the sound speed reconstruction using the single central plane wave. Image (d) shows the sound speed reconstruction using three plane waves and the "Middle" network.

Fig. 11. Shear wave imaging of the neck, taken from the same angle of view presented in Fig. 10. Image (a) shows the B-mode image; image (b) shows the overlaid shear wave sound speed map.

For the neck cross-section sample, the annotated B-mode image is presented in Fig. 10a, with the matching anatomical sketch in Fig. 10b. We do not have a ground truth sound speed map in this case to compare to, but we do see that the recovered sound speed map follows the anatomy, as well as the shear wave image, differentiating between muscle, carotid artery, and thyroid gland. The sound speed inside the carotid is underestimated, although we suspect that is due to the lack of backscatter energy from the blood content. Additionally, the recovered sound speed map for the near muscles (sternocleidomastoid, omohyoid, and sternothyroid), as well as the thyroid, matches the expected statistical values. The deeper muscles are differentiated correctly anatomically, but their sound speed is overestimated. In this case, probably due to the higher feature density, there is a much smaller difference between the single plane wave and three plane wave versions.

Fig. 12. Sound speed imaging of the lower leg muscle. Image (a) shows the B-mode US image reconstruction with the major muscles of interest delineated (gastrocnemius and soleus). Image (b) shows an anatomical sketch of a cross section of the lower leg [65]. Images (c) and (d) show sound speed reconstruction with toes in flexion, with a straight leg on the left, where we expect the gastrocnemius to be active, and a bent leg on the right, where we expect the soleus to be active.

Fig. 13. Shear wave imaging of the leg, matching Figs. 12c and 12d.

Turning our attention to the shear wave sound speed image presented in Fig. 11, we see that there is a general (though by no means perfect) correlation between the sound speed field generated by our network and the shear wave speed field.
This correlation represents a general validation of our technique.

A cross-section scan of the calf muscles, specifically the gastrocnemius and soleus, is shown in Fig. 12. Fig. 12a shows the annotated B-mode image and Fig. 12b shows an anatomical sketch. In this example, we explore functional imaging. Both sound speed maps are taken with the toes in flexion and under a small load to activate the calf muscles. The first frame, presented in Fig. 12c, shows an image with a straight leg, where we expect the gastrocnemius (external muscle) to be the main active muscle. The second frame, shown in Fig. 12d, shows the results with a bent leg, where we expect the soleus (internal muscle) to be the one doing most of the work. The contraction of the gastrocnemius muscle is very obvious, although in the case where it is contracting, it appears that the lower half is estimated as having a low sound speed instead of a high one. The response of the soleus is not as obvious, as its sound speed estimate is too high in both cases, but it can still be observed in the results. The assessed sound speed in the relaxed gastrocnemius muscle in Fig. 12d is about 1540 m/s, and that of the subcutaneous fat around 1450 m/s, both of which are extremely close to the expected values.

As before, shear wave sound speed images are presented for both cases in Fig. 13. As is easily seen, these results do not provide any meaningful information. It is our general experience that full-frame shear wave sound speed images of loaded muscles tend to be highly unstable at best, especially when looking at cross-section slices. Based on our experience, as well as results reported by other researchers, better results can be achieved when using a very limited field of view to increase frame rates, combined with longitudinal probe positioning so that the shear waves propagate along the muscle fibers. However, in that case, we lose the bigger picture regarding which parts of the muscle are activating, and the potential frame rate is still limited.

VI. CONCLUSIONS AND FUTURE WORK

In this paper, we have presented a deep learning framework for the recovery of sound speed maps from plane wave ultrasound channel data. Results on synthetic data are more than an order of magnitude better than our target accuracy, showing that this framework has great potential for clinical purposes. Initial real data results are also highly encouraging, although more research is required to improve the results, create calibrated phantoms for validation, and improve training, as well as to develop better simulation techniques to better train the network to deal with real data.

REFERENCES

[1] H. Y.-H. Lin, Y.-L. Lee, K.-D. Lin, Y.-W. Chiu, S.-J. Shin, S.-J. Hwang, H.-C. Chen, and C.-C. Hung, "Association of Renal Elasticity and Renal Function Progression in Patients with Chronic Kidney Disease Evaluated by Real-Time Ultrasound Elastography," Scientific Reports, vol. 7, Feb. 2017.
[2] H. Singh, O. B. Panta, U. Khanal, and R. K. Ghimire, "Renal Cortical Elastography: Normal Values and Variations," Journal of Medical Ultrasound, vol. 25, no. 4, pp. 215–220, Dec. 2017.
[3] J. Carlsen, C. Ewertsen, L. Lönn, and M. Nielsen, "Strain Elastography Ultrasound: An Overview with Emphasis on Breast Cancer Diagnosis," Diagnostics, vol. 3, no. 1, pp. 117–125, 2013.
[4] J. M. Chang, J.-K. Won, K.-B. Lee, I. A. Park, A. Yi, and W. K. Moon, "Comparison of shear-wave and strain ultrasound elastography in the differentiation of benign and malignant breast lesions," American Journal of Roentgenology, vol. 201, no. 2, pp. W347–W356, Aug. 2013.
[5] R. G. Barr, G. Ferraioli, M. L. Palmeri, Z. D. Goodman, G. Garcia-Tsao, J. Rubin, B. Garra, R. P. Myers, S. R. Wilson, D. Rubens, and D. Levine, "Elastography Assessment of Liver Fibrosis: Society of Radiologists in Ultrasound Consensus Conference Statement," Radiology, vol. 276, no. 3, pp. 845–861, Sep. 2015.
[6] G. Ferraioli, P. Parekh, A. B. Levitov, and C. Filice, "Shear Wave Elastography for Evaluation of Liver Fibrosis," Journal of Ultrasound in Medicine, vol. 33, no. 2, pp. 197–203, Feb. 2014.
[7] J. F. Greenleaf, M. Fatemi, and M. Insana, "Selected methods for imaging elastic properties of biological tissues," Annual Review of Biomedical Engineering, vol. 5, no. 1, pp. 57–78, Aug. 2003.
[8] K. Nightingale, "Acoustic Radiation Force Impulse (ARFI) Imaging: A Review," Current Medical Imaging Reviews, vol. 7, no. 4, pp. 328–339, Nov. 2011.
[9] J. L. Gennisson, T. Deffieux, and M. Fink, "Ultrasound elastography: Principles and techniques," Diagnostic and Interventional Imaging, vol. 94, no. 5, pp. 487–495, 2013.
[10] N. Duric, P. Littrup, S. Schmidt, C. Li, O. Roy, L. Bey-Knight, R. Janer, D. Kunz, X. Chen, J. Goll, A. Wallen, F. Zafar, V. Allada, E. West, I. Jovanovic, and K. Greenway, "Breast imaging with the SoftVue imaging system: First results," Medical Imaging 2013: Ultrasonic Imaging, Tomography, and Therapy, vol. 8675, p. 86750K, 2013.
[11] M. Sak, N. Duric, P. Littrup, L. Bey-Knight, H. Ali, P. Vallieres, M. E. Sherman, and G. L. Gierach, "Using speed of sound imaging to characterize breast density," Ultrasound in Medicine & Biology, vol. 43, no. 1, pp. 91–103, Jan. 2017.
[12] O. Yilmaz, Seismic Data Analysis: Processing, Inversion, and Interpretation of Seismic Data, 2011.
[13] J. Bercoff, M. Tanter, and M. Fink, "Supersonic Shear Imaging: A New Technique for Soft Tissue Elasticity Mapping," IEEE Transactions on Ultrasonics, Ferroelectrics and Frequency Control, vol. 51, no. 4, pp. 396–409, 2004.
[14] A. Nahas, M. Tanter, T.-M. Nguyen, J.-M. Chassot, M. Fink, and A. Claude Boccara, "From supersonic shear wave imaging to full-field optical coherence shear wave elastography," Journal of Biomedical Optics, vol. 18, no. 12, p. 121514, 2013.
[15] O. Roy, I. Jovanović, A. Hormati, R. Parhizkar, and M. Vetterli, "Sound speed estimation using wave-based ultrasound tomography: Theory and GPU implementation," in SPIE Medical Imaging, J. D'hooge and S. A. McAleavey, Eds., San Diego, California, USA, Mar. 2010, p. 76290J.
[16] C. Li, A. Stewart, and N. Duric, "Multi-grid tomographic inversion for breast ultrasound imaging," Proceedings of SPIE, vol. 8320, pp. 1–9, 2012.
[17] T. Hopp, N. V. Ruiter, and N. Duric, "Breast tissue characterization by sound speed: Correlation with mammograms using a 2D/3D image registration," in 2012 IEEE International Ultrasonics Symposium, Oct. 2012, pp. 1–4.
[18] J. Nebeker and T. R. Nelson, "Imaging of Sound Speed Using Reflection Ultrasound Tomography," Journal of Ultrasound in Medicine, vol. 31, no. 9, pp. 1389–1404, Sep. 2012.
[19] C. Li, G. S. Sandhu, O. Roy, N. Duric, V. Allada, and S. Schmidt, "Toward a practical ultrasound waveform tomography algorithm for improving breast imaging," in Medical Imaging 2014: Ultrasonic Imaging and Tomography, vol. 9040, 2014, p. 90401P.
[20] J. R. Fincke, M. Feigin, G. A. Prieto, X. Zhang, and B. Anthony, "Towards ultrasound travel time tomography for quantifying human limb geometry and material properties," in SPIE Medical Imaging, N. Duric and B. Heyde, Eds., Apr. 2016, p. 97901S.
[21] J. R. Fincke, "Imaging cortical bone using the level-set method to regularize travel-time and full waveform tomography techniques," The Journal of the Acoustical Society of America, vol. 141, no. 5, pp. 3549–3549, May 2017.
[22] M. E. Anderson and G. E. Trahey, "The direct estimation of sound speed using pulse–echo ultrasound," The Journal of the Acoustical Society of America, vol. 104, no. 5, pp. 3099–3106, Nov. 1998.
[23] H. Hachiya, S. Ohtsuki, M. Tanaka, and F. Dunn, "Determination of sound speed in biological tissues based on frequency analysis of pulse response," The Journal of the Acoustical Society of America, vol. 92, no. 3, pp. 1564–1568, Sep. 1992.
[24] H.-C. Shin, R. Prager, H. Gomersall, N. Kingsbury, G. Treece, and A. Gee, "Estimation of Speed of Sound using Medical Ultrasound Image Deconvolution," Ultrasonics, vol. 50, no. 7, p. 24, Jun. 2010.
[25] A. Benjamin, R. E. Zubajlo, M. Dhyani, A. E. Samir, K. E. Thomenius, J. R. Grajo, and B. W. Anthony, "Surgery for Obesity and Related Diseases: I. A Novel Approach to the Quantification of the Longitudinal Speed of Sound and Its Potential for Tissue Characterization," Ultrasound in Medicine and Biology, vol. 44, no. 12, pp. 2739–2748, Dec. 2018.
[26] R. E. Zubajlo, A. Benjamin, J. R. Grajo, K. Kaliannan, J. X. Kang, A. K. Bhan, K. E. Thomenius, B. W. Anthony, M. Dhyani, and A. E. Samir, "Surgery for obesity and related diseases: II. Experimental validation of longitudinal speed of sound estimates in the diagnosis of hepatic steatosis," Ultrasound in Medicine & Biology, vol. 44, no. 12, pp. 2749–2758, Dec. 2018.
[27] S. Preisser, G. Held, S. Peeters, M. Frenz, M. Jaeger, and M. Grünig, "Computed Ultrasound Tomography in Echo mode (CUTE) of speed of sound for diagnosis and for aberration correction in pulse-echo sonography," Medical Imaging 2014: Ultrasonic Imaging and Tomography, vol. 9040, p. 90400A, 2014.
[28] P. Stähli, M. Kuriakose, M. Frenz, and M. Jaeger, "Forward model for quantitative pulse-echo speed-of-sound imaging," Feb. 2019. [Online]. Available: http://arxiv.org/abs/1902.10639
[29] A. Samani, J. Zubovits, and D. Plewes, "Elastic moduli of normal and pathological human breast tissues: An inversion-technique-based investigation of 169 samples," Physics in Medicine and Biology, vol. 52, no. 6, pp. 1565–1576, 2007.
[30] Y.-C. Fung, Biomechanics: Mechanical Properties of Living Tissues. New York, NY: Springer New York, 1993.
[31] H. Hachiya, S. Ohtsuki, and M. Tanaka, "Relationship Between Speed of Sound in and Density of Normal and Diseased Rat Livers," Japanese Journal of Applied Physics, vol. 33, no. 5S, p. 3130, 1994.
[32] T. Matsuhashi, N. Yamada, H. Shinzawa, and T. Takahashi, "An evaluation of hepatic ultrasound speed in injury models in rats: Correlation with tissue constituents," Journal of Ultrasound in Medicine, vol. 15, no. 8, pp. 563–570, Aug. 1996.
[33] C. Li, N. Duric, P. Littrup, and L. Huang, "In vivo breast sound-speed imaging with ultrasound tomography," Ultrasound in Medicine & Biology, vol. 35, no. 10, pp. 1615–1628, 2009.
[34] M. Imbault, A. Faccinetto, B.-F. Osmanski, A. Tissier, T. Deffieux, J.-L. Gennisson, V. Vilgrain, and M. Tanter, "Robust sound speed estimation for ultrasound-based hepatic steatosis assessment," Physics in Medicine and Biology, vol. 62, no. 9, pp. 3582–3598, May 2017.
[35] A. Benjamin, R. Zubajlo, K. Thomenius, M. Dhyani, K. Kaliannan, A. E. Samir, and B. W. Anthony, "Non-invasive diagnosis of non-alcoholic fatty liver disease (NAFLD) using ultrasound image echogenicity," in 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Seogwipo: IEEE, Jul. 2017, pp. 2920–2923.
[36] X. Wang, Y. Peng, L. Lu, Z. Lu, M. Bagheri, and R. M. Summers, "ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases," in Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on. IEEE, 2017, pp. 3462–3471.
[37] M. F. Byrne, F. Soudan, M. Henkel, C. Oertel, N. Chapados, F. J. Echagüe, S. H. Ghalehjegh, N. Guizard, S. Giguère, M. E. MacPhail et al., "Mo1679 real-time artificial intelligence full colonoscopy workflow for automatic detection followed by optical biopsy of colorectal polyps," Gastrointestinal Endoscopy, vol. 87, no. 6, p. AB475, 2018.
[38] M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.-M. Jodoin, and H. Larochelle, "Brain tumor segmentation with deep neural networks," Medical Image Analysis, vol. 35, pp. 18–31, 2017.
[39] Y. Zhang and H. Yu, "Convolutional neural network based metal artifact reduction in x-ray computed tomography," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1370–1381, 2018.
[40] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang, "Low dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss," IEEE Transactions on Medical Imaging, 2018.
[41] E. Kang, W. Chang, J. Yoo, and J. C. Ye, "Deep convolutional framelet denoising for low-dose CT via wavelet residual network," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1358–1369, 2018.
[42] Z. Zhang, X. Liang, X. Dong, Y. Xie, and G. Cao, "A sparse-view CT reconstruction method based on combination of DenseNet and deconvolution," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1407–1417, 2018.
[43] Y. Han and J. C. Ye, "Framing U-Net via deep convolutional framelets: Application to sparse-view CT," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1418–1429, 2018.
[44] X. Zheng, S. Ravishankar, Y. Long, and J. A. Fessler, "PWLS-ULTRA: An efficient clustering and learning-based approach for low-dose 3D CT image reconstruction," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1498–1510, 2018.
[45] H. Shan, Y. Zhang, Q. Yang, U. Kruger, W. Cong, and G. Wang, "3D convolutional encoder-decoder network for low-dose CT via transfer learning from a 2D trained network," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1522–1534, 2018.
[46] G. Yang, S. Yu, H. Dong, G. Slabaugh, P. L. Dragotti, X. Ye, F. Liu, S. Arridge, J. Keegan, Y. Guo et al., "DAGAN: Deep de-aliasing generative adversarial networks for fast compressed sensing MRI reconstruction," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1310–1321, 2018.
[47] B. Gözcü, R. K. Mahabadi, Y.-H. Li, E. Ilıcak, T. Cukur, J. Scarlett, and V. Cevher, "Learning-based compressive MRI," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1394–1406, 2018.
[48] T. M. Quan, T. Nguyen-Duc, and W.-K. Jeong, "Compressed sensing MRI reconstruction using a generative adversarial network with a cyclic loss," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1488–1497, 2018.
[49] B. Yang, L. Ying, and J. Tang, "Artificial neural network enhanced Bayesian PET image reconstruction," IEEE Transactions on Medical Imaging, 2018.
[50] K. Kim, D. Wu, K. Gong, J. Dutta, J. H. Kim, Y. D. Son, H. K. Kim, G. El Fakhri, and Q. Li, "Penalized PET reconstruction using deep learning prior and local linear fitting," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1478–1487, 2018.
[51] A. Hauptmann, F. Lucka, M. Betcke, N. Huynh, J. Adler, B. Cox, P. Beard, S. Ourselin, and S. Arridge, "Model-based learning for accelerated, limited-view 3-D photoacoustic tomography," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1382–1393, 2018.
[52] D. Allman, A. Reiter, and M. A. L. Bell, "Photoacoustic source detection and reflection artifact removal enabled by deep learning," IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1464–1477, 2018.
[53] A. Mousavi, G. Dasarathy, and R. G. Baraniuk, "DeepCodec: Adaptive sensing and recovery via deep convolutional neural networks," arXiv preprint arXiv:1707.03386, 2017.
[54] C. A. Metzler, P. Schniter, A. Veeraraghavan, and R. G. Baraniuk, "prDeep: Robust phase retrieval with flexible deep neural networks," arXiv preprint arXiv:1803.00212, 2018.
[55] S. Vedula, O. Senouf, A. M. Bronstein, O. V. Michailovich, and M. Zibulevsky, "Towards CT-quality Ultrasound Imaging using Deep Learning," arXiv:1710.06304 [physics], Oct. 2017.
[56] Y. H. Yoon, S. Khan, J. Huh, and J. C. Ye, "Efficient B-mode Ultrasound Image Reconstruction from Sub-sampled RF Data using Deep Learning," arXiv:1712.06096 [cs, stat], Dec. 2017.
[57] F. Tom and D. Sheet, "Simulating patho-realistic ultrasound images using deep generative networks with adversarial learning," in Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on. IEEE, 2018, pp. 1174–1177.
[58] A. C. Luchies and B. C. Byram, "Deep Neural Networks for Ultrasound Beamforming," IEEE Transactions on Medical Imaging, pp. 1–1, 2018.
[59] A. Reiter and M. A. Lediju Bell, "A machine learning approach to identifying point source locations in photoacoustic data," in Photons Plus Ultrasound: Imaging and Sensing, A. A. Oraevsky and L. V. Wang, Eds., Mar. 2017, p. 100643J.
[60] H. Ravishankar, P. Sudhakar, R. Venkataramani, S. Thiruvenkadam, P. Annangi, N. Babu, and V. Vaidya, "Understanding the mechanisms of deep transfer learning for medical images," in Deep Learning and Data Labeling for Medical Applications. Springer, 2016, pp. 188–196.
[61] Q. Zheng, G. Tastan, and Y. Fan, "Transfer learning for diagnosis of congenital abnormalities of the kidney and urinary tract in children based on ultrasound imaging data," in Biomedical Imaging (ISBI 2018), 2018 IEEE 15th International Symposium on. IEEE, 2018, pp. 1487–1490.
[62] M. Xian, Y. Zhang, H.-D. Cheng, F. Xu, K. Huang, B. Zhang, J. Ding, C. Ning, and Y. Wang, A Benchmark for Breast Ultrasound Image Segmentation (BUSIS). Infinite Study, 2018.
[63] B. E. Treeby and B. T. Cox, "k-Wave: MATLAB toolbox for the simulation and reconstruction of photoacoustic wave fields," Journal of Biomedical Optics, vol. 15, no. 2, p. 021314, 2010.
[64] B. E. Treeby, J. Jaros, D. Rohrbach, and B. T. Cox, "Modelling elastic wave propagation using the k-Wave MATLAB toolbox," in Ultrasonics Symposium (IUS), 2014 IEEE International. IEEE, 2014, pp. 146–149.
[65] H. Gray and W. H. Lewis, Anatomy of the Human Body, 20th ed. Lea & Febiger, 1918.