Computed tomography medical image reconstruction on affordable equipment by using out-of-core techniques
Mónica Chillarón, Gregorio Quintana-Ortí, Vicente Vidal, and Gumersindo Verdú

Abstract—As Computed Tomography (CT) scans are an essential medical test, many techniques have been proposed to reconstruct high-quality images using a smaller amount of radiation. One approach is to employ algebraic factorization methods to reconstruct the images, using fewer views than the traditional analytical methods. However, their main drawback is the high computational cost, and hence the time needed to obtain the images, which is critical in daily clinical practice. For this reason, faster methods for solving this problem are required. In this paper, we propose a new reconstruction method based on the QR factorization that is very efficient on affordable equipment (standard multicore processors and standard Solid-State Drives) by using out-of-core techniques. Combining both affordable hardware and the new software, we can boost the performance of the reconstructions and implement a reliable and competitive method that reconstructs high-quality CT images quickly.

Index Terms—CT, QR factorization, Medical Image, Reconstruction, Out-Of-Core, Affordable Equipment.

I. INTRODUCTION

Nowadays, Computed Tomography (CT) [1] is an essential diagnostic medical imaging test in clinical practice. Although it involves the use of X-rays and hence induces ionizing radiation in patients, the information provided is critical in many cases. Therefore, it is extremely important to reduce the radiation dose as much as possible, and thus prevent patients from absorbing a higher dose than the recommended one. Otherwise, CTs could be a hazard to them, since it has been proven that X-rays can be harmful, especially to the most vulnerable patients [2], [3].
Traditional CT reconstruction employs analytical methods, which are based on the Filtered Back-Projection (FBP) [4], [5], [6]. They require a complete set of projections of an object, over 360 degrees of rotation. They are still the most common methods because of their low computational cost and therefore fast reconstruction. However, reducing the X-ray dose is difficult when a high-quality image must be obtained. Several methods [7], [8] have been developed that reduce the radiation dose by minimizing the tube's current or voltage, and then reconstruct the sinograms with statistical methods that improve the image quality compared to traditional FBP-based methods. A common approach to reducing the radiation dose is the use of iterative methods, which do not require a complete set of projections, nor are they restricted in terms of projection angles [9], [10], [11], [12], [13], [14]. These types of methods require fewer projections to reconstruct an image. Nevertheless, they involve a high computational cost, which implies that the reconstructions are much slower than with the previous methods. Moreover, since these methods are iterative, neither convergence nor the number of iterations needed to converge is guaranteed.

M. Chillarón and V. Vidal are with the Dpto. de Sistemas Informáticos y Computación, Universitat Politècnica de València, València, 46022 Spain (e-mails: mnichipr@inf.upv.es, vvidal@dsic.upv.es). G. Quintana-Ortí is with the Depto. de Ingeniería y Ciencia de Computadores, Universitat Jaume I, Castellón, 12071 Spain (e-mail: gquintan@uji.es). G. Verdú is with the Dpto. de Ingeniería Química y Nuclear, Universitat Politècnica de València, València, 46022 Spain (e-mail: gverdu@iqn.upv.es). This research has been supported by "Universitat Politècnica de València", "Generalitat Valenciana" under PROMETEO/2018/035 and ACIF/2017/075, co-financed by FEDER and FSE funds, and the "Spanish Ministry of Science, Innovation and Universities" under Grant RTI2018-098156-B-C54, co-financed by FEDER funds. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Several works [15], [16], [17], [18] showed the problems of working with few-view limited-angle CT. The use of few views generates streak artifacts that can mask or conceal important parts of the image to be reconstructed, which can produce information loss. This is potentially harmful since it can lead to a wrong diagnosis. It also poses a problem for secondary applications of the CT images, as shown in [17], where the reduction of the number of views to a minimum implied an inaccurate segmentation of the blood vessels. Sechopoulos [19] showed that few views led to false positives in computer-aided detection for breast mass detection. Unlike direct methods, iterative methods often generate patchy or blocky artifacts in the reconstructed images due to overregularization [16], [20]. Therefore, direct algebraic methods such as the QR factorization [21], [22] have been explored recently. Although they usually require a greater number of views than the iterative ones (as was shown in a previous work [23]), they are much more accurate when the rank of the weights matrix is complete. The main drawback of the direct algebraic methods is that the sparsity of the weights matrix cannot be taken advantage of, since the matrix fills in and becomes dense as the process advances. Therefore, space problems can arise because of an insufficient main memory (RAM).
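The fill-in just described can be illustrated with a tiny sketch (ours, not the paper's code): even a very sparse matrix can produce a dense triangular factor R under QR factorization. The toy matrix below is a hypothetical stand-in for a weights matrix, chosen only to force fill-in.

```python
import numpy as np

# Toy illustration of QR fill-in (hypothetical example, not the paper's matrix):
# an identity matrix with a dense first column is very sparse, yet its
# triangular factor R fills in during the Householder eliminations.
n = 6
A = np.eye(n)
A[:, 0] = 1.0  # dense first column

_, R = np.linalg.qr(A)

nnz_A_upper = int(np.sum(np.abs(np.triu(A)) > 1e-12))  # nonzeros in triu(A)
nnz_R = int(np.sum(np.abs(R) > 1e-12))                 # nonzeros in R
print(nnz_A_upper, nnz_R)  # R holds many more nonzeros than the upper part of A
```

This is why the direct approach cannot rely on sparse storage and, for realistic problem sizes, must turn to out-of-core techniques instead.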
In this case, it is important to find an efficient approach to tackle large problems without having to acquire expensive, specialized, dedicated equipment, which would entail a large monetary cost. In our paper, we present a solution to the CT image reconstruction problem by using the direct solution of linear systems based on the QR factorization. By employing special high-performance software techniques, high-quality images are obtained on affordable computers. Without these techniques, the computer required would be very expensive (tens of thousands of dollars), mainly due to the price of the large main memory required to store the data. With these techniques, computers with a price about one order of magnitude smaller can be employed. A careful application of out-of-core (OOC) techniques allows blocks of data to be read from and written to the hard drive just when they are needed for the calculations, instead of loading the whole matrices into main memory. By applying this method, as well as some other techniques, we can solve large-scale problems, and therefore a fast reconstruction of CT images with high resolutions can be achieved. Our new implementation is time-efficient and also scalable, as can be seen in the results. In addition, both very high quality and a reduction in the number of views (and therefore radiation) are achieved, compared to analytical methods. In our work, we have checked that the OOC approach is still valid on much larger matrices than in previous works. Moreover, we have assessed the performance on both traditional Hard Disk Drives (HDD) and modern Solid-State Drives (SSD).

The document is organized as follows: Section II describes the simulation of our projection data, as well as the simulated scanner parameters. It also explains how to perform a CT reconstruction using the QR factorization of the weights matrix.
Besides, the metrics employed to measure the image quality are introduced, and the QR factorization and the reconstruction algorithm are described in detail. Section III assesses our new method in terms of numerical stability and image quality. A detailed performance study comparing the different configurations using two types of hard drives is also included. To conclude, Section IV summarizes and discusses the advantages of the studied method and proposes future lines of work.

II. METHODS

A. CT image reconstruction

To reconstruct CT images with an algebraic approach, we model the problem as:

AX = B + W    (1)

where A = (a_{i,j}) ∈ R^{M×N} denotes the so-called system matrix, with dimensions M × N. A is the weights matrix that models the physical scanner, a_{i,j} being the contribution of the i-th ray to the j-th pixel. The dimension M is the product of the number of detectors of the CT scanner multiplied by the number of projections or views taken. N denotes the resolution of the image (256 × 256 pixels, 512 × 512 pixels, etc.). B = (B_j) is a matrix of M × S elements, where S is the number of slices to be reconstructed, and B_j denotes the column j corresponding to the j-th sinogram. X = (X_j) is a matrix of dimensions N × S, where X_j is the column where the reconstructed image corresponding to the j-th sinogram will be stored. W is the noise contained in the sinograms, which will not be considered in this paper.

TABLE I: Simulated fan-beam scanner parameters.
  Source trajectory:                 360° circular scan
  Scan radius:                       75 cm
  Source-to-detector distance:       150 cm
  X-ray source fan angle:            30°
  Number of detectors:               1025
  Pixels of the reconstructed image: 512²
  Number of projections:             260

The sinograms have been simulated using Joseph's method [24]. We modeled a fan-beam scanner, using the parameters shown in Table I.
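For concreteness, the dimensions of A implied by the scanner of Table I can be worked out directly (a small sketch; the variable names are ours, not the paper's):

```python
# Dimensions of the system matrix A for the scanner of Table I:
# M = (number of detectors) x (number of projections), N = pixels of the image.
detectors = 1025      # Table I
projections = 260     # Table I
image_side = 512      # 512 x 512 reconstruction

M = detectors * projections  # one equation per (detector, view) pair
N = image_side ** 2          # one unknown per pixel

print(M, N)  # 266500 262144
```

These are exactly the dimensions 266,500 × 262,144 of the systems solved in the performance experiments of Section III.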
As was mentioned before, the number of projections taken depends on the desired reconstruction resolution, and it needs to be adjusted so that matrix A has full rank. The projections are selected according to Eq. 2, where the symmetry of the projection data is broken by making an angle shift for every quarter of the circumference to improve the rank.

Θ_i =
  (360/v)·(i − 1)                       if 1 ≤ i ≤ v/4
  Θ_{v/4} + 0.5 + (360/v)·(i − 1)       if v/4 < i ≤ v/2
  Θ_{v/2} − 0.75 + (360/v)·(i − 1)      if v/2 < i ≤ 3v/4
  Θ_{3v/4} − 0.25 + (360/v)·(i − 1)     if 3v/4 < i ≤ v     (2)

To solve the problem in Eq. 1, first the QR factorization of A is computed (Eq. 3), where Q is orthonormal and R is upper triangular. Then, to reconstruct the images, Eq. 4 is employed.

A = QR    (3)

X = R⁻¹(Qᵀ B)    (4)

It is important to note that the QR factorization does not need to be computed for every image being generated, since it does not depend on B. It can be computed just once and, by storing the results, a lot of computational work can be saved.

B. Image Quality Metrics

To measure the quality of the reconstructed images, we use two well-established metrics for images: PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity Index) [25]. The PSNR metric measures the ratio of the image signal to the noise it contains. To calculate it, another metric is used, the so-called Mean Square Error (MSE), which is calculated according to Eq. 5 and represents the mean of the squared error between the reference image I₀ and the reconstructed image I (X in our equations). Once the MSE is calculated, it is used to calculate the PSNR according to Eq. 6, in which MAX represents the maximum value that a pixel can take. The higher the PSNR value, the better the reconstruction obtained. SSIM measures the internal structures (shapes) of the images compared with the reference image.
Therefore, it does not focus on the gray levels of the pixels, but on the shapes of the reconstructed image with respect to the reference image; that is, it measures what is perceptible to the human eye. It is applied through windows of fixed size, and the difference between two windows x and y corresponding to the two images to be compared is calculated using Eq. 7. In this equation, μ_x and μ_y denote the average values of the windows x and y, σ²_x and σ²_y their variances, σ_{xy} the covariance between the windows, and c₁ and c₂ are two stabilizing variables dependent on the dynamic range of the image.

MSE = (1/(MN)) Σ_{i=0}^{M−1} Σ_{j=0}^{N−1} (I₀(i,j) − I(i,j))²    (5)

PSNR = 10 log₁₀( MAX(I₀)² / MSE )    (6)

SSIM = ((2 μ_x μ_y + c₁)(2 σ_{x,y} + c₂)) / ((μ²_x + μ²_y + c₁)(σ²_x + σ²_y + c₂))    (7)

C. Out-of-core computations

Some problems require the storage of data so large that there are no computers with such a main memory or, in case they exist, their prices are very high. Most operating systems provide a virtual memory system to store data (and programs) that do not fit into the computer's main memory at one time. However, its performance is not very high when employed on structured scientific problems. Hence, in high-performance scientific computing, special techniques, called Out-Of-Core (OOC) or Out-Of-Memory (OOM), are required to efficiently process data stored in the hard drive. These techniques keep the data stored in the hard drive, read them into memory, and write them back to disk whenever needed. Their aim is to minimize the effect of the slow speed of the read and write operations from/to disk, in order to render performance as high as possible.

1) Traditional approach: In modern computer architectures, floating-point operations are much faster than memory accesses.
Therefore, the ratio of flops to memory accesses in computations is very important. An increased ratio provides much higher performance, since it allows several or even many flops to be computed per memory access, and hence cache memories and other modern features can be fully exploited. For instance, matrix-matrix operations obtain significantly higher performance than matrix-vector operations. In linear algebra, unblocked algorithms perform one stage at a time (e.g. one column in column-oriented algorithms). In contrast, a blocked algorithm performs several stages (e.g. several columns in column-oriented algorithms) of the traditional (unblocked) algorithm at the same time, because this aggregation can take advantage of the more efficient matrix-matrix operations. The number of stages (e.g. columns) that are processed at the same time is usually called the block size. However, since most usual algorithms in linear algebra proceed on triangular matrices, processing a fixed number of columns (or rows) at the same time can make the data to be processed very large at the beginning and very small at the end, or vice versa. This can make performance suboptimal, both because main memory could be underused in some stages and because of the large variation in the size of the data being transferred. This variation of the transfer size can be a problem when the data are stored on disk, since these kinds of devices are more sensitive to transfer sizes.

There are usually two common types of algorithms: right-looking algorithms update the rest of the matrix (right part) after the processing of the current (block) column or row, thus requiring O(n³) writes. In contrast, left-looking algorithms update the current (block) column or row with the data previously processed (left part), thus requiring O(n²) writes.
Since the cost of a write operation on hard drives is usually higher than the cost of a read operation, left-looking algorithms are usually preferred when working on data stored on disk. Great efforts have been made to efficiently solve problems from linear algebra whose data do not fit in RAM and must be stored on disk [26], [27], [28], [29], [30], [31].

2) Algorithms-by-blocks: Like blocked algorithms, algorithms-by-blocks also perform several stages of the traditional (unblocked) algorithm at the same time, in order to take advantage of the higher speeds of matrix-matrix operations. Unlike blocked algorithms, algorithms-by-blocks achieve matrix-matrix operations by raising the granularity of the data. First, the traditional (unblocked) algorithm must be reformulated to perform operations that process only scalar elements. Then, the scalar elements are raised to square blocks of dimension b × b, and the operations processing them are accordingly raised too, so that they correctly process these square blocks. Therefore, in the end, the whole computation to be performed is divided into many tasks, each one processing a few square blocks (between one and four, but more usually two or three). One of the main benefits of this approach is that all blocks are always of the same size (except maybe for the final right and bottom blocks). This makes the majority of the transfers the same size. Thus, by tuning the block size for a given machine, all the data transfers will be very efficient, regardless of the stage of the algorithm (first stages or last stages). Quintana-Ortí et al.
[32], [33] developed a runtime that can process algorithms-by-blocks very efficiently by applying two techniques: the use of a cache of blocks stored in memory to reuse information, and the overlapping of computation and communications to reduce the cost of the latter.

3) QR factorization: The algorithm-by-blocks for efficiently computing the QR factorization was described in 2009 [34]. This approach employed the methods and runtime described by Quintana-Ortí et al. [32], [33]. However, these works assessed smaller matrices, they did not test modern fast Solid-State Drives (SSD), and they only assessed the QR factorization. In our current work, we have checked that this approach is still valid on much larger matrices, we have compared the performance of this approach on both traditional hard drives and modern SSDs, and we have implemented and assessed the application of previously computed orthogonal transformations and the resolution of triangular linear systems (problems not included in these previous works).

TABLE II: List of tasks generated by the algorithm-by-blocks for computing the QR factorization when m = n = 3b, where b is the block size.
  Operation                  | Out           | In
  Comp dense QR              | A00, S00      | A00
  Comp TD QR                 | A00, A10, S10 | A00, A10
  Comp TD QR                 | A00, A20, S20 | A00, A20
  Apply left Qt of dense QR  | A01           | A00, S00, A01
  Apply left Qt of TD QR     | A01, A11      | A10, S10, A01, A11
  Apply left Qt of TD QR     | A01, A21      | A20, S20, A01, A21
  Comp dense QR              | A11, S11      | A11
  Comp TD QR                 | A11, A21, S21 | A11, A21
  Apply left Qt of dense QR  | A02           | A00, S00, A02
  Apply left Qt of TD QR     | A02, A12      | A10, S10, A02, A12
  Apply left Qt of TD QR     | A02, A22      | A20, S20, A02, A22
  Apply left Qt of dense QR  | A12           | A11, S11, A12
  Apply left Qt of TD QR     | A12, A22      | A21, S21, A12, A22
  Comp dense QR              | A22, S22      | A22
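The ordering of the tasks in Table II can be reproduced with a short enumeration sketch (our illustrative helper `qr_tasks` is hypothetical, not the actual runtime): for each block column, first apply the transformations of all previous block columns, then factor the column itself.

```python
def qr_tasks(p):
    """Enumerate the tasks of a left-looking QR algorithm-by-blocks on a
    p x p grid of blocks, in the order of Table II (illustrative sketch)."""
    tasks = []
    for k in range(p):
        # Apply the transformations of every previous block column j < k
        # to the current block column k.
        for j in range(k):
            tasks.append(f"Apply left Qt of dense QR: out A{j}{k}; in A{j}{j}, S{j}{j}")
            for i in range(j + 1, p):
                tasks.append(f"Apply left Qt of TD QR: out A{j}{k}, A{i}{k}; in A{i}{j}, S{i}{j}")
        # Factor the current block column: dense QR on the diagonal block,
        # then triangular-dense (TD) QR with every block below it.
        tasks.append(f"Comp dense QR: out A{k}{k}, S{k}{k}")
        for i in range(k + 1, p):
            tasks.append(f"Comp TD QR: out A{k}{k}, A{i}{k}, S{i}{k}")
    return tasks

tasks = qr_tasks(3)  # the m = n = 3b case of Table II
print(len(tasks))    # 14
```

For p = 3 this produces the 14 tasks of Table II, in the same order.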
Figure 1 illustrates the process performed by a left-looking algorithm-by-blocks for computing the QR factorization of a 9 × 9 matrix with block size 3. The '•' symbol represents an element not modified by the current task, the '∗' symbol represents an element modified by the current task, and the '◦' symbol represents a nullified element (either by the current task or by a previous task). The nullified elements are shown because they store information about the Householder transformations that will later be used to apply them. The continuous lines surround the blocks involved in the current task. To reduce the size of this graphic, it only shows the factorization of the first and second block columns.

In the processing of the first column, as there are no previous columns, the work to do is just to nullify all the elements below the main diagonal. This process is performed with three tasks (tasks 1, 2, and 3). The first task nullifies elements below the diagonal in A00. The second and third tasks nullify elements in A10 and A20, respectively. To nullify those two blocks, these two tasks must also update the A00 block. In the processing of the second column, the first work to do is to apply previous transformations to the current block column (tasks 4, 5, and 6). Then, the elements below the diagonal in blocks A11 and A21 must be nullified (tasks 7 and 8). Table II lists all the tasks generated and executed by the algorithm-by-blocks for computing the QR factorization for the previous case (and also for the general case m = n = 3b, where b is the block size). As can be seen, in this case the full factorization comprises 14 tasks. The effect of the first eight tasks is shown in Figure 1.
The remaining tasks (not shown in the graphic) proceed in an analogous way on the third block column: first, the transformations obtained when annihilating the elements below the diagonal in the first block column are applied to the third block column. Second, the transformations obtained when annihilating the elements below the diagonal in the second block column are applied to the third block column. Finally, the elements below the diagonal in the third block column are annihilated.

The QR factorization of a matrix of any dimension only requires the following four generic tasks:

• Compute dense QR(A, S): This task nullifies all the elements below the diagonal of input/output block A. The output is two-fold: the first is the updated matrix A, and the second is the S factor. The upper triangular part of A contains the updated triangular factor R. The strictly lower triangular part of A contains the Householder reflectors generated in this QR factorization. Matrix S contains the S factors, also required to apply the transformations obtained in this task.

• Apply left Qt of dense QR(Y, S, C): The input data of this task are matrices Y (the Householder reflectors) and S (the S factors), the output of the previous task. Given these two input matrices Y and S, this task applies those transformations to input/output block C.

• Compute TD QR(T, D, S): The input data of this task are matrices T and D (triangular and dense, respectively, hence the acronym TD). This task nullifies all the elements in block D and accordingly updates block T.
The output is three-fold: the first output is matrix T (containing the updated triangular factor), the second output is matrix D (containing the Householder reflectors), and the third output is matrix S (containing the S factors).

• Apply left Qt of TD QR(D, S, F, G): The input data of this task are the matrix D (the Householder reflectors) and S (the S factors). Both of them are the output of the previous task, i.e. the computation of the QR factorization of a triangular-dense factor. This task correspondingly updates input/output matrices F and G with those transformations.

[Fig. 1: An illustration of the first tasks performed by an algorithm-by-blocks for computing the QR factorization. Panels (1)-(8) show the matrix after: (1) Compute dense QR(A00); (2) Compute TD QR(A00, A10); (3) Compute TD QR(A00, A20); (4) Apply left Qt of dense QR(A00, A01); (5) Apply left Qt of TD QR(A00, A10, A01, A11); (6) Apply left Qt of TD QR(A00, A20, A01, A21); (7) Compute dense QR(A11, S11); (8) Compute TD QR(A11, A21, S21). The '•' symbol represents an element not modified by the current task, '∗' an element modified by the current task, and '◦' a nullified element (by the current task or by a previous task). The continuous lines surround the blocks involved in the current task.]

These four generic tasks will be employed when computing the QR factorization (A = QR) and when computing the solution of the linear system X = R⁻¹(Qᵀ B).

4) System solving: When a linear system of equations AX = B must be solved by using the QR factorization, the first stage is obviously to compute the factorization A = QR. The second stage is the computation X = R⁻¹(Qᵀ B), where Qᵀ is the transpose of Q. The first sub-step of the second stage is to compute the product Qᵀ B. As usual in linear algebra, matrix Q (or its transpose) is not explicitly built, because of the large cost (in both space and time) of the building operation and the even larger computational cost of the subsequent matrix-matrix multiply. Instead, the transpose of matrix Q will be implicitly applied by using the Householder reflectors and the S factors previously obtained in the QR factorization. The second sub-step of the second stage is to multiply the inverse of R and the result of the previous sub-step (Qᵀ B).
As usual in linear algebra, to reduce the computational cost, the inverse of R is not explicitly computed; instead, a linear backward substitution is applied. A block row algorithm for the backward substitution has been employed, in order to both increase the locality and minimize the number of blocks being written (if a cache of blocks is employed).

TABLE III: List of tasks generated by the algorithm-by-blocks for solving a linear system using a previously computed QR factorization when m = n = 3b, where b is the block size.
  Operation                       | Out      | In
  Apply left Qt of dense QR       | B00      | A00, S00, B00
  Apply left Qt of TD QR          | B00, B10 | A10, S10, B00, B10
  Apply left Qt of TD QR          | B00, B20 | A20, S20, B00, B20
  Apply left Qt of dense QR       | B10      | A11, S11, B10
  Apply left Qt of TD QR          | B10, B20 | A21, S21, B10, B20
  Apply left Qt of dense QR       | B20      | A22, S22, B20
  Trsm lunn (B = upper(A)⁻¹ B)    | B20      | A22, B20
  Gemm nn mo (C = −AB + C)        | B10      | B10, A12, B20
  Trsm lunn (B = upper(A)⁻¹ B)    | B10      | A11, B10
  Gemm nn mo (C = −AB + C)        | B00      | B00, A01, B10
  Gemm nn mo (C = −AB + C)        | B00      | B00, A02, B20
  Trsm lunn (B = upper(A)⁻¹ B)    | B00      | A00, B00

Table III lists all the tasks generated and executed by the algorithm-by-blocks for computing X = R⁻¹(Qᵀ B) when the QR factorization has been previously computed, for the case m = n = 3b, where b is the block size.

III. EXPERIMENTAL STUDY

In this section, the precision and the speed of our new implementations are assessed. The first subsection describes the precision study, whereas the second subsection describes the performance study. In all the experiments we used double-precision arithmetic with double-precision real matrices.

A. Precision and image quality study

In a preliminary test to check the validity of this method, the Forbild Head Phantom [35] for different resolutions (from 64² to 512²) was projected and reconstructed.
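Before looking at the measured residuals, the factor-once/solve-many scheme of Section II-C can be sketched in-core with NumPy (toy sizes; a plain dense illustration of Eqs. 3-4, not the out-of-core algorithm-by-blocks):

```python
import numpy as np

# In-core sketch of Eqs. 3-4 on a toy system (sizes are illustrative only).
rng = np.random.default_rng(0)
M, N, S = 200, 120, 4                   # rays, pixels, slices
A = rng.standard_normal((M, N))         # stand-in for the weights matrix
X_true = rng.standard_normal((N, S))
B = A @ X_true                          # noiseless sinograms (W = 0)

Q, R = np.linalg.qr(A, mode="reduced")  # Eq. 3: factor A once
X = np.linalg.solve(R, Q.T @ B)         # Eq. 4: reuse Q and R for every B

# Relative residual, as reported in Table IV.
residual = np.linalg.norm(A @ X - B) / np.linalg.norm(A)
print(residual < 1e-10)  # True
```

A production code would apply Qᵀ implicitly through the Householder reflectors and use a triangular backward substitution instead of a general solve, exactly as the algorithm-by-blocks of Table III does.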
Table IV shows the relative residual r = ‖AX − B‖_F / ‖A‖_F for those resolutions. As can be seen, the data show that the method is numerically stable and the solution obtained is very accurate even at the highest resolution. Although the residual grows with the resolution, it is still low, so higher image resolutions could be reached if needed.

TABLE IV: Evolution of the relative residual.
  Resolution: 64²         256²        384²        512²
  Residual:   2.09·10⁻¹³  1.50·10⁻¹²  2.69·10⁻¹²  6.42·10⁻¹²

Table V shows the quality metrics results, as an average of the quality of every slice of the phantom. The number of slices for every resolution in these tests is 32 for the 64² pixels resolution, 128 for the 256², and so on. The SSIM metric is equal to 1 for every image resolution, which indicates we are not losing any internal structure of the images. The PSNR is high in every case, with results always above 200, although it is higher for the smaller resolutions. In other works such as [14], where we worked with iterative methods, we considered reconstructions with a PSNR of around 60 to be high-quality. Figures 2.1 and 2.2 show the central slice of the phantom and our reconstruction for a resolution of 512 × 512 pixels, the highest resolution we have reconstructed. As can be observed, the images are identical.

A randomly chosen collection of real CT images from the dataset DeepLesion [36] was also tested. The selected images, which had 512 × 512 pixels, were projected with Joseph's method and used as reference. With these images from the dataset, the average PSNR of the reconstructions for 2048 slices corresponding to different studies is 220, and the SSIM is 1. Figures 2.3 and 2.4 show that our method achieves really high-quality reconstructions, even though these images are much more complex than the phantom.

TABLE V: Average reconstruction image quality.
  Resolution: 64²   256²  384²  512²
  PSNR:       258   228   220   204
  SSIM:       1     1     1     1

[Fig. 2: CT images. (1) Phantom reference; (2) QR reconstruction; (3) Chest CT reference; (4) QR reconstruction.]

B. Performance study

The computer used in the performance experiments featured one Intel i7-7800X CPU (6 physical cores) and 128 GiB of RAM in total. The clock frequency of the processor was 3.50 GHz, and the so-called Max Turbo Frequency was 4.00 GHz. In addition to one small SSD for storing the operating system and programming tools, the computer had two disks that were employed in the experiments, both with a capacity of 2 TB: one Hard Disk Drive (HDD) and one Solid-State Drive (SSD) with an M.2 connector. The HDD (spinning disk) was a Toshiba DT01ACA200 (Firmware MX4OABB0). The SSD was a Samsung SSD 970 EVO 2TB (Firmware 1B2QEXE7). According to the Linux hdparm tool, the read speed of the first one was about 191.43 MB/s, whereas the read speed of the second one was about 2427.50 MB/s. This is an upper-middle desktop personal computer, and its current price is only a few thousand dollars. Its OS was GNU/Linux (kernel version 3.10.0-862.14.4.el7.x86_64). The GCC compiler (version 4.8.5 20150623) was used. Intel(R) Math Kernel Library (MKL) Version 2018.0.2 Product Build 20180127 for Intel(R) 64 architecture was employed for solving some advanced linear algebra problems. Our new implementations were coded with the libflame (Release 11104) high-performance library, which employed Intel's MKL for performing the small- and medium-sized basic linear algebra computations. Because of the variability of the experimental running time on some computers, when solving linear systems three experiments were run, and the average values were reported.
Nevertheless, we must say that the three obtained times were similar on the assessed architecture. All the experiments reported show only the time required by the computation X = R^(−1)(Q^T B), since the QR factorization can be computed only once and then employed for many different images. Unless explicitly stated otherwise, all the experiments employed six threads (and therefore six cores) for computation, since the computer had six cores. The only exception was the codes with overlapping of computation and I/O; in this case, five threads (five cores) were employed for computation and one thread (one core) was employed for disk I/O tasks. We have assessed four configurations, obtained as the combinations of two OOC AB methods (non-overlapping or basic OOC AB, and overlapping OOC AB) and two types of disks (HDD and SSD). The four assessed configurations were the following:
• B-OOC + HDD: The basic (non-overlapping) Out-Of-Core Algorithm-by-Blocks for solving the linear system, employed on the HDD described above. This is also called the initial configuration.
• O-OOC + HDD: The Out-Of-Core Algorithm-by-Blocks with overlapping of computation and I/O for solving the linear system, employed on the HDD described above.
• B-OOC + SSD: The basic (non-overlapping) Out-Of-Core Algorithm-by-Blocks for solving the linear system, employed on the SSD described above.
• O-OOC + SSD: The Out-Of-Core Algorithm-by-Blocks with overlapping of computation and I/O for solving the linear system, employed on the SSD described above.
In all our implementations we employed a block size of 10240 for the OOC computations (the number of rows and columns of every square block, each kept in a different file), since this size usually renders good results on all the assessed code variants [34], [32], [33].
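The "factorize once, reconstruct many" pattern timed in the experiments can be sketched on a tiny dense system (the real weights matrix is 266,500 × 262,144 and lives on disk; all sizes and names below are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 80, 50
A = rng.standard_normal((m, n))     # stand-in for the full-rank weights matrix

# One-time cost: the QR factorization (done out-of-core in the paper).
Q, R = np.linalg.qr(A)              # reduced QR: Q is m x n, R is n x n

def reconstruct(B):
    """Recurrent per-study cost: one product and one triangular solve,
    X = R^{-1} (Q^T B), with one column of B per slice."""
    return np.linalg.solve(R, Q.T @ B)

B = A @ rng.standard_normal((n, 4))   # 4 hypothetical noise-free slices
X = reconstruct(B)

# Relative residual as defined above: r = ||AX - B||_F / ||A||_F.
r = np.linalg.norm(A @ X - B) / np.linalg.norm(A)
print(f"relative residual: {r:.1e}")
```

Because the factorization is reused, only `reconstruct` is paid per study, which is why the reported experiments exclude the factorization time.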
In our codes, the block size employed inside every task to process the blocks once they are stored in RAM was 128, since this size usually renders good performance when processing matrices of size 10240. In the rest of the codes not developed by us (matrix-matrix products, etc.), the block size was determined by the library that performed that task (usually Intel's MKL).

Fig. 3: Overall times and decomposed times (I/O time and computational time) of the initial configuration (B-OOC + HDD) for solving a linear system with A of dimension 266,500 × 262,144, and B of dimension 266,500 × k, where k is the number of slices.

Figure 3 shows the overall times and the decomposed times of the initial configuration (B-OOC + HDD, that is, the basic or non-overlapping OOC Algorithm-by-Blocks on the HDD) for solving a linear system with A of dimension 266,500 × 262,144, and B of dimension 266,500 × k, where k is the number of slices. The aim of this plot was to assess whether the process was feasible, and to determine the main bottleneck of the application. For the system with 2048 slices, 2.50 seconds per slice were needed; for the system with 256 slices, 14.54 seconds per slice were needed. These times showed that the process was feasible, but they were somewhat high in some cases and very high in others. Moreover, the decomposition of the time showed that the I/O times were very high, but they did not grow much as the number of slices increased. Therefore, the main bottleneck of this problem was the I/O time, rather than the computational time.
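The out-of-core layout described above (square blocks, one file per block, streamed through RAM) can be sketched as a toy blocked product. A block size of 4 stands in for the paper's 10240; file names and sizes are invented for the demonstration.

```python
import os
import tempfile
import numpy as np

bs, nblocks, k = 4, 3, 2
rng = np.random.default_rng(2)
M = rng.standard_normal((bs * nblocks, bs * nblocks))   # matrix kept "on disk"
B = rng.standard_normal((bs * nblocks, k))              # in-core right-hand sides

# Write the matrix to disk as square blocks, one file per block.
tmp = tempfile.mkdtemp()
for i in range(nblocks):
    for j in range(nblocks):
        np.save(os.path.join(tmp, f"block_{i}_{j}.npy"),
                M[i * bs:(i + 1) * bs, j * bs:(j + 1) * bs])

# Form C = M @ B by streaming one block at a time: only a single
# bs x bs block ever resides in RAM.
C = np.zeros((bs * nblocks, k))
for i in range(nblocks):
    for j in range(nblocks):
        blk = np.load(os.path.join(tmp, f"block_{i}_{j}.npy"))
        C[i * bs:(i + 1) * bs] += blk @ B[j * bs:(j + 1) * bs]

print(np.allclose(C, M @ B))   # same result as the in-core product
```

In the real codes the per-block work is further tiled (inner block size 128) and dispatched as tasks by the libflame runtime; this sketch only shows the blocks-in-files idea.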
Then, adding more cores or several GPUs to the hardware configuration was not going to help in this case; the focus should instead be on a fast disk. Figure 4 compares the performance of the four configurations described above: basic OOC AB on HDD, overlapping OOC AB on HDD, basic OOC AB on SSD, and overlapping OOC AB on SSD. The top subplot shows the times in seconds (lower is better), whereas the bottom subplot shows the speedup (higher is better) with respect to the initial configuration (basic OOC AB on HDD). The speedup is computed as the quotient of the time obtained by the reference configuration and the time obtained by the new configuration; that is, it expresses how many times faster the new configuration is than the reference configuration. Hence, the higher the speedups, the better the performance. As the reference configuration is the initial one, it is shown as ones in the bottom subplot. As can be seen, the SSD greatly reduced the overall times and increased the speed by more than 6 times for the smallest case (256 slices) with respect to the initial configuration. The overlapping of computation and I/O further increased the speedup to nearly 9 times for the smallest case (256 slices). When the number of slices was high, the improvements were not so great but still very noticeable.

Fig. 4: Time and speedups for the four configurations.
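The speedup definition above can be checked directly against the per-slice times later reported in Table VI for the smallest case (256 slices):

```python
# Speedup = time of the reference configuration / time of the new one.
t_reference = 14.54          # B-OOC + HDD, seconds per slice, 256 slices
t_new = 1.65                 # O-OOC + SSD, seconds per slice, 256 slices
speedup = t_reference / t_new
print(f"speedup: {speedup:.1f}x")   # -> 8.8x, i.e. "nearly 9 times"
```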
Fig. 5: Overall times and decomposed times (I/O time, computational time, and overlapped time) of three configurations for solving a linear system with A of dimension 266,500 × 262,144, and B of dimension 266,500 × k, where k is the number of slices.

Fig. 6: Time in seconds per slice for the four configurations.

Figure 5 shows the overall times and the decomposed times for solving a linear system with A of dimension 266,500 × 262,144, and B of dimension 266,500 × k, where k is the number of slices, on three configurations: B-OOC + HDD, B-OOC + SSD, and O-OOC + SSD. The left bar for each number of slices shows the overall and decomposed times of the initial configuration (B-OOC + HDD). As can be seen, its main drawback is the high I/O cost caused by using an HDD. The center bar for each number of slices shows the overall and decomposed times of a similar configuration with an SSD (B-OOC + SSD). As can be seen, the high I/O cost has been greatly reduced. The right bar for each number of slices shows the overall times of the best configuration (O-OOC + SSD). As this configuration overlaps computation and I/O, the time cannot be decomposed. As can be seen, in most cases the I/O cost (on the fast SSD) of the previous configuration is completely removed. Figure 6 shows the time in seconds required to compute one slice.
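The overlapping scheme that hides the I/O cost can be sketched as a producer-consumer pair: one dedicated I/O thread prefetches blocks into a small bounded queue while the compute thread consumes them, mirroring the five-compute-threads-plus-one-I/O-thread split used in the O-OOC variants. Disk reads are simulated with in-memory lists; everything here is an illustrative stand-in.

```python
import queue
import threading

blocks = [list(range(i, i + 4)) for i in range(0, 20, 4)]  # fake on-disk blocks
prefetched = queue.Queue(maxsize=2)   # bounded queue: double buffering

def io_thread():
    # "Read" each block from disk and hand it to the compute thread.
    for blk in blocks:
        prefetched.put(blk)
    prefetched.put(None)              # sentinel: no more blocks

threading.Thread(target=io_thread, daemon=True).start()

total = 0
while (blk := prefetched.get()) is not None:
    total += sum(blk)                 # "compute" on one block while the
                                      # I/O thread fetches the next one
print(total)
```

The bounded queue is the key design choice: it caps the RAM held by prefetched blocks while keeping the compute thread fed, so on a fast SSD the read time disappears behind the computation.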
As it shows, this time was not constant, and it depended somewhat on the number of slices: the more slices to compute, the lower the time per slice. Just consider that, regardless of the number of slices (even for just one slice), the whole factorized matrix A must be read from disk. Thus, this large cost becomes diluted as more slices are computed. In the initial configuration (basic OOC AB and HDD) the time per slice greatly depended on the number of slices, whereas in the best configuration (overlapping OOC AB on SSD) the time per slice is not so dependent on the number of slices. Table VI shows the time in seconds required to compute one slice for both the initial configuration and the most performant configuration. As can be seen, the range of the initial configuration is very wide (from 2.50 to 14.54 seconds), whereas the range of the most performant configuration is much narrower (from 0.72 to 1.65 seconds). The weights matrix for the highest resolution in our experimental study required a storage of about 560 GB (266,500 × 262,144 double-precision elements). Besides the weights matrix, additional space (patient's data, final image, temporary data, application code, operating system, disk buffers and cache, etc.) makes the total size required by this problem even larger. As mentioned above, the computer had 128 GB of RAM. However, only 32 GB were employed as a cache to store blocks of the weights matrix, leaving the rest for other purposes (operating system disk cache and buffers, etc.). We assessed another computer with 48 GB of RAM, and the results were similar when using a similar number of cores, but we do not report them because it was a much more expensive server. Hence, to obtain good performance, a smaller main memory could be used (thus reducing the total price), but a fast SSD is a strong requirement.
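The dilution argument above suggests a simple cost model, t(k) = T_fixed / k + t_slice, where T_fixed is the fixed cost of streaming the factorized matrix and t_slice the per-slice compute cost. Fitting the two B-OOC + HDD endpoints from Table VI is our own rough illustration, not an analysis from the paper:

```python
# Fit t(k) = T_fixed / k + t_slice to the 256- and 2048-slice points.
t_256, t_2048 = 14.54, 2.50                       # reported seconds per slice
T_fixed = (t_256 - t_2048) / (1 / 256 - 1 / 2048) # fixed streaming cost, s
t_slice = t_2048 - T_fixed / 2048                 # per-slice compute cost, s

pred_512 = T_fixed / 512 + t_slice
print(f"predicted t(512) = {pred_512:.2f} s")     # reported value: 7.72 s
```

The two-point fit lands close to the reported intermediate values, which is consistent with the text's claim that a large fixed read cost dominates and is diluted over the slices.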
TABLE VI: Time in seconds per slice versus number of slices.

  Number of slices     256     512     1024    2048
  B-OOC + HDD          14.54   7.72    4.33    2.50
  O-OOC + SSD          1.65    1.00    0.83    0.72

IV. CONCLUSIONS

In this paper, we present a direct algebraic method based on the QR factorization for reconstructing CT images efficiently on affordable computers. As we have shown, this method is numerically stable even for high resolutions, provided the weights matrices have full rank. For this reason, our method employs more X-ray projections than the iterative methods, but fewer than the analytical methods. Besides, using too few projections can affect the result of the diagnosis, since some pathologies could be hidden or simulated. With our proposed method, which uses a number of projections that guarantees the full rank of the weights matrix, high-quality images are obtained without requiring a-priori knowledge or interaction with the patient. This method guarantees that no artifacts are created except those produced by problems in the detectors, dispersion, movement, intensity of the source, etc., which can be corrected by filtering and segmentation techniques. In addition, our reconstructions achieve remarkable quality even for complex real CT images. It is worth mentioning that we have not considered or removed the possible noise in the projections, which is left as future work. We have shown that an efficient reconstruction of CT images can be achieved using Out-Of-Core and Algorithm-by-Blocks techniques. By using our techniques, affordable computers with a price about one order of magnitude lower can be successfully employed, because a large main memory (which is quite expensive) is not required, just a fast drive. For this reason, the equipment needed to reconstruct the images is affordable and thus more accessible to the public.
The type of drive can improve our reconstruction times drastically. When using an HDD, the performance is dominated by the I/O time, whereas when using an SSD the I/O time is greatly reduced and the performance is dominated by the computational time again. Furthermore, the method that overlaps computation and I/O can further reduce the reconstruction time, thus making our method more competitive. We could perform a standard CT study with resolution 512² and 256 slices in about 4 minutes. We have also shown that the cost per slice decreases as the number of simultaneous slices to reconstruct grows, which would be beneficial for full-body CT scans. Considering all of the above, we conclude that computing high-quality reconstructions with direct algebraic methods on affordable equipment can be achieved. Since our method is stable, we could increase the resolution of the images, provided we had enough storage space, and still obtain valid results.

REFERENCES

[1] A. C. Kak and M. Slaney, Principles of Computerized Tomographic Imaging. SIAM, Jan. 2001, vol. 33.
[2] A. B. De González, M. Mahesh, K.-P. Kim, M. Bhargavan, R. Lewis, F. Mettler, and C. Land, "Projected cancer risks from computed tomographic scans performed in the United States in 2007," Archives of Internal Medicine, vol. 169, no. 22, pp. 2071–2077, 2009.
[3] E. Hall and D. Brenner, "Cancer risks from diagnostic radiology," The British Journal of Radiology, vol. 81, no. 965, pp. 362–378, 2008.
[4] X. Tang, J. Hsieh, R. A. Nilsen, S. Dutta, D. Samsonov, and A. Hagiwara, "A three-dimensional-weighted cone beam filtered backprojection (CB-FBP) algorithm for image reconstruction in volumetric CT helical scanning," Physics in Medicine & Biology, vol. 51, no. 4, p. 855, 2006.
[5] T. Zhuang, S. Leng, B. E. Nett, and G.-H. Chen, "Fan-beam and cone-beam image reconstruction via filtering the backprojection image of differentiated projection data," Physics in Medicine & Biology, vol. 49, no. 24, p. 5489, 2004.
[6] S. Mori, M. Endo, S. Komatsu, S. Kandatsu, T. Yashiro, and M. Baba, "A combination-weighted Feldkamp-based reconstruction algorithm for cone-beam CT," Physics in Medicine & Biology, vol. 51, no. 16, p. 3953, 2006.
[7] M. J. Willemink, P. A. de Jong, T. Leiner, L. M. de Heer, R. A. Nievelstein, R. P. Budde, and A. M. Schilham, "Iterative reconstruction techniques for computed tomography part 1: technical principles," European Radiology, vol. 23, no. 6, pp. 1623–1631, 2013.
[8] M. J. Willemink, T. Leiner, P. A. de Jong, L. M. de Heer, R. A. Nievelstein, A. M. Schilham, and R. P. Budde, "Iterative reconstruction techniques for computed tomography part 2: initial results in dose reduction and image quality," European Radiology, vol. 23, no. 6, pp. 1632–1642, 2013.
[9] A. H. Andersen, "Algebraic reconstruction in CT from limited views," IEEE Transactions on Medical Imaging, vol. 8, no. 1, pp. 50–55, 1989.
[10] A. H. Andersen and A. C. Kak, "Simultaneous algebraic reconstruction technique (SART): a superior implementation of the ART algorithm," Ultrasonic Imaging, vol. 6, no. 1, pp. 81–94, 1984.
[11] W. Yu and L. Zeng, "A novel weighted total difference based image reconstruction algorithm for few-view computed tomography," PLoS ONE, vol. 9, no. 10, 2014.
[12] L. Flores, V. Vidal, and G. Verdú, "Iterative reconstruction from few-view projections," Procedia Computer Science, vol. 51, pp. 703–712, 2015.
[13] L. A. Flores, V. Vidal, P. Mayo, F. Rodenas, and G. Verdú, "Parallel CT image reconstruction based on GPUs," Radiation Physics and Chemistry, vol. 95, pp. 247–250, 2014.
[14] M. Chillarón, V. Vidal, D. Segrelles, I. Blanquer, and G. Verdú, "Combining grid computing and Docker containers for the study and parametrization of CT image reconstruction methods," Procedia Computer Science, vol. 108, pp. 1195–1204, 2017.
[15] Y. Liu, Z. Liang, J. Ma, H. Lu, K. Wang, H. Zhang, and W. Moore, "Total variation-Stokes strategy for sparse-view X-ray CT image reconstruction," IEEE Transactions on Medical Imaging, vol. 33, no. 3, pp. 749–763, Mar. 2014.
[16] J. Tang, B. E. Nett, and G.-H. Chen, "Performance comparison between total variation (TV)-based compressed sensing and statistical iterative reconstruction algorithms," Physics in Medicine and Biology, vol. 54, no. 19, pp. 5781–5804, Oct. 2009.
[17] B. Vandeghinste, S. Vandenberghe, C. Vanhove, S. Staelens, and R. Van Holen, "Low-dose micro-CT imaging for vascular segmentation and analysis using sparse-view acquisitions," PLoS ONE, vol. 8, no. 7, p. e68449, Jul. 2013.
[18] Z. Zhu, K. Wahid, P. Babyn, D. Cooper, I. Pratt, and Y. Carter, "Improved compressed sensing-based algorithm for sparse-view CT image reconstruction," Computational and Mathematical Methods in Medicine, vol. 2013, p. 185750, Mar. 2013.
[19] I. Sechopoulos, "A review of breast tomosynthesis. Part II. Image reconstruction, processing and analysis, and advanced applications," Medical Physics, vol. 40, no. 1, 2013.
[20] H. Qi, Z. Chen, and L. Zhou, "CT image reconstruction from sparse projections using adaptive TpV regularization," Computational and Mathematical Methods in Medicine, vol. 2015, p. 354869, May 2015.
[21] M. J. Rodríguez-Alvarez, F. Sanchez, A. Soriano, L. Moliner, S. Sanchez, and J. M. Benlloch, "QR-factorization algorithm for computed tomography (CT): comparison with FDK and conjugate gradient (CG) algorithms," IEEE Transactions on Radiation and Plasma Medical Sciences, vol. 2, no. 5, pp. 459–469, 2018.
[22] M. Chillarón, V. Vidal, and G. Verdú, "CT image reconstruction with SuiteSparseQR factorization package," Radiation Physics and Chemistry, 2019.
[23] M. Chillarón, V. Vidal, G. Verdú, and J. Arnal, "CT medical imaging reconstruction using direct algebraic methods with few projections," in Computational Science – ICCS 2018. Cham: Springer International Publishing, 2018, pp. 334–346.
[24] P. Joseph, "An improved algorithm for reprojecting rays through pixel images," IEEE Transactions on Medical Imaging, vol. 1, no. 3, pp. 192–196, 1982.
[25] A. Hore and D. Ziou, "Image quality metrics: PSNR vs. SSIM," in 2010 20th International Conference on Pattern Recognition. IEEE, Aug. 2010, pp. 2366–2369.
[26] S. Toledo and F. Gustavson, "The design and implementation of SOLAR, a portable library for scalable out-of-core linear algebra computations," in Proceedings of the Annual Workshop on I/O in Parallel and Distributed Systems (IOPADS), Aug. 1998.
[27] E. D'Azevedo and J. Dongarra, "Design and implementation of the parallel out-of-core ScaLAPACK LU, QR, and Cholesky factorization routines," Concurrency: Practice and Experience, vol. 12, pp. 1481–1493, Dec. 2000.
[28] W. C. Reiley and R. A. van de Geijn, "POOCLAPACK: Parallel Out-of-Core Linear Algebra Package," Department of Computer Sciences, The University of Texas at Austin, Tech. Rep. CS-TR-99-33, 1999.
[29] B. C. Gunter and R. A. van de Geijn, "Parallel out-of-core computation and updating of the QR factorization," ACM Transactions on Mathematical Software, vol. 31, no. 1, pp. 60–78, Mar. 2005.
[30] T. Joffrain, E. S. Quintana-Ortí, and R. A. van de Geijn, "Rapid development of high-performance out-of-core solvers," in Proceedings of PARA 2004, ser. LNCS, no. 3732. Springer-Verlag Berlin Heidelberg, 2005, pp. 413–422.
[31] B. C. Gunter, W. C. Reiley, and R. A. van de Geijn, "Parallel out-of-core Cholesky and QR factorizations with POOCLAPACK," in Proceedings of the 15th International Parallel and Distributed Processing Symposium (IPDPS). IEEE Computer Society, 2001.
[32] G. Quintana-Ortí, F. D. Igual, M. Marqués, E. S. Quintana-Ortí, and R. A. Van de Geijn, "A runtime system for programming out-of-core matrix algorithms-by-tiles on multithreaded architectures," ACM Transactions on Mathematical Software (TOMS), vol. 38, no. 4, p. 25, 2012.
[33] M. Marqués, G. Quintana-Ortí, E. S. Quintana-Ortí, and R. van de Geijn, "Using desktop computers to solve large-scale dense linear algebra problems," The Journal of Supercomputing, vol. 58, no. 2, pp. 145–150, Nov. 2011.
[34] M. Marqués, G. Quintana-Ortí, E. S. Quintana-Ortí, and R. A. van de Geijn, "Out-of-core computation of the QR factorization on multi-core processors," in Lecture Notes in Computer Science 5704, Euro-Par'2009 (Eds. H. Sips, D. Epema, H.-X. Lin), 2009, pp. 809–820.
[35] G. Lauritsch and H. Bruder, "FORBILD head phantom." [Online]. Available: http://www.imp.uni-erlangen.de/phantoms/head/head.html
[36] K. Yan, X. Wang, L. Lu, and R. M. Summers, "DeepLesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning," Journal of Medical Imaging, vol. 5, no. 3, p. 036501, 2018.