Performance Evaluation of Multiple TCP connections in iSCSI


Authors: Bhargava Kumar K, Ganesh M. Narayan, K. Gopinath

{bhargava, nganesh, gopi}@csa.iisc.ernet.in
Computer Architecture and Systems Laboratory
Dept. of Computer Science and Automation
Indian Institute of Science, Bangalore.

Abstract

Scaling data storage is a significant concern in enterprise systems, and Storage Area Networks (SANs) are deployed as a means to scale enterprise storage. SANs based on Fibre Channel have been used extensively in the last decade, while iSCSI is fast becoming a serious contender due to its reduced costs and unified infrastructure. This work examines the performance of iSCSI with multiple TCP connections. Multiple TCP connections are often used to realize higher bandwidth, but there may be no fairness in how bandwidth is distributed. We propose a mechanism to share congestion information across multiple flows in "Fair-TCP" for improved performance. Our results show that Fair-TCP significantly improves performance for I/O-intensive workloads.

1. Introduction

Future computer systems are required to scale to the large volume of data being generated and used, aided by the increasing capacities and dropping prices of magnetic disks. Traditional DAS architectures, based on the parallel SCSI transport, scale poorly owing to their distance, connectivity, and throughput limitations, and are being replaced by networked storage systems such as SANs. Figure 1 shows a SAN connecting multiple servers to multiple targets. SANs, where the storage devices are connected directly to a high-speed network, can provide high scalability and throughput guarantees; SANs allow any-to-anywhere access across the network, using interconnect elements such as routers, gateways, hubs, and switches; they also facilitate storage sharing between possibly heterogeneous servers to improve storage utilization and reduce downtime.
Entities in a SAN, both storage and servers, communicate using SCSI commands. A sender encapsulates SCSI commands over a transport protocol and sends them to one or more receivers; receivers receive the payload, decapsulate the commands, and execute them. Thus a SAN is defined by the transport it uses and the encapsulation standard it follows. In this light, there are two competing industry standards that allow us to build SANs, FC and iSCSI, each based on differing transport and encapsulation standards.

[Figure 1. Elements and ecosystem of an enterprise SAN]

Fibre Channel (FC) is a serial interface, usually implemented with fibre-optic cable. The FC standard [2] covers the physical, link, network, and transport layers of the OSI network stack and also provides a SCSI encapsulation protocol, FCP. FC SANs, with most FCP implementations being hardware accelerated, provide better throughput guarantees. However, FC installations are costlier and cannot be deployed over long distances. Fibre Channel requires custom network components and is not able to take advantage of the steep technology curves and dropping costs of IP-based networks.

Internet SCSI, or iSCSI [1], is a storage networking standard that transports SCSI commands over TCP/IP, essentially tunneling storage protocols on top of TCP, hence IP, to leverage the installed equipment base. This allows iSCSI to be used over any TCP/IP network infrastructure, with the remote device being seen by the operating system as a locally available block-level device. Unlike Fibre Channel, iSCSI can run on any network infrastructure that supports TCP/IP.

[Figure 2. Difference in congestion window of 2 different connections]
A network that uses iSCSI needs only one network for both storage and data traffic, whereas Fibre Channel requires separate infrastructure for storage and data traffic. However, a response to a block-level request in iSCSI may encounter a greater delay compared to Fibre Channel, depending on the network conditions and the location of the target. Current efforts to improve end-to-end performance for TCP take advantage of the empirically discovered mechanism of striping data transfers across a set of parallel TCP connections between a sender and receiver to substantially increase TCP throughput. However, when multiple connections are used between the same source-target pair, the connections themselves interact and compete with each other in nontrivial ways. In order to achieve optimal throughput it is imperative that we understand these interactions and treat the connections accordingly; failing to do so can lead to increased congestion and reduced throughput.

In our work, we study the effects of using multiple TCP connections on iSCSI. It was shown in [10] that aggregate iSCSI throughput increases with the number of TCP connections in an emulated wide area network. We find that the multiple TCP connections used by iSCSI compete with each other and yield less throughput for iSCSI than they are capable of. We propose a solution named Fair-TCP, based on TCP control block interdependence [20], for managing TCP connections. We compare the performance of our variant with standard TCP Reno [3] with SACK [4], using various workloads and varying delays in an emulated wide area network. We find that for I/O-intensive workloads such as sequential writes to a large file, Postmark, and Bonnie, Fair-TCP provides significant performance improvements over standard TCP Reno with SACK.

[Figure 3. Congestion window of 2 different connections]
Section 2 describes the behaviour of multiple TCP connections and its effects on iSCSI; the proposed solution is also outlined there. Section 3 details the experimental setup, tools, and benchmarks used in our experiments. Section 4 presents our results with a discussion. Section 5 reviews related work. Section 6 concludes the paper.

2. iSCSI and TCP

The SCSI standard assumes that the underlying transport is reliable and supports FIFO ordering of commands. TCP has mechanisms to acknowledge received TCP packets and to resend/request packets that are not acknowledged within a certain time period, effectively guaranteeing reliable and in-order delivery of packets. Choosing TCP as the transport is thus natural. If iSCSI were defined on top of a protocol that is not reliable and in-order, then iSCSI would have had to provide these services by itself.

iSCSI initiators are usually connected to iSCSI targets using multiple TCP connections. The reason is twofold: first, due to TCP window size restrictions and round trip times over long distances, it might not be possible for a single TCP connection to utilize the full bandwidth capacity of the underlying link; secondly, there may also be several physical interconnects connecting the initiator and target, and it would be most desirable to aggregate and simultaneously utilize all such existing physical interconnects. As TCP does not support such aggregation, an iSCSI session is therefore defined to be a collection of one or more TCP connections between the initiator and the target.

[Figure 4. Fair-TCP design]

2.1. Behaviour of multiple TCP connections

Many applications use multiple TCP connections between client and server for increased throughput.
However, these TCP connections are treated independently: most TCP implementations keep state on a per-connection basis in a structure called the TCP control block (TCB), or an equivalent construct, and each of these TCP connections is handled independently. Several researchers ([18], [19], [20]) have shown that concurrent connections such as these compete with each other for link bandwidth, often resulting in unfair and arbitrary sharing of bandwidth. Concurrent connections do not share indications of congestion along the shared path between the sender and receiver. Therefore each connection independently pushes the network to the point where packet losses are bound to happen. Once the network is congested, all the competing connections reduce their transmission windows drastically, thus limiting the effective bandwidth available to the application. This results in underutilization of the shared link, and hence less aggregate throughput than the application is capable of achieving. Also, it often happens that some of the connections stall due to multiple losses, while others proceed unaffected. Thus concurrent TCP connections, when left without any explicit arbitration, provide neither full bandwidth utilization nor fairness.

Some of the information in a TCB, like the round-trip time (RTT), is not application specific but is specific to a host (or subnet). If there are multiple TCP connections between the same hosts, each will independently monitor its transmissions to estimate the RTT between the hosts. Such a scheme is wasteful, as it needs extra processing and memory at a TCP endpoint. An alternate scheme is to share this information between such concurrent connections.

To see if iSCSI suffers from any of the above problems, we evaluated the performance of iSCSI with multiple TCP connections.
We observed traces of the congestion window of each connection for a sequential file write of 1 GByte, to see if the connections were competing with each other. Figure 3 shows a sample of the congestion window traces of 2 different connections in a WAN environment with a delay of 4 ms, over a period of 1.2 seconds. The congestion window was sampled approximately every 10 ms. Figure 2 shows a sample of the difference in congestion window of the 2 connections over a period of 50 seconds.

Table 1. Ensemble Allocation
  conn_srtt         = ecb_srtt
  conn_rttvar       = ecb_rttvar
  conn_snd_cwnd     = ecb_snd_cwnd / ref_cnt
  conn_snd_ssthresh = ecb_snd_ssthresh / ref_cnt

From the traces, we can see that the two connections compete for bandwidth, resulting in one connection using the network more than the other. The observed means and standard deviations of the congestion windows of the two connections are 3.38/2.14 and 3.38/2.13. The observed mean and standard deviation of the difference in window sizes shown in Figure 2 are 0 and 3.06. The mean of 0 in the window difference indicates that over long periods each connection gets the same amount of network bandwidth. The larger deviation of the window difference, compared to the deviations of each connection's own window, indicates that when one connection has a large window the other connection has a small window. This is very undesirable behaviour from the TCP connections and results in reduced throughput. Similar patterns were observed in all traces. For the above traces the mean turnaround time was 453 ms with a standard deviation of 325 ms, which we try to reduce. Thus we conclude that the multiple connections used by iSCSI do compete, share the bandwidth disproportionately, and underutilize the resources.
In our work, we share the congestion information among the different TCP connections to reduce command turnaround times and increase the throughput of iSCSI.

2.2. Fair-TCP

Several researchers have worked on sharing congestion information among multiple TCP connections ([20], [21], [22]). Touch [20] proposed sharing TCP state among similar connections to improve the behaviour of a connection bundle. A bundle of TCP connections sharing TCB information is called an ensemble. We have implemented a congestion information sharing mechanism, Fair-TCP, based on Touch [20]. The TCBs of individual connections are stripped of RTT and congestion control variables. Instead, they now simply contain a reference to the Ensemble Control Block (ECB) of the ensemble they are part of. Fair-TCP does not support caching of TCB states, since connections in an iSCSI session are persistent for a very long time and are not re-established frequently. Figure 4 outlines the design of Fair-TCP.

Fair-TCP aggregates the congestion window and slow start threshold values in the ECB, per ensemble. The ensemble allocates a fair share of the available window to each connection. Fair-TCP shares the round trip time information among connections of the ensemble. Fair-TCP maintains a reference count of the number of connections in the ensemble. Table 1 outlines the allocation of congestion information to connections of the ensemble.

For each window update received from a connection, the aggregate window is adjusted appropriately. The most recent values of srtt (smoothed round trip time) and rttvar (round trip time variance) reported by a connection are maintained in the ensemble. Whenever a new connection is established, it is added to the corresponding ensemble without any changes to the ensemble.
If there is no ensemble corresponding to that connection, a new ensemble is created and initialized with the values from that connection. Fair-TCP has been implemented on both the target and the initiator.

3. Experimental Setup

3.1. Tools and Benchmarks

The UNH-iSCSI [15] protocol implementation of the initiator and target is used for all our experiments. It is designed and maintained by the UNH InterOperability Lab's iSCSI Consortium. The implementation consists of initiator and target drivers for Linux 2.4.x and 2.6.x kernels. It supports multiple sessions between a given initiator-target pair, multiple connections per session, an arbitrary number of outstanding R2Ts, all combinations of the InitialR2T and ImmediateData keys, arbitrary values of the data-transfer-size-related iSCSI parameters, and Error Recovery Level 1.

The NIST Net [24] network emulation tool for GNU/Linux is used for introducing delays. NIST Net allows a GNU/Linux PC set up as a router to emulate a wide variety of network conditions. The tool is designed to allow controlled, reproducible experiments for network-performance-sensitive/adaptive applications and control protocols in a simple laboratory setting. By operating at the IP level, NIST Net can emulate the critical end-to-end performance characteristics imposed by various wide area network situations (e.g., congestion loss) or by various underlying subnetwork technologies. The tool allows an inexpensive PC-based router to emulate numerous complex performance scenarios, including tunable packet delay distributions, congestion and background loss, bandwidth limitation, and packet reordering/duplication.

Bonnie++ [25] is a benchmark suite that is aimed at performing a number of simple tests of hard drive and file system performance.
The benchmark tests database-type access to a single file (or a set of files), and it tests creation, reading, and deleting of small files, which can simulate the usage of programs such as Squid, INN, or Maildir-format email. The first six tests include per-char write, block write, block rewrite, per-char read, block read, and random seeks. For each test, Bonnie reports the number of kilobytes processed per elapsed second and the %CPU usage (sum of user and system). The next six tests involve file create/stat/unlink to simulate operations that are common bottlenecks on large Squid and INN servers, and on machines with tens of thousands of mail files in /var/spool/mail.

The Postmark [26] benchmark models the workload seen by a busy web server and is sensitive to I/O latency. The workload is meant to simulate a combination of electronic mail, netnews, and web-based commerce transactions. Postmark creates a large number of small files that are constantly updated. Once the pool of files has been created, a specified number of transactions occur. Each transaction consists of a pair of smaller transactions, i.e., create file or delete file, and read file or append file. Each transaction type and the files it affects are chosen randomly. On completion of each run a report is generated showing metrics such as elapsed time, transaction rate, total number of files created, read size, read throughput, write size, write throughput, and so on. The Postmark configuration used in our experiments is listed in Table 6; the rest of the parameters have been set to their defaults.

3.2. Experimental Testbed

[Figure 5. Experimental testbed]

Our experimental WAN emulation testbed is illustrated in Figure 5. Three machines were used in our experimental setup: initiator, router, and target. All three machines were connected to a D-Link DGS10008TL gigabit switch using gigabit NICs.
The initiator hosted a 2.6 GHz Intel Pentium 4 processor, 256 MBytes of RAM, and a Broadcom BCM5700 gigabit Ethernet controller. It was running the Linux 2.4.20 kernel. The system hosting the target had dual 866 MHz Pentium III processors, 756 MBytes of RAM, and a Fibre Channel host bus adapter. The target was connected to two JBODs, each housing three Seagate ST336752FC 15K RPM disks. The target was running a Linux 2.6.5 kernel for i686. The machine designated as the router hosted a hyperthreaded 2.6 GHz Pentium 4 processor, 1 GB of RAM, and two gigabit NICs (D-Link DL2K and Intel 82547EI).

[Figure 6. TCP retransmits for a block size of 1024 bytes]

Both the initiator and the target were running the UNH iSCSI implementation. The machine designated as the router between the initiator and the target was running the NIST Net network emulation tool to simulate a WAN environment. The WAN simulation was tuned in accordance with the profiling information presented by Vern Paxson [23], who found that over long periods network connections suffered a 2.7% packet loss in a wide-area network. Performance measurements were taken using varying delays. Socket buffer sizes on both the initiator and the target were set to 512 KBytes.

4. Results and Discussion

In all our experiments, 4 TCP connections were used in a session between the initiator and the target. Girish [10] identifies that beyond 4 connections the incremental increase in throughput is very low. The standard Ethernet frame size of 1500 bytes was used in all experiments. We did not consider using jumbo frames, since in real systems not all components in the network path support jumbo frames. Large socket buffers are necessary to achieve peak performance for networks with large bandwidth-delay products.
Socket buffers on both the initiator and the target were set to the maximum value of 512 KBytes.

4.1. Sequential File Writes

Figure 7 shows the performance of iSCSI for a sequential file write of 1 GB, with different block sizes for the write system call and varying network delays. A request for fsync was made before closing the file to ensure that all the data were written to the disk. Figure 7 compares the performance of iSCSI with standard Reno TCP with SACK (referred to as standard TCP, or TCP) and Fair-TCP implemented on top of it. As shown in the figure, Fair-TCP performs better than normal TCP Reno at all delays. But as the delays increase the gap narrows; this, we believe, is due to the delays overwhelming the window management efficiency of Fair-TCP.

Table 2. Aggregate Congestion Window
  Delay (ms) | TCP: Mean  SD  %SD | Fair-TCP: Mean  SD  %SD
       0     |      17.0  5.0  29 |           16.4  3.2  19
       2     |      13.2  4.3  33 |           16.0  3.4  21
       4     |      13.6  4.1  30 |           15.7  3.2  20
       6     |      14.2  4.1  29 |           15.6  3.2  20
       8     |      14.5  4.2  29 |           15.6  3.1  20
      10     |      14.8  4.1  27 |           15.6  3.1  20

The block sizes used in the write system call had little effect on the overall throughput. To find the reason behind this behaviour, we observed the SCSI request sizes received by iSCSI. Since the writes were sequential, the operating system was able to bundle all the file writes into chunks of 128 KBytes: the operating system aggressively caches each write, bundles them, and sends them to the disk. So the block size was not really a factor affecting the throughput for sequential file writes.

With increasing delays, the throughput of iSCSI decreased rapidly. This, we believe, is mainly due to the synchronous nature of iSCSI. There can only be a limited number of pending SCSI requests held at the target. The initiator has to wait for a minimum of an RTT before sending another request, i.e.,
the initiator can only send a limited number of SCSI requests during an RTT interval, which does not generate enough traffic in the network to match the bandwidth-delay product and attain the maximum possible throughput.

To see whether the increase in throughput observed with Fair-TCP is due to better management of the window or to an aggressive nature of Fair-TCP, we measured the number of TCP retransmits for a block size of 1024 bytes with varying delays. The results are shown in Figure 6. The number of retransmits for Fair-TCP is less than that of standard TCP in almost all cases. Fair-TCP shares the most recent estimate of the RTT between all connections. As a result it has fewer false retransmits than standard TCP.

Table 3 shows the mean and standard deviation of SCSI command turnaround times in milliseconds (ms). Mean command turnaround times for Fair-TCP are less than those of standard TCP. The deviation percentage is also less for Fair-TCP than for standard TCP, except at the delay of 0 ms, for which the deviation was more. Further experiments are required to determine the exact reason for this behaviour.

Table 2 shows the aggregate congestion window of all connections for TCP and Fair-TCP at different delays, collected on the initiator (write traffic is mainly Data-Outs from the initiator).
Fair-TCP has a larger window and less deviation than standard TCP, which indicates that Fair-TCP has a more stable window than standard TCP.

[Figure 7. Throughput for sequential file writes with varying delays and block sizes]

In our experiments on sequential file writes, we observe that Fair-TCP offers better throughput and reduces the deviation in command turnaround times. Fair-TCP is less bursty than standard TCP and reduces the number of false retransmits. Fair-TCP ensures that each connection gets an equal share of the available bandwidth.

4.2. Sequential File Reads

Figure 8 shows the performance of iSCSI with different block sizes for the read system call and varying network delays. The displayed throughputs are for a file read of 1 GB. The throughputs for reads are less than those for writes. This is mainly due to the buffer cache in the Linux kernel, which performs all the writes in memory and flushes them to the disk as the memory becomes full, whereas for reads, the operating system fetches the data from the disk whenever needed.
As shown in Figure 8, Fair-TCP performs better than normal TCP Reno at all delays. However, the increase in throughput is not significant. This is due to there being only one pending read request at the SCSI layer.

Table 3. SCSI Command Turnaround Times for Writes (ms)
  Delay (ms) | TCP: Mean  SD  %SD | Fair-TCP: Mean  SD  %SD
       0     |      208   225  108 |          102   131  127
       2     |      351   265   75 |          183   117   64
       4     |      450   288   64 |          310   146   47
       6     |      548   332   60 |          414   182   43
       8     |      642   376   60 |          414   182   39
      10     |      728   378   51 |          636   225   35

The block sizes used in the read system call had little effect on the overall throughput. From the traces, we observed that the read requests sent by the operating system are for 128 KBytes. The Linux kernel prefetches disk blocks, starting with 1 prefetch and increasing the number of prefetches upon success. The maximum number of prefetches that can be done is 32, which is equivalent to 128 KBytes. Since the file is sequential, requests get clustered into a single disk read of size 128 KBytes. Figure 9 shows the various kernel components involved in a read operation [27].

[Figure 8. Throughput for sequential file reads with varying block sizes and delays]

Table 4 shows the mean and standard deviation of the SCSI command turnaround times for the requests generated, in milliseconds (ms).
Mean command turnaround times for Fair-TCP are less than those of standard TCP. The deviation percentages are also smaller for Fair-TCP.

Table 4. SCSI Command Turnaround Times for Reads (ms)
  Delay (ms) | TCP: Mean  SD  %SD | Fair-TCP: Mean  SD  %SD
       0     |       52   105  201 |           56   104  186
       2     |       87   140  161 |           71    82  116
       4     |      116   127  110 |          103    95   92
       6     |      147   147  100 |          133    99   75
       8     |      172   150   87 |          160   104   65
      10     |      196   139   71 |          187   103   55

4.3. Bonnie++

Table 5 shows the performance of Fair-TCP and standard TCP for rewrites and seeks with Bonnie++ [25]. Bonnie++ was run in fast mode in all our experiments. A file size of 1 GB and a block size of 1024 bytes were used. Results for block writes and block reads are omitted, since single-process Bonnie++ writes and reads are similar to the sequential file writes and reads discussed before. The create/stat/unlink tests of Bonnie++ were not used, since they are similar to the workload generated by Postmark.

Fair-TCP improves the performance of rewrites by about 5-35%. Rewrites are similar to reads, except that the blocks are dirtied and written back. The lower throughput seen for rewrites is mainly due to blocking during read requests.
Each process perfor ms the tests with a file size o f 256 MB and a blo ck size of 10 24 bytes. Results for blo ck writes, block rewrites, block reads and random seeks are shown in fig ure 10. As o bserved in the previous experiments, F air-TCP perform s well b u t the improvement diminishes with increasing delay s. Howe ver, fo r rand om 0 2 4 6 8 10 0 2000 4000 6000 8000 10000 Block Writes Delay (ms) Throughput (KB/s) 0 2 4 6 8 10 0 1000 2000 3000 4000 Block Rewrites Delay (ms) Throughput (KB/s) 0 2 4 6 8 10 0 2000 4000 6000 8000 Block Reads Delay (ms) Throughput (KB/s) 0 2 4 6 8 10 0 50 100 140 Random Seeks Delay (ms) Seeks / sec Figure 10. Bonnie++ 4 proces s seeks, Fair -TCP perfo rms consistently better than standard TCP and improves the seek rate by 20-3 0% irrespective of delays. Parameter V alue Number of Simultaneou s Fil es 20000 Lower Bound on file size 500 Bytes Upper Bound on file size 100 KBytes Number of T ransactions 50000 T able 6. P ostm ark P arameters 4.4. Postmark Postmark[2 6 ] was run with 20000 initial files and 5 0000 transactions. Figure 11 shows the times taken to run the postmark for stan dard TCP and Fair-TCP . Around 4GB of d ata was transacted du ring the execution of p ostmark. W e n otice that standard TCP needs 1 0-18% more tim e than Fair-TCP to run Postma rk. Considering that Post- mark is single threaded an d reads are synchronou s, the perfor mance impr ovement ob served is main ly due to asyn- chrono us file writes and metad ata writes from the b uffer cache. Figure 13 shows the read and write throughpu t for stan- dard TCP and Fair -TCP . T he read and write thr ough put improves by abou t 1 0-18% for Fair -TCP . Due to filesys- tem cachin g effects and asynchro nous nature of writes, throug hput f or w rites in all cases is better than rea d throug hput, an d decreases with increasing delays. Postmark is co mpletely sing le-thread ed. 
In a normal web server , ther e would generally be more than on e thread runnin g a t a time. T o simulate the work load better we 8 0 2 4 6 8 0 2000 4000 6000 8000 10000 12000 Delay (ms) Execution Time (seconds) TCP Fair−TCP Figure 11. P ostmark execution times 0 2 4 6 8 10 0 1000 2000 3000 4000 5000 6000 Delay (ms) Execution Time (seconds) TCP Fair−TCP Figure 12. Execution time for 10 P o stmark proces ses ran 10 co ncurren t processes of Postmark, eac h with initial files of 200 0 and 5000 transaction s with the rest o f the pa- rameters as in T able 6. The results f or the time take to complete all th e Postmark p rocesses are shown in Fig ure 12. The times fo r multipro cess Postmark are almost h alf that of a single process Postmark f or the same param e- ters. W e ob server that stand ard TCP needs 1 7-50% mor e time than Fair -TCP to complete Postmark execution. The perfor mance improvement observed going from a sing le Postmark process to mu ltiple processes is due to se veral requests g etting queued a t the SCSI level. Fair-TCP h as more data av ailable at the TCP level in a multip rocess en - vironm ent th an in a sing le pr ocess and this im proves the perfor mance. Figure 14 shows th e read and write throughp ut in a mul- tiprocess Postmark en vironm ent for stand ard TCP and Fair- TCP . T he th rough put shown are the ag gregate throughp ut 0 2 4 6 8 0 50 100 150 200 250 300 350 400 450 500 550 Delay (ms) Throughput (KBytes\sec) TCP Read TCP write Fair−TCP Read Fair−TCP Write Figure 13. P ostmark Read Write throughput 0 2 4 6 8 10 0 200 400 600 800 1000 1200 1400 1600 Delay (ms) Throughput (KBytes/sec) TCP Read TCP Write Fair−TCP Read Fair−TCP Write Figure 14. Aggregate Read and Write throughput f or 10 Postmark pr ocesses for all the 10 pro cesses. W e see that Fair -TCP increases the ag gregate read and write throughpu t by 17-50% over standard TCP . 4.5. 
Ker nel Compile The kernel com pile experime nt in v olves untar , con fig and ma ke of 2.4.20 Linux kerne l. Th e time taken in sec- onds to complete the kernel comp ile for various delays is shown in figu re 1 5. Kernel compile is CPU intensive and generates large amo unts o f meta-d ata. Fair -TCP imp roves the perform ance o f kernel compile by 8-17%. 5. Related W ork Due to inc reasing imp ortance to stora ge scaling and re- ducing costs, there has been num ber of efforts to build ef- 9 ficient implem entations of iSCSI an d ev aluate various as- pects of iSCSI. The w ork in Aiken[6] evaluates the performance of iSCSI in 3 different configuratio ns, a comme rcial d eploy- ment of iSCSI with Fibr e Chan nel and iSCSI stor age, a SAN e n vironmen t with software-based targets and initia- tors u sing existing hardware an d in a W AN emulated en- vironm ent with varying delays. The auth ors in [9] e val- uate the p erform ance of iSCSI when used in stora ge o ut- sourcing. Th ey examin e th e impact of laten cy o n appli- cation perfor mance and how cach ing can be used to hide network latencies. The authors in [11] use simulation s to examine the impact of various iSCSI parameters such as iSCSI PDU s ize, Maximum Segment Size, Link Delay and TCP W ind ow Size. The work in [12] examine s the effect of block lev el request size and iSCSI window size in LAN, MAN and W AN environmen ts. The w ork in [10] exam- ines the use of various advanced TC P stacks such as F AST TCP , Binary Increase Congestion TCP (BIC-TCP), H-TCP , Scalable T CP and High-Speed T CP using simulatio ns and a emulated wan. The auth ors in [ 16] study the per formanc e of iSCSI in the context o f synchr onous r emote mir roring and find that iSCSI is a v iable approa ch to cost-effecti ve remote mirror- ing. Th e work in [7] compares the performance of NFS and iSCSI micro and macro be nchmark s. 
The work in [13] examines the impact of certain kernel SCSI subsystem values and suggests modifications to these values for improving the performance of iSCSI. [17] proposes a caching algorithm and the localization of certain unnecessary protocol overheads, and observes significant performance improvements over the current iSCSI system.

6. Conclusions

In our work, we investigated the performance of iSCSI with multiple TCP connections and found that iSCSI throughput suffers from competing TCP connections. We proposed a TCB information sharing method called Fair-TCP, based on the design of [20]. We implemented Fair-TCP in the Linux kernel and compared the performance of iSCSI with Fair-TCP and standard TCP under different workloads. We find that Fair-TCP improves the performance of iSCSI significantly for I/O intensive workloads. For workloads such as single-threaded reads, the SCSI data generated is quite low; hence Fair-TCP does not do as well as in I/O intensive workloads.

[Figure 15. 2.4.20 kernel compile: time (seconds) versus delay (ms) for TCP and Fair-TCP.]

References

[1] Julian Satran, Kalman Meth, Costa Sapuntzakis, Mallikarjun Chadalapaka and Efri Zeidner. iSCSI (Internet SCSI). RFC 3720.
[2] A. Benner. Fibre Channel: Gigabit Communications and I/O for Computer Networks. McGraw Hill, 1996.
[3] W. Stevens. TCP Slow Start, Congestion Avoidance, Fast Retransmit and Fast Recovery Algorithms. RFC 2001.
[4] M. Mathis, J. Mahdavi and S. Floyd. TCP Selective Acknowledgment Options. RFC 2018.
[5] Kalman Meth and Julian Satran. Design of the iSCSI Protocol. Proceedings of the 20th IEEE / 11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST'03), 2003.
[6] S. Aiken, D. Grunwald, A. Pleszkun and J. Willeke. A Performance Analysis of the iSCSI Protocol. Proceedings of the 20th IEEE / 11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST'03), 2003.
[7] Peter Radkov, Li Yin, Pawan Goyal, Prasenjit Sarkar and Prashanth Shenoy. A Performance Comparison of NFS and iSCSI for IP-Networked Storage. Proceedings of the 3rd USENIX Conference on File and Storage Technologies, April 2004.
[8] Dimitris Xindis, Michail D. Flouris and Angelos Bilas. Performance Evaluation of Commodity iSCSI-based Storage Systems. Proceedings of the 22nd IEEE / 13th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST'05).
[9] Wee Teck Ng and Bruce K. Hillyer. Obtaining High Performance for Storage Outsourcing. Proceedings of the 2001 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, 2001.
[10] Girish Motwani and K. Gopinath. Evaluation of Advanced TCP Stacks in the iSCSI Environment using a Simulation Model. Proceedings of the 22nd IEEE / 13th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST'05).
[11] Y. Lu, Noman Farrukh and D. H. Du. Simulation Study of iSCSI-based Storage System. Proceedings of the 21st IEEE / 12th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST'04).
[12] I. Dalgic, K. Ozdemir, R. Velpuri, J. Weber, U. Kukreja and H. Chen. Comparative Performance Evaluation of iSCSI Protocol over Metro, Local and Wide Area Networks. Proceedings of the 21st IEEE / 12th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST'04).
[13] Saneyasu Yamaguchi, Masato Oguchi and Masaru Kitsuregawa. Trace System of iSCSI Storage Access. Proceedings of the 2005 Symposium on Applications and the Internet (SAINT'05).
[14] W. Stevens. TCP Slow Start, Congestion Avoidance, Fast Retransmit and Fast Recovery Algorithms. RFC 2001.
[15] UNH iSCSI Consortium. http://www.iol.unh.edu/consortiums/iscsi
[16] Ming Zhang, Yinan Liu and Qing (Ken) Yang. Cost-Effective Remote Mirroring Using the iSCSI Protocol. Proceedings of the 21st IEEE / 12th NASA Goddard Conference on Mass Storage Systems and Technologies (MSST'04).
[17] X. He, Q. Yang and M. Zhang. Introduction to SCSI-to-IP Cache for Storage Area Networks. Proceedings of the 2002 International Conference on Parallel Processing, 2002.
[18] H. Balakrishnan, V. N. Padmanabhan, S. Seshan, M. Stemm and R. Katz. TCP Behavior of a Busy Internet Server: Analysis and Improvements. Proceedings of IEEE INFOCOM, San Francisco, CA, 1998.
[19] H. Balakrishnan, H. S. Rahul and S. Seshan. An Integrated Congestion Management Architecture for Internet Hosts. ACM SIGCOMM, September 1999.
[20] J. Touch. TCP Control Block Interdependence. RFC 2140.
[21] L. Eggert, J. Heidemann and J. Touch. Effects of Ensemble-TCP. ACM Computer Communication Review, January 2000.
[22] H. Balakrishnan and S. Seshan. The Congestion Manager. RFC 3124.
[23] Vern Paxson. End-to-End Internet Packet Dynamics. IEEE/ACM Transactions on Networking, 1999.
[24] NIST Net - A Linux-based Network Emulation Tool. http://snad.ncsl.nist.gov/nistnet/
[25] Bonnie++, now at 1.03a. http://www.coker.com.au/bonnie++/
[26] J. Katcher. Postmark: A New File System Benchmark. Technical Report TR3022, Network Appliance Inc.
[27] Ali R. Butt, Chris Gniady and Y. Charlie Hu. The Performance of Kernel Prefetching on Buffer Cache Replacement Algorithms. Proceedings of the ACM International Conference on Measurement & Modeling of Computer Systems (SIGMETRICS '05), Banff, Canada, June 6-10, 2005.
