Extensible Component Based Architecture for FLASH, A Massively Parallel, Multiphysics Simulation Code
Anshu Dubey*,a, Katie Antypasb, Murali K. Ganapathyc, Lynn B. Reida, Katherine Rileyd, Dan Sheelerd, Andrew Siegeld, Klaus Weidea

a ASC/Flash Center, The University of Chicago, 5640 S. Ellis Ave, Chicago, IL 60637
b Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720
c Google Inc.
d Argonne National Laboratory, 9700 S. Cass Ave, Argonne, IL, 60439

* Corresponding author. Email address: dubey@flash.uchicago.edu (Anshu Dubey)

Preprint submitted to Parallel Computing, September 27, 2018

Abstract

FLASH is a publicly available high performance application code which has evolved into a modular, extensible software system from a collection of unconnected legacy codes. FLASH has been successful because its capabilities have been driven by the needs of scientific applications, without compromising maintainability, performance, and usability. In its newest incarnation, FLASH3 consists of inter-operable modules that can be combined to generate different applications. The FLASH architecture allows arbitrarily many alternative implementations of its components to co-exist and interchange with each other, resulting in greater flexibility. Further, a simple and elegant mechanism exists for customization of code functionality without the need to modify the core implementation of the source. A built-in unit test framework providing verifiability, combined with a rigorous software maintenance process, allows the code to operate simultaneously in the dual mode of production and development. In this paper we describe the FLASH3 architecture, with emphasis on solutions to the more challenging conflicts arising from solver complexity, portable performance requirements, and legacy codes.
We also include results from user surveys conducted in 2005 and 2007, which highlight the success of the code.

Keywords: Software Architecture, Portability, Extensibility, Massively parallel, FLASH

1. Introduction

The ASC/Flash Center at the University of Chicago has developed a public domain astrophysics application code, FLASH [11; 5]. FLASH is component-based, parallel, and portable, and has a proven ability to scale to tens of thousands of processors. The FLASH code was developed under contract with the Department of Energy ASC/Alliance Program. It is available to external users through a cost-free licensing agreement. Approved users may download the source code and make local modifications, but may not redistribute the code. FLASH is the flagship computer science product of the Flash Center, resulting from over 10 years of research and development.

One of the mandates of the Flash Center was the delivery of a parallel, scalable, and highly-capable community code for astrophysics. Motivation for the code effort lay in the increasing complexity of astrophysical simulations. The traditional academic approach of developing numerical software piecemeal was deemed inadequate to meet the science needs. Another aim of the Flash Center was to shift the paradigm of theoretical research towards working in multidisciplinary teams with scientific codes that are developed with modern software practices prevalent in the commercial world. The FLASH code has now reached a level of maturity where it has a large number of users, more than 80% external to the University of Chicago. Moreover, it also has a substantial number of external code contributors. The number of requests for download, and the number of publications using the FLASH code, have grown superlinearly in recent years (see Section 5).
This success was achieved by carefully balancing the often conflicting requirements of physics, software engineering, portability, and performance. From its inception, FLASH has simultaneously been in development and in production mode. Its evolution into a modern component-based code has taken a path very different from that of most scientific computing frameworks such as Chombo, SAMRAI, CACTUS, and POOMA [8; 24; 6; 13; 14; 16; 21; 19]. Those efforts developed the framework first, followed by the addition of solvers and other capabilities. An alternative path, taken by scientific application codes such as Enzo, SWMF, and Athena [20; 23; 12], is to grow into a large application from smaller solvers and applications. Both models of development have their advantages and disadvantages: codes initialized with frameworks have superior modularity and maintainability, while codes begun with solvers generally deliver better performance for their target applications.

FLASH straddles both approaches. In the first released version, the development followed the solvers-first model, but later versions place more emphasis on modularity, maintainability, and extensibility. The outcome of this duality in development is that FLASH has more capabilities and customizability, and it reaches a much wider community than most scientific application codes. FLASH has gained wide usage because the capabilities of the code have been driven by physics, while its architecture is driven by extensibility and maintainability. The addition of new solvers to FLASH is almost always dictated by the needs of users' applications. The solvers for multiphysics applications tend to put severe strain on any modern object-oriented software design.
Lateral data movement is normally required between different solvers and functional units, which makes resolving data ownership and maintaining encapsulation especially challenging. Also, many of the core physics solvers are legacy third-party software written in Fortran, which is rarely modular. While modularity, flexibility, and extensibility are some of the primary guiding principles in the code architecture design, these goals often conflict with the equally important considerations of efficiency and performance. Additionally, since high performance platforms usually have a relatively short lifespan, the need for performance portability places even more constraints on the design process. Achieving a balance between these conflicting goals while retaining the very complex multiphysics capabilities has been the biggest contributor to the widespread acceptance of the FLASH code.

The FLASH model of development and architecture is informed by the literature from the common component architecture effort [15; 2]. Since the project's inception, FLASH has undergone two major revisions, both of which included significant architectural and capability improvements. FLASH has always striven for a component-based architecture, but this goal was not realized in the first version because of a strong emphasis on producing early scientific results using legacy codes. However, foundations for a component-based architecture were firmly laid in the first version, FLASH1.6 [11], by providing wrappers on all the solvers and minimizing lateral communication between different solvers. The second generation versions, FLASH2.0–FLASH2.5, built upon this foundation by addressing data ownership and access, resulting in a centralized data management approach.
Finally, the current version, FLASH3, has realized a true component-based architecture with decentralized data management, clean interfaces, and encapsulation of functional units. FLASH3 also has well-defined rules for inheritance within a unit and for interactions such as data communication between units. Further discussion of architecture changes over revisions is provided in Antypas et al. [1].

This latest release contains over 380,000 lines of code, with over 138,000 additional lines of comments. The core of the FLASH code is written in Fortran90, with input/output interfaces provided in C. Initially Fortran was chosen because the legacy computational kernels were written in Fortran, whose interoperability with object-oriented languages can be memory inefficient and unportable. In addition, experience with system software limitations on various supercomputers demonstrated the wisdom of avoiding complex features such as dynamic linking in the build process. The choice of Fortran does affect the architecture: instead of depending upon the programming language to enforce modular implementation, FLASH must rely upon a combination of the Unix directory structure and several scripts to maintain modularity (see Figure 2). However, lack of strong checking by the language can also be advantageous because it discourages complexity in the design. In addition, the "primitive" features of Fortran allow developers to sometimes accelerate debugging by temporarily bypassing the architecture to give direct access to data structures.

More than 35 developers and researchers have contributed to all versions of the FLASH code. During the past 10 years, over 80 person-years of effort have built the code and its scientific algorithms.
As the complexity of the code and the number of developers have grown, code verification and management of the software development process have become increasingly important to the success of the project. The FLASH3 distribution now includes a unit test framework and its own test-suite, called FlashTest, which can be used for professional regression testing.

In this paper we describe the FLASH3 architecture, with emphasis on solutions to the more challenging conflicts arising from solver complexity, portable performance requirements, and legacy codes. We also include results from user surveys conducted in 2005 and 2007, indicating how the architecture choices have led to the widespread acceptance of the FLASH code.

2. Architecture Cornerstones

FLASH is not a monolithic application code; instead, it should be viewed as a collection of components that are selectively grouped to form various applications. Users specify which components should be included in a simulation, define a rough discretization/parallelization layout, and assign their own initial conditions, boundary conditions, and problem setup to create a unique application executable. In FLASH terminology, a component that implements an exclusive portion of the code's functionality is called a unit. A typical FLASH simulation requires a proper subset of the units available in the code. Thus, it is important to distinguish between the entire FLASH source code and a given FLASH application.

The FLASH architecture is defined by four cornerstones: unit, configuration layer, data management, and interaction between units. Here we describe the four cornerstones briefly.

2.1. Unit

A FLASH unit provides well-defined functionality and conforms to a structure that facilitates its interactions with other units.
A unit can have interchangeable implementations of varying complexity, as well as subunits that provide subsets of the unit's functionality. Each unit defines its Application Programming Interface (API), a collection of routines through which other units can interact with it. Units must provide a null implementation for every routine in their API. This feature permits an application to easily exclude a unit without the need to modify code elsewhere. For example, the input/output unit can be easily turned on and off for testing purposes, by linking with the null implementations.

FLASH units can be broadly classified into five functionally distinct categories: infrastructure, physics, driver, monitoring, and simulation. This categorization is meant to clarify the role of different classes of units in a simulation, rather than any architectural differences among them. In terms of organization, and their treatment by the configuration tool, all units follow the same rules, except the IO and the Simulation units, described in Sections 3.3 and 3.4. The infrastructure category includes the units responsible for housekeeping tasks such as the management of runtime parameters, the handling of input and output to and from the code, and the administration of the solution mesh. Units of this type are discussed further in Section 3.3. Units in the physics category implement algorithms to solve the equations describing specific physical phenomena, and include units such as hydrodynamics, equations of state, and gravity. These units constitute the core of the FLASH solution capabilities. The Driver unit implements the time advancement methods, initializes and finalizes the application, and controls most of the interaction between units included in a simulation.
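The null-implementation rule can be sketched as follows. This is an illustrative Python sketch, not FLASH code: in FLASH the null implementations are empty Fortran 90 subroutines selected at configuration/link time, and the routine names below are hypothetical.

```python
# Sketch of the null-implementation idea (hypothetical names; FLASH itself
# uses empty Fortran 90 subroutines chosen at link time).

def io_write_checkpoint_real(state):
    """A 'real' IO implementation: records the simulation state."""
    return {"checkpoint": dict(state)}

def io_write_checkpoint_null(state):
    """Null implementation: identical interface, does nothing."""
    return None

def driver_evolve(io_write_checkpoint, state):
    """The Driver calls the IO API unconditionally. Excluding IO simply
    means the null routine was linked in, so no caller needs to change."""
    state["step"] = state.get("step", 0) + 1
    return io_write_checkpoint(state)
```

Because the Driver calls the API routine unconditionally, switching between the real and null versions requires no change to any calling code, which is exactly what lets an application exclude a unit.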
Because control of the simulation is implemented by the Driver unit, it interacts the most with other individual units (see Section 2.4 for more detail). The monitoring units track the progress and performance of a simulation. In general these units are not essential to producing scientific results, but provide information to the user about hardware usage and software efficiency. The Simulation unit is of particular significance; it defines how a FLASH application will be built and executed. It also provides initial conditions and the simulation-specific runtime parameters for the application. The Simulation unit has been designed to enable customization of the FLASH code for specific applications without modifying other units, as explained in Section 3.4. Additional details on the unit architecture in general are provided in Section 3.

2.2. Configuration Layer

FLASH implements its inheritance, extensibility, and object-oriented approach through its configuration layer. This layer consists of a collection of text Config files that reside at various levels of the code organization, and the setup tool which interprets the Config files. The two primary functions of this layer are to configure a single application from the FLASH source tree, and to implement inheritance and customizability in the code. The Config files for a unit contain directives that apply to everything at, or below, that hierarchical level, and describe its dependencies as well as its variable and runtime parameter requirements. The setup tool parses the relevant Config files, starting with the one for the Simulation unit described in Section 3.4. Dependencies are recursively resolved to configure the individual units needed for the application. Remember that each application requires different sections of code and produces a distinct executable.
This method of configuration avoids an unnecessarily large binary and memory footprint, as only the needed sections of code are included. It also enables extensibility, since the inclusion of a new unit, or a new implementation of a unit, need become known only to the Config file of the specific problem setup in the Simulation unit.

Figure 1 shows sections of two sample Config files, one from the Simulation unit (left panel), and another from a physics unit (right panel). In Figure 1(a), lines 1 and 3-5 specify units that must be included. Line 2 specifies a monitoring unit that is requested but may be excluded. No substitutions are permitted for these units, or their implementations. In the same file, lines 6-8 specify desirable implementations of subunits. These subunit implementations will be included if there are no overriding directives given on the setup command line, but such a directive can cause them to be either excluded or replaced by another implementation. The remaining lines in the file pertain to the runtime parameters and variables. Similarly, in the Config file shown in Figure 1(b), the first 5 lines specify the required and desirable units and subunits. Line 6 indicates which implementation of the current unit is to be included by default, in this case an implementation that is found in the ParticlesMain/passive subdirectory. Again, a directive to the setup tool can replace this implementation. Note that both Config files define the parameter "pt_maxPerProc", along with its default value. Because of FLASH's inheritance rules, the parameter value in the Simulation Config will be used in the simulation, which in turn can be overwritten at runtime.

Figure 1: Sections of sample Config files. (a) Config for Simulation; (b) Config for ParticlesMain.
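Since the figure itself may not reproduce in this text, a hypothetical Config fragment in the spirit of the ParticlesMain example might look like the following. Directive names such as REQUIRES, REQUESTS, DEFAULT, and PARAMETER follow common FLASH conventions, but this fragment is illustrative and is not copied from any release.

```text
# Hypothetical Config fragment in the spirit of ParticlesMain (illustrative)
REQUIRES Driver                  # must be included; no substitution allowed
REQUESTS monitors/Timers         # requested, but may be excluded at setup
DEFAULT passive                  # default implementation subdirectory

# Runtime parameter with its default value. A Simulation-level Config
# defining the same parameter overrides this default, and the runtime
# parameter file can override that value again at execution time.
PARAMETER pt_maxPerProc INTEGER 100
```

The layered override (unit default, then Simulation Config, then runtime input) is what the text above calls FLASH's inheritance rules for parameters.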
FLASH's approach of using the Unix directory structure with text annotations in the Config files to implement inheritance and other object-oriented features has the triple advantage of being simple, extensible, and completely portable. Figure 2 shows an example unit and its corresponding Unix directory organization. The unit has two subunits: one with a single implementation, and another with two alternative implementations. Figure 2(a) shows the logical architecture of the unit, while Figure 2(b) shows its organization using the Unix directory structure.

Figure 2: Architecture of units, subunits, and local API for FLASH. (a) Architecture view; (b) Unix tree structure view.

2.3. Data Management

In a large multiphysics code with many solvers, management and movement of data is one of the biggest challenges. Legacy solver codes rarely address resolving the ownership of data by different sections of code, a necessity for encapsulation and modularity. During the first round of modernization, in the second version of FLASH, the data management was centralized into a separate unit to unravel the legacy code. This technique is also the data management model followed by SAMRAI [6]. The centralized data management extracted all the data from the individual units, and ensured data coherency by eliminating any possibility of replication. The main drawback of this approach was that it gave all units equal access for data fetching and modification. Thus a unit could get mutator access to data that it should never have modified. The onus was on the developer to find out the scope of each data item being fetched and to make sure that the scope was not violated.
This responsibility limited the ability to add more functionality to the code to those who knew the code very well, a serious handicap to extensibility.

FLASH Version 3 takes the next and final step in modularizing data management by decentralizing the data ownership. Every data item in the code belongs to exactly one unit. The owner unit has complete control over the scope and modifiability of the data item, while the non-owner units can access or mutate the data only through the owner unit's API functions. Additionally, the scope of data within a unit can vary. Thus, for example, a data item specific to a subunit is visible only to that subunit, while unit-scope data is visible to all functions in the unit.

2.4. Interactions Between Units

The interactions between units are governed by both the Driver unit and the published APIs of the individual units. The Driver unit is responsible for initializing all the included units and the meta-data for the application as a whole. The Driver unit implements the time-stepping scheme of the application, and hence dictates the order in which the units are initialized and invoked, and how they interact with each other. Recall that units have default null implementations, a feature that allows a comprehensive implementation of the Driver unit. Once a unit is invoked by the driver, it can also interact with other units through their API. The Driver unit also cleanly closes the units and the application when the run is complete.

3. Unit Architecture

Of the four cornerstones of the FLASH architecture, the unit structure is the most complex. The unit architecture separates the computational kernel from the public interfaces, and controls the scope of various data items owned by the unit.
A detailed description of the unit architecture is therefore critical to understanding the overall structure and software methodology of the FLASH code. Subunits are an important and novel feature of the unit architecture detailed below. In addition to the unit architecture, we also describe some of the infrastructure units and the Simulation unit, since these play an important role in the code architecture.

The unit itself has three layers. The outer layer, the API, defines the full functionality of the unit. A unit's API can be viewed as having two sections: one for making its private data available to the other units, and another which defines its capabilities for modifying the state of the simulation. The inner layer of the unit is known as the kernel, and implements the full functionality. The middle layer implements the architecture, and acts as a conduit between the outer and inner layers. It hides the knowledge of the FLASH framework and unit architecture from the kernel, and vice-versa, by providing wrappers for the kernel. The wrapper layer thus facilitates the import of third-party solvers and software into FLASH. To include a new third-party algorithm, additional wrappers would be implemented in the middle layer to interface between the already published API and the new functionality.

3.1. Subunits

Units can have one or more subunits, which are groupings of self-contained functionality. The concept of subunits is new in FLASH version 3. It was developed to constrain the complexity of the code architecture, and to minimize the fragmentation of code units, which would result in a proliferation of data access functions. In particular, the concept of subunits formalizes the selective use of a subset of a unit's functionality, and the possibility of multiple alternative implementations of the same subset.
The wrapper layer in the unit architecture starts with the definition of subunits. Subunits implement disjoint subsets of a unit's API, where none of the subsets can be a null set. The union of all subsets constituting various subunits must be exactly equal to the unit API. Every unit has at least a Main subunit that implements the bulk of the unit's functionality, including its initialization. The Main subunit is also the custodian of all the unit-scope data. The wrapper layer arbitrates on locating functions common to many alternative implementations of subunits, such that code duplication is minimized and flexibility is maximized.

The use of the subunit concept is best illustrated with an example of interdependencies between the Grid unit, which manages the Eulerian mesh, and the Particles unit. The discretized mesh in FLASH is composed of a collection of blocks, where individual blocks span a section of the domain, and all the blocks taken together cover the entire domain. In parallel environments, domain decomposition maps one or more blocks to each processor participating in the simulation. Particles may be massless and passive, used to track the Lagrangian features of the simulation, or active particles with mass which can affect gravitational fields. While individual elements (zones and grid points) of the Eulerian mesh stay at the same physical location in the domain throughout the evolution, the Lagrangian elements (particles) move with the motion of the fluid. The motion of Lagrangian particles relative to the underlying Eulerian mesh is best illustrated with snapshots of a set of particles at different times during evolution. Figure 3 shows the positions of a small subset of particles at different stages of evolution in a weakly compressible turbulence simulation using a uniform grid [10].
Here, because the mesh does not change with time, the Eulerian elements are stationary in the physical domain at all times, while the narrow line of Lagrangian elements has spread all over the domain in the same timeframe.

Figure 3: Images of Lagrangian tracer particles' movement with advance in time evolution. The snapshots are taken at times (a) T=0, (b) T=0.75, (c) T=1.75, and finally (d) T=4.25 seconds. The simulation was done on 32,768 nodes of the IBM BG/L machine at Lawrence Livermore National Laboratory, with 1856^3 grid points and more than 16 million particles.

FLASH has four distinct subsets of functionality related to particles, each of which can have multiple alternative implementations. The current FLASH release provides three implementation methods for initial distribution of particles, four methods of mesh/particle mapping, two types of gravitational field interaction, and seven methods of time integration. This level of complexity is not limited to the Particles unit. The time integration of particles can result in their migration between physical regions served by different processors. Similarly, regridding of the active mesh may require migration of particles. These particle-related movements are best handled by the Grid unit, since it knows the topology of the Eulerian mesh, thereby retaining encapsulation of alternative unit implementations.

If FLASH were to solely follow the unit model of architecture described in Section 2.1, then separate units for particle distribution, mapping, integration, and migration would be needed. Each of these units would need access to large amounts of data in the other units, thereby requiring many accessor-mutator functions. Therefore, the addition of subunits is a major feature of the FLASH3 architectural improvements.
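The subunit consistency rules (subunits implement disjoint, non-empty subsets of the unit API whose union is exactly the API) amount to a partition check, which a configuration tool could verify along the following lines. This is a minimal Python sketch with illustrative routine names, not the logic of FLASH's actual setup tool.

```python
# Minimal sketch: verify that subunits partition a unit's API
# (non-empty, disjoint, union equal to the API). Names are illustrative.

def check_subunit_partition(unit_api, subunits):
    """unit_api: set of routine names. subunits: dict name -> set of names.
    Returns True iff the subunit sets form a partition of unit_api."""
    covered = set()
    for name, routines in subunits.items():
        if not routines:                 # no subunit may implement a null set
            return False
        if covered & routines:           # subunit subsets must be disjoint
            return False
        covered |= routines
    return covered == unit_api           # union must equal the full API

# Example in the spirit of the Particles unit described in the text:
particles_api = {"Particles_init", "Particles_advance", "Particles_mapToMesh"}
particles_subunits = {
    "ParticlesMain": {"Particles_init", "Particles_advance"},
    "ParticlesMapping": {"Particles_mapToMesh"},
}
```

A check of this kind makes the rule "every API routine lives in exactly one subunit" mechanically enforceable at configuration time.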
The concept of subunits very elegantly solves both the problem of data access and that of unit fragmentation through the introduction of a level of hierarchy in the unit's architecture. Thus, in the Particles unit, the ParticlesInitialization and ParticlesMapping subunits respectively deal with the initial spatial distribution and with mappings to and from the Eulerian grid, while the ParticlesMain subunit keeps the unit-scope data and implements the time integration methods. Each subunit can have several alternative implementations. Hence, subunits not only organize a unit into distinct functional subsets that can be selectively turned off, but also expand the flexibility of the code, since implementations of different subunits can permute with each other and therefore can be combined in many different ways.

3.2. Lateral Data Movement

In addition to the subunit-level functionality, the other major challenge posed by the interaction between solvers for multiphysics simulations is the need for lateral data movement, which makes resolution of data ownership and encapsulation extremely difficult. For instance, the calculation of the hydrodynamics equations is dependent upon the equation of state and, if gravity is included in the simulation, upon gravitational acceleration. Similarly, within the hydrodynamics calculation, there is a need to reconcile the fluxes at a global level when adaptive meshing is being used. All of these operations require access to data which is owned by different units. Though version 2.5 of FLASH with its centralized database did not have some of these difficulties, it did not resolve data ownership, and did not achieve encapsulation. FLASH3's solution to this challenge is to provide interfaces that allow for transfer back and forth between units, so that data can be accessed through argument passing by reference.
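The pass-by-reference handoff can be sketched as follows. This is an illustrative Python sketch in which mutating a shared list stands in for Fortran argument passing by reference; the class and routine names, and the one-dimensional domain layout, are hypothetical.

```python
# Sketch of lateral data movement via control handoff (hypothetical names).
# The Grid unit redistributes particle data it does not own: the Particles
# unit hands its data over "by reference" (a mutable list) and regains
# control afterwards, so ownership and encapsulation are preserved.

class Grid:
    def __init__(self, owned_ranges):
        # owned_ranges: processor id -> (xmin, xmax) of its domain section
        self.owned_ranges = owned_ranges

    def move_particles(self, particles):
        """Retag each particle with the processor owning its new position.
        Mutates the caller's list in place (the by-reference analogue)."""
        for p in particles:
            for proc, (lo, hi) in self.owned_ranges.items():
                if lo <= p["x"] < hi:
                    p["proc"] = proc

class Particles:
    def __init__(self, grid):
        self.grid = grid                    # only the API of Grid is used
        self._data = [{"x": 0.1, "proc": 0},
                      {"x": 0.8, "proc": 0}]  # private, unit-scope data

    def advance(self, dt, velocity=1.0):
        for p in self._data:
            p["x"] += velocity * dt
        # Hand control (and the data, by reference) to the Grid unit, which
        # knows the mesh layout; the data still belongs to Particles.
        self.grid.move_particles(self._data)
```

Neither unit reaches into the other's internals: Particles never inspects the mesh layout, and Grid only ever touches particle data that was explicitly handed to it.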
The challenge is then reduced to arbitration between units as to which one is best suited to implement the needed functionality. Figure 4 shows examples of lateral data movement between the Particles unit and the Grid unit. Figure 4(a) shows the flow of execution, starting in the Particles unit, as particles change their physical position due to time integration. Some of the new positions in the Eulerian mesh may be on different processors. The movement of particles to the appropriate processor is best carried out by handing control, along with the particle data, to the Grid unit because of its knowledge of the mesh layout. Once it has moved the particles appropriately, the Grid unit returns the data and control back to the Particles unit. Figure 4(b) shows movement between the same two units where the example operation starts in the Grid unit. When using AMR, the mesh regridding operation changes the mapping of blocks to processors. In reorienting themselves to the new mesh, the particles have to move among processors. Because the particles' data structures are not accessible to the Grid unit, control is temporarily transferred to the Particles unit, which passes the particles' data by reference to the Grid unit for redistribution. Both examples preserve data encapsulation and ownership without compromising performance.

Figure 4: Lateral data movement during two different algorithmic steps. (a) Particle advancement in time; (b) Mesh refinement.

3.3. Infrastructure Units

The infrastructure units in FLASH are responsible for discretization of the physical domain; reading, writing, and maintaining the data structures related to the simulation data; and other housekeeping tasks such as handling physical constants and runtime parameters.
Of these, the most extensive responsibilities lie with the Grid unit, which manages the discretized mesh, and the input/output (IO) unit, which reads and writes the data. These two units are also unique in that they share their data with each other; this exception to unit encapsulation is allowed for performance reasons. Here we describe these two units briefly (further discussion is found in [4] and [9]).

The Grid unit is the custodian of all the data structures related to the physical variables necessary for advancing the simulations. Every discrete point in the mesh is associated with a number of physical variables, logical and physical coordinates, and an indexing number. On each processor, meta-data exists, such as the location in the physical domain and the number of discretization points per parallel grouping. FLASH3 has two different Grid implementations: a simple grid uniform in space, and a block-structured adaptive oct-tree mesh. If Adaptive Mesh Refinement is being used, blocks are created, destroyed, and distributed dynamically, and different blocks exist at varying levels of resolution, all of which must be tracked by the unit. The Grid unit is also responsible for keeping the physical variables consistent throughout the simulation. For example, when two adjacent blocks are at different resolutions, interpolation and prolongation ensure that conservation laws are not violated. Hence, the Grid unit is the most complex and extensive unit in the code, and most of the scaling performance of the code is determined by the efficiency of its parallel algorithms.

In FLASH, more than 90% of the reading or writing of data to the disk is controlled by the IO unit. FLASH outputs data for checkpointing and analysis.
The checkpoints save the complete state of the simulation in full precision so that simulations can transparently restart from a checkpoint file. The analysis data is written in many formats. The largest of these are the plotfiles, which record the state of the physical variables. Quantities integrated over the entire domain are written from the master processor into a simple text file. The only input controlled by the IO unit is the reading of checkpoint files. Other forms of input, such as reading in a table of initial conditions needed by a specific simulation, are managed by the unit in question. FLASH is one of the relatively few application codes that have support for multiple IO libraries, such as HDF5 [18] and parallel netCDF [17; 7], where all processors can write data to a single shared file.

3.4. Simulation Unit

The Simulation unit effectively defines the scientific application. Each subdirectory in the Simulation unit contains a different application, which can be viewed as a different implementation of the Simulation unit. This unit also provides a mechanism by which users can customize any part of their application without having to modify the source code in any other unit. An application can assume very specific knowledge of units it wants to include and can selectively replace functions from other units with its own customized ones by simply placing a different implementation of the function in its Simulation subdirectory. At configuration time, the arbitration rules of the setup tool cause an implementation placed in the Simulation unit to override any other implementation of that function elsewhere in the code. Similarly, the Simulation unit can also be aware of the runtime parameters defined in other units and can reset their default values.
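The arbitration rule described above can be sketched as a path-resolution step performed at configuration time. The following Python fragment is a minimal illustration under stated assumptions, not the actual setup tool; the function `resolve`, the routine name, and the directory layout are hypothetical:

```python
import os

def resolve(function_name, unit_paths, simulation_path):
    """Sketch of the setup tool's arbitration rule: an implementation of a
    function placed in the application's Simulation subdirectory shadows any
    implementation of the same function found in other units."""
    filename = function_name + ".F90"
    # The application's Simulation subdirectory wins at configuration time.
    candidate = os.path.join(simulation_path, filename)
    if os.path.exists(candidate):
        return candidate
    # Otherwise fall back to the owning unit's implementation.
    for path in unit_paths:
        candidate = os.path.join(path, filename)
        if os.path.exists(candidate):
            return candidate
    raise FileNotFoundError(filename)
```

Under this rule, placing a file such as a customized refinement-control routine in the application's Simulation subdirectory is sufficient; no source in the owning unit needs to be modified.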
Additionally, FLASH does not limit applications to the functionality distributed with the code; an application can add functionality by placing its implementation in the Simulation subdirectory. The setup tool has the capability to include any new functionality thus added at configuration time, without any prior knowledge of the functionality. Accordingly, by allowing great flexibility in the Simulation unit, FLASH makes it possible for users to quickly and painlessly customize the code for their applications. A typical use of this flexibility is in user-defined boundary conditions that may not have standard support in FLASH. Another frequently customized functionality is control of refinement when using the adaptive (AMR) grid mode.

4. Code Maintenance

While a clear architectural design is the first step in producing a useful code, the FLASH code is not static and continues to develop based on internal pressures and external requests and collaborations. As the code gains maturity, regular testing and maintenance become crucial. Maintenance of the FLASH code is assisted by guidelines for all stages in the code lifecycle, some of which are enforced and others strongly encouraged.

4.1. Unit Test Framework

In keeping with good software practice, FLASH3 incorporates a unit test framework that allows for rigorous testing and easy isolation of errors. The implementation of a new code unit or subunit is usually accompanied by the creation of one or more corresponding unit tests. Where possible, the unit tests compare numerical results against known analytical or semi-analytical solutions which isolate the new code module.

The components of the unit test reside in two different places in the FLASH source tree. One is a dedicated path in the Simulation unit, where the specific unit test acts as an ordinary Simulation.
The other is a subdirectory called unitTest, located within the hierarchy of the corresponding unit, which implements the actual test and any helper functions it may need. These functions have extensive access to the internal data of the unit being tested. By splitting the unit test into two locations in the source tree, unit encapsulation is maintained.

Figure 5 illustrates the split implementation of the unit test with an example. The figure shows relevant sections of the Particles and Simulation units in the FLASH code. The example does not represent the full implementation of either unit; it includes only those few sections that best highlight the features of the unit test framework. In the Simulation unit, there is an organizational directory which houses all the unit tests. Within this directory, there are two unit tests for the Particles unit. One of the tests verifies the correct movement of the particles after their positions have changed because of either time integration or regridding. The routine implementing this test resides at the top level of the ParticlesMain subunit. The other unit test verifies the time integration methods that advance passive particles in time. For this test, the corresponding routine resides in the subdirectory "passive" of the ParticlesMain subunit, where time integration of passive particles is computed. Figure 5 also shows the ParticlesInitialization subunit to facilitate clearer understanding of the unit structure and the overlying unit test framework. The dotted arrows from the Simulation unit test to the Particles unit show the coupling between the two units. The figure also highlights the flexibility of having alternative implementations of the same function co-exist at several levels in the source tree.

Figure 5: The unit test framework underlying the FLASH source tree.
Unit tests are split into drivers located in a subdirectory of the Simulation unit and implementation routines within the relevant unit being tested. Files are shown in italics. Dotted lines indicate the coupling between the two units.

4.2. Documentation

FLASH's clean architecture is well documented, which enables easy extension by external contributors [4]. For all routines defining the interface of a unit, a well documented header is a code requirement. The developers are also strongly encouraged to include extensive in-line documentation in addition to a header describing each routine they implement. FLASH uses Robodoc [22; 25] for automatic generation of documentation from internal headers. Compliance with code regulations such as documentation and good coding practice is checked through scripts that run nightly.

In addition, rapidly executing example problems are provided in the public release of FLASH. The availability of a collection of example problems that a first-time user can set up and run in an hour or less has been cited as one of the more attractive features of FLASH in a code survey (see Section 5). FLASH comes with a User's Guide, on-line howtos, on-line quick reference tips, and hyperlinks to full descriptions with examples of all the API routines that form the public interfaces of various units [3; 4]. All of these user-assistance components are available on-line, as is the current release. In addition, there is an active email User's Group where support questions are addressed by both developers and knowledgeable active users.

5. User Survey

The FLASH code has attracted a wide range of users and has become a community code preeminent in, but not limited to, the astrophysics community.
Many users cite FLASH's capabilities, ease of use, scalability, modularity, and extensive documentation as the key reasons for their use of FLASH. A code survey performed in 2005, followed by another in 2007, found that the close to three hundred responding users utilize the code in three major ways. The first group (approximately 41%) uses FLASH as a primary research tool for a broad range of application areas, including high-energy astrophysics, cosmology, stars and stellar evolution, computational fluid dynamics (CFD), and algorithm development. The second group of users (≈9%) employs the FLASH code for verification and validation (V&V). These users primarily attempt to compare FLASH to other codes or use FLASH as a benchmark. Still others in this V&V group port FLASH to new machines to test compilers, libraries, and performance. Finally, the third group (≈25%) uses FLASH as a sample code or for educational purposes.

The results of the survey clearly indicate that FLASH enjoys wide acceptance among researchers from many fields. By 2007, FLASH had been downloaded more than 1700 times and used in more than 320 publications, by both Center members and external users. Figure 6 shows that both the number of code downloads and the number of publications have steadily grown as the code has matured. Figure 7 shows that while the presence of adaptive mesh refinement is the top reason cited for using FLASH, it is the only one in the top six reasons that relates to the capabilities of the code. The remaining five top reasons pertain to the code architecture and its software process. These reasons include flexibility, ease of use, and performance, thus vindicating the architectural choices of FLASH.

Figure 6: Yearly number of publications in which the FLASH code was used (left dark bars) and FLASH downloads (right striped bars).
The jump in downloads in 2006 followed the release of the alpha version of FLASH3, the new version of the code.

Figure 7: Results from a FLASH users survey in 2007: Reasons cited for FLASH usage.

6. Acknowledgments

We wish to thank all the past contributors to the FLASH code. The software described in this work was in part developed by the DOE-supported ASC / Alliance Center for Astrophysical Thermonuclear Flashes at the University of Chicago under grant B523820.

References

[1] Antypas, K.B., Calder, A.C., Dubey, A., Gallagher, J.B., Joshi, J., Lamb, D.Q., Linde, T., Lusk, E.L., Messer, O.E.B., Mignone, A., Pan, H., Papka, M., Peng, F., Plewa, T., Riley, K.M., Ricker, P.M., Sheeler, D., Siegel, A., Taylor, N., Truran, J.W., Vladimirova, N., Weirs, G., Yu, D., and Zhang, J. (2006). FLASH: Applications and future. Parallel Computational Fluid Dynamics 2005: Theory and Applications, 235+.

[2] Armstrong, R., Kumfert, G., McInnes, L., Parker, S., Allan, B., Sottile, M., Epperly, T., and Dahlgren, T. (2006). The CCA component model for high-performance scientific computing. Concurrency and Computation: Practice and Experience 18(2), 215–229.

[3] ASC Flash Center (2009). FLASH code support. http://flash.uchicago.edu/website/codesupport/.

[4] ASC Flash Center (2009). FLASH user's guide. http://flash.uchicago.edu/website/codesupport/flash3_ug_3p2.pdf.

[5] Calder, A. C., Fryxell, B., Plewa, T., Rosner, R., Dursi, L. J., Weirs, V. G., Dupont, T., Robey, H. F., Kane, J. O., Remington, B. A., Drake, R. P., Dimonte, G., Zingale, M., Timmes, F. X., Olson, K., Ricker, P., MacNeice, P., and Tufo, H. M. (2002). On validating an astrophysical simulation code. Astrophysical Journal, Supplement 143, 201–229.

[6] Center for Applied Scientific Computing (CASC) (2007). SAMRAI structured adaptive mesh refinement application infrastructure.
https://computation.llnl.gov/casc/SAMRAI/. CASC, Lawrence Livermore National Laboratory.

[7] Chilan, C., Yang, M., Cheng, A., and Arber, L. (2006). Parallel I/O performance study with HDF5, a scientific data package. http://www.hdfgroup.uiuc.edu/papers/papers/ParallelIO/ParallelPerformance.pdf.

[8] Colella, P., Graves, D. T., Keen, N. D., Ligocki, T. J., Martin, D. F., McCorquodale, P. W., Modiano, D., Schwartz, P. O., Sternberg, T. D., and Van Straalen, B. (2009). Chombo software package for AMR applications, design document. https://seesar.lbl.gov/ANAG/chombo.

[9] Dubey, A., Reid, L., and Fisher, R. (2008). Introduction to FLASH 3.0, with application to supersonic turbulence. Physica Scripta 132, 014046.

[10] Fisher, R., Abarzhi, S., Antypas, K., Asida, S.M., Calder, A.C., Cattaneo, F., Constantin, P., Dubey, A., Foster, I., Gallagher, J.B., Ganapathy, M.K., Glendenin, C.C., Kadanoff, L., Lamb, D.Q., Needham, S., Papka, M., Plewa, T., Reid, L.B., Rich, P., Riley, K., Sheeler, D. (2008). Terascale turbulence computation on BG/L using the FLASH3 code. IBM Journal of Research and Development 52(1/2), 127–137.

[11] Fryxell, B., Olson, K., Ricker, P., Timmes, F. X., Zingale, M., Lamb, D. Q., MacNeice, P., Rosner, R., Truran, J. W., and Tufo, H. (2000). FLASH: An adaptive mesh hydrodynamics code for modeling astrophysical thermonuclear flashes. Astrophysical Journal, Supplement 131, 273–334.

[12] Gardiner, T. A. and Stone, J. M. (2005). An unsplit Godunov method for ideal MHD via constrained transport. J. Computational Physics 205(2), 509–539.

[13] Hornung, R. and Kohn, S. (2002). Managing application complexity in the SAMRAI object-oriented framework. Concurrency and Computation: Practice and Experience 14(5), 347–368.

[14] Hornung, R. D., Wissink, A. M., and Kohn, S. R. (2006).
Managing complex data and geometry in parallel structured AMR applications. Engineering with Computers 22, 181–195.

[15] Hovland, P., Keahey, K., McInnes, L. C., Norris, B., Diachin, L. F., and Raghavan, P. (2003). A quality of service approach for high-performance numerical components. In Proceedings of Workshop on QoS in Component-Based Software Engineering, Software Technologies Conference. Toulouse, France.

[16] Ko, S., Cho, K. W., Song, Y. D., Kim, Y. G., Na, J., and Kim, C. (2005). Development of Cactus driver for CFD analyses in the grid computing environment. In Advances in Grid Computing - EGC 2005, vol. 3470, pp. 771–777.

[17] Li, J., Liao, W., Choudhary, A., Ross, R., Thakur, R., Gropp, W., Latham, R., Siegel, A., Gallagher, B., and Zingale, M. (2003). Parallel netCDF: A high-performance scientific I/O interface. Supercomputing, 2003 ACM/IEEE Conference, 39+.

[18] NCSA (2008). Hierarchical Data Format 5. http://hdf.ncsa.uiuc.edu/HDF5/.

[19] Oldham, J. (2002). Scientific computing using POOMA. C++ Users Journal 20(11), 6–23.

[20] O'Shea, B. W., Bryan, G., Bordner, J., Norman, M. L., Abel, T., Harkness, R., and Kritsuk, A. (2005). Introducing Enzo, an AMR cosmology application. In Plewa, T., Timur, L., and Weirs, V. (eds.), Adaptive Mesh Refinement – Theory and Applications. Springer, vol. 41 of Lecture Notes in Computational Science and Engineering.

[21] Reynders, J., Hinker, P., Cummings, J., Atlas, S., Banerjee, S., Humphrey, W., Karmesin, S., Keahey, K., Srikant, M., and Tholburn, M. (1996). POOMA: A framework for scientific simulations on parallel architectures. Parallel Programming using C++.

[22] Slothouber, F. (2007). Automating software documentation with ROBODoc. http://www.xs4all.nl/~rfsber/Robo/robodoc.html.
[23] Toth, G., Sokolov, I., Gombosi, T., Chesney, D., Clauer, C., De Zeeuw, D., Hansen, K., Kane, K., Manchester, W., Oehmke, R., et al. (2005). Space Weather Modeling Framework: A new tool for the space science community. J. Geophysical Research 110, 12–226.

[24] Wissink, A. and Hornung, R. (2000). SAMRAI: A framework for developing parallel AMR applications. In 5th Symposium on Overset Grids and Solution Technology, Davis, CA, pp. 18–20.

[25] Worth, D. and Greenough, C. (2005). A survey of available tools for developing quality software using Fortran 95. Technical report RAL-TR-2005, STFC Rutherford Appleton Laboratory, SESP Software Engineering Support Programme. Available at http://www.sesp.cse.clrc.ac.uk/html/Publications.html.

Figure 1 Caption: Sections of sample Config files.

Figure 2 Caption: Architecture of units, subunits, and local API.

Figure 3 Caption: Images of Lagrangian tracer particles' movement with advance in time evolution. The snapshots are taken at times (a) T=0, (b) T=0.75, (c) T=1.75 and finally (d) T=4.25 seconds. The simulation was done on 32,768 nodes of the IBM BG/L machine at Lawrence Livermore National Laboratory, with 1856³ grid points and more than 16 million particles.