PVM-Distributed Implementation of the Radiance Code
The Parallel Virtual Machine (PVM) tool has been used for a distributed implementation of Greg Ward’s Radiance code. To generate exactly the same primary rays with both the sequential and the parallel codes, the quincunx sampling technique that Radiance uses to reduce the number of primary rays by interpolation must be left untouched in the parallel implementation. The octree of local ambient values that Radiance uses for indirect illumination is shared among all the processors. Both static and dynamic image-partitioning techniques, which replicate the octree of the complete scene on all processors and include load balancing, have been developed for single-frame rendering. Speedups larger than 7.5 have been achieved on a network of 8 workstations. For animation sequences, a new dynamic partitioning technique that achieves superlinear speedups has also been developed.
💡 Research Summary
The paper presents a distributed implementation of Greg Ward’s Radiance rendering system using the Parallel Virtual Machine (PVM) framework. Radiance is a hybrid global‑illumination engine that combines deterministic ray tracing, path tracing, and Monte‑Carlo integration, and it relies on three major acceleration techniques: an octree for fast ray‑object intersection, a second octree that stores locally computed ambient illumination values, and the quincunx sampling algorithm that dramatically reduces the number of primary rays by interpolating pixel colors. Preserving these techniques is essential because they guarantee that the parallel version produces exactly the same primary‑ray set and final image as the sequential code.
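To make the quincunx constraint concrete, here is a minimal, hypothetical sketch of the sampling idea in Python (Radiance itself is written in C, and its real algorithm recursively subdivides blocks; the function name, tolerance, and return values below are illustrative assumptions):

```python
# Illustrative sketch of the quincunx-sampling idea only (Radiance's
# actual implementation differs in detail): trace primary rays at the
# four corners and the center of a pixel block; if the center agrees
# with the corner average, fill the remaining pixels by interpolation
# instead of tracing more rays.
def sample_block(trace, x0, y0, size, tol=0.1):
    corners = [trace(x0, y0), trace(x0 + size, y0),
               trace(x0, y0 + size), trace(x0 + size, y0 + size)]
    center = trace(x0 + size // 2, y0 + size // 2)  # the fifth "quincunx" sample
    if abs(center - sum(corners) / 4.0) <= tol:
        return "interpolate"   # corner/center agreement: interpolate interior
    return "subdivide"         # disagreement: recurse into sub-blocks (omitted)

# A flat region interpolates; a region containing an edge forces subdivision.
print(sample_block(lambda x, y: 0.5, 0, 0, 8))           # interpolate
print(sample_block(lambda x, y: float(x > 4), 0, 0, 8))  # subdivide
```

Because the interpolation decision depends on which rays were traced at block corners, any parallel decomposition must preserve this pattern to reproduce the sequential image bit-for-bit.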
The authors adopt a client‑server architecture. The server builds the scene octree once and replicates it on all client machines. Each client maintains its own copy of the ambient‑value octree; whenever a client computes new ambient samples it broadcasts them to the others, ensuring a globally consistent ambient field and even allowing some indirect‑illumination rays to be avoided earlier than in the sequential run.
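The ambient-sharing behavior described above can be simulated in-process with a toy sketch (the `Client` class and its methods are hypothetical stand-ins; the paper uses PVM message passing between separate workstation processes):

```python
# Toy sketch of the ambient-value sharing protocol: each client keeps a
# local copy of the ambient cache and broadcasts every new sample to its
# peers, keeping all caches globally consistent.
from dataclasses import dataclass, field

@dataclass
class AmbientSample:
    position: tuple   # world-space point where the sample was computed
    value: float      # indirect-illumination value at that point

@dataclass
class Client:
    name: str
    peers: list = field(default_factory=list)
    cache: list = field(default_factory=list)  # stand-in for the ambient octree

    def compute_ambient(self, pos, value):
        sample = AmbientSample(pos, value)
        self.cache.append(sample)      # store locally first
        for peer in self.peers:        # then broadcast to all peers
            peer.receive(sample)

    def receive(self, sample):
        self.cache.append(sample)      # merge a peer's sample into the cache

a, b = Client("a"), Client("b")
a.peers, b.peers = [b], [a]
a.compute_ambient((0.0, 0.0, 0.0), 0.25)
print(len(b.cache))  # b now holds the sample computed by a
```

A client that receives a peer's sample before shading a nearby point can interpolate from it instead of spawning new indirect-illumination rays, which is why the parallel run can occasionally do less work than the sequential one.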
Two families of image partitioning strategies are explored. In static partitioning the image is divided into “scanbars” (groups of consecutive scanlines). Because the computational cost of a scanbar depends on scene complexity, the authors first estimate the cost of each scanbar by tracing a small random set of primary rays and counting either total rays generated or ray‑object intersections. Using these estimates they group scanbars so that each processor receives a set whose summed cost is roughly equal. To avoid idle time, the server implements a dynamic load‑balancing step: when a client finishes its work, the server asks the busiest client for half of its remaining scanbars and redistributes them, taking care to share the common boundary scanline so that the quincunx algorithm remains correct.
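The grouping step can be sketched with a standard longest-processing-time greedy heuristic (an assumption for illustration; the paper's exact grouping rule may differ, and the cost numbers below are invented):

```python
# Hedged sketch of static partitioning: given estimated per-scanbar
# costs (e.g. ray counts from a small pilot sample), greedily assign
# scanbars to processors so that the summed costs are roughly equal.
import heapq

def balance(costs, n_procs):
    """Longest-processing-time greedy assignment of scanbar costs."""
    # Each heap entry is (total_cost_so_far, processor_index).
    heap = [(0.0, p) for p in range(n_procs)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_procs)]
    # Place the most expensive scanbars first for better balance.
    for bar in sorted(range(len(costs)), key=lambda i: -costs[i]):
        total, p = heapq.heappop(heap)
        assignment[p].append(bar)
        heapq.heappush(heap, (total + costs[bar], p))
    return assignment

costs = [5.0, 1.0, 4.0, 2.0, 3.0, 3.0]  # illustrative scanbar cost estimates
groups = balance(costs, 2)
print(groups)  # two groups whose summed costs come out roughly equal
```

Note that this static balance is only as good as the pilot-sample estimates, which is why the server still needs the runtime redistribution step described above.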
Dynamic partitioning goes further by maintaining a pool of work units that clients request on demand. Two variants are described: one that still uses whole scanbars (preserving the exact quincunx pattern) and another that uses fixed‑size “windows” (sub‑images) while still respecting the quincunx constraints on the window borders. This on‑the‑fly allocation eliminates idle processors entirely and adapts naturally to varying scene difficulty. For animation sequences the authors extend the scheme by keeping the ambient octree across frames; after each frame the updated ambient values are broadcast, which yields super‑linear speedups because later frames can reuse more indirect illumination data.
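The on-demand pool can be illustrated with threads and a shared queue (a behavioral simulation only, under the assumption that work units are independent; the actual system exchanges PVM messages between workstations):

```python
# Minimal sketch of dynamic work allocation: a shared pool of work units
# (scanbars or windows) that workers pull from as soon as they finish,
# so no worker sits idle while work remains.
import queue
import threading

pool = queue.Queue()
for unit in range(16):      # 16 work units, e.g. scanbars of one frame
    pool.put(unit)

done = []
lock = threading.Lock()

def worker():
    while True:
        try:
            unit = pool.get_nowait()  # request the next unit on demand
        except queue.Empty:
            return                    # pool drained: worker exits
        # ... render the unit here ...
        with lock:
            done.append(unit)         # record the finished unit

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(done))  # all 16 units completed
```

Because allocation happens at completion time rather than up front, an expensive unit simply delays one worker while the others keep draining the pool, which is the property that makes the dynamic scheme robust to irregular scenes.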
Experimental results on a heterogeneous network of eight UNIX workstations show speedups up to 7.5× for static partitioning and close to 8× for dynamic partitioning. The dynamic scheme consistently outperforms the static one on highly irregular scenes because its fine‑grained load balancing matches the actual runtime behavior. The main limitation is memory: every node must store a full copy of both octrees, so the maximum scene size is bounded by the smallest node’s RAM. Additionally, frequent broadcasting of ambient values can saturate the network, suggesting that larger clusters may need more sophisticated, possibly asynchronous, communication strategies.
In summary, the paper demonstrates that a careful preservation of Radiance’s intrinsic acceleration structures, combined with PVM’s message‑passing capabilities, enables efficient parallel rendering without sacrificing image quality. The proposed static and dynamic image partitioning methods, together with a simple yet effective load‑balancing protocol, provide a flexible framework that can be applied to both single‑frame rendering and animation pipelines. Future work is suggested on octree compression to reduce memory footprints and on reducing communication overhead to improve scalability on larger clusters.