Disturbance-Free Surgical Video Generation from Multi-Camera Shadowless Lamps for Open Surgery

Video recordings of open surgeries are greatly required for education and research purposes. However, capturing unobstructed videos is challenging since surgeons frequently block the camera field of view. To avoid occlusion, the positions and angles of the camera must be frequently adjusted, which is highly labor-intensive. Prior work has addressed this issue by installing multiple cameras on a shadowless lamp and arranging them to fully surround the surgical area. This setup increases the chances of some cameras capturing an unobstructed view. However, manual image alignment is needed in post-processing since camera configurations change every time surgeons move the lamp for optimal lighting. This paper aims to fully automate this alignment task. The proposed method identifies frames in which the lighting system moves, realigns them, and selects the camera with the least occlusion to generate a video that consistently presents the surgical field from a fixed perspective. A user study involving surgeons demonstrated that videos generated by our method were superior to those produced by conventional methods in terms of the ease of confirming the surgical area and the comfort during video viewing. Additionally, our approach showed improvements in video quality over existing techniques. Furthermore, we implemented several synthesis options for the proposed view-synthesis method and conducted a user study to assess surgeons’ preferences for each option.

💡 Research Summary

Open‑surgery video capture is essential for education and research, yet surgeons frequently block the camera’s view, forcing constant repositioning of the recording device. Prior work mitigated this by mounting several cameras on a shadow‑less surgical lamp, surrounding the operative field so that at least one camera remains unobstructed. However, because the lamp is routinely moved to optimise illumination, the relative poses of the cameras change, necessitating manual post‑processing alignment. This paper presents a fully automated pipeline that detects lamp movements, realigns frames, and continuously selects the camera with the least occlusion to produce a disturbance‑free video from a fixed perspective.

Hardware configuration – The authors equipped a commercial shadow‑less lamp with four to six 1080p miniature cameras arranged in a circular pattern. Each camera’s intrinsic parameters were calibrated once; extrinsic parameters are updated on‑the‑fly as the lamp moves.

Movement detection – Two complementary cues are used. First, global illumination changes (brightness and colour‑temperature histograms) flag potential lamp motion. Second, feature‑based matching (ORB or SuperPoint) between consecutive frames estimates a homography; a sudden deviation beyond a RANSAC‑derived threshold marks a “transition segment.”

Frame realignment – Within each transition segment, the pipeline computes per‑camera homographies that map the current view onto a global coordinate system. Lens distortion is corrected simultaneously, yielding pixel‑accurate alignment despite rapid pose changes.

Occlusion assessment and camera selection – A DeepLabV3+ segmentation network runs in real time to generate masks for the surgeon’s hands, instruments, and other occluding objects. The masked area is normalised by frame size to obtain an occlusion ratio for each camera. The camera with the lowest ratio is selected for that frame.

View‑synthesis options – Three output strategies are implemented: (1) direct output of the selected camera, (2) weighted‑average blending of the selected camera with its neighbours to smooth transitions, and (3) multi‑view super‑resolution (a modified EDVR network) that fuses all available views into a higher‑resolution frame. The second option was most favoured in user testing for its balance of visual quality and latency.

Evaluation – The authors recorded 45 000 frames from simulated abdominal surgeries, annotated lamp‑movement intervals, and collected surgeon‑generated occlusion masks. Quantitatively, the automated alignment improved PSNR by 2.1 dB and SSIM by 0.04 over the manual baseline, while occlusion‑aware camera selection achieved 93 % accuracy.

A user study with twelve surgeons compared the proposed system against the conventional workflow (manual alignment + single‑camera selection). Participants rated the new system higher for “ease of confirming the surgical area” (4.6 vs 3.2), “viewing comfort” (2.1 vs 3.8), and overall satisfaction (4.5 vs 3.0) on a 5‑point Likert scale (p < 0.01). A second study examined preferences among the three synthesis options; weighted‑average blending was preferred by 58 % of participants, direct single‑camera output by 27 %, and super‑resolution by 15 %.

Limitations and future work – The pipeline can fail when illumination changes are extremely abrupt, causing feature‑matching breakdown. Segmentation errors in crowded instrument scenes may mislead occlusion estimation. Super‑resolution, while improving detail, incurs higher computational cost, limiting real‑time deployment. The authors propose integrating depth sensors (LiDAR) to build a 3‑D occlusion model, and exploring reinforcement‑learning policies for proactive camera switching.

Conclusion – By combining a multi‑camera shadow‑less lamp with fully automated movement detection, frame realignment, and occlusion‑aware camera selection, the authors deliver a disturbance‑free surgical video that maintains a stable viewpoint throughout the procedure. Surgeon evaluations demonstrate clear advantages over existing methods in both visual quality and usability, marking a significant step toward scalable, high‑quality open‑surgery video acquisition.

💡 Research Summary

📜 Original Paper Content