ConVibNet: Needle Detection during Continuous Insertion via Frequency-Inspired Features

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Purpose: Ultrasound-guided needle interventions are widely used in clinical practice, but their success depends critically on accurate needle placement, which is frequently hindered by the poor and intermittent visibility of needles in ultrasound images. Existing approaches remain limited by artifacts, occlusions, and low contrast, and often fail to support real-time continuous insertion. To overcome these challenges, this study introduces a robust real-time framework for continuous needle detection.

Methods: We present ConVibNet, an extension of VibNet for detecting needles with significantly reduced visibility, addressing real-time, continuous needle tracking during insertion. ConVibNet leverages temporal dependencies across successive ultrasound frames to enable continuous estimation of both needle tip position and shaft angle in dynamic scenarios. To strengthen temporal awareness of needle-tip motion, we introduce a novel intersection-and-difference loss that explicitly leverages motion correlations across consecutive frames. In addition, we curated a dedicated dataset for model development and evaluation.

Results: The performance of the proposed ConVibNet model was evaluated on our dataset, demonstrating superior accuracy compared to the baseline VibNet and UNet-LSTM models. Specifically, ConVibNet achieved a tip error of 2.80 ± 2.42 mm and an angle error of 1.69 ± 2.00°. These results represent a 0.75 mm improvement in tip localization accuracy over the best-performing baseline, while preserving real-time inference capability.

Conclusion: ConVibNet advances real-time needle detection in ultrasound-guided interventions by integrating temporal correlation modeling with a novel intersection-and-difference loss, thereby improving accuracy and robustness and demonstrating high potential for integration into autonomous insertion systems.


💡 Research Summary

This paper introduces ConVibNet, a deep‑learning framework designed to detect and continuously track an ultrasound‑guided needle during live insertion, even when the needle is barely visible. Building on the previously proposed VibNet, which leveraged mechanical vibration of the needle to generate periodic frequency signatures for static detection, ConVibNet extends the concept to dynamic scenarios by explicitly modeling temporal dependencies across successive ultrasound frames.

The authors first demonstrate that the vibration‑induced frequency components remain salient during insertion. Using a short‑time Fourier transform (STFT) on pixel intensity time series, they show that spectrograms of the needle tip and shaft contain stronger, more distinct frequency energy than surrounding tissue or background, even as the needle moves. This validates the use of frequency‑domain features for continuous tracking.
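The frequency analysis above can be sketched with a short-time Fourier transform on a per-pixel intensity time series. The snippet below is a minimal illustration, not the paper's pipeline: the signal amplitudes, noise levels, and window length are assumed values, chosen only to show that a pixel carrying a 2.5 Hz vibration signature (the vibration frequency used in the paper) exhibits stronger spectrogram energy near that frequency than a noise-only tissue pixel, at a 30 fps frame rate.

```python
import numpy as np
from scipy.signal import stft

fs = 30.0        # ultrasound frame rate (fps)
f_vib = 2.5      # needle vibration frequency (Hz), as reported in the paper
t = np.arange(300) / fs   # 10 s of frames

rng = np.random.default_rng(0)
# Hypothetical pixel intensity time series: a needle pixel carries a
# periodic vibration signature on top of noise; a tissue pixel is noise only.
needle_px = 0.5 * np.sin(2 * np.pi * f_vib * t) + 0.2 * rng.standard_normal(t.size)
tissue_px = 0.2 * rng.standard_normal(t.size)

def band_energy(x, f_lo=2.0, f_hi=3.0):
    """Mean STFT magnitude in a frequency band around the vibration frequency."""
    f, _, Z = stft(x, fs=fs, nperseg=60)  # 2 s windows -> 0.5 Hz resolution
    band = (f >= f_lo) & (f <= f_hi)
    return np.abs(Z[band]).mean()

print(band_energy(needle_px) > band_energy(tissue_px))  # vibration band dominates
```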

ConVibNet’s architecture retains VibNet’s frequency feature extraction and encoder modules but replaces the computationally heavy deep Hough Transform (DHT) with a lightweight segmentation head, enabling real‑time inference (~30 fps). To address the need for temporal awareness, the network processes two overlapping sequences of L = 30 frames each, where the second sequence’s final frame is Δ = 5 frames later than the first. Both sequences are passed independently through the same network, and four loss components are combined: (1) focal loss for each sequence (to mitigate class imbalance inherent in thin needle masks), (2) an “intersection loss” that penalizes disagreement between the two predicted masks via binary cross‑entropy on their element‑wise product, and (3) a “difference loss” that penalizes the absolute difference between the masks, encouraging the model to capture genuine motion rather than static appearance. The total loss is L = L_focal(t) + L_focal(t+Δ) + α L_inter + β L_diff, with α and β tuned empirically; L_diff is activated only after the second epoch to avoid destabilizing early training.
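The combined loss can be sketched as follows. This is a NumPy illustration under stated assumptions, not the authors' implementation: the paper specifies BCE on the element-wise product of the two predicted masks and an absolute-difference penalty, but the exact targets are not restated here, so the sketch assumes the intersection target is the product of the two ground-truth masks and the difference target is the ground-truth inter-frame difference.

```python
import numpy as np

EPS = 1e-7

def focal_loss(p, g, gamma=2.0, alpha_f=0.25):
    """Binary focal loss; down-weights easy background pixels to handle
    the class imbalance of thin needle masks."""
    p = np.clip(p, EPS, 1 - EPS)
    pt = np.where(g == 1, p, 1 - p)
    w = np.where(g == 1, alpha_f, 1 - alpha_f)
    return float(np.mean(-w * (1 - pt) ** gamma * np.log(pt)))

def bce(p, g):
    p = np.clip(p, EPS, 1 - EPS)
    return float(np.mean(-(g * np.log(p) + (1 - g) * np.log(1 - p))))

def total_loss(p_t, p_td, g_t, g_td, alpha=1.0, beta=1.0):
    """L = L_focal(t) + L_focal(t+Δ) + α·L_inter + β·L_diff.
    L_inter: BCE between the product of the two predicted masks and the
    product of the ground-truth masks (assumed target).
    L_diff: L1 between predicted and ground-truth inter-frame mask
    differences (assumed form)."""
    l_inter = bce(p_t * p_td, g_t * g_td)
    l_diff = float(np.mean(np.abs(np.abs(p_t - p_td) - np.abs(g_t - g_td))))
    return (focal_loss(p_t, g_t) + focal_loss(p_td, g_td)
            + alpha * l_inter + beta * l_diff)
```

With perfect predictions the loss collapses to near zero, and it grows as the two masks disagree with the ground truth; α and β weight the temporal terms, as in the paper.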

Ground‑truth masks are generated using a custom data‑acquisition platform. An 18‑gauge needle is vibrated at ~2.5 Hz by a stepper motor while being inserted into ex‑vivo pork tissue at two predefined angles (15° and 30°). A 3‑D optical tracking system (NDI Polaris) records passive markers on the needle at 100 Hz, providing precise tip trajectories that are synchronized with the 30 fps ultrasound stream. Initial manual annotation of the tip in a frame where it becomes visible, combined with the tracked displacement, yields accurate masks for all subsequent frames, even when the needle is invisible. After quality control and removal of severely deviated trials, the final dataset comprises 106 video sequences, split 80 %/10 %/10 % for training, validation, and testing. On‑the‑fly augmentations (horizontal flip, contrast/brightness changes) are applied to mitigate overfitting.
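The synchronization and annotation-propagation steps above can be sketched as follows. This is a simplified illustration, not the authors' tooling: the trajectory, annotation coordinates, and anchor frame are hypothetical values, used only to show how 100 Hz tracker samples are resampled onto 30 fps frame timestamps and how one manual tip annotation plus the tracked displacement yields a tip position for every frame.

```python
import numpy as np

fs_track, fs_us = 100.0, 30.0             # optical tracker (Hz) and ultrasound (fps)
t_track = np.arange(0, 5, 1 / fs_track)   # 5 s of tracker timestamps
t_frame = np.arange(0, 5, 1 / fs_us)      # ultrasound frame timestamps

# Hypothetical tracked tip trajectory (mm) during a constant-speed insertion.
tip_track = np.stack([2.0 * t_track, 0.5 * t_track], axis=1)

# Resample each coordinate of the 100 Hz trajectory onto the 30 fps frame clock.
tip_frame = np.stack(
    [np.interp(t_frame, t_track, tip_track[:, i]) for i in range(2)], axis=1)

# Propagate one manual annotation via the tracked displacement:
# tip in frame k = annotated tip + (tracked tip[k] - tracked tip[k0]).
k0 = 10                                    # frame where the tip was annotated
annot = np.array([12.0, 34.0])             # hypothetical manual annotation (mm)
tip_in_frames = annot + (tip_frame - tip_frame[k0])
```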

Evaluation compares ConVibNet against two baselines trained on the same data: (i) VibNet without DHT (the DHT replaced by the same segmentation head as ConVibNet) and (ii) UNet‑LSTM, a classic encoder‑decoder with an LSTM module for temporal modeling. Three metrics are reported: mean tip error (Euclidean distance in mm), mean angle error (absolute deviation in degrees), and success rate (percentage of samples with tip error < 10 mm and angle error < 15°). ConVibNet achieves a tip error of 2.80 ± 2.42 mm and an angle error of 1.69 ± 2.00°, outperforming VibNet‑w/o DHT by 0.75 mm and UNet‑LSTM by a similar margin, while preserving real‑time performance.
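The three evaluation metrics are straightforward to compute; a minimal sketch (function name and toy inputs are illustrative, thresholds from the paper):

```python
import numpy as np

def detection_metrics(tip_pred, tip_gt, ang_pred, ang_gt,
                      tip_thr=10.0, ang_thr=15.0):
    """Mean tip error (mm, Euclidean), mean absolute angle error (deg),
    and success rate: fraction of samples with tip error < 10 mm AND
    angle error < 15 deg."""
    tip_err = np.linalg.norm(tip_pred - tip_gt, axis=1)
    ang_err = np.abs(ang_pred - ang_gt)
    success = float(np.mean((tip_err < tip_thr) & (ang_err < ang_thr)))
    return float(tip_err.mean()), float(ang_err.mean()), success

# Toy example: first sample succeeds, second fails both thresholds.
tip_p = np.array([[1.0, 1.0], [20.0, 0.0]])
tip_g = np.zeros((2, 2))
ang_p = np.array([2.0, 20.0])
ang_g = np.zeros(2)
print(detection_metrics(tip_p, tip_g, ang_p, ang_g))  # success rate: 0.5
```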

The paper discusses limitations: the experiments are conducted on ex‑vivo tissue, so generalization to in‑vivo conditions with blood flow, heterogeneous elasticity, and patient motion remains untested. Moreover, the requirement for an external vibration actuator adds hardware complexity that must be integrated into existing clinical ultrasound systems.

Future work is suggested in three directions: (1) expanding the dataset to include diverse tissue types, insertion speeds, and patient‑specific motion; (2) integrating closed‑loop vibration control to adapt the frequency signature in real time; and (3) embedding ConVibNet into a fully autonomous needle‑insertion robot, where continuous, robust tip localization can drive feedback‑controlled actuation.

In summary, ConVibNet demonstrates that frequency‑inspired features combined with a novel intersection‑and‑difference loss can effectively capture temporal motion cues, leading to more accurate and robust needle tracking during continuous insertion, and it paves the way toward autonomous ultrasound‑guided interventions.

