A Lipschitz-Constrained Network and a Storage-Efficient Deep Unfolding Framework for Prompt-Based Multi-View CT Reconstruction
📝 Abstract
Despite significant advancements in deep learning-based sparse-view computed tomography (SVCT) reconstruction algorithms, these methods still encounter two primary limitations: (i) It is challenging to explicitly prove that the prior networks of deep unfolding algorithms satisfy Lipschitz constraints due to their empirically designed nature. (ii) The substantial storage costs of training a separate model for each setting in the case of multiple views hinder practical clinical applications. To address these issues, we elaborate an explicitly provable Lipschitz-constrained network, dubbed LipNet, and integrate an explicit prompt module to provide discriminative knowledge of different sparse sampling settings, enabling the treatment of multiple sparse-view configurations within a single model. Furthermore, we develop a storage-saving deep unfolding framework for multiple-in-one SVCT reconstruction, termed PromptCT, which embeds LipNet as its prior network to ensure the convergence of its corresponding iterative algorithm. In simulated and real data experiments, PromptCT outperforms benchmark reconstruction algorithms in multiple-in-one SVCT reconstruction, achieving higher-quality reconstructions with lower storage costs. On the theoretical side, we explicitly demonstrate that LipNet satisfies the boundary property, further proving its Lipschitz continuity and subsequently analyzing the convergence of the proposed iterative algorithms. The data and code are publicly available at https://github.com/shibaoshun/PromptCT.
📄 Content
Sparse-view computed tomography (SVCT) serves as an effective solution, reducing patient radiation exposure by acquiring partial projection data through equidistant sampling over the full scanning range [2], [3]. This approach not only shortens scanning time but also mitigates motion artifacts caused by body movement, heartbeat, and respiration [4]. Nevertheless, the absence of projection data at certain angles may produce severe global streak artifacts in CT images reconstructed with filtered back-projection (FBP), potentially obscuring critical tissue details and hindering clinical diagnosis. Effectively reconstructing high-quality CT images from sparse-view projection data therefore remains a significant challenge [5].
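To make "equidistant sampling over the full scanning range" concrete, the sketch below builds binary angle-selection masks for several sparse-view settings. The 360-angle full scan and the view counts are illustrative assumptions, not values from the paper.

```python
# Illustrative sketch (assumed 360-angle full scan, hypothetical names):
# a binary mask selects n_views equidistant projection angles out of n_full.
def sampling_mask(n_full, n_views):
    """Return a 0/1 list of length n_full keeping n_views equidistant angles."""
    step = n_full / n_views
    kept = {round(i * step) % n_full for i in range(n_views)}
    return [1 if angle in kept else 0 for angle in range(n_full)]

# One mask per sparse-view setting, e.g. 60-, 90-, and 120-view scans.
masks = {v: sampling_mask(360, v) for v in (60, 90, 120)}
```

Each mask marks which projection angles are actually measured; reconstruction then works only from the kept views.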
Traditional iterative algorithms typically tackle the SVCT reconstruction problem by imposing various hand-crafted priors. Because hand-crafted priors characterize image features insufficiently and the algorithms require numerous iterations to converge, they suffer from low-quality reconstruction and a heavy computational burden. Inspired by the significant success of deep learning (DL) in medical image reconstruction, deep neural networks (DNNs) have been applied to the SVCT reconstruction task [6]. Despite the high-quality reconstructions achieved by these data-driven methods, the model architectures of DNNs typically lack interpretability, thereby hindering further theoretical analysis [7]–[10].
Recently, a promising direction for interpretable networks is the “iterative theory + deep learning” scheme, which can be divided into plug-and-play (PnP) reconstruction methods [11], [12] and deep unfolding reconstruction methods [13]. On the theoretical side, if a PnP iterative algorithm is unrolled, the convergence analysis of the PnP algorithm can be transferred to the stability analysis of the corresponding unrolled algorithm [14]; i.e., a convergent PnP algorithm leads to a stable unrolled algorithm, whose performance becomes increasingly stable as the number of stages increases [14]. However, the theoretical analysis of existing PnP and deep unfolding methods still faces several limitations [15], [16]. In particular, the choice and implementation of the denoiser play a critical role in the convergence of PnP reconstruction algorithms [17]–[19]. In fact, to ensure convergence, the use of bounded denoisers requires that the gradients of the data fidelity terms be bounded [20]. However, incomplete data, noise interference, and computational complexity often render closed-form solutions to the optimization subproblems infeasible in most medical imaging tasks, limiting the direct use of bounded denoisers in convergence proofs. On the other hand, traditional deep unfolding algorithms often rely on empirically designed prior networks, which perform well in practice but are difficult to prove explicitly to satisfy boundary properties or Lipschitz conditions, the common prerequisites for ensuring algorithm convergence.
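To make the Lipschitz condition concrete: for a linear map x ↦ Wx, the Lipschitz constant (in the Euclidean norm) is the largest singular value of W, which can be estimated by power iteration and then used to rescale the weights to a provable bound. This is a generic hedged sketch of that idea, not the construction used in LipNet; all names are hypothetical.

```python
# Estimate the Lipschitz constant of x -> Wx via power iteration on W^T W,
# then rescale W so the resulting layer is provably 1-Lipschitz.
import math
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matvec_t(W, y):  # W^T y
    return [sum(W[i][j] * y[i] for i in range(len(W))) for j in range(len(W[0]))]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def spectral_norm(W, iters=200, seed=0):
    """Largest singular value of W (= Lipschitz constant of x -> Wx)."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in range(len(W[0]))]
    for _ in range(iters):
        u = matvec(W, v)          # one step of power iteration on W^T W
        v = matvec_t(W, u)
        nv = norm(v)
        v = [vi / nv for vi in v]
    return norm(matvec(W, v))

W = [[2.0, 0.0], [0.0, 0.5]]                  # singular values 2.0 and 0.5
L = spectral_norm(W)
W_1lip = [[w / L for w in row] for row in W]  # rescaled to be 1-Lipschitz
```

Dividing the weights by the estimated spectral norm is one standard way to impose a Lipschitz bound on each linear layer; a composition of 1-Lipschitz layers and 1-Lipschitz activations is itself 1-Lipschitz.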
On the practical side, a range of sparse-view sampling strategies exists, tailored to specific clinical requirements [21]. The distribution of artifacts in the reconstructed CT images varies with the sampling views and under-sampling rates. Existing DL-based SVCT reconstruction methods typically handle these sparse sampling configurations individually, training a separate model for each specific sparse-view setting [22]–[24]. Although this “one-model-for-one-setting” approach achieves excellent experimental performance, it incurs substantial storage costs, and the inflexibility of single-setting models limits clinical application. Inspired by prompt learning, all-in-one methods typically train one general model for various tasks, but they require extensive datasets [25], and acquiring large amounts of paired CT data is impractical in medical imaging.
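The storage argument can be put in back-of-envelope terms. The parameter counts below are illustrative assumptions, not figures from the paper.

```python
# Hypothetical parameter counts for illustration only.
def storage_one_per_setting(params_per_model, n_settings):
    """Total parameters when a separate model is trained per view setting."""
    return params_per_model * n_settings

def storage_multiple_in_one(backbone_params, prompt_params):
    """Total parameters for one shared backbone plus a small prompt module."""
    return backbone_params + prompt_params

separate = storage_one_per_setting(30_000_000, 3)    # 3 sparse-view settings
shared = storage_multiple_in_one(30_000_000, 500_000)
saving = 1 - shared / separate                       # fraction of storage saved
```

Under these assumed numbers, one shared model with a small prompt module replaces three full models, saving roughly two thirds of the storage, and the gap widens as more view settings are supported.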
To tackle the aforementioned issues, we propose a prompting Lipschitz-constrained network for multiple-in-one SVCT reconstruction. This “one-model-for-multi-view” strategy aims to develop a universal model covering multiple sampling views in the SVCT reconstruction task, thereby generating high-quality CT images to assist doctors in diagnosis. In theory, the proposed Lipschitz-constrained network, serving as the prior network, integrates the boundary property with the Lipschitz constraint, which circumvents the stringent restrictions on the data fidelity terms while ensuring the convergence of the iterative algorithm. In particular, we introduce sparse sampling masks as explicit prompts to build a single flexible model trained to handle SVCT reconstruction at different sampling ratios.
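One simple way to realize such an explicit prompt, sketched below with hypothetical names (the paper's prompt module may differ), is to tile the sampling mask to the image size and stack it as an extra input channel, so a single model can identify which sparse-view setting produced its input.

```python
# Hedged sketch: attach the sampling mask as a prompt channel to the input.
def prompt_input(image, mask):
    """image: H x W list-of-lists; mask: 0/1 list over projection angles.
    Returns [image_channel, prompt_channel] with the mask tiled to H x W."""
    h, w = len(image), len(image[0])
    tiled = [[mask[j % len(mask)] for j in range(w)] for _ in range(h)]
    return [image, tiled]

img = [[0.0] * 4 for _ in range(4)]
x = prompt_input(img, [1, 0])  # 2-channel input: image + mask prompt
```

Because the prompt is an explicit input rather than a per-setting weight, the same network weights serve every sampling configuration.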
• We propose an explicitly provable Lipschitz-constrained sparse representation model-driven network, termed LipNet, constructed from a deep unfolding sparse representation framework that satisfies the boundary property. Within LipNet