Rethinking Explainable Disease Prediction: Synergizing Accuracy and Reliability via Reflective Cognitive Architecture
In clinical decision-making, predictive models face a persistent trade-off: accurate models are often opaque “black boxes,” while interpretable methods frequently lack predictive precision or statistical grounding. In this paper, we challenge this dichotomy, positing that high predictive accuracy and high-quality descriptive explanations are not competing goals but synergistic outcomes of a deep, first-hand understanding of data. We propose the Reflective Cognitive Architecture (RCA), a novel framework designed to enable Large Language Models (LLMs) to learn directly from tabular data through experience and reflection. RCA integrates two core mechanisms: an iterative rules optimization process that refines logical argumentation by learning from prediction errors, and a distribution-aware rules check that grounds this logic in global statistical evidence to ensure robustness. We evaluated RCA against over 20 baselines - ranging from traditional machine learning to advanced reasoning LLMs and agents - across diverse medical datasets, including a proprietary real-world Catheter-Related Thrombosis (CRT) cohort. Crucially, to demonstrate real-world scalability, we extended our evaluation to two large-scale datasets. The results confirm that RCA achieves state-of-the-art predictive performance and superior robustness to data noise while simultaneously generating clear, logical, and evidence-based explanatory statements, maintaining its efficacy even at scale. The code is available at https://github.com/ssssszj/RCA.
💡 Research Summary
The paper tackles a central dilemma in clinical AI: models that achieve high predictive performance are typically opaque, while interpretable models often sacrifice accuracy or lack solid statistical grounding. The authors argue that accuracy and high‑quality explanations are not competing objectives but mutually reinforcing outcomes of a deep, experience‑driven understanding of the data. To operationalize this insight, they introduce the Reflective Cognitive Architecture (RCA), a novel framework that enables large language models (LLMs) to learn directly from tabular medical records through a cycle of experience, reflection, and statistical validation.
RCA consists of two tightly coupled mechanisms. First, an Iterative Rules Optimization loop treats every mis‑prediction as an “experience.” The prediction model (M_pred) uses a dynamic rule base Rk‑1 together with a global data distribution summary D_train to generate a label and a natural‑language explanation for each patient. Incorrect cases are collected into a textual error batch S_error, which a reflective LLM (M_ref) processes to refine the rule base, turning concrete errors into abstract, generalizable rules. This loop continuously sharpens logical argumentation (LA) as the model strives for higher accuracy. Second, a Distribution‑aware Rules Check component (M_chk) audits the newly generated rules against the global statistical context D_train, ensuring that each rule is evidence‑based (EBM) and preventing over‑fitting or cognitive bias (CB). Rules that conflict with the overall distribution are corrected or discarded, guaranteeing that the final rule set remains grounded in the data.
The pipeline begins by converting each structured patient vector into an unstructured textual narrative (e.g., “Granulocyte‑to‑lymphocyte ratio is 4.88, D‑dimer is 3.16, no chemotherapy, catheterization is CVC”). This transformation makes the data directly consumable by LLMs. Simultaneously, a concise statistical summary of the training cohort (means, quantiles, frequencies) is extracted and supplied as contextual “sanity check” information. During training, the rule base evolves through repeated cycles of error‑driven refinement and distribution validation. At inference time, the final, vetted rule base R_f is fixed; M_pred then produces both a binary disease prediction and a narrative explanation that reflects the learned logical rules and the global evidence.
The authors evaluate RCA on three disease‑prediction datasets: a proprietary real‑world Catheter‑Related Thrombosis (CRT) cohort and two large public medical datasets. They compare against 22 baselines spanning traditional machine learning (logistic regression, XGBoost), deep models (MLP, Transformer), neural‑symbolic approaches (LTN, LNN), and recent LLM‑based agents that rely on tool‑calling or code generation. Evaluation metrics cover (a) predictive accuracy (AUROC, AUPRC), (b) robustness to injected label noise, and (c) explanation quality measured along four cognitive‑science‑inspired dimensions: Cognitive Load (CL), Logical Argumentation (LA), Evidence‑Based Medicine (EBM), and Cognitive Bias reduction (CB).
Results show that RCA consistently outperforms all baselines. Predictive performance reaches AUROC ≈ 0.93 across datasets, surpassing the best non‑RCA method by 2–5 percentage points. Under 10 % label noise, RCA’s AUROC degrades by only ~0.02, indicating strong robustness. Explanation assessments reveal markedly lower cognitive load (average score 1.8 vs. 3.2 for standard LLM explanations), higher logical coherence, stronger alignment with statistical evidence, and reduced bias. The final rule base contains a modest number of human‑readable conditionals, each validated against the global distribution, demonstrating that the system can generate concise, medically sound rationales.
The study validates the central hypothesis: by forcing the model to acquire a deep, experience‑based representation of the data, both predictive accuracy and explanatory fidelity improve together. Unlike existing LLM agents that interact with data through APIs or generated code—thereby remaining at a high level of abstraction—RCA engages the LLM directly with the raw data narrative, mimicking how clinicians iteratively examine cases, reflect on errors, and refine diagnostic heuristics. This “experience‑reflection” loop bridges the gap between statistical rigor and natural‑language interpretability, offering a new paradigm for trustworthy AI in healthcare.
In conclusion, the Reflective Cognitive Architecture demonstrates that a carefully designed feedback loop between error‑driven rule learning and distribution‑aware validation enables large language models to simultaneously achieve state‑of‑the‑art disease prediction performance, robustness to noisy clinical data, and high‑quality, evidence‑grounded explanations. The work opens avenues for extending the approach to multimodal medical data, automated rule visualization, and integration into real‑world clinical decision‑support workflows.
Comments & Academic Discussion
Loading comments...
Leave a Comment