AutoDiscover: A reinforcement learning recommendation system for the cold-start imbalance challenge in active learning, powered by graph-aware Thompson sampling

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

Systematic literature reviews (SLRs) are fundamental to evidence-based research, but manual screening is an increasing bottleneck as scientific output grows. Screening features low prevalence of relevant studies and scarce, costly expert decisions. Traditional active learning (AL) systems help, yet typically rely on fixed query strategies for selecting the next unlabeled documents. These static strategies do not adapt over time and ignore the relational structure of scientific literature networks. This thesis introduces AutoDiscover, a framework that reframes AL as an online decision-making problem driven by an adaptive agent. Literature is modeled as a heterogeneous graph capturing relationships among documents, authors, and metadata. A Heterogeneous Graph Attention Network (HAN) learns node representations, which a Discounted Thompson Sampling (DTS) agent uses to dynamically manage a portfolio of query strategies. With real-time human-in-the-loop labels, the agent balances exploration and exploitation under non-stationary review dynamics, where strategy utility changes over time. On the 26-dataset SYNERGY benchmark, AutoDiscover achieves higher screening efficiency than static AL baselines. Crucially, the agent mitigates cold start by bootstrapping discovery from minimal initial labels where static approaches fail. We also introduce TS-Insight, an open-source visual analytics dashboard to interpret, verify, and diagnose the agent’s decisions. Together, these contributions accelerate SLR screening under scarce expert labels and low prevalence of relevant studies.


💡 Research Summary

The paper presents AutoDiscover, a novel framework that tackles the cold‑start imbalance problem inherent in systematic literature reviews (SLRs). Traditional active learning (AL) approaches for SLRs rely on static query strategies—such as uncertainty sampling or similarity‑based ranking—and treat documents as independent text entries, ignoring the rich relational structure of scientific literature (citations, co‑authorship, shared keywords). Consequently, when only a handful of labeled examples are available at the beginning of a review, these static methods perform poorly and waste expert effort.

AutoDiscover reframes the screening task as an online decision‑making problem driven by a reinforcement‑learning agent. The core components are:

  1. Heterogeneous Graph Construction – Documents, authors, institutions, and other metadata are modeled as nodes in a heterogeneous graph, with edges representing citations, co‑authorship, keyword overlap, and temporal relations. This graph captures the latent structure of the literature corpus.

  2. Heterogeneous Graph Attention Network (HAN) – HAN learns node embeddings by attending over meta‑paths, thereby integrating multiple relational views into a unified semantic representation. Even with very few initial labels, the graph‑propagation nature of HAN yields informative embeddings that reflect both content and network context.

  3. Discounted Thompson Sampling (DTS) Agent – The agent maintains a portfolio of nine query “arms” (e.g., GNN Exploit, Entropy Uncertainty, Margin Uncertainty, BALD, Embedding Diversity, Bias‑Aware, Centrality‑Aware, LP Graph Exploit, Random). For each arm, a Beta distribution models the probability that selecting that strategy will improve screening performance. DTS samples from these distributions, applying a discount factor γ to give recent observations higher weight, thus adapting to the non‑stationary dynamics of a review where the usefulness of each strategy evolves over time.
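The discounted update behind the DTS agent can be written out as a short worked formula. The summary does not give the exact variant used in AutoDiscover, so this is the commonly used discounted Thompson Sampling form for Bernoulli rewards, stated under assumptions: $K$ is the number of arms, $S_k$ and $F_k$ are discounted success and failure counts, $(\alpha_0, \beta_0)$ is a prior, $a_t$ is the arm chosen at step $t$, and $r_t \in \{0, 1\}$ is a binary reward (for example, 1 if the selected document turns out to be relevant).

```latex
\begin{align*}
\theta_k(t) &\sim \mathrm{Beta}\bigl(\alpha_0 + S_k(t),\ \beta_0 + F_k(t)\bigr), \qquad k = 1, \dots, K \\
a_t &= \arg\max_{k} \theta_k(t) \\
S_k(t+1) &= \gamma\, S_k(t) + r_t \,\mathbb{1}[k = a_t] \\
F_k(t+1) &= \gamma\, F_k(t) + (1 - r_t)\,\mathbb{1}[k = a_t]
\end{align*}
```

With $0 < \gamma < 1$, every arm's counts decay at each step, so the total evidence held across all arms is bounded by $1/(1-\gamma)$ and old observations fade geometrically. Setting $\gamma = 1$ recovers ordinary (stationary) Thompson Sampling.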

The learning loop proceeds as follows: a human reviewer labels a document selected by the current arm; the label is fed back instantly to update the Beta parameters of the corresponding arm; HAN embeddings remain fixed (or are periodically refreshed) to avoid costly retraining. This design keeps computational overhead low while allowing the agent to balance exploration (trying less‑used arms) and exploitation (favoring arms that have recently performed well).
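The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function `run_dts_loop`, the arm signature, the `oracle` callback standing in for the human reviewer, the binary reward (1 if the selected document is relevant), the Beta(1, 1) prior, and the default γ = 0.95 are all hypothetical choices. HAN embeddings are left out entirely, so the two toy arms simply pick document ids.

```python
import random


def run_dts_loop(arms, oracle, pool, gamma=0.95, budget=50, seed=0):
    """Toy discounted Thompson Sampling loop over a portfolio of query arms.

    arms   : dict name -> function(candidates, labels, rng) returning one doc id
    oracle : function(doc_id) -> bool, a stand-in for the human reviewer
    pool   : iterable of unlabeled document ids
    """
    rng = random.Random(seed)
    successes = {name: 0.0 for name in arms}  # discounted success counts
    failures = {name: 0.0 for name in arms}   # discounted failure counts
    unlabeled = set(pool)
    labels = {}
    history = []

    for _ in range(budget):
        if not unlabeled:
            break
        # Sample one plausible success rate per arm from its Beta posterior
        # (assumed Beta(1, 1) prior) and pick the arm with the largest draw.
        draws = {
            name: rng.betavariate(1.0 + successes[name], 1.0 + failures[name])
            for name in arms
        }
        chosen = max(draws, key=draws.get)

        # The chosen arm proposes a document; the reviewer labels it.
        doc = arms[chosen](sorted(unlabeled), labels, rng)
        reward = 1 if oracle(doc) else 0
        labels[doc] = reward
        unlabeled.discard(doc)

        # Discount every arm, then credit only the arm that was played.
        for name in arms:
            successes[name] *= gamma
            failures[name] *= gamma
        successes[chosen] += reward
        failures[chosen] += 1 - reward
        history.append((chosen, doc, reward))

    return labels, history, successes, failures


# Hypothetical toy arms; real arms would score documents with a model.
def random_arm(candidates, labels, rng):
    return rng.choice(candidates)


def lowest_id_arm(candidates, labels, rng):
    return candidates[0]


if __name__ == "__main__":
    toy_arms = {"random": random_arm, "lowest_id": lowest_id_arm}
    # Toy corpus: 200 documents, of which ids 0..9 are "relevant".
    _, hist, _, _ = run_dts_loop(toy_arms, lambda d: d < 10, range(200))
    for name in toy_arms:
        print(name, sum(1 for arm, _, _ in hist if arm == name))
```

Because all counts are multiplied by γ at every step, an arm that stops producing relevant documents loses its advantage after a few dozen queries, which is the behaviour the non-stationary setting calls for.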

Evaluation is conducted on the SYNERGY benchmark, which comprises 26 real‑world SLR datasets spanning diverse domains. Performance metrics include Discovery‑Rate Efficiency (DRE), Recall@k, and Work Saved over Sampling (WSS@p). AutoDiscover outperforms all static baselines: average DRE improves by ~18%, Recall@100 rises by ~5%, and WSS@5% improves by ~12%. The advantage is most pronounced when the initial labeled set contains fewer than five examples; static methods often fail to converge, whereas the DTS agent quickly identifies the most promising arms and accelerates discovery.
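Two of the metrics named above have widely used definitions that are easy to compute from a screening order. The sketch below assumes a list of 0/1 relevance labels in the order the documents were screened; the function names `recall_at_k` and `wss_at_recall` are illustrative, and the exact definitions used in the paper may differ. DRE is not implemented because the summary does not define it. WSS is computed in its usual form, the fraction of the corpus left unscreened when the target recall is reached, minus (1 − target recall).

```python
import math


def recall_at_k(ranked_labels, k):
    """Fraction of all relevant documents found within the first k screened."""
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("no relevant documents in the labels")
    return sum(ranked_labels[:k]) / total_relevant


def wss_at_recall(ranked_labels, target_recall=0.95):
    """Work Saved over Sampling at a target recall level (0 < target <= 1)."""
    n = len(ranked_labels)
    total_relevant = sum(ranked_labels)
    if total_relevant == 0:
        raise ValueError("no relevant documents in the labels")
    needed = math.ceil(target_recall * total_relevant)
    found = 0
    for screened, label in enumerate(ranked_labels, start=1):
        found += label
        if found >= needed:
            # Share of documents never screened, minus the share a random
            # order would be expected to leave unscreened at this recall.
            return (n - screened) / n - (1.0 - target_recall)
    raise ValueError("target recall is never reached")
```

For a screening order such as `[1, 1, 0, 1, 0, 0, 0, 0, 0, 0]`, all three relevant documents are found after four of ten documents, so the work saved at full recall is 6/10.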

A complementary contribution is TS‑Insight, an open‑source visual analytics dashboard that visualizes arm selection frequencies, reward trajectories, graph‑level label distributions, and embedding spaces. Case studies demonstrate how the agent shifts from a Centrality‑Aware arm early in the review to an Embedding‑Diversity arm as more labels become available, and how Bias‑Aware sampling can correct early selection bias.

The authors acknowledge several limitations: (1) building the heterogeneous graph can be resource‑intensive for very large corpora; (2) HAN may need periodic retraining if the corpus evolves dramatically; (3) the predefined arm set may not be optimal for every domain; and (4) real‑time updates still require modest CPU/GPU resources. Future work is outlined, including meta‑learning to generate new arms automatically, streaming graph updates with online HAN, multimodal integration (text, tables, figures), and cloud‑based deployment for seamless integration into existing SLR pipelines.

In summary, AutoDiscover demonstrates that coupling graph‑based representation learning with a discounted Thompson Sampling bandit yields a robust, adaptive active‑learning system capable of overcoming cold‑start and class‑imbalance challenges in systematic literature reviews. The framework not only improves screening efficiency but also provides transparent, explainable decision‑making through the TS‑Insight dashboard, marking a significant step forward for AI‑assisted evidence synthesis.

