Predicting Startup Success Using Large Language Models: A Novel In-Context Learning Approach
Venture capital (VC) investments in early-stage startups that end up being successful can yield high returns. However, predicting early-stage startup success remains challenging due to data scarcity (e.g., many VC firms have information about only a few dozen early-stage startups and whether they were successful). This limits the effectiveness of traditional machine learning methods that rely on large labeled datasets for model training. To address this challenge, we propose an in-context learning framework for startup success prediction using large language models (LLMs) that requires no model training and leverages only a small set of labeled startups as demonstration examples. Specifically, we propose a novel k-nearest-neighbor-based in-context learning framework, called kNN-ICL, which selects the most relevant past startups as examples based on similarity. Using real-world profiles from Crunchbase, we find that the kNN-ICL approach achieves higher prediction accuracy than supervised machine learning baselines and vanilla in-context learning. Further, we study how performance varies with the number of in-context examples and find that a high balanced accuracy can be achieved with as few as 50 examples. Taken together, our results demonstrate that in-context learning can serve as a decision-making tool for VC firms operating in data-scarce environments.
💡 Research Summary
The paper tackles the notoriously difficult problem of forecasting early‑stage startup success in environments where labeled data are extremely scarce—a common situation for venture capital (VC) firms that often have only a few dozen historically labeled deals. Traditional supervised machine‑learning approaches require large training sets and tend to overfit when applied to such limited data, while existing large‑language‑model (LLM) applications either ignore historical outcomes (zero‑shot prompting) or rely on randomly selected few‑shot examples, which leads to unstable performance.
To overcome these limitations, the authors propose a novel few‑shot framework called k‑nearest‑neighbor In‑Context Learning (kNN‑ICL). The method first encodes each startup using a hybrid representation that concatenates normalized structured attributes (founding year, number of founders, funding rounds, etc.) with a text embedding derived from a pretrained LLM (e.g., GPT‑3.5) applied to the company’s short description. For a target startup, the system retrieves the k most similar historical startups from a modest database (4,034 Crunchbase profiles) using cosine similarity or Euclidean distance on the combined vectors. These k nearest neighbors, together with their known success/failure outcomes, are inserted into a prompt as in‑context examples. The LLM then receives the prompt, reads the target startup’s description, and generates a binary prediction (“success” or “failure”) by analogy to the provided examples.
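The retrieve-then-prompt loop described above can be sketched in a few lines of plain Python. This is an illustrative reconstruction, not the authors' code: the record layout (`vec`, `desc`, `outcome`) and the prompt wording are assumptions, and the combined vectors would in practice come from the hybrid structured-plus-embedding representation.

```python
# Illustrative sketch of the kNN-ICL retrieval and prompt-building steps.
# Record fields ("vec", "desc", "outcome") and prompt wording are assumptions.
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def retrieve_neighbors(target_vec, database, k):
    """Return the k historical startups most similar to the target."""
    ranked = sorted(
        database,
        key=lambda rec: cosine_similarity(target_vec, rec["vec"]),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(target_desc, neighbors):
    """Insert retrieved startups and their known outcomes as in-context examples."""
    lines = ["Predict whether the target startup will succeed or fail.", ""]
    for n in neighbors:
        lines.append(f"Startup: {n['desc']}\nOutcome: {n['outcome']}\n")
    lines.append(f"Startup: {target_desc}\nOutcome:")
    return "\n".join(lines)
```

The resulting prompt string would then be sent to the LLM, which completes the final `Outcome:` field by analogy to the retrieved examples. Euclidean distance could be substituted for cosine similarity by reversing the sort order.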
The experimental design mirrors a realistic VC decision‑making workflow: only 10, 30, or 50 “shots” (in‑context examples) are allowed, reflecting the small number of labeled cases a firm might possess. The authors compare kNN‑ICL against three baselines: (1) standard supervised classifiers (logistic regression, random forest, XGBoost) trained on the same 4,034‑record dataset, (2) vanilla in‑context learning where the same number of examples are randomly sampled rather than retrieved by similarity, and (3) a zero‑shot prompt that does not use any historical outcomes. Evaluation metrics include balanced accuracy, precision, recall, and F1 score.
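Balanced accuracy, the headline metric in this evaluation, is the mean of per-class recall and is therefore robust to the class imbalance typical of startup outcomes (far more failures than successes). A minimal stdlib implementation:

```python
# Balanced accuracy: the unweighted mean of recall over all classes.
# Equivalent to scikit-learn's balanced_accuracy_score for binary labels.
def balanced_accuracy(y_true, y_pred):
    classes = set(y_true)
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)
```

Note that a classifier predicting "success" for every startup scores 0.5 here regardless of the base rate, which is why balanced accuracy is preferable to plain accuracy for this task.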
Results show a clear advantage for kNN‑ICL. With 50 shots, kNN‑ICL attains a balanced accuracy of 71.3 %, surpassing the best supervised baseline (63.1 %) and vanilla in‑context learning (69.6 %). Performance improves as the number of shots increases, confirming the few‑shot learning effect, but even with only 10 shots kNN‑ICL remains competitive. Sensitivity analysis reveals that setting k between 5 and 10 yields stable results: too few retrieved examples provide insufficient signal, while too many dilute the prompt with less relevant, noisy neighbors. Sector‑specific experiments (e.g., fintech, health‑tech, SaaS) demonstrate that the method consistently outperforms baselines across domains, with especially strong gains in sectors where textual descriptions carry rich signals.
The paper’s contributions are threefold: (1) It empirically validates that LLM‑based in‑context learning can serve as a practical decision‑support tool in data‑scarce VC settings, eliminating the need for costly model retraining. (2) It introduces a data‑driven example‑selection mechanism—k‑nearest‑neighbor retrieval—that substantially boosts few‑shot performance and reduces the reliance on manually crafted prompt templates. (3) It provides a unified pipeline that fuses structured and unstructured information, unlocking predictive power that traditional supervised models, which often ignore textual data, cannot capture.
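The third contribution, fusing structured and unstructured features into one vector, can be sketched as follows. This is a hedged illustration under stated assumptions: the min-max normalization and simple concatenation are common choices, but the paper's exact normalization and embedding model are not specified here, and the `fake_embedding` stands in for a real LLM text embedding.

```python
# Hedged sketch of the hybrid startup representation: min-max-normalized
# structured attributes concatenated with a text embedding. The embedding
# here is a placeholder; in practice it would come from a pretrained LLM.
def min_max_normalize(values):
    """Scale a list of numeric values into [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def hybrid_vector(structured_features, text_embedding):
    """Concatenate normalized structured features with a text embedding."""
    return list(structured_features) + list(text_embedding)
```

A usage example: normalizing founding years `[2000, 2010, 2020]` across the database yields `[0.0, 0.5, 1.0]`, and each startup's normalized attributes are then prepended to its description embedding before similarity search.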
The authors discuss broader implications, suggesting that dynamic curation of in‑context examples should become a standard practice for organizations deploying LLMs in operational contexts. They also outline future research avenues: integrating larger or more capable LLMs (e.g., GPT‑4), incorporating multimodal inputs such as logos or pitch deck images, combining kNN‑ICL with active learning to further reduce labeling effort, and deploying the framework in real‑time VC pipelines to assess its impact on investment outcomes.
Overall, the study demonstrates that a simple yet principled retrieval‑augmented prompting strategy can transform large language models into effective, low‑resource predictors for high‑stakes business decisions, opening the door to many other applications where labeled data are scarce but expert analogical reasoning is valuable.