Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements


📝 Original Info

  • Title: Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements
  • ArXiv ID: 2512.24867
  • Date: 2025-12-31
  • Authors: Yiming Liang, Yizhi Li, Yantao Du, Ge Zhang, Jiayi Zhou, Yuchen Wu, Yinzhu Piao, Denghui Cao, Tong Sun, Ziniu Li, Li Du, Bo Lei, Jiaheng Liu, Chenghua Lin, Zhaoxiang Zhang, Wenhao Huang, Jiajun Zhang

📝 Abstract

Robust benchmarks are essential for accurately reflecting the generalization capabilities of large language models (LLMs). Existing benchmarks curated at the question level suffer from three limitations: vulnerability to data contamination, restriction to single-concept assessment, and reliance on costly domain expert annotation. We propose Encyclo-K, a statement-based benchmark that extracts standalone knowledge statements from authoritative textbooks and dynamically composes them into evaluation questions through random sampling at test time. This design directly addresses all three limitations: the combinatorial space resists memorization while maintaining stable model rankings across question sets; each question aggregates 8-10 statements for comprehensive knowledge assessment; and annotators only verify formatting compliance without requiring domain expertise. Experiments on over 50 LLMs demonstrate that Encyclo-K poses substantial challenges: even OpenAI-GPT-5.1 achieves only 62.07% accuracy, with model performance displaying clear gradient distributions across b...
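
To make the test-time composition step concrete, here is a minimal sketch of assembling a question by random sampling, as the abstract describes. The statement pool, the `compose_question` helper, and the prompt wording are all illustrative assumptions; the abstract specifies only that each question aggregates 8-10 randomly sampled statements.

```python
import random

def compose_question(statements: list[str], rng: random.Random) -> str:
    """Assemble one evaluation question at test time by randomly
    sampling 8-10 standalone knowledge statements (per the abstract).
    The prompt template below is illustrative, not the paper's."""
    k = rng.randint(8, 10)  # abstract: each question aggregates 8-10 statements
    sampled = rng.sample(statements, k)
    body = "\n".join(f"({i + 1}) {s}" for i, s in enumerate(sampled))
    return "Which of the numbered statements are accurate?\n" + body

# Hypothetical statement pool standing in for textbook extractions.
pool = [f"Statement {i}: a fact extracted from an authoritative textbook."
        for i in range(1000)]
rng = random.Random(0)  # fixed seed yields a reproducible question set
print(compose_question(pool, rng))
```

Because each question is a fresh draw from a large combinatorial space, two evaluation runs rarely share a question verbatim, which is the memorization-resistance property the abstract claims.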

📄 Full Content

...(Full text omitted here for length; see the original site for the complete article.)
