The main evaluation results of K-EXAONE across eight
categories: world knowledge (MMLU-Pro), math (AIME 2025), coding (LiveCodeBench v6), agentic tool use
(τ2-Bench),
instruction following (IFBench), Korean (KoBALT), multilinguality (MMMLU), and safety (KGC-Safety). All models in the
comparison are reasoning models. τ2-Bench scores
are weighted averages.