REFLEX: Reference-Free Evaluation of Log Summarization via Large Language Model Judgment
📝 Original Info
- Title: REFLEX: Reference-Free Evaluation of Log Summarization via Large Language Model Judgment
- ArXiv ID: 2511.07458
- Date: 2025-11-06
- Authors: Not listed in the provided metadata (see the original paper or its ArXiv record).
📝 Abstract
Evaluating log summarization systems is challenging due to the lack of high-quality reference summaries and the limitations of existing metrics like ROUGE and BLEU, which depend on surface-level lexical overlap. We introduce REFLEX, a reference-free evaluation metric for log summarization based on large language model (LLM) judgment. REFLEX uses LLMs as zero-shot evaluators to assess summary quality along dimensions such as relevance, informativeness, and coherence, without requiring gold-standard references or human annotations. We show that REFLEX produces stable, interpretable, and fine-grained evaluations across multiple log summarization datasets, and more effectively distinguishes model outputs than traditional metrics. REFLEX provides a scalable alternative for evaluating log summaries in real-world settings where reference data is scarce or unavailable.
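The abstract describes REFLEX's core mechanism: an LLM is prompted zero-shot to score a candidate summary along dimensions such as relevance, informativeness, and coherence, with no reference summary required. The sketch below illustrates that general pattern; the prompt wording, the 1–5 scale, and the `judge_summary` / `fake_llm` names are assumptions made for illustration, not details taken from the paper.

```python
import json
from typing import Callable, Dict

# Dimensions named in the REFLEX abstract; the rubric wording below
# is illustrative, not quoted from the paper.
DIMENSIONS = ("relevance", "informativeness", "coherence")

PROMPT_TEMPLATE = """You are evaluating a summary of a software log.

Log:
{log}

Candidate summary:
{summary}

For each dimension below, give an integer score from 1 (poor) to 5 (excellent).
Dimensions: {dimensions}

Respond with a JSON object mapping each dimension to its score, e.g.
{{"relevance": 4, "informativeness": 3, "coherence": 5}}.
"""


def judge_summary(
    log: str,
    summary: str,
    llm: Callable[[str], str],
) -> Dict[str, int]:
    """Score one summary reference-free with an LLM judge.

    `llm` is any function mapping a prompt string to the model's text
    response (e.g. a thin wrapper around a chat-completion API).
    """
    prompt = PROMPT_TEMPLATE.format(
        log=log, summary=summary, dimensions=", ".join(DIMENSIONS)
    )
    raw = llm(prompt)
    scores = json.loads(raw)  # assumes the model complied with the JSON format
    return {dim: int(scores[dim]) for dim in DIMENSIONS}


if __name__ == "__main__":
    # Stub LLM so the sketch runs without network access.
    def fake_llm(prompt: str) -> str:
        return '{"relevance": 4, "informativeness": 3, "coherence": 5}'

    print(
        judge_summary(
            "ERROR disk full on /dev/sda1 ...",
            "Disk /dev/sda1 ran out of space.",
            fake_llm,
        )
    )
```

Keeping the model call behind a plain callable keeps the sketch independent of any particular provider SDK; a production judge would also need to handle malformed JSON and score-variance across repeated queries.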
Reference
This content is AI-processed based on open access ArXiv data.