REFLEX: Reference-Free Evaluation of Log Summarization via Large Language Model Judgment


📝 Original Info

  • Title: REFLEX: Reference-Free Evaluation of Log Summarization via Large Language Model Judgment
  • ArXiv ID: 2511.07458
  • Date: 2025-11-06
  • Authors: Not listed in the provided metadata (please check the original paper or its arXiv page).

📝 Abstract

Evaluating log summarization systems is challenging due to the lack of high-quality reference summaries and the limitations of existing metrics like ROUGE and BLEU, which depend on surface-level lexical overlap. We introduce REFLEX, a reference-free evaluation metric for log summarization based on large language model (LLM) judgment. REFLEX uses LLMs as zero-shot evaluators to assess summary quality along dimensions such as relevance, informativeness, and coherence, without requiring gold-standard references or human annotations. We show that REFLEX produces stable, interpretable, and fine-grained evaluations across multiple log summarization datasets, and more effectively distinguishes model outputs than traditional metrics. REFLEX provides a scalable alternative for evaluating log summaries in real-world settings where reference data is scarce or unavailable.
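The reference-free judging idea in the abstract can be sketched as a two-step loop: build a zero-shot prompt that asks an LLM to score a summary against the raw log along the named dimensions, then parse per-dimension scores from the reply. The prompt wording, 1–5 scale, and the mocked judge reply below are illustrative assumptions, not the paper's actual protocol.

```python
# Hedged sketch of reference-free LLM-judge scoring in the spirit of REFLEX.
# The prompt template, score scale, and mocked reply are assumptions; a real
# system would send the prompt to an LLM API and parse its actual response.
import re

DIMENSIONS = ("relevance", "informativeness", "coherence")

def build_judge_prompt(log_text: str, summary: str) -> str:
    """Compose a zero-shot judging prompt; no reference summary is needed."""
    criteria = ", ".join(DIMENSIONS)
    return (
        "You are evaluating a summary of a software log.\n"
        f"Rate the summary from 1 to 5 on each of: {criteria}.\n"
        "Answer with one line per dimension, e.g. 'relevance: 4'.\n\n"
        f"LOG:\n{log_text}\n\nSUMMARY:\n{summary}\n"
    )

def parse_scores(judge_reply: str) -> dict:
    """Extract per-dimension integer scores from the judge's free-text reply."""
    scores = {}
    for dim in DIMENSIONS:
        m = re.search(rf"{dim}\s*:\s*([1-5])", judge_reply, re.IGNORECASE)
        if m:
            scores[dim] = int(m.group(1))
    return scores

# Example with a mocked judge reply (stands in for a real LLM call):
reply = "relevance: 5\ninformativeness: 4\ncoherence: 5"
print(parse_scores(reply))  # {'relevance': 5, 'informativeness': 4, 'coherence': 5}
```

Because scoring needs only the log and the candidate summary, the same loop runs unchanged on datasets where no gold references exist, which is the setting the abstract targets.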


Reference

This content is AI-processed based on open access ArXiv data.
