A Semantic QA-Based Approach for Text Summarization Evaluation
📝 Abstract
Many Natural Language Processing and Computational Linguistics applications involve the generation of new texts based on some existing texts, such as summarization, text simplification, and machine translation. However, a serious problem has haunted these applications for decades: how to automatically and accurately assess their quality. In this paper, we present some preliminary results on one especially useful and challenging problem in NLP system evaluation: how to pinpoint the content differences between two text passages (especially large passages such as articles and books). Our idea is intuitive and very different from existing approaches. We treat one text passage as a small knowledge base and ask it a large number of questions to exhaustively identify all the content points in it. By comparing the correctly answered questions from two text passages, we can compare their content precisely. An experiment using the 2007 DUC summarization corpus clearly shows promising results.
📄 Content
A Semantic QA-Based Approach for Text Summarization Evaluation
Ping Chen, Fei Wu, Tong Wang, Wei Ding
University of Massachusetts Boston
ping.chen@umb.edu
Abstract
Many Natural Language Processing and Computational Linguistics applications involve the generation of new texts based on some existing texts, such as summarization, text simplification, and machine translation. However, a serious problem has haunted these applications for decades: how to automatically and accurately assess their quality. In this paper, we present some preliminary results on one especially useful and challenging problem in NLP system evaluation: how to pinpoint the content differences between two text passages (especially large passages such as articles and books). Our idea is intuitive and very different from existing approaches. We treat one text passage as a small knowledge base and ask it a large number of questions to exhaustively identify all the content points in it. By comparing the correctly answered questions from two text passages, we can compare their content precisely. An experiment using the 2007 DUC summarization corpus clearly shows promising results.
Introduction
Technologies spawned from Natural Language Processing (NLP) and Computational Linguistics (CL) have fundamentally changed how we process, share, and access information, e.g., search engines and question answering systems. However, a serious problem has haunted many NLP applications: how to automatically and accurately assess their quality. In some cases, evaluation of an NLP task has itself become an active research area, as with text summarization evaluation. The main difficulty in developing such evaluation comes from the diversity of the NLP domain and our insufficient understanding of natural languages and human intelligence in general. In this paper, we focus on one especially useful and challenging area in NLP evaluation: how to semantically compare the content of two text passages (e.g., paragraphs, articles, or even large corpora). Pinpointing content differences among texts is critical to the evaluation of many important NLP applications, such as summarization, text categorization, text simplification, and machine translation. Not surprisingly, many evaluation methods have been proposed, but the quality of the existing methods themselves is hard to assess. In many cases, human evaluation must be adopted, which is often slow, subjective, and expensive. In this paper we present an intuitive and innovative idea completely different from existing methods:
If we treat one text passage as a small knowledge base, can we ask it a large number of questions to exhaustively identify all the content points in it?

By comparing the correctly answered questions from two text passages, we can compare their content precisely. This idea may seem like "circling around the target" instead of "directly hitting the target". However, our Question Answering (QA)-based content evaluation is intuitive and supported by the following insights:
- Analogy to human assessment. When we assess someone's understanding of a subject, we do not ask them to write down everything they know about it. Instead, we ask a list of questions, and an accurate and objective assessment can be achieved by counting the number of correct answers. This question answering process also identifies the areas in which they need to improve.
- Practical operability. When assessing the similarity of two texts, direct comparison may look natural. However, with current methods (whether supervised or rule-based) this direct approach becomes increasingly difficult as we move to larger text passages. For example, comparing two articles requires answering the following questions: how to align sentences, how to represent a sentence semantically, how to generate similarity scores without annotated samples (or with as few as possible to minimize cost), how to interpret and evaluate these scores, how to find the content differences of two texts, and so on.
- Easy to interpret. Many existing methods generate only a single score, which reveals little about how an assessment measure is derived and offers no help for system improvement. In contrast, our QA-based approach requires minimal manual effort, clearly shows how a measure is calculated, and pinpoints exactly the content differences between two text passages.

In the next section we discuss existing work. Section 3 presents the architecture of our QA-based evaluation approach, and experimental results are reported in Section 4. We share some insights and findings from designing our evaluation system and conducting experiments in the discussion in Section 5. We conclude in Section 6.

Related Work

Human evaluation
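The comparison step described above can be sketched in a few lines. This is not the authors' implementation: it assumes the question set has already been generated and that some QA system has already determined which questions each passage answers correctly; `qa_overlap` and the sample question IDs are hypothetical names used only for illustration.

```python
def qa_overlap(correct_ref, correct_cand):
    """Compare two passages by the sets of questions each answers correctly.

    correct_ref:  questions correctly answered from the reference passage
    correct_cand: questions correctly answered from the candidate passage
    Returns (precision, recall, f1) of the candidate's content coverage.
    """
    shared = correct_ref & correct_cand
    precision = len(shared) / len(correct_cand) if correct_cand else 0.0
    recall = len(shared) / len(correct_ref) if correct_ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # The set differences are what makes the score interpretable:
    # correct_ref - correct_cand  -> content the candidate is missing
    # correct_cand - correct_ref  -> extra content only in the candidate
    return precision, recall, f1

# Example: reference answers q1..q4 correctly, candidate answers q2, q3, q5.
p, r, f = qa_overlap({"q1", "q2", "q3", "q4"}, {"q2", "q3", "q5"})
# precision = 2/3, recall = 1/2
```

Unlike a single opaque score, the two set differences directly pinpoint which content points differ, which is the interpretability advantage claimed above.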