The MAXSHAPLEY Algorithm for Fair Contribution Attribution in Generative Search
📝 Abstract
Generative search engines based on large language models (LLMs) are replacing traditional search, fundamentally changing how information providers are compensated. To sustain this ecosystem, we need fair mechanisms to attribute and compensate content providers based on their contributions to generated answers. We introduce MAXSHAPLEY, an efficient algorithm for fair attribution in generative search pipelines that use retrieval-augmented generation (RAG). MAXSHAPLEY is a special case of the celebrated Shapley value; it leverages a decomposable max-sum utility function to compute attributions with computation linear in the number of documents, as opposed to the exponential cost of general Shapley values. We evaluate MAXSHAPLEY on three multi-hop QA datasets (HotpotQA, MuSiQue, MS MARCO); MAXSHAPLEY achieves attribution quality comparable to exact Shapley computation while consuming a fraction of its tokens. For instance, it gives up to an 8x reduction in resource consumption over prior state-of-the-art methods at the same attribution accuracy.
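The linear-versus-exponential gap described above can be illustrated with a toy utility. The following is a hedged sketch, not the paper's actual algorithm: the precise max-sum utility MAXSHAPLEY uses is not specified in this summary, so we assume each document `i` has a relevance score `q_i` and the utility of a document set is `v(S) = max_{i in S} q_i`. Under this assumed utility, the Shapley value admits a closed form: sort the scores once, then split each quality increment equally among the documents whose score reaches it.

```python
from itertools import combinations
from math import factorial

def exact_shapley(qualities):
    """Exact Shapley values for the utility v(S) = max(q_i for i in S),
    computed by enumerating all coalitions (O(2^n) evaluations)."""
    n = len(qualities)
    players = list(range(n))

    def v(S):
        return max((qualities[i] for i in S), default=0.0)

    phi = [0.0] * n
    for i in players:
        others = [p for p in players if p != i]
        for k in range(n):
            for S in combinations(others, k):
                # Shapley weight for a coalition of size k
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (v(S + (i,)) - v(S))
    return phi

def max_shapley_closed_form(qualities):
    """Closed-form Shapley values for the same max utility in O(n log n):
    each increment between successive sorted scores is shared equally
    among the (n - rank) documents whose score reaches that increment."""
    n = len(qualities)
    order = sorted(range(n), key=lambda i: qualities[i])
    phi = [0.0] * n
    prev, running = 0.0, 0.0
    for rank, i in enumerate(order):
        running += (qualities[i] - prev) / (n - rank)
        phi[i] = running
        prev = qualities[i]
    return phi
```

On any score vector, the two functions agree to numerical precision, and the attributions sum to the maximum score (the efficiency axiom); only the closed form scales to large document sets.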
📄 Content
Large language models (LLMs) have fundamentally changed how people interact with information online. As a prominent example, generative search engines (also known as “LLM search”) reduce cognitive load on users by providing answers to queries without requiring users to sift through information sources or synthesize information themselves. As a result, generative search products (e.g. Perplexity AI [69] and Google Gemini [22]) are rapidly replacing traditional search engine products; many generative search products already serve tens of millions of users daily [65].
Generative search pipelines typically invoke a two-step process for answering user queries: (1) they retrieve relevant documents from a large corpus (e.g. the web, or a proprietary knowledge base); (2) given the retrieved documents, they generate a concise response to the query, which is shown directly to the user. This paradigm is an example of retrieval-augmented generation (RAG) [33,38,47].
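The two-step pipeline above can be sketched in a few lines. This is a minimal toy illustration, not the paper's system: the word-overlap scorer and the stub generator are stand-ins (production pipelines use dense or sparse retrievers, and step 2 prompts an LLM with the retrieved documents).

```python
def tokens(text):
    """Crude tokenizer for the toy example: lowercase, drop periods."""
    return set(text.lower().replace(".", "").split())

def retrieve(query, corpus, k=2):
    """Step 1 (toy scorer): rank documents by word overlap with the
    query and keep the top-k."""
    scored = sorted(corpus, key=lambda d: -len(tokens(query) & tokens(d)))
    return scored[:k]

def generate(query, docs):
    """Step 2 (stub): a real system would prompt an LLM with the query
    plus the retrieved documents; here we only show the interface."""
    context = " ".join(docs)
    return f"Answer to '{query}', grounded in: {context}"

corpus = [
    "Paris is the capital of France.",
    "The Eiffel Tower is in Paris.",
    "Berlin is the capital of Germany.",
]
docs = retrieve("capital of France", corpus)
answer = generate("capital of France", docs)
```

Attribution enters exactly at this interface: given `docs` and `answer`, the task is to decide how much each retrieved document contributed to the generated response.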
Despite its promise, generative search completely changes existing incentive structures for content providers. Today, content providers (e.g. news websites, blogs, education websites) rely in part on search engines to direct users to their sites; this traffic is typically monetized via advertisements [92]. Generative search engines instead allow users to obtain answers directly from an AI-generated summary without visiting original sources. Traffic to content providers appears to have dropped significantly since the launch of popular generative search engines [14,74], with Bain & Company estimating that as of early 2025, about 80% of web search users reported using AI summaries without progressing to another destination at least 40% of the time [81,82], even though generative search engines have started to provide basic citations to original sources. According to recent reports [26,80], the fraction of worldwide web traffic produced by traditional search fell about 5% from June 2024 to June 2025, with some sources estimating an even larger drop (up to 25% [82]). Some media organizations are referring to the resulting reduction in traffic as an “extinction-level event” [4].
Content providers are starting to push back; several lawsuits have already been filed against generative search providers over reduced traffic and lost revenue [29,62,68]. A complementary but related set of lawsuits targets AI companies for using copyrighted material during training (e.g. the New York Times lawsuit against OpenAI [84] and the LibGen lawsuit against Anthropic [12]). These lawsuits are resulting in billions of dollars in liabilities and increasing distrust from content creators [70].
Nascent industry efforts to rethink content providers’ relationship with LLM search include generative search engines that compensate content providers [1,32], and features allowing content providers to block AI crawlers or demand payment per crawl [3]. We do not know the full compensation structure for these approaches, and it is unclear if and how these efforts tailor compensation to the relevance of content. Khosrowi et al. argue that, “Credit for… [AI] output should be distributed between… contributors according to the nature and significance of… contributions made” [42]. Crucially, without a fair incentive structure, content providers may choose to withhold content from generative search engines, harming the whole ecosystem.
Problem statement and status quo. We predict that the business model for generative search will need to evolve to compensate content providers for their contributions. Early academic efforts to rethink the LLM ads ecosystem have primarily focused on sponsored search auctions for LLMs [11,13,21,25,28,34], which do not benefit organic content providers. In this paper, our goal is to define a method for attributing generative search results to original sources, so that content providers can be fairly compensated. In particular, we define “fairness” according to common axiomatic properties (Section 2). A key operational requirement is that our algorithm should be practical for existing generative search pipelines by minimizing the number and size of queries to an LLM oracle.
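The “common axiomatic properties” referenced above are, for the Shapley value, the classical fairness axioms of efficiency, symmetry, null player, and additivity. For reference (this is the standard textbook definition, not reproduced from the paper's Section 2), the Shapley value of document $i$ under a utility $v$ over a document set $N$ with $|N| = n$ averages $i$'s marginal contribution over all coalitions:

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,\bigl(n - |S| - 1\bigr)!}{n!}\,
  \Bigl( v\bigl(S \cup \{i\}\bigr) - v(S) \Bigr)
```

The sum ranges over all $2^{n-1}$ coalitions excluding $i$, which is exactly the exponential cost that MAXSHAPLEY's decomposable utility is designed to avoid.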
Prior Work. In the broader ML community, variants of the attribution problem have been used to interpret and explain the behaviors of complex machine learning models (we include a more complete description of related work in Section 6). Notable high-impact works include datamodels [37], TRAK [67] and Data Shapley [30,86,87] for training-time attribution to training samples, and LIME [73] and Kernel SHAP [53] for inference-time attribution between inputs and features. In contrast, our work aims to conduct inference-time attribution of outputs to RAG data sources.
In the RAG domain, the most relevant line of work is context attribution, which aims to identify which piece of retrieved context information leads to the final answer generated by an LLM [16-18, 23, 36, 49, 72, 90].
This content is AI-processed based on ArXiv data.