LORE: A Large Generative Model for Search Relevance

Achievement. We introduce LORE (Large Generative Model for Search Relevance), a complete and sustainable framework of iterative practices for applying large language models (LLMs) to e-commerce search relevance, which achieves a cumulative +27% improvement on the online GoodRate metric. Over the past three years, this project has demonstrated significant improvements in relevance judgment and has undergone three full-scale iterations across key dimensions, including data, features, training paradigms, evaluation, and application. Throughout the iterative development of LORE, we have gained experience and insights that we believe are worth sharing with the community in this report.

Insight. To enhance LLMs for relevance, existing works have modeled the Chain-of-Thought (CoT) from various perspectives. However, we find that these methods often exhibit blind spots because they lack a principled deconstruction of the task itself. Our analysis reveals that complex relevance judgment is not a monolithic reasoning problem but a composite of distinct capabilities, including knowledge and reasoning, multi-modal matching, and rule adherence. Based on this insight, we propose a systematic framework that first deconstructs the problem and then uses this deconstruction to guide a training paradigm that explicitly models each required capability. We argue that such qualitatively driven analysis is essential for breaking through existing performance bottlenecks.

Contributions. LORE is a complete, replicable blueprint for LLM-based relevance modeling that spans the entire lifecycle. First, we conducted systematic preliminary explorations into foundational training elements (features, prompts, and base models) and summarized the general principles derived from this process. Second, guided by our structural analysis, we propose a two-stage training paradigm. In the first stage, we use progressive CoT synthesis and Supervised Fine-Tuning (SFT) to instill comprehensive capabilities. In the second stage, a carefully designed Reinforcement Learning (RL) phase aligns the model with human preferences. We also share key insights from our exploration of these training strategies. Third, to ensure rigorous validation, we construct a comprehensive benchmark, RAIR, tailored to evaluate the core capabilities we identified. Finally, to overcome the challenges of real-time computation, we designed a query-based stratified deployment strategy that transfers the offline LLM’s capabilities to the online system, leading to substantial online performance gains. LORE serves as both a practical solution for developing advanced e-commerce relevance systems and a methodological reference for domain-specific post-training, with insights generalizable across vertical industries.

📜 Original Paper Content