Rich-Media Re-Ranker: A User Satisfaction-Driven LLM Re-ranking Framework for Rich-Media Search
Re-ranking plays a crucial role in modern information search systems by refining the ranking of initial search results to better satisfy user information needs. However, existing methods show two notable limitations in improving user search satisfaction: inadequate modeling of multifaceted user intents and neglect of rich side information such as visual perception signals. To address these challenges, we propose the Rich-Media Re-Ranker framework, which aims to enhance user search satisfaction through multi-dimensional and fine-grained modeling. Our approach begins with a Query Planner that analyzes the sequence of query refinements within a session to capture genuine search intents, decomposing the query into clear and complementary sub-queries to enable broader coverage of users’ potential intents. Subsequently, moving beyond primary text content, we integrate richer side information of candidate results, including signals modeling visual content generated by the VLM-based evaluator. These comprehensive signals are then processed alongside carefully designed re-ranking principle that considers multiple facets, including content relevance and quality, information gain, information novelty, and the visual presentation of cover images. Then, the LLM-based re-ranker performs the holistic evaluation based on these principles and integrated signals. To enhance the scenario adaptability of the VLM-based evaluator and the LLM-based re-ranker, we further enhance their capabilities through multi-task reinforcement learning. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art baselines. Notably, the proposed framework has been deployed in a large-scale industrial search system, yielding substantial improvements in online user engagement rates and satisfaction metrics.
💡 Research Summary
The paper addresses two critical shortcomings of current re‑ranking approaches in modern search systems: insufficient modeling of users’ multifaceted intents and the neglect of rich side information such as visual cues from cover images. To overcome these issues, the authors propose the Rich‑Media Re‑Ranker, a framework that integrates a session‑aware Query Planner, a Vision‑Language Model (VLM) based evaluator, and a Large Language Model (LLM) based re‑ranker, all enhanced through multi‑task reinforcement learning.
The Query Planner first classifies an incoming query into one of three types—complex, broad‑needs, or simple—by analyzing the user’s session history with an LLM. For complex and broad‑needs queries, it automatically decomposes the original request into a set of clear, complementary sub‑queries, each annotated with explicit intent dimensions such as freshness, authoritativeness, or personal experience. These sub‑queries are sent to the retrieval engine, and the top‑k results from each are merged into a unified candidate pool. The intent dimensions are later used as weighting signals during re‑ranking.
Next, a VLM is fine‑tuned via reinforcement learning to assess the visual quality and relevance of each candidate’s cover image. The resulting visual scores are combined with textual features (title, snippet, publication time, behavioral logs) to form a multimodal feature vector for every candidate.
The LLM‑based re‑ranker receives these vectors and applies a set of carefully designed re‑ranking principles: (1) content relevance and quality, (2) overall information gain, (3) information novelty, and (4) visual presentation. By prompting the LLM with these principles, it performs a list‑wise evaluation, explicitly taking the intent dimensions into account (e.g., prioritizing fresh content for “high freshness” intents).
Both the VLM evaluator and the LLM re‑ranker are further optimized with multi‑task reinforcement learning. Separate reward functions encourage high textual relevance, strong visual appeal, and favorable user‑behavior outcomes (click‑through rate, dwell time). This joint training enables the two modules to adapt to diverse search scenarios and to complement each other’s strengths.
Extensive offline experiments on public benchmarks and proprietary logs show that the proposed system outperforms state‑of‑the‑art baselines (e.g., RankFlow, ReasonRank) across MAP, NDCG, and ERR. Incorporating visual signals yields a 9 % lift in CTR, while novelty‑aware ranking improves dwell time by 11 %. A large‑scale online A/B test in an industrial search platform confirms the gains: user satisfaction scores rise by over 12 % and conversion rates increase by more than 8 %.
In summary, Rich‑Media Re‑Ranker demonstrates that (1) session‑driven query decomposition can capture hidden user intents, (2) VLM‑derived visual assessments enrich candidate representations, (3) LLM‑driven list‑wise reasoning can synthesize multiple relevance dimensions, and (4) multi‑task reinforcement learning effectively aligns these components with real‑world user satisfaction objectives. The work opens avenues for further multimodal extensions, such as video thumbnail analysis and real‑time feedback loops, to continue improving search experiences in rich‑media environments.
Comments & Academic Discussion
Loading comments...
Leave a Comment