Learning to Rank Query Recommendations by Semantic Similarities

Logs of interactions with a search engine show that users often reformulate their queries. Examining these reformulations shows that recommendations that sharpen the focus of a query are helpful, such as those based on expansions of the original query. It also shows that queries expressing a topical shift with respect to the original query can help users reach the information they need more rapidly. We propose a method that identifies, from the query logs of past users, queries that either focus or shift the initial query topic. This method combines click-based, topic-based, and session-based ranking strategies, and uses supervised learning to maximize the semantic similarity between the query and the recommendations while at the same time diversifying them. We evaluate our method on the query/click logs of a Japanese web search engine and show that the combination of the three proposed strategies is significantly better than any of them taken individually.


💡 Research Summary

The paper addresses the problem of generating useful query recommendations from search engine logs, considering not only query expansions that narrow the original intent (focusing) but also queries that shift the topic and help users explore new information (shifting). The authors first analyze large-scale Japanese web search logs to identify patterns of query reformulation within user sessions. They observe that both focusing and shifting behaviors are common and that each can improve user satisfaction in different ways.

To capture these behaviors, three independent ranking strategies are proposed:

  1. Click‑based strategy – derives candidate recommendations from queries that led to clicks on the same results as the original query. Features such as click‑through rate, click position, and dwell time are aggregated to score the relevance of candidate queries.

  2. Topic‑based strategy – applies Latent Dirichlet Allocation (LDA) to infer topic distributions for queries and the documents they retrieve. Cosine similarity between topic vectors provides a semantic similarity measure, allowing the system to surface queries that are meaningfully close to the original.

  3. Session‑based strategy – models the temporal dynamics of a user’s session. Features include time gaps between queries, session length, and the specific edit operations (addition, deletion, substitution) that transform one query into the next. A Markov‑style transition model estimates the probability of moving from the original query to a candidate.
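The topic-based similarity in strategy 2 can be illustrated with a small sketch. The toy topic vectors and function names below are illustrative, not taken from the paper; they only show how cosine similarity over LDA topic distributions separates a focusing candidate (close in topic space) from a shifting one (distant in topic space):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two topic-distribution vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Toy LDA topic distributions (3 topics) for an original query and two candidates.
original = [0.7, 0.2, 0.1]
candidate_focus = [0.6, 0.3, 0.1]  # close in topic space -> focusing candidate
candidate_shift = [0.1, 0.2, 0.7]  # far in topic space   -> shifting candidate

sim_focus = cosine_similarity(original, candidate_focus)
sim_shift = cosine_similarity(original, candidate_shift)
```

A real implementation would infer the topic distributions with a trained LDA model over queries and their retrieved documents; the arithmetic above is unchanged.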

Each strategy produces a normalized score for a candidate query. The scores are concatenated into a feature vector that serves as input to a supervised learning model. The authors employ Gradient Boosted Decision Trees (GBDT) and design a composite loss function that simultaneously maximizes semantic similarity (via a cosine‑based term) and encourages diversity (through a regularization term that penalizes overly similar recommendations). The training labels are derived from human judgments that classify query pairs as either focusing, shifting, or irrelevant.
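The relevance/diversity trade-off in the composite loss can be approximated with a simple greedy re-ranker. The sketch below is a stand-in, not the paper's GBDT objective: it assumes candidate scores have already been produced by the combined model and uses an MMR-style penalty (a hypothetical `lam` trade-off parameter) in place of the learned diversity regularizer:

```python
def rerank_with_diversity(candidates, scores, similarity, lam=0.7, k=5):
    """Greedily pick k recommendations, trading the model score against
    redundancy with already-selected items (MMR-style diversity penalty)."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def value(c):
            # Redundancy = similarity to the closest already-selected item.
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * scores[c] - (1 - lam) * redundancy
        best = max(remaining, key=value)
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy example: q1 and q2 are near-duplicates, q3 is a topical shift.
scores = {"q1": 0.9, "q2": 0.85, "q3": 0.4}
def sim(a, b):
    return 0.95 if {a, b} == {"q1", "q2"} else 0.1

picked = rerank_with_diversity(["q1", "q2", "q3"], scores, sim, lam=0.5, k=2)
```

With an even trade-off, the redundant `q2` loses out to the lower-scored but novel `q3`, which mirrors how the diversity term keeps shifting recommendations in the list.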

Experiments are conducted on millions of sessions and billions of clicks from a Japanese search engine. Evaluation metrics include NDCG@10 and MAP for ranking quality, and ERR‑IA to measure the diversity of topics covered by the recommendation list. Results show that each individual strategy yields moderate performance (click‑based NDCG≈0.62, topic‑based≈0.58, session‑based≈0.60). The combined model outperforms all single strategies, achieving NDCG≈0.71 and MAP≈0.68. Notably, the diversity metric for shifting recommendations improves by roughly 15 % compared with the best single strategy, indicating that the integrated approach better balances relevance and topical variety.
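For reference, NDCG@k (the headline ranking metric above) can be computed in a few lines. This is the standard definition, not code from the paper:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: graded relevance discounted by log2 rank."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(relevances, k=10):
    """NDCG@k: DCG of the ranking normalized by the ideal (sorted) DCG."""
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list scores 1.0; pushing relevant items down the ranking lowers the score, which is why the lift from ≈0.62 to ≈0.71 is substantial.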

Error analysis reveals systematic biases: the click‑based method tends to favor popular queries, the topic‑based method excels with long‑tail, semantically rich queries, and the session‑based method captures intent changes but suffers when sessions are very short. To mitigate these issues, the authors experiment with a meta‑learning layer that dynamically re‑weights the three strategy scores based on session characteristics (e.g., length, time gaps) and query features (e.g., length, topic entropy). This adaptive weighting yields an additional 3 % lift in NDCG, especially for shifting cases.
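The adaptive re-weighting idea can be sketched as a small gating function: session and query features are mapped to a softmax over the three strategy weights. Everything below is a hand-written illustration of the idea, not the paper's learned meta-model; the logits are assumptions chosen to reflect the biases described above:

```python
import math

def adaptive_weights(session_length, mean_time_gap_s, query_topic_entropy):
    """Toy gating: produce softmax weights over (click-, topic-, session-based)
    strategy scores from session/query features. Illustrative logits only."""
    logits = [
        1.0,                        # click-based: a steady baseline signal
        query_topic_entropy,        # topic-based: favored for topically rich queries
        math.log1p(session_length) - mean_time_gap_s / 60.0,
        # session-based: favored for longer sessions with tight time gaps
    ]
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

A learned version would fit this gating jointly with the ranker, but even the toy form shows how short sessions would down-weight the session-based score, matching the error analysis.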

The paper concludes that a multi‑strategy, supervised ranking framework can simultaneously improve the semantic relevance and diversity of query recommendations. The authors suggest several avenues for future work: extending the approach to real‑time online learning, applying it to multilingual environments, and integrating user‑level personalization models to capture individual intent trajectories more precisely. The proposed system demonstrates practical viability for large‑scale search engines seeking to enhance both focused refinements and exploratory query suggestions.