Simrank++: Query rewriting through link analysis of the click graph

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We focus on the problem of query rewriting for sponsored search. We base rewrites on a historical click graph that records the ads that have been clicked on in response to past user queries. Given a query q, we first consider Simrank as a way to identify queries similar to q, i.e., queries whose ads a user may be interested in. We argue that Simrank fails to properly identify query similarities in our application, and we present two enhanced versions of Simrank: one that exploits weights on click graph edges and another that exploits "evidence." We experimentally evaluate our new schemes against Simrank, using actual click graphs and queries from Yahoo!, and using a variety of metrics. Our results show that the enhanced methods can yield more and better query rewrites.


💡 Research Summary

The paper tackles query rewriting for sponsored search by exploiting a historical click graph that records which ads were clicked for which user queries. In this setting, a query is considered similar to another if users tend to click on the same ads when issuing either query. The authors first examine SimRank, a well‑known link‑analysis algorithm that computes similarity based on the structural context of objects in a bipartite graph. While SimRank can capture indirect relationships, it treats every edge uniformly and ignores the rich weight information (impressions, clicks, position‑adjusted click‑through rates) that is naturally available in a click graph. Moreover, SimRank does not explicitly model “evidence” – the amount of indirect support that two queries share through intermediate queries and ads.
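
To make the baseline concrete, here is a minimal sketch of plain (unweighted) SimRank on a bipartite query-ad click graph. It is an illustration under stated assumptions, not the authors' implementation: the function name `simrank_bipartite`, the decay constant `c=0.8`, and the fixed iteration count are all hypothetical choices made for the example.

```python
from collections import defaultdict


def simrank_bipartite(clicks, c=0.8, iterations=5):
    """Plain SimRank on a bipartite click graph.

    clicks: iterable of (query, ad) pairs; edge weights are ignored here.
    Returns a dict mapping (query, query) pairs to similarity scores.
    c and iterations are illustrative defaults, not values from the paper.
    """
    ads_of = defaultdict(set)      # query -> ads clicked for it
    queries_of = defaultdict(set)  # ad -> queries that led to a click
    for query, ad in clicks:
        ads_of[query].add(ad)
        queries_of[ad].add(query)

    queries, ads = sorted(ads_of), sorted(queries_of)
    # Start from the identity: every node is similar only to itself.
    sim_q = {(x, y): float(x == y) for x in queries for y in queries}
    sim_a = {(x, y): float(x == y) for x in ads for y in ads}

    for _ in range(iterations):
        new_q, new_a = {}, {}
        # Two queries are similar if the ads they point to are similar.
        for x in queries:
            for y in queries:
                if x == y:
                    new_q[(x, y)] = 1.0
                    continue
                total = sum(sim_a[(i, j)] for i in ads_of[x] for j in ads_of[y])
                new_q[(x, y)] = c * total / (len(ads_of[x]) * len(ads_of[y]))
        # Two ads are similar if the queries pointing to them are similar.
        for x in ads:
            for y in ads:
                if x == y:
                    new_a[(x, y)] = 1.0
                    continue
                total = sum(sim_q[(i, j)] for i in queries_of[x] for j in queries_of[y])
                new_a[(x, y)] = c * total / (len(queries_of[x]) * len(queries_of[y]))
        sim_q, sim_a = new_q, new_a
    return sim_q
```

Every edge counts the same in this version, regardless of how many clicks it represents, which is the uniformity the summary describes as a limitation.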

To address these shortcomings, the authors propose two extensions, collectively called SimRank++. The first, Weighted SimRank, incorporates edge weights into the random‑walk transition probabilities. Instead of assuming a uniform 1/degree probability of moving from a query to each neighboring ad, the transition probability is proportional to the expected click‑through rate of that ad for the query. This makes the similarity computation reflect the true likelihood that a user will click on a particular ad, thereby giving more influence to heavily clicked edges.
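
The weighting step described above can be sketched as a per-query normalization of edge weights. This is only an assumed reading of the description: the helper name `transition_probabilities` and the input layout (a dict keyed by `(query, ad)`) are hypothetical, and the sketch covers the normalization alone, not the full weighted similarity computation.

```python
def transition_probabilities(weights):
    """Turn raw click-graph edge weights into per-query transition probabilities.

    weights: dict mapping (query, ad) -> non-negative weight, for example an
    expected click-through rate. The layout is an assumption for this sketch.
    Returns a dict with the same keys whose values sum to 1 for each query.
    """
    totals = {}
    for (query, _ad), weight in weights.items():
        totals[query] = totals.get(query, 0.0) + weight
    # Queries whose weights sum to zero are dropped to avoid dividing by zero.
    return {
        (query, ad): weight / totals[query]
        for (query, ad), weight in weights.items()
        if totals[query] > 0
    }
```

With weights of 3.0 and 1.0 on the two ads of a single query, the walk moves to the first ad with probability 0.75 instead of the uniform 0.5.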

The second extension, Evidence‑augmented SimRank, explicitly quantifies indirect evidence. It enumerates all paths connecting two queries, applies a decay factor based on path length (shorter paths contribute more), and weights each path by the product of the edge weights along it. The final similarity score is the sum of these evidence contributions. This formulation allows the algorithm to assign non‑zero similarity to query pairs that have no direct common ads but are linked through a chain of intermediate queries and ads (e.g., “pc” and “tv” in the authors’ example).
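
The path-based description above can be approximated with a short dynamic program. This is a rough sketch of the idea as summarized here, not the authors' exact formula: the function name `path_evidence`, the `decay` and `max_hops` parameters, and the toy weights in the usage note are assumptions. For simplicity it sums over walks (which may revisit nodes) of bounded length rather than enumerating simple paths, and it treats one query-ad-query step as a single hop.

```python
def path_evidence(weights, source, target, decay=0.5, max_hops=3):
    """Sum decayed, weight-multiplied walks between two queries.

    weights: dict mapping (query, ad) -> edge weight (assumed layout).
    One hop is a query -> ad -> query step; a walk of k hops contributes
    decay**k times the product of the edge weights along it.
    """
    ads_of, queries_of = {}, {}
    for (query, ad), weight in weights.items():
        ads_of.setdefault(query, {})[ad] = weight
        queries_of.setdefault(ad, {})[query] = weight

    mass = {source: 1.0}  # accumulated walk weight reaching each query
    score = 0.0
    for hop in range(1, max_hops + 1):
        next_mass = {}
        for query, m in mass.items():
            for ad, w_out in ads_of.get(query, {}).items():
                for other, w_back in queries_of[ad].items():
                    next_mass[other] = next_mass.get(other, 0.0) + m * w_out * w_back
        # Longer walks are discounted more heavily.
        score += decay ** hop * next_mass.get(target, 0.0)
        mass = next_mass
    return score
```

On a toy graph where "pc" and "tv" share no ad but both share one with "camera" (weights pc-hp 1.0, camera-hp 0.5, camera-bestbuy 0.5, tv-bestbuy 1.0), the score is zero with `max_hops=1` and becomes positive once two hops are allowed, which mirrors the indirect-link behaviour described above.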

The authors evaluate their methods on a large, real‑world click graph collected from Yahoo! They compare four approaches: (a) a naïve common‑ad count, (b) the original SimRank, (c) Weighted SimRank, and (d) Evidence‑augmented SimRank. Evaluation metrics include the number of generated rewrite candidates, human‑judged relevance (via Yahoo!’s Editorial Evaluation Team), and simulated click‑through‑rate (CTR) improvements when the rewrites are fed to the ad‑selection backend.

Results show that both SimRank++ variants produce significantly more rewrite candidates than the baseline (≈18 % more for Weighted SimRank, ≈22 % more for Evidence‑augmented SimRank). Human relevance scores improve as well: Weighted SimRank attains an F1 of 0.71, Evidence‑augmented SimRank reaches 0.78, compared with 0.63 for vanilla SimRank. Importantly, the evidence‑based method recovers useful rewrites for query pairs lacking direct common ads, demonstrating its ability to exploit the full graph structure. Simulated CTR gains are also notable, with a 3.4 % increase for the evidence‑augmented approach and 2.9 % for the weighted version.

In conclusion, the paper demonstrates that incorporating edge weights and explicit evidence into SimRank yields a more powerful tool for query rewriting in sponsored search. The proposed SimRank++ framework not only generates a larger set of high‑quality rewrites but also improves downstream ad performance. The authors suggest future work on online updating with streaming click data, segment‑specific weighting schemes, and integration with deep graph‑embedding techniques to further enhance recommendation quality.

