Computer Science / Databases Computer Science / Digital Libraries Computer Science / Information Retrieval

Simrank++: Query rewriting through link analysis of the click graph

February 23, 2026

Reading time: 6 minute

...

#Analysis #Databases #Computer Science #Digital Libraries #Information Retrieval

📝 Original Info

Title: Simrank++: Query rewriting through link analysis of the click graph
ArXiv ID: 0712.0499
Date: 2007-12-04
Authors: Ioannis Antonellis, Hector Garcia-Molina, Chi-Chao Chang

📝 Abstract

We focus on the problem of query rewriting for sponsored search. We base rewrites on a historical click graph that records the ads that have been clicked on in response to past user queries. Given a query q, we first consider Simrank as a way to identify queries similar to q, i.e., queries whose ads a user may be interested in. We argue that Simrank fails to properly identify query similarities in our application, and we present two enhanced version of Simrank: one that exploits weights on click graph edges and another that exploits ``evidence.'' We experimentally evaluate our new schemes against Simrank, using actual click graphs and queries form Yahoo!, and using a variety of metrics. Our results show that the enhanced methods can yield more and better query rewrites.

💡 Deep Analysis

Deep Dive into Simrank++: Query rewriting through link analysis of the click graph.

We focus on the problem of query rewriting for sponsored search. We base rewrites on a historical click graph that records the ads that have been clicked on in response to past user queries. Given a query q, we first consider Simrank as a way to identify queries similar to q, i.e., queries whose ads a user may be interested in. We argue that Simrank fails to properly identify query similarities in our application, and we present two enhanced version of Simrank: one that exploits weights on click graph edges and another that exploits ``evidence.’’ We experimentally evaluate our new schemes against Simrank, using actual click graphs and queries form Yahoo!, and using a variety of metrics. Our results show that the enhanced methods can yield more and better query rewrites.

📄 Full Content

In sponsored search, paid advertisements (ads) relevant to a user's query are shown above or along-side traditional web search results. The placement of these ads is in general related to a ranking score which is a function of the semantic relevance to the query and the advertiser's bid. Ideally, a sponsored search system would appear as in Figure 1. The system has access to a database of available ads and a set of bids. Conceptually, each bid consists of a query q, an ad α, and a price p. With such a bid, the bidder offers to pay if the ad α is both displayed and clicked when a user issues query q. For many queries, there are not enough direct bids, so the sponsored search system attempts to find other ads that may be of interest to the user who submitted the query. Even though there is no direct bid, if the user clicks on one of these ads, the search engine will make some money (and the advertiser will receive a customer). The challenge is then to find ads related to incoming queries that may yield user click throughs.

For a variety of practical and historical reasons, the sponsored search system is often split into two components, as shown in Figure 2. A front-end takes an input query q and produces a list of re-writes, i.e., of other queries that are “similar” to q. For example, for query “camera,” the queries “digital camera” and “photography” may be useful because the user may also be interested in ads for those related queries. The query “battery” may also be useful because users that want a camera may also be in the market for a spare battery. The query and its rewrites are then considered by the back-end, which displays ads that have bids for the query or its rewrites. The split approach reduces the complexity of the back-end, which has to deal with rapidly changing bids. The work of finding relevant ads, indirectly through related queries, is off-loaded to the front-end.

At the front-end, queries can be rewritten using a variety of techniques (reviewed in our Related Work section) developed for document search. However, these techniques often do not generate enough useful rewrites. Part of the problem is that in our case “documents” (the ads) have little text, and queries are very short, so there is less information to work with, as compared with larger documents. Another part of the problem is that there are relatively few queries in the bid database, so even if we found all the textually related ones, we may not have enough. Thus, it is important to generate additional rewrites, using other techniques.

In this paper we focus on query rewrites based on the recent history of ads displayed and clicked on. The back-end generates a historical click graph that records the clicks that were generated by ads when a user inputs a given query. The click graph is a weighted bi-partite graph, with queries on one side and ads on the other (details in Section 2). The schemes we present analyze the connections in the click graph to identify rewrites that may be useful. Our techniques identify not only queries that are directly connected by an ad (e.g., users that submit either “mp3” or “i-tunes” click on ad an for “iPod.”) but also queries that are more indirectly related (Section 3). Our techniques are based on the notion of SimRank [5], which can compute query similarity based on the connections in a bi-partite click-graph. However, in our case we need to extend SimRank to take into account the specifics of our sponsored search application.

Briefly, the contributions of this paper are as follows.

• We present a framework for query rewriting in a sponsored search environment.

• We identify cases where SimRank fails to transfer correctly the relationships between queries and ads into similarity scores.

• We present two SimRank extensions: one that takes into account the weights of the edges in the click graph, and another that takes into account the “evidence” supporting the similarity between queries.

• We experimentally evaluate these query rewriting techniques, using an actual click graph from Yahoo!, and a set of queries extracted from Yahoo! logs. We evaluate the resulting rewrites using several metrics. One of the comparisons we perform involves manual evaluation of query-rewrite pairs by members of Yahoo!’s Editorial Evaluation Team. Our results show that we can significantly increase the number of useful rewrites over those produced by SimRank and by another basic technique.

The query rewriting problem has been extensively studied in terms of traditional web search. In traditional web search, query rewriting techniques are used for recommending more useful queries to the user and for improving the quality of search results by incorporating users’ actions in the results’ ranking of future searches. Given a query and a search engine’s results on this, the indication that a user clicked on some results can be interpreted as a vote that these specific results are matching the user’s needs and thus

…(Full text truncated)…

📄 Read Full PDF on ArXiv

📸 Image Gallery

Reference

This content is AI-processed based on ArXiv data.

Simrank++: Query rewriting through link analysis of the click graph

📝 Original Info

📝 Abstract

💡 Deep Analysis

📄 Full Content

📸 Image Gallery

Reference

Table of Contents

Table of Contents

📝 Original Info

📝 Abstract

💡 Deep Analysis

📄 Full Content

📸 Image Gallery

Reference

Related Posts

Knowledge Discovery from Social Media using Big Data provided Sentiment Analysis (SoMABiT)

Topic Map: An Ontology Framework for Information Retrieval

Leveraging Discarded Samples for Tighter Estimation of Multiple-Set Aggregates

Start searching

No results found