Exploring Query Categorisation for Query Expansion: A Study
Dipasree Pal∗1, Mandar Mitra†1, and Samar Bhattacharya‡2

1 Indian Statistical Institute, 203 B.T. Road, Kolkata-700108, India
2 Jadavpur University, 188, Raja SCM Road, Kolkata-700032, India

Abstract

The vocabulary mismatch problem is one of the important challenges facing traditional keyword-based Information Retrieval systems. The aim of query expansion (QE) is to reduce this query-document mismatch by adding related or synonymous words or phrases to the query. Several existing query expansion algorithms have proved their merit, but they are not uniformly beneficial for all kinds of queries. Our long-term goal is to formulate methods for applying QE techniques tailored to individual queries, rather than applying the same general QE method to all queries. As an initial step, we propose a taxonomy of query classes (from a QE perspective) in this report. We discuss the properties of each query class with examples, together with some QE strategies that might be effective for each query category. In future work, we intend to test the proposed techniques using standard datasets, and to explore automatic query categorisation methods.

1 Introduction

The use of Search Engines (SEs) has become an inseparable part of the activities of most computer users. People use SEs in various forms to find information in a wide variety of contexts: from Web search through desktop search and email search to searching through document archives belonging to specific domains such as the medical and legal domains. Depending on the information need, finding the desired information can be a more or less difficult task. The well-known vocabulary mismatch problem is one significant factor that makes searching difficult.
A user’s query Q and a useful document D in a document collection may use different vocabulary to refer to the same concept. Retrieval systems that rely on keyword-matching may not detect a match between Q and D. A good retrieval system must bridge the potential vocabulary gap that exists between useful documents and the user’s query. Query Expansion (QE), the addition of related terms to a user’s query, is one important technique that attempts to solve this problem by increasing the likelihood of a match between the query and relevant documents.

Most lay users prefer to keep their interaction with a retrieval system simple. Thus, most QE methods are completely automatic and involve little or no additional effort on the part of the user. Of course, a completely automatic QE method may end up adding unrelated terms to a user’s query, thus changing the query’s focus. This is known as query drift. In such cases, QE causes performance to deteriorate rather than improve.

∗ dipasree.pal@gmail.com; corresponding author. Fax: +91 33 2577 3035; Tel: +91 33 2575 2858.
† mandar.mitra@gmail.com; Fax: +91 33 2577 3035; Tel: +91 33 2575 2858.
‡ samarbhattacharya@ee.jdvu.ac.in; Fax: +91 33 2577 3035; Tel: +91 33 2414 6129.

[Figure 1: Variability of QE techniques across queries. X-axis: query number; Y-axis: percentage change in MAP for the two techniques QE1 and QE2.]

Over the years, many different QE techniques have been proposed. A recent survey of such techniques can be found in [Carpineto and Romano, 2012]. While a number of QE techniques have been shown to be effective on average (i.e. when their overall impact across a large set of queries is measured), the effect of different QE techniques on individual queries can vary greatly. Figure 1 makes this point graphically.¹
The points on the X-axis represent individual queries; the Y-axis denotes the relative improvement in performance obtained for each query by using query expansion. The lines labelled QE1 and QE2 correspond to two different QE techniques. Points on QE1 (resp. QE2) that lie above the X-axis correspond to queries for which this expansion method improves performance, while a point below the X-axis corresponds to a query for which the method hurts retrieval effectiveness.

Table 1 shows the Mean Average Precision (MAP) scores for three retrieval methods: a baseline strategy that uses the original, unexpanded queries, QE1, and QE2. QE1 and QE2 are clearly superior to the baseline on average. This reinforces the claim above that QE techniques often improve overall performance. However, it is clear from Figure 1 that the impact of QE1 or QE2 varies greatly across queries. Specifically, QE1 and QE2 result in decreased performance for a number of queries. Also, while the overall performance figures for QE1 and QE2 are comparable, each of these methods outperforms the other on about half the queries used in this experiment. The hypothetical performance that would be obtained if one could predict in advance the most effective technique for a query (no expansion vs. QE1 vs. QE2) is shown in the last column (MAX) of Table 1. Notice that such a capability would lead to nearly 35% improvement in retrieval effectiveness.
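The MAX figure corresponds to an oracle that, for every individual query, picks whichever of the three methods yields the highest average precision (AP), and then averages these per-query maxima. A minimal sketch of this computation (the function name and the AP values below are our own, purely for illustration; they are not the paper's data):

```python
def oracle_map(ap_by_method):
    """Given {method_name: [AP score for each query]}, return the MAP obtained
    by an oracle that picks the best method for every individual query."""
    per_query = zip(*ap_by_method.values())   # one tuple of AP scores per query
    best = [max(scores) for scores in per_query]
    return sum(best) / len(best)

# Toy example with three queries (illustrative values only):
ap = {
    "baseline": [0.10, 0.40, 0.20],
    "QE1":      [0.30, 0.35, 0.10],
    "QE2":      [0.05, 0.50, 0.25],
}
# The oracle picks 0.30, 0.50 and 0.25, giving MAP = 0.35.
```

In the experiment of Table 1, this per-query maximum over {baseline, QE1, QE2} is exactly what produces the 0.2473 (+34.26%) MAX score.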
         Baseline   QE1                QE2                MAX
MAP      0.1842     0.2191 (+18.95%)   0.2183 (+18.51%)   0.2473 (+34.26%)

Table 1: Potential improvement obtainable in principle by judiciously choosing QE techniques

In their overview of the NRRC Reliable Information Access (RIA) Workshop, Harman and Buckley [2009] make a similar point: “it may be more important for research to discover what current techniques should be applied to which topics, rather than to come up with new techniques”.

1.1 Problem statement

In this study, we consider the important problem of predicting the most effective QE technique for a given query (including the possibility that not expanding certain queries may be most effective). We explore one possible approach to this question. We examine a number of different criteria that can be used to classify queries. For each query category, we discuss what QE techniques (or, more generally, what query processing techniques) might be most effective. Our eventual goal is to find methods that can automatically (or semi-automatically, i.e., with some assistance from a user) classify a given query into one (or sometimes more) of several pre-defined categories. We will then apply the QE method that is most appropriate for this category. Our hypothesis, supported by Table 1 and Harman and Buckley [2009], is that overall performance should improve if we apply QE techniques specifically tailored to a given query, rather than applying the same general QE method to all queries.

¹ Details about the dataset and the techniques used to generate these plots can be found in the Appendix.

1.2 Outline

The rest of this report is organised as follows. Section 2 discusses related work and its relationship to this study. Section 3 presents a taxonomy of query categories. For each category, we provide examples of queries belonging to that class.
We also discuss QE techniques that are likely to be most effective for that category. Details about the data presented in that section are given in Appendix A. We conclude in Section 4 by presenting a summary of the work done, along with a roadmap for future work.

2 Related Work

Related work can be broadly classified into three categories: research related to query categorisation, prior work on query expansion, and research on query performance prediction.

2.1 Query classification

Automatic query categorisation (QC) is a well-known problem that has been studied for many years in both the Information Retrieval and Machine Learning communities. QC is usually treated as a multi-class categorisation problem. It is quite different from ordinary text categorisation, since queries are much shorter than text documents. Different types of query classification approaches have been defined according to the purpose that the classification is intended to serve.

A well-known classification of Web queries [Broder, 2002] uses three categories: informational, navigational, and transactional. Navigational queries are entered by users looking for a specific website, whereas informational queries cover a broad topic, for which there are typically many relevant documents. Transactional queries have commercial / transactional purposes. Transactional queries, or queries with commercial intent, are further classified in [Ashkan and Clarke, 2009] depending on whether the user has “on-line commercial intent” (i.e. the intention to purchase a product or utilise a commercial service). Naturally, these categories are not applicable to general-purpose queries that have no commercial intent. On a somewhat related note, Baeza-Yates et al. [2006] classify Web queries according to whether they are informational, non-informational or ambiguous.
Another traditional approach classifies queries according to the domain or subject area targeted by the query. For example, the KDD Cup 2005 competition² [Shen et al., 2006] focused on a Web query classification task. This task defined 67 query categories organised into a hierarchical taxonomy, for example Computers / Security, Computers / Software, Entertainment / Celebrities, Sports / Tennis. A single query could belong to multiple categories. For example, relevant documents for the query “Beijing 2008” may belong to the following domains: Sports / Olympic Games, Information / Local & Regional, Living / Travel and Vacation, and Information / Law and Politics. Thus, this query belongs to multiple categories. Competitors were required to classify 800,000 real user queries into the 67 categories. Of these queries, only 800 (randomly chosen) were labelled manually, among which 682 belonged to multiple categories [Cao et al., 2009b].

² http://www.acm.org/sigs/sigkdd/kdd2005/kddcup.html

Beitzel et al. [2004] reported 16 categories of Web queries. These query classes are also based on the subject domain of relevant documents, such as music, games, entertainment, computers, health, and US-sites. The authors analysed Web traffic on an hourly basis using these query types. They showed that music-related queries account for 50% of all queries, queries targeted at US-sites account for 35%, and queries related to entertainment comprise 5% of the overall query set. Cao et al. [2009a] also classify Web queries into 17 groups with the aim of improving personalised search, but details about these query groups are not available.

In recent times, query classification has become a particularly important problem, since most Web search engines earn their revenue via targeted advertisements provided alongside search results.
Gabrilovich et al. [2009] classify queries onto a fine-grained commercial taxonomy with approximately 6000 nodes, arranged in a hierarchy with median depth 5 and maximum depth 9. The key idea in this approach is to determine the class of a query by classifying the search results retrieved for that query.

Apart from the targeted domain, queries may also be classified based on certain features, for example: (i) ambiguous queries, (ii) short queries, and (iii) hard or difficult queries. Xu et al. [2009] classify queries into three categories based on their relationship with Wikipedia topics: (i) queries about specific entities; (ii) ambiguous queries; and (iii) all other queries. Hard queries were studied in the Robust Track at TREC³ [Voorhees, 2003b]. Substantial work has also been done on queries that have multiple aspects [Harman, 1988, Buckley, 2009].

In the next section (Section 3), we discuss various criteria for query classification, including some of the criteria mentioned above. While some of these query types have been defined by other researchers in earlier work, we specifically investigate the relationship between query categories and appropriate QE strategies.

2.2 Query performance prediction

Query performance prediction may be regarded as a special case of the query categorisation problem, in which the objective is to classify a query as being either hard or easy for a given retrieval system. Cronen-Townsend et al. [2002] were among the earliest to study the Query Performance Prediction (QPP) problem. They defined the Clarity Score as the relative entropy between a query language model and the corresponding collection language model. This score is intended to measure the ambiguity of a query with respect to a document collection.
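The Clarity Score is the Kullback-Leibler divergence between the query language model and the collection language model. The sketch below is a deliberately simplified version of this idea: it crudely estimates the query model from the raw term counts of the top-retrieved (feedback) documents, rather than from the weighted relevance model used by Cronen-Townsend et al., so the function and its details are our illustration, not their exact method.

```python
import math
from collections import Counter

def clarity_score(feedback_docs, collection_docs):
    """Simplified Clarity Score: KL divergence (in bits) between a query
    language model -- crudely estimated from the top-retrieved (feedback)
    documents -- and the collection language model. A higher score suggests
    a more focused, less ambiguous query. Each document is a list of terms;
    every feedback term is assumed to occur in the collection sample."""
    q_model = Counter(w for d in feedback_docs for w in d)
    c_model = Counter(w for d in collection_docs for w in d)
    q_total = sum(q_model.values())
    c_total = sum(c_model.values())
    score = 0.0
    for w, f in q_model.items():
        p_q = f / q_total
        p_c = c_model[w] / c_total
        score += p_q * math.log2(p_q / p_c)
    return score
```

A query whose top-retrieved documents concentrate on a few topical terms diverges strongly from the collection model and scores high; if the feedback set looks just like the collection as a whole, the divergence (and hence the score) is zero.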
The authors showed that the Clarity Score is positively correlated with average precision (a standard evaluation measure) on a variety of benchmark datasets.

QPP methods may be classified into two broad categories.

1. Pre-retrieval methods. These methods (e.g., the method proposed in He and Ounis [2006]) use only the initial query, and term statistics from the target document corpus collected at indexing time. In particular, no preliminary retrieval results are needed. Hauff et al. [2008] present a survey of pre-retrieval QPP methods.

2. Post-retrieval methods. These methods additionally make use of the results retrieved in response to the initial query, usually by analysing the similarity scores of the retrieved documents. The prediction method proposed by Shtok et al. [2009, 2012] may be regarded as a representative post-retrieval method. It is based on the hypothesis that “high standard deviation of retrieval scores in the result list correlates with reduced query-drift, and consequently, with improved effectiveness.”

³ http://trec.nist.gov

A good introduction to work in this area can be found in a monograph by Carmel and Yom-Tov [2010]. The monograph provides the background and motivation for the QPP problem. It covers pre-retrieval and post-retrieval methods, as well as methods that combine these two approaches. Finally, it also discusses applications of query difficulty estimation. A more up-to-date overview is provided in a tutorial by Carmel and Kurland [2012]. Recently, Kurland et al. [2012] have proposed a probabilistic framework for QPP that unifies various earlier, apparently diverse approaches [Cronen-Townsend et al., 2002, Vinay et al., 2006, Yom-Tov et al., 2005, Zhou and Croft, 2006, 2007]. Sondak et al.
[2013] generalise this framework by modelling the user’s actual information need (as represented by the query). Their framework makes it possible to integrate pre-retrieval, post-retrieval, and query-representativeness based predictors.

2.3 Query expansion

A great deal of work has been done on QE. Carpineto and Romano [2012] provide a comprehensive and up-to-date survey of various automatic QE techniques. In earlier work on QE, we find that expansion terms (i.e. the terms that are added to the original query) are generally selected from three types of sources. On the basis of the source of expansion terms, QE strategies can be divided into the following groups.

• Local: “Local” QE techniques select candidate expansion terms from a set of documents retrieved in response to the original (unexpanded) query. Ideally, expansion terms should be drawn from some initially retrieved relevant documents. Since these documents are relevant, terms present in them are expected to be related to the query, and should help to retrieve other similar documents, which are also likely to be relevant. If the user does not provide any feedback about which of the initially retrieved documents are relevant, certain simplifying assumptions may be made. Usually, in the absence of user feedback, a few top-ranked documents are assumed to be relevant. This is called pseudo relevance feedback (PRF). This method has an obvious drawback: if several of the documents assumed to be relevant are in fact non-relevant, then the words added to the query (drawn mostly from these documents) are unlikely to be useful expansion terms, and the quality of the documents retrieved using the expanded query is likely to be poor. Mitra et al.
[1998] propose a local expansion method that tries to prevent query drift by ensuring that the query is expanded in a balanced way. Xu and Croft [1996, 2000] present a method called local context analysis that also obtains candidate expansion terms from a few top-ranked documents. These terms are scored on the basis of their co-occurrence patterns with all of the query terms. The highest-scoring terms are added to the query. Recently, Colace et al. [2015] have demonstrated the effectiveness of a new expansion method that extracts weighted word pairs from relevant or pseudo-relevant documents. Researchers have also applied learning-to-rank methods to select useful terms from a set of candidate expansion terms within a PRF framework [Xu et al., 2015].

• Global: “Global” QE techniques select expansion terms from the entire database of documents. Candidate terms are usually identified by mining term-term relationships from the target corpus. Qiu and Frei [1993] propose a global QE technique that makes use of a similarity thesaurus. A similarity thesaurus is a matrix containing term-term similarity scores as its entries. These similarity scores are computed based on how word-pairs co-occur in the documents contained in a corpus. Expansion terms are selected on the basis of a probabilistic measure of a term’s relationship to the query concept. Jing and Croft [1994] also propose a global technique, called phrasefinder, that is based on term co-occurrence information in the corpus. Each term T corresponds to a vector V_T of associated (or co-occurring) terms. A term T is assigned a similarity score based on the similarity between the original query and V_T. The terms that are most similar to the query are selected as expansion terms. Gauch et al.
[1999] define two words as similar if they occur in similar contexts, where a word’s context is defined in terms of its neighbouring words in a corpus. Words that are similar to the query words are selected for inclusion in the expanded query. Carpineto et al. [2001] use a combination of local and global approaches. Their hypothesis is that a useful term will occur more frequently in relevant documents than in non-relevant documents or in the whole corpus. Vechtomova et al. [2003] also combine local and global information in the form of long-span collocates, i.e., words that significantly co-occur with query terms. Collocates of query terms are extracted both from the entire corpus and from a subset of retrieved documents. The significance of association between collocates is estimated using modified Mutual Information and Z scores.

• External: “External” QE techniques comprise methods that obtain expansion terms from resources other than the target corpus. These resources may include other document corpora (including the Web), linguistic resources like WordNet⁴, and user-query logs. Li et al. [2007] use Wikipedia⁵ as a source of expansion terms. Given an initial query, Wikipedia pages are retrieved and reranked on the basis of Wikipedia category information. The “best” wiki pages provide terms for inclusion in the expanded query. Xu et al. [2009] also used Wikipedia as a source of expansion terms. For each query word, the related Wikipedia page (if any) is found; terms from this page are ranked, and top-ranked terms are added to the query. This approach needs few parameter settings, since only one document is selected for each term. Voorhees [1994] used WordNet synsets to find terms related to query words. She showed that the addition of synonyms of query words alone does not consistently improve performance.
More recently, Fang [2008] showed that WordNet-based query expansion can yield good results if the definitions (or glosses) of words provided by WordNet are used, instead of simply relying on the semantic relations defined within WordNet. A comprehensive survey of the uses of ontologies in query expansion can be found in [Bhogal et al., 2007].

⁴ http://wordnet.princeton.edu
⁵ http://en.wikipedia.org

2.4 Selective query expansion

As mentioned in the Introduction, many of the above QE techniques have been shown to be effective on the whole over large query sets, even though they may cause retrieval effectiveness for individual queries to suffer. Our eventual goal is to formulate a method by which the type of a given query is first determined, and an appropriate expansion strategy is then used based on the query category. In other words, we hope to be able to apply QE techniques tailored to individual queries, rather than applying any particular QE technique uniformly to all queries.

As a special case of this problem, researchers have looked at selective query expansion, i.e., the question of whether to expand a query at all. Amati et al. [2004] define an information-theoretic measure that indicates, for a given query, whether it is likely to benefit from expansion. This measure is used to selectively apply QE to only some queries. The authors show that their approach works better than applying QE uniformly across all topics in a test collection. Similarly, Cronen-Townsend et al. [2004] show that a comparison between language models constructed from the results retrieved by the unexpanded query and a given expanded query can be used to predict whether expansion has altered the sense of the original query. In such cases, QE should be avoided. This idea was shown to be effective in improving the robustness of expansion strategies.

3 Query types

As discussed in Section 2.1, queries may be classified into a wide variety of query types. Thus far, customising online advertising and search result presentation has been the main motivation behind query classification: search engines may tailor the format of the results page, or the advertisements displayed in response to a query, according to its category. Our goal in this study is to focus on query types from a QE perspective. In other words, we are interested in classification criteria that are likely to have some relation to query expansion.

The types we consider are not mutually exclusive. Our intention is that the retrieval system (or the user) will decide the (possibly multiple) categories that a particular query belongs to, and then select the appropriate QE method for these categories. Table 2 lists the query categories that we are interested in, along with very brief descriptions. Some of these categories can be determined automatically, while for some, a user’s input may be required (these categories are marked M). In some cases, it may be difficult to categorise queries before an initial retrieval (and evaluation). For example, to know whether a query is hard, we need to examine the initial retrieval results. Generally, we need to expand queries only if we are not satisfied with the initial retrieval. The following sections discuss these categories in more detail.
No.  Name                                   Characteristics
1    Short query                            Few query terms
2    Hard query                             Low average precision
3    Ambiguous query                        Meaning of query not clear
4    Query containing negative terms        Presence of negation
5    Query involving named entities         Named entities in query
6    Multi-aspect query                     Query contains multiple sub-topics
7    High-level query                       Query uses abstract terms
8    Recall-oriented query (M)              User requires all/many relevant documents
9    Context implicit in query (M)          Meaning of query determined by context
10   Domain-specific query (M)              Related to a particular domain
11   Query needing short answer (M)         Specific answer needed
12   Query needing special processing (M)   Special indexing techniques may be required
13   Multilingual query (M)                 Query uses more than one language
14   Noisy query (M)                        Query contains some textual error

Table 2: Query categories

3.1 Short / long queries

A query may be classified as short or long based on the number of terms or keywords that it contains. In order to make this notion concrete, we adopt the following definitions.

• short queries: queries containing fewer than four words
• long queries: queries containing more than ten terms

These definitions may be regarded as rather arbitrary; however, they are only intended to be indicative. If a query consists of a single named entity that is four words long, it should really be regarded as a short query. It is generally believed that casual users tend to formulate short queries, while more experienced or professional searchers formulate longer queries that better represent their information need.

Queries provided by various test collections (e.g., TREC “topics”)⁶ usually have both a short and a long version. They typically consist of a title, a description and a narrative.
The title fields of these queries are short, since they are mostly intended to model queries created by casual users; the descriptions are longer. The narrative section is only intended to provide a detailed specification of what the user deems relevant; it should generally not be used as a source of keywords. Table 3 shows the maximum and minimum lengths (in words) of different parts of TREC queries.

Number  Query field                    Maximum length  Minimum length
1       Title                          21              1
2       Desc                           46              5
3       Narr (400 queries have narr)   129             14

Table 3: TREC queries 1-450: query length in words

Table 4 shows the distribution of the length of the title field of TREC queries 1-450 (queries 201-250 are omitted from this table since they do not contain a title field). We can see from the table that more than half the queries contain no more than 3 words. Only occasionally are they any longer, for example, when the title contains some well-known phrase or a long proper name.

Query length  Number of queries
1             18
2             101
3             113
4             50
5             32
6             30
> 6           56

Table 4: Distribution of length of titles of TREC queries 1-200 and 251-450

Benefits of expanding short queries. We now turn to the relationship between the length of a query and how it may be affected by QE. Given their brevity, it is reasonably likely that a short query is an incomplete representation of the user’s information need. Expanding a short query is likely to yield a more complete representation of that need. Thus, the benefits of QE are expected to be substantial in the case of short queries. On the other hand, a long query is usually a more comprehensive statement of the searcher’s information need. A higher level of retrieval effectiveness can generally be obtained using long queries, and there is less opportunity for QE techniques to yield dramatic improvements for such queries. Table 5 illustrates these points.
It shows the number of queries for which a standard QE technique results in better / worse performance. QE improves effectiveness for 98 out of 150 short, title-only queries (T). The maximum improvement in MAP over all queries is as much as 0.6016. In contrast, QE yields improvements for fewer medium (TD) or long (TDN) queries; further, the maximum improvement obtained is also substantially smaller for long queries (about 0.45).

Query field(s)  MAP (no expansion)  MAP (after QE)  # queries improved     # queries hurt
                                                    (best difference)      (worst difference)
T               0.2181              0.2630          98 (0.6016)            50 (0.3404)
TD              0.2560              0.2693          80 (0.5824)            70 (0.3827)
TDN             0.2567              0.2749          79 (0.4537)            70 (0.3320)

Table 5: Improvements due to QE for short / long queries (query set: TREC678 (queries 301-450); IR system: TERRIER; term-weighting method: IFB2c1.0; QE method: Bo1-based pseudo relevance feedback (40 terms from the top-ranked 10 documents))

⁶ Appendix A gives an overview of the datasets provided by TREC.

Risks related to expanding long / short queries. Short queries usually contain only important keywords. Users generally do not include stop-words (words such as articles, conjunctions and prepositions, which have a primarily grammatical function) in short queries. Thus, short queries are often not grammatically well-formed sentences or phrases, but this feature is generally an advantage for many QE techniques: all query terms can be assumed to be informative, and every query term is likely to be important during expansion. In contrast, long queries may contain “weak” (relatively less useful / informative) terms in addition to the important keywords. Two examples from the TREC topic set illustrate the important differences between long and short queries.
• Oil Spill (number 154)
Long: A relevant document will note the location of the spill, the amount of oil spilled, and the responsible corporation, if known. This will include shipborne accidents, offshore drilling and holding tank spills, but should not include intentional spills such as Iraq/Kuwait, or leakage from broken pipes. References to legislation brought about by a spill, litigation and clean-up efforts associated with a spill are not relevant unless specifics of the spill are included.

• Black Monday (number 105)
Long: Document will state reasons why U.S. stock markets crashed on 19 October 1987 (“Black Monday”), or report on attempts to guard against another such crash.

The short versions of these search topics (“Oil Spill”, “Black Monday”) contain only keywords, but they do not properly describe the user’s information need. In contrast, the long queries contain a clear and detailed specification of the user’s requirement in natural language. However, they contain a number of unimportant or general terms (e.g., relevant, document, note, include, etc.) that would be inappropriate in a keyword-only version of these queries. At the time of expansion, therefore, special care is needed in order to identify the strong terms and to avoid adding words related to weak terms, since this may result in query drift.

On the other hand, because a short query contains few words, it has a greater chance of being ambiguous. Compare, for example, the single-term query “SVM” with the longer queries “SVM pattern recognition” (in which SVM refers to Support Vector Machines) and “SVM admission criteria” (in which SVM expands to School of Veterinary Medicine). Expanding such a single-term query by adding words related to the “wrong” sense will also result in query drift.
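One simple heuristic for separating the strong keywords of a long query from its weak, general terms is inverse document frequency (idf): terms such as "relevant" or "document" occur in a large fraction of the collection and therefore carry little discriminating power. The sketch below is our own illustration of this idea (the document frequencies are made-up), not a method proposed in the works cited here, which use more sophisticated term-weighting schemes.

```python
import math

def term_weights(query_terms, doc_freq, num_docs):
    """Score each query term by idf = log(N / df). General words occurring in
    many documents get low weights; expansion can then be restricted to terms
    with the highest weights, reducing the risk of query drift."""
    return {t: math.log(num_docs / doc_freq.get(t, 1)) for t in query_terms}

# Hypothetical document frequencies in a 1,000,000-document collection:
df = {"document": 900000, "relevant": 800000, "oil": 40000, "spill": 5000}
w = term_weights(["relevant", "document", "oil", "spill"], df, 1_000_000)
strong = [t for t, s in sorted(w.items(), key=lambda kv: -kv[1])][:2]
# "spill" and "oil" rank highest; "relevant" and "document" rank lowest.
```

Restricting expansion to terms related to the top-weighted keywords is one crude way of realising the "special care" mentioned above.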
Further, short queries lie outside the scope of QE techniques that use some form of language analysis. [CITATION???]

Special processing for verbose queries. Most Web search queries are also short, being generally 2 or 3 words long [Beitzel et al., 2005]. However, over the last ten years or so, long, verbose queries have become much more frequent. In 2006, Yahoo claimed that 17% of its queries contained 5 words or more [Gupta and Bendersky, 2015b]. Users create long queries for a variety of reasons. A number of techniques for processing verbose queries have been proposed over the years. Many of these focus on automatic methods for assigning weights to the original query terms that distinguish between useful terms and weak terms [Bendersky and Croft, 2008, Lease, 2009, Bendersky et al., 2011, Paik and Oard, 2014].

Category                                              Examples (TREC query #)   r          R            Remarks
Queries for which there are very few                  Q303 Q320 Q344            10 6 5     10 6 5       Expanding such queries to target the few "needles in the haystack" is unlikely to be beneficial in any real sense.
relevant documents
Queries with several relevant documents, for which    Q374 Q399 Q435            203 37 44  204 102 117  Since the relevant documents are retrieved at poor ranks, global expansion techniques may work better.
recall is reasonably high, but ranking is poor
Queries with several relevant documents, but for      Q307 Q336 Q389            25 1 3     215 12 194   Automatic expansion techniques are likely to be inappropriate for such queries. Manual, interactive expansion may work well.
which recall is poor

Table 6: Types of hard queries with examples from the TREC query collection. R denotes the total number of relevant documents for a query, and r denotes the number of relevant documents retrieved for that query within the top 1000 ranks. CHECK: WHAT SYSTEM?
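The weak-term problem above can be made concrete with a small sketch. This is not the method of any of the cited papers; it simply down-weights query terms by a normalised idf score, using hypothetical collection statistics:

```python
import math

def term_weights(query_terms, doc_freq, num_docs):
    """Weight each query term by normalised idf, so that frequent
    ("weak") terms such as 'relevant' or 'document' get low weight
    and rare keywords get high weight."""
    idf = {t: math.log((num_docs + 1) / (doc_freq.get(t, 0) + 1))
           for t in query_terms}
    total = sum(idf.values()) or 1.0
    return {t: v / total for t, v in idf.items()}

# Hypothetical document frequencies in a 1000-document collection.
df = {"relevant": 900, "document": 950, "oil": 40, "spill": 25}
weights = term_weights(["relevant", "document", "oil", "spill"], df, 1000)
```

In a real system the statistics would come from the index, and supervised methods such as those cited above learn far richer weighting functions than plain idf.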
For a comprehensive overview of these and other approaches to handling verbose queries, please see [Gupta and Bendersky, 2015a].

3.2 Hard queries

We characterise a query as hard if no automatic retrieval method yields good performance (as measured by Average Precision (AP), or by the number of relevant documents initially retrieved, for example) for the query. A number of tracks at TREC have focused on hard queries. The goal of the Robust Track [Voorhees, 2003a] (2003–2005) was to study queries for which performance is generally poor. In 2003, the topic set for this task consisted of a total of 100 queries. The minimum and maximum number of relevant documents for any topic was 4 and 115 respectively. The following year (2004), fifty new topics (651–700) were created for the Robust Track. Later, in its final year, the Million Query Track (2007–2009) [Carterette et al., 2009] defined hard queries based on the Average Average Precision (AAP) score for a query, which is the average of AP estimates for a single query over all submitted runs. Difficulty levels were automatically assigned to queries by partitioning the AAP score range into three intervals: [0, 0.06) (hard queries), [0.06, 0.17) (medium queries), and [0.17, 1.0] (easy queries). These intervals were chosen so that queries would be roughly evenly distributed. Of the three, the hard category comprises 38% of all queries, and includes all queries for which no relevant documents were found.

Types of hard queries. Hard queries may be grouped into the sub-categories shown in Table 6 based on their properties. By definition, the initial retrieval results are poor for hard queries. In other words, the top retrieved set contains more irrelevant documents than relevant ones.
PRF-based expansion, which assumes that the top-ranked documents are relevant, is unlikely to work well for such queries, and may result in severe performance degradation due to query drift. For example, if we search for information about the TERRIER IR system using only the term "terrier", most / all top retrieved documents may be related to the breed of dog. Instead of using PRF, adding the terms "IR" and "system" to the query manually is likely to yield definite improvements.

On the other hand, for an easy query (e.g., TREC queries 365, 403 and 423), the original query terms are generally good enough for retrieving relevant documents. Thus, most of the desired documents are retrieved early in the first round, resulting in high AP (AP values for the above queries are 0.8213, 0.8891 and 0.7402 respectively). As the user is likely to be satisfied with the results of the initial retrieval, query expansion should be done in a fairly conservative way, if at all, i.e., only a small number of terms (possibly zero) that are strongly related to the original query terms need be added to the query. For such queries, since the baseline AP is high, AQE techniques (which may be modelled as having an element of stochastic error) are more likely to lead to performance degradation.

Of course, while these categories can be defined easily for a TREC-like test collection, earlier work discussed in Section 2.2 suggests that automatically differentiating between these query types is non-trivial in a real-life setting. The easiest option may be to have the user look at the initially retrieved set and decide whether a given query is hard or easy, and accordingly determine whether expansion is needed or not.
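The "terrier" failure mode can be illustrated with a minimal pseudo-relevance feedback sketch. This is a crude frequency-based stand-in for a real term-scoring model such as Bo1, and the top-ranked snippets are hypothetical:

```python
from collections import Counter

def prf_expand(query_terms, top_docs, n_terms=2):
    """Pseudo-relevance feedback: assume the top-ranked documents are
    relevant, pool their terms, and add the most frequent non-query
    terms to the query."""
    pool = Counter()
    for doc in top_docs:
        pool.update(doc.lower().split())
    for t in query_terms:
        pool.pop(t, None)            # never re-add original query terms
    return list(query_terms) + [t for t, _ in pool.most_common(n_terms)]

# The top-ranked documents for the one-word query "terrier" are all
# about dogs, so PRF drags the query towards the wrong sense.
top = ["Terrier dog breed small terrier",
       "terrier breed dog energetic"]
expanded = prf_expand(["terrier"], top)
```

Here the expanded query picks up dog-related terms, which is exactly the query drift described above; a manual expansion with "IR" and "system" would avoid it.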
3.3 Ambiguous queries

According to WordNet [Miller, 1995], the term ambiguous means "open to two or more interpretations" or "of uncertain nature or significance" or "(often) intended to mislead". Extending this definition, we can define an ambiguous query as one whose meaning is not clear, or one which admits of multiple valid interpretations. We categorise ambiguous queries into two groups (analogous to the grouping in Santos et al. [2015]), which are discussed in the rest of this section.

3.3.1 Queries containing polysemous words

We first consider queries that are ambiguous because they contain one or more polysemous words, i.e., words that have multiple meanings. For such queries, a match with a document on an ambiguous term is only weakly suggestive of relevance, since the term may have been used in a different sense from the intended one in the matching document. This problem is more serious if the polysemous word is an important keyword in the query. Not surprisingly, the TREC query collection contains a number of polysemous words. A few examples are:

• TREC query 350: health and computer terminals. The word terminal may be used as an adjective; it may also refer to an airport terminal.

• TREC query 355: ocean remote sensing. Remote may also be used as a noun (as in a "television remote").

• TREC query 397: automobile recalls. Recall may be used as a verb, or (less commonly) as the name of a metric.

Sanderson [2008] points out that a large class of ambiguous words, viz., words and phrases that are proper nouns, or are used as such, occur rarely in traditional, TREC-like query collections. The query "apple" is a typical example. The word apple may refer to the fruit, or the computer company, or a number of other entities.[7] The term "Jaguar"[8] is another typical example.
It could refer to the "big cat" that is formally named Panthera onca, but it could also refer to other objects / entities of more recent origin such as cars, bands, pens,[9] or one of several companies.[10] Acronyms with multiple expansions (e.g., "SVM", discussed in Section 3.1), and acronyms that are also valid words (e.g., FIRE, acronym for Forum for Information Retrieval Evaluation) constitute another frequently occurring class of ambiguous queries. These examples show that polysemy in a language generally increases over time, as new concepts may be tagged with words from the existing vocabulary. However, these classes of polysemous queries have not been seriously studied in past research on polysemous queries.

The performance of a system on an ambiguous query depends on the target collection. Naturally, ambiguity is a concern only if the collection actually contains the word used in multiple senses. If the word is used in only one sense in the target collection, then the query is effectively unambiguous for that collection. This may happen, for example, in domain-specific search engines (Section 3.10).

Approaches to handling polysemous query words can broadly be divided into three groups.

Word sense disambiguation (WSD). A very large number of studies have focused on the general problem of word sense disambiguation [Navigli, 2009]. A significant body of work has also been done on WSD for IR. Queries may be explicitly disambiguated by tagging each polysemous query word with a sense code which is utilised when computing query-document scores. Schütze and Pedersen [1995] showed that a word sense disambiguation algorithm can improve retrieval effectiveness by 7–14%.

[7] http://en.wikipedia.org/wiki/Apple_(disambiguation)
[8] http://en.wikipedia.org/wiki/Jaguar_(disambiguation)
[9] http://www.jaguarpen.com
Their WSD algorithm was applied in conjunction with the standard vector space model for IR. The approach was evaluated using the Category B TREC-1 corpus (WSJ subcollection). In a later critique, Ng [2011] argues that the question of how effective WSD is for IR remains unresolved, with different researchers reporting contradictory findings. He showed that many studies that have demonstrated a positive impact of WSD on IR have made use of small datasets, or weak baselines. It is generally agreed, however, that polysemy is a more serious problem for short queries; it is also generally agreed that in situations where WSD helps IR, an increase in WSD accuracy has a positive impact on IR effectiveness [Sanderson, 2000, Navigli, 2009].

Implicit WSD. Retrieval methods may also make use of implicit disambiguation methods. For example, consider the query "erosion of river banks caused during rainy season". Even though the term "bank" is polysemous, a document D that contains the word in its intended sense is more likely to also contain the terms erosion, river, or rain, as compared to a document D′ that uses the word in the sense of a financial institution. Most reasonable retrieval models will favour D over D′, thus automatically "selecting" the correct sense of bank. In other words, the intended sense of a polysemous word within a long query may be automatically favoured because of the additional context provided by the other query terms (this point was also discussed in Section 3.1). Additional context may also be provided by the earlier queries issued by the user within the same session. Cao et al. [2009b] use Conditional Random Fields to model this context, and show that incorporating session information often improves query disambiguation.

Search result diversification.
The third approach to handling ambiguity, specially in the case of short queries, is search result diversification (SRD) [Santos et al., 2015]. In SRD, the goal of a system is to present a result list that contains documents grouped according to the various possible interpretations of the given query. This allows the user to select the results corresponding to the appropriate sense of the query. The user's feedback may be used to expand the query, keeping in mind its intended sense.

YIPPY[11] is an example of a real-life search engine that attempts a form of SRD. It presents a ranked list of links as usual, but also provides an automatically generated list of "clouds", each of which corresponds to a possible sense of the query term(s).

TREC query #  Query title               Possible interpretations
Q260          Evidence of human life    during a particular period in history? in some geographical locations (e.g., desert islands)?
Q364          Rabies                    particular cases and corrective action? which animals are carriers? signs, symptoms, prevention, treatment? overview / encyclopedic entry?
Q376          mainstreaming             of children with physical or mental impairments? of physically disabled persons in general? of tribal / marginalised communities? of juvenile delinquents?

Table 7: Examples of underspecified queries. The interpretation in italics is the one specified in a longer version of the query (specifically, in the description field).

Expansion of queries containing polysemous words. Before expansion, query terms need to be disambiguated, either explicitly via WSD, or implicitly. Disambiguation is particularly important when expanding queries using resources like WordNet or Wikipedia.

[10] http://www.jaguarind.com/aboutus/aboutus.html, http://www.jaguarltg.com/
[11] http://yippy.com
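As a sketch of explicit disambiguation before expansion, the following implements a simplified Lesk algorithm over a tiny hand-made sense inventory; the inventory and its glosses are hypothetical, not drawn from WordNet:

```python
def lesk(context_terms, sense_glosses):
    """Pick the sense whose gloss shares the most words with the
    query context (simplified Lesk). Expansion terms would then be
    drawn only from the winning sense."""
    ctx = set(context_terms)
    return max(sense_glosses,
               key=lambda s: len(ctx & set(sense_glosses[s].split())))

# Hypothetical sense inventory for the ambiguous acronym "SVM"
# (see the examples in Section 3.1).
senses = {
    "support_vector_machine": "machine learning classifier pattern recognition",
    "school_of_veterinary_medicine": "veterinary school admission students",
}
s1 = lesk(["pattern", "recognition"], senses)
s2 = lesk(["admission", "criteria"], senses)
```

With real resources, the glosses would come from WordNet synsets or Wikipedia disambiguation pages, and the context would include the other query terms or the session history.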
Since these resources have broad coverage, expansion without prior disambiguation may result in the inclusion of many terms related to irrelevant senses of the query term(s). Indeed, the failure of traditional WordNet-based QE approaches has been attributed to this problem (citation?? Ch. Voorhees). If disambiguation is not possible, then interaction with the user is needed. Apart from query WSD, WSD may also be applied to documents, but this practice is computationally expensive, and thus not widespread in practice [citation??].

3.3.2 Underspecified queries

A user strongly focussed on a particular aspect of a topic may be temporarily oblivious to other aspects of the topic when searching for information. Thus, the user may not specify which particular aspect related to the search keyword(s) she is interested in. Alternatively, she may not be able to think of a precise formulation for her information need on the spur of the moment, and may provide only a broad specification of the topic of interest. In such cases, the user's information need may remain unclear from the query words, even when these words are not polysemous. Table 7 shows a few possible interpretations of some TREC queries of this kind.

Given that such queries are open to multiple interpretations by humans, some level of user interaction or true relevance feedback is likely to be unavoidable in order to obtain satisfactory results from a search engine. If the documents retrieved in response to the initial query turn out to be satisfactory, simple PRF is likely to be beneficial (but, of course, no further QE may be necessary). Otherwise, the best option for the retrieval system may be to present a diversified set of results (as discussed in Section 3.3.1).
The user can then provide some feedback by marking document sets or individual documents, or may simply select one of the sets, if appropriate.

Recent research has explored the possibility of obtaining implicit feedback via eye tracking or other neurophysiological signals [Eugster et al., 2015, González-Ibáñez and Shah, 2015]. In a research setting, this may involve placing potentially intrusive / bothersome sensors, but with progress, non-intrusive means of obtaining feedback are likely to emerge. In such cases, a system may be able to obtain feedback directly from the natural neurophysiological signals emitted by the user (e.g., her facial expressions), without requiring any explicit action on her part.

TREC query #  Query title    Implicit context
269           Foreign Trade  Location (foreign == countries other than the US)

Table 8: Examples of queries containing implicit context.

3.4 Context implicit in queries

Quite often, when a person in a particular situation converts an information need to an actual query, e.g., "national elections", she may not be consciously aware that the query may have a very different interpretation for someone in a different situation. Thus, the intent of such queries becomes clear only when additional information (e.g., nationality, gender, location, time at which the query was submitted, demographic information) about the user is known. We refer to this additional information as context; such queries may be termed implicit-context queries.

Bai et al. [2007] further differentiate between context around and context within a query. In their terminology, a user's domain of interest, her background knowledge and preferences comprise the context around a query. In this section, we use the word context in this sense.
In contrast, the context within a query refers to the sense-disambiguating effect of the query words when taken together (as discussed in Section 3.3 under Implicit WSD). For example, this "internal" context determines that the word program in the query Java program is related to the word computer, but this relationship does not hold if the query is TV program. Bai et al. show how both kinds of context information may be integrated into a language modelling approach to IR. They report promising experimental results on the TREC collections.

Table 8 lists examples of such queries taken from the TREC query collection. The persons who create the TREC topics are based in the USA. Thus, the context implicit in Q269 implies that 'foreign' means countries other than the USA. The same query would be interpreted differently if it were to occur in the CLEF / FIRE / NTCIR query collections. Since implicit-context queries admit of multiple valid interpretations, they are related to ambiguous queries.

Unlike the creators of TREC topics, the overwhelming majority of Web search engine users are not trained information-seeking professionals. Thus, implicit-context queries are encountered far more frequently by Web search engines. In order to improve retrieval effectiveness for such queries, researchers have focused on personalised search [Jeh and Widom, 2003, Liu et al., 2004], and the use of contextual information during search [Coyle and Smyth, 2007]. While some systems explicitly capture or ask for contextual information [Bharat, 2000, Glover et al., 2001], others guess the context from a user's actions [Budzik and Hammond, 2000, Finkelstein et al., 2001], or from query logs [Huang et al., 2003].

3.5 Queries involving common nouns or named entities

Generally, user queries contain a significant proportion of nouns [Xu and Croft, 2000].
These nouns may be either named entities (NEs) — names of persons, places, organisations, etc. — or common nouns.

Queries containing named entities. Many TREC queries contain NEs, e.g., King Hussain (Q450), babe ruth (Q481), baltimore (Q478), Antarctica (Q353), AT&T (Q028), and Smithsonian Institute (Q686). These are usually an important (often the most important) component of the query. Thus, it may generally be assumed that an article should contain the NE in order to be relevant. Conversely, the presence of the NE in a document is a reasonable indicator of its relevance. Queries that are focussed on an NE are often relatively easy. If the query is expanded nevertheless, the relative importance of the NE with respect to other query terms should be maintained in the expanded query.

However, if the NE itself is ambiguous (e.g., Michael Jordan could refer to one of several distinct well-known persons[12]), then the issues discussed in Section 3.3 need to be addressed. An additional issue that may arise is the following. A document containing the NE will usually also contain a number of pronouns referring to the NE. Anaphora or coreference resolution — the process of identifying pronominal references or alternative names for a named entity — may therefore be useful.

TREC query #  Query title                                     Possibly relevant snippets
Q109          Find Innovative Companies                       Sony was the first to introduce a video cassette format . . .
Q172          The Effectiveness of Medical Products and       Nicorette provides nicotine gum and nicotine lozenges to help you quit smoking.
              Related Programs Utilized in the Cessation
              of Smoking.
Q194          The Amount of Money Earned by Writers           J.K. Rowling has been paid around three quarters of a billion dollars by Warner Brothers . . .

Table 9: Examples of queries containing common nouns.

Queries containing common nouns.
It may be much harder to obtain satisfactory results if an important aspect of the query is specified via a common noun. Table 9 shows a few examples of TREC queries belonging to this category. The words "Companies" and "Writers" are common nouns. It is entirely likely that relevant documents for these queries will contain the names of specific companies or authors, rather than the corresponding common nouns in their surface forms. Thus, during expansion, such queries should be handled differently from queries containing named entities.

In some cases, expanding common nouns in the original query using names of specific instances may be useful. For example, the term 'writers' may be expanded by adding the names of some popular writers. The expanded query should be appropriately structured, for example, by including the names as a list of disjuncts along with the term writer. This presupposes access to appropriate ontologies or gazetteer lists that provide, for example, a list of author or company names. If such resources are available, it would be more efficient to use these during indexing, i.e., documents that contain specific author names could be tagged with the terms writer or author. The system also needs to address the additional issue of selecting which common nouns are to be expanded, since expanding any common noun present in the query may not be a good idea.

Interestingly, Buckley [2009] provides an example of a query that belongs to this category even though it contains an NE. TREC topic 398 (Identify documents that discuss the European Conventional Arms Cut as it relates to the dismantling of Europe's arsenal.) turns out to be problematic because the word 'Europe' is too general; relevant documents are likely to discuss moves made by specific European countries towards disarmament.
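The disjunct-based structuring described above might look as follows. This is only a sketch: it uses a generic Lucene-style AND/OR syntax, and the gazetteer entries are hypothetical:

```python
def expand_common_noun(query_terms, noun, instances):
    """Replace a common noun by a disjunction of the noun and some
    specific instances from a gazetteer, keeping the rest of the
    query conjunctive."""
    parts = []
    for t in query_terms:
        if t == noun:
            # Group the noun with its instances as a list of disjuncts.
            parts.append("(" + " OR ".join([t] + instances) + ")")
        else:
            parts.append(t)
    return " AND ".join(parts)

# Hypothetical gazetteer entries for TREC Q194 (money earned by writers).
query = expand_common_noun(["money", "earned", "writers"],
                           "writers", ["rowling", "king"])
```

A real system would draw the instance list from an ontology or gazetteer, and would weight the original noun higher than the added instances.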
[12] https://en.wikipedia.org/wiki/Michael_Jordan_(disambiguation)

TODO: killer bee example

3.6 Queries containing negative terms

Sometimes, users may be able to anticipate the types of irrelevant documents that may be retrieved in response to a given query. In such situations, a user may want to provide a detailed statement of her information need that also explicitly specifies what the user is not looking for. Any keywords that are used to characterise irrelevant information are referred to as negative terms. Consider the query "terrorist attacks on the US other than 9/11". Since the user has explicitly specified that she is not looking for information about the 9/11 attack, this term should be counted as a negative term for this query. Likewise, if a user is looking for local restaurants besides those that serve Chinese food, she may submit "restaurants not serving Chinese food" as her query. For this query, Chinese would count as a negative term. This example is more complex, however, since serving and food should probably not be counted as negative terms, even though the negation qualifies these terms syntactically. Table 10 shows some examples of TREC / INEX queries that contain negative terms.

During expansion, queries that contain negative terms need to be handled carefully. If the negating qualifiers are ignored (as they usually are), QE is likely to add terms related to topics that are explicitly designated as irrelevant, leading to a drop in performance. If the negative terms can be identified, then they may simply be removed from the original query. A more aggressive approach would be to include the negative terms in a NOT clause within a structured query. Naturally, for this method to work, negative terms have to be identified with high accuracy.
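The two strategies just described (dropping negative terms, or pushing them into a NOT clause) can be sketched as follows, assuming the negative terms have already been identified and that negation words such as "not" are stripped as stop-words:

```python
def apply_negative_terms(query_terms, negative_terms, use_not_clause=False):
    """Either drop the identified negative terms before expansion, or
    (more aggressively) move them into an explicit NOT clause of a
    structured query."""
    positive = [t for t in query_terms if t not in negative_terms]
    if use_not_clause and negative_terms:
        return " ".join(positive) + " NOT (" + " ".join(negative_terms) + ")"
    return " ".join(positive)

# "restaurants not serving Chinese food", with 'chinese' identified
# as the negative term (and 'not' already removed as a stop-word).
q = ["restaurants", "serving", "chinese", "food"]
dropped = apply_negative_terms(q, ["chinese"])
clause = apply_negative_terms(q, ["chinese"], use_not_clause=True)
```

Either output can then be expanded safely, since candidate terms related to the negative topic are no longer drawn in (or are explicitly excluded by the NOT clause).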
To the best of our knowledge, approaches that try to address what the user does not want have so far focused only on the initial (verbose) queries. For example, Pramanik et al. [2015] propose a method to automatically identify negative terms in verbose queries and to remove them before initial retrieval. This method is reported to yield improvements across a number of collections and various retrieval models. We expect that these improvements will also lead to post-QE improvements.

Query #     Query title                           Narrative
TREC Q124   Alternatives to Traditional Cancer    . . . any attempt to experiment with or demonstrate the efficacy of any non-chemical, non-surgical, or non-radiological approach to preventing or curing cancer . . .
            Therapies
INEX Q419   film starring +"steven seagal"        . . . films played by Steven Seagal, not produced by him.

Table 10: Examples of queries containing negative terms / aspects.

3.7 Multi-aspect queries

A multi-aspect query is one that seeks information about a particular aspect of a broader topic.[13] Multi-aspect queries are best understood via examples. Consider the query "Terrorist attacks on Amarnath pilgrims." One could regard "Amarnath pilgrims" as the primary topic of the query. There are various sub-topics of this general topic: travel routes taken by the pilgrims, places for pilgrims to stay along the way, etc. In this query, the user is interested in one specific sub-topic related to Amarnath pilgrims. TREC query 203, on the economic impact of recycling tires, is a similar example. The broad topic of this query is recycling, but the user is only interested in the recycling of tires (rather than other material), and more specifically in the economic impact thereof (rather than, say, the technology involved). Table 11 lists a few more examples of multi-aspect queries from the TREC query set.

[13] This definition of "multi-aspect" may appear confusing. However, historically, the broad topic and the particular facet of the topic that the user is interested in have been regarded as the multiple aspects of the query [Mitra et al., 1998, Buckley, 2009].

TREC query #  Query title                             Aspects
Q100          Controlling the Transfer of High        1. High Technology  2. Transfer  3. Controlling
              Technology
Q294          Animal husbandry for exotic animals     1. Animal husbandry  2. exotic animals
Q299          Impact on local economies of military   1. military downsizing  2. local economies  3. Impact
              downsizing
Q321          Women in Parliaments                    1. Women  2. Parliaments

Table 11: Examples of queries containing multiple aspects.

Sometimes, a user may designate multiple sub-topics of a topic as interesting. For a user who is interested in "causes and effects of railway accidents", documents exclusively discussing either the causes or the effects of a railway accident are generally regarded as relevant. Such queries that are "disjunctive" in a sense (but possibly conjunctive in form) have a broader scope than the examples discussed above, and are expected to be easier to handle. Multi-aspect queries are usually hard when the multiple aspects are combined in a conjunctive sense.

Buckley [2009] contains a detailed analysis of why automatic IR systems frequently find multi-aspect queries hard. Quite often, AQE methods add terms that are mostly related to the general topic of the original query (e.g., recycling for TREC Q203 discussed above). This overemphasises one aspect of the query at the expense of the others, and usually leads to query drift. Ideally, during expansion, multi-aspect queries should be expanded in a balanced way, i.e., using terms related to all (or most) of the multiple aspects.
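Balanced expansion can be sketched as follows: each aspect contributes at most k of its own candidate terms, so no single aspect dominates the expanded query. Aspect segmentation and candidate ranking are assumed to be done elsewhere, and the candidate lists below are hypothetical:

```python
def balanced_expand(aspects, candidates, k=2):
    """Expand each aspect with at most k of its own (pre-ranked)
    candidate terms, so that expansion does not overemphasise the
    general topic at the expense of the other aspects."""
    expanded = []
    for aspect in aspects:
        expanded.append(aspect)
        expanded.extend(candidates.get(aspect, [])[:k])
    return expanded

# TREC Q203: economic impact of recycling tires (candidates hypothetical).
aspects = ["recycling", "tires", "economic impact"]
cands = {"recycling": ["reuse", "waste", "landfill"],
         "tires": ["rubber"],
         "economic impact": ["cost", "revenue"]}
expanded = balanced_expand(aspects, cands)
```

Note that unbalanced expansion would instead take the globally top-scoring candidates, which for Q203 would mostly be recycling-related terms and would cause the drift described above.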
This requires systems to be able to (i) recognise the various aspects of a query, and (ii) identify which aspect(s) of the query a candidate expansion term is related to. Mitra et al. [1998] studied some preliminary methods (both manual and automatic) that try to prevent query drift by ensuring that the query is expanded in a balanced way. AbraQ, an approach described by Crabtree et al. [2007], attempts balanced query expansion in a Web search setting by first identifying the different aspects of the query, identifying which aspects are under-represented in the result set of the original query, and finally, identifying expansion terms that would strengthen that particular aspect of the query.

Zhao and Callan [2012] also identify "problematic" query terms — terms that are probably not present in relevant documents — on the basis of the term's idf, or by the predicted probability of that term occurring in the relevant documents. These query terms are selectively expanded. The final expanded query is a structured query in Conjunctive Normal Form (CNF), with each conjunct expected to correspond to a query term (or aspect) and its synonyms. The authors argue that the use of CNF ensures balanced expansion, minimises topic drift, and yields stable performance across different levels of expansion.

TREC query #  Query title                              Abstract concepts
142           Impact of Government Regulated Grain     Impact; Government Regulated; (International) Relations.
              Farming on International Relations
352           British Chunnel impact                   impact.
353           Antarctica exploration                   exploration.
389           Illegal technology transfer              Illegal (other than peaceful purposes); technology transfer (selling their products, formulas, etc.).

Table 12: Examples of queries containing abstract or "high-level" terms.

Wu et al.
[2012] propose a different approach within a true relevance feedback framework that may also be regarded as being targeted towards balanced expansion. This approach attempts to diversify the set of documents judged by a user. Instead of simply letting the user judge the top-ranked results returned in response to the initial query, the system partitions the initially retrieved documents into sub-lists, and reranks the documents on the basis of the query term patterns that occur in them (i.e., whether a document contains only a single term, multiple terms occurring as a phrase, or in close proximity, etc.). The documents are then presented iteratively to the user for judgment.

3.8 "High-level" queries

Some queries, such as those shown in Table 12, contain terms that correspond to abstract or "high-level" concepts. These terms may not themselves be present in relevant documents; instead, other more concrete terms may be used to convey specific instances of the same concept. If one or more such abstract terms form an important component of an information need, we refer to the corresponding query as a high-level query. 'Impact' and 'effect' are typical examples of such high-level terms. Consider the query "effect of tsunami", for example. Here 'effect' is a high-level term, and refers to anything that happened as a result of a tsunami. A relevant document may not contain the term 'effect'; instead, it may describe the effect of a tsunami using words such as 'death toll', 'property damage', etc. TREC query 389 ("illegal technology transfer") is another example. The description field of the query asks: "What specific entities have been accused of illegal technology transfer such as: selling their products, formulas, etc. directly or indirectly to foreign entities for other than peaceful purposes?" 'Technology transfer' is thus an abstract concept.
Relevant documents may or may not contain this term. Instead, they may contain terms like ‘sell’ or ‘license’ that describe concrete methods of technology transfer.

Difference with queries involving common nouns (Section 3.5). There is a subtle difference between high-level queries and queries involving common nouns (discussed in Section 3.5). Consider an example from Table 9: writers. The ‘instantiation’ of writers, i.e., the set of persons who are writers, is not dependent on the query context. In contrast, an abstract term may be instantiated via different sets of keywords, depending on the subject or domain of the query. The ‘effects’ or ‘impact’ of a natural disaster, a foreign tour by a head of state, or of substance abuse are likely to be described using different words. Thus, finding ‘bag-of-words’ equivalents of such concepts, being context-sensitive, is more difficult. As a result, SEs often fail to retrieve an adequate number of relevant documents in response to high-level queries. For the same reason, correctly expanding such queries automatically is also challenging. Roussinov [2010] shows that external corpora may be mined to obtain words or word sequences (conditionally) related to high-level query terms. For example, in TREC query 353, the notion of exploration may be indicated by the word station, provided it occurs along with the word Antarctica, but not as a part of a phrase such as train station.

3.9 Recall-oriented queries

In certain situations, recall is of paramount importance to the user. Queries issued by a user in such situations can be termed recall-oriented. The TREC million query track [Allan et al., 2007] defines recall-oriented queries as “looking for deeper, more open-ended information whereas precision-oriented queries are looking for a small, well contained set of facts”.
Some typical recall-oriented search tasks are:[14]

• E-discovery: searching for documents required for disclosure in a legal case [Oard et al., 2010, Oard and Webber, 2013].

• Prior-art patent search: looking for existing patents which might invalidate a new patent application.

• Evidence-based medicine: finding all prior evidence on treatments for a medical case.

For these tasks, having to examine several irrelevant documents may be an acceptable overhead, but the penalty for missing a relevant document is likely to be high. The TREC legal track models a recall-oriented task. Query 100 from this track reads: “Submit all documents representing or referencing a formal statement by a CEO of a tobacco company describing a company merger or acquisition policy or practice”. Note that the query explicitly requires all relevant documents to be retrieved. This is in contrast to casual, ad hoc searches, in which users are generally satisfied by a small number of relevant documents retrieved at the top ranks.

Table 13 shows that recall generally increases with the number of terms added to a query during QE. Thus, for recall-oriented queries, massive query expansion, i.e., expansion by adding a very large number of potentially useful terms that occur in at least one relevant document, may be a good idea. However, the risk of query drift increases significantly if massive expansion is based on PRF. Relevance feedback involving some user interaction may be necessary to ensure high recall without a concomitant loss in precision. Ghosh and Parui [2015] have recently proposed a method that uses the Cluster Hypothesis to effectively leverage only a modest amount of user interaction for high recall.
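The massive-expansion idea can be sketched as follows, assuming true relevance feedback (a set of documents the user has already judged relevant). The tokenisation, stopword list, and function names are illustrative choices for this sketch, not part of the report’s proposal.

```python
# Illustrative sketch of "massive" expansion for a recall-oriented query:
# add every non-stopword term that occurs in at least one document the
# user has judged relevant. Tokenisation and the stopword list are
# deliberately naive.

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "by"}

def massive_expand(query, relevant_docs):
    """Return the query terms plus all non-stopword terms from relevant docs."""
    terms = set(query.lower().split())
    for doc in relevant_docs:
        terms.update(tok for tok in doc.lower().split()
                     if tok not in STOPWORDS)
    return sorted(terms)

judged_relevant = ["CEO statement on the merger policy",
                   "tobacco company acquisition practice"]
print(massive_expand("company merger policy", judged_relevant))
# ['acquisition', 'ceo', 'company', 'merger', 'policy', 'practice', 'statement', 'tobacco']
```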
3.10 Domain-specific queries

Queries which are related to, and need information from, one specific domain (e.g., sports, medicine, law) are called domain-specific queries. Earlier work on classifying queries according to their domain has been discussed in Section 2.1. Over the years, TREC has offered a number of tasks that address IR for specific domains / genres. Table 14 provides a non-exhaustive list of some of these tasks. Domain-specific queries constitute a special case of Vertical search,[15] where the system caters to users interested in a particular type of online content. Vertical searches may focus not only on a particular domain or topic, but also on a specific media type or genre of content, e.g., image or video search, shopping, travel, and scholarly literature.

[14] http://www.isical.ac.in/~fire/2011/slides/fire.2011.robertson.stephen.pdf
[15] https://en.wikipedia.org/wiki/Vertical_search

Table 13: Effect of increasing the degree of expansion on recall on the TREC678 collection (expansion method used: KLD, no. of top documents: 40).

#Terms   #rel-ret (among top 1000)   recall@1000   MAP
10       8273                        0.6686        0.2452
20       8442                        0.6784        0.2525
30       8530                        0.6851        0.2561
40       8556                        0.6891        0.2574
50       8551                        0.6901        0.2586
60       8562                        0.6906        0.2586
70       8587                        0.6922        0.2595
80       8589                        0.6927        0.2601
90       8602                        0.6938        0.2605
100      8605                        0.6943        0.2611

Table 14: TREC tracks that focus on domain-specific IR.

Track name          Domain
Legal               Legal documents
Enterprise search   Searching an organisation’s data
Genomics            Genomics data (broadly construed to include not just gene sequences but also supporting documentation such as research papers, lab reports, etc.)
Chemical            Information retrieval and extraction tools for chemical literature
Medical records     Free-text fields of electronic medical records

Expansion strategy. For some domains, it should be possible to leverage domain-specific lexical resources for expansion.
For example, MeSH or the UMLS Metathesaurus may be used to expand queries in the biomedical domain. Hersh et al. [2000] have reported on the effectiveness of using the UMLS Metathesaurus for QE. Similarly, Lu et al. [2009] have studied expansion of PubMed queries using MeSH. Naturally, in order to utilise such domain-specific ontologies, a system should be able to identify the target domain of user queries with reasonable accuracy. On the other hand, if the user explicitly indicates the domain of the query, this not only eliminates the query-classification step, but should also help to reduce any ambiguity that might be present. In recent work, Macias-Galindo et al. [2015] have confirmed that the domain of interest is important when quantifying the semantic relatedness between words. Even though their experiments were not directly related to QE, their findings are expected to be applicable when estimating the semantic relation between the query and candidate expansion terms.

3.11 Short answer type queries

Some queries need very specific and ‘to the point’ answers that comprise a few words, a single sentence, or a short passage. In such cases, the user does not want to read a full document or a long passage to find the answer. Most queries starting with ‘what’, ‘who’, ‘where’, ‘when’, ‘which’, ‘whom’, ‘whose’, ‘why’, etc. fall in this category. There are few examples of such queries in the TREC ad hoc dataset, but the query sets for the Question-Answering (QA) tasks at TREC, CLEF and NTCIR consist of these types of queries.

Systems that effectively address such queries usually have the following architecture [Prager, 2006]. The question is first analysed to determine the answer type, and to generate an appropriate keyword query. The keyword query is used to retrieve a set of passages (or documents) from a corpus.
The retrieved passages are analysed to generate a list of candidate answers. The candidate answers are further processed to generate the final ranked list of answers.

Query expansion can, and often does, play a role in retrieving passages or documents in response to the keyword query. In one of the best-known QA systems [Pasca and Harabagiu, 2001, Moldovan et al., 2003], some of the question words are selected as keywords (using mainly part-of-speech information). The original question is parsed to determine dependencies between the question words, which are in turn used to order the list of selected keywords. These keywords are also spell-checked; spelling variants are added to the query if necessary. The most important of these keywords are used to retrieve documents using the Boolean model. From these documents, the system extracts paragraphs or smaller text passages containing all keywords in close proximity of one another. If too many paragraphs are retrieved, the query is expanded by including additional terms from the list of keywords; if too few paragraphs are retrieved, some of the keywords from the initial query are dropped. The system also employs QE in a more traditional way by using WordNet to expand the query keywords with morphological, lexical and semantic alternatives.

3.12 Queries that need special handling during query processing

Query processing generally includes some (or all) of the following steps: stopword removal, stemming, case normalisation, treatment of acronyms and numbers, handling spelling errors, etc. In this section, we consider queries that need special handling during query processing, i.e., queries for which the general (or “standard”) query processing methods would result in a loss of some important information, which in turn would lead to poor retrieval effectiveness.
Note that this special processing must be done on the initial query; the question of whether (or how) to expand the query arises later. Indeed, without this special processing, initial retrieval effectiveness may be so poor that any subsequent expansion of the query would be pointless.

• Stopword removal. Articles, conjunctions, prepositions and other frequently occurring words are discarded as stopwords because they usually have a grammatical function, and are not indicative of the subject matter of documents and queries. The words before and after are two examples of such words that are included in the default stopword list used by TERRIER. However, in a query like “increased security measures after 9/11”, the word “after” is an important qualifier. Discarding it as a stopword during indexing of documents and queries may cause problems.

• Case normalisation. Many IR systems reduce all letters to their lowercase forms during indexing. Since proper nouns can be identified by a starting capital letter, this case normalisation may result in a loss of information in some cases. In TREC query 409 (legal, Pan Am, 103), the word Am is not actually used as a stopword; however, through a combination of case normalisation and stopword removal, many systems would incorrectly discard this word from the query. This problem would also arise if the acronym U.S. were written as US. The Smart system ran into a related problem during the initial years of TREC: because the ampersand in query 028 (AT&T’s Technical Efforts) was treated as a word delimiter, AT&T’s was tokenised as at, &, t, and ’s after case normalisation, and all four tokens were discarded, resulting in very poor performance for this query. This problem might also occur with TREC query 391 (R&D drug prices), with R&D being tokenised as r and d.

• Identification of numbers.
A user query may contain numbers denoting a year, a flight number, or something similar which is an integral part of the information need. Simply ignoring numbers during indexing of either documents or queries (such as TREC query 409 discussed above) may have a significant detrimental effect.

• Stemming. Stemming is used to conflate morphological variants of a word to a canonical form, so that a keyword in a query matches a variant occurring in a document. Whether a query word should be stemmed or not often depends on the query, and more specifically on the sense of the query word. For example, in a query about Steve Jobs, the word ‘Jobs’ should not be stemmed to ‘job’. Similarly, the word ‘apples’ occurring in a document about the fruit should not be stemmed to match the word ‘Apple’ in a query about Apple’s marketing strategy for the iPhone. Paik et al. [2013] show that a query-specific stemming approach is significantly more effective than applying a generic stemmer uniformly to all queries and documents in a collection. To achieve this effect, documents should not be stemmed at the time of indexing. Instead, a given query should be expanded by adding to it only the desirable variants of its keywords.

• Indexing phrases. The question of whether to use phrases — multiple words that occur contiguously or in close proximity and constitute a single semantic unit, e.g., blood cancer, machine learning — during indexing and retrieval has been investigated in a number of studies [Fagan, 1987, Mitra et al., 1997]. This question is also tied to the issue of whether to use phrases during QE. The use of phrases has been found to generally improve performance, though its effect is not always significant. Song et al. [2006] show that keyphrases extracted from retrieved documents may be useful as expansion terms.
Their keyphrase extraction algorithm makes use of the occurrences of stopwords in the documents. Thus, in order to use their method in a practical SE, documents and queries need special handling during indexing and retrieval.

Multilingual queries. Multilingual queries, i.e., queries that make use of words from more than one language (say, L1, L2, ..., Lk), are a particular class of queries that need special handling. Such queries [Mustafa et al., 2011] are common in multilingual countries or communities like India or the EU. A number of factors lead to the creation of multilingual queries.

• The amount and variety of native-language content on the Web is still rather low for many languages, e.g., Assamese or Punjabi. An Assamese user may be able to read English fluently, and is thus likely to know the most important English keywords related to her information need. At the same time, she may be unable to find appropriate English words to completely formulate her query in English. For example, consider a user who is looking for the differences between interpreters and assemblers. For such a user, it would be natural to submit a query that mixes the English words interpreters and assemblers with the Assamese equivalents of difference and the remaining words.

• In a country like India, where the language used at work is often English, users may not be familiar with the local equivalents of all technical terms. If such a user is specifically interested in a technical or official document in her native language, her natural tendency would be to search using a mix of English and native-language words.

• Some English terms are very commonly used in non-English-speaking regions. For example, in Bengali documents, the term ‘recipe’ is more likely to be used than the Bengali equivalent (randhanpronali).
In addition, documents may use either the Bengali transliteration of ‘recipe’, or the original Roman form of the word. An experienced user who is aware of this may include all three terms in her query for better recall.

When processing a multilingual query, a system needs to address the following problems.

• Source language identification. If words from multiple languages are present in the query, then their respective languages have to be identified. This is trivial if the languages use distinctive scripts, but if any of the languages involved shares its script with other languages, the language identification problem becomes harder. If different inverted indices are maintained for different languages, the system also needs to determine which target collections need to be searched for a given multilingual query.

• Transliteration. For a very long time, native-language keyboards were a rarity for many languages. Users of these languages were habituated to using the Roman script when writing in their language. Such habits die hard, and many users continue to prefer using the Roman script to write in their language. In order to retrieve documents in the original language, the system needs to first back-transliterate words from Roman to the native language. Moreover, if a query entered by such a user is multilingual, word-level language identification may be harder, since the Roman script is used for all words. The problem is compounded further if, after transliteration, words in the user’s native language match valid English words. For example, More is both a valid English word and a reasonably common surname in Marathi. Similarly, Shulk is a fictional character and the main protagonist in a popular video game; it also means tax in Hindi and other Indian languages.
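A minimal sketch of word-level language labelling for a Roman-script multilingual query is shown below. The two wordlists and the label_words function are hypothetical stand-ins for real lexicons and transliteration models; the point is simply that a word like More, valid in both lists, can only be flagged as ambiguous at this stage.

```python
# Illustrative sketch of word-level language labelling for a Roman-script
# multilingual query using wordlist lookups. Both wordlists are tiny
# hypothetical stand-ins for real lexicons; ambiguous words (valid in
# both languages) are flagged rather than resolved.

ENGLISH = {"more", "tax", "recipe", "interpreter"}
INDIC_ROMANISED = {"more", "shulk", "ani"}  # Romanised Marathi/Hindi words

def label_words(query):
    """Label each query word as english / transliterated / ambiguous / unknown."""
    labels = []
    for w in query.lower().split():
        in_en, in_indic = w in ENGLISH, w in INDIC_ROMANISED
        if in_en and in_indic:
            labels.append((w, "ambiguous"))
        elif in_indic:
            labels.append((w, "transliterated"))
        elif in_en:
            labels.append((w, "english"))
        else:
            labels.append((w, "unknown"))
    return labels

print(label_words("More shulk recipe"))
# [('more', 'ambiguous'), ('shulk', 'transliterated'), ('recipe', 'english')]
```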
Some of these problems are being studied within the “Search in the Transliterated Domain” track at FIRE. This track involves two subtasks: (i) given a multilingual query, label each word with its language; and (ii) reverse-transliterate non-English words written in Roman script into their native script. If these problems are not properly addressed, QE may hurt performance. If an important expansion term in one language happens to be a valid word in another language, the system also needs to carefully consider the net benefit of including such an ambiguous term in the expanded query when retrieving documents from a multilingual collection.

Noisy queries. Finally, we consider queries that need special handling because of the presence of errors or noise. Such noise can be introduced because of spelling errors committed by the user, or because queries are submitted via a noise-inducing interface, e.g., spoken queries, querying via mobile messaging, and queries written by hand using a stylus. Since queries in most test collections are methodically created by experienced or professional users of IR systems, such queries are usually free from noise. TREC query 464 — nativityscenes — is one of the rare TREC queries that contain an error. However, query logs of practical search engines are likely to have large numbers of such examples. Noisy queries also need special handling, usually spelling correction. A fair amount of work has recently been done on spelling correction in queries [Gao et al., 2010, Duan and Hsu, 2011, Li et al., 2012], and a large number of patents exist for such techniques. Some of these methods are employed in many practical Web search engines, which are often able to suggest or even automatically provide corrections for such noisy queries.
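As a simple illustration of dictionary-based spelling correction, the sketch below picks the vocabulary word nearest to a noisy term under Levenshtein edit distance. The vocabulary is made up for illustration; practical engines additionally weigh term frequencies from query logs and an error ("channel") model, as in the ranker-based approaches cited above.

```python
# Illustrative sketch of dictionary-based spelling correction: pick the
# vocabulary word with the smallest Levenshtein edit distance from the
# noisy term. The vocabulary here is made up for illustration.

def edit_distance(a, b):
    """Levenshtein distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def correct(word, vocabulary):
    """Return the in-vocabulary word closest to `word`."""
    return min(vocabulary, key=lambda v: edit_distance(word, v))

vocab = {"nativity", "scenes", "native", "science"}
print(correct("nativty", vocab))  # prints "nativity"
```

Note that a run-together query like nativityscenes needs word segmentation rather than single-word edit distance, which is one reason practical correctors go beyond this sketch.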
4 Conclusions and future work

Query expansion is a standard technique for addressing the well-known vocabulary mismatch problem faced by IR systems. Over the years, a number of effective QE techniques have been proposed. However, the effect of different QE techniques on individual queries can vary greatly. Our long-term goal is to improve overall performance by applying QE techniques tailored to a given query, rather than applying the same general QE method to all queries. To this end, we have proposed a taxonomy of query classes. Not all the proposed query categories are new. However, we have specifically considered query categorisation from a QE perspective. We have discussed the properties of each query class with examples. We have also proposed some QE strategies that might be effective for each query category.

We believe that there is significant scope for future work in a careful investigation of the most effective QE techniques for each query class. Our next step will be to come up with more precise formulations of QE techniques for the various categories, and to test these proposed techniques using standard datasets. While for many query categories such testing can be done using TREC datasets, a few categories pertain specifically to Web queries. An additional challenge will be to automatically detect the type of a given query. This is likely to be straightforward for some query types, but we will need to systematically study automatic query classification approaches in future work.

To conclude, we believe that in the near future, as the Web continues to grow and search becomes an ever more frequent activity, IR systems will need customised methods for individual queries and users. The work described in this report is an initial step in this direction.
A TREC: Text REtrieval Conference

Experimental IR, like any other experimental discipline, depends heavily on the existence of standardised benchmark datasets, or test collections. A test collection in IR is a collection of documents along with a set of test queries. The set of relevant articles for each query is also known. To measure the effectiveness of a technique, documents are retrieved using that technique for each test query in the collection. Using the relevance information for the queries, the average precision value can be computed for each query. The mean average precision for the entire query set is then calculated. Different techniques can be compared using the average precision figures they yield on a given test collection. Obviously, techniques that perform well across a wide variety of test collections can be regarded as robust.

For our experiments, we use parts of the TREC collection [Voorhees and Harman, 2005]. TREC (Text REtrieval Conference) is an ARPA and NIST co-sponsored effort that brings together information retrieval researchers from around the world to discuss their systems and to evaluate them on a common test platform. The documents and queries in this collection are described below.

A.1 Documents

The TREC document collection consists of a large number of full-text documents drawn from a variety of sources. The documents are stored on CD-ROMs, called the TREC disks. The disks are numbered, and a combination of several disks can be used to form a text collection for experimentation. Some statistics about the data on the various disks are listed in Table 15 (adapted from [Voorhees and Harman, 1998]). The sources for the data are:

• Disk 1
  – AP Newswire, 1989. (AP)
  – Short abstracts from U.S. Department of Energy publications. (DOE)
  – U.S. Federal Register, 1989. (FR)
  – Wall Street Journal, 1987–1989.
(WSJ)
  – Articles from Computer Select disks, Ziff Davis Publishing. (ZIFF)

• Disk 2
  – AP Newswire, 1988. (AP)
  – U.S. Federal Register, 1988. (FR)
  – Wall Street Journal, 1990–1992. (WSJ)
  – Articles from Computer Select disks, Ziff Davis Publishing. (ZIFF)

• Disk 3
  – AP Newswire, 1990. (AP)
  – U.S. Patents, 1993. (PAT)
  – San Jose Mercury News, 1991. (SJMN)
  – Articles from Computer Select disks, Ziff Davis Publishing. (ZIFF)

• Disk 4
  – Financial Times, 1991–1994. (FT)
  – U.S. Federal Register, 1994. (FR)
  – U.S. Congressional Record, 1993. (CR)

• Disk 5
  – Foreign Broadcast Information Service. (FBIS)
  – LA Times. (LAT)

A.2 Queries

The queries are natural-language queries supplied by users. Most queries consist of 3 parts:

• Title: a few keywords (usually 2–3) related to the user’s query,
• Desc (description): a short, natural-language statement of the user’s information need,
• Narr (narrative): a more detailed specification of what makes a document relevant for the corresponding topic.

The queries have varied widely from year to year. At the first two conferences, TREC-1 and TREC-2, the queries were quite long and represented long-standing user information needs. Reflecting a trend towards realistic user queries, the queries for TREC-3 were considerably shorter, and the queries for TREC-4 were just a sentence or two. Some characteristics of the query sets are shown in Table 16 (the training queries were provided to help train systems for TREC-1). Users also provide relevance judgments (i.e., they specify which documents are useful and which are non-relevant) for the documents in the collection. These judgments enable us to measure the retrieval effectiveness (using average precision figures) of our algorithms.
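The evaluation procedure just described can be made concrete with a short sketch of average precision (AP) for a single query and mean average precision (MAP) over a query set. The document identifiers below are made up for illustration.

```python
# Sketch of the evaluation described above: average precision (AP) for a
# single query, and mean average precision (MAP) over a query set.

def average_precision(ranked_ids, relevant_ids):
    """Sum precision@k at each rank k holding a relevant document,
    divided by the total number of relevant documents."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, 1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(runs):
    """`runs` is a list of (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

ranking, relevant = ["d3", "d1", "d7", "d2"], {"d1", "d2"}
print(average_precision(ranking, relevant))  # (1/2 + 2/4) / 2 = 0.5
```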
Table 15: TREC Document Statistics

Source   Size (MB)   Number of articles   Median terms/article   Average terms/article
Disk 1
WSJ      270         98,732               182                    329
AP       259         84,678               353                    375
ZIFF     245         75,180               181                    412
FR       262         25,960               313                    1017
DOE      186         226,087              82                     89
Disk 2
WSJ      247         74,520               218                    377
AP       241         79,919               346                    370
ZIFF     178         56,920               167                    394
FR       211         19,860               315                    1073
Disk 3
SJMN     290         90,257               279                    337
AP       242         78,321               358                    379
ZIFF     349         161,021              119                    263
PAT      245         6,711                2896                   3543
Disk 4
FT       564         210,158              316                    413
FR94     395         55,630               588                    645
CR       235         27,922               288                    1374
Disk 5
FBIS     470         130,471              322                    544
LAT      475         131,896              351                    527

Table 16: Query Statistics

Query set   Query Ids   # of Queries   Min   Max   Mean
TREC-1      51–100      50             44    250   107.4
TREC-2      101–150     50             54    231   130.8
TREC-3      151–200     50             49    180   103.4
TREC-4      201–250     50             8     33    16.3
TREC-5      251–300     50             29    213   82.7
TREC-6      301–350     50             47    156   88.4
TREC-7      351–400     50             31    114   57.6
TREC-8      401–450     50             23    98    51.8

In recent years, the TREC collection has emerged as a standard test collection for experimental IR. At TREC-6, the sixth in this series of conferences, thirty-eight groups, including participants from nine different countries and ten companies, were represented. Given the participation by such a wide variety of IR researchers, a large and heterogeneous collection of full-text documents, a sizeable number of user queries, and a set of relevance judgments, TREC has rightfully become a standard test environment for current information retrieval research.

B Bibliography

James Allan, Ben Carterette, Blagovest Dachev, Javed A. Aslam, Virgil Pavlu, and Evangelos Kanoulas. Million query track 2007 overview. In TREC, 2007.

Giambattista Amati, Claudio Carpineto, and Giovanni Romano. Query difficulty, robustness, and selective application of query expansion. In ECIR, pages 127–137, 2004.

Azin Ashkan and Charles L. A. Clarke. Characterizing commercial intent.
In CIKM, pages 67–76, 2009.

Ricardo A. Baeza-Yates, Liliana Calderón-Benavides, and Cristina N. González-Caro. The intention behind web queries. In Fabio Crestani, Paolo Ferragina, and Mark Sanderson, editors, SPIRE, volume 4209 of Lecture Notes in Computer Science, pages 98–109. Springer, 2006. ISBN 3-540-45774-7. URL http://dblp.uni-trier.de/db/conf/spire/spire2006.html#Baeza-YatesCG06.

Jing Bai, Jian-Yun Nie, Guihong Cao, and Hugues Bouchard. Using query contexts in information retrieval. In Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval, pages 15–22. ACM, 2007.

Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury, David Grossman, and Ophir Frieder. Hourly analysis of a very large topically categorized web query log. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR '04, pages 321–328, New York, NY, USA, 2004. ACM. ISBN 1-58113-881-4. doi: 10.1145/1008992.1009048. URL http://doi.acm.org/10.1145/1008992.1009048.

Steven M. Beitzel, Eric C. Jensen, Ophir Frieder, David D. Lewis, Abdur Chowdhury, and Aleksander Kolcz. Improving automatic query classification via semi-supervised learning. In Proceedings of the Fifth IEEE International Conference on Data Mining, ICDM '05, pages 42–49, Washington, DC, USA, 2005. IEEE Computer Society. ISBN 0-7695-2278-5. doi: 10.1109/ICDM.2005.80. URL http://dx.doi.org/10.1109/ICDM.2005.80.

Michael Bendersky and W. Bruce Croft. Discovering key concepts in verbose queries. In Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval, pages 491–498. ACM, 2008.

Michael Bendersky, Donald Metzler, and W. Bruce Croft.
Parameterized concept weighting in verbose queries. In Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval, pages 605–614. ACM, 2011.

Krishna Bharat. SearchPad: explicit capture of search context to support web search. Computer Networks, 33(1):493–501, 2000.

J. Bhogal, A. Macfarlane, and P. Smith. A review of ontology based query expansion. Inf. Process. Manage., 43(4):866–886, July 2007. ISSN 0306-4573. doi: 10.1016/j.ipm.2006.09.003. URL http://dx.doi.org/10.1016/j.ipm.2006.09.003.

Andrei Z. Broder. A taxonomy of web search. SIGIR Forum, 36(2):3–10, 2002.

Chris Buckley. Why current IR engines fail. Inf. Retr., 12:652–665, December 2009. ISSN 1386-4564. doi: 10.1007/s10791-009-9103-2. URL http://dl.acm.org/citation.cfm?id=1644394.1644417.

Jay Budzik and Kristian J. Hammond. User interactions with everyday applications as context for just-in-time information access. In Proceedings of the 5th international conference on intelligent user interfaces, pages 44–51. ACM, 2000.

Bin Cao, Jian-Tao Sun, Evan Wei Xiang, Derek Hao Hu, Qiang Yang, and Zheng Chen. PQC: personalized query classification. In CIKM, pages 1217–1226, 2009a.

Huanhuan Cao, Derek Hao Hu, Dou Shen, Daxin Jiang, Jian-Tao Sun, Enhong Chen, and Qiang Yang. Context-aware query classification. In SIGIR, pages 3–10, 2009b.

David Carmel and Oren Kurland. Query performance prediction for IR. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '12, pages 1196–1197, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1472-5. doi: 10.1145/2348283.2348540. URL http://doi.acm.org/10.1145/2348283.2348540.

David Carmel and Elad Yom-Tov. Estimating the query difficulty for information retrieval.
Synthesis Lectures on Information Concepts, Retrieval, and Services, 2(1):1–89, January 2010. ISSN 1947-945X, 1947-9468. doi: 10.2200/S00235ED1V01Y201004ICR015.

Claudio Carpineto and Giovanni Romano. A survey of automatic query expansion in information retrieval. ACM Computing Surveys, 44(1):article 1, January 2012.

Claudio Carpineto, Renato de Mori, Giovanni Romano, and Brigitte Bigi. An information-theoretic approach to automatic query expansion. ACM Trans. Inf. Syst., 19(1):1–27, 2001.

Ben Carterette, Virgil Pavlu, Hui Fang, and Evangelos Kanoulas. Million query track 2009 overview. In TREC, 2009.

Francesco Colace, Massimo De Santo, Luca Greco, and Paolo Napoletano. Improving relevance feedback-based query expansion by the use of a weighted word pairs approach. Journal of the Association for Information Science and Technology, 2015.

Maurice Coyle and Barry Smyth. Information Recovery and Discovery in Collaborative Web Search. In Advances in Information Retrieval (ECIR 2007), volume 4425 of LNCS, pages 356–367, 2007. ISBN 0501182047. doi: 10.1007/978-3-540-71496-5_33. URL http://dx.doi.org/10.1007/978-3-540-71496-5_33.

Daniel Wayne Crabtree, Peter Andreae, and Xiaoying Gao. Exploiting underrepresented query aspects for automatic query expansion. In Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 191–200. ACM, 2007.

Steve Cronen-Townsend, Yun Zhou, and W. Bruce Croft. Predicting query performance. In Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '02, pages 299–306, New York, NY, USA, 2002. ACM. ISBN 1-58113-561-0. doi: 10.1145/564376.564429. URL http://doi.acm.org/10.1145/564376.564429.

Steve Cronen-Townsend, Yun Zhou, and W. Bruce Croft.
A framework for selective query expansion. In Proceedings of the Thirteenth ACM International Conference on Information and Knowledge Management, CIKM '04, pages 236–237, New York, NY, USA, 2004. ACM. ISBN 1-58113-874-1.

Huizhong Duan and Bo-June Paul Hsu. Online spelling correction for query completion. In Proceedings of the 20th International Conference on World Wide Web, pages 117–126. ACM, 2011.

Manuel J. A. Eugster, Tuukka Ruotsalo, Michiel M. Spapé, Ilkka Kosunen, Oswald Barral, Niklas Ravaja, Giulio Jacucci, and Samuel Kaski. Predicting Relevance of Text from Neuro-Physiology. In Proceedings of the Neuro-Physiological Methods in IR Research - a SIGIR Workshop. ACM, 2015. URL https://sites.google.com/site/neuroir2015/papers.

Joel L. Fagan. Experiments in automatic phrase indexing for document retrieval: a comparison of syntactic and non-syntactic methods. Technical report, Cornell University, 1987.

Hui Fang. A re-examination of query expansion using lexical resources. In Proceedings of ACL-08: HLT, pages 139–147, Columbus, Ohio, June 2008. Association for Computational Linguistics. URL http://www.aclweb.org/anthology/P/P08/P08-1017.

Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, and Eytan Ruppin. Placing search in context: The concept revisited. In Proceedings of the 10th International Conference on World Wide Web, pages 406–414. ACM, 2001.

Evgeniy Gabrilovich, Andrei Broder, Marcus Fontoura, Amruta Joshi, Vanja Josifovski, Lance Riedel, and Tong Zhang. Classifying search queries using the web as a source of knowledge. ACM Trans. Web, 3(2):5:1–5:28, April 2009. ISSN 1559-1131. doi: 10.1145/1513876.1513877. URL http://doi.acm.org/10.1145/1513876.1513877.

Jianfeng Gao, Xiaolong Li, Daniel Micol, Chris Quirk, and Xu Sun.
A large scale ranker-based system for search query spelling correction. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 358–366. Association for Computational Linguistics, 2010.

Susan Gauch, Jianying Wang, and Satya Mahesh Rachakonda. A corpus analysis approach for automatic query expansion and its extension to multiple databases. ACM Trans. Inf. Syst., 17(3):250–269, July 1999. ISSN 1046-8188. doi: 10.1145/314516.314519. URL http://doi.acm.org/10.1145/314516.314519.

Kripabandhu Ghosh and Swapan Kumar Parui. Clustered semi-supervised relevance feedback. In CIKM, 2015.

Eric J. Glover, Steve Lawrence, Michael D. Gordon, William P. Birmingham, and C. Lee Giles. Web search—your way. Communications of the ACM, 44(12):97–102, 2001.

Roberto González-Ibáñez and Chirag Shah. Affective Signals as Implicit Indicators of Information Relevancy and Information Processing Strategies. In Proceedings of the Neuro-Physiological Methods in IR Research - a SIGIR Workshop. ACM, 2015. URL https://sites.google.com/site/neuroir2015/papers.

Manish Gupta and Michael Bendersky. Information Retrieval with Verbose Queries. Foundations and Trends in Information Retrieval, 9(3-4):209–354, 2015a. ISSN 1554-0669. doi: 10.1561/1500000050. URL http://www.nowpublishers.com/article/Details/INR-050.

Manish Gupta and Michael Bendersky. Information retrieval with verbose queries. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '15, pages 1121–1124, New York, NY, USA, 2015b. ACM. ISBN 978-1-4503-3621-5. doi: 10.1145/2766462.2767877. URL http://doi.acm.org/10.1145/2766462.2767877.

D. Harman. Towards interactive query expansion.
In Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '88, pages 321–331, New York, NY, USA, 1988. ACM. ISBN 2-7061-0309-4. doi: 10.1145/62437.62469. URL http://doi.acm.org/10.1145/62437.62469.

Donna Harman and Chris Buckley. Overview of the Reliable Information Access Workshop. Information Retrieval, 12(6):615–641, July 2009. ISSN 1386-4564. doi: 10.1007/s10791-009-9101-4. URL http://link.springer.com/10.1007/s10791-009-9101-4.

Claudia Hauff, Djoerd Hiemstra, and Franciska de Jong. A survey of pre-retrieval query performance predictors. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, pages 1419–1420. ACM, 2008. URL http://dl.acm.org/citation.cfm?id=1458311.

Ben He and Iadh Ounis. Query performance prediction. Inf. Syst., 31(7):585–594, November 2006. ISSN 0306-4379. doi: 10.1016/j.is.2005.11.003. URL http://dx.doi.org/10.1016/j.is.2005.11.003.

William Hersh, Susan Price, and Larry Donohoe. Assessing thesaurus-based query expansion using the UMLS Metathesaurus. In Proc. of the 2000 American Medical Informatics Association (AMIA) Symposium, pages 344–348, 2000.

Chien-Kang Huang, Lee-Feng Chien, and Yen-Jen Oyang. Relevant term suggestion in interactive web search based on contextual information in query session logs. Journal of the American Society for Information Science and Technology, 54(7):638–649, 2003.

Glen Jeh and Jennifer Widom. Scaling personalized web search. In Proceedings of the 12th International Conference on World Wide Web, pages 271–279. ACM, 2003.

Y. Jing and W.B. Croft. An association thesaurus for information retrieval. In Proceedings of the Intelligent Multimedia Information Retrieval Systems (RIAO '94, New York, NY), pages 146–160, 1994.
Oren Kurland, Anna Shtok, Shay Hummel, Fiana Raiber, David Carmel, and Ofri Rom. Back to the roots: A probabilistic framework for query-performance prediction. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, CIKM '12, pages 823–832, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1156-4. doi: 10.1145/2396761.2396866. URL http://doi.acm.org/10.1145/2396761.2396866.

Matthew Lease. An improved Markov random field model for supporting verbose queries. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 476–483. ACM, 2009.

Yanen Li, Huizhong Duan, and ChengXiang Zhai. A generalized hidden Markov model with discriminative training for query spelling correction. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 611–620. ACM, 2012.

Yinghao Li, Wing Pong Robert Luk, Kei Shiu Edward Ho, and Fu Lai Korris Chung. Improving weak ad-hoc queries using Wikipedia as external corpus. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '07, pages 797–798, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-597-7. doi: 10.1145/1277741.1277914. URL http://doi.acm.org/10.1145/1277741.1277914.

Fang Liu, Clement Yu, and Weiyi Meng. Personalized web search for improving retrieval effectiveness. Knowledge and Data Engineering, IEEE Transactions on, 16(1):28–40, 2004.

Zhiyong Lu, Won Kim, and W. John Wilbur. Evaluation of query expansion using MeSH in PubMed. Inf. Retr., 12(1):69–80, February 2009. ISSN 1386-4564. doi: 10.1007/s10791-008-9074-8. URL http://dx.doi.org/10.1007/s10791-008-9074-8.
Daniel Macias-Galindo, Lawrence Cavedon, John Thangarajah, and Wilson Wong. Effects of domain on measures of semantic relatedness. Journal of the Association for Information Science and Technology, 2015.

George A. Miller. WordNet: a lexical database for English. Commun. ACM, 38:39–41, November 1995. ISSN 0001-0782. doi: 10.1145/219717.219748. URL http://doi.acm.org/10.1145/219717.219748.

Mandar Mitra, Chris Buckley, Amit Singhal, Claire Cardie, et al. An analysis of statistical and syntactic phrases. In RIAO, volume 97, pages 200–214, 1997.

Mandar Mitra, Amit Singhal, and Chris Buckley. Improving automatic query expansion. In SIGIR '98, pages 206–214, 1998.

Dan Moldovan, Marius Paşca, Sanda Harabagiu, and Mihai Surdeanu. Performance issues and error analysis in an open-domain question answering system. ACM Transactions on Information Systems, 21(2):133–154, 2003. ISSN 1046-8188. doi: 10.1145/763693.763694.

Mohammed Mustafa, Izzedin Osman, and Hussein Suleman. Indexing and weighting of multilingual and mixed documents. In Proceedings of the South African Institute of Computer Scientists and Information Technologists Conference on Knowledge, Innovation and Leadership in a Diverse, Multidisciplinary Environment, SAICSIT '11, pages 161–170, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0878-6. doi: 10.1145/2072221.2072240. URL http://doi.acm.org/10.1145/2072221.2072240.

Roberto Navigli. Word sense disambiguation: A survey. ACM Comput. Surv., 41(2):10:1–10:69, February 2009. ISSN 0360-0300. doi: 10.1145/1459352.1459355. URL http://doi.acm.org/10.1145/1459352.1459355.

Hwee Tou Ng. Does word sense disambiguation improve information retrieval? In Proceedings of the Fourth Workshop on Exploiting Semantic Annotations in Information Retrieval, ESAIR '11, pages 17–18, New York, NY, USA, 2011. ACM. ISBN 978-1-4503-0958-5.
doi: 10.1145/2064713.2064724. URL http://doi.acm.org/10.1145/2064713.2064724.

Douglas W. Oard and William Webber. Information retrieval for e-discovery. Information Retrieval, 7(2-3):99–237, 2013.

Douglas W. Oard, Jason R. Baron, Bruce Hedin, David D. Lewis, and Stephen Tomlinson. Evaluation of information retrieval for e-discovery. Artificial Intelligence and Law, 18(4):347–386, 2010.

Jiaul H. Paik and Douglas W. Oard. A fixed-point method for weighting terms in verbose informational queries. In Proceedings of the 23rd ACM International Conference on Conference on Information and Knowledge Management, CIKM 2014, Shanghai, China, November 3-7, 2014, pages 131–140, 2014. doi: 10.1145/2661829.2661957. URL http://doi.acm.org/10.1145/2661829.2661957.

Jiaul H. Paik, Swapan K. Parui, Dipasree Pal, and Stephen E. Robertson. Effective and robust query-based stemming. ACM Transactions on Information Systems (TOIS), 31(4):18, 2013.

Marius A. Pasca and Sandra M. Harabagiu. High performance question/answering. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 366–374. ACM, 2001.

John Prager. Open-domain question–answering. Found. Trends Inf. Retr., 1(2):91–231, January 2006. ISSN 1554-0669. doi: 10.1561/1500000001. URL http://dx.doi.org/10.1561/1500000001.

Rahul Pramanik, Sukomal Pal, and Manajit Chakraborty. What the user does not want?: Query reformulation through term inclusion-exclusion. In Proceedings of the Second ACM IKDD Conference on Data Sciences, CoDS '15, pages 116–117, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3436-5. doi: 10.1145/2732587.2732606. URL http://doi.acm.org/10.1145/2732587.2732606.

Yonggang Qiu and Hans-Peter Frei. Concept based query expansion.
In Proceedings of the 16th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '93, pages 160–169, New York, NY, USA, 1993. ACM. ISBN 0-89791-605-0. doi: 10.1145/160688.160713. URL http://doi.acm.org/10.1145/160688.160713.

Dmitri Roussinov. Aspect presence verification conditional on other aspects. In Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 865–866. ACM, 2010.

M. Sanderson. Retrieving with good sense. Information Retrieval, 67:47–67, 2000. ISSN 1386-4564. doi: 10.1023/A:1009933700147. URL http://dx.doi.org/10.1023/A:1009933700147.

Mark Sanderson. Ambiguous queries: test collections need more sense. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 499–506. ACM, 2008.

Rodrygo L. T. Santos, Craig Macdonald, and Iadh Ounis. Search result diversification. Foundations and Trends in Information Retrieval, 9(1):1–90, 2015.

Hinrich Schütze and Jan O. Pedersen. Information retrieval based on word senses. In Proceedings of the Fourth Annual Symposium on Document Analysis and Information Retrieval, SDAIR, pages 161–175, 1995.

Dou Shen, Jian-Tao Sun, Qiang Yang, and Zheng Chen. Building bridges for web query classification. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '06, pages 131–138, New York, NY, USA, 2006. ACM. ISBN 1-59593-369-7. doi: 10.1145/1148170.1148196. URL http://doi.acm.org/10.1145/1148170.1148196.

Anna Shtok, Oren Kurland, and David Carmel. Predicting query performance by query-drift estimation.
In Leif Azzopardi, Gabriella Kazai, Stephen Robertson, Stefan Rüger, Milad Shokouhi, Dawei Song, and Emine Yilmaz, editors, Advances in Information Retrieval Theory, volume 5766 of Lecture Notes in Computer Science, pages 305–312. Springer Berlin Heidelberg, 2009. ISBN 978-3-642-04416-8. doi: 10.1007/978-3-642-04417-5_30.

Anna Shtok, Oren Kurland, David Carmel, Fiana Raiber, and Gad Markovits. Predicting query performance by query-drift estimation. ACM Trans. Inf. Syst., 30(2):11:1–11:35, May 2012. ISSN 1046-8188. doi: 10.1145/2180868.2180873. URL http://doi.acm.org/10.1145/2180868.2180873.

Mor Sondak, Anna Shtok, and Oren Kurland. Estimating query representativeness for query-performance prediction. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '13, pages 853–856, New York, NY, USA, 2013. ACM. ISBN 978-1-4503-2034-4. doi: 10.1145/2484028.2484107. URL http://doi.acm.org/10.1145/2484028.2484107.

Min Song, Il Yeol Song, Robert B. Allen, and Zoran Obradovic. Keyphrase extraction-based query expansion in digital libraries. In Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries, pages 202–209. ACM, 2006.

Olga Vechtomova, Stephen Robertson, and Susan Jones. Query expansion with long-span collocates. Inf. Retr., 6(2):251–273, April 2003. ISSN 1386-4564. doi: 10.1023/A:1023936321956. URL http://dx.doi.org/10.1023/A:1023936321956.

Vishwa Vinay, Ingemar J. Cox, Natasa Milic-Frayling, and Kenneth R. Wood. On ranking the effectiveness of searches. In SIGIR 2006: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, Washington, USA, August 6-11, 2006, pages 398–404, 2006. doi: 10.1145/1148170.1148239. URL http://doi.acm.org/10.1145/1148170.1148239.
Ellen Voorhees and D. Harman. Overview of the Sixth Text REtrieval Conference (TREC-6). In Proceedings of the Sixth Text REtrieval Conference (TREC-6), pages 1–24. NIST Special Publication 500-240, 1998.

Ellen M. Voorhees. Query expansion using lexical-semantic relations. In Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '94, pages 61–69, New York, NY, USA, 1994. Springer-Verlag New York, Inc. ISBN 0-387-19889-X. URL http://dl.acm.org/citation.cfm?id=188490.188508.

Ellen M. Voorhees. Overview of the TREC 2003 robust retrieval track. In TREC, pages 69–77, 2003a.

Ellen M. Voorhees. Overview of the TREC 2003 robust retrieval track. In TREC, pages 69–77, 2003b.

Ellen M. Voorhees and Donna K. Harman, editors. TREC: Experiment and Evaluation in Information Retrieval. MIT Press, 2005.

HC Wu, Robert WP Luk, Kam-Fai Wong, and Jian-Yun Nie. A split-list approach for relevance feedback in information retrieval. Information Processing & Management, 48(5):969–977, 2012.

Bo Xu, Hongfei Lin, and Yuan Lin. Assessment of learning to rank methods for query expansion. Journal of the Association for Information Science and Technology, 2015.

Jinxi Xu and W. Bruce Croft. Query expansion using local and global document analysis. In SIGIR, pages 4–11, 1996.

Jinxi Xu and W. Bruce Croft. Improving the effectiveness of information retrieval with local context analysis. ACM Trans. Inf. Syst., 18(1):79–112, 2000.

Y. Xu, G.J.F. Jones, and B. Wang. Query dependent pseudo-relevance feedback based on Wikipedia. In SIGIR 2009, pages 59–66, 2009.

Elad Yom-Tov, Shai Fine, David Carmel, and Adam Darlow. Learning to estimate query difficulty: Including applications to missing content detection and distributed information retrieval.
In Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '05, pages 512–519, New York, NY, USA, 2005. ACM. ISBN 1-59593-034-5. doi: 10.1145/1076034.1076121. URL http://doi.acm.org/10.1145/1076034.1076121.

Le Zhao and Jamie Callan. Automatic term mismatch diagnosis for selective query expansion. In Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '12, pages 515–524, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1472-5.

Yun Zhou and W. Bruce Croft. Ranking robustness: A novel framework to predict query performance. In Proceedings of the 15th ACM International Conference on Information and Knowledge Management, CIKM '06, pages 567–574, New York, NY, USA, 2006. ACM. ISBN 1-59593-433-2. doi: 10.1145/1183614.1183696. URL http://doi.acm.org/10.1145/1183614.1183696.

Yun Zhou and W. Bruce Croft. Query performance prediction in web search environments. In Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '07, pages 543–550, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-597-7. doi: 10.1145/1277741.1277835. URL http://doi.acm.org/10.1145/1277741.1277835.

[Figure: percentage change in MAP plotted against query number.]