A Novel Combined Term Suggestion Service for Domain-Specific Digital Libraries

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Interactive query expansion can assist users during their query formulation process. We conducted a user study with over 4,000 unique visitors and four different design approaches for a search term suggestion service. As a basis for our evaluation we have implemented services which use three different vocabularies: (1) user search terms, (2) terms from a terminology service and (3) thesaurus terms. Additionally, we have created a new combined service which utilizes thesaurus terms and terms from a domain-specific search term recommender. Our results show that the thesaurus-based method is clearly used more often than the other single-method implementations. We interpret this as a strong indicator that term suggestion mechanisms should be domain-specific to be close to the user terminology. Our novel combined approach, which interconnects a thesaurus service with additional statistical relations, outperformed all other implementations. All our observations show that domain-specific vocabulary can support the user in finding alternative concepts and formulating queries.


💡 Research Summary

The paper investigates interactive query expansion (IQE) techniques for domain-specific digital libraries, focusing on how term suggestion services can alleviate the well-known “vocabulary problem.” Four term suggestion services were implemented and evaluated within the social-science portal Sowiport, which hosts over 4.8 million bibliographic records and attracts roughly 7,000 unique visitors per month. The services differ in their knowledge sources: (1) User-Search-Terms (UST): a flat list of 28,000 distinct user-entered query terms collected since 2007, ranked by frequency; (2) Heterogeneity Service (HTS): a controlled vocabulary drawn from 25 external thesauri (≈26,500 terms), presented alphabetically; (3) Thesaurus Service (TS): the Social-Science Thesaurus (≈11,600 entries, of which 7,750 are descriptors), also listed alphabetically; and (4) Combined Term Suggestion (CTS): a novel hybrid that merges the TS list with recommendations from a Search Term Recommender (STR).

The STR builds term-term associations between natural-language words extracted from titles and abstracts and controlled vocabulary terms. It weights co-occurrences using likelihood-ratio statistics and latent semantic analysis, then returns the most strongly associated controlled terms. In CTS, when a user has typed three characters, only the TS list appears; from the fourth character onward, an additional “Alternative Search Terms” section shows STR suggestions, with duplicate entries filtered out.
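The CTS display rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names, the `limit` parameter, the case-insensitive prefix matching, and the choice to show nothing below three characters are all assumptions, and the recommender is passed in as an arbitrary callable standing in for the STR.

```python
from typing import Callable, Dict, Iterable, List


def prefix_matches(prefix: str, vocabulary: Iterable[str], limit: int = 10) -> List[str]:
    """Return vocabulary terms that start with `prefix`, sorted alphabetically.

    Case-insensitive matching and the `limit` cut-off are assumptions.
    """
    needle = prefix.lower()
    hits = [term for term in vocabulary if term.lower().startswith(needle)]
    return sorted(hits, key=str.lower)[:limit]


def combined_suggestions(
    typed: str,
    thesaurus: Iterable[str],
    recommender: Callable[[str], List[str]],
    min_chars: int = 3,
    limit: int = 10,
) -> Dict[str, List[str]]:
    """Hypothetical CTS logic: thesaurus list at three characters,
    plus de-duplicated recommender terms from the fourth character onward."""
    result: Dict[str, List[str]] = {"thesaurus": [], "alternatives": []}
    if len(typed) < min_chars:
        # Assumption: nothing is suggested before the third character.
        return result

    result["thesaurus"] = prefix_matches(typed, thesaurus, limit)

    if len(typed) > min_chars:
        # Filter out recommender terms already shown in the thesaurus section.
        shown = {term.lower() for term in result["thesaurus"]}
        alternatives = [t for t in recommender(typed) if t.lower() not in shown]
        result["alternatives"] = alternatives[:limit]
    return result


if __name__ == "__main__":
    # Toy data; a real STR would rank terms by association strength.
    sample_thesaurus = ["university", "unemployment", "unequal treatment"]

    def toy_recommender(typed: str) -> List[str]:
        return ["unemployment", "labor market", "job search"]

    print(combined_suggestions("une", sample_thesaurus, toy_recommender))
    print(combined_suggestions("unem", sample_thesaurus, toy_recommender))
```

Keeping the two sections as separate lists mirrors the summary's description of a thesaurus list with a distinct “Alternative Search Terms” section below it.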

The evaluation logged user interactions only when a search was submitted (via button click or Enter key), thereby excluding bots and crawlers. For each interaction the system recorded the entered term, the term selected from the suggestion list (including its position), the service type, timestamp, and session identifier. The services were activated one after another: each ran until it had reached 1,000 unique visitors, after which the next service replaced it. This design allowed a direct comparison of usage metrics across services under comparable traffic conditions.
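The sequential activation scheme can be illustrated with a small rotation helper. This is a minimal sketch under stated assumptions: the class name `ServiceRotation`, the use of a visitor identifier string, and the order in which services are listed are hypothetical, since the summary does not say how visitors were identified or in which order the services ran.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ServiceRotation:
    """Hypothetical helper: keep one service active until it has been
    seen by `visitors_per_service` unique visitors, then switch."""

    services: List[str]
    visitors_per_service: int = 1000
    _index: int = field(default=0, init=False)
    _seen: Set[str] = field(default_factory=set, init=False)

    def active_service(self) -> Optional[str]:
        """Return the currently active service, or None once all have run."""
        if self._index < len(self.services):
            return self.services[self._index]
        return None

    def record_search(self, visitor_id: str) -> Optional[str]:
        """Register a submitted search and return the service that handled it."""
        service = self.active_service()
        if service is None:
            return None
        self._seen.add(visitor_id)
        if len(self._seen) >= self.visitors_per_service:
            # Quota reached: the next submitted search goes to the next service.
            self._index += 1
            self._seen = set()
        return service


if __name__ == "__main__":
    # Tiny quota so the switch is visible; the study used 1,000 per service.
    rotation = ServiceRotation(["UST", "HTS", "TS", "CTS"], visitors_per_service=2)
    for visitor in ["a", "a", "b", "c"]:
        print(visitor, rotation.record_search(visitor))
```

Counting unique visitors rather than searches means a single heavy user cannot exhaust a service's quota alone.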

Two primary usage metrics were analyzed: (a) the proportion of unique visitors who selected at least one suggestion, and (b) the proportion of all submitted searches that incorporated a suggestion. CTS achieved the highest adoption: 50.9 % of visitors used the service, and 14 % of all searches included a CTS suggestion. TS followed with 37.5 % of visitors and 9 % of searches; UST attracted 25.2 % of visitors and about 7 % of searches; HTS lagged behind with 10.4 % of visitors and only 3 % of searches. For no service did the share of searches using a suggestion exceed 15 %, indicating that term suggestion, while beneficial, is not a dominant factor in user behavior. Further analysis showed that selected suggestions were, on average, in the second position of the list, and that users typically typed enough characters to trigger the more specific suggestions (four or more characters for CTS). The distribution of selection positions declined sharply after the tenth entry, confirming that users focus on the top-ranked items.
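The two metrics differ only in what they count, which a short sketch makes concrete. The log format here, one `(visitor_id, selected_suggestion)` pair per submitted search with `None` when no suggestion was chosen, is an assumption for illustration and not the schema used in the study; the sample data is invented.

```python
from typing import Iterable, Optional, Tuple


def usage_metrics(log: Iterable[Tuple[str, Optional[str]]]) -> Tuple[float, float]:
    """Return (share of visitors who picked at least one suggestion,
    share of submitted searches that used a suggestion)."""
    visitors = set()
    adopters = set()
    total_searches = 0
    searches_with_suggestion = 0

    for visitor_id, selected in log:
        total_searches += 1
        visitors.add(visitor_id)
        if selected is not None:
            searches_with_suggestion += 1
            adopters.add(visitor_id)

    if total_searches == 0:
        return 0.0, 0.0
    return len(adopters) / len(visitors), searches_with_suggestion / total_searches


if __name__ == "__main__":
    # Invented sample: four visitors, eight submitted searches.
    sample_log = [
        ("v1", "unemployment"), ("v1", None),
        ("v2", None),
        ("v3", "labor market"), ("v3", "job search"),
        ("v4", None), ("v4", None), ("v4", None),
    ]
    visitor_share, search_share = usage_metrics(sample_log)
    print(f"visitors using suggestions: {visitor_share:.1%}")
    print(f"searches using suggestions: {search_share:.1%}")
```

Because a visitor counts as an adopter after a single selection while every search is counted individually, the visitor share can be much higher than the search share, as in the reported figures.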

The authors interpret these findings as evidence that domain-specific vocabularies (e.g., a social-science thesaurus) are more aligned with user terminology than generic or purely statistical lists, leading to higher acceptance. Moreover, the hybrid CTS approach, which couples a curated thesaurus with statistically derived alternatives, outperforms each component alone, supporting the hypothesis that combining multiple IQE sources yields superior results. Nonetheless, the low share of searches that used a suggestion points to possible usability challenges: the suggestion list may not be visually prominent, users may prefer to type their own terms, or the cognitive load of evaluating suggestions may deter use. The paper recommends future work on UI design enhancements, real-time relevance feedback, and broader cross-domain validation to determine whether the observed benefits generalize beyond the social-science context.

