📝 Original Info
- Title: CATCH: A Controllable Theme Detection Framework with Contextualized Clustering and Hierarchical Generation
- ArXiv ID: 2512.21715
- Date: 2025-12-25
- Authors: Rui Ke¹, Jiahui Xu¹, Shenghao Yang¹, Kuang Wang¹, Feng Jiang²* (corresponding author), Haizhou Li¹,³,⁴. Affiliations: ¹ Shenzhen Research Institute of Big Data, The School of Data Science, The Chinese University of Hong Kong, Shenzhen; ² Artificial Intelligence Research Institute, Shenzhen University of Advanced Technology; ³ The School of Artificial Intelligence, The Chinese University of Hong Kong, Shenzhen; ⁴ Department of Electrical and Computer Engineering, National University of Singapore
📝 Abstract
Theme detection is a fundamental task in user-centric dialogue systems, aiming to identify the latent topic of each utterance without relying on predefined schemas. Unlike intent induction, which operates within fixed label spaces, theme detection requires cross-dialogue consistency and alignment with personalized user preferences, posing significant challenges. Existing methods often struggle to build accurate topic representations from sparse, short utterances and fail to capture user-level thematic preferences across dialogues. To address these challenges, we propose CATCH (Controllable Theme Detection with Contextualized Clustering and Hierarchical Generation), a unified framework that integrates three core components: (1) context-aware topic representation, which enriches utterance-level semantics using surrounding topic segments; (2) preference-guided topic clustering, which jointly models semantic proximity and personalized feedback to align themes across dialogues; and (3) a hierarchical theme generation mechanism designed to suppress noise and produce robust, coherent topic labels. Experiments on a multi-domain customer dialogue benchmark (DSTC-12) demonstrate the effectiveness of CATCH with an 8B LLM in both theme clustering and topic generation quality.
💡 Deep Analysis
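The abstract's second component, preference-guided topic clustering, "jointly models semantic proximity and personalized feedback". The sketch below is one possible way to picture that idea and is not the authors' algorithm: it assumes utterances are already embedded as vectors, represents feedback as pairwise hints, and merges utterances with simple threshold-based single-link clustering. The names `cluster_with_preferences`, `should_link`, `cannot_link`, `weight`, and `threshold`, as well as the toy 2-D vectors, are all hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def cluster_with_preferences(vectors, should_link=(), cannot_link=(),
                             threshold=0.3, weight=0.5):
    """Group utterance indices by preference-adjusted distance.

    Assumption (not from the paper): feedback arrives as index pairs.
    A should-link pair has `weight` subtracted from its distance and a
    cannot-link pair has `weight` added. Pairs at or below `threshold`
    are merged (single-link), so a cannot-link hint is a soft penalty
    and is not enforced transitively.
    """
    should = {frozenset(p) for p in should_link}
    cannot = {frozenset(p) for p in cannot_link}
    parent = list(range(len(vectors)))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(vectors)), 2):
        d = cosine_distance(vectors[i], vectors[j])
        pair = frozenset((i, j))
        if pair in should:
            d -= weight
        if pair in cannot:
            d += weight
        if d <= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for idx in range(len(vectors)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


# Toy embeddings standing in for four customer utterances.
toy_vectors = [
    [1.0, 0.0],  # 0: "Can you check my account balance?"
    [0.9, 0.1],  # 1: a paraphrase of utterance 0
    [0.6, 0.8],  # 2: "I want to open a new account."
    [0.0, 1.0],  # 3: "I want to change my PIN."
]

print(cluster_with_preferences(toy_vectors))
# Semantics only -> [[0, 1], [2, 3]]
print(cluster_with_preferences(toy_vectors,
                               should_link=[(0, 2)],
                               cannot_link=[(2, 3)]))
# With feedback  -> [[0, 1, 2], [3]]
```

With semantics alone, the toy account-opening and PIN utterances fall into one group. Once the hypothetical feedback says that utterance 2 belongs with the account-related utterances and not with the PIN request, the grouping changes, which is the kind of user-controlled theme granularity the abstract describes.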
📄 Full Content
CATCH: A Controllable Theme Detection Framework with Contextualized Clustering and Hierarchical Generation
Rui Ke¹, Jiahui Xu¹, Shenghao Yang¹, Kuang Wang¹, Feng Jiang²*, Haizhou Li¹,³,⁴
¹ Shenzhen Research Institute of Big Data, The School of Data Science, The Chinese University of Hong Kong, Shenzhen
² Artificial Intelligence Research Institute, Shenzhen University of Advanced Technology
³ The School of Artificial Intelligence, The Chinese University of Hong Kong, Shenzhen
⁴ Department of Electrical and Computer Engineering, National University of Singapore
jiangfeng@suat-sz.edu.cn
Introduction
In real-world customer service domains such as banking, finance, travel, and insurance, accurately identifying the underlying theme of each user utterance is essential for enhancing service efficiency, understanding user needs, and retrieving contextually relevant knowledge. Unlike intent induction (Gung et al. 2023), which typically maps utterances to a predefined label space (Pu et al. 2022; Costa et al. 2023), theme detection aims to uncover latent and potentially novel topics without prior knowledge. Effective theme detection first requires precise topic assignment within a single dialogue (Nguyen et al. 2022; Du, Buntine, and Johnson 2013a). More importantly, the assigned themes should be consistent across multiple dialogues and align with user preferences (Mendonça et al. 2025), which regularize inter-dialogue theme consolidation, as illustrated in Figure 1.
*Corresponding Author.
Copyright © 2026, Association for the Advancement of Artificial
Intelligence (www.aaai.org). All rights reserved.
[Figure 1 (image not recovered). Panel labels: "Dialogue within a Specific Domain", "Intra-dialog Separated Theme Spaces", "Inter-dialog Alignment Theme Space", "User Preference", "Semantic Conclusion". Example customer utterances: "What's the interest rate today?", "I want to open a new account.", "Can you check my account balance?", "I want to change my PIN." Example themes: "check interest rate", "open bank account", "check account balance", "change PIN", "inquire about bank account".]
Figure 1: Illustration of the controllable theme detection task. Given a set of dialogues with unlabeled utterances, a theme is generated for each utterance. The theme granularity is influenced by auxiliary inputs such as user preferences (Mendonça et al. 2025).
These challenges underscore the need for models that can generalize beyond surface-level semantics and adapt to diverse real-world conversational scenarios.
However, existing approaches fall short on real-world controllable theme detection because of three key challenges. First, short utterances often lead to sparse and ambiguous semantic signals, making it difficult for conventional topic modeling methods (Blei, Ng, and Jordan 2003; Pham et al. 2024) to construct reliable topic representations. Second, while topic clustering methods (Chatterjee and Sengupta 2020; Gung et al. 2023) group utterances based on surface-level semantics, they typically overlook user-specific preferences, resulting in inconsistent clustering across dialogues even when the underlying intent is similar. Third, most previous work lacks a structured and controllable theme generation mechanism (Perkins and Yang 2019; Zeng et al. 2021), causing the generated topic labels to vary arbitrarily between contexts and limiting their usefulness in downstream applications.
To address these challenges, we propose CATCH (Controllable And Thematic Clustering with Hierarchy), a controllable theme detection framework that combines intra-dialogue context modeling with inter-dialogue user preference alignment. Specifically, CATCH comprises th