LLM-Guided Lifecycle-Aware Clustering of Multi-Turn Customer Support Conversations


Clustering customer chat data is vital for cloud providers handling multi-service queries. Traditional methods struggle with overlapping concerns and create broad, static clusters that degrade over time. Reclustering disrupts continuity, making issue tracking difficult. We propose an adaptive system that segments multi-turn chats into service-specific concerns and incrementally refines clusters as new issues arise. Cluster quality is tracked via the Davies-Bouldin Index and Silhouette Score, with LLM-based splitting applied only to degraded clusters. Our method improves Silhouette Scores by over 100% and reduces DBI by 65.6% compared to baselines, enabling scalable, real-time analytics without full reclustering.


💡 Research Summary

The paper “LLM‑Guided Lifecycle‑Aware Clustering of Multi‑Turn Customer Support Conversations” tackles a pressing problem in large‑scale cloud customer support: how to continuously organize massive streams of multi‑turn, multi‑service chat logs without resorting to costly full re‑clustering that disrupts downstream analytics. Traditional unsupervised text clustering methods such as LDA, K‑Means, and even density‑based HDBSCAN treat each entire conversation as a single document and require a complete recompute when new data arrives. This leads to coarse, static clusters that quickly become stale as topics drift, and it makes longitudinal issue tracking nearly impossible.

The authors propose a three‑stage adaptive framework that leverages large language models (LLMs) at every critical step: (1) Base Cluster Creation, (2) Incremental Clustering, and (3) Lifecycle‑Aware Refinement. The pipeline is illustrated in Figure 1 and detailed in the text.

1. Base Cluster Creation

  • Segmentation: An LLM (prompt‑engineered) scans each multi‑turn chat, detects service‑domain shifts, and splits the conversation into coherent “segments” (e.g., Compute‑related, Networking‑related). Validation on 200 manually annotated chats yields a Cohen’s κ of 0.79, indicating strong agreement between human annotators and the model.
  • Concern Extraction: Within each segment, the LLM extracts granular user concerns, turning a single utterance like “My VM crashed and now I can’t connect to storage” into two separate concerns (“VM crash” and “storage connectivity”). On a 150‑segment benchmark, the extraction achieves an F1 of 0.84.
  • Contrastive Filtering: To avoid over‑weighting repeated phrasing, the pipeline computes cosine similarity on nli‑roberta‑base‑v2 embeddings and removes duplicates above a 0.95 threshold, but only within the same chat. This preserves cross‑chat repetitions that signal recurring issues.
  • Service‑Group Assignment: Extracted concerns are classified into seven pre‑defined cloud service groups (Compute, Networking, Identity & Security, Storage, Data Services, Billing & Accounts, Others) using few‑shot LLM prompts, achieving >0.85 F1.
  • Embedding & Dimensionality Reduction: Concerns are embedded with a 768‑dimensional sentence‑BERT model, then reduced to ~50 dimensions via UMAP to mitigate the curse of dimensionality and speed up downstream clustering.
  • Localized Clustering: Within each service group, HDBSCAN automatically discovers clusters without a preset number of topics, handling noise points gracefully.
  • Cluster Labeling: A second LLM pass generates human‑readable titles and descriptions for each cluster, enabling downstream dashboards to display meaningful tags.
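The contrastive filtering step above can be sketched concisely. This is a minimal illustration, not the paper's implementation: it assumes the sentence embeddings (e.g., from nli-roberta-base-v2) are already computed, and the function name `dedupe_within_chat` is hypothetical.

```python
import numpy as np

def dedupe_within_chat(concerns, embeddings, chat_ids, threshold=0.95):
    """Drop near-duplicate concerns, but only when they occur in the same chat.

    concerns   : list of concern strings
    embeddings : (n, d) array of precomputed sentence embeddings
    chat_ids   : chat identifier per concern, parallel to `concerns`
    Returns the indices of the concerns to keep.
    """
    emb = np.asarray(embeddings, dtype=float)
    # Normalize rows so that dot products equal cosine similarities.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    keep = []
    for i in range(len(concerns)):
        duplicate = False
        for j in keep:
            # Cross-chat repetitions are preserved on purpose:
            # they signal recurring issues rather than redundant phrasing.
            if chat_ids[i] == chat_ids[j] and float(emb[i] @ emb[j]) > threshold:
                duplicate = True
                break
        if not duplicate:
            keep.append(i)
    return keep
```

Note that the same concern appearing in two different chats survives the filter, which is exactly what lets downstream clustering measure how widespread an issue is.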

2. Incremental Clustering
When new concerns appear (≈500 per day in the production environment), the system repeats the segmentation‑extraction‑filtering steps but skips full reclustering. Instead, it performs a two‑stage assignment:

  • Embedding‑Based Shortlisting: The new concern’s embedding is compared to pre‑computed centroids of existing clusters; the top‑5 most similar clusters are shortlisted.
  • LLM‑Based Confirmation: The shortlisted clusters and the new concern are fed to an LLM (command‑r‑08‑2024) which selects the best match and provides a rationale, ensuring nuanced decisions for ambiguous or emerging topics.

If no existing cluster is a good fit, the concern is placed in an “unassigned pool.” Once at least ten similar concerns accumulate, an LLM creates a new cluster, preventing premature fragmentation.
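The first stage of that two-stage assignment can be approximated as a cosine-similarity ranking over cluster centroids. The sketch below is an assumption-laden stand-in for the production system: `shortlist_clusters` is a hypothetical helper, and the second stage (the LLM confirmation with command-r-08-2024) is not reproduced here.

```python
import numpy as np

def shortlist_clusters(concern_emb, centroids, top_k=5):
    """Stage 1: rank existing cluster centroids by cosine similarity
    to the new concern and return the top_k (index, similarity) pairs.
    Stage 2 (LLM confirmation over this shortlist) happens downstream.
    """
    c = np.asarray(concern_emb, dtype=float)
    c = c / np.linalg.norm(c)
    cents = np.asarray(centroids, dtype=float)
    cents = cents / np.linalg.norm(cents, axis=1, keepdims=True)
    sims = cents @ c  # cosine similarity to every centroid
    order = np.argsort(-sims)[:top_k]
    return [(int(i), float(sims[i])) for i in order]
```

Restricting the LLM to a short candidate list keeps the per-concern cost bounded regardless of how many clusters the system accumulates.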

3. Lifecycle‑Aware Refinement
The core novelty lies in continuous quality monitoring and selective splitting:

  • Metrics: For each service group, the system tracks Davies‑Bouldin Index (DBI), Silhouette Score, and a custom Cohesion Score (average distance to cluster centroid).
  • Drift Detection: A cluster is flagged if DBI > 0.5 or Silhouette < 0.5, and if its Cohesion Z‑score (relative to historical mean and variance) exceeds 2.
  • LLM‑Guided Splitting: Flagged clusters are sent to an LLM that re‑partitions the concerns into coherent sub‑clusters, preserving interpretability. A short “drift narrative” is generated to explain why the split occurred, aiding analysts.
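The drift-detection rule above can be expressed as a small gate in front of the (expensive) LLM splitting step. This sketch follows my reading of the stated criteria, where a cluster must fail a quality metric *and* show an elevated cohesion z-score; the `stats` layout is an assumption for illustration.

```python
def flag_drifting_clusters(stats, dbi_max=0.5, sil_min=0.5, z_max=2.0):
    """Return the cluster ids that should be sent for LLM-guided splitting.

    stats maps cluster id -> dict with keys:
        'dbi'        : Davies-Bouldin value attributed to the cluster
        'silhouette' : mean silhouette of its members
        'cohesion'   : current cohesion score (mean distance to centroid)
        'mu', 'sigma': historical mean / std of the cohesion score
    """
    flagged = []
    for cid, s in stats.items():
        z = (s['cohesion'] - s['mu']) / s['sigma'] if s['sigma'] > 0 else 0.0
        # Flag only when a quality metric degrades AND cohesion has
        # drifted beyond 2 standard deviations of its own history.
        if (s['dbi'] > dbi_max or s['silhouette'] < sil_min) and z > z_max:
            flagged.append(cid)
    return flagged
```

Gating on the z-score relative to each cluster's own history is what makes the check selective: a cluster that was always loosely knit is not repeatedly re-split just for being loose.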

Beyond splitting, the framework includes merge and prune operations: clusters with centroid similarity > 0.92 are evaluated by an LLM for redundancy and merged if appropriate; clusters inactive for 30 days and below a size threshold (e.g., 10 concerns) are pruned after LLM verification, with archived history retained.
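The merge and prune checks reduce to simple candidate generation before the LLM verification described above. The following is a minimal sketch under assumed data shapes (both function names are hypothetical); the LLM redundancy check and archival steps are out of scope here.

```python
import numpy as np
from datetime import datetime, timedelta

def merge_candidates(centroids, threshold=0.92):
    """Return index pairs of clusters whose centroid cosine similarity
    exceeds the merge threshold; an LLM then verifies redundancy."""
    cents = np.asarray(centroids, dtype=float)
    cents = cents / np.linalg.norm(cents, axis=1, keepdims=True)
    sims = cents @ cents.T
    n = len(cents)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if sims[i, j] > threshold]

def prune_candidates(clusters, now, inactive_days=30, min_size=10):
    """clusters: list of dicts with 'last_assigned' (datetime) and 'size'.
    Flags small, stale clusters for LLM-verified pruning
    (their history is archived, not deleted)."""
    cutoff = now - timedelta(days=inactive_days)
    return [i for i, c in enumerate(clusters)
            if c['last_assigned'] < cutoff and c['size'] < min_size]
```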

Finally, each cluster receives a role (Core, Emerging, Peripheral, Deprecated) based on age, assignment frequency, cohesion, and drift history. This role‑based taxonomy supports dashboard visualizations, prioritization, and long‑term trend analysis without manual supervision.
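A role assignment of this kind might look like the heuristic below. The thresholds and precedence order are illustrative assumptions, not the paper's exact rules; only the four role names and the input signals (age, assignment frequency, cohesion, drift history) come from the text.

```python
def assign_role(age_days, assignments_30d, cohesion_ok, drift_events):
    """Illustrative cluster-role heuristic (all thresholds are assumptions):
    classify a cluster as Core, Emerging, Peripheral, or Deprecated."""
    if assignments_30d == 0 and age_days > 30:
        return "Deprecated"   # old and no longer receiving concerns
    if age_days <= 30 and assignments_30d > 0:
        return "Emerging"     # new and actively growing
    if assignments_30d >= 20 and cohesion_ok and drift_events == 0:
        return "Core"         # mature, busy, cohesive, stable
    return "Peripheral"       # everything else: low-traffic or drifting
```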

Empirical Evaluation
The system was deployed on > 90 000 historical chats from Oracle’s cloud support platform, processing ~500 new concerns daily. Compared against three baselines (LDA, K‑Means, HDBSCAN‑only), the proposed method achieved:

  • Silhouette Score: more than 100% improvement over the baselines (an average relative gain of about 1.02×, i.e., scores roughly doubled).
  • Davies‑Bouldin Index: a 65.6% reduction (the average DBI falls to about 0.344× the baseline value).
  • Re‑clustering Overhead: only 12% of clusters triggered LLM‑driven splits, cutting full re‑clustering compute cost by more than 80%.

Qualitative analysis shows that the generated cluster titles align with analyst expectations, and the drift narratives provide actionable insights (e.g., “Emerging security token expiration issues after new API release”). The role assignment further helped product teams focus on Core and Emerging clusters while de‑prioritizing Deprecated ones.

Contributions and Impact

  1. LLM‑Centric Segmentation of multi‑turn chats into service‑specific concerns, validated with strong inter‑annotator agreement.
  2. Incremental, Metric‑Driven Clustering that avoids disruptive full re‑clustering while preserving cluster identity.
  3. Selective LLM‑Based Splitting guided by DBI, Silhouette, and Cohesion Z‑scores, ensuring only drifting clusters are refined.
  4. Lifecycle Management (merge, prune, role assignment) that maintains a stable, interpretable cluster space over time.
  5. Scalable Production Deployment handling > 90 k chats and daily influx, demonstrating real‑time analytics feasibility.

Broader Applicability
While the study focuses on cloud customer support, the methodology is generic: any domain with high‑volume, multi‑turn textual interactions (financial advisory calls, tele‑health consultations, e‑commerce support) can adopt the same LLM‑guided lifecycle framework to achieve continuous, explainable topic organization without costly batch re‑processing.

In summary, the paper presents a comprehensive, production‑ready solution that marries the semantic power of LLMs with classic density‑based clustering and rigorous statistical monitoring, delivering a robust, adaptive, and explainable clustering system for evolving multi‑turn conversational data.
