Automatic Aggregation by Joint Modeling of Aspects and Values
We present a model for aggregation of product review snippets by joint aspect identification and sentiment analysis. Our model simultaneously identifies an underlying set of ratable aspects presented in the reviews of a product (e.g., sushi and miso for a Japanese restaurant) and determines the corresponding sentiment of each aspect. This approach directly enables discovery of highly rated or inconsistent aspects of a product. Our generative model admits an efficient variational mean-field inference algorithm. It is also easily extensible, and we describe several modifications and their effects on model structure and inference. We test our model on two tasks: joint aspect identification and sentiment analysis on a set of Yelp reviews, and aspect identification alone on a set of medical summaries. We evaluate the performance of the model on aspect identification, sentiment analysis, and per-word labeling accuracy. We demonstrate that our model outperforms applicable baselines by a considerable margin, yielding up to 32% relative error reduction on aspect identification and up to 20% relative error reduction on sentiment analysis.
💡 Research Summary
The paper introduces a generative probabilistic model that simultaneously discovers ratable aspects (e.g., “sushi” and “miso” in a Japanese restaurant review) and the sentiment (value) expressed toward each aspect within short review snippets. Traditional pipelines treat aspect identification and sentiment classification as separate stages, which leads to error propagation: mistakes made during aspect detection inevitably degrade downstream sentiment analysis. By modeling aspects and values jointly, the proposed approach captures the inherent dependency between what is being talked about and how it is evaluated, allowing each task to inform the other during inference.
Model Architecture
Each document (or snippet) is assumed to be generated from a mixture of K latent aspects. For a chosen aspect k, a sentiment label v is drawn from a multinomial distribution β_k (itself given a Dirichlet prior), and a word is then emitted from a multinomial distribution φ_{k,v} specific to the (aspect, sentiment) pair. Aspect selection is governed by a per-document aspect proportion π, drawn from a Dirichlet prior with parameter α. Formally, the generative process can be described as:
- Draw aspect proportions π ∼ Dir(α).
- For each token n in the document:
a. Choose an aspect z_n ∼ Mult(π).
b. Choose a sentiment label v_n ∼ Mult(β_{z_n}).
c. Generate word w_n ∼ Mult(φ_{z_n, v_n}).
Thus, each word is conditioned on both the aspect and its associated sentiment, which enables the model to learn aspect‑specific sentiment vocabularies (e.g., “delicious” may be strongly associated with the “food” aspect, while “slow” aligns with the “service” aspect).
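The generative story above can be sketched directly as ancestral sampling. The following is a minimal illustration, not the authors' code; the dimensions (K aspects, V sentiment labels, W vocabulary size) and all variable names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

K, V, W = 5, 2, 1000  # aspects, sentiment labels, vocabulary size (illustrative)
alpha = np.full(K, 0.1)                       # Dirichlet prior over aspect proportions
beta = rng.dirichlet(np.ones(V), size=K)      # per-aspect sentiment distributions beta_k
phi = rng.dirichlet(np.ones(W), size=(K, V))  # per-(aspect, sentiment) word distributions

def generate_snippet(n_tokens):
    """Sample one snippet following the generative story above."""
    pi = rng.dirichlet(alpha)          # per-document aspect proportions
    tokens = []
    for _ in range(n_tokens):
        z = rng.choice(K, p=pi)        # aspect for this token
        v = rng.choice(V, p=beta[z])   # sentiment label given the aspect
        w = rng.choice(W, p=phi[z, v]) # word given the (aspect, sentiment) pair
        tokens.append((z, v, w))
    return tokens

snippet = generate_snippet(10)
```

Note how every word's distribution depends on both z and v, which is precisely the coupling that lets the model learn aspect-specific sentiment vocabularies.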
Inference
Exact posterior inference is intractable, so the authors employ a mean-field variational approximation. The posterior is approximated by a factorized distribution q(π) q(z) q(v | z) q(φ), with q(z_n = k) = γ_{n,k} and q(v_n = v | z_n = k) = τ_{n,k,v}. Coordinate-ascent updates are derived by taking expectations under the current variational distribution, leading to closed-form updates for the Dirichlet parameters of π and β, and for the multinomial parameters of φ. The updates resemble an EM algorithm: the E-step computes expected counts of (aspect, sentiment) assignments for each token, and the M-step updates the Dirichlet hyperparameters and word-distribution tables. The computational complexity per iteration is O(N·K·V), where N is the total number of tokens, K the number of aspects, and V the number of sentiment categories. This linear scaling, together with the possibility of parallelizing the token-wise expectations, makes the algorithm suitable for large-scale corpora.
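The E-step can be sketched as follows. This is a simplified illustration, not the paper's exact derivation: φ and β are treated as point estimates rather than full variational factors, and all names and shapes are assumptions:

```python
import numpy as np
from scipy.special import digamma, logsumexp

def e_step(doc, log_phi, log_beta, alpha, n_iters=20):
    """Simplified mean-field E-step for a single document.

    doc      : token ids, shape (N,)
    log_phi  : log word distributions, shape (K, V, W)
    log_beta : log sentiment distributions, shape (K, V)
    alpha    : Dirichlet prior over aspects, shape (K,)
    Returns gamma (N, K), the aspect responsibilities, and
    tau (N, K, V), the sentiment responsibilities given each aspect.
    """
    K, V, _ = log_phi.shape
    N = len(doc)
    # Joint (aspect, sentiment) score for each token: shape (N, K, V)
    score = log_beta[None, :, :] + log_phi[:, :, doc].transpose(2, 0, 1)
    # tau: softmax over sentiment labels v, for each (token, aspect) pair
    tau = np.exp(score - logsumexp(score, axis=2, keepdims=True))
    gamma = np.full((N, K), 1.0 / K)
    for _ in range(n_iters):
        # E[log pi_k] under q(pi) = Dir(alpha + expected aspect counts)
        a_post = alpha + gamma.sum(axis=0)
        elog_pi = digamma(a_post) - digamma(a_post.sum())
        # gamma: aspect responsibilities with the sentiment marginalized out
        g = elog_pi[None, :] + logsumexp(score, axis=2)
        gamma = np.exp(g - logsumexp(g, axis=1, keepdims=True))
    return gamma, tau
```

An M-step would then aggregate the expected counts γ and τ across documents to re-estimate β and φ, matching the EM-like structure described above.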
Extensibility
A key design goal is modularity. The authors illustrate several extensions without altering the core variational machinery:
- Hierarchical aspects can be modeled by placing a higher‑level Dirichlet process over the aspect proportions, allowing sub‑aspects to inherit word distributions from parent aspects.
- Sentiment granularity can be increased from binary (positive/negative) to ternary (positive/neutral/negative), or extended to continuous rating scales by replacing the categorical sentiment variable with a continuous one (e.g., a Gaussian with a Normal-Wishart prior).
- Incorporating side information (e.g., user IDs, timestamps) is straightforward by conditioning π or β on additional covariates via logistic‑normal transformations.
All these variants require only minor modifications to the update equations, preserving the overall efficiency of the inference algorithm.
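As an illustration of the third extension, conditioning the aspect proportions on covariates via a logistic-normal transformation amounts to pushing a noisy linear function of the covariates through a softmax. This is a hypothetical sketch of the idea, not the authors' formulation; the weight matrix and dimensions are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    """Numerically stable softmax."""
    e = np.exp(x - x.max())
    return e / e.sum()

K, D = 5, 3  # aspects, covariate dimension (illustrative)
weights = rng.normal(size=(D, K))  # regression weights; learned in practice

def aspect_proportions(covariates, noise_scale=0.5):
    """Logistic-normal draw of aspect proportions conditioned on covariates
    (e.g., user features or a timestamp encoding)."""
    eta = covariates @ weights + rng.normal(scale=noise_scale, size=K)
    return softmax(eta)

pi = aspect_proportions(np.array([1.0, 0.0, 2.0]))
```

The same construction applied to β would let the sentiment distributions, rather than the aspect proportions, vary with the side information.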
Experiments
The model is evaluated on two distinct datasets:
- Yelp Review Snippets – A collection of short excerpts from restaurant reviews, manually annotated with five pre‑defined aspects (food, service, ambiance, price, and location) and binary sentiment labels for each aspect.
- Medical Summaries – Clinical note abstracts where only aspect identification is relevant (sentiment labels are absent).
Three evaluation metrics are reported: (i) aspect identification accuracy, (ii) sentiment classification F1 score, and (iii) per‑token labeling accuracy (the proportion of tokens correctly assigned both an aspect and a sentiment). Baselines include:
- A standard LDA‑based aspect model (no sentiment component).
- A two‑stage pipeline where an LDA model first predicts aspects and a separate supervised classifier predicts sentiment.
- A recent neural attention model that jointly learns aspect and sentiment representations but does not enforce a generative coupling.
Results show that the joint generative model consistently outperforms all baselines. On the Yelp data, aspect identification error is reduced by up to 32% relative to the best baseline, while sentiment classification error drops by up to 20% relative. Token-level labeling improves by roughly 15% absolute over the pipeline approach. On the medical dataset, where only aspect detection is required, the model still achieves a 10% relative error reduction compared to LDA, demonstrating that the sentiment component does not hinder pure aspect discovery and may even provide regularization.
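For clarity on how the headline numbers are computed: "relative error reduction" measures what fraction of the baseline's errors the model eliminates. A small helper makes this concrete (the accuracy figures in the comment are illustrative, not taken from the paper):

```python
def relative_error_reduction(baseline_acc, model_acc):
    """Fraction of the baseline's errors eliminated by the model."""
    baseline_err = 1.0 - baseline_acc
    model_err = 1.0 - model_acc
    return (baseline_err - model_err) / baseline_err

# Illustrative: a baseline at 80% accuracy improved to 86.4% accuracy
# corresponds to a 32% relative error reduction (0.064 / 0.200).
```

This is why a 32% relative reduction can correspond to a much smaller absolute accuracy gain, whereas the 15% token-labeling improvement is reported in absolute terms.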
Contributions and Impact
The paper makes four principal contributions:
- Joint Modeling – By coupling aspect and sentiment generation, the model leverages mutual information between the two tasks, eliminating error propagation inherent in sequential pipelines.
- Efficient Variational Inference – The mean‑field algorithm yields closed‑form updates, scales linearly with data size, and can be parallelized, making it practical for real‑world large‑scale review corpora.
- Modular Extensibility – The probabilistic framework accommodates hierarchical aspects, richer sentiment representations, and side‑information conditioning with minimal changes to the inference routine.
- Empirical Validation – Extensive experiments on both consumer‑review and clinical domains demonstrate substantial performance gains in aspect identification, sentiment analysis, and fine‑grained token labeling.
Future Directions
The authors suggest several avenues for further research: (a) modeling conditional sentiment transitions, where the sentiment expressed for one aspect influences the sentiment of subsequent aspects within the same document; (b) integrating pre‑trained language models (e.g., BERT) as priors for φ to capture richer lexical semantics; (c) personalizing aspect‑sentiment priors based on user profiles to enable recommendation‑oriented summarization; and (d) extending the framework to multilingual settings by sharing aspect‑sentiment structures across languages while allowing language‑specific word distributions.
Overall, the work provides a solid probabilistic foundation for automatic aggregation of review content, offering both theoretical elegance and practical utility for businesses seeking to surface high‑rated or problematic product features from massive streams of user feedback.