Rating the online review rating system using Yelp
The rating a restaurant receives plays a major role in attracting future customers. Word of mouth has been systematically replaced by online reviews, and knowing a restaurant's average star rating beforehand gives people a sense of assurance before stepping inside. However, these ratings are indirectly biased by location, amenities, and the perceptions of individual reviewers. In this work, we analyze restaurant ratings from the public Yelp dataset to identify discrepancies in the rating system and attempt to provide an optimized global rating system. A frequent visitor to a high-end restaurant with lavish amenities may give a 4-star rating after even the slightest shortfall in the expected ambiance, while a modest restaurant that merely guarantees acceptable food may earn 5 stars. These discrepancies can often be attributed to three factors: the perspective of individual reviewers, the features of the restaurant, and its location. Individual perspective is inherently subjective; what seems good to one person may seem poor to another. In this work, we therefore focus on the other two factors: restaurant features and location.
💡 Research Summary
The paper investigates systematic biases in the Yelp restaurant rating system and proposes an adjusted global rating framework that incorporates two objective dimensions—restaurant features (amenities) and geographic context—while deliberately excluding the highly subjective individual perspective. Using the publicly available Yelp dataset, the authors extract business identifiers, average star ratings, review texts, and metadata such as address, price range, and categories. To enrich the geographic dimension, they merge external sources (U.S. Census data, tourism statistics) that provide information on population density, median income, and proximity to tourist attractions.
Feature scores are constructed from eight binary amenity attributes (e.g., parking availability, Wi‑Fi, outdoor seating, pet‑friendly) and combined via a weighted sum. Review sentiment is captured through a hybrid approach: after standard preprocessing (tokenization, stop‑word removal, stemming), the authors apply both a lexicon‑based sentiment analyzer (VADER) and a fine‑tuned BERT‑based classifier. The two sentiment outputs are averaged to produce a “review sentiment score” for each restaurant.
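The two scoring steps above can be sketched in a few lines of Python. This is not the authors' code: the attribute names and weights are hypothetical, and a toy lexicon scorer stands in for the actual VADER and BERT analyzers, which are not reproduced here.

```python
# Hypothetical weights over eight binary amenity attributes (illustrative only).
FEATURE_WEIGHTS = {
    "parking": 0.20, "wifi": 0.10, "outdoor_seating": 0.15,
    "pet_friendly": 0.05, "wheelchair_access": 0.15, "delivery": 0.10,
    "reservations": 0.15, "accepts_cards": 0.10,
}

def feature_score(amenities: dict) -> float:
    """Weighted sum of binary amenity flags, in [0, 1]."""
    return sum(w for name, w in FEATURE_WEIGHTS.items() if amenities.get(name))

# Toy word lists standing in for a real lexicon-based analyzer such as VADER.
POSITIVE = {"great", "delicious", "friendly", "clean"}
NEGATIVE = {"slow", "rude", "dirty", "bland"}

def lexicon_sentiment(review: str) -> float:
    """Crude polarity in [-1, 1] from counts of positive vs. negative words."""
    words = review.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(pos + neg, 1)

def hybrid_sentiment(lexicon_score: float, classifier_score: float) -> float:
    """Average the two analyzers' outputs, as the summary describes."""
    return (lexicon_score + classifier_score) / 2.0
```

In the paper the second input to the hybrid score would come from the fine-tuned BERT classifier; here it is simply passed in as a number.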
The core of the proposed method is a linear combination:
Adjusted Rating = α·Raw Rating + β·Feature Score + γ·Location Score + δ·Review Sentiment Score
The coefficients α, β, γ, and δ are learned through five‑fold cross‑validation, minimizing mean squared error (MSE) between the adjusted rating and a hidden “ground‑truth” metric derived from external sales data supplied by a partner firm. Empirical results show that the adjusted rating reduces average prediction error by 12.4 % compared with the raw Yelp average. Notably, the gap between high‑priced downtown establishments and lower‑priced suburban venues shrinks considerably, suggesting that the model mitigates location‑related over‑ or under‑rating. Moreover, the correlation between adjusted ratings and actual revenue rises to 0.68, a substantial improvement over the 0.51 correlation observed for raw Yelp stars.
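The fitting step can be illustrated with ordinary least squares on synthetic data. The paper's actual setup (five-fold cross-validation against proprietary sales data) cannot be reproduced here, so this sketch only shows the MSE-minimizing solve via the normal equations, in pure Python:

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_coefficients(X, y):
    """Minimize MSE by solving the normal equations (X^T X) w = X^T y."""
    n, d = len(X), len(X[0])
    XtX = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(d)]
           for i in range(d)]
    Xty = [sum(X[r][i] * y[r] for r in range(n)) for i in range(d)]
    return solve(XtX, Xty)

# Synthetic check: targets generated from known coefficients are recovered.
# Columns: raw rating, feature score, location score, sentiment score.
true_w = [0.6, 0.15, 0.1, 0.15]
X = [[4.0, 0.8, 0.5, 0.3], [3.5, 0.2, 0.9, -0.1],
     [5.0, 0.6, 0.1, 0.7], [2.0, 0.4, 0.4, 0.0],
     [4.5, 0.9, 0.7, 0.5]]
y = [sum(w * x for w, x in zip(true_w, row)) for row in X]
w = fit_coefficients(X, y)
```

A five-fold cross-validation loop would wrap `fit_coefficients` around five train/test splits and average the held-out MSE, but the fit itself is the same solve.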
The authors discuss several limitations. First, the weighting scheme for features and location is derived from U.S. data and may not transfer to other cultural contexts without recalibration. Second, sentiment analysis is optimized for English reviews; non‑English content receives limited treatment, potentially biasing results in multilingual markets. Third, the linear formulation cannot capture complex, non‑linear interactions among amenities, geography, and textual sentiment that more sophisticated models (e.g., gradient boosting, deep collaborative filtering) could exploit. Additionally, the paper lacks a thorough treatment of data cleaning steps such as duplicate review removal, spam detection, and handling of missing attribute values, which raises concerns about reproducibility.
In conclusion, the study makes a valuable contribution by quantifying and partially correcting systematic biases in an influential online rating platform. It demonstrates that integrating amenity information, geographic context, and sentiment‑derived textual cues can produce a more equitable and business‑relevant rating metric. Future work is suggested to incorporate multimodal data (photos, videos), adopt non‑linear machine‑learning techniques, and validate the framework on international datasets to assess its generalizability.