Power law in website ratings

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

In the practical work of promoting websites and analyzing their efficiency and load, it is essential to take web-ratings data into account. The main indicators of website traffic include the number of unique hosts from which the analyzed website was accessed and the number of web pages served (hits) per unit time (for example, a day, month or year). Of particular interest is the ratio between the number of hits (S) and hosts (H). In practice, the concept of the “average number of pages viewed” (S/H) is even used, which by default assumes a linear dependence of S on H. In fact, a linear dependence is observed only as a special case of a power dependence, and not always. Thus, another new power law has been discovered on the Internet, specifically on the WWW.


💡 Research Summary

The paper investigates the statistical relationship between two fundamental web‑traffic metrics, unique hosts (H) and page hits (S), and a website’s position (r) in a ranking list. While industry practice often assumes a linear relationship (S/H = constant), the authors argue that this is merely a special case of a more general power‑law dependence. They propose the following hypotheses: H = Cₕ · r⁻ᵅ and S = Cₛ · r⁻ᵝ, where α and β are scaling exponents and Cₕ, Cₛ are constants.

To test these hypotheses, the authors collected daily traffic data from the Ukrainian rating service http://top.ucoz.com/, focusing on two distinct content domains: “Business and Finances” and “Games.” For each domain they extracted the number of unique hosts and the total number of hits for each ranked site, then plotted both variables against rank on log‑log axes. The resulting plots (Figures 2 and 3) display straight‑line behavior with slopes ranging from –1.1 to –1.6 and coefficients of determination (R²) between 0.60 and 0.96, confirming that both H and S follow a power‑law distribution with respect to rank.
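The log‑log fitting step can be sketched as an ordinary least‑squares line through (log r, log H) or (log r, log S) pairs: the negated slope estimates α (or β), and R² is the usual coefficient of determination of that line. This is a minimal sketch assuming plain least squares on log10‑transformed values; the summary does not say which fitting procedure the authors used, and the function name `fit_power_law` and the synthetic host counts are hypothetical.

```python
import math
from typing import Sequence, Tuple

def fit_power_law(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = c * x**k by least squares on (log10 x, log10 y).

    Returns (c, k, r_squared); k is the log-log slope and r_squared is the
    coefficient of determination of the straight line in log space.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    k = sxy / sxx
    intercept = mean_y - k * mean_x
    ss_res = sum((b - (intercept + k * a)) ** 2 for a, b in zip(lx, ly))
    ss_tot = sum((b - mean_y) ** 2 for b in ly)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return 10 ** intercept, k, r_squared

# Synthetic, noise-free host counts (hypothetical, not the paper's data).
ranks = list(range(1, 51))
hosts = [5000 * r ** -1.3 for r in ranks]
c_h, slope, r2 = fit_power_law(ranks, hosts)
alpha = -slope  # H = C_h * r**(-alpha), so alpha is the negated slope
print(f"C_h={c_h:.1f}, alpha={alpha:.2f}, R^2={r2:.3f}")  # C_h=5000.0, alpha=1.30, R^2=1.000
```

On real rating data the points scatter around the fitted line, and that scatter is what the reported R² values of 0.60 to 0.96 measure.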

Having established the rank‑based power laws, the authors eliminate r to obtain a direct relationship between hits and hosts. Solving the first equation for r yields r = (H/Cₕ)^(−1/α). Substituting this expression into the second equation gives S = Cₛ · (H/Cₕ)^(β/α) = Cₛ · Cₕ^(−β/α) · H^(β/α), which can be written as S = Cₛₕ · H^γ, where Cₛₕ = Cₛ · Cₕ^(−γ) and γ = β/α. Consequently, S/H = Cₛₕ · H^(γ−1): the ratio is constant only when γ = 1 and otherwise varies with H as a power law, with γ possibly close to, but not necessarily equal to, one.
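Eliminating the rank can be checked numerically. Since H = Cₕ · r^(−α) implies r = (H/Cₕ)^(−1/α), substituting into S = Cₛ · r^(−β) gives S = Cₛₕ · H^γ with γ = β/α and Cₛₕ = Cₛ · Cₕ^(−β/α). The sketch below uses a hypothetical helper `combine_rank_laws` and made‑up constants, and verifies that both routes give the same S at several ranks.

```python
import math
from typing import Tuple

def combine_rank_laws(c_h: float, alpha: float, c_s: float, beta: float) -> Tuple[float, float]:
    """Given H = c_h * r**(-alpha) and S = c_s * r**(-beta), return (c_sh, gamma)
    such that S = c_sh * H**gamma, by eliminating the rank r."""
    gamma = beta / alpha
    c_sh = c_s * c_h ** (-gamma)  # note the negative exponent on c_h
    return c_sh, gamma

# Hypothetical constants, chosen only so that gamma comes out near 1.10.
C_H, ALPHA, C_S, BETA = 4000.0, 1.2, 15000.0, 1.32
c_sh, gamma = combine_rank_laws(C_H, ALPHA, C_S, BETA)
for r in (1, 5, 40):
    h = C_H * r ** -ALPHA        # hosts at rank r
    s_direct = C_S * r ** -BETA  # hits at rank r from the rank law
    s_via_h = c_sh * h ** gamma  # hits from the rank-free law
    assert math.isclose(s_direct, s_via_h, rel_tol=1e-9)
print(f"gamma={gamma:.2f}")  # gamma=1.10
```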

Figure 5 presents the empirical S‑vs‑H relationship for the two domains. In the “Business and Finances” sector the fitted line has a slope γ ≈ 1.10 (R² ≈ 0.81), while in the “Games” sector γ ≈ 1.05 (R² ≈ 0.66). These results indicate that the exponent is often near unity, which justifies the common linear approximation, but that it can deviate measurably from one and differs between content categories.
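Because S = Cₛₕ · H^γ implies S/H = Cₛₕ · H^(γ−1), the exponent determines how the “average number of viewed pages” changes with audience size: when H grows by a factor k, S/H grows by k^(γ−1), independent of Cₛₕ. The sketch below plugs in the two exponents reported above; the helper name `pages_per_host_ratio` and the tenfold host range are illustrative assumptions.

```python
def pages_per_host_ratio(h_small: float, h_large: float, gamma: float) -> float:
    """Factor by which S/H changes when hosts grow from h_small to h_large,
    assuming S = c_sh * H**gamma (the constant c_sh cancels out)."""
    return (h_large / h_small) ** (gamma - 1.0)

# Exponents as reported for the two categories; the host range is illustrative.
for category, gamma in [("Business and Finances", 1.10), ("Games", 1.05), ("linear", 1.0)]:
    print(f"{category}: x{pages_per_host_ratio(100, 1000, gamma):.3f}")
# Business and Finances: x1.259
# Games: x1.122
# linear: x1.000
```

With γ = 1.10 a tenfold increase in hosts raises pages per host by about 26%, compared with about 12% for γ = 1.05; only γ = 1 keeps S/H constant.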

The authors also explore the effect of ranking sites by hits rather than by hosts. When sites are ordered by decreasing S, the rank‑vs‑S plot (Figure 4) is smoother than when hits are plotted against a host‑based rank, suggesting that the choice of ranking metric influences the apparent regularity of the power‑law distribution.
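Switching the ranking metric only changes the sort key used to assign ranks. A minimal sketch, assuming ranks run 1..n by decreasing value with ties broken alphabetically (the summary does not say how the rating service breaks ties); the function `rank_sites`, the site names, and the counts are hypothetical.

```python
from typing import Dict, List, Tuple

def rank_sites(metric: Dict[str, float]) -> List[Tuple[int, str, float]]:
    """Assign ranks 1..n by decreasing metric value; ties are broken by site name
    (an assumption, since the rating service's tie rule is not described)."""
    ordered = sorted(metric.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, name, value) for rank, (name, value) in enumerate(ordered, start=1)]

# Hypothetical daily counts for three sites.
hosts = {"site-a": 120, "site-b": 300, "site-c": 80}
hits = {"site-a": 900, "site-b": 700, "site-c": 200}
print(rank_sites(hosts))  # [(1, 'site-b', 300), (2, 'site-a', 120), (3, 'site-c', 80)]
print(rank_sites(hits))   # [(1, 'site-a', 900), (2, 'site-b', 700), (3, 'site-c', 200)]
```

Ranking by hits makes S non‑increasing in rank by construction, whereas hits plotted against a host‑based rank can zigzag (here site‑a has more hits than the higher‑ranked site‑b), which is one plausible reason the hit‑based plot looks smoother.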

From a practical standpoint, the identified relationships have several applications:

  1. Traffic Analysis: Deviations of a site’s observed hits from the value predicted by the domain‑specific power law S = Cₛₕ · H^γ can flag anomalous behavior, such as bots, content farms, or sudden popularity spikes.
  2. Load Forecasting: Knowing that S scales as H^γ allows operators to predict server load from host‑count forecasts, aiding capacity planning.
  3. Advertising Planning: Advertisers can estimate potential impressions (hits) based on expected reach (unique hosts) using the derived exponent, improving budget allocation.
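The first two applications can be sketched with the fitted relation S = Cₛₕ · H^γ: forecast hits from an expected host count, and flag sites whose observed hits sit far from the prediction in log space. Everything specific here is a hypothetical assumption: Cₛₕ = 3.0 (the summary reports no value), γ = 1.10 taken from the “Business and Finances” fit, a tolerance of 0.3 decades (roughly a factor of two), and the helper names and site data.

```python
import math
from typing import Dict, List, Tuple

def predict_hits(hosts: float, c_sh: float, gamma: float) -> float:
    """Expected hits for a given host count under S = c_sh * H**gamma."""
    return c_sh * hosts ** gamma

def flag_anomalies(
    observed: Dict[str, Tuple[float, float]],
    c_sh: float,
    gamma: float,
    tol_log10: float = 0.3,  # hypothetical threshold: 0.3 decades is roughly a factor of 2
) -> List[str]:
    """Return sites whose observed hits differ from the prediction by more than tol_log10 decades."""
    flagged = []
    for name, (hosts, hits) in observed.items():
        residual = math.log10(hits) - math.log10(predict_hits(hosts, c_sh, gamma))
        if abs(residual) > tol_log10:
            flagged.append(name)
    return flagged

# Hypothetical constants and sites (c_sh is not reported in the summary).
C_SH, GAMMA = 3.0, 1.10
sites = {
    "site-a": (500, 2800),    # close to the prediction (about 2,790 hits)
    "site-b": (500, 20000),   # far more hits than expected
    "site-c": (2000, 12000),  # close to the prediction (about 12,800 hits)
    "site-d": (2000, 3000),   # far fewer hits than expected
}
print(flag_anomalies(sites, C_SH, GAMMA))  # ['site-b', 'site-d']
print(f"forecast for 10,000 hosts: {predict_hits(10_000, C_SH, GAMMA):.0f} hits")
```

Working in log space keeps the tolerance relative, so a small site and a large site are judged by the same proportional deviation.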

The paper concludes that a new power law governs website ratings on the World Wide Web, that in many cases γ is close enough to one to permit a linear approximation, but that the more general power‑law model provides a richer, more accurate framework.

Limitations are acknowledged: the study relies on a single rating service and only two content categories, omits temporal dynamics (seasonality, events), and treats α and β as static constants despite possible time‑varying behavior. Future work is suggested to incorporate larger, multi‑regional datasets, to examine temporal evolution of the exponents, and to test the robustness of the power‑law model under varying traffic acquisition methods.

