arXiv:0904.0016v1 [cs.CY] 31 Mar 2009
Stochastic Models of User-Contributory Web Sites
Tad Hogg
Hewlett-Packard Laboratories
Kristina Lerman
USC Information Sciences Institute
November 7, 2021
Abstract
We describe a general stochastic processes-based approach to modeling user-contributory
web sites, where users create, rate and share content. These models describe aggregate measures
of activity and how they arise from simple models of individual users. This approach provides
a tractable method to understand user activity on the web site and how this activity depends on
web site design choices, especially the choice of what information about other users’ behaviors
is shown to each user. We illustrate this modeling approach in the context of user-created content
on the news rating site Digg.
1 Introduction
The Web is becoming more complex and dynamic as sites allow users to contribute and personalize
content. Such sites include Digg, Flickr and YouTube where users share and rate news stories, photos
and videos, respectively. Additional examples of such web sites include Wikipedia and Bugzilla,
which enable anyone to contribute to encyclopedia articles or help develop open source software. These
social web sites also often allow users to form explicit links with other users whose contributions
they find interesting and highlight the activity of a user’s designated friends [13] to help users find
relevant content.
Web sites often provide users with aggregate summaries of recent activity. For example, both
Digg and Flickr have a front page that features ‘hot’ (popular or interesting) content. News orga-
nizations, such as The New York Times, allow users to subscribe to or embed RSS feeds of their
most popular (e.g., emailed) stories in the users’ own pages. Feedback between individual and
collective actions can lead to nonlinear amplification of even small signals. For example, the ‘Digg
effect’ refers to the phenomenon where a ‘hot’ story on the social news aggregator Digg brings
down the servers hosting the story because they are not equipped to handle the heavy traffic that a
popular story on Digg generates.
Aggregate activity of many users determines the structure and usefulness of user-participatory
web sites. Understanding this emergent behavior will enable, for example, predicting which newly
contributed content will likely become popular, identifying productive ways to change how information
is displayed to users, or determining how to change user incentives so as to improve the content.
The behavior of an individual user on a user-contributory web site is governed by a myriad of
social, economic, emotional and cognitive factors, and often subject to unpredictable environmental
influences, such as the weather or the economy. Nevertheless, the combined activities of many users
often produce remarkably robust aggregate behaviors [24, 25].
In this paper, we present a stochastic processes-based framework for relating aggregate behavior
of web users to simple descriptions of their typical individual behavior. The models can be written
directly from the individual behavior descriptions, and quantified with empirical observations of a
representative sample of users.
The methodology we describe applies to behaviors that can be modeled as Markov processes,
i.e., where the relevant changes depend only on the current state of the system, not the detailed
history of how it arrived at that state. In principle such models can always be applied by extending
the complexity of the “state” describing the system. However, such complexity can lead to models
requiring estimates for an impractically large number of parameters characterizing how the state
changes. Instead, the Markov modeling assumption is useful primarily in connection with systems
requiring only a few variables to define their current state.
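As a minimal sketch of the Markov assumption, consider a hypothetical user who is either "idle" or "active" on a site. The two states and the transition probabilities below are illustrative assumptions, not values from this paper. The point is only that the distribution at the next step is computed from the current distribution alone, with no reference to earlier history.

```python
# Hypothetical two-state user model. The states and probabilities are
# illustrative assumptions, not taken from the paper.
STATES = ("idle", "active")

# TRANSITION[i][j] = probability of moving from state i to state j in one step.
TRANSITION = (
    (0.9, 0.1),  # idle -> idle, idle -> active
    (0.5, 0.5),  # active -> idle, active -> active
)


def step(dist, transition=TRANSITION):
    """Advance a probability distribution over states by one time step.

    Only the current distribution is used: this is the Markov property.
    """
    n = len(dist)
    return tuple(
        sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)
    )


def evolve(dist, steps, transition=TRANSITION):
    """Apply `step` repeatedly, starting from the distribution `dist`."""
    for _ in range(steps):
        dist = step(dist, transition)
    return dist


# Start with the user certainly idle and follow the distribution for 50 steps.
final = evolve((1.0, 0.0), 50)
for name, prob in zip(STATES, final):
    print(f"P({name}) = {prob:.4f}")
```

With only two states, this model needs just two free parameters (one per row of the transition table). Adding history to the state would multiply the number of states and hence the number of parameters to estimate, which is the practical limitation noted above.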
At first glance an assumption of Markov processes and simple states may appear overly restric-
tive for describing human behavior. However, many online activities provide only a fairly limited
set of actions for users and present information based on little or no historical context of particu-
lar individuals. In these cases, a few state variables can capture the main context involved in user
actions. Furthermore, we discuss simplifying approximations to the models that readily enable iden-
tifying how key system behaviors relate to user actions. These simplifications come at a cost: while
the resulting models correctly describe the typical aggregate behaviors, they say little about their
extreme cases, e.g., where web site use is suddenly and briefly much larger than average. Even with
this limitation, however, simplified models are often preferred over full models. Full models
frequently require multiple simulation trials that are computationally expensive, and their typical
behaviors can be challenging to identify [14].
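The trade-off between simplified and full models can be sketched with a toy example. The voting model below is a hypothetical illustration, not the Digg model developed in this paper: each of `n_users` users who has not yet voted on a story votes with probability `p_vote` per time step. All names and parameter values are assumptions. The average is obtained from a single deterministic recurrence, whereas the stochastic version must be run many times to estimate the same quantity.

```python
import random


def mean_votes(n_users, p_vote, steps):
    """Deterministic recurrence for the average number of votes.

    For this linear toy model the recurrence gives the exact average;
    in general such equations are only an approximation.
    """
    votes = 0.0
    for _ in range(steps):
        votes += p_vote * (n_users - votes)
    return votes


def simulate_votes(n_users, p_vote, steps, rng):
    """One stochastic trial: each remaining user votes with prob. p_vote."""
    votes = 0
    for _ in range(steps):
        remaining = n_users - votes
        votes += sum(1 for _ in range(remaining) if rng.random() < p_vote)
    return votes


# Illustrative parameter values (assumptions, not fitted to any data).
N_USERS, P_VOTE, STEPS, TRIALS = 100, 0.05, 20, 200

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = [simulate_votes(N_USERS, P_VOTE, STEPS, rng) for _ in range(TRIALS)]
sim_mean = sum(trials) / len(trials)
avg = mean_votes(N_USERS, P_VOTE, STEPS)

print(f"average from recurrence: {avg:.2f}")
print(f"average over {TRIALS} trials: {sim_mean:.2f}")
print(f"range over trials: {min(trials)} to {max(trials)}")
```

The recurrence yields the typical behavior at negligible cost, but only the simulated trials show the spread around it, which is the kind of information about extreme cases that the simplified description leaves out.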
The paper is organized as follows. Section 2 reviews the stochastic modeling framework. In
Section 3 we then illustrate the framework fo
…(Full text truncated)…