📝 Original Info
- Title: Agents, Bookmarks and Clicks: A topical model of Web traffic
- ArXiv ID: 1003.5327
- Date: 2011-11-24
- Authors: Mark R. Meiss, Bruno Gonçalves, José J. Ramasco, Alessandro Flammini, Filippo Menczer
📝 Abstract
Analysis of aggregate and individual Web traffic has shown that PageRank is a poor model of how people navigate the Web. Using the empirical traffic patterns generated by a thousand users, we characterize several properties of Web traffic that cannot be reproduced by Markovian models. We examine both aggregate statistics capturing collective behavior, such as page and link traffic, and individual statistics, such as entropy and session size. No model currently explains all of these empirical observations simultaneously. We show that all of these traffic patterns can be explained by an agent-based model that takes into account several realistic browsing behaviors. First, agents maintain individual lists of bookmarks (a non-Markovian memory mechanism) that are used as teleportation targets. Second, agents can retreat along visited links, a branching mechanism that also allows us to reproduce behaviors such as the use of a back button and tabbed browsing. Finally, agents are sustained by visiting novel pages of topical interest, with adjacent pages being more topically related to each other than distant ones. This modulates the probability that an agent continues to browse or starts a new session, allowing us to recreate heterogeneous session lengths. The resulting model is capable of reproducing the collective and individual behaviors we observe in the empirical data, reconciling the narrowly focused browsing patterns of individual users with the extreme heterogeneity of aggregate traffic measurements. This result allows us to identify a few salient features that are necessary and sufficient to interpret the browsing patterns observed in our data. In addition to the descriptive and explanatory power of such a model, our results may lead the way to more sophisticated, realistic, and effective ranking and crawling algorithms.
💡 Deep Analysis
Deep Dive into Agents, Bookmarks and Clicks: A topical model of Web traffic.
Analysis of aggregate and individual Web traffic has shown that PageRank is a poor model of how people navigate the Web. Using the empirical traffic patterns generated by a thousand users, we characterize several properties of Web traffic that cannot be reproduced by Markovian models. We examine both aggregate statistics capturing collective behavior, such as page and link traffic, and individual statistics, such as entropy and session size. No model currently explains all of these empirical observations simultaneously. We show that all of these traffic patterns can be explained by an agent-based model that takes into account several realistic browsing behaviors. First, agents maintain individual lists of bookmarks (a non-Markovian memory mechanism) that are used as teleportation targets. Second, agents can retreat along visited links, a branching mechanism that also allows us to reproduce behaviors such as the use of a back button and tabbed browsing. Finally, agents are sustained by
📄 Full Content
Agents, Bookmarks and Clicks: A topical model of Web navigation
Mark R. Meiss (1,3,*)
Bruno Gonçalves (1,2,3)
José J. Ramasco (4)
Alessandro Flammini (1,2)
Filippo Menczer (1,2,3,4)
1School of Informatics and Computing, Indiana University, Bloomington, IN, USA
2Center for Complex Networks and Systems Research, Indiana University, Bloomington, IN, USA
3Pervasive Technology Institute, Indiana University, Bloomington, IN, USA
4Complex Networks and Systems Lagrange Laboratory (CNLL), ISI Foundation, Turin, Italy
ABSTRACT
Analysis of aggregate and individual Web traffic has shown that PageRank is a poor model of how people navigate the Web. Using the empirical traffic patterns generated by a thousand users, we characterize several properties of Web traffic that cannot be reproduced by Markovian models. We examine both aggregate statistics capturing collective behavior, such as page and link traffic, and individual statistics, such as entropy and session size. No model currently explains all of these empirical observations simultaneously. We show that all of these traffic patterns can be explained by an agent-based model that takes into account several realistic browsing behaviors. First, agents maintain individual lists of bookmarks (a non-Markovian memory mechanism) that are used as teleportation targets. Second, agents can retreat along visited links, a branching mechanism that also allows us to reproduce behaviors such as the use of a back button and tabbed browsing. Finally, agents are sustained by visiting novel pages of topical interest, with adjacent pages being more topically related to each other than distant ones. This modulates the probability that an agent continues to browse or starts a new session, allowing us to recreate heterogeneous session lengths. The resulting model is capable of reproducing the collective and individual behaviors we observe in the empirical data, reconciling the narrowly focused browsing patterns of individual users with the extreme heterogeneity of aggregate traffic measurements. This result allows us to identify a few salient features that are necessary and sufficient to interpret the browsing patterns observed in our data. In addition to the descriptive and explanatory power of such a model, our results may lead the way to more sophisticated, realistic, and effective ranking and crawling algorithms.
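The three mechanisms named in the abstract (bookmarks as teleportation targets, retreating along visited links, and session termination) can be illustrated with a minimal Python sketch. This is not the paper's actual model or notation; the function name, parameters, and probabilities are illustrative assumptions, and the paper's topical modulation of session endings is simplified here to a constant probability.

```python
import random
from collections import Counter

def simulate_agent(graph, start, n_clicks, p_back=0.5, p_new_session=0.1):
    """One browsing agent on a directed graph (dict: page -> list of out-links).

    Illustrative mechanisms (not the paper's parameterization):
      - bookmarks: a frequency-ranked list of visited pages used as
        teleportation targets when a new session starts;
      - back button: a stack of visited pages the agent can pop,
        retreating along visited links;
      - sessions: with probability p_new_session the agent abandons the
        current session and teleports to a bookmark. (In the paper this
        probability is modulated by topical interest; here it is constant.)
    """
    bookmarks = Counter({start: 1})
    stack = [start]
    current = start
    trace = [start]
    for _ in range(n_clicks):
        if random.random() < p_new_session:
            # teleport: pick a bookmark with probability proportional to
            # its visit frequency (a rich-get-richer memory mechanism)
            pages, weights = zip(*bookmarks.items())
            current = random.choices(pages, weights=weights)[0]
            stack = [current]
        elif stack and random.random() < p_back:
            # back button: retreat to the previously visited page
            current = stack.pop()
        else:
            links = graph.get(current, [])
            if not links:
                continue  # dead end: stay put this step
            stack.append(current)
            current = random.choice(links)
        bookmarks[current] += 1
        trace.append(current)
    return trace
```

Because the bookmark list grows with the agent's own history, the process is non-Markovian: the next page depends on the full visit record, not just the current page.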
∗Corresponding author. Email: mmeiss@indiana.edu
Categories and Subject Descriptors
H.3.4 [Information Storage and Retrieval]: Systems and Software (Information networks); H.4.3 [Information Systems Applications]: Communications Applications (Information browsers); H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia (Navigation)
Keywords
Web links, navigation, traffic, clicks, browsing, entropy, sessions, agent-based model, bookmarks, back button, interest, topicality, PageRank, BookRank
1. INTRODUCTION
Despite its simplicity, PageRank [6] has been a remarkably robust model of human Web browsing, characterizing it as a random surfing activity. Such models of Web surfing have allowed us to speculate about how people interact with the Web. As ever more people spend a growing portion of their time online, their Web traces provide an increasingly informative window into human dynamics. The availability of large volumes of Web traffic data enables systematic testing of PageRank's underlying navigation assumptions [20]. Traffic patterns aggregated across users have revealed that some of its key assumptions (a uniform random walk and uniform random teleportation) are widely violated, making PageRank a poor predictor of traffic. Such results leave open the question of how to design a better Web navigation model. Here we expand on our previous empirical analysis [20, 19] by also considering individual traffic patterns [14]. Our results provide further evidence for the limits of simple (memoryless) Markovian models such as PageRank. They suggest the need for an agent-based model with more realistic features, such as memory and topicality, to account for both the individual and aggregate traffic patterns observed in real-world data.
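For contrast, the memoryless random-surfer process underlying PageRank, with exactly the two assumptions tested above, can be sketched in a few lines of Python. This is a standard textbook rendering of the random surfer, not code from the paper; the damping value 0.85 is the conventional choice.

```python
import random

def random_surfer(graph, n_steps, alpha=0.85, seed=None):
    """Memoryless PageRank surfer on a directed graph (dict: page -> out-links).

    With probability alpha, follow a uniformly chosen out-link (uniform
    random walk); otherwise, jump to a uniformly chosen page (uniform
    random teleportation). Visit counts estimate PageRank. The next page
    depends only on the current one: a first-order Markov chain.
    """
    rng = random.Random(seed)
    pages = list(graph)
    visits = {p: 0 for p in pages}
    current = rng.choice(pages)
    for _ in range(n_steps):
        links = graph.get(current, [])
        if links and rng.random() < alpha:
            current = rng.choice(links)  # uniform random walk step
        else:
            current = rng.choice(pages)  # uniform teleportation
        visits[current] += 1
    return visits
```

The empirical findings cited above are violations of exactly these two uniform-choice steps: real users neither pick out-links uniformly nor teleport to pages uniformly.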
Models of user browsing also have important practical applications. First, the traffic received by pages and Web sites has a direct impact on the financial success of many companies and institutions. Indirectly, understanding traffic patterns has consequences for predicting advertising revenues and for the policies used to set advertising prices [11]. Second, realistic models of Web navigation could guide
…(Full text truncated)…
Reference
This content is AI-processed based on ArXiv data.