Agents, Bookmarks and Clicks: A topical model of Web traffic

Reading time: 6 minutes

📝 Original Info

  • Title: Agents, Bookmarks and Clicks: A topical model of Web traffic
  • ArXiv ID: 1003.5327
  • Date: 2011-11-24
  • Authors: Mark R. Meiss, Bruno Gonçalves, José J. Ramasco, Alessandro Flammini, Filippo Menczer

📝 Abstract

Analysis of aggregate and individual Web traffic has shown that PageRank is a poor model of how people navigate the Web. Using the empirical traffic patterns generated by a thousand users, we characterize several properties of Web traffic that cannot be reproduced by Markovian models. We examine both aggregate statistics capturing collective behavior, such as page and link traffic, and individual statistics, such as entropy and session size. No model currently explains all of these empirical observations simultaneously. We show that all of these traffic patterns can be explained by an agent-based model that takes into account several realistic browsing behaviors. First, agents maintain individual lists of bookmarks (a non-Markovian memory mechanism) that are used as teleportation targets. Second, agents can retreat along visited links, a branching mechanism that also allows us to reproduce behaviors such as the use of a back button and tabbed browsing. Finally, agents are sustained by visiting novel pages of topical interest, with adjacent pages being more topically related to each other than distant ones. This modulates the probability that an agent continues to browse or starts a new session, allowing us to recreate heterogeneous session lengths. The resulting model is capable of reproducing the collective and individual behaviors we observe in the empirical data, reconciling the narrowly focused browsing patterns of individual users with the extreme heterogeneity of aggregate traffic measurements. This result allows us to identify a few salient features that are necessary and sufficient to interpret the browsing patterns observed in our data. In addition to the descriptive and explanatory power of such a model, our results may lead the way to more sophisticated, realistic, and effective ranking and crawling algorithms.
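The three mechanisms described in the abstract (bookmark-based teleportation, back-button branching, and interest-limited sessions) can be sketched as a toy simulation. This is an illustration only: the class name, parameter values, and interest dynamics below are assumptions made for this sketch, not the paper's actual model.

```python
import random
from collections import Counter

class BrowsingAgent:
    """Toy sketch of the agent described above: bookmark memory (non-Markovian),
    retreat along visited links (back button), and interest-driven sessions.
    Parameter names and values (p_back, interest updates) are illustrative."""

    def __init__(self, graph, start, p_back=0.2, seed=0):
        self.graph = graph                    # page -> list of outgoing links
        self.rng = random.Random(seed)
        self.bookmarks = Counter({start: 1})  # memory of visited pages
        self.back_stack = []                  # enables retreat along visited links
        self.page = start
        self.p_back = p_back
        self.interest = 1.0

    def teleport(self):
        """Start a new session from a bookmark, weighted by past visit counts."""
        pages = list(self.bookmarks)
        weights = [self.bookmarks[p] for p in pages]
        self.page = self.rng.choices(pages, weights=weights)[0]
        self.back_stack.clear()
        self.interest = 1.0 + self.rng.random()  # fresh interest for the session

    def step(self):
        out_links = self.graph.get(self.page, [])
        if self.interest <= 0 or (not out_links and not self.back_stack):
            self.teleport()                   # session ends; jump to a bookmark
            return
        if self.back_stack and (not out_links or self.rng.random() < self.p_back):
            self.page = self.back_stack.pop()  # back-button move
        else:
            self.back_stack.append(self.page)
            self.page = self.rng.choice(out_links)
        # Novel pages sustain interest; revisits deplete it (toy topicality proxy)
        self.interest += 0.3 if self.page not in self.bookmarks else -0.4
        self.bookmarks[self.page] += 1
```

Because teleportation targets are drawn with weights proportional to past visits, frequently revisited pages accumulate traffic in a way no memoryless walk reproduces, which is the non-Markovian effect the abstract emphasizes.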


📄 Full Content

Agents, Bookmarks and Clicks: A topical model of Web navigation

Mark R. Meiss (1,3; corresponding author, mmeiss@indiana.edu), Bruno Gonçalves (1,2,3), José J. Ramasco (4), Alessandro Flammini (1,2), Filippo Menczer (1,2,3,4)

1 School of Informatics and Computing, Indiana University, Bloomington, IN, USA
2 Center for Complex Networks and Systems Research, Indiana University, Bloomington, IN, USA
3 Pervasive Technology Institute, Indiana University, Bloomington, IN, USA
4 Complex Networks and Systems Lagrange Laboratory (CNLL), ISI Foundation, Turin, Italy

ABSTRACT

Analysis of aggregate and individual Web traffic has shown that PageRank is a poor model of how people navigate the Web. Using the empirical traffic patterns generated by a thousand users, we characterize several properties of Web traffic that cannot be reproduced by Markovian models. We examine both aggregate statistics capturing collective behavior, such as page and link traffic, and individual statistics, such as entropy and session size. No model currently explains all of these empirical observations simultaneously. We show that all of these traffic patterns can be explained by an agent-based model that takes into account several realistic browsing behaviors. First, agents maintain individual lists of bookmarks (a non-Markovian memory mechanism) that are used as teleportation targets. Second, agents can retreat along visited links, a branching mechanism that also allows us to reproduce behaviors such as the use of a back button and tabbed browsing. Finally, agents are sustained by visiting novel pages of topical interest, with adjacent pages being more topically related to each other than distant ones. This modulates the probability that an agent continues to browse or starts a new session, allowing us to recreate heterogeneous session lengths.

The resulting model is capable of reproducing the collective and individual behaviors we observe in the empirical data, reconciling the narrowly focused browsing patterns of individual users with the extreme heterogeneity of aggregate traffic measurements. This result allows us to identify a few salient features that are necessary and sufficient to interpret the browsing patterns observed in our data. In addition to the descriptive and explanatory power of such a model, our results may lead the way to more sophisticated, realistic, and effective ranking and crawling algorithms.

Categories and Subject Descriptors: H.3.4 [Information Storage and Retrieval]: Systems and Software—Information networks; H.4.3 [Information Systems Applications]: Communications Applications—Information browsers; H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia—Navigation

Keywords: Web links, navigation, traffic, clicks, browsing, entropy, sessions, agent-based model, bookmarks, back button, interest, topicality, PageRank, BookRank

1. INTRODUCTION

Despite its simplicity, PageRank [6] has been a remarkably robust model of human Web browsing, characterizing it as a random surfing activity. Such models of Web surfing have allowed us to speculate how people interact with the Web. As ever more people spend a growing portion of their time online, their Web traces provide an increasingly informative window into human dynamics.

The availability of large volumes of Web traffic data enables systematic testing of PageRank's underlying navigation assumptions [20]. Traffic patterns aggregated across users have revealed that some of its key assumptions—uniform random walk and uniform random teleportation—are widely violated, making PageRank a poor predictor of traffic. Such results leave open the question of how to design a better Web navigation model. Here we expand on our previous empirical analysis [20, 19] by also considering individual traffic patterns [14]. Our results provide further evidence for the limits of simple (memoryless) Markovian models such as PageRank. They suggest the need for an agent-based model with more realistic features, such as memory and topicality, to account for both the individual and aggregate traffic patterns observed in real-world data.

Models of user browsing also have important practical applications. First, the traffic received by pages and Web sites has a direct impact on the financial success of many companies and institutions. Indirectly, understanding traffic patterns has consequences for predicting advertising revenues and for the policies used to establish advertising prices [11]. Second, realistic models of Web navigation could guide
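For contrast with the agent-based approach, the memoryless random-surfer baseline that the introduction critiques can be sketched as follows. This is a minimal illustration of PageRank's two assumptions (uniform random walk, uniform random teleportation); the function name and parameter values are choices made for this sketch.

```python
import random

def random_surfer(graph, pages, steps=10000, damping=0.85, seed=0):
    """Memoryless (Markovian) PageRank surfer: with probability `damping`
    follow a uniformly random outgoing link, otherwise teleport to a
    uniformly random page. Returns empirical visit frequencies."""
    rng = random.Random(seed)
    visits = {p: 0 for p in pages}
    page = rng.choice(pages)
    for _ in range(steps):
        out = graph.get(page, [])
        if out and rng.random() < damping:
            page = rng.choice(out)    # uniform random walk along links
        else:
            page = rng.choice(pages)  # uniform random teleportation
        visits[page] += 1
    return {p: v / steps for p, v in visits.items()}
```

Because this surfer has no bookmarks, no back button, and no notion of topical interest, its visit frequencies depend only on link structure, which is exactly why it fails to reproduce the heterogeneous, user-focused traffic patterns the paper reports.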

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
