A Dynamic Web Page Prediction Model Based on Access
Patterns to Offer Better User Latency
Debajyoti Mukhopadhyay1, 2 Priyanka Mishra1 Dwaipayan Saha1 Young-Chon Kim2
1 Web Intelligence & Distributed Computing Research Lab, Techno India (West Bengal University of Technology)
EM 4/1, Salt Lake Sector V, Calcutta 700091, India
Emails: {debajyoti.mukhopadhyay, dwaipayansaha, priyanka147}@gmail.com
2 Chonbuk National University, Division of Electronics & Information Engineering
561-756 Jeonju, Republic of Korea; Email: yckim@chonbuk.ac.kr
ABSTRACT
The growth of the World Wide Web has emphasized
the need to reduce user latency. Two techniques
used for this purpose are caching and Web
prefetching. Approaches that rely solely on
caching offer limited performance improvement
because it is difficult for caches to handle the
large number of increasingly diverse files.
Studies have been conducted on prefetching
models based on decision trees, Markov chains,
and path analysis. However, the increased use of
dynamic pages and frequent changes in site
structure and user access patterns have limited
the efficacy of these static techniques. In this
paper, we propose a methodology to cluster
related pages into different categories based on
access patterns. Additionally, we use page
ranking to build our prediction model at the
initial stage, when users have not yet started
sending requests. In this way we try to avoid the
problem of maintaining the huge databases that
log-based techniques require.
Keywords
Levels, Classes, Product Value, Prediction
Window, Date of modification, Page rank, links,
prediction model, Predictor, Update Engine
1. INTRODUCTION
The exponential growth of Web usage has
dramatically increased the volume of Internet
traffic and has caused serious performance
degradation in terms of user latency and
bandwidth on the Internet. The World Wide Web
has become indispensable in everyday life, which
makes it critical to find ways to accommodate
increasing numbers of users while preventing
excessive delays and congestion. Studies have
been conducted on prefetching models based on
decision trees, Markov chains, and path analysis
[1][2][4]. Several factors contribute to Web
access latency, such as:
• Server configuration
• Server load
• Client configuration
• Document to be transferred
• Network characteristics
Web caching is a technique that attempts to
reduce these access latencies. In particular,
global caching methods that span users work
quite well. However, the increasing trend of
generating dynamic pages in response to HTTP
requests from users has rendered them quite
ineffective. The following can be seen as the
major reasons for the increased use of dynamic
Web pages:
1. User-customized Web pages, whose content
depends on the user's interests. Such
personalized pages allow users to reach the
information they want in much less time.
2. Pages that need frequent updating. It is
impractical to make those changes on static Web
pages; maintaining a database and generating the
page content from it is a much cheaper
alternative. Pages displaying sports updates,
stock updates, weather information, etc., which
involve many variables, are generated
dynamically.
3. Pages that require user authentication before
displaying their content are also generated
dynamically, since a separate page is generated
for each user from that user's information. This
trend is increasing rapidly.
4. All response pages on a secure connection are
generated dynamically as per the password and
other security features such as encryption keys.
These pages expire immediately, by resetting the
Expires field and/or by the Pragma directive
'no-cache' in the HTTP header of the server
response, to prevent them from being misused in
a replay attack.
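As an illustration of item 4, the following is a minimal sketch of the response headers a server might attach so that a dynamically generated page expires immediately. The function name no_cache_headers is hypothetical, and the Cache-Control line is an HTTP/1.1 addition that the text above does not mention; it is included here only as an assumption about common practice.

```python
from email.utils import formatdate


def no_cache_headers():
    """Return headers that mark a response as already expired (illustrative only)."""
    return {
        # An Expires date in the past (the Unix epoch here) makes the
        # response stale as soon as it is received.
        "Expires": formatdate(0, usegmt=True),
        # HTTP/1.0 directive asking caches not to reuse a stored copy.
        "Pragma": "no-cache",
        # HTTP/1.1 counterpart; an assumption, not named in the text.
        "Cache-Control": "no-cache, no-store",
    }


if __name__ == "__main__":
    for name, value in no_cache_headers().items():
        print(f"{name}: {value}")
```

A cache that honours these headers cannot serve the page to a later request, which is why such responses gain nothing from traditional caching.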
As the Internet grows and becomes a primary
means of communication in business as well as
in day-to-day life, the majority of Web pages
will tend to be dynamic. In such a situation
traditional caching methods will be rendered
obsolete. Dynamic pages also need a substantial
amount of server-side processing after the
request is received from the client, and hence
further increase the access latency.
An important prefetching task is to build an
effective prediction model and data structure for
predicting the user's future requests and then
sending the predicted pages to the user before
he/she actually requests them.
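To make the idea of a prediction model and its data structure concrete, here is a minimal sketch of a first-order Markov-style predictor, one of the earlier approaches cited above. It is not the clustering and page-rank model proposed in this paper; the class name MarkovPredictor, its methods, and the sample sessions are all hypothetical.

```python
from collections import Counter, defaultdict


class MarkovPredictor:
    """Counts page-to-page transitions and predicts the likeliest next pages."""

    def __init__(self):
        # transitions[a][b] = number of times page b was requested right after a
        self.transitions = defaultdict(Counter)

    def train(self, sessions):
        """Update transition counts from an iterable of page-request sequences."""
        for session in sessions:
            for current, following in zip(session, session[1:]):
                self.transitions[current][following] += 1

    def predict(self, page, k=1):
        """Return up to k pages most often requested after `page`."""
        counts = self.transitions.get(page, Counter())
        return [candidate for candidate, _ in counts.most_common(k)]


if __name__ == "__main__":
    # Hypothetical sessions, invented purely for illustration.
    sessions = [
        ["/home", "/news", "/sports"],
        ["/home", "/news", "/weather"],
        ["/home", "/news", "/sports"],
    ]
    predictor = MarkovPredictor()
    predictor.train(sessions)
    print(predictor.predict("/news", k=2))
```

A prefetcher built on such a predictor would send the returned pages to the client ahead of the actual request. Note that the transition table must be learned from request logs, which is the storage burden the abstract says the proposed approach tries to avoid.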
The rest of the paper is organized as
follows: our methodology is presented in Section
2, in Section 3 the Experimental Se