Analysis of Web Logs and Web User in Web Mining

📝 Abstract

Log files contain information about User Name, IP Address, Time Stamp, Access Request, number of Bytes Transferred, Result Status, URL that Referred and User Agent. The log files are maintained by the web servers. Analysing these log files gives a clear picture of the user. This paper gives a detailed discussion of these log files: their formats, their creation, access procedures, their uses, the various algorithms used, and the additional parameters that can be recorded in the log files to enable more effective mining. It also presents the idea of creating an extended log file and of learning user behaviour.
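The fields listed above correspond closely to the NCSA/Apache Combined Log Format. As a minimal sketch, assuming a server that writes that format (the paper does not name one), the following Python extracts each field from a single log line with a regular expression. The regex, the `parse_line` name and the sample entry are illustrative assumptions, not taken from the paper.

```python
import re
from datetime import datetime

# Combined Log Format (assumed):
# host ident authuser [time] "request" status bytes "referrer" "user-agent"
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    m = COMBINED_LOG.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    # A "-" in the bytes field means no body was sent.
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    # Split the access request, e.g. 'GET /products.html HTTP/1.1'.
    method, _, rest = rec["request"].partition(" ")
    rec["method"] = method
    rec["url"] = rest.rsplit(" ", 1)[0]
    return rec


# Invented sample entry, for illustration only.
SAMPLE = ('192.168.1.10 - alice [10/Oct/2010:13:55:36 +0530] '
          '"GET /products.html HTTP/1.1" 200 2326 '
          '"http://www.example.com/index.html" "Mozilla/5.0 (Windows NT 6.1)"')

if __name__ == "__main__":
    rec = parse_line(SAMPLE)
    print(rec["ip"], rec["user"], rec["method"], rec["url"], rec["status"])
    # prints: 192.168.1.10 alice GET /products.html 200
```

In this format the user name field is `-` unless the visitor authenticated, so most entries identify the user only by IP address.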

📄 Content

International Journal of Network Security & Its Applications (IJNSA), Vol.3, No.1, January 2011
DOI: 10.5121/ijnsa.2011.3107

ANALYSIS OF WEB LOGS AND WEB USER IN WEB MINING

L.K. Joshila Grace¹, V. Maheswari², Dhinaharan Nagamalai³
¹Research Scholar, Department of Computer Science and Engineering, joshilagracejebin@gmail.com
²Professor and Head, Department of Computer Applications
¹,²Sathyabama University, Chennai, India
³Wireilla Net Solutions PTY Ltd, Australia

KEYWORDS: Web Log file, Web usage mining, Web servers, Log data, Log Level directive.

1. INTRODUCTION
Log files are files that list the actions that have occurred. These log files reside on the web server. Computers that deliver web pages are called web servers. The web server stores all of the files necessary to display the web pages on the user's computer: the individual web pages that together make up the complete web site, the images/graphic files, and any scripts that make the dynamic elements of the site function. The browser requests the data from the web server, and the server uses HTTP to deliver the data back to the browser that requested the web page. The browser in turn converts, or formats, the files into a user-viewable page, which is then displayed. In the same way, the server can send files to many client computers at the same time, allowing multiple clients to view the same page simultaneously.
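The request/response cycle described above, with the server recording each request, can be sketched with Python's standard-library `http.server` module. In this minimal sketch, assuming a Combined Log Format layout for the log lines (the paper does not prescribe one), the server answers GET requests for two made-up pages and appends one line to an in-memory log for every request it handles, including failed ones. The page paths, `ACCESS_LOG`, `LoggingHandler` and `run_demo` are hypothetical names introduced for illustration; a real server would write to a file on disk.

```python
import threading
import urllib.error
import urllib.request
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site content: path -> HTML body.
PAGES = {
    "/index.html": b"<html><body>Home</body></html>",
    "/products.html": b"<html><body>Products</body></html>",
}

# In-memory stand-in for the server's access log file.
ACCESS_LOG = []


class LoggingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        status = 200 if body is not None else 404
        if body is None:
            body = b"Not Found"
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        # Record the request before sending the body.
        self.write_log_entry(status, len(body))
        self.wfile.write(body)

    def write_log_entry(self, status, size):
        # Assumed Combined Log Format layout; ident and user are unknown ("-").
        stamp = datetime.now(timezone.utc).strftime("%d/%b/%Y:%H:%M:%S %z")
        referrer = self.headers.get("Referer", "-")
        agent = self.headers.get("User-Agent", "-")
        ACCESS_LOG.append(
            f'{self.client_address[0]} - - [{stamp}] "{self.requestline}" '
            f'{status} {size} "{referrer}" "{agent}"'
        )

    def log_message(self, format, *args):
        # Silence the default stderr logging; ACCESS_LOG is the log here.
        pass


def run_demo(paths):
    """Start a local server, request each path once, return the new log lines."""
    start = len(ACCESS_LOG)
    server = HTTPServer(("127.0.0.1", 0), LoggingHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    port = server.server_address[1]
    # Bypass any proxy settings from the environment for this local demo.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        for path in paths:
            req = urllib.request.Request(
                f"http://127.0.0.1:{port}{path}",
                headers={"User-Agent": "DemoBrowser/1.0"},
            )
            try:
                with opener.open(req) as resp:
                    resp.read()
            except urllib.error.HTTPError as err:
                err.read()  # a 404 still produces a log entry
                err.close()
    finally:
        server.shutdown()
        server.server_close()
    return ACCESS_LOG[start:]


if __name__ == "__main__":
    for line in run_demo(["/index.html", "/missing.html"]):
        print(line)
```

Running the demo yields two entries, one with status 200 and one with status 404, each carrying the client IP, time stamp, request line, byte count, referrer (`-` when the browser sends none) and user agent.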

2. CONTENTS OF A LOG FILE
The log files of different web servers maintain different types of information. [6] The basic information present in a log file is:
• User name: This identifies who visited the web site. The user is mostly identified by the IP address assigned by the Internet Service Provider (ISP). This may be a temporary address, so the user cannot be identified uniquely. Some web sites instead build a user profile and let users access the site with a user name and password; with this kind of access each user is identified uniquely, so repeat visits can also be recognized.

• Visiting Path: The path by which the user reached the web site. The user may type the URL directly, click on a link, or arrive through a search engine.
• Path Traversed: The path taken by the user within the web site through its various links.
• Time stamp: The time spent by the user on each web page while surfing through the web site. This is identified as the session.
• Page last visited: The page the user visited before leaving the web site.
• Success rate: The success rate of the web site can be determined by the number of downloads made and the amount of copying activity undertaken by the user. Any purchase of goods or software also adds to the success rate.
• User Agent: The browser from which the user sends the request to the web server. It is simply a string describing the type and version of the browser software being used.
• URL: The resource accessed by the user. It may be an HTML page, a CGI program, or a script.
• Request type: The method used for the information transfer, such as GET or POST.
These are the contents present in the log file. The log file details are used in the web usage mining process. Web usage mining identifies the most heavily utilized web sites, where utilisation means either how frequently a site is visited or how long it is used. Therefore, analysing the log file reveals the quantitative usage of the web site.

3. LOCATION OF A LOG FILE
A Web log is a file to which the Web server writes information each time a user requests a web site from that particular server. [7] A log file can be located in three different places:
• Web Servers
• Web proxy Servers
• Client browsers
3.1 Web Server Log files
The log file that resides in the web server notes the activit
