In recent years, Online Social Networks (OSN) have acquired huge and increasing popularity as one of the most important emerging Web phenomena, deeply modifying the behavior of users and helping to build a solid substrate of connections and relationships among people using the Web. In this preliminary work, our purpose is to analyze Facebook, considering a significant sample of data reflecting relationships among subscribed users. Our goal is to extract relevant information about the distribution of these relations from this platform, and to exploit tools and algorithms provided by Social Network Analysis (SNA) to discover and, possibly, understand underlying similarities between the development of OSN and real-life social networks.
The problem of analyzing social networks was already introduced in the late sixties by Milgram [18] and Travers [23] in Psychology and Sociology. Over the following twenty or more years, sociologists conducted and studied several kinds of real-life social experiments, trying to understand the motivations, dynamics and rules of real-life social networks.
In recent years the Web phenomenon of OSN started spreading, and computational aspects have also been considered [3,15,22]. Several Social Networking Services were developed, most of which gathered millions of users in an incredibly short amount of time. The OSN we consider in this work is Facebook, which had more than 500 million users worldwide as of July 2010.
The unexpected success and fast growth rate of these platforms soon raised new and fascinating academic problems; e.g., is it possible to study OSN with the tools provided by SNA [19]? Is the behavior of OSN users comparable with that shown by actors of real-life social networks [12]? What are the topological characteristics of OSN [2]? And what about their structure and evolution [16]? Today we can exploit computational resources to analyze data acquired from OSN, trying to answer these questions. In this work, we analyze connections among almost a million Facebook users, using data collected through ad hoc Information Extraction techniques we developed.
This paper is organized as follows: Section 2 discusses related work on social networks and OSN, in particular regarding data mining experiments and SNA; Section 3 covers aspects of Artificial Intelligence and Information Extraction related to the algorithms and techniques used to acquire and gather data from Facebook; Section 4 presents the collected data, focusing on their statistical analysis with some tools provided by SNA. In Section 5 we plot this information graphically, i.e., as a large graph where nodes represent users and edges reflect ties among them. Section 6 concludes, providing some suggestions for future work.
The literature on Web (and social Web) data extraction is growing: Ferrara et al. [10] provided a comprehensive survey of its applications and techniques. In [9], Ferrara and Baumgartner developed techniques for automatic wrapper adaptation. A slightly modified version of that algorithm, which analyzes structural similarities in the DOM tree of Facebook friend-list pages, is the core of the agent we use to gather data.
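To make "structural similarity between DOM trees" concrete, the following Python sketch implements Simple Tree Matching, a classic dynamic-programming algorithm that counts the largest set of matching nodes between two ordered, labeled trees. It is only an illustration of the general technique: the `Node` class, the normalization used in `similarity`, and the toy friend-list fragments are our assumptions, not the exact algorithm of [9] or of our agent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Minimal DOM-like node (hypothetical): a tag name and ordered children."""
    tag: str
    children: List["Node"] = field(default_factory=list)


def simple_tree_matching(a: Node, b: Node) -> int:
    """Number of node pairs in a maximum matching between two ordered, labeled trees."""
    if a.tag != b.tag:
        return 0  # roots with different tags cannot be matched
    m, n = len(a.children), len(b.children)
    # M[i][j]: best matching between the first i children of a and the first j of b
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(
                M[i][j - 1],
                M[i - 1][j],
                M[i - 1][j - 1] + simple_tree_matching(a.children[i - 1], b.children[j - 1]),
            )
    return M[m][n] + 1  # +1 for the matched roots


def tree_size(t: Node) -> int:
    return 1 + sum(tree_size(c) for c in t.children)


def similarity(a: Node, b: Node) -> float:
    """Assumed normalization: matched nodes over the mean size of the two trees."""
    return simple_tree_matching(a, b) / ((tree_size(a) + tree_size(b)) / 2)


# Toy friend-list fragments: the new layout adds a <span> to the first entry.
old = Node("ul", [Node("li", [Node("a")]), Node("li", [Node("a")])])
new = Node("ul", [Node("li", [Node("a"), Node("span")]), Node("li", [Node("a")])])
print(simple_tree_matching(old, new), round(similarity(old, new), 2))  # 5 0.91
```

Adding one element to a list entry lowers the score only slightly, which is the kind of tolerance a wrapper needs when a page layout changes slightly.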
A common SNA task is to discover, if they exist, aggregations and subsets of nodes that play similar roles or occupy a particular position in a network [7]. Closely related problems concern optimizing the visual representation of graphs [4]. For large social network graphs it is not trivial to find a meaningful graphical representation, because of the number of elements to display; part of the solution is finding algorithms for the planar embedding of the graph, so as to reduce (or eliminate) intersecting edges and improve the aesthetic and functional characteristics of the graph itself [5].
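Edge crossings are the quantity that such layout algorithms try to reduce. As a hedged illustration (not taken from [4] or [5]), the Python sketch below counts pairwise crossings in a straight-line drawing using orientation tests; the `pos` and `edges` structures are hypothetical. It runs in O(m²) time in the number of edges m, so it only suits small drawings; graphs of the size considered here would call for sweep-line or sampling approaches.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def orient(p: Point, q: Point, r: Point) -> float:
    """Cross product (q - p) x (r - p): >0 left turn, <0 right turn, 0 collinear."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if segments ab and cd properly cross (at a point interior to both)."""
    return orient(a, b, c) * orient(a, b, d) < 0 and orient(c, d, a) * orient(c, d, b) < 0


def count_crossings(pos: Dict[int, Point], edges: List[Tuple[int, int]]) -> int:
    """Number of edge pairs that cross in a straight-line drawing."""
    crossings = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if {u1, v1} & {u2, v2}:
            continue  # edges sharing an endpoint meet at a vertex, not a crossing
        if segments_cross(pos[u1], pos[v1], pos[u2], pos[v2]):
            crossings += 1
    return crossings


# The same 4-cycle drawn as a square and with two vertices swapped.
square = {0: (0, 0), 1: (1, 0), 2: (1, 1), 3: (0, 1)}
twisted = {0: (0, 0), 1: (1, 0), 2: (0, 1), 3: (1, 1)}
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(count_crossings(square, cycle), count_crossings(twisted, cycle))  # 0 1
```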
Several SNA tools have been developed in recent years: GUESS [1] focuses on improving the interactive exploration of graphs; NodeXL [21], developed as an add-in to the Microsoft Excel 2007 spreadsheet software, provides tools for network overview, discovery and exploration. LogAnalysis [8] supports forensic analysts in the visual statistical analysis of mobile phone traffic networks. Jung [17] and Prefuse [14] provide Java APIs that implement algorithms and methods for building graph visualization and SNA applications.
The very first step of an SNA experiment is acquiring data: for this purpose we designed and developed a custom agent, an automaton that simulates the behavior of real users, visits publicly accessible Facebook profiles and automatically extracts the relationships among them. Once acquired, the information must be stored in a well-structured format; the data must then be cleaned by removing duplicates and irrelevant information, after which they are ready for analysis.
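The cleaning step can be sketched as follows, under the assumption (ours) that the agent emits raw `(user_id, friend_id)` string pairs. Because Facebook friendship is mutual, the same tie is typically seen from both endpoints; storing each edge as an ordered `(min, max)` pair in a set removes those duplicates, while self-loops and non-numeric IDs are discarded as irrelevant. The function name `clean_edges` is hypothetical.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def clean_edges(raw_pairs: Iterable[Tuple[str, str]]) -> Set[Edge]:
    """Turn raw (user_id, friend_id) pairs into a set of undirected edges."""
    edges: Set[Edge] = set()
    for a, b in raw_pairs:
        try:
            u, v = int(a), int(b)  # assumption: user IDs are numeric
        except ValueError:
            continue  # malformed or irrelevant record
        if u == v:
            continue  # drop self-loops
        edges.add((min(u, v), max(u, v)))  # (u, v) and (v, u) collapse to one edge
    return edges


raw = [("1", "2"), ("2", "1"), ("2", "3"), ("3", "3"), ("x", "4")]
print(sorted(clean_edges(raw)))  # [(1, 2), (2, 3)]
```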
To acquire information about friendship relations, we developed an agent that automatically visits the friend-list page of a real user chosen as the seed profile and then recursively acquires friendship relations by visiting the friend-list pages of the seed's friends, and so on, down to the third sub-level of friendship relations. Only friendship relations among real users were acquired; fan pages and companies were discarded (Facebook provides this filter). The agent acquires information only from profiles in a friendship relation with the seed and from publicly accessible profiles, thus respecting the Facebook privacy policies.
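The recursive visit described above amounts to a depth-limited breadth-first search. The sketch below assumes two hypothetical callbacks standing in for the agent's extraction logic: `fetch_friends(user)` returns the IDs listed on a publicly accessible friend-list page, and `is_real_user(id)` mimics the filter that discards fan pages and companies. We read "down to the third sub-level" as `max_depth=3` hops from the seed; the exact depth convention of the original agent is an assumption.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Set, Tuple


def crawl(
    seed: int,
    fetch_friends: Callable[[int], Iterable[int]],  # hypothetical page extractor
    is_real_user: Callable[[int], bool],  # hypothetical fan-page/company filter
    max_depth: int = 3,
) -> Set[Tuple[int, int]]:
    """BFS over friend lists; only profiles at depth < max_depth are expanded."""
    edges: Set[Tuple[int, int]] = set()
    depth: Dict[int, int] = {seed: 0}
    queue = deque([seed])
    while queue:
        user = queue.popleft()
        if depth[user] >= max_depth:
            continue  # last level reached: do not visit this friend list
        for friend in fetch_friends(user):
            if not is_real_user(friend):
                continue  # skip fan pages, companies, etc.
            edges.add((min(user, friend), max(user, friend)))  # undirected edge
            if friend not in depth:
                depth[friend] = depth[user] + 1
                queue.append(friend)
    return edges


# Toy network: a path 0-1-2-3-4, plus ID 99 playing the role of a fan page.
friends = {0: [1, 99], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
edges = crawl(0, lambda u: friends.get(u, []), lambda x: x != 99, max_depth=3)
print(sorted(edges))  # [(0, 1), (1, 2), (2, 3)]
```

Keeping the `depth` map separate from the edge set means each friend-list page is requested at most once, which limits the load placed on the platform.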
We thus obtained an undirected graph composed of 547,302 vertices and 836,468 edges; for privacy reasons only user IDs were collected. The agent