We study numerically the spectrum and eigenstate properties of the Google matrix of various examples of directed networks such as vocabulary networks of dictionaries and university World Wide Web networks. The spectra have gapless structure in the vicinity of the maximal eigenvalue for Google damping parameter $\alpha$ equal to unity. The vocabulary networks have relatively homogeneous spectral density, while university networks have pronounced spectral structures which change from one university to another, reflecting specific properties of the networks. We also determine specific properties of eigenstates of the Google matrix, including the PageRank. The fidelity of the PageRank is proposed as a new characterization of its stability.
The rapid growth of the World Wide Web (WWW) brings the challenge of information retrieval from this enormous database, which at present contains about $10^{11}$ webpages. An efficient algorithm for classification of webpages was proposed in [1], and is now known as the PageRank Algorithm (PRA). The PRA formed the basis of the Google search engine, which is used by the majority of Internet users in everyday life. The PRA makes it possible to determine efficiently a vector ranking the nodes of a network by order of importance. This PageRank vector is obtained as an eigenvector of the Google matrix $G$ built on the basis of the directed links between WWW nodes (see e.g. [2]):
$$G = \alpha S + (1-\alpha) E/N . \qquad (1)$$
Here $S$ is the matrix constructed from the adjacency matrix $A_{ij}$ of the directed links of the network of size $N$, with $A_{ij} = 1$ if there is a link from node $j$ to node $i$, and $A_{ij} = 0$ otherwise. Namely, $S_{ij} = A_{ij} / \sum_k A_{kj}$ if $\sum_k A_{kj} > 0$, and $S_{ij} = 1/N$ if all elements in column $j$ of $A$ are zero. The last term in Eq. (1), with the uniform matrix $E_{ij} = 1$, describes the probability $1-\alpha$ for a random surfer propagating along the network to jump randomly to any other node. The matrix $G$ belongs to the class of Perron-Frobenius operators. For $0 < \alpha < 1$ it has a unique maximal eigenvalue at $\lambda = 1$, separated from the others by a gap of size at least $1-\alpha$ (see e.g. [2]). The eigenvector associated with this maximal eigenvalue is the PageRank vector, which can be viewed as the steady-state distribution for the random surfer. Typical WWW networks correspond to a very sparse matrix $A$, and repeated application of $G$ to a random vector converges quickly to the PageRank vector, after 50-100 iterations for $\alpha = 0.85$, which is the most commonly used value [2]. The PageRank vector is real and nonnegative, and its components can be ordered by decreasing values $p_j$, giving the relative importance of node $j$. It is known that when $\alpha$ varies, all eigenvalues evolve as $\alpha\lambda_i$, where $\lambda_i$ are the eigenvalues for $\alpha = 1$ and $i = 2, \ldots, N$, while the largest eigenvalue $\lambda_1 = 1$, associated with the PageRank, remains unchanged [2].
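The construction of $S$ and $G$, the iteration toward the PageRank vector, and the scaling of the eigenvalues with $\alpha$ can be illustrated on a small example. The following is a minimal sketch in Python with NumPy, not the numerical method used in this work: the four-node network, the function names `google_matrix` and `pagerank`, the dense matrix storage, the uniform starting vector, and the convergence tolerance are all assumptions chosen for illustration.

```python
import numpy as np

def google_matrix(A, alpha=0.85):
    """Return G = alpha*S + (1-alpha)*E/N, where A[i, j] = 1 for a link j -> i."""
    A = np.asarray(A, dtype=float)
    N = A.shape[0]
    col_sums = A.sum(axis=0)
    # Columns of A with no links (dangling nodes) are replaced by 1/N.
    S = np.full((N, N), 1.0 / N)
    nonzero = col_sums > 0
    S[:, nonzero] = A[:, nonzero] / col_sums[nonzero]
    E = np.ones((N, N))
    return alpha * S + (1.0 - alpha) * E / N

def pagerank(G, tol=1e-12, max_iter=1000):
    """Apply G repeatedly to a uniform start vector until the L1 change is below tol."""
    N = G.shape[0]
    p = np.full(N, 1.0 / N)
    for _ in range(max_iter):
        p_next = G @ p
        p_next /= p_next.sum()
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p

# Hypothetical toy network: links 0->1, 0->2, 1->2, 2->0; node 3 has no links.
A = np.zeros((4, 4))
for src, dst in [(0, 1), (0, 2), (1, 2), (2, 0)]:
    A[dst, src] = 1.0

alpha = 0.85
G = google_matrix(A, alpha)
p = pagerank(G)
ranking = np.argsort(-p)  # node indices ordered by decreasing PageRank

# Compare eigenvalue moduli at alpha = 0.85 with those at alpha = 1.
mod_alpha = np.sort(np.abs(np.linalg.eigvals(G)))[::-1]
mod_one = np.sort(np.abs(np.linalg.eigvals(google_matrix(A, 1.0))))[::-1]
expected = np.concatenate(([1.0], alpha * mod_one[1:]))
print("PageRank:", p)
print("ranking:", ranking)
print("moduli at alpha:", mod_alpha)
print("1 and alpha*|lambda_i|:", expected)
```

For realistic network sizes a sparse representation of $A$ would be needed; the dense form is used here only to keep the correspondence with Eq. (1) explicit.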
The properties of the PageRank vector for WWW have been extensively studied by the computer science community and many important properties have been established [3][4][5][6][7]. For example, it was shown that $p_j$ decreases approximately algebraically, $p_j \sim 1/j^{\beta}$, with the exponent $\beta \approx 0.9$ [3]. It is also known that typically for the Google matrix of WWW at $\alpha = 1$ there are many eigenvalues very close or equal to $\lambda = 1$, and that even at finite $\alpha < 1$ there are degeneracies of eigenvalues with $\lambda = \alpha$ (see e.g. [8]).
In spite of the important progress obtained during these investigations of PageRank vectors, the spectrum of the Google matrix $G$ has rarely been studied as a whole. Nevertheless, it is clear that the structure of the network is directly linked to this spectrum. Eigenvectors other than the PageRank describe the relaxation processes toward the steady state, and also characterize various communities or subsets of the network. Although models of directed networks of small-world type [9] have been constructed and analyzed, the spectral properties of the matrices corresponding to such networks have received little attention. Generally, for a directed network the matrix $G$ is nonsymmetric and thus its eigenvalues are complex. Recently the spectral study of the Google matrix for the Albert-Barabasi (AB) model [10] and randomized university WWW networks was performed in [11]. For the AB model the distribution of links is typical of scale-free networks [9]. The distribution of links for the university network is approximately the same and is not affected by the randomization procedure. Indeed, the randomization procedure corresponds to the one proposed in [12] and is performed by taking pairs of links and inverting the initial vertices, keeping unchanged the number of ingoing and outgoing links for each vertex. It was established that the spectra of the AB model and the randomized university networks were quite similar. Both have a large gap between the largest eigenvalue $\lambda_1 = 1$ and the next one, with $|\lambda_2| \approx 0.5$ at $\alpha = 1$. This is in contrast with the known property of WWW where $\lambda_2$ is usually very close or equal to unity [2,8]. Thus it appears that the AB model and the randomized scale-free networks have a very different spectral structure compared to real WWW networks. Therefore it is important to study the spectral properties of examples of real networks (without randomization).
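The randomization by link-pair exchange described above can be sketched as follows. This is one possible reading of the procedure, written in Python, and not the implementation of [11] or [12]: the function name `randomize_links`, the edge list, the number of swaps, and the rule of rejecting swaps that would create self-links or duplicate links are assumptions made for illustration. The input is assumed to contain no repeated links.

```python
import random
from collections import Counter

def randomize_links(edges, n_swaps, seed=0):
    """Exchange the source nodes of randomly chosen link pairs.

    (a -> b), (c -> d) become (c -> b), (a -> d), so each node keeps
    its number of ingoing and outgoing links.
    """
    rng = random.Random(seed)
    edges = list(edges)
    present = set(edges)
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # guard against networks with no valid swap
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = (c, b), (a, d)
        # Assumption: skip swaps producing self-links or already existing links.
        if c == b or a == d or new1 in present or new2 in present:
            continue
        present -= {edges[i], edges[j]}
        present |= {new1, new2}
        edges[i], edges[j] = new1, new2
        done += 1
    return edges

# Hypothetical directed network given as (source, target) pairs.
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (3, 0), (3, 1), (1, 3)]
randomized = randomize_links(edges, n_swaps=20, seed=1)

out_before = Counter(src for src, _ in edges)
out_after = Counter(src for src, _ in randomized)
in_before = Counter(dst for _, dst in edges)
in_after = Counter(dst for _, dst in randomized)
print("out-degrees preserved:", out_before == out_after)
print("in-degrees preserved:", in_before == in_after)
```

Because each accepted swap only exchanges the sources of two links, the degree distributions are preserved exactly, while correlations between linked nodes are progressively destroyed.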
In this paper, we thus study the spectra of Google matrices for the WWW networks of several universities and show that they indeed display very different properties compared to the random scale-free networks considered in [11]. We also explore the spectra of a completely different type of real network, built from the vocabulary links in various dictionaries. In addition, we analyze the properties of eigenvectors of the Google matrix for these networks. Special attention is paid to the PageRank vecto