arXiv:0809.1906v2 [cs.DS] 19 Oct 2008
Betweenness Centrality : Algorithms and Lower Bounds
Shiva Kintali∗
Abstract
One of the most fundamental problems in large-scale network analysis is to determine the importance of a particular node in a network. Betweenness centrality is the most widely used metric to measure the importance of a node in a network. In this paper, we present a randomized parallel algorithm and an algebraic method for computing betweenness centrality of all nodes in a network. We prove that any path-comparison based algorithm cannot compute betweenness in less than O(nm) time.
Keywords: all-pairs shortest paths, betweenness centrality, lower bounds, parallel graph algorithms, social networks.
1 Introduction
One of the most fundamental problems in large-scale network analysis is to determine the importance of a particular node (or an edge) in a network. For example, in social networks we wish to identify agents that have very short connections to large portions of the population. In communication networks we wish to know the links that carry a lot of traffic, ISPs that attract a lot of business, links that, if disconnected, decrease network performance dramatically, and so on. One way to measure the importance of network elements (nodes or edges) is to use centrality metrics such as closeness centrality [29], graph centrality [19], stress centrality [31] and betweenness centrality ([16], [2]). An important application of centrality arises in the study of epidemic phenomena in networks, when an infectious disease or a computer virus is disseminated. The power of a node to spread the epidemic is related to its centrality [28]. Centrality metrics also find applications in natural language processing [14], to compute the relative importance of textual units.
∗College of Computing, Georgia Institute of Technology, Atlanta, GA-30332. Email: kintali@cc.gatech.edu
Betweenness centrality (introduced by Freeman [16] and Anthonisse [2]) is the most popular (and computationally expensive) centrality metric. Some recent applications of betweenness include the study of biological networks [20, 26, 12], the study of sexual networks and AIDS [24], identifying key actors in terrorist networks [22, 10], organizational behavior [6], supply chain management [9], and transportation networks [18]. Betweenness can also be used as a heuristic to solve NP-hard problems like graph clustering. For example, Newman and Girvan [25] developed a heuristic to find community structure in large networks, based on betweenness of the edges of the network.
Since the networks of interest are huge, it is important to develop algorithms that compute these metrics efficiently. Brandes [4] showed that betweenness centrality can be computed in the same asymptotic time bounds as n Single Source Shortest Path (SSSP) computations. Brandes and Pich [5] presented experimental results of estimating different centrality measures under various node-selection strategies. Eppstein and Wang [13] presented a randomized approximation algorithm for closeness centrality.
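To illustrate the "n SSSP computations" bound, here is a minimal Python sketch of the accumulation scheme commonly attributed to Brandes [4], restricted to unweighted connected graphs so that each SSSP is a breadth-first search. The function name `brandes_unweighted`, the adjacency-dict input format, and the convention of counting each ordered pair (s, t) separately are assumptions made for this illustration. The shortest-path counts and dependencies it uses are the quantities defined in Section 1.1.

```python
from collections import deque


def brandes_unweighted(adj):
    """Betweenness of every vertex; adj maps a vertex to a list of neighbours.

    Assumes an unweighted, connected, undirected graph. Ordered pairs (s, t)
    are counted separately, so each unordered pair contributes twice.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # One SSSP (here a BFS) from s, recording path counts and predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}   # number of shortest s-v paths
        sigma[s] = 1
        preds = {v: [] for v in adj}  # predecessors on shortest paths from s
        order = []                    # vertices in non-decreasing distance
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Accumulate dependencies from the farthest vertices back towards s.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


# Hypothetical example: a 4-cycle 1-2-3-4-1.
cycle = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}
print(brandes_unweighted(cycle))
```

Each source costs one BFS plus one backward pass over the recorded predecessor lists, which is where the bound of n SSSP computations comes from.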
1.1 Betweenness Centrality
We denote a network by an undirected graph $G(V, E)$ with vertex set $\{v_1, v_2, \ldots, v_n\}$ (or $\{1, 2, \ldots, n\}$), where $|V| = n$ is the number of vertices and $|E| = m$ is the number of edges; the edges represent the relationships between the vertices. In this paper, we refer to connected undirected graphs, unless otherwise stated. Each edge $e \in E$ has a positive integer weight $w(e)$. Unweighted graphs have $w(e) = 1$ for all edges. A path from $s$ to $t$ is defined as a sequence of edges $(v_i, v_{i+1})$, $0 \le i < l$, where $v_0 = s$ and $v_l = t$. The length of a path is the sum of the weights of the edges in this sequence. We use $d(s, t)$ to denote the distance (the minimum length of any path connecting $s$ and $t$ in $G$) between vertices $s$ and $t$. We set $d(i, i) = 0$ by convention. We denote the total number of shortest paths between vertices $s$ and $t$ by $\lambda_{st} = \lambda_{ts}$. We set $\lambda_{ss} = 1$ by convention. The number of shortest paths between $s$ and $t$ passing through a vertex $v$ is denoted by $\lambda_{st}(v)$. Let $Diam(G)$ be the diameter (the length of the longest shortest path) of the graph $G$. Let $A = (a_{ij})$ be the adjacency matrix of the graph, i.e., $A$ is a 0-1 matrix with $a_{ij} = 1$ iff $(i, j) \in E$.
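As a concrete reading of these definitions, the following Python sketch computes the distances d(s, v) and the shortest-path counts from a single source s with Dijkstra's algorithm, which applies because all weights are positive. The helper names `build_adj` and `shortest_path_counts` and the edge-list input format are assumptions made for this illustration, not part of the paper.

```python
import heapq
from collections import defaultdict


def build_adj(edges):
    """Build an undirected adjacency list from (u, v, weight) triples."""
    adj = defaultdict(list)
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    return adj


def shortest_path_counts(adj, s):
    """Return (dist, count): d(s, v) and the number of shortest s-v paths."""
    dist = {s: 0}
    count = defaultdict(int)
    count[s] = 1            # one (empty) path from s to itself, by convention
    done = set()
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue        # stale heap entry
        done.add(u)         # d(s, u) and count[u] are final from here on
        for v, w in adj[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd            # strictly shorter path found
                count[v] = count[u]
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v]:
                count[v] += count[u]    # another family of shortest paths
    return dist, dict(count)


# Hypothetical example: the direct edge a-c ties with the route a-b-c.
adj = build_adj([("a", "b", 1), ("b", "c", 1), ("a", "c", 2)])
dist, count = shortest_path_counts(adj, "a")
print(dist, count)
```

Because weights are strictly positive, every predecessor of a vertex on a shortest path is finalized before the vertex itself, so the counts are complete when the vertex is popped from the heap.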
Let $\delta_{st}(v)$ denote the fraction of shortest paths between $s$ and $t$ that pass through a particular vertex $v$, i.e., $\delta_{st}(v) = \lambda_{st}(v) / \lambda_{st}$. We call $\delta_{st}(v)$ the pair-dependency of $s, t$ on $v$. Betweenness centrality of a vertex $v$ is defined as
$$BC(v) = \sum_{s,t:\, s \neq v \neq t} \delta_{st}(v).$$
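The definition can be evaluated directly, if inefficiently, as in the Python sketch below for unweighted connected graphs. It relies on a standard identity that is not stated in the text above: a vertex v lies on a shortest s-t path exactly when d(s, v) + d(v, t) = d(s, t), and in that case the number of shortest s-t paths through v is the product of the s-v and v-t path counts. The function names, the adjacency-dict format, and the reading of the sum as ranging over ordered pairs (s, t) with s ≠ t are assumptions made for this illustration.

```python
from collections import deque


def bfs_counts(adj, s):
    """Distances and shortest-path counts from s in an unweighted graph."""
    dist = {s: 0}
    count = {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                count[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                count[v] += count[u]
    return dist, count


def betweenness_by_definition(adj):
    """Sum the pair-dependencies over all ordered pairs (s, t), s != v != t."""
    info = {s: bfs_counts(adj, s) for s in adj}
    bc = {v: 0.0 for v in adj}
    for s in adj:
        dist_s, cnt_s = info[s]
        for t in adj:
            if t == s:
                continue
            for v in adj:
                if v == s or v == t:
                    continue
                dist_v, cnt_v = info[v]
                # v is on a shortest s-t path iff the distances add up.
                if dist_s[v] + dist_v[t] == dist_s[t]:
                    bc[v] += cnt_s[v] * cnt_v[t] / cnt_s[t]
    return bc


# Hypothetical example: the path a-b-c, where only b is "between" others.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(betweenness_by_definition(path))
```

The triple loop makes this cubic in the number of vertices after the n searches, so it serves only as a reference implementation for checking faster methods on small graphs.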
The dependency of a source vertex $s \in V$ on a vertex $v \in V$ is defined as
$$\delta_{s\ast}(v) = \sum_{t:\, t \neq s,\, t \neq v} \delta_{st}(v).$$
The betweenness centrality of a vertex $v$ can then be expressed as
$$BC(v) = \sum_{s:\, s \neq v} \delta_{s\ast}(v).$$
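As a small worked example, consider the unweighted 4-cycle with edges (1, 2), (2, 3), (3, 4), (4, 1), a graph chosen here purely for illustration, and take v = 2. The only pair with more than one shortest path through the cycle's opposite corners is {1, 3}, which has two shortest paths, one through vertex 2 and one through vertex 4. Assuming the outer sum ranges over every source s ≠ v, so that each unordered pair is counted once in each direction, the dependencies add up as follows:

```latex
\begin{align*}
\delta_{1\ast}(2) &= \delta_{13}(2) + \delta_{14}(2) = \tfrac{1}{2} + 0 = \tfrac{1}{2}, \\
\delta_{3\ast}(2) &= \delta_{31}(2) + \delta_{34}(2) = \tfrac{1}{2} + 0 = \tfrac{1}{2}, \\
\delta_{4\ast}(2) &= \delta_{41}(2) + \delta_{43}(2) = 0 + 0 = 0, \\
BC(2) &= \delta_{1\ast}(2) + \delta_{3\ast}(2) + \delta_{4\ast}(2) = 1.
\end{align*}
```

By symmetry, every vertex of the 4-cycle has the same betweenness under this convention.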
Define the set of predecessors of a vertex $v$ on shortest paths from $s$ as $P_s(v) = \{u \in V : (u, v) \in E,\ d(s, v) = d(s, u) + w(u, v)\}$. The following theorem states that the dependencies of the closer vertices can be computed from the dependencies of the farther vertices.
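The predecessor sets can be read off directly from the distances, as the following Python sketch shows for weighted graphs. The function names `distances` and `predecessors` and the nested-dict representation `adj[u][v] = w(u, v)` are assumptions made for this illustration.

```python
import heapq


def distances(adj, s):
    """Dijkstra distances from s; adj[u][v] is the positive weight of (u, v)."""
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if v not in dist or d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def predecessors(adj, s):
    """Map each vertex v to the set of u with d(s, v) = d(s, u) + w(u, v)."""
    dist = distances(adj, s)
    return {
        v: {u for u, w in adj[v].items() if dist[v] == dist[u] + w}
        for v in adj
    }


# Hypothetical example: the direct edge a-c ties with the route a-b-c,
# so c has two predecessors on shortest paths from a.
adj = {
    "a": {"b": 1, "c": 2},
    "b": {"a": 1, "c": 1},
    "c": {"a": 2, "b": 1},
}
print(predecessors(adj, "a"))
```

The source itself always gets an empty predecessor set, since no neighbour can satisfy the distance equation when all weights are positive.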
Theorem 1.1. [4] The dependency of s ∈V
…(Full text truncated)…