Algorithmic Aspects of Homophyly of Networks
We investigate the algorithmic problems of the {\it homophyly phenomenon} in networks. Given an undirected graph $G = (V, E)$ and a vertex coloring $c \colon V \rightarrow \{1, 2, \dots, k\}$ of $G$, we say that a vertex $v\in V$ is {\it happy} if $v$ shares the same color with all its neighbors, and {\it unhappy} otherwise, and that an edge $e\in E$ is {\it happy} if its two endpoints have the same color, and {\it unhappy} otherwise. Supposing $c$ is a {\it partial vertex coloring} of $G$, we define the Maximum Happy Vertices problem (MHV, for short) as the problem of coloring all the remaining vertices so that the number of happy vertices is maximized, and the Maximum Happy Edges problem (MHE, for short) as the problem of coloring all the remaining vertices so that the number of happy edges is maximized. Let $k$ be the number of colors allowed in the problems. We show that both MHV and MHE can be solved in polynomial time if $k = 2$, and that both MHV and MHE are NP-hard if $k \geq 3$. We devise a $\max\{1/k, \Omega(\Delta^{-3})\}$-approximation algorithm for the MHV problem, where $\Delta$ is the maximum degree of vertices in the input graph, and a 1/2-approximation algorithm for the MHE problem. This is the first theoretical progress on these two natural and fundamental new problems.
💡 Research Summary
The paper introduces two novel optimization problems that formalize the homophily phenomenon in networks. Given an undirected graph G = (V, E) and a partial vertex coloring c that assigns colors from {1,…,k} to some of the vertices, a vertex is called “happy” if all its neighbors share its color, and an edge is “happy” if its two endpoints have the same color. The authors define (i) Maximum Happy Vertices (MHV): extend the partial coloring to a full coloring so that the number of happy vertices is maximized, and (ii) Maximum Happy Edges (MHE): extend the partial coloring so that the number of happy edges is maximized.
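To make the two objectives concrete, here is a minimal Python sketch that counts happy vertices and happy edges for a complete coloring. The function names and the dictionary/list graph representation are our own illustrative choices, not taken from the paper.

```python
from typing import Dict, Hashable, List, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]


def build_adjacency(vertices: List[Vertex], edges: List[Edge]) -> Dict[Vertex, List[Vertex]]:
    """Return undirected adjacency lists for G = (V, E)."""
    adj: Dict[Vertex, List[Vertex]] = {v: [] for v in vertices}
    for u, w in edges:
        adj[u].append(w)
        adj[w].append(u)
    return adj


def count_happy_vertices(adj: Dict[Vertex, List[Vertex]], coloring: Dict[Vertex, int]) -> int:
    """A vertex is happy if every neighbor has its color (vacuously true if isolated)."""
    return sum(all(coloring[u] == coloring[v] for u in adj[v]) for v in adj)


def count_happy_edges(edges: List[Edge], coloring: Dict[Vertex, int]) -> int:
    """An edge is happy if its two endpoints have the same color."""
    return sum(coloring[u] == coloring[w] for u, w in edges)


if __name__ == "__main__":
    # Path 0-1-2 colored 1, 1, 2: only vertex 0 and edge (0, 1) are happy.
    edges = [(0, 1), (1, 2)]
    coloring = {0: 1, 1: 1, 2: 2}
    adj = build_adjacency([0, 1, 2], edges)
    print(count_happy_vertices(adj, coloring), count_happy_edges(edges, coloring))  # 1 1
```

Note that under this literal reading of the definition an isolated vertex counts as happy, since it has no neighbor of a different color.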
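The first major contribution is a complete complexity classification with respect to the number of colors k. When k = 2, both MHV and MHE can be solved exactly in polynomial time, because each problem reduces to a classic cut problem. For MHE, maximizing the number of happy edges is the same as minimizing the number of edges between the two color classes, which is a minimum s–t cut once all vertices precolored 1 are attached to a source and all vertices precolored 2 to a sink. For MHV, maximizing the number of happy vertices is the same as minimizing the number of unhappy vertices, i.e., vertices whose closed neighborhood contains both colors; this is again a minimum‑cut computation (over the closed neighborhoods) and can be solved with a max‑flow/min‑cut algorithm.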
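Both k = 2 reductions can be sketched in Python with a textbook Edmonds–Karp max‑flow. This is an illustration under our own conventions (vertices numbered 0..n−1, colors 1 and 2, precolored vertices given as a dict). The MHV part uses the standard hypergraph‑cut gadget applied to each closed neighborhood N[v]; we chose it for illustration and do not claim it is the paper's exact construction.

```python
from collections import deque
from typing import Dict, List, Tuple

INF = float("inf")
Arc = Tuple[int, int, float]


def max_flow(num_nodes: int, arcs: List[Arc], s: int, t: int) -> float:
    """Edmonds-Karp: augment along shortest residual paths until none remain."""
    cap: Dict[Tuple[int, int], float] = {}
    nbrs: List[set] = [set() for _ in range(num_nodes)]
    for u, v, c in arcs:
        cap[(u, v)] = cap.get((u, v), 0) + c
        cap.setdefault((v, u), 0)  # residual (reverse) arc
        nbrs[u].add(v)
        nbrs[v].add(u)
    flow = 0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in nbrs[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow  # max-flow value equals the min s-t cut value
        bottleneck, v = INF, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[(parent[v], v)])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
            v = u
        flow += bottleneck


def terminal_arcs(pre: Dict[int, int], s: int, t: int) -> List[Arc]:
    """Tie vertices precolored 1 to the source and those precolored 2 to the sink."""
    return [(s, v, INF) if c == 1 else (v, t, INF) for v, c in pre.items()]


def mhe_two_colors(n: int, edges: List[Tuple[int, int]], pre: Dict[int, int]) -> int:
    """Max happy edges = |E| - (min number of edges between the two color classes)."""
    s, t = n, n + 1
    arcs = [a for u, w in edges for a in ((u, w, 1), (w, u, 1))]
    return len(edges) - max_flow(n + 2, arcs + terminal_arcs(pre, s, t), s, t)


def mhv_two_colors(n: int, edges: List[Tuple[int, int]], pre: Dict[int, int]) -> int:
    """Max happy vertices = n - (min number of closed neighborhoods N[v] split by the cut)."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, w in edges:
        adj[u].append(w)
        adj[w].append(u)
    s, t = n, n + 1
    arcs: List[Arc] = []
    for v in range(n):
        e_in, e_out = n + 2 + 2 * v, n + 3 + 2 * v  # gadget nodes for N[v]
        arcs.append((e_in, e_out, 1))  # cutting this unit arc <=> v is unhappy
        for u in [v] + adj[v]:
            arcs += [(u, e_in, INF), (e_out, u, INF)]
    return n - max_flow(3 * n + 2, arcs + terminal_arcs(pre, s, t), s, t)


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3), (3, 4)]
    pre = {0: 1, 4: 2}
    print(mhe_two_colors(5, path, pre), mhv_two_colors(5, path, pre))  # 3 3
```

In the MHV gadget, a closed neighborhood lying entirely on one side of the cut costs nothing, while one containing both sides forces exactly its unit arc e_in → e_out across the cut, so the cut value counts unhappy vertices.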
When k ≥ 3, the situation changes dramatically. By reductions from the 3‑coloring problem (for MHV) and from NP‑hard edge‑labeling formulations (for MHE; for instance, MHE with exactly one precolored vertex per color is the multiway cut problem, which is NP‑hard for three or more terminals), the paper proves that both problems are NP‑hard. Hence, unlike the two‑color case, no polynomial‑time algorithm exists for k ≥ 3 unless P = NP.
Given the hardness, the authors turn to approximation algorithms. For MHV they propose two complementary strategies. The first guarantees a 1/k fraction of the optimum: for each color i, it colors every uncolored vertex with i and keeps the best of the k resulting colorings. This works because every vertex that is happy with color i in an optimal solution stays happy when all uncolored vertices receive color i, and some color accounts for at least a 1/k share of the optimal happy vertices. The second is a degree‑dependent greedy algorithm that assigns each uncolored vertex the color that would make the largest number of its already‑colored neighbors happy. By a careful analysis the authors prove that this algorithm achieves an Ω(Δ⁻³) approximation, where Δ is the maximum degree of G. The final guarantee for MHV is the larger of the two bounds, i.e., max{1/k, Ω(Δ⁻³)}: the 1/k bound dominates when few colors are available, while the Ω(Δ⁻³) bound dominates for graphs of small maximum degree.
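The 1/k strategy for MHV is short enough to sketch directly. The sketch assumes vertices numbered 0..n−1, colors 1..k, and precolored vertices given as a dict; these conventions and the function names are ours, not the paper's.

```python
from typing import Dict, List, Tuple


def count_happy_vertices(adj: List[List[int]], coloring: Dict[int, int]) -> int:
    """Vertices whose color equals that of every neighbor."""
    return sum(all(coloring[u] == coloring[v] for u in adj[v]) for v in range(len(adj)))


def best_single_color_extension(
    adj: List[List[int]], pre: Dict[int, int], k: int
) -> Tuple[Dict[int, int], int]:
    """For each color i, give every uncolored vertex color i; keep the best.

    Every vertex that is happy with color i in an optimal extension is still
    happy here, and one color class holds at least OPT / k of the optimal
    happy vertices, so the best of the k candidates is at least OPT / k.
    """
    best_coloring: Dict[int, int] = {}
    best_score = -1
    for i in range(1, k + 1):
        coloring = {v: pre.get(v, i) for v in range(len(adj))}
        score = count_happy_vertices(adj, coloring)
        if score > best_score:
            best_coloring, best_score = coloring, score
    return best_coloring, best_score


if __name__ == "__main__":
    # Path 0-1-2-3-4 with endpoints precolored 1 and 2, k = 3 colors.
    adj = [[1], [0, 2], [1, 3], [2, 4], [3]]
    coloring, score = best_single_color_extension(adj, {0: 1, 4: 2}, k=3)
    print(coloring, score)  # {0: 1, 1: 1, 2: 1, 3: 1, 4: 2} 3
```

Each candidate costs O(|V| + |E|) to evaluate, so the whole procedure runs in O(k(|V| + |E|)) time.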
For MHE the paper presents a deterministic 1/2‑approximation algorithm that applies for any number of colors k. The algorithm constructs candidate extensions of the partial coloring, counts the happy edges of each, and returns the better one; the analysis shows that at least one candidate makes at least half as many edges happy as an optimal coloring, which gives the 1/2 factor.
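One simple way to obtain a deterministic 1/2 guarantee for any k is sketched below; it is our own illustrative construction and not necessarily the authors' algorithm. Split the edges into those with both endpoints precolored (their happiness is fixed), those with exactly one precolored endpoint, and those with both endpoints uncolored. Candidate A lets each uncolored vertex take the most common color among its precolored neighbors, which maximizes the happy edges of the second kind; candidate B gives every uncolored vertex the same color, which makes every edge of the third kind happy. If F is the number of fixed happy edges, P* the best achievable count on the second kind, and U the size of the third kind, then OPT ≤ F + P* + U, while the better candidate has at least F + max(P*, U) ≥ OPT/2 happy edges.

```python
from collections import Counter
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def count_happy_edges(edges: List[Edge], coloring: Dict[int, int]) -> int:
    """Edges whose two endpoints share a color."""
    return sum(coloring[u] == coloring[w] for u, w in edges)


def half_approx_mhe(n: int, edges: List[Edge], pre: Dict[int, int]) -> Dict[int, int]:
    """Return the better of two extensions of `pre` (colors are positive ints)."""
    # Candidate A: each uncolored vertex takes the most common color among its
    # precolored neighbors (color 1 if it has none). This maximizes the number
    # of happy edges that join a precolored vertex to an uncolored one.
    votes: Dict[int, Counter] = {v: Counter() for v in range(n) if v not in pre}
    for u, w in edges:
        if u in votes and w in pre:
            votes[u][pre[w]] += 1
        if w in votes and u in pre:
            votes[w][pre[u]] += 1
    cand_a = dict(pre)
    for v, counts in votes.items():
        cand_a[v] = counts.most_common(1)[0][0] if counts else 1
    # Candidate B: every uncolored vertex gets color 1, so every edge between
    # two uncolored vertices is happy.
    cand_b = {v: pre.get(v, 1) for v in range(n)}
    return max((cand_a, cand_b), key=lambda c: count_happy_edges(edges, c))


if __name__ == "__main__":
    # Uncolored path 0-1-2; vertex 0 touches 3 (color 2), vertex 2 touches 4 (color 3).
    edges = [(0, 1), (1, 2), (0, 3), (2, 4)]
    best = half_approx_mhe(5, edges, {3: 2, 4: 3})
    print(count_happy_edges(edges, best))  # 2 (an optimal extension achieves 3)
```

The example shows that the procedure is only approximate: coloring vertices 0, 1, 2 all with color 2 makes three edges happy, while both candidates reach two, which still respects the 1/2 bound.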
The work is the first theoretical treatment of these homophily‑based optimization problems. It establishes a clear dichotomy (polynomial time for k = 2, NP‑hard for k ≥ 3) and supplies the first non‑trivial approximation algorithms with provable performance guarantees. The analysis also highlights the role of the maximum degree in the approximation quality for MHV: the guarantee is strongest for graphs of small maximum degree or for instances with few colors. Potential applications include community detection, label propagation, and marketing strategies where aligning node attributes with their neighborhoods is desirable. The paper thus opens a new line of research at the intersection of network science and algorithmic graph theory.