Improved approximation algorithms for low-density instances of the Minimum Entropy Set Cover Problem

We study the approximability of instances of the minimum entropy set cover problem, parameterized by the average frequency of a random element in the covering sets. We analyze an algorithm combining a greedy approach with another one biased towards large sets. The algorithm is controlled by the percentage of elements to which we apply the biased approach. The optimal parameter choice has a phase transition around average density $e$ and leads to improved approximation guarantees when the average element frequency is less than $e$.

💡 Research Summary

The paper investigates the approximability of the Minimum Entropy Set Cover (MESC) problem on instances with low average element frequency, which the authors call the “average density” d̄. In the classic formulation, a family F of subsets of a ground set U covers each element u ∈ U some number of times, denoted freq(u): the number of sets in F that contain u. The average density is d̄ = (1/|U|) ∑_{u∈U} freq(u). (We write u for elements to avoid a clash with Euler's number e, which appears as the density threshold.) On arbitrary instances the standard greedy algorithm approximates the optimum within an additive error of log₂ e ≈ 1.44 bits, and the paper shows that this guarantee can be improved when d̄ is small, a situation that frequently occurs in real-world data such as document-keyword matrices or biological pathway annotations.
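To make the quantities above concrete, here is a minimal sketch assuming the usual MESC objective: a cover assigns each element to one set that contains it, and its cost is the Shannon entropy (in bits) of the resulting class sizes q_i, namely −∑ (q_i/|U|) log₂(q_i/|U|). The helper names (`frequencies`, `average_density`, `greedy_assignment`, `entropy`) are hypothetical. The code implements only the standard greedy baseline, not the paper's hybrid algorithm.

```python
from collections import Counter
from math import log2


def frequencies(universe, family):
    """freq(u): number of sets in the family that contain element u."""
    return {u: sum(1 for s in family if u in s) for u in universe}


def average_density(universe, family):
    """Average density d_bar = (1/|U|) * sum of freq(u) over u in U."""
    freq = frequencies(universe, family)
    return sum(freq.values()) / len(universe)


def greedy_assignment(universe, family):
    """Standard greedy for MESC (sketch): repeatedly pick the set covering
    the most still-uncovered elements and assign those elements to it.
    Returns a dict mapping each element to the index of its chosen set."""
    uncovered = set(universe)
    assignment = {}
    while uncovered:
        # Ties are broken by the lowest set index (max returns the first maximum).
        best = max(range(len(family)), key=lambda i: len(family[i] & uncovered))
        newly_covered = family[best] & uncovered
        if not newly_covered:
            raise ValueError("the family does not cover the universe")
        for u in newly_covered:
            assignment[u] = best
        uncovered -= newly_covered
    return assignment


def entropy(assignment):
    """Shannon entropy (bits) of the partition induced by an assignment."""
    n = len(assignment)
    class_sizes = Counter(assignment.values())
    return -sum((q / n) * log2(q / n) for q in class_sizes.values())


if __name__ == "__main__":
    U = set(range(1, 7))
    F = [{1, 2, 3, 4}, {4, 5}, {5, 6}, {1, 6}]
    print(average_density(U, F))               # 10/6, about 1.67
    print(entropy(greedy_assignment(U, F)))    # about 0.918
```

On this toy instance d̄ = 10/6 ≈ 1.67, which is below e. Greedy first takes {1, 2, 3, 4} and then {5, 6}, so the classes have sizes 4 and 2 and the entropy is H(2/3, 1/3) ≈ 0.918 bits.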

To exploit this structural property, the authors propose a hybrid algorithm that mixes two sub‑routines: (1) a conventional greedy step that repeatedly selects the set covering the largest number of uncovered elements, and (2) a “large‑set‑biased” step that preferentially picks among the biggest sets in the current family. The mixing is governed by a parameter p ∈