An Incentive Compatible Multi-Armed-Bandit Crowdsourcing Mechanism with Quality Assurance
Consider a requester who wishes to crowdsource a series of identical binary labeling tasks to a pool of workers so as to achieve an assured accuracy for each task, in a cost-optimal way. The workers are heterogeneous with unknown but fixed qualities, and their costs are private. The problem is to select for each task an optimal subset of workers so that the outcome obtained from the selected workers guarantees a target accuracy level. The problem is challenging even in a non-strategic setting, since the accuracy of the aggregated label depends on the unknown qualities. We develop a novel multi-armed bandit (MAB) mechanism for solving this problem. First, we propose a framework, Assured Accuracy Bandit (AAB), which leads to an MAB algorithm, Constrained Confidence Bound for a Non-Strategic setting (CCB-NS). We derive an upper bound on the number of time steps for which the algorithm chooses a sub-optimal set; the bound depends on the target accuracy level and the true qualities. A more challenging situation arises when the requester not only has to learn the qualities of the workers but must also elicit their true costs. We modify the CCB-NS algorithm to obtain an adaptive, exploration-separated algorithm which we call Constrained Confidence Bound for a Strategic setting (CCB-S). The CCB-S algorithm produces an ex-post monotone allocation rule and can thus be transformed into an ex-post incentive compatible and ex-post individually rational mechanism that learns the qualities of the workers and guarantees a given target accuracy level in a cost-optimal way. We provide a lower bound on the number of times any algorithm must select a sub-optimal set, and this lower bound matches our upper bound up to a constant factor. We provide insights into the practical implementation of this framework through an illustrative example, and we show the efficacy of our algorithms through simulations.
💡 Research Summary
The paper tackles a fundamental problem in crowdsourcing: a requester must assign a stream of identical binary labeling tasks to a pool of heterogeneous workers while guaranteeing a pre‑specified target accuracy for each task and minimizing total payment. Each worker has an unknown but fixed quality (probability of providing the correct label) and a private cost. The requester therefore faces a dual learning‑and‑incentive challenge: (i) learn the workers’ qualities from noisy labels, and (ii) elicit truthful cost reports when workers are strategic.
To address this, the authors introduce the Assured Accuracy Bandit (AAB) framework, which formalizes the per‑round decision as a constrained optimization problem: select a subset of workers whose aggregated answer (e.g., majority vote) meets the target accuracy α while incurring the smallest possible cost. The accuracy constraint is non‑convex and must hold every round, unlike many bandit settings where constraints are only required in expectation.
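As a concrete illustration of this per-round decision, the selection can be sketched as a brute-force search for the cheapest subset whose majority vote meets the target accuracy. The independent-worker majority-vote model and the function names below are illustrative assumptions, not the paper's exact formulation:

```python
from itertools import combinations, product

def majority_accuracy(qualities):
    """Probability that a majority vote over independent workers is correct,
    where qualities[i] is worker i's probability of labeling correctly."""
    n = len(qualities)
    acc = 0.0
    for outcome in product([0, 1], repeat=n):  # 1 = worker answers correctly
        if sum(outcome) * 2 > n:  # strict majority correct
            p = 1.0
            for q, o in zip(qualities, outcome):
                p *= q if o else (1 - q)
            acc += p
    return acc

def cheapest_feasible_set(qualities, costs, alpha):
    """Cheapest odd-size subset whose majority vote meets accuracy alpha
    (odd sizes avoid ties); returns (subset, cost) or (None, inf)."""
    n = len(qualities)
    best, best_cost = None, float("inf")
    for r in range(1, n + 1, 2):
        for S in combinations(range(n), r):
            if majority_accuracy([qualities[i] for i in S]) >= alpha:
                cost = sum(costs[i] for i in S)
                if cost < best_cost:
                    best, best_cost = S, cost
    return best, best_cost
```

Note the non-convexity the paper highlights: adding a cheap low-quality worker can decrease the majority-vote accuracy, so the feasible sets do not nest by cost.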
Two algorithmic solutions are presented.

- CCB‑NS (Constrained Confidence Bound – Non‑Strategic) handles the case where costs are known. It maintains upper and lower confidence bounds on each worker’s quality, uses the lower bound to construct a safe set that satisfies the accuracy constraint with high probability, and selects the cheapest such set. The algorithm interleaves exploration (pulling workers whose bounds are still wide) with exploitation (using the current safe set). The authors prove an upper bound on the number of rounds in which a sub‑optimal set is chosen; this bound scales logarithmically with the horizon and depends on α and the true qualities. They also derive an information‑theoretic lower bound, showing that CCB‑NS is optimal up to a constant factor.

- CCB‑S (Constrained Confidence Bound – Strategic) extends CCB‑NS to the setting where workers may misreport costs. The algorithm is made exploration‑separated: during an initial exploration phase every worker is sampled a predetermined number of times, independent of bids, to obtain unbiased quality estimates. In the subsequent exploitation phase the mechanism selects the cheapest set that satisfies the accuracy constraint using the learned qualities and the reported costs. Crucially, the allocation rule is shown to be ex‑post monotone in each worker’s cost, which, via the generic transformation of Babaioff et al., yields an ex‑post incentive compatible and ex‑post individually rational mechanism. For a specific linear‑cost/accuracy formulation the authors further design a non‑exploration‑separated variant that exploits problem structure to prune low‑quality workers early, reducing exploration cost.
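A minimal sketch of the confidence-bound logic described above, under stated assumptions: the Hoeffding-style radius and the feasibility oracle `feasible` are hypothetical stand-ins for the paper's actual confidence bounds and accuracy constraint, which differ in detail:

```python
import math
from itertools import combinations

def ccb_ns_step(successes, pulls, t, costs, feasible):
    """One decision step in the spirit of CCB-NS: certify the accuracy
    constraint using lower confidence bounds on worker qualities, and
    explore while no set can be certified. `feasible(lcbs, subset)` is a
    hypothetical oracle returning True when the subset meets the target
    accuracy under the pessimistic quality estimates `lcbs`."""
    n = len(pulls)
    lcbs, widths = [], []
    for i in range(n):
        if pulls[i] == 0:
            mean, rad = 0.5, 1.0  # no data yet: maximally uncertain
        else:
            mean = successes[i] / pulls[i]
            rad = math.sqrt(2 * math.log(t + 1) / pulls[i])  # Hoeffding-style radius
        lcbs.append(max(0.0, mean - rad))
        widths.append(min(1.0, mean + rad) - lcbs[-1])
    # Exploit: cheapest subset certified feasible under pessimistic estimates
    best, best_cost = None, float("inf")
    for r in range(1, n + 1):
        for S in combinations(range(n), r):
            cost = sum(costs[i] for i in S)
            if cost < best_cost and feasible(lcbs, S):
                best, best_cost = S, cost
    if best is not None:
        return best  # constraint holds with high probability
    # Explore: query the worker with the widest confidence interval
    return (max(range(n), key=lambda i: widths[i]),)
```

Because the safe set is built from lower confidence bounds, exploration stops automatically once the intervals are tight enough to certify the constraint, which is what lets the amount of exploration adapt to the difficulty of the target accuracy.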
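The paper obtains truthfulness through the transformation of Babaioff et al.; the following is a simplified deterministic critical-value sketch, shown only to illustrate why an ex-post monotone allocation rule admits truthful payments. The `allocate` rule and the bid grid are hypothetical, and this is not the randomized transformation used in the paper:

```python
def critical_payment(allocate, bids, i, grid=None):
    """Critical-value payment for worker i under a monotone allocation rule
    `allocate(bids) -> list of selected workers`: the largest bid at which
    i would still be selected. With a monotone rule, paying each selected
    worker this threshold makes truthful bidding a best response."""
    if i not in allocate(bids):
        return 0.0  # unselected workers are paid nothing
    grid = grid or [b / 100 for b in range(0, 201)]  # candidate bids in [0, 2]
    payment = bids[i]
    for b in grid:
        trial = list(bids)
        trial[i] = b  # ask: would i still win with this bid?
        if i in allocate(trial):
            payment = max(payment, b)
    return payment
```

The key property is monotonicity: raising a worker's bid can only remove them from the allocation, so the threshold is well defined and independent of the worker's own report.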
Theoretical contributions are complemented by extensive simulations. Compared with a baseline ε‑greedy approach, both CCB‑NS and CCB‑S achieve the target accuracy with far fewer label acquisitions and substantially lower total payments. The adaptive confidence‑bound strategy automatically scales the amount of exploration with the difficulty of the accuracy requirement, avoiding the excessive exploration that plagues fixed‑schedule methods.
In summary, the paper makes three major advances: (i) it introduces a novel constrained‑bandit model that enforces per‑round accuracy guarantees; (ii) it provides near‑optimal algorithms for both non‑strategic and strategic environments, with rigorous regret upper and lower bounds; (iii) it bridges bandit learning with mechanism design, delivering a truthful, individually rational reverse‑auction mechanism for crowdsourcing under accuracy constraints. The work opens avenues for extensions to multi‑label tasks, time‑varying worker qualities, and richer budget or fairness constraints.