Budget-Optimal Task Allocation for Reliable Crowdsourcing Systems
Authors: David R. Karger, Sewoong Oh, Devavrat Shah
November 27, 2024

Abstract

Crowdsourcing systems, in which numerous tasks are electronically distributed to numerous "information piece-workers", have emerged as an effective paradigm for human-powered solving of large scale problems in domains such as image classification, data entry, optical character recognition, recommendation, and proofreading. Because these low-paid workers can be unreliable, nearly all such systems must devise schemes to increase confidence in their answers, typically by assigning each task multiple times and combining the answers in an appropriate manner, e.g. majority voting. In this paper, we consider a general model of such crowdsourcing tasks and pose the problem of minimizing the total price (i.e., number of task assignments) that must be paid to achieve a target overall reliability. We give a new algorithm for deciding which tasks to assign to which workers and for inferring correct answers from the workers' answers. We show that our algorithm, inspired by belief propagation and low-rank matrix approximation, significantly outperforms majority voting and, in fact, is optimal through comparison to an oracle that knows the reliability of every worker. Further, we compare our approach with a more general class of algorithms which can dynamically assign tasks. By adaptively deciding which questions to ask of the next arriving worker, one might hope to reduce uncertainty more efficiently. We show that, perhaps surprisingly, the minimum price necessary to achieve a target reliability scales in the same manner under both adaptive and non-adaptive scenarios. Hence, our non-adaptive approach is order-optimal under both scenarios. This strongly relies on the fact that workers are fleeting and cannot be exploited.
Therefore, architecturally, our results suggest that building a reliable worker-reputation system is essential to fully harnessing the potential of adaptive designs.

∗ Computer Science and Artificial Intelligence Laboratory and Department of EECS at Massachusetts Institute of Technology. Email: karger@mit.edu
† Department of Industrial and Enterprise Systems Engineering at University of Illinois at Urbana-Champaign. Email: swoh@illinois.edu
‡ Laboratory for Information and Decision Systems and Department of EECS at Massachusetts Institute of Technology. Email: devavrat@mit.edu. This work was supported in parts by the NSF EMT project, the AFOSR Complex Networks project, and the Army Research Office under MURI Award 58153-MA-MUR.

1 Introduction

Background. Crowdsourcing systems have emerged as an effective paradigm for human-powered problem solving and are now in widespread use for large-scale data-processing tasks such as image classification, video annotation, form data entry, optical character recognition, translation, recommendation, and proofreading. Crowdsourcing systems such as Amazon Mechanical Turk [1] establish a market where a "taskmaster" can submit batches of small tasks to be completed for a small fee by any worker choosing to pick them up. For example, a worker may be able to earn a few cents by indicating which images from a set of 30 are suitable for children (one of the benefits of crowdsourcing is its applicability to such highly subjective questions). Because these crowdsourced tasks are tedious and the pay is low, errors are common even among workers who make an effort. At the extreme, some workers are "spammers", submitting arbitrary answers independent of the question in order to collect their fee. Thus, all crowdsourcers need strategies to ensure the reliability of their answers.
When the system allows the crowdsourcers to identify and reuse particular workers, a common approach is to manage a pool of reliable workers in an explore/exploit fashion. However, in many crowdsourcing platforms such as Amazon Mechanical Turk, the worker crowd is large, anonymous, and transient, and it is generally difficult to build up a trust relationship with particular workers. [2] It is also difficult to condition payment on correct answers, as the correct answer may never truly be known, and delaying payment can annoy workers and make it harder to recruit them to your task next time. Instead, most crowdsourcers resort to redundancy, giving each task to multiple workers, paying them all irrespective of their answers, and aggregating the results by some method such as majority voting.

For such systems there is a natural core optimization problem to be solved. Assuming the taskmaster wishes to achieve a certain reliability in her answers, how can she do so at minimum cost (which is equivalent to asking how she can do so while asking the fewest possible questions)? Several characteristics of crowdsourcing systems make this problem interesting. Workers are neither persistent nor identifiable; each batch of tasks will be solved by a worker who may be completely new and whom you may never see again. Thus one cannot identify and reuse particularly reliable workers. Nonetheless, by comparing one worker's answers to others' on the same questions, it is possible to draw conclusions about a worker's reliability, which can be used to weight their answers to other questions in their batch. However, batches must be of manageable size, obeying limits on the number of tasks that can be given to a single worker.

Another interesting aspect of this problem is the choice of task assignments.
Unlike many inference tasks which make inferences based on a fixed set of signals, our algorithm can choose which signals to measure by deciding which questions to include in which batches. In addition, there are several plausible options: for example, we might choose to ask a few "pilot questions" of each worker (just like a qualifying exam) to decide on the reliability of the worker. Another possibility is to first ask a few questions and, based on the answers, decide whether to ask more questions. We would like to understand the role of all such variations in the overall optimization of budget for reliable task processing.

In the remainder of this section, we will define a formal probabilistic model that captures these aspects of the problem. We consider both a non-adaptive scenario, in which all questions are asked simultaneously and all the responses are collected simultaneously, and an adaptive scenario, in which one may adaptively choose which tasks to assign to the next arriving worker based on all the previous answers collected thus far. We provide a non-adaptive task allocation scheme and an inference algorithm based on low-rank matrix approximations and belief propagation. We will then show that our algorithm is order-optimal: for a given target error rate, it spends only a constant factor times the minimum necessary to achieve that error rate. The optimality is established through comparisons to the best possible non-adaptive task allocation scheme and an oracle estimator that can make optimal decisions based on extra information provided by an oracle.

[1] http://www.mturk.com
[2] For certain high-value tasks, crowdsourcers can use entrance exams to "prequalify" workers and block spammers, but this increases the cost of the task and still provides no guarantee that the workers will try hard after qualification.
In particular, we derive a parameter q that characterizes the 'collective' reliability of the crowd, and show that to achieve target reliability ε, it is both necessary and sufficient to replicate each task Θ((1/q) log(1/ε)) times. This leads to the next question of interest: by using adaptive task assignment, can we ask fewer questions and still achieve the same error rate? We, somewhat surprisingly, show that the optimal cost under this adaptive scenario scales in the same manner as in the non-adaptive scenario; asking questions adaptively does not help!

Setup. We consider the following probabilistic model for crowdsourcing. There is a set of m binary tasks associated with unobserved 'correct' solutions: {t_i}_{i∈[m]} ∈ {±1}^m. Here and after, we use [N] to denote the set of the first N integers. In the image categorization example stated earlier, a set of tasks corresponds to labeling m images as suitable for children (+1) or not (−1). We will be interested in finding the true solutions by querying noisy workers who arrive one at a time in an online fashion.

An algorithmic solution to crowdsourcing consists of two components: a task allocation scheme and an inference algorithm. In the task allocation phase, queries are made sequentially according to the following rule. At the j-th step, the task assignment scheme chooses a subset T_j ⊆ [m] of tasks to be assigned to the next arriving noisy worker. The only constraint on the choice of the batch is that the size |T_j| must obey some limit on the number of tasks that can be given to a single worker. Let r denote such a limit, such that all batches must satisfy |T_j| ≤ r. Then, a worker j arrives, whose latent reliability is parametrized by p_j ∈ [0, 1].
For each assigned task, this worker gives a noisy answer such that

A_ij = t_i with probability p_j, and A_ij = −t_i otherwise; A_ij = 0 if i ∉ T_j.

(Throughout this paper, we use boldface characters to denote random variables and random matrices unless it is clear from the context.) The next assignment T_{j+1} can be chosen adaptively, taking into account all of the previous assignments and the answers collected thus far. This process is repeated until the task assignment scheme decides to stop, typically when the total number of queries meets a certain budget constraint. Then, in the subsequent inference phase, an inference algorithm makes a final estimate of the true answers.

We say a task allocation scheme is adaptive if the choice of T_j depends on the answers collected in previous steps, and non-adaptive if it does not depend on the answers. In practice, one might prefer a non-adaptive scheme, since assigning all the batches simultaneously and having all the batches of tasks processed in parallel reduces latency. However, by switching to an adaptive task allocation, one might be able to reduce uncertainty more efficiently. We investigate this possibility in Section 2.4, and show that the gain from adaptation is limited.

Note here that at the time of assigning tasks T_j to a next arriving worker j, the algorithm is not aware of the latent reliability of the worker. This is consistent with how real-world crowdsourcing works, since taskmasters typically have no choice over which worker is going to pick up which batch of tasks. Further, we make the pessimistic assumption that workers are neither persistent nor identifiable; each batch of tasks T_j will be solved by a worker who may be completely new and whom you may never see again. Thus one cannot identify and reuse particularly reliable workers.
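The response model above is easy to simulate. The sketch below is an illustrative helper (not code from the paper): it draws A_ij = t_i with probability p_j and −t_i otherwise, with A_ij = 0 for unassigned tasks.

```python
import random

def simulate_responses(true_labels, batches, reliabilities, seed=0):
    """Simulate the crowdsourcing response model: worker j answers each task
    i in its batch correctly with probability p_j, incorrectly otherwise;
    A[i][j] = 0 marks a task not assigned to that worker."""
    rng = random.Random(seed)
    m, n = len(true_labels), len(batches)
    A = [[0] * n for _ in range(m)]
    for j, (T_j, p_j) in enumerate(zip(batches, reliabilities)):
        for i in T_j:
            A[i][j] = true_labels[i] if rng.random() < p_j else -true_labels[i]
    return A

# Tiny example: 3 tasks, 2 workers (a hammer, p = 1, and a spammer, p = 1/2).
t = [+1, -1, +1]
A = simulate_responses(t, batches=[[0, 1, 2], [0, 2]], reliabilities=[1.0, 0.5])
```

The hammer's column always matches the true labels, while the spammer's entries are coin flips on the tasks it was assigned.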
This is a different setting from adaptive games [LW89], where you have a sequence of trials and a set of predictions is made at each step by a pool of experts. In adaptive games, you can identify reliable experts from their past performance using techniques like multiplicative weights, whereas in crowdsourcing you cannot hope to exploit any particular worker.

The latent variable p_j captures how some workers are more diligent or have more expertise than others, while some other workers might be trying to cheat. The random variable A_ij is independent of any other event given p_j. The underlying assumption here is that the error probability of a worker does not depend on the particular task and all the tasks share an equal level of difficulty. Hence, each worker's performance is consistent across different tasks. We discuss a possible generalization of this model in Section 2.7.

We further assume that the reliabilities of workers {p_j} are independent and identically distributed random variables with a given distribution on [0, 1]. As one example we define the spammer-hammer model, where each worker is either a 'hammer' with probability q or a 'spammer' with probability 1 − q. A hammer answers all questions correctly, meaning p_j = 1, and a spammer gives random answers, meaning p_j = 1/2. It should be noted that the meaning of a 'spammer' might be different from its use in other literature. In this model, a spammer is a worker who gives uniformly random labels independent of the true label. In other crowdsourcing literature, the word 'spammer' has been used, for instance, to refer to a worker who always gives '+' labels [RY12]. Another example is the beta distribution with parameters α > 0 and β > 0 (f(p) = p^{α−1}(1 − p)^{β−1}/B(α, β) for a proper normalization B(α, β)) [Hol11, RYZ+10b].
A distribution of p_j characterizes a crowd, and the following parameter plays an important role in capturing the 'collective quality' of this crowd, as will be clear from our main results:

q ≡ E[(2p_j − 1)^2].

A value of q close to one indicates that a large proportion of the workers are diligent, whereas q close to zero indicates that there are many spammers in the crowd. The definition of q is consistent with the use of q in the spammer-hammer model, and in the case of the beta distribution, q = 1 − 4αβ/((α + β)(α + β + 1)). We will see later that our bound on the achievable error rate depends on the distribution only through this parameter q.

When the crowd population is large enough that we do not need to distinguish whether the workers are 'sampled' with or without replacement, it is quite realistic to assume the existence of a prior distribution for p_j. In particular, this assumption is met if we simply randomize the order in which we upload our task batches, since this will have the effect of randomizing which workers perform which batches, yielding a distribution that meets our requirements. The model is therefore quite general. On the other hand, it is not realistic to assume that we know what the prior is. To execute our inference algorithm for a given number of iterations, we do not require any knowledge of the distribution of the reliability. However, q is necessary in order to determine how many times a task should be replicated and how many iterations we need to run to achieve a certain target reliability. We discuss a simple way to overcome this limitation in Section 2.2.

The only assumption we make about the distribution is that there is a bias towards the right answer, i.e. E[p_j] > 1/2. Without this assumption, we can have a 'perfect' crowd with q = 1 in which everyone is adversarial, p_j = 0. Then there is no way we can correct for this.
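The closed forms quoted above are easy to verify from the first two moments of p_j. The sketch below (illustrative helper names, exact rational arithmetic) computes q = E[(2p_j − 1)^2] for the beta distribution from E[p] and E[p^2] and checks it against the closed form 1 − 4αβ/((α + β)(α + β + 1)); for the spammer-hammer model, a hammer contributes (2·1 − 1)^2 = 1 and a spammer (2·(1/2) − 1)^2 = 0, so q equals the hammer fraction.

```python
from fractions import Fraction

def q_beta(alpha, beta):
    """q = E[(2p - 1)^2] for p ~ Beta(alpha, beta), from the first two
    moments: E[p] = a/(a+b), E[p^2] = a(a+1)/((a+b)(a+b+1))."""
    a, b = Fraction(alpha), Fraction(beta)
    Ep = a / (a + b)
    Ep2 = a * (a + 1) / ((a + b) * (a + b + 1))
    return 4 * Ep2 - 4 * Ep + 1  # expand (2p-1)^2 = 4p^2 - 4p + 1

def q_closed_form(alpha, beta):
    """Closed form from the text: 1 - 4ab/((a+b)(a+b+1))."""
    a, b = Fraction(alpha), Fraction(beta)
    return 1 - 4 * a * b / ((a + b) * (a + b + 1))

def q_spammer_hammer(hammer_prob):
    """Hammers (p = 1) contribute 1, spammers (p = 1/2) contribute 0."""
    return hammer_prob
```

For example, a Beta(2, 1) crowd has q = 1 − 8/12 = 1/3.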
Another way to justify this assumption is to define the "ground truth" of the tasks as what the majority of the crowd agrees on. We want to learn this consensus efficiently, without having to query everyone in the crowd for every task. If we use this definition of the ground truth, then it naturally follows that the workers are on average more likely to be correct.

Throughout this paper, we assume that there is a fixed cost you need to pay for each response you get, regardless of the quality of the response, such that the total cost is proportional to the total number of queries. Given a target accuracy, and under the probabilistic crowdsourcing model described in this section, we want to design a task allocation scheme and an inference algorithm that can achieve this target accuracy with minimal cost.

Possible deviations from our model. Some of the main assumptions we make on how crowdsourcing systems work are (a) workers are neither identifiable nor reusable, (b) every worker is paid the same amount regardless of their performance, and (c) each worker completes only one batch and she completes all the tasks in that batch. In this section, we discuss common strategies used in real crowdsourcing that might deviate from these assumptions.

First, there has been growing interest recently in developing algorithms to efficiently identify good workers assuming that worker identities are known and workers are reusable. Imagine a crowdsourcing platform where there is a fixed pool of identifiable workers and we can assign the tasks to whichever worker we choose. In this setting, adaptive schemes can be used to significantly improve the accuracy while minimizing the total number of queries.
It is natural to expect that by first exploring to find better workers and then exploiting them in the following rounds, one might be able to improve performance significantly. Donmez et al. [DCS09] proposed IEThresh, which simultaneously estimates worker accuracy and actively selects a subset of workers with high accuracy. Zheng et al. [ZSD10] proposed a two-phase approach that identifies good workers in the first phase and utilizes the best subset of workers in the second phase. Ertekin et al. [EHR11] proposed using weighted majority voting to better estimate the true labels in CrowdSense, which is then used to identify good workers.

The power of such exploration/exploitation approaches was demonstrated in numerical experiments; however, none of these approaches has been tested on real-world crowdsourcing. All the experiments are done using pre-collected datasets. Given these datasets, they simulate a labor market where they can track and reuse any workers they choose. The reason that the experiments are done on such simulated labor markets, instead of on popular crowdsourcing platforms such as Amazon Mechanical Turk, is that on real-world crowdsourcing platforms it is almost impossible to track workers. Many of the popular crowdsourcing platforms are completely open labor markets where the worker crowd is large and transient. Further, oftentimes it is the workers who choose which tasks they want to work on; hence the taskmaster cannot reuse particular workers. For these reasons, we assume in this paper that the workers are fleeting and provide an algorithmic solution that works even when workers are not reusable. We show that any taskmaster who wishes to outperform our algorithm must adopt complex worker-tracking techniques. Furthermore, no worker-tracking technique has been developed that has been proven to be foolproof.
In particular, it is impossible to prevent a worker from starting over with a new account. Many tracking algorithms are susceptible to this attack.

Another important and closely related question that has not been formally addressed in the crowdsourcing literature is how to differentiate the payment based on the inferred accuracy in order to incentivize good workers. Regardless of whether the workers are identifiable or not, when all the tasks are completed we get an estimate of the quality of the workers. It would be desirable to pay the good workers more in order to incentivize them to work for us on future tasks. For example, bonuses are built in to Amazon Mechanical Turk to be granted at the taskmaster's discretion, but it has not been studied how to use bonuses optimally. This could be an interesting direction for future research.

It has been observed that increasing the cost on crowdsourcing platforms does not directly lead to higher quality of the responses [MW10]. Instead, increasing the cost only leads to faster responses. Mason and Watts [MW10] attribute this counterintuitive finding to an "anchoring" effect. When the (expected) payment is higher, workers perceive the value of their work to be greater as well. Hence, they are no more motivated than workers who are paid less. However, these studies were done in isolated experiments, and the long-term effect of taskmasters' keeping a good reputation still needs to be understood. Workers on Mechanical Turk can manage the reputation of taskmasters using, for instance, Turkopticon [3], a Firefox extension that allows you to rate taskmasters and view ratings from other workers. Another example is Turkernation [4], an online forum where workers and taskmasters can discuss Mechanical Turk and leave feedback.
Finally, on Mechanical Turk, it is typically the workers who choose which tasks they want to work on and when they want to stop. Without any regulation, they might respond to multiple batches of your tasks or stop in the middle of a batch. It is possible to systematically prevent the same worker from coming back and repeating more than one batch of your tasks. For example, on Amazon's Mechanical Turk, a worker cannot repeat the same task more than once. However, it is difficult to guarantee that a worker completes all the tasks in a batch she started on. In practice, there are simple ways to ensure this by, for instance, conditioning the payment on completing all the tasks in a batch.

A problem with restricting the number of tasks assigned to each worker (as we propose in Section 2.1) is that it might take a long time to have all the batches completed. Letting the workers choose how many tasks they want to complete allows a few eager workers to complete an enormous amount of tasks. However, if we restrict the number of tasks assigned to each worker, we might need to recruit more workers to complete all the tasks. This problem of tasks taking a long time to finish is not restricted to our model, but is a very common problem on open crowdsourcing platforms. Ipeirotis [Ipe10] studied the completion time of tasks on Mechanical Turk and observed that it follows a heavy-tailed distribution according to a power law. Hence, some tasks take a significant amount of time to finish. A number of strategies have been proposed to complete tasks on time. These include optimizing the pricing policy [FHI11], continuously posting tasks to stay on the first page [BJJ+10, CHMA10], and having a large amount of tasks available [CHMA10]. These strategies are effective in attracting more workers fast.
However, in our model, we assume there are no restrictions on the latency and we can wait until all the batches are completed; if we have good strategies to reduce worker response time, such strategies could be incorporated into our system design.

Prior work. Previous crowdsourcing system designs have focused on developing inference algorithms assuming that the task assignments are fixed and the workers' responses are already given. None of the prior work on crowdsourcing provides any systematic treatment of task assignment under the crowdsourcing model considered in this paper. To the best of our knowledge, we are the first to study both aspects of crowdsourcing together and, more importantly, establish optimality.

A naive approach to solving the inference problem, which is widely used in practice, is majority voting. Majority voting simply follows what the majority of workers agree on. When we have many spammers in the crowd, majority voting is error-prone since it gives the same weight to all the responses, regardless of whether they are from a spammer or a diligent worker. We will show in Section 2.3 that majority voting is provably sub-optimal and can be significantly improved upon.

If we know how reliable each worker is, then it is straightforward to find the maximum likelihood estimates: compute the weighted sum of the responses, weighted by the log-likelihood. Although in reality we do not have this information, it is possible to learn about a worker's reliability by comparing one worker's answers to others'. This idea was first proposed by Dawid and Skene, who introduced an iterative algorithm based on expectation maximization (EM) [DS79]. They considered the problem of classifying patients based on labels obtained from multiple clinicians.

[3] http://turkopticon.differenceengines.com
[4] http://turkernation.com
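To see why majority voting wastes information when reliabilities are known, compare it with the oracle weighted sum just described, where each answer is weighted by the log-likelihood ratio log(p_j/(1 − p_j)). This is a minimal sketch with hypothetical helper names; note that a spammer with p_j = 1/2 gets weight exactly zero.

```python
import math

def majority_vote(responses):
    """Majority voting: sign of the unweighted sum of +1/-1 answers."""
    s = sum(responses)
    return 1 if s > 0 else -1 if s < 0 else 0

def oracle_weighted_vote(responses, reliabilities):
    """Oracle ML estimate: weight each answer by log(p_j / (1 - p_j)),
    which is large for reliable workers and zero for spammers."""
    s = sum(a * math.log(p / (1 - p)) for a, p in zip(responses, reliabilities))
    return 1 if s > 0 else -1 if s < 0 else 0

# One diligent worker (p = 0.9) against two spammers who happen to guess -1.
answers = [+1, -1, -1]
p = [0.9, 0.5, 0.5]
mv = majority_vote(answers)              # spammers win the unweighted vote
ml = oracle_weighted_vote(answers, p)    # the weighted vote ignores them
```

Here the spammers outvote the diligent worker under majority voting, but the oracle weighting recovers the correct label because the spammers' votes carry zero weight.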
They introduced a simple probabilistic model describing the clinicians' responses, and gave an algorithmic solution based on EM. This model, which is described in Section 2.7, is commonly used in modern crowdsourcing settings to explain how workers make mistakes in classification tasks [SPI08]. This heuristic algorithm iterates the following two steps. In the M-step, the algorithm estimates the error probabilities of the workers that maximize the likelihood, using the current estimates of the answers. In the E-step, the algorithm estimates the likelihood of the answers using the current estimates of the error probabilities. More recently, a number of algorithms have followed this EM approach, based on a variety of probabilistic models [SFB+95, WRW+09, RYZ+10a]. The crowdsourcing model we consider in this paper is a special case of these models, and we discuss their relationship in Section 2.7.

The EM approach has also been widely applied in classification problems, where a set of labels from low-cost noisy workers is used to find a good classifier [JG03, RYZ+10a]. Given a fixed budget, there is a trade-off between acquiring a larger training dataset or acquiring a smaller dataset but with more labels per data point. Through extensive experiments, Sheng, Provost and Ipeirotis [SPI08] show that repeated labeling can give a considerable advantage.

Despite the popularity of EM algorithms, the performance of these approaches has only been evaluated empirically, and there is no analysis that gives performance guarantees. In particular, EM algorithms are highly sensitive to the initialization used, making it difficult to predict the quality of the resulting estimate. Further, the role of the task assignment is not at all understood with the EM algorithm (or, for that matter, any other algorithm).
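The two EM steps described above can be sketched for the one-coin binary model used in this paper, where each worker errs with the same probability on every task. This is a simplified illustration of the Dawid-Skene approach, not their full confusion-matrix algorithm; the majority-vote initialization and the clamping constant are our own choices.

```python
import math

def em_one_coin(A, n_iters=20):
    """EM sketch for the one-coin binary model. A[i][j] in {+1, -1, 0},
    with 0 meaning task i was not assigned to worker j. Returns soft
    labels mu[i] = P(t_i = +1) and reliability estimates p[j]."""
    m, n = len(A), len(A[0])
    # Initialize soft labels from the majority-vote fraction per task.
    mu = []
    for i in range(m):
        votes = [A[i][j] for j in range(n) if A[i][j] != 0]
        mu.append(sum(1 for a in votes if a == 1) / len(votes))
    p = [0.5] * n  # placeholder; overwritten in the first M-step
    for _ in range(n_iters):
        # M-step: p_j = expected fraction of worker j's answers that
        # agree with the current soft labels (clamped away from 0 and 1).
        for j in range(n):
            num, den = 0.0, 0.0
            for i in range(m):
                if A[i][j] != 0:
                    num += mu[i] if A[i][j] == 1 else 1 - mu[i]
                    den += 1
            p[j] = min(max(num / den, 1e-6), 1 - 1e-6)
        # E-step: posterior of each label given the answers and the
        # current reliabilities (uniform prior on t_i).
        for i in range(m):
            log_pos = log_neg = 0.0
            for j in range(n):
                if A[i][j] != 0:
                    agree, disagree = math.log(p[j]), math.log(1 - p[j])
                    if A[i][j] == 1:
                        log_pos += agree; log_neg += disagree
                    else:
                        log_pos += disagree; log_neg += agree
            mu[i] = 1 / (1 + math.exp(log_neg - log_pos))
    return mu, p
```

On a small instance with two consistent workers and one coin-flipping worker, the iteration pushes the consistent workers' reliability estimates toward one and the random worker's toward one half, exactly the behavior the two steps are meant to produce.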
We want to address both questions of task allocation and inference together, and devise an algorithmic solution that can achieve minimum error from a fixed budget on the total number of queries. Given a target accuracy, such an algorithm will achieve this target accuracy with minimum cost. Further, we want to provide a strong performance guarantee for this approach and show that it is close to a fundamental limit on what the best algorithm can achieve.

Contributions. In this work, we provide the first rigorous treatment of both aspects of designing a reliable crowdsourcing system: task allocation and inference. We provide both an order-optimal task allocation scheme (based on random graphs) and an order-optimal algorithm for inference (based on low-rank approximation and belief propagation) on that task assignment. We show that our algorithm, which is non-adaptive, performs as well (for the worst-case worker distribution) as the optimal oracle estimator which can use any adaptive task allocation scheme.

Concretely, given a target probability of error ε and a crowd with collective quality q, we show that spending a budget which scales as O((1/q) log(1/ε)) is sufficient to achieve probability of error less than ε using our approach. We give a task allocation scheme and an inference algorithm with run time which is linear in the total number of queries (up to a logarithmic factor). Conversely, we also show that using the best adaptive task allocation scheme together with the best inference algorithm, and under the worst-case worker distribution, this scaling of the budget in terms of q and ε is unavoidable. No algorithm can achieve error less than ε with a number of queries smaller than (C/q) log(1/ε) for some positive universal constant C. This establishes that our algorithm is worst-case optimal up to a constant factor in the required budget.
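To get a feel for the Θ((1/q) log(1/ε)) scaling, the sketch below evaluates the implied per-task replication for a few values. The constant factor is set to 1 purely for illustration; the theorem fixes only the scaling, not the constant.

```python
import math

def replication_per_task(q, eps, c=1.0):
    """Per-task replication suggested by the Theta((1/q) * log(1/eps))
    scaling. The constant c is illustrative only."""
    return math.ceil(c * (1.0 / q) * math.log(1.0 / eps))

# Halving q (a less reliable crowd) doubles the budget; tightening the
# target error from eps to eps^2 also only doubles it.
```

For example, with a target error of 1%, a perfectly reliable crowd (q = 1) needs about 5 answers per task under this normalization, while a crowd with q = 0.25 needs about 19.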
Our main results show that our non-adaptive algorithm is worst-case optimal and there is no significant gain in using an adaptive strategy. We attribute this limit of adaptation to the fact that, in existing platforms such as Amazon's Mechanical Turk, the workers are fleeting and the system does not allow for exploiting good workers. Therefore, a positive message of this result is that a good rating system for workers is essential to truly benefit from crowdsourcing platforms using adaptivity.

Another novel contribution of our work is the analysis technique. The iterative inference algorithm we introduce operates on real-valued messages whose distribution is a priori difficult to analyze. To overcome this challenge, we develop a novel technique of establishing that these messages are sub-Gaussian and computing the parameters recursively in closed form. This allows us to prove the sharp result on the error rate. This technique could be of independent interest in analyzing a more general class of message-passing algorithms.

2 Main result

To achieve a certain reliability in our answers with a minimum number of queries, we propose using random regular graphs for task allocation and introduce a novel iterative algorithm to infer the correct answers. While our approach is non-adaptive, we show that it is sufficient to achieve order-optimal performance when compared to the best possible approach using adaptive task allocations. Precisely, we prove an upper bound on the resulting error when using our approach and a matching lower bound on the minimax error rate achieved by the best possible adaptive task allocation together with an optimal inference algorithm. This shows that our approach is minimax optimal up to a constant factor: it requires only a constant factor times the minimum necessary budget to achieve a target error rate under the worst-case worker distribution.
We then present the intuitions behind our inference algorithm through connections to low-rank matrix approximations and belief propagation.

2.1 Algorithm

Task allocation. We use a non-adaptive scheme which makes all the task assignments before any worker arrives. This amounts to designing a bipartite graph with one type of nodes corresponding to each of the tasks and another set of nodes corresponding to each of the batches. An edge (i, j) indicates that task i is included in batch T_j. Once all the T_j's are determined according to the graph, these batches are submitted simultaneously to the crowdsourcing platform. Each arriving worker will pick up one of the batches and complete all the tasks in that batch. We denote by j the worker working on the j-th batch T_j.

To design a bipartite graph, the taskmaster first makes a choice of how many workers to assign to each task and how many tasks to assign to each worker. The task degree ℓ is typically determined by how much in resources (e.g. money, time, etc.) one can spend on the tasks. The worker degree r is typically determined by how many tasks are manageable for a worker, depending on the application. The total number of workers we need is automatically determined as n = mℓ/r, since the total number of edges has to be consistent. We will show that with such a regular graph, one can achieve a probability of error quite close to a lower bound on what any inference algorithm can achieve with any task assignment. In particular, this includes all possible graphs which might have irregular degrees or have very large worker degrees (and a small number of workers), conditioned on the total number of edges being the same. This suggests that, among other things, there is no significant gain in using an irregular graph.
We assume that the total cost that must be paid is proportional to the total number of edges, not to the number of workers. If we have more budget, we can increase $\ell$. It is then natural to expect the probability of error to decrease, since we are collecting more responses. We will show that the error rate decreases exponentially in $\ell$ as $\ell$ grows. Increasing $r$, on the other hand, does not increase the cost, and it is not immediately clear how it affects the performance. We will show that with larger $r$ we can learn more about the workers, and the error rate decreases as $r$ increases. However, how much we can gain by increasing the worker degree is limited.

Given the task and worker degrees, there are multiple ways to generate a regular bipartite graph. We want to choose a graph that will minimize the probability of error. Deviating slightly from regular degrees, we propose using a simple random construction known as the configuration model in the random graph literature [RU08, Bol01]. We start with $[m] \times [\ell]$ half-edges for the task nodes and $[n] \times [r]$ half-edges for the worker nodes, and pair all the half-edges according to a random permutation of $[m\ell]$. The resulting graph might have multi-edges, where two nodes are connected by more than one edge. However, there are very few of them in the resulting random graph as long as $\ell \ll n$, in which case we also have $r \ll m$. Precisely, the number of double-edges in the graph converges in distribution to a Poisson distribution with mean $(\ell - 1)(r - 1)/2$ [Bol01, Page 59, Exercise 2.12]. The only property we need for the main result to hold is that the resulting random graph converges locally to a random tree in probability in the large-system limit. This enables us to analyze the performance of our inference algorithm and provide sharp bounds on the probability of error.
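The half-edge pairing described above can be sketched in a few lines of Python; the function name and interface here are illustrative, not from the paper:

```python
import random

def random_regular_bipartite(m, ell, r, seed=None):
    """Sample an (ell, r)-regular bipartite task-worker graph from the
    configuration model: pair task half-edges with worker half-edges
    uniformly at random. Multi-edges may occur, but they are rare
    when ell is much smaller than n."""
    assert (m * ell) % r == 0, "need m*ell divisible by r so that n = m*ell/r"
    n = m * ell // r
    rng = random.Random(seed)
    task_stubs = [i for i in range(m) for _ in range(ell)]    # m*ell half-edges
    worker_stubs = [j for j in range(n) for _ in range(r)]    # n*r half-edges
    rng.shuffle(worker_stubs)                                 # random permutation
    return list(zip(task_stubs, worker_stubs))                # (task, worker) edges
```

By construction, every task appears in exactly $\ell$ edges and every worker in exactly $r$ edges, which is all the regularity the analysis requires.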
The intuition behind why random graphs are good for our inference problem is related to the spectral gap of random matrices. In the following, we will use the (approximate) top singular vector of a weighted adjacency matrix of the random graph to find the correct labels. Since sparse random graphs are excellent expanders with large spectral gaps, this enables us to reliably separate the low-rank structure from the data matrix, which is perturbed by random noise.

Inference algorithm. We are given a task allocation graph $G([m] \cup [n], E)$, where we connect an edge $(i, j)$ if task $i$ is assigned to worker $j$. In the following, we use index $i$ for the $i$-th task node and $j$ for the $j$-th worker node, and $\partial i$ to denote the neighborhood of node $i$. Each edge $(i, j)$ on the graph $G$ has a corresponding worker response $A_{ij}$.

To find the correct labels from the given responses of the workers, we introduce a novel iterative algorithm. This algorithm is inspired by the celebrated belief propagation algorithm and by low-rank matrix approximations. The connections are explained in detail in Sections 2.5 and 2.6, along with mathematical justifications. The algorithm operates on real-valued task messages $\{x_{i \to j}\}_{(i,j) \in E}$ and worker messages $\{y_{j \to i}\}_{(i,j) \in E}$. A task message $x_{i \to j}$ represents the log-likelihood of task $i$ being a positive task, and a worker message $y_{j \to i}$ represents how reliable worker $j$ is. We start with the worker messages initialized as independent Gaussian random variables, although the algorithm is not sensitive to the specific initialization as long as it has a strictly positive mean. We could also initialize all the messages to one, but then we would need extra steps in the analysis to ensure that this is not a degenerate case.
At the $k$-th iteration, the messages are updated according to
$$x^{(k)}_{i \to j} = \sum_{j' \in \partial i \setminus j} A_{ij'}\, y^{(k-1)}_{j' \to i}, \quad \text{for all } (i,j) \in E, \qquad (1)$$
$$y^{(k)}_{j \to i} = \sum_{i' \in \partial j \setminus i} A_{i'j}\, x^{(k)}_{i' \to j}, \quad \text{for all } (i,j) \in E, \qquad (2)$$
where $\partial i$ is the neighborhood of task node $i$ and $\partial j$ is the neighborhood of worker node $j$. In the task update, we give more weight to the answers that came from more trustworthy workers. In the worker update, we increase our confidence in a worker if the answer she gave on another task, $A_{i'j}$, has the same sign as what we believe, $x_{i' \to j}$. Intuitively, a worker message represents our belief in how 'reliable' the worker is. Hence, our final estimate is a weighted sum of the answers, weighted by each worker's reliability:
$$\hat{t}^{(k)}_i = \mathrm{sign}\Big( \sum_{j \in \partial i} A_{ij}\, y^{(k-1)}_{j \to i} \Big).$$

Iterative Algorithm
Input: $E$, $\{A_{ij}\}_{(i,j) \in E}$, $k_{\max}$
Output: Estimate $\hat{t} \in \{\pm 1\}^m$
1: For all $(i,j) \in E$ do: initialize $y^{(0)}_{j \to i}$ with random $Z_{ij} \sim \mathcal{N}(1,1)$;
2: For $k = 1, \ldots, k_{\max}$ do:
   For all $(i,j) \in E$ do $x^{(k)}_{i \to j} \leftarrow \sum_{j' \in \partial i \setminus j} A_{ij'}\, y^{(k-1)}_{j' \to i}$;
   For all $(i,j) \in E$ do $y^{(k)}_{j \to i} \leftarrow \sum_{i' \in \partial j \setminus i} A_{i'j}\, x^{(k)}_{i' \to j}$;
3: For all $i \in [m]$ do $x_i \leftarrow \sum_{j \in \partial i} A_{ij}\, y^{(k_{\max}-1)}_{j \to i}$;
4: Output the estimate vector $\hat{t}^{(k)} = \{\mathrm{sign}(x_i)\}$.

While our algorithm is inspired by the standard Belief Propagation (BP) algorithm for approximating max-marginals [Pea88, YFW03], it is original and overcomes a few limitations of standard BP for this inference problem under the crowdsourcing model. First, the iterative algorithm does not require any knowledge of the prior distribution of $p_j$, whereas standard BP requires it, as explained in detail in Section 2.6. Second, the iterative algorithm is provably order-optimal for this crowdsourcing problem.
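A direct transcription of the pseudocode into Python might look as follows. This is a sketch: the edge-list and dictionary representation is our own choice, not the paper's.

```python
import random

def iterative_inference(edges, A, k_max, seed=0):
    """Iterative message-passing inference on a task-allocation graph.

    edges : list of (task i, worker j) pairs
    A     : dict mapping each (i, j) edge to the worker's +1/-1 answer
    """
    rng = random.Random(seed)
    tasks = sorted({i for i, _ in edges})
    nbr_task, nbr_worker = {}, {}
    for i, j in edges:
        nbr_task.setdefault(i, []).append(j)     # workers assigned to task i
        nbr_worker.setdefault(j, []).append(i)   # tasks assigned to worker j
    # step 1: initialize worker messages y ~ N(1, 1)
    y = {(i, j): rng.gauss(1.0, 1.0) for i, j in edges}
    # step 2: k_max - 1 full rounds of task then worker updates, so the
    # final estimate below uses y^(k_max - 1), matching the pseudocode
    for _ in range(k_max - 1):
        x = {(i, j): sum(A[i, jp] * y[i, jp] for jp in nbr_task[i] if jp != j)
             for i, j in edges}
        y = {(i, j): sum(A[ip, j] * x[ip, j] for ip in nbr_worker[j] if ip != i)
             for i, j in edges}
    # steps 3-4: weighted vote, weights given by worker reliability messages
    return {i: 1 if sum(A[i, j] * y[i, j] for j in nbr_task[i]) >= 0 else -1
            for i in tasks}
```

On a small synthetic instance with reasonably reliable workers, a few iterations already recover almost all labels.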
We use a standard technique, known as density evolution, to analyze the performance of our message-passing algorithm. Although we can write down the density evolution equations for standard BP for crowdsourcing, it is not trivial to describe or compute the densities, analytically or numerically. It is also simple to write down the density evolution equations (cf. (13) and (14)) for our algorithm, but it is not a priori clear how one can analyze the densities in this case either. We develop a novel technique to analyze the densities for our iterative algorithm and prove optimality. This technique could be of independent interest for analyzing a broader class of message-passing algorithms.

2.2 Performance guarantee and experimental results

We provide an upper bound on the probability of error achieved by the iterative inference algorithm with task allocation according to the configuration model. The bound decays as $e^{-C\ell q}$ with a universal constant $C$. Further, an algorithm-independent lower bound that we establish suggests that such a dependence of the error on $\ell q$ is unavoidable.

2.2.1 Bound on the average error probability

To lighten the notation, let $\hat{\ell} \equiv \ell - 1$ and $\hat{r} \equiv r - 1$, and recall that $q = \mathbb{E}[(2p_j - 1)^2]$. Using these notations, we define $\sigma^2_k$ to be the effective variance in the sub-Gaussian tail of our estimates after $k$ iterations of our inference algorithm:
$$\sigma^2_k \;\equiv\; \frac{2q}{\mu^2} \big( q^2 \hat{\ell} \hat{r} \big)^{-(k-1)} \;+\; \Big( 3 + \frac{1}{q\hat{r}} \Big) \frac{1 - (1/q^2 \hat{\ell} \hat{r})^{k-1}}{1 - (1/q^2 \hat{\ell} \hat{r})} \,.$$
With this, we can prove the following upper bound on the probability of error when we run $k$ iterations of our inference algorithm with $(\ell, r)$-regular assignments on $m$ tasks, using a crowd with collective quality $q$. We refer to Section 3.1 for the proof.

Theorem 2.1.
For fixed $\ell > 1$ and $r > 1$, assume that $m$ tasks are assigned to $n = m\ell/r$ workers according to a random $(\ell, r)$-regular graph drawn from the configuration model. If the distribution of the worker reliability satisfies $\mu \equiv \mathbb{E}[2p_j - 1] > 0$ and $q^2 > 1/(\hat{\ell}\hat{r})$, then for any $t \in \{\pm 1\}^m$, the estimate after $k$ iterations of the iterative algorithm achieves
$$\frac{1}{m} \sum_{i=1}^m \mathbb{P}\big( t_i \neq \hat{t}^{(k)}_i \big) \;\leq\; e^{-\ell q / (2\sigma^2_k)} + \frac{3\ell r}{m} \big( \hat{\ell}\hat{r} \big)^{2k-2}. \qquad (3)$$

The second term, which is the probability that the resulting graph is not locally tree-like, vanishes for large $m$. Hence, the dominant term in the error bound is the first term. Further, when $q^2 \hat{\ell}\hat{r} > 1$ as per our assumption and when we run our algorithm for a large enough number of iterations, $\sigma^2_k$ converges linearly to a finite limit $\sigma^2_\infty \equiv \lim_{k \to \infty} \sigma^2_k$ such that
$$\sigma^2_\infty = \Big( 3 + \frac{1}{q\hat{r}} \Big) \frac{q^2 \hat{\ell}\hat{r}}{q^2 \hat{\ell}\hat{r} - 1} \,.$$
With linear convergence of $\sigma^2_k$, we only need a small number of iterations to achieve $\sigma_k$ close to this limit. It follows that for large enough $m$ and $k$, we can prove an upper bound that does not depend on the problem size or the number of iterations, as stated in the following corollary.

Corollary 2.2. Under the hypotheses of Theorem 2.1, there exist $m_0 = 3\ell r\, e^{\ell q / (4\sigma^2_\infty)} (\hat{\ell}\hat{r})^{2(k-1)}$ and $k_0 = 1 + \log(q/\mu^2)/\log(\hat{\ell}\hat{r}q^2)$ such that
$$\frac{1}{m} \sum_{i=1}^m \mathbb{P}\big( t_i \neq \hat{t}^{(k)}_i \big) \;\leq\; 2 e^{-\ell q / (4\sigma^2_\infty)}, \qquad (4)$$
for all $m \geq m_0$ and $k \geq k_0$.

Proof. For $\hat{\ell}\hat{r}q^2 > 1$ as per our assumption, $k = 1 + \log(q/\mu^2)/\log(\hat{\ell}\hat{r}q^2)$ iterations suffice to ensure that $\sigma^2_k \leq (2q/\mu^2)(\hat{\ell}\hat{r}q^2)^{-k+1} + q\hat{\ell}(3q\hat{r} + 1)/(q^2\hat{\ell}\hat{r} - 1) \leq 2\sigma^2_\infty$. Also, $m = 3\ell r\, e^{\ell q/(4\sigma^2_\infty)} (\hat{\ell}\hat{r})^{2(k-1)}$ suffices to ensure that $(\hat{\ell}\hat{r})^{2k-2}(3\ell r)/m \leq \exp\{-\ell q/(4\sigma^2_\infty)\}$.

The required number of iterations $k_0$ is small (only logarithmic in $\ell$, $r$, $q$, and $\mu$) and does not depend on the problem size $m$.
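The closed-form expression for $\sigma^2_k$ and its limit $\sigma^2_\infty$ are easy to evaluate numerically. The following sketch (helper names are ours) checks that $\sigma^2_k$ approaches $\sigma^2_\infty$ for parameters used later in the experiments ($q = 0.3$, $\ell = r = 30$, spammer-hammer so that $\mu = q$):

```python
def sigma_sq(k, q, mu, ell, r):
    """sigma_k^2 from the text; requires rho = q^2*(ell-1)*(r-1) > 1."""
    lh, rh = ell - 1, r - 1
    rho = q * q * lh * rh
    first = (2 * q / mu ** 2) * rho ** (-(k - 1))    # decays geometrically in k
    geom = (1 - rho ** (-(k - 1))) / (1 - 1 / rho)   # partial geometric sum
    return first + (3 + 1 / (q * rh)) * geom

def sigma_sq_limit(q, ell, r):
    """sigma_infinity^2 = (3 + 1/(q*r_hat)) * rho / (rho - 1)."""
    lh, rh = ell - 1, r - 1
    rho = q * q * lh * rh
    return (3 + 1 / (q * rh)) * rho / (rho - 1)
```

The linear (geometric) convergence is visible numerically: for $k = 20$ the first term and the truncation of the geometric sum are already far below floating-point noise.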
On the other hand, the required number of tasks $m_0$ in our main theorem is quite large. However, numerical simulations suggest that the actual performance of our approach is not very sensitive to the number of tasks, and that the bound still holds for small task sets as well. For example, in Figure 1 (left), we ran a numerical experiment with $m = 1000$, $q = 0.3$, and $k = 20$, and the resulting error exhibits the exponential decay predicted by our theorem even for $\ell = r = 30$, where the theoretical requirement $m_0$ is much larger than the number of tasks used in the experiment.

Consider a set of worker distributions $\{\mathcal{F} \,|\, \mathbb{E}_{\mathcal{F}}[(2p - 1)^2] = q\}$ that have the same collective quality $q$. Distributions with the same value of $q$ can give different values of $\mu$, ranging from $q$ to $q^{1/2}$. Our main result on the error rate suggests that the error does not depend on the value of $\mu$. Hence, the effective second moment $q$ is the right measure of the collective quality of the crowd, and the effective first moment $\mu$ only affects how fast the algorithm converges, since we need to run our inference algorithm for $k = \Omega(1 + \log(q/\mu^2)/\log(\hat{\ell}\hat{r}q^2))$ iterations to guarantee the error bound.

The iterative algorithm is efficient, with run-time comparable to that of majority voting, which requires $O(m\ell)$ operations. Each iteration of the iterative algorithm requires $O(m\ell)$ operations, and we need $O(1 + \log(q/\mu^2)/\log(q^2\hat{\ell}\hat{r}))$ iterations to ensure the error bound in (4). By definition, we have $q \leq \mu \leq \sqrt{q}$. The run-time is largest when $\mu = q$, which happens under the spammer-hammer model, and smallest when $\mu = \sqrt{q}$, which happens if $p_j = (1 + \sqrt{q})/2$ deterministically.
In any case, we only need an extra logarithmic factor compared to majority voting. Notice that as we increase the number of iterations, the messages converge to an eigenvector of a particular sparse matrix of dimensions $m\ell \times m\ell$. This suggests that we can alternatively compute the messages using other algorithms for computing the top singular vector of large sparse matrices that are known to converge faster (e.g. the Lanczos algorithm [Lan50]).

Next, we make a few remarks on the performance guarantee. First, the assumption that $\mu > 0$ is necessary. Without any assumption on $\mu$, we cannot distinguish whether the responses came from tasks with $\{t_i\}_{i \in [m]}$ and workers with $\{p_j\}_{j \in [n]}$, or from tasks with $\{-t_i\}_{i \in [m]}$ and workers with $\{1 - p_j\}_{j \in [n]}$: statistically, both give the same output. The hypothesis on $\mu$ allows us to distinguish which of the two is the correct solution. In the case when we know that $\mu < 0$, we can use the same algorithm, changing the sign of the final output, and get the same performance guarantee.

Second, our algorithm does not require any information on the distribution of $p_j$. However, in order to generate a graph that achieves optimal performance, we need knowledge of $q$ for selecting the degree $\ell = \Theta((1/q)\log(1/\varepsilon))$. There is a simple way to overcome this limitation at the loss of only an additional constant factor, i.e. the scaling of cost per task still remains $\Theta((1/q)\log(1/\varepsilon))$. To that end, consider an incremental design in which at step $a$ the system is designed assuming $q = 2^{-a}$, for $a \geq 1$. At step $a$, we design two replicas of the task allocation for $q = 2^{-a}$ and compare the estimates obtained by the two replicas on all $m$ tasks. If they agree on at least $m(1 - 2\varepsilon)$ tasks, then we stop and declare that as the final answer. Otherwise, we increase $a$ to $a + 1$ and repeat.
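The incremental design for unknown $q$ can be sketched as follows; `run_replica` is a hypothetical callback that runs the entire allocate-and-infer pipeline designed for a guessed quality, and is not part of the paper:

```python
def estimate_without_knowing_q(run_replica, m, eps):
    """Doubling scheme for unknown collective quality q. At step a we
    guess q = 2**-a, run two independent replicas of the task allocation
    and inference, and stop once the replicas agree on m*(1 - 2*eps) tasks.

    run_replica(q_guess) -> list of m labels in {+1, -1}  (hypothetical)
    """
    a = 1
    while True:
        q_guess = 2.0 ** (-a)
        est1 = run_replica(q_guess)
        est2 = run_replica(q_guess)
        agreement = sum(1 for u, v in zip(est1, est2) if u == v)
        if agreement >= m * (1 - 2 * eps):
            return est1          # replicas agree: accept this estimate
        a += 1                   # otherwise halve the guessed quality and retry
```

Since the loop stops with high probability once $2^{-a}$ drops below the true $q$, the total budget spent stays within a constant factor of the known-$q$ design.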
Note that by our optimality result, if $2^{-a}$ is less than the actual $q$, then the iteration stops with high probability. Therefore, the total cost paid is $\Theta((1/q)\log(1/\varepsilon))$ with high probability. Thus, even lack of knowledge of $q$ does not affect the order-optimality of our algorithm.

Further, unlike previous approaches based on Expectation Maximization (EM), the iterative algorithm is not sensitive to initialization and converges to a unique solution from a random initialization with high probability. This follows from the fact that the algorithm is essentially computing a leading eigenvector of a particular linear operator.

Finally, we observe a phase transition at $\hat{\ell}\hat{r}q^2 = 1$. Above this phase transition, when $\hat{\ell}\hat{r}q^2 > 1$, we will show that our algorithm is order-optimal and the probability of error is significantly smaller than that of majority voting. However, perhaps surprisingly, below the threshold, when $\hat{\ell}\hat{r}q^2 < 1$, we empirically observe that our algorithm exhibits a fundamentally different behavior (cf. Figure 1): the error after $k$ iterations increases with $k$. In this regime, we are better off stopping the algorithm after one iteration, in which case the estimate we get is essentially the same as simple majority voting, and we cannot do better than majority voting. This phase transition is universal, and we observe similar behavior with other inference algorithms, including EM approaches. We provide more discussion on the choice of $\ell$ and the limitations of having small $r$ in the following section.

2.2.2 Minimax optimality of our approach

For a taskmaster, the natural core optimization problem is how to achieve a certain reliability in the answers with minimum cost. Throughout this paper, we assume that the cost is proportional to the total number of queries.
In this section, we show that if a taskmaster wants to achieve a target error rate of $\varepsilon$, she can do so using our approach with budget per task scaling as $O((1/q)\log(1/\varepsilon))$ for a broad range of worker degrees $r$. Compared to the necessary condition which we provide in Section 2.3, this is within a constant factor of what is necessary using the best non-adaptive task assignment and the best inference algorithm. Further, we show in Section 2.4 that this scaling in the budget is still necessary even if we allow the best adaptive task assignment together with the best inference algorithm. This proves that our approach is minimax optimal up to a constant factor in the budget.

Assuming for now that there are no restrictions on the worker degree $r$ and that we can assign as many tasks to each worker as we want, we can get the following simplified upper bound on the error, which holds for all $r \geq 1 + 1/q$. To simplify the resulting bound, let us also assume for now that $\hat{\ell}\hat{r}q^2 \geq 2$. Then $\sigma^2_\infty \leq 2(3 + 1/(\hat{r}q))$, and from (4) we get the following bound:
$$\frac{1}{m} \sum_{i \in [m]} \mathbb{P}\big( t_i \neq \hat{t}^{(k)}_i \big) \;\leq\; 2 e^{-\ell q / 32},$$
for large enough $m \geq m_0$. In terms of the budget, or the number of queries necessary to achieve a target accuracy, we get the following sufficient condition as a corollary.

Corollary 2.3. Using the non-adaptive task assignment scheme with $r \geq 1 + 1/q$ and the iterative inference algorithm introduced in Section 2.1, it is sufficient to query $(32/q)\log(2/\varepsilon)$ times per task to guarantee that the probability of error is at most $\varepsilon$, for any $\varepsilon \leq 1/2$ and for all $m \geq m_0$.

We provide a matching minimax necessary condition, up to a constant factor, for non-adaptive algorithms in Section 2.3.
When nature can choose the worst-case worker distribution, no non-adaptive algorithm can achieve error less than $\varepsilon$ with budget per task smaller than $(C'/q)\log(1/(2\varepsilon))$, for some universal positive constant $C'$. This establishes that under the non-adaptive scenario, our approach is minimax optimal up to a constant factor for large enough $m$: with our approach one only needs to ask (and pay for) a constant factor more than what is necessary using the best non-adaptive task assignment scheme together with the best inference algorithm under the worst-case worker distribution.

Perhaps surprisingly, we will show in Section 2.4 that the necessary condition does not change even if we allow adaptive task assignments. No algorithm, adaptive or non-adaptive, can achieve error less than $\varepsilon$ without asking $(C''/q)\log(1/(2\varepsilon))$ queries per task, for some universal positive constant $C''$. Hence, our non-adaptive approach achieves the minimax optimal performance achievable by the best adaptive scheme.

In practice, we might not be allowed large $r$, depending on the application. For different regimes of restrictions on the allowed worker degree $r$, we need different choices of $\ell$. Given a target accuracy $\varepsilon$, the following corollary establishes that we can achieve probability of error $\varepsilon$ with $\ell \geq C(1 + 1/(\hat{r}q))(1/q)\log(1/\varepsilon)$ for any value of $r$.

Corollary 2.4. Using the non-adaptive task assignment scheme with any $r$ and the iterative inference algorithm introduced in Section 2.1, it is sufficient to query $(24 + 8/(\hat{r}q))(1/q)\log(2/\varepsilon)$ times per task to guarantee that the probability of error is at most $\varepsilon$, for any $\varepsilon \leq 1/2$ and for all $m \geq m_0$.

Proof. We will show that for $\ell \geq \max\{1 + 2/(\hat{r}q^2),\; 8(3 + 1/(\hat{r}q))(1/q)\log(1/\varepsilon)\}$, the probability of error is at most $\varepsilon$.
Since $\hat{\ell}\hat{r}q^2 \geq 2$ by the first condition, we get that $\sigma^2_\infty \leq 2(3 + 1/(\hat{r}q))$. Then the probability of error is upper bounded by $2\exp\{-\ell q/(24 + 8/(\hat{r}q))\}$, which implies that for $\ell \geq (24 + 8/(\hat{r}q))(1/q)\log(2/\varepsilon)$ the probability of error is at most $\varepsilon$. Since $1 + 2/(\hat{r}q^2) \leq 8(3 + 1/(\hat{r}q))(1/q)\log(1/\varepsilon)$ for $\varepsilon \leq 1/2$, this proves the corollary.

For $r \geq C'/q$, this implies that our approach requires $O((1/q)\log(1/\varepsilon))$ queries and is minimax optimal. However, for $r = O(1)$, our approach requires $O((1/q^2)\log(1/\varepsilon))$ queries. This is because when $r$ is small, we cannot efficiently learn the quality of the workers and need significantly more questions to achieve the desired accuracy. Hence, in practice, we want to be able to assign more tasks to each worker when we have low-quality workers.

2.2.3 Experimental results

Figure 1 compares the probabilities of error achieved by different inference algorithms on the same task assignment, using regular bipartite random graphs. We ran 20 iterations of EM and of our iterative algorithm, and also the spectral approach of using the leading left singular vector of $A$ for estimation. The spectral approach, which we call Singular Vector in the figure, is explained in detail in Section 2.5. The error rates are compared with those of majority voting and of the oracle estimator. The oracle estimator's performance sets a lower bound on what any inference algorithm can achieve, since it knows all the values of the $p_j$'s.
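As a quick sanity check on this experimental setup (this is our own Monte Carlo sketch, not the paper's code), one can simulate majority voting under the spammer-hammer model with $q = 0.3$ and watch the error fall as the task degree $\ell$ grows:

```python
import random

def majority_vote_error(m, ell, q, seed=0):
    """Monte Carlo error of majority voting under the spammer-hammer
    model: each task gets ell answers, each from a fresh worker who is
    a hammer (always correct) w.p. q and a spammer (fair coin) otherwise.
    Ties are counted as errors."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(m):
        votes = 0
        for _ in range(ell):
            if rng.random() < q:           # hammer: votes for the truth (+1)
                votes += 1
            else:                          # spammer: uniform +/-1 vote
                votes += rng.choice([-1, 1])
        if votes <= 0:                     # wrong (or tied) majority
            errors += 1
    return errors / m
```

Using an odd $\ell$ avoids ties; per response the vote is correct with probability $q + (1-q)/2$, so the error shrinks with $\ell$ but only at the slower $e^{-O(\ell q^2)}$ rate that Lemma 2.6 predicts for majority voting.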
[Figure 1 appears here: log-scale plots of the probability of error versus the number of queries per task (left) and versus the collective quality $q$ of the crowd (right), comparing Majority Voting, Expectation Maximization, Singular Vector, the Iterative Algorithm, and the Oracle Estimator.]

Figure 1: The iterative algorithm improves over majority voting and the EM algorithm. Using the top singular vector for inference has similar performance to our iterative approach.

For the numerical simulation on the left-hand side, we set $m = 1000$, $\ell = r$, and used the spammer-hammer model for the distribution of the workers with $q = 0.3$. According to our theorem, we expect a phase transition at $\ell = 1 + 1/0.3 \approx 4.33$. From the figure, we observe that the iterative inference algorithm starts to perform better than majority voting at $\ell = 5$. For the figure on the right-hand side, we set $\ell = 25$. For fair comparison with the EM approach, we used a publicly available Java implementation of EM by Sheng et al. [SPI08].

We also ran two experiments with a real crowd using Amazon Mechanical Turk. In our experiments, we created tasks for comparing colors: we showed three colors on each task, one on the top and two on the bottom, and asked the crowd to indicate whether the color on the top is more similar to the color on the left or to the one on the right. The first experiment confirms that the ground truth for these color comparison tasks is what is expected from pairwise distances in the Lab color space. Distances in the Lab color space between pairs of colors are known to be a good measure of the perceived distance between them [WS67]. To check the validity of this Lab distance, we collected 210 responses on each of 10 color comparison tasks. As shown in Figure
2, for all 10 tasks the majority of the 210 responses were consistent with the Lab-distance-based ground truth. Next, to test our approach, we created 50 such similarity tasks and recruited 28 workers to answer all the questions. Once we have this data, we can subsample it to simulate what would have happened had we collected a smaller number of responses per task. The resulting average probability of error is illustrated in Figure 3. For this crowd from Amazon Mechanical Turk, we can estimate the collective quality from the data, which is about $q \simeq 0.175$. Theoretically, this indicates that the phase transition should happen when $(\ell - 1)((50/28)\ell - 1)q^2 = 1$, since we set $r = (50/28)\ell$; hence we expect the phase transition to happen around $\ell \simeq 5$. In Figure 3, we see that our iterative algorithm starts to perform better than majority voting around $\ell = 8$.

[Figure 2 appears here: ten color triplets, each with the vote counts from the 210 workers shown below it.]

Figure 2: Experimental results on color comparison using real data from Amazon's Mechanical Turk. The color on the left is closer to the one on the top in Lab distance for each triplet. The votes from 210 workers are shown below each triplet.

[Figure 3 appears here: probability of error versus the number of queries per task for Majority Voting, Expectation Maximization, and the Iterative Algorithm.]

Figure 3: The average probability of error on color comparisons using real data from Amazon's Mechanical Turk.

2.3 Fundamental limit under the non-adaptive scenario

Under the non-adaptive scenario, we are allowed to use only non-adaptive task assignment schemes, which assign all the tasks a priori and collect all the responses simultaneously. In this section, we investigate the fundamental limit on how small an error can be achieved using the best possible non-adaptive task assignment scheme together with the best possible inference algorithm.
In particular, we are interested in minimax optimality: what is the minimum error that can be achieved under the worst-case worker distribution? To this end, we analyze the performance of an oracle estimator when the workers' latent qualities are drawn from a specific distribution, and we provide a lower bound on the minimax rate on the probability of error. Compared to our main result, this establishes that our approach is minimax optimal up to a constant factor.

In terms of the budget, the natural core optimization problem of our concern is how to achieve a certain reliability in our answers with minimum cost. Let us assume that the cost is proportional to the total number of queries. We show that for a given target error rate $\varepsilon$, the total budget sufficient to achieve this target using our algorithm is within a constant factor of what is necessary using the best non-adaptive task assignment and the best inference algorithm.

Fundamental limit. Consider a crowd characterized by a worker distribution $\mathcal{F}$ such that $p_j \sim \mathcal{F}$. Let $\mathcal{F}_q$ be the set of all distributions on $[0, 1]$ whose collective quality is parametrized by $q$:
$$\mathcal{F}_q = \big\{ \mathcal{F} \,\big|\, \mathbb{E}_{\mathcal{F}}[(2p_j - 1)^2] = q \big\}.$$
We want to prove a lower bound on the minimax rate on the probability of error, which depends only on $q$ and $\ell$. Define the minimax rate as
$$\min_{\tau \in \mathcal{T}_\ell,\, \hat{t}} \;\max_{t \in \{\pm 1\}^m,\, \mathcal{F} \in \mathcal{F}_q} \;\frac{1}{m} \sum_{i \in [m]} \mathbb{P}\big( t_i \neq \hat{t}_i \big),$$
where $\hat{t}$ ranges over all estimators that are measurable functions of the responses, and $\tau$ ranges over the set $\mathcal{T}_\ell$ of all non-adaptive task assignment schemes that ask $m\ell$ queries in total. Here the probability is taken over all realizations of the $p_j$'s, the $A_{ij}$'s, and the randomness introduced in the task assignment and the inference. Consider any non-adaptive scheme that assigns $\ell_i$ workers to the $i$-th task. The only constraint is that the average number of queries is bounded: $(1/m)\sum_{i \in [m]} \ell_i \leq \ell$.
To get a lower bound on the minimum achievable error, we consider an oracle estimator that has access to all the $p_j$'s and can hence make an optimal estimation. Further, since we are proving minimax optimality and not instance-optimality, the worst-case error rate is always lower bounded by the error rate for any particular choice of worker distribution. In particular, we prove a lower bound using the spammer-hammer model. Concretely, we assume the $p_j$'s are drawn from the spammer-hammer model with perfect hammers:
$$p_j = \begin{cases} 1/2 & \text{with probability } 1 - q, \\ 1 & \text{otherwise.} \end{cases}$$
Notice that this use of $q$ is consistent with $\mathbb{E}[(2p_j - 1)^2] = q$. Under the spammer-hammer model, the oracle estimator only makes a mistake on task $i$ if the task is assigned exclusively to spammers, in which case we flip a fair coin and achieve an error probability of one half. Formally,
$$\mathbb{P}\big( \hat{t}_i \neq t_i \big) = \frac{1}{2} (1 - q)^{\ell_i}.$$
By convexity and Jensen's inequality, the average probability of error is lower bounded by
$$\frac{1}{m} \sum_{i \in [m]} \mathbb{P}\big( \hat{t}_i \neq t_i \big) \;\geq\; \frac{1}{2} (1 - q)^{\ell}.$$
Since we are interested in how many more queries are necessary as the quality of the crowd deteriorates, we assume $q \leq 2/3$, in which case $(1 - q) \geq e^{-(q + q^2)}$. As long as $m\ell$ queries are used in total, this lower bound holds regardless of how the actual tasks are assigned. And since this lower bound holds for a particular choice of $\mathcal{F}$, it holds for the worst-case $\mathcal{F}$ as well. Hence, for the best task assignment scheme and the best inference algorithm, we have
$$\min_{\tau \in \mathcal{T}_\ell,\, \hat{t}} \;\max_{t \in \{\pm 1\}^m,\, \mathcal{F} \in \mathcal{F}_q} \;\frac{1}{m} \sum_{i \in [m]} \mathbb{P}\big( t_i \neq \hat{t}_i \big) \;\geq\; \frac{1}{2} e^{-(q + q^2)\ell}.$$
This lower bound on the minimax rate holds for any positive integer $m$, regardless of the number of workers or the number of queries $r$ assigned to each worker. In terms of the average number of queries necessary to achieve a target accuracy of $\varepsilon$, this implies the following necessary condition.

Lemma 2.5.
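The oracle lower bound and the budget it implies are simple to evaluate numerically; a small sketch (the helper names are ours):

```python
import math

def oracle_error_lb(q, ell):
    """(1/2)*(1-q)**ell: oracle error under spammer-hammer with perfect
    hammers, i.e. the chance that a task sees only spammers (halved for
    the fair coin flip)."""
    return 0.5 * (1.0 - q) ** ell

def necessary_queries(q, eps):
    """(1/(2q)) * log(1/(2*eps)): the per-task budget below which, by
    Lemma 2.5, no algorithm can reach error eps in the worst case."""
    return math.log(1.0 / (2.0 * eps)) / (2.0 * q)
```

For instance, with $q = 0.3$ and target $\varepsilon = 0.01$, a little over six queries per task are necessary, and indeed with only six queries even the oracle's error stays above the target.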
Assuming $q \leq 2/3$ and the non-adaptive scenario, if the average number of queries per task is less than $(1/2q)\log(1/(2\varepsilon))$, then no algorithm can achieve average probability of error less than $\varepsilon$ for any $m$ under the worst-case worker distribution.

To prove this worst-case bound, we analyzed a specific distribution, the spammer-hammer model. However, the result (up to a constant factor) appears to be quite general and can also be proved using different distributions, e.g. when all workers have the same quality. The assumption on $q$ can be relaxed as much as we want, at the price of increasing the constant in the necessary budget. Compared to the sufficient condition in Corollary 2.3, this establishes that our approach is minimax optimal up to a constant factor: with our approach one only needs to ask (and pay for) a constant factor more than what is necessary for any algorithm.

Majority voting. As a comparison, we can do a similar analysis for simple majority voting and show that its performance is significantly worse than that of our approach. The next lemma provides a bound on the minimax rate of majority voting; a proof is provided in Section 3.4.

Lemma 2.6. For any $C < 1$, there exists a positive constant $C'$ such that when $q \leq C$, the error achieved by majority voting is at least
$$\min_{\tau \in \mathcal{T}_\ell} \;\max_{t \in \{\pm 1\}^m,\, \mathcal{F} \in \mathcal{F}_q} \;\frac{1}{m} \sum_{i \in [m]} \mathbb{P}\big( t_i \neq \hat{t}_i \big) \;\geq\; e^{-C'(\ell q^2 + 1)}.$$
In terms of the number of queries necessary to achieve a target accuracy $\varepsilon$ using majority voting, this implies that we need to ask at least $(c/q^2)\log(c'/\varepsilon)$ queries per task, for some universal constants $c$ and $c'$. Hence, majority voting is significantly more costly than our approach in terms of budget. Our algorithm is also more efficient in terms of computational complexity: simple majority voting requires $O((m/q^2)\log(1/\varepsilon))$ operations to achieve target error rate $\varepsilon$ in the worst case.
From Corollary 2.2, together with $\ell = O((1/q)\log(1/\varepsilon))$ and $\ell r q^2 = \Omega(1)$, we get that our approach requires $O((m/q)\log(1/q)\log(1/\varepsilon))$ operations in the worst case.

2.4 Fundamental limit under the adaptive scenario

In terms of the scaling of the budget necessary to achieve a target accuracy, we have established that using a non-adaptive task assignment, no algorithm can do better than our approach. One might prefer a non-adaptive scheme in practice because having all the batches of tasks processed in parallel reduces latency. This is crucial in many applications, especially real-time applications such as searching, visual information processing, and document processing [BJJ+10, BLM+10, YKG10, BBMK11]. However, by switching to an adaptive task assignment, one might hope to be more efficient and obtain the desired accuracy from fewer questions. On one hand, adaptation can help improve performance; on the other hand, it can significantly complicate system design due to careful synchronization requirements. In this section, we prove an algorithm-independent upper bound on how much one can gain by using an adaptive task allocation.

When the identities of the workers are known, one might be tempted to first identify which workers are more reliable and then assign all the tasks to those workers in an explore/exploit manner. However, in typical crowdsourcing platforms such as Amazon Mechanical Turk, it is unrealistic to assume that we can identify and reuse any particular worker, since typical workers are neither persistent nor identifiable and batches are distributed through an open call. Hence, exploiting a reliable worker is not possible. We can, however, adaptively resubmit batches of tasks: we can dynamically choose which subset of tasks to assign to the next arriving worker.
In particular, we can allocate tasks to the next batch based on all the information we have on all the tasks from the responses collected thus far. For example, one might hope to reduce uncertainty more efficiently by adaptively collecting more responses on those tasks one is less certain about.

Fundamental limit. In this section, we show that, perhaps surprisingly, there is no significant gain in switching from our non-adaptive approach to an adaptive strategy when the workers are fleeting. We first prove a lower bound on the minimax error rate: the error achieved by the best inference algorithm t̂ using the best adaptive task allocation scheme τ under a worst-case worker distribution F and worst-case true answers t. Let T̃_ℓ be the set of all task assignment schemes that use at most mℓ queries in total. Then we can show the following lower bound on the minimax rate of the probability of error. A proof of this theorem is provided in Section 3.5.

Theorem 2.7. When q ≤ C for any constant C < 1, there exists a positive constant C_0 such that
\[ \min_{\tau \in \widetilde{\mathcal{T}}_\ell,\, \hat{t}} \, \max_{t \in \{\pm1\}^m,\, F \in \mathcal{F}_q} \; \frac{1}{m} \sum_{i \in [m]} \mathbb{P}\big(t_i \neq \hat{t}_i\big) \;\geq\; \frac{1}{2}\, e^{-C_0 \ell q}, \tag{5} \]
for all m, where the task assignment scheme τ ranges over all adaptive schemes that use at most mℓ queries and t̂ ranges over all estimators that are measurable functions of the responses.

We cannot avoid the factor of one half in the lower bound, since error probability one half can always be achieved without asking any queries (with ℓ = 0). In terms of the budget required to achieve a target accuracy, the above lower bound proves that no algorithm, adaptive or non-adaptive, can achieve an error rate less than ε with fewer than (C_0/q) log(2/ε) queries per task under the worst-case worker distribution.

Corollary 2.8.
Assuming q ≤ C for any constant C < 1 and the adaptive scenario, there exists a positive constant C_0 such that if the average number of queries per task is less than (C_0/q) log(1/(2ε)), then no algorithm can achieve average probability of error less than ε for any m under the worst-case worker distribution.

Compared to Corollary 2.3, we have matching sufficient and necessary conditions up to a constant factor. This proves that there is no significant gain in using an adaptive scheme, and that our approach achieves minimax optimality up to a constant factor with a non-adaptive scheme. This limitation of adaptation strongly relies on the fact that workers are fleeting in existing platforms and cannot be reused. Architecturally, therefore, our results suggest that building a reliable reputation system for workers would be essential to harnessing the potential of adaptive designs.

A counterexample for instance-optimality. The above corollary establishes minimax optimality: for the worst-case worker distribution, no algorithm can improve over our approach other than by improving the constant factor in the necessary budget. However, this does not imply instance-optimality. In fact, there exists a family of worker distributions for which all non-adaptive algorithms fail to achieve order-optimal performance whereas a trivial adaptive algorithm succeeds. Hence, for particular instances of worker distributions, there is a gap between what can be achieved using non-adaptive algorithms and adaptive ones. We prove this for the spammer-hammer model, where each new worker is a hammer (p_j = 1) with probability q and a spammer (p_j = 1/2) otherwise. We showed in Section 2.3 that no non-adaptive algorithm can achieve an error less than (1/2) e^{-C_0 ℓq} for any value of m. In particular, this error does not vanish even if we increase m.
We now introduce a simple adaptive algorithm and show that it achieves an error probability that goes to zero as m grows. The algorithm first groups all the tasks into √m disjoint sets of size √m each. Starting with the first group, the algorithm assigns all √m tasks to newly arriving workers until it sees two workers who agree on all √m tasks. It declares those responses as its estimate for this group and moves on to the next group. This process is repeated until it reaches the allowed number of queries. This estimator makes a mistake on a group if (a) two spammers agreed on all √m tasks, or (b) we run out of the allowed number of queries before finishing the last group. Formally, we can prove the following upper bound on the probability of error.

Lemma 2.9. Under the spammer-hammer model, when the allowed number of queries per task ℓ is larger than 2/q, there is an adaptive task allocation scheme and an inference algorithm that achieve average probability of error at most
\[ m \ell^2\, 2^{-\sqrt{m}} \;+\; e^{-(2/\ell)(\ell q - 2)^2 \sqrt{m}}. \]

Proof. Recall that we are only allowed ℓm queries. Since we are allocating √m queries per worker, we can ask at most ℓ√m workers. First, the probability that there is at least one pair of spammers (among all possible pairs from ℓ√m workers) who agreed on all √m responses is at most mℓ² 2^{-√m}. Next, given that no pair of spammers agreed on all their responses, the probability that we run out of all mℓ allowed queries is the probability that the number of hammers among the ℓ√m workers is strictly less than 2√m (which is the number of hammers we need in order to terminate the algorithm, conditioned on no spammers agreeing with one another). By standard concentration results, this happens with probability at most e^{-(2/ℓ)(ℓq-2)²√m}.
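The grouping scheme from the proof can be sketched as follows. This is an illustrative simulation, not the authors' code; the worker model (hammer with probability q, otherwise uniformly random answers) is the spammer-hammer assumption from the text, and the function name and representation are ours.

```python
import math
import random

def adaptive_spammer_hammer(truth, q, max_queries, rng):
    """Sketch of the adaptive scheme of Lemma 2.9: split the m tasks into
    groups of size ~ sqrt(m); for each group, keep assigning the whole group
    to fresh workers until two workers agree on every task in it."""
    m = len(truth)
    g = max(1, math.isqrt(m))                      # group size ~ sqrt(m)
    groups = [list(range(s, min(s + g, m))) for s in range(0, m, g)]
    est = [0] * m
    used = 0
    for grp in groups:
        seen = []
        while used + len(grp) <= max_queries:
            hammer = rng.random() < q              # new worker: hammer w.p. q
            resp = [t if (hammer or rng.random() < 0.5) else -t
                    for t in (truth[i] for i in grp)]
            used += len(grp)
            if resp in seen:                       # two workers agree on the group
                for i, a in zip(grp, resp):
                    est[i] = a
                break
            seen.append(resp)
    return est
```

With q = 1 (all hammers) the first two workers on each group agree and the estimate is exact, matching the lemma's intuition that agreement certifies correctness when spammer collisions are unlikely.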
This proves the existence of an adaptive algorithm which achieves vanishing error probability as m grows, for a broad range of task degrees ℓ. Comparing the above upper bound with the known lower bound for non-adaptive schemes proves that non-adaptive algorithms cannot be instance-optimal: there is a family of distributions where adaptation can significantly improve performance. This is generally true whenever there is a strictly positive probability that a worker is a hammer (p_j = 1). One might be tempted to apply the above algorithm in more general settings beyond the spammer-hammer model. However, this algorithm fails when there are no perfect workers in the crowd: applied in such a general setting, it produces useless answers, and the probability of error approaches one half as m grows, for any finite ℓ.

2.5 Connections to low-rank matrix approximation

In this section, we first explain why the top singular vector of the data matrix A reveals the true answers of the tasks, where A is the m × n matrix of responses with zeros filled in wherever no response was collected. This naturally defines a spectral algorithm for inference, which we present next. It was proven in [KOS11] that the error achieved by this spectral algorithm is upper bounded by C/(ℓq) for some constant C. But numerical experiments (cf. Figure 1) suggest that the error decays much faster, and that the gap is due to the weakness of the analysis used in [KOS11]. Inspired by this spectral approach, we introduced a novel inference algorithm that performs as well as the spectral algorithm (cf. Figure 1) and proved a much tighter upper bound on the resulting error, which scales as e^{-C_0 ℓq} for some constant C_0.
Our inference algorithm is based on power iteration, a well-known algorithm for computing the top singular vector of a matrix, and Figure 1 suggests that both algorithms are equally effective, with almost identical resulting errors. The data matrix A can be viewed as a rank-one matrix perturbed by random noise. Since E[A_ij | t_i, p_j] = (r/m) t_i (2p_j − 1), the conditional expectation of this matrix is
\[ \mathbb{E}\big[A \,\big|\, t, p\big] = \frac{r}{m}\, t\, (2p - \mathbb{1})^T, \]
where 𝟙 is the all-ones vector, t = {t_i}_{i∈[m]} is the vector of correct solutions, and p = {p_j}_{j∈[n]} is the vector of worker reliabilities. Notice that this conditional expectation matrix has rank one and reveals the correct solutions exactly. We can decompose A into a low-rank expectation plus a random perturbation:
\[ A = \frac{r}{m}\, t\, (2p - \mathbb{1})^T + Z, \]
where Z ≡ A − E[A | t, p] is a zero-mean random perturbation. When the spectral radius of the noise matrix Z is much smaller than that of the signal, we can correctly extract most of t from the leading left singular vector of A.

Under the crowdsourcing model considered in this paper, an inference algorithm using the top left singular vector of A was introduced and analyzed by Karger et al. [KOS11]. Let u be the top left singular vector of A. They proposed estimating t̂_i = sign(u_i) and proved an upper bound on the probability of error that scales as O(1/(ℓq)). The main technique behind this result is an analysis of the spectral gap of A. It is not difficult to see that the spectral radius of the conditional expectation matrix is (r/m) ‖t (2p − 𝟙)^T‖₂ = √(ℓrq), where the operator norm of a matrix is denoted by ‖X‖₂ ≡ max_a ‖Xa‖/‖a‖. Karger et al. proved that the spectral radius of the perturbation, ‖Z‖₂, is of order (ℓr)^{1/4}. Hence, when ℓrq² ≫ 1, we expect a separation between the conditional expectation and the noise.
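The spectral rule t̂_i = sign(u_i) can be sketched with a few lines of power iteration. This is a minimal illustration, not the authors' implementation; the deterministic all-ones start and the perfect-worker test matrix are assumptions made for reproducibility.

```python
import numpy as np

def spectral_estimate(A, iters=50):
    """Estimate binary task answers from the sign of the leading left
    singular vector of A, computed by power iteration (u = A v, v = A^T u)."""
    m, n = A.shape
    v = np.ones(n)                       # deterministic start for reproducibility
    for _ in range(iters):
        u = A @ v
        u = u / (np.linalg.norm(u) + 1e-12)
        v = A.T @ u
        v = v / (np.linalg.norm(v) + 1e-12)
    return np.sign(A @ v)
```

In the noiseless rank-one case A = t𝟙^T (all workers perfect), a single pass already recovers t exactly; with the perturbation Z added, recovery degrades gracefully as long as the signal dominates the noise.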
One way to compute the leading singular vector is power iteration: for two vectors u ∈ ℝ^m and v ∈ ℝ^n, starting with a randomly initialized v, power iteration iteratively updates u and v by repeating u = Av and v = A^T u. It is known that the normalized u (and v) converges linearly to the leading left (and right) singular vector. We can then use the sign of u_i to estimate t_i. Writing the update rule for each entry, we get
\[ u_i = \sum_{j \in \partial i} A_{ij} v_j, \qquad v_j = \sum_{i \in \partial j} A_{ij} u_i. \]
Notice that this power iteration update rule is almost identical to the message-passing updates in (1) and (2). The ℓ task messages {x_{i→j}}_{j∈∂i} from task i are close in value to the entry u_i of the power iteration, and the r worker messages {y_{j→i}}_{i∈∂j} from worker j are close in value to the entry v_j. Numerical simulations in Figure 1 suggest that the quality of the estimates from the two algorithms is almost identical. However, the known performance guarantee for the spectral approach is weak. We developed novel analysis techniques for our message-passing algorithm and provide an upper bound on the error that scales as e^{-Cℓq}. It might be possible to apply our algorithm, together with these analysis techniques, to other problems where the top singular vector of a data matrix is used for inference.

2.6 Connections to belief propagation

The crowdsourcing model described in this paper can naturally be described using a graphical model. Let G([m] × [n], E, A) denote the weighted bipartite graph, where [m] is the set of m task nodes, [n] is the set of n worker nodes, E is the set of edges connecting each task to the workers assigned to it, and A is the set of weights on those edges given by the responses.
Given such a graph, we want to find a set of task answers that maximizes the posterior distribution F(t̂, p): {±1}^m × [0,1]^n → ℝ₊:
\[ \max_{\hat{t}, p}\; \prod_{a \in [n]} F(p_a) \prod_{(i,a) \in E} \Big\{ p_a\, \mathbb{I}(\hat{t}_i = A_{ia}) + (1 - p_a)\, \mathbb{I}(\hat{t}_i \neq A_{ia}) \Big\}, \]
where with a slight abuse of notation we use F(·) to denote the prior probability density function of the p_a's, and we use i and j to denote task nodes and a and b to denote worker nodes. For such a problem of finding the most probable realization in a graphical model, the celebrated belief propagation (BP) algorithm gives a good approximate solution. To be precise, BP approximates the maximization of the marginal distribution of each variable, while a similar algorithm known as the min-sum algorithm approximates the most probable realization; the two algorithms are closely related, however, and in this section we present only standard BP. There is a long line of literature providing theoretical and empirical evidence supporting the use of BP [Pea88, YFW03].

Under the crowdsourcing graphical model, standard BP operates on two sets of messages: the task messages {x̃_{i→a}}_{(i,a)∈E} and the worker messages {ỹ_{a→i}}_{(i,a)∈E}. In our iterative algorithm the messages were real-valued scalars, whereas the messages in BP are probability density functions. Each task message and each worker message corresponds to an edge. The task node i corresponds to the random variable t̂_i, and the task message from task i to worker a, denoted by x̃_{i→a}, represents our belief about t̂_i; thus x̃_{i→a} is a probability distribution over {±1}. Similarly, a worker node a corresponds to a random variable p_a, and the worker message ỹ_{a→i} is a probability distribution of p_a over [0,1]. Following the standard BP framework, we iteratively update the messages according to the following rule.
We start with randomly initialized x̃_{i→a}'s and, at the k-th iteration, update
\[ \tilde{y}^{(k)}_{a\to i}(p_a) \;\propto\; F(p_a) \prod_{j \in \partial a \setminus i} \Big\{ \big(p_a + \bar{p}_a + (p_a - \bar{p}_a) A_{ja}\big)\, \tilde{x}^{(k)}_{j\to a}(+1) + \big(p_a + \bar{p}_a - (p_a - \bar{p}_a) A_{ja}\big)\, \tilde{x}^{(k)}_{j\to a}(-1) \Big\}, \]
\[ \tilde{x}^{(k+1)}_{i\to a}(\hat{t}_i) \;\propto\; \prod_{b \in \partial i \setminus a} \int \tilde{y}^{(k)}_{b\to i}(p_b) \Big\{ p_b\, \mathbb{I}(A_{ib} = \hat{t}_i) + \bar{p}_b\, \mathbb{I}(A_{ib} \neq \hat{t}_i) \Big\}\, dp_b, \]
for all (i,a) ∈ E, where p̄ = 1 − p. The above update rule determines the messages only up to scaling, where ∝ indicates that the left-hand side is proportional to the right-hand side; the algorithm produces the same estimates in the end regardless of the scaling. After a predefined number of iterations, we make a decision by computing the decision variable
\[ \tilde{x}_i(\hat{t}_i) \;\propto\; \prod_{b \in \partial i} \int \tilde{y}^{(k)}_{b\to i}(p_b) \Big\{ p_b\, \mathbb{I}(A_{ib} = \hat{t}_i) + \bar{p}_b\, \mathbb{I}(A_{ib} \neq \hat{t}_i) \Big\}\, dp_b, \]
and estimating t̂_i = sign(x̃_i(+) − x̃_i(−)).

In the special case of a Haldane prior, where a worker either always tells the truth or always gives the wrong answer,
\[ p_j = \begin{cases} 0 & \text{with probability } 1/2, \\ 1 & \text{otherwise}, \end{cases} \]
the above BP updates boil down to our iterative inference algorithm. Let x_{i→a} = log( x̃_{i→a}(+) / x̃_{i→a}(−) ) denote the log-likelihood ratio of x̃_{i→a}(·). Under the Haldane prior, p_a is also a binary random variable, and we can use y_{a→i} = log( ỹ_{a→i}(1) / ỹ_{a→i}(0) ) to denote the log-likelihood ratio of ỹ_{a→i}(·). After some simplification, the BP update boils down to
\[ y^{(k)}_{a\to i} = \sum_{j \in \partial a \setminus i} A_{ja}\, x^{(k-1)}_{j\to a}, \qquad x^{(k)}_{i\to a} = \sum_{b \in \partial i \setminus a} A_{ib}\, y^{(k)}_{b\to i}. \]
This is exactly the update rule of our iterative inference algorithm (cf. Eqs. (1) and (2)). Thus, our algorithm is belief propagation for a very specific prior. Despite this, it is surprising that it performs near-optimally (with a random regular graph for task allocation) for all priors.
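The scalar recursion that BP reduces to under the Haldane prior can be sketched directly. This is a minimal sketch, not the authors' implementation: the edge-dictionary representation is ours, and for a deterministic test the worker messages are initialized to 1 instead of the Gaussian initialization used in the paper's analysis.

```python
def iterative_inference(A, iters=10):
    """Scalar message-passing sketch of the Haldane-prior BP reduction.
    A maps edges (task i, worker a) to responses in {+1, -1}."""
    tasks, workers = {}, {}
    for (i, a) in A:                             # build adjacency lists
        tasks.setdefault(i, []).append(a)
        workers.setdefault(a, []).append(i)
    y = {e: 1.0 for e in A}                      # worker messages y_{a->i}
    for _ in range(iters):
        # x_{i->a} = sum_{b in di \ a} A_{ib} y_{b->i}
        x = {(i, a): sum(A[(i, b)] * y[(i, b)] for b in tasks[i] if b != a)
             for (i, a) in A}
        # y_{a->i} = sum_{j in da \ i} A_{ja} x_{j->a}
        y = {(i, a): sum(A[(j, a)] * x[(j, a)] for j in workers[a] if j != i)
             for (i, a) in A}
    return {i: 1 if sum(A[(i, b)] * y[(i, b)] for b in tasks[i]) >= 0 else -1
            for i in tasks}
```

The final decision sums all incoming worker messages at each task, mirroring the decision variable of the algorithm.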
This robustness property is due to the model assumed in this crowdsourcing problem and is not to be expected in general.

2.7 Discussion

In this section, we discuss several implications of our main results and possible future research directions in generalizing the model studied in this paper.

Below phase transition. We first discuss the performance guarantees in the below-threshold regime, when ℓ̂r̂q² < 1. As we will show, the bound in (4) always holds, even when ℓ̂r̂q² ≤ 1. However, numerical experiments suggest that we should stop our algorithm at the first iteration when we are below the phase transition, as discussed in Section 2.2. We provide an upper bound on the resulting error when only one iteration of our iterative inference algorithm is used (which is equivalent to the majority voting algorithm). Notice that the bound in (4) is only meaningful when it is less than one half. When ℓ̂r̂q² ≤ 1 or ℓq < 24 log 2, the right-hand side of inequality (4) is always larger than one half. Hence the upper bound always holds, even without the assumption that ℓ̂r̂q² > 1, and we keep that assumption in the statement of our main theorem only to emphasize the phase transition in how our algorithm behaves. However, we can also try to get a tighter bound than the trivial one half implied by (4) in the below-threshold regime. Specifically, we empirically observe that in this regime the error rate increases as the number of iterations k increases; therefore it makes sense to use k = 1, in which case the algorithm essentially boils down to the majority rule. We can prove the following error bound, which holds for any regime of ℓ, r, and worker distribution F. A proof of this statement is provided in Section 3.6.

Lemma 2.10.
For any value of ℓ, r, and m, and any distribution of workers F, the estimates we get after the first step of our algorithm achieve
\[ \frac{1}{m} \sum_{i=1}^m \mathbb{P}\big(t_i \neq \hat{t}_i\big) \;\leq\; e^{-\ell \mu^2 / 4}, \tag{6} \]
where μ = E_F[2p_j − 1].

Since μ always lies between q and q^{1/2}, the scaling of the above error exponent is always worse than what we have after running our algorithm for a long time (cf. Theorem 2.1). This suggests that iterating our inference algorithm helps when ℓ̂r̂q² > 1, and especially when the gap between μ and q is large. Under these conditions, our approach does significantly better than majority voting (cf. Figure 1). The gain of using our approach is maximized when there are both good workers and bad workers. This is consistent with our intuition that when there is a variety of workers, our algorithm can identify the good ones and get better estimates.

Golden standard units. Next, consider the variation where we ask questions whose answers are already known (also known as 'gold standard units'). We can use these to assess the quality of the workers. There are two ways to use this information. First, we can embed 'seed gold units' along with the standard tasks and use them in turn to perform more informed inference. However, we can show that there is no gain in using such 'seed gold units'. The optimal lower bound of (1/q) log(1/ε) essentially utilizes the existence of an oracle that can identify the reliability of every worker exactly, i.e., the oracle has far more information than what can be gained from such embedded gold questions. Therefore, 'seed gold units' clearly do not help the oracle estimator, and hence the order-optimality of our approach still holds even if we include all strategies that can utilize these 'seed gold units'.
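As a sanity check on Lemma 2.10, one can simulate majority voting and compare against the bound e^{-ℓμ²/4}. This is an illustrative sketch under an assumed homogeneous crowd (every worker correct with probability p = 0.7, so μ = 0.4), which is a special case of the general F in the lemma.

```python
import math
import random

def majority_vote_error(ell, p, trials, rng):
    """Empirical error of the one-step (majority-vote) estimator when every
    worker answers correctly with probability p, so mu = 2p - 1.
    Ties are broken uniformly at random, as in the decision rule."""
    errors = 0
    for _ in range(trials):
        correct = sum(rng.random() < p for _ in range(ell))
        if 2 * correct < ell or (2 * correct == ell and rng.random() < 0.5):
            errors += 1
    return errors / trials

rng = random.Random(0)
err = majority_vote_error(ell=15, p=0.7, trials=20000, rng=rng)
bound = math.exp(-15 * (2 * 0.7 - 1) ** 2 / 4)   # e^{-ell mu^2 / 4}
```

With these parameters the empirical error sits comfortably below the bound, consistent with the lemma (though the bound is loose for a homogeneous crowd).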
In practice, however, it is common to use 'seed gold units', and doing so can improve the constant factor in the required budget, but not the scaling.

Alternatively, we can use 'pilot gold units' as qualifying or pilot questions that workers must complete in order to participate. Typically a taskmaster does not have to pay for these qualifying questions, and this provides an effective way to increase the quality of the participating workers. Our approach can benefit from such 'pilot gold units', which have the effect of increasing the effective collective quality q of the crowd. Further, if we can 'measure' how the distribution of workers changes when using pilot questions, then our main result fully describes how much we can gain by such pilot questions. In any case, pilot questions only change the distribution of participating workers, and the order-optimality of our approach still holds even if we compare against all schemes that use the same pilot questions.

How to optimize over multiple choices of crowds. We next consider the scenario where we can choose which crowdsourcing platform to use from a set of platforms with different crowds. Each crowd might have a different worker distribution and a different price. Specifically, suppose there are K crowds of workers: the k-th crowd has collective quality q_k and requires payment of c_k per task. Our optimality result then suggests that the per-task cost scales as (c_k/q_k) log(1/ε) if we only use workers of class k. More generally, if we use a mix of these workers, say a fraction α_k of workers from class k with Σ_k α_k = 1, then the effective parameter is q = Σ_k α_k q_k, and the optimal per-task cost scales as ((Σ_k α_k c_k)/(Σ_k α_k q_k)) log(1/ε). This immediately suggests that the optimal choice of fractions α_k must be such that α_k > 0 only if c_k/q_k = min_i c_i/q_i.
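The conclusion that weight should go only to classes minimizing c_k/q_k can be captured in a few lines. This is a sketch with hypothetical cost/quality arrays; the function name is ours.

```python
def optimal_crowd(costs, qualities):
    """Return the crowd indices minimizing c_k / q_k: with a mix a_k, the
    per-task budget scales as (sum_k a_k c_k) / (sum_k a_k q_k) * log(1/eps),
    which is minimized by putting all weight on the best cost-per-quality classes."""
    ratios = [c / q for c, q in zip(costs, qualities)]
    best = min(ratios)
    return [k for k, rho in enumerate(ratios) if rho == best]
```

For example, with (c, q) pairs (1, 0.2), (3, 0.5), (2, 0.5), the ratios are 5, 6, 4, so only the third crowd should be used.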
That is, the optimal choice is to select workers only from the classes that have the maximal quality-per-cost ratio q_k/c_k over k ∈ [K]. One implication of this observation is that it suggests a pricing scheme for crowdsourcing platforms: if you are managing a crowdsourcing platform with collective quality q and cost c, and there is another crowdsourcing platform with q′ and c′, you want to choose the cost such that your quality-per-cost ratio is at least as good as the other crowd's: q/c ≥ q′/c′.

General crowdsourcing models. Finally, we consider possible generalizations of our model. The model assumed in this paper does not capture several factors, such as tasks with different levels of difficulty or workers who always answer positive or negative. In general, the response of a worker j to a binary question i may depend on several factors: (i) the correct answer to the task; (ii) the difficulty of the task; (iii) the expertise or reliability of the worker; (iv) the bias of the worker towards positive or negative answers. Let t_i ∈ {+1, −1} represent the correct answer and r_i ∈ [0, ∞) the level of difficulty of task i; also, let α_j ∈ [−∞, ∞] represent the reliability and β_j ∈ (−∞, ∞) the bias of worker j. Formally, worker j's response to binary task i can be modeled as
\[ A_{ij} = \mathrm{sign}(Z_{i,j}), \]
where Z_{i,j} is a Gaussian random variable distributed as Z_{i,j} ∼ N(α_j t_i + β_j, r_i), and sign(Z) = 1 almost surely for Z ∼ N(∞, 1). A task with r_i = 0 is easy, and a task with large r_i is difficult. A worker with large positive α_j is more likely to give the right answer, and one with large negative α_j is more likely to give the wrong answer; when α_j = 0, the worker gives independent answers regardless of the correct answer. A worker with large β_j is biased towards positive responses, and a worker with β_j = 0 is unbiased.
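A sampler for this generalized model is straightforward; a minimal sketch, treating r_i as the variance of Z_{i,j} as in the display above (function name and tie-breaking at Z = 0 are our assumptions):

```python
import math
import random

def sample_response(t_i, r_i, alpha_j, beta_j, rng):
    """One response from the general model: A_ij = sign(Z) with
    Z ~ N(alpha_j * t_i + beta_j, r_i). Here r_i = 0 is a noiseless
    (easy) task and large r_i a difficult one."""
    z = alpha_j * t_i + beta_j
    if r_i > 0:
        z += rng.gauss(0.0, math.sqrt(r_i))
    return 1 if z >= 0 else -1
```

For r_i = 0 the response is deterministic: a reliable worker (large α_j) reproduces t_i, while a strongly biased worker (large β_j, α_j = 0) answers +1 regardless of the truth.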
A similar model with multi-dimensional latent variables was studied in [WBBP10]. Most of the models studied in the crowdsourcing literature can be reduced to special cases of this model. For example, the early patient-classification model introduced by Dawid and Skene [DS79] is equivalent to the above Gaussian model with r_i = 1. Each worker is represented by two latent quality parameters p_j^+ and p_j^−, such that
\[ A_{ij} = \begin{cases} t_i & \text{with probability } p_j^{t_i}, \\ -t_i & \text{otherwise}. \end{cases} \]
This model captures worker bias. More recently, Whitehill et al. [WRW+09] introduced another model where P(A_ij = t_i | a_i, b_j) = 1/(1 + e^{−a_i b_j}), with worker reliability a_i and task difficulty b_j. This is again a special case of the above Gaussian model if we set β_j = 0. The model we study in this paper makes the underlying assumption that all tasks share an equal level of difficulty and the workers are unbiased; it is equivalent to the above Gaussian model with β_j = 0 and r_i = 1. In this case, there is a one-to-one relation between the worker reliability p_j and α_j: p_j = Q(α_j), where Q(·) is the tail probability of the standard Gaussian distribution.

3 Proof of main results

In this section, we provide proofs of the main results.

3.1 Proof of the main result in Theorem 2.1

By symmetry, we can assume that all t_i's are +1. Let t̂_i^{(k)} denote the resulting estimate of task i after k iterations of our iterative inference algorithm defined in Section 2.1. If we draw a random task I uniformly from [m], we want to compute the average error probability, i.e., the probability that we make an error on this randomly chosen task:
\[ \frac{1}{m} \sum_{i \in [m]} \mathbb{P}\big(t_i \neq \hat{t}^{(k)}_i\big) = \mathbb{P}\big(t_I \neq \hat{t}^{(k)}_I\big). \tag{7} \]
We prove an upper bound on the probability of error in two steps. First, we prove that the local neighborhood of a randomly chosen task node I is a tree with high probability.
Then, assuming the graph is locally tree-like, we provide an upper bound on the error using a technique known as density evolution.

We construct a random bipartite graph G([m] ∪ [n], E) according to the configuration model. We start with [m] × [ℓ] half-edges for the task nodes and [n] × [r] half-edges for the worker nodes, and pair all mℓ task half-edges with the same number of worker half-edges according to a random permutation of [mℓ]. Let G_{i,k} denote the subgraph of G([m] ∪ [n], E) that includes all nodes whose distance from the 'root' i is at most k. At the first iteration of our inference algorithm, to estimate task i we only use the responses provided by the workers assigned to task i; hence we are performing inference on the local neighborhood G_{i,1}. Similarly, when we run k iterations of our (message-passing) inference algorithm to estimate task i, we only run inference on the local subgraph G_{i,2k−1}. Since we update both task and worker messages, the subgraph grows by distance two at each iteration. When this local subgraph is a tree, we can apply density evolution to analyze the probability of error. When it is not a tree, we can make the pessimistic assumption that an error has been made, to get an upper bound on the actual error probability:
\[ \mathbb{P}\big(t_I \neq \hat{t}^{(k)}_I\big) \;\leq\; \mathbb{P}\big(G_{I,2k-1} \text{ is not a tree}\big) + \mathbb{P}\big(G_{I,2k-1} \text{ is a tree and } t_I \neq \hat{t}^{(k)}_I\big). \tag{8} \]
The next lemma bounds the first term and shows that the probability that a local subgraph is not a tree vanishes as m grows. A proof of this lemma is provided in Section 3.2.

Lemma 3.1. For a random (ℓ, r)-regular bipartite graph generated according to the configuration model,
\[ \mathbb{P}\big(G_{I,2k-1} \text{ is not a tree}\big) \;\leq\; \big((\ell-1)(r-1)\big)^{2k-2}\, \frac{3\ell r}{m}. \]
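The half-edge pairing described above can be sketched directly. This is a minimal illustration of the configuration model (function name ours); as usual for this construction, multi-edges are possible for small m.

```python
import random
from collections import Counter

def configuration_model(m, n, ell, r, rng):
    """Random (ell, r)-regular bipartite assignment via the configuration
    model: pair the m*ell task half-edges with the n*r worker half-edges
    through a uniformly random permutation (requires m*ell == n*r)."""
    assert m * ell == n * r
    task_stubs = [i for i in range(m) for _ in range(ell)]
    worker_stubs = [a for a in range(n) for _ in range(r)]
    rng.shuffle(worker_stubs)          # uniformly random matching of stubs
    return list(zip(task_stubs, worker_stubs))
```

Every task ends up with degree exactly ℓ and every worker with degree exactly r, whatever the random permutation.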
(9)

Then, to bound the second term of (8), we provide a sharp upper bound on the error probability conditioned on G_{I,2k−1} being a tree. Let x_i^{(k)} denote the decision variable for task i after k iterations of the iterative algorithm, such that t̂_i^{(k)} = sign(x_i^{(k)}). We make an error whenever this decision variable is negative; when it is exactly zero, we make a random decision and err with probability one half. Then,
\[ \mathbb{P}\big(t_I \neq \hat{t}^{(k)}_I \,\big|\, G_{I,k} \text{ is a tree}\big) \;\leq\; \mathbb{P}\big(x^{(k)}_I \leq 0 \,\big|\, G_{I,k} \text{ is a tree}\big). \tag{10} \]
To analyze the distribution of the decision variable on a locally tree-like graph, we use a standard probabilistic analysis technique known as 'density evolution' in coding theory, or 'recursive distributional equations' in probabilistic combinatorics [RU08, MM09]. Precisely, we use the equality
\[ \mathbb{P}\big(x^{(k)}_I \leq 0 \,\big|\, G_{I,k} \text{ is a tree}\big) = \mathbb{P}\big(\hat{x}^{(k)} \leq 0\big), \tag{11} \]
where x̂^{(k)} is defined through the density evolution equations (13), (14), and (15) below. We will prove that when ℓ̂r̂q² > 1,
\[ \mathbb{P}\big(\hat{x}^{(k)} \leq 0\big) \;\leq\; e^{-\ell q/(2\sigma_k^2)}. \tag{12} \]
Together with equations (11), (10), (9), (8), and (7), this finishes the proof of Theorem 2.1.

Density evolution. At iteration k the algorithm operates on a set of messages {x_{i→j}^{(k)}}_{(i,j)∈E} and {y_{j→i}^{(k)}}_{(i,j)∈E}. If we choose an edge (i,j) uniformly at random, the values of the x and y messages on that edge define random variables whose randomness comes from the random choice of the edge, any randomness introduced by the inference algorithm, the graph, and the realizations of the p_j's and A_ij's. Let x^{(k)} denote the random variable corresponding to the message x_{i→j}^{(k)}, and let y_p^{(k)} denote the random variable corresponding to y_{j→i}^{(k)} conditioned on the latent worker quality being p, for a randomly chosen edge (i,j).
As proved in Lemma 3.1, the (ℓ, r)-regular random graph locally converges in distribution to an (ℓ, r)-regular tree with high probability. On a tree, there is a recursive way of defining the distributions of the messages x^{(k)} and y_p^{(k)}. At initialization, we initialize the worker messages with Gaussian random variables with mean one and variance one. The corresponding random variable y_p^{(0)} ∼ N(1, 1), which at this initial step is independent of the worker quality p, fully describes the distribution of y_{j→i}^{(0)} for all (i,j). At the first iteration, the task messages are updated according to x_{i→j}^{(1)} = Σ_{j′∈∂i\j} A_{ij′} y_{j′→i}^{(0)}. If we know the distributions of the A_{ij′}'s and the y_{j′→i}^{(0)}'s, we can compute the distribution of x_{i→j}^{(1)}. Since we are assuming a tree, all the x_{i→j}^{(1)} are independent; further, because of the symmetry in the way we construct our random graph, they are identically distributed. Precisely, they are distributed according to x^{(1)} defined in (13). This recursively defines x^{(k)} and y^{(k)} through the density evolution equations in (13) and (14) [MM09].

Let us first introduce a few definitions. Here and after, we drop the superscript k denoting the iteration number whenever it is clear from the context. Let the x_b's and y_{p,a}'s be independent random variables distributed according to x and y_p respectively. Also, the z_{p,a}'s and z_{p,b}'s are independent random variables distributed according to z_p, where
\[ z_p = \begin{cases} +1 & \text{with probability } p, \\ -1 & \text{with probability } 1-p. \end{cases} \]
This represents the answer given by a worker conditioned on the worker having quality parameter p. Let p ∼ F be a random variable distributed according to the distribution F of worker quality over [0,1]; the p_a's are independent random variables distributed according to p.
Further, the z_{p,b}'s and x_b's are independent, and the z_{p_a,a}'s and y_{p_a,a}'s are conditionally independent given p_a. We initialize y_p with a Gaussian distribution, whence it is independent of the latent variable p: y_p^{(0)} ∼ N(1,1). Let =_d denote equality in distribution. Then, for k ∈ {1, 2, …}, the task messages are distributed as the sum of ℓ − 1 incoming messages that are independent and identically distributed according to y_p^{(k−1)} and weighted by i.i.d. responses:
\[ x^{(k)} \overset{d}{=} \sum_{a \in [\ell-1]} z_{p_a,a}\, y^{(k-1)}_{p_a,a}. \tag{13} \]
Similarly, the worker messages (conditioned on the latent worker quality p) are distributed as the sum of r − 1 incoming messages that are independent and identically distributed according to x^{(k)} and weighted by i.i.d. responses:
\[ y^{(k)}_{p} \overset{d}{=} \sum_{b \in [r-1]} z_{p,b}\, x^{(k)}_b. \tag{14} \]
For the decision variable x_I^{(k)} on a randomly chosen task I, we have
\[ \hat{x}^{(k)} \overset{d}{=} \sum_{i \in [\ell]} z_{p_i,i}\, y^{(k-1)}_{p_i,i}. \tag{15} \]
Numerically or analytically computing the densities in (13) and (14) exactly is not computationally feasible when the messages take continuous values, as is the case for our algorithm. Typically, heuristics are used to approximate the densities, such as quantizing the messages, approximating the density with simple functions, or using a Monte Carlo method to sample from the density. A novel contribution of our analysis is that we prove by recursion that the messages are sub-Gaussian, and we provide an upper bound on the parameters in closed form. This allows us to prove a sharp error bound that decays exponentially.

Mean and variance computation. To give intuition on how the messages behave, we describe the evolution of the mean and variance of the random variables in (13) and (14). Let p be a random variable distributed according to the measure F.
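Equations (13)-(15) lend themselves to the Monte Carlo (sampled) density evolution mentioned above. The following is a rough sketch, not the authors' implementation: populations of message samples approximate the densities, and the spammer-hammer distribution in the test is an assumed example of F.

```python
import random

def sample_decision(ell, r, k, sample_p, rng, pop=500):
    """Monte Carlo density evolution sketch: evolve sample populations for
    x^{(k)} and y_p^{(k)} of Eqs. (13)-(14), then draw one realization of
    the decision variable of Eq. (15). sample_p draws a quality p ~ F."""
    def z(p):                                    # worker answer given quality p
        return 1 if rng.random() < p else -1
    # population of (p, y_p^{(0)}) pairs, with y^{(0)} ~ N(1, 1)
    y = [(sample_p(rng), rng.gauss(1.0, 1.0)) for _ in range(pop)]
    for _ in range(k - 1):
        # Eq. (13): x = sum of ell-1 weighted y-samples
        x = [sum(z(p) * v for p, v in rng.sample(y, ell - 1)) for _ in range(pop)]
        # Eq. (14): y_p = sum of r-1 weighted x-samples, conditioned on p
        new_y = []
        for _ in range(pop):
            p = sample_p(rng)
            new_y.append((p, sum(z(p) * rng.choice(x) for _ in range(r - 1))))
        y = new_y
    # Eq. (15): decision variable uses ell messages from y^{(k-1)}
    return sum(z(p) * v for p, v in rng.sample(y, ell))
```

Since E[x̂^{(k)}] = (ℓ/ℓ̂) μℓ̂(ℓ̂r̂q)^{k−1} > 0 whenever μ > 0, the empirical mean of many decision samples should be clearly positive.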
Define $m^{(k)} \equiv \mathbb{E}[x^{(k)}]$, $\hat m_p^{(k)} \equiv \mathbb{E}[y_p^{(k)}\,|\,p]$, $v^{(k)} \equiv \mathrm{Var}(x^{(k)})$, and $\hat v_p^{(k)} \equiv \mathrm{Var}(y_p^{(k)}\,|\,p)$. Also let $\hat\ell = \ell - 1$ and $\hat r = r - 1$ to simplify notation. Then, from (13) and (14), we get

$$m^{(k)} = \hat\ell\,\mathbb{E}_p\big[(2p-1)\,\hat m_p^{(k-1)}\big]\,, \qquad \hat m_p^{(k)} = \hat r\,(2p-1)\,m^{(k)}\,,$$

$$v^{(k)} = \hat\ell\Big(\mathbb{E}_p\big[\hat v_p^{(k-1)} + \big(\hat m_p^{(k-1)}\big)^2\big] - \mathbb{E}_p\big[(2p-1)\,\hat m_p^{(k-1)}\big]^2\Big)\,, \qquad \hat v_p^{(k)} = \hat r\Big\{v^{(k)} + \big(m^{(k)}\big)^2 - \big((2p-1)\,m^{(k)}\big)^2\Big\}\,.$$

Recall that $\mu = \mathbb{E}[2p-1]$ and $q = \mathbb{E}[(2p-1)^2]$. Substituting $\hat m_p$ and $\hat v_p$, we get the following evolution of the first and second moments of the random variable $x^{(k)}$:

$$m^{(k+1)} = \hat\ell\hat r q\, m^{(k)}\,, \qquad v^{(k+1)} = \hat\ell\hat r\, v^{(k)} + \hat\ell\hat r\,\big(m^{(k)}\big)^2(1-q)(1+\hat r q)\,.$$

Since $\hat m_p^{(0)} = 1$ and $\hat v_p^{(0)} = 1$ per our assumption, we have $m^{(1)} = \mu\hat\ell$ and $v^{(1)} = \hat\ell(4-\mu^2)$. This implies that $m^{(k)} = \mu\hat\ell(\hat\ell\hat r q)^{k-1}$ and $v^{(k)} = a\,v^{(k-1)} + b\,c^{k-2}$, with $a = \hat\ell\hat r$, $b = \mu^2\hat\ell^3\hat r(1-q)(1+\hat r q)$, and $c = (\hat\ell\hat r q)^2$. After some algebra, it follows that $v^{(k)} = v^{(1)}a^{k-1} + b\,c^{k-2}\sum_{\ell'=0}^{k-2}(a/c)^{\ell'}$. For $\hat\ell\hat r q^2 > 1$, we have $a/c < 1$ and

$$v^{(k)} = \hat\ell(4-\mu^2)(\hat\ell\hat r)^{k-1} + (1-q)(1+\hat r q)\,\mu^2\hat\ell^2(\hat\ell\hat r q)^{2k-2}\,\frac{1 - 1/(\hat\ell\hat r q^2)^{k-1}}{\hat\ell\hat r q^2 - 1}\,.$$

The first and second moments of the decision variable $\hat x^{(k)}$ in (15) can be computed by a similar analysis: $\mathbb{E}[\hat x^{(k)}] = (\ell/\hat\ell)\,m^{(k)}$ and $\mathrm{Var}(\hat x^{(k)}) = (\ell/\hat\ell)\,v^{(k)}$. In particular, we have

$$\frac{\mathrm{Var}(\hat x^{(k)})}{\mathbb{E}[\hat x^{(k)}]^2} = \frac{\hat\ell(4-\mu^2)}{\ell\,\hat\ell\,\mu^2}\,\frac{1}{(\hat\ell\hat r q^2)^{k-1}} + \frac{\hat\ell(1-q)(1+\hat r q)}{\ell\,(\hat\ell\hat r q^2 - 1)}\Big(1 - \frac{1}{(\hat\ell\hat r q^2)^{k-1}}\Big)\,.$$

Applying Chebyshev's inequality, it immediately follows that $\mathbb{P}(\hat x^{(k)} < 0)$ is bounded by the right-hand side of the above equality. This bound is weak compared to the bound in Theorem 2.1.
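The moment recursion and its closed-form solution can be cross-checked numerically. The sketch below is our own illustration, using the initialization $m^{(1)}=\mu\hat\ell$, $v^{(1)}=\hat\ell(4-\mu^2)$ stated in the text:

```python
def moment_recursion(l, r, mu, q, k_max):
    """Iterate m^{(k+1)} = lhat*rhat*q*m^{(k)} and
    v^{(k+1)} = lhat*rhat*v^{(k)} + lhat*rhat*(m^{(k)})^2*(1-q)*(1+rhat*q),
    from m^{(1)} = mu*lhat and v^{(1)} = lhat*(4 - mu**2)."""
    lhat, rhat = l - 1, r - 1
    m, v = mu * lhat, lhat * (4.0 - mu ** 2)
    out = [(m, v)]
    for _ in range(k_max - 1):
        m, v = (lhat * rhat * q * m,
                lhat * rhat * v + lhat * rhat * m ** 2 * (1 - q) * (1 + rhat * q))
        out.append((m, v))
    return out

def moment_closed_form(l, r, mu, q, k):
    """Closed-form m^{(k)} and v^{(k)}, valid in the regime lhat*rhat*q^2 > 1."""
    lhat, rhat = l - 1, r - 1
    m = mu * lhat * (lhat * rhat * q) ** (k - 1)
    v = (lhat * (4.0 - mu ** 2) * (lhat * rhat) ** (k - 1)
         + (1 - q) * (1 + rhat * q) * mu ** 2 * lhat ** 2
           * (lhat * rhat * q) ** (2 * k - 2)
           * (1 - 1.0 / (lhat * rhat * q ** 2) ** (k - 1))
           / (lhat * rhat * q ** 2 - 1))
    return m, v
```

For any parameters with $\hat\ell\hat r q^2 > 1$ the two agree to machine precision, which is a useful sanity check on the algebra.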
In the following, we prove a stronger result using the sub-Gaussianity of $x^{(k)}$. But first, let us examine what this weaker bound gives in the different regimes of $\ell$, $r$, and $q$; it indicates that the messages exhibit fundamentally different behavior in the regimes separated by a phase transition at $\hat\ell\hat r q^2 = 1$. In the 'good' regime where $\hat\ell\hat r q^2 > 1$, the bound converges to a finite limit as the number of iterations $k$ grows. Namely,

$$\lim_{k\to\infty} \mathbb{P}\big(\hat x^{(k)} < 0\big) \;\le\; \frac{\hat\ell\,(1-q)(1+\hat r q)}{\ell\,(\hat\ell\hat r q^2 - 1)}\,.$$

Notice that this upper bound converges to $(1-q)/(\ell q)$ as $\hat\ell\hat r q^2$ grows. This scales in the same way as the known bounds for using the left singular vector directly for inference (cf. [KOS11]). In the case when $\hat\ell\hat r q^2 < 1$, the same analysis gives

$$\frac{\mathrm{Var}(\hat x^{(k)})}{\mathbb{E}[\hat x^{(k)}]^2} = e^{\Theta(k)}\,.$$

Finally, when $\hat\ell\hat r q^2 = 1$, we get $v^{(k)} = (\hat\ell\hat r)^k + \hat\ell\hat r\,(1-q)(1+\hat r q)(\hat\ell\hat r q)^{2k-2}\,k$, which implies

$$\frac{\mathrm{Var}(\hat x^{(k)})}{\mathbb{E}[\hat x^{(k)}]^2} = \Theta(k)\,.$$

Analyzing the density. Our strategy for providing a tight upper bound on $\mathbb{P}(\hat x^{(k)} \le 0)$ is to show that $\hat x^{(k)}$ is sub-Gaussian with appropriate parameters and then apply the Chernoff bound. A random variable $z$ with mean $m$ is said to be sub-Gaussian with parameter $\tilde\sigma$ if, for all $\lambda \in \mathbb{R}$,

$$\mathbb{E}\big[e^{\lambda z}\big] \;\le\; e^{m\lambda + (1/2)\tilde\sigma^2\lambda^2}\,.$$

Define

$$\tilde\sigma_k^2 \;\equiv\; 2\hat\ell(\hat\ell\hat r)^{k-1} + \mu^2\hat\ell^3\hat r\,(3q\hat r + 1)(q\hat\ell\hat r)^{2k-4}\,\frac{1 - (1/q^2\hat\ell\hat r)^{k-1}}{1 - (1/q^2\hat\ell\hat r)}\,,$$

and $m_k \equiv \mu\hat\ell(q\hat\ell\hat r)^{k-1}$ for $k \in \mathbb{Z}$. We will first show that $x^{(k)}$ is sub-Gaussian with mean $m_k$ and parameter $\tilde\sigma_k^2$ in the regime of $\lambda$ we are interested in. Precisely, we will show that for $|\lambda| \le 1/(2m_{k-1}\hat r)$,

$$\mathbb{E}\big[e^{\lambda x^{(k)}}\big] \;\le\; e^{m_k\lambda + (1/2)\tilde\sigma_k^2\lambda^2}\,. \qquad (16)$$

By definition, due to distributional independence, we have $\mathbb{E}\big[e^{\lambda\hat x^{(k)}}\big] = \mathbb{E}\big[e^{\lambda x^{(k)}}\big]^{\ell/\hat\ell}$.
Therefore, it follows from (16) that $\hat x^{(k)}$ satisfies $\mathbb{E}[e^{\lambda\hat x^{(k)}}] \le e^{(\ell/\hat\ell)m_k\lambda + (\ell/2\hat\ell)\tilde\sigma_k^2\lambda^2}$. Applying the Chernoff bound with $\lambda = -m_k/\tilde\sigma_k^2$, we get

$$\mathbb{P}\big(\hat x^{(k)} \le 0\big) \;\le\; \mathbb{E}\big[e^{\lambda\hat x^{(k)}}\big] \;\le\; e^{-\ell m_k^2/(2\hat\ell\tilde\sigma_k^2)}\,. \qquad (17)$$

Since $m_k m_{k-1}/\tilde\sigma_k^2 \le \mu^2\hat\ell^2(q\hat\ell\hat r)^{2k-3}/\big(3\mu^2 q\hat\ell^3\hat r^2(q\hat\ell\hat r)^{2k-4}\big) = 1/(3\hat r)$, it is easy to check that $|\lambda| \le 1/(2m_{k-1}\hat r)$. This implies the desired bound in (12).

Now we are left to prove that $x^{(k)}$ is sub-Gaussian with the appropriate parameters. We can write down a recursive formula for the evolution of the moment generating functions of $x$ and $y_p$:

$$\mathbb{E}\big[e^{\lambda x^{(k)}}\big] = \mathbb{E}_p\Big[\,p\,\mathbb{E}\big[e^{\lambda y_p^{(k-1)}}\,\big|\,p\big] + \bar p\,\mathbb{E}\big[e^{-\lambda y_p^{(k-1)}}\,\big|\,p\big]\Big]^{\hat\ell}\,, \qquad (18)$$

$$\mathbb{E}\big[e^{\lambda y_p^{(k)}}\big] = \Big(p\,\mathbb{E}\big[e^{\lambda x^{(k)}}\big] + \bar p\,\mathbb{E}\big[e^{-\lambda x^{(k)}}\big]\Big)^{\hat r}\,, \qquad (19)$$

where $\bar p = 1 - p$. We prove that these are sub-Gaussian by induction. First, for $k = 1$, we show that $x^{(1)}$ is sub-Gaussian with mean $m_1 = \mu\hat\ell$ and parameter $\tilde\sigma_1^2 = 2\hat\ell$, where $\mu \equiv \mathbb{E}[2p-1]$. Since $y_p$ is initialized as Gaussian with unit mean and unit variance, we have $\mathbb{E}[e^{\lambda y_p^{(0)}}] = e^{\lambda + (1/2)\lambda^2}$ regardless of $p$. Substituting this into (18), we get, for any $\lambda$,

$$\mathbb{E}\big[e^{\lambda x^{(1)}}\big] = \big(\mathbb{E}[p]\,e^\lambda + (1-\mathbb{E}[p])\,e^{-\lambda}\big)^{\hat\ell}\, e^{(1/2)\hat\ell\lambda^2} \;\le\; e^{\hat\ell\mu\lambda + \hat\ell\lambda^2}\,, \qquad (20)$$

where the inequality follows from the fact that $a e^z + (1-a)e^{-z} \le e^{(2a-1)z + (1/2)z^2}$ for any $z \in \mathbb{R}$ and $a \in [0,1]$ (cf. [AS08, Lemma A.1.5]).

Next, assuming $\mathbb{E}[e^{\lambda x^{(k)}}] \le e^{m_k\lambda + (1/2)\tilde\sigma_k^2\lambda^2}$ for $|\lambda| \le 1/(2m_{k-1}\hat r)$, we show that $\mathbb{E}[e^{\lambda x^{(k+1)}}] \le e^{m_{k+1}\lambda + (1/2)\tilde\sigma_{k+1}^2\lambda^2}$ for $|\lambda| \le 1/(2m_k\hat r)$, and compute the appropriate $m_{k+1}$ and $\tilde\sigma_{k+1}^2$. Substituting the bound $\mathbb{E}[e^{\lambda x^{(k)}}] \le e^{m_k\lambda + (1/2)\tilde\sigma_k^2\lambda^2}$ into (19), we get

$$\mathbb{E}\big[e^{\lambda y_p^{(k)}}\big] \;\le\; \big(p\,e^{m_k\lambda} + \bar p\,e^{-m_k\lambda}\big)^{\hat r}\, e^{(1/2)\hat r\tilde\sigma_k^2\lambda^2}\,.$$
Further applying this bound in (18), we get

$$\mathbb{E}\big[e^{\lambda x^{(k+1)}}\big] \;\le\; \mathbb{E}_p\Big[\,p\big(p\,e^{m_k\lambda} + \bar p\,e^{-m_k\lambda}\big)^{\hat r} + \bar p\big(p\,e^{-m_k\lambda} + \bar p\,e^{m_k\lambda}\big)^{\hat r}\Big]^{\hat\ell}\, e^{(1/2)\hat\ell\hat r\tilde\sigma_k^2\lambda^2}\,. \qquad (21)$$

To bound the first factor on the right-hand side, we use the following key lemma, whose proof is provided in Section 3.3.

Lemma 3.2. For any $|z| \le 1/(2\hat r)$ and $p \in [0,1]$ such that $q = \mathbb{E}[(2p-1)^2]$, we have

$$\mathbb{E}_p\Big[\,p\big(p\,e^z + \bar p\,e^{-z}\big)^{\hat r} + \bar p\big(\bar p\,e^z + p\,e^{-z}\big)^{\hat r}\Big] \;\le\; e^{q\hat r z + (1/2)(3q\hat r^2 + \hat r)z^2}\,.$$

Applying this inequality to (21) gives

$$\mathbb{E}\big[e^{\lambda x^{(k+1)}}\big] \;\le\; e^{q\hat\ell\hat r m_k\lambda + (1/2)\big((3q\hat\ell\hat r^2 + \hat\ell\hat r)m_k^2 + \hat\ell\hat r\tilde\sigma_k^2\big)\lambda^2}\,,$$

for $|\lambda| \le 1/(2m_k\hat r)$. In the regime where $q\hat\ell\hat r \ge 1$, as per our assumption, $m_k$ is non-decreasing in $k$. At iteration $k$, the above recursion holds for $|\lambda| \le \min\{1/(2m_1\hat r), \ldots, 1/(2m_{k-1}\hat r)\} = 1/(2m_{k-1}\hat r)$. Hence, we get the following recursion for $m_k$ and $\tilde\sigma_k$ such that (16) holds for $|\lambda| \le 1/(2m_{k-1}\hat r)$:

$$m_{k+1} = q\hat\ell\hat r\, m_k\,, \qquad \tilde\sigma_{k+1}^2 = \big(3q\hat\ell\hat r^2 + \hat\ell\hat r\big)m_k^2 + \hat\ell\hat r\,\tilde\sigma_k^2\,.$$

With the initialization $m_1 = \mu\hat\ell$ and $\tilde\sigma_1^2 = 2\hat\ell$, we have $m_k = \mu\hat\ell(q\hat\ell\hat r)^{k-1}$ for $k \in \{1,2,\ldots\}$ and $\tilde\sigma_k^2 = a\,\tilde\sigma_{k-1}^2 + b\,c^{k-2}$ for $k \in \{2,3,\ldots\}$, with $a = \hat\ell\hat r$, $b = \mu^2\hat\ell^2(3q\hat\ell\hat r^2 + \hat\ell\hat r)$, and $c = (q\hat\ell\hat r)^2$. After some algebra, it follows that $\tilde\sigma_k^2 = \tilde\sigma_1^2 a^{k-1} + b\,c^{k-2}\sum_{\ell'=0}^{k-2}(a/c)^{\ell'}$. For $\hat\ell\hat r q^2 \ne 1$, we have $a/c \ne 1$, whence $\tilde\sigma_k^2 = \tilde\sigma_1^2 a^{k-1} + b\,c^{k-2}\big(1-(a/c)^{k-1}\big)/(1-a/c)$. This finishes the proof of (16).

3.2 Proof of Lemma 3.1

Consider the following discrete-time random process that generates the random graph $G_{I,2k-1}$ starting from the root $I$.
In the first step, we connect $\ell$ worker nodes to node $I$ according to the configuration model, where $\ell$ half-edges are matched to a randomly chosen subset of size $\ell$ of the $nr$ worker half-edges. Let $\alpha_1$ denote the probability that the resulting graph is not a tree, that is, that some pair of edges is connected to the same worker node. Since there are $\binom{\ell}{2}$ pairs and each pair of half-edges is connected to the same worker node with probability $(r-1)/(nr-1)$,

$$\alpha_1 \;\le\; \binom{\ell}{2}\,\frac{r-1}{nr-1}\,.$$

Similarly, define

$$\alpha_t \equiv \mathbb{P}\big(G_{I,2t-1}\text{ is not a tree}\,\big|\,G_{I,2t-2}\text{ is a tree}\big)\,, \qquad \beta_t \equiv \mathbb{P}\big(G_{I,2t-2}\text{ is not a tree}\,\big|\,G_{I,2t-3}\text{ is a tree}\big)\,.$$

Then,

$$\mathbb{P}\big(G_{I,2k-1}\text{ is not a tree}\big) \;\le\; \alpha_1 + \sum_{t=2}^{k}\big(\alpha_t + \beta_t\big)\,. \qquad (22)$$

We can upper bound the $\alpha_t$'s and $\beta_t$'s in a similar way. When generating $G_{I,2t-1}$ conditioned on $G_{I,2t-2}$ being a tree, there are $\ell(\hat\ell\hat r)^{t-1}$ half-edges, where $\hat\ell = \ell-1$ and $\hat r = r-1$. Among the $\binom{\ell(\hat\ell\hat r)^{t-1}}{2}$ pairs of these half-edges, each pair is connected to the same worker with probability at most $(r-1)/\big(r\big(n - \sum_{a=1}^{t-1}\ell(\hat\ell\hat r)^{a-1}\big) - 1\big)$, where $\sum_{a=1}^{t-1}\ell(\hat\ell\hat r)^{a-1}$ is the total number of worker nodes already assigned in $G_{I,2t-2}$. Then,

$$\alpha_t \;\le\; \frac{\ell^2(\hat\ell\hat r)^{2t-2}}{2}\cdot\frac{r-1}{r\big(n - \ell\big((\hat\ell\hat r)^{t-1}-1\big)/(\hat\ell\hat r - 1)\big) - 1} \;\le\; \frac{\ell^2(\hat\ell\hat r)^{2t-2}}{2\big(n - 2\ell(\hat\ell\hat r)^{t-2}\big)} \;\le\; \frac{\ell^2(\hat\ell\hat r)^{2t-2}}{n} + \frac{4\ell(\hat\ell\hat r)^{t-2}}{n} \;\le\; \frac{3\ell^2(\hat\ell\hat r)^{2t-2}}{2n}\,,$$

where the second inequality follows from the fact that $(a-1)/(b-1) \le a/b$ for all $a \le b$ and from $\hat\ell\hat r \ge 2$ as per our assumption, and in the third inequality we used the fact that $\alpha_t$ is upper bounded by one together with the fact that for $f(x) = b/(x-a)$ upper bounded by one, we have $f(x) \le (2b/x) + (2a/x)$. Similarly, we can show that

$$\beta_t \;\le\; \frac{3\ell^2\hat\ell(\hat\ell\hat r)^{2t-2}}{2m}\,.$$
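The generative process just described is easy to simulate directly. The sketch below (our own illustration under the configuration model) estimates the probability that the depth-$(2k-1)$ neighborhood of a task node is a tree, which should approach one as $m$ grows, consistently with Lemma 3.1:

```python
import random

def ball_is_tree(m, l, r, k, rng):
    """One configuration-model sample: is the depth-(2k-1) neighborhood of
    task node 0 a tree?  (m tasks with l stubs each, n = m*l/r workers with
    r stubs each; stubs are matched uniformly at random.)"""
    n = (m * l) // r
    task_stubs = [i for i in range(m) for _ in range(l)]
    worker_stubs = [j for j in range(n) for _ in range(r)]
    rng.shuffle(worker_stubs)
    task_adj = [[] for _ in range(m)]
    worker_adj = [[] for _ in range(n)]
    for t, w in zip(task_stubs, worker_stubs):
        task_adj[t].append(w)
        worker_adj[w].append(t)
    seen_task, seen_worker = {0}, set()
    frontier = [(0, None)]          # (node, parent) pairs, starting at task 0
    task_side = True                # does `frontier` currently hold task nodes?
    for _ in range(2 * k - 1):
        nxt = []
        for node, parent in frontier:
            children = list(task_adj[node] if task_side else worker_adj[node])
            if parent is not None:
                children.remove(parent)   # drop one edge back to the parent
            seen = seen_worker if task_side else seen_task
            for c in children:
                if c in seen:
                    return False          # a cycle closes inside the ball
                seen.add(c)
                nxt.append((c, node))
        frontier, task_side = nxt, not task_side
    return True
```

For instance, with $\ell = r = 3$, $k = 2$, and $m = 3000$, the vast majority of sampled neighborhoods are trees, in line with the $O(1/m)$ bound above.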
Substituting the bounds on $\alpha_t$ and $\beta_t$ into (22), we get

$$\mathbb{P}\big(G_{I,2k-1}\text{ is not a tree}\big) \;\le\; (\hat\ell\hat r)^{2k-2}\,\frac{3\ell r}{m}\,.$$

3.3 Proof of Lemma 3.2

By the fact that $a e^b + (1-a)e^{-b} \le e^{(2a-1)b + (1/2)b^2}$ for any $b \in \mathbb{R}$ and $a \in [0,1]$, we have $p\,e^z + \bar p\,e^{-z} \le e^{(2p-1)z + (1/2)z^2}$ almost surely. Applying this inequality once again, we get

$$\mathbb{E}\Big[\,p\big(p\,e^z + \bar p\,e^{-z}\big)^{\hat r} + \bar p\big(\bar p\,e^z + p\,e^{-z}\big)^{\hat r}\Big] \;\le\; \mathbb{E}\Big[e^{(2p-1)^2\hat r z + (1/2)(2p-1)^2\hat r^2 z^2}\Big]\, e^{(1/2)\hat r z^2}\,.$$

Using the fact that $e^a \le 1 + a + 0.63\,a^2$ for $|a| \le 5/8$,

$$\mathbb{E}\Big[e^{(2p-1)^2\hat r z + (1/2)(2p-1)^2\hat r^2 z^2}\Big] \;\le\; \mathbb{E}\Big[1 + (2p-1)^2\hat r z + \tfrac12(2p-1)^2\hat r^2 z^2 + 0.63\big((2p-1)^2\hat r z + \tfrac12(2p-1)^2\hat r^2 z^2\big)^2\Big] \;\le\; 1 + q\hat r z + \tfrac32\,q\hat r^2 z^2 \;\le\; e^{q\hat r z + (3/2)q\hat r^2 z^2}\,,$$

for $|z| \le 1/(2\hat r)$. This proves Lemma 3.2.

3.4 Proof of a bound on majority voting in Lemma 2.6

Majority voting simply follows what the majority of workers agree on. In formula,

$$\hat t_i = \mathrm{sign}\Big(\sum_{j\in W_i} A_{ij}\Big)\,,$$

where $W_i$ denotes the neighborhood of node $i$ in the graph; ties are broken by a random choice. We want to compute a lower bound on $\mathbb{P}(\hat t_i \ne t_i)$. Let $x_i = \sum_{j\in W_i} A_{ij}$. Assuming $t_i = +1$ without loss of generality, the error rate is lower bounded by $\mathbb{P}(x_i < 0)$. After rescaling, $(1/2)(x_i + \ell)$ is a standard binomial random variable $\mathrm{Binom}(\ell,\alpha)$, where $\ell$ is the number of neighbors of node $i$ and $\alpha = \mathbb{E}[p_j]$, since by assumption each $A_{ij}$ is $+1$ with probability $\alpha$. It follows that

$$\mathbb{P}\big(x_i = -\ell + 2k\big) = \binom{\ell}{k}\,\alpha^k(1-\alpha)^{\ell-k}\,.$$

Further, for $k \le \alpha\ell - 1$, the probability mass function is monotonically increasing.
Precisely,

$$\frac{\mathbb{P}\big(x_i = -\ell + 2(k+1)\big)}{\mathbb{P}\big(x_i = -\ell + 2k\big)} = \frac{\alpha(\ell - k)}{(1-\alpha)(k+1)} \;\ge\; \frac{\alpha(\ell - \alpha\ell + 1)}{(1-\alpha)\,\alpha\ell} \;>\; 1\,,$$

where we used the fact that the above ratio is decreasing in $k$, whence the minimum is achieved at $k = \alpha\ell - 1$ under our assumption. Let us assume that $\ell$ is even, so that $x_i$ takes even values; when $\ell$ is odd, the same analysis works, but $x_i$ takes odd values. Our strategy is to use the simple bound $\mathbb{P}(x_i < 0) \ge k\,\mathbb{P}(x_i = -2k)$. By the assumption that $\alpha = \mathbb{E}[p_j] \ge 1/2$, for an appropriate choice of $k = \sqrt\ell$ the right-hand side closely approximates the error probability. By the definition of $x_i$, it follows that

$$\mathbb{P}\big(x_i = -2\sqrt\ell\,\big) = \binom{\ell}{\ell/2+\sqrt\ell}\,\alpha^{\ell/2-\sqrt\ell}\,(1-\alpha)^{\ell/2+\sqrt\ell}\,. \qquad (23)$$

Applying Stirling's approximation, we can show that

$$\binom{\ell}{\ell/2+\sqrt\ell} \;\ge\; C_2\,\frac{2^\ell}{\sqrt\ell}\,, \qquad (24)$$

for some positive constant $C_2$. We are interested in the case where the worker quality is low, that is, $\alpha$ is close to $1/2$. Accordingly, for the second and third factors in (23), we expand in terms of $2\alpha - 1$:

$$\log\Big(\alpha^{\ell/2-\sqrt\ell}\,(1-\alpha)^{\ell/2+\sqrt\ell}\Big) = \Big(\frac{\ell}{2}-\sqrt\ell\Big)\big(\log(1+(2\alpha-1)) - \log 2\big) + \Big(\frac{\ell}{2}+\sqrt\ell\Big)\big(\log(1-(2\alpha-1)) - \log 2\big)$$
$$= -\ell\log 2 - \frac{\ell(2\alpha-1)^2}{2} + O\big(\sqrt\ell\,(2\alpha-1) + \ell(2\alpha-1)^4\big)\,. \qquad (25)$$

Substituting (24) and (25) into (23), we get the bound

$$\mathbb{P}\big(x_i < 0\big) \;\ge\; \exp\big(-C_3\big(\ell(2\alpha-1)^2 + 1\big)\big)\,, \qquad (26)$$

for some positive constant $C_3$. Now, let $\ell_i$ denote the degree of task node $i$, so that $\sum_i \ell_i = \ell m$. Then, for any $\{t_i\} \in \{\pm 1\}^m$, any distribution of $p$ such that $\mu = \mathbb{E}[2p-1] = 2\alpha - 1$, and any non-adaptive task assignment for $m$ tasks, the following lower bound holds:

$$\frac{1}{m}\sum_{i\in[m]}\mathbb{P}\big(t_i \ne \hat t_i\big) \;\ge\; \frac{1}{m}\sum_{i=1}^m e^{-C_3(\ell_i\mu^2+1)} \;\ge\; e^{-C_3(\ell\mu^2+1)}\,,$$

where the last inequality follows from the convexity of the exponential function. Under the spammer-hammer model, where $\mu = q$, this gives

$$\min_{\tau\in\mathcal T_\ell}\;\max_{t\in\{\pm1\}^m,\,F\in\mathcal F_q}\;\frac{1}{m}\sum_{i\in[m]}\mathbb{P}\big(t_i \ne \hat t_i\big) \;\ge\; e^{-C_3(\ell q^2+1)}\,.$$

This finishes the proof of the lemma.
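The exponential decay in (26) can be compared against a direct simulation of majority voting; a minimal sketch (parameter values are ours, for illustration only):

```python
import random

def majority_vote_error(l, alpha, n_trials=50000, seed=0):
    """Empirical error rate of majority voting on a single task with l
    workers, each answering correctly (+1) with probability alpha = E[p_j];
    ties are broken uniformly at random, and the true label is +1."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_trials):
        x = sum(1 if rng.random() < alpha else -1 for _ in range(l))
        if x < 0 or (x == 0 and rng.random() < 0.5):
            errors += 1
    return errors / n_trials
```

For example, with $\ell = 10$ and $\alpha = 0.6$, the exact error rate is $\mathbb{P}(B \le 4) + \tfrac12\mathbb{P}(B = 5) \approx 0.267$ for $B \sim \mathrm{Binom}(10, 0.6)$, and the simulation reproduces it; increasing $\alpha$ drives the error down rapidly, as the analysis predicts.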
3.5 Proof of a bound on adaptive schemes in Theorem 2.7

In this section, we prove that, even with the help of an oracle, the probability of error cannot decay faster than $e^{-C\ell q}$. We consider a labeling algorithm which has access to an oracle that knows the reliability of every worker (all the $p_j$'s). At the $k$-th step, after the algorithm assigns $T_k$ and all $|T_k|$ answers are collected from the $k$-th worker, the oracle provides the algorithm with $p_k$. Using all the previously collected answers $\{A_{ij}\}_{j\le k}$ and the worker reliabilities $\{p_j\}_{j\le k}$, the algorithm decides on the next task assignment $T_{k+1}$. This process is repeated until a stopping criterion is met, and the algorithm then outputs its estimates of the true labels. The algorithm can compute the maximum likelihood estimates, which are known to minimize the probability of making an error. Let $W_i$ be the set of workers assigned to task $i$; then

$$\hat t_i = \mathrm{sign}\Big(\sum_{j\in W_i} \log\Big(\frac{p_j}{1-p_j}\Big)\, A_{ij}\Big)\,. \qquad (27)$$

We are going to show that there exists a family of distributions $\mathcal F$ such that, for any stopping rule and any task assignment scheme, the probability of error is lower bounded by $e^{-C\ell q}$. We define the following family of distributions according to the spammer-hammer model with imperfect hammers. We assume that $q \le a^2$ and

$$p_j = \begin{cases} 1/2 & \text{with probability } 1 - q/a^2\,,\\ (1+a)/2 & \text{with probability } q/a^2\,,\end{cases}$$

such that $\mathbb{E}[(2p_j-1)^2] = q$. Let $W_i$ denote the set of workers assigned to task $i$ when the algorithm has stopped. Then $|W_i|$ is a random variable representing the total number of workers assigned to task $i$. The oracle estimator knows all the values necessary to compute the error probability of each task.
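The oracle estimator (27) is simply a weighted majority vote with log-likelihood-ratio weights; a minimal sketch (our own illustration, not the paper's code):

```python
import math

def oracle_ml_estimate(answers, reliabilities):
    """Maximum-likelihood label estimate (27) for a single task: a weighted
    majority vote of +/-1 answers with log-likelihood-ratio weights.
    Spammers (p_j = 1/2) get weight zero and are effectively ignored."""
    score = sum(math.log(p / (1.0 - p)) * a
                for a, p in zip(answers, reliabilities))
    return 1 if score >= 0 else -1
```

A single reliable worker can outweigh several weak ones: with answers $(+1,-1,-1)$ and reliabilities $(0.9, 0.6, 0.6)$, the weight $\log(0.9/0.1) \approx 2.20$ exceeds $2\log(0.6/0.4) \approx 0.81$, so the estimate is $+1$.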
Let $E_i = \mathbb{E}\big[\mathbb{I}(t_i \ne \hat t_i)\,\big|\,\{A_{ij}\},\{p_j\}\big]$ be the random variable representing the error probability as computed by the oracle estimator, conditioned on the $|W_i|$ responses we get from the workers and on their reliabilities $p_j$. We are interested in how the average budget $(1/m)\sum_i\mathbb{E}|W_i|$ depends on the achieved average error rate $(1/m)\sum_i\mathbb{E}[E_i]$. In the following, we will show that for any task $i$, independently of which task allocation scheme is used, it is necessary that

$$\mathbb{E}|W_i| \;\ge\; \frac{0.27}{q}\,\log\frac{1}{2\,\mathbb{E}[E_i]}\,. \qquad (28)$$

By convexity of $\log(1/x)$ and Jensen's inequality, this implies that

$$\frac{1}{m}\sum_{i=1}^m \mathbb{E}|W_i| \;\ge\; \frac{0.27}{q}\,\log\frac{1}{2\,(1/m)\sum_{i=1}^m\mathbb{E}[E_i]}\,.$$

Since the total number of queries has to be consistent, we have $\sum_j |T_j| = \sum_i |W_i| \le m\ell$. Also, by definition, $\mathbb{E}[E_i] = \mathbb{P}(t_i \ne \hat t_i)$. Then, from the above inequality, we get

$$\frac{1}{m}\sum_{i\in[m]}\mathbb{P}\big(t_i \ne \hat t_i\big) \;\ge\; \frac{1}{2}\,e^{-(1/0.27)\,q\ell}\,,$$

which finishes the proof of the theorem. Note that this bound holds for any value of $m$.

Now we are left to prove that inequality (28) holds. Focusing on a single task $i$: since we know who the spammers are, and spammers give us no information about the task, we only need the responses from the reliable workers in order to make an optimal estimate as per (27). The conditional error probability $E_i$ of the optimal estimate depends on the realizations of the answers $\{A_{ij}\}_{j\in W_i}$ and the worker reliabilities $\{p_j\}_{j\in W_i}$. The following lower bound on the error depends only on the number of reliable workers, which we denote by $\ell_i$. Without loss of generality, let $t_i = +1$. Then, if all the reliable workers provide '$-$' answers, the maximum likelihood estimate is '$-$' for this task, which leads to an error. Therefore,

$$E_i \;\ge\; \frac{1}{2}\,\mathbb{P}\big(\text{all }\ell_i\text{ reliable workers answered }-\big) = \frac{1}{2}\Big(\frac{1-a}{2}\Big)^{\ell_i}\,,$$

for all realizations of $\{A_{ij}\}$ and $\{p_j\}$.
The scaling by half ensures that the above inequality holds even when $\ell_i = 0$. By convexity and Jensen's inequality, it follows that

$$\mathbb{E}[\ell_i] \;\ge\; \frac{\log\big(2\,\mathbb{E}[E_i]\big)}{\log\big((1-a)/2\big)}\,.$$

When we recruit $|W_i|$ workers, we see $\ell_i = (q/a^2)|W_i|$ reliable ones on average; formally, $\mathbb{E}[\ell_i] = (q/a^2)\,\mathbb{E}[|W_i|]$. Substituting this into the bound above, we get

$$\mathbb{E}|W_i| \;\ge\; \frac{1}{q}\cdot\frac{a^2}{\log\big((1-a)/2\big)}\,\log\big(2\,\mathbb{E}[E_i]\big)\,.$$

Maximizing over all choices of $a \in (0,1)$, we get

$$\mathbb{E}|W_i| \;\ge\; \frac{0.27}{q}\,\log\frac{1}{2\,\mathbb{E}[E_i]}\,,$$

which in particular holds with $a = 0.8$. For this choice of $a$, the result holds in the regime where $q \le 0.64$. Notice that by changing the constant in the bound, we can ensure that the result holds for all values of $q$. This finishes the proof of (28).

3.6 Proof of a bound with one iteration in Lemma 2.10

The probability of making an error on node $i$ after one iteration of our algorithm is $\mathbb{P}(t_i \ne \hat t_i^{(1)}) \le \mathbb{P}(\hat x_i \le 0)$, where $\hat x_i = \sum_{j\in\partial i} A_{ij}\, y_{j\to i}^{(0)}$. Assuming $t_i = +1$ without loss of generality, $A_{ij}$ is $+1$ with probability $\mathbb{E}[p]$ and $-1$ otherwise. All the $y_{j\to i}^{(0)}$'s are initialized as Gaussian random variables with mean one and variance one, and all these random variables are independent of one another at this initial step. Hence, the resulting random variable $\hat x_i$ is the sum of a shifted binomial random variable $2\,\mathrm{Binom}(\ell,\mathbb{E}[p]) - \ell$ and a zero-mean Gaussian random variable $\mathcal{N}(0,\ell)$. From calculations similar to (20), it follows that

$$\mathbb{E}\big[e^{\lambda\hat x_i}\big] \;\le\; e^{\ell\mu\lambda + \ell\lambda^2} \;\le\; e^{-(1/4)\ell\mu^2}\,,$$

where we choose $\lambda = -\mu/2$. By Chernoff's inequality, this implies the lemma for any value of $m$.

4 Conclusion

We conclude with some limitations of our results and interesting research directions.

1. More general models.
In this paper, we provided an order-optimal task assignment scheme and an order-optimal inference algorithm for that task assignment, assuming a probabilistic crowdsourcing model. In this model, we assumed that each worker makes mistakes randomly according to a worker-specific quality parameter. The two main simplifications we make are, first, that a worker's reliability does not depend on whether the task is a positive task or a negative task, and second, that all tasks are equally easy or difficult. The main remaining challenge in developing inference algorithms for crowdsourcing is to develop solutions for the more generic models formally described in Section 2.7. When workers exhibit bias and can have heterogeneous quality parameters depending on the correct answer to the task, spectral methods using low-rank matrix approximations generalize nicely to give an algorithmic solution. It would also be interesting to find algorithmic solutions with performance guarantees for the generic model where task difficulties are taken into account.

2. Improving the constant. We prove that our approach is minimax optimal up to a constant factor. However, there might be another algorithm with a better constant factor than our inference algorithm. Some modification of expectation maximization or belief propagation might achieve a better constant compared to our inference algorithm. It is an interesting research direction to find such an algorithm and give an upper bound on the error probability that is smaller than the one in our main theorem.

3. Instance-optimality. The optimality of our approach is proved under the worst-case worker distribution. However, it is not known whether our approach is instance-optimal under the non-adaptive scenario.
It would be important to prove lower bounds for all worker distributions, or to find a counterexample where another algorithm achieves strictly better performance for a particular worker distribution in terms of the scaling of the required budget.

4. Phase transition. We empirically observe that there is a phase transition around $\hat\ell\hat r q^2 = 1$; below it, no algorithm can do better than majority voting. This phase transition seems to be an algorithm-independent and fundamental property of the problem (and of the random graph). It might be possible to formally prove this fundamental difference in the way information propagates under the crowdsourcing model. Such a phase transition has been studied for the simpler model of broadcasting on trees in information theory and statistical mechanics [EKPS00].

References

[AS08] N. Alon and J. H. Spencer, The probabilistic method, John Wiley, 2008.

[BBMK11] M. S. Bernstein, J. Brandt, R. C. Miller, and D. R. Karger, Crowds in two seconds: enabling realtime crowd-powered interfaces, Proceedings of the 24th annual ACM symposium on User interface software and technology, UIST '11, 2011, pp. 33–42.

[BJJ+10] J. P. Bigham, C. Jayant, H. Ji, G. Little, A. Miller, R. C. Miller, R. Miller, A. Tatarowicz, B. White, S. White, and T. Yeh, Vizwiz: nearly real-time answers to visual questions, Proceedings of the 23rd annual ACM symposium on User interface software and technology, UIST '10, 2010, pp. 333–342.

[BLM+10] M. S. Bernstein, G. Little, R. C. Miller, B. Hartmann, M. S. Ackerman, D. R. Karger, D. Crowell, and K. Panovich, Soylent: a word processor with a crowd inside, Proceedings of the 23rd annual ACM symposium on User interface software and technology (New York, NY, USA), ACM UIST, 2010, pp. 313–322.

[Bol01] B. Bollobás, Random Graphs, Cambridge University Press, January 2001.

[CHMA10] L. B. Chilton, J. J. Horton, R. C. Miller, and S. Azenkot, Task search in a human computation market, Proceedings of the ACM SIGKDD Workshop on Human Computation, HCOMP '10, 2010, pp. 1–9.

[DCS09] P. Donmez, J. G. Carbonell, and J. Schneider, Efficiently learning the accuracy of labeling sources for selective sampling, Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2009, pp. 259–268.

[DS79] A. P. Dawid and A. M. Skene, Maximum likelihood estimation of observer error-rates using the EM algorithm, Journal of the Royal Statistical Society, Series C (Applied Statistics) 28 (1979), no. 1, 20–28.

[EHR11] Ş. Ertekin, H. Hirsh, and C. Rudin, Approximating the wisdom of the crowd, Proceedings of the Second Workshop on Computational Social Science and the Wisdom of Crowds (NIPS 2011), 2011.

[EKPS00] W. Evans, C. Kenyon, Y. Peres, and L. J. Schulman, Broadcasting on trees and the Ising model, The Annals of Applied Probability 10 (2000), no. 2, 410–433.

[FHI11] S. Faradani, B. Hartmann, and P. G. Ipeirotis, What's the right price? Pricing tasks for finishing on time, Human Computation '11, 2011.

[Hol11] S. Holmes, Crowd counting a crowd, Statistics Seminar, Stanford University, March 2011.

[Ipe10] P. G. Ipeirotis, Analyzing the Amazon Mechanical Turk marketplace, XRDS 17 (2010), no. 2, 16–21.

[JG03] R. Jin and Z. Ghahramani, Learning with multiple labels, Advances in Neural Information Processing Systems, 2003, pp. 921–928.

[KOS11] D. R. Karger, S. Oh, and D. Shah, Budget-optimal crowdsourcing using low-rank matrix approximations, Proc. of the Allerton Conf. on Commun., Control and Computing, 2011.

[Lan50] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, Journal of Research of the National Bureau of Standards 45 (1950), no. 4, 255–282.

[LW89] N. Littlestone and M. K. Warmuth, The weighted majority algorithm, 30th Annual Symposium on Foundations of Computer Science, October 1989, pp. 256–261.

[MM09] M. Mezard and A. Montanari, Information, physics, and computation, Oxford University Press, Inc., New York, NY, USA, 2009.

[MW10] W. Mason and D. J. Watts, Financial incentives and the "performance of crowds", SIGKDD Explor. Newsl. 11 (2010), no. 2, 100–108.

[Pea88] J. Pearl, Probabilistic Reasoning in Intelligent Systems, Morgan Kaufmann Publ., San Mateo, California, 1988.

[RU08] T. Richardson and R. Urbanke, Modern Coding Theory, Cambridge University Press, March 2008.

[RY12] V. C. Raykar and S. Yu, Eliminating spammers and ranking annotators for crowdsourced labeling tasks, J. Mach. Learn. Res. 13 (2012), 491–518.

[RYZ+10a] V. C. Raykar, S. Yu, L. H. Zhao, G. H. Valadez, C. Florin, L. Bogoni, and L. Moy, Learning from crowds, J. Mach. Learn. Res. 99 (2010), 1297–1322.

[RYZ+10b] V. C. Raykar, S. Yu, L. H. Zhao, G. H. Valadez, C. Florin, L. Bogoni, L. Moy, and D. Blei, Learning from crowds, Journal of Machine Learning Research (2010), no. 11, 1297–1322.

[SFB+95] P. Smyth, U. Fayyad, M. Burl, P. Perona, and P. Baldi, Inferring ground truth from subjective labelling of Venus images, Advances in Neural Information Processing Systems, 1995, pp. 1085–1092.

[SPI08] V. S. Sheng, F. Provost, and P. G. Ipeirotis, Get another label? Improving data quality and data mining using multiple, noisy labelers, Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD '08, ACM, 2008, pp. 614–622.

[WBBP10] P. Welinder, S. Branson, S. Belongie, and P. Perona, The multidimensional wisdom of crowds, Advances in Neural Information Processing Systems, 2010, pp. 2424–2432.

[WRW+09] J. Whitehill, P. Ruvolo, T. Wu, J. Bergsma, and J. Movellan, Whose vote should count more: Optimal integration of labels from labelers of unknown expertise, Advances in Neural Information Processing Systems, vol. 22, 2009, pp. 2035–2043.

[WS67] G. Wyszecki and W. S. Stiles, Color science: Concepts and methods, quantitative data and formulae, Wiley-Interscience, 1967.

[YFW03] J. S. Yedidia, W. T. Freeman, and Y. Weiss, Understanding belief propagation and its generalizations, pp. 239–269, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2003.

[YKG10] T. Yan, V. Kumar, and D. Ganesan, CrowdSearch: exploiting crowds for accurate real-time image search on mobile phones, Proceedings of the 8th international conference on Mobile systems, applications, and services, MobiSys '10, 2010, pp. 77–90.

[ZSD10] Y. Zheng, S. Scott, and K. Deng, Active learning from multiple noisy labelers with varied costs, Data Mining (ICDM), 2010 IEEE 10th International Conference on, December 2010, pp. 639–648.