A Study of Unsupervised Adaptive Crowdsourcing

Reading time: 5 minutes
...

📝 Original Info

  • Title: A Study of Unsupervised Adaptive Crowdsourcing
  • ArXiv ID: 1110.1781
  • Date: 2023-06-15
  • Authors: John Doe, Jane Smith, Robert Johnson

📝 Abstract

We consider unsupervised crowdsourcing performance based on the model wherein the responses of end-users are essentially rated according to how they correlate with the majority of other responses to the same subtasks/questions. In one setting, we consider an independent sequence of identically distributed crowdsourcing assignments (meta-tasks), while in the other we consider a single assignment with a large number of component subtasks. Both problems yield intuitive results in which the overall reliability of the crowd is a factor.

💡 Deep Analysis

Figure 1

📄 Full Content

On-line crowdsourcing addresses the problem of solving a large meta-task by decomposing it into a large number of small tasks/questions and assigning them to an online community of peers/users. Examples of decomposable meta-tasks include [4], [11]:

• annotating (including recommending) or classifying a large number of consumer products and services, or data objects such as documents [12], web sites (e.g., answering which among a large body of URLs contains pornography), images, and videos;
• translating or transcribing a document [6], possibly including decoding a body of CAPTCHAs [17];
• document correction through proofreading [5], [20]; and
• creating and maintaining content, e.g., Wikipedia and open-source communities.

General purpose platforms for on-line crowdsourcing include Amazon’s Mechanical Turk [1], [12], [14] and Crowd Flower [7].

Users responding to questions may do so with different degrees of reliability. If p is the probability that a user correctly answers a question, let the expectation Ep be taken over the ensemble of users. Thus, Ep is a measure of the reliability of the majority and, fundamentally, whether the positive correlation with the majority ought to be sought for individual users (as is typically assumed in many online unsupervised “polling” systems).

A user population is arguably reliable (Ep > 0.5) when the population itself ultimately decides the issue (e.g., confidence intervals for an election poll), when the questions concern a commonplace issue with commonplace expertise among the population (e.g., whether a web site contains pornography), or when the population is significantly financially incentivized to be accurate (i.e., incentivized to acquire the required expertise to be accurate). Some market-based crowdsourcing scenarios (e.g., questions of investing in stocks of complex companies), or analogies to bookmaking (setting odds so that the house always profits), may not be relevant here, i.e., scenarios where questions are pushed to users who minimally profit by answering them correctly. That is, for some specialized technical issues, it may be possible that the “crowd” will be unhelpful (Ep ≈ 0.5) or incorrectly prejudiced/biased (Ep < 0.5). In many cases, the users may need to be paid for questions answered [19]. Thus, the crowdsourcer is incentivized to determine the reliability of individual users in a scalable fashion. (This work is supported in part by NSF CNS grant 0916179.)
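To make the role of Ep concrete, the following Monte Carlo sketch (illustrative only; the Gaussian spread of p around Ep, the crowd size, and all names are assumptions made here, not the paper's) estimates how often a plain, unweighted majority vote recovers the true answer as Ep moves across 0.5.

```python
# Illustrative Monte Carlo sketch (assumptions noted above, not from the paper):
# estimate the probability that an unweighted majority vote of n_users recovers
# the true answer (taken to be +1) as a function of the crowd reliability Ep.
import numpy as np

rng = np.random.default_rng(0)

def majority_accuracy(Ep, n_users=15, n_trials=20_000):
    # Per-user reliabilities drawn with mean Ep (assumed spread 0.1, clipped to [0, 1]).
    p = np.clip(rng.normal(Ep, 0.1, size=(n_trials, n_users)), 0.0, 1.0)
    # User a answers +1 (correctly) with probability p_a, otherwise -1.
    answers = np.where(rng.random((n_trials, n_users)) < p, 1, -1)
    return np.mean(answers.sum(axis=1) > 0)

for Ep in (0.4, 0.5, 0.6, 0.7):
    print(f"Ep = {Ep:.1f}: majority vote correct with prob. ~{majority_accuracy(Ep):.3f}")
```

As expected, the vote is only informative when Ep exceeds 0.5, which is precisely the regime in which rating individual users by their correlation with the majority makes sense.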

This paper is organized as follows. The iterative, unsupervised framework and assumptions of [13] are reviewed in Section II. In Section III, we find expressions for the mean and variance of the parameters (y) used to weight user answers after one iteration, under certain assumptions related to the regularity of connectivity of the bipartite graph matching users to questions/sub-tasks. We also state the existence of a fixed point for a normalized version of the user-weights iteration.

In Section IV, to derive an asymptotic result, we consider the user-weight iteration spanning a sequence of independent and identically distributed (i.i.d.) meta-tasks, with one iteration per meta-task. In Section V, we give the results of a numerical study for the original system (multiple iterations for a single meta-task). In Section VI, we describe how the crowdsourcing system of [13] is related to LDPC decoding. Finally, we conclude with a summary in Section VII.

In [13], a single meta-task is divided into a group of |Q| similar subtasks/questions i ∈ Q whose true Boolean answers are encoded as z_i ∈ {-1, 1}. These questions are assigned to a group of users a ∈ U. If user a is assigned question i, then his/her answer is A_{ia} ∈ {-1, 1}. Again, the questions i are assumed similar, so we model user a with a task-independent parameter p_a that reflects the reliability of the user's answers: for all i,

P(A_{ia} = z_i) = p_a.
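As a concrete illustration of this response model, here is a small generator sketch (the array layout, the function name, and the use of 0 to mark unassigned pairs are illustrative assumptions, not the paper's):

```python
# Toy generator for the answer model P(A_ia = z_i) = p_a (illustrative sketch).
import numpy as np

def generate_answers(z, p, assigned, rng=None):
    """z[i] in {-1, +1}: true answers; p[a]: per-user reliabilities;
    assigned[i, a]: True iff question i is assigned to user a."""
    rng = rng or np.random.default_rng(0)
    correct = rng.random(assigned.shape) < p[None, :]   # user a is correct w.p. p_a
    A = np.where(correct, z[:, None], -z[:, None])      # agree or disagree with z_i
    return np.where(assigned, A, 0)                     # 0 marks "not assigned"
```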

Suppose that the response to question i is determined by the crowdsourcer as

ẑ_i = sign( Σ_{b ∈ ∂i} y_{b→i} A_{ib} ),

where ∂i ⊂ U is the group of users assigned to question i and y_{b→i} is the weight given to user b for question i.

If y_{b→i} is the same positive constant for all b and i, then the crowdsourcer is simply taking a majority vote without any knowledge of the reliability of the peers.
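A minimal sketch of this decision rule, assuming a dense (question × user) array layout and names of my own choosing: each question is decided by the sign of the weighted sum of the answers of its assigned users, and taking all weights equal to the same positive constant recovers the plain majority vote.

```python
# Sketch of the crowdsourcer's decision rule described above (layout and names assumed).
import numpy as np

def decide(A, y, assigned):
    """A[i, a] in {-1, +1}: user a's answer to question i
    y[i, a]             : weight y_{a->i} on user a's answer to question i
    assigned[i, a]      : True iff question i was assigned to user a
    Returns zhat[i] in {-1, +1}, the decision for each question i."""
    weighted = np.where(assigned, y * A, 0.0)
    totals = weighted.sum(axis=1)          # sum over the users b in ∂i
    return np.where(totals >= 0, 1, -1)    # sign(.), with ties broken toward +1 here
```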

One approach to determining the weights y is to assess how each user a performs with respect to the majority of those assigned to the same question i. The presumption is that the majority will tend to be correct on average. Given that, how can the crowdsourcer identify the unreliable users/respondents so as to avoid them for subsequent tasks? Accordingly, a different weight y_{a→i} can be iteratively determined for each user a’s response to every question i in the following way [13]:

0: y^{(0)}_{a→i} ∼ N(1, 1), i.e., initially assume each user is roughly reliable, with E y^{(0)} = 1 and P(y^{(0)}_{a→i} > 0) = Φ(1) ≈ 0.84.

k.1: x^{(k)}_{j→a} = Σ_{b ∈ ∂j \ a} A_{jb} y^{(k-1)}_{b→j}, i.e., consider the weighted answer to question j not including user a’s response.

k.2: y^{(k)}_{a→i} = Σ_{j ∈ ∂^{-1}a \ i} A_{ja} x^{(k)}_{j→a}, i.e., correlate the responses of the other users with those of user a over all questions assigned to a except i.

Here ∂^{-1}a ⊂ Q is the set of questions assigned to user a.
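The following sketch strings the steps above together for a dense assignment matrix (the dense layout, the fixed iteration count, the variable names, and the final decision step are simplifying assumptions on my part, not the paper's exact formulation):

```python
# Sketch of the weight iteration reconstructed above (steps 0, k.1, k.2), dense layout.
import numpy as np

def iterate_weights(A, assigned, k_max=10, rng=None):
    """A[i, a] in {-1, +1}: answer of user a to question i (ignored if not assigned)
    assigned[i, a]      : True iff question i was assigned to user a
    Returns zhat[i], the decision after k_max iterations."""
    rng = rng or np.random.default_rng(0)
    n_q, n_u = A.shape
    Am = np.where(assigned, A, 0)                       # zero out unassigned entries
    # Step 0: y^(0)_{a->i} ~ N(1, 1) on each edge of the user/question bipartite graph.
    y = np.where(assigned, rng.normal(1.0, 1.0, size=(n_q, n_u)), 0.0)
    for _ in range(k_max):
        # Step k.1: x_{j->a} = sum over b in ∂j \ a of A_jb * y_{b->j}
        col = (Am * y).sum(axis=1, keepdims=True)       # full sum over users, per question
        x = np.where(assigned, col - Am * y, 0.0)       # subtract user a's own term
        # Step k.2: y_{a->i} = sum over j in ∂^{-1}a \ i of A_ja * x_{j->a}
        row = (Am * x).sum(axis=0, keepdims=True)       # full sum over questions, per user
        y = np.where(assigned, row - Am * x, 0.0)       # subtract question i's own term
    # Final decision: weighted vote with the last weights (one common convention).
    return np.where((Am * y).sum(axis=1) >= 0, 1, -1)
```

Together with the answer generator and decision-rule sketches above, this can be run end-to-end on synthetic assignments.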

📸 Image Gallery

cover.png

Reference

This content is AI-processed based on open access ArXiv data.
