

Version 21 (modified by Pat Tressel, 14 years ago)

BluePrint for Human Intelligence Task processing (a.k.a. "Job Jar")

Motivation

During a disaster, people on the scene may report via social media or text messages. Emergency managers might find useful information there, but it's buried in large quantities of incomplete fragments, possibly not in a language known to the EM personnel. Much of the work of extracting it is not easily automated, e.g. determining whether a message is a request for aid (and, if so, for what, how much, and where). To clean up this crowdsourced information, we need...another crowd, but one that's trained, or at least learns as it goes, and whose work is cross-checked for accuracy. A similar need holds for data gathering done by volunteers.

These are called "human intelligence tasks" (HITs), and what we want to produce is a system for managing them. Examples of HIT platforms are Amazon's Mechanical Turk and Crowdflower. We want a system tailored to the needs of emergency managers, and to working with crowdsourced data during emergencies.

See also: http://www.playsourcing.org/ (just s/Ushahidi/Sahana Eden/ ;-). And in June, see the article on Playsourcing in: http://www.crcpress.com/product/isbn/9781439853498

We're suggesting this for GSoC, but the whole thing would be too big for one GSoC project -- one or more subsets can be split out into GSoC projects.

Overview

We would like to manage human intelligence tasks, e.g. data entry or cleaning crowdsourced data. This involves:

Figure out how to specify a task, and provide a UI for defining tasks.
Allow reading data for tasks from specified sources.
Assign tasks to workers based on skills and/or measured performance, or let workers select tasks.
Provide tasks to users in a web form.
Compare results of multiple workers on the same task.
Decide when a task is sufficiently complete.
Insert tasks with known results as tests.
Evaluate worker performance.
Administer training and testing for new workers.

Project breakdown

These are some suggested ways to split the project into somewhat independent pieces.

Task definition

Each overall job to accomplish one purpose will have its own specific set of input data, its own instructions to workers, its own way for workers to enter results, and its own requirements for how much verification is needed. Rather than hard-wire a few task types, we'd like to let the people who need the work done specify these things. To permit this, we might need to...

Figure out what information is needed to specify a task. This might include:

Specify the task input data source(s).
- Note: The actual connection to the data source, and the process of reading from it, should be isolated. This is not specific to this project -- reading from a remote feed can be used for many purposes. Here, we could assume that the data is being placed in one or more database tables, or in files on the server.
- Most commonly each task will operate on one item of data -- we can assume this to start.
Provide instructions for workers.
Specify what worker skills are needed.
Specify how many workers should receive each task, and how many need to agree on the result for it to be accepted.
Specify the web form tools that the worker will need to enter results, e.g.:
- Text input.
- Radio buttons or exclusive select from a list.
- Multi-select boxes or list.
- Combo-box, i.e. either exclusive or multi-select, but also allowing the worker to add a new option.
- Selection of locations on a map.
Specify constraints on the results, especially ones that can be checked in the form, such as:
- Data types for text fields (date, number with range, ...)
- Which combinations of multiple selections make sense together.
Specify how to compare results from different workers -- when do the results match?
Assign experts who can handle difficult cases and verify a sampling of results.
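As a rough illustration of the list above, a task definition could be held in a couple of Python dataclasses, with the constraints that can be checked in the form applied to each submitted result. This is a minimal sketch under assumptions: `FieldSpec`, `TaskDefinition` and `validate_result` are hypothetical names, not existing Eden code, and only radio, multi-select and typed text fields are checked.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict, List, Optional


@dataclass
class FieldSpec:
    """One input on the worker's form (hypothetical structure)."""
    name: str
    widget: str                        # "text", "radio", "multiselect", "combo", "map"
    options: List[str] = field(default_factory=list)
    value_type: str = "str"            # for text fields: "str", "int", "float", "date"
    min_value: Optional[float] = None
    max_value: Optional[float] = None


@dataclass
class TaskDefinition:
    """Everything needed to run one job (hypothetical structure)."""
    name: str
    source_table: str                  # table the input items are assumed to land in
    instructions: str
    skills: Dict[str, int]             # skill -> minimum level
    workers_per_item: int              # how many workers receive each item
    agreement_needed: int              # how many must agree to accept a result
    fields: List[FieldSpec]


def validate_result(task: TaskDefinition, result: Dict[str, Any]) -> List[str]:
    """Check one worker's result against the field constraints.

    Returns a list of problems; an empty list means the result passes.
    Combo-box and map widgets are not checked in this sketch.
    """
    problems: List[str] = []
    for spec in task.fields:
        value = result.get(spec.name)
        if value is None:
            problems.append(f"{spec.name}: missing")
        elif spec.widget == "radio":
            if value not in spec.options:
                problems.append(f"{spec.name}: {value!r} is not an option")
        elif spec.widget == "multiselect":
            if not set(value) <= set(spec.options):
                problems.append(f"{spec.name}: unknown option selected")
        elif spec.widget == "text" and spec.value_type in ("int", "float"):
            try:
                number = int(value) if spec.value_type == "int" else float(value)
            except (TypeError, ValueError):
                problems.append(f"{spec.name}: not a number")
                continue
            if spec.min_value is not None and number < spec.min_value:
                problems.append(f"{spec.name}: below {spec.min_value}")
            if spec.max_value is not None and number > spec.max_value:
                problems.append(f"{spec.name}: above {spec.max_value}")
        elif spec.widget == "text" and spec.value_type == "date":
            try:
                date.fromisoformat(value)
            except (TypeError, ValueError):
                problems.append(f"{spec.name}: not a YYYY-MM-DD date")
    return problems


sms_triage = TaskDefinition(
    name="Classify incoming SMS",
    source_table="sms_inbox",
    instructions="Decide whether the message asks for aid.",
    skills={"spanish": 2},
    workers_per_item=3,
    agreement_needed=2,
    fields=[
        FieldSpec("is_request", "radio", options=["yes", "no", "unclear"]),
        FieldSpec("people", "text", value_type="int", min_value=0, max_value=10000),
    ],
)
print(validate_result(sms_triage, {"is_request": "maybe", "people": "-3"}))
# ["is_request: 'maybe' is not an option", 'people: below 0']
```

Checks like these could run both in the browser (for quick pushback) and on the server, since the same definition drives both.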

Decide how the task setup should be stored, that is:

What database schema is appropriate?
For a general task, with arbitrary data sources and arbitrary form fields and layout, we may need a data structure beyond just fixed fields in a single table. This might be implemented in various ways -- options include:
- An XML specification.
- A database schema using multiple tables.
- A form template.
How does it fit in with other Eden components? What other pieces can we use directly or crib from?
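For the XML option, a task specification might look something like the following, read with Python's standard `xml.etree.ElementTree`. The element and attribute names (`task`, `source`, `field`, `option`, ...) and the `parse_task` helper are hypothetical; the sketch only shows that a nested spec flattens naturally into plain records.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout for one task; not an existing Eden format.
TASK_XML = """
<task name="Classify incoming SMS" workers="3" agree="2">
  <source table="sms_inbox"/>
  <instructions>Decide whether the message asks for aid.</instructions>
  <skill>spanish</skill>
  <field name="is_request" widget="radio">
    <option>yes</option><option>no</option><option>unclear</option>
  </field>
  <field name="people" widget="text" type="int" min="0"/>
</task>
"""


def parse_task(xml_text: str) -> dict:
    """Read an XML task spec into plain dicts and lists."""
    root = ET.fromstring(xml_text.strip())
    source = root.find("source")
    return {
        "name": root.get("name"),
        "workers_per_item": int(root.get("workers", "1")),
        "agreement_needed": int(root.get("agree", "1")),
        "source_table": source.get("table") if source is not None else None,
        "instructions": (root.findtext("instructions") or "").strip(),
        "skills": [s.text for s in root.findall("skill")],
        "fields": [
            {
                "name": f.get("name"),
                "widget": f.get("widget"),
                "type": f.get("type", "str"),
                "min": float(f.get("min")) if f.get("min") is not None else None,
                "options": [o.text for o in f.findall("option")],
            }
            for f in root.findall("field")
        ],
    }


spec = parse_task(TASK_XML)
```

If the multi-table schema were chosen instead, each entry in `fields`, and each option within it, would map to a row in a child table of the task table.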

To start, tasks could be defined by editing a setup file. Later, we can provide a web form or wizard that allows the administrator to enter the task setup information. This isn't just for the administrator's convenience -- we can also better guarantee correctness of the setup. E.g. if the user specifies connection info for a data source, we could test the connection on the spot.
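A sketch of the "test the connection on the spot" idea, assuming the setup file is JSON and the input source is an SQLite database. The key names, `check_setup`, and the choice of SQLite are all assumptions for illustration; the same checks could later sit behind the setup wizard.

```python
import json
import sqlite3
from typing import List

# Assumed minimum set of keys for a setup file.
REQUIRED_KEYS = {"name", "source_db", "source_table", "workers_per_item", "agreement_needed"}


def check_setup(setup_text: str) -> List[str]:
    """Validate a JSON task setup and probe its input source; return problems found."""
    try:
        setup = json.loads(setup_text)
    except json.JSONDecodeError as err:
        return [f"setup is not valid JSON: {err}"]
    if not isinstance(setup, dict):
        return ["setup must be a JSON object"]
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(setup))]
    if problems:
        return problems
    for key in ("workers_per_item", "agreement_needed"):
        if not isinstance(setup[key], int) or setup[key] < 1:
            problems.append(f"{key} must be a positive integer")
    if not problems and setup["agreement_needed"] > setup["workers_per_item"]:
        problems.append("agreement_needed exceeds workers_per_item")
    # Test the connection on the spot: open the (assumed SQLite) source
    # read-only, so a typo in the path is reported instead of creating a new file.
    try:
        conn = sqlite3.connect(f"file:{setup['source_db']}?mode=ro", uri=True)
        try:
            row = conn.execute(
                "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
                (setup["source_table"],),
            ).fetchone()
        finally:
            conn.close()
    except sqlite3.Error as err:
        problems.append(f"cannot open source: {err}")
    else:
        if row is None:
            problems.append(f"table {setup['source_table']!r} not found")
    return problems
```

Returning a list of problems, rather than stopping at the first one, lets a web form show the administrator everything that needs fixing at once.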

To assist people in setting up tasks, we can provide some sample task definitions. We could mimic crowdsourcing activities that occurred in actual emergencies.

Registration and training of workers

Basic registration is already available, and there is support for volunteer signup that we can customize. Administrators can list skills they're interested in and volunteers can say what skills they have. We'll need to add whatever fields we want for performance tracking.

There are several parts to training:

Instruction in using the HIT system. This will be "our" part of the training.
Preliminary instruction in performing the specific task, and online help info for reference while performing tasks: This will be the responsibility of whoever defines the task. But we can assist by providing sample training procedures to go with our sample tasks, a way to direct workers to any external training, and a means of providing help info via the task form.
Worker practice on actual tasks: This would be built into the task queue.

(Larger-scale volunteer training is outside the scope of this project.)

Managing the task queue

The task queue is the core workflow for processing tasks. It gets data items from input sources and moves them through the various processing stages to completion.

The overall flow would be:

Identify new items of data that have not been processed.
Pick one or more appropriate workers for each.
Present the tasks to the workers.
Allow workers to refuse tasks.
Receive their results.
Do any automated checking -- push back obviously invalid results.
Store results.
Record completion of tasks when there is enough consensus on the results.
Record task failure if there's not enough agreement.
Dispatch failed tasks and a sample of successful tasks to experts.
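The completion, failure and expert-dispatch steps could be expressed as a small decision function, called whenever a new result arrives. This sketch assumes results are compared by simple equality after an optional task-specific `normalize` step (standing in for "how to compare results" from the task definition), and it declares failure as soon as the outstanding workers could no longer produce enough agreement, which is one possible reading of "not enough agreement". `State`, `decide` and `pick_for_experts` are hypothetical names.

```python
import random
from collections import Counter
from enum import Enum
from typing import Callable, Dict, Hashable, List, Optional, Sequence, Tuple


class State(Enum):
    ASSIGNED = "assigned"   # waiting for (more) worker results
    COMPLETE = "complete"   # enough workers agreed
    FAILED = "failed"       # agreement can no longer be reached


def decide(
    answers: Sequence[str],
    workers_per_item: int,
    agreement_needed: int,
    normalize: Callable[[str], Hashable] = lambda a: a,
) -> Tuple[State, Optional[Hashable]]:
    """Decide an item's state from the results received so far."""
    if not answers:
        return State.ASSIGNED, None
    best, votes = Counter(normalize(a) for a in answers).most_common(1)[0]
    if votes >= agreement_needed:
        return State.COMPLETE, best
    # Even if every outstanding worker agreed with the leading answer,
    # could it still reach the required count?
    outstanding = max(0, workers_per_item - len(answers))
    if votes + outstanding < agreement_needed:
        return State.FAILED, None
    return State.ASSIGNED, None


def pick_for_experts(outcomes: Dict[str, State], sample_rate: float,
                     seed: Optional[int] = None) -> List[str]:
    """All failed items, plus a random sample of completed ones for spot checks."""
    rng = random.Random(seed)
    return [
        item for item, state in outcomes.items()
        if state is State.FAILED
        or (state is State.COMPLETE and rng.random() < sample_rate)
    ]
```

Failing early frees an item for expert review without waiting on workers whose answers can no longer change the outcome.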

The task queue will be driven by the input data. Actually reading the data from a remote source is "beyond the scope" of this project -- that would be a common process needed by other Eden components as well. If we don't have a component for reading from a particular source that we'd like to support, we can mock reading from it (e.g. by downloading a batch of data or generating fake data and having a cron job add it to appropriate source data tables).
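To mock an input source, a script along these lines could be run from cron to drop made-up messages into a source table. The `sms_inbox` table, its `processed` flag and the message templates are invented for illustration; a real feed reader would replace `add_fake_messages`.

```python
import random
import sqlite3
from datetime import datetime, timezone

# Invented message templates and places, only to generate plausible test rows.
TEMPLATES = [
    "Need {n} bottles of water at {place}",
    "{n} people trapped near {place}, please help",
    "Road to {place} is open again",
]
PLACES = ["the school", "the harbour", "Main St bridge"]


def add_fake_messages(conn: sqlite3.Connection, count: int, rng: random.Random) -> None:
    """Insert `count` made-up messages into sms_inbox, all marked unprocessed."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sms_inbox ("
        "id INTEGER PRIMARY KEY, received TEXT, body TEXT, "
        "processed INTEGER NOT NULL DEFAULT 0)"
    )
    now = datetime.now(timezone.utc).isoformat()
    rows = [
        (now, rng.choice(TEMPLATES).format(n=rng.randint(1, 50), place=rng.choice(PLACES)))
        for _ in range(count)
    ]
    conn.executemany("INSERT INTO sms_inbox (received, body) VALUES (?, ?)", rows)
    conn.commit()


def unprocessed(conn: sqlite3.Connection) -> list:
    """What the task queue would poll for: rows not yet turned into tasks."""
    return conn.execute(
        "SELECT id, body FROM sms_inbox WHERE processed = 0 ORDER BY id"
    ).fetchall()
```

Wrapping `add_fake_messages` in a small script and scheduling it with cron every few minutes would give the task queue a steady trickle of unprocessed rows to work on, without depending on any real feed.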

Assigning tasks to users

There are several conditions for assigning tasks to workers.

They should be signed up to work on tasks, and maybe on specific kinds of tasks. That is, there will be other people registered as users, so we need to identify those who are HIT workers.
We should let HIT workers say when they're available to work -- that might not be the instant they log in, and they'll want to take breaks.
An inactivity timer might be needed to detect whether the user has dropped offline or is stuck on something.
Workers will have different skills, and different levels of ability and training on those skills. For production work (not training), we'd want to assign tasks that are appropriate for each worker's skills and skill level. The requirements for skills will come from the task definition.
If the task definition calls for it, we'll need to dispatch each task to multiple workers.
We need to track who has worked on what task, even if they declined it, so that we don't assign a task to a worker more than once.
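The basic conditions above could be combined into one eligibility check. In this sketch, `Worker`, `eligible` and `assign` are hypothetical, the 10-minute idle timeout and the 1 to 5 skill levels are assumptions, and everything is held in memory where Eden would keep it in the database.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

IDLE_TIMEOUT = 10 * 60  # seconds; an assumed value, to be tuned in practice


@dataclass
class Worker:
    name: str
    is_hit_worker: bool     # signed up for HITs (not just a registered user)
    available: bool         # has said they're ready, i.e. not on a break
    last_seen: float        # epoch seconds of last activity in the task form
    skills: Dict[str, int] = field(default_factory=dict)  # skill -> level 1..5


def eligible(worker: Worker, required: Dict[str, int], item_id: str,
             history: Set[Tuple[str, str]], now: float) -> bool:
    """True if this worker may be offered this item."""
    if not (worker.is_hit_worker and worker.available):
        return False
    if now - worker.last_seen > IDLE_TIMEOUT:     # inactivity timer: dropped or stuck?
        return False
    if (worker.name, item_id) in history:         # already offered, even if declined
        return False
    return all(worker.skills.get(s, 0) >= level for s, level in required.items())


def assign(item_id: str, required: Dict[str, int], workers_needed: int,
           workers: List[Worker], history: Set[Tuple[str, str]],
           now: Optional[float] = None) -> List[str]:
    """Choose up to workers_needed eligible workers, most skilled first,
    and record each pairing so the item is never re-offered to them."""
    now = time.time() if now is None else now
    pool = [w for w in workers if eligible(w, required, item_id, history, now)]
    pool.sort(key=lambda w: sum(w.skills.get(s, 0) for s in required), reverse=True)
    chosen = pool[:workers_needed]
    for w in chosen:
        history.add((w.name, item_id))
    return [w.name for w in chosen]
```

Ranking the most skilled workers first is just one policy; it tends to funnel work to the same few people, so a round-robin or least-recently-assigned order might be better for keeping everyone engaged.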

Sometimes the assignment might be adjusted (these are beyond the basics):

When we're training new workers, we might want to give them tasks to practice on even if their level of ability isn't high. We might not even count their results toward completion of a task, but just compare them against the other assigned workers.
Experienced workers with higher performance ratings on a task might need fewer other workers assigned to the same task. Eventually a worker might be promoted to "expert", and become a reviewer for other workers.
If the task instructions change, we could inject a notification. We could also use this to prompt the worker to take breaks or to deliver performance "rewards".

There is a volunteer management component in Eden that is currently under revision. It has simple support for specifying worker skills, and can also serve to identify HIT workers. So that work on this project is not disrupted by those changes, we can fork the volunteer component and modify it as needed. Any useful or necessary mods can be fed back into the new version later.

Presenting tasks to workers

The task will need a form that displays the data to the worker and accepts their answer.

The form should be assembled based on what's specified in the task definition.
- Appropriate means of displaying the data.
- Widget that implements the required input method.
- A way to make instructions and help info available without distracting an experienced worker.
The worker should be allowed to decline a task in case they don't think they can do it.
Letting them annotate the task with comments or concerns could be helpful in diagnosing problems with the input data, or with the HIT system.
Customizing the layout or look-and-feel might improve usability.
A means of contacting the user via the form could help in determining if they're still online and active.
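Assembling the form from the task definition might look roughly like this, producing plain HTML from a list of field descriptions. The dict layout of `fields`, the `submit` action URL and the function names are assumptions. Instructions go in a collapsed `<details>` element so they stay available without distracting an experienced worker, and the message text is HTML-escaped because crowdsourced input may contain markup.

```python
from html import escape
from typing import Dict, List


def render_field(f: Dict) -> str:
    """HTML for one input, chosen by the widget named in the task definition."""
    name = escape(f["name"])
    label = escape(f.get("label", f["name"]))
    if f["widget"] == "text":
        return f'<label>{label} <input type="text" name="{name}"></label>'
    if f["widget"] in ("radio", "multiselect"):
        kind = "radio" if f["widget"] == "radio" else "checkbox"
        choices = "".join(
            f'<label><input type="{kind}" name="{name}" value="{escape(o)}"> {escape(o)}</label>'
            for o in f["options"]
        )
        return f"<fieldset><legend>{label}</legend>{choices}</fieldset>"
    # Combo-box and map widgets would need JavaScript; not covered in this sketch.
    raise ValueError(f"widget not handled in this sketch: {f['widget']}")


def render_task_form(item_id: int, item_text: str, instructions: str,
                     fields: List[Dict]) -> str:
    """Assemble the worker's form: data, collapsible help, inputs, comment, decline."""
    parts = [
        '<form method="post" action="submit">',
        f'<input type="hidden" name="item_id" value="{item_id}">',
        f"<blockquote>{escape(item_text)}</blockquote>",
        f"<details><summary>Instructions</summary>{escape(instructions)}</details>",
        *(render_field(f) for f in fields),
        '<textarea name="comment" placeholder="Comments or concerns (optional)"></textarea>',
        '<button name="action" value="submit">Submit</button>',
        '<button name="action" value="decline">Decline this task</button>',
        "</form>",
    ]
    return "\n".join(parts)
```

In Eden itself the form would more likely be built with the framework's own form helpers; plain strings just keep the sketch self-contained.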

Collating, comparing, verifying results

Evaluating worker performance

Providing feedback

Caution: Yer humble author is embarking on a <fanatic expostulation>.

One might expect the purpose of providing feedback to be performance improvement, and certainly that's desired. But some common means of supplying feedback are a bad model. A productive worker is not one who's in the dumps from being told their work quality is lousy and they're a bad person for having done it poorly, and they're not as good as that other person over there. Rather, a productive worker is an engaged and enthusiastic worker who feels good about what they're doing. (I'd like to say, a productive worker is a happy and secure worker -- because this is true -- but we're expecting people to be doing this work during disasters. "Happy" is an unlikely state when one has been watching a video feed of the tsunami roaring into Minamisanriku.) Besides just performance improvement, we want to keep our trained workers working, and coming back day after day, and they won't come back as readily if they're made to feel unworthy, and they won't perform as well if they're only there because of grim determination to help.

What to do? Well, fortunately, this is a well-studied problem, and -- however rarely the solution is applied in most employment settings -- it's pretty much a solved problem. The context in which it's been studied, and the solutions, come from the game industry.

The underlying principle is to deliver rewards -- little jolts of self-satisfaction that let people feel good about themselves and what they're doing. With that, people will absorb the "constructive feedback" as training that lets them get better and so get more rewards. They'll do this even if they know they're being jollied along, and rather enjoy that too. Think of your friends, or your kids (or -- admit it -- yourself) spending far too long playing, oh, say, WoW or Angry Birds or Sudoku or Rabbids Go Home or... This isn't a huge industry because people get some material goods or practical benefit out of it. And people are playing avidly without even a Good Cause to motivate them. So, if we take a page from the game industry's playbook on how to get people revved up and wanting to improve, and on top of that have the icing of Doing Good, we should have a hit, not just HITs.

Ok, so, what do we actually do here? Since the effectiveness of a game's rewards goes straight to the company's bottom line, they may not give out such details as actual reward pacing and effectiveness of various types of rewards. Yes, yes, there are actual references on this topic. But surely the best methods are proprietary. Fortunately, they're also on display -- extracting them just might take a little research. Now tell me -- where have you heard a better excuse for playing games?

</fanatic expostulation>

Some suggested subsets for GSoC

These are several pieces that can probably be done independently:

Implement the task queue. This is the "backbone" of the project.
Task definition seems like it could be paired naturally not just with the UI for setting up tasks, but also with generating the task presentation forms, since those are based on the task definition.
Performance evaluation and feedback could be paired with collating and comparing results from the multiple workers assigned to each task, plus the expert evaluation of results, as the result scoring feeds into performance evaluation.
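To illustrate how result scoring could feed performance evaluation, each worker's answers can be counted against the accepted consensus, and against the known answer for any test items inserted into the queue. `score_workers` and `accuracy` are hypothetical names; the add-one smoothing in `accuracy` (so a newcomer starts at 0.5 rather than at an extreme) is one common choice, not something this page specifies.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


def score_workers(
    answers: Iterable[Tuple[str, str, Hashable]],
    accepted: Dict[str, Hashable],
    gold: Dict[str, Hashable],
) -> Dict[str, Tuple[int, int]]:
    """Count, per worker, how many scorable answers matched the reference.

    answers:  (worker, item_id, answer) triples
    accepted: item_id -> consensus answer, for completed items
    gold:     item_id -> known answer, for test items inserted as checks
    Returns worker -> (matched, scored). Items that are still open or that
    failed have no reference answer and are skipped.
    """
    tally: Dict[str, List[int]] = defaultdict(lambda: [0, 0])
    for worker, item, answer in answers:
        reference = gold[item] if item in gold else accepted.get(item)
        if reference is None:
            continue
        tally[worker][1] += 1
        if answer == reference:
            tally[worker][0] += 1
    return {w: (m, s) for w, (m, s) in tally.items()}


def accuracy(matched: int, scored: int) -> float:
    """Add-one smoothed accuracy: 0.5 with no data, approaching matched/scored."""
    return (matched + 1) / (scored + 2)
```

Given the feedback section above, numbers like these are probably better used behind the scenes, to choose training material and pace rewards, than shown to workers as a ranking.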


