Version 11 (modified by Pat Tressel, 13 years ago)


BluePrint for Human Intelligence Task processing (a.k.a. "Job Jar")

Caution to GSoC students considering this project
This is a difficult project, involving statistics, human performance evaluation, and coming up with a way to allow an administrator to define an arbitrary task. Recommended for advanced students with an interest in HCI. We will limit the scope of the project to what can be done during the summer -- this could easily be split into multiple projects. See the end of the page for some suggested subsets that could be GSoC projects.


During a disaster, people on the scene may report via social media or text messages. Emergency managers might find useful information there, but it is buried in a large volume of fragmentary messages, possibly in a language the emergency management (EM) personnel do not know. Extracting that information is not easily automated, e.g. determining whether a message is a request for aid (and specifically what, how much, and where). To clean up this crowdsourced information, we need... another crowd, but one that is trained, or at least learns as it goes, and whose work is cross-checked for accuracy. A similar need holds for data gathering done by volunteers.

Units of work like these are called "human intelligence tasks" (HITs), and what we want to produce is a system for managing them. Examples of HIT platforms are Amazon's Mechanical Turk and CrowdFlower. We want a system tailored to the needs of emergency managers, and to working with crowdsourced data during emergencies.

See also: (only s/Ushahidi/Sahana Eden/ ;-) And in June, see article on Playsourcing in:


The goal is to manage tasks performed on the site, e.g. data entry or cleaning crowdsourced data. The system should:

  • Provide administrator UI for defining tasks.
  • Allow reading data for tasks from specified sources.
  • Provide tasks to users in a web form.
  • Assign tasks to workers based on skills and / or measured performance, or let workers select tasks.
  • Compare results of multiple workers on the same task.
  • Decide when a task is sufficiently complete.
  • Insert tasks with known results as tests.
  • Evaluate worker performance.
  • Administer training and testing for new workers.

Project breakdown

Task definition

Provide a web form or wizard that allows the administrator to:

  • Specify the task input data source.
    • Note: The actual connection to the data source, and the process of reading from it, should be isolated. This is not specific to this project -- reading from a remote feed can be used for many purposes. Here, we could assume that the data is being placed in one or more database tables, or in files on the server.
    • Most commonly each task will operate on one item of data -- assume this to start.
  • Provide instructions for workers.
  • Specify what worker skills are needed.
  • Specify the format of the results, e.g.:
    • Text input.
    • Radio buttons or exclusive select from a list.
    • Multi-select boxes or list.
    • Combo-box, i.e. either exclusive or multi-select, but also allowing the worker to add a new option.
    • Selection of locations on a map.
  • Advanced setup:
    • Allow specifying form layout.
    • Multiple data sources.
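
As a rough illustration of what the form or wizard above might store, here is a minimal sketch of a task definition record. All names (`TaskDefinition`, `ResultFormat`, the field names, and the validation rules) are assumptions made for this example; the blueprint does not specify a data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class ResultFormat(Enum):
    """Result formats listed in the blueprint (names are hypothetical)."""
    TEXT = "text"
    SINGLE_SELECT = "single_select"   # radio buttons / exclusive select
    MULTI_SELECT = "multi_select"     # multi-select boxes or list
    COMBO_BOX = "combo_box"           # select, plus adding a new option
    MAP_LOCATION = "map_location"     # selection of locations on a map


@dataclass
class TaskDefinition:
    name: str
    data_source: str                  # assumed: a table name or server file path
    instructions: str
    result_format: ResultFormat
    required_skills: set = field(default_factory=set)
    options: list = field(default_factory=list)
    allow_new_option: bool = False

    def validate(self):
        """Return a list of problems; an empty list means the definition is usable."""
        problems = []
        if not self.instructions.strip():
            problems.append("instructions are empty")
        needs_options = self.result_format in (
            ResultFormat.SINGLE_SELECT,
            ResultFormat.MULTI_SELECT,
        )
        if needs_options and not self.options:
            problems.append("selection formats need at least one option")
        if self.allow_new_option and self.result_format is not ResultFormat.COMBO_BOX:
            problems.append("only a combo-box may accept new options")
        return problems
```

A definition for the "is this a request for aid?" example might use `ResultFormat.SINGLE_SELECT` with options such as `["yes", "no"]`. Keeping `data_source` as an opaque string reflects the note above that the connection to the data source should be isolated from the task definition.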

Task assembly and presentation

Assigning tasks to users
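
The overview lists assigning tasks "based on skills and / or measured performance". One simple way this could work is sketched below: keep only workers who hold every required skill and meet a minimum accuracy, then rank them. The worker record layout (`skills`, `accuracy`) and the `min_accuracy` threshold are assumptions for illustration, not part of the blueprint.

```python
def eligible_workers(required_skills, workers, min_accuracy=0.0):
    """Return names of workers who hold every required skill, best first.

    `workers` maps a worker name to a dict with the hypothetical keys
    'skills' (a set of skill names) and 'accuracy' (a float in [0, 1]).
    """
    required = set(required_skills)
    matches = [
        (info["accuracy"], name)
        for name, info in workers.items()
        if required <= info["skills"] and info["accuracy"] >= min_accuracy
    ]
    # Sort by accuracy descending, then by name so ties have a stable order.
    matches.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in matches]
```

Letting workers select their own tasks, the other option mentioned in the overview, could reuse the same filter in the opposite direction: show each worker only the tasks whose required skills they hold.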

Collating, comparing, verifying results
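
The overview asks the system to compare results of multiple workers on the same task and to decide when a task is sufficiently complete. The blueprint does not say how; a simple majority vote with an agreement threshold is one possibility, sketched here. The function name and the defaults for `min_answers` and `min_agreement` are assumptions chosen for the example.

```python
from collections import Counter


def consensus(answers, min_answers=3, min_agreement=0.6):
    """Return the agreed answer, or None if the task needs more workers.

    `answers` is a list of hashable results from different workers
    for the same task.
    """
    if len(answers) < max(min_answers, 1):
        return None
    top_answer, votes = Counter(answers).most_common(1)[0]
    # Accept only if a large enough share of workers gave the top answer.
    if votes / len(answers) >= min_agreement:
        return top_answer
    return None
```

A task whose answers return `None` would stay open and be offered to additional workers. This scheme only suits discrete results such as radio buttons or select lists; free-text input and map locations would need a similarity measure instead of exact equality.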

Evaluating worker performance
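
The overview proposes inserting tasks with known results as tests. One straightforward measure built on that idea, offered here only as a sketch, is the fraction of known-result tasks a worker answered correctly. The function name and the dict-based inputs are assumptions for illustration.

```python
def gold_accuracy(responses, gold):
    """Fraction of known-result tasks the worker answered correctly.

    `responses` maps task id -> the worker's answer; `gold` maps
    task id -> the known correct answer. Tasks without a known result
    are ignored. Returns None if the worker has not yet answered any
    known-result task.
    """
    scored = [task_id for task_id in responses if task_id in gold]
    if not scored:
        return None
    correct = sum(1 for task_id in scored if responses[task_id] == gold[task_id])
    return correct / len(scored)
```

Returning `None` rather than 0 for a worker with no scored tasks keeps "unknown" distinct from "always wrong", which matters when new workers are still in training.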

Providing feedback

