= GHC Social Media HIT Processing =

[[TOC]]

== Introduction ==

 * Receive tweets and/or SMS messages from the public.
 * Dispatch these to online workers to classify and geocode.
 * Display on a map.

=== Background ===

During the Haiti earthquake of Jan 2010, people trapped in buildings sent SMS messages to a designated shortcode. These were classified, translated, and geocoded by online workers using [https://www.mturk.com Amazon's Mechanical Turk], then provided to emergency managers.

During the Kenya 2013 general election, citizens and trained election monitors reported election-related incidents via SMS and Twitter. These were automatically entered into a map database. Online workers then vetted them, removing spam and contacting senders for clarification, before the information was made public. See: https://uchaguzi.co.ke/

During a Random Hacks of Kindness hackathon in 2010, a variant of this project was implemented using Sahana Eden as the back end and a custom web page (not automatically generated by Eden) as the front end. This was designed as a training game: workers got "experience points" and were awarded badges. See: http://gwob.org/101010-hackathon-winners/

===== Project breakdown =====

This project is intended to be easy to subdivide into tasks that can be worked on somewhat independently and in parallel, given the choice of a few naming conventions for new database tables and fields.

To keep our work together, and distinct from other work, we'll add a new module. This is the first step in adding "human intelligence task" processing, in which results are verified by sending the same task to multiple workers and comparing the results. So let's call our new module "hit".
That means the controller file will be:
{{{
eden/controllers/hit.py
}}}
The model will be:
{{{
eden/modules/s3db/hit.py
}}}
The view pages will be in the directory:
{{{
eden/views/hit
}}}

Everyone may find it useful to refer to:
 * The lesson on "making a new module" in the Eden book:[[BR]] http://booki.flossmanuals.net/sahana-eden/_draft/_v/1.0/building-a-new-application/
 * The index of Eden APIs:[[BR]] http://eden.sahanafoundation.org/wiki/S3

We'll also deal with some data outside the new module.

Received messages are stored in the "message log" table, msg_message.[[BR]] https://github.com/flavour/eden/blob/master/modules/s3db/msg.py#L93

(This is a special kind of table that Eden calls a "superentity". It is like a superclass, but for database tables. Records in several specialized tables have "parent" records in a shared superentity table. Other tables can then refer to any of the specialized tables by linking to the superentity record, without needing a foreign key field for every one. References to ordinary non-superentity tables are simpler.)

Workers will sign up for accounts on an Eden site. When they sign up, they will have a record in the {{{auth_user}}} table and a record in the {{{pr_person}}} table for profile information. Because other things besides people have addresses and similar details, there is a superentity for person-like types. But when we know we're referring to an actual person, we refer to their record in {{{pr_person}}}.

===== Fill in required "new module" boilerplate =====

Look at the lesson on "making a new module" in the Eden book:[[BR]] http://booki.flossmanuals.net/sahana-eden/_draft/_v/1.0/building-a-new-application/

That lesson puts the model file in the {{{eden/models}}} directory, but only to avoid complication. Models in {{{eden/models}}} are loaded on every http request, whether they're needed or not. Most Eden models are in {{{eden/modules/s3db}}}, and are only loaded by http requests that need them.
Since our message processing won't be used by most types of requests, we want it in {{{eden/modules/s3db}}}.

Add the new module to the list of enabled modules. This is normally specified in a "template" that has the customizations for a particular site. Here, we will "cheat" and just add the new module to our configuration file {{{eden/models/000_config.py}}}. Get the default module list from {{{eden/private/templates/default/config.py}}}. Copy it to {{{eden/models/000_config.py}}} and add an entry for the hit module.

===== Add a database table for message processing tasks =====

We want to add a table that holds the data entered by one worker for one message. The table will need fields for:
 * A foreign key reference to the msg_message table. Here is another table with such a reference:[[BR]] https://github.com/flavour/eden/blob/master/modules/s3db/msg.py#L1609
 * A category that the worker will assign. This can be just a text field for now. Here's an example of a text field (the from address in a message):[[BR]] https://github.com/flavour/eden/blob/master/modules/s3db/msg.py#L98 [[BR]] (The category will be empty until the worker fills it in, so we can't require that it be non-empty.)
 * A location that the worker will enter by filling out a form, clicking on a map, or both. There is a standard widget for selecting locations that will be included automatically if the location is specified as in this example:[[BR]] https://github.com/flavour/eden/blob/master/modules/s3db/cr.py#L219 [[BR]] Here is where the function that generates the foreign key reference is defined:[[BR]] https://github.com/flavour/eden/blob/master/modules/s3db/gis.py#L265 [[BR]] The name for the function used outside the gis module includes the "gis_" prefix to avoid ambiguity.

Why do we want a separate table? Why not just add a category and location to the msg_message table?
Eventually, we want to do "human intelligence task" processing, in which results are verified by sending the same task to multiple workers and comparing the results. So we may have more than one set of results for each message. We also want to record which worker did each task, so we can check the quality of their work and refer them to more training if needed.

===== Track which messages have been processed =====

We don't want to add fields to {{{msg_message}}} just for our module. But we need a way to tell when a message has been processed, so we can select unprocessed messages to give to workers. Later, when we add more human intelligence task features such as sending the same task to multiple workers, we'll need somewhere to record how complete the work is for one message.

So we may want a table that gets a new record each time a message intended for processing arrives. The table should be defined in the same model file as the task table. It should refer to the {{{msg_message}}} record in the same way as above, and should have a boolean field for whether the message is processed. (Later, this can be changed to support processing by multiple workers, but for now we can consider a message done when one worker has processed it.)

We don't want to enter all messages, just the ones for our workers (e.g. incoming email for individuals should not be sent to workers). We'll need a way to recognize our messages, e.g. a Twitter direct message recipient or hashtag, or a particular SMS shortcode.

There is a new and somewhat experimental feature for selecting out messages, which may be useful for this, but documentation is lacking. This will require consultation on IRC. If we cannot use this, we can (temporarily) add specialized code in the {{{msg}}} module's incoming message handling to pick out the desired messages and create records for them.
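
The tracking approach described above can be sketched outside Eden, using Python's built-in {{{sqlite3}}} module in place of Eden's model layer. This is only an illustration under stated assumptions, not Eden code: the table name {{{hit_message}}}, its field names, the helper functions, and the {{{#report}}} hashtag are all hypothetical, and {{{msg_message}}} is reduced to a single body field.

```python
import sqlite3


def create_tables(db):
    # Stand-in for Eden's msg_message table, reduced to what the sketch needs.
    db.execute("CREATE TABLE msg_message (id INTEGER PRIMARY KEY, body TEXT)")
    # Hypothetical tracking table: one row per message intended for workers,
    # with a boolean-like flag recording whether it has been processed.
    db.execute(
        "CREATE TABLE hit_message ("
        " id INTEGER PRIMARY KEY,"
        " message_id INTEGER NOT NULL UNIQUE REFERENCES msg_message(id),"
        " processed INTEGER NOT NULL DEFAULT 0)"
    )


def is_for_workers(body, hashtag="#report"):
    # Hypothetical filter: treat a message as ours if it carries the hashtag.
    return hashtag.lower() in body.lower().split()


def receive(db, body):
    # Log every message, but create a tracking row only for our messages.
    message_id = db.execute(
        "INSERT INTO msg_message (body) VALUES (?)", (body,)
    ).lastrowid
    if is_for_workers(body):
        db.execute("INSERT INTO hit_message (message_id) VALUES (?)", (message_id,))
    return message_id


def next_unprocessed(db):
    # Oldest tracked message that no worker has processed yet, or None.
    row = db.execute(
        "SELECT message_id FROM hit_message WHERE processed = 0 ORDER BY id LIMIT 1"
    ).fetchone()
    return row[0] if row else None


def mark_processed(db, message_id):
    db.execute(
        "UPDATE hit_message SET processed = 1 WHERE message_id = ?", (message_id,)
    )


db = sqlite3.connect(":memory:")
create_tables(db)
first = receive(db, "Bridge out on Main St #report")
receive(db, "Lunch on Friday?")  # not for workers, so never tracked
second = receive(db, "Need water at the school #report")
```

Keeping the flag in its own table leaves {{{msg_message}}} untouched, and the single flag could later be replaced by a count of completed tasks when the same message goes to several workers.
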
===== Add a controller function to generate task pages for workers =====

Look at:
 * Other controllers in {{{eden/controllers}}}
 * The documentation for the controller helper function:[[BR]] wiki:S3/S3REST/s3_rest_controller
 * The documentation for a custom controller:[[BR]] http://eden.sahanafoundation.org/wiki/S3/S3Method

Eden will automatically generate pages that correspond to database tables (list forms) or individual records (read or edit forms), or empty forms for adding new records (create forms). However, when a worker requests a task, there is not yet any database record for the task. Instead, we want to:
 * Select a new, not-yet-processed message.
 * Create a record in the (new) hit_task table (being worked on by the model team).
 * Return the standard form for the new record to the user.

===== Add a view that presents a task to the worker and submits their work =====

The worker will get an autogenerated "edit" form, produced by the controller and view code together. If a plain edit form is OK, this may be a very short task, and no special view file may be needed. So this can probably be left until after the controller is done.

===== Generate a list of categories from the database =====

We would like to encourage workers to use existing categories when there is a close enough match, but still let them add new ones if not. So we want to give the worker a menu of categories to choose from, consisting of all the categories currently found in the category field (being added by the team working on the new tables), plus a way to add a new category.
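
One way to build such a menu is sketched below, again with Python's built-in {{{sqlite3}}} module rather than Eden's own model layer. The {{{message_id}}} field name, the helper names, and the "Other (add new)" label are hypothetical. Treating spellings that differ only in case or surrounding whitespace as the same category is an assumption made here, not something the project has decided.

```python
import sqlite3


def existing_categories(db):
    # Collect every non-empty category already entered by workers.
    rows = db.execute(
        "SELECT category FROM hit_task WHERE category IS NOT NULL ORDER BY id"
    ).fetchall()
    seen = {}
    for (category,) in rows:
        label = category.strip()
        if label:
            # Assumption: ignore case when de-duplicating, keep first spelling.
            seen.setdefault(label.lower(), label)
    return sorted(seen.values(), key=str.lower)


def menu_options(db, other_label="Other (add new)"):
    # Existing categories first, then an entry that lets the worker add one.
    return existing_categories(db) + [other_label]


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE hit_task (id INTEGER PRIMARY KEY, message_id INTEGER, category TEXT)"
)
db.executemany(
    "INSERT INTO hit_task (message_id, category) VALUES (?, ?)",
    [(1, "Medical"), (2, None), (3, "medical "), (4, "Flooding"), (5, "")],
)
options = menu_options(db)
```

Tasks whose category is still empty are skipped, which matters because the category stays empty until a worker fills it in.
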