GHC Social Media HIT Processing
Table of Contents
- Current System
- HIT Module
- Project breakdown
- Fill in required "new module" boilerplate
- Add a database table for message processing tasks
- Track which messages have been processed
- Add a controller function to generate task pages for workers
- Add a view that presents a task to the worker and submits their work
- Generate a list of categories from the database
- See Also
- Receive tweets and / or SMS messages from the public.
- Dispatch these to online workers to classify and geocode.
- Display on a map.
During the Haiti earthquake of Jan 2010, people trapped in buildings sent SMS messages to a designated shortcode. These were classified, translated, and geocoded by online workers using Amazon's Mechanical Turk, then provided to emergency managers.
During the Kenya 2013 general election, citizens and trained election monitors reported election-related incidents via SMS and Twitter. These were automatically entered into a map database, then vetted by online workers, who removed spam and contacted senders for clarification before the information was made public. See: https://uchaguzi.co.ke/
During a Random Hacks of Kindness hackathon in 2010, a variant of this project was implemented using Sahana Eden as the back end and a custom web page (not automatically generated by Eden) as the front end. This was designed as a training game: workers got "experience points" and were awarded badges. See: http://gwob.org/101010-hackathon-winners/
For ease of getting a large dataset to play with, we will use Twitter for this example, although the same system can be used for other message channels, like Email, SMS & RSS.
Test Server to see User Interface:
Install on your own system:
- Install Eden
- Install the TwitterSearch library
- Follow the User Guidelines to get a Twitter OAuth account & use this to search
Note: Documentation for both end-users & devs could use improvement
- feel free to dive in!
There are 2 projects that we can work on during this session:
- (A) Some simple enhancements to the current interface
- (B) Creating a 'Human Intelligence Tasking' module to allow the processing of these messages to be divided up amongst a lot of workers
The current workflow for a user to search Twitter is very cumbersome!
Some ideas for improvements (although feel free to come up with your own!)
- There should be an option (on by default) to have the Search run after save
- The next screen should be the Results:
s3db.configure("msg_twitter_search", create_next = URL(f="twitter_result"))
- There should be some filters above the results:
- There should be a link to see the results on a Map
- this will require modifying the code to have the msg_twitter_result table use the self.gis_location_id() foreign key instead of lat/lon fields
- The S3Map() method is then automatically accessible via /eden/msg/twitter_result/map
- This will use the same filters defined for the list view
- There should be a Report method configured for charts based on pivot tables
- We could create a 'Summary' view which allows the Table, Chart & Map to be separate tabs on a single page, sharing a common set of filters, keeping the settings when moving between the views
- Example: http://demo.lacrmt.sahanafoundation.org/eden/vulnerability/risk/summary
- This is accessible via /eden/msg/twitter_result/summary
- The Chart, Table, Filter and Map settings are inherited
- The configuration of which Tabs to display can be seen in the CRMT template
- Can see this locally by switching to the CRMT template:
settings.base.template = "CRMT"
in models/000_config.py & doing a fresh prepopulate
- Run Parsers on the list
- Train these parsers
- We want to add tools to the Table view to Geocode, Classify, Whitelist sender, Blacklist sender
- This overlaps with the HIT module approach
- Timeline Report?
- Work on KeyGraph visualisation?
This project is intended to be easy to subdivide into tasks that can be worked on somewhat independently and in parallel, given the choice of a few naming conventions for new database tables and fields.
In order to keep our work together, and distinct from other work, we'll add a new module. This is the first step in adding "human intelligence task" processing, in which results are verified by sending the same task to multiple workers and comparing the results. So let's call our new module "hit". That means the controller file will be:
The model will be:
The view pages will be in the directory:
Everyone may find it useful to refer to:
- The lesson on "making a new module" in the Eden book:
- The index of Eden APIs:
We'll deal with some data outside the new module, such as:
Received messages are stored in the "message log" table, msg_message.
(This is a special kind of table called (in Eden terminology) a "superentity". This is like a superclass but for database tables. Records in multiple specialized tables have "parent" records in a shared superentity table, so other tables can refer to any of the specialized tables without needing a foreign key field for every one, by instead linking to the superentity record. References to ordinary non-superentity tables are simpler.)
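The superentity idea can be illustrated outside Eden with plain SQL. The sketch below uses Python's sqlite3 module rather than Eden's model API, and every table and column name in it (message_super, tweet, sms, note, instance_type) is hypothetical, chosen only to show the shape of the relationships.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Shared "superentity" table: one parent row per message of any kind.
    CREATE TABLE message_super (
        message_id    INTEGER PRIMARY KEY,
        instance_type TEXT NOT NULL   -- which specialized table holds the details
    );
    -- Specialized tables each point at their parent row.
    CREATE TABLE tweet (id INTEGER PRIMARY KEY,
                        message_id INTEGER REFERENCES message_super(message_id),
                        body TEXT);
    CREATE TABLE sms   (id INTEGER PRIMARY KEY,
                        message_id INTEGER REFERENCES message_super(message_id),
                        body TEXT);
    -- Any other table needs only ONE foreign key to reach either kind.
    CREATE TABLE note  (id INTEGER PRIMARY KEY,
                        message_id INTEGER REFERENCES message_super(message_id),
                        comment TEXT);
""")

def add_message(kind, body):
    """Insert a parent row, then the specialized row that refers to it."""
    if kind not in ("tweet", "sms"):
        raise ValueError(kind)
    cur = db.execute("INSERT INTO message_super (instance_type) VALUES (?)", (kind,))
    message_id = cur.lastrowid
    db.execute(f"INSERT INTO {kind} (message_id, body) VALUES (?, ?)", (message_id, body))
    return message_id

tweet_id = add_message("tweet", "Road blocked near the bridge")
sms_id = add_message("sms", "Need water at the school")
# The note refers to the SMS through the superentity key, not through the sms table.
db.execute("INSERT INTO note (message_id, comment) VALUES (?, ?)", (sms_id, "urgent"))
```

The note table never needs to know whether it is attached to a tweet or an SMS, which is the benefit described above.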
Workers will sign up for accounts on an Eden site. When they sign up, they will have a record in the auth_user table and a record in the pr_person table for profile information. Because there are other things besides people that have addresses and such, there is a superentity for person-like types. But when we know we're referring to an actual person, we refer to their record in pr_person.
Fill in required "new module" boilerplate
Look at the lesson on "making a new module" in the Eden book:
That puts the model file in the eden/models directory, but that is just to avoid complication.
Files in eden/models are loaded on every HTTP request, whether they're needed or not.
Most Eden models are in eden/modules/s3db, and are only loaded by HTTP requests that need them.
Since our message processing won't be used by most types of requests, we want it in
Add the new module to the list of enabled modules. This is normally specified in a "template" that has the
customizations for a particular site. Here, we will "cheat" and just add the new module to our configuration
eden/models/000_config.py. Get the default module list from
Copy it to
models/000_config.py and add an entry for the hit module.
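As a rough sketch of what "add an entry" might look like, assuming the module list is an ordered mapping from module name to a small settings object: the Storage class and T function below are stand-ins so the example runs outside Eden (in a real 000_config.py they come from the web2py/Eden environment), and the field values for the hit entry are assumptions, not values taken from this page.

```python
from collections import OrderedDict

# Stand-ins so the sketch runs outside Eden; assumptions, not the real classes.
class Storage(dict):
    """Dict whose keys can also be read as attributes."""
    __getattr__ = dict.get

def T(text):
    """Placeholder for the translation function."""
    return text

# A shortened stand-in for the default module list.
modules = OrderedDict([
    ("default", Storage(name_nice=T("Home"), restricted=False, module_type=None)),
    ("msg", Storage(name_nice=T("Messaging"), restricted=True, module_type=None)),
])

# The new entry for our module (field values are illustrative guesses).
modules["hit"] = Storage(
    name_nice=T("Human Intelligence Tasking"),
    restricted=True,
    module_type=10,
)
```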
Add a database table for message processing tasks
We want to add a table that holds the data entered by one worker for one message. The table will need fields for:
- A foreign key reference to the msg_message table. Here is another table with such a reference:
- A category that the worker will assign. This can be just a text field for now. Here's an example of a
text field (the from address in a message):
(The category will be empty until the worker fills it in, so we can't require that it be non-empty.)
- A location that the worker will enter by filling out a form and/or clicking on a map.
There is a standard widget for selecting locations that will be included automatically if the
location is specified as in this example:
Here is where the function that generates the foreign key reference is defined:
The name for the function used outside the gis module includes the "gis_" prefix to avoid ambiguity.
Why do we want a separate table? Why not just add a category and location to the msg_message table? Eventually, we want to do "human intelligence task" processing, in which results are verified by sending the same task to multiple workers, and comparing the results. So we may have more than one set of results for each message. We want to include which worker did each task, so we can check the quality of their work and refer them to more training if needed.
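To make the "several results per message" idea concrete, here is a framework-free sketch using Python's sqlite3 module instead of Eden's model API. The msg_message table here is a two-column stand-in for the real one, the worker_id, lat and lon columns and the agreed_category helper are hypothetical, and the simple majority rule is only one possible way to compare results.

```python
import sqlite3
from collections import Counter

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE msg_message (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE hit_task (
        id         INTEGER PRIMARY KEY,
        message_id INTEGER NOT NULL REFERENCES msg_message(id),
        worker_id  INTEGER NOT NULL,  -- hypothetical: who did the task
        category   TEXT,              -- nullable: empty until the worker fills it in
        lat REAL, lon REAL            -- stand-in for a location reference
    );
""")

def agreed_category(message_id, min_votes=2):
    """Return the category most workers chose, or None without enough agreement."""
    rows = db.execute(
        "SELECT category FROM hit_task "
        "WHERE message_id = ? AND category IS NOT NULL ORDER BY id",
        (message_id,),
    ).fetchall()
    if not rows:
        return None
    category, votes = Counter(r[0] for r in rows).most_common(1)[0]
    return category if votes >= min_votes else None

db.execute("INSERT INTO msg_message (id, body) VALUES (1, 'Bridge is down')")
db.executemany(
    "INSERT INTO hit_task (message_id, worker_id, category) VALUES (?, ?, ?)",
    [(1, 10, "infrastructure"), (1, 11, "infrastructure"), (1, 12, "medical")],
)
print(agreed_category(1))
```

Because each row records the worker, the same table can later be used to spot workers whose answers often disagree with the majority.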
Track which messages have been processed
We don't want to add fields to
msg_message just for our module. But we need a way to tell when
a message has been processed, so we can select unprocessed messages to give to users. Later, when we
add more human intelligence task features such as sending the same task to multiple workers, we'll need
somewhere to record how complete the work is for one message.
So we may want to add a table that gets a record added when each message intended for processing arrives.
The table should be defined in the same model file as the task table. It should refer to the message
record in the same way as above, and should have a boolean field for whether the message is processed.
(Later, this can be changed to support processing by multiple workers, but for now, we can consider it
done when one worker has processed it.)
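A minimal sketch of such a tracking table, again in plain sqlite3 rather than Eden's model API. The table name hit_message_status, its columns, and the three helper functions are hypothetical, and msg_message is reduced to a stand-in.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE msg_message (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE hit_message_status (
        id         INTEGER PRIMARY KEY,
        message_id INTEGER NOT NULL UNIQUE REFERENCES msg_message(id),
        processed  INTEGER NOT NULL DEFAULT 0   -- SQLite stores booleans as 0/1
    );
""")

def track(message_id):
    """Record that a message is waiting to be processed."""
    db.execute("INSERT INTO hit_message_status (message_id) VALUES (?)", (message_id,))

def mark_processed(message_id):
    """Flag a message as done once a worker has processed it."""
    db.execute(
        "UPDATE hit_message_status SET processed = 1 WHERE message_id = ?",
        (message_id,),
    )

def unprocessed_ids():
    """Return the ids of tracked messages that still need a worker."""
    rows = db.execute(
        "SELECT message_id FROM hit_message_status "
        "WHERE processed = 0 ORDER BY message_id"
    )
    return [r[0] for r in rows]

for i, body in enumerate(["Need water", "Road blocked", "Clinic open"], start=1):
    db.execute("INSERT INTO msg_message (id, body) VALUES (?, ?)", (i, body))
    track(i)
mark_processed(2)
print(unprocessed_ids())
```

Replacing the boolean with a counter or a "required results" field would be one way to support multiple workers later, without touching msg_message.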
We don't want to enter all messages, just ones for our workers (e.g. incoming email for individuals
should not be sent to workers). We'll need a way to recognize our messages, e.g. a Twitter direct message
recipient or hashtag, or a particular SMS shortcode. There is a new and somewhat experimental feature
for selecting out messages, which may be useful for this, but documentation is lacking. This will
require consultation on IRC. If we cannot use this, we can (temporarily) add specialized code in the
msg module's incoming message handling to pick out the desired messages and create records for them.
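The temporary "specialized code" could be as small as a predicate like the one below. This is a sketch only: the message is modelled as a plain dict with hypothetical keys (channel, to_address, body), and the hashtag and shortcode values are made up for illustration rather than taken from any real deployment.

```python
import re

# Hypothetical selection rules; real values would come from site configuration.
HASHTAGS = {"#quakehelp"}
SHORTCODES = {"12345"}

def is_for_workers(message):
    """Decide whether an incoming message should become a worker task."""
    channel = message.get("channel")
    if channel == "sms":
        # Only SMS sent to our designated shortcode is dispatched.
        return message.get("to_address") in SHORTCODES
    if channel == "twitter":
        # Match hashtags case-insensitively.
        tags = {t.lower() for t in re.findall(r"#\w+", message.get("body", ""))}
        return bool(tags & HASHTAGS)
    return False  # e.g. personal email is never sent to workers

print(is_for_workers({"channel": "twitter", "body": "Trapped on 3rd floor #QuakeHelp"}))
```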
Add a controller function to generate task pages for workers
- Other controllers in
- The documentation for the controller helper function:
- The documentation for a custom controller:
Eden will automatically generate pages that correspond to database tables (list forms) or individual records (read or edit forms), or empty forms for adding new records (create forms). However, when a worker requests a task, there is not yet any database record for the task. Instead, we want to:
- Select a new, not-yet-processed message.
- Create a record in the (new) hit_task table (being worked on by the model team).
- Return the standard form for the new record to the user.
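The first two steps above can be sketched without Eden as follows. This uses sqlite3 with stand-in tables; hit_message_status, the column names and the next_task function are hypothetical. In a real Eden controller, the returned task id would then be used to show the standard edit form for that record.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE msg_message (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE hit_message_status (
        message_id INTEGER PRIMARY KEY,
        processed  INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE hit_task (
        id         INTEGER PRIMARY KEY,
        message_id INTEGER NOT NULL,
        worker_id  INTEGER NOT NULL,
        category   TEXT
    );
""")

def next_task(worker_id):
    """Create a task for the oldest unassigned message; return its id, or None."""
    # Step 1: select a message that is unprocessed and not already handed out.
    row = db.execute("""
        SELECT s.message_id FROM hit_message_status s
        WHERE s.processed = 0
          AND NOT EXISTS (SELECT 1 FROM hit_task t WHERE t.message_id = s.message_id)
        ORDER BY s.message_id LIMIT 1
    """).fetchone()
    if row is None:
        return None  # nothing left to hand out
    # Step 2: create the (still empty) task record for this worker.
    cur = db.execute(
        "INSERT INTO hit_task (message_id, worker_id) VALUES (?, ?)",
        (row[0], worker_id),
    )
    return cur.lastrowid

for i, body in enumerate(["Need water", "Road blocked"], start=1):
    db.execute("INSERT INTO msg_message (id, body) VALUES (?, ?)", (i, body))
    db.execute("INSERT INTO hit_message_status (message_id) VALUES (?)", (i,))

first = next_task(worker_id=7)
second = next_task(worker_id=8)
third = next_task(worker_id=9)   # no messages left
```

Skipping messages that already have a task fits the single-worker phase; when the same task is sent to several workers, that check would become a count against a target number of results.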
Add a view that presents a task to the worker and submits their work
The worker will get an autogenerated "edit" form, produced by the controller and view code together. If a plain edit form is OK, then this may be a very short task: it may be that no special view file is needed. So this can probably be left until after the controller is done.
Generate a list of categories from the database
We would like to encourage workers to use existing categories when there is a close enough match, but still be able to add new ones if not. So we want to give the worker a menu of all the categories currently found in the category field of the new task table, plus a way to add a new category.
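One way to build that menu is to collect the distinct, non-empty values already stored in the category field. The sketch below uses sqlite3 with a stand-in hit_task table; the helper names and the case-insensitive matching rule are assumptions about what "close enough" might mean, not a decided design.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE hit_task (id INTEGER PRIMARY KEY, category TEXT)")
db.executemany(
    "INSERT INTO hit_task (category) VALUES (?)",
    [("medical",), ("Medical ",), ("shelter",), (None,), ("",)],
)

def category_options():
    """Return existing categories, de-duplicated ignoring case and padding."""
    rows = db.execute(
        "SELECT category FROM hit_task WHERE category IS NOT NULL ORDER BY id"
    )
    seen = {}
    for (raw,) in rows:
        label = raw.strip()
        if label:
            # Keep the first spelling seen for each case-insensitive key.
            seen.setdefault(label.lower(), label)
    return sorted(seen.values(), key=str.lower)

def choose_category(typed):
    """Reuse an existing category when the typed text matches one, else accept it as new."""
    label = typed.strip()
    for existing in category_options():
        if existing.lower() == label.lower():
            return existing
    return label

print(category_options())
```

Normalizing before saving keeps near-duplicates such as "Medical " from multiplying in the menu.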