wiki:BluePrint/Messaging/ExtendingParsing

Version 16 (modified by Fran Boon, 10 years ago)

--

Blueprint for extending the message parsing framework

The inbound message parsing framework was developed during GSoC 2012. See the 2012 GSoC message parser project.

  • The framework is highly extensible, and the parsing workflows are customisable per deployment in the templates. A good example is the NLTK synonym-matching filter developed during the H4D2 hackathon (see here).
  • The system supports multiple communication channels, i.e. Email, SMS and Twitter. However, a number of other incoming feeds (not just SMS/Tweets, but also RSS feeds, etc.) can be integrated with the system, so plugging in RSS feeds would be one useful next step.
  • The things that we want to extract, and the essential requirements for the framework, are discussed below.
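The synonym-matching idea can be sketched as follows. This is an illustrative, hypothetical sketch only (not the actual H4D2 filter): the synonym table is hard-coded here, whereas a real deployment could populate it from NLTK's WordNet.

```python
# Hypothetical sketch of keyword matching with synonym expansion, in the
# spirit of the NLTK synonym-matching filter mentioned above.
# The SYNONYMS table is illustrative; a real deployment could build it
# from NLTK's WordNet synsets.

SYNONYMS = {
    "flood": {"flooding", "inundation", "deluge"},
    "fire": {"blaze", "wildfire"},
}

def expand_keywords(keywords):
    """Return the keywords plus any known synonyms."""
    expanded = set()
    for kw in keywords:
        expanded.add(kw)
        expanded.update(SYNONYMS.get(kw, ()))
    return expanded

def matches(message, keywords):
    """True if any (expanded) keyword appears in the message."""
    words = set(message.lower().split())
    return bool(expand_keywords(keywords) & words)
```

With this in place, a message mentioning "inundation" still matches a deployment's "flood" keyword without every synonym having to be configured by hand.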

Data Model Changes

  • Make msg_message a Super Entity
  • Each Channel would have an instance of this super entity, which acts as the InBox and/or OutBox as appropriate for that instance type
  • The 'Master Message Log' then becomes the view of the super-entity (rather than having to copy messages here)
  • Move non-core fields to component tables so that the core tables are uncluttered & fast
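The restructuring above can be sketched in plain Python (illustrative only, not actual S3/web2py code): each channel's inbox holds only channel-specific fields plus a reference into one shared message table, so the 'Master Message Log' is a view over that table rather than a copy.

```python
# Plain-Python illustration of the proposed super-entity pattern.
# `messages` plays the role of the msg_message super entity: core fields
# are stored here exactly once, and the Master Message Log is just a view.

messages = []   # the msg_message super entity: one row per message

def receive(channel, body):
    """Store the message once in the super entity; return its super-key."""
    message_id = len(messages)
    messages.append({"id": message_id, "channel": channel, "body": body})
    return message_id

# Instance tables (e.g. msg_email, msg_sms) keep only their super-key
# plus channel-specific fields, keeping the core table uncluttered.
email_inbox = [receive("email", "Volunteer address bounced")]
sms_inbox = [receive("sms", "FLOOD at Main St bridge")]

def master_log():
    """The Master Message Log becomes a view of the super entity."""
    return messages
```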

Input Source Improvements

Reliability/trustworthiness of the message sources/senders

  • Currently, this is done manually through the CRUD interface with the msg_sender data model.

  • A 'river' of messages is processed, with starring of senders and adding of keywords on the fly, so that the system gradually becomes more automated through the process.
  • We could also pre-populate the keyword database with the most frequently used keywords (especially in incident reporting), with the rest added on the fly.
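The workflow above could look roughly like this. Names and thresholds are illustrative, not the actual msg_sender data model: starring a sender raises their priority, keywords can be added on the fly, and both feed a simple triage step.

```python
# Hypothetical sketch of the 'river' triage workflow described above.

sender_priority = {}                          # sender -> number of stars
keywords = {"flood", "fire", "earthquake"}    # pre-populated incident keywords

def star_sender(sender):
    """Star a sender on the fly, raising their priority."""
    sender_priority[sender] = sender_priority.get(sender, 0) + 1

def add_keyword(word):
    """Add a keyword to the filters on the fly."""
    keywords.add(word.lower())

def triage(sender, message):
    """Route a message: 'urgent' for a starred sender or a keyword hit."""
    hit = any(kw in message.lower() for kw in keywords)
    if sender_priority.get(sender, 0) > 0 or hit:
        return "urgent"
    return "review"
```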

Parser Improvements

Topic Detection

  • KeyGraph is used to detect topics across tweets and other feeds, to separate relevant, actionable information from the rest. This is done after a loose filtering of the information based on keywords and location.
  • See http://keygraph.codeplex.com/ .
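The loose-filtering step that precedes topic detection can be sketched as below. The keyword and location lists are illustrative; the survivors would then be handed to a topic-detection algorithm such as KeyGraph.

```python
# Minimal sketch of loose pre-filtering by keyword and location,
# as described above, before topic detection (e.g. KeyGraph).
# KEYWORDS and LOCATIONS are illustrative placeholders.

KEYWORDS = {"flood", "collapse", "trapped"}
LOCATIONS = {"main st", "riverside"}

def loose_filter(feed):
    """Keep only items mentioning both a keyword and a known location."""
    kept = []
    for text in feed:
        low = text.lower()
        if any(k in low for k in KEYWORDS) and any(l in low for l in LOCATIONS):
            kept.append(text)
    return kept   # candidates for topic detection
```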

Actionability

  • Is this something that we can actually do something with?
  • It's important to manage the content coming from the various message sources and separate the messages that are actionable and contain useful information from the rest.

"Whom Should I Follow? Identifying Relevant Users During Crises":

Location

  • Another important requirement is to improve the ability to extract location data from unstructured text and make sense of ambiguous locations.
  • An OpenGeoSMS parser already exists in the default parser template (also available as an API within s3msg.py), which can parse the lat-lon information of a location from OpenGeoSMS-formatted messages. It would be even better if this were linked with the database (i.e. looking the location up from the database).
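A hedged sketch of the lat-lon extraction, similar in spirit to (but not copied from) the parser in s3msg.py. It assumes the common OpenGeoSMS form, a maps URL carrying `q=<lat>,<lon>`; looking the coordinates up against the location database would be the next step.

```python
import re

# Sketch of OpenGeoSMS lat-lon extraction. Assumes the usual form of an
# OpenGeoSMS message, which begins with a maps URL such as
#   http://maps.google.com/maps?q=25.033,121.565&GeoSMS
# This is an illustration, not the actual s3msg.py parser.

OPENGEOSMS = re.compile(r"maps\?q=(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)")

def parse_opengeosms(text):
    """Return (lat, lon) as floats, or None if no OpenGeoSMS URL is found."""
    match = OPENGEOSMS.search(text)
    if not match:
        return None
    return float(match.group(1)), float(match.group(2))
```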

UI Improvements

http://tldrmpdemo.aidiq.com/eden/org/organisation/1/profile

  • S3Summary:

https://sahana.mybalsamiq.com/projects/sahanacommunityresiliencemappingprojectfinal/naked/Risk+Summary?key=ff49e93ddf8139e5eb61065660c796caa6f95845
http://i.imgur.com/jjaDmQ1.png
http://twitris.knoesis.org/indiarain2013/

(subscribe/unsubscribe)

  • Features
    • See all Messages in a datatable/list across media types (FB/Twitter/RSS/YouTube/Flickr)
    • Filter them
    • Add Sender to Whitelist/Blacklist
    • Add Keyword to back-end filters
    • View Images/Video
    • Find Situation Reports
      • ReliefWeb, etc.
    • Grouping/Linking results both to enhance validity & also provide a single point of entry
    • Route to other Sahana Modules
    • Drag and Drop between Raw source & Target Module
    • Mark for Action
      • create Tasks
      • create Incident Reports
      • create Assessments
      • create Situation Reports
    • Forward via Outbound Channels (Public e.g. Twitter & Private e.g. Email/SMS)
    • Semantic Search?
    • RDF Channel?

Use Cases

Parsing bounced messages

  • This is very important for IFRC Africa, who send out bulk emails to their volunteer base from Eden and want to know which addresses are mis-typed, which users have moved, etc.
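A minimal, stdlib-only sketch of what a bounce parser for this use case could do: extract the failed recipient from a standard delivery-status-notification (RFC 3464) bounce. Real bounces vary widely, so production code would need to handle more formats.

```python
from email import message_from_string

# Sketch: pull the Final-Recipient address out of a multipart/report
# bounce, so Eden could flag mis-typed or stale volunteer addresses.
# Illustrative only; bounce formats in the wild are far messier.

def bounced_recipient(raw_bounce):
    """Return the Final-Recipient address from a DSN bounce, or None."""
    msg = message_from_string(raw_bounce)
    for part in msg.walk():
        if part.get_content_type() == "message/delivery-status":
            # The delivery-status body parses into per-block sub-messages.
            for status in part.get_payload():
                final = status.get("Final-Recipient")
                if final:                 # e.g. "rfc822; user@example.org"
                    return final.split(";")[-1].strip()
    return None
```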
