= Blueprint for extending the message parsing framework =

[[TOC]]

The inbound message parsing framework was developed during GSoC 2012. See the [wiki:BluePrint/Messaging/Parsing 2012 GSoC message parser project].

 * The framework is highly extensible and the parsing workflows are customisable per deployment in the templates. A nice example of this is the NLTK synonym-matching filter developed during the H4D2 hackathon (see [https://github.com/flavour/eden/blob/master/modules/templates/default/parser.py#L67 here]).
 * The system supports multiple communication channels, i.e. Email, SMS and Twitter. However, a number of other incoming feeds (so not just SMS/Tweets, but also RSS feeds, etc.) can be integrated with the system, so plugging in RSS feeds would be one useful step.
 * The things that we want to extract, and the essential requirements for the framework, are discussed below.

== Data Model Changes ==

 * Make msg_message a Super Entity
 * Each Channel would have an instance of this super entity which acts as the InBox and/or OutBox, as appropriate for that instance type
 * The 'Master Message Log' then becomes the view of the super-entity (rather than having to copy messages there)
 * Move non-core fields to component tables so that the core tables are uncluttered & fast

== Input Source Improvements ==

=== Reliability/trustworthiness of the message sources/senders ===

 * Currently, this is done manually through the CRUD interface with the msg_sender data model.
 * A 'river' of messages is processed, with starring of senders & adding of keywords on the fly, so that the system gradually becomes more automated through the process.
 * We could also pre-populate the keywords database with the most frequently used keywords (esp. in incident reporting), with the rest added on the fly.

== Parser Improvements ==

=== Topic Detection ===

 * KeyGraph is used to detect topics across tweets/other feeds, in order to filter the relevant and actionable information from the rest. This is done after a loose filtering of the information based on keywords and location.
 * See http://keygraph.codeplex.com/

=== Actionability ===

 * Is this something that we can actually do something with?
 * It is important to manage the content coming from the various message sources and to separate the messages that are actionable and contain useful information from the rest.
 * "Whom Should I Follow? Identifying Relevant Users During Crises": http://www.public.asu.edu/~huanliu/papers/ht2013.pdf

=== Location ===

 * Another important requirement is to improve the ability to extract location data from unstructured text and to make sense of ambiguous locations.
 * An OpenGeoSMS parser already exists in the default parser template (also available as an API within s3msg.py), which is able to parse the lat-lon of the location from OpenGeoSMS-formatted messages. However, it would be great if this could be linked with the database (look the location up from the database).

== UI Improvements ==

 * Implementing a UI which prioritises message parsing for starred senders is a useful requirement.
 * The user should be able to *star* senders and *mark* keywords through the UI; a rough sketch of how a parsing workflow might use starred senders & marked keywords follows below.
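As a rough illustration, the sketch below shows how a deployment-specific workflow in a template's parser.py might prioritise starred senders and marked keywords. The field and method names used here (msg_sender.priority, msg_keyword.keyword, parse_prioritised) and the reply behaviour are assumptions for illustration only, not the current Eden schema or API:

{{{#!python
# -*- coding: utf-8 -*-
# Illustrative only -- table/field names and the reply behaviour are assumptions

from gluon import current

class S3Parser(object):
    """ Deployment-specific parsing workflows (templates/<template>/parser.py) """

    @staticmethod
    def parse_prioritised(message):
        """
            Keep messages from starred (high-priority) senders and/or
            messages matching marked keywords; leave everything else
            for manual triage.
        """
        db = current.db
        s3db = current.s3db

        # Is the sender starred? (assumed msg_sender.priority field)
        stable = s3db.msg_sender
        row = db(stable.sender == message.from_address).select(stable.priority,
                                                               limitby=(0, 1)
                                                               ).first()
        starred = row is not None and row.priority > 0

        # Does the body contain any marked keyword? (assumed msg_keyword table)
        ktable = s3db.msg_keyword
        keywords = db(ktable.deleted == False).select(ktable.keyword)
        body = (message.body or "").lower()
        matched = [k.keyword for k in keywords if k.keyword.lower() in body]

        if starred or matched:
            # Hand the message on for normal processing (e.g. create an
            # Incident Report); here we just return a confirmation reply
            return "Thanks, your report has been received"
        # No reply => the message is left in the queue for manual triage
        return None
}}}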
 * Possible inspirations:
   * dataLists with Filters (like TweetDeck):
     * http://tldrmpdemo.aidiq.com/eden/default/index/updates
   * S3Profile: http://tldrmpdemo.aidiq.com/eden/org/organisation/1/profile
   * S3Summary: https://sahana.mybalsamiq.com/projects/sahanacommunityresiliencemappingprojectfinal/naked/Risk+Summary?key=ff49e93ddf8139e5eb61065660c796caa6f95845
   * http://i.imgur.com/jjaDmQ1.png
   * http://twitris.knoesis.org/indiarain2013/
   * Tweak the Tweet (EPIC)
     * http://idisaster.wordpress.com/2013/05/28/more-research-on-boston-marathon-official-twitter-activity-smem/
   * https://github.com/ushahidi/SwiftRiver
   * http://twitcident.com
     * http://wis.ewi.tudelft.nl/twitcident/
 * Different Users:
   * Power user looking for info themselves
   * Dedicated miner (volunteer/intern/junior) mining stuff for decision makers
 * Vision: be able to move between Filtered view & Firehose in gradual increments. Be able to train the automated assistants to make the filtered view more useful (subscribe/unsubscribe)
 * Features:
   * See all Messages in a datatable/list across media types (FB/Twitter/RSS/YouTube/Flickr)
   * Filter them
   * Add Sender to Whitelist/Blacklist
   * Add Keyword to back-end filters
   * View Images/Video
   * Find Situation Reports
     * ReliefWeb, etc.
   * Grouping/Linking results, both to enhance validity & also to provide a single point of entry
   * Route to other Sahana Modules
     * Drag and Drop between Raw source & Target Module
     * Mark for Action
       * create Tasks
       * create Incident Reports
       * create Assessments
       * create Situation Reports
   * Forward via Outbound Channels (Public, e.g. Twitter & Private, e.g. Email/SMS)
   * Semantic Search?
   * RDF Channel?

== Use Cases ==

=== Parsing bounced messages ===

 * This is very important for IFRC Africa, who send out bulk emails to their volunteer base from Eden & want to know which mails are mis-typed / users moved / etc.
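As a rough illustration of this use case, the sketch below uses Python's standard email package to recognise an RFC 3464 delivery status notification (the usual machine-readable bounce format) and extract the failed recipient address. The function name and the way it would be wired into Eden's email channel are assumptions, not existing code:

{{{#!python
# Illustrative sketch only: detect a bounce (delivery status notification)
# and return the address that failed, so that it can be flagged against the
# corresponding volunteer record.

from email import message_from_string

def parse_bounce(raw_email):
    """
        Return the failed recipient address if this message is a bounce
        (multipart/report with a delivery-status part), else None.
    """
    msg = message_from_string(raw_email)
    if msg.get_content_type() != "multipart/report" or \
       msg.get_param("report-type") != "delivery-status":
        return None
    for part in msg.walk():
        if part.get_content_type() == "message/delivery-status":
            # The delivery-status part parses as a series of header blocks
            for status in part.get_payload():
                if status.get("Action", "").lower().startswith("fail"):
                    # Final-Recipient is formatted as "rfc822; user@example.org"
                    recipient = status.get("Final-Recipient", "")
                    return recipient.split(";")[-1].strip()
    return None
}}}

Not all mail servers send standards-compliant bounces, so a production version would probably also need some heuristics (e.g. matching common "Undelivered Mail" subject lines), plus a scheduled task to flag the affected email addresses as invalid in the database.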