Blueprint for extending the message parsing framework
The inbound message parsing framework was developed during GSoC 2012. See the 2012 GSoC message parser project.
- The framework is highly extensible, and the parsing workflows are customisable per deployment in the templates. A nice example of this is the NLTK synonym matching filter developed during the H4D2 hackathon (see here).
- The system supports multiple communication channels, i.e. email, SMS and Twitter. Other incoming feeds (not just SMS and tweets, but also RSS feeds, etc.) can be integrated with the system as well, so plugging in RSS feeds would be one useful step.
- The things that we want to extract, and that are essential requirements for the framework, are discussed below.
Input Source Improvements
Reliability/trustworthiness of the message sources/senders
- Currently, this is done manually through the CRUD interface with the msg_sender data model.
- A 'river' of messages is processed, with senders starred and keywords added on the fly, so that the system gradually becomes more automated through the process.
- We could also pre-populate the keywords database with the most frequently used keywords (especially those used in incident reporting); the rest can be added on the fly.
- Is this something that we can actually do something with?
- It's important to manage the content coming from the various message sources and to separate the messages that are actionable and contain useful information from the rest.
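The separation of actionable messages described above could be sketched roughly as follows. This is only an illustration: the message dictionaries, the `STAR_BONUS` weight, the keyword weights and the function names are all assumptions made for the example, not part of the existing msg_sender model or parser templates.

```python
import re

# Hypothetical weight added when the sender has been starred in the UI.
STAR_BONUS = 2


def score_message(message, starred_senders, keywords):
    """Score one message from its keyword hits and its sender's star status.

    `message` is assumed to be a dict with "sender" and "body" keys;
    `keywords` maps a lower-case keyword to an integer weight.
    """
    words = set(re.findall(r"[a-z0-9]+", message["body"].lower()))
    score = sum(weight for kw, weight in keywords.items() if kw in words)
    if message["sender"] in starred_senders:
        score += STAR_BONUS
    return score


def prioritise(messages, starred_senders, keywords, threshold=1):
    """Split messages into (actionable, rest), actionable sorted by score."""
    scored = [(score_message(m, starred_senders, keywords), m) for m in messages]
    # sorted() is stable, so equal scores keep their arrival order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    actionable = [m for s, m in scored if s >= threshold]
    rest = [m for s, m in scored if s < threshold]
    return actionable, rest
```

Starring a sender or adding a keyword then only changes the two lookup structures passed in, which is what would let the process become gradually more automated without code changes.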
"Whom Should I Follow? Identifying Relevant Users During Crises":
- Another important requirement is to improve the ability to extract location data from unstructured text and to make sense of ambiguous locations.
- An OpenGeoSMS parser already exists in the default parser template (also available as an API within s3msg.py), which is able to parse the lat-lon information of a location from OpenGeoSMS-formatted messages. However, it would be great if it could be linked with the database, so that locations are looked up from the database.
- Implementing a UI which prioritises message parsing for starred senders is a useful requirement.
- The user should be able to *star* senders and *mark* keywords through the UI.
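To make the location extraction idea above more concrete, here is a minimal sketch that first tries OpenGeoSMS-style coordinates and then falls back to a name lookup. It assumes the message starts with a URL whose first query parameter holds `lat,lon` and whose query ends with `&GeoSMS`; the regular expression, the `KNOWN_LOCATIONS` dictionary (a stand-in for a real database lookup, with approximate illustrative coordinates) and the function name are hypothetical and do not reflect the existing parser in s3msg.py.

```python
import re

# Assumed shape of an OpenGeoSMS line: a URL whose first query parameter
# carries "lat,lon" and whose query string ends with "&GeoSMS".
GEOSMS_RE = re.compile(
    r"https?://\S+?\?\w+=(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)\S*?&GeoSMS",
    re.IGNORECASE,
)

# Hypothetical stand-in for a location table: lower-case name -> (lat, lon).
KNOWN_LOCATIONS = {"taipei": (25.0330, 121.5654)}


def extract_location(text, gazetteer=KNOWN_LOCATIONS):
    """Return (lat, lon) from GeoSMS coordinates or a known place name."""
    match = GEOSMS_RE.search(text)
    if match:
        lat, lon = float(match.group(1)), float(match.group(2))
        # Reject values that cannot be WGS84 coordinates.
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            return lat, lon
    # Fallback: look each word up in the gazetteer (single-word names only).
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in gazetteer:
            return gazetteer[word]
    return None
```

A single-word lookup like this does nothing for ambiguous names; resolving those would need extra context, such as the sender's usual area, which is left out of the sketch.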
Parsing bounced messages
- This is very important for IFRC Africa, who send out bulk emails to their volunteer base from Eden and want to know which addresses were mistyped, which users have moved, etc.
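One possible starting point for bounce parsing is sketched below, using only Python's standard `email` package. It assumes the bounce is a standard delivery status notification (RFC 3464) containing a `message/delivery-status` part; many real bounces are free-form text and would need additional heuristics. The function name and the returned dictionary keys are illustrative choices.

```python
import email


def parse_bounce(raw):
    """Extract failed recipients from an RFC 3464 style bounce message.

    Returns a list of dicts with "address", "action" and "status" keys.
    Bounces without a message/delivery-status part yield an empty list.
    """
    msg = email.message_from_string(raw)
    failures = []
    for part in msg.walk():
        if part.get_content_type() != "message/delivery-status":
            continue
        payload = part.get_payload()
        if not isinstance(payload, list):
            continue
        # Each block of status headers is parsed as its own Message object.
        for block in payload:
            recipient = block.get("Final-Recipient")
            if recipient is None:
                continue  # e.g. the per-message block with Reporting-MTA
            # The field looks like "rfc822; user@example.org".
            address = recipient.split(";", 1)[-1].strip()
            failures.append({
                "address": address,
                "action": (block.get("Action") or "").strip().lower(),
                "status": (block.get("Status") or "").strip(),
            })
    return failures
```

The `status` value is an enhanced status code; codes beginning with `5.` indicate permanent failures, which is the kind of signal that could be used to flag a volunteer's address for correction.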
Situational Awareness Dashboard
- The aim is to present a single, simplified view of many disparate information sources, including RSS feeds.
- Decision makers have many different feeds of information flowing to them from both internal and external sources. External sources in particular may come in the form of RSS feeds. Having the system prioritise and classify items before they reach the decision makers (instead of manual screening) would make their task much easier.
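As a rough sketch of how RSS items might be pulled in and classified before reaching a dashboard, the example below parses an RSS 2.0 document with the standard library and tags each item with matching categories. The category names, their search terms and the function names are assumptions for illustration only; a real deployment would presumably define them in its template.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text):
    """Turn the <item> elements of an RSS 2.0 document into plain dicts."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "summary": (item.findtext("description") or "").strip(),
        })
    return items


def classify(item, categories):
    """Return the sorted names of all categories whose terms occur in the item.

    `categories` maps a category name to a list of lower-case search terms.
    Matching is a plain substring test, so it is deliberately crude.
    """
    text = (item["title"] + " " + item["summary"]).lower()
    return sorted(
        name for name, terms in categories.items()
        if any(term in text for term in terms)
    )
```

Items that match no category could be shown last or hidden by default, which would give the simplified single view without discarding any source.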