
Version 22 (modified by Dominic König, 12 years ago)


Inbound Message Parsing

Introduction

This is a project proposal for GSoC 2012. We need to parse inbound messages, with an initial focus on SMS.

We can test this with Clickatell or a local phone (the Clickatell inbound SMS functionality still needs to be developed, and this could be in scope for this project).

Where should the code live?

Currently message parsing is done in the core code: modules/s3/s3msg.py

We want to be able to make this a deployment-specific set of options.

We are still developing our Profile Layer, which will keep deployment-specific files separate from core code. Until it is ready, we can start by making this a deployment template like 000_config.py, with the file copied to modules/s3/s3parsing.py for easy import into S3MSG.

How does the code get run?

The Inbound Message receiving tasks should be run via the Scheduler:

Before exiting, each of those threads should trigger a new scheduled task (once, now) to do the parsing.

All Parser Workflows should be defined in tasks.py.

The configuration links these tasks to Inbound Sources, along with any other args, in a new msg_workflow table:

  • so each row can simply hold 2 FKs to the scheduler_task table (the source task and the workflow task)
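A minimal sketch of this flow, assuming the receiving task is a plain Python function and that scheduling is abstracted behind a `queue_task` callable. `queue_task`, `MSG_WORKFLOW` and the source and workflow names are hypothetical illustrations, not existing Eden APIs; in a web2py deployment the queuing role would fall to the Scheduler.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the proposed msg_workflow table: each row links an
# inbound source task to a parsing workflow task, plus any extra args.
MSG_WORKFLOW: List[Dict] = [
    {"source_task_id": "sms_inbound", "workflow_task_id": "parse_deployment_reply", "args": {"lang": "en"}},
    {"source_task_id": "email_inbound", "workflow_task_id": "parse_email", "args": {}},
]

def receive_messages(source_task_id: str,
                     fetch: Callable[[], List[str]],
                     store: Callable[[str, str], None],
                     queue_task: Callable[..., None]) -> int:
    """Receive and store messages, then queue one-off parsing tasks.

    Before the receiving task exits, every workflow linked to this source is
    queued to run once, now. Nothing is queued if no messages arrived.
    """
    messages = fetch()
    for text in messages:
        store(source_task_id, text)
    if messages:
        for row in MSG_WORKFLOW:
            if row["source_task_id"] == source_task_id:
                queue_task(row["workflow_task_id"], repeats=1, **row["args"])
    return len(messages)

# Usage with in-memory stand-ins for the message store and the scheduler
stored, queued = [], []
count = receive_messages(
    "sms_inbound",
    fetch=lambda: ["Accepted", "Reject ETA 2h"],
    store=lambda source, text: stored.append((source, text)),
    queue_task=lambda name, **kwargs: queued.append((name, kwargs)),
)
```

Only the workflow linked to `sms_inbound` is queued, so adding a new parser for a source is a configuration change rather than a code change in the receiver.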

Tropo, however, is different, since it receives inbound connections to its controller. To make this consistent, a scheduled task can be created but set never to run on schedule, and instead be run from the controller. Even better, the controller can run the same code as the scheduled task, but inline: since this is pretty much a no-op other than triggering the parser, it won't hold up the Tropo connection, and it avoids having to spawn a new thread.

Parsing

We want to be able to process OpenGeoSMS.

If we need more complex parsing, we can make use of pyparsing.
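For the OpenGeoSMS case, a regular expression from the standard library may be enough before reaching for pyparsing. The sketch below assumes the message layout is a first line holding a map URL with `q=<lat>,<lon>` followed by a `&GeoSMS` marker, with the free-text body on the following lines; check this assumption against the OpenGeoSMS specification and against what `prepare_opengeosms()` actually emits. `parse_opengeosms` and `GeoSMS` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

class GeoSMS(NamedTuple):
    lat: float
    lon: float
    text: str

# Assumed layout: "...?q=<lat>,<lon>&GeoSMS..." on the first line, body below.
_FIRST_LINE = re.compile(r"[?&]q=(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?)&GeoSMS\b")

def parse_opengeosms(message: str) -> Optional[GeoSMS]:
    """Return the location and body of an OpenGeoSMS-style message, or None."""
    first, _, rest = message.partition("\n")
    match = _FIRST_LINE.search(first)
    if match is None:
        return None  # not OpenGeoSMS: leave it to the keyword parser
    lat, lon = float(match.group(1)), float(match.group(2))
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None  # reject impossible coordinates
    return GeoSMS(lat, lon, rest.strip())

sms = "http://maps.google.com/maps?q=24.197611,120.780512&GeoSMS\nFlooding at the bridge"
parsed = parse_opengeosms(sms)
```

Returning `None` for non-matching messages lets this check run first and fall through to ordinary keyword parsing.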

Parser

Routing

We want to be able to direct the message to the appropriate module to handle the data.

This could be done either by launching a real REST request or else simulating one via the API.

resource = s3db.resource("module_resourcename")
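One way to choose the target is a deployment-specific table mapping a leading keyword to a tablename of the form `module_resourcename`. The sketch below returns both the tablename, which would be handed to `s3db.resource()` when simulating the request via the API, and the path a real REST request might use. The `ROUTES` mapping, the `eden` application name and the `/<app>/<module>/<resourcename>` URL shape are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple

# Hypothetical deployment-specific routing table: keyword -> tablename.
ROUTES: Dict[str, str] = {
    "hospital": "hms_hospital",
    "person": "pr_person",
    "organisation": "org_organisation",
}

def route_message(text: str, app: str = "eden") -> Optional[Tuple[str, str]]:
    """Pick a target resource from the first word of the message.

    Returns (tablename, rest_path), or None if no route matches.
    """
    words = text.split()
    if not words:
        return None
    tablename = ROUTES.get(words[0].lower())
    if tablename is None:
        return None
    module, resourcename = tablename.split("_", 1)
    return tablename, f"/{app}/{module}/{resourcename}"

target = route_message("Hospital City beds")
```

Keeping routing separate from parsing means the same parser output could be sent either way, via the API or via REST, without changing the parser.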

tbc

Project Plan

Project Deliverable: The project aims to parse inbound messages, such as SMS from CERT responders after deployment. This enables the processing of responses to deployment notifications, which is essentially controlled by the module to which the message is routed.

Project Justification: Parsing of inbound messages is a critical utility for a trained volunteer group such as CERT (Community Emergency Response Teams), where communication between various deployments and volunteers plays a vital role. As this will be a deployment-specific option, the functionality becomes an important component of Sahana Eden.

Implementation Plan:

  • Keeping the development of the Profile Layer in mind, and since the functionality is part of the deployment-specific options, the rules for parsing are contained in private/templates/default/parser.py.

The current parsing rules implement the functionality in the following manner:

  1. The inbound message text is passed as an argument to the parse_message() method in the s3msg.py module.
  2. The text is matched with a predefined list of primary and contact keywords after splitting with whitespace as the delimiter.
  3. A database query is generated against the relevant table according to the matched keywords.
  4. The query retrieves the relevant field values, and a reply to the inbound message is generated from them.
  5. So far, these parsing rules have been implemented only for the 'Person', 'Hospital' and 'Organisation' modules.
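The steps above can be sketched against an in-memory SQLite database. This is not the actual `parse_message()` in s3msg.py: the keyword lists, the toy schema (every table has `name`, `phone` and `email` columns) and the reply format are assumptions made for illustration.

```python
import sqlite3
from typing import Optional

# Hypothetical keyword lists (the proposal keeps these in parser.py).
PRIMARY = {"person": "pr_person", "hospital": "hms_hospital", "organisation": "org_organisation"}
CONTACT = {"phone": "phone", "email": "email"}

def parse_message(text: str, db: sqlite3.Connection) -> Optional[str]:
    """Keyword-match an inbound SMS and build a reply from the database."""
    words = text.split()                                 # split on whitespace
    lowered = [w.lower() for w in words]
    table = next((PRIMARY[w] for w in lowered if w in PRIMARY), None)
    fields = list(dict.fromkeys(CONTACT[w] for w in lowered if w in CONTACT))
    # Words that are not keywords are treated as the name to look up.
    name = " ".join(w for w, low in zip(words, lowered)
                    if low not in PRIMARY and low not in CONTACT)
    if table is None or not fields or not name:
        return None
    # Table and column names come only from the whitelists above; the
    # user-supplied name is passed as a bound parameter.
    sql = f"SELECT name, {', '.join(fields)} FROM {table} WHERE name LIKE ?"
    rows = db.execute(sql, (f"%{name}%",)).fetchall()
    if not rows:
        return f"Nothing found for '{name}'."
    return "; ".join(f"{row[0]}: {', '.join(str(v) for v in row[1:])}" for row in rows)

# Toy schema for the demo (real Eden tables differ).
db = sqlite3.connect(":memory:")
for tablename in PRIMARY.values():
    db.execute(f"CREATE TABLE {tablename} (name TEXT, phone TEXT, email TEXT)")
db.execute("INSERT INTO hms_hospital VALUES ('City Hospital', '555-0100', 'city@example.org')")
```

Interpolating only whitelisted identifiers into the SQL, and binding the free-text part, keeps an SMS from injecting SQL into the query.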

Extending these rules to other modules can be in the scope of the project. One of the main issues will be identifying the messages that belong to a particular source, so that each source can have its own processing. This is handled by the data model, which defines a 'msg_workflow' table in the database linking the Source to the Workflow with any required args. The essential features of this approach are:

  1. The Parser workflow table links 'Source X' to 'Workflow Y'.
  2. Designing the details of 'Workflow Y' is a developer task.
  3. Linking 'Source X' to 'Workflow Y', on the other hand, is a configurable option.
  4. So, essentially, the Parser table links Source to Workflow with any other required args, and acts like a template for the scheduler_task table.
  • A task process_log() is defined in tasks.py. Its objective is to scan through all the messages in msg_log and pass those flagged as unparsed (is_parsed=False) on for parsing. The task is scheduled in zzz_1st_run.py, where it is chained to the relevant parsing task via the msg_workflow table: the 'source_task_id' field in msg_log is used to retrieve the respective parsing workflow_task_id from msg_workflow.
  • This also allows workflows to be chained, where the source for a workflow could be another workflow instead of an incoming source. We can have 2nd-pass Parser workflows which don't start directly from the Source but are plugged in to take the output of a 1st-pass one.
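The split between developer-defined workflows and configurable links might look like the sketch below: workflows are plain functions registered by name, and a msg_workflow-style row is expanded into a one-off task record. The record's keys are loosely modelled on web2py's scheduler_task table and are assumptions; `workflow`, `WORKFLOWS`, `LINK` and `as_scheduler_task` are hypothetical names.

```python
import json
from datetime import datetime, timezone
from typing import Callable, Dict, Optional

# Developer task: workflows are plain functions registered by name.
WORKFLOWS: Dict[str, Callable] = {}

def workflow(func: Callable) -> Callable:
    """Register a parsing workflow under its function name."""
    WORKFLOWS[func.__name__] = func
    return func

@workflow
def parse_deployment_reply(text: str, keywords=("accepted", "reject")) -> Optional[str]:
    """Hypothetical workflow: return the leading keyword if it is expected."""
    words = text.split()
    first = words[0].lower() if words else ""
    return first if first in keywords else None

# Configurable option: a msg_workflow-style row linking Source X to Workflow Y.
LINK = {
    "source_task_id": "cert_sms",
    "workflow_task_id": "parse_deployment_reply",
    "args": {"keywords": ["accepted", "reject"]},
}

def as_scheduler_task(link: Dict, now: Optional[datetime] = None) -> Dict:
    """Expand a link row into a one-off, scheduler_task-like record."""
    name = link["workflow_task_id"]
    if name not in WORKFLOWS:
        raise KeyError(f"unknown workflow {name!r}")
    return {
        "task_name": f"{link['source_task_id']}->{name}",
        "function_name": name,
        "vars": json.dumps(link["args"]),   # args stored as JSON
        "repeats": 1,                        # run once
        "start_time": now or datetime.now(timezone.utc),
    }

task = as_scheduler_task(LINK)
```

Validating the workflow name when the link is expanded catches configuration typos before anything is queued.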

Source -> process_log() -> 1st-pass parser -> detailed parser -> Module
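The pipeline above can be sketched with in-memory lists: process_log() picks up unparsed msg_log rows and follows msg_workflow links, where a workflow name can itself appear as a source so that a 2nd-pass parser runs on the output of a 1st-pass one. The two workflow functions and all names here are hypothetical; only `is_parsed` and `source_task_id` come from the proposed data model.

```python
from typing import Callable, Dict, List, Optional, Tuple

def first_pass(body: str) -> Optional[str]:
    """Hypothetical per-deployment filter: keep only deployment replies."""
    words = body.split()
    return body if words and words[0].lower() in ("accepted", "reject") else None

def deployment_reply(body: str) -> Dict[str, str]:
    """Hypothetical 2nd-pass parser: split the decision from the rest."""
    status, _, detail = body.partition(" ")
    return {"status": status.lower(), "detail": detail.strip()}

FUNCS: Dict[str, Callable] = {"first_pass": first_pass, "deployment_reply": deployment_reply}

# source -> workflow; "first_pass" is also a source, which chains the 2nd pass
MSG_WORKFLOW: Dict[str, str] = {"cert_sms": "first_pass", "first_pass": "deployment_reply"}

def process_log(log: List[Dict], links: Dict[str, str],
                funcs: Dict[str, Callable]) -> List[Tuple[int, object]]:
    results = []
    for msg in log:
        if msg["is_parsed"]:
            continue
        data, source, seen = msg["body"], msg["source_task_id"], set()
        # Follow the chain until no link exists, a step returns None,
        # or a misconfigured cycle would be entered.
        while data is not None and source in links and source not in seen:
            seen.add(source)
            source = links[source]
            data = funcs[source](data)
        results.append((msg["id"], data))
        msg["is_parsed"] = True  # processed, even if nothing matched
    return results

msg_log = [
    {"id": 1, "source_task_id": "cert_sms", "body": "Accepted ETA 2h", "is_parsed": False},
    {"id": 2, "source_task_id": "cert_sms", "body": "Reject", "is_parsed": True},
    {"id": 3, "source_task_id": "cert_sms", "body": "wrong number?", "is_parsed": False},
]
results = process_log(msg_log, MSG_WORKFLOW, FUNCS)
```

Marking a message as parsed even when no workflow matched avoids rescanning it on every run; whether unmatched messages should instead be kept for review is a deployment decision.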

  • Here, the 1st-pass parser is customized per deployment. It decides either which email source goes to a particular workflow (a simple msg_workflow link) or, based on other factors such as keywords, to which main workflow the messages should be passed.
  • The data model is integrated with the templates folders (or a sub-folder, say private/templates/parsing), which serve as the initial UI. The post-install UI will consist of a CRUD interface admin panel, a simple s3_rest_controller(). Eventually, however, this is planned to become part of the WebSetup.
  • We want to be able to direct the message to the appropriate module to handle the data. This could be done either by launching a real REST request or by simulating one via the API.
    resource = s3db.resource("module_resourcename")
    
  • Users can subscribe to messages which are routed to a specific resource. For this purpose, we can use the existing Save Search and Subscription functionality, where the user subscribes to new messages for a specific resource using a resource filter. The msg_log can be made a component of the resources: if it is a component, then when someone opens the resource, the messages will be there in a tab. If a message has to be tied to multiple resources, we can use a relationship (link) table.
  • Implementing or extending the utility for other modules, especially the IRS module, will be of real use: being able to log reports through SMS will be vital, and these reports can also use the OpenGeoSMS encoding standard (a LatLon generates a Google Maps URL) for integration with our Android client. A dedicated routine to generate OpenGeoSMS URLs already exists in s3msg.py as prepare_opengeosms(), so integration with the parsing routine won't be difficult. Other modules for which this can be implemented are 'Request' and 'Inventory'.
  • Finally the code will be tested on the system and the bugs (if any ;-) ) will be fixed.
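The link-table option for tying one message to several resources can be sketched in SQLite. The `msg_log_link` table, its generic `tablename` + `record_id` columns and the helper functions are hypothetical; in Eden this would more likely be defined as a DAL table and exposed as a component tab.

```python
import sqlite3
from typing import List

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE msg_log (id INTEGER PRIMARY KEY, body TEXT);
    -- hypothetical link table: one message may belong to several records
    CREATE TABLE msg_log_link (
        msg_id INTEGER NOT NULL REFERENCES msg_log(id),
        tablename TEXT NOT NULL,
        record_id INTEGER NOT NULL
    );
""")

def link_message(msg_id: int, tablename: str, record_id: int) -> None:
    """Tie a routed message to one record of a resource."""
    db.execute("INSERT INTO msg_log_link VALUES (?, ?, ?)", (msg_id, tablename, record_id))

def messages_for(tablename: str, record_id: int) -> List[str]:
    """Messages to list in a 'Messages' tab when this record is opened."""
    rows = db.execute(
        "SELECT m.body FROM msg_log AS m JOIN msg_log_link AS l ON l.msg_id = m.id "
        "WHERE l.tablename = ? AND l.record_id = ? ORDER BY m.id",
        (tablename, record_id),
    ).fetchall()
    return [body for (body,) in rows]

db.execute("INSERT INTO msg_log (id, body) VALUES (1, 'City Hospital: 12 beds free')")
link_message(1, "hms_hospital", 7)
link_message(1, "org_organisation", 3)
```

The same message row then appears under both records without being duplicated, which is the main reason to prefer a link table over a single foreign key on msg_log.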

Future Options:

  • Though the parsing rules will be generic, a few minor tweaks will be needed to apply them to other channels such as Email and Twitter.
  • One of the most valuable features that could be added here is making SMS communication more interactive: e.g. if the text body received does not match any of the expected keywords, the API dispatches a reply stating the expected format of the message.
  • Adapting the parsing rules to cover as wide a range of inbound messages as possible. This will involve building a wider collection of keywords to search for in every relevant module. Linking different labels across the DB to module-specific keywords will be really helpful. The list of primary keywords to be matched can also be made a deployment-specific option.
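The interactive reply could be as simple as checking the leading keyword against a deployment-specific list and, on a miss, answering with the expected formats. `EXPECTED_FORMATS`, `handle_sms` and the format strings are hypothetical; `parse` stands in for whatever parser the message would otherwise be passed to.

```python
from typing import Callable, Dict

# Hypothetical deployment-specific primary keywords and their expected formats.
EXPECTED_FORMATS: Dict[str, str] = {
    "hospital": "HOSPITAL <name> <phone|email>",
    "person": "PERSON <name> <phone|email>",
}

def handle_sms(text: str, parse: Callable[[str], str]) -> str:
    """Parse recognised messages; otherwise reply with the expected formats."""
    words = text.split()
    if words and words[0].lower() in EXPECTED_FORMATS:
        return parse(text)
    return "Sorry, message not understood. Expected: " + "; ".join(EXPECTED_FORMATS.values())

help_reply = handle_sms("where is help", parse=str.upper)
```

Because the keyword list is plain configuration, the same dictionary could drive both matching and the help text, so the two cannot drift apart.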

Data Model

Data Model Blueprint: msg_workflow

Field Name | Purpose | Datatype/table
source_task_id | Inbound Email/SMS source (another workflow in the case of chained workflows) | String corresponding to the username of the source
workflow_task_id | Parsing workflow | String corresponding to the name of the parsing function
s3.meta_fields() | Metadata |

Changes in msg_log:

Field Name | Purpose | Datatype/table
is_parsed | Parsing status of inbound messages | Boolean
source_task_id | Inbound Email/SMS source (another workflow in the case of chained workflows) | String corresponding to the username of the source
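Eden defines its tables with the web2py DAL; the plain SQLite DDL below only illustrates the proposed columns and the join that process_log() would need. The `created_on` column is a stand-in for `s3.meta_fields()`, and the `body` column plus sample values are assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE msg_workflow (
        id INTEGER PRIMARY KEY,
        source_task_id TEXT NOT NULL,     -- source username, or a workflow name when chained
        workflow_task_id TEXT NOT NULL,   -- name of the parsing function
        created_on TEXT DEFAULT CURRENT_TIMESTAMP  -- stand-in for s3.meta_fields()
    );
    CREATE TABLE msg_log (
        id INTEGER PRIMARY KEY,
        body TEXT,
        source_task_id TEXT,              -- new: where the message came from
        is_parsed INTEGER NOT NULL DEFAULT 0  -- new: Boolean parsing status
    );
""")
db.execute("INSERT INTO msg_workflow (source_task_id, workflow_task_id) VALUES (?, ?)",
           ("cert_sms", "parse_deployment_reply"))
db.execute("INSERT INTO msg_log (body, source_task_id) VALUES (?, ?)", ("Accepted", "cert_sms"))

# Unparsed messages joined to the workflow that should handle them.
pending = db.execute("""
    SELECT l.id, l.body, w.workflow_task_id
    FROM msg_log AS l JOIN msg_workflow AS w ON w.source_task_id = l.source_task_id
    WHERE l.is_parsed = 0
""").fetchall()
```

Joining on `source_task_id` is what lets one stored message be dispatched to every workflow configured for its source.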

Use Cases

This will be used by at least CERT & Tzu Chi.

They wish to process responses to deployment notifications where the recipients send back at least 'Accepted' / 'Reject'.

Additional information that would be useful to capture includes:

  • Current Location
  • ETA to deployment location
  • Questions/Comments (free text)
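A sketch of parsing such a reply, assuming a hypothetical format where the first word is the decision and optional `LOC`, `ETA` and `NOTE` markers introduce the other fields (words before any marker are treated as a free-text comment). The markers, the accepted spellings and `parse_reply` itself are illustrative choices, not an agreed format with CERT or Tzu Chi.

```python
from typing import Dict, List, Optional

# Hypothetical reply format:
#   <Accepted|Reject> [free text] [LOC <place>] [ETA <time>] [NOTE <comment>]
STATUS = {"accepted": "accepted", "accept": "accepted",
          "reject": "rejected", "rejected": "rejected"}
MARKERS = {"LOC": "location", "ETA": "eta", "NOTE": "comment"}

def parse_reply(text: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a deployment reply into status, location, ETA and comment."""
    words = text.split()
    if not words or words[0].lower() not in STATUS:
        return None
    buckets: Dict[str, List[str]] = {}
    key = "comment"  # words before any marker count as a free-text comment
    for word in words[1:]:
        if word.upper() in MARKERS:  # markers are matched case-insensitively
            key = MARKERS[word.upper()]
            buckets.setdefault(key, [])
        else:
            buckets.setdefault(key, []).append(word)
    reply = {"status": STATUS[words[0].lower()], "location": None, "eta": None, "comment": None}
    for field, values in buckets.items():
        reply[field] = " ".join(values) or None
    return reply

parsed = parse_reply("Accepted LOC Main St ETA 30min NOTE need fuel")
```

Matching markers case-insensitively suits SMS typing, at the cost that a location containing the word "eta" would be split wrongly; a stricter format would trade convenience for precision.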

GSoC Project
