Changes between Version 3 and Version 4 of Event/2012/GSoC/MessageParsing


Timestamp:
05/14/12 17:31:35 (13 years ago)
Author:
Ashwyn
Comment:

--

  • Event/2012/GSoC/MessageParsing

    v3 v4  
    1 ==1. Personal Details==
     1===1. Personal Details===
    22 
    33Name:  Ashwyn Sharma[[BR]]
     
    55Email: ashwyn1092@gmail.com[[BR]]
    66
    7  
    87Freenode IRC Nickname: ashwyn[[BR]]
    98
    10  
    119Skype: ashwyn sharma[[BR]]
    1210
    13  
    14 Age:19
    15  
    16 Education: Currently pursuing B.E.(Bachelor in Engineering) from N.S.I.T, New Delhi.
    17  
    18 Country:India
    19  
    20 Timezone: GMT +0530
    21  
    22 Linked In Profile: http://in.linkedin.com/pub/ashwyn-sharma/37/bb2/777
    23  
    24 Exposure To Similar Technologies and/or  FOSS in general :
    25  
    26 My work experience with FOSS was pretty limited (when I started to contribute for Sahana), but I have  spent a past few years developing a great understanding of its modus operandi , thus giving me a sufficient exposure to the whole concept of Free Open Source Software (FOSS).However , my involvement with the Sahana Software Foundation for the last two months has given me tremendous experience with Python and the web2py technology in particular.
    27  
    28 Why would you like to help the Sahana project?:
     11Age:19[[BR]]
     12
     13Education: Currently pursuing B.E.(Bachelor in Engineering) from N.S.I.T, New Delhi.[[BR]]
     14
     15 
     16Country:India[[BR]]
     17
     18 
     19Timezone: GMT +0530[[BR]]
     20
     21 
     22Linked In Profile: http://in.linkedin.com/pub/ashwyn-sharma/37/bb2/777[[BR]]
     23
     24 
     25Exposure To Similar Technologies and/or  FOSS in general :[[BR]]
     26
     27 
     28My work experience with FOSS was fairly limited when I started contributing to Sahana, but I have spent the past few years developing a good understanding of how it works, which has given me sufficient exposure to the concept of Free and Open Source Software (FOSS). My involvement with the Sahana Software Foundation over the last two months has also given me considerable experience with Python, and with web2py in particular.[[BR]]
     29
     30 
     31Why would you like to help the Sahana project?:[[BR]]
     32
    2933 
    3034The Sahana Software Foundation enables organisations and communities to better prepare for and respond to disasters, and its information management solutions help save lives. Contributing to a cause as noble as this adds extra motivation and purpose to the coding and development work that goes on during the summer. Living in India, a country prone to natural hazards and disasters, and having witnessed one myself (http://en.wikipedia.org/wiki/2001_Gujarat_earthquake), I understand the need for a deployment tool like Sahana Eden in a very concrete way. Moreover, I believe that my work with the Sahana community so far will help me contribute to Sahana, and to the Eden project in particular, in the future.
    31  
    32  
    33  
    34  
    35  
    36  
    37  
    38 2. Personal Availability
     35[[BR]]
     36 
     37 
     38 
     39 
     40 
     41 
     42 
     43===2. Personal Availability===
    3944 
    4045Have you reviewed the timeline for GSoC 2012?
     46[[BR]]
    4147 
    4248Yes,  I have reviewed the timeline for GSoC 2012.
     49[[BR]]
    4350 
    4451Do you have any significant conflicts with the listed schedule? If so, please list them here.
     52[[BR]]
    4553 
    4654No, not really. My semester exams are in the second half of May 2012, while coding starts on 21st May according to the timeline, so I will not lose any significant time. Moreover, I would be more than happy to start coding before that period to compensate for the few lost days.
     55[[BR]]
    4756 
    4857Will you need to finish your project prior to the end of the GSOC?
     58[[BR]]
    4959 
    5060No, my project  is planned to be developed along the whole summer.
     61[[BR]]
    5162 
    5263Are there any significant periods during the summer that you will not be available?
     64[[BR]]
    5365 
    5466Apart from the conflict listed above, I will be fully available throughout the summer.
    55  
    56  
    57  
    58  
    59  
    60  
    61  
    62  
    63 3. Project Abstract
    64  
     67[[BR]]
     68 
     69 
     70 
     71 
     72 
     73 
     74 
     75 
     76===3. Project Abstract===
     77 
     78[[BR]]
    6579 
    6680The essential requirement of this project is to parse inbound messages, with an initial focus on SMS. The project is specifically aimed at the CERT use case, where the aim is to process responses to deployment notifications, in other words to handle replies to deployment requests. Currently, message parsing is done in the core code, in the parse_message() method of modules/s3/s3msg.py. The parsing rules will instead be defined in private/prepopulate, which allows multiple profile options to be hosted alongside the main code, and s3parsing.py will import the rules from there. This also reinforces the ongoing work on the Profile Layer, in which deployment-specific files are separated from core code. The parsing module uses a data model, “msg_workflow”, to link each source to a workflow and to schedule tasks. Processing of OpenGeoSMS-encoded messages is another important area, especially for the existing Android Client, where it will be of real use. Finally, to make the existing code more robust and extensible, the pyparsing module or another parser generator can be incorporated, depending on the parsing needs.
    67  
    68  
    69  
    70  
    71  
    72 4. Project Plan
    73  
    74 Project Deliverable:
     81[[BR]]
     82 
     83 
     84 
     85 
     86 
     87===4. Project Plan===
     88 
     89Project Deliverable:[[BR]]
     90
    7591 
    7692The project aims at parsing inbound messages such as SMS from CERT responders after deployment.
    7793It enables the processing of responses to deployment notifications, which is essentially controlled by the module to which the message is routed.
     94[[BR]]
    7895
    7996Project Justification:
     97[[BR]]
    8098 
    8199Parsing of inbound messages is a critical utility for a trained volunteer group such as CERT (Community Emergency Response Teams), where communication between deployments and volunteers plays a vital role. As this will be a deployment-specific option, the functionality becomes an important component of Sahana Eden.
    82  
     100[[BR]]
     101 
     102
    83103Implementation Plan:
    84 Keeping the development of Profile Layer in mind and the functionality being a part of deployment-specific options, the rules for parsing are contained in private/prepopulate from where s3parsing.py imports them.
    85 The module contains a class S3ParsingModel which contains the “msg_workflow” data model (See https://docs.google.com/document/d/1Y9dDCshurrZSw33r-RC_uVQ_Va6_LEZM-2aLcaT2Krc/edit?pli=1) and another class S3Parsing in which the parsing routines are defined which decide the various parsing workflows.
     104[[BR]]
     105
     106*Keeping the development of the Profile Layer in mind, and since the functionality is a deployment-specific option, the parsing rules are kept in private/prepopulate, from where s3parsing.py imports them.
     107[[BR]]
     108*The module contains a class S3ParsingModel, which holds the “msg_workflow” data model (see https://docs.google.com/document/d/1Y9dDCshurrZSw33r-RC_uVQ_Va6_LEZM-2aLcaT2Krc/edit?pli=1), and another class S3Parsing, in which the parsing routines that determine the various parsing workflows are defined.
     109[[BR]]
    86110The current parsing rules implement the functionality in the following manner:
    87 The inbound message text is passed as an argument to the parse_message() method in the s3msg.py module.
    88 The text is matched with a predefined list of primary and contact keywords after splitting with whitespace as the delimiter.
    89 A database query is generated to the concerned database according to the matched keywords.
    90 The query retrieves the relevant field values and generates a reply to the inbound message query.
    91 Also these parsing rules have been implemented only for modules –  ‘Person’ , ‘Hospital’  and ‘Organisation’.
     111[[BR]]
     1121. The inbound message text is passed as an argument to the parse_message() method in the s3msg.py module.
     113[[BR]]
     1142. The text is split on whitespace and matched against a predefined list of primary and contact keywords.
     115[[BR]]
     1163. A database query is generated against the relevant database according to the matched keywords.
     117[[BR]]
     1184. The query retrieves the relevant field values, from which a reply to the inbound message is generated.
     119[[BR]]
     1205. So far, these parsing rules have been implemented only for the modules ‘Person’, ‘Hospital’ and ‘Organisation’.
     121[[BR]]
    92122Extending these rules to other modules can be in scope of the project.
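
The keyword-matching steps listed above can be sketched roughly as follows. This is only an illustration, not the code in parse_message(): the keyword lists and the function name match_keywords() are assumptions made for the example, and the database query step is left out.

```python
# Hypothetical keyword lists; the real ones are defined in parse_message() in s3msg.py.
PRIMARY_KEYWORDS = {"person", "hospital", "organisation"}
CONTACT_KEYWORDS = {"phone", "email", "address"}


def match_keywords(text):
    """Split the message on whitespace and classify each word.

    Returns (primary keyword or None, list of contact keywords, remaining text).
    """
    primary = None
    contacts = []
    remainder = []
    for word in text.split():
        token = word.lower().strip(",.?!")
        if primary is None and token in PRIMARY_KEYWORDS:
            primary = token  # first primary keyword selects the module
        elif token in CONTACT_KEYWORDS:
            if token not in contacts:
                contacts.append(token)  # fields the sender is asking for
        else:
            remainder.append(word)  # treated as the search term
    return primary, contacts, " ".join(remainder)
```

For a message such as "Hospital phone City General", the primary keyword would select the module to query, the contact keyword the field to return, and the remaining words the record to look up.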
    93123 
    94 One of the main issues will be  identifying the messages that belong to a particular  source, so it could have its own processing.Now, that here is handled by the data model which defines a ‘msg_workflow' table in the database which links the Source to the Workflow with any required args.So the essential features of this approach have been listed below:
    95 The Parser workflow table links 'SMS Source X' to 'Workflow Y'.
    96 Now, designing the details of the Workflow Y would be a developer task.
    97 Whereas linking ‘SMS X’ to ‘Workflow Y’ will be a configurable option.
    98 So essentially, the Parser Table links Source to Workflow with any other required args & this acts like a Template for the scheduler_task table.
    99 Now, a task process_log() is defined in tasks.py , where the objective of process_log() is to scan through all the messages in msg_log; and process those for parsing which are flagged as unparsed (is_parsed=False).The task is scheduled in zzz_1st_run.py where it is chained to the concerned parsing task(this is achieved by the msg_workflow table, the ‘source_task_id’ field in msg_log will help retrieve the respective parsing workflow_task_id from msg_workflow).
    100 Also,this allows for chaining of workflows where a source for a workflow could be another workflow instead of an Incoming source.We can have 2nd-pass Parser workflows which don't start from the Source direct but can plugged as output from a 1st-pass one.
     124[[BR]]
     125[[BR]]
     126*One of the main issues will be identifying the messages that belong to a particular source, so that each source can have its own processing. This is handled by the data model, which defines a ‘msg_workflow’ table in the database that links the Source to the Workflow with any required args. The essential features of this approach are listed below:
     127[[BR]]
     1281. The Parser workflow table links 'SMS Source X' to 'Workflow Y'.
     129[[BR]]
     1302. Designing the details of Workflow Y is a developer task.
     131[[BR]]
     1323. Linking ‘SMS Source X’ to ‘Workflow Y’, by contrast, is a configurable option.
     133[[BR]]
     1344. In essence, the Parser table links Source to Workflow with any other required args, and acts like a template for the scheduler_task table.
     135[[BR]]
     136
     137*A task process_log() is defined in tasks.py. Its objective is to scan through all the messages in msg_log and pass on for parsing those flagged as unparsed (is_parsed=False). The task is scheduled in zzz_1st_run.py, where it is chained to the relevant parsing task: the ‘source_task_id’ field in msg_log is used to retrieve the corresponding workflow_task_id from the msg_workflow table.
     138[[BR]]
     139*This also allows workflows to be chained, where the source for a workflow can be another workflow instead of an incoming source. We can have 2nd-pass Parser workflows which do not start directly from the Source but are plugged in as the output of a 1st-pass one.
    101140            Source -> process_log() ->1st pass parser -> detailed Parser ---> Module
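
A minimal in-memory sketch of the flow described above, assuming plain Python lists stand in for the msg_log and msg_workflow tables. The field names is_parsed, source_task_id and workflow_task_id come from the proposal; the row values, the two workflow functions and the WORKFLOWS registry are hypothetical, and the real task would run through the web2py scheduler rather than a direct function call.

```python
# Stand-ins for rows of the msg_workflow table: each links a source to a workflow.
# The ids and workflow names below are hypothetical.
MSG_WORKFLOW = [
    {"source_task_id": 1, "workflow_task_id": "deployment_reply"},
    {"source_task_id": 2, "workflow_task_id": "keyword_query"},
]


def deployment_reply(text):
    """Hypothetical 'Workflow Y': interpret a reply to a deployment request."""
    return "accepted" if text.strip().lower().startswith("yes") else "declined"


def keyword_query(text):
    """Hypothetical second workflow: report the first word of the message."""
    words = text.split()
    return words[0].lower() if words else ""


WORKFLOWS = {"deployment_reply": deployment_reply, "keyword_query": keyword_query}


def process_log(msg_log, msg_workflow=MSG_WORKFLOW, workflows=WORKFLOWS):
    """Run the linked workflow on every unparsed message and flag it as parsed."""
    links = {row["source_task_id"]: row["workflow_task_id"] for row in msg_workflow}
    results = {}
    for message in msg_log:
        if message["is_parsed"]:
            continue  # already handled on an earlier run
        workflow_name = links.get(message["source_task_id"])
        if workflow_name is None:
            continue  # no workflow linked to this source; leave it unparsed
        results[message["id"]] = workflows[workflow_name](message["message"])
        message["is_parsed"] = True
    return results
```

Because the link between source and workflow is data rather than code, changing which workflow handles a source only means editing a row, which is the configurable part described in the list above.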
    102 Here, the 1st pass parser is customized per-deployment; and decides which email source goes to a particular workflow (simple msg_workflow link) or decides based on other factors such as keywords to which main workflow the messages should be passed.
    103 The data model is  integrated with the prepopulate folders (or a sub-folder say private/prepopulate/parsing) which serves as the initial UI.The post-install UI will consist of a CRUD interface admin panel, a simple s3_rest_controller().However, eventually this is planned to be the part of the WebSetup.
    104 We want to be able to direct the message to the appropriate module to handle the data.This could be done either by launching a real REST request or else simulating one via the API.
     141[[BR]]
     142*Here, the 1st-pass parser is customized per deployment. It decides which email source goes to a particular workflow (a simple msg_workflow link), or decides based on other factors, such as keywords, which main workflow the messages should be passed to.
     143[[BR]]
     144*The data model is integrated with the prepopulate folders (or a sub-folder, say private/prepopulate/parsing), which serve as the initial UI. The post-install UI will consist of a CRUD admin panel, a simple s3_rest_controller(). Eventually, this is planned to become part of the WebSetup.
     145[[BR]]
     146*We want to be able to direct the message to the appropriate module to handle the data. This could be done either by launching a real REST request or by simulating one via the API.
    105147             resource = s3mgr.define_resource("module", "resourcename")
    106 Messages which are routed to a specific resource can be subscribed to by the user.For this purpose,we can use the existing Save Search and Subscription functionality where the user can subscribe to new messages for a specific resource using a resource filter.The msg_log can be made a component for the resources.Now,if it's a component, then when someone opens the resource, messages will be there in a tab.Also, if the message has to be tied to multiple resources, then we can use a relationship (link) table.
    107 Implementing/extending  the utility for other modules especially the IRS module will be of real use, where enabling to log reports through SMS will be vital, which can also use the OpenGeoSMS encoding standards(LatLon generates a google-maps URL) for integration with our Android Client. A dedicated routine to generate OpenGeoSMS URLs already exists in prepare_opengeosms() in s3msg.py itself. So integration with the parsing routine won’t be difficult. Other modules for which this can be implemented are : ‘Request’  and ’Inventory’.
    108 Finally the code will be tested on the system and the bugs (if any ;-) ) will be fixed.
    109  
    110  
     148[[BR]]
     149*Messages which are routed to a specific resource can be subscribed to by the user. For this purpose, we can use the existing Save Search and Subscription functionality, where the user subscribes to new messages for a specific resource using a resource filter. The msg_log can be made a component of the resources; when someone opens the resource, its messages will then appear in a tab. If a message has to be tied to multiple resources, we can use a relationship (link) table.
     150[[BR]]
     151*Extending the utility to other modules, especially the IRS module, will be of real use: logging incident reports through SMS will be vital there, and it can also use the OpenGeoSMS encoding standard (LatLon generates a google-maps URL) for integration with our Android Client. A dedicated routine to generate OpenGeoSMS URLs already exists, prepare_opengeosms() in s3msg.py, so integration with the parsing routine should not be difficult. Other modules for which this can be implemented are ‘Request’ and ‘Inventory’.
     152[[BR]]
     153*Finally, the code will be tested on the system and the bugs (if any ;-) ) will be fixed.
     154 
     155 
     156[[BR]]
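
On the inbound side, reading the location out of an OpenGeoSMS-style message could look roughly like the sketch below. It assumes the message starts with an http(s) URL whose first query parameter holds "lat,lon" and which carries a GeoSMS marker parameter; the function name extract_latlon() is hypothetical and this is not the existing prepare_opengeosms() routine, which generates such URLs rather than parsing them.

```python
from urllib.parse import parse_qsl, urlsplit


def extract_latlon(sms_text):
    """Return (lat, lon) from an OpenGeoSMS-style message, or None if it does not match."""
    tokens = sms_text.split()
    if not tokens:
        return None
    parts = urlsplit(tokens[0])  # the URL is assumed to be the first token
    if parts.scheme not in ("http", "https"):
        return None
    params = parse_qsl(parts.query, keep_blank_values=True)
    if not params or not any(name == "GeoSMS" for name, _ in params):
        return None  # no GeoSMS marker, so treat it as an ordinary message
    try:
        lat_text, lon_text = params[0][1].split(",")
        lat, lon = float(lat_text), float(lon_text)
    except ValueError:
        return None  # first parameter is not a "lat,lon" pair
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon
```

Anything after the URL can then be handed to the ordinary keyword parsing, so an incident report sent by SMS would carry both a location and a free-text description.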
    111157 
    112158Future Options:
    113 Though the parsing rules will be generic , a few minor tweaks for other processes such as Email and Twitter will have to be performed to maintain its generic nature.
    114 One of the most valuable functionality that can be added here is to make the SMS communication more interactive. e.g. the text body received does not match any of the expected  keywords , the API dispatches a reply stating the expected format of the message.
    115 Adapting the parsing rules to cover as wide a base of inbound messages as possible. This will involve making a wider collection of keywords to be searched for every concerned module.Linking different labels  across the DB to module-specific keywords will be really helpful.Also the list of primary keywords to be matched can also be made a deployment-specific option.
     159[[BR]]
     160*Though the parsing rules will be generic, a few minor tweaks will be needed for other channels such as Email and Twitter to keep them generic.
     161[[BR]]
     162*One of the most valuable additions would be to make SMS communication more interactive, e.g. if the text body received does not match any of the expected keywords, the API dispatches a reply stating the expected format of the message.
     163[[BR]]
     164*Adapting the parsing rules to cover as wide a range of inbound messages as possible. This will involve building a wider collection of keywords to search for in every module concerned. Linking different labels across the DB to module-specific keywords will be really helpful. The list of primary keywords to be matched can also be made a deployment-specific option.
     165[[BR]]
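
The interactive reply mentioned among the options above might be sketched as follows. The keyword tuple, the format hint and the function name help_reply() are assumptions for illustration; a deployment would supply its own keywords and wording.

```python
# Hypothetical keyword list and format hint; a deployment would supply its own.
EXPECTED_KEYWORDS = ("hospital", "organisation", "person")
FORMAT_HINT = "<keyword> <name>"


def help_reply(text, keywords=EXPECTED_KEYWORDS):
    """Return a help message if no expected keyword is present, else None."""
    words = {word.lower() for word in text.split()}
    if words.intersection(keywords):
        return None  # the message can go on to normal parsing
    return "Message not understood. Expected format: %s. Keywords: %s." % (
        FORMAT_HINT,
        ", ".join(keywords),
    )
```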
    116166 
    117167 
    118168Relevant Experience:
    119  
    120 I have developed a thorough understanding of the existing parsing routine in the application. Also, I am comfortable using various parsing generators. I have discussed many of the ideas in the proposal with the mentors and rest of the community.
    121 My experience with the Sahana community has been very enjoyable so far, Sahanathon being one of the highlights where I got the opportunity to demonstrate my ability to work with the code and contribute to Sahana Eden. My notable contributions so far have been listed below:
    122 I solved the bug #1132 in the Trac (http://eden.sahanafoundation.org/ticket/1132) which was merged during the Sahanathon itself. Pull request: https://github.com/flavour/eden/pull/31
    123 Reported and fixed a defect with the update (“Open”) button in the saved searches table.
     169[[BR]]
     170 
     171*I have developed a thorough understanding of the existing parsing routine in the application. I am also comfortable using various parser generators, and I have discussed many of the ideas in this proposal with the mentors and the rest of the community.
     172[[BR]]
     173*My experience with the Sahana community has been very enjoyable so far, with the Sahanathon being one of the highlights, where I got the opportunity to demonstrate my ability to work with the code and contribute to Sahana Eden. My notable contributions so far are listed below:
     174[[BR]]
     175*I fixed bug #1132 in Trac (http://eden.sahanafoundation.org/ticket/1132); the fix was merged during the Sahanathon itself. Pull request: https://github.com/flavour/eden/pull/31
     176[[BR]]
     177*Reported and fixed a defect with the update (“Open”) button in the saved searches table.
    124178Made milestones in the project task workflow a deployment-specific option.  (See https://github.com/flavour/eden/pull/35 ).
    125 Fixed email_settings() in the msg controller and required changes in the menu. (See https://github.com/flavour/eden/pull/42 ).
    126  
    127  
    128  
    129  
    130  
    131 5. Project Goals and Timeline
    132  
    133  
     179[[BR]]
     180*Fixed email_settings() in the msg controller and required changes in the menu. (See https://github.com/flavour/eden/pull/42 ).
     181 
     182 
     183[[BR]]
     184[[BR]]
     185 
     186 
     187 
     188===5. Project Goals and Timeline===
     189 
     190 
     191[[BR]]
    134192 
    135193Work Already Undertaken:
    136  
    137 Currently parsing is implemented by the parse_message() method in the s3msg.py module, though its usage is limited or rather unimplemented as of now.
    138 Also, the current method is hard-coded and inefficient to handle different processes.
    139 A dedicated data model has been developed with consent of the mentors.The msg_workflow has been defined exhaustively in the implementation details and also in the linked gdoc.
    140 Mechanism to route messages to resources has also been designed and discussed.
     194 [[BR]]
     195
     196*Currently, parsing is implemented by the parse_message() method in the s3msg.py module, though it is little used, if at all, as of now.
     197[[BR]]
     198*Also, the current method is hard-coded and inefficient at handling different processes.
     199[[BR]]
     200*A dedicated data model has been developed with the consent of the mentors. The msg_workflow model is described in detail in the implementation plan and in the linked gdoc.
     201[[BR]]
     202*A mechanism to route messages to resources has also been designed and discussed.
     203[[BR]]
    141204 
    142205First trimester:
     206[[BR]]
    143207
    144208
    145209Due Date -SMART Goal-Measure
     210[[BR]]
    146211(24th April  - 7th May)-
     212[[BR]]
    147213Development of the workflow handling module s3parsing.py starts.
     214[[BR]]
    148215-
     216[[BR]]
    1492171.Community bonding period:
     218[[BR]]
    150219I have been involved with the community for some time now, so won’t take a lot of time :)
     220[[BR]]
    1512212.Decision to outline the template.
    152  
     222[[BR]]
     223 
     224[[BR]]
    153225(8th May – 21st May)-
     226[[BR]]
    154227The msg_workflow data model is developed.
     228[[BR]]
    155229S3ParsingModel starts to take shape.
     230[[BR]]
    156231-Code committed locally.
    157  
    158  
    159  
    160  
    161  
    162  
     232[[BR]]
     233 
     234 
     235 
     236 
     237 
     238 
     239[[BR]]
    163240Second Trimester:
    164  
     241[[BR]]
     242 
     243[[BR]]
    165244Due Date- SMART Goal -Measure
     245[[BR]]
    16624628th May  -*(won’t be available due to university exams )
    167  
     247[[BR]]
     248 
     249[[BR]]
    1682504th June-
     251[[BR]]
    169252CERT:Deployment Request SMS Handler development starts ( critical requirement of the project).
     253[[BR]]
    170254Parsing workflow is developed.
     255[[BR]]
    171256- SMS response processing starts to take shape.
    172  
     257[[BR]]
     258 
     259[[BR]]
    17326011th June-
     261[[BR]]
    174262CERT:Deployment Request SMS Handler development continues ( critical requirement of the project).
     263[[BR]]
    175264-
     265[[BR]]
    1762661.Code committed locally.
     267[[BR]]
    1772682.Tested on local system.
    178  
     269[[BR]]
     270 
     271[[BR]]
    17927217th June-
     273[[BR]]
    180274process_log() method is designed.
     275[[BR]]
    181276Tweaks in msg_log implemented. 
     277[[BR]]
    182278-Code committed to trunk.
    183  
     279[[BR]]
     280 
     281[[BR]]
     282[[BR]]
    1842832nd July -
     284[[BR]]
    185285Parsing workflow is chained to process_log().
     286[[BR]]
    186287Sources are linked to respective workflows.
     288[[BR]]
    187289-Code committed to trunk.
    188  
     290[[BR]]
     291 
     292[[BR]]
     293[[BR]]
     294[[BR]]
     295[[BR]]
    1892968th  July -
     297[[BR]]
    190298Clickatell functionality is developed for Eden.
     299[[BR]]
    191300Clickatell allows for a more robust testing mechanism.
     301[[BR]]
    192302Commits are altered with mentor feedback and suggested changes.
    193  
     303[[BR]]
     304 
     305[[BR]]
     306[[BR]]
    194307-
     308[[BR]]
    1953091.Commits so far tested thoroughly.
     310[[BR]]
    1963112.Bug fixing.
     312[[BR]]
    1973133.Code committed to trunk.
    198  
     314[[BR]]
     315 
     316[[BR]]
    199317[Mid-Term evaluations :-) ]
    200  
     318[[BR]]
     319 
     320[[BR]]
    201321 
    202322 
     
    204324 
    205325Third Trimester:
     326[[BR]]
    206327 
    207328Due Date-SMART Goal-Measure
    208 
     329[[BR]]
     330
     331[[BR]]
    20933215th July-
     333[[BR]]
    210334Integration with prepopulate folders.
     335[[BR]]
    211336Development of post-install UI (CRUD interface admin panel).
     337[[BR]]
     338[[BR]]
     339[[BR]]
    212340- Code committed locally.
     341[[BR]]
    213342 
    21434322nd July-
     344[[BR]]
    215345Routing mechanism to resources.
     346[[BR]]
    216347Save search and subscription implemented for msg_log.
     348[[BR]]
    217349-Code committed to trunk.
    218  
     350[[BR]]
     351 
     352[[BR]]
    21935329th July-
     354[[BR]]
    220355OpenGeoSMS (process_opengeosms() ) routine is tweaked and linked with the respective parsing methods.
     356[[BR]]
     357[[BR]]
    221358The existing functionality in the Android Client is tested .
     359[[BR]]
    222360-Code committed to trunk.
     361[[BR]]
    223362 
    2243635th August-
     364[[BR]]
    225365Integration of the IRS module for incident reporting through SMS.
     366[[BR]]
    226367The functionality is extended to other modules (if needed).
     368[[BR]]
    227369Feedback from mentors.
     370[[BR]]
    228371-Code committed locally.
     372[[BR]]
    229373 
    23037413th August-
     375[[BR]]
    231376System testing & Bug fixing.
     377[[BR]]
    232378Final changes to the code are applied.
     379[[BR]]
     380[[BR]]
    233381-
     382[[BR]]
    2343831.Project reaches final stage.
     384[[BR]]
    2353852.Bug fixes
     386[[BR]]
    2363873.Final Code committed to trunk.
    237  
    238 20th August- PENCILS DOWN! -Project Completed. :-)
     388[[BR]]
     389 
     39020th August- PENCILS DOWN! -Project Completed. :-)[[BR]]