Changes between Version 47 and Version 48 of BluePrint/TextSearch


Timestamp: 04/29/13 15:16:22
Author: Vishrut Mehta

  • BluePrint/TextSearch

    v47 v48  
    8585*    A proper understanding of the working model of '''S3Filter''' is required.
    8686
    87 *    Literature study of Apache Lucene and Pylucene. Getting familiar with '''Pylucene''' and deploying it on my local machine.
     87*    Literature study of Apache Solr and Pylucene. Getting familiar with both of them and deploying them on my local machine.
    8888
    8989*    Studying the linkage between the Lucene daemon and the web2py server.
     
    100100*    An efficient search mechanism to search over all the resources.
    101101 
    102 === Non-functional ===
    103 === Standards ===
    104102=== System Constraints ===
    105 
    106 *    The user should have Pylucene installed on their machine.
    107 
    108 *    Also, while starting the web2py server, the Lucene daemon should also start.
    109 
    110 *    In case of failure, full-text search queries will not be functional.
    111103
    112104== Use-Cases ==
     
    170162[[BR]]
    171163
    172 *    '''Apache Lucene''' and '''Pylucene'''
     164*    '''Pylucene''' or '''Apache Solr'''
    173165
    174166A comparative analysis was done here to decide between Apache Lucene and Apache Solr.[[BR]]
     
    187179==== Full-Text Search ====
    188180
    189 TODO
    190 
     181*    As suggested by Dominic, we first need to send the uploaded documents to the indexer (an external content search engine like Solr/Pylucene) in onaccept (see the indexing sketch after this list).
     182
     183*    We need to make a '''Filter Widget''' for uploaded files, which would be a simple text field for advanced search and a checkbox for simple search.
     184
     185*    We need to extend the functionality of S3ResourceFilter after extracting all result IDs in S3Resource.select.
     186
     187*    Then, after extracting all the IDs, it would identify all the document content filters, extract the file system paths, and run them through the external content search engine (Apache Solr/Pylucene), which would in turn return the IDs of the matching items along with the documents (see the query sketch after this list).
     188
     189*    Along with the IDs, we also need a snippet of the matching text in the respective document (as in Google Search).
     190
     191*    After running the master query against the record IDs obtained from filtering, we combine the results and show them in the UI (see the combining sketch after this list).
     192
     193*    We also need a user-friendly UI (taking inspiration from Google or Bing! search results).
    191194
    192195*    The main thing we could focus on is the efficiency of the search, as we know it will be '''computationally challenging''' to perform accurate searches.
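
A minimal sketch of the indexing step described above, assuming a standalone Solr instance with the ExtractingRequestHandler enabled at /update/extract; the core URL and the index_document helper are illustrative, not part of the Eden codebase:

{{{#!python
import requests

SOLR_URL = "http://localhost:8983/solr/eden"  # assumed Solr core

def index_document(record_id, file_path):
    """Send an uploaded document to Solr for content extraction and
    indexing; intended to be called from the table's onaccept hook."""
    params = {
        "literal.id": record_id,  # store the Eden record ID alongside the content
        "commit": "true",         # make the document searchable immediately
    }
    with open(file_path, "rb") as f:
        response = requests.post("%s/update/extract" % SOLR_URL,
                                 params=params, files={"file": f})
    response.raise_for_status()
}}}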
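
A sketch of the query step, assuming the pysolr client and that the extracted text is stored in a field named content (both assumptions, not confirmed by this blueprint); it returns the matching record IDs together with highlighted snippets of the matching text:

{{{#!python
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/eden")  # assumed Solr core

def search_documents(text):
    """Run a full-text query over the indexed documents and return
    the matching record IDs plus snippets of the matching text."""
    results = solr.search("content:%s" % text,
                          **{"hl": "true",        # enable highlighting
                             "hl.fl": "content",  # field to build snippets from
                             "fl": "id"})         # only the IDs are needed back
    ids = [doc["id"] for doc in results.docs]
    snippets = results.highlighting  # maps record ID -> snippet fragments
    return ids, snippets
}}}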
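
Combining the engine's results with the master query could then be a simple belongs filter in the web2py DAL; a sketch with assumed table and variable names:

{{{#!python
def combine_results(db, table, matching_ids, snippets):
    """Restrict the master query to the records matched by the content
    search engine and attach each record's snippet for the UI."""
    rows = db(table.id.belongs(matching_ids)).select()
    # pair each record with its snippet for rendering on the results page
    return [(row, snippets.get(str(row.id))) for row in rows]
}}}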
     
    209212 
    210213== References ==
     214
     215=== Mailing List Discussion ===
     216[[BR]]
     217https://groups.google.com/forum/?fromgroups=#!topic/sahana-eden/bopw1fX-uW0
     218
    211219=== Chats and Discussions ===
    212220http://logs.sahanafoundation.org/sahana-eden/2013-03-24.txt