Changes between Version 47 and Version 48 of BluePrint/TextSearch


Timestamp: 04/29/13 15:16:22
Author: Vishrut Mehta

  • BluePrint/TextSearch

    v47 v48  
    8585*    A proper understanding of the working model of '''S3Filter''' is required.
    8686
    87 *    Literature study of Apache Lucene and Pylucene. Getting familiar with '''Pylucene''' and deploying it on my local machine.
     87*    Literature study of Apache Solr and Pylucene. Getting familiar with both of them and deploying them on my local machine.
    8888
    8989*    Studying the linkage between the Lucene daemon and the web2py server.
     
    100100*    An efficient search mechanism to search over all the resources.
    101101 
    102 === Non-functional ===
    103 === Standards ===
    104102=== System Constraints ===
    105 
    106 *    The user should have Pylucene installed on their machine.
    107 
    108 *    Also, while starting the web2py server, the Lucene daemon should also start.
    109 
    110 *    In case of failure, full-text search queries will not be functional.
    111103
    112104== Use-Cases ==
     
    170162[[BR]]
    171163
    172 *    '''Apache Lucene''' and '''Pylucene'''
     164*    '''Pylucene''' or '''Apache Solr'''
    173165
    174166A comparative analysis was done here to decide between Apache Lucene and Apache Solr.[[BR]]
     
    187179==== Full-Text Search ====
    188180
    189 TODO
    190 
     181*    As suggested by Dominic, we first need to send the uploaded documents to the indexer (an external content search engine like Solr/Pylucene) in onaccept (see the indexing sketch after this list).
     182
     183*    We need to make a '''Filter Widget''' for uploaded files, which would be a simple text field for advanced search and a checkbox for simple search.
     184
     185*    We need to extend the functionality of S3ResourceFilter after extracting all result IDs in S3Resource.select.
     186
     187*    Then, after extracting all the IDs, it would identify all the document content filters, extract the file system paths, and run them through the external content search engine (Apache Solr/Pylucene), which would in turn return the IDs of the matching items along with the documents (see the query sketch after this list).
     188
     189*    Along with the IDs, we also need a snippet of the matching text in the respective document (as in Google Search).
     190
     191*    After running the master query against the record IDs obtained from filtering, we combine the results and show them in the UI (see the combining sketch after this list).
     192
     193*    We also need a user-friendly UI (taking inspiration from Google or Bing! search results).
    191194
    192195*    The main thing we could focus on is the efficiency of the search, as we know it will be '''computationally challenging''' to perform accurate searches.
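
A minimal sketch of the indexing step described above, assuming a standalone Solr instance with the ExtractingRequestHandler enabled at /update/extract; the core URL and the index_document helper are illustrative, not part of the Eden codebase:

{{{#!python
import requests

SOLR_URL = "http://localhost:8983/solr/eden"  # assumed Solr core

def index_document(record_id, file_path):
    """Send an uploaded document to Solr for content extraction and
    indexing; intended to be called from the table's onaccept hook."""
    params = {
        "literal.id": record_id,  # store the Eden record ID alongside the content
        "commit": "true",         # make the document searchable immediately
    }
    with open(file_path, "rb") as f:
        response = requests.post("%s/update/extract" % SOLR_URL,
                                 params=params, files={"file": f})
    response.raise_for_status()
}}}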
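
A sketch of the query step, assuming the pysolr client and that the extracted text is stored in a field named content (both assumptions, not confirmed by this blueprint); it returns the matching record IDs together with highlighted snippets of the matching text:

{{{#!python
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/eden")  # assumed Solr core

def search_documents(text):
    """Run a full-text query over the indexed documents and return
    the matching record IDs plus snippets of the matching text."""
    results = solr.search("content:%s" % text,
                          **{"hl": "true",        # enable highlighting
                             "hl.fl": "content",  # field to build snippets from
                             "fl": "id"})         # only the IDs are needed back
    ids = [doc["id"] for doc in results.docs]
    snippets = results.highlighting  # maps record ID -> snippet fragments
    return ids, snippets
}}}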
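
Combining the engine's results with the master query could then be a simple belongs filter in the web2py DAL; a sketch with assumed table and variable names:

{{{#!python
def combine_results(db, table, matching_ids, snippets):
    """Restrict the master query to the records matched by the content
    search engine and attach each record's snippet for the UI."""
    rows = db(table.id.belongs(matching_ids)).select()
    # pair each record with its snippet for rendering on the results page
    return [(row, snippets.get(str(row.id))) for row in rows]
}}}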
     
    209212 
    210213== References ==
     214
     215=== Mailing List Discussion ===
     216[[BR]]
     217https://groups.google.com/forum/?fromgroups=#!topic/sahana-eden/bopw1fX-uW0
     218
    211219=== Chats and Discussions ===
    212220http://logs.sahanafoundation.org/sahana-eden/2013-03-24.txt