Changes between Version 47 and Version 48 of BluePrint/TextSearch
Timestamp: 04/29/13 15:16:22
BluePrint/TextSearch
* Proper understanding of the work model of '''S3Filter''' is required.

* Literature study of Apache Solr and Pylucene: getting familiar with both of them and deploying them on my local machine.

* Studying the linkage between the Lucene daemon and the web2py server.

…

* An efficient search mechanism to search over all the resources.

=== System Constraints ===

== Use-Cases ==

…

[[BR]]

* '''Pylucene''' or '''Apache Solr'''

A comparative analysis was done here to choose between Apache Lucene and Apache Solr.[[BR]]

…

==== Full-Text Search ====

* As suggested by Dominic, we first need to send the uploaded documents to the indexer (an external content search engine such as Solr/Pylucene) in onaccept.

* We need to build a '''Filter Widget''' for uploaded files: a simple text field for advanced search and a checkbox for simple search.

* We need to extend the functionality of S3ResourceFilter after extracting all result IDs in S3Resource.select.

* After extracting all the IDs, it would identify all the document content filters, extract the file system paths, and run them through the external content search engine (Apache Solr/Pylucene), which would in turn return the IDs of the matching items along with the documents.
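As an illustration of the index-on-upload and filtering steps above, here is a minimal in-memory stand-in for the external content search engine. Everything in it — the `ContentIndex` class, its method names, and the sample records — is hypothetical sketch code, not Sahana Eden, Solr, or Pylucene API:

```python
import re
from collections import defaultdict


class ContentIndex:
    """Toy stand-in for an external content search engine (Solr/Pylucene).

    index_document() plays the role of the onaccept hook that pushes an
    uploaded document's extracted text to the indexer; search() plays the
    role of the engine returning the IDs of the matching records.
    """

    def __init__(self):
        self._postings = defaultdict(set)   # term -> set of record IDs
        self._documents = {}                # record ID -> raw text

    @staticmethod
    def _tokenize(text):
        return re.findall(r"[a-z0-9]+", text.lower())

    def index_document(self, record_id, text):
        # Called on upload: store the text and register every term
        # in the inverted index.
        self._documents[record_id] = text
        for term in self._tokenize(text):
            self._postings[term].add(record_id)

    def search(self, query):
        # Return the IDs of records matching *all* query terms
        # (conjunctive search, as a document-content filter would need).
        terms = self._tokenize(query)
        if not terms:
            return set()
        result = self._postings[terms[0]].copy()
        for term in terms[1:]:
            result &= self._postings[term]
        return result


engine = ContentIndex()
engine.index_document(1, "Flood relief supplies stored in warehouse A")
engine.index_document(2, "Earthquake response plan for region B")
matches = engine.search("relief warehouse")   # conjunctive match on doc 1
```

In the real design the onaccept hook would post the extracted file text to Solr (or hand it to a Pylucene index writer) instead of an in-process dictionary, but the contract is the same: index at upload time, return matching record IDs at query time.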
* Along with the IDs, we also need a snippet of the matching text from the respective document (as in Google Search).

* After running the master query against the record IDs which we obtained from filtering, we combine the results and show them in the UI.

* We also need a user-friendly UI for the users (taking inspiration from Google or Bing search results).

* The main thing we could focus on is the efficiency of the search, as we know it will be '''computationally challenging''' to perform accurate search.

…

== References ==

=== Mailing List Discussion ===
[[BR]]
https://groups.google.com/forum/?fromgroups=#!topic/sahana-eden/bopw1fX-uW0

=== Chats and Discussions ===
http://logs.sahanafoundation.org/sahana-eden/2013-03-24.txt
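The snippet-extraction and result-merging steps described under Full-Text Search can be sketched as follows. This is a minimal illustration under assumptions: `make_snippet` and `combine_results` are hypothetical helper names, and the plain-dict rows are a simplification of what S3Resource.select would actually return:

```python
import re


def make_snippet(text, query, radius=30):
    """Return a short excerpt around the first query match, with the
    match wrapped in '''…''' (Trac wiki bold) -- similar to the
    highlighted excerpts on a Google results page."""
    m = re.search(re.escape(query), text, re.IGNORECASE)
    if m is None:
        return text[:2 * radius]
    start = max(0, m.start() - radius)
    end = min(len(text), m.end() + radius)
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return (prefix + text[start:m.start()]
            + "'''" + m.group(0) + "'''"
            + text[m.end():end] + suffix)


def combine_results(master_rows, matching_ids, snippets):
    """Keep only the master-query rows whose IDs the content engine
    returned, and attach each row's snippet for display in the UI."""
    return [
        dict(row, snippet=snippets.get(row["id"], ""))
        for row in master_rows
        if row["id"] in matching_ids
    ]


rows = [{"id": 1, "name": "report.pdf"}, {"id": 2, "name": "plan.doc"}]
text = "The flood relief supplies are stored in warehouse A near the coast."
merged = combine_results(rows, {1}, {1: make_snippet(text, "warehouse")})
```

A production version would take the highlighted fragments straight from the search engine (both Solr and Lucene can return highlighted excerpts) rather than re-scanning the file text, but the merge step — intersect the master-query IDs with the engine's IDs and attach a snippet per row — is the same.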