Searching Guidelines
Lucene allows searching and indexing simultaneously. However, an IndexReader only searches the index as of the "point in time" at which it was opened. Any updates to the index, whether added or deleted documents, will not be visible until the IndexReader is re-opened. Therefore, your application must periodically re-open its IndexReaders to see the latest updates. The IndexReader.isCurrent() method allows you to test whether any updates have occurred to the index since your IndexReader was opened.
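The point-in-time behaviour can be illustrated with a small toy model. This is only a sketch in plain Java and does not use the Lucene API: the ToyIndex and ToyReader classes and the refresh() helper are hypothetical names invented for this example, and the version counter is an assumed stand-in for whatever Lucene tracks internally.

```java
import java.util.ArrayList;
import java.util.List;

public class PointInTimeDemo {
    /** Toy index (hypothetical): a growing list of documents plus a version counter. */
    static class ToyIndex {
        private final List<String> docs = new ArrayList<>();
        private long version = 0;

        void add(String doc) {
            docs.add(doc);
            version++; // every update bumps the version
        }

        ToyReader openReader() {
            return new ToyReader(this);
        }
    }

    /** Toy reader (hypothetical): sees only the documents present when it was opened. */
    static class ToyReader {
        private final ToyIndex index;
        private final List<String> snapshot;
        private final long openedAtVersion;

        ToyReader(ToyIndex index) {
            this.index = index;
            this.snapshot = new ArrayList<>(index.docs); // copy taken at open time
            this.openedAtVersion = index.version;
        }

        /** True if the index has not changed since this reader was opened. */
        boolean isCurrent() {
            return openedAtVersion == index.version;
        }

        int numDocs() {
            return snapshot.size();
        }
    }

    /** Returns the same reader if it is still current, otherwise a freshly opened one. */
    static ToyReader refresh(ToyIndex index, ToyReader reader) {
        return reader.isCurrent() ? reader : index.openReader();
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("doc one");
        ToyReader reader = index.openReader();
        index.add("doc two");                    // added after the reader was opened
        System.out.println(reader.numDocs());    // still sees one document
        System.out.println(reader.isCurrent());  // the index has moved on
        reader = refresh(index, reader);
        System.out.println(reader.numDocs());    // the new reader sees both
    }
}
```

In a real application the same check-then-reopen pattern would be run periodically, or before a search, with the old reader closed once no search is still using it.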
Lucene supports trailing-wildcard searches such as book*, which will find documents containing terms such as book, bookstore, booklet, etc. Lucene refers to this type of query as a 'prefix query'.
Lucene also supports wild card queries, which allow you to place a wild card in the middle of the query term. For instance, you could make searches like: mi*pelling. That will match both misspelling, which is the correct way to spell this word, as well as mispelling, which is a common spelling mistake.
Another wild card character that you can use is '?', a question mark. The ? will match a single character. This allows you to perform queries such as Bra?il. Such a query will match both Brasil and Brazil. Lucene refers to this type of a query as a 'wildcard query'.
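To make the matching rules concrete, here is a minimal sketch that translates a wildcard pattern into a regular expression using only the Java standard library. It is not how Lucene implements wildcard queries; the WildcardDemo class and its methods are hypothetical, and the sketch assumes that '*' stands for any run of characters (including none) and '?' for exactly one character.

```java
import java.util.regex.Pattern;

public class WildcardDemo {
    /** Translates a wildcard pattern into a regex: '*' becomes ".*", '?' becomes ".". */
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                // quote everything else so it is matched literally
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    /** True if the whole term matches the wildcard pattern. */
    static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("book*", "bookstore"));        // prefix-style pattern
        System.out.println(matches("mi*pelling", "misspelling")); // wildcard in the middle
        System.out.println(matches("mi*pelling", "mispelling"));
        System.out.println(matches("Bra?il", "Brazil"));          // single-character wildcard
        System.out.println(matches("Bra?il", "Brail"));           // '?' needs exactly one character
    }
}
```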
Leading wildcards (e.g. *ook) are not supported by the QueryParser by default. As of Lucene 2.1, they can be enabled by calling QueryParser.setAllowLeadingWildcard(true). Note that this can be an expensive operation: it requires scanning the entire list of terms in the index to find those that match the pattern.
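The cost difference can be seen in a toy model of a sorted term list. The sketch below uses a java.util.TreeSet as an assumed stand-in for the index's term dictionary; the TermScanDemo class and its methods are hypothetical and do not reflect Lucene's actual data structures. A prefix lookup can seek to its first candidate and stop early, while a leading-wildcard lookup has nowhere to seek to and must examine every term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class TermScanDemo {
    /** Prefix lookup (like book*): seek to the prefix, stop at the first non-matching term. */
    static List<String> prefixMatches(TreeSet<String> terms, String prefix, int[] visited) {
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            visited[0]++;
            if (!term.startsWith(prefix)) {
                break; // sorted order guarantees no later term can match
            }
            hits.add(term);
        }
        return hits;
    }

    /** Leading-wildcard lookup (like *ook): every term in the list must be examined. */
    static List<String> suffixMatches(TreeSet<String> terms, String suffix, int[] visited) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            visited[0]++;
            if (term.endsWith(suffix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    static TreeSet<String> sampleTerms() {
        return new TreeSet<>(Arrays.asList(
                "apple", "book", "booklet", "bookstore", "cook", "look", "zebra"));
    }

    public static void main(String[] args) {
        TreeSet<String> terms = sampleTerms();
        int[] prefixVisited = {0};
        int[] suffixVisited = {0};
        System.out.println(prefixMatches(terms, "book", prefixVisited)
                + " after visiting " + prefixVisited[0] + " terms");
        System.out.println(suffixMatches(terms, "ook", suffixVisited)
                + " after visiting " + suffixVisited[0] + " terms");
    }
}
```

With only seven terms the difference is trivial, but in this model the suffix scan grows with the total number of terms while the prefix scan grows only with the number of matches.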
To restrict searches so that they return results only from a limited subset of documents in the index (e.g. for privacy reasons), use a filter. The QueryFilter class is designed precisely for such cases.
Another way of doing it is the following: just before calling IndexSearcher.search(), add a clause to the query to exclude documents in categories not permitted for this search.
If you are restricting access with a prohibited term, and someone tries to require that term, then the prohibited restriction wins. If you are restricting access with a required term, and they try prohibiting that term, then they will get no documents in their search result.
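The interaction between required and prohibited terms can be sketched with plain sets. This is a toy model rather than Lucene code: the ClauseDemo class, its search() method, and the sample documents are all hypothetical. It assumes a document matches when it contains every required term and none of the prohibited ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ClauseDemo {
    static Set<String> terms(String... values) {
        return new HashSet<>(Arrays.asList(values));
    }

    /** Returns ids of documents that contain all required terms and no prohibited term. */
    static List<String> search(Map<String, Set<String>> docs,
                               Set<String> required, Set<String> prohibited) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : docs.entrySet()) {
            Set<String> docTerms = entry.getValue();
            boolean hasAllRequired = docTerms.containsAll(required);
            boolean hasProhibited = false;
            for (String term : prohibited) {
                if (docTerms.contains(term)) {
                    hasProhibited = true; // one prohibited term is enough to exclude
                }
            }
            if (hasAllRequired && !hasProhibited) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    /** Hypothetical sample data: document id mapped to the terms it contains. */
    static Map<String, Set<String>> sampleDocs() {
        Map<String, Set<String>> docs = new LinkedHashMap<>();
        docs.put("d1", terms("report", "public"));
        docs.put("d2", terms("report", "secret"));
        docs.put("d3", terms("memo", "public"));
        return docs;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> docs = sampleDocs();
        // Restriction by prohibited term: the user asks for "report", the application adds -secret.
        System.out.println(search(docs, terms("report"), terms("secret")));
        // The user tries to require the prohibited term: the prohibition wins.
        System.out.println(search(docs, terms("report", "secret"), terms("secret")));
        // Restriction by required term "public", which the user tries to prohibit.
        System.out.println(search(docs, terms("public"), terms("public")));
    }
}
```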
As for deciding whether to use required or prohibited terms, if possible, you should choose the method that names the less frequent term. That will make queries faster.