Meet Lucene
1.4.2 Searching an index
Searching in Lucene is as fast and simple as indexing; the power of this functionality is astonishing, as chapters 3 and 5 will show you. For now, let's look at Searcher, a command-line program that we'll use to search the index created by Indexer. (Keep in mind that our Searcher exists only to demonstrate the use of Lucene's search API. Your search application could also take the form of a web or desktop application with a GUI, an EJB, and so on.)
In the previous section, we indexed a directory of text files. The index, in this example, resides in a directory of its own on the file system. We instructed Indexer to create a Lucene index in a build/index directory, relative to the directory from which we invoked Indexer. As you saw in listing 1.1, this index contains the indexed files and their absolute paths. Now we need to use Lucene to search that index in order to find files that contain a specific piece of text. For instance, we may want to find all files that contain the keyword java or lucene, or we may want to find files that include the phrase "system requirements".
Using Searcher to implement a search
The Searcher program complements Indexer and provides command-line searching capability. Listing 1.2 shows Searcher in its entirety. It takes two command-line arguments:
- The path to the index created with Indexer
- A query to use to search the index
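As a rough sketch of how a program like Searcher might check these two arguments before touching the index, consider the following plain-Java helper. The class name SearcherArgs and its parse method are hypothetical names introduced for illustration; they are not part of Listing 1.2 or of Lucene.

```java
import java.io.File;

/** Hypothetical helper: validates the two arguments a Searcher-like program expects. */
final class SearcherArgs {
    final File indexDir;   // path to the index created with Indexer
    final String query;    // query string to search the index with

    private SearcherArgs(File indexDir, String query) {
        this.indexDir = indexDir;
        this.query = query;
    }

    /** Rejects a wrong argument count or an index path that is not a directory. */
    static SearcherArgs parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException(
                "Usage: java SearcherArgs <index dir> <query>");
        }
        File indexDir = new File(args[0]);
        if (!indexDir.isDirectory()) {
            throw new IllegalArgumentException(
                indexDir + " does not exist or is not a directory");
        }
        return new SearcherArgs(indexDir, args[1]);
    }

    public static void main(String[] args) {
        SearcherArgs parsed = parse(args);
        System.out.println("index: " + parsed.indexDir + ", query: " + parsed.query);
    }
}
```

Failing early with a usage message keeps the Lucene-specific code free of argument checks.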
Listing 1.2 Searcher: searches a Lucene index for a query passed as an argument.
Searcher, like its Indexer sibling, has only a few lines of code dealing with Lucene. A couple of special things occur in the search method:
Editor's note: The following numbered steps refer to the numbers in Listing 1.2.
1. We use Lucene's IndexSearcher and FSDirectory classes to open our index for searching.
2. We use QueryParser to parse a human-readable query into Lucene's Query class.
3. Searching returns hits in the form of a Hits object.
4. Note that the Hits object contains only references to the underlying documents. In other words, instead of being loaded immediately upon search, matches are loaded from the index in a lazy fashion, only when requested with the hits.doc(int) call.
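The lazy loading described in the last step can be illustrated without Lucene. The following is a minimal plain-Java sketch, not Lucene's Hits implementation: the class LazyHits, its loader function, and the loadCount method are hypothetical and exist only to show that holding references is cheap and that a document is fetched the first time doc(int) asks for it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Toy stand-in for a lazily loaded result list; this is NOT Lucene's Hits class. */
final class LazyHits {
    private final int[] docIds;                 // references to matching documents
    private final IntFunction<String> loader;   // fetches a document by id (assumed costly)
    private final Map<Integer, String> cache = new HashMap<>();
    private int loads = 0;                      // how many documents were actually fetched

    LazyHits(int[] docIds, IntFunction<String> loader) {
        this.docIds = docIds.clone();
        this.loader = loader;
    }

    /** Number of matches; answering this requires no document to be loaded. */
    int length() {
        return docIds.length;
    }

    /** Loads the n-th hit on first access and serves it from the cache afterwards. */
    String doc(int n) {
        return cache.computeIfAbsent(docIds[n], id -> {
            loads++;
            return loader.apply(id);
        });
    }

    int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        String[] store = {"/lucene/README.txt", "/lucene/BUILD.txt", "/lucene/todo.txt"};
        LazyHits hits = new LazyHits(new int[] {0, 2}, id -> store[id]);
        System.out.println(hits.length() + " hits, loaded so far: " + hits.loadCount());
        System.out.println(hits.doc(1) + ", loaded so far: " + hits.loadCount());
    }
}
```

With this design, a search that matches thousands of documents but displays only the first ten pays the loading cost for those ten alone.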
Let's run Searcher and find some documents in our index using the query 'lucene':
% java lia.meetlucene.Searcher build/index 'lucene'
Found 6 document(s) (in 66 milliseconds) that matched query 'lucene':
/lucene/README.txt
/lucene/src/jsp/README.txt
/lucene/BUILD.txt
/lucene/todo.txt
/lucene/LICENSE.txt
/lucene/CHANGES.txt
The output shows that 6 of the 13 documents we indexed with Indexer contain the word lucene and that the search took a meager 66 milliseconds. Because Indexer stores files' absolute paths in the index, Searcher can print them out. It's worth noting that storing the file path as a field was our decision and appropriate in this case, but from Lucene's perspective it's arbitrary metadata attached to indexed documents.
Of course, you can use more sophisticated queries, such as 'lucene AND doug' or 'lucene AND NOT slow' or '+lucene +book', and so on. Chapters 3, 5, and 6 cover the different aspects of searching, including Lucene's query syntax.
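To give a feel for what such Boolean queries mean, here is a toy in-memory inverted index in plain Java. It is only a sketch of the set semantics behind AND and AND NOT; it does not parse query strings and is not how Lucene's QueryParser or scoring works. The class ToyBooleanSearch, its methods, and the sample sentences are all made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index illustrating AND / AND NOT semantics; this is NOT Lucene code. */
final class ToyBooleanSearch {
    // term -> sorted ids of the documents that contain it
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Indexes a document by lowercasing it and splitting on non-letter characters. */
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns a copy of the ids of documents containing the given term. */
    Set<Integer> term(String t) {
        return new TreeSet<>(postings.getOrDefault(t, Collections.emptySet()));
    }

    /** Documents present in both sets, as in 'a AND b' or '+a +b'. */
    static Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    /** Documents in the first set but not the second, as in 'a AND NOT b'. */
    static Set<Integer> andNot(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.removeAll(b);
        return result;
    }

    public static void main(String[] args) {
        ToyBooleanSearch idx = new ToyBooleanSearch();
        idx.add(0, "Lucene was written by Doug");
        idx.add(1, "Lucene is not slow");
        idx.add(2, "Doug likes fast Lucene searches");
        idx.add(3, "Lucene book");
        System.out.println(and(idx.term("lucene"), idx.term("doug")));     // [0, 2]
        System.out.println(andNot(idx.term("lucene"), idx.term("slow")));  // [0, 2, 3]
        System.out.println(and(idx.term("lucene"), idx.term("book")));     // [3]
    }
}
```

A real query parser also handles analysis, phrases, and relevance ranking, which this sketch deliberately leaves out.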