Meet Lucene, Page 5
Using the xargs utility
The Searcher class is a simplistic demo of Lucene's search features. As such, it only dumps matches to the standard output. However, Searcher has one more trick up its sleeve. Imagine that you need to find files that contain a certain keyword or phrase, and then you want to process the matching files in some way. To keep things simple, let's imagine that you want to list each matching file using the ls UNIX command, perhaps to see the file size, permission bits, or owner. By having matching document paths written unadorned to the standard output, and having the statistical output written to standard error, you can use the nifty UNIX xargs utility to process the matched files, as shown here:
% java lia.meetlucene.Searcher build/index 'lucene AND NOT slow' | xargs ls -l
Found 6 document(s) (in 131 milliseconds) that matched query 'lucene AND NOT slow':
-rw-r--r-- 1 erik staff 4215 10 Sep 21:51 /lucene/BUILD.txt
-rw-r--r-- 1 erik staff 17889 28 Dec 10:53 /lucene/CHANGES.txt
-rw-r--r-- 1 erik staff 2670 4 Nov 2001 /lucene/LICENSE.txt
-rw-r--r-- 1 erik staff 683 4 Nov 2001 /lucene/README.txt
-rw-r--r-- 1 erik staff 370 26 Jan 2002 /lucene/src/jsp/README.txt
-rw-r--r-- 1 erik staff 943 18 Sep 21:27 /lucene/todo.txt
In this example, we chose the Boolean query 'lucene AND NOT slow', which finds all files that contain the word lucene and don't contain the word slow. This query took 131 milliseconds and found 6 matching files. We piped Searcher's output to the xargs command, which in turn used the ls -l command to list each matching file. In a similar fashion, the matched files could be copied, concatenated, emailed, or dumped to standard output.3
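The reason the pipeline above works is that only the bare file paths travel down standard output, while the "Found ..." summary goes to standard error and so bypasses xargs. The following is a minimal sketch of that split, not the book's actual Searcher code: the class name MatchReporter, the report method, and the hard-coded sample paths are all hypothetical and stand in for real Lucene search results.

```java
import java.io.PrintStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT the book's Searcher: it only illustrates the
// stdout/stderr split that makes the output safe to pipe into xargs.
public class MatchReporter {

    // Writes the human-readable summary to 'err' and one bare path per line to 'out'.
    static void report(List<String> paths, String query, long millis,
                       PrintStream out, PrintStream err) {
        err.println("Found " + paths.size() + " document(s) (in " + millis
                + " milliseconds) that matched query '" + query + "':");
        for (String path : paths) {
            out.println(path);
        }
    }

    public static void main(String[] args) {
        // Assumed sample data; a real search would supply these values.
        List<String> matches = Arrays.asList("/lucene/BUILD.txt", "/lucene/README.txt");
        report(matches, "lucene AND NOT slow", 131, System.out, System.err);
    }
}
```

Run as java MatchReporter | xargs ls -l, only the two paths reach xargs; the summary line still shows up on the terminal because standard error is not part of the pipe. Note that xargs splits its input on whitespace by default, so paths containing spaces would need extra care.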
Our example indexing and searching applications demonstrate Lucene in a lot of its glory. Its API usage is simple and unobtrusive. The bulk of the code (and this applies to all applications interacting with Lucene) is plumbing relating to the business purpose—in this case, Indexer's file system crawler that looks for text files and Searcher's code that prints matched filenames based on a query to the standard output. But don't let this fact, or the conciseness of the examples, tempt you into complacence: There is a lot going on under the covers of Lucene, and we've used quite a few best practices that come from experience. To effectively leverage Lucene, it's important to understand more about how it works and how to extend it when the need arises. The remainder of this book is dedicated to giving you these missing pieces.
More to Come
The rest of this sample chapter will appear on our website starting March 31st.
1 Erik freely admits to his fondness of all things Apple.
2 Lucene is Doug's wife's middle name; it's also her maternal grandmother's first name.
3 Neal Stephenson details this process nicely in "In the Beginning Was the Command Line": http://www.cryptonomicon.com/beginning.html.
About the Authors
Erik Hatcher codes, writes, and speaks on technical topics that he finds fun and challenging. He has written software for a number of diverse industries using many different technologies and languages. Erik coauthored Java Development with Ant (Manning, 2002) with Steve Loughran, a book that has received wonderful industry acclaim. Since the release of Erik's first book, he has spoken at numerous venues including the No Fluff, Just Stuff symposium circuit, JavaOne, O'Reilly's Open Source Convention, the Open Source Content Management Conference, and many Java User Group meetings. As an Apache Software Foundation member, he is an active contributor and committer on several Apache projects including Lucene, Ant, and Tapestry. Erik currently works at the University of Virginia's Humanities department supporting Applied Research in Patacriticism.
Otis Gospodnetic has been an active Lucene developer for four years and maintains the jGuru Lucene FAQ. He is a Software Engineer at Wireless Generation, a company that develops technology solutions for educational assessments of students and teachers. In his spare time, he develops Simpy, a Personal Web Service that uses Lucene, which he created out of his passion for knowledge, information retrieval, and management. Previous technical publications include several articles about Lucene, published by O'Reilly Network and IBM developerWorks. Otis also wrote To Choose and Be Chosen: Pursuing Education in America, a guidebook for foreigners wishing to study in the United States; it's based on his own experience.
About the Book
Lucene in Action by Erik Hatcher and Otis Gospodnetic
Foreword by Doug Cutting, the inventor of Lucene
Published December 2004, Softbound, 456 pages
Published by Manning Publications Co.
Retail price: $44.95
Ebook price: $22.50. To purchase the ebook go to http://www.manning.com/hatcher2.
This material is from Chapter 1 of the book.