Meet Lucene Part 2
1.7.2 Indexing and searching applications
The other group of available software, both free and commercial, is assembled into prepackaged products. Such software usually doesn't expose a lot of its API and doesn't require you to build a custom application on top of it. Most of this software exposes a mechanism that lets you control a limited set of parameters but not enough to use the software in a way that's drastically different from its assumed use. (To be fair, there are notable exceptions to this rule.)
As such, we can't compare this software to Lucene directly. However, some of these products may be sufficient for your needs and let you get running quickly, even if Lucene or some other IR library turns out to be a better choice in the long run. Here's a short list of several popular products in this category:
- SWISH, SWISH-E, and SWISH++—http://homepage.mac.com/pauljlucas/software/swish/, http://swish-e.org/
- Glimpse and Webglimpse—http://webglimpse.net/
- Harvest and Harvest-NG—http://www.sourceforge.net/projects/harvest/, http://webharvest.sourceforge.net/ng/
- Microsoft Index Server—http://www.microsoft.com/NTServer/techresources/webserv/IndxServ.asp
1.7.3 Online resources
- DMOZ—At the DMOZ Open Directory Project (ODP), you'll find http://dmoz.org/Computers/Software/Information_Retrieval/ and all its subcategories very informative.
- Google—Although Google Directory is based on the Open Directory's data, the two directories do differ. So, you should also visit http://directory.google.com/Top/Computers/Software/Information_Retrieval/.
- Searchtools—There is a web site dedicated to search tools at http://www.searchtools.com/. This web site isn't always up to date, but it has been around for years and is fairly comprehensive. Software is categorized by operating system, programming language, licenses, and so on. If you're interested only in search software written in Java, visit http://www.searchtools.com/tools/tools-java.html.
We've provided positive reviews of some alternatives to Lucene, but we're confident that your requisite homework will lead you to Lucene as the best choice!
In this chapter, you've gained some basic Lucene knowledge. You now know that Lucene is an Information Retrieval library, not a ready-to-use product, and that it most certainly is not a web crawler, as people new to Lucene sometimes think. You've also learned a bit about how Lucene came to be and about the key people and the organization behind it.
In the spirit of Manning's in Action books, we quickly got to the point by showing you two standalone applications, Indexer and Searcher, which are capable of indexing and searching text files stored in a file system. We then briefly described each of the Lucene classes used in these two applications. Finally, we presented our research findings for some products similar to Lucene.
Search is everywhere, and chances are that if you're reading this book, you're interested in search being an integral part of your applications. Depending on your needs, integrating Lucene may be trivial, or it may involve architectural considerations.
We've organized the next couple of chapters as we did this chapter. The first thing we need to do is index some documents; we discuss this process in detail in chapter 2.
About the Authors
Erik Hatcher codes, writes, and speaks on technical topics that he finds fun and challenging. He has written software for a number of diverse industries using many different technologies and languages. Erik coauthored Java Development with Ant (Manning, 2002) with Steve Loughran, a book that has received wonderful industry acclaim. Since the release of Erik's first book, he has spoken at numerous venues including the No Fluff, Just Stuff symposium circuit, JavaOne, O'Reilly's Open Source Convention, the Open Source Content Management Conference, and many Java User Group meetings. As an Apache Software Foundation member, he is an active contributor and committer on several Apache projects including Lucene, Ant, and Tapestry. Erik currently works at the University of Virginia's Humanities department supporting Applied Research in Patacriticism.
Otis Gospodnetic has been an active Lucene developer for four years and maintains the jGuru Lucene FAQ. He is a Software Engineer at Wireless Generations, a company that develops technology solutions for educational assessments of students and teachers. In his spare time, he develops Simpy, a Personal Web Service that uses Lucene, which he created out of his passion for knowledge, information retrieval, and management. Previous technical publications include several articles about Lucene, published by O'Reilly Network and IBM developerWorks. Otis also wrote To Choose and Be Chosen: Pursuing Education in America, a guidebook for foreigners wishing to study in the United States; it's based on his own experience.
About the Book
Lucene in Action by Erik Hatcher and Otis Gospodnetic
Foreword by Doug Cutting, the inventor of Lucene
Published December 2004, Softbound, 456 pages
Published by Manning Publications Co.
Retail price: $44.95
Ebook price: $22.50. To purchase the ebook, go to http://www.manning.com/hatcher2.
This material is from Chapter 1 of the book.