An Introduction to Vista's Integrated Search Capabilities
We all have files on our PCs, lots of files in some cases, and there is nothing more frustrating than "knowing" you have a file you can't find. Whether it is in your file system, in your email, or perhaps even zipped up and archived, we have all been there before. You were stuck either waiting for that overly eager dog to find it or off on your own, browsing folder after folder. Desktop search utilities, while not entirely new, were designed to be generic and therefore are not very granular in their search capabilities. You've probably even heard of a couple of the big ones, like Google Desktop Search and Windows Desktop Search. With the release of Vista, this familiar capability has been built directly into the OS. As a developer, you are now able to provide users with targeted searches from within your applications without constructing or maintaining an indexing engine.
As previously alluded to, a rich full-text and metadata index store of your PC's documents, emails, and other files is baked right into Vista. (Think Windows Desktop Search, fully integrated throughout the operating system.) A quick and easy search box is located in the Start menu, Explorer dialogs, and the Control Panel, and is embedded in applications. In most cases, this search box will do real-time searching as you type; Microsoft calls this "Instant Search." It allows you to search for a file based on its metadata or its contents (full-text), or even to get results based on the context in which you are searching. It can also filter with inline search commands, but I will get into that a bit later.
Now, the best part of what I described above (at least for the developer) is the fact that Microsoft is giving you access to the very same index store that Vista search uses. What makes it even better is that they have provided access to this store with something familiar to all developers, namely OLE DB. And, if that is not good enough, I should mention that you can query this index store with SQL syntax. Microsoft has essentially pulled out all the stops and removed most of the work needed to include full and rich file searches in your custom-developed applications.
In the following sections, I will cover some basics on connecting to this index store via OLE DB, skim over some syntax, and then build your first application that will take advantage of this new capability.
Using OLE DB and SQL to Query the Index
The best way to understand connecting to the index store is to view it as you would when connecting to any other database. You first need a connection string to tell your application where, what, and how to connect. With the index store, you still need a connection string and you utilize the Search.CollatorDSO provider. The following is an example of a full connection string to the index store:
string connectionString = "Provider=Search.CollatorDSO;Extended Properties=\"Application=Windows\";";
As for the rest of the process, it is handled just like a normal connection to a database. You'll need to create your OleDbConnection object, query string (I will cover this later), OleDbCommand object, and finally your OleDbDataReader object. If you're not familiar with this setup, it will be covered a bit later when you do the full demo.
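The steps just listed can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it assumes the System.Data.OleDb namespace from the .NET Framework and a machine on which the Search.CollatorDSO provider is installed. The class name IndexQuery and the method RunQuery are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data.OleDb;

// Hypothetical helper class; the names here are illustrative only.
public static class IndexQuery
{
    // Connection string for the local index store, as described above.
    public const string ConnectionString =
        "Provider=Search.CollatorDSO;Extended Properties=\"Application=Windows\";";

    // Runs a Windows Search SQL query and returns the first column
    // of every row as text. Needs the Search.CollatorDSO provider.
    public static List<string> RunQuery(string queryString)
    {
        List<string> results = new List<string>();

        using (OleDbConnection connection = new OleDbConnection(ConnectionString))
        using (OleDbCommand command = new OleDbCommand(queryString, connection))
        {
            connection.Open();
            using (OleDbDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    // A column may hold no value; Convert.ToString turns
                    // DBNull into an empty string instead of throwing.
                    results.Add(Convert.ToString(reader[0]));
                }
            }
        }
        return results;
    }
}
```

The using blocks close the reader and the connection even if the query throws, which is the same pattern you would use against any other OLE DB data source.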
The only part of the process that is new is the Windows Search SQL syntax for the query string. Although the syntax shares many commonalities with SQL, such as SELECT, FROM, WHERE, ORDER BY, and GROUP BY, it also includes new properties to take advantage of things such as directory scope and full-text searches. Windows Search SQL syntax is an extension of the standard SQL-92 and SQL-99 syntax, enhanced for text-based searching. There are enough syntax nuances to fill their own article, so I will not cover them in depth here. For more information on setting up your OLE DB connection and on Windows Search SQL syntax, MSDN has a good writeup.
Start with a simple query to get introduced to some of the syntax. The query is going to return the display name of all files containing the word "Search" in the contents.
string queryString = "SELECT System.ItemNameDisplay FROM SYSTEMINDEX " + "WHERE CONTAINS('Search')";
The SELECT has plenty of properties to choose from, and I would recommend taking a peek at the Windows Vista SDK documentation for a more complete list. Because there is only one indexing catalog, the FROM is always going to be the same unless you are searching a remote PC, in which case it would be [machineName.]SYSTEMINDEX. One other variation on this is SYSTEMINDEX..SCOPE() but, as I said previously, because there is only one catalog there is no need to add the "..SCOPE()" piece.
That leaves the WHERE clause and its predicate CONTAINS. Your use of it here limits the search to the contents of the item. There are a few variations on how to use CONTAINS, as well as a couple of different predicates the WHERE clause can use. For a good starting point, I would again recommend the Windows Vista SDK documentation.
You can make the searches as simple or as complex as you want, depending on how many properties and clauses you include. However, you should not underestimate the potential of tapping this index with a simple approach. Directory, full-text, and relevancy (rank) searches are just a few examples of the power you have with the syntax. You will cover a bit more syntax in the sample application you are creating, so it's time to get started.
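To make the directory and rank searches mentioned above a little more concrete, here is a hedged sketch of a query string that combines them. The class and method names are hypothetical, and doubling single quotes to escape them is the standard SQL convention, assumed here to apply to Windows Search SQL as well.

```csharp
using System;

// Hypothetical helper; the names are illustrative only.
public static class ScopedQuery
{
    // Builds a ranked full-text query limited to one folder tree.
    // folderPath uses forward slashes, for example "C:/Docs".
    public static string Build(string folderPath, string word)
    {
        // Assumption: double any single quotes so that a value
        // cannot end the SQL string literal early.
        string safePath = folderPath.Replace("'", "''");
        string safeWord = word.Replace("'", "''");

        return "SELECT System.ItemNameDisplay, System.Search.Rank FROM SYSTEMINDEX "
             + "WHERE SCOPE='file:" + safePath + "' "
             + "AND FREETEXT('" + safeWord + "') "
             + "ORDER BY System.Search.Rank DESC";
    }
}
```

The SCOPE condition restricts the search to a folder and its subfolders, and ordering by System.Search.Rank in descending order puts the most relevant items first.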
Building Your First Search Integrated Application on Vista
For this example, you are going to build a simple Windows Forms application that searches Word documents based on a user-entered search value. It will also give the user the ability to do full-text searches or just limit the search to the metadata associated with the files. The search will return the title of the file, where it is located (path), and who authored it.
First, create a new Windows Forms application project in Visual Studio. Keep in mind that to get the full functionality out of this application, you will have to run it on Vista. On the form, add the following controls:
- TextBox: txtbKeyWords
- Button: btnSearch
- CheckBox: ckbFullTextSearch
- ListView: lvResults
Figure 1. The form with the controls added.
In the click event of the search button, you're going to add your connection string, create your OleDbConnection object, and start building your query string based on some user inputs. The text box and check box controls are for user input. The value entered in the text box will be concatenated into the query string in one of two ways, based on the value of the check box. The check box is there to give the user the option to do a full-text search on the file's contents or to limit the search to the title property of the document. The following block of code demonstrates how you build the query string at run time based on the user's inputs.
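A minimal sketch of such a query builder is shown here. It is reconstructed from the controls listed earlier and from the points discussed next, so treat it as an illustration under those assumptions rather than as the article's exact listing. The class name WordDocumentQuery, the method Build, and the input clean-up step are hypothetical additions.

```csharp
using System;

// Hypothetical helper; in the form you would call it from btnSearch's
// click handler with txtbKeyWords.Text and ckbFullTextSearch.Checked.
public static class WordDocumentQuery
{
    public static string Build(string keywords, bool fullTextSearch)
    {
        // Assumption: remove double quotes and double any single quotes
        // so user input cannot break out of the quoted search phrase.
        string phrase = keywords.Replace("\"", "").Replace("'", "''");

        string queryString =
            "SELECT System.Title, System.ItemPathDisplay, System.Author FROM SYSTEMINDEX ";

        // Narrow the search to Word documents.
        queryString += "WHERE FREETEXT(System.CanonicalType, 'DOC') ";

        if (fullTextSearch)
        {
            // Search the contents of each file.
            queryString += "AND CONTAINS('\"" + phrase + "\"')";
        }
        else
        {
            // Search only the title property.
            queryString += "AND CONTAINS(System.Title, '\"" + phrase + "\"')";
        }
        return queryString;
    }
}
```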
Before tying this back into the rest of the application, first I want to dissect some key points of the above query string. This should help point out some common pitfalls as well as key in on areas that could be expanded upon to make a fuller, richer application.
- In this line, you utilize the FREETEXT predicate to narrow your search to only files of type "DOC." The System.CanonicalType property allows you to quickly add file type to your search criteria in this case. Say that you wanted to change your search to look for Excel files. You would simply have to change this line to "XLS" or "XLSX." Of course, if you didn't want to narrow your search by file type, you could simply remove this line altogether. Finally, an option to make this sample application fuller would be to give the user a dropdown list of file types and then dynamically build this line with the appropriate extension.
- These two lines are marked to highlight the similarities between the two WHERE predicates FREETEXT and CONTAINS. The two are virtually interchangeable for most uses. The key difference is that FREETEXT returns a rank value, whereas CONTAINS will only return 0 or 1000 (match not found or found). So, if you are building an application that takes relevancy into account, you'll want to use FREETEXT. (Don't forget to include System.Search.Rank in the SELECT list to see the search's rank values.)
- This line can appear to be pretty confusing because of the single and double quote mess going on. Once concatenated together, it will look similar to this.
CONTAINS ( ' "[text value]" ' )
What you have is a string contained within double quotes, contained within single quotes. Search strings always have to be contained within single quotes. The kicker is that to search for anything beyond one word at once you have to use double quotes as well. For a single word, you could get by using just the single quotes. In your application's case, because you are not restricting what the user can enter in the text box, you have to be able to handle a multi-word search.
- In this line, I wanted to point out that you can narrow your search down to a specific property. Without a property name, as in the previous point, you simply search the contents of the file and not any of its associated properties. To search all properties and the file's content, you can use the wildcard character (*), without quotes, instead of a property name. This syntax is the same for both FREETEXT and CONTAINS.
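The quoting rules and the three ways of targeting a predicate described in the points above can be sketched as a pair of small helpers. The names PredicateBuilder, QuotePhrase, and Contains are hypothetical, and the clean-up of quote characters in the phrase is an assumption added for safety rather than something the syntax requires.

```csharp
using System;

// Hypothetical helpers; the names are illustrative only.
public static class PredicateBuilder
{
    // Wraps a phrase in double quotes and then in single quotes,
    // which allows multi-word searches.
    public static string QuotePhrase(string phrase)
    {
        // Assumption: remove double quotes and double any single quotes
        // so the phrase cannot close its own delimiters.
        string cleaned = phrase.Replace("\"", "").Replace("'", "''");
        return "'\"" + cleaned + "\"'";
    }

    // target: null or empty searches file contents only, "*" searches
    // all properties plus contents, anything else is a property name.
    public static string Contains(string target, string phrase)
    {
        string quoted = QuotePhrase(phrase);
        if (string.IsNullOrEmpty(target))
        {
            return "CONTAINS(" + quoted + ")";
        }
        return "CONTAINS(" + target + ", " + quoted + ")";
    }
}
```

For example, Contains("System.Title", "budget plan") limits the match to the title, while Contains("*", "budget plan") widens it to every property and the file's content.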