Instance-Based Learning: A Java Implementation
The example that ships with the Selection Engine distribution is a small application/applet for buying a computer from an online store. You (the buyer) specify your target preferences (the query point xq) for the computer you want. The online store has a database of different computers (the training examples). The training examples are ranked by how near they lie to the target query (the specification of the computer to buy): the smaller the Euclidean distance between the query and a training example, the higher that example's rank (that is, the higher its similarity). The Selection Engine's query format is pipe-delimited and uses c and w to mark constraints and weights. A query (xq) looks like the format below.
c | Vendor | % | Alienware
c | Vendor | !% | HP
c | Vendor | != | Dell
w | Vendor | 1
c | Price | ~ | [MIN_VAL]
c | Price | <= | 1000
w | Price | 1
c | HD | ~ | [MAX_VAL]
w | HD | 4
c | DVD | % | TRUE
w | DVD | 1
c | cpu_benchmark | ~ | [MAX_VAL]
w | cpu_benchmark | 5
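To make the format concrete, here is a minimal sketch of how such a query could be read into constraints and weights. The class QuerySketch and its members are hypothetical names chosen for illustration and are not part of the Selection Engine API. The sketch also assumes one entry per line and treats the operators (%, !%, !=, ~, <=) as opaque strings without interpreting them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not the Selection Engine's own parser.
class QuerySketch {

    /** One "c | field | operator | value" entry. The operator is kept as plain text. */
    static final class Constraint {
        final String field;
        final String operator;
        final String value;

        Constraint(String field, String operator, String value) {
            this.field = field;
            this.operator = operator;
            this.value = value;
        }
    }

    final List<Constraint> constraints = new ArrayList<>();
    final Map<String, Integer> weights = new LinkedHashMap<>();

    /** Parses pipe-delimited query lines (assumed: one entry per line). */
    static QuerySketch parse(List<String> lines) {
        QuerySketch query = new QuerySketch();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] parts = line.split("\\|");
            for (int i = 0; i < parts.length; i++) {
                parts[i] = parts[i].trim();
            }
            if (parts[0].equals("c") && parts.length == 4) {
                query.constraints.add(new Constraint(parts[1], parts[2], parts[3]));
            } else if (parts[0].equals("w") && parts.length == 3) {
                query.weights.put(parts[1], Integer.parseInt(parts[2]));
            } else {
                throw new IllegalArgumentException("Unrecognised query line: " + line);
            }
        }
        return query;
    }

    public static void main(String[] args) {
        QuerySketch query = parse(List.of(
                "c | Price | <= | 1000",
                "w | Price | 1",
                "c | HD | ~ | [MAX_VAL]",
                "w | HD | 4"));
        for (Constraint c : query.constraints) {
            System.out.println(c.field + " " + c.operator + " " + c.value);
        }
        System.out.println("weights: " + query.weights);
    }
}
```

Keeping the weights in a map keyed by field name makes it easy to look up the weight of an attribute later, when distances are computed.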
The Selection Engine distribution contains a number of text files: the PC data file (the training examples) for the online store, and the buyer's query file, which follows the format displayed above. The default location of each file is set by a constant:
private static final String DEFAULT_FILE_NAME = "C:/WINDOWS/MyProject/PCdata.txt";
private static final String DEFAULT_FILE_NAME = "C:/WINDOWS/MyProject/PCquery.txt";
A PCShoppingAsst GUI application comes with the distribution (it can be downloaded; refer to the resources). Running this application produces the results for the query specified in the "PCquery.txt" file, as shown below in Table 2.
The closest or most similar matches to your perfect computer are ranked from highest to lowest, that is, from the closest neighbours to the more distant ones. The 1-Nearest Neighbour has a similarity of 44%: < maker=compaq, model=deskpro, cpu=650, ram=64, ..., 3.5"=yes >. The 3-Nearest Neighbours are the first three items in the list in Table 2.
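The ranking idea can be sketched in a few lines of Java. This is an illustration under stated assumptions, not the Selection Engine's actual code: attributes are assumed to be numeric and already normalised to the range 0 to 1, the weights are non-negative with at least one positive, and the mapping from distance to a similarity score (one minus the distance divided by the largest possible distance) is an assumed formula. The class name NearestNeighbourSketch and the training values are made up; the weights 1, 4, and 5 echo those given for Price, HD, and cpu_benchmark in the sample query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of weighted k-nearest-neighbour ranking.
class NearestNeighbourSketch {

    /** Weighted Euclidean distance between two normalised points. */
    static double distance(double[] a, double[] b, double[] weights) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += weights[i] * diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Assumed mapping to [0, 1]: 1 means identical, 0 means as far apart as possible. */
    static double similarity(double[] a, double[] b, double[] weights) {
        double weightSum = 0.0;
        for (double w : weights) {
            weightSum += w;
        }
        // With attributes in [0, 1], the largest possible distance is sqrt(sum of weights).
        return 1.0 - distance(a, b, weights) / Math.sqrt(weightSum);
    }

    /** Indices of the k training examples closest to the query, nearest first. */
    static List<Integer> kNearest(double[][] training, double[] query, double[] weights, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < training.length; i++) {
            indices.add(i);
        }
        indices.sort(Comparator.comparingDouble(i -> distance(training[i], query, weights)));
        return new ArrayList<>(indices.subList(0, Math.min(k, indices.size())));
    }

    public static void main(String[] args) {
        // Columns: price, hard disk size, CPU benchmark (made-up, normalised values).
        double[][] training = {
            {0.2, 0.5, 0.9},
            {0.0, 1.0, 1.0},
            {0.9, 0.1, 0.2}
        };
        double[] query = {0.0, 1.0, 1.0};   // lowest price, largest disk, best benchmark
        double[] weights = {1, 4, 5};

        for (int index : kNearest(training, query, weights, 3)) {
            long percent = Math.round(100 * similarity(training[index], query, weights));
            System.out.println("example " + index + ": similarity " + percent + "%");
        }
    }
}
```

Asking for k = 1 gives the single nearest neighbour, and k = 3 gives the top three entries of the ranked list, which is how the 1-Nearest and 3-Nearest Neighbours are read off a ranked table.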
Here are some of the real-world applications of Instance-Based Learning Classification:
- Law: Most work in artificial intelligence and law has concentrated on modelling the type of reasoning done by trial lawyers. In fact, most lawyers' work involves planning: for example, wills and trusts, real estate deals, and business mergers and acquisitions. Certain planning issues, such as the use of underspecified or "open-textured" rules, are illustrated clearly in this domain.
During the 1990s, the University of Massachusetts worked on a series of projects in the area of artificial intelligence and law, emphasizing particularly algorithmic and control issues. One of the software programs from these projects is called HYPO.
HYPO illustrates a use of cases that takes a sequence of facts as input. The target problem (query) is to determine whether the facts satisfy certain legal requirements (in this case, whether they constitute a trade secrets violation). HYPO generates arguments based on previous cases (training examples). User input and cases are both represented in simplified form, using a standard "legal case frame" to hold the important facts of the case. Legal case frames also include such information as the date of the decision, the court deciding the case, and the official citation. Each of the cases in HYPO's case base can be retrieved using a fixed set of indices (termed "dimensions"); dimensions are also used to compare cases. When the user inputs a legal situation, expressed in the legal case frame language, HYPO uses this representation to calculate the applicability of dimensions to the situation, and then uses these values to index into relevant cases. The system then organizes the retrieved cases into a subset lattice based on the dimensions each retrieved case shares with the current situation, and uses the lattice in constructing arguments for both plaintiff and defendant. Arguments are constructed by reciting the dimensions shared with favorable cases and the differences from unfavorable cases (nonshared dimensions or differences in the values of shared dimensions).
- Medicine: Over the past decade, CBR has been used in medical diagnosis software that diagnoses diseases such as heart failure. As input, such a system takes a patient's symptoms and produces a causal network of possible internal states that could lead to those symptoms. When a new case arises, the software tries to find cases of patients with similar but not necessarily identical symptoms. If the new case matches, the CBR system adapts the retrieved diagnosis by considering differences in symptoms between the old and new cases. The CBR system stores explanations of how to use the solution in its cases. Medical CBR systems have also been integrated with other knowledge engineering approaches, such as rule-based systems. There has been a significant advance in medical CBR toward diagnostic systems that can effectively learn from experience.
- E-commerce: The number of electronic catalogs has grown rapidly during the past few years. Most of these catalogs use standard databases for storing and retrieving product information. Using ordinary databases for product catalogs, however, has the major drawback that it is often very difficult to find the products desired: very often, the database does not return a matching product at all, or it returns many products that have to be examined manually. To overcome this problem, Case-Based Reasoning (CBR) techniques have been adopted for commercial electronic catalogs as an approach to requirement-oriented retrieval of products. CBR incorporates product knowledge into the database by means of a similarity measure.
Many commercial electronic vendors offer product information over the Internet and most of their customers use this service frequently. The majority of these catalogs employ standard database approaches for storage and retrieval of product information. Although databases are well understood today and many off-the-shelf solutions exist, we claim that they can only be used efficiently by advanced users, that is, users who are already very familiar with the contents of the database. This is especially a problem for electronic catalogs meant to be used directly by end consumers or engineers who, in many cases, are not experts in the family of products they are looking for, or at least do not have a good overview of all the products and their specialties inside the catalog. Imagine an electronics engineer looking for a device to integrate into her new circuit that pushes the limits of the most advanced devices from her catalog. How can this person select the device best fitting her needs if she is no expert in the area of devices she is looking for? A standard database offers no support for her.
- Design Re-use: The design of electronic circuits is a discipline in which two contrasting tendencies can be observed. On the one hand, modern circuit designs are becoming ever more complex and difficult for electrical engineers to handle. On the other hand, global competition requires a continuous reduction of development times. The correctness and reliability of the designs should, of course, not suffer from shorter development cycles.
These requirements have become so dominant that they can no longer be met without extensive use of design re-use. It is becoming vitally important for an electrical engineer to re-use old designs (or parts of them) rather than design a new application from scratch. Re-using designs from the past requires that the engineer has enough experience and knowledge about existing designs to be able to find candidates that are suitable for re-use in his specific new situation. Searching databases of existing designs can be an extremely time-consuming task because there are currently no intelligent tools to support the engineer in deciding whether a given design from a database can be easily adapted to meet the specification of his new application. Because of this, until now the most effective way of designing has been designer re-use.
CBR techniques use a similarity-based approach that relies on suitable heuristics and makes decisions in a way very similar to the decision-making process of a human designer. CBR is employed to suggest old designs that are re-usable for a given new task.
- Help Desk: CBR has been applied successfully to help desk automation by offering World Wide Web access to a case-based help desk. This approach explores the use of case-based reasoning to create an "intelligent" help desk system that learns. NASA has developed a CBR help desk for its data users.
Many organizations, particularly in computer hardware and software areas, provide extensive customer support via telephone "help desks." This assistance depends primarily on human expertise to respond to user questions. In many cases, the help desk staff must answer the same questions repeatedly. Such support is expensive and requires a large technical staff. Due to budget limitations, many organizations are trying to serve an increasing user population while either maintaining or reducing the size of their help desk staff. A CBR system is well suited to help desk applications because many requests for help are recurring or are variations of previous requests (similar cases).
Currently, most CBR systems for help desk applications are designed to provide in-house staff support. Typically, these tools are used by help desk staff to personally respond to requests for assistance: a predominantly manual system, with the case base providing decision support for help desk staff. More recently, however, attention has focused on designing tools that allow users requiring assistance to access the case base themselves, that is, help desk automation.
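The help desk scenario can be illustrated with a small retrieve-and-retain loop. The sketch below is a hypothetical toy, not a description of NASA's system or of any product: the class HelpDeskSketch, the use of keyword overlap (Jaccard similarity) as the similarity measure, the threshold value, and the sample requests and solutions are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical case-based help desk: retrieve a similar past case, or learn a new one.
class HelpDeskSketch {

    /** A stored case: the keywords of a past request plus the solution that worked. */
    static final class Case {
        final Set<String> keywords;
        final String solution;

        Case(String problem, String solution) {
            this.keywords = tokens(problem);
            this.solution = solution;
        }
    }

    private final List<Case> caseBase = new ArrayList<>();

    /** Splits text into a set of lower-case words. */
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().trim().split("\\W+")));
    }

    /** Jaccard similarity: shared keywords divided by all distinct keywords. */
    static double similarity(Set<String> a, Set<String> b) {
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        Set<String> all = new HashSet<>(a);
        all.addAll(b);
        return all.isEmpty() ? 0.0 : (double) shared.size() / all.size();
    }

    /** Adds a solved request to the case base; this is how the system "learns". */
    void retain(String problem, String solution) {
        caseBase.add(new Case(problem, solution));
    }

    /** Returns the solution of the most similar case, or null if none reaches the threshold. */
    String retrieve(String request, double threshold) {
        Set<String> query = tokens(request);
        Case best = null;
        double bestScore = threshold;
        for (Case c : caseBase) {
            double score = similarity(query, c.keywords);
            if (score >= bestScore) {
                best = c;
                bestScore = score;
            }
        }
        return best == null ? null : best.solution;
    }

    public static void main(String[] args) {
        HelpDeskSketch desk = new HelpDeskSketch();
        desk.retain("cannot download data file", "Check the account's download permissions");
        desk.retain("password reset not working", "Send a new reset link");

        // A variation of an earlier request is answered from the case base.
        System.out.println(desk.retrieve("download of data file fails", 0.3));

        // An unseen request returns null; staff answer it and the new case is retained.
        String request = "image viewer shows blank page";
        if (desk.retrieve(request, 0.3) == null) {
            desk.retain(request, "Clear the browser cache");
        }
        System.out.println(desk.retrieve("viewer shows a blank page", 0.3));
    }
}
```

The threshold is the design choice that separates automation from staff support in this toy: requests that score below it fall through to a person, and the answer that person gives becomes a new case.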