Instance-Based Learning: A Java Implementation
Definition
Instance-Based Learning (IBL) classifies a new instance (the target) by generalizing from stored training examples. The training examples are processed only when a new instance arrives. Instance-Based Learning methods are sometimes called Lazy Learning because they delay processing until a new instance must be classified. Each time a new query instance is encountered, its relationship to the previously stored examples is examined in order to assign a target function value to the new instance.
To sum up the first paragraph: search for the best match, a similar match, or a close match, but not necessarily an exact match. For example, the popular industry draughting software for architecture and engineering, called AutoCAD, contains a standard design library plus the user's previously stored designs. If a new project requires some machinery parts to be designed, the engineer would specify some design parameters (attributes) to the AutoCAD software. It will retrieve similar instances of any past design previously stored in the database. If a previous design instance matches exactly, it retrieves that one; otherwise it retrieves the closest design match. There can be many similar instances retrieved, and the best attributes of each design can be combined and used to design a completely new one.
Introduction
Instance-Based Learning (IBL) algorithms consist of simply storing the presented training examples (data). When a new instance is encountered, a set of similar, related instances is retrieved from memory and used to classify the query instance (that is, to assign it a target function value).
The following are the most common IBL methods:
- k-Nearest Neighbour
- Locally Weighted Regression
- Radial Basis Function
IBL approaches can construct a different approximation of the target function for each distinct query instance to be classified. Some techniques construct only a local approximation of the target function that applies in the neighbourhood of the new query instance, and never construct an approximation designed to perform well over the entire instance space. This is a significant advantage when the target function is very complex but can still be described by a collection of less complex local approximations.
k-Nearest Neighbour
The k-Nearest Neighbour algorithm is the most basic of all Instance-Based Learning (IBL) methods. The algorithm assumes all instances correspond to points in the n-dimensional space R^{n}. The nearest neighbours of an instance are defined in terms of standard Euclidean geometry (distances between points in n-dimensional space). More precisely, let an arbitrary instance x be described by the feature (attribute) list <a_{1}(x), a_{2}(x), a_{3}(x), ..., a_{n}(x)>, where a_{r}(x) denotes the value of the r^{th} attribute of instance x. The distance between two instances x_{i} and x_{j} is given by Equation 1, the general form for calculating distance in n-dimensional space.
Equation 1: d(x_{i}, x_{j}) = sqrt( sum_{r=1}^{n} ( a_{r}(x_{i}) - a_{r}(x_{j}) )^{2} )
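Equation 1 translates directly into a short Java method. Below is a minimal sketch, assuming each instance is represented as a `double[]` of attribute values; the class and method names are illustrative, not from the article's code.

```java
// Minimal sketch: Euclidean distance between two instances in
// n-dimensional space (Equation 1). Names are illustrative.
public class EuclideanDistance {

    // Returns d(xi, xj) = sqrt( sum over r of (a_r(xi) - a_r(xj))^2 ).
    public static double distance(double[] xi, double[] xj) {
        if (xi.length != xj.length) {
            throw new IllegalArgumentException(
                "Instances must have the same number of attributes");
        }
        double sum = 0.0;
        for (int r = 0; r < xi.length; r++) {
            double diff = xi[r] - xj[r];   // a_r(x_i) - a_r(x_j)
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

For instance, `distance(new double[]{0, 0, 0}, new double[]{1, 2, 2})` evaluates to 3.0, since sqrt(1 + 4 + 4) = 3.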
In nearest-neighbour learning, the target function may be either discrete-valued or real-valued. Consider learning discrete-valued target functions of the form f : R^{n} -> V, where V = {v_{1}, v_{2}, v_{3}, ..., v_{s}} is a finite set and R^{n} is real n-dimensional space. The k-Nearest Neighbour algorithm for approximating a discrete-valued target function is given in Algorithm 1 below:
Algorithm 1 (k-Nearest Neighbour algorithm for a discrete-valued target function)

Training Algorithm:
- For each training example <x, f(x)>, add the example to the list training_examples.

Classification Algorithm:
- Given a query instance x_{q} to be classified:
  1. Let x_{1}, ..., x_{k} denote the k instances from training_examples that are nearest to x_{q}.
  2. Return f^(x_{q}) <- argmax over v in V of sum_{i=1}^{k} delta(v, f(x_{i})), where delta(a, b) = 1 if a = b and delta(a, b) = 0 otherwise.
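The algorithm above can be sketched in Java as follows. This is a minimal illustration, assuming training examples are stored as attribute vectors paired with String labels; all class, field, and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of k-NN classification for a discrete-valued target function.
public class KNearestNeighbour {
    private final List<double[]> examples = new ArrayList<>();
    private final List<String> labels = new ArrayList<>();

    // Training step: simply store each example <x, f(x)>.
    public void train(double[] x, String label) {
        examples.add(x);
        labels.add(label);
    }

    // Classification step: return the most common label among the
    // k stored examples nearest to the query instance xq.
    public String classify(double[] xq, int k) {
        // Sort example indices by Euclidean distance to xq.
        Integer[] idx = new Integer[examples.size()];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> distance(examples.get(i), xq)));

        // Majority vote over the k nearest neighbours.
        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < k && i < idx.length; i++) {
            votes.merge(labels.get(idx[i]), 1, Integer::sum);
        }
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    // Euclidean distance (Equation 1).
    private static double distance(double[] xi, double[] xj) {
        double sum = 0.0;
        for (int r = 0; r < xi.length; r++) {
            double d = xi[r] - xj[r];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```

Sorting all examples by distance is O(m log m) in the number of stored examples; it keeps the sketch short, though a production implementation would typically use a partial selection or an index structure such as a kd-tree.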
The algorithm for continuous-valued target functions is the same as Algorithm 1 shown above, except that Step 2 is replaced with the following expression, which returns the mean of the target values of the k nearest neighbours:

f^(x_{q}) <- ( sum_{i=1}^{k} f(x_{i}) ) / k
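The continuous-valued variant can be sketched in the same style: instead of a majority vote, the prediction is the mean target value of the k nearest neighbours. Names here are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Sketch of k-NN for a continuous-valued target function.
public class KnnRegression {
    private final List<double[]> examples = new ArrayList<>();
    private final List<Double> targets = new ArrayList<>();

    // Training step: store each example <x, f(x)>.
    public void train(double[] x, double fx) {
        examples.add(x);
        targets.add(fx);
    }

    // Prediction: mean of the target values of the k nearest neighbours.
    public double predict(double[] xq, int k) {
        Integer[] idx = new Integer[examples.size()];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> distance(examples.get(i), xq)));

        double sum = 0.0;
        int count = Math.min(k, idx.length);
        for (int i = 0; i < count; i++) sum += targets.get(idx[i]);
        return sum / count;   // f^(x_q) = ( sum f(x_i) ) / k
    }

    // Euclidean distance (Equation 1).
    private static double distance(double[] xi, double[] xj) {
        double sum = 0.0;
        for (int r = 0; r < xi.length; r++) {
            double d = xi[r] - xj[r];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}
```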
For example, to find the Euclidean distance in 2-dimensional space (n = 2), take two points from the plot in Figure 1 below, say point A and point B. The axes (x-axis and y-axis) have arbitrary units.
Figure 1: Points A and B plotted in 2-dimensional space (arbitrary units).
The coordinates are A = (1, 6) and B = (4, 3). Now use Equation 1 to calculate the distance between point A and point B, d(A, B), as shown below:

d(A, B) = sqrt( (1 - 4)^{2} + (6 - 3)^{2} )
        = sqrt( (-3)^{2} + (3)^{2} )
        = sqrt( 18 )
        = 4.24 (rounded to two decimal places)
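The same arithmetic can be checked in a few lines of Java, using the coordinates A = (1, 6) and B = (4, 3) from the worked example above:

```java
// Reproducing the worked example: d(A, B) for A = (1, 6), B = (4, 3).
public class DistanceExample {
    public static double distanceAB() {
        double dx = 1 - 4;   // difference along the x-axis
        double dy = 6 - 3;   // difference along the y-axis
        return Math.sqrt(dx * dx + dy * dy);   // sqrt(9 + 9) = sqrt(18)
    }

    public static void main(String[] args) {
        System.out.printf("d(A,B) = %.2f%n", distanceAB()); // prints d(A,B) = 4.24
    }
}
```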
It is easy to calculate distance in 2-dimensional space because we were all taught how to plot graphs in high-school mathematics. In n-dimensional space, the calculation simply extends to the n^{th} axis. Even though we cannot visualize more than three dimensions (that is, x, y, and z), calculating distances between points in n-dimensional space is easy using Equation 1.
The dimension n is simply the number of attributes. Suppose you have a database of people. David, Daniel, Patrick, and John are people's names, all stored in this database as instances. Some attributes from the database could be <weight, height, test_scores, age>, so the instances are the records (rows) of the people table and the attributes are its columns.
Table 1

name    | weight (kg) | height (cm) | test_scores (out of 100%) | age (years)
--------|-------------|-------------|---------------------------|------------
David   | 108.4       | 177         | 78                        | 31
Daniel  | 88.2        | 183         | 60                        | 25
Patrick | 81          | 175         | 54                        | 27
John    | 104         | 198         | 55                        | 38
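Treating each row of Table 1 as a point in 4-dimensional space <weight, height, test_scores, age>, we can find the stored instance nearest to a new query person. The query values below are hypothetical, chosen for illustration. Note that in practice attributes on different scales are usually normalized first, since otherwise attributes with large numeric ranges dominate the distance.

```java
// Nearest neighbour over the instances from Table 1, using the
// raw (unnormalized) attribute values. The query person is hypothetical.
public class PeopleNearestNeighbour {
    static final String[] NAMES = {"David", "Daniel", "Patrick", "John"};
    static final double[][] PEOPLE = {
        {108.4, 177, 78, 31},   // David
        {88.2,  183, 60, 25},   // Daniel
        {81,    175, 54, 27},   // Patrick
        {104,   198, 55, 38}    // John
    };

    // Returns the name of the stored instance closest to the query
    // in 4-dimensional Euclidean distance (Equation 1).
    static String nearest(double[] query) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < PEOPLE.length; i++) {
            double sum = 0.0;
            for (int r = 0; r < query.length; r++) {
                double d = PEOPLE[i][r] - query[r];
                sum += d * d;
            }
            double dist = Math.sqrt(sum);
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return NAMES[best];
    }

    public static void main(String[] args) {
        // Hypothetical new person: 85 kg, 180 cm, scored 58, aged 26.
        System.out.println(nearest(new double[]{85, 180, 58, 26})); // prints Daniel
    }
}
```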
This article was originally published on October 31, 2002