Instance-Based Learning: A Java Implementation

  • October 31, 2002
  • By Sione Palu

Definition

Instance-Based Learning (IBL) classifies a new instance (the target) by generalizing from stored training examples. The training examples are not processed until a new instance arrives; for this reason, Instance-Based Learning methods are sometimes called Lazy Learning, because they delay processing until a new instance must be classified. Each time a new query instance is encountered, its relationship to the previously stored examples is examined in order to assign a target function value to the new instance.

To sum up the first paragraph: search for the best match, a similar match, or a close match, but not necessarily an exact match. For example, AutoCAD, the popular industry draughting software for architecture and engineering, contains a standard design library plus the user's previously stored designs. If a new project requires some machinery parts to be designed, the engineer specifies some design parameters (attributes) to the AutoCAD software, which retrieves similar instances of past designs previously stored in the database. If a previous design instance matches exactly, it retrieves that one; otherwise, it retrieves the closest design match. Many similar instances can be retrieved, and the best attributes of each design can be combined and used to design a completely new one.

Introduction

Instance-Based Learning (IBL) algorithms consist of simply storing the presented training examples (data). When a new instance is encountered, a set of similar, related instances is retrieved from memory and used to classify the query instance (target function).

The following are the most common IBL methods:

  • k-Nearest Neighbour
  • Locally Weighted Regression
  • Radial Basis Function

IBL approaches can construct a different approximation of the target function for each distinct query instance that is to be classified. Some techniques construct only a local approximation of the target function that applies in the neighbourhood of the new query instance, and never construct an approximation designed to perform well over the entire instance space. This is a significant advantage when the target function is very complex but can still be described by a collection of less complex local approximations.

k-Nearest Neighbour

The k-Nearest Neighbour algorithm is the most basic of all Instance-Based Learning (IBL) methods. The algorithm assumes all instances correspond to points in the n-dimensional space R^n. The nearest neighbours of an instance are defined in terms of standard Euclidean geometry (distances between points in n-dimensional space). More precisely, let an arbitrary instance x be described by the feature (attribute) list <a_1(x), a_2(x), a_3(x), ..., a_n(x)>, where a_r(x) denotes the value of the rth attribute of instance x. The distance between two instances x_i and x_j is given by Equation 1, the general form for calculating distance in n-dimensional space.

Equation 1

    d(x_i, x_j) = \sqrt{ \sum_{r=1}^{n} \left( a_r(x_i) - a_r(x_j) \right)^2 }
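As a sketch of how Equation 1 might be coded in Java (the Distance class name, the euclideanDistance method name, and the double[] representation of an instance are assumptions for illustration, not part of the algorithm's definition):

// A minimal sketch of Equation 1. Each element of the array holds one
// attribute value a_r(x) of an instance x.
public class Distance {
    public static double euclideanDistance(double[] xi, double[] xj) {
        if (xi.length != xj.length) {
            throw new IllegalArgumentException(
                    "Instances must have the same number of attributes");
        }
        double sum = 0.0;
        for (int r = 0; r < xi.length; r++) {
            double diff = xi[r] - xj[r];
            sum += diff * diff;  // (a_r(x_i) - a_r(x_j))^2
        }
        return Math.sqrt(sum);
    }
}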

In nearest-neighbour learning, the target function may be either discrete-valued or real-valued. Consider learning discrete-valued target functions of the form f : R^n -> V, where V = {v_1, v_2, v_3, ..., v_s} is a finite set and R^n is real n-dimensional space. The k-Nearest Neighbour algorithm for approximating a discrete-valued target function is given in Algorithm 1 below:

Algorithm 1     (k-Nearest Neighbour algorithm for a discrete-valued target function)

Training Algorithm:
  • For each training example <x, f(x)>, add the example to the list training_examples.

Classification Algorithm:
  • Given a query instance x_q to be classified,
    • Step 1: Let x_1, x_2, ..., x_k denote the k instances from training_examples that are nearest to x_q.
    • Step 2: Return

          \hat{f}(x_q) \leftarrow \arg\max_{v \in V} \sum_{i=1}^{k} \delta(v, f(x_i))

      where \delta(a, b) = 1 if a = b and \delta(a, b) = 0 otherwise; \arg\max returns the value v in V that maximises the sum, that is, the most common target value among the k nearest neighbours.
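The following is a minimal sketch of Algorithm 1 in Java, reusing the euclideanDistance method from the sketch above. The class and method names (KNearestNeighbour, Example, classify) and the String labels are illustrative assumptions:

import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

public class KNearestNeighbour {

    // One stored training example: the instance x and its label f(x).
    static class Example {
        final double[] attributes;
        final String label;
        Example(double[] attributes, String label) {
            this.attributes = attributes;
            this.label = label;
        }
    }

    private final Example[] trainingExamples;
    private final int k;

    KNearestNeighbour(Example[] trainingExamples, int k) {
        this.trainingExamples = trainingExamples;
        this.k = k;
    }

    // Step 1: sort the stored examples by distance to xq and keep the
    // k nearest. Step 2: return the label with the most votes among
    // them (the argmax of Algorithm 1).
    String classify(double[] xq) {
        Example[] sorted = trainingExamples.clone();
        Arrays.sort(sorted, Comparator.comparingDouble(
                (Example e) -> Distance.euclideanDistance(e.attributes, xq)));

        Map<String, Integer> votes = new HashMap<>();
        for (int i = 0; i < k && i < sorted.length; i++) {
            votes.merge(sorted[i].label, 1, Integer::sum);
        }
        // Safe as long as there is at least one training example.
        return votes.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .get().getKey();
    }
}

In practice, k is typically chosen as a small odd number so that majority votes are less likely to tie.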

The algorithm for continuous-valued target functions is the same as Algorithm 1 shown above, except that Step 2 is replaced with the following expression, which returns the mean of the target values of the k nearest neighbours:

    \hat{f}(x_q) \leftarrow \frac{\sum_{i=1}^{k} f(x_i)}{k}
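A minimal sketch of this continuous-valued variant, again reusing the euclideanDistance method from earlier; the KnnRegression name and the parallel-array representation (instances alongside their target values) are assumptions for illustration:

import java.util.Arrays;
import java.util.Comparator;

public class KnnRegression {
    // Returns the mean of the target values f(x_i) of the k training
    // instances nearest to the query instance xq.
    public static double predict(double[] xq, double[][] instances,
                                 double[] targets, int k) {
        Integer[] order = new Integer[instances.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Sort indices by distance to the query instance.
        Arrays.sort(order, Comparator.comparingDouble(
                (Integer i) -> Distance.euclideanDistance(instances[i], xq)));
        double sum = 0.0;
        int count = Math.min(k, order.length);
        for (int i = 0; i < count; i++) {
            sum += targets[order[i]];  // f(x_i) of the i-th nearest neighbour
        }
        return sum / count;
    }
}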

For example, to find the Euclidean distance in 2-dimensional space (n = 2), take two points from the plot in Figure 1 below, say point A and point B. The axes (x-axis and y-axis) have arbitrary units.

Figure 1: Points A and B plotted in the 2-dimensional plane.

The coordinates are A = (1, 6) and B = (4, 3). Now use Equation 1 to calculate the distance between point A and point B, d(A, B), as shown below:

    d(A, B) = \sqrt{ (1 - 4)^2 + (6 - 3)^2 }
            = \sqrt{ 9 + 9 }
            = \sqrt{ 18 }
            = 4.24 (rounded to two decimal places)
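Plugging these coordinates into the euclideanDistance sketch from earlier reproduces this result (the DistanceDemo class name is an illustrative assumption):

public class DistanceDemo {
    public static void main(String[] args) {
        double[] a = {1, 6};  // point A = (1, 6)
        double[] b = {4, 3};  // point B = (4, 3)
        // sqrt((1 - 4)^2 + (6 - 3)^2) = sqrt(18)
        System.out.printf("d(A,B) = %.2f%n",
                Distance.euclideanDistance(a, b));  // prints d(A,B) = 4.24
    }
}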

It is easy to calculate distance in 2-dimensional space because we have all been taught how to plot graphs in high school Mathematics. In n-dimensional space, the calculation simply extends to the nth axis. Even though we cannot visualize more than three dimensions (x, y, and z), calculating distances between points in n-dimensional space is easy using Equation 1.

The dimension n is the number of attributes. Suppose you have a database of people. David, Daniel, Patrick, and John are people's names, all stored in this database as instances. Some attributes of the database could be <weight, height, test_scores, age>, so the instances are the records (rows) of the people table and the attributes are its columns, as in Table 1 below.

Table 1

    name      weight (kg)   height (cm)   test_scores (out of 100%)   age (years)
    David     108.4         177           78                           31
    Daniel    88.2          183           60                           25
    Patrick   81            175           54                           27
    John      104           198           55                           38
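Each row of Table 1 can be treated as one point in 4-dimensional attribute space, so Equation 1 applies directly. A hypothetical encoding in Java, ordering the attributes <weight, height, test_scores, age> and reusing the earlier euclideanDistance sketch:

public class PeopleDemo {
    public static void main(String[] args) {
        // One double[] per person, ordered <weight, height, test_scores, age>.
        double[] david   = {108.4, 177, 78, 31};
        double[] daniel  = { 88.2, 183, 60, 25};
        double[] patrick = { 81.0, 175, 54, 27};
        double[] john    = {104.0, 198, 55, 38};

        // Distance between David and Daniel across all four attributes.
        System.out.printf("d(David, Daniel) = %.2f%n",
                Distance.euclideanDistance(david, daniel));  // prints 28.36
    }
}

Note that attributes measured on large scales (such as weight) contribute more to this raw distance than attributes on small scales, so in practice the attribute values are often normalized before the distances are computed.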



