Instance-Based Learning: A Java Implementation

  • By Sione Palu

Training examples (items) are first run through a FilterEngine object. The FilterEngine performs hit-or-miss filtering, much like the conditions of a SQL query: it removes the items you do not want to include in the similarity classification. For example, given the people database of Table 2, if you do not want to include anyone shorter than 180 cm, the FilterEngine eliminates those training examples before the rest are passed to the SimilarityEngine for classification.
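
The hit-or-miss step can be sketched in a few lines. This is only an illustration of the idea, not the Selection Engine's FilterEngine: the Person class, the keepMatching() helper, and the sample rows are hypothetical and are not taken from Table 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical stand-in for a training example; not a Selection Engine class.
final class Person {
    final String name;
    final int heightCm;

    Person(String name, int heightCm) {
        this.name = name;
        this.heightCm = heightCm;
    }
}

final class FilterSketch {

    // Hit-or-miss filtering: an item either satisfies the criterion and is
    // kept, or it does not and is dropped. No partial scores are involved.
    static <T> List<T> keepMatching(List<T> items, Predicate<T> criterion) {
        List<T> kept = new ArrayList<>();
        for (T item : items) {
            if (criterion.test(item)) {
                kept.add(item);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(new Person("A", 175));   // made-up sample rows
        people.add(new Person("B", 180));
        people.add(new Person("C", 192));

        // Drop everyone shorter than 180 cm before similarity scoring.
        List<Person> tallEnough = keepMatching(people, p -> p.heightCm >= 180);
        for (Person p : tallEnough) {
            System.out.println(p.name + " " + p.heightCm);
        }
    }
}
```

Only the items that survive this step would be handed on for similarity scoring.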

Listing 2 (computeSimilarity() method)

  public SimilarItems computeSimilarity( Items items,
                      SimilarityCriteria criteria,
                      SimilarityWeights weights,
                      DataSetStatistics statistics ) {
     //--- This is the item that we want to compare ourselves to
     //---   to see how similar we are
     SimilarityCriterionScores targetValues = getTargetValues
                               ( items.getTraitDescriptors(),
                               criteria, weights,
                               statistics );

     float maxDistance = getMaxDistance( criteria, weights );

     //--- Create a similarity descriptor for each item
     SimilarItems similarItems = new SimilarItems();
     Iterator itemList = items.iterator();

     while (itemList.hasNext()) {
       Item item = (Item) itemList.next();
       SimilarityDescription descriptor = new SimilarityDescription();
       descriptor.setItem( item );

       SimilarityCriterionScores itemValues = normalizeValues(
                                 item, criteria, weights,
                                 statistics );
       float distance = computeDistance( targetValues, itemValues );
       float percentDifference = (distance / maxDistance);
       float percentSimilarity = (1 - percentDifference);
       descriptor.setPercentSimilarity( percentSimilarity );
       similarItems.add( descriptor );
     } //--- while

     //--- Now that we know how similar everyone is, let's go
     //---   and rank them so that the caller has an easy way to
     //---   sort the items and to make it easier to select the
     //---   k-best matches

     //--- All done.
     return similarItems;
  } //--- computeSimilarity

The computeSimilarity() method of the SimilarityEngine class also takes SimilarityCriteria and SimilarityWeights objects as arguments. SimilarityCriteria holds a collection of SimilarityCriterion objects. A SimilarityCriterion object describes a simple relationship made up of:

  • an attribute
  • a value
  • an operator (3 types)
    1. ~ means "around"   (Use with numbers)
    2. % means "prefer"   (Use with strings and booleans)
    3. !% means "try to avoid"   (Use with strings and booleans)

Examples of these operators, using the hypothetical training examples in Table 2:

  • "age ~ 37" (literally means around 37 years of age)
  • "name % 'Daniel'" (literally means prefer Daniel)
  • "name !% 'Daniel'" (literally means try to avoid Daniel)
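
To make the attribute/operator/value structure concrete, here is a minimal sketch that splits such a criterion string into its three parts. The CriterionSketch class, its parse() method, and the regular expression are assumptions made for illustration; the Selection Engine's own SimilarityCriterion class may be built quite differently.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; not the Selection Engine's SimilarityCriterion class.
final class CriterionSketch {
    final String attribute;
    final String operator;
    final String value;

    // attribute, then one of the three similarity operators, then the value.
    // "!%" is listed before "%" only for readability; "!" cannot match "%".
    private static final Pattern FORMAT =
        Pattern.compile("^\\s*(\\w+)\\s*(!%|~|%)\\s*(.+?)\\s*$");

    private CriterionSketch(String attribute, String operator, String value) {
        this.attribute = attribute;
        this.operator = operator;
        this.value = value;
    }

    static CriterionSketch parse(String text) {
        Matcher m = FORMAT.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a similarity criterion: " + text);
        }
        String value = m.group(3);
        // Strip the single quotes around string values such as 'Daniel'.
        if (value.length() >= 2 && value.startsWith("'") && value.endsWith("'")) {
            value = value.substring(1, value.length() - 1);
        }
        // Attribute names are lower-cased here as one way to ignore case.
        return new CriterionSketch(m.group(1).toLowerCase(Locale.ROOT), m.group(2), value);
    }

    public static void main(String[] args) {
        CriterionSketch c = parse("name !% 'Daniel'");
        System.out.println(c.attribute + " | " + c.operator + " | " + c.value);
    }
}
```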

The Selection Engine treats attribute names as case insensitive. The filter operators used by the FilterEngine are =, !=, <, >, <=, and >=, which perform hit-or-miss (boolean) filtering. For example, "age < 30" filters on ages under 30.

For numeric attributes, the SimilarityEngine recognizes two special values, [MAX_VAL] and [MIN_VAL]. These are relative rather than absolute values. The SimilarityEngine translates them into absolute numbers by determining the maximum and minimum values of each attribute across the items.
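
The translation from a relative value to an absolute one can be sketched as follows. The RelativeValueSketch class and its resolve() method are hypothetical names, and the ages in main() are made-up numbers; the sketch only assumes that the minimum and maximum are taken over the values observed for one attribute.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; not part of the Selection Engine distribution.
final class RelativeValueSketch {

    // Turns "[MAX_VAL]" or "[MIN_VAL]" into the largest or smallest value
    // observed for one attribute. Any other token is read as a plain number.
    static double resolve(String token, List<Double> observed) {
        if (observed.isEmpty()) {
            throw new IllegalArgumentException("No observed values to resolve against");
        }
        double min = observed.get(0);
        double max = observed.get(0);
        for (double v : observed) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        if ("[MAX_VAL]".equals(token)) {
            return max;
        }
        if ("[MIN_VAL]".equals(token)) {
            return min;
        }
        return Double.parseDouble(token);
    }

    public static void main(String[] args) {
        List<Double> ages = Arrays.asList(25.0, 37.0, 52.0);  // made-up ages
        System.out.println(resolve("[MAX_VAL]", ages));
        System.out.println(resolve("[MIN_VAL]", ages));
    }
}
```

With this in place, a criterion such as "age ~ [MAX_VAL]" would be scored against the oldest age in the data set rather than against a fixed number.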

As Listing 2 shows, the overloaded computeSimilarity() method takes a third parameter, SimilarityWeights, which is a collection of SimilarityWeight objects. A SimilarityWeight is also a name/value pair, where the name is the name of an attribute and the value is its weight. A weight can be any integer value, and the default weight of every attribute is 1. Weights affect only the similarity operators ( ~, %, !% ), not the filter operators ( =, !=, <, >, <=, and >= ).
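
A name/value collection with a default weight of 1 might look like the sketch below. WeightsSketch and its methods are hypothetical and do not mirror the SimilarityWeights API; lower-casing the attribute name is an assumption based on the engine treating attribute names as case insensitive.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper; not the Selection Engine's SimilarityWeights class.
final class WeightsSketch {
    private static final int DEFAULT_WEIGHT = 1;

    private final Map<String, Integer> weights = new HashMap<>();

    void set(String attribute, int weight) {
        weights.put(key(attribute), weight);
    }

    // Attributes that were never given a weight fall back to the default of 1.
    int get(String attribute) {
        return weights.getOrDefault(key(attribute), DEFAULT_WEIGHT);
    }

    private static String key(String attribute) {
        return attribute.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        WeightsSketch weights = new WeightsSketch();
        weights.set("age", 3);
        System.out.println(weights.get("AGE"));
        System.out.println(weights.get("height"));
    }
}
```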

The Selection Engine computes similarity as follows:

  • Compute the maximum possible distance from the criteria and their weights (getMaxDistance() in Listing 2).
  • Calculate the Euclidean distance between the query instance xq and each training example (item).
  • Divide the Euclidean distance by the maximum distance (a normalization step).
  • Subtract the normalized value from 1 to get the percentage similarity between the query instance xq and that training example (item).
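
The steps above can be sketched numerically. This is a sketch under stated assumptions rather than the Selection Engine's code: it assumes every attribute value has already been normalized to the range 0 to 1, that each weight multiplies the attribute difference before squaring, and that the maximum distance is therefore reached when every attribute differs by 1. SimilaritySketch and its method names are hypothetical.

```java
// Hypothetical helper; the real computeDistance() and getMaxDistance()
// in the Selection Engine may weight and normalize differently.
final class SimilaritySketch {

    // Largest possible weighted distance, assuming each normalized
    // attribute difference is at most 1.
    static double maxDistance(double[] weights) {
        double sum = 0.0;
        for (double w : weights) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    // Weighted Euclidean distance between the target (query) and one item.
    static double distance(double[] target, double[] item, double[] weights) {
        if (target.length != item.length || target.length != weights.length) {
            throw new IllegalArgumentException("All arrays must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < target.length; i++) {
            double diff = weights[i] * (target[i] - item[i]);
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // 1 minus the normalized distance: 1.0 is identical, 0.0 is as far as possible.
    static double percentSimilarity(double[] target, double[] item, double[] weights) {
        double max = maxDistance(weights);
        if (max == 0.0) {
            return 1.0;  // no weighted attributes, so nothing distinguishes the items
        }
        return 1.0 - distance(target, item, weights) / max;
    }

    public static void main(String[] args) {
        double[] target = {1.0, 0.5};   // made-up normalized values
        double[] item = {1.0, 0.0};
        double[] weights = {1.0, 1.0};
        System.out.println(percentSimilarity(target, item, weights));
    }
}
```

Sorting the items by this score in descending order and taking the first k gives the k best matches that the ranking comment in Listing 2 refers to.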

Listing 3 is a test class, called SelectionEngineTest, which comes with the Selection Engine's distribution.

Listing 3

import net.sourceforge.selectionengine.*;

  /**
   * This is the MAIN CONTROLLING class. It calls everything else.
   * @author baylor wetzel (Selection Engine)
   */
  public class SelectionEngineTest {

    public static void main( String[] args ) {
      SelectionEngineTest demo = new SelectionEngineTest();
      demo.start();
    } //--- main

  /**
   * This is the main routine. It takes items that matched a
   * given target and displays the results to stdout. It's good
   * for running tests.
   */
  public void start( ) {
     try {
       MatchingItemsManager matchingItemsManager =
                    new MatchingItemsManager();
       DisplayManager displayManager = new DisplayManager();

       TraitDescriptors traitDescriptors =
       FilterCriteria filterCriteria =

       Items items = matchingItemsManager.getItems();
       Items filteredItems =

       SimilarItems similarItems =
       SimilarityCriteria sc =
       SimilarityWeights sw =

       displayManager.displayTraitDescriptors( traitDescriptors );
       displayManager.displayItems( traitDescriptors, items );
       displayManager.displayFilterCriteria( filterCriteria );
       displayManager.displayItems( traitDescriptors,
                                    filteredItems );
       displayManager.displaySimilarityCriteria( sc );
       displayManager.displaySimilarityWeights( sw );
       displayManager.displaySimilarItems( traitDescriptors,
                                           similarItems );
       System.out.println( "" );
     } //--- try
     catch( Exception e ) {
       Log.log( "Error: " + e );
     } //--- catch
  } //--- start

  }  //--- SelectionEngineTest

This article was originally published on October 31, 2002
