classifiers.usm.distance
Class USMDistanceFunction

java.lang.Object
  |
  +--coreComponents.DistanceMetric
        |
        +--classifiers.usm.distance.USMDistanceFunction
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
USMStringDistance, USMWavDistance

public abstract class USMDistanceFunction
extends DistanceMetric
implements java.io.Serializable

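Abstract base class implementing Vitanyi's Universal Similarity Metric (USM) distance function. Concrete subclasses such as USMStringDistance and USMWavDistance supply the complexity estimates for their data type.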
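For reference, the normalized form of the USM in Vitanyi and colleagues' formulation is given below. K(.) is Kolmogorov complexity and x* denotes a shortest program that outputs x. K is uncomputable, so the methods of this class work with estimates of each term. How those estimates are combined is left to calculateUSM and the subclasses, and may differ in detail from this idealized definition.

```latex
d(x, y) = \frac{\max\{\, K(x \mid y^{*}),\; K(y \mid x^{*}) \,\}}{\max\{\, K(x),\; K(y) \,\}}
```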

Version:
$Version: 1.0$
Author:
David Lindsay (davidl@cs.rhul.ac.uk)
See Also:
Serialized Form

Constructor Summary
USMDistanceFunction(Instances data)
          Constructs a distance function object.
 
Method Summary
abstract  USMComplexityCache calculateComplexityAndComplexityStarOf(Instance i, Attribute a)
          Calculates the first-order K(x) and second-order K(x*) complexities of an object x
abstract  double calculateConcatenatedComplexityOf(Instance x, Instance y, USMComplexityCache usmcc)
          Calculates the first-order complexity K(xy*) of an object x concatenated with a compressed object y*
 double calculateUSM(double kX, double kXStar, double kY, double kYStar, double kXY, double kYX)
          Function to compute the USM distance from the respective complexity estimates for the instances x and y.
abstract  boolean checkInstance(Instance i)
          Function to check that an individual instance is of the correct format.
abstract  boolean checkInstances(Instances is)
          Function to check that the data instances are in the correct format.
 void definePrefix(java.lang.String p)
          This function defines the prefix used for this type.
 double distance(Instance x, Instance y)
          Calculates the distance (or similarity) between two instances.
 void findAttributesWithPrefix()
          Finds the attributes that match the prefix set by definePrefix(String).
 int findInstanceInCache(Instance inst, Attribute a)
          Determines whether complexities have been cached for an instance and, if so, returns its index in the cache vector.
 USMComplexityCache returnInstanceCache(int index)
          Simple function that returns the instance cache at a specified index
 java.lang.String toString()
          Converts a DistanceFunction object to a string
 void updateInstanceCache(USMComplexityCache usmcc)
          Updates the instance complexity cache
 
Methods inherited from class coreComponents.DistanceMetric
forName, reset, updateRanges
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

USMDistanceFunction

public USMDistanceFunction(Instances data)
Constructs a distance function object.

Parameters:
data - the instances the distance function should work on.
Method Detail

definePrefix

public void definePrefix(java.lang.String p)
This function defines the prefix used for this type.

Parameters:
p - the prefix string

findAttributesWithPrefix

public void findAttributesWithPrefix()
Finds the attributes that match the prefix set by definePrefix(String).
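One way a prefix-based attribute search could work, sketched under the assumption that definePrefix stores a string that is matched against the start of each attribute name. PrefixAttributeSketch is a hypothetical name, and plain strings stand in for the library's Attribute objects; the real class may match prefixes differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: select attributes whose names start with a stored prefix. */
public class PrefixAttributeSketch {

    private String prefix = "";

    /** Stores the prefix, in the role of definePrefix(String). */
    public void definePrefix(String p) {
        this.prefix = p;
    }

    /** Returns the indices of all attribute names that begin with the stored prefix. */
    public List<Integer> findAttributesWithPrefix(List<String> attributeNames) {
        List<Integer> matches = new ArrayList<>();
        for (int i = 0; i < attributeNames.size(); i++) {
            if (attributeNames.get(i).startsWith(prefix)) {
                matches.add(i);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        PrefixAttributeSketch s = new PrefixAttributeSketch();
        s.definePrefix("usm_");
        // Prints [0, 2]: only the first and third names carry the prefix.
        System.out.println(s.findAttributesWithPrefix(List.of("usm_text", "age", "usm_audio")));
    }
}
```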


checkInstances

public abstract boolean checkInstances(Instances is)
Function to check that the data instances are in the correct format.

Specified by:
checkInstances in class DistanceMetric
Parameters:
is - the instances to be checked
Returns:
true if the Instances are in the correct format for this distance function, else false

checkInstance

public abstract boolean checkInstance(Instance i)
Function to check that an individual instance is of the correct format. Performing this check on every distance calculation may be inefficient when many distances are computed.

Parameters:
i - the instance to be checked
Returns:
true if the instance has the same format as initially set, else false

findInstanceInCache

public int findInstanceInCache(Instance inst,
                               Attribute a)
Determines whether complexities have been cached for an instance and, if so, returns its index in the cache vector. Returns -1 if no entry is found.
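A minimal sketch of the lookup contract (index on a hit, -1 on a miss), assuming the cache vector is a list of entries keyed by an instance and an attribute. ComplexityCacheSketch, CacheEntry and matching by object identity are all hypothetical; the real USMComplexityCache may store and compare entries differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a vector-backed complexity cache with a -1 miss sentinel. */
public class ComplexityCacheSketch {

    /** Hypothetical entry: the instance and attribute it belongs to, plus K(x) and K(x*). */
    public static final class CacheEntry {
        public final Object instance;
        public final Object attribute;
        public final double k;     // estimate of K(x)
        public final double kStar; // estimate of K(x*)

        public CacheEntry(Object instance, Object attribute, double k, double kStar) {
            this.instance = instance;
            this.attribute = attribute;
            this.k = k;
            this.kStar = kStar;
        }
    }

    private final List<CacheEntry> cache = new ArrayList<>();

    /** Returns the index of the entry for (inst, a), or -1 if nothing is cached for it. */
    public int findInstanceInCache(Object inst, Object a) {
        for (int i = 0; i < cache.size(); i++) {
            CacheEntry e = cache.get(i);
            // Matching by object identity is an assumption; equals() is an alternative.
            if (e.instance == inst && e.attribute == a) {
                return i;
            }
        }
        return -1;
    }

    /** Returns the entry stored at the given index. */
    public CacheEntry returnInstanceCache(int index) {
        return cache.get(index);
    }

    /** Appends a newly computed entry to the cache. */
    public void updateInstanceCache(CacheEntry entry) {
        cache.add(entry);
    }

    public static void main(String[] args) {
        ComplexityCacheSketch c = new ComplexityCacheSketch();
        Object inst = new Object();
        Object attr = new Object();
        int index = c.findInstanceInCache(inst, attr); // -1: not cached yet
        if (index == -1) {
            c.updateInstanceCache(new CacheEntry(inst, attr, 120.0, 124.0));
            index = c.findInstanceInCache(inst, attr); // now 0
        }
        System.out.println("K(x) = " + c.returnInstanceCache(index).k);
    }
}
```

The linear scan costs O(n) per lookup. A HashMap keyed on the (instance, attribute) pair would give constant-time lookups, but the documented API returns an index into a vector, which this sketch keeps.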


returnInstanceCache

public USMComplexityCache returnInstanceCache(int index)
Simple function that returns the instance cache at a specified index

Returns:
the instance complexity cache

calculateUSM

public double calculateUSM(double kX,
                           double kXStar,
                           double kY,
                           double kYStar,
                           double kXY,
                           double kYX)
Function to compute the USM distance from the respective complexity estimates for the instances x and y.

Parameters:
kX - the Kolmogorov complexity of the instance object x
kXStar - the Kolmogorov complexity of the compressed instance object x (aka x* or xStar)
kY - the Kolmogorov complexity of the instance object y
kYStar - the Kolmogorov complexity of the compressed instance object y (aka y* or yStar)
kXY - the Kolmogorov complexity of the string x concatenated with y (aka xy); the two are joined directly, without a separating comma
kYX - the Kolmogorov complexity of the string y concatenated with x (aka yx); the two are joined directly, without a separating comma
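The documentation does not state how the six estimates are combined. A minimal sketch, assuming the normalized form d(x,y) = max{K(x|y*), K(y|x*)} / max{K(x), K(y)} and approximating the conditional terms as K(x|y*) ~ kXY - kYStar and K(y|x*) ~ kYX - kXStar, which reads kXY as the K(xy*) estimate produced by calculateConcatenatedComplexityOf. Both approximations and the class name UsmSketch are assumptions, not the class's confirmed formula.

```java
/** Hypothetical sketch of combining complexity estimates into a USM distance. */
public class UsmSketch {

    /**
     * Assumed combination:
     *   K(x|y*) ~ kXY - kYStar,  K(y|x*) ~ kYX - kXStar,
     *   d = max(K(x|y*), K(y|x*)) / max(kX, kY).
     */
    public static double calculateUSM(double kX, double kXStar,
                                      double kY, double kYStar,
                                      double kXY, double kYX) {
        double kXGivenYStar = kXY - kYStar; // estimate of K(x | y*)
        double kYGivenXStar = kYX - kXStar; // estimate of K(y | x*)
        double denominator = Math.max(kX, kY);
        if (denominator <= 0.0) {
            return 0.0; // both objects empty: treat them as identical
        }
        return Math.max(kXGivenYStar, kYGivenXStar) / denominator;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not measured complexities.
        // max(140 - 84, 150 - 105) / max(100, 80) = 56 / 100
        System.out.println(calculateUSM(100, 105, 80, 84, 140, 150)); // 0.56
    }
}
```

With compressor-based estimates the result can drift slightly outside [0, 1], since real compressors only approximate K; callers may want to clamp it.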

distance

public double distance(Instance x,
                       Instance y)
Calculates the distance (or similarity) between two instances.

Specified by:
distance in class DistanceMetric
Parameters:
x - the first instance
y - the second instance
Returns:
the distance between the two given instances
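Taken together, the documented methods suggest one plausible flow for distance(x, y): look up or compute each object's K and K* estimates through the cache, compute both concatenated complexities, then combine all six values. The sketch below models that flow with strings, a hypothetical ComplexityEstimator interface in place of the abstract methods, a HashMap in place of the cache vector, and an assumed USM-style combination; it illustrates control flow only and is not the class's actual code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the control flow inside distance(x, y). */
public class DistanceFlowSketch {

    /** Stand-in for the two abstract complexity methods a subclass implements. */
    public interface ComplexityEstimator {
        double[] complexityAndComplexityStar(String obj);  // returns { K(obj), K(obj*) }
        double concatenatedComplexity(String a, String b); // returns K(a b*)
    }

    private final ComplexityEstimator estimator;
    private final Map<String, double[]> cache = new HashMap<>();
    private int cacheMisses = 0;

    public DistanceFlowSketch(ComplexityEstimator estimator) {
        this.estimator = estimator;
    }

    /** Looks up K and K* for an object, computing and caching them on a miss. */
    private double[] cachedComplexities(String obj) {
        double[] pair = cache.get(obj);
        if (pair == null) {
            pair = estimator.complexityAndComplexityStar(obj);
            cache.put(obj, pair);
            cacheMisses++;
        }
        return pair;
    }

    /** Combines cached single-object estimates with freshly computed pair estimates. */
    public double distance(String x, String y) {
        double[] cx = cachedComplexities(x);
        double[] cy = cachedComplexities(y);
        double kXY = estimator.concatenatedComplexity(x, y);
        double kYX = estimator.concatenatedComplexity(y, x);
        // Assumed USM-style combination, for illustration only.
        double numerator = Math.max(kXY - cy[1], kYX - cx[1]);
        double denominator = Math.max(cx[0], cy[0]);
        return denominator > 0 ? numerator / denominator : 0.0;
    }

    public int cacheMisses() {
        return cacheMisses;
    }

    /** Toy estimator based on string length, useful only for exercising the flow. */
    public static ComplexityEstimator toyEstimator() {
        return new ComplexityEstimator() {
            @Override
            public double[] complexityAndComplexityStar(String obj) {
                return new double[] { obj.length(), obj.length() + 1 };
            }

            @Override
            public double concatenatedComplexity(String a, String b) {
                return a.length() + b.length() + 1;
            }
        };
    }

    public static void main(String[] args) {
        DistanceFlowSketch d = new DistanceFlowSketch(toyEstimator());
        d.distance("abcd", "ab");
        d.distance("abcd", "ab"); // second call reuses both cached entries
        System.out.println("cache misses: " + d.cacheMisses()); // cache misses: 2
    }
}
```

Only the per-object K and K* values are cached here, because the concatenated complexities depend on the pair; this mirrors the per-instance complexity cache described for this class.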

updateInstanceCache

public void updateInstanceCache(USMComplexityCache usmcc)
Updates the instance complexity cache

Parameters:
usmcc - the complexity cache for an example

calculateComplexityAndComplexityStarOf

public abstract USMComplexityCache calculateComplexityAndComplexityStarOf(Instance i,
                                                                          Attribute a)
Calculates the first-order K(x) and second-order K(x*) complexities of an object x

Parameters:
i - the instance to calculate the complexities for
Returns:
the complexities in a paired array [ K(x), K(x*) ]
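Because Kolmogorov complexity is uncomputable, implementations typically estimate it with a real compressor. A minimal sketch of one such estimate, assuming DEFLATE from java.util.zip as the compressor: K(x) is the compressed length of x in bytes, x* is the compressed form of x, and K(x*) is the compressed length of x*. The class name is hypothetical, the source does not say which compressor the USM subclasses use, and the sketch returns a plain double[] instead of a USMComplexityCache.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

/** Hypothetical sketch: estimate K(x) and K(x*) with DEFLATE compression. */
public class DeflateComplexitySketch {

    /** DEFLATE-compresses the input at the highest compression level. */
    public static byte[] compress(byte[] input) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes, deflater)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end(); // a caller-supplied Deflater is not ended by close()
        }
        return bytes.toByteArray();
    }

    /** Returns the paired estimates { K(x), K(x*) } as compressed lengths in bytes. */
    public static double[] complexityAndComplexityStar(String x) {
        byte[] xStar = compress(x.getBytes(StandardCharsets.UTF_8)); // x* ~ compressed x
        byte[] xStarStar = compress(xStar);                          // used for K(x*)
        return new double[] { xStar.length, xStarStar.length };
    }

    public static void main(String[] args) {
        double[] k = complexityAndComplexityStar("abababababababababababababababab");
        System.out.println("K(x) ~ " + k[0] + ", K(x*) ~ " + k[1]);
    }
}
```

The zlib header and checksum dominate for very short inputs, so such estimates are only informative for reasonably long objects.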

calculateConcatenatedComplexityOf

public abstract double calculateConcatenatedComplexityOf(Instance x,
                                                         Instance y,
                                                         USMComplexityCache usmcc)
Calculates the first-order complexity K(xy*) of an object x concatenated with a compressed object y*

Parameters:
x - the instance to use
y - the instance to use the compression of
Returns:
the complexity of the concatenated string xy (joined without a comma)
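Under a compression-based reading of the terms, K(xy*) can be estimated by compressing the bytes of x followed directly by the compressed bytes of y, with no separator in between, matching the "no comma" note above. A minimal sketch, assuming DEFLATE from java.util.zip as the compressor; the class name and the compressor choice are assumptions, not the subclasses' confirmed implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

/** Hypothetical sketch: estimate K(xy*) by compressing x followed by the compressed form of y. */
public class ConcatenatedComplexitySketch {

    /** DEFLATE-compresses the input at the highest compression level. */
    public static byte[] compress(byte[] input) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes, deflater)) {
            out.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end(); // a caller-supplied Deflater is not ended by close()
        }
        return bytes.toByteArray();
    }

    /** Estimates K(xy*): x and y* are joined directly, with no separator such as a comma. */
    public static double concatenatedComplexity(String x, String y) {
        byte[] xBytes = x.getBytes(StandardCharsets.UTF_8);
        byte[] yStar = compress(y.getBytes(StandardCharsets.UTF_8));
        byte[] joined = new byte[xBytes.length + yStar.length];
        System.arraycopy(xBytes, 0, joined, 0, xBytes.length);
        System.arraycopy(yStar, 0, joined, xBytes.length, yStar.length);
        return compress(joined).length;
    }

    public static void main(String[] args) {
        String x = "the quick brown fox jumps over the lazy dog";
        String y = "the quick brown cat jumps over the lazy dog";
        // Swapping the arguments gives the K(yx*) estimate used for the other direction.
        System.out.println("K(xy*) ~ " + concatenatedComplexity(x, y));
        System.out.println("K(yx*) ~ " + concatenatedComplexity(y, x));
    }
}
```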

toString

public java.lang.String toString()
Converts a DistanceFunction object to a string

Specified by:
toString in class DistanceMetric
Returns:
a string describing a distance function


Copyright (c) 2003 David Lindsay, Computer Learning Research Centre, Dept. Computer Science, Royal Holloway, University of London