OnlineEvaluation (clrc1.0)

evaluationMethods
Class OnlineEvaluation

java.lang.Object
  |
  +--evaluationMethods.OnlineEvaluation

public class OnlineEvaluation
extends java.lang.Object

Class for evaluating machine learning models in the online setting.

-------------------------------------------------------------------

General options when evaluating a learning scheme from the command-line:

-t filename
Name of the file with the training data. (required)

-T filename
Name of the file with the test data. This will be joined to the training data for the online test. (optional: if not specified then the training file will just be used)

-c index
Index of the class attribute (1, 2, ...; default: last).

-S gapSize
Switch to slow teacher with fixed gap (lag) size

-L gapSize
Switch to lazy teacher with fixed gap size

-x number
The number of folds for the cross-validation (default: 10).

-s seed
Random number seed for the cross-validation (default: 1).

-m filename
The name of a file containing a cost matrix.

-l filename
Loads classifier from the given file.

-d filename
Saves classifier built from the training data into the given file.

-v
Outputs no statistics for the training data.

-o
Outputs statistics only, not the classifier.

-i
Outputs information-retrieval statistics per class.

-k
Outputs information-theoretic statistics.

-p range
Outputs predictions for test instances, along with the attributes in the specified range (and nothing else). Use '-p 0' if no attributes are desired.

-b numberOfBins
Sets the number of bins to create for the calibration of the probabilities (default: 10).

-r
Outputs cumulative margin distribution (and nothing else).

-g
Only for classifiers that implement "Graphable." Outputs the graph representation of the classifier (and nothing else).

-n significance level
Specifies the significance level to be used in the online setting.

-P
Outputs the p-values or probabilities for each prediction in the online setting (default: switched off, to save space).

-S lagSize
Sets the learning machine into slow teaching mode with a fixed lag size.

-A lagBase,lagPower
Specifies a slow teacher mode with a growing Arithmetic Progression lag. The lag is created using the recursive function lag_0 = lagBase; lag_(i+1) = lag_i * lagPower.

-B lagBase,lagPower
Specifies a slow teacher mode with a growing Geometric Progression lag. The lag is created using the recursive function lag_0 = lagBase; lag_(i+1) = lag_i ^ lagPower.

-L gapSize
Sets the learning machine into lazy teaching mode with a fixed gap size. Lazy updates at each instance that is introduced.

-U gapBase,gapPower
Specifies a lazy teacher mode with a growing Arithmetic Progression gap. The gap is created using the recursive function gap_0 = gapBase; gap_(i+1) = gap_i * gapPower.

-V gapBase,gapPower
Specifies a lazy teacher mode with a growing Geometric Progression gap. The gap is created using the recursive function gap_0 = gapBase; gap_(i+1) = gap_i ^ gapPower.

-N (no parameters)
Specifies a normal online experiment

-E bernoulliProb,randomSeed
Creates an erratic teaching plan. Training data is introduced at every success in a Bernoulli trial with probability bernoulliProb (must be between 0 and 1). The randomSeed (must be a long integer) is used to create the random numbers.

-X data not from file
This is a very rarely used option but allows training data to be passed without reading it from a file. Instead we use function setData() to pass an Instances object.

-M filename of the matlab program
This option converts the output of the online experiment into a line plot hardcoded in Matlab code

Version:: $Revision: 1.00 $
Author:: David Lindsay (davidl@cs.rhul.ac.uk)

Field Summary

double m_ErraticBernoulliProb
          Defines the probability of introducing a training example in the online setting

long m_ErraticRandomSeed
          Defines the random seed used to generate the erratic experience plan

int m_SlowLazyFixedGap
          Defines the fixed gap size for both the slow and lazy settings

double m_SlowLazyGrowGapBase
          Defines the base in the Arithmetic/Geometric progression growing gap

double m_SlowLazyGrowGapPower
          Defines the power in the Arithmetic/Geometric progression growing gap

static int ONLINE_ERRATIC


static int ONLINE_LAZY_AP_GAP


static int ONLINE_LAZY_FIXED


static int ONLINE_LAZY_GP_GAP


static int ONLINE_NORMAL


static int ONLINE_SLOW_AP_GAP


static int ONLINE_SLOW_FIXED


static int ONLINE_SLOW_GP_GAP


static Tag[] TAGS_ONLINE_MODE


Constructor Summary

OnlineEvaluation()


Method Summary

double[] calculateConfidencePerformanceStatistics(double sigLevel)
          Creates an array of numbers reporting the performance of the p-values of each prediction at a set significance level.

java.lang.String createConfidenceCalibrationHistogram()
          Creates a histogram tracking the performance of the confidence predictions at various significance levels.

java.lang.String evaluateModelOnline(java.lang.String classifierName, java.lang.String[] argv)
          Evaluate the classifier model online.

java.lang.String evaluateProbabilityCalibration()
          Creates a string reporting the calibration performance of the probabilities for each prediction

java.lang.String evalulatePValuesAndProbs(double sigLevel)
          Deprecated. Use the incremental stats methods instead of this clumsy batch function. Creates a string reporting the performance of the p-values of each prediction at a set significance level.

int getNumberOfBins()
          Gets the number of bins used in the probability calibration histograms

SelectedTag getOnlineMode()
          Gets the chosen online learning mode.

java.lang.String getOptions()
          Gets the current settings of the online experiment.

boolean getOutputPValuesAndProbs()
          Reports whether p-values and probabilities are output for each example in the online experiment

double getPerformanceStat(java.lang.String statName, int trialNum)
          Gets the value for a particular numeric statistic at a particular trial number.

double[] getPredictionOfLastExample()
          This will return the last prediction made in the online process.

double getSignificanceLevel()
          Gets the significance level for a confidence classifier in an online experiment.

static void main(java.lang.String[] args)
          A test method for this class.

java.lang.String ouputOnlineSummaryString(Classifier c)
          Used to replace the old method to summarise the online experiment

java.lang.String outputDebugPValuesProbs()
          Creates an output string of the p-values or probabilities output by the learning machine at the last trial

java.lang.String outputDebugStatOutput(Classifier c)
          Gives a brief summary of some of the stats as you go along.

void plotGraph()
          This is an attempt to add a graphical element to this package, allowing performance to be plotted on a graph.

void setData(Instances data)
          Sets the data set to be used in the online experiment.

void setNumberOfBins(int numBins)
          Set the number of calibration bins to use in testing the calibration of the online probabilities

void setOnlineMode(SelectedTag newMode)
          Sets the online mode used.

void setOutputPValuesAndProbs(boolean switchOn)
          Sets whether the probabilities or p-values are output by the classifier for each prediction in the online experiment.

void setSignificanceLevel(double sigLevel)
          Sets the significance level for a confidence classifier in an online experiment.

java.lang.String toCalibrationHistogramString()
          Outputs the probability calibration histogram of the data

void updateConfidenceClassifierStats(double[] pValues)
          Updates the confidence statistics.

void updateDistributionClassifierStats(double[] probs)
          Updates the distribution classifier statistics.

void updateStandardStatistics(Instance instance, double[] pValuesAndProbs, double sigLevel)
          A method used to update the standard statistics based on some probabilities and p-values output by a classifier.

void updateVennProbabilityClassifierStats(Matrix vennProbMatrix)
          Updates the Venn probability statistics.

Methods inherited from class java.lang.Object

equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Detail

ONLINE_NORMAL

public static final int ONLINE_NORMAL

See Also:: Constant Field Values

ONLINE_SLOW_FIXED

public static final int ONLINE_SLOW_FIXED

See Also:: Constant Field Values

ONLINE_SLOW_AP_GAP

public static final int ONLINE_SLOW_AP_GAP

See Also:: Constant Field Values

ONLINE_SLOW_GP_GAP

public static final int ONLINE_SLOW_GP_GAP

See Also:: Constant Field Values

ONLINE_LAZY_FIXED

public static final int ONLINE_LAZY_FIXED

See Also:: Constant Field Values

ONLINE_LAZY_AP_GAP

public static final int ONLINE_LAZY_AP_GAP

See Also:: Constant Field Values

ONLINE_LAZY_GP_GAP

public static final int ONLINE_LAZY_GP_GAP

See Also:: Constant Field Values

ONLINE_ERRATIC

public static final int ONLINE_ERRATIC

See Also:: Constant Field Values

TAGS_ONLINE_MODE

public static final Tag[] TAGS_ONLINE_MODE

m_SlowLazyFixedGap

public int m_SlowLazyFixedGap

Defines the fixed gap size for both the slow and lazy settings

m_SlowLazyGrowGapBase

public double m_SlowLazyGrowGapBase

Defines the base in the Arithmetic/Geometric progression growing gap

m_SlowLazyGrowGapPower

public double m_SlowLazyGrowGapPower

Defines the power in the Arithmetic/Geometric progression growing gap
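
The growing lag and gap recurrences documented for the -A, -U, -B and -V options can be sketched as below. This is only an illustration that follows the formulas exactly as they are written in the option descriptions (multiplication for the "Arithmetic Progression" options, exponentiation for the "Geometric Progression" options). The class and method names are hypothetical and are not part of OnlineEvaluation, and the source does not say how fractional lags are rounded to whole trials, so no rounding is modelled.

```java
import java.util.Arrays;

/** Hypothetical sketch of the documented growing lag/gap recurrences. */
final class LagScheduleSketch {

    /** Documented -A / -U recurrence: lag_0 = base; lag_(i+1) = lag_i * power. */
    static double[] multiplicativeLags(double base, double power, int count) {
        double[] lags = new double[count];
        double lag = base;
        for (int i = 0; i < count; i++) {
            lags[i] = lag;
            lag = lag * power; // next lag is the current lag times the power
        }
        return lags;
    }

    /** Documented -B / -V recurrence: lag_0 = base; lag_(i+1) = lag_i ^ power. */
    static double[] exponentialLags(double base, double power, int count) {
        double[] lags = new double[count];
        double lag = base;
        for (int i = 0; i < count; i++) {
            lags[i] = lag;
            lag = Math.pow(lag, power); // next lag is the current lag raised to the power
        }
        return lags;
    }

    public static void main(String[] args) {
        // base = 2, power = 3 gives 2, 6, 18, 54
        System.out.println(Arrays.toString(multiplicativeLags(2.0, 3.0, 4)));
        // base = 2, power = 2 gives 2, 4, 16, 256
        System.out.println(Arrays.toString(exponentialLags(2.0, 2.0, 4)));
    }
}
```

With these recurrences the exponentiated schedule grows far faster than the multiplied one, which is worth keeping in mind when choosing the base and power for a short data set.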

m_ErraticBernoulliProb

public double m_ErraticBernoulliProb

Defines the probability of introducing a training example in the online setting

m_ErraticRandomSeed

public long m_ErraticRandomSeed

Defines the random seed used to generate the erratic experience plan
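
The erratic experience plan described for the -E option can be pictured as one seeded Bernoulli trial per online trial. The sketch below is an assumption about how such a plan might be generated with java.util.Random; the class and method names are hypothetical, and the actual generator used by OnlineEvaluation is not documented here.

```java
import java.util.Random;

/** Hypothetical sketch of an erratic teaching plan driven by seeded Bernoulli trials. */
final class ErraticPlanSketch {

    /**
     * Returns a plan where plan[i] is true when trial i succeeds,
     * i.e. when a training example would be introduced at that trial.
     */
    static boolean[] buildPlan(int numTrials, double bernoulliProb, long randomSeed) {
        if (!(bernoulliProb >= 0.0 && bernoulliProb <= 1.0)) {
            throw new IllegalArgumentException("bernoulliProb must be between 0 and 1");
        }
        Random rng = new Random(randomSeed); // same seed reproduces the same plan
        boolean[] plan = new boolean[numTrials];
        for (int i = 0; i < numTrials; i++) {
            // nextDouble() is uniform on [0, 1), so this succeeds with probability bernoulliProb
            plan[i] = rng.nextDouble() < bernoulliProb;
        }
        return plan;
    }

    public static void main(String[] args) {
        boolean[] plan = buildPlan(20, 0.3, 42L);
        int introduced = 0;
        for (boolean success : plan) {
            if (success) {
                introduced++;
            }
        }
        System.out.println("Training examples introduced in 20 trials: " + introduced);
    }
}
```

Fixing the seed makes an erratic experiment repeatable, which is presumably why the option takes the seed alongside the probability.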

Constructor Detail

OnlineEvaluation

public OnlineEvaluation()

Method Detail

getPerformanceStat

public double getPerformanceStat(java.lang.String statName,
                                 int trialNum)

Gets the value for a particular numeric statistic at a particular trial number.

updateStandardStatistics

public void updateStandardStatistics(Instance instance,
                                     double[] pValuesAndProbs,
                                     double sigLevel)

A method used to update the standard statistics based on some probabilities and p-values output by a classifier.

Parameters:: instance - the instance the prediction is made for.; pValuesAndProbs - a double array containing the p-values and probabilities.; sigLevel - the significance level considered by the learning machine

updateConfidenceClassifierStats

public void updateConfidenceClassifierStats(double[] pValues)

Updates the confidence statistics.

updateDistributionClassifierStats

public void updateDistributionClassifierStats(double[] probs)

Updates the distribution classifier statistics.

updateVennProbabilityClassifierStats

public void updateVennProbabilityClassifierStats(Matrix vennProbMatrix)

Updates the Venn probability statistics.

ouputOnlineSummaryString

public java.lang.String ouputOnlineSummaryString(Classifier c)

Used to replace the old method to summarise the online experiment

outputDebugStatOutput

public java.lang.String outputDebugStatOutput(Classifier c)

Gives a brief summary of some of the stats as you go along.

outputDebugPValuesProbs

public java.lang.String outputDebugPValuesProbs()

Creates an output string of the p-values or probabilities output by the learning machine at the last trial

main

public static void main(java.lang.String[] args)

A test method for this class. Just extracts the first command line argument as a classifier class name and calls evaluateModelOnline.

Parameters:: args - an array of command line arguments, the first of which must be the class name of a classifier.

setOnlineMode

public void setOnlineMode(SelectedTag newMode)

Sets the online mode used. The mode is one of ONLINE_NORMAL, ONLINE_SLOW_FIXED, ONLINE_SLOW_AP_GAP, ONLINE_SLOW_GP_GAP, ONLINE_LAZY_FIXED, ONLINE_LAZY_AP_GAP, ONLINE_LAZY_GP_GAP or ONLINE_ERRATIC.

Parameters:: newMode - the chosen online mode

getOnlineMode

public SelectedTag getOnlineMode()

Gets the chosen online learning mode. The mode is one of ONLINE_NORMAL, ONLINE_SLOW_FIXED, ONLINE_SLOW_AP_GAP, ONLINE_SLOW_GP_GAP, ONLINE_LAZY_FIXED, ONLINE_LAZY_AP_GAP, ONLINE_LAZY_GP_GAP or ONLINE_ERRATIC.

Returns:: the online learning mode

setOutputPValuesAndProbs

public void setOutputPValuesAndProbs(boolean switchOn)

Sets whether the probabilities or p-values are output by the classifier for each prediction in the online experiment. The default is false; the output can get huge, so you may want to keep this switched off for large data sets.

Parameters:: switchOn - a boolean flag specifying whether to switch on output of probabilities and p-values

getOutputPValuesAndProbs

public boolean getOutputPValuesAndProbs()

Reports whether p-values and probabilities are output for each example in the online experiment

Returns:: whether output of probs and p-values is switched on

setSignificanceLevel

public void setSignificanceLevel(double sigLevel)
                          throws java.lang.Exception

Sets the significance level for a confidence classifier in an online experiment.

Parameters:: sigLevel - the significance level to mark p-values against
Throws:: java.lang.Exception

getSignificanceLevel

public double getSignificanceLevel()

Gets the significance level for a confidence classifier in an online experiment.

getOptions

public java.lang.String getOptions()

Gets the current settings of the online experiment.

Returns:: a string of the current settings, suitable for printing to screen

evaluateProbabilityCalibration

public java.lang.String evaluateProbabilityCalibration()
                                                throws java.lang.Exception

Creates a string reporting the calibration performance of the probabilities for each prediction

Returns:: a string of text representing the histogram that can be plotted using MATLAB?
Throws:: java.lang.Exception - if the number of p-values and training examples don't match (should never happen)

toCalibrationHistogramString

public java.lang.String toCalibrationHistogramString()
                                              throws java.lang.Exception

Outputs the probability calibration histogram of the data

Returns:: the probability calibration histogram in string format
Throws:: java.lang.Exception - if the class attribute is numeric
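
As an illustration of what a probability calibration histogram measures, the sketch below groups predictions into equal-width bins by predicted probability and compares the mean predicted probability in each bin with the observed frequency of correct predictions. It is a generic calibration sketch under stated assumptions, not the code behind toCalibrationHistogramString(): the class name, the input arrays and the returned layout are all hypothetical, and probabilities are assumed to lie in [0, 1].

```java
/** Hypothetical sketch of binning predicted probabilities for a calibration histogram. */
final class CalibrationHistogramSketch {

    /**
     * Returns one row per bin: {count, mean predicted probability, observed frequency}.
     * predictedProb[i] is the probability given to the predicted class at trial i,
     * and correct[i] records whether that prediction turned out to be right.
     */
    static double[][] calibrate(double[] predictedProb, boolean[] correct, int numBins) {
        if (predictedProb.length != correct.length) {
            throw new IllegalArgumentException("inputs must have the same length");
        }
        if (numBins < 1) {
            throw new IllegalArgumentException("numBins must be at least 1");
        }
        double[][] bins = new double[numBins][3];
        for (int i = 0; i < predictedProb.length; i++) {
            // Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
            int b = Math.min((int) (predictedProb[i] * numBins), numBins - 1);
            bins[b][0] += 1.0;
            bins[b][1] += predictedProb[i];
            if (correct[i]) {
                bins[b][2] += 1.0;
            }
        }
        for (double[] bin : bins) {
            if (bin[0] > 0) {
                bin[1] /= bin[0]; // mean predicted probability in this bin
                bin[2] /= bin[0]; // fraction of correct predictions in this bin
            }
        }
        return bins;
    }

    public static void main(String[] args) {
        double[] probs = {0.05, 0.15, 0.95, 0.95};
        boolean[] correct = {false, false, true, false};
        double[][] bins = calibrate(probs, correct, 10);
        for (int b = 0; b < bins.length; b++) {
            System.out.println(b + "\t" + bins[b][0] + "\t" + bins[b][1] + "\t" + bins[b][2]);
        }
    }
}
```

For well-calibrated probabilities the last two columns of each populated bin should be close to one another.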

calculateConfidencePerformanceStatistics

public double[] calculateConfidencePerformanceStatistics(double sigLevel)
                                                  throws java.lang.Exception

Creates an array of numbers reporting the performance of the p-values of each prediction at a set significance level. The figures returned can then be accumulated for creating a confidence calibration histogram.

Parameters:: sigLevel - the chosen significance level to evaluate the p-values at
Returns:: an array of doubles returning the performance statistics at a set significance level
Throws:: java.lang.Exception
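
The page does not list which statistics the returned array holds. As a hedged illustration of evaluating p-values at a significance level, the sketch below uses the usual confidence-prediction convention: a label is kept in the prediction region when its p-value exceeds the significance level, and an error is counted when the true label is left out. The class name, inputs and the three reported rates are assumptions, not the documented contents of the array.

```java
/** Hypothetical sketch of summarising p-value predictions at one significance level. */
final class ConfidenceStatsSketch {

    /**
     * pValues[t][k] is the p-value for label k at trial t; trueLabels[t] is the true label index.
     * Returns {error rate, rate of empty regions, rate of regions with more than one label}.
     */
    static double[] regionStats(double[][] pValues, int[] trueLabels, double sigLevel) {
        if (pValues.length == 0 || pValues.length != trueLabels.length) {
            throw new IllegalArgumentException("need one true label per non-empty set of p-values");
        }
        int errors = 0;
        int empty = 0;
        int multiple = 0;
        for (int t = 0; t < pValues.length; t++) {
            int regionSize = 0;
            for (double p : pValues[t]) {
                if (p > sigLevel) {
                    regionSize++; // label is not rejected at this significance level
                }
            }
            if (!(pValues[t][trueLabels[t]] > sigLevel)) {
                errors++; // true label fell outside the prediction region
            }
            if (regionSize == 0) {
                empty++;
            } else if (regionSize > 1) {
                multiple++;
            }
        }
        double n = pValues.length;
        return new double[] {errors / n, empty / n, multiple / n};
    }

    public static void main(String[] args) {
        double[][] pValues = {{0.9, 0.02}, {0.4, 0.3}, {0.01, 0.03}, {0.03, 0.8}};
        int[] trueLabels = {0, 0, 1, 0};
        double[] stats = regionStats(pValues, trueLabels, 0.05);
        System.out.println("error=" + stats[0] + " empty=" + stats[1] + " multiple=" + stats[2]);
    }
}
```

Repeating this calculation over a range of significance levels gives the kind of table a confidence calibration histogram summarises: for valid p-values the error rate should stay close to the significance level.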

createConfidenceCalibrationHistogram

public java.lang.String createConfidenceCalibrationHistogram()
                                                      throws java.lang.Exception

Creates a histogram tracking the performance of the confidence predictions at various significance levels. The resolution is determined by the number of bins specified.

Returns:: a string of text representing the table of results at various significance levels.
Throws:: java.lang.Exception

setNumberOfBins

public void setNumberOfBins(int numBins)
                     throws java.lang.Exception

Set the number of calibration bins to use in testing the calibration of the online probabilities

Parameters:: numBins - the number of bins to use
Throws:: java.lang.Exception

getNumberOfBins

public int getNumberOfBins()

Gets the number of bins used in the probability calibration histograms

Returns:: the number of bins

setData

public void setData(Instances data)
             throws java.lang.Exception

Sets the data set to be used in the online experiment. This is an alternative to using the -t option to read training data from an ARFF file.

Parameters:: data - the data set to be tested
Throws:: java.lang.Exception - if instances are null

evaluateModelOnline

public java.lang.String evaluateModelOnline(java.lang.String classifierName,
                                            java.lang.String[] argv)
                                     throws java.lang.Exception

Evaluates the classifier model online. Both distribution and confidence classifiers are treated the same (probabilities are treated as p-values and vice versa).

Throws:: java.lang.Exception
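
A call to evaluateModelOnline needs an option array built from the command-line options listed in the class description. The sketch below only assembles such an array and does not call the library. It assumes, in the usual Weka style, that each flag and its value are separate array elements and that the -A value is the two numbers joined by a comma. The helper class, the file name train.arff and the classifier name in the comment are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that assembles an option array for an online experiment. */
final class OnlineOptionsSketch {

    /** Builds options for a slow teacher with a growing lag (-A), printing p-values (-P). */
    static String[] slowGrowingLagOptions(String trainFile, double sigLevel,
                                          double lagBase, double lagPower) {
        List<String> opts = new ArrayList<>();
        opts.add("-t");
        opts.add(trainFile);                   // training data file (required)
        opts.add("-n");
        opts.add(Double.toString(sigLevel));   // significance level for the online setting
        opts.add("-A");
        opts.add(lagBase + "," + lagPower);    // assumed "lagBase,lagPower" formatting
        opts.add("-P");                        // output p-values or probabilities per prediction
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] argv = slowGrowingLagOptions("train.arff", 0.05, 2.0, 1.5);
        System.out.println(String.join(" ", argv));
        // With the library on the classpath, the array would be passed on as:
        //   String report = new OnlineEvaluation()
        //       .evaluateModelOnline("some.package.SomeClassifier", argv);
        // where "some.package.SomeClassifier" stands in for a real classifier class name.
    }
}
```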

plotGraph

public void plotGraph()

This is an attempt to add a graphical element to this package, allowing performance to be plotted on a graph. It is not working yet, so please don't try to use it.

getPredictionOfLastExample

public double[] getPredictionOfLastExample()
                                    throws java.lang.Exception

This will return the last prediction made in the online process. Useful for forecasting!

Returns:: the probs (or p-values) predicted for the last example
Throws:: java.lang.Exception - if no predictions have been made

evalulatePValuesAndProbs

public java.lang.String evalulatePValuesAndProbs(double sigLevel)
                                          throws java.lang.Exception

Deprecated. Use the incremental stats methods instead of this clumsy batch function. Creates a string reporting the performance of the p-values of each prediction at a set significance level.

Parameters:: sigLevel - the chosen significance level to evaluate the p-values at
Returns:: a string of text representing the table that can be plotted using MATLAB?
Throws:: java.lang.Exception