java.lang.Object
  |
  +--weka.classifiers.Classifier
        |
        +--weka.classifiers.rules.JRip
This class implements a propositional rule learner, Repeated Incremental Pruning to Produce Error Reduction (RIPPER), which was proposed by William W. Cohen as an optimized version of IREP.
The algorithm is briefly described as follows:
Initialize RS = {}, and for each class from the least prevalent to the most frequent, DO:
1. Building stage: repeat 1.1 and 1.2 until the description length (DL) of the ruleset and examples is 64 bits greater than the smallest DL met so far, or there are no positive examples, or the error rate >= 50%.
1.1. Grow phase:
Grow one rule by greedily adding antecedents (or conditions) to
the rule until the rule is perfect (i.e. 100% accurate). The
procedure tries every possible value of each attribute and selects
the condition with highest information gain: p(log(p/t)-log(P/T)).
1.2. Prune phase:
Incrementally prune each rule, allowing any final sequence of
antecedents to be pruned.
The pruning metric is (p-n)/(p+n), which equals 2p/(p+n) - 1,
so this implementation simply uses p/(p+n)
(in fact (p+1)/(p+n+2), so that the value is 0.5 when p+n is 0).
2. Optimization stage: after generating the initial ruleset {Ri},
generate and prune two variants of each rule Ri from randomized data
using procedures 1.1 and 1.2. One variant is generated from an
empty rule, while the other is generated by greedily adding antecedents
to the original rule. The pruning metric used here is
(TP+TN)/(P+N).
Then the smallest possible DL for each variant and the original rule
is computed. The variant with the minimal DL is selected as the final
representative of Ri in the ruleset.
After all the rules in {Ri} have been examined, if there are still
residual positives, more rules are generated from the residual
positives by running the building stage again.
3. Delete from the ruleset any rule that would increase the DL of the whole ruleset if it were kept, and add the resulting ruleset to RS.
ENDDO
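The formulas quoted in the steps above can be sketched in plain Java. This is an illustrative sketch only, not the code used inside JRip: the class name `RipperMetrics` and its method names are hypothetical, the logarithm base (2) is an assumption because the text only says "log", and the reading of p, t, P and T (positives and total covered after and before adding a condition) is likewise an assumption. The stopping test treats "64 bits greater" as strictly more than 64 bits above the smallest DL, which is also an assumption.

```java
/** Illustrative sketch of the scoring formulas quoted in the algorithm description. */
final class RipperMetrics {
    private RipperMetrics() {}

    /** Base-2 logarithm (assumed base; the description only says "log"). */
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * Grow-phase information gain: p * (log(p/t) - log(P/T)).
     * Assumed meaning: p of the t instances covered after adding the
     * condition are positive, and bigP of the bigT covered before it.
     */
    static double infoGain(double p, double t, double bigP, double bigT) {
        if (p <= 0.0 || t <= 0.0 || bigP <= 0.0 || bigT <= 0.0) {
            return 0.0; // assumption: no coverage means no gain (avoids log of 0)
        }
        return p * (log2(p / t) - log2(bigP / bigT));
    }

    /** Prune-phase value (p + 1) / (p + n + 2); it is 0.5 when p + n is 0. */
    static double pruneValue(double p, double n) {
        return (p + 1.0) / (p + n + 2.0);
    }

    /** Optimization-stage value (TP + TN) / (P + N). */
    static double optimizationValue(double tp, double tn, double totalPos, double totalNeg) {
        return (tp + tn) / (totalPos + totalNeg);
    }

    /** Building-stage stopping test built from the three conditions in step 1. */
    static boolean shouldStopBuilding(double dl, double minDl, double positives, double errorRate) {
        return dl > minDl + 64.0 || positives <= 0.0 || errorRate >= 0.5;
    }

    public static void main(String[] args) {
        System.out.println("gain  = " + infoGain(4, 4, 8, 16));
        System.out.println("prune = " + pruneValue(0, 0));
        System.out.println("stop  = " + shouldStopBuilding(200, 100, 5, 0.1));
    }
}
```

The smoothed form (p+1)/(p+n+2) ranks rules in the same order as p/(p+n) for large counts while staying defined for a rule that covers nothing.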
Note that there seem to be 2 bugs in the ripper program that would affect the ruleset size and accuracy slightly. This implementation avoids these bugs and thus differs slightly from Cohen's original implementation. Even after fixing the bugs, since the order of classes with the same frequency is not defined in ripper, there still seem to be some trivial differences between this implementation and the original ripper, especially for the audiology data in the UCI repository, where there are many classes with few instances.
If wrapped by other classes, typical usage of this class is:
JRip rip = new JRip();
Instances data = ... // Data from somewhere
double[] orderedClasses = ... // Get the ordered class counts for the data
double expFPRate = ... // Calculate the expected FP/(FP+FN) rate
double classIndex = ... // The class index for which ruleset is built
// DL of default rule, no theory DL, only data DL
double defDL = RuleStats.dataDL(expFPRate, 0.0, data.sumOfWeights(),
0.0, orderedClasses[(int)classIndex]);
rip.rulesetForOneClass(expFPRate, data, classIndex, defDL);
RuleStats rulesetStats = rip.getRuleStats(0);
// Can get heaps of information from RuleStats, e.g. combined DL,
// simpleStats, etc.
double comDL = rulesetStats.combinedDL(expFPRate, classIndex);
int whichRule = ... // Want simple stats of which rule?
double[] simpleStats = rulesetStats.getSimpleStats(whichRule);
...
For details, please see "Fast Effective Rule Induction", William W. Cohen,
'Machine Learning: Proceedings of the Twelfth International Conference'
(ML95).
PS. We have compared this implementation with the original ripper implementation in terms of accuracy, ruleset size and running time on both the artificial data "ab+bcd+defg" and UCI datasets. In all these respects it seems to be quite comparable to the original ripper implementation. However, we did not consider memory consumption optimization in this implementation.
Constructor Summary | |
JRip()
|
Method Summary | |
void |
buildClassifier(Instances instances)
Builds Ripper in the order of class frequencies. |
java.lang.String |
checkErrorRateTipText()
Returns the tip text for this property |
java.lang.String |
debugTipText()
Returns the tip text for this property |
double[] |
distributionForInstance(Instance datum)
Classify the test instance with the rule learner and provide the class distributions |
java.util.Enumeration |
enumerateMeasures()
Returns an enumeration of the additional measure names |
java.lang.String |
foldsTipText()
Returns the tip text for this property |
boolean |
getCheckErrorRate()
|
boolean |
getDebug()
Get whether debugging is turned on. |
int |
getFolds()
|
double |
getMeasure(java.lang.String additionalMeasureName)
Returns the value of the named measure |
double |
getMinNo()
|
int |
getOptimizations()
|
java.lang.String[] |
getOptions()
Gets the current settings of the Classifier. |
FastVector |
getRuleset()
Get the ruleset generated by Ripper |
RuleStats |
getRuleStats(int pos)
Get the statistics of the ruleset in the given position |
long |
getSeed()
|
boolean |
getUsePruning()
|
java.lang.String |
globalInfo()
Returns a string describing classifier |
java.util.Enumeration |
listOptions()
Returns an enumeration describing the available options. |
static void |
main(java.lang.String[] args)
Main method. |
java.lang.String |
minNoTipText()
Returns the tip text for this property |
java.lang.String |
optimizationsTipText()
Returns the tip text for this property |
java.lang.String |
seedTipText()
Returns the tip text for this property |
void |
setCheckErrorRate(boolean d)
|
void |
setDebug(boolean d)
Set debugging mode. |
void |
setFolds(int fold)
|
void |
setMinNo(double m)
|
void |
setOptimizations(int run)
|
void |
setOptions(java.lang.String[] options)
Parses a given list of options. |
void |
setSeed(long s)
|
void |
setUsePruning(boolean d)
|
java.lang.String |
toString()
Prints all the rules of the rule learner. |
java.lang.String |
usePruningTipText()
Returns the tip text for this property |
Methods inherited from class weka.classifiers.Classifier |
classifyInstance, forName, makeCopies |
Methods inherited from class java.lang.Object |
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait |
Constructor Detail |
public JRip()
Method Detail |
public java.lang.String globalInfo()
public java.util.Enumeration listOptions()
-F number
The number of folds for reduced error pruning. One fold is
used as the pruning set. (Default: 3)
-N number
The minimum total weight of the instances within a split.
(Default: 2)
-O number
Set the number of runs of optimizations. (Default: 2)
-D
Whether to turn on debug mode.
-S number
The seed for the randomization used in Ripper. (Default: 1)
-E
Whether NOT to check for an error rate >= 0.5 in the stopping criteria.
(Default: check)
-P
Whether NOT to use pruning. (Default: use pruning)
listOptions
in interface OptionHandler
listOptions
in class Classifier
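To make the option list for listOptions concrete, here is a minimal standalone sketch of how an options array with these flags and defaults could be interpreted. It is not Weka's setOptions implementation: the class `JRipOptionsSketch`, its field names and its `parse` method are hypothetical, and the assumption that debug mode is off by default is not stated in the list itself.

```java
/** Hypothetical holder for the settings described by the option list. */
final class JRipOptionsSketch {
    int folds = 3;                 // -F, default 3
    double minNo = 2.0;            // -N, default 2
    int optimizations = 2;         // -O, default 2
    boolean debug = false;         // -D, assumed off unless given
    long seed = 1L;                // -S, default 1
    boolean checkErrorRate = true; // -E turns the check off
    boolean usePruning = true;     // -P turns pruning off

    /** Parses an options array such as {"-F", "5", "-P"}. */
    static JRipOptionsSketch parse(String[] options) {
        JRipOptionsSketch s = new JRipOptionsSketch();
        for (int i = 0; i < options.length; i++) {
            String flag = options[i];
            switch (flag) {
                case "-D": s.debug = true; break;
                case "-E": s.checkErrorRate = false; break;
                case "-P": s.usePruning = false; break;
                case "-F": s.folds = Integer.parseInt(value(options, ++i, flag)); break;
                case "-N": s.minNo = Double.parseDouble(value(options, ++i, flag)); break;
                case "-O": s.optimizations = Integer.parseInt(value(options, ++i, flag)); break;
                case "-S": s.seed = Long.parseLong(value(options, ++i, flag)); break;
                default:
                    throw new IllegalArgumentException("Unsupported option: " + flag);
            }
        }
        return s;
    }

    /** Returns the argument that follows a flag, or fails if it is missing. */
    private static String value(String[] options, int index, String flag) {
        if (index >= options.length) {
            throw new IllegalArgumentException("Missing value for option " + flag);
        }
        return options[index];
    }
}
```

Note that -E and -P are negative switches: supplying them disables the error-rate check and pruning respectively, which is why both fields start out as true.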
public void setOptions(java.lang.String[] options) throws java.lang.Exception
setOptions
in interface OptionHandler
setOptions
in class Classifier
options
- the list of options as an array of strings
java.lang.Exception
- if an option is not supported
public java.lang.String[] getOptions()
getOptions
in interface OptionHandler
getOptions
in class Classifier
public java.util.Enumeration enumerateMeasures()
enumerateMeasures
in interface AdditionalMeasureProducer
public double getMeasure(java.lang.String additionalMeasureName)
getMeasure
in interface AdditionalMeasureProducer
additionalMeasureName
- the name of the measure to query for its value
java.lang.IllegalArgumentException
- if the named measure is not supported
public java.lang.String foldsTipText()
public void setFolds(int fold)
public int getFolds()
public java.lang.String minNoTipText()
public void setMinNo(double m)
public double getMinNo()
public java.lang.String seedTipText()
public void setSeed(long s)
public long getSeed()
public java.lang.String optimizationsTipText()
public void setOptimizations(int run)
public int getOptimizations()
public java.lang.String debugTipText()
debugTipText
in class Classifier
public void setDebug(boolean d)
Classifier
setDebug
in class Classifier
d
- true if debug output should be printed
public boolean getDebug()
Classifier
getDebug
in class Classifier
public java.lang.String checkErrorRateTipText()
public void setCheckErrorRate(boolean d)
public boolean getCheckErrorRate()
public java.lang.String usePruningTipText()
public void setUsePruning(boolean d)
public boolean getUsePruning()
public FastVector getRuleset()
public RuleStats getRuleStats(int pos)
pos
- the position of the stats (assumed to be valid)
public void buildClassifier(Instances instances) throws java.lang.Exception
buildClassifier
in class Classifier
instances
- the training data
java.lang.Exception
- if classifier can't be built successfully
public double[] distributionForInstance(Instance datum)
distributionForInstance
in class Classifier
datum
- the instance to be classified
public java.lang.String toString()
toString
in class java.lang.Object
public static void main(java.lang.String[] args)
args
- the options for the classifier
Copyright (c) 2003 David Lindsay, Computer Learning Research Centre, Dept. Computer Science, Royal Holloway, University of London