java.lang.Object
  +--weka.datagenerators.ClusterGenerator
        +--weka.datagenerators.BIRCHCluster
Cluster data generator designed for the BIRCH system. The dataset is generated with instances in K clusters. Instances are 2-d data points. Each cluster is characterized by the number of data points in it, its radius, and its center. The location of the cluster centers is determined by the pattern parameter. Three patterns are currently supported: grid, sine, and random. (From: BIRCH: An Efficient Data Clustering Method for Very Large Databases; T. Zhang, R. Ramakrishnan, M. Livny; 1996 ACM)
Class to generate data randomly by producing a decision list. The decision list consists of rules. Instances are generated randomly one by one. If the decision list fails to classify the current instance, a new rule based on this instance is generated and added to the decision list.
The option -V switches on voting, which means that at the end of the generation all instances are reclassified to the class value that is supported by the most rules.
This data generator can generate 'boolean' attributes (= nominal with the values {true, false}) and numeric attributes. The rules can be 'A' or 'NOT A' for boolean values and 'B < random_value' or 'B >= random_value' for numeric values.
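The class description says each cluster is characterized by its number of points, its radius, and its center. The following is a minimal sketch of drawing the 2-d points of one such cluster. It assumes the points are normally distributed around the center, with a per-dimension standard deviation of radius / sqrt(2). That distribution is an assumption of the sketch; the text above does not state what BIRCHCluster actually uses. `ClusterPointsSketch` and `samplePoints` are hypothetical names, not part of Weka.

```java
import java.util.Random;

final class ClusterPointsSketch {

    /** Draws n 2-d points around (cx, cy). Hypothetical helper, not a Weka method. */
    static double[][] samplePoints(double cx, double cy, double radius, int n, Random rnd) {
        // Assumption: independent normal noise per dimension with sigma = radius / sqrt(2).
        double sigma = radius / Math.sqrt(2.0);
        double[][] points = new double[n][2];
        for (int i = 0; i < n; i++) {
            points[i][0] = cx + sigma * rnd.nextGaussian();
            points[i][1] = cy + sigma * rnd.nextGaussian();
        }
        return points;
    }

    public static void main(String[] args) {
        // Seed 1 mirrors the documented default of option -S.
        Random rnd = new Random(1);
        double[][] pts = samplePoints(0.0, 0.0, Math.sqrt(2.0), 50, rnd);
        System.out.println("generated " + pts.length + " points");
    }
}
```

Passing the `Random` in explicitly keeps the sketch reproducible for a fixed seed, which is the behaviour the seed option suggests.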
Valid options are:
-G
The pattern for instance generation is grid.
This flag cannot be used at the same time as flag I.
The pattern is random, if neither flag G nor flag I is set.
-I
The pattern for instance generation is sine.
This flag cannot be used at the same time as flag G.
The pattern is random, if neither flag G nor flag I is set.
-N num .. num
The range of the number of instances in each cluster (default 1..50).
Lower number must be between 0 and 2500, upper number must be between
50 and 2500.
-R num .. num
The range of the radius of the clusters (default 0.1 .. SQRT(2)).
Lower number must be between 0 and SQRT(2), upper number must be between
SQRT(2) and SQRT(32).
-M num
Distance multiplier, only used if pattern is grid (default 4).
-C num
Number of cycles, only used if pattern is sine (default 4).
-O
Flag for input order is ordered. If flag is not set then input
order is randomized.
-P num
Noise rate in percent. Can be between 0% and 30% (default 0%).
(Remark: The original algorithm only allows noise up to 10%.)
-S seed
Random number seed for random function used (default 1).
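The constraints in the option list above can be sketched as plain validation code. The example below is not Weka's parser. It is a self-contained illustration, under stated assumptions, of three documented rules: -G and -I are mutually exclusive with random as the fallback, the -N bounds are limited as listed, and the noise rate lies between 0% and 30%. The class and method names are hypothetical, and the check that the lower bound does not exceed the upper bound is an added assumption.

```java
import java.util.Arrays;
import java.util.List;

final class BirchOptionsSketch {

    enum Pattern { GRID, SINE, RANDOM }

    /** Resolves the pattern from the -G and -I flags; random if neither is set. */
    static Pattern resolvePattern(List<String> args) {
        boolean grid = args.contains("-G");
        boolean sine = args.contains("-I");
        if (grid && sine) {
            throw new IllegalArgumentException("Flags -G and -I cannot be used together");
        }
        if (grid) return Pattern.GRID;
        if (sine) return Pattern.SINE;
        return Pattern.RANDOM;
    }

    /** Parses "lower..upper" (spaces allowed) and checks the documented -N bounds. */
    static int[] parseInstNums(String fromTo) {
        String[] parts = fromTo.split("\\.\\.");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected lower..upper, got: " + fromTo);
        }
        int lower = Integer.parseInt(parts[0].trim());
        int upper = Integer.parseInt(parts[1].trim());
        if (lower < 0 || lower > 2500) {
            throw new IllegalArgumentException("Lower number must be between 0 and 2500");
        }
        if (upper < 50 || upper > 2500) {
            throw new IllegalArgumentException("Upper number must be between 50 and 2500");
        }
        if (lower > upper) { // assumption: not stated in the option list
            throw new IllegalArgumentException("Lower number exceeds upper number");
        }
        return new int[] {lower, upper};
    }

    /** Checks the documented noise range of 0% to 30%. */
    static double checkNoiseRate(double percent) {
        if (percent < 0.0 || percent > 30.0) {
            throw new IllegalArgumentException("Noise rate must be between 0 and 30 percent");
        }
        return percent;
    }

    public static void main(String[] args) {
        List<String> cli = Arrays.asList("-G", "-N", "1..50", "-P", "5");
        System.out.println("pattern = " + resolvePattern(cli));
        int[] n = parseInstNums("1..50");
        System.out.println("instances per cluster = " + n[0] + ".." + n[1]);
        System.out.println("noise = " + checkNoiseRate(5.0) + "%");
    }
}
```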
Field Summary
static int | GRID
static int | ORDERED
static int | RANDOM
static int | RANDOMIZED
static int | SINE
Constructor Summary
BIRCHCluster()
Method Summary
Instances | defineDataFormat() | Initializes the format for the dataset produced.
Instance | generateExample() | Generates an example of the dataset.
Instances | generateExamples() | Generates all examples of the dataset.
Instances | generateExamples(java.util.Random random, Instances format) | Generates all examples of the dataset.
java.lang.String | generateFinished() | Compiles documentation about the data generation after the generation process.
java.lang.String | generateStart() | Compiles documentation about the data generation before the generation process.
Instances | getDatasetFormat() | Gets the dataset format.
double | getDistMult() | Gets the distance multiplier.
boolean | getGridFlag() | Gets the grid flag (option G).
int | getInputOrder() | Gets the input order.
java.lang.String | getInstNums() | Gets the upper and lower boundary for instances per cluster.
int | getMaxInstNum() | Gets the upper boundary for instances per cluster.
double | getMaxRadius() | Gets the upper boundary for the radiuses of the clusters.
int | getMinInstNum() | Gets the lower boundary for instances per cluster.
double | getMinRadius() | Gets the lower boundary for the radiuses of the clusters.
double | getNoiseRate() | Gets the percentage of noise set.
int | getNumCycles() | Gets the number of cycles.
java.lang.String[] | getOptions() | Gets the current settings of the datagenerator BIRCHCluster.
boolean | getOrderedFlag() | Gets the ordered flag (option O).
int | getPattern() | Gets the pattern type.
java.lang.String | getRadiuses() | Gets the upper and lower boundary for the radius of the clusters.
java.util.Random | getRandom() | Gets the random generator.
int | getSeed() | Gets the random number seed.
boolean | getSineFlag() | Gets the sine flag (option I).
boolean | getSingleModeFlag() | Gets the single mode flag.
java.lang.String | globalInfo() | Returns a string describing this data generator.
java.util.Enumeration | listOptions() | Returns an enumeration describing the available options.
static void | main(java.lang.String[] argv) | Main method for testing this class.
void | setDatasetFormat(Instances newDatasetFormat) | Sets the dataset format.
void | setDefaultOptions() | Sets all options to their default values.
void | setDistMult(double newDistMult) | Sets the distance multiplier.
void | setInputOrder(int newInputOrder) | Sets the input order.
void | setInstNums(java.lang.String fromTo) | Sets the upper and lower boundary for instances per cluster.
void | setMaxInstNum(int newMaxInstNum) | Sets the upper boundary for instances per cluster.
void | setMaxRadius(double newMaxRadius) | Sets the upper boundary for the radiuses of the clusters.
void | setMinInstNum(int newMinInstNum) | Sets the lower boundary for instances per cluster.
void | setMinRadius(double newMinRadius) | Sets the lower boundary for the radiuses of the clusters.
void | setNoiseRate(double newNoiseRate) | Sets the percentage of noise.
void | setNumCycles(int newNumCycles) | Sets the number of cycles.
void | setOptions(java.lang.String[] options) | Parses a list of options for this object.
void | setPattern(int newPattern) | Sets the pattern type.
void | setRadiuses(java.lang.String fromTo) | Sets the upper and lower boundary for the radius of the clusters.
void | setRandom(java.util.Random newRandom) | Sets the random generator.
void | setSeed(int newSeed) | Sets the random number seed.
Methods inherited from class weka.datagenerators.ClusterGenerator
getClassFlag, getDebug, getNumAttributes, getNumClusters, getNumExamplesAct, getOutput, getRelationName, makeData, setClassFlag, setDebug, setNumAttributes, setNumClusters, setNumExamplesAct, setOutput, setRelationName
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
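The input-order and seed settings listed in the summary interact: with option -O the input order is ordered, otherwise it is randomized, and the seed fixes the random function. The sketch below illustrates one plausible reading, in which "ordered" means instances are emitted cluster by cluster and "randomized" means that sequence is shuffled with a seeded generator. This interpretation and the names `InputOrderSketch` and `arrange` are assumptions, not taken from the Weka source.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class InputOrderSketch {

    /**
     * Returns one cluster label per instance, grouped by cluster when ordered
     * is true and shuffled with the given seed otherwise.
     */
    static List<Integer> arrange(int[] clusterSizes, boolean ordered, long seed) {
        List<Integer> labels = new ArrayList<>();
        for (int c = 0; c < clusterSizes.length; c++) {
            for (int i = 0; i < clusterSizes[c]; i++) {
                labels.add(c);
            }
        }
        if (!ordered) {
            // The same seed always yields the same permutation.
            Collections.shuffle(labels, new Random(seed));
        }
        return labels;
    }

    public static void main(String[] args) {
        int[] sizes = {2, 3, 1};
        System.out.println("ordered:    " + arrange(sizes, true, 1L));
        System.out.println("randomized: " + arrange(sizes, false, 1L));
    }
}
```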
Field Detail
public static final int GRID
public static final int SINE
public static final int RANDOM
public static final int ORDERED
public static final int RANDOMIZED
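The GRID, SINE and RANDOM constants select how cluster centers are laid out. This page does not give the placement formulas, so the following is only an illustrative sketch under assumptions: grid centers sit on a square lattice, sine centers follow a sine curve spanning the requested number of cycles, and random centers are uniform over the same area. The constant values, the `spacing` parameter (here taken as distance multiplier times mean radius) and the class `CenterPlacementSketch` are all hypothetical.

```java
import java.util.Random;

final class CenterPlacementSketch {

    // Hypothetical values; the values of the real constants are not shown on this page.
    static final int GRID = 0;
    static final int SINE = 1;
    static final int RANDOM = 2;

    /** Returns k centers as {x, y} pairs for the given pattern. */
    static double[][] centers(int pattern, int k, double spacing, int cycles, Random rnd) {
        double[][] c = new double[k][2];
        int side = (int) Math.ceil(Math.sqrt(k)); // lattice width for the grid pattern
        for (int i = 0; i < k; i++) {
            switch (pattern) {
                case GRID: {
                    c[i][0] = (i % side) * spacing;
                    c[i][1] = (i / side) * spacing;
                    break;
                }
                case SINE: {
                    // Spread the k centers over `cycles` full periods of the sine curve.
                    double t = 2.0 * Math.PI * cycles * i / k;
                    c[i][0] = i * spacing;
                    c[i][1] = spacing * Math.sin(t);
                    break;
                }
                default: {
                    c[i][0] = rnd.nextDouble() * side * spacing;
                    c[i][1] = rnd.nextDouble() * side * spacing;
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        // Assumed spacing: default multiplier 4 times the mean of the default radius range.
        double spacing = 4.0 * (0.1 + Math.sqrt(2.0)) / 2.0;
        double[][] grid = centers(GRID, 16, spacing, 4, new Random(1));
        for (double[] p : grid) {
            System.out.println(p[0] + ", " + p[1]);
        }
    }
}
```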
Constructor Detail
public BIRCHCluster()
Method Detail
public java.lang.String globalInfo()
public void setInstNums(java.lang.String fromTo)
public java.lang.String getInstNums()
public int getMinInstNum()
public void setMinInstNum(int newMinInstNum)
newMinInstNum - new lower boundary for instances per cluster
public int getMaxInstNum()
public void setMaxInstNum(int newMaxInstNum)
newMaxInstNum - new upper boundary for instances per cluster
public void setRadiuses(java.lang.String fromTo)
public java.lang.String getRadiuses()
public double getMinRadius()
public void setMinRadius(double newMinRadius)
newMinRadius - new lower boundary for the radiuses of the clusters
public double getMaxRadius()
public void setMaxRadius(double newMaxRadius)
newMaxRadius - new upper boundary for the radiuses of the clusters
public boolean getGridFlag()
public boolean getSineFlag()
public int getPattern()
public void setPattern(int newPattern)
newPattern - new pattern type
public double getDistMult()
public void setDistMult(double newDistMult)
newDistMult - new distance multiplier
public int getNumCycles()
public void setNumCycles(int newNumCycles)
newNumCycles - new number of cycles
public int getInputOrder()
public void setInputOrder(int newInputOrder)
newInputOrder - new input order
public boolean getOrderedFlag()
public double getNoiseRate()
public void setNoiseRate(double newNoiseRate)
newNoiseRate - new percentage of noise
public java.util.Random getRandom()
public void setRandom(java.util.Random newRandom)
newRandom - is the random generator.
public int getSeed()
public void setSeed(int newSeed)
newSeed - the new random number seed.
public Instances getDatasetFormat()
public void setDatasetFormat(Instances newDatasetFormat)
newDatasetFormat - the new dataset format.
public boolean getSingleModeFlag()
getSingleModeFlag in class ClusterGenerator
public java.util.Enumeration listOptions()
listOptions in interface OptionHandler
public void setDefaultOptions()
public void setOptions(java.lang.String[] options) throws java.lang.Exception
For list of valid options see class description.
setOptions in interface OptionHandler
options - the list of options as an array of strings
java.lang.Exception - if an option is not supported
public java.lang.String[] getOptions()
getOptions in interface OptionHandler
public Instances defineDataFormat() throws java.lang.Exception
defineDataFormat in class ClusterGenerator
java.lang.Exception - data format could not be defined
public Instance generateExample() throws java.lang.Exception
generateExample in class ClusterGenerator
java.lang.Exception - if format not defined or generating
public Instances generateExamples() throws java.lang.Exception
generateExamples in class ClusterGenerator
java.lang.Exception - if format not defined
public Instances generateExamples(java.util.Random random, Instances format) throws java.lang.Exception
java.lang.Exception - if format not defined
public java.lang.String generateFinished() throws java.lang.Exception
generateFinished in class ClusterGenerator
java.lang.Exception - no input structure has been defined
public java.lang.String generateStart()
generateStart in class ClusterGenerator
public static void main(java.lang.String[] argv)
argv - should contain arguments for the data producer:
Copyright (c) 2003 David Lindsay, Computer Learning Research Centre, Dept. Computer Science, Royal Holloway, University of London