weka.clusterers
Class ClusterEvaluation

java.lang.Object
  extended by weka.clusterers.ClusterEvaluation
All Implemented Interfaces:
java.io.Serializable

public class ClusterEvaluation
extends java.lang.Object
implements java.io.Serializable

Class for evaluating clustering models.

Valid options are:

-t
Specify the training file.

-T
Specify the test file to apply clusterer to.

-d
Specify the output file to which the clusterer built from the training data is saved.

-l
Specify the input file from which a clusterer is loaded.

-p
Output predictions. Predictions are for the training file if only the training file is specified, otherwise they are for the test file. The range specifies attribute values to be output with the predictions. Use '-p 0' for none.

-x
Set the number of folds for a cross-validation of the training data. Cross-validation can only be done for distribution (density-based) clusterers and is performed only if no test file is given.

-c
Set the class attribute. If set, then class based evaluation of clustering is performed.

Version:
$Revision: 1.26 $
Author:
Mark Hall (mhall@cs.waikato.ac.nz)
See Also:
Serialized Form

Constructor Summary
ClusterEvaluation()
          Constructor.
 
Method Summary
 java.lang.String clusterResultsToString()
          Return the results of clustering.
static double crossValidateModel(DensityBasedClusterer clusterer, Instances data, int numFolds, java.util.Random random)
          Perform a cross-validation for DensityBasedClusterer on a set of instances.
static java.lang.String crossValidateModel(java.lang.String clustererString, Instances data, int numFolds, java.lang.String[] options, java.util.Random random)
          Performs a cross-validation for a DensityBasedClusterer, named by class, on a set of instances.
static java.lang.String evaluateClusterer(Clusterer clusterer, java.lang.String[] options)
          Evaluates a clusterer with the options given in an array of strings.
 void evaluateClusterer(Instances test)
          Evaluate the clusterer on a set of instances.
 int[] getClassesToClusters()
          Return the array (ordered by cluster number) of minimum-error class-to-cluster mappings.
 double[] getClusterAssignments()
          Return an array of cluster assignments corresponding to the most recent set of instances clustered.
 double getLogLikelihood()
          Return the log likelihood corresponding to the most recent set of instances clustered.
 int getNumClusters()
          Return the number of clusters found for the most recent call to evaluateClusterer.
static void main(java.lang.String[] args)
          Main method for testing this class.
 void setClusterer(Clusterer clusterer)
          Set the clusterer.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

ClusterEvaluation

public ClusterEvaluation()
Constructor. Sets defaults for each member variable. Default Clusterer is EM.

Method Detail

setClusterer

public void setClusterer(Clusterer clusterer)
Set the clusterer to evaluate.

Parameters:
clusterer - the clusterer to use

clusterResultsToString

public java.lang.String clusterResultsToString()
Return the results of clustering.

Returns:
a string detailing the results of clustering a data set

getNumClusters

public int getNumClusters()
Return the number of clusters found for the most recent call to evaluateClusterer.

Returns:
the number of clusters found

getClusterAssignments

public double[] getClusterAssignments()
Return an array of cluster assignments corresponding to the most recent set of instances clustered.

Returns:
an array of cluster assignments
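
The returned array holds one entry per instance. A minimal sketch of how such an array might be summarized is shown here. It uses only the Java standard library; the class AssignmentTally and its method are hypothetical helpers, not part of weka.clusterers. It assumes each entry is a cluster index stored as a double and that a negative entry marks an instance that was not assigned to any cluster.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration; not part of weka.clusterers. */
final class AssignmentTally {

    /**
     * Counts how many instances fall into each cluster.
     * Assumption: each entry is a cluster index stored as a double, and
     * negative entries mark instances that were not clustered.
     */
    static int[] countPerCluster(double[] assignments, int numClusters) {
        int[] counts = new int[numClusters];
        for (double a : assignments) {
            int cluster = (int) a;
            // Skip unassigned or out-of-range entries.
            if (cluster >= 0 && cluster < numClusters) {
                counts[cluster]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Stand-in for the result of getClusterAssignments().
        double[] assignments = {0.0, 1.0, 1.0, 2.0, -1.0, 1.0};
        // Prints [1, 3, 1]
        System.out.println(Arrays.toString(countPerCluster(assignments, 3)));
    }
}
```

In practice the second argument would come from getNumClusters().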

getClassesToClusters

public int[] getClassesToClusters()
Return the array (ordered by cluster number) of minimum-error class-to-cluster mappings.

Returns:
an array of class to cluster mappings
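
To illustrate what a minimum-error mapping means, the following sketch searches all ways of giving each cluster at most one distinct class and keeps the assignment with the most correctly matched instances. It is a brute-force stand-in, not the search this class performs. The class ClassToClusterSketch, the layout of the counts table, and the use of -1 for "no class assigned" are assumptions made for the example.

```java
import java.util.Arrays;

/** Hypothetical brute-force illustration; not the Weka implementation. */
final class ClassToClusterSketch {

    /**
     * counts[k][c] = number of instances of class c placed in cluster k.
     * Returns mapping[k] = class assigned to cluster k, or -1 for none
     * (the -1 convention is an assumption of this sketch).
     */
    static int[] minimumErrorMapping(int[][] counts) {
        int numClusters = counts.length;
        int numClasses = numClusters == 0 ? 0 : counts[0].length;
        int[] best = new int[numClusters];
        Arrays.fill(best, -1);
        int[] bestCorrect = {-1};
        search(counts, 0, new boolean[numClasses], new int[numClusters], 0, best, bestCorrect);
        return best;
    }

    private static void search(int[][] counts, int cluster, boolean[] used,
                               int[] current, int correct, int[] best, int[] bestCorrect) {
        if (cluster == counts.length) {
            // Maximizing correct matches is the same as minimizing errors.
            if (correct > bestCorrect[0]) {
                bestCorrect[0] = correct;
                System.arraycopy(current, 0, best, 0, current.length);
            }
            return;
        }
        // Option 1: leave this cluster without a class.
        current[cluster] = -1;
        search(counts, cluster + 1, used, current, correct, best, bestCorrect);
        // Option 2: give it any class that no earlier cluster has taken.
        for (int c = 0; c < used.length; c++) {
            if (!used[c]) {
                used[c] = true;
                current[cluster] = c;
                search(counts, cluster + 1, used, current,
                       correct + counts[cluster][c], best, bestCorrect);
                used[c] = false;
            }
        }
    }

    public static void main(String[] args) {
        int[][] counts = {{8, 1}, {2, 9}, {0, 3}};
        // Prints [0, 1, -1]
        System.out.println(Arrays.toString(minimumErrorMapping(counts)));
    }
}
```

The exhaustive search grows factorially with the number of classes, so it is only suitable for small tables like the one shown.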

getLogLikelihood

public double getLogLikelihood()
Return the log likelihood corresponding to the most recent set of instances clustered.

Returns:
the log likelihood as a double value

evaluateClusterer

public void evaluateClusterer(Instances test)
                       throws java.lang.Exception
Evaluate the clusterer on a set of instances. Calculates clustering statistics and stores cluster assignments for the instances in m_clusterAssignments.

Parameters:
test - the set of instances to cluster
Throws:
java.lang.Exception - if something goes wrong

evaluateClusterer

public static java.lang.String evaluateClusterer(Clusterer clusterer,
                                                 java.lang.String[] options)
                                          throws java.lang.Exception
Evaluates a clusterer with the options given in an array of strings. The string following "-t" is taken as the training file and the string following "-T" as the test file. If the test file is missing, a stratified ten-fold cross-validation is performed (distribution clusterers only). Use "-x" to change the number of folds and "-s" to set the random seed. If the "-p" option is present, the predicted cluster for each test instance is output. If you provide the name of an object file using "-l", a clusterer is loaded from that file. If you provide the name of an object file using "-d", the clusterer built from the training data is saved to that file.

Parameters:
clusterer - machine learning clusterer
options - the array of strings containing the options
Returns:
a string describing the results
Throws:
java.lang.Exception - if model could not be evaluated successfully
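
The options array consists of flags, each followed by its value where one is required. As a rough sketch of how such an array might be assembled before being passed as the options argument, consider the helper below. EvaluationOptionsSketch, its parameter names, and the file names are hypothetical; only the flags "-t", "-T", "-x" and "-d" come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of weka.clusterers. */
final class EvaluationOptionsSketch {

    /**
     * Builds an options array from the flags described above.
     * testFile and modelOut may be null; folds is used only when
     * no test file is given and folds is positive.
     */
    static String[] buildOptions(String trainFile, String testFile, int folds, String modelOut) {
        List<String> opts = new ArrayList<>();
        opts.add("-t");
        opts.add(trainFile);
        if (testFile != null) {
            opts.add("-T");
            opts.add(testFile);
        } else if (folds > 0) {
            // Cross-validation applies only when no test file is supplied.
            opts.add("-x");
            opts.add(Integer.toString(folds));
        }
        if (modelOut != null) {
            opts.add("-d");
            opts.add(modelOut);
        }
        return opts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Prints [-t, train.arff, -x, 5, -d, model.out]
        System.out.println(Arrays.toString(buildOptions("train.arff", null, 5, "model.out")));
    }
}
```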

crossValidateModel

public static double crossValidateModel(DensityBasedClusterer clusterer,
                                        Instances data,
                                        int numFolds,
                                        java.util.Random random)
                                 throws java.lang.Exception
Perform a cross-validation for DensityBasedClusterer on a set of instances.

Parameters:
clusterer - the clusterer to use
data - the training data
numFolds - number of folds of cross validation to perform
random - random number generator used for the cross-validation
Returns:
the cross-validated log-likelihood
Throws:
java.lang.Exception - if an error occurs
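
The idea of a cross-validated log-likelihood can be sketched without Weka as follows. A one-dimensional Gaussian stands in for the density-based clusterer: it is fitted on the training part of each fold and scored on the held-out part. The class name, the shuffling step, the variance floor, and the choice to average the held-out log density over all instances are assumptions of this sketch and may differ from what this method does.

```java
import java.util.Random;

/** Hypothetical illustration using a 1-D Gaussian; not the Weka implementation. */
final class CrossValidatedLogLikelihoodSketch {

    /** Log density of x under a Gaussian with the given mean and variance. */
    static double logDensity(double x, double mean, double variance) {
        double d = x - mean;
        return -0.5 * (Math.log(2.0 * Math.PI * variance) + d * d / variance);
    }

    /** Average held-out log density over numFolds folds (assumed scoring rule). */
    static double crossValidate(double[] data, int numFolds, Random random) {
        if (numFolds < 2 || numFolds > data.length) {
            throw new IllegalArgumentException("numFolds must be between 2 and data.length");
        }
        // Shuffle a copy (Fisher-Yates) so the folds are random.
        double[] shuffled = data.clone();
        for (int i = shuffled.length - 1; i > 0; i--) {
            int j = random.nextInt(i + 1);
            double tmp = shuffled[i];
            shuffled[i] = shuffled[j];
            shuffled[j] = tmp;
        }
        double total = 0.0;
        for (int fold = 0; fold < numFolds; fold++) {
            // Fit mean and variance on every instance outside this fold.
            double sum = 0.0;
            int n = 0;
            for (int i = 0; i < shuffled.length; i++) {
                if (i % numFolds != fold) { sum += shuffled[i]; n++; }
            }
            double mean = sum / n;
            double squares = 0.0;
            for (int i = 0; i < shuffled.length; i++) {
                if (i % numFolds != fold) {
                    double d = shuffled[i] - mean;
                    squares += d * d;
                }
            }
            double variance = Math.max(squares / n, 1e-6); // floor avoids log(0)
            // Score the held-out instances of this fold.
            for (int i = 0; i < shuffled.length; i++) {
                if (i % numFolds == fold) {
                    total += logDensity(shuffled[i], mean, variance);
                }
            }
        }
        return total / shuffled.length;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 0.7, 1.0, 1.05, 0.95};
        System.out.println(crossValidate(data, 5, new Random(1)));
    }
}
```

Passing a Random created with a fixed seed makes the fold assignment, and therefore the returned value, reproducible.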

crossValidateModel

public static java.lang.String crossValidateModel(java.lang.String clustererString,
                                                  Instances data,
                                                  int numFolds,
                                                  java.lang.String[] options,
                                                  java.util.Random random)
                                           throws java.lang.Exception
Performs a cross-validation for a DensityBasedClusterer, named by class, on a set of instances.

Parameters:
clustererString - a string naming the class of the clusterer
data - the data on which the cross-validation is to be performed
numFolds - the number of folds for the cross-validation
options - the options to the clusterer
random - a random number generator
Returns:
a string containing the cross validated log likelihood
Throws:
java.lang.Exception - if a clusterer could not be generated

main

public static void main(java.lang.String[] args)
Main method for testing this class.

Parameters:
args - the options