weka.associations
Class PriorEstimation

java.lang.Object
  extended by weka.associations.PriorEstimation
All Implemented Interfaces:
java.io.Serializable

public class PriorEstimation
extends java.lang.Object
implements java.io.Serializable

Class implementing the prior estimation of the predictive apriori algorithm for mining association rules. Reference: T. Scheffer (2001). Finding Association Rules That Trade Support Optimally against Confidence. Proc. of the 5th European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD'01), pp. 424-435. Freiburg, Germany: Springer-Verlag.

Version:
$Revision: 1.3 $
Author:
Stefan Mutter (mutter@cs.waikato.ac.nz)
See Also:
Serialized Form

Constructor Summary
PriorEstimation(Instances instances, int numRules, int numIntervals, boolean car)
          Constructor
 
Method Summary
 RuleItem addCons(int[] itemArray)
          generates a class association rule out of a given premise.
 void buildDistribution(double conf, double length)
          updates the distribution of the confidence values.
 double calculatePriorSum(boolean weighted, double mPoint)
          calculates either the numerator or the denominator of the prior equation, depending on the weighted flag
 java.util.Hashtable estimatePrior()
          Method to estimate the prior probabilities
 double findIntervall(double conf)
          searches for the mid point of the interval that a given confidence value falls into
 void generateDistribution()
          Calculates the prior distribution.
 double[] getMidPoints()
          returns an ordered array of all mid points
static double logbinomialCoefficient(int upperIndex, int lowerIndex)
          Method that calculates the base 2 logarithm of a binomial coefficient
 double midPoint(double size, int number)
          calculates the mid point of an interval
 void midPoints()
          splits the interval [0,1] into a predefined number of intervals and calculates their mid points
 int[] randomCARule(int maxLength, int actualLength, java.util.Random randNum)
          Randomly constructs an item set of a given length.
 int[] randomRule(int maxLength, int actualLength, java.util.Random randNum)
          Randomly constructs an item set of a given length.
 RuleItem splitItemSet(int premiseLength, int[] itemArray)
          splits an item set into premise and consequence and thereby constructs an association rule.
 void updateCounters(ItemSet itemSet)
          updates the support count of an item set
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

PriorEstimation

public PriorEstimation(Instances instances,
                       int numRules,
                       int numIntervals,
                       boolean car)
Constructor

Parameters:
instances - the instances to be used for generating the associations
numRules - the number of random rules used for generating the prior
numIntervals - the number of intervals to discretise [0,1]
car - flag indicating whether standard or class association rules are mined
Method Detail

generateDistribution

public final void generateDistribution()
                                throws java.lang.Exception
Calculates the prior distribution.

Throws:
java.lang.Exception - if prior can't be estimated successfully

randomRule

public final int[] randomRule(int maxLength,
                              int actualLength,
                              java.util.Random randNum)
Randomly constructs an item set of a given length. This method is used for standard association rule mining.

Parameters:
maxLength - the number of attributes of the instances
actualLength - the number of attributes that should be present in the item set
randNum - the random number generator
Returns:
a randomly constructed item set in form of an int array
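A minimal sketch of how such a random item set could be drawn. It assumes the convention that an item set is an int array with one slot per attribute, holding -1 for an absent attribute and a value index otherwise. The numValues parameter and the class name are hypothetical: the documented method has no such parameter and presumably takes attribute information from the instances given to the constructor.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch, not Weka's implementation.
class RandomItemSetSketch {

    // Returns an item set over maxLength attributes in which exactly actualLength
    // attributes are present. Absent attributes hold -1 (assumed convention);
    // present ones hold a random value index below numValues[attr].
    static int[] randomItemSet(int maxLength, int actualLength, int[] numValues, Random randNum) {
        if (actualLength < 0 || actualLength > maxLength) {
            throw new IllegalArgumentException("actualLength must lie in [0, maxLength]");
        }
        int[] indices = new int[maxLength];
        for (int i = 0; i < maxLength; i++) {
            indices[i] = i;
        }
        // Partial Fisher-Yates shuffle: afterwards indices[0..actualLength-1]
        // is a uniformly random subset of the attributes.
        for (int i = 0; i < actualLength; i++) {
            int j = i + randNum.nextInt(maxLength - i);
            int tmp = indices[i];
            indices[i] = indices[j];
            indices[j] = tmp;
        }
        int[] items = new int[maxLength];
        Arrays.fill(items, -1);
        for (int i = 0; i < actualLength; i++) {
            int attr = indices[i];
            items[attr] = randNum.nextInt(numValues[attr]);
        }
        return items;
    }

    public static void main(String[] args) {
        int[] numValues = {2, 3, 2, 4};
        System.out.println(Arrays.toString(randomItemSet(4, 2, numValues, new Random(1))));
    }
}
```

A partial shuffle needs no rejection sampling, so the cost stays linear in maxLength whatever actualLength is.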

randomCARule

public final int[] randomCARule(int maxLength,
                                int actualLength,
                                java.util.Random randNum)
Randomly constructs an item set of a given length. This method is used for class association rule mining.

Parameters:
maxLength - the number of attributes of the instances
actualLength - the number of attributes that should be present in the item set
randNum - the random number generator
Returns:
a randomly constructed item set in form of an int array
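For class association rules the class attribute is reserved for the consequence, so a premise would be drawn from the remaining attributes only. A minimal sketch under the same assumed -1 convention; classIndex, numValues and the class name are hypothetical, and the class slot is always left at -1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not Weka's implementation.
class RandomPremiseSketch {

    // Draws actualLength distinct non-class attributes and gives each a random value.
    static int[] randomPremise(int maxLength, int actualLength, int classIndex,
                               int[] numValues, Random randNum) {
        List<Integer> candidates = new ArrayList<>();
        for (int a = 0; a < maxLength; a++) {
            if (a != classIndex) {
                candidates.add(a);
            }
        }
        if (actualLength < 0 || actualLength > candidates.size()) {
            throw new IllegalArgumentException("actualLength exceeds the number of non-class attributes");
        }
        // Shuffling the candidates and taking a prefix yields a uniform random subset.
        Collections.shuffle(candidates, randNum);
        int[] items = new int[maxLength];
        Arrays.fill(items, -1);
        for (int i = 0; i < actualLength; i++) {
            int attr = candidates.get(i);
            items[attr] = randNum.nextInt(numValues[attr]);
        }
        return items;
    }

    public static void main(String[] args) {
        int[] numValues = {2, 3, 2, 2};
        // Attribute 3 plays the role of the class here.
        System.out.println(Arrays.toString(randomPremise(4, 2, 3, numValues, new Random(7))));
    }
}
```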

buildDistribution

public final void buildDistribution(double conf,
                                    double length)
updates the distribution of the confidence values. For every confidence value, the interval it belongs to is located and the confidence is added to the confidence already found in this interval.

Parameters:
conf - the confidence of the randomly created rule
length - the length of the randomly created rule
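The bookkeeping above can be pictured as one histogram per rule length. The sketch below records a count of sampled rules per interval (an assumption; the description could also be read as summing confidence values), keys intervals by index instead of by their double-valued mid point to avoid floating-point keys, and takes the length as an int. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Weka's implementation.
class ConfidenceHistogramSketch {

    private final int numIntervals;
    // rule length -> (interval index -> number of sampled rules in that interval)
    private final Map<Integer, Map<Integer, Integer>> counts = new TreeMap<>();

    ConfidenceHistogramSketch(int numIntervals) {
        this.numIntervals = numIntervals;
    }

    // Equal-width intervals over [0,1]; a confidence of exactly 1.0 goes into the last one.
    int intervalIndex(double conf) {
        return Math.min((int) (conf * numIntervals), numIntervals - 1);
    }

    // Records one sampled rule of the given length and confidence.
    void buildDistribution(double conf, int length) {
        counts.computeIfAbsent(length, k -> new HashMap<>())
              .merge(intervalIndex(conf), 1, Integer::sum);
    }

    int count(int length, int interval) {
        Map<Integer, Integer> byInterval = counts.get(length);
        return byInterval == null ? 0 : byInterval.getOrDefault(interval, 0);
    }

    public static void main(String[] args) {
        ConfidenceHistogramSketch h = new ConfidenceHistogramSketch(10);
        h.buildDistribution(0.05, 2);
        h.buildDistribution(0.07, 2);
        h.buildDistribution(1.0, 3);
        System.out.println(h.count(2, 0) + " " + h.count(3, 9)); // prints "2 1"
    }
}
```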

findIntervall

public final double findIntervall(double conf)
searches for the mid point of the interval that a given confidence value falls into

Parameters:
conf - the confidence of a rule
Returns:
the mid point of the interval the confidence belongs to
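Assuming [0,1] is split into numIntervals intervals of equal width, the interval containing a confidence can be computed directly rather than searched for. A minimal sketch; the class name and the explicit numIntervals parameter are hypothetical, and confidences outside [0,1] are rejected.

```java
// Hypothetical sketch, not Weka's implementation.
class FindIntervalSketch {

    // Returns the mid point of the equal-width interval that contains conf.
    static double findInterval(double conf, int numIntervals) {
        if (conf < 0.0 || conf > 1.0) {
            throw new IllegalArgumentException("confidence must lie in [0,1]");
        }
        double size = 1.0 / numIntervals;
        // floor(conf / size), clamped so that conf == 1.0 maps to the last interval
        int index = Math.min((int) (conf * numIntervals), numIntervals - 1);
        return size * index + size / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(findInterval(0.42, 10)); // mid point of [0.4, 0.5), up to rounding
    }
}
```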

calculatePriorSum

public final double calculatePriorSum(boolean weighted,
                                      double mPoint)
calculates either the numerator or the denominator of the prior equation, depending on the weighted flag

Parameters:
weighted - indicates whether the numerator or the denominator is calculated
mPoint - the mid point of an interval
Returns:
the numerator or denominator of the prior equation
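One plausible reading, stated purely as an assumption: the prior for an interval is a mixture of per-length confidence histograms, each weighted by the number of possible rules of that length. The weighted call then sums weight times histogram fraction (numerator) and the unweighted call sums the weights alone (denominator). Because such weights can be astronomically large, the sketch takes them as base-2 logarithms and rescales by the largest one; the common factor cancels when numerator is divided by denominator. The parameters log2Weights and fractions are hypothetical.

```java
// Hypothetical sketch, not Weka's implementation.
class PriorSumSketch {

    // log2Weights[i]: assumed log2 of the number of possible rules of the i-th length.
    // fractions[i][k]: share of sampled rules of the i-th length whose confidence lies in interval k.
    static double priorSum(boolean weighted, int interval,
                           double[] log2Weights, double[][] fractions) {
        double maxLog = Double.NEGATIVE_INFINITY;
        for (double lw : log2Weights) {
            maxLog = Math.max(maxLog, lw);
        }
        double sum = 0.0;
        for (int i = 0; i < log2Weights.length; i++) {
            // Shifting every exponent by maxLog keeps 2^x representable;
            // the common factor 2^-maxLog cancels in numerator / denominator.
            double w = Math.pow(2.0, log2Weights[i] - maxLog);
            sum += weighted ? w * fractions[i][interval] : w;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] log2Weights = {2000.0, 2001.0}; // 2^2000 itself would overflow a double
        double[][] fractions = {{1.0, 0.0}, {0.25, 0.75}};
        for (int k = 0; k < 2; k++) {
            double prior = priorSum(true, k, log2Weights, fractions)
                         / priorSum(false, k, log2Weights, fractions);
            System.out.println("interval " + k + ": " + prior);
        }
    }
}
```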

logbinomialCoefficient

public static final double logbinomialCoefficient(int upperIndex,
                                                  int lowerIndex)
Method that calculates the base 2 logarithm of a binomial coefficient

Parameters:
upperIndex - upper index of the binomial coefficient
lowerIndex - lower index of the binomial coefficient
Returns:
the base 2 logarithm of the binomial coefficient
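The base-2 logarithm of a binomial coefficient can be computed without overflow by summing logarithms of the factors of the product formula C(n, k) = (n-k+1)/1 * (n-k+2)/2 * ... * n/k. A minimal sketch; the class name is hypothetical and this is not necessarily how Weka computes it.

```java
// Hypothetical sketch, not Weka's implementation.
class LogBinomialSketch {

    // log2 of C(upperIndex, lowerIndex), accumulated as a sum of natural logs.
    static double logBinomialCoefficient(int upperIndex, int lowerIndex) {
        if (lowerIndex < 0 || lowerIndex > upperIndex) {
            throw new IllegalArgumentException("need 0 <= lowerIndex <= upperIndex");
        }
        // C(n, k) == C(n, n - k); use the smaller k to shorten the loop.
        int k = Math.min(lowerIndex, upperIndex - lowerIndex);
        double sum = 0.0;
        for (int j = 1; j <= k; j++) {
            sum += Math.log(upperIndex - k + j) - Math.log(j);
        }
        return sum / Math.log(2.0); // convert natural log to base 2
    }

    public static void main(String[] args) {
        System.out.println(logBinomialCoefficient(4, 2));       // log2(6)
        System.out.println(logBinomialCoefficient(2000, 1000)); // coefficient exceeds Double.MAX_VALUE, its log2 does not
    }
}
```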

estimatePrior

public final java.util.Hashtable estimatePrior()
                                        throws java.lang.Exception
Method to estimate the prior probabilities

Returns:
a hashtable containing the prior probabilities
Throws:
java.lang.Exception - if the prior cannot be calculated
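Given a numerator and a denominator per interval, the returned table could be assembled as a map from interval mid point to prior probability. A minimal sketch; keying the Hashtable by mid point, the array parameters and the class name are all assumptions, and a non-positive denominator is treated as the case where the prior cannot be calculated.

```java
import java.util.Hashtable;

// Hypothetical sketch, not Weka's implementation.
class PriorTableSketch {

    static Hashtable<Double, Double> buildPriorTable(double[] midPoints,
                                                     double[] numerators,
                                                     double[] denominators) {
        if (midPoints.length != numerators.length || numerators.length != denominators.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        Hashtable<Double, Double> prior = new Hashtable<>();
        for (int k = 0; k < midPoints.length; k++) {
            if (denominators[k] <= 0.0) {
                // Mirrors the documented failure case: the prior cannot be calculated.
                throw new IllegalStateException("prior cannot be calculated for interval " + k);
            }
            prior.put(midPoints[k], numerators[k] / denominators[k]);
        }
        return prior;
    }

    public static void main(String[] args) {
        Hashtable<Double, Double> prior = buildPriorTable(
            new double[] {0.25, 0.75}, new double[] {0.75, 0.75}, new double[] {1.5, 1.5});
        System.out.println(prior.get(0.25)); // prints 0.5
    }
}
```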

midPoints

public final void midPoints()
splits the interval [0,1] into a predefined number of intervals and calculates their mid points


midPoint

public double midPoint(double size,
                       int number)
calculates the mid point of an interval

Parameters:
size - the size of each interval
number - the number of the interval. The intervals are numbered from 0 to m_numIntervals.
Returns:
the mid point of the interval
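With equal-width intervals of size 1/numIntervals, the mid point of interval i is size * i + size / 2. A minimal sketch of midPoint and midPoints under that assumption. It numbers the intervals 0 through numIntervals - 1, produces the mid points in ascending order, and returns the array directly (the documented midPoints is void). The class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch, not Weka's implementation.
class MidPointSketch {

    // Mid point of interval 'number' when every interval has width 'size'.
    static double midPoint(double size, int number) {
        return size * number + size / 2.0;
    }

    // Splits [0,1] into numIntervals equal intervals; the result is in ascending order.
    static double[] midPoints(int numIntervals) {
        double size = 1.0 / numIntervals;
        double[] result = new double[numIntervals];
        for (int i = 0; i < numIntervals; i++) {
            result[i] = midPoint(size, i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(midPoints(4))); // [0.125, 0.375, 0.625, 0.875]
    }
}
```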

getMidPoints

public final double[] getMidPoints()
returns an ordered array of all mid points

Returns:
an ordered array of doubles containing all mid points

splitItemSet

public final RuleItem splitItemSet(int premiseLength,
                                   int[] itemArray)
splits an item set into premise and consequence and thereby constructs an association rule. The length of the premise is given; the attributes for premise and consequence are chosen randomly. The result is a RuleItem.

Parameters:
premiseLength - the length of the premise
itemArray - a (randomly generated) item set
Returns:
a randomly generated association rule stored in a RuleItem
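A minimal sketch of the split, assuming the -1-for-absent item set convention and requiring both premise and consequence to be non-empty. The Rule holder stands in for RuleItem, and the explicit Random parameter is an addition; both are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch, not Weka's implementation.
class SplitItemSetSketch {

    // Stand-in for RuleItem: premise and consequence as item sets with -1 for absent attributes.
    static final class Rule {
        final int[] premise;
        final int[] consequence;

        Rule(int[] premise, int[] consequence) {
            this.premise = premise;
            this.consequence = consequence;
        }
    }

    static Rule splitItemSet(int premiseLength, int[] itemArray, Random randNum) {
        List<Integer> present = new ArrayList<>();
        for (int a = 0; a < itemArray.length; a++) {
            if (itemArray[a] != -1) {
                present.add(a);
            }
        }
        if (premiseLength < 1 || premiseLength >= present.size()) {
            throw new IllegalArgumentException("premise and consequence must both be non-empty");
        }
        // Random order of the present attributes; the first premiseLength form the premise.
        Collections.shuffle(present, randNum);
        int[] premise = new int[itemArray.length];
        int[] consequence = new int[itemArray.length];
        Arrays.fill(premise, -1);
        Arrays.fill(consequence, -1);
        for (int i = 0; i < present.size(); i++) {
            int attr = present.get(i);
            if (i < premiseLength) {
                premise[attr] = itemArray[attr];
            } else {
                consequence[attr] = itemArray[attr];
            }
        }
        return new Rule(premise, consequence);
    }

    public static void main(String[] args) {
        Rule rule = splitItemSet(1, new int[] {1, -1, 0, 2}, new Random(5));
        System.out.println(Arrays.toString(rule.premise) + " => " + Arrays.toString(rule.consequence));
    }
}
```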

addCons

public final RuleItem addCons(int[] itemArray)
generates a class association rule out of a given premise. It randomly chooses a class label as consequence.

Parameters:
itemArray - the (randomly constructed) premise of the class association rule
Returns:
a class association rule stored in a RuleItem
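A minimal sketch of attaching a randomly chosen class label to a premise. The ClassRule holder, numClassValues and the explicit Random parameter are hypothetical; the premise is copied so the caller's array is not shared.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch, not Weka's implementation.
class AddConsSketch {

    // Stand-in for a class association rule: premise items plus one class value index.
    static final class ClassRule {
        final int[] premise;
        final int classValue;

        ClassRule(int[] premise, int classValue) {
            this.premise = premise;
            this.classValue = classValue;
        }
    }

    // Draws the consequence uniformly from the numClassValues class labels.
    static ClassRule addCons(int[] itemArray, int numClassValues, Random randNum) {
        return new ClassRule(itemArray.clone(), randNum.nextInt(numClassValues));
    }

    public static void main(String[] args) {
        ClassRule rule = addCons(new int[] {0, -1, 1}, 3, new Random(3));
        System.out.println(Arrays.toString(rule.premise) + " => class " + rule.classValue);
    }
}
```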

updateCounters

public final void updateCounters(ItemSet itemSet)
updates the support count of an item set

Parameters:
itemSet - the item set
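Updating a support count means counting the instances that match every item present in the item set. A minimal sketch, assuming instances are given as arrays of nominal value indices and item sets use -1 for absent attributes; the class and method names are hypothetical, and the count is returned rather than stored on the item set.

```java
// Hypothetical sketch, not Weka's implementation.
class SupportCountSketch {

    // True if the instance agrees with every attribute present (!= -1) in the item set.
    static boolean covers(int[] itemSet, int[] instance) {
        for (int a = 0; a < itemSet.length; a++) {
            if (itemSet[a] != -1 && itemSet[a] != instance[a]) {
                return false;
            }
        }
        return true;
    }

    // Support count: number of instances covered by the item set.
    static int supportCount(int[] itemSet, int[][] instances) {
        int count = 0;
        for (int[] instance : instances) {
            if (covers(itemSet, instance)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] data = {{0, 1, 1}, {0, 0, 1}, {1, 1, 1}};
        System.out.println(supportCount(new int[] {0, -1, 1}, data)); // prints 2
    }
}
```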