weka.associations
Class RuleGeneration

java.lang.Object
  extended by weka.associations.RuleGeneration
All Implemented Interfaces:
java.io.Serializable
Direct Known Subclasses:
CaRuleGeneration

public class RuleGeneration
extends java.lang.Object
implements java.io.Serializable

Class implementing the rule generation procedure of the predictive apriori algorithm. Reference: T. Scheffer (2001). Finding Association Rules That Trade Support Optimally against Confidence. Proc. of the 5th European Conf. on Principles and Practice of Knowledge Discovery in Databases (PKDD'01), pp. 424-435. Freiburg, Germany: Springer-Verlag.

The implementation follows the paper except for how a rule is added to the output of the n best rules. A rule is added if the expected predictive accuracy of this rule is among the n best and it is not subsumed by a rule with at least the same expected predictive accuracy (taken from an unpublished manuscript by T. Scheffer).

Version:
$Revision: 1.1 $
Author:
Stefan Mutter (mutter@cs.waikato.ac.nz)
See Also:
Serialized Form

Constructor Summary
RuleGeneration(ItemSet itemSet)
          Constructor
 
Method Summary
static boolean aSubsumesB(RuleItem a, RuleItem b)
          Method that decides whether or not rule a subsumes rule b.
static double binomialDistribution(double accuracy, double ruleCount, double premiseCount)
          Calculates the probability using a binomial distribution.
 boolean change()
          Gets whether the list of the best rules has been changed.
 int count()
          Gets the current maximum value of the generation time.
static double expectation(double ruleCount, int premiseCount, double[] midPoints, java.util.Hashtable priors)
          Calculates the expected predictive accuracy of a rule.
 java.util.TreeSet generateRules(int numRules, double[] midPoints, java.util.Hashtable priors, double expectation, Instances instances, java.util.TreeSet best, int genTime)
          Generates all rules for an item set.
 boolean removeRedundant(RuleItem toInsert)
          Method that removes redundant rules out of the list of the best rules.
static FastVector singleConsequence(Instances instances, int attNum, FastVector consequences)
          Generates a consequence of length 1 for an association rule.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

RuleGeneration

public RuleGeneration(ItemSet itemSet)
Constructor

Parameters:
itemSet - the item set for which rules should be generated. The item set will form the premise of the rules.
Method Detail

binomialDistribution

public static final double binomialDistribution(double accuracy,
                                                double ruleCount,
                                                double premiseCount)
Calculates the probability using a binomial distribution. If the support of the premise is too large, this distribution is approximated by a normal distribution.

Parameters:
accuracy - the accuracy value
ruleCount - the support of the whole rule
premiseCount - the support of the premise
Returns:
the probability value
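
As an illustration of the quantity described above, here is a minimal sketch of a binomial probability, assuming that accuracy plays the role of the success probability, premiseCount the number of trials and ruleCount the number of successes. The class name BinomialSketch, the method names, the integer parameters and the evaluation in log space are assumptions made for this example and are not taken from the Weka source. The normal approximation for large premise supports is not shown.

```java
final class BinomialSketch {

    private BinomialSketch() {}

    /** Natural log of the binomial coefficient C(n, k), summed term by term to avoid overflow. */
    static double logChoose(int n, int k) {
        double sum = 0.0;
        for (int i = 1; i <= k; i++) {
            sum += Math.log(n - k + i) - Math.log(i);
        }
        return sum;
    }

    /** P(X = successes) for X ~ Binomial(trials, p). */
    static double probability(double p, int successes, int trials) {
        if (successes < 0 || successes > trials) {
            return 0.0;
        }
        // Degenerate success probabilities are handled separately because log(0) is undefined.
        if (p <= 0.0) {
            return successes == 0 ? 1.0 : 0.0;
        }
        if (p >= 1.0) {
            return successes == trials ? 1.0 : 0.0;
        }
        double logProb = logChoose(trials, successes)
                + successes * Math.log(p)
                + (trials - successes) * Math.log(1.0 - p);
        return Math.exp(logProb);
    }

    public static void main(String[] args) {
        // A rule that holds in 2 of the 4 instances covered by its premise, at accuracy 0.5.
        System.out.println(probability(0.5, 2, 4));
    }
}
```

Working in log space keeps the intermediate values representable when the counts grow, which is one plausible reason for switching to an approximation only at very large premise supports.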

expectation

public static final double expectation(double ruleCount,
                                       int premiseCount,
                                       double[] midPoints,
                                       java.util.Hashtable priors)
Calculates the expected predictive accuracy of a rule.

Parameters:
ruleCount - the support of the rule
premiseCount - the premise support of the rule
midPoints - array with all mid points
priors - hashtable containing the prior probabilities
Returns:
the expected predictive accuracy
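
The following sketch shows one way such an expectation can be computed, assuming it is the mean of the accuracy mid points, each weighted by its prior probability times the binomial likelihood of the observed counts, and then normalized. The class ExpectationSketch, its method names and the use of a Map<Double, Double> in place of a Hashtable are assumptions for illustration and do not reproduce the Weka implementation.

```java
import java.util.HashMap;
import java.util.Map;

final class ExpectationSketch {

    private ExpectationSketch() {}

    /** Binomial probability of `successes` out of `trials` at success rate p, for 0 < p < 1. */
    static double binomial(double p, int successes, int trials) {
        double logProb = 0.0;
        for (int i = 1; i <= successes; i++) {
            logProb += Math.log(trials - successes + i) - Math.log(i);
        }
        logProb += successes * Math.log(p) + (trials - successes) * Math.log(1.0 - p);
        return Math.exp(logProb);
    }

    /** Mean of the mid points, weighted by prior times likelihood and normalized. */
    static double expectedAccuracy(int ruleCount, int premiseCount,
                                   double[] midPoints, Map<Double, Double> priors) {
        double numerator = 0.0;
        double denominator = 0.0;
        for (double midPoint : midPoints) {
            Double prior = priors.get(midPoint);
            if (prior == null || prior == 0.0) {
                continue; // mid points without prior mass contribute nothing
            }
            double weight = prior * binomial(midPoint, ruleCount, premiseCount);
            denominator += weight;
            numerator += weight * midPoint;
        }
        return numerator / denominator;
    }

    public static void main(String[] args) {
        double[] midPoints = {0.25, 0.75};
        Map<Double, Double> priors = new HashMap<>();
        priors.put(0.25, 0.5);
        priors.put(0.75, 0.5);
        // The rule holds in 3 of the 4 instances covered by its premise.
        System.out.println(expectedAccuracy(3, 4, midPoints, priors));
    }
}
```

With a uniform prior over the two mid points, the observed 3 out of 4 pulls the estimate towards 0.75 without reaching it, which is the trade-off between support and confidence that the referenced paper describes.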

generateRules

public java.util.TreeSet generateRules(int numRules,
                                       double[] midPoints,
                                       java.util.Hashtable priors,
                                       double expectation,
                                       Instances instances,
                                       java.util.TreeSet best,
                                       int genTime)
Generates all rules for an item set. The item set is the premise.

Parameters:
numRules - the number of association rules the user wants to mine. This number equals the size n of the list of the best rules.
midPoints - the mid points of the intervals
priors - Hashtable that contains the prior probabilities
expectation - the minimum value of the expected predictive accuracy that is needed to get into the list of the best rules
instances - the instances for which association rules are generated
best - the list of the n best rules. The list is implemented as a TreeSet.
genTime - the maximum time of generation
Returns:
all the rules with minimum confidence for the given item set

aSubsumesB

public static boolean aSubsumesB(RuleItem a,
                                 RuleItem b)
Method that decides whether or not rule a subsumes rule b. The definition of subsumption is: Rule a subsumes rule b, if a subsumes b AND a has at least the same expected predictive accuracy as b.

Parameters:
a - an association rule stored as a RuleItem
b - an association rule stored as a RuleItem
Returns:
true if rule a subsumes rule b or false otherwise.
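
The definition above combines an item-level test with an accuracy comparison. The sketch below makes both parts explicit under a stated assumption: a is taken to cover b at the item level when a's premise is a subset of b's premise and a's consequence contains b's consequence. This page does not spell out the item-level test, so that part is an assumption, as are the class SubsumptionSketch, the nested Rule type and the items helper, none of which are Weka classes.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class SubsumptionSketch {

    private SubsumptionSketch() {}

    /** Hypothetical stand-in for a rule; this is not Weka's RuleItem. */
    static final class Rule {
        final Set<String> premise;
        final Set<String> consequence;
        final double accuracy; // expected predictive accuracy

        Rule(Set<String> premise, Set<String> consequence, double accuracy) {
            this.premise = premise;
            this.consequence = consequence;
            this.accuracy = accuracy;
        }
    }

    /** Convenience helper for building item sets. */
    static Set<String> items(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    static boolean aSubsumesB(Rule a, Rule b) {
        // Accuracy part of the definition: a must be at least as accurate as b.
        if (a.accuracy < b.accuracy) {
            return false;
        }
        // Assumed item-level part: a needs no more premise items and predicts no fewer items.
        return b.premise.containsAll(a.premise)
                && a.consequence.containsAll(b.consequence);
    }

    public static void main(String[] args) {
        Rule general = new Rule(items("bread"), items("butter"), 0.9);
        Rule specific = new Rule(items("bread", "milk"), items("butter"), 0.8);
        System.out.println(aSubsumesB(general, specific)); // prints true
        System.out.println(aSubsumesB(specific, general)); // prints false
    }
}
```

In this example the more general rule subsumes the more specific one because it needs fewer premise items and is at least as accurate, while the reverse check fails on the accuracy comparison alone.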

singleConsequence

public static FastVector singleConsequence(Instances instances,
                                           int attNum,
                                           FastVector consequences)
Generates a consequence of length 1 for an association rule.

Parameters:
instances - the instances under consideration
attNum - an item that does not occur in the premise
consequences - FastVector that possibly already contains other consequences of length 1
Returns:
FastVector with consequences of length 1

removeRedundant

public boolean removeRedundant(RuleItem toInsert)
Method that removes redundant rules out of the list of the best rules. A rule is in that list if: the expected predictive accuracy of this rule is among the best and it is not subsumed by a rule with at least the same expected predictive accuracy

Parameters:
toInsert - the rule that should be inserted into the list
Returns:
true if the method has changed the list, false otherwise
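
The maintenance of the list of best rules described above can be sketched as follows. This is a simplified illustration and not the Weka implementation: the class BestRulesSketch, its Rule type, the ArrayList used in place of a TreeSet and the item-level subsumption test (subset premise, superset consequence) are all assumptions, and trimming the list to the n best rules is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class BestRulesSketch {

    /** Hypothetical stand-in for a rule; this is not Weka's RuleItem. */
    static final class Rule {
        final Set<String> premise;
        final Set<String> consequence;
        final double accuracy; // expected predictive accuracy

        Rule(String[] premise, String[] consequence, double accuracy) {
            this.premise = new HashSet<>(Arrays.asList(premise));
            this.consequence = new HashSet<>(Arrays.asList(consequence));
            this.accuracy = accuracy;
        }
    }

    private final List<Rule> best = new ArrayList<>();

    /** Assumed subsumption test: a is at least as general and at least as accurate as b. */
    static boolean subsumes(Rule a, Rule b) {
        return a.accuracy >= b.accuracy
                && b.premise.containsAll(a.premise)
                && a.consequence.containsAll(b.consequence);
    }

    /** Inserts the rule unless a kept rule subsumes it; returns true if the list changed. */
    boolean insert(Rule toInsert) {
        for (Rule kept : best) {
            if (subsumes(kept, toInsert)) {
                return false; // the candidate is redundant, the list stays untouched
            }
        }
        // Drop every kept rule that the candidate makes redundant, then add the candidate.
        best.removeIf(kept -> subsumes(toInsert, kept));
        best.add(toInsert);
        return true;
    }

    int size() {
        return best.size();
    }
}
```

Checking the candidate against the kept rules first means a redundant candidate never displaces anything, so the list only changes when the new rule genuinely adds information.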

count

public int count()
Gets the current maximum value of the generation time.

Returns:
the current maximum value of the generation time

change

public boolean change()
Gets whether the list of the best rules has been changed.

Returns:
whether or not the list of the best rules has been changed