laur.dm.ar
Class HashTree

java.lang.Object
  |
  +--laur.dm.ar.HashTree

public class HashTree
extends java.lang.Object

A HashTree is a special data structure that is used to index an ArrayList of Itemset objects for more efficient processing.


Constructor Summary
HashTree(java.util.ArrayList itemsets)
          Create a new HashTree.
HashTree(int listSize, int hashSize, java.util.ArrayList itemsets)
          Create a new HashTree.
 
Method Summary
 void add(int index)
          This method indexes in the HashTree the Itemset at index index from ArrayList itemsets which was passed to the constructor of this HashTree.
 void checkLargeness(Itemset itemset)
          Verifies if any of the indexed Itemsets is not large by checking whether they're included in the frequent itemset itemset.
 long countFrequentSubsets(Itemset itemset, long minWeight)
          Count how many frequent Itemsets (frequent = having weight greater than a specified minimum weight) are included in itemset
 long countSubsets(Itemset itemset)
          Count how many Itemsets are included in itemset
 void prepareForDescent()
          This method needs to be called after filling the tree and before any other processing (like update(), count*(), or checkLargeness()).
 void update(Itemset row)
          Update the weights of all indexed Itemsets that are included in row
 void update(Itemset row, long[][] counts)
          Update the weights of all indexed Itemsets that are included in row and also update the matrix counts
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HashTree

public HashTree(int listSize,
                int hashSize,
                java.util.ArrayList itemsets)
Create a new HashTree. The listSize parameter determines after how many inserts in a ListNode we have to change it to a HashNode (i.e. perform a split). The hashSize parameter can be specified to improve the efficiency of the structure.
Parameters:
listSize - the size of the internal lists in the list nodes
hashSize - the size of the internal hashtables in the hash nodes
itemsets - the ArrayList of Itemsets that we should index
Throws:
java.lang.IllegalArgumentException - itemsets is null or listSize <= 0 or hashSize <= 0

HashTree

public HashTree(java.util.ArrayList itemsets)
Create a new HashTree. This initializes the HashTree with default parameters.
Parameters:
itemsets - the ArrayList of Itemsets that we should index
Throws:
java.lang.IllegalArgumentException - itemsets is null
Method Detail

prepareForDescent

public void prepareForDescent()
This method needs to be called after filling the tree and before any other processing (like update(), count*(), or checkLargeness()). It gathers all leaves of the HashTree for more efficient processing.

add

public void add(int index)
This method indexes in the HashTree the Itemset at index index from ArrayList itemsets which was passed to the constructor of this HashTree.
Parameters:
index - the index of the Itemset that we need to index in this HashTree.
Throws:
java.lang.IllegalArgumentException - if the itemset added has different size from previous itemsets added.

update

public void update(Itemset row)
Update the weights of all indexed Itemsets that are included in row
Parameters:
row - the Itemset (normally a database row) against which we test for inclusion

update

public void update(Itemset row,
                   long[][] counts)
Update the weights of all indexed Itemsets that are included in row and also update the matrix counts
Parameters:
row - the Itemset (normally a database row) against which we test for inclusion
counts - a matrix used by some algorithms to speed up computations; its rows correspond to Itemsets and its columns correspond to items; each value in the matrix tells for how many times had an item appeared together with an itemset in the rows of the database.

countFrequentSubsets

public long countFrequentSubsets(Itemset itemset,
                                 long minWeight)
Count how many frequent Itemsets (frequent = having weight greater than a specified minimum weight) are included in itemset
Parameters:
itemset - the Itemset for which we count the subsets
minWeight - the minimum weight

countSubsets

public long countSubsets(Itemset itemset)
Count how many Itemsets are included in itemset
Parameters:
itemset - the Itemset for which we count the subsets

checkLargeness

public void checkLargeness(Itemset itemset)
Verifies if any of the indexed Itemsets is not large by checking whether they're included in the frequent itemset itemset. If an Itemset is not large then it will be marked.
Parameters:
itemset - the Itemset we check