laur.dm.ar
Class SyntheticDataGenerator

java.lang.Object
  |
  +--laur.dm.ar.SyntheticDataGenerator

public class SyntheticDataGenerator
extends java.lang.Object

This class implements a synthetic data generator that produces data by simulating transactions in a supermarket. The algorithm is described in the article "Fast Algorithms for Mining Association Rules" by Rakesh Agrawal and Ramakrishnan Srikant of the IBM Almaden Research Center (1994). As additional sources, I also used the C++ source code of the generator, kindly distributed by Mr. Rakesh Agrawal, and the Master's thesis of Mr. Andreas Mueller.


Constructor Summary
SyntheticDataGenerator(long num_transactions, int avg_transaction_size, int num_large_itemsets, int avg_large_itemset_size, int num_items)
          Create a new synthetic data generator with mean correlation 0.5 and mean corruption 0.5.
SyntheticDataGenerator(long num_transactions, int avg_transaction_size, int num_large_itemsets, int avg_large_itemset_size, int num_items, double correlation_mean, double corruption_mean)
          Create a new synthetic data generator.
 
Method Summary
 java.util.ArrayList getLargeItemsets()
          Return the large itemsets used in the generation of transactions.
 Itemset getNextTransaction()
          Return the next generated transaction.
 boolean hasMoreTransactions()
          Tell whether there are more transactions to generate.
static void main(java.lang.String[] args)
          Sample usage and testing.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SyntheticDataGenerator

public SyntheticDataGenerator(long num_transactions,
                              int avg_transaction_size,
                              int num_large_itemsets,
                              int avg_large_itemset_size,
                              int num_items)
Create a new synthetic data generator with mean correlation 0.5 and mean corruption 0.5.
Parameters:
num_transactions - the number of transactions to generate
avg_transaction_size - the average size of a transaction
num_large_itemsets - the number of large itemsets to be used as patterns in the generation of transactions
avg_large_itemset_size - the average size of a large itemset
num_items - the number of items to appear in transactions
Throws:
java.lang.IllegalArgumentException - if any of the integer arguments is not strictly positive
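
The relationship between the two constructors can be illustrated with constructor chaining. This is a minimal sketch, not the class's actual source: `DefaultMeansSketch`, its fields, and the two constants are hypothetical names, and the only facts taken from this page are the parameter lists and the default value of 0.5 for both means.

```java
public class DefaultMeansSketch {
    // Defaults documented for the five-argument constructor.
    public static final double DEFAULT_CORRELATION_MEAN = 0.5;
    public static final double DEFAULT_CORRUPTION_MEAN = 0.5;

    // Hypothetical fields; the real class may store its settings differently.
    public final long numTransactions;
    public final int avgTransactionSize;
    public final int numLargeItemsets;
    public final int avgLargeItemsetSize;
    public final int numItems;
    public final double correlationMean;
    public final double corruptionMean;

    // Short form: forwards to the full form with both means set to 0.5.
    public DefaultMeansSketch(long numTransactions, int avgTransactionSize,
                              int numLargeItemsets, int avgLargeItemsetSize,
                              int numItems) {
        this(numTransactions, avgTransactionSize, numLargeItemsets,
             avgLargeItemsetSize, numItems,
             DEFAULT_CORRELATION_MEAN, DEFAULT_CORRUPTION_MEAN);
    }

    // Full form: the caller supplies both means explicitly.
    public DefaultMeansSketch(long numTransactions, int avgTransactionSize,
                              int numLargeItemsets, int avgLargeItemsetSize,
                              int numItems,
                              double correlationMean, double corruptionMean) {
        this.numTransactions = numTransactions;
        this.avgTransactionSize = avgTransactionSize;
        this.numLargeItemsets = numLargeItemsets;
        this.avgLargeItemsetSize = avgLargeItemsetSize;
        this.numItems = numItems;
        this.correlationMean = correlationMean;
        this.corruptionMean = corruptionMean;
    }

    public static void main(String[] args) {
        DefaultMeansSketch s = new DefaultMeansSketch(1000L, 10, 50, 4, 100);
        System.out.println(s.correlationMean + " " + s.corruptionMean);
    }
}
```

Whether the real class delegates this way or duplicates the initialization is not stated on this page; the sketch only shows that both forms can end up with identical settings.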

SyntheticDataGenerator

public SyntheticDataGenerator(long num_transactions,
                              int avg_transaction_size,
                              int num_large_itemsets,
                              int avg_large_itemset_size,
                              int num_items,
                              double correlation_mean,
                              double corruption_mean)
Create a new synthetic data generator.
Parameters:
num_transactions - the number of transactions to generate
avg_transaction_size - the average size of a transaction
num_large_itemsets - the number of large itemsets to be used as patterns in the generation of transactions
avg_large_itemset_size - the average size of a large itemset
num_items - the number of items to appear in transactions
correlation_mean - the mean correlation between the large itemsets
corruption_mean - the mean of the corruption coefficient, which indicates how much a large itemset is corrupted before being used
Throws:
java.lang.IllegalArgumentException - if any of the integer arguments is not strictly positive, or if either floating-point argument is not between 0 and 1
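
The argument checks described under Throws might look roughly like the following. This is a sketch under stated assumptions: `GeneratorArgsSketch`, `checkArguments`, and `accepts` are hypothetical names, and "between 0 and 1" is assumed to include both endpoints, which this page does not specify.

```java
public class GeneratorArgsSketch {
    // Throws IllegalArgumentException for the cases listed under "Throws".
    public static void checkArguments(long numTransactions,
                                      int avgTransactionSize,
                                      int numLargeItemsets,
                                      int avgLargeItemsetSize,
                                      int numItems,
                                      double correlationMean,
                                      double corruptionMean) {
        if (numTransactions <= 0 || avgTransactionSize <= 0
                || numLargeItemsets <= 0 || avgLargeItemsetSize <= 0
                || numItems <= 0) {
            throw new IllegalArgumentException(
                "integer arguments must be strictly positive");
        }
        if (!inUnitInterval(correlationMean) || !inUnitInterval(corruptionMean)) {
            throw new IllegalArgumentException(
                "correlation and corruption means must be between 0 and 1");
        }
    }

    // Assumption: the interval is closed. NaN fails both comparisons and is rejected.
    private static boolean inUnitInterval(double x) {
        return x >= 0.0 && x <= 1.0;
    }

    // Convenience wrapper for the demo: true if the arguments pass the checks.
    public static boolean accepts(long numTransactions, int avgTransactionSize,
                                  int numLargeItemsets, int avgLargeItemsetSize,
                                  int numItems,
                                  double correlationMean, double corruptionMean) {
        try {
            checkArguments(numTransactions, avgTransactionSize, numLargeItemsets,
                           avgLargeItemsetSize, numItems,
                           correlationMean, corruptionMean);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(1000L, 10, 50, 4, 100, 0.5, 0.5)); // valid
        System.out.println(accepts(1000L, 0, 50, 4, 100, 0.5, 0.5));  // zero size
        System.out.println(accepts(1000L, 10, 50, 4, 100, 1.5, 0.5)); // mean > 1
    }
}
```
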
Method Detail

hasMoreTransactions

public boolean hasMoreTransactions()
Tell whether there are more transactions to generate.
Returns:
true if there are more transactions, false otherwise

getNextTransaction

public Itemset getNextTransaction()
Return the next generated transaction.
Returns:
an Itemset representing the transaction
Throws:
NoSuchElementException - if all transactions have already been generated
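
hasMoreTransactions and getNextTransaction together form an iterator-style protocol: check first, then fetch. The sketch below shows that calling pattern against a stand-in, since the laur.dm.ar classes are not assumed to be on the classpath. `StubGenerator` and `drain` are hypothetical, and a `List<Integer>` stands in for `Itemset`; only the two method names and the `NoSuchElementException` behavior come from this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

public class TransactionLoopSketch {
    // Hypothetical stand-in that mimics the two documented methods.
    public static class StubGenerator {
        private long remaining;
        private int nextItem = 0;

        public StubGenerator(long numTransactions) {
            this.remaining = numTransactions;
        }

        public boolean hasMoreTransactions() {
            return remaining > 0;
        }

        // Returns a one-item placeholder "transaction"; throws once exhausted.
        public List<Integer> getNextTransaction() {
            if (remaining <= 0) {
                throw new NoSuchElementException("all transactions were generated");
            }
            remaining--;
            List<Integer> transaction = new ArrayList<>();
            transaction.add(nextItem++);
            return transaction;
        }
    }

    // Consumes every remaining transaction and returns how many were read.
    public static int drain(StubGenerator generator) {
        int count = 0;
        while (generator.hasMoreTransactions()) {
            generator.getNextTransaction();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        StubGenerator generator = new StubGenerator(3);
        System.out.println("transactions read: " + drain(generator));
    }
}
```

Guarding each call with hasMoreTransactions means the loop never reaches the exception path; calling getNextTransaction after the loop ends would throw.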

getLargeItemsets

public java.util.ArrayList getLargeItemsets()
Return the large itemsets used in the generation of transactions. This can be useful for debugging.
Returns:
an ArrayList containing the large itemsets as Itemset objects.

main

public static void main(java.lang.String[] args)
Sample usage and testing.