laur.dm.ar
Class SyntheticDataGenerator
java.lang.Object
|
+--laur.dm.ar.SyntheticDataGenerator
- public class SyntheticDataGenerator
- extends java.lang.Object
This class implements a synthetic data generator that produces
data by simulating transactions in a supermarket. The algorithm is
described in the article "Fast Algorithms for Mining Association
Rules" by Rakesh Agrawal and Ramakrishnan Srikant of the IBM Almaden
Research Center, 1994. I also drew on the C++ source code of the
generator, kindly distributed by Mr. Rakesh Agrawal, and on the
Master's thesis of Mr. Andreas Mueller.
Constructor Summary
SyntheticDataGenerator(long num_transactions,
int avg_transaction_size,
int num_large_itemsets,
int avg_large_itemset_size,
int num_items)
Create a new synthetic data generator with mean correlation 0.5 and mean corruption 0.5.
SyntheticDataGenerator(long num_transactions,
int avg_transaction_size,
int num_large_itemsets,
int avg_large_itemset_size,
int num_items,
double correlation_mean,
double corruption_mean)
Create a new synthetic data generator.
Method Summary
java.util.ArrayList getLargeItemsets()
Return the large itemsets used in the generation of transactions.
Itemset getNextTransaction()
Get the next transaction.
boolean hasMoreTransactions()
Tell whether there are more transactions to generate.
static void main(java.lang.String[] args)
Sample usage and testing.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
SyntheticDataGenerator
public SyntheticDataGenerator(long num_transactions,
int avg_transaction_size,
int num_large_itemsets,
int avg_large_itemset_size,
int num_items)
- Create a new synthetic data generator with mean correlation 0.5
and mean corruption 0.5.
- Parameters:
num_transactions - the number of transactions to generate
avg_transaction_size - the average size of a transaction
num_large_itemsets - the number of large itemsets to be used as patterns in the generation of transactions
avg_large_itemset_size - the average size of a large itemset
num_items - the number of items to appear in transactions
- Throws:
java.lang.IllegalArgumentException - if the integer arguments are not strictly positive
SyntheticDataGenerator
public SyntheticDataGenerator(long num_transactions,
int avg_transaction_size,
int num_large_itemsets,
int avg_large_itemset_size,
int num_items,
double correlation_mean,
double corruption_mean)
- Create a new synthetic data generator.
- Parameters:
num_transactions - the number of transactions to generate
avg_transaction_size - the average size of a transaction
num_large_itemsets - the number of large itemsets to be used as patterns in the generation of transactions
avg_large_itemset_size - the average size of a large itemset
num_items - the number of items to appear in transactions
correlation_mean - the mean correlation between the large itemsets
corruption_mean - the mean of the corruption coefficient that indicates how much a large itemset will be corrupted before being used
- Throws:
java.lang.IllegalArgumentException - if the integer arguments are not strictly positive or if the floating point arguments are not between 0 and 1
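The argument contract of the two constructors can be illustrated with a small stand-in. This is only a sketch under stated assumptions, not the library's source: the class GeneratorConfigSketch and its helper methods are hypothetical, the long num_transactions is assumed to count as one of the "integer arguments", and the bounds 0 and 1 are assumed to be inclusive, since the documentation does not say either way.

```java
// Hypothetical stand-in that mirrors the documented constructor contract.
// It stores the parameters only; it does not generate any data.
final class GeneratorConfigSketch {
    final long numTransactions;
    final int avgTransactionSize;
    final int numLargeItemsets;
    final int avgLargeItemsetSize;
    final int numItems;
    final double correlationMean;
    final double corruptionMean;

    // Five-argument form: uses the documented defaults of 0.5 for both means.
    GeneratorConfigSketch(long numTransactions, int avgTransactionSize,
                          int numLargeItemsets, int avgLargeItemsetSize,
                          int numItems) {
        this(numTransactions, avgTransactionSize, numLargeItemsets,
             avgLargeItemsetSize, numItems, 0.5, 0.5);
    }

    // Seven-argument form: validates everything before storing it.
    GeneratorConfigSketch(long numTransactions, int avgTransactionSize,
                          int numLargeItemsets, int avgLargeItemsetSize,
                          int numItems, double correlationMean,
                          double corruptionMean) {
        requirePositive("num_transactions", numTransactions);
        requirePositive("avg_transaction_size", avgTransactionSize);
        requirePositive("num_large_itemsets", numLargeItemsets);
        requirePositive("avg_large_itemset_size", avgLargeItemsetSize);
        requirePositive("num_items", numItems);
        requireUnitInterval("correlation_mean", correlationMean);
        requireUnitInterval("corruption_mean", corruptionMean);
        this.numTransactions = numTransactions;
        this.avgTransactionSize = avgTransactionSize;
        this.numLargeItemsets = numLargeItemsets;
        this.avgLargeItemsetSize = avgLargeItemsetSize;
        this.numItems = numItems;
        this.correlationMean = correlationMean;
        this.corruptionMean = corruptionMean;
    }

    private static void requirePositive(String name, long value) {
        if (value <= 0) {
            throw new IllegalArgumentException(
                name + " must be strictly positive, got " + value);
        }
    }

    // Assumption: 0 and 1 themselves are accepted. NaN fails both comparisons
    // and is therefore rejected as well.
    private static void requireUnitInterval(String name, double value) {
        if (!(value >= 0.0 && value <= 1.0)) {
            throw new IllegalArgumentException(
                name + " must be between 0 and 1, got " + value);
        }
    }

    public static void main(String[] args) {
        GeneratorConfigSketch c = new GeneratorConfigSketch(1000L, 10, 50, 4, 100);
        System.out.println(c.correlationMean + " " + c.corruptionMean);
        try {
            new GeneratorConfigSketch(1000L, 0, 50, 4, 100);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With the real class, the equivalent calls would be new SyntheticDataGenerator(...) with the same argument order as listed above.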
hasMoreTransactions
public boolean hasMoreTransactions()
- Tell whether there are more transactions to generate.
- Returns:
- true if there are more transactions, false otherwise
getNextTransaction
public Itemset getNextTransaction()
- Get the next transaction.
- Returns:
- an Itemset representing the transaction
- Throws:
NoSuchElementException - if all transactions were generated
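The two methods hasMoreTransactions and getNextTransaction form an iterator-style pair, and a typical consumer checks the first before calling the second. The sketch below shows that loop against a hypothetical stand-in, TransactionSourceSketch, so that it compiles without the laur.dm.ar package. The stand-in returns int[] in place of Itemset, and its placeholder transaction contents are invented for illustration only.

```java
import java.util.NoSuchElementException;

// Hypothetical stand-in exposing the two documented methods.
// int[] replaces Itemset so that the sketch is self-contained.
final class TransactionSourceSketch {
    private final long total;
    private long produced = 0;

    TransactionSourceSketch(long total) {
        this.total = total;
    }

    boolean hasMoreTransactions() {
        return produced < total;
    }

    int[] getNextTransaction() {
        if (!hasMoreTransactions()) {
            throw new NoSuchElementException("all transactions were generated");
        }
        produced++;
        // Placeholder content: one item id derived from the counter.
        return new int[] { (int) (produced % 10) };
    }

    // Reads every remaining transaction and returns how many were read.
    static long drain(TransactionSourceSketch source) {
        long count = 0;
        while (source.hasMoreTransactions()) {
            int[] transaction = source.getNextTransaction();
            if (transaction.length > 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        TransactionSourceSketch source = new TransactionSourceSketch(3);
        System.out.println("read " + drain(source) + " transactions");
    }
}
```

Guarding each call with hasMoreTransactions keeps the loop from ever reaching the NoSuchElementException case.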
getLargeItemsets
public java.util.ArrayList getLargeItemsets()
- Return the large itemsets used in the generation of transactions.
This can be useful for debugging.
- Returns:
- an ArrayList containing the large itemsets as Itemset objects.
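Because getLargeItemsets returns an untyped ArrayList, a caller inspecting the patterns has to treat the elements as Object or cast them. The following sketch shows one way to dump such a list for debugging. The class LargeItemsetDumpSketch is hypothetical, and plain integer lists stand in for Itemset objects. With the real class, each element could be cast to Itemset; whether Itemset provides a readable toString is not stated here, so that part is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical helper for printing the contents of an untyped list.
final class LargeItemsetDumpSketch {

    // Formats each element on its own line, prefixed with its index.
    @SuppressWarnings("rawtypes")
    static String describe(ArrayList patterns) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < patterns.size(); i++) {
            Object pattern = patterns.get(i); // with the real API: (Itemset) patterns.get(i)
            sb.append(i).append(": ").append(pattern).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in data: integer lists in place of Itemset objects.
        ArrayList<Object> patterns = new ArrayList<>();
        patterns.add(Arrays.asList(1, 4, 7));
        patterns.add(Arrays.asList(2, 4));
        System.out.print(describe(patterns));
    }
}
```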
main
public static void main(java.lang.String[] args)
- Sample usage and testing.