Ph.D. Thesis Abstract

Mining Rules in Single-table and Multiple-table Databases

Abstract

Data mining represents the process of extracting interesting and previously unknown knowledge from data. In this thesis we address the important data mining problem of discovering association rules in single-table and multiple-table databases, and we also introduce a generalization of the database concept of functional dependency: the purity dependencies, which can be viewed as a type of rules that are informationally more significant than association rules. An association rule expresses the dependence of a set of attribute-value pairs, also called items, upon another set of items (itemset). The mining of association rules is performed in two stages: the discovery of frequent sets of items from the data and the generation of association rules from the frequent itemsets. The first stage is computationally intensive and the second stage has the drawback of possibly generating thousands of rules, thus making the analysis of the results hard to perform by a human analyst. Our contributions relate to both stages of this process. We analyze stage one and introduce two new algorithms among which algorithm Closure improves upon the performance of the classic algorithm Apriori. We define the concept of an informative cover of the set of association rules and present algorithm CoverRules for computing such a cover, which is in practice one or more orders of magnitude smaller than the set of association rules. Next, we investigate how the mining of association rules should be performed when the data reside in several tables organized in a star schema. We show how to modify the Apriori algorithm to compute the support of itemsets by analyzing the star schema tables or their outer join. Whereas association rules express the dependency between itemsets, purity dependencies express the dependency between sets of attributes. We introduce the concept of purity dependency as a generalization of the concept of functional dependency and we show that these purity dependencies satisfy properties that are similar to the Armstrong rules for functional dependencies. As an application of our theory we present an algorithm for discovering purity dependencies that can be used to predict the value of a target attribute.

Back to Laurentiu's home page