Data Mining

Fall 2011

Professor Dan A. Simovici

E-mail: dsim@cs.umb.edu

Home page: www.cs.umb.edu/~dsim

Telephone: 617-287-6472 (office)  617-731-3297 (home)

Office hours: Monday 3:00-5:00 and Wednesday 3:00-4:00.



Data Mining is the computing discipline that seeks to identify patterns hidden in data in order to formulate predictions, offer decision support, and classify and analyze data trends.  This discipline draws on knowledge from databases, artificial intelligence, statistics, and several areas of mathematics.  Its applications are diverse, ranging from health care, finance, and national security to biology, chemistry, and the humanities.

In this first course we shall cover several topics including:

·         The architecture of data mining

·         Association rules (frequent item sets, the Apriori algorithm, association rules)

·         Clustering techniques (k-means, hierarchical clustering, cluster evaluation)

·         Classification and regression (Bayesian classifiers, decision trees, support vector machines, ensemble classifiers)

We recommend two books for this course: “Introduction to Data Mining” by P.-N. Tan, M. Steinbach, and V. Kumar, published by Addison Wesley, and “Mathematical Tools for Data Mining” by D. A. Simovici and C. Djeraba, published by Springer.  However, the primary source of information is our lectures.  We shall use free software: WEKA 3.6, a general data mining package, and SVM light.  Download and install this software on your computer and become familiar with it.  The WEKA package is described in the book “Data Mining”, 3rd edition, by Ian Witten, Eibe Frank, and Mark Hall, published by Morgan Kaufmann.  The course will be supplemented by handouts posted periodically on the course website.

The course will entail homework assignments, an in-class exam in early December, and a term paper.

Homeworks will be posted here ->

Handouts will be posted here ->