Dr. Dan A. Simovici
Professor of Computer Science
University of Massachusetts Boston
Department of Computer Science
Education
 Ph.D., July 1974, University of Bucharest, Romania
 M.S. in Mathematics, June 1970, University of Iasi,
Romania
 M.S. in E.E., June 1965, Polytechnical Institute of
Iasi, Romania
Professional Affiliations
 Senior Member of IEEE
 Vice Chair of the Technical Committee for Multiple-Valued Logic
of the Computer Society
 Association for Computing Machinery
 AAAI
Academic Career
 1985 - present: Professor of Computer Science, UMB
 2010 - present: Honorary Professor of Computer Science, University of Iasi, Romania
 January 2006: Visiting Professor of Computer Science, University of Science and Technology, Lille, France
 1998: Visiting Professor of Computer Science, Tohoku
University, Sendai, Japan
 1984 - present: Director of the Computer Science Graduate Program, UMB
 1982 - 1985: Associate Professor of Computer Science, UMB
 1981 - 1982: Associate Professor of Computer Science, University of Miami, Florida
Research Interests
 Information-Theoretical and Linear Methods in Data Mining
 Semantic Models in Databases
 Algebraic Aspects of Multiple-Valued Logic
Other Activities

Vice-Chair of the Program Committee of ISPA'07 for Databases and Data Mining

Member of Program Committees for major data-mining conferences: KDD, PKDD, DAWAK, EGC.

General Chairman of the 32nd
International Symposium for Multiple-Valued
Logic, Boston, Massachusetts, 15-18 May 2002.

Managing Editor of Journal for Multiple-Valued Logic and Soft Computing

Editor of International Journal for Parallel, Emergent, and Distributed Systems

Editor of International Journal for Software and Information Technologies

General Co-Chairman of the 26th International Symposium for Multiple-Valued
Logic, Santiago de Compostela, Spain, May 1996.

General Chairman of the 24th International Symposium for Multiple-Valued
Logic, Boston, Massachusetts, May 1994.

Chairman of the Technical Committee of the Computer Society/IEEE (elected
at the 19th annual meeting, May 1988, Spain).

General Chairman of the 17th International Symposium for Multiple-Valued
Logic, Boston, Massachusetts.

Reviewer for the Computer Science Accreditation Board of IEEE/ACM.
Ph.D. Students
Books

Mathematical Analysis for Machine Learning and Data Mining, World Scientific, 2018

Linear Algebra Tools for Data Mining, World Scientific, 2012

Mathematical Tools for Data Mining,
Springer-Verlag, 2008, by Dan A. Simovici and C. Djeraba (second
edition 2015)

Theory of Formal Languages with Applications by Dan
Simovici and Richard Tenney, World Scientific, 1999

Relational Database Systems by Dan
Simovici and Richard Tenney, Academic Press, 1995

Mathematical Foundations of Computer Science. vol. I: Sets, Relations,
Induction in Computer Science (with Peter Fejer), Springer-Verlag, New York,
1990

Logical Foundations of Computer Science (with Peter Fejer), in
preparation for Springer-Verlag, New York

Introduction aux Structures Algébriques, (2 vols.), ERPI, Montreal,
Canada, 1992
 Formal Languages and Compiling Techniques, Editura
Didactica si Pedagogica, Bucharest, Romania, 1978
Recent Publications

Dual Criteria Determination of the Number of Clusters in Data
(with Kaixun Hua), Proceedings of SYNASC 2018, Timisoara, pp. 201-207, Computer Society.

Ultrametricity of Dissimilarity Spaces and Its Significance for Data Mining
(with R. Vetro and K. Hua),
EGC 2015, Luxembourg, Revue des Nouvelles Technologies de l'Information,
RNTI E. 28, pp. 89-100
(pdf file)

Several Remarks on Dissimilarities and Ultrametrics, Scientific Annals of Computer Science, "Al. I. Cuza" University
of Iasi, Romania, vol. XXV, 1, 2015, pp. 155-170
(pdf file)

Representative Training Sets for Classification and the Variability of Empirical Distributions (with Saaid Baraty),
Extraction et Gestion des Connaissances, February 2014,
EGC'2014, Revue des Nouvelles Technologies, E. 26, pp. 299-304
(pdf file)

Evaluating Data Minability Through Compression - An Experimental Study (with Saaid Baraty and
Dan Pletea), International Journal on Advances in Software, vol. 6, no. 3-4, 2013, pp. 237-245
(pdf file)

On Submodular and Supermodular Functions on Lattices and Related Structures, to appear in the
Proceedings of the 44th International Symposium for Multiple-Valued Logic, Bremen, Germany, May 17-19, 2014
(pdf file)

Data Mining of Medical Data: Opportunities and Challenges in Mining
Association Rules, IALS, Cecilienhof, Potsdam, August 2012
(pdf file)

Evaluating Data Minability through Compression - An Experimental Study
(with D. Pletea and S. Baraty), Proceedings of Data Analytics 2012,
Barcelona, Spain, September 2012, pp. 97-102
(pdf file)

Polarities, Axiallities, and Marketability, DaWaK 2012,
Vienna, September 2012 (with P. Fomenky and W. Kurz), LNCS 7448,
Springer-Verlag, pp. 243-252
(pdf file)

Information-Theoretical Mining of Determining Sets for Partially
Defined Functions, to appear in the Journal for Multiple-Valued Logic
and Soft Computing (with Dan Pletea and Rosanne Vetro) (pdf file)

Evaluating Bayesian Networks by Sampling with Simplified Assumptions,
EGC 2012, Bordeaux, Revue des Nouvelles Technologies de l'Information, RNTI, E.23, pp. 11-16
(with Saaid Baraty)

Several Remarks on the Metric Space of Genetic Codes,
International Journal of Data Mining and Bioinformatics, vol. 6, 2012,
pp. 17-26 (with D. Weisman)
(pdf file)

Entropic-Genetic Clustering, Revue des Nouvelles Technologies d'Information, Extraction et Gestion des Connaissances, 2011,
Brest, France, pp. 71-76 (with M. Breaban and H. Luchian)
(pdf file)

Approximative distance computation by random hashing (with S. Mimaroglu and M. Yagci),
Journal of Supercomputing, appeared in "Online First" (to appear in print this Fall)
(pdf file)

Entropy quadtrees for high complexity region detection (with R. Vetro and W. Ding), IJSSCI, vol. 3, pp. 16-33, 2011.

The Impact of Triangular Inequality Violations on Medoid-Based
Clustering, Proceedings of ISMIS 2011, (with S. Baraty and C. Zara)
Warsaw, Poland, June 2011, Lecture Notes in Artificial Intelligence,
LNAI 6804, pp. 280-289, (pdf file)

Entropies on Bounded Lattices, Proceedings of the 41st International Symposium for Multiple-Valued Logic,
Tuusula, Finland, May 2011, pp. 307-312 (pdf file)

Singular value decomposition is a valid predictor of stroke importance
in reading Chinese, (with Wang, H.C., Angele, B., Schotter, E., Yang,
J., Pomplun, M. and Rayner, K.) Poster at the 16th European
Conference on Eye Movements (ECEM2011), Marseille, France. August,
2011.

Bernoulli Trials Based Feature Selection for Crater Detections,
(with Liu, W. Ding, J. P. Cohen, T. Stepinski)
the 19th ACM SIGSPATIAL International Conference on Advances in
Geographic Information Systems, Chicago, IL, November, 2011

Mining Determining Sets for Partially Defined Functions (with D. Pletea and R. Vetro),
Advances in Data Mining, Lecture Notes in Artificial Intelligence LNAI 5633, Springer-Verlag
(pdf file)

Scalable pattern mining with Bayesian networks, Data Mining and Knowledge Discovery,
(Springer-Verlag), vol. 18, 2009, pp. 56-100 (with S. Jaroszewicz and T. Scheffer)
(pdf file)

Mining Approximative Descriptions of Sets Using Rough Sets (with Selim Mimaroglu),
Proceedings of the 39th International Symposium for Multiple-Valued Logic, Okinawa, Japan, May 2009
(pdf file)

Binary Sequences and Association Graphs for Fast Detection of Sequential Patterns
(with S. Mimaroglu), EGC 2009, Strasbourg, January 2009
(pdf file)

Edge Evaluation in Bayesian Network Structures (with Saaid Baraty),
Proceedings of the 8th Australian Data Mining Conference (AusDM 2009),
Australian Computer Society and ACM, pp. 193-201
(pdf file)

Structural Classification of XML Documents Using Multisets (with S. Iyer),
International Journal on Artificial Intelligence Tools, vol. 17, no. 5, pp. 1003-1022
(pdf file)

Approximate Computation of Object Distances by Locally-Sensitive Hashing (with S. Mimaroglu),
to appear in Proceedings of DBMIN'08, Las Vegas, August 2008
(pdf file)

Metric-Entropy Pairs on Lattices, Journal of Universal Computer Science
(Springer-Verlag), vol. 13, no. 11, 2007, pp. 1767-1778
(pdf file)
 Betweenness, Metrics and Entropies in Lattices, Proceedings of ISMVL 2008, Dallas, TX,
May 2008; the posted version is a preprint that will appear in the Journal for Multiple-Valued Logic
and Soft Computing
(pdf file)
 Detecting Eye Fixations by Projection Clustering
(with T. Urruty, S. Lew, N. Ihadaddene) ACM Transactions on Multimedia
Computing, Communications and Applications, vol. 3, no.4, December
2007, (pdf file)
 Metric Methods in Data Mining, a chapter in Data
Mining Patterns - New Methods and Applications,
P. Poncelet, M. Teisseire, F. Masseglia (eds.), Information Science
Reference, Hershey, 2007, pp. 1-31.

Structure Inference of Bayesian Networks from Data: A New Approach Based on Generalized
Conditional Entropy (with Saaid Baraty), Proceedings of EGC 2008, Sophia Antipolis, France,
Revue des Nouvelles Technologies et de l'Information, RNTI-E-11, 2008, pp. 337-342
(pdf file)

Multisets and Clustering XML Documents (with Swami Iyer) Proceedings of ICTAI,
October 2007, Patras, Greece, IEEE CS Press, pp. 267-274
(pdf file)
 Clustering and Approximate Identification of Frequent Item Sets
(with S. Mimaroglu) Proceedings of FLAIRS 2007, Key West, May 2007, pp. 502-506
(pdf file)
 Clustering by Random Projections (with T. Urruty and C. Djeraba),
ICDM 2007, Leipzig (pdf file),
Lecture Notes in Artificial Intelligence, no. 4597, pp. 107-119
 On the Axiomatization of Generalized Entropic Distances,
accepted at ISMVL 2007, Oslo, May 2007
(pdf file)
An extended version appeared in the Journal of Multiple-Valued Logic and Soft Computing, v. 13, f. 4-6, pp. 295-320
(pdf file)
 Model detection for User Behavior in Video Sessions
(with Sylvain Mongy and Chabane Djeraba), DMIN, June 2007, Las Vegas
(pdf file),
Proceedings of DMIN 2007, CSREA Press, pp. 99-103
 A New Metric Splitting Criterion for Decision Trees (with Szymon Jaroszewicz)
(pdf file) Journal of Parallel, Emerging and Distributed Computing, vol. 21, no. 4, pp. 239-256, 2006.
 On Feature Extraction through Clustering (with Richard Butterworth and Gregory
Piatetsky-Shapiro) (pdf file)
Proceedings of ICDM 2005, pp. 581-584, Houston, Texas, November 2005.
 Biclustering of Gene Expression Data Based on Local Nearness (with
J. Aguilar-Ruiz and Domingo Savio Rodriguez)
(pdf file) Proceedings of EGC 2006,
Lille, France, January 2006, pp. 681-692.
 On the Ranges of Algebraic Functions in Lattices (with S. Rudeanu)
(pdf file) Studia Logica, vol. 84, no. 3,
pp. 451-483, December 2006.
 Semi-Supervised Incremental Clustering of Categorical Data
(with N. Singla), Proceedings of EGC 2005, Paris, France, pp. 189-200.
 An Abstract Axiomatization of the Notion of Entropy (with Ivo Rosenberg),
Proceedings of ISMVL, May 2005, Calgary, Canada
(pdf file) .
 Metric Incremental Clustering of Nominal Data
(with N. Singla and M. Kuperberg), Proceedings of ICDM 2004, Brighton, UK, pp. 523-527
(pdf file)
 Interestingness of Frequent Itemsets Using Bayesian
Networks as Background Knowledge (with S. Jaroszewicz), Proceedings of
KDD 2004, Seattle, pp. 178-186.
(pdf file)
 A Greedy Algorithm for Supervised Discretization (with
R. Butterworth, D. S. Santos and Lucila Ohno-Machado),
Journal of Biomedical Informatics, vol. 37(4), pp. 285-292.
(pdf file)

Measures on Boolean polynomials and their applications in data mining
(with S. Jaroszewicz and I. G. Rosenberg), Applied Discrete Mathematics,
volume on Discrete Mathematics and Data Mining, vol. 144, 1, pp. 123-139
(pdf file)

A Metric Approach to Building Decision Trees Based on Goodman-Kruskal
Association Index (with S. Jaroszewicz),
PAKDD 2004, Sydney, Australia, May 2004,
LNAI 3056, Springer-Verlag, pp. 181-190
(pdf file)

A Graph-Theoretical Approach to Boolean Interpolation of Non-Boolean
Functions (with S. Rudeanu), Proceedings of the 34th International
Symposium for Multiple-Valued Logic, Toronto, May 2004, published by
IEEE Computer Society, pp. 245-250
(pdf file)

Evolutionary Strategy for Learning Multiple-Valued Logic
Functions (with A. Ngom and I. Stojmenovic),
Proceedings of the 34th International
Symposium for Multiple-Valued Logic, Toronto, May 2004, published by
IEEE Computer Society, pp. 154-160

A Metric Approach to Supervised Discretization (with R. Butterworth),
EGC 2004, Clermont-Ferrand, France, January 2004, Revue des Nouvelles Technologies de
l'Information, RNTI-E-2, vol. 1, pp. 197-203

The Goodman-Kruskal Coefficient and Its Applications in the Genetic
Diagnosis of Cancer (with S. Jaroszewicz, W. Kuo and L.
Ohno-Machado), IEEE Transactions on Biomedical Engineering, vol. 51, no. 7,
pp. 1095-1102, July 2004.
(pdf file)

Generating an Informative Cover for Association Rules (with L.
Cristofor), Proceedings of the 2002 IEEE International Conference on
Data Mining, pp. 597-600
(pdf file)

Approximation of Non-Boolean Functions by Boolean Functions and
Applications in Nonstandard Computing, in Proceedings of the 2002
International Symposium on New Paradigm Computing, December 2002,
Sendai, Japan, pp. 27-31 (invited talk)
(pdf file)

Several Remarks on Non-Boolean Functions over Boolean Algebras,
Proc. of the International Symposium for Multiple-Valued Logic, Meiji
University, Tokyo, May 2003, pp. 163-168
(pdf file)

An Algebraic Approach to Entropy, in Beyond Two: Theory and
Applications of Multiple-Valued Logic, M. Fitting and E. Orlowska
(editors), Springer-Verlag, Heidelberg, New York, 2003, pp. 101-115.

Generalized Entropy and Decision Trees, EGC 2003 - Journees
francophones d'Extraction et de Gestion de Connaissances, January
2003, Lyon, France (with S. Jaroszewicz), pp. 369-380
(ps file) (pdf file)

Support Approximations using Bonferroni-Type Inequalities (with
S. Jaroszewicz), Principles of Data Mining and Knowledge Discovery,
PKDD 2002, Helsinki, August 2002, Lecture Notes in Artificial
Intelligence, vol. 2431, pp. 212-224, Springer-Verlag, Berlin, 2002.
(ps file)
(pdf file)

Generating Informative Cover Rules (with Laurentiu Cristofor),
International Conference on Data Mining, Maebashi, Japan, December
2002.
(ps file)
(pdf file)

Finding Median Partitions Using Information-Theoretical Algorithms
(with D. Cristofor), Journal of Universal Computer Science, vol. 8,
no. 2, pp. 153-172.
(ps file)
(pdf file)

An Inclusion-Exclusion Result for Boolean Polynomials and Its
Applications in Data Mining (with S. Jaroszewicz and I. Rosenberg),
Proceedings of the Discrete Mathematics and Data
Mining Workshop, Washington, April, 2002 (SIAM DM Meeting),
pp. 165-173.
(ps file)
(pdf file)

An Information-Theoretical Approach to Clustering Categorical
Databases Using Genetic Algorithms (with Dana Cristofor),
Proceedings of the Workshop on Clustering High-Dimensional Data and
Its Applications, Washington, April, 2002 (SIAM DM Meeting),
pp. 37-46.
(ps file)
(pdf file)

On Functions Defined on Free Boolean Algebras (with I. Rosenberg and
S. Jaroszewicz), Proceedings of the ISMVL 2002,
Boston, Massachusetts, IEEE Computer Society, Los Alamitos,
California,
pp. 192-201.
(ps file)
(pdf file)

Mining for Purity Dependencies in Relational Databases (with
L. Cristofor and D. Cristofor), EGC 2000, Montpellier, January 19-23
(ps file)
(pdf file)
(best paper award received from
AFIA, the French Association for Artificial Intelligence).

An Axiomatization of Partition Entropy (with S. Jaroszewicz),
Transactions on Information Theory, July 2002, vol. 48 (7),
pp. 2138-2142 (a preliminary form appeared in
the Proceedings of the 31st ISMVL, Warsaw, Poland, May 2001, pp. 259-266).

Impurity Measures in Databases (with L. Cristofor and D. Cristofor),
Acta Informatica, 38 (2002), pp. 307-324.

Pruning Redundant Association Rules Using Maximum Entropy Principle
(with S. Jaroszewicz), Proceedings of PAKDD, Taipei,
May 2002, Lecture Notes in Artificial Intelligence, vol. 2336,
Springer-Verlag, pp. 135-147.
(ps file)
(pdf file)

A General Measure of Rule Interestingness (with S. Jaroszewicz)
in Principles of Data Mining and Knowledge Discovery, the 5th
European Conference, PKDD 2001, Freiburg, September 2001, Lecture Notes in
Artificial Intelligence, vol. 2168, Springer-Verlag, pp. 253-266.

Mining Association Rules in Entity-Relationship Modeled Databases
(with Laurentiu Cristofor), Technical Report, UMB, TR 2001-1
(pdf file)

An Information-Theoretical Approach to Genetic Algorithms for Clustering
(with Dana Cristofor), Technical Report, UMB, TR 2001-2

Generalized Entropy and Projection Clustering of Categorical Data
(with D. Cristofor, L. Cristofor)
in Principles of Data Mining and Knowledge Discovery, the 4th
European Conference, PKDD 2000, Lyon, Lecture Notes in Artificial
Intelligence, 1910, Springer-Verlag, pp. 619-625

Impurity Measures and Applications to Classification and Clustering,
(with Dana and Laurentiu Cristofor)
presented at the Int. Conf. on Advances
in Infrastructure for Electronic Business, Science, and Education,
Scuola Superiore G.R. Romoli (Telecom Italia),
Aquila, Italy, August 2000

Data Mining of Weak Functional Decompositions
(with S. Jaroszewicz)
in the Proceedings of the 30th International Symposium for
Multiple-Valued Logic, Portland, Oregon, pp. 77-82

On Information-Theoretical Aspects of Relational Databases
(with S. Jaroszewicz), in Finite vs. Infinite, Springer-Verlag, pp. 301-322

Galois Connections and Data Mining,
(with L. Cristofor and D. Cristofor),
Journal of Universal Computer
Science, Springer-Verlag, vol. 6, no. 1, pp. 60-74

Boolean Completeness in Two-valued Set Logic,
(with I. Stojmenovic and R. Tosic)
Multi. Val. Logic, 2000,
vol. 5, pp. 267-280

On Axiomatization of Conditional Entropy of Functions between Finite Sets,
(with S. Jaroszewicz)
Proc. of the 29th ISMVL, Freiburg, Germany, pp. 24-31

Automatic Data Restructuring
(with S. Ginsburg and Nan Shu)
Journal of Universal Computer Science, vol. 5, no. 4, pp. 243-286

Learning with Permutably Homogeneous Perceptrons,
(with A. Ngom, I. Stojmenovic, C. Reischer)
Proc. of the 28th ISMVL, Fukuoka, Japan, pp. 161-167

Functional Entropy and Decision Trees,
(with V. Shmerko, V. Cheushev, S. Yanushkiewicz)
Proc. of the 28th ISMVL, Fukuoka, Japan, pp. 257-264

Completeness Criteria in Set-Valued Logic Under Composition with Union
and Intersection,
Proceedings of the 27th International
Symposium for Multiple-Valued Logic, May 1997, pp. 75-82.

A Characterization of the Information Content of a Classification
(with K. Baclawski), Information Processing Letters, vol. 57 (1996),
pp. 211-214.

Several Remarks on the Complexity of Set-Valued Switching Functions,
(with C. Reischer)
Proceedings of the 26th International Symposium for Multiple-Valued
Logic, Santiago de Compostela, Spain, 1996

A Categorial Approach to Database Semantics,
(with K. Baclawski and W. White)
Math. Struct. in Computer
Science (1994), v. 4, pp. 147-183
Recent Talks

Data Mining of Medical Data: Opportunities and Challenges
(Potsdam, August, 2012)
(pdf file)

The Vapnik-Chervonenkis Dimension and Learnability (full version),
Siemens Doctoral Summer School at the University of Iasi, Romania,
June, 2012
(pdf file)

Linear Methods in Data Mining,
Siemens Doctoral Summer School at the University of Iasi, Romania, June 20, 2009
(pdf file)

Hereditary Families of Sets in Data Mining,
University of Bucharest, Romania, June 25, 2009
(pdf file)

Metric Methods in Data Mining,
IDA 2006, Iasi, Romania, June 16, 2006,
(pdf file)

Metric Methods in Mining,
Dana Farber Cancer Institute, Boston, February 27, 2004,
(pdf file)

Wavelets and Applications (MIT, April 27, 2004)
(pdf file)

Research Directions in Data Mining (in Romanian, October 2004, Universities of Bucharest and Iasi, Romania)
(pdf file)

Metrics on Partitions of Finite Sets and Data Mining Applications (UMB, May 11, 2005)
(pdf file)

An Abstract Axiomatization of the Notion of Entropy (Calgary, May 19, 2005)
(pdf file)

Efficient Computing Through Random Algorithms (Doctoral Summer School, June 2013, Iasi, Romania)
(pdf file)

Multivalued and Binary Ultrametrics and Clusterings (Doctoral Summer School, June 2014, Iasi, Romania)
(pdf file)

Clustering Axiomatization (Doctoral Summer School, June 2019,
Iasi, Romania)
(pdf file)
CS671 - MACHINE LEARNING - SLIDES and HANDOUTS
CS 671 SLIDES