COMUSA is a Java program that merges a collection of input clusterings into a single final clustering of better quality.
Requirements:
COMUSA needs a Java runtime and was tested with JRE 1.6.
How to Run:
java -jar comusa.jar
java -Xms1g -Xmx1g -jar comusa.jar (sets the initial and maximum JVM heap size to 1 GB)
Download COMUSA:
comusa.jar
Download Cluster Ensembles Used In the Paper:
input.txt (sample input for the example shown in the paper)
test data sets (all the input test data sets)
ecoli.txt (not in the paper, ecoli data set)
Note:
The compiled code is provided for research purposes only. NO COMMERCIAL USE.
Disclaimer:
The software is provided on an *as is* basis for research purposes. There is no additional support offered, nor are the author(s) or their institutions liable under any circumstances.
Usage:
Name your input file "input.txt" and use the format shown below. Each row represents a cluster, and each column indicates whether the corresponding object belongs to that cluster (1) or not (0). For example, the first line indicates that the first cluster contains the first, third, and sixth objects. The order of the clusters is not important.
1,0,1,0,0,1,0,0
0,0,0,1,1,0,0,0
0,1,0,0,0,0,1,1
1,1,0,1,0,0,0,0
0,0,0,0,0,0,0,1
0,0,0,0,1,0,1,0
0,0,1,0,0,1,0,0
0,0,1,0,0,1,0,0
1,1,0,1,0,0,0,1
0,0,0,0,1,0,1,0
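To check an input file before running COMUSA, it can help to decode each row back into the list of objects it contains. The following is a minimal sketch and is not part of COMUSA. The class name InputFormatSketch and the method parseClusters are hypothetical names chosen for illustration, and the sketch assumes one comma-separated row of 0/1 values per line, as in the sample above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative helper only; it is NOT part of comusa.jar.
public class InputFormatSketch {

    // Converts each 0/1 row into the 1-based indices of the objects in that cluster.
    static List<List<Integer>> parseClusters(List<String> rows) {
        List<List<Integer>> clusters = new ArrayList<List<Integer>>();
        int expectedColumns = -1;
        for (String row : rows) {
            String trimmed = row.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = trimmed.split(",");
            if (expectedColumns == -1) {
                expectedColumns = cells.length; // first row fixes the number of objects
            } else if (cells.length != expectedColumns) {
                throw new IllegalArgumentException(
                        "Row has " + cells.length + " columns, expected " + expectedColumns);
            }
            List<Integer> members = new ArrayList<Integer>();
            for (int i = 0; i < cells.length; i++) {
                String cell = cells[i].trim();
                if (cell.equals("1")) {
                    members.add(i + 1); // column i holds object i + 1
                } else if (!cell.equals("0")) {
                    throw new IllegalArgumentException("Expected 0 or 1 but found: " + cell);
                }
            }
            clusters.add(members);
        }
        return clusters;
    }

    public static void main(String[] args) {
        // First three rows of the sample input shown above.
        List<String> rows = Arrays.asList(
                "1,0,1,0,0,1,0,0",
                "0,0,0,1,1,0,0,0",
                "0,1,0,0,0,0,1,1");
        List<List<Integer>> clusters = parseClusters(rows);
        for (int c = 0; c < clusters.size(); c++) {
            System.out.println("cluster " + (c + 1) + ": objects " + clusters.get(c));
        }
    }
}
```

For the first sample row, the sketch reports objects 1, 3, and 6, which matches the description above. Rows with a different number of columns, or with values other than 0 and 1, are rejected, since every row is expected to cover the same set of objects.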
Figure 2: COMUSA screenshot
Abstract
Multiple clusterings are produced for various needs and reasons in both distributed and local environments. Combining multiple clusterings into a final clustering that has better overall quality has recently gained importance. The final clustering is also expected to be novel, robust, and scalable. To solve this challenging problem, we introduce a new graph-based method. Our method uses the evidence accumulated in the previously obtained clusterings and produces a high-quality final clustering. The number of clusters in the final clustering is determined automatically, which is another important advantage of our technique. Experimental results on real and synthetically generated data sets demonstrate the effectiveness of our new method.
Keywords: Clustering, Combining Clustering Partitions, Cluster Ensemble, Evidence Accumulation, Robust Clustering, Mutual Information.