ARMiner Frequently Asked Questions
			  Version 1.02

      (last updated on December 5th 2001 by Laurentiu Cristofor)
_____________________________________________________________________

Q: What is the purpose of this document?

A: After getting several emails that asked similar questions I
realized that it would be better to collect answers to "frequently
asked questions" in one document in the hope that visitors of the
ARMiner page may get answers to their questions quickly (and I don't
have to repeat explanations often :)). So here are the questions that
were asked most often and the answers I provided for them:
_____________________________________________________________________

Q: What are the ARMiner requirements?

A: You need a Java 1.2 distribution in order to run and/or compile
ARMiner. It does not run or compile with Java 1.1 or earlier versions.
_____________________________________________________________________

Q: How do I set up ARMiner?

A: Just uncompress the tar.gz file in some directory and then launch
the server and the client as described in the readme.txt file. The
files needed by the server are the ones in bin/Server/ and the files
needed by the client are in bin/Client/.
_____________________________________________________________________

Q: How can I make my own data available to ARMiner?

A: Get the db_asc_tools.tar.gz archive from the add-ons section of
this site. The tutorial included in the archive should help you. The
basic procedure would be to set up your data in an ASCII file as
described in the tutorial and then to use the asc2db program to create
a .db file. Then you can upload the .db file to the ARMiner server by
selecting Database/Add in ARMiner's menu.
_____________________________________________________________________

Q: Isn't there a simpler way of importing data into ARMiner's format?

A: Unfortunately, no. Although we thought of writing a more complex
tool, in the end we did not have enough time for it. Therefore you
will need to either use the asc2db tool or write your own Java
conversion program.
_____________________________________________________________________

Q: I'd like to use my own data with ARMiner, can you help me out?

A: If I could use your data in the research that I am doing then I
might help you convert it to ARMiner's format, but otherwise you'll
have to rely only on the db_asc_tools.tar.gz archive and your own Java
skills.
_____________________________________________________________________

Q: How do I recompile ARMiner?

A: To recompile ARMiner you need a Java 1.2 distribution and,
optionally, a make utility. The following explanations assume that you
have a make program. If you do not, you can read the contents of the
makefile to see what commands you need to issue to compile the sources
and build the jar files.

To compile the server:

1. Copy src/server/* and src/common/* to some directory, let's say
buildServer/

2. Copy the makefile from the add-ons section of the ARMiner website
to the directory buildServer/

3. Run 'make' or 'make allServer' while in the directory buildServer/

4. You should have obtained two jar files: Server.jar and
DBConfig.jar. These, together with the contents of the DB/ directory,
are what you need to run the server.

5. You can optionally type 'make clean' to delete all Java compiled
files.

To compile the client:

1. Copy src/client/* and src/common/* to some directory, let's say
buildClient/

2. Copy the makefile from the add-ons section of the ARMiner website
to the directory buildClient/

3. Run 'make allClient' while in the directory buildClient/

4. You should have obtained one jar file: Client.jar. This and the
files first.gif and last.gif are the files needed for running the
client.

5. You can optionally type 'make clean' to delete all Java compiled
files.
_____________________________________________________________________

Q: What are known problems with ARMiner? Can you add feature X? 

A: Since ARMiner was released I have focused on eliminating all major
errors, and I think I am largely done with this stage. There are
many things however that are lacking and that could be frustrating for
someone who expects a complete product. But ARMiner was never intended
to be equivalent to a commercial association rule mining
application. Its intent was to provide people with a tool for
experimenting and exploring association rules.

As of December 5th 2001, I am redirecting my efforts to the completion
of ARtool (www.cs.umb.edu/~laur/ARtool/). ARtool will contain updated
versions of the core files of ARMiner and newer algorithms. I also
intend to write a better interface that offers more functionality. I
will still fix outstanding errors in ARMiner if they are reported to
me but I do not intend to add any new features in the near future.
_____________________________________________________________________

Q: What are the usual times it takes to mine a database?

A: The time taken to mine a database depends on four factors:

a) size, the number of rows (tuples) of the database. All algorithms
included with ARMiner scale linearly with the size of the
database. This means that if you double the size of the database, the
time taken by the algorithms will roughly double as well.

b) number of attributes of the database. In the worst case the time
taken by the algorithms will increase exponentially with respect to
the number of attributes.

c) the minimum support specified for the mining. The lower this value,
the longer the algorithm will take to execute. Here again, the time
can increase dramatically as the minimum support decreases.

d) the density of the database. The databases used by ARMiner
represent binary data, that is, in each row an item/attribute is
either present or absent. A database like this could be represented
as a matrix of 0s and 1s, with rows corresponding to the rows of the
database and columns corresponding to the attributes/items. A 1 would
indicate the presence of an attribute in a row, a 0 would indicate its
absence. The density of the database refers to the density of this
matrix, that is, the fraction of its entries that are 1. The denser
the database is, the longer it will take to mine, other factors being
constant. The algorithms that come with ARMiner were applied mainly to
sparse (low density) databases, like supermarket data, and they are
slow on dense data.
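
To make the terms in c) and d) concrete, here is a minimal sketch that
computes the density of a small 0/1 matrix and the support of an
itemset in it. It is only an illustration written for a current Java
release: the class and method names are made up for this example, and
it does not use ARMiner's classes or its .db file format.

```java
// Illustrative sketch only; names are hypothetical, not ARMiner's API.
final class BinaryDatabaseStats {

    // Fraction of cells equal to 1 in a rows-by-attributes 0/1 matrix.
    static double density(int[][] db) {
        long ones = 0;
        long cells = 0;
        for (int[] row : db) {
            for (int cell : row) {
                ones += cell;
                cells++;
            }
        }
        return cells == 0 ? 0.0 : (double) ones / cells;
    }

    // Fraction of rows that contain every attribute (column index)
    // listed in items. An itemset is frequent if this value is at
    // least the minimum support.
    static double support(int[][] db, int[] items) {
        if (db.length == 0) {
            return 0.0;
        }
        int matching = 0;
        for (int[] row : db) {
            boolean containsAll = true;
            for (int item : items) {
                if (row[item] == 0) {
                    containsAll = false;
                    break;
                }
            }
            if (containsAll) {
                matching++;
            }
        }
        return (double) matching / db.length;
    }

    public static void main(String[] args) {
        int[][] db = {
            {1, 1, 0, 0},
            {1, 0, 0, 0},
            {1, 1, 1, 0},
            {0, 0, 0, 1},
        };
        System.out.println("density = " + density(db));
        System.out.println("support of {0,1} = "
                + support(db, new int[] {0, 1}));
    }
}
```

In this toy matrix 7 of the 16 cells are 1, so the density is 0.4375,
and attributes 0 and 1 occur together in 2 of the 4 rows, so the
support of that itemset is 0.5.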

As you can see, there are plenty of factors that influence the time
taken by a mining process, so it is not easy to predict how long a
mining operation will take. You have to know both the data and the
parameters that you are using to get a rough idea of how much time it
will take for the results to be computed.
_____________________________________________________________________

Q: Why is ARMiner so slow for some mining operations? Is it because
the algorithms are inefficient?

A: No. Unfortunately, the problem of finding all association rules is
a complex one and there is no known efficient solution for it. The
problem is basically NP-complete, which means that the worst-case
performance can be exponential, which is really bad. The best you can
hope for is to improve the efficiency of the algorithms by constant
factors. For example, you can notice that most of the time the Closure
algorithm is about twice as fast as Apriori. You cannot get rid of the
exponential worst case unless someone proves some day that P=NP
(whether P is equal to NP is a great open problem in computer science
theory, P and NP being classes of problems; for more information see a
book on computer science theory). So the short answer is: the
algorithms are among the most efficient currently known, but the
problem is difficult, and no matter what algorithm or implementation
you use, there will be databases on which it can take days (or less if
you run out of memory :)) to get a result.
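
One way to get a feeling for the exponential worst case: with n
attributes there are 2^n - 1 non-empty itemsets that could, in
principle, turn out to be frequent. The sketch below just prints that
count for a few values of n. It is an illustration for a current Java
release with a made-up class name, and it says nothing about how many
itemsets a particular algorithm actually examines on real data.

```java
import java.math.BigInteger;

// Illustrative sketch only; not part of ARMiner.
final class CandidateCount {

    // Number of non-empty subsets of a set of n attributes: 2^n - 1.
    static BigInteger nonEmptyItemsets(int attributes) {
        return BigInteger.ONE.shiftLeft(attributes)
                .subtract(BigInteger.ONE);
    }

    public static void main(String[] args) {
        for (int n = 10; n <= 80; n *= 2) {
            System.out.println(n + " attributes: "
                    + nonEmptyItemsets(n) + " possible itemsets");
        }
    }
}
```

Already at 10 attributes there are 1023 possible itemsets, and every
additional attribute doubles the count (plus one).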
_____________________________________________________________________

Q: How should I start mining a database?

A: Here are a couple of pieces of advice to help you start mining a
database. First you should get to know the characteristics of your
data: how many rows and attributes it has, what its density is,
etc. Start mining using a high minimum support value, let's say 0.9
(if you know your data is not dense you can start with a lower value,
0.2 for example). If you get no rules or very few rules, then decrease
this to 0.8, and keep lowering it in the same way. At some point (it
will happen sooner for denser databases) you will start getting plenty
of rules to satisfy your search. Personally, I have very rarely mined
databases for supports smaller than 1% (0.01 as expected by
ARMiner). Playing a little with the supports will give you a feeling
for what will take time and what will work fast.
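
The "start high and lower the support step by step" procedure can be
sketched as a small loop. This is only an illustration for a current
Java release (8 or later): ARMiner is driven from its client, not
through a call like this, so the countRules function below is a
hypothetical stand-in for "run a mining operation with this minimum
support and count the rules you get".

```java
import java.util.OptionalDouble;
import java.util.function.DoubleToIntFunction;

// Illustrative sketch only; countRules is a hypothetical stand-in
// for a real mining run, not an ARMiner API.
final class SupportSchedule {

    // Tries the supports in the given order (highest first) and
    // returns the first one that yields at least `wanted` rules,
    // or an empty result if none of them does.
    static OptionalDouble firstUsefulSupport(double[] schedule,
                                             int wanted,
                                             DoubleToIntFunction countRules) {
        for (double minSupport : schedule) {
            if (countRules.applyAsInt(minSupport) >= wanted) {
                return OptionalDouble.of(minSupport);
            }
        }
        return OptionalDouble.empty();
    }

    public static void main(String[] args) {
        double[] schedule = {0.9, 0.8, 0.7, 0.6, 0.5, 0.4};
        // Fake miner: pretends rules only show up at support <= 0.5.
        DoubleToIntFunction fakeMiner = s -> s <= 0.5 ? 40 : 0;
        System.out.println(firstUsefulSupport(schedule, 10, fakeMiner));
    }
}
```

Trying the high supports first keeps the early runs cheap, since the
expensive runs are the ones with low minimum support.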

Limiting the number of attributes you are interested in will not
speed up the mining very much. This is due to the fact that ARMiner
caches the frequent itemsets and therefore searches for all of them
anyway. The speed-up obtained in the rule generation procedure will be
hard to notice since this part is already quite fast. The caching
mechanism does provide an advantage, however: once you have mined a
database for minimum support x, you will never need to repeat the
process for minimum supports higher than or equal to x.
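
The reason a cache built at minimum support x is enough for any higher
support is that every itemset frequent at the higher threshold is also
frequent at x, so it is already in the cache together with its
support. The sketch below shows that idea as a simple filter. It is an
illustration for a current Java release and an assumption about the
principle only: the class, the map layout and the method name are made
up here and do not describe how ARMiner actually stores its cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; not ARMiner's actual cache.
final class ItemsetCache {

    // cached: frequent itemsets (and their supports) found when
    // mining with minimum support cachedMinSupport.
    // Returns the itemsets that are frequent at `requested`.
    static Map<Set<String>, Double> frequentAt(
            Map<Set<String>, Double> cached,
            double cachedMinSupport,
            double requested) {
        if (requested < cachedMinSupport) {
            // The cache is incomplete below the support it was
            // built for, so the database would have to be mined again.
            throw new IllegalArgumentException(
                    "cache only covers supports >= " + cachedMinSupport);
        }
        Map<Set<String>, Double> result = new HashMap<>();
        for (Map.Entry<Set<String>, Double> e : cached.entrySet()) {
            if (e.getValue() >= requested) {
                result.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }
}
```

For example, a cache built at support 0.2 can answer a request for
support 0.5 by filtering, while a request for support 0.1 cannot be
answered from it.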

The last piece of advice is to use FPgrowth, which is the fastest
algorithm currently implemented for ARMiner.
_____________________________________________________________________

Q: Can I contribute to ARMiner?

A: Actually, nobody has asked this question yet. But if you are
interested in contributing to ARMiner then you're welcome. Get in
touch with me and we can discuss the improvements you would like to
add.
_____________________________________________________________________

Thank you for reading this document. If you have any questions do not
hesitate to send me email at laur@cs.umb.edu.

Laurentiu Cristofor