CS 110: Introduction to Computing with Java

Lab 9

Pre-Lab

1.       Do a Google search on the three-word phrase “Word Frequency Analysis”.  Study the topic and its applications.

Lab

Introduction

This lab has you finish a utility that counts how often each word occurs in a text file.  Word frequency analysis is used by linguists and cryptographers in various applications.

We use the following algorithm.  We maintain a Map of words and their frequencies in the analyzed text.  A Map is a table that associates a key with a value; we call a <key,value> pair in a Map an entry.  A Map contains at most one entry for any given key.  In this application, we use a TreeMap, which implements the Map interface.  The keys are String objects representing words, and the values are Counter objects representing the frequency of each word in the text.  We read the text file one line at a time, and then read each line one word at a time.  For each word, we look the word up in the Map.  If there is an entry for the word, we increment the associated Counter.  Otherwise, this is the first time we have encountered the word, so we add an entry for it with a Counter whose value is 1.
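The algorithm above can be sketched in a few lines of Java.  This is a minimal, self-contained illustration, not the lab's WordCount code: the Counter class here is a hypothetical stand-in (the lab supplies its own Counter, whose methods may be named differently), the text is passed in as a String rather than read from a file, and the tokenizing rule (lowercase every word, treat any run of non-letters as a separator) is an assumption.

```java
import java.util.Map;
import java.util.Scanner;
import java.util.TreeMap;

public class WordCountSketch {

    // Hypothetical stand-in for the lab's Counter class: a mutable int holder.
    static class Counter {
        private int count;

        Counter(int initial) { count = initial; }

        void increment() { count++; }

        int getCount() { return count; }
    }

    // Counts word frequencies in the given text, reading it one line at a time
    // and each line one word at a time.
    static Map<String, Counter> countWords(String text) {
        Map<String, Counter> freq = new TreeMap<>();
        Scanner lines = new Scanner(text);
        while (lines.hasNextLine()) {
            // Assumption: a word is a run of letters; everything else separates words.
            Scanner words = new Scanner(lines.nextLine()).useDelimiter("[^A-Za-z]+");
            while (words.hasNext()) {
                String word = words.next().toLowerCase();
                Counter c = freq.get(word);
                if (c != null) {
                    c.increment();                   // seen before: bump its count
                } else {
                    freq.put(word, new Counter(1));  // first occurrence of this word
                }
            }
            words.close();
        }
        lines.close();
        return freq;
    }

    public static void main(String[] args) {
        Map<String, Counter> freq = countWords("The cat saw the dog.\nThe dog ran.");
        for (Map.Entry<String, Counter> e : freq.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue().getCount());
        }
    }
}
```

Because a TreeMap keeps its keys sorted, main prints the words alphabetically: cat 1, dog 2, ran 1, saw 1, the 3.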

Purpose

This lab gives you some experience working with file I/O, exceptions, and the Java Collections Framework.

Activities

1.       Copy the project.  A nearly complete set of Java files for the word count application can be found in Lab9.zip.  Extract these files to a folder of your choosing on the student drive.  As usual, open DrJava, create a new project, and open the files you just unzipped.

2.       In DrJava, go to Edit -> Preferences -> Compiler Options and uncheck the box labeled “Show Unchecked Warnings”.

3.       Look at the code.  You can make the application work by completing one (or possibly two) lines of code in the WordCount class.  This code must create a Scanner that reads from the file whose name is given as a String.  NOTE:  You invoke this program in the DrJava Interactions Pane with two command-line arguments: the file name and the minimum frequency count.

4.       Write the line of code and run the application.  We have provided three text files that you can use as sample data: the Declaration of Independence (doi.txt), Homer’s Odyssey (odyssey.txt), and Dickens’s Great Expectations (ge.txt).  Use the application to find the words that occur more than 1000 times in the Odyssey, and compare your results with the sample output below.

5.       Test exceptional conditions.  In particular, what happens when you give an invalid file name (e.g. you use “odssey.txt” instead of “odyssey.txt”)?  What happens when you give an invalid number as the minimum frequency count (e.g. you use “aardvark” instead of “1000”)?

6.       Modify the application.  In particular, change the code so that an invalid file name is handled the same way as an invalid frequency count.

7.       Test your changes. 
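Steps 3, 5, and 6 hinge on two library behaviors worth seeing in isolation.  new Scanner(String) scans the characters of the string itself, so to read a file you wrap its name in a java.io.File; the Scanner(File) constructor throws FileNotFoundException (a subclass of IOException) if the file cannot be opened.  Integer.parseInt throws NumberFormatException for input such as “aardvark”.  The sketch below is one possible approach, assuming you want both errors reported with the same message; the class and method names (ArgsSketch, openFile, parseMinCount) are hypothetical, and the multi-catch syntax requires Java 7 or later.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ArgsSketch {

    // Opens a Scanner over the contents of the named file.
    // Note: new Scanner(fileName) would instead scan the characters of the
    // file name itself, not the file.
    static Scanner openFile(String fileName) throws FileNotFoundException {
        return new Scanner(new File(fileName));
    }

    // Parses the minimum frequency; throws NumberFormatException for input like "aardvark".
    static int parseMinCount(String text) {
        return Integer.parseInt(text);
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.out.println("Usage: java WordCount <file name> <minimum count>");
            return;
        }
        try {
            int minCount = parseMinCount(args[1]);
            Scanner in = openFile(args[0]);
            // ... count words here (hypothetical placeholder) ...
            in.close();
            System.out.println("Read " + args[0] + " with minimum count " + minCount);
        } catch (NumberFormatException | FileNotFoundException e) {
            // Both kinds of bad argument are reported the same way.
            System.out.println("Invalid argument: " + e.getMessage());
        }
    }
}
```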

Sample Output

You should see the following when you run the WordCount class with the command-line arguments “odyssey.txt” and “1000”:

> java WordCount odyssey.txt 1000
the   5,846
and   5,036
to    3,196
of    3,058
you   1,906
i     1,869
he    1,823
a     1,812
in    1,627
for   1,296
his   1,277
as    1,189
with  1,134
it    1,123
that  1,115
him   1,059
>
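The sample output is ordered by descending count rather than alphabetically, so the application must do more than iterate over the TreeMap.  One way to produce such a listing is to copy the qualifying entries into a List, sort the List by count, and print with printf.  This sketch assumes the minimum frequency count is inclusive (>=), uses plain Integer values instead of the lab's Counter objects, and uses hypothetical names (ReportSketch, topWords); the lab's own code may filter and sort differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReportSketch {

    // Returns the words whose count is at least minCount, most frequent first.
    // Assumption: "minimum frequency count" means >= minCount.
    static List<Map.Entry<String, Integer>> topWords(Map<String, Integer> freq, int minCount) {
        List<Map.Entry<String, Integer>> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            if (e.getValue() >= minCount) {
                result.add(e);
            }
        }
        // Sort by descending count; List.sort is stable, so ties keep
        // the TreeMap's alphabetical order.
        result.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = new TreeMap<>();
        freq.put("and", 5036);
        freq.put("the", 5846);
        freq.put("him", 1059);
        freq.put("cat", 3);
        for (Map.Entry<String, Integer> e : topWords(freq, 1000)) {
            // %-5s left-justifies the word in 5 columns; %,d adds grouping separators.
            System.out.printf("%-5s %,d%n", e.getKey(), e.getValue());
        }
    }
}
```

The %,d conversion uses locale-dependent grouping separators; in a US English locale, main prints “the”, “and”, and “him” with counts 5,846, 5,036, and 1,059, in the same layout as the sample output.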

Before you leave, have your TA check off that you completed the lab.  Make sure each partner saves a copy of the work.

Lab Report

Write a document describing your experiences.  Your lab report must be printed (not handwritten).

Answer the following questions related to what you did in this week’s lab.  You may complete the code on your own, but the TA must certify that most of your work was done in the lab.

1.       Answer each of the following questions about the application:

a)       In the original application, how were NumberFormatException and IOException handled, and how did the handling of the two differ?

b)       Find all Java Collection objects used in this application.  What is the role of each object?

2.       Describe what you learned doing this lab.  Explain what was difficult and what was easy.

3.       Attach a listing of the completed classes of your WordCount application.

Note:  You should work alone on writing the lab report.

Note:  The assignment is due at the BEGINNING of your next lab.  No late assignments will be accepted.  Emailed assignments will not be accepted.  If you are not going to be in lab on the due date, you can turn in the assignment ahead of time by placing it in the CS110 TA box in the CS department office.