Due Sunday, February 24th at 11:59 PM
Deliverables
There is one deliverable for this assignment:
- hw4.py
It must be in an hw4 directory, which you must create inside a hw directory inside your it117 directory.
Make sure the script obeys all the rules in the Script Requirements page.
To test this script, you must copy the file gettysburg.txt from /home/ghoffman/course_files/it117_files into your hw4 directory using the following Unix command:
    cp /home/ghoffman/course_files/it117_files/gettysburg.txt .
Specification
Create a script that does the following:
- Reads in a text file
- Creates a dictionary of word frequencies
  - In this dictionary, the keys will be the words found in the file
  - The values will be the number of times each word appeared
- Prints a sorted list of all words in the file and how many times they appeared in the file
In the frequency dictionary, the keys are individual words and the values are the number of times each word appeared in the file.
This script must contain two functions:
- word_frequencies_create
- word_frequency_print
word_frequencies_create
This function must have the following header:
    def word_frequencies_create(filename):
This function should read in a text file and return a word frequency dictionary.
A word frequency dictionary is one where the keys are words and the values are the number of times the word appears in the file.
This function must ignore case, so "We" and "we" should count as the same word.
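As a small illustration of ignoring case, the sketch below lowercases a line before splitting it, so that "We" and "we" end up as the same string. The sample line is made up for the example. Note that split() only separates on whitespace, so any punctuation in a line would stay attached to its word.

```python
# Hypothetical sample line, not taken from gettysburg.txt
line = "We say we and WE mean we\n"

# lower() returns a lowercase copy of the string
line = line.lower()

# split() with no arguments splits on runs of whitespace,
# which also discards the trailing newline
words = line.split()

print(words)   # ['we', 'say', 'we', 'and', 'we', 'mean', 'we']
```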
The following pseudocode shows the algorithm:
    open the file for reading
    create an empty dictionary
    for each line in the file
        make the line lowercase
        turn the line into a list
        for each word in the list
            if the word is not in the dictionary
                create an entry in the dictionary with a value of 1
            else
                add 1 to the value of the word entry in the dictionary
    return the dictionary
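One way the pseudocode might translate into Python is sketched below. The try/except/else structure, the exact wording of the error message, and the explicit close() call are assumptions: the first two are drawn from the staged suggestions for this assignment rather than from the pseudocode itself. Treat it as one possible reading, not the required solution.

```python
def word_frequencies_create(filename):
    """Return a dictionary mapping each lowercase word in the file to its count."""
    try:
        file = open(filename, 'r')
    except OSError:
        # Assumed message text; the function then returns None implicitly
        print('Cannot open', filename)
    else:
        word_frequencies = {}
        for line in file:
            line = line.lower()          # make the line lowercase
            words = line.split()         # turn the line into a list
            for word in words:
                if word not in word_frequencies:
                    word_frequencies[word] = 1
                else:
                    word_frequencies[word] += 1
        file.close()
        return word_frequencies
```

When the file cannot be opened, this version prints the message and returns None, so a caller should only pass the result on for printing after a successful call.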
word_frequency_print
This function has the following header:
    def word_frequency_print(frequencies):
This function must print the words in alphabetical order along with the number of times each word appears in the file.
Test Code
The script must contain the following test code at the bottom of the file:
    word_freq = word_frequencies_create('xxxxxxx')
    word_freq = word_frequencies_create('gettysburg.txt')
    word_frequency_print(word_freq)
Suggestions
Write this script in stages, testing your script at each step.
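A first stage could be as small as the skeleton sketched here: a hashbang line, both function headers with pass as their bodies, and the required test code. The hashbang path shown is an assumption, since the Script Requirements page is not reproduced in this document. Running the skeleton should print nothing.

```python
#! /usr/bin/python3
# Assumed hashbang path; use the one the Script Requirements page specifies

def word_frequencies_create(filename):
    pass    # placeholder body, returns None

def word_frequency_print(frequencies):
    pass    # placeholder body, prints nothing

word_freq = word_frequencies_create('xxxxxxx')
word_freq = word_frequencies_create('gettysburg.txt')
word_frequency_print(word_freq)
```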
- Create a file with the hashbang line, the test code and each function header.
The body of each function should be just the Python statement pass.
The pass statement does nothing, but it keeps the code from giving syntax errors.
Run the code.
You should see nothing.
If you have errors, fix them.
- Remove the pass statement in word_frequencies_create.
Create a try/except statement.
Add a line that opens a file for reading using the filename parameter.
Write code to print an error message if the file cannot be opened.
Run the code.
You should see
    Cannot open xxxxxxx
If you see something else, fix it.
- Add an else clause to the try/except statement.
Create an empty dictionary called word_frequencies.
Create a for loop that prints each line in the file.
Run the code and fix any errors.
- You need to convert all words into lowercase.
To do this, insert an assignment statement before the print statement that makes the line lowercase.
Run the code and fix any errors.
- Remove the print statement.
Use the split string method to create a list of all the words in the line and assign this list to a variable.
Print this list.
Run the code and fix any errors.
- Remove the previous print statement.
Add a new for loop inside the old for loop that prints each word in the list.
Run the code and fix any errors.
- Remove the previous print statement.
Write an if statement that checks whether the word is NOT already in the dictionary.
If the word is not already in the dictionary, create an entry in the dictionary with the word as the key and a value of 1.
Add a line to print the dictionary.
This line must be outside both for loops but still inside the else clause of the try/except statement.
Make sure you get the indentation right.
Run the code.
You should see many words, but all the values should be 1.
Fix any errors you find.
- Add an else clause to the if statement inside the second for loop.
Inside the else clause, add 1 to the count for the current word in the dictionary.
Run the code.
You should see words with many different values.
Fix any errors you find.
- Remove the print statement from the end of the function.
Replace it with a statement that returns the word_frequencies dictionary.
Remove the pass statement from the word_frequency_print code block.
Replace it with a statement that prints the parameter frequencies.
Run the code.
The output should be similar to what you saw at the last step.
Fix any errors you find.
- Replace the print statement with a for loop that prints the word and word count value for each word in the dictionary.
Run the code and fix any errors.
- Change the for loop so it prints the words in alphabetical order.
Output
When you run the program, the output should look something like this:
    Cannot open file xxxxxxx
    a 7
    above 1
    add 1
    advanced 1
    ago 1
    all 1
    altogether 1
    and 6
    any 1
    ...
    war 2
    we 10
    what 2
    whether 1
    which 2
    who 3
    will 1
    work 1
    world 1
    years 1
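Output in this shape could come from a loop over the sorted keys of the dictionary. The sketch below is one possible version of word_frequency_print, run on a small hand-made dictionary rather than the real file; the sample words and counts are invented for illustration, and the single space between word and count is an assumption about the expected layout.

```python
def word_frequency_print(frequencies):
    # sorted() on a dictionary returns a list of its keys in ascending order
    for word in sorted(frequencies):
        print(word, frequencies[word])

# Hypothetical sample data, not the counts from gettysburg.txt
sample = {'we': 3, 'a': 2, 'nation': 1}
word_frequency_print(sample)
```

Because every key was lowercased when the dictionary was built, plain string sorting is enough to put the words in alphabetical order.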