IT 117: Intermediate Scripting
Homework 5

Due Sunday, October 8th at 11:59 PM


There is one deliverable for this assignment. It must be in an hw5 directory, which you must create inside an hw directory inside your it117 directory.

Make sure the script obeys all the rules in the Script Requirements page.

To test this script you must copy the files gettysburg.txt and gettysburg_hay.txt from /home/ghoffman/course_files/it117_files into your hw5 directory.


The purpose of this exercise is to create a script that compares two versions of the Gettysburg address.

Create a script that contains the following three functions.


This function must have the following header:
def word_set_create(filename):
This function reads in a text file and returns a set of all words found in the file.

This function must ignore case, so "We" and "we" should count as the same word.
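The case rule can be illustrated on a plain string before any file handling is involved. This is only a sketch: the names text and words are made up for the example and are not part of the assignment. Note that split only breaks on whitespace, so any punctuation would stay attached to its word.

```python
# Illustrative only: build a case-insensitive set of words from a string.
text = "We are met on a great battlefield and we are here"

words = set()
for word in text.lower().split():  # lower() folds case, split() breaks on whitespace
    words.add(word)

# "We" and "we" collapse into the single entry "we"
print("we" in words)
print("We" in words)
```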


This function has the following header:
def set_difference(set_1, set_2):
This function returns the set of all words found in the first set that are not found in the second.
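As a small illustration of what "found in the first set but not in the second" means, Python sets support a difference operation directly. The sets first and second below are invented sample data, not taken from the Gettysburg files.

```python
# Illustrative only: set difference with the - operator and the difference method.
first = {"a", "great", "civil", "war"}
second = {"a", "great", "battlefield"}

only_in_first = first - second             # operator form
only_in_second = second.difference(first)  # method form, same idea

# Order matters: the two differences are not the same set
print(sorted(only_in_first))
print(sorted(only_in_second))
```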


This function has the following header
def word_set_print(word_set):
This function prints the words in a set in alphabetical order.
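Sets have no defined order, so alphabetical printing needs a sorting step. One possible approach, shown here on made-up sample data, is the built-in sorted function, which returns a new sorted list and leaves the set unchanged.

```python
# Illustrative only: print the elements of a set in alphabetical order.
sample_set = {"nation", "conceived", "liberty", "dedicated"}

ordered = sorted(sample_set)  # sorted() returns a list; the set itself is not modified
for word in ordered:
    print(word)
```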

Test Code

The script must contain the following test code at the bottom of the file
word_set = word_set_create('xxxxxxx')
word_set_1 = word_set_create('gettysburg.txt')
word_set_2 = word_set_create('gettysburg_hay.txt')
set_1_set_2_difference = set_difference(word_set_1, word_set_2)
set_2_set_1_difference = set_difference(word_set_2, word_set_1)

print('Words in the first text not found in the second')
print('Words in the second text not found in the first')


Write this script in stages, testing your script at each step:
  1. Create a file with the hashbang line, the test code and each function. The body of each function should be the Python statement pass, which does nothing but prevents syntax errors.
  2. Remove the pass statement in word_set_create and replace it with code that opens a file for reading. If the file cannot be opened for reading, the code should print an error message and not do any further processing of the file.
  3. Create an empty set called word_set. After this statement write a for loop that prints each line in the file.
  4. Use the lower string method to change all capital letters in the string to lower case.
  5. Use the split string method to create a list of all the words in the file. Print this list.
  6. Remove the previous print statement. In its place write a for loop that prints each word in the list.
  7. Replace the print statement with a statement that adds the word to word_set. After the for loop, print word_set.
  8. Remove the print statement from the end of the function. Replace it with a statement that returns word_set.
  9. Remove the pass statement from set_difference and replace it with a statement that returns the difference between set_1 and set_2.
  10. Replace the pass statement in word_set_print with a statement that prints the parameter word_set.
  11. Replace the print statement in word_set_print with a for loop that prints each element of the set.
  12. Change the for loop so it prints the words in alphabetical order.
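Steps 2 through 8 might end up looking something like the sketch below. It is one possible shape, not the required solution, and it rests on two assumptions that the assignment does not state: the open failure is caught as OSError, and an empty set is returned when the file cannot be opened.

```python
def word_set_create(filename):
    """Sketch: return a set of the lower-cased words found in filename."""
    word_set = set()
    try:
        file = open(filename, 'r')
    except OSError:
        # Assumption: report the problem and hand back an empty set
        print('Cannot open file', filename)
        return word_set
    for line in file:
        # lower() folds case; split() breaks the line on whitespace
        for word in line.lower().split():
            word_set.add(word)
    file.close()
    return word_set
```

Returning an empty set on failure lets later calls such as set_difference still run, but returning None would be an equally defensible choice.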


Your output should look something like this:
Cannot open file xxxxxxx

Words in the first text not found in the second

Words in the second text not found in the first