IT 117: Introduction to Scripting
Homework 5

Due

Sunday, February 25th at 11:59 PM

Deliverables

There is one deliverable for this assignment: hw5.py

Make sure the script obeys all the rules in Homework Script Rules

Specification

The script must have 5 functions

open_file_read

This function must have the following header:

def open_file_read(filename):

It must try to create a file object for reading on the file whose name is given by the parameter filename.

If it is successful in creating the file object, it should return the object.

If it cannot create the object, it should print an error message and return None.
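
One way this behavior might look is sketched below. The choice to catch OSError and the wording of the error message are assumptions, not requirements of the assignment; your hw4.py version may differ.

```python
def open_file_read(filename):
    """Return a file object opened for reading, or None if it cannot be opened."""
    try:
        return open(filename, "r")
    except OSError as error:
        # Assumption: OSError is caught because FileNotFoundError and
        # PermissionError are both subclasses of it.
        # The message text is only an example.
        print("Cannot open", filename, "for reading:", error)
        return None
```

Because the function returns None on failure, code that calls it can check the result before trying to read from it.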

count_words

This function must have the following header:

def count_words(file):

It must read in a file and count the words in the file.

It must return the word count.

The function should use the following algorithm.

initialize word_count to 0
for line in file:
   create the list words by calling split on line
   add the length of the list words to word_count
return word_count
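
The algorithm above could be written in Python roughly as follows. This is a sketch, not the only acceptable answer. The io.StringIO object is an assumption used here so the example runs without a file on disk; it can be looped over line by line just like a file object.

```python
import io

def count_words(file):
    """Count the whitespace-separated words in an open file."""
    word_count = 0
    for line in file:
        words = line.split()      # split on any run of whitespace
        word_count += len(words)
    return word_count

# Stand-in for a real file: two lines holding 4 and 2 words
sample = io.StringIO("Four score and seven\nyears ago\n")
print(count_words(sample))  # prints 6
```

Calling split with no argument ignores the newline at the end of each line, so no stripping is needed first.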

word_set_create

This function must have the following header:

def word_set_create(file):

It must read in a file and create a set of the words in that file.

All words added to the set must be lowercase.

It must return that set.

The function should use the following algorithm.

create the empty set word_set
for line in file:
   create the list words by running split on line
   for word in words:
      make word lowercase
      add word to word_set
return word_set
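
A minimal sketch of this algorithm in Python is shown below. As an assumption for the sake of a runnable example, io.StringIO stands in for a real file object.

```python
import io

def word_set_create(file):
    """Return the set of lowercase words found in an open file."""
    word_set = set()
    for line in file:
        words = line.split()
        for word in words:
            word_set.add(word.lower())   # adding a duplicate has no effect
    return word_set

# "The" and "the" become the same entry once lowercased
sample = io.StringIO("The people\nthe People\n")
print(sorted(word_set_create(sample)))  # prints ['people', 'the']
```

Note that split does not remove punctuation, so a word followed by a comma is stored as a different entry from the same word without one.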

different_words

This function must have the following header:

def different_words(set_1, set_2):

It must return a set of all the words that are in one set, but not the other.
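
One possible approach, offered as a sketch rather than the required solution, is the symmetric_difference method of Python sets. The small sets in the example are made up for illustration.

```python
def different_words(set_1, set_2):
    """Return the words that appear in exactly one of the two sets."""
    return set_1.symmetric_difference(set_2)

# "a" is only in the first set and "d" is only in the second
print(sorted(different_words({"a", "b", "c"}, {"b", "c", "d"})))  # prints ['a', 'd']
```

The expression set_1 ^ set_2 gives the same result.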

common_words

This function must have the following header:

def common_words(set_1, set_2):

It must return a set of all the words that are found in both sets.
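
A sketch of one way to do this uses the intersection method of Python sets. The small sets in the example are made up for illustration.

```python
def common_words(set_1, set_2):
    """Return the words that appear in both sets."""
    return set_1.intersection(set_2)

# "b" and "c" are the only words in both sets
print(sorted(common_words({"a", "b", "c"}, {"b", "c", "d"})))  # prints ['b', 'c']
```

The expression set_1 & set_2 gives the same result.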

Script for this assignment

Open a text editor and create the file hw5.py.

You can use the editor built into IDLE or a program like Sublime.

Test Code

Your hw5.py file must contain the following test code at the bottom of the file:

filename_1 = "gettysburg.txt"
filename_2 = "gettysburg_hay.txt"
file_1     = open_file_read(filename_1)
file_2     = open_file_read(filename_2)
count_1 = count_words(file_1)
print(count_1)
count_2 = count_words(file_2)
print(count_2)
print()
file_1     = open_file_read(filename_1)
file_2     = open_file_read(filename_2)
word_set_1 = word_set_create(file_1)
word_set_2 = word_set_create(file_2)
print("Filename            Words  Unique Words")
print("---------------------------------------")
print(filename_1 + "      " + str(count_1) + "    " + str(len(word_set_1)))
print(filename_2 + "  " + str(count_2) + "    " + str(len(word_set_2)))
print()
different_word_set = different_words(word_set_1, word_set_2)
print("The two files have", len(different_word_set), "words in one file, but not the other" )
for word in sorted(different_word_set):
     print(word)
print()
common_words_set = common_words(word_set_1, word_set_2)
print("The two files have", len(common_words_set), "words in common")

For this test code to work, you must copy gettysburg.txt and gettysburg_hay.txt to your machine.

To do this, use FileZilla to copy the files from /home/ghoffman/course_files/it117_files into the directory that holds your hw5.py script.
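
The test code calls open_file_read a second time before calling word_set_create. The reason is that a file object can only be looped over once; after count_words has read every line, a second loop over the same object finds nothing left. The sketch below demonstrates this, using io.StringIO as an assumed stand-in for a real file.

```python
import io

# Stand-in for an open file containing two lines
file = io.StringIO("one two\nthree\n")

first_pass = [line for line in file]    # reads every line
second_pass = [line for line in file]   # nothing left to read

print(len(first_pass))   # prints 2
print(len(second_pass))  # prints 0
```

Creating a new file object, as the test code does, starts reading from the beginning of the file again.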

Suggestions

Write this program in a step-by-step fashion using the technique of incremental development.

In other words, write a bit of code, test it, make whatever changes you need to get it working, and go on to the next step.

  1. Create the file hw5.py.
    Enter the headers for each of the required functions.
    Under each header write the Python statement pass.
    Run the script.
    Fix any errors you find.
  2. Replace the pass statement in open_file_read with the body of the code from your hw4.py script.
    Copy the test code to the bottom of the script.
    Write a # as the first character in each line, except the first 4 lines.
    By doing this you have "commented out" all but the first 4 lines of the test code.
    This means only the first 4 lines of the test code will run.
    Run the script.
    Fix any errors you find.
  3. Replace the pass statement in count_words with a statement that assigns 0 to the variable count.
    Now write a for loop that loops over the lines in the file.
    Inside the loop print each line.
    Uncomment the 5 lines in the test code starting with
    count_1 = count_words(file_1)
    Run the script.
    You should see the lines of each file printed, each followed by None, because count_words does not return a value yet.
    Fix any errors you find.
  4. Remove the print statement.
    Create the list word_list by running split on each line.
    Print word_list.
    Run the script.
    Fix any errors you find.
  5. Remove the print statement.
    Replace it with a statement that increments count by the length of the list word_list.
    Outside the for loop return count.
    Run the script.
    You should see
    272
    268
    Fix any errors you find.
  6. Remove the pass statement from word_set_create.
    Replace it with a statement that creates the empty set word_set.
    Now write a for loop that loops over the lines in the file.
    Inside the loop print each line.
    Uncomment the 4 lines in the test code beginning with
    file_1     = open_file_read(filename_1)
    Run the script.
    Fix any errors you find.
  7. Remove the print statement.
    Create the list word_list by running split on each line.
    Print word_list.
    Run the script.
    Fix any errors you find.
  8. Remove the print statement.
    In its place write another for loop that loops over the words in word_list.
    word would be a good name for the loop variable.
    Inside this second loop, print each word in word_list.
    Run the script.
    Fix any errors you find.
  9. Remove the print statement.
    Replace it with a statement that adds the lowercase value of each word to word_set.
    Outside both loops, return word_set.
    Uncomment the 5 lines in the test code beginning with
    print("Filename            Words  Unique Words")
    Run the script.
    You should see
    Filename            Words  Unique Words
    ---------------------------------------
    gettysburg.txt      272    132
    gettysburg_hay.txt  268    135
    Fix any errors you find.
  10. Remove the pass statement from different_words.
    There is a set method which finds all the elements that are in set_1 but not set_2 as well as all elements that are in set_2 but not set_1.
    Write a return statement which calls this method on the parameters set_1 and set_2.
    Uncomment the 5 lines in the test code beginning with
    different_word_set = different_words(word_set_1, word_set_2)
    Run the script.
    You should see
    The two files have 9 words in one file, but not the other
    advanced
    battle
    battlefield
    carried
    field
    fought
    god
    under
    upon
    Fix any errors you find.
  11. Remove the pass statement from common_words.
    There is a set method that will return all the elements found in both sets.
    Write a return statement which calls this method on the parameters set_1 and set_2.
    Uncomment the last two lines in the test code.
    Run the script.
    You should see
    The two files have 132 words in common
    Fix any errors you find.
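
As an illustration of step 1 in the list above, the starting skeleton might look like the sketch below. Each function has only its header and a pass statement, so the script runs without errors even though nothing is implemented yet.

```python
def open_file_read(filename):
    pass

def count_words(file):
    pass

def word_set_create(file):
    pass

def different_words(set_1, set_2):
    pass

def common_words(set_1, set_2):
    pass
```

A function whose body is only pass returns None when called, so any test code that prints its result will print None until the body is filled in.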

Testing on Your Machine

Copy the file to Unix

Make the File Executable

Copyright © 2021 Glenn Hoffman. All rights reserved. May not be reproduced without permission.