IT 116: Introduction to Scripting

Class 20

Microphone

Homework 9

I have posted homework 9 here.

It is due this coming Sunday at 11:59 PM.

You must make the script for this assignment executable or you will lose points.

Quiz 7

You will find the answers to Quiz 7 here.

Let's review them.

Questions

Are there any questions before I begin?

Tips and Examples

Reading Text Files With Python Scripts

Most of the scripts you will write from now on will read files
This means you will have to create file objects ...
by using the built-in function open
The first argument to open is the pathname of the file
The pathname consists of the path ...
and the name of the file
When the file is in the same directory as the script ...
the pathname is just the name of the file
To keep things simple, all assignment scripts will use files ...
in the same directory as the script
This means you must copy the text file to your script directory
You will do this using FileZilla
But the copying will go from pe15 ...
to the directory that contains your script
In the right hand window of FileZilla go to /home/ghoffman/course_files/it116_files/
Drag the file from pe15 on the right ...
to the directory that holds your script ...
on the left

Making a Script Executable With FileZilla

From now on, every script you write must be made executable ...
on pe15.cs.umb.edu
You can do this using FileZilla
Right-click on the file in FileZilla and select "Permissions" from the menu
Enter 755 in the box provided

Hashbang Problem with Windows Text Files

If you use Windows you may come across the following ...

when running a script on Unix

./ex20.py /usr/bin/python3^M bad interpreter: No such file or directory

The problem is that Unix and Windows have different ways ...
of marking the end of a line
Unix uses the single newline character to mark the end of a line
In Python a newline is written \n
Windows uses two characters to indicate the end of a line
- A carriage return - \r
- A newline - \n
It is the carriage return character that Unix prints as ^M above
Unix reads this extra carriage return character as part of the filename ...
for the Python interpreter
There is no file with this filename
So Unix gives you the error above
There is a simple fix for this problem
Unix provides the utility dos2unix
It converts Windows text files into Unix text files
The command takes one argument
The file you are trying to convert
```
dos2unix FILENAME
```
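To see why the stray carriage return matters, here is a small Python sketch
The two strings are made-up stand-ins for the first line of a script saved on Windows and on Unix
They are not read from a real file, and the slicing is only an approximation of what Unix does with the hashbang line

```python
# First line of a script as saved by a Windows editor and by a Unix editor
windows_line = "#! /usr/bin/python3\r\n"
unix_line    = "#! /usr/bin/python3\n"

# Only the newline ends the line, so the carriage return stays behind
windows_path = windows_line.rstrip("\n")[2:].strip(" ")
unix_path    = unix_line.rstrip("\n")[2:].strip(" ")

print(repr(windows_path))   # the \r is still part of the pathname
print(repr(unix_path))

# Removing the \r is, in effect, what dos2unix does for every line of a file
fixed_path = windows_path.replace("\r", "")
print(fixed_path == unix_path)
```

Using repr makes the otherwise invisible \r show up in the output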

Review

Running a Script Using `python3`

To write a program in the computer language C you create a text file
This file contains C statements and is called the source code
This is a text file ...
not a file with the binary instructions ...
that the CPU understands
To create a working program this file must be run through a compiler ...
which translates the source file into machine language
Machine language is binary code the CPU understands
Our Python scripts are text files ...
written in the Python programming language
The computer does not understand this text ...
so we cannot run Python scripts directly
The Python interpreter understands Python scripts ...
and translates each statement into machine language
The Python interpreter was written in a language like C ...
and compiled into an executable machine language file
When we type python3 at the command line we are running this binary file
To run the script we give the interpreter the filename of the Python script
That means to run a Python script we need two things in RAM
- The text of the script
- The binary code for the Python interpreter
The picture in memory looks like this

python3 then executes all the Python statements in the file

So if I have the Python script hello_1.py

# prints a friendly message

print("Hello world!")

I can run it on the Unix command line like this
```
$ python3 hello_1.py 
Hello world!
```

Running a Script Without `python3`

A script that is executable can be run without typing python3
To make a file executable we need to do two things
- Give it read and execute permissions
- Add the hashbang line
We can set the permission using the Unix chmod command
```
chmod  755  FILENAME
```
Or we can change the permission in FileZilla
The hashbang line must be the very first line in the script
The first two characters must be #!
This is followed by the absolute pathname of the Python interpreter
The hashbang line on our Unix machines should be
```
#! /usr/bin/python3
```

Why Make a Script Executable?

We make scripts executable so we can run them from any directory
You do this by putting the script in a directory listed in your Unix PATH variable
When you enter the name of an executable script on the command line ...
Unix searches the directories listed in PATH
If it finds your script in one of these directories, it will run it
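If you are curious which directories Unix will search, the following sketch prints them
It assumes the os module from the Python standard library, which these notes have not introduced
The name path_dirs is made up for this example

```python
import os

# PATH is one long string; on Unix the directories are separated by ":"
# os.pathsep holds that separator for whatever system the script runs on
path_dirs = os.environ.get("PATH", "").split(os.pathsep)

for directory in path_dirs:
    print(directory)
```

A script placed in any directory this prints, and made executable, can be run by name from anywhere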

Records

For any sporting event involving teams we have at least 5 pieces of information
- Date
- Home team
- Opposing team
- Home team score
- Opposing team score
The collection of data about a specific event is called a record
The individual pieces of data within a record are called fields
A text file containing data usually has one record per line

Here is a file with two fields

- Date
- Temperature

2017-06-01 67
2017-06-02 71
2017-06-03 69
...

Reading Records from a File

The characters used to separate one field from the next are called delimiters
The two most common delimiters are space and comma
We can break up a line into fields using the string method split ...

and multiple assignment

line = "2017-06-01 67"
date, temp = line.split()

If the delimiter is a space split needs no argument
But if the delimiter is any other character ...

you must give split that character as an argument

>>> line = "2017-06-01,67"
>>> line.split(",")
['2017-06-01', '67']
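The two ideas can be combined: split on a comma and use multiple assignment
This is only a sketch, and the line is a made-up value that includes the newline a real file line would end with

```python
line = "2017-06-01,67\n"               # a line as it would come from a file

date, temp = line.strip().split(",")   # strip removes the trailing newline
temp = int(temp)                       # split always returns strings

print(date)
print(temp + 1)                        # arithmetic works after conversion
```

Calling strip first matters here because split(",") only breaks the line at commas and leaves the newline attached to the last field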

Finding the Average From a File of Records

We can use split to compute the average temperature ...

from a file that looks like this

2017-06-01 67
2017-06-02 71
2017-06-03 69
2017-06-04 88
2017-06-05 74
...

Here is the relevant part of the code

count = 0
total = 0
for line in file:
    count += 1
    date, temp = line.split()
    temp   = int(temp)
    total += temp
average = round(total/count)

Notice that I had to convert temp from a string into an integer
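Here is a version you can run without a data file
It is a sketch that assumes a list of strings can stand in for the file object, since a for loop hands you one line at a time in both cases
The list name lines is made up for this example

```python
# A list of strings stands in for the file object
lines = [
    "2017-06-01 67\n",
    "2017-06-02 71\n",
    "2017-06-03 69\n",
    "2017-06-04 88\n",
    "2017-06-05 74\n",
]

count = 0
total = 0
for line in lines:
    count += 1
    date, temp = line.split()
    total += int(temp)          # convert the string before adding

average = round(total / count)
print("Average:", average)
```

The average is computed after the loop, so that statement must not be indented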

Looping Through a File More Than Once

There are some things that you cannot do by looping through a file once
To count the number of days with above average temperature ...
you have to loop through the file once to get the average ...
then loop through it again to count the days above that average
When we loop through a text file we can only go in one direction
Once a for loop has reached the end of a file...
it will not go back to the beginning on its own
The simplest fix is to create a new file object
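The sketch below shows the two passes
So that it can run anywhere, it first writes a small sample file into a temporary directory
The tempfile and os modules, the pathname, and the temperatures are assumptions made for this example and are not part of the assignment

```python
import os
import tempfile

# Write a small sample file (made-up temperatures)
folder   = tempfile.mkdtemp()
pathname = os.path.join(folder, "temps.txt")
file = open(pathname, "w")
file.write("2017-06-01 67\n2017-06-02 71\n2017-06-03 69\n2017-06-04 88\n")
file.close()

# First pass: compute the average
file  = open(pathname, "r")
count = 0
total = 0
for line in file:
    date, temp = line.split()
    count += 1
    total += int(temp)
file.close()
average = total / count

# Second pass: a new file object starts again at the first line
file = open(pathname, "r")
days_above = 0
for line in file:
    date, temp = line.split()
    if int(temp) > average:
        days_above += 1
file.close()

print("Average:", average)
print("Days above average:", days_above)
```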

Attendance

New Material

More Data Processing in a Loop

In the last class we found the maximum and minimum temperature ...

from a file that looked like this

2017-06-01 67
2017-06-02 71
2017-06-03 69
...

But we can do more while we are looping through the file
We can also get the dates when the maximum and minimum occurred

We use an approach similar to the one we used for the maximum

set max to the lowest possible value
for each temperature
    if the temperature is greater than max
        set max to this temperature

To get both the maximum temperature and the date on which it happened
we will need two variables
```
max_date = ""
min_date = ""
```
Why did we initialize the variables with the empty string?
Because we do not have to compare the dates with any other values
We change these values ...

when either the max or min value is replaced

if temp > max:
    max      = temp
    max_date = date
if temp < min:
    min      = temp
    min_date = date

Here is the code with the new lines added

#! /usr/bin/python3

file = open("temps.txt", "r")

max      = -100
min      = 200
max_date = ""
min_date = ""
for line in file:
    date, temp = line.split()
    temp  = int(temp)
    if temp > max:
        max      = temp
        max_date = date
    if temp < min:
        min      = temp
        min_date = date

print("Maximum:", max_date, max)
print("Minimum:", min_date, min)

When we run it we get

$ ./temps_max_min_dates.py
Maximum: 2017-06-24 89
Minimum: 2017-06-26 66

Errors

It is easy to get an error when writing scripts
But to fix errors it is important to recognize the different types
There are three classes of errors
- Syntax errors
- Logic errors
- Runtime errors

Syntax Errors

Syntax errors occur when you write a statement ...

that violates the rules of the programming language

>>> for = 5
  File "<stdin>", line 1
    for = 5
        ^
SyntaxError: invalid syntax

Python does not let you use a keyword as a variable name
So this is not a legal Python statement

Misspelling the name of a function or variable is a common error

>>> name = "Glenn"
>>> print(nme)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'nme' is not defined

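Python reports this as a NameError when the statement runs, not as a SyntaxError
But like a syntax error, it is fixed by correcting what you typed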

Another common syntax error is leaving out a )

$ cat get_number_1.py 
#! /usr/bin/python3

number = int(input("Number: ")

$ ./get_number_1.py 
  File "./get_number_1.py", line 6
    
                                  ^
SyntaxError: unexpected EOF while parsing

Indentation problems are also syntax errors

>>> for i in range(5):
...     num = i * 2
...   print(num)
  File "<stdin>", line 3
    print(num)
             ^
IndentationError: unindent does not match any outer indentation level

Syntax errors are the most common errors
But they are the easiest to fix
All you have to do is change some characters
As you write more code ...
you will find it easier to spot and fix syntax errors

Logic Errors

A logic error occurs when the code does not give the correct results

Here is an example

average = num_1 + num_2 / 2
print ("Average:", average)

Can you spot the problem here?
Division, /, has higher precedence than addition, +
So 2 will divide not the sum of num_1 and num_2 ...
but num_2 alone
The average should be calculated like this
```
average = (num_1 + num_2) / 2
```
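Trying both versions with actual numbers makes the difference visible
The values 10 and 20 are chosen only for illustration

```python
num_1 = 10
num_2 = 20

wrong = num_1 + num_2 / 2      # computed as 10 + (20 / 2)
right = (num_1 + num_2) / 2    # computed as (10 + 20) / 2

print("Without parentheses:", wrong)
print("With parentheses:   ", right)
```

Both statements run without any error message, which is exactly what makes a logic error hard to notice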
When you have a syntax error the interpreter will tell you
The error message it gives specifies the kind of error ...
and prints the line where it occurs
But you get no warning with a logic error
The code runs with no sign of a problem
You will not know there is a problem ...
unless you check the results of running the code
This is why it is always important to run tests on the scripts you write
Even if you know there is an error ...
it can be hard to find
The problem could be anywhere in the code
It is hard to spot your own errors
That is why smart coders ask others to proofread what they write

Runtime Errors

Runtime errors occur only when the program is run
They occur when a specific value makes a statement fail
If the program is run again with a different value ...
no runtime error would occur
A common runtime error is to attempt to open a file ...

that does not exist

$ ./file_open.py 
Filename: xxxxxx
Traceback (most recent call last):
  File "./file_open.py", line 6, in <module>
    file     = open(filename, "r")
FileNotFoundError: [Errno 2] No such file or directory: 'xxxxxx'

If you try to open a file for which you have no permission ...

you will get another type of runtime error

$ ./file_open.py 
Filename: unreadable.txt
Traceback (most recent call last):
  File "./file_open.py", line 6, in <module>
    file     = open(filename, "r")
PermissionError: [Errno 13] Permission denied: 'unreadable.txt'

Giving a conversion function a value it can't handle ...

will result in another runtime error

$ ./int_request.py 
Integer: five
Traceback (most recent call last):
  File "./int_request.py", line 5, in <module>
    number = int(input("Integer: "))
ValueError: invalid literal for int() with base 10: 'five'

Or you can give a function an argument with the wrong type

>>> round("five")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: type str doesn't define __round__ method

Or divide by zero

$ ./divide_two_numbers.py 
Numerator: 5
Denominator: 0
Traceback (most recent call last):
  File "./divide_two_numbers.py", line 7, in <module>
    result      = numerator / denominator
ZeroDivisionError: division by zero

The problems that cause runtime errors are called exceptions

Exception Objects

Errors like the ones above are frustrating
But the error message gives you some idea what went wrong
Decades ago my CS classes required me to write C programs
In C a runtime error would produce the following output
```
Segmentation fault: Core dumped
```
Thanks, that was really helpful
Many computer languages have a better way of dealing with this problem
They have a mechanism built into the language ...
that deals with runtime errors
It gathers information about what happened and where ...
and uses that information to create an error message
The error message describes the problem ...
and where it occurred
It does this by creating an exception object
When this happens we say that the interpreter has raised an exception
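You can look at an exception object yourself
The sketch below assumes the form except ValueError as error, which is standard Python but goes a little beyond what these notes cover
The variable names kind and message are made up for this example

```python
try:
    number = int("five")            # this conversion raises an exception
except ValueError as error:         # error refers to the exception object
    kind    = type(error).__name__  # the name of the exception's type
    message = str(error)            # the text that describes the problem
    print(kind)
    print(message)
```

The type name and message are the same pieces of information you see on the last line of a traceback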

Catching Exceptions

But the exception mechanism can do more than create helpful error messages
Python allows you to look for exceptions ...
while your script is running
Your code can then do something to prevent the script from aborting ...
and sending you back to the command line
You do this with a try/except statement ...

which has the following form

try:
    STATEMENT
    STATEMENT
    ...
except:
    STATEMENT
    STATEMENT
    ...

The statements in the try code block execute normally ...
unless a runtime error occurs
If that happens, the interpreter stops running the statements in the try block ...
and jumps to the except code block ...
running the statements in this new code block
When your script does this, it is said to catch an exception
Whenever you use open to create a file object ...
you should put it inside a try/except statement

Here is an example

$ cat open_file.py
#! /usr/bin/python3

# demonstrates using a try/except statement 
# to catch exceptions encountered while
# trying to open a file

filename = input("Filename: ")
try:
    file = open(filename, "r")
    for line in file:
        print(line.rstrip())
except:
    print("Could not open file", filename)

$ ./open_file.py
Filename: xxxx
Could not open file xxxx

Notice that the interpreter never ran the for loop
When the exception occurred ...
it stopped running the try code ...
and jumped to the except code block
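To make the jump visible, this sketch records each step in a list
It assumes no file named no_such_file_xxxx.txt exists in the current directory
It names FileNotFoundError, the exception shown in the traceback earlier, instead of using a bare except

```python
steps = []

try:
    steps.append("before open")
    file = open("no_such_file_xxxx.txt", "r")   # raises the exception
    steps.append("after open")                  # skipped when open fails
except FileNotFoundError:
    steps.append("except block")

print(steps)
```

The list never receives "after open" because the interpreter leaves the try block as soon as open fails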

Exceptions and Data Validation

We can use the input function to ask the user for a number ...
and then try to convert it into an integer

But if the user enters a decimal we will get an exception

>>> number = int(input("Integer: "))
Integer: 5.0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '5.0'

Whenever we get data from the user we need to check it
This is called data validation ...
and it is best done with a function that uses a while loop

A few weeks ago I showed you an algorithm to validate data

ask the user for a value
while the value is not valid:
    print an error message
    ask the user for a value
return the value

We need a different algorithm here
If I ask the user for a number ...
and then try to convert it into an integer or float ...
I could get a runtime error
The conversion function must be run inside a try/except statement
And the try/except statement needs to be inside the while loop
But what boolean expression should the while loop use?
One way to deal with this is to set a flag to False
The while loop will keep running as long as the flag is False
If the conversion does not cause an exception ...
the flag is set to True
This will cause the loop to exit ...
and we can return the value

Here is the algorithm

set a flag to false
while the flag is not true
    ask the user for a value
    try:
        convert the value to an integer
        set the flag to true
    except:
        print an error message
return the value

Here is a function that implements this algorithm

def get_integer():
    done = False
    while not done:
        number = input("Integer: ")
        try:
            number = int(number)
            done = True
        except:
            print(number, "cannot be converted into an integer")
    return number

When we run the code we get

$ ./get_integer.py
Integer: 2.0
2.0 cannot be converted into an integer
Integer: 2
2

Improving the Validation Function

The above function works
But we can make it shorter
And shorter code is usually better code
Why?
Because every line you write could cause an error
The shorter your code ...
the less chance there is for a problem
We can use a trick to make sure the while loop runs ...
when we don't yet have a value
We can write
```
while True:
```
But how do we ever get out of the loop?
Instead of giving a flag a different value ...
we use a return statement
If the conversion does not cause a problem ...
the return statement will be run ...
and a return statement always causes the function to stop
That's how we exit the loop

Here is the refactored code

def get_integer():
    while True:
        number = input("Integer: ")
        try:
            number = int(number)
            return number
        except:
            print(number, "cannot be converted into an integer")
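The same pattern works for other conversions
As a sketch, here is a hypothetical get_float function that takes the prompt as a parameter
The function name, the parameter, and the choice to catch only ValueError are assumptions made for this example, not part of the notes

```python
def get_float(prompt):
    # keep asking until the text can be converted
    while True:
        text = input(prompt)
        try:
            return float(text)      # leaves the loop and the function
        except ValueError:
            print(text, "cannot be converted into a number")
```

You would call it with something like get_float("Temperature: ")
Returning directly from the try block means no separate variable is needed to hold the converted value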

Being a Good Citizen