Class 9

- Sets in Mathematics
- Set Membership
- Subsets and Supersets
- Union of Sets
- Intersection of Sets
- Difference between Sets
- Symmetric Difference between Sets
- Sets in Python
- Creating a Set in Python
- Running `set` with an Argument
- Set Literals
- Adding Elements to a Set
- Removing Elements from a Set

- The Size of a Set
- When Are Sets Equal?
- Elements in a Python Set
- `for` Loops with Sets
- Testing for Set Membership
- Union of Sets in Python
- Intersection of Sets in Python
- Difference between Sets in Python
- Symmetric Difference between Sets in Python
- Subsets and Supersets
- Disjoint
- The clear Method
- `min` and `max` with Sets
- When Sets Are Best
- Practical Example
- Sets More Efficient Than Lists

It is due this coming Sunday at 11:59 PM.

The mid-term exam will be given on Tuesday, March 20th.

This is the first Tuesday after the Spring break.

It will consist of questions like those on the quizzes along with questions asking you to write short segments of Python code.

60% of the points on this exam will consist of questions from the Ungraded Class Quizzes.

The last class before the exam, Thursday, March 8th, will be a review session.

You will only be responsible for the material in the Class Notes for that class on the exam.

The Mid-term is a closed book exam.

Let's review the answers.

- All homework assignments and Class Exercises are due on the Sunday after they are posted ...
- at 11:59 PM
- I do not have the time to score this work until the following weekend
- In the week between when something is due ...
- and when I score it ...
- I run scripts to collect homework assignments ...
- and test Class Exercises
- I try to do this three times a week ...
- on Monday, Wednesday and Friday

- If I cannot collect your homework assignment ...
- I will send you an email
- If I cannot find your Class Exercise ...
- or if there is a problem with your script ...
- I will send you an email describing the problem
- This is only done in the week between the due date ...
- and the following weekend ...
- when I score the work
- Once I score an assignment ...
- I will send no more messages about missing work
- This means I cannot send you a list of the work ...
- that you have not turned in

- Once I have scored an assignment ...
- on the weekend following the due date ...
- I cannot accept any missing work
- I simply do not have the time

- A set is an **unordered** collection ...
- of **distinct** objects
- You can put anything into a set ...
- as long as it is different ...
- from what's already in the set
- The set with nothing in it ...
- is called the empty set

- If the value x is contained in set A ...
- we say that x is a member of A
- In mathematics this is written
x ∈ A

- This is a boolean expression ...
- meaning its value can only be true or false

- If you have two sets, A and B
- and all the values inside A ...
- are also contained in B ...
- then A is a subset of B
- The situation is shown in the following diagram

- In mathematics, the statement that A is a subset of B is written
A ⊂ B

- Again this statement can only be true or false
- If one value inside A is not inside B ...
- then A is **not** a subset of B
- Another way to look at this relationship ...
- is to say that B is a superset of A
- In mathematics this relationship is written
B ⊃ A

- If we have two sets, A and B
- we can form a new set ...
- that has all the values inside A ...
- and all the values inside B
- This new set is called the union of A and B ...
- and is written
A ∪ B

- In the diagram below the union of A and B ...
- is shown in red

- There are other operations we can perform on sets ...
- to create a new set
- If we have the sets A and B ...
- we can create a new set which contains all the values inside A ...
- which are also inside B
- This new set is the intersection of A and B ...
- which is written
A ∩ B

- In the diagram below, the intersection of A and B is shown in red

- The set of all values inside set A ...
- that are **not** inside set B ...
- is the difference between A and B ...
- which is written
A - B

- In the diagram below, the difference between A and B is shown in red

- The set of all values in A which are not in B ...
- together with all the values in B that are not in A ...
- is called the symmetric difference between A and B ...
- and is written
A Δ B

- In the diagram below the symmetric difference between A and B is shown in red
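- For reference, the operations above can also be written in set-builder notation
- This is the standard textbook formulation, not something taken from the diagrams
- The last line expresses the symmetric difference using the difference and union already defined

```latex
\begin{aligned}
A \subset B &\iff \text{every } x \in A \text{ also satisfies } x \in B \\
A \cup B &= \{\, x \mid x \in A \text{ or } x \in B \,\} \\
A \cap B &= \{\, x \mid x \in A \text{ and } x \in B \,\} \\
A - B &= \{\, x \mid x \in A \text{ and } x \notin B \,\} \\
A \,\Delta\, B &= (A - B) \cup (B - A)
\end{aligned}
```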

- Sets in Python follow the mathematical definition of a set
- A set is an object ...
- that holds an **unordered** collection ...
- of **unique** items
- The items inside a set can be of **any** data type ...
- as long as the data type is immutable

- To create a set in Python ...
- you can use the built-in `set` function
- If you run `set` with no arguments ...
- you will create an empty set
    >>> s1 = set()
    >>> s1
    set()

- Notice that when we print this set ...
- the word "set" appears in the output
- This behavior is similar to the view objects

Running `set` with an Argument

- You can also run `set` with **one** argument
- That argument must be iterable
- Broadly speaking a Python object is iterable ...
- if you can use it in a `for` loop
- This means you can use any of the following ...
- as an argument to `set`
- Lists
- Tuples
- Dictionaries
- Strings

- Let's see this in action with a list
    >>> num_list = [1,2,3]
    >>> num_set = set(num_list)
    >>> num_set
    {1, 2, 3}

- It also works with a tuple
    >>> letter_tuple = ('a', 'b', 'c')
    >>> letter_set = set(letter_tuple)
    >>> letter_set
    {'a', 'b', 'c'}

- And a string
    >>> letter_set_2 = set('bletch')
    >>> letter_set_2
    {'l', 'c', 'b', 'e', 't', 'h'}

- Notice that the letters do not appear in the same order ...
- as they do in the string "bletch"
- That's because a set is an **unordered** collection
- When we use a dictionary to create a set
    >>> numb_set_2 = set({'one': 1, 'two': 2, 'three': 3})
    >>> numb_set_2
    {'one', 'two', 'three'}

- only the keys are used
- If the argument to `set` contains duplicate values ...
- only one instance of each duplicated value ...
- is added to the set
    >>> letter_set_3 = set('Mississippi')
    >>> letter_set_3
    {'i', 'M', 's', 'p'}
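- A common use of this behavior is removing duplicates from a list
- The sketch below is not from the original notes, and the word list is made up for illustration
- Because a set is unordered, the result is sorted to get a predictable list back

```python
# Hypothetical data: a list of words that contains repeats.
words = ['foo', 'bar', 'foo', 'bletch', 'bar']

# Passing the list to set() keeps one copy of each value.
unique_words = set(words)

# Sets are unordered, so sort the result to get a predictable list.
unique_sorted = sorted(unique_words)
print(unique_sorted)   # ['bar', 'bletch', 'foo']
```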

- To create a list, you can use a list literal ...
- by placing the values, separated by commas ...
- within square brackets
    >>> list_1 = [1, 2, 3, 4, 5]
    >>> type(list_1)
    <class 'list'>

- Similarly, we can create a set using set literals ...
- by placing values, separated by commas ...
- within curly braces
    >>> nonsense = {'foo', 'bar', 'bletch'}
    >>> type(nonsense)
    <class 'set'>

- We can use empty square brackets ...
- to create an empty list
    >>> empty = []
    >>> type(empty)
    <class 'list'>

- But we cannot use empty curly braces ...
- to create an empty set ...
- because the empty curly braces ...
- denote an empty **dictionary**
    >>> empty = {}
    >>> type(empty)
    <class 'dict'>

- That is why an empty set
    empty_set = set()

- looks like this
    >>> empty_set
    set()

- Sets are mutable objects ...
- so they can be changed at any time
- There are two set methods that can be used ...
- to add elements to a set
- add()
- update()

- add adds a **single element** to a set
- So if we start with an empty set
    >>> s1 = set()
    >>> s1
    set()

- We can use add to add individual elements
    >>> s1.add(1)
    >>> s1
    {1}
    >>> s1.add('two')
    >>> s1
    {1, 'two'}
    >>> s1.add((3,3,3))
    >>> s1
    {(3, 3, 3), 1, 'two'}

- If you try to add an element to a set ...
- that already contains the value ...
- nothing will change
    >>> s1.add(1)
    >>> s1
    {(3, 3, 3), 1, 'two'}

- but it won't raise an exception
- The update method adds several elements ...
- to a set
- It takes one argument ...
- which must be iterable
    >>> s2 = set()
    >>> s2
    set()
    >>> s2.update([1, 2, 3])
    >>> s2
    {1, 2, 3}
    >>> s2.update('foo')
    >>> s2
    {1, 2, 3, 'f', 'o'}

- Notice that only one 'o' was added to the set ...
- and no exception was raised
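- The difference between add and update is easiest to see with a string
- In this small sketch (the variable names are only for illustration) ...
- add treats the string as one element, while update treats it as an iterable of characters

```python
s_add = set()
s_add.add('foo')         # the whole string becomes a single element

s_update = set()
s_update.update('foo')   # each distinct character becomes an element

print(s_add)             # {'foo'}
print(sorted(s_update))  # ['f', 'o']
```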

- To remove an element from a set ...
- use one of two methods
- discard()
- remove()

- Both methods take a single argument ...
- the value that is to be removed
    >>> numb_set
    {1, 2, 3, 4, 5}
    >>> numb_set.discard(2)
    >>> numb_set
    {1, 3, 4, 5}
    >>> numb_set.remove(4)
    >>> numb_set
    {1, 3, 5}

- The only difference between the two methods ...
- is what happens if you try to remove an element ...
- that is not in the set
- discard will say nothing
    >>> numb_set.discard(2)
    >>> numb_set
    {1, 3, 5}

- But remove will raise an exception
    >>> numb_set.remove(4)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    KeyError: 4
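- If you prefer remove but do not want the script to stop ...
- you can either test for membership first or catch the KeyError
- A minimal sketch of both approaches, using the same numb_set values as above

```python
numb_set = {1, 3, 5}

# Option 1: check membership before calling remove.
if 4 in numb_set:
    numb_set.remove(4)

# Option 2: call remove and catch the KeyError it raises
# when the value is not in the set.
try:
    numb_set.remove(4)
    removed = True
except KeyError:
    removed = False

print(removed)            # False
print(sorted(numb_set))   # [1, 3, 5]
```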

- One way to compare sets is to compare the number of their elements
- Mathematicians call the size of a set its cardinality
- The `len` function gives the size of a set
    >>> s1 = {1, 2, 3}
    >>> len(s1)
    3
    >>> s2 = {3, 2, 1}
    >>> len(s2)
    3
    >>> s3 = {'one', 'two', 'three', 'four'}
    >>> len(s3)
    4

- Both lists and sets have elements ...
- and we can run the `len` function on each to get their size
- But how do we tell two lists, or two sets, apart?
- Lists have two attributes
- Their elements
- The order of the elements

- So we can have two lists that have the same elements ...
- but are not equal
    >>> list_1 = [1, 2, 3]
    >>> list_2 = [3, 2, 1]
    >>> list_1 == list_2
    False

- But sets only have one attribute ...
- their elements
- If two sets have the same elements ...
- they are equal
    >>> s1 = {1, 2, 3}
    >>> s2 = {3, 2, 1}
    >>> s1 == s2
    True
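- One practical consequence: converting lists to sets lets you ask ...
- whether they hold the same values regardless of order
- A short sketch; note the caveat in the last lines, since the conversion also discards duplicates

```python
list_1 = [1, 2, 3]
list_2 = [3, 2, 1]

print(list_1 == list_2)             # False, the order differs
print(set(list_1) == set(list_2))   # True, same elements

# Caveat: duplicates disappear when a list becomes a set,
# so these two different lists also compare equal as sets.
print(set([1, 1, 2]) == set([1, 2]))   # True
```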

- In mathematics the elements of a set can be **anything** ...
- even other sets
- Python is not so open minded
- Only immutable values ...
- can be elements of a set
- So you can create a set of tuples
    >>> tuple_set = {(1,2), (3,4), (5,6)}
    >>> tuple_set
    {(5, 6), (1, 2), (3, 4)}

- but you cannot create a set of lists
    >>> list_set = {[1,2], [3,4], [5,6]}
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'list'

- But what if you use variables to define a set?
- In that case, Python takes the values of the variables ...
- so the set elements remain constant ...
- even if the variables are later assigned new values
    >>> a, b, c = 1, 2, 3
    >>> A = {a, b, c}
    >>> A
    {1, 2, 3}
    >>> a = 5
    >>> A
    {1, 2, 3}
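- Ordinary sets are mutable, so a set cannot contain other sets
- Python does provide frozenset, an immutable version of a set, which can be an element
- frozenset is not covered in these notes; this is only a sketch of how a "set of sets" could be built

```python
# frozenset is a built-in, immutable set type.
inner_1 = frozenset({1, 2})
inner_2 = frozenset({3, 4})

set_of_sets = {inner_1, inner_2}
print(frozenset({2, 1}) in set_of_sets)   # True, order does not matter

# Trying the same thing with ordinary (mutable) sets fails.
try:
    bad = {{1, 2}, {3, 4}}
    message = ''
except TypeError as err:
    message = str(err)

print(message)   # the exact wording depends on the Python version
```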

`for` Loops with Sets

- Sets are iterable
- This means that they can be used in a `for` loop
- The general format of a `for` loop looks like this
    for LOOP_VARIABLE in ITERABLE_OBJECT:
        STATEMENT
        ...

- With each pass through the loop ...
- a new value is taken from the iterable object ...
- and assigned to the loop variable
- Then we go back to the top ...
- and take the next value
- This continues until all values in the object ...
- are used once
    >>> s1 = {1, 2, 3, 4, 5}
    >>> for number in s1:
    ...     print(number)
    ...
    1
    2
    3
    4
    5
    >>> s2 = {'one', 'two', 'three', 'four', 'five'}
    >>> for number in s2:
    ...     print(number)
    ...
    one
    two
    five
    four
    three

- Notice that the order in which the elements appear ...
- when we define the set ...
- is not necessarily the order in which they appear ...
- in the loop
- In the case of s1 the order was the same ...
- but **not** in s2
- This is not the case with lists ...
- where order matters
    >>> list_1 = [1, 2, 3, 4, 5]
    >>> for number in list_1:
    ...     print(number)
    ...
    1
    2
    3
    4
    5
    >>> list_2 = [5, 4, 3, 2, 1]
    >>> for number in list_2:
    ...     print(number)
    ...
    5
    4
    3
    2
    1
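- If you need to loop over a set in a predictable order ...
- one option is to pass it to the built-in sorted function, which returns a sorted list
- A small sketch using the s2 values from above; here the strings come out in alphabetical order

```python
s2 = {'one', 'two', 'three', 'four', 'five'}

# sorted() accepts any iterable and returns a new sorted list,
# so the loop order no longer depends on how the set is stored.
ordered = sorted(s2)

for word in ordered:
    print(word)
# five
# four
# one
# three
# two
```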

- If you want to know whether a particular value is contained in a list ...
- you can use the `in` operator
    >>> list_1
    [5, 4, 3, 2, 1]
    >>> 7 in list_1
    False
    >>> 5 in list_1
    True

- We can use the same `in` operator ...
- to test whether a value is a member of a set
    >>> s1
    {1, 2, 3, 4, 5}
    >>> 7 in s1
    False
    >>> 8 in s1
    False
    >>> 3 in s1
    True

- To test whether a value is **not** inside a group ...
- we can use the `not in` operator
    >>> 8 not in s1
    True
    >>> 3 not in s1
    False

- You can combine two sets to create a new set ...
- using the operation union
- The union of two sets is a new set ...
- consisting of all the elements of both sets ...
- with no duplicates
- In mathematics this is written
A ∪ B

- We can form the union of two sets in Python ...
- by using the union method
    >>> A = {1, 4, 8, 12}
    >>> B = {1, 2, 6, 8}
    >>> A.union(B)
    {1, 2, 4, 6, 8, 12}

- The union operation is symmetrical
- This means that
A ∪ B

- is the same as
B ∪ A

- So it doesn't matter what set object we use ...
- when running the method
    >>> B.union(A)
    {1, 2, 4, 6, 8, 12}

- In addition to the union method ...
- of the set object ...
- there is also a union operator
- The union operator gives the same results as the union method
    >>> A | B
    {1, 2, 4, 6, 8, 12}

- The union operator is also symmetric
    >>> B | A
    {1, 2, 4, 6, 8, 12}
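- The method and the operator differ in one detail worth knowing
- The union method accepts any iterable, and more than one argument ...
- while the | operator requires both operands to be sets
- A short sketch; the set C is made up for this example

```python
A = {1, 4, 8, 12}
B = {1, 2, 6, 8}
C = {8, 20}            # hypothetical third set

# The method can combine several sets in one call.
all_three = A.union(B, C)
print(sorted(all_three))    # [1, 2, 4, 6, 8, 12, 20]

# The method also accepts a list (or any other iterable).
from_list = A.union([2, 6])
print(sorted(from_list))    # [1, 2, 4, 6, 8, 12]

# The operator only works between sets.
try:
    A | [2, 6]
    operator_ok = True
except TypeError:
    operator_ok = False
print(operator_ok)          # False

# Neither form changes the original set.
print(sorted(A))            # [1, 4, 8, 12]
```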

- The intersection of two sets ...
- is a new set consisting of all the elements ...
- that are present in both sets
- In mathematics, this is written
A ∩ B

- Sets in Python have an intersection method
    >>> A
    {8, 1, 12, 4}
    >>> B
    {8, 1, 2, 6}
    >>> A.intersection(B)
    {8, 1}

- Intersection is also symmetrical ...
- so
A ∩ B = B ∩ A

- So we can get the same results by running the intersection method ...
- on either object
    >>> B.intersection(A)
    {8, 1}

- Python also has an intersection operator, &
    >>> A & B
    {8, 1}

- which is also symmetrical
    >>> B & A
    {8, 1}
    >>> A & B == B & A
    True

- Another way to form a new set ...
- from two existing sets ...
- is to take the difference between the sets
- If we have two sets, A and B ...
- the difference between A and B ...
- is a new set consisting of all the elements in A ...
- that are not in B
- This is written
A - B

- In Python, we can use the set difference method
    >>> A
    {8, 1, 12, 4}
    >>> B
    {8, 1, 2, 6}
    >>> A.difference(B)
    {12, 4}

- Set difference is **not** a symmetric operation
A - B ≠ B - A

- So the difference method is not symmetric
    >>> B.difference(A)
    {2, 6}

- Python also has a set difference operator, -
    >>> A - B
    {12, 4}

- which is also not symmetric
    >>> A - B == B - A
    False
    >>> A - B != B - A
    True

- The symmetric difference between two sets A and B ...
- consists of all the elements of A that are not in B ...
- and all the elements of B that are not in A
- In mathematics, this is written
A Δ B

- We can take the symmetric difference between two sets in Python ...
- by using the symmetric_difference method
    >>> A
    {8, 1, 12, 4}
    >>> B
    {8, 1, 2, 6}
    >>> A.symmetric_difference(B)
    {2, 4, 6, 12}

- The symmetric difference operation is symmetric
A Δ B = B Δ A

- So we can run the method on either set object ...
- and get the same results
    >>> B.symmetric_difference(A)
    {2, 4, 6, 12}

- Python also has a symmetric difference operator, ^
    >>> A ^ B
    {2, 4, 6, 12}

- which is also symmetric
    >>> A ^ B == B ^ A
    True
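- The symmetric difference can be built from the other operations
- The sketch below checks two equivalent ways of writing it, using the same A and B as above
- The intermediate variable names are only for illustration

```python
A = {1, 4, 8, 12}
B = {1, 2, 6, 8}

sym = A ^ B

# Elements only in A, together with elements only in B.
from_differences = (A - B) | (B - A)

# Everything in either set, minus what the two sets share.
from_union = (A | B) - (A & B)

print(sorted(sym))               # [2, 4, 6, 12]
print(sym == from_differences)   # True
print(sym == from_union)         # True
```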

- If all the elements of set A ...
- are also contained in set B ...
- then A is a subset of B
- We can tell if one set is a subset of another ...
- by using the issubset method
- If we have two sets
    >>> A = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
    >>> B = {1, 3, 5, 7, 9}

- We can ask if one set is the subset of another like this
    >>> A.issubset(B)
    False
    >>> B.issubset(A)
    True

- Python also provides the subset operator, <=
    >>> A <= B
    False
    >>> B <= A
    True

- If all the elements of the set B are contained in A ...
- then A is a superset of B
- We can ask if one set is a superset of another using the issuperset method
    >>> A.issuperset(B)
    True
    >>> B.issuperset(A)
    False

- The superset operator is >=
    >>> A >= B
    True
    >>> B >= A
    False
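- By the definition above, every set is a subset of itself
- Python also has the operators < and >, which test for a proper subset or superset ...
- meaning the two sets must not be equal
- These operators are not covered in the notes; the sketch below only illustrates the difference

```python
A = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
B = {1, 3, 5, 7, 9}

print(B <= A)   # True, B is a subset of A
print(B < A)    # True, and B is not equal to A

print(A <= A)   # True, every set is a subset of itself
print(A < A)    # False, a set is never a proper subset of itself
```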

- If two sets have **no** element in common ...
- they are said to be disjoint
- The isdisjoint method of a set object ...
- will tell you if two sets are disjoint
    >>> B = {1, 3, 5, 7, 9}
    >>> C = {2, 4, 6, 8, 10}
    >>> B.isdisjoint(C)
    True

- Since this condition is symmetric ...
- we can run the method on either set object
    >>> C.isdisjoint(B)
    True
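- Saying two sets are disjoint is the same as saying their intersection is empty
- A short sketch that checks both ways of asking the question; the set {9, 10} is made up for contrast

```python
B = {1, 3, 5, 7, 9}
C = {2, 4, 6, 8, 10}

print(B.isdisjoint(C))         # True
print(len(B & C) == 0)         # True, the intersection is empty

# A set that shares the element 9 with B is not disjoint from it.
print(B.isdisjoint({9, 10}))   # False
```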

- The clear method removes all elements from a set
    >>> D = {1, 2, 3, 4, 5}
    >>> D
    {1, 2, 3, 4, 5}
    >>> D.clear()
    >>> D
    set()

`min` and `max` with Sets

- To find the set element with the maximum value ...
- you can use the `max` built-in function
    >>> B = {1, 3, 5, 7, 9}
    >>> max(B)
    9

- To find the set element with the minimum value ...
- use the `min` function
    >>> min(B)
    1
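- One case to watch for is the empty set, where there is no largest or smallest element
- Calling max or min on an empty set raises a ValueError ...
- unless you supply the optional default keyword argument
- A minimal sketch of both behaviors

```python
empty = set()

# With no elements to compare, max raises a ValueError.
try:
    max(empty)
    raised = False
except ValueError:
    raised = True
print(raised)                      # True

# The default keyword gives a fallback value instead.
print(max(empty, default=None))    # None

# default is ignored when the set has elements.
B = {1, 3, 5, 7, 9}
print(min(B, default=None))        # 1
```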

- I would guess that most of you think you will never need to use sets
- But I think you are wrong
- Let me show you an example
- Let's say you have two files with email addresses ...
- and you know there are duplicates
- You want to create a new file ...
- that combines the addresses in both files ...
- but eliminates duplicates
- You might think to use lists
- To do this we first have to read in the two files ...
- and store each in a list
    emails_1 = []
    emails_2 = []
    for line in email_file_1:
        emails_1.append(line.strip())
    for line in email_file_2:
        emails_2.append(line.strip())

- Now we create a new list ...
- which has all the emails from the first list
emails_new = emails_1[:]

- Then we loop through the second list ...
- looking for entries that are not already in the new list ...
- adding them to the new list when we find them
    for email in emails_2:
        if email not in emails_new:
            emails_new.append(email)

- When we run this we get
    $ ./merge_emails_1.py
    Email file 1: emails_1.txt
    Email file 2: emails_2.txt
    List 1
    ghoffman@cs.umb.edu
    big_bill@hotmail.com
    larry.wall@gmail.com
    timsmith@yahoo.com
    bigorangeshithead@whitehouse.gov
    List 2
    ghoffman@cs.umb.edu
    alanh@hotmail.com
    timsmith@yahoo.com
    davidm@mac.com
    billybob@cs.umb.edu
    Merged list
    ghoffman@cs.umb.edu
    big_bill@hotmail.com
    larry.wall@gmail.com
    timsmith@yahoo.com
    bigorangeshithead@whitehouse.gov
    alanh@hotmail.com
    davidm@mac.com
    billybob@cs.umb.edu

- You can see the full code for this script here
- Not a bad way to go if the lists were small
- But what if each list had thousands of addresses?
- The merge loop would have to test each one of the entries in the second list ...
- to see if it was already in the list
- This could take quite a bit of time
- If we used sets, it would be much quicker
- We replace the loop from the first script ...
- with a single line of code
emails_new = emails_1.union(emails_2)

- You can see the code here
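- For the union call to work, the addresses must be stored in sets rather than lists
- The sketch below shows one way the set version might look; it is not the linked script
- The function names and the example.com addresses are hypothetical, and lists of lines stand in for open files

```python
def read_emails(lines):
    """Return a set of the non-empty lines, with whitespace stripped."""
    emails = set()
    for line in lines:
        address = line.strip()
        if address:
            emails.add(address)
    return emails


def merge_emails(lines_1, lines_2):
    """Combine two groups of addresses, dropping duplicates."""
    emails_1 = read_emails(lines_1)
    emails_2 = read_emails(lines_2)
    return emails_1.union(emails_2)


# Stand-ins for the lines of two email files (made-up addresses).
file_1_lines = ['a@example.com\n', 'b@example.com\n']
file_2_lines = ['b@example.com\n', 'c@example.com\n']

merged = merge_emails(file_1_lines, file_2_lines)
print(sorted(merged))
# ['a@example.com', 'b@example.com', 'c@example.com']
```

- Unlike the list version, the merged set does not keep the addresses in their original order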

- Some problems we face in real life ...
- are really problems that involve sets
- At UMB, instructors have to send academic status reports ...
- for students who are playing sports
- Let's say I have a set of student IDs for each of my courses
    >>> it244 = {'01459659', '00709552', '01565798', '00974687', '01357397', '01107434',
    ... '01516157', '01470962', '01015502', '01283749', '01313387', '01113342',
    ... '01684609', '01018750', '01458680', '01530289', '01600144'}
    >>> it341 = {'01276750', '01246473', '01146053', '01361550', '01330451', '01405338',
    ... '01240592', '01328324', '01393077', '00822499', '01158165', '01342910',
    ... '01559794', '01293714', '01352486', '01367216', '01165111', '01617531',
    ... '01485027', '01397047', '01459659'}

- Why did I have to enter the IDs as strings ...
- instead of numbers?
- Most UMB student IDs start with 0
- But in Python a number literal cannot begin with 0
- If you try, Python will object
    >>> a = 01684609
      File "<stdin>", line 1
        a = 01684609
                   ^
    SyntaxError: invalid token

- The only way to get around this problem ...
- is to enter the IDs as strings
- Are there any students who are registered in both courses?
- I can use the intersection method to find out
    >>> it244.intersection(it341)
    {'01459659'}

- Now let me create a new set of all the students in the two classes combined
    >>> all_students = it244 | it341

- By using the union operator | ...
- I have eliminated all duplicates ...
- as I can see by taking the length of both sets
    >>> len(it244)
    17
    >>> len(it341)
    21
    >>> len(all_students)
    37

- Since 17 + 21 is 38 ...
- creating the union of the two sets ...
- has eliminated the duplicates
- Now let's say the Athletic department has sent me a list ...
- of the IDs of all students playing sports this semester
- The first thing I need to do is combine these sets ...
- into a new set consisting of all student athletes
- Again I use the union operator
>>> athletes = basketball | baseball

- Now I can get the student IDs for all students ...
- for whom I have to report their academic standing ...
- by creating a new set that is the intersection ...
- of all_students and athletes
    >>> report_needed = all_students & athletes
    >>> report_needed
    {'01146053', '01015502', '01367216', '01330451'}
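- The whole workflow fits in a few lines
- The sketch below repeats the steps with small, made-up ID sets ...
- since the real basketball and baseball rosters are not shown in these notes

```python
# All IDs below are hypothetical placeholders, not real student IDs.
it244 = {'00000001', '00000002', '00000003'}
it341 = {'00000003', '00000004'}
basketball = {'00000002', '00000009'}
baseball = {'00000004', '00000008'}

# Everyone enrolled in at least one of my courses.
all_students = it244 | it341

# Everyone on at least one team.
athletes = basketball | baseball

# Students of mine who are also athletes.
report_needed = all_students & athletes
print(sorted(report_needed))   # ['00000002', '00000004']
```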

- Why use sets if we can use lists?
- Lists can do everything sets can do ...
- and they also have an order associated with them
- But the way that sets are implemented in Python ...
- means that checking whether a value is in a set is very fast
- So sets are more efficient than lists for this kind of test
- If the number of elements in a problem is small ...
- this efficiency doesn't make much of a difference
- But what if we were doing something ...
- that involved a large number of elements?
- Here sets could make things significantly faster
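- One way to see the difference for yourself is to time a membership test with the standard timeit module
- This is only a rough sketch: the sizes and repeat count are arbitrary choices ...
- and the actual times will vary from machine to machine

```python
import timeit

n = 100_000
big_list = list(range(n))
big_set = set(big_list)

# A value that is in neither collection. For the list this is
# the slowest case, because every element must be checked.
missing = -1

# Time 100 membership tests against each collection.
list_seconds = timeit.timeit(lambda: missing in big_list, number=100)
set_seconds = timeit.timeit(lambda: missing in big_set, number=100)

print(f'list: {list_seconds:.6f} seconds')
print(f'set:  {set_seconds:.6f} seconds')
```

- On a typical machine the set time should be far smaller than the list time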