Class 5.
Hand in hw 1 next class.
Continuing with the division examples—see 58 for R, we are computing T := R ÷ S.
Here, need to show that (a1, b2, cx) is in R for all x from 1 to 4. Yes, it is, so t=(a1, b2) is in the quotient T:
S T
|
C |
|
|
A |
B |
|
c1 |
|
|
a1 |
b2 |
|
c2 |
|
|
|
|
|
c3 |
|
|
|
|
|
c4 |
|
|
|
|
S ´ T in R, and by looking at the rows in table R with C value c3 and c4, we see why T has the maximal content with this property.
S T
|
B |
C |
|
|
A |
|
b1 |
c1 |
|
|
a1 |
|
|
|
|
|
a2 |
Example of dividing R with three columns by a table S with two columns, resulting in a table T with a single column. Need to show that (a1, b1, c1) in R, OK, (a2, b1, c2) in R, OK, and that T is maximal.
S T
|
B |
C |
|
|
A |
|
b1 |
c1 |
|
|
a1 |
|
b2 |
c1 |
|
|
|
Why is T maximal?
Next we given an example to see why we use division: what kind of question does it answer about the data.
Recall C:= CUSTOMERS, A := AGENTS, P := PRODUCTS, O := ORDERS
Example 1. Get names of customers who order all products. When see "all" think "division". Let's get cids of such customers.
O[cid, pid] ÷ P[pid]
P[pid] : p01 through p07
O[cid, pid] ÷ P[pid] =
|
|
cid |
|
|
c001 |
Clear why must project P on pid: columns of divisor must be subset of columns of dividend.
Why project O on cid, pid? Otherwise will try to find other column values in ORDERS (e.g. qty) the same for all pid, but just want cid the same for all pid.
To get names of customers, join with C.
((O[cid, pid] ÷ P[pid])
C)[cname]
Example 2. Get cids of customers who order all products priced at $0.50. Build new table to take place of P above. Smaller divisor, maybe larger quotient...
O[cid, pid] ÷ (P where price = 0.50)[pid]
(P where price = 0.50)[pid] =
|
|
|
pid |
|
|
|
p01 |
|
|
|
p02 |
O[cid, pid] ÷ (P where price = 0.50)[pid]
Same as before, only c001 bought p02, so answer same as in Example 1.
Example 3. Get cids of customers who order all products that anybody orders. (Maybe few new products no-one orders yet.)
all products that anybody orders: O[pid] (removes products with no orders at all)
O[cid, pid] ÷ O[pid]
If O[pid] Ì P[pid], so that there is a product with no orders, O[cid, pid] ÷ P[pid] is empty set, since no customer can order all products.
OK. Some of the 8 operations of relational algebra: UNION, INTERSECTION, -, TIMES, PROJECT, SELECT, JOIN, DIVISION are unnecessary. Just carry them for ease of use.
OK. Some of the 8 operations of relational algebra: UNION, INTERSECTION, -, TIMES, PROJECT, SELECT, JOIN, DIVISION are unnecessary. Just carry them for ease of use.
Th. 2.8.1, don't need both INTERSECT and MINUS.
A Ç B = A - (A - B). Show this with a picture.
Th. 2.8.2. Can define JOIN of R and S in terms of TIMES, SELECT, PROJECT and ASSIGNMENT.
JOIN. Start by taking R ´ S.
Cross out rows (SELECT) where equivalently named attributes in R ´ S (such as cid in customers and cid in orders) don't have same value.
Do away with columns with duplicate values (PROJECT); drop qualification names (ASSIGNMENT -- don't need, since now all columns have different names)
Say Head(R) = AB and Head(S) = BC.
Take T1 := R ´ S (attributes are R.A, R.B, S.B, S.C).
Then take T2 := (T1 where R.B = S.B) [R.A, R.B, S.C]. (equate and project)
Finally let T3(A, B, C) := T2(R.A, R.B, S.C) (rename attributes, so no S.B). NOTE: SQL does it this way, even equates different named cols in 2 tables.
T3 is equivalent to R
S.
DIVISION. If Head(R) = A1 . . . An B1 . . . Bm and Head(S) = B1 . . . Bm:
R ¸ S = R[A1 . . . An] - ((R[A1 . . . An] X S) - R) [A1 . . . An]
Example: See Example 2.7.19 on pg. 58, the second S on pg. 58. Work out.
R[A1 . . . An] is R[A B], with three rows, (a1, b1), (a2, b1), and (a1, b2).
Calculate X S (6 rows), and subtract R. Only one left is (a1, b1, c2) and project on [A B] get (a1, b1), subtract from R[A B], result is two rows (a2, b1), and (a1, b2).
This agrees with T shown on pg. 58 for this case.
Back to general formula--
First term on left (R[A1 . . . An]) is certainly maximum possible answer. It will be the answer (T) if R[A1 . . . An] X S = R: Then T X S = R, definition.
But if it's too large, then R[A1 . . . An] X S É R, i.e, has extra rows because of extra rows in R[A1 . . . An] (i.e., rows not in the quotient T).
Consider reducing the initial set R[A1 . . . An] by subtracting out the part that gives rows not in R, i.e., subtract out: (R[A1 . . . An] X S - R) [A1 . . . An]
Claim now we have:
(R[A1 . . . An] - ((R[A1 . . . An] X S) - R) [A1 . . . An]) X S Í R, and this is max.
If you like a challenge, TRY to understand this division representation.
Now, we seem to have an awful lot of capability with these operations, to create queries that answer requests for data. Describe any columns you want as a result (do with project), any way in which different tables are interconnected (product then select is more general than join).
CAN DO THIS IN GENERAL, as long as connections can be described with the conditions we have specified. (<, <=, =, <>, >=, >)
Could think of conditions not so described (city contains a character pattern 'ew'; New York or Newark), but could add new conditions: trivial.
Even FOR ALL type queries can be done without division really, not easy intuition.
BAG OF TRICKS FOLLOWS (EXAM Qs): PAY CAREFUL ATTENTION!!!
Example 1. Get aids of agents who do not supply product p02. (How do it?)
A[aid] - (O where pid = 'p02')[aid]
\_ Note, maybe some agents in table A supply NO product, unlike agents in table O! SO: do you mean supply SOME products but not product p02?
Example 2. Get aids of agents who supply only product p02.
O[aid] - (O where pid <> 'p02')[aid]
The question implies that the these agents DO supply product p02, so no good to say:
A[aid] - (O where pid <> 'p02')[aid]
(English is a bit ambiguous, and we may want to clarify.) Will this relational algebra expression have any agents who do not place an order for product p02 either? Do we need to write:
(O where pid = 'p02')[aid] - (O where pid <> 'p02')[aid]
(No, unnecessary, for has an aid which appears in table O has to place some order and it can't be for any product other than p02.)
Example 3. Get aids of agents who take orders on at least that set of products ordered by c004. (How?) (Rephrase: take orders on ALL products ordered by c004.)
O[aid, pid] ÷ (O where cid = 'c004')[pid]
Another way to phrase this: gives aid of agents a where if a product is ordered by c004, it is also ordered through the agent a.
Example 4. Get cids of customers who order p01 and p07. (How?) (Wrong to write: (O where pid = 'p01' and pid = 'p07')[cid] WHY?)
(O where pid = 'p01')[cid] Ç (O where pid = 'p07')[cid]
Example 5. Get cids of customers who order p01 or p07. (Can use or.)
(O where pid = 'p01' or pid = 'p07')[cid])
Example 6. List all cities inhabited by customers who order product p02 or agents who place an order for p02. (Can't use or in selection: Why?)
((O where pid = 'p02')
C)[city] È ((O where pid = 'p02')
A)[city]
Example 7. Get aids of agents who place an order for at least one customer that uses product p01. NOTE: might be that an agent retrieved does NOT place an order for p01. Consider diagram that follows.
/-- Agents
/-- Customers --- who
p01 --- who order --- place orders
\-- p01 --- for those
\-- customers
We are retrieving cloud on right. This is one of the most missed problems on exams. Do it inside-out. Customers who order product p01.
(O where pid = 'p01')[cid]
Agents who place orders for those customers.
((O where pid = 'p01')[cid]
O) [aid] (Note OK: Join not
Qualified.)
Example 8. Retrieve product ids for all products that are not ordered by any customers living in a city beginning with the letter "D" (Recall Example 6). OK, start with the sense of "products not ordered".
P[pid] - (O
(C where . . .))[pid]
This works if we come up with a condition for . . . that means living in city beginning with the letter "D". Right? Any ideas? Only have conditions: <, <=, =, <>, >=, >. With alpha strings, means earlier in alphabetical order.
C where C.city >= 'D' and C.city < 'E'
In the solved Exercise, in 2.5 (i), see how it is difficult to find customers who have the largest discount. (Good Exam question -- note Scale Grades.)
Example 9. Retrieve cids of customers with the largest discounts.
CY := C
T1(cyid, cid) := ((CY ´ C) where CY.discnt >= C.discnt)[CY.cid,C.cid]
Now have pairs (cyid, cid) where the ones with larger or equal discount are on the left. The cids with maximum discount will be paired with all other cid values on the right.
T2 := T1 ÷ CY[cid]
(There's an easier way to do this in SQL.)
select cid from customers
where discnt = (select max(discnt) from