Class 7.
Example 3.3.7. Find pids of products ordered by at least two customers.
NOT EASY! How to do? Think of idea of two range variables alive at same time in one table. Picture in orders tables. Say pid the same and cid different for two.
select [distinct?] x1.pid from orders x1, orders x2
where x1.pid = x2.pid and x1.cid <> x2.cid;
Need distinct? Yes. Picture Cartesian product, pid p01 once with three different cid values, c001, c004, c006. What happens? There are three pairs of rows that x1, x2 can fall on: (c001, c004), (c001, c006), and (c004, c006), AND the REVERSE!
It's a better idea to write x1.cid < x2.cid above, instead of x1.cid <> x2.cid, but would STILL NEED to use DISTINCT keyword for the Select list.
Example 3.3.8. Get cids of customers ordering a product for which an order is placed by agent a06. Reminiscent of an example we had in rel alg. Note that a cid retrieved doesn't necessarily order any product through a06.
/----- Customers CREATE PICTURE OF RANGE
/-- Products --- who VARIABLES AND HOW TO
a06 ---ordered by --- place orders SOLVE THIS THINKING IN
\--agent a06 --- for those TERMS OF QBE; THAT
\------ products TRANSLATES DIRECTLY
TO SQL STATEMENT
TWO connections through orders relation, so need two aliases. First products ordered by a06 (first cloud above):
select [distinct?] x.pid from orders x where x.aid = 'a06';
Would need distinct if wanted unique x.pid values. Next, customers who order those products:
select distinct y.cid from orders x, orders y NESTED LOOP OF
where x.aid = 'a06' and y.pid = x.pid; (SLOW) RANGE VARIABLES
(PICTURE) x.pid reports back the first cloud above, and set y.pid = x.pid to get second cloud y.cid.
But an even MORE straightforward way to do that is to use a subquery (Section 3.4):
select distinct y.cid from orders y where y.pid in
(select x.pid from orders x where x.aid = 'a06');
The "subquery" retrieves the first cloud, and then the outer select retrieves all y.cid where y.pid is "in" the first cloud.
The condition "in" is an operator expecting a single value on the left and a "set" of values on the right (from a subquery ot list); it's true when the value is one of the set of values on the right.
Of course the outer Select retrieves all rows where the condition is true.
For the first time, we are using some recursion based on the idea that a select retrieves a table, and now can operate on that table as a set of rows (or in the case just covered, a set of values).
NOTE: For any type of query we have learned so far, if we can do it with join we can do with subquery and vice-versa. Subquerys are supposed to give a slightly more natural approach to some queries. BUT you should be able to translate Subqueries to joins in most cases.
Example 3.4.1. Retrieve cids of customers who place orders with agents in Los Angeles or Dallas. First find all agents in LA or Dallas, and make that a subquery:
select distinct cid from orders where aid in (select aid from agents
where city = 'Los Angeles' or city = 'Dallas');
Note don't need to qualify aid in subquery use or outside subquery: in each case there is only one table that is natural.
Conceptually, innermost subquery operates first, returns a set of values, then outer select runs through customer rows and tests whether cid is in this set. In fact that might NOT be the way it is done by the system (after query optimization). Equivalent to:
select [distinct?] x.cid from orders x, agents a where
x.aid = a.aid and (a.city = 'Los Angeles' or a.city = 'Dallas');
Why ( )s? Because of order of operations: p and q or r is (p and q) or r,
but that isn't what we want; want p and (q or r).
Otherwise, if r were true for any p, q, whole statement true; then if agents has some agent in Dallas, will get all cids reported (won't need x.aid = a.aid).
Skip a few simple examples. In particular, can construct your own set using a condition such as: . . . where . . . city in ('Duluth', 'Dallas'). Of course we could also do this using or,
All these variant ways of doing things don't necessarily help — just make user uncomfortable about having a complete grasp of the SQL syntax.
OK, now idea of correlated subquery. Recall in 3.4.1 didn't need qualified names because rows had natural home. But can refer to OUTER range variable from INNER subquery.
Example 3.4.4. (Variant) Find names of customers with discnt >= 10 who order product p05. To illustrate a point, using subquery:
select cname from customers where discnt >= 10 and
'p05' in (select pid from orders where cid = customers.cid);
Yes, unusual to say where <const> in subquery (looks like a typo), but perfectly valid and as long as condition is true will retrieve cname.
Note that in subquery, the cid without a qualifier has home qualifier of orders, while customers.cid harks back to outer select. As if we wrote:
FOR c FROM ROW 1 TO LAST OF customers
discard c if c.discnt < 10
create L as empty list
FOR x FROM ROW 1 TO LAST OF orders
Add x.pid to list L if x.cid = c.cid /* value set outside loop */
END FOR x
discard c if 'p05' not in list L
otherwise place c.cname in ANSWER list
END FOR c
print out ANSWER list
Note can't precalculate inner loop; need outer loop to set customers.cid. (This is called a correlated subquery). Of course, we could change query to:
select cname from customers where discnt >= 10 and
cid in (select cid from orders where pid = 'p05');
Can evaluate this with precalculation of subquery.
create L as empty list
FOR x FROM ROWS 1 TO LAST OF orders
Add x.cid to list L if x.pid = 'p05'
END FOR x
FOR c FROM ROWS 1 TO LAST OF customers
if (c.discnt < 10 or if c.cid not in list L)
discard c and go to next in loop
else place c.cname in ANSWER list
END FOR c
print out ANSWER list
OK, can refer to outer variable from inner loop. But can't do the REVERSE!!
Example 3.4.5. To get names of customers who order product p07 from agent a03, can't use query:
select cname from customers where orders.aid = 'a03' and
'p07' in (select pid from orders where cid = customers.cid);
***ILLEGAL***
The condition orders.aid = 'a03' can't be used outside subquery (doesn't exist yet -- scoping rule)
(** new **) Example. Find cids of customers who do not place any order through agent 'a03'. Can use not in, true exactly when in is false:
select cid from customers where cid not in (select cid from
orders where orders.aid = 'a03'); /* example of minus operator */
Example 3.4.6. In older products, IN predicate only works for single values. Can't have condition:
. . . where (cid, aid) in (select cid, aid from orders . . .
This is Basic SQL standard (Note need (,)), and Oracle permits it.
Quantified Comparison Predicates. List:
=some <>some <=some >some <some >=some
=any <>any <=any >any <any >=any
=all <>all <=all >all <all >=all
See Figure 3.7. Some, Any, or All predicates
If comparison operator µ stands for operator in the set {<, <=. =, <>, >, >=},
**IMPORTANT -- MEMORIZE**
v µsome (subquery) is TRUE iff v µ s for some s in the subquery
v µany (subquery) is TRUE iff v µ s for some s in the subquery
v µall (subquery) is TRUE iff v µ s is true for all s in the subquery.
µsome and µany mean same thing: µsome is easier to remember.
Example 3.4.7. Find aid values of agents with a minimum percent commission. This was HARD in relational algebra. Now easy.
select aid from agents
where percent <= all(select percent from agents);
Alternatively
select aid from agents
where percent = (select max(percent) from agents);
but we haven’t yet covered the set function “max”, so this is premature.