Class 10.

 

Midterm Nov 3, tentatively. Open book, open notes. Re­sponsible for ALL of Chapter 2 and 3 (except no detail on Chapter 3.11 — Responsible for all homework and concepts covered in class).

 

HW1 solutions are available: see note on class web page.

 

How to do the homework using your own PC as much as possible:

1.    On your PC, open your favorite editor to create the hw2.sql file.  This can be NotePad or WordPad or whatever you use for text files.

2.    On your PC, open another window for your ssh tool (say putty.exe) and log into dbs2.cs.umb.edu and run sqlplus to get the SQL> prompt. Now sqlplus is waiting for a SQL statement to execute.

3.    In the editor, type in the first SQL statement you need. Don’t forget the final ;

4.    Copy and paste it into the sqlplus window and see it execute.

5.    If it fails, just edit your editor buffer and repeat step 4.

6.    When it finally succeeds, save your text file and go on to the next SQL statement, ...

 

I reported on class progress setting up for using Oracle:

5 people have not even logged into dbs2: shame on you!

8 people have logged in, but not yet edited .cshrc to add the module load for Oracle

6 people have logged in and edited .cshrc: these people now can use sqlplus

 

Separately, 4 people have copied loadcap to their homework area as suggested.

 

See the notes at the beginning of class 9 on doing the setup steps.

I’ll check again next week. I can also see if the CAP tables are in your database!

Note: I never look in your non-homework directories—those are private to you.

 

Example 3.6.4  Retrieve all customers who purchased at least one prod­uct costing less than $0.50. In relational algebra, this is wrong:

 

    ((ORDERS  PRODUCTS where price < 0.50)  CUSTOMERS) [cname]

 

Why? Because cities of PRODUCTS and CUSTOMERS will match; query becomes: Retrieve customers who purchased at least one product costing less than $0.50 that is stored in the same city as the customer.

 

Correct this with the following Relational Algebra form:

 

    ((ORDERS  (PRODUCTS where price <0.50) [pid])  CUSTOMERS) [cname]

 

In SQL we have two methods to choose from: (1) use a condition join by naming a subset of the commonly named columns to be joined with an ON search condition:

                 

    select distinct cname     

         from (orders o join products p on o.pid = p.pid)

             join customers c on o.cid = c.cid where p.price < 0.50;

 

Since no product is this cheap, this returns no rows for the standard CAP DB.

 

Question in class: Isn’t the o.cid on the last line out of scope?

Answer: Although the parentheses look a little like those around a subquery with its scoping effect, in fact they do not contain a subquery and only group together the first join operation, using the tableref form of (tableref). You can drop the parentheses and get the same result, since the JOIN associates from left to right.

 

Execution on Oracle: using copy and paste from PC at home:

Note the imperfect echoing of the original copy operation, but in fact the query was received OK, as shown by the “r” command in sqlplus that lists and then executes the current statement. See pg. 727 for a list of sqlplus commands.

 

SQL> select distinct cname

                from (ord  2  ers o join products p on o.pid = p.pid)  <--bad echo of paste op

                          3  join customers c on o.cid = c.cid where p.price < 0.50;

 

no rows selected

 

SQL> r                        <-- use r command to repeat query and execution

  1  select distinct cname

  2             from (orders o join products p on o.pid = p.pid)

  3*                    join customers c on o.cid = c.cid where p.price < 0.50

 

no rows selected

 

Same query without parentheses: also works—try it!

    select distinct cname     

         from orders o join products p on o.pid = p.pid

             join customers c on o.cid = c.cid where p.price < 0.50;

 

Side note:

Another suggested form:

     select distinct cname

         from (orders o join products p on o.pid = p.pid) x

              join customers c on x.cid = c.cid where p.price < 0.50;

 

However, this does not work. There is no general form “tableref corr_name”, only “tablename corr_name” and “(subquery) corr_name”, so (tableref) corr_name is not good syntax.  See Fig. 3.11, pg. 117, where I asked you to add “| (tableref)” as another form. What this means is that we don’t have access to the virtual table that is the result of the first join, only the input tables and the result of the whole join. This behavior is the same as with traditional joins with “from t1, t2, t3 ...”.

 

 

Or we can restrict the columns used in a join by a Column Name Join, with a USING clause which limits the columns equated in a JOIN to a subset of commonly named ones:

 

    select distinct cname from

         (orders join products using (pid))

             join customers using (cid) where price < 0.50;

 

Of course city is not named in this using clause. Note that either an ON clause or a USING clause is always required with a JOIN (unless it’s a CROSS JOIN).

 

OK, now we deal with OUTER JOIN. 

 

Covered this in Section 2.10.  using T1 FULL OUTER JOIN T2 (must say FULL), if T1 has rows that don't match T2 in Join columns, rows still ap­pear in the answer with nulls for the columns from T2, and vice-versa.

 

We also have LEFT OUTER JOIN and RIGHT OUTER JOIN. LEFT means that will only preserver rows from the left-hand table in result, RIGHT analogously.

 

INNER JOIN means opposite of OUTER, what we have been using all this time without a special name. Given two tables, S and T:

 

  S                                  T

C

A

 

A

B

c1

a1

 

a1

b1

c3

a3

 

a2

b2

c4

a4

 

a3

b3

 

The FULL OUTER JOIN of S and T would give the following:

 

  select * from S full outer join T

 S.C

 S.A

 T.B

  c1

  a1

  b1

  c3 

  a3

  b3

  c4 

  a4

 null

 null

  a2

  b2

 

The LEFT OUTER JOIN of S and T is next; In a left outer join we preserve rows in the left table, creating nulls to match for the right table; the re­verse is true for RIGHT OUTER JOIN.

 

  select * from S left outer join T

 S.C

 S.A

 T.B

  c1

  a1

  b1

  c3 

  a3

  b3

  c4 

  a4

 null

 

Note that ORACLE Release 8 provided only Left and Right Outer Join with an unusual syntax, but with Release 9, ORACLE began to support the Standard for these join forms.

 

 

Section 3.7.  Set Functions in SQL, pg. 124, Figure 3.12.  count, max, min, sum, avg.  Operates on sets of values, returns a single value (all except count, which can op­erate on sets of rows).

 

    Name     Argument type                            Result type             Description

    count      any (can be *)                              numeric                   count of occurrences

    sum        numeric                                        numeric                   sum of arguments

    avg        numeric                                            numeric               average of arguments

    max        char or numeric                            same as arg            maximum value

    min         char or numeric                            same as arg            minimum value

 

       Figure 3.12  Standard set functions in SQL (pg. 124)

 

    select sum(dollars) from orders where aid = 'a01';

 

Returns total dollar sales of agent a01. (Note Where clause gets applied first!!!)  Looks like: --\  In ORACLE, Heading is SUM(dollars)

 

    COL1

1400.00

 

Don't get confused, not like:

 

    select sqrt(qty) from orders;             See pg. 139 for sqrt

 

. . . which must return one value for each row in orders, operates on one row at a time; Set function can operates on a SET of rows. Note, SQL is more powerful than rel alg: couldn't find the total dollar sales of an agent.

 

Lots of different terminology.  SQL92 and X/OPEN refer to set functions, ORACLE changed from group functions to aggregate functions (aggregate:  to bring together a number of objects into a single measure); DB2 says col­umn functions.  Following SQL92, we say Set functions.

 

ORACLE has lots more Set functions (See ORACLE Notes linked to Course Web Page, look up Aggregate functions), e.g. stddev and variance:

 

    select variance(observations) from experiment;

    select stddev(observations) from experiment;

 

Note max and min return int or real or char max and min.

 

    select min(city) from products;

 

Note avg(x) has same value as sum(x)/count(x);  probably more efficient. Now the select statement

 

    select count(*) from customers;

 

Counts the number of rows in the customers table. Note too that when we select count(colname) will count only if value is non-null:

 

    select count(cid) from customers;                 /* cid not null in customers                            */

    select count(city) from customers;                 /* count only non-null city values                 */

 

But if city has no null values, the two selects will give the same value, even if only three different city names on 100 rows. Surprising?  But:

 

    select count(distinct city) from customers;

 

is how we get count of distinct city names.  Could also do:

 

    select avg(distinct dollars) from orders;

 

But it would be VERY UNUSUAL if you really wanted this.

 

It is not legal to use an aggregate function directly in a where clause.

E.g., try to list all customers with maximum discount.

 

    select cid, cname from customers c where c.discnt = max(c.discnt);

    /*** NOT LEGAL ***/

 

Problem is that range variable c is only supposed to range once over cus­tomers, not once outside and once inside max( ). There's only one range variable here and we need two to evaluate max( ) first.  How solve this? USE SUBQUERY around max:

 

    select cid, cname from customers where discnt =

         (select max(discnt) from customers);

 

Now customers in subquery ranges separately from out­side.  Why can we use = above, rather than =any or =all? . . .  Because only one value returned!

 

REMEMBER THIS:  MUST USE SUBQUERY IF SET FUNCTION IN WHERE!

 

Where can you use a set function?

--in a select list (i.e. the list of items after the select keyword) or, when we get to it, in a HAVING clause.

 

Example 3.6.6.  Find products ordered by at least two customers.  Used:

 

    select distinct c1.pid from orders c1, orders c2

         where c1.pid = c2.pid and c1.cid < c2.cid;

 

New way (more efficient, generalizes to more than 2 without increasing number of  joins):

 

    select pid from products p

         where 2 <= (select count(distinct cid) from orders

                                    where pid = p.pid);

 

Note: can’t turn this around to (...) >= 2 and still call it Basic SQL, but it is good Intermediate SQL92, and OK in Oracle.

 

Handling Null Values.

 

Recall that a null value appears as a column value in a row when the value is either unknown (discnt for new customer) or inapplicable (employee manager for company president).

 

If insert a row (insert statement, to come) without specifying some col­umn values, nulls will be placed for those values

 

Unless column definition in create table specifies not null as for cid in Appendix A, pg. 757 — Then the insert statement will not work.  Ex 3.6.7:

 

    insert into customers (cid, cname, city)

         values ('c007', 'Windix', 'Dallas');

 

The discnt value is not specified, so will be placed as null value.  Note that it is also possible in Standard SQL to specify null as a value for insert or update: See Figs. 3.22 & 3.23 Insert and Update forms. (Pgs. 148-149)

 

A null value has IMPORTANT implications.  Two following are different:

 

    select count(*) from customers;

    select count(*) from customers where (discnt < 8 or discnt >= 8);

 

Why?  Because null values for discnt will not be selected in second state­ment, even though the condition seems to be exhaustive.

 

A null value in any comparison expression evaluates to UNKNOWN, rather than TRUE or FALSE.  In a Select statement, only rows for which the where condition is TRUE are retrieved.  (See pg. 150 for reason for UNKNOWN.)

 

This means that the null value of an integer type variable cannot be kept simply as some value pattern, because all patterns are taken up by real integer values.  Need special FLAG byte for a column to see when it is null.

 

Note another important property:  the set functions IGNORE null values.  If we write:

 

    select avg(dollars) from orders where aid = 'a01';

or

    select sum(dollars)/count(dollars) from orders where aid = 'a02';

 

and some rows have UNKNOWN dollars values, then the count, sum, and avg functions will all skip over those values.  If the values were zero instead of null, clearly the avg would be lower.