Class 11.

 

Homework due soon. 

Section 3.8.  Groups of rows.

 

SQL allows a select statement to serve a kind of report function.  Groups rows of a table based on a common values and performs aggregate func­tions on rows grouped.  E.g.

 

    select pid, sum(qty) from orders

       group by pid;

 

New GROUP BY clause.  Print out as if following logic was followed:

 

    FOR EACH DISTINCT VALUE v OF pid IN orders;

         select pid, sum(qty) from orders where pid = v;

    END FOR;

 

See text for table printed out on pg. 129. 

 

Similarly, find how many orders each customer has made:

    select cid, count(*) from orders group by cid;

 

Note, in Select can't include anything in target-list that is not single-valued for the groups created in the GROUP BY clause.

 

    select pid, aid, sum(qty) from orders

      group by pid;  ** ILLEGAL **

 

This is because a single pid in orders may be associated with multiple aid values.  However, we can have more finely aggregated groups: we can GROUP BY aid and pid both:

 

    select pid, aid, sum(qty) as total from orders

      group by pid, aid;

 

This has the effect of the loop:

 

    FOR EACH DISTINCT pair  of values (v, w) equal to (pid, aid) in orders

         select pid, aid, sum(qty) from orders where pid = v and aid = w;

    END FOR;

 

See table retrieved in Example 3.8.1, pg 130 in text. Now a surprise!  If we write:

 

    select p.pid, pname, sum(qty) from orders o, products p

         where o.pid = p.pid group by p.pid;   WON'T WORK!!

 

It won't work!  WHY?! (Class.) Even though pname is uniquely valued for each group (unique pid value), SQL doesn't recognize that.

 

SQL> select p.pid, pname, sum(qty) from orders o, products p

                where o.pid = p.pid group by p.pid;

  2  select p.pid, pname, sum(qty) from orders o, products p

              *

ERROR at line 1:

ORA-00979: not a GROUP BY expression

 

Note that the * is supposed to mark the pname, but the query text is shifted over 5 chars and it ends up in the wrong spot.  If we add 5 spaces to its line, it marks the right thing:

  2  select p.pid, pname, sum(qty) from orders o, products p

                   *

                     (5 spaces added before the *)

ERROR at line 1:

ORA-00979: not a GROUP BY expression

 

 

This is true even though we define in Create Table that pid is primary key. Still true with ORACLE 10g !! Must say:

 

    select p.pid, pname, sum(qty) from orders o, products p

         where o.pid = p.pid

        group by p.pid, pname;   -- add pname to allow it to be in the select list

 

Of course we can have a WHERE clause at the same time as a GROUP BY:

 

    select pid, sum(qty) from orders where aid = 'a03'

         group by pid;

 

CONCEPTUAL ORDER:  (1) Take Cartesian product;  (2) cross out rows not selected in WHERE clause; (3) Group remaining rows in accordance with GROUP BY clause;  (4) evaluate expressions in target list (aggregate function values depend on groups). LEAVE UP!

 

Now.  What if want to eliminate rows where sum(qty) is too small in:

 

    select pid, sum(qty) from orders group by pid;

 

Can't do it by saying:

 

    select pid, sum(qty) from orders

         where sum(qty) >= 1000 group by pid;  ** ILLEGAL **

 

Because not allowed to use set function in WHERE clause.  Also, Where acts before grouping occurs and long before target-list expressions eval­uated.

 

IBM managers invented a new clause to act as Where after grouped quan­tities have been calculated:  called a Having clause:

 

    select pid, sum(qty) from orders

         group by pid

        having sum(qty) >= 1000 ;

 

SQL>    select pid, sum(qty) from orders

                group by pid

       2    3     having sum(qty) >= 1000 ;

 

PID   SUM(QTY)

--- ----------

p01       4800

p03       2400

p05       2900

p07       1400

 

So now: (4) Eliminate rows in target-list that do not obey the HAVING clause requirements. (Evaluate expressions in target list is now (5).)

 

(*)  Note this is a kluge: Very much not recursive:  can't GROUP BY the re­sult of a GROUP BY after a HAVING.  But might want to.

 

However, with Advanced SQL can Select FROM a Subquery, thus can GROUP BY again on a second pass or aggregate differently (max value of Grouped avg). Not possible to do this with Having Clause.  Here is NEW Example:

 

    select pid, sales from (select pid, sum(qty) as sales from orders

                                         group by pid) t

    where sales > 1000;

 

SQL> select pid, sales from (select pid, sum(qty) as sales from orders

                            2                        group by pid) t

      3  where sales > 1000;

 

PID      SALES

--- ----------

p01       4800

p03       2400

p05       2900

p07       1400

 

A HAVING restriction need not refer to set function appearing in the tar­get list.  A HAVING clause is just used to restrict after a GROUP BY.

 

Note: you can use HAVING without GROUP BY. In that case, there’s just one group.

 

select max(qty) from orders

  having count(*) > 20;

 

Only reports max if there are 20 or more rows in orders. For CAP: “no rows selected”

 

 

Example 3.8.4.  Get ids of products purchased by more than two cus­tomers.

 

    select pid from orders

      group by pid

      having count(distinct cid) > 2;

 

It certainly feels risky to mention cid in group by pid, but count() makes it single-val­ued. 

The idea is that the groups made in a GROUP BY are groups of rows conceptually, so the cids are still there to work with.

 

This is an even more compact Select than 3.7.6.         

 

    select pid from products p

        where 2 <= (select count(distinct cid from orders

                            where pid = p.pid);

 

New method with HAVING clause retrieves from only one table, but old method retrieves from two and uses correlated subquery for a join.

 

Example: Get the pid (or pids if ties) with the most orders.

Solution not covered in class, here it is:

 

Consider:  select pid, count(*) from orders

                   group by pid;

 

This gives a table of pid, count. We want the max of these counts, the one >= all the counts:

      select pid, count(*) from orders

             group by pid

             having count(*) >= all (select count(*) from orders group by pid);

 

PID   COUNT(*)

--- ----------

p01          5

 

The following doesn’t work: ct is not yet associated with count(*) values:

      select pid, count(*) as ct from orders

             group by pid

             having ct >= all (select count(*) from orders group by pid);

ERROR at line 3:

ORA-00904: "CT": invalid identifier

 

Another way that works (given advanced SQL):

select pid, ct from (select pid, count(*) ct from orders group by pid) t

   where ct = (select max(ct) from (select pid, count(*) ct from orders group by pid) t1 );

 

In this problem, we have to use HAVING or recursive queries to be able to select the max out of the virtual table of counts that the group by creates.

 

Section 3.9.  Now complete description of Select statement, pg. 135.  Remember:  idea is to give you confidence — everything has been covered.

 

But you must make special effort now to gain that confidence for yourself. 

Subquery General Form, for Basic SQL. For Advanced SQL, replace the last line with Figure 3.10, pg. 115.

 

    SELECT [ALL|DISTINCT] expr [[AS] c_alias] {, expr [[AS] c_alias]}

         FROM tableref {, tableref}

         [WHERE search_condition]

         [GROUP BY column {, column}]

         [HAVING search_condition]

         | subquery [UNION [ALL]] subquery;

 

Recall tableref in FROM clause was defined for Advanced SQL in Section 3.6, Fig. 3.11, pg. 117. The tableref form for Basic SQL is above this on the same page, i.e. just “tablename [corr_name]” or, except for Oracle, “tablename [[AS] [corr_name]]”

 

Square brackets means the item is optional: WHERE, GROUP BY, etc.

 

Thus [ALL|DISTINCT]: the phrase is optional; if we specify, we must choose exactly one form, ALL (duplicates allowed in target-list) or DISTINCT (no duplicates. But since ALL is underlined, it is the default, so if the phrase is not specified, it will be as if ALL was specified.

 

search_condition in the WHERE clause is a complex object; give details in what follows: various Subqueries and predicates (a few new ones). Note CAN have Set function in HAVING search_cond, but not not in WHERE s_c.

 

OK, this is a Subquery form:  means can occur in subquery (part of search_condition), also part of full select as follows.

 

Full Select General Form

    subquery

         [ORDER BY result_col [ASC|DESC] {, result_col [ASC|DESC]}]

 

The ORDER BY clause is new, and allows us to order the rows output by a succession of result-column values, leftmost first.

 

Explain dictionary order; e.g.: order by lname, fname. Note: result_col can be a number, referring to numbered column in the target list or a column alias.

 

In a UNION, ORDER BY, can use numbers (or column alias) for columns; needn't only UNION same named columns (though must be of same type!), require column alias for expressions in ORDER.

 

(Alternatively we could use column aliases to make the column names the same for all subqueries.)

 

Therefore, the result_col in this case can be one of the column num­bers 1 through n, where n columns occur in the result. (Breaks rel rule.)

 

Note that in the [ASC|DESC] choice (ascending order, or descending order), ASC is the default.

 

Now everything in a Subquery comes before ORDER BY and the order of clauses in a Subquery carries over to the conceptual order of evalu­ation.

 

Reasonable, since an ordering of rows by column values is clearly a final step before display.

 

See Figure 3.15.  Conceptual order of evaluation of a Select statement.

 

o  Form the Cartesian product of tables in the FROM clause.

o  Eliminate rows not satisfying the WHERE condition.

o  Group remaining rows in accordance with the GROUP BY clause.

o  Eliminate groups not satisfying the HAVING clause.

o  Evaluate expressions of the Select clause target list.

o  If have key word DISTINCT (select distinct…), eliminate duplicate rows.

Added over what’s in text: attach column corr_names, finishing the job of creating a virtual table from the subquery. (Thus column corr_names can be used in UNIONs and ORDER Bys, but not earlier in GROUP BY or HAVING.)

o  The UNION, INTERSECT, EXCEPT is performed after its subqueries.

o  Finally, the set of all selected rows is sorted if an ORDER BY is present.

 

Example 3.9.1.  List all customers, agents, and the dollar sales for pairs  of customers and agents,  and order the result from largest to smallest sales totals. Retain only those pairs for which the dollar amount is at least 900.00.

 

select c.cname, c.cid, a.aname, a.aid,  sum(o.dollars)

   from customers c, orders o, agents a  where c.cid = o.cid and o.aid = a.aid

      group by  c.cname, c.cid, a.aname, a.aid 

      having sum(o.dollars)  >= 900.00

      order by  5  desc;

 

Shows use of ordinal position for the sum output column. Alternatively (better really): use a column corr_name:

 

select c.cname, c.cid, a.aname, a.aid,  sum(o.dollars) tot

   from customers c, orders o, agents a  where c.cid = o.cid and o.aid = a.aid

      group by  c.cname, c.cid, a.aname, a.aid 

      having sum(o.dollars)  >= 900.00

      order by tot  desc;