Class 6.

 

Next Assignment:  reading through Section 3.5.  UNDOTTED exercises at end of Chapter 3 through 3.7.  Make sure you can login to dbs2.cs.umb.edu and use sqlplus to access Oracle.

 

FAST:  Idea of outer join.  Given two tables R and S, Head(R) = AB, Head(S) = BC,

                             R                                 S

A

B

 

B

C

a1

b1

 

b1

c1

a2

b2

 

b2

c2

a3

b5

 

b3

c3

 

 

 

b4

c4

 

What is the join, R  S?

                                         R  S

A

B

C

a1

b1

c1

a2

b2

c2

 

But we're losing a lot of information.  Outer join KEEPS all the information of the two tables, filling in nulls where necessary.

 

                                         R O S

A

B

C

a1

b1

c1

a2

b2

c2

a3

b5

null

null

b3

c3

null

b4

c4

 

You use this sort of thing when you don't want to lose information.  Exam­ple given in the text is that even though an agent in AGENTS table has made no sales in SALES table, we DO want to see AGENT name in report.

 

Most commercial product versions of SQL (Including Oracle) already have outer join (it became standard in SQL-92).

 

Theta joinCAN SKIP THIS!!! Recall in equijoin (or "natural join"), of R and S, Head(R) = AB, Head(S) = BC:

 

I.e.,  (R ´ S) where R.B = S.B

 

What if wanted some other condition (allow common columns),

 

    (R ´ S) where R.B q S.B, where q is <, <=, <>, >, >=, anything but =.

 

No big deal.  In SQL, use product, no special name for this kind of join.

 

Chapter 3.  Idea of SQL.  Computerized query language.

 

IBM developed SQL, and it became standard (old QUEL died).  Most queries (including updates, in­serts, deletes)  are very standard across products.

 

We will only use ORACLE in class to begin with, but will also talk about others.  Try to keep to Basic SQL standard for now (Our own standard).

 

In Appendix A.1 (see hw2: READ THIS FIRST; also note DBA.msg emailed to your cs.umb.edu mailbox),

 

Need to edit .cshrc in your home directory—see the DBA email.

 

 To enter Oracle, tells you to login to UNIX on UNIX machine dbs2.cs.umb.edu that serves to Oracle, then give command:

 

    %sqlplus

 

Any problems, contact DBA, dba@cs.umb.edu.

 

You will need to start creating tables.  Follow directions in assignment and in Appendix A.1.

 

Create table.  Result is empty table.

 

create table customers (cid char(4) not null, cname varchar(13),

  city varchar(20), discnt real, primary key(cid));

 

Names and gives types to each of columns:  Explain char(4), varchar(20). 

 

Primary key: means can't duplicate values in table, foreign keys will refer to this column by default.

 

Note "not null" is an integrity constraint:  must specify value on insert, can't leave null.  More integrity constraints in Chapter 7, on DBA stuff.

                                                                                                            / see errata for text

Next, load from OS file into table.  (Called sqlldr in ORACLE and DB2)  No-one wants to type in one line at a time:  everything is moved around on tape or disk (say from one DBMS to another).  Note expected format:

 

c001,TipTop,Duluth,10.00              <-- comma separated, no spaces

c002,Basics,Dallas,12.00

 

You have to create a .ctl table for each table, custs.ctl, . . .  Then use "sqlldr" command (NOTE Correction to pg 723, see Errata on cs630 home page); execute sqlldr from the UNIX prompt, not from within SQL*PLUS.

 

After think have table loaded, type:

 

    select * from customers;

 

Should print out whole table. (* means all columns.)

 

X/OPEN Limits:  127 cols., 2000 bytes for row.  etc.  Oracle allows more.

 

Chapter 3.3.  Simple Select Statements  (Basically same as examples in the text, although starts somewhat different.)

 

Example 3.3.1.  Find (JUST) names of agents based in New York.

 

rel. alg.:    (AGENTS where city = 'New York')[aname]

 

SQL:         select aname from agents where city = 'New York';

 

Keywords: SELECT, FROM (clause), WHERE (clause), end with semi-colon (;).

 

After SELECT have select list, resulting variables to report:  project on.

 

The FROM clause names tables to take Cartesian product.

 

The WHERE clause connects and limits rows in Cartesian product.  No join.  (Our Advanced SQL standard now allows join operation:  Cover later.)

 

Note duplicate aname values ARE possible.  SQL statement above WILL re­port duplicate names (Relational rule broken) unless write:

 

    select distinct aname from agents where city = 'New York';

 

If want to emphasize that duplicates can arise, give name to default:

 

    select all aname from agents where city = 'New York'; (See Fig 3.1 pg 81)

 

Recall we did:  select * from customers;  at end of table load.

 

Select *  means all columns (no projection).  There is no where clause so whole table is chosen.  Like rel alg query:  CUSTOMERS

 

Example 3.3.3.  Select product ids of products for which orders are placed.

 

    select distinct pid from orders;

 

Might there be dupli­cates in this?  How avoid duplicates, class?

 

Example 3.3.4.  Retrieve all (cname, pname) pairs where the customer places an order for the product.  If don't use JOIN operator in SQL, can't make mistake of joining on city column as well.  In RELALG, need:

 

    (C[cid, cname]    O    P)[cname, pname] 

 

 Rather:

 

    ((C X O X P)

         where C.cid = O.cid and O.pid = P.pid)[cname, pname]

 

In SQL:

 

    select distinct cname, pname from customers c, orders o, products p

         where c.cid = o.cid and o.pid = p.pid;

 

Note that all "table alias" names must be specified in single statement where they are used, in FROM clause: the table alias follows the table name without a comma intervening.

 

In ORACLE and INFORMIX, look up "alias, table". In DB2 UDB, say "correla­tion name", and "alias" is something else. (Also in INFORMIX, "correlation name" is something else!)

 

SQL is non-procedural to the greatest extent possible, so a statement can't depend on an earlier statement.  Note too, could leave out alias in this example — just saves typing here.

 

Need qualifier when column names are ambiguous from one table to an­other; not otherwise (above cname, pname, but o.cid, c.cid).  But it never hurts to qualify column names.

 

Conceptual process in Select.  (1) Take product of tables in from clause, (2) apply selection in where clause,  (3)  project on attributes in select-list and report.

 

But this in not necessarily (usually not) the actual order of events.  Query optimization is applied.  If customers has 1000 rows, orders 100,000, and products 1000, then Cartesian product has 100 billion rows.  But eventual answer will have at most 100,000 rows.

 

Can make this much more effi­cient:  index customers by cid and products by pid;  now go through orders one row at a time and put out cname, pname into temporary file.  Finally, sort cname, pname pairs and cast out dupli­cates.

 

This is what a query optimizer might do in its query plan.  Basically, a query plan is a program that retrieves the data you want.  You don't have to worry about it.  Just type in SQL, query optimizer creates the program. 

 

Can perform calculations in select.

 

Example 3.3.5.  retrieve a table based on orders, with columns ordno, cid, aid, pid, and profit on order.

 

    select ordno, x.cid, x.aid, x.pid, 0.40*(x.qty*p.price)

         - 0.01*(c.discnt + a.percent)*(x.qty*p.price) as profit

         from orders x, customers c, agents a, products p

         where c.cid = x.cid and a.aid = x.aid and p.pid = x.pid;

 

(Profit is quantity * price, where subtract off 60% of cost, discnt for customer, and percent commission for agent.  Expressions allow:  +, -, *, /, and functions:  upper( ) or ceil( ) (Oracle, smallest integer greater than real number).

 

Without the "as profit" clause, Oracle will head the column with the ex­pression.  After clause put in, heading will be "profit"  In ORACLE, this is called a column alias, comparable to the way table aliases are created.

 

In INFORMIX, it is called a "Display Label".

 

Table alias is also called correlation name (in DB2) and range variable.  Sometimes not just shorthand, but necessary, as when join table with it­self (need different qualifying names).

 

Example 3.3.6.  List all pairs of customer cids based in the same city.

 

We have seen this kind of thing before in Rel Alg.  Need to re­port distinct pairs of cid only once.

 

    select [distinct?] c1.cid, c2.cid from customers c1, customers c2

         where c1.city = c2.city and c1.cid < c2.cid;

 

Note, could be upper case C1 and C2 column aliases;  lower case is our general rule, but not crucial;  just need to be consistent (case counts).

 

Do we need to say distinct above, class?  (No.  Because (c1, c2) is a set of unique pairs.  Waste to use distinct when not necessary.)

 

Alias idea is to make two copies of all content of customers table as c1 and c2;  then take Cartesian product, perform selection, and finally report out project onto c1, c2.

 

But what system really does is (draw customers table, show c1 and c2 as variables ranging over rows, perform nested loop — RANGE VARIABLES):

 

    FOR c1 FROM ROWS 1 TO LAST OF customers

         FOR c2 FROM ROWS 1 TO LAST OF customers

             IF (c1.city = c2.city and c1.cid < c2.cid)

                  PRINT OUT SELECT-LIST VALUES:  c1.cid, c2.cid

         END FOR c2

    END FOR c1

 

Once again c1 and c2 do not represent copied tables, but variables taking on row values in the same table!  Trick here is to NOT MATERIALIZE THE PRODUCT.  And now see why aliases are sometimes called range variables:  we can picture them as variables that range over the rows of a table.