Class 3.

 

Assignment  Read to end of Chapter 2.  Homework:  see it on class web page. Due on Class 6.

 

Theorem 2.4.2.  Every table T has at least one candidate key.

 

Idea of proof based on fact that a key is a minimal set of columns in T that distinguishes all rows:  if u and v are different rows and K is a key, then designer intention: u[K] (u restricted to K) is not identical to v[K].

 

Since the set of all columns has that property (is a superkey for T — note always relative to table T), seems that there MUST be a MINIMAL set of columns that does.  (NOTE PARTICULARLY:  dealing with SETS of columns.)

 

But many with no mathematical training can't show this.  Think in terms of Computer Science that you wanted to write a program to find such a set, and argue why the program termi­nates.

 

Proof.  NAME ITEMS.  Given a table T with Head(T) = A1 . . . An

SET UP TO LOOP:  Let attribute set S1 be this entire set.  (In what follows, we assume we know the intentions of the table designer to decide which sets of columns will always distinguish all rows, but can't answer the questions until the set of columns is named.)

 

LOOP:  Now S1 is a superkey for T;  either S1 is a Key for T or it has a proper subset S2,  S2 Ì S1, such that S2 is also a superkey for T.  Now, either S2 is a Key or it has a proper subset S3,  S3 Ì S2, such that S3 is also a superkey for T.  Can this go on forever?  (Mathematician:  why not?)

 

But each successive element in the sequence S1, S2, S3, . . . has a smaller number of columns (it is a PROPER subset), and we can never go to 0 (0 columns don't have unique values) or below, so this must be a finite se­quence with a smallest set at the end Sn.  That must be the key.  QED.  You should see why it wasn't enough to stop without giving the loop and ex­plaining why it terminates:  proof explains as much as possible.

 

Def.  Primary Key.  A Primary Key of a table T is a candidate key chosen by the DBA to uniquely identify rows in T (often used in references by other tables in "Foreign Key").  Thus on pg. 28, aid is the Primary key for agents and used as a foreign key in orders.

 

This Def implies there MIGHT be a situation where a table doesn't have to have a primary key, as when there is only one table in the database.  Most theory books don't allow this idea, but commercial databases do.

 

2.1.5.  Null values  If we insert a new agent:

 

    (a12, Beowulf, unknown, unknown)

 

Agent hasn't been assigned percent commission or city yet (still training, but want to have a record of him). A null value is placed in a column of a table when a value is either unknown or inappropriate.  Here it is unknown.

 

A slightly different mean­ing would be if warehouse manager also had a percent (commission) field, but since warehouse managers don't get commissions, so would get null value; a real value is inappropriate.

 

Another inappropriate example: employees table has columns (eid, mgrid); CEO of company has no manager, so mgrid is null.

 

A null value can be used for either a numeric or character type.  But it has a different value from any non-null value.  In particular, it is not zero (0) or the null string ('').  It is handled specially by commercial databases.

 

If we were to ask for the agent with the minimal percent or the short­est city name, we wouldn't want to get an agent with no percent or city yet assigned.

 

Similarly, if we were to ask for the average percent, wouldn't want to average in zero for a null.

 

In fact, if we were to ask for all agents with percent > 6, and then all agents with percent <= 6, we wouldn't get the new agent Beowulf in either answer.  There's just no meaningful answer to this question for null.

 

2.5  Relational Algebra.  Operations on tables to get other tables.  Codd. Interesting thing is can get answer to many useful Qs (first order predi­cate logic).  But abstract language:  we can't use a machine to get answer.

 

Two types of operations:  (1) Set Theoretic (depend on fact that table is a set of rows), and (2) Native operations (depend on structure of table).  Given two tables R and S, Head(R) = A1 . . . An, (S often the same), define 8 opera­tions.  See pg. 45.  Put on board.

 

SET THEORETIC OPERATIONS

                                                         KEYBOARD

      NAME                    SYMBOL  FORM            EXAMPLE

 

      UNION                       È            UNION                       R È S, or R UNION S

      INTERSECTION      Ç            INTERSECT             R Ç S, or R INTERSECT S

      DIFFERENCE          -              - or MINUS                R - S, or R MINUS S

      PRODUCT                ´             TIMES                        R ´ S, or R TIMES S

 

 

 

 

SPECIAL OPERATIONS

                                                         KEYBOARD

      NAME               SYMBOL       FORM            EXAMPLE

 

      PROJECT        R [    ]              R [    ]  R [Ai1. . .Aik]

      SELECT           R where C     R where C     R where A1 = 5

      JOIN                                      JOIN               R  S, or R JOIN S

      DIVISION          ¸                      DIVIDEBY     R¸S, or R DIVIDEBY S

 

Idea of Keyboard form:  name to use for an operation on a typewriter that doesn't have special operation symbol.

 

Set Theoretic Operations

 

Def. 2.6.1.  Two tables are said to be compatible iff they have the same schema.  (iff means "if and only if", or "means exactly the same thing".)

LEAVE UP FOLLOWING TABLES and RESULTS

                             R                                             S

A

B

C

 

A

B

C

a1

b1

c1

 

a1

b1

c1

a1

b2

c3

 

a1

b1

c2

a2

b1

c2

 

a1

b2

c3

 

 

 

 

a3

b2

c3

 

Illustrate R È S.  Union of two tables considering table as set of rows.  5 rows in result.  Clearly must be compatible tables, because rows of dif­ferent schemas wouldn't fit in same table.

 

Illustrate R Ç S.  two rows, R - S.  only one row, S - R.  two rows.

 

Before we go on to product (Cartesian product), consider idea of assign­ment or alias.

 

Def 2.6.3. Assignment, Alias.  Let  R be a table with Head(R) = A1. . . , An; to create a table S with renamed columns Head(S) = B1, . . ., Bn, such that Dom(Bi) = Dom(Ai) for all i, 1 ≤ i ≤ n, and S has the SAME CONTENT as R, we create table S with the as­sign­ment::

 

    S(B1,. . .,Bn) := R(A1,. . .,An). (Rename Table & Columns: Same (duplicated)

                                                                        Contents)

The content of the new table S is exactly the same as the content of the old table R, that is, a row u is in S if and only if a row t exists in R such that u[Bi] = t[Ai] for i, 1 ≤ i ≤ n.   The symbol := used in this assignment is called the assignment operator.

 

If don't need to rename columns, just want a different table names with same attributes, we call this a table alias, and write:

 

    S := R.

 

Use alias operator to save intermediate results of operations.  Can write: 

 

    T :=  (R È S) - (R Ç S),

 

or could write instead:

 

    T1 := (R È S); T2 := (R Ç S); T := T1 - T2  (Draw picture of this result)

 

NOTE THAT we often do not absolutely NEED to get intermediate results.  Can usually use subexpression in place of named alias.  One example below where we absolutely need it.

 

OK, now PRODUCT or Cartesian product.  Example of  R ´ S (from above, different than book):  (calculate it).  NOTE Tables NEED NOT BE COMPATIBLE for a product to be calculated.  (Write following Def for terminology.)

 

Definition 2.6.4.  Product.  The product  (or Cartesian product) of the tables R and S  is the table T whose heading is Head(T) =  R.A1. . .R.An S.B1. . .S.Bn.   For every pair of rows u, v in R and S, respectively, there is a row t in T such that  t(R.Ai) = u(Ai) for 1 ≤ i ≤ n and t(S.Bj) = v(Bj) for 1 ≤ j ≤  m.  No rows exist in T that do not arise in this way.  The product T of R and S is denoted by R ´ S.

 

Columns of product table are qualified names.  Used to distinguish identi­cally named attributes.  See Example 2.6.4, pgs 46-­47, for example.

 

Difference be­tween True Cartesian product of sets and Relational product: need Relational product to get properly qualified column names.

 

Special problem if try to take product of table with itself.  R ´ R would have identically named  columns. (What would happen then by Definition 2.6.4? Not possible.)  To get what we really want, say S := R and take R ´ S.

 

2.7.  Native Relational Operations

 

Projection.  Limit columns and cast out duplicate rows. (Example, not Def.)

 

Def. 2.7.1 PROJECTION. pg. 48 Given table R where Head(R) = A1, . . . , An, pro­ject R on sub­set of columns T :=  R[Ai1, . . ., Aik], where list in brackets is a subset of the complete set of attributes.  Cross out all columns and col­umn values in R not in set, then eliminate duplicate rows.

 

Ex:  Give a list of all cities in which customers are located.  CUS­TOMERS[city].  Look at Figure 2-2, pg. 28  get Duluth, Dallas, Kyoto.   (DRAW IT) (Note cast out duplicate rows.) 

 

List all customer names from the CUSTOMERS table of Figure 2.2.1.  An­swer to Query:  CUSTOMERS[cname].  Result is table with cname heading, TIpTop, Basics, Allied, ACME.  (DRAW IT) (Note cast out duplicate rows.)

 

List city and percent commissions for all agents.  Any duplicates go away?  (New York, 6;  but note duplicates in one of two columns is OK.)

 

Definition 2.7.2. pg. 49 SELECTION.  R where Condition.   The Condition is a logical condition that can be determined from the values of a single row of R.  Very simple kind of conditions. 

 

    Ai µ Aj  or Ai µ a, where Ai and Aj are attributes of R, a is a constant.

    µ  is any of:  <, >, =, <=, >=, <>

 

    Given conditions C, C', also get C and C', C or C', and finally:  not C.

 

Example:  Give the cid and cname of all customers living in Dallas with discount greater than 8.

 

    (CUSTOMERS where city = 'Dallas' and discnt > 8) [cid, cname]