Class 2.
New Students? Notes, hw1 up on CS630 Web Page: http://www.cs.umb.edu/cs630
OK, Now start Chapter 2. Sit with someone who has the text. Relational concepts & Relational Algebra.
Section 2.1. Look at pg. 28. CUSTOMERS, AGENTS, PRODUCTS, ORDERS. CAP database, collection of computerized records maintained for common purpose.
- Possible to have two databases on same machine used by same people (student records and University library contents)
- Go over meanings of column-names, p 27.
- Note that cid is unique for customers, cname is NOT.
- See how calculate dollars value in orders: qty of orders row times price
in products row for that pid and take discnt for customer row for cid. But we don't want to do that every time we ask for the dollars value so we carry the value in the orders table.
- Note too that table is unrealistically small: more columns (name, straddr, contact, timestamp for orders), more rows, more tables (keep track of billing, salaries, taxes)
Section 2.2. Terminology. Studying Rel Alg now, SQL next chapter.
Table (relation) (Old: file of records).
Column names (attributes) (Old: field names of records)
Rows (tuples) (Old: records of a file).
Table heading (Schema) (Old: set of attributes).
Set of columns is normally relatively constant, part of mental model used for queries, rows are changeable and not kept track of by user (queried).
- Note how we say Contents at a given moment on pg. 28, CAP. This turns out to be important when asked to make up a query to answer a question, query must still answer the question even if all the data changes.
This rule is sometimes called Program-Data Independence — mentioned in Chapter 1; used to be file of records, but this is better: level of abstraction
Heading (Schema) of table. Head(CUSTOMERS) = {cid, cname, city, discnt} (can leave out { } and commas, just write: cid cname city discnt)..
Typically name a table T, S, R with subscripted attributes.
Head( T) = A1 A2 A3 A4 Note notation: set of attributes is just a LIST.
VERY IMPORTANT CONCEPT. The number of rows changes frequently and rows are not remembered by users; the columns usually DON'T change in number, many names are remembered, and USED TO POSE QUERIES.
Column type (Column Domain) A table is DECLARED in SQL and columns have certain TYPES: real, integer, char(13). (See pg 733 for types in SQL.)
Idea of Domain in Relational Algebra, is like an enumerated type.
Domain of City: all the city names in the U.S.
Domain of DIscnt: all real numbers between 0.00 and 20.00.
This concept is not implemented in commercial systems today; only have types such as char(13) and real. But assume enumerated types in Relational Algebra.
Say CID = Domain(cid), CNAME = Domain(cname), CITY and DISCNT are the domains for CUSTOMERS table columns, then consider:
CID ´ CNAME ´ CITY ´ DISCNT (Cartesian Product)
consisting of all tuples: (w, x, y, z), w in CID, x in CNAME, . . .
ALL POSSIBLE tuples: (c999, Beowulf, Saskatoon, 0.01)
A mathematical relation between these four domains is a subset of the Cartesian product. E.g., if have 4 attributes, A1, A2, A3, and A4 , and T is a relation such that Head(T) = A1 A2 A3 A4, then:
T Í Domain(A1) ´ Domain(A2) ´ Domain(A3) ´ Domain(A4)
T is said to relate a data item in Domain(A1) with one or more in Domain(A2), . . . ; a function is a special case of a relation.
Section 2.3. Relational Rules. Idea is to make all products have same characteristics. But a bit too mathematical, overlooking important implementation aspects, so some rules are broken by most products.
Rule 1. First Normal Form rule. Can't have multi-valued fields. (Repeating fields). See pg. 33. All relational products obey this rule (Oracle, etc.).
- But object-relational products break this rule
- Thus can't have employees table with column "dependents" which contains multiple dependent's names (Figure 2.3, pg. 33)
- Could create one table with duplicates on different rows (e.g., employees-dependents table join of employees and dependents), but this is bad for other reasons. (Figure 2.4, pg 33)
- Ends up meaning we have to create two tables and join them in later queries. (Figure 2.5, pg. 34)
— in OR model, Figure 2.3 is OK, but we are assuming relational -- no multi-valued fields.
Rule 2. Access rows by content only. Can't say: the third row down from the top. No order to the rows. (Also: No order to the columns.)
- Disallows "pointers" to rows, e.g. Row IDs (RIDs) or "refs".
- All relational products break this rule by allowing users to get at rows by RIDs; ( object-relational products have refs as part of syntax.)
Rule 3. Unique rows. Two tuples cannot be identical in all column values at once. A relation is an unordered SET of tuples (2 rows can't be same in all attributes). But many products allow this for efficiency of load.
In the current Chapter, Chapter 2, we will assume that all these rules hold perfectly.
Section 2.4. Keys and Superkeys.
Idea is that by intention of DBA, some set of columns in a table distinguishes rows (not an accident of data). E.g., cid, ordno. Singleton sets. But pname in PRODUCTS (pg 28) is unique as an accident of data!
In commercial product, DBA declares which set of columns has this property and it becomes impossible for two columns to have the same values in all these columns.
It is USEFUL to have such a key for a table, so other tables can refer to a row, e.g.: employee number, bank account ID.
In CAP database, keys are single columns: cid, aid, pid, and ordno.
But can have two or even three columns in a key.
Ex. Figure 2.5 key for dependents table is (eid, dependent)
Ex: customer emails, allowing multiple email accounts per customer, where the first one is preferred, next one is second to try, etc.
Since tables don’t have order to their rows, we have to hold the first-position, second-position info in a column, say “rank”:
cust_emails
|
custid |
rank |
address |
|
1 |
1 |
joe@cs.umb.edu |
|
1 |
2 |
joe123@gmail.com |
|
2 |
1 |
sue@cs.umb.edu |
key: (cid, rank)
The address looks like a key, but if the DBA doesn’t say so, it isn’t.
Note this shows how to properly attach 2 ordered values to the customer record, relationally
A superkey is a set of columns that has the uniqueness property, and a key is a minimal superkey: no subset of columns also has uniqueness property.
Don't let the word "superkey" confuse you. A superkey is just a set of columns that contains one or more keys!
A superkey for CUSTOMERS is: cid, cname; A key is: cid (alone)
Rigorous definitions on pg. 37, Def 2.4.1. (This is a rephrasing.)
Let A1, A2, . . . , An be the attributes of a table T (subset of Head(T)). Let S be a subset of these attributes, Ai1, Ai2, . . . , Aik. Then S is a SUPERKEY for T if the set S distinguishes rows of T by designer intention, i.e. if u and v are two rows then for some Aim in S, u[Aim] ≠ v[Aim] (new notation).
Consider the set of all columns in T. Is this a superkey? (Yes, by relational rule 3. Designer intention must accept this rule.)
A set of attributes K is a KEY (a CANDIDATE KEY) if it is a minimal superkey, i.e., no proper subset of K will also act as a superkey.
Note:
- A singleton column superkey (cid) is always a key. Go to definition.
- The set of all attributes of a table is always a superkey. (why?)