CS 430/630 Homework 6 FDs, Normalization, Views, Access Control

due Wed., Dec. 11 in class, on paper

Note 1. An essential computation here is an attribute closure. Here is an example from slide 11 of Lecture 20:
FDs given and numbered: (1) C -> CSJDPQV, (2) JP -> C, (3) SD ->  P
Compute SDJ+, attribute closure of SDJ:
  SDJ+ = SDJ,  = SDJP  by (3), = SDJPC by (2), = SDJPCR by (1), = R (all attributes)

This means we started with SDJ, then added P by FD#3, then R by FD#1.  Here we can conclude that SDJ is a superkey. You can use this abbreviated notation in the homework.

Note 2. BCNF and 3NF are equivalent for tables with only single-column key(s), i.e., the majority of tables in practice. In fact, the only tables that are 3NF but not BCNF have at least one multi-column key with an inbound FD to part of the key, making them even rarer in practice. However, grad students should make an effort to understand the two definitions in R&G, sections 19.4.1 and 19.4.2. Undergrads may restrict their expertise to tables with possibly multiple one-column keys, in which case 3NF is equivalent to BCNF. Note that the BCNF decomposition algorithm (Sec. 19.6.1) is still relevant for 3NF decompositions: just look for violations of 3NF, not BCNF in choosing the FD to drive the decomposition step. If you end up with a lost FD (not preserved in one of the tables), add back a table as described by the bullets on pg. 627 under Dependency-Preserving Decomposition into 3NF.

1. Finding FDs. Exercise 19.3, which is solved online. Do those 2 parts and add a third part,
    3. Change the original contents by changing the lower two y1 values in the Y column to y2, leaving the upper two equal to y1. List all the FDs this new instance satisfies.

2. Exercise 19.2 Finding keys, determining 3NF, BCNF.
a. Replace the given FDs with the set  Given (1) A → B (2) BC → D, and (3) E → AC. This will yield a single single-column key, so we can be sure that 3NF status will be the same as BCNF status, i.e., the table is not in BCNF and also not in 3NF, or is in both BCNF and 3NF. If it's not in 3NF, say also whether it is in 2NF or not.

b.  (CS630 only) Use the original set of FDs: Given (1) A → B (2) BC → E, and (3)  ED → A
This will yield multiple multicolumn keys, so 3NF statis may differ from BCNF.

3. FDs, decomposition. Here is a table for classes at a college, with numbered time periods in each day:

Name

Department

Room

Period

Java 1

CS

100

5

Java 2

CS

110

4

Data Structures

CS

100

6

Calc 1

Math

350

6

Calc 2

Math

390

6

Linear Algebra

Math

350

4

 

a.       Find a functional dependency (FD) with one attribute (other than Name) on each side that holds for this instance.

 

b.      Disprove the proposed FD “department period -> room”

c.       Find a single-column key and a multi-column key that hold for this instance

d.    Is this table in 2NF? Why or why not?

e.      Propose a decomposition that removes the redundancy implied by the FD you found in a. Show the smaller table in full and state what column can be dropped in the larger table compared to the original table. Hint: this is a lot like the R -> W dependency example of Figures 19.1 and 19.2 and in the slides of Lecture 19.

4. Exercise 19.10 (a. and b.) Lossless Joins. For part b., a good decomposition means it provides lossless join(s) and preserves FDs. For each case 1-4, first check if the join(s) are lossless and give up if not. If lossless, determine whether the FDs are preserved.

5. Decompositions. Find a BCNF decomposition of the relation of Problem 19.10 for parts 1. and 2., using the FD sets given for each case. Part 2 is optional since it involves an unusual situation where BCNF and 3NF are possibly different: it has two two-column keys. Analyze the BCNF decomposition for unpreserved dependencies. If you find a 3NF decomposition that preserves dependencies along the way, report it. Note that the first step is to find the key(s), so you can test FDs as described on pp. 616-618.

6. Views
Consider a database schema with three relations:

emp (eid:integer, ename:string, age:integer, salary:decimal(10,2))
works (eid:integer, did:integer, pct_time:integer)
dept(did:integer, dname:string, budget:decimal(10,2), managerid:integer)

The keys are underlined in each relation. Relation emp stores employee information such as unique identifier eid, employee name ename, age and salary. Relation dept stores the department unique identifier did, unique department name dname, the department budget and managerid which is the eid of the employee who is managing the department. The managerid value must always be found in the eid field of a record of the emp relation. The works relation tracks which employee works in which department, and what percentage of the time s/he allocates to that department. Each eid and each did in works appears in emp or dept respectively. Note that, an employee can work in several departments. Note that this setup is almost exactly what we have specified in createdb.sql, where "string" is translated to char or varchar.

a. Explain how to change the create tables in createdb.sql to match these specifications.
b. Create a view ManagerSummary that lists for every department the department name, manager ID and manager name, manager salary and the number of employees in that department. The view will have five columns with headings: DeptName, MgrID, MgrName, MgrSalary and EmpCount. Put this SQL in createview.sql and drop-view in dropview.sql in your cs630/hw6 directory and show them in your paper.
c. Query the view above to retrieve the set of distinct salaries of managers who manage a department called “Sales”. Put this SQL in queryview1.sql.
d. Query the view above to find the name of the manager who manages most employees. If the same employee works in several departments, that employee is counted once in each of the departments. The manager is included in the count the same as all other employees, i.e., based on his or her records in the Works table. Put this SQL in queryview2.sql. Make a script problem6.sql of dropping and then creating the view, then doing the two queries. Supply the output in your paper, along with the contents of all the .sql files involved in this problem.

7. Views for Access Control. Suppose you need to provide access to the employees table without the salary information, to user clerk. Give the code to define a view empinfo(eid, ename) that provides this information from table employees, which is in schema HR for this problem. The view is also in schema HR. Give the command to grant select access to empinfo for user clerk, while keeping employees protected from clerk. Assume you can login to account HR, so you have full privileges on employees and empinfo. We don't need to do anything special about employees to hide it from user clerk because by default it is not visible to other users. You can test the view online but not the granted access, so just record everything in your paper.