### CS430/630 Homework 3: RA/SQL Division, Outer Join, E-R, Views

For scripts, use Oracle, and also optionally Mysql, in which case name the output samename_mysql.txt. For Oracle, use "set echo on" so the SQL is shown in the output.

1. Relational Division. Using schemas from the provided createdb.sql. For Oracle but not mysql, we can use MINUS for EXCEPT. Provide RA and SQL.

a. Find parts (pids, pnames and colors) that are supplied by all suppliers. Give RA and SQL

b. Find parts (pids, pnames and colors) that are supplied by all suppliers with names starting with 'A' (use sname >= 'A' and sname < 'B' for RA, LIKE for SQL).

c. Find employees (eid and salary) certified for all aircraft of cruising range under 2000.

d. Among employees (eid and salary) certified for all aircraft of cruising range under 2000, find the one(s) with maximal salary (SQL only)

Put the SQL in problem1.sql and its output in problem1.txt, and optionally for mysql in problem1_mysql.txt, as we did for hw2. Submit the RA on paper.

2. Outer Join. Do Problem 6 of Chapter 4 of Murach, and problem 3 of Chap. 6 of Murach. To test this, load the Invoices, etc. tables into Oracle using \$cs630/murach/create_ap_tables.sql. This file can be copied to your current directory on topcat by the command (don't forget the dot at the end):

```cp /data/htdocs/cs630/murach/create_ap_tables.sql .```

Put the SQL in problem2.sql and its output in problem2.txt. Don't worry about mysql for this problem. The script is full of Oracle-only data types.

3. E-R Study the solved problem 3.13 in R&G and the solved problem 2.3 on which it is based. The R&G solutions (odd numbers only) are available at the site linked from the class web page under Textbook Resources. The solved problem 2.3 lists the relevant design points and displays the E-R diagram (Figure 2.1). The E-R diagram gives names to the relationships in its diamonds, specifically work_in, Manages, ..., Runs, Work_proj listed in the tables below. Note that relationship Supervises is not listed since it is more complicated.

a. Examine the design points in the problem statement and use them to classify each relationship except Supervises as to whether it is a N-1 (many-to-one or one-to-many) or N-N (many-to-many) binary relationship. Then examine the E-R diagram of Figure 2.1 of the solutions, and record what it says (N-1 or N-N) for the relationship, hopefully in agreement with the design points. To answer this question, fill out the following table and include it in your paper document. Three parts are already filled in to get you started.

 Relationship Design Point for left-to-right or top-to-bottom Direction Design Point for right-to-left or bottom-to-top Direction Conclusion from design points:  N-1 or N-N? From E-R diagram: N-1 or N-N? work_in Profs can manage and/or work on multiple projects Each project is worked on by one or more profs N-N N-N Manages Profs can manage and/or work on multiple projects Each project is managed by one prof. N-1 N-1 Work_dept Runs Major none: department has grad students is assumed Grad students have one major department N-1 N-1 Work_proj Advisor

b. Implementation of this database. For each N-1 relationship listed in part a., figure out what FK could be used to implement it. Note the discussion on pp. 78-79 about two ways to implement a N-1 relationship, a whole relationship table like Manages with a single-column PK, or an added FK column on the "many" side Dept, making a Dept_Mgr table as on pg. 79. The solution of problem 3.13 does neither, wrongly proposing relationship tables with two-column PKs. We want to replace these two-column PKs with one-column PKs in the relationship table, to better implement the N-1 relationships. Show how to do this by filling in the second column of the following table. Then also show how to alternatively implement the N-1 relationship by adding a column (if necessary) to one entity table with a FK to the other table, thus avoiding a relationship table altogether.

 Relationship N-1 or N-N If N-1, name of relationship table and corrected PK for it (one column only: give column name) If N-1, Table to add column and FK to, to avoid relationship table, and what column to add If N-1, FK constraint to add to that table work_in N-N Manages N-1 manages: PK should be (pid) project, new column manager FK (manager) references professor Work_dept Runs Major N-1 (no relationship table specified) graduates already has column `major` FK (major) references depts Work_proj Advisor

c. In relationship Manages, determine what is allowed by the two-column PK on the relationship table (as in the 3.13 solution) that is disallowed by the single-column PK on the relationship table. Specifically, which of the following is disallowed:
i. Two professors managing a certain project.
ii. Two Projects managed by one professor.

d. For relationship Manages, explain how you know the answer to c. from the E-R diagram alone.

e. In relationship Manages, explain what is meant by the thick line from project to Manages. Specifically, which of the following is meant:
i. Every professor manages some project.
ii. Every project has a manager (a professor).

4. More on Yelp_db.
a. Provide an E-R diagram like the one used in problem 3. for the Yelp database tables review, yuser, business, and category, based on their database constraints. Treat review as an entity, since it has its own unique id and lots of important attributes. Show the thin-line arrows and thick-line arrows where appropriate. You don't actually have enough information as to where to put the solid lines that aren't arrows, so don't worry about that. You don't need to show all the attributes, just the most important four or five of them per entity.

b. More queries: show the SQL for these in your paper delivery. Optionally run them on Oracle or mysql.

i. For Las Vegas, NV, find the number of businesses with "Hookah" in their name, and their average rating, by neighborhood, reporting only neighborhoods with at least 5 such businesses. Report neighborhood, count and average rating.
ii. Find the businesses by name that are reviewed by Macaroni (a user with this name) and no more than 3 other users.
iii. For each business in Las Vegas, NV, in neighborhood "The Strip", with business name starting with "Z" (to save on output), find the user (by name) who rates it highest (highest stars value), including ties. Here the rating that matters is in the review table. Report business name and its highest rater name.
The following two basic statistical queries are optional for CS430. Students in CS630 should run the queries as well as provide them in the paper. Put the output in problem4.txt and problem4_mysql.txt.
iv. Read about histogram queries in Lecture08. Investigate extract(datetime) for Oracle and mysql. Write a portable query (good for both Oracle and Mysql) for a histogram of the various years that users starting yelping, i.e., a query for year (like 2014) and the number of users that started yelping that year, as listed in column yelping_since of yuser. Note that both databases allow extract(...) in the GROUP BY, but for mysql you may need to put sql_mode = ''; (See Lecture 8, slides 26-27).
v. Investigate the distribution of businesses in Las Vegas by latitude by getting data for a histogram with buckets of width .025 degree of latitude away from 36 degrees of latitude, so bucket# = floor((latitude -36)/.025). Make sure the data is listed in latitude order.
vi Optional challenge. Seeing that this latitude distribution looks more or less like a normal distribution, get the database to compute its standard deviation. Note the one data point in the Southern Hemisphere (latitude < 0), shown in your histogram. Recalculate with "and latitude > 0" to see the outsize effect of this one bad point on the standard deviation value.

5. Install Java if necessary. Report on your progress, in your paper document:

a.  What is your development system OS?  Windows (if so, version 7, 8, or 10?), Linux, or Mac?

c. What version of Java did you end up with?  Show javac -version output.

6. Views

Consider a database schema with three relations:
```emp (eid:integer, ename:string, age:integer, salary:decimal(10,2)) works (eid:integer, did:integer, pct_time:integer) dept(did:integer, dname:string, budget:decimal(10,2), managerid:integer)```

The keys are underlined in each relation. Relation emp stores employee information such as unique identifier eid, employee name ename, age and salary. Relation dept stores the department unique identifier did, unique department name dname, the department budget and managerid which is the eid of the employee who is managing the department. The managerid value must always be found in the eid field of a record of the emp relation. The works relation tracks which employee works in which department, and what percentage of the time s/he allocates to that department. Each eid and each did in works appears in emp or dept respectively. Note that, an employee can work in several departments. Note that this setup is almost exactly what we have specified in createdb.sql, where "string" is translated to char or varchar.

a. Explain how to change the create tables in createdb.sql to match these specifications. Put this answer in the paper submission.
b. Create a view ManagerSummary that lists for every department the department name, manager ID and manager name, manager salary and the number of employees in that department. The view will have five columns with headings: DeptName, MgrID, MgrName, MgrSalary and EmpCount. Put this SQL in createview.sql and drop-view in dropview.sql.
c. Query the view above to retrieve the set of distinct salaries of managers who manage a department called “Sales”. Put this SQL in queryview1.sql.
d. Query the view above to find the name of the manager who manages most employees. If the same employee works in several departments, that employee is counted once in each of the departments. The manager is included in the count the same as all other employees, i.e., based on his or her records in the Works table. Put this SQL in queryview2.sql.

Make a script problem6.sql of dropping and then creating the view, then doing the two queries. Supply the output in problem6.txt.

For Delivery: