cs430/630 hw3

For scripts, use Oracle, and optionally also MySQL, in which case name the MySQL output samename_mysql.txt. For Oracle, use "set echo on" so the SQL is shown in the output.

1. Relational Division. Use the schemas from the provided createdb.sql. In Oracle (but not MySQL), MINUS can be used in place of EXCEPT. Provide RA and SQL.

a. Find parts (pids, pnames and colors) that are supplied by all suppliers. Give RA and SQL.

b. Find parts (pids, pnames and colors) that are supplied by all suppliers with names starting with 'A' (use sname >= 'A' and sname < 'B' for RA, LIKE for SQL).

c. Find employees (eid and salary) certified for all aircraft of cruising range under 2000.

d. Among employees (eid and salary) certified for all aircraft of cruising range under 2000, find the one(s) with maximal salary (SQL only).

Put the SQL in problem1.sql and its output in problem1.txt, and optionally for mysql in problem1_mysql.txt, as we did for hw2. Submit the RA on paper.
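As a reminder of how division translates to SQL, here is a minimal sketch on a made-up toy schema (the tables course and takes are hypothetical and are not part of createdb.sql). It uses Python's sqlite3 module as a stand-in for Oracle so it can be run anywhere; SQLite spells set difference EXCEPT, where Oracle would use MINUS. The query "students who take all courses" is written two ways: with a double NOT EXISTS, and with set difference following the usual RA form.

```python
import sqlite3

# Hypothetical toy schema, for illustration only (not the createdb.sql schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course(cname TEXT PRIMARY KEY);
    CREATE TABLE takes(student TEXT,
                       cname TEXT REFERENCES course(cname),
                       PRIMARY KEY (student, cname));
    INSERT INTO course VALUES ('db'), ('os');
    INSERT INTO takes VALUES ('ann', 'db'), ('ann', 'os'), ('bob', 'db');
""")

# Division by double negation: keep a student if there is no course
# for which that student has no takes row.
not_exists_sql = """
    SELECT DISTINCT t.student
    FROM takes t
    WHERE NOT EXISTS (
        SELECT 1 FROM course c
        WHERE NOT EXISTS (
            SELECT 1 FROM takes t2
            WHERE t2.student = t.student AND t2.cname = c.cname))
    ORDER BY t.student
"""

# Division by set difference, following the RA form:
# all students, minus students that have at least one missing (student, course) pair.
except_sql = """
    SELECT student FROM takes
    EXCEPT
    SELECT student FROM (
        SELECT s.student, c.cname
        FROM (SELECT DISTINCT student FROM takes) s, course c
        EXCEPT
        SELECT student, cname FROM takes)
    ORDER BY student
"""

by_not_exists = [row[0] for row in conn.execute(not_exists_sql)]
by_except = [row[0] for row in conn.execute(except_sql)]
print(by_not_exists, by_except)
```

With the sample rows above, both queries return only ann, since bob is missing the course os. Restricting the divisor (as in parts b and c) would amount to adding a condition on the course side of each query.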

2. Outer Join. Do Problem 6 of Chapter 4 and Problem 3 of Chapter 6 of Murach. To test your queries, load the Invoices, etc. tables into Oracle using $cs630/murach/create_ap_tables.sql. This file can be copied to your current directory on topcat by the command (don't forget the dot at the end):

-- Write a select statement that returns two columns from the
-- General_Ledger_Accounts table: account_number and account_description. The
-- result set should have one row for each account number that has never been
-- used. Sort the final result set by account_number. Hint: Use an outer
-- join to the Invoice_Line_Items table.

-- Write a select statement that returns two columns from the
-- General_Ledger_Accounts table: account_number and account_description. The
-- result should have one row for each account number that has never been used.
-- Use a subquery introduced with the NOT EXISTS operator, and sort the final
-- result by account_number.

Put the SQL in problem2.sql and its output in problem2.txt. Don't worry about MySQL for this problem: the script is full of Oracle-only data types.
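The two Murach problems ask for the same "never used" result in two styles. The sketch below shows both styles side by side on a hypothetical pair of tables (dept and emp, invented for this illustration and unrelated to the AP tables), again using Python's sqlite3 as a stand-in for Oracle. The outer join keeps every row of the left table and the IS NULL test picks out the rows that found no match; the NOT EXISTS subquery asks the same question directly.

```python
import sqlite3

# Hypothetical tables for illustration only (not the Murach AP tables).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept(dno INTEGER PRIMARY KEY, dname TEXT);
    CREATE TABLE emp(eno INTEGER PRIMARY KEY,
                     dno INTEGER REFERENCES dept(dno));
    INSERT INTO dept VALUES (10, 'sales'), (20, 'audit'), (30, 'lab');
    INSERT INTO emp VALUES (1, 10), (2, 10), (3, 30);
""")

# Style 1: left outer join, then keep the rows that matched nothing.
outer_join_sql = """
    SELECT d.dno, d.dname
    FROM dept d LEFT OUTER JOIN emp e ON d.dno = e.dno
    WHERE e.dno IS NULL
    ORDER BY d.dno
"""

# Style 2: correlated subquery introduced with NOT EXISTS.
not_exists_sql = """
    SELECT d.dno, d.dname
    FROM dept d
    WHERE NOT EXISTS (SELECT 1 FROM emp e WHERE e.dno = d.dno)
    ORDER BY d.dno
"""

unused_by_join = conn.execute(outer_join_sql).fetchall()
unused_by_subquery = conn.execute(not_exists_sql).fetchall()
print(unused_by_join, unused_by_subquery)
```

On this sample data both queries report only department 20, the one department with no employees.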

3. E-R. Study the solved problem 3.13 in R&G and the solved problem 2.3 on which it is based. The R&G solutions (odd numbers only) are available at the site linked from the class web page under Textbook Resources. The solved problem 2.3 lists the relevant design points and displays the E-R diagram (Figure 2.1). The E-R diagram gives names to the relationships in its diamonds, specifically work_in, Manages, ..., Runs, Work_proj, as listed in the tables below. Note that the relationship Supervises is not listed since it is more complicated.

a. Examine the design points in the problem statement and use them to classify each relationship except Supervises as either an N-1 (many-to-one or one-to-many) or an N-N (many-to-many) binary relationship. Then examine the E-R diagram of Figure 2.1 of the solutions, and record what it says (N-1 or N-N) for the relationship, hopefully in agreement with the design points. To answer this question, fill out the following table and include it in your paper document. Three parts are already filled in to get you started.

b. Implementation of this database. For each N-1 relationship listed in part a., figure out what FK could be used to implement it. Note the discussion on pp. 78-79 about two ways to implement an N-1 relationship: a whole relationship table like Manages with a single-column PK, or an added FK column on the "many" side Dept, making a Dept_Mgr table as on pg. 79. The solution of problem 3.13 does neither, wrongly proposing relationship tables with two-column PKs. We want to replace these two-column PKs with one-column PKs in the relationship tables, to better implement the N-1 relationships. Show how to do this by filling in the second column of the following table. Then also show how to alternatively implement the N-1 relationship by adding a column (if necessary) to one entity table with a FK to the other table, thus avoiding a relationship table altogether.

c. In relationship Manages, determine what is allowed by the two-column PK on the relationship table (as in the 3.13 solution) that is disallowed by the single-column PK on the relationship table. Specifically, which of the following is disallowed:
i. Two professors managing a certain project.
ii. Two Projects managed by one professor.

d. For relationship Manages, explain how you know the answer to c. from the E-R diagram alone.

e. In relationship Manages, explain what is meant by the thick line from project to Manages. Specifically, which of the following is meant:
i. Every professor manages some project.
ii. Every project has a manager (a professor).
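One way to explore questions like part c. is to let a database enforce the two key choices and watch which inserts are rejected. The sketch below does this on a hypothetical relationship "leads" between leaders and teams (names invented for illustration; this is not the R&G schema), using Python's sqlite3 as a stand-in for Oracle. One copy of the relationship table has a two-column PK, the other a one-column PK on the team column.

```python
import sqlite3

# Hypothetical relationship table in two variants, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leads_two_col(leader TEXT, team TEXT,
                               PRIMARY KEY (leader, team));
    CREATE TABLE leads_one_col(leader TEXT, team TEXT PRIMARY KEY);
""")

def count_accepted(table, rows):
    """Insert rows one at a time; return how many the primary key accepted."""
    accepted = 0
    for row in rows:
        try:
            # Table names here are fixed literals from this script, not user input.
            conn.execute(f"INSERT INTO {table} VALUES (?, ?)", row)
            accepted += 1
        except sqlite3.IntegrityError:
            pass  # rejected by the primary key constraint
    return accepted

two_leaders_one_team = [("ann", "t1"), ("bob", "t1")]
one_leader_two_teams = [("cat", "t2"), ("cat", "t3")]

results = {
    ("two_col", "two leaders"): count_accepted("leads_two_col", two_leaders_one_team),
    ("one_col", "two leaders"): count_accepted("leads_one_col", two_leaders_one_team),
    ("two_col", "two teams"): count_accepted("leads_two_col", one_leader_two_teams),
    ("one_col", "two teams"): count_accepted("leads_one_col", one_leader_two_teams),
}
for case, accepted in results.items():
    print(case, accepted)
```

The only pair of rows that loses an insert is the one that repeats a value in the single-column key; the same experiment can be repeated with the key placed on the other column to compare.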

4. More on Yelp_db.
a. Provide an E-R diagram like the one used in problem 3 for the Yelp database tables review, yuser, business, and category, based on their database constraints. Treat review as an entity, since it has its own unique id and lots of important attributes. Show the thin-line arrows and thick-line arrows where appropriate. You don't actually have enough information as to where to put the solid lines that aren't arrows, so don't worry about that. You don't need to show all the attributes, just the most important four or five of them per entity.

b. More queries: show the SQL for these in your paper delivery. Optionally run them on Oracle or mysql.

i. For Las Vegas, NV, find the number of businesses with "Hookah" in their name, and their average rating, by neighborhood, reporting only neighborhoods with at least 5 such businesses. Report neighborhood, count and average rating.

ii. Find the businesses by name that are reviewed by Macaroni (a user with this name) and no more than 3 other users.

iii. For each business in Las Vegas, NV, in neighborhood "The Strip", with business name starting with "Z" (to save on output), find the user (by name) who rates it highest (highest stars value), including ties. Here the rating that matters is in the review table. Report business name and its highest rater name.

The following two basic statistical queries are optional for CS430. Students in CS630 should run the queries as well as provide them in the paper. Put the output in problem4.txt and problem4_mysql.txt.

iv. Read about histogram queries in Lecture08. Investigate extract(datetime) for Oracle and MySQL. Write a portable query (good for both Oracle and MySQL) for a histogram of the various years that users started yelping, i.e., a query for year (like 2014) and the number of users that started yelping that year, as listed in column yelping_since of yuser. Note that both databases allow extract(...) in the GROUP BY, but for MySQL you may need to put sql_mode = ''; (See Lecture 8, slides 26-27).

v. Investigate the distribution of businesses in Las Vegas by latitude by getting data for a histogram with buckets of width .025 degree of latitude away from 36 degrees of latitude, so bucket# = floor((latitude -36)/.025). Make sure the data is listed in latitude order.

vi. Optional challenge. Seeing that this latitude distribution looks more or less like a normal distribution, get the database to compute its standard deviation. Note the one data point in the Southern Hemisphere (latitude < 0), shown in your histogram. Recalculate with "and latitude > 0" to see the outsize effect of this one bad point on the standard deviation value.
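To see what the bucket formula in part v. does to individual values before writing the SQL, here is a small Python sketch. The latitudes are made-up sample values, not Yelp data, and the function name bucket is just for this illustration. It applies floor((latitude - 36)/.025) and counts the values per bucket, listing buckets in increasing order, which is also latitude order.

```python
import math
from collections import Counter

def bucket(latitude, origin=36.0, width=0.025):
    """Bucket number for a latitude: floor((latitude - origin) / width)."""
    return math.floor((latitude - origin) / width)

# Made-up sample latitudes near 36 degrees (not real Yelp data).
sample_latitudes = [36.01, 36.03, 36.04, 35.99, 36.12]

# Count values per bucket, then sort by bucket number (i.e. by latitude).
histogram = sorted(Counter(bucket(lat) for lat in sample_latitudes).items())

for bucket_no, count in histogram:
    low_edge = 36.0 + bucket_no * 0.025   # lower latitude edge of this bucket
    print(bucket_no, round(low_edge, 3), count)
```

Note that floor sends latitudes just below 36 to bucket -1 rather than 0, so the buckets on either side of 36 degrees stay the same width; truncating toward zero instead would merge them into one double-width bucket.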

a. What is your development system OS? Windows (if so, version 7, 8, or 10?), Linux, or Mac?

b. Did you download Java, or did you already have a recent Java installation? Report any problems you had downloading or installing.

RA for problem 1
Tables and other answers for problem3
E-R diagram and SQL queries for problem 4
Problem 5

problem1.sql, problem1.txt, optional problem1_mysql.txt
problem2.sql, problem2.txt
problem4.txt, problem4_mysql.txt, both optional for cs430.

Relationship	Design Point for left-to-right or top-to-bottom Direction	Design Point for right-to-left or bottom-to-top Direction	Conclusion from design points: N-1 or N-N?	From E-R diagram: N-1 or N-N?
work_in	Profs can manage and/or work on multiple projects	Each project is worked on by one or more profs	N-N	N-N
Manages	Profs can manage and/or work on multiple projects	Each project is managed by one prof.	N-1	N-1
Work_dept
Runs
Major	none: department has grad students is assumed	Grad students have one major department	N-1	N-1
Work_proj
Advisor

Relationship	N-1 or N-N	If N-1, name of relationship table and corrected PK for it (one column only: give column name)	If N-1, Table to add column and FK to, to avoid relationship table, and what column to add	If N-1, FK constraint to add to that table
work_in	N-N
Manages	N-1	manages: PK should be (pid)	project, new column manager	FK (manager) references professor
Work_dept
Runs
Major	N-1	(no relationship table specified)	graduates already has column `major`	FK (major) references depts
Work_proj
Advisor

CS430/630 Homework 3: RA/SQL Division, Outer Join, E-R