CS430/630 Homework 3: RA/SQL Division, Outer Join, E-R

For scripts, use Oracle, and also optionally Mysql, in which case name the output samename_mysql.txt. For Oracle, use "set echo on" so the SQL is shown in the output.

1. Relational Division. Using schemas from the provided createdb.sql. For Oracle but not mysql, we can use MINUS for EXCEPT. Provide RA and SQL.

a. Find parts (pids, pnames and colors) that are supplied by all suppliers. Give RA and SQL

b. Find parts (pids, pnames and colors) that are supplied by all suppliers with names starting with 'A' (use sname >= 'A' and sname < 'B' for RA, LIKE for SQL).

c. Find employees (eid and salary) certified for all aircraft of cruising range under 2000.

d. Among employees (eid and salary) certified for all aircraft of cruising range under 2000, find the one(s) with maximal salary (SQL only)

Put the SQL in problem1.sql and its output in problem1.txt, and optionally for mysql in problem1_mysql.txt, as we did for hw2. Submit the RA on paper.

2. Outer Join. Do Problem 6 of Chapter 4 of Murach, and problem 3 of Chap. 6 of Murach. To test this, load the Invoices, etc. tables into Oracle using $cs630/murach/create_ap_tables.sql. This file can be copied to your current directory on topcat by the command (don't forget the dot at the end):

cp /data/htdocs/cs630/murach/create_ap_tables.sql .

Here is the text of these problems:

Problem 6 of Chap. 4


-- Write a select statement that returns two columns from the
-- General_Ledger_Accounts table: account_number and accounr_description. The
-- result set should have one row for each account number that has never been
-- used. Sort the final result set by account-number. Hint: Use an outer
-- join to the Invoice_Line_Items table.

 

Problem 3 of Chap. 6

-- write a select statement that returns two columns from the general ledger
-- accounts table: account_number and account_description. The result should
-- have one row for each account number that has never been used. Use a
-- subquery introduced with the NOT EXISTS operator, and sort the final result
-- by account number.

Put the SQL in problem2.sql and its output in problem2.txt. Don't worry about mysql for this problem. The script is full of Oracle-only data types.

3. E-R Study the solved problem 3.13 in R&G and the solved problem 2.3 on which it is based. The R&G solutions (odd numbers only) are available at the site linked from the class web page under Textbook Resources. The solved problem 2.3 lists the relevant design points and displays the E-R diagram (Figure 2.1). The E-R diagram gives names to the relationships in its diamonds, specifically work_in, Manages, ..., Runs, Work_proj listed in the tables below. Note that relationship Supervises is not listed since it is more complicated.

a. Examine the design points in the problem statement and use them to classify each relationship except Supervises as to whether it is a N-1 (many-to-one or one-to-many) or N-N (many-to-many) binary relationship. Then examine the E-R diagram of Figure 2.1 of the solutions, and record what it says (N-1 or N-N) for the relationship, hopefully in agreement with the design points. To answer this question, fill out the following table and include it in your paper document. Three parts are already filled in to get you started.

Relationship Design Point for left-to-right or top-to-bottom Direction Design Point for right-to-left or bottom-to-top Direction Conclusion from design points:  N-1 or N-N? From E-R diagram: N-1 or N-N?
work_in Profs can manage and/or work on multiple projects
Each project is worked on by one or more profs N-N N-N
Manages Profs can manage and/or work on multiple projects Each project is managed by one prof. N-1 N-1
Work_dept



Runs



Major none: department has grad students is assumed
Grad students have one major department
N-1 N-1
Work_proj



Advisor




b. Implementation of this database. For each N-1 relationship listed in part a., figure out what FK could be used to implement it. Note the discussion on pp. 78-79 about two ways to implement a N-1 relationship, a whole relationship table like Manages with a single-column PK, or an added FK column on the "many" side Dept, making a Dept_Mgr table as on pg. 79. The solution of problem 3.13 does neither, wrongly proposing relationship tables with two-column PKs. We want to replace these two-column PKs with one-column PKs in the relationship table, to better implement the N-1 relationships. Show how to do this by filling in the second column of the following table. Then also show how to alternatively implement the N-1 relationship by adding a column (if necessary) to one entity table with a FK to the other table, thus avoiding a relationship table altogether.

Relationship N-1 or N-N If N-1, name of relationship table and corrected PK for it (one column only: give column name) If N-1, Table to add column and FK to, to avoid relationship table, and what column to add If N-1, FK constraint to add to that table
work_in N-N



Manages N-1
manages: PK should be (pid)
project, new column manager
FK (manager) references professor
Work_dept



Runs



Major N-1
(no relationship table specified)
graduates already has column major
FK (major) references depts
Work_proj



Advisor



c. In relationship Manages, determine what is allowed by the two-column PK on the relationship table (as in the 3.13 solution) that is disallowed by the single-column PK on the relationship table. Specifically, which of the following is disallowed:
i. Two professors managing a certain project.
ii. Two Projects managed by one professor.

d. For relationship Manages, explain how you know the answer to c. from the E-R diagram alone.

e. In relationship Manages, explain what is meant by the thick line from project to Manages. Specifically, which of the following is meant:
i. Every professor manages some project.
ii. Every project has a manager (a professor).

4. More on Yelp_db.
a. Provide an E-R diagram like the one used in problem 3. for the Yelp database tables review, yuser, business, and category, based on their database constraints. Treat review as an entity, since it has its own unique id and lots of important attributes. Show the thin-line arrows and thick-line arrows where appropriate. You don't actually have enough information as to where to put the solid lines that aren't arrows, so don't worry about that. You don't need to show all the attributes, just the most important four or five of them per entity.

b. More queries: show the SQL for these in your paper delivery. Optionally run them on Oracle or mysql.

i. For Las Vegas, NV, find the number of businesses with "Hookah" in their name, and their average rating, by neighborhood, reporting only neighborhoods with at least 5 such businesses. Report neighborhood, count and average rating.
ii. Find the businesses by name that are reviewed by Macaroni (a user with this name) and no more than 3 other users.
iii. For each business in Las Vegas, NV, in neighborhood "The Strip", with business name starting with "Z" (to save on output), find the user (by name) who rates it highest (highest stars value), including ties. Here the rating that matters is in the review table. Report business name and its highest rater name.
The following two basic statistical queries are optional for CS430. Students in CS630 should run the queries as well as provide them in the paper. Put the output in problem4.txt and problem4_mysql.txt.
iv. Read about histogram queries in Lecture08. Investigate extract(datetime) for Oracle and mysql. Write a portable query (good for both Oracle and Mysql) for a histogram of the various years that users starting yelping, i.e., a query for year (like 2014) and the number of users that started yelping that year, as listed in column yelping_since of yuser. Note that both databases allow extract(...) in the GROUP BY, but for mysql you may need to put sql_mode = ''; (See Lecture 8, slides 26-27).
v. Investigate the distribution of businesses in Las Vegas by latitude by getting data for a histogram with buckets of width .025 degree of latitude away from 36 degrees of latitude, so bucket# = floor((latitude -36)/.025). Make sure the data is listed in latitude order.
vi Optional challenge. Seeing that this latitude distribution looks more or less like a normal distribution, get the database to compute its standard deviation. Note the one data point in the Southern Hemisphere (latitude < 0), shown in your histogram. Recalculate with "and latitude > 0" to see the outsize effect of this one bad point on the standard deviation value.

5. Install Java if necessary. Report on your progress, in your paper document:

a.  What is your development system OS?  Windows (if so, version 7, 8, or 10?), Linux, or Mac? 

b. Did you download Java, or did you already have a recent Java installation? Report any problems you had downloading or installing.

c. What version of Java did you end up with?  Show javac -version output.


For Delivery:

Paper delivery: Please make sure the answers are in this order.

RA for problem 1
Tables and other answers for problem3
E-R diagram and SQL queries for problem 4
Problem 5

Files in cs630/hw3:

problem1.sql, problem1.txt, optional problem1_mysql.txt
problem2.sql, problem2.txt
problem4.txt, problem4_mysql.txt, both optional for cs430.