cs634 hw1

CS634 – Homework 1

Released Jan. 24, Due Mon., Feb. 5

Relationships, Creating and Loading Databases, Generated PKs (15 points)

 Recent edits are marked by yellow highlight

Instructions: The homework is due BEFORE CLASS on the duedate. Please hand in paper copies (either word-processor or hand-written copies are fine, as long as the hand writing is clear). You also need to create files which you must place in your UNIX course directory for cs634 in directory hw1.

Problem 1

  1. Copy the tables directory from $cs634 (/data/htdocs/cs634) to your cs634/hw1 directory, ending up with directory cs634/hw1/tables. Read its README and *.sh. Use its contents to load the sample databases from the book into your Oracle and mysql accounts.  Note that the Report any problems (other than the expected ones involving .bad files) as soon as possible.
  2. Determine why Oracle fails to load the three rows shown in the .bad files.  Explain both the catalog and suppliers rows—there are different reasons.  Examine how mysql handles these three rows: any output related to them in the load?  What do they look like when loaded?  Briefly summarize the root cause of the difference between how the two databases handle these loads.
  3. Edit createdb.sql by changing one digit of one number so that the loads on Oracle and mysql work without error and load the same data. Explain in your homework what that edit was.

Problem 2

  1. Draw an E-R diagram for emp-dept-works tables of createdb.sql and explain why the diagram should be drawn that way.
  2. Draw an E-R diagram for the Yelp database. See a table-level diagram at https://www.yelp.com/dataset/documentation/sql and schema linked from there. First figure out which table(s) qualify as relationship tables. Also show the E-R diagram with review as an entity.

Problem 3

We looked at the flights table in class and determined it wasn’t normalized.

  1. Draw the E-R diagram of flights-aircraft-emplyees-certified as supplied.
  2. Normalize the database and draw the new E-R diagram.  Show the changed/new create table statements.
Problem 4 Review of SQL

a. (Review of cs630, OK to reuse old homework paper) Exercise 5.2 in the textbook, except use the table schemas in createdb.sql, slightly different. Give the quereis in relational algebra and SQL.  Try the SQL queries out on Oracle and mysql by writing a file queries4a.sql and then executing it on both databases, leaving their output in queries4a-oracle.txt and queries4a-mysql.txt in your cs634/hw1 directory, along with queries4a.sql of course. Show queries4a.sql in your homework paper. Note that some of these queries need relational division, very tricky to do in SQL. See an example of division in SQL in Q9, pg 150. Specifically, do parts 1, 3, 4, 5, 7, and 10 (all but #3 done in hw2 last term),
b. Queries over the Yelp database (following the logic of Exercise 5.2 queries, and numbered accordingly) for queries4b.sql. Show the SQL queries in your homework paper. Note that the user table has been renamed to yuser to avoid using the Oracle reserved word user. The Oracle tables exist but are empty, while the mysql tables have data. Columns originally named "date" have been renamed as well.   Sample query: select count(*) from yelp_db.yuser;  Note that output is not required for these queries because many of them take over 5 min. to run. Please abort queries that take more than 30 seconds, unless you are running after midnight.
1. Find the business names for which there is some review (no duplicate names).
3. Find the names of users who have reviewed every business in Clark, NV.
4. Find the businesses by name which are reviewed by Macaroni and noone else.
6. For each business, find the user (by name) who rates it highest (highest stars value), including ties.
7. Find the ids of users who review only businesses in zipcode 02138  (no such rows exist in the data)
10. For every business that is reviewed only by users having "cool count" value at least 10, print the business name and the total number of reviews they have.

Problem 5 Generating Ids

Create a table with abstract schema:

Employee (eid:integer, ename:string, age:integer, salary:USmoney)

a. The exact types used are up to you, as long as they can do the job and are portable. The eid should be generated by auto_increment on mysql and by a sequence on Oracle.  Note that neither of these features are covered by SQL-92, so portability isn't expected. The create table statement is different in the two cases. Write create_employee_oracle.sql and create_employee_mysql.sql for the two cases, where the Oracle file creates the sequence as well as the table.  Create one data file  employees.txt that works for both cases with 3 employees. Write a shell file employees_oracle.sh and employees_mysql.sh that builds and loads the table (just use inserts to load the table), and drop_employees_oracle.sql and drop_employees_mysql.sql that drop the table and sequence. All the files should be in your hw1 directory. Show the create_employee_* and employee_* files in your homework paper, along with a script of runs that shows the loaded table.

b. Discuss how to do the id generation portably, within SQL-92. You need another table. We use this way in cs636.

Files for delivery in hw1

tables directory, with *.bad, edited createdb.sql
queries4a.sql, queries4a-oracle.txt, queries4a-mysql.txt
queries4b.sql, queries4a-oracle.txt, queries4b-mysql.txt
create_employee_oracle.sql, create_employee_mysql.sql
employees.txt, employees_oracle.sh, employees_mysql.sh
drop_employees_oracle.sql, drop_employees_mysql.sql