CS634 – Practice Final Exam         NAME:_______________________________________

  150 points (25 points for each problem)

OPEN BOOKS - OPEN NOTES - CLOSED ELECTRONIC DEVICES  other than simple calculators.   Show all work on these pages.

 

1. Shorter answer
a.   If a disk has 1MB on each track, and rotates at 6000rpm, what is its maximum sustainable data transfer rate in MB/s?



b.  As DBA, how can you list all the users of a mysql/Mariadb server? Show the commands assuming you are already logged into the server as root.
 


c.  Consider the cross-tabs of Figure 25.5 on page 855, and its cube fact table in Figure 25.1 on page 851. Note that 1997 is timeid 3 here. What numbers from the fact table add up to the 35 you see in the cross-tabs? You can just write down the numbers in a list.



d. Suppose one docker container is running on your Linux VM. How can you find its IP address? Show the docker commands to use.



e. How is the two-phase locking protocol modified when read-committed isolation is in use for a transaction?

 


2.  B+ Trees

Consider a B+-tree that is stored on a disk with page size 8192 bytes, with 8000 bytes usable. A key requires 40 bytes of storage, a page pointer requires 10 bytes of storage, and a RID requires 12 bytes. The index (unclustered) uses Alternative 2 for the data entries in leaf nodes. Assume that you have a file that is indexed with 1,000,000 records and that the nodes (both leaf and internal) are full. Find how many levels the tree has, and how many nodes at each level.


 




3. Relational Operators

You are given the following two relations:

Employees (eid:integer, did:integer, salary:integer, age:integer)

      Departments (did:integer, dname:string, floor:integer, budget:integer)

 

Employee occupies 500 pages and Departments occupies 100 pages.

Assume that all attributes within the same relation have the same length. Employees are uniformly distributed over 1000 departments. An Employee page fits 40 tuples, while a Department page fits 50 tuples. Age values are uniformly distributed in the range [21:60]. Budgets are recorded in units of $1000 and are uniformly distributed in the range 100 to 500. Assume there are B=22 buffers available. Ignore the final output cost of the result for all cases.

(a)     What is the I/O cost to perform a block nested loop join of the two tables?

 

 

 

 

(b)     Assume that you are computing the projection of Employee on (did, age) with duplicate elimination using a sort-based approach.  How much data needs to be sorted? _______ pages.  Show the passes that are needed and the estimated i/o cost for each, and the total i/o cost.  As usual, don't include any i/o cost for writing out the sorted data.

 

 

 

 

 

(c)     You are given the query:

 

SELECT SUM(D.budget)

FROM Departments D

WHERE budget > 400

 

What is the approximate I/O cost to execute this query given a clustered Alternative 1 B+-tree index on budget?  Show calculation of the appropriate reduction factor first.

 

 

 

(d)     What is the I/O cost to execute this query using an unclustered Alternative 2 B+-tree index on budget? Is a table scan better?

 

 

 

 

 

 

 


 

 

4. Query Optimization. Use the same setup as in the last problem, and also given a clustered B+-tree index for Departments on budget and a unclustered hash index for Employee on did.  You are given the query:

 

SELECT D.did, E.age

FROM Departments D, Employees E

WHERE D.did = E.did AND E.age > 50 AND D.budget > 400

 

Find the best (lowest i/o cost) nested-loops join plan.

 

a.       Show the query tree for the best nested loops plan, the one that can use both indexes.

 

 

 

 

 

 

b.      Specify the access method for each table (i.e., what index is in use)

 

c.       Estimate the i/o cost of the query.





5. Concurrency Control

2)       (30 points.)  Consider the following schedule

 

                  R1(A) W1(A) R2(B) R1(B) R2(A) W1(B) C1 W2(A) C2

a.       Identify all conflicting pairs of operations above by connecting them with arcs above the printed line.

 

b.       Now draw the dependency graph below.

 

 

 

 

 

 

c.        Explain how you know from b. that the schedule is or is not serializable.

 

 

 

 

 

d.       The logic of the program for T1 is as follows. T1 is deducting 10 from account A, checking the balances of two accounts A and B, and subtracting 70 from B as long as the sum of balances A and B will remain > 0. The program for T2 is subtracting 70 from account A as long as the sum of the two balances will remain > 0.

 

Rewrite the schedule to show how it would run with strict two-phase locking in force, with A and B originally both 50. When you see a deadlock, choose T1 to abort. Don’t retry the aborted transaction (retries are optional and depend on the programming details.)   Show the history with locking operations and determine the final values of A and B.

 

 


 

6. Recovery. Again consider the history from the last problem:

a. Copy your final schedule, from problem 5d: Drop the locking calls for this problem, and be sure to include only actions that actually happened in the final schedule.

Executed Schedule:

b. Assume A is on page P1 and B is on page P2. The system crashes after all these operations have been logged, including the abort handling operations. Show the log just before the crash by finishing the list below. Then show the normal recovery (no crash during recovery.) Assume that the ARIES protocol is set up for this DBMS. Show the dirty page table and transaction table at the end of the Analysis phase. Show the steps of the redo and undo phases. Show the resulting additions to the log file (after complete recovery).  Use the back of the previous page for more space.

00           Begin checkpoint

10           End checkpoint

20           Update: T1 writes P1

               

               

 

--- CRASH ---

Recovery: