cs634 hw4

CS634 – Homework 4

Due Wednesday, Mar. 21, in class (solution available Mar. 23) 20 points

Chaps 14-15: Query Operators and Query Plans

Problem 1 Join Methods and their cost

Exercise  14.5 (which should refer to 14.4, not 14.1, and is in the answer set) parts 1-3, but increase all the sizes by a factor of 100:
R contains  1,000,000 tuples, 10 tuples/page
S contains 200,000 tuples, 10 tuples/page
(1) Assume 5200 buffer pages (or use 5002 for neater numbers), unclustered B+-tree indexes on R.a and S.b. Show the computation of the cost of the blocked nested loops join and the indexed nested loop join and then do the requested comparison.
(a) Would your answer change if only 500 buffer pages were available?
(b) ... if S contained only 1000 tuples instead of 20K
(2) As in (1) except now consider clustered indexes on R.a and S.b.
(3) Sort-merge join with 1500 buffers. Also hash join.

Problem 2 Index Matching

Exercise 15.2 in the textbook (only sub-question 1). Note that, the question in the book is incomplete, because you also need the page size. Assume a page size of 8KB, with 8000 useful bytes after the page header, and dense indexes. Also assume each unclustered index is 2000 pages in size, from the ratio of data entry size (20 bytes) to record size (100 bytes) and the size of the whole table (10,000 pages). The clustered index is mostly the table itself, in its leaves, where there are 8000/100 = 80 records/leaf (a little less, really). The next level up has 10,000 index entries of about 20 bytes each, so 200,000/8000 pages = 25 pages, pretty negligible, but show it in calculations. The next level up is the root.

Sample answer: #rows in table = 80/page*10,000 pages = 800,000
a.       Sal > 100. Only #2 index matches.
Cost = index access + table access = .1*2000 + .1*800000 = 80,200 page i/os

Problem 3 Single-table query optimization

Exercise 15.4 in the textbook (only sub-questions 2-3). Note that, the question in the book is incomplete, because you also need the total number of records. Assume that there are 500,000 records. Assume that the table size can also be used as any clustered index size.

Problem 4 Multiple-table query considerations

Exercise 15.8 in the textbook (only sub-questions 1-3). Note that we studied a similar schema in homework 3, looking for useful indexes for other given queries.

Problem 5 Multiple-table query optimization

Exercise 15.9 in the textbook (only sub-question 6, parts a,b as below). Change the sal condition from the ridiculous “sal = 50000” to “sal >= 50000 and sal < 50100”, still very selective and very important to the best plan selection. Assume nested-loop joins are to be used, and left-deep plans.

a.  1-relation plans: apply any selections, list RFs, index to use if any, and costs for these plans

2-relation plans: joins of two tables that join (have a join condition) in the query, with appropriate selections and/or index use. Draw all the possibilities that are left-deep, i.e., have a base table as the right child of the join.

3-relation plans: draw both the left-deep possibilities, then argue that one is much better than the other.

b.      Find the cost of the plan you argued for