CS210 Class 26

CS210 Thursday, Apr. 27

Binary Search Trees with Rank: Sec. 19.2, pg. 611

Remember that binary search trees (BSTs) have a key value at each node, leaf or not. At each node, the key x is greater than all the keys in its left subtree and less than all the keys in its right subtree. With this structure, it is easy to find any key, and the keys are kept in sorted order (an inorder traversal visits them from smallest to largest). Inserts and deletes can be done efficiently, unless the tree gets badly unbalanced. Thus a BST can easily implement Set, or at least a Set without iterator(). (We could add iterator() by using the inorder tree traversal iterator of the last chapter, though that is admittedly not easy.)

However, with a plain BST, there is no efficient way to find the kth element for a given value of k. We don’t know if the root element is first, last or somewhere in the middle of the key values. This problem is solved by storing the subtree size at each node. Weiss calls this BST variant the “BST with Rank.”

Example: each node has its size, the count of nodes at or below it in the tree:

                   key=100
                   size=9
                 /         \
          key=70             key=200
          size=5             size=3
         /      \           /      \
   key=10     key=99   key=110   key=201
   size=3     size=1   size=1    size=1
   /    \
key=9   key=11
size=1  size=1

Note that this tree is still a BST, so we can do key lookup the same old way. For example, to do find(11), we start at the root and see that 11 < 100, so we go left. Then 11 < 70, so we go left again. Then 11 > 10, so we go right, and there we match 11.
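The lookup just described can be sketched in Java as follows. This is only an illustration under assumed names: FindDemo, Node, sizeOf, and exampleTree are made up for these notes and are not the textbook's classes. The point is that find never looks at the size field.

```java
// Hypothetical class and method names, chosen for this sketch only.
class FindDemo {
    static class Node {
        int key;
        int size;            // count of nodes at or below this node
        Node left, right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
            this.size = 1 + sizeOf(left) + sizeOf(right);
        }
    }

    static int sizeOf(Node t) {
        return t == null ? 0 : t.size;
    }

    // Ordinary BST lookup; the size field is never consulted.
    static Node find(Node t, int x) {
        while (t != null) {
            if (x < t.key) {
                t = t.left;
            } else if (x > t.key) {
                t = t.right;
            } else {
                return t;    // match
            }
        }
        return null;         // x is not in the tree
    }

    // Builds the nine-node example tree from the notes.
    static Node exampleTree() {
        Node n10 = new Node(10, new Node(9, null, null), new Node(11, null, null));
        Node n70 = new Node(70, n10, new Node(99, null, null));
        Node n200 = new Node(200, new Node(110, null, null), new Node(201, null, null));
        return new Node(100, n70, n200);
    }

    public static void main(String[] args) {
        Node root = exampleTree();
        System.out.println(find(root, 11).key);   // prints 11
        System.out.println(find(root, 12));       // prints null
    }
}
```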

Now consider findKth(7), looking for the 7th element in the tree, counting up from 1. By counting through the tree in key order (9, 10, 11, 70, 99, 100, 110, ...), we can see that node 7 is the one with key 110. But we can also find it by working down the tree from the root, using the size values at each level.

At the root, size=9, so that means that there are nodes 1, 2, 3, …, 9 in the tree somewhere, and 7 is a valid position.

At the root, leftSize=5, meaning the size of the left subtree. Since 7 > 5, node 7 can’t be in the left subtree.

At the root, leftSize + 1 = 6, covering the positions of the left subtree and the node itself, and 7 > 6, so it must be in the right subtree.

We go down to the right subtree, to find 7 among nodes 7, 8, and 9, i.e., at the first position in this subtree. This is findKth(1) starting from the node with key=200. In general, the new rank is k – (leftSize+1) when we go to the right child.

At the key=200 node, leftSize = 1, and 1 <= 1, so the node we’re looking for must be in the left subtree.

We go down to the left child, with key=110, to find position 1. Here leftSize = 0, so the node is not in the left subtree, and leftSize + 1 = 1, which equals the position we want, so we are at the desired node.
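The descent above can be written as a short loop. The following is a minimal sketch, assuming a node with an int key and a size field. The names FindKthDemo, Node, and sizeOf are hypothetical, and returning null for an out-of-range k is a choice made here, not something the notes specify.

```java
// Hypothetical names; a sketch of the rank-based descent, not the textbook code.
class FindKthDemo {
    static class Node {
        int key;
        int size;            // count of nodes at or below this node
        Node left, right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
            this.size = 1 + sizeOf(left) + sizeOf(right);
        }
    }

    static int sizeOf(Node t) {
        return t == null ? 0 : t.size;
    }

    // Returns the node holding the k-th smallest key (k counts from 1),
    // or null if k is out of range (an assumption of this sketch).
    static Node findKth(Node t, int k) {
        while (t != null) {
            int leftSize = sizeOf(t.left);
            if (k <= leftSize) {
                t = t.left;              // positions 1..leftSize are on the left
            } else if (k == leftSize + 1) {
                return t;                // this node sits at position leftSize + 1
            } else {
                k -= leftSize + 1;       // new rank within the right subtree
                t = t.right;
            }
        }
        return null;
    }

    // Builds the nine-node example tree from the notes.
    static Node exampleTree() {
        Node n10 = new Node(10, new Node(9, null, null), new Node(11, null, null));
        Node n70 = new Node(70, n10, new Node(99, null, null));
        Node n200 = new Node(200, new Node(110, null, null), new Node(201, null, null));
        return new Node(100, n70, n200);
    }

    public static void main(String[] args) {
        System.out.println(findKth(exampleTree(), 7).key);   // prints 110
    }
}
```

Each iteration moves one level down, so the cost is proportional to the depth of the node found, the same as for find.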

These sizes turn out to be easy to maintain through inserts and deletes in the BST. Suppose we add 112 to the above tree:

                   key=100
                   size=9->10
                 /            \
          key=70               key=200
          size=5               size=3->4
         /      \             /         \
   key=10     key=99   key=110          key=201
   size=3     size=1   size=1->2        size=1
   /    \                   \
key=9   key=11              key=112
size=1  size=1              size=1

You can see that the needed modifications to sizes occur along the path of descent through the tree. In fact, Weiss has the increments occur as the execution returns back up from level to level, so that no size is incremented in failing cases (attempted inserts of duplicate keys).
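Here is one way the increment-on-return idea might look in Java. It is a sketch under assumptions: the names InsertDemo and Node are hypothetical, and a duplicate key is signaled with IllegalArgumentException, chosen here only for illustration. Because the exception unwinds the recursion, the size++ line is skipped at every level when the insert fails.

```java
// Hypothetical names; a sketch of size maintenance on insert.
class InsertDemo {
    static class Node {
        int key;
        int size = 1;        // a new node is a subtree of one node
        Node left, right;

        Node(int key) {
            this.key = key;
        }
    }

    // Recursive insert that returns the (possibly new) subtree root.
    static Node insert(Node t, int x) {
        if (t == null) {
            return new Node(x);
        }
        if (x < t.key) {
            t.left = insert(t.left, x);
        } else if (x > t.key) {
            t.right = insert(t.right, x);
        } else {
            // Assumption of this sketch: duplicates are rejected by exception.
            throw new IllegalArgumentException("duplicate key: " + x);
        }
        t.size++;            // reached only when the insert below succeeded
        return t;
    }

    // Builds the nine-node example tree by inserting keys level by level.
    static Node exampleTree() {
        Node root = null;
        for (int key : new int[] {100, 70, 200, 10, 99, 110, 201, 9, 11}) {
            root = insert(root, key);
        }
        return root;
    }

    public static void main(String[] args) {
        Node root = exampleTree();
        insert(root, 112);
        // Sizes along the path 100 -> 200 -> 110: prints "10 4 2"
        System.out.println(root.size + " " + root.right.size + " " + root.right.left.size);
    }
}
```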

Similarly, remove causes decrements in sizes along the path. Compare the code for BST and BSTwithRank and see there is only one line added to each.
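A matching sketch for remove is below, again under assumed names (RemoveDemo, Node, findMin, removeMin) and with a missing key signaled by java.util.NoSuchElementException, which is a choice made for this illustration. The two-child case copies the smallest key of the right subtree into the node and then removes that minimum, a common BST technique that may differ in detail from the textbook's code.

```java
// Hypothetical names; a sketch of size maintenance on remove.
class RemoveDemo {
    static class Node {
        int key;
        int size;            // count of nodes at or below this node
        Node left, right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
            this.size = 1 + sizeOf(left) + sizeOf(right);
        }
    }

    static int sizeOf(Node t) {
        return t == null ? 0 : t.size;
    }

    static Node findMin(Node t) {
        while (t.left != null) {
            t = t.left;
        }
        return t;
    }

    // Removes the smallest node of a non-empty subtree.
    static Node removeMin(Node t) {
        if (t.left == null) {
            return t.right;  // splice out the minimum
        }
        t.left = removeMin(t.left);
        t.size--;
        return t;
    }

    static Node remove(Node t, int x) {
        if (t == null) {
            // Assumption of this sketch: a missing key is an error.
            throw new java.util.NoSuchElementException("key not found: " + x);
        }
        if (x < t.key) {
            t.left = remove(t.left, x);
        } else if (x > t.key) {
            t.right = remove(t.right, x);
        } else if (t.left != null && t.right != null) {
            t.key = findMin(t.right).key;   // replace with inorder successor
            t.right = removeMin(t.right);
        } else {
            // Zero or one child: splice this node out; sizes below are unchanged.
            return (t.left != null) ? t.left : t.right;
        }
        t.size--;            // reached only when a node below was removed
        return t;
    }

    // Builds the nine-node example tree from the notes.
    static Node exampleTree() {
        Node n10 = new Node(10, new Node(9, null, null), new Node(11, null, null));
        Node n70 = new Node(70, n10, new Node(99, null, null));
        Node n200 = new Node(200, new Node(110, null, null), new Node(201, null, null));
        return new Node(100, n70, n200);
    }
}
```

As with insert, the decrement sits after the recursive call, so a failed remove leaves every size untouched.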

In pa06, we implement OrderedList with BSTwithRank. Then get() is almost the same as findKth(), the only difference being that element positions start with 0 for get and with 1 for findKth.
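The off-by-one relationship between get and findKth can be sketched as below. This is not the pa06 solution: the names GetDemo, Node, and the static get helper are hypothetical, keys are plain ints, and throwing IndexOutOfBoundsException for a bad index is an assumption borrowed from how java.util.List.get behaves.

```java
// Hypothetical names; shows only the index shift between get and findKth.
class GetDemo {
    static class Node {
        int key;
        int size;            // count of nodes at or below this node
        Node left, right;

        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
            this.size = 1 + sizeOf(left) + sizeOf(right);
        }
    }

    static int sizeOf(Node t) {
        return t == null ? 0 : t.size;
    }

    // k counts from 1; returns null if k is out of range.
    static Node findKth(Node t, int k) {
        while (t != null) {
            int leftSize = sizeOf(t.left);
            if (k <= leftSize) {
                t = t.left;
            } else if (k == leftSize + 1) {
                return t;
            } else {
                k -= leftSize + 1;
                t = t.right;
            }
        }
        return null;
    }

    // index counts from 0, so position index corresponds to rank index + 1.
    static int get(Node root, int index) {
        if (index < 0 || index >= sizeOf(root)) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return findKth(root, index + 1).key;
    }

    // Builds the nine-node example tree from the notes.
    static Node exampleTree() {
        Node n10 = new Node(10, new Node(9, null, null), new Node(11, null, null));
        Node n70 = new Node(70, n10, new Node(99, null, null));
        Node n200 = new Node(200, new Node(110, null, null), new Node(201, null, null));
        return new Node(100, n70, n200);
    }

    public static void main(String[] args) {
        System.out.println(get(exampleTree(), 6));   // prints 110, same node as findKth(7)
    }
}
```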

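OrderedList is supposed to allow duplicates, whereas BSTs disallow them by the fundamental rule (each key is strictly greater than every key in its left subtree and strictly less than every key in its right subtree).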