Tues, Nov 7: Chap. 3 Deadlocks

In Chap. 2, we saw the necessity of locks. Here we study their downside, deadlocks.

Deadly Embrace: simplest and most common deadlock, example:

process A requests resource P, granted (P is a printer)

B requests resource Q, granted (Q is a plotter)

B requests P, waits

A requests Q, waits

pg. 163—Definition of deadlock. Note it doesn’t mention resources, which is good because sometimes they are hard to pinpoint.

Example: a process locks dev1, forks, and the parent waits for the child, and the child requests a lock on dev1

Now the parent is waiting on the child’s termination, which only the child can do, and the child is waiting on the unlock, which only the parent can do. So clearly this is a deadlock by the definition.

Clearly dev1 is one resource, but one resource isn’t enough. It turns out the child process itself is a resource here, since the parent is waiting on it.

pg. 164—conditions for deadlock.

Resource allocation graphs—draw deadly embrace pic, see cycle in graph.

Examples of resource graph on pg. 169. We see a cycle here, so there’s a deadlock involving processes E, G, and D. The others are not is such a state.

In fact, the situation involving S is short-lived. There are 4 requests for it and noone holds it! Some other process must have just given it up. If the lock manager is smart enough to give it to A, then A has all it wants and finishes. Then the lock manager can give it to C, and C finishes. Then the lock manager can give it to F, and F finishes. That leaves only the deadlocked processes, which have no legitimate way to proceed without killing off one of them.

How to do DFS to look for cycle. Need to mark nodes “visited” as we go along, so we can recognize that we’ve come back to an earlier node.

Actually there’s more to it than just revisiting. Consider a graph with node A connected to two nodes B and C, and then both of those are connected to D. A DFS from A will visit D via B, and then revisit it via C, but there is no cycle. We need to find that we are revisiting a node that we saw on the path down the tree, not just anywhere. We can check to see if the revisited node is in the list of nodes that defines where we are in the tree, the path from the starting point. If so, there’s a cycle.

Tan, top of pg 171: “This algorithm is far from optimal” What does that mean? It takes O(N+E), E the number of edges, N the number of nodes, and we know E can be O(N**2), while checking a cycle takes O(N) so there is room for improvement. However I don’t know such an algorithm. I’ve ordered the book.

Another example:

process A holds T, wants R

process B holds R, wants S

process C holds nothing, wants R and S

Graph:…

No cycle, so no deadlock

Unwinding: B gets S, finishes

A gets R, finishes

C gets R & T, finishes

DFS check

from B –nothing much explored

from A:

L = ()

L = (A)

L = (A, R)

L = (A, R, B)

L = (A, R,B, S)

L = (A, R, B)

L = (A, R)

L = (A)

From C …

So no cycle.

Multi-unit Resources

4 identical printers—can be servicing 4 different processes at the same time

Look at pg. 173: E = ( 4 2 3 1), A = ( 2 1 0 0)

C = …

so C₁₁ = 0 = process 1’s current holding of resource 1

C₁₂ = 0 = process 1’s current holding of resource 2

C₁₃ = 1 = process 1’s current holding of resource 3, etc.

R₁₁ = 2 = process 1’s current request for resource 1

R₁₂ = 0 = process 1’s current request for resource 2

C₁₃ = 0 = process 1’s current request for resource 3, etc.

Note that A₁ = E₁ – (C₁₁+ C₂₁+ C₃₁)

or in general,

A_i = E_i – (C_1i+ C_2i+ C_3i) = E_i – sum of C_ji

Recall idea of unwinding—find a process whose requested resources can be satisfied from the available ones.

Here we want

(ith row of R) <= A (in the vector sense of each componenet <=)

Then we finish that process, and add its held resources to A.

Loop until all processes finished, or some will be left in deadlock.

Note that “finished” or Tanenbaum’s “are able to complete” (line 8, pg 172) is really an overstatement. The process might get into trouble later on. We really should say “can make further progress from here”.

What should the system do when it detects deadlock??

Idea of checkpointing. Once I was involved with a program that took 90 days to finish on a mainframe (probably can do overnight on a PC today!). It checkpointed itself every 12 hours, so crashes and downtimes couldn’t stop it.

Databases do checkpoints too—covered in cs634.

Killing processes or rebooting the system—everything should be rerunnable. Don’t overwrite original information of runs!

Databases are a big help in keeping order. Take cs630 and apply it in your work.

Deadlock Avoidence

Neat idea, mostly theory.

pg. 176 analysis of deadly embrace:

A—assigns printer, then plotter

B—assigns plotter, then printer

They only get in trouble if they do this in just the right interleaved way. Basically, there needs to be a time x at which A and B have both assigned one resource but not the other. Then they are sunk. The system has a chance to do something when the second assignment of the first resource happens. That would be deadlock avoidance.

Note that if you had control over the two programs, you could fix this by making B assign the printer first, plotter second, just like A. Then they couldn’t deadlock. This is the idea of resource ordering, see pg. 182.

To get deadlock avoidance to work, we need more information than with deadlock detection. Note the “Max” numbers on pg. 177. These are the max number of resource units needed for the whole program run. (Current OS’s don’t collect this info, so can’t use this approach)

Safe state: allows unwinding even if all processes immediately requested their max resources.

Unsafe state: no such unwinding, might deadlock.

Famous Banker’s Algorithm—Dijkstra, 1965

Fix Pg 179—add spaces between digits in vectors E, P, A!

C = left matrix, R = right matrix, requests for resources, here covers all future requests of process

E = existent

P = possessed, i.e. sum of columns of C

A = available, so A = E – P

Algorithm—find unwinding again.

Note what Tanenbaum says: no systems he knows of actually use this!

Read through to the end of Section 3.6. Note the trick of resource numbering. That is actually in use in applications, inside some OS’s and in database servers. Unfortunately it is hard to use. It doesn’t fit in with any kind of modular programming. The active resource list is the worst kind of global data.