CS444 Class 24: Chap. 6 Deadlocks

Reading: Chap 6 through 6.6, pg. 457

In Chap. 2, we saw the necessity of locks. Here we study their downside, deadlocks.

Deadly Embrace: simplest and most common deadlock, example:

process A requests resource P, granted (P is a printer)

B requests resource Q, granted (Q is a plotter)

B requests P, waits

A requests Q, waits

They both wait forever…

This can be prevented by the resource ordering method: Tell all programmers to lock the printer before the plotter in all programs. Then the programs never deadlock. But of course this is very hard to enforce!

pg. 437—Definition of deadlock. Note it doesn’t mention resources, which is good because sometimes they are hard to pinpoint.

Example: a process locks dev1, forks, and the parent waits for the child, and the child requests a lock on dev1

Now the parent is waiting on the child’s termination, which only the child can do, and the child is waiting on the unlock of dev1, which only the parent can do. So clearly this is a deadlock by the definition.

Clearly dev1 is one resource, but one resource isn’t enough. It turns out the child process itself is a resource here, since the parent is waiting on it.

Well, there is one dumb case of deadlock on a single resource, with a single process: The process requests and gets a lock, then later requests it again, and waits forever. This really happens in apps during development, but should be fixed fairly soon.

pg. 438—conditions for deadlock.

Resource allocation graphs—resource node is a square, process is a circle.

Resource is held by a process: arrow from resource to process.

Process is waiting for resource, or requesting it: arrow from process to resource.

draw deadly embrace pic, see cycle in graph.

Examples of resource graph on pg. 443. We see a cycle here, so there’s a deadlock involving processes E, G, and D. The others are not is such a state.

In fact, the situation involving S is short-lived. There are 4 requests for it and noone holds it! Some other process must have just given it up. If the lock manager is smart enough to give it to A, then A has all it wants and finishes. Then the lock manager can give it to C, and C finishes. Then the lock manager can give it to F, and F finishes. That leaves only the deadlocked processes, which have no legitimate way to proceed without killing off one of them.

Deadlock Detection: The first step needed to resolve a deadlock, and a challenge in itself.

To do: Find a cycle if it exists in the resource graph.

How to do DFS to look for cycle. Need to mark nodes “visited” as we go along, so we can recognize that we’ve come back to an earlier node.

Actually there’s more to it than just revisiting. Consider a graph with node A connected to two nodes B and C, and then both of those are connected to D. A DFS from A will visit D via B, and then revisit it via C, but there is no cycle.

We need to find that we are revisiting a node that we saw on the path down the tree to this node, not just anywhere. We can check to see if the revisited node is in the list of nodes that defines where we are in the tree, the path from the starting point. If so, there’s a cycle.

Tan. Algorithm pg. 443—note that steps 4, 5, and 6 are DFS steps. We can start from a DFS code, and add the list handling to it.

Tan, middle of pg 444: “This algorithm is far from optimal” What does that mean? It takes O(N+E), E the number of edges, N the number of nodes, and we know E can be O(N**2), while checking a cycle takes O(N) so there is room for improvement. However I don’t know such an algorithm. You can use a topological sort algorithm for this, but they are also O(N+E) or more I believe.

Another example:

process A holds T, wants R

process B holds R, wants S

process C holds nothing, wants R and S

Graph:…

No cycle, so no deadlock

Idea of unwinding (not actually a term in Tanenbaum, unfortunately)—it’s a way out of a certain resource graph state, that always must exist if the system is not deadlocked. We list a sequence of actions that allows all the processes to run free.

Unwinding: B gets S, finishes

A gets R, finishes

C gets R & T, finishes

Example in Tan, bottom of pg. 446, but not labeled “unwinding”.

DFS check

from B –nothing much explored

from A:

L = ()

L = (A)

L = (A, R)

L = (A, R, B)

L = (A, R,B, S)

L = (A, R, B)

L = (A, R)

L = (A)

From C …

So no cycle.

Multi-unit Resources

4 identical printers—can be servicing 4 different processes at the same time

Look at pg. 446: E = (4 2 3 1), A = ( 2 1 0 0)

C = …

so C₁₁ = 0 = process 1’s current holding of resource 1

C₁₂ = 0 = process 1’s current holding of resource 2

C₁₃ = 1 = process 1’s current holding of resource 3, etc.

R₁₁ = 2 = process 1’s current request for resource 1

R₁₂ = 0 = process 1’s current request for resource 2

C₁₃ = 0 = process 1’s current request for resource 3, etc.

Note that A₁ = E₁ – (C₁₁+ C₂₁+ C₃₁)

or in general,

A_i = E_i – (C_1i+ C_2i+ C_3i) = E_i – sum of C_ji

Recall idea of unwinding—find a process whose requested resources can be satisfied from the available ones.

Here we want

(ith row of R) <= A (in the vector sense of each componenet <=)

Then we finish that process, and add its held resources to A.

Loop until all processes finished, or some will be left in deadlock.

Note that “finished” or Tanenbaum’s “are able to complete” (line 8, pg 172) is really an overstatement. The process might get into trouble later on. We really should say “can make further progress from here”.

What should the system do when it detects deadlock??

Idea of checkpointing. Once I was involved with a program that took 90 days to finish on a mainframe (probably can do overnight on a PC today!). It checkpointed itself every 12 hours, so crashes and downtimes couldn’t stop it.

Databases do checkpoints too—covered in cs634.

Killing processes or rebooting the system—everything should be rerunnable. Don’t overwrite original information of runs!

Databases are a big help in keeping order. Take cs630 and apply it in your work.

6.5 Deadlock Avoidence

Neat idea, mostly theory.

pg. 449 analysis of deadly embrace:

A—assigns printer, then plotter

B—assigns plotter, then printer

They only get in trouble if they do this in just the right interleaved way. Basically, there needs to be a time x at which A and B have both assigned one resource but not the other. Then they are sunk. The system has a chance to do something when the second assignment of the first resource happens. That would be deadlock avoidance.

Note that if you had control over the two programs, you could fix this by making B assign the printer first, plotter second, just like A. Then they couldn’t deadlock. This is the idea of resource ordering, see pg. 456.

To get deadlock avoidance to work, we need more information than with deadlock detection. Note the “Max” numbers on pg. 177. These are the max number of resource units needed for the whole program run. (Current OS’s don’t collect this info, so can’t use this approach)

Safe state: allows unwinding even if all processes immediately requested their max resources.

Unsafe state: no such unwinding, might deadlock.

Famous Banker’s Algorithm—Dijkstra, 1965

Fix Pg 453—add spaces between digits in vectors E, P, A!

C = left matrix, R = right matrix, requests for resources, here covers all future requests of process

E = existent

P = possessed, i.e. sum of columns of C

A = available, so A = E – P

Algorithm—find unwinding again. But complicated, don’t worry about details here.

Note what Tanenbaum says: no systems he knows of actually use this!

Read through to the end of Section 6.6. Note the trick of resource numbering. That is actually in use in applications, inside some OS’s and in database servers. Unfortunately it is hard to use. It doesn’t fit in with any kind of modular programming. The active resource list is the worst kind of global data.

In the printer-plotter example, make the printer #1, the plotter #2 for example. Then all programs will request the printer lock before the plotter lock, and will never deadlock.

Locks and Deadlock in Java

Each object has a built-in mutex that can act as a lock, with no deadlock detection. Semaphores can be used as mutex, thus locks, but have no deadlock detection. Java supplies a Lock interface that can have an implementation with deadlock detection—see its Javadoc in Java 7 (not earlier). But the concrete classes of the JDK apparently don’t offer this feature. What to do?

Use a database! Traditional databases (Oracle, DB2, mysql, SqlServer,etc.) do track deadlock situations and resolve them by aborting one transaction.

Although Java doesn’t do deadlock detection for the running program, it now (Java 6+) does help us diagnose deadlocks when they happen. When they happen, the app “locks up” and just hangs there, running in a certain process with a certain pid. To research this, get a “thread dump” by using the Java tool “jstack”:

jstack <pid> (or jstack <pid> > threaddump.dat)

This writes an extensive report on all the threads, with backtraces, and then near the end of the report, lists the deadlocked processes with backtraces, so you can see the locking calls that caused the deadlock. This can be very useful, especially for server processes that have a lot of threads.

Android Note

Although Linux has no deadlock detection, Android apps are encouraged to use the supplied SQLLite database for storing persistent data. However, SQLLite is not a traditional database, and does not use fine-grained locking. In fact for Android up to API 11 of Android 3.0, each transaction locked the entire database, which avoids deadlocks at great expense to concurrency. Since then, some concurrency has been allowed. See http://www.enterra-inc.com/techzone/handling_sql_issues/